
Continuous batching for LLM inference on GPUs

Continuous Batching: Why 60% of Your GPU Sits Idle

June 26, 2026

Naive static batching leaves roughly 60% of an H100 GPU idle during LLM serving, because requests that finish early keep occupying their batch slots until the slowest sequence in the batch completes. Continuous batching (iteration-level scheduling, introduced in the Orca paper and now the default in vLLM, TensorRT-LLM and TGI) fixes this …
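To make the slot-holding problem concrete, here is a toy simulation of the two scheduling policies. It is a minimal sketch, not how vLLM, TensorRT-LLM or TGI implement their schedulers: the function names, the request lengths and the batch size are all made up for illustration, each request is assumed to need a fixed, positive number of decode steps, and prefill cost and KV-cache memory limits are ignored. It also does not try to reproduce the 60% figure; it only shows the direction of the effect.

```python
from collections import deque


def static_batching(lengths, batch_size):
    """Toy model: each batch runs until its longest request finishes.

    lengths: decode steps needed per request (assumed positive integers).
    Returns (total_steps, slot_utilization).
    """
    steps = 0
    busy = 0  # slot-steps that actually produced a token
    for i in range(0, len(lengths), batch_size):
        batch = lengths[i:i + batch_size]
        steps += max(batch)  # finished requests hold their slots until here
        busy += sum(batch)
    return steps, busy / (steps * batch_size)


def continuous_batching(lengths, batch_size):
    """Toy model: free slots are refilled from the queue before every step."""
    queue = deque(lengths)
    slots = []  # remaining decode steps for each active request
    steps = 0
    busy = 0
    while queue or slots:
        # Iteration-level scheduling: admit waiting requests into free slots.
        while queue and len(slots) < batch_size:
            slots.append(queue.popleft())
        steps += 1
        busy += len(slots)
        # Requests with one step left finish now and release their slot.
        slots = [r - 1 for r in slots if r > 1]
    return steps, busy / (steps * batch_size)


if __name__ == "__main__":
    # Hypothetical workload: a mix of short and long generations.
    lengths = [2, 8, 3, 8, 1, 8, 2, 8]
    for name, fn in [("static", static_batching),
                     ("continuous", continuous_batching)]:
        steps, util = fn(lengths, batch_size=4)
        print(f"{name:>10}: {steps} steps, {util:.1%} slot utilization")
```

On this made-up workload both policies generate the same 40 tokens, but the continuous scheduler needs fewer model iterations because short requests hand their slots to waiting ones instead of idling until the longest sequence in the batch is done.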

Editorial team