Continuous Batching: Why 60% of Your GPU Sits Idle

Naive static batching leaves roughly 60% of an H100 GPU idle during LLM serving, because finished requests hold their slots until the longest sequence in the batch completes. Continuous batching (iteration-level scheduling, introduced in the Orca paper and now the default in vLLM, TensorRT-LLM, and TGI) fixes this …
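To make the slot-holding effect concrete, here is a minimal toy simulation. It is a sketch under stated assumptions, not how vLLM, TensorRT-LLM, or TGI schedule work: each request needs a fixed number of decode steps, every step costs the same, prefill and memory limits are ignored, and "utilization" means the fraction of slot-steps that produce a token. The function names and the sample lengths are hypothetical, and the numbers it prints are not meant to reproduce the 60% figure above.

```python
from collections import deque


def static_batching_utilization(lengths, batch_size):
    """Fraction of slot-steps that do useful work under static batching.

    Requests are grouped in arrival order; each batch runs until its
    longest request finishes, so shorter requests leave their slots idle.
    """
    useful = 0
    total = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        useful += sum(batch)               # tokens actually generated
        total += max(batch) * batch_size   # slots are held for max(batch) steps
    return useful / total


def continuous_batching_utilization(lengths, batch_size):
    """Fraction of slot-steps that do useful work with iteration-level scheduling.

    Before every decode step, free slots are refilled from the waiting queue.
    """
    queue = deque(lengths)
    active = []  # remaining decode steps for each request occupying a slot
    useful = 0
    total = 0
    while queue or active:
        # Admit waiting requests into any free slots.
        while queue and len(active) < batch_size:
            active.append(queue.popleft())
        useful += len(active)  # one token per occupied slot this step
        total += batch_size
        # Requests that just produced their last token release their slot.
        active = [remaining - 1 for remaining in active if remaining > 1]
    return useful / total


if __name__ == "__main__":
    # Hypothetical output lengths (in decode steps) for eight requests.
    lengths = [2, 8, 3, 8, 1, 2, 8, 4]
    print(f"static:     {static_batching_utilization(lengths, 4):.2f}")
    print(f"continuous: {continuous_batching_utilization(lengths, 4):.2f}")
```

With these sample lengths, both schedulers generate the same 36 tokens, but static batching spends 64 slot-steps on them while the continuous version spends 44. The gap comes entirely from refilling a slot as soon as its request finishes rather than waiting for the whole batch.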