Speculative Decoding Cuts MoE Inference Cost by 19%

19% Cheaper Inference, Zero Accuracy Loss

Red Hat benchmarked gpt-oss-120B with Eagle3 speculative decoding on vLLM v0.13.0 and measured a 19.4% reduction in cost per 1M output tokens on H200 GPUs running SWE-bench workloads. This is not a benchmark trick: the figure reflects production throughput at 200 concurrent requests, with output distributions mathematically identical …
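The "mathematically identical" guarantee comes from how the target model verifies drafted tokens. The sketch below shows the standard speculative-sampling acceptance rule for a single drafted token. A draft token x is accepted with probability min(1, p(x)/q(x)), where p is the target model's distribution and q is the draft model's. Otherwise the token is resampled from the normalized residual max(0, p - q). This is a minimal illustration that assumes this generic rule applies. It is not vLLM's or Eagle3's actual code. The function names `verify_draft_token` and `output_distribution` are hypothetical, and the toy three-token distributions stand in for real model outputs.

```python
import random
from typing import Sequence


def verify_draft_token(
    draft_token: int,
    p: Sequence[float],
    q: Sequence[float],
    rng: random.Random,
) -> tuple[int, bool]:
    """Hypothetical helper: apply the speculative-sampling rule to one drafted token.

    p: target-model probabilities over the vocabulary.
    q: draft-model probabilities; draft_token is assumed to have been sampled from q.
    Returns (emitted_token, accepted).
    """
    # Accept the draft with probability min(1, p(x) / q(x)).
    if rng.random() < min(1.0, p[draft_token] / q[draft_token]):
        return draft_token, True
    # On rejection, resample from the residual max(0, p - q); random.choices normalizes weights.
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    return rng.choices(range(len(p)), weights=residual, k=1)[0], False


def output_distribution(p: Sequence[float], q: Sequence[float]) -> list[float]:
    """Hypothetical helper: exact probability of each emitted token after one verification."""
    # Probability of drafting x and accepting it: q(x) * min(1, p(x)/q(x)) = min(p(x), q(x)).
    accepted = [min(pi, qi) for pi, qi in zip(p, q)]
    reject_mass = 1.0 - sum(accepted)
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    total = sum(residual)
    if total == 0.0:  # p == q: every draft is accepted
        return accepted
    return [a + reject_mass * r / total for a, r in zip(accepted, residual)]


if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]  # toy target distribution (assumed)
    q = [0.2, 0.3, 0.5]  # toy draft distribution (assumed)
    # Prints a list equal to p up to floating-point rounding.
    print(output_distribution(p, q))
```

Each token is emitted with probability min(p, q) through acceptance plus max(0, p - q) through resampling, and these two terms sum to p. A rejected draft therefore costs extra compute but never changes which outputs are possible or how likely they are. A full implementation would verify several drafted positions in one target forward pass and stop at the first rejection. That batching, not the per-token rule itself, is where a throughput gain like the one behind the cost figure would come from.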