LLM Inference Nondeterminism: Why Temperature 0 Fails You

LLM inference nondeterminism means identical prompts can return different outputs even at temperature 0, because dynamic batching changes the order of floating-point reductions inside GPU kernels. The missing property is called batch invariance: a kernel is batch-invariant when its result for a given request does not depend on what else is in the batch. Thinking Machines Lab measured 80 distinct completions from 1,000 identical requests on Qwen3-235B, and researchers documented up to …
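
The role of reduction order can be illustrated with a minimal sketch in plain Python. This is not GPU code: `sequential_sum`, `chunked_sum`, and the chunk sizes are hypothetical stand-ins for the way a kernel might split a reduction differently as batch size changes. The explicit loop is used in place of the built-in `sum`, which on recent Python versions may apply compensated summation and hide the effect.

```python
def sequential_sum(values):
    """Add floats strictly left to right, one rounding step per addition."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, chunk_size):
    """Reduce fixed-size chunks first, then combine the partial sums.

    Hypothetical model of a kernel whose reduction split depends on
    how much work it was handed (for example, the current batch size).
    """
    partials = [
        sequential_sum(values[i:i + chunk_size])
        for i in range(0, len(values), chunk_size)
    ]
    return sequential_sum(partials)


# Same four numbers, two reduction orders. The exact sum is 2.0.
values = [1e17, 1.0, -1e17, 1.0]

one_chunk = chunked_sum(values, 4)   # ((1e17 + 1) - 1e17) + 1
two_chunks = chunked_sum(values, 2)  # (1e17 + 1) + (-1e17 + 1)

print(one_chunk, two_chunks)
print(one_chunk == two_chunks)  # False: floating-point addition is not associative
```

Neither result equals the exact sum, and the two disagree with each other. In a model, a discrepancy of this kind in a logit can be enough to change which token greedy decoding picks, after which the completions diverge.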