A 70B-parameter model in FP16 burns roughly 140 GB of VRAM just to hold its weights. Compress those weights to 4-bit integers and the footprint collapses to about 35 GB — small enough to fit on a single 80 GB GPU with room left for the KV cache. That fourfold …