Google TPU v8 Puts KV Cache on Silicon to Cut Inference Cost

Google's TPU 8i triples on-chip SRAM to 384 MB and packs 288 GB of HBM onto a single chip. That is enough to keep massive KV caches on the accelerator itself, sidestepping the memory wall that has bottlenecked LLM inference since the transformer era began. The …
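Some back-of-the-envelope arithmetic makes the capacity claim easier to judge. A KV cache stores one key vector and one value vector per layer, per KV head, per token, so its size grows linearly with sequence length and batch size. The sketch below is illustrative only. It assumes a hypothetical 80-layer model with 8 grouped-query KV heads of dimension 128 stored in bf16 (2 bytes per value), and it treats the article's 384 MB and 288 GB figures as decimal units. It also ignores the memory that model weights and activations need, so real headroom would be smaller.

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    """Hypothetical transformer shape; not a real Google model."""
    num_layers: int
    num_kv_heads: int
    head_dim: int
    bytes_per_value: int  # 2 for bf16/fp16


def kv_cache_bytes(cfg: ModelConfig, seq_len: int, batch: int) -> int:
    # Factor of 2: one key tensor and one value tensor per layer.
    return (2 * cfg.num_layers * cfg.num_kv_heads * cfg.head_dim
            * seq_len * batch * cfg.bytes_per_value)


def max_tokens(capacity_bytes: int, cfg: ModelConfig) -> int:
    # Total cached tokens (summed across all sequences) that fit in a
    # given capacity, ignoring weights, activations, and fragmentation.
    return capacity_bytes // kv_cache_bytes(cfg, seq_len=1, batch=1)


# Capacities quoted in the article, assumed to be decimal MB/GB.
SRAM_BYTES = 384 * 10**6
HBM_BYTES = 288 * 10**9

cfg = ModelConfig(num_layers=80, num_kv_heads=8, head_dim=128, bytes_per_value=2)

per_token = kv_cache_bytes(cfg, seq_len=1, batch=1)  # 327680 bytes (320 KiB)
print(f"KV cache per token: {per_token} bytes")
print(f"Tokens that fit in SRAM: {max_tokens(SRAM_BYTES, cfg)}")
print(f"Tokens that fit in HBM:  {max_tokens(HBM_BYTES, cfg)}")
```

Under these assumptions the SRAM holds only about a thousand tokens of cache, while the HBM holds on the order of 880,000. This suggests the HBM capacity, rather than the SRAM, is what lets long contexts or large batches stay resident on the chip.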