Google Just Replaced Vertex AI With Gemini Enterprise Agent Platform — Here Is What Changes for Your Infrastructure
At Google Cloud Next ’26, Google made a move that rewrites the infrastructure planning of every team running production AI workloads on its cloud. Gemini Enterprise Agent Platform is no longer a roadmap item or a rebrand. It replaces the standalone Vertex AI roadmap entirely. Every future Vertex AI service evolution will ship through Agent Platform, and the implications touch model selection, agent lifecycle management, networking, silicon, security governance, and cost allocation.
This is not a UI refresh. Google is restructuring how teams build, deploy, govern, and observe agents at scale — and backing it with two new custom TPU chips (TPU 8t and TPU 8i) and a megascale data center fabric called Virgo Network. If you are running inference on GCP today or planning a migration, the decisions you make in the next quarter will look very different from the ones you made a year ago.
This article breaks down what actually shipped, what the technical trade-offs are, and how to evaluate whether the new platform fits your deployment reality.
What Gemini Enterprise Agent Platform Actually Is
The platform organizes agent development around four operational pillars: build, scale, govern, and optimize. Under the hood, it consolidates what was previously scattered across Vertex AI, Model Garden, and various disconnected tooling into a single control plane.
Model Garden now provides first-class access to over 200 models. Google’s own Gemini 3.1 Pro, Gemini 3.1 Flash Image, Lyria 3, and Gemma 4 sit alongside third-party options including Anthropic’s Claude Opus, Sonnet, and Haiku. The practical implication: you no longer need to manage separate endpoints or provisioning pipelines for multi-model architectures. A single gateway handles routing, identity, and policy enforcement.
New components include:
- Agent Studio — a low-code visual interface for agent construction
- Agent Development Kit (ADK) — a code-first framework with graph-based sub-agent orchestration, now processing over six trillion tokens monthly
- Agent Runtime — supports long-running agents maintaining state for days, backed by Memory Bank for persistent context
- Agent Gateway — unified connectivity layer that enforces Model Armor protections against prompt injection and data leakage
- Agent Identity and Agent Registry — centralized tracking of every agent’s identity, tools, and approved assets
- Agent Simulation and Agent Evaluation — testing against synthetic multi-turn interactions and continuous live-traffic scoring
- Agent Observability — full execution traces with real-time reasoning visibility
For infrastructure teams, the critical shift is this: agents are now managed enterprise workloads with identity, policy enforcement, runtime controls, and observability. Not one-off AI applications sitting outside your governance perimeter.
TPU 8t and TPU 8i: Two Chips, Two Different Problems
Google split its eighth-generation TPU into two purpose-built architectures. The decision reflects a hard-earned lesson: training and inference have fundamentally different hardware profiles, and optimizing one chip for both compromises both.
TPU 8t — Training
A single TPU 8t superpod scales to 9,600 chips with 2 petabytes of shared high-bandwidth memory, delivering 121 ExaFlops of compute. That is nearly three times the compute performance per pod compared with the previous generation (Ironwood). Storage access is 10 times faster than the previous generation, and the chip integrates TPUDirect to pull data straight into the TPU without host CPU involvement.
The reliability story matters as much as the throughput. Google targets over 97% “goodput” — the percentage of time the cluster is doing useful training work rather than recovering from failures. Real-time telemetry across tens of thousands of chips, automatic rerouting around faulty interconnect links without interrupting jobs, and Optical Circuit Switching that reconfigures hardware around failures with no human intervention all contribute to that figure.
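Goodput as described here reduces to a simple ratio of useful time to wall-clock time. A minimal sketch (the 97% figure is Google's target; the sample run below uses illustrative numbers, not published data):

```python
def goodput(useful_seconds: float, wall_clock_seconds: float) -> float:
    """Fraction of wall-clock time spent doing useful training work
    rather than recovering from failures, restarting from checkpoints,
    or waiting on stragglers."""
    if wall_clock_seconds <= 0:
        raise ValueError("wall_clock_seconds must be positive")
    return useful_seconds / wall_clock_seconds

# Illustrative: a 30-day training run that loses 18 hours to
# failures and recovery lands almost exactly on the 97% target.
wall_clock = 30 * 24 * 3600
lost = 18 * 3600
print(f"goodput = {goodput(wall_clock - lost, wall_clock):.3%}")
# → goodput = 97.500%
```

The metric is useful precisely because it folds every failure mode — link faults, checkpoint restarts, stragglers — into one number you can track per run.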
TPU 8i — Inference
TPU 8i pairs 288 GB of high-bandwidth memory with 384 MB of on-chip SRAM — three times more on-chip SRAM than the previous generation. The design goal is to keep a model’s active working set entirely on-chip, eliminating the “memory wall” that leaves processors idle while waiting for data. For Mixture-of-Experts (MoE) models, interconnect bandwidth doubles to 19.2 Tb/s. A new Collectives Acceleration Engine cuts on-chip collective latency by up to 5x.
Google claims 80% better performance-per-dollar compared with the previous generation. Both chips run on Google’s Axion Arm-based CPU host, support JAX, MaxText, PyTorch, SGLang, and vLLM natively, and offer bare-metal access.
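Whether a working set actually fits in 384 MB of SRAM depends on your model's layer dimensions and quantization. A rough back-of-envelope sketch — the model dimensions and the int8 assumption are hypothetical, and activations and KV cache are ignored:

```python
def working_set_bytes(hidden_dim: int, ffn_dim: int, bytes_per_param: int = 1) -> int:
    """Rough per-layer weight footprint for a transformer block:
    four attention projections (4 * d^2) plus a gated FFN (3 * d * ffn).
    Activations and KV cache are deliberately left out of this cut."""
    attn = 4 * hidden_dim * hidden_dim
    ffn = 3 * hidden_dim * ffn_dim
    return (attn + ffn) * bytes_per_param

SRAM_BYTES = 384 * 1024 * 1024  # TPU 8i on-chip SRAM per the announcement

# Hypothetical 4096-wide model quantized to int8 (1 byte per parameter).
layer = working_set_bytes(4096, 14336)
print(f"per-layer weights: {layer / 2**20:.1f} MiB, "
      f"fits in SRAM: {layer <= SRAM_BYTES}")
# → per-layer weights: 232.0 MiB, fits in SRAM: True
```

The point of the exercise: at int8, a single large transformer layer can fit on-chip, which is what makes "keep the active working set in SRAM" a plausible serving strategy rather than a slogan.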
Performance Comparison
| Metric | TPU 8t (Training) | TPU 8i (Inference) |
|---|---|---|
| Primary Workload | Large-scale model training | Low-latency inference / agentic serving |
| Max Chips per Superpod | 9,600 | N/A (optimized for serving density) |
| Shared HBM | 2 PB per superpod | 288 GB HBM + 384 MB on-chip SRAM |
| Compute | 121 ExaFlops | Optimized for throughput-per-dollar |
| ICI Bandwidth | 2x previous generation | 19.2 Tb/s (2x previous gen) |
| Key Innovation | Near-linear scaling to 1M chips | 3x on-chip SRAM, 5x lower on-chip latency |
| Perf/$ Improvement | ~3x compute/pod | 80% better perf/$ |
| Host CPU | Axion ARM | Axion ARM (2x physical CPU hosts) |
Virgo Network: Why Your Network Fabric Is Now an AI Infrastructure Component
Virgo Network is Google’s new megascale data center fabric, purpose-built for AI workloads. It represents a departure from treating networking as a generic data center layer and instead positions it as a first-order AI infrastructure component.
The architecture separates network functions into three layers:
- Scale-up domain — high-bandwidth, low-latency interconnect for accelerator communication within a single pod
- Scale-out accelerator fabric (east-west) — dedicated RDMA fabric for horizontal scale across pods, engineered for deterministic latency and maximum goodput
- Jupiter front-end network (north-south) — access to distributed storage and general-purpose compute, preventing data access bottlenecks
The numbers: Virgo Network links 134,000 TPU 8t chips with up to 47 petabits per second of non-blocking bisection bandwidth in a single fabric. Bandwidth per TPU 8t accelerator is four times the previous generation, with 40% lower unloaded fabric latency.
For reliability, Virgo uses independent switching planes for fault isolation, sub-millisecond telemetry for congestion detection, and automated straggler and hang detection. The design philosophy: at hundreds of thousands of chips, hardware failures are inevitable, so the network must absorb them without degrading the entire cluster.
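A quick sanity check on the published figures: dividing fabric bisection bandwidth evenly across chips gives a crude upper bound on what each accelerator can push across the bisection (real per-chip bandwidth depends on traffic pattern and where the cut falls):

```python
FABRIC_BISECTION_BPS = 47e15   # 47 Pb/s non-blocking bisection bandwidth
CHIPS = 134_000                # TPU 8t chips in a single Virgo fabric

# Naive even split — an upper bound, not a guaranteed per-chip rate.
per_chip_gbps = FABRIC_BISECTION_BPS / CHIPS / 1e9
print(f"~{per_chip_gbps:.0f} Gb/s of bisection bandwidth per chip")
# → ~351 Gb/s of bisection bandwidth per chip
```

That is in the right range for a fabric claiming 4x the per-accelerator bandwidth of the prior generation, which is the kind of consistency check worth doing before building capacity plans on vendor numbers.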
What the Agent Platform Means for Your Deployment Architecture
The shift from Vertex AI to Agent Platform changes three things for teams running production AI on GCP:
1. Agent lifecycle becomes a managed workflow. Previously, building an agent meant assembling model endpoints, orchestration logic, memory management, and monitoring from separate Vertex AI services. Agent Platform consolidates these into a single lifecycle: build in Agent Studio or ADK, deploy through Agent Runtime, govern via Agent Identity and Gateway, optimize through Simulation and Evaluation.
2. Multi-model routing is now a platform capability. With 200+ models accessible through a single gateway, you can route different agent tasks to different models without maintaining separate provisioning pipelines. Claude Opus for complex reasoning, Gemini 3.1 Flash for fast classification, Gemma 4 for on-prem or edge deployments — all through the same control plane.
3. Security governance extends to agents as first-class entities. Agent Identity, Agent Registry, and Agent Gateway provide the infrastructure to enforce access policies, audit agent behavior, and detect anomalies at the agent level rather than at the model endpoint level. Agent Anomaly Detection uses statistical models and an LLM-as-a-judge framework to flag unusual reasoning patterns. A new Agent Security dashboard unifies threat detection and risk analysis through Security Command Center.
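The multi-model routing pattern in point 2 reduces to a policy table mapping task types to model identifiers. A minimal sketch — the model names follow the article, but the routing function is illustrative, not the real Agent Gateway API:

```python
# Hypothetical task-to-model policy; in Agent Platform this decision
# would live behind the gateway rather than in application code.
ROUTES = {
    "complex_reasoning": "claude-opus",
    "fast_classification": "gemini-3.1-flash",
    "edge_deployment": "gemma-4",
}

def route(task_type: str, default: str = "gemini-3.1-pro") -> str:
    """Pick a model id for a task; unknown task types fall back to a default."""
    return ROUTES.get(task_type, default)

print(route("fast_classification"))  # → gemini-3.1-flash
print(route("summarization"))        # → gemini-3.1-pro
```

Centralizing this table is the operational win: changing which model serves a task becomes a policy edit instead of a redeploy of every agent.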
Agentic Data Cloud: The Data Layer You Need Before Deploying Agents
Agents without governed data access are expensive toys. Google’s Agentic Data Cloud announcement addresses this directly with three components:
- Knowledge Catalog — evolved from Dataplex Universal Catalog, it maps business meaning across your enterprise data estate. It aggregates context across Google Cloud and partner data platforms, enriches data using usage logs and profiling, and supports access-control-aware search so agents retrieve only authorized assets.
- Data Agent Kit — a portable suite of skills, tools, and plugins for VS Code, Gemini CLI, Codex, and Claude Code. Includes Data Engineering Agent (GA), Data Science Agent (GA), and Database Observability Agent (preview).
- Cross-cloud lakehouse — Cross-Cloud Interconnect integration, Apache Iceberg REST Catalog, bi-directional federation (preview), Spanner Omni (preview), and Lakehouse federation for AlloyDB (preview). Lightning Engine for Apache Spark delivers 2x price-performance over proprietary alternatives.
The operational takeaway: before deploying agents at scale, you need a data governance layer that understands business context and enforces access policies. Knowledge Catalog is Google’s answer to that requirement. Without it, agents either hallucinate from ungoverned data or get locked out of the context they need to be useful.
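The retrieval-time half of access-control-aware search is conceptually simple: filter candidate assets by the calling agent's identity before ranking. A toy sketch with hypothetical asset and principal names (Knowledge Catalog's actual enforcement model is not public):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Asset:
    name: str
    allowed_principals: frozenset

def authorized_assets(assets, agent_principal: str):
    """Reduce a candidate result set to assets the calling agent's
    identity is entitled to see, before any ranking or retrieval."""
    return [a for a in assets if agent_principal in a.allowed_principals]

catalog = [
    Asset("sales_forecast", frozenset({"agent:forecaster"})),
    Asset("hr_salaries", frozenset({"agent:hr-bot"})),
]
print([a.name for a in authorized_assets(catalog, "agent:forecaster")])
# → ['sales_forecast']
```

The design point worth noting: filtering before retrieval means an unauthorized asset never enters the agent's context at all, which is a stronger guarantee than redacting after the fact.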
GKE Agent Sandbox and Cross-Cloud Compute Updates
For teams running AI agents on Kubernetes, GKE Agent Sandbox is the most directly relevant infrastructure announcement. It uses gVisor-based sandbox isolation, launches up to 300 sandboxes per second per cluster, and delivers up to 30% better price-performance than competitors when running on Google Axion N4A instances.
The cross-cloud infrastructure updates also include:
- C4N VMs — processing up to 95 million packets per second, targeted at security appliances, streaming media, and open-source databases
- M4N with Hyperdisk Extreme — 26.57 GB RAM per vCPU, targeting large data I/O from agents and analytics. Google claims over 20% Oracle workload TCO reduction compared with leading hyperscale clouds
- Cloud Network Insights — hybrid and multicloud observability
- Enhanced Cloud Next Generation Firewall and Cloud Armor
- Gemini on Google Distributed Cloud — for sovereign and edge deployments
Implementation Playbook: Evaluating Agent Platform for Your Stack
Here is a practical evaluation sequence for teams considering the migration from Vertex AI to Agent Platform:
- Audit your current Vertex AI usage. Catalog every model endpoint, pipeline, and custom deployment. Identify which services map to Agent Platform components and which will require migration paths.
- Map agent workloads to the four pillars. For each agent, determine: build requirements (Studio vs ADK), scaling needs (stateful vs stateless, session duration), governance requirements (data sensitivity, compliance boundaries), and optimization metrics (latency targets, accuracy thresholds).
- Evaluate multi-model routing. If you are currently managing separate endpoints for different models, quantify the operational overhead. Agent Platform’s unified gateway may reduce provisioning complexity, but you need to benchmark latency and routing overhead for your specific traffic patterns.
- Test Agent Runtime with your longest-running workloads. If your agents maintain state across multi-day sessions, validate that Memory Bank performance and reliability meet your requirements under production load.
- Implement Agent Identity and Gateway before scaling. Do not wait until you have hundreds of agents in production to establish governance. Deploy Agent Registry and Agent Gateway early, even if your initial agent count is small. Retroactive governance is always more expensive than proactive governance.
- Set up Agent Evaluation and Observability from day one. Continuous scoring against live traffic is the only reliable way to catch degradation in agent reasoning quality. Configure multi-turn autoraters and execution trace logging before you scale.
- Plan your silicon strategy. If you are running training workloads, evaluate TPU 8t availability windows and pricing. If inference is your bottleneck, benchmark TPU 8i against your current GPU-based serving infrastructure. The 80% performance-per-dollar improvement claim needs validation against your specific model architecture and traffic patterns.
- Align your data layer with Agentic Data Cloud. Deploy Knowledge Catalog to map your data estate. If agents need access to sensitive data, configure access-control-aware search before connecting agents to data sources.
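Several of the steps above hinge on validating Google's performance-per-dollar claims against your own workload rather than taking the 80% figure on faith. A minimal comparison helper — all numbers below are placeholders you would replace with measured throughput and your negotiated pricing:

```python
def perf_per_dollar(tokens_per_second: float, dollars_per_hour: float) -> float:
    """Tokens served per dollar — normalize before comparing accelerators."""
    return tokens_per_second * 3600 / dollars_per_hour

def relative_improvement(candidate: float, baseline: float) -> float:
    """Fractional improvement of candidate over baseline (0.8 == 80% better)."""
    return candidate / baseline - 1

# Placeholder numbers: measure these on your own model and traffic.
baseline = perf_per_dollar(tokens_per_second=12_000, dollars_per_hour=32.0)
candidate = perf_per_dollar(tokens_per_second=18_000, dollars_per_hour=27.0)
print(f"{relative_improvement(candidate, baseline):.0%} better perf/$")
# → 78% better perf/$
```

Run the same calculation at your real batch sizes and latency SLOs; perf/$ claims measured at a vendor's favorite operating point rarely survive contact with production traffic unchanged.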
Cost and Operational Trade-Offs
The headline numbers from Google are compelling — 80% better inference performance-per-dollar, 3x training compute per pod, 30% better GKE Agent Sandbox price-performance. But the real cost picture depends on your specific deployment pattern.
Where the savings are real:
- Consolidated model routing through Agent Gateway eliminates the need for separate endpoint management and custom load balancing
- TPU 8i’s on-chip SRAM reduces memory bandwidth bottlenecks that force over-provisioning
- GKE Agent Sandbox’s high sandbox density (300/sec/cluster) reduces per-agent isolation overhead
- Agent Optimizer automates refinement that would otherwise require manual log analysis
Where to be cautious:
- Vendor lock-in increases significantly. Agent Platform’s integrated lifecycle management means migration costs compound over time
- TPU pricing and availability have historically been less predictable than GPU spot instances
- Agent Identity and Governance capabilities are new — maturity and edge cases are unproven at scale
- The Agentic Data Cloud lakehouse features (bi-directional federation, Spanner Omni) are still in preview
Frequently Asked Questions
What happens to existing Vertex AI deployments?
All Vertex AI services continue to work. Future roadmap evolutions and new features will ship exclusively through Agent Platform. Google has not announced a deprecation timeline for standalone Vertex AI, but the strategic direction is clear. Plan your migration on your own timeline, but start planning now.
Can I use non-Google models in Agent Platform?
Yes. Model Garden provides access to over 200 models including Anthropic Claude Opus, Sonnet, and Haiku. The platform is designed for multi-model architectures where different tasks route to different models.
How does Agent Platform compare to AWS Bedrock or Azure AI Foundry?
Agent Platform’s differentiator is the depth of its agent lifecycle management — identity, governance, observability, and simulation are first-class platform components rather than bolt-on services. AWS Bedrock offers broader model marketplace access but less integrated agent governance. Azure AI Foundry has stronger copilot integration with Microsoft 365 but less flexibility for custom agent architectures. The right choice depends on whether your priority is governance depth (Google), model breadth (AWS), or enterprise integration (Azure).
Is TPU 8i available now?
No. Both TPU 8t and TPU 8i are announced as “coming soon” with general availability expected later in 2026. Interested customers can request information through Google’s TPU interest form. Current TPU generations remain available.
What is the minimum viable setup for Agent Platform?
You can start with Agent Studio for low-code agent construction and Agent Runtime for deployment. Add Agent Gateway and Agent Identity when you need governance. The platform is modular — you do not need to adopt every component simultaneously.
Sources
- Google Cloud Blog — Introducing Gemini Enterprise Agent Platform (April 2026)
- Google Blog — Eighth-Generation TPUs for the Agentic Era (April 2026)
- Google Cloud Blog — Introducing Virgo Network Megascale Data Center Fabric (April 2026)
- Virtualization Review — Google Cloud Next ’26: Gemini Enterprise Agent Platform Leads AI-Centric News (April 24, 2026)
- Google Cloud Blog — AI Infrastructure at Next ’26 (April 2026)



