How I Build Production AI Agent Platforms on GCP — Google ADK, Vertex Agent Engine & Vertex AI
TL;DR: A production AI agent platform on GCP is a layered system: Google ADK for agent logic, Vertex Agent Engine as the managed runtime, FastMCP tool servers on Cloud Run, and keyless access to GCP APIs via Workload Identity Federation.
The prototype-to-production gap in AI agents is wider than most CTOs expect. I’ve seen teams spin up a Gemini-powered chatbot in an afternoon and assume that production is six weeks away. Then reality hits: How do you secure agent-to-API access without static keys? How do you observe what a multi-agent system is actually doing? How do you deploy agent updates without breaking running sessions? How do you make the whole thing reproducible across environments? These are platform engineering problems, not AI problems — and they’re the ones nobody talks about in the demos.
This post is my architectural playbook for building AI agent platforms on GCP using Google ADK, Vertex Agent Engine, and Vertex AI. It’s written for CTOs and platform engineers who are past the prototype stage and need to understand what production actually looks like.
The Stack I Use — And Why
Before getting into architecture decisions, it’s worth being clear about what each layer of the stack does and why I’ve settled on this combination.
Google Agent Development Kit (ADK) is Google’s open-source Python framework for building multi-agent systems. It gives you a structured way to define agent behaviour, tool use, memory, and inter-agent communication — and it’s designed to deploy natively to Vertex Agent Engine. The key thing ADK gives you that a raw LLM API call doesn’t is a composable agent model: you can build a coordinator agent that delegates to specialised sub-agents, each with its own tool set and scope.
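To make that composable model concrete, here’s a minimal sketch of the coordinator/sub-agent shape in ADK. The agent names, instructions, and model string are my illustrations, not from any particular production system; assume the google-adk package is installed.

```python
from google.adk.agents import LlmAgent

# A scoped sub-agent: its own instruction and, in a real system, its own
# tool set and service account. Tools are plain Python functions (omitted here).
reporting_agent = LlmAgent(
    name="reporting_agent",
    model="gemini-2.0-flash",  # illustrative model string
    description="Produces read-only summaries of platform data.",
    instruction="Answer reporting questions concisely. Never mutate state.",
)

# The coordinator holds no tools itself: it routes intent, using each
# sub-agent's description to decide where to delegate.
coordinator = LlmAgent(
    name="coordinator",
    model="gemini-2.0-flash",
    instruction="Delegate each user request to the most suitable sub-agent.",
    sub_agents=[reporting_agent],
)
```

The description field matters more than it looks: it’s what the coordinator’s model reads when deciding which sub-agent should handle a request.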
Vertex Agent Engine is the managed GCP runtime for ADK agents. It handles deployment, scaling, session management, and execution — no infrastructure to stand up or manage. For most use cases this is the right deployment target. For teams that need more control over the runtime environment, Cloud Run is the alternative — same ADK framework, more operational flexibility.
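Deployment to Agent Engine is deliberately thin. A minimal sketch using the Vertex AI SDK, assuming the coordinator agent from the sketch above; the project, region, and staging bucket are placeholders:

```python
import vertexai
from vertexai import agent_engines
from vertexai.preview import reasoning_engines

vertexai.init(
    project="my-project",                     # placeholder
    location="us-central1",
    staging_bucket="gs://my-staging-bucket",  # placeholder
)

# Wrap the ADK agent for the managed runtime, with tracing enabled.
app = reasoning_engines.AdkApp(agent=coordinator, enable_tracing=True)

remote_agent = agent_engines.create(
    agent_engine=app,
    requirements=["google-cloud-aiplatform[adk,agent_engines]"],
)
print(remote_agent.resource_name)  # projects/.../reasoningEngines/...
```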
Vertex AI provides the model layer — Gemini models via the Gemini API, plus Vertex AI’s broader MLOps capabilities for teams running fine-tuned or custom models alongside their agents.
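ADK makes the model calls for you, but it’s worth seeing the layer underneath. Calling Gemini directly through the Vertex AI SDK looks like this; the project and model version are placeholders:

```python
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="my-project", location="us-central1")  # placeholders

model = GenerativeModel("gemini-1.5-pro")  # illustrative model version
response = model.generate_content("Summarise last month's main cost drivers.")
print(response.text)
```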
MCP Servers (FastMCP) sit between the agents and the systems they need to interact with. The Model Context Protocol gives agents a standardised way to call tools — whether that’s a GCP API, an internal database, or an external service. I build custom FastMCP servers that expose your GCP environment — cost data, infrastructure state, deployment status, monitoring signals — as structured tools agents can reason over and act on.
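A minimal FastMCP server sketch, exposing one read-only infrastructure tool. The server name and tool are illustrative, not from one of my actual servers; the Compute Engine call authenticates via Application Default Credentials, so the keyless model described below applies here too.

```python
from fastmcp import FastMCP

mcp = FastMCP("gcp-platform-tools")  # illustrative server name

@mcp.tool()
def list_running_instances(project_id: str, zone: str) -> list[dict]:
    """Return name and machine type for RUNNING Compute Engine instances."""
    from google.cloud import compute_v1  # authenticates via ADC, no key file

    client = compute_v1.InstancesClient()
    return [
        {"name": i.name, "machine_type": i.machine_type.rsplit("/", 1)[-1]}
        for i in client.list(project=project_id, zone=zone)
        if i.status == "RUNNING"
    ]

if __name__ == "__main__":
    mcp.run()  # stdio by default; use an HTTP transport when serving on Cloud Run
```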
At the base: GCP APIs accessed via Workload Identity Federation. No static service account keys anywhere in the stack. This is non-negotiable.
The Architecture: How the Layers Connect
A production AI agent platform on GCP isn’t a single service — it’s a layered system. Here’s how I structure it:
User / Trigger Layer
↓ Coordinator Agent (ADK) — running on Vertex Agent Engine
↓ Sub-Agents (ADK) — specialised, scoped, independently deployable
↓ MCP Tool Servers (FastMCP) — on Cloud Run
↓ GCP APIs + Internal Systems — accessed via WIF, private networking
The coordinator/sub-agent pattern is the most important architectural decision. A single agent trying to handle everything becomes brittle and hard to observe. Breaking the system into a coordinator that routes intent, and sub-agents that execute specific tasks, gives you isolation, independent scaling, and a clear boundary for debugging when something goes wrong.
In a FinOps agent I’m designing, for example:
- The coordinator receives a natural language query (“show me what drove last month’s cost spike”)
- It routes to a Billing sub-agent that queries Cloud Billing export in BigQuery
- And a Resource sub-agent that queries current infrastructure state
- Results flow back to the coordinator, which synthesises and responds
Each sub-agent has its own IAM binding — the Billing sub-agent can only read billing data, the Resource sub-agent can only read infrastructure state. Least-privilege enforced at the agent level.
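Here’s a sketch of what the Billing sub-agent’s tool and wiring might look like, assuming the standard Cloud Billing BigQuery export schema. The table ID is a placeholder and the function is my illustration, not code from the actual engagement:

```python
from google.adk.agents import LlmAgent
from google.cloud import bigquery

def cost_by_service(invoice_month: str) -> list[dict]:
    """Total cost per service for one invoice month, e.g. '202501'."""
    client = bigquery.Client()  # ADC/WIF: no key file
    sql = """
        SELECT service.description AS service, ROUND(SUM(cost), 2) AS total_cost
        FROM `my-project.billing_ds.gcp_billing_export_v1_XXXXXX`  -- placeholder
        WHERE invoice.month = @month
        GROUP BY service
        ORDER BY total_cost DESC
        LIMIT 20
    """
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("month", "STRING", invoice_month)
        ]
    )
    rows = client.query(sql, job_config=job_config).result()
    return [dict(row.items()) for row in rows]

billing_agent = LlmAgent(
    name="billing_agent",
    model="gemini-2.0-flash",  # illustrative
    description="Answers GCP cost questions from the billing export.",
    instruction="Use the billing tools only. State the invoice month you queried.",
    tools=[cost_by_service],
)
```

The sub-agent’s service account needs only BigQuery read access to the billing dataset, which is exactly the least-privilege boundary described above.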
Security — The Part Most Agent Tutorials Skip
Every agent tutorial I’ve seen authenticates to GCP with a service account key file. That’s fine for a local demo. It’s not acceptable for production.
Here’s the security model I apply to every agent platform:
Workload Identity Federation for all GCP API access. ADK agents running on Vertex Agent Engine or Cloud Run authenticate to GCP services using WIF — the same keyless approach I use for GKE workloads and CI/CD pipelines. No JSON key files, no static credentials, no rotation headaches. The WIF case study covers the implementation pattern in detail.
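The payoff in code is that keyless auth is invisible. Client libraries resolve credentials from the runtime through Application Default Credentials, so the same code runs locally (gcloud user credentials) and in production (WIF) with no key file in sight:

```python
import google.auth

# Resolves credentials from the environment: metadata server / WIF on GCP,
# gcloud user credentials locally. No GOOGLE_APPLICATION_CREDENTIALS needed.
credentials, project_id = google.auth.default()
print(f"Authenticated, default project: {project_id}")
```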
Secret Manager for external credentials. Any credential the agent needs to access a non-GCP system — a SaaS API key, a database password — lives in Secret Manager and is injected at runtime via the Secret Manager CSI Driver or environment injection. Never hardcoded, never in source control.
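Fetching such a credential at runtime is a few lines. The secret name and project below are placeholders, and the agent’s service account should hold roles/secretmanager.secretAccessor on that one secret only:

```python
from google.cloud import secretmanager

client = secretmanager.SecretManagerServiceClient()

# Placeholder resource name: scope IAM access to exactly this secret.
name = "projects/my-project/secrets/saas-api-key/versions/latest"
response = client.access_secret_version(request={"name": name})
api_key = response.payload.data.decode("utf-8")
```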
Least-privilege IAM per agent. Each sub-agent’s service account has only the permissions it needs for its specific tool set. A reporting agent doesn’t need write access. An infra-query agent doesn’t need access to customer data. Scope everything.
VPC controls where needed. For agents accessing sensitive data stores — regulated databases, BigQuery datasets with PII — I wrap the relevant GCP services in a VPC Service Perimeter. Agent tool calls that touch regulated data stay inside the perimeter.
Cloud Audit Logs on everything. Every agent action that touches a GCP API generates an audit log entry. This is your compliance evidence and your forensic trail if something goes wrong.
This maps to the S (Security by Design) pillar of the SCALE Framework — the same architectural principles that govern every GCP platform I build apply here. If you want the full platform architecture context, the SCALE Framework hub post is worth reading alongside this one.
Observability — The Hardest Part of Agentic Systems
Traditional application observability is straightforward: a request comes in, you trace it through your services, you measure latency and errors. Agent systems are harder. A single user query might trigger a coordinator agent, three sub-agents, five tool calls, two LLM inference requests, and a database write — across multiple GCP services, with branching logic that varies per run.
What I instrument on every agent platform:
- Tool call traces — which tools were called, in what order, with what inputs and outputs
- Latency per agent hop — where is time being spent? LLM inference, tool execution, network?
- Token consumption per agent — both for cost attribution and for detecting runaway agent loops
- Error rates by agent and tool — which parts of the system are fragile?
- Session completion rates — are agents reaching a useful conclusion or timing out?
I deploy this via Datadog (preferred for teams that already use it) or Google Cloud Operations with custom metrics. The goal is a dashboard where you can look at any agent execution and understand exactly what happened — not just whether it succeeded or failed.
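One lightweight way to get the tool-call traces and per-hop latency from the list above is to wrap every tool function in an OpenTelemetry span before registering it with its agent. A sketch, assuming an OTel exporter (Datadog or Cloud Trace) is configured elsewhere in the process:

```python
import functools
import time

from opentelemetry import trace

tracer = trace.get_tracer("agent-platform")

def traced_tool(fn):
    """Wrap a tool function so every call emits a span with latency and status."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        with tracer.start_as_current_span(f"tool.{fn.__name__}") as span:
            start = time.perf_counter()
            try:
                result = fn(*args, **kwargs)
                span.set_attribute("tool.status", "ok")
                return result
            except Exception:
                span.set_attribute("tool.status", "error")
                raise
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000
                span.set_attribute("tool.latency_ms", round(elapsed_ms, 1))
    return wrapper

# Usage: tools=[traced_tool(cost_by_service)] when constructing the agent.
```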
Deployment — Treating Agents Like Software
One of the patterns I push hard on: agent definitions should be versioned, deployed through CI/CD, and promoted through environments exactly like any other application code.
This means:
- ADK agent code in Git — versioned, reviewed, tested
- Terraform for all infrastructure — Vertex Agent Engine resources, MCP server deployments on Cloud Run, IAM bindings, Secret Manager secrets. Everything reproducible.
- Environment promotion — Dev → Staging → Prod, with agent behaviour validated in staging before production deployment
- Blue-green agent updates — when updating a running agent, I deploy the new version alongside the old and shift traffic gradually. Agent sessions in progress complete on the old version; new sessions start on the new version (sketched below).
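The session-pinning logic behind that blue-green pattern is easy to sketch. Everything here is hypothetical: the resource names are placeholders and the router is something you run in front of the agents, not a built-in Agent Engine feature.

```python
import random

# Hypothetical: two deployed versions of the same agent.
AGENT_VERSIONS = {
    "stable": "projects/my-project/locations/us-central1/reasoningEngines/111",
    "canary": "projects/my-project/locations/us-central1/reasoningEngines/222",
}
CANARY_FRACTION = 0.10  # share of *new* sessions sent to the new version

_session_pins: dict[str, str] = {}  # production: a shared store, not memory

def route_session(session_id: str) -> str:
    """Pin each session to one agent version for its whole lifetime.

    In-flight sessions keep their original version; only new sessions are
    split by weight, so a rollout never breaks a running conversation.
    """
    if session_id not in _session_pins:
        _session_pins[session_id] = (
            "canary" if random.random() < CANARY_FRACTION else "stable"
        )
    return AGENT_VERSIONS[_session_pins[session_id]]
```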
This is the A (Automation/IaC) pillar of SCALE applied to AI infrastructure. An agent platform that can only be deployed by the person who built it isn’t a platform — it’s a prototype that happened to make it to production.
What an Engagement Looks Like
Most AI agent engagements I run start with a 2-week architecture sprint — no code, just decisions. Agent topology, tool set design, data access model, IAM structure, observability requirements, deployment architecture. Getting these right before writing production code saves months.
From there: MCP server implementation, ADK agent development, Vertex Agent Engine deployment, observability instrumentation, Terraform build-out.
Typical timeline for a first production agent: 4–6 weeks from architecture sprint to deployed, observable, Terraform-managed agent running on GCP.
The Foundation Requirement
One thing I want to be direct about: AI agent infrastructure is not a greenfield concern. It lands on your existing GCP platform — your IAM model, your network topology, your security controls. If the foundation isn’t right, the agent platform inherits those problems.
The GCP Landing Zone Blueprint covers what that foundation needs to look like. If you’re building an agent platform on a “default” GCP setup with no org hierarchy, no VPC controls, and no IaC — the agent work needs to wait two weeks while we fix the foundation.
Related reading:
- How I Think About GCP Platform Architecture — The SCALE Framework — the architectural methodology this post applies to AI agent platforms
- GCP Landing Zone Blueprint — the secure foundation agent workloads run on
- Migrating to Keyless GCP Auth: WIF Case Study — the identity model that secures agent-to-GCP API access
- 4 Ways to Inject GCP Secrets into GKE — secrets injection patterns applicable to agent workloads
- MLOps & GenAI Platforms on GCP — how I engage for AI platform work
Building an AI Agent Platform on GCP? Let’s Talk Architecture First.
If you’re moving from prototype to production and need the platform engineering layer (security, observability, IaC, deployment model) done properly, that’s the conversation I’m here for.