MLOps & GenAI Platform Infrastructure on GCP
I help engineering and ML teams design the infrastructure and platform layer that machine learning and generative AI workloads depend on to run reliably in production on Google Cloud. My focus is not data science — it’s the underlying GCP architecture: the GKE clusters and Vertex AI pipelines that train and serve models, the BigQuery and GCS data infrastructure that feeds them, the Terraform foundations that make environments reproducible, and the DevSecOps pipelines that get models from experiment to production without manual steps or security gaps.
I’m Amit Malhotra, a Principal GCP Architect based in Toronto with 20+ years in IT and 6+ years hands-on with Google Cloud, Terraform, GKE, Vertex AI, and BigQuery. I design ML and GenAI infrastructure for teams who have the ML expertise and need the platform engineering expertise to match — so their ML engineers can focus on models rather than infrastructure.
Every MLOps and GenAI platform engagement I run is guided by the SCALE Framework — in particular the Security by Design pillar (protecting training data and model artifacts), Automation with Terraform (reproducible compute environments), and Elastic Scalability (infrastructure that scales with training and inference demand without manual intervention).
What I Typically Work On
The MLOps and GenAI Infrastructure Work I Do
ML Platform Design on Google Cloud: I design the end-to-end ML platform architecture — Vertex AI pipeline infrastructure, GKE compute for custom training, BigQuery and GCS data layer design, model registry strategy, and the serving infrastructure for training and inference workloads. The platform is designed so ML engineers can run experiments and ship models without infrastructure tickets.
Training and Inference Infrastructure: I design and implement the GCP compute infrastructure for model training and serving — GPU and TPU node pools on GKE with autoscaling, Vertex AI Training job configuration, Spot VM strategy for training cost reduction, and inference serving on Cloud Run or GKE with autoscaling and health checks configured for ML workload patterns.
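As a rough illustration of what that looks like in Terraform, here is a minimal sketch of an autoscaling Spot GPU node pool. All names, the cluster reference, the region, and the sizing are placeholder assumptions, not values from a real engagement:

```hcl
# Sketch: autoscaling GPU node pool for training workloads.
# Cluster reference, pool name, region, and sizes are placeholders.
resource "google_container_node_pool" "training_gpu" {
  name     = "training-gpu-pool"
  cluster  = google_container_cluster.ml.id
  location = "us-central1"

  autoscaling {
    min_node_count = 0   # scale to zero when no training jobs are running
    max_node_count = 8
  }

  node_config {
    machine_type = "n1-standard-8"
    spot         = true  # Spot VMs cut training cost; jobs must tolerate preemption

    guest_accelerator {
      type  = "nvidia-tesla-t4"
      count = 1
    }

    # Taint so only workloads that explicitly request GPUs schedule here
    taint {
      key    = "nvidia.com/gpu"
      value  = "present"
      effect = "NO_SCHEDULE"
    }
  }
}
```

The min-zero autoscaling plus Spot pricing is what keeps idle training capacity from costing anything; the taint keeps general workloads off the expensive nodes.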
Secure GenAI Service Deployment: I design the infrastructure for deploying GenAI services to production on GCP — API layers on Cloud Run or GKE, Vertex AI Model Garden and Gemini API integration with proper access controls, rate limiting and quota management, request/response logging for auditability, and cost controls that prevent runaway inference spend.
CI/CD for ML Workflows: I design and implement MLOps CI/CD pipelines — automated workflows for model training, evaluation against defined quality thresholds, artifact signing, and deployment to serving infrastructure. Integrated with DevSecOps practices so model deployments go through security and quality gates, not manual uploads.
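A pipeline like this is typically triggered from source control. As a hedged sketch, a Cloud Build trigger wired to a repository might look like the following; the repository owner, name, and pipeline file are hypothetical:

```hcl
# Sketch: trigger the train -> evaluate -> sign -> deploy pipeline on push to main.
# Repository owner/name and the cloudbuild.yaml contents are placeholders.
resource "google_cloudbuild_trigger" "model_ci" {
  name     = "train-eval-deploy"
  filename = "cloudbuild.yaml"

  github {
    owner = "example-org"
    name  = "ml-pipelines"
    push {
      branch = "^main$"
    }
  }
}
```

The point of routing deployments through a trigger like this, rather than manual uploads, is that the evaluation and signing gates in the pipeline file cannot be skipped.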
Environment Isolation and Access Control: Separate GCP projects for ML development, staging, and production environments — with IAM boundaries between environments, Workload Identity Federation for all Vertex AI and GCS access, and VPC Service Controls preventing training data from moving outside the defined security perimeter.
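The VPC Service Controls piece of that can be sketched in Terraform as a perimeter around the production ML project. The access policy ID, project number, and perimeter name below are placeholder variables:

```hcl
# Sketch: VPC-SC perimeter keeping training data inside the production project.
# var.access_policy_id and var.prod_project_number are assumed inputs.
resource "google_access_context_manager_service_perimeter" "ml_prod" {
  parent = "accessPolicies/${var.access_policy_id}"
  name   = "accessPolicies/${var.access_policy_id}/servicePerimeters/ml_prod"
  title  = "ml-prod-perimeter"

  status {
    resources = ["projects/${var.prod_project_number}"]
    restricted_services = [
      "storage.googleapis.com",     # GCS training data
      "bigquery.googleapis.com",    # feature and label tables
      "aiplatform.googleapis.com",  # Vertex AI
    ]
  }
}
```

With the perimeter in place, even a credential with broad GCS permissions cannot copy training data to a bucket outside the listed project.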
Cost and Resource Governance: GCP budget alerts, per-team and per-experiment resource quotas, GKE node pool autoscaling configuration, Spot VM strategy for training jobs, and BigQuery cost controls — so ML infrastructure costs are predictable and visible rather than a surprise at month-end.
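The budget-alert part of this is straightforward to express in Terraform. The billing account, project number, currency, and amount below are illustrative placeholders:

```hcl
# Sketch: monthly budget with alerts at 50%, 90%, and forecasted 100% of spend.
# var.billing_account_id and var.ml_project_number are assumed inputs.
resource "google_billing_budget" "ml_platform" {
  billing_account = var.billing_account_id
  display_name    = "ml-platform-monthly"

  budget_filter {
    projects = ["projects/${var.ml_project_number}"]
  }

  amount {
    specified_amount {
      currency_code = "CAD"
      units         = "5000"  # placeholder monthly limit
    }
  }

  threshold_rules {
    threshold_percent = 0.5
  }
  threshold_rules {
    threshold_percent = 0.9
  }
  threshold_rules {
    threshold_percent = 1.0
    spend_basis       = "FORECASTED_SPEND"  # warn before the overrun happens
  }
}
```

The forecasted-spend rule is the one that prevents the month-end surprise: it fires when the trajectory, not the current total, crosses the limit.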
GenAI Infrastructure Patterns
Infrastructure Patterns for GenAI Services on GCP
GenAI systems have a different infrastructure profile from traditional ML models — typically lighter training infrastructure, since foundation models are often consumed via API rather than trained in-house, but more complex serving, governance, and cost management challenges. The patterns I design for GenAI services on GCP:
API-Based GenAI Services: GenAI capabilities exposed through well-defined API layers on Cloud Run or GKE — with versioning, rate limiting, and request validation built into the service layer so downstream applications have a stable, managed interface to the GenAI capability rather than direct model access.
Secure Access to Foundation Models: Vertex AI Model Garden and Gemini API access controlled through IAM — least-privilege service accounts, Workload Identity Federation for application authentication, and API quotas configured to prevent runaway costs from unexpected traffic spikes.
Inference Layers on Cloud Run and GKE: For custom model serving or RAG pipeline serving — Cloud Run for stateless inference APIs with scale-to-zero cost efficiency, GKE for workloads that need persistent connections, GPU access, or more control over the serving environment. Autoscaling configured for the bursty traffic patterns typical of GenAI services.
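The Cloud Run side of that pattern can be sketched as a v2 service with scale-to-zero and a hard instance cap. The service name, region, image, and resource limits are placeholders:

```hcl
# Sketch: stateless inference API with scale-to-zero and a spend-bounding cap.
# Name, region, image, and limits are placeholder assumptions.
resource "google_cloud_run_v2_service" "inference" {
  name     = "genai-inference"
  location = "northamerica-northeast2"

  template {
    scaling {
      min_instance_count = 0   # no cost between traffic bursts
      max_instance_count = 20  # cap bounds worst-case inference spend
    }

    containers {
      image = "us-docker.pkg.dev/example/inference/api:latest"

      resources {
        limits = {
          cpu    = "2"
          memory = "4Gi"
        }
      }
    }
  }
}
```

The max-instance cap is doing double duty here: it absorbs bursty GenAI traffic while putting a ceiling on what a traffic spike can cost.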
Integration with Internal Systems: Secure integration patterns for connecting GenAI services to internal data sources — VPC-private connectivity to databases and APIs, Secret Manager for credentials, and audit logging for every external data access that feeds the GenAI system.
Rate Limiting, Auditing, and Governance: Per-user and per-application rate limiting at the API layer, request and response logging for auditability and compliance, content filtering integration where required, and usage dashboards that show who is using the GenAI service and at what volume.
Cost Controls and Usage Tracking: GCP budget alerts for Vertex AI and Gemini API spend, per-application cost attribution through resource labels, and autoscaling configuration that prevents serving infrastructure from over-provisioning in response to traffic spikes.
Security & Governance for AI
AI Platforms Introduce Security Risks That Standard GCP Security Doesn’t Cover
AI and ML platforms have a set of security and governance challenges that go beyond standard GCP security practices — and that require intentional design decisions at the platform level. Combined with the broader DevSecOps & Cloud Security service, I address these AI-specific risks:
Data Leakage Prevention: VPC Service Controls perimeters around training data and model artifacts, GCS bucket IAM policies scoped to the minimum required access, BigQuery column-level security for sensitive training data fields, and audit logging for all data access — so training data can’t leave the defined security boundary through misconfigured services or compromised credentials.
Model Access Control: Vertex AI model and endpoint IAM policies scoped to authorised applications and users, Workload Identity Federation for all programmatic model access, and model artifact storage in Artifact Registry with access controls — so model weights and configurations are treated as sensitive assets rather than shared files.
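As a minimal sketch of that access model in Terraform: a dedicated service account with only the Vertex AI role it needs, bound to a specific GKE Kubernetes service account via Workload Identity. The account IDs, namespace, and role choice are placeholder assumptions:

```hcl
# Sketch: least-privilege service account for a GenAI application.
# Account ID, project variable, namespace, and KSA name are placeholders;
# the right role depends on what the workload actually calls.
resource "google_service_account" "genai_app" {
  account_id   = "genai-app"
  display_name = "GenAI application"
}

resource "google_project_iam_member" "predict_only" {
  project = var.project_id
  role    = "roles/aiplatform.user"
  member  = "serviceAccount:${google_service_account.genai_app.email}"
}

# Allow one specific Kubernetes service account to impersonate it
# (Workload Identity), so no key files exist to leak.
resource "google_service_account_iam_member" "wi_binding" {
  service_account_id = google_service_account.genai_app.name
  role               = "roles/iam.workloadIdentityUser"
  member             = "serviceAccount:${var.project_id}.svc.id.goog[genai/genai-app-ksa]"
}
```

The binding is scoped to one namespace/service-account pair, so compromise of another workload in the cluster does not grant model access.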
Prompt and Output Governance: Request and response logging for all GenAI API calls, content filtering integration where compliance requires it, and prompt injection detection at the API layer — with logs retained in Cloud Logging for the duration required by your compliance framework.
Compliance and Auditability: For regulated industries — CMEK encryption for training data and model artifacts, Cloud Audit Logs for all Vertex AI operations, data residency controls enforcing that training data stays within defined GCP regions, and access audit trails that satisfy SOC 2, OSFI, or PIPEDA requirements.
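The CMEK and residency pieces can be sketched together: a regional KMS key with rotation, and an artifact bucket pinned to the same region and encrypted with that key. Names, region, and rotation period are placeholders:

```hcl
# Sketch: CMEK-encrypted, region-pinned bucket for model artifacts.
# Key ring, key, bucket names, region, and rotation period are placeholders.
resource "google_kms_key_ring" "ml" {
  name     = "ml-artifacts"
  location = "northamerica-northeast1"  # keys stay in-region for residency
}

resource "google_kms_crypto_key" "artifacts" {
  name            = "model-artifacts"
  key_ring        = google_kms_key_ring.ml.id
  rotation_period = "7776000s"  # 90 days
}

resource "google_storage_bucket" "model_artifacts" {
  name     = "example-model-artifacts"  # bucket names are globally unique
  location = "NORTHAMERICA-NORTHEAST1"  # data residency: single Canadian region

  encryption {
    default_kms_key_name = google_kms_crypto_key.artifacts.id
  }

  uniform_bucket_level_access = true
}
```

One caveat worth noting: the GCS service agent must be granted encrypt/decrypt on the key before the bucket can use it, which is an additional IAM binding not shown here.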
Who This Is For
LET’S TALK
Building ML or GenAI Systems on GCP? Let’s Talk About the Infrastructure.
Getting ML models from experiment to production is almost entirely a platform engineering problem — and it’s one I’ve solved before. I start with a free 30-minute architecture review: a direct conversation about your ML workloads, your current GCP setup, and what the infrastructure needs to look like to support your AI systems reliably and securely in production. You work directly with me, Amit Malhotra, throughout — no account layer, no hand-offs.
Speak Directly With Amit Malhotra
Operating From
Based in Toronto (EST), working with engineering teams across Canada & USA
Ready to Architect Your Future on Google Cloud?
Speak directly with me — a Principal Cloud Architect — about your GCP architecture, security, platform engineering, or MLOps goals. I typically respond within one business day.
✓ Free 30-minute call ✓ No proposal, no pressure ✓ Responds within one business day
Get In Touch
Trusted Technical Advisor
Amit works as a true architecture partner, not just a consultant. He focuses on making the right decisions early and designing systems that remain maintainable as they scale. His guidance helped us avoid costly redesigns and establish a solid cloud foundation from the start.
- Kanishk P, Binoloop Inc
Architecture leadership
Amit helped us redesign our Google Cloud architecture to support rapid growth without increasing operational complexity. His ability to simplify difficult architectural decisions and design scalable platform foundations had an immediate impact on our engineering velocity and system reliability.
- Rohit Kulkarni, Cascade Cloud Inc.
Platform engineering & DevSecOps
We engaged Amit to build a secure and scalable platform on Google Cloud with Terraform, Cloud Run, Kong API gateway and automated CI/CD. He brought deep hands-on expertise and designed everything with long-term operability in mind. Our deployment process is now significantly more reliable and secure.
- Hema Kumar, Pemvish.com