7 Ways to Reduce GCP Cost — Real Examples From Production Environments
Cloud cost overruns are rarely caused by one big mistake. They’re the result of a dozen small decisions — made quickly during a build, never revisited — that compound over time into a bill that surprises everyone.
I work with engineering teams across Canada and the USA who bring me in specifically to reduce GCP spend. Not by cutting capability, but by fixing the architecture decisions that were never designed with cost in mind. These are the seven patterns I find most consistently, and what I do to fix them.
1. Oversized Node Pools Running 24/7
This is the single highest-impact finding in almost every GCP cost review I run. A team provisions a GKE node pool sized for peak load — or for the workload they expect to have in six months — and it runs at that size continuously, regardless of actual demand.
In one engagement with a SaaS startup I worked with, the production GKE cluster was running six n2-standard-8 nodes around the clock. Actual average CPU utilisation across the cluster was under 15%. The team had disabled cluster autoscaler because a previous scaling event had caused a brief outage, and no one had re-enabled it.
The fix: re-enable cluster autoscaler with appropriate min/max bounds, configure Horizontal Pod Autoscaler for all deployments, and right-size the base node pool using Vertical Pod Autoscaler recommendations. Non-production environments moved to scale-to-zero overnight. Monthly saving on compute alone was material.
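The autoscaler changes above can be applied with a couple of gcloud commands. This is a sketch; the cluster, pool, and region names and the min/max bounds are placeholders you would replace with your own:

```shell
# Re-enable the cluster autoscaler on the production pool with explicit
# bounds, instead of running a fixed-size pool sized for peak load.
gcloud container clusters update prod-cluster \
  --region=northamerica-northeast1 \
  --node-pool=default-pool \
  --enable-autoscaling \
  --min-nodes=2 --max-nodes=6

# Non-production pools can scale to zero when idle.
gcloud container clusters update staging-cluster \
  --region=northamerica-northeast1 \
  --node-pool=default-pool \
  --enable-autoscaling \
  --min-nodes=0 --max-nodes=3

# Pair node autoscaling with a Horizontal Pod Autoscaler per deployment.
kubectl autoscale deployment web --cpu-percent=70 --min=2 --max=10
```

The min bound is what protects you from a repeat of the outage that scared the team off autoscaling in the first place: set it to the known-safe baseline and let the autoscaler work above it.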
The deeper fix: node pool sizing decisions should be revisited on a quarterly cadence, not set-and-forgotten at launch. This is the Lifecycle Ops pillar of the SCALE Framework (https://buoyantcloudtech.com/scale-framework-gcp-architecture/) — the platform needs active management, not just initial provisioning.
2. Cloud NAT Egress on Workloads That Should Use Private Google Access
Cloud NAT is necessary for workloads that need outbound internet access. It is not necessary for workloads that only need to reach GCP services — Cloud Storage, BigQuery, Pub/Sub, Cloud SQL. Those workloads should use Private Google Access, which routes traffic internally without NAT charges.
I see this consistently in environments where the network was set up quickly — Private Google Access was not enabled on subnets, so all traffic to GCP APIs routes via NAT. On high-throughput workloads, NAT egress charges add up fast.
The fix is straightforward: enable Private Google Access on subnets used by workloads that only communicate with GCP APIs. Disable NAT for those workloads. For workloads that genuinely need internet egress, keep NAT in place. This is a network configuration change, not an architectural change, and it has zero impact on application behaviour.
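Checking and enabling Private Google Access is a one-line subnet update. Subnet and region names below are placeholders:

```shell
# Check whether Private Google Access is already enabled on the subnet.
gcloud compute networks subnets describe app-subnet \
  --region=northamerica-northeast1 \
  --format="value(privateIpGoogleAccess)"

# Enable it: instances without external IPs on this subnet can then reach
# Google APIs (Cloud Storage, BigQuery, Pub/Sub) without traversing Cloud NAT.
gcloud compute networks subnets update app-subnet \
  --region=northamerica-northeast1 \
  --enable-private-ip-google-access
```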
3. Persistent Disks Attached to Deleted VMs
Persistent disks in GCP continue to incur storage charges after the VM they were attached to is deleted — unless the disk is explicitly deleted or the VM was created with auto-delete disk enabled.
In environments with a lot of VM churn — dev and test environments, CI/CD workers, short-lived compute — orphaned persistent disks accumulate quietly. I’ve found environments with dozens of detached persistent disks, some hundreds of GB in size, that had been accruing charges for months.
The fix: audit detached persistent disks via `gcloud compute disks list --filter="NOT users:*"` and delete those that are no longer needed. Going forward, enforce auto-delete on disks for ephemeral VMs, and add a scheduled Cloud Function or workflow to alert on detached disks older than 7 days.
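A slightly fuller version of that audit, including size and (assuming the field is populated for your disks) the last-detach timestamp, looks like this — disk and zone names in the delete step are placeholders:

```shell
# List detached disks with their size and when they were last detached.
gcloud compute disks list \
  --filter="NOT users:*" \
  --format="table(name,zone.basename(),sizeGb,lastDetachTimestamp)"

# Once a disk is confirmed unneeded, delete it.
gcloud compute disks delete old-ci-worker-disk --zone=us-central1-a
```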
4. Cloud Run Services Not Configured for Scale-to-Zero
Cloud Run scales to zero by default — but only if minimum instances is set to zero. Many teams set minimum instances to 1 or higher to avoid cold starts, which is sometimes the right decision for latency-sensitive production services. It is almost never the right decision for staging, development, or internal tooling services.
I regularly find non-production Cloud Run services running with minimum instances set to 1 across multiple environments. Each instance runs continuously. For services that receive traffic only during business hours or during CI pipeline runs, this means paying for compute 24 hours a day for a service that is actually needed for six.
The fix: set minimum instances to 0 on all non-production Cloud Run services. For production services where cold start latency matters, evaluate whether the latency cost is actually user-facing or whether it can be mitigated with startup CPU boost rather than always-on instances. Full detail on Cloud Run architecture at https://buoyantcloudtech.com/gcp-serverless-architecture-cloud-run/.
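Both changes are single gcloud commands. Service and region names are placeholders, and the startup CPU boost flag assumes a recent gcloud release:

```shell
# Let non-production services scale to zero between requests.
gcloud run services update staging-api \
  --region=us-central1 \
  --min-instances=0

# For latency-sensitive production services, try startup CPU boost
# before paying for always-on instances.
gcloud run services update prod-api \
  --region=us-central1 \
  --cpu-boost
```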
5. BigQuery On-Demand Queries Without Cost Controls
BigQuery’s on-demand pricing model charges per byte scanned. For teams running ad hoc queries or poorly optimised analytical workloads, this can generate significant unexpected charges — particularly when a query scans a full large table that could have been filtered or partitioned.
A large Canadian healthcare platform I worked with had a data team running exploratory BigQuery queries against unpartitioned tables. A single analyst running a broad query against a year of event data could scan several terabytes in one query. There were no cost controls in place — no per-user quotas, no query cost estimates enforced before execution.
The fix: enable table partitioning and clustering on large tables to reduce bytes scanned per query, set per-user and per-project BigQuery cost controls via reservation capacity or custom quotas, and enforce the use of `SELECT` with explicit column lists rather than `SELECT *`. For teams with predictable query volumes, BigQuery slot commitments (capacity-based pricing, now sold through BigQuery editions rather than the retired flat-rate model) often reduce cost significantly versus on-demand.
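Two of those controls can be sketched with the `bq` CLI. Project, dataset, table, and column names are placeholders for illustration:

```shell
# Repartition a large event table so date-filtered queries only scan
# the partitions they touch.
bq query --use_legacy_sql=false \
  'CREATE TABLE `my-project.analytics.events_partitioned`
   PARTITION BY DATE(event_ts)
   CLUSTER BY user_id AS
   SELECT * FROM `my-project.analytics.events`'

# Cap bytes billed per query: the query fails fast instead of silently
# scanning terabytes (here, a 1 GB cap).
bq query --use_legacy_sql=false \
  --maximum_bytes_billed=1000000000 \
  'SELECT user_id, event_type
   FROM `my-project.analytics.events_partitioned`
   WHERE DATE(event_ts) BETWEEN "2024-01-01" AND "2024-01-07"'
```

The `--maximum_bytes_billed` cap is the cheapest guardrail to roll out, since it requires no schema change at all.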
6. Logging Export Costs From Verbose Log Sinks
Cloud Logging charges for log ingestion above the free tier, and for logs exported to Cloud Storage or BigQuery. Environments that export all logs — including verbose application debug logs and noisy GKE system logs — to long-retention sinks accumulate logging costs that are easy to miss because they appear as storage and BigQuery charges rather than a “logging” line item.
The fix: audit what is being exported via log sinks and filter out log types that have no compliance or operational value. GKE system component logs, verbose application debug logs, and high-frequency health check logs are the most common candidates for exclusion. Set log-based retention policies aligned to actual compliance requirements — storing 2 years of debug logs because no one set a retention period is a common and expensive mistake.
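Exclusions and retention can both be set from the CLI. The sink name and filter below are illustrative, not a recommendation for any specific environment:

```shell
# Add an exclusion to an existing export sink so health-check noise
# never reaches the long-retention destination.
gcloud logging sinks update archive-sink \
  --add-exclusion=name=drop-healthchecks,filter='httpRequest.requestUrl:"/healthz"'

# Align retention on the default log bucket with actual requirements
# instead of the accidental default.
gcloud logging buckets update _Default \
  --location=global \
  --retention-days=30
```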
7. No Committed Use Discounts on Stable Workloads
GCP offers committed use discounts (CUDs) of up to 57% on Compute Engine and GKE node pools for 1- or 3-year commitments. For workloads with stable, predictable resource requirements — production GKE node pools, Cloud SQL instances, always-on Compute Engine — not having CUDs in place means paying on-demand rates for resources you will definitely use.
I find this most commonly in teams that started small and never revisited their billing model as the platform stabilised. The conversation about commitments was never had because cloud spend felt manageable. By the time it doesn’t feel manageable, you’ve been overpaying for months.
The fix: export billing data to BigQuery, analyse 3 months of usage to identify stable resource consumption, and purchase CUDs for those resources. Resource-based CUDs on Compute Engine are flexible enough that they don’t lock you into specific VM types. Spend-based CUDs provide even more flexibility for mixed workloads. For GKE specifically, combining CUDs with autoscaler means you commit on the baseline and scale above it on-demand — best of both models.
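Once the billing analysis has identified a stable baseline, a resource-based commitment is a single command. The region, plan, and resource amounts below are illustrative:

```shell
# Commit to the stable baseline identified from 3 months of billing data.
# Resource-based commitments cover vCPU and memory, not specific VM types.
gcloud compute commitments create baseline-prod \
  --region=northamerica-northeast1 \
  --plan=12-month \
  --resources=vcpu=16,memory=64GB
```

With an autoscaled GKE pool, size the commitment to the autoscaler's minimum bound: everything above it is billed on-demand only when the cluster actually scales up.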
Where to Start
If you’re looking at a GCP bill that has grown faster than your team expected, the fastest path to understanding where the money is going is a billing export to BigQuery and an hour of analysis. Most environments I review have two or three of the above patterns in place simultaneously — fixing them in combination produces meaningful savings without any reduction in capability.
I run GCP cost and architecture reviews for engineering teams in Toronto, across Canada, and in the USA. The review covers billing analysis, architecture assessment, and a prioritised remediation plan. More about my background and approach: https://buoyantcloudtech.com/about/
FAQ
What is the fastest GCP cost reduction with the least risk?
Enabling cluster autoscaler on oversized GKE node pools and setting non-production Cloud Run services to scale-to-zero. Both changes are reversible, have no impact on production availability, and typically produce immediate savings on the next billing cycle.
How do I find out what is driving my GCP bill?
Export billing data to BigQuery using the billing export feature, then query by service, SKU, and project. The GCP Billing console also provides cost breakdowns by service, but BigQuery gives you the flexibility to identify specific resources and trends over time.
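A typical first query against the billing export breaks cost down by service over the last 30 days. The table name below follows the standard billing export naming convention, with the billing account suffix as a placeholder:

```shell
# Top ten services by cost over the last 30 days, from the billing export.
bq query --use_legacy_sql=false \
  'SELECT service.description AS service,
          ROUND(SUM(cost), 2) AS total_cost
   FROM `my-project.billing.gcp_billing_export_v1_XXXXXX`
   WHERE usage_start_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
   GROUP BY service
   ORDER BY total_cost DESC
   LIMIT 10'
```

Grouping by `project.id` or `sku.description` instead of `service.description` narrows the same query down to specific resources.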
Should I use committed use discounts or sustained use discounts?
Sustained use discounts apply automatically for eligible Compute Engine VMs that run for a significant portion of the month — you don't need to do anything. Committed use discounts require a 1- or 3-year commitment but offer deeper savings. For stable production workloads on Compute Engine, combining both maximises savings. Note that GKE Autopilot does not receive sustained use discounts; its per-pod pricing model is instead covered by spend-based committed use discounts.
How much can a GCP cost review typically save?
It varies significantly by environment. In my experience, teams that have never done a structured cost review typically find 20-35% reduction opportunities in their first review — primarily from right-sizing, scale-to-zero configuration, and CUD adoption. Environments with specific patterns like unpartitioned BigQuery tables or uncapped Cloud NAT can see larger reductions.
Related Reading
– The SCALE Framework: https://buoyantcloudtech.com/scale-framework-gcp-architecture/
– GCP Serverless Architecture with Cloud Run: https://buoyantcloudtech.com/gcp-serverless-architecture-cloud-run/
– GKE Operational Excellence: https://buoyantcloudtech.com/gke-operational-excellence-resilient-workloads/
– Strategic IaC and Terraform on GCP: https://buoyantcloudtech.com/strategic-iac-terraform-gcp-guide/
– GCP Landing Zone Blueprint: https://buoyantcloudtech.com/gcp-landing-zone-blueprint/
– Why Your GKE Cluster Is Costing Too Much: https://buoyantcloudtech.com/why-gke-cluster-costing-too-much/
Book a Free GCP Architecture Review: https://buoyantcloudtech.com/contact-gcp-consulting/