Terraform Mistakes That Are Increasing Your GCP Cloud Bill

Terraform is the right tool for managing GCP infrastructure. It is also a tool that makes it easy to provision resources at scale — which means it makes it equally easy to provision the wrong resources at scale, leave them running, and never notice.

The mistakes I see in Terraform-managed GCP environments are not syntax errors. They are architectural and operational decisions made early in a platform’s life that quietly compound into unnecessary spend. Here is what I find and how to fix it.

Mistake 1 — Hardcoded Machine Types That Were Never Right-Sized

The most common pattern: machine types hardcoded in Terraform with no mechanism for environment-specific overrides. The production machine type — chosen conservatively — gets applied to staging, development, and QA environments because the variable was never parameterised.

A typical example from environments I review: `n2-standard-8` hardcoded across all environments in a GKE node pool module. Production may justify that instance type. Staging running at 10% of production load does not. Development used by two engineers definitely does not.

The fix: parameterise machine types by environment using Terraform variable files or workspace-specific tfvars. Development and staging environments use smaller, cheaper machine types — `e2-standard-2` or `e2-medium` for most workloads — while production uses the type matched to actual load. This single change, applied consistently, typically reduces non-production compute costs by 40-60%.
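One way to structure this is a module variable with a cheap default that production overrides via its tfvars file. A minimal sketch; the variable names, resource names, and file layout are illustrative assumptions, not taken from any specific module:

```hcl
# variables.tf (illustrative names)
variable "cluster_name" {
  type = string
}

variable "machine_type" {
  description = "GKE node machine type, set per environment"
  type        = string
  default     = "e2-standard-2" # cheap default; production overrides it
}

resource "google_container_node_pool" "primary" {
  name    = "primary"
  cluster = var.cluster_name

  node_config {
    machine_type = var.machine_type
  }
}

# prod.tfvars:  machine_type = "n2-standard-8"
# dev.tfvars:   machine_type = "e2-medium"
```

Each environment's pipeline then passes its own file with `terraform apply -var-file=dev.tfvars`, so the module code never encodes an environment-specific choice.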

Mistake 2 — Resources Provisioned by `terraform apply` But Never Destroyed

Terraform makes provisioning fast. It does not automatically clean up resources that are no longer needed. In environments where Terraform is used to spin up feature branch environments, test infrastructure, or temporary workloads, those resources persist until someone explicitly runs `terraform destroy` or removes them from the configuration.

What I find: Cloud SQL instances from feature branch environments that were merged months ago, GCS buckets created for a data migration that completed and were never removed, static external IP addresses allocated and no longer attached to any resource. Each one accrues charges silently.

The fix: implement automated lifecycle management for ephemeral environments. A CI/CD pipeline that creates a feature branch environment should also destroy it on branch merge or after a TTL. Cloud Scheduler combined with a Cloud Function can audit for resources matching specific labels (e.g. `environment=feature-branch`) and flag or destroy those older than a defined threshold. Label all Terraform-managed resources with `created-by`, `environment`, and `ttl` labels from the start.
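Applied consistently, the label set can live in a `locals` block and be attached to every ephemeral resource, so the audit job has something concrete to query. A sketch with illustrative label keys, names, and tier:

```hcl
# Sketch: a shared label set for ephemeral resources. Keys and values
# here are illustrative, not a fixed convention.
locals {
  ephemeral_labels = {
    created-by  = "terraform"
    environment = "feature-branch"
    ttl-days    = "7" # GCP label values must be strings
  }
}

resource "google_sql_database_instance" "feature" {
  name             = "feature-${var.branch_name}"
  database_version = "POSTGRES_15"
  region           = "us-central1"

  settings {
    tier        = "db-f1-micro" # small tier for a throwaway environment
    user_labels = local.ephemeral_labels
  }
}
```

The audit job then filters on `labels.environment=feature-branch` and compares resource age against the `ttl-days` value before flagging or destroying.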

Mistake 3 — Terraform State Stored Without Lifecycle Controls

Terraform state stored in GCS without object versioning and lifecycle policies accumulates old state versions indefinitely. This is a storage cost that is small per file but significant at scale — particularly in environments with many workspaces or frequent apply runs.

More commonly, I find environments where the GCS bucket storing Terraform state has no storage class lifecycle policy. State files that are months or years old remain in STANDARD storage class when they could be transitioned to NEARLINE or COLDLINE at a fraction of the cost.

The fix: enable object versioning on the state bucket (Terraform's GCS backend handles locking on its own; versioning is there to protect against state corruption and accidental deletion), and set lifecycle rules to transition non-current versions to NEARLINE after 30 days and COLDLINE after 90 days. Delete non-current versions older than 365 days. This is a change to the backend bucket that has no impact on Terraform operations.
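In provider HCL, the bucket configuration might look like the following. The bucket name is illustrative; the thresholds match the rule above, and the `ARCHIVED` state scopes each rule to non-current versions only, so the live state file is never touched:

```hcl
resource "google_storage_bucket" "tf_state" {
  name     = "my-org-terraform-state" # illustrative name
  location = "US"

  versioning {
    enabled = true
  }

  # Non-current versions move to cheaper storage classes over time.
  lifecycle_rule {
    condition {
      days_since_noncurrent_time = 30
      with_state                 = "ARCHIVED"
    }
    action {
      type          = "SetStorageClass"
      storage_class = "NEARLINE"
    }
  }

  lifecycle_rule {
    condition {
      days_since_noncurrent_time = 90
      with_state                 = "ARCHIVED"
    }
    action {
      type          = "SetStorageClass"
      storage_class = "COLDLINE"
    }
  }

  # Non-current versions older than a year are deleted outright.
  lifecycle_rule {
    condition {
      days_since_noncurrent_time = 365
      with_state                 = "ARCHIVED"
    }
    action {
      type = "Delete"
    }
  }
}
```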

Mistake 4 — No count or for_each Governance Leading to Resource Sprawl

Terraform’s `count` and `for_each` meta-arguments make it easy to create multiple resources from a single configuration block. They also make it easy to accidentally create far more resources than intended — or to create resources that persist after the input list has changed.

A common example: a `for_each` over a list of environments that was extended during a refactor but the old environments were never removed from the list. Three additional Cloud SQL instances running in environments no one uses, provisioned cleanly by Terraform, never flagged because they are “managed infrastructure.”

The fix: audit `count` and `for_each` usage in your Terraform configuration and verify that every item in the input list corresponds to a current, active requirement. Implement a labelling strategy that ties each resource to its logical environment or feature, so that when an environment is retired the associated resources are easy to identify and remove.
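Driving `for_each` from a single map makes the audit concrete: every key in the map is a claim that the environment still exists, and deleting a key destroys its resources on the next apply. A sketch with illustrative names and tiers:

```hcl
# Sketch: one map as the single source of truth for environments.
# Names and tiers are illustrative.
variable "environments" {
  type = map(object({
    tier = string
  }))
  default = {
    prod    = { tier = "db-custom-4-15360" }
    staging = { tier = "db-custom-1-3840" }
    # a retired "qa" key removed here would destroy its instance
  }
}

resource "google_sql_database_instance" "env" {
  for_each         = var.environments
  name             = "app-${each.key}"
  database_version = "POSTGRES_15"
  region           = "us-central1"

  settings {
    tier = each.value.tier
    user_labels = {
      environment = each.key
      managed-by  = "terraform"
    }
  }
}
```

Because the label is derived from the map key, billing exports and Asset Inventory queries can always be reconciled back to a line in this variable.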

Mistake 5 — Terraform Modules Defaulting to High-Availability Configuration Everywhere

HA configuration — Cloud SQL with high availability enabled, regional GKE clusters with 3-zone node pools, multi-region Cloud Storage — is the right choice for production. It is not the right choice for development environments where data loss is acceptable and availability requirements are minimal.

I find Terraform modules written for production that are reused without modification for lower environments. Cloud SQL HA doubles the instance cost. Regional GKE clusters with 3-zone node pools triple the node count compared to a zonal cluster. For dev environments, these are costs with no benefit.

The fix: expose HA configuration as a variable in your Terraform modules with a default appropriate for production. Override to `false` or single-zone in dev and staging variable files. This is not a compromise on production reliability — it is an acknowledgment that dev environments have different requirements.
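A sketch of the variable and the Cloud SQL wiring, with an illustrative instance name and tier. For Cloud SQL, HA is the `REGIONAL` availability type, so the toggle reduces to a conditional:

```hcl
variable "high_availability" {
  description = "Enable regional (HA) configuration; true for production."
  type        = bool
  default     = true # safe production default; dev/staging tfvars set false
}

resource "google_sql_database_instance" "main" {
  name             = "app-db" # illustrative
  database_version = "POSTGRES_15"
  region           = "us-central1"

  settings {
    tier              = "db-custom-2-7680"
    availability_type = var.high_availability ? "REGIONAL" : "ZONAL"
  }
}
```

The same pattern applies to GKE (regional vs zonal cluster location) and storage (multi-region vs single-region location), each behind its own module variable.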

Mistake 6 — No Cost Estimation in the Terraform Pipeline

`terraform plan` shows you what resources will be created, modified, or destroyed. It does not show you what those changes will cost. Without cost estimation in the pipeline, an engineer can inadvertently provision expensive resources — a high-memory Cloud SQL instance, a large persistent disk, a regional GKE cluster — without any visibility into the cost implication before `apply`.

The fix: integrate Infracost into the Terraform CI/CD pipeline. Infracost runs alongside `terraform plan` and produces a cost diff — showing the estimated monthly cost of the planned infrastructure changes before they are applied. For PRs that increase estimated cost above a defined threshold, require an explicit approval. This adds cost awareness to the engineering workflow without creating process overhead.

Mistake 7 — Terraform Drift Leaving Manually Created Resources Unmanaged

Terraform drift occurs when resources exist in GCP that are not in Terraform state — provisioned manually via the console or gcloud, or imported and then not maintained. Drifted resources are invisible to Terraform’s cost management and lifecycle controls.

What I find: environments where 20-30% of running resources were never brought under Terraform management. Some are legacy, some were created for quick fixes, some were provisioned by engineers who didn’t have Terraform access. All of them are outside the cost governance model.

The fix: use `terraform import` to bring existing resources under management, combined with a policy of Terraform-only provisioning going forward. Enforce that policy at the IAM level: grant write permissions to the Terraform pipeline's service account and keep human access read-only where practical. For identifying unmanaged resources, Cloud Asset Inventory provides a complete view of what exists in your GCP org, which can be compared against Terraform state.
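On Terraform 1.5 and later, the import can also be expressed declaratively with an `import` block, which keeps the adoption step visible and reviewable in a pull request rather than buried in someone's shell history. A sketch with an illustrative bucket name:

```hcl
# Sketch: adopt an existing, manually created bucket into state.
# Resource and bucket names are illustrative.
import {
  to = google_storage_bucket.legacy_exports
  id = "my-project-legacy-exports"
}

resource "google_storage_bucket" "legacy_exports" {
  name     = "my-project-legacy-exports"
  location = "US"
}
```

After the first `terraform apply` completes the import, the `import` block can be deleted; the resource is in state and managed like any other.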

What a Typical Terraform Review Finds

In environments I review that have been using Terraform for 12+ months without a structured review, the most common combination is hardcoded machine types in non-production, unmanaged ephemeral resource sprawl, and no cost estimation in the pipeline. Together these three patterns typically account for 15-25% of total GCP spend that could be eliminated without any impact on production workloads.

Get a Second Set of Eyes on Your GCP and Terraform Setup

If your GCP bill has grown faster than your team expected, and you’re managing infrastructure with Terraform, there are almost certainly savings to be found in the IaC layer. I run short GCP cost and architecture audits for engineering teams in Toronto, across Canada, and in the USA — reviewing Terraform structure, resource configuration, and billing patterns, then sharing a prioritised list of findings.

If you want a second set of eyes on your setup, I offer a short audit covering both cost and security. Reach out and we can start with a short conversation: https://buoyantcloudtech.com/contact-gcp-consulting/

More about my background and approach: https://buoyantcloudtech.com/about/

FAQ

How do I find resources in GCP that are not managed by Terraform?

Cloud Asset Inventory provides a complete list of all resources in your GCP organisation. Export it to BigQuery and compare against your Terraform state. Resources present in Asset Inventory but not in any Terraform state file are unmanaged. This comparison is a useful baseline for any IaC governance programme.

Is Infracost free to use?

Infracost has an open source CLI that is free to use and integrates with GitHub Actions, GitLab CI, and other pipeline tools. It uses GCP public pricing to estimate costs. For teams wanting policy enforcement and team-level cost controls, Infracost Cloud is a paid product.

How should I structure Terraform for multiple environments?

Use workspace-specific variable files (terraform.tfvars per workspace) or a directory-based structure with shared modules and environment-specific configurations. The key principle: modules define what resources look like, variable files define environment-specific values. Never hardcode environment-specific values in module code.

Why does Terraform structure matter for GCP cost?

Terraform structure determines how easy it is to apply environment-specific configurations, enforce labelling for cost attribution, and manage resource lifecycle. A well-structured Terraform codebase — modular, parameterised, with consistent labelling — is a prerequisite for meaningful GCP cost governance. I covered the full Terraform architecture approach at https://buoyantcloudtech.com/strategic-iac-terraform-gcp-guide/.
