Top 10 Terraform Interview Mistakes
Published 21 June 2026 by Ace Cloud Interviews
Terraform interviews go well beyond "what does terraform plan do". Interviewers probe state management, failure handling, secrets, module design, and the decisions that separate someone who has run Terraform on a team from someone who has only followed tutorials. These are the 10 mistakes that reveal shallow knowledge - and what to say instead.
Saying "Terraform is like CloudFormation" without knowing the critical differences
What candidates say
“Terraform is basically CloudFormation but cloud-agnostic.”
Why interviewers mark this down
This framing misses the fundamental operational differences. CloudFormation is a managed AWS service - AWS executes the deployment, stores the stack state, and by default rolls a failed stack operation back automatically. Terraform is a client-side tool - you (or your CI system) run it, you own the state file, and a failed apply leaves whatever was already created in place for you to resolve manually. These differences have real consequences in production.
What to say instead
Say: "The key difference is the execution model. CloudFormation is a managed service that runs deployments on your behalf with built-in rollback and stack drift detection. Terraform is a local tool - you own the state file, you run the apply, and failed applies leave partial changes that you must resolve. CloudFormation is simpler for AWS-only infrastructure. Terraform's provider ecosystem and plan/apply workflow give more control and work across cloud providers."
Treating state as just a record of what was deployed
What candidates say
“State is just a record of what Terraform has deployed.”
Why interviewers mark this down
State is the source of truth that Terraform uses to determine what changes to make next. Without state, Terraform cannot map configuration to real resources or track what it already manages. State contains resource IDs, metadata, and sensitive values. If state is lost, you cannot manage existing resources without re-importing them one by one. If two engineers run apply against the same state file simultaneously without locking, you get a race condition that can corrupt the state and leave infrastructure half-changed. State must be stored remotely, locked, and protected.
What to say instead
Say: "State is what allows Terraform to know that aws_instance.web in config maps to i-0abc123 in AWS. I always store state in S3 with locking for team use - a DynamoDB lock table, or the S3 backend's native lockfile on Terraform versions that support it - so concurrent applies cannot corrupt state. State is never committed to Git because it can contain sensitive values. Before any destructive operation I take a state backup. Drift - where real infrastructure diverges from state - is one of the most common production Terraform issues. I detect it with terraform plan -refresh-only (the replacement for the deprecated terraform refresh command) and resolve it by re-applying the configuration, updating the code to match, or importing resources that were created outside Terraform."
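A minimal sketch of the remote state setup described above, assuming an S3 bucket and a DynamoDB table that already exist. The bucket, key, region, and table names are placeholders, not values from a real environment.

```hcl
terraform {
  backend "s3" {
    bucket  = "example-terraform-state"        # hypothetical bucket name
    key     = "prod/network/terraform.tfstate" # hypothetical state path
    region  = "eu-west-1"                      # hypothetical region
    encrypt = true                             # server-side encryption of the state object

    # Locking via a DynamoDB table whose partition key is a string named "LockID".
    # Newer Terraform releases also offer S3-native locking (use_lockfile = true);
    # check which option your Terraform version supports.
    dynamodb_table = "example-terraform-locks" # hypothetical table name
  }
}
```

With a backend like this, a second apply started while the first holds the lock fails fast with a lock error instead of writing over the same state.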
Not knowing how to protect a production database from accidental deletion
What candidates say
“terraform destroy tears down everything defined in the configuration.”
Why interviewers mark this down
The question interviewers are really asking is: do you know how to protect stateful resources from accidental destruction? Without safeguards, a terraform destroy, or a terraform apply after someone deletes or renames a resource block, will delete a production database as soon as the plan is approved. The destroy action does appear in the plan, but it is easy to miss in a long diff or an auto-approved pipeline. This is one of the most catastrophic and common Terraform mistakes. Interviewers who have worked in production Terraform environments will ask about this directly.
What to say instead
Say: "For any resource I cannot recreate without data loss - RDS databases, S3 buckets with data, Elasticsearch domains - I always set lifecycle { prevent_destroy = true }. This makes Terraform reject any plan that would destroy the resource, including a replacement. A developer must explicitly remove that flag and re-plan before the resource can be deleted. The flag only works while the resource block is still in the configuration - deleting the whole block removes the guard with it - so I combine it with RDS deletion protection at the AWS API level as a second independent layer. I also require terraform plan output review in CI so any destroy action needs human approval."
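The two layers mentioned above might look roughly like this. It is a sketch, not a production-ready database definition: the identifier, sizing, and username are invented for illustration, and it assumes an AWS provider version that supports manage_master_user_password.

```hcl
resource "aws_db_instance" "orders" {
  identifier        = "orders-prod" # hypothetical name
  engine            = "postgres"
  instance_class    = "db.t3.medium"
  allocated_storage = 50
  username          = "app_admin"   # hypothetical username

  # Let RDS generate and store the master password in Secrets Manager,
  # so no password is written in this configuration.
  manage_master_user_password = true

  # Layer 2: AWS itself refuses the delete call until this is turned off.
  deletion_protection       = true
  skip_final_snapshot       = false
  final_snapshot_identifier = "orders-prod-final"

  # Layer 1: Terraform rejects any plan that would destroy this resource.
  # This only applies while this block stays in the configuration.
  lifecycle {
    prevent_destroy = true
  }
}
```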
Using count for everything instead of for_each
What candidates say
“I'd use count to create multiple instances of the resource.”
Why interviewers mark this down
count and for_each behave differently when items are removed from the middle of a collection. count indexes resources numerically: resource[0], resource[1], resource[2]. Removing item 1 shifts every later item down an index, so Terraform plans to change or replace resource[1] to match what used to be resource[2], and to destroy the last index. for_each indexes by key: resource["web"], resource["api"]. Removing "api" only affects that resource. For stateful resources like databases or instances with persistent data, using count can trigger accidental destruction during what appears to be an unrelated change.
What to say instead
Say: "I use for_each when creating named resources where each one has an identity - a map of S3 buckets, a set of IAM roles, multiple subnets by AZ name. for_each indexes by key so removing one item does not affect the others. I use count for simple toggles (count = var.enable_feature ? 1 : 0) or when I genuinely want N identical, interchangeable resources. The rule of thumb: if removing one item from the collection should not affect the others, use for_each."
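As an illustration of that rule of thumb, here is a small sketch with both patterns side by side. The variable names and bucket names are made up for the example.

```hcl
variable "bucket_names" {
  type    = set(string)
  default = ["logs", "assets", "backups"] # hypothetical names
}

variable "enable_audit_bucket" {
  type    = bool
  default = false
}

# Keyed instances: aws_s3_bucket.named["logs"], ["assets"], ["backups"].
# Removing "assets" from the set plans a destroy for that one bucket only.
resource "aws_s3_bucket" "named" {
  for_each = var.bucket_names
  bucket   = "example-${each.key}"
}

# count as an on/off toggle: zero or one instance, addressed as
# aws_s3_bucket.audit[0] when enabled.
resource "aws_s3_bucket" "audit" {
  count  = var.enable_audit_bucket ? 1 : 0
  bucket = "example-audit"
}
```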
Treating -target as a normal workflow tool
What candidates say
“I'd use terraform apply -target to update just that one resource.”
Why interviewers mark this down
Using -target regularly is a significant red flag for interviewers with production Terraform experience. The -target flag restricts the plan to the named resources and their dependencies and skips everything else in the configuration. After a targeted apply, state reflects only part of the configuration, and Terraform itself warns that resource targeting is meant for exceptional situations. Over time, repeated -target usage leaves a backlog of unapplied changes that is hard to untangle. It exists as a break-glass recovery tool, not a workflow shortcut.
What to say instead
Say: "I only use -target as a last resort during recovery - for example, when one resource failed mid-apply and I need to create it in isolation before retrying the full apply. For normal workflows, if I want to update a single resource I refactor the configuration so that resource is in its own module or workspace. Any -target usage gets documented in the PR with a plan to follow up with a clean full apply as soon as possible."
Not knowing how to handle secrets in Terraform
What candidates say
“I'd put the database password in a variable and pass it via terraform.tfvars.”
Why interviewers mark this down
Any secret that reaches a resource argument ends up in the state file in plain text, and marking a variable as sensitive only redacts it from CLI output. Local state is not encrypted at all, and remote state is only as protected as the backend's encryption and access controls. A password passed in via terraform.tfvars therefore sits in S3 or on disk, and the tfvars file itself is easily committed to Git. This is a serious security vulnerability. Interviewers asking about secrets in Terraform are testing whether you know this fundamental limitation.
What to say instead
Say: "Secrets should not pass through Terraform variables or live in state. I provision the secret resource in Terraform (aws_secretsmanager_secret or aws_ssm_parameter of SecureString type) and populate the actual value separately, outside Terraform, via the AWS console or a secrets management pipeline. The application fetches the secret at runtime. If Terraform must set a password at creation time, I use the random_password resource and store it directly into Secrets Manager in the same apply - but I treat that state file as sensitive and ensure the S3 backend has encryption and strict access controls."
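The fallback case in that answer, where Terraform has to generate the password itself, could be sketched like this. The secret name is a placeholder, and the generated value still lands in state, which is why the backend has to be locked down.

```hcl
# Generated by Terraform, so the value IS recorded in state.
resource "random_password" "db" {
  length  = 32
  special = false
}

resource "aws_secretsmanager_secret" "db_password" {
  name = "prod/orders/db-password" # hypothetical secret name
}

# Stores the generated value in Secrets Manager in the same apply.
# The application reads it from Secrets Manager at runtime.
resource "aws_secretsmanager_secret_version" "db_password" {
  secret_id     = aws_secretsmanager_secret.db_password.id
  secret_string = random_password.db.result
}
```

For the preferred approach, only the aws_secretsmanager_secret resource would be declared here and the value would be written by a separate process outside Terraform.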
Not knowing what happens after a failed apply
What candidates say
“If terraform apply fails, it rolls back the changes.”
Why interviewers mark this down
Terraform does not roll back. This is one of the most important differences from CloudFormation and a question interviewers ask directly. If an apply fails halfway through creating 10 resources, the successfully created resources stay. State reflects what was applied before the error. The developer must diagnose the failure, fix it, and run apply again. Over time, without understanding this, teams end up with orphaned resources and state inconsistencies that are difficult to resolve.
What to say instead
Say: "Terraform does not roll back on failure - successfully created resources remain and state reflects what was applied before the error. I check the error, fix the root cause, and run apply again. Terraform plans only what is still missing, and a resource that failed partway through creation is marked as tainted and replaced. If a resource is broken I force a replacement with terraform apply -replace. If a resource exists in the cloud but not in state, I use terraform import to bring it back under management. terraform state rm only makes Terraform forget a resource - it does not delete the real one - so I use it only when I intend to clean up or re-import that resource myself. I never assume a failed apply cleaned up after itself."
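Bringing an existing resource back under management can also be done declaratively. The sketch below assumes Terraform 1.5 or later, where import blocks are available. It reuses the instance ID from the state example earlier in the article, and the AMI ID is a placeholder.

```hcl
# Tells Terraform that the existing instance belongs to aws_instance.web.
# The import happens as part of the next plan and apply, so it is reviewable.
import {
  to = aws_instance.web
  id = "i-0abc123"
}

# The resource block must describe the real instance closely enough
# that the plan shows no unexpected changes after import.
resource "aws_instance" "web" {
  ami           = "ami-0123456789abcdef0" # hypothetical AMI ID
  instance_type = "t3.micro"
}
```

Once the apply has succeeded, the import block can be deleted from the configuration.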
Not pinning provider versions or understanding the lock file
What candidates say
“I'd run terraform init and use whatever version of the AWS provider is available.”
Why interviewers mark this down
Without pinned provider versions and a committed .terraform.lock.hcl, different engineers or CI runs may initialise with different provider versions and produce different plan output. Major provider releases routinely include breaking changes. The lock file records the exact version selected for each provider along with its checksums, so every team member and every CI run gets identical provider binaries. Not pinning versions is a common cause of "works on my machine" Terraform problems.
What to say instead
Say: "I always pin provider versions with a pessimistic constraint: ~> 5.0 allows any 5.x release but not a jump to 6.0, while ~> 5.0.0 would allow only patch releases of 5.0. The .terraform.lock.hcl file is committed to Git so every engineer and CI run uses the same provider versions and checksums. When I want to upgrade a provider I run terraform init -upgrade, review the plan output carefully for unexpected changes, and commit the updated lock file as a deliberate decision."
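A minimal version-pinning block might look like the following. The required_version value is an arbitrary example, not a recommendation.

```hcl
terraform {
  required_version = ">= 1.5.0" # example minimum Terraform CLI version

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # equivalent to >= 5.0, < 6.0
    }
  }
}
```

terraform init resolves this constraint to one concrete version and writes it, with checksums, to .terraform.lock.hcl. Later runs reuse that version until someone runs terraform init -upgrade.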
Confusing variables, locals, and outputs
What candidates say
“I'd use a variable to derive a value from another resource in the same configuration.”
Why interviewers mark this down
Variables, locals, and outputs have distinct purposes. Variables are inputs - values provided from outside the module at plan or apply time. Their defaults must be literal values and cannot reference resources, locals, or other variables. Locals are internal computed values derived from resources or other expressions, scoped to the current module and not exposed externally. Outputs are exports - values surfaced to callers of a module or readable by other configurations via remote state. Putting an expression in a variable's default where a local is needed results in a "Variables not allowed" error ("Variables may not be used here") that is confusing if you do not understand the distinction.
What to say instead
Say: "Variables are for external inputs - values you do not know at code-write time like environment name or instance type. Locals are for internal derivation - for example a name_prefix local that joins var.project and var.environment. Outputs are for sharing values with callers - an output named vpc_id exports the VPC ID so another module can reference it. A clean module has variables at the boundary, locals for internal composition, and outputs for everything a caller might need."
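Put together in one small module, the three roles could look like this. The CIDR block and tag format are illustrative choices.

```hcl
# Inputs: supplied by the caller, unknown when the module is written.
variable "project" {
  type        = string
  description = "Project name used in resource names"
}

variable "environment" {
  type        = string
  description = "Environment name, for example dev or prod"
}

# Internal derivation: computed once, reused inside this module only.
locals {
  name_prefix = format("%s-%s", var.project, var.environment)
}

resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16" # hypothetical range

  tags = {
    Name = "${local.name_prefix}-vpc"
  }
}

# Export: the value a calling module reads as module.<name>.vpc_id.
output "vpc_id" {
  value       = aws_vpc.main.id
  description = "ID of the VPC created by this module"
}
```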
Not structuring modules for reuse
What candidates say
“I'd put everything in one main.tf file - it is simpler.”
Why interviewers mark this down
A single main.tf works for solo experiments but does not scale to teams or multiple environments. Interviewers testing infrastructure design maturity want to hear how you structure Terraform for reuse, maintainability, and safe testing. A flat configuration makes it impossible to test a VPC change in dev before applying it to prod, because everything is in one state. It also prevents sharing common patterns across projects.
What to say instead
Say: "I structure Terraform as root modules per environment (dev, staging, prod) that call reusable child modules (vpc, eks-cluster, rds, security-groups). Each child module has a documented interface: typed variables with descriptions and defaults, and outputs for everything a caller might use. Root modules are thin - they mostly wire variables into child modules and connect outputs. This way a change to the VPC module can be tested in dev before being applied to prod, and the same module is reused across every environment."
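A thin root module for one environment might look roughly like this. The directory layout, module inputs, and output names (vpc_id, private_subnet_ids) are assumptions about how the child modules are written, not a fixed convention.

```hcl
# environments/dev/main.tf (hypothetical layout)

module "vpc" {
  source = "../../modules/vpc" # hypothetical local module path

  name       = "dev"
  cidr_block = "10.10.0.0/16"
}

module "rds" {
  source = "../../modules/rds" # hypothetical local module path

  name = "dev-orders"

  # Wiring: outputs of one child module become inputs of another.
  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnet_ids
}
```

The prod root module would call the same child modules with different inputs and its own backend, so each environment keeps a separate state.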
The bottom line
Terraform interviews reward candidates who have run it on a team and hit the real failure modes: corrupt state, accidentally destroyed resources, secrets in tfvars files, provider version drift. The candidates who stand out are the ones who can explain not just what the commands do but what happens when they go wrong - and how to design the configuration so those failures are recoverable. The best way to build this knowledge is to operate a real Terraform codebase, make mistakes, and fix them.