TL;DR
Terraform is a powerful tool for infrastructure as code, but like any tool, it comes with challenges. In this article, we cover the top 20 common mistakes people make when using Terraform and how to fix them. By the end, you’ll be well-equipped to avoid these pitfalls and become an expert in troubleshooting Terraform issues.
Introduction
Terraform is awesome, but even the most seasoned pros can make mistakes. Whether it’s due to misconfiguration or the inherent complexity of cloud environments, Terraform can throw curveballs your way. In this article, we’ll walk you through 20 common mistakes that users encounter when working with Terraform, along with practical solutions. Get ready to level up your Terraform skills!
Understanding these mistakes will not only help you save time and effort but also ensure your infrastructure remains reliable and scalable. From avoiding pitfalls in state management to securely managing secrets, this guide covers everything you need to know to elevate your Terraform game. By the time you finish reading, you’ll have a solid understanding of best practices that will make your Terraform projects smoother and more efficient.
1. Using Local State Instead of Remote State
Storing Terraform state locally might work for small projects but becomes a bottleneck in team environments. Without remote state, it can be challenging to collaborate effectively and ensure that everyone is working with the latest infrastructure state.
Solution: Always use remote state storage (e.g., AWS S3, Azure Blob Storage) with locking enabled to avoid state conflicts. Remote state storage provides better collaboration and avoids issues that arise from multiple team members trying to modify the state at the same time.
Example configuration for S3 with DynamoDB locking:
terraform {
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "prod/terraform.tfstate"
    region         = "us-west-2"
    dynamodb_table = "terraform_locks"
    encrypt        = true
  }
}
By using remote state, you also benefit from features like state versioning, which allows you to recover from mistakes by rolling back to a previous state.
2. Storing Secrets in Plaintext
Many users mistakenly store sensitive information, like credentials, directly in .tfvars files or terraform.tfstate. This practice exposes your infrastructure to potential security risks, especially if these files are checked into version control.
Solution: Use tools like HashiCorp Vault or AWS Secrets Manager to handle secrets securely. These tools allow you to manage secrets in a secure manner, providing encryption and access control.
If storing secrets in version control is unavoidable, use tools like Mozilla SOPS to encrypt them. Another good practice is to leverage environment variables to inject secrets at runtime, reducing the risk of exposing sensitive information in your codebase.
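As a minimal sketch (the secret name prod/db-password and the database resource are illustrative assumptions), a secret stored in AWS Secrets Manager can be read through a data source instead of being written into your configuration:

# Hypothetical secret name; the secret must already exist in Secrets Manager
data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "prod/db-password"
}

resource "aws_db_instance" "example" {
  # ... other required arguments ...
  password = data.aws_secretsmanager_secret_version.db_password.secret_string
}

Keep in mind that resolved secret values still end up in the state file, which is another reason to keep remote state encrypted and access-controlled.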
3. Version Mismatch Between Terraform Binary and State File
Running Terraform with a different version from the one that created the state can cause compatibility issues and break deployments. This is especially problematic when working in a team, as different members may have different Terraform versions installed.
Solution: Set version constraints in your Terraform files to ensure consistency across environments:
terraform {
  required_version = ">= 0.12.9, < 0.13.0"
}
Use tfenv to easily switch between Terraform versions and avoid such issues. By specifying the required version in your configuration, you ensure that Terraform will notify you if you’re using an incompatible version, preventing errors before they happen.
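For example, assuming tfenv is installed, switching to the version your state expects takes two commands (the version number below is illustrative):

tfenv install 0.12.31
tfenv use 0.12.31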
4. Not Pinning Module and Provider Versions
Using unpinned versions for modules or providers can lead to inconsistent deployments due to breaking changes in newer versions. Without version pinning, your infrastructure can change unexpectedly when a newer version of a provider or module is released.
Solution: Pin the versions explicitly to ensure consistency:
provider "aws" {
version = ">= 3.0.0"
}
Pinning versions helps you maintain control over your infrastructure. It also allows you to upgrade dependencies at your own pace, ensuring that any changes are tested before being applied to production environments.
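The same applies to modules. As a sketch (the registry module and version below are illustrative), a module is pinned with its version argument, and on Terraform 0.13 and later provider constraints are usually declared in a required_providers block rather than in the provider block itself:

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "3.14.0"
}

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.0"
    }
  }
}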
5. Forgetting to Lock the State File
Simultaneous access to a Terraform state file can corrupt it, particularly in team environments. When multiple team members run Terraform commands simultaneously, it can lead to state corruption, making it difficult to recover.
Solution: Enable state locking with remote backends, like AWS S3 with DynamoDB, to prevent multiple operations at the same time. State locking ensures that only one Terraform process can modify the state at any given time, reducing the risk of state corruption.
State locking also protects against accidental overwrites and ensures that everyone is working against the most recent state file.
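If you manage the lock table with Terraform as well, a minimal definition compatible with the S3 backend looks like this (the table name must match the dynamodb_table value in the backend configuration, and the partition key must be named LockID):

resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform_locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}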
6. Incorrect AWS Credentials Configuration
One of the most common issues is Terraform failing to authenticate with AWS due to incorrect or missing credentials. This often results in errors that can be frustrating to troubleshoot.
Solution: Double-check your AWS credential configuration in ~/.aws/credentials or set the credentials as environment variables:
export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
Alternatively, use IAM roles if you’re running Terraform from an EC2 instance. IAM roles provide a more secure way to manage access without hardcoding credentials, reducing the risk of accidental exposure.
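If you prefer named profiles over raw keys, the provider can reference one from ~/.aws/credentials (the profile name below is hypothetical):

provider "aws" {
  region  = "us-west-2"
  profile = "my-terraform-profile"
}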
7. Circular Dependencies Between Resources
Terraform can throw circular dependency errors when resources depend on each other in a loop. Circular dependencies make it impossible for Terraform to determine the correct order of resource creation.
Solution: Re-architect your infrastructure to eliminate the loop, for example by splitting a resource or moving the shared configuration out of the cycle. The depends_on argument adds dependencies rather than removing them, so use it to make the intended ordering explicit once the circular reference has been removed. By explicitly specifying dependencies, you give Terraform the information it needs to create resources in the correct order.
In more complex scenarios, consider breaking resources into separate modules to manage dependencies more effectively.
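As a small illustrative sketch (the resource names and AMI ID are hypothetical), this is what an explicit dependency looks like once the circular reference is gone:

resource "aws_s3_bucket" "assets" {
  bucket = "my-app-assets"
}

resource "aws_instance" "app" {
  ami           = "ami-0abcdef1234567890" # placeholder AMI ID
  instance_type = "t3.micro"

  # Explicit ordering: create the bucket before the instance
  depends_on = [aws_s3_bucket.assets]
}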
8. Not Running terraform fmt Regularly
Terraform configurations can become messy and hard to maintain without proper formatting. Inconsistent formatting makes it difficult for team members to collaborate effectively and can lead to errors.
Solution: Run terraform fmt regularly or add it as a pre-commit hook to ensure your code is consistently formatted. Proper formatting not only improves readability but also reduces the likelihood of syntax errors.
By enforcing consistent formatting, you make it easier for others to understand and contribute to your Terraform configurations.
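For example, formatting the whole working tree and verifying it in CI can be done with:

terraform fmt -recursive         # rewrite all .tf files in place
terraform fmt -check -recursive  # exit non-zero if anything is unformatted (useful in CI)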
9. Hardcoding Resource Values Instead of Using Variables
Hardcoding values in Terraform files makes your code inflexible and difficult to maintain. When values like instance types or region names are hardcoded, changing them requires modifying multiple places in your code.
Solution: Use variables for configurable values and pass them via .tfvars files or environment variables to increase flexibility. Variables make your Terraform configurations reusable and easier to manage.
For example, instead of hardcoding an AWS region, define a variable and reference it throughout your configuration:
variable "region" {
default = "us-west-2"
}
This approach allows you to deploy the same configuration in different regions or environments without modifying the code.
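The variable can then be referenced wherever the region is needed:

provider "aws" {
  region = var.region
}

Overriding it per environment is then as simple as terraform apply -var="region=eu-west-1".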
10. Not Using Data Sources Properly
Overlooking data sources forces you to hardcode resource IDs in your configuration, which makes it harder to maintain. This approach is error-prone and can lead to issues when the referenced resources are updated or changed.
Solution: Use Terraform’s data blocks to fetch resources dynamically. Data sources allow you to reference existing infrastructure, making your configurations more dynamic and adaptable to changes.
For example, retrieve an existing VPC by its tags:
data "aws_vpc" "main" {
filter {
name = "tag:Name"
values = ["main-vpc"]
}
}
Using data sources ensures that your configuration remains accurate and reduces the need for manual updates.
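The looked-up VPC can then be referenced by attribute instead of a hardcoded ID (the CIDR block below is an assumed value inside that VPC):

resource "aws_subnet" "example" {
  vpc_id     = data.aws_vpc.main.id
  cidr_block = "10.0.1.0/24"
}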
11. Improper Handling of Timeouts
Terraform might fail to create or destroy resources due to insufficient timeouts for cloud APIs. This is particularly common when dealing with complex resources that take a long time to provision.
Solution: Adjust the timeouts block on the resources that support it so Terraform has enough time to complete long-running operations (timeouts are configured per resource, not on the provider):
resource "azurerm_linux_virtual_machine" "example" {
  # ... other required arguments ...

  timeouts {
    create = "20m"
    delete = "15m"
  }
}
Properly setting timeouts helps avoid failures during long-running operations, providing a more reliable deployment experience.
12. Misconfigured Provider Blocks
When using multiple cloud providers, improper provider configuration is a common issue. Misconfigured providers can lead to errors and make it difficult to manage infrastructure across different environments.
Solution: Set up providers correctly, and use aliases when necessary to differentiate between multiple instances of the same provider:
provider "aws" {
alias = "us_east_1"
region = "us-east-1"
}
Using aliases allows you to create resources in multiple regions or accounts without confusion, making your configuration more flexible and easier to manage.
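Resources then select the aliased provider through the provider meta-argument (the bucket name below is hypothetical):

resource "aws_s3_bucket" "replica" {
  provider = aws.us_east_1
  bucket   = "my-replica-bucket"
}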
13. Exceeding API Rate Limits
With many resources, Terraform might hit cloud provider API rate limits, causing operations to fail. This is especially common in large-scale deployments where Terraform makes numerous API calls.
Solution: Add retry configurations or limit the number of parallel executions with the -parallelism flag:
terraform apply -parallelism=5
Reducing parallelism helps avoid rate limit issues, but it may slow down the deployment process. Finding the right balance between speed and reliability is key.
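Provider-level retries can also absorb transient throttling; for example, the AWS provider accepts a max_retries setting (the value below is illustrative):

provider "aws" {
  region      = "us-west-2"
  max_retries = 10 # retry throttled API calls before giving up
}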
14. Not Using terraform plan Before Applying Changes
Skipping the terraform plan step can lead to unexpected infrastructure changes. Without a plan, you may inadvertently destroy or modify critical resources.
Solution: Always run terraform plan to preview changes before applying them. The plan output shows exactly what changes Terraform will make, allowing you to catch potential issues before they affect your infrastructure.
Running a plan before applying changes is a crucial step in ensuring that your infrastructure remains stable and predictable.
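A common pattern is to save the plan to a file and apply exactly what was reviewed:

terraform plan -out=tfplan   # write the planned changes to a file
terraform apply tfplan       # apply only what the saved plan contains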
15. Overly Complex Modules
Creating overly complex and non-modular Terraform modules leads to difficulties in maintaining and reusing code. Complex modules can become a liability, making it hard for team members to understand and modify them.
Solution: Keep modules simple, reusable, and focused on one function. Use inputs and outputs efficiently to make modules more flexible. By breaking down complex modules into smaller, focused ones, you make your codebase easier to manage and understand.
Simple modules are easier to test, debug, and reuse across multiple projects, reducing the time and effort needed to maintain your infrastructure.
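As a sketch of what a small, single-purpose module can look like (the file layout and names are illustrative):

# modules/s3-bucket/variables.tf
variable "bucket_name" {
  type        = string
  description = "Name of the bucket to create"
}

# modules/s3-bucket/main.tf
resource "aws_s3_bucket" "this" {
  bucket = var.bucket_name
}

# modules/s3-bucket/outputs.tf
output "bucket_arn" {
  value = aws_s3_bucket.this.arn
}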
16. Ignoring terraform-docs
Failing to document Terraform configurations makes onboarding new team members challenging. Without proper documentation, understanding how the infrastructure is set up and how to make changes becomes difficult.
Solution: Use terraform-docs to automatically generate documentation from your configuration files. This tool extracts useful information from your Terraform code and creates human-readable documentation, making it easier for others to understand your infrastructure.
Good documentation is crucial for effective collaboration and helps ensure that everyone on the team is on the same page.
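One common invocation renders a Markdown table of the module's inputs and outputs for the current directory:

terraform-docs markdown table . > README.md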
17. Not Splitting Terraform States for Large Infrastructures
Handling too many resources in a single state file can slow down Terraform operations and increase the risk of state corruption. Large state files are also harder to manage and debug.
Solution: Split your Terraform state into multiple smaller states based on environments or services. For example, separate the state for networking, compute, and storage resources. This approach improves performance and makes managing your infrastructure more straightforward.
By splitting state files, you also reduce the blast radius of changes, making it easier to manage and troubleshoot individual components.
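One way to do this, assuming the S3 backend shown earlier, is to give each component its own state key (the keys below are illustrative):

# network/backend.tf
terraform {
  backend "s3" {
    bucket = "my-terraform-state"
    key    = "prod/network/terraform.tfstate"
    region = "us-west-2"
  }
}

# compute/backend.tf would use key = "prod/compute/terraform.tfstate", and so on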
18. Bad Directory Structure
Organizing Terraform files by resource type rather than by service or environment can lead to chaos as your project scales. A poorly structured directory makes it hard to locate files and understand the overall architecture.
Solution: Use a service-level directory structure, grouping Terraform files by services (e.g., network, compute). This approach makes it easier to manage and understand your configurations, especially as your project grows.
A well-organized directory structure also helps enforce best practices and makes it easier for new team members to get up to speed.
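A layout along these lines (purely illustrative) keeps each service self-contained:

terraform/
├── network/
│   ├── main.tf
│   ├── variables.tf
│   └── outputs.tf
├── compute/
│   ├── main.tf
│   ├── variables.tf
│   └── outputs.tf
└── environments/
    ├── prod.tfvars
    └── staging.tfvars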
19. Not Using terraform destroy Carefully
Improper use of terraform destroy can lead to the accidental deletion of critical infrastructure. Without careful consideration, you may end up destroying resources that are still in use, causing outages.
Solution: Use terraform destroy cautiously, and always run a terraform plan before destroying resources. Reviewing the plan ensures that you understand what will be destroyed and helps you avoid costly mistakes.
In production environments, consider using targeted destroys to limit the impact, rather than destroying all resources at once.
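For example, a targeted destroy previews and removes a single resource address (the address below is hypothetical):

terraform plan -destroy -target=aws_instance.example   # preview what would be removed
terraform destroy -target=aws_instance.example         # destroy only that resource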
20. Skipping terraform refresh Before Plan
If resources have been changed outside Terraform (for example, manually in the cloud console), your state file might not match their actual state. This can lead to incorrect plans and unexpected changes.
Solution: Keep the state file in sync with your infrastructure before planning. Run terraform refresh, or on recent Terraform versions use the refresh-only workflow that replaces the deprecated standalone command. Refreshing updates the state file to match the real-world state of your infrastructure, reducing the risk of inconsistencies.
Keeping your state file in sync with the actual infrastructure state helps prevent Terraform from making unnecessary or incorrect changes.
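On recent Terraform versions the refresh-only workflow looks like this; on older versions the standalone terraform refresh command does the same job:

terraform plan -refresh-only    # show drift between the state and real infrastructure
terraform apply -refresh-only   # accept the refreshed values into the state file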
Conclusion
Mistakes are part of the Terraform learning curve, but being aware of these common pitfalls will help you avoid them and manage your infrastructure efficiently. By following best practices—like using remote state, securing secrets, and managing versions—you’ll be well on your way to becoming a Terraform expert. Happy Terraforming!
Remember, the key to mastering Terraform is continuous learning and improvement. Keep refining your skills, stay updated with the latest features, and always test your configurations thoroughly before applying changes in production. With practice, you’ll find that Terraform becomes an indispensable tool in your DevOps toolkit.