name: iac-terraform
description: Infrastructure as Code with Terraform and Terragrunt. Use for creating, validating, troubleshooting, and managing Terraform configurations, modules, and state. Covers Terraform workflows, best practices, module development, state management, Terragrunt patterns, and common issue resolution.

Infrastructure as Code - Terraform & Terragrunt

Comprehensive guidance for infrastructure as code using Terraform and Terragrunt, from development through production deployment.

When to Use This Skill

Use this skill when:

  • Writing or refactoring Terraform configurations
  • Creating reusable Terraform modules
  • Troubleshooting Terraform/Terragrunt errors
  • Managing Terraform state
  • Implementing IaC best practices
  • Setting up Terragrunt project structure
  • Reviewing infrastructure code
  • Debugging plan/apply issues

Core Workflows

1. New Infrastructure Development

Workflow Decision Tree:

Is this reusable across environments/projects?
├─ Yes → Create a Terraform module
│   └─ See "Creating Terraform Modules" below
└─ No → Create environment-specific configuration
    └─ See "Environment Configuration" below

Creating Terraform Modules

When building reusable infrastructure:

  1. Scaffold new module with script:
python3 scripts/init_module.py my-module-name

This automatically creates:

  • Standard module file structure
  • Template files with proper formatting
  • Examples directory
  • README with documentation
  2. Use module template structure:

    • See assets/templates/MODULE_TEMPLATE.md for complete structure
    • Required files: main.tf, variables.tf, outputs.tf, versions.tf, README.md
    • Recommended: examples/ directory with working examples
  3. Follow module best practices:

    • Single responsibility - one module, one purpose
    • Sensible defaults for optional variables
    • Complete descriptions for all variables and outputs
    • Input validation using validation blocks
    • Mark sensitive values with sensitive = true
  4. Validate module:

python3 scripts/validate_module.py /path/to/module

This checks for:

  • Required files present
  • Variables have descriptions and types
  • Outputs have descriptions
  • README exists and is complete
  • Naming conventions followed
  • Sensitive values properly marked
  5. Test module:
cd examples/complete
terraform init
terraform plan
  6. Document module:
    • Use terraform-docs to auto-generate: terraform-docs markdown . > README.md
    • Include usage examples
    • Document all inputs and outputs
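
The module best practices listed above (descriptions, sensible defaults, validation blocks, and sensitive = true) can be sketched roughly as follows. This is a minimal illustration, not the contents of the module template: the variable names, the default CIDR, and the db.internal host are hypothetical.

```hcl
# variables.tf (illustrative; all names and values here are hypothetical)

variable "environment" {
  description = "Deployment environment the module is instantiated in."
  type        = string

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "The environment must be one of: dev, staging, prod."
  }
}

variable "vpc_cidr" {
  description = "CIDR block for the VPC."
  type        = string
  default     = "10.0.0.0/16" # sensible default for an optional input

  validation {
    # cidrhost() errors on a malformed CIDR string; can() converts that error to false.
    condition     = can(cidrhost(var.vpc_cidr, 0))
    error_message = "The vpc_cidr value must be a valid CIDR block."
  }
}

variable "db_password" {
  description = "Master password for the database."
  type        = string
  sensitive   = true # redacted in plan and apply output
}

# outputs.tf
output "db_connection_string" {
  description = "Connection string for the database (includes the password)."
  value       = "postgres://app:${var.db_password}@db.internal:5432/app"
  sensitive   = true # required because the value is derived from a sensitive input
}
```

Note that sensitive = true only hides values in CLI output; they are still written to state, which is one reason the state itself needs to be encrypted and access-controlled.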

Key Module Patterns:

See references/best_practices.md "Module Design" section for:

  • Composability patterns
  • Variable organization
  • Output design
  • Module versioning strategies

Environment Configuration

For environment-specific infrastructure:

  1. Structure by environment:
environments/
├── dev/
├── staging/
└── prod/
  2. Use consistent file organization:
environment/
├── main.tf           # Resource definitions
├── variables.tf      # Variable declarations
├── terraform.tfvars  # Default values (committed)
├── secrets.auto.tfvars  # Sensitive values (.gitignore)
├── backend.tf        # State configuration
├── outputs.tf        # Output values
└── versions.tf       # Version constraints
  3. Reference modules:
module "vpc" {
  source = "git::https://github.com/company/terraform-modules.git//vpc?ref=v1.2.0"
  
  name        = "${var.environment}-vpc"
  vpc_cidr    = var.vpc_cidr
  environment = var.environment
}
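
For the versions.tf file in the layout above, a minimal sketch might look like the following. The specific version numbers are assumptions chosen for illustration; pick constraints that match the Terraform and provider releases you actually test against.

```hcl
# versions.tf (illustrative constraints; the exact versions are assumptions)
terraform {
  # Accept any 1.x release from 1.5.0 onward, but not a future 2.0.
  required_version = ">= 1.5.0, < 2.0.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # any 5.x release, never 6.0
    }
  }
}
```

Pinning the module source with ?ref=v1.2.0, as in the module block above, plays the same role for modules that these constraints play for Terraform and providers.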

2. State Management & Inspection

When to inspect state:

  • Before major changes
  • Investigating drift
  • Debugging resource issues
  • Auditing infrastructure

Inspect state and check health:

python3 scripts/inspect_state.py /path/to/terraform/directory

Check for drift:

python3 scripts/inspect_state.py /path/to/terraform/directory --check-drift

The script provides:

  • Resource count and types
  • Backend configuration
  • Provider versions
  • Issues with resources (tainted, etc.)
  • Drift detection (if requested)

Manual state operations:

# List all resources
terraform state list

# Show specific resource
terraform state show aws_instance.web

# Remove from state (doesn't destroy)
terraform state rm aws_instance.web

# Move/rename resource
terraform state mv aws_instance.web aws_instance.web_server

# Import existing resource
terraform import aws_instance.web i-1234567890abcdef0
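
Recent Terraform releases also offer declarative counterparts to some of these commands, which can be reviewed in a plan like any other change. The sketch below mirrors the resource addresses used above; each block is an independent fragment and they are not meant to be applied together. The minimum versions noted in the comments reflect when each block type was introduced, to the best of my knowledge.

```hcl
# Each block stands alone; add only the one that matches the operation you need.

# Counterpart of: terraform state mv aws_instance.web aws_instance.web_server
# (Terraform 1.1 or later)
moved {
  from = aws_instance.web
  to   = aws_instance.web_server
}

# Counterpart of: terraform import aws_instance.web i-1234567890abcdef0
# (Terraform 1.5 or later; a resource "aws_instance" "web" block must exist
# in configuration or be generated during planning)
import {
  to = aws_instance.web
  id = "i-1234567890abcdef0"
}

# Counterpart of: terraform state rm aws_instance.web
# (Terraform 1.7 or later; delete the resource block from configuration as well)
removed {
  from = aws_instance.web

  lifecycle {
    destroy = false # forget the resource without destroying the real instance
  }
}
```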

State best practices: See references/best_practices.md "State Management" section for:

  • Remote backend setup (S3 + DynamoDB)
  • State file organization strategies
  • Encryption and security
  • Backup and recovery procedures
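
As a rough illustration of the S3 + DynamoDB remote backend mentioned above, a backend.tf could look like this. The bucket, key, region, and table names are placeholders, and the bucket and table are assumed to exist already.

```hcl
# backend.tf (bucket, key, region, and table names are placeholders)
terraform {
  backend "s3" {
    bucket  = "example-terraform-state"
    key     = "prod/vpc/terraform.tfstate"
    region  = "us-east-1"
    encrypt = true # server-side encryption for the state object

    # State locking; the table needs a string partition key named "LockID".
    dynamodb_table = "example-terraform-locks"
  }
}
```

Newer Terraform releases also support S3-native locking through the use_lockfile argument, so check the backend documentation for your version before settling on DynamoDB-based locking.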

3. Standard Terraform Workflow

# 1. Initialize (first time or after module changes)
terraform init

# 2. Format code
terraform fmt -recursive

# 3. Validate syntax
terraform validate

# 4. Plan changes (always review!)
terraform plan -out=tfplan

# 5. Apply changes
terraform apply tfplan

# 6. Verify outputs
terraform output

With Terragrunt:

# Run for single module
terragrunt plan
terragrunt apply

# Run for all modules in directory tree
terragrunt run-all plan
terragrunt run-all apply

4. Troubleshooting Issues

When encountering errors:

  1. Read the complete error message - Don't skip details

  2. Check common issues: See references/troubleshooting.md for:

    • State lock errors
    • State drift/corruption
    • Provider authentication failures
    • Resource errors (already exists, dependency errors, timeouts)
    • Module source issues
    • Terragrunt-specific issues (dependency cycles, hooks)
    • Performance problems
  3. Enable debug logging if needed:

export TF_LOG=DEBUG
export TF_LOG_PATH=terraform-debug.log
terraform plan
  4. Isolate the problem:
# Test specific resource
terraform plan -target=aws_instance.web
terraform apply -target=aws_instance.web
  5. Common quick fixes:

State locked:

# Verify that no one else is running Terraform against this state, then:
terraform force-unlock <lock-id>

Provider cache issues:

rm -rf .terraform
terraform init -upgrade

Module cache issues:

rm -rf .terraform/modules
terraform init

5. Code Review & Quality

Before committing:

  1. Format code:
terraform fmt -recursive
  2. Validate syntax:
terraform validate
  3. Lint with tflint:
tflint --module
  4. Security scan with checkov:
checkov -d .
  5. Validate modules:
python3 scripts/validate_module.py modules/vpc
  6. Generate documentation:
terraform-docs markdown modules/vpc > modules/vpc/README.md

Review checklist:

  • All variables have descriptions
  • Sensitive values marked as sensitive
  • Outputs have descriptions
  • Resources follow naming conventions
  • No hardcoded values (use variables)
  • README is complete and current
  • Examples directory exists and works
  • Version constraints specified
  • Security best practices followed

See references/best_practices.md for comprehensive guidelines.

Terragrunt Patterns

Project Structure

terragrunt-project/
├── terragrunt.hcl              # Root config
├── account.hcl                 # Account-level vars
├── region.hcl                  # Region-level vars
└── environments/
    ├── dev/
    │   ├── env.hcl            # Environment vars
    │   └── us-east-1/
    │       ├── vpc/
    │       │   └── terragrunt.hcl
    │       └── eks/
    │           └── terragrunt.hcl
    └── prod/
        └── us-east-1/
            ├── vpc/
            └── eks/

Dependency Management

# In eks/terragrunt.hcl
dependency "vpc" {
  config_path = "../vpc"
  
  # Mock outputs for plan/validate
  mock_outputs = {
    vpc_id             = "vpc-mock"
    private_subnet_ids = ["subnet-mock"] # key must match the output read in inputs below
  }
  mock_outputs_allowed_terraform_commands = ["validate", "plan"]
}

inputs = {
  vpc_id     = dependency.vpc.outputs.vpc_id
  subnet_ids = dependency.vpc.outputs.private_subnet_ids
}

Common Patterns

See assets/templates/MODULE_TEMPLATE.md for complete Terragrunt configuration templates including:

  • Root terragrunt.hcl with provider generation
  • Remote state configuration
  • Module-level terragrunt.hcl patterns
  • Dependency handling
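
To give a sense of what a root terragrunt.hcl with provider generation and remote state can look like, here is a minimal sketch. It is not the content of MODULE_TEMPLATE.md: the bucket, table, and region values are placeholders, and it assumes env.hcl defines a local named environment.

```hcl
# Root terragrunt.hcl (illustrative; bucket, table, and region are placeholders)

locals {
  # Load environment-level values from the nearest env.hcl above the calling module.
  env_vars    = read_terragrunt_config(find_in_parent_folders("env.hcl"))
  environment = local.env_vars.locals.environment # assumes env.hcl defines this local
}

# Generate backend.tf in every module so state configuration lives in one place.
remote_state {
  backend = "s3"

  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }

  config = {
    bucket         = "example-terraform-state"
    key            = "${path_relative_to_include()}/terraform.tfstate" # one state file per module
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "example-terraform-locks"
  }
}

# Generate provider.tf in every module.
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "aws" {
  region = "us-east-1"
}
EOF
}

# Inputs merged into every module that includes this file.
inputs = {
  environment = local.environment
}
```

A module-level terragrunt.hcl would typically pull this in with an include block whose path is find_in_parent_folders(), then add its own terraform source and inputs. Newer Terragrunt versions recommend naming the root file something other than terragrunt.hcl, so check the conventions for the version you run.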

Reference Documentation

references/best_practices.md

Comprehensive best practices covering:

  • Project Structure - Recommended directory layouts
  • State Management - Remote state, locking, organization
  • Module Design - Single responsibility, composability, versioning
  • Variable Management - Declarations, files hierarchy, secrets
  • Resource Naming - Conventions and standards
  • Security Practices - Least privilege, encryption, secret management
  • Testing & Validation - Tools and approaches
  • CI/CD Integration - Pipeline patterns

Read this when:

  • Setting up new Terraform projects
  • Establishing team standards
  • Designing reusable modules
  • Implementing security controls
  • Setting up CI/CD pipelines

references/troubleshooting.md

Detailed troubleshooting guide for:

  • State Issues - Lock errors, drift, corruption
  • Provider Issues - Version conflicts, authentication
  • Resource Errors - Already exists, dependencies, timeouts
  • Module Issues - Source not found, version conflicts
  • Terragrunt Specific - Dependency cycles, hooks
  • Performance Issues - Slow plans, optimization strategies

Read this when:

  • Encountering specific error messages
  • Investigating unexpected behavior
  • Debugging failed deployments
  • Performance tuning

Each issue includes:

  • Symptom description
  • Common causes
  • Step-by-step resolution
  • Prevention strategies

references/cost_optimization.md

Cloud cost optimization strategies for Terraform-managed infrastructure:

  • Right-Sizing Resources - Compute, database, and storage optimization
  • Spot and Reserved Instances - Cost-effective instance strategies
  • Storage Optimization - S3 lifecycle policies, EBS volume types
  • Networking Costs - VPC endpoints, data transfer optimization
  • Resource Lifecycle - Scheduled shutdown, cleanup automation
  • Cost Tagging - Comprehensive tagging for cost allocation
  • Monitoring and Alerts - Budget alerts, anomaly detection
  • Multi-Cloud - Azure, GCP cost optimization patterns
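
As one small example of the cost tagging idea, the AWS provider's default_tags block applies a common tag set to every taggable resource the provider creates. This is a sketch under assumptions: the tag keys and values are hypothetical, and it is not taken from references/cost_optimization.md.

```hcl
variable "environment" {
  description = "Deployment environment, used as a cost allocation tag."
  type        = string
}

provider "aws" {
  region = "us-east-1" # placeholder

  # Tags applied to all taggable resources managed through this provider.
  default_tags {
    tags = {
      Environment = var.environment
      CostCenter  = "platform" # hypothetical cost center
      ManagedBy   = "terraform"
    }
  }
}
```

Tags set on an individual resource override default tags with the same key, so per-resource exceptions remain possible.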

Read this when:

  • Planning infrastructure to minimize costs
  • Conducting cost reviews or optimization initiatives
  • Implementing auto-scaling and scheduling
  • Setting up cost monitoring and alerts
  • Designing cost-effective architectures

CI/CD Workflows

Ready-to-use CI/CD pipeline templates in assets/workflows/:

github-actions-terraform.yml

Complete GitHub Actions workflow including:

  • Terraform validation and formatting checks
  • TFLint linting
  • Checkov security scanning
  • Terraform plan on PRs with comment posting
  • Terraform apply on main branch with approval
  • OIDC authentication support

github-actions-terragrunt.yml

Terragrunt-specific workflow featuring:

  • Changed module detection
  • Multi-module parallel planning
  • Run-all commands
  • Dependency-aware apply ordering
  • Manual workflow dispatch with environment selection

gitlab-ci-terraform.yml

GitLab CI/CD pipeline with:

  • Multi-stage pipeline (validate, lint, security, plan, apply)
  • Artifact management
  • Manual deployment gates
  • Multi-environment configuration examples

Use these templates as starting points for your CI/CD pipelines. Customize based on your:

  • Cloud provider and authentication method
  • Repository structure
  • Team approval workflows
  • Environment promotion strategy

Scripts

init_module.py

Scaffolds a new Terraform module with proper structure and template files.

Usage:

# Create module in current directory
python3 scripts/init_module.py my-vpc

# Create in specific path
python3 scripts/init_module.py my-vpc --path ./modules

# Get JSON output
python3 scripts/init_module.py my-vpc --json

Creates:

  • main.tf - Resource definitions with TODO placeholders
  • variables.tf - Input variables with validation examples
  • outputs.tf - Output values with descriptions
  • versions.tf - Terraform and provider version constraints
  • README.md - Module documentation template
  • examples/complete/ - Complete usage example

Use when:

  • Starting a new Terraform module
  • Ensuring consistent module structure across the team
  • Quickly bootstrapping module development
  • Teaching module best practices

inspect_state.py

Comprehensive state inspection and health check.

Usage:

# Basic inspection
python3 scripts/inspect_state.py /path/to/terraform

# Include drift detection
python3 scripts/inspect_state.py /path/to/terraform --check-drift

Provides:

  • State health status
  • Resource counts and types
  • Provider versions
  • Backend configuration
  • Resource issues (tainted, etc.)
  • Configuration drift detection (optional)
  • Actionable recommendations

Use when:

  • Before major infrastructure changes
  • Investigating resource issues
  • Auditing infrastructure state
  • Detecting configuration drift

validate_module.py

Validates Terraform modules against best practices.

Usage:

python3 scripts/validate_module.py /path/to/module

Checks:

  • Required files present (main.tf, variables.tf, outputs.tf)
  • Variable descriptions and types
  • Output descriptions
  • Sensitive value handling
  • README completeness
  • Version constraints
  • Example configurations
  • Naming conventions
  • Hard-coded values that should be variables

Returns:

  • Issues (must fix)
  • Warnings (should fix)
  • Suggestions (consider)

Use when:

  • Creating new modules
  • Reviewing module code
  • Before releasing module versions
  • Establishing quality standards

Assets

templates/MODULE_TEMPLATE.md

Complete Terraform module template including:

  • File-by-file structure and examples
  • main.tf patterns
  • variables.tf with validation
  • outputs.tf best practices
  • versions.tf constraints
  • README.md template
  • Example usage configurations
  • Terragrunt configuration templates

Use this when:

  • Creating new modules from scratch
  • Standardizing module structure
  • Onboarding team members
  • Establishing module conventions

Quick Reference

Essential Commands

# Initialize
terraform init
terraform init -upgrade  # Update providers

# Validate
terraform validate
terraform fmt -recursive

# Plan
terraform plan
terraform plan -out=tfplan

# Apply
terraform apply
terraform apply tfplan
terraform apply -auto-approve  # CI/CD only

# State
terraform state list
terraform state show <resource>
terraform state rm <resource>
terraform state mv <old> <new>

# Import
terraform import <resource_address> <resource_id>

# Destroy
terraform destroy
terraform destroy -target=<resource>

# Outputs
terraform output
terraform output <output_name>

Terragrunt Commands

# Single module
terragrunt init
terragrunt plan
terragrunt apply

# All modules
terragrunt run-all plan
terragrunt run-all apply
terragrunt run-all destroy

# With specific modules
terragrunt run-all apply --terragrunt-include-dir vpc --terragrunt-include-dir eks

Best Practices Summary

Always:

  • Use remote state with locking
  • Plan before apply (review changes)
  • Pin Terraform and provider versions
  • Use modules for reusable components
  • Mark sensitive values as sensitive
  • Document everything
  • Test in non-production first

Never:

  • Commit secrets or credentials
  • Manually edit state files
  • Use root AWS credentials
  • Skip code review for production changes
  • Deploy without testing
  • Ignore security scan findings

Key Principles:

  • Infrastructure as code (everything in version control)
  • DRY (Don't Repeat Yourself) - use modules
  • Immutable infrastructure
  • Environment parity (dev/staging/prod similar)
  • Security by default
  • Document for future you