# Terraform Best Practices

Comprehensive guide to Terraform best practices for infrastructure as code.

## Table of Contents

1. [Project Structure](#project-structure)
2. [State Management](#state-management)
3. [Module Design](#module-design)
4. [Variable Management](#variable-management)
5. [Resource Naming](#resource-naming)
6. [Security Practices](#security-practices)
7. [Testing & Validation](#testing--validation)
8. [CI/CD Integration](#cicd-integration)
9. [Common Pitfalls to Avoid](#common-pitfalls-to-avoid)
10. [Quick Reference](#quick-reference)

---

## Project Structure

### Recommended Directory Layout

```
terraform-project/
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── outputs.tf
│   │   ├── terraform.tfvars
│   │   └── backend.tf
│   ├── staging/
│   └── prod/
├── modules/
│   ├── networking/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── outputs.tf
│   │   ├── versions.tf
│   │   └── README.md
│   ├── compute/
│   └── database/
├── global/
│   ├── iam/
│   └── dns/
└── README.md
```

### Key Principles

**Separate Environments**
- Use directories for each environment (dev, staging, prod)
- Each environment has its own state file
- Prevents accidental changes to wrong environment

**Reusable Modules**
- Common infrastructure patterns in modules/
- Modules are versioned and tested
- Used across multiple environments

**Global Resources**
- Resources shared across environments (IAM, DNS)
- Separate state for better isolation
- Carefully managed with extra review
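
As a rough illustration of how an environment directory might consume the shared modules in this layout, here is a minimal sketch of `environments/dev/main.tf`. The input names (`environment`, `vpc_cidr`, `subnet_ids`) and the `private_subnet_ids` output are assumptions about the modules, not something the layout itself guarantees.

```hcl
# environments/dev/main.tf (hypothetical wiring; module inputs and outputs are assumed)
module "networking" {
  source = "../../modules/networking"

  environment = "dev"
  vpc_cidr    = "10.10.0.0/16"
}

module "compute" {
  source = "../../modules/compute"

  environment = "dev"
  # Consume an output of the networking module instead of hardcoding subnet IDs
  subnet_ids = module.networking.private_subnet_ids
}
```

The `staging` and `prod` directories would call the same modules with different values, which keeps the environments structurally identical while their state stays separate.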

---

## State Management

### Remote State is Essential

**Why Remote State:**
- Team collaboration and locking
- State backup and versioning
- Secure credential handling
- Disaster recovery

**Recommended Backend: S3 + DynamoDB**

```hcl
terraform {
  backend "s3" {
    bucket         = "company-terraform-state"
    key            = "prod/networking/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"
    kms_key_id     = "arn:aws:kms:us-east-1:ACCOUNT:key/KEY_ID"
  }
}
```

**State Best Practices:**

1. **Enable Encryption**: Always encrypt state at rest
2. **Enable Versioning**: On S3 bucket for state recovery
3. **Use State Locking**: DynamoDB table prevents concurrent modifications
4. **Restrict Access**: IAM policies limiting who can read/write state
5. **Separate State Files**: Different states for different components
6. **Regular Backups**: Automated backups of state files
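
Several of the practices above (encryption, versioning, locking, restricted access) can be expressed in Terraform itself. The following is a minimal sketch, assuming the bucket and table names from the backend example and assuming these resources live in a separate bootstrap configuration, since a backend cannot create the bucket that stores its own state.

```hcl
resource "aws_s3_bucket" "state" {
  bucket = "company-terraform-state"
}

# Versioning allows recovery of an earlier state file
resource "aws_s3_bucket_versioning" "state" {
  bucket = aws_s3_bucket.state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Encrypt state at rest
resource "aws_s3_bucket_server_side_encryption_configuration" "state" {
  bucket = aws_s3_bucket.state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

# State should never be publicly readable
resource "aws_s3_bucket_public_access_block" "state" {
  bucket = aws_s3_bucket.state.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# The S3 backend expects a string hash key named "LockID"
resource "aws_dynamodb_table" "lock" {
  name         = "terraform-state-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```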

### State File Organization

**Bad - Single State:**
```
terraform.tfstate (contains everything)
```

**Good - Multiple States:**
```
networking/terraform.tfstate
compute/terraform.tfstate
database/terraform.tfstate
```

**Benefits:**
- Reduced blast radius
- Faster plan/apply operations
- Parallel team work
- Easier to understand and debug
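
Once state is split, components still need to share values. One common option is the `terraform_remote_state` data source, sketched below for a compute configuration reading the networking state. The bucket and key reuse the earlier backend example; `var.ami_id` and the `private_subnet_ids` output are assumed to exist.

```hcl
data "terraform_remote_state" "networking" {
  backend = "s3"

  config = {
    bucket = "company-terraform-state"
    key    = "prod/networking/terraform.tfstate"
    region = "us-east-1"
  }
}

resource "aws_instance" "app" {
  ami           = var.ami_id # hypothetical variable
  instance_type = "t3.micro"

  # Only root-level outputs of the networking configuration are visible here
  subnet_id = data.terraform_remote_state.networking.outputs.private_subnet_ids[0]
}
```

Reading remote state requires read access to the whole state file, so where that is too broad, publishing values through a parameter store or similar is a reasonable alternative.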

### State Management Commands

```bash
# List resources in state
terraform state list

# Show specific resource
terraform state show aws_instance.example

# Move resource to different address
terraform state mv aws_instance.old aws_instance.new

# Remove resource from state (doesn't destroy)
terraform state rm aws_instance.example

# Import existing resource
terraform import aws_instance.example i-1234567890abcdef0

# Pull state for inspection (read-only)
terraform state pull > state.json
```
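
Recent Terraform releases also offer declarative counterparts to some of these commands, which have the advantage of going through plan and code review. The blocks below are independent sketches rather than one configuration; `aws_instance.legacy` is a hypothetical address, and the version notes reflect when each block type was introduced.

```hcl
# Counterpart of `terraform state mv` (Terraform 1.1+)
moved {
  from = aws_instance.old
  to   = aws_instance.new
}

# Counterpart of `terraform import` (Terraform 1.5+);
# a matching resource "aws_instance" "example" block must exist
import {
  to = aws_instance.example
  id = "i-1234567890abcdef0"
}

# Counterpart of `terraform state rm` (Terraform 1.7+);
# the resource block for aws_instance.legacy must already be deleted
removed {
  from = aws_instance.legacy

  lifecycle {
    destroy = false # forget the resource without destroying it
  }
}
```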

---

## Module Design

### Module Structure

Every module should have:

```
module-name/
├── main.tf           # Primary resources
├── variables.tf      # Input variables
├── outputs.tf        # Output values
├── versions.tf       # Version constraints
├── README.md         # Documentation
└── examples/         # Usage examples
    └── complete/
        ├── main.tf
        └── variables.tf
```
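
For the `versions.tf` file in this layout, a minimal sketch might look like the following. The specific version numbers are illustrative assumptions; a module would normally declare the lowest versions it has actually been tested against.

```hcl
# versions.tf
terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 5.0"
    }
  }
}
```

Reusable modules usually declare only minimum versions and leave upper bounds and exact pins to the root configuration that calls them.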

### Module Best Practices

**1. Single Responsibility**
Each module should do one thing well:
- ✅ `vpc-module` creates VPC with subnets, route tables, NACLs
- ❌ `infrastructure` creates VPC, EC2, RDS, S3, everything

**2. Composability**
Modules should work together:
```hcl
module "vpc" {
  source = "./modules/vpc"
  cidr   = "10.0.0.0/16"
}

module "eks" {
  source     = "./modules/eks"
  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnet_ids
}
```

**3. Sensible Defaults**
```hcl
variable "instance_type" {
  type        = string
  description = "EC2 instance type"
  default     = "t3.micro" # Reasonable default
}

variable "enable_monitoring" {
  type        = bool
  description = "Enable detailed monitoring"
  default     = false # Cost-effective default
}
```

**4. Complete Documentation**

```hcl
variable "vpc_cidr" {
  type        = string
  description = "CIDR block for VPC. Must be a valid IPv4 CIDR."

  validation {
    condition     = can(cidrhost(var.vpc_cidr, 0))
    error_message = "Must be a valid IPv4 CIDR block."
  }
}
```

**5. Output Useful Values**

```hcl
output "vpc_id" {
  description = "ID of the VPC"
  value       = aws_vpc.main.id
}

output "private_subnet_ids" {
  description = "List of private subnet IDs for deploying workloads"
  value       = aws_subnet.private[*].id
}

output "nat_gateway_ips" {
  description = "Elastic IPs of NAT gateways for firewall whitelisting"
  value       = aws_eip.nat[*].public_ip
}
```

### Module Versioning

**Use Git Tags for Versioning:**
```hcl
module "vpc" {
  source = "git::https://github.com/company/terraform-modules.git//vpc?ref=v1.2.3"
  # Configuration...
}
```

**Semantic Versioning:**
- v1.0.0 → First stable release
- v1.1.0 → New features (backward compatible)
- v1.1.1 → Bug fixes
- v2.0.0 → Breaking changes
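
Semantic versioning pays off most with registry-hosted modules, where the `version` argument accepts constraints rather than a single tag (Git sources only support an exact `ref`). A minimal sketch, using the public `terraform-aws-modules/vpc/aws` module as a stand-in for whatever registry module a team actually uses:

```hcl
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0" # accepts any 5.x release, never 6.0

  name = "example"
  cidr = "10.0.0.0/16"
}
```

With this constraint, bug fixes and backward-compatible features are picked up on `terraform init -upgrade`, while a breaking 6.0 release requires a deliberate change to the constraint.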

---

## Variable Management

### Variable Declaration

**Always Include a Type and Description (and Validation Where It Applies):**
```hcl
variable "environment" {
  type        = string
  description = "Environment name (dev, staging, prod)"

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Environment must be dev, staging, or prod."
  }
}
```

### Variable Files Hierarchy

```
terraform.tfvars       # Default values (committed, no secrets)
dev.tfvars             # Dev overrides
prod.tfvars            # Prod overrides
secrets.auto.tfvars    # Auto-loaded (in .gitignore)
```

**Usage:**
```bash
terraform apply -var-file="prod.tfvars"
```
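
For illustration, a `prod.tfvars` file in this hierarchy might look like the sketch below. All values are hypothetical, and each name must match a `variable` block declared in the configuration.

```hcl
# prod.tfvars (hypothetical values)
environment    = "prod"
instance_type  = "m5.large"
instance_count = 3

tags = {
  CostCenter = "platform"
}
```

Terraform loads `terraform.tfvars` first, then any `*.auto.tfvars` files in lexical order, then `-var-file` and `-var` options in the order given on the command line, with later sources overriding earlier ones. That ordering is what lets `prod.tfvars` override the committed defaults.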

### Sensitive Variables

**Mark as Sensitive:**
```hcl
variable "database_password" {
  type        = string
  description = "Master password for database"
  sensitive   = true
}
```

`sensitive = true` redacts the value from plan and apply output. The value is still written to the state file in plaintext, so access to state must be restricted as well.

**Never commit secrets:**
```bash
# .gitignore
*.auto.tfvars
secrets.tfvars
# Uncomment the next line if terraform.tfvars contains secrets
# (.gitignore does not support trailing comments on a pattern line)
# terraform.tfvars
```

**Better: Use External Secret Management**
```hcl
data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "prod/database/master-password"
}

resource "aws_db_instance" "main" {
  # ...other required arguments omitted
  password = data.aws_secretsmanager_secret_version.db_password.secret_string
}
```
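
Secrets are often stored as a JSON document rather than a bare string. In that case the secret can be decoded once in a local value, as in this sketch. The secret name and its `username` and `password` keys are assumptions about how the secret was created.

```hcl
data "aws_secretsmanager_secret_version" "db" {
  secret_id = "prod/database/credentials" # hypothetical secret name
}

locals {
  # Assumes the secret holds JSON such as {"username": "...", "password": "..."}
  db_credentials = jsondecode(data.aws_secretsmanager_secret_version.db.secret_string)
}

resource "aws_db_instance" "main" {
  # ...other required arguments omitted
  username = local.db_credentials["username"]
  password = local.db_credentials["password"]
}
```

This keeps the secret out of version control, but the decoded value still ends up in Terraform state, which is one more reason to encrypt state and restrict access to it.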

### Variable Organization

**Group related variables:**
```hcl
# Network Configuration
variable "vpc_cidr" { }
variable "availability_zones" { }
variable "public_subnet_cidrs" { }
variable "private_subnet_cidrs" { }

# Application Configuration
variable "app_name" { }
variable "app_version" { }
variable "instance_count" { }

# Tagging
variable "tags" {
  type        = map(string)
  description = "Common tags for all resources"
  default     = {}
}
```

---

## Resource Naming

### Naming Conventions

**Terraform Resources (snake_case):**
```hcl
resource "aws_vpc" "main_vpc" { }
resource "aws_subnet" "public_subnet_az1" { }
resource "aws_instance" "web_server_01" { }
```

**AWS Resource Names (kebab-case):**
```hcl
resource "aws_s3_bucket" "logs" {
  bucket = "company-prod-application-logs"
  # company-{env}-{service}-{purpose}
}

resource "aws_instance" "web" {
  tags = {
    Name = "prod-web-server-01"
    # {env}-{service}-{type}-{number}
  }
}
```

### Naming Standards

**Pattern: `{company}-{environment}-{service}-{resource_type}`**

Examples:
- `acme-prod-api-alb`
- `acme-dev-workers-asg`
- `acme-staging-database-rds`

**Benefits:**
- Easy filtering in AWS console
- Clear ownership and purpose
- Consistent across environments
- Billing and cost tracking
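
One way to keep this pattern consistent is to build the prefix once in a local value and append the resource type where it is used. A minimal sketch, where the `company` and `service` variables are assumed inputs rather than something defined elsewhere in this guide:

```hcl
variable "company" { type = string }
variable "environment" { type = string }
variable "service" { type = string }

locals {
  # {company}-{environment}-{service}
  name_prefix = "${var.company}-${var.environment}-${var.service}"
}

resource "aws_lb" "api" {
  name = "${local.name_prefix}-alb" # e.g. acme-prod-api-alb
  # ...other arguments omitted
}
```

Some AWS resources impose length or character limits on names, so a shared prefix also gives one place to shorten or validate the pattern.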

---

## Security Practices

### 1. Principle of Least Privilege

```hcl
# Bad - Too permissive
resource "aws_iam_policy" "bad" {
  policy = jsonencode({
    Version   = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = "*"
      Resource = "*"
    }]
  })
}

# Good - Specific permissions
resource "aws_iam_policy" "good" {
  policy = jsonencode({
    Version   = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = [
        "s3:GetObject",
        "s3:PutObject"
      ]
      Resource = "arn:aws:s3:::my-bucket/*"
    }]
  })
}
```
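
The same scoped policy can also be written with the `aws_iam_policy_document` data source, which checks the policy structure at plan time instead of relying on hand-written JSON. This is a sketch of an alternative style, not a replacement for the example above; the policy name is hypothetical.

```hcl
data "aws_iam_policy_document" "bucket_rw" {
  statement {
    effect    = "Allow"
    actions   = ["s3:GetObject", "s3:PutObject"]
    resources = ["arn:aws:s3:::my-bucket/*"]
  }
}

resource "aws_iam_policy" "bucket_rw" {
  name   = "bucket-read-write" # hypothetical name
  policy = data.aws_iam_policy_document.bucket_rw.json
}
```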

### 2. Encryption Everywhere

```hcl
# Encrypt S3 buckets
resource "aws_s3_bucket" "secure" {
  bucket = "my-secure-bucket"
}

resource "aws_s3_bucket_server_side_encryption_configuration" "secure" {
  bucket = aws_s3_bucket.secure.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = aws_kms_key.bucket.arn
    }
  }
}

# Encrypt EBS volumes
resource "aws_instance" "secure" {
  root_block_device {
    encrypted = true
  }
}

# Encrypt RDS databases
resource "aws_db_instance" "secure" {
  storage_encrypted = true
  kms_key_id        = aws_kms_key.rds.arn
}
```
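
The examples above reference `aws_kms_key.bucket` without defining it. A minimal sketch of such a key, plus an account-level setting that encrypts new EBS volumes in the current region by default, could look like this. The description and deletion window are illustrative choices.

```hcl
resource "aws_kms_key" "bucket" {
  description             = "Key for S3 bucket encryption"
  enable_key_rotation     = true
  deletion_window_in_days = 30
}

# Encrypt all newly created EBS volumes in this region,
# even when a resource forgets to set `encrypted = true`
resource "aws_ebs_encryption_by_default" "this" {
  enabled = true
}
```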

### 3. Network Security

```hcl
# Restrictive security groups
resource "aws_security_group" "web" {
  name_prefix = "web-"

  # Only allow specific inbound
  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"] # Consider restricting further
  }

  # Explicit outbound
  egress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# Use private subnets for workloads
resource "aws_subnet" "private" {
  map_public_ip_on_launch = false # No public IPs
}
```
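
For tiers that should never be reachable from the internet, a security group can admit traffic from another security group instead of a CIDR range. A minimal sketch, assuming a `var.vpc_id` input and a hypothetical application port of 8080:

```hcl
resource "aws_security_group" "app" {
  name_prefix = "app-"
  vpc_id      = var.vpc_id # assumed input

  ingress {
    description = "App traffic from the web tier only"
    from_port   = 8080 # hypothetical application port
    to_port     = 8080
    protocol    = "tcp"

    # Reference the web security group rather than an IP range
    security_groups = [aws_security_group.web.id]
  }
}
```

Because the rule follows group membership, instances can be replaced or scaled without touching the rule.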

### 4. Secret Management

**Never in Code:**
```hcl
# ❌ NEVER DO THIS
resource "aws_db_instance" "bad" {
  password = "MySecretPassword123" # NEVER!
}
```

**Use AWS Secrets Manager:**
```hcl
# ✅ CORRECT APPROACH
data "aws_secretsmanager_secret_version" "db" {
  secret_id = var.db_secret_arn
}

resource "aws_db_instance" "good" {
  password = data.aws_secretsmanager_secret_version.db.secret_string
}
```

### 5. Resource Tagging

```hcl
locals {
  common_tags = {
    Environment = var.environment
    ManagedBy   = "Terraform"
    Owner       = "platform-team"
    Project     = var.project_name
    CostCenter  = var.cost_center
  }
}

resource "aws_instance" "web" {
  tags = merge(
    local.common_tags,
    {
      Name = "web-server"
      Role = "webserver"
    }
  )
}
```
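
With the AWS provider, common tags can also be applied once at the provider level through `default_tags`, so individual resources only declare what is specific to them. A sketch, assuming a `var.region` input and the `local.common_tags` value defined above:

```hcl
provider "aws" {
  region = var.region # assumed input

  default_tags {
    tags = local.common_tags
  }
}

resource "aws_instance" "web" {
  # ...other arguments omitted

  # Merged with the provider's default tags; resource tags win on conflict
  tags = {
    Name = "web-server"
    Role = "webserver"
  }
}
```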

---

## Testing & Validation

### Pre-Deployment Validation

**1. Terraform Validate**
```bash
terraform validate
```
Checks syntax and internal consistency without contacting provider APIs or remote state.

**2. Terraform Plan**
```bash
terraform plan -out=tfplan
```
Review changes before applying; `terraform apply tfplan` then applies exactly the saved plan.

**3. tflint**
```bash
# TFLint v0.50+; older releases used the now-deprecated --module flag
tflint --call-module-type=all
```
Linter for catching errors and enforcing conventions.

**4. checkov**
```bash
checkov -d .
```
Security and compliance scanning.

**5. terraform-docs**
```bash
terraform-docs markdown . > README.md
```
Auto-generate documentation.

### Automated Testing

**Terratest (Go):**
```go
package test

import (
	"testing"

	"github.com/gruntwork-io/terratest/modules/terraform"
	"github.com/stretchr/testify/assert"
)

func TestVPCCreation(t *testing.T) {
	terraformOptions := terraform.WithDefaultRetryableErrors(t, &terraform.Options{
		TerraformDir: "../examples/complete",
	})

	// Destroy the test infrastructure even if an assertion fails
	defer terraform.Destroy(t, terraformOptions)
	terraform.InitAndApply(t, terraformOptions)

	vpcId := terraform.Output(t, terraformOptions, "vpc_id")
	assert.NotEmpty(t, vpcId)
}
```
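
Terraform 1.6 and later also ship a native test framework driven by `.tftest.hcl` files and the `terraform test` command. The sketch below assumes it sits in the VPC module's `tests/` directory and that the module defines `aws_vpc.main` and the validated `vpc_cidr` variable shown earlier; the file name is hypothetical.

```hcl
# tests/vpc.tftest.hcl
variables {
  vpc_cidr = "10.0.0.0/16"
}

run "vpc_uses_requested_cidr" {
  command = plan # plan only, nothing is created

  assert {
    condition     = aws_vpc.main.cidr_block == "10.0.0.0/16"
    error_message = "VPC CIDR does not match the vpc_cidr input."
  }
}

run "rejects_invalid_cidr" {
  command = plan

  variables {
    vpc_cidr = "not-a-cidr"
  }

  # The run passes only if the variable's validation block fails
  expect_failures = [var.vpc_cidr]
}
```

Plan-mode runs still need provider credentials unless providers are mocked. They are much faster than Terratest because nothing is provisioned, so the two approaches tend to complement each other.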

---

## CI/CD Integration

### GitHub Actions Example

```yaml
name: Terraform

on:
  pull_request:
    branches: [main]
  push:
    branches: [main]

jobs:
  terraform:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2

      - name: Terraform Init
        run: terraform init

      - name: Terraform Validate
        run: terraform validate

      - name: Terraform Plan
        run: terraform plan -no-color
        if: github.event_name == 'pull_request'

      - name: Terraform Apply
        run: terraform apply -auto-approve
        if: github.event_name == 'push' && github.ref == 'refs/heads/main'
```

### Best Practices for CI/CD

1. **Always run plan on PRs** - Review changes before merge
2. **Require approvals** - Human review for production
3. **Use workspaces or directories** - Separate pipeline per environment
4. **Store state remotely** - S3 backend with locking
5. **Use credential management** - OIDC or IAM roles, never store credentials
6. **Run security scans** - checkov, tfsec in pipeline
7. **Tag releases** - Version your infrastructure code
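
To make item 5 concrete, here is a sketch of an IAM role that GitHub Actions can assume through OIDC, so the pipeline needs no stored access keys. It assumes the GitHub OIDC provider is already registered in the AWS account; the repository `company/infrastructure` and the role name are hypothetical.

```hcl
# Look up the already-registered GitHub OIDC provider
data "aws_iam_openid_connect_provider" "github" {
  url = "https://token.actions.githubusercontent.com"
}

data "aws_iam_policy_document" "ci_trust" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [data.aws_iam_openid_connect_provider.github.arn]
    }

    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:aud"
      values   = ["sts.amazonaws.com"]
    }

    # Only workflows running on main in this repository may assume the role
    condition {
      test     = "StringLike"
      variable = "token.actions.githubusercontent.com:sub"
      values   = ["repo:company/infrastructure:ref:refs/heads/main"]
    }
  }
}

resource "aws_iam_role" "terraform_ci" {
  name               = "terraform-ci" # hypothetical name
  assume_role_policy = data.aws_iam_policy_document.ci_trust.json
}
```

The role still needs permission policies attached, scoped to what the pipeline manages, and the workflow needs `id-token: write` permission to request the OIDC token.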

---

## Common Pitfalls to Avoid

### 1. Not Using Remote State
- ❌ Local state doesn't work for teams
- ✅ Use S3, Terraform Cloud, or other remote backend

### 2. Hardcoding Values
- ❌ `region = "us-east-1"` in every resource
- ✅ Use variables and locals
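
As a small illustration of the second pitfall, the region can be declared once as a variable and set on the provider, so resources inherit it. The variable name and default below are assumptions.

```hcl
variable "region" {
  type        = string
  description = "AWS region to deploy into"
  default     = "us-east-1"
}

provider "aws" {
  region = var.region
}
```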

### 3. Not Using Modules
- ❌ Copying code between environments
- ✅ Create reusable modules

### 4. Ignoring State
- ❌ Manually modifying infrastructure
- ✅ All changes through Terraform

### 5. Poor Naming
- ❌ `resource "aws_instance" "i1" { }`
- ✅ `resource "aws_instance" "web_server_01" { }`

### 6. No Documentation
- ❌ No README, no comments
- ✅ Document everything

### 7. Massive State Files
- ❌ Single state for entire infrastructure
- ✅ Break into logical components

### 8. No Testing
- ❌ Apply directly to production
- ✅ Test in dev/staging first

---

## Quick Reference

### Essential Commands
```bash
# Initialize
terraform init

# Validate configuration
terraform validate

# Format code
terraform fmt -recursive

# Plan changes
terraform plan

# Apply changes
terraform apply

# Destroy resources
terraform destroy

# Show current state
terraform show

# List resources
terraform state list

# Output values
terraform output
```

### Useful Flags
```bash
# Plan without color
terraform plan -no-color

# Apply without prompts
terraform apply -auto-approve

# Destroy specific resource
terraform destroy -target=aws_instance.example

# Use specific var file
terraform apply -var-file="prod.tfvars"

# Set variable via CLI
terraform apply -var="environment=prod"
```