Initial commit
274
skills/cost-optimization/SKILL.md
Normal file
@@ -0,0 +1,274 @@
|
||||
---
|
||||
name: cost-optimization
|
||||
description: Optimize cloud costs through resource rightsizing, tagging strategies, reserved instances, and spending analysis. Use when reducing cloud expenses, analyzing infrastructure costs, or implementing cost governance policies.
|
||||
---
|
||||
|
||||
# Cloud Cost Optimization
|
||||
|
||||
Strategies and patterns for optimizing cloud costs across AWS, Azure, and GCP.
|
||||
|
||||
## Purpose
|
||||
|
||||
Implement systematic cost optimization strategies to reduce cloud spending while maintaining performance and reliability.
|
||||
|
||||
## When to Use
|
||||
|
||||
- Reduce cloud spending
|
||||
- Right-size resources
|
||||
- Implement cost governance
|
||||
- Optimize multi-cloud costs
|
||||
- Meet budget constraints
|
||||
|
||||
## Cost Optimization Framework
|
||||
|
||||
### 1. Visibility
|
||||
- Implement cost allocation tags
|
||||
- Use cloud cost management tools
|
||||
- Set up budget alerts
|
||||
- Create cost dashboards
|
||||
|
||||
### 2. Right-Sizing
|
||||
- Analyze resource utilization
|
||||
- Downsize over-provisioned resources
|
||||
- Use auto-scaling
|
||||
- Remove idle resources
|
||||
|
||||
### 3. Pricing Models
|
||||
- Use reserved capacity
|
||||
- Leverage spot/preemptible instances
|
||||
- Implement savings plans
|
||||
- Use committed use discounts
|
||||
|
||||
### 4. Architecture Optimization
|
||||
- Use managed services
|
||||
- Implement caching
|
||||
- Optimize data transfer
|
||||
- Use lifecycle policies
|
||||
|
||||
## AWS Cost Optimization
|
||||
|
||||
### Reserved Instances
|
||||
```
Savings: 30-72% vs On-Demand
Term: 1 or 3 years
Payment: All/Partial/No upfront
Flexibility: Standard or Convertible
```

### Savings Plans

```
Compute Savings Plans: Up to 66% savings
  Applies to: EC2, Fargate, Lambda
  Flexible across: Instance families, sizes, regions, OS, tenancy
EC2 Instance Savings Plans: Up to 72% savings
  Applies to: EC2 only, committed to one instance family in one region
  Flexible across: Sizes, OS, tenancy within that family
```

### Spot Instances

```
Savings: Up to 90% vs On-Demand
Best for: Batch jobs, CI/CD, stateless workloads
Risk: 2-minute interruption notice
Strategy: Mix with On-Demand for resilience
```
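
The "mix with On-Demand" strategy can be sketched as an Auto Scaling group with a mixed instances policy. This is a minimal sketch rather than a production configuration: `aws_launch_template.app`, `var.private_subnet_ids`, the capacity split, and the instance types are all assumptions chosen for illustration.

```hcl
# Hypothetical input: subnets the group may launch into
variable "private_subnet_ids" {
  type = list(string)
}

resource "aws_autoscaling_group" "mixed" {
  name                = "mixed-spot-on-demand"
  min_size            = 2
  max_size            = 10
  vpc_zone_identifier = var.private_subnet_ids

  mixed_instances_policy {
    instances_distribution {
      # Keep a fixed On-Demand baseline, then split additional capacity
      on_demand_base_capacity                  = 2
      on_demand_percentage_above_base_capacity = 25
      spot_allocation_strategy                 = "capacity-optimized"
    }

    launch_template {
      launch_template_specification {
        # aws_launch_template.app is assumed to be defined elsewhere
        launch_template_id = aws_launch_template.app.id
        version            = "$Latest"
      }

      # Several instance types widen the Spot pools the group can draw from
      override {
        instance_type = "m5.large"
      }
      override {
        instance_type = "m5a.large"
      }
    }
  }
}
```

With these numbers, the first two instances are always On-Demand, and roughly a quarter of any capacity beyond that stays On-Demand while the rest runs on Spot.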
|
||||
|
||||
### S3 Cost Optimization
|
||||
```hcl
resource "aws_s3_bucket_lifecycle_configuration" "example" {
  bucket = aws_s3_bucket.example.id

  rule {
    id     = "transition-to-ia"
    status = "Enabled"

    # Empty filter: the rule applies to every object in the bucket
    filter {}

    # Move objects to Standard-IA after 30 days
    transition {
      days          = 30
      storage_class = "STANDARD_IA"
    }

    # Move objects to Glacier after 90 days
    transition {
      days          = 90
      storage_class = "GLACIER"
    }

    # Delete objects after one year
    expiration {
      days = 365
    }
  }
}
```
|
||||
|
||||
## Azure Cost Optimization
|
||||
|
||||
### Reserved VM Instances
|
||||
- 1 or 3 year terms
|
||||
- Up to 72% savings
|
||||
- Flexible sizing
|
||||
- Exchangeable
|
||||
|
||||
### Azure Hybrid Benefit
|
||||
- Use existing Windows Server licenses
|
||||
- Up to 80% savings with RI
|
||||
- Available for Windows and SQL Server
|
||||
|
||||
### Azure Advisor Recommendations
|
||||
- Right-size VMs
|
||||
- Delete unused resources
|
||||
- Use reserved capacity
|
||||
- Optimize storage
|
||||
|
||||
## GCP Cost Optimization
|
||||
|
||||
### Committed Use Discounts
|
||||
- 1 or 3 year commitment
|
||||
- Up to 57% savings
|
||||
- Applies to vCPUs and memory
|
||||
- Resource-based or spend-based
|
||||
|
||||
### Sustained Use Discounts
|
||||
- Automatic discounts
|
||||
- Up to 30% for running instances
|
||||
- No commitment required
|
||||
- Applies to Compute Engine, GKE
|
||||
|
||||
### Preemptible VMs
|
||||
- Up to 80% savings
|
||||
- 24-hour maximum runtime (Spot VMs, the newer offering, have no fixed limit)
|
||||
- Best for batch workloads
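
A preemptible batch worker can be sketched as below. The instance name, machine type, zone, image, and network are placeholder assumptions, not recommendations.

```hcl
resource "google_compute_instance" "batch_worker" {
  name         = "batch-worker"
  machine_type = "e2-standard-4"
  zone         = "us-central1-a"

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-12"
    }
  }

  network_interface {
    network = "default"
  }

  scheduling {
    # Preemptible instances cannot restart automatically or live-migrate
    preemptible         = true
    automatic_restart   = false
    on_host_maintenance = "TERMINATE"
  }
}
```

Because the VM can be reclaimed at any time, the batch job itself should checkpoint its progress or be safe to rerun.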
|
||||
|
||||
## Tagging Strategy
|
||||
|
||||
### AWS Tagging
|
||||
```hcl
locals {
  common_tags = {
    Environment = "production"
    Project     = "my-project"
    CostCenter  = "engineering"
    Owner       = "team@example.com"
    ManagedBy   = "terraform"
  }
}

resource "aws_instance" "example" {
  ami           = "ami-12345678"
  instance_type = "t3.medium"

  tags = merge(
    local.common_tags,
    {
      Name = "web-server"
    }
  )
}
```
|
||||
|
||||
**Reference:** See `references/tagging-standards.md`
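
As an alternative to merging tags on every resource, the AWS provider's `default_tags` block can apply a baseline set automatically. A minimal sketch, assuming the region and tag values shown are placeholders:

```hcl
provider "aws" {
  region = "us-west-2"

  # Applied to every taggable resource created through this provider
  default_tags {
    tags = {
      Environment = "production"
      CostCenter  = "engineering"
      ManagedBy   = "terraform"
    }
  }
}
```

Tags set directly on a resource take precedence over `default_tags` when the keys collide, so resource-specific values such as `Name` can still be set per resource.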
|
||||
|
||||
## Cost Monitoring
|
||||
|
||||
### Budget Alerts
|
||||
```hcl
# AWS Budget
resource "aws_budgets_budget" "monthly" {
  name              = "monthly-budget"
  budget_type       = "COST"
  limit_amount      = "1000"
  limit_unit        = "USD"
  time_period_start = "2024-01-01_00:00"
  time_unit         = "MONTHLY"

  # Email the team once actual spend passes 80% of the limit
  notification {
    comparison_operator        = "GREATER_THAN"
    threshold                  = 80
    threshold_type             = "PERCENTAGE"
    notification_type          = "ACTUAL"
    subscriber_email_addresses = ["team@example.com"]
  }
}
```
|
||||
|
||||
### Cost Anomaly Detection
|
||||
- AWS Cost Anomaly Detection
|
||||
- Azure Cost Management alerts
|
||||
- GCP Budget alerts
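
For the AWS option, anomaly detection can also be managed in Terraform. The sketch below assumes a per-service monitor, a daily email digest, and a 100 USD impact threshold; all three are illustrative choices, and the email address is a placeholder.

```hcl
# Watch each AWS service for unusual spend
resource "aws_ce_anomaly_monitor" "services" {
  name              = "service-monitor"
  monitor_type      = "DIMENSIONAL"
  monitor_dimension = "SERVICE"
}

resource "aws_ce_anomaly_subscription" "daily" {
  name             = "daily-anomaly-digest"
  frequency        = "DAILY"
  monitor_arn_list = [aws_ce_anomaly_monitor.services.arn]

  subscriber {
    type    = "EMAIL"
    address = "team@example.com"
  }

  # Only report anomalies with a total impact of at least 100 USD (assumed threshold)
  threshold_expression {
    dimension {
      key           = "ANOMALY_TOTAL_IMPACT_ABSOLUTE"
      match_options = ["GREATER_THAN_OR_EQUAL"]
      values        = ["100"]
    }
  }
}
```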
|
||||
|
||||
## Architecture Patterns
|
||||
|
||||
### Pattern 1: Serverless First
|
||||
- Use Lambda/Functions for event-driven workloads
|
||||
- Pay only for execution time
|
||||
- Auto-scaling included
|
||||
- No idle costs
|
||||
|
||||
### Pattern 2: Right-Sized Databases
|
||||
```
Development: db.t3.small RDS
Staging: db.t3.large RDS
Production: db.r6g.2xlarge RDS with read replicas
```
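
One way to encode this sizing is a lookup keyed by environment. This is a sketch under assumptions: the variable, local, and output names are hypothetical, and the classes use the `db.` prefix that RDS instance classes carry.

```hcl
variable "environment" {
  description = "Deployment environment"
  type        = string

  validation {
    condition     = contains(["development", "staging", "production"], var.environment)
    error_message = "Environment must be development, staging, or production."
  }
}

locals {
  # One instance class per environment, following the sizing above
  db_instance_class = {
    development = "db.t3.small"
    staging     = "db.t3.large"
    production  = "db.r6g.2xlarge"
  }
}

output "db_instance_class" {
  description = "RDS instance class for the selected environment"
  value       = local.db_instance_class[var.environment]
}
```

Keeping the mapping in one place makes it harder for a development database to be provisioned at production size by accident.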
|
||||
|
||||
### Pattern 3: Multi-Tier Storage
|
||||
```
Hot data: S3 Standard
Warm data: S3 Standard-IA (after 30 days)
Cold data: S3 Glacier (after 90 days)
Archive: S3 Glacier Deep Archive (after 365 days)
```
|
||||
|
||||
### Pattern 4: Auto-Scaling
|
||||
```hcl
# Add two instances whenever the CPU alarm below fires
resource "aws_autoscaling_policy" "scale_up" {
  name                   = "scale-up"
  scaling_adjustment     = 2
  adjustment_type        = "ChangeInCapacity"
  cooldown               = 300
  autoscaling_group_name = aws_autoscaling_group.main.name
}

resource "aws_cloudwatch_metric_alarm" "cpu_high" {
  alarm_name          = "cpu-high"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "CPUUtilization"
  namespace           = "AWS/EC2"
  period              = 60
  statistic           = "Average"
  threshold           = 80
  alarm_actions       = [aws_autoscaling_policy.scale_up.arn]

  # Scope the metric to this Auto Scaling group
  dimensions = {
    AutoScalingGroupName = aws_autoscaling_group.main.name
  }
}
```
|
||||
|
||||
## Cost Optimization Checklist
|
||||
|
||||
- [ ] Implement cost allocation tags
|
||||
- [ ] Delete unused resources (EBS, EIPs, snapshots)
|
||||
- [ ] Right-size instances based on utilization
|
||||
- [ ] Use reserved capacity for steady workloads
|
||||
- [ ] Implement auto-scaling
|
||||
- [ ] Optimize storage classes
|
||||
- [ ] Use lifecycle policies
|
||||
- [ ] Enable cost anomaly detection
|
||||
- [ ] Set budget alerts
|
||||
- [ ] Review costs weekly
|
||||
- [ ] Use spot/preemptible instances
|
||||
- [ ] Optimize data transfer costs
|
||||
- [ ] Implement caching layers
|
||||
- [ ] Use managed services
|
||||
- [ ] Monitor and optimize continuously
|
||||
|
||||
## Tools
|
||||
|
||||
- **AWS:** Cost Explorer, Cost Anomaly Detection, Compute Optimizer
|
||||
- **Azure:** Cost Management, Advisor
|
||||
- **GCP:** Cost Management, Recommender
|
||||
- **Multi-cloud:** CloudHealth, Cloudability, Kubecost
|
||||
|
||||
## Reference Files
|
||||
|
||||
- `references/tagging-standards.md` - Tagging conventions
|
||||
- `assets/cost-analysis-template.xlsx` - Cost analysis spreadsheet
|
||||
|
||||
## Related Skills
|
||||
|
||||
- `terraform-module-library` - For resource provisioning
|
||||
- `multi-cloud-architecture` - For cloud selection
|
||||
226
skills/hybrid-cloud-networking/SKILL.md
Normal file
@@ -0,0 +1,226 @@
|
||||
---
|
||||
name: hybrid-cloud-networking
|
||||
description: Configure secure, high-performance connectivity between on-premises infrastructure and cloud platforms using VPN and dedicated connections. Use when building hybrid cloud architectures, connecting data centers to cloud, or implementing secure cross-premises networking.
|
||||
---
|
||||
|
||||
# Hybrid Cloud Networking
|
||||
|
||||
Configure secure, high-performance connectivity between on-premises and cloud environments using VPN, Direct Connect, and ExpressRoute.
|
||||
|
||||
## Purpose
|
||||
|
||||
Establish secure, reliable network connectivity between on-premises data centers and cloud providers (AWS, Azure, GCP).
|
||||
|
||||
## When to Use
|
||||
|
||||
- Connect on-premises to cloud
|
||||
- Extend datacenter to cloud
|
||||
- Implement hybrid active-active setups
|
||||
- Meet compliance requirements
|
||||
- Migrate to cloud gradually
|
||||
|
||||
## Connection Options
|
||||
|
||||
### AWS Connectivity
|
||||
|
||||
#### 1. Site-to-Site VPN
|
||||
- IPSec VPN over internet
|
||||
- Up to 1.25 Gbps per tunnel
|
||||
- Cost-effective for moderate bandwidth
|
||||
- Higher latency, internet-dependent
|
||||
|
||||
```hcl
# AWS side of the tunnel, attached to the VPC
resource "aws_vpn_gateway" "main" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "main-vpn-gateway"
  }
}

# On-premises VPN device
resource "aws_customer_gateway" "main" {
  bgp_asn    = 65000
  ip_address = "203.0.113.1"
  type       = "ipsec.1"
}

resource "aws_vpn_connection" "main" {
  vpn_gateway_id      = aws_vpn_gateway.main.id
  customer_gateway_id = aws_customer_gateway.main.id
  type                = "ipsec.1"
  static_routes_only  = false # use BGP for dynamic routing
}
```
|
||||
|
||||
#### 2. AWS Direct Connect
|
||||
- Dedicated network connection
|
||||
- 1 Gbps to 100 Gbps
|
||||
- Lower latency, consistent bandwidth
|
||||
- More expensive, setup time required
|
||||
|
||||
**Reference:** See `references/direct-connect.md`
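
The logical side of a Direct Connect setup can be sketched in Terraform as a Direct Connect gateway associated with the VPC's virtual private gateway. This sketch assumes `aws_vpn_gateway.main` from the VPN example is reused and omits the physical connection and virtual interface, which depend on the provider or colocation arrangement.

```hcl
resource "aws_dx_gateway" "main" {
  name            = "main-dx-gateway"
  amazon_side_asn = "64512"
}

# Associate the Direct Connect gateway with the VPC's virtual private gateway
resource "aws_dx_gateway_association" "main" {
  dx_gateway_id         = aws_dx_gateway.main.id
  associated_gateway_id = aws_vpn_gateway.main.id
}
```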
|
||||
|
||||
### Azure Connectivity
|
||||
|
||||
#### 1. Site-to-Site VPN
|
||||
```hcl
resource "azurerm_virtual_network_gateway" "vpn" {
  name                = "vpn-gateway"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name

  type     = "Vpn"
  vpn_type = "RouteBased"
  sku      = "VpnGw1"

  ip_configuration {
    name                          = "vnetGatewayConfig"
    public_ip_address_id          = azurerm_public_ip.vpn.id
    private_ip_address_allocation = "Dynamic"
    subnet_id                     = azurerm_subnet.gateway.id
  }
}
```
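
The gateway alone does not establish a tunnel; it still needs a description of the on-premises side and a connection between the two. A minimal sketch, assuming the on-premises device address and prefixes shown (both placeholders) and a pre-shared key supplied through a hypothetical `vpn_shared_key` variable:

```hcl
variable "vpn_shared_key" {
  description = "Pre-shared key for the IPsec tunnel"
  type        = string
  sensitive   = true
}

# Represents the on-premises VPN device and the prefixes behind it
resource "azurerm_local_network_gateway" "onprem" {
  name                = "onprem-gateway"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
  gateway_address     = "203.0.113.1"
  address_space       = ["10.0.0.0/8"]
}

resource "azurerm_virtual_network_gateway_connection" "onprem" {
  name                = "onprem-connection"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name

  type                       = "IPsec"
  virtual_network_gateway_id = azurerm_virtual_network_gateway.vpn.id
  local_network_gateway_id   = azurerm_local_network_gateway.onprem.id
  shared_key                 = var.vpn_shared_key
}
```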
|
||||
|
||||
#### 2. Azure ExpressRoute
|
||||
- Private connection via connectivity provider
|
||||
- Up to 100 Gbps
|
||||
- Low latency, high reliability
|
||||
- Premium add-on for global connectivity
|
||||
|
||||
### GCP Connectivity
|
||||
|
||||
#### 1. Cloud VPN
|
||||
- IPSec VPN (Classic or HA VPN)
|
||||
- HA VPN: 99.99% SLA
|
||||
- Up to 3 Gbps per tunnel
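
The GCP side of an HA VPN can be sketched as a gateway plus a Cloud Router for BGP. The network, names, region, and ASN below are assumptions for illustration; tunnels and BGP peers toward the on-premises device are omitted.

```hcl
resource "google_compute_network" "main" {
  name                    = "main-network"
  auto_create_subnetworks = false
}

resource "google_compute_ha_vpn_gateway" "main" {
  name    = "ha-vpn-gateway"
  region  = "us-central1"
  network = google_compute_network.main.id
}

# Cloud Router exchanges routes with the on-premises router over BGP
resource "google_compute_router" "vpn" {
  name    = "vpn-router"
  region  = "us-central1"
  network = google_compute_network.main.id

  bgp {
    asn = 64514 # private ASN, assumed
  }
}
```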
|
||||
|
||||
#### 2. Cloud Interconnect
|
||||
- Dedicated (10 Gbps, 100 Gbps)
|
||||
- Partner (50 Mbps to 50 Gbps)
|
||||
- Lower latency than VPN
|
||||
|
||||
## Hybrid Network Patterns
|
||||
|
||||
### Pattern 1: Hub-and-Spoke
|
||||
```
On-Premises Datacenter
         ↓
VPN/Direct Connect
         ↓
Transit Gateway (AWS) / vWAN (Azure)
         ↓
    ├─ Production VPC/VNet
    ├─ Staging VPC/VNet
    └─ Development VPC/VNet
```
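
On AWS, the hub in this pattern can be sketched with a transit gateway, one VPC attachment per spoke, and the on-premises VPN terminated on the hub. `aws_vpc.production` and `aws_subnet.production_private` are hypothetical resources assumed to exist elsewhere; `aws_customer_gateway.main` is taken from the Site-to-Site VPN example.

```hcl
resource "aws_ec2_transit_gateway" "hub" {
  description = "Hub for hybrid connectivity"
}

# One attachment per spoke VPC; only production is shown here
resource "aws_ec2_transit_gateway_vpc_attachment" "production" {
  transit_gateway_id = aws_ec2_transit_gateway.hub.id
  vpc_id             = aws_vpc.production.id
  subnet_ids         = aws_subnet.production_private[*].id
}

# Terminate the on-premises VPN on the transit gateway
# instead of on a per-VPC virtual private gateway
resource "aws_vpn_connection" "onprem" {
  transit_gateway_id  = aws_ec2_transit_gateway.hub.id
  customer_gateway_id = aws_customer_gateway.main.id
  type                = "ipsec.1"
}
```

Staging and development would each get their own attachment in the same way.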
|
||||
|
||||
### Pattern 2: Multi-Region Hybrid
|
||||
```
On-Premises
    ├─ Direct Connect → us-east-1
    └─ Direct Connect → us-west-2
              ↓
      Cross-Region Peering
```

### Pattern 3: Multi-Cloud Hybrid

```
On-Premises Datacenter
    ├─ Direct Connect → AWS
    ├─ ExpressRoute → Azure
    └─ Interconnect → GCP
```
|
||||
|
||||
## Routing Configuration
|
||||
|
||||
### BGP Configuration
|
||||
```
On-Premises Router:
- AS Number: 65000
- Advertise: 10.0.0.0/8

Cloud Router:
- AS Number: 64512 (AWS default), 65515 (Azure default)
- Advertise: Cloud VPC/VNet CIDRs
```
|
||||
|
||||
### Route Propagation
|
||||
- Enable route propagation on route tables
|
||||
- Use BGP for dynamic routing
|
||||
- Implement route filtering
|
||||
- Monitor route advertisements
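
On AWS, enabling route propagation for a route table can be sketched as follows. `aws_route_table.private` is a hypothetical route table assumed to exist; `aws_vpn_gateway.main` is the gateway from the Site-to-Site VPN example.

```hcl
# Routes learned over BGP from the VPN gateway are added
# to this route table automatically
resource "aws_vpn_gateway_route_propagation" "private" {
  vpn_gateway_id = aws_vpn_gateway.main.id
  route_table_id = aws_route_table.private.id
}
```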
|
||||
|
||||
## Security Best Practices
|
||||
|
||||
1. **Use private connectivity** (Direct Connect/ExpressRoute)
|
||||
2. **Implement encryption** for VPN tunnels
|
||||
3. **Use VPC endpoints** to avoid internet routing
|
||||
4. **Configure network ACLs** and security groups
|
||||
5. **Enable VPC Flow Logs** for monitoring
|
||||
6. **Implement DDoS protection**
|
||||
7. **Use PrivateLink/Private Endpoints**
|
||||
8. **Monitor connections** with CloudWatch/Azure Monitor
|
||||
9. **Implement redundancy** (dual tunnels)
|
||||
10. **Regular security audits**
|
||||
|
||||
## High Availability
|
||||
|
||||
### Dual VPN Tunnels
|
||||
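```hcl
# Two VPN connections, one per on-premises device, so that losing a
# single customer gateway does not drop connectivity. Each connection
# already provides two tunnels on the AWS side.
# aws_customer_gateway.primary and aws_customer_gateway.secondary are
# assumed to be defined elsewhere, one per on-premises device.
resource "aws_vpn_connection" "primary" {
  vpn_gateway_id      = aws_vpn_gateway.main.id
  customer_gateway_id = aws_customer_gateway.primary.id
  type                = "ipsec.1"
}

resource "aws_vpn_connection" "secondary" {
  vpn_gateway_id      = aws_vpn_gateway.main.id
  customer_gateway_id = aws_customer_gateway.secondary.id
  type                = "ipsec.1"
}
```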
|
||||
|
||||
### Active-Active Configuration
|
||||
- Multiple connections from different locations
|
||||
- BGP for automatic failover
|
||||
- Equal-cost multi-path (ECMP) routing
|
||||
- Monitor health of all connections
|
||||
|
||||
## Monitoring and Troubleshooting
|
||||
|
||||
### Key Metrics
|
||||
- Tunnel status (up/down)
|
||||
- Bytes in/out
|
||||
- Packet loss
|
||||
- Latency
|
||||
- BGP session status
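
Tunnel status can be turned into an alert with a CloudWatch alarm on the `TunnelState` metric. A minimal sketch, assuming `aws_vpn_connection.main` from the earlier example and a hypothetical SNS topic `aws_sns_topic.network_alerts`; the period and evaluation count are illustrative.

```hcl
resource "aws_cloudwatch_metric_alarm" "vpn_tunnel_down" {
  alarm_name          = "vpn-tunnel-down"
  namespace           = "AWS/VPN"
  metric_name         = "TunnelState"
  statistic           = "Minimum"
  period              = 300
  evaluation_periods  = 2
  comparison_operator = "LessThanThreshold"
  threshold           = 1
  alarm_actions       = [aws_sns_topic.network_alerts.arn]

  # Aggregate over both tunnels of this connection;
  # a value below 1 means at least one tunnel is down
  dimensions = {
    VpnId = aws_vpn_connection.main.id
  }
}
```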
|
||||
|
||||
### Troubleshooting
|
||||
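```bash
# AWS VPN: connection state and per-tunnel telemetry (VgwTelemetry)
aws ec2 describe-vpn-connections
aws ec2 describe-vpn-connections --query 'VpnConnections[].VgwTelemetry'

# Azure VPN ($VPN_CONNECTION and $RESOURCE_GROUP are placeholders)
az network vpn-connection show \
  --name "$VPN_CONNECTION" --resource-group "$RESOURCE_GROUP"
# Also requires --vendor, --device-family, and --firmware-version
az network vpn-connection show-device-config-script \
  --name "$VPN_CONNECTION" --resource-group "$RESOURCE_GROUP"
```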
|
||||
|
||||
## Cost Optimization
|
||||
|
||||
1. **Right-size connections** based on traffic
|
||||
2. **Use VPN for low-bandwidth** workloads
|
||||
3. **Consolidate traffic** through fewer connections
|
||||
4. **Minimize data transfer** costs
|
||||
5. **Use Direct Connect** for high bandwidth
|
||||
6. **Implement caching** to reduce traffic
|
||||
|
||||
## Reference Files
|
||||
|
||||
- `references/vpn-setup.md` - VPN configuration guide
|
||||
- `references/direct-connect.md` - Direct Connect setup
|
||||
|
||||
## Related Skills
|
||||
|
||||
- `multi-cloud-architecture` - For architecture decisions
|
||||
- `terraform-module-library` - For IaC implementation
|
||||
177
skills/multi-cloud-architecture/SKILL.md
Normal file
@@ -0,0 +1,177 @@
|
||||
---
|
||||
name: multi-cloud-architecture
|
||||
description: Design multi-cloud architectures using a decision framework to select and integrate services across AWS, Azure, and GCP. Use when building multi-cloud systems, avoiding vendor lock-in, or leveraging best-of-breed services from multiple providers.
|
||||
---
|
||||
|
||||
# Multi-Cloud Architecture
|
||||
|
||||
Decision framework and patterns for architecting applications across AWS, Azure, and GCP.
|
||||
|
||||
## Purpose
|
||||
|
||||
Design cloud-agnostic architectures and make informed decisions about service selection across cloud providers.
|
||||
|
||||
## When to Use
|
||||
|
||||
- Design multi-cloud strategies
|
||||
- Migrate between cloud providers
|
||||
- Select cloud services for specific workloads
|
||||
- Implement cloud-agnostic architectures
|
||||
- Optimize costs across providers
|
||||
|
||||
## Cloud Service Comparison
|
||||
|
||||
### Compute Services
|
||||
|
||||
| AWS | Azure | GCP | Use Case |
|-----|-------|-----|----------|
| EC2 | Virtual Machines | Compute Engine | IaaS VMs |
| ECS | Container Instances | Cloud Run | Containers |
| EKS | AKS | GKE | Kubernetes |
| Lambda | Functions | Cloud Functions | Serverless |
| Fargate | Container Apps | Cloud Run | Managed containers |

### Storage Services

| AWS | Azure | GCP | Use Case |
|-----|-------|-----|----------|
| S3 | Blob Storage | Cloud Storage | Object storage |
| EBS | Managed Disks | Persistent Disk | Block storage |
| EFS | Azure Files | Filestore | File storage |
| Glacier | Archive Storage | Archive Storage | Cold storage |

### Database Services

| AWS | Azure | GCP | Use Case |
|-----|-------|-----|----------|
| RDS | SQL Database | Cloud SQL | Managed SQL |
| DynamoDB | Cosmos DB | Firestore | NoSQL |
| Aurora | Azure Database for PostgreSQL/MySQL | Cloud Spanner | Distributed SQL |
| ElastiCache | Cache for Redis | Memorystore | Caching |
|
||||
|
||||
**Reference:** See `references/service-comparison.md` for complete comparison
|
||||
|
||||
## Multi-Cloud Patterns
|
||||
|
||||
### Pattern 1: Single Provider with DR
|
||||
|
||||
- Primary workload in one cloud
|
||||
- Disaster recovery in another
|
||||
- Database replication across clouds
|
||||
- Automated failover
|
||||
|
||||
### Pattern 2: Best-of-Breed
|
||||
|
||||
- Use best service from each provider
|
||||
- AI/ML on GCP
|
||||
- Enterprise apps on Azure
|
||||
- General compute on AWS
|
||||
|
||||
### Pattern 3: Geographic Distribution
|
||||
|
||||
- Serve users from nearest cloud region
|
||||
- Data sovereignty compliance
|
||||
- Global load balancing
|
||||
- Regional failover
|
||||
|
||||
### Pattern 4: Cloud-Agnostic Abstraction
|
||||
|
||||
- Kubernetes for compute
|
||||
- PostgreSQL for database
|
||||
- S3-compatible storage (MinIO)
|
||||
- Open source tools
|
||||
|
||||
## Cloud-Agnostic Architecture
|
||||
|
||||
### Use Cloud-Native Alternatives
|
||||
|
||||
- **Compute:** Kubernetes (EKS/AKS/GKE)
|
||||
- **Database:** PostgreSQL/MySQL (RDS/SQL Database/Cloud SQL)
|
||||
- **Message Queue:** Apache Kafka (MSK/Event Hubs/Confluent)
|
||||
- **Cache:** Redis (ElastiCache/Azure Cache/Memorystore)
|
||||
- **Object Storage:** S3-compatible API
|
||||
- **Monitoring:** Prometheus/Grafana
|
||||
- **Service Mesh:** Istio/Linkerd
|
||||
|
||||
### Abstraction Layers
|
||||
|
||||
```
Application Layer
        ↓
Infrastructure Abstraction (Terraform)
        ↓
Cloud Provider APIs
        ↓
AWS / Azure / GCP
```
|
||||
|
||||
## Cost Comparison
|
||||
|
||||
### Compute Pricing Factors
|
||||
|
||||
- **AWS:** On-demand, Reserved, Spot, Savings Plans
|
||||
- **Azure:** Pay-as-you-go, Reserved, Spot
|
||||
- **GCP:** On-demand, Committed use, Preemptible
|
||||
|
||||
### Cost Optimization Strategies
|
||||
|
||||
1. Use reserved/committed capacity (30-70% savings)
|
||||
2. Leverage spot/preemptible instances
|
||||
3. Right-size resources
|
||||
4. Use serverless for variable workloads
|
||||
5. Optimize data transfer costs
|
||||
6. Implement lifecycle policies
|
||||
7. Use cost allocation tags
|
||||
8. Monitor with cloud cost tools
|
||||
|
||||
**Reference:** See `references/multi-cloud-patterns.md`
|
||||
|
||||
## Migration Strategy
|
||||
|
||||
### Phase 1: Assessment
|
||||
- Inventory current infrastructure
|
||||
- Identify dependencies
|
||||
- Assess cloud compatibility
|
||||
- Estimate costs
|
||||
|
||||
### Phase 2: Pilot
|
||||
- Select pilot workload
|
||||
- Implement in target cloud
|
||||
- Test thoroughly
|
||||
- Document learnings
|
||||
|
||||
### Phase 3: Migration
|
||||
- Migrate workloads incrementally
|
||||
- Maintain dual-run period
|
||||
- Monitor performance
|
||||
- Validate functionality
|
||||
|
||||
### Phase 4: Optimization
|
||||
- Right-size resources
|
||||
- Implement cloud-native services
|
||||
- Optimize costs
|
||||
- Enhance security
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Use infrastructure as code** (Terraform/OpenTofu)
|
||||
2. **Implement CI/CD pipelines** for deployments
|
||||
3. **Design for failure** across clouds
|
||||
4. **Use managed services** when possible
|
||||
5. **Implement comprehensive monitoring**
|
||||
6. **Automate cost optimization**
|
||||
7. **Follow security best practices**
|
||||
8. **Document cloud-specific configurations**
|
||||
9. **Test disaster recovery** procedures
|
||||
10. **Train teams** on multiple clouds
|
||||
|
||||
## Reference Files
|
||||
|
||||
- `references/service-comparison.md` - Complete service comparison
|
||||
- `references/multi-cloud-patterns.md` - Architecture patterns
|
||||
|
||||
## Related Skills
|
||||
|
||||
- `terraform-module-library` - For IaC implementation
|
||||
- `cost-optimization` - For cost management
|
||||
- `hybrid-cloud-networking` - For connectivity
|
||||
249
skills/terraform-module-library/SKILL.md
Normal file
@@ -0,0 +1,249 @@
|
||||
---
|
||||
name: terraform-module-library
|
||||
description: Build reusable Terraform modules for AWS, Azure, and GCP infrastructure following infrastructure-as-code best practices. Use when creating infrastructure modules, standardizing cloud provisioning, or implementing reusable IaC components.
|
||||
---
|
||||
|
||||
# Terraform Module Library
|
||||
|
||||
Production-ready Terraform module patterns for AWS, Azure, and GCP infrastructure.
|
||||
|
||||
## Purpose
|
||||
|
||||
Create reusable, well-tested Terraform modules for common cloud infrastructure patterns across multiple cloud providers.
|
||||
|
||||
## When to Use
|
||||
|
||||
- Build reusable infrastructure components
|
||||
- Standardize cloud resource provisioning
|
||||
- Implement infrastructure as code best practices
|
||||
- Create multi-cloud compatible modules
|
||||
- Establish organizational Terraform standards
|
||||
|
||||
## Module Structure
|
||||
|
||||
```
terraform-modules/
├── aws/
│   ├── vpc/
│   ├── eks/
│   ├── rds/
│   └── s3/
├── azure/
│   ├── vnet/
│   ├── aks/
│   └── storage/
└── gcp/
    ├── vpc/
    ├── gke/
    └── cloud-sql/
```

## Standard Module Pattern

```
module-name/
├── main.tf          # Main resources
├── variables.tf     # Input variables
├── outputs.tf       # Output values
├── versions.tf      # Provider versions
├── README.md        # Documentation
├── examples/        # Usage examples
│   └── complete/
│       ├── main.tf
│       └── variables.tf
└── tests/           # Terratest files
    └── module_test.go
```
|
||||
|
||||
## AWS VPC Module Example
|
||||
|
||||
**main.tf:**
|
||||
```hcl
resource "aws_vpc" "main" {
  cidr_block           = var.cidr_block
  enable_dns_hostnames = var.enable_dns_hostnames
  enable_dns_support   = var.enable_dns_support

  tags = merge(
    {
      Name = var.name
    },
    var.tags
  )
}

# One private subnet per CIDR, placed in the availability zone at the same index
resource "aws_subnet" "private" {
  count             = length(var.private_subnet_cidrs)
  vpc_id            = aws_vpc.main.id
  cidr_block        = var.private_subnet_cidrs[count.index]
  availability_zone = var.availability_zones[count.index]

  tags = merge(
    {
      Name = "${var.name}-private-${count.index + 1}"
      Tier = "private"
    },
    var.tags
  )
}

# Created only when var.create_internet_gateway is true
resource "aws_internet_gateway" "main" {
  count  = var.create_internet_gateway ? 1 : 0
  vpc_id = aws_vpc.main.id

  tags = merge(
    {
      Name = "${var.name}-igw"
    },
    var.tags
  )
}
```
|
||||
|
||||
**variables.tf:**
|
||||
```hcl
variable "name" {
  description = "Name of the VPC"
  type        = string
}

variable "cidr_block" {
  description = "CIDR block for VPC"
  type        = string

  validation {
    condition     = can(regex("^([0-9]{1,3}\\.){3}[0-9]{1,3}/[0-9]{1,2}$", var.cidr_block))
    error_message = "CIDR block must be valid IPv4 CIDR notation."
  }
}

variable "availability_zones" {
  description = "List of availability zones"
  type        = list(string)
}

variable "private_subnet_cidrs" {
  description = "CIDR blocks for private subnets"
  type        = list(string)
  default     = []
}

variable "enable_dns_hostnames" {
  description = "Enable DNS hostnames in VPC"
  type        = bool
  default     = true
}

# Referenced in main.tf, so it must be declared here
variable "enable_dns_support" {
  description = "Enable DNS support in VPC"
  type        = bool
  default     = true
}

# Referenced in main.tf, so it must be declared here
variable "create_internet_gateway" {
  description = "Create an internet gateway for the VPC"
  type        = bool
  default     = true
}

variable "tags" {
  description = "Additional tags"
  type        = map(string)
  default     = {}
}
```
|
||||
|
||||
**outputs.tf:**
|
||||
```hcl
output "vpc_id" {
  description = "ID of the VPC"
  value       = aws_vpc.main.id
}

output "private_subnet_ids" {
  description = "IDs of private subnets"
  value       = aws_subnet.private[*].id
}

output "vpc_cidr_block" {
  description = "CIDR block of VPC"
  value       = aws_vpc.main.cidr_block
}
```
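
The module layout also lists a `versions.tf`. A minimal sketch of what it might contain for this VPC module follows; the Terraform version floor is an assumption, and the `~> 5.0` provider constraint is one reasonable choice rather than a requirement of the module.

```hcl
terraform {
  # Assumed minimum Terraform version for this module
  required_version = ">= 1.3"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # any 5.x release, but not 6.0
    }
  }
}
```

Pinning to a major version lets consumers pick up provider bug fixes while avoiding breaking changes from a new major release.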
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Use semantic versioning** for modules
|
||||
2. **Document all variables** with descriptions
|
||||
3. **Provide examples** in examples/ directory
|
||||
4. **Use validation blocks** for input validation
|
||||
5. **Output important attributes** for module composition
|
||||
6. **Pin provider versions** in versions.tf
|
||||
7. **Use locals** for computed values
|
||||
8. **Implement conditional resources** with count/for_each
|
||||
9. **Test modules** with Terratest
|
||||
10. **Tag all resources** consistently
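
Practices 4, 7, and 10 can be combined in a single sketch: a validation block that rejects incomplete tag sets and a local that computes the final tags once. The required keys and the `module_tags` name are assumptions for illustration, and this stricter `tags` variable would replace the permissive one in the VPC example rather than sit beside it.

```hcl
variable "tags" {
  description = "Tags applied to all resources in the module"
  type        = map(string)

  validation {
    # Every required key must be present in the supplied map
    condition = alltrue([
      for key in ["Environment", "CostCenter", "Owner"] : contains(keys(var.tags), key)
    ])
    error_message = "Tags must include Environment, CostCenter, and Owner."
  }
}

locals {
  # Computed once and reused by every resource in the module
  module_tags = merge(var.tags, { ManagedBy = "terraform" })
}
```

Resources in the module would then set `tags = local.module_tags` or merge it with resource-specific keys such as `Name`.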
|
||||
|
||||
## Module Composition
|
||||
|
||||
```hcl
module "vpc" {
  source = "../../modules/aws/vpc"

  name               = "production"
  cidr_block         = "10.0.0.0/16"
  availability_zones = ["us-west-2a", "us-west-2b", "us-west-2c"]

  private_subnet_cidrs = [
    "10.0.1.0/24",
    "10.0.2.0/24",
    "10.0.3.0/24"
  ]

  tags = {
    Environment = "production"
    ManagedBy   = "terraform"
  }
}

module "rds" {
  source = "../../modules/aws/rds"

  identifier     = "production-db"
  engine         = "postgres"
  engine_version = "15.3"
  instance_class = "db.t3.large"

  # Wire the database into the network created by the vpc module
  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnet_ids

  tags = {
    Environment = "production"
  }
}
```
|
||||
|
||||
## Reference Files
|
||||
|
||||
- `assets/vpc-module/` - Complete VPC module example
|
||||
- `assets/rds-module/` - RDS module example
|
||||
- `references/aws-modules.md` - AWS module patterns
|
||||
- `references/azure-modules.md` - Azure module patterns
|
||||
- `references/gcp-modules.md` - GCP module patterns
|
||||
|
||||
## Testing
|
||||
|
||||
```go
// tests/vpc_test.go
package test

import (
	"testing"

	"github.com/gruntwork-io/terratest/modules/terraform"
	"github.com/stretchr/testify/assert"
)

func TestVPCModule(t *testing.T) {
	terraformOptions := &terraform.Options{
		TerraformDir: "../examples/complete",
	}

	// Tear down the test infrastructure even if an assertion fails
	defer terraform.Destroy(t, terraformOptions)
	terraform.InitAndApply(t, terraformOptions)

	vpcID := terraform.Output(t, terraformOptions, "vpc_id")
	assert.NotEmpty(t, vpcID)
}
```
|
||||
|
||||
## Related Skills
|
||||
|
||||
- `multi-cloud-architecture` - For architectural decisions
|
||||
- `cost-optimization` - For cost-effective designs
|
||||
63
skills/terraform-module-library/references/aws-modules.md
Normal file
@@ -0,0 +1,63 @@
|
||||
# AWS Terraform Module Patterns
|
||||
|
||||
## VPC Module
|
||||
- VPC with public/private subnets
|
||||
- Internet Gateway and NAT Gateways
|
||||
- Route tables and associations
|
||||
- Network ACLs
|
||||
- VPC Flow Logs
|
||||
|
||||
## EKS Module
|
||||
- EKS cluster with managed node groups
|
||||
- IRSA (IAM Roles for Service Accounts)
|
||||
- Cluster autoscaler
|
||||
- VPC CNI configuration
|
||||
- Cluster logging
|
||||
|
||||
## RDS Module
|
||||
- RDS instance or cluster
|
||||
- Automated backups
|
||||
- Read replicas
|
||||
- Parameter groups
|
||||
- Subnet groups
|
||||
- Security groups
|
||||
|
||||
## S3 Module
|
||||
- S3 bucket with versioning
|
||||
- Encryption at rest
|
||||
- Bucket policies
|
||||
- Lifecycle rules
|
||||
- Replication configuration
|
||||
|
||||
## ALB Module
|
||||
- Application Load Balancer
|
||||
- Target groups
|
||||
- Listener rules
|
||||
- SSL/TLS certificates
|
||||
- Access logs
|
||||
|
||||
## Lambda Module
|
||||
- Lambda function
|
||||
- IAM execution role
|
||||
- CloudWatch Logs
|
||||
- Environment variables
|
||||
- VPC configuration (optional)
|
||||
|
||||
## Security Group Module
|
||||
- Reusable security group rules
|
||||
- Ingress/egress rules
|
||||
- Dynamic rule creation
|
||||
- Rule descriptions
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. Use AWS provider version ~> 5.0
|
||||
2. Enable encryption by default
|
||||
3. Use least-privilege IAM
|
||||
4. Tag all resources consistently
|
||||
5. Enable logging and monitoring
|
||||
6. Use KMS for encryption
|
||||
7. Implement backup strategies
|
||||
8. Use PrivateLink when possible
|
||||
9. Enable GuardDuty/SecurityHub
|
||||
10. Follow AWS Well-Architected Framework
|
||||