Initial commit
274
skills/cost-optimization/SKILL.md
Normal file
@@ -0,0 +1,274 @@
---
name: cost-optimization
description: Optimize cloud costs through resource rightsizing, tagging strategies, reserved instances, and spending analysis. Use when reducing cloud expenses, analyzing infrastructure costs, or implementing cost governance policies.
---

# Cloud Cost Optimization

Strategies and patterns for optimizing cloud costs across AWS, Azure, and GCP.

## Purpose

Implement systematic cost optimization strategies to reduce cloud spending while maintaining performance and reliability.

## When to Use

- Reduce cloud spending
- Right-size resources
- Implement cost governance
- Optimize multi-cloud costs
- Meet budget constraints

## Cost Optimization Framework

### 1. Visibility
- Implement cost allocation tags
- Use cloud cost management tools
- Set up budget alerts
- Create cost dashboards

### 2. Right-Sizing
- Analyze resource utilization
- Downsize over-provisioned resources
- Use auto-scaling
- Remove idle resources

### 3. Pricing Models
- Use reserved capacity
- Leverage spot/preemptible instances
- Implement savings plans
- Use committed use discounts

### 4. Architecture Optimization
- Use managed services
- Implement caching
- Optimize data transfer
- Use lifecycle policies

## AWS Cost Optimization

### Reserved Instances
```
Savings: Up to 72% vs On-Demand (Standard), up to 66% (Convertible)
Term: 1 or 3 years
Payment: All/Partial/No upfront
Flexibility: Standard or Convertible
```

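A rough way to judge whether a commitment is worth it is to compare it with running the same instance On-Demand for only part of the term. The symbols below are introduced here for illustration and do not come from AWS documentation: $p$ is the On-Demand hourly price, $H$ the hours in the term, $d$ the commitment discount, and $u$ the fraction of those hours the instance would actually run. The sketch assumes a flat price and ignores the timing of upfront payments.

$$
\underbrace{(1 - d)\,p\,H}_{\text{reserved cost}} \;<\; \underbrace{u\,p\,H}_{\text{On-Demand cost}}
\quad\Longleftrightarrow\quad u > 1 - d
$$

With a hypothetical discount of $d = 0.40$, the reservation pays off only if the instance runs more than 60% of the term; at $d = 0.72$ the break-even drops to 28%.
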
### Savings Plans
```
Compute Savings Plans: Up to 66% savings
  Applies to: EC2, Fargate, Lambda
  Flexible across: Instance families, sizes, regions, OS, tenancy
EC2 Instance Savings Plans: Up to 72% savings
  Applies to: EC2 only
  Committed to: One instance family in one region (flexible across sizes, OS, tenancy)
```

### Spot Instances
```
Savings: Up to 90% vs On-Demand
Best for: Batch jobs, CI/CD, stateless workloads
Risk: 2-minute interruption notice
Strategy: Mix with On-Demand for resilience
```

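One way to mix Spot with On-Demand is an Auto Scaling group with a mixed instances policy. The following is a minimal sketch, not a production configuration: the two variables are hypothetical inputs, the capacity numbers and instance types are placeholders, and the `price-capacity-optimized` strategy assumes a reasonably recent AWS provider (`capacity-optimized` is an older alternative).

```hcl
# Hypothetical inputs: supply a launch template and subnets from your own configuration.
variable "launch_template_id" {
  type        = string
  description = "ID of an existing EC2 launch template (assumed to exist)"
}

variable "subnet_ids" {
  type        = list(string)
  description = "Subnets the group may launch into"
}

resource "aws_autoscaling_group" "mixed" {
  name                = "mixed-spot-on-demand"
  min_size            = 2
  max_size            = 10
  desired_capacity    = 4
  vpc_zone_identifier = var.subnet_ids

  mixed_instances_policy {
    instances_distribution {
      # Always keep two On-Demand instances as a resilient baseline
      on_demand_base_capacity = 2
      # Above the baseline, 25% On-Demand and 75% Spot
      on_demand_percentage_above_base_capacity = 25
      spot_allocation_strategy                 = "price-capacity-optimized"
    }

    launch_template {
      launch_template_specification {
        launch_template_id = var.launch_template_id
        version            = "$Latest"
      }

      # Several similar instance types give Spot more capacity pools to draw from
      override {
        instance_type = "m5.large"
      }
      override {
        instance_type = "m5a.large"
      }
      override {
        instance_type = "m6i.large"
      }
    }
  }
}
```

Listing more than one instance type is the main design choice here: it lowers the chance that a single Spot pool running dry interrupts the whole group.
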
### S3 Cost Optimization
```hcl
resource "aws_s3_bucket_lifecycle_configuration" "example" {
  bucket = aws_s3_bucket.example.id

  rule {
    id     = "transition-to-ia"
    status = "Enabled"

    # An empty filter applies the rule to every object in the bucket
    filter {}

    # Move to Standard-IA after 30 days
    transition {
      days          = 30
      storage_class = "STANDARD_IA"
    }

    # Move to Glacier after 90 days
    transition {
      days          = 90
      storage_class = "GLACIER"
    }

    # Delete objects after one year
    expiration {
      days = 365
    }
  }
}
```

## Azure Cost Optimization

### Reserved VM Instances
- 1- or 3-year terms
- Up to 72% savings vs pay-as-you-go
- Instance size flexibility
- Exchangeable

### Azure Hybrid Benefit
- Use existing Windows Server and SQL Server licenses
- Up to 80% savings when combined with reserved instances
- Available for Windows Server and SQL Server

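In Terraform, Azure Hybrid Benefit for a Windows VM is typically enabled through the `license_type` argument. The sketch below assumes the `azurerm` provider is already configured; the three variables, the VM name, size, region, and image SKU are all placeholders chosen for illustration, and you are responsible for holding eligible licenses.

```hcl
# Hypothetical inputs: all three variables stand in for resources defined elsewhere.
variable "resource_group_name" {
  type = string
}

variable "network_interface_id" {
  type = string
}

variable "admin_password" {
  type      = string
  sensitive = true
}

resource "azurerm_windows_virtual_machine" "example" {
  name                  = "example-vm"
  resource_group_name   = var.resource_group_name
  location              = "eastus"
  size                  = "Standard_D2s_v5"
  admin_username        = "azureadmin"
  admin_password        = var.admin_password
  network_interface_ids = [var.network_interface_id]

  # Apply Azure Hybrid Benefit: use an existing Windows Server license
  # instead of paying the license-included rate
  license_type = "Windows_Server"

  os_disk {
    caching              = "ReadWrite"
    storage_account_type = "StandardSSD_LRS"
  }

  source_image_reference {
    publisher = "MicrosoftWindowsServer"
    offer     = "WindowsServer"
    sku       = "2022-datacenter-azure-edition"
    version   = "latest"
  }
}
```
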
### Azure Advisor Recommendations
- Right-size VMs
- Delete unused resources
- Use reserved capacity
- Optimize storage

## GCP Cost Optimization

### Committed Use Discounts
- 1- or 3-year commitment
- Up to 57% savings for most machine types
- Applies to vCPUs and memory
- Resource-based or spend-based

### Sustained Use Discounts
- Automatic discounts
- Up to 30% for instances that run most of the month
- No commitment required
- Applies to Compute Engine, GKE

### Preemptible VMs
- Up to 80% savings
- 24-hour maximum runtime
- Can be reclaimed at any time on short notice
- Best for batch workloads
- Spot VMs, the newer offering, have no maximum runtime

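A minimal sketch of requesting a Spot VM with the Google provider is shown below. It assumes the provider and project are configured elsewhere; the instance name, machine type, zone, image, and use of the `default` network are placeholders for illustration only.

```hcl
resource "google_compute_instance" "batch_worker" {
  name         = "batch-worker"
  machine_type = "e2-standard-4"
  zone         = "us-central1-a"

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-12"
    }
  }

  network_interface {
    network = "default"
  }

  scheduling {
    # Spot provisioning: discounted capacity that Compute Engine may reclaim
    provisioning_model = "SPOT"
    preemptible        = true
    # Reclaimed instances are not restarted automatically
    automatic_restart = false
    # Stop (rather than delete) the instance when it is preempted
    instance_termination_action = "STOP"
  }
}
```

Because the instance can disappear at any time, this pattern fits jobs that checkpoint their progress or can simply be retried.
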
## Tagging Strategy

### AWS Tagging
```hcl
locals {
  common_tags = {
    Environment = "production"
    Project     = "my-project"
    CostCenter  = "engineering"
    Owner       = "team@example.com"
    ManagedBy   = "terraform"
  }
}

resource "aws_instance" "example" {
  ami           = "ami-12345678"
  instance_type = "t3.medium"

  tags = merge(
    local.common_tags,
    {
      Name = "web-server"
    }
  )
}
```

**Reference:** See `references/tagging-standards.md`

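Tags only show up in cost reports once they are applied consistently and activated for billing. As a sketch under stated assumptions, the provider-level `default_tags` block below applies a baseline to every taggable resource, and `aws_ce_cost_allocation_tag` activates one key for cost allocation. The region and tag values are placeholders; activation generally has to happen in the management (payer) account, and only after the tag key has appeared in billing data.

```hcl
provider "aws" {
  region = "us-east-1"

  # Applied to every taggable resource created through this provider;
  # resource-level tags with the same key take precedence
  default_tags {
    tags = {
      Environment = "production"
      CostCenter  = "engineering"
      ManagedBy   = "terraform"
    }
  }
}

# Make the CostCenter key available as a dimension in Cost Explorer and budgets
resource "aws_ce_cost_allocation_tag" "cost_center" {
  tag_key = "CostCenter"
  status  = "Active"
}
```
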
## Cost Monitoring

### Budget Alerts
```hcl
# AWS Budget
resource "aws_budgets_budget" "monthly" {
  name              = "monthly-budget"
  budget_type       = "COST"
  limit_amount      = "1000"
  limit_unit        = "USD"
  time_period_start = "2024-01-01_00:00"
  time_unit         = "MONTHLY"

  notification {
    comparison_operator        = "GREATER_THAN"
    threshold                  = 80
    threshold_type             = "PERCENTAGE"
    notification_type          = "ACTUAL"
    subscriber_email_addresses = ["team@example.com"]
  }
}
```

### Cost Anomaly Detection
- AWS Cost Anomaly Detection
- Azure Cost Management alerts
- GCP Budget alerts

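For AWS, anomaly detection can be set up alongside the budget. The sketch below creates a per-service monitor and a daily email digest; the resource names, the email address, and the 100 USD impact threshold are illustrative assumptions, not recommended values.

```hcl
# Watch spend per AWS service for unusual patterns
resource "aws_ce_anomaly_monitor" "services" {
  name              = "service-spend-monitor"
  monitor_type      = "DIMENSIONAL"
  monitor_dimension = "SERVICE"
}

# Email a daily summary of anomalies above the threshold
resource "aws_ce_anomaly_subscription" "daily" {
  name             = "daily-anomaly-digest"
  frequency        = "DAILY"
  monitor_arn_list = [aws_ce_anomaly_monitor.services.arn]

  subscriber {
    type    = "EMAIL"
    address = "team@example.com"
  }

  # Only report anomalies with a total impact of at least 100 USD
  threshold_expression {
    dimension {
      key           = "ANOMALY_TOTAL_IMPACT_ABSOLUTE"
      match_options = ["GREATER_THAN_OR_EQUAL"]
      values        = ["100"]
    }
  }
}
```
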
## Architecture Patterns

### Pattern 1: Serverless First
- Use Lambda/Functions for event-driven workloads
- Pay only for execution time
- Auto-scaling included
- No idle costs

### Pattern 2: Right-Sized Databases
```
Development: db.t3.small RDS
Staging: db.t3.large RDS
Production: db.r6g.2xlarge RDS with read replicas
```

### Pattern 3: Multi-Tier Storage
```
Hot data: S3 Standard
Warm data: S3 Standard-IA (after 30 days)
Cold data: S3 Glacier (after 90 days)
Archive: S3 Glacier Deep Archive (after 365 days)
```

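The tiers in Pattern 3 map onto a single lifecycle rule. This is a sketch that assumes an existing bucket passed in through a hypothetical variable; note that a bucket holds only one lifecycle configuration, so this would replace rather than sit next to any other `aws_s3_bucket_lifecycle_configuration` for the same bucket.

```hcl
variable "archive_bucket_id" {
  type        = string
  description = "Name of an existing bucket (hypothetical input)"
}

resource "aws_s3_bucket_lifecycle_configuration" "tiered" {
  bucket = var.archive_bucket_id

  rule {
    id     = "multi-tier-storage"
    status = "Enabled"

    # An empty filter applies the rule to every object in the bucket
    filter {}

    # Warm: Standard-IA after 30 days
    transition {
      days          = 30
      storage_class = "STANDARD_IA"
    }

    # Cold: Glacier after 90 days
    transition {
      days          = 90
      storage_class = "GLACIER"
    }

    # Archive: Glacier Deep Archive after 365 days
    transition {
      days          = 365
      storage_class = "DEEP_ARCHIVE"
    }
  }
}
```
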
### Pattern 4: Auto-Scaling
```hcl
# Add two instances when the alarm below fires
resource "aws_autoscaling_policy" "scale_up" {
  name                   = "scale-up"
  scaling_adjustment     = 2
  adjustment_type        = "ChangeInCapacity"
  cooldown               = 300
  autoscaling_group_name = aws_autoscaling_group.main.name
}

# Fire when average CPU exceeds 80% for two consecutive 60-second periods
resource "aws_cloudwatch_metric_alarm" "cpu_high" {
  alarm_name          = "cpu-high"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "CPUUtilization"
  namespace           = "AWS/EC2"
  period              = 60
  statistic           = "Average"
  threshold           = 80

  # Scope the metric to this Auto Scaling group's instances
  dimensions = {
    AutoScalingGroupName = aws_autoscaling_group.main.name
  }

  alarm_actions = [aws_autoscaling_policy.scale_up.arn]
}
```

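A scale-up policy alone only adds capacity; the savings come from scaling back in when load drops. One option, sketched here as an alternative rather than as the author's method, is a target tracking policy, which adds and removes instances to hold a metric near a target. The variable and the 60% target are assumptions for illustration.

```hcl
variable "asg_name" {
  type        = string
  description = "Name of an existing Auto Scaling group (hypothetical input)"
}

resource "aws_autoscaling_policy" "cpu_target" {
  name                   = "cpu-target-tracking"
  policy_type            = "TargetTrackingScaling"
  autoscaling_group_name = var.asg_name

  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ASGAverageCPUUtilization"
    }

    # Keep average CPU across the group near 60%, scaling out and in as needed
    target_value = 60
  }
}
```

With target tracking, the CloudWatch alarms are created and managed by the policy itself, so no separate alarm resource is needed.
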
## Cost Optimization Checklist

- [ ] Implement cost allocation tags
- [ ] Delete unused resources (EBS, EIPs, snapshots)
- [ ] Right-size instances based on utilization
- [ ] Use reserved capacity for steady workloads
- [ ] Implement auto-scaling
- [ ] Optimize storage classes
- [ ] Use lifecycle policies
- [ ] Enable cost anomaly detection
- [ ] Set budget alerts
- [ ] Review costs weekly
- [ ] Use spot/preemptible instances
- [ ] Optimize data transfer costs
- [ ] Implement caching layers
- [ ] Use managed services
- [ ] Monitor and optimize continuously

## Tools

- **AWS:** Cost Explorer, Cost Anomaly Detection, Compute Optimizer
- **Azure:** Cost Management, Advisor
- **GCP:** Cost Management, Recommender
- **Multi-cloud:** CloudHealth, Cloudability, Kubecost

## Reference Files

- `references/tagging-standards.md` - Tagging conventions
- `assets/cost-analysis-template.xlsx` - Cost analysis spreadsheet

## Related Skills

- `terraform-module-library` - For resource provisioning
- `multi-cloud-architecture` - For cloud selection