Initial commit
360
commands/cost-analyze.md
Normal file
@@ -0,0 +1,360 @@
# /specweave-cost-optimizer:cost-analyze

Analyze cloud infrastructure costs and identify optimization opportunities across AWS, Azure, and GCP.

You are an expert FinOps engineer who performs comprehensive cost analysis for cloud infrastructure.

## Your Task

Perform deep cost analysis of cloud resources and generate actionable optimization recommendations.

### 1. Cost Analysis Scope

**Multi-Cloud Support**:
- AWS (EC2, Lambda, S3, RDS, DynamoDB, ECS/EKS, CloudFront)
- Azure (VMs, Functions, Storage, SQL, Cosmos DB, AKS, CDN)
- GCP (Compute Engine, Cloud Functions, Cloud Storage, Cloud SQL, GKE, Cloud CDN)

**Analysis Dimensions**:
- Resource utilization vs capacity
- Reserved vs on-demand pricing
- Right-sizing opportunities
- Idle resource detection
- Storage lifecycle policies
- Data transfer costs
- Region pricing differences

### 2. Data Collection Methods

**AWS Cost Explorer**:
```bash
# Get cost and usage data for January 2025
# (the End date is exclusive, so use the first day of the following month)
aws ce get-cost-and-usage \
  --time-period Start=2025-01-01,End=2025-02-01 \
  --granularity DAILY \
  --metrics BlendedCost \
  --group-by Type=DIMENSION,Key=SERVICE

# Get right-sizing recommendations
aws ce get-rightsizing-recommendation \
  --service AmazonEC2 \
  --page-size 100
```

**Azure Cost Management**:
```bash
# Get cost details
az consumption usage list \
  --start-date 2025-01-01 \
  --end-date 2025-01-31

# Get advisor recommendations
az advisor recommendation list \
  --category Cost
```

**GCP Billing API**:
```sql
-- Export billing data to BigQuery first, then query the export table:
SELECT
  service.description AS service_name,
  SUM(cost) AS total_cost
FROM `project.dataset.gcp_billing_export`
WHERE _PARTITIONDATE >= '2025-01-01'
GROUP BY service_name
ORDER BY total_cost DESC
```

### 3. Analysis Framework

**Step 1: Resource Inventory**
- List all compute instances (EC2, VMs, Compute Engine)
- Identify database resources (RDS, SQL, Cloud SQL)
- Catalog storage (S3, Blob, Cloud Storage)
- Map serverless functions (Lambda, Functions, Cloud Functions)
- Document networking (Load Balancers, NAT Gateways, VPN)

**Step 2: Utilization Analysis**
```typescript
interface ResourceUtilization {
  resourceId: string;
  resourceType: string;
  cpu: {
    average: number;
    peak: number;
    p95: number;
  };
  memory: {
    average: number;
    peak: number;
    p95: number;
  };
  recommendation: 'downsize' | 'rightsize' | 'optimal' | 'upsize';
}

// Example thresholds (percent utilization)
const THRESHOLDS = {
  cpu: {
    idle: 5,       // < 5% CPU = idle
    underused: 20, // < 20% CPU = over-provisioned (downsize candidate)
    optimal: 70,   // 20-70% = optimal
    overused: 85,  // > 85% = needs upsize
  },
  memory: {
    idle: 10,
    underused: 30,
    optimal: 75,
    overused: 90,
  },
};
```
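
One way to turn these thresholds into the `recommendation` field is sketched below. The decision policy is an assumption for illustration, not something this command prescribes: it judges on p95 utilization, upsizes when either dimension is overused, downsizes only when both are underused, and treats the gap between the optimal and overused bands as `rightsize`. `BANDS` and `recommend` are hypothetical names.

```typescript
type Recommendation = 'downsize' | 'rightsize' | 'optimal' | 'upsize';

interface Band {
  idle: number;
  underused: number;
  optimal: number;
  overused: number;
}

// Same numbers as the example thresholds (percent utilization).
const BANDS: { cpu: Band; memory: Band } = {
  cpu: { idle: 5, underused: 20, optimal: 70, overused: 85 },
  memory: { idle: 10, underused: 30, optimal: 75, overused: 90 },
};

// Assumed policy: decide on p95 values for CPU and memory.
function recommend(cpuP95: number, memP95: number): Recommendation {
  // Either dimension above its overused threshold: needs a larger size.
  if (cpuP95 > BANDS.cpu.overused || memP95 > BANDS.memory.overused) {
    return 'upsize';
  }
  // Both dimensions underused: safe candidate for a smaller size.
  if (cpuP95 < BANDS.cpu.underused && memP95 < BANDS.memory.underused) {
    return 'downsize';
  }
  // Both dimensions at or below the top of the optimal band.
  if (cpuP95 <= BANDS.cpu.optimal && memP95 <= BANDS.memory.optimal) {
    return 'optimal';
  }
  // Between optimal and overused: review the instance family or size.
  return 'rightsize';
}
```

Requiring both CPU and memory to be underused before downsizing is deliberately conservative: a memory-bound service with low CPU would otherwise be shrunk into swapping.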

**Step 3: Cost Breakdown**
```typescript
interface CostBreakdown {
  total: number;
  byService: Record<string, number>;
  byEnvironment: Record<string, number>;
  byTeam: Record<string, number>;
  trends: {
    mom: number; // month-over-month %
    yoy: number; // year-over-year %
  };
}
```
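
A breakdown of this shape could be assembled from billing line items roughly as follows. This is a minimal sketch: the `LineItem` shape and the helpers `sumBy`, `percentChange`, and `buildBreakdown` are hypothetical, and reporting 0% when there is no prior total is an assumed convention.

```typescript
// Hypothetical input shape: one row per billed line item.
interface LineItem {
  service: string;
  environment: string;
  team: string;
  cost: number;
}

// Sum costs grouped by one of the string dimensions.
function sumBy(
  items: LineItem[],
  key: 'service' | 'environment' | 'team',
): Record<string, number> {
  const out: Record<string, number> = {};
  for (const item of items) {
    out[item[key]] = (out[item[key]] || 0) + item.cost;
  }
  return out;
}

// Percentage change from a previous total to the current one.
function percentChange(current: number, previous: number): number {
  if (previous === 0) return 0; // assumption: no baseline means 0%
  return ((current - previous) / previous) * 100;
}

function buildBreakdown(
  items: LineItem[],
  lastMonthTotal: number,
  lastYearTotal: number,
) {
  const total = items.reduce((sum, item) => sum + item.cost, 0);
  return {
    total,
    byService: sumBy(items, 'service'),
    byEnvironment: sumBy(items, 'environment'),
    byTeam: sumBy(items, 'team'),
    trends: {
      mom: percentChange(total, lastMonthTotal),
      yoy: percentChange(total, lastYearTotal),
    },
  };
}
```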

### 4. Optimization Opportunities

**Compute Optimization**:
- **Idle Resources**: Instances with < 5% CPU for 7+ days
- **Right-sizing**: Over-provisioned instances (< 20% utilization)
- **Reserved Instances**: Steady-state workloads (> 70% usage)
- **Spot/Preemptible**: Fault-tolerant, stateless workloads
- **Auto-scaling**: Variable workloads with predictable patterns
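
The idle rule in the list above (< 5% CPU for 7+ days) can be checked against daily averages with something like the sketch below. The function name `isIdle` and the choice to look only at the most recent days are assumptions for illustration.

```typescript
// Returns true when each of the most recent `minDays` daily CPU averages
// (in percent) is below `idlePercent`.
function isIdle(
  dailyCpuAvg: number[],
  idlePercent: number = 5,
  minDays: number = 7,
): boolean {
  if (dailyCpuAvg.length < minDays) {
    return false; // not enough history to call the instance idle
  }
  const recent = dailyCpuAvg.slice(-minDays);
  return recent.every((cpu) => cpu < idlePercent);
}
```

Using per-day averages rather than one average over the whole week avoids flagging an instance that runs a heavy batch job one day a week.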

**Storage Optimization**:
- **Lifecycle Policies**: Move to cheaper tiers (S3 IA, Glacier, Archive)
- **Compression**: Enable compression for text/logs
- **Deduplication**: Remove duplicate data
- **Snapshots**: Delete old AMIs, EBS snapshots, disk snapshots
- **Data Transfer**: Use CDN, optimize cross-region transfers

**Database Optimization**:
- **Right-sizing**: Analyze IOPS, connections, memory usage
- **Reserved Capacity**: RDS/SQL Reserved Instances
- **Serverless Options**: Aurora Serverless, Cosmos DB serverless
- **Read Replicas**: Offload read traffic
- **Backup Retention**: Optimize backup storage costs

**Serverless Optimization**:
- **Memory Allocation**: Lambda/Functions memory vs execution time
- **Concurrency**: Optimize for cold starts vs cost
- **VPC Configuration**: Avoid VPC Lambda unless needed (adds NAT costs)
- **Invocation Patterns**: Batch vs streaming, sync vs async

### 5. Savings Calculations

**Reserved Instance Savings**:
```typescript
interface RISavings {
  currentOnDemandCost: number;
  riCost: number;
  upfrontCost: number;
  monthlySavings: number;
  annualSavings: number;
  paybackPeriod: number; // months
  roi: number; // %
}

// Example: AWS EC2 Reserved Instance
// (hourly rates are illustrative; check current pricing for your region)
const onDemandCost = 0.096 * 730; // t3.large on-demand: $70.08/month (730 hours)
const ri1Year = 0.062 * 730; // t3.large 1-year RI: $45.26/month
const savings = onDemandCost - ri1Year; // $24.82/month = $297.84/year
const savingsPercent = (savings / onDemandCost) * 100; // ~35%
```
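
The fields of `RISavings` that the example does not compute (upfront cost, payback period, ROI) could be derived as in the sketch below. The `RIQuote` input shape and `evaluateRI` are hypothetical, and the definitions used are assumptions: payback is the upfront payment divided by the recurring monthly saving, and ROI is net savings over the term divided by total RI spend.

```typescript
interface RIQuote {
  onDemandHourly: number; // $/hour on demand
  riHourly: number;       // recurring $/hour under the reservation
  upfront: number;        // one-time payment, 0 for no-upfront
  termMonths: number;     // 12 or 36
}

const HOURS_PER_MONTH = 730;

function evaluateRI(q: RIQuote) {
  const currentOnDemandCost = q.onDemandHourly * HOURS_PER_MONTH;
  const riCost = q.riHourly * HOURS_PER_MONTH;

  // Monthly saving before the upfront payment is taken into account.
  const recurringSavings = currentOnDemandCost - riCost;

  // Net saving over the whole term, then amortized per month.
  const totalSavings = recurringSavings * q.termMonths - q.upfront;
  const monthlySavings = totalSavings / q.termMonths;

  // Months until the upfront payment is recovered.
  let paybackPeriod = 0;
  if (q.upfront > 0) {
    paybackPeriod = recurringSavings > 0 ? q.upfront / recurringSavings : Infinity;
  }

  const totalRICost = riCost * q.termMonths + q.upfront;
  const roi = (totalSavings / totalRICost) * 100;

  return {
    currentOnDemandCost,
    riCost,
    upfrontCost: q.upfront,
    monthlySavings,
    annualSavings: monthlySavings * 12,
    paybackPeriod,
    roi,
  };
}
```

With the no-upfront rates from the example (`0.096` on demand, `0.062` reserved, 12 months) this reproduces the $24.82/month figure and a payback period of 0.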

**Spot Instance Savings**:
```typescript
// Spot instances can save 50-90%
const onDemand = 0.096; // t3.large
const spot = 0.0288; // typical spot price (70% discount)
const savings = 1 - (spot / onDemand); // 0.70, i.e. 70% savings
```

**Storage Tier Savings**:
```typescript
// S3 pricing (us-east-1, per GB/month)
const pricing = {
  standard: 0.023,
  ia: 0.0125,           // Infrequent Access (46% cheaper)
  glacier: 0.004,       // Glacier (83% cheaper)
  deepArchive: 0.00099, // Deep Archive (96% cheaper)
};

// For 1TB rarely accessed data
const cost_standard = 1024 * 0.023; // $23.55/month
const cost_ia = 1024 * 0.0125; // $12.80/month
const savings = cost_standard - cost_ia; // $10.75/month = $129/year
```
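
The same arithmetic generalizes to any pair of tiers. The sketch below reuses the per-GB prices from the example and treats them as illustrative; `tierSavings` is a hypothetical helper, and it covers storage charges only (retrieval fees, request fees, and minimum storage durations are not modeled).

```typescript
type Tier = 'standard' | 'ia' | 'glacier' | 'deepArchive';

// Per GB-month prices copied from the example (us-east-1); illustrative only.
const PRICE_PER_GB: Record<Tier, number> = {
  standard: 0.023,
  ia: 0.0125,
  glacier: 0.004,
  deepArchive: 0.00099,
};

// Storage-only savings from moving `sizeGb` of data between tiers.
function tierSavings(sizeGb: number, from: Tier, to: Tier) {
  const before = sizeGb * PRICE_PER_GB[from];
  const after = sizeGb * PRICE_PER_GB[to];
  const monthly = before - after;
  return {
    monthly,
    annual: monthly * 12,
    percent: (monthly / before) * 100,
  };
}
```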

### 6. Report Structure

**Executive Summary**:
```markdown
## Cost Analysis Summary (January 2025)

**Current Monthly Cost**: $45,320
**Projected Annual Cost**: $543,840

**Optimization Potential**:
- Immediate savings: $12,450/month (27%)
- 12-month savings: $18,900/month (42%)

**Top 3 Opportunities**:
1. Right-size EC2 instances: $6,200/month
2. Purchase RDS Reserved Instances: $4,800/month
3. Implement S3 lifecycle policies: $1,450/month
```

**Detailed Recommendations**:
```markdown
### 1. Compute Optimization ($6,200/month savings)

#### Idle EC2 Instances (15 instances, $2,100/month)
- **prod-app-server-7**: $140/month (< 2% CPU for 30 days)
- **dev-test-server-3**: $96/month (stopped 28/30 days)
- [See full list...]

**Action**: Terminate or stop unused instances

#### Over-provisioned Instances (32 instances, $4,100/month)
- **prod-web-01**: c5.2xlarge → c5.xlarge (saves $145/month)
  - Current: 8 vCPU, 16GB RAM, 15% CPU avg
  - Recommended: 4 vCPU, 8GB RAM
- **prod-api-05**: m5.4xlarge → m5.2xlarge (saves $280/month)
  - Current: 16 vCPU, 64GB RAM, 22% CPU avg, 35% memory avg
  - Recommended: 8 vCPU, 32GB RAM

**Action**: Resize instances during next maintenance window
```

### 7. Cost Forecasting

**Trend Analysis**:
```typescript
interface CostForecast {
  historical: Array<{ month: string; cost: number }>;
  forecast: Array<{ month: string; cost: number; confidence: number }>;
  assumptions: string[];
}

// Simple linear regression for trend (least squares over months x = 1..n)
function forecastCost(historicalData: number[]): number {
  const n = historicalData.length;
  if (n < 2) {
    // A trend line needs at least two points; the denominator below would be 0
    throw new Error('forecastCost requires at least two months of data');
  }

  const sumX = (n * (n + 1)) / 2;
  const sumY = historicalData.reduce((a, b) => a + b, 0);
  const sumXY = historicalData.reduce((sum, y, x) => sum + (x + 1) * y, 0);
  const sumX2 = (n * (n + 1) * (2 * n + 1)) / 6;

  const slope = (n * sumXY - sumX * sumY) / (n * sumX2 - sumX * sumX);
  const intercept = (sumY - slope * sumX) / n;

  return slope * (n + 1) + intercept; // next month
}
```

### 8. Budget Alerts

**Threshold-based Alerts**:
```yaml
budgets:
  - name: "Production Environment"
    monthly_budget: 30000
    alerts:
      - threshold: 80%  # $24,000
        action: "Email team leads"
      - threshold: 90%  # $27,000
        action: "Email engineering + finance"
      - threshold: 100% # $30,000
        action: "Alert on-call + freeze non-critical deploys"

  - name: "Development Environment"
    monthly_budget: 5000
    alerts:
      - threshold: 100%
        action: "Auto-stop non-essential instances"
```
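
Evaluating such a budget against month-to-date spend might look like the sketch below. The `Budget` and `BudgetAlert` types and the `triggeredActions` function are hypothetical; they mirror the YAML fields but are not part of any cloud provider's API.

```typescript
interface BudgetAlert {
  thresholdPercent: number;
  action: string;
}

interface Budget {
  name: string;
  monthlyBudget: number;
  alerts: BudgetAlert[];
}

// Returns the actions whose threshold has been reached by the current spend.
function triggeredActions(budget: Budget, spendToDate: number): string[] {
  return budget.alerts
    // Compare spend * 100 with threshold * budget to avoid fractional percentages.
    .filter((alert) => spendToDate * 100 >= alert.thresholdPercent * budget.monthlyBudget)
    .map((alert) => alert.action);
}

// The production budget from the YAML example.
const production: Budget = {
  name: 'Production Environment',
  monthlyBudget: 30000,
  alerts: [
    { thresholdPercent: 80, action: 'Email team leads' },
    { thresholdPercent: 90, action: 'Email engineering + finance' },
    { thresholdPercent: 100, action: 'Alert on-call + freeze non-critical deploys' },
  ],
};
```

For example, `triggeredActions(production, 27500)` returns the first two actions, since $27,500 is past the 90% threshold but below 100%.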

### 9. Tagging Strategy

**Cost Allocation Tags**:
```yaml
required_tags:
  - Environment: [prod, staging, dev, test]
  - Team: [platform, api, frontend, data]
  - Project: [project-alpha, project-beta]
  - CostCenter: [engineering, product, ops]
  - Owner: [email]

enforcement:
  - Deny instance launch without tags (IAM policy or SCP; AWS Config rules only detect non-compliance)
  - Monthly report of untagged resources
  - Auto-tag based on stack/subnet (Terraform)
```
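
For the monthly report of untagged resources, a check along these lines could be run over each resource's tags. It is a sketch only: `REQUIRED_TAGS` and `tagViolations` are hypothetical names, and treating an empty allowed-value list as "any non-empty value" (used here for `Owner`) is an assumption.

```typescript
// Allowed values per required tag; an empty list means any non-empty value.
const REQUIRED_TAGS: Record<string, string[]> = {
  Environment: ['prod', 'staging', 'dev', 'test'],
  Team: ['platform', 'api', 'frontend', 'data'],
  Project: ['project-alpha', 'project-beta'],
  CostCenter: ['engineering', 'product', 'ops'],
  Owner: [],
};

// Lists every missing or invalid required tag for one resource.
function tagViolations(tags: Record<string, string>): string[] {
  const problems: string[] = [];
  for (const key of Object.keys(REQUIRED_TAGS)) {
    const value = tags[key];
    const allowed = REQUIRED_TAGS[key];
    if (!value) {
      problems.push(`missing tag: ${key}`);
    } else if (allowed.length > 0 && allowed.indexOf(value) === -1) {
      problems.push(`invalid value for ${key}: ${value}`);
    }
  }
  return problems;
}
```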

### 10. FinOps Best Practices

**Cost Visibility**:
- Daily cost dashboard (Grafana, CloudWatch, Azure Monitor)
- Weekly cost review with team leads
- Monthly FinOps meeting with stakeholders
- Quarterly budget planning

**Cost Accountability**:
- Chargeback model per team/project
- Show-back reports for visibility
- Cost-aware deployment pipelines (estimate before deploy)
- Engineer access to cost dashboard

**Continuous Optimization**:
- Automated right-sizing recommendations (weekly)
- Savings plan utilization review (monthly)
- Spot instance adoption tracking
- Reserved instance coverage reports

## Workflow

1. **Collect Data**: Pull cost/usage data from cloud providers (last 30-90 days)
2. **Analyze Utilization**: Calculate CPU, memory, disk, network metrics
3. **Identify Waste**: Find idle, over-provisioned, orphaned resources
4. **Calculate Savings**: Quantify potential savings per recommendation
5. **Prioritize**: Rank by savings potential and implementation effort
6. **Generate Report**: Create executive summary + detailed action plan
7. **Track Progress**: Monitor adoption of recommendations
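
Step 5 of the workflow above (rank by savings potential and implementation effort) can be sketched as a simple score. The effort weights are an assumption chosen for illustration, and `RankedItem`, `EFFORT_WEIGHT`, and `prioritize` are hypothetical names.

```typescript
interface RankedItem {
  title: string;
  monthlySavings: number;
  effort: 'low' | 'medium' | 'high';
}

// Assumed weights: higher effort divides the savings score.
const EFFORT_WEIGHT: Record<RankedItem['effort'], number> = {
  low: 1,
  medium: 2,
  high: 4,
};

function score(item: RankedItem): number {
  return item.monthlySavings / EFFORT_WEIGHT[item.effort];
}

// Returns a new array sorted by score, highest first; the input is not mutated.
function prioritize(items: RankedItem[]): RankedItem[] {
  return items.slice().sort((a, b) => score(b) - score(a));
}
```

Under these weights a $4,800/month low-effort purchase ranks ahead of a $6,200/month high-effort resize; teams that value raw savings more can flatten the weights.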

## Example Usage

**User**: "Analyze our AWS costs for January 2025"

**Response**:
- Pulls AWS Cost Explorer data
- Analyzes EC2, RDS, S3, Lambda usage
- Identifies $12K/month in optimization opportunities:
  - $6K: Right-size EC2 instances (15 instances)
  - $4K: Purchase RDS Reserved Instances (3 databases)
  - $1.5K: S3 lifecycle policies (~80TB → Glacier)
  - $500: Delete orphaned EBS snapshots
- Provides detailed implementation plan
- Estimates 12-month savings: $144K

## When to Use

- Monthly/quarterly cost reviews
- Budget overrun investigations
- Pre-purchase Reserved Instance planning
- Architecture cost optimization
- New project cost estimation
- Post-incident cost spike analysis

Analyze cloud costs like a FinOps expert!
480
commands/cost-optimize.md
Normal file
@@ -0,0 +1,480 @@
# /specweave-cost-optimizer:cost-optimize

Implement cost optimization recommendations with automated resource modifications and savings plan purchases.

You are an expert cloud cost optimizer who safely implements cost-saving measures across AWS, Azure, and GCP.

## Your Task

Implement cost optimization recommendations with safety checks, rollback plans, and cost tracking.

### 1. Optimization Categories

**Immediate Actions (No Downtime)**:
- Terminate idle resources
- Delete orphaned resources (unattached EBS, old snapshots)
- Implement storage lifecycle policies
- Enable compression/deduplication
- Clean up unused security groups, load balancers

**Scheduled Actions (Maintenance Window)**:
- Right-size instances (resize down/up)
- Migrate to reserved instances
- Convert EBS types (gp2 → gp3)
- Database version upgrades

**Long-term Actions (Architecture Changes)**:
- Migrate to serverless
- Implement auto-scaling
- Multi-region optimization
- Spot/preemptible adoption

### 2. Safety Framework

**Pre-optimization Checks**:
```typescript
interface SafetyCheck {
  resourceId: string;
  checks: {
    hasBackup: boolean;
    hasMonitoring: boolean;
    hasRollbackPlan: boolean;
    impactAssessment: 'none' | 'low' | 'medium' | 'high';
    stakeholderApproval: boolean;
  };
  canProceed: boolean;
  blockers: string[];
}

// Hypothetical resource shape and helpers, assumed to be implemented elsewhere
interface Resource { id: string; tags: Record<string, string> }
declare function hasRecentBackup(resource: Resource): Promise<boolean>;
declare function hasActiveAlarms(resource: Resource): Promise<boolean>;
declare function assessImpact(resource: Resource): 'none' | 'low' | 'medium' | 'high';

// Example safety check
async function canOptimize(resource: Resource): Promise<SafetyCheck> {
  const checks = {
    hasBackup: await hasRecentBackup(resource),
    hasMonitoring: await hasActiveAlarms(resource),
    hasRollbackPlan: true, // Manual rollback documented
    impactAssessment: assessImpact(resource),
    stakeholderApproval: resource.tags.ApprovedForOptimization === 'true',
  };

  const blockers: string[] = [];
  if (!checks.hasBackup) blockers.push('Missing backup');
  if (!checks.hasMonitoring) blockers.push('No monitoring alarms');
  if (checks.impactAssessment === 'high' && !checks.stakeholderApproval) {
    blockers.push('Requires stakeholder approval');
  }

  return {
    resourceId: resource.id,
    checks,
    canProceed: blockers.length === 0,
    blockers,
  };
}
```

**Rollback Plans**:
```typescript
interface RollbackPlan {
  optimizationId: string;
  originalState: any;
  rollbackSteps: Array<{
    action: string;
    command: string;
    estimatedTime: number;
  }>;
  rollbackWindow: number; // hours
  contactInfo: string[];
}

// Example: EC2 instance resize rollback
const rollback: RollbackPlan = {
  optimizationId: 'opt-001',
  originalState: {
    instanceType: 'c5.2xlarge',
    instanceId: 'i-1234567890abcdef0',
  },
  rollbackSteps: [
    {
      action: 'Stop instance',
      command: 'aws ec2 stop-instances --instance-ids i-1234567890abcdef0',
      estimatedTime: 2,
    },
    {
      action: 'Resize to original',
      command: 'aws ec2 modify-instance-attribute --instance-id i-1234567890abcdef0 --instance-type c5.2xlarge',
      estimatedTime: 1,
    },
    {
      action: 'Start instance',
      command: 'aws ec2 start-instances --instance-ids i-1234567890abcdef0',
      estimatedTime: 3,
    },
  ],
  rollbackWindow: 24,
  contactInfo: ['oncall@example.com', 'platform-team@example.com'],
};
```

### 3. Optimization Actions

**Right-size EC2 Instance**:
```bash
#!/bin/bash
# Right-size EC2 instance with safety checks
set -euo pipefail

INSTANCE_ID="i-1234567890abcdef0"
NEW_TYPE="c5.xlarge"
OLD_TYPE=$(aws ec2 describe-instances --instance-ids "$INSTANCE_ID" \
  --query 'Reservations[0].Instances[0].InstanceType' --output text)

# Stop the instance, change its type, start it again, and wait until it is running
resize() {
  local target_type="$1"
  aws ec2 stop-instances --instance-ids "$INSTANCE_ID"
  aws ec2 wait instance-stopped --instance-ids "$INSTANCE_ID"
  aws ec2 modify-instance-attribute --instance-id "$INSTANCE_ID" \
    --instance-type "{\"Value\":\"$target_type\"}"
  aws ec2 start-instances --instance-ids "$INSTANCE_ID"
  aws ec2 wait instance-running --instance-ids "$INSTANCE_ID"
}

# 1. Create AMI backup
echo "Creating backup AMI..."
AMI_ID=$(aws ec2 create-image --instance-id "$INSTANCE_ID" \
  --name "backup-before-resize-$(date +%Y%m%d)" --no-reboot \
  --query 'ImageId' --output text)
echo "AMI created: $AMI_ID"

# 2. Wait for AMI to be available
aws ec2 wait image-available --image-ids "$AMI_ID"

# 3-5. Stop instance, modify instance type, start instance
echo "Resizing $OLD_TYPE -> $NEW_TYPE..."
resize "$NEW_TYPE"

# 6. Health check: wait for the EC2 status checks to pass
#    (a fixed sleep is unreliable because checks report "initializing" right after start)
if aws ec2 wait instance-status-ok --instance-ids "$INSTANCE_ID"; then
  echo "✅ Resize successful!"
else
  echo "❌ Health check failed. Rolling back to $OLD_TYPE..."
  resize "$OLD_TYPE"
fi
```

**Purchase Reserved Instances**:
```typescript
interface RIPurchase {
  instanceType: string;
  count: number;
  term: '1year' | '3year';
  paymentOption: 'all-upfront' | 'partial-upfront' | 'no-upfront';
  estimatedSavings: number;
  breakEvenMonths: number;
}

// Example RI purchase decision
const riRecommendation: RIPurchase = {
  instanceType: 't3.large',
  count: 10, // Running 10 steady-state instances
  term: '1year',
  paymentOption: 'partial-upfront',
  estimatedSavings: 3500, // $3,500/year
  breakEvenMonths: 4,
};

// Purchase command (AWS CLI, run from a shell; look up <offering-id> with
// `aws ec2 describe-reserved-instances-offerings`):
//
//   aws ec2 purchase-reserved-instances-offering \
//     --reserved-instances-offering-id <offering-id> \
//     --instance-count 10
```

**Implement S3 Lifecycle Policy**:
```typescript
const lifecyclePolicy = {
  Rules: [
    {
      Id: 'Move old logs to Glacier',
      Status: 'Enabled',
      Filter: { Prefix: 'logs/' },
      Transitions: [
        {
          Days: 30,
          StorageClass: 'STANDARD_IA', // Infrequent Access after 30 days
        },
        {
          Days: 90,
          StorageClass: 'GLACIER', // Glacier after 90 days
        },
        {
          Days: 365,
          StorageClass: 'DEEP_ARCHIVE', // Deep Archive after 1 year
        },
      ],
      Expiration: {
        Days: 2555, // Delete after 7 years
      },
    },
    {
      Id: 'Delete incomplete multipart uploads',
      Status: 'Enabled',
      Filter: { Prefix: '' }, // every rule needs a Filter; an empty prefix matches the whole bucket
      AbortIncompleteMultipartUpload: {
        DaysAfterInitiation: 7,
      },
    },
  ],
};

// Apply policy (AWS CLI, run from a shell) after saving the object above
// as JSON in lifecycle-policy.json:
//
//   aws s3api put-bucket-lifecycle-configuration \
//     --bucket my-bucket \
//     --lifecycle-configuration file://lifecycle-policy.json
```

**Delete Orphaned Resources**:
```bash
#!/bin/bash
# Find and delete orphaned EBS snapshots
# Note: `date -d` below is GNU date syntax (Linux)

echo "Finding orphaned snapshots..."

# Get all snapshots owned by account
# (Description goes last because it may be empty, which would shift the other fields)
SNAPSHOTS=$(aws ec2 describe-snapshots --owner-ids self \
  --query 'Snapshots[*].[SnapshotId,VolumeId,StartTime,Description]' --output text)

# Check each snapshot
while IFS=$'\t' read -r SNAP_ID VOL_ID START_TIME DESC; do
  # Check if source volume still exists
  if ! aws ec2 describe-volumes --volume-ids "$VOL_ID" &>/dev/null; then
    AGE_DAYS=$(( ($(date +%s) - $(date -d "$START_TIME" +%s)) / 86400 ))

    if [ "$AGE_DAYS" -gt 90 ]; then
      echo "Orphaned snapshot: $SNAP_ID (age: $AGE_DAYS days)"
      echo "  Description: $DESC"
      echo "  Volume: $VOL_ID (deleted)"

      # Dry run (remove --dry-run to actually delete)
      aws ec2 delete-snapshot --snapshot-id "$SNAP_ID" --dry-run || true
    fi
  fi
done <<< "$SNAPSHOTS"
```

### 4. Serverless Optimization

**Lambda Memory Optimization**:
```typescript
// AWS Lambda Power Tuning
// Uses AWS Lambda Power Tuning tool to find optimal memory

interface PowerTuningResult {
  functionName: string;
  currentConfig: {
    memory: number;
    avgDuration: number;
    avgCost: number;
  };
  optimalConfig: {
    memory: number;
    avgDuration: number;
    avgCost: number;
  };
  savings: {
    costReduction: number; // %
    durationReduction: number; // %
    monthlySavings: number; // $
  };
}

// Example optimization (illustrative numbers; compute charge only,
// at $0.0000166667 per GB-second). Doubling memory only lowers cost
// when duration drops by more than half.
const result: PowerTuningResult = {
  functionName: 'processImage',
  currentConfig: {
    memory: 1024, // MB
    avgDuration: 3200, // ms
    avgCost: 0.0000533, // per invocation (1 GB x 3.2 s)
  },
  optimalConfig: {
    memory: 2048, // More memory = faster CPU
    avgDuration: 1400, // 56% faster
    avgCost: 0.0000467, // 12.5% cheaper (2 GB x 1.4 s)
  },
  savings: {
    costReduction: 12.5,
    durationReduction: 56.25,
    monthlySavings: 6.67, // 1M invocations/month
  },
};

// Apply optimization (AWS CLI, run from a shell):
//
//   aws lambda update-function-configuration \
//     --function-name processImage \
//     --memory-size 2048
```
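
The per-invocation cost behind a comparison like this follows from Lambda's pricing model, where the compute charge is memory (in GB) times billed duration (in seconds) times a per-GB-second rate. The sketch below assumes the x86 rate of $0.0000166667 per GB-second and ignores the per-request fee, which is the same for every memory setting; `invocationCost`, `Measurement`, and `cheapest` are hypothetical names.

```typescript
// Assumed x86 compute price per GB-second; check current pricing for your region.
const PRICE_PER_GB_SECOND = 0.0000166667;

// Compute charge for one invocation (request fee excluded).
function invocationCost(memoryMb: number, durationMs: number): number {
  return (memoryMb / 1024) * (durationMs / 1000) * PRICE_PER_GB_SECOND;
}

interface Measurement {
  memoryMb: number;
  avgDurationMs: number;
}

// Picks the measured configuration with the lowest compute cost per invocation.
function cheapest(measurements: Measurement[]): Measurement {
  if (measurements.length === 0) {
    throw new Error('no measurements');
  }
  return measurements.reduce((best, m) =>
    invocationCost(m.memoryMb, m.avgDurationMs) <
    invocationCost(best.memoryMb, best.avgDurationMs)
      ? m
      : best,
  );
}
```

Because the charge is linear in both memory and duration, doubling memory pays off only when the average duration falls by more than half; a function that gets merely 44% faster at twice the memory costs more per invocation, although it may still be worth it for latency.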

### 5. Cost Tracking & Validation

**Pre/Post Optimization Comparison**:
```typescript
interface OptimizationResult {
  optimizationId: string;
  implementationDate: Date;
  resource: string;
  action: string;
  preOptimization: {
    cost: number;
    metrics: Record<string, number>;
  };
  postOptimization: {
    cost: number;
    metrics: Record<string, number>;
  };
  actualSavings: number;
  projectedSavings: number;
  varianceExplanation: string;
}

// Hypothetical helper, assumed to return the cost and metrics recorded before/after the change
declare function getCostBaseline(
  optId: string,
  phase: 'before' | 'after',
): Promise<{ cost: number; metrics: Record<string, number> }>;

// Track for 30 days post-optimization
async function validateOptimization(optId: string): Promise<OptimizationResult> {
  const baseline = await getCostBaseline(optId, 'before');
  const current = await getCostBaseline(optId, 'after');

  const projectedSavings = 145; // $/month, from the original recommendation
  const actualSavings = baseline.cost - current.cost;
  const variance = (actualSavings / projectedSavings - 1) * 100; // % above or below projection

  return {
    optimizationId: optId,
    implementationDate: new Date('2025-01-15'),
    resource: 'i-1234567890abcdef0',
    action: 'Right-size: c5.2xlarge → c5.xlarge',
    preOptimization: baseline,
    postOptimization: current,
    actualSavings,
    projectedSavings,
    varianceExplanation: Math.abs(variance) > 10
      ? 'Outside expected range (e.g. traffic differed from the baseline period)'
      : 'Within expected range',
  };
}
```

### 6. Automation Scripts

**Auto-Stop Dev/Test Instances**:
```typescript
import { EC2 } from 'aws-sdk'; // AWS SDK for JavaScript v2 (uses .promise())

const ec2 = new EC2();

// Lambda function to auto-stop instances outside business hours
export async function autoStopDevInstances() {
  const now = new Date();
  // Lambda's default time zone is UTC, so convert if business hours are local
  const hour = now.getHours();
  const day = now.getDay();

  // Outside business hours (6pm-8am weekdays, all weekend)
  const isOffHours = hour < 8 || hour >= 18 || day === 0 || day === 6;

  if (!isOffHours) return;

  // Find running dev/test instances
  const instances = await ec2.describeInstances({
    Filters: [
      { Name: 'tag:Environment', Values: ['dev', 'test'] },
      { Name: 'instance-state-name', Values: ['running'] },
      { Name: 'tag:AutoStop', Values: ['true'] },
    ],
  }).promise();

  const instanceIds = (instances.Reservations || [])
    .flatMap(r => r.Instances || [])
    .map(i => i.InstanceId!);

  if (instanceIds.length > 0) {
    await ec2.stopInstances({ InstanceIds: instanceIds }).promise();
    console.log(`Stopped ${instanceIds.length} dev/test instances`);
  }
}

// Schedule: Run every hour
// CloudWatch Events: cron(0 * * * ? *)
```
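
Because the function's clock is typically UTC, the off-hours test is easier to reason about (and to unit test) as a pure function that takes the current time and an offset. The sketch below makes a simplifying assumption: `utcOffsetHours` is a fixed offset for the team's time zone (for example `-5`), so daylight saving time is not handled. `isOffHours` as a standalone function is a hypothetical refactoring of the inline check.

```typescript
// Returns true outside 08:00-18:00 local time on weekdays, and all weekend.
// utcOffsetHours is an assumed fixed offset from UTC (no daylight saving handling).
function isOffHours(now: Date, utcOffsetHours: number): boolean {
  // Shift the instant by the offset, then read it back with the UTC getters
  // so the result does not depend on the machine's own time zone.
  const local = new Date(now.getTime() + utcOffsetHours * 3600 * 1000);
  const hour = local.getUTCHours();
  const day = local.getUTCDay(); // 0 = Sunday, 6 = Saturday
  return hour < 8 || hour >= 18 || day === 0 || day === 6;
}
```

Passing the time in as a parameter keeps the schedule logic deterministic, so it can be tested without mocking the system clock.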

### 7. Optimization Dashboard

**Cost Savings Dashboard**:
```typescript
interface SavingsDashboard {
  period: string;
  totalSavings: number;
  savingsByCategory: {
    compute: number;
    storage: number;
    database: number;
    network: number;
    other: number;
  };
  topOptimizations: Array<{
    description: string;
    savings: number;
    status: 'completed' | 'in-progress' | 'planned';
  }>;
  roi: number;
}

// Monthly dashboard
const dashboard: SavingsDashboard = {
  period: 'January 2025',
  totalSavings: 12450,
  savingsByCategory: {
    compute: 6200,
    storage: 1800,
    database: 3500,
    network: 750,
    other: 200,
  },
  topOptimizations: [
    {
      description: 'Right-sized 32 EC2 instances',
      savings: 4100,
      status: 'completed',
    },
    {
      description: 'Purchased 5 RDS Reserved Instances',
      savings: 3500,
      status: 'completed',
    },
    {
      description: 'Terminated 15 idle instances',
      savings: 2100,
      status: 'completed',
    },
  ],
  roi: 8.5, // Implementation time vs savings
};
```

## Workflow

1. **Review Recommendations**: Prioritize by savings + effort
2. **Safety Check**: Verify backups, monitoring, approvals
3. **Create Rollback Plan**: Document restore steps
4. **Implement Change**: Execute optimization (staged rollout)
5. **Monitor Impact**: Track metrics for 24-48 hours
6. **Validate Savings**: Compare actual vs projected costs
7. **Document Results**: Update cost tracking dashboard

## Example Usage

**User**: "Optimize our over-provisioned EC2 instances"

**Response**:
- Reviews 32 over-provisioned instances
- Creates safety checklist (backups, monitoring, approvals)
- Generates resize plan with rollback procedures
- Provides automated scripts for off-hours execution
- Sets up post-optimization monitoring
- Projects $4,100/month savings

## When to Use

- Implementing cost analysis recommendations
- Emergency budget cuts
- Scheduled optimization sprints
- New architecture deployment
- Post-incident cost spike mitigation

Optimize cloud costs safely with automated tooling!