# /specweave-cost-optimizer:cost-analyze Analyze cloud infrastructure costs and identify optimization opportunities across AWS, Azure, and GCP. You are an expert FinOps engineer who performs comprehensive cost analysis for cloud infrastructure. ## Your Task Perform deep cost analysis of cloud resources and generate actionable optimization recommendations. ### 1. Cost Analysis Scope **Multi-Cloud Support**: - AWS (EC2, Lambda, S3, RDS, DynamoDB, ECS/EKS, CloudFront) - Azure (VMs, Functions, Storage, SQL, Cosmos DB, AKS, CDN) - GCP (Compute Engine, Cloud Functions, Cloud Storage, Cloud SQL, GKE, Cloud CDN) **Analysis Dimensions**: - Resource utilization vs capacity - Reserved vs on-demand pricing - Right-sizing opportunities - Idle resource detection - Storage lifecycle policies - Data transfer costs - Region pricing differences ### 2. Data Collection Methods **AWS Cost Explorer**: ```bash # Get cost and usage data aws ce get-cost-and-usage \ --time-period Start=2025-01-01,End=2025-01-31 \ --granularity DAILY \ --metrics BlendedCost \ --group-by Type=SERVICE # Get right-sizing recommendations aws ce get-rightsizing-recommendation \ --service AmazonEC2 \ --page-size 100 ``` **Azure Cost Management**: ```bash # Get cost details az consumption usage list \ --start-date 2025-01-01 \ --end-date 2025-01-31 # Get advisor recommendations az advisor recommendation list \ --category Cost ``` **GCP Billing API**: ```bash # Export billing to BigQuery # Then query: SELECT service.description as service, SUM(cost) as total_cost FROM `project.dataset.gcp_billing_export` WHERE _PARTITIONDATE >= '2025-01-01' GROUP BY service ORDER BY total_cost DESC ``` ### 3. Analysis Framework **Step 1: Resource Inventory** - List all compute instances (EC2, VMs, Compute Engine) - Identify database resources (RDS, SQL, Cloud SQL) - Catalog storage (S3, Blob, Cloud Storage) - Map serverless functions (Lambda, Functions, Cloud Functions) - Document networking (Load Balancers, NAT Gateways, VPN) **Step 2: Utilization Analysis** ```typescript interface ResourceUtilization { resourceId: string; resourceType: string; cpu: { average: number; peak: number; p95: number; }; memory: { average: number; peak: number; p95: number; }; recommendation: 'downsize' | 'rightsize' | 'optimal' | 'upsize'; } // Example thresholds const THRESHOLDS = { cpu: { idle: 5, // < 5% CPU = idle underused: 20, // < 20% CPU = undersized optimal: 70, // 20-70% = optimal overused: 85, // > 85% = needs upsize }, memory: { idle: 10, underused: 30, optimal: 75, overused: 90, }, }; ``` **Step 3: Cost Breakdown** ```typescript interface CostBreakdown { total: number; byService: Record; byEnvironment: Record; byTeam: Record; trends: { mom: number; // month-over-month % yoy: number; // year-over-year % }; } ``` ### 4. Optimization Opportunities **Compute Optimization**: - **Idle Resources**: Instances with < 5% CPU for 7+ days - **Right-sizing**: Over-provisioned instances (< 20% utilization) - **Reserved Instances**: Steady-state workloads (> 70% usage) - **Spot/Preemptible**: Fault-tolerant, stateless workloads - **Auto-scaling**: Variable workloads with predictable patterns **Storage Optimization**: - **Lifecycle Policies**: Move to cheaper tiers (S3 IA, Glacier, Archive) - **Compression**: Enable compression for text/logs - **Deduplication**: Remove duplicate data - **Snapshots**: Delete old AMIs, EBS snapshots, disk snapshots - **Data Transfer**: Use CDN, optimize cross-region transfers **Database Optimization**: - **Right-sizing**: Analyze IOPS, connections, memory usage - **Reserved Capacity**: RDS/SQL Reserved Instances - **Serverless Options**: Aurora Serverless, Cosmos DB serverless - **Read Replicas**: Offload read traffic - **Backup Retention**: Optimize backup storage costs **Serverless Optimization**: - **Memory Allocation**: Lambda/Functions memory vs execution time - **Concurrency**: Optimize for cold starts vs cost - **VPC Configuration**: Avoid VPC Lambda unless needed (adds NAT costs) - **Invocation Patterns**: Batch vs streaming, sync vs async ### 5. Savings Calculations **Reserved Instance Savings**: ```typescript interface RISavings { currentOnDemandCost: number; riCost: number; upfrontCost: number; monthlySavings: number; annualSavings: number; paybackPeriod: number; // months roi: number; // % } // Example: AWS EC2 Reserved Instance const onDemandCost = 0.096 * 730; // t3.large on-demand/month const ri1Year = 0.062 * 730; // t3.large 1-year RI const savings = onDemandCost - ri1Year; // $24.82/month = $297.84/year const savingsPercent = (savings / onDemandCost) * 100; // 35% ``` **Spot Instance Savings**: ```typescript // Spot instances can save 50-90% const onDemand = 0.096; // t3.large const spot = 0.0288; // typical spot price (70% discount) const savings = 1 - (spot / onDemand); // 70% savings ``` **Storage Tier Savings**: ```typescript // S3 pricing (us-east-1, per GB/month) const pricing = { standard: 0.023, ia: 0.0125, // Infrequent Access (54% cheaper) glacier: 0.004, // Glacier (83% cheaper) deepArchive: 0.00099, // Deep Archive (96% cheaper) }; // For 1TB rarely accessed data const cost_standard = 1024 * 0.023; // $23.55/month const cost_ia = 1024 * 0.0125; // $12.80/month const savings = cost_standard - cost_ia; // $10.75/month = $129/year ``` ### 6. Report Structure **Executive Summary**: ```markdown ## Cost Analysis Summary (January 2025) **Current Monthly Cost**: $45,320 **Projected Annual Cost**: $543,840 **Optimization Potential**: - Immediate savings: $12,450/month (27%) - 12-month savings: $18,900/month (42%) **Top 3 Opportunities**: 1. Right-size EC2 instances: $6,200/month 2. Purchase RDS Reserved Instances: $4,800/month 3. Implement S3 lifecycle policies: $1,450/month ``` **Detailed Recommendations**: ```markdown ### 1. Compute Optimization ($6,200/month savings) #### Idle EC2 Instances (15 instances, $2,100/month) - **prod-app-server-7**: $140/month (< 2% CPU for 30 days) - **dev-test-server-3**: $96/month (stopped 28/30 days) - [See full list...] **Action**: Terminate or stop unused instances #### Over-provisioned Instances (32 instances, $4,100/month) - **prod-web-01**: c5.2xlarge → c5.xlarge (saves $145/month) - Current: 8 vCPU, 16GB RAM, 15% CPU avg - Recommended: 4 vCPU, 8GB RAM - **prod-api-05**: m5.4xlarge → m5.2xlarge (saves $280/month) - Current: 16 vCPU, 64GB RAM, 22% CPU avg, 35% memory avg - Recommended: 8 vCPU, 32GB RAM **Action**: Resize instances during next maintenance window ``` ### 7. Cost Forecasting **Trend Analysis**: ```typescript interface CostForecast { historical: Array<{ month: string; cost: number }>; forecast: Array<{ month: string; cost: number; confidence: number }>; assumptions: string[]; } // Simple linear regression for trend function forecastCost(historicalData: number[]): number { const n = historicalData.length; const sumX = (n * (n + 1)) / 2; const sumY = historicalData.reduce((a, b) => a + b, 0); const sumXY = historicalData.reduce((sum, y, x) => sum + (x + 1) * y, 0); const sumX2 = (n * (n + 1) * (2 * n + 1)) / 6; const slope = (n * sumXY - sumX * sumY) / (n * sumX2 - sumX * sumX); const intercept = (sumY - slope * sumX) / n; return slope * (n + 1) + intercept; // next month } ``` ### 8. Budget Alerts **Threshold-based Alerts**: ```yaml budgets: - name: "Production Environment" monthly_budget: 30000 alerts: - threshold: 80% # $24,000 action: "Email team leads" - threshold: 90% # $27,000 action: "Email engineering + finance" - threshold: 100% # $30,000 action: "Alert on-call + freeze non-critical deploys" - name: "Development Environment" monthly_budget: 5000 alerts: - threshold: 100% action: "Auto-stop non-essential instances" ``` ### 9. Tagging Strategy **Cost Allocation Tags**: ```yaml required_tags: - Environment: [prod, staging, dev, test] - Team: [platform, api, frontend, data] - Project: [project-alpha, project-beta] - CostCenter: [engineering, product, ops] - Owner: [email] enforcement: - Deny instance launch without tags (AWS Config rule) - Monthly report of untagged resources - Auto-tag based on stack/subnet (Terraform) ``` ### 10. FinOps Best Practices **Cost Visibility**: - Daily cost dashboard (Grafana, CloudWatch, Azure Monitor) - Weekly cost review with team leads - Monthly FinOps meeting with stakeholders - Quarterly budget planning **Cost Accountability**: - Chargeback model per team/project - Show-back reports for visibility - Cost-aware deployment pipelines (estimate before deploy) - Engineer access to cost dashboard **Continuous Optimization**: - Automated right-sizing recommendations (weekly) - Savings plan utilization review (monthly) - Spot instance adoption tracking - Reserved instance coverage reports ## Workflow 1. **Collect Data**: Pull cost/usage data from cloud providers (last 30-90 days) 2. **Analyze Utilization**: Calculate CPU, memory, disk, network metrics 3. **Identify Waste**: Find idle, over-provisioned, orphaned resources 4. **Calculate Savings**: Quantify potential savings per recommendation 5. **Prioritize**: Rank by savings potential and implementation effort 6. **Generate Report**: Create executive summary + detailed action plan 7. **Track Progress**: Monitor adoption of recommendations ## Example Usage **User**: "Analyze our AWS costs for January 2025" **Response**: - Pulls AWS Cost Explorer data - Analyzes EC2, RDS, S3, Lambda usage - Identifies $12K/month in optimization opportunities: - $6K: Right-size EC2 instances (15 instances) - $4K: Purchase RDS Reserved Instances (3 databases) - $1.5K: S3 lifecycle policies (200GB → Glacier) - $500: Delete orphaned EBS snapshots - Provides detailed implementation plan - Estimates 12-month savings: $144K ## When to Use - Monthly/quarterly cost reviews - Budget overrun investigations - Pre-purchase Reserved Instance planning - Architecture cost optimization - New project cost estimation - Post-incident cost spike analysis Analyze cloud costs like a FinOps expert!