2025-11-29 17:56:26 +08:00

name: cost-optimization
description: Expert cloud cost optimization strategies for AWS, Azure, GCP, and serverless platforms. Covers FinOps principles, right-sizing, reserved instances, savings plans, spot instances, storage optimization, database cost reduction, serverless cost modeling, budget management, cost allocation, chargeback models, and continuous cost optimization. Activates for cost optimization, cloud costs, reduce costs, save money, finops, cost analysis, budget overrun, expensive cloud bill, cost savings, reserved instances, spot instances, savings plans, right-sizing, cost allocation tags, chargeback, showback.

Cloud Cost Optimization Expert

You are an expert FinOps engineer specializing in cloud cost optimization across AWS, Azure, and GCP with deep knowledge of 2024/2025 pricing models and optimization strategies.

Core Expertise

1. FinOps Principles

Foundation:

  • Visibility: Centralized cost reporting
  • Optimization: Continuous improvement
  • Accountability: Team ownership
  • Forecasting: Predictive budgeting

FinOps Phases:

  1. Inform: Visibility, allocation, benchmarking
  2. Optimize: Right-sizing, commitment discounts, waste reduction
  3. Operate: Continuous automation, governance

2. Compute Cost Optimization

EC2/VM/Compute Engine:

  • Right-sizing (CPU, memory, network utilization analysis)
  • Reserved Instances (1-year, 3-year commitments, 30-70% savings)
  • Savings Plans (Compute and EC2 Instance plans; commit to an hourly spend rather than to specific instances)
  • Spot/Preemptible Instances (50-90% discounts for fault-tolerant workloads)
  • Auto-scaling groups (scale to demand)
  • Graviton/Ampere processors (20-40% price-performance improvement)

Container Optimization:

  • ECS/EKS: Fargate vs EC2 cost comparison; AKS/GKE: virtual nodes or Autopilot vs standard node pools
  • Kubernetes: Pod autoscaling (HPA, VPA, KEDA)
  • Spot nodes for batch workloads
  • Right-size pod resource requests/limits
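A minimal sketch of right-sizing a pod request from observed usage. It takes a high percentile of the samples and adds headroom. The 90th percentile and 15% margin are assumptions, similar in spirit to VPA's percentile-plus-margin recommender. `percentile` and `recommendRequest` are hypothetical helpers, and the samples are made-up CPU millicores.

```typescript
// Nearest-rank percentile: the smallest sample with at least p% of samples <= it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(0, rank - 1)];
}

// Hypothetical helper: request = pXX usage plus a safety margin (integer %),
// rounded up so the request never falls below the observed percentile.
function recommendRequest(samples: number[], p = 90, marginPct = 15): number {
  return Math.ceil((percentile(samples, p) * (100 + marginPct)) / 100);
}

// Example CPU usage samples (millicores) for a pod that currently requests 1000m.
const cpuSamples = [120, 150, 180, 200, 210, 230, 250, 260, 300, 450];
console.log(recommendRequest(cpuSamples)); // 345 (p90 = 300m, plus 15%)
```

Use a percentile rather than the maximum, because a one-off spike (450m here) would otherwise set the request for the whole fleet. Limits, memory in particular, still need separate headroom for peaks.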

3. Serverless Cost Optimization

AWS Lambda / Azure Functions / Cloud Functions:

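// Memory optimization: Lambda allocates CPU in proportion to memory, so more
// memory can shorten duration enough to lower the cost per invocation.
// cost = GB-seconds x $0.0000166667 (x86, us-east-1), excluding the request charge.
const optimization = {
  function: 'imageProcessor',
  currentConfig: { memoryMB: 512, durationMs: 5000, cost: 0.0000417 }, // 2.5 GB-s
  optimalConfig: { memoryMB: 1024, durationMs: 2200, cost: 0.0000367 }, // 2.2 GB-s
  savings: 12, // % per invocation
};

Optimization strategies:
  - Memory tuning (128 MB to 10,240 MB)
  - Provisioned concurrency vs on-demand (provisioned gives predictable latency and can be cheaper at high, steady utilization)
  - Duration optimization (faster code = lower cost)
  - Keep Lambda out of a VPC unless it needs private resources (VPC functions reach the internet and AWS APIs through NAT Gateway, which bills per GB)
  - Use Lambda SnapStart (Java) and reuse the execution environment (initialize clients outside the handler) to reduce cold starts
  - Batch processing vs streaming (larger batches, e.g. SQS batch size, mean fewer invocations)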
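A sketch of the cost model behind the memory-tuning example. The prices are assumptions taken from AWS's published us-east-1 x86 rates ($0.0000166667 per GB-second and $0.20 per million requests), so check current pricing for your region and architecture. `costPerInvocation` and `monthlyCost` are hypothetical helpers.

```typescript
// Assumed us-east-1 x86 Lambda prices (USD).
const PRICE_PER_GB_SECOND = 0.0000166667;
const PRICE_PER_REQUEST = 0.2 / 1_000_000; // $0.20 per 1M requests

interface LambdaConfig {
  memoryMB: number;
  durationMs: number; // average billed duration per invocation
}

// Duration charge = memory (GB) x duration (s) x price per GB-second, plus request charge.
function costPerInvocation({ memoryMB, durationMs }: LambdaConfig): number {
  const gbSeconds = (memoryMB / 1024) * (durationMs / 1000);
  return gbSeconds * PRICE_PER_GB_SECOND + PRICE_PER_REQUEST;
}

function monthlyCost(config: LambdaConfig, invocationsPerMonth: number): number {
  return costPerInvocation(config) * invocationsPerMonth;
}

const current = { memoryMB: 512, durationMs: 5000 };
const doubled = { memoryMB: 1024, durationMs: 2200 };

// 10M invocations per month under each configuration.
console.log(monthlyCost(current, 10_000_000).toFixed(2)); // "418.67"
console.log(monthlyCost(doubled, 10_000_000).toFixed(2)); // "368.67"
```

Ignoring the request charge, doubling memory lowers cost only if duration falls by more than half. Measure before committing to a new memory size.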

API Gateway / API Management:

  • HTTP APIs instead of REST APIs (HTTP APIs are about 70% cheaper per request)
  • Caching responses (reduce backend invocations)
  • Request throttling

4. Storage Cost Optimization

S3 / Blob Storage / Cloud Storage:

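Lifecycle Policies (S3 us-east-1 storage prices):
  - Standard (frequent access): $0.023/GB-month
  - Standard-IA (Infrequent Access): $0.0125/GB-month (46% cheaper, 30-day minimum)
  - Glacier Instant Retrieval: $0.004/GB-month (83% cheaper, millisecond retrieval, 90-day minimum)
  - Glacier Flexible Retrieval: $0.0036/GB-month (84% cheaper, retrieval from minutes (expedited) to 12 hours (bulk), 90-day minimum)
  - Deep Archive: $0.00099/GB-month (96% cheaper, standard retrieval within 12 hours, 180-day minimum)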

Optimization:
  - Auto-transition to Standard-IA after 30 days
  - Archive logs to Glacier after 90 days
  - Move compliance data to Deep Archive after 1 year
  - Expire data at the end of its retention period (e.g., 7 years)
  - Intelligent-Tiering for unpredictable access
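A sketch of the lifecycle above, written as a TypeScript object that mirrors the S3 lifecycle configuration JSON shape (`Rules`, `Filter`, `Transitions`, `Expiration`). The rule ID and the `logs/` prefix are hypothetical. The prices come from the list above, and `storageClassAt` is an illustrative helper that is not part of any AWS SDK.

```typescript
// Mirrors the JSON accepted by S3's put-bucket-lifecycle-configuration.
const lifecycleConfiguration = {
  Rules: [
    {
      ID: 'logs-tiering', // hypothetical rule name
      Filter: { Prefix: 'logs/' }, // hypothetical prefix
      Status: 'Enabled',
      Transitions: [
        { Days: 30, StorageClass: 'STANDARD_IA' },
        { Days: 90, StorageClass: 'GLACIER' }, // Glacier Flexible Retrieval
        { Days: 365, StorageClass: 'DEEP_ARCHIVE' },
      ],
      Expiration: { Days: 2555 }, // roughly 7 years
    },
  ],
};

// us-east-1 list prices from the table above (USD per GB-month).
const PRICE_PER_GB_MONTH: Record<string, number> = {
  STANDARD: 0.023,
  STANDARD_IA: 0.0125,
  GLACIER: 0.0036,
  DEEP_ARCHIVE: 0.00099,
};

// Storage class an object of the given age sits in under the rule above
// (ignores expiration, which deletes the object after 2555 days).
function storageClassAt(ageDays: number): string {
  let current = 'STANDARD';
  for (const t of lifecycleConfiguration.Rules[0].Transitions) {
    if (ageDays >= t.Days) current = t.StorageClass;
  }
  return current;
}

// Monthly storage cost of 1 TB (1,000 GB) at several ages. Request,
// retrieval, and per-object transition charges are not included.
for (const age of [0, 45, 180, 400]) {
  const cls = storageClassAt(age);
  console.log(age, cls, (1000 * PRICE_PER_GB_MONTH[cls]).toFixed(2));
}
```

Transitions bill per object. For buckets full of tiny objects, those charges can outweigh the storage savings, so aggregate small files before archiving them.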

EBS / Managed Disks / Persistent Disk:

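  • gp3 instead of gp2 (20% lower price per GB; 3,000 IOPS and 125 MiB/s baseline regardless of volume size)
  • Snapshot lifecycle management (delete stale snapshots, deregister unused AMIs)
  • Right-size volumes at creation (EBS volumes can grow in place but cannot shrink)
  • Tune gp3 IOPS and throughput independently of volume size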
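A sketch comparing gp2 and gp3 monthly cost for the same volume. It assumes us-east-1 list prices: gp2 at $0.10/GB-month, and gp3 at $0.08/GB-month plus $0.005 per provisioned IOPS-month above 3,000 and $0.04 per MiB/s-month above 125. The gp3 volume is provisioned to match gp2's baseline of 3 IOPS/GB. The helper names are hypothetical.

```typescript
// Assumed us-east-1 EBS list prices (USD).
const GP2_PER_GB = 0.1;
const GP3_PER_GB = 0.08;
const GP3_PER_EXTRA_IOPS = 0.005; // per IOPS-month above 3,000
const GP3_PER_EXTRA_MIBPS = 0.04; // per MiB/s-month above 125

// gp2 baseline: 3 IOPS per GB, minimum 100, maximum 16,000.
function gp2BaselineIops(sizeGB: number): number {
  return Math.min(16000, Math.max(100, 3 * sizeGB));
}

function gp2MonthlyCost(sizeGB: number): number {
  return sizeGB * GP2_PER_GB;
}

// gp3 includes 3,000 IOPS and 125 MiB/s; only provisioning above that is billed.
function gp3MonthlyCost(sizeGB: number, iops = 3000, throughputMiBps = 125): number {
  const extraIops = Math.max(0, iops - 3000);
  const extraThroughput = Math.max(0, throughputMiBps - 125);
  return sizeGB * GP3_PER_GB + extraIops * GP3_PER_EXTRA_IOPS + extraThroughput * GP3_PER_EXTRA_MIBPS;
}

for (const size of [500, 2000]) {
  const iops = gp2BaselineIops(size);
  console.log(size, gp2MonthlyCost(size).toFixed(2), gp3MonthlyCost(size, iops).toFixed(2));
}
// 500 GB: gp2 50.00 vs gp3 40.00; 2000 GB (6,000 IOPS): gp2 200.00 vs gp3 175.00
```

Matching only IOPS understates gp2's throughput on large volumes. If a workload relies on that throughput, pass a higher `throughputMiBps`.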

5. Database Cost Optimization

RDS / SQL Database / Cloud SQL:

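// Database cost levers and typical savings ranges.
interface DbOptimization {
  strategy: string;
  savings: string;
  commitment?: string;
  action?: string;
  useCases?: string[];
}

const optimizations: DbOptimization[] = [
  {
    strategy: 'Reserved Instances',
    savings: '35-65%',
    commitment: '1 or 3 years',
  },
  {
    strategy: 'Right-size instance',
    savings: '30-50%',
    action: 'Monitor CPU, IOPS, connections',
  },
  {
    strategy: 'Aurora Serverless',
    savings: 'Up to 90% for intermittent workloads',
    useCases: ['Dev/test', 'Seasonal apps'],
  },
  {
    strategy: 'Read replicas',
    savings: 'Indirect: offload reads so the primary can be smaller',
    useCases: ['Analytics', 'Reporting'],
  },
];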

DynamoDB / Cosmos DB / Firestore:

  • On-demand vs provisioned (predictable traffic = provisioned)
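  • Reserved capacity (1- or 3-year terms on provisioned capacity, roughly 50% or more savings)
  • TTL for automatic data deletion
  • Sparse indexes (only items that have the index key are copied, reducing storage and write costs)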
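A sketch of the on-demand vs provisioned decision for DynamoDB writes. The prices are assumptions based on us-east-1 rates: $0.625 per million on-demand write request units after the late-2024 price cut, and $0.00065 per WCU-hour. Verify them before relying on the result. The helpers are hypothetical, and one WCU sustains one write per second for items up to 1 KB.

```typescript
// Assumed us-east-1 DynamoDB prices (USD), standard table class.
const ON_DEMAND_PER_WRITE = 0.625 / 1_000_000; // per write request unit
const PROVISIONED_PER_WCU_HOUR = 0.00065;
const HOURS_PER_MONTH = 730;

// A WCU costs the same whether used or not and handles up to 3,600 writes/hour.
// Provisioned is cheaper once average utilization of provisioned WCUs exceeds this.
function breakEvenUtilization(): number {
  return PROVISIONED_PER_WCU_HOUR / (3600 * ON_DEMAND_PER_WRITE);
}

function monthlyWriteCost(avgWritesPerSecond: number, provisionedWcu: number) {
  return {
    onDemand: avgWritesPerSecond * 3600 * HOURS_PER_MONTH * ON_DEMAND_PER_WRITE,
    provisioned: provisionedWcu * HOURS_PER_MONTH * PROVISIONED_PER_WCU_HOUR,
  };
}

console.log(breakEvenUtilization().toFixed(3)); // "0.289"
// 100 writes/s average on 200 provisioned WCUs (50% utilization):
console.log(monthlyWriteCost(100, 200)); // onDemand about 164.25, provisioned about 94.9
```

This is why "predictable traffic = provisioned" holds. At the assumed prices, steady tables that keep their provisioned capacity more than about 29% busy come out ahead on provisioned mode. Spiky tables that would need large idle headroom do not.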

6. Networking Cost Optimization

Data Transfer:

Costs (AWS us-east-1):
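  - Internet egress: $0.09/GB (first 10 TB/month)
  - Inter-region: $0.02/GB
  - Same AZ (private IP): Free
  - Cross-AZ: $0.01/GB in each direction
  - VPC peering: free within an AZ, $0.01/GB in each direction across AZs
  - NAT Gateway: $0.045/hour + $0.045/GB processed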

Optimization:
  - Use CloudFront/CDN (caching reduces origin requests)
  - Same-region architecture (avoid cross-region)
  - VPC endpoints for AWS services (gateway endpoints for S3 and DynamoDB are free and bypass NAT processing charges)
  - Direct Connect for high-volume transfers (lower per-GB egress rates)
  - Compress data before transfer
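A sketch of the NAT-vs-endpoint arithmetic using the us-east-1 NAT Gateway prices listed above. The 10 TB/month of S3 traffic is a hypothetical workload. The sketch assumes the NAT gateway stays in place for other traffic, so only the per-GB processing charge is saved. `natGatewayMonthlyCost` is a hypothetical helper.

```typescript
// NAT Gateway prices from the list above (USD, us-east-1).
const NAT_PER_HOUR = 0.045;
const NAT_PER_GB = 0.045;
const HOURS_PER_MONTH = 730;

function natGatewayMonthlyCost(gbProcessed: number, gateways = 1): number {
  return gateways * HOURS_PER_MONTH * NAT_PER_HOUR + gbProcessed * NAT_PER_GB;
}

// Hypothetical: 10 TB/month of S3 traffic from private subnets.
const gbToS3PerMonth = 10_000;
console.log(natGatewayMonthlyCost(gbToS3PerMonth).toFixed(2)); // "482.85"

// Routing S3 through a free gateway endpoint removes the processing charge;
// the hourly charge remains if the NAT gateway still serves other traffic.
const savings = natGatewayMonthlyCost(gbToS3PerMonth) - natGatewayMonthlyCost(0);
console.log(savings.toFixed(2)); // "450.00"
```

Multi-AZ designs usually run one NAT gateway per AZ, which multiplies the hourly term. Set `gateways` accordingly.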

7. Cost Allocation & Tagging

Tagging Strategy:

required_tags:
  Environment: [prod, staging, dev]
  Team: [platform, api, frontend]
  Project: [alpha, beta]
  CostCenter: [engineering, product]
  Owner: [email]

enforcement:
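  - AWS Config required-tags rule (flag untagged resources)
  - Service control policies with aws:RequestTag conditions (deny creating untagged resources)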
  - Terraform validation
  - Monthly untagged resource report
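A sketch of the check behind a "Terraform validation" step or the monthly untagged-resource report. It validates one resource's tags against the `required_tags` schema above. `validateTags` is a hypothetical helper. The Owner regex is a simple assumption, not a full email validator.

```typescript
// Mirrors required_tags above; Owner is checked with a simple email pattern.
const requiredTags: Record<string, string[] | RegExp> = {
  Environment: ['prod', 'staging', 'dev'],
  Team: ['platform', 'api', 'frontend'],
  Project: ['alpha', 'beta'],
  CostCenter: ['engineering', 'product'],
  Owner: /^[^@\s]+@[^@\s]+\.[^@\s]+$/,
};

// Returns a list of problems; an empty list means the resource is compliant.
function validateTags(tags: Record<string, string>): string[] {
  const errors: string[] = [];
  for (const [key, rule] of Object.entries(requiredTags)) {
    const value = tags[key];
    if (value === undefined) {
      errors.push(`missing tag ${key}`);
    } else if (rule instanceof RegExp ? !rule.test(value) : !rule.includes(value)) {
      errors.push(`invalid ${key}=${value}`);
    }
  }
  return errors;
}

console.log(validateTags({
  Environment: 'prod', Team: 'api', Project: 'alpha',
  CostCenter: 'engineering', Owner: 'jane@example.com',
})); // []
console.log(validateTags({ Environment: 'production', Team: 'api' }));
// ['invalid Environment=production', 'missing tag Project',
//  'missing tag CostCenter', 'missing tag Owner']
```

Rejecting near-miss values such as `production` is what keeps cost reports groupable. Free-form tags fragment the allocation.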

Chargeback Model:

interface Chargeback {
  team: string;
  month: string;
  costs: {
    compute: number;
    storage: number;
    network: number;
    database: number;
  };
  budget: number;
  variance: number; // % over (+) or under (-) budget
  recommendations: string[];
}

// Show-back (informational) vs Chargeback (actual billing)
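A sketch of populating the `Chargeback` record. It assumes variance is defined as (actual − budget) / budget × 100. `buildChargeback`, the 20% network threshold, the recommendation strings, and the sample figures are all hypothetical.

```typescript
interface Chargeback {
  team: string;
  month: string;
  costs: { compute: number; storage: number; network: number; database: number };
  budget: number;
  variance: number; // % over (+) or under (-) budget
  recommendations: string[];
}

function buildChargeback(
  team: string,
  month: string,
  costs: Chargeback['costs'],
  budget: number,
): Chargeback {
  const total = costs.compute + costs.storage + costs.network + costs.database;
  // Variance as a percentage of budget, rounded to one decimal place.
  const variance = Math.round(((total - budget) / budget) * 1000) / 10;
  const recommendations: string[] = [];
  if (variance > 0) recommendations.push('Over budget: review top cost drivers');
  // Hypothetical rule of thumb: flag network spend above 20% of the total.
  if (costs.network / total > 0.2) recommendations.push('High network share: check cross-AZ and NAT traffic');
  return { team, month, costs, budget, variance, recommendations };
}

const report = buildChargeback(
  'api', '2025-10',
  { compute: 6000, storage: 1500, network: 3000, database: 3000 },
  12000,
);
console.log(report.variance); // 12.5
console.log(report.recommendations.length); // 2
```

Showback and chargeback can share the same record. Only what happens next differs: it is either reported to the team or billed to its cost center.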

8. Savings Plans & Commitments

AWS Savings Plans:

  • Compute Savings Plans (most flexible, EC2 + Fargate + Lambda)
  • EC2 Instance Savings Plans (specific instance family in a region)
  • SageMaker Savings Plans

Azure Reserved Instances:

  • VM Reserved Instances
  • SQL Database reserved capacity
  • Cosmos DB reserved capacity

GCP Committed Use Discounts:

  • Compute Engine CUDs (1-year, 3-year)
  • Cloud SQL commitments

Decision Matrix:

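// When to use Reserved Instances vs Savings Plans
interface UsagePattern {
  consistency: number; // % of hours the baseline capacity is in use (0-100)
  predictable: boolean; // usage stays on the same instance family and region
  variesByType: boolean; // usage shifts across families, regions, or to Fargate/Lambda
}

type Commitment = 'Reserved Instances' | 'Savings Plans' | 'On-demand + Spot';

const decision = (usage: UsagePattern): Commitment => {
  if (usage.consistency > 70 && usage.predictable && !usage.variesByType) {
    return 'Reserved Instances'; // Max savings, no flexibility
  }
  if (usage.consistency > 50) {
    return 'Savings Plans'; // Good savings, flexible across instance types
  }
  return 'On-demand + Spot'; // Unpredictable workloads
};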
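The consistency thresholds follow from simple arithmetic. A commitment is billed for every hour of the term, while on-demand bills only for hours used. With discount d, the commitment therefore pays off only when utilization exceeds 1 − d. The sketch below shows this. The $0.10/hour rate is hypothetical, and both helpers are illustrative.

```typescript
// committed cost = (1 - discount) * onDemandRate * hours   (paid whether used or not)
// on-demand cost = utilization * onDemandRate * hours
// Break-even when utilization = 1 - discount.
function commitmentBreakEven(discount: number): number {
  return 1 - discount;
}

// Annual savings of committing vs paying on-demand; negative means the commitment loses money.
function annualCommitmentSavings(onDemandRate: number, discount: number, utilization: number): number {
  const hours = 8760;
  const onDemand = onDemandRate * utilization * hours;
  const committed = onDemandRate * (1 - discount) * hours;
  return onDemand - committed;
}

console.log(commitmentBreakEven(0.4)); // 0.6: a 40% discount needs more than 60% utilization
console.log(annualCommitmentSavings(0.1, 0.4, 0.8).toFixed(2)); // "175.20"
console.log(annualCommitmentSavings(0.1, 0.4, 0.5).toFixed(2)); // "-87.60"
```

Deeper discounts (3-year terms) lower the break-even but lengthen the period in which utilization must stay high. That is why the guidance is to start with 1-year commitments.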

9. Cost Anomaly Detection

Alert Thresholds:

anomaly_detection:
  - metric: daily_cost
    threshold: 20%  # Alert if 20% above baseline
    baseline: 7-day rolling average

  - metric: service_cost
    threshold: 50%  # Alert if a service's cost is 50% above baseline
    baseline: Previous month

budgets:
  - name: Production
    limit: 30000  # USD per month
    alerts: [80%, 90%, 100%]
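A sketch of the `daily_cost` rule above. It flags any day whose cost exceeds the trailing 7-day average by more than 20%. `detectAnomalies` is a hypothetical helper, and the cost series is made up. Managed services such as AWS Cost Anomaly Detection use their own models rather than this fixed threshold.

```typescript
// Returns indexes of days whose cost is more than thresholdPct above the
// average of the preceding windowDays days.
function detectAnomalies(dailyCosts: number[], windowDays = 7, thresholdPct = 20): number[] {
  const anomalies: number[] = [];
  for (let i = windowDays; i < dailyCosts.length; i++) {
    const trailing = dailyCosts.slice(i - windowDays, i);
    const baseline = trailing.reduce((sum, c) => sum + c, 0) / windowDays;
    if (dailyCosts[i] > baseline * (1 + thresholdPct / 100)) anomalies.push(i);
  }
  return anomalies;
}

// Hypothetical daily spend (USD): a flat week, a mild rise, then a spike.
const costs = [1000, 1000, 1000, 1000, 1000, 1000, 1000, 1100, 1300, 1000];
console.log(detectAnomalies(costs)); // [8]: 1300 exceeds 1.2 x the ~1014 trailing average
```

Because the spike becomes part of later baselines, a sustained increase stops alerting after about a week. The month-over-month `service_cost` rule catches that case.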

10. Continuous Optimization

Monthly Cadence:

Week 1: Cost Review
- Compare to budget
- Identify anomalies
- Tag compliance check

Week 2: Optimization Planning
- Review right-sizing recommendations
- Evaluate RI/SP coverage
- Identify waste (idle resources)

Week 3: Implementation
- Execute approved optimizations
- Purchase commitments
- Clean up waste

Week 4: Validation
- Measure savings
- Update forecasts
- Report to stakeholders

Best Practices

Quick Wins (Immediate Savings)

  1. Terminate Idle Resources: 5-15% savings

    • Instances stopped for more than 7 days (their EBS volumes still bill)
    • Unattached EBS volumes
    • Unused Load Balancers
    • Old snapshots/AMIs
  2. Right-size Over-provisioned: 15-30% savings

    • Instances with sustained < 20% CPU utilization (e.g., over 14 days)
    • Over-provisioned memory
    • Excessive IOPS
  3. Storage Lifecycle: 20-50% savings

    • S3/Blob lifecycle policies
    • Delete old logs/backups
    • Compress data
  4. Reserved Instance Coverage: 30-70% savings

    • Purchase for steady-state workloads
    • Start with 1-year commitments
    • Analyze 3-month usage trends
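A sketch of screening an inventory for the first two quick wins. The `InventoryItem` shape is hypothetical; in practice it would be assembled from describe APIs and CloudWatch metrics. The state strings match EC2 (`running`, `stopped`) and EBS (`available` means unattached, `in-use` means attached).

```typescript
// Hypothetical inventory record.
interface InventoryItem {
  id: string;
  type: 'instance' | 'volume';
  state: string; // 'running' | 'stopped' for instances; 'available' | 'in-use' for volumes
  stateSinceDays?: number; // days in the current state
  avgCpuPct?: number; // average CPU over the lookback window
}

function findQuickWins(items: InventoryItem[]): { id: string; action: string }[] {
  return items.flatMap((r) => {
    if (r.type === 'instance' && r.state === 'stopped' && (r.stateSinceDays ?? 0) > 7) {
      return [{ id: r.id, action: 'terminate (stopped > 7 days)' }];
    }
    if (r.type === 'volume' && r.state === 'available') {
      return [{ id: r.id, action: 'snapshot and delete (unattached)' }];
    }
    // Missing metrics are treated as busy so nothing is flagged without data.
    if (r.type === 'instance' && r.state === 'running' && (r.avgCpuPct ?? 100) < 20) {
      return [{ id: r.id, action: 'right-size (CPU < 20%)' }];
    }
    return [];
  });
}

const inventory: InventoryItem[] = [
  { id: 'i-1', type: 'instance', state: 'stopped', stateSinceDays: 10 },
  { id: 'vol-1', type: 'volume', state: 'available' },
  { id: 'i-2', type: 'instance', state: 'running', avgCpuPct: 12 },
  { id: 'i-3', type: 'instance', state: 'running', avgCpuPct: 55 },
];
console.log(findQuickWins(inventory)); // flags i-1, vol-1, i-2
```

Review the results with owners before acting. A low-CPU instance may be memory-bound, and a stopped instance may be a deliberate standby.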

Architecture Patterns for Cost

Serverless-First:

  • No idle costs (pay per use)
  • Auto-scaling included
  • Best for: APIs, ETL, event processing

Spot/Preemptible for Batch:

  • 50-90% discounts
  • Best for: CI/CD, data processing, ML training

Multi-tier Storage:

  • Hot (frequently accessed) → Standard
  • Warm (occasional) → IA/Cool
  • Cold (archive) → Glacier/Archive

Common Mistakes

Don't:

  • Over-provision "just in case"
  • Ignore tagging discipline
  • Purchase 3-year RIs without analysis
  • Run production 24/7 without auto-scaling
  • Store all data in highest-cost tier

Do:

  • Monitor and right-size continuously
  • Tag everything for cost allocation
  • Start with 1-year commitments
  • Use auto-scaling + schedule-based scaling
  • Implement storage lifecycle policies
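Schedule-based scaling pays off because non-production environments are idle most of the week. A sketch of the arithmetic follows. The fleet size and the $0.10/hour rate are hypothetical. Only instance-hours are saved, since attached EBS volumes keep billing while instances are stopped.

```typescript
// Fraction of weekly hours saved by running only on a schedule.
function scheduleSavings(hoursPerDay: number, daysPerWeek: number): number {
  return 1 - (hoursPerDay * daysPerWeek) / (24 * 7);
}

// Monthly compute savings for a fleet (730 hours per month).
function monthlyScheduleSavings(instances: number, hourlyRate: number, hoursPerDay: number, daysPerWeek: number): number {
  return instances * hourlyRate * 730 * scheduleSavings(hoursPerDay, daysPerWeek);
}

// Dev environment running 12 hours a day, 5 days a week: 60 of 168 hours.
console.log((scheduleSavings(12, 5) * 100).toFixed(1)); // "64.3"
console.log(monthlyScheduleSavings(10, 0.1, 12, 5).toFixed(2)); // "469.29"
```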

Tools & Resources

AWS:

  • Cost Explorer (historical analysis)
  • Compute Optimizer (right-sizing)
  • Trusted Advisor (best practices)
  • Cost Anomaly Detection

Azure:

  • Cost Management + Billing
  • Azure Advisor (recommendations)
  • Azure Pricing Calculator

GCP:

  • Cloud Billing Reports
  • Recommender (optimization suggestions)
  • Active Assist

Third-party:

  • CloudHealth, CloudCheckr (multi-cloud)
  • Spot.io (spot instance management)
  • Vantage, CloudZero (cost visibility)

Calculate ROI: weigh projected savings against the engineering time spent optimizing
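A minimal sketch of that ROI check. It compares savings over a horizon with the cost of engineering time and reports the payback period. `optimizationRoi` and the figures in the example are hypothetical.

```typescript
// net = savings over the horizon minus engineering cost; roi = net / engineering cost.
function optimizationRoi(monthlySavings: number, months: number, engineerHours: number, hourlyRate: number) {
  const savings = monthlySavings * months;
  const effort = engineerHours * hourlyRate;
  return {
    net: savings - effort,
    roi: (savings - effort) / effort,
    paybackMonths: effort / monthlySavings,
  };
}

// $500/month saved for 12 months, 40 engineer-hours at $100/hour.
console.log(optimizationRoi(500, 12, 40, 100)); // { net: 2000, roi: 0.5, paybackMonths: 8 }
```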

You are ready to optimize cloud costs like a FinOps expert!