Files
gh-ahmedasmar-devops-claude…/references/finops_governance.md
2025-11-29 17:51:09 +08:00

17 KiB

FinOps Governance Framework

Organizational practices, processes, and governance for AWS cost optimization.

Table of Contents

  1. FinOps Principles
  2. Cost Allocation & Tagging
  3. Budget Management
  4. Monthly Review Process
  5. Roles & Responsibilities
  6. Chargeback & Showback
  7. Policy & Governance
  8. Metrics & KPIs

FinOps Principles

The FinOps Framework

FinOps is the practice of bringing financial accountability to cloud spending through collaboration between engineering, finance, and business teams.

Core Principles:

  1. Teams Need to Collaborate

    • Engineering makes technical decisions
    • Finance provides visibility and reporting
    • Business sets priorities and budgets
    • Cross-functional cost optimization
  2. Everyone Takes Ownership

    • Engineers see cost impact of their decisions
    • Teams have cost budgets and accountability
    • Cost is a efficiency metric, not just finance
  3. Decisions Driven by Business Value

    • Speed, quality, and cost trade-offs
    • Investment vs optimization decisions
    • ROI-based prioritization
  4. Take Advantage of Variable Cost Model

    • Scale resources up and down as needed
    • Use different pricing models strategically
    • Optimize for actual usage patterns
  5. Centralized Team Drives FinOps

    • Central FinOps team enables
    • Distributed execution by product teams
    • Share best practices and tools

FinOps Maturity Model

Crawl Phase (Getting Started)

  • Basic cost visibility
  • Manual reporting
  • Ad-hoc optimization
  • Initial tagging strategy
  • Basic budget alerts

Walk Phase (Improving)

  • Automated cost reporting
  • Regular optimization reviews
  • Systematic tagging enforcement
  • Team cost allocation
  • Reserved Instance planning
  • Monthly optimization meetings

Run Phase (Optimized)

  • Real-time cost visibility
  • Automated optimization
  • Cost-aware engineering culture
  • Predictive forecasting
  • Automated guardrails
  • FinOps integrated in SDLC

Cost Allocation & Tagging

Tagging Strategy

Required Tags (Enforce via Policy)

Required Tags:
  Environment:
    values: [prod, staging, dev, test]
    purpose: Separate production from non-production costs

  Owner:
    values: [email or team name]
    purpose: Contact for resource questions

  Project:
    values: [project code]
    purpose: Track project spending

  CostCenter:
    values: [department code]
    purpose: Chargeback allocation

  Application:
    values: [app name]
    purpose: Application-level cost tracking

Optional but Recommended Tags

Optional Tags:
  ExpirationDate:
    format: YYYY-MM-DD
    purpose: Auto-cleanup scheduling

  DataClassification:
    values: [public, internal, confidential, restricted]
    purpose: Security and compliance

  BackupRequired:
    values: [true, false]
    purpose: Backup policy enforcement

  Criticality:
    values: [critical, high, medium, low]
    purpose: Priority and SLA determination

Tag Enforcement

Using AWS Organizations Service Control Policies (SCP)

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyEC2CreationWithoutTags",
      "Effect": "Deny",
      "Action": [
        "ec2:RunInstances"
      ],
      "Resource": [
        "arn:aws:ec2:*:*:instance/*"
      ],
      "Condition": {
        "StringNotLike": {
          "aws:RequestTag/Environment": ["prod", "staging", "dev", "test"],
          "aws:RequestTag/Owner": "*",
          "aws:RequestTag/Project": "*"
        }
      }
    }
  ]
}

Using AWS Config Rules

  • required-tags: Enforce tags on all resources
  • ec2-instance-no-public-ip: Prevent public IPs unless tagged
  • Custom Lambda-based rules for complex logic

Tag Compliance Monitoring

# Example: Check tag compliance
# Run weekly to find untagged resources

aws resourcegroupstaggingapi get-resources \
  --query 'ResourceTagMappingList[?length(Tags) == `0`]' \
  --output table

# Or use Tag Editor in AWS Console

Cost Allocation Tags

Activating Cost Allocation Tags

  1. Go to AWS Billing → Cost Allocation Tags
  2. Select user-defined tags to activate
  3. Wait 24 hours for tags to appear in Cost Explorer
  4. Tags only apply to charges after activation

Best Practices

  • Activate tags before using them
  • Use consistent naming (e.g., Environment not Env or environment)
  • Document tag values in wiki/runbook
  • Review and update tag strategy quarterly

Budget Management

AWS Budgets Setup

Budget Types

  1. Cost Budget: Track spending against threshold
  2. Usage Budget: Track service usage (e.g., EC2 hours)
  3. Savings Plans Budget: Track commitment utilization
  4. Reservation Budget: Track RI utilization

Recommended Budgets

1. Overall Monthly Budget

Budget Name: Company-Wide-Monthly-Budget
Amount: $50,000/month
Alerts:
  - 50% actual: Email CFO, FinOps team
  - 80% actual: Email CFO, CTO, FinOps team
  - 100% forecasted: Email CFO, CTO, all team leads
  - 100% actual: Email everyone + Slack alert

2. Per-Environment Budgets

Budget Name: Production-Environment-Budget
Amount: $30,000/month
Filter: Environment=prod
Alerts:
  - 80% actual: Email engineering leads
  - 100% forecasted: Email CTO + FinOps

Budget Name: Dev-Environment-Budget
Amount: $5,000/month
Filter: Environment=dev
Alerts:
  - 100% actual: Email dev team leads
  - 120% actual: Automated shutdown (if possible)

3. Per-Team Budgets

Budget Name: Team-Platform-Budget
Amount: $15,000/month
Filter: Owner=platform-team
Alerts:
  - 90% actual: Email platform team
  - 100% forecasted: Email platform team + manager

4. Per-Project Budgets

Budget Name: Project-Phoenix-Budget
Amount: $8,000/month
Filter: Project=phoenix
Alerts:
  - 75% actual: Email project owner
  - 100% actual: Email project owner + sponsor

Budget Alert Actions

Automated Responses to Budget Alerts

# Lambda function triggered by Budget alert SNS topic

def lambda_handler(event, context):
    # Parse budget alert
    budget_name = event['budgetName']
    threshold = event['threshold']

    if threshold >= 100:
        # Stop non-production instances
        stop_dev_instances()

        # Send Slack alert
        send_slack_alert(f"🚨 Budget {budget_name} exceeded!")

        # Create JIRA ticket
        create_cost_investigation_ticket()

    elif threshold >= 80:
        # Send warning
        send_slack_alert(f"⚠️  Budget {budget_name} at 80%")

Monthly Review Process

FinOps Monthly Cadence

Week 1: Data Collection

  • Export Cost & Usage Reports
  • Run cost optimization scripts
  • Gather CloudWatch metrics
  • Compile anomaly reports

Week 2: Analysis

  • Identify cost trends
  • Find optimization opportunities
  • Compare to previous months
  • Analyze tag compliance

Week 3: Team Review Meetings

  • Present findings to engineering teams
  • Discuss optimization opportunities
  • Assign action items
  • Review upcoming projects

Week 4: Executive Reporting

  • Create executive summary
  • Present cost trends to leadership
  • Report on optimization wins
  • Forecast next quarter

Monthly Review Meeting Agenda

Attendees: Engineering Leads, FinOps Team, Finance Rep, Product Manager

Agenda (1 hour)

  1. Previous Month Recap (10 min)

    • Total spend vs budget
    • Top 5 services by cost
    • Month-over-month comparison
    • Budget variance explanation
  2. Cost Anomalies (10 min)

    • Unusual spending spikes
    • Root cause analysis
    • Prevention measures
  3. Optimization Opportunities (15 min)

    • Unused resources found
    • Rightsizing recommendations
    • Reserved Instance opportunities
    • Estimated savings
  4. Team Cost Breakdown (10 min)

    • Per-team spending
    • Top spenders
    • Tag compliance status
  5. Upcoming Changes (10 min)

    • New projects launching
    • Expected cost impact
    • Budget adjustments needed
  6. Action Items Review (5 min)

    • Follow-up on previous items
    • Assign new action items
    • Set deadlines

Deliverable: Monthly FinOps Report (template provided)

Monthly Report Template

# AWS Cost Report - [Month Year]

## Executive Summary
- Total spend: $XX,XXX
- vs Budget: X% (under/over)
- vs Last month: +/-X%
- Optimization savings: $X,XXX

## Cost Breakdown
| Service | Cost | % of Total | MoM Change |
|---------|------|-----------|-----------|
| EC2     | $XX  | XX%       | +/-X%     |
| RDS     | $XX  | XX%       | +/-X%     |

## Optimization Actions Taken
1. Migrated 20 instances to Graviton (saved $X/month)
2. Purchased Reserved Instances (saved $X/month)
3. Deleted unused resources (saved $X/month)

## Recommendations for Next Month
1. Right-size 15 oversized instances (potential $X/month savings)
2. Implement S3 lifecycle policies (potential $X/month savings)

## Action Items
- [ ] [Owner] Task description (Deadline)

Roles & Responsibilities

FinOps Team Structure

FinOps Lead

  • Owns overall cloud financial management
  • Reports to CFO and CTO
  • Sets FinOps strategy and goals
  • Manages budget process

Cloud Cost Analyst

  • Analyzes spending trends
  • Generates reports and dashboards
  • Identifies optimization opportunities
  • Runs monthly review process

Cloud Architect (FinOps focus)

  • Advises on cost-optimized architectures
  • Implements cost optimization tools
  • Trains engineers on FinOps practices
  • Reviews architectural designs for cost impact

Engineering Team Responsibilities

Engineering Manager

  • Owns team budget
  • Reviews monthly cost reports
  • Prioritizes optimization work
  • Ensures tagging compliance

Engineers

  • Tag all resources they create
  • Consider cost in design decisions
  • Implement optimization recommendations
  • Delete unused resources

Platform/SRE Team

  • Implements cost optimization tooling
  • Automates cost monitoring
  • Provides cost visibility dashboards
  • Enforces tagging policies

Chargeback & Showback

Showback (Visibility Only)

Purpose: Show teams their costs without charging them Goal: Raise cost awareness

Implementation:

  • Monthly cost reports per team
  • Dashboard showing team spending
  • Highlight cost trends
  • No budget enforcement

Best for: Organizations new to FinOps

Chargeback (Financial Accountability)

Purpose: Allocate costs back to business units Goal: Financial accountability

Implementation:

  • Tag-based cost allocation
  • Transfer costs between cost centers
  • Teams have hard budgets
  • Overspending requires justification

Best for: Mature FinOps organizations

Shared Costs: Charged to central IT

  • VPC resources
  • Security tools
  • Monitoring infrastructure
  • Shared services

Team Costs: Charged to teams

  • Compute resources (EC2, Lambda)
  • Databases
  • Storage
  • Application-specific services

Implementation:

Total AWS Bill: $100,000

Shared Costs (30%): $30,000
  → Charged to IT/Platform budget

Team Costs (70%): $70,000
  → Allocated by tags:
    - Team A (Project=alpha): $20,000
    - Team B (Project=beta): $25,000
    - Team C (Project=gamma): $15,000
    - Untagged (alert!): $10,000 → Needs investigation

Policy & Governance

Cost Governance Policies

1. Resource Creation Policies

Policy: All resources must be tagged
Enforcement: Service Control Policy (SCP)
Exception process: Request via FinOps team

Policy: Dev/test resources must auto-stop nights/weekends
Enforcement: AWS Instance Scheduler
Exception process: Tag with NoAutoStop=true (requires approval)

Policy: S3 buckets must have lifecycle policies
Enforcement: AWS Config rule
Exception process: Document justification in bucket tags

2. Approval Workflows

# Spending thresholds requiring approval

< $1,000/month:
  - Auto-approved
  - Must be tagged

$1,000 - $5,000/month:
  - Engineering manager approval
  - Documented in JIRA

$5,000 - $20,000/month:
  - Director approval
  - Budget impact assessment
  - FinOps team review

> $20,000/month:
  - VP approval
  - Business case required
  - Quarterly review checkpoint

3. Reserved Instance / Savings Plans Policy

Policy: All commitments require FinOps review

Process:
  1. Team identifies workload suitable for commitment
  2. Submit request to FinOps with:
     - Resource details
     - Usage history (30+ days)
     - Business justification
  3. FinOps analyzes and recommends
  4. Finance approves commitment
  5. FinOps purchases and tracks utilization

Automation & Guardrails

Automated Actions

# Non-production resource scheduling
Schedule: Instance Scheduler
  - Stop all dev/test EC2/RDS instances at 7pm weekdays
  - Stop all dev/test instances all weekend
  - Start at 7am weekdays
  - Exception tag: NoAutoStop=true

# Untagged resource alerts
Trigger: AWS Config rule violation
Action:
  - Send Slack alert to team
  - Create JIRA ticket
  - Escalate if not tagged in 48 hours

# Old snapshot cleanup
Schedule: Weekly Lambda function
Action:
  - Delete snapshots older than 90 days (unless tagged KeepForever=true)
  - Notify teams of deletions
  - Estimate savings

# Budget breach response
Trigger: Budget > 100%
Action:
  - Email alerts to stakeholders
  - Create incident ticket
  - Stop non-production resources (optional)

Metrics & KPIs

Key FinOps Metrics

1. Cost Metrics

Total Monthly Cloud Spend:
  Target: Within budget
  Trend: Track month-over-month

Cost per Customer:
  Calculation: Total AWS Cost / Active Customers
  Target: Decreasing over time

Cost per Transaction:
  Calculation: Total AWS Cost / Transactions Processed
  Target: Optimize for efficiency

Unit Economics:
  Calculation: Revenue per Customer - Cost per Customer
  Target: Positive and growing

2. Efficiency Metrics

Compute Utilization:
  Metric: Average CPU utilization
  Target: 40-60% (room for burst, not over-provisioned)

Storage Utilization:
  Metric: % of S3 in cost-optimized tiers
  Target: >60% in IA or Glacier tiers

Reserved Instance Coverage:
  Metric: % of On-Demand usage covered by RIs/SPs
  Target: >70% for stable workloads

RI/SP Utilization:
  Metric: % of RIs/SPs actually used
  Target: >90%

3. Operational Metrics

Tag Compliance:
  Metric: % of resources with required tags
  Target: >95%

Budget Variance:
  Metric: Actual vs Budget %
  Target: ±5%

Optimization Savings:
  Metric: $ saved per month from optimizations
  Target: Growing

Mean Time to Optimize (MTTO):
  Metric: Days from finding opportunity to implementing
  Target: <30 days

4. Organizational Metrics

FinOps Engagement:
  Metric: % of teams attending monthly reviews
  Target: 100%

Cost Awareness:
  Survey: Do engineers know their team's monthly cost?
  Target: >80% aware

Optimization Velocity:
  Metric: # optimization tasks completed per quarter
  Target: Growing trend

Dashboard Requirements

Executive Dashboard (Monthly)

  • Total spend vs budget
  • Spend by service (top 10)
  • Month-over-month trend
  • Forecast for next quarter
  • Optimization savings achieved

Engineering Dashboard (Real-time)

  • Per-team costs (daily)
  • Cost anomaly alerts
  • Untagged resources count
  • Budget utilization %
  • Top cost drivers

FinOps Dashboard (Daily)

  • Detailed service costs
  • Tag compliance metrics
  • RI/SP utilization
  • Rightsizing opportunities
  • Unused resource counts

Getting Started Checklist

Phase 1: Foundation (Month 1)

  • Enable Cost Explorer
  • Set up AWS Budgets
  • Define tagging strategy
  • Activate cost allocation tags
  • Set up Cost and Usage Reports (CUR)
  • Create basic cost dashboard

Phase 2: Visibility (Months 2-3)

  • Implement tagging enforcement
  • Run first optimization scripts
  • Set up monthly review meeting
  • Create team cost reports
  • Assign team cost owners
  • Document FinOps processes

Phase 3: Optimization (Months 4-6)

  • Implement automated resource scheduling
  • Purchase first Reserved Instances
  • Set up cost anomaly detection
  • Automate reporting
  • Train engineering teams
  • Implement showback/chargeback

Phase 4: Culture (Ongoing)

  • Cost metrics in engineering KPIs
  • Cost review in architecture reviews
  • Regular optimization sprints
  • FinOps champions in each team
  • Cost-aware development practices
  • Continuous improvement

Resources

AWS Native Tools

  • AWS Cost Explorer
  • AWS Budgets
  • AWS Cost Anomaly Detection
  • AWS Compute Optimizer
  • AWS Trusted Advisor
  • AWS Cost & Usage Reports

Third-Party Tools

  • CloudHealth (VMware)
  • Cloudability (Apptio)
  • Kubecost (Kubernetes cost monitoring)
  • Spot.io (Cost optimization platform)

FinOps Foundation