Initial commit
skills/aws-cost-operations/SKILL.md
---
name: aws-cost-operations
description: This skill provides AWS cost optimization, monitoring, and operational best practices with integrated MCP servers for billing analysis, cost estimation, observability, and security assessment.
---

# AWS Cost & Operations

This skill provides comprehensive guidance for AWS cost optimization, monitoring, observability, and operational excellence with integrated MCP servers.

## Integrated MCP Servers

This skill includes 8 MCP servers automatically configured with the plugin:

### Cost Management Servers

#### 1. AWS Billing and Cost Management MCP Server
**Purpose**: Real-time billing and cost management
- View current AWS spending and trends
- Analyze billing details across services
- Track budget utilization
- Monitor cost allocation tags
- Review consolidated billing for organizations

#### 2. AWS Pricing MCP Server
**Purpose**: Pre-deployment cost estimation and optimization
- Estimate costs before deploying resources
- Compare pricing across regions
- Calculate Total Cost of Ownership (TCO)
- Evaluate different service options for cost efficiency
- Get current pricing information for AWS services

#### 3. AWS Cost Explorer MCP Server
**Purpose**: Detailed cost analysis and reporting
- Analyze historical spending patterns
- Create custom cost reports
- Identify cost anomalies and trends
- Forecast future costs
- Analyze cost by service, region, or tag
- Generate cost optimization recommendations

### Monitoring & Observability Servers

#### 4. Amazon CloudWatch MCP Server
**Purpose**: Metrics, alarms, and logs analysis
- Query CloudWatch metrics and logs
- Create and manage CloudWatch alarms
- Analyze application performance metrics
- Troubleshoot operational issues
- Set up custom dashboards
- Monitor resource utilization

#### 5. Amazon CloudWatch Application Signals MCP Server
**Purpose**: Application monitoring and performance insights
- Monitor application health and performance
- Analyze service-level objectives (SLOs)
- Track application dependencies
- Identify performance bottlenecks
- Monitor service map and traces

#### 6. AWS Managed Prometheus MCP Server
**Purpose**: Prometheus-compatible monitoring
- Query Prometheus metrics
- Monitor containerized applications
- Analyze Kubernetes workload metrics
- Create PromQL queries
- Track custom application metrics

### Audit & Security Servers

#### 7. AWS CloudTrail MCP Server
**Purpose**: AWS API activity and audit analysis
- Analyze AWS API calls and user activity
- Track resource changes and modifications
- Investigate security incidents
- Audit compliance requirements
- Identify unusual access patterns
- Review who changed what, and when

#### 8. AWS Well-Architected Security Assessment Tool MCP Server
**Purpose**: Security assessment against the AWS Well-Architected Framework
- Assess security posture against AWS best practices
- Identify security gaps and vulnerabilities
- Get security improvement recommendations
- Review security pillar compliance
- Generate security assessment reports

## When to Use This Skill

Use this skill when:
- Optimizing AWS costs and reducing spending
- Estimating costs before deployment
- Monitoring application and infrastructure performance
- Setting up observability and alerting
- Analyzing spending patterns and trends
- Investigating operational issues
- Auditing AWS activity and changes
- Assessing security posture
- Implementing operational excellence

## Cost Optimization Best Practices

### Pre-Deployment Cost Estimation

**Always estimate costs before deploying**:
1. Use **AWS Pricing MCP** to estimate resource costs
2. Compare pricing across different regions
3. Evaluate alternative service options
4. Calculate expected monthly costs
5. Plan for scaling and growth

**Example workflow**:
```
"Estimate the monthly cost of running a Lambda function with
1 million invocations, 512MB memory, 3-second duration in us-east-1"
```
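
The arithmetic behind an estimate like the one requested above can be sketched by hand as a sanity check on what the Pricing MCP returns. This is a minimal sketch: the two rates are assumptions (illustrative x86 on-demand figures for us-east-1 that ignore the free tier and may be out of date), and `estimate_lambda_monthly_cost` is a hypothetical helper, not part of any AWS SDK.

```python
# Assumed, illustrative rates: confirm against current AWS pricing before relying on them.
PRICE_PER_MILLION_REQUESTS = 0.20    # USD per 1M requests (assumption)
PRICE_PER_GB_SECOND = 0.0000166667   # USD per GB-second on x86 (assumption)


def estimate_lambda_monthly_cost(invocations: int, memory_mb: int, duration_s: float) -> float:
    """Return an approximate monthly Lambda cost in USD, ignoring the free tier."""
    # Compute is billed on memory (in GB) multiplied by execution time (in seconds).
    gb_seconds = invocations * duration_s * (memory_mb / 1024)
    compute_cost = gb_seconds * PRICE_PER_GB_SECOND
    request_cost = (invocations / 1_000_000) * PRICE_PER_MILLION_REQUESTS
    return round(compute_cost + request_cost, 2)


# The scenario from the prompt above: 1M invocations, 512 MB, 3 seconds each.
estimate = estimate_lambda_monthly_cost(1_000_000, 512, 3.0)
print(f"Estimated monthly cost: ${estimate:.2f}")
```

Under these assumed rates, nearly all of the cost comes from compute time rather than request count, which is why memory size and duration are the first things to tune.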

### Cost Analysis and Optimization

**Regular cost reviews**:
1. Use **Cost Explorer MCP** to analyze spending trends
2. Identify cost anomalies and unexpected charges
3. Review costs by service, region, and environment
4. Compare actual vs. budgeted costs
5. Generate cost optimization recommendations
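
To make "identify cost anomalies" concrete, here is one simple way a spike could be flagged once daily totals have been retrieved. It is a sketch under stated assumptions: the z-score rule and the `flag_cost_anomalies` helper are illustrative choices, not the method Cost Explorer itself uses, and the sample figures are invented.

```python
from statistics import mean, stdev


def flag_cost_anomalies(daily_costs: dict[str, float], threshold: float = 2.0) -> list[str]:
    """Return dates whose cost is more than `threshold` standard deviations from the mean."""
    values = list(daily_costs.values())
    if len(values) < 2:
        return []  # not enough data to measure spread
    avg, spread = mean(values), stdev(values)
    if spread == 0:
        return []  # every day cost the same, nothing stands out
    return [day for day, cost in daily_costs.items() if abs(cost - avg) / spread > threshold]


# Hypothetical daily totals in USD; in practice these would come from Cost Explorer.
sample = {
    "2024-05-01": 41.0, "2024-05-02": 39.5, "2024-05-03": 40.2, "2024-05-04": 42.1,
    "2024-05-05": 40.8, "2024-05-06": 39.9, "2024-05-07": 95.0,
}
print(flag_cost_anomalies(sample))
```

A fixed z-score threshold is easy to reason about but sensitive to short windows; longer histories or per-service baselines would likely give fewer false alarms.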

**Cost optimization strategies**:
- Right-size over-provisioned resources
- Use appropriate storage classes (S3, EBS)
- Implement auto-scaling for dynamic workloads
- Leverage Savings Plans and Reserved Instances
- Delete unused resources and snapshots
- Use cost allocation tags effectively
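
As an illustration of the right-sizing strategy, the sketch below picks out instances whose average CPU utilization stays low. The 20% cutoff, the instance IDs, and the `rightsizing_candidates` helper are all assumptions made for the example; a real review would also weigh memory, network, and peak load.

```python
def rightsizing_candidates(avg_cpu_by_instance: dict[str, float], max_avg_cpu: float = 20.0) -> list[str]:
    """Return instance IDs whose average CPU utilization (percent) is below `max_avg_cpu`."""
    return sorted(
        instance_id
        for instance_id, cpu in avg_cpu_by_instance.items()
        if cpu < max_avg_cpu
    )


# Hypothetical two-week averages, e.g. gathered from CloudWatch CPUUtilization metrics.
fleet = {"i-0aaa": 6.5, "i-0bbb": 71.2, "i-0ccc": 18.9}
print(rightsizing_candidates(fleet))
```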

### Budget Monitoring

**Track spending against budgets**:
1. Use **Billing and Cost Management MCP** to monitor budgets
2. Set up budget alerts for threshold breaches
3. Review budget utilization regularly
4. Adjust budgets based on trends
5. Implement cost controls and governance
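
The budget checks above reduce to two small calculations: which alert thresholds have been crossed, and where spending is heading by month end. The following sketch assumes alert thresholds of 50%, 80%, and 100% and a naive linear forecast; both helper names are hypothetical.

```python
def budget_status(spent: float, budget: float, thresholds=(0.5, 0.8, 1.0)) -> dict:
    """Return budget utilization and the alert thresholds that have been crossed."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    utilization = spent / budget
    breached = [t for t in thresholds if utilization >= t]
    return {"utilization": round(utilization, 3), "breached": breached}


def forecast_month_end(spent: float, day_of_month: int, days_in_month: int) -> float:
    """Project month-end spend by extending the average daily spend so far."""
    if day_of_month <= 0:
        raise ValueError("day_of_month must be at least 1")
    return spent / day_of_month * days_in_month


status = budget_status(spent=850.0, budget=1000.0)
projected = forecast_month_end(spent=850.0, day_of_month=20, days_in_month=30)
print(status, projected)
```

A linear projection ignores weekly cycles and one-off charges, so it is best read as an early warning rather than a forecast to budget against.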

## Monitoring and Observability Best Practices

### CloudWatch Metrics and Alarms

**Implement comprehensive monitoring**:
1. Use **CloudWatch MCP** to query metrics and logs
2. Set up alarms for critical metrics:
   - CPU and memory utilization
   - Error rates and latency
   - Queue depths and processing times
   - API Gateway throttling
   - Lambda errors and timeouts
3. Create CloudWatch dashboards for visualization
4. Use CloudWatch Logs Insights for troubleshooting

**Example alarm scenarios**:
- Lambda error rate > 1%
- EC2 CPU utilization > 80%
- API Gateway 4xx/5xx error spike
- DynamoDB throttled requests
- ECS task failures
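
One of the scenarios above, EC2 CPU utilization above 80%, could be expressed as alarm parameters along these lines. This sketch only builds the dictionary and makes no AWS call; the alarm name, the five-minute period, and the three evaluation periods are assumptions, and `cpu_alarm_params` is a hypothetical helper.

```python
def cpu_alarm_params(instance_id: str, threshold_pct: float = 80.0) -> dict:
    """Build keyword arguments in the shape boto3's CloudWatch put_metric_alarm accepts."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",  # naming scheme is an assumption
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,             # seconds per datapoint (assumption)
        "EvaluationPeriods": 3,    # consecutive periods to evaluate (assumption)
        "Threshold": threshold_pct,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "missing",
    }


params = cpu_alarm_params("i-0123456789abcdef0")
# With credentials configured, these could be passed on as:
#   boto3.client("cloudwatch").put_metric_alarm(**params)
print(params["AlarmName"])
```

Keeping parameter construction separate from the API call makes the alarm definition easy to review and test before anything is created in an account.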

### Application Performance Monitoring

**Monitor application health**:
1. Use **CloudWatch Application Signals MCP** for APM
2. Track service-level objectives (SLOs)
3. Monitor application dependencies
4. Identify performance bottlenecks
5. Set up distributed tracing
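
Tracking an SLO usually comes down to how much of its error budget has been spent. The sketch below shows that calculation for a request-based availability SLO; the 99.9% target and the request counts are invented, and `error_budget_remaining` is a hypothetical helper rather than an Application Signals API.

```python
def error_budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Return the fraction of the error budget still unspent (negative if overspent)."""
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    # A 99.9% target allows 0.1% of requests to fail within the window.
    allowed_failures = (1 - slo_target) * total_requests
    return 1 - failed_requests / allowed_failures


remaining = error_budget_remaining(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
print(f"Error budget remaining: {remaining:.0%}")
```

A negative result means the SLO has already been missed for the window, which is typically the signal to pause risky changes.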

### Container and Kubernetes Monitoring

**For containerized workloads**:
1. Use **AWS Managed Prometheus MCP** for metrics
2. Monitor container resource utilization
3. Track pod and node health
4. Create PromQL queries for custom metrics
5. Set up alerts for container anomalies
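
The PromQL queries mentioned above might look like the two built here. This is a sketch that assumes the workspace scrapes the usual cAdvisor and kube-state-metrics series (`container_cpu_usage_seconds_total` and `kube_pod_container_status_restarts_total`); the helper names, namespace, and restart limit are hypothetical.

```python
def pod_cpu_query(namespace: str, window: str = "5m") -> str:
    """Build a PromQL query for per-pod CPU usage (in cores) within one namespace."""
    return (
        "sum by (pod) (rate(container_cpu_usage_seconds_total"
        f'{{namespace="{namespace}"}}[{window}]))'
    )


def pod_restart_alert_expr(namespace: str, max_restarts: int = 3, window: str = "1h") -> str:
    """Build a PromQL alert expression for containers restarting too often."""
    return (
        "increase(kube_pod_container_status_restarts_total"
        f'{{namespace="{namespace}"}}[{window}]) > {max_restarts}'
    )


print(pod_cpu_query("prod"))
print(pod_restart_alert_expr("prod"))
```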

## Audit and Security Best Practices

### CloudTrail Activity Analysis

**Audit AWS activity**:
1. Use **CloudTrail MCP** to analyze API activity
2. Track who made changes to resources
3. Investigate security incidents
4. Monitor for suspicious activity patterns
5. Audit compliance with policies

**Common audit scenarios**:
- "Who deleted this S3 bucket?"
- "Show all IAM role changes in the last 24 hours"
- "List failed login attempts"
- "Find all actions by a specific user"
- "Track modifications to security groups"
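
A question like "Who deleted this S3 bucket?" amounts to filtering CloudTrail records by event name and request parameters. The sketch below runs that filter over already-retrieved records; the sample events, account ID, and bucket name are invented, and `who_deleted_bucket` is a hypothetical helper that assumes the standard record fields `eventName`, `eventTime`, `userIdentity.arn`, and `requestParameters.bucketName`.

```python
def who_deleted_bucket(events: list[dict], bucket: str) -> list[tuple[str, str]]:
    """Return (principal ARN, event time) pairs for DeleteBucket calls on `bucket`."""
    matches = []
    for event in events:
        # requestParameters can be null in real records, so fall back to an empty dict.
        params = event.get("requestParameters") or {}
        if event.get("eventName") == "DeleteBucket" and params.get("bucketName") == bucket:
            arn = event.get("userIdentity", {}).get("arn", "unknown")
            matches.append((arn, event.get("eventTime", "unknown")))
    return matches


# Hypothetical records, trimmed to the fields used above.
events = [
    {
        "eventName": "DeleteBucket",
        "eventTime": "2024-05-07T10:15:00Z",
        "userIdentity": {"arn": "arn:aws:iam::111122223333:user/alice"},
        "requestParameters": {"bucketName": "reports-archive"},
    },
    {
        "eventName": "PutObject",
        "eventTime": "2024-05-07T09:00:00Z",
        "userIdentity": {"arn": "arn:aws:iam::111122223333:user/bob"},
        "requestParameters": {"bucketName": "reports-archive"},
    },
    {"eventName": "DeleteBucket", "eventTime": "2024-05-06T08:00:00Z", "requestParameters": None},
]
print(who_deleted_bucket(events, "reports-archive"))
```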

### Security Assessment

**Regular security reviews**:
1. Use **Well-Architected Security Assessment MCP** to run the assessment
2. Assess security posture against best practices
3. Identify security gaps and vulnerabilities
4. Implement recommended security improvements
5. Document security compliance

**Security assessment areas**:
- Identity and Access Management (IAM)
- Detective controls and monitoring
- Infrastructure protection
- Data protection and encryption
- Incident response preparedness

## Using MCP Servers Effectively

### Cost Analysis Workflow

1. **Pre-deployment**: Use Pricing MCP to estimate costs
2. **Post-deployment**: Use Billing MCP to track actual spending
3. **Analysis**: Use Cost Explorer MCP for detailed cost analysis
4. **Optimization**: Implement recommendations from Cost Explorer

### Monitoring Workflow

1. **Setup**: Configure CloudWatch metrics and alarms
2. **Monitor**: Use CloudWatch MCP to track key metrics
3. **Analyze**: Use Application Signals for APM insights
4. **Troubleshoot**: Query CloudWatch Logs for issue resolution

### Security Workflow

1. **Audit**: Use CloudTrail MCP to review activity
2. **Assess**: Use Well-Architected Security Assessment MCP to evaluate security posture
3. **Remediate**: Implement security recommendations
4. **Monitor**: Track security events via CloudWatch

### MCP Usage Best Practices

1. **Cost Awareness**: Check pricing before deploying resources
2. **Proactive Monitoring**: Set up alarms for critical metrics
3. **Regular Reviews**: Analyze costs and performance weekly
4. **Audit Trails**: Review CloudTrail logs for compliance
5. **Security First**: Run security assessments regularly
6. **Optimize Continuously**: Act on cost and performance recommendations

## Operational Excellence Guidelines

### Cost Optimization

- **Tag Everything**: Use consistent cost allocation tags
- **Review Monthly**: Analyze spending trends and anomalies
- **Right-size**: Match resources to actual usage
- **Automate**: Use auto-scaling and scheduling
- **Monitor Budgets**: Set alerts for cost overruns
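
"Tag Everything" is easier to enforce with an automated check. The sketch below reports which required cost allocation tags each resource is missing; the required tag keys, resource IDs, and the `missing_tags` helper are assumptions chosen for illustration.

```python
# Assumed tagging policy: adjust the keys to match your organization's standard.
REQUIRED_TAGS = {"Environment", "Owner", "CostCenter"}


def missing_tags(resources: dict[str, dict[str, str]]) -> dict[str, list[str]]:
    """Map each non-compliant resource ID to the required tag keys it lacks."""
    report = {}
    for resource_id, tags in resources.items():
        missing = sorted(REQUIRED_TAGS - set(tags))
        if missing:
            report[resource_id] = missing
    return report


# Hypothetical inventory of resource IDs and their tags.
inventory = {
    "i-0aaa": {"Environment": "prod", "Owner": "payments", "CostCenter": "1234"},
    "i-0bbb": {"Environment": "dev"},
    "vol-0ccc": {},
}
print(missing_tags(inventory))
```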

### Monitoring and Alerting

- **Critical Metrics**: Alert on business-critical metrics
- **Noise Reduction**: Fine-tune thresholds to reduce false positives
- **Actionable Alerts**: Ensure alerts have clear remediation steps
- **Dashboard Visibility**: Create dashboards for key stakeholders
- **Log Retention**: Balance cost and compliance needs
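
One common way to reduce alert noise is to require several breaching datapoints before alarming, rather than reacting to a single spike. The sketch below is a simplified illustration of that "M out of N" idea; `should_alarm` is a hypothetical helper and the sample values are invented.

```python
def should_alarm(datapoints: list[float], threshold: float, datapoints_to_alarm: int) -> bool:
    """Alarm only if at least `datapoints_to_alarm` of the given datapoints exceed the threshold."""
    breaches = sum(1 for value in datapoints if value > threshold)
    return breaches >= datapoints_to_alarm


# Five recent CPU readings (percent); alarm only when 3 of the 5 exceed 80.
single_spike = [85.0, 40.0, 42.0, 41.0, 39.0]
sustained_load = [85.0, 91.0, 40.0, 88.0, 86.0]
print(should_alarm(single_spike, 80.0, 3), should_alarm(sustained_load, 80.0, 3))
```

Raising the required count trades slower detection for fewer false positives, so the right setting depends on how quickly the underlying issue needs a response.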

### Security and Compliance

- **Least Privilege**: Grant minimum required permissions
- **Audit Regularly**: Review CloudTrail logs for anomalies
- **Encrypt Data**: Use encryption at rest and in transit
- **Assess Continuously**: Run security assessments frequently
- **Incident Response**: Have procedures for security events

## Additional Resources

For detailed operational patterns and best practices, refer to the comprehensive reference:

**File**: `references/operations-patterns.md`

This reference includes:
- Cost optimization strategies
- Monitoring and alerting patterns
- Observability best practices
- Security and compliance guidelines
- Troubleshooting workflows

## CloudWatch Alarms Reference

**File**: `references/cloudwatch-alarms.md`

Common alarm configurations for:
- Lambda functions
- EC2 instances
- RDS databases
- DynamoDB tables
- API Gateway
- ECS services
- Application Load Balancers