Initial commit

2025-11-29 18:24:03 +08:00
commit d5d2962808
12 changed files with 919 additions and 0 deletions
--- a/commands/automate.md
+++ b/commands/automate.md
@@ -0,0 +1,247 @@
+---
+model: claude-sonnet-4-0
+allowed-tools: Task, Bash, Write, Read
+argument-hint: <task> [approach]
+description: Workflow automation and scripting for DevOps operations
+---
+
+# Automate Command
+
+You are the automation specialist for DevOps workflow automation and scripting.
+
+## Task
+
+Create comprehensive automation for the following DevOps task:
+
+**Task**: $1
+
+**Scripting Approach**: ${2:-bash}
+
+## Automation Guidelines
+
+### Scripting Approach Selection
+
+**Bash Scripts**:
+- System administration tasks
+- CI/CD pipeline hooks
+- Quick automation utilities
+- Shell integration requirements
+
+**Python Scripts**:
+- Complex logic and data processing
+- API integration and orchestration
+- Cross-platform compatibility
+- Rich library ecosystem needs
+
+**Makefile**:
+- Build automation
+- Dependency management
+- Project task coordination
+- Language-agnostic workflows
+
+**Justfile**:
+- Modern task runner alternative
+- Simpler, more readable recipe syntax than Make
+- Cross-platform command execution
+- Development workflow automation
+
+### Automation Best Practices
+
+1. **Idempotency**: Scripts should be safe to run multiple times
+2. **Error Handling**: Proper exit codes and error messages
+3. **Logging**: Comprehensive logging for troubleshooting
+4. **Configuration**: Environment variables and config files
+5. **Validation**: Input validation and pre-flight checks
+6. **Documentation**: Clear usage instructions and examples
+7. **Testing**: Test scripts in safe environments first
+8. **Security**: Avoid hardcoded secrets, use secure practices
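The idempotency and logging items above could look like this in Bash. This is a minimal sketch, not a prescribed implementation: `log` and `ensure_line` are hypothetical helper names, and the timestamp format is an assumption.

```bash
# Sketch only: log and ensure_line are hypothetical helper names.

# log LEVEL MESSAGE... : write a timestamped message to stderr.
log() {
    local level="$1"
    shift
    printf '%s [%s] %s\n' "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" "$level" "$*" >&2
}

# ensure_line FILE LINE : append LINE to FILE only if it is not already there.
# A second run finds the line and does nothing, so the step is idempotent.
ensure_line() {
    local file="$1" line="$2"
    if [ -f "$file" ] && grep -Fqx -- "$line" "$file"; then
        log INFO "line already present in $file, nothing to do"
        return 0
    fi
    printf '%s\n' "$line" >> "$file"
    log INFO "added line to $file"
}
```

Calling `ensure_line "$HOME/.profile" 'export APP_ENV=staging'` twice should leave a single copy of the line. A plain `echo ... >> file` would append a duplicate on every run.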
+
+## SECURITY WARNING
+
+**CRITICAL: This command has access to the Bash tool, which can execute arbitrary system commands.**
+
+Before creating ANY automation script, YOU MUST complete this security checklist:
+
+### Pre-Script Security Checklist
+
+- [ ] **Threat Model**: What's the worst that could happen if this script is compromised?
+- [ ] **Input Sources**: What external inputs does this script accept (CLI args, env vars, files)?
+- [ ] **Input Validation**: How will EVERY input be validated and sanitized?
+- [ ] **Privilege Level**: What's the MINIMUM privilege needed? Can we avoid root/sudo?
+- [ ] **Secrets**: Are there ANY credentials, keys, or tokens? How will they be secured?
+- [ ] **Blast Radius**: If this script fails or is exploited, what's the maximum damage?
+- [ ] **Rollback**: Can we undo changes if something goes wrong?
+- [ ] **Audit Trail**: Will security-relevant operations be logged?
+
+### Security Requirements for ALL Scripts
+
+**MANDATORY for Every Script:**
+
+1. **Input Validation** - NEVER trust external input
+   ```bash
+   # Validate before use
+   if [[ ! "$filename" =~ ^[a-zA-Z0-9._-]+$ ]]; then
+       echo "ERROR: Invalid filename" >&2
+       exit 1
+   fi
+   ```
+
+2. **No Hardcoded Secrets** - NEVER put credentials in code
+   ```bash
+   # GOOD: From environment or vault (-field prints only that field; "value" is an example name)
+   DB_PASSWORD="${DB_PASSWORD:-$(vault read -field=value secret/db/password)}"
+
+   # BAD: Hardcoded (DO NOT DO THIS)
+   DB_PASSWORD="secretpass123"
+   ```
+
+3. **Quote Variables** - Prevent word splitting, globbing, and argument injection
+   ```bash
+   # GOOD: Quoted variable; "--" stops a leading "-" from being parsed as an option
+   rm -f -- "$filename"
+
+   # BAD: Unquoted (word splitting and globbing can inject extra arguments)
+   rm -f $filename
+   ```
+
+4. **Strict Error Handling** - Fail safely
+   ```bash
+   #!/bin/bash
+   set -euo pipefail  # Exit on error, undefined vars, pipe failures
+   IFS=$'\n\t'        # Safer word splitting
+   ```
+
+5. **Secure Temp Files** - No predictable names
+   ```bash
+   # GOOD: Secure temp file
+   temp_file=$(mktemp) || exit 1
+   trap 'rm -f "$temp_file"' EXIT
+
+   # BAD: Predictable name (SECURITY RISK)
+   temp_file="/tmp/myfile.$$"
+   ```
+
+6. **Principle of Least Privilege** - Minimum permissions
+   ```bash
+   # Set restrictive permissions
+   chmod 600 "$config_file"  # Only owner can read/write
+   umask 077  # New files private by default
+   ```
+
+### Dangerous Operations - Extra Caution Required
+
+**STOP and Double-Check Before Using:**
+
+- `rm -rf` - Can delete entire systems if paths are wrong
+- `chmod 777` - Grants read, write, and execute to every user, NEVER acceptable
+- `sudo` / `su` - Privilege escalation, minimize use
+- `eval` - Can execute arbitrary code
+- `source` - Can execute arbitrary scripts
+- Direct SQL execution - Use parameterized queries
+- Unquoted variables in commands - Word splitting and argument injection risk
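One way to put a guard in front of `rm -rf` is a small wrapper that only deletes beneath a known base directory. This is a sketch under stated assumptions: `safe_rmrf` and `ALLOWED_BASE` are hypothetical names, the default base path is only an example, and the check is purely textual (it does not resolve symlinks).

```bash
# Sketch only: safe_rmrf and ALLOWED_BASE are hypothetical names.
ALLOWED_BASE="${ALLOWED_BASE:-/tmp/automation-work}"

# safe_rmrf PATH : recursively delete PATH only if it lies strictly below
# ALLOWED_BASE. Empty paths, "/", paths containing "..", the base itself,
# and anything outside the base are refused.
safe_rmrf() {
    local target="${1:-}"
    case "$target" in
        "" | "/")
            echo "ERROR: refusing to delete an empty path or /" >&2
            return 1 ;;
        *..*)
            # Conservative: also rejects harmless names such as "a..b".
            echo "ERROR: refusing path containing '..': $target" >&2
            return 1 ;;
        "$ALLOWED_BASE"/?*)
            rm -rf -- "$target" ;;
        *)
            echo "ERROR: $target is not below $ALLOWED_BASE" >&2
            return 1 ;;
    esac
}
```

The empty-path case matters most in practice: with an unset variable, `rm -rf "$dir/"` becomes `rm -rf /`, whereas `safe_rmrf "$dir"` returns an error.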
+
+### Command Injection Prevention
+
+**CRITICAL VULNERABILITIES TO AVOID:**
+
+```bash
+# VULNERABLE: User input in unquoted variable
+user_input="file.txt; rm -rf /"
+rm -f $user_input  # DANGER: word splitting hands "-rf" and "/" to rm as extra arguments
+
+# SAFE: Quoted and validated
+if [[ "$user_input" =~ ^[a-zA-Z0-9._-]+$ ]]; then
+    rm -f "$user_input"
+fi
+```
+
+```python
+# VULNERABLE: shell=True with user input
+subprocess.run(f"ls {user_input}", shell=True)  # DANGER
+
+# SAFE: List form, no shell
+subprocess.run(["ls", user_input])
+```
+
+### Script Structure
+
+**Initialization**:
+- Shebang and interpreter settings
+- Strict error handling (set -euo pipefail for bash)
+- Environment variable validation
+- Dependency checks
+
+**Core Functionality**:
+- Modular functions or classes
+- Clear separation of concerns
+- Reusable components
+- Configuration management
+
+**Output and Logging**:
+- Structured logging (timestamp, level, message)
+- Progress indicators for long-running tasks
+- Summary output with results
+- Error reporting with context
+
+**Cleanup and Exit**:
+- Trap handlers for cleanup
+- Proper resource cleanup
+- Exit code conventions
+- Rollback on failure when appropriate
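The four parts of the structure above might fit together as follows. This is a skeleton sketch rather than a required layout: every helper name (`log`, `require_cmd`, `require_env`, `cleanup`, `main`) and the `WORK_DIR` variable are hypothetical, and `require_env` relies on Bash indirect expansion.

```bash
#!/usr/bin/env bash
# Skeleton sketch; all helper names below are hypothetical.
set -euo pipefail

# --- Output and logging: timestamp, level, message on stderr ---
log() {
    local level="$1"
    shift
    printf '%s [%s] %s\n' "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" "$level" "$*" >&2
}

# --- Initialization: dependency and environment checks ---
require_cmd() {
    command -v "$1" >/dev/null 2>&1 || { log ERROR "missing dependency: $1"; return 1; }
}

require_env() {
    # ${!1:-} expands to the value of the variable whose name is in $1.
    [ -n "${!1:-}" ] || { log ERROR "missing environment variable: $1"; return 1; }
}

# --- Cleanup and exit: remove scratch space, preserve the exit code ---
WORK_DIR=""
cleanup() {
    local status=$?
    [ -z "$WORK_DIR" ] || rm -rf -- "$WORK_DIR"
    log INFO "exiting with status $status"
    exit "$status"
}

# --- Core functionality ---
main() {
    trap cleanup EXIT
    require_cmd mktemp
    # Call require_env NAME here for each variable the task depends on.
    WORK_DIR="$(mktemp -d)"
    log INFO "working in $WORK_DIR"
    # Task-specific steps go here.
    log INFO "done"
}

main "$@"
```

Because the trap is registered on `EXIT`, the scratch directory is removed whether `main` finishes normally or `set -e` aborts the script halfway through.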
+
+## Deliverables
+
+1. Complete automation script with:
+   - Proper shebang and permissions
+   - Comprehensive error handling
+   - Detailed logging and output
+   - Configuration options
+   - Usage documentation
+
+2. Supporting documentation:
+   - Installation/setup instructions
+   - Configuration options
+   - Usage examples
+   - Troubleshooting guide
+
+3. Integration guidance:
+   - CI/CD pipeline integration
+   - Scheduled execution (cron, systemd timers)
+   - Monitoring and alerting
+   - Maintenance procedures
+
+## Example Scenarios
+
+### Database Backup Automation
+- Automated backup execution
+- Rotation and retention policies
+- Compression and encryption
+- Remote storage upload
+- Verification and testing
+- Notification on failure
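For the rotation and retention item, a count-based policy could be sketched like this. Assumptions: `rotate_backups` is a hypothetical name, backups live in one directory, and files are named `backup-YYYYmmddHHMMSS.sql.gz` so that sorting by name also sorts by age.

```bash
# Sketch only: rotate_backups and the file naming scheme are assumptions.

# rotate_backups DIR KEEP : delete all but the KEEP newest backups in DIR.
rotate_backups() {
    local dir="$1" keep="$2"
    case "$keep" in
        "" | *[!0-9]*)
            echo "ERROR: keep must be a non-negative integer" >&2
            return 1 ;;
    esac
    # Glob results are sorted by name, so the oldest timestamps come first.
    local files=("$dir"/backup-*.sql.gz)
    # An unmatched glob stays literal; treat that as "no backups yet".
    [ -e "${files[0]}" ] || return 0
    # 10# forces base 10 so a value such as "08" is not read as octal.
    local excess=$(( ${#files[@]} - 10#$keep ))
    local i
    for (( i = 0; i < excess; i++ )); do
        rm -f -- "${files[i]}"
        echo "removed ${files[i]}"
    done
}
```

With five backups present, `rotate_backups /var/backups/db 2` removes the three oldest. When there are fewer files than `KEEP`, the loop body never runs.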
+
+### Environment Setup
+- Dependency installation
+- Configuration file generation
+- Service initialization
+- Health checks and validation
+- Rollback on failure
+- Documentation of setup state
+
+### Deployment Automation
+- Pre-deployment validation
+- Service deployment steps
+- Health check verification
+- Rollback capabilities
+- Post-deployment tasks
+- Notification and reporting
+
+### Infrastructure Provisioning
+- Resource creation orchestration
+- State management
+- Error recovery
+- Idempotent execution
+- Cost tracking
+- Cleanup procedures
+
+Invoke the automation-specialist agent with: $ARGUMENTS
--- a/commands/deploy.md
+++ b/commands/deploy.md
@@ -0,0 +1,202 @@
+---
+model: claude-sonnet-4-0
+allowed-tools: Task, Bash, Read, Write
+argument-hint: <environment> [strategy]
+description: Orchestrate deployments with rollback strategies and health checks
+---
+
+# Infrastructure Deployment Command
+
+Orchestrate sophisticated infrastructure deployments with CI/CD best practices, automated rollback strategies, and comprehensive health checks. Manage blue-green, canary, rolling, and recreate deployment patterns across environments.
+
+## SECURITY WARNING
+
+**CRITICAL: Deployment operations have high-privilege access and can modify production systems.**
+
+Deployment scripts will:
+- Execute commands on production infrastructure
+- Modify load balancer configurations
+- Update database schemas or trigger migrations
+- Handle sensitive environment variables and secrets
+- Potentially cause service disruptions if misconfigured
+
+### Pre-Deployment Security Checklist
+
+BEFORE executing any deployment, verify:
+
+- [ ] **Credentials Management**: No secrets in code, use vaults or secret managers
+- [ ] **Access Control**: Deployment executed by authorized personnel only
+- [ ] **Change Approval**: Required approvals obtained for production deployments
+- [ ] **Rollback Plan**: Tested rollback procedure available
+- [ ] **Secrets Rotation**: Secrets injected at deploy time and rotated on a defined schedule
+- [ ] **Audit Logging**: All deployment actions logged with who/what/when
+- [ ] **Validation Gates**: Pre-deployment health checks passed
+- [ ] **Backup Verification**: Recent backup available before destructive operations
+- [ ] **Blast Radius**: Impact scope understood and limited
+- [ ] **Communication**: Relevant teams notified of deployment window
+
+### Deployment Security Best Practices
+
+1. **Secret Management**
+   - Use AWS Secrets Manager, HashiCorp Vault, or similar
+   - Rotate secrets regularly, especially after deployments
+   - Never log or echo secrets in deployment scripts
+   - Use scoped, short-lived credentials when possible
+
+2. **Least Privilege Deployment**
+   - Use service accounts with minimum required permissions
+   - Avoid deploying as root or with admin privileges
+   - Scope IAM roles to specific resources and actions
+   - Review and audit deployment permissions quarterly
+
+3. **Secure Communication**
+   - All deployment traffic over TLS/HTTPS
+   - Verify TLS certificates, don't disable validation
+   - Use VPN or bastion hosts for production access
+   - Implement network segmentation and firewalls
+
+4. **Validation & Safety**
+   - Dry-run mode to preview changes before execution
+   - Health checks at every stage of deployment
+   - Automated rollback triggers on error thresholds
+   - Database migrations in transactions when possible
+   - Blue-green keeps the previous version running until the new one is validated
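The health-check and automated-rollback items in "Validation & Safety" could be wired together roughly as below. This is a sketch only: `wait_healthy`, `deploy_with_rollback`, and `HEALTH_DELAY` are hypothetical names, and the deploy, rollback, and health commands are whatever the caller supplies.

```bash
# Sketch only: wait_healthy, deploy_with_rollback, HEALTH_DELAY are hypothetical.

# wait_healthy ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds or ATTEMPTS is exhausted.
wait_healthy() {
    local attempts="$1" delay="$2"
    shift 2
    local n=1
    while [ "$n" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "health check failed (attempt $n/$attempts)" >&2
        if [ "$n" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        n=$((n + 1))
    done
    return 1
}

# deploy_with_rollback DEPLOY ROLLBACK HEALTH
# Each argument names a command or shell function that takes no arguments.
deploy_with_rollback() {
    local deploy="$1" rollback="$2" health="$3"
    if ! "$deploy"; then
        echo "deploy step failed, rolling back" >&2
        "$rollback"
        return 1
    fi
    if wait_healthy 3 "${HEALTH_DELAY:-5}" "$health"; then
        echo "deployment healthy"
    else
        echo "deployment unhealthy, rolling back" >&2
        "$rollback"
        return 1
    fi
}
```

A health function might be as small as `check() { curl -fsS "$HEALTH_URL" >/dev/null; }`, where `HEALTH_URL` is again an assumed variable. The rollback runs on either a failed deploy step or a failed health check, and the function returns non-zero in both cases so a calling pipeline can stop.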
+
+### Red Flags to Never Ignore
+
+**STOP IMMEDIATELY if you see:**
+- Secrets hardcoded in deployment scripts
+- Deployment scripts with 777 permissions
+- Root credentials used for deployment
+- Deployment bypassing change approval process
+- No rollback strategy for production changes
+- Untested deployment to production
+- Missing health checks or monitoring
+
+## How It Works
+
+This command invokes the deployment-coordinator agent to:
+1. Validate deployment readiness and environment configuration
+2. Execute pre-deployment checks and dependency verification
+3. Orchestrate the selected deployment strategy
+4. Monitor health checks and performance metrics
+5. Manage automated rollback if issues are detected
+6. Generate deployment reports and audit trails
+
+## Arguments
+
+**$1 (Required)**: Deployment target or environment
+- `staging`: Staging environment deployment
+- `production`: Production environment deployment
+- `dev`: Development environment deployment
+- Custom environment names as configured
+
+**$2 (Optional)**: Deployment strategy
+- `blue-green`: Zero-downtime deployment with instant rollback capability (default for production)
+- `canary`: Progressive rollout with traffic shifting
+- `rolling`: Gradual instance replacement with configurable batch size
+- `recreate`: Simple stop-start deployment (default for staging/dev)
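The defaulting rules above can be expressed as a small resolver. A sketch under assumptions: `resolve_strategy` is a hypothetical name, and custom environment names are assumed to default to `recreate` like staging and dev, which the text above does not specify.

```bash
# Sketch only: resolve_strategy is a hypothetical helper name.

# resolve_strategy ENVIRONMENT [STRATEGY]
# Prints the strategy to use, or fails on a missing environment or an
# unknown strategy name.
resolve_strategy() {
    local environment="${1:-}" strategy="${2:-}"
    if [ -z "$environment" ]; then
        echo "ERROR: environment is required" >&2
        return 1
    fi
    if [ -z "$strategy" ]; then
        case "$environment" in
            production) strategy="blue-green" ;;
            # Assumption: every non-production environment uses recreate.
            *)          strategy="recreate" ;;
        esac
    fi
    case "$strategy" in
        blue-green | canary | rolling | recreate)
            printf '%s\n' "$strategy" ;;
        *)
            echo "ERROR: unknown strategy: $strategy" >&2
            return 1 ;;
    esac
}
```

Validating the strategy against a fixed list before doing anything else means a typo such as `bluegreen` fails fast instead of falling through to some default behavior.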
+
+## Examples
+
+### Blue-Green Production Deployment
+```bash
+/deploy production blue-green
+```
+Orchestrates a zero-downtime deployment with automated traffic switching and instant rollback capability.
+
+### Canary Staging Deployment
+```bash
+/deploy staging canary
+```
+Runs a progressive deployment with 10% -> 50% -> 100% traffic shifting and automated rollback on error thresholds.
+
+### Rolling Update
+```bash
+/deploy production rolling
+```
+Replaces instances gradually, with health checks between batches and an automatic pause on failures.
+
+### Simple Staging Deployment
+```bash
+/deploy staging
+```
+Uses the recreate strategy for a straightforward staging deployment with basic health validation.
+
+## Use Cases
+
+**Production Deployments**
+- Zero-downtime releases
+- Traffic management
+- Automated rollback
+- Performance validation
+
+**Staging Validation**
+- Pre-production testing
+- Integration validation
+- Performance benchmarking
+- Security scanning
+
+**Emergency Rollbacks**
+- Instant blue-green switch
+- Canary traffic reversal
+- Rolling update pause/rollback
+- State preservation
+
+**Multi-Region Deployments**
+- Region-by-region rollout
+- Cross-region validation
+- Failover management
+- Global load balancing
+
+## What You Get
+
+1. **Deployment Orchestration**: Automated execution of complex deployment patterns
+2. **Health Monitoring**: Continuous validation with configurable thresholds and metrics
+3. **Rollback Automation**: Instant rollback triggers based on health, performance, or error rates
+4. **Audit Trail**: Complete deployment history with decision points and metrics
+5. **Integration Points**: CI/CD pipeline integration with GitHub Actions, GitLab CI, Jenkins
+
+## Deployment Strategy Details
+
+**Blue-Green**:
+- Maintains two identical production environments
+- Instant traffic switching via load balancer
+- Zero downtime with immediate rollback
+- Best for critical production systems
+
+**Canary**:
+- Progressive traffic shifting (10% -> 50% -> 100%)
+- Automated metrics comparison between versions
+- Gradual rollback with minimal user impact
+- Ideal for validating new features
+
+**Rolling**:
+- Configurable batch size and wait times
+- Health checks between batches
+- Automatic pause on threshold breaches
+- Good for large-scale deployments
+
+**Recreate**:
+- Simple stop-start deployment
+- Brief downtime acceptable
+- Minimal complexity
+- Suitable for dev/staging environments
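The canary progression described above (10% -> 50% -> 100%, with a metrics comparison at each step) might be driven by a loop like this. It is a sketch only: `canary_rollout`, `set_traffic`, and `metrics_ok` are hypothetical names, and the two placeholder functions stand in for a real load balancer call and a real metrics query.

```bash
# Sketch only: all three function names are hypothetical.

# Placeholder: route PERCENT of traffic to the new version.
set_traffic() { echo "traffic to new version: $1%"; }

# Placeholder: succeed if the new version's metrics are acceptable.
metrics_ok() { return 0; }

# canary_rollout : step traffic through 10, 50, 100 percent and revert to
# 0 percent as soon as a metrics check fails.
canary_rollout() {
    local percent
    for percent in 10 50 100; do
        set_traffic "$percent"
        if ! metrics_ok; then
            echo "metrics degraded at ${percent}%, reverting to 0%" >&2
            set_traffic 0
            return 1
        fi
        echo "canary healthy at ${percent}%"
    done
}
```

A real `metrics_ok` would normally wait for a soak period before comparing error rates between versions. The sketch leaves that out to keep the control flow visible.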
+
+## Tips for Best Results
+
+1. **Test in Staging**: Always validate deployment strategies in staging first
+2. **Define Health Checks**: Configure comprehensive health endpoints and thresholds
+3. **Set Rollback Criteria**: Define clear metrics for automated rollback triggers
+4. **Monitor Actively**: Use observability tools during deployment windows
+5. **Document Changes**: Include deployment notes and change logs
+
+## Example Session
+
+```bash
+/deploy production blue-green
+```
+
+**Result**: The deployment coordinator validates infrastructure, runs pre-flight checks, provisions the blue environment, validates its health, switches traffic at the load balancer, monitors for 5 minutes, and then either commits or rolls back based on metrics. A full audit trail and performance report are generated.
+
+Invoke the deployment-coordinator agent with: $ARGUMENTS
--- a/commands/infra.md
+++ b/commands/infra.md
@@ -0,0 +1,25 @@
+---
+model: claude-sonnet-4-0
+allowed-tools: Task, Bash, Read, Write
+argument-hint: <requirement> [tool]
+description: Infrastructure as Code design with Terraform, CDK, and CloudFormation
+---
+
+# Infra Command
+
+Design Infrastructure as Code with Terraform, CDK, or CloudFormation.
+
+## Arguments
+
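+**$1 (Required)**: Infrastructure requirement in plain language, quoted when it contains spaces
+
+**$2 (Optional)**: IaC tool to target, such as `terraform` or `cdk`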
+
+## Examples
+
+```bash
+/infra "Setup EKS cluster" terraform
+/infra "Configure S3 with CloudFront CDN" cdk
+```
+
+Invoke the infra-architect agent with: $ARGUMENTS
--- a/commands/pipeline.md
+++ b/commands/pipeline.md
@@ -0,0 +1,25 @@
+---
+model: claude-sonnet-4-0
+allowed-tools: Task, Bash, Read, Write
+argument-hint: <requirement> [platform]
+description: CI/CD pipeline design and optimization for modern platforms
+---
+
+# Pipeline Command
+
+Design and optimize CI/CD pipelines for modern platforms.
+
+## Arguments
+
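+**$1 (Required)**: Pipeline requirement in plain language, quoted when it contains spaces
+
+**$2 (Optional)**: CI/CD platform to target, such as `github-actions` or `gitlab-ci`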
+
+## Examples
+
+```bash
+/pipeline "Add automated testing and deployment" github-actions
+/pipeline "Optimize build speed" gitlab-ci
+```
+
+Invoke the cicd-engineer agent with: $ARGUMENTS