Files
gh-jeremylongshore-claude-c…/skills/gh-actions-validator/SKILL.md
2025-11-30 08:19:43 +08:00

556 lines
16 KiB
Markdown

---
name: gh-actions-validator
description: |
Automatically validates and enforces GitHub Actions best practices for Vertex AI and Google Cloud deployments.
Expert in Workload Identity Federation (WIF), Vertex AI Agent Engine deployment pipelines, security validation, and CI/CD automation.
Triggers: "create github actions", "deploy vertex ai", "setup wif", "validate github workflow", "gcp deployment pipeline"
allowed-tools: Read, Write, Edit, Grep, Glob, Bash
version: 1.0.0
---
## What This Skill Does
Expert validator and enforcer of GitHub Actions best practices specifically for Vertex AI Agent Engine and Google Cloud deployments. Ensures secure, production-ready CI/CD pipelines using Workload Identity Federation (WIF) instead of service account JSON keys.
## When This Skill Activates
### Trigger Phrases
- "Create GitHub Actions workflow for Vertex AI"
- "Deploy agent to Vertex AI Engine"
- "Set up Workload Identity Federation"
- "Validate GitHub Actions security for GCP"
- "GitHub Actions deployment pipeline"
- "WIF configuration for Google Cloud"
- "Automate Vertex AI deployment"
- "GitHub Actions best practices GCP"
### Use Cases
- Creating CI/CD pipelines for Vertex AI Agent Engine deployments
- Migrating from JSON service account keys to WIF
- Enforcing security best practices in GitHub Actions
- Validating post-deployment of Vertex AI agents
- Setting up automated monitoring for deployed agents
- Implementing OIDC-based authentication to Google Cloud
## Validation Rules Enforced
### 1. Workload Identity Federation (WIF) Mandatory
**NEVER use JSON service account keys**:
```yaml
# ❌ FORBIDDEN - JSON keys are insecure
- name: Authenticate (INSECURE)
uses: google-github-actions/auth@v2
with:
credentials_json: ${{ secrets.GCP_SA_KEY }} # ❌ NEVER DO THIS
```
**ALWAYS use WIF**:
```yaml
# ✅ REQUIRED - WIF with OIDC
permissions:
contents: read
id-token: write # ✅ REQUIRED for WIF
- name: Authenticate (SECURE)
uses: google-github-actions/auth@v2
with:
workload_identity_provider: ${{ secrets.WIF_PROVIDER }}
service_account: ${{ secrets.WIF_SERVICE_ACCOUNT }}
```
### 2. OIDC Permissions Required
**Missing id-token permission**:
```yaml
# ❌ FAILS - Missing id-token: write
permissions:
contents: read
```
**Correct permissions**:
```yaml
# ✅ REQUIRED for WIF OIDC
permissions:
contents: read
id-token: write # MUST be present
```
### 3. IAM Least Privilege
**Overly permissive roles**:
```yaml
# ❌ FORBIDDEN - Too many permissions
roles:
- roles/owner
- roles/editor
```
**Least privilege**:
```yaml
# ✅ REQUIRED - Minimal permissions
roles:
- roles/run.admin
- roles/iam.serviceAccountUser
- roles/aiplatform.user
```
### 4. Vertex AI Agent Engine Deployment Validation
**Post-Deployment Checks** (MANDATORY):
```yaml
- name: Validate Agent Deployment
run: |
python scripts/validate-deployment.py \
--project-id=${{ secrets.GCP_PROJECT_ID }} \
--agent-id=production-agent
# Validation checklist:
# ✅ Agent state is RUNNING
# ✅ Code Execution Sandbox enabled (14-day TTL)
# ✅ Memory Bank configured
# ✅ A2A Protocol compliant (AgentCard accessible)
# ✅ Model Armor enabled (prompt injection protection)
# ✅ VPC Service Controls configured
# ✅ Service account has minimal IAM permissions
# ✅ Monitoring dashboards created
# ✅ Alerting policies configured
```
### 5. Security Scanning (REQUIRED)
```yaml
# ✅ REQUIRED - Security validation before deployment
- name: Scan for secrets
uses: trufflesecurity/trufflehog@main
- name: Vulnerability scanning
uses: aquasecurity/trivy-action@master
- name: Validate no service account keys
run: |
if find . -name "*service-account*.json"; then
echo "❌ Service account JSON keys detected"
exit 1
fi
```
### 6. Agent Configuration Validation
**Before Deployment** (MANDATORY):
```python
def validate_agent_config(agent_config: dict) -> bool:
"""
Validate agent configuration before deployment.
"""
# ✅ Code Execution TTL must be 7-14 days
ttl = agent_config.get("code_execution_config", {}).get("state_ttl_days")
assert 7 <= ttl <= 14, "❌ State TTL must be 7-14 days"
# ✅ Memory Bank must be enabled for stateful agents
memory_enabled = agent_config.get("memory_bank_config", {}).get("enabled")
assert memory_enabled, "❌ Memory Bank should be enabled"
# ✅ Model Armor must be enabled (prompt injection protection)
model_armor = agent_config.get("model_armor", {}).get("enabled")
assert model_armor, "❌ Model Armor must be enabled"
# ✅ VPC configuration required for enterprise
vpc_config = agent_config.get("vpc_config")
assert vpc_config, "❌ VPC configuration missing"
# ✅ Auto-scaling configured
auto_scaling = agent_config.get("auto_scaling")
assert auto_scaling, "❌ Auto-scaling not configured"
assert auto_scaling.get("min_instances") >= 1, "❌ min_instances < 1"
return True
```
## Workflow Templates
### Template 1: Vertex AI Agent Engine Deployment
```yaml
name: Deploy Vertex AI Agent
on:
push:
branches: [main]
paths:
- 'agent/**'
workflow_dispatch:
permissions:
contents: read
id-token: write
env:
AGENT_ID: 'production-adk-agent'
REGION: 'us-central1'
jobs:
validate-and-deploy:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Authenticate to GCP (WIF)
uses: google-github-actions/auth@v2
with:
workload_identity_provider: ${{ secrets.WIF_PROVIDER }}
service_account: ${{ secrets.WIF_SERVICE_ACCOUNT }}
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: '3.11'
cache: 'pip'
- name: Install dependencies
run: |
pip install -r requirements.txt
- name: Validate Agent Configuration
run: |
python scripts/validate-agent-config.py
- name: Deploy to Vertex AI Engine
run: |
python scripts/deploy-agent.py \
--project-id=${{ secrets.GCP_PROJECT_ID }} \
--location=${{ env.REGION }} \
--agent-id=${{ env.AGENT_ID }}
- name: Post-Deployment Validation
run: |
python scripts/validate-deployment.py \
--project-id=${{ secrets.GCP_PROJECT_ID }} \
--agent-id=${{ env.AGENT_ID }}
- name: Setup Monitoring
run: |
python scripts/setup-monitoring.py \
--project-id=${{ secrets.GCP_PROJECT_ID }} \
--agent-id=${{ env.AGENT_ID }}
- name: Test Agent Endpoint
run: |
python scripts/test-agent.py \
--agent-id=${{ env.AGENT_ID }}
```
### Template 2: WIF Setup (One-Time Infrastructure)
```yaml
name: Setup Workload Identity Federation
on:
workflow_dispatch:
inputs:
github_repo:
description: 'GitHub repository (owner/repo)'
required: true
permissions:
contents: read
jobs:
setup-wif:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Authenticate with JSON key (one-time only)
uses: google-github-actions/auth@v2
with:
credentials_json: ${{ secrets.GCP_SETUP_KEY }}
- name: Run WIF setup script
run: |
bash scripts/setup-wif.sh \
--project-id=${{ secrets.GCP_PROJECT_ID }} \
--github-repo=${{ github.event.inputs.github_repo }}
- name: Output WIF configuration
run: |
cat wif-config.txt
```
### Template 3: Security Validation (Pre-Deployment)
```yaml
name: Security Validation
on:
pull_request:
push:
branches: [main]
permissions:
contents: read
security-events: write
jobs:
security-checks:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Scan for secrets
uses: trufflesecurity/trufflehog@main
with:
path: ./
base: ${{ github.event.repository.default_branch }}
head: HEAD
- name: Vulnerability scanning
uses: aquasecurity/trivy-action@master
with:
scan-type: 'fs'
scan-ref: '.'
format: 'sarif'
output: 'trivy-results.sarif'
- name: Upload results to GitHub Security
uses: github/codeql-action/upload-sarif@v3
with:
sarif_file: 'trivy-results.sarif'
- name: Validate no service account keys
run: |
if find . -name "*service-account*.json" -o -name "*credentials*.json"; then
echo "❌ Service account key files detected"
exit 1
fi
echo "✅ No service account keys found"
- name: Validate WIF usage in workflows
run: |
if grep -r "credentials_json" .github/workflows/; then
echo "❌ JSON credentials detected (use WIF)"
exit 1
fi
echo "✅ Workflows use WIF"
- name: Validate IAM roles (no owner/editor)
run: |
if grep -r "roles/owner\|roles/editor" . --include="*.tf" --include="*.yaml"; then
echo "❌ Overly permissive IAM roles detected"
exit 1
fi
echo "✅ Least privilege IAM roles"
```
## Vertex AI Agent Engine Specific Validations
### Deployment Configuration Validation
```python
# scripts/validate-agent-config.py
from typing import Dict, Any
def validate_vertex_agent_config(config: Dict[str, Any]) -> None:
"""
Comprehensive Vertex AI Agent Engine configuration validation.
"""
# 1. Model Selection
model = config.get("model")
assert model in ["gemini-2.5-pro", "gemini-2.5-flash"], \
f"❌ Invalid model: {model}. Use gemini-2.5-pro or gemini-2.5-flash"
# 2. Code Execution Sandbox
code_exec = config.get("code_execution_config", {})
if code_exec.get("enabled"):
ttl = code_exec.get("state_ttl_days")
assert 1 <= ttl <= 14, \
f"❌ State TTL must be 1-14 days, got {ttl}"
assert code_exec.get("sandbox_type") == "SECURE_ISOLATED", \
"❌ Sandbox type must be SECURE_ISOLATED"
timeout = code_exec.get("timeout_seconds")
assert 1 <= timeout <= 600, \
f"❌ Timeout must be 1-600 seconds, got {timeout}"
# 3. Memory Bank
memory = config.get("memory_bank_config", {})
if memory.get("enabled"):
max_memories = memory.get("max_memories")
assert max_memories >= 100, \
f"⚠️ Low memory limit: {max_memories}. Recommend >= 100"
assert memory.get("indexing_enabled"), \
"⚠️ Indexing disabled will slow query performance"
assert memory.get("auto_cleanup"), \
"⚠️ Auto-cleanup disabled may exceed quotas"
# 4. Security
assert config.get("model_armor", {}).get("enabled"), \
"❌ Model Armor must be enabled (prompt injection protection)"
assert config.get("vpc_config"), \
"❌ VPC configuration required for enterprise deployment"
# 5. Auto-Scaling
auto_scaling = config.get("auto_scaling", {})
assert auto_scaling.get("min_instances") >= 1, \
"❌ min_instances must be >= 1 for production"
assert auto_scaling.get("max_instances") >= 3, \
"⚠️ max_instances should be >= 3 for high availability"
# 6. IAM
sa = config.get("service_account")
assert sa and "@" in sa, \
f"❌ Invalid service account: {sa}"
print("✅ All Vertex AI agent configuration checks passed")
```
### Post-Deployment Health Check
```python
# scripts/validate-deployment.py
from google.cloud.aiplatform import agent_builder
import requests
def validate_vertex_deployment(
project_id: str,
location: str,
agent_id: str
) -> bool:
"""
Post-deployment validation for Vertex AI Agent Engine.
"""
client = agent_builder.AgentBuilderClient()
agent_name = f"projects/{project_id}/locations/{location}/agents/{agent_id}"
# 1. Agent Status
agent = client.get_agent(name=agent_name)
assert agent.state == "RUNNING", \
f"❌ Agent not running: {agent.state}"
print(f"✅ Agent status: {agent.state}")
# 2. Code Execution Sandbox
assert agent.code_execution_config.enabled, \
"❌ Code Execution not enabled"
print(f"✅ Code Execution enabled (TTL: {agent.code_execution_config.state_ttl_days} days)")
# 3. Memory Bank
assert agent.memory_bank_config.enabled, \
"❌ Memory Bank not enabled"
print(f"✅ Memory Bank enabled")
# 4. A2A Protocol Compliance
agentcard_url = f"{agent.agent_endpoint}/.well-known/agent-card"
response = requests.get(agentcard_url, timeout=10)
assert response.status_code == 200, \
f"❌ AgentCard not accessible: {response.status_code}"
agentcard = response.json()
assert "name" in agentcard and "version" in agentcard, \
"❌ AgentCard missing required fields"
print(f"✅ A2A Protocol: AgentCard accessible")
# 5. Model Armor
assert agent.model_armor.enabled, \
"❌ Model Armor not enabled"
print(f"✅ Model Armor enabled (prompt injection protection)")
# 6. Endpoint
assert agent.agent_endpoint, \
"❌ Agent endpoint not available"
print(f"✅ Agent endpoint: {agent.agent_endpoint}")
# 7. Service Account
assert agent.service_account, \
"❌ Service account not configured"
print(f"✅ Service account: {agent.service_account}")
print("\n✅ All post-deployment validations passed!")
return True
```
## Tool Permissions
This skill uses:
- **Read**: Analyze workflow files and configurations
- **Write**: Create GitHub Actions workflows
- **Edit**: Update existing workflows for compliance
- **Grep**: Search for security issues (JSON keys, etc.)
- **Glob**: Find workflow files across repository
- **Bash**: Execute validation scripts and gcloud commands
## Integration with Other Plugins
### Works with jeremy-adk-orchestrator
- Provides CI/CD for ADK agent deployments
- Automates A2A protocol validation
- Ensures production readiness
### Works with jeremy-vertex-validator
- GitHub Actions calls vertex-validator for post-deployment checks
- Comprehensive validation pipeline
- Production readiness scoring
### Works with jeremy-adk-terraform
- GitHub Actions deploys Terraform infrastructure
- Automated infrastructure provisioning
- Validation of Terraform-provisioned resources
### Works with jeremy-vertex-engine
- GitHub Actions triggers vertex-engine-inspector
- Continuous health monitoring
- Automated compliance checks
## Best Practices Summary
### Security (MANDATORY)
✅ Use WIF (Workload Identity Federation) - never JSON keys
✅ Require `id-token: write` permission for OIDC
✅ IAM least privilege (never owner/editor roles)
✅ Attribute-based access control (restrict by repository)
✅ Enable Model Armor for agents
✅ VPC Service Controls for enterprise isolation
✅ Scan for secrets in code (Trufflehog)
✅ Vulnerability scanning (Trivy)
### Vertex AI Specific (MANDATORY)
✅ Code Execution Sandbox: 7-14 day TTL
✅ Memory Bank enabled for stateful agents
✅ A2A Protocol compliance (AgentCard validation)
✅ Model Armor enabled (prompt injection protection)
✅ Auto-scaling configured (min >= 1, max >= 3)
✅ Post-deployment validation (agent status, endpoints)
✅ Monitoring dashboards created
✅ Alerting policies configured
### CI/CD (RECOMMENDED)
✅ Conditional job execution (only on relevant paths)
✅ Caching for dependencies (faster builds)
✅ Concurrent jobs when possible
✅ Rollback strategies for failed deployments
✅ Health check endpoints
## Version History
- **1.0.0** (2025): Initial release with WIF enforcement, Vertex AI validations, security scanning
## References
- **Workload Identity Federation**: https://cloud.google.com/iam/docs/workload-identity-federation
- **GitHub OIDC**: https://docs.github.com/en/actions/deployment/security-hardening-your-deployments/about-security-hardening-with-openid-connect
- **Vertex AI Agent Engine**: https://cloud.google.com/vertex-ai/generative-ai/docs/agent-engine/overview
- **google-github-actions/auth**: https://github.com/google-github-actions/auth