| name | description |
|---|---|
| specweave-ml:ml-deploy | Generate deployment artifacts (API, Docker, monitoring) |
Deploy ML Model
You are preparing an ML model for production deployment. Generate all necessary deployment artifacts following MLOps best practices.
Your Task
- Generate API: FastAPI endpoint for model serving
- Containerize: Dockerfile for model deployment
- Set Up Monitoring: Prometheus/Grafana configuration
- Create A/B Test: Traffic splitting infrastructure
- Load Test: Verify the API sustains the target request rate
- Document Deployment: Deployment runbook
Deployment Steps
Step 1: Generate FastAPI App
from specweave import create_model_api

api = create_model_api(
    model_path="models/model.pkl",
    framework="fastapi"
)
Creates: api/main.py, api/models.py, api/predict.py
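The contents of the generated files are not shown here, so the following is only a sketch of the kind of handler logic `api/predict.py` might hold. It is kept framework-free so it runs on the standard library alone; a FastAPI route would wrap `handle_predict`. The names `PredictionRequest`, `load_model`, `parse_request`, and `handle_predict` are hypothetical, and the model is assumed to expose a scikit-learn-style `predict` method.

```python
import pickle
from dataclasses import dataclass
from typing import Any, Sequence


@dataclass
class PredictionRequest:
    """Hypothetical request schema: one flat feature vector."""
    features: Sequence[float]


def load_model(model_path: str) -> Any:
    """Load a pickled model. Only unpickle files from a trusted source."""
    with open(model_path, "rb") as fh:
        return pickle.load(fh)


def parse_request(payload: dict) -> PredictionRequest:
    """Validate the raw JSON payload before it reaches the model."""
    features = payload.get("features")
    if not isinstance(features, list) or not features:
        raise ValueError("'features' must be a non-empty list")
    for value in features:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError("'features' must contain only numbers")
    return PredictionRequest(features=[float(v) for v in features])


def handle_predict(model: Any, payload: dict) -> dict:
    """Return a JSON-serialisable response (assumes model.predict(rows) returns a sequence)."""
    request = parse_request(payload)
    prediction = model.predict([list(request.features)])[0]
    return {"prediction": prediction}
```

Loading the model once at startup and passing it into the handler keeps unpickling out of the request path.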
Step 2: Create Dockerfile
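from specweave import containerize_model  # assumed export, by analogy with create_model_api in Step 1

dockerfile = containerize_model(
    model_path="models/model.pkl",
    python_version="3.10"
)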
Creates: Dockerfile, requirements.txt
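The generated Dockerfile is not reproduced here. As a rough illustration of what such a file could look like, the sketch below renders one from the same two parameters. `render_dockerfile` is a hypothetical helper, and the `python:<version>-slim` base image, the `uvicorn` entry point, the `api.main:app` module path, and port 8000 are all assumptions rather than confirmed output of `containerize_model`.

```python
from textwrap import dedent


def render_dockerfile(model_path: str, python_version: str = "3.10", port: int = 8000) -> str:
    """Return Dockerfile text for a model-serving image (illustrative layout only)."""
    return dedent(f"""\
        FROM python:{python_version}-slim
        WORKDIR /app

        # Install dependencies first so this layer is cached across code changes.
        COPY requirements.txt .
        RUN pip install --no-cache-dir -r requirements.txt

        COPY api/ ./api/
        COPY {model_path} ./{model_path}

        EXPOSE {port}
        CMD ["uvicorn", "api.main:app", "--host", "0.0.0.0", "--port", "{port}"]
        """)


if __name__ == "__main__":
    print(render_dockerfile("models/model.pkl", python_version="3.10"))
```

Copying `requirements.txt` before the application code means a code-only change does not invalidate the dependency layer.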
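Step 3: Set Up Monitoring
from specweave import setup_monitoring  # assumed export, by analogy with create_model_api in Step 1

monitoring = setup_monitoring(
    model_name="recommendation-model",
    metrics=["latency", "throughput", "error_rate", "drift"]
)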
Creates: monitoring/prometheus.yaml, monitoring/grafana-dashboard.json
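Of the four metrics, drift is the least self-explanatory. How `setup_monitoring` measures it is not stated, so the sketch below swaps in one common choice, the Population Stability Index (PSI), computed for a single numeric feature. The function name, the bin count, and the `eps` floor are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def population_stability_index(
    expected: Sequence[float],
    actual: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI between a reference sample and a live sample of one feature."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference sample

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Clamp out-of-range live values into the first or last bin.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Floor empty bins at eps so the logarithm below stays finite.
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A PSI of zero means the two binned distributions match. A commonly quoted rule of thumb treats values above roughly 0.25 as a significant shift, though the right alert threshold depends on the model.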
Step 4: A/B Testing Infrastructure
from specweave import create_ab_test  # assumed export, by analogy with create_model_api in Step 1

ab_test = create_ab_test(
    control_model="model-v2.pkl",
    treatment_model="model-v3.pkl",
    traffic_split=0.1
)
Creates: ab-test/router.py, ab-test/metrics.py
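The routing logic in `ab-test/router.py` is not shown. One plausible approach is deterministic hash-based bucketing, sketched below, so that a given user always sees the same model. This assumes `traffic_split=0.1` means 10% of traffic goes to the treatment model; `assign_variant` and the `salt` parameter are hypothetical.

```python
import hashlib


def assign_variant(user_id: str, traffic_split: float = 0.1, salt: str = "ab-test") -> str:
    """Deterministically route a user to 'treatment' or 'control'."""
    if not 0.0 <= traffic_split <= 1.0:
        raise ValueError("traffic_split must be between 0 and 1")
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer bucket in [0, 2**64).
    bucket = int.from_bytes(digest[:8], "big")
    # Compare in integer space so a split of 1.0 always selects treatment.
    threshold = int(traffic_split * 2**64)
    return "treatment" if bucket < threshold else "control"
```

Changing the salt reshuffles users between experiments, which avoids the same users always landing in the treatment group.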
Step 5: Load Testing
from specweave import load_test_model  # assumed export, by analogy with create_model_api in Step 1

load_test_results = load_test_model(
    api_url="http://localhost:8000/predict",
    target_rps=100,
    duration=60
)
Creates: load-tests/results.md
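What `load-tests/results.md` reports is not specified. The sketch below shows one way the raw measurements of a run could be reduced to a pass/fail verdict against the target RPS. `LoadTestSummary`, `summarize_load_test`, and the latency and error budgets are hypothetical; only `target_rps=100` and the 60-second duration come from the call above.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class LoadTestSummary:
    achieved_rps: float
    p95_latency_ms: float
    error_rate: float
    passed: bool


def summarize_load_test(
    latencies_ms: Sequence[float],
    errors: int,
    duration_s: float,
    target_rps: float = 100.0,
    max_p95_ms: float = 200.0,     # hypothetical latency budget
    max_error_rate: float = 0.01,  # hypothetical error budget
) -> LoadTestSummary:
    """Summarise one run; latencies_ms holds successful requests only."""
    if not latencies_ms or duration_s <= 0:
        raise ValueError("need at least one successful request and a positive duration")
    n = len(latencies_ms)
    # Nearest-rank 95th percentile, using integer arithmetic for the rank.
    rank = (95 * n + 99) // 100
    p95 = sorted(latencies_ms)[rank - 1]
    achieved_rps = n / duration_s          # successful requests per second
    error_rate = errors / (n + errors)
    passed = (
        achieved_rps >= target_rps
        and p95 <= max_p95_ms
        and error_rate <= max_error_rate
    )
    return LoadTestSummary(achieved_rps, p95, error_rate, passed)
```

Counting only successful requests toward throughput prevents a service that fails fast from appearing to meet the target.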
Step 6: Deployment Runbook
Create DEPLOYMENT.md:
# Deployment Runbook
## Pre-Deployment Checklist
- [ ] Model versioned
- [ ] API tested locally
- [ ] Load testing passed
- [ ] Monitoring configured
- [ ] Rollback plan documented
## Deployment Steps
1. Build Docker image
2. Push to registry
3. Deploy to staging
4. Validate staging
5. Deploy to production (1% traffic)
6. Monitor for 24 hours
7. Ramp to 100% if stable
## Rollback Procedure
[Steps to roll back to the previous version]
## Monitoring
[Grafana dashboard URL]
[Key metrics to watch]
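Deployment steps 5 to 7 of the runbook describe a gated ramp: start at 1% of traffic, observe, and increase only if the service is stable. A minimal sketch of that decision follows. The intermediate stages in `DEFAULT_STAGES` are an assumption, since the runbook fixes only the 1% start and the 100% end, and `next_traffic_share` is a hypothetical helper.

```python
from typing import Sequence

# Hypothetical ramp schedule; the runbook only specifies 1% and 100%.
DEFAULT_STAGES = (0.01, 0.10, 0.50, 1.00)


def next_traffic_share(
    current: float,
    stable: bool,
    stages: Sequence[float] = DEFAULT_STAGES,
) -> float:
    """Return the traffic share to use for the next observation window."""
    if not stable:
        return 0.0  # roll back: route all traffic to the previous version
    for stage in stages:
        if stage > current:
            return stage  # advance to the next larger stage
    return stages[-1]  # already fully ramped


if __name__ == "__main__":
    share = 0.0
    while share < 1.0:
        share = next_traffic_share(share, stable=True)
        print(f"ramp to {share:.0%}")
```

How `stable` is determined is left open here; in practice it would come from the key metrics listed in the Monitoring section.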
Output
Report:
- All deployment artifacts generated
- Load test results (can it handle target RPS?)
- Deployment recommendation (ready/not ready)
- Next steps for deployment
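The ready/not ready recommendation can be derived mechanically from the pre-deployment checklist. The sketch below assumes each checklist item has been reduced to a boolean; `deployment_recommendation` is a hypothetical helper, not part of the generated artifacts.

```python
from typing import List, Mapping, Tuple


def deployment_recommendation(checks: Mapping[str, bool]) -> Tuple[str, List[str]]:
    """Return the verdict plus the names of any checks that failed."""
    failed = [name for name, ok in checks.items() if not ok]
    return ("not ready" if failed else "ready", failed)


if __name__ == "__main__":
    verdict, failed = deployment_recommendation({
        "Model versioned": True,
        "API tested locally": True,
        "Load testing passed": False,
        "Monitoring configured": True,
        "Rollback plan documented": True,
    })
    print(verdict, failed)
```

Returning the failed items alongside the verdict gives the report its "next steps" for free.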