346 lines
8.3 KiB
Markdown
346 lines
8.3 KiB
Markdown
---
|
|
name: ml-deployment-helper
|
|
description: |
|
|
Prepares ML models for production deployment with containerization, API creation, monitoring setup, and A/B testing. Activates for "deploy model", "production deployment", "model API", "containerize model", "docker ml", "serving ml model", "model monitoring", "A/B test model". Generates deployment artifacts and ensures models are production-ready with monitoring, versioning, and rollback capabilities.
|
|
---
|
|
|
|
# ML Deployment Helper
|
|
|
|
## Overview
|
|
|
|
Bridges the gap between trained models and production systems. Generates deployment artifacts, APIs, monitoring, and A/B testing infrastructure following MLOps best practices.
|
|
|
|
## Deployment Checklist
|
|
|
|
Before deploying any model, this skill ensures:
|
|
|
|
- ✅ Model versioned and tracked
|
|
- ✅ Dependencies documented (requirements.txt/Dockerfile)
|
|
- ✅ API endpoint created
|
|
- ✅ Input validation implemented
|
|
- ✅ Monitoring configured
|
|
- ✅ A/B testing ready
|
|
- ✅ Rollback plan documented
|
|
- ✅ Performance benchmarked
|
|
|
|
## Deployment Patterns
|
|
|
|
### Pattern 1: REST API (FastAPI)
|
|
|
|
```python
|
|
from specweave import create_model_api
|
|
|
|
# Generates production-ready API
|
|
api = create_model_api(
|
|
model_path="models/model-v3.pkl",
|
|
increment="0042",
|
|
framework="fastapi"
|
|
)
|
|
|
|
# Creates:
|
|
# - api/
|
|
# ├── main.py (FastAPI app)
|
|
# ├── models.py (Pydantic schemas)
|
|
# ├── predict.py (Prediction logic)
|
|
# ├── Dockerfile
|
|
# ├── requirements.txt
|
|
# └── tests/
|
|
```
|
|
|
|
Generated `main.py`:
|
|
```python
|
|
from fastapi import FastAPI, HTTPException
|
|
from pydantic import BaseModel
|
|
import joblib
|
|
|
|
app = FastAPI(title="Recommendation Model API", version="0042-v3")
|
|
|
|
model = joblib.load("model-v3.pkl")
|
|
|
|
class PredictionRequest(BaseModel):
|
|
user_id: int
|
|
context: dict
|
|
|
|
@app.post("/predict")
|
|
async def predict(request: PredictionRequest):
|
|
try:
|
|
prediction = model.predict([request.dict()])
|
|
return {
|
|
"recommendations": prediction.tolist(),
|
|
"model_version": "0042-v3",
|
|
"timestamp": datetime.now()
|
|
}
|
|
except Exception as e:
|
|
raise HTTPException(status_code=500, detail=str(e))
|
|
|
|
@app.get("/health")
|
|
async def health():
|
|
return {"status": "healthy", "model_loaded": model is not None}
|
|
```
|
|
|
|
### Pattern 2: Batch Prediction
|
|
|
|
```python
|
|
from specweave import create_batch_predictor
|
|
|
|
# For offline scoring
|
|
batch_predictor = create_batch_predictor(
|
|
model_path="models/model-v3.pkl",
|
|
increment="0042",
|
|
input_path="s3://bucket/data/",
|
|
output_path="s3://bucket/predictions/"
|
|
)
|
|
|
|
# Creates:
|
|
# - batch/
|
|
# ├── predictor.py
|
|
# ├── scheduler.yaml (Airflow/Kubernetes CronJob)
|
|
# └── monitoring.py
|
|
```
|
|
|
|
### Pattern 3: Real-Time Streaming
|
|
|
|
```python
|
|
from specweave import create_streaming_predictor
|
|
|
|
# For Kafka/Kinesis streams
|
|
streaming = create_streaming_predictor(
|
|
model_path="models/model-v3.pkl",
|
|
increment="0042",
|
|
input_topic="user-events",
|
|
output_topic="predictions"
|
|
)
|
|
|
|
# Creates:
|
|
# - streaming/
|
|
# ├── consumer.py
|
|
# ├── predictor.py
|
|
# ├── producer.py
|
|
# └── docker-compose.yaml
|
|
```
|
|
|
|
## Containerization
|
|
|
|
```python
|
|
from specweave import containerize_model
|
|
|
|
# Generates optimized Dockerfile
|
|
dockerfile = containerize_model(
|
|
model_path="models/model-v3.pkl",
|
|
framework="sklearn",
|
|
python_version="3.10",
|
|
increment="0042"
|
|
)
|
|
```
|
|
|
|
Generated `Dockerfile`:
|
|
```dockerfile
|
|
FROM python:3.10-slim
|
|
|
|
WORKDIR /app
|
|
|
|
# Copy model and dependencies
|
|
COPY models/model-v3.pkl /app/model.pkl
|
|
COPY requirements.txt /app/
|
|
|
|
# Install dependencies
|
|
RUN pip install --no-cache-dir -r requirements.txt
|
|
|
|
# Copy application
|
|
COPY api/ /app/api/
|
|
|
|
# Health check
|
|
HEALTHCHECK --interval=30s --timeout=3s \
|
|
CMD curl -f http://localhost:8000/health || exit 1
|
|
|
|
# Run API
|
|
CMD ["uvicorn", "api.main:app", "--host", "0.0.0.0", "--port", "8000"]
|
|
```
|
|
|
|
## Monitoring Setup
|
|
|
|
```python
|
|
from specweave import setup_model_monitoring
|
|
|
|
# Configures monitoring for production
|
|
monitoring = setup_model_monitoring(
|
|
model_name="recommendation-model",
|
|
increment="0042",
|
|
metrics=[
|
|
"prediction_latency",
|
|
"throughput",
|
|
"error_rate",
|
|
"prediction_distribution",
|
|
"feature_drift"
|
|
]
|
|
)
|
|
|
|
# Creates:
|
|
# - monitoring/
|
|
# ├── prometheus.yaml
|
|
# ├── grafana-dashboard.json
|
|
# ├── alerts.yaml
|
|
# └── drift-detector.py
|
|
```
|
|
|
|
## A/B Testing Infrastructure
|
|
|
|
```python
|
|
from specweave import create_ab_test
|
|
|
|
# Sets up A/B test framework
|
|
ab_test = create_ab_test(
|
|
control_model="model-v2.pkl",
|
|
treatment_model="model-v3.pkl",
|
|
traffic_split=0.1, # 10% to new model
|
|
success_metric="click_through_rate",
|
|
increment="0042"
|
|
)
|
|
|
|
# Creates:
|
|
# - ab-test/
|
|
# ├── router.py (traffic splitting)
|
|
# ├── metrics.py (success tracking)
|
|
# ├── statistical-tests.py (significance testing)
|
|
# └── dashboard.py (real-time monitoring)
|
|
```
|
|
|
|
A/B Test Router:
|
|
```python
|
|
import random
|
|
|
|
def route_prediction(user_id, control_model, treatment_model):
|
|
"""Route to control or treatment based on user_id hash"""
|
|
|
|
# Consistent hashing (same user always gets same model)
|
|
user_bucket = hash(user_id) % 100
|
|
|
|
if user_bucket < 10: # 10% to treatment
|
|
return treatment_model.predict(features), "treatment"
|
|
else:
|
|
return control_model.predict(features), "control"
|
|
```
|
|
|
|
## Model Versioning
|
|
|
|
```python
|
|
from specweave import ModelVersion
|
|
|
|
# Register model version
|
|
version = ModelVersion.register(
|
|
model_path="models/model-v3.pkl",
|
|
increment="0042",
|
|
metadata={
|
|
"accuracy": 0.87,
|
|
"training_date": "2024-01-15",
|
|
"data_version": "v2024-01",
|
|
"framework": "xgboost==1.7.0"
|
|
}
|
|
)
|
|
|
|
# Easy rollback
|
|
if production_metrics["error_rate"] > threshold:
|
|
ModelVersion.rollback(to_version="0042-v2")
|
|
```
|
|
|
|
## Load Testing
|
|
|
|
```python
|
|
from specweave import load_test_model
|
|
|
|
# Benchmark model performance
|
|
results = load_test_model(
|
|
api_url="http://localhost:8000/predict",
|
|
requests_per_second=[10, 50, 100, 500, 1000],
|
|
duration_seconds=60,
|
|
increment="0042"
|
|
)
|
|
```
|
|
|
|
Output:
|
|
```
|
|
Load Test Results:
|
|
==================
|
|
|
|
| RPS | Latency P50 | Latency P95 | Latency P99 | Error Rate |
|
|
|------|-------------|-------------|-------------|------------|
|
|
| 10 | 35ms | 45ms | 50ms | 0.00% |
|
|
| 50 | 38ms | 52ms | 65ms | 0.00% |
|
|
| 100 | 45ms | 70ms | 95ms | 0.02% |
|
|
| 500 | 120ms | 250ms | 400ms | 1.20% |
|
|
| 1000 | 350ms | 800ms | 1200ms | 8.50% |
|
|
|
|
Recommendation: Deploy with max 100 RPS per instance
|
|
Target: <100ms P95 latency (achieved at 100 RPS)
|
|
```
|
|
|
|
## Deployment Commands
|
|
|
|
```bash
|
|
# Generate deployment artifacts
|
|
/ml:deploy-prepare 0042
|
|
|
|
# Create API
|
|
/ml:create-api --increment 0042 --framework fastapi
|
|
|
|
# Setup monitoring
|
|
/ml:setup-monitoring 0042
|
|
|
|
# Create A/B test
|
|
/ml:create-ab-test --control v2 --treatment v3 --split 0.1
|
|
|
|
# Load test
|
|
/ml:load-test 0042 --rps 100 --duration 60s
|
|
|
|
# Deploy to production
|
|
/ml:deploy 0042 --environment production
|
|
```
|
|
|
|
## Deployment Increment
|
|
|
|
The skill creates a deployment increment:
|
|
|
|
```
|
|
.specweave/increments/0043-deploy-recommendation-model/
|
|
├── spec.md (deployment requirements)
|
|
├── plan.md (deployment strategy)
|
|
├── tasks.md
|
|
│ ├── [ ] Containerize model
|
|
│ ├── [ ] Create API
|
|
│ ├── [ ] Setup monitoring
|
|
│ ├── [ ] Configure A/B test
|
|
│ ├── [ ] Load test
|
|
│ ├── [ ] Deploy to staging
|
|
│ ├── [ ] Validate staging
|
|
│ └── [ ] Deploy to production
|
|
├── api/ (FastAPI app)
|
|
├── monitoring/ (Grafana dashboards)
|
|
├── ab-test/ (A/B testing logic)
|
|
└── load-tests/ (Performance benchmarks)
|
|
```
|
|
|
|
## Best Practices
|
|
|
|
1. **Always load test** before production
|
|
2. **Start with 1-5% traffic** in A/B test
|
|
3. **Monitor model drift** in production
|
|
4. **Version everything** (model, data, code)
|
|
5. **Document rollback plan** before deploying
|
|
6. **Set up alerts** for anomalies
|
|
7. **Gradual rollout** (canary deployment)
|
|
|
|
## Integration with SpecWeave
|
|
|
|
```bash
|
|
# After training model (increment 0042)
|
|
/specweave:inc "0043-deploy-recommendation-model"
|
|
|
|
# Generates deployment increment with all artifacts
|
|
/specweave:do
|
|
|
|
# Deploy to production when ready
|
|
/ml:deploy 0043 --environment production
|
|
```
|
|
|
|
Model deployment is not the end—it's the beginning of the MLOps lifecycle.
|