Initial commit
skills/ml-deployment-helper/SKILL.md
@@ -0,0 +1,345 @@
---
name: ml-deployment-helper
description: |
  Prepares ML models for production deployment with containerization, API creation, monitoring setup, and A/B testing. Activates for "deploy model", "production deployment", "model API", "containerize model", "docker ml", "serving ml model", "model monitoring", "A/B test model". Generates deployment artifacts and ensures models are production-ready with monitoring, versioning, and rollback capabilities.
---

# ML Deployment Helper

## Overview

Bridges the gap between trained models and production systems. Generates deployment artifacts, APIs, monitoring, and A/B testing infrastructure following MLOps best practices.

## Deployment Checklist

Before deploying any model, this skill ensures:

- ✅ Model versioned and tracked
- ✅ Dependencies documented (requirements.txt/Dockerfile)
- ✅ API endpoint created
- ✅ Input validation implemented
- ✅ Monitoring configured
- ✅ A/B testing ready
- ✅ Rollback plan documented
- ✅ Performance benchmarked
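
As an illustration of the "Input validation implemented" item, here is a minimal standard-library sketch of the kind of check a prediction endpoint might run before calling the model. The function name `validate_prediction_request` and the non-negative `user_id` rule are assumptions made for this example; they are not part of the skill's generated code.

```python
from typing import Any


def validate_prediction_request(payload: Any) -> list[str]:
    """Return a list of validation errors; an empty list means the payload is valid."""
    errors: list[str] = []
    if not isinstance(payload, dict):
        return ["payload must be a JSON object"]

    user_id = payload.get("user_id")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(user_id, int) or isinstance(user_id, bool):
        errors.append("user_id must be an integer")
    elif user_id < 0:  # assumed business rule, for illustration only
        errors.append("user_id must be non-negative")

    if not isinstance(payload.get("context"), dict):
        errors.append("context must be an object")

    return errors
```

Returning all errors at once, rather than failing on the first, lets an API report every problem with a request in a single response.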

## Deployment Patterns

### Pattern 1: REST API (FastAPI)

```python
from specweave import create_model_api

# Generates production-ready API
api = create_model_api(
    model_path="models/model-v3.pkl",
    increment="0042",
    framework="fastapi"
)

# Creates:
# - api/
#   ├── main.py (FastAPI app)
#   ├── models.py (Pydantic schemas)
#   ├── predict.py (Prediction logic)
#   ├── Dockerfile
#   ├── requirements.txt
#   └── tests/
```

Generated `main.py`:

```python
from datetime import datetime

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import joblib

app = FastAPI(title="Recommendation Model API", version="0042-v3")

# Load the model once at import time, not on every request
model = joblib.load("model-v3.pkl")

class PredictionRequest(BaseModel):
    user_id: int
    context: dict

@app.post("/predict")
async def predict(request: PredictionRequest):
    try:
        # On Pydantic v2, request.model_dump() replaces the deprecated .dict()
        prediction = model.predict([request.dict()])
        return {
            "recommendations": prediction.tolist(),
            "model_version": "0042-v3",
            "timestamp": datetime.now()
        }
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@app.get("/health")
async def health():
    return {"status": "healthy", "model_loaded": model is not None}
```

### Pattern 2: Batch Prediction

```python
from specweave import create_batch_predictor

# For offline scoring
batch_predictor = create_batch_predictor(
    model_path="models/model-v3.pkl",
    increment="0042",
    input_path="s3://bucket/data/",
    output_path="s3://bucket/predictions/"
)

# Creates:
# - batch/
#   ├── predictor.py
#   ├── scheduler.yaml (Airflow/Kubernetes CronJob)
#   └── monitoring.py
```
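
The contents of the generated `predictor.py` are not shown here. As a rough sketch of what offline scoring usually involves, the example below scores rows in fixed-size chunks so memory use stays bounded. `DoublingModel` and `predict_in_batches` are hypothetical names introduced for illustration; a real job would read from and write to the configured storage paths.

```python
from itertools import islice
from typing import Iterable, Iterator


class DoublingModel:
    """Hypothetical stand-in for a trained model with a predict() method."""

    def predict(self, rows: list[float]) -> list[float]:
        return [2 * value for value in rows]


def predict_in_batches(model, rows: Iterable[float], batch_size: int = 1000) -> Iterator[float]:
    """Score rows in fixed-size chunks and yield predictions in input order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(rows)
    while True:
        # Pull at most batch_size rows; an empty list means the input is exhausted.
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        yield from model.predict(batch)
```

Because the function is a generator, predictions can be written out as they are produced instead of being held in memory all at once.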

### Pattern 3: Real-Time Streaming

```python
from specweave import create_streaming_predictor

# For Kafka/Kinesis streams
streaming = create_streaming_predictor(
    model_path="models/model-v3.pkl",
    increment="0042",
    input_topic="user-events",
    output_topic="predictions"
)

# Creates:
# - streaming/
#   ├── consumer.py
#   ├── predictor.py
#   ├── producer.py
#   └── docker-compose.yaml
```
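
To show the consume, predict, produce loop that the three generated files split between them, here is a minimal sketch that uses in-memory queues as stand-ins for the input and output topics. It does not use a Kafka or Kinesis client; `ThresholdModel`, `run_stream`, the `STOP` sentinel, and the event fields are all assumptions for illustration.

```python
import queue

STOP = object()  # sentinel that tells the worker to shut down


class ThresholdModel:
    """Hypothetical stand-in model: flags events whose value exceeds 0.5."""

    def predict(self, events):
        return [1 if event["value"] > 0.5 else 0 for event in events]


def run_stream(model, input_topic: queue.Queue, output_topic: queue.Queue) -> int:
    """Consume events until STOP arrives, publishing one prediction per event."""
    processed = 0
    while True:
        event = input_topic.get()  # blocks until an event is available
        if event is STOP:
            break
        prediction = model.predict([event])[0]
        output_topic.put({"user_id": event["user_id"], "prediction": prediction})
        processed += 1
    return processed
```

A real consumer would also need offset commits, retries, and handling of malformed events, which this sketch leaves out.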

## Containerization

```python
from specweave import containerize_model

# Generates optimized Dockerfile
dockerfile = containerize_model(
    model_path="models/model-v3.pkl",
    framework="sklearn",
    python_version="3.10",
    increment="0042"
)
```

Generated `Dockerfile`:

```dockerfile
FROM python:3.10-slim

WORKDIR /app

# Copy model and dependencies
# Keep the filename that api/main.py loads (model-v3.pkl)
COPY models/model-v3.pkl /app/model-v3.pkl
COPY requirements.txt /app/

# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy application
COPY api/ /app/api/

# Health check (python:3.10-slim ships without curl, so use the interpreter)
HEALTHCHECK --interval=30s --timeout=3s \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health', timeout=2)" || exit 1

# Run API
CMD ["uvicorn", "api.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

## Monitoring Setup

```python
from specweave import setup_model_monitoring

# Configures monitoring for production
monitoring = setup_model_monitoring(
    model_name="recommendation-model",
    increment="0042",
    metrics=[
        "prediction_latency",
        "throughput",
        "error_rate",
        "prediction_distribution",
        "feature_drift"
    ]
)

# Creates:
# - monitoring/
#   ├── prometheus.yaml
#   ├── grafana-dashboard.json
#   ├── alerts.yaml
#   └── drift-detector.py
```
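
The `feature_drift` metric can be computed in several ways, and the method used by the generated `drift-detector.py` is not specified here. One common choice is the Population Stability Index (PSI), sketched below for a single numeric feature using only the standard library. The function name, the bin count, and the `1e-6` floor are assumptions for this example.

```python
import math


def population_stability_index(expected, actual, bins=10):
    """Compare two samples of one numeric feature; larger values mean more drift."""
    if not expected or not actual:
        raise ValueError("both samples must be non-empty")
    low, high = min(expected), max(expected)
    width = (high - low) / bins or 1.0  # avoid zero width when all values are equal

    def bucket_shares(values):
        counts = [0] * bins
        for value in values:
            index = int((value - low) / width)
            counts[min(max(index, 0), bins - 1)] += 1  # clamp out-of-range values
        # A small floor keeps the logarithm defined for empty buckets.
        return [max(count / len(values), 1e-6) for count in counts]

    expected_shares = bucket_shares(expected)
    actual_shares = bucket_shares(actual)
    return sum(
        (a - e) * math.log(a / e) for e, a in zip(expected_shares, actual_shares)
    )
```

Here `expected` would be a sample from the training data and `actual` a recent window of production inputs. A commonly quoted rule of thumb treats a PSI above roughly 0.2 as a shift worth investigating, but alert thresholds should be tuned for each feature.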

## A/B Testing Infrastructure

```python
from specweave import create_ab_test

# Sets up A/B test framework
ab_test = create_ab_test(
    control_model="model-v2.pkl",
    treatment_model="model-v3.pkl",
    traffic_split=0.1,  # 10% to new model
    success_metric="click_through_rate",
    increment="0042"
)

# Creates:
# - ab-test/
#   ├── router.py (traffic splitting)
#   ├── metrics.py (success tracking)
#   ├── statistical-tests.py (significance testing)
#   └── dashboard.py (real-time monitoring)
```

A/B Test Router:

```python
import hashlib

def route_prediction(user_id, features, control_model, treatment_model,
                     treatment_percent=10):
    """Route to control or treatment based on a stable hash of user_id"""

    # Deterministic bucketing (same user always gets same model).
    # The built-in hash() is salted per process for strings, so it is
    # not stable across restarts; hashlib is.
    digest = hashlib.sha256(str(user_id).encode("utf-8")).hexdigest()
    user_bucket = int(digest, 16) % 100

    if user_bucket < treatment_percent:  # 10% to treatment by default
        return treatment_model.predict(features), "treatment"
    else:
        return control_model.predict(features), "control"
```
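
The significance testing listed for `statistical-tests.py` is not shown. For a rate metric such as click-through rate, one standard option is a two-proportion z-test, sketched below with only the standard library. The function name and argument names are assumptions for illustration, and other tests may suit other metrics better.

```python
import math


def two_proportion_z_test(control_successes, control_total,
                          treatment_successes, treatment_total):
    """Return (z, two-sided p-value) for the difference between two rates."""
    p_control = control_successes / control_total
    p_treatment = treatment_successes / treatment_total
    # Pooled rate under the null hypothesis that both groups share one rate.
    pooled = (control_successes + treatment_successes) / (control_total + treatment_total)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / control_total + 1 / treatment_total))
    if std_error == 0:
        return 0.0, 1.0  # no variation at all, so no evidence of a difference
    z = (p_treatment - p_control) / std_error
    # Two-sided p-value from the standard normal distribution, written with erfc.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

A positive `z` means the treatment rate is higher than the control rate. The test relies on a normal approximation, so it is best suited to large samples in both groups.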

## Model Versioning

```python
from specweave import ModelVersion

# Register model version
version = ModelVersion.register(
    model_path="models/model-v3.pkl",
    increment="0042",
    metadata={
        "accuracy": 0.87,
        "training_date": "2024-01-15",
        "data_version": "v2024-01",
        "framework": "xgboost==1.7.0"
    }
)

# Easy rollback
# production_metrics and threshold are assumed to be supplied by the monitoring setup
if production_metrics["error_rate"] > threshold:
    ModelVersion.rollback(to_version="0042-v2")
```
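
To make the register and rollback flow concrete, here is a small in-memory sketch of the bookkeeping a version registry needs. It is not the `ModelVersion` implementation; `InMemoryRegistry` and its methods are hypothetical, and a real registry would persist this state.

```python
class InMemoryRegistry:
    """Hypothetical registry that tracks versions and the one in production."""

    def __init__(self):
        self._versions = {}  # version id -> metadata
        self._history = []   # promotion order, newest last

    def register(self, version, metadata):
        if version in self._versions:
            raise ValueError(f"version {version!r} is already registered")
        self._versions[version] = dict(metadata)

    def promote(self, version):
        """Make a registered version the active one."""
        if version not in self._versions:
            raise KeyError(version)
        self._history.append(version)

    @property
    def current(self):
        return self._history[-1] if self._history else None

    def rollback(self):
        """Drop the current version and return the one that becomes active."""
        if len(self._history) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self._history.pop()
        return self.current
```

Keeping the promotion history as an ordered list is what makes rollback a single step: the previous version is always the next entry back.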

## Load Testing

```python
from specweave import load_test_model

# Benchmark model performance
results = load_test_model(
    api_url="http://localhost:8000/predict",
    requests_per_second=[10, 50, 100, 500, 1000],
    duration_seconds=60,
    increment="0042"
)
```

Output:

```
Load Test Results:
==================

| RPS  | Latency P50 | Latency P95 | Latency P99 | Error Rate |
|------|-------------|-------------|-------------|------------|
| 10   | 35ms        | 45ms        | 50ms        | 0.00%      |
| 50   | 38ms        | 52ms        | 65ms        | 0.00%      |
| 100  | 45ms        | 70ms        | 95ms        | 0.02%      |
| 500  | 120ms       | 250ms       | 400ms       | 1.20%      |
| 1000 | 350ms       | 800ms       | 1200ms      | 8.50%      |

Recommendation: Deploy with max 100 RPS per instance
Target: <100ms P95 latency (achieved at 100 RPS)
```
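
The recommendation in the output can be derived mechanically from the table. The sketch below picks the highest tested RPS that stays under the P95 target, using the figures from the table above. The function name and the 0.1% error-rate ceiling are assumptions for this example; only the 100 ms P95 target comes from the output.

```python
# (rps, P95 latency in ms, error rate in percent), copied from the table above
RESULTS = [
    (10, 45, 0.00),
    (50, 52, 0.00),
    (100, 70, 0.02),
    (500, 250, 1.20),
    (1000, 800, 8.50),
]


def max_sustainable_rps(results, p95_target_ms=100, max_error_pct=0.1):
    """Return the highest tested RPS meeting both limits, or None if none does."""
    passing = [
        rps for rps, p95_ms, error_pct in results
        if p95_ms < p95_target_ms and error_pct <= max_error_pct
    ]
    return max(passing, default=None)
```

With the table's figures and these limits, `max_sustainable_rps(RESULTS)` returns 100, which matches the per-instance recommendation.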

## Deployment Commands

```bash
# Generate deployment artifacts
/ml:deploy-prepare 0042

# Create API
/ml:create-api --increment 0042 --framework fastapi

# Setup monitoring
/ml:setup-monitoring 0042

# Create A/B test
/ml:create-ab-test --control v2 --treatment v3 --split 0.1

# Load test
/ml:load-test 0042 --rps 100 --duration 60s

# Deploy to production
/ml:deploy 0042 --environment production
```

## Deployment Increment

The skill creates a deployment increment:

```
.specweave/increments/0043-deploy-recommendation-model/
├── spec.md (deployment requirements)
├── plan.md (deployment strategy)
├── tasks.md
│   ├── [ ] Containerize model
│   ├── [ ] Create API
│   ├── [ ] Setup monitoring
│   ├── [ ] Configure A/B test
│   ├── [ ] Load test
│   ├── [ ] Deploy to staging
│   ├── [ ] Validate staging
│   └── [ ] Deploy to production
├── api/ (FastAPI app)
├── monitoring/ (Grafana dashboards)
├── ab-test/ (A/B testing logic)
└── load-tests/ (Performance benchmarks)
```

## Best Practices

1. **Always load test** before production
2. **Start with 1-5% traffic** in an A/B test
3. **Monitor model drift** in production
4. **Version everything** (model, data, code)
5. **Document the rollback plan** before deploying
6. **Set up alerts** for anomalies
7. **Roll out gradually** (canary deployment)
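
Practices 2 and 7 can be combined into a simple rollout rule: raise the share of traffic sent to the new model in steps, and drop it to zero if the error rate exceeds a limit. The sketch below is one way to express that. The step schedule, the 1% error limit, and the function name are assumptions for illustration, not values prescribed by this skill.

```python
def next_traffic_share(current_share, error_rate, max_error_rate=0.01,
                       steps=(0.01, 0.05, 0.25, 0.5, 1.0)):
    """Return the next treatment traffic share, or 0.0 to signal a rollback."""
    if error_rate > max_error_rate:
        return 0.0  # unhealthy: send all traffic back to the previous model
    for step in steps:
        if step > current_share:
            return step  # healthy: advance to the next larger step
    return 1.0  # already at full rollout
```

Calling this once per evaluation window, after the metrics for that window are in, gives the gradual rollout a clear stop condition.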

## Integration with SpecWeave

```bash
# After training model (increment 0042)
/specweave:inc "0043-deploy-recommendation-model"

# Generates deployment increment with all artifacts
/specweave:do

# Deploy to production when ready
/ml:deploy 0043 --environment production
```

Model deployment is not the end; it is the beginning of the MLOps lifecycle.