Initial commit
commands/specweave-ml-deploy.md (new file, 116 lines)
@@ -0,0 +1,116 @@
---
name: specweave-ml:ml-deploy
description: Generate deployment artifacts (API, Docker, monitoring)
---

# Deploy ML Model

You are preparing an ML model for production deployment. Generate all necessary deployment artifacts following MLOps best practices.

## Your Task

1. **Generate API**: FastAPI endpoint for model serving
2. **Containerize**: Dockerfile for model deployment
3. **Set Up Monitoring**: Prometheus/Grafana configuration
4. **Create A/B Test**: Traffic-splitting infrastructure
5. **Document Deployment**: Deployment runbook

## Deployment Steps

### Step 1: Generate FastAPI App

```python
from specweave import create_model_api

api = create_model_api(
    model_path="models/model.pkl",
    framework="fastapi"
)
```

Creates: `api/main.py`, `api/models.py`, `api/predict.py`
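
For reference, a hand-written equivalent of the generated endpoint might look roughly like the sketch below. This is an illustrative assumption, not the exact output of `create_model_api`; the request schema and model-loading details will differ per project.

```python
# Hypothetical sketch of a minimal api/main.py (actual generated code may differ)
import pickle
from typing import List

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Model API")

# Load the serialized model once at startup
with open("models/model.pkl", "rb") as f:
    model = pickle.load(f)

class PredictRequest(BaseModel):
    features: List[float]  # illustrative schema; real feature shape depends on the model

@app.post("/predict")
def predict(request: PredictRequest) -> dict:
    # scikit-learn style models expect a 2D array: one row per sample
    prediction = model.predict([request.features])
    return {"prediction": prediction.tolist()}
```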

### Step 2: Create Dockerfile

```python
from specweave import containerize_model

dockerfile = containerize_model(
    model_path="models/model.pkl",
    python_version="3.10"
)
```

Creates: `Dockerfile`, `requirements.txt`

### Step 3: Set Up Monitoring

```python
from specweave import setup_monitoring

monitoring = setup_monitoring(
    model_name="recommendation-model",
    metrics=["latency", "throughput", "error_rate", "drift"]
)
```

Creates: `monitoring/prometheus.yaml`, `monitoring/grafana-dashboard.json`
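
The latency, throughput, and error-rate metrics can also be exposed directly from the serving process. A minimal sketch using `prometheus_client` (illustrative; the generated `monitoring/prometheus.yaml` controls how and where these get scraped):

```python
# Hypothetical sketch: exposing prediction metrics for Prometheus to scrape
import time

from prometheus_client import Counter, Histogram, start_http_server

PREDICTIONS = Counter("predictions_total", "Total prediction requests")
ERRORS = Counter("prediction_errors_total", "Failed prediction requests")
LATENCY = Histogram("prediction_latency_seconds", "Prediction latency in seconds")

def instrumented_predict(model, features):
    start = time.perf_counter()
    PREDICTIONS.inc()
    try:
        return model.predict([features])
    except Exception:
        ERRORS.inc()
        raise
    finally:
        LATENCY.observe(time.perf_counter() - start)

# Expose /metrics on port 9090 for the Prometheus scraper
start_http_server(9090)
```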

### Step 4: A/B Testing Infrastructure

```python
from specweave import create_ab_test

ab_test = create_ab_test(
    control_model="model-v2.pkl",
    treatment_model="model-v3.pkl",
    traffic_split=0.1
)
```

Creates: `ab-test/router.py`, `ab-test/metrics.py`
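
A sketch of the traffic-splitting logic such a router could implement (hypothetical; hashing a stable user ID keeps each user pinned to one variant across requests):

```python
# Hypothetical traffic-splitting router: sends ~10% of users to the treatment model
import hashlib

def assign_variant(user_id: str, traffic_split: float = 0.1) -> str:
    """Deterministically assign a user to 'control' or 'treatment'."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # map hash to [0, 1]
    return "treatment" if bucket < traffic_split else "control"

def route_prediction(user_id: str, features, control_model, treatment_model):
    variant = assign_variant(user_id)
    model = treatment_model if variant == "treatment" else control_model
    return {"variant": variant, "prediction": model.predict([features]).tolist()}
```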

### Step 5: Load Testing

```python
from specweave import load_test_model

load_test_results = load_test_model(
    api_url="http://localhost:8000/predict",
    target_rps=100,
    duration=60
)
```

Creates: `load-tests/results.md`
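
The same check can be approximated with a short script when `load_test_model` is unavailable; a rough sketch (assumes the `requests` package, a running API, and ignores error handling):

```python
# Hypothetical load-test sketch: fire requests concurrently and report latency percentiles
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

import requests

API_URL = "http://localhost:8000/predict"
PAYLOAD = {"features": [0.1, 0.2, 0.3]}  # illustrative payload

def one_request(_) -> float:
    start = time.perf_counter()
    requests.post(API_URL, json=PAYLOAD, timeout=5)
    return time.perf_counter() - start

with ThreadPoolExecutor(max_workers=20) as pool:
    latencies = sorted(pool.map(one_request, range(1000)))

p99 = latencies[int(len(latencies) * 0.99)]
print(f"mean={statistics.mean(latencies):.3f}s  p99={p99:.3f}s")
```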

### Step 6: Deployment Runbook

Create `DEPLOYMENT.md`:

```markdown
# Deployment Runbook

## Pre-Deployment Checklist
- [ ] Model versioned
- [ ] API tested locally
- [ ] Load testing passed
- [ ] Monitoring configured
- [ ] Rollback plan documented

## Deployment Steps
1. Build Docker image
2. Push to registry
3. Deploy to staging
4. Validate staging
5. Deploy to production (1% traffic)
6. Monitor for 24 hours
7. Ramp to 100% if stable

## Rollback Procedure
[Steps to roll back to the previous version]

## Monitoring
[Grafana dashboard URL]
[Key metrics to watch]
```

## Output

Report:
- All deployment artifacts generated
- Load test results (can it handle target RPS?)
- Deployment recommendation (ready/not ready)
- Next steps for deployment

commands/specweave-ml-evaluate.md (new file, 87 lines)
@@ -0,0 +1,87 @@
---
name: specweave-ml:ml-evaluate
description: Evaluate ML model with comprehensive metrics
---

# Evaluate ML Model

You are evaluating an ML model in a SpecWeave increment. Generate a comprehensive evaluation report following ML best practices.

## Your Task

1. **Load Model**: Load the model from the specified increment
2. **Run Evaluation**: Execute comprehensive evaluation with appropriate metrics
3. **Generate Report**: Create evaluation report in increment folder

## Evaluation Steps

### Step 1: Identify Model Type

- Classification: accuracy, precision, recall, F1, ROC AUC, confusion matrix
- Regression: RMSE, MAE, MAPE, R², residual analysis
- Ranking: precision@K, recall@K, NDCG@K, MAP
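
For a classification model, these metrics map directly onto scikit-learn; a minimal sketch (assumes a fitted `model` plus the `X_test`/`y_test` loaded in Step 2, with binary labels for the ROC AUC line):

```python
# Illustrative sketch: core classification metrics with scikit-learn
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]  # positive-class probability (binary case)

print("accuracy :", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred))
print("recall   :", recall_score(y_test, y_pred))
print("f1       :", f1_score(y_test, y_pred))
print("roc auc  :", roc_auc_score(y_test, y_prob))
print(confusion_matrix(y_test, y_pred))
```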

### Step 2: Load Test Data

```python
# Load test set from increment
X_test = load_test_data(increment_path)
y_test = load_test_labels(increment_path)
```

### Step 3: Compute Metrics

```python
from specweave import ModelEvaluator

evaluator = ModelEvaluator(model, X_test, y_test)
metrics = evaluator.compute_all_metrics()
```

### Step 4: Generate Visualizations

- Confusion matrix (classification)
- ROC curves (classification)
- Residual plots (regression)
- Calibration curves (classification)

### Step 5: Statistical Validation

- Cross-validation results
- Confidence intervals
- Comparison to baseline
- Statistical significance tests
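
A minimal sketch of these checks using k-fold cross-validation plus a bootstrap confidence interval (illustrative; assumes `model`, `X`, `y`, `X_test`, `y_test`, and a `baseline_score` to compare against):

```python
# Illustrative sketch: cross-validation plus a bootstrap 95% CI for the headline metric
import numpy as np
from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_score

# 5-fold cross-validation of the candidate model
cv_scores = cross_val_score(model, X, y, cv=5, scoring="f1")
print(f"CV f1: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")

# Bootstrap the test-set metric to get a confidence interval
rng = np.random.default_rng(42)
y_true = np.asarray(y_test)
y_pred = np.asarray(model.predict(X_test))
boot_scores = []
for _ in range(1000):
    idx = rng.integers(0, len(y_true), len(y_true))
    boot_scores.append(f1_score(y_true[idx], y_pred[idx]))
low, high = np.percentile(boot_scores, [2.5, 97.5])
print(f"95% CI: [{low:.3f}, {high:.3f}]  baseline: {baseline_score:.3f}")
```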

### Step 6: Generate Report

Create `evaluation-report.md` in increment folder:

```markdown
# Model Evaluation Report

## Model: [Model Name]
- Version: [Version]
- Increment: [Increment ID]
- Date: [Evaluation Date]

## Overall Performance
[Metrics table]

## Visualizations
[Embedded plots]

## Cross-Validation
[CV results]

## Comparison to Baseline
[Baseline comparison]

## Statistical Tests
[Significance tests]

## Recommendations
[Deploy/improve/investigate]
```

## Output

After evaluation, report:
- Overall performance summary
- Key metrics
- Whether model meets success criteria (from spec.md)
- Recommendation (deploy/improve/investigate)

commands/specweave-ml-explain.md (new file, 83 lines)
@@ -0,0 +1,83 @@
---
name: specweave-ml:ml-explain
description: Generate model explainability reports (SHAP, LIME, feature importance)
---

# Explain ML Model

You are generating explainability artifacts for an ML model in a SpecWeave increment. Make the black box transparent.

## Your Task

1. **Load Model**: Load model from increment
2. **Generate Global Explanations**: Feature importance, partial dependence
3. **Generate Local Explanations**: SHAP/LIME for sample predictions
4. **Create Report**: Comprehensive explainability documentation

## Explainability Steps

### Step 1: Feature Importance

```python
from specweave import ModelExplainer

explainer = ModelExplainer(model, X_train)
importance = explainer.feature_importance()
```

Create: `feature-importance.png`

### Step 2: SHAP Summary

```python
shap_values = explainer.shap_summary()
```

Create: `shap-summary.png` (beeswarm plot)
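
If the `ModelExplainer` wrapper is not available, the same beeswarm summary can be produced with the `shap` library directly; a rough sketch (plot APIs vary slightly between shap versions):

```python
# Illustrative sketch: SHAP beeswarm summary via the shap library (assumes model, X_train, X_test)
import matplotlib.pyplot as plt
import shap

explainer = shap.Explainer(model, X_train)  # picks a suitable explainer for the model type
shap_values = explainer(X_test)

shap.plots.beeswarm(shap_values, show=False)
plt.tight_layout()
plt.savefig("shap-summary.png", dpi=150)
```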

### Step 3: Partial Dependence Plots

```python
for feature in top_features:
    pdp = explainer.partial_dependence(feature)
```

Create: `pdp-plots/` directory

### Step 4: Local Explanations

```python
# Explain sample predictions
samples = [high_confidence, low_confidence, edge_case]
for sample in samples:
    explanation = explainer.explain_prediction(sample)
```

Create: `local-explanations/` directory

### Step 5: Generate Report

Create `explainability-report.md`:

```markdown
# Model Explainability Report

## Global Feature Importance
[Top 10 features with importance scores]

## SHAP Analysis
[Summary plot and interpretation]

## Partial Dependence
[How each feature affects predictions]

## Example Explanations
[3-5 example predictions with full explanations]

## Recommendations
[Model improvements based on feature analysis]
```

## Output

Report:
- Top 10 most important features
- Any surprising feature importance (might indicate data leakage)
- Model behavior insights
- Recommendations for improvement

commands/specweave-ml-pipeline.md (new file, 297 lines)
@@ -0,0 +1,297 @@
---
name: specweave-ml:ml-pipeline
description: Design and implement a complete ML pipeline with multi-agent MLOps orchestration
---

# Machine Learning Pipeline - Multi-Agent MLOps Orchestration

Design and implement a complete ML pipeline for: $ARGUMENTS

## Thinking

This workflow orchestrates multiple specialized agents to build a production-ready ML pipeline following modern MLOps best practices. The approach emphasizes:

- **Phase-based coordination**: Each phase builds upon previous outputs, with clear handoffs between agents
- **Modern tooling integration**: MLflow/W&B for experiments, Feast/Tecton for features, KServe/Seldon for serving
- **Production-first mindset**: Every component designed for scale, monitoring, and reliability
- **Reproducibility**: Version control for data, models, and infrastructure
- **Continuous improvement**: Automated retraining, A/B testing, and drift detection

The multi-agent approach ensures each aspect is handled by domain experts:
- Data engineers handle ingestion and quality
- Data scientists design features and experiments
- ML engineers implement training pipelines
- MLOps engineers handle production deployment
- Observability engineers ensure monitoring

## Phase 1: Data & Requirements Analysis

<Task>
subagent_type: data-engineer
prompt: |
Analyze and design data pipeline for ML system with requirements: $ARGUMENTS

Deliverables:
1. Data source audit and ingestion strategy:
   - Source systems and connection patterns
   - Schema validation using Pydantic/Great Expectations
   - Data versioning with DVC or lakeFS
   - Incremental loading and CDC strategies

2. Data quality framework:
   - Profiling and statistics generation
   - Anomaly detection rules
   - Data lineage tracking
   - Quality gates and SLAs

3. Storage architecture:
   - Raw/processed/feature layers
   - Partitioning strategy
   - Retention policies
   - Cost optimization

Provide implementation code for critical components and integration patterns.
</Task>
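
As a concrete reference for the schema-validation deliverable, a minimal sketch of a Pydantic-based quality gate (illustrative only; the field names and rules here are assumptions the data-engineer agent would replace with the real source schema):

```python
# Illustrative sketch: row-level schema validation as a simple data quality gate
from datetime import datetime

from pydantic import BaseModel, Field, ValidationError

class Event(BaseModel):
    user_id: str = Field(min_length=1)
    amount: float = Field(ge=0)  # reject negative amounts
    event_time: datetime

def validate_batch(rows: list[dict]) -> tuple[list[Event], list[str]]:
    """Split a raw batch into valid records and human-readable errors."""
    valid, errors = [], []
    for i, row in enumerate(rows):
        try:
            valid.append(Event(**row))
        except ValidationError as exc:
            errors.append(f"row {i}: {exc.errors()}")
    return valid, errors

good, bad = validate_batch([
    {"user_id": "u1", "amount": 12.5, "event_time": "2024-01-01T00:00:00"},
    {"user_id": "", "amount": -3, "event_time": "not-a-date"},
])
print(len(good), "valid,", len(bad), "rejected")
```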

<Task>
subagent_type: data-scientist
prompt: |
Design feature engineering and model requirements for: $ARGUMENTS
Using data architecture from: {phase1.data-engineer.output}

Deliverables:
1. Feature engineering pipeline:
   - Transformation specifications
   - Feature store schema (Feast/Tecton)
   - Statistical validation rules
   - Handling strategies for missing data/outliers

2. Model requirements:
   - Algorithm selection rationale
   - Performance metrics and baselines
   - Training data requirements
   - Evaluation criteria and thresholds

3. Experiment design:
   - Hypothesis and success metrics
   - A/B testing methodology
   - Sample size calculations
   - Bias detection approach

Include feature transformation code and statistical validation logic.
</Task>

## Phase 2: Model Development & Training

<Task>
subagent_type: ml-engineer
prompt: |
Implement training pipeline based on requirements: {phase1.data-scientist.output}
Using data pipeline: {phase1.data-engineer.output}

Build comprehensive training system:
1. Training pipeline implementation:
   - Modular training code with clear interfaces
   - Hyperparameter optimization (Optuna/Ray Tune)
   - Distributed training support (Horovod/PyTorch DDP)
   - Cross-validation and ensemble strategies

2. Experiment tracking setup:
   - MLflow/Weights & Biases integration
   - Metric logging and visualization
   - Artifact management (models, plots, data samples)
   - Experiment comparison and analysis tools

3. Model registry integration:
   - Version control and tagging strategy
   - Model metadata and lineage
   - Promotion workflows (dev -> staging -> prod)
   - Rollback procedures

Provide complete training code with configuration management.
</Task>
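
For orientation, a compact sketch of how the hyperparameter search and experiment tracking pieces typically fit together with Optuna and MLflow (illustrative; the toy dataset and random forest stand in for the real training loop):

```python
# Illustrative sketch: Optuna search with each trial logged to MLflow
import mlflow
import optuna
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
mlflow.set_experiment("rf-tuning")

def objective(trial: optuna.Trial) -> float:
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 50, 300),
        "max_depth": trial.suggest_int("max_depth", 2, 12),
    }
    score = cross_val_score(RandomForestClassifier(**params), X, y, cv=3).mean()
    with mlflow.start_run():
        mlflow.log_params(params)
        mlflow.log_metric("cv_accuracy", score)
    return score

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print("best:", study.best_params, study.best_value)
```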

<Task>
subagent_type: python-pro
prompt: |
Optimize and productionize ML code from: {phase2.ml-engineer.output}

Focus areas:
1. Code quality and structure:
   - Refactor for production standards
   - Add comprehensive error handling
   - Implement proper logging with structured formats
   - Create reusable components and utilities

2. Performance optimization:
   - Profile and optimize bottlenecks
   - Implement caching strategies
   - Optimize data loading and preprocessing
   - Memory management for large-scale training

3. Testing framework:
   - Unit tests for data transformations
   - Integration tests for pipeline components
   - Model quality tests (invariance, directional)
   - Performance regression tests

Deliver production-ready, maintainable code with full test coverage.
</Task>

## Phase 3: Production Deployment & Serving

<Task>
subagent_type: mlops-engineer
prompt: |
Design production deployment for models from: {phase2.ml-engineer.output}
With optimized code from: {phase2.python-pro.output}

Implementation requirements:
1. Model serving infrastructure:
   - REST/gRPC APIs with FastAPI/TorchServe
   - Batch prediction pipelines (Airflow/Kubeflow)
   - Stream processing (Kafka/Kinesis integration)
   - Model serving platforms (KServe/Seldon Core)

2. Deployment strategies:
   - Blue-green deployments for zero downtime
   - Canary releases with traffic splitting
   - Shadow deployments for validation
   - A/B testing infrastructure

3. CI/CD pipeline:
   - GitHub Actions/GitLab CI workflows
   - Automated testing gates
   - Model validation before deployment
   - ArgoCD for GitOps deployment

4. Infrastructure as Code:
   - Terraform modules for cloud resources
   - Helm charts for Kubernetes deployments
   - Docker multi-stage builds for optimization
   - Secret management with Vault/Secrets Manager

Provide complete deployment configuration and automation scripts.
</Task>

<Task>
subagent_type: kubernetes-architect
prompt: |
Design Kubernetes infrastructure for ML workloads from: {phase3.mlops-engineer.output}

Kubernetes-specific requirements:
1. Workload orchestration:
   - Training job scheduling with Kubeflow
   - GPU resource allocation and sharing
   - Spot/preemptible instance integration
   - Priority classes and resource quotas

2. Serving infrastructure:
   - HPA/VPA for autoscaling
   - KEDA for event-driven scaling
   - Istio service mesh for traffic management
   - Model caching and warm-up strategies

3. Storage and data access:
   - PVC strategies for training data
   - Model artifact storage with CSI drivers
   - Distributed storage for feature stores
   - Cache layers for inference optimization

Provide Kubernetes manifests and Helm charts for the entire ML platform.
</Task>

## Phase 4: Monitoring & Continuous Improvement

<Task>
subagent_type: observability-engineer
prompt: |
Implement comprehensive monitoring for ML system deployed in: {phase3.mlops-engineer.output}
Using Kubernetes infrastructure: {phase3.kubernetes-architect.output}

Monitoring framework:
1. Model performance monitoring:
   - Prediction accuracy tracking
   - Latency and throughput metrics
   - Feature importance shifts
   - Business KPI correlation

2. Data and model drift detection:
   - Statistical drift detection (KS test, PSI)
   - Concept drift monitoring
   - Feature distribution tracking
   - Automated drift alerts and reports

3. System observability:
   - Prometheus metrics for all components
   - Grafana dashboards for visualization
   - Distributed tracing with Jaeger/Zipkin
   - Log aggregation with ELK/Loki

4. Alerting and automation:
   - PagerDuty/Opsgenie integration
   - Automated retraining triggers
   - Performance degradation workflows
   - Incident response runbooks

5. Cost tracking:
   - Resource utilization metrics
   - Cost allocation by model/experiment
   - Optimization recommendations
   - Budget alerts and controls

Deliver monitoring configuration, dashboards, and alert rules.
</Task>
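
As a reference for the drift-detection deliverable, a small sketch of the population stability index (PSI) named above (illustrative; production systems typically compute this per feature on a schedule and alert when PSI exceeds roughly 0.2):

```python
# Illustrative sketch: population stability index (PSI) between training and live distributions
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """PSI = sum((p_actual - p_expected) * ln(p_actual / p_expected)) over quantile bins."""
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # catch values outside the training range
    expected_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # clip to avoid log(0) in empty bins
    expected_pct = np.clip(expected_pct, 1e-6, None)
    actual_pct = np.clip(actual_pct, 1e-6, None)
    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))

rng = np.random.default_rng(0)
train_feature = rng.normal(0, 1, 10_000)
live_feature = rng.normal(0.3, 1.1, 10_000)  # simulated shift
print(f"PSI = {psi(train_feature, live_feature):.3f}")
```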

## Configuration Options

- **experiment_tracking**: mlflow | wandb | neptune | clearml
- **feature_store**: feast | tecton | databricks | custom
- **serving_platform**: kserve | seldon | torchserve | triton
- **orchestration**: kubeflow | airflow | prefect | dagster
- **cloud_provider**: aws | azure | gcp | multi-cloud
- **deployment_mode**: realtime | batch | streaming | hybrid
- **monitoring_stack**: prometheus | datadog | newrelic | custom

## Success Criteria

1. **Data Pipeline Success**:
   - < 0.1% data quality issues in production
   - Automated data validation passing 99.9% of the time
   - Complete data lineage tracking
   - Sub-second feature serving latency

2. **Model Performance**:
   - Meeting or exceeding baseline metrics
   - < 5% performance degradation before retraining
   - Successful A/B tests with statistical significance
   - No undetected model drift > 24 hours

3. **Operational Excellence**:
   - 99.9% uptime for model serving
   - < 200ms p99 inference latency
   - Automated rollback within 5 minutes
   - Complete observability with < 1 minute alert time

4. **Development Velocity**:
   - < 1 hour from commit to production
   - Parallel experiment execution
   - Reproducible training runs
   - Self-service model deployment

5. **Cost Efficiency**:
   - < 20% infrastructure waste
   - Optimized resource allocation
   - Automatic scaling based on load
   - Spot instance utilization > 60%

## Final Deliverables

Upon completion, the orchestrated pipeline will provide:
- End-to-end ML pipeline with full automation
- Comprehensive documentation and runbooks
- Production-ready infrastructure as code
- Complete monitoring and alerting system
- CI/CD pipelines for continuous improvement
- Cost optimization and scaling strategies
- Disaster recovery and rollback procedures