Initial commit

This commit is contained in:
Zhongwei Li
2025-11-29 18:51:35 +08:00
commit 0543c11758
11 changed files with 361 additions and 0 deletions

@@ -0,0 +1,15 @@
{
"name": "ml-model-trainer",
"description": "Train and optimize machine learning models with automated workflows",
"version": "1.0.0",
"author": {
"name": "Claude Code Plugins",
"email": "[email protected]"
},
"skills": [
"./skills"
],
"commands": [
"./commands"
]
}

3
README.md Normal file
@@ -0,0 +1,3 @@
# ml-model-trainer
Train and optimize machine learning models with automated workflows

23
commands/train.md Normal file
@@ -0,0 +1,23 @@
---
description: Train a machine learning model with specified parameters
---
# Train ML Model
You are an ML training specialist. When this command is invoked:
1. Analyze the dataset and target variable
2. Select appropriate model type (classification, regression, etc.)
3. Configure training parameters
4. Train the model with cross-validation
5. Generate performance metrics
6. Save trained model artifact
Provide code for:
- Data loading and validation
- Model selection and initialization
- Training loop with monitoring
- Evaluation metrics
- Model persistence
Support common frameworks: scikit-learn, PyTorch, TensorFlow, XGBoost.
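
The six steps above can be sketched with scikit-learn as follows. This is a minimal illustration rather than the plugin's own implementation: it assumes `scikit-learn` and `joblib` are installed, uses synthetic data from `make_classification` in place of a real dataset, and the choice of `RandomForestClassifier`, its parameters, and the `model.joblib` file name are all hypothetical.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Steps 1-2: synthetic binary-classification data stands in for a real dataset.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Step 3: illustrative parameters, not tuned values.
model = RandomForestClassifier(n_estimators=50, random_state=0)

# Steps 4-5: 5-fold cross-validation, reporting accuracy per fold.
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

# Step 6: fit on all data and persist the artifact (hypothetical file name).
model.fit(X, y)
artifact = Path(tempfile.mkdtemp()) / "model.joblib"
joblib.dump(model, artifact)

# Reload to confirm the saved artifact is usable.
restored = joblib.load(artifact)
print(restored.predict(X[:3]))
```

Cross-validation scores estimate generalization; the final `fit` on the full dataset is what gets saved.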

73
plugin.lock.json Normal file
@@ -0,0 +1,73 @@
{
"$schema": "internal://schemas/plugin.lock.v1.json",
"pluginId": "gh:jeremylongshore/claude-code-plugins-plus:plugins/ai-ml/ml-model-trainer",
"normalized": {
"repo": null,
"ref": "refs/tags/v20251128.0",
"commit": "0b84bdce28de29d7113883fa8b0a3728da15a7ee",
"treeHash": "16e8aa90f55e8fa10c71ec8283f26e991f996f3269584d65734ebe81e613cd95",
"generatedAt": "2025-11-28T10:18:34.514446Z",
"toolVersion": "publish_plugins.py@0.2.0"
},
"origin": {
"remote": "git@github.com:zhongweili/42plugin-data.git",
"branch": "master",
"commit": "aa1497ed0949fd50e99e70d6324a29c5b34f9390",
"repoRoot": "/Users/zhongweili/projects/openmind/42plugin-data"
},
"manifest": {
"name": "ml-model-trainer",
"description": "Train and optimize machine learning models with automated workflows",
"version": "1.0.0"
},
"content": {
"files": [
{
"path": "README.md",
"sha256": "aad4ee7396487558b3851f7196058c611462b5f5482852830bf2b2e757920d3f"
},
{
"path": ".claude-plugin/plugin.json",
"sha256": "32b4b915e6cae52ee697f2199cb94ba246a45c0b81db711c8a008abc7a18aef8"
},
{
"path": "commands/train.md",
"sha256": "ccc60c855122b9de89113c16660bb6052b7a3ffd173c03c467c67cf589db18dd"
},
{
"path": "skills/ml-model-trainer/SKILL.md",
"sha256": "0b1c79a892063a3ca12665dd75eebf77510ca90ea044d200b19cb7fcb59b247a"
},
{
"path": "skills/ml-model-trainer/references/README.md",
"sha256": "75d63fb5df7075b91f22e57f9e77304cf62318e8da963205bffe82c785453f62"
},
{
"path": "skills/ml-model-trainer/scripts/README.md",
"sha256": "6fb5d0cbb988e6858430caa759ff1b877807d693afdc2bce10e1b29a12c3976d"
},
{
"path": "skills/ml-model-trainer/assets/example_dataset.csv",
"sha256": "56a1da02a5290d505836bf2b5ef94af63a3c77edea644ebb915a76a701e0fe22"
},
{
"path": "skills/ml-model-trainer/assets/requirements.txt",
"sha256": "1950a9f5df1a020130e0420c6457d47e37b3e47f02bad2876712c56f5801d622"
},
{
"path": "skills/ml-model-trainer/assets/evaluation_report_template.md",
"sha256": "8413f6bfc8b75db93df796501bd13d49e7f0e1755a1e0e6ba427bb0e15c260ba"
},
{
"path": "skills/ml-model-trainer/assets/README.md",
"sha256": "faa2c3c53a65e5d652e774223f7705a83e2f42334edf70bbc0c1d97be253cd32"
}
],
"dirSha256": "16e8aa90f55e8fa10c71ec8283f26e991f996f3269584d65734ebe81e613cd95"
},
"security": {
"scannedAt": null,
"scannerVersion": null,
"flags": []
}
}

@@ -0,0 +1,52 @@
---
name: training-machine-learning-models
description: |
This skill trains machine learning models using automated workflows. It analyzes datasets, selects appropriate model types (classification, regression, etc.), configures training parameters, trains the model with cross-validation, generates performance metrics, and saves the trained model artifact. Use this skill when the user requests to "train" a model, needs to evaluate a dataset for machine learning purposes, or wants to optimize model performance. The skill supports common frameworks like scikit-learn.
allowed-tools: Read, Write, Edit, Grep, Glob, Bash
version: 1.0.0
---
## Overview
This skill lets Claude train and evaluate machine learning models automatically. It handles data analysis, model selection, training, and evaluation, and saves the trained model as a persisted artifact.
## How It Works
1. **Data Analysis and Preparation**: The skill analyzes the provided dataset and identifies the target variable, determining the appropriate model type (classification, regression, etc.).
2. **Model Selection and Training**: Based on the data analysis, the skill selects a suitable machine learning model and configures the training parameters. It then trains the model using cross-validation techniques.
3. **Performance Evaluation and Persistence**: After training, the skill generates performance metrics to evaluate the model's effectiveness. Finally, it saves the trained model artifact for future use.
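
Step 1 depends on deciding whether the target calls for classification or regression. One way this decision might be made is sketched below; `infer_task_type` and its `max_classes` threshold are hypothetical and not part of the skill. The heuristic will misjudge small regression datasets whose targets happen to be a handful of whole numbers, so treat its answer as a default to confirm with the user.

```python
def infer_task_type(target_values, max_classes=20):
    """Guess 'classification' or 'regression' from raw target values.

    Hypothetical heuristic: non-numeric targets, or numeric targets with few
    distinct whole-number values, are treated as class labels.
    """
    values = [v for v in target_values if v is not None]
    if not values:
        raise ValueError("target column is empty")
    try:
        numbers = [float(v) for v in values]
    except (TypeError, ValueError):
        return "classification"  # strings such as "Yes"/"No"
    distinct = set(numbers)
    all_whole = all(n.is_integer() for n in distinct)
    if all_whole and len(distinct) <= max_classes:
        return "classification"
    return "regression"


print(infer_task_type([0, 1, 0, 0, 1]))                # classification
print(infer_task_type(["Yes", "No", "Yes"]))           # classification
print(infer_task_type([250000.0, 312500.5, 189900]))   # regression
```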
## When to Use This Skill
This skill activates when you need to:
- Train a machine learning model on a given dataset.
- Evaluate the performance of a machine learning model.
- Automate the machine learning model training process.
## Examples
### Example 1: Training a Classification Model
User request: "Train a classification model on this dataset of customer churn data."
The skill will:
1. Analyze the customer churn data, identify the churn status as the target variable, and determine that a classification model is appropriate.
2. Select a suitable classification algorithm (e.g., Logistic Regression, Random Forest), train the model using cross-validation, and generate performance metrics such as accuracy, precision, and recall.
### Example 2: Training a Regression Model
User request: "Train a regression model to predict house prices based on features like size, location, and number of bedrooms."
The skill will:
1. Analyze the house price data, identify the price as the target variable, and determine that a regression model is appropriate.
2. Select a suitable regression algorithm (e.g., Linear Regression, Support Vector Regression), train the model using cross-validation, and generate performance metrics such as Mean Squared Error (MSE) and R-squared.
## Best Practices
- **Data Quality**: Ensure the dataset is clean and properly formatted before training the model.
- **Feature Engineering**: Consider feature engineering techniques to improve model performance.
- **Hyperparameter Tuning**: Experiment with different hyperparameter settings to optimize model performance.
## Integration
This skill can be used in conjunction with other data analysis and manipulation tools to prepare data for training. It can also integrate with model deployment tools to deploy the trained model to production.

@@ -0,0 +1,8 @@
# Assets
Bundled resources for the ml-model-trainer skill:
- [ ] example_dataset.csv: A sample dataset that can be used to train the model.
- [ ] model_config.json: A template for configuring the model training parameters.
- [ ] evaluation_report_template.md: A template for generating the evaluation report.
- [ ] requirements.txt: Lists the Python dependencies for the scripts.

@@ -0,0 +1,75 @@
# Model Evaluation Report
This report summarizes the evaluation of a machine learning model trained using the ML Model Trainer Plugin. It provides key metrics and insights into the model's performance.
## 1. Model Information
* **Model Name:** [Insert Model Name Here, e.g., "Customer Churn Prediction v1"]
* **Model Type:** [Insert Model Type Here, e.g., "Logistic Regression", "Random Forest"]
* **Training Date:** [Insert Date of Training Here, e.g., "2023-10-27"]
* **Plugin Version:** [Insert Plugin Version Here, find in plugin details]
* **Dataset Used for Training:** [Insert Dataset Name/Description Here, e.g., "Customer Transaction Data"]
## 2. Dataset Details
* **Training Set Size:** [Insert Number of Training Samples Here, e.g., "10,000"]
* **Validation Set Size:** [Insert Number of Validation Samples Here, e.g., "2,000"]
* **Testing Set Size:** [Insert Number of Testing Samples Here, e.g., "3,000"]
* **Features Used:** [List the features used for training. E.g., Age, Income, Location, etc.]
* **Target Variable:** [Specify the target variable. E.g., Customer Churn (Yes/No)]
## 3. Training Parameters
* **Parameters:** [List the hyperparameters used for the model, e.g., learning rate, number of estimators.]
* **Cross-Validation Strategy:** [Describe the cross-validation strategy used (e.g., k-fold cross-validation with k=5)]
* **Optimization Metric:** [Specify the metric used for optimization during training (e.g., Accuracy, F1-score)]
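
To make the "k-fold cross-validation with k=5" entry concrete, the sketch below shows how the sample indices might be partitioned. `k_fold_indices` is a hypothetical helper written with the standard library only; in practice a library splitter such as scikit-learn's `KFold` would normally be used.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for shuffled k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


folds = list(k_fold_indices(n_samples=20, k=5))
for fold, (train_idx, val_idx) in enumerate(folds):
    print(f"fold {fold}: train={len(train_idx)} val={len(val_idx)}")
```

Every sample appears in exactly one validation fold, so each of the five scores is computed on data the model did not see during that fold's training.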
## 4. Performance Metrics
### 4.1. Overall Performance
| Metric | Value | Description |
|-----------------|--------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Accuracy | [Insert Accuracy Here] | Percentage of correctly classified instances. *Example: 0.85 means 85% of predictions were correct.* |
| Precision | [Insert Precision Here] | Of all instances predicted as positive, what percentage were actually positive? *Example: 0.78 means 78% of instances predicted as positive were actually positive.* |
| Recall | [Insert Recall Here] | Of all actual positive instances, what percentage were correctly predicted? *Example: 0.92 means 92% of all actual positive instances were correctly predicted.* |
| F1-Score | [Insert F1-Score Here] | Harmonic mean of precision and recall, balancing the two in a single score. *Example: with precision 0.78 and recall 0.92, F1 is about 0.84.* |
| AUC | [Insert AUC Here] | Area Under the Receiver Operating Characteristic (ROC) curve. Measures the model's ability to distinguish between positive and negative classes. *Example: 0.95 indicates excellent discrimination between classes.* |
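
The first four rows of the table follow directly from the confusion-matrix counts. The sketch below shows the arithmetic; `classification_metrics` and the example counts are hypothetical, chosen so the results land near the example values in the table. AUC is omitted because it requires predicted scores, not just counts.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Illustrative counts only, not results from a real model.
metrics = classification_metrics(tp=46, fp=13, fn=4, tn=37)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
# accuracy: 0.83
# precision: 0.78
# recall: 0.92
# f1: 0.84
```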
### 4.2. Detailed Performance (Per Class)
[If applicable, include a table showing performance metrics for each class. For example, in a binary classification problem (Churn/No Churn), show precision, recall, and F1-score for each class.]
| Class | Precision | Recall | F1-Score |
|-------------|-----------|--------|----------|
| [Class 1 Name] | [Value] | [Value] | [Value] |
| [Class 2 Name] | [Value] | [Value] | [Value] |
| ... | ... | ... | ... |
### 4.3. Confusion Matrix
[Include a confusion matrix showing the counts of true positives, true negatives, false positives, and false negatives. This can be represented as a table or an image.]
| | Predicted Positive | Predicted Negative |
|-------------------|--------------------|--------------------|
| Actual Positive | [True Positives] | [False Negatives] |
| Actual Negative | [False Positives] | [True Negatives] |
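
A minimal sketch of how the four cells might be tallied from true and predicted labels is shown below, using the same layout as the table above (actual positive in the first row). `confusion_counts` is a hypothetical helper and the labels are made up for illustration. Note that scikit-learn's `confusion_matrix` sorts labels, so for 0/1 labels it places true negatives in the top-left cell instead.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return [[TP, FN], [FP, TN]], matching the table layout above."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fn = fp = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive:
            if predicted == positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted == positive:
                fp += 1
            else:
                tn += 1
    return [[tp, fn], [fp, tn]]


# Made-up labels for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
matrix = confusion_counts(y_true, y_pred)
print(matrix)  # [[3, 1], [1, 3]]
```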
## 5. Model Interpretation
* **Feature Importance:** [Discuss the most important features influencing the model's predictions. You can provide a ranked list of features and their importance scores.]
* **Insights:** [Describe any interesting insights gained from the model. For example, "Customers with high income and low usage are more likely to churn."]
## 6. Recommendations
* **Model Improvements:** [Suggest potential improvements to the model. For example, "Try using a different algorithm", "Add more features", "Tune hyperparameters."]
* **Further Analysis:** [Suggest further analysis that could be performed. For example, "Investigate the reasons for high false positive rates."]
* **Deployment Considerations:** [Discuss any considerations for deploying the model to production. For example, "Monitor the model's performance over time", "Retrain the model periodically with new data."]
## 7. Conclusion
[Summarize the overall performance of the model and its suitability for the intended purpose. State whether the model is ready for deployment or if further improvements are needed.]
## 8. Appendix (Optional)
* [Include any additional information, such as detailed code snippets, visualizations, or links to external resources.]

@@ -0,0 +1,39 @@
# Sample dataset for training a machine learning model.
# This dataset contains features and a target variable for demonstration purposes.
# You can replace this with your own dataset.
# Feature 1: Numerical feature representing age (e.g., of a customer).
# Feature 2: Categorical feature representing location (e.g., city).
# Feature 3: Binary feature representing whether a customer subscribed (1) or not (0).
# Target Variable: Represents the outcome we want to predict (e.g., customer churn).
age,location,subscribed,target
25,New York,1,0
30,Los Angeles,0,1
40,Chicago,1,0
22,Houston,0,0
35,New York,1,1
28,Los Angeles,0,0
45,Chicago,1,1
31,Houston,0,0
27,New York,1,0
33,Los Angeles,0,1
42,Chicago,1,1
24,Houston,0,0
37,New York,1,0
29,Los Angeles,0,0
47,Chicago,1,1
32,Houston,0,1
26,New York,1,0
34,Los Angeles,0,1
41,Chicago,1,0
23,Houston,0,0
# Add more rows to create a robust dataset for training.
# Consider increasing the number of rows and the diversity of data for better model performance.
# Remember to clean and preprocess your data before training.
# [ADD_MORE_DATA_HERE]
# Example:
# 50,San Francisco,1,1
# 60,Seattle,0,0
# 70,Boston,1,1
# 80,Miami,0,0
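
Because the file mixes `#` comment lines with CSV rows, a plain CSV parser would treat the first comment as the header. The sketch below shows one way to skip those lines using only the standard library; `load_rows` is a hypothetical helper and `SAMPLE` is a shortened copy of the file above. With pandas, passing `comment="#"` to `read_csv` should have a similar effect.

```python
import csv
import io

# Shortened copy of the sample file, including comment lines.
SAMPLE = """\
# Sample dataset for training a machine learning model.
age,location,subscribed,target
25,New York,1,0
30,Los Angeles,0,1
# [ADD_MORE_DATA_HERE]
"""


def load_rows(text_stream):
    """Parse CSV rows, ignoring blank lines and lines that start with '#'."""
    data_lines = (
        line for line in text_stream
        if line.strip() and not line.lstrip().startswith("#")
    )
    rows = []
    for row in csv.DictReader(data_lines):
        rows.append({
            "age": int(row["age"]),
            "location": row["location"],
            "subscribed": int(row["subscribed"]),
            "target": int(row["target"]),
        })
    return rows


rows = load_rows(io.StringIO(SAMPLE))
print(len(rows), rows[0])
```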

@@ -0,0 +1,55 @@
# Requirements for ml-model-trainer Plugin
This file, `requirements.txt`, specifies the Python packages required for the `ml-model-trainer` plugin to function correctly. It is used by `pip` to install these dependencies.
## Purpose
Pinning package versions in `requirements.txt` keeps model training consistent and reproducible across environments and reduces the risk of dependency conflicts.
## Contents
Below is a template for the `requirements.txt` file. **Please carefully review and modify the versions to match your specific needs and compatibility requirements.**
```
# Core Dependencies
scikit-learn==<INSERT_SCI_KIT_LEARN_VERSION_HERE>
pandas==<INSERT_PANDAS_VERSION_HERE>
numpy==<INSERT_NUMPY_VERSION_HERE>
# Optional Dependencies (Uncomment and specify versions if needed)
# matplotlib==<INSERT_MATPLOTLIB_VERSION_HERE> # For visualization
# seaborn==<INSERT_SEABORN_VERSION_HERE> # For enhanced visualization
# xgboost==<INSERT_XGBOOST_VERSION_HERE> # For XGBoost models
# lightgbm==<INSERT_LIGHTGBM_VERSION_HERE> # For LightGBM models
# tensorflow==<INSERT_TENSORFLOW_VERSION_HERE> # For TensorFlow models
# torch==<INSERT_TORCH_VERSION_HERE> # For PyTorch models
# Example:
# scikit-learn==1.2.2
# pandas==1.5.3
# numpy==1.24.3
```
## Instructions
1. **Replace Placeholders:** Carefully replace the `<INSERT_*_VERSION_HERE>` placeholders with the specific versions of each package you intend to use. It is highly recommended to specify exact versions to ensure consistent behavior across different environments.
2. **Uncomment Optional Dependencies:** If your model training process relies on optional packages like `matplotlib`, `seaborn`, `xgboost`, `lightgbm`, `tensorflow`, or `torch`, uncomment the corresponding lines and specify the desired versions.
3. **Consider Version Compatibility:** Pay close attention to the compatibility between different packages. Refer to the documentation of each package to understand potential version conflicts.
4. **Updating Dependencies:** When updating the plugin, review and update the `requirements.txt` file to reflect any changes in dependencies.
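
Since `pip` cannot resolve a line that still contains a placeholder, it may help to check the file before installing. The sketch below is one possible check; `unfilled_placeholders` is a hypothetical helper, and the pattern assumes placeholders follow the `<INSERT_*_VERSION_HERE>` form used in the template above.

```python
import re

# Matches placeholders such as <INSERT_PANDAS_VERSION_HERE>.
PLACEHOLDER = re.compile(r"<INSERT_[A-Z_]+_VERSION_HERE>")


def unfilled_placeholders(requirements_text):
    """Return (line_number, line) pairs for active lines that still hold a placeholder."""
    problems = []
    for number, line in enumerate(requirements_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and commented-out dependencies are ignored
        if PLACEHOLDER.search(stripped):
            problems.append((number, stripped))
    return problems


template = """\
# Core Dependencies
scikit-learn==<INSERT_SCI_KIT_LEARN_VERSION_HERE>
pandas==1.5.3
# xgboost==<INSERT_XGBOOST_VERSION_HERE>
"""
print(unfilled_placeholders(template))
# [(2, 'scikit-learn==<INSERT_SCI_KIT_LEARN_VERSION_HERE>')]
```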
## Example Usage
To install the dependencies listed in this file, use the following command:
```bash
pip install -r requirements.txt
```
**Important Considerations:**
* **Virtual Environments:** It is highly recommended to use virtual environments (e.g., `venv` or `conda`) to isolate the plugin's dependencies from the global Python environment. This helps prevent conflicts with other projects.
* **Package Management:** Consider using a package manager like `poetry` or `pipenv` for more advanced dependency management features.
* **Testing:** Thoroughly test the plugin after installing the dependencies to ensure that everything functions correctly.

@@ -0,0 +1,9 @@
# References
Bundled resources for the ml-model-trainer skill:
- [ ] model_selection_guide.md: Provides guidance on selecting the appropriate model type for a given dataset and task.
- [ ] training_parameters.md: Explains the various training parameters and how to configure them for optimal performance.
- [ ] performance_metrics.md: Describes the different performance metrics used to evaluate the trained model.
- [ ] data_preprocessing.md: Best practices for data preprocessing.
- [ ] cross_validation.md: Explains cross-validation techniques.

@@ -0,0 +1,9 @@
# Scripts
Bundled resources for the ml-model-trainer skill:
- [ ] train_model.py: Automates the model training process, including data loading, preprocessing, model selection, training, and evaluation.
- [ ] evaluate_model.py: Evaluates the trained model using various metrics and generates a report.
- [ ] preprocess_data.py: Preprocesses the input data, including handling missing values, scaling, and feature engineering.
- [ ] save_model.py: Saves the trained model artifact to a specified location.
- [ ] load_model.py: Loads a previously trained model artifact from a specified location.