Initial commit
8
skills/ml-model-trainer/assets/README.md
Normal file
@@ -0,0 +1,8 @@
# Assets

Bundled resources for the ml-model-trainer skill:

- [ ] example_dataset.csv: A sample dataset that can be used to train the model.
- [ ] model_config.json: A template for configuring the model training parameters.
- [ ] evaluation_report_template.md: A template for generating the evaluation report.
- [ ] requirements.txt: Lists the Python dependencies for the scripts.
75
skills/ml-model-trainer/assets/evaluation_report_template.md
Normal file
@@ -0,0 +1,75 @@
# Model Evaluation Report

This report summarizes the evaluation of a machine learning model trained using the ML Model Trainer Plugin. It provides key metrics and insights into the model's performance.

## 1. Model Information

* **Model Name:** [Insert Model Name Here, e.g., "Customer Churn Prediction v1"]
* **Model Type:** [Insert Model Type Here, e.g., "Logistic Regression", "Random Forest"]
* **Training Date:** [Insert Date of Training Here, e.g., "2023-10-27"]
* **Plugin Version:** [Insert Plugin Version Here, found in the plugin details]
* **Dataset Used for Training:** [Insert Dataset Name/Description Here, e.g., "Customer Transaction Data"]

## 2. Dataset Details

* **Training Set Size:** [Insert Number of Training Samples Here, e.g., "10,000"]
* **Validation Set Size:** [Insert Number of Validation Samples Here, e.g., "2,000"]
* **Testing Set Size:** [Insert Number of Testing Samples Here, e.g., "3,000"]
* **Features Used:** [List the features used for training, e.g., Age, Income, Location]
* **Target Variable:** [Specify the target variable, e.g., Customer Churn (Yes/No)]

## 3. Training Parameters

* **Parameters:** [List the hyperparameters used for the model, e.g., learning rate, number of estimators]
* **Cross-Validation Strategy:** [Describe the cross-validation strategy used (e.g., k-fold cross-validation with k=5)]
* **Optimization Metric:** [Specify the metric used for optimization during training (e.g., Accuracy, F1-score)]

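A minimal sketch of how the cross-validation setup described in section 3 might look with scikit-learn. The estimator, the synthetic data, and the F1 scoring choice are placeholders standing in for whatever the trained model actually used:

```python
# Sketch only: k-fold cross-validation (k=5) optimizing F1, as in the template example.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Placeholder data standing in for the training set described in section 2.
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=cv, scoring="f1")

print("F1 per fold:", scores)
print("Mean F1:", scores.mean())
```
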
## 4. Performance Metrics

### 4.1. Overall Performance

| Metric    | Value                   | Description |
|-----------|-------------------------|-------------|
| Accuracy  | [Insert Accuracy Here]  | Percentage of correctly classified instances. *Example: 0.85 means 85% of predictions were correct.* |
| Precision | [Insert Precision Here] | Of all instances predicted as positive, what percentage were actually positive? *Example: 0.78 means 78% of instances predicted as positive were actually positive.* |
| Recall    | [Insert Recall Here]    | Of all actual positive instances, what percentage were correctly predicted? *Example: 0.92 means 92% of all actual positive instances were correctly predicted.* |
| F1-Score  | [Insert F1-Score Here]  | Harmonic mean of precision and recall. Provides a balanced measure of the model's performance. *Example: 0.84 represents the harmonic mean of precision and recall.* |
| AUC       | [Insert AUC Here]       | Area Under the Receiver Operating Characteristic (ROC) curve. Measures the model's ability to distinguish between positive and negative classes. *Example: 0.95 indicates excellent discrimination between classes.* |

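A minimal sketch of how these overall metrics could be computed with `sklearn.metrics` for a binary classifier. The arrays `y_test`, `y_pred`, and `y_prob` are hypothetical outputs of the trained model, not something the plugin provides:

```python
# Sketch only: assumes a fitted binary classifier plus held-out test labels/predictions.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

def overall_metrics(y_test, y_pred, y_prob):
    """Return the metrics used in the table above for a binary classifier."""
    return {
        "Accuracy": accuracy_score(y_test, y_pred),
        "Precision": precision_score(y_test, y_pred),
        "Recall": recall_score(y_test, y_pred),
        "F1-Score": f1_score(y_test, y_pred),
        # AUC needs predicted probabilities (or decision scores), not hard labels.
        "AUC": roc_auc_score(y_test, y_prob),
    }
```
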
### 4.2. Detailed Performance (Per Class)

[If applicable, include a table showing performance metrics for each class. For example, in a binary classification problem (Churn/No Churn), show precision, recall, and F1-score for each class.]

| Class          | Precision | Recall  | F1-Score |
|----------------|-----------|---------|----------|
| [Class 1 Name] | [Value]   | [Value] | [Value]  |
| [Class 2 Name] | [Value]   | [Value] | [Value]  |
| ...            | ...       | ...     | ...      |

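One way to fill in the per-class table is scikit-learn's `classification_report`; a small sketch, again with hypothetical `y_test` and `y_pred` arrays and example class names:

```python
# Sketch only: per-class precision/recall/F1 for the table above.
from sklearn.metrics import classification_report

def per_class_metrics(y_test, y_pred, class_names=("No Churn", "Churn")):
    """Return per-class precision/recall/F1 ready to copy into the table."""
    report = classification_report(
        y_test, y_pred, target_names=list(class_names), output_dict=True
    )
    return {name: report[name] for name in class_names}
```
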
### 4.3. Confusion Matrix

[Include a confusion matrix showing the counts of true positives, true negatives, false positives, and false negatives. This can be represented as a table or an image.]

|                 | Predicted Positive | Predicted Negative |
|-----------------|--------------------|--------------------|
| Actual Positive | [True Positives]   | [False Negatives]  |
| Actual Negative | [False Positives]  | [True Negatives]   |

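A minimal sketch for producing these counts with scikit-learn. Note that `confusion_matrix` orders rows and columns by label value, so with labels `[0, 1]` the raw layout is `[[TN, FP], [FN, TP]]`, not the orientation of the table above:

```python
# Sketch only: confusion-matrix counts for the table above (binary case).
from sklearn.metrics import confusion_matrix

def confusion_counts(y_test, y_pred):
    """Return TP/FP/FN/TN counts keyed by the names used in the report table."""
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
    return {
        "True Positives": tp,
        "False Negatives": fn,
        "False Positives": fp,
        "True Negatives": tn,
    }
```
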
## 5. Model Interpretation

* **Feature Importance:** [Discuss the most important features influencing the model's predictions. You can provide a ranked list of features and their importance scores.]
* **Insights:** [Describe any interesting insights gained from the model. For example, "Customers with high income and low usage are more likely to churn."]

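For the feature-importance bullet, a small sketch of how a ranked list might be produced. It assumes a tree-based scikit-learn model exposing `feature_importances_`; linear models or permutation importance would need a different approach:

```python
# Sketch only: ranked feature importances from a fitted tree-based model.
import pandas as pd

def ranked_importances(model, feature_names):
    """Return features sorted by importance, ready to paste into section 5."""
    importances = pd.Series(model.feature_importances_, index=feature_names)
    return importances.sort_values(ascending=False)

# Hypothetical usage:
# print(ranked_importances(fitted_model, ["age", "location", "subscribed"]))
```
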
## 6. Recommendations

* **Model Improvements:** [Suggest potential improvements to the model. For example, "Try a different algorithm", "Add more features", "Tune hyperparameters."]
* **Further Analysis:** [Suggest further analysis that could be performed. For example, "Investigate the reasons for high false positive rates."]
* **Deployment Considerations:** [Discuss any considerations for deploying the model to production. For example, "Monitor the model's performance over time", "Retrain the model periodically with new data."]

## 7. Conclusion

[Summarize the overall performance of the model and its suitability for the intended purpose. State whether the model is ready for deployment or whether further improvements are needed.]

## 8. Appendix (Optional)

* [Include any additional information, such as detailed code snippets, visualizations, or links to external resources.]
39
skills/ml-model-trainer/assets/example_dataset.csv
Normal file
@@ -0,0 +1,39 @@
# Sample dataset for training a machine learning model.
# This dataset contains features and a target variable for demonstration purposes.
# You can replace this with your own dataset.

# Feature 1: Numerical feature representing age (e.g., of a customer).
# Feature 2: Categorical feature representing location (e.g., city).
# Feature 3: Binary feature representing whether a customer subscribed (1) or not (0).
# Target Variable: Represents the outcome we want to predict (e.g., customer churn).

age,location,subscribed,target
25,New York,1,0
30,Los Angeles,0,1
40,Chicago,1,0
22,Houston,0,0
35,New York,1,1
28,Los Angeles,0,0
45,Chicago,1,1
31,Houston,0,0
27,New York,1,0
33,Los Angeles,0,1
42,Chicago,1,1
24,Houston,0,0
37,New York,1,0
29,Los Angeles,0,0
47,Chicago,1,1
32,Houston,0,1
26,New York,1,0
34,Los Angeles,0,1
41,Chicago,1,0
23,Houston,0,0
# Add more rows to create a robust dataset for training.
# Consider increasing the number of rows and the diversity of data for better model performance.
# Remember to clean and preprocess your data before training.
# [ADD_MORE_DATA_HERE]
# Example:
# 50,San Francisco,1,1
# 60,Seattle,0,0
# 70,Boston,1,1
# 80,Miami,0,0

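A minimal sketch of how this example dataset could be loaded and prepared for training. It assumes pandas and scikit-learn; the `#` comment lines are skipped, the categorical `location` column is one-hot encoded, and the split ratio is only an illustration:

```python
# Sketch only: load the example CSV and prepare a train/test split.
import pandas as pd
from sklearn.model_selection import train_test_split

# The template CSV mixes '#' comment lines with data rows; comment="#" drops them.
df = pd.read_csv("skills/ml-model-trainer/assets/example_dataset.csv", comment="#")

# One-hot encode the categorical 'location' column; keep numeric columns as-is.
X = pd.get_dummies(df.drop(columns=["target"]), columns=["location"])
y = df["target"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)
```
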
55
skills/ml-model-trainer/assets/requirements.txt
Normal file
@@ -0,0 +1,55 @@
# Requirements for the ml-model-trainer Plugin

This file, `requirements.txt`, specifies the Python packages required for the `ml-model-trainer` plugin to function correctly. It is used by `pip` to install these dependencies.

## Purpose

The `requirements.txt` file ensures that all necessary libraries are installed in the correct versions, guaranteeing consistency and reproducibility of the model training process. It helps avoid dependency conflicts and ensures that the plugin operates as intended.

## Contents

Below is a template for the `requirements.txt` file. **Please carefully review and modify the versions to match your specific needs and compatibility requirements.**

```
# Core Dependencies
scikit-learn==<INSERT_SCI_KIT_LEARN_VERSION_HERE>
pandas==<INSERT_PANDAS_VERSION_HERE>
numpy==<INSERT_NUMPY_VERSION_HERE>

# Optional Dependencies (Uncomment and specify versions if needed)
# matplotlib==<INSERT_MATPLOTLIB_VERSION_HERE>   # For visualization
# seaborn==<INSERT_SEABORN_VERSION_HERE>         # For enhanced visualization
# xgboost==<INSERT_XGBOOST_VERSION_HERE>         # For XGBoost models
# lightgbm==<INSERT_LIGHTGBM_VERSION_HERE>       # For LightGBM models
# tensorflow==<INSERT_TENSORFLOW_VERSION_HERE>   # For TensorFlow models
# torch==<INSERT_TORCH_VERSION_HERE>             # For PyTorch models

# Example:
# scikit-learn==1.2.2
# pandas==1.5.3
# numpy==1.24.3
```

## Instructions

1. **Replace Placeholders:** Carefully replace the `<INSERT_*_VERSION_HERE>` placeholders with the specific versions of each package you intend to use. Pinning exact versions is highly recommended to ensure consistent behavior across environments.

2. **Uncomment Optional Dependencies:** If your model training process relies on optional packages such as `matplotlib`, `seaborn`, `xgboost`, `lightgbm`, `tensorflow`, or `torch`, uncomment the corresponding lines and specify the desired versions.

3. **Consider Version Compatibility:** Pay close attention to compatibility between packages. Refer to each package's documentation to understand potential version conflicts.

4. **Update Dependencies:** When updating the plugin, review and update `requirements.txt` to reflect any changes in dependencies.

## Example Usage

To install the dependencies listed in this file, run:

```bash
pip install -r requirements.txt
```

**Important Considerations:**

* **Virtual Environments:** It is highly recommended to use a virtual environment (e.g., `venv` or `conda`) to isolate the plugin's dependencies from the global Python environment. This helps prevent conflicts with other projects.
* **Package Management:** Consider a package manager such as `poetry` or `pipenv` for more advanced dependency management.
* **Testing:** Thoroughly test the plugin after installing the dependencies to ensure that everything functions correctly.

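As an optional sanity check after installation, a minimal sketch that compares the pinned entries against what is actually installed, using the standard-library `importlib.metadata`. It assumes the simple `package==version` format shown above and skips comments, blanks, and unpinned lines:

```python
# Sketch only: check installed versions against pinned entries in requirements.txt.
from importlib import metadata
from pathlib import Path

def check_requirements(path="requirements.txt"):
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # skip comments, blanks, and unpinned entries
        name, wanted = (part.strip() for part in line.split("==", 1))
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            print(f"MISSING   {name} (wanted {wanted})")
            continue
        status = "OK" if installed == wanted else "MISMATCH"
        print(f"{status:9} {name}: installed {installed}, wanted {wanted}")

if __name__ == "__main__":
    check_requirements()
```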