Initial commit

Zhongwei Li
2025-11-29 18:50:58 +08:00
commit 3fb2d73fdf
11 changed files with 488 additions and 0 deletions

15
.claude-plugin/plugin.json Normal file

@@ -0,0 +1,15 @@
{
"name": "automl-pipeline-builder",
"description": "Build AutoML pipelines",
"version": "1.0.0",
"author": {
"name": "Claude Code Plugins",
"email": "[email protected]"
},
"skills": [
"./skills"
],
"commands": [
"./commands"
]
}

3
README.md Normal file

@@ -0,0 +1,3 @@
# automl-pipeline-builder
Build AutoML pipelines

15
commands/build-automl.md Normal file

@@ -0,0 +1,15 @@
---
description: Build and run an AutoML pipeline for the current task
---
# AI/ML Task Executor
You are an AI/ML specialist. When this command is invoked:
1. Analyze the current context and requirements
2. Generate appropriate code for the ML task
3. Include data validation and error handling
4. Provide performance metrics and insights
5. Save artifacts and generate documentation
Use modern ML frameworks and follow established best practices.

73
plugin.lock.json Normal file

@@ -0,0 +1,73 @@
{
"$schema": "internal://schemas/plugin.lock.v1.json",
"pluginId": "gh:jeremylongshore/claude-code-plugins-plus:plugins/ai-ml/automl-pipeline-builder",
"normalized": {
"repo": null,
"ref": "refs/tags/v20251128.0",
"commit": "fb34009766a8bc2e9399fffd0376914fd9c97ab3",
"treeHash": "c31c6acb6671d79f92b4e28acb851a76d56e34bb91fe52739417ac754698c660",
"generatedAt": "2025-11-28T10:18:10.783298Z",
"toolVersion": "publish_plugins.py@0.2.0"
},
"origin": {
"remote": "git@github.com:zhongweili/42plugin-data.git",
"branch": "master",
"commit": "aa1497ed0949fd50e99e70d6324a29c5b34f9390",
"repoRoot": "/Users/zhongweili/projects/openmind/42plugin-data"
},
"manifest": {
"name": "automl-pipeline-builder",
"description": "Build AutoML pipelines",
"version": "1.0.0"
},
"content": {
"files": [
{
"path": "README.md",
"sha256": "65ddad12a34abea594620cef98ef80e73e8f5fe9e4b51f6de2748cef28f5b33c"
},
{
"path": ".claude-plugin/plugin.json",
"sha256": "29894f57198e94aa48e120e7a39a2ad0a38259cd66f7a7c6aaf4b865722d4248"
},
{
"path": "commands/build-automl.md",
"sha256": "043efb83e2f02fc6d0869c8a3a7388d6e49f6c809292b93dd6a97a1b142e5647"
},
{
"path": "skills/automl-pipeline-builder/SKILL.md",
"sha256": "95dcd51280f06617680d939bb4b53db46e97e18bd51ceaf5eaecd600a5def7b2"
},
{
"path": "skills/automl-pipeline-builder/references/README.md",
"sha256": "cc008a892b439b86749b67023a04bb02ceaa11207207b5a267a4c161afa0a73d"
},
{
"path": "skills/automl-pipeline-builder/scripts/README.md",
"sha256": "06961fee5bb99c5bd43e86c2cef8c010c7c93a20cd23f6125d8575f356c0b0a2"
},
{
"path": "skills/automl-pipeline-builder/assets/example_dataset.csv",
"sha256": "df2165de808d5ffc740008e372a6dcf72d7851aa0a4f9b087cacfd47e6adc9d3"
},
{
"path": "skills/automl-pipeline-builder/assets/pipeline_template.yaml",
"sha256": "ce61db53c569e9945a1bc79fc05ba5757395dfec0cad306039eb6927b6c0a6c3"
},
{
"path": "skills/automl-pipeline-builder/assets/README.md",
"sha256": "392baecc536382ba5f53272f95a232d90e39b683e1f678bdc174b49282b13f29"
},
{
"path": "skills/automl-pipeline-builder/assets/evaluation_report_template.html",
"sha256": "bc27b226c4c65d0567721be60a65e799b7511ef72c545277a68cc3a0729f7a8f"
}
],
"dirSha256": "c31c6acb6671d79f92b4e28acb851a76d56e34bb91fe52739417ac754698c660"
},
"security": {
"scannedAt": null,
"scannerVersion": null,
"flags": []
}
}

53
skills/automl-pipeline-builder/SKILL.md Normal file

@@ -0,0 +1,53 @@
---
name: building-automl-pipelines
description: |
This skill empowers Claude to build AutoML pipelines using the automl-pipeline-builder plugin. It is triggered when the user requests the creation of an automated machine learning pipeline, specifies the use of AutoML techniques, or asks for assistance in automating the machine learning model building process. The skill analyzes the context, generates code for the ML task, includes data validation and error handling, provides performance metrics, and saves artifacts with documentation. Use this skill when the user explicitly asks to "build automl pipeline", "create automated ml pipeline", or needs help with "automating machine learning workflows".
allowed-tools: Read, Write, Edit, Grep, Glob, Bash
version: 1.0.0
---
## Overview
This skill automates the creation of machine learning pipelines using the automl-pipeline-builder plugin. It simplifies the process of building, training, and evaluating machine learning models by automating feature engineering, model selection, and hyperparameter tuning.
## How It Works
1. **Analyze Requirements**: The skill analyzes the user's request and identifies the specific machine learning task and data requirements.
2. **Generate Code**: Based on the analysis, the skill generates the necessary code to build an AutoML pipeline using appropriate libraries.
3. **Implement Best Practices**: The skill incorporates data validation, error handling, and performance optimization techniques into the generated code.
4. **Provide Insights**: After execution, the skill provides performance metrics, insights, and documentation for the created pipeline.
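The four steps above can be sketched as code. This is a minimal illustration using scikit-learn; the plugin does not mandate a specific library, and the column names and search ranges here are placeholders, not actual plugin output:

```python
# Illustrative sketch of the kind of pipeline code this skill generates.
# Assumes scikit-learn; column names and hyperparameter ranges are placeholders.
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

def build_pipeline(numeric_cols, categorical_cols):
    """Assemble preprocessing and model into one tunable estimator."""
    preprocess = ColumnTransformer([
        ("num", Pipeline([("impute", SimpleImputer(strategy="mean")),
                          ("scale", StandardScaler())]), numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ])
    return Pipeline([("prep", preprocess),
                     ("model", RandomForestClassifier(random_state=0))])

def tune(pipeline, X, y):
    """Random search over a small illustrative hyperparameter space."""
    search = RandomizedSearchCV(
        pipeline,
        {"model__n_estimators": [100, 200, 300],
         "model__max_depth": [3, 5, 7]},
        n_iter=5, scoring="roc_auc", cv=3, random_state=0)
    return search.fit(X, y)
```

The `Pipeline` wrapper keeps preprocessing inside the cross-validation loop, which avoids leaking validation data into imputation and scaling statistics.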
## When to Use This Skill
This skill activates when you need to:
- Build an automated machine learning pipeline.
- Automate the process of model selection and hyperparameter tuning.
- Generate code for a complete AutoML workflow.
## Examples
### Example 1: Creating a Classification Pipeline
User request: "Build an AutoML pipeline for classifying customer churn."
The skill will:
1. Generate code to load and preprocess customer data.
2. Create an AutoML pipeline that automatically selects and tunes a classification model.
### Example 2: Optimizing a Regression Model
User request: "Create an automated ml pipeline to predict house prices."
The skill will:
1. Generate code to build a regression model using AutoML techniques.
2. Automatically select the best performing model and provide performance metrics.
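For either example, the performance metrics the skill reports can be computed as in this sketch (scikit-learn metric functions; the inputs shown are illustrative, and the keys match the bundled evaluation report template):

```python
# Sketch of the metrics block reported after training. Assumes scikit-learn;
# y_true / y_pred / y_prob are illustrative inputs, not plugin output.
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

def evaluation_metrics(y_true, y_pred, y_prob):
    """Compute the metrics used in the bundled evaluation report template."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1_score": f1_score(y_true, y_pred),
        "auc_roc": roc_auc_score(y_true, y_prob),
    }
```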
## Best Practices
- **Data Preparation**: Ensure data is clean, properly formatted, and relevant to the machine learning task.
- **Performance Monitoring**: Continuously monitor the performance of the AutoML pipeline and retrain the model as needed.
- **Error Handling**: Implement robust error handling to gracefully handle unexpected issues during pipeline execution.
## Integration
This skill can be integrated with other data processing and visualization plugins to create end-to-end machine learning workflows. It can also be used in conjunction with deployment plugins to automate the deployment of trained models.

7
skills/automl-pipeline-builder/assets/README.md Normal file

@@ -0,0 +1,7 @@
# Assets
Bundled resources for automl-pipeline-builder skill
- [ ] pipeline_template.yaml: YAML template for defining the structure and configuration of the AutoML pipeline.
- [ ] example_dataset.csv: Sample dataset that can be used as input for the AutoML pipeline.
- [ ] evaluation_report_template.html: HTML template for generating the model evaluation report.

203
skills/automl-pipeline-builder/assets/evaluation_report_template.html Normal file

@@ -0,0 +1,203 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>AutoML Model Evaluation Report</title>
<style>
/* Basic Reset */
body, h1, h2, h3, p, table, th, td {
margin: 0;
padding: 0;
border: 0;
font-size: 100%;
font: inherit;
vertical-align: baseline;
}
/* General Styles */
body {
font-family: sans-serif;
line-height: 1.6;
background-color: #f4f4f4;
color: #333;
padding: 20px;
}
.container {
max-width: 960px;
margin: 0 auto;
background-color: #fff;
padding: 20px;
border-radius: 8px;
box-shadow: 0 2px 4px rgba(0, 0, 0, 0.1);
}
h1, h2, h3 {
margin-bottom: 15px;
color: #0056b3;
}
h1 {
font-size: 2.5em;
}
h2 {
font-size: 2em;
}
h3 {
font-size: 1.5em;
}
p {
margin-bottom: 15px;
}
/* Table Styles */
table {
width: 100%;
border-collapse: collapse;
margin-bottom: 20px;
}
th, td {
padding: 12px 15px;
text-align: left;
border-bottom: 1px solid #ddd;
}
th {
background-color: #f0f0f0;
font-weight: bold;
}
/* Responsive Design */
@media (max-width: 768px) {
.container {
padding: 15px;
}
h1 {
font-size: 2em;
}
h2 {
font-size: 1.6em;
}
h3 {
font-size: 1.3em;
}
table {
display: block;
overflow-x: auto;
}
}
/* Specific Styles */
.model-summary {
margin-bottom: 30px;
}
.evaluation-metrics {
margin-bottom: 30px;
}
.visualizations {
margin-bottom: 30px;
}
.conclusion {
margin-bottom: 20px;
}
.visualization-image {
max-width: 100%;
height: auto;
border: 1px solid #ccc;
border-radius: 5px;
margin-bottom: 10px;
}
</style>
</head>
<body>
<div class="container">
<!-- Report Header -->
<h1>AutoML Model Evaluation Report</h1>
<p>Generated on: {{generation_date}}</p>
<!-- Model Summary -->
<section class="model-summary">
<h2>Model Summary</h2>
<p><strong>Model Name:</strong> {{model_name}}</p>
<p><strong>Algorithm:</strong> {{algorithm}}</p>
<p><strong>Dataset:</strong> {{dataset_name}}</p>
<p><strong>Features Used:</strong> {{features_used}}</p>
</section>
<!-- Evaluation Metrics -->
<section class="evaluation-metrics">
<h2>Evaluation Metrics</h2>
<table>
<thead>
<tr>
<th>Metric</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Accuracy</td>
<td>{{accuracy}}</td>
</tr>
<tr>
<td>Precision</td>
<td>{{precision}}</td>
</tr>
<tr>
<td>Recall</td>
<td>{{recall}}</td>
</tr>
<tr>
<td>F1-Score</td>
<td>{{f1_score}}</td>
</tr>
<tr>
<td>AUC-ROC</td>
<td>{{auc_roc}}</td>
</tr>
</tbody>
</table>
</section>
<!-- Visualizations -->
<section class="visualizations">
<h2>Visualizations</h2>
<h3>Confusion Matrix</h3>
<img src="{{confusion_matrix_image}}" alt="Confusion Matrix" class="visualization-image">
<h3>ROC Curve</h3>
<img src="{{roc_curve_image}}" alt="ROC Curve" class="visualization-image">
<h3>Feature Importance</h3>
<img src="{{feature_importance_image}}" alt="Feature Importance" class="visualization-image">
</section>
<!-- Conclusion -->
<section class="conclusion">
<h2>Conclusion</h2>
<p>{{conclusion_text}}</p>
</section>
<!-- Additional Notes -->
<section class="notes">
<h3>Additional Notes</h3>
<p>{{additional_notes}}</p>
</section>
</div>
</body>
</html>
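The `{{placeholder}}` slots in the template above can be filled by any Mustache/Jinja-style renderer. A minimal stdlib sketch (the regex approach is illustrative; the plugin does not specify a template engine):

```python
# Minimal {{placeholder}} renderer for the evaluation report template.
# Stdlib-only sketch; unknown keys are left intact rather than erased.
import re

def render_template(template, values):
    """Replace each {{key}} with str(values[key]); keep unknown keys as-is."""
    def sub(match):
        key = match.group(1)
        return str(values.get(key, match.group(0)))
    return re.sub(r"\{\{(\w+)\}\}", sub, template)
```

Leaving unknown keys untouched makes partially filled reports easy to spot during debugging.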

36
skills/automl-pipeline-builder/assets/example_dataset.csv Normal file

@@ -0,0 +1,36 @@
# Sample dataset for AutoML pipeline builder plugin
# This dataset is a simplified example and may not be suitable for all AutoML tasks.
# Replace this with your actual dataset for optimal results.
#
# Columns:
# feature1: Numerical feature (e.g., age, income)
# feature2: Categorical feature (e.g., city, product type) - encoded as strings
# target: Target variable (e.g., churn, conversion) - binary (0 or 1)
feature1,feature2,target
25,New York,0
30,Los Angeles,1
40,Chicago,0
22,Houston,0
35,Phoenix,1
48,Philadelphia,1
28,San Antonio,0
32,San Diego,1
45,Dallas,0
27,San Jose,0
31,Austin,1
38,Jacksonville,0
24,Fort Worth,0
41,Columbus,1
29,Charlotte,0
33,San Francisco,1
46,Indianapolis,1
23,Seattle,0
36,Denver,1
49,Washington,1
# Add more data rows here. Aim for a larger dataset (hundreds or thousands of rows) for better AutoML performance.
# Example:
# 52,Miami,0
# 39,Boston,1
# Consider adding missing values (e.g., empty strings) to test the pipeline's handling of missing data.
# For categorical features with many unique values, consider using techniques like one-hot encoding or target encoding.

69
skills/automl-pipeline-builder/assets/pipeline_template.yaml Normal file

@@ -0,0 +1,69 @@
# pipeline_template.yaml
# --- General Pipeline Configuration ---
pipeline_name: "AutoML Pipeline - REPLACE_ME" # Name of the pipeline (e.g., Customer Churn Prediction)
description: "Automated Machine Learning pipeline for REPLACE_ME." # Short description of the pipeline's purpose
version: "1.0.0" # Pipeline version
# --- Data Source Configuration ---
data_source:
type: "csv" # Type of data source (e.g., csv, database, api)
location: "data/YOUR_DATASET.csv" # Path to the data file or connection string
target_column: "target" # Name of the target variable column
index_column: null # Name of the index column (optional)
delimiter: "," # Delimiter for CSV files (e.g., ",", ";", "\t")
quotechar: '"' # Quote character for CSV files
encoding: "utf-8" # Encoding of the data file
# --- Feature Engineering Configuration ---
feature_engineering:
enabled: true # Enable or disable feature engineering
numeric_imputation: "mean" # Strategy for handling missing numerical values (e.g., mean, median, most_frequent, constant)
categorical_encoding: "onehot" # Method for encoding categorical features (e.g., onehot, ordinal, target)
feature_scaling: "standard" # Scaling method for numeric features (e.g., standard, minmax, robust)
feature_selection:
enabled: false # Enable or disable feature selection
method: "variance_threshold" # Feature selection method (e.g., variance_threshold, selectkbest)
threshold: 0.01 # Threshold for feature selection (depends on the method)
# --- Model Training Configuration ---
model_training:
algorithm: "xgboost" # Machine learning algorithm to use (e.g., xgboost, lightgbm, randomforest, logisticregression)
hyperparameter_tuning:
enabled: true # Enable or disable hyperparameter tuning
method: "random_search" # Hyperparameter tuning method (e.g., random_search, grid_search, bayesian_optimization)
n_trials: 50 # Number of trials for hyperparameter tuning
scoring_metric: "roc_auc" # Metric to optimize for (e.g., roc_auc, accuracy, f1, precision, recall)
hyperparameter_space: # Define hyperparameter ranges for each algorithm
xgboost: # Example for XGBoost
n_estimators: [100, 200, 300]
learning_rate: [0.01, 0.1, 0.2]
max_depth: [3, 5, 7]
# Add hyperparameter spaces for other algorithms as needed
# --- Model Evaluation Configuration ---
model_evaluation:
split_ratio: 0.2 # Ratio for splitting data into training and validation sets
scoring_metrics: ["roc_auc", "accuracy", "f1", "precision", "recall"] # List of metrics to evaluate the model
cross_validation:
enabled: true # Enable or disable cross-validation
n_folds: 5 # Number of folds for cross-validation
# --- Model Deployment Configuration ---
model_deployment:
enabled: false # Enable or disable model deployment
environment: "staging" # Target deployment environment (e.g., staging, production)
model_registry: "local" # Location to store the trained model (e.g., local, s3, gcp)
model_path: "models/YOUR_MODEL.pkl" # Path to save the trained model
api_endpoint: "YOUR_API_ENDPOINT" # API endpoint for model deployment (if applicable)
# --- Logging Configuration ---
logging:
level: "INFO" # Logging level (e.g., DEBUG, INFO, WARNING, ERROR)
format: "%(asctime)s - %(levelname)s - %(message)s" # Logging format
file_path: "logs/pipeline.log" # Path to the log file
# --- Error Handling Configuration ---
error_handling:
on_failure: "email_notification" # Action to take on pipeline failure (e.g., email_notification, retry, stop)
email_recipients: ["YOUR_EMAIL@example.com"] # List of email addresses to notify on failure
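The template above can be consumed like this — a minimal sketch using PyYAML, where the required-section list mirrors the template's top-level keys (the function name and validation policy are illustrative, not part of the plugin):

```python
# Hedged sketch of loading pipeline_template.yaml with PyYAML and checking
# that the template's main sections are present before running the pipeline.
import yaml

REQUIRED_TOP_LEVEL = ("data_source", "feature_engineering",
                      "model_training", "model_evaluation")

def load_config(path):
    """Load the pipeline template and check required sections exist."""
    with open(path, encoding="utf-8") as fh:
        cfg = yaml.safe_load(fh)
    missing = [key for key in REQUIRED_TOP_LEVEL if key not in cfg]
    if missing:
        raise ValueError(f"template missing sections: {missing}")
    return cfg
```

Failing fast on a missing section gives a clearer error than a `KeyError` deep inside model training.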

7
skills/automl-pipeline-builder/references/README.md Normal file

@@ -0,0 +1,7 @@
# References
Bundled resources for automl-pipeline-builder skill
- [ ] automl_best_practices.md: Document outlining best practices for building and deploying AutoML pipelines, including data preprocessing, feature engineering, and model selection.
- [ ] supported_algorithms.md: Document listing the supported machine learning algorithms within the AutoML pipeline builder plugin, along with their parameters and usage.
- [ ] error_handling_guide.md: Guide on how to handle errors and exceptions that may occur during the AutoML pipeline building process.

7
skills/automl-pipeline-builder/scripts/README.md Normal file

@@ -0,0 +1,7 @@
# Scripts
Bundled resources for automl-pipeline-builder skill
- [ ] data_validation.py: Script to validate input data for the AutoML pipeline, ensuring data quality and preventing errors.
- [ ] model_evaluation.py: Script to evaluate the performance of the trained AutoML model using various metrics and generate a report.
- [ ] pipeline_deployment.py: Script to deploy the trained AutoML pipeline to a production environment.
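A hedged sketch of what `data_validation.py` might check, using only the stdlib `csv` module (the actual bundled script may differ; the binary-target rule matches the example dataset's `target` column):

```python
# Illustrative data validation for the example dataset format: comment lines
# starting with '#' are skipped, the target column must exist, and every
# target value must be binary. Stdlib-only sketch; not the bundled script.
import csv

def validate_dataset(path, target_column="target"):
    """Return a list of problems found; an empty list means the file passed."""
    problems = []
    with open(path, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(line for line in fh if not line.startswith("#"))
        if target_column not in (reader.fieldnames or []):
            return [f"missing target column '{target_column}'"]
        for i, row in enumerate(reader, start=2):
            value = row.get(target_column, "")
            if value not in {"0", "1"}:
                problems.append(f"row {i}: non-binary target {value!r}")
            if any(v == "" for v in row.values()):
                problems.append(f"row {i}: empty field")
    return problems
```

Returning a problem list instead of raising on the first issue lets the pipeline report all data-quality failures in one pass.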