Initial commit

Zhongwei Li
2025-11-29 18:51:02 +08:00
commit b092d0393c
11 changed files with 371 additions and 0 deletions

.claude-plugin/plugin.json Normal file

@@ -0,0 +1,15 @@
{
"name": "clustering-algorithm-runner",
"description": "Run clustering algorithms on datasets",
"version": "1.0.0",
"author": {
"name": "Claude Code Plugins",
"email": "[email protected]"
},
"skills": [
"./skills"
],
"commands": [
"./commands"
]
}

README.md Normal file

@@ -0,0 +1,3 @@
# clustering-algorithm-runner
Run clustering algorithms on datasets

commands/run-clustering.md Normal file

@@ -0,0 +1,15 @@
---
description: Execute AI/ML tasks with intelligent automation
---
# AI/ML Task Executor
You are an AI/ML specialist. When this command is invoked:
1. Analyze the current context and requirements
2. Generate appropriate code for the ML task
3. Include data validation and error handling (see the sketch below)
4. Provide performance metrics and insights
5. Save artifacts and generate documentation
Support modern ML frameworks and best practices.
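
A minimal sketch of what step 3 can look like in practice, assuming pandas and a CSV input; the file path, row threshold, and helper name are illustrative assumptions, not part of this plugin:

```python
# Hypothetical helper for step 3: validate a dataset before clustering.
# Path, threshold, and function name are illustrative assumptions.
import pandas as pd

def load_validated(path: str, min_rows: int = 10) -> pd.DataFrame:
    """Load a CSV and keep only complete numeric rows, failing loudly otherwise."""
    df = pd.read_csv(path)
    numeric = df.select_dtypes(include="number").dropna()
    if len(numeric) < min_rows:
        raise ValueError(
            f"Need at least {min_rows} complete numeric rows, got {len(numeric)}."
        )
    return numeric
```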

plugin.lock.json Normal file

@@ -0,0 +1,73 @@
{
"$schema": "internal://schemas/plugin.lock.v1.json",
"pluginId": "gh:jeremylongshore/claude-code-plugins-plus:plugins/ai-ml/clustering-algorithm-runner",
"normalized": {
"repo": null,
"ref": "refs/tags/v20251128.0",
"commit": "47f18846c478d777ce5de46cd4eb5c74281fc774",
"treeHash": "8220835f8fadf794dd8d7e35ce95351b1a91ae359637b0d8de2a586dbe34d3eb",
"generatedAt": "2025-11-28T10:18:13.279231Z",
"toolVersion": "publish_plugins.py@0.2.0"
},
"origin": {
"remote": "git@github.com:zhongweili/42plugin-data.git",
"branch": "master",
"commit": "aa1497ed0949fd50e99e70d6324a29c5b34f9390",
"repoRoot": "/Users/zhongweili/projects/openmind/42plugin-data"
},
"manifest": {
"name": "clustering-algorithm-runner",
"description": "Run clustering algorithms on datasets",
"version": "1.0.0"
},
"content": {
"files": [
{
"path": "README.md",
"sha256": "2551eeb2eb35353aee4342be637f260476b1a8c1180a657da11297265433b978"
},
{
"path": ".claude-plugin/plugin.json",
"sha256": "fb6caee3cfe7492aad704b2e0b3d0d22cfa04f7c2b9fdfc178f7c1d94b51e234"
},
{
"path": "commands/run-clustering.md",
"sha256": "043efb83e2f02fc6d0869c8a3a7388d6e49f6c809292b93dd6a97a1b142e5647"
},
{
"path": "skills/clustering-algorithm-runner/SKILL.md",
"sha256": "323a82f4d4f19da955789854afe87fb259eca7e3a9bc381991abef10e6d213ee"
},
{
"path": "skills/clustering-algorithm-runner/references/README.md",
"sha256": "67b789d21eb80edd82f976b85ebb23e6b8a5237b445badf63ad6bc3a01f9ee2a"
},
{
"path": "skills/clustering-algorithm-runner/scripts/README.md",
"sha256": "3a62057e5b841e9f86498c93dbc71495402d722a437380320c3dfd152d7818a2"
},
{
"path": "skills/clustering-algorithm-runner/assets/README.md",
"sha256": "d643f42844e86b7c280971fc15eba90a85b3f808f62a7020f2d0b447fe027ae5"
},
{
"path": "skills/clustering-algorithm-runner/assets/clustering_visualization.ipynb",
"sha256": "cae9b42b4a1f15314bf70870a214cfc8ba64cdada44087d495106282d6d3a503"
},
{
"path": "skills/clustering-algorithm-runner/assets/config_template.json",
"sha256": "6c8843546b08bde63ef5cff86624ae94f4d3fce81155311b59a2e2fc5f32885e"
},
{
"path": "skills/clustering-algorithm-runner/assets/example_data.csv",
"sha256": "abe42a4e3d54db5cfa4bd1d643c933ab1e76a88e526e3311c7d764bc79db1da5"
}
],
"dirSha256": "8220835f8fadf794dd8d7e35ce95351b1a91ae359637b0d8de2a586dbe34d3eb"
},
"security": {
"scannedAt": null,
"scannerVersion": null,
"flags": []
}
}

skills/clustering-algorithm-runner/SKILL.md Normal file

@@ -0,0 +1,55 @@
---
name: running-clustering-algorithms
description: |
This skill enables Claude to execute clustering algorithms on datasets. It is used when the user requests to perform clustering, identify groups within data, or analyze data structure. The skill supports algorithms like K-means, DBSCAN, and hierarchical clustering. Claude should use this skill when the user explicitly asks to "run clustering," "perform a cluster analysis," or "group data points" and provides a dataset or a way to access one. The skill also handles data validation, error handling, performance metrics, and artifact saving.
allowed-tools: Read, Write, Edit, Grep, Glob, Bash
version: 1.0.0
---
## Overview
This skill empowers Claude to perform clustering analysis on provided datasets. It allows for automated execution of various clustering algorithms, providing insights into data groupings and structures.
## How It Works
1. **Analyzing the Context**: Claude analyzes the user's request to determine the dataset, desired clustering algorithm (if specified), and any specific requirements.
2. **Generating Code**: Claude generates Python code using appropriate ML libraries (e.g., scikit-learn) to perform the clustering task, including data loading, preprocessing, algorithm execution, and result visualization (a minimal sketch follows this list).
3. **Executing Clustering**: The generated code is executed, and the clustering algorithm is applied to the dataset.
4. **Providing Results**: Claude presents the results, including cluster assignments, performance metrics (e.g., silhouette score, Davies-Bouldin index), and visualizations (e.g., scatter plots with cluster labels).
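
As a concrete illustration of steps 2 through 4, here is a minimal sketch of the kind of code this skill generates, assuming scikit-learn and a CSV dataset; the file names, feature selection, and parameter values are illustrative assumptions only:

```python
# Hedged sketch of generated clustering code; file name, feature
# selection, and parameters are illustrative assumptions.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, davies_bouldin_score

# 1. Load and validate the dataset (path is a placeholder).
data = pd.read_csv("your_data.csv")
features = data.select_dtypes(include="number").dropna()
if features.empty:
    raise ValueError("No complete numeric features available for clustering.")

# 2. Scale features so no single column dominates the distance metric.
scaled = StandardScaler().fit_transform(features)

# 3. Fit K-means and attach cluster assignments to the rows that were kept.
model = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = model.fit_predict(scaled)
results = features.assign(cluster=labels)

# 4. Report internal quality metrics and save artifacts.
print("Silhouette score:", silhouette_score(scaled, labels))
print("Davies-Bouldin index:", davies_bouldin_score(scaled, labels))
results.to_csv("clustering_results.csv", index=False)
```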
## When to Use This Skill
This skill activates when you need to:
- Identify distinct groups within a dataset.
- Perform a cluster analysis to understand data structure.
- Run K-means, DBSCAN, or hierarchical clustering on a given dataset.
## Examples
### Example 1: Customer Segmentation
User request: "Run clustering on this customer data to identify customer segments. The data is in customer_data.csv."
The skill will:
1. Load the customer_data.csv dataset.
2. Perform K-means clustering to identify distinct customer segments based on their attributes.
3. Provide a visualization of the customer segments and their characteristics.
### Example 2: Anomaly Detection
User request: "Perform DBSCAN clustering on this network traffic data to identify anomalies. The data is available at network_traffic.txt."
The skill will:
1. Load the network_traffic.txt dataset.
2. Perform DBSCAN clustering to identify outliers representing anomalous network traffic (see the sketch after this list).
3. Report the identified anomalies and their characteristics.
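
A minimal sketch of this DBSCAN flow, assuming the traffic file is comma-delimited with numeric columns; the file name, eps, and min_samples values are illustrative assumptions:

```python
# Hedged sketch of Example 2; file name and parameters are assumptions.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import DBSCAN

traffic = pd.read_csv("network_traffic.txt")  # assumes comma-delimited text
numeric = traffic.select_dtypes(include="number").dropna()

# Scale so eps operates on comparable feature ranges.
scaled = StandardScaler().fit_transform(numeric)

# DBSCAN labels points outside every dense region as -1 (noise/anomalies).
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(scaled)
anomalies = numeric[labels == -1]
print(f"Flagged {len(anomalies)} of {len(numeric)} records as anomalous.")
```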
## Best Practices
- **Data Preprocessing**: Always preprocess the data (e.g., scaling, normalization) before applying clustering algorithms to improve performance and accuracy.
- **Algorithm Selection**: Choose the appropriate clustering algorithm based on the data characteristics and the desired outcome. K-means is suitable for spherical clusters, while DBSCAN is better for non-spherical clusters and anomaly detection.
- **Parameter Tuning**: Tune the parameters of the clustering algorithm (e.g., number of clusters in K-means, epsilon and min_samples in DBSCAN) to optimize the results; one illustrative approach is sketched below.
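
One way to implement parameter tuning for K-means, as a sketch; the candidate range is an assumption, and pre-scaled features are expected:

```python
# Illustrative helper: pick k by maximizing the silhouette score.
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def best_k(scaled_features, k_range=range(2, 11)):
    """Return the k with the highest silhouette score, plus all scores."""
    scores = {}
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(scaled_features)
        scores[k] = silhouette_score(scaled_features, labels)
    return max(scores, key=scores.get), scores
```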
## Integration
This skill can be integrated with data loading skills to retrieve datasets from various sources. It can also be combined with visualization skills to generate insightful visualizations of the clustering results.

skills/clustering-algorithm-runner/assets/README.md Normal file

@@ -0,0 +1,7 @@
# Assets
Bundled resources for clustering-algorithm-runner skill
- [ ] example_data.csv: Example CSV dataset for testing clustering algorithms.
- [ ] clustering_visualization.ipynb: Jupyter notebook demonstrating how to visualize clustering results using matplotlib and seaborn.
- [ ] config_template.json: Template JSON file for configuring clustering algorithm parameters.

skills/clustering-algorithm-runner/assets/clustering_visualization.ipynb Normal file

@@ -0,0 +1,98 @@
# -*- coding: utf-8 -*-
"""
Clustering Visualization Notebook
This notebook demonstrates how to visualize clustering results using matplotlib and seaborn.
It assumes that you have already run a clustering algorithm and have cluster assignments for your data.
Instructions:
1. Replace the placeholder data loading and clustering results with your actual data.
2. Adjust the visualization parameters (e.g., colors, markers) to suit your data and preferences.
3. Experiment with different visualization techniques to gain insights into your clustering results.
"""
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np  # used below to generate sample data
# --- Placeholder: Load your data here ---
# Replace this with your actual data loading code
# For example:
# data = pd.read_csv("your_data.csv")
# features = data[["feature1", "feature2"]] # Select the features used for clustering
# Generate some sample data if no data is loaded
np.random.seed(42) # for reproducibility
num_samples = 100
features = pd.DataFrame({
    'feature1': np.random.rand(num_samples),
    'feature2': np.random.rand(num_samples)
})
# --- Placeholder: Load your clustering results here ---
# Replace this with your actual cluster assignments
# For example:
# cluster_labels = model.labels_ # Assuming you used scikit-learn
# Or:
# cluster_labels = your_clustering_function(features)
# Generate some sample cluster labels if no cluster labels are loaded
num_clusters = 3
cluster_labels = np.random.randint(0, num_clusters, num_samples)
# --- Create a DataFrame for visualization ---
df = features.copy()
df['cluster'] = cluster_labels
# --- Visualization using matplotlib ---
plt.figure(figsize=(8, 6))
plt.title("Clustering Visualization (Matplotlib)")
# Define colors for each cluster
colors = ['red', 'green', 'blue', 'purple', 'orange'] # Add more colors if needed
for cluster_id in df['cluster'].unique():
    cluster_data = df[df['cluster'] == cluster_id]
    plt.scatter(cluster_data['feature1'], cluster_data['feature2'],
                color=colors[cluster_id % len(colors)],  # cycle through colors
                label=f'Cluster {cluster_id}')
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.legend()
plt.show()
# --- Visualization using seaborn ---
plt.figure(figsize=(8, 6))
plt.title("Clustering Visualization (Seaborn)")
sns.scatterplot(x='feature1', y='feature2', hue='cluster', data=df, palette='viridis') # or other palettes
plt.show()
# --- Additional Visualizations ---
# You can add more visualizations here, such as:
# - Pair plots
# - Box plots
# - Histograms
# Example: Pair plot
# sns.pairplot(df, hue='cluster')
# plt.show()
# --- Summary and Interpretation ---
# Add your interpretation of the clustering results here.
# For example:
# "The clustering algorithm has successfully separated the data into distinct groups based on feature1 and feature2."
# "Cluster 0 represents the group with low values for both feature1 and feature2."
# "Cluster 1 represents the group with high values for feature1 and low values for feature2."
# "Cluster 2 represents the group with high values for both feature1 and feature2."
# --- Next Steps ---
# Consider the following next steps:
# - Evaluate the clustering performance using metrics like silhouette score or Davies-Bouldin index.
# - Tune the clustering algorithm parameters to improve the results.
# - Investigate the characteristics of each cluster to gain insights into the data.
# - Use the clustering results for downstream tasks, such as customer segmentation or anomaly detection.
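# --- Worked example of the evaluation step suggested above ---
# A hedged sketch using the sample `features` and `cluster_labels`
# defined earlier in this notebook; with real data, pass your own arrays.
from sklearn.metrics import silhouette_score, davies_bouldin_score
print("Silhouette score:", silhouette_score(features, cluster_labels))
print("Davies-Bouldin index:", davies_bouldin_score(features, cluster_labels))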

skills/clustering-algorithm-runner/assets/config_template.json Normal file

@@ -0,0 +1,63 @@
{
  "_comment": "Configuration template for clustering algorithms",
  "dataset_name": "example_dataset.csv",
  "_comment_dataset_name": "Path to the dataset file (CSV format)",
  "algorithm": "KMeans",
  "_comment_algorithm": "Clustering algorithm to use (KMeans, DBSCAN, or AgglomerativeClustering)",
  "algorithm_params": {
    "_comment": "Parameters specific to the chosen algorithm",
    "KMeans": {
      "n_clusters": 3,
      "_comment_n_clusters": "Number of clusters for KMeans",
      "init": "k-means++",
      "_comment_init": "Initialization method for KMeans",
      "max_iter": 300,
      "_comment_max_iter": "Maximum number of iterations for KMeans"
    },
    "DBSCAN": {
      "eps": 0.5,
      "_comment_eps": "Maximum distance between two samples for them to be considered neighbors in DBSCAN",
      "min_samples": 5,
      "_comment_min_samples": "Number of samples in a neighborhood for a point to count as a core point in DBSCAN"
    },
    "AgglomerativeClustering": {
      "n_clusters": 3,
      "_comment_n_clusters": "Number of clusters to find for AgglomerativeClustering",
      "linkage": "ward",
      "_comment_linkage": "Linkage criterion: determines which distance to use between sets of observations; the algorithm merges the pair of clusters that minimizes this criterion",
      "affinity": "euclidean",
      "_comment_affinity": "Metric used to compute the linkage. Can be 'euclidean', 'l1', 'l2', 'manhattan', 'cosine', or 'precomputed'. If linkage is 'ward', only 'euclidean' is accepted."
    }
  },
  "preprocessing": {
    "_comment": "Preprocessing steps to apply to the dataset",
    "scale_data": true,
    "_comment_scale_data": "Whether to scale the data using StandardScaler",
    "handle_missing_values": "impute",
    "_comment_handle_missing_values": "How to handle missing values ('drop', 'impute', or null for none)",
    "imputation_strategy": "mean",
    "_comment_imputation_strategy": "Strategy to use for imputation ('mean', 'median', or 'most_frequent')"
  },
  "output": {
    "_comment": "Output configuration",
    "output_file": "clustering_results.csv",
    "_comment_output_file": "Name of the output CSV file that stores the cluster assignments",
    "include_original_data": true,
    "_comment_include_original_data": "Whether to include the original data in the output file along with cluster assignments"
  },
  "evaluation_metrics": [
    "silhouette_score",
    "davies_bouldin_score",
    "calinski_harabasz_score"
  ],
  "_comment_evaluation_metrics": "Internal metrics (silhouette score, Davies-Bouldin index, Calinski-Harabasz index) computed from the features and cluster assignments alone; none requires ground-truth labels",
  "advanced_options": {
    "_comment": "Advanced options for controlling the execution",
    "random_state": 42,
    "_comment_random_state": "Random seed for reproducibility",
    "n_jobs": -1,
    "_comment_n_jobs": "Number of CPU cores to use (-1 for all cores)"
  }
}
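
For orientation, a hedged sketch of how a runner script might consume this template; the loader below is an assumption for illustration, not part of the plugin (note that newer scikit-learn versions replace AgglomerativeClustering's `affinity` parameter with `metric`):

```python
# Hypothetical loader for config_template.json; not shipped with this plugin.
import json
from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering

ALGORITHMS = {
    "KMeans": KMeans,
    "DBSCAN": DBSCAN,
    "AgglomerativeClustering": AgglomerativeClustering,
}

with open("config_template.json") as f:
    config = json.load(f)

name = config["algorithm"]
# Drop the "_comment*" keys so only real parameters reach the estimator.
params = {
    k: v
    for k, v in config["algorithm_params"][name].items()
    if not k.startswith("_comment")
}
model = ALGORITHMS[name](**params)
print(f"Configured {name} with {params}")
```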

skills/clustering-algorithm-runner/assets/example_data.csv Normal file

@@ -0,0 +1,24 @@
# Example CSV data for clustering algorithm testing.
#
# This file demonstrates the expected format for input data to the clustering algorithms.
# Each row represents a data point, and each column represents a feature.
#
# Replace the placeholder data below with your own dataset.
# You can add or remove columns as needed, but ensure the column headers are descriptive.
#
# Note: The first non-comment row MUST be the header row containing column names.
#
# Example usage:
# 1. Save this file as 'example_data.csv' or a similar descriptive name.
# 2. Run the clustering algorithm using the command: /run-clustering
# (The plugin will prompt you for the data file name if it isn't the default).
feature_1,feature_2,feature_3
1.0,2.0,3.0
4.0,5.0,6.0
7.0,8.0,9.0
10.0,11.0,12.0
13.0,14.0,15.0
# Add more data points here...
# Example: 16.0,17.0,18.0
# Example: 19.0,20.0,21.0
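
Because this file uses `#` comment lines, a plain CSV reader needs to be told to skip them; for example, with pandas (a sketch, assuming the default file name):

```python
# Sketch: comment="#" makes pandas skip the explanatory lines in this file.
import pandas as pd

df = pd.read_csv("example_data.csv", comment="#")
print(df.head())  # columns: feature_1, feature_2, feature_3
```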

skills/clustering-algorithm-runner/references/README.md Normal file

@@ -0,0 +1,9 @@
# References
Bundled resources for clustering-algorithm-runner skill
- [ ] kmeans_algorithm.md: Detailed explanation of the K-means algorithm, its parameters, and best practices.
- [ ] dbscan_algorithm.md: Detailed explanation of the DBSCAN algorithm, its parameters, and best practices.
- [ ] hierarchical_clustering.md: Detailed explanation of hierarchical clustering, its parameters, and best practices.
- [ ] clustering_metrics.md: Explanation of various clustering metrics (silhouette score, Davies-Bouldin index) and how to interpret them.
- [ ] data_preprocessing.md: Best practices for data preprocessing before clustering, including scaling and normalization.

skills/clustering-algorithm-runner/scripts/README.md Normal file

@@ -0,0 +1,9 @@
# Scripts
Bundled resources for clustering-algorithm-runner skill
- [ ] run_kmeans.py: Executes K-means clustering with configurable parameters.
- [ ] run_dbscan.py: Executes DBSCAN clustering with configurable parameters.
- [ ] run_hierarchical.py: Executes hierarchical clustering with configurable parameters.
- [ ] data_loader.py: Loads data from various formats (CSV, JSON) into a standardized format for clustering.
- [ ] cluster_evaluator.py: Evaluates the performance of clustering algorithms using metrics like silhouette score and Davies-Bouldin index.