Initial commit

Zhongwei Li
2025-11-29 18:51:40 +08:00
commit 9174377a09
9 changed files with 344 additions and 0 deletions

@@ -0,0 +1,55 @@
---
name: evaluating-machine-learning-models
description: |
This skill allows Claude to evaluate machine learning models using a comprehensive suite of metrics. It should be used when the user requests model performance analysis, validation, or testing. Claude can use this skill to assess model accuracy, precision, recall, F1-score, and other relevant metrics. Trigger this skill when the user mentions "evaluate model", "model performance", "testing metrics", "validation results", or requests a comprehensive "model evaluation".
allowed-tools: Read, Write, Edit, Grep, Glob, Bash
version: 1.0.0
---
## Overview
This skill enables Claude to perform thorough evaluations of machine learning models and report detailed performance insights. It uses the `model-evaluation-suite` plugin to generate a range of metrics that support informed decisions about model selection and optimization.
## How It Works
1. **Analyzing Context**: Claude analyzes the user's request to identify the model to be evaluated and any specific metrics of interest.
2. **Executing Evaluation**: Claude uses the `/eval-model` command to initiate the model evaluation process within the `model-evaluation-suite` plugin (a rough sketch of the underlying metric computation follows these steps).
3. **Presenting Results**: Claude presents the generated metrics and insights to the user, highlighting key performance indicators and potential areas for improvement.
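For orientation, a minimal sketch of the kind of computation the evaluation step performs is shown below: it reads ground-truth labels and predictions and computes the metric suite listed in the description with scikit-learn. The CSV file names are hypothetical, and this is an illustration, not the `/eval-model` command's actual implementation.
```python
# Illustrative approximation of the evaluation step (not the plugin's code).
# Assumes two hypothetical single-column CSVs with matching row order.
import pandas as pd
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    precision_recall_fscore_support,
)

y_true = pd.read_csv("ground_truth.csv").iloc[:, 0]
y_pred = pd.read_csv("predictions.csv").iloc[:, 0]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0
)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
print(classification_report(y_true, y_pred, zero_division=0))
```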
## When to Use This Skill
This skill activates when you need to:
- Assess the performance of a machine learning model.
- Compare the performance of multiple models.
- Identify areas where a model can be improved.
- Validate a model's performance before deployment.
## Examples
### Example 1: Evaluating Model Accuracy
User request: "Evaluate the accuracy of my image classification model."
The skill will:
1. Invoke the `/eval-model` command.
2. Analyze the model's performance on a held-out dataset.
3. Report the accuracy score and other relevant metrics (a minimal sketch follows).
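A minimal sketch of this flow, using scikit-learn's small digits dataset and a plain logistic-regression classifier as stand-ins for the user's image classification model (both are illustrative assumptions, not part of the plugin):
```python
# Sketch only: a stand-in classifier evaluated on a held-out split.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)  # small 8x8 digit images as a stand-in dataset
X_train, X_holdout, y_train, y_holdout = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = LogisticRegression(max_iter=2000).fit(X_train, y_train)
holdout_accuracy = accuracy_score(y_holdout, model.predict(X_holdout))
print(f"Held-out accuracy: {holdout_accuracy:.3f}")
```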
### Example 2: Comparing Model Performance
User request: "Compare the F1-score of model A and model B."
The skill will:
1. Invoke the `/eval-model` command for both models.
2. Extract the F1-score from the evaluation results.
3. Present a comparison of the F1-scores for model A and model B (see the sketch below).
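A minimal comparison sketch, assuming the shared ground truth and each model's predictions have already been exported to CSV (the file names and the choice of weighted averaging are assumptions):
```python
# Sketch only: compare two models' weighted F1-scores from saved predictions.
import pandas as pd
from sklearn.metrics import f1_score

y_true = pd.read_csv("ground_truth.csv").iloc[:, 0]          # hypothetical exports
preds_a = pd.read_csv("model_a_predictions.csv").iloc[:, 0]
preds_b = pd.read_csv("model_b_predictions.csv").iloc[:, 0]

f1_a = f1_score(y_true, preds_a, average="weighted")
f1_b = f1_score(y_true, preds_b, average="weighted")
print(f"Model A F1: {f1_a:.3f}")
print(f"Model B F1: {f1_b:.3f}")
print("Higher F1:", "Model A" if f1_a >= f1_b else "Model B")
```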
## Best Practices
- **Specify Metrics**: Clearly define the specific metrics of interest for the evaluation.
- **Data Validation**: Ensure the data used for evaluation is representative of the real-world data the model will encounter (a quick distribution check is sketched after this list).
- **Interpret Results**: Provide context and interpretation of the evaluation results to facilitate informed decision-making.
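One quick form of that data validation is to compare label proportions between the training data and the evaluation data; the sketch below assumes hypothetical `train.csv` and `eval.csv` files with a `label` column:
```python
# Sketch only: flag labels whose frequency differs most between train and eval.
import pandas as pd

train = pd.read_csv("train.csv")       # hypothetical file names and column
evaluation = pd.read_csv("eval.csv")

comparison = pd.DataFrame({
    "train": train["label"].value_counts(normalize=True),
    "eval": evaluation["label"].value_counts(normalize=True),
}).fillna(0.0)
comparison["abs_diff"] = (comparison["train"] - comparison["eval"]).abs()
print(comparison.sort_values("abs_diff", ascending=False))
```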
## Integration
This skill integrates with the `model-evaluation-suite` plugin to provide model evaluation within the Claude Code environment, and it can be combined with other skills to build automated machine-learning workflows.

@@ -0,0 +1,7 @@
# Assets
Bundled resources for the `model-evaluation-suite` skill.
- [ ] evaluation_template.md: Template for generating evaluation reports with placeholders for metrics and visualizations.
- [ ] example_dataset.csv: Example dataset for testing the evaluation process.
- [ ] visualization_script.py: Script to generate visualizations of model performance metrics.

@@ -0,0 +1,170 @@
#!/usr/bin/env python3
"""
visualization_script.py
This script generates visualizations of model performance metrics.
It supports various plot types and data formats.
Example Usage:
To generate a scatter plot of predicted vs. actual values:
python visualization_script.py --plot_type scatter --actual_values actual.csv --predicted_values predicted.csv --output scatter_plot.png
To generate a histogram of errors:
python visualization_script.py --plot_type histogram --errors errors.csv --output error_histogram.png
"""
import argparse
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


def generate_scatter_plot(actual_values_path, predicted_values_path, output_path):
"""
Generates a scatter plot of actual vs. predicted values.
Args:
actual_values_path (str): Path to the CSV file containing actual values.
predicted_values_path (str): Path to the CSV file containing predicted values.
output_path (str): Path to save the generated plot.
"""
try:
actual_values = pd.read_csv(actual_values_path).values.flatten()
predicted_values = pd.read_csv(predicted_values_path).values.flatten()
plt.figure(figsize=(10, 8))
sns.scatterplot(x=actual_values, y=predicted_values)
plt.xlabel("Actual Values")
plt.ylabel("Predicted Values")
plt.title("Actual vs. Predicted Values")
plt.savefig(output_path)
plt.close()
print(f"Scatter plot saved to {output_path}")
except FileNotFoundError as e:
print(f"Error: File not found: {e}")
except Exception as e:
print(f"Error generating scatter plot: {e}")


def generate_histogram(errors_path, output_path):
"""
Generates a histogram of errors.
Args:
errors_path (str): Path to the CSV file containing errors.
output_path (str): Path to save the generated plot.
"""
try:
errors = pd.read_csv(errors_path).values.flatten()
plt.figure(figsize=(10, 8))
sns.histplot(errors, kde=True) # Add kernel density estimate
plt.xlabel("Error")
plt.ylabel("Frequency")
plt.title("Distribution of Errors")
plt.savefig(output_path)
plt.close()
print(f"Histogram saved to {output_path}")
except FileNotFoundError as e:
print(f"Error: File not found: {e}")
except Exception as e:
print(f"Error generating histogram: {e}")


def generate_residual_plot(actual_values_path, predicted_values_path, output_path):
"""
Generates a residual plot.
Args:
actual_values_path (str): Path to the CSV file containing actual values.
predicted_values_path (str): Path to the CSV file containing predicted values.
output_path (str): Path to save the generated plot.
"""
try:
actual_values = pd.read_csv(actual_values_path).values.flatten()
predicted_values = pd.read_csv(predicted_values_path).values.flatten()
residuals = actual_values - predicted_values
plt.figure(figsize=(10, 8))
sns.scatterplot(x=predicted_values, y=residuals)
plt.xlabel("Predicted Values")
plt.ylabel("Residuals")
plt.title("Residual Plot")
plt.axhline(y=0, color='r', linestyle='--') # Add a horizontal line at y=0
plt.savefig(output_path)
plt.close()
print(f"Residual plot saved to {output_path}")
except FileNotFoundError as e:
print(f"Error: File not found: {e}")
except Exception as e:
print(f"Error generating residual plot: {e}")


def main():
"""
Main function to parse arguments and generate visualizations.
"""
parser = argparse.ArgumentParser(
description="Generate visualizations of model performance metrics."
)
parser.add_argument(
"--plot_type",
type=str,
required=True,
choices=["scatter", "histogram", "residual"],
help="Type of plot to generate (scatter, histogram, residual).",
)
parser.add_argument(
"--actual_values",
type=str,
help="Path to the CSV file containing actual values (required for scatter and residual plots).",
)
parser.add_argument(
"--predicted_values",
type=str,
help="Path to the CSV file containing predicted values (required for scatter and residual plots).",
)
parser.add_argument(
"--errors",
type=str,
help="Path to the CSV file containing errors (required for histogram).",
)
parser.add_argument(
"--output", type=str, required=True, help="Path to save the generated plot."
)
args = parser.parse_args()
if args.plot_type == "scatter":
if not args.actual_values or not args.predicted_values:
print(
"Error: --actual_values and --predicted_values are required for scatter plots."
)
return
generate_scatter_plot(args.actual_values, args.predicted_values, args.output)
elif args.plot_type == "histogram":
if not args.errors:
print("Error: --errors is required for histograms.")
return
generate_histogram(args.errors, args.output)
elif args.plot_type == "residual":
if not args.actual_values or not args.predicted_values:
print(
"Error: --actual_values and --predicted_values are required for residual plots."
)
return
generate_residual_plot(args.actual_values, args.predicted_values, args.output)


if __name__ == "__main__":
main()

@@ -0,0 +1,7 @@
# References
Bundled resources for the `model-evaluation-suite` skill.
- [ ] metrics_definitions.md: Detailed definitions and explanations of all supported evaluation metrics.
- [ ] dataset_schemas.md: Schemas for supported datasets, including required fields and data types.
- [ ] model_api_documentation.md: Documentation for the model API, including input/output formats and authentication details.

@@ -0,0 +1,7 @@
# Scripts
Bundled resources for the `model-evaluation-suite` skill.
- [ ] evaluate_model.py: Script to execute model evaluation using specified metrics and datasets.
- [ ] data_loader.py: Script to load the model and datasets for evaluation (a hypothetical loading sketch follows this list).
- [ ] metrics_calculator.py: Script to calculate evaluation metrics based on model predictions and ground truth.
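As a rough illustration of the planned `data_loader.py` (not yet implemented in this commit), a loader might look like the sketch below; the joblib serialization format, the file names, and the label-in-last-column layout are assumptions:
```python
# Hypothetical sketch of data_loader.py; adjust paths and layout to the real data.
import joblib
import pandas as pd


def load_model_and_data(model_path: str, dataset_path: str):
    """Load a serialized model and split a CSV dataset into features and labels."""
    model = joblib.load(model_path)             # assumes a joblib-serialized model
    data = pd.read_csv(dataset_path)
    X, y = data.iloc[:, :-1], data.iloc[:, -1]  # assumes the label is the last column
    return model, X, y


if __name__ == "__main__":
    model, X, y = load_model_and_data("model.joblib", "example_dataset.csv")
    print(type(model).__name__, X.shape, y.shape)
```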