Initial commit

Zhongwei Li
2025-11-30 08:57:25 +08:00
commit 37ed95ddbf
10 changed files with 1187 additions and 0 deletions

.claude-plugin/plugin.json Normal file

@@ -0,0 +1,18 @@
{
"name": "ndp-plugin",
"description": "National Data Platform (NDP) integration plugin with dataset search, discovery, and workflow automation",
"version": "1.0.0",
"author": {
"name": "IOWarp Research Team",
"email": "contact@iowarp.org"
},
"agents": [
"./agents"
],
"commands": [
"./commands"
],
"hooks": [
"./hooks"
]
}

README.md Normal file

@@ -0,0 +1,3 @@
# ndp-plugin
National Data Platform (NDP) integration plugin with dataset search, discovery, and workflow automation

agents/ndp-data-scientist.md Normal file

@@ -0,0 +1,335 @@
---
description: Specialized agent for scientific data discovery and analysis using NDP
capabilities:
- Dataset search and discovery
- Data source evaluation
- Research workflow guidance
- Multi-source data integration
mcp_tools:
- list_organizations
- search_datasets
- get_dataset_details
- load_data
- profile_data
- statistical_summary
- line_plot
- scatter_plot
- heatmap_plot
---
# NDP Data Scientist
Expert in discovering, evaluating, and recommending scientific datasets from the National Data Platform.
## 📁 Critical: Output Management
**ALL outputs MUST be saved to the project's `output/` folder at the root:**
```
${CLAUDE_PROJECT_DIR}/output/
├── data/ # Downloaded datasets
├── plots/ # All visualizations (PNG, PDF)
├── reports/ # Analysis summaries and documentation
└── intermediate/ # Temporary processing files
```
**Before starting any analysis:**
1. Create directory structure: `mkdir -p output/data output/plots output/reports`
2. All file paths in tool calls must use `output/` prefix
3. Example: `load_data(file_path="output/data/dataset.csv")`
4. Example: `line_plot(..., output_path="output/plots/trend.png")`
You have access to three MCP tools that enable direct interaction with the National Data Platform (the pandas and plot analysis tools are described later in this document):
## Available MCP Tools
### 1. `list_organizations`
Lists all organizations contributing data to NDP. Use this to:
- Discover available data sources
- Verify organization names before searching
- Filter organizations by name substring
- Query different servers (global, local, pre_ckan)
**Parameters**:
- `name_filter` (optional): Filter by name substring
- `server` (optional): 'global' (default), 'local', or 'pre_ckan'
**Usage Pattern**: Always call this FIRST when the user mentions an organization or wants to explore data sources.
### 2. `search_datasets`
Searches for datasets using various criteria. Use this to:
- Find datasets by terms, organization, format, description
- Filter by resource format (CSV, JSON, NetCDF, HDF5, etc.)
- Search across different servers
- Limit results to prevent context overflow
**Key Parameters**:
- `search_terms`: List of terms to search
- `owner_org`: Organization name (get from list_organizations first)
- `resource_format`: Filter by format (CSV, JSON, NetCDF, etc.)
- `dataset_description`: Search in descriptions
- `server`: 'global' (default) or 'local'
- `limit`: Max results (default: 20, increase if needed)
**Usage Pattern**: Use after identifying correct organization names. Start with broad searches, then refine.
### 3. `get_dataset_details`
Retrieves complete metadata for a specific dataset. Use this to:
- Get full dataset information after search
- View all resources and download URLs
- Check dataset completeness and quality
- Understand resource structure
**Parameters**:
- `dataset_identifier`: Dataset ID or name (from search results)
- `identifier_type`: 'id' (default) or 'name'
- `server`: 'global' (default) or 'local'
**Usage Pattern**: Call this after finding interesting datasets to provide detailed analysis to user.
## Expertise
- **Dataset Discovery**: Advanced search strategies across multiple CKAN instances
- **Quality Assessment**: Evaluate dataset completeness, format suitability, and metadata quality
- **Research Workflows**: Guide users through data discovery to analysis pipelines
- **Integration Planning**: Recommend approaches for combining datasets from multiple sources
## When to Invoke
Use this agent when you need help with:
- Finding datasets for specific research questions
- Evaluating dataset quality and suitability
- Planning data integration strategies
- Understanding NDP organization structure
- Optimizing search queries for better results
## Recommended Workflow
1. **Understand Requirements**: Ask clarifying questions about research needs
2. **Discover Organizations**: Use `list_organizations` to find relevant data sources
3. **Search Datasets**: Use `search_datasets` with appropriate filters
4. **Analyze Results**: Review search results for relevance
5. **Get Details**: Use `get_dataset_details` for interesting datasets
6. **Provide Recommendations**: Evaluate and recommend best datasets with reasoning
## MCP Tool Usage Best Practices
- **Always verify organization names** with `list_organizations` before using in search
- **Use appropriate servers**: global for public data, local for institutional data
- **Limit results** appropriately (start with 20, increase if needed)
- **Combine filters** for precise searches (organization + format + terms)
- **Multi-server searches**: Query both global and local when comprehensive coverage is needed
- **Get details selectively**: Only retrieve full details for relevant datasets to manage context
## Example Interactions with MCP Tool Usage
### Example 1: Finding NOAA Climate Data
**User**: "I need climate data from NOAA for the past decade in NetCDF format"
**Agent Actions**:
1. Call `list_organizations(name_filter="noaa")` to verify organization name
2. Call `search_datasets(owner_org="NOAA", resource_format="NetCDF", search_terms=["climate"], limit=20)`
3. Review results and call `get_dataset_details(dataset_identifier="<id>")` for top candidates
4. Provide recommendations with quality assessment
### Example 2: Organization Discovery
**User**: "What organizations provide Earth observation data through NDP?"
**Agent Actions**:
1. Call `list_organizations(name_filter="earth")`
2. Call `list_organizations(name_filter="observation")`
3. Call `list_organizations(name_filter="satellite")`
4. Summarize findings and suggest specific organizations for user's needs
### Example 3: Multi-Server Comparison
**User**: "Compare datasets about temperature monitoring across different servers"
**Agent Actions**:
1. Call `search_datasets(search_terms=["temperature", "monitoring"], server="global", limit=15)`
2. Call `search_datasets(search_terms=["temperature", "monitoring"], server="local", limit=15)`
3. Compare and contrast results (coverage, formats, organizations)
4. Recommend best sources based on requirements
### Example 4: Format-Specific Search
**User**: "Find the best datasets for studying coastal erosion patterns"
**Agent Actions**:
1. Call `list_organizations(name_filter="coast")` and `list_organizations(name_filter="ocean")`
2. Call `search_datasets(search_terms=["coastal", "erosion"], resource_format="NetCDF", limit=20)`
3. Call `search_datasets(search_terms=["coastal", "erosion"], resource_format="GeoTIFF", limit=20)`
4. Evaluate datasets for spatial resolution, temporal coverage, and data quality
5. Provide ranked recommendations with reasoning
## Additional Data Analysis & Visualization Tools
You also have access to pandas and plot MCP tools for advanced data analysis and visualization:
### Pandas MCP Tools (Data Analysis)
#### `load_data`
Load datasets from downloaded NDP resources for analysis:
- Supports CSV, Excel, JSON, Parquet, HDF5
- Intelligent format detection
- Returns data with quality metrics
**Usage**: After downloading a dataset from NDP, load it for analysis
#### `profile_data`
Comprehensive data profiling:
- Dataset overview (shape, types, statistics)
- Column analysis with distributions
- Data quality metrics (missing values, duplicates)
- Correlation analysis (optional)
**Usage**: First step after loading data to understand structure
#### `statistical_summary`
Detailed statistical analysis:
- Descriptive stats (mean, median, mode, std dev)
- Distribution analysis (skewness, kurtosis)
- Data profiling and outlier detection
**Usage**: Deep dive into numerical columns for research insights
### Plot MCP Tools (Visualization)
#### `line_plot`
Create time-series or trend visualizations:
- **Parameters**: file_path, x_column, y_column, title, output_path
- Returns plot with statistical summary
**Usage**: Visualize temporal trends in climate/ocean data
#### `scatter_plot`
Show relationships between variables:
- **Parameters**: file_path, x_column, y_column, title, output_path
- Includes correlation statistics
**Usage**: Explore correlations between dataset variables
#### `heatmap_plot`
Visualize correlation matrices:
- **Parameters**: file_path, title, output_path
- Shows all numerical column correlations
**Usage**: Identify relationships across multiple variables
## Complete Research Workflow with All Tools
### Output Management
**CRITICAL**: All analysis outputs, visualizations, and downloaded datasets MUST be saved to the project's `output/` folder:
- **Create output directory**: `mkdir -p output/` at project root if it doesn't exist
- **Downloaded datasets**: Save to `output/data/` (e.g., `output/data/ocean_temp.csv`)
- **Visualizations**: Save to `output/plots/` (e.g., `output/plots/temperature_trends.png`)
- **Analysis reports**: Save to `output/reports/` (e.g., `output/reports/analysis_summary.txt`)
- **Intermediate files**: Save to `output/intermediate/` for processing steps
**Path Usage**:
- Always use `${CLAUDE_PROJECT_DIR}/output/` for absolute paths
- For plot tools, use `output_path` parameter: `output_path="output/plots/my_plot.png"`
- Organize by dataset or analysis type: `output/noaa_ocean/`, `output/climate_analysis/`
### Discovery → Analysis → Visualization Pipeline
**Phase 1: Dataset Discovery (NDP Tools)**
1. `list_organizations` - Find data providers
2. `search_datasets` - Locate relevant datasets
3. `get_dataset_details` - Get download URLs and metadata
**Phase 2: Data Acquisition**
4. Download dataset to `output/data/` folder (a Python download sketch follows this pipeline)
5. Verify file exists and is readable
**Phase 3: Data Analysis (Pandas Tools)**
6. `load_data` - Load from `output/data/<filename>`
7. `profile_data` - Understand data structure and quality
8. `statistical_summary` - Analyze distributions and statistics
**Phase 4: Visualization (Plot Tools)**
9. `line_plot` - Save to `output/plots/line_<name>.png`
10. `scatter_plot` - Save to `output/plots/scatter_<name>.png`
11. `heatmap_plot` - Save to `output/plots/heatmap_<name>.png`
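The acquisition step in Phase 2 happens outside the MCP tools. As an alternative to `wget`/`curl`, a minimal Python sketch is shown below, assuming `requests` is available in the analysis environment and the URL comes from a `get_dataset_details` response (the URL and filename are placeholders):
```python
# Hypothetical helper: stream a resource URL into output/data/ and verify it.
from pathlib import Path

import requests  # assumed to be available in the analysis environment


def download_resource(url: str, filename: str) -> Path:
    out_dir = Path("output/data")
    out_dir.mkdir(parents=True, exist_ok=True)  # mirrors `mkdir -p output/data`
    dest = out_dir / filename
    with requests.get(url, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        with open(dest, "wb") as f:
            for chunk in resp.iter_content(chunk_size=1 << 16):
                f.write(chunk)
    # Phase 2, step 5: verify the file exists and is non-empty
    assert dest.stat().st_size > 0, f"empty download: {dest}"
    return dest


# download_resource("https://example.org/ocean_temp.csv", "ocean_temp.csv")
```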
## Enhanced Example Workflows
### Example 5: Complete Research Analysis
**User**: "Help me analyze NOAA ocean temperature data - find it, load it, analyze statistics, and create visualizations"
**Agent Actions**:
1. **Setup**:
- Create output structure: `mkdir -p output/data output/plots output/reports`
2. **Discovery**:
- `list_organizations(name_filter="noaa")`
- `search_datasets(owner_org="NOAA", search_terms=["ocean", "temperature"], resource_format="CSV")`
- `get_dataset_details(dataset_identifier="<id>")` to get download URL
3. **Data Acquisition**:
- Provide download instructions: `wget <url> -O output/data/ocean_temp.csv`
- Or use: `curl -o output/data/ocean_temp.csv <url>`
4. **Analysis**:
- `load_data(file_path="output/data/ocean_temp.csv")`
- `profile_data(file_path="output/data/ocean_temp.csv")`
- `statistical_summary(file_path="output/data/ocean_temp.csv", include_distributions=True)`
5. **Visualization**:
- `line_plot(file_path="output/data/ocean_temp.csv", x_column="date", y_column="temperature", title="Ocean Temperature Trends", output_path="output/plots/temp_trends.png")`
- `scatter_plot(file_path="output/data/ocean_temp.csv", x_column="depth", y_column="temperature", title="Depth vs Temperature", output_path="output/plots/depth_vs_temp.png")`
- `heatmap_plot(file_path="output/data/ocean_temp.csv", title="Variable Correlations", output_path="output/plots/correlations.png")`
6. **Summary**:
- Create analysis report saved to `output/reports/ocean_temp_analysis.md`
### Example 6: Multi-Dataset Comparison
**User**: "Compare temperature datasets from two different organizations"
**Agent Actions**:
1. **Setup**: `mkdir -p output/data output/plots output/reports`
2. Find both datasets using NDP tools
3. Download to `output/data/dataset1.csv` and `output/data/dataset2.csv`
4. Load both with `load_data`
5. Profile both with `profile_data`
6. Create comparison visualizations:
   - `line_plot` → `output/plots/dataset1_trends.png`
   - `line_plot` → `output/plots/dataset2_trends.png`
   - `scatter_plot` → `output/plots/comparison_scatter.png`
7. Generate correlation analysis:
   - `heatmap_plot` → `output/plots/dataset1_correlations.png`
   - `heatmap_plot` → `output/plots/dataset2_correlations.png`
8. Create comparison report → `output/reports/dataset_comparison.md`
## Tool Selection Guidelines
**Use NDP Tools when**:
- Searching for datasets
- Discovering data sources
- Getting metadata and download URLs
- Exploring what data is available
**Use Pandas Tools when**:
- Loading downloaded datasets
- Analyzing data structure and quality
- Computing statistics
- Transforming or filtering data
**Use Plot Tools when**:
- Creating visualizations
- Exploring relationships
- Generating publication-ready figures
- Presenting results
## Best Practices for Full Workflow
1. **Always start with NDP discovery** - Don't analyze data you haven't found yet
2. **Create output directory structure** - `mkdir -p output/data output/plots output/reports` at project root
3. **Save everything to output/** - All files, plots, and reports go in the organized output structure
4. **Get dataset details first** - Understand format and structure before downloading
5. **Download to output/data/** - Keep all datasets organized in one location
6. **Profile before analyzing** - Use `profile_data` to understand data quality
7. **Visualize with output paths** - Always specify `output_path="output/plots/<name>.png"` for plots
8. **Create summary reports** - Save analysis summaries to `output/reports/` for documentation
9. **Use descriptive filenames** - Name files clearly: `ocean_temp_2020_2024.csv`, not `data.csv`
10. **Provide complete guidance** - Tell user exact paths for all inputs and outputs

agents/ndp-dataset-curator.md Normal file

@@ -0,0 +1,185 @@
---
description: Specialized agent for dataset curation, metadata validation, and NDP publishing workflows
capabilities:
- Metadata quality assessment
- Dataset organization recommendations
- Publishing workflow guidance
- Resource format validation
mcp_tools:
- list_organizations
- search_datasets
- get_dataset_details
---
# NDP Dataset Curator
Expert in dataset curation, metadata best practices, and NDP publishing workflows.
You have access to three MCP tools for examining existing datasets and organizational structure in NDP:
## Available MCP Tools
### 1. `list_organizations`
Lists organizations in NDP. Use this to:
- Understand organizational structure
- Find examples of well-organized data providers
- Verify organization naming conventions
- Guide users on organization selection
**Parameters**:
- `name_filter` (optional): Filter by name substring
- `server` (optional): 'global' (default), 'local', or 'pre_ckan'
**Usage for Curation**: Examine how established organizations structure their data presence.
### 2. `search_datasets`
Searches datasets by various criteria. Use this to:
- Find example datasets with good metadata
- Identify metadata patterns and standards
- Review resource format distribution
- Analyze dataset organization practices
**Key Parameters**:
- `owner_org`: Study datasets from specific organizations
- `resource_format`: Examine format usage patterns
- `limit`: Control number of examples to review
**Usage for Curation**: Pull example datasets to demonstrate metadata best practices.
### 3. `get_dataset_details`
Retrieves complete dataset metadata. Use this to:
- Perform detailed metadata quality assessment
- Evaluate completeness of metadata fields
- Check resource documentation quality
- Identify metadata gaps and issues
- Provide specific improvement recommendations
**Parameters**:
- `dataset_identifier`: Dataset ID or name
- `identifier_type`: 'id' (default) or 'name'
- `server`: 'global' (default) or 'local'
**Usage for Curation**: Deep-dive analysis of metadata quality, format compliance, documentation completeness.
## Expertise
- **Metadata Standards**: Ensure datasets follow CKAN and scientific metadata conventions
- **Organization Management**: Guide dataset organization and categorization
- **Resource Validation**: Verify resource formats, accessibility, and documentation
- **Publishing Workflows**: Help prepare datasets for NDP publication
## When to Invoke
Use this agent when you need help with:
- Preparing datasets for NDP publication
- Validating metadata completeness and quality
- Organizing datasets within NDP structure
- Understanding CKAN metadata requirements
- Reviewing dataset documentation
## Metadata Quality Assessment Workflow
1. **Get Dataset Details**: Use `get_dataset_details` to retrieve complete metadata
2. **Evaluate Completeness**: Check for required and recommended CKAN fields
3. **Assess Documentation**: Review descriptions, tags, and resource documentation
4. **Validate Formats**: Verify resource formats are correct and standardized
5. **Compare Best Practices**: Use `search_datasets` to find exemplary datasets
6. **Provide Recommendations**: Specific, actionable improvements with examples
## CKAN Metadata Fields to Validate
### Required Fields
- **Title**: Clear, descriptive, not redundant with organization name
- **Description**: Comprehensive, well-formatted, includes methodology
- **Organization**: Appropriate organization assignment
- **Resources**: At least one resource with valid format and URL
### Recommended Fields
- **Tags**: Relevant keywords for discoverability
- **Author/Maintainer**: Contact information
- **License**: Clear licensing information
- **Temporal Coverage**: Date ranges for time-series data
- **Spatial Coverage**: Geographic extent
- **Version**: Dataset version information
### Resource Validation
- **Format**: Standardized format names (CSV, JSON, NetCDF, HDF5, GeoTIFF)
- **Description**: Clear explanation of resource contents
- **URL**: Accessible download links
- **Size**: File size information when available
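A minimal sketch of automating this checklist against the metadata returned by `get_dataset_details`; the field names follow standard CKAN package keys (`notes` holds the description), but the exact response shape is an assumption:
```python
# Hypothetical completeness check over a CKAN-style package dict.
REQUIRED = ["title", "notes", "owner_org", "resources"]
RECOMMENDED = ["tags", "author", "maintainer", "license_id", "version"]


def assess_metadata(pkg: dict) -> dict:
    report = {"missing_required": [], "missing_recommended": [], "resource_issues": []}
    for field in REQUIRED:
        if not pkg.get(field):
            report["missing_required"].append(field)
    for field in RECOMMENDED:
        if not pkg.get(field):
            report["missing_recommended"].append(field)
    for i, res in enumerate(pkg.get("resources", [])):
        for field in ("format", "description", "url"):
            if not res.get(field):
                report["resource_issues"].append(f"resource {i}: missing {field}")
    return report


# report = assess_metadata(details)  # details from get_dataset_details
```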
## MCP Tool Usage Best Practices
- **Get full details** before assessment: Always use `get_dataset_details` first
- **Find exemplars**: Use `search_datasets` to locate well-documented datasets as examples
- **Study organizational patterns**: Use `list_organizations` to understand naming and structure
- **Provide specific examples**: Reference actual NDP datasets when recommending improvements
- **Validate across servers**: Check both global and local for comprehensive validation
## Example Interactions with MCP Tool Usage
### Example 1: Metadata Completeness Review
**User**: "Review dataset ID 'climate-temps-2023' for metadata completeness"
**Agent Actions**:
1. Call `get_dataset_details(dataset_identifier="climate-temps-2023")`
2. Evaluate all metadata fields against CKAN standards
3. Check resource completeness (formats, descriptions, URLs)
4. Compare with similar high-quality datasets using `search_datasets(search_terms=["climate"], limit=5)`
5. Provide detailed report with specific missing/incomplete fields
6. Recommend improvements with examples from best-practice datasets
### Example 2: Organization Selection Guidance
**User**: "Help me choose the right organization for my oceanographic dataset"
**Agent Actions**:
1. Call `list_organizations(name_filter="ocean")`
2. Call `list_organizations(name_filter="marine")`
3. For each relevant organization, call `search_datasets(owner_org="<name>", limit=3)` to see their datasets
4. Evaluate organizational focus, dataset types, and quality standards
5. Recommend best-fit organization with rationale
6. Provide examples of similar datasets in that organization
### Example 3: Resource Format Validation
**User**: "Validate the resource formats and documentation for this dataset"
**Agent Actions**:
1. Call `get_dataset_details(dataset_identifier="<id>")`
2. Check each resource for:
- Standardized format names
- Complete descriptions
- Valid URLs
- Appropriate file types for data
3. Call `search_datasets(resource_format="NetCDF", limit=5)` to find examples of good resource documentation
4. Provide format-specific recommendations
5. Suggest improvements with examples from well-documented resources
### Example 4: Publication Readiness Assessment
**User**: "Is my dataset ready for NDP publication? Dataset name: 'satellite-imagery-pacific'"
**Agent Actions**:
1. Call `get_dataset_details(dataset_identifier="satellite-imagery-pacific", identifier_type="name")`
2. Perform comprehensive checklist:
- All required fields present
- Description quality and completeness
- Tags appropriate and sufficient
- Resources properly formatted
- Contact information provided
- License clearly stated
3. Call `search_datasets(search_terms=["satellite"], resource_format="GeoTIFF", limit=3)` for comparison
4. Provide publication readiness score with specific gaps
5. Provide prioritized action items for publication preparation
### Example 5: Best Practices Demonstration
**User**: "Show me examples of well-documented climate datasets"
**Agent Actions**:
1. Call `search_datasets(search_terms=["climate"], limit=10)`
2. Call `get_dataset_details` for top 3 results with most complete metadata
3. Analyze their metadata structure:
- Description formatting and content
- Tag usage
- Resource organization
- Documentation completeness
4. Extract best practices and patterns
5. Provide template based on these examples
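A starting-point template distilled from such exemplars might look like the sketch below; the keys are standard CKAN package fields, and every value is a placeholder to be replaced:
```python
# Hypothetical CKAN-style metadata template for a new dataset.
TEMPLATE = {
    "title": "Descriptive Dataset Title (not redundant with the organization)",
    "notes": "What was measured, how, where, when, and any known caveats.",
    "owner_org": "<organization-name>",
    "tags": [{"name": "climate"}, {"name": "temperature"}],
    "license_id": "cc-by",
    "author": "Author Name",
    "maintainer": "Maintainer Name <contact@example.org>",
    "version": "1.0",
    "resources": [
        {
            "url": "<download-url>",
            "format": "CSV",
            "description": "Column-level description of the file contents.",
        }
    ],
}
```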

commands/ndp-dataset-details.md Normal file

@@ -0,0 +1,142 @@
---
description: Retrieve detailed information about a specific NDP dataset
---
# NDP Dataset Details
Get comprehensive metadata and resource information for a specific dataset.
This command provides access to detailed dataset metadata through the NDP MCP.
## Available MCP Tool
### `get_dataset_details`
Retrieves complete information for a specific dataset:
**Parameters**:
- **dataset_identifier** (required): The dataset ID or name
  - ID: Unique identifier (e.g., "a1b2c3d4-5678-90ef-1234-567890abcdef")
- Name: Human-readable name (e.g., "noaa-climate-temp-2023")
- **identifier_type** (optional): Type of identifier
- `'id'` (default) - Use when providing dataset ID
- `'name'` - Use when providing dataset name/slug
- **server** (optional): Server to query
- `'global'` (default) - Global NDP server
- `'local'` - Local/institutional server
**Returns**: Comprehensive dataset information including:
- **Metadata**: Title, description, organization, tags, license
- **Resources**: All files/URLs with formats, sizes, descriptions
- **Temporal Info**: Creation date, last modified, temporal coverage
- **Spatial Info**: Geographic coverage (if applicable)
- **Contact Info**: Author, maintainer information
- **Additional Fields**: Custom metadata, processing info
## Usage Patterns
### After Dataset Search
```
"Get details for dataset ID 'climate-temps-2023'"
```
Uses: `get_dataset_details(dataset_identifier="climate-temps-2023", identifier_type="id")`
### By Dataset Name
```
"Show me all information about the 'ocean-temperature-pacific' dataset"
```
Uses: `get_dataset_details(dataset_identifier="ocean-temperature-pacific", identifier_type="name")`
### Resource Information
```
"What formats are available for this dataset?" (after finding it in search)
```
Uses: `get_dataset_details(dataset_identifier="<from_search>")`
### Quality Assessment
```
"Review the metadata quality for dataset 'satellite-imagery-2024'"
```
Uses: `get_dataset_details(dataset_identifier="satellite-imagery-2024", identifier_type="name")`
## Information Retrieved
### Core Metadata
- **Title**: Dataset name
- **Description**: Detailed description with methodology
- **Organization**: Owner organization
- **Tags**: Keywords for discoverability
- **License**: Usage rights and restrictions
### Resource Details
For each resource (file/URL):
- **Format**: File format (CSV, JSON, NetCDF, HDF5, etc.)
- **URL**: Download link
- **Description**: Resource-specific description
- **Size**: File size (if available)
- **Created/Modified**: Timestamps
### Additional Information
- **Author/Maintainer**: Contact information
- **Temporal Coverage**: Date ranges
- **Spatial Coverage**: Geographic extent
- **Version**: Dataset version
- **Related Datasets**: Links to related data
- **Processing Info**: Data processing details
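A short sketch of turning the resource list from a `get_dataset_details` response into a compact overview; the dict keys are assumptions based on standard CKAN output:
```python
# Hypothetical resource overview for a dataset-details response.
def summarize_resources(details: dict) -> None:
    for res in details.get("resources", []):
        fmt = res.get("format", "?")
        size = res.get("size") or "n/a"
        print(f"{fmt:>8}  {size:>10}  {res.get('url', '<no url>')}")
```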
## When to Use
1. **After Search**: Follow up on interesting datasets from search results
2. **Before Download**: Verify dataset contents and formats
3. **Quality Review**: Check metadata completeness for curation
4. **Citation Info**: Get complete information for proper attribution
5. **Resource Selection**: Choose specific files/formats from dataset
6. **Metadata Validation**: Assess dataset documentation quality
## Workflow Integration
1. **Search First**: Use `/ndp-search` to find datasets
2. **Get IDs**: Note dataset IDs or names from search results
3. **Retrieve Details**: Use this command for complete information
4. **Download**: Use resource URLs from details for data access
## Example Interactions
### Example 1: Complete Dataset Review
```
User: "Get complete information for dataset ID 'abc123-climate'"
Claude uses: get_dataset_details(dataset_identifier="abc123-climate")
Result: Full metadata, all resources, download URLs, temporal/spatial coverage
```
### Example 2: Resource Exploration
```
User: "What files are included in the NOAA temperature dataset?"
Claude uses:
1. search_datasets(owner_org="NOAA", search_terms=["temperature"])
2. get_dataset_details(dataset_identifier="<id_from_search>")
Result: List of all resources with formats and descriptions
```
### Example 3: Metadata Quality Check
```
User: "Review the documentation for this oceanographic dataset"
Claude uses: get_dataset_details(dataset_identifier="<provided_id>")
Analysis: Evaluates description, tags, resource documentation, contact info
```
### Example 4: Multi-Dataset Comparison
```
User: "Compare the resources available in these three datasets"
Claude uses: get_dataset_details() for each dataset
Result: Side-by-side comparison of formats, sizes, documentation
```
## Tips
- **Use IDs when available**: More reliable than names
- **Check both servers**: Same dataset name might exist on multiple servers
- **Review all resources**: Datasets often have multiple files/formats
- **Note download URLs**: Save resource URLs for data access
- **Check temporal coverage**: Ensure data covers your time period of interest
- **Verify formats**: Confirm file formats are compatible with your tools
- **Read descriptions carefully**: Important processing details often in descriptions

commands/ndp-organizations.md Normal file

@@ -0,0 +1,110 @@
---
description: List and filter organizations in the National Data Platform
---
# NDP Organizations
List all organizations contributing data to the National Data Platform.
This command provides access to organization discovery functionality through the NDP MCP.
## Available MCP Tool
### `list_organizations`
Lists all organizations in NDP with optional filtering:
**Parameters**:
- **name_filter** (optional): Filter organizations by name substring match
- Case-insensitive partial matching
- Example: "climate" matches "Climate Research Center", "NOAA Climate Lab"
- **server** (optional): Server to query
- `'global'` (default) - Public global NDP server
- `'local'` - Local/institutional NDP server
- `'pre_ckan'` - Pre-production server
**Returns**: List of organization names and metadata including:
- Total count of organizations
- Organization names matching filter
- Server queried
## Usage Patterns
### Discover All Organizations
```
"List all organizations in the National Data Platform"
```
Uses: `list_organizations()` - No filter, returns all organizations
### Filter by Keyword
```
"Show me all organizations with 'climate' in their name"
```
Uses: `list_organizations(name_filter="climate")`
### Multi-Server Query
```
"Compare organizations on global and local servers"
```
Uses: `list_organizations(server="global")` and `list_organizations(server="local")`
### Research-Specific Discovery
```
"Find organizations related to oceanographic research"
```
Uses: `list_organizations(name_filter="ocean")` and `list_organizations(name_filter="marine")`
## Why Use This Command
1. **Verify Organization Names**: Get exact names before using in dataset searches
2. **Explore Data Sources**: Understand what organizations contribute to NDP
3. **Guide Searches**: Identify relevant organizations for your research domain
4. **Server Comparison**: See organizational differences between servers
5. **Data Coverage**: Understand breadth of data providers
## Workflow Integration
1. **Start Here**: Use this command before searching datasets
2. **Identify Providers**: Find organizations relevant to your research
3. **Use in Search**: Pass organization names to `search_datasets`
4. **Iterate**: Refine organization filters as needed
## Example Interactions
### Example 1: General Exploration
```
User: "List all organizations available on the local NDP server"
Claude uses: list_organizations(server="local")
Result: Complete list of local organizations with count
```
### Example 2: Targeted Discovery
```
User: "Find organizations related to satellite data"
Claude uses: list_organizations(name_filter="satellite")
Result: Organizations with "satellite" in their name
```
### Example 3: Multi-Keyword Search
```
User: "Show me organizations working on Earth observation"
Claude uses:
- list_organizations(name_filter="earth")
- list_organizations(name_filter="observation")
Result: Combined results from both searches
```
### Example 4: Before Dataset Search
```
User: "I want to search for NOAA climate data"
Claude uses: list_organizations(name_filter="noaa")
Result: Exact NOAA organization name(s)
Then: Can proceed with search_datasets(owner_org="<verified_name>")
```
## Tips
- **Use partial names**: "ocean" will match "Oceanographic Institute", "Ocean Research Lab", etc.
- **Try variations**: Search both "climate" and "atmospheric" to find all relevant organizations
- **Check both servers**: Global and local may have different organizations
- **Verify before searching**: Always confirm organization name before using in dataset searches
- **Multiple keywords**: Try related terms to discover all relevant providers

commands/ndp-search.md Normal file

@@ -0,0 +1,89 @@
---
description: Search for datasets in the National Data Platform
---
# NDP Dataset Search
Search for datasets across the National Data Platform ecosystem with advanced filtering options.
This command provides access to the NDP MCP tools for dataset discovery and exploration.
## Available MCP Tools
When you use this command, Claude can invoke these MCP tools:
### `search_datasets` - Primary search tool
Searches for datasets using various criteria:
- **search_terms**: List of terms to search across all fields
- **owner_org**: Filter by organization name
- **resource_format**: Filter by format (CSV, JSON, NetCDF, HDF5, GeoTIFF, etc.)
- **dataset_description**: Search in descriptions
- **server**: Query 'global' (default) or 'local' server
- **limit**: Maximum results (default: 20)
### `list_organizations` - Organization discovery
Lists available organizations:
- **name_filter**: Filter by name substring
- **server**: Query 'global' (default), 'local', or 'pre_ckan'
### `get_dataset_details` - Detailed information
Retrieves complete metadata for a specific dataset:
- **dataset_identifier**: Dataset ID or name from search results
- **identifier_type**: 'id' (default) or 'name'
- **server**: 'global' (default) or 'local'
## Recommended Workflow
1. **Discover Organizations**: Use `list_organizations` to find relevant data sources
2. **Search Datasets**: Use `search_datasets` with appropriate filters
3. **Review Results**: Claude will summarize matching datasets
4. **Get Details**: Use `get_dataset_details` for datasets of interest
5. **Refine Search**: Adjust filters based on results
## Best Practices
- **Always verify organization names** with `list_organizations` before using in search
- **Start broad, then refine**: Begin with simple terms, add filters as needed
- **Limit results appropriately**: Default 20 is good, increase if needed
- **Use format filters**: Narrow to specific formats (NetCDF, CSV, etc.) when relevant
- **Multi-server searches**: Query both global and local for comprehensive coverage
## Example Queries
### Basic Search
```
"Find climate datasets from NOAA"
```
Expected tools: `list_organizations(name_filter="noaa")`, then `search_datasets(owner_org="NOAA", search_terms=["climate"])`
### Format-Specific Search
```
"Search for oceanographic data in NetCDF format"
```
Expected tools: `search_datasets(search_terms=["oceanographic"], resource_format="NetCDF")`
### Organization-Based Search
```
"List all datasets from a specific research institution"
```
Expected tools: `list_organizations(name_filter="<institution>")`, then `search_datasets(owner_org="<name>")`
### Refined Search with Limit
```
"Find CSV datasets about temperature monitoring, limit to 10 results"
```
Expected tools: `search_datasets(search_terms=["temperature", "monitoring"], resource_format="CSV", limit=10)`
### Multi-Server Comparison
```
"Compare oceanographic datasets on global and local servers"
```
Expected tools: `search_datasets(server="global", ...)` and `search_datasets(server="local", ...)`
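Once both searches return, the two result sets can be contrasted directly; a sketch, assuming each result is a dict with a `name` field:
```python
# Hypothetical global-vs-local comparison of search results.
def compare_servers(global_results: list[dict], local_results: list[dict]) -> None:
    global_names = {d["name"] for d in global_results}
    local_names = {d["name"] for d in local_results}
    print("global only:", sorted(global_names - local_names))
    print("local only: ", sorted(local_names - global_names))
    print("on both:    ", sorted(global_names & local_names))
```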
## Tips for Effective Searching
1. **Use specific terminology**: Scientific terms work better than generic ones
2. **Combine filters**: Organization + format + terms = precise results
3. **Check multiple formats**: Try CSV, NetCDF, HDF5 for scientific data
4. **Explore organizations first**: Understanding data providers helps target searches
5. **Request details selectively**: Full metadata for only the most relevant datasets

hooks/hooks.json Normal file

@@ -0,0 +1,77 @@
{
"hooks": {
"UserPromptSubmit": [
{
"matcher": "",
"hooks": [
{
"type": "command",
"command": "${CLAUDE_PLUGIN_ROOT}/hooks/log_ndp_events.py --event-type UserPromptSubmit"
}
]
}
],
"PreToolUse": [
{
"matcher": "ndp",
"hooks": [
{
"type": "command",
"command": "${CLAUDE_PLUGIN_ROOT}/hooks/log_ndp_events.py --event-type PreToolUse"
}
]
},
{
"matcher": "*",
"hooks": [
{
"type": "command",
"command": "echo \"$(date +%s.%N),$(ps -o %cpu= -p $$),$(ps -o rss= -p $$),$CLAUDE_TOOL_NAME,start\" >> ~/.claude/performance.csv"
}
]
}
],
"PostToolUse": [
{
"matcher": "ndp",
"hooks": [
{
"type": "command",
"command": "${CLAUDE_PLUGIN_ROOT}/hooks/log_ndp_events.py --event-type PostToolUse"
}
]
},
{
"matcher": "*",
"hooks": [
{
"type": "command",
"command": "echo \"$(date +%s.%N),$(ps -o %cpu= -p $$),$(ps -o rss= -p $$),$CLAUDE_TOOL_NAME,end\" >> ~/.claude/performance.csv; if [[ $(wc -l < ~/.claude/performance.csv) -gt 1000 ]]; then tail -n 500 ~/.claude/performance.csv > ~/.claude/performance.csv.tmp && mv ~/.claude/performance.csv.tmp ~/.claude/performance.csv; fi"
}
]
}
],
"SessionStart": [
{
"matcher": "",
"hooks": [
{
"type": "command",
"command": "${CLAUDE_PLUGIN_ROOT}/hooks/log_ndp_events.py --event-type SessionStart"
}
]
}
],
"Stop": [
{
"matcher": "",
"hooks": [
{
"type": "command",
"command": "${CLAUDE_PLUGIN_ROOT}/hooks/log_ndp_events.py --event-type Stop"
}
]
}
]
}
}
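The two `*`-matcher hooks above append rows of the form `timestamp,cpu,rss,tool,phase` to `~/.claude/performance.csv`. A minimal sketch for turning those rows into per-tool durations, assuming each `start` row is followed by a matching `end` row for the same tool:
```python
# Hypothetical analysis of the performance.csv written by the hooks above.
import csv
from collections import defaultdict
from pathlib import Path


def tool_durations(csv_path: str = "~/.claude/performance.csv") -> dict[str, list[float]]:
    starts: dict[str, float] = {}
    durations: dict[str, list[float]] = defaultdict(list)
    with open(Path(csv_path).expanduser()) as f:
        for row in csv.reader(f):
            if len(row) != 5:
                continue  # skip malformed rows
            ts, _cpu, _rss, tool, phase = row
            if phase == "start":
                starts[tool] = float(ts)
            elif phase == "end" and tool in starts:
                durations[tool].append(float(ts) - starts.pop(tool))
    return durations
```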

hooks/log_ndp_events.py Executable file

@@ -0,0 +1,159 @@
#!/usr/bin/env -S uv run --python 3.10 --script
# /// script
# requires-python = ">=3.10"
# ///
"""
NDP Plugin Event Logger
Logs Claude Code events related to NDP plugin usage to a local file.
Enhanced to capture tool names, user input, and agent responses.
"""
import json
import sys
import os
import argparse
from datetime import datetime
from pathlib import Path
def get_log_file_path():
"""Get the log file path within plugin directory"""
# Get plugin root directory
plugin_root = Path(__file__).parent.parent
logs_dir = plugin_root / "logs"
# Create logs directory if it doesn't exist
logs_dir.mkdir(exist_ok=True)
return logs_dir / "ndp_events.log"
def extract_enhanced_data(event_type: str, event_data: dict) -> dict:
"""Extract enhanced information from event data"""
enhanced = {
"timestamp": datetime.now().isoformat(),
"event_type": event_type,
"session_id": event_data.get("session_id", "unknown"),
}
    # Extract tool information for PreToolUse and PostToolUse.
    # Claude Code hook payloads carry tool_name/tool_input at the top level;
    # the nested 'tool' object is kept as a fallback for older payload shapes.
    if event_type in ["PreToolUse", "PostToolUse"]:
        tool_data = event_data.get('tool', {})
        enhanced['tool_name'] = event_data.get('tool_name') or tool_data.get('name', 'unknown')
        enhanced['tool_input'] = event_data.get('tool_input') or tool_data.get('input', {})
    # For PostToolUse, capture tool results ('tool_response' is the current
    # payload key; 'result'/'output'/'error' are kept as fallbacks)
    if event_type == "PostToolUse":
        if 'tool_response' in event_data:
            enhanced['tool_result'] = event_data['tool_response']
        elif 'result' in event_data:
            enhanced['tool_result'] = event_data['result']
        if 'output' in event_data:
            enhanced['tool_output'] = event_data['output']
        if 'error' in event_data:
            enhanced['tool_error'] = event_data['error']
    # Extract user input for UserPromptSubmit ('prompt' is the current payload
    # key; 'text' is kept as a fallback)
    if event_type == "UserPromptSubmit":
        if 'prompt' in event_data:
            enhanced['user_prompt'] = event_data['prompt']
        elif 'text' in event_data:
            enhanced['user_prompt'] = event_data['text']
        if 'messages' in event_data:
            enhanced['conversation_messages'] = event_data['messages']
# For PostToolUse, extract agent response from transcript
if event_type == "PostToolUse" and 'transcript_path' in event_data:
transcript_path = event_data['transcript_path']
if os.path.exists(transcript_path):
try:
# Read last few messages to capture recent agent response
recent_chat = []
with open(transcript_path, 'r') as f:
lines = f.readlines()
# Get last 5 messages to capture context
for line in lines[-5:]:
line = line.strip()
if line:
try:
msg = json.loads(line)
recent_chat.append(msg)
except json.JSONDecodeError:
pass
enhanced['recent_chat'] = recent_chat
# Extract the latest agent response
for msg in reversed(recent_chat):
if msg.get('role') == 'assistant':
enhanced['latest_agent_response'] = msg.get('content', [])
break
except Exception as e:
enhanced['transcript_read_error'] = str(e)
# For Stop event, optionally include full chat if requested
if event_type == "Stop" and 'transcript_path' in event_data:
transcript_path = event_data['transcript_path']
if os.path.exists(transcript_path):
try:
chat_data = []
with open(transcript_path, 'r') as f:
for line in f:
line = line.strip()
if line:
try:
chat_data.append(json.loads(line))
except json.JSONDecodeError:
pass
# Add summary statistics
enhanced['chat_summary'] = {
'total_messages': len(chat_data),
'user_messages': sum(1 for msg in chat_data if msg.get('role') == 'user'),
'assistant_messages': sum(1 for msg in chat_data if msg.get('role') == 'assistant'),
}
# Optionally include last few messages
enhanced['last_5_messages'] = chat_data[-5:] if chat_data else []
except Exception as e:
enhanced['chat_read_error'] = str(e)
# Include raw event data for completeness
enhanced['raw_data'] = event_data
return enhanced
def log_event(event_type: str, event_data: dict):
"""Log event to file with enhanced data extraction"""
try:
log_file = get_log_file_path()
# Prepare enhanced log entry
log_entry = extract_enhanced_data(event_type, event_data)
# Append to log file (one JSON object per line)
with open(log_file, "a") as f:
f.write(json.dumps(log_entry) + "\n")
return True
except Exception as e:
        # Warn on stderr, but never block Claude Code
print(f"Warning: Failed to log event: {e}", file=sys.stderr)
return False
def main():
parser = argparse.ArgumentParser(description='Log NDP plugin events with enhanced data capture')
parser.add_argument('--event-type', required=True, help='Type of event')
args = parser.parse_args()
try:
# Read event data from stdin
event_data = json.load(sys.stdin)
except json.JSONDecodeError:
event_data = {}
# Log the event with enhanced data
log_event(args.event_type, event_data)
# Always exit successfully to not block Claude Code
sys.exit(0)
if __name__ == '__main__':
main()
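Because the logger writes one JSON object per line, `logs/ndp_events.log` can be summarized in a few lines; a sketch:
```python
# Hypothetical summary of the JSON-lines log written by log_ndp_events.py.
import json
from collections import Counter


def summarize_log(path: str = "logs/ndp_events.log") -> Counter:
    counts: Counter = Counter()
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                counts[json.loads(line).get("event_type", "unknown")] += 1
    return counts


# print(summarize_log())  # e.g. Counter({'PreToolUse': 12, 'PostToolUse': 12})
```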

plugin.lock.json Normal file

@@ -0,0 +1,69 @@
{
"$schema": "internal://schemas/plugin.lock.v1.json",
"pluginId": "gh:SIslamMun/iowarp-plugin:ndp-plugin",
"normalized": {
"repo": null,
"ref": "refs/tags/v20251128.0",
"commit": "8fc6a0e4bdb652d7f29cb6ccd20d1a937260e394",
"treeHash": "1e28125943e9edc7d798abae2f5b4311368a860bb39396c14f0a4f6a82ade6de",
"generatedAt": "2025-11-28T10:12:43.157888Z",
"toolVersion": "publish_plugins.py@0.2.0"
},
"origin": {
"remote": "git@github.com:zhongweili/42plugin-data.git",
"branch": "master",
"commit": "aa1497ed0949fd50e99e70d6324a29c5b34f9390",
"repoRoot": "/Users/zhongweili/projects/openmind/42plugin-data"
},
"manifest": {
"name": "ndp-plugin",
"description": "National Data Platform (NDP) integration plugin with dataset search, discovery, and workflow automation",
"version": "1.0.0"
},
"content": {
"files": [
{
"path": "README.md",
"sha256": "7a4168ad797d1f80a4b4380b374cfee7ea463ae21e3e894d96cc2fb3ce8f9522"
},
{
"path": "agents/ndp-dataset-curator.md",
"sha256": "80537e47871ff2af4efcec669b72532bc9a79b31574e6a3021eeec8deb6d16d0"
},
{
"path": "agents/ndp-data-scientist.md",
"sha256": "93c78b552db86ad8fa28fd9f1301d999ee925dda064dc6f0b7b85a697f007ac5"
},
{
"path": "hooks/hooks.json",
"sha256": "330b9d07eb8a2a01671ac7c68320e3400ec7a890202ffd30741069f0acb94e83"
},
{
"path": "hooks/log_ndp_events.py",
"sha256": "35c11a3727b98c423e7644083b7d57b8adaf855f747d13fdb687cd59cb96de24"
},
{
"path": ".claude-plugin/plugin.json",
"sha256": "9ed40f25eeffd93581d259506be104669cdbc316bb0e34414ff5c391bcbaaaf3"
},
{
"path": "commands/ndp-organizations.md",
"sha256": "8453847b408366cebcc933ea9d16d6121aaa9ba6e6c57e557e52502e0ec636ce"
},
{
"path": "commands/ndp-dataset-details.md",
"sha256": "b8ec4903d08ed8cbd61b16ac66a8c4daf5caf5dad37b5eb9f62e45ac04136531"
},
{
"path": "commands/ndp-search.md",
"sha256": "07061ee414c1dbb8d354c9ab4fd2248cdbdbd5955f0196a4ce4ef012f38de610"
}
],
"dirSha256": "1e28125943e9edc7d798abae2f5b4311368a860bb39396c14f0a4f6a82ade6de"
},
"security": {
"scannedAt": null,
"scannerVersion": null,
"flags": []
}
}
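The `content.files` entries pin each plugin file to a SHA-256 digest, so an installed copy can be checked against the lock file. A sketch of a per-file verification (the algorithm behind `dirSha256` is not specified here, so only individual hashes are checked):
```python
# Hypothetical integrity check of plugin files against plugin.lock.json.
import hashlib
import json
from pathlib import Path


def verify_lock(plugin_root: str = ".") -> list[str]:
    root = Path(plugin_root)
    lock = json.loads((root / "plugin.lock.json").read_text())
    mismatches = []
    for entry in lock["content"]["files"]:
        digest = hashlib.sha256((root / entry["path"]).read_bytes()).hexdigest()
        if digest != entry["sha256"]:
            mismatches.append(entry["path"])
    return mismatches  # empty list means every file matches its pinned hash
```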