Initial commit

Zhongwei Li
2025-11-29 18:16:40 +08:00
commit f125e90b9f
370 changed files with 67769 additions and 0 deletions


@@ -0,0 +1,122 @@
# Changelog
All notable changes to the insight-skill-generator skill will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [0.1.0] - 2025-11-16
### Added
- **Phase 1: Insight Discovery and Parsing**
  - Automatic discovery of insights in `docs/lessons-learned/` directory
  - Parse insight files with session metadata extraction
  - Build structured inventory with keywords and categorization
  - Support for multiple insight categories (testing, architecture, hooks-and-events, etc.)
- **Phase 2: Smart Clustering**
  - Keyword-based similarity analysis
  - Multi-factor scoring (category, keywords, temporal proximity, content overlap)
  - Automatic cluster formation with configurable thresholds
  - Standalone high-quality insight detection
  - Sub-clustering for large insight groups
  - Interactive cluster review and customization
- **Phase 3: Interactive Skill Design**
  - Intelligent skill naming from insight keywords
  - Auto-generated descriptions with trigger phrases
  - Complexity assessment (minimal/standard/complex)
  - Pattern selection (phase-based/mode-based/validation/data-processing)
  - Content-to-structure mapping
  - Workflow/phase definition
  - Preview and customization before generation
- **Phase 4: Skill Generation**
  - Complete SKILL.md generation with proper frontmatter
  - README.md with usage examples
  - plugin.json with marketplace metadata
  - CHANGELOG.md initialization
  - data/insights-reference.md with original insights
  - examples/ directory with code samples
  - templates/ directory with actionable checklists
  - Comprehensive validation against Anthropic standards
- **Phase 5: Installation and Testing**
  - Flexible installation (project-specific or global)
  - Conflict detection and resolution
  - Post-installation validation
  - Skill loading verification
  - Testing guidance with trigger phrases
  - Refinement suggestions
- **Configuration System**
  - `data/clustering-config.yaml` - Tunable similarity rules and thresholds
  - `data/skill-templates-map.yaml` - Insight-to-pattern mappings
  - `data/quality-checklist.md` - Validation criteria
- **Template System**
  - `templates/insight-based-skill.md.j2` - SKILL.md structure reference
  - `templates/insight-reference.md.j2` - Insights consolidation pattern
  - `templates/insight-checklist.md.j2` - Actionable checklist pattern
- **Documentation**
  - Comprehensive SKILL.md with 5-phase workflow
  - User-friendly README.md with quick start guide
  - Troubleshooting section for common issues
  - Example outputs and generated skills
### Features
- **Smart Clustering**: Analyzes insights using keyword similarity, category matching, and temporal proximity
- **Hybrid Approach**: Generates standalone skills from single insights or comprehensive skills from clusters
- **Interactive Guidance**: Users review and customize every design decision
- **Quality Validation**: Ensures generated skills meet Anthropic standards
- **Pattern Recognition**: Automatically selects appropriate skill pattern based on insight content
- **Deduplication**: Prevents creating skills that duplicate existing functionality
### Integration
- Integrates with `extract-explanatory-insights` hook
- Reads from `docs/lessons-learned/` directory structure
- Supports all insight categories from the hook (testing, configuration, hooks-and-events, security, performance, architecture, version-control, react, typescript, general)
### Supported Patterns
- **Phase-based**: Linear workflows with sequential steps
- **Mode-based**: Multiple distinct approaches for same domain
- **Validation**: Analysis and checking patterns
- **Data-processing**: Transform or analyze data patterns
### Complexity Levels
- **Minimal**: Single insight, basic structure (SKILL.md, README, plugin.json, CHANGELOG)
- **Standard**: 2-4 insights with reference materials and examples
- **Complex**: 5+ insights with comprehensive templates and multiple examples
### Known Limitations
- Requires `docs/lessons-learned/` directory structure from extract-explanatory-insights hook
- Clustering algorithm is keyword-based (not ML-powered)
- Templates use Jinja2 syntax for documentation reference only (not programmatically rendered)
- First release - patterns and thresholds may need tuning based on usage
### Notes
- Generated from research on extract-explanatory-insights hook
- Based on Anthropic's official skill creation patterns
- Follows skill-creator's guided creation approach
- Initial thresholds (cluster_minimum: 0.6, standalone_quality: 0.8) are starting points and may need adjustment
---
## Future Enhancements (Planned)
- Auto-detection of existing skill overlap to prevent duplication
- ML-based clustering for better semantic grouping
- Skill versioning support (updating existing skills with new insights)
- Team collaboration features (merging insights from multiple developers)
- Export skills to Claudex marketplace format
- Integration with cc-insights skill for enhanced pattern detection
- Batch generation mode for processing multiple projects
- Custom template support for organization-specific skill patterns


@@ -0,0 +1,192 @@
# Insight-to-Skill Generator
Transform your accumulated Claude Code explanatory insights into production-ready, reusable skills.
## Overview
The Insight-to-Skill Generator analyzes insights collected by the `extract-explanatory-insights` hook and converts them into well-structured Claude Code skills. It uses smart clustering to group related insights, guides you through interactive skill design, and generates complete skills following Anthropic's standards.
**Perfect for**:
- Reusing knowledge from previous Claude Code sessions
- Creating team-wide skills from project-specific learnings
- Building a library of domain-specific productivity tools
- Codifying best practices discovered through experience
## When to Use
Use this skill when you have insights stored in your project's `docs/lessons-learned/` directory and want to turn them into reusable skills.
**Trigger Phrases**:
- "create skill from insights"
- "generate skill from lessons learned"
- "turn my insights into a skill"
- "convert docs/lessons-learned to skill"
## Quick Start
### Prerequisites
1. Your project has the `extract-explanatory-insights` hook configured
2. You have insights stored in `docs/lessons-learned/` directory
3. You're using Claude Code with Explanatory output style
### Basic Usage
```
You: "I have a bunch of insights about testing in docs/lessons-learned/. Can you create a skill from them?"
Claude: [Activates insight-skill-generator]
- Scans your docs/lessons-learned/ directory
- Clusters related testing insights
- Proposes a "testing-best-practices" skill
- Guides you through customization
- Generates and installs the skill
```
### Example Workflow
1. **Discovery**: The skill finds 12 insights across 4 categories
2. **Clustering**: Groups them into 3 skill candidates:
- "testing-strategy-guide" (5 insights)
- "hook-debugging-helper" (4 insights)
- "performance-optimization" (3 insights)
3. **Design**: You review and customize each skill proposal
4. **Generation**: Complete skills are created with SKILL.md, README, examples, etc.
5. **Installation**: You choose to install "testing-strategy-guide" globally, others project-specific
## Installation
### Standard Installation
```bash
# Clone or copy this skill to your Claude Code skills directory
cp -r insight-skill-generator ~/.claude/skills/
# The skill is now available in all your Claude Code sessions
```
### Project-Specific Installation
```bash
# Copy to project's .claude directory
cp -r insight-skill-generator /path/to/project/.claude/skills/
```
## What Gets Generated
For each skill created, you'll get:
**Minimal Skill** (1 simple insight):
- `SKILL.md` - Main skill instructions
- `README.md` - User documentation
- `plugin.json` - Marketplace metadata
- `CHANGELOG.md` - Version history
**Standard Skill** (2-4 insights):
- All of the above, plus:
- `data/insights-reference.md` - Original insights for reference
- `examples/usage-examples.md` - How to use the skill
**Complex Skill** (5+ insights):
- All of the above, plus:
- `examples/code-samples.md` - Code examples extracted from insights
- `templates/checklist.md` - Actionable checklist
## Features
### Smart Clustering
- Analyzes keywords, categories, and temporal proximity
- Groups related insights automatically
- Identifies standalone high-value insights
- Suggests optimal skill patterns (phase-based, mode-based, validation)
### Interactive Design
- Proposes skill names and descriptions
- Lets you customize every aspect
- Shows pattern trade-offs with examples
- Previews structure before generation
### Quality Assurance
- Validates YAML frontmatter syntax
- Checks against Anthropic's skill standards
- Ensures proper file structure
- Verifies all references are valid
### Flexible Installation
- Choose project-specific or global installation
- Detects naming conflicts
- Tests skill loading after installation
- Provides testing guidance
## Configuration
### Tuning Clustering
Edit `~/.claude/skills/insight-skill-generator/data/clustering-config.yaml`:
```yaml
thresholds:
  cluster_minimum: 0.6    # Lower = more aggressive clustering
  standalone_quality: 0.8 # Higher = fewer standalone skills
```
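To build intuition before tuning, here is a hypothetical sketch of the multi-factor score these thresholds gate. The weights mirror `data/clustering-config.yaml`, but the function and sample data are illustrative assumptions, not the skill's actual code; the title-similarity and content-overlap terms are omitted for brevity.

```python
from datetime import date

# Weights taken from data/clustering-config.yaml (subset, for illustration)
WEIGHTS = {
    "same_category": 0.3,       # base score for matching category
    "shared_keyword": 0.1,      # added per shared keyword
    "temporal_proximity": 0.05, # bonus when created within the window
}

def similarity(a: dict, b: dict, temporal_window_days: int = 7) -> float:
    """Score two parsed insights; higher means more related."""
    score = 0.0
    if a["category"] == b["category"]:
        score += WEIGHTS["same_category"]
    score += WEIGHTS["shared_keyword"] * len(set(a["keywords"]) & set(b["keywords"]))
    if abs((a["date"] - b["date"]).days) <= temporal_window_days:
        score += WEIGHTS["temporal_proximity"]
    return min(score, 1.0)

a = {"category": "testing", "keywords": ["vitest", "mock"], "date": date(2025, 11, 20)}
b = {"category": "testing", "keywords": ["mock", "fixture"], "date": date(2025, 11, 21)}
print(round(similarity(a, b), 2))  # 0.45 - below the default cluster_minimum of 0.6
```

With the defaults, this pair stays separate; lowering `cluster_minimum` to 0.4 would group them.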
### Category Patterns
Customize skill patterns for your domain in `data/skill-templates-map.yaml`:
```yaml
category_patterns:
  testing:
    preferred_pattern: validation
    skill_name_suffix: "testing-guide"
```
## Examples
See [examples/example-clustering-output.md](examples/example-clustering-output.md) for sample cluster analysis.
See [examples/example-generated-skill/](examples/example-generated-skill/) for a complete generated skill.
## Tips
- **Filter quality**: Not every insight should become a skill. Focus on actionable, reusable knowledge
- **Start minimal**: It's easier to expand a skill later than to simplify a complex one
- **Test thoroughly**: Use all trigger phrases to ensure the skill works as expected
- **Version control**: Commit generated skills to git for team sharing
- **Iterate**: Skills can evolve. Version 0.1.0 is just the start
## Troubleshooting
### No insights found
- Verify `docs/lessons-learned/` exists in your project
- Check that the extract-explanatory-insights hook is configured
- Ensure insight files match the naming pattern: `YYYY-MM-DD-*.md`
### Clustering produces weird results
- Adjust thresholds in `data/clustering-config.yaml`
- Manually split or combine clusters in Phase 2
- Try increasing similarity threshold for tighter clusters
### Generated skill doesn't load
- Check YAML frontmatter syntax (no tabs, proper format)
- Verify skill name is lowercase kebab-case
- Restart Claude Code session
- Check file permissions
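A minimal frontmatter sanity check can catch the first two issues before restarting. This sketch assumes the `---`-delimited layout shown in this repo's SKILL.md files; it is illustrative, not the skill's validator.

```python
import re

KEBAB = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")

def check_frontmatter(text: str) -> list[str]:
    """Return a list of problems found in a SKILL.md's frontmatter."""
    problems = []
    m = re.match(r"^---\n(.*?)\n---\n", text, re.DOTALL)
    if not m:
        return ["missing --- frontmatter block"]
    block = m.group(1)
    if "\t" in block:
        problems.append("frontmatter contains tabs (YAML requires spaces)")
    name = re.search(r"^name:\s*(\S+)", block, re.MULTILINE)
    if not name:
        problems.append("no name field")
    elif not KEBAB.match(name.group(1)):
        problems.append(f"name {name.group(1)!r} is not lowercase kebab-case")
    return problems

print(check_frontmatter("---\nname: My_Skill\n---\n# Title"))
# flags the non-kebab-case name
```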
## Learn More
For detailed workflow documentation, see [SKILL.md](SKILL.md).
## License
Created by Connor for use with Claude Code. Part of the Claude Code skills ecosystem.
## Contributing
Have ideas for improving insight-to-skill generation? Open an issue or submit suggestions through your project's Claude Code configuration.
---
**Version**: 0.1.0
**Category**: Productivity
**Integration**: extract-explanatory-insights hook


@@ -0,0 +1,147 @@
---
name: insight-skill-generator
version: 0.1.0
description: Use PROACTIVELY when working with projects that have docs/lessons-learned/ directories to transform Claude Code explanatory insights into reusable, production-ready skills. Analyzes insight files, clusters related content, and interactively generates skills following Anthropic's standards.
---
# Insight-to-Skill Generator
## Overview
This skill transforms insights from the `extract-explanatory-insights` hook into production-ready Claude Code skills. It discovers insight files, clusters related insights using smart similarity analysis, and guides you through interactive skill creation.
**Key Capabilities**:
- Automatic discovery and parsing of insight files from `docs/lessons-learned/`
- **Deduplication** to remove duplicate entries from extraction hook bugs
- **Quality filtering** to keep only actionable, skill-worthy insights
- Smart clustering using keyword similarity, category matching, and temporal proximity
- Interactive skill design with customizable patterns (phase-based, mode-based, validation)
- Flexible installation (project-specific or global)
## When to Use This Skill
**Trigger Phrases**:
- "create skill from insights"
- "generate skill from lessons learned"
- "turn my insights into a skill"
- "convert docs/lessons-learned to skill"
**Use PROACTIVELY when**:
- User mentions they have accumulated insights in a project
- You notice `docs/lessons-learned/` directory with multiple insights
- User asks how to reuse knowledge from previous sessions
- User wants to create a skill but has raw insights as source material
**NOT for**:
- Creating skills from scratch (use skill-creator instead)
- Creating sub-agents (use sub-agent-creator instead)
- User has no insights or lessons-learned directory
- User wants to create MCP servers (use mcp-server-creator instead)
## Response Style
**Interactive and Educational**: Guide users through each decision point with clear explanations of trade-offs. Provide insights about why certain patterns work better for different insight types.
## Quick Decision Matrix
| User Request | Action | Reference |
|--------------|--------|-----------|
| "create skill from insights" | Full workflow | Start at Phase 1 |
| "show me insight clusters" | Clustering only | `workflow/phase-2-clustering.md` |
| "design skill structure" | Design phase | `workflow/phase-3-design.md` |
| "install generated skill" | Installation | `workflow/phase-5-installation.md` |
## Workflow Overview
### Phase 1: Insight Discovery and Parsing
Locate, read, **deduplicate**, and **quality-filter** insights from lessons-learned directory.
**Details**: `workflow/phase-1-discovery.md`
### Phase 2: Smart Clustering
Group related insights using similarity analysis to identify skill candidates.
**Details**: `workflow/phase-2-clustering.md`
### Phase 3: Interactive Skill Design
Design skill structure with user customization (name, pattern, complexity).
**Details**: `workflow/phase-3-design.md`
### Phase 4: Skill Generation
Create all skill files following the approved design.
**Details**: `workflow/phase-4-generation.md`
### Phase 5: Installation and Testing
Install the skill and provide testing guidance.
**Details**: `workflow/phase-5-installation.md`
## Quality Thresholds
**Minimum quality score: 4** (out of 9 possible)
Score calculation:
- Has actionable items (checklists, steps): +3
- Has code examples: +2
- Has numbered steps: +2
- Word count > 200: +1
- Has warnings/notes: +1
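The score calculation above can be sketched as follows. The detection heuristics (checkbox regex, fence check, etc.) are assumptions for illustration, not the skill's actual implementation.

```python
import re

def quality_score(text: str) -> int:
    """Rough quality score per the rubric above; keep insights scoring >= 4."""
    score = 0
    if re.search(r"^\s*- \[ \]", text, re.MULTILINE):    # checklists -> actionable
        score += 3
    if "```" in text:                                     # code examples
        score += 2
    if re.search(r"^\s*\d+\.", text, re.MULTILINE):       # numbered steps
        score += 2
    if len(text.split()) > 200:                           # substantial content
        score += 1
    if re.search(r"\b(Warning|Note|Important)\b", text):  # warnings/notes
        score += 1
    return score

insight = "Steps:\n1. Mock the clock\n2. Assert retries\n```ts\nvi.useFakeTimers()\n```"
print(quality_score(insight))  # numbered steps (+2) and a code block (+2) -> 4
```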
**Skip insights that are**:
- Basic explanatory notes without actionable steps
- Simple definitions or concept explanations
- Single-paragraph observations
**Keep insights that have**:
- Actionable workflows (numbered steps, checklists)
- Decision frameworks (trade-offs, when to use X vs Y)
- Code patterns with explanation of WHY
- Troubleshooting guides with solutions
## File Naming Convention
Files MUST follow: `YYYY-MM-DD-descriptive-slug.md`
- ✅ `2025-11-21-jwt-refresh-token-pattern.md`
- ✅ `2025-11-20-vitest-mocking-best-practices.md`
- ❌ `2025-11-21.md` (missing description)
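A regex sketch of this convention (assumed, for illustration only):

```python
import re

# YYYY-MM-DD- followed by a lowercase kebab-case slug, ending in .md
PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}-[a-z0-9]+(-[a-z0-9]+)*\.md$")

for name in ("2025-11-21-jwt-refresh-token-pattern.md", "2025-11-21.md"):
    print(name, "ok" if PATTERN.match(name) else "missing description")
```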
## Important Reminders
- **Deduplicate first**: Extraction hook may create 7-8 duplicates per file - always deduplicate
- **Quality over quantity**: Not every insight should become a skill (minimum score: 4)
- **Descriptive filenames**: Use `YYYY-MM-DD-topic-slug.md` format
- **Avoid skill duplication**: Check existing skills before generating
- **User confirmation**: Always get user approval before writing files to disk
- **Pattern selection matters**: Wrong pattern makes skill confusing. When in doubt, use phase-based
- **Test before sharing**: Always test trigger phrases work as expected
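The deduplication reminder above can be sketched as a simple normalize-and-compare pass. The normalization rule (whitespace and case) is an assumption about how the hook duplicates content; adjust if duplicates differ in other ways.

```python
def deduplicate(blocks: list[str]) -> list[str]:
    """Drop insight blocks that are identical after whitespace/case normalization."""
    seen, unique = set(), []
    for block in blocks:
        key = " ".join(block.lower().split())  # collapse whitespace, ignore case
        if key not in seen:
            seen.add(key)
            unique.append(block)
    return unique

blocks = ["Use fake timers.", "Use fake  timers.", "Prefer fixtures."]
print(deduplicate(blocks))  # ['Use fake timers.', 'Prefer fixtures.']
```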
## Limitations
- Requires `docs/lessons-learned/` directory with insight files
- Insight quality determines output quality (garbage in, garbage out)
- Cannot modify existing skills (generates new ones only)
- Clustering algorithm may need threshold tuning for different domains
## Reference Materials
| Resource | Purpose |
|----------|---------|
| `workflow/*.md` | Detailed phase instructions |
| `reference/troubleshooting.md` | Common issues and fixes |
| `data/clustering-config.yaml` | Similarity rules and thresholds |
| `data/skill-templates-map.yaml` | Insight-to-skill pattern mappings |
| `data/quality-checklist.md` | Validation criteria |
| `templates/*.md.j2` | Generation templates |
| `examples/` | Sample outputs |
## Success Criteria
- [ ] Insights discovered and parsed from lessons-learned
- [ ] Clusters formed with user approval
- [ ] Skill design approved (name, pattern, structure)
- [ ] All files generated and validated
- [ ] Skill installed in chosen location
- [ ] Trigger phrases tested and working
---
**Version**: 0.1.0
**Author**: Connor
**Integration**: extract-explanatory-insights hook


@@ -0,0 +1,205 @@
# Clustering Configuration for Insight-to-Skill Generator
# Version: 0.1.0
# Similarity Scoring Weights
similarity_weights:
  same_category: 0.3        # Base score if insights are in same category
  shared_keyword: 0.1       # Added per shared keyword
  temporal_proximity: 0.05  # If insights created within temporal_window
  title_similarity: 0.15    # If titles share significant words
  content_overlap: 0.2      # If content has overlapping concepts

# Clustering Thresholds
thresholds:
  cluster_minimum: 0.6      # Minimum score to group insights together
  standalone_quality: 0.8   # Score for single insight to become standalone skill
  split_cluster_size: 5     # If cluster > this size, consider splitting by sub-topics

# Temporal Settings
temporal_window: 7          # Days - insights within this window get proximity bonus
# Category Keywords (from extract-explanatory-insights hook)
# Used for semantic grouping beyond directory structure
categories:
  testing:
    keywords:
      - test
      - testing
      - coverage
      - tdd
      - unit
      - integration
      - e2e
      - vitest
      - jest
      - assertion
      - mock
      - fixture
    skill_category: tooling
  configuration:
    keywords:
      - config
      - configuration
      - settings
      - inheritance
      - yaml
      - json
      - env
      - environment
    skill_category: tooling
  hooks-and-events:
    keywords:
      - hook
      - lifecycle
      - event
      - trigger
      - callback
      - listener
      - middleware
    skill_category: productivity
  security:
    keywords:
      - security
      - vulnerability
      - auth
      - authentication
      - authorization
      - encryption
      - sanitize
      - injection
      - xss
      - csrf
    skill_category: analysis
  performance:
    keywords:
      - performance
      - optimize
      - optimization
      - cache
      - caching
      - lazy
      - memoize
      - bundle
      - latency
      - throughput
    skill_category: productivity
  architecture:
    keywords:
      - architecture
      - design
      - pattern
      - structure
      - module
      - component
      - layer
      - separation
      - coupling
      - cohesion
    skill_category: analysis
  version-control:
    keywords:
      - git
      - commit
      - branch
      - merge
      - rebase
      - pull request
      - pr
      - cherry-pick
    skill_category: devops
  react:
    keywords:
      - react
      - component
      - tsx
      - jsx
      - hooks
      - useEffect
      - useState
      - props
      - state
      - render
    skill_category: tooling
  typescript:
    keywords:
      - typescript
      - type
      - interface
      - generic
      - enum
      - type guard
      - union
      - intersection
    skill_category: tooling
  general:
    keywords:
      - general
      - best practice
      - lesson
      - debugging
      - troubleshooting
    skill_category: productivity
# Complexity Assessment Rules
# Determines whether insight(s) become minimal/standard/complex skill
complexity_rules:
  minimal:
    max_insights: 1
    max_paragraphs: 3
    has_code_examples: false
    description: "Single focused insight, phase-based workflow"
  standard:
    max_insights: 4
    min_paragraphs: 3
    requires_data_dir: true
    description: "Multiple related insights, comprehensive workflow with reference materials"
  complex:
    min_insights: 5
    requires_modes: true
    requires_examples: true
    description: "Large insight cluster, mode-based with extensive examples and templates"

# Skill Naming Heuristics
naming:
  max_length: 40        # Max chars for skill name (kebab-case)
  remove_words:         # Common words to remove from auto-generated names
    - "insight"
    - "lesson"
    - "learned"
    - "the"
    - "a"
    - "an"
  preferred_suffixes:   # Preferred endings for skill names
    - "guide"
    - "advisor"
    - "helper"
    - "automation"
    - "analyzer"

# Description Generation
description:
  max_length: 150       # Soft limit for description (actual limit is 1024)
  required_elements:
    - action_verb       # Must start with verb (e.g., "Use", "Analyzes", "Guides")
    - trigger_phrase    # Must include "PROACTIVELY when" or "Use when"
    - benefit           # Must describe value/outcome
  action_verbs:
    - "Use PROACTIVELY when"
    - "Guides"
    - "Analyzes"
    - "Automates"
    - "Validates"
    - "Optimizes"
    - "Generates"
    - "Monitors"


@@ -0,0 +1,259 @@
# Quality Validation Checklist
Generated skills must pass all validation criteria before installation. This checklist ensures compliance with Anthropic's skill standards and Connor's quality requirements.
## 1. YAML Frontmatter Validation
### Required Fields
- [ ] `name` field present and valid
  - Max 64 characters
  - Lowercase, numbers, hyphens only
  - No reserved words (skill, claude, anthropic)
  - Matches directory name
- [ ] `description` field present and valid
  - Max 1024 characters
  - Non-empty
  - No XML/HTML tags
  - Action-oriented (starts with verb)
### Description Quality
- [ ] Contains trigger phrase
  - "Use PROACTIVELY when..." OR
  - "Use when..." OR
  - "Guides..." OR
  - "Analyzes..." OR
  - Similar action verb
- [ ] Describes clear value/benefit
  - What problem does it solve?
  - What outcome does it produce?
- [ ] Appropriate for skill category
  - Aligns with insight category
  - Matches skill type (tooling/analysis/productivity)
### Optional Fields (if present)
- [ ] `allowed-tools` (Claude Code only)
  - Valid tool names only
  - No duplicates
- [ ] Custom fields are reasonable
  - `version`, `author`, `category` are common
## 2. File Structure Validation
### Required Files
- [ ] `SKILL.md` exists and is non-empty
- [ ] `README.md` exists and is non-empty
- [ ] `plugin.json` exists and is valid JSON
- [ ] `CHANGELOG.md` exists with v0.1.0 entry
### Optional Files (based on complexity)
- [ ] `data/` directory if complexity >= standard
- [ ] `data/insights-reference.md` if multiple insights
- [ ] `examples/` directory if code examples present
- [ ] `templates/` directory if actionable checklists exist
### File Size Limits
- [ ] SKILL.md < 500 lines (recommend splitting if larger)
- [ ] No single file > 1000 lines
- [ ] Total skill size < 1MB
## 3. SKILL.md Content Quality
### Structure
- [ ] Clear heading hierarchy (h1 → h2 → h3)
- [ ] No skipped heading levels
- [ ] Consistent formatting throughout
### Required Sections
- [ ] Overview/Introduction
  - What the skill does
  - When to use it
- [ ] Workflow or Phases
  - Clear step-by-step instructions
  - Numbered or bulleted steps
  - Logical progression
- [ ] Examples (if applicable)
  - Concrete use cases
  - Expected outputs
### Content Quality
- [ ] Clear, concise language
- [ ] No ambiguous pronouns ("it", "this", "that" without context)
- [ ] Consistent terminology (no mixing synonyms)
- [ ] Action-oriented instructions (imperative mood)
### Progressive Disclosure
- [ ] SKILL.md serves as table of contents
- [ ] Detailed content in separate files (data/, examples/)
- [ ] References are one level deep (no nested references)
## 4. Insight Integration Quality
### Insight Attribution
- [ ] Original insights preserved in `data/insights-reference.md`
- [ ] Insights properly dated and sourced
- [ ] Session metadata included
### Content Transformation
- [ ] Insights converted to actionable workflow steps
- [ ] Problem-solution structure maintained
- [ ] Code examples extracted to examples/
- [ ] Best practices highlighted in Important Reminders
### Deduplication
- [ ] No duplicate content between skill files
- [ ] Cross-references used instead of duplication
- [ ] Consolidated similar points
## 5. Pattern Adherence
### Selected Pattern (phase-based/mode-based/validation)
- [ ] Pattern choice justified by insight content
- [ ] Pattern correctly implemented
- [ ] Section structure matches pattern conventions
### Workflow Logic
- [ ] Phases/modes are sequential or parallel as appropriate
- [ ] Each phase has clear purpose
- [ ] Prerequisites stated upfront
- [ ] Expected outputs defined
### Error Handling
- [ ] Common pitfalls documented
- [ ] Troubleshooting guidance included
- [ ] Failure recovery steps provided
## 6. README.md Quality
### Required Sections
- [ ] Brief overview (1-2 sentences)
- [ ] When to use (trigger phrases)
- [ ] Quick start example
- [ ] Installation instructions (if not standard)
### Clarity
- [ ] User-focused (not developer-focused)
- [ ] Examples are copy-pasteable
- [ ] Screenshots/diagrams if beneficial
## 7. plugin.json Validation
### JSON Validity
- [ ] Valid JSON syntax
- [ ] No trailing commas
- [ ] Proper escaping
### Required Fields
- [ ] `name` matches SKILL.md frontmatter
- [ ] `version` follows semver (0.1.0 for new skills)
- [ ] `description` matches SKILL.md frontmatter
- [ ] `type` is "skill"
### Optional Fields (if present)
- [ ] `author` is reasonable string
- [ ] `category` is valid (tooling/analysis/productivity/devops)
- [ ] `tags` are relevant keywords
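For reference, a hypothetical `plugin.json` satisfying the required and optional fields above might look like this (all values are illustrative):

```json
{
  "name": "testing-strategy-guide",
  "version": "0.1.0",
  "description": "Use PROACTIVELY when writing tests to apply proven testing strategies.",
  "type": "skill",
  "author": "Connor",
  "category": "tooling",
  "tags": ["testing", "vitest", "best-practices"]
}
```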
## 8. Code Quality (if skill includes examples)
### Code Examples
- [ ] Syntax highlighting specified (```language)
- [ ] Code is complete and runnable
- [ ] No placeholder variables (unless clearly marked)
- [ ] Comments explain non-obvious logic
### Best Practices
- [ ] Follows language conventions
- [ ] No security vulnerabilities
- [ ] No hardcoded credentials
- [ ] Error handling demonstrated
## 9. Accessibility & Usability
### Trigger Phrases
- [ ] Multiple trigger phrases provided
- [ ] Phrases are natural language
- [ ] Covers different ways users might ask
- [ ] PROACTIVELY triggers are specific (not too broad)
### Searchability
- [ ] Skill name reflects function
- [ ] Description contains relevant keywords
- [ ] Tags (if present) aid discovery
### User Guidance
- [ ] Clear next steps after each phase
- [ ] Decision points clearly marked
- [ ] Optional vs. required steps distinguished
## 10. Edge Cases & Robustness
### Input Handling
- [ ] Handles missing/malformed input gracefully
- [ ] Validates prerequisites before execution
- [ ] Provides helpful error messages
### Project Variability
- [ ] Handles different project structures
- [ ] Works with monorepos and single packages
- [ ] Adapts to different tech stacks
### Maintenance
- [ ] No hardcoded paths (use relative or user-provided)
- [ ] No assumptions about environment
- [ ] Graceful degradation if optional tools unavailable
## 11. Insight-Specific Validation
### Quality Filter
- [ ] Only high-quality insights converted
  - Actionable (not just informational)
  - Generally applicable (not project-specific)
  - Valuable (solves real problem)
### Relevance
- [ ] Skill solves problem not covered by existing skills
- [ ] No duplication with skill-creator, sub-agent-creator, etc.
- [ ] Unique value proposition clear
### Scope
- [ ] Skill is focused (does one thing well)
- [ ] Not too broad (overwhelming)
- [ ] Not too narrow (trivial)
## Scoring
### Critical (must pass all)
All items in sections 1-2 (Frontmatter, File Structure)
### High Priority (must pass 90%+)
Sections 3-5 (Content Quality, Insight Integration, Pattern Adherence)
### Medium Priority (must pass 80%+)
Sections 6-9 (README, plugin.json, Code Quality, Usability)
### Optional (nice to have)
Sections 10-11 (Edge Cases, Insight-Specific)
## Auto-Fix Opportunities
Some issues can be auto-corrected:
- [ ] Trim description to 1024 chars
- [ ] Convert skill name to lowercase kebab-case
- [ ] Add missing CHANGELOG.md with v0.1.0
- [ ] Generate basic README.md from SKILL.md overview
- [ ] Validate and pretty-print JSON files
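Two of these auto-fixes can be sketched as follows; this is illustrative, not the validator's actual code.

```python
import re

def to_kebab_case(name: str) -> str:
    """Convert an arbitrary skill name to lowercase kebab-case."""
    return re.sub(r"[^A-Za-z0-9]+", "-", name).strip("-").lower()

def trim_description(desc: str, limit: int = 1024) -> str:
    """Trim an over-long description to the 1024-char frontmatter limit."""
    return desc if len(desc) <= limit else desc[: limit - 1].rstrip() + "…"

print(to_kebab_case("My Testing_Guide!"))  # my-testing-guide
```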
## Manual Review Required
Some issues require user decision:
- Ambiguous trigger phrases
- Pattern selection uncertainty
- Multiple valid skill name options
- Content organization choices
- Category assignment conflicts


@@ -0,0 +1,304 @@
# Skill Templates Mapping
# Maps insight characteristics to appropriate skill patterns and structures
# Version: 0.1.0
# Pattern Definitions
# Based on Anthropic's skill creation patterns
patterns:
  phase-based:
    description: "Linear workflow with sequential phases"
    best_for:
      - "Step-by-step processes"
      - "Problem-diagnosis-solution flows"
      - "Single-path workflows"
    structure:
      - "## Phase 1: [Name]"
      - "## Phase 2: [Name]"
      - "## Phase 3: [Name]"
    example_skills:
      - "skill-creator"
      - "mcp-server-creator"
  mode-based:
    description: "Multiple distinct workflows/approaches"
    best_for:
      - "Different use cases for same domain"
      - "Beginner vs advanced paths"
      - "Alternative strategies"
    structure:
      - "## Mode 1: [Name]"
      - "## Mode 2: [Name]"
      - "## Mode Selection Guide"
    example_skills:
      - "react-project-scaffolder"
  validation:
    description: "Analysis and checking pattern"
    best_for:
      - "Code review insights"
      - "Quality assessment insights"
      - "Security/performance audits"
    structure:
      - "## Analysis Phase"
      - "## Issue Detection"
      - "## Recommendations"
    example_skills:
      - "codebase-auditor"
      - "bulletproof-react-auditor"
  data-processing:
    description: "Transform or analyze data"
    best_for:
      - "File parsing insights"
      - "Data transformation insights"
      - "Report generation insights"
    structure:
      - "## Input Processing"
      - "## Transformation Logic"
      - "## Output Generation"
    example_skills:
      - "cc-insights"
# Insight-to-Pattern Mapping Rules
mapping_rules:
  - condition:
      insight_contains:
        - "step"
        - "process"
        - "workflow"
        - "first"
        - "then"
        - "finally"
    recommended_pattern: phase-based
    confidence: high
  - condition:
      insight_contains:
        - "approach"
        - "strategy"
        - "alternative"
        - "different ways"
        - "option"
    recommended_pattern: mode-based
    confidence: high
  - condition:
      insight_contains:
        - "check"
        - "validate"
        - "ensure"
        - "verify"
        - "audit"
        - "review"
    recommended_pattern: validation
    confidence: medium
  - condition:
      insight_contains:
        - "parse"
        - "transform"
        - "analyze data"
        - "process file"
        - "extract"
    recommended_pattern: data-processing
    confidence: medium
  - condition:
      has_code_examples: true
      has_step_by_step: true
    recommended_pattern: phase-based
    confidence: high
  - condition:
      cluster_size: 1
      complexity: low
    recommended_pattern: phase-based
    confidence: high
  - condition:
      cluster_size: "> 4"
      has_multiple_approaches: true
    recommended_pattern: mode-based
    confidence: medium
# Insight Content → Skill Section Mappings
content_mappings:
  problem_description:
    triggers:
      - "The problem"
      - "Challenge"
      - "Issue"
      - "We encountered"
    maps_to:
      section: "## Overview"
      subsection: "Problem Statement"
  solution_explanation:
    triggers:
      - "The solution"
      - "To fix this"
      - "We resolved"
      - "The approach"
    maps_to:
      section: "## Workflow"
      subsection: "Implementation Steps"
  code_example:
    triggers:
      - "```"
      - "Example:"
      - "For instance"
    maps_to:
      section: "## Examples"
      file: "examples/code-samples.md"
  best_practice:
    triggers:
      - "Best practice"
      - "Always"
      - "Never"
      - "Recommended"
      - "Avoid"
    maps_to:
      section: "## Important Reminders"
      file: "data/insights-reference.md"
  checklist_items:
    triggers:
      - "- [ ]"
      - "1."
      - "2."
      - "Steps:"
    maps_to:
      section: "## Workflow"
      file: "templates/checklist.md"
  trade_offs:
    triggers:
      - "trade-off"
      - "pros and cons"
      - "advantage"
      - "disadvantage"
      - "however"
    maps_to:
      section: "## Decision Guide"
      subsection: "Considerations"
  warning_caution:
    triggers:
      - "Warning"
      - "Caution"
      - "Note"
      - "Important"
      - "Be careful"
    maps_to:
      section: "## Important Reminders"
      priority: high
# Skill Complexity Matrix
# Determines file structure based on insight characteristics
complexity_matrix:
  minimal:
    condition:
      insight_count: 1
      total_lines: "< 50"
      has_code: false
    structure:
      - "SKILL.md"
      - "README.md"
      - "plugin.json"
      - "CHANGELOG.md"
    pattern: phase-based
  standard:
    condition:
      insight_count: "2-4"
      total_lines: "50-200"
      has_code: true
    structure:
      - "SKILL.md"
      - "README.md"
      - "plugin.json"
      - "CHANGELOG.md"
      - "data/insights-reference.md"
      - "examples/usage-examples.md"
    pattern: phase-based
  complex:
    condition:
      insight_count: "> 4"
      total_lines: "> 200"
      has_multiple_topics: true
    structure:
      - "SKILL.md"
      - "README.md"
      - "plugin.json"
      - "CHANGELOG.md"
      - "data/insights-reference.md"
      - "examples/usage-examples.md"
      - "examples/code-samples.md"
      - "templates/checklist.md"
      - "templates/workflow-template.md"
    pattern: mode-based
# Category-Specific Skill Patterns
category_patterns:
testing:
preferred_pattern: validation
skill_name_suffix: "testing-guide"
description_template: "Use PROACTIVELY when [testing context] to ensure [quality aspect]"
typical_sections:
- "Test Strategy"
- "Common Pitfalls"
- "Best Practices"
- "Example Tests"
architecture:
preferred_pattern: validation
skill_name_suffix: "architecture-advisor"
description_template: "Guides [architecture decision] with [architectural principle]"
typical_sections:
- "Architectural Principles"
- "Pattern Selection"
- "Trade-offs Analysis"
- "Implementation Guidance"
hooks-and-events:
preferred_pattern: phase-based
skill_name_suffix: "hook-automation"
description_template: "Automates [hook-related task] for [benefit]"
typical_sections:
- "Hook Configuration"
- "Event Handling"
- "Debugging Tips"
- "Common Patterns"
performance:
preferred_pattern: validation
skill_name_suffix: "performance-optimizer"
description_template: "Analyzes [performance aspect] and generates [optimization]"
typical_sections:
- "Performance Analysis"
- "Optimization Techniques"
- "Measurement Approach"
- "Expected Improvements"
security:
preferred_pattern: validation
skill_name_suffix: "security-validator"
description_template: "Validates [security aspect] against [security standard]"
typical_sections:
- "Security Checklist"
- "Vulnerability Detection"
- "Remediation Steps"
- "Verification"
configuration:
preferred_pattern: phase-based
skill_name_suffix: "config-helper"
description_template: "Guides [configuration task] with [configuration approach]"
typical_sections:
- "Configuration Setup"
- "Common Patterns"
- "Troubleshooting"
- "Validation"


@@ -0,0 +1,348 @@
# Lessons Learned Index
This directory contains auto-extracted insights from Claude Code sessions using the Explanatory output style.
## Directory Structure
Insights are organized by category with timestamped, descriptive filenames:
```
docs/lessons-learned/
├── README.md (this file)
├── architecture/
│ ├── 2025-11-14-system-design-patterns.md
│ └── 2025-11-10-microservices-architecture.md
├── configuration/
│ └── 2025-11-12-config-inheritance.md
├── hooks-and-events/
│ ├── 2025-11-14-hook-debugging-strategy.md
│ └── 2025-11-13-lifecycle-events.md
├── performance/
│ └── 2025-11-11-optimization-techniques.md
├── react/
│ └── 2025-11-14-component-patterns.md
├── security/
│ └── 2025-11-09-auth-best-practices.md
├── testing/
│ └── 2025-11-14-tdd-workflow.md
├── typescript/
│ └── 2025-11-10-type-safety.md
├── version-control/
│ └── 2025-11-08-git-workflow.md
└── general/
└── 2025-11-07-general-tips.md
```
## Categories
### Architecture
- [`2025-11-16-skill-design-pattern-selection.md`](./architecture/2025-11-16-skill-design-pattern-selection.md) - 7 insights (Updated: 2025-11-16)
**Total**: 7 insights across 1 file
### Configuration
- [`2025-11-16-configuration-driven-design-pattern.md`](./configuration/2025-11-16-configuration-driven-design-pattern.md) - 10 insights (Updated: 2025-11-16)
**Total**: 10 insights across 1 file
### Version-control
- [`2025-11-16-skill-discovery-and-validation.md`](./version-control/2025-11-16-skill-discovery-and-validation.md) - 8 insights (Updated: 2025-11-16)
**Total**: 8 insights across 1 file
## Usage
Each category contains timestamped insight files with chronologically ordered insights. Insights are automatically categorized based on content analysis.
### Manual Categorization
If you need to recategorize an insight:
1. Cut the insight from the current file
2. Paste it into the appropriate category file
3. The index will auto-update on the next extraction
### Searching
Use grep to search across all insights:
```bash
grep -r "your search term" docs/lessons-learned/
```
Or use ripgrep for faster searches:
```bash
rg "your search term" docs/lessons-learned/
```
---
*Auto-generated by extract-explanatory-insights.sh hook*


@@ -0,0 +1,114 @@
# Architecture Insights - November 16, 2025
Auto-generated lessons learned from Claude Code Explanatory insights.
**Session**: 79b654b6-10f8-4c3c-92e1-a3535644366c
**Generated**: 2025-11-16 09:57:31
---
## Skill Design Pattern Selection
**Skill Design Pattern Selection**
The research revealed three key patterns for skill creation:
1. **Phase-based workflow** (used by skill-creator) - best for linear, multi-step processes
2. **Mode-based approach** (used by complex skills) - best for multiple distinct workflows
3. **Validation pattern** (used by auditor skills) - best for analysis and checking
For our insight-to-skill generator, we'll use the **phase-based workflow** pattern because the process is inherently sequential (discover → cluster → design → generate → install), and it aligns with how skill-creator already works, making the user experience consistent.
---


@@ -0,0 +1,169 @@
# Configuration Insights - November 16, 2025
Auto-generated lessons learned from Claude Code Explanatory insights.
**Session**: 79b654b6-10f8-4c3c-92e1-a3535644366c
**Generated**: 2025-11-16 09:57:31
---
## Configuration-Driven Design Pattern
**Configuration-Driven Design Pattern**
Rather than hardcoding clustering logic, we're externalizing it to YAML configuration files. This allows users to:
1. Tune similarity thresholds without editing code
2. Add new categories and keywords as their projects evolve
3. Customize skill generation patterns for their domain
This approach follows the "data over code" principle from the skill-creator research, making the skill more maintainable and adaptable.
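As a purely illustrative fragment (the file name and keys below are hypothetical, not taken from this skill), such an externalized clustering config might look like:

```yaml
# clustering-config.yaml (hypothetical example)
clustering:
  similarity_threshold: 0.6   # tunable without editing code
  min_cluster_size: 2
categories:
  testing:
    keywords: [test, vitest, coverage]
  hooks-and-events:
    keywords: [hook, lifecycle, event]
```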
---
## Template-Based Documentation vs. Executable Logic
**Template-Based Documentation vs. Executable Logic**
Notice that our templates use Jinja2-style syntax (`{{ variable }}`), but this is for documentation and reference only. In Claude Code skills, the actual logic is expressed as clear, step-by-step instructions in SKILL.md that I (Claude) interpret and execute. The templates serve as:
1. **Consistency guides** - showing expected output structure
2. **Reference patterns** - documenting best practices
3. **Quality standards** - setting expectations for generated content
This is different from traditional code generation where templates are programmatically rendered. Here, I read the templates as examples and generate similar output naturally.
---


@@ -0,0 +1,154 @@
# Version-control Insights - November 16, 2025
Auto-generated lessons learned from Claude Code Explanatory insights.
**Session**: 79b654b6-10f8-4c3c-92e1-a3535644366c
**Generated**: 2025-11-16 09:57:31
---
## Skill Discovery and Validation
**Skill Discovery and Validation**
Claude Code skills are discovered by reading the frontmatter of SKILL.md files. The key validation points are:
1. **YAML syntax** - Must be valid YAML between `---` delimiters
2. **Required fields** - `name` and `description` must be present and valid
3. **Name format** - Lowercase kebab-case, max 64 chars
4. **Description format** - Max 1024 chars, contains action verbs
By validating these before even attempting to load the skill, we catch 90%+ of skill loading issues early.
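Points 2-4 above can be sketched as a small check (a minimal sketch only: the YAML-parsing check from point 1 and the action-verb heuristic are omitted, and the function name and return shape are assumptions):

```python
import re

# lowercase kebab-case: starts with a letter, hyphen-separated groups
NAME_RE = re.compile(r"^[a-z][a-z0-9]*(-[a-z0-9]+)*$")

def validate_frontmatter(name: str, description: str) -> list[str]:
    """Return a list of problems with the frontmatter; empty list means valid."""
    problems = []
    if not NAME_RE.fullmatch(name) or len(name) > 64:
        problems.append("name must be lowercase kebab-case, max 64 chars")
    if not description or len(description) > 1024:
        problems.append("description is required, max 1024 chars")
    return problems
```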
---
## Skill as Documentation Pattern
**Skill as Documentation Pattern**
Notice that this skill doesn't execute code—it provides comprehensive instructions that I (Claude) interpret. The templates use Jinja2-style syntax for documentation reference, not programmatic rendering. This "skill as documentation" pattern is powerful because:
1. **Flexibility**: I can adapt instructions to context rather than rigidly executing code
2. **Maintainability**: Plain markdown is easier to update than code
3. **Clarity**: Users can read SKILL.md to understand exactly what will happen
4. **Extensibility**: Easy to add new phases, patterns, or validations
The configuration files (YAML) provide tunable parameters, while SKILL.md provides the logic I execute. This separation of concerns makes the skill both powerful and maintainable.
---


@@ -0,0 +1,286 @@
# Example: Clustering Analysis Output
This example shows what the clustering phase produces when analyzing a project's insights.
## Scenario
A project has been using the extract-explanatory-insights hook for 2 weeks, generating 12 insights across different categories.
---
## Phase 1: Discovery Summary
**Total Insights Found**: 12
**Date Range**: 2025-11-01 to 2025-11-14
**Unique Sessions**: 8
**Categories**:
- testing: 5 insights
- hooks-and-events: 3 insights
- architecture: 2 insights
- performance: 2 insights
**Preview**:
1. "Modern Testing Strategy with Testing Trophy" (testing, 2025-11-01)
2. "Hook Deduplication Session Management" (hooks-and-events, 2025-11-03)
3. "CPU Usage Prevention in Vitest" (testing, 2025-11-03)
4. "BSD awk Compatibility in Hook Scripts" (hooks-and-events, 2025-11-05)
5. "Semantic Query Priorities in React Testing Library" (testing, 2025-11-06)
---
## Phase 2: Clustering Analysis
### Cluster 1: Testing Strategy
**Size**: 3 insights
**Similarity Score**: 0.75 (high)
**Recommended Complexity**: Standard
**Recommended Pattern**: Validation
**Insights**:
1. "Modern Testing Strategy with Testing Trophy"
- Keywords: testing, integration, unit, e2e, trophy, kent-c-dodds
- Category: testing
- Date: 2025-11-01
- Length: 156 lines
- Has code examples: Yes
2. "Semantic Query Priorities in React Testing Library"
- Keywords: testing, react, semantic, query, getByRole, accessibility
- Category: testing
- Date: 2025-11-06
- Length: 89 lines
- Has code examples: Yes
3. "What NOT to Test - Brittle Patterns"
- Keywords: testing, avoid, brittle, implementation-details, user-behavior
- Category: testing
- Date: 2025-11-08
- Length: 67 lines
- Has code examples: No
**Shared Keywords**: testing (3), react (2), user (2), behavior (2), semantic (2)
**Cluster Characteristics**:
- All in same category (testing)
- Temporal span: 7 days
- Common theme: User-focused testing approach
- Total code examples: 5 blocks
- Actionable items: 12
**Suggested Skill Name**: "user-focused-testing-guide"
**Suggested Description**: "Use PROACTIVELY when writing tests to ensure user-centric testing strategy following Testing Trophy methodology and React Testing Library best practices"
**Skill Structure Recommendation**:
```
SKILL.md sections:
- Overview (Testing Trophy philosophy)
- Phase 1: Query Selection (semantic queries)
- Phase 2: Test Writing (user workflows)
- Phase 3: Avoiding Brittle Tests
- Important Reminders (what NOT to test)
- Examples (from code blocks)
```
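The similarity scores above come from multi-factor analysis, with keyword overlap as the dominant signal. A minimal sketch of a keyword-overlap (Jaccard) score — a hypothetical helper for illustration, not the skill's actual scoring code:

```bash
#!/usr/bin/env bash
# Jaccard overlap between two space-separated keyword lists:
# |intersection| / |union| of the two keyword sets.
jaccard() {
    local a="$1" b="$2"
    local inter union
    inter=$(comm -12 <(tr ' ' '\n' <<<"$a" | sort -u) \
                     <(tr ' ' '\n' <<<"$b" | sort -u) | wc -l)
    union=$(cat <(tr ' ' '\n' <<<"$a") <(tr ' ' '\n' <<<"$b") | sort -u | wc -l)
    awk -v i="$inter" -v u="$union" 'BEGIN { printf "%.2f\n", u ? i / u : 0 }'
}

jaccard "testing react semantic query" "testing react user behavior"  # → 0.33
```

The real score also weighs category match, temporal proximity, and content overlap, which is why same-category clusters with close dates score higher.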
---
### Cluster 2: Hook Development
**Size**: 2 insights
**Similarity Score**: 0.68 (medium-high)
**Recommended Complexity**: Minimal to Standard
**Recommended Pattern**: Phase-based
**Insights**:
1. "Hook Deduplication Session Management"
- Keywords: hook, deduplication, session, hash, state
- Category: hooks-and-events
- Date: 2025-11-03
- Length: 124 lines
- Has code examples: Yes (bash)
2. "BSD awk Compatibility in Hook Scripts"
- Keywords: hook, awk, bsd, macos, compatibility, portable
- Category: hooks-and-events
- Date: 2025-11-05
- Length: 78 lines
- Has code examples: Yes (bash)
**Shared Keywords**: hook (2), bash (2), script (2), compatibility (2)
**Cluster Characteristics**:
- Same category (hooks-and-events)
- Temporal span: 2 days (high proximity)
- Common theme: Hook script development best practices
- Total code examples: 4 blocks (all bash)
- Actionable items: 8
**Suggested Skill Name**: "hook-script-development-guide"
**Suggested Description**: "Guides Claude Code hook script development with deduplication patterns and cross-platform compatibility best practices for macOS/Linux environments"
**Skill Structure Recommendation**:
```
SKILL.md sections:
- Overview (Hook development challenges)
- Phase 1: Deduplication Strategy (session vs content)
- Phase 2: Cross-Platform Compatibility (BSD awk)
- Phase 3: Testing and Validation
- Examples (bash scripts)
- Templates (hook-template.sh)
```
---
### Cluster 3: Performance Optimization
**Size**: 2 insights
**Similarity Score**: 0.62 (medium)
**Recommended Complexity**: Minimal
**Recommended Pattern**: Phase-based
**Insights**:
1. "CPU Usage Prevention in Vitest"
- Keywords: vitest, cpu, performance, threads, watch-mode
- Category: testing
- Date: 2025-11-03
- Length: 45 lines
- Has code examples: Yes (npm scripts)
2. "Bundle Size Optimization"
- Keywords: bundle, size, webpack, optimization, tree-shaking
- Category: performance
- Date: 2025-11-09
- Length: 92 lines
- Has code examples: Yes (webpack config)
**Shared Keywords**: performance (2), optimization (2), build (2)
**Cluster Characteristics**:
- Different categories (testing, performance)
- Temporal span: 6 days
- Common theme: Build/test performance
- Total code examples: 3 blocks
- Actionable items: 6
**Suggested Skill Name**: "build-performance-optimizer"
**Suggested Description**: "Analyzes build and test performance issues and generates optimization recommendations for Vitest test runs and webpack bundle sizes"
**Skill Structure Recommendation**:
```
SKILL.md sections:
- Overview (Performance impact on DX)
- Phase 1: Test Performance Analysis
- Phase 2: Bundle Size Analysis
- Phase 3: Optimization Implementation
- Important Reminders (check before test, monitor bundle)
- Examples (npm scripts, webpack config)
```
---
### Standalone Insights
#### Standalone 1: "Hook State Management Patterns"
**Quality Score**: 0.85 (high - qualifies for standalone skill)
**Category**: hooks-and-events
**Date**: 2025-11-07
**Length**: 134 lines
**Code Examples**: Yes (3 blocks)
**Why Standalone**:
- Doesn't cluster with other hook insights (different focus)
- High quality with comprehensive coverage
- Self-contained topic (state management)
- Multiple actionable patterns
**Suggested Skill Name**: "hook-state-manager"
**Suggested Description**: "Automates state management setup for Claude Code hooks with persistent storage, cleanup, and safe concurrency patterns"
---
#### Standalone 2: "Architecture Decision Records"
**Quality Score**: 0.82 (high - qualifies for standalone skill)
**Category**: architecture
**Date**: 2025-11-12
**Length**: 156 lines
**Code Examples**: Yes (template)
**Why Standalone**:
- Unique topic (no other architecture insights)
- High quality with complete template
- Valuable for documentation
- Industry best practice
**Suggested Skill Name**: "adr-documentation-helper"
**Suggested Description**: "Guides creation of Architecture Decision Records (ADRs) following industry standards with templates and integration with project documentation"
---
### Low-Quality Insights (Not Recommended for Skills)
#### "Git Branch Naming Convention"
**Quality Score**: 0.42 (low)
**Category**: version-control
**Reason for Exclusion**: Too simple, covered by existing conventions, no unique value
#### "TypeScript Strict Mode Benefits"
**Quality Score**: 0.38 (low)
**Category**: typescript
**Reason for Exclusion**: Common knowledge, well-documented elsewhere, not actionable enough
---
## User Decision Points
At this stage, the skill would present the following options to the user:
**Option 1: Generate All Recommended Skills** (5 skills)
- user-focused-testing-guide (Cluster 1)
- hook-script-development-guide (Cluster 2)
- build-performance-optimizer (Cluster 3)
- hook-state-manager (Standalone 1)
- adr-documentation-helper (Standalone 2)
**Option 2: Select Specific Skills**
- User picks which clusters/standalones to convert
**Option 3: Modify Clusters**
- Split large clusters
- Merge small clusters
- Recategorize insights
- Adjust complexity levels
**Option 4: Tune Thresholds and Retry**
- Increase cluster_minimum (0.6 → 0.7) for tighter clusters
- Decrease standalone_quality (0.8 → 0.7) for more standalone skills
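The thresholds in Option 4 live in the generator's configuration. A hypothetical `clustering-config.yaml` fragment — key names are illustrative, not the skill's actual schema:

```yaml
clustering:
  cluster_minimum: 0.6     # raise toward 0.7 for tighter clusters
  standalone_quality: 0.8  # lower toward 0.7 to promote more standalone skills
  max_cluster_size: 5      # clusters above this size are sub-clustered
```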
---
## Proceeding to Phase 3
If user selects "user-focused-testing-guide" to generate, the skill would proceed to Phase 3: Interactive Skill Design with the following proposal:
**Skill Design Proposal**:
- Name: `user-focused-testing-guide`
- Description: "Use PROACTIVELY when writing tests to ensure user-centric testing strategy following Testing Trophy methodology and React Testing Library best practices"
- Complexity: Standard
- Pattern: Validation
- Structure:
- SKILL.md with validation workflow
- data/insights-reference.md with 3 source insights
- examples/query-examples.md with semantic query patterns
- templates/test-checklist.md with testing checklist
User can then customize before generation begins.
---
**This example demonstrates**:
1. How clustering groups related insights
2. What information is presented for each cluster
3. How standalone insights are identified
4. Why some insights are excluded
5. What decisions users can make
6. How the process flows into Phase 3

# Changelog
## [0.1.0] - 2025-11-16
### Added
- Initial release
- Generated from 1 insight (Hook Deduplication Session Management)
- Phase 1: Choose Deduplication Strategy
- Phase 2: Implement Content-Based Deduplication
- Phase 3: Implement Hash Rotation
- Phase 4: Testing and Validation
- Code examples for bash hook implementation
- Troubleshooting section
### Features
- Content-based deduplication using SHA256 hashes
- Session-independent duplicate detection
- Efficient hash storage with rotation
- State management best practices
### Generated By
- insight-skill-generator v0.1.0
- Source category: hooks-and-events
- Original insight date: 2025-11-03

# Hook Deduplication Guide
Implement robust content-based deduplication for Claude Code hooks.
## Overview
This skill guides you through implementing SHA256 hash-based deduplication to prevent duplicate insights or data from being stored across sessions.
## When to Use
**Trigger Phrases**:
- "implement hook deduplication"
- "prevent duplicate insights in hooks"
- "content-based deduplication for hooks"
## Quick Start
```bash
# Test the skill
You: "I need to add deduplication to my hook to prevent storing the same insight twice"
Claude: [Activates hook-deduplication-guide]
- Explains content-based vs session-based strategies
- Guides implementation of SHA256 hashing
- Shows hash rotation to prevent file bloat
- Provides testing validation
```
## What You'll Get
- Content-based deduplication using SHA256
- Efficient hash storage with rotation
- Testing and validation guidance
- Best practices for hook state management
## Installation
```bash
# This is an example generated by insight-skill-generator
# Copy to your skills directory if you want to use it
cp -r examples/example-generated-skill ~/.claude/skills/hook-deduplication-guide
```
## Learn More
See [SKILL.md](SKILL.md) for complete workflow documentation.
---
**Generated by**: insight-skill-generator v0.1.0
**Source**: 1 insight from hooks-and-events category

---
name: hook-deduplication-guide
description: Use PROACTIVELY when developing Claude Code hooks to implement content-based deduplication and prevent duplicate insight storage across sessions
---
# Hook Deduplication Guide
## Overview
This skill guides you through implementing robust deduplication for Claude Code hooks, using content-based hashing instead of session-based tracking. Prevents duplicate insights from being stored while allowing multiple unique insights per session.
**Based on 1 insight**:
- Hook Deduplication Session Management (hooks-and-events, 2025-11-03)
**Key Capabilities**:
- Content-based deduplication using SHA256 hashes
- Session-independent duplicate detection
- Efficient hash storage with rotation
- State management best practices
## When to Use This Skill
**Trigger Phrases**:
- "implement hook deduplication"
- "prevent duplicate insights in hooks"
- "content-based deduplication for hooks"
- "hook state management patterns"
**Use Cases**:
- Developing new Claude Code hooks that store data
- Refactoring hooks to prevent duplicates
- Implementing efficient state management for hooks
- Debugging duplicate data issues in hooks
**Do NOT use when**:
- Creating hooks that don't store data (read-only hooks)
- Session-based deduplication is actually desired
- Hook doesn't run frequently enough to need deduplication
## Response Style
Educational and practical - explain the why behind content-based vs. session-based deduplication, then guide implementation with code examples.
---
## Workflow
### Phase 1: Choose Deduplication Strategy
**Purpose**: Determine whether content-based or session-based deduplication is appropriate.
**Steps**:
1. **Assess hook behavior**:
- How often does the hook run? (per message, per session, per event)
- What data is being stored? (insights, logs, metrics)
- Is the same content likely to appear across sessions?
2. **Evaluate deduplication needs**:
- **Content-based**: Use when the same insight/data might appear in different sessions
- Example: Extract-explanatory-insights hook (same insight might appear in multiple conversations)
- **Session-based**: Use when duplicates should only be prevented within a session
- Example: Error logging (same error in different sessions should be logged)
3. **Recommend strategy**:
- For insights/lessons-learned: Content-based (SHA256 hashing)
- For session logs/events: Session-based (session ID tracking)
- For unique events: No deduplication needed
**Output**: Clear recommendation on deduplication strategy.
**Common Issues**:
- **Unsure which to use**: Default to content-based for data that's meant to be unique (insights, documentation)
- **Performance concerns**: Content-based hashing is fast (<1ms for typical content)
---
### Phase 2: Implement Content-Based Deduplication
**Purpose**: Set up SHA256 hash-based deduplication with state management.
**Steps**:
1. **Create state directory**:
```bash
mkdir -p ~/.claude/state/hook-state/
```
2. **Initialize hash storage file**:
```bash
HASH_FILE="$HOME/.claude/state/hook-state/content-hashes.txt"
touch "$HASH_FILE"
```
3. **Implement hash generation**:
```bash
# Generate SHA256 hash of content
compute_content_hash() {
local content="$1"
echo -n "$content" | sha256sum | awk '{print $1}'
}
```
4. **Check for duplicates**:
```bash
# Returns 0 (success) if content is a duplicate, 1 if new,
# so the caller can write: if is_duplicate "$content"; then skip
is_duplicate() {
    local content="$1"
    local content_hash
    content_hash=$(compute_content_hash "$content")
    if grep -Fxq "$content_hash" "$HASH_FILE"; then
        return 0  # Duplicate found
    else
        return 1  # New content
    fi
}
```
5. **Store hash after processing**:
```bash
store_content_hash() {
local content="$1"
local content_hash=$(compute_content_hash "$content")
echo "$content_hash" >> "$HASH_FILE"
}
```
6. **Integrate into hook**:
```bash
# In your hook script
content="extracted insight or data"
if is_duplicate "$content"; then
# Skip - duplicate content
echo "Duplicate detected, skipping..." >&2
exit 0
fi
# Process new content
process_content "$content"
# Store hash to prevent future duplicates
store_content_hash "$content"
```
**Output**: Working content-based deduplication in your hook.
**Common Issues**:
- **Hash file grows too large**: Implement rotation (see Phase 3)
- **Missed duplicates (false negatives)**: Normalize content (whitespace, formatting) before hashing - formatting-only differences otherwise produce different hashes
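One portability note for this phase: `sha256sum` is a GNU coreutils tool and may be absent on macOS, where the BSD userland ships `shasum` instead. In the spirit of the cross-platform advice elsewhere in this plugin, a portable variant of the hash function might look like:

```bash
# Portable SHA256: prefer sha256sum (GNU), fall back to shasum -a 256 (BSD/macOS)
compute_content_hash() {
    local content="$1"
    if command -v sha256sum >/dev/null 2>&1; then
        echo -n "$content" | sha256sum | awk '{print $1}'
    else
        echo -n "$content" | shasum -a 256 | awk '{print $1}'
    fi
}
```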
---
### Phase 3: Implement Hash Rotation
**Purpose**: Prevent hash file from growing indefinitely.
**Steps**:
1. **Set rotation limit**:
```bash
MAX_HASHES=10000 # Keep last 10,000 hashes
```
2. **Implement rotation logic**:
```bash
rotate_hash_file() {
local hash_file="$1"
local max_hashes="${2:-10000}"
# Count current hashes
local current_count=$(wc -l < "$hash_file")
# Rotate if needed
if [ "$current_count" -gt "$max_hashes" ]; then
tail -n "$max_hashes" "$hash_file" > "${hash_file}.tmp"
mv "${hash_file}.tmp" "$hash_file"
echo "Rotated hash file: kept last $max_hashes hashes" >&2
fi
}
```
3. **Call rotation periodically**:
```bash
# After storing new hash
store_content_hash "$content"
rotate_hash_file "$HASH_FILE" 10000
```
**Output**: Self-maintaining hash storage with bounded size.
**Common Issues**:
- **Rotation too aggressive**: Increase MAX_HASHES
- **Rotation too infrequent**: Consider checking count before every append
---
### Phase 4: Testing and Validation
**Purpose**: Verify deduplication works correctly.
**Steps**:
1. **Test duplicate detection**:
```bash
# First run - should process
echo "Test insight" | your_hook.sh
# Check: Content was processed
# Second run - should skip
echo "Test insight" | your_hook.sh
# Check: Duplicate detected message
```
2. **Test multiple unique items**:
```bash
echo "Insight 1" | your_hook.sh # Processed
echo "Insight 2" | your_hook.sh # Processed
echo "Insight 3" | your_hook.sh # Processed
echo "Insight 1" | your_hook.sh # Skipped (duplicate)
```
3. **Verify hash file**:
```bash
cat ~/.claude/state/hook-state/content-hashes.txt
# Should show 3 unique hashes (not 4)
```
4. **Test rotation**:
```bash
# Generate more than MAX_HASHES entries
for i in {1..10500}; do
echo "Insight $i" | your_hook.sh
done
# Verify file size bounded
wc -l ~/.claude/state/hook-state/content-hashes.txt
# Should be ~10000, not 10500
```
**Output**: Confirmed working deduplication with proper rotation.
---
## Reference Materials
- [Original Insight](data/insights-reference.md) - Full context on hook deduplication patterns
---
## Important Reminders
- **Use content-based deduplication for insights/documentation** - prevents duplicates across sessions
- **Use session-based deduplication for logs/events** - same event in different sessions is meaningful
- **Normalize content before hashing** - whitespace differences shouldn't create false negatives
- **Implement rotation** - prevent unbounded hash file growth
- **Hash storage location**: `~/.claude/state/hook-state/` (not project-specific)
- **SHA256 is fast** - no performance concerns for typical hook data
- **Test both paths** - verify both new content and duplicates work correctly
**Warnings**:
- ⚠️ **Do not use session ID alone** - it blocks additional unique insights from the same session and still stores true duplicates across sessions
- ⚠️ **Do not skip rotation** - hash file will grow indefinitely
- ⚠️ **Do not hash before normalization** - formatting changes will cause false negatives
---
## Best Practices
1. **Choose the Right Strategy**: Content-based for unique data, session-based for session-specific events
2. **Normalize Before Hashing**: Strip whitespace, lowercase if appropriate, consistent formatting
3. **Efficient Storage**: Use grep -Fxq for fast hash lookups (fixed-string, line-match, quiet)
4. **Bounded Growth**: Implement rotation to prevent file bloat
5. **Clear Logging**: Log when duplicates are detected for debugging
6. **State Location**: Use ~/.claude/state/hook-state/ for cross-project state
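Practice 2 can be made concrete with a small helper — a sketch, assuming whitespace collapsing is the normalization you want:

```bash
# Collapse runs of whitespace (including newlines) into single spaces and
# trim the ends, so formatting-only edits hash to the same value
normalize_content() {
    tr -s '[:space:]' ' ' <<<"$1" | sed 's/^ *//; s/ *$//'
}

# Hash the normalized form, not the raw input:
# content_hash=$(compute_content_hash "$(normalize_content "$content")")
```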
---
## Troubleshooting
### Duplicates not being detected
**Symptoms**: Same content processed multiple times
**Solution**:
1. Check hash file exists and is writable
2. Verify store_content_hash is called after processing
3. Check content normalization (whitespace differences)
4. Verify grep command uses -Fxq flags
**Prevention**: Test deduplication immediately after implementation
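A quick diagnostic for this case — recompute the hash by hand and check the store directly (paths as documented above; replace the sample content with the insight you expected to be skipped):

```bash
# Does the hash the hook would compute actually exist in the store?
HASH_FILE="${HASH_FILE:-$HOME/.claude/state/hook-state/content-hashes.txt}"
content="Test insight"
hash=$(echo -n "$content" | sha256sum | awk '{print $1}')
if grep -Fxq "$hash" "$HASH_FILE" 2>/dev/null; then
    echo "hash present - second run should be skipped"
else
    echo "hash missing - content will be treated as new"
fi
```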
---
### Hash file growing too large
**Symptoms**: Hash file exceeds MAX_HASHES significantly
**Solution**:
1. Verify rotate_hash_file is called
2. Check MAX_HASHES value is reasonable
3. Manually rotate if needed: `tail -n 10000 hashes.txt > hashes.tmp && mv hashes.tmp hashes.txt`
**Prevention**: Call rotation after every hash storage
---
### False positives (new content marked as duplicate)
**Symptoms**: Different content being skipped
**Solution**:
1. Check for hash collisions (extremely unlikely with SHA256)
2. Verify content is actually different
3. Check normalization isn't too aggressive
4. Review recent hashes in file
**Prevention**: Use consistent normalization, test with diverse content
---
## Next Steps
After implementing deduplication:
1. Monitor hash file growth over time
2. Tune MAX_HASHES based on usage patterns
3. Consider adding metrics (duplicates prevented, storage size)
4. Share pattern with team for other hooks
---
## Metadata
**Source Insights**:
- Session: abc123-session-id
- Date: 2025-11-03
- Category: hooks-and-events
- File: docs/lessons-learned/hooks-and-events/2025-11-03-hook-deduplication.md
**Skill Version**: 0.1.0
**Generated**: 2025-11-16
**Last Updated**: 2025-11-16

# Insights Reference: hook-deduplication-guide
This document contains the original insight from Claude Code's Explanatory output style that was used to create the **Hook Deduplication Guide** skill.
## Overview
**Total Insights**: 1
**Date Range**: 2025-11-03
**Categories**: hooks-and-events
**Sessions**: 1 unique session
---
## 1. Hook Deduplication Session Management
**Metadata**:
- **Date**: 2025-11-03
- **Category**: hooks-and-events
- **Session**: abc123-session-id
- **Source File**: docs/lessons-learned/hooks-and-events/2025-11-03-hook-deduplication.md
**Original Content**:
The extract-explanatory-insights hook initially used session-based deduplication, which stored at most one insight per session. This created two limitations: additional unique insights from the same session were dropped, and the same valuable insight appearing in different sessions could still be stored more than once.
By switching to content-based deduplication using SHA256 hashing, we can:
1. **Allow multiple unique insights per session** - Different insights in the same conversation are all preserved
2. **Prevent true duplicates across sessions** - The same insight appearing in multiple conversations is stored only once
3. **Maintain efficient storage** - Hash file rotation keeps storage bounded
The implementation involves:
**Hash Generation**:
```bash
compute_content_hash() {
local content="$1"
echo -n "$content" | sha256sum | awk '{print $1}'
}
```
**Duplicate Detection**:
```bash
is_duplicate() {
    local content="$1"
    local content_hash
    content_hash=$(compute_content_hash "$content")
    if grep -Fxq "$content_hash" "$HASH_FILE"; then
        return 0  # Duplicate
    else
        return 1  # New content
    fi
}
```
**Hash Storage with Rotation**:
```bash
store_content_hash() {
local content="$1"
local content_hash=$(compute_content_hash "$content")
echo "$content_hash" >> "$HASH_FILE"
# Rotate if file exceeds MAX_HASHES
local count=$(wc -l < "$HASH_FILE")
if [ "$count" -gt 10000 ]; then
tail -n 10000 "$HASH_FILE" > "${HASH_FILE}.tmp"
mv "${HASH_FILE}.tmp" "$HASH_FILE"
fi
}
```
This approach provides the best of both worlds: session independence and true deduplication based on content, not session boundaries.
---
## How This Insight Informs the Skill
### Hook Deduplication Session Management → Phase-Based Workflow
The insight's structure (problem → solution → implementation) maps directly to the skill's phases:
- **Problem Description** → Phase 1: Choose Deduplication Strategy
- Explains why session-based is insufficient
- Defines when content-based is needed
- **Solution Explanation** → Phase 2: Implement Content-Based Deduplication
- Hash generation logic
- Duplicate detection mechanism
- State file management
- **Implementation Details** → Phase 3: Implement Hash Rotation
- Rotation logic to prevent unbounded growth
- MAX_HASHES configuration
- **Code Examples** → All phases
- Bash functions extracted and integrated into workflow steps
---
## Additional Context
**Why This Insight Was Selected**:
This insight was selected for skill generation because it:
1. Provides a complete, actionable pattern
2. Includes working code examples
3. Solves a common problem in hook development
4. Is generally applicable (not project-specific)
5. Has clear benefits over the naive approach
**Quality Score**: 0.85 (high - qualified for standalone skill)
---
**Generated**: 2025-11-16
**Last Updated**: 2025-11-16

{
"name": "hook-deduplication-guide",
"version": "0.1.0",
"description": "Use PROACTIVELY when developing Claude Code hooks to implement content-based deduplication and prevent duplicate insight storage across sessions",
"type": "skill",
"author": "Connor",
"category": "productivity",
"tags": [
"hooks",
"deduplication",
"state-management",
"bash",
"generated-from-insights"
]
}

# Troubleshooting Guide
## No insights found
**Symptoms**: Phase 1 reports 0 insights
**Solution**:
1. Verify `docs/lessons-learned/` exists in project
2. Check for alternative locations (ask user)
3. Verify insights were actually generated by extract-explanatory-insights hook
4. Check file naming pattern matches YYYY-MM-DD-*.md
**Prevention**: Set up extract-explanatory-insights hook if not already configured
---
## All insights cluster into one giant skill
**Symptoms**: Phase 2 creates one cluster with 10+ insights
**Solution**:
1. Increase clustering threshold (e.g., 0.6 → 0.7)
2. Enable sub-clustering (split by sub-topics or time periods)
3. Manually split cluster in Phase 2 user review
4. Create mode-based skill with different modes for different sub-topics
**Prevention**: Tune clustering-config.yaml thresholds for your domain
---
## Generated skill doesn't load
**Symptoms**: Skill not recognized by Claude Code after installation
**Solution**:
1. Check YAML frontmatter syntax (no tabs, proper dashes)
2. Verify name field is lowercase kebab-case
3. Check description doesn't contain special characters that break parsing
4. Restart Claude Code session
5. Check skill file permissions (should be readable)
**Prevention**: Always run validation in Phase 4 before installation
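A few of these checks can be scripted — a hedged sketch (the path and patterns are illustrative, not the skill's actual validator):

```bash
# Hypothetical sanity checks on a generated SKILL.md
skill="$HOME/.claude/skills/hook-deduplication-guide/SKILL.md"
head -n 1 "$skill" 2>/dev/null | grep -qx -- '---' || echo "missing frontmatter opening ---"
grep -q '^name: [a-z0-9][a-z0-9-]*$' "$skill" 2>/dev/null || echo "name is not lowercase kebab-case"
if grep -q "$(printf '\t')" "$skill" 2>/dev/null; then
    echo "file contains tab characters (YAML forbids tabs)"
fi
```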
---
## Trigger phrases don't activate skill
**Symptoms**: Using trigger phrase doesn't invoke the skill
**Solution**:
1. Make trigger phrases more specific (avoid overly generic phrases)
2. Include domain keywords in trigger phrases
3. Add "PROACTIVELY when" to description for auto-triggering
4. Try exact phrase match vs. semantic match
5. Check for conflicts with built-in commands or other skills
**Prevention**: Test trigger phrases in Phase 5 before finalizing
---
## Generated content quality is low
**Symptoms**: SKILL.md is vague, missing details, or poorly organized
**Solution**:
1. Check insight quality (garbage in, garbage out)
2. Manually edit SKILL.md to improve clarity
3. Add more examples and context
4. Reorganize sections for better flow
5. Iterate to version 0.2.0 with improvements
**Prevention**: Filter low-quality insights in Phase 2, prioritize insights with clear action items

# Template: Insight-Based Skill (SKILL.md)
This template shows the recommended structure for skills generated from insights.
Use this as a reference pattern when creating SKILL.md files.
---
name: {{ skill_name }}
description: {{ description }}
---
# {{ skill_title }}
## Overview
{{ overview_text }}
**Based on {{ insight_count }} insight(s)**:
{% for insight in insights %}
- {{ insight.title }} ({{ insight.category }}, {{ insight.date }})
{% endfor %}
**Key Capabilities**:
{% for capability in capabilities %}
- {{ capability }}
{% endfor %}
## When to Use This Skill
**Trigger Phrases**:
{% for trigger in trigger_phrases %}
- "{{ trigger }}"
{% endfor %}
**Use Cases**:
{% for use_case in use_cases %}
- {{ use_case }}
{% endfor %}
**Do NOT use when**:
{% for anti_use_case in anti_use_cases %}
- {{ anti_use_case }}
{% endfor %}
## Response Style
{{ response_style }}
---
{% if pattern == 'phase-based' %}
## Workflow
{% for phase in phases %}
### Phase {{ loop.index }}: {{ phase.name }}
**Purpose**: {{ phase.purpose }}
**Steps**:
{% for step in phase.steps %}
{{ loop.index }}. {{ step }}
{% endfor %}
**Output**: {{ phase.output }}
{% if phase.common_issues %}
**Common Issues**:
{% for issue in phase.common_issues %}
- {{ issue }}
{% endfor %}
{% endif %}
{% endfor %}
{% elif pattern == 'mode-based' %}
## Mode Selection
This skill operates in {{ modes|length }} distinct modes. Choose based on your needs:
{% for mode in modes %}
### Mode {{ loop.index }}: {{ mode.name }}
**When to use**: {{ mode.when_to_use }}
**Steps**:
{% for step in mode.steps %}
{{ loop.index }}. {{ step }}
{% endfor %}
**Output**: {{ mode.output }}
{% endfor %}
{% elif pattern == 'validation' %}
## Validation Workflow
### Analysis Phase
**Steps**:
{% for step in analysis_steps %}
{{ loop.index }}. {{ step }}
{% endfor %}
### Issue Detection
**Checks performed**:
{% for check in checks %}
- {{ check }}
{% endfor %}
### Recommendations
**Output format**:
- Issue severity (Critical/High/Medium/Low)
- Issue description
- Recommended fix
- Code examples (if applicable)
{% endif %}
---
## Reference Materials
{% if has_data_dir %}
Detailed reference information is available in:
- [Insights Reference](data/insights-reference.md) - Original insights that informed this skill
{% if has_additional_data %}
{% for data_file in data_files %}
- [{{ data_file.title }}](data/{{ data_file.filename }}) - {{ data_file.description }}
{% endfor %}
{% endif %}
{% endif %}
{% if has_examples %}
**Examples**:
{% for example in examples %}
- [{{ example.title }}](examples/{{ example.filename }}) - {{ example.description }}
{% endfor %}
{% endif %}
{% if has_templates %}
**Templates**:
{% for template in templates %}
- [{{ template.title }}](templates/{{ template.filename }}) - {{ template.description }}
{% endfor %}
{% endif %}
---
## Important Reminders
{% for reminder in important_reminders %}
- {{ reminder }}
{% endfor %}
{% if warnings %}
**Warnings**:
{% for warning in warnings %}
- ⚠️ {{ warning }}
{% endfor %}
{% endif %}
---
## Best Practices
{% for practice in best_practices %}
{{ loop.index }}. **{{ practice.title }}**: {{ practice.description }}
{% endfor %}
---
## Troubleshooting
{% for issue in troubleshooting %}
### {{ issue.problem }}
**Symptoms**: {{ issue.symptoms }}
**Solution**: {{ issue.solution }}
{% if issue.prevention %}
**Prevention**: {{ issue.prevention }}
{% endif %}
{% endfor %}
---
## Next Steps
After using this skill:
{% for next_step in next_steps %}
{{ loop.index }}. {{ next_step }}
{% endfor %}
---
## Metadata
**Source Insights**:
{% for insight in insights %}
- Session: {{ insight.session_id }}
- Date: {{ insight.date }}
- Category: {{ insight.category }}
- File: {{ insight.source_file }}
{% endfor %}
**Skill Version**: {{ version }}
**Generated**: {{ generated_date }}
**Last Updated**: {{ updated_date }}

# Template: Actionable Checklist (templates/checklist.md)
This template creates actionable checklists from insight action items.
# {{ skill_title }} - Checklist
This checklist extracts actionable items from the insights that inform this skill.
## Overview
Use this checklist to ensure you've applied all recommendations from the **{{ skill_title }}** skill.
**Total Items**: {{ total_items }}
**Estimated Time**: {{ estimated_time }}
---
{% for section in sections %}
## {{ section.name }}
{{ section.description }}
{% for item in section.items %}
- [ ] {{ item.description }}
{% if item.details %}
- Details: {{ item.details }}
{% endif %}
{% if item.why %}
- Why: {{ item.why }}
{% endif %}
{% if item.how %}
- How: {{ item.how }}
{% endif %}
{% if item.validation %}
- Validation: {{ item.validation }}
{% endif %}
{% endfor %}
{% if section.notes %}
**Notes**:
{% for note in section.notes %}
- {{ note }}
{% endfor %}
{% endif %}
---
{% endfor %}
## Verification
After completing this checklist:
{% for verification in verifications %}
- [ ] {{ verification }}
{% endfor %}
## Common Mistakes
{% for mistake in common_mistakes %}
{{ loop.index }}. **{{ mistake.what }}**
- How to avoid: {{ mistake.how_to_avoid }}
{% if mistake.fix %}
- If you made this mistake: {{ mistake.fix }}
{% endif %}
{% endfor %}
---
**Source**: Generated from {{ insight_count }} insight(s)
**Last Updated**: {{ updated_date }}

# Template: Insights Reference (data/insights-reference.md)
This template consolidates the original insights that informed a generated skill.
# Insights Reference: {{ skill_name }}
This document contains the original insights from Claude Code's Explanatory output style that were used to create the **{{ skill_title }}** skill.
## Overview
**Total Insights**: {{ insight_count }}
**Date Range**: {{ earliest_date }} to {{ latest_date }}
**Categories**: {{ categories|join(', ') }}
**Sessions**: {{ session_count }} unique session(s)
---
{% for insight in insights %}
## {{ loop.index }}. {{ insight.title }}
**Metadata**:
- **Date**: {{ insight.date }}
- **Category**: {{ insight.category }}
- **Session**: {{ insight.session_id }}
- **Source File**: {{ insight.source_file }}
**Original Content**:
{{ insight.content }}
{% if insight.code_examples %}
**Code Examples from this Insight**:
{% for example in insight.code_examples %}
```{{ example.language }}
{{ example.code }}
```
{% endfor %}
{% endif %}
{% if insight.related_insights %}
**Related Insights**: {{ insight.related_insights|join(', ') }}
{% endif %}
---
{% endfor %}
## Insight Clustering Analysis
**Similarity Scores**:
{% for cluster in clusters %}
- Cluster {{ loop.index }}: {{ cluster.insights|join(', ') }} (score: {{ cluster.score }})
{% endfor %}
**Common Keywords**:
{% for keyword in common_keywords %}
- {{ keyword.word }} (frequency: {{ keyword.count }})
{% endfor %}
**Category Distribution**:
{% for category, count in category_distribution.items() %}
- {{ category }}: {{ count }} insight(s)
{% endfor %}
---
## How These Insights Inform the Skill
{% for mapping in insight_mappings %}
### {{ mapping.insight_title }} → {{ mapping.skill_section }}
{{ mapping.explanation }}
{% endfor %}
---
## Additional Context
**Why These Insights Were Selected**:
{{ selection_rationale }}
**Insights Not Included** (if any):
{% if excluded_insights %}
{% for excluded in excluded_insights %}
- {{ excluded.title }}: {{ excluded.reason }}
{% endfor %}
{% else %}
All relevant insights were included.
{% endif %}
---
**Generated**: {{ generated_date }}
**Last Updated**: {{ updated_date }}

View File

@@ -0,0 +1,139 @@
# Phase 1: Insight Discovery and Parsing
**Purpose**: Locate, read, deduplicate, and structure all insights from the project's lessons-learned directory.
## Steps
### 1. Verify project structure
- Ask user for project root directory (default: current working directory)
- Check if `docs/lessons-learned/` exists
- If not found, explain the expected structure and offer to search alternative locations
- List all categories found (testing, configuration, hooks-and-events, etc.)
### 2. Scan and catalog insight files
**File Naming Convention**:
Files MUST follow: `YYYY-MM-DD-descriptive-slug.md`
- Date prefix for chronological sorting
- Descriptive slug (3-5 words) summarizing the insight topic
- Examples:
- `2025-11-21-jwt-refresh-token-pattern.md`
- `2025-11-20-vitest-mocking-best-practices.md`
- `2025-11-19-react-testing-library-queries.md`
**Scanning**:
- Use Glob tool to find all markdown files: `docs/lessons-learned/**/*.md`
- For each file found, extract:
- File path and category (from directory name)
- Creation date (from filename prefix)
- Descriptive title (from filename slug)
- File size and line count
- Build initial inventory report
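The cataloging step above can be sketched in Python; the regex and helper name below are illustrative, not part of the skill's actual tooling:

```python
import re
from pathlib import Path

# Matches the required YYYY-MM-DD-descriptive-slug.md convention
FILENAME_RE = re.compile(r"^(\d{4}-\d{2}-\d{2})-([a-z0-9-]+)\.md$")

def catalog_insight_file(path):
    """Return (category, date, title) or None if the name is malformed."""
    p = Path(path)
    m = FILENAME_RE.match(p.name)
    if not m:
        return None  # flag for the "files without descriptive names" report
    date, slug = m.groups()
    category = p.parent.name        # directory name, e.g. "testing"
    title = slug.replace("-", " ")  # "vitest mocking best practices"
    return category, date, title
```

Files that return `None` feed the "suggest renaming" issue noted at the end of this phase.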
### 3. Deduplicate insights (CRITICAL)
**Why**: The extraction hook may create duplicate entries within files.
**Deduplication Algorithm**:
```python
import hashlib
import re

def normalize(text):
    """Lowercase and collapse whitespace so cosmetic edits don't defeat dedup."""
    return re.sub(r"\s+", " ", text.lower()).strip()

def deduplicate_insights(insights):
    seen_hashes = set()
    unique_insights = []
    for insight in insights:
        # Python's built-in hash() is salted per process, so use a stable digest
        key = normalize(insight.title + insight.content[:200])
        content_hash = hashlib.sha256(key.encode()).hexdigest()
        if content_hash not in seen_hashes:
            seen_hashes.add(content_hash)
            unique_insights.append(insight)
        else:
            log_duplicate(insight)  # project-specific logging hook
    return unique_insights
```
**Deduplication Checks**:
- Exact title match → duplicate
- First 200 chars content match → duplicate
- Same code blocks in same order → duplicate
- Report: "Found X insights, removed Y duplicates (Z unique)"
### 4. Parse individual insights
- Read each file using Read tool
- Extract session metadata (session ID, timestamp from file headers)
- Split file content on `---` separator (insights are separated by horizontal rules)
- For each insight section:
- Extract title (first line, often wrapped in `**bold**`)
- Extract body content (remaining markdown)
- Identify code blocks
- Extract actionable items (lines starting with `- [ ]` or numbered lists)
- Note any warnings/cautions
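A minimal sketch of the section-splitting logic described above; the helper name and the bold-title unwrapping are assumptions about typical insight files:

```python
def split_insight_sections(file_text):
    """Split a lessons-learned file into (title, body) pairs.

    Insights are separated by horizontal rules; the first non-empty
    line of each section (often wrapped in **bold**) is the title.
    """
    sections = []
    for raw in file_text.split("\n---\n"):
        lines = [l for l in raw.strip().splitlines() if l.strip()]
        if not lines:
            continue
        title = lines[0].strip().strip("*")  # unwrap **bold** titles
        body = "\n".join(lines[1:])
        sections.append((title, body))
    return sections
```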
### 5. Apply quality filters
**Filter out low-depth insights** that are:
- Basic explanatory notes without actionable steps
- Simple definitions or concept explanations
- Single-paragraph observations
**Keep insights that have**:
- Actionable workflows (numbered steps, checklists)
- Decision frameworks (trade-offs, when to use X vs Y)
- Code patterns with explanation of WHY
- Troubleshooting guides with solutions
- Best practices with concrete examples
**Quality Score Calculation**:
```python
def quality_score(insight):
    score = 0
    if insight.has_actionable_items:
        score += 3
    if insight.has_code_examples:
        score += 2
    if insight.has_numbered_steps:
        score += 2
    if insight.word_count > 200:
        score += 1
    if insight.has_warnings_or_notes:
        score += 1
    return score  # minimum score for skill consideration: 4
```
### 6. Build structured insight inventory
```
{
id: unique_id,
title: string,
content: string,
category: string,
date: ISO_date,
session_id: string,
source_file: path,
code_examples: [{ language, code }],
action_items: [string],
keywords: [string],
quality_score: int,
paragraph_count: int,
line_count: int
}
```
### 7. Present discovery summary
- Total insights found (before deduplication)
- Duplicates removed
- Low-quality insights filtered
- **Final count**: Unique, quality insights
- Category breakdown
- Date range (earliest to latest)
- Preview of top 5 insights by quality score
## Output
Deduplicated, quality-filtered inventory of insights with metadata and categorization.
## Common Issues
- **No lessons-learned directory**: Ask if user wants to search elsewhere or exit
- **Empty files**: Skip and report count of empty files
- **Malformed markdown**: Log warning but continue parsing (best effort)
- **Missing session metadata**: Use filename date as fallback
- **High duplicate count**: Indicates an extraction hook bug; warn the user
- **All insights filtered as low-quality**: Lower threshold or suggest manual curation
- **Files without descriptive names**: Suggest renaming for better organization

View File

@@ -0,0 +1,82 @@
# Phase 2: Smart Clustering
**Purpose**: Group related insights using similarity analysis to identify skill candidates.
## Steps
### 1. Load clustering configuration
- Read `data/clustering-config.yaml` for weights and thresholds
- Similarity weights:
- Same category: +0.3
- Shared keyword: +0.1 per keyword
- Temporal proximity (within 7 days): +0.05
- Title similarity: +0.15
- Content overlap: +0.2
- Clustering threshold: 0.6 minimum to group
- Standalone quality threshold: 0.8 for single-insight skills
### 2. Extract keywords from each insight
- Normalize text (lowercase, remove punctuation)
- Extract significant words from title (weight 2x)
- Extract significant words from body (weight 1x)
- Filter out common stop words
- Apply category-specific keyword boosting
- Build keyword vector for each insight
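The keyword extraction could look roughly like this; the stop-word list and the 2x title weighting are illustrative defaults, not values read from `clustering-config.yaml`:

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "for",
              "is", "it", "this", "that", "with", "when", "use"}

def keyword_vector(title, body, title_weight=2):
    """Build a weighted keyword frequency vector for one insight."""
    def words(text):
        return [w for w in re.findall(r"[a-z0-9]+", text.lower())
                if w not in STOP_WORDS and len(w) > 2]
    vec = Counter()
    for w in words(title):
        vec[w] += title_weight  # title words count double
    for w in words(body):
        vec[w] += 1
    return vec
```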
### 3. Calculate pairwise similarity scores
For each pair of insights (i, j):
- Base score = 0
- If same category: +0.3
- For each shared keyword: +0.1
- If dates within 7 days: +0.05
- Calculate title word overlap: shared_words / total_words * 0.15
- Calculate content concept overlap: shared_concepts / total_concepts * 0.2
- Final score = sum of all components
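The scoring components above translate directly into a small function; the dict keys used here are a hypothetical representation of a parsed insight:

```python
def similarity(a, b, max_days_apart=7):
    """Score two insight dicts using the weights from clustering-config.

    Assumed keys: category, keywords (set), title_words (set),
    concepts (set), day (ordinal date).
    """
    score = 0.0
    if a["category"] == b["category"]:
        score += 0.3
    score += 0.1 * len(a["keywords"] & b["keywords"])
    if abs(a["day"] - b["day"]) <= max_days_apart:
        score += 0.05
    total_titles = len(a["title_words"] | b["title_words"]) or 1
    score += len(a["title_words"] & b["title_words"]) / total_titles * 0.15
    total_concepts = len(a["concepts"] | b["concepts"]) or 1
    score += len(a["concepts"] & b["concepts"]) / total_concepts * 0.2
    return round(score, 3)
```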
### 4. Build clusters
- Start with highest similarity pairs
- Group insights with similarity >= 0.6
- Use connected components algorithm
- Identify standalone insights (don't cluster with any others)
- For standalone insights, check if quality score >= 0.8
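One way to implement the connected-components grouping, shown here as a simple depth-first traversal (a sketch, not the skill's actual implementation):

```python
def build_clusters(ids, similarity_fn, threshold=0.6):
    """Group insight ids into connected components, where any pair with
    similarity >= threshold is an edge. Singleton components are the
    standalone insights checked against the quality threshold."""
    adjacency = {i: set() for i in ids}
    for idx, i in enumerate(ids):
        for j in ids[idx + 1:]:
            if similarity_fn(i, j) >= threshold:
                adjacency[i].add(j)
                adjacency[j].add(i)
    clusters, seen = [], set()
    for i in ids:
        if i in seen:
            continue
        stack, component = [i], set()  # depth-first walk of one component
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters
```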
### 5. Assess cluster characteristics
For each cluster:
- Count insights
- Identify dominant category
- Extract common keywords
- Assess complexity (lines, code examples, etc.)
- Recommend skill complexity (minimal/standard/complex)
- Suggest skill pattern (phase-based/mode-based/validation)
### 6. Handle large clusters (>5 insights)
- Attempt sub-clustering by:
- Temporal splits (early vs. late insights)
- Sub-topic splits (different keyword groups)
- Complexity splits (simple vs. complex insights)
- Ask user if they want to split or keep as comprehensive skill
### 7. Present clustering results interactively
For each cluster, show:
- Cluster ID and size
- Suggested skill name (from keywords)
- Dominant category
- Insight titles in cluster
- Similarity scores
- Recommended complexity
Ask user to:
- Review proposed clusters
- Accept/reject/modify groupings
- Combine or split clusters
- Remove low-value insights
## Output
Validated clusters of insights, each representing a skill candidate.
## Common Issues
- **All insights are unrelated** (no clusters): Offer to generate standalone skills or exit
- **One giant cluster**: Suggest sub-clustering or mode-based skill
- **Too many standalone insights**: Suggest raising similarity threshold or manual grouping

View File

@@ -0,0 +1,82 @@
# Phase 3: Interactive Skill Design
**Purpose**: For each skill candidate, design the skill structure with user customization.
## Steps
### 1. Propose skill name
- Extract top keywords from cluster
- Apply naming heuristics:
- Max 40 characters
- Kebab-case
- Remove filler words ("insight", "lesson", "the")
- Add preferred suffix ("guide", "advisor", "helper")
- Example: "hook-deduplication-session-management" → "hook-deduplication-guide"
- Present to user with alternatives
- Allow user to customize
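The naming heuristics might be sketched as follows; the filler-word list and the strategy of dropping trailing keywords to meet the 40-character cap are assumptions:

```python
FILLER = {"insight", "insights", "lesson", "lessons", "the", "a", "an"}

def propose_skill_name(keywords, suffix="guide", max_len=40):
    """Kebab-case name: filler words dropped, suffix appended, 40-char cap."""
    parts = [k.lower() for k in keywords if k.lower() not in FILLER]
    name = "-".join(parts + [suffix])
    while len(name) > max_len and len(parts) > 1:
        parts.pop()  # drop the least-important trailing keyword
        name = "-".join(parts + [suffix])
    return name
```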
### 2. Generate description
- Use action verbs: "Use PROACTIVELY when", "Guides", "Analyzes"
- Include trigger context (what scenario)
- Include benefit (what outcome)
- Keep under 150 chars (soft limit, hard limit 1024)
- Present to user and allow editing
### 3. Assess complexity
Calculate based on:
- Number of insights (1 = minimal, 2-4 = standard, 5+ = complex)
- Total content length
- Presence of code examples
- Actionable items count
Recommend: minimal, standard, or complex
- Minimal: SKILL.md + README.md + plugin.json + CHANGELOG.md
- Standard: + data/insights-reference.md + examples/
- Complex: + templates/ + multiple examples/
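A rough encoding of this recommendation logic; thresholds beyond the insight counts stated above are illustrative:

```python
def recommend_complexity(insight_count, total_lines, has_code, action_items):
    """Map cluster characteristics to a structure recommendation."""
    if insight_count >= 5:
        return "complex"
    if insight_count == 1 and not has_code and action_items < 3:
        return "minimal"
    if insight_count >= 2 or has_code or total_lines > 150:
        return "standard"
    return "minimal"
```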
### 4. Select skill pattern
Analyze insight content for pattern indicators:
- **Phase-based**: sequential steps, "first/then/finally"
- **Mode-based**: multiple approaches, "alternatively", "option"
- **Validation**: checking/auditing language, "ensure", "verify"
- **Data-processing**: parsing/transformation language
Recommend pattern with confidence level and explain trade-offs.
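The indicator matching could be prototyped like this; the indicator word lists extend the examples above, and the confidence measure is a simple hit share — both assumptions:

```python
PATTERN_INDICATORS = {
    "phase-based": ["first", "then", "finally", "step"],
    "mode-based": ["alternatively", "option", "mode"],
    "validation": ["ensure", "verify", "audit", "check"],
    "data-processing": ["parse", "transform", "pipeline", "report"],
}

def detect_pattern(text):
    """Count indicator hits per pattern; return the best match with a
    rough confidence (its share of all indicator hits)."""
    lowered = text.lower()
    hits = {p: sum(lowered.count(w) for w in words)
            for p, words in PATTERN_INDICATORS.items()}
    total = sum(hits.values()) or 1
    best = max(hits, key=hits.get)
    return best, round(hits[best] / total, 2)
```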
### 5. Map insights to skill structure
For each insight, identify content types:
- Problem description → Overview section
- Solution explanation → Workflow/Phases
- Code examples → examples/ directory
- Best practices → Important Reminders
- Checklists → templates/checklist.md
- Trade-offs → Decision Guide section
- Warnings → Important Reminders (high priority)
### 6. Define workflow phases (if phase-based)
For each phase:
- Generate phase name from insight content
- Extract purpose statement
- List steps (from insight action items or narrative)
- Define expected output
- Note common issues (from warnings in insights)
### 7. Preview the skill design
Show complete outline:
- Name, description, complexity
- Pattern and structure
- Section breakdown
- File structure
Ask for final confirmation or modifications.
## Output
Approved skill design specification ready for generation.
## Common Issues
- **User unsure about pattern**: Show examples from existing skills, offer recommendation
- **Naming conflicts**: Check ~/.claude/skills/ and .claude/skills/ for existing skills
- **Description too long**: Auto-trim and ask user to review
- **Unclear structure**: Fall back to default phase-based pattern

View File

@@ -0,0 +1,89 @@
# Phase 4: Skill Generation
**Purpose**: Create all skill files following the approved design.
## Steps
### 1. Prepare generation workspace
- Create temporary directory for skill assembly
- Load templates from `templates/` directory
### 2. Generate SKILL.md
- Create frontmatter with name and description
- Add h1 heading
- Generate Overview section (what, based on X insights, capabilities)
- Generate "When to Use" section (trigger phrases, use cases, anti-use cases)
- Generate Response Style section
- Generate workflow sections based on pattern:
- Phase-based: Phase 1, Phase 2, etc. with Purpose, Steps, Output, Common Issues
- Mode-based: Mode 1, Mode 2, etc. with When to use, Steps, Output
- Validation: Analysis → Detection → Recommendations
- Generate Reference Materials section
- Generate Important Reminders
- Generate Best Practices
- Generate Troubleshooting
- Add Metadata section with source insight attribution
### 3. Generate README.md
- Brief overview (1-2 sentences)
- Installation instructions (standard)
- Quick start example
- Trigger phrases list
- Link to SKILL.md for details
### 4. Generate plugin.json
```json
{
"name": "[skill-name]",
"version": "0.1.0",
"description": "[description]",
"type": "skill",
"author": "Connor",
"category": "[category from clustering-config]",
"tags": ["insights", "lessons-learned", "[domain]"]
}
```
### 5. Generate CHANGELOG.md
Initialize with v0.1.0 and list key features.
### 6. Generate data/insights-reference.md (if complexity >= standard)
- Add overview (insight count, date range, categories)
- For each insight: title, metadata, original content, code examples, related insights
- Add clustering analysis section
- Add insight-to-skill mapping explanation
### 7. Generate examples/ (if needed)
- Extract and organize code blocks by language or topic
- Add explanatory context
- Create usage examples showing example prompts and expected behaviors
### 8. Generate templates/ (if needed)
- Create templates/checklist.md from actionable items
- Organize items by section
- Add verification steps
- Include common mistakes section
### 9. Validate all generated files
- Check YAML frontmatter syntax
- Validate JSON syntax
- Check file references are valid
- Verify no broken markdown links
- Run quality checklist
- Report validation results to user
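A few of these checks can be automated with a short script; this sketch covers only file existence, JSON syntax, and the frontmatter delimiter, not the full quality checklist:

```python
import json
from pathlib import Path

def validate_generated_skill(skill_dir):
    """Minimal structural checks; a real run would also lint markdown
    links and apply the full quality checklist."""
    root = Path(skill_dir)
    problems = []
    for required in ("SKILL.md", "README.md", "plugin.json", "CHANGELOG.md"):
        if not (root / required).is_file():
            problems.append(f"missing {required}")
    manifest = root / "plugin.json"
    if manifest.is_file():
        try:
            json.loads(manifest.read_text())
        except json.JSONDecodeError as exc:
            problems.append(f"plugin.json: {exc}")
    skill_md = root / "SKILL.md"
    if skill_md.is_file() and not skill_md.read_text().startswith("---\n"):
        problems.append("SKILL.md: missing YAML frontmatter")
    return problems
```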
### 10. Preview generated skill
- Show file tree
- Show key sections from SKILL.md
- Show README.md preview
- Highlight any validation warnings
## Output
Complete, validated skill in temporary workspace, ready for installation.
## Common Issues
- **Validation failures**: Fix automatically if possible, otherwise ask user
- **Missing code examples**: Offer to generate placeholder or skip examples/ directory
- **Large SKILL.md** (>500 lines): Suggest splitting content into separate files

View File

@@ -0,0 +1,88 @@
# Phase 5: Installation and Testing
**Purpose**: Install the skill and provide testing guidance.
## Steps
### 1. Ask installation location
Present options:
- **Project-specific**: `[project]/.claude/skills/[skill-name]/`
- Pros: Version controlled with project, only available in this project
- Cons: Not available in other projects
- **Global**: `~/.claude/skills/[skill-name]/`
- Pros: Available in all projects
- Cons: Not version controlled (unless user manages ~/.claude with git)
### 2. Check for conflicts
- Verify chosen location doesn't already have a skill with same name
- If conflict found:
- Show existing skill details
- Offer options: Choose different name, Overwrite (with confirmation), Cancel
### 3. Copy skill files
- Create target directory
- Copy all generated files preserving structure
- Set appropriate permissions
- Verify all files copied successfully
### 4. Re-validate installed skill
- Read SKILL.md from install location
- Verify frontmatter is still valid
- Check file references work from install location
- Confirm no corruption during copy
### 5. Test skill loading
- Attempt to trigger skill using one of the trigger phrases
- Verify Claude Code recognizes the skill
- Check skill appears in available skills list
- Report results to user
### 6. Provide testing guidance
Show trigger phrases to test:
```
Try these phrases to test your new skill:
- "[trigger phrase 1]"
- "[trigger phrase 2]"
- "[trigger phrase 3]"
```
Suggest test scenarios based on skill purpose and explain expected behavior.
### 7. Offer refinement suggestions
Based on skill characteristics, suggest potential improvements:
- Add more examples if skill is complex
- Refine trigger phrases if they're too broad/narrow
- Split into multiple skills if scope is too large
- Add troubleshooting section if skill has edge cases
Ask if user wants to iterate on the skill.
### 8. Document the skill
Offer to add skill to project documentation:
```markdown
### [Skill Name]
**Location**: [path]
**Purpose**: [description]
**Trigger**: "[main trigger phrase]"
**Source**: Generated from [X] insights ([categories])
```
### 9. Next steps
Suggest:
- Test the skill with real scenarios
- Share with team if relevant
- Iterate based on usage (version 0.2.0)
- Generate more skills from other insight clusters
Ask if user wants to generate another skill from remaining insights.
## Output
Installed, validated skill with testing guidance and refinement suggestions.
## Common Issues
- **Installation permission errors**: Check directory permissions, suggest sudo if needed
- **Skill not recognized**: Verify frontmatter format, check Claude Code skill discovery
- **Trigger phrases don't work**: Suggest broadening or clarifying phrases
- **Conflicts with existing skills**: Help user choose unique name or merge functionality

View File

@@ -0,0 +1,15 @@
# Changelog
## 0.2.0
- Refactored to Anthropic progressive disclosure pattern
- Updated description with "Use PROACTIVELY when..." format
- Removed version/author from frontmatter
## 0.1.0
- Initial release with guided skill creation workflow
- Four modes: Guided, Quick Start, Clone, Validate
- Jinja2 template system for SKILL.md, README.md, CHANGELOG.md
- Pattern library: mode-based, phase-based, validation, data-processing
- Quality validation with A-F grade scoring

View File

@@ -0,0 +1,328 @@
# Skill Creator
Automated skill generation tool that creates production-ready Claude Code skills following Claudex marketplace standards with intelligent templates, pattern detection, and quality validation.
## Quick Start
```
User: "Create a new skill for validating API responses"
```
Claude will:
1. Guide you through interactive questions
2. Detect appropriate skill type and pattern
3. Generate all required files with templates
4. Install to ~/.claude/skills/
5. Provide testing guidance and next steps
## Features
### Feature 1: Intelligent Skill Generation
- Interactive guided creation with smart defaults
- Automatic skill type detection (minimal/standard/complex)
- Pattern selection based on skill purpose
- Jinja2-based template population
- Quality validation before finalization
### Feature 2: Multiple Creation Modes
- **Guided Creation**: Full interactive process with questions
- **Quick Start**: Template-based for fast setup
- **Clone & Modify**: Base on existing skill patterns
- **Validation Only**: Check existing skill quality
### Feature 3: Comprehensive Structure
- All required files (SKILL.md, README.md, plugin.json, CHANGELOG.md)
- Optional directories based on complexity (data/, examples/, templates/, modes/, scripts/)
- Pattern-specific templates and guidance
- Clear TODO markers for customization
### Feature 4: Quality Assurance
- Built-in quality checklist validation
- Security checks (no secrets or sensitive data)
- Syntax validation (YAML, JSON, Markdown)
- Naming convention enforcement
- Grade-based scoring (A-F)
## Installation
```bash
# Already installed in your .claude directory!
# Located at: ~/.claude/skills/skill-creator/
```
Or install manually:
```bash
cp -r skill-creator ~/.claude/skills/
```
## Usage Examples
### Example 1: Create Analysis Skill
**Scenario:** Create a skill that audits React components for performance issues
```
User: "Create a new skill for auditing React components"
```
**Claude will ask:**
1. Skill name: "react-performance-auditor"
2. Description: "Analyzes React components for performance anti-patterns"
3. Category: analysis (auto-suggested)
4. Trigger phrases: (auto-generated + user confirms)
5. Complexity: Standard (has reference materials)
**Result:**
- Complete skill directory created
- Standard structure with data/ for anti-patterns reference
- Validation pattern applied
- Quality report: Grade A
- Ready for customization
### Example 2: Create Multi-Mode Skill
**Scenario:** Create a skill that manages environment variables (create/update/delete/list)
```
User: "Create a skill for managing environment variables with multiple modes"
```
**Claude will ask:**
1. Basic info (name, description, author)
2. Confirms: Complex skill with modes
3. How many modes? 4 (create, update, delete, list)
4. For each mode: name and trigger phrase
5. Structure: Creates modes/ directory
**Result:**
- Complex skill with mode-based pattern
- Separate workflow files for each mode
- Mode detection logic in SKILL.md
- Quick decision matrix included
- Ready for mode-specific customization
### Example 3: Create Data Processing Skill
**Scenario:** Create a skill that analyzes git commit history
```
User: "Create a skill that analyzes git commit patterns"
```
**Claude will detect:**
- Data processing skill (analyzes git data)
- Needs scripts/ directory
- Should generate reports
**Result:**
- Complex data-processing structure
- scripts/ directory with placeholder scripts
- Data pipeline architecture documented
- Report templates included
- Performance characteristics section
### Example 4: Quick Start with Template
**Scenario:** Quickly scaffold a minimal skill
```
User: "Create a minimal skill called code-formatter"
```
**Claude will:**
1. Recognize "minimal" keyword
2. Ask only essential questions (name, description)
3. Use minimal template from examples/
4. Generate with defaults
5. Flag customization points
**Result:**
- Minimal structure (4 required files only)
- Fast generation (<1 minute)
- All customization points marked with TODO
- Simple phase-based workflow template
### Example 5: Clone Existing Pattern
**Scenario:** Create skill with same structure as codebase-auditor
```
User: "Create a skill similar to codebase-auditor for database schemas"
```
**Claude will:**
1. Read codebase-auditor structure
2. Extract pattern (validation, phase-based)
3. Ask for new skill details
4. Generate with same organizational structure
5. Clear codebase-specific content
**Result:**
- Same directory structure as codebase-auditor
- Validation pattern applied
- data/ and examples/ directories included
- Content cleared, ready for customization
### Example 6: Validate Existing Skill
**Scenario:** Check quality of skill you're working on
```
User: "Validate my custom-skill"
```
**Claude will:**
1. Locate skill at ~/.claude/skills/custom-skill/
2. Run quality checklist
3. Check all files and syntax
4. Generate detailed report
5. Provide remediation steps
**Result:**
```markdown
# Quality Report: custom-skill
## Grade: B (85/100)
### Issues Found:
⚠️ HIGH: Missing usage examples in README.md
📋 MEDIUM: Could use more trigger phrases (only 2, recommend 3-5)
ℹ️ LOW: CHANGELOG could include more detail
### Remediation:
1. Add 2-3 concrete examples to README.md
2. Add 1-2 more trigger phrases to SKILL.md
3. Expand CHANGELOG Added section
### Security: ✅ PASS (no issues)
### Syntax: ✅ PASS (all valid)
```
## Requirements
- Claude Code with Skills support
- Write access to ~/.claude/skills/ directory
- Python 3.8+ (for Jinja2 templates, if using scripts)
## Configuration
No additional configuration required. The skill uses:
- Built-in templates from `templates/`
- Pattern libraries from `patterns/`
- Reference data from `data/`
- Examples from `examples/`
## Troubleshooting
### Issue 1: Skill name already exists
**Problem:** Directory ~/.claude/skills/[name]/ already exists
**Solution:**
- Choose a different name, or
- Backup existing skill and remove directory, or
- Use validation mode to check existing skill instead
### Issue 2: Permission denied
**Problem:** Cannot write to ~/.claude/skills/
**Solution:**
```bash
# Check permissions
ls -la ~/.claude/
# Fix permissions if needed
chmod 755 ~/.claude/skills/
# Verify
ls -la ~/.claude/skills/
```
### Issue 3: Generated skill won't load
**Problem:** Claude Code doesn't recognize new skill
**Solution:**
1. Check YAML frontmatter syntax in SKILL.md
2. Verify plugin.json is valid JSON
3. Restart Claude Code session
4. Check skill appears in skill list
### Issue 4: Templates not rendering
**Problem:** Jinja2 template errors during generation
**Solution:**
- Verify templates/ directory exists
- Check template syntax
- Report issue with specific error message
## Best Practices
1. **Start Simple**: Use minimal structure, grow as needed
2. **Clear Trigger Phrases**: Make them intuitive and specific
3. **Concrete Examples**: Show real usage scenarios in README
4. **Test Early**: Try trigger phrases immediately after generation
5. **Iterate**: Customize, test, refine workflow
6. **Validate Often**: Run validation after changes
7. **Reference Examples**: Look at existing skills for inspiration
8. **Document Well**: Future you will thank you
## Limitations
- Cannot automatically implement skill logic (only scaffolding)
- Jinja2 templates are opinionated (based on Claudex standards)
- Assumes standard skill structure (may not fit all use cases)
- Quality validation is structural (doesn't test functionality)
- Mode detection requires clear user intent
## Contributing
See [CONTRIBUTING.md](https://github.com/cskiro/claudex/blob/main/CONTRIBUTING.md) for contribution guidelines.
## License
Apache 2.0
## Version History
See [CHANGELOG.md](./CHANGELOG.md) for version history.
## Quick Reference
### Skill Types
- **Minimal**: Simple automation, single workflow (4 files)
- **Standard**: Sequential phases, reference materials (4 files + 3 dirs)
- **Complex (Mode-Based)**: Multiple distinct modes (4 files + modes/)
- **Complex (Data Processing)**: Data analysis, reports (4 files + scripts/)
### Patterns
- **Phase-Based**: Sequential workflow with clear stages
- **Mode-Based**: Multiple workflows based on user intent
- **Validation**: Audit/compliance checking pattern
- **Data Processing**: Ingest → Process → Analyze → Report
### Categories
- **analysis**: Code analysis, auditing, quality checking
- **tooling**: Development tools, configuration validators
- **productivity**: Workflow, automation, insights
- **devops**: Infrastructure, deployment, monitoring
### Creation Modes
- **Guided**: Full interactive (most control)
- **Quick Start**: Template-based (fastest)
- **Clone**: Copy existing pattern (proven structure)
- **Validate**: Check existing quality (QA)
## Support
For questions or issues:
1. Check this README for common scenarios
2. Review examples/ directory for structure guidance
3. Consult patterns/ for pattern-specific guidance
4. Read data/quality-checklist.md for validation criteria
5. Open a discussion on GitHub
## Related Skills
- **claude-md-auditor**: Validates SKILL.md files specifically
- **codebase-auditor**: General code quality analysis
- All skills in ~/.claude/skills/ serve as examples
---
**Remember**: This skill handles the boring scaffolding work so you can focus on the creative and domain-specific parts of your skill!

View File

@@ -0,0 +1,182 @@
---
name: skill-creator
description: Use PROACTIVELY when creating new Claude Code skills from scratch. Automated generation tool following Claudex marketplace standards with intelligent templates, pattern detection, and quality validation. Supports guided creation, quick start templates, clone-and-modify, and validation-only modes. Not for modifying existing skills or non-skill Claude Code configurations.
---
# Skill Creator
Automates creation of Claude Code skills through interactive guidance, template generation, and quality validation.
## When to Use
**Trigger Phrases**:
- "create a new skill for [purpose]"
- "generate a skill called [name]"
- "scaffold a [type] skill"
- "set up a new skill"
**Use Cases**:
- Creating new skills from scratch
- Following Claudex marketplace standards
- Learning skill structure through examples
## Quick Decision Matrix
| User Request | Mode | Action |
|--------------|------|--------|
| "create skill for [purpose]" | Guided | Interactive creation |
| "create [type] skill" | Quick Start | Template-based |
| "skill like [existing]" | Clone | Copy pattern |
| "validate skill" | Validate | Quality check |
## Mode 1: Guided Creation (Default)
**Use when**: User wants full guidance and customization
**Process**:
1. Gather basic info (name, description, author)
2. Define purpose, category, triggers
3. Assess complexity → determine skill type
4. Customize directory structure
5. Select pattern (mode-based, phase-based, validation, data-processing)
6. Generate files from templates
7. Run quality validation
8. Provide installation and next steps
**Workflow**: `workflow/guided-creation.md`
## Mode 2: Quick Start
**Use when**: User specifies skill type directly (minimal, standard, complex)
**Process**:
1. Confirm skill type
2. Gather minimal required info
3. Generate with standardized defaults
4. Flag ALL customization points
**Advantages**: Fast, minimal questions
**Trade-off**: More TODO sections to customize
## Mode 3: Clone & Modify
**Use when**: User wants to base skill on existing one
**Process**:
1. Read existing skill's structure
2. Extract organizational pattern (not content)
3. Generate new skill with same structure
4. Clear example-specific content
**Advantages**: Proven structure, familiar patterns
## Mode 4: Validation Only
**Use when**: User wants to check existing skill quality
**Process**:
1. Read existing skill files
2. Run quality checklist
3. Generate validation report
4. Offer to fix issues automatically
**Use Case**: Before submission, after modifications
## Skill Types
| Type | Complexity | Directories | Pattern |
|------|------------|-------------|---------|
| Minimal | Low | Required files only (no extra dirs) | phase-based |
| Standard | Medium | + data/, examples/ | phase-based or validation |
| Complex (mode) | High | + modes/, templates/ | mode-based |
| Complex (data) | High | + scripts/, data/ | data-processing |
## Generated Files
**Required** (all skills):
- `SKILL.md` - Main skill manifest
- `README.md` - User documentation
- `plugin.json` - Marketplace metadata
- `CHANGELOG.md` - Version history
**Optional** (based on type):
- `modes/` - Mode-specific workflows
- `data/` - Reference materials
- `examples/` - Example outputs
- `templates/` - Reusable templates
- `scripts/` - Automation scripts
## Quality Validation
Validates against `data/quality-checklist.md`:
- File existence (all required files)
- Syntax (YAML frontmatter, JSON)
- Content completeness
- Security (no secrets)
- Naming conventions (kebab-case)
- Quality grade (A-F)
## Success Criteria
- [ ] All required files generated
- [ ] Valid YAML frontmatter
- [ ] Valid JSON in plugin.json
- [ ] No security issues
- [ ] Kebab-case naming
- [ ] Version 0.1.0 for new skills
- [ ] At least 3 trigger phrases
- [ ] Quality grade C or better
## Reference Materials
### Templates
- `templates/SKILL.md.j2` - Main manifest
- `templates/README.md.j2` - Documentation
- `templates/plugin.json.j2` - Metadata
- `templates/CHANGELOG.md.j2` - History
### Patterns
- `patterns/mode-based.md` - Multi-mode skills
- `patterns/phase-based.md` - Sequential workflows
- `patterns/validation.md` - Audit skills
- `patterns/data-processing.md` - Data analysis
### Reference Data
- `data/categories.yaml` - Valid categories
- `data/skill-types.yaml` - Type definitions
- `data/quality-checklist.md` - Validation criteria
### Examples
- `examples/minimal-skill/`
- `examples/standard-skill/`
- `examples/complex-skill/`
## Quick Commands
```bash
# Check existing skills
ls ~/.claude/skills/
# View skill structure
tree ~/.claude/skills/[skill-name]/
# Validate frontmatter
head -20 ~/.claude/skills/[skill-name]/SKILL.md
# Validate JSON
python -m json.tool ~/.claude/skills/[skill-name]/plugin.json
```
## Error Handling
| Error | Solution |
|-------|----------|
| Name exists | Suggest alternatives or confirm overwrite |
| Invalid name | Explain kebab-case, provide corrected suggestion |
| Permission denied | Check ~/.claude/skills/ write access |
| Template fails | Fallback to manual creation with guidance |
---
**Version**: 0.1.0 | **Author**: Connor

View File

@@ -0,0 +1,54 @@
# Valid skill categories for Claudex marketplace
# Each category has a name, description, and example skills
categories:
analysis:
name: "analysis"
description: "Code analysis, auditing, quality checking"
examples:
- "codebase-auditor"
- "claude-md-auditor"
- "bulletproof-react-auditor"
use_when:
- "Skill validates or audits code/configuration"
- "Skill detects issues and provides remediation"
- "Skill checks compliance against standards"
tooling:
name: "tooling"
description: "Development tools, configuration validators"
examples:
- "git-worktree-setup"
use_when:
- "Skill automates development workflows"
- "Skill manages development environment"
- "Skill provides developer utilities"
productivity:
name: "productivity"
description: "Developer workflow, automation, insights"
examples:
- "cc-insights"
use_when:
- "Skill improves developer efficiency"
- "Skill provides insights from data"
- "Skill automates repetitive tasks"
devops:
name: "devops"
description: "Infrastructure, deployment, monitoring"
examples:
- "claude-code-otel-setup"
use_when:
- "Skill manages infrastructure"
- "Skill handles deployment pipelines"
- "Skill sets up monitoring/observability"
# Selection guidance
selection_tips:
- "Choose the category that best describes the PRIMARY purpose"
- "If skill fits multiple categories, choose the most specific"
- "Analysis: Focus on checking/validating"
- "Tooling: Focus on developer workflows"
- "Productivity: Focus on insights and efficiency"
- "DevOps: Focus on infrastructure and deployment"


@@ -0,0 +1,211 @@
# Skill Quality Checklist
Use this checklist to validate skill quality before submission or installation.
## Required Files (Critical)
- [ ] `SKILL.md` exists with valid YAML frontmatter
- [ ] `README.md` exists with usage examples
- [ ] `plugin.json` exists with valid JSON
- [ ] `CHANGELOG.md` exists with v0.1.0 entry
## SKILL.md Validation
### Frontmatter
- [ ] `name` field present (kebab-case)
- [ ] `version` field present (0.1.0 for new skills)
- [ ] `description` field present (1-2 sentences)
- [ ] `author` field present
### Required Sections
- [ ] "Overview" section describes skill capabilities
- [ ] "When to Use This Skill" with trigger phrases
- [ ] "When to Use This Skill" with use cases (3-5 items)
- [ ] "Core Responsibilities" or "Workflow" section
- [ ] "Success Criteria" or similar completion checklist
### Content Quality
- [ ] No placeholder text like "[TODO]" or "[FILL IN]" (unless marked for user customization)
- [ ] Trigger phrases are specific and actionable
- [ ] Use cases clearly describe when to activate skill
- [ ] Workflow or responsibilities are detailed
- [ ] No generic programming advice (Claude already knows this)
## README.md Validation
### Structure
- [ ] Title matches skill name
- [ ] Brief description (1-2 sentences) at top
- [ ] "Quick Start" section with example
- [ ] "Installation" instructions
- [ ] At least 2 usage examples
### Content Quality
- [ ] Examples are concrete and actionable
- [ ] Installation instructions are clear
- [ ] Requirements section lists dependencies
- [ ] Troubleshooting section addresses common issues
## plugin.json Validation
### Required Fields
- [ ] `name` matches skill directory name
- [ ] `version` is valid semver (0.1.0 for new skills)
- [ ] `description` matches SKILL.md frontmatter
- [ ] `author` present
- [ ] `license` is "Apache-2.0"
- [ ] `homepage` URL is correct
- [ ] `repository` object present with type and url
### Components
- [ ] `components.agents` array present
- [ ] At least one agent with `name` and `manifestPath`
- [ ] `manifestPath` points to "SKILL.md"
### Metadata
- [ ] `metadata.category` is one of: analysis, tooling, productivity, devops
- [ ] `metadata.status` is "proof-of-concept" for new skills
- [ ] `metadata.tested` describes testing scope
### Keywords
- [ ] At least 3 keywords present
- [ ] Keywords are relevant and specific
- [ ] Keywords aid discoverability (not too generic)
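The plugin.json checks above can be automated with a small validator. This is a minimal sketch, not the marketplace's official tooling: the field names and the category list come from this checklist, while the simplified semver regex and the two-argument `check_plugin` shape are assumptions.

```python
import re

# Field names taken from the checklist above; regex is a simplified
# semver check (X.Y.Z only), not the full semver grammar.
REQUIRED_FIELDS = ["name", "version", "description", "author",
                   "license", "homepage", "repository"]
VALID_CATEGORIES = {"analysis", "tooling", "productivity", "devops"}

def check_plugin(manifest):
    """Return a list of human-readable problems (empty list = passes)."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in manifest]
    if not re.fullmatch(r"\d+\.\d+\.\d+", manifest.get("version", "")):
        problems.append("version is not valid semver (expected X.Y.Z)")
    if manifest.get("license") != "Apache-2.0":
        problems.append('license must be "Apache-2.0"')
    if manifest.get("metadata", {}).get("category") not in VALID_CATEGORIES:
        problems.append("metadata.category is not a recognized category")
    if len(manifest.get("keywords", [])) < 3:
        problems.append("fewer than 3 keywords")
    return problems
```

Run against a parsed `plugin.json` (e.g. via `json.load`), an empty return value means the structural checks pass; the content-quality items still need human review.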
## CHANGELOG.md Validation
- [ ] Follows "Keep a Changelog" format
- [ ] Has section for version 0.1.0
- [ ] Date is present and correct
- [ ] "Added" section lists initial features
- [ ] Status section describes testing level
- [ ] Link to release tag at bottom
## Security Validation (Critical)
- [ ] No API keys, tokens, or passwords in any file
- [ ] No database connection strings with credentials
- [ ] No private keys (PEM format)
- [ ] No internal IP addresses or infrastructure details
- [ ] No hardcoded secrets of any kind
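A first-pass scan for the security items above can be scripted. The regexes below are rough illustrative heuristics, not an exhaustive detector; for a real audit, prefer a dedicated secret scanner.

```python
import re

# Rough heuristics only -- each pattern maps to one checklist item above.
SECRET_PATTERNS = {
    "private key": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    "aws access key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "hardcoded password": re.compile(r"(?i)password\s*[:=]\s*['\"][^'\"]+['\"]"),
    "connection string": re.compile(r"\w+://[^:\s]+:[^@\s]+@"),
}

def scan_text(text):
    """Return the names of secret patterns found in the text."""
    return [name for name, pat in SECRET_PATTERNS.items() if pat.search(text)]
```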
## Version Control
- [ ] `.gitignore` present if skill generates files (like .processed/)
- [ ] No generated files committed (build artifacts, logs, etc.)
- [ ] No large binary files (> 1MB)
## Naming Conventions
- [ ] Skill name is kebab-case (e.g., "skill-name")
- [ ] Directory name matches skill name
- [ ] No spaces in names
- [ ] Name is descriptive and not too generic
## Documentation Quality
- [ ] All sections are complete (no stubs)
- [ ] Examples are realistic and helpful
- [ ] Technical terms are explained or linked
- [ ] Grammar and spelling are correct
- [ ] Markdown formatting is valid
## Pattern Consistency
If skill uses specific pattern:
### Mode-Based Skills
- [ ] "Quick Decision Matrix" present
- [ ] "Mode Detection Logic" present
- [ ] Each mode has clear trigger phrases
- [ ] Modes are distinct and non-overlapping
### Phase-Based Skills
- [ ] Phases are numbered and named
- [ ] Each phase has clear purpose
- [ ] Dependencies between phases are documented
- [ ] Transition criteria are explicit
### Validation Skills
- [ ] Validation sources are documented
- [ ] Finding structure is consistent
- [ ] Severity levels are defined
- [ ] Score calculation is explained
### Data Processing Skills
- [ ] Data flow architecture is documented
- [ ] Storage strategy is explained
- [ ] Performance characteristics are listed
- [ ] Helper scripts are provided
## Testing Validation
- [ ] Skill can be loaded without errors
- [ ] Trigger phrases activate the skill
- [ ] Example workflows complete successfully
- [ ] No obvious bugs or crashes
## User Experience
- [ ] Skill purpose is immediately clear
- [ ] Trigger phrases are intuitive
- [ ] Workflow is logical and easy to follow
- [ ] Error messages are helpful
- [ ] Success criteria are measurable
## Scoring
**Critical Issues** (Must fix before use):
- Missing required files
- Invalid JSON/YAML
- Security issues (exposed secrets)
- Skill fails to load
**High Priority** (Fix before submission):
- Incomplete documentation
- Missing examples
- Unclear trigger phrases
- Invalid metadata
**Medium Priority** (Improve when possible):
- Inconsistent formatting
- Missing optional sections
- Could use more examples
- Documentation could be clearer
**Low Priority** (Nice to have):
- Additional examples
- More detailed explanations
- Enhanced formatting
- Extra reference materials
## Overall Quality Score
Calculate a quality score:
```
Critical Issues: must be 0 (any critical issue = automatic fail)
High Priority: 0-2 acceptable (> 2 = needs work)
Medium Priority: 0-5 acceptable (> 5 = needs improvement)
Low Priority: any number acceptable
Overall Grade:
- A (90-100): Production ready, excellent quality
- B (80-89): Good quality, minor improvements
- C (70-79): Acceptable, some improvements needed
- D (60-69): Needs work before submission
- F (< 60): Significant issues, do not submit
```
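The grading above can be sketched as a small function. The letter bands come from this checklist; the per-issue point deductions are illustrative assumptions, since the checklist does not define an official formula.

```python
def quality_grade(critical, high, medium, low):
    """Map issue counts to a (score, letter) pair. Deduction weights are
    illustrative assumptions; the letter bands match the checklist."""
    if critical > 0:
        return 0, "F"  # any critical issue is an automatic fail
    score = max(0, 100 - 10 * high - 4 * medium - 1 * low)
    for floor, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= floor:
            return score, letter
    return score, "F"
```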
## Pre-Submission Final Check
Before submitting to marketplace:
1. [ ] Run through entire checklist
2. [ ] Test in fresh Claude Code session
3. [ ] Get peer review if possible
4. [ ] Verify all links work
5. [ ] Check for typos and errors
6. [ ] Confirm no sensitive data
7. [ ] Verify version is correct
8. [ ] Update CHANGELOG if needed


@@ -0,0 +1,121 @@
# Skill type definitions for determining structure and patterns
# Each type maps to a complexity level and recommended pattern
skill_types:
minimal:
name: "Minimal"
description: "Simple automation with single workflow"
complexity: "low"
structure:
required:
- "SKILL.md"
- "README.md"
- "plugin.json"
- "CHANGELOG.md"
optional: []
pattern: "phase-based"
use_when:
- "Single straightforward workflow"
- "No multiple modes or complex branching"
- "Minimal configuration needed"
examples:
- "Simple file formatters"
- "Basic code generators"
- "Single-purpose validators"
standard:
name: "Standard"
description: "Most common skills with reference materials"
complexity: "medium"
structure:
required:
- "SKILL.md"
- "README.md"
- "plugin.json"
- "CHANGELOG.md"
optional:
- "data/"
- "examples/"
- "templates/"
pattern: "phase-based"
use_when:
- "Sequential workflow with phases"
- "Needs reference materials"
- "Provides templates or examples"
examples:
- "codebase-auditor"
- "claude-md-auditor"
complex:
name: "Complex"
description: "Multiple modes with separate workflows"
complexity: "high"
structure:
required:
- "SKILL.md"
- "README.md"
- "plugin.json"
- "CHANGELOG.md"
- "modes/"
optional:
- "data/"
- "examples/"
- "templates/"
- "scripts/"
pattern: "mode-based"
use_when:
- "Multiple distinct operating modes"
- "Different workflows based on user intent"
- "Complex decision logic needed"
examples:
- "git-worktree-setup (single/batch/cleanup/list)"
- "Skills with multiple output formats"
data_processing:
name: "Data Processing"
description: "Skills that process, analyze, or transform data"
complexity: "high"
structure:
required:
- "SKILL.md"
- "README.md"
- "plugin.json"
- "CHANGELOG.md"
- "scripts/"
optional:
- "data/"
- "examples/"
- "templates/"
- "dashboard/" # optional web UI
pattern: "data-processing"
use_when:
- "Ingests data from files or APIs"
- "Performs analysis or transformation"
- "Generates insights or reports"
- "Needs helper scripts for processing"
examples:
- "cc-insights"
- "Log analyzers"
- "Metrics aggregators"
# Pattern recommendations by skill purpose
pattern_recommendations:
validation_audit:
primary_pattern: "validation"
secondary_pattern: "phase-based"
notes: "Use validation pattern for compliance checking, phase-based for workflow"
workflow_automation:
primary_pattern: "phase-based"
secondary_pattern: "mode-based"
notes: "Use phase-based for sequential steps, mode-based if multiple workflows"
multi_mode_operation:
primary_pattern: "mode-based"
secondary_pattern: null
notes: "Clear mode detection is critical for user experience"
data_analysis:
primary_pattern: "data-processing"
secondary_pattern: "phase-based"
notes: "Use data-processing for pipelines, phase-based for analysis stages"


@@ -0,0 +1,272 @@
# Complex Skill Structure Example
This example shows the structure for skills with multiple operating modes or data processing capabilities.
## Directory Structure
### Mode-Based Complex Skill
```
complex-skill/
├── SKILL.md # Agent manifest with mode detection (required)
├── README.md # User documentation (required)
├── plugin.json # Marketplace metadata (required)
├── CHANGELOG.md # Version history (required)
├── modes/ # Mode-specific workflows (required for mode-based)
│ ├── mode1-name.md
│ ├── mode2-name.md
│ └── mode3-name.md
├── data/ # Reference materials (optional)
│ ├── best-practices.md
│ └── troubleshooting.md
├── examples/ # Sample outputs per mode (optional)
│ ├── mode1-example.md
│ └── mode2-example.md
└── templates/ # Reusable templates (optional)
└── output-template.md
```
### Data Processing Complex Skill
```
complex-skill/
├── SKILL.md # Agent manifest (required)
├── README.md # User documentation (required)
├── plugin.json # Marketplace metadata (required)
├── CHANGELOG.md # Version history (required)
├── scripts/ # Processing scripts (required for data processing)
│ ├── processor.py
│ ├── indexer.py
│ ├── query.py
│ └── generator.py
├── data/ # Reference materials (optional)
│ └── config-defaults.yaml
├── examples/ # Sample outputs (optional)
│ └── sample-report.md
└── templates/ # Report templates (optional)
└── report-template.md.j2
```
## When to Use Complex Structure
Use this structure when:
### Mode-Based:
- Multiple distinct operating modes based on user intent
- Each mode has its own workflow
- Different outputs per mode
- Clear mode detection logic needed
- Example: git-worktree-setup (single/batch/cleanup/list modes)
### Data Processing:
- Processes data from files or APIs
- Performs analysis or transformation
- Generates insights or reports
- Needs helper scripts for processing
- Example: cc-insights (conversation analysis)
## Characteristics
- **Complexity**: High
- **Files**: 4 required + 4+ optional directories
- **Pattern**: Mode-based or data-processing
- **Modes**: Multiple distinct modes OR data pipeline
- **Scripts**: Often needed for data processing
- **Dependencies**: May have Python/Node dependencies
## SKILL.md Template (Mode-Based)
```markdown
---
name: skill-name
version: 0.1.0
description: Multi-mode skill that handles X, Y, and Z
author: Your Name
---
# Skill Name
## Overview
This skill operates in multiple modes based on user intent.
## When to Use This Skill
**Trigger Phrases:**
- "mode 1 trigger"
- "mode 2 trigger"
- "mode 3 trigger"
**Use Cases:**
- Mode 1: Use case
- Mode 2: Use case
- Mode 3: Use case
## Quick Decision Matrix
\`\`\`
User Request → Mode → Action
─────────────────────────────────────────────────
"trigger 1" → Mode 1 → Action 1
"trigger 2" → Mode 2 → Action 2
"trigger 3" → Mode 3 → Action 3
\`\`\`
## Mode Detection Logic
\`\`\`javascript
// Mode 1: Description
if (userMentions("keyword1")) {
return "mode1-name";
}
// Mode 2: Description
if (userMentions("keyword2")) {
return "mode2-name";
}
// Mode 3: Description
if (userMentions("keyword3")) {
return "mode3-name";
}
// Ambiguous - ask user
return askForClarification();
\`\`\`
## Core Responsibilities
### Shared Prerequisites
- ✓ Prerequisite 1 (all modes)
- ✓ Prerequisite 2 (all modes)
### Mode-Specific Workflows
See detailed workflows in:
- \`modes/mode1-name.md\` - Mode 1 complete workflow
- \`modes/mode2-name.md\` - Mode 2 complete workflow
- \`modes/mode3-name.md\` - Mode 3 complete workflow
## Success Criteria
Varies by mode - see individual mode documentation.
```
## SKILL.md Template (Data Processing)
```markdown
---
name: skill-name
version: 0.1.0
description: Processes X data to generate Y insights
author: Your Name
---
# Skill Name
## Overview
Automatically processes data from [source] to provide [capabilities].
## When to Use This Skill
**Trigger Phrases:**
- "search for X"
- "generate Y report"
- "analyze Z data"
**Use Cases:**
- Search and find
- Generate insights
- Track patterns
## Architecture
\`\`\`
Input → Processing → Storage → Query/Analysis → Output
\`\`\`
## Workflow
### Phase 1: Data Ingestion
- Discover data sources
- Validate format
- Process incrementally
### Phase 2: Processing
- Extract features
- Generate embeddings (if semantic)
- Store in database
### Phase 3: Query/Analysis
- Search interface
- Pattern detection
- Generate reports
## Scripts
See \`scripts/\` directory:
- \`processor.py\` - Main data processing
- \`indexer.py\` - Build indexes
- \`query.py\` - Query interface
- \`generator.py\` - Report generation
## Performance
- Initial processing: [time estimate]
- Incremental updates: [time estimate]
- Search latency: [time estimate]
- Memory usage: [estimate]
```
## Directory Purposes
### modes/
For mode-based skills, each file documents one mode:
- Complete workflow for that mode
- Mode-specific prerequisites
- Mode-specific outputs
- Mode-specific error handling
### scripts/
For data processing skills:
- Python/Node scripts for heavy processing
- CLI interfaces for user interaction
- Batch processors
- Report generators
## Best Practices
### Mode-Based Skills:
1. **Clear mode boundaries**: Each mode is distinct
2. **Explicit detection**: Unambiguous mode selection
3. **Shared prerequisites**: Extract common validation
4. **Mode independence**: Each mode works standalone
5. **Detailed documentation**: Each mode has its own guide
### Data Processing Skills:
1. **Incremental processing**: Don't reprocess everything
2. **State tracking**: Know what's been processed
3. **Progress indicators**: Show progress for long operations
4. **Error recovery**: Handle failures gracefully
5. **Performance docs**: Document expected performance
6. **Script documentation**: Each script has clear --help
## Examples of Complex Skills
### Mode-Based:
- **git-worktree-setup**: Single/Batch/Cleanup/List modes
- **Multi-format converter**: Different output formats
- **Environment manager**: Create/Update/Delete/List
### Data Processing:
- **cc-insights**: Conversation analysis with RAG search
- **Log analyzer**: Parse logs, detect patterns, generate reports
- **Metrics aggregator**: Collect data, analyze trends, visualize
## When NOT to Use Complex Structure
Avoid over-engineering:
- Don't create modes if phases suffice
- Don't add scripts if pure LLM can handle it
- Don't add directories you won't populate
- Start minimal, grow as needed
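One way to keep that decision honest is a tiny selector over the questions above. This is a hedged simplification of the `skill_types` definitions in `data/skill-types.yaml`, with an assumed priority order, not an official algorithm.

```python
def choose_skill_type(has_modes, processes_data, needs_reference_materials):
    """Pick a skill type from three yes/no answers (simplified priority order)."""
    if processes_data:
        return "data_processing"   # pipelines need scripts/
    if has_modes:
        return "complex"           # distinct modes need modes/
    if needs_reference_materials:
        return "standard"          # data/, examples/, templates/
    return "minimal"               # four required files only
```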


@@ -0,0 +1,91 @@
# Minimal Skill Structure Example
This example shows the minimal required structure for a simple skill.
## Directory Structure
```
minimal-skill/
├── SKILL.md # Agent manifest (required)
├── README.md # User documentation (required)
├── plugin.json # Marketplace metadata (required)
└── CHANGELOG.md # Version history (required)
```
## When to Use Minimal Structure
Use this structure when:
- Skill has a single straightforward workflow
- No multiple modes or complex branching
- Minimal configuration needed
- No external dependencies or scripts
- Simple automation or transformation task
## Examples of Minimal Skills
- **Code Formatter**: Applies consistent formatting to code files
- **Template Generator**: Creates files from simple templates
- **Single-Purpose Validator**: Checks one specific thing
## Characteristics
- **Complexity**: Low
- **Files**: 4 required only
- **Pattern**: Usually phase-based with 2-3 simple phases
- **Modes**: None (single workflow)
- **Scripts**: None
- **Dependencies**: None or minimal
## SKILL.md Template
```markdown
---
name: skill-name
version: 0.1.0
description: Brief description of what this skill does
author: Your Name
---
# Skill Name
## Overview
What this skill does in detail.
## When to Use This Skill
**Trigger Phrases:**
- "phrase 1"
- "phrase 2"
**Use Cases:**
- Use case 1
- Use case 2
## Workflow
### Phase 1: Setup
1. Validate inputs
2. Gather context
### Phase 2: Execute
1. Perform main action
2. Verify result
### Phase 3: Completion
1. Report results
2. Provide next steps
## Success Criteria
- [ ] Criterion 1
- [ ] Criterion 2
```
## Best Practices
1. **Keep it simple**: Don't add structure you don't need
2. **Clear workflow**: 2-4 phases maximum
3. **Explicit success criteria**: User knows when it's done
4. **Good examples**: Show concrete usage in README
5. **Test thoroughly**: Minimal doesn't mean untested
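Scaffolding the four required files can be sketched in a few lines. The stub contents and the `root/name` layout are assumptions following the directory structure shown above; a real generator would write full templates instead of stubs.

```python
from pathlib import Path

REQUIRED_FILES = ["SKILL.md", "README.md", "plugin.json", "CHANGELOG.md"]

def scaffold_minimal_skill(root, name):
    """Create the minimal skill layout under root/name and return the dir."""
    skill_dir = Path(root) / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    for filename in REQUIRED_FILES:
        path = skill_dir / filename
        if not path.exists():  # never clobber existing work
            path.write_text(f"# {name}: {filename} stub\n")
    return skill_dir
```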


@@ -0,0 +1,142 @@
# Standard Skill Structure Example
This example shows the standard structure used by most skills in the marketplace.
## Directory Structure
```
standard-skill/
├── SKILL.md # Agent manifest (required)
├── README.md # User documentation (required)
├── plugin.json # Marketplace metadata (required)
├── CHANGELOG.md # Version history (required)
├── data/ # Reference materials, standards (optional)
│ ├── best-practices.md
│ ├── standards.yaml
│ └── references.md
├── examples/ # Sample outputs (optional)
│ ├── example-1.md
│ └── example-2.md
└── templates/ # Reusable templates (optional)
├── report-template.md
└── output-template.json
```
## When to Use Standard Structure
Use this structure when:
- Sequential workflow with clear phases
- Needs reference materials or standards
- Provides templates for outputs
- Examples help users understand
- Medium complexity
## Examples of Standard Skills
- **Codebase Auditor**: Analyzes code against standards
- **CLAUDE.md Auditor**: Validates configuration files
- **Documentation Generator**: Creates docs from code
## Characteristics
- **Complexity**: Medium
- **Files**: 4 required + 3 optional directories
- **Pattern**: Phase-based or validation
- **Modes**: Usually single mode, sequential phases
- **Scripts**: Rarely needed (pure LLM skill)
- **Dependencies**: Minimal
## SKILL.md Template
```markdown
---
name: skill-name
version: 0.1.0
description: Brief description of what this skill does
author: Your Name
---
# Skill Name
## Overview
Detailed description of capabilities.
## When to Use This Skill
**Trigger Phrases:**
- "phrase 1"
- "phrase 2"
- "phrase 3"
**Use Cases:**
- Use case 1
- Use case 2
- Use case 3
## Workflow
### Phase 1: Discovery
- Identify scope
- Gather context
- Validate prerequisites
### Phase 2: Analysis
- Apply standards from data/
- Check compliance
- Detect issues
### Phase 3: Reporting
- Generate report using templates/
- Provide examples from examples/
- Offer recommendations
### Phase 4: Remediation
- Guide user through fixes
- Verify improvements
- Update documentation
## Success Criteria
- [ ] All phases completed
- [ ] Report generated
- [ ] Recommendations provided
## Reference Materials
- `data/` - Standards and best practices
- `examples/` - Sample outputs
- `templates/` - Reusable templates
```
## Directory Purposes
### data/
Contains reference materials the skill consults:
- Standards documents (YAML, MD)
- Best practices guides
- Lookup tables
- Configuration defaults
### examples/
Shows users what to expect:
- Sample outputs
- Before/after comparisons
- Success stories
- Common scenarios
### templates/
Reusable output formats:
- Report templates (Jinja2 or Markdown)
- JSON schemas
- Configuration files
- Document structures
## Best Practices
1. **Organized references**: Put all standards in data/
2. **Concrete examples**: Show real usage in examples/
3. **Reusable templates**: DRY principle for outputs
4. **Progressive disclosure**: Start simple, add detail as needed
5. **Clear phases**: Each phase has specific purpose
6. **Documentation**: Reference materials are well-documented


@@ -0,0 +1,247 @@
# Data Processing Skill Pattern
Use this pattern when your skill **processes, analyzes, or transforms** data to extract insights.
## When to Use
- Skill ingests data from files or APIs
- Performs analysis or transformation
- Generates insights, reports, or visualizations
- Examples: cc-insights (conversation analysis)
## Structure
### Data Flow Architecture
Define clear data pipeline:
```
Input Sources → Processing → Storage → Query/Analysis → Output
```
Example:
```
JSONL files → Parser → SQLite + Vector DB → Search/Analytics → Reports/Dashboard
```
### Processing Modes
**Batch Processing:**
- Process all data at once
- Good for: Initial setup, complete reprocessing
- Trade-off: Slow startup, complete data
**Incremental Processing:**
- Process only new/changed data
- Good for: Regular updates, performance
- Trade-off: Complex state tracking
**Streaming Processing:**
- Process data as it arrives
- Good for: Real-time updates
- Trade-off: Complex implementation
### Storage Strategy
Choose appropriate storage:
**SQLite:**
- Structured metadata
- Fast queries
- Relational data
- Good for: Indexes, aggregations
**Vector Database (ChromaDB):**
- Semantic embeddings
- Similarity search
- Good for: RAG, semantic queries
**File System:**
- Raw data
- Large blobs
- Good for: Backups, archives
## Example: CC Insights
**Input**: Claude Code conversation JSONL files
**Processing Pipeline:**
1. JSONL Parser - Decode base64, extract messages
2. Metadata Extractor - Timestamps, files, tools
3. Embeddings Generator - Vector representations
4. Pattern Detector - Identify trends
**Storage:**
- SQLite: Conversation metadata, fast queries
- ChromaDB: Vector embeddings, semantic search
- Cache: Processed conversation data
**Query Interfaces:**
1. CLI Search - Command-line semantic search
2. Insight Generator - Pattern-based reports
3. Dashboard - Interactive web UI
**Outputs:**
- Search results with similarity scores
- Weekly activity reports
- File heatmaps
- Tool usage analytics
## Data Processing Workflow
### Phase 1: Ingestion
```markdown
1. **Discover Data Sources**
- Locate input files/APIs
- Validate accessibility
- Calculate scope (file count, size)
2. **Initial Validation**
- Check format validity
- Verify schema compliance
- Estimate processing time
3. **State Management**
- Track what's been processed
- Support incremental updates
- Handle failures gracefully
```
### Phase 2: Processing
```markdown
1. **Parse/Transform**
- Read raw data
- Apply transformations
- Handle errors and edge cases
2. **Extract Features**
- Generate metadata
- Calculate metrics
- Create embeddings (if semantic search)
3. **Store Results**
- Write to database(s)
- Update indexes
- Maintain consistency
```
### Phase 3: Analysis
```markdown
1. **Query Interface**
- Support multiple query types
- Optimize for common patterns
- Return ranked results
2. **Pattern Detection**
- Aggregate data
- Identify trends
- Generate insights
3. **Visualization**
- Format for human consumption
- Support multiple output formats
- Interactive when possible
```
## Performance Characteristics
Document expected performance:
```markdown
### Performance Characteristics
- **Initial indexing**: ~1-2 minutes for 100 records
- **Incremental updates**: <5 seconds for new records
- **Search latency**: <1 second for queries
- **Report generation**: <10 seconds for standard reports
- **Memory usage**: ~200MB for 1000 records
```
## Best Practices
1. **Incremental Processing**: Don't reprocess everything on each run
2. **State Tracking**: Track what's been processed to avoid duplicates
3. **Batch Operations**: Process in batches for memory efficiency
4. **Progress Indicators**: Show progress for long operations
5. **Error Recovery**: Handle failures gracefully, resume where left off
6. **Data Validation**: Validate inputs before expensive processing
7. **Index Optimization**: Optimize databases for common queries
8. **Memory Management**: Stream large files, don't load everything
9. **Parallel Processing**: Use parallelism when possible
10. **Cache Wisely**: Cache expensive computations
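Practices 1-2 (incremental processing with state tracking) can be sketched as follows. The state-file format here is an assumption: a JSON list of already-processed item ids, persisted between runs.

```python
import json
from pathlib import Path

def load_state(state_path):
    """Read the set of already-processed ids (empty on first run)."""
    p = Path(state_path)
    return set(json.loads(p.read_text())) if p.exists() else set()

def process_incrementally(items, state_path, process_item):
    """Run process_item only on ids not seen before; persist the new state."""
    done = load_state(state_path)
    new = [i for i in items if i not in done]
    for item in new:
        process_item(item)
    Path(state_path).write_text(json.dumps(sorted(done | set(new))))
    return new
```

A second run over the same inputs then touches only what changed, which is what makes regular updates cheap.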
## Scripts Structure
For data processing skills, provide helper scripts:
```
scripts/
├── processor.py # Main data processing script
├── indexer.py # Build indexes/embeddings
├── query.py # Query interface (CLI)
└── generator.py # Report/insight generation
```
### Script Best Practices
```python
# Good patterns for processing scripts
# (discover_items, is_already_processed, process_item, process_batch
#  are placeholders for your own pipeline functions)
import logging

import click
from tqdm import tqdm

logger = logging.getLogger(__name__)


def chunks(items, batch_size):
    """Yield successive batches for memory-efficient processing."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


# 1. Use click for the CLI
@click.command()
@click.option('--input', 'input_path', help='Input path')
@click.option('--reindex', is_flag=True, help='Reprocess everything')
def process(input_path, reindex):
    """Process data from the input source."""
    items = discover_items(input_path)

    # 2. Show progress for long-running loops
    for item in tqdm(items, desc="Processing"):
        # 4. Support incremental updates: skip already-processed items
        if not reindex and is_already_processed(item):
            continue
        # 3. Handle errors gracefully; continue with the next item
        try:
            process_item(item)
        except Exception as e:
            logger.error(f"Failed to process {item}: {e}")

    # 5. Use batch processing where per-item calls are too slow
    for batch in chunks(items, batch_size=32):
        process_batch(batch)
```
## Storage Schema
Document your data schema:
```sql
-- Example SQLite schema
CREATE TABLE conversations (
id TEXT PRIMARY KEY,
timestamp INTEGER,
message_count INTEGER,
files_modified TEXT, -- JSON array
tools_used TEXT -- JSON array
);
CREATE INDEX idx_timestamp ON conversations(timestamp);
CREATE INDEX idx_files ON conversations(files_modified);
```
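The schema above can be exercised directly with the standard library. This sketch is an assumption about how a processor might record conversations; `record_conversation` and the sample values are hypothetical.

```python
import json
import sqlite3

# Same schema as above, with IF NOT EXISTS so repeated runs are safe.
SCHEMA = """
CREATE TABLE IF NOT EXISTS conversations (
    id TEXT PRIMARY KEY,
    timestamp INTEGER,
    message_count INTEGER,
    files_modified TEXT,  -- JSON array
    tools_used TEXT       -- JSON array
);
CREATE INDEX IF NOT EXISTS idx_timestamp ON conversations(timestamp);
"""

def record_conversation(conn, conv_id, ts, messages, files, tools):
    """Upsert one conversation row; JSON-encode the array columns."""
    conn.execute(
        "INSERT OR REPLACE INTO conversations VALUES (?, ?, ?, ?, ?)",
        (conv_id, ts, messages, json.dumps(files), json.dumps(tools)),
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
record_conversation(conn, "c1", 1700000000, 12, ["a.py"], ["Edit", "Bash"])
```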
## Output Formats
Support multiple output formats:
1. **Markdown**: Human-readable reports
2. **JSON**: Machine-readable for integration
3. **CSV**: Spreadsheet-compatible data
4. **HTML**: Styled reports with charts
5. **Interactive**: Web dashboards (optional)


@@ -0,0 +1,78 @@
# Mode-Based Skill Pattern
Use this pattern when your skill has **multiple distinct operating modes** based on user intent.
## When to Use
- Skill performs fundamentally different operations based on context
- Each mode has its own workflow and outputs
- User intent determines which mode to activate
- Examples: git-worktree-setup (single/batch/cleanup/list modes)
## Structure
### Quick Decision Matrix
Create a clear mapping of user requests to modes:
```
User Request → Mode → Action
───────────────────────────────────────────────────────────
"trigger phrase 1" → Mode 1 → High-level action
"trigger phrase 2" → Mode 2 → High-level action
"trigger phrase 3" → Mode 3 → High-level action
```
### Mode Detection Logic
Provide clear logic for mode selection:
```javascript
// Mode 1: [Name]
if (userMentions("keyword1", "keyword2")) {
return "mode1-name";
}
// Mode 2: [Name]
if (userMentions("keyword3", "keyword4")) {
return "mode2-name";
}
// Ambiguous - ask user
return askForClarification();
```
### Separate Mode Documentation
For complex skills, create separate files for each mode:
```
skill-name/
├── SKILL.md # Overview and mode detection
├── modes/
│ ├── mode1-name.md # Detailed workflow for mode 1
│ ├── mode2-name.md # Detailed workflow for mode 2
│ └── mode3-name.md # Detailed workflow for mode 3
```
## Example: Git Worktree Setup
**Modes:**
1. Single Worktree - Create one worktree
2. Batch Worktrees - Create multiple worktrees
3. Cleanup - Remove worktrees
4. List/Manage - Show worktree status
**Detection Logic:**
- "create worktree for X" → Single mode
- "create worktrees for A, B, C" → Batch mode
- "remove worktree" → Cleanup mode
- "list worktrees" → List mode
## Best Practices
1. **Clear Mode Boundaries**: Each mode should be distinct and non-overlapping
2. **Explicit Detection**: Provide clear rules for mode selection
3. **Clarification Path**: Always have a fallback to ask user when ambiguous
4. **Mode Independence**: Each mode should work standalone
5. **Shared Prerequisites**: Extract common validation to reduce duplication
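The detection logic above can be sketched concretely in Python. The keyword sets here are hypothetical examples based on the git-worktree-setup modes; a real skill would tune them per mode and fall back to asking the user when no single mode matches.

```python
# Hypothetical keyword sets -- tune per mode in a real skill.
MODE_KEYWORDS = {
    "single": {"create worktree for"},
    "batch": {"create worktrees", "worktrees for"},
    "cleanup": {"remove worktree", "delete worktree"},
    "list": {"list worktrees", "show worktrees"},
}

def detect_mode(request):
    """Return the matched mode name, or None when the request is ambiguous."""
    text = request.lower()
    matches = [mode for mode, kws in MODE_KEYWORDS.items()
               if any(kw in text for kw in kws)]
    return matches[0] if len(matches) == 1 else None  # None -> ask the user
```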


@@ -0,0 +1,115 @@
# Phase-Based Skill Pattern
Use this pattern when your skill follows **sequential phases** that build on each other.
## When to Use
- Skill has a linear workflow with clear stages
- Each phase depends on the previous phase
- Progressive disclosure of complexity
- Examples: codebase-auditor (discovery → analysis → reporting → remediation)
## Structure
### Phase Overview
Define clear phases with dependencies:
```
Phase 1: Discovery
Phase 2: Analysis
Phase 3: Reporting
Phase 4: Action/Remediation
```
### Phase Workflow Template
```markdown
## Workflow
### Phase 1: [Name]
**Purpose**: [What this phase accomplishes]
**Steps:**
1. [Step 1]
2. [Step 2]
3. [Step 3]
**Output**: [What information is produced]
**Transition**: [When to move to next phase]
### Phase 2: [Name]
**Purpose**: [What this phase accomplishes]
**Inputs**: [Required from previous phase]
**Steps:**
1. [Step 1]
2. [Step 2]
**Output**: [What information is produced]
```
## Example: Codebase Auditor
**Phase 1: Initial Assessment** (Progressive Disclosure)
- Lightweight scan to understand codebase
- Identify tech stack and structure
- Quick health check
- **Output**: Project profile and initial findings
**Phase 2: Deep Analysis** (Load on Demand)
- Based on Phase 1, perform targeted analysis
- Code quality, security, testing, etc.
- **Output**: Detailed findings with severity
**Phase 3: Report Generation**
- Aggregate findings from Phase 2
- Calculate scores and metrics
- **Output**: Comprehensive audit report
**Phase 4: Remediation Planning**
- Prioritize findings by severity
- Generate action plan
- **Output**: Prioritized task list
## Best Practices
1. **Progressive Disclosure**: Start lightweight, go deep only when needed
2. **Clear Transitions**: Explicitly state when moving between phases
3. **Phase Independence**: Each phase should have clear inputs/outputs
4. **Checkpoint Validation**: Verify prerequisites before advancing
5. **Early Exit**: Allow stopping after any phase if the user only needs partial analysis
6. **Incremental Value**: Each phase should provide standalone value
## Phase Characteristics
### Discovery Phase
- Fast and lightweight
- Gather context and identify scope
- No expensive operations
- Output guides subsequent phases
### Analysis Phase
- Deep dive based on discovery
- Resource-intensive operations
- Parallel processing when possible
- Structured output for reporting
### Reporting Phase
- Aggregate and synthesize data
- Calculate metrics and scores
- Generate human-readable output
- Support multiple formats
### Action Phase
- Provide recommendations
- Generate implementation guidance
- Offer to perform actions
- Track completion


@@ -0,0 +1,174 @@
# Validation/Audit Skill Pattern
Use this pattern when your skill **validates, audits, or checks** artifacts against standards.
## When to Use
- Skill checks compliance against defined standards
- Detects issues and provides remediation guidance
- Generates reports with severity levels
- Examples: claude-md-auditor, codebase-auditor
## Structure
### Validation Sources
Clearly define what you're validating against:
```markdown
## Validation Sources
### 1. ✅ Official Standards
- **Source**: [Authority/documentation]
- **Authority**: Highest (requirements)
- **Examples**: [List key standards]
### 2. 💡 Best Practices
- **Source**: Community/field experience
- **Authority**: Medium (recommendations)
- **Examples**: [List practices]
### 3. 🔬 Research/Optimization
- **Source**: Academic research
- **Authority**: Medium (evidence-based)
- **Examples**: [List findings]
```
### Finding Structure
Use consistent structure for all findings:
```markdown
**Severity**: Critical | High | Medium | Low
**Category**: [Type of issue]
**Location**: [File:line or context]
**Description**: [What the issue is]
**Impact**: [Why it matters]
**Remediation**: [How to fix]
**Effort**: [Time estimate]
**Source**: Official | Community | Research
```
### Severity Levels
Define clear severity criteria:
- **Critical**: Security risk, production-blocking (fix immediately)
- **High**: Significant quality issue (fix this sprint)
- **Medium**: Moderate improvement (schedule for next quarter)
- **Low**: Minor optimization (backlog)
### Score Calculation
Provide quantitative scoring:
```
Overall Health Score (0-100):
- 90-100: Excellent
- 75-89: Good
- 60-74: Fair
- 40-59: Poor
- 0-39: Critical
Category Scores:
- Security: Should always be 100
- Compliance: Aim for 80+
- Best Practices: 70+ is good
```
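The score bands above translate directly into a lookup (a minimal sketch; the function name is illustrative):

```python
def health_rating(score):
    """Map a 0-100 health score to the rating bands defined above."""
    bands = [(90, "Excellent"), (75, "Good"), (60, "Fair"), (40, "Poor"), (0, "Critical")]
    for floor, label in bands:
        if score >= floor:
            return label
    raise ValueError("score must be between 0 and 100")
```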
## Example: CLAUDE.md Auditor
**Validation Against:**
1. Official Anthropic documentation (docs.claude.com)
2. Community best practices (field experience)
3. Academic research (LLM context optimization)
**Finding Categories:**
- Security (secrets, sensitive data)
- Official Compliance (Anthropic guidelines)
- Best Practices (community recommendations)
- Structure (organization, formatting)
**Output Modes:**
1. Audit Report - Detailed findings
2. JSON Report - Machine-readable for CI/CD
3. Refactored File - Production-ready output
## Validation Workflow
### Step 1: Discovery
- Locate target artifact(s)
- Calculate metrics (size, complexity)
- Read content for analysis
### Step 2: Analysis
Run validators in priority order:
1. Security Validation (CRITICAL)
2. Official Compliance
3. Best Practices
4. Optimization Opportunities
### Step 3: Scoring
- Calculate overall health score
- Generate category-specific scores
- Count findings by severity
### Step 4: Reporting
- Generate human-readable report
- Provide machine-readable output
- Offer remediation options
## Best Practices
1. **Prioritize Security**: Always check security first
2. **Source Attribution**: Label each finding with its source
3. **Actionable Remediation**: Provide specific fix instructions
4. **Multiple Output Formats**: Support markdown, JSON, HTML
5. **Incremental Improvement**: Don't overwhelm users by surfacing every issue at once
6. **Track Over Time**: Support baseline comparisons
7. **CI/CD Integration**: Provide exit codes and JSON output
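For the CI/CD practice above, an audit runner can emit machine-readable JSON plus a conventional exit code. In this sketch, failing the build on any critical or high finding is one reasonable policy, not a standard:

```python
import json

def ci_result(findings):
    """Serialize finding counts and pick an exit code: 0 = pass, 1 = blocking issues."""
    counts = {}
    for finding in findings:
        sev = finding["severity"]
        counts[sev] = counts.get(sev, 0) + 1
    blocking = counts.get("critical", 0) + counts.get("high", 0)
    payload = json.dumps({"counts": counts, "blocking": blocking}, sort_keys=True)
    return payload, (1 if blocking else 0)

payload, code = ci_result([{"severity": "high"}, {"severity": "low"}])
```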
## Report Structure
```markdown
# [Artifact] Audit Report
## Executive Summary
- Overall health score: [X/100]
- Critical findings: [count]
- High findings: [count]
- Top 3 priorities
## File Metrics
- [Relevant size/complexity metrics]
## Detailed Findings
### Critical Issues
[Grouped by category]
### High Priority Issues
[Grouped by category]
### Medium Priority Issues
[Grouped by category]
## Remediation Plan
- P0: IMMEDIATE (critical)
- P1: THIS SPRINT (high)
- P2: NEXT QUARTER (medium)
- P3: BACKLOG (low)
```
## Success Criteria Template
```markdown
A well-validated [artifact] should achieve:
- ✅ Security Score: 100/100
- ✅ Compliance Score: 80+/100
- ✅ Overall Health: 75+/100
- ✅ Zero CRITICAL findings
- ✅ < 3 HIGH findings
- ✅ [Artifact-specific criteria]
```
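The criteria template above can be checked mechanically (a sketch; the thresholds mirror the template and the input dictionary keys are illustrative):

```python
def meets_criteria(scores, severity_counts):
    """Evaluate the success criteria template against computed audit results."""
    return (
        scores.get("security", 0) == 100          # Security Score: 100/100
        and scores.get("compliance", 0) >= 80     # Compliance Score: 80+/100
        and scores.get("overall", 0) >= 75        # Overall Health: 75+/100
        and severity_counts.get("critical", 0) == 0
        and severity_counts.get("high", 0) < 3
    )
```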


@@ -0,0 +1,26 @@
# Changelog
All notable changes to the {{ skill_name }} skill will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [{{ version }}] - {{ date }}
### Added
- Initial release of {{ skill_name }}
{% for feature in initial_features -%}
- {{ feature }}
{% endfor %}
### Features
{% for feature in key_features -%}
- **{{ feature.name }}**: {{ feature.description }}
{% endfor %}
### Status
- Proof of concept
- Tested locally on 1-2 projects
- Ready for community feedback and testing
[{{ version }}]: https://github.com/cskiro/claudex/releases/tag/{{ skill_name }}@{{ version }}


@@ -0,0 +1,109 @@
# {{ skill_title }}
{{ description }}
## Quick Start
```
User: "{{ trigger_phrases[0] }}"
```
Claude will:
1. [Action 1]
2. [Action 2]
3. [Action 3]
## Features
### Feature 1: [Feature Name]
- [Capability 1]
- [Capability 2]
- [Capability 3]
### Feature 2: [Feature Name]
- [Capability 1]
- [Capability 2]
## Installation
```bash
/plugin install {{ skill_name }}@claudex
```
Or manually:
```bash
cp -r {{ skill_name }} ~/.claude/skills/
```
## Usage Examples
### Example 1: [Scenario Name]
**Scenario:** [Description of scenario]
```
User: "{{ trigger_phrases[0] }}"
```
**Result:**
- [Outcome 1]
- [Outcome 2]
### Example 2: [Another Scenario]
**Scenario:** [Description of scenario]
```
User: "{{ trigger_phrases[1] if trigger_phrases|length > 1 else trigger_phrases[0] }}"
```
**Result:**
- [Outcome 1]
- [Outcome 2]
## Requirements
- [Requirement 1]
- [Requirement 2]
## Configuration
{% if has_config %}
[TODO: Describe configuration options]
{% else %}
No additional configuration required.
{% endif %}
## Troubleshooting
### Issue 1: [Issue Name]
**Problem:** [Description]
**Solution:** [Steps to resolve]
### Issue 2: [Issue Name]
**Problem:** [Description]
**Solution:** [Steps to resolve]
## Best Practices
- [Practice 1]
- [Practice 2]
- [Practice 3]
## Limitations
- [Limitation 1]
- [Limitation 2]
## Contributing
See [CONTRIBUTING.md](https://github.com/cskiro/claudex/blob/main/CONTRIBUTING.md) for contribution guidelines.
## License
Apache 2.0
## Version History
See [CHANGELOG.md](./CHANGELOG.md) for version history.


@@ -0,0 +1,124 @@
---
name: {{ skill_name }}
version: {{ version }}
description: {{ description }}
author: {{ author }}
---
# {{ skill_title }}
## Overview
{{ detailed_description }}
## When to Use This Skill
**Trigger Phrases:**
{% for phrase in trigger_phrases -%}
- "{{ phrase }}"
{% endfor %}
**Use Cases:**
{% for use_case in use_cases -%}
- {{ use_case }}
{% endfor %}
## Response Style
- **Characteristic 1**: [TODO: Define first response characteristic]
- **Characteristic 2**: [TODO: Define second response characteristic]
- **Characteristic 3**: [TODO: Define third response characteristic]
{% if has_modes %}
## Quick Decision Matrix
```
User Request → Mode → Action
───────────────────────────────────────────────────────────
{% for mode in modes -%}
"{{ mode.trigger }}" → {{ mode.name }} → {{ mode.action }}
{% endfor -%}
```
## Mode Detection Logic
```javascript
{% for mode in modes -%}
// Mode {{ loop.index }}: {{ mode.name }}
if (userMentions("{{ mode.keyword }}")) {
return "{{ mode.name|lower|replace(' ', '-') }}";
}
{% endfor -%}
// Ambiguous - ask user
return askForClarification();
```
{% endif %}
## Core Responsibilities
### 1. [First Responsibility]
- ✓ [Detail 1]
- ✓ [Detail 2]
- ✓ [Detail 3]
### 2. [Second Responsibility]
- ✓ [Detail 1]
- ✓ [Detail 2]
## Workflow
{% if has_phases %}
{% for phase in phases %}
### Phase {{ loop.index }}: {{ phase.name }}
{% for step in phase.steps -%}
{{ loop.index }}. {{ step }}
{% endfor %}
{% endfor %}
{% else %}
### Phase 1: Initial Assessment
1. [Step 1]
2. [Step 2]
3. [Step 3]
### Phase 2: Main Operation
1. [Step 1]
2. [Step 2]
### Phase 3: Verification
1. [Step 1]
2. [Step 2]
{% endif %}
## Error Handling
Common issues and how to handle them:
- **Error 1**: [Solution]
- **Error 2**: [Solution]
## Success Criteria
- [ ] [Criterion 1]
- [ ] [Criterion 2]
- [ ] [Criterion 3]
{% if has_reference_materials %}
## Reference Materials
See additional documentation in:
{% if has_data_dir -%}
- `data/` - Best practices and standards
{% endif -%}
{% if has_modes_dir -%}
- `modes/` - Detailed mode workflows
{% endif -%}
{% if has_examples_dir -%}
- `examples/` - Sample outputs
{% endif -%}
{% if has_templates_dir -%}
- `templates/` - Reusable templates
{% endif -%}
{% endif %}
---
**Remember:** [TODO: Add key reminder about using this skill effectively]


@@ -0,0 +1,39 @@
{
"name": "{{ skill_name }}",
"version": "{{ version }}",
"description": "{{ description }}",
"author": "{{ author }}",
"license": "Apache-2.0",
"homepage": "https://github.com/cskiro/claudex/tree/main/{{ skill_name }}",
"repository": {
"type": "git",
"url": "https://github.com/cskiro/claudex"
},
"keywords": [
{% for keyword in keywords -%}
"{{ keyword }}"{% if not loop.last %},{% endif %}
{% endfor -%}
],
{% if has_requirements -%}
"requirements": {
{% for req, version in requirements.items() -%}
"{{ req }}": "{{ version }}"{% if not loop.last %},{% endif %}
{% endfor -%}
},
{% endif -%}
"components": {
"agents": [
{
"name": "{{ skill_name }}",
"manifestPath": "SKILL.md"
}
]
},
"metadata": {
"category": "{{ category }}",
"status": "proof-of-concept",
"tested": "1-2 projects locally"
}
}


@@ -0,0 +1,212 @@
# Guided Creation Workflow
Detailed step-by-step process for Mode 1: Interactive skill creation.
## Step 1: Basic Information
Ask user for:
### Skill Name
- Format: kebab-case (lowercase, hyphens)
- Validate: no spaces, descriptive
- Examples: "code-formatter", "test-generator", "api-validator"
- Check: name doesn't conflict with existing skills
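The naming rules above can be enforced with a small validator. In this sketch, the conflict check against existing skills is represented by a simple membership test:

```python
import re

# kebab-case: lowercase alphanumeric words separated by single hyphens
KEBAB = re.compile(r"^[a-z][a-z0-9]*(-[a-z0-9]+)*$")

def validate_skill_name(name, existing=()):
    """Return a list of problems; an empty list means the name is acceptable."""
    problems = []
    if not KEBAB.match(name):
        problems.append("must be kebab-case: lowercase words separated by hyphens")
    if name in existing:
        problems.append("conflicts with an existing skill")
    return problems
```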
### Brief Description
- 1-2 sentences for metadata
- Used in plugin.json and SKILL.md frontmatter
- Should clearly state what skill does
### Author Name
- Default: Connor
- Used in all metadata files
## Step 2: Skill Purpose & Category
### Detailed Description
- 2-4 sentences for SKILL.md Overview
- Explains full capabilities
### Category Selection
Present options from `data/categories.yaml`:
- **analysis**: Code analysis, auditing, quality checking
- **tooling**: Development tools, configuration validators
- **productivity**: Developer workflow, automation, insights
- **devops**: Infrastructure, deployment, monitoring
Suggest category based on skill purpose, allow user to confirm/change.
### Trigger Phrases
- Ask for 3-5 phrases users might say
- Provide examples based on similar skills
- Generate suggestions if needed
### Use Cases
- 3-5 concrete scenarios
- Specific, actionable situations
## Step 3: Complexity Assessment
Determine skill type through questions:
**Question 1**: "Does this skill have multiple distinct modes or workflows?"
- Yes → Complex (mode-based)
- No → Continue
**Question 2**: "Does this skill process data from files or generate reports?"
- Yes → Complex (data-processing)
- No → Continue
**Question 3**: "Does this skill need reference materials?"
- Yes → Standard
- No → Minimal
Reference: `data/skill-types.yaml`
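The three questions form a short decision chain where the first "yes" answer decides the type (a sketch; the boolean parameter names are illustrative):

```python
def assess_complexity(has_modes, processes_data, needs_reference):
    """Mirror the Step 3 questions: the first affirmative answer wins."""
    if has_modes:
        return "complex (mode-based)"
    if processes_data:
        return "complex (data-processing)"
    if needs_reference:
        return "standard"
    return "minimal"
```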
## Step 4: Structure Customization
Based on type, ask about optional directories:
### For Standard or Complex skills:
- "Will you need reference data files?" → create data/
- "Will you need example outputs?" → create examples/
- "Will you need reusable templates?" → create templates/
### For Complex (mode-based) skills:
- "How many modes does this skill have?" (2-5 typical)
- For each mode:
- Mode name
- When to use (trigger phrases)
- Primary action
### For Complex (data-processing) skills:
- "What data sources will you process?"
- "What output formats do you need?"
- Always create scripts/ directory
## Step 5: Pattern Selection
Select from `patterns/` based on skill type:
- Minimal → phase-based.md
- Standard → phase-based.md or validation.md
- Complex (mode-based) → mode-based.md
- Complex (data-processing) → data-processing.md
Present pattern to user: "I'll use the [pattern] pattern, which means..."
## Step 6: Generation
### Create Directory Structure
```bash
mkdir -p ~/.claude/skills/[skill-name]/{required,optional-dirs}
```
### Generate Files from Templates
Using Jinja2 templates:
- SKILL.md from `templates/SKILL.md.j2`
- README.md from `templates/README.md.j2`
- plugin.json from `templates/plugin.json.j2`
- CHANGELOG.md from `templates/CHANGELOG.md.j2`
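The rendering step can be illustrated with a stdlib-only stand-in for Jinja2's variable substitution. This is a toy sketch; real generation should use the `jinja2` package, which also handles the loops and conditionals used in these templates:

```python
import re

def render(template, variables):
    """Replace {{ name }} placeholders with values; a toy stand-in for Jinja2."""
    def substitute(match):
        key = match.group(1)
        if key not in variables:
            raise KeyError(f"template variable not provided: {key}")
        return str(variables[key])
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)

skill_md = render("# {{ skill_title }}\nversion: {{ version }}",
                  {"skill_title": "My Skill", "version": "0.1.0"})
```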
### Apply Pattern-Specific Content
- Include pattern guidance in sections
- Add pattern templates if needed
- Create mode files if mode-based
### Mark Customization Points
- Add TODO comments where needed
- Provide inline guidance
- Reference examples/
## Step 7: Quality Validation
Run validation using `data/quality-checklist.md`:
1. **File Existence**: Verify all required files
2. **Syntax Validation**: Check YAML/JSON
3. **Content Completeness**: No empty required sections
4. **Security Check**: No secrets
5. **Naming Conventions**: Verify kebab-case
6. **Quality Score**: Calculate A-F grade
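A minimal subset of these checks can be automated (a sketch covering file existence, JSON syntax, and naming; the required-file list and function name are illustrative):

```python
import json
import re
from pathlib import Path

REQUIRED = ["SKILL.md", "README.md", "plugin.json", "CHANGELOG.md"]

def validate_skill_dir(path):
    """Run a minimal subset of the quality checklist; return problem strings."""
    root = Path(path)
    problems = [f"missing required file: {name}"
                for name in REQUIRED if not (root / name).is_file()]
    if not re.match(r"^[a-z][a-z0-9]*(-[a-z0-9]+)*$", root.name):
        problems.append("directory name is not kebab-case")
    manifest = root / "plugin.json"
    if manifest.is_file():
        try:
            json.loads(manifest.read_text())
        except json.JSONDecodeError as err:
            problems.append(f"plugin.json is not valid JSON: {err}")
    return problems
```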
### Validation Report Format
```markdown
# Skill Quality Report: [skill-name]
## Status: [PASS/NEEDS WORK]
### Files Generated
✅ SKILL.md
✅ README.md
✅ plugin.json
✅ CHANGELOG.md
### Quality Score: [Grade]
### Items Needing Customization
- [ ] SKILL.md: Complete "Response Style" section
- [ ] SKILL.md: Fill in workflow details
- [ ] README.md: Add concrete usage examples
### Validation Results
✅ No security issues
✅ Valid YAML frontmatter
✅ Valid JSON in plugin.json
✅ Proper naming conventions
```
## Step 8: Installation & Next Steps
### Verify Installation
```bash
ls -la ~/.claude/skills/[skill-name]/
```
### Testing Guidance
```markdown
## Test Your Skill
Try these trigger phrases in a new Claude session:
1. "[trigger-phrase-1]"
2. "[trigger-phrase-2]"
3. "[trigger-phrase-3]"
Expected behavior: [What should happen]
```
### Customization TODO List
- List all sections marked with TODO
- Prioritize by importance
- Provide examples for each
### Next Steps
```markdown
## Next Steps
1. Review generated files in ~/.claude/skills/[skill-name]/
2. Customize sections marked with TODO
3. Add reference materials to data/ (if applicable)
4. Create example outputs in examples/ (if applicable)
5. Test trigger phrases in new Claude session
6. Iterate on description and workflow
7. Run validation again
8. Ready to use or submit to marketplace!
```
## Information Summary
By end of guided creation, you should have:
| Field | Source | Used In |
|-------|--------|---------|
| Skill name | User input | Directory, all files |
| Brief description | User input | plugin.json, frontmatter |
| Detailed description | User input | SKILL.md Overview |
| Author | User input (default: Connor) | All metadata |
| Category | User selection | plugin.json |
| Trigger phrases | User input | SKILL.md |
| Use cases | User input | SKILL.md |
| Skill type | Assessment | Structure decisions |
| Pattern | Auto-selected | SKILL.md structure |
| Optional dirs | User input | Directory structure |


@@ -0,0 +1,15 @@
# Changelog
## 0.2.0
- Refactored to Anthropic progressive disclosure pattern
- Updated description with "Use PROACTIVELY when..." format
- Removed version/author from frontmatter
## 0.1.0
- Initial release with three isolation modes
- Git Worktree (fast), Docker (balanced), VM (safest)
- Automatic risk assessment and mode detection
- Side-effect validation and dependency analysis
- Test report generation with actionable recommendations


@@ -0,0 +1,335 @@
# Skill Isolation Tester
> Automated testing framework for Claude Code skills in isolated environments
## Overview
Test your newly created Claude Code skills in isolated environments before sharing them publicly. This skill automatically spins up git worktrees, Docker containers, or VMs to validate that your skills work correctly without hidden dependencies on your local setup.
## Features
- **Multiple Isolation Levels**: Choose from git worktree (fast), Docker (balanced), or VM (safest)
- **Automatic Mode Detection**: Analyzes skill risk and suggests appropriate isolation level
- **Comprehensive Validation**: Checks execution, side effects, dependencies, and cleanup
- **Detailed Reports**: Get actionable feedback with specific issues and recommendations
- **Safe Testing**: Protect your main development environment from experimental skills
## Quick Start
### Basic Usage
```
test skill my-new-skill in isolation
```
Claude will analyze your skill and choose the appropriate isolation environment.
### Specify Environment
```
test skill my-new-skill in worktree # Fast, lightweight
test skill my-new-skill in docker # OS isolation
test skill my-new-skill in vm # Maximum security
```
### Check for Issues
```
check if skill my-new-skill has hidden dependencies
verify skill my-new-skill cleans up after itself
```
## Isolation Modes
### 🚀 Git Worktree (Fast)
**Best for**: Read-only skills, quick iteration during development
- ✅ Creates test in seconds
- ✅ Minimal disk space
- ⚠️ Limited isolation (shares system packages)
**Prerequisites**: Git 2.5+
### 🐳 Docker (Balanced)
**Best for**: Skills that install packages or modify files
- ✅ Full OS isolation
- ✅ Reproducible environment
- ⚠️ Requires Docker installed
**Prerequisites**: Docker daemon running
### 🖥️ VM (Safest)
**Best for**: High-risk skills, untrusted sources
- ✅ Complete isolation
- ✅ Test on different OS versions
- ⚠️ Slower, resource-intensive
**Prerequisites**: Multipass, UTM, or VirtualBox
## What Gets Tested
### ✅ Execution Validation
- Skill completes without errors
- No unhandled exceptions
- Acceptable performance
### ✅ Side Effect Detection
- Files created/modified/deleted
- Processes started (and stopped)
- System configuration changes
- Network activity
### ✅ Dependency Analysis
- Required system packages
- NPM/pip dependencies
- Hardcoded paths
- Environment variables needed
### ✅ Cleanup Verification
- Temporary files removed
- Processes terminated
- System state restored
## Example Report
```markdown
# Skill Isolation Test Report: my-new-skill
## Status: ⚠️ WARNING (Ready with minor fixes)
### Execution Results
✅ Skill completed successfully
✅ No errors detected
⏱️ Execution time: 12s
### Issues Found
**HIGH Priority:**
- Missing documentation for `jq` dependency
- Hardcoded path: /Users/connor/.claude/config (line 45)
**MEDIUM Priority:**
- 3 temporary files not cleaned up in /tmp
### Recommendations
1. Document `jq` requirement in README
2. Replace hardcoded path with $HOME/.claude/config
3. Add cleanup for /tmp/skill-temp-*.log files
### Overall Grade: B (READY after addressing HIGH priority items)
```
## Installation
This skill is already available in your Claude Code skills directory.
### Manual Installation
```bash
cp -r skill-isolation-tester ~/.claude/skills/
```
### Verify Installation
Start Claude Code and say:
```
test skill [any-skill-name] in isolation
```
## Prerequisites
### Required (All Modes)
- Git 2.5+
- Claude Code 1.0+
### Optional (Docker Mode)
- Docker Desktop or Docker Engine
- 1GB+ free disk space
### Optional (VM Mode)
- Multipass (recommended) or
- UTM (macOS) or
- VirtualBox (cross-platform)
- 8GB+ host RAM
- 20GB+ free disk space
## Configuration
### Set Default Isolation Mode
Create `~/.claude/skills/skill-isolation-tester/config.json`:
```json
{
"default_mode": "docker",
"docker": {
"base_image": "ubuntu:22.04",
"memory_limit": "512m",
"cpu_limit": "1.0"
},
"vm": {
"platform": "multipass",
"os_version": "22.04",
"cpus": 2,
"memory": "2G",
"disk": "10G"
}
}
```
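Merging a user config on top of defaults can be sketched as follows (illustrative only; the actual skill may resolve configuration differently, and the default values shown are assumptions):

```python
import json
from pathlib import Path

DEFAULTS = {
    "default_mode": "worktree",
    "docker": {"base_image": "ubuntu:22.04", "memory_limit": "512m"},
}

def load_config(path):
    """Overlay a user config.json on top of defaults (one level deep)."""
    merged = {k: (dict(v) if isinstance(v, dict) else v) for k, v in DEFAULTS.items()}
    config_file = Path(path)
    if config_file.is_file():
        user = json.loads(config_file.read_text())
        for key, value in user.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key].update(value)  # shallow merge of nested sections
            else:
                merged[key] = value
    return merged
```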
## Use Cases
### Before Submitting to Claudex Marketplace
```
validate skill my-marketplace-skill in docker
```
Ensures your skill works in a clean environment without your personal configs.
### Testing Skills from Others
```
test skill untrusted-skill in vm
```
Maximum isolation protects your system from potential issues.
### Catching Environment-Specific Bugs
```
test skill my-skill in worktree
```
Quickly verify skill doesn't depend on your specific setup.
### CI/CD Integration
```bash
#!/bin/bash
# In your CI pipeline
if claude "test skill $SKILL_NAME in docker"; then
  echo "✅ Skill tests passed"
  exit 0
else
  echo "❌ Skill tests failed"
  exit 1
fi
```
## Troubleshooting
### "Docker daemon not running"
**macOS**: Open Docker Desktop
**Linux**: `sudo systemctl start docker`
### "Multipass not found"
```bash
# macOS
brew install multipass
# Linux
sudo snap install multipass
```
### "Permission denied"
Add your user to docker group:
```bash
sudo usermod -aG docker $USER
newgrp docker
```
### "Out of disk space"
Clean up Docker:
```bash
docker system prune -a
```
## Best Practices
1. **Test before committing** - Catch issues early
2. **Start with worktree** - Fast iteration during development
3. **Use Docker for final validation** - Before public release
4. **Use VM for untrusted skills** - Safety first
5. **Review test reports** - Address all HIGH priority issues
6. **Document dependencies** - Help other users
## Advanced Usage
### Custom Test Scenarios
```
test skill my-skill with inputs "test-file.txt, --option value"
```
### Batch Testing
```
test all skills in directory ./skills/ in worktree
```
### Keep Environment for Debugging
```
test skill my-skill in docker --keep
```
Preserves the container or VM for manual inspection.
## Architecture
```
skill-isolation-tester/
├── SKILL.md # Main skill manifest
├── README.md # This file
├── CHANGELOG.md # Version history
├── plugin.json # Marketplace metadata
├── modes/ # Mode-specific workflows
│ ├── mode1-git-worktree.md # Fast isolation
│ ├── mode2-docker.md # Container isolation
│ └── mode3-vm.md # VM isolation
├── data/ # Reference materials
│ ├── risk-assessment.md # How to assess skill risk
│ └── side-effect-checklist.md # What to check for
├── templates/ # Report templates
│ └── test-report.md # Standard report format
└── examples/ # Sample outputs
└── test-results/ # Example test results
```
## Contributing
Found a bug or have a feature request? Issues and PRs welcome!
## License
MIT License - see LICENSE file for details
## Related Skills
- **skill-creator**: Create new skills with proper structure
- **git-worktree-setup**: Manage parallel development workflows
## Changelog
See [CHANGELOG.md](CHANGELOG.md) for version history.
## Credits
Created by Connor
Inspired by best practices in software testing and isolation
---
**Remember**: Test in isolation, ship with confidence! 🚀


@@ -0,0 +1,174 @@
---
name: skill-isolation-tester
description: Use PROACTIVELY when validating Claude Code skills before sharing or public release. Automated testing framework using multiple isolation environments (git worktree, Docker containers, VMs) to catch environment-specific bugs, hidden dependencies, and cleanup issues. Includes production-ready test templates and risk-based mode auto-detection. Not for functional testing of skill logic or non-skill code.
---
# Skill Isolation Tester
Tests Claude Code skills in isolated environments to ensure they work correctly without dependencies on your local setup.
## When to Use
**Trigger Phrases**:
- "test skill [name] in isolation"
- "validate skill [name] in clean environment"
- "test my new skill in worktree/docker/vm"
- "check if skill [name] has hidden dependencies"
**Use Cases**:
- Test before committing or sharing publicly
- Validate no hidden dependencies on local environment
- Verify cleanup behavior (no leftover files/processes)
- Catch environment-specific bugs
## Quick Decision Matrix
| Request | Mode | Isolation Level |
|---------|------|-----------------|
| "test in worktree" | Git Worktree | Fast, lightweight |
| "test in docker" | Docker | Full OS isolation |
| "test in vm" | VM | Complete isolation |
| "test skill X" (unspecified) | Auto-detect | Based on skill risk |
## Risk-Based Auto-Detection
| Risk Level | Criteria | Recommended Mode |
|------------|----------|------------------|
| Low | Read-only, no system commands | Git Worktree |
| Medium | File creation, bash commands | Docker |
| High | System config changes, VM ops | VM |
## Mode 1: Git Worktree (Fast)
**Best for**: Low-risk skills, quick iteration
**Process**:
1. Create isolated git worktree
2. Install Claude Code
3. Copy skill and run tests
4. Cleanup
**Workflow**: `modes/mode1-git-worktree.md`
## Mode 2: Docker Container (Balanced)
**Best for**: Medium-risk skills, full OS isolation
**Process**:
1. Build/pull Docker image
2. Create container with Claude Code
3. Run skill tests with monitoring
4. Cleanup container and images
**Workflow**: `modes/mode2-docker.md`
## Mode 3: VM (Safest)
**Best for**: High-risk skills, untrusted code
**Process**:
1. Provision VM, take snapshot
2. Install Claude Code
3. Run tests with full monitoring
4. Rollback or cleanup
**Workflow**: `modes/mode3-vm.md`
## Test Templates
Production-ready templates in `test-templates/`:
| Template | Use For |
|----------|---------|
| `docker-skill-test.sh` | Docker container/image skills |
| `docker-skill-test-json.sh` | CI/CD with JSON/JUnit output |
| `api-skill-test.sh` | HTTP/API calling skills |
| `file-manipulation-skill-test.sh` | File modification skills |
| `git-skill-test.sh` | Git operation skills |
**Usage**:
```bash
chmod +x test-templates/docker-skill-test.sh
./test-templates/docker-skill-test.sh my-skill-name
# CI/CD with JSON output
export JSON_ENABLED=true
./test-templates/docker-skill-test-json.sh my-skill-name
```
## Helper Library
`lib/docker-helpers.sh` provides robust Docker testing utilities:
```bash
source ~/.claude/skills/skill-isolation-tester/lib/docker-helpers.sh
trap cleanup_on_exit EXIT
preflight_check_docker || exit 1
safe_docker_build "Dockerfile" "skill-test:my-skill"
safe_docker_run "skill-test:my-skill" bash -c "echo 'Testing...'"
```
**Functions**: `validate_shell_command`, `retry_docker_command`, `cleanup_on_exit`, `preflight_check_docker`, `safe_docker_build`, `safe_docker_run`
## Validation Checks
**Execution**:
- [ ] Skill completes without errors
- [ ] Output matches expected format
- [ ] Execution time acceptable
**Side Effects**:
- [ ] No orphaned processes
- [ ] Temporary files cleaned up
- [ ] No unexpected system modifications
**Portability**:
- [ ] No hardcoded paths
- [ ] All dependencies documented
- [ ] Works in clean environment
## Test Report Format
```markdown
# Skill Isolation Test Report: [skill-name]
## Environment: [Git Worktree / Docker / VM]
## Status: [PASS / FAIL / WARNING]
### Execution Results
✅ Skill completed successfully
### Side Effects Detected
⚠️ 3 temporary files not cleaned up
### Dependency Analysis
📦 Required: jq, git
### Overall Grade: B (READY with minor fixes)
```
## Reference Materials
- `modes/mode1-git-worktree.md` - Fast isolation workflow
- `modes/mode2-docker.md` - Container isolation workflow
- `modes/mode3-vm.md` - Full VM isolation workflow
- `data/risk-assessment.md` - Skill risk evaluation
- `data/side-effect-checklist.md` - Side effect validation
- `templates/test-report.md` - Report template
- `test-templates/README.md` - Template documentation
## Quick Commands
```bash
# Test with auto-detection
test skill my-new-skill in isolation
# Test in specific environment
test skill my-new-skill in worktree # Fast
test skill my-new-skill in docker # Balanced
test skill my-new-skill in vm # Safest
```
---
**Version**: 0.1.0 | **Author**: Connor


@@ -0,0 +1,391 @@
# Skill Risk Assessment Guide
## Overview
This guide helps you assess the risk level of a skill to determine the appropriate isolation environment for testing. Risk assessment prevents over-isolation (wasting time) and under-isolation (security issues).
## Risk Levels
### Low Risk → Git Worktree
**Characteristics:**
- Read-only operations on existing files
- No system commands (bash, npm, apt, etc.)
- No file creation outside skill directory
- No network requests
- Pure data processing or analysis
- File reading and reporting only
**Examples:**
- Code analyzer that reads files and generates reports
- Configuration validator that checks syntax
- Documentation generator from code comments
- Markdown formatter or linter
- Log file parser
**Appropriate Environment:** Git Worktree (fast, lightweight)
### Medium Risk → Docker
**Characteristics:**
- File creation in user directories
- NPM/pip package installation
- Bash commands for file operations
- Git operations (clone, commit, etc.)
- Network requests (API calls, downloads)
- Environment variable reads
- Temporary file creation
- Database connections (local)
**Examples:**
- Code generator that creates new files
- Package installer or dependency manager
- API integration that fetches remote data
- Build tool that compiles code
- Test runner that executes tests
- Migration tool that updates files
**Appropriate Environment:** Docker (OS isolation, reproducible)
### High Risk → VM
**Characteristics:**
- System configuration changes (/etc/ modifications)
- Service installation (systemd, cron)
- Kernel module loading
- VM or container operations
- Database schema migrations (production)
- Destructive operations (file deletion, disk formatting)
- Privilege escalation (sudo commands)
- Unknown or untrusted source
**Examples:**
- System setup automation
- Infrastructure provisioning
- VM management tools
- Security testing tools
- Experimental or unreviewed skills
- Skills from external repositories
**Appropriate Environment:** VM (complete isolation, safest)
## Assessment Checklist
### Step 1: Parse Skill Manifest (SKILL.md)
Read the skill's SKILL.md and look for these keywords:
**Low Risk Indicators:**
- "analyze", "read", "parse", "validate", "check", "lint", "format"
- "generate report", "calculate", "summarize"
- Read-only file operations
- No system commands mentioned
**Medium Risk Indicators:**
- "install", "create", "write", "modify", "update", "build", "compile"
- "npm install", "pip install", "git clone"
- "fetch", "download", "API call"
- File creation mentioned
- Bash commands for file operations
**High Risk Indicators:**
- "sudo", "systemctl", "cron", "service"
- "configure system", "modify /etc"
- "VM", "docker run", "container"
- "delete", "remove", "format"
- "root access", "privilege"
### Step 2: Scan Skill Code
If skill includes scripts or code files, scan for:
**Red Flags (High Risk):**
```bash
# In bash scripts
sudo
systemctl
/etc/
chmod 777
rm -rf /
dd if=
mkfs
usermod
passwd
```
```javascript
// In JavaScript/Node
require('child_process').exec('sudo')
fs.rmdirSync('/', { recursive: true })
process.setuid(0)
```
```python
# In Python
os.system('sudo')
import subprocess
subprocess.run(['sudo', ...])
```
**Medium Risk Patterns:**
```bash
npm install
git clone
curl | bash
apt-get install
brew install
pip install
mkdir -p
touch
echo > file
```
**Low Risk Patterns:**
```bash
cat file.txt
grep pattern
find . -name
ls -la
echo "message"
```
### Step 3: Check Dependencies
Review plugin.json or README for dependencies:
**Low Risk:**
- No external dependencies
- Pure JavaScript/Python/Ruby standard library
- Read-only CLI tools (cat, grep, jq for reading only)
**Medium Risk:**
- NPM packages listed
- Python packages (via requirements.txt)
- Common CLI tools (git, curl, wget)
- Database connections (read/write)
**High Risk:**
- System packages (apt, yum, brew)
- Kernel modules
- Root-level dependencies
- Unsigned binaries
- External scripts from unknown sources
### Step 4: Review File Operations
Check what directories the skill accesses:
**Low Risk:**
- Reads from current directory only
- Reads from specified input files
- Writes reports to current directory
**Medium Risk:**
- Reads/writes to ~/.claude/
- Reads/writes to /tmp/
- Creates files in user directories
- Modifies project files
**High Risk:**
- Accesses /etc/
- Accesses /usr/ or /usr/local/
- Accesses /sys/ or /proc/
- Modifies system binaries
- Accesses /var/log/
### Step 5: Network Activity Assessment
**Low Risk:**
- No network activity
- Reads from local cache only
**Medium Risk:**
- HTTP GET requests to public APIs
- Documented API endpoints
- Read-only data fetching
- HTTPS only
**High Risk:**
- HTTP POST with sensitive data
- Unclear network destinations
- Raw socket operations
- Arbitrary URL from user input
- Self-updating mechanism
## Automatic Risk Scoring
Use this scoring system:
```javascript
// `skill` is the skill's full text (e.g. SKILL.md plus scripts, concatenated).
function mentions(text, ...keywords) {
  return keywords.some(k => text.toLowerCase().includes(k.toLowerCase()));
}
// Path checks use the same substring heuristic.
const accesses = mentions;

function assessSkillRisk(skill) {
  let score = 0;
  // File operations
  if (mentions(skill, "read", "parse", "analyze")) score += 1;
  if (mentions(skill, "write", "create", "modify")) score += 3;
  if (mentions(skill, "delete", "remove", "rm -rf")) score += 8;
  // System operations
  if (mentions(skill, "npm install", "pip install")) score += 3;
  if (mentions(skill, "apt-get", "brew install")) score += 5;
  if (mentions(skill, "sudo", "systemctl", "service")) score += 10;
  // File paths
  if (accesses(skill, "~/", "/tmp/")) score += 2;
  if (accesses(skill, "/etc/", "/usr/")) score += 8;
  // Network
  if (mentions(skill, "fetch", "API", "curl")) score += 2;
  if (mentions(skill, "download", "wget")) score += 3;
  // Process operations
  if (mentions(skill, "exec", "spawn", "child_process")) score += 4;
  // Determine risk level
  if (score <= 3) return "low";     // Worktree
  if (score <= 10) return "medium"; // Docker
  return "high";                    // VM
}
```
**Scoring Reference:**
- 0-3: Low Risk → Git Worktree
- 4-10: Medium Risk → Docker
- 11+: High Risk → VM
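The same heuristic can be run directly against a skill's files with `grep`. A minimal shell sketch mirroring the weights above; keyword matching is naive substring search, so treat the result as a starting point rather than a verdict, and assume the skill's text has been concatenated into one file:

```bash
# score_skill FILE — heuristic risk score from keyword matches.
score_skill() {
  f="$1"; s=0
  grep -qiE 'read|parse|analyze' "$f"       && s=$((s + 1))
  grep -qiE 'write|create|modify' "$f"      && s=$((s + 3))
  grep -qiE 'delete|remove|rm -rf' "$f"     && s=$((s + 8))
  grep -qiE 'npm install|pip install' "$f"  && s=$((s + 3))
  grep -qiE 'apt-get|brew install' "$f"     && s=$((s + 5))
  grep -qiE 'sudo|systemctl|service' "$f"   && s=$((s + 10))
  grep -qE '~/|/tmp/' "$f"                  && s=$((s + 2))
  grep -qE '/etc/|/usr/' "$f"               && s=$((s + 8))
  grep -qiE 'fetch|api|curl' "$f"           && s=$((s + 2))
  grep -qiE 'download|wget' "$f"            && s=$((s + 3))
  grep -qiE 'exec|spawn|child_process' "$f" && s=$((s + 4))
  echo "$s"
}

# risk_level SCORE — map a score onto the isolation tiers above.
risk_level() {
  if [ "$1" -le 3 ]; then echo low
  elif [ "$1" -le 10 ]; then echo medium
  else echo high
  fi
}
```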
## Special Cases
### Unknown or Unreviewed Skills
**Default:** High Risk (VM isolation)
Even if skill appears low risk, use VM for first test of:
- Skills from external repositories
- Skills without documentation
- Skills with obfuscated code
- Skills from untrusted authors
### Skills in Active Development
**Recommendation:** Medium Risk (Docker)
For your own skills during development:
- Start with Git Worktree for speed
- Use Docker before committing
- Use VM before public release
### Skills from Marketplace
**Recommendation:** Follow listed risk level
Trusted marketplace skills can use their documented risk level.
## Override Cases
User can always override automatic detection:
```
test skill low-risk-skill in vm # More isolation than needed (safe but slow)
test skill high-risk-skill in docker # Less isolation (not recommended)
```
**Warn user if choosing lower isolation than recommended.**
## Risk Re-assessment
Re-assess risk if skill is updated:
- Major version changes
- New dependencies added
- New file operations
- Expanded scope
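A lightweight way to notice these triggers is to compare the manifest fields that matter between releases. A sketch, assuming a flat `plugin.json` with `version` and `dependencies` keys (real manifests may nest these differently):

```bash
# reassess_needed OLD_MANIFEST NEW_MANIFEST — prints "yes" when the version
# or dependency lines changed, signalling a fresh risk assessment is due.
reassess_needed() {
  old_sig=$(grep -E '"(version|dependencies)"' "$1" || true)
  new_sig=$(grep -E '"(version|dependencies)"' "$2" || true)
  if [ "$old_sig" = "$new_sig" ]; then echo no; else echo yes; fi
}
```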
## Decision Tree
```
Start
|
├─ Does skill read files only?
| └─ YES → Low Risk (Worktree)
| └─ NO → Continue
|
├─ Does skill install packages or modify files?
| └─ YES → Medium Risk (Docker)
| └─ NO → Continue
|
├─ Does skill modify system configs or use sudo?
| └─ YES → High Risk (VM)
| └─ NO → Continue
|
└─ Is skill from untrusted source?
└─ YES → High Risk (VM)
└─ NO → Medium Risk (Docker)
```
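The tree reduces to a small function. A sketch that takes yes/no answers to the four questions, in the same order as above:

```bash
# choose_mode READS_ONLY INSTALLS_OR_MODIFIES SYSTEM_OR_SUDO UNTRUSTED
# Each argument is "yes" or "no"; output mirrors the decision tree.
choose_mode() {
  if   [ "$1" = yes ]; then echo "low (worktree)"
  elif [ "$2" = yes ]; then echo "medium (docker)"
  elif [ "$3" = yes ]; then echo "high (vm)"
  elif [ "$4" = yes ]; then echo "high (vm)"
  else echo "medium (docker)"
  fi
}
```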
## Example Assessments
### Example 1: "code-formatter"
**Description:** Formats JavaScript/TypeScript files using prettier
**Analysis:**
- Reads files: Yes (score: +1)
- Writes files: Yes (score: +3)
- System commands: No
- Dependencies: prettier (npm package) (score: +3)
- File paths: Current directory only
**Total Score:** 7
**Risk Level:** Medium → Docker
**Reasoning:** Modifies files but limited to project directory. Docker provides adequate isolation.
### Example 2: "log-analyzer"
**Description:** Parses log files and generates HTML report
**Analysis:**
- Reads files: Yes (score: +1)
- Writes files: Yes (HTML report) (score: +3)
- System commands: No
- Dependencies: None
- File paths: Current directory + /tmp for temp files (score: +2)
**Total Score:** 6
**Risk Level:** Medium → Docker
**Reasoning:** Safe operations but creates files. Docker ensures clean testing.
### Example 3: "system-auditor"
**Description:** Audits system security configuration
**Analysis:**
- Reads files: Yes, including /etc/ (score: +1 + 8)
- System commands: Runs systemctl, checks services (score: +10)
- Dependencies: System tools
- File paths: /etc/, /var/log/ (score: +8)
**Total Score:** 27
**Risk Level:** High → VM
**Reasoning:** Accesses sensitive system directories and uses system commands. VM required.
### Example 4: "markdown-linter"
**Description:** Checks markdown files for style violations
**Analysis:**
- Reads files: Yes (score: +1)
- Writes files: No (only stdout)
- System commands: No
- Dependencies: None
- File paths: Current directory only
**Total Score:** 1
**Risk Level:** Low → Git Worktree
**Reasoning:** Pure read-only analysis. Worktree is sufficient and fast.
---
**Remember:** When in doubt, choose higher isolation. It's better to be safe than to clean up a compromised system. Speed is secondary to security.

# Side Effect Detection Checklist
## Overview
This checklist helps identify all side effects caused by skill execution. Side effects are any changes to the system state beyond the skill's primary output. Proper detection ensures skills are well-behaved and clean up after themselves.
## Why Side Effects Matter
**Portability:** Skills with untracked side effects may not work for other users
**Cleanliness:** Leftover files and processes waste resources
**Security:** Unexpected system modifications are security risks
**Documentation:** Users need to know what a skill changes
## Categories of Side Effects
## 1. Filesystem Changes
### Files Created
**What to Check:**
- Files in skill directory
- Files in /tmp/ or /var/tmp/
- Files in user home directory (~/)
- Files in system directories (/usr/local/, /opt/)
- Hidden files (.*) and cache directories (.cache/)
- Lock files (.lock, .pid)
**How to Detect:**
```bash
# Before execution
find /path -type f > /tmp/before-files.txt
# After execution
find /path -type f > /tmp/after-files.txt
# Compare
diff /tmp/before-files.txt /tmp/after-files.txt | grep "^>" | sed 's/^> //'
```
**Expected Behavior:**
- ✅ Temporary files in /tmp cleaned up before exit
- ✅ Output files in current directory or specified location
- ✅ Cache files in ~/.cache/skill-name/ (acceptable)
- ❌ Random files scattered across filesystem
- ❌ Files in system directories without explicit permission
**Severity:**
- **LOW**: Cache files in proper location
- **MEDIUM**: Temp files not cleaned up
- **HIGH**: Files in system directories
- **CRITICAL**: Files overwriting existing user data
### Files Modified
**What to Check:**
- Project files (package.json, tsconfig.json, etc.)
- Configuration files (.env, .config/)
- System configs (/etc/*)
- User configs (~/.bashrc, ~/.zshrc)
- Git repository files (.git/)
**How to Detect:**
```bash
# Take checksums before
find /path -type f -exec md5sum {} \; > /tmp/before-checksums.txt
# After execution
find /path -type f -exec md5sum {} \; > /tmp/after-checksums.txt
# Find modified files
diff /tmp/before-checksums.txt /tmp/after-checksums.txt
```
**Expected Behavior:**
- ✅ Only files explicitly in skill's scope modified
- ✅ Backup created before modifying important files
- ✅ Modifications clearly documented in output
- ❌ Configuration files modified without notice
- ❌ Git repository modified unexpectedly
- ❌ System files changed
**Severity:**
- **LOW**: Intended file modifications (skill's purpose)
- **MEDIUM**: Unintended project file changes
- **HIGH**: User config modifications without consent
- **CRITICAL**: System file modifications
### Files Deleted
**What to Check:**
- Files in skill scope (expected deletions)
- Temp files created by skill
- User files outside skill scope
- System files
**How to Detect:**
```bash
# Compare before/after file lists
diff /tmp/before-files.txt /tmp/after-files.txt | grep "^<" | sed 's/^< //'
```
**Expected Behavior:**
- ✅ Only temporary files created by skill deleted
- ✅ Deletions are part of skill's documented purpose
- ❌ User files deleted without explicit permission
- ❌ Project files deleted accidentally
- ❌ System files deleted
**Severity:**
- **LOW**: Skill's own temp files deleted (cleanup)
- **MEDIUM**: Unexpected file deletions in project
- **HIGH**: User files deleted
- **CRITICAL**: System files or important data deleted
### Directory Changes
**What to Check:**
- New directories created
- Working directory changed
- Directories removed
**How to Detect:**
```bash
# List directories before/after
find /path -type d > /tmp/before-dirs.txt
find /path -type d > /tmp/after-dirs.txt
diff /tmp/before-dirs.txt /tmp/after-dirs.txt
```
**Expected Behavior:**
- ✅ Directories created for skill output
- ✅ Temp directories in /tmp
- ✅ Working directory restored after operations
- ❌ Empty directories left behind
- ❌ Directories created in unexpected locations
## 2. Process Management
### Processes Created
**What to Check:**
- Foreground processes (should complete)
- Background processes (daemons, services)
- Child processes (spawned by skill)
- Zombie processes
**How to Detect:**
```bash
# Before execution
ps aux > /tmp/before-processes.txt
# After execution (wait 30 seconds)
sleep 30
ps aux > /tmp/after-processes.txt
# Find new processes
diff /tmp/before-processes.txt /tmp/after-processes.txt | grep "^>"
```
**Expected Behavior:**
- ✅ All skill processes complete and exit
- ✅ No orphaned child processes
- ✅ Background services documented if needed
- ❌ Processes still running after skill exits
- ❌ Zombie processes
- ❌ High CPU/memory usage processes
**Severity:**
- **LOW**: Short-lived child processes that exit cleanly
- **MEDIUM**: Background processes that should have been stopped
- **HIGH**: Orphaned processes consuming resources
- **CRITICAL**: Runaway processes (infinite loops, memory leaks)
### Process Resource Usage
**What to Check:**
- CPU usage during and after execution
- Memory consumption
- Disk I/O
- Network I/O
**How to Detect:**
```bash
# Monitor during execution
top -b -n 1 > /tmp/resource-usage.txt
# Or use htop, ps aux, etc.
```
**Expected Behavior:**
- ✅ Reasonable resource usage for task
- ✅ Resources released after completion
- ❌ 100% CPU for extended time
- ❌ Memory leaks (growing usage)
- ❌ Excessive disk I/O
**Severity:**
- **LOW**: Temporary spike during execution
- **MEDIUM**: Higher than expected but acceptable
- **HIGH**: Excessive usage (> 80% CPU, > 1GB RAM)
- **CRITICAL**: Resource exhaustion (OOM, disk full)
## 3. System Configuration
### Environment Variables
**What to Check:**
- New environment variables set
- Modified PATH, HOME, etc.
- Shell configuration changes
**How to Detect:**
```bash
# Before
env | sort > /tmp/before-env.txt
# After
env | sort > /tmp/after-env.txt
# Compare
diff /tmp/before-env.txt /tmp/after-env.txt
```
**Expected Behavior:**
- ✅ No permanent environment changes
- ✅ Temporary env vars for skill only
- ❌ PATH modified globally
- ❌ System env vars changed
**Severity:**
- **LOW**: Temporary env vars in skill scope
- **MEDIUM**: PATH modified in current shell
- **HIGH**: .bashrc/.zshrc modified
- **CRITICAL**: System-wide env changes
### System Services
**What to Check:**
- Systemd services started
- Cron jobs created
- Launch agents/daemons (macOS)
**How to Detect:**
```bash
# Linux
systemctl list-units --type=service > /tmp/before-services.txt
# After
systemctl list-units --type=service > /tmp/after-services.txt
diff /tmp/before-services.txt /tmp/after-services.txt
# Cron jobs
crontab -l > /tmp/before-cron.txt
# After
crontab -l > /tmp/after-cron.txt
```
**Expected Behavior:**
- ✅ No services unless explicitly documented
- ✅ Services stopped after skill exits
- ❌ Services left running
- ❌ Cron jobs created without consent
**Severity:**
- **MEDIUM**: Services that should have been stopped
- **HIGH**: Unexpected service installations
- **CRITICAL**: System services modified
### Package Installations
**What to Check:**
- NPM packages (global)
- Python packages (pip)
- System packages (apt, brew)
- Ruby gems, Go modules, etc.
**How to Detect:**
```bash
# NPM global packages
npm list -g --depth=0 > /tmp/before-npm.txt
# After
npm list -g --depth=0 > /tmp/after-npm.txt
diff /tmp/before-npm.txt /tmp/after-npm.txt
# System packages (Debian/Ubuntu)
dpkg -l > /tmp/before-packages.txt
# After
dpkg -l > /tmp/after-packages.txt
```
**Expected Behavior:**
- ✅ All dependencies documented in README
- ✅ Local installations (in project directory)
- ❌ Global package installations without notice
- ❌ System package changes
**Severity:**
- **LOW**: Local project dependencies
- **MEDIUM**: Global NPM packages (if documented)
- **HIGH**: System packages installed
- **CRITICAL**: Conflicting package versions
## 4. Network Activity
### Connections Established
**What to Check:**
- HTTP/HTTPS requests
- WebSocket connections
- Database connections
- SSH connections
**How to Detect:**
```bash
# Monitor network during execution
# macOS
lsof -i -n -P | grep <skill-process>
# Linux
netstat -tupn | grep <skill-process>
# Or use tcpdump, wireshark for detailed analysis
```
**Expected Behavior:**
- ✅ All network requests documented
- ✅ HTTPS used for sensitive data
- ✅ Connections properly closed
- ❌ Unexpected outbound connections
- ❌ Data sent to unknown servers
- ❌ Connections left open
**Severity:**
- **LOW**: Documented API calls (HTTPS)
- **MEDIUM**: HTTP requests (not HTTPS)
- **HIGH**: Unexpected network destinations
- **CRITICAL**: Data exfiltration attempts
### Data Transmitted
**What to Check:**
- API payloads
- File uploads/downloads
- Metrics/telemetry data
**Expected Behavior:**
- ✅ Clear documentation of what's sent
- ✅ User consent for data transmission
- ✅ No sensitive data in plaintext
- ❌ Telemetry without consent
- ❌ Credentials sent over HTTP
## 5. Database & State
### Database Changes
**What to Check:**
- Tables created/dropped
- Records inserted/updated/deleted
- Schema migrations
- Indexes created
**How to Detect:**
```sql
-- Before (SQLite example)
SELECT * FROM sqlite_master WHERE type='table';
-- After
SELECT * FROM sqlite_master WHERE type='table';
-- Record counts
SELECT COUNT(*) FROM each_table;
```
**Expected Behavior:**
- ✅ Changes are part of skill's purpose
- ✅ Backup created before modifications
- ✅ Transactions used (rollback on error)
- ❌ Unexpected table drops
- ❌ Data loss without backup
- ❌ Schema changes without migration docs
### Cache & Session State
**What to Check:**
- Redis/Memcached keys
- Session files
- Browser storage (if skill uses web UI)
**Expected Behavior:**
- ✅ Cache properly namespaced
- ✅ Expired sessions cleaned up
- ❌ Cache pollution
- ❌ Stale session files
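For file-backed sessions, stale entries are easy to spot by modification time. A sketch, assuming sessions live in a single directory with a `.session` suffix:

```bash
# find_stale_sessions DIR — list session files untouched for more than a day.
find_stale_sessions() {
  find "$1" -name '*.session' -mtime +1 2>/dev/null
}
```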
## 6. Permissions & Security
### File Permissions
**What to Check:**
- File permission changes (chmod)
- Ownership changes (chown)
- ACL modifications
**How to Detect:**
```bash
# Before
ls -la /path > /tmp/before-perms.txt
# After
ls -la /path > /tmp/after-perms.txt
diff /tmp/before-perms.txt /tmp/after-perms.txt
```
**Expected Behavior:**
- ✅ Appropriate permissions for created files
- ✅ No overly permissive files (777)
- ❌ Permissions changed on existing files
- ❌ World-writable files created
**Severity:**
- **MEDIUM**: Overly restrictive permissions
- **HIGH**: Overly permissive permissions (777)
- **CRITICAL**: System file permission changes
### Security Credentials
**What to Check:**
- API keys in files or logs
- Passwords in plaintext
- Certificates/keys created
- SSH keys modified
**Expected Behavior:**
- ✅ Credentials stored securely (keychain, vault)
- ✅ No credentials in logs or temp files
- ❌ API keys in plaintext files
- ❌ Passwords in shell history
- ❌ Private keys with wrong permissions
**Severity:**
- **HIGH**: Credentials in files
- **CRITICAL**: Credentials exposed to other users
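A naive plaintext scan catches the worst offenders before release. A sketch with illustrative patterns only; a dedicated scanner is stricter and should be preferred for real audits:

```bash
# scan_secrets DIR — flag lines that look like hardcoded credentials.
# -I skips binaries, -i is case-insensitive, -n prints line numbers.
scan_secrets() {
  grep -rIinE '(api[_-]?key|password|secret|token)[[:space:]]*[:=]' "$1" 2>/dev/null
}
```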
## Automated Detection Script
```bash
#!/bin/bash
# side-effect-detector.sh
BEFORE_DIR="/tmp/skill-test-before"
AFTER_DIR="/tmp/skill-test-after"
mkdir -p "$BEFORE_DIR" "$AFTER_DIR"
# Capture before state
capture_state() {
local DIR="$1"
find /tmp -type f > "$DIR/tmp-files.txt"
ps aux > "$DIR/processes.txt"
env | sort > "$DIR/env.txt"
npm list -g --depth=0 > "$DIR/npm-global.txt" 2>/dev/null
netstat -tupn > "$DIR/network.txt" 2>/dev/null
# Add more as needed
}
# Before
capture_state "$BEFORE_DIR"
# Run skill
echo "Execute skill now..."
read -p "Press enter when skill completes..."
# After
capture_state "$AFTER_DIR"
# Compare
echo "=== Side Effects Detected ==="
echo ""
echo "Files in /tmp:"
diff "$BEFORE_DIR/tmp-files.txt" "$AFTER_DIR/tmp-files.txt" | grep "^>" | wc -l
echo "Processes:"
diff "$BEFORE_DIR/processes.txt" "$AFTER_DIR/processes.txt" | grep "^>" | head -5
echo "Environment variables:"
diff "$BEFORE_DIR/env.txt" "$AFTER_DIR/env.txt"
echo "NPM global packages:"
diff "$BEFORE_DIR/npm-global.txt" "$AFTER_DIR/npm-global.txt"
# Detailed reports
echo ""
echo "Full reports in: $BEFORE_DIR and $AFTER_DIR"
```
## Reporting Template
```markdown
## Side Effects Report
### Filesystem Changes
- **Files Created**: X files
- /tmp/skill-temp-123.log (5KB)
- ~/.cache/skill-name/data.json (15KB)
- **Files Modified**: Y files
- package.json (version updated)
- **Files Deleted**: Z files
- /tmp/old-cache.json
### Process Management
- **Processes Created**: N
- **Orphaned Processes**: M (list if > 0)
- **Resource Usage**: Peak 45% CPU, 128MB RAM
### System Configuration
- **Env Vars Changed**: None
- **Services Started**: None
- **Packages Installed**: jq (1.6)
### Network Activity
- **Connections**: 3 HTTPS requests to api.example.com
- **Data Transmitted**: 1.2KB (API calls)
### Database Changes
- **Tables**: 1 created (skill_cache)
- **Records**: 15 inserted
### Security
- **Permissions**: All files 644 (appropriate)
- **Credentials**: No sensitive data detected
### Overall Assessment
✅ Cleanup: Mostly clean (3 temp files remaining)
⚠️ Documentation: Missing jq dependency in README
✅ Security: No issues
```
---
**Remember:** The goal is not zero side effects (that's impossible for useful skills), but **documented, intentional, and cleaned-up** side effects. Every side effect should be either part of the skill's purpose or properly cleaned up on exit.

# Mode 1: Git Worktree Isolation
## When to Use
**Best for:**
- Read-only skills or skills with minimal file operations
- Quick validation during development
- Skills that don't require system package installation
- Testing iterations where speed matters
**Not suitable for:**
- Skills that install system packages (npm install, apt-get, brew, etc.)
- Skills that modify system configurations
- Skills that require a clean Node.js environment
**Risk Level**: Low-risk skills only
## Advantages
- ⚡ **Fast**: Creates a worktree in seconds
- 💾 **Efficient**: Shares git history, minimal disk space
- 🔄 **Repeatable**: Easy to create, test, and destroy
- 🛠️ **Familiar**: Same git tools you already know
## Limitations
- ❌ Shares system packages (node_modules, global npm packages)
- ❌ Shares environment variables and configs
- ❌ Same OS user and permissions
- ❌ Cannot test system-level dependencies
- ⚠️ Not true isolation - just a separate git checkout
## Prerequisites
1. Must be in a git repository
2. Git worktree feature available (Git 2.5+)
3. Clean working directory (or willing to proceed with uncommitted changes)
4. Sufficient disk space for additional worktree
## Workflow
### Step 1: Validate Environment
```bash
# Check if in git repo
git rev-parse --is-inside-work-tree
# Check for uncommitted changes
git status --porcelain
# Get current repo name
basename $(git rev-parse --show-toplevel)
```
If dirty working directory → warn user but allow proceeding (isolation is separate)
### Step 2: Create Isolation Worktree
**Generate unique branch name:**
```bash
BRANCH_NAME="test-skill-$(date +%s)" # e.g., test-skill-1699876543
```
**Create worktree:**
```bash
WORKTREE_PATH="../$(basename $(pwd))-${BRANCH_NAME}"
git worktree add "$WORKTREE_PATH" -b "$BRANCH_NAME"
```
Example result: `/Users/connor/claude-test-skill-1699876543/`
### Step 3: Copy Skill to Worktree
```bash
# Copy skill directory to worktree's .claude/skills/
cp -r ~/.claude/skills/[skill-name] "$WORKTREE_PATH/.claude/skills/"
# Or if skill is in current repo
cp -r ./skills/[skill-name] "$WORKTREE_PATH/.claude/skills/"
```
**Verify copy:**
```bash
ls -la "$WORKTREE_PATH/.claude/skills/[skill-name]/"
```
### Step 4: Setup Development Environment
**Install dependencies if needed:**
```bash
cd "$WORKTREE_PATH"
# Detect package manager
if [ -f "pnpm-lock.yaml" ]; then
pnpm install
elif [ -f "yarn.lock" ]; then
yarn install
elif [ -f "package-lock.json" ]; then
npm install
fi
```
**Copy environment files (optional):**
```bash
# Only if skill needs .env for testing ($OLDPWD is the original checkout
# after the `cd "$WORKTREE_PATH"` above)
cp "$OLDPWD/.env" "$WORKTREE_PATH/.env"
```
### Step 5: Take "Before" Snapshot
```bash
# List all files in worktree
find "$WORKTREE_PATH" -type f > /tmp/before-files.txt
# List running processes (for comparison later)
ps aux > /tmp/before-processes.txt
# Current disk usage
du -sh "$WORKTREE_PATH" > /tmp/before-disk.txt
```
### Step 6: Execute Skill in Worktree
**Open new Claude Code session in worktree:**
```bash
cd "$WORKTREE_PATH"
claude
```
**Run skill with test trigger:**
- User manually tests skill with trigger phrases
- OR: Use Claude CLI to run skill programmatically (if available)
**Monitor execution:**
- Watch for errors in output
- Note execution time
- Check resource usage
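When driving the test from a wrapper script instead of an interactive session, a small helper captures exit status, wall-clock time, and a log to grep for errors afterwards. A sketch; substitute the actual skill invocation for the command arguments:

```bash
# run_and_time LOGFILE CMD [ARGS...] — run CMD, send output to LOGFILE,
# report exit status and elapsed seconds on stdout.
run_and_time() {
  log="$1"; shift
  start=$(date +%s)
  if "$@" > "$log" 2>&1; then status=0; else status=$?; fi
  echo "exit=$status elapsed=$(( $(date +%s) - start ))s"
  return "$status"
}
```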
### Step 7: Take "After" Snapshot
```bash
# List all files after execution
find "$WORKTREE_PATH" -type f > /tmp/after-files.txt
# Compare before/after
diff /tmp/before-files.txt /tmp/after-files.txt > /tmp/file-changes.txt
# Check for new processes
ps aux > /tmp/after-processes.txt
diff /tmp/before-processes.txt /tmp/after-processes.txt > /tmp/process-changes.txt
# Check disk usage
du -sh "$WORKTREE_PATH" > /tmp/after-disk.txt
```
### Step 8: Analyze Results
**Check for side effects:**
```bash
# Files created
grep "^>" /tmp/file-changes.txt | wc -l
# Files deleted
grep "^<" /tmp/file-changes.txt | wc -l
# New processes related to the skill (filter out expected ones)
diff /tmp/before-processes.txt /tmp/after-processes.txt | grep "^>" | grep -v "ps aux"
```
**Validate cleanup:**
```bash
# Check for leftover temp files
find "$WORKTREE_PATH" -name "*.tmp" -o -name "*.temp" -o -name ".cache"
# Check for orphaned processes still running from the skill
# (the bracket trick keeps grep itself out of the results)
ps aux | grep "[s]kill-name"
```
### Step 9: Generate Report
**Execution Results:**
- ✅ Skill completed successfully / ❌ Skill failed with error
- ⏱️ Execution time: Xs
- 📊 Resource usage: XMB disk, X% CPU
**Side Effects:**
- Files created: [count] (list if < 10)
- Files modified: [count]
- Processes created: [count]
- Temporary files remaining: [count]
**Dependency Analysis:**
- Required tools: [list tools used by skill]
- Hardcoded paths: [list any absolute paths found]
- Environment variables: [list any ENV vars referenced]
### Step 10: Cleanup
**Ask user:**
```
Test complete. Worktree location: $WORKTREE_PATH
Options:
1. Keep worktree for debugging
2. Remove worktree and branch
3. Remove worktree, keep branch
Your choice?
```
**Cleanup commands:**
```bash
# Option 2: Full cleanup
git worktree remove "$WORKTREE_PATH"
git branch -D "$BRANCH_NAME"
# Option 3: Keep branch
git worktree remove "$WORKTREE_PATH"
```
## Interpreting Results
### ✅ **PASS** - Ready for git worktree environments
- Skill completed without errors
- No unexpected file modifications
- No orphaned processes
- No hardcoded paths detected
- Temporary files cleaned up
### ⚠️ **WARNING** - Works but has minor issues
- Skill works but left temporary files
- Uses some hardcoded paths (but non-critical)
- Performance could be improved
- Missing some documentation
### ❌ **FAIL** - Not ready
- Skill crashed or hung
- Requires system packages not installed
- Modifies files outside skill directory without permission
- Creates orphaned processes
- Has critical hardcoded paths
## Common Issues
### Issue: "Skill not found in Claude"
**Cause**: Skill wasn't copied to worktree's .claude/skills/
**Fix**: Verify copy command and path
### Issue: "Permission denied" errors
**Cause**: Skill trying to write to protected directories
**Fix**: Identify problematic paths, suggest using /tmp or skill directory
### Issue: "Command not found"
**Cause**: Skill depends on system tool not installed
**Fix**: Document dependency, suggest adding to skill README
### Issue: Test results different from main directory
**Cause**: Different node_modules or configs
**Fix**: This is expected - worktree shares some state, not true isolation
## Best Practices
1. **Always take before/after snapshots** for accurate comparison
2. **Test multiple times** to ensure consistency
3. **Check temp directories** (`/tmp`, `/var/tmp`) for leftover files
4. **Monitor processes** for at least 30s after skill completes
5. **Document all dependencies** found during testing
6. **Use relative paths** in skill code, never absolute
7. **Cleanup worktrees** regularly to avoid clutter
## Quick Command Reference
```bash
# Create test worktree
git worktree add ../test-branch -b test-branch
# List all worktrees
git worktree list
# Remove worktree
git worktree remove ../test-branch
# Remove worktree and branch
git worktree remove ../test-branch && git branch -D test-branch
# Find temp files created
find /tmp -name "*skill-name*" -mtime -1
```
---
**Remember:** Git worktree provides quick, lightweight isolation but is NOT true isolation. Use for low-risk skills or fast iteration during development. For skills that modify system state, use Docker or VM modes.

# Mode 2: Docker Container Isolation
## Using Docker Helper Library
**RECOMMENDED:** Use the helper library for robust error handling and cleanup.
```bash
source ~/.claude/skills/skill-isolation-tester/lib/docker-helpers.sh
# Set cleanup trap (runs automatically on exit)
trap cleanup_on_exit EXIT
# Pre-flight checks
preflight_check_docker || exit 1
```
The helper library provides:
- Shell command validation (prevents syntax errors)
- Retry logic with exponential backoff
- Automatic cleanup on exit
- Pre-flight Docker environment checks
- Safe build and run functions
See `lib/docker-helpers.sh` for full documentation.
---
## When to Use
**Best for:**
- Skills that install npm/pip packages or system dependencies
- Skills that modify configuration files
- Medium-risk skills that need OS-level isolation
- Testing skills with different Claude Code versions
- Reproducible testing environments
**Not suitable for:**
- Skills that require VM operations or nested virtualization
- Skills that need GUI access (without X11 forwarding)
- Extremely high-risk skills (use VM mode instead)
**Risk Level**: Low- to medium-risk skills
## Advantages
- 🏗️ **True OS Isolation**: Complete filesystem and process separation
- 📦 **Reproducible**: Same environment every time
- 🔒 **Sandboxed**: Limited access to host system
- 🎯 **Precise**: Control exactly what's installed
- 🗑️ **Clean**: Easy to destroy and recreate
## Limitations
- ⏱️ Slower than git worktree (container overhead)
- 💾 Requires disk space for images
- 🐳 Requires Docker installation and running daemon
- ⚙️ More complex setup than worktree
- 🔧 May need volume mounts for file access
## Prerequisites
1. Docker installed and running (`docker info`)
2. Sufficient disk space (~1GB for base image + skill)
3. Permissions to run Docker commands
4. Internet connection (first time only, to pull images)
## Workflow
### Step 1: Validate Docker Environment
```bash
# Check Docker is installed
command -v docker || { echo "Docker not installed"; exit 1; }
# Check Docker daemon is running
docker info > /dev/null 2>&1 || { echo "Docker daemon not running"; exit 1; }
# Check disk space
docker system df
```
### Step 2: Choose Base Image
**Options:**
1. **claude-code-base** (preferred if available)
- Pre-built image with Claude Code installed
- Fastest startup time
2. **ubuntu:22.04** (fallback)
- Install Claude Code manually
- More control over environment
**Check if custom image exists:**
```bash
docker images | grep claude-code-base
```
### Step 3: Prepare Skill for Container
**Create temporary directory:**
```bash
TEST_DIR="/tmp/skill-test-$(date +%s)"
mkdir -p "$TEST_DIR"
# Copy skill to test directory
cp -r ~/.claude/skills/[skill-name] "$TEST_DIR/"
# Create Dockerfile
cat > "$TEST_DIR/Dockerfile" <<'EOF'
FROM ubuntu:22.04
# Install system dependencies
RUN apt-get update && apt-get install -y \
curl \
git \
nodejs \
npm \
&& rm -rf /var/lib/apt/lists/*
# Install Claude Code (adjust version as needed)
RUN npm install -g @anthropic-ai/claude-code
# Create directory structure
RUN mkdir -p /root/.claude/skills
# Copy skill
COPY [skill-name]/ /root/.claude/skills/[skill-name]/
# Set working directory
WORKDIR /root
# Default command
CMD ["/bin/bash"]
EOF
```
### Step 4: Build Docker Image
```bash
cd "$TEST_DIR"
# Build image with tag
docker build -t skill-test:[skill-name] .
# Verify build succeeded
docker images | grep skill-test
```
**Expected build time:** 2-5 minutes (first time), < 30s (cached)
### Step 5: Take "Before" Snapshot
**Create container (don't start yet):**
```bash
CONTAINER_ID=$(docker create \
--name skill-test-$(date +%s) \
--memory="512m" \
--cpus="1.0" \
skill-test:[skill-name])
echo "Container ID: $CONTAINER_ID"
```
**Snapshot filesystem:**
```bash
docker export $CONTAINER_ID | tar -tf - > /tmp/before-files.txt
```
### Step 6: Run Skill in Container
**Start container interactively:**
```bash
docker start -ai $CONTAINER_ID
```
**Or run with test command:**
```bash
docker run -it \
--name skill-test \
--rm \
--memory="512m" \
--cpus="1.0" \
skill-test:[skill-name] \
bash -c "claude skill run [skill-name] --test"
```
**Monitor execution:**
```bash
# In another terminal, watch resource usage
docker stats $CONTAINER_ID
# Watch logs
docker logs -f $CONTAINER_ID
```
### Step 7: Take "After" Snapshot
**Commit container state:**
```bash
docker commit $CONTAINER_ID skill-test:[skill-name]-after
```
**Export and compare files:**
```bash
# Export after state
docker export $CONTAINER_ID | tar -tf - > /tmp/after-files.txt
# Find differences
diff /tmp/before-files.txt /tmp/after-files.txt > /tmp/file-changes.txt
# Count changes
echo "Files added: $(grep "^>" /tmp/file-changes.txt | wc -l)"
echo "Files removed: $(grep "^<" /tmp/file-changes.txt | wc -l)"
```
**Check for running processes:**
```bash
docker exec $CONTAINER_ID ps aux > /tmp/processes.txt
```
### Step 8: Analyze Results
**Extract skill logs:**
```bash
docker logs $CONTAINER_ID > /tmp/skill-execution.log
# Check for errors
grep -i "error\|fail\|exception" /tmp/skill-execution.log
```
**Check resource usage:**
```bash
docker stats --no-stream $CONTAINER_ID
```
**Inspect filesystem changes:**
```bash
# List files in skill directory
docker exec $CONTAINER_ID find /root/.claude/skills/[skill-name] -type f
# Check temp directories
docker exec $CONTAINER_ID find /tmp -name "*skill*" -o -name "*.tmp"
# Check for leftover processes
docker exec $CONTAINER_ID ps aux | grep -v "ps\|bash"
```
**Analyze dependencies:**
```bash
# Check what packages were installed
docker diff $CONTAINER_ID | grep -E "^A /usr|^A /var/lib"
# Check what commands were executed
docker logs $CONTAINER_ID | grep -E "npm install|apt-get|pip install"
```
### Step 9: Generate Report
**Execution Status:**
```markdown
## Execution Results
**Container**: $CONTAINER_ID
**Base Image**: ubuntu:22.04
**Status**: [Running/Stopped/Exited]
**Exit Code**: $(docker inspect $CONTAINER_ID --format='{{.State.ExitCode}}')
**Resource Usage**:
- Memory: XMB / 512MB
- CPU: X%
- Execution Time: Xs
```
**Side Effects:**
```markdown
## Filesystem Changes
Files added: X
Files modified: X
Files deleted: X
**Significant changes:**
- /tmp/skill-temp-xyz.log (5KB)
- /root/.claude/cache/skill-data.json (15KB)
```
**Dependency Analysis:**
```markdown
## Dependencies Detected
**System Packages**:
- curl (already present)
- jq (installed by skill)
**NPM Packages**:
- lodash@4.17.21 (installed)
**Hardcoded Paths**:
⚠️ /root/.claude/config (line 45)
→ Use $HOME/.claude/config instead
```
### Step 10: Cleanup
**Ask user:**
```
Test complete. Container: $CONTAINER_ID
Options:
1. Keep container for debugging (docker start -ai $CONTAINER_ID)
2. Stop container, keep image (can restart later)
3. Remove container and image (full cleanup)
Your choice?
```
**Cleanup commands:**
```bash
# Option 2: Stop container
docker stop $CONTAINER_ID
# Option 3: Full cleanup
docker rm -f $CONTAINER_ID
docker rmi skill-test:[skill-name]
docker rmi skill-test:[skill-name]-after
# Cleanup test directory
rm -rf "$TEST_DIR"
```
**Cleanup all test containers:**
```bash
docker ps -a | grep skill-test | awk '{print $1}' | xargs docker rm -f
docker images | grep skill-test | awk '{print $3}' | xargs docker rmi -f
```
## Interpreting Results
### ✅ **PASS** - Production Ready
- Container exited with code 0
- Skill completed successfully
- No excessive resource usage
- All dependencies documented
- No orphaned processes
- Temp files in acceptable locations (/tmp only)
### ⚠️ **WARNING** - Needs Improvement
- Exit code 0 but warnings in logs
- Higher than expected resource usage
- Some undocumented dependencies
- Minor cleanup issues
### ❌ **FAIL** - Not Ready
- Container exited with non-zero code
- Skill crashed or hung
- Excessive resource usage (> 512MB memory)
- Attempted to access outside container
- Critical dependencies not documented
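These three outcomes can be collapsed into a small grading helper. The thresholds below mirror the criteria above (non-zero exit or more than 512MB memory fails; warnings downgrade a pass) and are illustrative, not normative:

```bash
# Map exit code, peak memory (MB), and warning count to a verdict,
# following the PASS/WARNING/FAIL criteria above.
grade_result() {
  local exit_code="$1" peak_mem_mb="$2" warnings="$3"
  if [ "$exit_code" -ne 0 ] || [ "$peak_mem_mb" -gt 512 ]; then
    echo "FAIL"
  elif [ "$warnings" -gt 0 ]; then
    echo "WARNING"
  else
    echo "PASS"
  fi
}

grade_result 0 128 0   # clean run -> PASS
grade_result 0 300 2   # succeeded, but warnings in logs -> WARNING
grade_result 1 128 0   # non-zero exit -> FAIL
```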
## Common Issues
### Issue: "Docker daemon not running"
**Fix**:
```bash
# macOS
open -a Docker
# Linux
sudo systemctl start docker
```
### Issue: "Permission denied" when building image
**Cause**: User not in docker group
**Fix**:
```bash
# Add user to docker group
sudo usermod -aG docker $USER
# Logout/login or run:
newgrp docker
```
### Issue: "No space left on device"
**Cause**: Docker disk space full
**Fix**:
```bash
# Clean up old images and containers
docker system prune -a
# Check space
docker system df
```
### Issue: Skill requires GUI
**Cause**: Skill opens browser or displays graphics
**Fix**: Add X11 forwarding, or mark the skill as requiring a GUI
## Advanced Techniques
### Volume Mounts for Live Testing
```bash
# Mount skill directory for live editing
docker run -it \
-v ~/.claude/skills/[skill-name]:/root/.claude/skills/[skill-name] \
skill-test:[skill-name]
```
### Custom Network Settings
```bash
# Isolated network (no internet)
docker run -it --network=none skill-test:[skill-name]
# Monitor network traffic
docker run -it --cap-add=NET_ADMIN skill-test:[skill-name]
```
### Multi-Stage Testing
```bash
# Test with different Node versions
docker build -t skill-test:node16 --build-arg NODE_VERSION=16 .
docker build -t skill-test:node18 --build-arg NODE_VERSION=18 .
docker build -t skill-test:node20 --build-arg NODE_VERSION=20 .
```
## Best Practices
1. **Always set resource limits** (`--memory`, `--cpus`) to prevent runaway processes
2. **Use `--rm` flag** for auto-cleanup in simple tests
3. **Tag images clearly** with skill name and version
4. **Cache base images** to speed up subsequent tests
5. **Export test results** before removing containers
6. **Test with minimal permissions** first, add as needed
7. **Document all APT/NPM/PIP installs** found during testing
## Quick Command Reference
```bash
# Build test image
docker build -t skill-test:my-skill .
# Run with auto-cleanup
docker run -it --rm skill-test:my-skill
# Run with resource limits
docker run -it --memory="512m" --cpus="1.0" skill-test:my-skill
# Check container status
docker ps -a | grep skill-test
# View container logs
docker logs <container-id>
# Execute command in running container
docker exec <container-id> <command>
# Stop and remove all test containers
docker ps -a | grep skill-test | awk '{print $1}' | xargs docker rm -f
# Remove all test images
docker images | grep skill-test | awk '{print $3}' | xargs docker rmi
```
---
**Remember:** Docker provides strong isolation with reproducible environments. Use for skills that install packages or modify system files. For highest security, use VM mode instead.

# Mode 3: VM (Virtual Machine) Isolation
## When to Use
**Best for:**
- High-risk skills that modify system configurations
- Skills that require kernel modules or system services
- Testing skills that interact with VMs themselves
- Maximum isolation and security
- Skills from untrusted sources
**Not suitable for:**
- Quick iteration during development (too slow)
- Skills that are obviously safe and read-only
- Situations where speed is more important than isolation
**Risk Level**: Medium- to high-risk skills
## Advantages
- 🔒 **Complete Isolation**: Separate kernel, OS, and all resources
- 🛡️ **Maximum Security**: Host system is completely protected
- 🖥️ **Real OS Environment**: Test on actual Linux/macOS distributions
- 📸 **Snapshots**: Easy rollback to clean state
- 🧪 **Destructive Testing**: Safe to test potentially dangerous operations
## Limitations
- 🐌 **Slow**: Minutes to provision, slower execution
- 💾 **Disk Space**: 10-20GB per VM
- 💰 **Resource Intensive**: Requires significant RAM and CPU
- 🔧 **Complex Setup**: More moving parts to configure
- ⏱️ **Longer Feedback Loop**: Not ideal for rapid iteration
## Prerequisites
1. Virtualization software installed:
- **macOS**: UTM, Parallels, or VMware Fusion
- **Linux**: QEMU/KVM, VirtualBox, or virt-manager
- **Windows**: VirtualBox, Hyper-V, or VMware Workstation
2. Base VM image or ISO:
- Ubuntu 22.04 LTS (recommended)
- Debian 12
- Fedora 39
3. System resources:
- 8GB+ host RAM (allocate 2-4GB to VM)
- 20GB+ disk space
- CPU virtualization enabled (VT-x/AMD-V)
4. Command-line tools:
- **macOS with UTM**: `utmctl` or use UI
- **Linux**: `virsh` (libvirt) or `vboxmanage` (VirtualBox)
- **Multipass**: `multipass` (cross-platform, recommended)
## Recommended: Use Multipass
Multipass is the easiest option for cross-platform VM management:
```bash
# Install Multipass
# macOS:
brew install multipass
# Linux:
sudo snap install multipass
# Windows:
# Download from https://multipass.run/
```
## Workflow
### Step 1: Validate Virtualization Environment
```bash
# Check virtualization is enabled (Linux)
grep -E 'vmx|svm' /proc/cpuinfo
# Check Multipass is installed
command -v multipass || { echo "Install Multipass"; exit 1; }
# Check available resources
multipass list || echo "First time setup needed"
```
### Step 2: Create Base VM
**Launch clean Ubuntu VM:**
```bash
VM_NAME="skill-test-$(date +%s)"
# Launch VM with Multipass
multipass launch \
--name "$VM_NAME" \
--cpus 2 \
--memory 2G \
--disk 10G \
22.04
# Wait for VM to be ready
multipass exec "$VM_NAME" -- cloud-init status --wait
```
**Or use UTM (macOS GUI):**
1. Download Ubuntu 22.04 ARM64 ISO
2. Create new VM with 2GB RAM, 10GB disk
3. Install Ubuntu and setup user
4. Note VM name for scripts
**Or use virsh (Linux CLI):**
```bash
# Download cloud image
wget https://cloud-images.ubuntu.com/releases/22.04/release/ubuntu-22.04-server-cloudimg-amd64.img
# Create VM
virt-install \
--name "$VM_NAME" \
--memory 2048 \
--vcpus 2 \
--disk ubuntu-22.04-server-cloudimg-amd64.img \
--import \
--os-variant ubuntu22.04
```
### Step 3: Install Claude Code in VM
```bash
# Install system dependencies
multipass exec "$VM_NAME" -- sudo apt-get update
multipass exec "$VM_NAME" -- sudo apt-get install -y \
curl \
git \
nodejs \
npm
# Install Claude Code
multipass exec "$VM_NAME" -- npm install -g @anthropic-ai/claude-code
# Verify installation
multipass exec "$VM_NAME" -- which claude
```
### Step 4: Copy Skill to VM
```bash
# Create directory structure
multipass exec "$VM_NAME" -- mkdir -p /home/ubuntu/.claude/skills
# Copy skill to VM
multipass transfer --recursive \
~/.claude/skills/[skill-name] \
"$VM_NAME":/home/ubuntu/.claude/skills/
# Verify copy
multipass exec "$VM_NAME" -- ls -la /home/ubuntu/.claude/skills/[skill-name]
```
### Step 5: Take VM Snapshot
**With Multipass:**
```bash
# Multipass snapshots (multipass snapshot, v1.13+) require a stopped VM,
# so for a running VM we capture filesystem state instead
multipass exec "$VM_NAME" -- find /home/ubuntu -type f > /tmp/before-files.txt
multipass exec "$VM_NAME" -- dpkg -l > /tmp/before-packages.txt
multipass exec "$VM_NAME" -- ps aux > /tmp/before-processes.txt
```
**With UTM (macOS):**
```bash
# utmctl has no snapshot subcommand; take a snapshot named
# "before-skill-test" through the UTM UI instead
```
**With virsh (Linux):**
```bash
virsh snapshot-create-as "$VM_NAME" before-skill-test "Before skill test"
```
### Step 6: Execute Skill in VM
**Start Claude Code session in VM:**
```bash
# Interactive session
multipass shell "$VM_NAME"
# Then inside VM:
claude
# Run skill with trigger phrase
```
**Or execute non-interactively:**
```bash
# If skill has test command
multipass exec "$VM_NAME" -- \
bash -c "claude skill run [skill-name] --test"
```
**Monitor from host:**
```bash
# Watch resource usage
multipass info "$VM_NAME" --format json | jq '.info[] | {memory_usage, cpu_usage}'
# Tail logs
multipass exec "$VM_NAME" -- tail -f /var/log/syslog
```
### Step 7: Take Post-Execution Snapshot
```bash
# Capture filesystem state
multipass exec "$VM_NAME" -- find /home/ubuntu -type f > /tmp/after-files.txt
multipass exec "$VM_NAME" -- dpkg -l > /tmp/after-packages.txt
multipass exec "$VM_NAME" -- ps aux > /tmp/after-processes.txt
# Compare
diff /tmp/before-files.txt /tmp/after-files.txt > /tmp/file-changes.txt
diff /tmp/before-packages.txt /tmp/after-packages.txt > /tmp/package-changes.txt
diff /tmp/before-processes.txt /tmp/after-processes.txt > /tmp/process-changes.txt
```
**Snapshot VM state:**
```bash
# virsh
virsh snapshot-create-as "$VM_NAME" after-skill-test "After skill test"
# UTM (macOS): take an "after-skill-test" snapshot through the UI
# (utmctl has no snapshot subcommand)
```
### Step 8: Analyze Results
**Extract execution logs:**
```bash
# Copy Claude Code logs from VM
multipass transfer \
"$VM_NAME":/home/ubuntu/.claude/logs/ \
/tmp/skill-test-logs/
# Analyze logs
grep -i "error\|warning\|fail" /tmp/skill-test-logs/*.log
```
**Check filesystem changes:**
```bash
echo "Files added: $(grep ">" /tmp/file-changes.txt | wc -l)"
echo "Files removed: $(grep "<" /tmp/file-changes.txt | wc -l)"
# Check for unexpected modifications
/etc/"">
grep "> /etc/" /tmp/file-changes.txt       # System config changes
grep "> /usr/local/" /tmp/file-changes.txt # Global installs
```
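`comm` on the sorted listings yields the added and removed sets directly, which is less fragile than grepping diff markers; a self-contained sketch with hypothetical listings:

```bash
# Compare two sorted file listings with comm (hypothetical sample data).
before=$(mktemp) && after=$(mktemp)
printf '%s\n' /home/u/a.txt /home/u/b.txt | sort > "$before"
printf '%s\n' /home/u/b.txt /home/u/c.txt /home/u/d.txt | sort > "$after"

added=$(comm -13 "$before" "$after")    # lines only in "after"
removed=$(comm -23 "$before" "$after")  # lines only in "before"

echo "Files added: $(printf '%s\n' "$added" | grep -c .)"
echo "Files removed: $(printf '%s\n' "$removed" | grep -c .)"
rm -f "$before" "$after"
```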
**Check package changes:**
```bash
# List newly installed packages
grep ">" /tmp/package-changes.txt
# Check for removed packages
grep "<" /tmp/package-changes.txt
```
**Check for orphaned processes:**
```bash
# Processes still running after skill completion
grep ">" /tmp/process-changes.txt | grep -v "ps\|grep\|ssh"
```
**System modifications:**
```bash
# Check for systemd services
multipass exec "$VM_NAME" -- systemctl list-units --type=service --state=running
# Check for cron jobs
multipass exec "$VM_NAME" -- crontab -l
# Check for environment modifications
multipass exec "$VM_NAME" -- cat /etc/environment
```
### Step 9: Generate Comprehensive Report
```markdown
# VM Isolation Test Report: [skill-name]
## Environment
**VM Platform**: Multipass / UTM / virsh
**OS**: Ubuntu 22.04 LTS
**VM Name**: $VM_NAME
**Resources**: 2 vCPU, 2GB RAM, 10GB disk
## Execution Results
**Status**: ✅ Completed successfully
**Duration**: 45 seconds
**Exit Code**: 0
## Filesystem Changes
**Files Added**: 12
- `/home/ubuntu/.claude/cache/skill-data.json` (15KB)
- `/tmp/skill-temp-*.log` (3 files, 45KB total)
- `/home/ubuntu/.cache/skill-assets/` (8 files, 120KB)
**Files Modified**: 2
- `/home/ubuntu/.claude/config.json` (updated skill registry)
- `/home/ubuntu/.bash_history` (normal)
**Files Deleted**: 0
## Package Changes
**Installed Packages**: 2
- `jq` (1.6-2.1ubuntu3)
- `tree` (2.0.2-1)
**Removed Packages**: 0
## System Modifications
✅ No systemd services added
✅ No cron jobs created
✅ No environment variables modified
⚠️ Found leftover temp files in /tmp
## Process Analysis
**Orphaned Processes**: 0
**Background Jobs**: 0
**Network Connections**: 0
## Security Assessment
✅ No unauthorized file access attempts
✅ No privilege escalation attempts
✅ No suspicious network activity
✅ All operations within user home directory
## Dependency Analysis
**System Packages Required**:
- `jq` (for JSON processing) - Not documented in README
- `tree` (for directory visualization) - Optional
**NPM Packages Required**: None beyond Claude Code
**Hardcoded Paths Detected**:
⚠️ `/home/ubuntu/.claude/cache` (line 67)
→ Should use `$HOME/.claude/cache` or `~/.claude/cache`
## Recommendations
1. **CRITICAL**: Document `jq` dependency in README.md
2. **HIGH**: Fix hardcoded path on line 67
3. **MEDIUM**: Clean up /tmp files before skill exits
4. **LOW**: Consider making `tree` dependency optional
## Overall Grade: B (READY with minor fixes)
**Portability**: 85/100
**Cleanliness**: 75/100
**Security**: 100/100
**Documentation**: 70/100
**Final Status**: ✅ **APPROVED** for public release after addressing CRITICAL and HIGH priority items
```
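The hardcoded-path findings in such a report can come from a simple scan. A sketch that flags absolute home paths in a script (the sample script and pattern are illustrative only):

```bash
# Flag hardcoded home-directory paths that should use $HOME instead.
script=$(mktemp)
cat > "$script" <<'EOF'
CACHE_DIR=/home/ubuntu/.claude/cache
LOG_DIR="$HOME/.claude/logs"
EOF

# Report line number and content for each hardcoded /home/* or /root/ path.
hits=$(grep -nE '/home/[a-z]+/|/root/' "$script" || true)
echo "$hits"
rm -f "$script"
```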
### Step 10: Cleanup or Preserve
**Ask user:**
```
Test complete. VM: $VM_NAME
Options:
1. Keep VM for manual inspection
Command: multipass shell $VM_NAME
2. Stop VM (can restart later)
Command: multipass stop $VM_NAME
3. Delete VM and snapshots (full cleanup)
Command: multipass delete $VM_NAME && multipass purge
4. Rollback to "before" snapshot and retest
(virsh/UTM only)
Your choice?
```
**Cleanup commands:**
```bash
# Option 2: Stop VM
multipass stop "$VM_NAME"
# Option 3: Full cleanup
multipass delete "$VM_NAME"
multipass purge
# Cleanup temp files
rm -rf /tmp/skill-test-logs
rm /tmp/before-*.txt /tmp/after-*.txt /tmp/*-changes.txt
```
## Interpreting Results
### ✅ **PASS** - Production Ready
- VM still bootable after test
- Skill completed successfully
- No unauthorized system modifications
- All dependencies documented
- No security issues detected
- Clean cleanup (no orphaned resources)
### ⚠️ **WARNING** - Needs Review
- Skill works but left system modifications
- Installed undocumented packages
- Modified system configs (needs user consent)
- Performance issues (high resource usage)
### ❌ **FAIL** - Not Safe
- VM corrupted or unbootable
- Skill crashed or hung indefinitely
- Unauthorized privilege escalation
- Malicious behavior detected
- Critical undocumented dependencies
- Data exfiltration attempts
## Common Issues
### Issue: "Multipass not found"
**Fix**:
```bash
# macOS
brew install multipass
# Linux
sudo snap install multipass
```
### Issue: "Virtualization not enabled"
**Cause**: VT-x/AMD-V disabled in BIOS
**Fix**: Enable virtualization in BIOS/UEFI settings
### Issue: "Failed to launch VM"
**Cause**: Insufficient resources
**Fix**:
```bash
# Reduce VM resources
multipass launch --cpus 1 --memory 1G --disk 5G
```
### Issue: "VM network not working"
**Cause**: Network bridge issues
**Fix**:
```bash
# Restart Multipass daemon
# macOS
sudo launchctl kickstart -k system/com.canonical.multipassd
# Linux
sudo systemctl restart snap.multipass.multipassd
```
### Issue: "Can't copy files to VM"
**Cause**: SSH/sftp issues
**Fix**:
```bash
# Mount host directory instead
multipass mount ~/.claude/skills "$VM_NAME":/mnt/skills
```
## Advanced Techniques
### Automated Testing Pipeline
```bash
#!/bin/bash
# test-skill-vm.sh
SKILL_NAME="$1"
VM_NAME="skill-test-$SKILL_NAME-$(date +%s)"
# Launch VM
multipass launch --name "$VM_NAME" 22.04
# Setup
multipass exec "$VM_NAME" -- bash -c "
sudo apt-get update
sudo apt-get install -y nodejs npm
  npm install -g @anthropic-ai/claude-code
"
# Copy skill
multipass transfer --recursive ~/.claude/skills/$SKILL_NAME "$VM_NAME":/home/ubuntu/.claude/skills/
# Run test
multipass exec "$VM_NAME" -- claude skill test $SKILL_NAME
# Cleanup
multipass delete "$VM_NAME"
multipass purge
```
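The script above deletes the VM only if every step succeeds; an EXIT trap makes cleanup unconditional. A sketch using a placeholder echo where the multipass delete/purge would go:

```bash
# Demonstrate trap-based cleanup: the trap fires even when a step fails.
out=$(
  VM_NAME="skill-test-demo"
  # Placeholder for: multipass delete "$VM_NAME" && multipass purge
  trap 'echo "cleanup: deleting $VM_NAME"' EXIT
  echo "testing $VM_NAME"
  false   # a failing test step; cleanup still runs
)
echo "$out"
```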
### Testing on Multiple OS Versions
```bash
# Test on Ubuntu 20.04, 22.04, and 24.04
for version in 20.04 22.04 24.04; do
VM="skill-test-ubuntu-${version}"
multipass launch --name "$VM" $version
# ... run tests ...
multipass delete "$VM"
done
```
### Network Isolation Testing
```bash
# Create VM without internet access (if supported by hypervisor)
# Then test if skill fails gracefully without network
```
## Best Practices
1. **Always take snapshots** before running skills
2. **Test on clean VMs** - don't reuse VMs between tests
3. **Monitor resource usage** - catch runaway processes
4. **Check system logs** (`/var/log/syslog`) for warnings
5. **Test rollback** - ensure VM can be restored
6. **Document all system dependencies** found
7. **Use minimal VM resources** to catch resource issues
8. **Archive test results** before destroying VMs
## Quick Command Reference
```bash
# Launch VM
multipass launch --name test-vm 22.04
# List VMs
multipass list
# Shell into VM
multipass shell test-vm
# Execute command in VM
multipass exec test-vm -- <command>
# Copy file to VM
multipass transfer local-file test-vm:/remote/path
# Copy file from VM
multipass transfer test-vm:/remote/path local-file
# Stop VM
multipass stop test-vm
# Start VM
multipass start test-vm
# Delete VM
multipass delete test-vm && multipass purge
# VM info
multipass info test-vm
```
---
**Remember:** VM isolation is the gold standard for testing high-risk skills. It's slower but provides complete security and accurate testing of system-level behaviors. Use for skills from untrusted sources or skills that modify system state.

# Skill Isolation Test Report: {{skill_name}}
**Generated**: {{timestamp}}
**Tester**: {{tester_name}}
**Environment**: {{environment}} ({{mode}})
**Duration**: {{duration}}
---
## Executive Summary
**Overall Status**: {{status}}
**Grade**: {{grade}}
**Ready for Release**: {{ready_for_release}}
### Quick Stats
- Execution Status: {{execution_status}}
- Side Effects: {{side_effects_count}} detected
- Dependencies: {{dependencies_count}} found
- Issues: {{issues_high}} HIGH, {{issues_medium}} MEDIUM, {{issues_low}} LOW
---
## Test Environment
**Isolation Mode**: {{mode}}
**Platform**: {{platform}}
**OS**: {{os_version}}
**Resources**: {{resources}}
{{#if mode_specific_details}}
### Mode-Specific Details
{{mode_specific_details}}
{{/if}}
---
## Execution Results
### Status
{{execution_status_icon}} **{{execution_status}}**
### Details
- **Start Time**: {{start_time}}
- **End Time**: {{end_time}}
- **Duration**: {{duration}}
- **Exit Code**: {{exit_code}}
### Output
```
{{skill_output}}
```
{{#if execution_errors}}
### Errors
```
{{execution_errors}}
```
{{/if}}
### Resource Usage
- **Peak CPU**: {{peak_cpu}}%
- **Peak Memory**: {{peak_memory}}
- **Disk I/O**: {{disk_io}}
- **Network**: {{network_usage}}
---
## Side Effects Analysis
### Filesystem Changes
#### Files Created: {{files_created_count}}
{{#each files_created}}
- `{{path}}` ({{size}}){{#if temporary}} - TEMPORARY{{/if}}{{#if cleanup_failed}} ⚠️ Not cleaned up{{/if}}
{{/each}}
{{#if files_created_count_zero}}
✅ No files created
{{/if}}
#### Files Modified: {{files_modified_count}}
{{#each files_modified}}
- `{{path}}`{{#if expected}} - Expected{{else}} ⚠️ Unexpected{{/if}}
{{/each}}
{{#if files_modified_count_zero}}
✅ No files modified
{{/if}}
#### Files Deleted: {{files_deleted_count}}
{{#each files_deleted}}
- `{{path}}`{{#if expected}} - Expected{{else}} ⚠️ Unexpected{{/if}}
{{/each}}
{{#if files_deleted_count_zero}}
✅ No files deleted
{{/if}}
### Process Management
#### Processes Created: {{processes_created_count}}
{{#each processes}}
- PID {{pid}}: `{{command}}`{{#if still_running}} ⚠️ Still running{{/if}}
{{/each}}
{{#if orphaned_processes}}
⚠️ **Orphaned Processes**: {{orphaned_processes_count}}
{{#each orphaned_processes}}
- PID {{pid}}: `{{command}}` ({{runtime}} running)
{{/each}}
{{/if}}
{{#if no_process_issues}}
✅ All processes completed successfully
{{/if}}
### System Configuration
#### Environment Variables
{{#if env_vars_changed}}
{{#each env_vars_changed}}
- `{{name}}`: {{before}} → {{after}}
{{/each}}
{{else}}
✅ No environment variable changes
{{/if}}
#### Services & Daemons
{{#if services_started}}
{{#each services_started}}
- `{{name}}` ({{status}}){{#if undocumented}} ⚠️ Undocumented{{/if}}
{{/each}}
{{else}}
✅ No services started
{{/if}}
#### Package Installations
{{#if packages_installed}}
{{#each packages_installed}}
- `{{name}}` ({{version}}){{#if undocumented}} ⚠️ Not documented{{/if}}
{{/each}}
{{else}}
✅ No packages installed
{{/if}}
### Network Activity
{{#if network_connections}}
**Connections**: {{network_connections_count}}
{{#each network_connections}}
- {{protocol}} to `{{destination}}:{{port}}`{{#if secure}} (HTTPS){{else}} ⚠️ (HTTP){{/if}}
{{/each}}
**Data Transmitted**: {{data_transmitted}}
{{else}}
✅ No network activity detected
{{/if}}
### Database Changes
{{#if database_changes}}
{{#each database_changes}}
- {{type}}: {{description}}
{{/each}}
{{else}}
✅ No database changes
{{/if}}
---
## Dependency Analysis
### System Packages Required
{{#if system_packages}}
{{#each system_packages}}
{{#if documented}}✅{{else}}⚠️{{/if}} `{{name}}`{{#if version}} ({{version}}){{/if}}{{#unless documented}} - **Not documented in README**{{/unless}}
{{/each}}
{{else}}
✅ No system package dependencies
{{/if}}
### Language Packages (npm/pip/gem)
{{#if language_packages}}
{{#each language_packages}}
{{#if documented}}✅{{else}}⚠️{{/if}} `{{name}}@{{version}}`{{#unless documented}} - **Not documented**{{/unless}}
{{/each}}
{{else}}
✅ No language package dependencies
{{/if}}
### Runtime Requirements
{{#if runtime_requirements}}
{{#each runtime_requirements}}
- {{name}}: {{requirement}}{{#if met}}✅{{else}}❌{{/if}}
{{/each}}
{{else}}
✅ No special runtime requirements
{{/if}}
---
## Code Quality Issues
### Hardcoded Paths Detected
{{#if hardcoded_paths}}
{{#each hardcoded_paths}}
⚠️ `{{path}}` in {{file}}:{{line}}
**Recommendation**: Use `$HOME` or relative path
{{/each}}
{{else}}
✅ No hardcoded paths detected
{{/if}}
### Security Concerns
{{#if security_issues}}
{{#each security_issues}}
{{severity_icon}} **{{severity}}**: {{description}}
Location: {{file}}:{{line}}
Recommendation: {{recommendation}}
{{/each}}
{{else}}
✅ No security issues detected
{{/if}}
### Performance Issues
{{#if performance_issues}}
{{#each performance_issues}}
⚠️ {{description}}
{{/each}}
{{else}}
✅ No performance issues detected
{{/if}}
---
## Portability Assessment
### Cross-Platform Compatibility
- **Linux**: {{linux_compatible}}
- **macOS**: {{macos_compatible}}
- **Windows**: {{windows_compatible}}
### Environment Dependencies
{{#if env_dependencies}}
{{#each env_dependencies}}
- {{name}}: {{status}}
{{/each}}
{{else}}
✅ No environment-specific dependencies
{{/if}}
### User-Specific Assumptions
{{#if user_assumptions}}
{{#each user_assumptions}}
⚠️ {{description}}
{{/each}}
{{else}}
✅ No user-specific assumptions
{{/if}}
---
## Issues Summary
### 🔴 HIGH Priority ({{issues_high_count}})
{{#each issues_high}}
{{index}}. **{{title}}**
- Impact: {{impact}}
- Location: {{location}}
- Fix: {{fix_recommendation}}
{{/each}}
{{#if no_high_issues}}
✅ No HIGH priority issues
{{/if}}
### 🟡 MEDIUM Priority ({{issues_medium_count}})
{{#each issues_medium}}
{{index}}. **{{title}}**
- Impact: {{impact}}
- Location: {{location}}
- Fix: {{fix_recommendation}}
{{/each}}
{{#if no_medium_issues}}
✅ No MEDIUM priority issues
{{/if}}
### 🟢 LOW Priority ({{issues_low_count}})
{{#each issues_low}}
{{index}}. **{{title}}**
- Impact: {{impact}}
- Fix: {{fix_recommendation}}
{{/each}}
{{#if no_low_issues}}
✅ No LOW priority issues
{{/if}}
---
## Recommendations
### Required Before Release
{{#each required_fixes}}
{{index}}. {{recommendation}}
{{/each}}
{{#if no_required_fixes}}
✅ No required fixes
{{/if}}
### Suggested Improvements
{{#each suggested_improvements}}
{{index}}. {{recommendation}}
{{/each}}
### Documentation Updates Needed
{{#each documentation_updates}}
- {{item}}
{{/each}}
---
## Scoring Breakdown
| Category | Score | Weight | Weighted Score |
|----------|-------|--------|----------------|
| **Execution** | {{execution_score}}/100 | 25% | {{execution_weighted}} |
| **Cleanliness** | {{cleanliness_score}}/100 | 25% | {{cleanliness_weighted}} |
| **Security** | {{security_score}}/100 | 30% | {{security_weighted}} |
| **Portability** | {{portability_score}}/100 | 10% | {{portability_weighted}} |
| **Documentation** | {{documentation_score}}/100 | 10% | {{documentation_weighted}} |
| **TOTAL** | | | **{{total_score}}/100** |
### Grade: {{grade}}
**Grading Scale:**
- A (90-100): Production ready
- B (80-89): Ready with minor fixes
- C (70-79): Significant improvements needed
- D (60-69): Major issues, not recommended
- F (0-59): Not safe to use
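The weighted total follows directly from the weights in the table; a sketch of the arithmetic with illustrative scores (85/75/100/80/70):

```bash
# Weighted total per the table: execution 25%, cleanliness 25%,
# security 30%, portability 10%, documentation 10%.
total=$(awk 'BEGIN {
  printf "%.0f", 85*0.25 + 75*0.25 + 100*0.30 + 80*0.10 + 70*0.10
}')
echo "Total: $total/100"   # 85/100 -> grade B on the scale above
```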
---
## Test Artifacts
### Snapshots
- Before: `{{snapshot_before_path}}`
- After: `{{snapshot_after_path}}`
### Logs
- Execution log: `{{execution_log_path}}`
- Side effects log: `{{side_effects_log_path}}`
### Isolation Environment
{{#if environment_preserved}}
**Preserved for debugging**
Access instructions:
```bash
{{access_command}}
```
{{else}}
🗑️ **Cleaned up**
{{/if}}
---
## Final Verdict
### Status: {{final_status}}
{{#if approved}}
✅ **APPROVED for public release**
This skill has passed isolation testing with acceptable results. Address HIGH priority issues before release, and consider MEDIUM/LOW priority improvements in future versions.
{{/if}}
{{#if approved_with_fixes}}
⚠️ **APPROVED with required fixes**
This skill will be ready for public release after addressing the {{issues_high_count}} HIGH priority issue(s) listed above. Retest after fixes.
{{/if}}
{{#if not_approved}}
❌ **NOT APPROVED**
This skill has critical issues that must be addressed before public release. Major refactoring or fixes required. Retest after addressing all HIGH priority issues and reviewing MEDIUM priority items.
{{/if}}
### Next Steps
{{#each next_steps}}
{{index}}. {{step}}
{{/each}}
---
**Test Completed**: {{completion_time}}
**Report Version**: 1.0
**Tester**: {{tester_name}}
---
*This report was generated by skill-isolation-tester*

# Skill Test Templates
Production-ready test templates for validating Claude Code skills in isolated environments.
## Overview
These templates provide standardized testing workflows for different skill types. Each template includes:
- Pre-flight environment validation
- Before/after snapshots for comparison
- Comprehensive safety and security checks
- Detailed reporting with pass/fail criteria
- Automatic cleanup on exit (success or failure)
## CI/CD Integration with JSON Output
All test templates support JSON output for integration with CI/CD pipelines. The JSON reporter generates:
- **Structured JSON** - Machine-readable test results
- **JUnit XML** - Compatible with Jenkins, GitLab CI, GitHub Actions
- **Markdown Summary** - Human-readable reports for GitHub Actions
**Enable JSON output:**
```bash
export JSON_ENABLED=true
./test-templates/docker-skill-test-json.sh my-skill
```
**Output files:**
- `test-report.json` - Full structured test data
- `test-report.junit.xml` - JUnit format for CI systems
- `test-report.md` - Markdown summary
**JSON Report Structure:**
```json
{
"test_name": "docker-skill-test",
"skill_name": "my-skill",
"timestamp": "2025-11-02T12:00:00Z",
"status": "passed",
"duration_seconds": 45,
"exit_code": 0,
"metrics": {
"containers_created": 2,
"images_created": 1,
"execution_duration_seconds": 12
},
"issues": [],
"recommendations": []
}
```
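A CI gate usually only needs the `status` field. Where available, `jq -r .status test-report.json` is the robust choice; for a flat report like this, a sed sketch works (the inlined report is a minimal sample):

```bash
# Extract "status" from a flat JSON report; prefer jq when it is installed.
report=$(mktemp)
cat > "$report" <<'EOF'
{ "test_name": "docker-skill-test", "status": "passed", "exit_code": 0 }
EOF

status=$(sed -n 's/.*"status": *"\([^"]*\)".*/\1/p' "$report")
echo "status: $status"
[ "$status" = "passed" ] && echo "CI gate: OK"
rm -f "$report"
```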
**GitHub Actions Integration:**
```yaml
- name: Test Skill
run: |
export JSON_ENABLED=true
./test-templates/docker-skill-test-json.sh my-skill
- name: Upload Test Results
uses: actions/upload-artifact@v3
with:
name: test-results
path: /tmp/skill-test-*/test-report.*
```
See `lib/json-reporter.sh` for full API documentation.
---
## Available Templates
### 1. Docker Skill Test (`docker-skill-test.sh`)
**Use for skills that:**
- Start or manage Docker containers
- Build Docker images
- Work with Docker volumes, networks, or compose files
- Require Docker daemon access
**Features:**
- Tracks Docker resource creation (containers, images, volumes, networks)
- Detects orphaned containers
- Validates cleanup behavior
- Resource limit enforcement
**Usage:**
```bash
chmod +x test-templates/docker-skill-test.sh
./test-templates/docker-skill-test.sh my-docker-skill
```
**Customization:**
Edit the skill execution command on line ~178:
```bash
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
cd /root/.claude/skills/$SKILL_NAME
./skill.sh test-mode # <-- Customize this
"
```
---
### 2. API Skill Test (`api-skill-test.sh`)
**Use for skills that:**
- Make HTTP/HTTPS requests to external APIs
- Require API keys or authentication
- Interact with web services
- Need network access
**Features:**
- Network traffic monitoring
- API call detection and counting
- API key/secret leak detection
- Rate limiting validation
- HTTPS enforcement checking
**Usage:**
```bash
chmod +x test-templates/api-skill-test.sh
./test-templates/api-skill-test.sh my-api-skill
```
**Optional: Enable network capture:**
```bash
# Requires tcpdump and sudo
sudo apt-get install tcpdump # or brew install tcpdump
./test-templates/api-skill-test.sh my-api-skill
```
---
### 3. File Manipulation Skill Test (`file-manipulation-skill-test.sh`)
**Use for skills that:**
- Create, read, update, or delete files
- Modify configuration files
- Generate reports or artifacts
- Perform filesystem operations
**Features:**
- Complete filesystem diff (added/removed/modified files)
- File permission validation
- Sensitive data scanning
- Temp file cleanup verification
- MD5 checksum comparison
**Usage:**
```bash
chmod +x test-templates/file-manipulation-skill-test.sh
./test-templates/file-manipulation-skill-test.sh my-file-skill
```
**Customization:**
Add your own test files to the workspace (lines 54-70):
```bash
cat > "$TEST_DIR/test-workspace/your-file.txt" <<'EOF'
Your test content here
EOF
```
---
### 4. Git Skill Test (`git-skill-test.sh`)
**Use for skills that:**
- Create commits, branches, or tags
- Modify git history or configuration
- Work with git worktrees
- Interact with remote repositories
**Features:**
- Git state comparison (commits, branches, tags)
- Working tree cleanliness validation
- Force operation detection
- History rewriting detection
- Dangling commit detection
**Usage:**
```bash
chmod +x test-templates/git-skill-test.sh
./test-templates/git-skill-test.sh my-git-skill
```
**Customization:**
Modify the test repository setup (lines 59-81) to match your skill's requirements.
---
## Common Usage Patterns
### Basic Test Execution
```bash
# Run test for a specific skill
./test-templates/docker-skill-test.sh my-skill-name
# Keep container for debugging
export SKILL_TEST_KEEP_CONTAINER="true"
./test-templates/docker-skill-test.sh my-skill-name
# Keep images after test
export SKILL_TEST_REMOVE_IMAGES="false"
./test-templates/docker-skill-test.sh my-skill-name
```
### Custom Resource Limits
```bash
# Set custom memory/CPU limits
export SKILL_TEST_MEMORY_LIMIT="1g"
export SKILL_TEST_CPU_LIMIT="2.0"
./test-templates/docker-skill-test.sh my-skill-name
```
### Parallel Testing
```bash
# Test multiple skills in parallel
for skill in skill1 skill2 skill3; do
./test-templates/docker-skill-test.sh "$skill" &
done
wait
echo "All tests complete!"
```
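Note that a bare `wait` discards each job's exit status, so a failed skill can go unnoticed. A sketch that records per-skill results in status files (the loop body is a placeholder for the real test script):

```bash
# Run jobs in parallel and keep per-skill results, so failures are not lost.
# The block body stands in for: ./test-templates/docker-skill-test.sh "$skill"
results=$(mktemp -d)
for skill in skill1 skill2 skill3; do
  {
    if [ "$skill" = "skill2" ]; then st=fail; else st=pass; fi  # simulated result
    echo "$st" > "$results/$skill"
  } &
done
wait

failures=$(cat "$results"/* | grep -c fail)
echo "failures: $failures"
rm -rf "$results"
```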
### CI/CD Integration
```bash
#!/bin/bash
# Exit code 0 = pass, 1 = fail
set -e
SKILLS=(
"skill-creator"
"claude-code-otel-setup"
"playwright-e2e-automation"
)
for skill in "${SKILLS[@]}"; do
echo "Testing $skill..."
./test-templates/docker-skill-test.sh "$skill" || {
echo "$skill failed!"
exit 1
}
done
echo "✅ All skills passed!"
```
## Customizing Templates
### Add Custom Validation
Insert your own checks before the "Generate Test Report" section:
```bash
# ============================================================================
# Custom Validation
# ============================================================================
echo ""
echo "=== Running Custom Checks ==="
# Your custom checks here
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
# Example: Check if specific file exists
test -f /workspace/expected-output.txt || {
echo 'ERROR: Expected output file not found'
exit 1
}
"
```
### Modify Execution Command
Each template has a skill execution section. Customize the command to match your skill's interface:
```bash
# Example: Run skill with arguments
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
cd /root/.claude/skills/$SKILL_NAME
./skill.sh --mode=test --output=/workspace/results
"
# Example: Source skill as library
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
source /root/.claude/skills/$SKILL_NAME/lib.sh
run_skill_tests
"
```
### Add Pre-Test Setup
Insert setup steps after the "Build Test Environment" section:
```bash
# Install additional dependencies
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
apt-get update && apt-get install -y your-package
"
# Set environment variables with -e (an `export` inside one
# `docker exec` session does not persist to later exec calls)
docker exec -e SKILL_CONFIG_PATH=/etc/skill-config.json \
"$SKILL_TEST_CONTAINER_ID" bash -c "echo \"Using config: \$SKILL_CONFIG_PATH\""
```
## Environment Variables
All templates support these environment variables:
| Variable | Default | Description |
|----------|---------|-------------|
| `SKILL_TEST_KEEP_CONTAINER` | `false` | Keep container after test for debugging |
| `SKILL_TEST_REMOVE_IMAGES` | `true` | Remove test images after completion |
| `SKILL_TEST_MEMORY_LIMIT` | `512m` | Container memory limit |
| `SKILL_TEST_CPU_LIMIT` | `1.0` | Container CPU limit (cores) |
| `SKILL_TEST_TEMP_DIR` | `/tmp/skill-test-*` | Temporary directory for test artifacts |
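The templates are assumed to read these with shell default expansion, so an unset variable falls back to the documented default:

```bash
# Unset variables fall back to the documented defaults
unset SKILL_TEST_MEMORY_LIMIT SKILL_TEST_CPU_LIMIT   # clean slate for the demo
MEMORY_LIMIT="${SKILL_TEST_MEMORY_LIMIT:-512m}"
CPU_LIMIT="${SKILL_TEST_CPU_LIMIT:-1.0}"
echo "memory=$MEMORY_LIMIT cpus=$CPU_LIMIT"
```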
## Exit Codes
- `0` - Test passed (skill executed successfully)
- `1` - Test failed (skill execution error or validation failure)
- `>1` - Other errors (environment setup, Docker issues, etc.)
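A small wrapper can translate these codes into readable outcomes; `run_and_report` below is a sketch, not part of the templates:

```bash
# Map an exit code to a one-word outcome
run_and_report() {
    local code=0
    "$@" || code=$?
    case "$code" in
        0) echo "pass" ;;
        1) echo "fail" ;;
        *) echo "error" ;;
    esac
}
run_and_report true                # prints "pass"
run_and_report false               # prints "fail"
run_and_report sh -c 'exit 3'      # prints "error"
```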
## Troubleshooting
### "Docker daemon not running"
```bash
# macOS
open -a Docker
# Linux
sudo systemctl start docker
```
### "Permission denied" errors
```bash
# Add user to docker group
sudo usermod -aG docker $USER
newgrp docker
```
### Container hangs or never exits
```bash
# Set a timeout in your skill execution
timeout 300 ./test-templates/docker-skill-test.sh my-skill
```
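`timeout` exits with code 124 when the deadline fires, which lets you distinguish a hang from an ordinary failure:

```bash
# Exit code 124 means the command was killed by the timeout
code=0
timeout 1 sleep 5 || code=$?
if [ "$code" -eq 124 ]; then
    echo "timed out"
else
    echo "exited with $code"
fi
```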
### Need to inspect failed test
```bash
# Keep container after failure
export SKILL_TEST_KEEP_CONTAINER="true"
./test-templates/docker-skill-test.sh my-skill
# Inspect container
docker start -ai <container-id>
docker logs <container-id>
```
## Best Practices
1. **Run tests before committing** - Catch environment-specific bugs early
2. **Test in clean environment** - Don't rely on local configs or files
3. **Validate cleanup** - Ensure skills don't leave orphaned resources
4. **Check for secrets** - Never commit API keys or sensitive data
5. **Document dependencies** - List all required packages and tools
6. **Use resource limits** - Prevent runaway processes
7. **Review diffs carefully** - Understand all file system changes
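Practice 1 is easy to script. A sketch that picks the skill names out of a commit's changed paths (the `skills/` layout is an assumption; feed it the output of `git diff --cached --name-only`, then loop over `docker-skill-test.sh`):

```bash
# Reduce changed file paths to a unique list of skill names
skills_from_paths() {
    grep -oE '^skills/[^/]+' | sed 's|^skills/||' | sort -u
}
printf 'skills/foo/SKILL.md\nskills/bar/run.sh\nskills/foo/README.md\n' \
    | skills_from_paths
```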
## Contributing
To add a new test template:
1. Copy an existing template as a starting point
2. Customize for your skill type
3. Add comprehensive validation checks
4. Update this README with usage documentation
5. Test your template with at least 3 different skills
## Related Documentation
- `../lib/docker-helpers.sh` - Shared helper functions
- `../modes/mode2-docker.md` - Docker isolation mode documentation
- `../skill.md` - Main skill documentation
## Support
For issues or questions:
- Check the skill logs: `docker logs <container-id>`
- Review test artifacts in `/tmp/skill-test-*/`
- Consult the helper library: `lib/docker-helpers.sh`

#!/bin/bash
# Test Template for API-Calling Skills
# Use this template when testing skills that:
# - Make HTTP/HTTPS requests to external APIs
# - Require API keys or authentication
# - Need network access
# - Interact with web services
set -euo pipefail
# ============================================================================
# Configuration
# ============================================================================
SKILL_NAME="${1:-example-api-skill}"
SKILL_PATH="$HOME/.claude/skills/$SKILL_NAME"
TEST_ID="$(date +%s)"
TEST_DIR="/tmp/skill-test-$TEST_ID"
# ============================================================================
# Load Helper Library
# ============================================================================
HELPER_LIB="$HOME/.claude/skills/skill-isolation-tester/lib/docker-helpers.sh"
if [[ ! -f "$HELPER_LIB" ]]; then
echo "ERROR: Helper library not found: $HELPER_LIB"
exit 1
fi
# shellcheck source=/dev/null
source "$HELPER_LIB"
# ============================================================================
# Setup Cleanup Trap
# ============================================================================
export SKILL_TEST_TEMP_DIR="$TEST_DIR"
export SKILL_TEST_KEEP_CONTAINER="false"
export SKILL_TEST_REMOVE_IMAGES="true"
trap cleanup_on_exit EXIT
# ============================================================================
# Pre-flight Checks
# ============================================================================
echo "=== API Skill Test: $SKILL_NAME ==="
echo "Test ID: $TEST_ID"
echo ""
# Validate skill exists
if [[ ! -d "$SKILL_PATH" ]]; then
echo "ERROR: Skill not found: $SKILL_PATH"
exit 1
fi
# Validate Docker environment
preflight_check_docker || exit 1
# Check internet connectivity
if ! curl -s --max-time 5 https://www.google.com > /dev/null 2>&1; then
echo "⚠ WARNING: No internet connectivity detected"
echo " API skill may fail if it requires external network access"
fi
# ============================================================================
# Build Test Environment
# ============================================================================
echo ""
echo "=== Building Test Environment ==="
mkdir -p "$TEST_DIR"
# Create test Dockerfile
cat > "$TEST_DIR/Dockerfile" <<EOF
FROM ubuntu:22.04
# Install dependencies for API testing
RUN apt-get update && apt-get install -y \\
curl \\
jq \\
ca-certificates \\
&& rm -rf /var/lib/apt/lists/*
# Copy skill under test
COPY skill/ /root/.claude/skills/$SKILL_NAME/
WORKDIR /root
CMD ["/bin/bash"]
EOF
# Copy skill to test directory
cp -r "$SKILL_PATH" "$TEST_DIR/skill/"
# Build test image
safe_docker_build "$TEST_DIR/Dockerfile" "skill-test:$SKILL_NAME" || {
echo "ERROR: Failed to build test image"
exit 1
}
export SKILL_TEST_IMAGE_NAME="skill-test:$SKILL_NAME"
# ============================================================================
# Network Monitoring Setup
# ============================================================================
echo ""
echo "=== Setting Up Network Monitoring ==="
# Create network monitor log
NETWORK_LOG="$TEST_DIR/network-activity.log"
touch "$NETWORK_LOG"
# Start tcpdump in background (if available)
if command -v tcpdump &> /dev/null; then
echo "Starting network capture..."
# sudo -n: fail fast instead of hanging on a password prompt in the background
sudo -n tcpdump -i any -w "$TEST_DIR/network-capture.pcap" 2>/dev/null &
TCPDUMP_PID=$!
echo "tcpdump PID: $TCPDUMP_PID"
else
echo "tcpdump not available - skipping network capture"
TCPDUMP_PID=""
fi
# ============================================================================
# Run Skill in Container
# ============================================================================
echo ""
echo "=== Running Skill in Isolated Container ==="
# Start container with DNS configuration
safe_docker_run "skill-test:$SKILL_NAME" \
--dns 8.8.8.8 \
--dns 8.8.4.4 \
bash -c "sleep infinity" || {
echo "ERROR: Failed to start test container"
exit 1
}
# Execute skill and capture network activity
echo "Executing skill..."
START_TIME=$(date +%s)
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
cd /root/.claude/skills/$SKILL_NAME
# Add your skill execution command here
# Example: ./api-skill.sh --test-mode
echo 'Skill execution placeholder - customize this for your skill'
# Log any curl/wget/http calls made
if command -v curl &> /dev/null; then
echo 'curl is available in container'
fi
if command -v wget &> /dev/null; then
echo 'wget is available in container'
fi
" 2>&1 | tee "$NETWORK_LOG" || {
EXEC_EXIT_CODE=$?
echo "ERROR: Skill execution failed with exit code: $EXEC_EXIT_CODE"
# Stop network capture
if [[ -n "$TCPDUMP_PID" ]]; then
sudo kill "$TCPDUMP_PID" 2>/dev/null || true
fi
exit "$EXEC_EXIT_CODE"
}
END_TIME=$(date +%s)
EXECUTION_TIME=$((END_TIME - START_TIME))
# Stop network capture
if [[ -n "$TCPDUMP_PID" ]]; then
sudo kill "$TCPDUMP_PID" 2>/dev/null || true
echo "Network capture saved to: $TEST_DIR/network-capture.pcap"
fi
# ============================================================================
# Analyze Network Activity
# ============================================================================
echo ""
echo "=== Analyzing Network Activity ==="
# Check for API calls in logs
echo "Searching for HTTP/HTTPS requests..."
API_CALLS=$(grep -iE "http://|https://|curl|wget|GET|POST|PUT|DELETE" "$NETWORK_LOG" || true)
if [[ -n "$API_CALLS" ]]; then
echo "Detected API calls:"
echo "$API_CALLS"
# Extract unique domains
DOMAINS=$(echo "$API_CALLS" | grep -oE "https?://[^/\"]+" | sort -u || true)
if [[ -n "$DOMAINS" ]]; then
echo ""
echo "Unique API endpoints:"
echo "$DOMAINS"
fi
else
echo "No obvious API calls detected in logs"
fi
# Check container network stats
echo ""
echo "Container network statistics:"
docker stats --no-stream --format "table {{.Name}}\t{{.NetIO}}" "$SKILL_TEST_CONTAINER_ID"
# ============================================================================
# Validate API Key Handling
# ============================================================================
echo ""
echo "=== Validating API Key Security ==="
# Check if API keys appear in logs (security concern)
POTENTIAL_KEYS=$(grep -iE "api[-_]?key|token|secret|password|bearer" "$NETWORK_LOG" | grep -v "API_KEY=" || true)
if [[ -n "$POTENTIAL_KEYS" ]]; then
echo "⚠ WARNING: Potential API keys/secrets found in logs:"
echo "$POTENTIAL_KEYS"
echo ""
echo "SECURITY ISSUE: API keys should NOT appear in logs!"
echo " - Use environment variables instead"
echo " - Redact sensitive data in log output"
fi
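# Sketch of the redaction advice above (assumes GNU sed): pipe skill
# output through this before tee'ing it to the log so bearer tokens
# and key=value secrets never reach disk.
redact_secrets() {
sed -E 's/(Bearer |api[-_]?key=|token=|secret=)[A-Za-z0-9._-]+/\1[REDACTED]/gI'
}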
# Check for hardcoded endpoints
HARDCODED_URLS=$(grep -rn "http://" "$SKILL_PATH" 2>/dev/null | grep -v "example.com" || true)
if [[ -n "$HARDCODED_URLS" ]]; then
echo "⚠ WARNING: Hardcoded HTTP URLs found (should use HTTPS):"
echo "$HARDCODED_URLS"
fi
# ============================================================================
# Rate Limiting Check
# ============================================================================
echo ""
echo "=== Checking Rate Limiting Behavior ==="
# Count number of requests made
# Note: grep -c prints 0 on no match (while exiting 1), so do not
# append a fallback echo or the count becomes two lines
REQUEST_COUNT=$(grep -icE "GET|POST|PUT|DELETE" "$NETWORK_LOG" || true)
REQUEST_COUNT=${REQUEST_COUNT:-0}
echo "Total HTTP requests detected: $REQUEST_COUNT"
if [[ $REQUEST_COUNT -gt 100 ]]; then
echo "⚠ WARNING: High number of API requests ($REQUEST_COUNT)"
echo " - Consider implementing rate limiting"
echo " - Use caching to reduce API calls"
echo " - Check for request loops"
fi
# Guard against division by zero on sub-second runs
if [[ $EXECUTION_TIME -gt 0 ]]; then
echo "Requests per second: $((REQUEST_COUNT / EXECUTION_TIME))"
else
echo "Requests per second: n/a (execution completed in under a second)"
fi
# ============================================================================
# Generate Test Report
# ============================================================================
echo ""
echo "=== Test Report ==="
echo ""
CONTAINER_EXIT_CODE=$(get_container_exit_code "$SKILL_TEST_CONTAINER_ID")
if [[ $CONTAINER_EXIT_CODE -eq 0 ]]; then
echo "✅ TEST PASSED"
else
echo "❌ TEST FAILED"
fi
echo ""
echo "Summary:"
echo " - Exit code: $CONTAINER_EXIT_CODE"
echo " - Execution time: ${EXECUTION_TIME}s"
echo " - API requests: $REQUEST_COUNT"
echo " - Network log: $NETWORK_LOG"
echo ""
echo "Security Checklist:"
if [[ -z "$POTENTIAL_KEYS" ]]; then
echo " ✓ No API keys in logs"
else
echo " ✗ API keys found in logs"
fi
if [[ -z "$HARDCODED_URLS" ]]; then
echo " ✓ No hardcoded HTTP URLs"
else
echo " ✗ Hardcoded HTTP URLs found"
fi
if [[ $REQUEST_COUNT -lt 100 ]]; then
echo " ✓ Reasonable request volume"
else
echo " ✗ High request volume"
fi
echo ""
echo "Recommendations:"
echo " - Document all external API dependencies"
echo " - Implement request caching where possible"
echo " - Use exponential backoff for retries"
echo " - Respect API rate limits"
echo " - Use HTTPS for all API calls"
echo " - Never log API keys or secrets"
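# Sketch of the exponential-backoff recommendation (illustration only,
# not wired into this template): retry a command, doubling the delay.
retry_with_backoff() {
local max_attempts=${1:-3}
shift
local delay=1 attempt=1
until "$@"; do
if (( attempt >= max_attempts )); then
return 1
fi
sleep "$delay"
delay=$((delay * 2))
attempt=$((attempt + 1))
done
}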
# Exit with appropriate code
if [[ $CONTAINER_EXIT_CODE -eq 0 ]]; then
exit 0
else
exit 1
fi

#!/bin/bash
# Test Template for Docker-Based Skills with JSON Output
# This is an enhanced version of docker-skill-test.sh with CI/CD integration
set -euo pipefail
# ============================================================================
# Configuration
# ============================================================================
SKILL_NAME="${1:-example-docker-skill}"
SKILL_PATH="$HOME/.claude/skills/$SKILL_NAME"
TEST_ID="$(date +%s)"
TEST_DIR="/tmp/skill-test-$TEST_ID"
# JSON reporting
export JSON_REPORT_FILE="$TEST_DIR/test-report.json"
export JSON_ENABLED="${JSON_ENABLED:-true}"
# ============================================================================
# Load Helper Libraries
# ============================================================================
HELPER_LIB="$HOME/.claude/skills/skill-isolation-tester/lib/docker-helpers.sh"
JSON_LIB="$HOME/.claude/skills/skill-isolation-tester/lib/json-reporter.sh"
if [[ ! -f "$HELPER_LIB" ]]; then
echo "ERROR: Helper library not found: $HELPER_LIB"
exit 1
fi
if [[ ! -f "$JSON_LIB" ]]; then
echo "ERROR: JSON reporter library not found: $JSON_LIB"
exit 1
fi
# shellcheck source=/dev/null
source "$HELPER_LIB"
# shellcheck source=/dev/null
source "$JSON_LIB"
# ============================================================================
# Setup Cleanup Trap
# ============================================================================
export SKILL_TEST_TEMP_DIR="$TEST_DIR"
export SKILL_TEST_KEEP_CONTAINER="false"
export SKILL_TEST_REMOVE_IMAGES="true"
cleanup_and_finalize() {
local exit_code=$?
local end_time=$(date +%s)
# START_TIME may be unset if we exited before baseline measurements
local duration=$((end_time - ${START_TIME:-$end_time}))
# Finalize JSON report
if [[ "$JSON_ENABLED" == "true" ]]; then
json_finalize "$exit_code" "$duration"
export_all_formats "$TEST_DIR/test-report"
fi
# Standard cleanup
cleanup_on_exit
exit "$exit_code"
}
trap cleanup_and_finalize EXIT
# ============================================================================
# Pre-flight Checks
# ============================================================================
echo "=== Docker Skill Test (JSON Mode): $SKILL_NAME ==="
echo "Test ID: $TEST_ID"
echo ""
# Create test directory
mkdir -p "$TEST_DIR"
# Initialize JSON report
if [[ "$JSON_ENABLED" == "true" ]]; then
json_init "docker-skill-test" "$SKILL_NAME"
fi
# Validate skill exists
if [[ ! -d "$SKILL_PATH" ]]; then
echo "ERROR: Skill not found: $SKILL_PATH"
[[ "$JSON_ENABLED" == "true" ]] && json_add_issue "error" "setup" "Skill directory not found: $SKILL_PATH"
exit 1
fi
# Validate Docker environment
if ! preflight_check_docker; then
[[ "$JSON_ENABLED" == "true" ]] && json_add_issue "error" "environment" "Docker pre-flight checks failed"
exit 1
fi
# ============================================================================
# Baseline Measurements (Before)
# ============================================================================
echo ""
echo "=== Taking Baseline Measurements ==="
START_TIME=$(date +%s)
BEFORE_CONTAINERS=$(docker ps -a --format '{{.ID}}' | wc -l)
BEFORE_IMAGES=$(docker images --format '{{.ID}}' | wc -l)
BEFORE_VOLUMES=$(docker volume ls --format '{{.Name}}' | wc -l)
BEFORE_NETWORKS=$(docker network ls --format '{{.ID}}' | wc -l)
echo "Before test:"
echo " Containers: $BEFORE_CONTAINERS"
echo " Images: $BEFORE_IMAGES"
echo " Volumes: $BEFORE_VOLUMES"
echo " Networks: $BEFORE_NETWORKS"
# Record baseline in JSON
if [[ "$JSON_ENABLED" == "true" ]]; then
json_add_metric "baseline_containers" "$BEFORE_CONTAINERS"
json_add_metric "baseline_images" "$BEFORE_IMAGES"
json_add_metric "baseline_volumes" "$BEFORE_VOLUMES"
json_add_metric "baseline_networks" "$BEFORE_NETWORKS"
fi
# ============================================================================
# Build Test Environment
# ============================================================================
echo ""
echo "=== Building Test Environment ==="
# Create test Dockerfile
cat > "$TEST_DIR/Dockerfile" <<EOF
FROM ubuntu:22.04
# Install dependencies
RUN apt-get update && apt-get install -y \\
curl \\
git \\
nodejs \\
npm \\
docker.io \\
&& rm -rf /var/lib/apt/lists/*
# Install Claude Code (mock for testing)
RUN mkdir -p /root/.claude/skills
# Copy skill under test
COPY skill/ /root/.claude/skills/$SKILL_NAME/
WORKDIR /root
CMD ["/bin/bash"]
EOF
# Copy skill to test directory
cp -r "$SKILL_PATH" "$TEST_DIR/skill/"
# Build test image
BUILD_START=$(date +%s)
if ! safe_docker_build "$TEST_DIR/Dockerfile" "skill-test:$SKILL_NAME"; then
echo "ERROR: Failed to build test image"
[[ "$JSON_ENABLED" == "true" ]] && json_add_issue "error" "build" "Docker image build failed"
exit 1
fi
BUILD_END=$(date +%s)
BUILD_DURATION=$((BUILD_END - BUILD_START))
export SKILL_TEST_IMAGE_NAME="skill-test:$SKILL_NAME"
# Record build metrics
if [[ "$JSON_ENABLED" == "true" ]]; then
json_add_metric "build_duration_seconds" "$BUILD_DURATION" "seconds"
fi
# ============================================================================
# Run Skill in Container
# ============================================================================
echo ""
echo "=== Running Skill in Isolated Container ==="
# Start container with Docker socket access
if ! safe_docker_run "skill-test:$SKILL_NAME" \
-v /var/run/docker.sock:/var/run/docker.sock \
bash -c "sleep infinity"; then
echo "ERROR: Failed to start test container"
[[ "$JSON_ENABLED" == "true" ]] && json_add_issue "error" "runtime" "Container failed to start"
exit 1
fi
# Execute skill
echo "Executing skill..."
EXEC_START=$(date +%s)
EXEC_OUTPUT=$(docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
cd /root/.claude/skills/$SKILL_NAME
echo 'Skill execution placeholder - customize this for your skill'
" 2>&1) || {
EXEC_EXIT_CODE=$?
echo "ERROR: Skill execution failed with exit code: $EXEC_EXIT_CODE"
[[ "$JSON_ENABLED" == "true" ]] && json_add_issue "error" "execution" "Skill failed with exit code $EXEC_EXIT_CODE"
exit "$EXEC_EXIT_CODE"
}
echo "$EXEC_OUTPUT"
EXEC_END=$(date +%s)
EXEC_DURATION=$((EXEC_END - EXEC_START))
# Record execution metrics
if [[ "$JSON_ENABLED" == "true" ]]; then
json_add_metric "execution_duration_seconds" "$EXEC_DURATION" "seconds"
fi
# ============================================================================
# Collect Measurements (After)
# ============================================================================
echo ""
echo "=== Collecting Post-Execution Measurements ==="
sleep 2 # Wait for async operations
AFTER_CONTAINERS=$(docker ps -a --format '{{.ID}}' | wc -l)
AFTER_IMAGES=$(docker images --format '{{.ID}}' | wc -l)
AFTER_VOLUMES=$(docker volume ls --format '{{.Name}}' | wc -l)
AFTER_NETWORKS=$(docker network ls --format '{{.ID}}' | wc -l)
CONTAINERS_DELTA=$((AFTER_CONTAINERS - BEFORE_CONTAINERS))
IMAGES_DELTA=$((AFTER_IMAGES - BEFORE_IMAGES))
VOLUMES_DELTA=$((AFTER_VOLUMES - BEFORE_VOLUMES))
NETWORKS_DELTA=$((AFTER_NETWORKS - BEFORE_NETWORKS))
echo "After test:"
echo " Containers: $AFTER_CONTAINERS (delta: $CONTAINERS_DELTA)"
echo " Images: $AFTER_IMAGES (delta: $IMAGES_DELTA)"
echo " Volumes: $AFTER_VOLUMES (delta: $VOLUMES_DELTA)"
echo " Networks: $AFTER_NETWORKS (delta: $NETWORKS_DELTA)"
# Record changes in JSON
if [[ "$JSON_ENABLED" == "true" ]]; then
json_add_metric "containers_created" "$CONTAINERS_DELTA"
json_add_metric "images_created" "$IMAGES_DELTA"
json_add_metric "volumes_created" "$VOLUMES_DELTA"
json_add_metric "networks_created" "$NETWORKS_DELTA"
fi
# ============================================================================
# Validate Cleanup Behavior
# ============================================================================
echo ""
echo "=== Validating Skill Cleanup ==="
# Check for orphaned containers
ORPHANED_CONTAINERS=$(docker ps -a --filter "label=created-by-skill=$SKILL_NAME" --format '{{.ID}}' | wc -l)
if [[ $ORPHANED_CONTAINERS -gt 0 ]]; then
echo "⚠ WARNING: Skill left $ORPHANED_CONTAINERS orphaned container(s)"
if [[ "$JSON_ENABLED" == "true" ]]; then
json_add_issue "warning" "cleanup" "Found $ORPHANED_CONTAINERS orphaned containers"
json_add_recommendation "Cleanup" "Implement automatic container cleanup in skill"
fi
fi
# ============================================================================
# Generate Test Report
# ============================================================================
echo ""
echo "=== Test Report ==="
echo ""
CONTAINER_EXIT_CODE=$(get_container_exit_code "$SKILL_TEST_CONTAINER_ID")
if [[ $CONTAINER_EXIT_CODE -eq 0 ]]; then
echo "✅ TEST PASSED"
else
echo "❌ TEST FAILED"
[[ "$JSON_ENABLED" == "true" ]] && json_add_issue "error" "test-failure" "Container exited with code $CONTAINER_EXIT_CODE"
fi
echo ""
echo "Summary:"
echo " - Exit code: $CONTAINER_EXIT_CODE"
echo " - Build duration: ${BUILD_DURATION}s"
echo " - Execution duration: ${EXEC_DURATION}s"
echo " - Docker resources created: $CONTAINERS_DELTA containers, $IMAGES_DELTA images, $VOLUMES_DELTA volumes, $NETWORKS_DELTA networks"
if [[ "$JSON_ENABLED" == "true" ]]; then
echo ""
echo "JSON reports will be generated at:"
echo " - $TEST_DIR/test-report.json"
echo " - $TEST_DIR/test-report.junit.xml"
echo " - $TEST_DIR/test-report.md"
fi
# Exit with appropriate code (cleanup_and_finalize will handle JSON)
if [[ $CONTAINER_EXIT_CODE -eq 0 ]]; then
exit 0
else
exit 1
fi

#!/bin/bash
# Test Template for Docker-Based Skills
# Use this template when testing skills that:
# - Start Docker containers
# - Build Docker images
# - Manage Docker volumes/networks
# - Require Docker daemon access
set -euo pipefail
# ============================================================================
# Configuration
# ============================================================================
SKILL_NAME="${1:-example-docker-skill}"
SKILL_PATH="$HOME/.claude/skills/$SKILL_NAME"
TEST_ID="$(date +%s)"
TEST_DIR="/tmp/skill-test-$TEST_ID"
# ============================================================================
# Load Helper Library
# ============================================================================
HELPER_LIB="$HOME/.claude/skills/skill-isolation-tester/lib/docker-helpers.sh"
if [[ ! -f "$HELPER_LIB" ]]; then
echo "ERROR: Helper library not found: $HELPER_LIB"
exit 1
fi
# shellcheck source=/dev/null
source "$HELPER_LIB"
# ============================================================================
# Setup Cleanup Trap
# ============================================================================
export SKILL_TEST_TEMP_DIR="$TEST_DIR"
export SKILL_TEST_KEEP_CONTAINER="false"
export SKILL_TEST_REMOVE_IMAGES="true"
trap cleanup_on_exit EXIT
# ============================================================================
# Pre-flight Checks
# ============================================================================
echo "=== Docker Skill Test: $SKILL_NAME ==="
echo "Test ID: $TEST_ID"
echo ""
# Validate skill exists
if [[ ! -d "$SKILL_PATH" ]]; then
echo "ERROR: Skill not found: $SKILL_PATH"
exit 1
fi
# Validate Docker environment
preflight_check_docker || exit 1
# ============================================================================
# Baseline Measurements (Before)
# ============================================================================
echo ""
echo "=== Taking Baseline Measurements ==="
# Count Docker resources before test
BEFORE_CONTAINERS=$(docker ps -a --format '{{.ID}}' | wc -l)
BEFORE_IMAGES=$(docker images --format '{{.ID}}' | wc -l)
BEFORE_VOLUMES=$(docker volume ls --format '{{.Name}}' | wc -l)
BEFORE_NETWORKS=$(docker network ls --format '{{.ID}}' | wc -l)
echo "Before test:"
echo " Containers: $BEFORE_CONTAINERS"
echo " Images: $BEFORE_IMAGES"
echo " Volumes: $BEFORE_VOLUMES"
echo " Networks: $BEFORE_NETWORKS"
# ============================================================================
# Build Test Environment
# ============================================================================
echo ""
echo "=== Building Test Environment ==="
mkdir -p "$TEST_DIR"
# Create test Dockerfile
cat > "$TEST_DIR/Dockerfile" <<EOF
FROM ubuntu:22.04
# Install dependencies
RUN apt-get update && apt-get install -y \\
curl \\
git \\
nodejs \\
npm \\
docker.io \\
&& rm -rf /var/lib/apt/lists/*
# Install Claude Code (mock for testing)
RUN mkdir -p /root/.claude/skills
# Copy skill under test
COPY skill/ /root/.claude/skills/$SKILL_NAME/
WORKDIR /root
CMD ["/bin/bash"]
EOF
# Copy skill to test directory
cp -r "$SKILL_PATH" "$TEST_DIR/skill/"
# Build test image
safe_docker_build "$TEST_DIR/Dockerfile" "skill-test:$SKILL_NAME" || {
echo "ERROR: Failed to build test image"
exit 1
}
export SKILL_TEST_IMAGE_NAME="skill-test:$SKILL_NAME"
# ============================================================================
# Run Skill in Container
# ============================================================================
echo ""
echo "=== Running Skill in Isolated Container ==="
# Start container with Docker socket access (for Docker-in-Docker skills)
safe_docker_run "skill-test:$SKILL_NAME" \
-v /var/run/docker.sock:/var/run/docker.sock \
bash -c "sleep infinity" || {
echo "ERROR: Failed to start test container"
exit 1
}
# Execute skill (customize this command based on your skill's interface)
echo "Executing skill..."
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
cd /root/.claude/skills/$SKILL_NAME
# Add your skill execution command here
# Example: ./skill.sh test-mode
echo 'Skill execution placeholder - customize this for your skill'
" || {
EXEC_EXIT_CODE=$?
echo "ERROR: Skill execution failed with exit code: $EXEC_EXIT_CODE"
exit "$EXEC_EXIT_CODE"
}
# ============================================================================
# Collect Measurements (After)
# ============================================================================
echo ""
echo "=== Collecting Post-Execution Measurements ==="
# Wait for async operations to complete
sleep 2
AFTER_CONTAINERS=$(docker ps -a --format '{{.ID}}' | wc -l)
AFTER_IMAGES=$(docker images --format '{{.ID}}' | wc -l)
AFTER_VOLUMES=$(docker volume ls --format '{{.Name}}' | wc -l)
AFTER_NETWORKS=$(docker network ls --format '{{.ID}}' | wc -l)
echo "After test:"
echo " Containers: $AFTER_CONTAINERS (delta: $((AFTER_CONTAINERS - BEFORE_CONTAINERS)))"
echo " Images: $AFTER_IMAGES (delta: $((AFTER_IMAGES - BEFORE_IMAGES)))"
echo " Volumes: $AFTER_VOLUMES (delta: $((AFTER_VOLUMES - BEFORE_VOLUMES)))"
echo " Networks: $AFTER_NETWORKS (delta: $((AFTER_NETWORKS - BEFORE_NETWORKS)))"
# ============================================================================
# Validate Cleanup Behavior
# ============================================================================
echo ""
echo "=== Validating Skill Cleanup ==="
# Check for orphaned containers
ORPHANED_CONTAINERS=$(docker ps -a --filter "label=created-by-skill=$SKILL_NAME" --format '{{.ID}}' | wc -l)
if [[ $ORPHANED_CONTAINERS -gt 0 ]]; then
echo "⚠ WARNING: Skill left $ORPHANED_CONTAINERS orphaned container(s)"
docker ps -a --filter "label=created-by-skill=$SKILL_NAME"
fi
# Check for unlabeled containers (potential orphans)
SKILL_CONTAINERS=$(docker ps -a --filter "name=$SKILL_NAME" --format '{{.ID}}' | wc -l)
if [[ $SKILL_CONTAINERS -gt 1 ]]; then
echo "⚠ WARNING: Found $SKILL_CONTAINERS containers with skill name pattern"
fi
# ============================================================================
# Generate Test Report
# ============================================================================
echo ""
echo "=== Test Report ==="
echo ""
CONTAINER_EXIT_CODE=$(get_container_exit_code "$SKILL_TEST_CONTAINER_ID")
if [[ $CONTAINER_EXIT_CODE -eq 0 ]]; then
echo "✅ TEST PASSED"
echo ""
echo "Summary:"
echo " - Skill executed successfully"
echo " - Exit code: 0"
echo " - Container cleanup: Will be handled by trap"
else
echo "❌ TEST FAILED"
echo ""
echo "Summary:"
echo " - Skill execution failed"
echo " - Exit code: $CONTAINER_EXIT_CODE"
echo " - Check logs: docker logs $SKILL_TEST_CONTAINER_ID"
fi
echo ""
echo "Docker Resources Created:"
echo " - Containers: $((AFTER_CONTAINERS - BEFORE_CONTAINERS))"
echo " - Images: $((AFTER_IMAGES - BEFORE_IMAGES))"
echo " - Volumes: $((AFTER_VOLUMES - BEFORE_VOLUMES))"
echo " - Networks: $((AFTER_NETWORKS - BEFORE_NETWORKS))"
echo ""
echo "Cleanup Instructions:"
echo " - Test container will be removed automatically"
echo " - To manually clean up: docker rm -f $SKILL_TEST_CONTAINER_ID"
echo " - To remove test image: docker rmi skill-test:$SKILL_NAME"
# Exit with appropriate code
if [[ $CONTAINER_EXIT_CODE -eq 0 ]]; then
exit 0
else
exit 1
fi

#!/bin/bash
# Test Template for File-Manipulation Skills
# Use this template when testing skills that:
# - Create, read, update, or delete files
# - Modify configurations or codebases
# - Generate reports or artifacts
# - Work with filesystem operations
set -euo pipefail
# ============================================================================
# Configuration
# ============================================================================
SKILL_NAME="${1:-example-file-skill}"
SKILL_PATH="$HOME/.claude/skills/$SKILL_NAME"
TEST_ID="$(date +%s)"
TEST_DIR="/tmp/skill-test-$TEST_ID"
# ============================================================================
# Load Helper Library
# ============================================================================
HELPER_LIB="$HOME/.claude/skills/skill-isolation-tester/lib/docker-helpers.sh"
if [[ ! -f "$HELPER_LIB" ]]; then
echo "ERROR: Helper library not found: $HELPER_LIB"
exit 1
fi
# shellcheck source=/dev/null
source "$HELPER_LIB"
# ============================================================================
# Setup Cleanup Trap
# ============================================================================
export SKILL_TEST_TEMP_DIR="$TEST_DIR"
export SKILL_TEST_KEEP_CONTAINER="false"
export SKILL_TEST_REMOVE_IMAGES="true"
trap cleanup_on_exit EXIT
# ============================================================================
# Pre-flight Checks
# ============================================================================
echo "=== File Manipulation Skill Test: $SKILL_NAME ==="
echo "Test ID: $TEST_ID"
echo ""
# Validate skill exists
if [[ ! -d "$SKILL_PATH" ]]; then
echo "ERROR: Skill not found: $SKILL_PATH"
exit 1
fi
# Validate Docker environment
preflight_check_docker || exit 1
# ============================================================================
# Build Test Environment with Sample Files
# ============================================================================
echo ""
echo "=== Building Test Environment ==="
mkdir -p "$TEST_DIR/test-workspace"
# Create sample files for the skill to manipulate
cat > "$TEST_DIR/test-workspace/sample.txt" <<'EOF'
This is a sample text file for testing.
Line 2
Line 3
EOF
cat > "$TEST_DIR/test-workspace/config.json" <<'EOF'
{
"setting1": "value1",
"setting2": 42,
"enabled": true
}
EOF
mkdir -p "$TEST_DIR/test-workspace/subdir"
echo "Nested file" > "$TEST_DIR/test-workspace/subdir/nested.txt"
# Create Dockerfile
cat > "$TEST_DIR/Dockerfile" <<EOF
FROM ubuntu:22.04
# Install file manipulation tools
RUN apt-get update && apt-get install -y \\
coreutils \\
jq \\
tree \\
&& rm -rf /var/lib/apt/lists/*
# Create workspace
RUN mkdir -p /workspace
# Copy skill
COPY skill/ /root/.claude/skills/$SKILL_NAME/
# Copy test files
COPY test-workspace/ /workspace/
WORKDIR /workspace
CMD ["/bin/bash"]
EOF
# Copy skill to test directory
cp -r "$SKILL_PATH" "$TEST_DIR/skill/"
# Build test image
safe_docker_build "$TEST_DIR/Dockerfile" "skill-test:$SKILL_NAME" || {
echo "ERROR: Failed to build test image"
exit 1
}
export SKILL_TEST_IMAGE_NAME="skill-test:$SKILL_NAME"
# ============================================================================
# Take "Before" Filesystem Snapshot
# ============================================================================
echo ""
echo "=== Taking Filesystem Snapshot (Before) ==="
# Start container
safe_docker_run "skill-test:$SKILL_NAME" bash -c "sleep infinity" || {
echo "ERROR: Failed to start test container"
exit 1
}
# Get baseline file list
docker exec "$SKILL_TEST_CONTAINER_ID" find /workspace -type f -o -type d | sort > "$TEST_DIR/before-files.txt"
# Get file sizes and checksums
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
cd /workspace
find . -type f -exec md5sum {} \; | sort
" > "$TEST_DIR/before-checksums.txt"
# Count files
BEFORE_FILE_COUNT=$(docker exec "$SKILL_TEST_CONTAINER_ID" find /workspace -type f | wc -l)
BEFORE_DIR_COUNT=$(docker exec "$SKILL_TEST_CONTAINER_ID" find /workspace -type d | wc -l)
echo "Before execution:"
echo " Files: $BEFORE_FILE_COUNT"
echo " Directories: $BEFORE_DIR_COUNT"
# ============================================================================
# Run Skill in Container
# ============================================================================
echo ""
echo "=== Running Skill in Isolated Container ==="
# Execute skill
echo "Executing skill..."
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
cd /root/.claude/skills/$SKILL_NAME
# Add your skill execution command here
# Example: ./file-processor.sh /workspace
echo 'Skill execution placeholder - customize this for your skill'
" || {
EXEC_EXIT_CODE=$?
echo "ERROR: Skill execution failed with exit code: $EXEC_EXIT_CODE"
exit "$EXEC_EXIT_CODE"
}
# ============================================================================
# Take "After" Filesystem Snapshot
# ============================================================================
echo ""
echo "=== Taking Filesystem Snapshot (After) ==="
# Get updated file list
docker exec "$SKILL_TEST_CONTAINER_ID" find /workspace -type f -o -type d | sort > "$TEST_DIR/after-files.txt"
# Get updated checksums
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
cd /workspace
find . -type f -exec md5sum {} \; | sort
" > "$TEST_DIR/after-checksums.txt"
# Count files
AFTER_FILE_COUNT=$(docker exec "$SKILL_TEST_CONTAINER_ID" find /workspace -type f | wc -l)
AFTER_DIR_COUNT=$(docker exec "$SKILL_TEST_CONTAINER_ID" find /workspace -type d | wc -l)
echo "After execution:"
echo " Files: $AFTER_FILE_COUNT"
echo " Directories: $AFTER_DIR_COUNT"
# ============================================================================
# Analyze Filesystem Changes
# ============================================================================
echo ""
echo "=== Analyzing Filesystem Changes ==="
# Files added
echo ""
echo "Files Added:"
comm -13 "$TEST_DIR/before-files.txt" "$TEST_DIR/after-files.txt" > "$TEST_DIR/files-added.txt"
ADDED_COUNT=$(wc -l < "$TEST_DIR/files-added.txt")
echo " Count: $ADDED_COUNT"
if [[ $ADDED_COUNT -gt 0 ]]; then
head -10 "$TEST_DIR/files-added.txt"
if [[ $ADDED_COUNT -gt 10 ]]; then
echo " ... and $((ADDED_COUNT - 10)) more"
fi
fi
# Files removed
echo ""
echo "Files Removed:"
comm -23 "$TEST_DIR/before-files.txt" "$TEST_DIR/after-files.txt" > "$TEST_DIR/files-removed.txt"
REMOVED_COUNT=$(wc -l < "$TEST_DIR/files-removed.txt")
echo " Count: $REMOVED_COUNT"
if [[ $REMOVED_COUNT -gt 0 ]]; then
head -10 "$TEST_DIR/files-removed.txt"
if [[ $REMOVED_COUNT -gt 10 ]]; then
echo " ... and $((REMOVED_COUNT - 10)) more"
fi
fi
# Files modified
echo ""
echo "Files Modified:"
comm -12 "$TEST_DIR/before-files.txt" "$TEST_DIR/after-files.txt" | while read -r file; do
    # Snapshot lists hold absolute paths; the md5sum lists hold ./relative paths,
    # so translate before looking up the checksums.
    rel="./${file#/workspace/}"
    BEFORE_HASH=$(awk -v f="$rel" '$2 == f {print $1}' "$TEST_DIR/before-checksums.txt")
    AFTER_HASH=$(awk -v f="$rel" '$2 == f {print $1}' "$TEST_DIR/after-checksums.txt")
    if [[ -n "$BEFORE_HASH" && -n "$AFTER_HASH" && "$BEFORE_HASH" != "$AFTER_HASH" ]]; then
        echo "  $file"
    fi
done | tee "$TEST_DIR/files-modified.txt"
MODIFIED_COUNT=$(wc -l < "$TEST_DIR/files-modified.txt")
echo " Count: $MODIFIED_COUNT"
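The add/remove/modify split above is driven by three `comm` calls over sorted snapshots. A self-contained sketch (with hypothetical paths) showing which flag yields which set:

```shell
# comm compares two sorted files; -1/-2/-3 suppress columns.
cd "$(mktemp -d)"
printf '%s\n' /workspace/a.txt /workspace/b.txt | sort > before.txt
printf '%s\n' /workspace/b.txt /workspace/c.txt | sort > after.txt
comm -13 before.txt after.txt   # only in after  -> added   (/workspace/c.txt)
comm -23 before.txt after.txt   # only in before -> removed (/workspace/a.txt)
comm -12 before.txt after.txt   # in both        -> checksum-diff candidates
```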
# ============================================================================
# Validate File Permissions
# ============================================================================
echo ""
echo "=== Checking File Permissions ==="
# Find unexpectedly executable files
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
find /workspace -type f -perm /111 -ls
" > "$TEST_DIR/executable-files.txt" || true
EXECUTABLE_COUNT=$(wc -l < "$TEST_DIR/executable-files.txt")
if [[ $EXECUTABLE_COUNT -gt 0 ]]; then
echo "⚠ WARNING: Found $EXECUTABLE_COUNT executable files"
cat "$TEST_DIR/executable-files.txt"
else
echo "✓ No unexpected executable files"
fi
# Check for world-writable files
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
find /workspace -type f -perm -002 -ls
" > "$TEST_DIR/world-writable-files.txt" || true
WRITABLE_COUNT=$(wc -l < "$TEST_DIR/world-writable-files.txt")
if [[ $WRITABLE_COUNT -gt 0 ]]; then
echo "⚠ WARNING: Found $WRITABLE_COUNT world-writable files (security risk)"
cat "$TEST_DIR/world-writable-files.txt"
else
echo "✓ No world-writable files"
fi
# ============================================================================
# Check for Sensitive Data
# ============================================================================
echo ""
echo "=== Scanning for Sensitive Data ==="
# Check for potential secrets in new files
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
grep -rni 'password\|api[-_]key\|secret\|token' /workspace
" 2>/dev/null | tee "$TEST_DIR/potential-secrets.txt" || true
SECRET_COUNT=$(wc -l < "$TEST_DIR/potential-secrets.txt")
if [[ $SECRET_COUNT -gt 0 ]]; then
echo "⚠ WARNING: Found $SECRET_COUNT lines with potential secrets"
echo " Review: $TEST_DIR/potential-secrets.txt"
else
echo "✓ No obvious secrets detected"
fi
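The substring scan above deliberately errs on the side of noise. An equivalent sketch using `grep -E` with an optional-separator key pattern (the sample strings are made up):

```shell
# Extended-regex variant of the secret scan; -i ignores case, -q only sets exit code.
pattern='(password|api[-_]?key|secret|token)'
echo 'api_key = "abc123"' | grep -Eiq "$pattern" && echo "flagged"   # → flagged
echo 'plain text'         | grep -Eiq "$pattern" || echo "clean"     # → clean
```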
# ============================================================================
# Validate Cleanup Behavior
# ============================================================================
echo ""
echo "=== Validating Cleanup Behavior ==="
# Check for leftover temp files
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
find /tmp -name '*skill*' -o -name '*.tmp' -o -name '*.temp'
" > "$TEST_DIR/temp-files.txt" || true
TEMP_COUNT=$(wc -l < "$TEST_DIR/temp-files.txt")
if [[ $TEMP_COUNT -gt 0 ]]; then
echo "⚠ WARNING: Found $TEMP_COUNT leftover temp files"
cat "$TEST_DIR/temp-files.txt"
else
echo "✓ No leftover temp files"
fi
# ============================================================================
# Generate Test Report
# ============================================================================
echo ""
echo "=== Test Report ==="
echo ""
CONTAINER_EXIT_CODE=$(get_container_exit_code "$SKILL_TEST_CONTAINER_ID")
if [[ $CONTAINER_EXIT_CODE -eq 0 ]]; then
echo "✅ TEST PASSED"
else
echo "❌ TEST FAILED"
fi
echo ""
echo "Filesystem Changes Summary:"
echo " - Files added: $ADDED_COUNT"
echo " - Files removed: $REMOVED_COUNT"
echo " - Files modified: $MODIFIED_COUNT"
echo " - Total file count change: $((AFTER_FILE_COUNT - BEFORE_FILE_COUNT))"
echo ""
echo "Security & Quality Checklist:"
[[ $EXECUTABLE_COUNT -eq 0 ]] && echo " ✓ No unexpected executable files" || echo " ✗ Found executable files"
[[ $WRITABLE_COUNT -eq 0 ]] && echo " ✓ No world-writable files" || echo " ✗ Found world-writable files"
[[ $SECRET_COUNT -eq 0 ]] && echo " ✓ No secrets in files" || echo " ✗ Potential secrets found"
[[ $TEMP_COUNT -eq 0 ]] && echo " ✓ Clean temp directory" || echo " ✗ Leftover temp files"
echo ""
echo "Detailed Reports:"
echo " - Files added: $TEST_DIR/files-added.txt"
echo " - Files removed: $TEST_DIR/files-removed.txt"
echo " - Files modified: $TEST_DIR/files-modified.txt"
echo " - Before snapshot: $TEST_DIR/before-files.txt"
echo " - After snapshot: $TEST_DIR/after-files.txt"
# Exit with appropriate code
if [[ $CONTAINER_EXIT_CODE -eq 0 ]]; then
exit 0
else
exit 1
fi

#!/bin/bash
# Test Template for Git-Operation Skills
# Use this template when testing skills that:
# - Create commits, branches, or tags
# - Modify git history or configuration
# - Work with git worktrees
# - Interact with remote repositories
set -euo pipefail
# ============================================================================
# Configuration
# ============================================================================
SKILL_NAME="${1:-example-git-skill}"
SKILL_PATH="$HOME/.claude/skills/$SKILL_NAME"
TEST_ID="$(date +%s)"
TEST_DIR="/tmp/skill-test-$TEST_ID"
# ============================================================================
# Load Helper Library
# ============================================================================
HELPER_LIB="$HOME/.claude/skills/skill-isolation-tester/lib/docker-helpers.sh"
if [[ ! -f "$HELPER_LIB" ]]; then
echo "ERROR: Helper library not found: $HELPER_LIB"
exit 1
fi
# shellcheck source=/dev/null
source "$HELPER_LIB"
# ============================================================================
# Setup Cleanup Trap
# ============================================================================
export SKILL_TEST_TEMP_DIR="$TEST_DIR"
export SKILL_TEST_KEEP_CONTAINER="false"
export SKILL_TEST_REMOVE_IMAGES="true"
trap cleanup_on_exit EXIT
# ============================================================================
# Pre-flight Checks
# ============================================================================
echo "=== Git Skill Test: $SKILL_NAME ==="
echo "Test ID: $TEST_ID"
echo ""
# Validate skill exists
if [[ ! -d "$SKILL_PATH" ]]; then
echo "ERROR: Skill not found: $SKILL_PATH"
exit 1
fi
# Validate Docker environment
preflight_check_docker || exit 1
# ============================================================================
# Create Test Git Repository
# ============================================================================
echo ""
echo "=== Creating Test Git Repository ==="
mkdir -p "$TEST_DIR/test-repo"
cd "$TEST_DIR/test-repo"
# Initialize git repo
git init
git config user.name "Test User"
git config user.email "test@example.com"
# Create initial commit
echo "# Test Repository" > README.md
echo "Initial content" > file1.txt
git add .
git commit -m "Initial commit"
# Normalize the branch name so the later `git checkout main` works regardless
# of the host's init.defaultBranch setting (older gits default to "master")
git branch -M main
# Create a branch
git checkout -b feature-branch
echo "Feature content" > feature.txt
git add feature.txt
git commit -m "Add feature"
# Go back to main
git checkout main
# Create a tag
git tag v1.0.0
echo "Test repository created:"
git log --oneline --all --graph
echo ""
git branch -a
echo ""
git tag
# ============================================================================
# Build Test Environment
# ============================================================================
echo ""
echo "=== Building Test Environment ==="
cd "$TEST_DIR"
# Create Dockerfile
cat > "$TEST_DIR/Dockerfile" <<EOF
FROM ubuntu:22.04
# Install git
RUN apt-get update && apt-get install -y \\
git \\
&& rm -rf /var/lib/apt/lists/*
# Configure git
RUN git config --global user.name "Test User" && \\
git config --global user.email "test@example.com"
# Copy skill
COPY skill/ /root/.claude/skills/$SKILL_NAME/
# Copy test repository
COPY test-repo/ /workspace/
WORKDIR /workspace
CMD ["/bin/bash"]
EOF
# Copy skill
cp -r "$SKILL_PATH" "$TEST_DIR/skill/"
# Build test image
safe_docker_build "$TEST_DIR/Dockerfile" "skill-test:$SKILL_NAME" || {
echo "ERROR: Failed to build test image"
exit 1
}
export SKILL_TEST_IMAGE_NAME="skill-test:$SKILL_NAME"
# ============================================================================
# Take "Before" Git Snapshot
# ============================================================================
echo ""
echo "=== Taking Git Snapshot (Before) ==="
# Start container
safe_docker_run "skill-test:$SKILL_NAME" bash -c "sleep infinity" || {
echo "ERROR: Failed to start test container"
exit 1
}
# Capture git state before
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
cd /workspace
git log --all --oneline --graph > /tmp/before-log.txt
git branch -a | sed 's/^[*+] /  /' > /tmp/before-branches.txt   # normalize current-branch marker
git tag > /tmp/before-tags.txt
git status > /tmp/before-status.txt
git config --list > /tmp/before-config.txt
" || true
# Copy snapshots out
docker cp "$SKILL_TEST_CONTAINER_ID:/tmp/before-log.txt" "$TEST_DIR/"
docker cp "$SKILL_TEST_CONTAINER_ID:/tmp/before-branches.txt" "$TEST_DIR/"
docker cp "$SKILL_TEST_CONTAINER_ID:/tmp/before-tags.txt" "$TEST_DIR/"
docker cp "$SKILL_TEST_CONTAINER_ID:/tmp/before-status.txt" "$TEST_DIR/"
docker cp "$SKILL_TEST_CONTAINER_ID:/tmp/before-config.txt" "$TEST_DIR/"
BEFORE_COMMIT_COUNT=$(docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "cd /workspace && git rev-list --all --count")
BEFORE_BRANCH_COUNT=$(wc -l < "$TEST_DIR/before-branches.txt")
BEFORE_TAG_COUNT=$(wc -l < "$TEST_DIR/before-tags.txt")
echo "Before execution:"
echo " Commits: $BEFORE_COMMIT_COUNT"
echo " Branches: $BEFORE_BRANCH_COUNT"
echo " Tags: $BEFORE_TAG_COUNT"
# ============================================================================
# Run Skill in Container
# ============================================================================
echo ""
echo "=== Running Skill in Isolated Container ==="
# Execute skill
echo "Executing skill..."
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
cd /root/.claude/skills/$SKILL_NAME
# Add your skill execution command here
# Example: ./git-skill.sh /workspace
echo 'Skill execution placeholder - customize this for your skill'
" || {
EXEC_EXIT_CODE=$?
echo "ERROR: Skill execution failed with exit code: $EXEC_EXIT_CODE"
exit "$EXEC_EXIT_CODE"
}
# ============================================================================
# Take "After" Git Snapshot
# ============================================================================
echo ""
echo "=== Taking Git Snapshot (After) ==="
# Capture git state after
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
cd /workspace
git log --all --oneline --graph > /tmp/after-log.txt
git branch -a | sed 's/^[*+] /  /' > /tmp/after-branches.txt   # normalize current-branch marker
git tag > /tmp/after-tags.txt
git status > /tmp/after-status.txt
git config --list > /tmp/after-config.txt
" || true
# Copy snapshots out
docker cp "$SKILL_TEST_CONTAINER_ID:/tmp/after-log.txt" "$TEST_DIR/"
docker cp "$SKILL_TEST_CONTAINER_ID:/tmp/after-branches.txt" "$TEST_DIR/"
docker cp "$SKILL_TEST_CONTAINER_ID:/tmp/after-tags.txt" "$TEST_DIR/"
docker cp "$SKILL_TEST_CONTAINER_ID:/tmp/after-status.txt" "$TEST_DIR/"
docker cp "$SKILL_TEST_CONTAINER_ID:/tmp/after-config.txt" "$TEST_DIR/"
AFTER_COMMIT_COUNT=$(docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "cd /workspace && git rev-list --all --count")
AFTER_BRANCH_COUNT=$(wc -l < "$TEST_DIR/after-branches.txt")
AFTER_TAG_COUNT=$(wc -l < "$TEST_DIR/after-tags.txt")
echo "After execution:"
echo " Commits: $AFTER_COMMIT_COUNT"
echo " Branches: $AFTER_BRANCH_COUNT"
echo " Tags: $AFTER_TAG_COUNT"
# ============================================================================
# Analyze Git Changes
# ============================================================================
echo ""
echo "=== Analyzing Git Changes ==="
# New commits
COMMIT_DIFF=$((AFTER_COMMIT_COUNT - BEFORE_COMMIT_COUNT))
if [[ $COMMIT_DIFF -gt 0 ]]; then
echo "✓ Added $COMMIT_DIFF new commit(s)"
echo ""
echo "New commits:"
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
cd /workspace
git log --all --oneline -n $COMMIT_DIFF
"
else
echo "No new commits created"
fi
# New branches
echo ""
echo "Branch Changes:"
comm -13 "$TEST_DIR/before-branches.txt" "$TEST_DIR/after-branches.txt" > "$TEST_DIR/branches-added.txt"
BRANCH_ADDED=$(wc -l < "$TEST_DIR/branches-added.txt")
if [[ $BRANCH_ADDED -gt 0 ]]; then
echo " Added $BRANCH_ADDED branch(es):"
cat "$TEST_DIR/branches-added.txt"
fi
comm -23 "$TEST_DIR/before-branches.txt" "$TEST_DIR/after-branches.txt" > "$TEST_DIR/branches-removed.txt"
BRANCH_REMOVED=$(wc -l < "$TEST_DIR/branches-removed.txt")
if [[ $BRANCH_REMOVED -gt 0 ]]; then
echo " Removed $BRANCH_REMOVED branch(es):"
cat "$TEST_DIR/branches-removed.txt"
fi
if [[ $BRANCH_ADDED -eq 0 && $BRANCH_REMOVED -eq 0 ]]; then
echo " No branch changes"
fi
# New tags
echo ""
echo "Tag Changes:"
comm -13 "$TEST_DIR/before-tags.txt" "$TEST_DIR/after-tags.txt" > "$TEST_DIR/tags-added.txt"
TAG_ADDED=$(wc -l < "$TEST_DIR/tags-added.txt")
if [[ $TAG_ADDED -gt 0 ]]; then
echo " Added $TAG_ADDED tag(s):"
cat "$TEST_DIR/tags-added.txt"
fi
# Config changes
echo ""
echo "Git Config Changes:"
diff "$TEST_DIR/before-config.txt" "$TEST_DIR/after-config.txt" > "$TEST_DIR/config-diff.txt" || true
if [[ -s "$TEST_DIR/config-diff.txt" ]]; then
echo " Configuration was modified:"
cat "$TEST_DIR/config-diff.txt"
else
echo " No configuration changes"
fi
# ============================================================================
# Check Working Tree Status
# ============================================================================
echo ""
echo "=== Checking Working Tree Status ==="
UNCOMMITTED_CHANGES=$(docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "cd /workspace && git status --porcelain" || echo "")
if [[ -n "$UNCOMMITTED_CHANGES" ]]; then
echo "⚠ WARNING: Uncommitted changes detected:"
echo "$UNCOMMITTED_CHANGES"
echo ""
echo "Skills should leave the working tree clean after execution!"
else
echo "✓ Working tree is clean"
fi
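`git status --porcelain` used above emits two status columns per path (index state, then worktree state). A throwaway-repo sketch (all names hypothetical) of what those columns look like:

```shell
# Build a repo with one modified tracked file and one untracked file,
# then show the porcelain classification.
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" config user.email test@example.com
git -C "$repo" config user.name "Test User"
echo a > "$repo/tracked.txt"
git -C "$repo" add tracked.txt
git -C "$repo" commit -qm init
echo b >> "$repo/tracked.txt"    # modify a tracked file
echo c > "$repo/untracked.txt"   # leave a file untracked
git -C "$repo" status --porcelain
# " M tracked.txt"  = modified in worktree; "?? untracked.txt" = untracked
```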
# ============================================================================
# Validate Git Safety
# ============================================================================
echo ""
echo "=== Git Safety Checks ==="
# Check for force operations in logs ("force" covers --force and force-push;
# a bare "-f" substring would also match unrelated flags, so anchor it)
docker logs "$SKILL_TEST_CONTAINER_ID" 2>&1 | grep -iE 'force|[[:space:]]-f([[:space:]]|$)' > "$TEST_DIR/force-operations.txt" || true
FORCE_OPS=$(wc -l < "$TEST_DIR/force-operations.txt")
if [[ $FORCE_OPS -gt 0 ]]; then
echo "⚠ WARNING: Detected $FORCE_OPS force operations"
cat "$TEST_DIR/force-operations.txt"
else
echo "✓ No force operations detected"
fi
# Check for history rewriting
docker logs "$SKILL_TEST_CONTAINER_ID" 2>&1 | grep -i "rebase\|reset --hard\|filter-branch" > "$TEST_DIR/history-rewrites.txt" || true
REWRITES=$(wc -l < "$TEST_DIR/history-rewrites.txt")
if [[ $REWRITES -gt 0 ]]; then
echo "⚠ WARNING: Detected $REWRITES history rewrite operations"
cat "$TEST_DIR/history-rewrites.txt"
else
echo "✓ No history rewriting detected"
fi
# Check for dangling commits; reflogs keep rewritten commits reachable, so they
# must be ignored for fsck to report anything
DANGLING_COMMITS=$(docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "cd /workspace && git fsck --no-reflogs --lost-found 2>&1 | grep 'dangling commit'" || echo "")
if [[ -n "$DANGLING_COMMITS" ]]; then
echo "⚠ WARNING: Dangling commits found (potential data loss)"
echo "$DANGLING_COMMITS"
else
echo "✓ No dangling commits"
fi
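Reflogs keep rewritten commits reachable, so `git fsck` only reports them as dangling once reflogs are ignored. A throwaway-repo sketch of producing and surfacing a dangling commit:

```shell
# Make a commit unreachable via reset --hard, then surface it with fsck.
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" config user.email test@example.com
git -C "$repo" config user.name "Test User"
echo 1 > "$repo/f"; git -C "$repo" add f; git -C "$repo" commit -qm one
echo 2 > "$repo/f"; git -C "$repo" commit -aqm two
git -C "$repo" reset -q --hard HEAD~1          # "two" becomes unreachable
git -C "$repo" fsck --no-reflogs | grep 'dangling commit'
```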
# ============================================================================
# Generate Test Report
# ============================================================================
echo ""
echo "=== Test Report ==="
echo ""
CONTAINER_EXIT_CODE=$(get_container_exit_code "$SKILL_TEST_CONTAINER_ID")
if [[ $CONTAINER_EXIT_CODE -eq 0 ]]; then
echo "✅ TEST PASSED"
else
echo "❌ TEST FAILED"
fi
echo ""
echo "Git Changes Summary:"
echo " - Commits added: $COMMIT_DIFF"
echo " - Branches added: $BRANCH_ADDED"
echo " - Branches removed: $BRANCH_REMOVED"
echo " - Tags added: $TAG_ADDED"
echo ""
echo "Safety Checklist:"
[[ -z "$UNCOMMITTED_CHANGES" ]] && echo " ✓ Clean working tree" || echo " ✗ Uncommitted changes"
[[ $FORCE_OPS -eq 0 ]] && echo " ✓ No force operations" || echo " ✗ Force operations detected"
[[ $REWRITES -eq 0 ]] && echo " ✓ No history rewriting" || echo " ✗ History rewriting detected"
[[ -z "$DANGLING_COMMITS" ]] && echo " ✓ No dangling commits" || echo " ✗ Dangling commits found"
echo ""
echo "Detailed Snapshots:"
echo " - Before log: $TEST_DIR/before-log.txt"
echo " - After log: $TEST_DIR/after-log.txt"
echo " - Branch changes: $TEST_DIR/branches-added.txt"
echo " - Config diff: $TEST_DIR/config-diff.txt"
# Exit with appropriate code
if [[ $CONTAINER_EXIT_CODE -eq 0 ]]; then
exit 0
else
exit 1
fi