Initial commit

Zhongwei Li
2025-11-29 18:16:56 +08:00
commit 8a3d331e04
61 changed files with 11808 additions and 0 deletions


@@ -0,0 +1,247 @@
# Data Processing Skill Pattern
Use this pattern when your skill **processes, analyzes, or transforms** data to extract insights.
## When to Use
- Skill ingests data from files or APIs
- Performs analysis or transformation
- Generates insights, reports, or visualizations
- Examples: cc-insights (conversation analysis)
## Structure
### Data Flow Architecture
Define a clear data pipeline:
```
Input Sources → Processing → Storage → Query/Analysis → Output
```
Example:
```
JSONL files → Parser → SQLite + Vector DB → Search/Analytics → Reports/Dashboard
```
### Processing Modes
**Batch Processing:**
- Process all data at once
- Good for: Initial setup, complete reprocessing
- Trade-off: Slow startup, but guarantees complete data
**Incremental Processing:**
- Process only new/changed data
- Good for: Regular updates, performance
- Trade-off: Complex state tracking (a minimal sketch follows this list)
**Streaming Processing:**
- Process data as it arrives
- Good for: Real-time updates
- Trade-off: Complex implementation
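Incremental processing hinges on remembering what has already been handled. A minimal sketch, assuming JSONL inputs and a JSON state file (both names are illustrative, not from any real skill), that skips files whose modification time is unchanged:
```python
import json
import os
from pathlib import Path

STATE_FILE = Path(".processing_state.json")  # hypothetical state-file location

def load_state() -> dict:
    return json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {}

def unprocessed_files(input_dir: str):
    """Yield only files that are new or changed since the last run."""
    state = load_state()
    for path in sorted(Path(input_dir).glob("*.jsonl")):
        mtime = os.path.getmtime(path)
        if state.get(str(path)) == mtime:
            continue  # unchanged since the last run: skip
        yield path  # caller processes the file at this point
        state[str(path)] = mtime  # recorded only after the caller resumes us
    STATE_FILE.write_text(json.dumps(state))
```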
### Storage Strategy
Choose appropriate storage:
**SQLite:**
- Structured metadata
- Fast queries
- Relational data
- Good for: Indexes, aggregations
**Vector Database (ChromaDB):**
- Semantic embeddings
- Similarity search
- Good for: RAG, semantic queries
**File System:**
- Raw data
- Large blobs
- Good for: Backups, archives
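To make the combination concrete, here is a hedged sketch pairing SQLite (metadata) with ChromaDB (embeddings). It assumes chromadb >= 0.4 (the `PersistentClient` API); paths, table, and field names are illustrative:
```python
import sqlite3

import chromadb

sql = sqlite3.connect("insights.db")
sql.execute("CREATE TABLE IF NOT EXISTS conversations (id TEXT PRIMARY KEY, timestamp INTEGER)")

client = chromadb.PersistentClient(path="./chroma")
collection = client.get_or_create_collection(name="conversations")

def store(conv_id: str, timestamp: int, text: str):
    # Structured metadata goes to SQLite for fast relational queries...
    sql.execute("INSERT OR REPLACE INTO conversations VALUES (?, ?)", (conv_id, timestamp))
    sql.commit()
    # ...while the text goes to the vector store for similarity search.
    collection.add(ids=[conv_id], documents=[text], metadatas=[{"timestamp": timestamp}])

def semantic_search(query: str, n: int = 5):
    return collection.query(query_texts=[query], n_results=n)
```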
## Example: CC Insights
**Input**: Claude Code conversation JSONL files
**Processing Pipeline:**
1. JSONL Parser - Decode base64, extract messages
2. Metadata Extractor - Timestamps, files, tools
3. Embeddings Generator - Vector representations
4. Pattern Detector - Identify trends
**Storage:**
- SQLite: Conversation metadata, fast queries
- ChromaDB: Vector embeddings, semantic search
- Cache: Processed conversation data
**Query Interfaces:**
1. CLI Search - Command-line semantic search
2. Insight Generator - Pattern-based reports
3. Dashboard - Interactive web UI
**Outputs:**
- Search results with similarity scores
- Weekly activity reports
- File heatmaps
- Tool usage analytics
## Data Processing Workflow
### Phase 1: Ingestion
```markdown
1. **Discover Data Sources**
- Locate input files/APIs
- Validate accessibility
- Calculate scope (file count, size)
2. **Initial Validation**
- Check format validity
- Verify schema compliance
- Estimate processing time
3. **State Management**
- Track what's been processed
- Support incremental updates
- Handle failures gracefully
```
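A sketch of the ingestion steps, assuming JSONL sources under a root directory (the layout and sample size are illustrative):
```python
import json
from pathlib import Path

def discover_sources(root: str) -> list[Path]:
    """Locate inputs and report scope before committing to processing."""
    files = sorted(Path(root).rglob("*.jsonl"))
    total_mb = sum(f.stat().st_size for f in files) / 1e6
    print(f"Found {len(files)} files ({total_mb:.1f} MB)")
    return files

def is_valid(path: Path, sample_lines: int = 5) -> bool:
    """Cheap format check: parse only the first few lines before doing
    any expensive work on the file."""
    with path.open() as f:
        for _, line in zip(range(sample_lines), f):
            try:
                json.loads(line)
            except json.JSONDecodeError:
                return False
    return True
```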
### Phase 2: Processing
```markdown
1. **Parse/Transform**
- Read raw data
- Apply transformations
- Handle errors and edge cases
2. **Extract Features**
- Generate metadata
- Calculate metrics
- Create embeddings (if semantic search)
3. **Store Results**
- Write to database(s)
- Update indexes
- Maintain consistency
```
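For the parse-and-store step, a hedged sketch that streams a large JSONL file line by line and writes to SQLite in batched transactions (the `events` table and field names are assumptions about the schema):
```python
import json
import sqlite3
from pathlib import Path

def process_file(path: Path, db: sqlite3.Connection, batch_size: int = 1000):
    """Stream the file instead of loading it whole; insert in batches."""
    batch = []
    with path.open() as f:
        for line in f:  # streaming keeps memory flat even for huge files
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # tolerate the odd corrupt line
            batch.append((record.get("timestamp"), record.get("tool")))
            if len(batch) >= batch_size:
                with db:  # one transaction per batch keeps the DB consistent
                    db.executemany("INSERT INTO events VALUES (?, ?)", batch)
                batch.clear()
    if batch:
        with db:
            db.executemany("INSERT INTO events VALUES (?, ?)", batch)
```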
### Phase 3: Analysis
```markdown
1. **Query Interface**
- Support multiple query types
- Optimize for common patterns
- Return ranked results
2. **Pattern Detection**
- Aggregate data
- Identify trends
- Generate insights
3. **Visualization**
- Format for human consumption
- Support multiple output formats
- Interactive when possible
```
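Pattern detection can often be a plain aggregation over what Phase 2 stored. A small sketch against the illustrative `events` table from above:
```python
import sqlite3

def tool_usage_report(db: sqlite3.Connection, top_n: int = 10):
    """Aggregate stored events into a ranked usage trend."""
    rows = db.execute(
        """SELECT tool, COUNT(*) AS uses
           FROM events
           GROUP BY tool
           ORDER BY uses DESC
           LIMIT ?""",
        (top_n,),
    ).fetchall()
    for tool, uses in rows:
        print(f"{tool:20} {uses}")
```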
## Performance Characteristics
Document expected performance:
```markdown
### Performance Characteristics
- **Initial indexing**: ~1-2 minutes for 100 records
- **Incremental updates**: <5 seconds for new records
- **Search latency**: <1 second for queries
- **Report generation**: <10 seconds for standard reports
- **Memory usage**: ~200MB for 1000 records
```
## Best Practices
1. **Incremental Processing**: Don't reprocess everything on each run
2. **State Tracking**: Track what's been processed to avoid duplicates
3. **Batch Operations**: Process in batches for memory efficiency
4. **Progress Indicators**: Show progress for long operations
5. **Error Recovery**: Handle failures gracefully, resume where left off
6. **Data Validation**: Validate inputs before expensive processing
7. **Index Optimization**: Optimize databases for common queries
8. **Memory Management**: Stream large files, don't load everything
9. **Parallel Processing**: Use parallelism when possible (see the sketch after this list)
10. **Cache Wisely**: Cache expensive computations
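A sketch of practice 9, fanning CPU-bound parsing across processes, with a progress bar covering practice 4 (the worker body is a placeholder):
```python
from concurrent.futures import ProcessPoolExecutor

from tqdm import tqdm

def parse_file(path: str) -> dict:
    ...  # CPU-bound parsing work for one file (illustrative stub)

def parse_all(paths: list[str]) -> list[dict]:
    with ProcessPoolExecutor() as pool:
        # pool.map yields results as the workers finish each chunk
        return list(tqdm(pool.map(parse_file, paths), total=len(paths)))
```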
## Scripts Structure
For data processing skills, provide helper scripts:
```
scripts/
├── processor.py # Main data processing script
├── indexer.py # Build indexes/embeddings
├── query.py # Query interface (CLI)
└── generator.py # Report/insight generation
```
### Script Best Practices
```python
# Good patterns for processing scripts, combined into one runnable sketch.
# discover_items, is_already_processed, process_item, store_result, and
# process_batch stand in for your skill's own functions.
import logging

import click
from tqdm import tqdm

logger = logging.getLogger(__name__)

def chunks(seq, batch_size):
    """Yield successive fixed-size batches from a list."""
    for i in range(0, len(seq), batch_size):
        yield seq[i:i + batch_size]

# 1. Use click for the CLI
@click.command()
@click.option('--input', 'input_path', help='Input path')  # avoid shadowing input()
@click.option('--reindex', is_flag=True, help='Reprocess already-indexed items')
def process(input_path, reindex):
    """Process data from the input source."""
    items = discover_items(input_path)

    # 2. Show progress for long operations
    for item in tqdm(items, desc="Processing"):
        # 4. Support incremental updates
        if not reindex and is_already_processed(item):
            continue
        # 3. Handle errors gracefully: log, skip, and keep going
        try:
            result = process_item(item)
        except Exception as e:
            logger.error(f"Failed to process {item}: {e}")
            continue
        store_result(result)

    # 5. Use batch processing for memory-heavy steps (e.g. embeddings)
    for batch in chunks(items, batch_size=32):
        process_batch(batch)
```
## Storage Schema
Document your data schema:
```sql
-- Example SQLite schema
CREATE TABLE conversations (
id TEXT PRIMARY KEY,
timestamp INTEGER,
message_count INTEGER,
files_modified TEXT, -- JSON array
tools_used TEXT -- JSON array
);
CREATE INDEX idx_timestamp ON conversations(timestamp);
CREATE INDEX idx_files ON conversations(files_modified);
```
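The JSON-array columns round-trip through Python's `sqlite3` as TEXT, and SQLite's built-in `json_each()` (standard in recent builds) lets you query inside them. A hedged usage sketch:
```python
import json
import sqlite3

db = sqlite3.connect("insights.db")

def insert_conversation(conv_id, ts, msg_count, files, tools):
    db.execute(
        "INSERT INTO conversations VALUES (?, ?, ?, ?, ?)",
        (conv_id, ts, msg_count, json.dumps(files), json.dumps(tools)),
    )
    db.commit()

def conversations_touching(filename: str):
    # json_each() expands the stored array so we can filter on its elements.
    return db.execute(
        """SELECT c.id FROM conversations c, json_each(c.files_modified)
           WHERE json_each.value = ?""",
        (filename,),
    ).fetchall()
```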
## Output Formats
Support multiple output formats:
1. **Markdown**: Human-readable reports
2. **JSON**: Machine-readable for integration
3. **CSV**: Spreadsheet-compatible data
4. **HTML**: Styled reports with charts
5. **Interactive**: Web dashboards (optional)
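Keeping the data model separate from presentation makes formats cheap to add. A minimal sketch rendering the same results as markdown, JSON, or CSV (field names are illustrative):
```python
import csv
import io
import json

def render(results: list[dict], fmt: str = "markdown") -> str:
    if not results:
        return ""
    if fmt == "json":
        return json.dumps(results, indent=2)
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=results[0].keys())
        writer.writeheader()
        writer.writerows(results)
        return buf.getvalue()
    # default: a markdown table
    header = "| " + " | ".join(results[0].keys()) + " |"
    rule = "| " + " | ".join("---" for _ in results[0]) + " |"
    rows = ["| " + " | ".join(str(v) for v in r.values()) + " |" for r in results]
    return "\n".join([header, rule, *rows])
```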


@@ -0,0 +1,78 @@
# Mode-Based Skill Pattern
Use this pattern when your skill has **multiple distinct operating modes** based on user intent.
## When to Use
- Skill performs fundamentally different operations based on context
- Each mode has its own workflow and outputs
- User intent determines which mode to activate
- Examples: git-worktree-setup (single/batch/cleanup/list modes)
## Structure
### Quick Decision Matrix
Create a clear mapping of user requests to modes:
```
User Request → Mode → Action
───────────────────────────────────────────────────────────
"trigger phrase 1" → Mode 1 → High-level action
"trigger phrase 2" → Mode 2 → High-level action
"trigger phrase 3" → Mode 3 → High-level action
```
### Mode Detection Logic
Provide clear logic for mode selection:
```javascript
// Mode 1: [Name]
if (userMentions("keyword1", "keyword2")) {
return "mode1-name";
}
// Mode 2: [Name]
if (userMentions("keyword3", "keyword4")) {
return "mode2-name";
}
// Ambiguous - ask user
return askForClarification();
```
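A concrete version of the same logic as a Python sketch, with keyword sets per mode (all keywords are illustrative) and a `None` return standing in for the clarification path:
```python
MODE_KEYWORDS = {
    "single": {"create worktree", "new worktree"},
    "batch": {"worktrees for", "multiple worktrees"},
    "cleanup": {"remove worktree", "delete worktree"},
    "list": {"list worktrees", "show worktrees"},
}

def detect_mode(request: str) -> str | None:
    text = request.lower()
    matches = [mode for mode, kws in MODE_KEYWORDS.items()
               if any(kw in text for kw in kws)]
    if len(matches) == 1:
        return matches[0]
    return None  # zero or multiple matches: ask the user to clarify
```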
### Separate Mode Documentation
For complex skills, create separate files for each mode:
```
skill-name/
├── SKILL.md # Overview and mode detection
├── modes/
│ ├── mode1-name.md # Detailed workflow for mode 1
│ ├── mode2-name.md # Detailed workflow for mode 2
│ └── mode3-name.md # Detailed workflow for mode 3
```
## Example: Git Worktree Setup
**Modes:**
1. Single Worktree - Create one worktree
2. Batch Worktrees - Create multiple worktrees
3. Cleanup - Remove worktrees
4. List/Manage - Show worktree status
**Detection Logic:**
- "create worktree for X" → Single mode
- "create worktrees for A, B, C" → Batch mode
- "remove worktree" → Cleanup mode
- "list worktrees" → List mode
## Best Practices
1. **Clear Mode Boundaries**: Each mode should be distinct and non-overlapping
2. **Explicit Detection**: Provide clear rules for mode selection
3. **Clarification Path**: Always fall back to asking the user when intent is ambiguous
4. **Mode Independence**: Each mode should work standalone
5. **Shared Prerequisites**: Extract common validation to reduce duplication


@@ -0,0 +1,115 @@
# Phase-Based Skill Pattern
Use this pattern when your skill follows **sequential phases** that build on each other.
## When to Use
- Skill has a linear workflow with clear stages
- Each phase depends on the previous phase
- Progressive disclosure of complexity
- Examples: codebase-auditor (discovery → analysis → reporting → remediation)
## Structure
### Phase Overview
Define clear phases with dependencies:
```
Phase 1: Discovery
Phase 2: Analysis
Phase 3: Reporting
Phase 4: Action/Remediation
```
### Phase Workflow Template
```markdown
## Workflow
### Phase 1: [Name]
**Purpose**: [What this phase accomplishes]
**Steps:**
1. [Step 1]
2. [Step 2]
3. [Step 3]
**Output**: [What information is produced]
**Transition**: [When to move to next phase]
### Phase 2: [Name]
**Purpose**: [What this phase accomplishes]
**Inputs**: [Required from previous phase]
**Steps:**
1. [Step 1]
2. [Step 2]
**Output**: [What information is produced]
```
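One hedged way to realize the template in code: phases as functions where each consumes the previous phase's output, with a checkpoint before advancing and early exit after any phase (names and payloads are illustrative):
```python
def discovery() -> dict:
    return {"stack": "python", "files": 120}  # lightweight scan only

def analysis(profile: dict) -> list[dict]:
    assert "stack" in profile, "discovery output missing"  # checkpoint
    return [{"severity": "high", "issue": "example finding"}]

def reporting(findings: list[dict]) -> str:
    return f"{len(findings)} finding(s)"

def run(stop_after: str = "reporting"):
    profile = discovery()
    if stop_after == "discovery":
        return profile  # early exit: partial analysis still has value
    findings = analysis(profile)
    if stop_after == "analysis":
        return findings
    return reporting(findings)
```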
## Example: Codebase Auditor
**Phase 1: Initial Assessment** (Progressive Disclosure)
- Lightweight scan to understand codebase
- Identify tech stack and structure
- Quick health check
- **Output**: Project profile and initial findings
**Phase 2: Deep Analysis** (Load on Demand)
- Based on Phase 1, perform targeted analysis
- Code quality, security, testing, etc.
- **Output**: Detailed findings with severity
**Phase 3: Report Generation**
- Aggregate findings from Phase 2
- Calculate scores and metrics
- **Output**: Comprehensive audit report
**Phase 4: Remediation Planning**
- Prioritize findings by severity
- Generate action plan
- **Output**: Prioritized task list
## Best Practices
1. **Progressive Disclosure**: Start lightweight, go deep only when needed
2. **Clear Transitions**: Explicitly state when moving between phases
3. **Phase Independence**: Each phase should have clear inputs/outputs
4. **Checkpoint Validation**: Verify prerequisites before advancing
5. **Early Exit**: Allow stopping after any phase if the user only needs partial analysis
6. **Incremental Value**: Each phase should provide standalone value
## Phase Characteristics
### Discovery Phase
- Fast and lightweight
- Gather context and identify scope
- No expensive operations
- Output guides subsequent phases
### Analysis Phase
- Deep dive based on discovery
- Resource-intensive operations
- Parallel processing when possible
- Structured output for reporting
### Reporting Phase
- Aggregate and synthesize data
- Calculate metrics and scores
- Generate human-readable output
- Support multiple formats
### Action Phase
- Provide recommendations
- Generate implementation guidance
- Offer to perform actions
- Track completion


@@ -0,0 +1,174 @@
# Validation/Audit Skill Pattern
Use this pattern when your skill **validates, audits, or checks** artifacts against standards.
## When to Use
- Skill checks compliance against defined standards
- Detects issues and provides remediation guidance
- Generates reports with severity levels
- Examples: claude-md-auditor, codebase-auditor
## Structure
### Validation Sources
Clearly define what you're validating against:
```markdown
## Validation Sources
### 1. ✅ Official Standards
- **Source**: [Authority/documentation]
- **Authority**: Highest (requirements)
- **Examples**: [List key standards]
### 2. 💡 Best Practices
- **Source**: Community/field experience
- **Authority**: Medium (recommendations)
- **Examples**: [List practices]
### 3. 🔬 Research/Optimization
- **Source**: Academic research
- **Authority**: Medium (evidence-based)
- **Examples**: [List findings]
```
### Finding Structure
Use consistent structure for all findings:
```markdown
**Severity**: Critical | High | Medium | Low
**Category**: [Type of issue]
**Location**: [File:line or context]
**Description**: [What the issue is]
**Impact**: [Why it matters]
**Remediation**: [How to fix]
**Effort**: [Time estimate]
**Source**: Official | Community | Research
```
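In code, the same structure might be a small dataclass (a sketch; the enum-like values mirror the template rather than any real validator):
```python
from dataclasses import dataclass

@dataclass
class Finding:
    severity: str      # "critical" | "high" | "medium" | "low"
    category: str
    location: str      # e.g. "SKILL.md:42"
    description: str
    impact: str
    remediation: str
    effort: str        # e.g. "15 min"
    source: str        # "official" | "community" | "research"
```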
### Severity Levels
Define clear severity criteria:
- **Critical**: Security risk, production-blocking (fix immediately)
- **High**: Significant quality issue (fix this sprint)
- **Medium**: Moderate improvement (schedule for next quarter)
- **Low**: Minor optimization (backlog)
### Score Calculation
Provide quantitative scoring:
```
Overall Health Score (0-100):
- 90-100: Excellent
- 75-89: Good
- 60-74: Fair
- 40-59: Poor
- 0-39: Critical
Category Scores:
- Security: Should always be 100
- Compliance: Aim for 80+
- Best Practices: 70+ is good
```
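A minimal scoring sketch that deducts per finding, weighted by severity; the weights are assumptions to tune for your domain:
```python
SEVERITY_PENALTY = {"critical": 25, "high": 10, "medium": 4, "low": 1}

def health_score(severities: list[str]) -> int:
    penalty = sum(SEVERITY_PENALTY[s] for s in severities)
    return max(0, 100 - penalty)

# e.g. two high + three medium findings: 100 - (2*10 + 3*4) = 68, i.e. "Fair"
```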
## Example: CLAUDE.md Auditor
**Validation Against:**
1. Official Anthropic documentation (docs.claude.com)
2. Community best practices (field experience)
3. Academic research (LLM context optimization)
**Finding Categories:**
- Security (secrets, sensitive data)
- Official Compliance (Anthropic guidelines)
- Best Practices (community recommendations)
- Structure (organization, formatting)
**Output Modes:**
1. Audit Report - Detailed findings
2. JSON Report - Machine-readable for CI/CD
3. Refactored File - Production-ready output
## Validation Workflow
### Step 1: Discovery
- Locate target artifact(s)
- Calculate metrics (size, complexity)
- Read content for analysis
### Step 2: Analysis
Run validators in priority order:
1. Security Validation (CRITICAL)
2. Official Compliance
3. Best Practices
4. Optimization Opportunities
### Step 3: Scoring
- Calculate overall health score
- Generate category-specific scores
- Count findings by severity
### Step 4: Reporting
- Generate human-readable report
- Provide machine-readable output
- Offer remediation options
## Best Practices
1. **Prioritize Security**: Always check security first
2. **Source Attribution**: Label each finding with its source
3. **Actionable Remediation**: Provide specific fix instructions
4. **Multiple Output Formats**: Support markdown, JSON, HTML
5. **Incremental Improvement**: Don't overwhelm users by surfacing every issue at once
6. **Track Over Time**: Support baseline comparisons
7. **CI/CD Integration**: Provide exit codes and JSON output
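For practice 7, a hedged sketch of the CI/CD contract: JSON on stdout, nonzero exit when the audit should fail the pipeline (the thresholds echo the success criteria below and are otherwise assumptions):
```python
import json
import sys

def ci_exit(findings: list[dict]) -> None:
    print(json.dumps(findings, indent=2))  # machine-readable output
    critical = sum(1 for f in findings if f["severity"] == "critical")
    high = sum(1 for f in findings if f["severity"] == "high")
    # Fail the build on any critical finding, or on 3+ high findings.
    sys.exit(1 if critical or high >= 3 else 0)
```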
## Report Structure
```markdown
# [Artifact] Audit Report
## Executive Summary
- Overall health score: [X/100]
- Critical findings: [count]
- High findings: [count]
- Top 3 priorities
## File Metrics
- [Relevant size/complexity metrics]
## Detailed Findings
### Critical Issues
[Grouped by category]
### High Priority Issues
[Grouped by category]
### Medium Priority Issues
[Grouped by category]
## Remediation Plan
- P0: IMMEDIATE (critical)
- P1: THIS SPRINT (high)
- P2: NEXT QUARTER (medium)
- P3: BACKLOG (low)
```
## Success Criteria Template
```markdown
A well-validated [artifact] should achieve:
- ✅ Security Score: 100/100
- ✅ Compliance Score: 80+/100
- ✅ Overall Health: 75+/100
- ✅ Zero CRITICAL findings
- ✅ < 3 HIGH findings
- ✅ [Artifact-specific criteria]
```