Initial commit

Zhongwei Li
2025-11-29 18:16:56 +08:00
commit 8a3d331e04
61 changed files with 11808 additions and 0 deletions


@@ -0,0 +1,247 @@
# Data Processing Skill Pattern
Use this pattern when your skill **processes, analyzes, or transforms** data to extract insights.
## When to Use
- Skill ingests data from files or APIs
- Performs analysis or transformation
- Generates insights, reports, or visualizations
- Examples: cc-insights (conversation analysis)
## Structure
### Data Flow Architecture
Define a clear data pipeline:
```
Input Sources → Processing → Storage → Query/Analysis → Output
```
Example:
```
JSONL files → Parser → SQLite + Vector DB → Search/Analytics → Reports/Dashboard
```
### Processing Modes
**Batch Processing:**
- Process all data at once
- Good for: Initial setup, complete reprocessing
- Trade-off: Slow startup, but guarantees complete data
**Incremental Processing:**
- Process only new/changed data
- Good for: Regular updates, performance
- Trade-off: Complex state tracking (a minimal sketch follows this list)
**Streaming Processing:**
- Process data as it arrives
- Good for: Real-time updates
- Trade-off: Complex implementation
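Incremental processing hinges on remembering what has already been handled. A minimal sketch, assuming JSONL inputs and a JSON state file (both names are illustrative, not from any real skill), that skips files whose modification time is unchanged:
```python
import json
import os
from pathlib import Path

STATE_FILE = Path(".processing_state.json")  # hypothetical state-file location

def load_state() -> dict:
    return json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {}

def unprocessed_files(input_dir: str):
    """Yield only files that are new or changed since the last run."""
    state = load_state()
    for path in sorted(Path(input_dir).glob("*.jsonl")):
        mtime = os.path.getmtime(path)
        if state.get(str(path)) == mtime:
            continue  # unchanged since the last run: skip
        yield path  # caller processes the file at this point
        state[str(path)] = mtime  # recorded only after the caller resumes us
    STATE_FILE.write_text(json.dumps(state))
```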
### Storage Strategy
Choose appropriate storage:
**SQLite:**
- Structured metadata
- Fast queries
- Relational data
- Good for: Indexes, aggregations
**Vector Database (ChromaDB):**
- Semantic embeddings
- Similarity search
- Good for: RAG, semantic queries
**File System:**
- Raw data
- Large blobs
- Good for: Backups, archives
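To make the combination concrete, here is a hedged sketch pairing SQLite (metadata) with ChromaDB (embeddings). It assumes chromadb >= 0.4 (the `PersistentClient` API); paths, table, and field names are illustrative:
```python
import sqlite3

import chromadb

sql = sqlite3.connect("insights.db")
sql.execute("CREATE TABLE IF NOT EXISTS conversations (id TEXT PRIMARY KEY, timestamp INTEGER)")

client = chromadb.PersistentClient(path="./chroma")
collection = client.get_or_create_collection(name="conversations")

def store(conv_id: str, timestamp: int, text: str):
    # Structured metadata goes to SQLite for fast relational queries...
    sql.execute("INSERT OR REPLACE INTO conversations VALUES (?, ?)", (conv_id, timestamp))
    sql.commit()
    # ...while the text goes to the vector store for similarity search.
    collection.add(ids=[conv_id], documents=[text], metadatas=[{"timestamp": timestamp}])

def semantic_search(query: str, n: int = 5):
    return collection.query(query_texts=[query], n_results=n)
```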
## Example: CC Insights
**Input**: Claude Code conversation JSONL files
**Processing Pipeline:**
1. JSONL Parser - Decode base64, extract messages
2. Metadata Extractor - Timestamps, files, tools
3. Embeddings Generator - Vector representations
4. Pattern Detector - Identify trends
**Storage:**
- SQLite: Conversation metadata, fast queries
- ChromaDB: Vector embeddings, semantic search
- Cache: Processed conversation data
**Query Interfaces:**
1. CLI Search - Command-line semantic search
2. Insight Generator - Pattern-based reports
3. Dashboard - Interactive web UI
**Outputs:**
- Search results with similarity scores
- Weekly activity reports
- File heatmaps
- Tool usage analytics
## Data Processing Workflow
### Phase 1: Ingestion
```markdown
1. **Discover Data Sources**
- Locate input files/APIs
- Validate accessibility
- Calculate scope (file count, size)
2. **Initial Validation**
- Check format validity
- Verify schema compliance
- Estimate processing time
3. **State Management**
- Track what's been processed
- Support incremental updates
- Handle failures gracefully
```
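A sketch of the ingestion steps, assuming JSONL sources under a root directory (the layout and sample size are illustrative):
```python
import json
from pathlib import Path

def discover_sources(root: str) -> list[Path]:
    """Locate inputs and report scope before committing to processing."""
    files = sorted(Path(root).rglob("*.jsonl"))
    total_mb = sum(f.stat().st_size for f in files) / 1e6
    print(f"Found {len(files)} files ({total_mb:.1f} MB)")
    return files

def is_valid(path: Path, sample_lines: int = 5) -> bool:
    """Cheap format check: parse only the first few lines before doing
    any expensive work on the file."""
    with path.open() as f:
        for _, line in zip(range(sample_lines), f):
            try:
                json.loads(line)
            except json.JSONDecodeError:
                return False
    return True
```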
### Phase 2: Processing
```markdown
1. **Parse/Transform**
- Read raw data
- Apply transformations
- Handle errors and edge cases
2. **Extract Features**
- Generate metadata
- Calculate metrics
- Create embeddings (if semantic search)
3. **Store Results**
- Write to database(s)
- Update indexes
- Maintain consistency
```
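For the parse-and-store step, a hedged sketch that streams a large JSONL file line by line and writes to SQLite in batched transactions (the `events` table and field names are assumptions about the schema):
```python
import json
import sqlite3
from pathlib import Path

def process_file(path: Path, db: sqlite3.Connection, batch_size: int = 1000):
    """Stream the file instead of loading it whole; insert in batches."""
    batch = []
    with path.open() as f:
        for line in f:  # streaming keeps memory flat even for huge files
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # tolerate the odd corrupt line
            batch.append((record.get("timestamp"), record.get("tool")))
            if len(batch) >= batch_size:
                with db:  # one transaction per batch keeps the DB consistent
                    db.executemany("INSERT INTO events VALUES (?, ?)", batch)
                batch.clear()
    if batch:
        with db:
            db.executemany("INSERT INTO events VALUES (?, ?)", batch)
```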
### Phase 3: Analysis
```markdown
1. **Query Interface**
- Support multiple query types
- Optimize for common patterns
- Return ranked results
2. **Pattern Detection**
- Aggregate data
- Identify trends
- Generate insights
3. **Visualization**
- Format for human consumption
- Support multiple output formats
- Interactive when possible
```
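Pattern detection can often be a plain aggregation over what Phase 2 stored. A small sketch against the illustrative `events` table from above:
```python
import sqlite3

def tool_usage_report(db: sqlite3.Connection, top_n: int = 10):
    """Aggregate stored events into a ranked usage trend."""
    rows = db.execute(
        """SELECT tool, COUNT(*) AS uses
           FROM events
           GROUP BY tool
           ORDER BY uses DESC
           LIMIT ?""",
        (top_n,),
    ).fetchall()
    for tool, uses in rows:
        print(f"{tool:20} {uses}")
```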
## Performance Characteristics
Document expected performance:
```markdown
### Performance Characteristics
- **Initial indexing**: ~1-2 minutes for 100 records
- **Incremental updates**: <5 seconds for new records
- **Search latency**: <1 second for queries
- **Report generation**: <10 seconds for standard reports
- **Memory usage**: ~200MB for 1000 records
```
## Best Practices
1. **Incremental Processing**: Don't reprocess everything on each run
2. **State Tracking**: Track what's been processed to avoid duplicates
3. **Batch Operations**: Process in batches for memory efficiency
4. **Progress Indicators**: Show progress for long operations
5. **Error Recovery**: Handle failures gracefully, resume where left off
6. **Data Validation**: Validate inputs before expensive processing
7. **Index Optimization**: Optimize databases for common queries
8. **Memory Management**: Stream large files, don't load everything
9. **Parallel Processing**: Use parallelism when possible (see the sketch after this list)
10. **Cache Wisely**: Cache expensive computations
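A sketch of practice 9, fanning CPU-bound parsing across processes, with a progress bar covering practice 4 (the worker body is a placeholder):
```python
from concurrent.futures import ProcessPoolExecutor

from tqdm import tqdm

def parse_file(path: str) -> dict:
    ...  # CPU-bound parsing work for one file (illustrative stub)

def parse_all(paths: list[str]) -> list[dict]:
    with ProcessPoolExecutor() as pool:
        # pool.map yields results as the workers finish each chunk
        return list(tqdm(pool.map(parse_file, paths), total=len(paths)))
```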
## Scripts Structure
For data processing skills, provide helper scripts:
```
scripts/
├── processor.py # Main data processing script
├── indexer.py # Build indexes/embeddings
├── query.py # Query interface (CLI)
└── generator.py # Report/insight generation
```
### Script Best Practices
```python
# Good patterns for processing scripts, combined into one runnable sketch.
# discover_items, is_already_processed, process_item, store_result, and
# process_batch stand in for your skill's own functions.
import logging

import click
from tqdm import tqdm

logger = logging.getLogger(__name__)

def chunks(seq, batch_size):
    """Yield successive fixed-size batches from a list."""
    for i in range(0, len(seq), batch_size):
        yield seq[i:i + batch_size]

# 1. Use click for the CLI
@click.command()
@click.option('--input', 'input_path', help='Input path')  # avoid shadowing input()
@click.option('--reindex', is_flag=True, help='Reprocess already-indexed items')
def process(input_path, reindex):
    """Process data from the input source."""
    items = discover_items(input_path)

    # 2. Show progress for long operations
    for item in tqdm(items, desc="Processing"):
        # 4. Support incremental updates
        if not reindex and is_already_processed(item):
            continue
        # 3. Handle errors gracefully: log, skip, and keep going
        try:
            result = process_item(item)
        except Exception as e:
            logger.error(f"Failed to process {item}: {e}")
            continue
        store_result(result)

    # 5. Use batch processing for memory-heavy steps (e.g. embeddings)
    for batch in chunks(items, batch_size=32):
        process_batch(batch)
```
## Storage Schema
Document your data schema:
```sql
-- Example SQLite schema
CREATE TABLE conversations (
id TEXT PRIMARY KEY,
timestamp INTEGER,
message_count INTEGER,
files_modified TEXT, -- JSON array
tools_used TEXT -- JSON array
);
CREATE INDEX idx_timestamp ON conversations(timestamp);
CREATE INDEX idx_files ON conversations(files_modified);
```
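The JSON-array columns round-trip through Python's `sqlite3` as TEXT, and SQLite's built-in `json_each()` (standard in recent builds) lets you query inside them. A hedged usage sketch:
```python
import json
import sqlite3

db = sqlite3.connect("insights.db")

def insert_conversation(conv_id, ts, msg_count, files, tools):
    db.execute(
        "INSERT INTO conversations VALUES (?, ?, ?, ?, ?)",
        (conv_id, ts, msg_count, json.dumps(files), json.dumps(tools)),
    )
    db.commit()

def conversations_touching(filename: str):
    # json_each() expands the stored array so we can filter on its elements.
    return db.execute(
        """SELECT c.id FROM conversations c, json_each(c.files_modified)
           WHERE json_each.value = ?""",
        (filename,),
    ).fetchall()
```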
## Output Formats
Support multiple output formats:
1. **Markdown**: Human-readable reports
2. **JSON**: Machine-readable for integration
3. **CSV**: Spreadsheet-compatible data
4. **HTML**: Styled reports with charts
5. **Interactive**: Web dashboards (optional)
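Keeping the data model separate from presentation makes formats cheap to add. A minimal sketch rendering the same results as markdown, JSON, or CSV (field names are illustrative):
```python
import csv
import io
import json

def render(results: list[dict], fmt: str = "markdown") -> str:
    if not results:
        return ""
    if fmt == "json":
        return json.dumps(results, indent=2)
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=results[0].keys())
        writer.writeheader()
        writer.writerows(results)
        return buf.getvalue()
    # default: a markdown table
    header = "| " + " | ".join(results[0].keys()) + " |"
    rule = "| " + " | ".join("---" for _ in results[0]) + " |"
    rows = ["| " + " | ".join(str(v) for v in r.values()) + " |" for r in results]
    return "\n".join([header, rule, *rows])
```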


@@ -0,0 +1,78 @@
# Mode-Based Skill Pattern
Use this pattern when your skill has **multiple distinct operating modes** based on user intent.
## When to Use
- Skill performs fundamentally different operations based on context
- Each mode has its own workflow and outputs
- User intent determines which mode to activate
- Examples: git-worktree-setup (single/batch/cleanup/list modes)
## Structure
### Quick Decision Matrix
Create a clear mapping of user requests to modes:
```
User Request → Mode → Action
───────────────────────────────────────────────────────────
"trigger phrase 1" → Mode 1 → High-level action
"trigger phrase 2" → Mode 2 → High-level action
"trigger phrase 3" → Mode 3 → High-level action
```
### Mode Detection Logic
Provide clear logic for mode selection:
```javascript
// Mode 1: [Name]
if (userMentions("keyword1", "keyword2")) {
return "mode1-name";
}
// Mode 2: [Name]
if (userMentions("keyword3", "keyword4")) {
return "mode2-name";
}
// Ambiguous - ask user
return askForClarification();
```
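A concrete version of the same logic as a Python sketch, with keyword sets per mode (all keywords are illustrative) and a `None` return standing in for the clarification path:
```python
MODE_KEYWORDS = {
    "single": {"create worktree", "new worktree"},
    "batch": {"worktrees for", "multiple worktrees"},
    "cleanup": {"remove worktree", "delete worktree"},
    "list": {"list worktrees", "show worktrees"},
}

def detect_mode(request: str) -> str | None:
    text = request.lower()
    matches = [mode for mode, kws in MODE_KEYWORDS.items()
               if any(kw in text for kw in kws)]
    if len(matches) == 1:
        return matches[0]
    return None  # zero or multiple matches: ask the user to clarify
```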
### Separate Mode Documentation
For complex skills, create separate files for each mode:
```
skill-name/
├── SKILL.md # Overview and mode detection
├── modes/
│ ├── mode1-name.md # Detailed workflow for mode 1
│ ├── mode2-name.md # Detailed workflow for mode 2
│ └── mode3-name.md # Detailed workflow for mode 3
```
## Example: Git Worktree Setup
**Modes:**
1. Single Worktree - Create one worktree
2. Batch Worktrees - Create multiple worktrees
3. Cleanup - Remove worktrees
4. List/Manage - Show worktree status
**Detection Logic:**
- "create worktree for X" → Single mode
- "create worktrees for A, B, C" → Batch mode
- "remove worktree" → Cleanup mode
- "list worktrees" → List mode
## Best Practices
1. **Clear Mode Boundaries**: Each mode should be distinct and non-overlapping
2. **Explicit Detection**: Provide clear rules for mode selection
3. **Clarification Path**: Always fall back to asking the user when intent is ambiguous
4. **Mode Independence**: Each mode should work standalone
5. **Shared Prerequisites**: Extract common validation to reduce duplication


@@ -0,0 +1,115 @@
# Phase-Based Skill Pattern
Use this pattern when your skill follows **sequential phases** that build on each other.
## When to Use
- Skill has a linear workflow with clear stages
- Each phase depends on the previous phase
- Progressive disclosure of complexity
- Examples: codebase-auditor (discovery → analysis → reporting → remediation)
## Structure
### Phase Overview
Define clear phases with dependencies:
```
Phase 1: Discovery
Phase 2: Analysis
Phase 3: Reporting
Phase 4: Action/Remediation
```
### Phase Workflow Template
```markdown
## Workflow
### Phase 1: [Name]
**Purpose**: [What this phase accomplishes]
**Steps:**
1. [Step 1]
2. [Step 2]
3. [Step 3]
**Output**: [What information is produced]
**Transition**: [When to move to next phase]
### Phase 2: [Name]
**Purpose**: [What this phase accomplishes]
**Inputs**: [Required from previous phase]
**Steps:**
1. [Step 1]
2. [Step 2]
**Output**: [What information is produced]
```
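One hedged way to realize the template in code: phases as functions where each consumes the previous phase's output, with a checkpoint before advancing and early exit after any phase (names and payloads are illustrative):
```python
def discovery() -> dict:
    return {"stack": "python", "files": 120}  # lightweight scan only

def analysis(profile: dict) -> list[dict]:
    assert "stack" in profile, "discovery output missing"  # checkpoint
    return [{"severity": "high", "issue": "example finding"}]

def reporting(findings: list[dict]) -> str:
    return f"{len(findings)} finding(s)"

def run(stop_after: str = "reporting"):
    profile = discovery()
    if stop_after == "discovery":
        return profile  # early exit: partial analysis still has value
    findings = analysis(profile)
    if stop_after == "analysis":
        return findings
    return reporting(findings)
```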
## Example: Codebase Auditor
**Phase 1: Initial Assessment** (Progressive Disclosure)
- Lightweight scan to understand codebase
- Identify tech stack and structure
- Quick health check
- **Output**: Project profile and initial findings
**Phase 2: Deep Analysis** (Load on Demand)
- Based on Phase 1, perform targeted analysis
- Code quality, security, testing, etc.
- **Output**: Detailed findings with severity
**Phase 3: Report Generation**
- Aggregate findings from Phase 2
- Calculate scores and metrics
- **Output**: Comprehensive audit report
**Phase 4: Remediation Planning**
- Prioritize findings by severity
- Generate action plan
- **Output**: Prioritized task list
## Best Practices
1. **Progressive Disclosure**: Start lightweight, go deep only when needed
2. **Clear Transitions**: Explicitly state when moving between phases
3. **Phase Independence**: Each phase should have clear inputs/outputs
4. **Checkpoint Validation**: Verify prerequisites before advancing
5. **Early Exit**: Allow stopping after any phase if the user only needs partial analysis
6. **Incremental Value**: Each phase should provide standalone value
## Phase Characteristics
### Discovery Phase
- Fast and lightweight
- Gather context and identify scope
- No expensive operations
- Output guides subsequent phases
### Analysis Phase
- Deep dive based on discovery
- Resource-intensive operations
- Parallel processing when possible
- Structured output for reporting
### Reporting Phase
- Aggregate and synthesize data
- Calculate metrics and scores
- Generate human-readable output
- Support multiple formats
### Action Phase
- Provide recommendations
- Generate implementation guidance
- Offer to perform actions
- Track completion


@@ -0,0 +1,174 @@
# Validation/Audit Skill Pattern
Use this pattern when your skill **validates, audits, or checks** artifacts against standards.
## When to Use
- Skill checks compliance against defined standards
- Detects issues and provides remediation guidance
- Generates reports with severity levels
- Examples: claude-md-auditor, codebase-auditor
## Structure
### Validation Sources
Clearly define what you're validating against:
```markdown
## Validation Sources
### 1. ✅ Official Standards
- **Source**: [Authority/documentation]
- **Authority**: Highest (requirements)
- **Examples**: [List key standards]
### 2. 💡 Best Practices
- **Source**: Community/field experience
- **Authority**: Medium (recommendations)
- **Examples**: [List practices]
### 3. 🔬 Research/Optimization
- **Source**: Academic research
- **Authority**: Medium (evidence-based)
- **Examples**: [List findings]
```
### Finding Structure
Use consistent structure for all findings:
```markdown
**Severity**: Critical | High | Medium | Low
**Category**: [Type of issue]
**Location**: [File:line or context]
**Description**: [What the issue is]
**Impact**: [Why it matters]
**Remediation**: [How to fix]
**Effort**: [Time estimate]
**Source**: Official | Community | Research
```
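In code, the same structure might be a small dataclass (a sketch; the enum-like values mirror the template rather than any real validator):
```python
from dataclasses import dataclass

@dataclass
class Finding:
    severity: str      # "critical" | "high" | "medium" | "low"
    category: str
    location: str      # e.g. "SKILL.md:42"
    description: str
    impact: str
    remediation: str
    effort: str        # e.g. "15 min"
    source: str        # "official" | "community" | "research"
```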
### Severity Levels
Define clear severity criteria:
- **Critical**: Security risk, production-blocking (fix immediately)
- **High**: Significant quality issue (fix this sprint)
- **Medium**: Moderate improvement (schedule for next quarter)
- **Low**: Minor optimization (backlog)
### Score Calculation
Provide quantitative scoring:
```
Overall Health Score (0-100):
- 90-100: Excellent
- 75-89: Good
- 60-74: Fair
- 40-59: Poor
- 0-39: Critical
Category Scores:
- Security: Should always be 100
- Compliance: Aim for 80+
- Best Practices: 70+ is good
```
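A minimal scoring sketch that deducts per finding, weighted by severity; the weights are assumptions to tune for your domain:
```python
SEVERITY_PENALTY = {"critical": 25, "high": 10, "medium": 4, "low": 1}

def health_score(severities: list[str]) -> int:
    penalty = sum(SEVERITY_PENALTY[s] for s in severities)
    return max(0, 100 - penalty)

# e.g. two high + three medium findings: 100 - (2*10 + 3*4) = 68, i.e. "Fair"
```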
## Example: CLAUDE.md Auditor
**Validation Against:**
1. Official Anthropic documentation (docs.claude.com)
2. Community best practices (field experience)
3. Academic research (LLM context optimization)
**Finding Categories:**
- Security (secrets, sensitive data)
- Official Compliance (Anthropic guidelines)
- Best Practices (community recommendations)
- Structure (organization, formatting)
**Output Modes:**
1. Audit Report - Detailed findings
2. JSON Report - Machine-readable for CI/CD
3. Refactored File - Production-ready output
## Validation Workflow
### Step 1: Discovery
- Locate target artifact(s)
- Calculate metrics (size, complexity)
- Read content for analysis
### Step 2: Analysis
Run validators in priority order:
1. Security Validation (CRITICAL)
2. Official Compliance
3. Best Practices
4. Optimization Opportunities
### Step 3: Scoring
- Calculate overall health score
- Generate category-specific scores
- Count findings by severity
### Step 4: Reporting
- Generate human-readable report
- Provide machine-readable output
- Offer remediation options
## Best Practices
1. **Prioritize Security**: Always check security first
2. **Source Attribution**: Label each finding with its source
3. **Actionable Remediation**: Provide specific fix instructions
4. **Multiple Output Formats**: Support markdown, JSON, HTML
5. **Incremental Improvement**: Don't overwhelm users by surfacing every issue at once
6. **Track Over Time**: Support baseline comparisons
7. **CI/CD Integration**: Provide exit codes and JSON output
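For practice 7, a hedged sketch of the CI/CD contract: JSON on stdout, nonzero exit when the audit should fail the pipeline (the thresholds echo the success criteria below and are otherwise assumptions):
```python
import json
import sys

def ci_exit(findings: list[dict]) -> None:
    print(json.dumps(findings, indent=2))  # machine-readable output
    critical = sum(1 for f in findings if f["severity"] == "critical")
    high = sum(1 for f in findings if f["severity"] == "high")
    # Fail the build on any critical finding, or on 3+ high findings.
    sys.exit(1 if critical or high >= 3 else 0)
```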
## Report Structure
```markdown
# [Artifact] Audit Report
## Executive Summary
- Overall health score: [X/100]
- Critical findings: [count]
- High findings: [count]
- Top 3 priorities
## File Metrics
- [Relevant size/complexity metrics]
## Detailed Findings
### Critical Issues
[Grouped by category]
### High Priority Issues
[Grouped by category]
### Medium Priority Issues
[Grouped by category]
## Remediation Plan
- P0: IMMEDIATE (critical)
- P1: THIS SPRINT (high)
- P2: NEXT QUARTER (medium)
- P3: BACKLOG (low)
```
## Success Criteria Template
```markdown
A well-validated [artifact] should achieve:
- ✅ Security Score: 100/100
- ✅ Compliance Score: 80+/100
- ✅ Overall Health: 75+/100
- ✅ Zero CRITICAL findings
- ✅ < 3 HIGH findings
- ✅ [Artifact-specific criteria]
```