Initial commit

2025-11-29 18:16:40 +08:00
commit f125e90b9f
370 changed files with 67769 additions and 0 deletions
--- a/skills/meta/insight-skill-generator/workflow/phase-1-discovery.md
+++ b/skills/meta/insight-skill-generator/workflow/phase-1-discovery.md
@@ -0,0 +1,139 @@
+# Phase 1: Insight Discovery and Parsing
+
+**Purpose**: Locate, read, deduplicate, and structure all insights from the project's lessons-learned directory.
+
+## Steps
+
+### 1. Verify project structure
+- Ask user for project root directory (default: current working directory)
+- Check if `docs/lessons-learned/` exists
+- If not found, explain the expected structure and offer to search alternative locations
+- List all categories found (testing, configuration, hooks-and-events, etc.)
+
+### 2. Scan and catalog insight files
+
+**File Naming Convention**:
+Files MUST follow: `YYYY-MM-DD-descriptive-slug.md`
+- Date prefix for chronological sorting
+- Descriptive slug (3-5 words) summarizing the insight topic
+- Examples:
+  - `2025-11-21-jwt-refresh-token-pattern.md`
+  - `2025-11-20-vitest-mocking-best-practices.md`
+  - `2025-11-19-react-testing-library-queries.md`
+
+**Scanning**:
+- Use Glob tool to find all markdown files: `docs/lessons-learned/**/*.md`
+- For each file found, extract:
+  - File path and category (from directory name)
+  - Creation date (from filename prefix)
+  - Descriptive title (from filename slug)
+  - File size and line count
+- Build initial inventory report
+
+### 3. Deduplicate insights (CRITICAL)
+
+**Why**: The extraction hook may create duplicate entries within files.
+
+**Deduplication Algorithm**:
+```python
+def deduplicate_insights(insights):
+    seen_hashes = set()
+    unique_insights = []
+
+    for insight in insights:
+        # Create hash from normalized content
+        content_hash = hash(normalize(insight.title + insight.content[:200]))
+
+        if content_hash not in seen_hashes:
+            seen_hashes.add(content_hash)
+            unique_insights.append(insight)
+        else:
+            log_duplicate(insight)
+
+    return unique_insights
+```
+
+**Deduplication Checks**:
+- Exact title match → duplicate
+- First 200 chars content match → duplicate
+- Same code blocks in same order → duplicate
+- Report: "Found X insights, removed Y duplicates (Z unique)"
+
+### 4. Parse individual insights
+- Read each file using Read tool
+- Extract session metadata (session ID, timestamp from file headers)
+- Split file content on `---` separator (insights are separated by horizontal rules)
+- For each insight section:
+  - Extract title (first line, often wrapped in `**bold**`)
+  - Extract body content (remaining markdown)
+  - Identify code blocks
+  - Extract actionable items (lines starting with `- [ ]` or numbered lists)
+  - Note any warnings/cautions
+
+### 5. Apply quality filters
+
+**Filter out low-depth insights** that are:
+- Basic explanatory notes without actionable steps
+- Simple definitions or concept explanations
+- Single-paragraph observations
+
+**Keep insights that have**:
+- Actionable workflows (numbered steps, checklists)
+- Decision frameworks (trade-offs, when to use X vs Y)
+- Code patterns with explanation of WHY
+- Troubleshooting guides with solutions
+- Best practices with concrete examples
+
+**Quality Score Calculation**:
+```
+score = 0
+if has_actionable_items: score += 3
+if has_code_examples: score += 2
+if has_numbered_steps: score += 2
+if word_count > 200: score += 1
+if has_warnings_or_notes: score += 1
+
+# Minimum score for skill consideration: 4
+```
+
+### 6. Build structured insight inventory
+```
+{
+  id: unique_id,
+  title: string,
+  content: string,
+  category: string,
+  date: ISO_date,
+  session_id: string,
+  source_file: path,
+  code_examples: [{ language, code }],
+  action_items: [string],
+  keywords: [string],
+  quality_score: int,
+  paragraph_count: int,
+  line_count: int
+}
+```
+
+### 7. Present discovery summary
+- Total insights found (before deduplication)
+- Duplicates removed
+- Low-quality insights filtered
+- **Final count**: Unique, quality insights
+- Category breakdown
+- Date range (earliest to latest)
+- Preview of top 5 insights by quality score
+
+## Output
+
+Deduplicated, quality-filtered inventory of insights with metadata and categorization.
+
+## Common Issues
+
+- **No lessons-learned directory**: Ask if user wants to search elsewhere or exit
+- **Empty files**: Skip and report count of empty files
+- **Malformed markdown**: Log warning but continue parsing (best effort)
+- **Missing session metadata**: Use filename date as fallback
+- **High duplicate count**: Indicates extraction hook bug - warn user
+- **All insights filtered as low-quality**: Lower threshold or suggest manual curation
+- **Files without descriptive names**: Suggest renaming for better organization