# Phase 1: Insight Discovery and Parsing

**Purpose**: Locate, read, deduplicate, and structure all insights from the project's lessons-learned directory.

## Steps

### 1. Verify project structure
- Ask the user for the project root directory (default: current working directory)
- Check whether `docs/lessons-learned/` exists
- If it is not found, explain the expected structure and offer to search alternative locations
- List all categories found (testing, configuration, hooks-and-events, etc.)

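The directory check and category listing above can be sketched in a few lines. This is a minimal illustration, not part of the workflow's tooling: the function name `find_categories` is hypothetical, and it assumes each category is a direct subdirectory of `docs/lessons-learned/`.

```python
from pathlib import Path


def find_categories(project_root="."):
    """Return sorted category names, or None if docs/lessons-learned/ is missing."""
    lessons_dir = Path(project_root) / "docs" / "lessons-learned"
    if not lessons_dir.is_dir():
        # Caller should explain the expected structure or offer to search elsewhere.
        return None
    # Assumption: every subdirectory is one category; loose files are ignored.
    return sorted(p.name for p in lessons_dir.iterdir() if p.is_dir())
```

A `None` result corresponds to the "not found" branch; an empty list means the directory exists but holds no categories yet.
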
### 2. Scan and catalog insight files

**File Naming Convention**:
Files MUST follow: `YYYY-MM-DD-descriptive-slug.md`
- Date prefix for chronological sorting
- Descriptive slug (3-5 words) summarizing the insight topic
- Examples:
  - `2025-11-21-jwt-refresh-token-pattern.md`
  - `2025-11-20-vitest-mocking-best-practices.md`
  - `2025-11-19-react-testing-library-queries.md`

**Scanning**:
- Use Glob tool to find all markdown files: `docs/lessons-learned/**/*.md`
- For each file found, extract:
  - File path and category (from directory name)
  - Creation date (from filename prefix)
  - Descriptive title (from filename slug)
  - File size and line count
- Build initial inventory report

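As a rough sketch of the scanning step, the naming convention can be checked with a regular expression and the per-file fields collected with `pathlib`. The names `FILENAME_RE`, `catalog_file`, and `scan` are hypothetical. The sketch assumes the category is the file's immediate parent directory and that slugs are lowercase alphanumeric words joined by hyphens; files that break the convention are skipped here rather than reported.

```python
import re
from datetime import date
from pathlib import Path

# Assumed shape of the convention: YYYY-MM-DD-descriptive-slug.md
FILENAME_RE = re.compile(r"^(\d{4})-(\d{2})-(\d{2})-([a-z0-9]+(?:-[a-z0-9]+)*)\.md$")


def catalog_file(path):
    """Build one inventory entry, or return None if the name breaks the convention."""
    path = Path(path)
    match = FILENAME_RE.match(path.name)
    if match is None:
        return None
    year, month, day, slug = match.groups()
    try:
        created = date(int(year), int(month), int(day))
    except ValueError:
        return None  # digits present, but not a real calendar date
    text = path.read_text(encoding="utf-8")
    return {
        "path": str(path),
        "category": path.parent.name,
        "date": created.isoformat(),
        "title": slug.replace("-", " "),
        "size_bytes": path.stat().st_size,
        "line_count": len(text.splitlines()),
    }


def scan(lessons_dir):
    """Catalog every markdown file under lessons_dir, in sorted path order."""
    entries = (catalog_file(p) for p in sorted(Path(lessons_dir).rglob("*.md")))
    return [entry for entry in entries if entry is not None]
```

`Path.rglob("*.md")` walks the tree recursively, which matches the `**/*.md` pattern given above.
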
### 3. Deduplicate insights (CRITICAL)

**Why**: The extraction hook may create duplicate entries within files.

**Deduplication Algorithm**:
```python
import hashlib
import logging
import re

logger = logging.getLogger(__name__)


def normalize(text):
    """One possible normalization: collapse whitespace and lowercase."""
    return re.sub(r"\s+", " ", text).strip().lower()


def deduplicate_insights(insights):
    seen_hashes = set()
    unique_insights = []

    for insight in insights:
        # Hash the normalized title plus the first 200 characters of content.
        # hashlib digests are stable across runs; built-in hash() of str is not.
        key = normalize(insight.title + insight.content[:200])
        content_hash = hashlib.sha256(key.encode("utf-8")).hexdigest()

        if content_hash not in seen_hashes:
            seen_hashes.add(content_hash)
            unique_insights.append(insight)
        else:
            logger.info("Duplicate insight skipped: %s", insight.title)

    return unique_insights
```

**Deduplication Checks**:
- Exact title match → duplicate
- First 200 characters of content match → duplicate
- Same code blocks in same order → duplicate
- Report: "Found X insights, removed Y duplicates (Z unique)"

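The checks listed above treat any single match as a duplicate, which is stricter than hashing the title and content prefix together. One way to sketch that reading is to give each insight one key per check and drop it if any key has been seen before. The names `dedup_keys` and `deduplicate` are hypothetical, insights are assumed to be `(title, content)` pairs, and comparison is done after whitespace and case normalization, which is an assumption rather than something the text specifies.

```python
import re

FENCE = "`" * 3  # built at runtime so the example can sit inside a fenced block
CODE_BLOCK_RE = re.compile(re.escape(FENCE) + r".*?" + re.escape(FENCE), re.DOTALL)


def normalize(text):
    """Collapse whitespace and lowercase (an assumed normalization)."""
    return re.sub(r"\s+", " ", text).strip().lower()


def dedup_keys(title, content):
    """Return one key per check; insights without code blocks get no code key."""
    keys = [("title", normalize(title)), ("prefix", normalize(content[:200]))]
    blocks = tuple(normalize(block) for block in CODE_BLOCK_RE.findall(content))
    if blocks:
        keys.append(("code", blocks))  # tuple keeps the blocks in order
    return keys


def deduplicate(insights):
    """Return (unique_insights, report) for an iterable of (title, content) pairs."""
    seen, unique, total = set(), [], 0
    for title, content in insights:
        total += 1
        keys = dedup_keys(title, content)
        if any(key in seen for key in keys):
            continue  # any single matching check marks a duplicate
        seen.update(keys)
        unique.append((title, content))
    removed = total - len(unique)
    report = f"Found {total} insights, removed {removed} duplicates ({len(unique)} unique)"
    return unique, report
```

Because an empty content prefix normalizes to an empty string, insights with no body would all collide on the prefix check; a real implementation would likely want to skip that key for empty content.
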
### 4. Parse individual insights
- Read each file using Read tool
- Extract session metadata (session ID, timestamp from file headers)
- Split file content on `---` separator (insights are separated by horizontal rules)
- For each insight section:
  - Extract title (first line, often wrapped in `**bold**`)
  - Extract body content (remaining markdown)
  - Identify code blocks
  - Extract actionable items (lines starting with `- [ ]` or numbered lists)
  - Note any warnings/cautions

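The per-section extraction above might look roughly like the following. This is a sketch under stated assumptions: `parse_insights` and the regex names are hypothetical, session metadata is left out because the header format is not described here, and a `---` line inside a code block or front matter would be treated as a separator, which a fuller parser would need to guard against.

```python
import re

FENCE = "`" * 3  # built at runtime so the example can sit inside a fenced block
CODE_BLOCK_RE = re.compile(
    re.escape(FENCE) + r"(\w*)\n(.*?)" + re.escape(FENCE), re.DOTALL
)
# Unchecked checkboxes ("- [ ] ...") or numbered list items ("1. ...").
ACTION_RE = re.compile(r"^[ \t]*(?:- \[ \]|\d+\.)[ \t]+(.*)$", re.MULTILINE)
SEPARATOR_RE = re.compile(r"^---[ \t]*$", re.MULTILINE)


def parse_insights(text):
    """Split file text on horizontal rules and extract fields from each section."""
    insights = []
    for section in SEPARATOR_RE.split(text):
        section = section.strip()
        if not section:
            continue
        first_line, _, body = section.partition("\n")
        insights.append({
            # Drop surrounding bold or heading markers from the title line.
            "title": first_line.strip().strip("*#").strip(),
            "content": body.strip(),
            "code_examples": [
                {"language": lang, "code": code}
                for lang, code in CODE_BLOCK_RE.findall(body)
            ],
            "action_items": ACTION_RE.findall(body),
        })
    return insights
```
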
### 5. Apply quality filters

**Filter out low-depth insights** that are:
- Basic explanatory notes without actionable steps
- Simple definitions or concept explanations
- Single-paragraph observations

**Keep insights that have**:
- Actionable workflows (numbered steps, checklists)
- Decision frameworks (trade-offs, when to use X vs Y)
- Code patterns with explanation of WHY
- Troubleshooting guides with solutions
- Best practices with concrete examples

**Quality Score Calculation**:
```
score = 0
if has_actionable_items: score += 3
if has_code_examples: score += 2
if has_numbered_steps: score += 2
if word_count > 200: score += 1
if has_warnings_or_notes: score += 1

# Minimum score for skill consideration: 4
```

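The scoring rules above leave the detection of each signal open. The sketch below fills that in with simple text heuristics, all of which are assumptions: actionable items are taken to be unchecked checkboxes or numbered list lines (following step 4), code examples are detected by the presence of a fence, and warnings by the words "warning", "caution", or "note". The function names are hypothetical.

```python
import re

FENCE = "`" * 3  # built at runtime so the example can sit inside a fenced block
MIN_SCORE = 4  # minimum score for skill consideration, as stated above


def quality_score(content):
    """Score one insight body using the weights from the calculation above."""
    has_checkbox = re.search(r"^[ \t]*- \[ \]", content, re.MULTILINE) is not None
    has_numbered = re.search(r"^[ \t]*\d+\.[ \t]", content, re.MULTILINE) is not None
    has_code = FENCE in content
    has_warning = re.search(r"\b(warning|caution|note)\b", content, re.IGNORECASE) is not None

    score = 0
    if has_checkbox or has_numbered:  # has_actionable_items
        score += 3
    if has_code:  # has_code_examples
        score += 2
    if has_numbered:  # has_numbered_steps
        score += 2
    if len(content.split()) > 200:  # word_count > 200
        score += 1
    if has_warning:  # has_warnings_or_notes
        score += 1
    return score


def passes_filter(content, min_score=MIN_SCORE):
    """True when the insight reaches the minimum score."""
    return quality_score(content) >= min_score
```

Under this reading a numbered list counts twice, once as an actionable item and once as numbered steps, so a short numbered procedure alone clears the threshold of 4.
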
### 6. Build structured insight inventory
```
{
  id: unique_id,
  title: string,
  content: string,
  category: string,
  date: ISO_date,
  session_id: string,
  source_file: path,
  code_examples: [{ language, code }],
  action_items: [string],
  keywords: [string],
  quality_score: int,
  paragraph_count: int,
  line_count: int
}
```

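The record shape above maps naturally onto a dataclass. The following is one possible rendering, assuming Python 3.9 or later for the `list[...]` annotations. The `build_insight` helper and its ID scheme (a truncated SHA-256 of the source path and title) are hypothetical choices made for illustration; the text does not say how `unique_id` is generated.

```python
import datetime
import hashlib
import re
from dataclasses import dataclass, field


@dataclass
class CodeExample:
    language: str
    code: str


@dataclass
class Insight:
    id: str
    title: str
    content: str
    category: str
    date: datetime.date
    session_id: str
    source_file: str
    code_examples: list[CodeExample] = field(default_factory=list)
    action_items: list[str] = field(default_factory=list)
    keywords: list[str] = field(default_factory=list)
    quality_score: int = 0
    paragraph_count: int = 0
    line_count: int = 0


def build_insight(title, content, category, day, session_id, source_file):
    """Create an Insight, deriving id, paragraph_count, and line_count."""
    # Assumed ID scheme: first 12 hex characters of SHA-256 over path and title.
    digest = hashlib.sha256(f"{source_file}:{title}".encode("utf-8")).hexdigest()
    # Paragraphs are runs of text separated by at least one blank line.
    paragraphs = [p for p in re.split(r"\n\s*\n", content) if p.strip()]
    return Insight(
        id=digest[:12],
        title=title,
        content=content,
        category=category,
        date=day,
        session_id=session_id,
        source_file=source_file,
        paragraph_count=len(paragraphs),
        line_count=len(content.splitlines()),
    )
```
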
### 7. Present discovery summary
- Total insights found (before deduplication)
- Duplicates removed
- Low-quality insights filtered
- **Final count**: Unique, quality insights
- Category breakdown
- Date range (earliest to latest)
- Preview of top 5 insights by quality score

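The summary fields above can be derived from the deduplicated inventory in one pass. This sketch assumes each insight is a dict with `title`, `category`, `date` (an ISO `YYYY-MM-DD` string), and `quality_score` keys, and that the counts from the earlier steps are passed in; the function name `discovery_summary` is hypothetical.

```python
from collections import Counter


def discovery_summary(total_found, duplicates_removed, insights, min_score=4):
    """Summarize deduplicated insights; those below min_score count as filtered."""
    kept = [i for i in insights if i["quality_score"] >= min_score]
    # ISO date strings sort chronologically when sorted as plain text.
    dates = sorted(i["date"] for i in kept)
    top = sorted(kept, key=lambda i: i["quality_score"], reverse=True)[:5]
    return {
        "total_found": total_found,
        "duplicates_removed": duplicates_removed,
        "low_quality_filtered": len(insights) - len(kept),
        "final_count": len(kept),
        "by_category": dict(Counter(i["category"] for i in kept)),
        "date_range": (dates[0], dates[-1]) if dates else None,
        "top_insights": [i["title"] for i in top],
    }
```
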
## Output

Deduplicated, quality-filtered inventory of insights with metadata and categorization.

## Common Issues

- **No lessons-learned directory**: Ask whether the user wants to search elsewhere or exit
- **Empty files**: Skip and report the count of empty files
- **Malformed markdown**: Log a warning but continue parsing (best effort)
- **Missing session metadata**: Use the filename date as a fallback
- **High duplicate count**: Indicates an extraction hook bug; warn the user
- **All insights filtered as low-quality**: Lower the threshold or suggest manual curation
- **Files without descriptive names**: Suggest renaming for better organization