Initial commit
@@ -0,0 +1,286 @@
# Example: Clustering Analysis Output

This example shows what the clustering phase produces when analyzing a project's insights.

## Scenario

A project has been using the extract-explanatory-insights hook for 2 weeks, generating 12 insights across different categories.

---

## Phase 1: Discovery Summary

**Total Insights Found**: 12
**Date Range**: 2025-11-01 to 2025-11-14
**Unique Sessions**: 8
**Categories**:
- testing: 5 insights
- hooks-and-events: 3 insights
- architecture: 2 insights
- performance: 2 insights

**Preview**:
1. "Modern Testing Strategy with Testing Trophy" (testing, 2025-11-01)
2. "Hook Deduplication Session Management" (hooks-and-events, 2025-11-03)
3. "CPU Usage Prevention in Vitest" (testing, 2025-11-03)
4. "BSD awk Compatibility in Hook Scripts" (hooks-and-events, 2025-11-05)
5. "Semantic Query Priorities in React Testing Library" (testing, 2025-11-06)

---

## Phase 2: Clustering Analysis

### Cluster 1: Testing Strategy
**Size**: 3 insights
**Similarity Score**: 0.75 (high)
**Recommended Complexity**: Standard
**Recommended Pattern**: Validation

**Insights**:
1. "Modern Testing Strategy with Testing Trophy"
   - Keywords: testing, integration, unit, e2e, trophy, kent-c-dodds
   - Category: testing
   - Date: 2025-11-01
   - Length: 156 lines
   - Has code examples: Yes

2. "Semantic Query Priorities in React Testing Library"
   - Keywords: testing, react, semantic, query, getByRole, accessibility
   - Category: testing
   - Date: 2025-11-06
   - Length: 89 lines
   - Has code examples: Yes

3. "What NOT to Test - Brittle Patterns"
   - Keywords: testing, avoid, brittle, implementation-details, user-behavior
   - Category: testing
   - Date: 2025-11-08
   - Length: 67 lines
   - Has code examples: No

**Shared Keywords**: testing (3), react (2), user (2), behavior (2), semantic (2)

**Cluster Characteristics**:
- All in same category (testing)
- Temporal span: 7 days
- Common theme: User-focused testing approach
- Total code examples: 5 blocks
- Actionable items: 12

**Suggested Skill Name**: "user-focused-testing-guide"

**Suggested Description**: "Use PROACTIVELY when writing tests to ensure user-centric testing strategy following Testing Trophy methodology and React Testing Library best practices"

**Skill Structure Recommendation**:
```
SKILL.md sections:
- Overview (Testing Trophy philosophy)
- Phase 1: Query Selection (semantic queries)
- Phase 2: Test Writing (user workflows)
- Phase 3: Avoiding Brittle Tests
- Important Reminders (what NOT to test)
- Examples (from code blocks)
```

---

### Cluster 2: Hook Development
**Size**: 2 insights
**Similarity Score**: 0.68 (medium-high)
**Recommended Complexity**: Minimal to Standard
**Recommended Pattern**: Phase-based

**Insights**:
1. "Hook Deduplication Session Management"
   - Keywords: hook, deduplication, session, hash, state
   - Category: hooks-and-events
   - Date: 2025-11-03
   - Length: 124 lines
   - Has code examples: Yes (bash)

2. "BSD awk Compatibility in Hook Scripts"
   - Keywords: hook, awk, bsd, macos, compatibility, portable
   - Category: hooks-and-events
   - Date: 2025-11-05
   - Length: 78 lines
   - Has code examples: Yes (bash)

**Shared Keywords**: hook (2), bash (2), script (2), compatibility (2)

**Cluster Characteristics**:
- Same category (hooks-and-events)
- Temporal span: 2 days (high proximity)
- Common theme: Hook script development best practices
- Total code examples: 4 blocks (all bash)
- Actionable items: 8

**Suggested Skill Name**: "hook-script-development-guide"

**Suggested Description**: "Guides Claude Code hook script development with deduplication patterns and cross-platform compatibility best practices for macOS/Linux environments"

**Skill Structure Recommendation**:
```
SKILL.md sections:
- Overview (Hook development challenges)
- Phase 1: Deduplication Strategy (session vs content)
- Phase 2: Cross-Platform Compatibility (BSD awk)
- Phase 3: Testing and Validation
- Examples (bash scripts)
- Templates (hook-template.sh)
```

---

### Cluster 3: Performance Optimization
**Size**: 2 insights
**Similarity Score**: 0.62 (medium)
**Recommended Complexity**: Minimal
**Recommended Pattern**: Phase-based

**Insights**:
1. "CPU Usage Prevention in Vitest"
   - Keywords: vitest, cpu, performance, threads, watch-mode
   - Category: testing
   - Date: 2025-11-03
   - Length: 45 lines
   - Has code examples: Yes (npm scripts)

2. "Bundle Size Optimization"
   - Keywords: bundle, size, webpack, optimization, tree-shaking
   - Category: performance
   - Date: 2025-11-09
   - Length: 92 lines
   - Has code examples: Yes (webpack config)

**Shared Keywords**: performance (2), optimization (2), build (2)

**Cluster Characteristics**:
- Different categories (testing, performance)
- Temporal span: 6 days
- Common theme: Build/test performance
- Total code examples: 3 blocks
- Actionable items: 6

**Suggested Skill Name**: "build-performance-optimizer"

**Suggested Description**: "Analyzes build and test performance issues and generates optimization recommendations for Vitest test runs and webpack bundle sizes"

**Skill Structure Recommendation**:
```
SKILL.md sections:
- Overview (Performance impact on DX)
- Phase 1: Test Performance Analysis
- Phase 2: Bundle Size Analysis
- Phase 3: Optimization Implementation
- Important Reminders (check before test, monitor bundle)
- Examples (npm scripts, webpack config)
```

---

### Standalone Insights

#### Standalone 1: "Hook State Management Patterns"
**Quality Score**: 0.85 (high - qualifies for standalone skill)
**Category**: hooks-and-events
**Date**: 2025-11-07
**Length**: 134 lines
**Code Examples**: Yes (3 blocks)

**Why Standalone**:
- Doesn't cluster with other hook insights (different focus)
- High quality with comprehensive coverage
- Self-contained topic (state management)
- Multiple actionable patterns

**Suggested Skill Name**: "hook-state-manager"

**Suggested Description**: "Automates state management setup for Claude Code hooks with persistent storage, cleanup, and safe concurrency patterns"

---

#### Standalone 2: "Architecture Decision Records"
**Quality Score**: 0.82 (high - qualifies for standalone skill)
**Category**: architecture
**Date**: 2025-11-12
**Length**: 156 lines
**Code Examples**: Yes (template)

**Why Standalone**:
- Unique topic (no other architecture insights)
- High quality with complete template
- Valuable for documentation
- Industry best practice

**Suggested Skill Name**: "adr-documentation-helper"

**Suggested Description**: "Guides creation of Architecture Decision Records (ADRs) following industry standards with templates and integration with project documentation"

---

### Low-Quality Insights (Not Recommended for Skills)

#### "Git Branch Naming Convention"
**Quality Score**: 0.42 (low)
**Category**: version-control
**Reason for Exclusion**: Too simple, covered by existing conventions, no unique value

#### "TypeScript Strict Mode Benefits"
**Quality Score**: 0.38 (low)
**Category**: typescript
**Reason for Exclusion**: Common knowledge, well-documented elsewhere, not actionable enough

---

## User Decision Points

At this stage, the skill would present the following options to the user:

**Option 1: Generate All Recommended Skills** (5 skills)
- user-focused-testing-guide (Cluster 1)
- hook-script-development-guide (Cluster 2)
- build-performance-optimizer (Cluster 3)
- hook-state-manager (Standalone 1)
- adr-documentation-helper (Standalone 2)

**Option 2: Select Specific Skills**
- User picks which clusters/standalones to convert

**Option 3: Modify Clusters**
- Split large clusters
- Merge small clusters
- Recategorize insights
- Adjust complexity levels

**Option 4: Tune Thresholds and Retry**
- Increase cluster_minimum (0.6 → 0.7) for tighter clusters
- Decrease standalone_quality (0.8 → 0.7) for more standalone skills
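
How the two thresholds in Option 4 might interact can be sketched in a few lines of shell. This is an illustration only: the example does not show how the generator applies `cluster_minimum` and `standalone_quality`, so the decision order, the environment variable names, and the `classify_insight` function are all assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical threshold settings, named after the options listed above.
CLUSTER_MINIMUM="${CLUSTER_MINIMUM:-0.6}"
STANDALONE_QUALITY="${STANDALONE_QUALITY:-0.8}"

# Floating-point "a >= b" via awk, because bash arithmetic is integer-only.
score_at_least() {
  awk -v a="$1" -v b="$2" 'BEGIN { exit !(a >= b) }'
}

# Prints the assumed outcome for an insight with the given
# similarity score (first argument) and quality score (second argument).
classify_insight() {
  local similarity="$1" quality="$2"
  if score_at_least "$similarity" "$CLUSTER_MINIMUM"; then
    echo "cluster"
  elif score_at_least "$quality" "$STANDALONE_QUALITY"; then
    echo "standalone"
  else
    echo "excluded"
  fi
}
```

Under these assumptions, `classify_insight 0.62 0.50` prints `cluster` with the default thresholds, and prints `excluded` once `CLUSTER_MINIMUM` is raised to 0.7.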

---

## Proceeding to Phase 3

If the user selects "user-focused-testing-guide" for generation, the skill would proceed to Phase 3: Interactive Skill Design with the following proposal:

**Skill Design Proposal**:
- Name: `user-focused-testing-guide`
- Description: "Use PROACTIVELY when writing tests to ensure user-centric testing strategy following Testing Trophy methodology and React Testing Library best practices"
- Complexity: Standard
- Pattern: Validation
- Structure:
  - SKILL.md with validation workflow
  - data/insights-reference.md with 3 source insights
  - examples/query-examples.md with semantic query patterns
  - templates/test-checklist.md with testing checklist

The user can then customize the proposal before generation begins.

---

**This example demonstrates**:
1. How clustering groups related insights
2. What information is presented for each cluster
3. How standalone insights are identified
4. Why some insights are excluded
5. What decisions users can make
6. How the process flows into Phase 3
@@ -0,0 +1,24 @@
# Changelog

## [0.1.0] - 2025-11-16

### Added
- Initial release
- Generated from 1 insight (Hook Deduplication Session Management)
- Phase 1: Choose Deduplication Strategy
- Phase 2: Implement Content-Based Deduplication
- Phase 3: Implement Hash Rotation
- Phase 4: Testing and Validation
- Code examples for bash hook implementation
- Troubleshooting section

### Features
- Content-based deduplication using SHA256 hashes
- Session-independent duplicate detection
- Efficient hash storage with rotation
- State management best practices

### Generated By
- insight-skill-generator v0.1.0
- Source category: hooks-and-events
- Original insight date: 2025-11-03
@@ -0,0 +1,51 @@
# Hook Deduplication Guide

Implement robust content-based deduplication for Claude Code hooks.

## Overview

This skill guides you through implementing SHA256 hash-based deduplication to prevent duplicate insights or data from being stored across sessions.

## When to Use

**Trigger Phrases**:
- "implement hook deduplication"
- "prevent duplicate insights in hooks"
- "content-based deduplication for hooks"

## Quick Start

```
# Test the skill
You: "I need to add deduplication to my hook to prevent storing the same insight twice"

Claude: [Activates hook-deduplication-guide]
  - Explains content-based vs session-based strategies
  - Guides implementation of SHA256 hashing
  - Shows hash rotation to prevent file bloat
  - Provides testing validation
```

## What You'll Get

- Content-based deduplication using SHA256
- Efficient hash storage with rotation
- Testing and validation guidance
- Best practices for hook state management

## Installation

```bash
# This is an example generated by insight-skill-generator
# Copy to your skills directory if you want to use it
cp -r examples/example-generated-skill ~/.claude/skills/hook-deduplication-guide
```

## Learn More

See [SKILL.md](SKILL.md) for complete workflow documentation.

---

**Generated by**: insight-skill-generator v0.1.0
**Source**: 1 insight from hooks-and-events category
@@ -0,0 +1,342 @@
---
name: hook-deduplication-guide
description: Use PROACTIVELY when developing Claude Code hooks to implement content-based deduplication and prevent duplicate insight storage across sessions
---

# Hook Deduplication Guide

## Overview

This skill guides you through implementing robust deduplication for Claude Code hooks, using content-based hashing instead of session-based tracking. It prevents duplicate insights from being stored while still allowing multiple unique insights per session.

**Based on 1 insight**:
- Hook Deduplication Session Management (hooks-and-events, 2025-11-03)

**Key Capabilities**:
- Content-based deduplication using SHA256 hashes
- Session-independent duplicate detection
- Efficient hash storage with rotation
- State management best practices

## When to Use This Skill

**Trigger Phrases**:
- "implement hook deduplication"
- "prevent duplicate insights in hooks"
- "content-based deduplication for hooks"
- "hook state management patterns"

**Use Cases**:
- Developing new Claude Code hooks that store data
- Refactoring hooks to prevent duplicates
- Implementing efficient state management for hooks
- Debugging duplicate data issues in hooks

**Do NOT use when**:
- Creating hooks that don't store data (read-only hooks)
- Session-based deduplication is actually desired
- Hook doesn't run frequently enough to need deduplication

## Response Style

Educational and practical - explain the why behind content-based vs. session-based deduplication, then guide implementation with code examples.

---

## Workflow

### Phase 1: Choose Deduplication Strategy

**Purpose**: Determine whether content-based or session-based deduplication is appropriate.

**Steps**:

1. **Assess hook behavior**:
   - How often does the hook run? (per message, per session, per event)
   - What data is being stored? (insights, logs, metrics)
   - Is the same content likely to appear across sessions?

2. **Evaluate deduplication needs**:
   - **Content-based**: Use when the same insight/data might appear in different sessions
     - Example: Extract-explanatory-insights hook (same insight might appear in multiple conversations)
   - **Session-based**: Use when duplicates should only be prevented within a session
     - Example: Error logging (same error in different sessions should be logged)

3. **Recommend strategy**:
   - For insights/lessons-learned: Content-based (SHA256 hashing)
   - For session logs/events: Session-based (session ID tracking)
   - For unique events: No deduplication needed

**Output**: Clear recommendation on deduplication strategy.

**Common Issues**:
- **Unsure which to use**: Default to content-based for data that's meant to be unique (insights, documentation)
- **Performance concerns**: SHA256 hashing is cheap for typical hook content; most of the cost is starting the `sha256sum` process, not computing the hash
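
The difference between the two strategies comes down to what goes into the deduplication key. The sketch below is illustrative rather than part of the original insight: the function names and the `session_id:content` key format are assumptions, and the `shasum -a 256` fallback is included only because `sha256sum` is not installed on every system.

```bash
#!/usr/bin/env bash
# Portable SHA256 of a string. Assumption: either sha256sum (GNU coreutils)
# or shasum (shipped with Perl, common on macOS) is on PATH.
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    printf '%s' "$1" | sha256sum | awk '{print $1}'
  else
    printf '%s' "$1" | shasum -a 256 | awk '{print $1}'
  fi
}

# Content-based key: depends only on the content, so the same insight
# produces the same key in every session.
content_key() {
  sha256_of "$1"
}

# Session-based key (hypothetical scheme): mixes the session ID into the
# hash, so identical content is only recognized within one session.
session_key() {
  local session_id="$1" content="$2"
  sha256_of "${session_id}:${content}"
}
```

With these helpers, `content_key "same insight"` is identical across sessions, while `session_key "session-a" "same insight"` and `session_key "session-b" "same insight"` differ, which is why session-scoped keys suit logs and content keys suit insights.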

---

### Phase 2: Implement Content-Based Deduplication

**Purpose**: Set up SHA256 hash-based deduplication with state management.

**Steps**:

1. **Create state directory**:
   ```bash
   mkdir -p ~/.claude/state/hook-state/
   ```

2. **Initialize hash storage file**:
   ```bash
   HASH_FILE="$HOME/.claude/state/hook-state/content-hashes.txt"
   touch "$HASH_FILE"
   ```

3. **Implement hash generation**:
   ```bash
   # Generate SHA256 hash of content
   # printf '%s' avoids echo's option parsing and adds no trailing newline
   compute_content_hash() {
     local content="$1"
     printf '%s' "$content" | sha256sum | awk '{print $1}'
   }
   ```

4. **Check for duplicates**:
   ```bash
   # Returns 0 (success) if content is a duplicate, 1 if it is new,
   # so that `if is_duplicate "$content"; then` reads as expected
   is_duplicate() {
     local content="$1"
     local content_hash=$(compute_content_hash "$content")

     if grep -Fxq "$content_hash" "$HASH_FILE"; then
       return 0  # Duplicate found
     else
       return 1  # New content
     fi
   }
   ```

5. **Store hash after processing**:
   ```bash
   store_content_hash() {
     local content="$1"
     local content_hash=$(compute_content_hash "$content")
     echo "$content_hash" >> "$HASH_FILE"
   }
   ```

6. **Integrate into hook**:
   ```bash
   # In your hook script
   content="extracted insight or data"

   if is_duplicate "$content"; then
     # Skip - duplicate content
     echo "Duplicate detected, skipping..." >&2
     exit 0
   fi

   # Process new content (process_content stands for your hook's own logic)
   process_content "$content"

   # Store hash to prevent future duplicates
   store_content_hash "$content"
   ```

**Output**: Working content-based deduplication in your hook.

**Common Issues**:
- **Hash file grows too large**: Implement rotation (see Phase 3)
- **Missed duplicates (false negatives)**: Ensure content normalization (whitespace, formatting) before hashing
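
Content normalization is recommended throughout this guide but never shown. A minimal sketch, assuming that whitespace is the only difference worth ignoring, could look like the following; `normalize_content` is a hypothetical helper, and lowercasing is left out on purpose because case may be meaningful in code snippets.

```bash
#!/usr/bin/env bash
# Hypothetical normalizer: collapses every run of whitespace (spaces, tabs,
# newlines, carriage returns) into one space and trims both ends.
normalize_content() {
  printf '%s' "$1" \
    | tr -s '[:space:]' ' ' \
    | sed -e 's/^ //' -e 's/ $//'
}
```

A hook would then hash the normalized form, for example `content="$(normalize_content "$raw_content")"`, before calling `is_duplicate` and `store_content_hash`, so that the same text with different line wrapping maps to one hash.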

---

### Phase 3: Implement Hash Rotation

**Purpose**: Prevent hash file from growing indefinitely.

**Steps**:

1. **Set rotation limit**:
   ```bash
   MAX_HASHES=10000  # Keep last 10,000 hashes
   ```

2. **Implement rotation logic**:
   ```bash
   rotate_hash_file() {
     local hash_file="$1"
     local max_hashes="${2:-10000}"

     # Count current hashes
     local current_count=$(wc -l < "$hash_file")

     # Rotate if needed
     if [ "$current_count" -gt "$max_hashes" ]; then
       tail -n "$max_hashes" "$hash_file" > "${hash_file}.tmp"
       mv "${hash_file}.tmp" "$hash_file"
       echo "Rotated hash file: kept last $max_hashes hashes" >&2
     fi
   }
   ```

3. **Call rotation periodically**:
   ```bash
   # After storing new hash
   store_content_hash "$content"
   rotate_hash_file "$HASH_FILE" 10000
   ```

**Output**: Self-maintaining hash storage with bounded size.

**Common Issues**:
- **Rotation too aggressive**: Increase MAX_HASHES
- **Rotation too infrequent**: Consider checking count before every append
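
One trade-off with rotating after every append is that, once the file is full, each new hash rewrites the whole file. A possible variation, sketched here under the assumption that a small overshoot is acceptable, rotates only after the file exceeds the limit by a slack margin. The function name `rotate_with_slack` and the `slack` parameter are hypothetical and not part of the original insight.

```bash
#!/usr/bin/env bash
# Rotate only when the file exceeds max_hashes + slack, then trim back to
# max_hashes. The temp file is created next to the hash file so that mv
# is a rename on the same filesystem.
rotate_with_slack() {
  local hash_file="$1"
  local max_hashes="${2:-10000}"
  local slack="${3:-1000}"
  local count tmp

  # Arithmetic expansion strips the padding some wc implementations print.
  count=$(($(wc -l < "$hash_file")))

  if [ "$count" -gt $((max_hashes + slack)) ]; then
    tmp=$(mktemp "${hash_file}.XXXXXX") || return 1
    tail -n "$max_hashes" "$hash_file" > "$tmp" && mv "$tmp" "$hash_file"
  fi
}
```

With `rotate_with_slack "$HASH_FILE" 10000 1000`, the file grows to at most 11,000 lines and is rewritten roughly once per 1,000 appends instead of on every append.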

---

### Phase 4: Testing and Validation

**Purpose**: Verify deduplication works correctly.

**Steps**:

1. **Test duplicate detection**:
   ```bash
   # First run - should process
   echo "Test insight" | your_hook.sh
   # Check: Content was processed

   # Second run - should skip
   echo "Test insight" | your_hook.sh
   # Check: Duplicate detected message
   ```

2. **Test multiple unique items**:
   ```bash
   echo "Insight 1" | your_hook.sh  # Processed
   echo "Insight 2" | your_hook.sh  # Processed
   echo "Insight 3" | your_hook.sh  # Processed
   echo "Insight 1" | your_hook.sh  # Skipped (duplicate)
   ```

3. **Verify hash file**:
   ```bash
   cat ~/.claude/state/hook-state/content-hashes.txt
   # Should show 3 unique hashes (not 4)
   ```

4. **Test rotation**:
   ```bash
   # Generate more than MAX_HASHES entries
   for i in {1..10500}; do
     echo "Insight $i" | your_hook.sh
   done

   # Verify file size bounded
   wc -l ~/.claude/state/hook-state/content-hashes.txt
   # Should be ~10000, not 10500
   ```

**Output**: Confirmed working deduplication with proper rotation.
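
The checks above write to the real state directory. As a complement, the following self-contained smoke test runs the same scenario against a throwaway hash file. It is a sketch under stated assumptions: `handle_content` is a hypothetical stand-in for the hook body, `is_duplicate` here succeeds (exit status 0) when the hash is already recorded, and the `shasum` fallback is only there for systems without `sha256sum`.

```bash
#!/usr/bin/env bash
# Smoke test against a temporary hash file, so the real state under
# ~/.claude/state/hook-state/ is never touched.
HASH_FILE="$(mktemp)"

compute_content_hash() {
  # Assumption: sha256sum or shasum is available on PATH.
  if command -v sha256sum >/dev/null 2>&1; then
    printf '%s' "$1" | sha256sum | awk '{print $1}'
  else
    printf '%s' "$1" | shasum -a 256 | awk '{print $1}'
  fi
}

# Succeeds (exit 0) when the content's hash is already recorded.
is_duplicate() {
  grep -Fxq "$(compute_content_hash "$1")" "$HASH_FILE"
}

store_content_hash() {
  compute_content_hash "$1" >> "$HASH_FILE"
}

# Hypothetical stand-in for the hook body: reports what it did.
handle_content() {
  if is_duplicate "$1"; then
    echo "skipped"
  else
    store_content_hash "$1"
    echo "processed"
  fi
}

for item in "Insight 1" "Insight 2" "Insight 3" "Insight 1"; do
  printf '%s: %s\n' "$item" "$(handle_content "$item")"
done
printf 'stored hashes: %d\n' "$(($(wc -l < "$HASH_FILE")))"
```

The first three items should be reported as processed and the repeated "Insight 1" as skipped, leaving three stored hashes.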

---

## Reference Materials

- [Original Insight](data/insights-reference.md) - Full context on hook deduplication patterns

---

## Important Reminders

- **Use content-based deduplication for insights/documentation** - prevents duplicates across sessions
- **Use session-based deduplication for logs/events** - same event in different sessions is meaningful
- **Normalize content before hashing** - whitespace differences shouldn't create false negatives
- **Implement rotation** - prevent unbounded hash file growth
- **Hash storage location**: `~/.claude/state/hook-state/` (not project-specific)
- **SHA256 is fast** - no performance concerns for typical hook data
- **Test both paths** - verify both new content and duplicates work correctly

**Warnings**:
- ⚠️ **Do not use session ID alone** - it drops additional unique insights from the same session and does not stop the same insight from being stored again in a later session
- ⚠️ **Do not skip rotation** - hash file will grow indefinitely
- ⚠️ **Do not hash before normalization** - formatting changes will cause false negatives

---

## Best Practices

1. **Choose the Right Strategy**: Content-based for unique data, session-based for session-specific events
2. **Normalize Before Hashing**: Strip whitespace, lowercase if appropriate, consistent formatting
3. **Efficient Storage**: Use `grep -Fxq` for fast hash lookups (fixed-string, whole-line match, quiet)
4. **Bounded Growth**: Implement rotation to prevent file bloat
5. **Clear Logging**: Log when duplicates are detected for debugging
6. **State Location**: Use `~/.claude/state/hook-state/` for cross-project state

---

## Troubleshooting

### Duplicates not being detected

**Symptoms**: Same content processed multiple times

**Solution**:
1. Check hash file exists and is writable
2. Verify store_content_hash is called after processing
3. Check content normalization (whitespace differences)
4. Verify grep command uses -Fxq flags

**Prevention**: Test deduplication immediately after implementation
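
The first checks in this list can be scripted. The helper below is a hypothetical diagnostic, not part of the original skill: it reports whether the hash file exists and is writable, how many hashes it holds, and how many hashes appear more than once. Repeated entries usually mean content was stored without a successful duplicate check first.

```bash
#!/usr/bin/env bash
# Hypothetical diagnostic for a hash file (one SHA256 hash per line).
diagnose_hash_file() {
  local hash_file="$1"
  [ -f "$hash_file" ] || { echo "missing: $hash_file"; return 1; }
  [ -w "$hash_file" ] || { echo "not writable: $hash_file"; return 1; }
  # Number of stored hashes.
  echo "entries: $(($(wc -l < "$hash_file")))"
  # Number of distinct hashes that occur more than once.
  echo "repeated entries: $(($(sort "$hash_file" | uniq -d | wc -l)))"
}
```

A typical call would be `diagnose_hash_file "$HOME/.claude/state/hook-state/content-hashes.txt"`; a non-zero count of repeated entries points at the storing or lookup step rather than at normalization.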

---

### Hash file growing too large

**Symptoms**: Hash file exceeds MAX_HASHES significantly

**Solution**:
1. Verify rotate_hash_file is called
2. Check MAX_HASHES value is reasonable
3. Manually rotate if needed: `tail -n 10000 hashes.txt > hashes.tmp && mv hashes.tmp hashes.txt`

**Prevention**: Call rotation after every hash storage

---

### False positives (new content marked as duplicate)

**Symptoms**: Different content being skipped

**Solution**:
1. Check for hash collisions (extremely unlikely with SHA256)
2. Verify content is actually different
3. Check normalization isn't too aggressive
4. Review recent hashes in file

**Prevention**: Use consistent normalization, test with diverse content

---

## Next Steps

After implementing deduplication:
1. Monitor hash file growth over time
2. Tune MAX_HASHES based on usage patterns
3. Consider adding metrics (duplicates prevented, storage size)
4. Share pattern with team for other hooks
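
For the metrics idea in step 3, one lightweight option is an append-only event log that the hook writes to whenever it stores or skips content. The sketch below is an assumption-heavy illustration: the log path, the event names, and both function names are hypothetical and do not come from the original hook.

```bash
#!/usr/bin/env bash
# Hypothetical metrics log: one line per event, "<UTC timestamp> <event>".
# The default path is an assumption; override METRICS_FILE to change it.
METRICS_FILE="${METRICS_FILE:-$HOME/.claude/state/hook-state/dedup-metrics.log}"

# Appends one event, e.g. "stored" or "duplicate_skipped".
record_event() {
  printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" >> "$METRICS_FILE"
}

# Prints how often an event was recorded (0 if the log does not exist).
count_events() {
  local event="$1"
  [ -f "$METRICS_FILE" ] || { echo 0; return 0; }
  awk -v e="$event" '$2 == e { n++ } END { print n + 0 }' "$METRICS_FILE"
}
```

In a hook, `record_event duplicate_skipped` would go in the branch that skips duplicates and `record_event stored` after the hash is stored; `count_events duplicate_skipped` then gives the number of duplicates prevented so far.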

---

## Metadata

**Source Insights**:
- Session: abc123-session-id
- Date: 2025-11-03
- Category: hooks-and-events
- File: docs/lessons-learned/hooks-and-events/2025-11-03-hook-deduplication.md

**Skill Version**: 0.1.0
**Generated**: 2025-11-16
**Last Updated**: 2025-11-16
@@ -0,0 +1,116 @@
# Insights Reference: hook-deduplication-guide

This document contains the original insight from Claude Code's Explanatory output style that was used to create the **Hook Deduplication Guide** skill.

## Overview

**Total Insights**: 1
**Date Range**: 2025-11-03
**Categories**: hooks-and-events
**Sessions**: 1 unique session

---

## 1. Hook Deduplication Session Management

**Metadata**:
- **Date**: 2025-11-03
- **Category**: hooks-and-events
- **Session**: abc123-session-id
- **Source File**: docs/lessons-learned/hooks-and-events/2025-11-03-hook-deduplication.md

**Original Content**:

The extract-explanatory-insights hook initially used session-based deduplication, which allowed only one insight per session to be stored. This had two drawbacks: additional unique insights from the same session were dropped, and the same insight appearing in a different session was stored again as a duplicate.

By switching to content-based deduplication using SHA256 hashing, we can:

1. **Allow multiple unique insights per session** - Different insights in the same conversation are all preserved
2. **Prevent true duplicates across sessions** - The same insight appearing in multiple conversations is stored only once
3. **Maintain efficient storage** - Hash file rotation keeps storage bounded

The implementation involves:

**Hash Generation**:
```bash
compute_content_hash() {
  local content="$1"
  printf '%s' "$content" | sha256sum | awk '{print $1}'
}
```

**Duplicate Detection**:
```bash
# Returns 0 (success) if content is a duplicate, 1 if it is new
is_duplicate() {
  local content="$1"
  local content_hash=$(compute_content_hash "$content")

  if grep -Fxq "$content_hash" "$HASH_FILE"; then
    return 0  # Duplicate
  else
    return 1  # New content
  fi
}
```

**Hash Storage with Rotation**:
```bash
store_content_hash() {
  local content="$1"
  local content_hash=$(compute_content_hash "$content")
  local max_hashes="${MAX_HASHES:-10000}"
  echo "$content_hash" >> "$HASH_FILE"

  # Rotate if file exceeds MAX_HASHES
  local count=$(wc -l < "$HASH_FILE")
  if [ "$count" -gt "$max_hashes" ]; then
    tail -n "$max_hashes" "$HASH_FILE" > "${HASH_FILE}.tmp"
    mv "${HASH_FILE}.tmp" "$HASH_FILE"
  fi
}
```

This approach provides the best of both worlds: session independence and true deduplication based on content, not session boundaries.

---

## How This Insight Informs the Skill

### Hook Deduplication Session Management → Phase-Based Workflow

The insight's structure (problem → solution → implementation) maps directly to the skill's phases:

- **Problem Description** → Phase 1: Choose Deduplication Strategy
  - Explains why session-based is insufficient
  - Defines when content-based is needed

- **Solution Explanation** → Phase 2: Implement Content-Based Deduplication
  - Hash generation logic
  - Duplicate detection mechanism
  - State file management

- **Implementation Details** → Phase 3: Implement Hash Rotation
  - Rotation logic to prevent unbounded growth
  - MAX_HASHES configuration

- **Code Examples** → All phases
  - Bash functions extracted and integrated into workflow steps

---

## Additional Context

**Why This Insight Was Selected**:

This insight was selected for skill generation because it:
1. Provides a complete, actionable pattern
2. Includes working code examples
3. Solves a common problem in hook development
4. Is generally applicable (not project-specific)
5. Has clear benefits over the naive approach

**Quality Score**: 0.85 (high - qualified for standalone skill)

---

**Generated**: 2025-11-16
**Last Updated**: 2025-11-16
@@ -0,0 +1,15 @@
{
  "name": "hook-deduplication-guide",
  "version": "0.1.0",
  "description": "Use PROACTIVELY when developing Claude Code hooks to implement content-based deduplication and prevent duplicate insight storage across sessions",
  "type": "skill",
  "author": "Connor",
  "category": "productivity",
  "tags": [
    "hooks",
    "deduplication",
    "state-management",
    "bash",
    "generated-from-insights"
  ]
}