Initial commit

Zhongwei Li
2025-11-29 18:16:40 +08:00
commit f125e90b9f
370 changed files with 67769 additions and 0 deletions

@@ -0,0 +1,24 @@
# Changelog
## [0.1.0] - 2025-11-16
### Added
- Initial release
- Generated from 1 insight (Hook Deduplication Session Management)
- Phase 1: Choose Deduplication Strategy
- Phase 2: Implement Content-Based Deduplication
- Phase 3: Implement Hash Rotation
- Phase 4: Testing and Validation
- Code examples for bash hook implementation
- Troubleshooting section
### Features
- Content-based deduplication using SHA256 hashes
- Session-independent duplicate detection
- Efficient hash storage with rotation
- State management best practices
### Generated By
- insight-skill-generator v0.1.0
- Source category: hooks-and-events
- Original insight date: 2025-11-03

@@ -0,0 +1,51 @@
# Hook Deduplication Guide
Implement robust content-based deduplication for Claude Code hooks.
## Overview
This skill guides you through implementing SHA256 hash-based deduplication to prevent duplicate insights or data from being stored across sessions.
## When to Use
**Trigger Phrases**:
- "implement hook deduplication"
- "prevent duplicate insights in hooks"
- "content-based deduplication for hooks"
## Quick Start
```text
# Test the skill
You: "I need to add deduplication to my hook to prevent storing the same insight twice"
Claude: [Activates hook-deduplication-guide]
  - Explains content-based vs session-based strategies
  - Guides implementation of SHA256 hashing
  - Shows hash rotation to prevent file bloat
  - Provides testing validation
```
## What You'll Get
- Content-based deduplication using SHA256
- Efficient hash storage with rotation
- Testing and validation guidance
- Best practices for hook state management
## Installation
```bash
# This is an example generated by insight-skill-generator
# Copy to your skills directory if you want to use it
cp -r examples/example-generated-skill ~/.claude/skills/hook-deduplication-guide
```
## Learn More
See [SKILL.md](SKILL.md) for complete workflow documentation.
---
**Generated by**: insight-skill-generator v0.1.0
**Source**: 1 insight from hooks-and-events category

@@ -0,0 +1,342 @@
---
name: hook-deduplication-guide
description: Use PROACTIVELY when developing Claude Code hooks to implement content-based deduplication and prevent duplicate insight storage across sessions
---
# Hook Deduplication Guide
## Overview
This skill guides you through implementing robust deduplication for Claude Code hooks, using content-based hashing instead of session-based tracking. It prevents duplicate insights from being stored while still allowing multiple unique insights per session.
**Based on 1 insight**:
- Hook Deduplication Session Management (hooks-and-events, 2025-11-03)
**Key Capabilities**:
- Content-based deduplication using SHA256 hashes
- Session-independent duplicate detection
- Efficient hash storage with rotation
- State management best practices
## When to Use This Skill
**Trigger Phrases**:
- "implement hook deduplication"
- "prevent duplicate insights in hooks"
- "content-based deduplication for hooks"
- "hook state management patterns"
**Use Cases**:
- Developing new Claude Code hooks that store data
- Refactoring hooks to prevent duplicates
- Implementing efficient state management for hooks
- Debugging duplicate data issues in hooks
**Do NOT use when**:
- Creating hooks that don't store data (read-only hooks)
- Session-based deduplication is actually desired
- Hook doesn't run frequently enough to need deduplication
## Response Style
Educational and practical - explain the why behind content-based vs. session-based deduplication, then guide implementation with code examples.
---
## Workflow
### Phase 1: Choose Deduplication Strategy
**Purpose**: Determine whether content-based or session-based deduplication is appropriate.
**Steps**:
1. **Assess hook behavior**:
   - How often does the hook run? (per message, per session, per event)
   - What data is being stored? (insights, logs, metrics)
   - Is the same content likely to appear across sessions?
2. **Evaluate deduplication needs**:
   - **Content-based**: Use when the same insight/data might appear in different sessions
     - Example: Extract-explanatory-insights hook (same insight might appear in multiple conversations)
   - **Session-based**: Use when duplicates should only be prevented within a session
     - Example: Error logging (same error in different sessions should be logged)
3. **Recommend strategy**:
   - For insights/lessons-learned: Content-based (SHA256 hashing)
   - For session logs/events: Session-based (session ID tracking)
   - For unique events: No deduplication needed
**Output**: Clear recommendation on deduplication strategy.
**Common Issues**:
- **Unsure which to use**: Default to content-based for data that's meant to be unique (insights, documentation)
- **Performance concerns**: SHA256 hashing of short text is cheap; in a bash hook the cost is dominated by spawning `sha256sum`, not by the hash computation itself
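
To make the difference between the two strategies concrete, the sketch below derives the dedup key for each one. The function names `content_key` and `session_key` and the `session_id` argument are illustrative assumptions, not part of any hook API. Session-based is read here as "deduplicate the same content within one session", and `sha256sum` is assumed to be installed.

```bash
#!/usr/bin/env bash
# Illustrative sketch: the two strategies differ only in what forms the dedup key.

# Content-based: the key depends only on the content, so the same insight
# produces the same key in every session.
content_key() {
  printf '%s' "$1" | sha256sum | awk '{print $1}'
}

# Session-based: the key mixes in a (hypothetical) session ID, so the same
# content produces a different key in each session.
session_key() {
  local session_id="$1" content="$2"
  printf '%s:%s' "$session_id" "$content" | sha256sum | awk '{print $1}'
}
```

With `content_key`, a repeated insight is caught regardless of where it appears. With `session_key`, the same error logged in two sessions yields two distinct keys and is therefore stored twice, which is the behavior wanted for logs and events.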
---
### Phase 2: Implement Content-Based Deduplication
**Purpose**: Set up SHA256 hash-based deduplication with state management.
**Steps**:
1. **Create state directory**:
```bash
mkdir -p ~/.claude/state/hook-state/
```
2. **Initialize hash storage file**:
```bash
HASH_FILE="$HOME/.claude/state/hook-state/content-hashes.txt"
touch "$HASH_FILE"
```
3. **Implement hash generation**:
```bash
# Generate SHA256 hash of content
compute_content_hash() {
  local content="$1"
  # printf avoids echo's portability quirks with -n and backslashes
  printf '%s' "$content" | sha256sum | awk '{print $1}'
}
```
4. **Check for duplicates**:
```bash
# Returns 0 (success) if content is a duplicate, 1 if it is new,
# so that `if is_duplicate ...; then` takes the duplicate branch
is_duplicate() {
  local content="$1"
  local content_hash
  content_hash=$(compute_content_hash "$content")
  if grep -Fxq "$content_hash" "$HASH_FILE"; then
    return 0  # Duplicate found
  else
    return 1  # New content
  fi
}
```
5. **Store hash after processing**:
```bash
store_content_hash() {
  local content="$1"
  local content_hash=$(compute_content_hash "$content")
  echo "$content_hash" >> "$HASH_FILE"
}
```
6. **Integrate into hook**:
```bash
# In your hook script
content="extracted insight or data"

if is_duplicate "$content"; then
  # Skip - duplicate content
  echo "Duplicate detected, skipping..." >&2
  exit 0
fi

# Process new content (process_content stands for your hook's own logic)
process_content "$content"

# Store hash to prevent future duplicates
store_content_hash "$content"
```
**Output**: Working content-based deduplication in your hook.
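
The steps above can be combined into a single hook body. The following is a minimal sketch, assuming the hook receives its content on stdin (as the Phase 4 tests do) and that a duplicate check should succeed when the hash is already recorded. `run_hook` and `process_content` are hypothetical names, and the fallback to `shasum -a 256` is an assumption for systems without `sha256sum`, such as macOS.

```bash
#!/usr/bin/env bash
# Sketch of a hook body that reads content from stdin (assumed input channel).
HASH_FILE="${HASH_FILE:-$HOME/.claude/state/hook-state/content-hashes.txt}"

# Use sha256sum where available, otherwise fall back to `shasum -a 256`.
compute_content_hash() {
  if command -v sha256sum >/dev/null 2>&1; then
    printf '%s' "$1" | sha256sum | awk '{print $1}'
  else
    printf '%s' "$1" | shasum -a 256 | awk '{print $1}'
  fi
}

# Hypothetical stand-in for whatever the hook does with new content.
process_content() {
  printf 'Processing: %s\n' "$1"
}

run_hook() {
  local content content_hash
  content=$(cat)   # read all of stdin; command substitution drops trailing newlines
  mkdir -p "$(dirname "$HASH_FILE")"
  touch "$HASH_FILE"
  content_hash=$(compute_content_hash "$content")
  if grep -Fxq "$content_hash" "$HASH_FILE"; then
    echo "Duplicate detected, skipping..." >&2
    return 0
  fi
  process_content "$content"
  # Record the hash only after processing succeeded
  echo "$content_hash" >> "$HASH_FILE"
}
```

A hook script built this way would end with a bare call to `run_hook`, so that `echo "Test insight" | your_hook.sh` exercises the whole path.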
**Common Issues**:
- **Hash file grows too large**: Implement rotation (see Phase 3)
- **Missed duplicates (false negatives)**: Normalize content (whitespace, formatting) before hashing so that trivially different copies produce the same hash
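
Normalization is mentioned but not shown, so here is one possible sketch. The function name `normalize_content` and the specific rules (collapse runs of whitespace, trim the ends, leave case untouched) are assumptions; the right rules depend on what counts as "the same" content for your hook.

```bash
#!/usr/bin/env bash
# Sketch: normalize content before hashing so trivial formatting
# differences map to the same hash. The rules below are assumptions.
normalize_content() {
  # tr maps every whitespace character (space, tab, newline) to a space and
  # squeezes runs into one; sed then trims a leading or trailing space.
  printf '%s' "$1" | tr -s '[:space:]' ' ' | sed -e 's/^ //' -e 's/ $//'
}
```

To take effect, the normalized form has to be used on both sides, for example by passing `"$(normalize_content "$content")"` to both the duplicate check and the hash storage step.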
---
### Phase 3: Implement Hash Rotation
**Purpose**: Prevent hash file from growing indefinitely.
**Steps**:
1. **Set rotation limit**:
```bash
MAX_HASHES=10000 # Keep last 10,000 hashes
```
2. **Implement rotation logic**:
```bash
rotate_hash_file() {
  local hash_file="$1"
  local max_hashes="${2:-10000}"

  # Count current hashes
  local current_count=$(wc -l < "$hash_file")

  # Rotate if needed
  if [ "$current_count" -gt "$max_hashes" ]; then
    tail -n "$max_hashes" "$hash_file" > "${hash_file}.tmp"
    mv "${hash_file}.tmp" "$hash_file"
    echo "Rotated hash file: kept last $max_hashes hashes" >&2
  fi
}
```
3. **Call rotation periodically**:
```bash
# After storing new hash
store_content_hash "$content"
rotate_hash_file "$HASH_FILE" "$MAX_HASHES"
```
**Output**: Self-maintaining hash storage with bounded size.
**Common Issues**:
- **Rotation too aggressive**: Increase MAX_HASHES
- **Rotation too infrequent**: Consider checking count before every append
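
A small worked example may help show what rotation keeps and what it drops. This sketch repeats the rotation function so it runs on its own, and uses a throwaway file and a limit of 3 purely for illustration.

```bash
#!/usr/bin/env bash
# Worked example of rotation on a throwaway file with a small limit.
rotate_hash_file() {
  local hash_file="$1"
  local max_hashes="${2:-10000}"
  local current_count
  current_count=$(wc -l < "$hash_file")
  if [ "$current_count" -gt "$max_hashes" ]; then
    tail -n "$max_hashes" "$hash_file" > "${hash_file}.tmp"
    mv "${hash_file}.tmp" "$hash_file"
  fi
}

demo_file=$(mktemp)
printf '%s\n' hash1 hash2 hash3 hash4 hash5 > "$demo_file"
rotate_hash_file "$demo_file" 3
cat "$demo_file"   # the three newest entries remain: hash3, hash4, hash5
```

Because the oldest entries are the ones dropped, content whose hash has been rotated out will be treated as new if it reappears. That is the trade-off to keep in mind when choosing `MAX_HASHES`.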
---
### Phase 4: Testing and Validation
**Purpose**: Verify deduplication works correctly.
**Steps**:
1. **Test duplicate detection**:
```bash
# First run - should process
echo "Test insight" | your_hook.sh
# Check: Content was processed
# Second run - should skip
echo "Test insight" | your_hook.sh
# Check: Duplicate detected message
```
2. **Test multiple unique items**:
```bash
echo "Insight 1" | your_hook.sh # Processed
echo "Insight 2" | your_hook.sh # Processed
echo "Insight 3" | your_hook.sh # Processed
echo "Insight 1" | your_hook.sh # Skipped (duplicate)
```
3. **Verify hash file**:
```bash
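cat ~/.claude/state/hook-state/content-hashes.txt
# Should show one hash per unique insight: 3 if the file was empty before step 2
# (4 if the "Test insight" hash from step 1 is still present), never a repeated hash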
```
4. **Test rotation**:
```bash
# Generate more than MAX_HASHES entries
for i in {1..10500}; do
echo "Insight $i" | your_hook.sh
done
# Verify file size bounded
wc -l ~/.claude/state/hook-state/content-hashes.txt
# Should be ~10000, not 10500
```
**Output**: Confirmed working deduplication with proper rotation.
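
The manual checks above can also be automated. The following is a sketch of a self-checking test that points `HASH_FILE` at a temporary file, so the real state under `~/.claude` is never touched. The helpers `check`, `is_new`, and `line_count_is` are hypothetical, the dedup functions are compact restatements of the Phase 2 and 3 logic, and `MAX_HASHES=5` is chosen only to keep the rotation test fast.

```bash
#!/usr/bin/env bash
# Sketch of a self-checking test using a temporary hash file.
HASH_FILE="$(mktemp)"
MAX_HASHES=5   # small limit so rotation is quick to exercise

compute_content_hash() { printf '%s' "$1" | sha256sum | awk '{print $1}'; }
# Exit status 0 means the hash is already recorded (duplicate).
is_duplicate() { grep -Fxq "$(compute_content_hash "$1")" "$HASH_FILE"; }
store_content_hash() {
  compute_content_hash "$1" >> "$HASH_FILE"
  if [ "$(wc -l < "$HASH_FILE")" -gt "$MAX_HASHES" ]; then
    tail -n "$MAX_HASHES" "$HASH_FILE" > "${HASH_FILE}.tmp"
    mv "${HASH_FILE}.tmp" "$HASH_FILE"
  fi
}

failures=0
# check <description> <command...>: run the command and report PASS or FAIL
check() {
  local description="$1"; shift
  if "$@"; then
    echo "PASS: $description"
  else
    echo "FAIL: $description"
    failures=$((failures + 1))
  fi
}
is_new() { ! is_duplicate "$1"; }
line_count_is() { [ "$(wc -l < "$HASH_FILE")" -eq "$1" ]; }

check "first insight is new" is_new "Insight 1"
store_content_hash "Insight 1"
check "repeat is detected" is_duplicate "Insight 1"
check "different insight is new" is_new "Insight 2"
for i in 2 3 4 5 6 7; do store_content_hash "Insight $i"; done
check "file is capped at MAX_HASHES" line_count_is "$MAX_HASHES"
check "oldest hash was rotated out" is_new "Insight 1"
echo "failures: $failures"
```

Testing the functions directly rather than through the hook script keeps the run short; the 10,500-iteration loop above remains useful as an end-to-end check.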
---
## Reference Materials
- [Original Insight](data/insights-reference.md) - Full context on hook deduplication patterns
---
## Important Reminders
- **Use content-based deduplication for insights/documentation** - prevents duplicates across sessions
- **Use session-based deduplication for logs/events** - same event in different sessions is meaningful
- **Normalize content before hashing** - whitespace differences shouldn't create false negatives
- **Implement rotation** - prevent unbounded hash file growth
- **Hash storage location**: `~/.claude/state/hook-state/` (not project-specific)
- **SHA256 is fast** - no performance concerns for typical hook data
- **Test both paths** - verify both new content and duplicates work correctly
**Warnings**:
- ⚠️ **Do not use session ID alone** - it blocks additional unique insights within a session and does not catch the same insight repeated across sessions
- ⚠️ **Do not skip rotation** - hash file will grow indefinitely
- ⚠️ **Do not hash before normalization** - formatting changes will cause false negatives
---
## Best Practices
1. **Choose the Right Strategy**: Content-based for unique data, session-based for session-specific events
2. **Normalize Before Hashing**: Strip whitespace, lowercase if appropriate, consistent formatting
3. **Efficient Storage**: Use `grep -Fxq` for fast hash lookups (`-F` fixed string, `-x` whole-line match, `-q` quiet)
4. **Bounded Growth**: Implement rotation to prevent file bloat
5. **Clear Logging**: Log when duplicates are detected for debugging
6. **State Location**: Use ~/.claude/state/hook-state/ for cross-project state
---
## Troubleshooting
### Duplicates not being detected
**Symptoms**: Same content processed multiple times
**Solution**:
1. Check hash file exists and is writable
2. Verify store_content_hash is called after processing
3. Check content normalization (whitespace differences)
4. Verify grep command uses -Fxq flags
**Prevention**: Test deduplication immediately after implementation
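
The first checks in this list can be scripted. Below is a rough diagnostic sketch; `diagnose_dedup` is a hypothetical helper, not part of the skill, and it assumes hashes are computed over the raw content with `sha256sum` as in Phase 2.

```bash
#!/usr/bin/env bash
# Sketch of a diagnostic for "duplicates not being detected".
# Usage: diagnose_dedup <hash_file> <content>
# Returns 1 if the file is missing or not writable, 2 if the hash is absent.
diagnose_dedup() {
  local hash_file="$1" content="$2" content_hash
  [ -f "$hash_file" ] || { echo "hash file missing: $hash_file"; return 1; }
  [ -w "$hash_file" ] || { echo "hash file not writable: $hash_file"; return 1; }
  content_hash=$(printf '%s' "$content" | sha256sum | awk '{print $1}')
  if grep -Fxq "$content_hash" "$hash_file"; then
    echo "hash is recorded: $content_hash"
  else
    echo "hash is NOT recorded: $content_hash"
    return 2
  fi
}
```

If this reports that the hash is not recorded for content the hook has already processed, the likely causes are that the storage step never ran or that the stored content differed by whitespace or formatting.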
---
### Hash file growing too large
**Symptoms**: Hash file exceeds MAX_HASHES significantly
**Solution**:
1. Verify rotate_hash_file is called
2. Check MAX_HASHES value is reasonable
3. Manually rotate if needed: `tail -n 10000 "$HASH_FILE" > "${HASH_FILE}.tmp" && mv "${HASH_FILE}.tmp" "$HASH_FILE"`
**Prevention**: Call rotation after every hash storage
---
### False positives (new content marked as duplicate)
**Symptoms**: Different content being skipped
**Solution**:
1. Check for hash collisions (extremely unlikely with SHA256)
2. Verify content is actually different
3. Check normalization isn't too aggressive
4. Review recent hashes in file
**Prevention**: Use consistent normalization, test with diverse content
---
## Next Steps
After implementing deduplication:
1. Monitor hash file growth over time
2. Tune MAX_HASHES based on usage patterns
3. Consider adding metrics (duplicates prevented, storage size)
4. Share pattern with team for other hooks
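
For the metrics suggestion, one lightweight option is sketched below. Both `hash_file_stats` and `record_duplicate`, and the idea of a separate counter file with one timestamp line per skipped duplicate, are assumptions made for illustration rather than part of the original pattern.

```bash
#!/usr/bin/env bash
# Sketch: report simple metrics for a hash file.
# Usage: hash_file_stats <hash_file> <counter_file>
hash_file_stats() {
  local hash_file="$1" counter_file="$2"
  local hashes bytes skipped=0
  hashes=$(wc -l < "$hash_file" | tr -d ' ')
  bytes=$(wc -c < "$hash_file" | tr -d ' ')
  if [ -f "$counter_file" ]; then
    skipped=$(wc -l < "$counter_file" | tr -d ' ')
  fi
  echo "hashes=$hashes bytes=$bytes duplicates_skipped=$skipped"
}

# Call from the duplicate branch of the hook: appends one UTC timestamp line
# per skipped duplicate to the (hypothetical) counter file.
record_duplicate() {
  date -u '+%Y-%m-%dT%H:%M:%SZ' >> "$1"
}
```

Since each stored SHA256 line is 64 hex characters plus a newline, a full file of 10,000 hashes is about 650 KB, which gives a baseline for judging whether `MAX_HASHES` needs tuning.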
---
## Metadata
**Source Insights**:
- Session: abc123-session-id
- Date: 2025-11-03
- Category: hooks-and-events
- File: docs/lessons-learned/hooks-and-events/2025-11-03-hook-deduplication.md
**Skill Version**: 0.1.0
**Generated**: 2025-11-16
**Last Updated**: 2025-11-16

@@ -0,0 +1,116 @@
# Insights Reference: hook-deduplication-guide
This document contains the original insight from Claude Code's Explanatory output style that was used to create the **Hook Deduplication Guide** skill.
## Overview
**Total Insights**: 1
**Date Range**: 2025-11-03
**Categories**: hooks-and-events
**Sessions**: 1 unique session
---
## 1. Hook Deduplication Session Management
**Metadata**:
- **Date**: 2025-11-03
- **Category**: hooks-and-events
- **Session**: abc123-session-id
- **Source File**: docs/lessons-learned/hooks-and-events/2025-11-03-hook-deduplication.md
**Original Content**:
The extract-explanatory-insights hook initially used session-based deduplication, which allowed only one insight per session to be stored, so additional unique insights from the same session were dropped. It also did nothing to stop the same insight from being stored again when it appeared in a different session.
By switching to content-based deduplication using SHA256 hashing, we can:
1. **Allow multiple unique insights per session** - Different insights in the same conversation are all preserved
2. **Prevent true duplicates across sessions** - The same insight appearing in multiple conversations is stored only once
3. **Maintain efficient storage** - Hash file rotation keeps storage bounded
The implementation involves:
**Hash Generation**:
```bash
compute_content_hash() {
  local content="$1"
  echo -n "$content" | sha256sum | awk '{print $1}'
}
```
**Duplicate Detection**:
```bash
# Returns 0 (success) if content is a duplicate, 1 if it is new
is_duplicate() {
  local content="$1"
  local content_hash
  content_hash=$(compute_content_hash "$content")
  if grep -Fxq "$content_hash" "$HASH_FILE"; then
    return 0  # Duplicate
  else
    return 1  # New content
  fi
}
```
**Hash Storage with Rotation**:
```bash
store_content_hash() {
  local content="$1"
  local content_hash=$(compute_content_hash "$content")
  local max_hashes="${MAX_HASHES:-10000}"
  echo "$content_hash" >> "$HASH_FILE"
  # Rotate if file exceeds MAX_HASHES (defaults to 10000)
  local count=$(wc -l < "$HASH_FILE")
  if [ "$count" -gt "$max_hashes" ]; then
    tail -n "$max_hashes" "$HASH_FILE" > "${HASH_FILE}.tmp"
    mv "${HASH_FILE}.tmp" "$HASH_FILE"
  fi
}
```
This approach provides the best of both worlds: session independence and true deduplication based on content, not session boundaries.
---
## How This Insight Informs the Skill
### Hook Deduplication Session Management → Phase-Based Workflow
The insight's structure (problem → solution → implementation) maps directly to the skill's phases:
- **Problem Description** → Phase 1: Choose Deduplication Strategy
- Explains why session-based is insufficient
- Defines when content-based is needed
- **Solution Explanation** → Phase 2: Implement Content-Based Deduplication
- Hash generation logic
- Duplicate detection mechanism
- State file management
- **Implementation Details** → Phase 3: Implement Hash Rotation
- Rotation logic to prevent unbounded growth
- MAX_HASHES configuration
- **Code Examples** → All phases
- Bash functions extracted and integrated into workflow steps
---
## Additional Context
**Why This Insight Was Selected**:
This insight was selected for skill generation because it:
1. Provides a complete, actionable pattern
2. Includes working code examples
3. Solves a common problem in hook development
4. Is generally applicable (not project-specific)
5. Has clear benefits over the naive approach
**Quality Score**: 0.85 (high - qualified for standalone skill)
---
**Generated**: 2025-11-16
**Last Updated**: 2025-11-16

@@ -0,0 +1,15 @@
{
  "name": "hook-deduplication-guide",
  "version": "0.1.0",
  "description": "Use PROACTIVELY when developing Claude Code hooks to implement content-based deduplication and prevent duplicate insight storage across sessions",
  "type": "skill",
  "author": "Connor",
  "category": "productivity",
  "tags": [
    "hooks",
    "deduplication",
    "state-management",
    "bash",
    "generated-from-insights"
  ]
}