Initial commit

Zhongwei Li
2025-11-29 18:16:51 +08:00
commit 4e8a12140c
88 changed files with 17078 additions and 0 deletions

skills/cc-insights/.gitignore vendored Normal file

@@ -0,0 +1,29 @@
# Ignore processed data and cache
.processed/
*.db
*.db-journal
# Python cache
__pycache__/
*.py[cod]
*$py.class
*.so
# Virtual environment
venv/
env/
ENV/
# IDE
.vscode/
.idea/
*.swp
*.swo
# OS
.DS_Store
Thumbs.db
# Logs
*.log
logs/

skills/cc-insights/CHANGELOG.md Normal file

@@ -0,0 +1,14 @@
# Changelog
## 0.2.0
- Refactored to Anthropic progressive disclosure pattern
- Updated description with "Use PROACTIVELY when..." format
- Extracted detailed content to modes/, workflow/, reference/ directories
## 0.1.0
- Initial skill release
- RAG-powered conversation analysis
- Semantic search and insight reports
- Optional Next.js dashboard

skills/cc-insights/README.md Normal file

@@ -0,0 +1,500 @@
# cc-insights: Claude Code Conversation Insights
Automatically process, search, and analyze your Claude Code conversation history using RAG-powered semantic search and intelligent pattern detection.
## Overview
This skill transforms your Claude Code conversations into actionable insights without any manual effort. It automatically processes conversations stored in `~/.claude/projects/`, builds a searchable knowledge base with semantic understanding, and generates insightful reports about your development patterns.
### Key Features
- 🔍 **RAG-Powered Semantic Search**: Find conversations by meaning, not just keywords
- 📊 **Automatic Insight Reports**: Detect patterns, file hotspots, and tool usage analytics
- 📈 **Activity Trends**: Understand development patterns over time
- 💡 **Knowledge Extraction**: Surface recurring topics and solutions
- 🎯 **Zero Manual Effort**: Fully automatic processing of existing conversations
- 🚀 **Fast Performance**: <1s search, <10s report generation
## Quick Start
### 1. Installation
```bash
# Navigate to the skill directory
cd .claude/skills/cc-insights
# Install Python dependencies
pip install -r requirements.txt
# Verify installation
python scripts/conversation-processor.py --help
```
### 2. Initial Setup
Process your existing conversations:
```bash
# Process all conversations for current project
python scripts/conversation-processor.py --project-name annex --verbose --stats
# Build semantic search index
python scripts/rag-indexer.py --verbose --stats
```
This one-time setup will:
- Parse all JSONL files from `~/.claude/projects/`
- Extract metadata (files, tools, topics, timestamps)
- Build SQLite database for fast queries
- Generate vector embeddings for semantic search
- Create ChromaDB index
**Time**: ~1-2 minutes for 100 conversations
### 3. Search Conversations
```bash
# Semantic search (understands meaning)
python scripts/search-conversations.py "fixing authentication bugs"
# Search by file
python scripts/search-conversations.py --file "src/auth/token.ts"
# Search by tool
python scripts/search-conversations.py --tool "Write"
# Keyword search with date filter
python scripts/search-conversations.py "refactoring" --keyword --date-from 2025-10-01
```
### 4. Generate Insights
```bash
# Weekly activity report
python scripts/insight-generator.py weekly --verbose
# File heatmap (most modified files)
python scripts/insight-generator.py file-heatmap
# Tool usage analytics
python scripts/insight-generator.py tool-usage
# Save report to file
python scripts/insight-generator.py weekly --output weekly-report.md
```
## Usage via Skill
Once set up, you can interact with the skill naturally:
```
User: "Search conversations about React performance optimization"
→ Returns top semantic matches with context
User: "Generate insights for the past week"
→ Creates comprehensive weekly report with metrics
User: "Show me files I've modified most often"
→ Generates file heatmap with recommendations
```
## Architecture
```
.claude/skills/cc-insights/
├── SKILL.md # Skill definition for Claude
├── README.md # This file
├── requirements.txt # Python dependencies
├── .gitignore # Git ignore rules
├── scripts/ # Core functionality
│ ├── conversation-processor.py # Parse JSONL, extract metadata
│ ├── rag-indexer.py # Build vector embeddings
│ ├── search-conversations.py # Search interface
│ └── insight-generator.py # Report generation
├── templates/ # Report templates
│ └── weekly-summary.md
└── .processed/ # Generated data (gitignored)
├── conversations.db # SQLite metadata
└── embeddings/ # ChromaDB vector store
├── chroma.sqlite3
└── [embedding data]
```
## Scripts Reference
### conversation-processor.py
Parse JSONL files and extract conversation metadata.
**Usage:**
```bash
python scripts/conversation-processor.py [OPTIONS]
Options:
--project-name TEXT Project to process (default: annex)
--db-path PATH Database path
--reindex Reprocess all (ignore cache)
--verbose Show detailed logs
--stats Display statistics after processing
```
**What it does:**
- Scans `~/.claude/projects/[project]/*.jsonl`
- Decodes base64-encoded conversation content
- Extracts: messages, files, tools, topics, timestamps
- Stores in SQLite with indexes for fast queries
- Tracks processing state for incremental updates
**Output:**
- SQLite database at `.processed/conversations.db`
- Processing state for incremental updates
### rag-indexer.py
Build vector embeddings for semantic search.
**Usage:**
```bash
python scripts/rag-indexer.py [OPTIONS]
Options:
--db-path PATH Database path
--embeddings-dir PATH ChromaDB directory
--model TEXT Embedding model (default: all-MiniLM-L6-v2)
--rebuild Rebuild entire index
--batch-size INT Batch size (default: 32)
--verbose Show detailed logs
--stats Display statistics
--test-search TEXT Test search with query
```
**What it does:**
- Reads conversations from SQLite
- Generates embeddings using sentence-transformers
- Stores in ChromaDB for similarity search
- Supports incremental indexing (only new conversations)
**Models:**
- `all-MiniLM-L6-v2` (default): Fast, good quality, 384 dimensions
- `all-mpnet-base-v2`: Higher quality, slower, 768 dimensions
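For orientation, the indexing step amounts to roughly the following. This is a minimal sketch, not the actual `rag-indexer.py`: the collection name, the paths, and exactly which text gets embedded are assumptions.
```python
# Minimal sketch of the embed-and-index step (run from the skill directory).
# Not the actual rag-indexer.py; collection name and embedded text are assumptions.
import sqlite3

import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")                # fast default model
client = chromadb.PersistentClient(path=".processed/embeddings")
collection = client.get_or_create_collection("conversations")  # assumed name

conn = sqlite3.connect(".processed/conversations.db")
rows = conn.execute(
    "SELECT id, first_user_message, topics FROM conversations"
).fetchall()

ids = [row[0] for row in rows]
docs = [f"{row[1] or ''} {row[2] or ''}" for row in rows]      # text to embed (assumed)

if ids:
    vectors = model.encode(docs, batch_size=32, show_progress_bar=False)
    collection.add(ids=ids, documents=docs, embeddings=vectors.tolist())
    print(f"Indexed {len(ids)} conversations")
```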
### search-conversations.py
Search conversations with semantic + metadata filters.
**Usage:**
```bash
python scripts/search-conversations.py QUERY [OPTIONS]
Options:
--semantic/--keyword Semantic (RAG) or keyword search (default: semantic)
--file TEXT Filter by file pattern
--tool TEXT Search by tool name
--date-from TEXT Start date (ISO format)
--date-to TEXT End date (ISO format)
--limit INT Max results (default: 10)
--format TEXT Output: text|json|markdown (default: text)
--verbose Show detailed logs
```
**Examples:**
```bash
# Semantic search
python scripts/search-conversations.py "authentication bugs"
# Filter by file
python scripts/search-conversations.py "React optimization" --file "src/components"
# Search by tool
python scripts/search-conversations.py --tool "Edit"
# Date range
python scripts/search-conversations.py "deployment" --date-from 2025-10-01
# JSON output for integration
python scripts/search-conversations.py "testing" --format json > results.json
```
### insight-generator.py
Generate pattern-based reports and analytics.
**Usage:**
```bash
python scripts/insight-generator.py REPORT_TYPE [OPTIONS]
Report Types:
weekly Weekly activity summary
file-heatmap File modification heatmap
tool-usage Tool usage analytics
Options:
--date-from TEXT Start date (ISO format)
--date-to TEXT End date (ISO format)
--output PATH Save to file (default: stdout)
--verbose Show detailed logs
```
**Examples:**
```bash
# Weekly report (last 7 days)
python scripts/insight-generator.py weekly
# Custom date range
python scripts/insight-generator.py weekly --date-from 2025-10-01 --date-to 2025-10-15
# File heatmap with output
python scripts/insight-generator.py file-heatmap --output heatmap.md
# Tool analytics
python scripts/insight-generator.py tool-usage
```
## Data Storage
All processed data is stored locally in `.processed/` (gitignored):
### SQLite Database (`conversations.db`)
**Tables:**
- `conversations`: Main metadata (timestamps, messages, topics)
- `file_interactions`: File-level interactions (read, write, edit)
- `tool_usage`: Tool usage counts per conversation
- `processing_state`: Tracks processed files for incremental updates
**Indexes:**
- `idx_timestamp`: Fast date-range queries
- `idx_project`: Filter by project
- `idx_file_path`: File-based searches
- `idx_tool_name`: Tool usage queries
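Because everything is plain SQLite, you can also query the metadata directly. A minimal read-only sketch (the 30-day window and output format are illustrative):
```python
# Minimal sketch: query the metadata database directly (run from the skill directory).
import json
import sqlite3

conn = sqlite3.connect("file:.processed/conversations.db?mode=ro", uri=True)
conn.row_factory = sqlite3.Row

# Files touched in the most conversations over the last 30 days
rows = conn.execute("""
    SELECT fi.file_path, COUNT(DISTINCT fi.conversation_id) AS conversations
    FROM file_interactions AS fi
    JOIN conversations AS c ON c.id = fi.conversation_id
    WHERE c.timestamp >= date('now', '-30 days')
    GROUP BY fi.file_path
    ORDER BY conversations DESC
    LIMIT 5
""").fetchall()
for row in rows:
    print(f"{row['file_path']}: {row['conversations']} conversations")

# Topics are stored as JSON arrays in the conversations table
for row in conn.execute("SELECT topics FROM conversations LIMIT 3"):
    print(json.loads(row["topics"] or "[]"))
```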
### ChromaDB Vector Store (`embeddings/`)
**Contents:**
- Vector embeddings (384-dimensional by default)
- Document text for retrieval
- Metadata for filtering
- HNSW index for fast similarity search
**Performance:**
- <1 second for semantic search
- Handles 10,000+ conversations efficiently
- ~100MB per 1,000 conversations
## Performance
| Operation | Time | Notes |
|-----------|------|-------|
| Initial processing (100 convs) | ~30s | One-time setup |
| Initial indexing (100 convs) | ~60s | One-time setup |
| Incremental processing | <5s | Only new conversations |
| Semantic search | <1s | Top 10 results |
| Keyword search | <0.1s | SQLite FTS |
| Weekly report generation | <10s | Includes visualizations |
## Troubleshooting
### "Database not found"
**Problem:** Scripts can't find `conversations.db`
**Solution:**
```bash
# Run processor first
python scripts/conversation-processor.py --project-name annex --verbose
```
### "No conversations found"
**Problem:** Project name doesn't match or no JSONL files
**Solution:**
```bash
# Check project directories
ls ~/.claude/projects/
# Use correct project name (may be encoded)
python scripts/conversation-processor.py --project-name [actual-name] --verbose
```
### "ImportError: sentence_transformers"
**Problem:** Dependencies not installed
**Solution:**
```bash
# Install requirements
pip install -r requirements.txt
# Or individually
pip install sentence-transformers chromadb jinja2 click python-dateutil
```
### "Slow embedding generation"
**Problem:** Large number of conversations
**Solution:**
```bash
# Use smaller batch size
python scripts/rag-indexer.py --batch-size 16
# Or make sure you're using the fast default model (all-MiniLM-L6-v2) rather than all-mpnet-base-v2
python scripts/rag-indexer.py --model all-MiniLM-L6-v2
```
### "Out of memory"
**Problem:** Too many conversations processed at once
**Solution:**
```bash
# Smaller batch size
python scripts/rag-indexer.py --batch-size 8
# Or process the history in stages (e.g., temporarily move older JSONL files
# out of ~/.claude/projects/<project>/ and add them back in batches)
```
## Incremental Updates
The system automatically handles incremental updates:
1. **Conversation Processor**: Tracks file hashes in `processing_state` table
- Only reprocesses changed files
- Detects new JSONL files automatically
2. **RAG Indexer**: Checks ChromaDB for existing IDs
- Only indexes new conversations
- Skips already-embedded conversations
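The change detection behind step 1 amounts to hashing each JSONL file and comparing it with the hash stored in `processing_state`. A minimal sketch of that check:
```python
# Minimal sketch of the change-detection check used for incremental updates.
import hashlib
import sqlite3
from pathlib import Path

def file_sha256(path: Path) -> str:
    """Hash a file in chunks so large JSONL files never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

def needs_processing(conn: sqlite3.Connection, path: Path) -> bool:
    """A file is (re)processed only if it was never seen or its content changed."""
    row = conn.execute(
        "SELECT file_hash FROM processing_state WHERE file_path = ?",
        (str(path),),
    ).fetchone()
    return row is None or row[0] != file_sha256(path)
```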
**Recommended workflow:**
```bash
# Daily/weekly: Run both for new conversations
python scripts/conversation-processor.py --project-name annex
python scripts/rag-indexer.py
# Takes <5s if only a few new conversations
```
## Integration Examples
### Search from command line
```bash
# Quick search function in .bashrc or .zshrc
cc-search() {
python ~/.claude/skills/cc-insights/scripts/search-conversations.py "$@"
}
# Usage
cc-search "authentication bugs"
```
### Generate weekly report automatically
```bash
# Add to crontab for weekly reports
0 9 * * MON cd ~/.claude/skills/cc-insights && python scripts/insight-generator.py weekly --output ~/reports/weekly-$(date +\%Y-\%m-\%d).md
```
### Export data for external tools
```bash
# Export to JSON
python scripts/search-conversations.py "testing" --format json | jq
# Export metadata
sqlite3 -json .processed/conversations.db "SELECT * FROM conversations" > export.json
```
## Privacy & Security
- **Local-only**: All data stays on your machine
- **No external APIs**: Embeddings generated locally
- **Project-scoped**: Only accesses current project
- **Gitignored**: `.processed/` excluded from version control
- **Sensitive data**: Review before sharing reports (may contain secrets)
## Requirements
### Python Dependencies
- `sentence-transformers>=2.2.0` - Semantic embeddings
- `chromadb>=0.4.0` - Vector database
- `jinja2>=3.1.0` - Template engine
- `click>=8.1.0` - CLI framework
- `python-dateutil>=2.8.0` - Date utilities
### System Requirements
- Python 3.8+
- 500MB disk space (for 1,000 conversations)
- 2GB RAM (for embedding generation)
## Limitations
- **Read-only**: Analyzes existing conversations, doesn't modify them
- **Single project**: Designed for per-project insights (not cross-project)
- **Static analysis**: Analyzes saved conversations, not real-time
- **Embedding quality**: Good but not GPT-4 level (local models)
- **JSONL format**: Depends on Claude Code's internal storage format
## Future Enhancements
Potential additions (not currently implemented):
- [ ] Cross-project analytics dashboard
- [ ] AI-powered summarization with LLM
- [ ] Slack/Discord integration for weekly reports
- [ ] Git commit correlation
- [ ] VS Code extension
- [ ] Web dashboard (Next.js)
- [ ] Confluence/Notion export
- [ ] Custom embedding models
## FAQ
**Q: How often should I rebuild the index?**
A: Never, unless changing models. Use incremental updates.
**Q: Can I change the embedding model?**
A: Yes, use `--model` flag with rag-indexer.py, then `--rebuild`.
**Q: Does this work with incognito mode?**
A: No, incognito conversations aren't saved to JSONL files.
**Q: Can I share reports with my team?**
A: Yes, but review for sensitive information first (API keys, secrets).
**Q: What if Claude Code changes the JSONL format?**
A: The processor may need updates. File an issue if parsing breaks.
**Q: Can I delete old conversations?**
A: Yes, remove JSONL files and run `--reindex` to rebuild.
## Contributing
Contributions welcome! Areas to improve:
- Additional report templates
- Better pattern detection algorithms
- Performance optimizations
- Web dashboard implementation
- Documentation improvements
## License
MIT License - See repository root for details
## Support
For issues or questions:
1. Check this README and SKILL.md
2. Review script `--help` output
3. Run with `--verbose` to see detailed logs
4. Check `.processed/logs/` if created
5. Open an issue in the repository
---
**Built for Connor's annex project**
*Zero-effort conversation intelligence*

skills/cc-insights/SKILL.md Normal file

@@ -0,0 +1,120 @@
---
name: cc-insights
description: Use PROACTIVELY when searching past Claude Code conversations, analyzing development patterns, or generating activity reports. Automatically processes conversation history from the project, enables RAG-powered semantic search, and generates insight reports with pattern detection. Provides optional dashboard for visualization. Not for real-time analysis or cross-project searches.
---
# Claude Code Insights
Unlock the hidden value in your Claude Code conversation history through automatic processing, semantic search, and intelligent insight generation.
## Overview
This skill automatically analyzes your project's Claude Code conversations (stored in `~/.claude/projects/[project]/*.jsonl`) to provide:
- **RAG-Powered Semantic Search**: Find conversations by meaning, not just keywords
- **Automatic Insight Reports**: Pattern detection, file hotspots, tool usage analytics
- **Activity Trends**: Understand your development patterns over time
- **Knowledge Extraction**: Surface recurring topics, solutions, and best practices
- **Zero Manual Effort**: Fully automatic processing of existing conversations
## When to Use This Skill
**Trigger Phrases**:
- "Find conversations about [topic]"
- "Generate weekly insights report"
- "What files do I modify most often?"
- "Launch the insights dashboard"
- "Export insights as [format]"
**Use Cases**:
- Search past conversations by topic or file
- Generate activity reports and insights
- Understand development patterns over time
- Extract knowledge and recurring solutions
- Visualize activity with interactive dashboard
**NOT for**:
- Real-time conversation analysis (analyzes history only)
- Conversations from other projects (project-specific)
- Manual conversation logging (automatic only)
## Response Style
**Informative and Visual**: Present search results with relevance scores and snippets. Generate reports with clear metrics and ASCII visualizations. Offer to save or export results.
## Mode Selection
| User Request | Mode | Reference |
|--------------|------|-----------|
| "Find conversations about X" | Search | `modes/mode-1-search.md` |
| "Generate insights report" | Insights | `modes/mode-2-insights.md` |
| "Launch dashboard" | Dashboard | `modes/mode-3-dashboard.md` |
| "Export as JSON/CSV/HTML" | Export | `modes/mode-4-export.md` |
## Mode Overview
### Mode 1: Search Conversations
Find past conversations using semantic search (by meaning) or metadata search (by files/tools).
**Details**: `modes/mode-1-search.md`
### Mode 2: Generate Insights
Analyze patterns and generate reports with file hotspots, tool usage, and knowledge highlights.
**Details**: `modes/mode-2-insights.md`
### Mode 3: Interactive Dashboard
Launch a Next.js web dashboard for rich visualization and exploration.
**Details**: `modes/mode-3-dashboard.md`
### Mode 4: Export and Integration
Export insights as Markdown, JSON, CSV, or HTML for sharing and integration.
**Details**: `modes/mode-4-export.md`
## Initial Setup
**First time usage**:
1. Install dependencies: `pip install -r requirements.txt`
2. Run initial processing (automatic on first use)
3. Build embeddings (one-time, ~1-2 min)
4. Ready to search and analyze!
**What happens automatically**:
- Scans `~/.claude/projects/[current-project]/*.jsonl`
- Extracts and indexes conversation metadata
- Builds vector embeddings for semantic search
- Creates SQLite database for fast queries
## Important Reminders
- **Automatic processing**: Skill updates index on each use (incremental)
- **First run is slow**: Embedding creation takes 1-2 minutes
- **Project-specific**: Analyzes only current project's conversations
- **Dashboard requires Node.js**: v18+ for the Next.js dashboard
- **ChromaDB for search**: Vector similarity search for semantic queries
## Limitations
- Only analyzes JSONL conversation files from Claude Code
- Requires sentence-transformers for embedding creation
- Dashboard is local only (localhost:3000)
- Large conversation histories may take longer to process initially
## Reference Materials
| Resource | Purpose |
|----------|---------|
| `modes/*.md` | Detailed mode instructions |
| `reference/troubleshooting.md` | Common issues and fixes |
| `scripts/` | Processing and indexing scripts |
| `dashboard/` | Next.js dashboard application |
## Success Criteria
- [ ] Conversations processed and indexed
- [ ] Embeddings built for semantic search
- [ ] Search returns relevant results
- [ ] Insights reports generated correctly
- [ ] Dashboard launches and displays data
---
**Tech Stack**: Python (processing), SQLite (metadata), ChromaDB (vectors), Next.js (dashboard)

skills/cc-insights/modes/mode-1-search.md Normal file

@@ -0,0 +1,68 @@
# Mode 1: Search Conversations
**When to use**: Find specific past conversations
## Trigger Phrases
- "Find conversations about React performance optimization"
- "Search for times I fixed authentication bugs"
- "Show me conversations that modified Auth.tsx"
- "What conversations mention TypeScript strict mode?"
## Process
1. User asks to search for a topic or file
2. Skill performs RAG semantic search
3. Returns ranked results with context snippets
4. Optionally show full conversation details
## Search Types
### Semantic Search (by meaning)
```
User: "Find conversations about fixing bugs related to user authentication"
Skill: [Performs RAG search]
Found 3 conversations:
1. "Debug JWT token expiration" (Oct 24)
2. "Fix OAuth redirect loop" (Oct 20)
3. "Implement session timeout handling" (Oct 18)
```
### Metadata Search (by files/tools)
```
User: "Show conversations that modified src/auth/token.ts"
Skill: [Queries SQLite metadata]
Found 5 conversations touching src/auth/token.ts:
1. "Implement token refresh logic" (Oct 25)
2. "Add token validation" (Oct 22)
...
```
### Time-based Search
```
User: "What did I work on last week?"
Skill: [Queries by date range]
Last week (Oct 19-25) you had 12 conversations:
- 5 about authentication features
- 3 about bug fixes
- 2 about testing
- 2 about refactoring
```
## Output Format
```
Found 5 conversations about "React performance optimization":
1. [Similarity: 0.89] "Optimize UserProfile re-renders" (Oct 25, 2025)
Files: src/components/UserProfile.tsx, src/hooks/useUser.ts
Snippet: "...implemented useMemo to prevent unnecessary re-renders..."
2. [Similarity: 0.82] "Fix dashboard performance issues" (Oct 20, 2025)
Files: src/pages/Dashboard.tsx
Snippet: "...React.memo wrapper reduced render count by 60%..."
[View full conversations? Type the number]
```
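Under the hood, this mode is essentially a ChromaDB similarity query over locally generated embeddings. A minimal sketch (the embeddings path and collection name are assumptions; `search-conversations.py` layers metadata filters and formatting on top):
```python
# Minimal sketch of a semantic query against the local vector store.
import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.PersistentClient(path=".processed/embeddings")   # assumed path
collection = client.get_or_create_collection("conversations")      # assumed name

query = "fixing bugs related to user authentication"
results = collection.query(
    query_embeddings=[model.encode(query).tolist()],
    n_results=5,
)

for conv_id, distance in zip(results["ids"][0], results["distances"][0]):
    # Lower distance means a closer match; the skill presents this as a similarity score.
    print(f"{conv_id}: distance {distance:.3f}")
```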

skills/cc-insights/modes/mode-2-insights.md Normal file

@@ -0,0 +1,75 @@
# Mode 2: Generate Insights
**When to use**: Understand patterns and trends
## Trigger Phrases
- "Generate weekly insights report"
- "Show me my most active files this month"
- "What patterns do you see in my conversations?"
- "Create a project summary report"
## Process
1. User asks for insights on a timeframe
2. Skill analyzes metadata and patterns
3. Creates markdown report with visualizations
4. Offers to save report to file
## Report Sections
- **Executive Summary**: Key metrics
- **Activity Timeline**: Conversations over time
- **File Hotspots**: Most modified files
- **Tool Usage Breakdown**: Which tools you use most
- **Topic Clusters**: Recurring themes
- **Knowledge Highlights**: Key solutions and learnings
## Example Output
```markdown
# Weekly Insights (Oct 19-25, 2025)
## Overview
- 12 conversations
- 8 active days
- 23 files modified
- 45 tool uses
## Top Files
1. src/auth/token.ts (5 modifications)
2. src/components/Login.tsx (3 modifications)
3. src/api/auth.ts (3 modifications)
## Activity Pattern
Mon: ████████ 4 conversations
Tue: ██████ 3 conversations
Wed: ██████ 3 conversations
Thu: ████ 2 conversations
Fri: ████ 2 conversations
## Key Topics
- Authentication (6 conversations)
- Testing (3 conversations)
- Bug fixes (2 conversations)
## Knowledge Highlights
- Implemented JWT refresh token pattern
- Added React Testing Library for auth components
- Fixed OAuth redirect edge case
[Save report to file? Y/n]
```
## File-Centric Analysis
```
# File Hotspots (All Time)
🔥🔥🔥 src/auth/token.ts (15 conversations)
🔥🔥 src/components/Login.tsx (9 conversations)
🔥🔥 src/api/auth.ts (8 conversations)
🔥 src/hooks/useAuth.ts (6 conversations)
Insight: Authentication module is your most active area.
Consider: Review token.ts for refactoring opportunities.
```
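The activity bars in these reports are plain text scaled against the busiest day. A minimal sketch of how such a chart can be derived from the metadata database (the 7-day window and 20-character bar width are illustrative):
```python
# Minimal sketch: conversations-per-day bars, like the Activity Pattern above.
import sqlite3

conn = sqlite3.connect(".processed/conversations.db")
rows = conn.execute("""
    SELECT DATE(timestamp) AS day, COUNT(*) AS n
    FROM conversations
    WHERE timestamp >= date('now', '-7 days')
    GROUP BY day
    ORDER BY day
""").fetchall()

peak = max((n for _, n in rows), default=1)
for day, n in rows:
    bar = "█" * max(1, round(n / peak * 20))   # scale each bar to at most 20 chars
    print(f"{day}  {bar} {n}")
```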

skills/cc-insights/modes/mode-3-dashboard.md Normal file

@@ -0,0 +1,69 @@
# Mode 3: Interactive Dashboard
**When to use**: Rich visual exploration and ongoing monitoring
## Trigger Phrases
- "Launch the insights dashboard"
- "Start the visualization server"
- "Show me the interactive insights"
## Process
1. User asks to start the dashboard
2. Skill launches Next.js dev server
3. Opens browser to http://localhost:3000
4. Provides real-time data from SQLite + ChromaDB
## Dashboard Pages
### Home
- Timeline of recent conversations
- Activity stats and quick metrics
- Summary cards
### Search
- Interactive semantic + keyword search interface
- Real-time results
- Filter by date, files, tools
### Insights
- Auto-generated reports with interactive charts
- Trend visualizations
- Pattern detection results
### Files
- File-centric view of all conversations
- Click to see all conversations touching a file
- Modification frequency heatmap
### Analytics
- Deep-dive into patterns and trends
- Tool usage statistics
- Activity patterns by time of day/week
## Tech Stack
- **Framework**: Next.js 15 with React Server Components
- **Styling**: Tailwind CSS
- **Charts**: Recharts
- **Data**: SQLite + ChromaDB
- **URL**: http://localhost:3000
## Starting the Dashboard
```bash
# Navigate to dashboard directory
cd ~/.claude/skills/cc-insights/dashboard
# Install dependencies (first time only)
npm install
# Start development server
npm run dev
```
Then open http://localhost:3000 in your browser.
## Stopping the Dashboard
Press `Ctrl+C` in the terminal or close the terminal window.

skills/cc-insights/modes/mode-4-export.md Normal file

@@ -0,0 +1,68 @@
# Mode 4: Export and Integration
**When to use**: Share insights or integrate with other tools
## Trigger Phrases
- "Export weekly insights as markdown"
- "Save conversation metadata as JSON"
- "Generate HTML report for sharing"
## Process
1. User asks to export in a specific format
2. Skill generates formatted output
3. Saves to specified location
## Export Formats
### Markdown
Human-readable reports with formatting.
```bash
Export location: ./insights/weekly-report.md
```
### JSON
Machine-readable data for integration with other tools.
```json
{
"period": "2025-10-19 to 2025-10-25",
"conversations": 12,
"files_modified": 23,
"tool_uses": 45,
"top_files": [
{"path": "src/auth/token.ts", "count": 5},
{"path": "src/components/Login.tsx", "count": 3}
],
"topics": ["authentication", "testing", "bug fixes"]
}
```
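A minimal sketch of how a JSON export along these lines could be assembled from the metadata database (field names follow the example above; the skill's actual export logic may differ):
```python
# Minimal sketch: build a JSON summary for a date range from conversations.db.
import json
import sqlite3

DATE_FROM, DATE_TO = "2025-10-19", "2025-10-25"
conn = sqlite3.connect(".processed/conversations.db")

conversations = conn.execute(
    "SELECT COUNT(*) FROM conversations WHERE DATE(timestamp) BETWEEN ? AND ?",
    (DATE_FROM, DATE_TO),
).fetchone()[0]

top_files = conn.execute("""
    SELECT fi.file_path, COUNT(DISTINCT fi.conversation_id) AS n
    FROM file_interactions AS fi
    JOIN conversations AS c ON c.id = fi.conversation_id
    WHERE DATE(c.timestamp) BETWEEN ? AND ?
    GROUP BY fi.file_path ORDER BY n DESC LIMIT 5
""", (DATE_FROM, DATE_TO)).fetchall()

report = {
    "period": f"{DATE_FROM} to {DATE_TO}",
    "conversations": conversations,
    "top_files": [{"path": path, "count": n} for path, n in top_files],
}
print(json.dumps(report, indent=2))
```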
### CSV
Activity data for spreadsheets.
```csv
date,conversation_count,files_modified,tool_uses
2025-10-19,4,8,12
2025-10-20,3,6,9
...
```
### HTML
Standalone report with styling for sharing.
```html
<!-- Self-contained report with inline CSS -->
<!-- Can be opened in any browser -->
```
## Example Usage
```
User: "Export this month's insights as JSON for my dashboard"
Skill: [Generates JSON report]
Exported to: ./insights/october-2025.json
Contains: 45 conversations, 89 files, 156 tool uses
```

skills/cc-insights/reference/troubleshooting.md Normal file

@@ -0,0 +1,84 @@
# Troubleshooting Guide
## No conversations found
**Symptoms**: Skill reports 0 conversations
**Solution**:
1. Verify you're in a project with conversation history
2. Check `~/.claude/projects/` for your project folder
3. Ensure JSONL files exist in the project folder
4. Run initial processing if first time
---
## Search returns no results
**Symptoms**: Semantic search finds nothing
**Solution**:
1. Check embeddings were built (look for ChromaDB folder)
2. Rebuild embeddings: skill will do this automatically
3. Try broader search terms
4. Use metadata search for specific files
---
## Dashboard won't start
**Symptoms**: Error when launching dashboard
**Solution**:
1. Check Node.js is installed (v18+)
2. Run `npm install` in dashboard directory
3. Check port 3000 is available
4. Find the process holding the port (`lsof -i :3000`) and kill it if needed
---
## Slow processing
**Symptoms**: Initial setup takes very long
**Solution**:
1. First-time embedding creation is slow (1-2 min normal)
2. Subsequent runs use incremental updates (fast)
3. For very large history, consider limiting date range
---
## Missing dependencies
**Symptoms**: Import errors when running
**Solution**:
```bash
pip install -r requirements.txt
```
Required packages:
- sentence-transformers
- chromadb
- jinja2
- click
- python-dateutil
- sqlite3 (built-in)
---
## Embeddings out of date
**Symptoms**: New conversations not appearing in search
**Solution**:
1. Skill automatically updates on each use
2. Force rebuild: delete ChromaDB folder and rerun
3. Check incremental processing completed
---
## Database locked
**Symptoms**: SQLite errors about locked database
**Solution**:
1. Close other processes using the database
2. Close the dashboard if running
3. Wait a moment and retry
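If you only need to read while another process holds the database (for example, the dashboard), a read-only connection with a longer busy timeout usually avoids the error. A minimal sketch:
```python
# Minimal sketch: read the database while another process (e.g. the dashboard) has it open.
import sqlite3

conn = sqlite3.connect(
    "file:.processed/conversations.db?mode=ro",  # read-only: never takes a write lock
    uri=True,
    timeout=30,  # wait up to 30 seconds instead of failing immediately on a lock
)
count = conn.execute("SELECT COUNT(*) FROM conversations").fetchone()[0]
print(f"{count} conversations indexed")
```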

skills/cc-insights/requirements.txt Normal file

@@ -0,0 +1,16 @@
# Core dependencies for cc-insights skill
# Sentence transformers for semantic embeddings
sentence-transformers>=2.2.0
# ChromaDB for vector database
chromadb>=0.4.0
# Template engine for reports
jinja2>=3.1.0
# CLI framework
click>=8.1.0
# Date utilities
python-dateutil>=2.8.0

skills/cc-insights/scripts/conversation-processor.py Normal file

@@ -0,0 +1,634 @@
#!/usr/bin/env python3
"""
Conversation Processor for Claude Code Insights
Parses JSONL conversation files from ~/.claude/projects/, extracts metadata,
and stores in SQLite for fast querying. Supports incremental processing.
"""
import json
import sqlite3
import base64
import hashlib
from pathlib import Path
from datetime import datetime
from typing import List, Dict, Any, Optional
from dataclasses import dataclass, asdict
import click
import re
@dataclass
class ConversationMetadata:
"""Structured conversation metadata"""
id: str
project_path: str
timestamp: datetime
message_count: int
user_messages: int
assistant_messages: int
files_read: List[str]
files_written: List[str]
files_edited: List[str]
tools_used: List[str]
topics: List[str]
first_user_message: str
last_assistant_message: str
conversation_hash: str
file_size_bytes: int
processed_at: datetime
class ConversationProcessor:
"""Processes Claude Code conversation JSONL files"""
def __init__(self, db_path: Path, verbose: bool = False):
self.db_path = db_path
self.verbose = verbose
self.conn = None
self._init_database()
def _init_database(self):
"""Initialize SQLite database with schema"""
self.db_path.parent.mkdir(parents=True, exist_ok=True)
self.conn = sqlite3.connect(str(self.db_path))
self.conn.row_factory = sqlite3.Row
# Create tables
self.conn.executescript("""
CREATE TABLE IF NOT EXISTS conversations (
id TEXT PRIMARY KEY,
project_path TEXT NOT NULL,
timestamp TEXT NOT NULL,
message_count INTEGER NOT NULL,
user_messages INTEGER NOT NULL,
assistant_messages INTEGER NOT NULL,
files_read TEXT, -- JSON array
files_written TEXT, -- JSON array
files_edited TEXT, -- JSON array
tools_used TEXT, -- JSON array
topics TEXT, -- JSON array
first_user_message TEXT,
last_assistant_message TEXT,
conversation_hash TEXT UNIQUE NOT NULL,
file_size_bytes INTEGER NOT NULL,
processed_at TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_timestamp ON conversations(timestamp);
CREATE INDEX IF NOT EXISTS idx_project ON conversations(project_path);
CREATE INDEX IF NOT EXISTS idx_processed ON conversations(processed_at);
CREATE TABLE IF NOT EXISTS file_interactions (
id INTEGER PRIMARY KEY AUTOINCREMENT,
conversation_id TEXT NOT NULL,
file_path TEXT NOT NULL,
interaction_type TEXT NOT NULL, -- read, write, edit
FOREIGN KEY (conversation_id) REFERENCES conversations(id)
);
CREATE INDEX IF NOT EXISTS idx_file_path ON file_interactions(file_path);
CREATE INDEX IF NOT EXISTS idx_conversation ON file_interactions(conversation_id);
CREATE TABLE IF NOT EXISTS tool_usage (
id INTEGER PRIMARY KEY AUTOINCREMENT,
conversation_id TEXT NOT NULL,
tool_name TEXT NOT NULL,
usage_count INTEGER NOT NULL,
FOREIGN KEY (conversation_id) REFERENCES conversations(id)
);
CREATE INDEX IF NOT EXISTS idx_tool_name ON tool_usage(tool_name);
CREATE TABLE IF NOT EXISTS processing_state (
file_path TEXT PRIMARY KEY,
last_modified TEXT NOT NULL,
last_processed TEXT NOT NULL,
file_hash TEXT NOT NULL
);
""")
self.conn.commit()
def _log(self, message: str):
"""Log if verbose mode is enabled"""
if self.verbose:
print(f"[{datetime.now().strftime('%H:%M:%S')}] {message}")
def _compute_file_hash(self, file_path: Path) -> str:
"""Compute SHA256 hash of file for change detection"""
sha256 = hashlib.sha256()
with open(file_path, 'rb') as f:
for chunk in iter(lambda: f.read(8192), b''):
sha256.update(chunk)
return sha256.hexdigest()
def _needs_processing(self, file_path: Path, reindex: bool = False) -> bool:
"""Check if file needs (re)processing"""
if reindex:
return True
file_stat = file_path.stat()
file_hash = self._compute_file_hash(file_path)
cursor = self.conn.execute(
"SELECT last_modified, file_hash FROM processing_state WHERE file_path = ?",
(str(file_path),)
)
row = cursor.fetchone()
if not row:
return True # Never processed
last_modified, stored_hash = row
return stored_hash != file_hash # File changed
def _update_processing_state(self, file_path: Path):
"""Update processing state for file"""
file_hash = self._compute_file_hash(file_path)
last_modified = datetime.fromtimestamp(file_path.stat().st_mtime).isoformat()
self.conn.execute("""
INSERT OR REPLACE INTO processing_state (file_path, last_modified, last_processed, file_hash)
VALUES (?, ?, ?, ?)
""", (str(file_path), last_modified, datetime.now().isoformat(), file_hash))
def _parse_jsonl_file(self, file_path: Path) -> List[Dict[str, Any]]:
"""Parse JSONL file with base64-encoded content"""
messages = []
with open(file_path, 'r', encoding='utf-8') as f:
for line_num, line in enumerate(f, 1):
try:
if line.strip():
data = json.loads(line)
messages.append(data)
except json.JSONDecodeError as e:
self._log(f"Warning: Failed to parse line {line_num} in {file_path.name}: {e}")
return messages
def _extract_tool_uses(self, content: str) -> List[str]:
"""Extract tool names from assistant messages"""
tools = []
# Look for tool use patterns in content
tool_patterns = [
r'"name":\s*"([A-Z][a-zA-Z]+)"', # JSON tool calls
r'<tool>([A-Z][a-zA-Z]+)</tool>', # XML tool calls
]
for pattern in tool_patterns:
matches = re.findall(pattern, content)
tools.extend(matches)
return list(set(tools)) # Unique tools
def _extract_file_paths(self, content: str) -> Dict[str, List[str]]:
"""Extract file paths and their interaction types from content"""
files = {
'read': [],
'written': [],
'edited': []
}
# Patterns for file operations
read_patterns = [
r'Reading\s+(.+\.(?:py|js|ts|tsx|jsx|md|json|yaml|yml))',
r'Read\s+file:\s*(.+)',
r'"file_path":\s*"([^"]+)"', # Tool parameters
]
write_patterns = [
r'Writing\s+(.+\.(?:py|js|ts|tsx|jsx|md|json|yaml|yml))',
r'Created\s+file:\s*(.+)',
r'Write\s+(.+)',
]
edit_patterns = [
r'Editing\s+(.+\.(?:py|js|ts|tsx|jsx|md|json|yaml|yml))',
r'Modified\s+file:\s*(.+)',
r'Edit\s+(.+)',
]
for pattern in read_patterns:
files['read'].extend(re.findall(pattern, content, re.IGNORECASE))
for pattern in write_patterns:
files['written'].extend(re.findall(pattern, content, re.IGNORECASE))
for pattern in edit_patterns:
files['edited'].extend(re.findall(pattern, content, re.IGNORECASE))
# Deduplicate and clean
for key in files:
files[key] = list(set(path.strip() for path in files[key]))
return files
def _extract_topics(self, messages: List[Dict[str, Any]]) -> List[str]:
"""Extract topic keywords from conversation"""
# Combine first user message and some assistant responses
text = ""
user_count = 0
for msg in messages:
msg_type = msg.get('type', '')
# Handle event-stream format
if msg_type == 'user':
message_dict = msg.get('message', {})
content = message_dict.get('content', '') if isinstance(message_dict, dict) else ''
# Handle content that's a list (content blocks)
if isinstance(content, list):
message_content = ' '.join(
block.get('text', '') if isinstance(block, dict) and block.get('type') == 'text' else ''
for block in content
)
else:
message_content = content
if message_content:
text += message_content + " "
user_count += 1
if user_count >= 3: # Only use first few user messages
break
elif msg_type == 'assistant' and user_count < 3:
# Also include some assistant responses for context
message_dict = msg.get('message', {})
content = message_dict.get('content', '') if isinstance(message_dict, dict) else ''
# Handle content that's a list (content blocks)
if isinstance(content, list):
message_content = ' '.join(
block.get('text', '') if isinstance(block, dict) and block.get('type') == 'text' else ''
for block in content
)
else:
message_content = content
if message_content:
text += message_content[:200] + " " # Just a snippet
# Extract common programming keywords
keywords = []
common_topics = [
'authentication', 'auth', 'login', 'jwt', 'oauth',
'testing', 'test', 'unit test', 'integration test',
'bug', 'fix', 'error', 'issue', 'debug',
'performance', 'optimization', 'optimize', 'slow',
'refactor', 'refactoring', 'cleanup',
'feature', 'implement', 'add', 'create',
'database', 'sql', 'query', 'schema',
'api', 'endpoint', 'rest', 'graphql',
'typescript', 'javascript', 'react', 'node',
'css', 'style', 'styling', 'tailwind',
'security', 'vulnerability', 'xss', 'csrf',
'deploy', 'deployment', 'ci/cd', 'docker',
]
text_lower = text.lower()
for topic in common_topics:
if topic in text_lower:
keywords.append(topic)
return list(set(keywords))[:10] # Max 10 topics
def _process_conversation(self, file_path: Path, messages: List[Dict[str, Any]]) -> ConversationMetadata:
"""Extract metadata from parsed conversation"""
# Generate conversation ID from filename
conv_id = file_path.stem
# Count messages by role
user_messages = 0
assistant_messages = 0
first_user_msg = ""
last_assistant_msg = ""
all_tools = []
all_files = {'read': [], 'written': [], 'edited': []}
for msg in messages:
msg_type = msg.get('type', '')
# Handle event-stream format
if msg_type == 'user':
user_messages += 1
message_dict = msg.get('message', {})
content = message_dict.get('content', '') if isinstance(message_dict, dict) else ''
# Handle content that's a list (content blocks)
if isinstance(content, list):
message_content = ' '.join(
block.get('text', '') if isinstance(block, dict) and block.get('type') == 'text' else ''
for block in content
)
else:
message_content = content
if not first_user_msg and message_content:
first_user_msg = message_content[:500] # First 500 chars
elif msg_type == 'assistant':
assistant_messages += 1
message_dict = msg.get('message', {})
content = message_dict.get('content', '') if isinstance(message_dict, dict) else ''
# Handle content that's a list (content blocks)
if isinstance(content, list):
message_content = ' '.join(
block.get('text', '') if isinstance(block, dict) and block.get('type') == 'text' else ''
for block in content
)
# Also extract tools from content blocks
for block in content:
if isinstance(block, dict) and block.get('type') == 'tool_use':
tool_name = block.get('name', '')
if tool_name:
all_tools.append(tool_name)
else:
message_content = content
if message_content:
last_assistant_msg = message_content[:500]
# Extract tools and files from assistant messages
tools = self._extract_tool_uses(message_content)
all_tools.extend(tools)
files = self._extract_file_paths(message_content)
for key in all_files:
all_files[key].extend(files[key])
# Deduplicate
all_tools = list(set(all_tools))
for key in all_files:
all_files[key] = list(set(all_files[key]))
# Extract topics
topics = self._extract_topics(messages)
# File stats
file_stat = file_path.stat()
# Compute conversation hash
conv_hash = self._compute_file_hash(file_path)
# Extract timestamp (from filename or file mtime)
try:
# Try to get timestamp from file modification time
timestamp = datetime.fromtimestamp(file_stat.st_mtime)
except Exception:
timestamp = datetime.now()
return ConversationMetadata(
id=conv_id,
project_path=str(file_path.parent),
timestamp=timestamp,
message_count=len(messages),
user_messages=user_messages,
assistant_messages=assistant_messages,
files_read=all_files['read'],
files_written=all_files['written'],
files_edited=all_files['edited'],
tools_used=all_tools,
topics=topics,
first_user_message=first_user_msg,
last_assistant_message=last_assistant_msg,
conversation_hash=conv_hash,
file_size_bytes=file_stat.st_size,
processed_at=datetime.now()
)
def _store_conversation(self, metadata: ConversationMetadata):
"""Store conversation metadata in database"""
# Store main conversation record
self.conn.execute("""
INSERT OR REPLACE INTO conversations
(id, project_path, timestamp, message_count, user_messages, assistant_messages,
files_read, files_written, files_edited, tools_used, topics,
first_user_message, last_assistant_message, conversation_hash,
file_size_bytes, processed_at)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
""", (
metadata.id,
metadata.project_path,
metadata.timestamp.isoformat(),
metadata.message_count,
metadata.user_messages,
metadata.assistant_messages,
json.dumps(metadata.files_read),
json.dumps(metadata.files_written),
json.dumps(metadata.files_edited),
json.dumps(metadata.tools_used),
json.dumps(metadata.topics),
metadata.first_user_message,
metadata.last_assistant_message,
metadata.conversation_hash,
metadata.file_size_bytes,
metadata.processed_at.isoformat()
))
# Store file interactions
self.conn.execute(
"DELETE FROM file_interactions WHERE conversation_id = ?",
(metadata.id,)
)
for file_path in metadata.files_read:
self.conn.execute(
"INSERT INTO file_interactions (conversation_id, file_path, interaction_type) VALUES (?, ?, ?)",
(metadata.id, file_path, 'read')
)
for file_path in metadata.files_written:
self.conn.execute(
"INSERT INTO file_interactions (conversation_id, file_path, interaction_type) VALUES (?, ?, ?)",
(metadata.id, file_path, 'write')
)
for file_path in metadata.files_edited:
self.conn.execute(
"INSERT INTO file_interactions (conversation_id, file_path, interaction_type) VALUES (?, ?, ?)",
(metadata.id, file_path, 'edit')
)
# Store tool usage
self.conn.execute(
"DELETE FROM tool_usage WHERE conversation_id = ?",
(metadata.id,)
)
for tool_name in metadata.tools_used:
self.conn.execute(
"INSERT INTO tool_usage (conversation_id, tool_name, usage_count) VALUES (?, ?, ?)",
(metadata.id, tool_name, 1)
)
def process_file(self, file_path: Path, reindex: bool = False) -> bool:
"""Process a single conversation file"""
if not self._needs_processing(file_path, reindex):
self._log(f"Skipping {file_path.name} (already processed)")
return False
self._log(f"Processing {file_path.name}...")
try:
# Parse JSONL
messages = self._parse_jsonl_file(file_path)
if not messages:
self._log(f"Warning: No messages found in {file_path.name}")
return False
# Extract metadata
metadata = self._process_conversation(file_path, messages)
# Store in database
self._store_conversation(metadata)
# Update processing state
self._update_processing_state(file_path)
self.conn.commit()
self._log(f"✓ Processed {file_path.name}: {metadata.message_count} messages, "
f"{metadata.user_messages} user, {metadata.assistant_messages} assistant")
return True
except Exception as e:
self._log(f"Error processing {file_path.name}: {e}")
import traceback
if self.verbose:
traceback.print_exc()
return False
def process_project(self, project_name: str, reindex: bool = False) -> int:
"""Process all conversations for a project"""
# Find conversation files
claude_projects = Path.home() / ".claude" / "projects"
if not claude_projects.exists():
self._log(f"Error: {claude_projects} does not exist")
return 0
# Find project directory (may be encoded)
project_dirs = list(claude_projects.glob(f"*{project_name}*"))
if not project_dirs:
self._log(f"Error: No project directory found matching '{project_name}'")
return 0
if len(project_dirs) > 1:
self._log(f"Warning: Multiple project directories found, using {project_dirs[0].name}")
project_dir = project_dirs[0]
self._log(f"Processing conversations from {project_dir}")
# Find all JSONL files
jsonl_files = list(project_dir.glob("*.jsonl"))
if not jsonl_files:
self._log(f"No conversation files found in {project_dir}")
return 0
self._log(f"Found {len(jsonl_files)} conversation files")
# Process each file
processed_count = 0
for jsonl_file in jsonl_files:
if self.process_file(jsonl_file, reindex):
processed_count += 1
self._log(f"\nProcessed {processed_count}/{len(jsonl_files)} conversations")
return processed_count
def get_stats(self) -> Dict[str, Any]:
"""Get processing statistics"""
cursor = self.conn.execute("""
SELECT
COUNT(*) as total_conversations,
SUM(message_count) as total_messages,
SUM(user_messages) as total_user_messages,
SUM(assistant_messages) as total_assistant_messages,
MIN(timestamp) as earliest_conversation,
MAX(timestamp) as latest_conversation
FROM conversations
""")
row = cursor.fetchone()
stats = {
'total_conversations': row['total_conversations'],
'total_messages': row['total_messages'],
'total_user_messages': row['total_user_messages'],
'total_assistant_messages': row['total_assistant_messages'],
'earliest_conversation': row['earliest_conversation'],
'latest_conversation': row['latest_conversation']
}
# Top files
cursor = self.conn.execute("""
SELECT file_path, COUNT(*) as interaction_count
FROM file_interactions
GROUP BY file_path
ORDER BY interaction_count DESC
LIMIT 10
""")
stats['top_files'] = [
{'file': row['file_path'], 'count': row['interaction_count']}
for row in cursor.fetchall()
]
# Top tools
cursor = self.conn.execute("""
SELECT tool_name, SUM(usage_count) as total_usage
FROM tool_usage
GROUP BY tool_name
ORDER BY total_usage DESC
LIMIT 10
""")
stats['top_tools'] = [
{'tool': row['tool_name'], 'count': row['total_usage']}
for row in cursor.fetchall()
]
return stats
def close(self):
"""Close database connection"""
if self.conn:
self.conn.close()
@click.command()
@click.option('--project-name', default='annex', help='Project name to process')
@click.option('--db-path', type=click.Path(), default='.claude/skills/cc-insights/.processed/conversations.db',
help='SQLite database path')
@click.option('--reindex', is_flag=True, help='Reprocess all conversations (ignore cache)')
@click.option('--verbose', is_flag=True, help='Show detailed processing logs')
@click.option('--stats', is_flag=True, help='Show statistics after processing')
def main(project_name: str, db_path: str, reindex: bool, verbose: bool, stats: bool):
"""Process Claude Code conversations and store metadata"""
db_path = Path(db_path)
processor = ConversationProcessor(db_path, verbose=verbose)
try:
# Process conversations
count = processor.process_project(project_name, reindex=reindex)
print(f"\n✓ Processed {count} conversations")
if stats:
print("\n=== Statistics ===")
stats_data = processor.get_stats()
print(f"Total conversations: {stats_data['total_conversations']}")
print(f"Total messages: {stats_data['total_messages']}")
print(f"User messages: {stats_data['total_user_messages']}")
print(f"Assistant messages: {stats_data['total_assistant_messages']}")
print(f"Date range: {stats_data['earliest_conversation']} to {stats_data['latest_conversation']}")
print("\nTop 10 Files:")
for item in stats_data['top_files']:
print(f" {item['file']}: {item['count']} interactions")
print("\nTop 10 Tools:")
for item in stats_data['top_tools']:
print(f" {item['tool']}: {item['count']} uses")
finally:
processor.close()
if __name__ == '__main__':
main()

skills/cc-insights/scripts/insight-generator.py Normal file

@@ -0,0 +1,509 @@
#!/usr/bin/env python3
"""
Insight Generator for Claude Code Insights
Analyzes conversation patterns and generates insight reports with
visualizations, metrics, and actionable recommendations.
"""
import sqlite3
import json
from pathlib import Path
from typing import List, Dict, Any, Optional, Tuple
from datetime import datetime, timedelta
from collections import Counter, defaultdict
import click
try:
from jinja2 import Template, Environment, FileSystemLoader
except ImportError:
print("Error: jinja2 not installed. Run: pip install jinja2")
exit(1)
class PatternDetector:
"""Detects patterns in conversation data"""
def __init__(self, db_path: Path, verbose: bool = False):
self.db_path = db_path
self.verbose = verbose
self.conn = sqlite3.connect(str(db_path))
self.conn.row_factory = sqlite3.Row
def _log(self, message: str):
"""Log if verbose mode is enabled"""
if self.verbose:
print(f"[{datetime.now().strftime('%H:%M:%S')}] {message}")
def get_date_range_filter(self, date_from: Optional[str] = None, date_to: Optional[str] = None) -> Tuple[str, List]:
"""Build date range SQL filter"""
conditions = []
params = []
if date_from:
conditions.append("timestamp >= ?")
params.append(date_from)
if date_to:
conditions.append("timestamp <= ?")
params.append(date_to)
where_clause = " AND ".join(conditions) if conditions else "1=1"
return where_clause, params
def get_overview_metrics(self, date_from: Optional[str] = None, date_to: Optional[str] = None) -> Dict[str, Any]:
"""Get high-level overview metrics"""
where_clause, params = self.get_date_range_filter(date_from, date_to)
cursor = self.conn.execute(f"""
SELECT
COUNT(*) as total_conversations,
SUM(message_count) as total_messages,
SUM(user_messages) as total_user_messages,
SUM(assistant_messages) as total_assistant_messages,
AVG(message_count) as avg_messages_per_conversation,
MIN(timestamp) as earliest_conversation,
MAX(timestamp) as latest_conversation,
COUNT(DISTINCT DATE(timestamp)) as active_days
FROM conversations
WHERE {where_clause}
""", params)
row = cursor.fetchone()
return {
'total_conversations': row['total_conversations'] or 0,
'total_messages': row['total_messages'] or 0,
'total_user_messages': row['total_user_messages'] or 0,
'total_assistant_messages': row['total_assistant_messages'] or 0,
'avg_messages_per_conversation': round(row['avg_messages_per_conversation'] or 0, 1),
'earliest_conversation': row['earliest_conversation'],
'latest_conversation': row['latest_conversation'],
'active_days': row['active_days'] or 0
}
def get_file_hotspots(self, date_from: Optional[str] = None, date_to: Optional[str] = None, limit: int = 20) -> List[Dict[str, Any]]:
"""Get most frequently modified files"""
where_clause, params = self.get_date_range_filter(date_from, date_to)
cursor = self.conn.execute(f"""
SELECT
fi.file_path,
COUNT(DISTINCT fi.conversation_id) as conversation_count,
SUM(CASE WHEN fi.interaction_type = 'read' THEN 1 ELSE 0 END) as read_count,
SUM(CASE WHEN fi.interaction_type = 'write' THEN 1 ELSE 0 END) as write_count,
SUM(CASE WHEN fi.interaction_type = 'edit' THEN 1 ELSE 0 END) as edit_count
FROM file_interactions fi
JOIN conversations c ON fi.conversation_id = c.id
WHERE {where_clause}
GROUP BY fi.file_path
ORDER BY conversation_count DESC
LIMIT ?
""", params + [limit])
return [
{
'file_path': row['file_path'],
'conversation_count': row['conversation_count'],
'read_count': row['read_count'],
'write_count': row['write_count'],
'edit_count': row['edit_count'],
'total_interactions': row['read_count'] + row['write_count'] + row['edit_count']
}
for row in cursor.fetchall()
]
def get_tool_usage(self, date_from: Optional[str] = None, date_to: Optional[str] = None) -> List[Dict[str, Any]]:
"""Get tool usage statistics"""
where_clause, params = self.get_date_range_filter(date_from, date_to)
cursor = self.conn.execute(f"""
SELECT
tu.tool_name,
COUNT(DISTINCT tu.conversation_id) as conversation_count,
SUM(tu.usage_count) as total_uses
FROM tool_usage tu
JOIN conversations c ON tu.conversation_id = c.id
WHERE {where_clause}
GROUP BY tu.tool_name
ORDER BY total_uses DESC
""", params)
return [
{
'tool_name': row['tool_name'],
'conversation_count': row['conversation_count'],
'total_uses': row['total_uses']
}
for row in cursor.fetchall()
]
def get_topic_clusters(self, date_from: Optional[str] = None, date_to: Optional[str] = None) -> List[Dict[str, Any]]:
"""Get most common topics"""
where_clause, params = self.get_date_range_filter(date_from, date_to)
cursor = self.conn.execute(f"""
SELECT topics FROM conversations
WHERE {where_clause} AND topics IS NOT NULL
""", params)
topic_counter = Counter()
for row in cursor.fetchall():
topics = json.loads(row['topics'])
topic_counter.update(topics)
return [
{'topic': topic, 'count': count}
for topic, count in topic_counter.most_common(20)
]
def get_activity_timeline(self, date_from: Optional[str] = None, date_to: Optional[str] = None) -> Dict[str, int]:
"""Get conversation count by date"""
where_clause, params = self.get_date_range_filter(date_from, date_to)
cursor = self.conn.execute(f"""
SELECT DATE(timestamp) as date, COUNT(*) as count
FROM conversations
WHERE {where_clause}
GROUP BY DATE(timestamp)
ORDER BY date
""", params)
return {row['date']: row['count'] for row in cursor.fetchall()}
def get_hourly_distribution(self, date_from: Optional[str] = None, date_to: Optional[str] = None) -> Dict[int, int]:
"""Get conversation distribution by hour of day"""
where_clause, params = self.get_date_range_filter(date_from, date_to)
cursor = self.conn.execute(f"""
SELECT
CAST(strftime('%H', timestamp) AS INTEGER) as hour,
COUNT(*) as count
FROM conversations
WHERE {where_clause}
GROUP BY hour
ORDER BY hour
""", params)
return {row['hour']: row['count'] for row in cursor.fetchall()}
def get_weekday_distribution(self, date_from: Optional[str] = None, date_to: Optional[str] = None) -> Dict[str, int]:
"""Get conversation distribution by day of week"""
where_clause, params = self.get_date_range_filter(date_from, date_to)
cursor = self.conn.execute(f"""
SELECT
CASE CAST(strftime('%w', timestamp) AS INTEGER)
WHEN 0 THEN 'Sunday'
WHEN 1 THEN 'Monday'
WHEN 2 THEN 'Tuesday'
WHEN 3 THEN 'Wednesday'
WHEN 4 THEN 'Thursday'
WHEN 5 THEN 'Friday'
WHEN 6 THEN 'Saturday'
END as weekday,
COUNT(*) as count
FROM conversations
WHERE {where_clause}
GROUP BY weekday
""", params)
weekday_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
result = {day: 0 for day in weekday_order}
for row in cursor.fetchall():
result[row['weekday']] = row['count']
return result
def close(self):
"""Close database connection"""
if self.conn:
self.conn.close()
class InsightGenerator:
"""Generates insight reports from pattern data"""
def __init__(self, db_path: Path, templates_dir: Path, verbose: bool = False):
self.db_path = db_path
self.templates_dir = templates_dir
self.verbose = verbose
self.detector = PatternDetector(db_path, verbose=verbose)
# Setup Jinja2 environment
if templates_dir.exists():
self.jinja_env = Environment(loader=FileSystemLoader(str(templates_dir)))
else:
self.jinja_env = None
def _log(self, message: str):
"""Log if verbose mode is enabled"""
if self.verbose:
print(f"[{datetime.now().strftime('%H:%M:%S')}] {message}")
def _create_ascii_bar_chart(self, data: Dict[str, int], max_width: int = 50) -> str:
"""Create ASCII bar chart"""
if not data:
return "No data"
max_value = max(data.values())
lines = []
for label, value in data.items():
bar_length = int((value / max_value) * max_width) if max_value > 0 else 0
bar = "" * bar_length
lines.append(f"{label:15} {bar} {value}")
return "\n".join(lines)
def _create_sparkline(self, values: List[int]) -> str:
"""Create sparkline chart"""
if not values:
return ""
chars = "▁▂▃▄▅▆▇█"
min_val = min(values)
max_val = max(values)
if max_val == min_val:
return chars[0] * len(values)
normalized = [(v - min_val) / (max_val - min_val) for v in values]
return "".join(chars[int(n * (len(chars) - 1))] for n in normalized)
def generate_weekly_report(self, date_from: Optional[str] = None, date_to: Optional[str] = None) -> str:
"""Generate weekly activity report"""
self._log("Generating weekly report...")
# Auto-calculate date range if not provided
if not date_from:
date_from = (datetime.now() - timedelta(days=7)).date().isoformat()
if not date_to:
date_to = datetime.now().date().isoformat()
# Gather data
overview = self.detector.get_overview_metrics(date_from, date_to)
file_hotspots = self.detector.get_file_hotspots(date_from, date_to, limit=10)
tool_usage = self.detector.get_tool_usage(date_from, date_to)
topics = self.detector.get_topic_clusters(date_from, date_to)
timeline = self.detector.get_activity_timeline(date_from, date_to)
weekday_dist = self.detector.get_weekday_distribution(date_from, date_to)
# Build report
report_lines = [
f"# Weekly Insights Report",
f"**Period:** {date_from} to {date_to}",
f"**Generated:** {datetime.now().strftime('%Y-%m-%d %H:%M')}",
"",
"## Overview",
f"- **Total Conversations:** {overview['total_conversations']}",
f"- **Active Days:** {overview['active_days']}",
f"- **Total Messages:** {overview['total_messages']}",
f"- **Avg Messages/Conversation:** {overview['avg_messages_per_conversation']}",
"",
"## Activity Timeline",
"```",
self._create_ascii_bar_chart(timeline, max_width=40),
"```",
"",
"## Weekday Distribution",
"```",
self._create_ascii_bar_chart(weekday_dist, max_width=40),
"```",
""
]
if file_hotspots:
report_lines.extend([
"## File Hotspots (Top 10)",
""
])
for i, file in enumerate(file_hotspots, 1):
heat = "🔥" * min(3, (file['conversation_count'] + 2) // 3)
report_lines.append(
f"{i}. {heat} **{file['file_path']}** "
f"({file['conversation_count']} conversations, "
f"R:{file['read_count']} W:{file['write_count']} E:{file['edit_count']})"
)
report_lines.append("")
if tool_usage:
report_lines.extend([
"## Tool Usage",
""
])
tool_dict = {t['tool_name']: t['total_uses'] for t in tool_usage[:10]}
report_lines.append("```")
report_lines.append(self._create_ascii_bar_chart(tool_dict, max_width=40))
report_lines.append("```")
report_lines.append("")
if topics:
report_lines.extend([
"## Top Topics",
""
])
topic_dict = {t['topic']: t['count'] for t in topics[:15]}
report_lines.append("```")
report_lines.append(self._create_ascii_bar_chart(topic_dict, max_width=40))
report_lines.append("```")
report_lines.append("")
# Insights and recommendations
report_lines.extend([
"## Insights & Recommendations",
""
])
# File hotspot insights
if file_hotspots and file_hotspots[0]['conversation_count'] >= 5:
top_file = file_hotspots[0]
report_lines.append(
f"- 🔥 **High Activity File:** `{top_file['file_path']}` was modified in "
f"{top_file['conversation_count']} conversations. Consider reviewing for refactoring opportunities."
)
# Topic insights
if topics and topics[0]['count'] >= 3:
top_topic = topics[0]
report_lines.append(
f"- 📌 **Trending Topic:** '{top_topic['topic']}' appeared in {top_topic['count']} conversations. "
f"This might warrant documentation or team knowledge sharing."
)
# Activity pattern insights
if overview['active_days'] < 3:
report_lines.append(
f"- 📅 **Low Activity:** Only {overview['active_days']} active days this week. "
f"Consider scheduling regular development sessions."
)
if not report_lines[-1]: # If no insights were added
report_lines.append("- No significant patterns detected this period.")
return "\n".join(report_lines)
def generate_file_heatmap_report(self, date_from: Optional[str] = None, date_to: Optional[str] = None) -> str:
"""Generate detailed file interaction heatmap"""
self._log("Generating file heatmap report...")
file_hotspots = self.detector.get_file_hotspots(date_from, date_to, limit=50)
report_lines = [
"# File Interaction Heatmap",
f"**Generated:** {datetime.now().strftime('%Y-%m-%d %H:%M')}",
"",
"## File Hotspots",
""
]
if not file_hotspots:
report_lines.append("No file interactions found in the specified period.")
return "\n".join(report_lines)
for i, file in enumerate(file_hotspots, 1):
heat_level = min(5, (file['conversation_count'] + 1) // 2)
heat_emoji = "🔥" * heat_level
report_lines.extend([
f"### {i}. {heat_emoji} {file['file_path']}",
f"- **Conversations:** {file['conversation_count']}",
f"- **Reads:** {file['read_count']}",
f"- **Writes:** {file['write_count']}",
f"- **Edits:** {file['edit_count']}",
f"- **Total Interactions:** {file['total_interactions']}",
""
])
return "\n".join(report_lines)
def generate_tool_usage_report(self, date_from: Optional[str] = None, date_to: Optional[str] = None) -> str:
"""Generate tool usage analytics report"""
self._log("Generating tool usage report...")
tool_usage = self.detector.get_tool_usage(date_from, date_to)
report_lines = [
"# Tool Usage Analytics",
f"**Generated:** {datetime.now().strftime('%Y-%m-%d %H:%M')}",
"",
"## Tool Statistics",
""
]
if not tool_usage:
report_lines.append("No tool usage data found.")
return "\n".join(report_lines)
total_uses = sum(t['total_uses'] for t in tool_usage)
for i, tool in enumerate(tool_usage, 1):
percentage = (tool['total_uses'] / total_uses * 100) if total_uses > 0 else 0
report_lines.extend([
f"### {i}. {tool['tool_name']}",
f"- **Total Uses:** {tool['total_uses']}",
f"- **Used in Conversations:** {tool['conversation_count']}",
f"- **Percentage of Total:** {percentage:.1f}%",
""
])
return "\n".join(report_lines)
def close(self):
"""Close connections"""
self.detector.close()
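# Minimal programmatic sketch (assumes conversation-processor.py has already built the
# database at the default path used by the CLI below):
#
#   gen = InsightGenerator(Path(".claude/skills/cc-insights/.processed/conversations.db"),
#                          Path(".claude/skills/cc-insights/templates"))
#   try:
#       print(gen.generate_weekly_report())  # defaults to the last 7 days
#   finally:
#       gen.close()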
@click.command()
@click.argument('report_type', type=click.Choice(['weekly', 'file-heatmap', 'tool-usage', 'custom']))
@click.option('--db-path', type=click.Path(), default='.claude/skills/cc-insights/.processed/conversations.db',
help='SQLite database path')
@click.option('--templates-dir', type=click.Path(), default='.claude/skills/cc-insights/templates',
help='Templates directory')
@click.option('--date-from', type=str, help='Start date (ISO format)')
@click.option('--date-to', type=str, help='End date (ISO format)')
@click.option('--output', type=click.Path(), help='Save to file (default: stdout)')
@click.option('--verbose', is_flag=True, help='Show detailed logs')
def main(report_type: str, db_path: str, templates_dir: str, date_from: Optional[str],
date_to: Optional[str], output: Optional[str], verbose: bool):
"""Generate insight reports from conversation data
Report types:
weekly - Weekly activity summary with metrics
file-heatmap - File modification heatmap
tool-usage - Tool usage analytics
custom - Custom report from template
"""
db_path = Path(db_path)
templates_dir = Path(templates_dir)
if not db_path.exists():
print(f"Error: Database not found at {db_path}")
exit(1)
generator = InsightGenerator(db_path, templates_dir, verbose=verbose)
try:
# Generate report based on type
if report_type == 'weekly':
report = generator.generate_weekly_report(date_from, date_to)
elif report_type == 'file-heatmap':
report = generator.generate_file_heatmap_report(date_from, date_to)
elif report_type == 'tool-usage':
report = generator.generate_tool_usage_report(date_from, date_to)
else:
print("Custom templates not yet implemented")
exit(1)
# Output report
if output:
Path(output).write_text(report)
print(f"✓ Report saved to {output}")
else:
print(report)
finally:
generator.close()
if __name__ == '__main__':
main()


@@ -0,0 +1,298 @@
#!/usr/bin/env python3
"""
RAG Indexer for Claude Code Insights
Builds vector embeddings for semantic search using sentence-transformers
and ChromaDB. Supports incremental indexing and efficient similarity search.
"""
import sqlite3
import json
from pathlib import Path
from typing import List, Dict, Any, Optional
from datetime import datetime
import click
try:
from sentence_transformers import SentenceTransformer
import chromadb
from chromadb.config import Settings
except ImportError as e:
print(f"Error: Required packages not installed. Run: pip install sentence-transformers chromadb")
print(f"Missing: {e}")
exit(1)
class RAGIndexer:
"""Builds and manages vector embeddings for conversations"""
def __init__(self, db_path: Path, embeddings_dir: Path, model_name: str = "all-MiniLM-L6-v2", verbose: bool = False):
self.db_path = db_path
self.embeddings_dir = embeddings_dir
self.model_name = model_name
self.verbose = verbose
# Initialize sentence transformer model
self._log("Loading embedding model...")
self.model = SentenceTransformer(model_name)
self._log(f"✓ Loaded {model_name}")
# Initialize ChromaDB
self.embeddings_dir.mkdir(parents=True, exist_ok=True)
self.chroma_client = chromadb.PersistentClient(
path=str(self.embeddings_dir),
settings=Settings(anonymized_telemetry=False)
)
# Get or create collection
self.collection = self.chroma_client.get_or_create_collection(
name="conversations",
metadata={"hnsw:space": "cosine"} # Use cosine similarity
)
# Connect to SQLite
self.conn = sqlite3.connect(str(self.db_path))
self.conn.row_factory = sqlite3.Row
def _log(self, message: str):
"""Log if verbose mode is enabled"""
if self.verbose:
print(f"[{datetime.now().strftime('%H:%M:%S')}] {message}")
def _get_indexed_conversation_ids(self) -> set:
"""Get set of conversation IDs already indexed"""
try:
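# include=[] keeps the payload small: ChromaDB always returns the stored IDs,
# which is all the incremental-indexing check needs.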
results = self.collection.get(include=[])
return set(results['ids'])
except Exception:
return set()
def _fetch_conversations_to_index(self, rebuild: bool = False) -> List[Dict[str, Any]]:
"""Fetch conversations that need indexing"""
if rebuild:
# Rebuild: get all conversations
cursor = self.conn.execute("""
SELECT id, first_user_message, last_assistant_message, topics,
files_read, files_written, files_edited, timestamp
FROM conversations
ORDER BY timestamp DESC
""")
else:
# Incremental: only get conversations not yet indexed
indexed_ids = self._get_indexed_conversation_ids()
if not indexed_ids:
# Nothing indexed yet, get all
cursor = self.conn.execute("""
SELECT id, first_user_message, last_assistant_message, topics,
files_read, files_written, files_edited, timestamp
FROM conversations
ORDER BY timestamp DESC
""")
else:
# Get conversations not in indexed set
placeholders = ','.join('?' * len(indexed_ids))
cursor = self.conn.execute(f"""
SELECT id, first_user_message, last_assistant_message, topics,
files_read, files_written, files_edited, timestamp
FROM conversations
WHERE id NOT IN ({placeholders})
ORDER BY timestamp DESC
""", tuple(indexed_ids))
conversations = []
for row in cursor.fetchall():
conversations.append({
'id': row['id'],
'first_user_message': row['first_user_message'] or "",
'last_assistant_message': row['last_assistant_message'] or "",
'topics': json.loads(row['topics']) if row['topics'] else [],
'files_read': json.loads(row['files_read']) if row['files_read'] else [],
'files_written': json.loads(row['files_written']) if row['files_written'] else [],
'files_edited': json.loads(row['files_edited']) if row['files_edited'] else [],
'timestamp': row['timestamp']
})
return conversations
def _create_document_text(self, conversation: Dict[str, Any]) -> str:
"""Create text document for embedding"""
# Combine relevant fields into searchable text
parts = []
if conversation['first_user_message']:
parts.append(f"User: {conversation['first_user_message']}")
if conversation['last_assistant_message']:
parts.append(f"Assistant: {conversation['last_assistant_message']}")
if conversation['topics']:
parts.append(f"Topics: {', '.join(conversation['topics'])}")
all_files = conversation['files_read'] + conversation['files_written'] + conversation['files_edited']
if all_files:
parts.append(f"Files: {', '.join(all_files)}")
return "\n\n".join(parts)
def _create_metadata(self, conversation: Dict[str, Any]) -> Dict[str, Any]:
"""Create metadata for ChromaDB"""
return {
'timestamp': conversation['timestamp'],
'topics': json.dumps(conversation['topics']),
'files_read': json.dumps(conversation['files_read']),
'files_written': json.dumps(conversation['files_written']),
'files_edited': json.dumps(conversation['files_edited']),
}
def index_conversations(self, rebuild: bool = False, batch_size: int = 32) -> int:
"""Index conversations for semantic search"""
if rebuild:
self._log("Rebuilding entire index...")
# Clear existing collection
self.chroma_client.delete_collection("conversations")
self.collection = self.chroma_client.create_collection(
name="conversations",
metadata={"hnsw:space": "cosine"}
)
else:
self._log("Incremental indexing...")
# Fetch conversations to index
conversations = self._fetch_conversations_to_index(rebuild)
if not conversations:
self._log("No conversations to index")
return 0
self._log(f"Indexing {len(conversations)} conversations...")
# Process in batches
indexed_count = 0
for i in range(0, len(conversations), batch_size):
batch = conversations[i:i + batch_size]
# Prepare batch data
ids = []
documents = []
metadatas = []
for conv in batch:
ids.append(conv['id'])
documents.append(self._create_document_text(conv))
metadatas.append(self._create_metadata(conv))
# Generate embeddings
embeddings = self.model.encode(documents, show_progress_bar=self.verbose)
# Add to ChromaDB
self.collection.add(
ids=ids,
documents=documents,
embeddings=embeddings.tolist(),
metadatas=metadatas
)
indexed_count += len(batch)
self._log(f"Indexed {indexed_count}/{len(conversations)} conversations")
self._log(f"✓ Indexing complete: {indexed_count} conversations")
return indexed_count
def search(self, query: str, n_results: int = 10, filters: Optional[Dict[str, Any]] = None) -> List[Dict[str, Any]]:
"""Search conversations by semantic similarity"""
# Generate query embedding
query_embedding = self.model.encode([query])[0]
# Search in ChromaDB
results = self.collection.query(
query_embeddings=[query_embedding.tolist()],
n_results=n_results,
where=filters if filters else None
)
# Format results
formatted_results = []
for i in range(len(results['ids'][0])):
formatted_results.append({
'id': results['ids'][0][i],
'distance': results['distances'][0][i],
'similarity': 1 - results['distances'][0][i], # Convert distance to similarity
'document': results['documents'][0][i],
'metadata': results['metadatas'][0][i] if results['metadatas'] else {}
})
return formatted_results
def get_stats(self) -> Dict[str, Any]:
"""Get indexing statistics"""
try:
count = self.collection.count()
return {
'total_indexed': count,
'model': self.model_name,
'collection_name': self.collection.name,
'embedding_dimension': self.model.get_sentence_embedding_dimension()
}
except Exception as e:
return {
'error': str(e)
}
def close(self):
"""Close connections"""
if self.conn:
self.conn.close()
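# Minimal programmatic sketch (assumes the SQLite database built by conversation-processor.py
# exists at the default path):
#
#   indexer = RAGIndexer(Path(".claude/skills/cc-insights/.processed/conversations.db"),
#                        Path(".claude/skills/cc-insights/.processed/embeddings"))
#   indexer.index_conversations()  # incremental by default; pass rebuild=True to start over
#   for hit in indexer.search("authentication bugs", n_results=3):
#       print(f"{hit['similarity']:.3f}  {hit['id']}")
#   indexer.close()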
@click.command()
@click.option('--db-path', type=click.Path(), default='.claude/skills/cc-insights/.processed/conversations.db',
help='SQLite database path')
@click.option('--embeddings-dir', type=click.Path(), default='.claude/skills/cc-insights/.processed/embeddings',
help='ChromaDB embeddings directory')
@click.option('--model', default='all-MiniLM-L6-v2', help='Sentence transformer model name')
@click.option('--rebuild', is_flag=True, help='Rebuild entire index (delete and recreate)')
@click.option('--batch-size', default=32, help='Batch size for embedding generation')
@click.option('--verbose', is_flag=True, help='Show detailed logs')
@click.option('--stats', is_flag=True, help='Show statistics after indexing')
@click.option('--test-search', type=str, help='Test search with query')
def main(db_path: str, embeddings_dir: str, model: str, rebuild: bool, batch_size: int, verbose: bool, stats: bool, test_search: Optional[str]):
"""Build vector embeddings for semantic search"""
db_path = Path(db_path)
embeddings_dir = Path(embeddings_dir)
if not db_path.exists():
print(f"Error: Database not found at {db_path}")
print("Run conversation-processor.py first to process conversations")
exit(1)
indexer = RAGIndexer(db_path, embeddings_dir, model, verbose=verbose)
try:
# Index conversations
count = indexer.index_conversations(rebuild=rebuild, batch_size=batch_size)
print(f"\n✓ Indexed {count} conversations")
if stats:
print("\n=== Indexing Statistics ===")
stats_data = indexer.get_stats()
for key, value in stats_data.items():
print(f"{key}: {value}")
if test_search:
print(f"\n=== Test Search: '{test_search}' ===")
results = indexer.search(test_search, n_results=5)
if not results:
print("No results found")
else:
for i, result in enumerate(results, 1):
print(f"\n{i}. [Similarity: {result['similarity']:.3f}] {result['id']}")
print(f" {result['document'][:200]}...")
finally:
indexer.close()
if __name__ == '__main__':
main()


@@ -0,0 +1,384 @@
#!/usr/bin/env python3
"""
Search Interface for Claude Code Insights
Provides unified search across conversations using semantic (RAG) and keyword search.
Supports filtering by dates, files, and output formatting.
"""
import sqlite3
import json
from pathlib import Path
from typing import List, Dict, Any, Optional
from datetime import datetime
import click
try:
    from rag_indexer import RAGIndexer
except ImportError:
    # The sibling script is named rag-indexer.py (hyphenated), so a plain import cannot
    # resolve it; load the module from its file path instead.
    import importlib.util
    _spec = importlib.util.spec_from_file_location("rag_indexer", Path(__file__).parent / "rag-indexer.py")
    _rag_indexer = importlib.util.module_from_spec(_spec)
    _spec.loader.exec_module(_rag_indexer)
    RAGIndexer = _rag_indexer.RAGIndexer
class ConversationSearch:
"""Unified search interface for conversations"""
def __init__(self, db_path: Path, embeddings_dir: Path, verbose: bool = False):
self.db_path = db_path
self.embeddings_dir = embeddings_dir
self.verbose = verbose
# Initialize RAG indexer for semantic search
self.indexer = RAGIndexer(db_path, embeddings_dir, verbose=verbose)
# Separate SQLite connection for metadata queries
self.conn = sqlite3.connect(str(db_path))
self.conn.row_factory = sqlite3.Row
def _log(self, message: str):
"""Log if verbose mode is enabled"""
if self.verbose:
print(f"[{datetime.now().strftime('%H:%M:%S')}] {message}")
def _get_conversation_details(self, conversation_id: str) -> Optional[Dict[str, Any]]:
"""Get full conversation details from SQLite"""
cursor = self.conn.execute("""
SELECT * FROM conversations WHERE id = ?
""", (conversation_id,))
row = cursor.fetchone()
if not row:
return None
return {
'id': row['id'],
'timestamp': row['timestamp'],
'message_count': row['message_count'],
'user_messages': row['user_messages'],
'assistant_messages': row['assistant_messages'],
'files_read': json.loads(row['files_read']) if row['files_read'] else [],
'files_written': json.loads(row['files_written']) if row['files_written'] else [],
'files_edited': json.loads(row['files_edited']) if row['files_edited'] else [],
'tools_used': json.loads(row['tools_used']) if row['tools_used'] else [],
'topics': json.loads(row['topics']) if row['topics'] else [],
'first_user_message': row['first_user_message'],
'last_assistant_message': row['last_assistant_message']
}
def semantic_search(
self,
query: str,
limit: int = 10,
date_from: Optional[str] = None,
date_to: Optional[str] = None,
file_pattern: Optional[str] = None
) -> List[Dict[str, Any]]:
"""Perform RAG-based semantic search"""
self._log(f"Semantic search: '{query}'")
# TODO: Add ChromaDB filters for dates/files when supported
results = self.indexer.search(query, n_results=limit * 2) # Get extra for filtering
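# Date and file filters are applied after retrieval (below), so a restrictive filter can
# return fewer than `limit` results even when more matches exist in the index.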
# Enrich with full conversation details
enriched_results = []
for result in results:
details = self._get_conversation_details(result['id'])
if details:
# Apply post-search filters
if date_from and details['timestamp'] < date_from:
continue
if date_to and details['timestamp'] > date_to:
continue
if file_pattern:
all_files = details['files_read'] + details['files_written'] + details['files_edited']
if not any(file_pattern in f for f in all_files):
continue
enriched_results.append({
**result,
**details
})
if len(enriched_results) >= limit:
break
return enriched_results
def keyword_search(
self,
query: str,
limit: int = 10,
date_from: Optional[str] = None,
date_to: Optional[str] = None,
file_pattern: Optional[str] = None
) -> List[Dict[str, Any]]:
"""Perform SQL-based keyword search"""
self._log(f"Keyword search: '{query}'")
# Build SQL query
conditions = [
"(first_user_message LIKE ? OR last_assistant_message LIKE ? OR topics LIKE ?)"
]
params = [f"%{query}%", f"%{query}%", f"%{query}%"]
if date_from:
conditions.append("timestamp >= ?")
params.append(date_from)
if date_to:
conditions.append("timestamp <= ?")
params.append(date_to)
if file_pattern:
conditions.append(
"(files_read LIKE ? OR files_written LIKE ? OR files_edited LIKE ?)"
)
params.extend([f"%{file_pattern}%"] * 3)
where_clause = " AND ".join(conditions)
cursor = self.conn.execute(f"""
SELECT * FROM conversations
WHERE {where_clause}
ORDER BY timestamp DESC
LIMIT ?
""", params + [limit])
results = []
for row in cursor.fetchall():
results.append({
'id': row['id'],
'timestamp': row['timestamp'],
'message_count': row['message_count'],
'user_messages': row['user_messages'],
'assistant_messages': row['assistant_messages'],
'files_read': json.loads(row['files_read']) if row['files_read'] else [],
'files_written': json.loads(row['files_written']) if row['files_written'] else [],
'files_edited': json.loads(row['files_edited']) if row['files_edited'] else [],
'tools_used': json.loads(row['tools_used']) if row['tools_used'] else [],
'topics': json.loads(row['topics']) if row['topics'] else [],
'first_user_message': row['first_user_message'],
'last_assistant_message': row['last_assistant_message']
})
return results
def search_by_file(self, file_pattern: str, limit: int = 10) -> List[Dict[str, Any]]:
"""Find all conversations that touched specific files"""
self._log(f"File search: '{file_pattern}'")
cursor = self.conn.execute("""
SELECT DISTINCT c.*
FROM conversations c
JOIN file_interactions fi ON c.id = fi.conversation_id
WHERE fi.file_path LIKE ?
ORDER BY c.timestamp DESC
LIMIT ?
""", (f"%{file_pattern}%", limit))
results = []
for row in cursor.fetchall():
results.append({
'id': row['id'],
'timestamp': row['timestamp'],
'message_count': row['message_count'],
'files_read': json.loads(row['files_read']) if row['files_read'] else [],
'files_written': json.loads(row['files_written']) if row['files_written'] else [],
'files_edited': json.loads(row['files_edited']) if row['files_edited'] else [],
'tools_used': json.loads(row['tools_used']) if row['tools_used'] else [],
'topics': json.loads(row['topics']) if row['topics'] else [],
'first_user_message': row['first_user_message']
})
return results
def search_by_tool(self, tool_name: str, limit: int = 10) -> List[Dict[str, Any]]:
"""Find conversations using specific tools"""
self._log(f"Tool search: '{tool_name}'")
cursor = self.conn.execute("""
SELECT DISTINCT c.*
FROM conversations c
JOIN tool_usage tu ON c.id = tu.conversation_id
WHERE tu.tool_name LIKE ?
ORDER BY c.timestamp DESC
LIMIT ?
""", (f"%{tool_name}%", limit))
results = []
for row in cursor.fetchall():
results.append({
'id': row['id'],
'timestamp': row['timestamp'],
'message_count': row['message_count'],
'tools_used': json.loads(row['tools_used']) if row['tools_used'] else [],
'topics': json.loads(row['topics']) if row['topics'] else [],
'first_user_message': row['first_user_message']
})
return results
def format_results(self, results: List[Dict[str, Any]], format: str = 'text') -> str:
"""Format search results"""
if format == 'json':
return json.dumps(results, indent=2)
elif format == 'markdown':
output = [f"# Search Results ({len(results)} found)\n"]
for i, result in enumerate(results, 1):
timestamp = datetime.fromisoformat(result['timestamp']).strftime('%b %d, %Y %H:%M')
similarity = f"[Similarity: {result['similarity']:.3f}] " if 'similarity' in result else ""
output.append(f"## {i}. {similarity}{result['id']}")
output.append(f"**Date:** {timestamp}")
output.append(f"**Messages:** {result.get('message_count', 'N/A')}")
if result.get('topics'):
output.append(f"**Topics:** {', '.join(result['topics'])}")
all_files = (result.get('files_read', []) +
result.get('files_written', []) +
result.get('files_edited', []))
if all_files:
output.append(f"**Files:** {', '.join(all_files[:5])}")
if len(all_files) > 5:
output.append(f" _(and {len(all_files) - 5} more)_")
if result.get('tools_used'):
output.append(f"**Tools:** {', '.join(result['tools_used'][:5])}")
if result.get('first_user_message'):
msg = result['first_user_message'][:200]
output.append(f"\n**Snippet:** {msg}...")
output.append("")
return "\n".join(output)
else: # text format
output = [f"\nFound {len(results)} conversations:\n"]
for i, result in enumerate(results, 1):
timestamp = datetime.fromisoformat(result['timestamp']).strftime('%b %d, %Y %H:%M')
similarity = f"[Similarity: {result['similarity']:.3f}] " if 'similarity' in result else ""
output.append(f"{i}. {similarity}{result['id']}")
output.append(f" Date: {timestamp}")
output.append(f" Messages: {result.get('message_count', 'N/A')}")
if result.get('topics'):
output.append(f" Topics: {', '.join(result['topics'][:3])}")
all_files = (result.get('files_read', []) +
result.get('files_written', []) +
result.get('files_edited', []))
if all_files:
output.append(f" Files: {', '.join(all_files[:3])}")
if result.get('first_user_message'):
msg = result['first_user_message'][:150].replace('\n', ' ')
output.append(f" Preview: {msg}...")
output.append("")
return "\n".join(output)
def close(self):
"""Close connections"""
self.indexer.close()
if self.conn:
self.conn.close()
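# Minimal programmatic sketch (the query string is just an example; paths match the CLI
# defaults below):
#
#   searcher = ConversationSearch(Path(".claude/skills/cc-insights/.processed/conversations.db"),
#                                 Path(".claude/skills/cc-insights/.processed/embeddings"))
#   try:
#       hits = searcher.semantic_search("flaky integration tests", limit=5)
#       print(searcher.format_results(hits, format="markdown"))
#   finally:
#       searcher.close()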
@click.command()
@click.argument('query', required=False)
@click.option('--db-path', type=click.Path(), default='.claude/skills/cc-insights/.processed/conversations.db',
help='SQLite database path')
@click.option('--embeddings-dir', type=click.Path(), default='.claude/skills/cc-insights/.processed/embeddings',
help='ChromaDB embeddings directory')
@click.option('--semantic/--keyword', default=True, help='Use semantic (RAG) or keyword search')
@click.option('--file', type=str, help='Filter by file pattern')
@click.option('--tool', type=str, help='Search by tool name')
@click.option('--date-from', type=str, help='Start date (ISO format)')
@click.option('--date-to', type=str, help='End date (ISO format)')
@click.option('--limit', default=10, help='Maximum results')
@click.option('--format', type=click.Choice(['text', 'json', 'markdown']), default='text', help='Output format')
@click.option('--verbose', is_flag=True, help='Show detailed logs')
def main(query: Optional[str], db_path: str, embeddings_dir: str, semantic: bool, file: Optional[str],
tool: Optional[str], date_from: Optional[str], date_to: Optional[str], limit: int, format: str, verbose: bool):
"""Search Claude Code conversations
Examples:
# Semantic search
python search-conversations.py "authentication bugs"
# Keyword search
python search-conversations.py "React optimization" --keyword
# Filter by file
python search-conversations.py "testing" --file "src/components"
# Search by tool
python search-conversations.py --tool "Write"
# Date range
python search-conversations.py "refactoring" --date-from 2025-10-01
# JSON output
python search-conversations.py "deployment" --format json
"""
db_path = Path(db_path)
embeddings_dir = Path(embeddings_dir)
if not db_path.exists():
print(f"Error: Database not found at {db_path}")
print("Run conversation-processor.py first")
exit(1)
searcher = ConversationSearch(db_path, embeddings_dir, verbose=verbose)
try:
results = []
if tool:
# Search by tool
results = searcher.search_by_tool(tool, limit=limit)
elif file:
# Search by file
results = searcher.search_by_file(file, limit=limit)
elif query:
# Text search
if semantic:
results = searcher.semantic_search(
query,
limit=limit,
date_from=date_from,
date_to=date_to,
file_pattern=file
)
else:
results = searcher.keyword_search(
query,
limit=limit,
date_from=date_from,
date_to=date_to,
file_pattern=file
)
else:
print("Error: Provide a query, --file, or --tool option")
exit(1)
# Format and output
output = searcher.format_results(results, format=format)
print(output)
finally:
searcher.close()
if __name__ == '__main__':
main()


@@ -0,0 +1,32 @@
# Weekly Activity Summary
**Period:** {{ date_from }} to {{ date_to }}
**Generated:** {{ generation_date }}
## Overview
- **Total Conversations:** {{ total_conversations }}
- **Active Days:** {{ active_days }}
- **Total Messages:** {{ total_messages }}
- **Average Messages per Conversation:** {{ avg_messages }}
## Activity Timeline
{{ activity_timeline }}
## Top Files Modified
{% for file in top_files %}
{{ loop.index }}. {{ file.heat_emoji }} **{{ file.path }}**
- Conversations: {{ file.count }}
- Interactions: Read {{ file.read }}, Write {{ file.write }}, Edit {{ file.edit }}
{% endfor %}
## Tool Usage
{% for tool in top_tools %}
{{ loop.index }}. **{{ tool.name }}**: {{ tool.count }} uses ({{ tool.percentage }}%)
{% endfor %}
## Topics
{% for topic in top_topics %}
- {{ topic.name }}: {{ topic.count }} mentions
{% endfor %}
## Insights & Recommendations
{{ insights }}