# cc-insights: Claude Code Conversation Insights

Automatically process, search, and analyze your Claude Code conversation history using RAG-powered semantic search and intelligent pattern detection.

## Overview

This skill transforms your Claude Code conversations into actionable insights without any manual effort. It automatically processes conversations stored in `~/.claude/projects/`, builds a searchable knowledge base with semantic understanding, and generates reports about your development patterns.

### Key Features

- 🔍 **RAG-Powered Semantic Search**: Find conversations by meaning, not just keywords
- 📊 **Automatic Insight Reports**: Detect patterns, file hotspots, and tool usage analytics
- 📈 **Activity Trends**: Understand development patterns over time
- 💡 **Knowledge Extraction**: Surface recurring topics and solutions
- 🎯 **Zero Manual Effort**: Fully automatic processing of existing conversations
- 🚀 **Fast Performance**: <1s search, <10s report generation

## Quick Start

### 1. Installation

```bash
# Navigate to the skill directory
cd .claude/skills/cc-insights

# Install Python dependencies
pip install -r requirements.txt

# Verify installation
python scripts/conversation-processor.py --help
```

### 2. Initial Setup

Process your existing conversations:

```bash
# Process all conversations for current project
python scripts/conversation-processor.py --project-name annex --verbose --stats

# Build semantic search index
python scripts/rag-indexer.py --verbose --stats
```

This one-time setup will:

- Parse all JSONL files from `~/.claude/projects/`
- Extract metadata (files, tools, topics, timestamps)
- Build a SQLite database for fast queries
- Generate vector embeddings for semantic search
- Create the ChromaDB index

**Time**: ~1-2 minutes for 100 conversations

### 3. Search Conversations

```bash
# Semantic search (understands meaning)
python scripts/search-conversations.py "fixing authentication bugs"

# Search by file
python scripts/search-conversations.py --file "src/auth/token.ts"

# Search by tool
python scripts/search-conversations.py --tool "Write"

# Keyword search with date filter
python scripts/search-conversations.py "refactoring" --keyword --date-from 2025-10-01
```

### 4. Generate Insights

```bash
# Weekly activity report
python scripts/insight-generator.py weekly --verbose

# File heatmap (most modified files)
python scripts/insight-generator.py file-heatmap

# Tool usage analytics
python scripts/insight-generator.py tool-usage

# Save report to file
python scripts/insight-generator.py weekly --output weekly-report.md
```

## Usage via Skill

Once set up, you can interact with the skill naturally:

```
User: "Search conversations about React performance optimization"
→ Returns top semantic matches with context

User: "Generate insights for the past week"
→ Creates comprehensive weekly report with metrics

User: "Show me files I've modified most often"
→ Generates file heatmap with recommendations
```

## Architecture

```
.claude/skills/cc-insights/
├── SKILL.md                       # Skill definition for Claude
├── README.md                      # This file
├── requirements.txt               # Python dependencies
├── .gitignore                     # Git ignore rules
│
├── scripts/                       # Core functionality
│   ├── conversation-processor.py  # Parse JSONL, extract metadata
│   ├── rag-indexer.py             # Build vector embeddings
│   ├── search-conversations.py    # Search interface
│   └── insight-generator.py       # Report generation
│
├── templates/                     # Report templates
│   └── weekly-summary.md
│
└── .processed/                    # Generated data (gitignored)
    ├── conversations.db           # SQLite metadata
    └── embeddings/                # ChromaDB vector store
        ├── chroma.sqlite3
        └── [embedding data]
```

## Scripts Reference

### conversation-processor.py

Parse JSONL files and extract conversation metadata.
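To give a feel for what the processor does with each transcript, here is a minimal stdlib-only sketch of parsing one JSONL file. The field names (`type`, `message`, `content`, `name`) are assumptions about Claude Code's internal format for illustration only; the real script handles many more cases.

```python
import json
from pathlib import Path

def summarize_transcript(path: Path) -> dict:
    """Count messages and collect tool names from one JSONL transcript.

    NOTE: the record layout assumed here (`type`, `message.content`,
    `tool_use` blocks with `name`) is a hypothetical simplification.
    """
    messages = 0
    tools = set()
    for line in path.read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines between records
        record = json.loads(line)
        if record.get("type") in ("user", "assistant"):
            messages += 1
        # Tool invocations appear as content blocks in assistant messages
        for block in record.get("message", {}).get("content", []) or []:
            if isinstance(block, dict) and block.get("type") == "tool_use":
                tools.add(block.get("name"))
    return {"messages": messages, "tools": sorted(tools)}
```

The real processor additionally extracts files, topics, and timestamps, and writes everything to SQLite; this sketch only shows the per-file parsing loop.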
**Usage:**

```bash
python scripts/conversation-processor.py [OPTIONS]

Options:
  --project-name TEXT  Project to process (default: annex)
  --db-path PATH       Database path
  --reindex            Reprocess all (ignore cache)
  --verbose            Show detailed logs
  --stats              Display statistics after processing
```

**What it does:**

- Scans `~/.claude/projects/[project]/*.jsonl`
- Decodes base64-encoded conversation content
- Extracts: messages, files, tools, topics, timestamps
- Stores in SQLite with indexes for fast queries
- Tracks processing state for incremental updates

**Output:**

- SQLite database at `.processed/conversations.db`
- Processing state for incremental updates

### rag-indexer.py

Build vector embeddings for semantic search.

**Usage:**

```bash
python scripts/rag-indexer.py [OPTIONS]

Options:
  --db-path PATH         Database path
  --embeddings-dir PATH  ChromaDB directory
  --model TEXT           Embedding model (default: all-MiniLM-L6-v2)
  --rebuild              Rebuild entire index
  --batch-size INT       Batch size (default: 32)
  --verbose              Show detailed logs
  --stats                Display statistics
  --test-search TEXT     Test search with query
```

**What it does:**

- Reads conversations from SQLite
- Generates embeddings using sentence-transformers
- Stores them in ChromaDB for similarity search
- Supports incremental indexing (only new conversations)

**Models:**

- `all-MiniLM-L6-v2` (default): Fast, good quality, 384 dimensions
- `all-mpnet-base-v2`: Higher quality, slower, 768 dimensions

### search-conversations.py

Search conversations with semantic + metadata filters.
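The `--keyword` mode is essentially SQL over the metadata tables. A self-contained sketch of that idea, using an in-memory database (the schema and column names here are simplified assumptions, not the skill's actual schema):

```python
import sqlite3
from typing import List, Optional

# Simplified stand-in for .processed/conversations.db; the real schema differs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE conversations (id INTEGER PRIMARY KEY, timestamp TEXT, summary TEXT)"
)
conn.executemany(
    "INSERT INTO conversations (timestamp, summary) VALUES (?, ?)",
    [
        ("2025-10-02", "Refactoring the auth token parser"),
        ("2025-09-15", "Refactoring test fixtures"),
        ("2025-10-05", "Deploying the staging environment"),
    ],
)

def keyword_search(query: str, date_from: Optional[str] = None) -> List[str]:
    """Case-insensitive LIKE search with an optional date floor."""
    sql = "SELECT summary FROM conversations WHERE summary LIKE ?"
    params = [f"%{query}%"]
    if date_from:
        sql += " AND timestamp >= ?"  # ISO dates sort lexicographically
        params.append(date_from)
    sql += " ORDER BY timestamp DESC"
    return [row[0] for row in conn.execute(sql, params)]

print(keyword_search("refactoring", date_from="2025-10-01"))
```

Semantic mode replaces the `LIKE` clause with a nearest-neighbour query against the ChromaDB embeddings; the date and file filters work the same way in both modes.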
**Usage:**

```bash
python scripts/search-conversations.py QUERY [OPTIONS]

Options:
  --semantic/--keyword  Semantic (RAG) or keyword search (default: semantic)
  --file TEXT           Filter by file pattern
  --tool TEXT           Search by tool name
  --date-from TEXT      Start date (ISO format)
  --date-to TEXT        End date (ISO format)
  --limit INT           Max results (default: 10)
  --format TEXT         Output: text|json|markdown (default: text)
  --verbose             Show detailed logs
```

**Examples:**

```bash
# Semantic search
python scripts/search-conversations.py "authentication bugs"

# Filter by file
python scripts/search-conversations.py "React optimization" --file "src/components"

# Search by tool
python scripts/search-conversations.py --tool "Edit"

# Date range
python scripts/search-conversations.py "deployment" --date-from 2025-10-01

# JSON output for integration
python scripts/search-conversations.py "testing" --format json > results.json
```

### insight-generator.py

Generate pattern-based reports and analytics.

**Usage:**

```bash
python scripts/insight-generator.py REPORT_TYPE [OPTIONS]

Report Types:
  weekly        Weekly activity summary
  file-heatmap  File modification heatmap
  tool-usage    Tool usage analytics

Options:
  --date-from TEXT  Start date (ISO format)
  --date-to TEXT    End date (ISO format)
  --output PATH     Save to file (default: stdout)
  --verbose         Show detailed logs
```

**Examples:**

```bash
# Weekly report (last 7 days)
python scripts/insight-generator.py weekly

# Custom date range
python scripts/insight-generator.py weekly --date-from 2025-10-01 --date-to 2025-10-15

# File heatmap with output
python scripts/insight-generator.py file-heatmap --output heatmap.md

# Tool analytics
python scripts/insight-generator.py tool-usage
```

## Data Storage

All processed data is stored locally in `.processed/` (gitignored):

### SQLite Database (`conversations.db`)

**Tables:**

- `conversations`: Main metadata (timestamps, messages, topics)
- `file_interactions`: File-level interactions (read, write, edit)
- `tool_usage`: Tool usage counts per conversation
- `processing_state`: Tracks processed files for incremental updates

**Indexes:**

- `idx_timestamp`: Fast date-range queries
- `idx_project`: Filter by project
- `idx_file_path`: File-based searches
- `idx_tool_name`: Tool usage queries

### ChromaDB Vector Store (`embeddings/`)

**Contents:**

- Vector embeddings (384-dimensional by default)
- Document text for retrieval
- Metadata for filtering
- HNSW index for fast similarity search

**Performance:**

- <1 second for semantic search
- Handles 10,000+ conversations efficiently
- ~100MB per 1,000 conversations

## Performance

| Operation | Time | Notes |
|-----------|------|-------|
| Initial processing (100 convs) | ~30s | One-time setup |
| Initial indexing (100 convs) | ~60s | One-time setup |
| Incremental processing | <5s | Only new conversations |
| Semantic search | <1s | Top 10 results |
| Keyword search | <0.1s | SQLite FTS |
| Weekly report generation | <10s | Includes visualizations |

## Troubleshooting

### "Database not found"

**Problem:** Scripts can't find `conversations.db`

**Solution:**

```bash
# Run processor first
python scripts/conversation-processor.py --project-name annex --verbose
```

### "No conversations found"

**Problem:** Project name doesn't match, or there are no JSONL files

**Solution:**

```bash
# Check project directories
ls ~/.claude/projects/

# Use correct project name (may be encoded)
python scripts/conversation-processor.py --project-name [actual-name] --verbose
```

### "ImportError: sentence_transformers"

**Problem:** Dependencies not installed

**Solution:**

```bash
# Install requirements
pip install -r requirements.txt

# Or individually
pip install sentence-transformers chromadb jinja2 click python-dateutil
```

### "Slow embedding generation"

**Problem:** Large number of conversations

**Solution:**

```bash
# Use smaller batch size
python scripts/rag-indexer.py --batch-size 16

# Or use faster model (lower quality)
python scripts/rag-indexer.py --model all-MiniLM-L6-v2
```

### "Out of memory"

**Problem:** Too many conversations processed at once

**Solution:**

```bash
# Smaller batch size
python scripts/rag-indexer.py --batch-size 8

# Or process in chunks by date
python scripts/conversation-processor.py --date-from 2025-10-01 --date-to 2025-10-15
```

## Incremental Updates

The system automatically handles incremental updates:

1. **Conversation Processor**: Tracks file hashes in the `processing_state` table
   - Only reprocesses changed files
   - Detects new JSONL files automatically
2. **RAG Indexer**: Checks ChromaDB for existing IDs
   - Only indexes new conversations
   - Skips already-embedded conversations

**Recommended workflow:**

```bash
# Daily/weekly: Run both for new conversations
python scripts/conversation-processor.py --project-name annex
python scripts/rag-indexer.py

# Takes <5s if only a few new conversations
```

## Integration Examples

### Search from the command line

```bash
# Quick search function in .bashrc or .zshrc
cc-search() {
  python ~/.claude/skills/cc-insights/scripts/search-conversations.py "$@"
}

# Usage
cc-search "authentication bugs"
```

### Generate weekly reports automatically

```bash
# Add to crontab for weekly reports
0 9 * * MON cd ~/.claude/skills/cc-insights && python scripts/insight-generator.py weekly --output ~/reports/weekly-$(date +\%Y-\%m-\%d).md
```

### Export data for external tools

```bash
# Export search results to JSON
python scripts/search-conversations.py "testing" --format json | jq

# Export metadata
sqlite3 .processed/conversations.db "SELECT * FROM conversations" -json > export.json
```

## Privacy & Security

- **Local-only**: All data stays on your machine
- **No external APIs**: Embeddings are generated locally
- **Project-scoped**: Only accesses the current project
- **Gitignored**: `.processed/` is excluded from version control
- **Sensitive data**: Review reports before sharing (they may contain secrets)

## Requirements

### Python Dependencies

- `sentence-transformers>=2.2.0` - Semantic embeddings
- `chromadb>=0.4.0` - Vector database
- `jinja2>=3.1.0` - Template engine
- `click>=8.1.0` - CLI framework
- `python-dateutil>=2.8.0` - Date utilities

### System Requirements

- Python 3.8+
- 500MB disk space (for 1,000 conversations)
- 2GB RAM (for embedding generation)

## Limitations

- **Read-only**: Analyzes existing conversations; doesn't modify them
- **Single project**: Designed for per-project insights (not cross-project)
- **Static analysis**: Analyzes saved conversations, not real-time activity
- **Embedding quality**: Good, but local models fall short of large hosted models
- **JSONL format**: Depends on Claude Code's internal storage format

## Future Enhancements

Potential additions (not currently implemented):

- [ ] Cross-project analytics dashboard
- [ ] AI-powered summarization with an LLM
- [ ] Slack/Discord integration for weekly reports
- [ ] Git commit correlation
- [ ] VS Code extension
- [ ] Web dashboard (Next.js)
- [ ] Confluence/Notion export
- [ ] Custom embedding models

## FAQ

**Q: How often should I rebuild the index?**
A: Never, unless you change models. Use incremental updates.

**Q: Can I change the embedding model?**
A: Yes. Use the `--model` flag with rag-indexer.py, then `--rebuild`.

**Q: Does this work with incognito mode?**
A: No, incognito conversations aren't saved to JSONL files.

**Q: Can I share reports with my team?**
A: Yes, but review them for sensitive information first (API keys, secrets).

**Q: What if Claude Code changes the JSONL format?**
A: The processor may need updates. File an issue if parsing breaks.

**Q: Can I delete old conversations?**
A: Yes. Remove the JSONL files and run `--reindex` to rebuild.

## Contributing

Contributions welcome! Areas to improve:

- Additional report templates
- Better pattern detection algorithms
- Performance optimizations
- Web dashboard implementation
- Documentation improvements

## License

MIT License - see the repository root for details.

## Support

For issues or questions:

1. Check this README and SKILL.md
2. Review script `--help` output
3. Run with `--verbose` to see detailed logs
4. Check `.processed/logs/` if it exists
5. Open an issue in the repository

---

**Built for Connor's annex project**

*Zero-effort conversation intelligence*