
cc-insights: Claude Code Conversation Insights

Automatically process, search, and analyze your Claude Code conversation history using RAG-powered semantic search and intelligent pattern detection.

Overview

This skill transforms your Claude Code conversations into actionable insights without any manual effort. It automatically processes conversations stored in ~/.claude/projects/, builds a searchable knowledge base with semantic understanding, and generates insightful reports about your development patterns.

Key Features

  • 🔍 RAG-Powered Semantic Search: Find conversations by meaning, not just keywords
  • 📊 Automatic Insight Reports: Detect patterns, file hotspots, and tool usage analytics
  • 📈 Activity Trends: Understand development patterns over time
  • 💡 Knowledge Extraction: Surface recurring topics and solutions
  • 🎯 Zero Manual Effort: Fully automatic processing of existing conversations
  • 🚀 Fast Performance: <1s search, <10s report generation

Quick Start

1. Installation

# Navigate to the skill directory
cd .claude/skills/cc-insights

# Install Python dependencies
pip install -r requirements.txt

# Verify installation
python scripts/conversation-processor.py --help

2. Initial Setup

Process your existing conversations:

# Process all conversations for current project
python scripts/conversation-processor.py --project-name annex --verbose --stats

# Build semantic search index
python scripts/rag-indexer.py --verbose --stats

This one-time setup will:

  • Parse all JSONL files from ~/.claude/projects/
  • Extract metadata (files, tools, topics, timestamps)
  • Build SQLite database for fast queries
  • Generate vector embeddings for semantic search
  • Create ChromaDB index

Time: ~1-2 minutes for 100 conversations

3. Search Conversations

# Semantic search (understands meaning)
python scripts/search-conversations.py "fixing authentication bugs"

# Search by file
python scripts/search-conversations.py --file "src/auth/token.ts"

# Search by tool
python scripts/search-conversations.py --tool "Write"

# Keyword search with date filter
python scripts/search-conversations.py "refactoring" --keyword --date-from 2025-10-01

4. Generate Insights

# Weekly activity report
python scripts/insight-generator.py weekly --verbose

# File heatmap (most modified files)
python scripts/insight-generator.py file-heatmap

# Tool usage analytics
python scripts/insight-generator.py tool-usage

# Save report to file
python scripts/insight-generator.py weekly --output weekly-report.md

Usage via Skill

Once set up, you can interact with the skill naturally:

User: "Search conversations about React performance optimization"
→ Returns top semantic matches with context

User: "Generate insights for the past week"
→ Creates comprehensive weekly report with metrics

User: "Show me files I've modified most often"
→ Generates file heatmap with recommendations

Architecture

.claude/skills/cc-insights/
├── SKILL.md                   # Skill definition for Claude
├── README.md                  # This file
├── requirements.txt           # Python dependencies
├── .gitignore                # Git ignore rules
│
├── scripts/                   # Core functionality
│   ├── conversation-processor.py   # Parse JSONL, extract metadata
│   ├── rag-indexer.py              # Build vector embeddings
│   ├── search-conversations.py     # Search interface
│   └── insight-generator.py        # Report generation
│
├── templates/                 # Report templates
│   └── weekly-summary.md
│
└── .processed/               # Generated data (gitignored)
    ├── conversations.db       # SQLite metadata
    └── embeddings/           # ChromaDB vector store
        ├── chroma.sqlite3
        └── [embedding data]

Scripts Reference

conversation-processor.py

Parse JSONL files and extract conversation metadata.

Usage:

python scripts/conversation-processor.py [OPTIONS]

Options:
  --project-name TEXT    Project to process (default: annex)
  --db-path PATH         Database path
  --reindex              Reprocess all (ignore cache)
  --verbose              Show detailed logs
  --stats                Display statistics after processing

What it does:

  • Scans ~/.claude/projects/[project]/*.jsonl
  • Decodes base64-encoded conversation content
  • Extracts: messages, files, tools, topics, timestamps
  • Stores in SQLite with indexes for fast queries
  • Tracks processing state for incremental updates

Output:

  • SQLite database at .processed/conversations.db
  • Processing state for incremental updates
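As a quick pre-flight check before processing, the snippet below (a hypothetical helper, not part of the skill) lists a project's JSONL files and counts how many lines parse as JSON. Paths follow the layout described above; the fields inside each line are Claude Code internals and are not inspected here.

import json
from pathlib import Path

# Hypothetical pre-flight check: count raw JSONL entries for one project.
# Replace "annex" with the (possibly encoded) directory name under ~/.claude/projects/.
project_dir = Path.home() / ".claude" / "projects" / "annex"

files = sorted(project_dir.glob("*.jsonl"))
total, parsed = 0, 0
for jsonl_file in files:
    for line in jsonl_file.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        total += 1
        try:
            json.loads(line)
            parsed += 1
        except json.JSONDecodeError:
            pass

print(f"{len(files)} files, {parsed}/{total} lines parse as JSON")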

rag-indexer.py

Build vector embeddings for semantic search.

Usage:

python scripts/rag-indexer.py [OPTIONS]

Options:
  --db-path PATH         Database path
  --embeddings-dir PATH  ChromaDB directory
  --model TEXT           Embedding model (default: all-MiniLM-L6-v2)
  --rebuild              Rebuild entire index
  --batch-size INT       Batch size (default: 32)
  --verbose              Show detailed logs
  --stats                Display statistics
  --test-search TEXT     Test search with query

What it does:

  • Reads conversations from SQLite
  • Generates embeddings using sentence-transformers
  • Stores in ChromaDB for similarity search
  • Supports incremental indexing (only new conversations)

Models:

  • all-MiniLM-L6-v2 (default): Fast, good quality, 384 dimensions
  • all-mpnet-base-v2: Higher quality, slower, 768 dimensions
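Conceptually, the indexing step follows the standard sentence-transformers + ChromaDB pattern. The sketch below is illustrative only, not the actual rag-indexer.py code; the collection name, IDs, documents, and metadata keys are assumptions.

import chromadb
from sentence_transformers import SentenceTransformer

# Illustrative embed-and-store flow (names are placeholders, not the skill's internals)
model = SentenceTransformer("all-MiniLM-L6-v2")               # default model, 384 dimensions
client = chromadb.PersistentClient(path=".processed/embeddings")
collection = client.get_or_create_collection("conversations")

docs = [
    "Fixed token refresh bug in src/auth/token.ts",
    "Refactored React list rendering for performance",
]
embeddings = model.encode(docs).tolist()                      # one 384-float vector per document
collection.add(
    ids=["conv-001", "conv-002"],
    documents=docs,
    embeddings=embeddings,
    metadatas=[{"project": "annex"}, {"project": "annex"}],
)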

search-conversations.py

Search conversations with semantic + metadata filters.

Usage:

python scripts/search-conversations.py QUERY [OPTIONS]

Options:
  --semantic/--keyword   Semantic (RAG) or keyword search (default: semantic)
  --file TEXT            Filter by file pattern
  --tool TEXT            Search by tool name
  --date-from TEXT       Start date (ISO format)
  --date-to TEXT         End date (ISO format)
  --limit INT            Max results (default: 10)
  --format TEXT          Output: text|json|markdown (default: text)
  --verbose              Show detailed logs

Examples:

# Semantic search
python scripts/search-conversations.py "authentication bugs"

# Filter by file
python scripts/search-conversations.py "React optimization" --file "src/components"

# Search by tool
python scripts/search-conversations.py --tool "Edit"

# Date range
python scripts/search-conversations.py "deployment" --date-from 2025-10-01

# JSON output for integration
python scripts/search-conversations.py "testing" --format json > results.json

insight-generator.py

Generate pattern-based reports and analytics.

Usage:

python scripts/insight-generator.py REPORT_TYPE [OPTIONS]

Report Types:
  weekly          Weekly activity summary
  file-heatmap    File modification heatmap
  tool-usage      Tool usage analytics

Options:
  --date-from TEXT       Start date (ISO format)
  --date-to TEXT         End date (ISO format)
  --output PATH          Save to file (default: stdout)
  --verbose              Show detailed logs

Examples:

# Weekly report (last 7 days)
python scripts/insight-generator.py weekly

# Custom date range
python scripts/insight-generator.py weekly --date-from 2025-10-01 --date-to 2025-10-15

# File heatmap with output
python scripts/insight-generator.py file-heatmap --output heatmap.md

# Tool analytics
python scripts/insight-generator.py tool-usage

Data Storage

All processed data is stored locally in .processed/ (gitignored):

SQLite Database (conversations.db)

Tables:

  • conversations: Main metadata (timestamps, messages, topics)
  • file_interactions: File-level interactions (read, write, edit)
  • tool_usage: Tool usage counts per conversation
  • processing_state: Tracks processed files for incremental updates

Indexes:

  • idx_timestamp: Fast date-range queries
  • idx_project: Filter by project
  • idx_file_path: File-based searches
  • idx_tool_name: Tool usage queries
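Because the database is plain SQLite, it can also be queried directly. A minimal sketch, assuming file_interactions stores one row per interaction with a file_path column (the column name is inferred from the idx_file_path index and may differ):

import sqlite3

conn = sqlite3.connect(".processed/conversations.db")
conn.row_factory = sqlite3.Row

# Most frequently touched files (assumed file_path column; see note above)
rows = conn.execute(
    "SELECT file_path, COUNT(*) AS touches "
    "FROM file_interactions GROUP BY file_path ORDER BY touches DESC LIMIT 5"
).fetchall()
for row in rows:
    print(f"{row['touches']:>4}  {row['file_path']}")
conn.close()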

ChromaDB Vector Store (embeddings/)

Contents:

  • Vector embeddings (384-dimensional by default)
  • Document text for retrieval
  • Metadata for filtering
  • HNSW index for fast similarity search

Performance:

  • <1 second for semantic search
  • Handles 10,000+ conversations efficiently
  • ~100MB per 1,000 conversations
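For debugging, the vector store can also be queried directly with the same embedding model, bypassing search-conversations.py. A minimal sketch, assuming the collection is named "conversations":

import chromadb
from sentence_transformers import SentenceTransformer

client = chromadb.PersistentClient(path=".processed/embeddings")
collection = client.get_or_create_collection("conversations")   # assumed collection name
model = SentenceTransformer("all-MiniLM-L6-v2")

query_vec = model.encode(["authentication bugs"]).tolist()
results = collection.query(query_embeddings=query_vec, n_results=5)
for doc_id, distance in zip(results["ids"][0], results["distances"][0]):
    print(doc_id, round(distance, 3))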

Performance

Operation                        Time    Notes
Initial processing (100 convs)   ~30s    One-time setup
Initial indexing (100 convs)     ~60s    One-time setup
Incremental processing           <5s     Only new conversations
Semantic search                  <1s     Top 10 results
Keyword search                   <0.1s   SQLite FTS
Weekly report generation         <10s    Includes visualizations

Troubleshooting

"Database not found"

Problem: Scripts can't find conversations.db

Solution:

# Run processor first
python scripts/conversation-processor.py --project-name annex --verbose

"No conversations found"

Problem: Project name doesn't match or no JSONL files

Solution:

# Check project directories
ls ~/.claude/projects/

# Use correct project name (may be encoded)
python scripts/conversation-processor.py --project-name [actual-name] --verbose

"ImportError: sentence_transformers"

Problem: Dependencies not installed

Solution:

# Install requirements
pip install -r requirements.txt

# Or individually
pip install sentence-transformers chromadb jinja2 click python-dateutil

"Slow embedding generation"

Problem: Large number of conversations

Solution:

# Use smaller batch size
python scripts/rag-indexer.py --batch-size 16

# Or switch back to the faster default model (if you changed to all-mpnet-base-v2)
python scripts/rag-indexer.py --model all-MiniLM-L6-v2 --rebuild

"Out of memory"

Problem: Too many conversations processed at once

Solution:

# Smaller batch size
python scripts/rag-indexer.py --batch-size 8

# Or process in chunks by date
python scripts/conversation-processor.py --date-from 2025-10-01 --date-to 2025-10-15

Incremental Updates

The system automatically handles incremental updates:

  1. Conversation Processor: Tracks file hashes in processing_state table

    • Only reprocesses changed files
    • Detects new JSONL files automatically
  2. RAG Indexer: Checks ChromaDB for existing IDs

    • Only indexes new conversations
    • Skips already-embedded conversations
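A minimal sketch of this hash-based change detection, assuming processing_state has file_path and file_hash columns (column names are assumptions for illustration):

import hashlib
import sqlite3
from pathlib import Path

def file_changed(conn: sqlite3.Connection, path: Path) -> bool:
    """True if the JSONL file is new or its content hash differs from the stored one."""
    current = hashlib.sha256(path.read_bytes()).hexdigest()
    row = conn.execute(
        "SELECT file_hash FROM processing_state WHERE file_path = ?", (str(path),)
    ).fetchone()
    return row is None or row[0] != current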

Recommended workflow:

# Daily/weekly: Run both for new conversations
python scripts/conversation-processor.py --project-name annex
python scripts/rag-indexer.py

# Takes <5s if only a few new conversations

Integration Examples

Search from command line

# Quick search function in .bashrc or .zshrc
cc-search() {
  python ~/.claude/skills/cc-insights/scripts/search-conversations.py "$@"
}

# Usage
cc-search "authentication bugs"

Generate weekly report automatically

# Add to crontab for weekly reports
0 9 * * MON cd ~/.claude/skills/cc-insights && python scripts/insight-generator.py weekly --output ~/reports/weekly-$(date +\%Y-\%m-\%d).md

Export data for external tools

# Export to JSON
python scripts/search-conversations.py "testing" --format json | jq .

# Export metadata
sqlite3 -json .processed/conversations.db "SELECT * FROM conversations" > export.json

Privacy & Security

  • Local-only: All data stays on your machine
  • No external APIs: Embeddings generated locally
  • Project-scoped: Only accesses current project
  • Gitignored: .processed/ excluded from version control
  • Sensitive data: Review before sharing reports (may contain secrets)

Requirements

Python Dependencies

  • sentence-transformers>=2.2.0 - Semantic embeddings
  • chromadb>=0.4.0 - Vector database
  • jinja2>=3.1.0 - Template engine
  • click>=8.1.0 - CLI framework
  • python-dateutil>=2.8.0 - Date utilities

System Requirements

  • Python 3.8+
  • 500MB disk space (for 1,000 conversations)
  • 2GB RAM (for embedding generation)

Limitations

  • Read-only: Analyzes existing conversations, doesn't modify them
  • Single project: Designed for per-project insights (not cross-project)
  • Static analysis: Analyzes saved conversations, not real-time
  • Embedding quality: Local sentence-transformer models are fast but less nuanced than large hosted models
  • JSONL format: Depends on Claude Code's internal storage format

Future Enhancements

Potential additions (not currently implemented):

  • Cross-project analytics dashboard
  • AI-powered summarization with LLM
  • Slack/Discord integration for weekly reports
  • Git commit correlation
  • VS Code extension
  • Web dashboard (Next.js)
  • Confluence/Notion export
  • Custom embedding models

FAQ

Q: How often should I rebuild the index? A: Never, unless changing models. Use incremental updates.

Q: Can I change the embedding model? A: Yes, use --model flag with rag-indexer.py, then --rebuild.

Q: Does this work with incognito mode? A: No, incognito conversations aren't saved to JSONL files.

Q: Can I share reports with my team? A: Yes, but review for sensitive information first (API keys, secrets).

Q: What if Claude Code changes the JSONL format? A: The processor may need updates. File an issue if parsing breaks.

Q: Can I delete old conversations? A: Yes, remove JSONL files and run --reindex to rebuild.

Contributing

Contributions welcome! Areas to improve:

  • Additional report templates
  • Better pattern detection algorithms
  • Performance optimizations
  • Web dashboard implementation
  • Documentation improvements

License

MIT License - See repository root for details

Support

For issues or questions:

  1. Check this README and SKILL.md
  2. Review script --help output
  3. Run with --verbose to see detailed logs
  4. Check .processed/logs/ if it exists
  5. Open an issue in the repository

Built for Connor's annex project. Zero-effort conversation intelligence.