---
description: Conduct comprehensive codebase research using parallel sub-agents
category: workflow
tools: Read, Write, Grep, Glob, Task, TodoWrite, Bash
model: inherit
version: 1.0.0
---

# Research Codebase

You are tasked with conducting comprehensive research across the codebase to answer user questions by spawning parallel sub-agents and synthesizing their findings.

## CRITICAL: YOUR ONLY JOB IS TO DOCUMENT AND EXPLAIN THE CODEBASE AS IT EXISTS TODAY

- DO NOT suggest improvements or changes unless the user explicitly asks for them
- DO NOT perform root cause analysis unless the user explicitly asks for it
- DO NOT propose future enhancements unless the user explicitly asks for them
- DO NOT critique the implementation or identify problems
- DO NOT recommend refactoring, optimization, or architectural changes
- ONLY describe what exists, where it exists, how it works, and how components interact
- You are creating a technical map/documentation of the existing system

## Prerequisites

Before executing, verify all required tools and systems:

```bash
# 1. Validate thoughts system (REQUIRED)
if [[ -f "scripts/validate-thoughts-setup.sh" ]]; then
  ./scripts/validate-thoughts-setup.sh || exit 1
else
  # Inline validation if script not found
  if [[ ! -d "thoughts/shared" ]]; then
    echo "❌ ERROR: Thoughts system not configured"
    echo "Run: ./scripts/humanlayer/init-project.sh . {project-name}"
    exit 1
  fi
fi

# 2. Validate plugin scripts
if [[ -f "${CLAUDE_PLUGIN_ROOT}/scripts/check-prerequisites.sh" ]]; then
  "${CLAUDE_PLUGIN_ROOT}/scripts/check-prerequisites.sh" || exit 1
fi
```

## Initial Setup

When this command is invoked, respond with:

```
I'm ready to research the codebase. Please provide your research question or area of interest, and I'll analyze it thoroughly by exploring relevant components and connections.
```

Then wait for the user's research query.

## Steps to Follow After Receiving the Research Query

### Step 1: Read Any Directly Mentioned Files First

- If the user mentions specific files (tickets, docs, JSON), read them FULLY first
- **IMPORTANT**: Use the Read tool WITHOUT limit/offset parameters to read entire files
- **CRITICAL**: Read these files yourself in the main context before spawning any sub-tasks
- This ensures you have full context before decomposing the research

### Step 2: Analyze and Decompose the Research Question

- Break down the user's query into composable research areas
- Take time to think deeply about the underlying patterns, connections, and architectural implications the user might be seeking
- Identify specific components, patterns, or concepts to investigate
- Create a research plan using TodoWrite to track all subtasks
- Consider which directories, files, or architectural patterns are relevant

### Step 3: Spawn Parallel Sub-Agent Tasks for Comprehensive Research

Create multiple Task agents to research different aspects concurrently. We have specialized agents that know how to do specific research tasks:

**For codebase research:**

- Use the **codebase-locator** agent to find WHERE files and components live
- Use the **codebase-analyzer** agent to understand HOW specific code works (without critiquing it)
- Use the **codebase-pattern-finder** agent to find examples of existing patterns (without evaluating them)

**IMPORTANT**: All agents are documentarians, not critics. They will describe what exists without suggesting improvements or identifying issues.
**For thoughts directory (if using thoughts system):**

- Use the **thoughts-locator** agent to discover what documents exist about the topic
- Use the **thoughts-analyzer** agent to extract key insights from specific documents (only the most relevant ones)

**For external research (only if user explicitly asks):**

- Use the **external-research** agent for external documentation and resources
- IF you use external research agents, instruct them to return LINKS with their findings, and INCLUDE those links in your final report

**For Linear tickets (if relevant):**

- Use the **linear-ticket-reader** agent to get full details of a specific ticket (if Linear MCP available)
- Use the **linear-searcher** agent to find related tickets or historical context

The key is to use these agents intelligently:

- Start with locator agents to find what exists
- Then use analyzer agents on the most promising findings to document how they work
- Run multiple agents in parallel when they're searching for different things
- Each agent knows its job - just tell it what you're looking for
- Don't write detailed prompts about HOW to search - the agents already know
- Remind agents they are documenting, not evaluating or improving

**Example of spawning parallel research tasks:**

```
I'm going to spawn 3 parallel research tasks:

Task 1 - Find WHERE components live:
"Use codebase-locator to find all files related to [topic]. Focus on [specific directories if known]."

Task 2 - Understand HOW it works:
"Use codebase-analyzer to analyze [specific component] and document how it currently works. Include data flow and key integration points."

Task 3 - Find existing patterns:
"Use codebase-pattern-finder to find similar implementations of [pattern] in the codebase. Show concrete examples."
```
### Step 4: Wait for All Sub-Agents to Complete and Synthesize Findings

- **IMPORTANT**: Wait for ALL sub-agent tasks to complete before proceeding
- Compile all sub-agent results (both codebase and thoughts findings if applicable)
- Prioritize live codebase findings as the primary source of truth
- Use thoughts/ findings as supplementary historical context (if the thoughts system is used)
- Connect findings across different components
- Document specific file paths and line numbers (format: `file.ext:line`)
- Explain how components interact with each other
- Include temporal context where relevant (e.g., "This was added in commit abc123")
- Mark all research tasks as complete in TodoWrite

### Step 5: Gather Metadata for the Research Document

Collect metadata for the research document:

**If using thoughts system with metadata script:**

- Run `hack/spec_metadata.sh` or equivalent to generate metadata
- Metadata includes: date, researcher, git commit, branch, repository

**If using simple approach:**

- Get current date/time
- Get git commit hash: `git rev-parse HEAD`
- Get current branch: `git branch --show-current`
- Get repository name from `.git/config` or working directory

**Document Storage:**

All research documents are stored in the **thoughts system** for persistence:

**Required location:** `thoughts/shared/research/YYYY-MM-DD-{ticket}-{description}.md`

**Why thoughts/shared/**:

- ✅ Persisted across sessions (git-backed via HumanLayer)
- ✅ Shared across worktrees
- ✅ Synced via `humanlayer thoughts sync`
- ✅ Team collaboration ready

**Filename format:**

- With ticket: `thoughts/shared/research/YYYY-MM-DD-PROJ-XXXX-description.md`
- Without ticket: `thoughts/shared/research/YYYY-MM-DD-description.md`

Replace `PROJ` with your ticket prefix from `.claude/config.json`.

**Examples:**

- `thoughts/shared/research/2025-01-08-PROJ-1478-parent-child-tracking.md`
- `thoughts/shared/research/2025-01-08-authentication-flow.md` (no ticket)
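A minimal sketch of the simple approach, using only the git commands listed above; `TICKET` and `DESC` are hypothetical placeholders you would fill in from the research query, not values the command provides:

```bash
# Illustrative sketch of "simple approach" metadata gathering
DATE_ISO=$(date +%Y-%m-%dT%H:%M:%S%z)       # ISO 8601 timestamp with timezone
DATE_DAY=$(date +%Y-%m-%d)                  # date portion for the filename
GIT_COMMIT=$(git rev-parse HEAD)            # current commit hash
GIT_BRANCH=$(git branch --show-current)     # current branch name
REPO_NAME=$(basename "$(git rev-parse --show-toplevel)")  # repo name from working directory

# Hypothetical inputs: a ticket ID (may be empty) and a short topic slug
TICKET="PROJ-1478"
DESC="parent-child-tracking"

if [[ -n "$TICKET" ]]; then
  DOC_PATH="thoughts/shared/research/${DATE_DAY}-${TICKET}-${DESC}.md"
else
  DOC_PATH="thoughts/shared/research/${DATE_DAY}-${DESC}.md"
fi
echo "Research document will be saved to: $DOC_PATH"
```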
### Step 6: Generate Research Document

Create a structured research document with the following format:

````markdown
---
date: YYYY-MM-DDTHH:MM:SS+TZ
researcher: {your-name}
git_commit: {commit-hash}
branch: {branch-name}
repository: {repo-name}
topic: "{User's Research Question}"
tags: [research, codebase, {component-names}]
status: complete
last_updated: YYYY-MM-DD
last_updated_by: {your-name}
---

# Research: {User's Research Question}

**Date**: {date/time with timezone}
**Researcher**: {your-name}
**Git Commit**: {commit-hash}
**Branch**: {branch-name}
**Repository**: {repo-name}

## Research Question

{Original user query, verbatim}

## Summary

{High-level documentation of what you found. 2-3 paragraphs explaining the current state of the system in this area. Focus on WHAT EXISTS, not what should exist.}

## Detailed Findings

### {Component/Area 1}

**What exists**: {Describe the current implementation}

- File location: `path/to/file.ext:123`
- Current behavior: {what it does}
- Key functions/classes: {list with file:line references}

**Connections**: {How this component integrates with others}

- Calls: `other-component.ts:45` - {description}
- Used by: `consumer.ts:67` - {description}

**Implementation details**: {Technical specifics without evaluation}

### {Component/Area 2}

{Same structure as above}

### {Component/Area N}

{Continue for all major findings}

## Code References

Quick reference of key files and their roles:

- `path/to/file1.ext:123-145` - {What this code does}
- `path/to/file2.ext:67` - {What this code does}
- `path/to/file3.ext:200-250` - {What this code does}

## Architecture Documentation

{Document the current architectural patterns, conventions, and design decisions observed in the code. This is descriptive, not prescriptive.}

### Current Patterns

- **Pattern 1**: {How it's implemented in the codebase}
- **Pattern 2**: {How it's implemented in the codebase}

### Data Flow

{Document how data moves through the system in this area}

```
Component A → Component B → Component C
{Describe what happens at each step}
```

### Key Integrations

{Document how different parts of the system connect}

## Historical Context (from thoughts/)

{ONLY if using thoughts system}
{Include insights from thoughts/ documents that provide context}

- `thoughts/shared/research/previous-doc.md` - {Key decision or insight}
- `thoughts/shared/plans/plan-123.md` - {Related implementation detail}

## Related Research

{Links to other research documents that touch on related topics}

- `research/YYYY-MM-DD-related-topic.md` - {How it relates}

## Open Questions

{Areas that would benefit from further investigation - NOT problems to fix, just areas where understanding could be deepened}

- {Question 1}
- {Question 2}
````

### Step 7: Add GitHub Permalinks (If Applicable)

**If you're on the main/master branch OR if the commit is pushed:**

Generate GitHub permalinks and replace file references:

```
https://github.com/{owner}/{repo}/blob/{commit-hash}/{file-path}#L{line}
```

For line ranges:

```
https://github.com/{owner}/{repo}/blob/{commit-hash}/{file-path}#L{start}-L{end}
```

**If working on a feature branch that's not pushed yet:**

- Keep local file references: `path/to/file.ext:line`
- Add note: "GitHub permalinks will be added once this branch is pushed"
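As one possible (not prescribed) way to automate this step, assuming the `origin` remote points at GitHub; `FILE` and `LINE` are hypothetical placeholders for a reference taken from the research document:

```bash
# Illustrative sketch: derive a GitHub permalink from the origin remote (assumes GitHub hosting)
REMOTE_URL=$(git remote get-url origin)
# Normalize git@github.com:owner/repo.git and https://github.com/owner/repo.git to owner/repo
OWNER_REPO=$(echo "$REMOTE_URL" | sed -E 's#(git@github\.com:|https://github\.com/)##; s#\.git$##')
COMMIT=$(git rev-parse HEAD)

FILE="src/auth/middleware.ts"   # hypothetical file reference from the document
LINE=123                        # hypothetical line number

# Only emit a permalink if some remote branch contains the commit (i.e., it has been pushed)
if git branch -r --contains "$COMMIT" | grep -q .; then
  echo "https://github.com/${OWNER_REPO}/blob/${COMMIT}/${FILE}#L${LINE}"
else
  echo "GitHub permalinks will be added once this branch is pushed"
fi
```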
### Step 8: Sync and Present Findings

**If using thoughts system:**

- Run `humanlayer thoughts sync` to sync the thoughts directory
- This updates symlinks, creates a searchable index, and commits to the thoughts repo

**If using simple approach:**

- Just save the file to your research directory
- Optionally commit to git

**Present to user:**

```markdown
✅ Research complete!

**Research document**: {file-path}

## Summary

{2-3 sentence summary of key findings}

## Key Files

{Top 3-5 most important file references}

## What I Found

{Brief overview - save details for the document}

---

## 📊 Context Status

Current usage: {X}% ({Y}K/{Z}K tokens)

{If >60%}:
⚠️ **Recommendation**: Context is getting full. For best results in the planning phase, I recommend clearing context now.

**Options**:
1. ✅ Clear context now (recommended) - Close this session and start fresh for planning
2. Create handoff to pause work
3. Continue anyway (may impact performance)

**Why clear?** Fresh context ensures optimal AI performance for the planning phase, which will load additional files and research.

{If <60%}:
✅ Context healthy. Ready to proceed to planning phase if needed.

---

Would you like me to:
1. Dive deeper into any specific area?
2. Create an implementation plan based on this research?
3. Explore related topics?
```

### Step 9: Handle Follow-Up Questions

If the user has follow-up questions:

1. **DO NOT create a new research document** - append to the same one
2. **Update frontmatter fields:**
   - `last_updated`: {new date}
   - `last_updated_by`: {your name}
   - Add `last_updated_note`: "{Brief note about what was added}"
3. **Add new section to existing document:**

   ```markdown
   ---

   ## Follow-up Research: {Follow-up Question}

   **Date**: {date}
   **Updated by**: {your-name}

   ### Additional Findings

   {New research results using same structure as above}
   ```

4. **Spawn new sub-agents as needed** for the follow-up research
5. **Re-sync** (if using thoughts system)
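A rough sketch of how the frontmatter updates in item 2 could be scripted, assuming GNU sed and the exact key names from the template; `DOC_PATH` and the note text are hypothetical:

```bash
# Illustrative sketch: bump frontmatter fields when appending follow-up research
DOC_PATH="thoughts/shared/research/2025-01-08-authentication-flow.md"  # hypothetical path
NEW_DATE=$(date +%Y-%m-%d)

# Rewrite the existing keys in place (GNU sed; on macOS/BSD use `sed -i ''`)
sed -i "s/^last_updated: .*/last_updated: ${NEW_DATE}/" "$DOC_PATH"
sed -i "s/^last_updated_by: .*/last_updated_by: claude/" "$DOC_PATH"

# Add the note key after last_updated_by if it is not already present
grep -q '^last_updated_note:' "$DOC_PATH" || \
  sed -i "/^last_updated_by:/a last_updated_note: \"Added follow-up on database integration\"" "$DOC_PATH"
```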
## Important Notes

### Proactive Context Management

**Monitor Your Context Throughout Research**:

- Check token usage after spawning parallel agents
- After synthesis phase, check context again
- **If context >60%**: Warn user and recommend handoff

**Example Warning**:

```
⚠️ Context Usage Alert: Currently at 65% (130K/200K tokens)

Research is complete, but context is getting full. Before continuing to planning phase, I recommend creating a handoff to preserve this work and start fresh.

Would you like me to:
1. Create a handoff now (recommended)
2. Continue and clear context manually
3. Proceed anyway (not recommended - may impact planning quality)

**Why this matters**: The planning phase will load additional context. Starting fresh ensures optimal AI performance.
```

**When to Warn**:

- After Step 7 (document generated) if context >60%
- After Step 9 (follow-up complete) if context >70%
- Anytime during research if context >80%

**Educate the User**:

- Explain WHY clearing context matters (performance, token efficiency)
- Explain WHEN to clear (between phases)
- Offer to create the handoff yourself if the `/create-handoff` command exists

### Parallel Execution

- ALWAYS use parallel Task agents for efficiency
- Don't wait for one agent to finish before spawning the next
- Spawn all research tasks at once, then wait for all to complete

### Research Philosophy

- Always perform fresh codebase research - never rely solely on existing docs
- The `thoughts/` directory (if used) provides historical context, not the primary source
- Focus on concrete file paths and line numbers - make it easy to navigate
- Research documents should be self-contained and understandable months later

### Sub-Agent Prompts

- Be specific about what to search for
- Specify directories to focus on when known
- Make prompts focused on read-only documentation
- Remind agents they are documentarians, not critics

### Cross-Component Understanding

- Document how components interact, not just what they do individually
- Trace data flow across boundaries
- Note integration points and dependencies

### Temporal Context

- Include when things were added/changed if relevant
- Note deprecated patterns still in the codebase
- Don't judge - just document the timeline

### GitHub Links

- Use permalinks for permanent references
- Include line numbers for precision
- Link to specific commits, not branches (branches move)

### Main Agent Role

- Your role is synthesis, not deep file reading
- Let sub-agents do the detailed reading
- You orchestrate, compile, and connect their findings
- Focus on the big picture and cross-component connections

### Documentation Style

- Sub-agents document examples and usage patterns as they exist
- Main agent synthesizes into a coherent narrative
- Both levels: documentarian, not evaluator
- Never recommend changes or improvements unless explicitly asked

### File Reading Rules

- ALWAYS read mentioned files fully before spawning sub-tasks
- Use Read tool WITHOUT limit/offset for complete files
- This is critical for proper decomposition

### Follow the Steps

- These numbered steps are not suggestions - follow them exactly
- Don't skip steps or reorder them
- Each step builds on the previous ones

### Thoughts Directory Handling

**If using thoughts system:**

- `thoughts/searchable/` is a special directory - paths found there should be documented as their actual location
- Example: `thoughts/searchable/allison/notes.md` → document as `thoughts/allison/notes.md`
- Don't change directory names (keep `allison/`, don't change to `shared/`)

**If NOT using thoughts system:**

- Skip thoughts-related agents
- Skip thoughts sync commands
- Save research docs to the `research/` directory in the workspace root

### Frontmatter Consistency

- Always include complete frontmatter as shown in the template
- Use ISO 8601 dates with timezone
- Keep tags consistent across research documents
- Update `last_updated` fields when appending follow-ups

## Linear Integration

If a Linear ticket is associated with the research, the command can automatically update the ticket status.

### How It Works

**Ticket detection** (same as other commands):

1. User provides ticket ID explicitly: `/research_codebase PROJ-123`
2. Ticket mentioned in research query
3. Auto-detected from current context
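Detection is not specified beyond this list, but a minimal sketch for the first two cases might look like the following, assuming the ticket prefix comes from `.claude/config.json`; `QUERY` is a hypothetical input:

```bash
# Illustrative sketch: extract a ticket ID from the command arguments or the research query
TICKET_PREFIX="PROJ"   # assumed to come from .claude/config.json
QUERY="How does authentication work? Related to PROJ-123."  # hypothetical input

# First PREFIX-digits match wins; empty if none found
ticketId=$(echo "$QUERY" | grep -oE "${TICKET_PREFIX}-[0-9]+" | head -n 1)

if [[ -n "$ticketId" ]]; then
  echo "Detected ticket: $ticketId"
else
  echo "No ticket detected - continuing without Linear integration"
fi
```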
**Status updates:**

- When research starts → Move ticket to **"Research"**
- When research document is saved → Add comment with link to research doc

### Implementation Pattern

**At research start** (Step 2 - after reading mentioned files):

```bash
# If ticket is detected or provided
if [[ -n "$ticketId" ]]; then
  # Check if Linearis CLI is available
  if command -v linearis &> /dev/null; then
    # Update ticket state to "Research" (use --state NOT --status!)
    linearis issues update "$ticketId" --state "Research"

    # Add comment (use 'comments create' NOT 'issues comment'!)
    linearis comments create "$ticketId" --body "Starting research: [user's research question]"
  else
    echo "⚠️ Linearis CLI not found - skipping Linear ticket update"
  fi
fi
```

**After research document is saved** (Step 6 - after generating document):

```bash
# Attach research document to ticket
if [[ -n "$ticketId" ]] && [[ -n "$githubPermalink" ]]; then
  # Check if Linearis CLI is available
  if command -v linearis &> /dev/null; then
    # Add completion comment with research doc link
    linearis comments create "$ticketId" \
      --body "Research complete! See findings: $githubPermalink"
  else
    echo "⚠️ Linearis CLI not found - skipping Linear ticket update"
  fi
fi
```

### User Experience

**With ticket:**

```bash
/catalyst-dev:research_codebase PROJ-123
> "How does authentication work?"
```

**What happens:**

1. Command detects ticket PROJ-123
2. Moves ticket from Backlog → Research
3. Adds comment: "Starting research: How does authentication work?"
4. Conducts research with parallel agents
5. Saves document to thoughts/shared/research/
6. Attaches document to Linear ticket
7. Adds comment: "Research complete! See findings: [link]"

**Without ticket:**

```bash
/catalyst-dev:research_codebase
> "How does authentication work?"
```

**What happens:**

- Same research process, but no Linear updates
- User can manually attach research to ticket later

### Configuration

Uses the same Linear configuration as other commands from `.claude/config.json`:

- `linear.teamId`
- `linear.thoughtsRepoUrl` (for GitHub permalinks)

### Error Handling

**If Linear MCP not available:**

- Skip Linear integration silently
- Continue with research as normal
- Note in output: "Research complete (Linear not configured)"

**If ticket not found:**

- Show warning: "Ticket PROJ-123 not found in Linear"
- Ask user: "Continue research without Linear integration? (Y/n)"

**If status update fails:**

- Log error but continue research
- Include note in final output: "⚠️ Could not update Linear ticket status"
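A hedged sketch of the "log error but continue" behavior, reusing the `linearis` call from the implementation pattern above; `LINEAR_WARNING` is a hypothetical variable for carrying the note into the final output:

```bash
# Illustrative sketch: never let a Linear failure abort the research flow
LINEAR_WARNING=""
if ! linearis issues update "$ticketId" --state "Research" 2>/dev/null; then
  LINEAR_WARNING="⚠️ Could not update Linear ticket status"
  echo "$LINEAR_WARNING" >&2   # log the error, then continue with research
fi
# ... research continues regardless; append $LINEAR_WARNING to the final summary if set
```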
## Integration with Other Commands

This command integrates with the complete development workflow:

```
/research-codebase → research document (+ Linear: Research)
        ↓
/create-plan → implementation plan (+ Linear: Planning)
        ↓
/implement-plan → code changes (+ Linear: In Progress)
        ↓
/describe-pr → PR created (+ Linear: In Review)
```

**How it connects:**

- **research_codebase → Linear**: Moves ticket to "Research" status and attaches the research document
- **research_codebase → create_plan**: Research findings provide the foundation for planning. The create_plan command can reference research documents in its "References" section.
- **Research before planning**: Always research the codebase first to understand what exists before planning changes.
- **Shared agents**: Both research_codebase and create_plan use the same specialized agents (codebase-locator, codebase-analyzer, codebase-pattern-finder).
- **Documentation persistence**: Research documents serve as a permanent reference for future work.

## Example Workflow

```bash
# User starts research
/research-codebase

# You respond with initial prompt
# User asks: "How does authentication work in the API?"

# You execute:
# 1. Read any mentioned files fully
# 2. Decompose into research areas (auth middleware, token validation, session management)
# 3. Spawn parallel agents:
#    - codebase-locator: Find auth-related files
#    - codebase-analyzer: Understand auth middleware implementation
#    - codebase-pattern-finder: Find auth usage patterns
#    - thoughts-locator: Find previous auth discussions (if using thoughts)
# 4. Wait for all agents
# 5. Synthesize findings
# 6. Generate research document at research/2025-01-08-authentication-system.md
# 7. Present summary to user

# User follows up: "How does it integrate with the database?"
# You append to same document with new findings
```

### Track in Workflow Context

After saving the research document, add it to the workflow context:

```bash
if [[ -f "${CLAUDE_PLUGIN_ROOT}/scripts/workflow-context.sh" ]]; then
  "${CLAUDE_PLUGIN_ROOT}/scripts/workflow-context.sh" add research "$DOC_PATH" "${TICKET_ID:-null}"
fi
```

## Adaptation Notes

This command is adapted from HumanLayer's research_codebase command. Key differences for portability:

- **Thoughts system**: Made optional - can use a simple `research/` directory instead
- **Metadata script**: Made optional - can generate metadata inline
- **Ticket prefixes**: Read from `.claude/config.json` or use the PROJ- placeholder
- **Linear integration**: Made optional - only used if Linear MCP is available
- **Web research**: Uses the `external-research` agent instead of `web-search-researcher`

The core workflow and philosophy remain the same: parallel sub-agents, documentarian mindset, and structured output.