Initial commit

This commit is contained in:
Zhongwei Li
2025-11-29 18:14:39 +08:00
commit bb966f5886
35 changed files with 8872 additions and 0 deletions

View File

@@ -0,0 +1,715 @@
---
description: Conduct comprehensive codebase research using parallel sub-agents
category: workflow
tools: Read, Write, Grep, Glob, Task, TodoWrite, Bash
model: inherit
version: 1.0.0
---
# Research Codebase
You are tasked with conducting comprehensive research across the codebase to answer user questions
by spawning parallel sub-agents and synthesizing their findings.
## CRITICAL: YOUR ONLY JOB IS TO DOCUMENT AND EXPLAIN THE CODEBASE AS IT EXISTS TODAY
- DO NOT suggest improvements or changes unless the user explicitly asks for them
- DO NOT perform root cause analysis unless the user explicitly asks for them
- DO NOT propose future enhancements unless the user explicitly asks for them
- DO NOT critique the implementation or identify problems
- DO NOT recommend refactoring, optimization, or architectural changes
- ONLY describe what exists, where it exists, how it works, and how components interact
- You are creating a technical map/documentation of the existing system
## Prerequisites
Before executing, verify all required tools and systems:
```bash
# 1. Validate thoughts system (REQUIRED)
if [[ -f "scripts/validate-thoughts-setup.sh" ]]; then
./scripts/validate-thoughts-setup.sh || exit 1
else
# Inline validation if script not found
if [[ ! -d "thoughts/shared" ]]; then
echo "❌ ERROR: Thoughts system not configured"
echo "Run: ./scripts/humanlayer/init-project.sh . {project-name}"
exit 1
fi
fi
# 2. Validate plugin scripts
if [[ -f "${CLAUDE_PLUGIN_ROOT}/scripts/check-prerequisites.sh" ]]; then
"${CLAUDE_PLUGIN_ROOT}/scripts/check-prerequisites.sh" || exit 1
fi
```
## Initial Setup
When this command is invoked, respond with:
```
I'm ready to research the codebase. Please provide your research question or area of interest, and I'll analyze it thoroughly by exploring relevant components and connections.
```
Then wait for the user's research query.
## Steps to Follow After Receiving the Research Query
### Step 1: Read Any Directly Mentioned Files First
- If the user mentions specific files (tickets, docs, JSON), read them FULLY first
- **IMPORTANT**: Use the Read tool WITHOUT limit/offset parameters to read entire files
- **CRITICAL**: Read these files yourself in the main context before spawning any sub-tasks
- This ensures you have full context before decomposing the research
### Step 2: Analyze and Decompose the Research Question
- Break down the user's query into composable research areas
- Take time to think deeply about the underlying patterns, connections, and architectural
implications the user might be seeking
- Identify specific components, patterns, or concepts to investigate
- Create a research plan using TodoWrite to track all subtasks
- Consider which directories, files, or architectural patterns are relevant
### Step 3: Spawn Parallel Sub-Agent Tasks for Comprehensive Research
Create multiple Task agents to research different aspects concurrently.
We have specialized agents that know how to do specific research tasks:
**For codebase research:**
- Use the **codebase-locator** agent to find WHERE files and components live
- Use the **codebase-analyzer** agent to understand HOW specific code works (without critiquing it)
- Use the **codebase-pattern-finder** agent to find examples of existing patterns (without
evaluating them)
**IMPORTANT**: All agents are documentarians, not critics. They will describe what exists without
suggesting improvements or identifying issues.
**For thoughts directory (if using thoughts system):**
- Use the **thoughts-locator** agent to discover what documents exist about the topic
- Use the **thoughts-analyzer** agent to extract key insights from specific documents (only the most
relevant ones)
**For external research (only if user explicitly asks):**
- Use the **external-research** agent for external documentation and resources
- IF you use external research agents, instruct them to return LINKS with their findings, and
INCLUDE those links in your final report
**For Linear tickets (if relevant):**
- Use the **linear-ticket-reader** agent to get full details of a specific ticket (if Linear MCP
available)
- Use the **linear-searcher** agent to find related tickets or historical context
The key is to use these agents intelligently:
- Start with locator agents to find what exists
- Then use analyzer agents on the most promising findings to document how they work
- Run multiple agents in parallel when they're searching for different things
- Each agent knows its job - just tell it what you're looking for
- Don't write detailed prompts about HOW to search - the agents already know
- Remind agents they are documenting, not evaluating or improving
**Example of spawning parallel research tasks:**
```
I'm going to spawn 3 parallel research tasks:
Task 1 - Find WHERE components live:
"Use codebase-locator to find all files related to [topic]. Focus on [specific directories if known]."
Task 2 - Understand HOW it works:
"Use codebase-analyzer to analyze [specific component] and document how it currently works. Include data flow and key integration points."
Task 3 - Find existing patterns:
"Use codebase-pattern-finder to find similar implementations of [pattern] in the codebase. Show concrete examples."
```
### Step 4: Wait for All Sub-Agents to Complete and Synthesize Findings
- **IMPORTANT**: Wait for ALL sub-agent tasks to complete before proceeding
- Compile all sub-agent results (both codebase and thoughts findings if applicable)
- Prioritize live codebase findings as primary source of truth
- Use thoughts/ findings as supplementary historical context (if thoughts system is used)
- Connect findings across different components
- Document specific file paths and line numbers (format: `file.ext:line`)
- Explain how components interact with each other
- Include temporal context where relevant (e.g., "This was added in commit abc123")
- Mark all research tasks as complete in TodoWrite
### Step 5: Gather Metadata for the Research Document
Collect metadata for the research document:
**If using thoughts system with metadata script:**
- Run `hack/spec_metadata.sh` or equivalent to generate metadata
- Metadata includes: date, researcher, git commit, branch, repository
**If using simple approach:**
- Get current date/time
- Get git commit hash: `git rev-parse HEAD`
- Get current branch: `git branch --show-current`
- Get repository name from `.git/config` or working directory
**Document Storage:**
All research documents are stored in the **thoughts system** for persistence:
**Required location:** `thoughts/shared/research/YYYY-MM-DD-{ticket}-{description}.md`
**Why thoughts/shared/**:
- ✅ Persisted across sessions (git-backed via HumanLayer)
- ✅ Shared across worktrees
- ✅ Synced via `humanlayer thoughts sync`
- ✅ Team collaboration ready
**Filename format:**
- With ticket: `thoughts/shared/research/YYYY-MM-DD-PROJ-XXXX-description.md`
- Without ticket: `thoughts/shared/research/YYYY-MM-DD-description.md`
Replace `PROJ` with your ticket prefix from `.claude/config.json`.
**Examples:**
- `thoughts/shared/research/2025-01-08-PROJ-1478-parent-child-tracking.md`
- `thoughts/shared/research/2025-01-08-authentication-flow.md` (no ticket)
### Step 6: Generate Research Document
Create a structured research document with the following format:
```markdown
---
date: YYYY-MM-DDTHH:MM:SS+TZ
researcher: { your-name }
git_commit: { commit-hash }
branch: { branch-name }
repository: { repo-name }
topic: "{User's Research Question}"
tags: [research, codebase, { component-names }]
status: complete
last_updated: YYYY-MM-DD
last_updated_by: { your-name }
---
# Research: {User's Research Question}
**Date**: {date/time with timezone} **Researcher**: {your-name} **Git Commit**: {commit-hash}
**Branch**: {branch-name} **Repository**: {repo-name}
## Research Question
{Original user query, verbatim}
## Summary
{High-level documentation of what you found. 2-3 paragraphs explaining the current state of the
system in this area. Focus on WHAT EXISTS, not what should exist.}
## Detailed Findings
### {Component/Area 1}
**What exists**: {Describe the current implementation}
- File location: `path/to/file.ext:123`
- Current behavior: {what it does}
- Key functions/classes: {list with file:line references}
**Connections**: {How this component integrates with others}
- Calls: `other-component.ts:45` - {description}
- Used by: `consumer.ts:67` - {description}
**Implementation details**: {Technical specifics without evaluation}
### {Component/Area 2}
{Same structure as above}
### {Component/Area N}
{Continue for all major findings}
## Code References
Quick reference of key files and their roles:
- `path/to/file1.ext:123-145` - {What this code does}
- `path/to/file2.ext:67` - {What this code does}
- `path/to/file3.ext:200-250` - {What this code does}
## Architecture Documentation
{Document the current architectural patterns, conventions, and design decisions observed in the
code. This is descriptive, not prescriptive.}
### Current Patterns
- **Pattern 1**: {How it's implemented in the codebase}
- **Pattern 2**: {How it's implemented in the codebase}
### Data Flow
{Document how data moves through the system in this area}
```
Component A → Component B → Component C {Describe what happens at each step}
```
### Key Integrations
{Document how different parts of the system connect}
## Historical Context (from thoughts/)
{ONLY if using thoughts system}
{Include insights from thoughts/ documents that provide context}
- `thoughts/shared/research/previous-doc.md` - {Key decision or insight}
- `thoughts/shared/plans/plan-123.md` - {Related implementation detail}
## Related Research
{Links to other research documents that touch on related topics}
- `research/YYYY-MM-DD-related-topic.md` - {How it relates}
## Open Questions
{Areas that would benefit from further investigation - NOT problems to fix, just areas where understanding could be deepened}
- {Question 1}
- {Question 2}
```
### Step 7: Add GitHub Permalinks (If Applicable)
**If you're on the main/master branch OR if the commit is pushed:**
Generate GitHub permalinks and replace file references:
```
https://github.com/{owner}/{repo}/blob/{commit-hash}/{file-path}#L{line}
```
For line ranges:
```
https://github.com/{owner}/{repo}/blob/{commit-hash}/{file-path}#L{start}-L{end}
```
**If working on a feature branch that's not pushed yet:**
- Keep local file references: `path/to/file.ext:line`
- Add note: "GitHub permalinks will be added once this branch is pushed"
### Step 8: Sync and Present Findings
**If using thoughts system:**
- Run `humanlayer thoughts sync` to sync the thoughts directory
- This updates symlinks, creates searchable index, and commits to thoughts repo
**If using simple approach:**
- Just save the file to your research directory
- Optionally commit to git
**Present to user:**
```markdown
✅ Research complete!
**Research document**: {file-path}
## Summary
{2-3 sentence summary of key findings}
## Key Files
{Top 3-5 most important file references}
## What I Found
{Brief overview - save details for the document}
---
## 📊 Context Status
Current usage: {X}% ({Y}K/{Z}K tokens)
{If >60%}: ⚠️ **Recommendation**: Context is getting full. For best results in the planning phase, I
recommend clearing context now.
**Options**:
1. ✅ Clear context now (recommended) - Close this session and start fresh for planning
2. Create handoff to pause work
3. Continue anyway (may impact performance)
**Why clear?** Fresh context ensures optimal AI performance for the planning phase, which will load
additional files and research.
{If <60%}: ✅ Context healthy. Ready to proceed to planning phase if needed.
---
Would you like me to:
1. Dive deeper into any specific area?
2. Create an implementation plan based on this research?
3. Explore related topics?
```
### Step 9: Handle Follow-Up Questions
If the user has follow-up questions:
1. **DO NOT create a new research document** - append to the same one
2. **Update frontmatter fields:**
- `last_updated`: {new date}
- `last_updated_by`: {your name}
- Add `last_updated_note`: "{Brief note about what was added}"
3. **Add new section to existing document:**
```markdown
---
## Follow-up Research: {Follow-up Question}
**Date**: {date} **Updated by**: {your-name}
### Additional Findings
{New research results using same structure as above}
```
4. **Spawn new sub-agents as needed** for the follow-up research
5. **Re-sync** (if using thoughts system)
## Important Notes
### Proactive Context Management
**Monitor Your Context Throughout Research**:
- Check token usage after spawning parallel agents
- After synthesis phase, check context again
- **If context >60%**: Warn user and recommend handoff
**Example Warning**:
```
⚠️ Context Usage Alert: Currently at 65% (130K/200K tokens)
Research is complete, but context is getting full. Before continuing to
planning phase, I recommend creating a handoff to preserve this work
and start fresh.
Would you like me to:
1. Create a handoff now (recommended)
2. Continue and clear context manually
3. Proceed anyway (not recommended - may impact planning quality)
**Why this matters**: The planning phase will load additional context.
Starting fresh ensures optimal AI performance.
```
**When to Warn**:
- After Step 7 (document generated) if context >60%
- After Step 9 (follow-up complete) if context >70%
- Anytime during research if context >80%
**Educate the User**:
- Explain WHY clearing context matters (performance, token efficiency)
- Explain WHEN to clear (between phases)
- Offer to create handoff yourself if `/create-handoff` command exists
### Parallel Execution
- ALWAYS use parallel Task agents for efficiency
- Don't wait for one agent to finish before spawning the next
- Spawn all research tasks at once, then wait for all to complete
### Research Philosophy
- Always perform fresh codebase research - never rely solely on existing docs
- The `thoughts/` directory (if used) provides historical context, not primary source
- Focus on concrete file paths and line numbers - make it easy to navigate
- Research documents should be self-contained and understandable months later
### Sub-Agent Prompts
- Be specific about what to search for
- Specify directories to focus on when known
- Make prompts focused on read-only documentation
- Remind agents they are documentarians, not critics
### Cross-Component Understanding
- Document how components interact, not just what they do individually
- Trace data flow across boundaries
- Note integration points and dependencies
### Temporal Context
- Include when things were added/changed if relevant
- Note deprecated patterns still in the codebase
- Don't judge - just document the timeline
### GitHub Links
- Use permalinks for permanent references
- Include line numbers for precision
- Link to specific commits, not branches (branches move)
### Main Agent Role
- Your role is synthesis, not deep file reading
- Let sub-agents do the detailed reading
- You orchestrate, compile, and connect their findings
- Focus on the big picture and cross-component connections
### Documentation Style
- Sub-agents document examples and usage patterns as they exist
- Main agent synthesizes into coherent narrative
- Both levels: documentarian, not evaluator
- Never recommend changes or improvements unless explicitly asked
### File Reading Rules
- ALWAYS read mentioned files fully before spawning sub-tasks
- Use Read tool WITHOUT limit/offset for complete files
- This is critical for proper decomposition
### Follow the Steps
- These numbered steps are not suggestions - follow them exactly
- Don't skip steps or reorder them
- Each step builds on the previous ones
### Thoughts Directory Handling
**If using thoughts system:**
- `thoughts/searchable/` is a special directory - paths found there should be documented as their
actual location
- Example: `thoughts/searchable/allison/notes.md` → document as `thoughts/allison/notes.md`
- Don't change directory names (keep `allison/`, don't change to `shared/`)
**If NOT using thoughts system:**
- Skip thoughts-related agents
- Skip thoughts sync commands
- Save research docs to `research/` directory in workspace root
### Frontmatter Consistency
- Always include complete frontmatter as shown in template
- Use ISO 8601 dates with timezone
- Keep tags consistent across research documents
- Update `last_updated` fields when appending follow-ups
## Linear Integration
If a Linear ticket is associated with the research, the command can automatically update the ticket
status.
### How It Works
**Ticket detection** (same as other commands):
1. User provides ticket ID explicitly: `/research_codebase PROJ-123`
2. Ticket mentioned in research query
3. Auto-detected from current context
**Status updates:**
- When research starts → Move ticket to **"Research"**
- When research document is saved → Add comment with link to research doc
### Implementation Pattern
**At research start** (Step 2 - after reading mentioned files):
```bash
# If ticket is detected or provided
if [[ -n "$ticketId" ]]; then
# Check if Linearis CLI is available
if command -v linearis &> /dev/null; then
# Update ticket state to "Research" (use --state NOT --status!)
linearis issues update "$ticketId" --state "Research"
# Add comment (use 'comments create' NOT 'issues comment'!)
linearis comments create "$ticketId" --body "Starting research: [user's research question]"
else
echo "⚠️ Linearis CLI not found - skipping Linear ticket update"
fi
fi
```
**After research document is saved** (Step 6 - after generating document):
```bash
# Attach research document to ticket
if [[ -n "$ticketId" ]] && [[ -n "$githubPermalink" ]]; then
# Check if Linearis CLI is available
if command -v linearis &> /dev/null; then
# Add completion comment with research doc link
linearis comments create "$ticketId" \
--body "Research complete! See findings: $githubPermalink"
else
echo "⚠️ Linearis CLI not found - skipping Linear ticket update"
fi
fi
```
### User Experience
**With ticket:**
```bash
/catalyst-dev:research_codebase PROJ-123
> "How does authentication work?"
```
**What happens:**
1. Command detects ticket PROJ-123
2. Moves ticket from Backlog → Research
3. Adds comment: "Starting research: How does authentication work?"
4. Conducts research with parallel agents
5. Saves document to thoughts/shared/research/
6. Attaches document to Linear ticket
7. Adds comment: "Research complete! See findings: [link]"
**Without ticket:**
```bash
/catalyst-dev:research_codebase
> "How does authentication work?"
```
**What happens:**
- Same research process, but no Linear updates
- User can manually attach research to ticket later
### Configuration
Uses the same Linear configuration as other commands from `.claude/config.json`:
- `linear.teamId`
- `linear.thoughtsRepoUrl` (for GitHub permalinks)
### Error Handling
**If Linear MCP not available:**
- Skip Linear integration silently
- Continue with research as normal
- Note in output: "Research complete (Linear not configured)"
**If ticket not found:**
- Show warning: "Ticket PROJ-123 not found in Linear"
- Ask user: "Continue research without Linear integration? (Y/n)"
**If status update fails:**
- Log error but continue research
- Include note in final output: "⚠️ Could not update Linear ticket status"
## Integration with Other Commands
This command integrates with the complete development workflow:
```
/research-codebase → research document (+ Linear: Research)
/create-plan → implementation plan (+ Linear: Planning)
/implement-plan → code changes (+ Linear: In Progress)
/describe-pr → PR created (+ Linear: In Review)
```
**How it connects:**
- **research_codebase → Linear**: Moves ticket to "Research" status and attaches research document
- **research_codebase → create_plan**: Research findings provide foundation for planning. The
create_plan command can reference research documents in its "References" section.
- **Research before planning**: Always research the codebase first to understand what exists before
planning changes.
- **Shared agents**: Both research_codebase and create_plan use the same specialized agents
(codebase-locator, codebase-analyzer, codebase-pattern-finder).
- **Documentation persistence**: Research documents serve as permanent reference for future work.
## Example Workflow
```bash
# User starts research
/research-codebase
# You respond with initial prompt
# User asks: "How does authentication work in the API?"
# You execute:
# 1. Read any mentioned files fully
# 2. Decompose into research areas (auth middleware, token validation, session management)
# 3. Spawn parallel agents:
# - codebase-locator: Find auth-related files
# - codebase-analyzer: Understand auth middleware implementation
# - codebase-pattern-finder: Find auth usage patterns
# - thoughts-locator: Find previous auth discussions (if using thoughts)
# 4. Wait for all agents
# 5. Synthesize findings
# 6. Generate research document at research/2025-01-08-authentication-system.md
# 7. Present summary to user
# User follows up: "How does it integrate with the database?"
# You append to same document with new findings
```
### Track in Workflow Context
After saving the research document, add it to workflow context:
```bash
if [[ -f "${CLAUDE_PLUGIN_ROOT}/scripts/workflow-context.sh" ]]; then
"${CLAUDE_PLUGIN_ROOT}/scripts/workflow-context.sh" add research "$DOC_PATH" "${TICKET_ID:-null}"
fi
```
## Adaptation Notes
This command is adapted from HumanLayer's research_codebase command. Key differences for
portability:
- **Thoughts system**: Made optional - can use simple `research/` directory
- **Metadata script**: Made optional - can generate metadata inline
- **Ticket prefixes**: Read from `.claude/config.json` or use PROJ- placeholder
- **Linear integration**: Made optional - only used if Linear MCP available
- **Web research**: Uses `external-research` agent instead of `web-search-researcher`
The core workflow and philosophy remain the same: parallel sub-agents, documentarian mindset, and
structured output.