# Best Practices

Essential principles and proven strategies for effective documentation discovery.
## 1. Prioritize context7.com for llms.txt

### Why

- **Comprehensive aggregator**: Single source for most documentation
- **Most efficient**: Instant access without searching
- **Authoritative**: Aggregates official sources
- **Up-to-date**: Continuously maintained
- **Fast**: Direct URL construction instead of open-ended searching
- **Topic filtering**: Targeted results via the `?topic=` parameter
### Implementation

```
Step 1: Try context7.com (ALWAYS FIRST)
  ↓
Know the GitHub repo?
  YES → https://context7.com/{org}/{repo}/llms.txt
  NO  → Continue
  ↓
Know the website?
  YES → https://context7.com/websites/{normalized-path}/llms.txt
  NO  → Continue
  ↓
Specific topic needed?
  YES → Add ?topic={query} parameter
  NO  → Use base URL
  ↓
Found?
  YES → Use as primary source
  NO  → Fall back to WebSearch for llms.txt
  ↓
Still not found?
  YES → Fall back to repository analysis
```
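This decision tree is mostly string construction. Below is a minimal Python sketch, assuming only the URL patterns shown above; the helper name, the website-normalization rule, and the fallback behavior are illustrative, not a documented context7.com API.

```python
from urllib.parse import quote

def context7_url(org_repo: str | None = None,
                 website: str | None = None,
                 topic: str | None = None) -> str | None:
    """Build a context7.com llms.txt URL from whatever is known (illustrative)."""
    if org_repo:  # e.g. "vercel/next.js"
        base = f"https://context7.com/{org_repo}/llms.txt"
    elif website:
        # Assumed normalization: strip the scheme and trailing slashes
        path = website.removeprefix("https://").removeprefix("http://").strip("/")
        base = f"https://context7.com/websites/{path}/llms.txt"
    else:
        return None  # no direct URL: fall back to WebSearch, then repository analysis
    return f"{base}?topic={quote(topic)}" if topic else base

# context7_url(org_repo="vercel/next.js")             -> .../vercel/next.js/llms.txt
# context7_url(org_repo="shadcn-ui/ui", topic="date") -> .../llms.txt?topic=date
```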
### Examples

```
Best approach (context7.com):
1. Direct URL: https://context7.com/vercel/next.js/llms.txt
2. WebFetch llms.txt
3. Launch Explorer agents for URLs
Total time: ~15 seconds

Topic-specific approach:
1. Direct URL: https://context7.com/shadcn-ui/ui/llms.txt?topic=date
2. WebFetch filtered content
3. Present focused results
Total time: ~10 seconds

Good fallback approach:
1. context7.com returns 404
2. WebSearch: "Astro llms.txt site:docs.astro.build"
3. Found → WebFetch llms.txt
4. Launch Explorer agents for URLs
Total time: ~60 seconds

Poor approach:
1. Skip context7.com entirely
2. Search for various documentation pages
3. Manually collect URLs
4. Process one by one
Total time: ~5 minutes
```
### When Not Available

Fallback strategy when context7.com is unavailable:

- If context7.com returns 404 → try WebSearch for llms.txt
- If WebSearch finds nothing within 30 seconds → move to repository analysis
- If the domain guess is wrong → try 2-3 alternatives, then move on
- If the documentation is very old → it likely has no llms.txt
## 2. Use Parallel Agents Aggressively

### Why

- **Speed**: N tasks finish in the time of one (instead of N × time)
- **Efficiency**: Better resource utilization
- **Coverage**: Comprehensive results, faster
- **Scalability**: Handles large documentation sets
### Guidelines

**Always use parallel agents for 3+ URLs:**

```
3 URLs    → 1 Explorer agent (acceptable)
4-10 URLs → 3-5 Explorer agents (optimal)
11+ URLs  → 5-7 agents in phases (best)
```

**Launch all agents in a single message:**

```
Good:
[Send one message with 5 Task tool calls]

Bad:
[Send message with Task call]
[Wait for result]
[Send another message with Task call]
[Wait for result]
...
```
### Distribution Strategy

**Even distribution** (see the sketch after these examples):

```
10 URLs, 5 agents:
Agent 1: URLs 1-2
Agent 2: URLs 3-4
Agent 3: URLs 5-6
Agent 4: URLs 7-8
Agent 5: URLs 9-10
```

**Topic-based distribution:**

```
10 URLs, 3 agents:
Agent 1: Installation & Setup (URLs 1-3)
Agent 2: Core Concepts & API (URLs 4-7)
Agent 3: Examples & Guides (URLs 8-10)
```
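A minimal sketch of the even-distribution rule above; `distribute` is a hypothetical helper, and topic-based distribution would additionally require a topic label per URL.

```python
def distribute(urls: list[str], max_agents: int = 5) -> list[list[str]]:
    """Split URLs into near-even batches, one batch per Explorer agent (sketch)."""
    if len(urls) <= 3:
        return [urls]  # up to 3 URLs: a single agent is acceptable
    n_agents = min(max_agents, len(urls))
    size, extra = divmod(len(urls), n_agents)
    batches, start = [], 0
    for i in range(n_agents):
        end = start + size + (1 if i < extra else 0)  # earlier agents absorb the remainder
        batches.append(urls[start:end])
        start = end
    return batches

# distribute([f"url{i}" for i in range(10)]) -> 5 batches of 2 URLs each
```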
### When Not to Parallelize

- Single URL (use WebFetch directly)
- 2 URLs (a single agent is fine)
- Dependencies between tasks (sequential execution required)
- Limited documentation (1-2 pages)
## 3. Verify Official Sources

### Why

- **Accuracy**: Avoid outdated information
- **Security**: Avoid ingesting malicious content
- **Credibility**: Maintain trust
- **Relevance**: Match the user's version and needs
### Verification Checklist

**For llms.txt** (the first two checks are sketched in code below):

```
[ ] Domain matches official site
[ ] HTTPS connection
[ ] Content format is valid
[ ] URLs point to official docs
[ ] Last-Modified header is recent (if available)
```

**For repositories:**

```
[ ] Organization matches official entity
[ ] Star count appropriate for the library
[ ] Recent commits (within the last 6 months)
[ ] README mentions official status
[ ] Links back to official website
[ ] License matches expectations
```

**For documentation:**

```
[ ] Domain is official
[ ] Version matches user request
[ ] Last-updated date visible
[ ] Content is complete (not stubs)
[ ] Links work (no 404s)
```
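The first two llms.txt checks are mechanical enough to automate. A heuristic sketch, assuming a trusted `official_domains` set is supplied from elsewhere; the function name and flag wording are illustrative.

```python
from urllib.parse import urlparse

def verification_flags(llms_txt_url: str, official_domains: set[str]) -> list[str]:
    """Return red flags for an llms.txt URL; an empty list means the basic checks pass."""
    flags = []
    parsed = urlparse(llms_txt_url)
    if parsed.scheme != "https":
        flags.append("not served over HTTPS")
    host = parsed.netloc.lower()
    # Accept the official domain itself or any of its subdomains
    if not any(host == d or host.endswith("." + d) for d in official_domains):
        flags.append(f"domain {host!r} is not a known official domain")
    return flags

# verification_flags("https://docs.astro.build/llms.txt", {"astro.build"}) -> []
```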
### Red Flags

⚠️ **Unofficial sources:**

- Personal GitHub forks
- Outdated tutorials (>2 years old)
- Unmaintained repositories
- Suspicious domains
- No version information
- Content that conflicts with official docs
### When to Use Unofficial Sources

Acceptable when:

- No official documentation exists
- Clearly labeled as a community resource
- Recent and well-maintained
- Cross-referenced with official information
- The user is aware of the unofficial status
## 4. Report Methodology

### Why

- **Transparency**: The user knows how you found the information
- **Reproducibility**: The user can verify it
- **Troubleshooting**: Helps debug issues
- **Trust**: Builds confidence in the results

### What to Include

**Always report:**

```markdown
## Source

**Method**: llms.txt / Repository / Research / Mixed
**Primary source**: [main URL or repository]
**Additional sources**: [list]
**Date accessed**: [current date]
**Version**: [documentation version]
```
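A small sketch that renders this block programmatically, assuming only the fields in the template above; the function and its signature are illustrative.

```python
from datetime import date

def source_block(method: str, primary: str, additional: list[str], version: str) -> str:
    """Render the source-attribution block from the template above (sketch)."""
    extras = ", ".join(additional) if additional else "none"
    return (
        "## Source\n\n"
        f"**Method**: {method}\n"
        f"**Primary source**: {primary}\n"
        f"**Additional sources**: {extras}\n"
        f"**Date accessed**: {date.today().isoformat()}\n"
        f"**Version**: {version}\n"
    )

# print(source_block("llms.txt", "https://docs.astro.build/llms.txt", [], "Latest"))
```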
**For llms.txt:**

```markdown
**Method**: llms.txt
**URL**: https://docs.astro.build/llms.txt
**URLs processed**: 8
**Date accessed**: 2025-10-26
**Version**: Latest (as of Oct 2025)
```

**For repository:**

```markdown
**Method**: Repository analysis (Repomix)
**Repository**: https://github.com/org/library
**Commit**: abc123f (2025-10-20)
**Stars**: 15.2k
**Analysis date**: 2025-10-26
```

**For research:**

```markdown
**Method**: Multi-source research
**Sources**:
- Official website: [url]
- Package registry: [url]
- Stack Overflow: [url]
- Community tutorials: [urls]
**Date accessed**: 2025-10-26
**Note**: No official llms.txt or repository available
```
### Limitations Disclosure

Always note limitations, for example:

```markdown
## ⚠️ Limitations

- Documentation is for v2.x (user may need v3.x)
- API reference section incomplete
- Examples are TypeScript-based (Python examples unavailable)
- Last updated 6 months ago
```
## 5. Handle Versions Explicitly

### Why

- **Compatibility**: Avoid version-mismatch errors
- **Accuracy**: Features vary by version
- **Migration**: Support upgrade paths
- **Clarity**: No ambiguity about what is covered
### Version Detection

**Check these sources** (the first two are sketched in code below):

```
1. URL path: /docs/v2/
2. Page header/title
3. Version selector on the page
4. Git tag/branch name
5. package.json or equivalent
6. Release date correlation
```
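A heuristic sketch of the first two sources (URL path, then page title); the regexes and the example URL are illustrative and will false-positive on other numeric path segments, so treat the result as a hint to verify.

```python
import re

def detect_version(url: str, page_title: str = "") -> str | None:
    """Extract a version hint from a docs URL or page title (heuristic sketch)."""
    # 1. URL path, e.g. /docs/v2/ or /14.0/
    m = re.search(r"/v?(\d+(?:\.\d+)*)(?:/|$)", url)
    if m:
        return m.group(1)
    # 2. Page header/title, e.g. "React 18 Documentation"
    m = re.search(r"\bv?(\d+(?:\.\d+)+|\d+)\b", page_title)
    return m.group(1) if m else None

# detect_version("https://docs.example.com/docs/v2/") -> "2"   (hypothetical URL)
# detect_version("https://example.com", "React 18 Documentation") -> "18"
```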
### Version Handling Rules

**User specifies a version:**

```
Request: "Documentation for React 18"
→ Search: "React v18 documentation"
→ Verify: Check the version stated in the content
→ Report: "Documentation for React v18.2.0"
```

**User doesn't specify:**

```
Request: "Documentation for Next.js"
→ Default: Assume latest
→ Confirm: "I'll find the latest Next.js documentation"
→ Report: "Documentation for Next.js 14.0 (latest as of [date])"
```

**Version mismatch found:**

```
Request: "Docs for v2"
Found: Only v3 documentation
→ Report: "⚠️ Requested v2, but only v3 docs are available. Here's v3 with the migration guide."
```
### Multi-Version Scenarios

**Comparison request:**

```
Request: "Compare v1 and v2"
→ Find both versions
→ Launch parallel agents (set A for v1, set B for v2)
→ Present a side-by-side analysis
```

**Migration request:**

```
Request: "How to migrate from v1 to v2"
→ Find the v2 migration guide
→ Also fetch the v1 and v2 docs
→ Highlight breaking changes
→ Provide code examples (before/after)
```
## 6. Aggregate Intelligently

### Why

- **Clarity**: Easier to understand
- **Efficiency**: Less cognitive load
- **Completeness**: A unified view
- **Actionability**: Clear next steps
### Bad Aggregation (Don't Do This)

```markdown
## Results

Agent 1 found:
[dump of agent 1 output]

Agent 2 found:
[dump of agent 2 output]

Agent 3 found:
[dump of agent 3 output]
```

Problems:

- Redundant information is repeated
- No synthesis
- Hard to scan
- Lacks narrative
### Good Aggregation (Do This)

````markdown
## Installation

[Synthesized from agents 1 & 2]
Three installation methods are available:

1. **npm (recommended)**:
   ```bash
   npm install library-name
   ```

2. **CDN**: [from agent 1]
   ```html
   <script src="..."></script>
   ```

3. **Manual**: [from agent 3]
   Download and include in the project

## Core Concepts

[Synthesized from agents 2 & 4]
The library is built around three main concepts:

1. **Components**: [definition from agent 2]
2. **State**: [definition from agent 4]
3. **Effects**: [definition from agent 2]

## Examples

[From agents 3 & 5, deduplicated]
...
````

Benefits:

- Organized by topic
- Deduplicated
- Clear narrative
- Easy to scan
### Synthesis Techniques

**Deduplication** (see the sketch after these examples):

```
Agent 1: "Install with npm install foo"
Agent 2: "You can install using npm: npm install foo"
→ Synthesized: "Install: `npm install foo`"
```

**Prioritization:**

```
Agent 1: Basic usage example
Agent 2: Basic usage example (same)
Agent 3: Advanced usage example
→ Keep: Basic (from agent 1) + Advanced (from agent 3)
```

**Organization:**

```
Agents returned mixed information:
- Installation steps
- Configuration
- Usage example
- Installation requirements
- More usage examples

→ Reorganize:
1. Installation (requirements + steps)
2. Configuration
3. Usage (all examples together)
```
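A crude sketch of the deduplication step, covering only exact repeats after case and whitespace normalization; merging differently-worded duplicates like the npm example above would need fuzzier matching.

```python
import re

def dedupe(snippets: list[str]) -> list[str]:
    """Drop snippets that repeat earlier ones after normalization (sketch)."""
    seen: set[str] = set()
    kept = []
    for snippet in snippets:
        # Normalize: lowercase and collapse runs of whitespace
        key = re.sub(r"\s+", " ", snippet.lower()).strip()
        if key not in seen:
            seen.add(key)
            kept.append(snippet)
    return kept

# dedupe(["Basic usage example", "basic usage  example", "Advanced usage example"])
# -> ["Basic usage example", "Advanced usage example"]
```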
## 7. Time Management

### Why

- **User experience**: Fast results
- **Resource efficiency**: Don't waste compute
- **Fail fast**: Quickly move to alternatives
- **Practical limits**: Avoid hanging
### Timeouts

**Set explicit timeouts** (enforcement is sketched in code below):

```
WebSearch: 30 seconds
WebFetch: 60 seconds
Repository clone: 5 minutes
Repomix processing: 10 minutes
Explorer agent: 5 minutes per URL
Researcher agent: 10 minutes
```
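A minimal sketch of enforcing these timeouts, assuming an async environment; `with_timeout`, `web_search`, and `web_fetch` are illustrative names, not real tool APIs.

```python
import asyncio

async def with_timeout(coro, seconds: float, fallback=None):
    """Run one step under an explicit timeout; return fallback instead of hanging (sketch)."""
    try:
        return await asyncio.wait_for(coro, timeout=seconds)
    except asyncio.TimeoutError:
        return fallback  # signal the caller to try the next method in the chain

# Illustrative usage, matching the table above:
#   results = await with_timeout(web_search(query), 30)   # WebSearch: 30 s
#   page    = await with_timeout(web_fetch(url), 60)      # WebFetch: 60 s
```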
### Time Budgets

**Simple query (single library, latest version):**

```
Target: <2 minutes total

Phase 1 (Discovery): 30 seconds
- llms.txt search: 15 seconds
- Fetch llms.txt: 15 seconds

Phase 2 (Exploration): 60 seconds
- Launch agents: 5 seconds
- Agents fetch URLs: 60 seconds (parallel)

Phase 3 (Aggregation): 30 seconds
- Synthesize results
- Format output

Total: ~2 minutes
```
**Complex query (multiple versions, comparison):**

```
Target: <5 minutes total

Phase 1 (Discovery): 60 seconds
- Search both versions
- Fetch both llms.txt files

Phase 2 (Exploration): 180 seconds
- Launch 6 agents (2 sets of 3)
- Parallel exploration

Phase 3 (Comparison): 60 seconds
- Analyze differences
- Format side-by-side

Total: ~5 minutes
```
### When to Extend Timeouts

It is acceptable to run longer when:

- The user explicitly requests a comprehensive analysis
- The repository is large but necessary
- Multiple fallbacks have been attempted
- The user is informed of the delay
### When to Give Up

Move to the next method after:

- 3 failed attempts with the same approach
- A timeout exceeded by 2x
- No progress for 30 seconds
- An error that indicates permanent failure (404, auth required)
## 8. Cache Findings

### Why

- **Speed**: Instant results for repeated requests
- **Efficiency**: Fewer network requests
- **Consistency**: Same results within a session
- **Reliability**: Less dependence on the network
### What to Cache

**High value (always cache):**

```
- Repomix output (large, expensive to generate)
- llms.txt content (static, frequently referenced)
- Repository README (relatively static)
- Package registry metadata (changes rarely)
```

**Medium value (cache within session):**

```
- Documentation page content
- Search results
- Repository structure
- Version lists
```

**Low value (don't cache):**

```
- Real-time data (latest releases)
- User-specific content
- Time-sensitive information
```
### Cache Duration

```
Within a conversation:
- All fetched content (reuse freely)

Within a session:
- Repomix output (until the conversation ends)
- llms.txt content (until a new version is requested)

Across sessions:
- Don't cache (start fresh each time)
```
### Cache Invalidation

Refresh the cache when:

```
- The user requests a specific different version
- The user says "get latest" or "refresh"
- There is an explicit time reference ("docs from today")
- The previous cache is for a different library
```
### Implementation

The flow below is sketched in code after this block.

```
# First request for library X
1. Fetch llms.txt
2. Store content in a session variable
3. Use for processing

# Second request for library X (same session)
1. Check whether llms.txt is cached
2. Reuse the cached content
3. Skip the redundant fetch

# Request for library Y
1. Don't reuse library X's cache
2. Fetch fresh for library Y
```
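These steps reduce to a small keyed store. A minimal sketch, assuming an in-memory session cache; `fetch_llms_txt` stands in for the real WebFetch call, and every name here is illustrative.

```python
_session_cache: dict[tuple[str, str], str] = {}

def fetch_llms_txt(library: str, version: str) -> str:
    """Stand-in for the actual WebFetch of the library's llms.txt."""
    raise NotImplementedError  # hypothetical network call

def get_llms_txt(library: str, version: str = "latest", refresh: bool = False) -> str:
    """Return llms.txt for (library, version), fetching at most once per session (sketch)."""
    key = (library, version)
    if refresh or key not in _session_cache:  # "refresh" / "get latest" invalidates
        _session_cache[key] = fetch_llms_txt(library, version)
    return _session_cache[key]

# Library X requested twice: the second call reuses the cache.
# Library Y: a different key, so it gets its own fresh fetch.
```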
### Cache Hit Messages

```markdown
ℹ️ Using cached llms.txt from 5 minutes ago.
To fetch fresh content, say "refresh" or "get latest".
```
## Quick Reference Checklist

### Before Starting

- [ ] Identify the library name clearly
- [ ] Confirm the version (default: latest)
- [ ] Check whether cached data is available
- [ ] Plan the method (llms.txt → repo → research)

### During Discovery

- [ ] Start with the llms.txt search
- [ ] Verify the source is official
- [ ] Check the version matches the requirement
- [ ] Set a timeout for each operation
- [ ] Fall back quickly if a method fails

### During Exploration

- [ ] Use parallel agents for 3+ URLs
- [ ] Launch all agents in a single message
- [ ] Distribute the workload evenly
- [ ] Monitor for errors/timeouts
- [ ] Be ready to retry or fall back

### Before Presenting

- [ ] Synthesize by topic (not by agent)
- [ ] Deduplicate repeated information
- [ ] Verify the version is correct
- [ ] Include source attribution
- [ ] Note any limitations
- [ ] Format clearly
- [ ] Check completeness

### Quality Gates

Ask before presenting:

- [ ] Is the information accurate?
- [ ] Are the sources official?
- [ ] Does the version match the request?
- [ ] Are all key topics covered?
- [ ] Are limitations noted?
- [ ] Is the methodology documented?
- [ ] Is the output well-organized?