# Best Practices

Essential principles and proven strategies for effective documentation discovery.
## 1. Prioritize context7.com for llms.txt

### Why

- **Comprehensive aggregator**: Single source for most documentation
- **Most efficient**: Instant access without searching
- **Authoritative**: Aggregates official sources
- **Up-to-date**: Continuously maintained
- **Fast**: Direct URL construction vs searching
- **Topic filtering**: Targeted results with the `?topic=` parameter
### Implementation

```
Step 1: Try context7.com (ALWAYS FIRST)
  ↓
Know GitHub repo?
  YES → https://context7.com/{org}/{repo}/llms.txt
  NO  → Continue
  ↓
Know website?
  YES → https://context7.com/websites/{normalized-path}/llms.txt
  NO  → Continue
  ↓
Specific topic needed?
  YES → Add ?topic={query} parameter
  NO  → Use base URL
  ↓
Found?
  YES → Use as primary source
  NO  → Fall back to WebSearch for llms.txt
  ↓
Still not found?
  YES → Fall back to repository analysis
```
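
As a quick sketch of this decision flow, the helper below builds candidate context7.com URLs. It is illustrative only: `context7_llms_url` and its parameters are made-up names, and the website-path normalization is an assumption rather than a documented rule.

```python
from urllib.parse import urlencode

CONTEXT7_BASE = "https://context7.com"

def context7_llms_url(org=None, repo=None, website=None, topic=None):
    """Build a candidate context7.com llms.txt URL following the decision flow above."""
    if org and repo:
        url = f"{CONTEXT7_BASE}/{org}/{repo}/llms.txt"
    elif website:
        # Assumed normalization: drop the scheme and trailing slashes, keep the path.
        normalized = website.removeprefix("https://").removeprefix("http://").strip("/")
        url = f"{CONTEXT7_BASE}/websites/{normalized}/llms.txt"
    else:
        return None  # Neither repo nor website known: fall back to WebSearch.
    if topic:
        url += "?" + urlencode({"topic": topic})  # Optional topic filter.
    return url

# Known GitHub repo with a topic filter:
print(context7_llms_url(org="shadcn-ui", repo="ui", topic="date"))
# -> https://context7.com/shadcn-ui/ui/llms.txt?topic=date
```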
### Examples

**Best approach (context7.com):**
1. Direct URL: https://context7.com/vercel/next.js/llms.txt
2. WebFetch llms.txt
3. Launch Explorer agents for URLs

Total time: ~15 seconds

**Topic-specific approach:**
1. Direct URL: https://context7.com/shadcn-ui/ui/llms.txt?topic=date
2. WebFetch filtered content
3. Present focused results

Total time: ~10 seconds

**Good fallback approach:**
1. context7.com returns 404
2. WebSearch: "Astro llms.txt site:docs.astro.build"
3. Found → WebFetch llms.txt
4. Launch Explorer agents for URLs

Total time: ~60 seconds

**Poor approach:**
1. Skip context7.com entirely
2. Search for various documentation pages
3. Manually collect URLs
4. Process one by one

Total time: ~5 minutes
### When Not Available

Fallback strategy when context7.com is unavailable:
- If context7.com returns 404 → try WebSearch for llms.txt
- If WebSearch finds nothing within 30 seconds → move to repository analysis
- If the domain is incorrect → try 2-3 alternatives, then move on
- If the documentation is very old → it likely doesn't have llms.txt
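
To make the ordering concrete, here is a minimal control-flow sketch, assuming hypothetical fetcher callables; in practice each step would be a WebFetch/WebSearch tool call with its own timeout, not a Python function.

```python
from typing import Callable, Optional

def first_successful(steps: list[tuple[str, Callable[[], Optional[str]]]]) -> Optional[tuple[str, str]]:
    """Try each (method_name, fetch) step in order and return the first non-empty result."""
    for name, fetch in steps:
        try:
            content = fetch()  # Stand-in for a tool call that enforces its own timeout.
        except Exception:
            content = None     # 404s, auth errors, network failures: treat as "not found".
        if content:
            return name, content
    return None

# Hypothetical usage: each lambda stands in for one discovery method.
steps = [
    ("context7.com llms.txt", lambda: None),
    ("WebSearch for llms.txt", lambda: None),
    ("repository analysis",   lambda: "repo docs"),
]
print(first_successful(steps))  # ('repository analysis', 'repo docs')
```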
## 2. Use Parallel Agents Aggressively

### Why

- **Speed**: N tasks in the time of 1 (vs N × time)
- **Efficiency**: Better resource utilization
- **Coverage**: Comprehensive results faster
- **Scalability**: Handles large documentation sets
### Guidelines

**Always use parallel agents for 3+ URLs:**
- 3 URLs → 1 Explorer agent (acceptable)
- 4-10 URLs → 3-5 Explorer agents (optimal)
- 11+ URLs → 5-7 agents in phases (best)

**Launch all agents in a single message:**

Good:
- [Send one message with 5 Task tool calls]

Bad:
- [Send message with Task call]
- [Wait for result]
- [Send another message with Task call]
- [Wait for result]
- ...
### Distribution Strategy

**Even distribution** (10 URLs, 5 agents):
- Agent 1: URLs 1-2
- Agent 2: URLs 3-4
- Agent 3: URLs 5-6
- Agent 4: URLs 7-8
- Agent 5: URLs 9-10

**Topic-based distribution** (10 URLs, 3 agents):
- Agent 1: Installation & Setup (URLs 1-3)
- Agent 2: Core Concepts & API (URLs 4-7)
- Agent 3: Examples & Guides (URLs 8-10)
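
A sketch of how the sizing guideline and the even-distribution example above could be implemented; the function names are illustrative, and the thresholds are taken from the guideline list.

```python
def pick_agent_count(url_count: int) -> int:
    """Map URL count to agent count: 1 agent up to 3 URLs, 3-5 for 4-10, more (in phases) beyond that."""
    if url_count <= 3:
        return 1
    if url_count <= 10:
        return min(5, max(3, (url_count + 1) // 2))
    return 6  # 11+ URLs: 5-7 agents, processed in phases.

def distribute_evenly(urls: list[str], agents: int) -> list[list[str]]:
    """Split URLs into contiguous chunks of near-equal size (Agent 1 gets URLs 1-2, Agent 2 gets 3-4, ...)."""
    size, extra = divmod(len(urls), agents)
    batches, start = [], 0
    for i in range(agents):
        end = start + size + (1 if i < extra else 0)
        batches.append(urls[start:end])
        start = end
    return batches

urls = [f"https://docs.example.com/page-{i}" for i in range(1, 11)]
for n, batch in enumerate(distribute_evenly(urls, pick_agent_count(len(urls))), start=1):
    print(f"Agent {n}: {batch}")
```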
### When Not to Parallelize
- Single URL (use WebFetch)
- 2 URLs (single agent is fine)
- Dependencies between tasks (sequential required)
- Limited documentation (1-2 pages)
## 3. Verify Official Sources

### Why

- **Accuracy**: Avoid outdated information
- **Security**: Prevent malicious content
- **Credibility**: Maintain trust
- **Relevance**: Match user's version/needs
### Verification Checklist

**For llms.txt:**
- [ ] Domain matches official site
- [ ] HTTPS connection
- [ ] Content format is valid
- [ ] URLs point to official docs
- [ ] Last-Modified header is recent (if available)

**For repositories:**
- [ ] Organization matches official entity
- [ ] Star count appropriate for library
- [ ] Recent commits (last 6 months)
- [ ] README mentions official status
- [ ] Links back to official website
- [ ] License matches expectations

**For documentation:**
- [ ] Domain is official
- [ ] Version matches user request
- [ ] Last updated date visible
- [ ] Content is complete (not stubs)
- [ ] Links work (not 404s)
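
A couple of these checks can be automated before anything is fetched. The sketch below covers only the domain and HTTPS items (the names are illustrative); content validity, freshness, and completeness still require fetching and reading the source.

```python
from urllib.parse import urlparse

def basic_source_checks(url: str, official_domain: str) -> list[str]:
    """Return red flags for a candidate source URL (HTTPS and domain checks only)."""
    flags = []
    parsed = urlparse(url)
    if parsed.scheme != "https":
        flags.append("not served over HTTPS")
    host = parsed.netloc.lower()
    if host != official_domain and not host.endswith("." + official_domain):
        flags.append(f"domain {host!r} does not match official domain {official_domain!r}")
    return flags

print(basic_source_checks("https://docs.astro.build/llms.txt", "astro.build"))  # [] -> no red flags
print(basic_source_checks("http://astro-docs.example.com/llms.txt", "astro.build"))
```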
### Red Flags

**⚠️ Unofficial sources:**
- Personal GitHub forks
- Outdated tutorials (>2 years old)
- Unmaintained repositories
- Suspicious domains
- No version information
- Conflicting with official docs
### When to Use Unofficial Sources

Acceptable when:
- No official documentation exists
- Clearly labeled as community resource
- Recent and well-maintained
- Cross-referenced with official info
- User is aware of unofficial status
## 4. Report Methodology

### Why

- **Transparency**: User knows how you found the info
- **Reproducibility**: User can verify
- **Troubleshooting**: Helps debug issues
- **Trust**: Builds confidence in results
### What to Include

**Always report:**
```markdown
## Source
**Method**: llms.txt / Repository / Research / Mixed
**Primary source**: [main URL or repository]
**Additional sources**: [list]
**Date accessed**: [current date]
**Version**: [documentation version]
```

**For llms.txt:**
```markdown
**Method**: llms.txt
**URL**: https://docs.astro.build/llms.txt
**URLs processed**: 8
**Date accessed**: 2025-10-26
**Version**: Latest (as of Oct 2025)
```

**For repository:**
```markdown
**Method**: Repository analysis (Repomix)
**Repository**: https://github.com/org/library
**Commit**: abc123f (2025-10-20)
**Stars**: 15.2k
**Analysis date**: 2025-10-26
```

**For research:**
```markdown
**Method**: Multi-source research
**Sources**:
- Official website: [url]
- Package registry: [url]
- Stack Overflow: [url]
- Community tutorials: [urls]
**Date accessed**: 2025-10-26
**Note**: No official llms.txt or repository available
```
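
If it helps keep attribution consistent, the template above can be rendered from a few fields. This is a minimal sketch with illustrative names; the fields follow the template, nothing more.

```python
from datetime import date

def source_block(method: str, primary: str, additional: list[str] | None = None,
                 version: str = "Latest") -> str:
    """Render the source-attribution template above as markdown."""
    lines = ["## Source",
             f"**Method**: {method}",
             f"**Primary source**: {primary}"]
    if additional:
        lines.append("**Additional sources**: " + ", ".join(additional))
    lines += [f"**Date accessed**: {date.today().isoformat()}",
              f"**Version**: {version}"]
    return "\n".join(lines)

print(source_block("llms.txt", "https://docs.astro.build/llms.txt",
                   version="Latest (as of Oct 2025)"))
```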
### Limitations Disclosure

**Always note:**
```markdown
## ⚠️ Limitations
- Documentation for v2.x (user may need v3.x)
- API reference section incomplete
- Examples based on TypeScript (Python examples unavailable)
- Last updated 6 months ago
```
## 5. Handle Versions Explicitly

### Why

- **Compatibility**: Avoid version mismatch errors
- **Accuracy**: Features vary by version
- **Migration**: Support upgrade paths
- **Clarity**: No ambiguity about what's covered
### Version Detection
Check these sources:
1. URL path: /docs/v2/
2. Page header/title
3. Version selector on page
4. Git tag/branch name
5. Package.json or equivalent
6. Release date correlation
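
The first and fifth sources lend themselves to quick automated extraction. The sketch below guesses a version from a URL path and reads a dependency range from package.json content; the regex and function names are assumptions, and the other sources still need manual inspection.

```python
import json
import re

def version_from_url(url: str) -> str | None:
    """Look for a version segment such as /docs/v2/ or /2.1/ in a documentation URL."""
    match = re.search(r"/v?(\d+(?:\.\d+){0,2})(?:/|$)", url)
    return match.group(1) if match else None

def version_from_package_json(text: str, dependency: str) -> str | None:
    """Read a dependency's version range from package.json content."""
    data = json.loads(text)
    for section in ("dependencies", "devDependencies"):
        if dependency in data.get(section, {}):
            return data[section][dependency]
    return None

print(version_from_url("https://docs.example.com/docs/v2/getting-started"))   # -> "2"
print(version_from_package_json('{"dependencies": {"react": "^18.2.0"}}', "react"))  # -> "^18.2.0"
```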
### Version Handling Rules

**User specifies version:**
- Request: "Documentation for React 18"
- → Search: "React v18 documentation"
- → Verify: Check version in content
- → Report: "Documentation for React v18.2.0"

**User doesn't specify:**
- Request: "Documentation for Next.js"
- → Default: Assume latest
- → Confirm: "I'll find the latest Next.js documentation"
- → Report: "Documentation for Next.js 14.0 (latest as of [date])"

**Version mismatch found:**
- Request: "Docs for v2"
- Found: Only v3 documentation
- → Report: "⚠️ Requested v2, but only v3 docs available. Here's v3 with migration guide."
### Multi-Version Scenarios

**Comparison request:**
- Request: "Compare v1 and v2"
- → Find both versions
- → Launch parallel agents (set A for v1, set B for v2)
- → Present side-by-side analysis

**Migration request:**
- Request: "How to migrate from v1 to v2"
- → Find the v2 migration guide
- → Also fetch v1 and v2 docs
- → Highlight breaking changes
- → Provide code examples (before/after)
## 6. Aggregate Intelligently

### Why

- **Clarity**: Easier to understand
- **Efficiency**: Less cognitive load
- **Completeness**: Unified view
- **Actionability**: Clear next steps
### Bad Aggregation (Don't Do This)

```markdown
## Results

Agent 1 found:
[dump of agent 1 output]

Agent 2 found:
[dump of agent 2 output]

Agent 3 found:
[dump of agent 3 output]
```

Problems:
- Redundant information repeated
- No synthesis
- Hard to scan
- Lacks narrative
### Good Aggregation (Do This)

````markdown
## Installation
[Synthesized from agents 1 & 2]

Three installation methods available:

1. **npm (recommended)**:
   ```bash
   npm install library-name
   ```

2. **CDN**: [from agent 1]
   `<script src="..."></script>`

3. **Manual**: [from agent 3] Download and include in project

## Core Concepts
[Synthesized from agents 2 & 4]

The library is built around three main concepts:
- **Components**: [definition from agent 2]
- **State**: [definition from agent 4]
- **Effects**: [definition from agent 2]

## Examples
[From agents 3 & 5, deduplicated]
...
````

Benefits:
- Organized by topic
- Deduplicated
- Clear narrative
- Easy to scan
### Synthesis Techniques
**Deduplication:**
- Agent 1: "Install with npm install foo"
- Agent 2: "You can install using npm: npm install foo"
- → Synthesized: "Install: npm install foo"

**Prioritization:**
- Agent 1: Basic usage example
- Agent 2: Basic usage example (same)
- Agent 3: Advanced usage example
- → Keep: Basic (from agent 1) + Advanced (from agent 3)
**Organization:**
Agents returned mixed information:
- Installation steps
- Configuration
- Usage example
- Installation requirements
- More usage examples
→ Reorganize:
- Installation (requirements + steps)
- Configuration
- Usage (all examples together)
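
For exact or near-exact repeats (the prioritization case), a normalized-key pass is usually enough; paraphrased duplicates like the deduplication example still need judgment. A minimal sketch with illustrative names:

```python
import re

def dedupe_snippets(snippets: list[str]) -> list[str]:
    """Drop repeats by comparing a normalized form (lowercased, whitespace collapsed), keeping order."""
    seen, kept = set(), []
    for snippet in snippets:
        key = re.sub(r"\s+", " ", snippet.lower()).strip()
        if key not in seen:
            seen.add(key)
            kept.append(snippet)
    return kept

findings = [
    "Install with npm install foo",
    "install  with npm install foo",              # repeat differing only in case/spacing
    "Advanced usage: configure foo.config.js",
]
print(dedupe_snippets(findings))  # keeps the first and third entries
```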
## 7. Time Management
### Why
- **User experience**: Fast results
- **Resource efficiency**: Don't waste compute
- **Fail fast**: Quickly try alternatives
- **Practical limits**: Avoid hanging
### Timeouts
**Set explicit timeouts:**
- WebSearch: 30 seconds
- WebFetch: 60 seconds
- Repository clone: 5 minutes
- Repomix processing: 10 minutes
- Explorer agent: 5 minutes per URL
- Researcher agent: 10 minutes
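
Keeping the budgets in one place makes them easy to apply consistently. The dictionary below is a sketch that mirrors the list above; the keys are placeholders, not tool names.

```python
# Illustrative timeout budgets in seconds, mirroring the list above (keys are placeholders).
TIMEOUTS = {
    "web_search": 30,
    "web_fetch": 60,
    "repo_clone": 5 * 60,
    "repomix": 10 * 60,
    "explorer_agent_per_url": 5 * 60,
    "researcher_agent": 10 * 60,
}

def over_budget(operation: str, elapsed_seconds: float) -> bool:
    """True once an operation has run past its budget and should fall back or abort."""
    return elapsed_seconds > TIMEOUTS[operation]

print(over_budget("web_search", 45))  # True: WebSearch past its 30-second budget
```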
### Time Budgets
**Simple query (single library, latest version):**
Target: <2 minutes total

Phase 1 (Discovery): 30 seconds
- llms.txt search: 15 seconds
- Fetch llms.txt: 15 seconds

Phase 2 (Exploration): 60 seconds
- Launch agents: 5 seconds
- Agents fetch URLs: 60 seconds (parallel)

Phase 3 (Aggregation): 30 seconds
- Synthesize results
- Format output

Total: ~2 minutes

**Complex query (multiple versions, comparison):**
Target: <5 minutes total

Phase 1 (Discovery): 60 seconds
- Search both versions
- Fetch both llms.txt files

Phase 2 (Exploration): 180 seconds
- Launch 6 agents (2 sets of 3)
- Parallel exploration

Phase 3 (Comparison): 60 seconds
- Analyze differences
- Format side-by-side

Total: ~5 minutes
### When to Extend Timeouts
Acceptable to go longer when:
- User explicitly requests comprehensive analysis
- Repository is large but necessary
- Multiple fallbacks attempted
- User is informed of delay
### When to Give Up
Move to next method after:
- 3 failed attempts on same approach
- Timeout exceeded by 2x
- No progress for 30 seconds
- Error indicates permanent failure (404, auth required)
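
These rules collapse into a single predicate, sketched below under the assumption that the caller tracks attempts and elapsed time itself; the error markers are illustrative, and the "no progress for 30 seconds" rule is omitted because it depends on progress reporting.

```python
def should_give_up(attempts: int, elapsed: float, budget: float,
                   last_error: str | None = None) -> bool:
    """Decide whether to abandon the current method, per the rules above."""
    permanent_markers = ("404", "auth required")  # Errors that indicate a permanent failure.
    if last_error and any(m in last_error.lower() for m in permanent_markers):
        return True   # No point retrying a permanent failure.
    if attempts >= 3:
        return True   # Three failed attempts on the same approach.
    if elapsed > 2 * budget:
        return True   # Timeout exceeded by 2x.
    return False

print(should_give_up(attempts=1, elapsed=20, budget=30, last_error="HTTP 404 Not Found"))  # True
print(should_give_up(attempts=2, elapsed=20, budget=30))                                   # False
```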
## 8. Cache Findings
### Why
- **Speed**: Instant results for repeated requests
- **Efficiency**: Reduce network requests
- **Consistency**: Same results within session
- **Reliability**: Less dependent on network
### What to Cache
**High value (always cache):**
- Repomix output (large, expensive to generate)
- llms.txt content (static, frequently referenced)
- Repository README (relatively static)
- Package registry metadata (changes rarely)
**Medium value (cache within session):**
- Documentation page content
- Search results
- Repository structure
- Version lists
**Low value (don't cache):**
- Real-time data (latest releases)
- User-specific content
- Time-sensitive information
### Cache Duration
**Within conversation:**
- All fetched content (reuse freely)

**Within session:**
- Repomix output (until conversation ends)
- llms.txt content (until new version requested)

**Across sessions:**
- Don't cache (start fresh each time)
### Cache Invalidation
Refresh cache when:
- User requests specific different version
- User says "get latest" or "refresh"
- Explicit time reference ("docs from today")
- Previous cache is from different library
### Implementation
**First request for library X:**
- Fetch llms.txt
- Store content in session variable
- Use for processing

**Second request for library X (same session):**
- Check if llms.txt is cached
- Reuse cached content
- Skip redundant fetch

**Request for library Y:**
- Don't reuse library X cache
- Fetch fresh for library Y
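
A per-session cache along these lines can be as small as a dictionary keyed by library and resource. The class below is a sketch with illustrative names; it also shows the per-library invalidation used when the user asks to refresh.

```python
class SessionCache:
    """In-memory, per-session cache keyed by (library, resource); never persisted across sessions."""
    def __init__(self):
        self._store: dict[tuple[str, str], str] = {}

    def get(self, library: str, resource: str) -> str | None:
        return self._store.get((library, resource))

    def put(self, library: str, resource: str, content: str) -> None:
        self._store[(library, resource)] = content

    def invalidate(self, library: str) -> None:
        """Drop everything cached for one library, e.g. when the user asks to refresh."""
        self._store = {k: v for k, v in self._store.items() if k[0] != library}

cache = SessionCache()
cache.put("library-x", "llms.txt", "# llms.txt content ...")
print(cache.get("library-x", "llms.txt") is not None)  # True: reuse without refetching
print(cache.get("library-y", "llms.txt"))              # None: library Y needs a fresh fetch
```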
### Cache Hit Messages
```markdown
ℹ️ Using cached llms.txt from 5 minutes ago.
To fetch fresh, say "refresh" or "get latest".
```

## Quick Reference Checklist
### Before Starting
- Identify library name clearly
- Confirm version (default: latest)
- Check if cached data available
- Plan method (llms.txt → repo → research)
### During Discovery
- Start with llms.txt search
- Verify source is official
- Check version matches requirement
- Set timeout for each operation
- Fall back quickly if method fails
### During Exploration
- Use parallel agents for 3+ URLs
- Launch all agents in single message
- Distribute workload evenly
- Monitor for errors/timeouts
- Be ready to retry or fallback
### Before Presenting
- Synthesize by topic (not by agent)
- Deduplicate repeated information
- Verify version is correct
- Include source attribution
- Note any limitations
- Format clearly
- Check completeness
### Quality Gates
Ask before presenting:
- Is information accurate?
- Are sources official?
- Does version match request?
- Are all key topics covered?
- Are limitations noted?
- Is methodology documented?
- Is output well-organized?