# Best Practices

Essential principles and proven strategies for effective documentation discovery.

## 1. Prioritize context7.com for llms.txt

### Why

- **Comprehensive aggregator**: Single source for most documentation
- **Most efficient**: Instant access without searching
- **Authoritative**: Aggregates official sources
- **Up-to-date**: Continuously maintained
- **Fast**: Direct URL construction vs searching
- **Topic filtering**: Targeted results with ?topic= parameter

### Implementation

```
Step 1: Try context7.com (ALWAYS FIRST)
  ↓
Know GitHub repo?
  YES → https://context7.com/{org}/{repo}/llms.txt
  NO  → Continue
  ↓
Know website?
  YES → https://context7.com/websites/{normalized-path}/llms.txt
  NO  → Continue
  ↓
Specific topic needed?
  YES → Add ?topic={query} parameter
  NO  → Use base URL
  ↓
Found?
  YES → Use as primary source
  NO  → Fall back to WebSearch for llms.txt
  ↓
Still not found?
  YES → Fall back to repository analysis
```

### Examples

```
Best approach (context7.com):
1. Direct URL: https://context7.com/vercel/next.js/llms.txt
2. WebFetch llms.txt
3. Launch Explorer agents for URLs
Total time: ~15 seconds

Topic-specific approach:
1. Direct URL: https://context7.com/shadcn-ui/ui/llms.txt?topic=date
2. WebFetch filtered content
3. Present focused results
Total time: ~10 seconds

Good fallback approach:
1. context7.com returns 404
2. WebSearch: "Astro llms.txt site:docs.astro.build"
3. Found → WebFetch llms.txt
4. Launch Explorer agents for URLs
Total time: ~60 seconds

Poor approach:
1. Skip context7.com entirely
2. Search for various documentation pages
3. Manually collect URLs
4. Process one by one
Total time: ~5 minutes
```

### When Not Available

Fallback strategy when context7.com is unavailable:

- If context7.com returns 404 → try WebSearch for llms.txt
- If WebSearch finds nothing in 30 seconds → move to repository analysis
- If the domain is incorrect → try 2-3 alternatives, then move on
- If the documentation is very old → it likely doesn't have llms.txt

## 2. Use Parallel Agents Aggressively

### Why

- **Speed**: N tasks in the time of 1 (vs N × time)
- **Efficiency**: Better resource utilization
- **Coverage**: Comprehensive results faster
- **Scalability**: Handles large documentation sets

### Guidelines

**Always use parallel agents for 3+ URLs:**

```
3 URLs     → 1 Explorer agent (acceptable)
4-10 URLs  → 3-5 Explorer agents (optimal)
11+ URLs   → 5-7 agents in phases (best)
```

**Launch all agents in a single message:**

```
Good:
[Send one message with 5 Task tool calls]

Bad:
[Send message with Task call]
[Wait for result]
[Send another message with Task call]
[Wait for result]
...
```

### Distribution Strategy

**Even distribution:**

```
10 URLs, 5 agents:
Agent 1: URLs 1-2
Agent 2: URLs 3-4
Agent 3: URLs 5-6
Agent 4: URLs 7-8
Agent 5: URLs 9-10
```

**Topic-based distribution:**

```
10 URLs, 3 agents:
Agent 1: Installation & Setup (URLs 1-3)
Agent 2: Core Concepts & API (URLs 4-7)
Agent 3: Examples & Guides (URLs 8-10)
```

### When Not to Parallelize

- Single URL (use WebFetch)
- 2 URLs (a single agent is fine)
- Dependencies between tasks (sequential required)
- Limited documentation (1-2 pages)
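The distribution strategies above translate directly into code. The following is a minimal Python sketch, not part of any specific agent framework: `fetch_url` is a hypothetical helper, and the contiguous batching mirrors the even-distribution table (agent 1 gets URLs 1-2, agent 2 gets URLs 3-4, and so on). Submitting every batch at once is the code analogue of launching all agents in a single message.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_url(url: str, timeout: int = 60) -> str:
    """Hypothetical helper: fetch one documentation page as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def distribute_evenly(urls: list[str], agents: int) -> list[list[str]]:
    """Split URLs into contiguous, roughly equal batches (one per agent)."""
    size, remainder = divmod(len(urls), agents)
    batches, start = [], 0
    for i in range(agents):
        end = start + size + (1 if i < remainder else 0)
        batches.append(urls[start:end])
        start = end
    return [batch for batch in batches if batch]


def fetch_batch(batch: list[str]) -> dict[str, str]:
    """One 'agent': work through its assigned URLs sequentially."""
    return {url: fetch_url(url) for url in batch}


def explore_in_parallel(urls: list[str], agents: int = 5) -> dict[str, str]:
    """Launch every batch at once; total time is roughly one batch, not N."""
    results: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=agents) as pool:
        futures = [pool.submit(fetch_batch, b) for b in distribute_evenly(urls, agents)]
        for future in futures:
            results.update(future.result())
    return results
```

With 10 URLs and 5 agents, each worker handles two pages, so the wall-clock cost is close to the slowest pair rather than all ten pages in sequence.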
## 3. Verify Official Sources

### Why

- **Accuracy**: Avoid outdated information
- **Security**: Prevent malicious content
- **Credibility**: Maintain trust
- **Relevance**: Match user's version/needs

### Verification Checklist

**For llms.txt:**

```
[ ] Domain matches official site
[ ] HTTPS connection
[ ] Content format is valid
[ ] URLs point to official docs
[ ] Last-Modified header is recent (if available)
```

**For repositories:**

```
[ ] Organization matches official entity
[ ] Star count appropriate for the library
[ ] Recent commits (last 6 months)
[ ] README mentions official status
[ ] Links back to official website
[ ] License matches expectations
```

**For documentation:**

```
[ ] Domain is official
[ ] Version matches user request
[ ] Last updated date visible
[ ] Content is complete (not stubs)
[ ] Links work (not 404s)
```

### Red Flags

⚠️ **Unofficial sources:**

- Personal GitHub forks
- Outdated tutorials (>2 years old)
- Unmaintained repositories
- Suspicious domains
- No version information
- Conflicts with official docs

### When to Use Unofficial Sources

Acceptable when:

- No official documentation exists
- Clearly labeled as a community resource
- Recent and well-maintained
- Cross-referenced with official info
- User is aware of the unofficial status

## 4. Report Methodology

### Why

- **Transparency**: User knows how you found the information
- **Reproducibility**: User can verify
- **Troubleshooting**: Helps debug issues
- **Trust**: Builds confidence in results

### What to Include

**Always report:**

```markdown
## Source

**Method**: llms.txt / Repository / Research / Mixed
**Primary source**: [main URL or repository]
**Additional sources**: [list]
**Date accessed**: [current date]
**Version**: [documentation version]
```

**For llms.txt:**

```markdown
**Method**: llms.txt
**URL**: https://docs.astro.build/llms.txt
**URLs processed**: 8
**Date accessed**: 2025-10-26
**Version**: Latest (as of Oct 2025)
```

**For repository:**

```markdown
**Method**: Repository analysis (Repomix)
**Repository**: https://github.com/org/library
**Commit**: abc123f (2025-10-20)
**Stars**: 15.2k
**Analysis date**: 2025-10-26
```

**For research:**

```markdown
**Method**: Multi-source research
**Sources**:
- Official website: [url]
- Package registry: [url]
- Stack Overflow: [url]
- Community tutorials: [urls]
**Date accessed**: 2025-10-26
**Note**: No official llms.txt or repository available
```

### Limitations Disclosure

Always note:

```markdown
## ⚠️ Limitations

- Documentation for v2.x (user may need v3.x)
- API reference section incomplete
- Examples based on TypeScript (Python examples unavailable)
- Last updated 6 months ago
```

## 5. Handle Versions Explicitly

### Why

- **Compatibility**: Avoid version mismatch errors
- **Accuracy**: Features vary by version
- **Migration**: Support upgrade paths
- **Clarity**: No ambiguity about what's covered

### Version Detection

**Check these sources:**

```
1. URL path: /docs/v2/
2. Page header/title
3. Version selector on page
4. Git tag/branch name
5. package.json or equivalent
6. Release date correlation
```
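As a rough illustration of the first two checks in that list, here is a Python sketch that looks for a version in the URL path and then in the page title. The regular expressions are examples rather than an exhaustive parser, and the remaining sources (version selector, git tags, package metadata) would still need their own checks.

```python
import re
from typing import Optional

# Example patterns only: "/docs/v2/", "/3.1/", "React v18.2.0 documentation", ...
URL_VERSION = re.compile(r"/v?(\d+(?:\.\d+)*)(?:/|$)")
TITLE_VERSION = re.compile(r"\bv?(\d+\.\d+(?:\.\d+)?)\b")


def detect_version(url: str, page_title: Optional[str] = None) -> Optional[str]:
    """Best-effort version detection: URL path first, then the page title."""
    match = URL_VERSION.search(url)
    if match:
        return match.group(1)
    if page_title:
        match = TITLE_VERSION.search(page_title)
        if match:
            return match.group(1)
    return None  # fall back to version selector, git tags, or package metadata


# detect_version("https://docs.example.com/docs/v2/guide")      -> "2"
# detect_version("https://docs.example.com/", "React v18.2.0")  -> "18.2.0"
```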
### Version Handling Rules

**User specifies a version:**

```
Request: "Documentation for React 18"
→ Search: "React v18 documentation"
→ Verify: Check version in content
→ Report: "Documentation for React v18.2.0"
```

**User doesn't specify:**

```
Request: "Documentation for Next.js"
→ Default: Assume latest
→ Confirm: "I'll find the latest Next.js documentation"
→ Report: "Documentation for Next.js 14.0 (latest as of [date])"
```

**Version mismatch found:**

```
Request: "Docs for v2"
Found: Only v3 documentation
→ Report: "⚠️ Requested v2, but only v3 docs available.
           Here's v3 with migration guide."
```

### Multi-Version Scenarios

**Comparison request:**

```
Request: "Compare v1 and v2"
→ Find both versions
→ Launch parallel agents (set A for v1, set B for v2)
→ Present side-by-side analysis
```

**Migration request:**

```
Request: "How to migrate from v1 to v2"
→ Find v2 migration guide
→ Also fetch v1 and v2 docs
→ Highlight breaking changes
→ Provide code examples (before/after)
```

## 6. Aggregate Intelligently

### Why

- **Clarity**: Easier to understand
- **Efficiency**: Less cognitive load
- **Completeness**: Unified view
- **Actionability**: Clear next steps

### Bad Aggregation (Don't Do This)

```markdown
## Results

Agent 1 found: [dump of agent 1 output]
Agent 2 found: [dump of agent 2 output]
Agent 3 found: [dump of agent 3 output]
```

Problems:

- Redundant information repeated
- No synthesis
- Hard to scan
- Lacks narrative

### Good Aggregation (Do This)

````markdown
## Installation
[Synthesized from agents 1 & 2]

Three installation methods available:

1. **npm (recommended)**:
   ```bash
   npm install library-name
   ```
2. **CDN**: [from agent 1]
   ```html
   ```
3. **Manual**: [from agent 3]
   Download and include in project

## Core Concepts
[Synthesized from agents 2 & 4]

The library is built around three main concepts:

1. **Components**: [definition from agent 2]
2. **State**: [definition from agent 4]
3. **Effects**: [definition from agent 2]

## Examples
[From agents 3 & 5, deduplicated]
...
````

Benefits:

- Organized by topic
- Deduplicated
- Clear narrative
- Easy to scan

### Synthesis Techniques

**Deduplication:**

```
Agent 1: "Install with npm install foo"
Agent 2: "You can install using npm: npm install foo"
→ Synthesized: "Install: `npm install foo`"
```

**Prioritization:**

```
Agent 1: Basic usage example
Agent 2: Basic usage example (same)
Agent 3: Advanced usage example
→ Keep: Basic (from agent 1) + Advanced (from agent 3)
```

**Organization:**

```
Agents returned mixed information:
- Installation steps
- Configuration
- Usage example
- Installation requirements
- More usage examples

→ Reorganize:
1. Installation (requirements + steps)
2. Configuration
3. Usage (all examples together)
```
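A minimal sketch of the deduplication step, assuming each agent returns a plain list of text snippets. The normalization here (whitespace and case folding) only catches near-verbatim repeats; genuinely rephrased duplicates, like the npm example above, would still need a semantic comparison or a manual merge.

```python
import re


def normalize(snippet: str) -> str:
    """Collapse whitespace and case so verbatim repeats compare equal."""
    return re.sub(r"\s+", " ", snippet).strip().lower()


def deduplicate(agent_outputs: list[list[str]]) -> list[str]:
    """Merge snippets from all agents, keeping the first occurrence of each."""
    seen: set[str] = set()
    merged: list[str] = []
    for snippets in agent_outputs:
        for snippet in snippets:
            key = normalize(snippet)
            if key not in seen:
                seen.add(key)
                merged.append(snippet)
    return merged


# deduplicate([["Basic usage example"], ["Basic usage example", "Advanced usage example"]])
# -> ["Basic usage example", "Advanced usage example"]
```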
## 7. Time Management

### Why

- **User experience**: Fast results
- **Resource efficiency**: Don't waste compute
- **Fail fast**: Quickly try alternatives
- **Practical limits**: Avoid hanging

### Timeouts

**Set explicit timeouts:**

```
WebSearch: 30 seconds
WebFetch: 60 seconds
Repository clone: 5 minutes
Repomix processing: 10 minutes
Explorer agent: 5 minutes per URL
Researcher agent: 10 minutes
```

### Time Budgets

**Simple query (single library, latest version):**

```
Target: <2 minutes total

Phase 1 (Discovery): 30 seconds
- llms.txt search: 15 seconds
- Fetch llms.txt: 15 seconds

Phase 2 (Exploration): 60 seconds
- Launch agents: 5 seconds
- Agents fetch URLs: 60 seconds (parallel)

Phase 3 (Aggregation): 30 seconds
- Synthesize results
- Format output

Total: ~2 minutes
```

**Complex query (multiple versions, comparison):**

```
Target: <5 minutes total

Phase 1 (Discovery): 60 seconds
- Search both versions
- Fetch both llms.txt files

Phase 2 (Exploration): 180 seconds
- Launch 6 agents (2 sets of 3)
- Parallel exploration

Phase 3 (Comparison): 60 seconds
- Analyze differences
- Format side-by-side

Total: ~5 minutes
```

### When to Extend Timeouts

Acceptable to take longer when:

- User explicitly requests comprehensive analysis
- Repository is large but necessary
- Multiple fallbacks attempted
- User is informed of the delay

### When to Give Up

Move to the next method after:

- 3 failed attempts with the same approach
- Timeout exceeded by 2x
- No progress for 30 seconds
- Error indicates permanent failure (404, auth required)

## 8. Cache Findings

### Why

- **Speed**: Instant results for repeated requests
- **Efficiency**: Reduce network requests
- **Consistency**: Same results within session
- **Reliability**: Less dependent on network

### What to Cache

**High value (always cache):**

```
- Repomix output (large, expensive to generate)
- llms.txt content (static, frequently referenced)
- Repository README (relatively static)
- Package registry metadata (changes rarely)
```

**Medium value (cache within session):**

```
- Documentation page content
- Search results
- Repository structure
- Version lists
```

**Low value (don't cache):**

```
- Real-time data (latest releases)
- User-specific content
- Time-sensitive information
```

### Cache Duration

```
Within conversation:
- All fetched content (reuse freely)

Within session:
- Repomix output (until conversation ends)
- llms.txt content (until new version requested)

Across sessions:
- Don't cache (start fresh each time)
```

### Cache Invalidation

Refresh the cache when:

```
- User requests a specific different version
- User says "get latest" or "refresh"
- Explicit time reference ("docs from today")
- Previous cache is from a different library
```

### Implementation

```
# First request for library X
1. Fetch llms.txt
2. Store content in session variable
3. Use for processing

# Second request for library X (same session)
1. Check if llms.txt is cached
2. Reuse cached content
3. Skip redundant fetch

# Request for library Y
1. Don't reuse library X cache
2. Fetch fresh for library Y
```

### Cache Hit Messages

```markdown
ℹ️ Using cached llms.txt from 5 minutes ago.
To fetch fresh, say "refresh" or "get latest".
```
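The implementation outline above can be captured in a small in-memory cache. This is a sketch under assumptions: the `(library, artifact)` key scheme and the `refresh` flag are illustrative choices rather than a prescribed interface, and `fetch_url` is the hypothetical helper from the earlier parallel-fetch sketch.

```python
from typing import Callable


class SessionCache:
    """In-memory cache scoped to one session; nothing persists across sessions."""

    def __init__(self) -> None:
        self._store: dict[tuple[str, str], str] = {}

    def get_or_fetch(
        self,
        library: str,
        artifact: str,
        fetch: Callable[[], str],
        refresh: bool = False,
    ) -> str:
        """Reuse cached content unless it is missing or the user asked to refresh."""
        key = (library, artifact)
        if refresh or key not in self._store:
            self._store[key] = fetch()  # library Y never reuses library X's entry
        return self._store[key]


# cache = SessionCache()
# First request fetches; the second reuses; "refresh" forces a new fetch.
# cache.get_or_fetch("astro", "llms.txt", lambda: fetch_url("https://docs.astro.build/llms.txt"))
# cache.get_or_fetch("astro", "llms.txt", lambda: fetch_url("https://docs.astro.build/llms.txt"))
# cache.get_or_fetch("astro", "llms.txt", lambda: fetch_url("https://docs.astro.build/llms.txt"), refresh=True)
```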
## Quick Reference Checklist

### Before Starting

- [ ] Identify library name clearly
- [ ] Confirm version (default: latest)
- [ ] Check if cached data is available
- [ ] Plan method (llms.txt → repo → research)

### During Discovery

- [ ] Start with llms.txt search
- [ ] Verify source is official
- [ ] Check version matches requirement
- [ ] Set timeout for each operation
- [ ] Fall back quickly if a method fails

### During Exploration

- [ ] Use parallel agents for 3+ URLs
- [ ] Launch all agents in a single message
- [ ] Distribute workload evenly
- [ ] Monitor for errors/timeouts
- [ ] Be ready to retry or fall back

### Before Presenting

- [ ] Synthesize by topic (not by agent)
- [ ] Deduplicate repeated information
- [ ] Verify version is correct
- [ ] Include source attribution
- [ ] Note any limitations
- [ ] Format clearly
- [ ] Check completeness

### Quality Gates

Ask before presenting:

- [ ] Is information accurate?
- [ ] Are sources official?
- [ ] Does version match request?
- [ ] Are all key topics covered?
- [ ] Are limitations noted?
- [ ] Is methodology documented?
- [ ] Is output well-organized?