Best Practices

Essential principles and proven strategies for effective documentation discovery.

1. Prioritize context7.com for llms.txt

Why

  • Comprehensive aggregator: Single source for most documentation
  • Most efficient: Instant access without searching
  • Authoritative: Aggregates official sources
  • Up-to-date: Continuously maintained
  • Fast: Direct URL construction vs searching
  • Topic filtering: Targeted results with ?topic= parameter

Implementation

Step 1: Try context7.com (ALWAYS FIRST)
  ↓
Know GitHub repo?
  YES → https://context7.com/{org}/{repo}/llms.txt
  NO → Continue
  ↓
Know website?
  YES → https://context7.com/websites/{normalized-path}/llms.txt
  NO → Continue
  ↓
Specific topic needed?
  YES → Add ?topic={query} parameter
  NO → Use base URL
  ↓
Found?
  YES → Use as primary source
  NO → Fall back to WebSearch for llms.txt
  ↓
Still not found?
  YES → Fall back to repository analysis
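
This flow is mechanical enough to sketch in code. A minimal TypeScript sketch, assuming only the URL patterns listed above; the function name and the website-path normalization rule are illustrative assumptions, not a documented context7.com API:

```typescript
// Build a context7.com llms.txt URL from whatever is known about the library.
// The {org}/{repo} and /websites/{...} patterns come from the flow above;
// the normalization here (strip protocol, replace "/" with "-") is an assumption.

interface DocTarget {
  githubOrg?: string;   // e.g. "vercel"
  githubRepo?: string;  // e.g. "next.js"
  websiteUrl?: string;  // e.g. "https://docs.astro.build"
  topic?: string;       // optional ?topic= filter, e.g. "date"
}

function context7Url(target: DocTarget): string | null {
  let base: string | null = null;

  if (target.githubOrg && target.githubRepo) {
    base = `https://context7.com/${target.githubOrg}/${target.githubRepo}/llms.txt`;
  } else if (target.websiteUrl) {
    const normalized = target.websiteUrl
      .replace(/^https?:\/\//, "")
      .replace(/\/+$/, "")
      .replace(/\//g, "-");
    base = `https://context7.com/websites/${normalized}/llms.txt`;
  }

  if (!base) return null; // nothing known → fall back to WebSearch
  return target.topic ? `${base}?topic=${encodeURIComponent(target.topic)}` : base;
}

// Matches the Next.js case in the examples below:
console.log(context7Url({ githubOrg: "vercel", githubRepo: "next.js" }));
// → https://context7.com/vercel/next.js/llms.txt
```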

Examples

Best approach (context7.com):
1. Direct URL: https://context7.com/vercel/next.js/llms.txt
2. WebFetch llms.txt
3. Launch Explorer agents for URLs
Total time: ~15 seconds

Topic-specific approach:
1. Direct URL: https://context7.com/shadcn-ui/ui/llms.txt?topic=date
2. WebFetch filtered content
3. Present focused results
Total time: ~10 seconds

Good fallback approach:
1. context7.com returns 404
2. WebSearch: "Astro llms.txt site:docs.astro.build"
3. Found → WebFetch llms.txt
4. Launch Explorer agents for URLs
Total time: ~60 seconds

Poor approach:
1. Skip context7.com entirely
2. Search for various documentation pages
3. Manually collect URLs
4. Process one by one
Total time: ~5 minutes

When Not Available

Fallback strategy when context7.com is unavailable:

  • If context7.com returns 404 → try WebSearch for llms.txt
  • If WebSearch finds nothing within 30 seconds → move to repository analysis
  • If domain is incorrect → try 2-3 alternatives, then move on
  • If documentation is very old → likely doesn't have llms.txt

2. Use Parallel Agents Aggressively

Why

  • Speed: N tasks finish in roughly the time of 1 (vs N × time sequentially)
  • Efficiency: Better resource utilization
  • Coverage: Comprehensive results faster
  • Scalability: Handles large documentation sets

Guidelines

Always use parallel for 3+ URLs:

3 URLs → 1 Explorer agent (acceptable)
4-10 URLs → 3-5 Explorer agents (optimal)
11+ URLs → 5-7 agents in phases (best)

Launch all agents in single message:

Good:
[Send one message with 5 Task tool calls]

Bad:
[Send message with Task call]
[Wait for result]
[Send another message with Task call]
[Wait for result]
...

Distribution Strategy

Even distribution:

10 URLs, 5 agents:
Agent 1: URLs 1-2
Agent 2: URLs 3-4
Agent 3: URLs 5-6
Agent 4: URLs 7-8
Agent 5: URLs 9-10

Topic-based distribution:

10 URLs, 3 agents:
Agent 1: Installation & Setup (URLs 1-3)
Agent 2: Core Concepts & API (URLs 4-7)
Agent 3: Examples & Guides (URLs 8-10)
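
Even distribution is easy to express directly. A TypeScript sketch, where `launchExplorer` is a hypothetical stand-in for the actual per-agent Task call:

```typescript
// Split N URLs into contiguous chunks, one per agent, then launch all at once.

function distribute<T>(items: T[], agentCount: number): T[][] {
  const size = Math.ceil(items.length / agentCount);
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

async function launchExplorer(urls: string[]): Promise<string> {
  // Hypothetical placeholder: in practice this is one Task tool call per agent.
  return `explored ${urls.length} URLs`;
}

async function exploreInParallel(urls: string[], agentCount = 5): Promise<string[]> {
  const assignments = distribute(urls, agentCount);
  // All agents launched together, not with sequential awaits.
  return Promise.all(assignments.map(launchExplorer));
}

// 10 URLs, 5 agents → 2 URLs per agent, as in the table above.
```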

When Not to Parallelize

  • Single URL (use WebFetch)
  • 2 URLs (single agent is fine)
  • Dependencies between tasks (sequential required)
  • Limited documentation (1-2 pages)

3. Verify Official Sources

Why

  • Accuracy: Avoid outdated information
  • Security: Prevent malicious content
  • Credibility: Maintain trust
  • Relevance: Match user's version/needs

Verification Checklist

For llms.txt:

[ ] Domain matches official site
[ ] HTTPS connection
[ ] Content format is valid
[ ] URLs point to official docs
[ ] Last-Modified header is recent (if available)

For repositories:

[ ] Organization matches official entity
[ ] Star count appropriate for library
[ ] Recent commits (last 6 months)
[ ] README mentions official status
[ ] Links back to official website
[ ] License matches expectations
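
Several of these repository checks can be automated against GitHub's public REST API. A sketch using the standard GET /repos/{org}/{repo} endpoint; the six-month activity threshold mirrors the checklist above:

```typescript
// Quick automated signals for the repository checklist above,
// using GitHub's public REST API (no auth needed for public repos).

interface RepoSignals {
  stars: number;
  recentlyActive: boolean; // pushed within the last 6 months
  isFork: boolean;
  archived: boolean;
  homepage: string | null;
  license: string | null;
}

async function repoSignals(org: string, repo: string): Promise<RepoSignals> {
  const res = await fetch(`https://api.github.com/repos/${org}/${repo}`);
  if (!res.ok) throw new Error(`GitHub API returned ${res.status}`);
  const data = await res.json();

  const sixMonthsMs = 1000 * 60 * 60 * 24 * 30 * 6;
  return {
    stars: data.stargazers_count,
    recentlyActive: Date.now() - Date.parse(data.pushed_at) < sixMonthsMs,
    isFork: data.fork,               // red flag: personal fork, not the official repo
    archived: data.archived,         // red flag: unmaintained
    homepage: data.homepage || null, // should link back to the official website
    license: data.license?.spdx_id ?? null,
  };
}
```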

For documentation:

[ ] Domain is official
[ ] Version matches user request
[ ] Last updated date visible
[ ] Content is complete (not stubs)
[ ] Links work (not 404s)

Red Flags

⚠️ Unofficial sources:

  • Personal GitHub forks
  • Outdated tutorials (>2 years old)
  • Unmaintained repositories
  • Suspicious domains
  • No version information
  • Conflicting with official docs

When to Use Unofficial Sources

Acceptable when:

  • No official documentation exists
  • Clearly labeled as community resource
  • Recent and well-maintained
  • Cross-referenced with official info
  • User is aware of unofficial status

4. Report Methodology

Why

  • Transparency: User knows how you found info
  • Reproducibility: User can verify
  • Troubleshooting: Helps debug issues
  • Trust: Builds confidence in results

What to Include

Always report:

## Source

**Method**: llms.txt / Repository / Research / Mixed
**Primary source**: [main URL or repository]
**Additional sources**: [list]
**Date accessed**: [current date]
**Version**: [documentation version]

For llms.txt:

**Method**: llms.txt
**URL**: https://docs.astro.build/llms.txt
**URLs processed**: 8
**Date accessed**: 2025-10-26
**Version**: Latest (as of Oct 2025)

For repository:

**Method**: Repository analysis (Repomix)
**Repository**: https://github.com/org/library
**Commit**: abc123f (2025-10-20)
**Stars**: 15.2k
**Analysis date**: 2025-10-26

For research:

**Method**: Multi-source research
**Sources**:
- Official website: [url]
- Package registry: [url]
- Stack Overflow: [url]
- Community tutorials: [urls]
**Date accessed**: 2025-10-26
**Note**: No official llms.txt or repository available

Limitations Disclosure

Always note:

## ⚠️ Limitations

- Documentation for v2.x (user may need v3.x)
- API reference section incomplete
- Examples based on TypeScript (Python examples unavailable)
- Last updated 6 months ago

5. Handle Versions Explicitly

Why

  • Compatibility: Avoid version mismatch errors
  • Accuracy: Features vary by version
  • Migration: Support upgrade paths
  • Clarity: No ambiguity about what's covered

Version Detection

Check these sources:

1. URL path: /docs/v2/
2. Page header/title
3. Version selector on page
4. Git tag/branch name
5. Package.json or equivalent
6. Release date correlation
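
The two most mechanical signals (URL path and package manifest) can be extracted in a first pass; the regex and precedence below are illustrative assumptions:

```typescript
// Pull a version hint from a docs URL (/v2/, /3.1/, ...) or a package manifest.

function versionFromUrl(url: string): string | null {
  const match = url.match(/\/v?(\d+(?:\.\d+){0,2})(?:\/|$)/);
  return match ? match[1] : null;
}

function versionFromManifest(packageJson: { version?: string }): string | null {
  return packageJson.version ?? null;
}

console.log(versionFromUrl("https://docs.example.com/docs/v2/getting-started")); // "2"
console.log(versionFromManifest({ version: "18.2.0" }));                          // "18.2.0"
```

When the signals disagree, prefer the version visible in the documentation content itself and report the discrepancy.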

Version Handling Rules

User specifies version:

Request: "Documentation for React 18"
→ Search: "React v18 documentation"
→ Verify: Check version in content
→ Report: "Documentation for React v18.2.0"

User doesn't specify:

Request: "Documentation for Next.js"
→ Default: Assume latest
→ Confirm: "I'll find the latest Next.js documentation"
→ Report: "Documentation for Next.js 14.0 (latest as of [date])"

Version mismatch found:

Request: "Docs for v2"
Found: Only v3 documentation
→ Report: "⚠️ Requested v2, but only v3 docs available. Here's v3 with migration guide."

Multi-Version Scenarios

Comparison request:

Request: "Compare v1 and v2"
→ Find both versions
→ Launch parallel agents (set A for v1, set B for v2)
→ Present side-by-side analysis

Migration request:

Request: "How to migrate from v1 to v2"
→ Find v2 migration guide
→ Also fetch v1 and v2 docs
→ Highlight breaking changes
→ Provide code examples (before/after)

6. Aggregate Intelligently

Why

  • Clarity: Easier to understand
  • Efficiency: Less cognitive load
  • Completeness: Unified view
  • Actionability: Clear next steps

Bad Aggregation (Don't Do This)

## Results

Agent 1 found:
[dump of agent 1 output]

Agent 2 found:
[dump of agent 2 output]

Agent 3 found:
[dump of agent 3 output]

Problems:

  • Redundant information repeated
  • No synthesis
  • Hard to scan
  • Lacks narrative

Good Aggregation (Do This)

## Installation

[Synthesized from agents 1 & 2]
Three installation methods available:

1. **npm (recommended)**: npm install library-name
2. CDN: [from agent 1] <script src="..."></script>
3. Manual: [from agent 3] Download and include in project

## Core Concepts

[Synthesized from agents 2 & 4]
The library is built around three main concepts:

1. Components: [definition from agent 2]
2. State: [definition from agent 4]
3. Effects: [definition from agent 2]

## Examples

[From agents 3 & 5, deduplicated] ...


Benefits:

  • Organized by topic
  • Deduplicated
  • Clear narrative
  • Easy to scan

Synthesis Techniques

Deduplication:

Agent 1: "Install with npm install foo"
Agent 2: "You can install using npm: npm install foo"
→ Synthesized: "Install: npm install foo"


Prioritization:

Agent 1: Basic usage example
Agent 2: Basic usage example (same)
Agent 3: Advanced usage example
→ Keep: Basic (from agent 1) + Advanced (from agent 3)


Organization:

Agents returned mixed information:

  • Installation steps
  • Configuration
  • Usage example
  • Installation requirements
  • More usage examples

→ Reorganize:

  1. Installation (requirements + steps)
  2. Configuration
  3. Usage (all examples together)
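
A rough first pass over deduplication and reorganization can be sketched in code; the topic keywords below are illustrative assumptions, and real synthesis still needs judgment:

```typescript
// First-pass synthesis over raw agent findings: drop near-duplicates by
// normalized text, then group what remains under canonical topics.

interface Finding {
  agent: number;
  text: string;
}

const TOPICS: Record<string, RegExp> = {
  Installation: /install|setup|requirement/i,
  Configuration: /config|option|setting/i,
  Usage: /usage|example|how to/i,
};

function synthesize(findings: Finding[]): Record<string, string[]> {
  const seen = new Set<string>();
  const grouped: Record<string, string[]> = {};

  for (const { text } of findings) {
    const key = text.toLowerCase().replace(/\s+/g, " ").trim();
    if (seen.has(key)) continue; // deduplicate repeated information

    seen.add(key);
    const topic =
      Object.keys(TOPICS).find((t) => TOPICS[t].test(text)) ?? "Other";
    (grouped[topic] ??= []).push(text);
  }
  return grouped;
}
```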

7. Time Management

Why

  • User experience: Fast results
  • Resource efficiency: Don't waste compute
  • Fail fast: Quickly try alternatives
  • Practical limits: Avoid hanging

Timeouts

Set explicit timeouts:

WebSearch: 30 seconds
WebFetch: 60 seconds
Repository clone: 5 minutes
Repomix processing: 10 minutes
Explorer agent: 5 minutes per URL
Researcher agent: 10 minutes
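
Where the environment allows direct fetches, a hard timeout can be enforced with an abort signal. A minimal sketch; the 60-second value mirrors the WebFetch budget above:

```typescript
// Fetch a URL but give up after a hard timeout, so one slow page
// cannot stall the whole discovery run.

async function fetchWithTimeout(url: string, timeoutMs: number): Promise<string> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetch(url, { signal: controller.signal });
    if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
    return await res.text();
  } finally {
    clearTimeout(timer);
  }
}

// Usage: await fetchWithTimeout("https://context7.com/vercel/next.js/llms.txt", 60_000);
```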


Time Budgets

Simple query (single library, latest version):

Target: <2 minutes total

Phase 1 (Discovery): 30 seconds

  • llms.txt search: 15 seconds
  • Fetch llms.txt: 15 seconds

Phase 2 (Exploration): 60 seconds

  • Launch agents: 5 seconds
  • Agents fetch URLs: 60 seconds (parallel)

Phase 3 (Aggregation): 30 seconds

  • Synthesize results
  • Format output

Total: ~2 minutes


Complex query (multiple versions, comparison):

Target: <5 minutes total

Phase 1 (Discovery): 60 seconds

  • Search both versions
  • Fetch both llms.txt files

Phase 2 (Exploration): 180 seconds

  • Launch 6 agents (2 sets of 3)
  • Parallel exploration

Phase 3 (Comparison): 60 seconds

  • Analyze differences
  • Format side-by-side

Total: ~5 minutes


When to Extend Timeouts

Acceptable to go longer when:

  • User explicitly requests comprehensive analysis
  • Repository is large but necessary
  • Multiple fallbacks attempted
  • User is informed of delay

When to Give Up

Move to next method after:

  • 3 failed attempts on same approach
  • Timeout exceeded by 2x
  • No progress for 30 seconds
  • Error indicates permanent failure (404, auth required)

8. Cache Findings

Why

  • Speed: Instant results for repeated requests
  • Efficiency: Reduce network requests
  • Consistency: Same results within session
  • Reliability: Less dependent on network

What to Cache

High value (always cache):

  • Repomix output (large, expensive to generate)
  • llms.txt content (static, frequently referenced)
  • Repository README (relatively static)
  • Package registry metadata (changes rarely)

Medium value (cache within session):

  • Documentation page content
  • Search results
  • Repository structure
  • Version lists

Low value (don't cache):

  • Real-time data (latest releases)
  • User-specific content
  • Time-sensitive information

Cache Duration

Within conversation:

  • All fetched content (reuse freely)

Within session:

  • Repomix output (until conversation ends)
  • llms.txt content (until new version requested)

Across sessions:

  • Don't cache (start fresh each time)

Cache Invalidation

Refresh cache when:
  • User requests specific different version
  • User says "get latest" or "refresh"
  • Explicit time reference ("docs from today")
  • Previous cache is from different library

Implementation

First request for library X

  1. Fetch llms.txt
  2. Store content in session variable
  3. Use for processing

Second request for library X (same session)

  1. Check if llms.txt cached
  2. Reuse cached content
  3. Skip redundant fetch

Request for library Y

  1. Don't reuse library X cache
  2. Fetch fresh for library Y

Cache Hit Messages

Using cached llms.txt from 5 minutes ago.
To fetch fresh, say "refresh" or "get latest".

Quick Reference Checklist

Before Starting

  • Identify library name clearly
  • Confirm version (default: latest)
  • Check if cached data available
  • Plan method (llms.txt → repo → research)

During Discovery

  • Start with llms.txt search
  • Verify source is official
  • Check version matches requirement
  • Set timeout for each operation
  • Fall back quickly if method fails

During Exploration

  • Use parallel agents for 3+ URLs
  • Launch all agents in single message
  • Distribute workload evenly
  • Monitor for errors/timeouts
  • Be ready to retry or fallback

Before Presenting

  • Synthesize by topic (not by agent)
  • Deduplicate repeated information
  • Verify version is correct
  • Include source attribution
  • Note any limitations
  • Format clearly
  • Check completeness

Quality Gates

Ask before presenting:

  • Is information accurate?
  • Are sources official?
  • Does version match request?
  • Are all key topics covered?
  • Are limitations noted?
  • Is methodology documented?
  • Is output well-organized?