# Best Practices
Essential principles and proven strategies for effective documentation discovery.
## 1. Prioritize context7.com for llms.txt
### Why
- **Comprehensive aggregator**: Single source for most documentation
- **Most efficient**: Instant access without searching
- **Authoritative**: Aggregates official sources
- **Up-to-date**: Continuously maintained
- **Fast**: Direct URL construction vs searching
- **Topic filtering**: Targeted results with ?topic= parameter
### Implementation
```
Step 1: Try context7.com (ALWAYS FIRST)

Know GitHub repo?
  YES → https://context7.com/{org}/{repo}/llms.txt
  NO  → Continue

Know website?
  YES → https://context7.com/websites/{normalized-path}/llms.txt
  NO  → Continue

Specific topic needed?
  YES → Add ?topic={query} parameter
  NO  → Use base URL

Found?
  YES → Use as primary source
  NO  → Fall back to WebSearch for llms.txt

Still not found?
  YES → Fall back to repository analysis
```
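The URL patterns in this flow are mechanical enough to build programmatically. A minimal sketch in Python; the helper name `context7_url` is illustrative, not part of any tool, and the org/repo/path values are assumed to come from the user's request:

```python
from urllib.parse import quote

def context7_url(org: str | None = None, repo: str | None = None,
                 website_path: str | None = None, topic: str | None = None) -> str:
    """Build a context7.com llms.txt URL from a GitHub repo or a normalized website path."""
    if org and repo:
        base = f"https://context7.com/{org}/{repo}/llms.txt"
    elif website_path:
        base = f"https://context7.com/websites/{website_path}/llms.txt"
    else:
        raise ValueError("need either org/repo or a normalized website path")
    # Optional topic filter narrows the returned link list.
    return f"{base}?topic={quote(topic)}" if topic else base

# context7_url("vercel", "next.js")             -> .../vercel/next.js/llms.txt
# context7_url("shadcn-ui", "ui", topic="date") -> .../shadcn-ui/ui/llms.txt?topic=date
```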
### Examples
```
Best approach (context7.com):
1. Direct URL: https://context7.com/vercel/next.js/llms.txt
2. WebFetch llms.txt
3. Launch Explorer agents for URLs
Total time: ~15 seconds

Topic-specific approach:
1. Direct URL: https://context7.com/shadcn-ui/ui/llms.txt?topic=date
2. WebFetch filtered content
3. Present focused results
Total time: ~10 seconds

Good fallback approach:
1. context7.com returns 404
2. WebSearch: "Astro llms.txt site:docs.astro.build"
3. Found → WebFetch llms.txt
4. Launch Explorer agents for URLs
Total time: ~60 seconds

Poor approach:
1. Skip context7.com entirely
2. Search for various documentation pages
3. Manually collect URLs
4. Process one by one
Total time: ~5 minutes
```
### When Not Available
Fallback strategy when context7.com is unavailable:
- If context7.com returns 404 → try WebSearch for llms.txt
- If WebSearch finds nothing in 30 seconds → move to repository
- If domain is incorrect → try 2-3 alternatives, then move on
- If documentation is very old → likely doesn't have llms.txt
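This ordering can be expressed as a simple priority chain. A sketch under the assumption that each step is a callable returning documentation URLs or nothing; the step names in the usage comment stand in for the context7.com lookup, WebSearch, and repository analysis and are not real APIs:

```python
from collections.abc import Callable

def discover_docs(library: str,
                  steps: list[Callable[[str], list[str] | None]]) -> list[str] | None:
    """Run discovery steps in priority order; stop at the first that yields URLs."""
    for step in steps:
        urls = step(library)
        if urls:
            return urls
    return None  # every method failed; report that no documentation source was found

# Usage (hypothetical step functions):
# discover_docs("astro", [try_context7, search_for_llms_txt, analyze_repository])
```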
## 2. Use Parallel Agents Aggressively
### Why
- **Speed**: N tasks finish in roughly the time of one (instead of N× as long)
- **Efficiency**: Better resource utilization
- **Coverage**: Comprehensive results faster
- **Scalability**: Handles large documentation sets
### Guidelines
**Always use parallel for 3+ URLs:**
```
3 URLs → 1 Explorer agent (acceptable)
4-10 URLs → 3-5 Explorer agents (optimal)
11+ URLs → 5-7 agents in phases (best)
```
**Launch all agents in single message:**
```
Good:
[Send one message with 5 Task tool calls]
Bad:
[Send message with Task call]
[Wait for result]
[Send another message with Task call]
[Wait for result]
...
```
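In Claude's case the Task tool does this orchestration. As a rough analogy for why one batched launch beats launching one at a time, here is the same pattern with Python's `concurrent.futures` (illustrative only, not the actual agent mechanism; `fetch` is whatever callable retrieves one URL):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_all(urls: list[str], fetch) -> dict[str, str]:
    """Submit every fetch up front so they run concurrently, then collect results."""
    with ThreadPoolExecutor(max_workers=len(urls) or 1) as pool:
        futures = {url: pool.submit(fetch, url) for url in urls}  # one batch, not one-by-one
        return {url: fut.result() for url, fut in futures.items()}
```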
### Distribution Strategy
**Even distribution:**
```
10 URLs, 5 agents:
Agent 1: URLs 1-2
Agent 2: URLs 3-4
Agent 3: URLs 5-6
Agent 4: URLs 7-8
Agent 5: URLs 9-10
```
**Topic-based distribution:**
```
10 URLs, 3 agents:
Agent 1: Installation & Setup (URLs 1-3)
Agent 2: Core Concepts & API (URLs 4-7)
Agent 3: Examples & Guides (URLs 8-10)
```
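Even distribution is just contiguous chunking. A small sketch, assuming the agent count has already been chosen from the table above:

```python
def split_urls(urls: list[str], agent_count: int) -> list[list[str]]:
    """Split URLs into contiguous, roughly equal batches, one per agent."""
    size, extra = divmod(len(urls), agent_count)
    batches, start = [], 0
    for i in range(agent_count):
        end = start + size + (1 if i < extra else 0)  # spread any remainder over the first agents
        batches.append(urls[start:end])
        start = end
    return batches

# split_urls(urls, 5) on 10 URLs -> five batches of 2, matching the example above.
```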
### When Not to Parallelize
- Single URL (use WebFetch)
- 2 URLs (single agent is fine)
- Dependencies between tasks (sequential required)
- Limited documentation (1-2 pages)
## 3. Verify Official Sources
### Why
- **Accuracy**: Avoid outdated information
- **Security**: Prevent malicious content
- **Credibility**: Maintain trust
- **Relevance**: Match user's version/needs
### Verification Checklist
**For llms.txt:**
```
[ ] Domain matches official site
[ ] HTTPS connection
[ ] Content format is valid
[ ] URLs point to official docs
[ ] Last-Modified header is recent (if available)
```
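The first two checks can be automated before fetching. A minimal sketch using only the standard library; the set of official domains is something you supply per library, not something this check discovers:

```python
from urllib.parse import urlparse

def looks_official(url: str, official_domains: set[str]) -> bool:
    """Cheap pre-fetch check: HTTPS and a host on (or under) an official domain."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        return False
    host = parsed.hostname or ""
    return any(host == d or host.endswith("." + d) for d in official_domains)

# looks_official("https://docs.astro.build/llms.txt", {"astro.build"})       -> True
# looks_official("http://astro-docs.example.com/llms.txt", {"astro.build"})  -> False
```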
**For repositories:**
```
[ ] Organization matches official entity
[ ] Star count is plausible for the library's popularity
[ ] Recent commits (last 6 months)
[ ] README mentions official status
[ ] Links back to official website
[ ] License matches expectations
```
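The star and recency checks map onto fields the public GitHub REST API exposes (`stargazers_count` and `pushed_at` on `GET /repos/{org}/{repo}`). A rough sketch; the six-month window mirrors the checklist above:

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone

def repo_activity(org: str, repo: str) -> dict:
    """Fetch star count and last-push date for a public GitHub repository."""
    url = f"https://api.github.com/repos/{org}/{repo}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        data = json.load(resp)
    pushed_at = datetime.fromisoformat(data["pushed_at"].replace("Z", "+00:00"))
    return {
        "stars": data["stargazers_count"],
        "pushed_at": pushed_at,
        # "Recent commits (last 6 months)" from the checklist above
        "recently_active": datetime.now(timezone.utc) - pushed_at < timedelta(days=183),
    }
```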
**For documentation:**
```
[ ] Domain is official
[ ] Version matches user request
[ ] Last updated date visible
[ ] Content is complete (not stubs)
[ ] Links work (not 404s)
```
### Red Flags
⚠️ **Unofficial sources:**
- Personal GitHub forks
- Outdated tutorials (>2 years old)
- Unmaintained repositories
- Suspicious domains
- No version information
- Content that conflicts with official docs
### When to Use Unofficial Sources
Acceptable when:
- No official documentation exists
- Clearly labeled as community resource
- Recent and well-maintained
- Cross-referenced with official info
- User is aware of unofficial status
## 4. Report Methodology
### Why
- **Transparency**: User knows how you found info
- **Reproducibility**: User can verify
- **Troubleshooting**: Helps debug issues
- **Trust**: Builds confidence in results
### What to Include
**Always report:**
```markdown
## Source
**Method**: llms.txt / Repository / Research / Mixed
**Primary source**: [main URL or repository]
**Additional sources**: [list]
**Date accessed**: [current date]
**Version**: [documentation version]
```
**For llms.txt:**
```markdown
**Method**: llms.txt
**URL**: https://docs.astro.build/llms.txt
**URLs processed**: 8
**Date accessed**: 2025-10-26
**Version**: Latest (as of Oct 2025)
```
**For repository:**
```markdown
**Method**: Repository analysis (Repomix)
**Repository**: https://github.com/org/library
**Commit**: abc123f (2025-10-20)
**Stars**: 15.2k
**Analysis date**: 2025-10-26
```
**For research:**
```markdown
**Method**: Multi-source research
**Sources**:
- Official website: [url]
- Package registry: [url]
- Stack Overflow: [url]
- Community tutorials: [urls]
**Date accessed**: 2025-10-26
**Note**: No official llms.txt or repository available
```
### Limitations Disclosure
Always note:
```markdown
## ⚠️ Limitations
- Documentation for v2.x (user may need v3.x)
- API reference section incomplete
- Examples based on TypeScript (Python examples unavailable)
- Last updated 6 months ago
```
## 5. Handle Versions Explicitly
### Why
- **Compatibility**: Avoid version mismatch errors
- **Accuracy**: Features vary by version
- **Migration**: Support upgrade paths
- **Clarity**: No ambiguity about what's covered
### Version Detection
**Check these sources:**
```
1. URL path: /docs/v2/
2. Page header/title
3. Version selector on page
4. Git tag/branch name
5. Package.json or equivalent
6. Release date correlation
```
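Source 1, the URL path, is the easiest to check mechanically. A small sketch, assuming versions appear as path segments like `/v2/` or `/18.2/` (a common convention, not something every docs site follows):

```python
import re

def version_from_url(url: str) -> str | None:
    """Extract a version segment such as /v2/, /v18.2/, or /3.0/ from a docs URL."""
    match = re.search(r"/v?(\d+(?:\.\d+){0,2})(?:/|$)", url)
    return match.group(1) if match else None

# version_from_url("https://docs.example.com/docs/v2/getting-started") -> "2"
# version_from_url("https://react.dev/reference")                      -> None
```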
### Version Handling Rules
**User specifies version:**
```
Request: "Documentation for React 18"
→ Search: "React v18 documentation"
→ Verify: Check version in content
→ Report: "Documentation for React v18.2.0"
```
**User doesn't specify:**
```
Request: "Documentation for Next.js"
→ Default: Assume latest
→ Confirm: "I'll find the latest Next.js documentation"
→ Report: "Documentation for Next.js 14.0 (latest as of [date])"
```
**Version mismatch found:**
```
Request: "Docs for v2"
Found: Only v3 documentation
→ Report: "⚠️ Requested v2, but only v3 docs available. Here's v3 with migration guide."
```
### Multi-Version Scenarios
**Comparison request:**
```
Request: "Compare v1 and v2"
→ Find both versions
→ Launch parallel agents (set A for v1, set B for v2)
→ Present side-by-side analysis
```
**Migration request:**
```
Request: "How to migrate from v1 to v2"
→ Find v2 migration guide
→ Also fetch v1 and v2 docs
→ Highlight breaking changes
→ Provide code examples (before/after)
```
## 6. Aggregate Intelligently
### Why
- **Clarity**: Easier to understand
- **Efficiency**: Less cognitive load
- **Completeness**: Unified view
- **Actionability**: Clear next steps
### Bad Aggregation (Don't Do This)
```markdown
## Results
Agent 1 found:
[dump of agent 1 output]
Agent 2 found:
[dump of agent 2 output]
Agent 3 found:
[dump of agent 3 output]
```
Problems:
- Same information repeated across agents
- No synthesis
- Hard to scan
- Lacks narrative
### Good Aggregation (Do This)
````markdown
## Installation
[Synthesized from agents 1 & 2]
Three installation methods available:
1. **npm (recommended)**:
```bash
npm install library-name
```
2. **CDN**: [from agent 1]
```html
<script src="..."></script>
```
3. **Manual**: [from agent 3]
Download and include in project
## Core Concepts
[Synthesized from agents 2 & 4]
The library is built around three main concepts:
1. **Components**: [definition from agent 2]
2. **State**: [definition from agent 4]
3. **Effects**: [definition from agent 2]
## Examples
[From agents 3 & 5, deduplicated]
...
````
Benefits:
- Organized by topic
- Deduplicated
- Clear narrative
- Easy to scan
### Synthesis Techniques
**Deduplication:**
```
Agent 1: "Install with npm install foo"
Agent 2: "You can install using npm: npm install foo"
→ Synthesized: "Install: `npm install foo`"
```
**Prioritization:**
```
Agent 1: Basic usage example
Agent 2: Basic usage example (same)
Agent 3: Advanced usage example
→ Keep: Basic (from agent 1) + Advanced (from agent 3)
```
**Organization:**
```
Agents returned mixed information:
- Installation steps
- Configuration
- Usage example
- Installation requirements
- More usage examples
→ Reorganize:
1. Installation (requirements + steps)
2. Configuration
3. Usage (all examples together)
```
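Of these techniques, deduplication is the easiest to mechanize: normalize whitespace and case, keep the first occurrence, and leave near-duplicates that are phrased differently to a judgment pass. A minimal sketch of that idea (the normalization rule is an assumption, not a fixed algorithm):

```python
def dedupe_findings(findings: list[str]) -> list[str]:
    """Drop findings that are identical after trimming whitespace and lowercasing."""
    seen: set[str] = set()
    unique = []
    for text in findings:
        key = " ".join(text.lower().split())  # collapse whitespace, ignore case
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique

# ["Install with npm install foo", "install  with npm install FOO", "Use yarn add foo"]
# -> keeps the first and third entries.
```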
## 7. Time Management
### Why
- **User experience**: Fast results
- **Resource efficiency**: Don't waste compute
- **Fail fast**: Quickly try alternatives
- **Practical limits**: Avoid hanging
### Timeouts
**Set explicit timeouts:**
```
WebSearch: 30 seconds
WebFetch: 60 seconds
Repository clone: 5 minutes
Repomix processing: 10 minutes
Explorer agent: 5 minutes per URL
Researcher agent: 10 minutes
```
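When a fetch is under your direct control, the same budgets translate into ordinary timeouts. A sketch with the standard library; the 60-second default mirrors the WebFetch budget above:

```python
import urllib.request

def fetch_with_timeout(url: str, timeout_s: float = 60.0) -> str | None:
    """Fetch a page, giving up after the budget instead of hanging indefinitely."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except OSError:   # covers URLError, socket timeouts, connection failures
        return None   # caller falls back to the next method
```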
### Time Budgets
**Simple query (single library, latest version):**
```
Target: <2 minutes total
Phase 1 (Discovery): 30 seconds
- llms.txt search: 15 seconds
- Fetch llms.txt: 15 seconds
Phase 2 (Exploration): 60 seconds
- Launch agents: 5 seconds
- Agents fetch URLs: 60 seconds (parallel)
Phase 3 (Aggregation): 30 seconds
- Synthesize results
- Format output
Total: ~2 minutes
```
**Complex query (multiple versions, comparison):**
```
Target: <5 minutes total
Phase 1 (Discovery): 60 seconds
- Search both versions
- Fetch both llms.txt files
Phase 2 (Exploration): 180 seconds
- Launch 6 agents (2 sets of 3)
- Parallel exploration
Phase 3 (Comparison): 60 seconds
- Analyze differences
- Format side-by-side
Total: ~5 minutes
```
### When to Extend Timeouts
Acceptable to go longer when:
- User explicitly requests comprehensive analysis
- Repository is large but necessary
- Multiple fallbacks attempted
- User is informed of delay
### When to Give Up
Move to next method after:
- 3 failed attempts on same approach
- Timeout exceeded by 2x
- No progress for 30 seconds
- Error indicates permanent failure (404, auth required)
## 8. Cache Findings
### Why
- **Speed**: Instant results for repeated requests
- **Efficiency**: Reduce network requests
- **Consistency**: Same results within session
- **Reliability**: Less dependent on network
### What to Cache
**High value (always cache):**
```
- Repomix output (large, expensive to generate)
- llms.txt content (static, frequently referenced)
- Repository README (relatively static)
- Package registry metadata (changes rarely)
```
**Medium value (cache within session):**
```
- Documentation page content
- Search results
- Repository structure
- Version lists
```
**Low value (don't cache):**
```
- Real-time data (latest releases)
- User-specific content
- Time-sensitive information
```
### Cache Duration
```
Within conversation:
- All fetched content (reuse freely)
Within session:
- Repomix output (until conversation ends)
- llms.txt content (until new version requested)
Across sessions:
- Don't cache (start fresh each time)
```
### Cache Invalidation
Refresh cache when:
```
- User requests specific different version
- User says "get latest" or "refresh"
- Explicit time reference ("docs from today")
- Previous cache is from different library
```
### Implementation
```
# First request for library X
1. Fetch llms.txt
2. Store content in session variable
3. Use for processing
# Second request for library X (same session)
1. Check if llms.txt cached
2. Reuse cached content
3. Skip redundant fetch
# Request for library Y
1. Don't reuse library X cache
2. Fetch fresh for library Y
```
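In code form, the session cache is little more than a dictionary keyed by library and resource, as sketched below; because it lives in memory, it disappears with the session, matching the "don't cache across sessions" rule. The class and method names are illustrative:

```python
from collections.abc import Callable

class SessionCache:
    """In-memory cache for fetched documentation, scoped to one session."""

    def __init__(self) -> None:
        self._store: dict[tuple[str, str], str] = {}

    def get_or_fetch(self, library: str, resource: str, fetch: Callable[[], str]) -> str:
        key = (library, resource)          # library Y never reuses library X's entries
        if key not in self._store:
            self._store[key] = fetch()     # e.g. a WebFetch of the llms.txt URL
        return self._store[key]

    def invalidate(self, library: str) -> None:
        """Drop a library's entries, e.g. when the user says "refresh" or "get latest"."""
        self._store = {k: v for k, v in self._store.items() if k[0] != library}
```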
### Cache Hit Messages
```markdown
Using cached llms.txt from 5 minutes ago.
To fetch fresh, say "refresh" or "get latest".
```
## Quick Reference Checklist
### Before Starting
- [ ] Identify library name clearly
- [ ] Confirm version (default: latest)
- [ ] Check if cached data available
- [ ] Plan method (llms.txt → repo → research)
### During Discovery
- [ ] Start with llms.txt search
- [ ] Verify source is official
- [ ] Check version matches requirement
- [ ] Set timeout for each operation
- [ ] Fall back quickly if method fails
### During Exploration
- [ ] Use parallel agents for 3+ URLs
- [ ] Launch all agents in single message
- [ ] Distribute workload evenly
- [ ] Monitor for errors/timeouts
- [ ] Be ready to retry or fallback
### Before Presenting
- [ ] Synthesize by topic (not by agent)
- [ ] Deduplicate repeated information
- [ ] Verify version is correct
- [ ] Include source attribution
- [ ] Note any limitations
- [ ] Format clearly
- [ ] Check completeness
### Quality Gates
Ask before presenting:
- [ ] Is information accurate?
- [ ] Are sources official?
- [ ] Does version match request?
- [ ] Are all key topics covered?
- [ ] Are limitations noted?
- [ ] Is methodology documented?
- [ ] Is output well-organized?