Initial commit

This commit is contained in:
Zhongwei Li
2025-11-30 08:48:52 +08:00
commit 6ec3196ecc
434 changed files with 125248 additions and 0 deletions

# Best Practices
Essential principles and proven strategies for effective documentation discovery.
## 1. Prioritize context7.com for llms.txt
### Why
- **Comprehensive aggregator**: Single source for most documentation
- **Most efficient**: Instant access without searching
- **Authoritative**: Aggregates official sources
- **Up-to-date**: Continuously maintained
- **Fast**: Direct URL construction vs searching
- **Topic filtering**: Targeted results with ?topic= parameter
### Implementation
```
Step 1: Try context7.com (ALWAYS FIRST)

Know GitHub repo?
  YES → https://context7.com/{org}/{repo}/llms.txt
  NO  → Continue

Know website?
  YES → https://context7.com/websites/{normalized-path}/llms.txt
  NO  → Continue

Specific topic needed?
  YES → Add ?topic={query} parameter
  NO  → Use base URL

Found?
  YES → Use as primary source
  NO  → Fall back to WebSearch for llms.txt

Still not found?
  YES → Fall back to repository analysis
```
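The decision chain above can be sketched as a small shell helper. The names `context7_url` and `http_status` are hypothetical, and the commented probe assumes `curl` is available:

```shell
# Build a context7.com llms.txt URL for a GitHub org/repo, with an
# optional ?topic= filter. Helper names here are illustrative only.
context7_url() {
  org="$1"; repo="$2"; topic="${3:-}"
  url="https://context7.com/${org}/${repo}/llms.txt"
  if [ -n "$topic" ]; then
    url="${url}?topic=${topic}"
  fi
  echo "$url"
}

# Probe a URL and print the HTTP status code ("000" on network failure).
http_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 15 "$1"
}

# Sketch of the chain: context7.com first, otherwise signal a fallback.
# status=$(http_status "$(context7_url vercel next.js)")
# [ "$status" = "200" ] || echo "fall back to WebSearch for llms.txt"
```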
### Examples
```
Best approach (context7.com):
1. Direct URL: https://context7.com/vercel/next.js/llms.txt
2. WebFetch llms.txt
3. Launch Explorer agents for URLs
Total time: ~15 seconds

Topic-specific approach:
1. Direct URL: https://context7.com/shadcn-ui/ui/llms.txt?topic=date
2. WebFetch filtered content
3. Present focused results
Total time: ~10 seconds

Good fallback approach:
1. context7.com returns 404
2. WebSearch: "Astro llms.txt site:docs.astro.build"
3. Found → WebFetch llms.txt
4. Launch Explorer agents for URLs
Total time: ~60 seconds

Poor approach:
1. Skip context7.com entirely
2. Search for various documentation pages
3. Manually collect URLs
4. Process one by one
Total time: ~5 minutes
```
### When Not Available
Fallback strategy when context7.com unavailable:
- If context7.com returns 404 → try WebSearch for llms.txt
- If WebSearch finds nothing in 30 seconds → move to repository
- If domain is incorrect → try 2-3 alternatives, then move on
- If documentation is very old → likely doesn't have llms.txt
## 2. Use Parallel Agents Aggressively
### Why
- **Speed**: N tasks finish in roughly the time of one (instead of N × that time)
- **Efficiency**: Better resource utilization
- **Coverage**: Comprehensive results faster
- **Scalability**: Handles large documentation sets
### Guidelines
**Scale agent count with the number of URLs:**
```
3 URLs → 1 Explorer agent (acceptable)
4-10 URLs → 3-5 Explorer agents (optimal)
11+ URLs → 5-7 agents in phases (best)
```
**Launch all agents in single message:**
```
Good:
[Send one message with 5 Task tool calls]
Bad:
[Send message with Task call]
[Wait for result]
[Send another message with Task call]
[Wait for result]
...
```
### Distribution Strategy
**Even distribution:**
```
10 URLs, 5 agents:
Agent 1: URLs 1-2
Agent 2: URLs 3-4
Agent 3: URLs 5-6
Agent 4: URLs 7-8
Agent 5: URLs 9-10
```
**Topic-based distribution:**
```
10 URLs, 3 agents:
Agent 1: Installation & Setup (URLs 1-3)
Agent 2: Core Concepts & API (URLs 4-7)
Agent 3: Examples & Guides (URLs 8-10)
```
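As a sketch, the even split can be done mechanically. This round-robin variant (the helper name `distribute` is made up for illustration) assigns each URL to agents 1..K in turn:

```shell
# Assign each URL to agents 1..K in round-robin order, printing
# "agentN: url" lines - a mechanical version of the even split above.
distribute() {
  k="$1"; shift
  i=0
  for url in "$@"; do
    echo "agent$(( i % k + 1 )): $url"
    i=$(( i + 1 ))
  done
}

# distribute 5 u1 u2 u3 u4 u5 u6 u7 u8 u9 u10
# gives each of the five agents exactly two URLs
```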
### When Not to Parallelize
- Single URL (use WebFetch)
- 2 URLs (single agent is fine)
- Dependencies between tasks (sequential required)
- Limited documentation (1-2 pages)
## 3. Verify Official Sources
### Why
- **Accuracy**: Avoid outdated information
- **Security**: Prevent malicious content
- **Credibility**: Maintain trust
- **Relevance**: Match user's version/needs
### Verification Checklist
**For llms.txt:**
```
[ ] Domain matches official site
[ ] HTTPS connection
[ ] Content format is valid
[ ] URLs point to official docs
[ ] Last-Modified header is recent (if available)
```
**For repositories:**
```
[ ] Organization matches official entity
[ ] Star count appropriate for library
[ ] Recent commits (last 6 months)
[ ] README mentions official status
[ ] Links back to official website
[ ] License matches expectations
```
**For documentation:**
```
[ ] Domain is official
[ ] Version matches user request
[ ] Last updated date visible
[ ] Content is complete (not stubs)
[ ] Links work (not 404s)
```
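The first two checklist items can be enforced mechanically. A sketch follows (the helper names are assumptions; content validity and freshness still need inspection):

```shell
# Extract the host portion of a URL.
host_of() {
  u="${1#*://}"
  echo "${u%%/*}"
}

# Pass only when the URL uses HTTPS and its host matches the expected
# official domain - the first two items of the llms.txt checklist.
is_official() {
  case "$1" in
    https://*) ;;
    *) return 1 ;;
  esac
  [ "$(host_of "$1")" = "$2" ]
}
```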
### Red Flags
⚠️ **Unofficial sources:**
- Personal GitHub forks
- Outdated tutorials (>2 years old)
- Unmaintained repositories
- Suspicious domains
- No version information
- Conflicting with official docs
### When to Use Unofficial Sources
Acceptable when:
- No official documentation exists
- Clearly labeled as community resource
- Recent and well-maintained
- Cross-referenced with official info
- User is aware of unofficial status
## 4. Report Methodology
### Why
- **Transparency**: User knows how you found info
- **Reproducibility**: User can verify
- **Troubleshooting**: Helps debug issues
- **Trust**: Builds confidence in results
### What to Include
**Always report:**
```markdown
## Source
**Method**: llms.txt / Repository / Research / Mixed
**Primary source**: [main URL or repository]
**Additional sources**: [list]
**Date accessed**: [current date]
**Version**: [documentation version]
```
**For llms.txt:**
```markdown
**Method**: llms.txt
**URL**: https://docs.astro.build/llms.txt
**URLs processed**: 8
**Date accessed**: 2025-10-26
**Version**: Latest (as of Oct 2025)
```
**For repository:**
```markdown
**Method**: Repository analysis (Repomix)
**Repository**: https://github.com/org/library
**Commit**: abc123f (2025-10-20)
**Stars**: 15.2k
**Analysis date**: 2025-10-26
```
**For research:**
```markdown
**Method**: Multi-source research
**Sources**:
- Official website: [url]
- Package registry: [url]
- Stack Overflow: [url]
- Community tutorials: [urls]
**Date accessed**: 2025-10-26
**Note**: No official llms.txt or repository available
```
### Limitations Disclosure
Always note:
```markdown
## ⚠️ Limitations
- Documentation for v2.x (user may need v3.x)
- API reference section incomplete
- Examples based on TypeScript (Python examples unavailable)
- Last updated 6 months ago
```
## 5. Handle Versions Explicitly
### Why
- **Compatibility**: Avoid version mismatch errors
- **Accuracy**: Features vary by version
- **Migration**: Support upgrade paths
- **Clarity**: No ambiguity about what's covered
### Version Detection
**Check these sources:**
```
1. URL path: /docs/v2/
2. Page header/title
3. Version selector on page
4. Git tag/branch name
5. Package.json or equivalent
6. Release date correlation
```
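Source 1 (the URL path) lends itself to a quick mechanical check. This sketch assumes the common `/vN` or `/vN.N` path-segment convention:

```shell
# Extract a version marker such as v2 or v18.2 from a docs URL path;
# prints nothing when no marker is present.
version_from_url() {
  echo "$1" | grep -oE '/v[0-9]+(\.[0-9]+)*(/|$)' | head -n 1 | tr -d '/'
}
```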
### Version Handling Rules
**User specifies version:**
```
Request: "Documentation for React 18"
→ Search: "React v18 documentation"
→ Verify: Check version in content
→ Report: "Documentation for React v18.2.0"
```
**User doesn't specify:**
```
Request: "Documentation for Next.js"
→ Default: Assume latest
→ Confirm: "I'll find the latest Next.js documentation"
→ Report: "Documentation for Next.js 14.0 (latest as of [date])"
```
**Version mismatch found:**
```
Request: "Docs for v2"
Found: Only v3 documentation
→ Report: "⚠️ Requested v2, but only v3 docs available. Here's v3 with migration guide."
```
### Multi-Version Scenarios
**Comparison request:**
```
Request: "Compare v1 and v2"
→ Find both versions
→ Launch parallel agents (set A for v1, set B for v2)
→ Present side-by-side analysis
```
**Migration request:**
```
Request: "How to migrate from v1 to v2"
→ Find v2 migration guide
→ Also fetch v1 and v2 docs
→ Highlight breaking changes
→ Provide code examples (before/after)
```
## 6. Aggregate Intelligently
### Why
- **Clarity**: Easier to understand
- **Efficiency**: Less cognitive load
- **Completeness**: Unified view
- **Actionability**: Clear next steps
### Bad Aggregation (Don't Do This)
```markdown
## Results
Agent 1 found:
[dump of agent 1 output]
Agent 2 found:
[dump of agent 2 output]
Agent 3 found:
[dump of agent 3 output]
```
Problems:
- Redundant information repeated
- No synthesis
- Hard to scan
- Lacks narrative
### Good Aggregation (Do This)
````markdown
## Installation
[Synthesized from agents 1 & 2]
Three installation methods available:
1. **npm (recommended)**:
   ```bash
   npm install library-name
   ```
2. **CDN**: [from agent 1]
   ```html
   <script src="..."></script>
   ```
3. **Manual**: [from agent 3]
   Download and include in project
## Core Concepts
[Synthesized from agents 2 & 4]
The library is built around three main concepts:
1. **Components**: [definition from agent 2]
2. **State**: [definition from agent 4]
3. **Effects**: [definition from agent 2]
## Examples
[From agents 3 & 5, deduplicated]
...
````
Benefits:
- Organized by topic
- Deduplicated
- Clear narrative
- Easy to scan
### Synthesis Techniques
**Deduplication:**
```
Agent 1: "Install with npm install foo"
Agent 2: "You can install using npm: npm install foo"
→ Synthesized: "Install: `npm install foo`"
```
**Prioritization:**
```
Agent 1: Basic usage example
Agent 2: Basic usage example (same)
Agent 3: Advanced usage example
→ Keep: Basic (from agent 1) + Advanced (from agent 3)
```
**Organization:**
```
Agents returned mixed information:
- Installation steps
- Configuration
- Usage example
- Installation requirements
- More usage examples
→ Reorganize:
1. Installation (requirements + steps)
2. Configuration
3. Usage (all examples together)
```
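The deduplication step can be partially automated before manual synthesis. This sketch only normalizes whitespace and drops exact repeats, so semantically equivalent but differently worded lines (as in the npm example above) still need human judgment:

```shell
# Collapse runs of whitespace, trim line ends, and drop exact duplicate
# lines while preserving first-seen order.
dedupe_lines() {
  sed -e 's/[[:space:]]\{1,\}/ /g' -e 's/^ //' -e 's/ $//' | awk '!seen[$0]++'
}
```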
## 7. Time Management
### Why
- **User experience**: Fast results
- **Resource efficiency**: Don't waste compute
- **Fail fast**: Quickly try alternatives
- **Practical limits**: Avoid hanging
### Timeouts
**Set explicit timeouts:**
```
WebSearch: 30 seconds
WebFetch: 60 seconds
Repository clone: 5 minutes
Repomix processing: 10 minutes
Explorer agent: 5 minutes per URL
Researcher agent: 10 minutes
```
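These limits can be enforced with coreutils `timeout` (present on most Linux systems, but not installed by default on macOS). A minimal wrapper:

```shell
# Run a command under a hard time limit in seconds; exits non-zero
# (status 124 with coreutils) when the limit is hit.
run_with_timeout() {
  secs="$1"; shift
  timeout "$secs" "$@"
}

# run_with_timeout 60 curl -fsSL "$url"    # WebFetch-style limit
# run_with_timeout 300 git clone --depth 1 "$repo" /tmp/docs-analysis
```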
### Time Budgets
**Simple query (single library, latest version):**
```
Target: <2 minutes total
Phase 1 (Discovery): 30 seconds
- llms.txt search: 15 seconds
- Fetch llms.txt: 15 seconds
Phase 2 (Exploration): 60 seconds
- Launch agents: 5 seconds
- Agents fetch URLs: 60 seconds (parallel)
Phase 3 (Aggregation): 30 seconds
- Synthesize results
- Format output
Total: ~2 minutes
```
**Complex query (multiple versions, comparison):**
```
Target: <5 minutes total
Phase 1 (Discovery): 60 seconds
- Search both versions
- Fetch both llms.txt files
Phase 2 (Exploration): 180 seconds
- Launch 6 agents (2 sets of 3)
- Parallel exploration
Phase 3 (Comparison): 60 seconds
- Analyze differences
- Format side-by-side
Total: ~5 minutes
```
### When to Extend Timeouts
Acceptable to go longer when:
- User explicitly requests comprehensive analysis
- Repository is large but necessary
- Multiple fallbacks attempted
- User is informed of delay
### When to Give Up
Move to next method after:
- 3 failed attempts on same approach
- Timeout exceeded by 2x
- No progress for 30 seconds
- Error indicates permanent failure (404, auth required)
## 8. Cache Findings
### Why
- **Speed**: Instant results for repeated requests
- **Efficiency**: Reduce network requests
- **Consistency**: Same results within session
- **Reliability**: Less dependent on network
### What to Cache
**High value (always cache):**
```
- Repomix output (large, expensive to generate)
- llms.txt content (static, frequently referenced)
- Repository README (relatively static)
- Package registry metadata (changes rarely)
```
**Medium value (cache within session):**
```
- Documentation page content
- Search results
- Repository structure
- Version lists
```
**Low value (don't cache):**
```
- Real-time data (latest releases)
- User-specific content
- Time-sensitive information
```
### Cache Duration
```
Within conversation:
- All fetched content (reuse freely)
Within session:
- Repomix output (until conversation ends)
- llms.txt content (until new version requested)
Across sessions:
- Don't cache (start fresh each time)
```
### Cache Invalidation
Refresh cache when:
```
- User requests specific different version
- User says "get latest" or "refresh"
- Explicit time reference ("docs from today")
- Previous cache is from different library
```
### Implementation
```
# First request for library X
1. Fetch llms.txt
2. Store content in session variable
3. Use for processing
# Second request for library X (same session)
1. Check if llms.txt cached
2. Reuse cached content
3. Skip redundant fetch
# Request for library Y
1. Don't reuse library X cache
2. Fetch fresh for library Y
```
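A minimal sketch of the session cache, assuming one `/tmp` file per URL keyed by `cksum` (the paths and helper names are made up for illustration):

```shell
# Derive a stable cache file path from a URL.
cache_path() {
  echo "/tmp/docs-cache-$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)"
}

# Store content for a URL.
cache_put() { printf '%s' "$2" > "$(cache_path "$1")"; }

# Print cached content, failing (non-zero exit) on a cache miss.
cache_get() {
  f="$(cache_path "$1")"
  [ -f "$f" ] && cat "$f"
}
```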
### Cache Hit Messages
```markdown
Using cached llms.txt from 5 minutes ago.
To fetch fresh, say "refresh" or "get latest".
```
## Quick Reference Checklist
### Before Starting
- [ ] Identify library name clearly
- [ ] Confirm version (default: latest)
- [ ] Check if cached data available
- [ ] Plan method (llms.txt → repo → research)
### During Discovery
- [ ] Start with llms.txt search
- [ ] Verify source is official
- [ ] Check version matches requirement
- [ ] Set timeout for each operation
- [ ] Fall back quickly if method fails
### During Exploration
- [ ] Use parallel agents for 3+ URLs
- [ ] Launch all agents in single message
- [ ] Distribute workload evenly
- [ ] Monitor for errors/timeouts
- [ ] Be ready to retry or fallback
### Before Presenting
- [ ] Synthesize by topic (not by agent)
- [ ] Deduplicate repeated information
- [ ] Verify version is correct
- [ ] Include source attribution
- [ ] Note any limitations
- [ ] Format clearly
- [ ] Check completeness
### Quality Gates
Ask before presenting:
- [ ] Is information accurate?
- [ ] Are sources official?
- [ ] Does version match request?
- [ ] Are all key topics covered?
- [ ] Are limitations noted?
- [ ] Is methodology documented?
- [ ] Is output well-organized?

# Common Documentation Sources
Reference guide for locating documentation across popular platforms and ecosystems.
## context7.com Locations (PRIORITY)
**ALWAYS try context7.com first for all libraries**
### JavaScript/TypeScript Frameworks
- **Astro**: https://context7.com/withastro/astro/llms.txt
- **Next.js**: https://context7.com/vercel/next.js/llms.txt
- **Remix**: https://context7.com/remix-run/remix/llms.txt
- **SvelteKit**: https://context7.com/sveltejs/kit/llms.txt
- **Nuxt**: https://context7.com/nuxt/nuxt/llms.txt
### Frontend Libraries & UI
- **React**: https://context7.com/facebook/react/llms.txt
- **Vue**: https://context7.com/vuejs/core/llms.txt
- **Svelte**: https://context7.com/sveltejs/svelte/llms.txt
- **shadcn/ui**: https://context7.com/shadcn-ui/ui/llms.txt
- **Radix UI**: https://context7.com/radix-ui/primitives/llms.txt
### Backend/Full-stack
- **Hono**: https://context7.com/honojs/hono/llms.txt
- **Fastify**: https://context7.com/fastify/fastify/llms.txt
- **tRPC**: https://context7.com/trpc/trpc/llms.txt
### Build Tools
- **Vite**: https://context7.com/vitejs/vite/llms.txt
- **Turbo**: https://context7.com/vercel/turbo/llms.txt
### Databases/ORMs
- **Prisma**: https://context7.com/prisma/prisma/llms.txt
- **Drizzle**: https://context7.com/drizzle-team/drizzle-orm/llms.txt
### Authentication
- **Better Auth**: https://context7.com/better-auth/better-auth/llms.txt
- **Auth.js**: https://context7.com/nextauthjs/next-auth/llms.txt
### Image Processing
- **ImageMagick**: https://context7.com/imagick/imagick/llms.txt
### Topic-Specific Examples
- **shadcn/ui date components**: https://context7.com/shadcn-ui/ui/llms.txt?topic=date
- **shadcn/ui buttons**: https://context7.com/shadcn-ui/ui/llms.txt?topic=button
- **Next.js caching**: https://context7.com/vercel/next.js/llms.txt?topic=cache
- **FFmpeg compression**: https://context7.com/websites/ffmpeg_doxygen_8_0/llms.txt?topic=compress
## Official llms.txt Locations (FALLBACK)
Use these only if context7.com returns 404:
### JavaScript/TypeScript Frameworks
- **Astro**: https://docs.astro.build/llms.txt
- **Next.js**: https://nextjs.org/llms.txt
- **Remix**: https://remix.run/llms.txt
- **SvelteKit**: https://kit.svelte.dev/llms.txt
- **Nuxt**: https://nuxt.com/llms.txt
### Frontend Libraries
- **React**: https://react.dev/llms.txt
- **Vue**: https://vuejs.org/llms.txt
- **Svelte**: https://svelte.dev/llms.txt
### Backend/Full-stack
- **Hono**: https://hono.dev/llms.txt
- **Fastify**: https://fastify.dev/llms.txt
- **tRPC**: https://trpc.io/llms.txt
### Build Tools
- **Vite**: https://vitejs.dev/llms.txt
- **Turbopack**: https://turbo.build/llms.txt
### Databases/ORMs
- **Prisma**: https://prisma.io/llms.txt
- **Drizzle**: https://orm.drizzle.team/llms.txt
## Repository Patterns
### GitHub (Most Common)
**URL patterns:**
```
https://github.com/[org]/[repo]
https://github.com/[user]/[repo]
```
**Common organization names:**
- Company name: `github.com/vercel/next.js`
- Project name: `github.com/remix-run/remix`
- Community: `github.com/facebook/react`
**Documentation locations in repositories:**
```
/docs/
/documentation/
/website/docs/
/packages/docs/
README.md
CONTRIBUTING.md
/examples/
```
### GitLab
**URL pattern:**
```
https://gitlab.com/[org]/[repo]
```
### Bitbucket (Less Common)
**URL pattern:**
```
https://bitbucket.org/[org]/[repo]
```
## Package Registries
### npm (JavaScript/TypeScript)
**URL**: `https://npmjs.com/package/[name]`
**Available info:**
- Description
- Homepage link
- Repository link
- Version history
- Dependencies
**Useful for:**
- Finding official links
- Version information
- Package metadata
### PyPI (Python)
**URL**: `https://pypi.org/project/[name]`
**Available info:**
- Description
- Documentation link
- Homepage
- Repository link
- Release history
**Useful for:**
- Python package documentation
- Official links
- Version compatibility
### RubyGems (Ruby)
**URL**: `https://rubygems.org/gems/[name]`
**Available info:**
- Description
- Homepage
- Documentation
- Source code link
- Dependencies
**Useful for:**
- Ruby gem documentation
- Version information
### Cargo (Rust)
**URL**: `https://crates.io/crates/[name]`
**Available info:**
- Description
- docs.rs link (auto-generated docs)
- Repository
- Version history
**Useful for:**
- Rust crate documentation
- Auto-generated API docs
- Repository link
### Maven Central (Java)
**URL**: `https://search.maven.org/artifact/[group]/[artifact]`
**Available info:**
- Versions
- Dependencies
- Repository link
- License
**Useful for:**
- Java library information
- Dependency management
## Documentation Hosting Platforms
### Read the Docs
**URL patterns:**
```
https://[project].readthedocs.io
https://readthedocs.org/projects/[project]
```
**Features:**
- Version switching
- Multiple formats (HTML, PDF, ePub)
- Search functionality
- Often auto-generated from reStructuredText/Markdown
### GitBook
**URL patterns:**
```
https://[org].gitbook.io/[project]
https://docs.[domain].com (often GitBook-powered)
```
**Features:**
- Clean, modern interface
- Good navigation
- Often manually curated
- May require API key for programmatic access
### Docusaurus
**URL patterns:**
```
https://[project].io
https://docs.[project].com
```
**Common in:**
- React ecosystem
- Meta/Facebook projects
- Modern open-source projects
**Features:**
- React-based
- Fast, static site
- Version management
- Good search
### MkDocs
**URL patterns:**
```
https://[user].github.io/[project]
https://[custom-domain].com
```
**Features:**
- Python ecosystem
- Static site from Markdown
- Often on GitHub Pages
- Material theme popular
### VitePress
**URL patterns:**
```
https://[project].dev
https://docs.[project].com
```
**Common in:**
- Vue ecosystem
- Modern projects
- Vite-based projects
**Features:**
- Vue-powered
- Very fast
- Clean design
- Good DX
## Documentation Search Patterns
### Finding llms.txt
**ALWAYS try context7.com first:**
For GitHub repositories:
```
https://context7.com/{org}/{repo}/llms.txt
```
For websites:
```
https://context7.com/websites/{normalized-path}/llms.txt
```
With topic filter:
```
https://context7.com/{path}/llms.txt?topic={query}
```
**Fallback: Traditional search if context7.com returns 404:**
```
"[library] llms.txt site:[known-domain]"
```
**Alternative domains to try:**
```
site:docs.[library].com
site:[library].dev
site:[library].io
site:[library].org
```
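The domain patterns can be expanded mechanically into a probe list. Which suffixes to try is a heuristic; `docs.*.com`, `.dev`, `.io`, and `.org` here just mirror the list above:

```shell
# Expand a library name into candidate llms.txt URLs to probe.
llms_candidates() {
  name="$1"
  for host in "docs.${name}.com" "${name}.dev" "${name}.io" "${name}.org"; do
    echo "https://${host}/llms.txt"
  done
}

# llms_candidates vite | while read -r url; do curl -fsI "$url" && break; done
```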
### Finding Official Repository
**Search pattern:**
```
"[library] official github repository"
"[library] source code github"
```
**Verification checklist:**
- Check organization/user is official
- Verify star count (popular libraries have many)
- Check last commit date (active maintenance)
- Look for official links in README
### Finding Official Documentation
**Search patterns:**
```
"[library] official documentation"
"[library] docs site:official-domain"
"[library] API reference"
```
**Domain patterns:**
```
docs.[library].com
[library].dev/docs
docs.[library].io
[library].readthedocs.io
```
## Common Documentation Structures
### Typical Section Names
**Getting started:**
- Getting Started
- Quick Start
- Introduction
- Installation
- Setup
**Core concepts:**
- Core Concepts
- Fundamentals
- Basics
- Key Concepts
- Architecture
**Guides:**
- Guides
- How-To Guides
- Tutorials
- Examples
- Recipes
**Reference:**
- API Reference
- API Documentation
- Reference
- API
- CLI Reference
**Advanced:**
- Advanced
- Advanced Topics
- Deep Dives
- Internals
- Performance
### Common File Names
```
README.md
GETTING_STARTED.md
INSTALLATION.md
CONTRIBUTING.md
CHANGELOG.md
API.md
TUTORIAL.md
EXAMPLES.md
FAQ.md
```
## Framework-Specific Patterns
### React Ecosystem
**Common patterns:**
```
- Uses Docusaurus
- Documentation at [project].dev or docs.[project].com
- Often has interactive examples
- CodeSandbox/StackBlitz embeds
```
### Vue Ecosystem
**Common patterns:**
```
- Uses VitePress
- Documentation at [project].vuejs.org
- Bilingual (English/Chinese)
- API reference auto-generated
```
### Python Ecosystem
**Common patterns:**
```
- Read the Docs hosting
- Sphinx-generated
- reStructuredText format
- [project].readthedocs.io
```
### Rust Ecosystem
**Common patterns:**
```
- docs.rs for API docs
- Book format for guides ([project].rs/book)
- Markdown in repository
- Well-structured examples/
```
## Quick Lookup Table
| Ecosystem | Registry | Docs Pattern | Common Host |
|-----------|----------|--------------|-------------|
| JavaScript/TS | npmjs.com | [name].dev | Docusaurus, VitePress |
| Python | pypi.org | readthedocs.io | Read the Docs |
| Rust | crates.io | docs.rs | docs.rs |
| Ruby | rubygems.org | rubydoc.info | RDoc |
| Go | pkg.go.dev | pkg.go.dev | pkg.go.dev |
| PHP | packagist.org | [name].org | Various |
| Java | maven.org | javadoc | Maven Central |

# Error Handling Guide
Comprehensive troubleshooting and error resolution strategies for documentation discovery.
## context7.com Not Accessible
### Symptoms
- 404 error (library not indexed)
- Connection timeout
- Server error (500)
- Empty response
### Troubleshooting Steps
**1. Verify URL pattern:**
For GitHub repos:
```
✓ Correct: https://context7.com/vercel/next.js/llms.txt
✗ Wrong: https://context7.com/nextjs/llms.txt
```
For websites:
```
✓ Correct: https://context7.com/websites/imgix/llms.txt
✗ Wrong: https://context7.com/docs.imgix.com/llms.txt
```
**2. Try official llms.txt as fallback:**
```
https://docs.[library].com/llms.txt
https://[library].dev/llms.txt
https://[library].io/llms.txt
```
**3. Search for llms.txt if still not found:**
```
WebSearch: "[library] llms.txt"
WebSearch: "[library] documentation AI format"
```
**4. Fall back to repository analysis:**
- If no llms.txt available anywhere
- Note in report: "llms.txt not available, used repository analysis"
### Common Causes
- Library not yet indexed by context7.com
- Very new or obscure library
- Private repository
- context7.com temporary outage
### Example Resolution
```
Problem: https://context7.com/org/new-lib/llms.txt returns 404
Steps:
1. Check official site: https://new-lib.dev/llms.txt ✗ Not found
2. WebSearch for llms.txt ✗ Not found
3. Fall back to repository: https://github.com/org/new-lib ✓ Found
4. Use Repomix for documentation extraction
5. Note in report: "No llms.txt available, analyzed repository directly"
```
## llms.txt Not Accessible (Official Sites)
### Symptoms
- 404 error
- Connection timeout
- Access denied (403)
- Empty response
### Troubleshooting Steps
**1. ALWAYS try context7.com first:**
```
https://context7.com/{org}/{repo}/llms.txt
```
**2. Try alternative official domains:**
```
https://[name].dev/llms.txt
https://[name].io/llms.txt
https://[name].com/llms.txt
https://docs.[name].com/llms.txt
https://www.[name].com/llms.txt
```
**3. Check for redirects:**
- Old domain → new domain
- Non-HTTPS → HTTPS
- www → non-www or vice versa
- Root → /docs subdirectory
**4. Search for llms.txt mention:**
```
WebSearch: "[library] llms.txt"
WebSearch: "[library] documentation AI format"
```
**5. Check documentation announcements:**
- Blog posts about llms.txt
- GitHub discussions
- Recent release notes
**6. If all fail:**
- Fall back to repository analysis (Phase 3)
- Note in report: "llms.txt not available"
### Common Causes
- Documentation recently moved/redesigned
- llms.txt not yet implemented
- Domain configuration issues
- Rate limiting or IP blocking
- Firewall/security restrictions
### Example Resolution
```
Problem: https://example.dev/llms.txt returns 404
Steps:
1. Try: https://docs.example.dev/llms.txt ✓ Works!
2. Note: Documentation moved to docs subdomain
3. Proceed with Phase 2 using correct URL
```
## Repository Not Found
### Symptoms
- GitHub 404 error
- No official repository found
- Repository is private/requires auth
- Multiple competing repositories
### Troubleshooting Steps
**1. Search official website:**
```
WebSearch: "[library] official website"
```
**2. Check package registries:**
```
WebSearch: "[library] npm"
WebSearch: "[library] pypi"
WebSearch: "[library] crates.io"
```
**3. Look for organization GitHub:**
```
WebSearch: "[company] github organization"
WebSearch: "[library] github org:[known-org]"
```
**4. Check for mirrors or forks:**
```
WebSearch: "[library] github mirror"
WebSearch: "[library] source code"
```
**5. Verify through package manager:**
```bash
# npm example
npm info [package-name] repository
# pip example
pip show [package-name]
```
**6. If all fail:**
- Use Researcher agents (Phase 4)
- Note: "No public repository available"
### Common Causes
- Proprietary/closed-source software
- Documentation separate from code repository
- Company uses internal hosting (GitLab, Bitbucket, self-hosted)
- Project discontinued or archived
- Repository renamed/moved
### Verification Checklist
When you find a repository, verify:
- [ ] Organization/user matches official entity
- [ ] Star count appropriate for library popularity
- [ ] Recent commits (active maintenance)
- [ ] README mentions official status
- [ ] Links back to official website
- [ ] License matches expectations
## Repomix Failures
### Symptoms
- Out of memory error
- Command hangs indefinitely
- Output file empty or corrupted
- Permission errors
- Network timeout during clone
### Troubleshooting Steps
**1. Check repository size:**
```bash
# Clone and check size
git clone [url] /tmp/test-repo
du -sh /tmp/test-repo
# If >500MB, use focused approach
```
**2. Focus on documentation only:**
```bash
repomix --include "docs/**,README.md,*.md" --output docs.xml
```
**3. Exclude large files:**
```bash
repomix --exclude "*.png,*.jpg,*.pdf,*.zip,dist/**,build/**,node_modules/**" --output repomix-output.xml
```
**4. Use shallow clone:**
```bash
git clone --depth 1 [url] /tmp/docs-analysis
cd /tmp/docs-analysis
repomix --output repomix-output.xml
```
**5. Alternative: Explorer agents**
```
If Repomix fails completely:
1. Read README.md directly
2. List /docs directory structure
3. Launch Explorer agents for key files
4. Read specific documentation files
```
**6. Check system resources:**
```bash
# Check disk space
df -h /tmp
# Check available memory
free -h
# Kill if hung
pkill -9 repomix
```
### Common Causes
- Repository too large (>1GB)
- Many binary files (images, videos)
- Large commit history
- Insufficient disk space
- Memory constraints
- Slow network connection
- Repository has submodules
### Size Guidelines
| Repo Size | Strategy |
|-----------|----------|
| <50MB | Full Repomix |
| 50-200MB | Exclude binaries |
| 200-500MB | Focus on /docs |
| 500MB-1GB | Shallow clone + focus |
| >1GB | Explorer agents only |
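The table maps directly onto a threshold check. A sketch (the strategy labels are shorthand for the rows above):

```shell
# Pick a Repomix strategy from a repository size in megabytes,
# following the size guideline table.
repomix_strategy() {
  mb="$1"
  if   [ "$mb" -lt 50 ];   then echo "full repomix"
  elif [ "$mb" -lt 200 ];  then echo "exclude binaries"
  elif [ "$mb" -lt 500 ];  then echo "focus on /docs"
  elif [ "$mb" -le 1024 ]; then echo "shallow clone + focus"
  else                          echo "explorer agents only"
  fi
}

# Size in MB can come from: du -sm /tmp/test-repo | cut -f 1
```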
## Multiple Conflicting Sources
### Symptoms
- Different installation instructions
- Conflicting API signatures
- Contradictory recommendations
- Version mismatches
- Breaking changes not documented
### Resolution Steps
**1. Check version of each source:**
```
- Note documentation version number
- Check last-updated date
- Check URL for version indicator (v1/, v2/)
- Look for version selector on page
```
**2. Prioritize sources:**
```
Priority order:
1. Official docs (latest version)
2. Official docs (specified version)
3. Package registry (verified)
4. Official repository README
5. Community tutorials (recent)
6. Stack Overflow (recent, high votes)
7. Blog posts (date-verified)
```
**3. Present both with context:**
```markdown
## Installation (v1.x - Legacy)
[old method]
Source: [link] (Last updated: [date])
## Installation (v2.x - Current)
[new method]
Source: [link] (Last updated: [date])
⚠️ Note: v2.x is recommended for new projects.
Migration guide: [link]
```
**4. Cross-reference:**
- Check if conflict is intentional (breaking change)
- Look for migration guides
- Check changelog/release notes
- Verify in GitHub issues/discussions
**5. Document discrepancy:**
```markdown
## ⚠️ Conflicting Information Found
**Source 1** (official docs): Method A
**Source 2** (repository): Method B
**Analysis**: Source 1 reflects v2.x API. Source 2 README
not yet updated. Confirmed via changelog [link].
**Recommendation**: Use Method A (official docs).
```
### Version Identification
**Check these locations:**
```
- URL path: /docs/v2/...
- Page header/footer
- Version selector dropdown
- Git branch/tag
- Package.json or equivalent
- CHANGELOG.md date correlation
```
## Rate Limiting
### Symptoms
- 429 Too Many Requests
- 403 Forbidden (temporary)
- Slow responses
- Connection refused
- "Rate limit exceeded" message
### Solutions
**1. Add delays between requests:**
```bash
# Add 2-second delay
sleep 2
```
**2. Use alternative sources:**
```
Priority fallback chain:
GitHub → Official docs → Package registry → Repository → Archive
```
**3. Batch operations:**
```
Instead of:
- WebFetch URL 1
- WebFetch URL 2
- WebFetch URL 3
Use:
- Launch 3 Explorer agents (single batch)
```
**4. Cache aggressively:**
```
- Reuse fetched content within session
- Don't re-fetch same URLs
- Store repomix output for reuse
- Note fetch time, reuse if <1 hour old
```
**5. Check rate limit headers:**
```
If available:
- X-RateLimit-Remaining
- X-RateLimit-Reset
- Retry-After
```
**6. Respect robots.txt:**
```bash
# Check before aggressive crawling
curl https://example.com/robots.txt
```
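Those header values can be pulled out of a `curl -sI` response with a small helper (the name is illustrative; which headers a server actually sends varies):

```shell
# Print the value of a (case-insensitive) header from response headers
# read on stdin, e.g. the output of `curl -sI`.
header_value() {
  name="$1"
  grep -i "^${name}:" | head -n 1 | cut -d ' ' -f 2- | tr -d '\r'
}

# curl -sI https://api.github.com | header_value x-ratelimit-remaining
```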
### Rate Limit Recovery
**GitHub API (if applicable):**
```
- Anonymous: 60 requests/hour
- Authenticated: 5000 requests/hour
- Wait period: 1 hour from first request
```
**General approach:**
```
1. Detect rate limit (429 or slow responses)
2. Switch to alternative source immediately
3. Don't retry same endpoint repeatedly
4. Note in report: "Rate limit encountered, used [alternative]"
```
## Network Timeouts
### Symptoms
- Request hangs indefinitely
- Connection timeout error
- No response received
- Partial content received
### Solutions
**1. Set explicit timeouts:**
```
WebSearch: 30 seconds max
WebFetch: 60 seconds max
Repository clone: 5 minutes max
Repomix processing: 10 minutes max
```
**2. Retry with timeout:**
```
1st attempt: 60 seconds
2nd attempt: 90 seconds (if needed)
3rd attempt: Switch to alternative method
```
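The escalation ladder can be wrapped as a helper around coreutils `timeout` (60 seconds, then 90, then give up so the caller can switch methods):

```shell
# Try a command under escalating time limits, then give up so the
# caller can fall back to an alternative method.
with_escalating_timeouts() {
  for secs in 60 90; do
    if timeout "$secs" "$@"; then
      return 0
    fi
  done
  return 1
}

# with_escalating_timeouts curl -fsSL "$url" || echo "switch method"
```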
**3. Check network connectivity:**
```bash
# Test basic connectivity
ping -c 3 8.8.8.8
# Test DNS resolution
nslookup docs.example.com
# Test specific host
curl -I https://docs.example.com
```
**4. Use alternative endpoints:**
```
If main site times out:
- Try CDN version
- Try regional mirror
- Try cached version (Google Cache, Archive.org)
```
**5. Fall back gracefully:**
```
Main docs timeout → Repository → Package registry → Research
```
## Incomplete Documentation
### Symptoms
- Documentation stub pages
- "Coming soon" sections
- Broken links (404)
- Missing API reference
- Outdated examples
### Handling Strategy
**1. Identify gaps:**
```markdown
## Documentation Status
✅ Available:
- Installation guide
- Basic usage examples
⚠️ Incomplete:
- Advanced features (stub page)
- API reference (404 links)
❌ Missing:
- Migration guide
- Performance optimization
```
**2. Supplement from repository:**
```
- Check /examples directory
- Read test files for usage
- Analyze TypeScript definitions
- Check CHANGELOG for features
```
**3. Use community sources:**
```
- Recent Stack Overflow answers
- GitHub discussions
- Blog posts from maintainers
- Video tutorials
```
**4. Note limitations clearly:**
```markdown
⚠️ **Documentation Limitations**
Official docs are incomplete (as of [date]).
The following information is inferred from:
- Repository examples
- TypeScript definitions
- Community discussions
May not reflect official recommendations.
```
## Authentication/Access Issues
### Symptoms
- Private repository
- Login required
- Organization-only access
- Documentation behind paywall
### Solutions
**1. For private repositories:**
```
- Note: "Repository is private"
- Check for public mirror
- Look for public documentation site
- Search package registry for info
```
**2. For paywalled docs:**
```
- Check for free tier/trial
- Look for open-source alternative
- Search for community mirrors
- Use package registry info instead
```
**3. Document access limitation:**
```markdown
## ⚠️ Access Limitation
The official repository is private. This report is based on:
- Public documentation site: [url]
- Package registry info: [url]
- Community resources: [urls]
May not include internal implementation details.
```
## Error Handling Best Practices
### General Principles
1. **Fail fast**: Don't retry same method repeatedly
2. **Fall back**: Have alternative strategies ready
3. **Document**: Note what failed and why
4. **Inform user**: Clear about limitations
5. **Partial success**: Deliver what you can find
### Error Reporting Template
```markdown
## ⚠️ Discovery Issues Encountered
**Primary method**: [method] - [reason for failure]
**Fallback used**: [alternative method]
**Information completeness**: [percentage or description]
**What was found**:
- [list available information]
**What is missing**:
- [list gaps]
**Recommended action**:
- [how user can get missing info]
```
### Recovery Decision Tree
```
Error encountered
Is there an obvious alternative?
YES → Try alternative immediately
NO → Continue below
Have we tried all primary methods?
NO → Try next method in sequence
YES → Continue below
Is partial information useful?
YES → Deliver partial results with notes
NO → Inform user, request guidance
```

# Limitations & Success Criteria
Understanding boundaries and measuring effectiveness of documentation discovery.
## context7.com Limitations
### Not All Libraries Indexed
**Limitation:**
- context7.com doesn't index every repository/website
- Very new libraries may not be available yet
- Private repositories not accessible
- Some niche libraries missing
**Impact:**
- Need fallback to official llms.txt or repository analysis
- May add 10-20 seconds to discovery time
- Requires manual search for obscure libraries
**Workarounds:**
```
1. Try context7.com first (always)
2. If 404, fall back to official llms.txt search
3. If still not found, use repository analysis
4. Note in report which method was used
```
**Example:**
```
Tried: https://context7.com/org/new-lib/llms.txt → 404
Fallback: WebSearch for "new-lib llms.txt" → Found
Used: Official llms.txt from website
```
### Topic Filtering Accuracy
**Limitation:**
- ?topic= parameter relies on keyword matching
- May miss relevant content with different terminology
- May include tangentially related content
- Quality depends on context7 indexing
**Impact:**
- Occasionally need to review base llms.txt without topic filter
- May miss some relevant documentation
**Workarounds:**
```
- Try multiple topic keywords
- Fall back to full llms.txt if topic search insufficient
- Use broader terms for better coverage
```
## Cannot Handle
### Password-Protected Documentation
**Limitation:**
- No access to authentication-required content
- Cannot log in to platforms
- No credential management
- Cannot access organization-internal docs
**Impact:**
- Enterprise documentation inaccessible
- Premium content unavailable
- Private beta docs unreachable
- Internal wikis not readable
**Workarounds:**
```
- Ask user for public alternatives
- Search for public subset of docs
- Use publicly available README/marketing
- Check if trial/demo access available
- Note limitation in report
```
**Report template:**
```markdown
⚠️ **Access Limitation**
Documentation requires authentication.
**What we can access**:
- Public README: [url]
- Package registry info: [url]
- Marketing site: [url]
**Cannot access**:
- Full documentation (requires login)
- Internal guides
- Premium content
**Recommendation**: Contact vendor for access or check if public docs available.
```
### Rate-Limited APIs
**Limitation:**
- No API credentials for authenticated access
- Subject to anonymous rate limits
- Cannot request increased limits
- No retry with authentication
**Impact:**
- Limited requests per hour (e.g., GitHub: 60/hour anonymous)
- May hit limits during comprehensive search
- Slower fallback required
- Incomplete coverage possible
**Workarounds:**
```
- Add delays between requests
- Use alternative sources (cached, mirrors)
- Prioritize critical pages
- Use Researcher agents instead of API
- Switch to repository analysis
```
**Detection:**
```
Symptoms:
- 429 Too Many Requests
- X-RateLimit-Remaining: 0
- Slow or refused connections
Response:
- Immediately switch to alternative method
- Don't retry same endpoint
- Note in report which method used
```
### Real-Time Documentation
**Limitation:**
- Uses snapshot at time of access
- Cannot monitor for updates
- No real-time synchronization
- May miss very recent changes
**Impact:**
- Documentation updated minutes ago may not be reflected
- Breaking changes announced today might be missed
- Latest release notes may not be current
- Version just released may not be documented
**Workarounds:**
```
- Note access date in report
- Recommend user verify if critical
- Check last-modified headers
- Compare with release dates
- Suggest official site for latest
```
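Checking the `Last-Modified` header is cheap: a HEAD request avoids downloading the page at all. A sketch (helper name is illustrative):

```shell
# Report when a documentation page was last modified, via a HEAD request
last_modified() {
  curl -sI "$1" | grep -i '^last-modified:' | cut -d' ' -f2- | tr -d '\r'
}
```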
**Report template:**
```markdown
**Snapshot Information**
Documentation retrieved: 2025-10-26 14:30 UTC
**Last-Modified** (if available):
- Main docs: 2025-10-24
- API reference: 2025-10-22
**Note**: For real-time updates, check official site: [url]
```
### Interactive Documentation
**Limitation:**
- Cannot run interactive examples
- Cannot execute code playgrounds
- No ability to test API calls
- Cannot verify functionality
**Impact:**
- Cannot confirm examples work
- Cannot test edge cases
- Cannot validate API responses
- Cannot verify performance claims
**Workarounds:**
```
- Provide code examples as-is
- Note: "Example provided, not tested"
- Recommend user run examples
- Link to interactive playground if available
- Include caveats about untested code
```
**Report template:**
````markdown
## Example Usage
```python
# Example from official docs (not tested)
import library
result = library.do_thing()
```
⚠️ **Note**: Example provided from documentation but not executed.
Please test in your environment.
**Interactive playground**: [url if available]
````
### Video-Only Documentation
**Limitation:**
- Cannot process video content directly
- Limited transcript access
- Cannot extract code from video
- Cannot parse visual diagrams
**Impact:**
- Video tutorials not usable
- YouTube courses inaccessible
- Screencasts not processable
- Visual walkthroughs unavailable
**Workarounds:**
```
- Search for transcript if available
- Look for accompanying blog post
- Find text-based alternative
- Check for community notes
- Use automated captions if available (low quality)
```
**Report template:**
```markdown
**Video Content Detected**
Primary documentation is video-based: [url]
**Alternatives found**:
- Blog post summary: [url]
- Community notes: [url]
**Cannot extract**:
- Detailed walkthrough from video
- Visual examples
- Demonstration steps
**Recommendation**: Watch video directly for visual content.
```
## May Struggle With
### Very Large Repositories (>1GB)
**Challenge:**
- Repomix may fail or hang
- Clone takes very long time
- Processing exceeds memory limits
- Output file too large to read
**Success rate:** ~30% for >1GB repos
**Mitigation:**
```
1. Try shallow clone: git clone --depth 1
2. Focus on docs only: repomix --include "docs/**"
3. Exclude binaries: --exclude "*.png,*.jpg,dist/**"
4. If fails: Use Explorer agents on specific files
5. Note limitation in report
```
**When to skip:**
```
Repository size indicators:
- Git clone shows >1GB download
- Contains large binaries (ml models, datasets)
- Has extensive history (>10k commits)
- Many multimedia files
→ Skip Repomix, use targeted exploration
```
### Documentation in Images/PDFs
**Challenge:**
- Cannot reliably extract text from images
- PDF parsing limited
- Formatting often lost
- Code snippets may be corrupted
**Success rate:** ~50% quality for PDFs, ~10% for images
**Mitigation:**
```
1. Search for text alternative
2. Try OCR if critical (low quality)
3. Provide image URL instead
4. Note content not extractable
5. Recommend manual review
```
**Report template:**
```markdown
⚠️ **Image-Based Documentation**
Primary documentation in PDF/images: [url]
**Extraction quality**: Limited
**Recommendation**: Download and review manually
**Text alternatives found**:
- [any alternatives]
```
### Non-English Documentation
**Challenge:**
- No automatic translation
- May miss context/nuance
- Technical terms may not translate well
- Examples may be language-specific
**Success rate:** Variable (depends on user needs)
**Mitigation:**
```
1. Note language in report
2. Offer key section translation if user requests
3. Search for English version
4. Check if bilingual docs exist
5. Provide original with language note
```
**Report template:**
```markdown
**Language Notice**
Primary documentation in: Japanese
**English availability**:
- Partial translation: [url]
- Community translation: [url]
- No official English version found
**Recommendation**: Use translation tool or request community help.
```
### Scattered Documentation
**Challenge:**
- Multiple sites/repositories
- Inconsistent structure
- Conflicting information
- No central source
**Success rate:** ~60% coverage
**Mitigation:**
```
1. Use Researcher agents
2. Prioritize official sources
3. Cross-reference findings
4. Note conflicts clearly
5. Take longer but be thorough
```
**Report template:**
```markdown
**Fragmented Documentation**
Information found across multiple sources:
**Official** (incomplete):
- Website: [url]
- Package registry: [url]
**Community** (supplementary):
- Stack Overflow: [url]
- Tutorial: [url]
**Note**: No centralized documentation. Information aggregated from
multiple sources. Conflicts resolved by prioritizing official sources.
```
### Deprecated/Legacy Libraries
**Challenge:**
- Documentation removed or archived
- Only old versions available
- Outdated information
- No current maintenance
**Success rate:** ~40% for fully deprecated libraries
**Mitigation:**
```
1. Use Internet Archive (Wayback Machine)
2. Search GitHub repository history
3. Check package registry for old README
4. Look for fork with docs
5. Note legacy status clearly
```
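The Wayback Machine has a simple availability API that returns the closest archived snapshot. This sketch pulls the snapshot URL out of the JSON with grep (a JSON parser like `jq` would be more robust; the helper name is illustrative):

```shell
# Find the closest archived snapshot of a dead documentation URL
wayback_url() {
  curl -s "https://archive.org/wayback/available?url=$1" |
    grep -o '"url": *"[^"]*"' | head -1 | cut -d'"' -f4
}
```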
**Report template:**
```markdown
⚠️ **Legacy Library**
**Status**: Deprecated as of [date]
**Last update**: [date]
**Documentation sources**:
- Archived docs (via Wayback): [url]
- Repository (last commit [date]): [url]
**Recommendation**: Consider modern alternative: [suggestion]
**Migration path**: [if available]
```
## Success Criteria
### 1. Finds Relevant Information
**Measured by:**
- [ ] Answers user's specific question
- [ ] Covers requested topics
- [ ] Appropriate depth/detail
- [ ] Includes practical examples
- [ ] Links to additional resources
**Quality levels:**
**Excellent (100%):**
```
- All requested topics covered
- Examples for each major concept
- Clear, comprehensive information
- Official source, current version
- No gaps or limitations
```
**Good (80-99%):**
```
- Most requested topics covered
- Examples for core concepts
- Information mostly complete
- Official source, some gaps noted
- Minor limitations
```
**Acceptable (60-79%):**
```
- Core topics covered
- Some examples present
- Information somewhat complete
- Mix of official/community sources
- Some gaps noted
```
**Poor (<60%):**
```
- Only partial coverage
- Few or no examples
- Significant gaps
- Mostly unofficial sources
- Many limitations
```
### 2. Uses Most Efficient Method
**Measured by:**
- [ ] Started with llms.txt
- [ ] Used parallel agents appropriately
- [ ] Avoided unnecessary operations
- [ ] Completed in reasonable time
- [ ] Fell back efficiently when needed
**Efficiency score:**
**Optimal:**
```
- Found llms.txt immediately
- Parallel agents for all URLs
- Single batch processing
- Completed in <2 minutes
- No wasted operations
```
**Good:**
```
- Found llms.txt after 1-2 tries
- Mostly parallel processing
- Minimal sequential operations
- Completed in <5 minutes
- One minor inefficiency
```
**Acceptable:**
```
- Fell back to repository after llms.txt search
- Mix of parallel and sequential
- Some redundant operations
- Completed in <10 minutes
- A few inefficiencies
```
**Poor:**
```
- Didn't try llms.txt first
- Mostly sequential processing
- Many redundant operations
- Took >10 minutes
- Multiple inefficiencies
```
### 3. Completes in Reasonable Time
**Target times:**
| Scenario | Excellent | Good | Acceptable | Poor |
|----------|-----------|------|------------|------|
| Simple (1-5 URLs) | <1 min | 1-2 min | 2-5 min | >5 min |
| Medium (6-15 URLs) | <2 min | 2-4 min | 4-7 min | >7 min |
| Complex (16+ URLs) | <3 min | 3-6 min | 6-10 min | >10 min |
| Repository | <3 min | 3-6 min | 6-10 min | >10 min |
| Research | <5 min | 5-8 min | 8-12 min | >12 min |
**Factors affecting time:**
- Documentation structure (well-organized vs scattered)
- Source availability (llms.txt vs research)
- Content volume (few pages vs many)
- Network conditions (fast vs slow)
- Complexity (simple vs comprehensive)
### 4. Provides Clear Source Attribution
**Measured by:**
- [ ] Lists all sources used
- [ ] Notes method employed
- [ ] Includes URLs/references
- [ ] Identifies official vs community
- [ ] Credits authors when relevant
**Quality template:**
**Excellent:**
```markdown
## Sources
**Primary method**: llms.txt
**URL**: https://docs.library.com/llms.txt
**Documentation retrieved**:
1. Getting Started (official): [url]
2. API Reference (official): [url]
3. Examples (official): [url]
**Additional sources**:
- Repository: https://github.com/org/library
- Package registry: https://npmjs.com/package/library
**Method**: Parallel exploration with 3 agents
**Date**: 2025-10-26 14:30 UTC
```
### 5. Identifies Version/Date
**Measured by:**
- [ ] Documentation version noted
- [ ] Last-updated date included
- [ ] Matches user's version requirement
- [ ] Flags if version mismatch
- [ ] Notes if version unclear
**Best practice:**
```markdown
## Version Information
**Documentation version**: v3.2.1
**Last updated**: 2025-10-20
**Retrieved**: 2025-10-26
**User requested**: v3.x ✓ Match
**Note**: This is the latest stable version as of retrieval date.
```
### 6. Notes Limitations/Gaps
**Measured by:**
- [ ] Missing information identified
- [ ] Incomplete sections noted
- [ ] Known issues mentioned
- [ ] Alternatives suggested
- [ ] Workarounds provided
**Good practice:**
```markdown
## ⚠️ Limitations
**Incomplete documentation**:
- Advanced features section (stub page)
- Migration guide (404 error)
**Not available**:
- Video tutorials mentioned but not accessible
- Interactive examples require login
**Workarounds**:
- Advanced features: See examples in repository
- Migration: Check CHANGELOG.md for breaking changes
**Alternatives**:
- Community tutorial: [url]
- Stack Overflow: [url]
```
### 7. Well-Organized Output
**Measured by:**
- [ ] Clear structure
- [ ] Logical flow
- [ ] Easy to scan
- [ ] Actionable information
- [ ] Proper formatting
**Structure template:**
```markdown
# Documentation for [Library] [Version]
## Overview
[Brief description]
## Source
[Attribution]
## Installation
[Step-by-step]
## Quick Start
[Minimal example]
## Core Concepts
[Main ideas]
## API Reference
[Key methods]
## Examples
[Practical usage]
## Additional Resources
[Links]
## Limitations
[Any gaps]
```
## Quality Checklist
### Before Presenting Report
**Content quality:**
- [ ] Information is accurate
- [ ] Sources are official (or noted as unofficial)
- [ ] Version matches request
- [ ] Examples are clear
- [ ] No obvious errors
**Completeness:**
- [ ] All key topics covered
- [ ] Installation instructions present
- [ ] Usage examples included
- [ ] Configuration documented
- [ ] Troubleshooting information available
**Organization:**
- [ ] Logical flow
- [ ] Clear headings
- [ ] Proper code formatting
- [ ] Links working (spot check)
- [ ] Easy to scan
**Attribution:**
- [ ] Sources listed
- [ ] Method documented
- [ ] Version identified
- [ ] Date noted
- [ ] Limitations disclosed
**Usability:**
- [ ] Actionable information
- [ ] Copy-paste ready examples
- [ ] Next steps clear
- [ ] Resources for deep dive
- [ ] Known issues noted
## Performance Metrics
### Time-to-Value
**Measures:** How quickly user gets useful information
**Targets:**
```
First useful info: <30 seconds
Core coverage: <2 minutes
Complete report: <5 minutes
```
**Tracking:**
```
Start → llms.txt found → First URL fetched → Critical info extracted
15s 30s 45s 60s
User can act on info at 60s mark
(even if full report takes 5 minutes)
```
### Coverage Completeness
**Measures:** Percentage of user needs met
**Calculation:**
```
User needs 5 topics:
- Installation ✓
- Basic usage ✓
- API reference ✓
- Configuration ✓
- Examples ✗ (not found)
Coverage: 4/5 = 80%
```
**Targets:**
```
Excellent: 90-100%
Good: 75-89%
Acceptable: 60-74%
Poor: <60%
```
### Source Quality
**Measures:** Reliability of sources used
**Scoring:**
```
Official docs: 100 points
Official repository: 80 points
Package registry: 60 points
Recent community (verified): 40 points
Old community (unverified): 20 points
```
**Target:** Average >70 points
### User Satisfaction Indicators
**Positive signals:**
```
- User proceeds immediately with info
- No follow-up questions needed
- User says "perfect" or "thanks"
- User marks conversation complete
```
**Negative signals:**
```
- User asks same question differently
- User requests more details
- User says "incomplete" or "not what I needed"
- User abandons task
```
## Continuous Improvement
### Learn from Failures
**When documentation discovery struggles:**
1. **Document the issue**
```
- What was attempted
- What failed
- Why it failed
- How it was resolved (if at all)
```
2. **Identify patterns**
```
- Is this library-specific?
- Is this ecosystem-specific?
- Is this a general limitation?
```
3. **Update strategies**
```
- Add workaround to playbook
- Update fallback sequence
- Note limitation in documentation
```
### Measure and Optimize
**Track these metrics:**
```
- Average time to complete
- Coverage percentage
- Source quality score
- User satisfaction
- Failure rate by method
```
**Optimize based on data:**
```
- Which method succeeds most often?
- Which ecosystems need special handling?
- Where are time bottlenecks?
- What causes most failures?
```

# Performance Optimization
Strategies and techniques for maximizing speed and efficiency in documentation discovery.
## Core Principles
### 0. Use context7.com for Instant llms.txt Access
**Fastest Approach:**
Direct URL construction instead of searching:
```
Traditional: WebSearch (15-30s) → WebFetch (5-10s) = 20-40s
context7.com: Direct WebFetch (5-10s) = 5-10s
Speed improvement: 2-4x faster
```
**Benefits:**
- No search required (instant URL construction)
- Consistent URL patterns
- Reliable availability
- Topic filtering for targeted results
**Examples:**
```
GitHub repo:
https://context7.com/vercel/next.js/llms.txt
→ Instant, no search needed
Website:
https://context7.com/websites/imgix/llms.txt
→ Instant, no search needed
Topic-specific:
https://context7.com/shadcn-ui/ui/llms.txt?topic=date
→ Filtered results, even faster
```
**Performance Impact:**
```
Without context7.com:
1. WebSearch for llms.txt: 15s
2. WebFetch llms.txt: 5s
3. Launch agents: 5s
Total: 25s
With context7.com:
1. Direct WebFetch: 5s
2. Launch agents: 5s
Total: 10s (2.5x faster!)
With context7.com + topic:
1. Direct WebFetch (filtered): 3s
2. Process focused results: 2s
Total: 5s (5x faster!)
```
### 1. Minimize Sequential Operations
**The Problem:**
Sequential operations add up linearly:
```
Total Time = Op1 + Op2 + Op3 + ... + OpN
```
Example:
```
Fetch URL 1: 5 seconds
Fetch URL 2: 5 seconds
Fetch URL 3: 5 seconds
Total: 15 seconds
```
**The Solution:**
Parallel operations complete in max time of slowest:
```
Total Time = max(Op1, Op2, Op3, ..., OpN)
```
Example:
```
Launch 3 agents simultaneously
All complete in: ~5 seconds
Total: 5 seconds (3x faster!)
```
### 2. Batch Related Operations
**Benefits:**
- Fewer context switches
- Better resource utilization
- Easier to track
- More efficient aggregation
**Grouping Strategies:**
**By topic:**
```
Agent 1: All authentication-related docs
Agent 2: All database-related docs
Agent 3: All API-related docs
```
**By content type:**
```
Agent 1: All tutorials
Agent 2: All reference docs
Agent 3: All examples
```
**By priority:**
```
Phase 1 (critical): Getting started, installation, core concepts
Phase 2 (important): Guides, API reference, configuration
Phase 3 (optional): Advanced topics, internals, optimization
```
### 3. Smart Caching
**What to cache:**
- Repomix output (expensive to generate)
- llms.txt content (static)
- Repository structure (rarely changes)
- Documentation URLs (reference list)
**When to refresh:**
- User requests specific version
- Documentation updated (check last-modified)
- Cache older than session
- User explicitly requests fresh data
### 4. Early Termination
**When to stop:**
```
✓ User's core needs met
✓ Critical information found
✓ Time limit approaching
✓ Diminishing returns (90% coverage achieved)
```
**How to decide:**
```
After Phase 1 (critical docs):
- Review what was found
- Check against user request
- If 80%+ covered → deliver now
- Offer to fetch more if needed
```
## Performance Patterns
### Pattern 1: Parallel Exploration
**Scenario:** llms.txt contains 10 URLs
**Slow approach (sequential):**
```
Time: 10 URLs × 5 seconds = 50 seconds
Step 1: Fetch URL 1 (5s)
Step 2: Fetch URL 2 (5s)
Step 3: Fetch URL 3 (5s)
...
Step 10: Fetch URL 10 (5s)
```
**Fast approach (parallel):**
```
Time: ~5-10 seconds total
Step 1: Launch 5 Explorer agents (simultaneous)
Agent 1: URLs 1-2
Agent 2: URLs 3-4
Agent 3: URLs 5-6
Agent 4: URLs 7-8
Agent 5: URLs 9-10
Step 2: Wait for all (max time: ~5-10s)
Step 3: Aggregate results
```
**Speedup:** 5-10x faster
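Outside the agent model, the same pattern is expressible in plain shell with background jobs and `wait`: wall time approximates the slowest single fetch rather than the sum. A sketch (helper name and URLs are illustrative):

```shell
# Fetch every URL concurrently; wall time ≈ the slowest fetch, not the sum
fetch_all() {
  for u in "$@"; do
    curl -fsS --max-time 60 "$u" -o "$(basename "$u")" &
  done
  wait   # block until every background fetch has finished
}

# Usage: fetch_all https://docs.example.com/install.html https://docs.example.com/api.html
```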
### Pattern 2: Lazy Loading
**Scenario:** Documentation has 30+ pages
**Slow approach (fetch everything):**
```
Time: 30 URLs × 5 seconds ÷ 5 agents = 30 seconds
Fetch all 30 pages upfront
User only needs 5 of them
Wasted: 25 pages × 5 seconds ÷ 5 = 25 seconds
```
**Fast approach (priority loading):**
```
Time: 10 URLs × 5 seconds ÷ 5 agents = 10 seconds
Phase 1: Fetch critical 10 pages
Review: Does this cover user's needs?
If yes: Stop here (saved 20 seconds)
If no: Fetch additional as needed
```
**Speedup:** Up to 3x faster for typical use cases
### Pattern 3: Smart Fallbacks
**Scenario:** llms.txt not found
**Slow approach (exhaustive search):**
```
Time: ~5 minutes
Try: docs.library.com/llms.txt (30s timeout)
Try: library.dev/llms.txt (30s timeout)
Try: library.io/llms.txt (30s timeout)
Try: library.org/llms.txt (30s timeout)
Try: www.library.com/llms.txt (30s timeout)
Then: Fall back to repository
```
**Fast approach (quick fallback):**
```
Time: ~1 minute
Try: docs.library.com/llms.txt (15s)
Try: library.dev/llms.txt (15s)
Not found → Immediately try repository (30s)
```
**Speedup:** 5x faster
### Pattern 4: Incremental Results
**Scenario:** Large documentation set
**Slow approach (all-or-nothing):**
```
Time: 5 minutes until first result
Fetch all documentation
Aggregate everything
Present complete report
User waits 5 minutes
```
**Fast approach (streaming):**
```
Time: 30 seconds to first result
Phase 1: Fetch critical docs (30s)
Present: Initial findings
Phase 2: Fetch important docs (60s)
Update: Additional findings
Phase 3: Fetch supplementary (90s)
Final: Complete report
```
**Benefit:** User gets value immediately, can stop early if satisfied
## Optimization Techniques
### Technique 1: Workload Balancing
**Problem:** Uneven distribution causes bottlenecks
```
Bad distribution:
Agent 1: 1 URL (small) → finishes in 5s
Agent 2: 10 URLs (large) → finishes in 50s
Total: 50s (bottlenecked by Agent 2)
```
**Solution:** Balance by estimated size
```
Good distribution:
Agent 1: 3 URLs (medium pages) → ~15s
Agent 2: 3 URLs (medium pages) → ~15s
Agent 3: 3 URLs (medium pages) → ~15s
Agent 4: 1 URL (large page) → ~15s
Total: ~15s (balanced)
```
### Technique 2: Request Coalescing
**Problem:** Redundant requests slow things down
```
Bad:
Agent 1: Fetch README.md
Agent 2: Fetch README.md (duplicate!)
Agent 3: Fetch README.md (duplicate!)
Wasted: 2 redundant fetches
```
**Solution:** Deduplicate before fetching
```
Good:
Pre-processing: Identify unique URLs
Agent 1: Fetch README.md (once)
Agent 2: Fetch INSTALL.md
Agent 3: Fetch API.md
Share: README.md content across agents if needed
```
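Deduplication is a one-liner applied before distributing URLs to agents (the helper name is illustrative):

```shell
# Collapse a URL list to unique entries before assigning agents
dedupe_urls() {
  printf '%s\n' "$@" | sort -u
}
```

Passing README.md twice yields it once, so no agent fetches it redundantly.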
### Technique 3: Timeout Tuning
**Problem:** Default timeouts too conservative
```
Slow:
WebFetch timeout: 120s (too long for fast sites)
If site is down: Wait 120s before failing
```
**Solution:** Adaptive timeouts
```
Fast:
Known fast sites (official docs): 30s timeout
Unknown sites: 60s timeout
Large repos: 120s timeout
If timeout hit: Immediately try alternative
```
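Wired into a fetch helper, the adaptive rule becomes: short timeout against the primary source, then an immediate switch to an alternative with a more generous timeout. Hosts and helper name are illustrative:

```shell
# Short timeout on the primary source, longer one on the fallback
fetch_adaptive() {
  primary="$1"
  fallback="$2"
  curl -fsS --max-time 30 "$primary" ||
    curl -fsS --max-time 60 "$fallback"
}
```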
### Technique 4: Selective Fetching
**Problem:** Fetching irrelevant content
```
Wasteful:
Fetch: Installation guide ✓ (needed)
Fetch: API reference ✓ (needed)
Fetch: Internal architecture ✗ (not needed for basic usage)
Fetch: Contributing guide ✗ (not needed)
Fetch: Changelog ✗ (not needed)
```
**Solution:** Filter by user needs
```
Efficient:
User need: "How to get started"
Fetch only: Installation, basic usage, examples
Skip: Advanced topics, internals, contribution
Speedup: 50% less fetching
```
## Performance Benchmarks
### Target Times
| Scenario | Target Time | Acceptable | Too Slow |
|----------|-------------|------------|----------|
| Single URL | <10s | 10-20s | >20s |
| llms.txt (5 URLs) | <30s | 30-60s | >60s |
| llms.txt (15 URLs) | <60s | 60-120s | >120s |
| Repository analysis | <2min | 2-5min | >5min |
| Research fallback | <3min | 3-7min | >7min |
### Real-World Examples
**Fast case (Next.js with llms.txt):**
```
00:00 - Start
00:05 - Found llms.txt
00:10 - Fetched content (12 URLs)
00:15 - Launched 4 agents
00:45 - All agents complete
00:55 - Report ready
Total: 55 seconds ✓
```
**Medium case (Repository without llms.txt):**
```
00:00 - Start
00:15 - llms.txt not found
00:20 - Found repository
00:30 - Cloned repository
02:00 - Repomix complete
02:30 - Analyzed output
02:45 - Report ready
Total: 2m 45s ✓
```
**Slow case (Scattered documentation):**
```
00:00 - Start
00:30 - llms.txt not found
00:45 - Repository not found
01:00 - Launched 4 Researcher agents
05:00 - All research complete
06:00 - Aggregated findings
06:30 - Report ready
Total: 6m 30s (acceptable for research)
```
## Common Performance Issues
### Issue 1: Too Many Agents
**Symptom:** Slower than sequential
```
Problem:
Launched 15 agents for 15 URLs
Overhead: Agent initialization, coordination
Result: Slower than 5 agents with 3 URLs each
```
**Solution:**
```
Max 7 agents per batch
Group URLs sensibly
Use phases for large sets
```
### Issue 2: Blocking Operations
**Symptom:** Agents waiting unnecessarily
```
Problem:
Agent 1: Fetch URL, wait for Agent 2
Agent 2: Fetch URL, wait for Agent 3
Agent 3: Fetch URL
Result: Sequential instead of parallel
```
**Solution:**
```
Launch all agents independently
No dependencies between agents
Aggregate after all complete
```
### Issue 3: Redundant Fetching
**Symptom:** Same content fetched multiple times
```
Problem:
Phase 1: Fetch installation guide
Phase 2: Fetch installation guide again
Result: Wasted time
```
**Solution:**
```
Cache fetched content
Check cache before fetching
Reuse within session
```
### Issue 4: Late Bailout
**Symptom:** Continuing when should stop
```
Problem:
Found 90% of needed info after 1 minute
Spent 4 more minutes on remaining 10%
Result: 5x time for marginal gain
```
**Solution:**
```
Check progress after critical phase
If 80%+ covered → offer to stop
Only continue if user wants comprehensive
```
## Performance Monitoring
### Key Metrics
**Track these times:**
```
- llms.txt discovery: Target <30s
- Repository clone: Target <60s
- Repomix processing: Target <2min
- Agent exploration: Target <60s
- Total time: Target <3min for typical case
```
### Performance Report Template
```markdown
## Performance Summary
**Total time**: 1m 25s
**Method**: llms.txt + parallel exploration
**Breakdown**:
- Discovery: 15s (llms.txt search & fetch)
- Exploration: 50s (4 agents, 12 URLs)
- Aggregation: 20s (synthesis & formatting)
**Efficiency**: 12 URLs explored in 50s via 4 parallel agents
(sequential fetching at ~5s per URL would take 60s for the fetches alone, before any per-page processing)
```
### When to Optimize Further
Optimize if:
- [ ] Total time >2x target
- [ ] User explicitly requests "fast"
- [ ] Repeated similar queries (cache benefit)
- [ ] Large documentation set (>20 URLs)
Don't over-optimize if:
- [ ] Already meeting targets
- [ ] One-time query
- [ ] User values completeness over speed
- [ ] Research requires thoroughness
## Quick Optimization Checklist
### Before Starting
- [ ] Check if content already cached
- [ ] Identify fastest method for this case
- [ ] Plan for parallel execution
- [ ] Set appropriate timeouts
### During Execution
- [ ] Launch agents in parallel (not sequential)
- [ ] Use single message for multiple agents
- [ ] Monitor for bottlenecks
- [ ] Be ready to terminate early
### After First Phase
- [ ] Assess coverage achieved
- [ ] Determine if user needs met
- [ ] Decide: continue or deliver now
- [ ] Cache results for potential reuse
### Optimization Decision Tree
```
Need documentation?
Check cache
HIT → Use cached (0s) ✓
MISS → Continue
llms.txt available?
YES → Parallel agents (30-60s) ✓
NO → Continue
Repository available?
YES → Repomix (2-5min)
NO → Research (3-7min)
After Phase 1:
80%+ coverage?
YES → Deliver now (save time) ✓
NO → Continue to Phase 2
```

# Tool Selection Guide
Complete reference for choosing and using the right tools for documentation discovery.
## context7.com (PRIORITY)
**Use FIRST for all llms.txt lookups**
**Patterns:**
GitHub repositories:
```
Pattern: https://context7.com/{org}/{repo}/llms.txt
Examples:
- https://github.com/vercel/next.js → https://context7.com/vercel/next.js/llms.txt
- https://github.com/shadcn-ui/ui → https://context7.com/shadcn-ui/ui/llms.txt
```
Websites:
```
Pattern: https://context7.com/websites/{normalized-path}/llms.txt
Examples:
- https://docs.imgix.com/ → https://context7.com/websites/imgix/llms.txt
- https://ffmpeg.org/doxygen/8.0/ → https://context7.com/websites/ffmpeg_doxygen_8_0/llms.txt
```
Topic-specific searches:
```
Pattern: https://context7.com/{path}/llms.txt?topic={query}
Examples:
- https://context7.com/shadcn-ui/ui/llms.txt?topic=date
- https://context7.com/vercel/next.js/llms.txt?topic=cache
- https://context7.com/websites/ffmpeg_doxygen_8_0/llms.txt?topic=compress
```
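The three patterns above can be combined into one helper. This is a sketch: the website normalization rules (strip the scheme and any leading `www.`/`docs.`, drop the TLD, replace `/` and `.` with `_`) are inferred from the examples, not an official context7.com specification:

```shell
# Build a context7.com llms.txt URL from "org/repo" or a docs-site URL.
context7_url() {
  local source="$1" topic="$2" path host rest
  case "$source" in
    http*://*)
      path="${source#*://}"; path="${path%/}"     # strip scheme, trailing /
      path="${path#www.}";  path="${path#docs.}"  # drop common prefixes
      host="${path%%/*}"; rest="${path#"$host"}"
      host="${host%.*}"                           # imgix.com -> imgix
      path="${host}${rest}"
      path="${path//\//_}"; path="${path//./_}"   # doxygen/8.0 -> doxygen_8_0
      echo "https://context7.com/websites/${path}/llms.txt${topic:+?topic=$topic}"
      ;;
    *)
      echo "https://context7.com/${source}/llms.txt${topic:+?topic=$topic}"
      ;;
  esac
}

# context7_url vercel/next.js          -> .../vercel/next.js/llms.txt
# context7_url https://docs.imgix.com/ -> .../websites/imgix/llms.txt
```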
**Benefits:**
- Comprehensive aggregator of documentation
- Up-to-date content
- Topic filtering for targeted results
- Consistent format across libraries
- Reduces search time
**When to use:**
- ALWAYS try context7.com first before WebSearch
- Use topic parameter when user asks about specific feature
- Fall back to WebSearch only if context7.com returns 404
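A quick way to apply the 404-fallback rule is to probe the status code before committing to a full fetch. A minimal `curl` sketch (the 10-second timeout is an arbitrary choice):

```shell
# Check whether context7.com has an llms.txt before falling back to WebSearch.
probe_llms_txt() {
  local url="$1" status
  status=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url")
  if [ "$status" = "200" ]; then
    echo "hit: $url"
  else
    echo "miss ($status): fall back to WebSearch for llms.txt"
  fi
}

# probe_llms_txt "https://context7.com/vercel/next.js/llms.txt"
```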
## WebSearch
**Use when:**
- context7.com is unavailable or returns 404
- Finding GitHub repository URLs
- Locating official documentation sites
- Identifying package registries
- Searching for specific versions
**Best practices:**
- Try context7.com FIRST
- Include domain in query: `site:docs.example.com`
- Specify version when needed: `v2.0 llms.txt`
- Use official terms: "official repository" "documentation"
- Check multiple domains if first fails
**Example queries:**
```
Good: "Next.js llms.txt site:nextjs.org"
Good: "React v18 documentation site:react.dev"
Good: "Vue 3 official github repository"
Avoid: "how to use react" (too vague)
Avoid: "best react tutorial" (not official)
```
## WebFetch
**Use when:**
- Reading llms.txt content
- Accessing single documentation pages
- Retrieving specific URLs
- Checking documentation structure
- Verifying content availability
**Best practices:**
- Use specific prompt: "Extract all documentation URLs"
- Handle redirects properly
- Check for rate limiting
- Verify content is complete
- Note last-modified dates when available
**Limitations:**
- Single URL at a time (use Explorer for multiple)
- May timeout on very large pages
- Cannot handle dynamic content
- No JavaScript execution
## Task Tool with Explore Subagent
**Use when:**
- Multiple URLs to read (3+)
- Need parallel exploration
- Comprehensive documentation coverage
- Time-sensitive requests
- Large documentation sets
**Best practices:**
- Launch all agents in single message
- Distribute workload evenly
- Group related URLs per agent
- Maximum 7 agents per batch
- Provide clear extraction goals
**Example prompt:**
```
"Read the following URLs and extract:
1. Installation instructions
2. Core API methods
3. Configuration options
4. Common usage examples
URLs:
- [url1]
- [url2]
- [url3]"
```
## Task Tool with Researcher Subagent
**Use when:**
- No structured documentation found
- Need diverse information sources
- Community knowledge required
- Scattered documentation
- Comparative analysis needed
**Best practices:**
- Assign specific research areas per agent
- Request source verification
- Ask for date/version information
- Prioritize official sources
- Cross-reference findings
**Example prompt:**
```
"Research [library] focusing on:
1. Official installation methods
2. Common usage patterns
3. Known limitations or issues
4. Community best practices
Prioritize official sources and note version/date for all findings."
```
## Repomix
**Use when:**
- GitHub repository available
- Need complete codebase analysis
- Documentation scattered in repository
- Want to analyze code structure
- API documentation in code comments
**Installation:**
```bash
# Check if installed
which repomix
# Install globally if needed
npm install -g repomix
# Verify installation
repomix --version
```
**Usage:**
```bash
# Basic usage
git clone [repo-url] /tmp/docs-analysis
cd /tmp/docs-analysis
repomix --output repomix-output.xml
# Focus on specific directory
repomix --include "docs/**" --output docs-only.xml
# Exclude large files
repomix --exclude "*.png,*.jpg,*.pdf" --output repomix-output.xml
```
**When Repomix may fail:**
- Repository > 1GB (too large)
- Requires authentication (private repo)
- Slow network connection
- Limited disk space
- Binary-heavy repository
**Alternatives if Repomix fails:**
```bash
# Option 1: Focus on docs directory only
repomix --include "docs/**,README.md" --output docs.xml
# Option 2: Use Explorer agents to read specific files
# Launch agents to read key documentation files directly
# Option 3: Manual repository exploration
# Read README, then explore /docs directory structure
```
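The fallback chain above can be wrapped in one function. A sketch, assuming `repomix` exits non-zero on failure (its exact exit behavior may vary by version):

```shell
# Try a full pack first, then a docs-only pack, then give up gracefully.
pack_repo() {
  if repomix --output repomix-output.xml 2>/dev/null; then
    echo "full pack: repomix-output.xml"
  elif repomix --include "docs/**,README.md" --output docs.xml 2>/dev/null; then
    echo "docs-only pack: docs.xml"
  else
    echo "repomix failed: explore README and docs/ manually"
  fi
}
```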
## Tool Selection Decision Tree
```
Need documentation?
Try context7.com first
Know GitHub org/repo?
YES → https://context7.com/{org}/{repo}/llms.txt
NO → Continue
Know website URL?
YES → https://context7.com/websites/{normalized-path}/llms.txt
NO → Continue
Specific topic/feature?
YES → Add ?topic={query} parameter
NO → Use base llms.txt URL
context7.com found?
YES → Process llms.txt URLs (go to URL count check)
NO → Continue
Fallback: WebSearch for llms.txt
Single URL?
YES → WebFetch
NO → Continue
2-3 URLs?
YES → Single Explorer agent
NO → Continue
4+ URLs?
YES → Multiple Explorer agents (3-7)
NO → Continue
Need repository analysis?
YES → Repomix (if available)
NO → Continue
No structured docs?
YES → Researcher agents
```
## Quick Reference
| Tool | Best For | Speed | Coverage | Complexity |
|------|----------|-------|----------|------------|
| context7.com | llms.txt lookup | Instant | High | Low |
| context7.com?topic= | Targeted search | Instant | Focused | Low |
| WebSearch | Finding URLs | Fast | Narrow | Low |
| WebFetch | Single page | Fast | Single | Low |
| Explorer | Multiple URLs | Fast | Medium | Medium |
| Researcher | Scattered info | Slow | Wide | High |
| Repomix | Repository | Medium | Complete | Medium |