Files
2025-11-30 08:48:52 +08:00

208 lines
7.3 KiB
Markdown

---
name: docs-seeker
description: "Searching internet for technical documentation using llms.txt standard, GitHub repositories via Repomix, and parallel exploration. Use when user needs: (1) Latest documentation for libraries/frameworks, (2) Documentation in llms.txt format, (3) GitHub repository analysis, (4) Documentation without direct llms.txt support, (5) Multiple documentation sources in parallel"
version: 1.0.0
---
# Documentation Discovery & Analysis
## Overview
Intelligent discovery and analysis of technical documentation through multiple strategies:
1. **llms.txt-first**: Search for standardized AI-friendly documentation
2. **Repository analysis**: Use Repomix to analyze GitHub repositories
3. **Parallel exploration**: Deploy multiple Explorer agents for comprehensive coverage
4. **Fallback research**: Use Researcher agents when other methods unavailable
## Core Workflow
### Phase 1: Initial Discovery
1. **Identify target**
- Extract library/framework name from user request
- Note version requirements (default: latest)
- Clarify scope if ambiguous
- Identify if target is GitHub repository or website
2. **Search for llms.txt (PRIORITIZE context7.com)**
**First: Try context7.com patterns**
For GitHub repositories:
```
Pattern: https://context7.com/{org}/{repo}/llms.txt
Examples:
- https://github.com/imagick/imagick → https://context7.com/imagick/imagick/llms.txt
- https://github.com/vercel/next.js → https://context7.com/vercel/next.js/llms.txt
- https://github.com/better-auth/better-auth → https://context7.com/better-auth/better-auth/llms.txt
```
For websites:
```
Pattern: https://context7.com/websites/{normalized-domain-path}/llms.txt
Examples:
- https://docs.imgix.com/ → https://context7.com/websites/imgix/llms.txt
- https://docs.byteplus.com/en/docs/ModelArk/ → https://context7.com/websites/byteplus_en_modelark/llms.txt
- https://docs.haystack.deepset.ai/docs → https://context7.com/websites/haystack_deepset_ai/llms.txt
- https://ffmpeg.org/doxygen/8.0/ → https://context7.com/websites/ffmpeg_doxygen_8_0/llms.txt
```
**Topic-specific searches** (when user asks about specific feature):
```
Pattern: https://context7.com/{path}/llms.txt?topic={query}
Examples:
- https://context7.com/shadcn-ui/ui/llms.txt?topic=date
- https://context7.com/shadcn-ui/ui/llms.txt?topic=button
- https://context7.com/vercel/next.js/llms.txt?topic=cache
- https://context7.com/websites/ffmpeg_doxygen_8_0/llms.txt?topic=compress
```
**Fallback: Traditional llms.txt search**
```
WebSearch: "[library name] llms.txt site:[docs domain]"
```
Common patterns:
- `https://docs.[library].com/llms.txt`
- `https://[library].dev/llms.txt`
- `https://[library].io/llms.txt`
→ Found? Proceed to Phase 2
→ Not found? Proceed to Phase 3
### Phase 2: llms.txt Processing
**Single URL:**
- WebFetch to retrieve content
- Extract and present information
**Multiple URLs (3+):**
- **CRITICAL**: Launch multiple Explorer agents in parallel
- One agent per major documentation section (max 5 in first batch)
- Each agent reads assigned URLs
- Aggregate findings into consolidated report
Example:
```
Launch 3 Explorer agents simultaneously:
- Agent 1: getting-started.md, installation.md
- Agent 2: api-reference.md, core-concepts.md
- Agent 3: examples.md, best-practices.md
```
### Phase 3: Repository Analysis
**When llms.txt not found:**
1. Find GitHub repository via WebSearch
2. Use Repomix to pack repository:
```bash
npm install -g repomix # if needed
git clone [repo-url] /tmp/docs-analysis
cd /tmp/docs-analysis
repomix --output repomix-output.xml
```
3. Read repomix-output.xml and extract documentation
**Repomix benefits:**
- Entire repository in single AI-friendly file
- Preserves directory structure
- Optimized for AI consumption
### Phase 4: Fallback Research
**When no GitHub repository exists:**
- Launch multiple Researcher agents in parallel
- Focus areas: official docs, tutorials, API references, community guides
- Aggregate findings into consolidated report
## Agent Distribution Guidelines
- **1-3 URLs**: Single Explorer agent
- **4-10 URLs**: 3-5 Explorer agents (2-3 URLs each)
- **11+ URLs**: 5-7 Explorer agents (prioritize most relevant)
## Version Handling
**Latest (default):**
- Search without version specifier
- Use current documentation paths
**Specific version:**
- Include version in search: `[library] v[version] llms.txt`
- Check versioned paths: `/v[version]/llms.txt`
- For repositories: checkout specific tag/branch
## Output Format
```markdown
# Documentation for [Library] [Version]
## Source
- Method: [llms.txt / Repository / Research]
- URLs: [list of sources]
- Date accessed: [current date]
## Key Information
[Extracted relevant information organized by topic]
## Additional Resources
[Related links, examples, references]
## Notes
[Any limitations, missing information, or caveats]
```
## Quick Reference
**Tool selection:**
- WebSearch → Find llms.txt URLs, GitHub repositories
- WebFetch → Read single documentation pages
- Task (Explore) → Multiple URLs, parallel exploration
- Task (Researcher) → Scattered documentation, diverse sources
- Repomix → Complete codebase analysis
**Popular llms.txt locations (try context7.com first):**
- Astro: https://context7.com/withastro/astro/llms.txt
- Next.js: https://context7.com/vercel/next.js/llms.txt
- Remix: https://context7.com/remix-run/remix/llms.txt
- shadcn/ui: https://context7.com/shadcn-ui/ui/llms.txt
- Better Auth: https://context7.com/better-auth/better-auth/llms.txt
**Fallback to official sites if context7.com unavailable:**
- Astro: https://docs.astro.build/llms.txt
- Next.js: https://nextjs.org/llms.txt
- Remix: https://remix.run/llms.txt
- SvelteKit: https://kit.svelte.dev/llms.txt
## Error Handling
- **llms.txt not accessible** → Try alternative domains → Repository analysis
- **Repository not found** → Search official website → Use Researcher agents
- **Repomix fails** → Try /docs directory only → Manual exploration
- **Multiple conflicting sources** → Prioritize official → Note versions
## Key Principles
1. **Prioritize context7.com for llms.txt** — Most comprehensive and up-to-date aggregator
2. **Use topic parameters when applicable** — Enables targeted searches with ?topic=...
3. **Use parallel agents aggressively** — Faster results, better coverage
4. **Verify official sources as fallback** — Use when context7.com unavailable
5. **Report methodology** — Tell user which approach was used
6. **Handle versions explicitly** — Don't assume latest
## Detailed Documentation
For comprehensive guides, examples, and best practices:
**Workflows:**
- [WORKFLOWS.md](./WORKFLOWS.md) — Detailed workflow examples and strategies
**Reference guides:**
- [Tool Selection](./references/tool-selection.md) — Complete guide to choosing and using tools
- [Documentation Sources](./references/documentation-sources.md) — Common sources and patterns across ecosystems
- [Error Handling](./references/error-handling.md) — Troubleshooting and resolution strategies
- [Best Practices](./references/best-practices.md) — 8 essential principles for effective discovery
- [Performance](./references/performance.md) — Optimization techniques and benchmarks
- [Limitations](./references/limitations.md) — Boundaries and success criteria