# Performance Optimization
Strategies and techniques for maximizing speed and efficiency in documentation discovery.
## Core Principles
### 0. Use context7.com for Instant llms.txt Access
**Fastest Approach:**
Direct URL construction instead of searching:
```
Traditional: WebSearch (15-30s) → WebFetch (5-10s) = 20-40s
context7.com: Direct WebFetch (5-10s) = 5-10s
Speed improvement: 2-4x faster
```
**Benefits:**
- No search required (instant URL construction)
- Consistent URL patterns
- Reliable availability
- Topic filtering for targeted results
**Examples:**
```
GitHub repo:
https://context7.com/vercel/next.js/llms.txt
→ Instant, no search needed

Website:
https://context7.com/websites/imgix/llms.txt
→ Instant, no search needed

Topic-specific:
https://context7.com/shadcn-ui/ui/llms.txt?topic=date
→ Filtered results, even faster
```
**Performance Impact:**
```
Without context7.com:
1. WebSearch for llms.txt: 15s
2. WebFetch llms.txt: 5s
3. Launch agents: 5s
Total: 25s

With context7.com:
1. Direct WebFetch: 5s
2. Launch agents: 5s
Total: 10s (2.5x faster!)

With context7.com + topic:
1. Direct WebFetch (filtered): 3s
2. Process focused results: 2s
Total: 5s (5x faster!)
```
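Because the URL shapes are fixed, construction is mechanical. A minimal Python sketch of the patterns above (the helper itself is illustrative, not an existing client library):

```python
from urllib.parse import quote, urlencode

def context7_url(name: str, topic: str | None = None, website: bool = False) -> str:
    """Build a context7.com llms.txt URL directly instead of searching.

    name:    "vercel/next.js" for a GitHub repo, or "imgix" for a website.
    topic:   optional filter (e.g. "date"), appended as ?topic=... for
             narrower, faster results.
    website: use the /websites/ prefix instead of an org/repo path.
    """
    path = f"websites/{quote(name)}" if website else quote(name, safe="/")
    url = f"https://context7.com/{path}/llms.txt"
    if topic:
        url += "?" + urlencode({"topic": topic})
    return url

print(context7_url("vercel/next.js"))               # GitHub repo
print(context7_url("imgix", website=True))          # website
print(context7_url("shadcn-ui/ui", topic="date"))   # topic-filtered
```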
### 1. Minimize Sequential Operations
**The Problem:**
Sequential operations add up linearly:
```
Total Time = Op1 + Op2 + Op3 + ... + OpN
```
Example:
```
Fetch URL 1: 5 seconds
Fetch URL 2: 5 seconds
Fetch URL 3: 5 seconds
Total: 15 seconds
```
**The Solution:**
Parallel operations complete in max time of slowest:
```
Total Time = max(Op1, Op2, Op3, ..., OpN)
```
Example:
```
Launch 3 agents simultaneously
All complete in: ~5 seconds
Total: 5 seconds (3x faster!)
```
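In plain Python, the same idea looks like this, with a thread pool standing in for parallel Explorer agents (`fetch` is a stand-in for a single WebFetch; the URLs are placeholders):

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

def fetch(url: str) -> str:
    """Stand-in for one WebFetch: download a single page."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")

urls = ["https://example.com"] * 3   # placeholders for three distinct doc pages

start = time.perf_counter()
# Sequential version for comparison: total time is the SUM of all fetches.
#   pages = [fetch(u) for u in urls]
# Parallel version: total time is roughly the SLOWEST single fetch.
with ThreadPoolExecutor(max_workers=len(urls)) as pool:
    pages = list(pool.map(fetch, urls))
print(f"Fetched {len(pages)} pages in {time.perf_counter() - start:.1f}s")
```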
### 2. Batch Related Operations
**Benefits:**
- Fewer context switches
- Better resource utilization
- Easier to track
- More efficient aggregation
**Grouping Strategies:**
**By topic:**
```
Agent 1: All authentication-related docs
Agent 2: All database-related docs
Agent 3: All API-related docs
```
**By content type:**
```
Agent 1: All tutorials
Agent 2: All reference docs
Agent 3: All examples
```
**By priority:**
```
Phase 1 (critical): Getting started, installation, core concepts
Phase 2 (important): Guides, API reference, configuration
Phase 3 (optional): Advanced topics, internals, optimization
```
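Topic grouping can be as simple as keyword bucketing over the URL list. A sketch, with illustrative keyword sets (not a fixed taxonomy):

```python
from collections import defaultdict

TOPIC_KEYWORDS = {                 # illustrative buckets
    "auth":     ["auth", "login", "oauth", "session"],
    "database": ["db", "database", "query", "schema"],
    "api":      ["api", "endpoint", "rest", "graphql"],
}

def group_by_topic(urls: list[str]) -> dict[str, list[str]]:
    """Assign each URL to the first topic whose keyword appears in it;
    everything else lands in a catch-all batch."""
    batches: dict[str, list[str]] = defaultdict(list)
    for url in urls:
        lowered = url.lower()
        for topic, words in TOPIC_KEYWORDS.items():
            if any(w in lowered for w in words):
                batches[topic].append(url)
                break
        else:
            batches["misc"].append(url)
    return dict(batches)

print(group_by_topic([
    "https://docs.example.com/auth/oauth",
    "https://docs.example.com/database/schema",
    "https://docs.example.com/api/rest",
]))
```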
### 3. Smart Caching
**What to cache:**
- Repomix output (expensive to generate)
- llms.txt content (static)
- Repository structure (rarely changes)
- Documentation URLs (reference list)
**When to refresh:**
- User requests specific version
- Documentation updated (check last-modified)
- Cache is older than the current session
- User explicitly requests fresh data
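A minimal in-session cache along these lines (the one-hour max age is an assumed stand-in for "older than the current session"):

```python
import time

class SessionCache:
    """Cache expensive artifacts (Repomix output, llms.txt bodies) for the
    current session, with an age check so stale entries get refetched."""

    def __init__(self, max_age_s: float = 3600.0):
        self.max_age_s = max_age_s          # assumed session length
        self._store: dict[str, tuple[float, str]] = {}

    def get(self, key: str) -> str | None:
        entry = self._store.get(key)
        if entry is None:
            return None
        fetched_at, value = entry
        if time.time() - fetched_at > self.max_age_s:
            del self._store[key]            # stale: force a fresh fetch
            return None
        return value

    def put(self, key: str, value: str) -> None:
        self._store[key] = (time.time(), value)

cache = SessionCache()
cache.put("https://context7.com/vercel/next.js/llms.txt", "...contents...")
print(cache.get("https://context7.com/vercel/next.js/llms.txt") is not None)
```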
### 4. Early Termination
**When to stop:**
```
✓ User's core needs met
✓ Critical information found
✓ Time limit approaching
✓ Diminishing returns (90% coverage achieved)
```
**How to decide:**
```
After Phase 1 (critical docs):
- Review what was found
- Check against user request
- If 80%+ covered → deliver now
- Offer to fetch more if needed
```
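The 80% rule can be made concrete by scoring found topics against requested ones. A sketch:

```python
def coverage(requested: set[str], found: set[str]) -> float:
    """Fraction of requested topics that phase-1 results already cover."""
    return len(requested & found) / len(requested) if requested else 1.0

requested = {"installation", "configuration", "basic-usage", "examples", "deployment"}
found     = {"installation", "configuration", "basic-usage", "examples"}

if coverage(requested, found) >= 0.8:
    print("80%+ covered -> deliver now, offer to fetch the rest on request")
else:
    print("Gaps remain -> continue to the next phase")
```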
## Performance Patterns
### Pattern 1: Parallel Exploration
**Scenario:** llms.txt contains 10 URLs
**Slow approach (sequential):**
```
Time: 10 URLs × 5 seconds = 50 seconds
Step 1: Fetch URL 1 (5s)
Step 2: Fetch URL 2 (5s)
Step 3: Fetch URL 3 (5s)
...
Step 10: Fetch URL 10 (5s)
```
**Fast approach (parallel):**
```
Time: ~5-10 seconds total
Step 1: Launch 5 Explorer agents (simultaneous)
  Agent 1: URLs 1-2
  Agent 2: URLs 3-4
  Agent 3: URLs 5-6
  Agent 4: URLs 7-8
  Agent 5: URLs 9-10
Step 2: Wait for all (max time: ~5-10s)
Step 3: Aggregate results
```
**Speedup:** 5-10x faster
### Pattern 2: Lazy Loading
**Scenario:** Documentation has 30+ pages
**Slow approach (fetch everything):**
```
Time: 30 URLs × 5 seconds ÷ 5 agents = 30 seconds
Fetch all 30 pages upfront
User only needs 5 of them
Wasted: 25 pages × 5 seconds ÷ 5 = 25 seconds
```
**Fast approach (priority loading):**
```
Time: 10 URLs × 5 seconds ÷ 5 agents = 10 seconds
Phase 1: Fetch critical 10 pages
Review: Does this cover user's needs?
If yes: Stop here (saved 20 seconds)
If no: Fetch additional as needed
```
**Speedup:** Up to 3x faster for typical use cases
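A sketch of the phased version; `fetch_all` and `needs_met` are placeholders for the parallel fetch and the coverage check shown earlier:

```python
def fetch_all(urls: list[str]) -> dict[str, str]:
    """Placeholder: fetch URLs in parallel and return {url: content}."""
    return {u: f"<contents of {u}>" for u in urls}

def needs_met(results: dict[str, str]) -> bool:
    """Placeholder: in practice, run the coverage check from above."""
    return len(results) >= 3

phases = [
    ("critical",      ["/install", "/quickstart", "/concepts"]),
    ("important",     ["/guides", "/api-reference", "/configuration"]),
    ("supplementary", ["/internals", "/optimization"]),
]

results: dict[str, str] = {}
for name, urls in phases:
    results.update(fetch_all(urls))
    if needs_met(results):
        print(f"Stopping after '{name}' phase; {len(results)} pages were enough")
        break
```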
### Pattern 3: Smart Fallbacks
**Scenario:** llms.txt not found
**Slow approach (exhaustive search):**
```
Time: ~5 minutes
Try: docs.library.com/llms.txt (30s timeout)
Try: library.dev/llms.txt (30s timeout)
Try: library.io/llms.txt (30s timeout)
Try: library.org/llms.txt (30s timeout)
Try: www.library.com/llms.txt (30s timeout)
Then: Fall back to repository
```
**Fast approach (quick fallback):**
```
Time: ~1 minute
Try: docs.library.com/llms.txt (15s)
Try: library.dev/llms.txt (15s)
Not found → Immediately try repository (30s)
```
**Speedup:** 5x faster
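The quick-fallback chain, sketched with short per-attempt timeouts and an immediate jump to the repository path (the domains are the placeholders from the example above, so in practice this prints the fallback message):

```python
import urllib.error
import urllib.request

CANDIDATES = [                       # only the two most likely hosts
    "https://docs.library.com/llms.txt",
    "https://library.dev/llms.txt",
]

def try_llms_txt(timeout_s: float = 15.0) -> str | None:
    """Return the first llms.txt found, or None to trigger the repo fallback."""
    for url in CANDIDATES:
        try:
            with urllib.request.urlopen(url, timeout=timeout_s) as resp:
                return resp.read().decode("utf-8", errors="replace")
        except (urllib.error.URLError, TimeoutError):
            continue                 # fail fast, move to the next candidate
    return None

content = try_llms_txt()
if content is None:
    print("Not found -> immediately fall back to the repository")
```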
### Pattern 4: Incremental Results
**Scenario:** Large documentation set
**Slow approach (all-or-nothing):**
```
Time: 5 minutes until first result
Fetch all documentation
Aggregate everything
Present complete report
User waits 5 minutes
```
**Fast approach (streaming):**
```
Time: 30 seconds to first result
Phase 1: Fetch critical docs (30s)
Present: Initial findings
Phase 2: Fetch important docs (60s)
Update: Additional findings
Phase 3: Fetch supplementary (90s)
Final: Complete report
```
**Benefit:** User gets value immediately, can stop early if satisfied
## Optimization Techniques
### Technique 1: Workload Balancing
**Problem:** Uneven distribution causes bottlenecks
```
Bad distribution:
Agent 1: 1 URL (small) → finishes in 5s
Agent 2: 10 URLs (large) → finishes in 50s
Total: 50s (bottlenecked by Agent 2)
```
**Solution:** Balance by estimated size
```
Good distribution:
Agent 1: 3 URLs (medium pages) → ~15s
Agent 2: 3 URLs (medium pages) → ~15s
Agent 3: 3 URLs (medium pages) → ~15s
Agent 4: 1 URL (large page) → ~15s
Total: ~15s (balanced)
```
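Balancing by estimated size is a small greedy bin-packing problem: give the next-largest page to the currently least-loaded agent. A sketch (page costs are illustrative estimates in seconds):

```python
import heapq

def balance(pages: dict[str, int], n_agents: int) -> list[list[str]]:
    """Greedy longest-processing-time assignment: sort pages by estimated
    cost, then repeatedly assign to the least-loaded agent."""
    heap = [(0, i) for i in range(n_agents)]      # (current load, agent index)
    heapq.heapify(heap)
    batches: list[list[str]] = [[] for _ in range(n_agents)]
    for url, cost in sorted(pages.items(), key=lambda kv: -kv[1]):
        load, i = heapq.heappop(heap)
        batches[i].append(url)
        heapq.heappush(heap, (load + cost, i))
    return batches

pages = {"/big-api-ref": 15, "/guide1": 5, "/guide2": 5, "/guide3": 5,
         "/faq": 5, "/install": 5, "/examples": 5, "/config": 5,
         "/cli": 5, "/misc": 5}
for i, batch in enumerate(balance(pages, 4), 1):
    print(f"Agent {i}: {batch}")    # each agent ends up at ~15s of work
```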
### Technique 2: Request Coalescing
**Problem:** Redundant requests slow things down
```
Bad:
Agent 1: Fetch README.md
Agent 2: Fetch README.md (duplicate!)
Agent 3: Fetch README.md (duplicate!)
Wasted: 2 redundant fetches
```
**Solution:** Deduplicate before fetching
```
Good:
Pre-processing: Identify unique URLs
Agent 1: Fetch README.md (once)
Agent 2: Fetch INSTALL.md
Agent 3: Fetch API.md
Share: README.md content across agents if needed
```
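Deduplication before dispatch is a few lines with first-seen order preserved (a sketch; the trailing-slash strip is an assumed normalization, and real rules may need more):

```python
def dedupe(urls: list[str]) -> list[str]:
    """Drop duplicate URLs before assigning them to agents."""
    seen: set[str] = set()
    unique: list[str] = []
    for url in urls:
        key = url.rstrip("/")
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique

print(dedupe([
    "https://repo.example.com/README.md",
    "https://repo.example.com/README.md",   # duplicate: skipped
    "https://repo.example.com/INSTALL.md",
    "https://repo.example.com/API.md",
]))
```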
### Technique 3: Timeout Tuning
**Problem:** Default timeouts too conservative
```
Slow:
WebFetch timeout: 120s (too long for fast sites)
If site is down: Wait 120s before failing
```
**Solution:** Adaptive timeouts
```
Fast:
Known fast sites (official docs): 30s timeout
Unknown sites: 60s timeout
Large repos: 120s timeout
If timeout hit: Immediately try alternative
```
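Adaptive timeouts can be a simple lookup keyed on what is known about the target. The tiers mirror the values above; the known-fast host list is an assumption:

```python
from urllib.parse import urlparse

KNOWN_FAST_HOSTS = {"nextjs.org", "react.dev", "docs.python.org"}  # assumed list

def timeout_for(url: str, is_large_repo: bool = False) -> int:
    """Pick a timeout tier: 30s for known-fast official docs, 120s for
    large repository operations, 60s for everything else."""
    if is_large_repo:
        return 120
    host = urlparse(url).hostname or ""
    return 30 if host in KNOWN_FAST_HOSTS else 60

print(timeout_for("https://react.dev/learn"))                      # 30
print(timeout_for("https://smallblog.example.com/post"))           # 60
print(timeout_for("https://github.com/org/repo", is_large_repo=True))  # 120
```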
### Technique 4: Selective Fetching
**Problem:** Fetching irrelevant content
```
Wasteful:
Fetch: Installation guide ✓ (needed)
Fetch: API reference ✓ (needed)
Fetch: Internal architecture ✗ (not needed for basic usage)
Fetch: Contributing guide ✗ (not needed)
Fetch: Changelog ✗ (not needed)
```
**Solution:** Filter by user needs
```
Efficient:
User need: "How to get started"
Fetch only: Installation, basic usage, examples
Skip: Advanced topics, internals, contribution
Savings: ~50% less fetching
```
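Filtering by user need can reuse the keyword idea from the batching section, this time as an allowlist plus a skip list (both keyword sets are illustrative):

```python
GETTING_STARTED = ["install", "quickstart", "basic", "example", "tutorial"]
SKIP_ALWAYS     = ["changelog", "contributing", "internals", "architecture"]

def select(urls: list[str], need_keywords: list[str]) -> list[str]:
    """Keep URLs matching the user's need; drop known-irrelevant pages."""
    keep = []
    for url in urls:
        lowered = url.lower()
        if any(s in lowered for s in SKIP_ALWAYS):
            continue
        if any(k in lowered for k in need_keywords):
            keep.append(url)
    return keep

urls = ["/install", "/basic-usage", "/examples",
        "/internals", "/contributing", "/changelog"]
print(select(urls, GETTING_STARTED))   # ['/install', '/basic-usage', '/examples']
```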
## Performance Benchmarks
### Target Times
| Scenario | Target Time | Acceptable | Too Slow |
|----------|-------------|------------|----------|
| Single URL | <10s | 10-20s | >20s |
| llms.txt (5 URLs) | <30s | 30-60s | >60s |
| llms.txt (15 URLs) | <60s | 60-120s | >120s |
| Repository analysis | <2min | 2-5min | >5min |
| Research fallback | <3min | 3-7min | >7min |
### Real-World Examples
**Fast case (Next.js with llms.txt):**
```
00:00 - Start
00:05 - Found llms.txt
00:10 - Fetched content (12 URLs)
00:15 - Launched 4 agents
00:45 - All agents complete
00:55 - Report ready
Total: 55 seconds ✓
```
**Medium case (Repository without llms.txt):**
```
00:00 - Start
00:15 - llms.txt not found
00:20 - Found repository
00:30 - Cloned repository
02:00 - Repomix complete
02:30 - Analyzed output
02:45 - Report ready
Total: 2m 45s ✓
```
**Slow case (Scattered documentation):**
```
00:00 - Start
00:30 - llms.txt not found
00:45 - Repository not found
01:00 - Launched 4 Researcher agents
05:00 - All research complete
06:00 - Aggregated findings
06:30 - Report ready
Total: 6m 30s (acceptable for research)
```
## Common Performance Issues
### Issue 1: Too Many Agents
**Symptom:** Slower than sequential
```
Problem:
Launched 15 agents for 15 URLs
Overhead: Agent initialization, coordination
Result: Slower than 5 agents with 3 URLs each
```
**Solution:**
```
Max 7 agents per batch
Group URLs sensibly
Use phases for large sets
```
### Issue 2: Blocking Operations
**Symptom:** Agents waiting unnecessarily
```
Problem:
Agent 1: Fetch URL, wait for Agent 2
Agent 2: Fetch URL, wait for Agent 3
Agent 3: Fetch URL
Result: Sequential instead of parallel
```
**Solution:**
```
Launch all agents independently
No dependencies between agents
Aggregate after all complete
```
### Issue 3: Redundant Fetching
**Symptom:** Same content fetched multiple times
```
Problem:
Phase 1: Fetch installation guide
Phase 2: Fetch installation guide again
Result: Wasted time
```
**Solution:**
```
Cache fetched content
Check cache before fetching
Reuse within session
```
### Issue 4: Late Bailout
**Symptom:** Continuing when it should stop
```
Problem:
Found 90% of needed info after 1 minute
Spent 4 more minutes on remaining 10%
Result: 5x time for marginal gain
```
**Solution:**
```
Check progress after critical phase
If 80%+ covered → offer to stop
Only continue if user wants comprehensive
```
## Performance Monitoring
### Key Metrics
**Track these times:**
```
- llms.txt discovery: Target <30s
- Repository clone: Target <60s
- Repomix processing: Target <2min
- Agent exploration: Target <60s
- Total time: Target <3min for typical case
```
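Timings like these are easy to collect with a small context manager (a sketch; the step names are the ones listed above):

```python
import time
from contextlib import contextmanager

timings: dict[str, float] = {}

@contextmanager
def timed(step: str):
    """Record the wall-clock duration of one step into `timings`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[step] = time.perf_counter() - start

with timed("llms.txt discovery"):
    time.sleep(0.1)                 # stand-in for the real work
with timed("agent exploration"):
    time.sleep(0.2)

for step, seconds in timings.items():
    print(f"{step}: {seconds:.1f}s")
```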
### Performance Report Template
```markdown
## Performance Summary
**Total time**: 1m 25s
**Method**: llms.txt + parallel exploration
**Breakdown**:
- Discovery: 15s (llms.txt search & fetch)
- Exploration: 50s (4 agents, 12 URLs)
- Aggregation: 20s (synthesis & formatting)
**Efficiency**: ~8.5x faster than a fully sequential pass, assuming each URL takes ~60s to fetch and analyze one at a time
(12 URLs × ~60s ≈ 12min sequential vs 1m 25s total)
```
### When to Optimize Further
Optimize if:
- [ ] Total time >2x target
- [ ] User explicitly requests "fast"
- [ ] Repeated similar queries (cache benefit)
- [ ] Large documentation set (>20 URLs)
Don't over-optimize if:
- [ ] Already meeting targets
- [ ] One-time query
- [ ] User values completeness over speed
- [ ] Research requires thoroughness
## Quick Optimization Checklist
### Before Starting
- [ ] Check if content already cached
- [ ] Identify fastest method for this case
- [ ] Plan for parallel execution
- [ ] Set appropriate timeouts
### During Execution
- [ ] Launch agents in parallel (not sequential)
- [ ] Use single message for multiple agents
- [ ] Monitor for bottlenecks
- [ ] Be ready to terminate early
### After First Phase
- [ ] Assess coverage achieved
- [ ] Determine if user needs met
- [ ] Decide: continue or deliver now
- [ ] Cache results for potential reuse
### Optimization Decision Tree
```
Need documentation?
  Check cache
    HIT  → Use cached (0s) ✓
    MISS → Continue
  llms.txt available?
    YES → Parallel agents (30-60s) ✓
    NO  → Continue
  Repository available?
    YES → Repomix (2-5min)
    NO  → Research (3-7min)

After Phase 1:
  80%+ coverage?
    YES → Deliver now (save time) ✓
    NO  → Continue to Phase 2
```
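Flattened into code, the tree is a few guarded returns (a sketch; the durations are the estimates from the tree above):

```python
def choose_method(cached: bool, has_llms_txt: bool, has_repo: bool) -> str:
    """Walk the decision tree top-down and return the fastest viable method."""
    if cached:
        return "cached (0s)"
    if has_llms_txt:
        return "parallel agents (30-60s)"
    if has_repo:
        return "repomix (2-5min)"
    return "research (3-7min)"

print(choose_method(cached=False, has_llms_txt=True, has_repo=True))
# -> parallel agents (30-60s)
```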