Progressive Disclosure Pattern (MANDATORY)

Core Principle: Find the smallest set of high-signal tokens first (index format), then drill down to full details only for relevant items.

The 4-Step Workflow

Step 1: Start with Index Format

Action:

  • Use format=index (default in most operations)
  • Set limit=3-5 (not 20)
  • Review titles and dates ONLY

Token Cost: ~50-100 tokens per result

Why: Minimal token investment for maximum signal. Get an overview before committing to full details.

Example:

curl -s "http://localhost:37777/api/search/observations?query=authentication&format=index&limit=5"

Response:

{
  "query": "authentication",
  "count": 5,
  "format": "index",
  "results": [
    {
      "id": 1234,
      "type": "feature",
      "title": "Implemented JWT authentication",
      "subtitle": "Added token-based auth with refresh tokens",
      "created_at_epoch": 1699564800000,
      "project": "api-server"
    }
  ]
}

Step 2: Identify Relevant Items

Cognitive Task:

  • Scan index results for relevance
  • Note which items need full details
  • Discard irrelevant items

Why: Human-in-the-loop filtering before expensive operations. Don't load full details for items you'll ignore.
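
The scan itself can be made cheaper by stripping the index response down to the fields that matter. A minimal sketch, assuming jq is installed and the response shape matches the Step 1 example:

# Extract just id, type, and title from the index response for quick relevance scanning
curl -s "http://localhost:37777/api/search/observations?query=authentication&format=index&limit=5" \
  | jq -r '.results[] | "\(.id)\t\(.type)\t\(.title)"'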

Step 3: Request Full Details (Selectively)

Action:

  • Use format=full ONLY for specific items of interest
  • Target by ID or use a refined search query

Token Cost: ~500-1000 tokens per result

Principle: Load only what you need

Example:

# After reviewing the index, get full details for observation #1234
# (the first index result, so limit=1 with offset=0)
curl -s "http://localhost:37777/api/search/observations?query=authentication&format=full&limit=1&offset=0"

Why: Targeted token expenditure with high ROI. The 10x cost difference means selectivity matters.
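
When the item of interest is not near the top of the ranked list, refining the query is often cheaper than paging with offset. A sketch; the query string below is illustrative, not a real observation:

# Refine the query so the relevant item ranks first, then fetch only it in full
curl -s "http://localhost:37777/api/search/observations?query=JWT+refresh+tokens&format=full&limit=1"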

Step 4: Refine with Filters (If Needed)

Techniques:

  • Use type, dateRange, concepts, files filters
  • Narrow scope BEFORE requesting more results
  • Use offset for pagination instead of large limits

Why: Reduce the result set first, then expand selectively. Don't load 20 results when filters could narrow them to 3.
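
A sketch of a filtered, paginated query. Passing type as a plain query parameter is an assumption based on the filter names listed above; check the API's actual parameter serialization before relying on it:

# Narrow by type first, then page with offset instead of raising the limit
curl -s "http://localhost:37777/api/search/observations?query=authentication&type=feature&format=index&limit=5&offset=5"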

Token Budget Awareness

Costs:

  • Index result: ~50-100 tokens
  • Full result: ~500-1000 tokens
  • 10x cost difference

Starting Points:

  • Start with limit=3-5 (not 20)
  • Reduce the limit if you hit token errors (see the retry sketch below)
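
A rough defensive sketch for shrinking the limit automatically, using response byte size as a crude proxy for token count (the 4,000-byte threshold, roughly 1,000 tokens at ~4 bytes per token, is an arbitrary assumption):

# Halve the limit until the index response fits a rough size budget
limit=5
while [ "$limit" -ge 1 ]; do
  resp=$(curl -s "http://localhost:37777/api/search/observations?query=authentication&format=index&limit=$limit")
  [ "${#resp}" -le 4000 ] && break
  limit=$((limit / 2))
done
echo "$resp"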

Savings Example:

  • Naive: 10 items × 750 tokens (avg full) = 7,500 tokens
  • Progressive: (5 items × 75 tokens index) + (2 items × 750 tokens full) = 1,875 tokens
  • Savings: 5,625 tokens (75% reduction)

What Problems This Solves

  1. Token exhaustion: Without this pattern, LLMs load everything in full format (9,000+ tokens for 10 items)
  2. Poor signal-to-noise: Loading full details for irrelevant items wastes tokens
  3. MCP limits: Large payloads hit protocol limits, causing system failures
  4. Inefficiency: Loading 20 full results when only 2 are relevant

How It Scales

With 10 records:

  • Index (500 tokens) → Full (2,000 tokens for 2 relevant) = 2,500 tokens
  • Without pattern: Full (10,000 tokens for all 10) = 4x more expensive

With 1,000 records:

  • Index (500 tokens for top 5) → Full (1,000 tokens for 1 relevant) = 1,500 tokens
  • Without pattern: Would hit MCP limits before seeing relevant data

Context Engineering Alignment

This pattern implements core context engineering principles:

  • Just-in-time context: Load data dynamically at runtime
  • Progressive disclosure: Lightweight identifiers (index) → full details as needed
  • Token efficiency: Minimal high-signal tokens first, expand selectively
  • Attention budget: Treat context as finite resource with diminishing returns

Always start with the smallest set of high-signal tokens that maximizes the likelihood of the desired outcome.