Files
gh-raw-labs-claude-code-mar…/skills/mxcp-expert/references/llm-friendly-documentation.md
2025-11-30 08:49:50 +08:00

21 KiB

LLM-Friendly Documentation Guide

CRITICAL: Tools must be self-documenting for LLMs without any prior context.

Core Principle

LLMs connecting to MXCP servers have ZERO context about your domain, data, or tools.

They only see:

  • Tool name
  • Tool description
  • Parameter names and types
  • Parameter descriptions
  • Return type structure

The documentation YOU provide is the ONLY information they have.

Tool Description Requirements

BAD Tool Description

tool:
  name: get_data
  description: "Gets data"  # ❌ Useless - what data? how? when to use?
  parameters:
    - name: id
      type: string  # ❌ No description - what kind of ID?
  return:
    type: array  # ❌ Array of what?

Why bad: LLM has no idea when to use this, what ID means, what data is returned.

GOOD Tool Description

tool:
  name: get_customer_orders
  description: "Retrieve all orders for a specific customer by customer ID. Returns order history including order date, total amount, status, and items. Use this to answer questions about a customer's purchase history or order status."
  parameters:
    - name: customer_id
      type: string
      description: "Unique customer identifier (e.g., 'CUST_12345'). Found in customer records or from list_customers tool."
      required: true
      examples: ["CUST_12345", "CUST_98765"]
    - name: status
      type: string
      description: "Optional filter by order status. Valid values: 'pending', 'shipped', 'delivered', 'cancelled'. Omit to get all orders."
      required: false
      examples: ["pending", "shipped"]
  return:
    type: array
    items:
      type: object
      properties:
        order_id: { type: string, description: "Unique order identifier" }
        order_date: { type: string, description: "ISO 8601 date when order was placed" }
        total_amount: { type: number, description: "Total order value in USD" }
        status: { type: string, description: "Current order status" }
        items: { type: array, description: "List of items in the order" }

Why good:

  • LLM knows WHEN to use it (customer purchase history, order status)
  • LLM knows WHAT parameters mean and valid values
  • LLM knows WHAT will be returned
  • LLM can chain with other tools (mentions list_customers)

Description Template

Tool-Level Description

Format: <What it does> <What it returns> <When to use it>

description: "Retrieve sales analytics by region and time period. Returns aggregated metrics including total sales, transaction count, and average order value. Use this to answer questions about sales performance, regional comparisons, or time-based trends."

Must include:

  1. What: What data/operation
  2. Returns: Summary of return data
  3. When: Use cases / when LLM should call this

Parameter Description

Format: <What it is> <Valid values/format> <Optional context>

parameters:
  - name: region
    type: string
    description: "Geographic region code. Valid values: 'north', 'south', 'east', 'west'. Use 'all' for aggregated data across all regions."
    examples: ["north", "south", "all"]

  - name: start_date
    type: string
    format: date
    description: "Start date for analytics period in YYYY-MM-DD format. Defaults to 30 days ago if omitted."
    required: false
    examples: ["2024-01-01", "2024-06-15"]

  - name: limit
    type: integer
    description: "Maximum number of results to return. Defaults to 100. Set to -1 for all results (use cautiously for large datasets)."
    default: 100
    examples: [10, 50, 100]

Must include:

  1. What it is: Clear explanation
  2. Valid values: Enums, formats, ranges
  3. Defaults: If parameter is optional
  4. Examples: Concrete examples

Return Type Description

Include descriptions for ALL fields:

return:
  type: object
  properties:
    total_sales:
      type: number
      description: "Sum of all sales in USD for the period"
    transaction_count:
      type: integer
      description: "Number of individual transactions"
    avg_order_value:
      type: number
      description: "Average transaction amount (total_sales / transaction_count)"
    top_products:
      type: array
      description: "Top 5 products by revenue"
      items:
        type: object
        properties:
          product_id: { type: string, description: "Product identifier" }
          product_name: { type: string, description: "Human-readable product name" }
          revenue: { type: number, description: "Total revenue for this product in USD" }

Combining Tools - Cross-References

Help LLMs chain tools together by mentioning related tools:

tool:
  name: get_customer_details
  description: "Get detailed information for a specific customer. Use customer_id from list_customers tool or search_customers tool. Returns personal info, account status, and lifetime value."
  # ... parameters ...
tool:
  name: list_customers
  description: "List all customers with optional filtering. Returns customer_id needed for get_customer_details and get_customer_orders tools."
  # ... parameters ...

LLM workflow enabled:

  1. LLM sees: "I need customer details"
  2. Reads: "Use customer_id from list_customers tool"
  3. Calls: list_customers first
  4. Gets: customer_id
  5. Calls: get_customer_details with that ID

Examples in Descriptions

ALWAYS provide concrete examples:

parameters:
  - name: date_range
    type: string
    description: "Date range in format 'YYYY-MM-DD to YYYY-MM-DD' or use shortcuts: 'today', 'yesterday', 'last_7_days', 'last_30_days', 'last_month', 'this_year'"
    examples:
      - "2024-01-01 to 2024-12-31"
      - "last_7_days"
      - "this_year"

Error Cases in Descriptions

Document expected errors:

tool:
  name: get_order
  description: "Retrieve order by order ID. Returns order details if found. Returns error if order_id doesn't exist or user doesn't have permission to view this order."
  parameters:
    - name: order_id
      type: string
      description: "Order identifier. Format: ORD_XXXXXX (e.g., 'ORD_123456'). Returns error if order not found."

Resource URIs

Make URI templates clear:

resource:
  uri: "customer://profile/{customer_id}"
  description: "Access customer profile data. Replace {customer_id} with actual customer ID (e.g., 'CUST_12345'). Returns 404 if customer doesn't exist."
  parameters:
    - name: customer_id
      type: string
      description: "Customer identifier from list_customers or search_customers"

Prompt Templates

Explain template variables clearly:

prompt:
  name: analyze_customer
  description: "Generate customer analysis report. Provide customer_id to analyze spending patterns, order frequency, and recommendations."
  parameters:
    - name: customer_id
      type: string
      description: "Customer to analyze (from list_customers)"
    - name: analysis_type
      type: string
      description: "Type of analysis: 'spending' (purchase patterns), 'behavior' (order frequency), 'recommendations' (product suggestions)"
      examples: ["spending", "behavior", "recommendations"]
  messages:
    - role: system
      type: text
      prompt: "You are a customer analytics expert. Analyze data thoroughly and provide actionable insights."
    - role: user
      type: text
      prompt: "Analyze customer {{ customer_id }} focusing on {{ analysis_type }}. Include specific metrics and recommendations."

Complete Example: Well-Documented Tool Set

# tools/list_products.yml
mxcp: 1
tool:
  name: list_products
  description: "List all available products with optional category filtering. Returns product catalog with IDs, names, prices, and stock levels. Use this to browse products or find product_id for get_product_details tool."
  parameters:
    - name: category
      type: string
      description: "Filter by product category. Valid values: 'electronics', 'clothing', 'food', 'books', 'home'. Omit to see all categories."
      required: false
      examples: ["electronics", "clothing"]
    - name: in_stock_only
      type: boolean
      description: "If true, only return products currently in stock. Default: false (shows all products)."
      default: false
  return:
    type: array
    description: "Array of product objects sorted by name"
    items:
      type: object
      properties:
        product_id:
          type: string
          description: "Unique product identifier (use with get_product_details)"
        name:
          type: string
          description: "Product name"
        category:
          type: string
          description: "Product category"
        price:
          type: number
          description: "Current price in USD"
        stock:
          type: integer
          description: "Current stock level (0 = out of stock)"
  source:
    code: |
      SELECT
        product_id,
        name,
        category,
        price,
        stock
      FROM products
      WHERE ($category IS NULL OR category = $category)
        AND ($in_stock_only = false OR stock > 0)
      ORDER BY name
# tools/get_product_details.yml
mxcp: 1
tool:
  name: get_product_details
  description: "Get detailed information for a specific product including full description, specifications, reviews, and related products. Use product_id from list_products tool."
  parameters:
    - name: product_id
      type: string
      description: "Product identifier from list_products (e.g., 'PROD_12345')"
      required: true
      examples: ["PROD_12345"]
  return:
    type: object
    description: "Complete product information"
    properties:
      product_id: { type: string, description: "Product identifier" }
      name: { type: string, description: "Product name" }
      description: { type: string, description: "Detailed product description" }
      price: { type: number, description: "Current price in USD" }
      stock: { type: integer, description: "Available quantity" }
      specifications: { type: object, description: "Product specs (varies by category)" }
      avg_rating: { type: number, description: "Average customer rating (0-5)" }
      review_count: { type: integer, description: "Number of customer reviews" }
      related_products: { type: array, description: "Product IDs of related items" }
  source:
    code: |
      SELECT * FROM product_details WHERE product_id = $product_id

Documentation Quality Checklist

Before declaring a tool complete, verify:

Tool Level:

  • Description explains WHAT it does
  • Description explains WHAT it returns
  • Description explains WHEN to use it
  • Cross-references to related tools (if applicable)
  • Use cases are clear

Parameter Level:

  • Every parameter has a description
  • Valid values/formats are documented
  • Examples provided for complex parameters
  • Required vs optional is clear
  • Defaults documented (if optional)

Return Type Level:

  • Return type structure is documented
  • Every field has a description
  • Complex nested objects are explained
  • Array item types are described

Overall:

  • An LLM reading this can use the tool WITHOUT human explanation
  • An LLM knows WHEN to call this vs other tools
  • An LLM knows HOW to get required parameters
  • An LLM knows WHAT to expect in the response

Common Documentation Mistakes

MISTAKE 1: Vague Descriptions

description: "Gets user info"  # ❌ Which user? What info? When?

FIX:

description: "Retrieve complete user profile including contact information, account status, and preferences for a specific user. Use user_id from list_users or search_users tools."

MISTAKE 2: Missing Parameter Details

parameters:
  - name: status
    type: string  # ❌ What are valid values?

FIX:

parameters:
  - name: status
    type: string
    description: "Order status filter. Valid values: 'pending', 'processing', 'shipped', 'delivered', 'cancelled'"
    examples: ["pending", "shipped"]

MISTAKE 3: Undocumented Return Fields

return:
  type: object
  properties:
    total: { type: number }  # ❌ Total what? In what units?

FIX:

return:
  type: object
  properties:
    total: { type: number, description: "Total order amount in USD including tax and shipping" }

MISTAKE 4: No Cross-References

tool:
  name: get_order_details
  parameters:
    - name: order_id
      type: string  # ❌ Where does LLM get this?

FIX:

tool:
  name: get_order_details
  description: "Get detailed order information. Use order_id from list_orders or search_orders tools."
  parameters:
    - name: order_id
      type: string
      description: "Order identifier (format: ORD_XXXXXX) from list_orders or search_orders"

MISTAKE 5: Technical Jargon Without Explanation

description: "Executes SOQL query on SF objects"  # ❌ LLM doesn't know SOQL or SF

FIX:

description: "Query Salesforce data using filters. Searches across accounts, contacts, and opportunities. Returns matching records with standard fields."

Testing Documentation Quality

Ask yourself: "If I gave this to an LLM with ZERO context about my domain, could it use this tool correctly?"

Test by asking:

  1. When should this tool be called?
  2. What parameters are needed and where do I get them?
  3. What will I get back?
  4. How does this relate to other tools?

If you can't answer clearly from the YAML alone, the documentation is insufficient.

Response Format Best Practices

Design tool outputs to optimize LLM context usage.

Provide Detail Level Options

Allow LLMs to request different levels of detail based on their needs.

tool:
  name: search_products
  parameters:
    - name: query
      type: string
      description: "Product search query"
    - name: detail_level
      type: string
      description: "Level of detail in response"
      enum: ["minimal", "standard", "full"]
      default: "standard"
      examples:
        - "minimal: Only ID, name, price (fastest, least context)"
        - "standard: Basic info + category + stock"
        - "full: All fields including descriptions and specifications"

Implementation in SQL:

SELECT
  CASE $detail_level
    WHEN 'minimal' THEN json_object('id', id, 'name', name, 'price', price)
    WHEN 'standard' THEN json_object('id', id, 'name', name, 'price', price, 'category', category, 'in_stock', stock > 0)
    ELSE json_object('id', id, 'name', name, 'price', price, 'category', category, 'stock', stock, 'description', description, 'specs', specs)
  END as product
FROM products
WHERE name LIKE '%' || $query || '%'

Use Human-Readable Formats

Return data in formats LLMs can easily understand and communicate to users.

Good: Human-Readable

return:
  type: object
  properties:
    customer_id: { type: string, description: "Customer ID (CUST_12345)" }
    customer_name: { type: string, description: "Display name" }
    last_order_date: { type: string, description: "Date in YYYY-MM-DD format" }
    total_spent: { type: number, description: "Total amount in USD" }
    status: { type: string, description: "Account status: active, inactive, suspended" }

SQL implementation:

SELECT
  customer_id,
  name as customer_name,
  DATE_FORMAT(last_order_date, '%Y-%m-%d') as last_order_date,  -- Not epoch timestamp
  ROUND(total_spent, 2) as total_spent,
  status
FROM customers

Bad: Opaque/Technical

return:
  type: object
  properties:
    cust_id: { type: integer }              # Unclear name
    ts: { type: integer }                   # Epoch timestamp - not human readable
    amt: { type: number }                   # Unclear abbreviation
    stat_cd: { type: integer }              # Status code instead of name

Include Display Names with IDs

When returning IDs, also return human-readable names.

return:
  type: object
  properties:
    assigned_to_user_id: { type: string, description: "User ID" }
    assigned_to_name: { type: string, description: "User display name" }
    category_id: { type: string, description: "Category ID" }
    category_name: { type: string, description: "Category name" }

Why: LLM can understand relationships without additional tool calls.

Limit Response Size

Prevent overwhelming LLMs with too much data.

tool:
  name: list_transactions
  parameters:
    - name: limit
      type: integer
      description: "Maximum number of transactions to return (1-1000)"
      default: 100
      minimum: 1
      maximum: 1000

Python implementation with truncation:

def list_transactions(limit: int = 100) -> dict:
    """List recent transactions with size limits"""

    if limit > 1000:
        return {
            "success": False,
            "error": f"Limit of {limit} exceeds maximum (1000). Use date filters to narrow results.",
            "error_code": "LIMIT_EXCEEDED",
            "suggestion": "Try adding 'start_date' and 'end_date' parameters"
        }

    results = db.execute(
        "SELECT * FROM transactions ORDER BY date DESC LIMIT $limit",
        {"limit": limit}
    )

    return {
        "success": True,
        "count": len(results),
        "limit": limit,
        "has_more": len(results) == limit,
        "transactions": results,
        "note": "Use pagination or filters if more results needed"
    }

Provide Pagination Metadata

Help LLMs understand when more data is available.

return:
  type: object
  properties:
    items: { type: array, description: "Results for this page" }
    total_count: { type: integer, description: "Total matching results" }
    returned_count: { type: integer, description: "Number returned in this response" }
    has_more: { type: boolean, description: "Whether more results are available" }
    next_offset: { type: integer, description: "Offset for next page" }

SQL implementation:

-- Get total count
WITH total AS (
  SELECT COUNT(*) as count FROM products WHERE category = $category
)
SELECT
  json_object(
    'items', (SELECT json_group_array(json_object('id', id, 'name', name))
              FROM products WHERE category = $category LIMIT $limit OFFSET $offset),
    'total_count', (SELECT count FROM total),
    'returned_count', MIN($limit, (SELECT count FROM total) - $offset),
    'has_more', (SELECT count FROM total) > ($offset + $limit),
    'next_offset', $offset + $limit
  ) as result

Format for Readability

Use clear field names and consistent structures.

Good: Clear Structure

return:
  type: object
  properties:
    summary:
      type: object
      description: "High-level summary"
      properties:
        total_orders: { type: integer }
        total_revenue: { type: number }
        average_order_value: { type: number }
    top_products:
      type: array
      description: "Top 5 selling products"
      items:
        type: object
        properties:
          product_name: { type: string }
          units_sold: { type: integer }
          revenue: { type: number }

Bad: Flat Unstructured

return:
  type: object
  properties:
    total_orders: { type: integer }
    total_revenue: { type: number }
    product1_name: { type: string }
    product1_units: { type: integer }
    product2_name: { type: string }
    # ...repeated pattern

Omit Verbose Metadata

Don't return internal/technical metadata that doesn't help LLMs.

# ✅ GOOD: Essential information only
return:
  type: object
  properties:
    user_id: { type: string }
    name: { type: string }
    email: { type: string }
    profile_image: { type: string, description: "Profile image URL" }

# ❌ BAD: Too much metadata
return:
  type: object
  properties:
    user_id: { type: string }
    name: { type: string }
    email: { type: string }
    profile_image_small: { type: string }
    profile_image_medium: { type: string }
    profile_image_large: { type: string }
    profile_image_xlarge: { type: string }
    internal_db_id: { type: integer }
    created_timestamp_unix: { type: integer }
    modified_timestamp_unix: { type: integer }
    schema_version: { type: integer }

Principle: Include one best representation, not all variations.

Summary

Every tool must be self-documenting:

  • Clear, detailed descriptions
  • Documented parameters with examples
  • Documented return types
  • Cross-references to related tools
  • Valid values and formats
  • Use cases explained

Response format best practices:

  • Provide detail level options (minimal/standard/full)
  • Use human-readable formats (dates, names, not codes)
  • Include display names alongside IDs
  • Limit response sizes with clear guidance
  • Provide pagination metadata
  • Structure data clearly
  • Omit verbose internal metadata

Remember: The LLM has NO prior knowledge. Your descriptions are its ONLY guide.