21 KiB
LLM-Friendly Documentation Guide
CRITICAL: Tools must be self-documenting for LLMs without any prior context.
Core Principle
LLMs connecting to MXCP servers have ZERO context about your domain, data, or tools.
They only see:
- Tool name
- Tool description
- Parameter names and types
- Parameter descriptions
- Return type structure
The documentation YOU provide is the ONLY information they have.
Tool Description Requirements
❌ BAD Tool Description
tool:
name: get_data
description: "Gets data" # ❌ Useless - what data? how? when to use?
parameters:
- name: id
type: string # ❌ No description - what kind of ID?
return:
type: array # ❌ Array of what?
Why bad: LLM has no idea when to use this, what ID means, what data is returned.
✅ GOOD Tool Description
tool:
name: get_customer_orders
description: "Retrieve all orders for a specific customer by customer ID. Returns order history including order date, total amount, status, and items. Use this to answer questions about a customer's purchase history or order status."
parameters:
- name: customer_id
type: string
description: "Unique customer identifier (e.g., 'CUST_12345'). Found in customer records or from list_customers tool."
required: true
examples: ["CUST_12345", "CUST_98765"]
- name: status
type: string
description: "Optional filter by order status. Valid values: 'pending', 'shipped', 'delivered', 'cancelled'. Omit to get all orders."
required: false
examples: ["pending", "shipped"]
return:
type: array
items:
type: object
properties:
order_id: { type: string, description: "Unique order identifier" }
order_date: { type: string, description: "ISO 8601 date when order was placed" }
total_amount: { type: number, description: "Total order value in USD" }
status: { type: string, description: "Current order status" }
items: { type: array, description: "List of items in the order" }
Why good:
- LLM knows WHEN to use it (customer purchase history, order status)
- LLM knows WHAT parameters mean and valid values
- LLM knows WHAT will be returned
- LLM can chain with other tools (mentions list_customers)
Description Template
Tool-Level Description
Format: <What it does> <What it returns> <When to use it>
description: "Retrieve sales analytics by region and time period. Returns aggregated metrics including total sales, transaction count, and average order value. Use this to answer questions about sales performance, regional comparisons, or time-based trends."
Must include:
- What: What data/operation
- Returns: Summary of return data
- When: Use cases / when LLM should call this
Parameter Description
Format: <What it is> <Valid values/format> <Optional context>
parameters:
- name: region
type: string
description: "Geographic region code. Valid values: 'north', 'south', 'east', 'west'. Use 'all' for aggregated data across all regions."
examples: ["north", "south", "all"]
- name: start_date
type: string
format: date
description: "Start date for analytics period in YYYY-MM-DD format. Defaults to 30 days ago if omitted."
required: false
examples: ["2024-01-01", "2024-06-15"]
- name: limit
type: integer
description: "Maximum number of results to return. Defaults to 100. Set to -1 for all results (use cautiously for large datasets)."
default: 100
examples: [10, 50, 100]
Must include:
- What it is: Clear explanation
- Valid values: Enums, formats, ranges
- Defaults: If parameter is optional
- Examples: Concrete examples
Return Type Description
Include descriptions for ALL fields:
return:
type: object
properties:
total_sales:
type: number
description: "Sum of all sales in USD for the period"
transaction_count:
type: integer
description: "Number of individual transactions"
avg_order_value:
type: number
description: "Average transaction amount (total_sales / transaction_count)"
top_products:
type: array
description: "Top 5 products by revenue"
items:
type: object
properties:
product_id: { type: string, description: "Product identifier" }
product_name: { type: string, description: "Human-readable product name" }
revenue: { type: number, description: "Total revenue for this product in USD" }
Combining Tools - Cross-References
Help LLMs chain tools together by mentioning related tools:
tool:
name: get_customer_details
description: "Get detailed information for a specific customer. Use customer_id from list_customers tool or search_customers tool. Returns personal info, account status, and lifetime value."
# ... parameters ...
tool:
name: list_customers
description: "List all customers with optional filtering. Returns customer_id needed for get_customer_details and get_customer_orders tools."
# ... parameters ...
LLM workflow enabled:
- LLM sees: "I need customer details"
- Reads: "Use customer_id from list_customers tool"
- Calls:
list_customersfirst - Gets:
customer_id - Calls:
get_customer_detailswith that ID
Examples in Descriptions
ALWAYS provide concrete examples:
parameters:
- name: date_range
type: string
description: "Date range in format 'YYYY-MM-DD to YYYY-MM-DD' or use shortcuts: 'today', 'yesterday', 'last_7_days', 'last_30_days', 'last_month', 'this_year'"
examples:
- "2024-01-01 to 2024-12-31"
- "last_7_days"
- "this_year"
Error Cases in Descriptions
Document expected errors:
tool:
name: get_order
description: "Retrieve order by order ID. Returns order details if found. Returns error if order_id doesn't exist or user doesn't have permission to view this order."
parameters:
- name: order_id
type: string
description: "Order identifier. Format: ORD_XXXXXX (e.g., 'ORD_123456'). Returns error if order not found."
Resource URIs
Make URI templates clear:
resource:
uri: "customer://profile/{customer_id}"
description: "Access customer profile data. Replace {customer_id} with actual customer ID (e.g., 'CUST_12345'). Returns 404 if customer doesn't exist."
parameters:
- name: customer_id
type: string
description: "Customer identifier from list_customers or search_customers"
Prompt Templates
Explain template variables clearly:
prompt:
name: analyze_customer
description: "Generate customer analysis report. Provide customer_id to analyze spending patterns, order frequency, and recommendations."
parameters:
- name: customer_id
type: string
description: "Customer to analyze (from list_customers)"
- name: analysis_type
type: string
description: "Type of analysis: 'spending' (purchase patterns), 'behavior' (order frequency), 'recommendations' (product suggestions)"
examples: ["spending", "behavior", "recommendations"]
messages:
- role: system
type: text
prompt: "You are a customer analytics expert. Analyze data thoroughly and provide actionable insights."
- role: user
type: text
prompt: "Analyze customer {{ customer_id }} focusing on {{ analysis_type }}. Include specific metrics and recommendations."
Complete Example: Well-Documented Tool Set
# tools/list_products.yml
mxcp: 1
tool:
name: list_products
description: "List all available products with optional category filtering. Returns product catalog with IDs, names, prices, and stock levels. Use this to browse products or find product_id for get_product_details tool."
parameters:
- name: category
type: string
description: "Filter by product category. Valid values: 'electronics', 'clothing', 'food', 'books', 'home'. Omit to see all categories."
required: false
examples: ["electronics", "clothing"]
- name: in_stock_only
type: boolean
description: "If true, only return products currently in stock. Default: false (shows all products)."
default: false
return:
type: array
description: "Array of product objects sorted by name"
items:
type: object
properties:
product_id:
type: string
description: "Unique product identifier (use with get_product_details)"
name:
type: string
description: "Product name"
category:
type: string
description: "Product category"
price:
type: number
description: "Current price in USD"
stock:
type: integer
description: "Current stock level (0 = out of stock)"
source:
code: |
SELECT
product_id,
name,
category,
price,
stock
FROM products
WHERE ($category IS NULL OR category = $category)
AND ($in_stock_only = false OR stock > 0)
ORDER BY name
# tools/get_product_details.yml
mxcp: 1
tool:
name: get_product_details
description: "Get detailed information for a specific product including full description, specifications, reviews, and related products. Use product_id from list_products tool."
parameters:
- name: product_id
type: string
description: "Product identifier from list_products (e.g., 'PROD_12345')"
required: true
examples: ["PROD_12345"]
return:
type: object
description: "Complete product information"
properties:
product_id: { type: string, description: "Product identifier" }
name: { type: string, description: "Product name" }
description: { type: string, description: "Detailed product description" }
price: { type: number, description: "Current price in USD" }
stock: { type: integer, description: "Available quantity" }
specifications: { type: object, description: "Product specs (varies by category)" }
avg_rating: { type: number, description: "Average customer rating (0-5)" }
review_count: { type: integer, description: "Number of customer reviews" }
related_products: { type: array, description: "Product IDs of related items" }
source:
code: |
SELECT * FROM product_details WHERE product_id = $product_id
Documentation Quality Checklist
Before declaring a tool complete, verify:
Tool Level:
- Description explains WHAT it does
- Description explains WHAT it returns
- Description explains WHEN to use it
- Cross-references to related tools (if applicable)
- Use cases are clear
Parameter Level:
- Every parameter has a description
- Valid values/formats are documented
- Examples provided for complex parameters
- Required vs optional is clear
- Defaults documented (if optional)
Return Type Level:
- Return type structure is documented
- Every field has a description
- Complex nested objects are explained
- Array item types are described
Overall:
- An LLM reading this can use the tool WITHOUT human explanation
- An LLM knows WHEN to call this vs other tools
- An LLM knows HOW to get required parameters
- An LLM knows WHAT to expect in the response
Common Documentation Mistakes
❌ MISTAKE 1: Vague Descriptions
description: "Gets user info" # ❌ Which user? What info? When?
✅ FIX:
description: "Retrieve complete user profile including contact information, account status, and preferences for a specific user. Use user_id from list_users or search_users tools."
❌ MISTAKE 2: Missing Parameter Details
parameters:
- name: status
type: string # ❌ What are valid values?
✅ FIX:
parameters:
- name: status
type: string
description: "Order status filter. Valid values: 'pending', 'processing', 'shipped', 'delivered', 'cancelled'"
examples: ["pending", "shipped"]
❌ MISTAKE 3: Undocumented Return Fields
return:
type: object
properties:
total: { type: number } # ❌ Total what? In what units?
✅ FIX:
return:
type: object
properties:
total: { type: number, description: "Total order amount in USD including tax and shipping" }
❌ MISTAKE 4: No Cross-References
tool:
name: get_order_details
parameters:
- name: order_id
type: string # ❌ Where does LLM get this?
✅ FIX:
tool:
name: get_order_details
description: "Get detailed order information. Use order_id from list_orders or search_orders tools."
parameters:
- name: order_id
type: string
description: "Order identifier (format: ORD_XXXXXX) from list_orders or search_orders"
❌ MISTAKE 5: Technical Jargon Without Explanation
description: "Executes SOQL query on SF objects" # ❌ LLM doesn't know SOQL or SF
✅ FIX:
description: "Query Salesforce data using filters. Searches across accounts, contacts, and opportunities. Returns matching records with standard fields."
Testing Documentation Quality
Ask yourself: "If I gave this to an LLM with ZERO context about my domain, could it use this tool correctly?"
Test by asking:
- When should this tool be called?
- What parameters are needed and where do I get them?
- What will I get back?
- How does this relate to other tools?
If you can't answer clearly from the YAML alone, the documentation is insufficient.
Response Format Best Practices
Design tool outputs to optimize LLM context usage.
Provide Detail Level Options
Allow LLMs to request different levels of detail based on their needs.
tool:
name: search_products
parameters:
- name: query
type: string
description: "Product search query"
- name: detail_level
type: string
description: "Level of detail in response"
enum: ["minimal", "standard", "full"]
default: "standard"
examples:
- "minimal: Only ID, name, price (fastest, least context)"
- "standard: Basic info + category + stock"
- "full: All fields including descriptions and specifications"
Implementation in SQL:
SELECT
CASE $detail_level
WHEN 'minimal' THEN json_object('id', id, 'name', name, 'price', price)
WHEN 'standard' THEN json_object('id', id, 'name', name, 'price', price, 'category', category, 'in_stock', stock > 0)
ELSE json_object('id', id, 'name', name, 'price', price, 'category', category, 'stock', stock, 'description', description, 'specs', specs)
END as product
FROM products
WHERE name LIKE '%' || $query || '%'
Use Human-Readable Formats
Return data in formats LLMs can easily understand and communicate to users.
✅ Good: Human-Readable
return:
type: object
properties:
customer_id: { type: string, description: "Customer ID (CUST_12345)" }
customer_name: { type: string, description: "Display name" }
last_order_date: { type: string, description: "Date in YYYY-MM-DD format" }
total_spent: { type: number, description: "Total amount in USD" }
status: { type: string, description: "Account status: active, inactive, suspended" }
SQL implementation:
SELECT
customer_id,
name as customer_name,
DATE_FORMAT(last_order_date, '%Y-%m-%d') as last_order_date, -- Not epoch timestamp
ROUND(total_spent, 2) as total_spent,
status
FROM customers
❌ Bad: Opaque/Technical
return:
type: object
properties:
cust_id: { type: integer } # Unclear name
ts: { type: integer } # Epoch timestamp - not human readable
amt: { type: number } # Unclear abbreviation
stat_cd: { type: integer } # Status code instead of name
Include Display Names with IDs
When returning IDs, also return human-readable names.
return:
type: object
properties:
assigned_to_user_id: { type: string, description: "User ID" }
assigned_to_name: { type: string, description: "User display name" }
category_id: { type: string, description: "Category ID" }
category_name: { type: string, description: "Category name" }
Why: LLM can understand relationships without additional tool calls.
Limit Response Size
Prevent overwhelming LLMs with too much data.
tool:
name: list_transactions
parameters:
- name: limit
type: integer
description: "Maximum number of transactions to return (1-1000)"
default: 100
minimum: 1
maximum: 1000
Python implementation with truncation:
def list_transactions(limit: int = 100) -> dict:
"""List recent transactions with size limits"""
if limit > 1000:
return {
"success": False,
"error": f"Limit of {limit} exceeds maximum (1000). Use date filters to narrow results.",
"error_code": "LIMIT_EXCEEDED",
"suggestion": "Try adding 'start_date' and 'end_date' parameters"
}
results = db.execute(
"SELECT * FROM transactions ORDER BY date DESC LIMIT $limit",
{"limit": limit}
)
return {
"success": True,
"count": len(results),
"limit": limit,
"has_more": len(results) == limit,
"transactions": results,
"note": "Use pagination or filters if more results needed"
}
Provide Pagination Metadata
Help LLMs understand when more data is available.
return:
type: object
properties:
items: { type: array, description: "Results for this page" }
total_count: { type: integer, description: "Total matching results" }
returned_count: { type: integer, description: "Number returned in this response" }
has_more: { type: boolean, description: "Whether more results are available" }
next_offset: { type: integer, description: "Offset for next page" }
SQL implementation:
-- Get total count
WITH total AS (
SELECT COUNT(*) as count FROM products WHERE category = $category
)
SELECT
json_object(
'items', (SELECT json_group_array(json_object('id', id, 'name', name))
FROM products WHERE category = $category LIMIT $limit OFFSET $offset),
'total_count', (SELECT count FROM total),
'returned_count', MIN($limit, (SELECT count FROM total) - $offset),
'has_more', (SELECT count FROM total) > ($offset + $limit),
'next_offset', $offset + $limit
) as result
Format for Readability
Use clear field names and consistent structures.
✅ Good: Clear Structure
return:
type: object
properties:
summary:
type: object
description: "High-level summary"
properties:
total_orders: { type: integer }
total_revenue: { type: number }
average_order_value: { type: number }
top_products:
type: array
description: "Top 5 selling products"
items:
type: object
properties:
product_name: { type: string }
units_sold: { type: integer }
revenue: { type: number }
❌ Bad: Flat Unstructured
return:
type: object
properties:
total_orders: { type: integer }
total_revenue: { type: number }
product1_name: { type: string }
product1_units: { type: integer }
product2_name: { type: string }
# ...repeated pattern
Omit Verbose Metadata
Don't return internal/technical metadata that doesn't help LLMs.
# ✅ GOOD: Essential information only
return:
type: object
properties:
user_id: { type: string }
name: { type: string }
email: { type: string }
profile_image: { type: string, description: "Profile image URL" }
# ❌ BAD: Too much metadata
return:
type: object
properties:
user_id: { type: string }
name: { type: string }
email: { type: string }
profile_image_small: { type: string }
profile_image_medium: { type: string }
profile_image_large: { type: string }
profile_image_xlarge: { type: string }
internal_db_id: { type: integer }
created_timestamp_unix: { type: integer }
modified_timestamp_unix: { type: integer }
schema_version: { type: integer }
Principle: Include one best representation, not all variations.
Summary
Every tool must be self-documenting:
- ✅ Clear, detailed descriptions
- ✅ Documented parameters with examples
- ✅ Documented return types
- ✅ Cross-references to related tools
- ✅ Valid values and formats
- ✅ Use cases explained
Response format best practices:
- ✅ Provide detail level options (minimal/standard/full)
- ✅ Use human-readable formats (dates, names, not codes)
- ✅ Include display names alongside IDs
- ✅ Limit response sizes with clear guidance
- ✅ Provide pagination metadata
- ✅ Structure data clearly
- ✅ Omit verbose internal metadata
Remember: The LLM has NO prior knowledge. Your descriptions are its ONLY guide.