# LLM-Friendly Documentation Guide **CRITICAL: Tools must be self-documenting for LLMs without any prior context.** ## Core Principle **LLMs connecting to MXCP servers have ZERO context about your domain, data, or tools.** They only see: - Tool name - Tool description - Parameter names and types - Parameter descriptions - Return type structure **The documentation YOU provide is the ONLY information they have.** ## Tool Description Requirements ### ❌ BAD Tool Description ```yaml tool: name: get_data description: "Gets data" # ❌ Useless - what data? how? when to use? parameters: - name: id type: string # ❌ No description - what kind of ID? return: type: array # ❌ Array of what? ``` **Why bad**: LLM has no idea when to use this, what ID means, what data is returned. ### ✅ GOOD Tool Description ```yaml tool: name: get_customer_orders description: "Retrieve all orders for a specific customer by customer ID. Returns order history including order date, total amount, status, and items. Use this to answer questions about a customer's purchase history or order status." parameters: - name: customer_id type: string description: "Unique customer identifier (e.g., 'CUST_12345'). Found in customer records or from list_customers tool." required: true examples: ["CUST_12345", "CUST_98765"] - name: status type: string description: "Optional filter by order status. Valid values: 'pending', 'shipped', 'delivered', 'cancelled'. Omit to get all orders." required: false examples: ["pending", "shipped"] return: type: array items: type: object properties: order_id: { type: string, description: "Unique order identifier" } order_date: { type: string, description: "ISO 8601 date when order was placed" } total_amount: { type: number, description: "Total order value in USD" } status: { type: string, description: "Current order status" } items: { type: array, description: "List of items in the order" } ``` **Why good**: - LLM knows WHEN to use it (customer purchase history, order status) - LLM knows WHAT parameters mean and valid values - LLM knows WHAT will be returned - LLM can chain with other tools (mentions list_customers) ## Description Template ### Tool-Level Description **Format**: ` ` ```yaml description: "Retrieve sales analytics by region and time period. Returns aggregated metrics including total sales, transaction count, and average order value. Use this to answer questions about sales performance, regional comparisons, or time-based trends." ``` **Must include**: 1. **What**: What data/operation 2. **Returns**: Summary of return data 3. **When**: Use cases / when LLM should call this ### Parameter Description **Format**: ` ` ```yaml parameters: - name: region type: string description: "Geographic region code. Valid values: 'north', 'south', 'east', 'west'. Use 'all' for aggregated data across all regions." examples: ["north", "south", "all"] - name: start_date type: string format: date description: "Start date for analytics period in YYYY-MM-DD format. Defaults to 30 days ago if omitted." required: false examples: ["2024-01-01", "2024-06-15"] - name: limit type: integer description: "Maximum number of results to return. Defaults to 100. Set to -1 for all results (use cautiously for large datasets)." default: 100 examples: [10, 50, 100] ``` **Must include**: 1. **What it is**: Clear explanation 2. **Valid values**: Enums, formats, ranges 3. **Defaults**: If parameter is optional 4. **Examples**: Concrete examples ### Return Type Description **Include descriptions for ALL fields**: ```yaml return: type: object properties: total_sales: type: number description: "Sum of all sales in USD for the period" transaction_count: type: integer description: "Number of individual transactions" avg_order_value: type: number description: "Average transaction amount (total_sales / transaction_count)" top_products: type: array description: "Top 5 products by revenue" items: type: object properties: product_id: { type: string, description: "Product identifier" } product_name: { type: string, description: "Human-readable product name" } revenue: { type: number, description: "Total revenue for this product in USD" } ``` ## Combining Tools - Cross-References **Help LLMs chain tools together** by mentioning related tools: ```yaml tool: name: get_customer_details description: "Get detailed information for a specific customer. Use customer_id from list_customers tool or search_customers tool. Returns personal info, account status, and lifetime value." # ... parameters ... ``` ```yaml tool: name: list_customers description: "List all customers with optional filtering. Returns customer_id needed for get_customer_details and get_customer_orders tools." # ... parameters ... ``` **LLM workflow enabled**: 1. LLM sees: "I need customer details" 2. Reads: "Use customer_id from list_customers tool" 3. Calls: `list_customers` first 4. Gets: `customer_id` 5. Calls: `get_customer_details` with that ID ## Examples in Descriptions **ALWAYS provide concrete examples**: ```yaml parameters: - name: date_range type: string description: "Date range in format 'YYYY-MM-DD to YYYY-MM-DD' or use shortcuts: 'today', 'yesterday', 'last_7_days', 'last_30_days', 'last_month', 'this_year'" examples: - "2024-01-01 to 2024-12-31" - "last_7_days" - "this_year" ``` ## Error Cases in Descriptions **Document expected errors**: ```yaml tool: name: get_order description: "Retrieve order by order ID. Returns order details if found. Returns error if order_id doesn't exist or user doesn't have permission to view this order." parameters: - name: order_id type: string description: "Order identifier. Format: ORD_XXXXXX (e.g., 'ORD_123456'). Returns error if order not found." ``` ## Resource URIs **Make URI templates clear**: ```yaml resource: uri: "customer://profile/{customer_id}" description: "Access customer profile data. Replace {customer_id} with actual customer ID (e.g., 'CUST_12345'). Returns 404 if customer doesn't exist." parameters: - name: customer_id type: string description: "Customer identifier from list_customers or search_customers" ``` ## Prompt Templates **Explain template variables clearly**: ```yaml prompt: name: analyze_customer description: "Generate customer analysis report. Provide customer_id to analyze spending patterns, order frequency, and recommendations." parameters: - name: customer_id type: string description: "Customer to analyze (from list_customers)" - name: analysis_type type: string description: "Type of analysis: 'spending' (purchase patterns), 'behavior' (order frequency), 'recommendations' (product suggestions)" examples: ["spending", "behavior", "recommendations"] messages: - role: system type: text prompt: "You are a customer analytics expert. Analyze data thoroughly and provide actionable insights." - role: user type: text prompt: "Analyze customer {{ customer_id }} focusing on {{ analysis_type }}. Include specific metrics and recommendations." ``` ## Complete Example: Well-Documented Tool Set ```yaml # tools/list_products.yml mxcp: 1 tool: name: list_products description: "List all available products with optional category filtering. Returns product catalog with IDs, names, prices, and stock levels. Use this to browse products or find product_id for get_product_details tool." parameters: - name: category type: string description: "Filter by product category. Valid values: 'electronics', 'clothing', 'food', 'books', 'home'. Omit to see all categories." required: false examples: ["electronics", "clothing"] - name: in_stock_only type: boolean description: "If true, only return products currently in stock. Default: false (shows all products)." default: false return: type: array description: "Array of product objects sorted by name" items: type: object properties: product_id: type: string description: "Unique product identifier (use with get_product_details)" name: type: string description: "Product name" category: type: string description: "Product category" price: type: number description: "Current price in USD" stock: type: integer description: "Current stock level (0 = out of stock)" source: code: | SELECT product_id, name, category, price, stock FROM products WHERE ($category IS NULL OR category = $category) AND ($in_stock_only = false OR stock > 0) ORDER BY name ``` ```yaml # tools/get_product_details.yml mxcp: 1 tool: name: get_product_details description: "Get detailed information for a specific product including full description, specifications, reviews, and related products. Use product_id from list_products tool." parameters: - name: product_id type: string description: "Product identifier from list_products (e.g., 'PROD_12345')" required: true examples: ["PROD_12345"] return: type: object description: "Complete product information" properties: product_id: { type: string, description: "Product identifier" } name: { type: string, description: "Product name" } description: { type: string, description: "Detailed product description" } price: { type: number, description: "Current price in USD" } stock: { type: integer, description: "Available quantity" } specifications: { type: object, description: "Product specs (varies by category)" } avg_rating: { type: number, description: "Average customer rating (0-5)" } review_count: { type: integer, description: "Number of customer reviews" } related_products: { type: array, description: "Product IDs of related items" } source: code: | SELECT * FROM product_details WHERE product_id = $product_id ``` ## Documentation Quality Checklist Before declaring a tool complete, verify: ### Tool Level: - [ ] Description explains WHAT it does - [ ] Description explains WHAT it returns - [ ] Description explains WHEN to use it - [ ] Cross-references to related tools (if applicable) - [ ] Use cases are clear ### Parameter Level: - [ ] Every parameter has a description - [ ] Valid values/formats are documented - [ ] Examples provided for complex parameters - [ ] Required vs optional is clear - [ ] Defaults documented (if optional) ### Return Type Level: - [ ] Return type structure is documented - [ ] Every field has a description - [ ] Complex nested objects are explained - [ ] Array item types are described ### Overall: - [ ] An LLM reading this can use the tool WITHOUT human explanation - [ ] An LLM knows WHEN to call this vs other tools - [ ] An LLM knows HOW to get required parameters - [ ] An LLM knows WHAT to expect in the response ## Common Documentation Mistakes ### ❌ MISTAKE 1: Vague Descriptions ```yaml description: "Gets user info" # ❌ Which user? What info? When? ``` ✅ **FIX**: ```yaml description: "Retrieve complete user profile including contact information, account status, and preferences for a specific user. Use user_id from list_users or search_users tools." ``` ### ❌ MISTAKE 2: Missing Parameter Details ```yaml parameters: - name: status type: string # ❌ What are valid values? ``` ✅ **FIX**: ```yaml parameters: - name: status type: string description: "Order status filter. Valid values: 'pending', 'processing', 'shipped', 'delivered', 'cancelled'" examples: ["pending", "shipped"] ``` ### ❌ MISTAKE 3: Undocumented Return Fields ```yaml return: type: object properties: total: { type: number } # ❌ Total what? In what units? ``` ✅ **FIX**: ```yaml return: type: object properties: total: { type: number, description: "Total order amount in USD including tax and shipping" } ``` ### ❌ MISTAKE 4: No Cross-References ```yaml tool: name: get_order_details parameters: - name: order_id type: string # ❌ Where does LLM get this? ``` ✅ **FIX**: ```yaml tool: name: get_order_details description: "Get detailed order information. Use order_id from list_orders or search_orders tools." parameters: - name: order_id type: string description: "Order identifier (format: ORD_XXXXXX) from list_orders or search_orders" ``` ### ❌ MISTAKE 5: Technical Jargon Without Explanation ```yaml description: "Executes SOQL query on SF objects" # ❌ LLM doesn't know SOQL or SF ``` ✅ **FIX**: ```yaml description: "Query Salesforce data using filters. Searches across accounts, contacts, and opportunities. Returns matching records with standard fields." ``` ## Testing Documentation Quality **Ask yourself**: "If I gave this to an LLM with ZERO context about my domain, could it use this tool correctly?" **Test by asking**: 1. When should this tool be called? 2. What parameters are needed and where do I get them? 3. What will I get back? 4. How does this relate to other tools? If you can't answer clearly from the YAML alone, **the documentation is insufficient.** ## Response Format Best Practices **Design tool outputs to optimize LLM context usage.** ### Provide Detail Level Options Allow LLMs to request different levels of detail based on their needs. ```yaml tool: name: search_products parameters: - name: query type: string description: "Product search query" - name: detail_level type: string description: "Level of detail in response" enum: ["minimal", "standard", "full"] default: "standard" examples: - "minimal: Only ID, name, price (fastest, least context)" - "standard: Basic info + category + stock" - "full: All fields including descriptions and specifications" ``` **Implementation in SQL**: ```sql SELECT CASE $detail_level WHEN 'minimal' THEN json_object('id', id, 'name', name, 'price', price) WHEN 'standard' THEN json_object('id', id, 'name', name, 'price', price, 'category', category, 'in_stock', stock > 0) ELSE json_object('id', id, 'name', name, 'price', price, 'category', category, 'stock', stock, 'description', description, 'specs', specs) END as product FROM products WHERE name LIKE '%' || $query || '%' ``` ### Use Human-Readable Formats **Return data in formats LLMs can easily understand and communicate to users.** #### ✅ Good: Human-Readable ```yaml return: type: object properties: customer_id: { type: string, description: "Customer ID (CUST_12345)" } customer_name: { type: string, description: "Display name" } last_order_date: { type: string, description: "Date in YYYY-MM-DD format" } total_spent: { type: number, description: "Total amount in USD" } status: { type: string, description: "Account status: active, inactive, suspended" } ``` **SQL implementation**: ```sql SELECT customer_id, name as customer_name, DATE_FORMAT(last_order_date, '%Y-%m-%d') as last_order_date, -- Not epoch timestamp ROUND(total_spent, 2) as total_spent, status FROM customers ``` #### ❌ Bad: Opaque/Technical ```yaml return: type: object properties: cust_id: { type: integer } # Unclear name ts: { type: integer } # Epoch timestamp - not human readable amt: { type: number } # Unclear abbreviation stat_cd: { type: integer } # Status code instead of name ``` ### Include Display Names with IDs When returning IDs, also return human-readable names. ```yaml return: type: object properties: assigned_to_user_id: { type: string, description: "User ID" } assigned_to_name: { type: string, description: "User display name" } category_id: { type: string, description: "Category ID" } category_name: { type: string, description: "Category name" } ``` **Why**: LLM can understand relationships without additional tool calls. ### Limit Response Size **Prevent overwhelming LLMs with too much data.** ```yaml tool: name: list_transactions parameters: - name: limit type: integer description: "Maximum number of transactions to return (1-1000)" default: 100 minimum: 1 maximum: 1000 ``` **Python implementation with truncation**: ```python def list_transactions(limit: int = 100) -> dict: """List recent transactions with size limits""" if limit > 1000: return { "success": False, "error": f"Limit of {limit} exceeds maximum (1000). Use date filters to narrow results.", "error_code": "LIMIT_EXCEEDED", "suggestion": "Try adding 'start_date' and 'end_date' parameters" } results = db.execute( "SELECT * FROM transactions ORDER BY date DESC LIMIT $limit", {"limit": limit} ) return { "success": True, "count": len(results), "limit": limit, "has_more": len(results) == limit, "transactions": results, "note": "Use pagination or filters if more results needed" } ``` ### Provide Pagination Metadata **Help LLMs understand when more data is available.** ```yaml return: type: object properties: items: { type: array, description: "Results for this page" } total_count: { type: integer, description: "Total matching results" } returned_count: { type: integer, description: "Number returned in this response" } has_more: { type: boolean, description: "Whether more results are available" } next_offset: { type: integer, description: "Offset for next page" } ``` **SQL implementation**: ```sql -- Get total count WITH total AS ( SELECT COUNT(*) as count FROM products WHERE category = $category ) SELECT json_object( 'items', (SELECT json_group_array(json_object('id', id, 'name', name)) FROM products WHERE category = $category LIMIT $limit OFFSET $offset), 'total_count', (SELECT count FROM total), 'returned_count', MIN($limit, (SELECT count FROM total) - $offset), 'has_more', (SELECT count FROM total) > ($offset + $limit), 'next_offset', $offset + $limit ) as result ``` ### Format for Readability **Use clear field names and consistent structures.** #### ✅ Good: Clear Structure ```yaml return: type: object properties: summary: type: object description: "High-level summary" properties: total_orders: { type: integer } total_revenue: { type: number } average_order_value: { type: number } top_products: type: array description: "Top 5 selling products" items: type: object properties: product_name: { type: string } units_sold: { type: integer } revenue: { type: number } ``` #### ❌ Bad: Flat Unstructured ```yaml return: type: object properties: total_orders: { type: integer } total_revenue: { type: number } product1_name: { type: string } product1_units: { type: integer } product2_name: { type: string } # ...repeated pattern ``` ### Omit Verbose Metadata **Don't return internal/technical metadata that doesn't help LLMs.** ```yaml # ✅ GOOD: Essential information only return: type: object properties: user_id: { type: string } name: { type: string } email: { type: string } profile_image: { type: string, description: "Profile image URL" } # ❌ BAD: Too much metadata return: type: object properties: user_id: { type: string } name: { type: string } email: { type: string } profile_image_small: { type: string } profile_image_medium: { type: string } profile_image_large: { type: string } profile_image_xlarge: { type: string } internal_db_id: { type: integer } created_timestamp_unix: { type: integer } modified_timestamp_unix: { type: integer } schema_version: { type: integer } ``` **Principle**: Include one best representation, not all variations. ## Summary **Every tool must be self-documenting**: - ✅ Clear, detailed descriptions - ✅ Documented parameters with examples - ✅ Documented return types - ✅ Cross-references to related tools - ✅ Valid values and formats - ✅ Use cases explained **Response format best practices**: - ✅ Provide detail level options (minimal/standard/full) - ✅ Use human-readable formats (dates, names, not codes) - ✅ Include display names alongside IDs - ✅ Limit response sizes with clear guidance - ✅ Provide pagination metadata - ✅ Structure data clearly - ✅ Omit verbose internal metadata **Remember**: The LLM has NO prior knowledge. Your descriptions are its ONLY guide.