# Tool Calling Configuration for gpt-oss with vLLM

## Required vLLM Server Flags

For gpt-oss tool calling to work, vLLM must be started with specific flags.

### Minimal Configuration

```bash
vllm serve openai/gpt-oss-20b \
    --tool-call-parser openai \
    --enable-auto-tool-choice
```

### Full Configuration with Tool Server

```bash
vllm serve openai/gpt-oss-20b \
    --tool-call-parser openai \
    --enable-auto-tool-choice \
    --tool-server demo \
    --max-model-len 8192 \
    --dtype auto
```

## Flag Explanations

### --tool-call-parser openai

- **Required**: Yes
- **Purpose**: Uses the OpenAI-compatible tool calling format
- **Effect**: Enables proper parsing of tool call tokens
- **Alternatives**: None for gpt-oss compatibility

### --enable-auto-tool-choice

- **Required**: Yes
- **Purpose**: Allows automatic tool selection
- **Effect**: The model can choose which tool to call
- **Note**: Only `tool_choice="auto"` is supported

### --tool-server

- **Required**: Optional, but needed for the demo tools
- **Options**:
  - `demo`: Built-in demo tools (browser, Python interpreter)
  - `ip:port`: Custom MCP tool server
  - Multiple servers: `ip1:port1,ip2:port2`

### --max-model-len

- **Required**: No
- **Purpose**: Sets the maximum context length
- **Recommended**: 8192 or higher for tool calling contexts
- **Effect**: Prevents truncation during multi-turn tool conversations

## Tool Server Options

### Demo Tool Server

Requires the gpt-oss library:

```bash
pip install gpt-oss
```

Provides:

- Web browser tool
- Python interpreter tool

Start command:

```bash
vllm serve openai/gpt-oss-20b \
    --tool-call-parser openai \
    --enable-auto-tool-choice \
    --tool-server demo
```

### MCP Tool Servers

Start vLLM with MCP server URLs:

```bash
vllm serve openai/gpt-oss-20b \
    --tool-call-parser openai \
    --enable-auto-tool-choice \
    --tool-server localhost:5000,localhost:5001
```

Requirements:

- MCP servers must be running before vLLM starts
- Must implement the MCP protocol
- Should return results in the expected format
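
For reference, a minimal custom tool server can be sketched with the official `mcp` Python SDK. This is only an illustrative assumption, not part of vLLM or gpt-oss: the `get_weather` tool, the SSE transport choice, and the listen address are placeholders, and the address you pass to `--tool-server` must match whatever your vLLM version's MCP client actually expects.

```python
# Minimal MCP tool server sketch using the official `mcp` Python SDK.
# Assumption: your vLLM build connects to MCP servers over this transport;
# adjust the transport and the host/port to match your --tool-server value.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("my-tools")

@mcp.tool()
def get_weather(city: str) -> str:
    """Return a canned weather report for a city (illustrative only)."""
    return f"It is sunny in {city}."

if __name__ == "__main__":
    mcp.run(transport="sse")  # serves on FastMCP's configured host/port
```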

### No Tool Server (Client-Managed Tools)

For client-side tool management (e.g., llama-stack with MCP):

```bash
vllm serve openai/gpt-oss-20b \
    --tool-call-parser openai \
    --enable-auto-tool-choice
```

Tools are provided via the API request, not in the server config.

## Environment Variables

### Search Tools

For the demo tool server with search:

```bash
export EXA_API_KEY="your-exa-api-key"
```

### Python Execution

To enable local Python execution for the demo tool server (runs code via uv, without sandboxing):

```bash
export PYTHON_EXECUTION_BACKEND=dangerously_use_uv
```

**Warning**: Demo Python execution is for testing only.

## Client Configuration

### OpenAI-Compatible Clients

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed",  # vLLM doesn't require auth by default
)

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "What's 2+2?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "calculator",
            "description": "Perform calculations",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {"type": "string"}
                },
                "required": ["expression"]
            }
        }
    }],
    tool_choice="auto"  # MUST be "auto" - the only supported value
)
```
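
When the model does return a tool call, the tool has to be executed on the client and its output sent back in a follow-up request. Below is a minimal sketch continuing the example above; `run_calculator` is a hypothetical local implementation, not something provided by vLLM or gpt-oss.

```python
import json

message = response.choices[0].message

if message.tool_calls:
    call = message.tool_calls[0]
    args = json.loads(call.function.arguments)  # e.g. {"expression": "2+2"}

    # Hypothetical local tool implementation - replace with your own.
    tool_result = run_calculator(args["expression"])

    followup = client.chat.completions.create(
        model="openai/gpt-oss-20b",
        messages=[
            {"role": "user", "content": "What's 2+2?"},
            {
                "role": "assistant",
                "tool_calls": [{
                    "id": call.id,
                    "type": "function",
                    "function": {
                        "name": call.function.name,
                        "arguments": call.function.arguments,
                    },
                }],
            },
            {"role": "tool", "tool_call_id": call.id, "content": str(tool_result)},
        ],
    )
    print(followup.choices[0].message.content)
```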

### llama-stack Configuration

Example run.yaml for llama-stack with vLLM:

```yaml
inference:
  - provider_id: vllm-provider
    provider_type: remote::vllm
    config:
      url: http://localhost:8000/v1
      # No auth_credential needed if vLLM has no auth

tool_runtime:
  - provider_id: mcp-provider
    provider_type: mcp
    config:
      servers:
        - server_name: my-tools
          url: http://localhost:5000
```

## Common Configuration Issues

### Issue: tool_choice Not Working

**Symptom**: Error about an unsupported tool_choice value

**Solution**: Use only `tool_choice="auto"`; other values are not supported:

```python
# GOOD
tool_choice="auto"

# BAD - will fail
tool_choice="required"
tool_choice={"type": "function", "function": {"name": "my_func"}}
```

### Issue: Tools Not Being Called

**Symptoms**:

- Model describes tool usage in text
- No tool_calls in the response
- Empty tool_calls array

**Checklist**:

1. Verify the `--tool-call-parser openai` flag is set
2. Verify the `--enable-auto-tool-choice` flag is set
3. Check that generation_config.json is up to date (see model-updates.md)
4. Try simpler tool schemas first (see the sketch below)
5. Check vLLM logs for parsing errors
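
For step 4, a deliberately minimal schema helps separate parsing problems from schema complexity. A sketch, assuming the `client` from the OpenAI-compatible example above; the `echo` tool name is purely illustrative:

```python
# Hypothetical minimal tool used only to test that tool calls are emitted.
minimal_tool = {
    "type": "function",
    "function": {
        "name": "echo",
        "description": "Echo the input string back",
        "parameters": {
            "type": "object",
            "properties": {"text": {"type": "string"}},
            "required": ["text"],
        },
    },
}

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "Echo the word hello"}],
    tools=[minimal_tool],
    tool_choice="auto",
)
print(response.choices[0].message.tool_calls)  # should not be None or empty
```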

### Issue: Token Parsing Errors

**Error**: `openai_harmony.HarmonyError: Unexpected token X`

**Solutions**:

1. Update the model files (see model-updates.md)
2. Verify your vLLM version is recent
3. Check vLLM startup logs for warnings
4. Restart the vLLM server after any config changes

## Performance Tuning

### GPU Memory

```bash
vllm serve openai/gpt-oss-20b \
    --tool-call-parser openai \
    --enable-auto-tool-choice \
    --gpu-memory-utilization 0.9
```

### Tensor Parallelism

For multi-GPU:

```bash
vllm serve openai/gpt-oss-20b \
    --tool-call-parser openai \
    --enable-auto-tool-choice \
    --tensor-parallel-size 2
```

### Batching

```bash
vllm serve openai/gpt-oss-20b \
    --tool-call-parser openai \
    --enable-auto-tool-choice \
    --max-num-batched-tokens 8192 \
    --max-num-seqs 256
```

## Testing Your Configuration

### Basic Test

```bash
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "openai/gpt-oss-20b",
        "messages": [{"role": "user", "content": "Calculate 15 * 7"}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "calculator",
                "description": "Perform math",
                "parameters": {
                    "type": "object",
                    "properties": {"expr": {"type": "string"}}
                }
            }
        }],
        "tool_choice": "auto"
    }'
```

### Expected Response

A successful response contains a `tool_calls` array with a function call:

```json
{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_123",
        "type": "function",
        "function": {
          "name": "calculator",
          "arguments": "{\"expr\": \"15 * 7\"}"
        }
      }]
    }
  }]
}
```

### Failure Indicators

- `content` contains text describing the calculation instead of being null
- `tool_calls` is empty or null
- The response contains an error about tool parsing
- HarmonyError appears in the vLLM logs
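
These indicators can also be checked programmatically. A sketch, assuming `response` comes from an OpenAI-compatible client call made with tools and `tool_choice="auto"`:

```python
# Distinguish a working tool-calling setup from the failure modes above.
msg = response.choices[0].message

if msg.tool_calls:
    print("OK: tool call emitted:", msg.tool_calls[0].function.name)
elif msg.content:
    print("FAIL: model answered in plain text instead of calling a tool")
else:
    print("FAIL: no tool_calls and no content - check vLLM logs for HarmonyError")
```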

## Cross-References

- Model file updates: See model-updates.md
- Known issues: See known-issues.md