# Tool Calling Configuration for gpt-oss with vLLM
## Required vLLM Server Flags
For gpt-oss tool calling to work, vLLM must be started with specific flags.
### Minimal Configuration
```bash
vllm serve openai/gpt-oss-20b \
  --tool-call-parser openai \
  --enable-auto-tool-choice
```
### Full Configuration with Tool Server
```bash
vllm serve openai/gpt-oss-20b \
  --tool-call-parser openai \
  --enable-auto-tool-choice \
  --tool-server demo \
  --max-model-len 8192 \
  --dtype auto
```
## Flag Explanations
### --tool-call-parser openai
- **Required**: Yes
- **Purpose**: Uses OpenAI-compatible tool calling format
- **Effect**: Enables proper parsing of tool call tokens
- **Alternatives**: None for gpt-oss compatibility
### --enable-auto-tool-choice
- **Required**: Yes
- **Purpose**: Allows automatic tool selection
- **Effect**: Model can choose which tool to call
- **Note**: Only `tool_choice='auto'` is supported
### --tool-server
- **Required**: No (only needed when vLLM should manage tools server-side)
- **Options**:
  - `demo`: built-in demo tools (browser, Python interpreter)
  - `ip:port`: custom MCP tool server
  - `ip1:port1,ip2:port2`: multiple servers, comma-separated
### --max-model-len
- **Required**: No
- **Purpose**: Sets maximum context length
- **Recommended**: 8192 or higher for tool calling contexts
- **Effect**: Prevents truncation during multi-turn tool conversations
## Tool Server Options
### Demo Tool Server
Requires the gpt-oss library:
```bash
pip install gpt-oss
```
Provides:
- Web browser tool
- Python interpreter tool
Start command:
```bash
vllm serve openai/gpt-oss-20b \
  --tool-call-parser openai \
  --enable-auto-tool-choice \
  --tool-server demo
```
### MCP Tool Servers
Start vLLM with MCP server URLs:
```bash
vllm serve openai/gpt-oss-20b \
  --tool-call-parser openai \
  --enable-auto-tool-choice \
  --tool-server localhost:5000,localhost:5001
```
Requirements:
- MCP servers must be running before vLLM starts
- Servers must implement the MCP protocol (see the sketch after this list)
- Servers should return results in the expected format
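A minimal custom server sketch using the `mcp` Python SDK's FastMCP (the server name, port, and SSE transport here are illustrative assumptions; confirm which transport your vLLM version expects):
```python
# Minimal MCP tool server sketch - names and transport are assumptions
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("my-tools", host="localhost", port=5000)

@mcp.tool()
def calculator(expression: str) -> str:
    """Evaluate a simple arithmetic expression."""
    return str(eval(expression))  # demo only - never eval untrusted input

if __name__ == "__main__":
    mcp.run(transport="sse")
```
Start this server first, then point vLLM at it with `--tool-server localhost:5000`.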
### No Tool Server (Client-Managed Tools)
For client-side tool management (e.g., llama-stack with MCP):
```bash
vllm serve openai/gpt-oss-20b \
  --tool-call-parser openai \
  --enable-auto-tool-choice
```
Tools are provided in the API request, not in the server config.
## Environment Variables
### Search Tools
For the demo tool server with search enabled:
```bash
export EXA_API_KEY="your-exa-api-key"
```
### Python Execution
To enable Python execution in the demo tool server (the variable name is a deliberate warning):
```bash
export PYTHON_EXECUTION_BACKEND=dangerously_use_uv
```
**Warning**: Demo Python execution is for testing only.
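Putting the demo pieces together, a launch script might look like this (the API key value is a placeholder):
```bash
# Demo tool server launch sketch; EXA_API_KEY value is a placeholder
export EXA_API_KEY="your-exa-api-key"
export PYTHON_EXECUTION_BACKEND=dangerously_use_uv  # testing only

vllm serve openai/gpt-oss-20b \
  --tool-call-parser openai \
  --enable-auto-tool-choice \
  --tool-server demo
```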
## Client Configuration
### OpenAI-Compatible Clients
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed",  # vLLM doesn't require auth by default
)

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "What's 2+2?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "calculator",
            "description": "Perform calculations",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {"type": "string"}
                },
                "required": ["expression"]
            }
        }
    }],
    tool_choice="auto",  # MUST be 'auto' - only supported value
)
```
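After the model returns a tool call, the client executes the tool and sends the result back in a `tool` message. A minimal sketch continuing the example above (the `eval`-based calculator stands in for real tool logic and is unsafe for untrusted input):
```python
import json

msg = response.choices[0].message
if msg.tool_calls:
    call = msg.tool_calls[0]
    args = json.loads(call.function.arguments)
    result = str(eval(args["expression"]))  # demo only - never eval untrusted input

    followup = client.chat.completions.create(
        model="openai/gpt-oss-20b",
        messages=[
            {"role": "user", "content": "What's 2+2?"},
            msg,  # the assistant message carrying the tool call
            {"role": "tool", "tool_call_id": call.id, "content": result},
        ],
    )
    print(followup.choices[0].message.content)
```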
### llama-stack Configuration
Example `run.yaml` for llama-stack with vLLM:
```yaml
inference:
  - provider_id: vllm-provider
    provider_type: remote::vllm
    config:
      url: http://localhost:8000/v1
      # No auth_credential needed if vLLM has no auth
tool_runtime:
  - provider_id: mcp-provider
    provider_type: mcp
    config:
      servers:
        - server_name: my-tools
          url: http://localhost:5000
```
## Common Configuration Issues
### Issue: tool_choice Not Working
**Symptom**: An error about an unsupported `tool_choice` value
**Solution**: Use only `tool_choice="auto"`; other values are not supported:
```python
# GOOD
tool_choice="auto"
# BAD - will fail
tool_choice="required"
tool_choice={"type": "function", "function": {"name": "my_func"}}
```
### Issue: Tools Not Being Called
**Symptoms**:
- Model describes tool usage in text
- No tool_calls in response
- Empty tool_calls array
**Checklist**:
1. Verify `--tool-call-parser openai` flag is set
2. Verify `--enable-auto-tool-choice` flag is set
3. Check generation_config.json is up to date (see model-updates.md)
4. Try simpler tool schemas first (see the minimal schema after this list)
5. Check vLLM logs for parsing errors
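For step 4, a minimal schema that isolates parsing problems from schema complexity (the `get_time` tool is hypothetical):
```python
# Hypothetical parameter-free tool for sanity-checking tool call parsing
tools = [{
    "type": "function",
    "function": {
        "name": "get_time",
        "description": "Get the current time",
        "parameters": {"type": "object", "properties": {}}
    }
}]
```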
### Issue: Token Parsing Errors
**Error**: `openai_harmony.HarmonyError: Unexpected token X`
**Solutions**:
1. Update model files (see model-updates.md and the example commands below)
2. Verify your vLLM version is recent
3. Check vLLM startup logs for warnings
4. Restart the vLLM server after any config changes
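Example commands for steps 1-2 (a sketch; exact cache behavior depends on your setup):
```bash
# Step 2: upgrade vLLM
pip install --upgrade vllm

# Step 1: refresh cached model files, including generation_config.json
huggingface-cli download openai/gpt-oss-20b --force-download
```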
## Performance Tuning
### GPU Memory
```bash
vllm serve openai/gpt-oss-20b \
  --tool-call-parser openai \
  --enable-auto-tool-choice \
  --gpu-memory-utilization 0.9
```
### Tensor Parallelism
For multi-GPU:
```bash
vllm serve openai/gpt-oss-20b \
  --tool-call-parser openai \
  --enable-auto-tool-choice \
  --tensor-parallel-size 2
```
### Batching
```bash
vllm serve openai/gpt-oss-20b \
  --tool-call-parser openai \
  --enable-auto-tool-choice \
  --max-num-batched-tokens 8192 \
  --max-num-seqs 256
```
## Testing Your Configuration
### Basic Test
```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "messages": [{"role": "user", "content": "Calculate 15 * 7"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "calculator",
        "description": "Perform math",
        "parameters": {
          "type": "object",
          "properties": {"expr": {"type": "string"}}
        }
      }
    }],
    "tool_choice": "auto"
  }'
```
### Expected Response
On success, the response contains a `tool_calls` array with the function call:
```json
{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_123",
        "type": "function",
        "function": {
          "name": "calculator",
          "arguments": "{\"expr\": \"15 * 7\"}"
        }
      }]
    }
  }]
}
```
### Failure Indicators
- The `content` field contains text describing the calculation instead of being null
- `tool_calls` is empty or null
- An error in the response about tool parsing
- `HarmonyError` in the vLLM logs
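A quick programmatic check for these indicators, assuming the `response` object from the Python client example above:
```python
msg = response.choices[0].message
if msg.tool_calls:
    print("OK: tool call emitted:", msg.tool_calls[0].function.name)
elif msg.content:
    # Model answered in prose instead of calling the tool - recheck
    # server flags and model files
    print("FAIL: text instead of tool_calls:", msg.content[:120])
else:
    print("FAIL: empty response; check vLLM logs for HarmonyError")
```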
## Cross-References
- Model file updates: See model-updates.md
- Known issues: See known-issues.md