# Tool Calling Configuration for gpt-oss with vLLM

## Required vLLM Server Flags

For gpt-oss tool calling to work, vLLM must be started with specific flags.

### Minimal Configuration

```bash
vllm serve openai/gpt-oss-20b \
  --tool-call-parser openai \
  --enable-auto-tool-choice
```

### Full Configuration with Tool Server

```bash
vllm serve openai/gpt-oss-20b \
  --tool-call-parser openai \
  --enable-auto-tool-choice \
  --tool-server demo \
  --max-model-len 8192 \
  --dtype auto
```

## Flag Explanations

### --tool-call-parser openai

- **Required**: Yes
- **Purpose**: Uses the OpenAI-compatible tool-calling format
- **Effect**: Enables proper parsing of tool call tokens
- **Alternatives**: None for gpt-oss compatibility

### --enable-auto-tool-choice

- **Required**: Yes
- **Purpose**: Allows automatic tool selection
- **Effect**: The model can choose which tool to call
- **Note**: Only `tool_choice="auto"` is supported

### --tool-server

- **Required**: Optional; needed for the demo tools
- **Options**:
  - `demo`: Built-in demo tools (browser, Python interpreter)
  - `ip:port`: Custom MCP tool server
  - Multiple servers: `ip1:port1,ip2:port2`

### --max-model-len

- **Required**: No
- **Purpose**: Sets the maximum context length
- **Recommended**: 8192 or higher for tool-calling contexts
- **Effect**: Prevents truncation during multi-turn tool conversations
## Tool Server Options

### Demo Tool Server

Requires the gpt-oss library:

```bash
pip install gpt-oss
```

Provides:

- Web browser tool
- Python interpreter tool

Start command:

```bash
vllm serve openai/gpt-oss-20b \
  --tool-call-parser openai \
  --enable-auto-tool-choice \
  --tool-server demo
```

### MCP Tool Servers

Start vLLM with MCP server URLs:

```bash
vllm serve openai/gpt-oss-20b \
  --tool-call-parser openai \
  --enable-auto-tool-choice \
  --tool-server localhost:5000,localhost:5001
```

Requirements (a minimal server sketch follows this list):

- MCP servers must be running before vLLM starts
- Servers must implement the MCP protocol
- Servers should return results in the expected format
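
The sketch below illustrates what such a server can look like, using the FastMCP helper from the official `mcp` Python SDK. The server name, the `add` tool, port 5000, and the SSE transport are assumptions for this example rather than requirements of vLLM, and the exact FastMCP API may differ across SDK versions:

```python
# pip install mcp  (the official Model Context Protocol Python SDK)
from mcp.server.fastmcp import FastMCP

# Hypothetical server; the name, port, and tool are illustrative.
mcp = FastMCP("my-tools", port=5000)

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b

if __name__ == "__main__":
    # Assumes vLLM connects over SSE; adjust if your setup uses another transport.
    mcp.run(transport="sse")
```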

### No Tool Server (Client-Managed Tools)

For client-side tool management (e.g., llama-stack with MCP):

```bash
vllm serve openai/gpt-oss-20b \
  --tool-call-parser openai \
  --enable-auto-tool-choice
```

Tools are provided in the API request rather than in the server configuration (see Client Configuration below).

## Environment Variables

### Search Tools

For the demo tool server with search:

```bash
export EXA_API_KEY="your-exa-api-key"
```

### Python Execution

To enable Python execution in the demo tool server:

```bash
export PYTHON_EXECUTION_BACKEND=dangerously_use_uv
```

**Warning**: As the variable name suggests, demo Python execution is for testing only.

## Client Configuration

### OpenAI-Compatible Clients

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed",  # vLLM doesn't require auth by default
)

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "What's 2+2?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "calculator",
            "description": "Perform calculations",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {"type": "string"}
                },
                "required": ["expression"]
            }
        }
    }],
    tool_choice="auto",  # MUST be "auto" - the only supported value
)
```
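
At this point the response contains the tool call but not the answer; the client is expected to run the tool and send the result back. A minimal sketch continuing from the example above (the `eval`-based calculator is a stand-in for a real tool and must not be used on untrusted input):

```python
import json

# Continues from the example above: handle the returned tool call.
message = response.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    args = json.loads(call.function.arguments)

    # Illustrative only; a real tool would do proper expression parsing.
    result = str(eval(args["expression"]))

    followup = client.chat.completions.create(
        model="openai/gpt-oss-20b",
        messages=[
            {"role": "user", "content": "What's 2+2?"},
            message,  # the assistant turn containing the tool call
            {"role": "tool", "tool_call_id": call.id, "content": result},
        ],
    )
    print(followup.choices[0].message.content)
```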

### llama-stack Configuration

Example `run.yaml` for llama-stack with vLLM:

```yaml
inference:
  - provider_id: vllm-provider
    provider_type: remote::vllm
    config:
      url: http://localhost:8000/v1
      # No auth_credential needed if vLLM has no auth

tool_runtime:
  - provider_id: mcp-provider
    provider_type: mcp
    config:
      servers:
        - server_name: my-tools
          url: http://localhost:5000
```

## Common Configuration Issues

### Issue: tool_choice Not Working

**Symptom**: An error about an unsupported `tool_choice` value

**Solution**: Only use `tool_choice="auto"`; other values are not supported:

```python
# GOOD
tool_choice="auto"

# BAD - will fail
tool_choice="required"
tool_choice={"type": "function", "function": {"name": "my_func"}}
```

### Issue: Tools Not Being Called

**Symptoms**:

- The model describes tool usage in text instead of calling a tool
- No `tool_calls` in the response
- An empty `tool_calls` array

**Checklist**:

1. Verify the `--tool-call-parser openai` flag is set
2. Verify the `--enable-auto-tool-choice` flag is set
3. Check that generation_config.json is up to date (see model-updates.md)
4. Try simpler tool schemas first (see the sketch after this list)
5. Check the vLLM logs for parsing errors
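
For step 4, a deliberately minimal schema makes a good baseline: if the model calls this tool but not yours, schema complexity is the likely culprit. The `get_time` tool below is hypothetical:

```python
# A minimal tool definition with no required parameters (hypothetical tool).
tools = [{
    "type": "function",
    "function": {
        "name": "get_time",
        "description": "Get the current time",
        "parameters": {"type": "object", "properties": {}},
    },
}]
```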

### Issue: Token Parsing Errors

**Error**: `openai_harmony.HarmonyError: Unexpected token X`

**Solutions**:

1. Update the model files (see model-updates.md)
2. Verify your vLLM version is recent (a quick check is shown below)
3. Check the vLLM startup logs for warnings
4. Restart the vLLM server after any config changes
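
For step 2, one way to check the installed version from the serving environment:

```python
# Print the installed vLLM version; compare it against the latest release.
import vllm

print(vllm.__version__)
```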

## Performance Tuning

### GPU Memory

```bash
vllm serve openai/gpt-oss-20b \
  --tool-call-parser openai \
  --enable-auto-tool-choice \
  --gpu-memory-utilization 0.9
```

### Tensor Parallelism

For multi-GPU setups:

```bash
vllm serve openai/gpt-oss-20b \
  --tool-call-parser openai \
  --enable-auto-tool-choice \
  --tensor-parallel-size 2
```

### Batching

```bash
vllm serve openai/gpt-oss-20b \
  --tool-call-parser openai \
  --enable-auto-tool-choice \
  --max-num-batched-tokens 8192 \
  --max-num-seqs 256
```

## Testing Your Configuration

### Basic Test

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "messages": [{"role": "user", "content": "Calculate 15 * 7"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "calculator",
        "description": "Perform math",
        "parameters": {
          "type": "object",
          "properties": {"expr": {"type": "string"}}
        }
      }
    }],
    "tool_choice": "auto"
  }'
```
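
If this request fails outright (connection refused, model not found), first confirm the server is up and serving the expected model before debugging tool calling itself; for example:

```python
# List the models the server is serving; "openai/gpt-oss-20b" should appear.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
print([m.id for m in client.models.list()])
```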

### Expected Response

A successful response contains a `tool_calls` array with the function call:

```json
{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_123",
        "type": "function",
        "function": {
          "name": "calculator",
          "arguments": "{\"expr\": \"15 * 7\"}"
        }
      }]
    }
  }]
}
```

### Failure Indicators

- The `content` field contains text describing the calculation instead of null
- `tool_calls` is empty or null
- An error in the response about tool parsing
- A HarmonyError in the vLLM logs
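
A quick programmatic version of this check, reusing `response` from the Client Configuration example:

```python
# Distinguish a real tool call from a prose answer.
message = response.choices[0].message
if message.tool_calls:
    print("Tool calling works:", message.tool_calls[0].function.name)
else:
    # The model answered in text instead of emitting a tool call;
    # re-check the server flags and model files.
    print("No tool call; content was:", message.content)
```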

## Cross-References

- Model file updates: See model-updates.md
- Known issues: See known-issues.md