# Tool Calling Configuration for gpt-oss with vLLM

## Required vLLM Server Flags

For gpt-oss tool calling to work, vLLM must be started with specific flags.

### Minimal Configuration

```bash
vllm serve openai/gpt-oss-20b \
    --tool-call-parser openai \
    --enable-auto-tool-choice
```

### Full Configuration with Tool Server

```bash
vllm serve openai/gpt-oss-20b \
    --tool-call-parser openai \
    --enable-auto-tool-choice \
    --tool-server demo \
    --max-model-len 8192 \
    --dtype auto
```

## Flag Explanations

### --tool-call-parser openai

- **Required**: Yes
- **Purpose**: Uses the OpenAI-compatible tool calling format
- **Effect**: Enables proper parsing of tool call tokens
- **Alternatives**: None for gpt-oss compatibility

### --enable-auto-tool-choice

- **Required**: Yes
- **Purpose**: Allows automatic tool selection
- **Effect**: The model can choose which tool to call
- **Note**: Only `tool_choice="auto"` is supported

### --tool-server

- **Required**: Optional, but needed for the demo tools
- **Options**:
  - `demo`: Built-in demo tools (browser, Python interpreter)
  - `ip:port`: Custom MCP tool server
  - Multiple servers: `ip1:port1,ip2:port2`

### --max-model-len

- **Required**: No
- **Purpose**: Sets the maximum context length
- **Recommended**: 8192 or higher for tool calling contexts
- **Effect**: Prevents truncation during multi-turn tool conversations

## Tool Server Options

### Demo Tool Server

Requires the gpt-oss library:

```bash
pip install gpt-oss
```

Provides:

- Web browser tool
- Python interpreter tool

Start command:

```bash
vllm serve openai/gpt-oss-20b \
    --tool-call-parser openai \
    --enable-auto-tool-choice \
    --tool-server demo
```

### MCP Tool Servers

Start vLLM with MCP server URLs:

```bash
vllm serve openai/gpt-oss-20b \
    --tool-call-parser openai \
    --enable-auto-tool-choice \
    --tool-server localhost:5000,localhost:5001
```

Requirements:

- MCP servers must be running before vLLM starts
- Must implement the MCP protocol
- Should return results in the expected format
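
For reference, a minimal custom tool server can be sketched with the official `mcp` Python SDK. This is only an illustrative assumption, not part of vLLM or gpt-oss: the `get_weather` tool, the SSE transport choice, and the listen address are placeholders, and the address you pass to `--tool-server` must match whatever your vLLM version's MCP client actually expects.

```python
# Minimal MCP tool server sketch using the official `mcp` Python SDK.
# Assumption: your vLLM build connects to MCP servers over this transport;
# adjust the transport and the host/port to match your --tool-server value.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("my-tools")

@mcp.tool()
def get_weather(city: str) -> str:
    """Return a canned weather report for a city (illustrative only)."""
    return f"It is sunny in {city}."

if __name__ == "__main__":
    mcp.run(transport="sse")  # serves on FastMCP's configured host/port
```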

### No Tool Server (Client-Managed Tools)

For client-side tool management (e.g., llama-stack with MCP):

```bash
vllm serve openai/gpt-oss-20b \
    --tool-call-parser openai \
    --enable-auto-tool-choice
```

Tools are provided via the API request, not in the server config.

## Environment Variables

### Search Tools

For the demo tool server with search:

```bash
export EXA_API_KEY="your-exa-api-key"
```

### Python Execution

To enable local Python execution for the demo tool server (runs code via uv, without sandboxing):

```bash
export PYTHON_EXECUTION_BACKEND=dangerously_use_uv
```

**Warning**: Demo Python execution is for testing only.

## Client Configuration

### OpenAI-Compatible Clients

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed",  # vLLM doesn't require auth by default
)

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "What's 2+2?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "calculator",
            "description": "Perform calculations",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {"type": "string"}
                },
                "required": ["expression"]
            }
        }
    }],
    tool_choice="auto"  # MUST be "auto" - the only supported value
)
```
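
When the model does return a tool call, the tool has to be executed on the client and its output sent back in a follow-up request. Below is a minimal sketch continuing the example above; `run_calculator` is a hypothetical local implementation, not something provided by vLLM or gpt-oss.

```python
import json

message = response.choices[0].message

if message.tool_calls:
    call = message.tool_calls[0]
    args = json.loads(call.function.arguments)  # e.g. {"expression": "2+2"}

    # Hypothetical local tool implementation - replace with your own.
    tool_result = run_calculator(args["expression"])

    followup = client.chat.completions.create(
        model="openai/gpt-oss-20b",
        messages=[
            {"role": "user", "content": "What's 2+2?"},
            {
                "role": "assistant",
                "tool_calls": [{
                    "id": call.id,
                    "type": "function",
                    "function": {
                        "name": call.function.name,
                        "arguments": call.function.arguments,
                    },
                }],
            },
            {"role": "tool", "tool_call_id": call.id, "content": str(tool_result)},
        ],
    )
    print(followup.choices[0].message.content)
```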

### llama-stack Configuration

Example run.yaml for llama-stack with vLLM:

```yaml
inference:
  - provider_id: vllm-provider
    provider_type: remote::vllm
    config:
      url: http://localhost:8000/v1
      # No auth_credential needed if vLLM has no auth

tool_runtime:
  - provider_id: mcp-provider
    provider_type: mcp
    config:
      servers:
        - server_name: my-tools
          url: http://localhost:5000
```

## Common Configuration Issues

### Issue: tool_choice Not Working

**Symptom**: Error about an unsupported tool_choice value

**Solution**: Use only `tool_choice="auto"`; other values are not supported:

```python
# GOOD
tool_choice="auto"

# BAD - will fail
tool_choice="required"
tool_choice={"type": "function", "function": {"name": "my_func"}}
```

### Issue: Tools Not Being Called

**Symptoms**:

- Model describes tool usage in text
- No tool_calls in the response
- Empty tool_calls array

**Checklist**:

1. Verify the `--tool-call-parser openai` flag is set
2. Verify the `--enable-auto-tool-choice` flag is set
3. Check that generation_config.json is up to date (see model-updates.md)
4. Try simpler tool schemas first (see the sketch below)
5. Check vLLM logs for parsing errors
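
For step 4, a deliberately minimal schema helps separate parsing problems from schema complexity. A sketch, assuming the `client` from the OpenAI-compatible example above; the `echo` tool name is purely illustrative:

```python
# Hypothetical minimal tool used only to test that tool calls are emitted.
minimal_tool = {
    "type": "function",
    "function": {
        "name": "echo",
        "description": "Echo the input string back",
        "parameters": {
            "type": "object",
            "properties": {"text": {"type": "string"}},
            "required": ["text"],
        },
    },
}

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "Echo the word hello"}],
    tools=[minimal_tool],
    tool_choice="auto",
)
print(response.choices[0].message.tool_calls)  # should not be None or empty
```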

### Issue: Token Parsing Errors

**Error**: `openai_harmony.HarmonyError: Unexpected token X`

**Solutions**:

1. Update the model files (see model-updates.md)
2. Verify your vLLM version is recent
3. Check vLLM startup logs for warnings
4. Restart the vLLM server after any config changes

## Performance Tuning

### GPU Memory

```bash
vllm serve openai/gpt-oss-20b \
    --tool-call-parser openai \
    --enable-auto-tool-choice \
    --gpu-memory-utilization 0.9
```

### Tensor Parallelism

For multi-GPU:

```bash
vllm serve openai/gpt-oss-20b \
    --tool-call-parser openai \
    --enable-auto-tool-choice \
    --tensor-parallel-size 2
```

### Batching

```bash
vllm serve openai/gpt-oss-20b \
    --tool-call-parser openai \
    --enable-auto-tool-choice \
    --max-num-batched-tokens 8192 \
    --max-num-seqs 256
```

## Testing Your Configuration

### Basic Test

```bash
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "openai/gpt-oss-20b",
        "messages": [{"role": "user", "content": "Calculate 15 * 7"}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "calculator",
                "description": "Perform math",
                "parameters": {
                    "type": "object",
                    "properties": {"expr": {"type": "string"}}
                }
            }
        }],
        "tool_choice": "auto"
    }'
```

### Expected Response

A successful response contains a `tool_calls` array with a function call:

```json
{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_123",
        "type": "function",
        "function": {
          "name": "calculator",
          "arguments": "{\"expr\": \"15 * 7\"}"
        }
      }]
    }
  }]
}
```

### Failure Indicators

- `content` contains text describing the calculation instead of being null
- `tool_calls` is empty or null
- The response contains an error about tool parsing
- HarmonyError appears in the vLLM logs
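
These indicators can also be checked programmatically. A sketch, assuming `response` comes from an OpenAI-compatible client call made with tools and `tool_choice="auto"`:

```python
# Distinguish a working tool-calling setup from the failure modes above.
msg = response.choices[0].message

if msg.tool_calls:
    print("OK: tool call emitted:", msg.tool_calls[0].function.name)
elif msg.content:
    print("FAIL: model answered in plain text instead of calling a tool")
else:
    print("FAIL: no tool_calls and no content - check vLLM logs for HarmonyError")
```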

## Cross-References

- Model file updates: See model-updates.md
- Known issues: See known-issues.md