# Tool Calling Configuration for gpt-oss with vLLM

## Required vLLM Server Flags

For gpt-oss tool calling to work, vLLM must be started with specific flags.

### Minimal Configuration

```bash
vllm serve openai/gpt-oss-20b \
  --tool-call-parser openai \
  --enable-auto-tool-choice
```

### Full Configuration with Tool Server

```bash
vllm serve openai/gpt-oss-20b \
  --tool-call-parser openai \
  --enable-auto-tool-choice \
  --tool-server demo \
  --max-model-len 8192 \
  --dtype auto
```

## Flag Explanations

### --tool-call-parser openai

- **Required**: Yes
- **Purpose**: Uses the OpenAI-compatible tool-calling format
- **Effect**: Enables proper parsing of tool call tokens
- **Alternatives**: None for gpt-oss compatibility

### --enable-auto-tool-choice

- **Required**: Yes
- **Purpose**: Allows automatic tool selection
- **Effect**: The model can choose which tool to call
- **Note**: Only `tool_choice="auto"` is supported

### --tool-server

- **Required**: Optional; needed for the demo tools
- **Options**:
  - `demo`: Built-in demo tools (browser, Python interpreter)
  - `ip:port`: Custom MCP tool server
  - Multiple servers: `ip1:port1,ip2:port2`

### --max-model-len

- **Required**: No
- **Purpose**: Sets the maximum context length
- **Recommended**: 8192 or higher for tool-calling contexts
- **Effect**: Prevents truncation during multi-turn tool conversations
## Tool Server Options

### Demo Tool Server

Requires the gpt-oss library:

```bash
pip install gpt-oss
```

Provides:

- Web browser tool
- Python interpreter tool

Start command:

```bash
vllm serve openai/gpt-oss-20b \
  --tool-call-parser openai \
  --enable-auto-tool-choice \
  --tool-server demo
```

### MCP Tool Servers

Start vLLM with MCP server URLs:

```bash
vllm serve openai/gpt-oss-20b \
  --tool-call-parser openai \
  --enable-auto-tool-choice \
  --tool-server localhost:5000,localhost:5001
```

Requirements (a minimal server sketch follows this list):

- MCP servers must be running before vLLM starts
- Servers must implement the MCP protocol
- Servers should return results in the expected format
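
The sketch below illustrates what such a server can look like, using the FastMCP helper from the official `mcp` Python SDK. The server name, the `add` tool, port 5000, and the SSE transport are assumptions for this example rather than requirements of vLLM, and the exact FastMCP API may differ across SDK versions:

```python
# pip install mcp  (the official Model Context Protocol Python SDK)
from mcp.server.fastmcp import FastMCP

# Hypothetical server; the name, port, and tool are illustrative.
mcp = FastMCP("my-tools", port=5000)

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b

if __name__ == "__main__":
    # Assumes vLLM connects over SSE; adjust if your setup uses another transport.
    mcp.run(transport="sse")
```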

### No Tool Server (Client-Managed Tools)

For client-side tool management (e.g., llama-stack with MCP):

```bash
vllm serve openai/gpt-oss-20b \
  --tool-call-parser openai \
  --enable-auto-tool-choice
```

Tools are provided in the API request rather than in the server configuration (see Client Configuration below).

## Environment Variables

### Search Tools

For the demo tool server with search:

```bash
export EXA_API_KEY="your-exa-api-key"
```

### Python Execution

To enable Python execution in the demo tool server:

```bash
export PYTHON_EXECUTION_BACKEND=dangerously_use_uv
```

**Warning**: As the variable name suggests, demo Python execution is for testing only.

## Client Configuration

### OpenAI-Compatible Clients

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed",  # vLLM doesn't require auth by default
)

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "What's 2+2?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "calculator",
            "description": "Perform calculations",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {"type": "string"}
                },
                "required": ["expression"]
            }
        }
    }],
    tool_choice="auto",  # MUST be "auto" - the only supported value
)
```
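
At this point the response contains the tool call but not the answer; the client is expected to run the tool and send the result back. A minimal sketch continuing from the example above (the `eval`-based calculator is a stand-in for a real tool and must not be used on untrusted input):

```python
import json

# Continues from the example above: handle the returned tool call.
message = response.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    args = json.loads(call.function.arguments)

    # Illustrative only; a real tool would do proper expression parsing.
    result = str(eval(args["expression"]))

    followup = client.chat.completions.create(
        model="openai/gpt-oss-20b",
        messages=[
            {"role": "user", "content": "What's 2+2?"},
            message,  # the assistant turn containing the tool call
            {"role": "tool", "tool_call_id": call.id, "content": result},
        ],
    )
    print(followup.choices[0].message.content)
```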

### llama-stack Configuration

Example `run.yaml` for llama-stack with vLLM:

```yaml
inference:
  - provider_id: vllm-provider
    provider_type: remote::vllm
    config:
      url: http://localhost:8000/v1
      # No auth_credential needed if vLLM has no auth

tool_runtime:
  - provider_id: mcp-provider
    provider_type: mcp
    config:
      servers:
        - server_name: my-tools
          url: http://localhost:5000
```

## Common Configuration Issues

### Issue: tool_choice Not Working

**Symptom**: An error about an unsupported `tool_choice` value

**Solution**: Only use `tool_choice="auto"`; other values are not supported:

```python
# GOOD
tool_choice="auto"

# BAD - will fail
tool_choice="required"
tool_choice={"type": "function", "function": {"name": "my_func"}}
```

### Issue: Tools Not Being Called

**Symptoms**:

- The model describes tool usage in text instead of calling a tool
- No `tool_calls` in the response
- An empty `tool_calls` array

**Checklist**:

1. Verify the `--tool-call-parser openai` flag is set
2. Verify the `--enable-auto-tool-choice` flag is set
3. Check that generation_config.json is up to date (see model-updates.md)
4. Try simpler tool schemas first (see the sketch after this list)
5. Check the vLLM logs for parsing errors
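
For step 4, a deliberately minimal schema makes a good baseline: if the model calls this tool but not yours, schema complexity is the likely culprit. The `get_time` tool below is hypothetical:

```python
# A minimal tool definition with no required parameters (hypothetical tool).
tools = [{
    "type": "function",
    "function": {
        "name": "get_time",
        "description": "Get the current time",
        "parameters": {"type": "object", "properties": {}},
    },
}]
```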

### Issue: Token Parsing Errors

**Error**: `openai_harmony.HarmonyError: Unexpected token X`

**Solutions**:

1. Update the model files (see model-updates.md)
2. Verify your vLLM version is recent (a quick check is shown below)
3. Check the vLLM startup logs for warnings
4. Restart the vLLM server after any config changes
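
For step 2, one way to check the installed version from the serving environment:

```python
# Print the installed vLLM version; compare it against the latest release.
import vllm

print(vllm.__version__)
```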

## Performance Tuning

### GPU Memory

```bash
vllm serve openai/gpt-oss-20b \
  --tool-call-parser openai \
  --enable-auto-tool-choice \
  --gpu-memory-utilization 0.9
```

### Tensor Parallelism

For multi-GPU setups:

```bash
vllm serve openai/gpt-oss-20b \
  --tool-call-parser openai \
  --enable-auto-tool-choice \
  --tensor-parallel-size 2
```

### Batching

```bash
vllm serve openai/gpt-oss-20b \
  --tool-call-parser openai \
  --enable-auto-tool-choice \
  --max-num-batched-tokens 8192 \
  --max-num-seqs 256
```

## Testing Your Configuration

### Basic Test

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "messages": [{"role": "user", "content": "Calculate 15 * 7"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "calculator",
        "description": "Perform math",
        "parameters": {
          "type": "object",
          "properties": {"expr": {"type": "string"}}
        }
      }
    }],
    "tool_choice": "auto"
  }'
```
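
If this request fails outright (connection refused, model not found), first confirm the server is up and serving the expected model before debugging tool calling itself; for example:

```python
# List the models the server is serving; "openai/gpt-oss-20b" should appear.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
print([m.id for m in client.models.list()])
```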

### Expected Response

A successful response contains a `tool_calls` array with the function call:

```json
{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_123",
        "type": "function",
        "function": {
          "name": "calculator",
          "arguments": "{\"expr\": \"15 * 7\"}"
        }
      }]
    }
  }]
}
```

### Failure Indicators

- The `content` field contains text describing the calculation instead of null
- `tool_calls` is empty or null
- An error in the response about tool parsing
- A HarmonyError in the vLLM logs
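
A quick programmatic version of this check, reusing `response` from the Client Configuration example:

```python
# Distinguish a real tool call from a prose answer.
message = response.choices[0].message
if message.tool_calls:
    print("Tool calling works:", message.tool_calls[0].function.name)
else:
    # The model answered in text instead of emitting a tool call;
    # re-check the server flags and model files.
    print("No tool call; content was:", message.content)
```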

## Cross-References

- Model file updates: See model-updates.md
- Known issues: See known-issues.md