# Tool Calling Configuration for gpt-oss with vLLM

## Required vLLM Server Flags

For gpt-oss tool calling to work, vLLM must be started with specific flags.

### Minimal Configuration

```bash
vllm serve openai/gpt-oss-20b \
  --tool-call-parser openai \
  --enable-auto-tool-choice
```
### Full Configuration with Tool Server

```bash
vllm serve openai/gpt-oss-20b \
  --tool-call-parser openai \
  --enable-auto-tool-choice \
  --tool-server demo \
  --max-model-len 8192 \
  --dtype auto
```
### Flag Explanations

#### `--tool-call-parser openai`

- Required: Yes
- Purpose: Uses the OpenAI-compatible tool calling format
- Effect: Enables proper parsing of tool call tokens
- Alternatives: None for gpt-oss compatibility

#### `--enable-auto-tool-choice`

- Required: Yes
- Purpose: Allows automatic tool selection
- Effect: Model can choose which tool to call
- Note: Only `tool_choice="auto"` is supported

#### `--tool-server`

- Required: Optional, but needed for demo tools
- Options:
  - `demo`: Built-in demo tools (browser, Python interpreter)
  - `ip:port`: Custom MCP tool server
  - Multiple servers: `ip1:port1,ip2:port2`

#### `--max-model-len`

- Required: No
- Purpose: Sets maximum context length
- Recommended: 8192 or higher for tool calling contexts
- Effect: Prevents truncation during multi-turn tool conversations
## Tool Server Options

### Demo Tool Server

Requires the gpt-oss library:

```bash
pip install gpt-oss
```

Provides:

- Web browser tool
- Python interpreter tool

Start command:

```bash
vllm serve openai/gpt-oss-20b \
  --tool-call-parser openai \
  --enable-auto-tool-choice \
  --tool-server demo
```
### MCP Tool Servers

Start vLLM with MCP server URLs:

```bash
vllm serve openai/gpt-oss-20b \
  --tool-call-parser openai \
  --enable-auto-tool-choice \
  --tool-server localhost:5000,localhost:5001
```

Requirements:

- MCP servers must be running before vLLM starts
- Servers must implement the MCP protocol
- Servers should return results in the expected format (see the sketch below)
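
As a concrete reference, here is a minimal sketch of a custom tool server built with the `mcp` Python SDK (`pip install mcp`). The server name, tool, port, and transport setting are illustrative assumptions, not requirements of vLLM; check your SDK version's documentation for exact settings:

```python
# Minimal MCP tool server sketch (hypothetical example).
# Assumes the `mcp` Python SDK; the tool, name, and port are illustrative.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("my-tools", port=5000)  # port chosen to match the vLLM example

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two integers and return the sum."""
    return a + b

if __name__ == "__main__":
    mcp.run(transport="sse")  # serve over a network transport so vLLM can connect
```

Start this server first, then launch vLLM with `--tool-server localhost:5000`.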
### No Tool Server (Client-Managed Tools)

For client-side tool management (e.g., llama-stack with MCP):

```bash
vllm serve openai/gpt-oss-20b \
  --tool-call-parser openai \
  --enable-auto-tool-choice
```

Tools are provided via the API request, not server configuration.
## Environment Variables

### Search Tools

For the demo tool server with search:

```bash
export EXA_API_KEY="your-exa-api-key"
```

### Python Execution

To enable Python execution in the demo tool server:

```bash
export PYTHON_EXECUTION_BACKEND=dangerously_use_uv
```

Warning: As the backend name suggests, demo Python execution is for testing only.
## Client Configuration

### OpenAI-Compatible Clients

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed",  # vLLM doesn't require auth by default
)

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "What's 2+2?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "calculator",
            "description": "Perform calculations",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {"type": "string"}
                },
                "required": ["expression"]
            }
        }
    }],
    tool_choice="auto",  # MUST be 'auto' - only supported value
)
```
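
When the model returns a tool call, the client executes the function locally and sends the result back as a `tool` message. The following continuation of the example above is a minimal sketch; the `eval`-based calculator is a stand-in, not a real implementation:

```python
import json

message = response.choices[0].message

if message.tool_calls:
    call = message.tool_calls[0]
    args = json.loads(call.function.arguments)
    # Placeholder tool implementation; never eval untrusted input in production.
    result = str(eval(args["expression"]))

    follow_up = client.chat.completions.create(
        model="openai/gpt-oss-20b",
        messages=[
            {"role": "user", "content": "What's 2+2?"},
            # Echo the assistant turn that requested the tool call.
            {"role": "assistant", "tool_calls": [{
                "id": call.id,
                "type": "function",
                "function": {"name": call.function.name,
                             "arguments": call.function.arguments},
            }]},
            # Return the tool result keyed by the call id.
            {"role": "tool", "tool_call_id": call.id, "content": result},
        ],
    )
    print(follow_up.choices[0].message.content)
```

Pass the same `tools` list on the follow-up request if the model may need to call the tool again.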
### llama-stack Configuration

Example `run.yaml` for llama-stack with vLLM:

```yaml
inference:
  - provider_id: vllm-provider
    provider_type: remote::vllm
    config:
      url: http://localhost:8000/v1
      # No auth_credential needed if vLLM has no auth

tool_runtime:
  - provider_id: mcp-provider
    provider_type: mcp
    config:
      servers:
        - server_name: my-tools
          url: http://localhost:5000
```
## Common Configuration Issues

### Issue: tool_choice Not Working

Symptom: An error about an unsupported `tool_choice` value.

Solution: Use only `tool_choice="auto"`; other values are not supported:

```python
# GOOD
tool_choice="auto"

# BAD - will fail
tool_choice="required"
tool_choice={"type": "function", "function": {"name": "my_func"}}
```
### Issue: Tools Not Being Called

Symptoms:

- Model describes tool usage in text
- No `tool_calls` in response
- Empty `tool_calls` array

Checklist (see the diagnostic sketch after this list):

- Verify the `--tool-call-parser openai` flag is set
- Verify the `--enable-auto-tool-choice` flag is set
- Check generation_config.json is up to date (see model-updates.md)
- Try simpler tool schemas first
- Check vLLM logs for parsing errors
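
If you are unsure which failure mode you are hitting, inspecting the raw response narrows it down quickly. This sketch assumes a `response` object created as in the client example above:

```python
# Quick diagnostic for tool-calling problems.
choice = response.choices[0]
print("finish_reason:", choice.finish_reason)
print("content:", choice.message.content)
print("tool_calls:", choice.message.tool_calls)
# Healthy setup: tool_calls is populated and content is typically None.
# Prose about the tool in `content` with tool_calls=None points to a
# missing --tool-call-parser / --enable-auto-tool-choice flag.
```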
### Issue: Token Parsing Errors

Error: `openai_harmony.HarmonyError: Unexpected token X`

Solutions:

- Update model files (see model-updates.md)
- Verify the vLLM version is recent
- Check vLLM startup logs for warnings
- Restart the vLLM server after any config changes
## Performance Tuning

### GPU Memory

```bash
vllm serve openai/gpt-oss-20b \
  --tool-call-parser openai \
  --enable-auto-tool-choice \
  --gpu-memory-utilization 0.9
```

### Tensor Parallelism

For multi-GPU setups:

```bash
vllm serve openai/gpt-oss-20b \
  --tool-call-parser openai \
  --enable-auto-tool-choice \
  --tensor-parallel-size 2
```

### Batching

```bash
vllm serve openai/gpt-oss-20b \
  --tool-call-parser openai \
  --enable-auto-tool-choice \
  --max-num-batched-tokens 8192 \
  --max-num-seqs 256
```
## Testing Your Configuration

### Basic Test

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "messages": [{"role": "user", "content": "Calculate 15 * 7"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "calculator",
        "description": "Perform math",
        "parameters": {
          "type": "object",
          "properties": {"expr": {"type": "string"}}
        }
      }
    }],
    "tool_choice": "auto"
  }'
```
### Expected Response

A successful response contains a `tool_calls` array with the function call:

```json
{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_123",
        "type": "function",
        "function": {
          "name": "calculator",
          "arguments": "{\"expr\": \"15 * 7\"}"
        }
      }]
    }
  }]
}
```
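
The same smoke test can be scripted. This sketch assumes the `requests` package and the server from the curl example running on localhost:8000:

```python
import requests

# Same payload as the curl test above.
payload = {
    "model": "openai/gpt-oss-20b",
    "messages": [{"role": "user", "content": "Calculate 15 * 7"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "calculator",
            "description": "Perform math",
            "parameters": {
                "type": "object",
                "properties": {"expr": {"type": "string"}},
            },
        },
    }],
    "tool_choice": "auto",
}

resp = requests.post("http://localhost:8000/v1/chat/completions", json=payload)
resp.raise_for_status()
message = resp.json()["choices"][0]["message"]
assert message.get("tool_calls"), "no tool_calls returned - check server flags"
print(message["tool_calls"][0]["function"])
```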
### Failure Indicators

- The `content` field has text describing the calculation instead of null
- `tool_calls` is empty or null
- Error in response about tool parsing
- HarmonyError in vLLM logs
## Cross-References

- Model file updates: see model-updates.md
- Known issues: see known-issues.md