Initial commit
skills/gpt-oss-troubleshooting/reference/known-issues.md

# Known GitHub Issues for gpt-oss and vLLM

## Active Issues

### vLLM Repository

#### Issue #22519: Token Error with gpt-oss-20b Tool Calls
- **URL**: https://github.com/vllm-project/vllm/issues/22519
- **Error**: `Unexpected token 12606 while expecting start token 200006`
- **Status**: Open, To Triage
- **Model**: gpt-oss-20b
- **Symptoms**:
  - Error occurs after the model returns token 200012
  - Token 12606 decodes to "comment"
  - Hypothesis: the model incorrectly splits "commentary" into "comment" + "ary"
- **Workaround**: None currently documented

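To see for yourself what a reported token ID maps to, you can decode it locally. The sketch below is illustrative rather than taken from the issue: it assumes `tiktoken` is installed and uses the `o200k_base` encoding, which shares its regular text vocabulary with gpt-oss (the Harmony control tokens such as 200006 and 200012 are special tokens and are not part of it).

```python
# Decode a plain-text token ID from the error message (illustrative sketch).
# o200k_base covers ordinary text tokens; Harmony control tokens such as
# 200006/200012 are specials and will not decode here.
import tiktoken

enc = tiktoken.get_encoding("o200k_base")
print(repr(enc.decode([12606])))  # expected to show "comment" per issue #22519
```
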
#### Issue #22515: Same Error, Fixed by Config Update
- **URL**: https://github.com/vllm-project/vllm/issues/22515
- **Error**: Same token parsing error
- **Status**: Open
- **Fix**: Update generation_config.json from HuggingFace
  - Specific commit: 8b193b0ef83bd41b40eb71fee8f1432315e02a3e
  - User andresC98 confirmed this resolved the issue
- **Version**: Reported in vLLM v0.10.2

#### Issue #22578: gpt-oss-120b Tool Call Support
- **URL**: https://github.com/vllm-project/vllm/issues/22578
- **Error**: Chat Completions endpoint tool_call not working
- **Status**: Open
- **Model**: gpt-oss-120b
- **Symptoms**: Tool calling doesn't work correctly via /v1/chat/completions

#### Issue #22337: Empty Tool Calls Array
- **URL**: https://github.com/vllm-project/vllm/issues/22337
- **Error**: tool_calls returning empty arrays
- **Status**: Open
- **Model**: gpt-oss-120b
- **Symptoms**: Content appears in wrong format, tool_calls=[]

#### Issue #23567: Unexpected Tokens in Message Header
- **URL**: https://github.com/vllm-project/vllm/issues/23567
- **Error**: `openai_harmony.HarmonyError: unexpected tokens remaining in message header`
- **Status**: Open
- **Symptoms**: Occurs in multi-turn conversations with gpt-oss-120b
- **Version**: vLLM v0.10.1 and v0.10.1.1

#### PR #24787: Tool Call Turn Tracking
- **URL**: https://github.com/vllm-project/vllm/pull/24787
- **Title**: Pass toolcall turn to kv cache manager
- **Status**: Merged (September 2025)
- **Description**: Adds toolcall_turn parameter for tracking turns in tool-calling conversations
- **Impact**: Enables better prefix cache statistics for tool calling

### HuggingFace Discussions

#### gpt-oss-20b Discussion #80: Tool Calling Configuration
- **URL**: https://huggingface.co/openai/gpt-oss-20b/discussions/80
- **Summary**: Community discussion about tool calling best practices
- **Key Findings**:
  - Explicit tool listing in system prompt improves results
  - Better results with tool_choice='required' or 'auto'
  - Avoid requiring JSON response format
  - Configuration and prompt engineering significantly impact tool calling behavior

#### gpt-oss-120b Discussion #69: Chat Template Spec Errors
- **URL**: https://huggingface.co/openai/gpt-oss-120b/discussions/69
- **Summary**: Errors in chat template compared to spec
- **Impact**: May affect tool calling format

### openai/harmony Repository

#### Issue #33: EOS Error While Waiting for Message Header
- **URL**: https://github.com/openai/harmony/issues/33
- **Error**: `HarmonyError: Unexpected EOS while waiting for message header to complete`
- **Status**: Open
- **Context**: Core Harmony parser issue affecting message parsing

## Error Pattern Summary

### Token Mismatch Errors
- **Pattern**: `Unexpected token X while expecting start token Y`
- **Root Cause**: Model generating text tokens instead of Harmony control tokens
- **Common Triggers**: Tool calling, multi-turn conversations
- **Primary Fix**: Update generation_config.json

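When triaging these reports, it can help to pull every token-mismatch line out of the server log in one pass. This is a minimal sketch, not part of any fix above; the log path `vllm.log` is a placeholder for wherever your server writes its output.

```python
# Scan a vLLM log for Harmony token-mismatch errors and print the token IDs.
import re

PATTERN = re.compile(r"Unexpected token (\d+) while expecting start token (\d+)")

with open("vllm.log") as log:  # placeholder path; point this at your server log
    for line in log:
        match = PATTERN.search(line)
        if match:
            got, expected = map(int, match.groups())
            print(f"got token {got}, expected start token {expected}")
```
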
### Streaming Errors
- **Pattern**: Parse failures during streaming responses
- **Root Cause**: Incompatibility between request format and vLLM token generation
- **Affected**: Both 20b and 120b models

### Tool Calling Failures
- **Pattern**: Empty tool_calls arrays or text descriptions instead of calls
- **Root Cause**: Configuration issues or outdated model files
- **Primary Fix**: Correct vLLM flags and update generation_config.json

## Version Compatibility

### vLLM Versions
- **v0.10.2**: Multiple token parsing errors reported
- **v0.10.1/v0.10.1.1**: Multi-turn conversation errors
- **Latest**: Check for fixes in newer releases

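To see which of the affected releases you are running, you can print the installed version directly; a small sketch:

```python
# Report the installed vLLM version so it can be compared against the
# affected releases listed above.
from importlib.metadata import version

print("vllm", version("vllm"))
```
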
### Recommended Actions by Version
- **Pre-v0.11**: Update to the latest vLLM release and refresh model files
- **v0.11+**: Verify tool calling flags are set correctly

## Cross-References

- Model file updates: See model-updates.md
- Tool calling configuration: See tool-calling-setup.md

skills/gpt-oss-troubleshooting/reference/model-updates.md

# Updating gpt-oss Model Files

## Why Update Model Files?

The `openai_harmony.HarmonyError: Unexpected token` errors are often caused by outdated `generation_config.json` files. HuggingFace updates these files to fix token parsing issues.

## Current Configuration Files

### gpt-oss-20b generation_config.json

Latest version includes:
```json
{
  "bos_token_id": 199998,
  "do_sample": true,
  "eos_token_id": [
    200002,
    199999,
    200012
  ],
  "pad_token_id": 199999,
  "transformers_version": "4.55.0.dev0"
}
```

**Key elements**:
- **eos_token_id**: Multiple EOS tokens including 200012 (tool call completion)
- **do_sample**: Enabled for generation diversity
- **transformers_version**: Indicates compatible transformers version

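A quick way to confirm that the copy vLLM will load actually contains these values is to read it back with a few lines of Python. This is an illustrative sketch; the path below assumes a local download directory and should be adjusted to wherever your model files live.

```python
# Check that the local generation_config.json carries the expected EOS tokens,
# in particular 200012 (tool call completion).
import json
from pathlib import Path

config_path = Path("./gpt-oss-20b/generation_config.json")  # adjust to your model dir
config = json.loads(config_path.read_text())

eos = config.get("eos_token_id", [])
eos = eos if isinstance(eos, list) else [eos]
print("eos_token_id:", eos)
print("200012 present:", 200012 in eos)
```
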
### gpt-oss-120b Critical Commit

**Commit**: 8b193b0ef83bd41b40eb71fee8f1432315e02a3e
- Fixed generation_config.json
- Confirmed by user andresC98 to resolve the token parsing errors
- Applied to the gpt-oss-120b model

## How to Update Model Files

### Method 1: Re-download with HuggingFace CLI

```bash
# Install or update huggingface-hub
pip install --upgrade huggingface-hub

# For gpt-oss-20b
huggingface-cli download openai/gpt-oss-20b --local-dir ./gpt-oss-20b

# For gpt-oss-120b
huggingface-cli download openai/gpt-oss-120b --local-dir ./gpt-oss-120b
```

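If you prefer to stay in Python, the `huggingface_hub` library can fetch just the updated config file instead of the whole checkpoint. A minimal sketch, assuming the same local directory layout as the CLI example above:

```python
# Download only generation_config.json from the model repo into a local dir.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="openai/gpt-oss-20b",
    filename="generation_config.json",
    local_dir="./gpt-oss-20b",  # adjust to your model directory
)
print("updated file at:", path)
```
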
### Method 2: Manual Update via Web

1. Visit the HuggingFace model page:
   - gpt-oss-20b: https://huggingface.co/openai/gpt-oss-20b
   - gpt-oss-120b: https://huggingface.co/openai/gpt-oss-120b

2. Navigate to the "Files and versions" tab

3. Download the latest `generation_config.json`

4. Replace it in your local model directory:
   ```bash
   # Find your model directory (varies by vLLM installation)
   # Common locations:
   # ~/.cache/huggingface/hub/models--openai--gpt-oss-20b/
   # ./models/gpt-oss-20b/

   # Replace the file
   cp ~/Downloads/generation_config.json /path/to/model/directory/
   ```

### Method 3: Update with git (if the model was cloned)

```bash
cd /path/to/model/directory
git pull origin main
```

## Verification Steps

After updating:

1. **Check file contents**:
   ```bash
   cat generation_config.json
   ```

   Verify it matches the current version shown above.

2. **Check modification date**:
   ```bash
   ls -l generation_config.json
   ```

   The timestamp should be recent (after the commit date).

3. **Restart the vLLM server**:
   ```bash
   # Stop existing server
   # Start with correct flags (see tool-calling-setup.md)
   vllm serve openai/gpt-oss-20b \
     --tool-call-parser openai \
     --enable-auto-tool-choice
   ```

4. **Test tool calling**:
   ```python
   from openai import OpenAI

   client = OpenAI(base_url="http://localhost:8000/v1")

   response = client.chat.completions.create(
       model="openai/gpt-oss-20b",
       messages=[{"role": "user", "content": "What's the weather?"}],
       tools=[{
           "type": "function",
           "function": {
               "name": "get_weather",
               "description": "Get the weather",
               "parameters": {
                   "type": "object",
                   "properties": {
                       "location": {"type": "string"}
                   }
               }
           }
       }]
   )

   print(response)
   ```

## Troubleshooting Update Issues

### vLLM Not Picking Up Changes

**Symptom**: Files are updated but the errors persist

**Solutions**:
1. Clear the vLLM cache:
   ```bash
   rm -rf ~/.cache/vllm/
   ```

2. Restart vLLM with a fresh model load:
   ```bash
   # Use --download-dir to force a specific directory
   vllm serve openai/gpt-oss-20b \
     --download-dir /path/to/models \
     --tool-call-parser openai \
     --enable-auto-tool-choice
   ```

3. Check that vLLM is loading the correct model directory:
   - Look for the model path in the vLLM startup logs
   - Verify it matches where you updated the files

### File Permission Issues

```bash
# Ensure files are readable
chmod 644 generation_config.json

# Check ownership
ls -l generation_config.json
```

### Multiple Model Copies

**Problem**: vLLM might be loading the model from a different location

**Solution**:
1. Find all copies (a Python alternative follows after this list):
   ```bash
   find ~/.cache -name "generation_config.json" -path "*/gpt-oss*"
   ```

2. Update all copies or remove duplicates

3. Use an explicit `--download-dir` flag when starting vLLM

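As a Python alternative to the `find` command, `huggingface_hub` can report every cached snapshot it knows about. A small sketch, assuming the models were downloaded through the Hub cache:

```python
# List cached gpt-oss repos and where their snapshots live on disk, to confirm
# vLLM is reading the copy you actually updated.
from huggingface_hub import scan_cache_dir

for repo in scan_cache_dir().repos:
    if "gpt-oss" in repo.repo_id:
        print(repo.repo_id, "->", repo.repo_path)
```
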
## Additional Files to Check

While `generation_config.json` is the primary fix, also verify these files are current:

### config.json
Contains model architecture configuration

### tokenizer_config.json
Token encoding settings, including special tokens

### special_tokens_map.json
Maps special token strings to IDs

**To update all**:
```bash
huggingface-cli download openai/gpt-oss-20b \
  --local-dir ./gpt-oss-20b \
  --force-download
```

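After a refresh like the one above, a quick listing of modification times makes any stale copy stand out. This is an illustrative sketch; the directory path is an assumption and should point at your local model files.

```python
# Print the modification time of each companion config file so stale copies
# are easy to spot after an update.
from datetime import datetime
from pathlib import Path

model_dir = Path("./gpt-oss-20b")  # adjust to your model directory
for name in (
    "config.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
    "generation_config.json",
):
    path = model_dir / name
    if path.exists():
        stamp = datetime.fromtimestamp(path.stat().st_mtime).isoformat(timespec="seconds")
    else:
        stamp = "missing"
    print(f"{name}: {stamp}")
```
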
## When to Update

Update model files when:
- Encountering token parsing errors
- HuggingFace shows recent commits to the model repo
- vLLM error messages reference token IDs
- After vLLM version upgrades
- Community reports fixes via file updates

## Cross-References

- Known issues: See known-issues.md
- vLLM configuration: See tool-calling-setup.md

skills/gpt-oss-troubleshooting/reference/tool-calling-setup.md

# Tool Calling Configuration for gpt-oss with vLLM

## Required vLLM Server Flags

For gpt-oss tool calling to work, vLLM must be started with specific flags.

### Minimal Configuration

```bash
vllm serve openai/gpt-oss-20b \
  --tool-call-parser openai \
  --enable-auto-tool-choice
```

### Full Configuration with Tool Server

```bash
vllm serve openai/gpt-oss-20b \
  --tool-call-parser openai \
  --enable-auto-tool-choice \
  --tool-server demo \
  --max-model-len 8192 \
  --dtype auto
```

## Flag Explanations

### --tool-call-parser openai
- **Required**: Yes
- **Purpose**: Uses OpenAI-compatible tool calling format
- **Effect**: Enables proper parsing of tool call tokens
- **Alternatives**: None for gpt-oss compatibility

### --enable-auto-tool-choice
- **Required**: Yes
- **Purpose**: Allows automatic tool selection
- **Effect**: Model can choose which tool to call
- **Note**: Only `tool_choice='auto'` is supported

### --tool-server
- **Required**: Optional, but needed for demo tools
- **Options**:
  - `demo`: Built-in demo tools (browser, Python interpreter)
  - `ip:port`: Custom MCP tool server
  - Multiple servers: `ip1:port1,ip2:port2`

### --max-model-len
- **Required**: No
- **Purpose**: Sets maximum context length
- **Recommended**: 8192 or higher for tool calling contexts
- **Effect**: Prevents truncation during multi-turn tool conversations

## Tool Server Options

### Demo Tool Server

Requires the gpt-oss library:
```bash
pip install gpt-oss
```

Provides:
- Web browser tool
- Python interpreter tool

Start command:
```bash
vllm serve openai/gpt-oss-20b \
  --tool-call-parser openai \
  --enable-auto-tool-choice \
  --tool-server demo
```

### MCP Tool Servers

Start vLLM with MCP server URLs:
```bash
vllm serve openai/gpt-oss-20b \
  --tool-call-parser openai \
  --enable-auto-tool-choice \
  --tool-server localhost:5000,localhost:5001
```

Requirements (a reachability check sketch follows this list):
- MCP servers must be running before vLLM starts
- Must implement the MCP protocol
- Should return results in the expected format

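Since vLLM expects the MCP servers to be up before it starts, a simple port probe can save a failed launch. This is an illustrative sketch; the hosts and ports are the ones assumed in the example above.

```python
# Pre-flight check: confirm each MCP tool server port accepts TCP connections
# before starting vLLM.
import socket

servers = [("localhost", 5000), ("localhost", 5001)]  # match your --tool-server list
for host, port in servers:
    with socket.socket() as sock:
        sock.settimeout(2)
        reachable = sock.connect_ex((host, port)) == 0
        print(f"{host}:{port} {'reachable' if reachable else 'NOT reachable'}")
```
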
### No Tool Server (Client-Managed Tools)

For client-side tool management (e.g., llama-stack with MCP):
```bash
vllm serve openai/gpt-oss-20b \
  --tool-call-parser openai \
  --enable-auto-tool-choice
```

Tools are provided via the API request, not the server config.

## Environment Variables

### Search Tools

For the demo tool server with search:
```bash
export EXA_API_KEY="your-exa-api-key"
```

### Python Execution

To enable Python execution in the demo tool server (the backend name is a reminder that code runs unsandboxed):
```bash
export PYTHON_EXECUTION_BACKEND=dangerously_use_uv
```

**Warning**: Demo Python execution is for testing only.

## Client Configuration

### OpenAI-Compatible Clients

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed"  # vLLM doesn't require auth by default
)

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "What's 2+2?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "calculator",
            "description": "Perform calculations",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {"type": "string"}
                },
                "required": ["expression"]
            }
        }
    }],
    tool_choice="auto"  # MUST be 'auto' - only supported value
)
```

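The example above only issues the first request. To complete a tool-calling turn, the client runs the tool itself and sends the result back in a `"tool"` message. The sketch below is a hedged continuation of that example, not a prescribed gpt-oss workflow; it assumes the `client` and `response` objects from the code above and that the same tools list has been bound to a variable named `tools`.

```python
import json

# Pull out the tool call the model requested.
call = response.choices[0].message.tool_calls[0]
args = json.loads(call.function.arguments)
print("model asked for:", call.function.name, args)

# Run the tool however your application does; here we just fake a result.
result = json.dumps({"value": 4})

followup = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[
        {"role": "user", "content": "What's 2+2?"},
        # Echo the assistant turn that contained the tool call.
        response.choices[0].message.model_dump(exclude_none=True),
        # Return the tool output, keyed to the call id.
        {"role": "tool", "tool_call_id": call.id, "content": result},
    ],
    tools=tools,  # same tools list as the first request (assumed bound here)
    tool_choice="auto",
)
print(followup.choices[0].message.content)
```
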
### llama-stack Configuration

Example run.yaml for llama-stack with vLLM:
```yaml
inference:
  - provider_id: vllm-provider
    provider_type: remote::vllm
    config:
      url: http://localhost:8000/v1
      # No auth_credential needed if vLLM has no auth

tool_runtime:
  - provider_id: mcp-provider
    provider_type: mcp
    config:
      servers:
        - server_name: my-tools
          url: http://localhost:5000
```

## Common Configuration Issues

### Issue: tool_choice Not Working

**Symptom**: Error about an unsupported tool_choice value

**Solution**: Only use `tool_choice="auto"`; other values are not supported:
```python
# GOOD
tool_choice="auto"

# BAD - will fail
tool_choice="required"
tool_choice={"type": "function", "function": {"name": "my_func"}}
```

### Issue: Tools Not Being Called

**Symptoms**:
- Model describes tool usage in text
- No tool_calls in the response
- Empty tool_calls array

**Checklist** (a quick triage sketch follows this list):
1. Verify the `--tool-call-parser openai` flag is set
2. Verify the `--enable-auto-tool-choice` flag is set
3. Check that generation_config.json is up to date (see model-updates.md)
4. Try simpler tool schemas first
5. Check the vLLM logs for parsing errors

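For a fast read on which failure mode you are hitting, inspect the response object directly. A minimal sketch, assuming a `response` object returned by the client code shown earlier:

```python
# Distinguish a real tool call from the fallback where the model only
# describes the tool in plain text.
msg = response.choices[0].message
if msg.tool_calls:
    print("tool calls:", [tc.function.name for tc in msg.tool_calls])
else:
    print("no tool_calls returned; content was:", repr(msg.content))
```
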
### Issue: Token Parsing Errors

**Error**: `openai_harmony.HarmonyError: Unexpected token X`

**Solutions**:
1. Update model files (see model-updates.md)
2. Verify vLLM version is recent
3. Check vLLM startup logs for warnings
4. Restart vLLM server after any config changes

## Performance Tuning

### GPU Memory

```bash
vllm serve openai/gpt-oss-20b \
  --tool-call-parser openai \
  --enable-auto-tool-choice \
  --gpu-memory-utilization 0.9
```

### Tensor Parallelism

For multi-GPU:
```bash
vllm serve openai/gpt-oss-20b \
  --tool-call-parser openai \
  --enable-auto-tool-choice \
  --tensor-parallel-size 2
```

### Batching

```bash
vllm serve openai/gpt-oss-20b \
  --tool-call-parser openai \
  --enable-auto-tool-choice \
  --max-num-batched-tokens 8192 \
  --max-num-seqs 256
```

## Testing Your Configuration

### Basic Test

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "messages": [{"role": "user", "content": "Calculate 15 * 7"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "calculator",
        "description": "Perform math",
        "parameters": {
          "type": "object",
          "properties": {"expr": {"type": "string"}}
        }
      }
    }],
    "tool_choice": "auto"
  }'
```

### Expected Response

A successful response includes a tool_calls array with the function call:
```json
{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_123",
        "type": "function",
        "function": {
          "name": "calculator",
          "arguments": "{\"expr\": \"15 * 7\"}"
        }
      }]
    }
  }]
}
```

### Failure Indicators

- `content` field has text describing the calculation instead of null
- `tool_calls` is empty or null
- Error in the response about tool parsing
- HarmonyError in the vLLM logs

## Cross-References

- Model file updates: See model-updates.md
- Known issues: See known-issues.md