# Known GitHub Issues for gpt-oss and vLLM
## Active Issues
### vLLM Repository
#### Issue #22519: Token Error with gpt-oss-20b Tool Calls
- **URL**: https://github.com/vllm-project/vllm/issues/22519
- **Error**: `Unexpected token 12606 while expecting start token 200006`
- **Status**: Open, To Triage
- **Model**: gpt-oss-20b
- **Symptoms**:
  - Error occurs after the model returns token 200012
  - Token 12606 = "comment"
  - Hypothesis: the model incorrectly splits "commentary" into "comment" + "ary" (a decoding sketch follows this entry)
- **Workaround**: None currently documented
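You can inspect these token IDs locally. A minimal sketch, assuming `tiktoken`'s `o200k_base` vocabulary matches gpt-oss's text tokens (IDs in the 2000xx range are Harmony control tokens and are not decodable with it):
```python
# pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("o200k_base")

# The issue reports token 12606 decoding to "comment"
print(repr(enc.decode([12606])))

# If "commentary" tokenizes as "comment" + "ary", that is consistent
# with the splitting hypothesis above
print(enc.encode("commentary"))
```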
#### Issue #22515: Same Error, Fixed by Config Update
- **URL**: https://github.com/vllm-project/vllm/issues/22515
- **Error**: Same token parsing error
- **Status**: Open
- **Fix**: Update generation_config.json from HuggingFace
  - Specific commit: 8b193b0ef83bd41b40eb71fee8f1432315e02a3e
  - User andresC98 confirmed this resolved the issue
- **Version**: Reported in vLLM v0.10.2
#### Issue #22578: gpt-oss-120b Tool Call Support
- **URL**: https://github.com/vllm-project/vllm/issues/22578
- **Error**: Chat Completions endpoint tool_call not working
- **Status**: Open
- **Model**: gpt-oss-120b
- **Symptoms**: Tool calling doesn't work correctly via /v1/chat/completions
#### Issue #22337: Empty Tool Calls Array
- **URL**: https://github.com/vllm-project/vllm/issues/22337
- **Error**: tool_calls returning empty arrays
- **Status**: Open
- **Model**: gpt-oss-120b
- **Symptoms**: Content appears in wrong format, tool_calls=[]
#### Issue #23567: Unexpected Tokens in Message Header
- **URL**: https://github.com/vllm-project/vllm/issues/23567
- **Error**: `openai_harmony.HarmonyError: unexpected tokens remaining in message header`
- **Status**: Open
- **Symptoms**: Occurs in multi-turn conversations with gpt-oss-120b
- **Version**: vLLM v0.10.1 and v0.10.1.1
#### PR #24787: Tool Call Turn Tracking
- **URL**: https://github.com/vllm-project/vllm/pull/24787
- **Title**: Pass toolcall turn to kv cache manager
- **Status**: Merged (September 2025)
- **Description**: Adds toolcall_turn parameter for tracking turns in tool-calling conversations
- **Impact**: Enables better prefix cache statistics for tool calling
### HuggingFace Discussions
#### gpt-oss-20b Discussion #80: Tool Calling Configuration
- **URL**: https://huggingface.co/openai/gpt-oss-20b/discussions/80
- **Summary**: Community discussion about tool calling best practices
- **Key Findings** (illustrated after this list):
  - Explicit tool listing in the system prompt improves results
  - Better results with tool_choice='required' or 'auto' (note that vLLM currently supports only 'auto'; see tool-calling-setup.md)
  - Avoid requiring a JSON response format
  - Configuration and prompt engineering significantly impact tool calling behavior
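For example, the "explicit tool listing" finding amounts to describing the tools in the system prompt in addition to passing them in the `tools` parameter. A hypothetical sketch (the tool name and wording are illustrative, not taken from the discussion):
```python
# Illustrative only: restate the available tools in the system prompt
messages = [
    {
        "role": "system",
        "content": (
            "You can call the following tool:\n"
            "- get_weather(location: str): returns the current weather"
        ),
    },
    {"role": "user", "content": "What's the weather in Paris?"},
]
```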
#### gpt-oss-120b Discussion #69: Chat Template Spec Errors
- **URL**: https://huggingface.co/openai/gpt-oss-120b/discussions/69
- **Summary**: Errors in chat template compared to spec
- **Impact**: May affect tool calling format
### openai/harmony Repository
#### Issue #33: EOS Error While Waiting for Message Header
- **URL**: https://github.com/openai/harmony/issues/33
- **Error**: `HarmonyError: Unexpected EOS while waiting for message header to complete`
- **Status**: Open
- **Context**: Core Harmony parser issue affecting message parsing
## Error Pattern Summary
### Token Mismatch Errors
- **Pattern**: `Unexpected token X while expecting start token Y` (see the log-scanning sketch after this list)
- **Root Cause**: Model generating text tokens instead of Harmony control tokens
- **Common Triggers**: Tool calling, multi-turn conversations
- **Primary Fix**: Update generation_config.json
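A small sketch for pulling these token pairs out of vLLM logs; the regex follows the error string quoted above, and the log path is illustrative:
```python
import re

# Matches e.g. "Unexpected token 12606 while expecting start token 200006"
PATTERN = re.compile(r"Unexpected token (\d+) while expecting start token (\d+)")

with open("vllm.log", encoding="utf-8") as f:  # adjust path to your setup
    for line in f:
        m = PATTERN.search(line)
        if m:
            got, expected = map(int, m.groups())
            print(f"got token {got}, expected start token {expected}")
```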
### Streaming Errors
- **Pattern**: Parse failures during streaming responses
- **Root Cause**: The streamed token sequence does not match what the Harmony parser expects for the request format
- **Affected**: Both 20b and 120b models
### Tool Calling Failures
- **Pattern**: Empty tool_calls arrays or text descriptions instead of calls
- **Root Cause**: Configuration issues or outdated model files
- **Primary Fix**: Correct vLLM flags and update generation_config.json
## Version Compatibility
### vLLM Versions
- **v0.10.2**: Multiple token parsing errors reported
- **v0.10.1/v0.10.1.1**: Multi-turn conversation errors
- **Latest**: Check for fixes in newer releases
### Recommended Actions by Version
- **Pre-v0.11**: Update to latest, refresh model files
- **v0.11+**: Verify tool calling flags are set correctly
## Cross-References
- Model file updates: See model-updates.md
- Tool calling configuration: See tool-calling-setup.md

# Updating gpt-oss Model Files
## Why Update Model Files?
The `openai_harmony.HarmonyError: Unexpected token` errors are often caused by outdated `generation_config.json` files. HuggingFace updates these files to fix token parsing issues.
## Current Configuration Files
### gpt-oss-20b generation_config.json
Latest version includes:
```json
{
  "bos_token_id": 199998,
  "do_sample": true,
  "eos_token_id": [
    200002,
    199999,
    200012
  ],
  "pad_token_id": 199999,
  "transformers_version": "4.55.0.dev0"
}
```
**Key elements** (a verification sketch follows this list):
- **eos_token_id**: Multiple EOS tokens including 200012 (tool call completion)
- **do_sample**: Enables sampling rather than greedy decoding
- **transformers_version**: Indicates compatible transformers version
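A quick sanity check of a local copy against the version above (the path is illustrative; adjust to your layout):
```python
import json
from pathlib import Path

config_path = Path("./gpt-oss-20b/generation_config.json")  # adjust to your layout
config = json.loads(config_path.read_text())

eos = config.get("eos_token_id", [])
eos = eos if isinstance(eos, list) else [eos]

# 200012 is the tool-call completion token listed above; configs that
# predate it are a known cause of token parsing errors
if 200012 not in eos:
    print("eos_token_id is missing 200012 - update generation_config.json")
else:
    print("eos_token_id looks current:", eos)
```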
### gpt-oss-120b Critical Commit
**Commit**: 8b193b0ef83bd41b40eb71fee8f1432315e02a3e
- Fixed generation_config.json
- User andresC98 confirmed it resolves the token parsing errors (see issue #22515)
- Applied to gpt-oss-120b model
## How to Update Model Files
### Method 1: Re-download with HuggingFace CLI
```bash
# Install or update huggingface-hub
pip install --upgrade huggingface-hub
# For gpt-oss-20b
huggingface-cli download openai/gpt-oss-20b --local-dir ./gpt-oss-20b
# For gpt-oss-120b
huggingface-cli download openai/gpt-oss-120b --local-dir ./gpt-oss-120b
```
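The same download from Python, if you prefer the `huggingface_hub` API:
```python
from huggingface_hub import snapshot_download

# Downloads (or refreshes) the full snapshot, including generation_config.json
snapshot_download(repo_id="openai/gpt-oss-20b", local_dir="./gpt-oss-20b")
```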
### Method 2: Manual Update via Web
1. Visit HuggingFace model page:
- gpt-oss-20b: https://huggingface.co/openai/gpt-oss-20b
- gpt-oss-120b: https://huggingface.co/openai/gpt-oss-120b
2. Navigate to "Files and versions" tab
3. Download latest `generation_config.json`
4. Replace in your local model directory:
```bash
# Find your model directory (varies by vLLM installation)
# Common locations:
# ~/.cache/huggingface/hub/models--openai--gpt-oss-20b/
# ./models/gpt-oss-20b/
# Replace the file
cp ~/Downloads/generation_config.json /path/to/model/directory/
```
### Method 3: Update with git (if model was cloned)
```bash
cd /path/to/model/directory
git pull origin main
```
## Verification Steps
After updating:
1. **Check file contents**:
```bash
cat generation_config.json
```
Verify it matches the current version shown above.
2. **Check modification date**:
```bash
ls -l generation_config.json
```
Should be recent (after the commit date).
3. **Restart vLLM server**:
```bash
# Stop existing server
# Start with correct flags (see tool-calling-setup.md)
vllm serve openai/gpt-oss-20b \
    --tool-call-parser openai \
    --enable-auto-tool-choice
```
4. **Test tool calling**:
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1")
response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "What's the weather?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                }
            }
        }
    }]
)
print(response)
```
## Troubleshooting Update Issues
### vLLM Not Picking Up Changes
**Symptom**: Updated files but still getting errors
**Solutions**:
1. Clear vLLM cache:
```bash
rm -rf ~/.cache/vllm/
```
2. Restart vLLM with fresh model load:
```bash
# Use --download-dir to force specific directory
vllm serve openai/gpt-oss-20b \
    --download-dir /path/to/models \
    --tool-call-parser openai \
    --enable-auto-tool-choice
```
3. Check vLLM is loading the correct model directory:
- Look for model path in vLLM startup logs
- Verify it matches where you updated files
### File Permission Issues
```bash
# Ensure files are readable
chmod 644 generation_config.json
# Check ownership
ls -l generation_config.json
```
### Multiple Model Copies
**Problem**: vLLM might be loading from a different location
**Solution**:
1. Find all copies (see the Python sketch after this list):
```bash
find ~/.cache -name "generation_config.json" -path "*/gpt-oss*"
```
2. Update all copies or remove duplicates
3. Use explicit `--download-dir` flag when starting vLLM
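A Python alternative for step 1, using `huggingface_hub`'s cache scanner:
```python
from huggingface_hub import scan_cache_dir

# Enumerates every repo cached under ~/.cache/huggingface/hub
for repo in scan_cache_dir().repos:
    if "gpt-oss" in repo.repo_id:
        print(repo.repo_id, "->", repo.repo_path)
```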
## Additional Files to Check
While `generation_config.json` is the primary fix, also verify these files are current:
### config.json
Contains model architecture configuration
### tokenizer_config.json
Token encoding settings, including special tokens
### special_tokens_map.json
Maps special token strings to IDs
**To update all**:
```bash
huggingface-cli download openai/gpt-oss-20b \
    --local-dir ./gpt-oss-20b \
    --force-download
```
## When to Update
Update model files when:
- Encountering token parsing errors
- HuggingFace shows recent commits to model repo
- vLLM error messages reference token IDs
- After vLLM version upgrades
- Community reports fixes via file updates
## Cross-References
- Known issues: See known-issues.md
- vLLM configuration: See tool-calling-setup.md

# Tool Calling Configuration for gpt-oss with vLLM
## Required vLLM Server Flags
For gpt-oss tool calling to work, vLLM must be started with specific flags.
### Minimal Configuration
```bash
vllm serve openai/gpt-oss-20b \
    --tool-call-parser openai \
    --enable-auto-tool-choice
```
### Full Configuration with Tool Server
```bash
vllm serve openai/gpt-oss-20b \
    --tool-call-parser openai \
    --enable-auto-tool-choice \
    --tool-server demo \
    --max-model-len 8192 \
    --dtype auto
```
## Flag Explanations
### --tool-call-parser openai
- **Required**: Yes
- **Purpose**: Uses OpenAI-compatible tool calling format
- **Effect**: Enables proper parsing of tool call tokens
- **Alternatives**: None for gpt-oss compatibility
### --enable-auto-tool-choice
- **Required**: Yes
- **Purpose**: Allows automatic tool selection
- **Effect**: Model can choose which tool to call
- **Note**: Only `tool_choice='auto'` is supported
### --tool-server
- **Required**: No; needed only when vLLM should run tools server-side (e.g. the demo tools)
- **Options**:
  - `demo`: Built-in demo tools (browser, Python interpreter)
  - `ip:port`: Custom MCP tool server
  - Multiple servers: `ip1:port1,ip2:port2`
### --max-model-len
- **Required**: No
- **Purpose**: Sets maximum context length
- **Recommended**: 8192 or higher for tool calling contexts
- **Effect**: Prevents truncation during multi-turn tool conversations
## Tool Server Options
### Demo Tool Server
Requires the `gpt-oss` library:
```bash
pip install gpt-oss
```
Provides:
- Web browser tool
- Python interpreter tool
Start command:
```bash
vllm serve openai/gpt-oss-20b \
    --tool-call-parser openai \
    --enable-auto-tool-choice \
    --tool-server demo
```
### MCP Tool Servers
Start vLLM with MCP server URLs:
```bash
vllm serve openai/gpt-oss-20b \
    --tool-call-parser openai \
    --enable-auto-tool-choice \
    --tool-server localhost:5000,localhost:5001
```
Requirements:
- MCP servers must be running before vLLM starts
- Must implement MCP protocol
- Should return results in expected format
### No Tool Server (Client-Managed Tools)
For client-side tool management (e.g., llama-stack with MCP):
```bash
vllm serve openai/gpt-oss-20b \
    --tool-call-parser openai \
    --enable-auto-tool-choice
```
Tools are provided via API request, not server config.
## Environment Variables
### Search Tools
For demo tool server with search:
```bash
export EXA_API_KEY="your-exa-api-key"
```
### Python Execution
For safe Python execution (recommended for demo):
```bash
export PYTHON_EXECUTION_BACKEND=dangerously_use_uv
```
**Warning**: Demo Python execution is for testing only.
## Client Configuration
### OpenAI-Compatible Clients
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed"  # vLLM doesn't require auth by default
)
response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "What's 2+2?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "calculator",
            "description": "Perform calculations",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {"type": "string"}
                },
                "required": ["expression"]
            }
        }
    }],
    tool_choice="auto"  # MUST be 'auto' - only supported value
)
```
### llama-stack Configuration
Example run.yaml for llama-stack with vLLM:
```yaml
inference:
  - provider_id: vllm-provider
    provider_type: remote::vllm
    config:
      url: http://localhost:8000/v1
      # No auth_credential needed if vLLM has no auth
tool_runtime:
  - provider_id: mcp-provider
    provider_type: mcp
    config:
      servers:
        - server_name: my-tools
          url: http://localhost:5000
```
## Common Configuration Issues
### Issue: tool_choice Not Working
**Symptom**: Error about unsupported tool_choice value
**Solution**: Use only `tool_choice="auto"`; other values are not supported:
```python
# GOOD
tool_choice="auto"
# BAD - will fail
tool_choice="required"
tool_choice={"type": "function", "function": {"name": "my_func"}}
```
### Issue: Tools Not Being Called
**Symptoms**:
- Model describes tool usage in text
- No tool_calls in response
- Empty tool_calls array
**Checklist** (a scripted smoke test follows this list):
1. Verify `--tool-call-parser openai` flag is set
2. Verify `--enable-auto-tool-choice` flag is set
3. Check generation_config.json is up to date (see model-updates.md)
4. Try simpler tool schemas first
5. Check vLLM logs for parsing errors
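A minimal scripted version of this checklist, assuming the server from the examples above is running on localhost:8000:
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the weather",
            "parameters": {
                "type": "object",
                "properties": {"location": {"type": "string"}},
            },
        },
    }],
    tool_choice="auto",
)
message = response.choices[0].message
if message.tool_calls:
    print("OK:", [t.function.name for t in message.tool_calls])
else:
    # Text instead of a tool call points back at items 1-3 above
    print("FAIL: no tool_calls; content was:", message.content)
```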
### Issue: Token Parsing Errors
**Error**: `openai_harmony.HarmonyError: Unexpected token X`
**Solutions**:
1. Update model files (see model-updates.md)
2. Verify vLLM version is recent
3. Check vLLM startup logs for warnings
4. Restart vLLM server after any config changes
## Performance Tuning
### GPU Memory
```bash
vllm serve openai/gpt-oss-20b \
    --tool-call-parser openai \
    --enable-auto-tool-choice \
    --gpu-memory-utilization 0.9
```
### Tensor Parallelism
For multi-GPU:
```bash
vllm serve openai/gpt-oss-20b \
    --tool-call-parser openai \
    --enable-auto-tool-choice \
    --tensor-parallel-size 2
```
### Batching
```bash
vllm serve openai/gpt-oss-20b \
    --tool-call-parser openai \
    --enable-auto-tool-choice \
    --max-num-batched-tokens 8192 \
    --max-num-seqs 256
```
## Testing Your Configuration
### Basic Test
```bash
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-oss-20b",
"messages": [{"role": "user", "content": "Calculate 15 * 7"}],
"tools": [{
"type": "function",
"function": {
"name": "calculator",
"description": "Perform math",
"parameters": {
"type": "object",
"properties": {"expr": {"type": "string"}}
}
}
}],
"tool_choice": "auto"
}'
```
### Expected Response
A successful response contains a `tool_calls` array with the function call:
```json
{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_123",
        "type": "function",
        "function": {
          "name": "calculator",
          "arguments": "{\"expr\": \"15 * 7\"}"
        }
      }]
    }
  }]
}
```
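A sketch of the same check from Python, using the OpenAI client's response objects (the failure branch corresponds to the indicators listed next):
```python
import json

def check_tool_call(response) -> str:
    """Classify a chat completion as a working tool call or a failure."""
    message = response.choices[0].message
    if not message.tool_calls:
        # Text content instead of a call, or an empty tool_calls array
        return f"FAIL: tool_calls empty; content={message.content!r}"
    call = message.tool_calls[0]
    args = json.loads(call.function.arguments)  # arguments arrive as a JSON string
    return f"OK: {call.function.name}({args})"
```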
### Failure Indicators
- `content` field has text describing calculation instead of null
- `tool_calls` is empty or null
- Error in response about tool parsing
- HarmonyError in vLLM logs
## Cross-References
- Model file updates: See model-updates.md
- Known issues: See known-issues.md