Initial commit

2025-11-30 08:50:49 +08:00
commit 97152999f5
10 changed files with 947 additions and 0 deletions
--- a/skills/openrouter/SKILL.md
+++ b/skills/openrouter/SKILL.md
@@ -0,0 +1,319 @@
+---
+name: openrouter
+description: OpenRouter API - Unified access to 400+ AI models through one API
+---
+
+# OpenRouter Skill
+
+Guidance for developing against the OpenRouter API, which provides unified access to hundreds of AI models through a single endpoint with model routing, automatic fallbacks, and a standardized interface.
+
+## When to Use This Skill
+
+This skill should be triggered when:
+- Making API calls to multiple AI model providers through a unified interface
+- Implementing model fallback strategies or auto-routing
+- Working with OpenAI-compatible SDKs but targeting multiple providers
+- Configuring advanced sampling parameters (temperature, top_p, penalties)
+- Setting up streaming responses or structured JSON outputs
+- Comparing costs across different AI models
+- Building applications that need automatic provider failover
+- Implementing function/tool calling across different models
+- Answering questions about OpenRouter-specific features (routing, fallbacks, zero completion insurance)
+
+## Quick Reference
+
+### Basic Chat Completion (Python)
+```python
+from openai import OpenAI
+
+client = OpenAI(
+  base_url="https://openrouter.ai/api/v1",
+  api_key="<OPENROUTER_API_KEY>",
+)
+
+completion = client.chat.completions.create(
+  model="openai/gpt-4o",
+  messages=[{"role": "user", "content": "What is the meaning of life?"}]
+)
+print(completion.choices[0].message.content)
+```
+
+### Basic Chat Completion (JavaScript/TypeScript)
+```typescript
+import OpenAI from 'openai';
+
+const openai = new OpenAI({
+  baseURL: 'https://openrouter.ai/api/v1',
+  apiKey: '<OPENROUTER_API_KEY>',
+});
+
+const completion = await openai.chat.completions.create({
+  model: 'openai/gpt-4o',
+  messages: [{ role: 'user', content: 'What is the meaning of life?' }],
+});
+console.log(completion.choices[0].message);
+```
+
+### cURL Request
+```bash
+curl https://openrouter.ai/api/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
+  -d '{
+    "model": "openai/gpt-4o",
+    "messages": [{"role": "user", "content": "What is the meaning of life?"}]
+  }'
+```
+
+### Model Fallback Configuration (Python)
+```python
+completion = client.chat.completions.create(
+    model="openai/gpt-4o",
+    extra_body={
+        "models": ["anthropic/claude-3.5-sonnet", "gryphe/mythomax-l2-13b"],
+    },
+    messages=[{"role": "user", "content": "Your prompt here"}]
+)
+```
+
+### Model Fallback Configuration (TypeScript)
+```typescript
+const completion = await openai.chat.completions.create({
+    model: 'openai/gpt-4o',
+    // `models` is an OpenRouter extension; the OpenAI SDK typings may not include it
+    models: ['anthropic/claude-3.5-sonnet', 'gryphe/mythomax-l2-13b'],
+    messages: [{ role: 'user', content: 'Your prompt here' }],
+});
+```
+
+### Auto Router (Dynamic Model Selection)
+```python
+completion = client.chat.completions.create(
+    model="openrouter/auto",  # Automatically selects best model for the prompt
+    messages=[{"role": "user", "content": "Your prompt here"}]
+)
+```
+
+### Advanced Parameters Example
+```python
+completion = client.chat.completions.create(
+    model="openai/gpt-4o",
+    messages=[{"role": "user", "content": "Write a creative story"}],
+    temperature=0.8,           # Higher for creativity (0.0 to 2.0)
+    max_tokens=500,            # Limit response length
+    top_p=0.9,                 # Nucleus sampling (0.0 to 1.0)
+    frequency_penalty=0.5,     # Reduce repetition (-2.0 to 2.0)
+    presence_penalty=0.3       # Encourage topic diversity (-2.0 to 2.0)
+)
+```
+
+### Streaming Response
+```python
+stream = client.chat.completions.create(
+    model="openai/gpt-4o",
+    messages=[{"role": "user", "content": "Tell me a story"}],
+    stream=True
+)
+
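+for chunk in stream:
+    # Some chunks (for example, a final usage chunk) may carry no choices
+    if chunk.choices and chunk.choices[0].delta.content:
+        print(chunk.choices[0].delta.content, end='', flush=True)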
+```
+
+### JSON Mode (Structured Output)
+```python
+completion = client.chat.completions.create(
+    model="openai/gpt-4o",
+    messages=[{
+        "role": "user",
+        # JSON mode generally requires the prompt itself to ask for JSON
+        "content": "Extract the person's name, age, and city as JSON from: John is 30 and lives in NYC"
+    }],
+    response_format={"type": "json_object"}
+)
+```
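+
+JSON mode constrains the format of the reply, not its contents, so it is still worth validating the result before using it. The following is a minimal sketch, assuming the reply text has already been read from `completion.choices[0].message.content`; the `REQUIRED_KEYS` set and the `parse_person` helper are hypothetical names introduced only for this illustration.
+
+```python
+import json
+
+# Hypothetical schema for this example: the keys the prompt asked for
+REQUIRED_KEYS = {"name", "age", "city"}
+
+
+def parse_person(raw: str) -> dict:
+    """Parse a JSON-mode reply and check that the expected keys are present."""
+    data = json.loads(raw)  # raises json.JSONDecodeError on malformed or truncated output
+    if not isinstance(data, dict):
+        raise ValueError("expected a JSON object")
+    missing = REQUIRED_KEYS - data.keys()
+    if missing:
+        raise ValueError(f"missing keys: {sorted(missing)}")
+    return data
+
+
+# Stand-in for completion.choices[0].message.content
+sample_reply = '{"name": "John", "age": 30, "city": "NYC"}'
+person = parse_person(sample_reply)
+print(person["name"], person["age"], person["city"])
+```
+
+A reply cut off by `max_tokens` typically fails at `json.loads`, which is why the parse step is kept separate from the key check.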
+
+### Deterministic Output with Seed
+```python
+completion = client.chat.completions.create(
+    model="openai/gpt-4o",
+    messages=[{"role": "user", "content": "Generate a random number"}],
+    seed=42,            # Best-effort reproducibility; not every model or provider honors it
+    temperature=0.0     # Minimizes sampling randomness
+)
+```
+
+## Key Concepts
+
+### Model Routing
+OpenRouter provides intelligent routing capabilities:
+- **Auto Router** (`openrouter/auto`): Automatically selects a model based on your prompt, powered by Not Diamond
+- **Fallback Models**: Specify additional models that are tried in order if the primary model fails
+- **Provider Routing**: Automatically routes requests across providers for reliability
+
+### Authentication
+- Uses Bearer token authentication with API keys
+- API keys can be managed programmatically
+- Compatible with OpenAI SDK authentication patterns
+
+### Model Naming Convention
+Models use the format `provider/model-name`:
+- `openai/gpt-4o` - OpenAI's GPT-4o (the "o" stands for "omni")
+- `anthropic/claude-3.5-sonnet` - Anthropic's Claude 3.5 Sonnet
+- `google/gemini-2.0-flash-exp:free` - Google's Gemini 2.0 Flash Experimental (the `:free` suffix selects the free variant)
+- `openrouter/auto` - Auto-routing system
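+
+When model identifiers come from configuration or user input, it can help to split them into their parts before use. The following is a minimal sketch based only on the `provider/model-name` format and the `:free` suffix shown above; `ModelId` and `parse_model_id` are hypothetical helpers, not part of any SDK.
+
+```python
+from typing import NamedTuple, Optional
+
+
+class ModelId(NamedTuple):
+    provider: str
+    name: str
+    variant: Optional[str]  # e.g. "free"; None when no suffix is present
+
+
+def parse_model_id(slug: str) -> ModelId:
+    """Split 'provider/model-name[:variant]' into its components."""
+    base, _, variant = slug.partition(":")
+    provider, sep, name = base.partition("/")
+    if not sep or not provider or not name:
+        raise ValueError(f"expected 'provider/model-name', got {slug!r}")
+    return ModelId(provider, name, variant or None)
+
+
+print(parse_model_id("google/gemini-2.0-flash-exp:free"))
+print(parse_model_id("openrouter/auto"))
+```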
+
+### Sampling Parameters
+
+**Temperature** (0.0 to 2.0, default: 1.0)
+- Lower = more predictable, focused responses
+- Higher = more creative, diverse responses
+- Use low (0.0 to 0.3) for factual tasks, high (0.8 to 1.5) for creative work
+
+**Top P** (0.0 to 1.0, default: 1.0)
+- Restricts sampling to the smallest set of most likely tokens whose cumulative probability reaches P
+- The size of that set adapts to the distribution, so improbable tokens are filtered dynamically
+- Lower values favor consistency, higher values favor variety
+
+**Frequency/Presence Penalties** (-2.0 to 2.0, default: 0.0)
+- Frequency: Penalizes a token in proportion to how often it has already appeared
+- Presence: Applies a flat penalty to any token that has already appeared, regardless of count
+- Positive values reduce repetition; negative values encourage reuse
+
+**Max Tokens** (integer)
+- Sets maximum response length
+- Cannot exceed context length minus prompt length
+- Use to control costs and enforce concise replies
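+
+To make the sampling parameters above concrete, here is a toy sketch of how they are commonly applied to a next-token distribution. It is an illustration under simplifying assumptions, not the implementation used by any particular provider: the penalty formula follows the widely documented "subtract from the logit" form, and the function names and the three-token vocabulary are hypothetical.
+
+```python
+import math
+
+
+def apply_penalties(logits, counts, frequency_penalty=0.0, presence_penalty=0.0):
+    """Lower the logits of tokens that have already appeared in the output."""
+    adjusted = {}
+    for token, logit in logits.items():
+        n = counts.get(token, 0)
+        # Frequency scales with the count; presence is a flat, one-time penalty
+        adjusted[token] = logit - frequency_penalty * n - presence_penalty * (1 if n > 0 else 0)
+    return adjusted
+
+
+def softmax(logits, temperature=1.0):
+    """Convert logits to probabilities; this sketch requires temperature > 0."""
+    scaled = {t: value / temperature for t, value in logits.items()}
+    peak = max(scaled.values())  # subtract the max for numerical stability
+    exps = {t: math.exp(v - peak) for t, v in scaled.items()}
+    total = sum(exps.values())
+    return {t: e / total for t, e in exps.items()}
+
+
+def top_p_filter(probs, top_p=1.0):
+    """Keep the most likely tokens until their cumulative probability reaches top_p."""
+    kept, cumulative = {}, 0.0
+    for token, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
+        kept[token] = p
+        cumulative += p
+        if cumulative >= top_p:
+            break
+    total = sum(kept.values())
+    return {t: p / total for t, p in kept.items()}  # renormalize the survivors
+
+
+logits = {"cat": 2.0, "dog": 1.0, "eel": 0.0}
+penalized = apply_penalties(logits, {"cat": 2}, frequency_penalty=0.5)
+probs = softmax(penalized, temperature=0.8)
+print(top_p_filter(probs, top_p=0.9))
+```
+
+Lowering `temperature` sharpens the distribution before `top_p` is applied, so the two settings interact; adjusting one at a time makes their effects easier to observe.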
+
+### Response Formats
+- **Standard JSON**: Default chat completion format
+- **Streaming**: Server-Sent Events (SSE) with `stream: true`
+- **JSON Mode**: Requests syntactically valid JSON with `response_format: {"type": "json_object"}` on models that support it; output can still be truncated by `max_tokens`
+- **Structured Outputs**: Schema-validated JSON responses
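+
+When streaming is consumed without an SDK, the Server-Sent Events lines need to be parsed by hand. The following is a minimal sketch, assuming the OpenAI-compatible chunk shape (`choices[].delta.content`) and a `data: [DONE]` terminator; `iter_sse_content` is a hypothetical helper, and the comment line in the sample only illustrates that SSE comments (lines starting with `:`) should be skipped.
+
+```python
+import json
+from typing import Iterable, Iterator
+
+
+def iter_sse_content(lines: Iterable[str]) -> Iterator[str]:
+    """Yield text deltas from the lines of a chat-completions SSE stream."""
+    for line in lines:
+        line = line.strip()
+        if not line or line.startswith(":"):
+            continue  # blank separators and SSE comments carry no data
+        if not line.startswith("data:"):
+            continue  # ignore other SSE fields such as "event:" or "id:"
+        payload = line[len("data:"):].strip()
+        if payload == "[DONE]":
+            break
+        chunk = json.loads(payload)
+        for choice in chunk.get("choices") or []:
+            text = (choice.get("delta") or {}).get("content")
+            if text:
+                yield text
+
+
+# Illustrative stream; in practice the lines come from the HTTP response body
+sample = [
+    ": OPENROUTER PROCESSING",
+    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
+    "",
+    'data: {"choices": [{"delta": {"content": "lo"}}]}',
+    "data: [DONE]",
+]
+print("".join(iter_sse_content(sample)))  # prints: Hello
+```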
+
+### Advanced Features
+- **Tool/Function Calling**: Connect models to external APIs
+- **Multimodal Inputs**: Support for images, PDFs, audio
+- **Prompt Caching**: Reduce costs for repeated prompts
+- **Web Search Integration**: Enhanced responses with web data
+- **Zero Completion Insurance**: Protection against being charged for failed or empty responses
+- **Logprobs**: Access token probabilities for confidence analysis
+
+## Reference Files
+
+This skill includes comprehensive documentation in `references/`:
+
+- **llms-full.md** - Complete list of available models with metadata
+- **llms-small.md** - Curated subset of popular models
+- **llms.md** - Standard model listings
+
+Use `view` to read specific reference files when detailed model information is needed.
+
+## Working with This Skill
+
+### For Beginners
+1. Start with basic chat completion examples (Python/JavaScript/cURL above)
+2. Use the standard OpenAI SDK for easy integration
+3. Try simple model names like `openai/gpt-4o` or `anthropic/claude-3.5-sonnet`
+4. Keep parameters simple initially (just model and messages)
+
+### For Intermediate Users
+1. Implement model fallback arrays for reliability
+2. Experiment with sampling parameters (temperature, top_p)
+3. Use streaming for better UX in conversational apps
+4. Try `openrouter/auto` for automatic model selection
+5. Implement JSON mode for structured data extraction
+
+### For Advanced Users
+1. Tune multiple sampling parameters together
+2. Implement custom routing logic with fallback chains
+3. Use logprobs for confidence scoring
+4. Leverage tool/function calling capabilities
+5. Optimize costs by selecting appropriate models per task
+6. Implement prompt caching strategies
+7. Use seed parameter for reproducible testing
+
+## Common Patterns
+
+### Error Handling with Fallbacks
+```python
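+import openai
+
+try:
+    completion = client.chat.completions.create(
+        model="openai/gpt-4o",
+        extra_body={
+            "models": [
+                "anthropic/claude-3.5-sonnet",
+                "google/gemini-2.0-flash-exp:free"
+            ]
+        },
+        messages=[{"role": "user", "content": "Your prompt"}]
+    )
+    print(f"Served by: {completion.model}")  # may be a fallback model
+except openai.APIError as e:
+    # Raised when the request fails, including when the primary and all fallbacks fail
+    print(f"Request failed: {e}")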
+```
+
+### Cost-Optimized Routing
+```python
+# Use cheaper models for simple tasks
+simple_completion = client.chat.completions.create(
+    model="google/gemini-2.0-flash-exp:free",
+    messages=[{"role": "user", "content": "Simple question"}]
+)
+
+# Use premium models for complex tasks
+complex_completion = client.chat.completions.create(
+    model="openai/o1",
+    messages=[{"role": "user", "content": "Complex reasoning task"}]
+)
+```
+
+### Context-Aware Temperature
+```python
+# Low temperature for factual responses
+factual = client.chat.completions.create(
+    model="openai/gpt-4o",
+    temperature=0.2,
+    messages=[{"role": "user", "content": "What is the capital of France?"}]
+)
+
+# High temperature for creative content
+creative = client.chat.completions.create(
+    model="openai/gpt-4o",
+    temperature=1.2,
+    messages=[{"role": "user", "content": "Write a unique story opening"}]
+)
+```
+
+## Resources
+
+### Official Documentation
+- API Reference: https://openrouter.ai/docs/api-reference/overview
+- Quickstart Guide: https://openrouter.ai/docs/quickstart
+- Model List: https://openrouter.ai/docs/models
+- Parameters Guide: https://openrouter.ai/docs/api-reference/parameters
+
+### Key Endpoints
+- Chat Completions: `POST https://openrouter.ai/api/v1/chat/completions`
+- List Models: `GET https://openrouter.ai/api/v1/models`
+- Generation Info: `GET https://openrouter.ai/api/v1/generation?id=<generation_id>`
+
+## Notes
+
+- OpenRouter normalizes API schemas across all providers
+- Uses OpenAI-compatible API format for easy migration
+- Automatic provider fallback if a provider is rate-limited or down
+- Pricing is based on the model that actually served the request (important for fallbacks)
+- The response's `model` field reports which model processed the request
+- All models support streaming via Server-Sent Events
+- Compatible with popular frameworks (LangChain, Vercel AI SDK, etc.)
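+
+Because billing follows the model that actually answered, cost tracking should key off the `model` field in the response rather than the model that was requested. The following is a rough sketch under stated assumptions: the response is a plain dict in the OpenAI-compatible shape (`model`, `usage.prompt_tokens`, `usage.completion_tokens`), and the prices in `PRICES` are invented placeholders, not real OpenRouter rates.
+
+```python
+def estimate_cost(response: dict, prices_per_million: dict) -> float:
+    """Estimate the cost in USD from the model that actually served the request."""
+    model = response["model"]  # may be a fallback, not the requested model
+    usage = response["usage"]
+    prompt_price, completion_price = prices_per_million[model]
+    total = (usage["prompt_tokens"] * prompt_price
+             + usage["completion_tokens"] * completion_price)
+    return total / 1_000_000
+
+
+# Hypothetical prices in USD per million tokens: (prompt, completion)
+PRICES = {
+    "openai/gpt-4o": (1.0, 2.0),
+    "anthropic/claude-3.5-sonnet": (4.0, 8.0),
+}
+
+# Stand-in for a response where the request fell back to the second model
+response = {
+    "model": "anthropic/claude-3.5-sonnet",
+    "usage": {"prompt_tokens": 1000, "completion_tokens": 500},
+}
+print(f"{estimate_cost(response, PRICES):.4f} USD")
+```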
+
+## Best Practices
+
+1. **Always implement fallbacks** for production applications
+2. **Use appropriate temperature** based on task type (low for factual, high for creative)
+3. **Set max_tokens** to control costs and response length
+4. **Enable streaming** for better user experience in chat applications
+5. **Use JSON mode** when you need machine-parseable output, and validate the result before using it
+6. **Test with seed parameter** for more reproducible results during development (best effort, where supported)
+7. **Monitor costs** by selecting appropriate models per task
+8. **Use auto-routing** when unsure which model performs best
+9. **Implement proper error handling** for rate limits and failures
+10. **Cache prompts** for repeated requests to reduce costs
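+
+For the error-handling practice above, one common client-side pattern is retrying with exponential backoff. The following is a generic sketch rather than an OpenRouter-specific mechanism: `call_with_retries` is a hypothetical helper, the delay values are arbitrary, and `flaky_request` only simulates a call that fails twice.
+
+```python
+import random
+import time
+
+
+def call_with_retries(fn, retryable=(Exception,), max_attempts=4,
+                      base_delay=0.5, sleep=time.sleep):
+    """Call fn(), retrying on retryable exceptions with exponential backoff and jitter."""
+    for attempt in range(1, max_attempts + 1):
+        try:
+            return fn()
+        except retryable:
+            if attempt == max_attempts:
+                raise  # out of attempts: surface the last error
+            delay = base_delay * 2 ** (attempt - 1)
+            sleep(delay + random.uniform(0, delay / 2))  # jitter spreads out retries
+
+
+attempts = {"count": 0}
+
+
+def flaky_request():
+    """Stand-in for an API call that fails twice, then succeeds."""
+    attempts["count"] += 1
+    if attempts["count"] < 3:
+        raise ConnectionError("temporary failure")
+    return "ok"
+
+
+# sleep is stubbed out so the demo runs instantly
+result = call_with_retries(flaky_request, retryable=(ConnectionError,), sleep=lambda s: None)
+print(result, attempts["count"])  # prints: ok 3
+```
+
+With the OpenAI SDK, `retryable` could be narrowed to errors such as `openai.RateLimitError` and `openai.APIConnectionError`, so that authentication or validation errors are not retried.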