Initial commit
This commit is contained in:
319
skills/openrouter/SKILL.md
Normal file
319
skills/openrouter/SKILL.md
Normal file
@@ -0,0 +1,319 @@
|
||||
---
|
||||
name: openrouter
|
||||
description: OpenRouter API - Unified access to 400+ AI models through one API
|
||||
---
|
||||
|
||||
# OpenRouter Skill
|
||||
|
||||
Comprehensive assistance with OpenRouter API development, providing unified access to hundreds of AI models through a single endpoint with intelligent routing, automatic fallbacks, and standardized interfaces.
|
||||
|
||||
## When to Use This Skill
|
||||
|
||||
This skill should be triggered when:
|
||||
- Making API calls to multiple AI model providers through a unified interface
|
||||
- Implementing model fallback strategies or auto-routing
|
||||
- Working with OpenAI-compatible SDKs but targeting multiple providers
|
||||
- Configuring advanced sampling parameters (temperature, top_p, penalties)
|
||||
- Setting up streaming responses or structured JSON outputs
|
||||
- Comparing costs across different AI models
|
||||
- Building applications that need automatic provider failover
|
||||
- Implementing function/tool calling across different models
|
||||
- Questions about OpenRouter-specific features (routing, fallbacks, zero completion insurance)
|
||||
|
||||
## Quick Reference
|
||||
|
||||
### Basic Chat Completion (Python)
|
||||
```python
|
||||
from openai import OpenAI
|
||||
|
||||
client = OpenAI(
|
||||
base_url="https://openrouter.ai/api/v1",
|
||||
api_key="<OPENROUTER_API_KEY>",
|
||||
)
|
||||
|
||||
completion = client.chat.completions.create(
|
||||
model="openai/gpt-4o",
|
||||
messages=[{"role": "user", "content": "What is the meaning of life?"}]
|
||||
)
|
||||
print(completion.choices[0].message.content)
|
||||
```
|
||||
|
||||
### Basic Chat Completion (JavaScript/TypeScript)
|
||||
```typescript
|
||||
import OpenAI from 'openai';
|
||||
|
||||
const openai = new OpenAI({
|
||||
baseURL: 'https://openrouter.ai/api/v1',
|
||||
apiKey: '<OPENROUTER_API_KEY>',
|
||||
});
|
||||
|
||||
const completion = await openai.chat.completions.create({
|
||||
model: 'openai/gpt-4o',
|
||||
messages: [{"role": 'user', "content": 'What is the meaning of life?'}],
|
||||
});
|
||||
console.log(completion.choices[0].message);
|
||||
```
|
||||
|
||||
### cURL Request
|
||||
```bash
|
||||
curl https://openrouter.ai/api/v1/chat/completions \
|
||||
-H "Content-Type: application/json" \
|
||||
-H "Authorization: Bearer $OPENROUTER_API_KEY" \
|
||||
-d '{
|
||||
"model": "openai/gpt-4o",
|
||||
"messages": [{"role": "user", "content": "What is the meaning of life?"}]
|
||||
}'
|
||||
```
|
||||
|
||||
### Model Fallback Configuration (Python)
|
||||
```python
|
||||
completion = client.chat.completions.create(
|
||||
model="openai/gpt-4o",
|
||||
extra_body={
|
||||
"models": ["anthropic/claude-3.5-sonnet", "gryphe/mythomax-l2-13b"],
|
||||
},
|
||||
messages=[{"role": "user", "content": "Your prompt here"}]
|
||||
)
|
||||
```
|
||||
|
||||
### Model Fallback Configuration (TypeScript)
|
||||
```typescript
|
||||
const completion = await client.chat.completions.create({
|
||||
model: 'openai/gpt-4o',
|
||||
models: ['anthropic/claude-3.5-sonnet', 'gryphe/mythomax-l2-13b'],
|
||||
messages: [{ role: 'user', content: 'Your prompt here' }],
|
||||
});
|
||||
```
|
||||
|
||||
### Auto Router (Dynamic Model Selection)
|
||||
```python
|
||||
completion = client.chat.completions.create(
|
||||
model="openrouter/auto", # Automatically selects best model for the prompt
|
||||
messages=[{"role": "user", "content": "Your prompt here"}]
|
||||
)
|
||||
```
|
||||
|
||||
### Advanced Parameters Example
|
||||
```python
|
||||
completion = client.chat.completions.create(
|
||||
model="openai/gpt-4o",
|
||||
messages=[{"role": "user", "content": "Write a creative story"}],
|
||||
temperature=0.8, # Higher for creativity (0.0-2.0)
|
||||
max_tokens=500, # Limit response length
|
||||
top_p=0.9, # Nucleus sampling (0.0-1.0)
|
||||
frequency_penalty=0.5, # Reduce repetition (-2.0-2.0)
|
||||
presence_penalty=0.3 # Encourage topic diversity (-2.0-2.0)
|
||||
)
|
||||
```
|
||||
|
||||
### Streaming Response
|
||||
```python
|
||||
stream = client.chat.completions.create(
|
||||
model="openai/gpt-4o",
|
||||
messages=[{"role": "user", "content": "Tell me a story"}],
|
||||
stream=True
|
||||
)
|
||||
|
||||
for chunk in stream:
|
||||
if chunk.choices[0].delta.content:
|
||||
print(chunk.choices[0].delta.content, end='')
|
||||
```
|
||||
|
||||
### JSON Mode (Structured Output)
|
||||
```python
|
||||
completion = client.chat.completions.create(
|
||||
model="openai/gpt-4o",
|
||||
messages=[{
|
||||
"role": "user",
|
||||
"content": "Extract person's name, age, and city from: John is 30 and lives in NYC"
|
||||
}],
|
||||
response_format={"type": "json_object"}
|
||||
)
|
||||
```
|
||||
|
||||
### Deterministic Output with Seed
|
||||
```python
|
||||
completion = client.chat.completions.create(
|
||||
model="openai/gpt-4o",
|
||||
messages=[{"role": "user", "content": "Generate a random number"}],
|
||||
seed=42, # Same seed = same output (when supported)
|
||||
temperature=0.0 # Deterministic sampling
|
||||
)
|
||||
```
|
||||
|
||||
## Key Concepts
|
||||
|
||||
### Model Routing
|
||||
OpenRouter provides intelligent routing capabilities:
|
||||
- **Auto Router** (`openrouter/auto`): Automatically selects the best model based on your prompt using NotDiamond
|
||||
- **Fallback Models**: Specify multiple models that automatically retry if primary fails
|
||||
- **Provider Routing**: Automatically routes across providers for reliability
|
||||
|
||||
### Authentication
|
||||
- Uses Bearer token authentication with API keys
|
||||
- API keys can be managed programmatically
|
||||
- Compatible with OpenAI SDK authentication patterns
|
||||
|
||||
### Model Naming Convention
|
||||
Models use the format `provider/model-name`:
|
||||
- `openai/gpt-4o` - OpenAI's GPT-4 Optimized
|
||||
- `anthropic/claude-3.5-sonnet` - Anthropic's Claude 3.5 Sonnet
|
||||
- `google/gemini-2.0-flash-exp:free` - Google's free Gemini model
|
||||
- `openrouter/auto` - Auto-routing system
|
||||
|
||||
### Sampling Parameters
|
||||
|
||||
**Temperature** (0.0-2.0, default: 1.0)
|
||||
- Lower = more predictable, focused responses
|
||||
- Higher = more creative, diverse responses
|
||||
- Use low (0.0-0.3) for factual tasks, high (0.8-1.5) for creative work
|
||||
|
||||
**Top P** (0.0-1.0, default: 1.0)
|
||||
- Limits choices to percentage of likely tokens
|
||||
- Dynamic filtering of improbable options
|
||||
- Balance between consistency and variety
|
||||
|
||||
**Frequency/Presence Penalties** (-2.0-2.0, default: 0.0)
|
||||
- Frequency: Discourages repeating tokens proportional to use
|
||||
- Presence: Simpler penalty not scaled by count
|
||||
- Positive values reduce repetition, negative encourage reuse
|
||||
|
||||
**Max Tokens** (integer)
|
||||
- Sets maximum response length
|
||||
- Cannot exceed context length minus prompt length
|
||||
- Use to control costs and enforce concise replies
|
||||
|
||||
### Response Formats
|
||||
- **Standard JSON**: Default chat completion format
|
||||
- **Streaming**: Server-Sent Events (SSE) with `stream: true`
|
||||
- **JSON Mode**: Guaranteed valid JSON with `response_format: {"type": "json_object"}`
|
||||
- **Structured Outputs**: Schema-validated JSON responses
|
||||
|
||||
### Advanced Features
|
||||
- **Tool/Function Calling**: Connect models to external APIs
|
||||
- **Multimodal Inputs**: Support for images, PDFs, audio
|
||||
- **Prompt Caching**: Reduce costs for repeated prompts
|
||||
- **Web Search Integration**: Enhanced responses with web data
|
||||
- **Zero Completion Insurance**: Protection against failed responses
|
||||
- **Logprobs**: Access token probabilities for confidence analysis
|
||||
|
||||
## Reference Files
|
||||
|
||||
This skill includes comprehensive documentation in `references/`:
|
||||
|
||||
- **llms-full.md** - Complete list of available models with metadata
|
||||
- **llms-small.md** - Curated subset of popular models
|
||||
- **llms.md** - Standard model listings
|
||||
|
||||
Use `view` to read specific reference files when detailed model information is needed.
|
||||
|
||||
## Working with This Skill
|
||||
|
||||
### For Beginners
|
||||
1. Start with basic chat completion examples (Python/JavaScript/cURL above)
|
||||
2. Use the standard OpenAI SDK for easy integration
|
||||
3. Try simple model names like `openai/gpt-4o` or `anthropic/claude-3.5-sonnet`
|
||||
4. Keep parameters simple initially (just model and messages)
|
||||
|
||||
### For Intermediate Users
|
||||
1. Implement model fallback arrays for reliability
|
||||
2. Experiment with sampling parameters (temperature, top_p)
|
||||
3. Use streaming for better UX in conversational apps
|
||||
4. Try `openrouter/auto` for automatic model selection
|
||||
5. Implement JSON mode for structured data extraction
|
||||
|
||||
### For Advanced Users
|
||||
1. Fine-tune multiple sampling parameters together
|
||||
2. Implement custom routing logic with fallback chains
|
||||
3. Use logprobs for confidence scoring
|
||||
4. Leverage tool/function calling capabilities
|
||||
5. Optimize costs by selecting appropriate models per task
|
||||
6. Implement prompt caching strategies
|
||||
7. Use seed parameter for reproducible testing
|
||||
|
||||
## Common Patterns
|
||||
|
||||
### Error Handling with Fallbacks
|
||||
```python
|
||||
try:
|
||||
completion = client.chat.completions.create(
|
||||
model="openai/gpt-4o",
|
||||
extra_body={
|
||||
"models": [
|
||||
"anthropic/claude-3.5-sonnet",
|
||||
"google/gemini-2.0-flash-exp:free"
|
||||
]
|
||||
},
|
||||
messages=[{"role": "user", "content": "Your prompt"}]
|
||||
)
|
||||
except Exception as e:
|
||||
print(f"All models failed: {e}")
|
||||
```
|
||||
|
||||
### Cost-Optimized Routing
|
||||
```python
|
||||
# Use cheaper models for simple tasks
|
||||
simple_completion = client.chat.completions.create(
|
||||
model="google/gemini-2.0-flash-exp:free",
|
||||
messages=[{"role": "user", "content": "Simple question"}]
|
||||
)
|
||||
|
||||
# Use premium models for complex tasks
|
||||
complex_completion = client.chat.completions.create(
|
||||
model="openai/o1",
|
||||
messages=[{"role": "user", "content": "Complex reasoning task"}]
|
||||
)
|
||||
```
|
||||
|
||||
### Context-Aware Temperature
|
||||
```python
|
||||
# Low temperature for factual responses
|
||||
factual = client.chat.completions.create(
|
||||
model="openai/gpt-4o",
|
||||
temperature=0.2,
|
||||
messages=[{"role": "user", "content": "What is the capital of France?"}]
|
||||
)
|
||||
|
||||
# High temperature for creative content
|
||||
creative = client.chat.completions.create(
|
||||
model="openai/gpt-4o",
|
||||
temperature=1.2,
|
||||
messages=[{"role": "user", "content": "Write a unique story opening"}]
|
||||
)
|
||||
```
|
||||
|
||||
## Resources
|
||||
|
||||
### Official Documentation
|
||||
- API Reference: https://openrouter.ai/docs/api-reference/overview
|
||||
- Quickstart Guide: https://openrouter.ai/docs/quickstart
|
||||
- Model List: https://openrouter.ai/docs/models
|
||||
- Parameters Guide: https://openrouter.ai/docs/api-reference/parameters
|
||||
|
||||
### Key Endpoints
|
||||
- Chat Completions: `POST https://openrouter.ai/api/v1/chat/completions`
|
||||
- List Models: `GET https://openrouter.ai/api/v1/models`
|
||||
- Generation Info: `GET https://openrouter.ai/api/v1/generation`
|
||||
|
||||
## Notes
|
||||
|
||||
- OpenRouter normalizes API schemas across all providers
|
||||
- Uses OpenAI-compatible API format for easy migration
|
||||
- Automatic provider fallback if models are rate-limited or down
|
||||
- Pricing based on actual model used (important for fallbacks)
|
||||
- Response includes metadata about which model processed the request
|
||||
- All models support streaming via Server-Sent Events
|
||||
- Compatible with popular frameworks (LangChain, Vercel AI SDK, etc.)
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Always implement fallbacks** for production applications
|
||||
2. **Use appropriate temperature** based on task type (low for factual, high for creative)
|
||||
3. **Set max_tokens** to control costs and response length
|
||||
4. **Enable streaming** for better user experience in chat applications
|
||||
5. **Use JSON mode** when you need guaranteed structured output
|
||||
6. **Test with seed parameter** for reproducible results during development
|
||||
7. **Monitor costs** by selecting appropriate models per task
|
||||
8. **Use auto-routing** when unsure which model performs best
|
||||
9. **Implement proper error handling** for rate limits and failures
|
||||
10. **Cache prompts** for repeated requests to reduce costs
|
||||
Reference in New Issue
Block a user