--- name: openrouter description: OpenRouter API - Unified access to 400+ AI models through one API --- # OpenRouter Skill Comprehensive assistance with OpenRouter API development, providing unified access to hundreds of AI models through a single endpoint with intelligent routing, automatic fallbacks, and standardized interfaces. ## When to Use This Skill This skill should be triggered when: - Making API calls to multiple AI model providers through a unified interface - Implementing model fallback strategies or auto-routing - Working with OpenAI-compatible SDKs but targeting multiple providers - Configuring advanced sampling parameters (temperature, top_p, penalties) - Setting up streaming responses or structured JSON outputs - Comparing costs across different AI models - Building applications that need automatic provider failover - Implementing function/tool calling across different models - Questions about OpenRouter-specific features (routing, fallbacks, zero completion insurance) ## Quick Reference ### Basic Chat Completion (Python) ```python from openai import OpenAI client = OpenAI( base_url="https://openrouter.ai/api/v1", api_key="", ) completion = client.chat.completions.create( model="openai/gpt-4o", messages=[{"role": "user", "content": "What is the meaning of life?"}] ) print(completion.choices[0].message.content) ``` ### Basic Chat Completion (JavaScript/TypeScript) ```typescript import OpenAI from 'openai'; const openai = new OpenAI({ baseURL: 'https://openrouter.ai/api/v1', apiKey: '', }); const completion = await openai.chat.completions.create({ model: 'openai/gpt-4o', messages: [{"role": 'user', "content": 'What is the meaning of life?'}], }); console.log(completion.choices[0].message); ``` ### cURL Request ```bash curl https://openrouter.ai/api/v1/chat/completions \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $OPENROUTER_API_KEY" \ -d '{ "model": "openai/gpt-4o", "messages": [{"role": "user", "content": "What is the meaning of life?"}] }' ``` ### Model Fallback Configuration (Python) ```python completion = client.chat.completions.create( model="openai/gpt-4o", extra_body={ "models": ["anthropic/claude-3.5-sonnet", "gryphe/mythomax-l2-13b"], }, messages=[{"role": "user", "content": "Your prompt here"}] ) ``` ### Model Fallback Configuration (TypeScript) ```typescript const completion = await client.chat.completions.create({ model: 'openai/gpt-4o', models: ['anthropic/claude-3.5-sonnet', 'gryphe/mythomax-l2-13b'], messages: [{ role: 'user', content: 'Your prompt here' }], }); ``` ### Auto Router (Dynamic Model Selection) ```python completion = client.chat.completions.create( model="openrouter/auto", # Automatically selects best model for the prompt messages=[{"role": "user", "content": "Your prompt here"}] ) ``` ### Advanced Parameters Example ```python completion = client.chat.completions.create( model="openai/gpt-4o", messages=[{"role": "user", "content": "Write a creative story"}], temperature=0.8, # Higher for creativity (0.0-2.0) max_tokens=500, # Limit response length top_p=0.9, # Nucleus sampling (0.0-1.0) frequency_penalty=0.5, # Reduce repetition (-2.0-2.0) presence_penalty=0.3 # Encourage topic diversity (-2.0-2.0) ) ``` ### Streaming Response ```python stream = client.chat.completions.create( model="openai/gpt-4o", messages=[{"role": "user", "content": "Tell me a story"}], stream=True ) for chunk in stream: if chunk.choices[0].delta.content: print(chunk.choices[0].delta.content, end='') ``` ### JSON Mode (Structured Output) ```python completion = client.chat.completions.create( model="openai/gpt-4o", messages=[{ "role": "user", "content": "Extract person's name, age, and city from: John is 30 and lives in NYC" }], response_format={"type": "json_object"} ) ``` ### Deterministic Output with Seed ```python completion = client.chat.completions.create( model="openai/gpt-4o", messages=[{"role": "user", "content": "Generate a random number"}], seed=42, # Same seed = same output (when supported) temperature=0.0 # Deterministic sampling ) ``` ## Key Concepts ### Model Routing OpenRouter provides intelligent routing capabilities: - **Auto Router** (`openrouter/auto`): Automatically selects the best model based on your prompt using NotDiamond - **Fallback Models**: Specify multiple models that automatically retry if primary fails - **Provider Routing**: Automatically routes across providers for reliability ### Authentication - Uses Bearer token authentication with API keys - API keys can be managed programmatically - Compatible with OpenAI SDK authentication patterns ### Model Naming Convention Models use the format `provider/model-name`: - `openai/gpt-4o` - OpenAI's GPT-4 Optimized - `anthropic/claude-3.5-sonnet` - Anthropic's Claude 3.5 Sonnet - `google/gemini-2.0-flash-exp:free` - Google's free Gemini model - `openrouter/auto` - Auto-routing system ### Sampling Parameters **Temperature** (0.0-2.0, default: 1.0) - Lower = more predictable, focused responses - Higher = more creative, diverse responses - Use low (0.0-0.3) for factual tasks, high (0.8-1.5) for creative work **Top P** (0.0-1.0, default: 1.0) - Limits choices to percentage of likely tokens - Dynamic filtering of improbable options - Balance between consistency and variety **Frequency/Presence Penalties** (-2.0-2.0, default: 0.0) - Frequency: Discourages repeating tokens proportional to use - Presence: Simpler penalty not scaled by count - Positive values reduce repetition, negative encourage reuse **Max Tokens** (integer) - Sets maximum response length - Cannot exceed context length minus prompt length - Use to control costs and enforce concise replies ### Response Formats - **Standard JSON**: Default chat completion format - **Streaming**: Server-Sent Events (SSE) with `stream: true` - **JSON Mode**: Guaranteed valid JSON with `response_format: {"type": "json_object"}` - **Structured Outputs**: Schema-validated JSON responses ### Advanced Features - **Tool/Function Calling**: Connect models to external APIs - **Multimodal Inputs**: Support for images, PDFs, audio - **Prompt Caching**: Reduce costs for repeated prompts - **Web Search Integration**: Enhanced responses with web data - **Zero Completion Insurance**: Protection against failed responses - **Logprobs**: Access token probabilities for confidence analysis ## Reference Files This skill includes comprehensive documentation in `references/`: - **llms-full.md** - Complete list of available models with metadata - **llms-small.md** - Curated subset of popular models - **llms.md** - Standard model listings Use `view` to read specific reference files when detailed model information is needed. ## Working with This Skill ### For Beginners 1. Start with basic chat completion examples (Python/JavaScript/cURL above) 2. Use the standard OpenAI SDK for easy integration 3. Try simple model names like `openai/gpt-4o` or `anthropic/claude-3.5-sonnet` 4. Keep parameters simple initially (just model and messages) ### For Intermediate Users 1. Implement model fallback arrays for reliability 2. Experiment with sampling parameters (temperature, top_p) 3. Use streaming for better UX in conversational apps 4. Try `openrouter/auto` for automatic model selection 5. Implement JSON mode for structured data extraction ### For Advanced Users 1. Fine-tune multiple sampling parameters together 2. Implement custom routing logic with fallback chains 3. Use logprobs for confidence scoring 4. Leverage tool/function calling capabilities 5. Optimize costs by selecting appropriate models per task 6. Implement prompt caching strategies 7. Use seed parameter for reproducible testing ## Common Patterns ### Error Handling with Fallbacks ```python try: completion = client.chat.completions.create( model="openai/gpt-4o", extra_body={ "models": [ "anthropic/claude-3.5-sonnet", "google/gemini-2.0-flash-exp:free" ] }, messages=[{"role": "user", "content": "Your prompt"}] ) except Exception as e: print(f"All models failed: {e}") ``` ### Cost-Optimized Routing ```python # Use cheaper models for simple tasks simple_completion = client.chat.completions.create( model="google/gemini-2.0-flash-exp:free", messages=[{"role": "user", "content": "Simple question"}] ) # Use premium models for complex tasks complex_completion = client.chat.completions.create( model="openai/o1", messages=[{"role": "user", "content": "Complex reasoning task"}] ) ``` ### Context-Aware Temperature ```python # Low temperature for factual responses factual = client.chat.completions.create( model="openai/gpt-4o", temperature=0.2, messages=[{"role": "user", "content": "What is the capital of France?"}] ) # High temperature for creative content creative = client.chat.completions.create( model="openai/gpt-4o", temperature=1.2, messages=[{"role": "user", "content": "Write a unique story opening"}] ) ``` ## Resources ### Official Documentation - API Reference: https://openrouter.ai/docs/api-reference/overview - Quickstart Guide: https://openrouter.ai/docs/quickstart - Model List: https://openrouter.ai/docs/models - Parameters Guide: https://openrouter.ai/docs/api-reference/parameters ### Key Endpoints - Chat Completions: `POST https://openrouter.ai/api/v1/chat/completions` - List Models: `GET https://openrouter.ai/api/v1/models` - Generation Info: `GET https://openrouter.ai/api/v1/generation` ## Notes - OpenRouter normalizes API schemas across all providers - Uses OpenAI-compatible API format for easy migration - Automatic provider fallback if models are rate-limited or down - Pricing based on actual model used (important for fallbacks) - Response includes metadata about which model processed the request - All models support streaming via Server-Sent Events - Compatible with popular frameworks (LangChain, Vercel AI SDK, etc.) ## Best Practices 1. **Always implement fallbacks** for production applications 2. **Use appropriate temperature** based on task type (low for factual, high for creative) 3. **Set max_tokens** to control costs and response length 4. **Enable streaming** for better user experience in chat applications 5. **Use JSON mode** when you need guaranteed structured output 6. **Test with seed parameter** for reproducible results during development 7. **Monitor costs** by selecting appropriate models per task 8. **Use auto-routing** when unsure which model performs best 9. **Implement proper error handling** for rate limits and failures 10. **Cache prompts** for repeated requests to reduce costs