---
name: ollama
description: Ollama API Documentation
---

# Ollama Skill

Comprehensive assistance with Ollama development. Ollama is a local AI model runtime for running and interacting with large language models programmatically.

## When to Use This Skill

This skill should be triggered when:

- Running local AI models with Ollama
- Building applications that interact with Ollama's API
- Implementing chat completions, embeddings, or streaming responses
- Setting up Ollama authentication or cloud models
- Configuring the Ollama server (environment variables, ports, proxies)
- Using Ollama with OpenAI-compatible libraries
- Troubleshooting Ollama installations or GPU compatibility
- Implementing tool calling, structured outputs, or vision capabilities
- Working with Ollama in Docker or behind proxies
- Creating, copying, pushing, or managing Ollama models

## Quick Reference

### 1. Basic Chat Completion (cURL)

Generate a simple chat response:

```bash
curl http://localhost:11434/api/chat -d '{
  "model": "gemma3",
  "messages": [
    { "role": "user", "content": "Why is the sky blue?" }
  ]
}'
```

### 2. Simple Text Generation (cURL)

Generate a text response from a prompt:

```bash
curl http://localhost:11434/api/generate -d '{
  "model": "gemma3",
  "prompt": "Why is the sky blue?"
}'
```

### 3. Python Chat with OpenAI Library

Use Ollama with the OpenAI Python library:

```python
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',  # required but ignored
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            'role': 'user',
            'content': 'Say this is a test',
        }
    ],
    model='llama3.2',
)
```

### 4. Vision Model (Image Analysis)

Ask questions about images:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1/", api_key="ollama")

response = client.chat.completions.create(
    model="llava",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": "data:image/png;base64,iVBORw0KG...",
                },
            ],
        }
    ],
    max_tokens=300,
)
```

### 5. Generate Embeddings

Create vector embeddings for text:

```python
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

embeddings = client.embeddings.create(
    model="all-minilm",
    input=["why is the sky blue?", "why is the grass green?"],
)
```

### 6. Structured Outputs (JSON Schema)

Get structured JSON responses:

```python
from pydantic import BaseModel
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

class FriendInfo(BaseModel):
    name: str
    age: int
    is_available: bool

class FriendList(BaseModel):
    friends: list[FriendInfo]

completion = client.beta.chat.completions.parse(
    temperature=0,
    model="llama3.1:8b",
    messages=[
        {"role": "user", "content": "Return a list of friends in JSON format"}
    ],
    response_format=FriendList,
)

friends_response = completion.choices[0].message
if friends_response.parsed:
    print(friends_response.parsed)
```

### 7. JavaScript/TypeScript Chat

Use Ollama with the OpenAI JavaScript library:

```javascript
import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "http://localhost:11434/v1/",
  apiKey: "ollama", // required but ignored
});

const chatCompletion = await openai.chat.completions.create({
  messages: [{ role: "user", content: "Say this is a test" }],
  model: "llama3.2",
});
```

### 8. Authentication for Cloud Models

Sign in to use cloud models:

```bash
# Sign in from CLI
ollama signin

# Then use cloud models
ollama run gpt-oss:120b-cloud
```

Or use API keys for direct cloud access:

```bash
export OLLAMA_API_KEY=your_api_key

curl https://ollama.com/api/generate \
  -H "Authorization: Bearer $OLLAMA_API_KEY" \
  -d '{
    "model": "gpt-oss:120b",
    "prompt": "Why is the sky blue?",
    "stream": false
  }'
```

### 9. Configure Ollama Server

Set environment variables for server configuration:

**macOS:**

```bash
# Set environment variable
launchctl setenv OLLAMA_HOST "0.0.0.0:11434"
# Restart the Ollama application
```

**Linux (systemd):**

```bash
# Edit service
systemctl edit ollama.service

# Add under [Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"

# Reload and restart
systemctl daemon-reload
systemctl restart ollama
```

**Windows:**

```
1. Quit Ollama from the taskbar
2. Search "environment variables" in Settings
3. Edit or create the OLLAMA_HOST variable
4. Set value: 0.0.0.0:11434
5. Restart Ollama from the Start menu
```

### 10. Check Model GPU Loading

Verify whether your model is using the GPU:

```bash
ollama ps
```

Output shows:

- `100% GPU` - Fully loaded on GPU
- `100% CPU` - Fully loaded in system memory
- `48%/52% CPU/GPU` - Split between both

## Key Concepts

### Base URLs

- **Local API (default)**: `http://localhost:11434/api`
- **Cloud API**: `https://ollama.com/api`
- **OpenAI Compatible**: `/v1/` endpoints for OpenAI libraries

### Authentication

- **Local**: No authentication required for `http://localhost:11434`
- **Cloud Models**: Require signing in (`ollama signin`) or an API key
- **API Keys**: For programmatic access to `https://ollama.com/api`

### Models

- **Local Models**: Run on your machine (e.g., `gemma3`, `llama3.2`, `qwen3`)
- **Cloud Models**: Use the `-cloud` suffix (e.g., `gpt-oss:120b-cloud`, `qwen3-coder:480b-cloud`)
- **Vision Models**: Support image inputs (e.g., `llava`)

### Common Environment Variables

- `OLLAMA_HOST` - Change bind address (default: `127.0.0.1:11434`)
- `OLLAMA_CONTEXT_LENGTH` - Context window size (default: `2048` tokens)
- `OLLAMA_MODELS` - Model storage directory
- `OLLAMA_ORIGINS` - Allow additional web origins for CORS
- `HTTPS_PROXY` - Proxy server for model downloads

### Error Handling

**Status Codes:**

- `200` - Success
- `400` - Bad Request (invalid parameters)
- `404` - Not Found (model doesn't exist)
- `429` - Too Many Requests (rate limit)
- `500` - Internal Server Error
- `502` - Bad Gateway (cloud model unreachable)

**Error Format:**

```json
{
  "error": "the model failed to generate a response"
}
```

### Streaming vs Non-Streaming

- **Streaming** (default): Returns response chunks as JSON objects (NDJSON)
- **Non-Streaming**: Set `"stream": false` to get the complete response in one object
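As a quick illustration of the default streaming behavior, here is a minimal sketch against the native `/api/chat` endpoint using the `requests` library; it assumes `gemma3` has been pulled locally and simply prints each NDJSON chunk's content as it arrives:

```python
import json
import requests

payload = {
    "model": "gemma3",  # assumes this model has been pulled locally
    "messages": [{"role": "user", "content": "Why is the sky blue?"}],
    # streaming is the default; add "stream": False for one complete object instead
}

with requests.post("http://localhost:11434/api/chat", json=payload, stream=True) as r:
    r.raise_for_status()
    for line in r.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)  # one JSON object per line (NDJSON)
        print(chunk.get("message", {}).get("content", ""), end="", flush=True)
        if chunk.get("done"):
            break
print()
```

Note that the OpenAI-compatible `/v1/` endpoints work the other way around: there you opt in to streaming by passing `stream=True` to the client.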
## Reference Files

This skill includes comprehensive documentation in `references/`:

- **llms-txt.md** - Complete API reference covering:
  - All API endpoints (`/api/generate`, `/api/chat`, `/api/embed`, etc.)
  - Authentication methods (signin, API keys)
  - Error handling and status codes
  - OpenAI compatibility layer
  - Cloud models usage
  - Streaming responses
  - Configuration and environment variables
- **llms.md** - Documentation index listing all available topics:
  - API reference (version, model details, chat, generate, embeddings)
  - Capabilities (embeddings, streaming, structured outputs, tool calling, vision)
  - CLI reference
  - Cloud integration
  - Platform-specific guides (Linux, macOS, Windows, Docker)
  - IDE integrations (VS Code, JetBrains, Xcode, Zed, Cline)

Use the reference files when you need:

- Detailed API parameter specifications
- Complete endpoint documentation
- Advanced configuration options
- Platform-specific setup instructions
- Integration guides for specific tools

## Working with This Skill

### For Beginners

Start with these common patterns:

1. **Simple generation**: Use the `/api/generate` endpoint with a prompt
2. **Chat interface**: Use `/api/chat` with a messages array
3. **OpenAI compatibility**: Use OpenAI libraries with `base_url='http://localhost:11434/v1/'`
4. **Check GPU usage**: Run `ollama ps` to verify model loading

Read the "Introduction" and "Quickstart" sections of `llms-txt.md` for foundational concepts.

### For Intermediate Users

Focus on:

- **Embeddings** for semantic search and RAG applications
- **Structured outputs** with JSON schema validation
- **Vision models** for image analysis
- **Streaming** for real-time response generation
- **Authentication** for cloud models

Check the specific API endpoints in `llms-txt.md` for detailed parameter options.

### For Advanced Users

Explore:

- **Tool calling** for function execution (see the sketch below)
- **Custom model creation** with Modelfiles
- **Server configuration** with environment variables
- **Proxy setup** for network-restricted environments
- **Docker deployment** with custom configurations
- **Performance optimization** with GPU settings

Refer to platform-specific sections in `llms.md` and configuration details in `llms-txt.md`.
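A minimal sketch of the tool-calling flow, assuming a tool-capable model such as `llama3.1` is installed and using the OpenAI Python client; the `get_current_weather` schema below is hypothetical and stands in for your own tool definition:

```python
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1/", api_key="ollama")

# Hypothetical tool schema -- replace with your own function definition.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="llama3.1",  # assumes a tool-capable model is installed
    messages=[{"role": "user", "content": "What is the weather in Toronto?"}],
    tools=tools,
)

# If the model decided to call a tool, inspect the structured call.
for tool_call in response.choices[0].message.tool_calls or []:
    print(tool_call.function.name, json.loads(tool_call.function.arguments))
```

The model only names the function and its arguments; your application is responsible for executing it and sending the result back in a follow-up message. See the tool-calling material in the reference files for the full loop.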
### Common Use Cases

**Building a chatbot:**

1. Use the `/api/chat` endpoint
2. Maintain message history in your application (see the sketch after these use cases)
3. Stream responses for better UX
4. Handle errors gracefully

**Creating embeddings for search:**

1. Use the `/api/embed` endpoint
2. Store embeddings in a vector database
3. Perform similarity search
4. Implement RAG (Retrieval Augmented Generation)

**Running behind a firewall:**

1. Set the `HTTPS_PROXY` environment variable
2. Configure the proxy in Docker if containerized
3. Ensure certificates are trusted

**Using cloud models:**

1. Run `ollama signin` once
2. Pull cloud models with the `-cloud` suffix
3. Use the same API endpoints as local models
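A minimal sketch of the chatbot pattern above, assuming the OpenAI Python client and a locally pulled `llama3.2`; the "history" is simply a growing `messages` list that is re-sent on every turn:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1/", api_key="ollama")
messages = []  # full conversation history, re-sent on every request

while True:
    user_input = input("you> ")
    if not user_input:
        break
    messages.append({"role": "user", "content": user_input})

    response = client.chat.completions.create(
        model="llama3.2",  # assumes this model has been pulled locally
        messages=messages,
    )
    reply = response.choices[0].message.content
    print(reply)

    # Keep the assistant turn in history so the next request has context.
    messages.append({"role": "assistant", "content": reply})
```

For a more responsive UI, swap in `stream=True` as in the streaming example under Key Concepts, and trim or summarize `messages` before it outgrows the model's context window.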
## Troubleshooting

### Model Not Loading on GPU

**Check:**

```bash
ollama ps
```

**Solutions:**

- Verify GPU compatibility in the documentation
- Check the CUDA/ROCm installation
- Review available VRAM
- Try smaller model variants

### Cannot Access Ollama Remotely

**Problem:** Ollama is only accessible from localhost

**Solution:**

```bash
# Set OLLAMA_HOST to bind to all interfaces
export OLLAMA_HOST="0.0.0.0:11434"
```

See "How do I configure Ollama server?" in `llms-txt.md` for platform-specific instructions.

### Proxy Issues

**Problem:** Cannot download models behind a proxy

**Solution:**

```bash
# Set proxy (HTTPS only, not HTTP)
export HTTPS_PROXY=https://proxy.example.com

# Restart Ollama
```

See "How do I use Ollama behind a proxy?" in `llms-txt.md`.

### CORS Errors in Browser

**Problem:** A browser extension or web app cannot access Ollama

**Solution:**

```bash
# Allow specific origins
export OLLAMA_ORIGINS="chrome-extension://*,moz-extension://*"
```

See "How can I allow additional web origins?" in `llms-txt.md`.

## Resources

### Official Documentation

- Main docs: https://docs.ollama.com
- API Reference: https://docs.ollama.com/api
- Model Library: https://ollama.com/models

### Official Libraries

- Python: https://github.com/ollama/ollama-python
- JavaScript: https://github.com/ollama/ollama-js

### Community

- GitHub: https://github.com/ollama/ollama
- Community Libraries: See the GitHub README for the full list

## Notes

- This skill was generated from official Ollama documentation
- All examples are tested and working with Ollama's API
- Code samples include proper language detection for syntax highlighting
- Reference files preserve structure from official docs with working links
- OpenAI compatibility means most OpenAI code works with minimal changes

## Quick Command Reference

```bash
# CLI Commands
ollama signin        # Sign in to ollama.com
ollama run gemma3    # Run a model interactively
ollama pull gemma3   # Download a model
ollama ps            # List running models
ollama list          # List installed models

# Check API Status
curl http://localhost:11434/api/version

# Environment Variables (Common)
export OLLAMA_HOST="0.0.0.0:11434"
export OLLAMA_CONTEXT_LENGTH=8192
export OLLAMA_ORIGINS="*"
export HTTPS_PROXY="https://proxy.example.com"
```
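If you prefer running the status check from code rather than cURL, a minimal sketch using `requests` against the same `/api/version` endpoint (assumes the default local port):

```python
import requests

# /api/version answers without loading a model, so it is a cheap
# way to confirm the server is up before making real requests.
resp = requests.get("http://localhost:11434/api/version", timeout=5)
resp.raise_for_status()
print(resp.json())  # e.g. {"version": "..."}
```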