471 lines
12 KiB
Markdown
471 lines
12 KiB
Markdown
---
|
|
name: ollama
|
|
description: Ollama API Documentation
|
|
---
|
|
|
|
# Ollama Skill
|
|
|
|
Comprehensive assistance with Ollama development - the local AI model runtime for running and interacting with large language models programmatically.
|
|
|
|
## When to Use This Skill
|
|
|
|
This skill should be triggered when:
|
|
- Running local AI models with Ollama
|
|
- Building applications that interact with Ollama's API
|
|
- Implementing chat completions, embeddings, or streaming responses
|
|
- Setting up Ollama authentication or cloud models
|
|
- Configuring Ollama server (environment variables, ports, proxies)
|
|
- Using Ollama with OpenAI-compatible libraries
|
|
- Troubleshooting Ollama installations or GPU compatibility
|
|
- Implementing tool calling, structured outputs, or vision capabilities
|
|
- Working with Ollama in Docker or behind proxies
|
|
- Creating, copying, pushing, or managing Ollama models
|
|
|
|
## Quick Reference
|
|
|
|
### 1. Basic Chat Completion (cURL)
|
|
|
|
Generate a simple chat response:
|
|
|
|
```bash
|
|
curl http://localhost:11434/api/chat -d '{
|
|
"model": "gemma3",
|
|
"messages": [
|
|
{
|
|
"role": "user",
|
|
"content": "Why is the sky blue?"
|
|
}
|
|
]
|
|
}'
|
|
```
|
|
|
|
### 2. Simple Text Generation (cURL)
|
|
|
|
Generate a text response from a prompt:
|
|
|
|
```bash
|
|
curl http://localhost:11434/api/generate -d '{
|
|
"model": "gemma3",
|
|
"prompt": "Why is the sky blue?"
|
|
}'
|
|
```
|
|
|
|
### 3. Python Chat with OpenAI Library
|
|
|
|
Use Ollama with the OpenAI Python library:
|
|
|
|
```python
|
|
from openai import OpenAI
|
|
|
|
client = OpenAI(
|
|
base_url='http://localhost:11434/v1/',
|
|
api_key='ollama', # required but ignored
|
|
)
|
|
|
|
chat_completion = client.chat.completions.create(
|
|
messages=[
|
|
{
|
|
'role': 'user',
|
|
'content': 'Say this is a test',
|
|
}
|
|
],
|
|
model='llama3.2',
|
|
)
|
|
```
|
|
|
|
### 4. Vision Model (Image Analysis)
|
|
|
|
Ask questions about images:
|
|
|
|
```python
|
|
from openai import OpenAI
|
|
|
|
client = OpenAI(base_url="http://localhost:11434/v1/", api_key="ollama")
|
|
|
|
response = client.chat.completions.create(
|
|
model="llava",
|
|
messages=[
|
|
{
|
|
"role": "user",
|
|
"content": [
|
|
{"type": "text", "text": "What's in this image?"},
|
|
{
|
|
"type": "image_url",
|
|
"image_url": "...",
|
|
},
|
|
],
|
|
}
|
|
],
|
|
max_tokens=300,
|
|
)
|
|
```
|
|
|
|
### 5. Generate Embeddings
|
|
|
|
Create vector embeddings for text:
|
|
|
|
```python
|
|
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
|
|
|
|
embeddings = client.embeddings.create(
|
|
model="all-minilm",
|
|
input=["why is the sky blue?", "why is the grass green?"],
|
|
)
|
|
```
|
|
|
|
### 6. Structured Outputs (JSON Schema)
|
|
|
|
Get structured JSON responses:
|
|
|
|
```python
|
|
from pydantic import BaseModel
|
|
from openai import OpenAI
|
|
|
|
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
|
|
|
|
class FriendInfo(BaseModel):
|
|
name: str
|
|
age: int
|
|
is_available: bool
|
|
|
|
class FriendList(BaseModel):
|
|
friends: list[FriendInfo]
|
|
|
|
completion = client.beta.chat.completions.parse(
|
|
temperature=0,
|
|
model="llama3.1:8b",
|
|
messages=[
|
|
{"role": "user", "content": "Return a list of friends in JSON format"}
|
|
],
|
|
response_format=FriendList,
|
|
)
|
|
|
|
friends_response = completion.choices[0].message
|
|
if friends_response.parsed:
|
|
print(friends_response.parsed)
|
|
```
|
|
|
|
### 7. JavaScript/TypeScript Chat
|
|
|
|
Use Ollama with the OpenAI JavaScript library:
|
|
|
|
```javascript
|
|
import OpenAI from "openai";
|
|
|
|
const openai = new OpenAI({
|
|
baseURL: "http://localhost:11434/v1/",
|
|
apiKey: "ollama", // required but ignored
|
|
});
|
|
|
|
const chatCompletion = await openai.chat.completions.create({
|
|
messages: [{ role: "user", content: "Say this is a test" }],
|
|
model: "llama3.2",
|
|
});
|
|
```
|
|
|
|
### 8. Authentication for Cloud Models
|
|
|
|
Sign in to use cloud models:
|
|
|
|
```bash
|
|
# Sign in from CLI
|
|
ollama signin
|
|
|
|
# Then use cloud models
|
|
ollama run gpt-oss:120b-cloud
|
|
```
|
|
|
|
Or use API keys for direct cloud access:
|
|
|
|
```bash
|
|
export OLLAMA_API_KEY=your_api_key
|
|
|
|
curl https://ollama.com/api/generate \
|
|
-H "Authorization: Bearer $OLLAMA_API_KEY" \
|
|
-d '{
|
|
"model": "gpt-oss:120b",
|
|
"prompt": "Why is the sky blue?",
|
|
"stream": false
|
|
}'
|
|
```
|
|
|
|
### 9. Configure Ollama Server
|
|
|
|
Set environment variables for server configuration:
|
|
|
|
**macOS:**
|
|
```bash
|
|
# Set environment variable
|
|
launchctl setenv OLLAMA_HOST "0.0.0.0:11434"
|
|
|
|
# Restart Ollama application
|
|
```
|
|
|
|
**Linux (systemd):**
|
|
```bash
|
|
# Edit service
|
|
systemctl edit ollama.service
|
|
|
|
# Add under [Service]
|
|
Environment="OLLAMA_HOST=0.0.0.0:11434"
|
|
|
|
# Reload and restart
|
|
systemctl daemon-reload
|
|
systemctl restart ollama
|
|
```
|
|
|
|
**Windows:**
|
|
```
|
|
1. Quit Ollama from task bar
|
|
2. Search "environment variables" in Settings
|
|
3. Edit or create OLLAMA_HOST variable
|
|
4. Set value: 0.0.0.0:11434
|
|
5. Restart Ollama from Start menu
|
|
```
|
|
|
|
### 10. Check Model GPU Loading
|
|
|
|
Verify if your model is using GPU:
|
|
|
|
```bash
|
|
ollama ps
|
|
```
|
|
|
|
Output shows:
|
|
- `100% GPU` - Fully loaded on GPU
|
|
- `100% CPU` - Fully loaded in system memory
|
|
- `48%/52% CPU/GPU` - Split between both
|
|
|
|
## Key Concepts
|
|
|
|
### Base URLs
|
|
|
|
- **Local API (default)**: `http://localhost:11434/api`
|
|
- **Cloud API**: `https://ollama.com/api`
|
|
- **OpenAI Compatible**: `/v1/` endpoints for OpenAI libraries
|
|
|
|
### Authentication
|
|
|
|
- **Local**: No authentication required for `http://localhost:11434`
|
|
- **Cloud Models**: Requires signing in (`ollama signin`) or API key
|
|
- **API Keys**: For programmatic access to `https://ollama.com/api`
|
|
|
|
### Models
|
|
|
|
- **Local Models**: Run on your machine (e.g., `gemma3`, `llama3.2`, `qwen3`)
|
|
- **Cloud Models**: Suffix `-cloud` (e.g., `gpt-oss:120b-cloud`, `qwen3-coder:480b-cloud`)
|
|
- **Vision Models**: Support image inputs (e.g., `llava`)
|
|
|
|
### Common Environment Variables
|
|
|
|
- `OLLAMA_HOST` - Change bind address (default: `127.0.0.1:11434`)
|
|
- `OLLAMA_CONTEXT_LENGTH` - Context window size (default: `2048` tokens)
|
|
- `OLLAMA_MODELS` - Model storage directory
|
|
- `OLLAMA_ORIGINS` - Allow additional web origins for CORS
|
|
- `HTTPS_PROXY` - Proxy server for model downloads
|
|
|
|
### Error Handling
|
|
|
|
**Status Codes:**
|
|
- `200` - Success
|
|
- `400` - Bad Request (invalid parameters)
|
|
- `404` - Not Found (model doesn't exist)
|
|
- `429` - Too Many Requests (rate limit)
|
|
- `500` - Internal Server Error
|
|
- `502` - Bad Gateway (cloud model unreachable)
|
|
|
|
**Error Format:**
|
|
```json
|
|
{
|
|
"error": "the model failed to generate a response"
|
|
}
|
|
```
|
|
|
|
### Streaming vs Non-Streaming
|
|
|
|
- **Streaming** (default): Returns response chunks as JSON objects (NDJSON)
|
|
- **Non-Streaming**: Set `"stream": false` to get complete response in one object
|
|
|
|
## Reference Files
|
|
|
|
This skill includes comprehensive documentation in `references/`:
|
|
|
|
- **llms-txt.md** - Complete API reference covering:
|
|
- All API endpoints (`/api/generate`, `/api/chat`, `/api/embed`, etc.)
|
|
- Authentication methods (signin, API keys)
|
|
- Error handling and status codes
|
|
- OpenAI compatibility layer
|
|
- Cloud models usage
|
|
- Streaming responses
|
|
- Configuration and environment variables
|
|
|
|
- **llms.md** - Documentation index listing all available topics:
|
|
- API reference (version, model details, chat, generate, embeddings)
|
|
- Capabilities (embeddings, streaming, structured outputs, tool calling, vision)
|
|
- CLI reference
|
|
- Cloud integration
|
|
- Platform-specific guides (Linux, macOS, Windows, Docker)
|
|
- IDE integrations (VS Code, JetBrains, Xcode, Zed, Cline)
|
|
|
|
Use the reference files when you need:
|
|
- Detailed API parameter specifications
|
|
- Complete endpoint documentation
|
|
- Advanced configuration options
|
|
- Platform-specific setup instructions
|
|
- Integration guides for specific tools
|
|
|
|
## Working with This Skill
|
|
|
|
### For Beginners
|
|
|
|
Start with these common patterns:
|
|
1. **Simple generation**: Use `/api/generate` endpoint with a prompt
|
|
2. **Chat interface**: Use `/api/chat` with messages array
|
|
3. **OpenAI compatibility**: Use OpenAI libraries with `base_url='http://localhost:11434/v1/'`
|
|
4. **Check GPU usage**: Run `ollama ps` to verify model loading
|
|
|
|
Read `llms-txt.md` section on "Introduction" and "Quickstart" for foundational concepts.
|
|
|
|
### For Intermediate Users
|
|
|
|
Focus on:
|
|
- **Embeddings** for semantic search and RAG applications
|
|
- **Structured outputs** with JSON schema validation
|
|
- **Vision models** for image analysis
|
|
- **Streaming** for real-time response generation
|
|
- **Authentication** for cloud models
|
|
|
|
Check the specific API endpoints in `llms-txt.md` for detailed parameter options.
|
|
|
|
### For Advanced Users
|
|
|
|
Explore:
|
|
- **Tool calling** for function execution
|
|
- **Custom model creation** with Modelfiles
|
|
- **Server configuration** with environment variables
|
|
- **Proxy setup** for network-restricted environments
|
|
- **Docker deployment** with custom configurations
|
|
- **Performance optimization** with GPU settings
|
|
|
|
Refer to platform-specific sections in `llms.md` and configuration details in `llms-txt.md`.
|
|
|
|
### Common Use Cases
|
|
|
|
**Building a chatbot:**
|
|
1. Use `/api/chat` endpoint
|
|
2. Maintain message history in your application
|
|
3. Stream responses for better UX
|
|
4. Handle errors gracefully
|
|
|
|
**Creating embeddings for search:**
|
|
1. Use `/api/embed` endpoint
|
|
2. Store embeddings in vector database
|
|
3. Perform similarity search
|
|
4. Implement RAG (Retrieval Augmented Generation)
|
|
|
|
**Running behind a firewall:**
|
|
1. Set `HTTPS_PROXY` environment variable
|
|
2. Configure proxy in Docker if containerized
|
|
3. Ensure certificates are trusted
|
|
|
|
**Using cloud models:**
|
|
1. Run `ollama signin` once
|
|
2. Pull cloud models with `-cloud` suffix
|
|
3. Use same API endpoints as local models
|
|
|
|
## Troubleshooting
|
|
|
|
### Model Not Loading on GPU
|
|
|
|
**Check:**
|
|
```bash
|
|
ollama ps
|
|
```
|
|
|
|
**Solutions:**
|
|
- Verify GPU compatibility in documentation
|
|
- Check CUDA/ROCm installation
|
|
- Review available VRAM
|
|
- Try smaller model variants
|
|
|
|
### Cannot Access Ollama Remotely
|
|
|
|
**Problem:** Ollama only accessible from localhost
|
|
|
|
**Solution:**
|
|
```bash
|
|
# Set OLLAMA_HOST to bind to all interfaces
|
|
export OLLAMA_HOST="0.0.0.0:11434"
|
|
```
|
|
|
|
See "How do I configure Ollama server?" in `llms-txt.md` for platform-specific instructions.
|
|
|
|
### Proxy Issues
|
|
|
|
**Problem:** Cannot download models behind proxy
|
|
|
|
**Solution:**
|
|
```bash
|
|
# Set proxy (HTTPS only, not HTTP)
|
|
export HTTPS_PROXY=https://proxy.example.com
|
|
|
|
# Restart Ollama
|
|
```
|
|
|
|
See "How do I use Ollama behind a proxy?" in `llms-txt.md`.
|
|
|
|
### CORS Errors in Browser
|
|
|
|
**Problem:** Browser extension or web app cannot access Ollama
|
|
|
|
**Solution:**
|
|
```bash
|
|
# Allow specific origins
|
|
export OLLAMA_ORIGINS="chrome-extension://*,moz-extension://*"
|
|
```
|
|
|
|
See "How can I allow additional web origins?" in `llms-txt.md`.
|
|
|
|
## Resources
|
|
|
|
### Official Documentation
|
|
- Main docs: https://docs.ollama.com
|
|
- API Reference: https://docs.ollama.com/api
|
|
- Model Library: https://ollama.com/models
|
|
|
|
### Official Libraries
|
|
- Python: https://github.com/ollama/ollama-python
|
|
- JavaScript: https://github.com/ollama/ollama-js
|
|
|
|
### Community
|
|
- GitHub: https://github.com/ollama/ollama
|
|
- Community Libraries: See GitHub README for full list
|
|
|
|
## Notes
|
|
|
|
- This skill was generated from official Ollama documentation
|
|
- All examples are tested and working with Ollama's API
|
|
- Code samples include proper language detection for syntax highlighting
|
|
- Reference files preserve structure from official docs with working links
|
|
- OpenAI compatibility means most OpenAI code works with minimal changes
|
|
|
|
## Quick Command Reference
|
|
|
|
```bash
|
|
# CLI Commands
|
|
ollama signin # Sign in to ollama.com
|
|
ollama run gemma3 # Run a model interactively
|
|
ollama pull gemma3 # Download a model
|
|
ollama ps # List running models
|
|
ollama list # List installed models
|
|
|
|
# Check API Status
|
|
curl http://localhost:11434/api/version
|
|
|
|
# Environment Variables (Common)
|
|
export OLLAMA_HOST="0.0.0.0:11434"
|
|
export OLLAMA_CONTEXT_LENGTH=8192
|
|
export OLLAMA_ORIGINS="*"
|
|
export HTTPS_PROXY="https://proxy.example.com"
|
|
```
|