Initial commit

Zhongwei Li
2025-11-29 17:59:39 +08:00
commit 0b993003eb
9 changed files with 2987 additions and 0 deletions

# Ollama Node.js API Reference
This reference provides comprehensive examples for integrating Ollama into Node.js projects using the official `ollama` npm package.
**IMPORTANT**: Always use streaming responses for a better user experience.
## Table of Contents
1. [Package Setup](#package-setup)
2. [Installation & Setup](#installation--setup)
3. [Verifying Ollama Connection](#verifying-ollama-connection)
4. [Model Selection](#model-selection)
5. [Generate API (Text Completion)](#generate-api-text-completion)
6. [Chat API (Conversational)](#chat-api-conversational)
7. [Embeddings](#embeddings)
8. [Error Handling](#error-handling)
## Package Setup
### ES Modules (package.json)
When creating Node.js scripts for users, always use ES modules. Create a `package.json` with:
```json
{
"type": "module",
"dependencies": {
"ollama": "^0.5.0"
}
}
```
This allows using modern `import` syntax instead of `require`.
### Running Scripts
```bash
# Install dependencies
npm install
# Run script
node script.js
```
## Installation & Setup
### Installation
```bash
npm install ollama
```
### Import
```javascript
import { Ollama } from 'ollama';
```
### Configuration
**IMPORTANT**: Always ask users for their Ollama URL. Do not assume localhost.
```javascript
import { Ollama } from 'ollama';
// Create client with custom URL
const ollama = new Ollama({ host: 'http://localhost:11434' });
// Or for remote Ollama instance
// const ollama = new Ollama({ host: 'http://192.168.1.100:11434' });
```
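A common pattern is to let an environment variable supply the host and only fall back to a default after the user has had a chance to configure it; `OLLAMA_HOST` is used here because it is the variable the Ollama CLI itself reads, but treat this sketch as one option, not the package's required setup:
```javascript
import { Ollama } from 'ollama';
// Prefer an explicitly configured host; ask the user rather than silently
// assuming localhost when nothing is set.
const host = process.env.OLLAMA_HOST ?? 'http://localhost:11434';
const ollama = new Ollama({ host });
console.log(`Using Ollama at ${host}`);
```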
## Verifying Ollama Connection
### Check Connection (Development)
During development, verify Ollama is running and check available models using curl:
```bash
# Check Ollama is running and get version
curl http://localhost:11434/api/version
# List available models
curl http://localhost:11434/api/tags
```
### Check Ollama Version (Node.js)
```javascript
import { Ollama } from 'ollama';
const ollama = new Ollama();
async function checkOllama() {
try {
// Simple way to verify connection
const models = await ollama.list();
console.log('✓ Connected to Ollama');
console.log(` Available models: ${models.models.length}`);
return true;
} catch (error) {
console.log(`✗ Failed to connect to Ollama: ${error.message}`);
return false;
}
}
// Usage
await checkOllama();
```
## Model Selection
**IMPORTANT**: Always ask users which model they want to use. Don't assume a default.
### Listing Available Models
```javascript
import { Ollama } from 'ollama';
const ollama = new Ollama();
async function listAvailableModels() {
const { models } = await ollama.list();
return models.map(m => m.name);
}
// Usage - show available models to user
const available = await listAvailableModels();
console.log('Available models:');
available.forEach(model => {
console.log(` - ${model}`);
});
```
### Finding Models
If the user doesn't have a model installed or wants to use a different one:
- **Browse models**: Direct them to https://ollama.com/search
- **Popular choices**: llama3.2, llama3.1, mistral, phi3, qwen2.5
- **Specialized models**: codellama (coding), llava (vision), nomic-embed-text (embeddings)
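If the chosen model is not installed yet, the npm package can also pull it programmatically instead of shelling out to `ollama pull`. A minimal sketch, assuming the user has confirmed the download first; progress chunks carry at least a `status` string, though other fields may vary by library version:
```javascript
import { Ollama } from 'ollama';
const ollama = new Ollama();
async function pullModel(model) {
  // Stream download progress so the user sees something is happening
  const progress = await ollama.pull({ model: model, stream: true });
  for await (const chunk of progress) {
    process.stdout.write(`\r${chunk.status}        `);
  }
  process.stdout.write('\n');
}
// Usage (only after the user has agreed to the download)
// await pullModel('llama3.2');
```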
### Model Selection Flow
```javascript
async function selectModel() {
const available = await listAvailableModels();
if (available.length === 0) {
console.log('No models installed!');
console.log('Visit https://ollama.com/search to find models');
console.log('Then run: ollama pull <model-name>');
return null;
}
console.log('Available models:');
available.forEach((model, i) => {
console.log(` ${i + 1}. ${model}`);
});
// In practice, you'd ask the user to choose
return available[0]; // Default to first available
}
```
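The comment above leaves the actual prompt to the reader. A minimal sketch of an interactive picker, assuming Node 17+ for `node:readline/promises`; the prompt text and numbering are illustrative:
```javascript
import readline from 'node:readline/promises';
import { stdin as input, stdout as output } from 'node:process';
async function promptForModel(available) {
  const rl = readline.createInterface({ input, output });
  available.forEach((model, i) => console.log(`  ${i + 1}. ${model}`));
  const answer = await rl.question('Choose a model by number: ');
  rl.close();
  const index = Number.parseInt(answer, 10) - 1;
  // Fall back to the first model if the input is not a valid choice
  return available[index] ?? available[0];
}
```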
## Generate API (Text Completion)
### Streaming Text Generation
```javascript
import { Ollama } from 'ollama';
const ollama = new Ollama();
async function generateStream(prompt, model = 'llama3.2') {
const response = await ollama.generate({
model: model,
prompt: prompt,
stream: true
});
for await (const chunk of response) {
process.stdout.write(chunk.response);
}
}
// Usage
process.stdout.write('Response: ');
await generateStream('Why is the sky blue?', 'llama3.2');
process.stdout.write('\n');
```
### With Options (Temperature, Top-P, etc.)
```javascript
async function generateWithOptions(prompt, model = 'llama3.2') {
const response = await ollama.generate({
model: model,
prompt: prompt,
stream: true,
options: {
temperature: 0.7,
top_p: 0.9,
top_k: 40,
num_predict: 100 // Max tokens
}
});
for await (const chunk of response) {
process.stdout.write(chunk.response);
}
}
// Usage
process.stdout.write('Response: ');
await generateWithOptions('Write a haiku about programming');
process.stdout.write('\n');
```
## Chat API (Conversational)
### Streaming Chat
```javascript
import { Ollama } from 'ollama';
const ollama = new Ollama();
async function chatStream(messages, model = 'llama3.2') {
/*
* Chat with a model using conversation history with streaming.
*
* Args:
* messages: Array of message objects with 'role' and 'content'
* role can be 'system', 'user', or 'assistant'
*/
const response = await ollama.chat({
model: model,
messages: messages,
stream: true
});
for await (const chunk of response) {
process.stdout.write(chunk.message.content);
}
}
// Usage
const messages = [
{ role: 'system', content: 'You are a helpful assistant.' },
{ role: 'user', content: 'What is the capital of France?' }
];
process.stdout.write('Response: ');
await chatStream(messages);
process.stdout.write('\n');
```
### Multi-turn Conversation
```javascript
import * as readline from 'readline';
import { Ollama } from 'ollama';
const ollama = new Ollama();
async function conversationLoop(model = 'llama3.2') {
const rl = readline.createInterface({
input: process.stdin,
output: process.stdout
});
const messages = [
{ role: 'system', content: 'You are a helpful assistant.' }
];
const askQuestion = () => {
rl.question('\nYou: ', async (input) => {
if (input.toLowerCase() === 'exit' || input.toLowerCase() === 'quit') {
rl.close();
return;
}
// Add user message
messages.push({ role: 'user', content: input });
// Stream response
process.stdout.write('Assistant: ');
let fullResponse = '';
const response = await ollama.chat({
model: model,
messages: messages,
stream: true
});
for await (const chunk of response) {
const content = chunk.message.content;
process.stdout.write(content);
fullResponse += content;
}
process.stdout.write('\n');
// Add assistant response to history
messages.push({ role: 'assistant', content: fullResponse });
askQuestion();
});
};
askQuestion();
}
// Usage
await conversationLoop();
```
## Embeddings
### Generate Embeddings
```javascript
import { Ollama } from 'ollama';
const ollama = new Ollama();
async function getEmbeddings(text, model = 'nomic-embed-text') {
/*
* Generate embeddings for text.
*
* Note: Use an embedding-specific model like 'nomic-embed-text'
* Regular models can generate embeddings, but dedicated models work better.
*/
const response = await ollama.embeddings({
model: model,
prompt: text
});
return response.embedding;
}
// Usage
const embedding = await getEmbeddings('Hello, world!');
console.log(`Embedding dimension: ${embedding.length}`);
console.log(`First 5 values: ${embedding.slice(0, 5)}`);
```
### Semantic Similarity
```javascript
function cosineSimilarity(vec1, vec2) {
const dotProduct = vec1.reduce((sum, val, i) => sum + val * vec2[i], 0);
const magnitude1 = Math.sqrt(vec1.reduce((sum, val) => sum + val * val, 0));
const magnitude2 = Math.sqrt(vec2.reduce((sum, val) => sum + val * val, 0));
return dotProduct / (magnitude1 * magnitude2);
}
// Usage
const text1 = 'The cat sat on the mat';
const text2 = 'A feline rested on a rug';
const text3 = 'JavaScript is a programming language';
const emb1 = await getEmbeddings(text1);
const emb2 = await getEmbeddings(text2);
const emb3 = await getEmbeddings(text3);
console.log(`Similarity 1-2: ${cosineSimilarity(emb1, emb2).toFixed(3)}`); // High
console.log(`Similarity 1-3: ${cosineSimilarity(emb1, emb3).toFixed(3)}`); // Low
```
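Building on `getEmbeddings` and `cosineSimilarity` above, a small retrieval helper shows the typical use: rank candidate documents against a query. The documents and query below are illustrative only.
```javascript
async function mostSimilar(query, documents) {
  // Embed the query once, then score each candidate against it
  const queryEmbedding = await getEmbeddings(query);
  let best = { document: null, score: -Infinity };
  for (const document of documents) {
    const score = cosineSimilarity(queryEmbedding, await getEmbeddings(document));
    if (score > best.score) {
      best = { document, score };
    }
  }
  return best;
}
// Usage
const docs = ['The cat sat on the mat', 'JavaScript is a programming language'];
const { document, score } = await mostSimilar('a pet resting indoors', docs);
console.log(`Best match: "${document}" (${score.toFixed(3)})`);
```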
## Error Handling
### Comprehensive Error Handling
```javascript
import { Ollama } from 'ollama';
const ollama = new Ollama();
async function* safeGenerateStream(prompt, model = 'llama3.2') {
try {
const response = await ollama.generate({
model: model,
prompt: prompt,
stream: true
});
for await (const chunk of response) {
yield chunk.response;
}
} catch (error) {
// Model not found or other API errors
if (error.message.toLowerCase().includes('not found')) {
console.log(`\n✗ Model '${model}' not found`);
console.log(` Run: ollama pull ${model}`);
console.log(` Or browse models at: https://ollama.com/search`);
    } else if (error.code === 'ECONNREFUSED' || error.cause?.code === 'ECONNREFUSED') {
console.log('\n✗ Connection failed. Is Ollama running?');
console.log(' Start Ollama with: ollama serve');
} else {
console.log(`\n✗ Unexpected error: ${error.message}`);
}
}
}
// Usage
process.stdout.write('Response: ');
for await (const token of safeGenerateStream('Hello, world!', 'llama3.2')) {
process.stdout.write(token);
}
process.stdout.write('\n');
```
### Checking Model Availability
```javascript
async function ensureModelAvailable(model) {
try {
const { models } = await ollama.list();
const modelNames = models.map(m => m.name);
if (!modelNames.includes(model)) {
console.log(`Model '${model}' not available locally`);
console.log(`Available models: ${modelNames.join(', ')}`);
console.log(`\nTo download: ollama pull ${model}`);
console.log(`Browse models: https://ollama.com/search`);
return false;
}
return true;
} catch (error) {
console.log(`Failed to check models: ${error.message}`);
return false;
}
}
// Usage
if (await ensureModelAvailable('llama3.2')) {
// Proceed with using the model
}
```
## Best Practices
1. **Always Use Streaming**: Stream responses for better user experience
2. **Ask About Models**: Don't assume a default model; ask users which model they want to use
3. **Verify Connection**: Check Ollama connection during development with curl
4. **Error Handling**: Handle model not found and connection errors gracefully
5. **Context Management**: Manage conversation history to avoid token limits (see the sketch after this list)
6. **Model Selection**: Direct users to https://ollama.com/search to find models
7. **Custom Hosts**: Always ask users for their Ollama URL, don't assume localhost
8. **ES Modules**: Use `"type": "module"` in package.json for modern import syntax
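For point 5, one simple approach is to cap how many turns are kept while preserving the system prompt. This is a sketch only; the 20-message cap is an arbitrary illustration, not a library feature:
```javascript
// Keep the system prompt plus only the most recent turns
function trimHistory(messages, maxMessages = 20) {
  const systemMessages = messages.filter(m => m.role === 'system');
  const rest = messages.filter(m => m.role !== 'system');
  return [...systemMessages, ...rest.slice(-maxMessages)];
}
// Usage: trim before each request in a long-running conversation
// messages = trimHistory(messages);
```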
## Complete Example Script
```javascript
// script.js
import { Ollama } from 'ollama';
const ollama = new Ollama();
async function main() {
const model = 'llama3.2';
// Check connection
try {
await ollama.list();
} catch (error) {
console.log(`Error: Cannot connect to Ollama - ${error.message}`);
console.log('Make sure Ollama is running: ollama serve');
return;
}
// Stream a response
console.log('Asking about JavaScript...\n');
const response = await ollama.generate({
model: model,
prompt: 'Explain JavaScript in one sentence',
stream: true
});
process.stdout.write('Response: ');
for await (const chunk of response) {
process.stdout.write(chunk.response);
}
process.stdout.write('\n');
}
main();
```
### package.json
```json
{
"type": "module",
"dependencies": {
"ollama": "^0.5.0"
}
}
```
### Running
```bash
npm install
node script.js
```

# Ollama Python API Reference
This reference provides comprehensive examples for integrating Ollama into Python projects using the official `ollama` Python library.
**IMPORTANT**: Always use streaming responses for a better user experience.
## Table of Contents
1. [Installation & Setup](#installation--setup)
2. [Verifying Ollama Connection](#verifying-ollama-connection)
3. [Model Selection](#model-selection)
4. [Generate API (Text Completion)](#generate-api-text-completion)
5. [Chat API (Conversational)](#chat-api-conversational)
6. [Embeddings](#embeddings)
7. [Error Handling](#error-handling)
8. [PEP 723 Inline Script Metadata](#pep-723-inline-script-metadata)
## Installation & Setup
### Installation
```bash
pip install ollama
```
### Import
```python
import ollama
```
### Configuration
**IMPORTANT**: Always ask users for their Ollama URL. Do not assume localhost.
```python
# Create client with custom URL
client = ollama.Client(host='http://localhost:11434')
# Or for remote Ollama instance
# client = ollama.Client(host='http://192.168.1.100:11434')
```
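A common pattern is to let an environment variable supply the host and prompt the user when it is unset; `OLLAMA_HOST` is used here because it is the variable the Ollama CLI itself reads, but treat this sketch as one option, not the library's required setup:
```python
import os
import ollama
def make_client():
    """Build a client from OLLAMA_HOST, prompting the user if it is unset."""
    host = os.environ.get("OLLAMA_HOST")
    if not host:
        host = input("Ollama URL (e.g. http://localhost:11434): ").strip()
    return ollama.Client(host=host)
client = make_client()
```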
## Verifying Ollama Connection
### Check Connection (Development)
During development, verify Ollama is running and check available models using curl:
```bash
# Check Ollama is running and get version
curl http://localhost:11434/api/version
# List available models
curl http://localhost:11434/api/tags
```
### Check Ollama Version (Python)
```python
import ollama
def check_ollama():
"""Check if Ollama is running."""
try:
# Simple way to verify connection
models = ollama.list()
print(f"✓ Connected to Ollama")
print(f" Available models: {len(models.get('models', []))}")
return True
except Exception as e:
print(f"✗ Failed to connect to Ollama: {e}")
return False
# Usage
check_ollama()
```
## Model Selection
**IMPORTANT**: Always ask users which model they want to use. Don't assume a default.
### Listing Available Models
```python
import ollama
def list_available_models():
"""List all locally installed models."""
models = ollama.list()
return [model['name'] for model in models.get('models', [])]
# Usage - show available models to user
available = list_available_models()
print("Available models:")
for model in available:
print(f" - {model}")
```
### Finding Models
If the user doesn't have a model installed or wants to use a different one:
- **Browse models**: Direct them to https://ollama.com/search
- **Popular choices**: llama3.2, llama3.1, mistral, phi3, qwen2.5
- **Specialized models**: codellama (coding), llava (vision), nomic-embed-text (embeddings)
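If the chosen model is not installed yet, the Python library can also pull it instead of shelling out to `ollama pull`. A minimal sketch, assuming the user has confirmed the download first; progress chunks carry at least a 'status' field, though other fields may vary by library version:
```python
import ollama
def pull_model(model):
    """Download a model, printing streamed progress updates."""
    for chunk in ollama.pull(model=model, stream=True):
        print(f"\r{chunk['status']}        ", end="", flush=True)
    print()
# Usage (only after the user has agreed to the download)
# pull_model("llama3.2")
```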
### Model Selection Flow
```python
def select_model():
"""Interactive model selection."""
available = list_available_models()
if not available:
print("No models installed!")
print("Visit https://ollama.com/search to find models")
print("Then run: ollama pull <model-name>")
return None
print("Available models:")
for i, model in enumerate(available, 1):
print(f" {i}. {model}")
# In practice, you'd ask the user to choose
return available[0] # Default to first available
```
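The comment above leaves the actual prompt to the reader. A minimal sketch of an interactive picker; the prompt text and numbering are illustrative:
```python
def prompt_for_model(available):
    """Ask the user to pick one of the locally installed models."""
    for i, model in enumerate(available, 1):
        print(f"  {i}. {model}")
    choice = input("Choose a model by number: ").strip()
    try:
        return available[int(choice) - 1]
    except (ValueError, IndexError):
        # Fall back to the first model if the input is not a valid choice
        return available[0]
```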
## Generate API (Text Completion)
### Streaming Text Generation
```python
import ollama
def generate_stream(prompt, model="llama3.2"):
"""Generate text with streaming (yields tokens as they arrive)."""
stream = ollama.generate(
model=model,
prompt=prompt,
stream=True
)
for chunk in stream:
yield chunk['response']
# Usage
print("Response: ", end="", flush=True)
for token in generate_stream("Why is the sky blue?", model="llama3.2"):
print(token, end="", flush=True)
print()
```
### With Options (Temperature, Top-P, etc.)
```python
def generate_with_options(prompt, model="llama3.2"):
"""Generate with custom sampling parameters."""
stream = ollama.generate(
model=model,
prompt=prompt,
stream=True,
options={
'temperature': 0.7,
'top_p': 0.9,
'top_k': 40,
'num_predict': 100 # Max tokens
}
)
for chunk in stream:
yield chunk['response']
# Usage
print("Response: ", end="", flush=True)
for token in generate_with_options("Write a haiku about programming"):
print(token, end="", flush=True)
print()
```
## Chat API (Conversational)
### Streaming Chat
```python
import ollama
def chat_stream(messages, model="llama3.2"):
"""
Chat with a model using conversation history with streaming.
Args:
messages: List of message dicts with 'role' and 'content'
role can be 'system', 'user', or 'assistant'
"""
stream = ollama.chat(
model=model,
messages=messages,
stream=True
)
for chunk in stream:
yield chunk['message']['content']
# Usage
messages = [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is the capital of France?"}
]
print("Response: ", end="", flush=True)
for token in chat_stream(messages):
print(token, end="", flush=True)
print()
```
### Multi-turn Conversation
```python
def conversation_loop(model="llama3.2"):
"""Interactive chat loop with streaming responses."""
messages = [
{"role": "system", "content": "You are a helpful assistant."}
]
while True:
user_input = input("\nYou: ")
if user_input.lower() in ['exit', 'quit']:
break
# Add user message
messages.append({"role": "user", "content": user_input})
# Stream response
print("Assistant: ", end="", flush=True)
full_response = ""
for token in chat_stream(messages, model):
print(token, end="", flush=True)
full_response += token
print()
# Add assistant response to history
messages.append({"role": "assistant", "content": full_response})
# Usage
conversation_loop()
```
## Embeddings
### Generate Embeddings
```python
import ollama
def get_embeddings(text, model="nomic-embed-text"):
"""
Generate embeddings for text.
Note: Use an embedding-specific model like 'nomic-embed-text'
Regular models can generate embeddings, but dedicated models work better.
"""
response = ollama.embeddings(
model=model,
prompt=text
)
return response['embedding']
# Usage
embedding = get_embeddings("Hello, world!")
print(f"Embedding dimension: {len(embedding)}")
print(f"First 5 values: {embedding[:5]}")
```
### Semantic Similarity
```python
import math
def cosine_similarity(vec1, vec2):
"""Calculate cosine similarity between two vectors."""
dot_product = sum(a * b for a, b in zip(vec1, vec2))
magnitude1 = math.sqrt(sum(a * a for a in vec1))
magnitude2 = math.sqrt(sum(b * b for b in vec2))
return dot_product / (magnitude1 * magnitude2)
# Usage
text1 = "The cat sat on the mat"
text2 = "A feline rested on a rug"
text3 = "Python is a programming language"
emb1 = get_embeddings(text1)
emb2 = get_embeddings(text2)
emb3 = get_embeddings(text3)
print(f"Similarity 1-2: {cosine_similarity(emb1, emb2):.3f}") # High
print(f"Similarity 1-3: {cosine_similarity(emb1, emb3):.3f}") # Low
```
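Building on `get_embeddings` and `cosine_similarity` above, a small retrieval helper shows the typical use: rank candidate documents against a query. The documents and query below are illustrative only.
```python
def most_similar(query, documents):
    """Return the (document, score) pair closest to the query."""
    query_embedding = get_embeddings(query)
    scored = [
        (doc, cosine_similarity(query_embedding, get_embeddings(doc)))
        for doc in documents
    ]
    return max(scored, key=lambda pair: pair[1])
# Usage
docs = ["The cat sat on the mat", "Python is a programming language"]
best_doc, best_score = most_similar("a pet resting indoors", docs)
print(f'Best match: "{best_doc}" ({best_score:.3f})')
```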
## Error Handling
### Comprehensive Error Handling
```python
import ollama
def safe_generate_stream(prompt, model="llama3.2"):
"""Generate with comprehensive error handling."""
try:
stream = ollama.generate(
model=model,
prompt=prompt,
stream=True
)
for chunk in stream:
yield chunk['response']
except ollama.ResponseError as e:
# Model not found or other API errors
if "not found" in str(e).lower():
print(f"\n✗ Model '{model}' not found")
print(f" Run: ollama pull {model}")
print(f" Or browse models at: https://ollama.com/search")
else:
print(f"\n✗ API Error: {e}")
except ConnectionError:
print("\n✗ Connection failed. Is Ollama running?")
print(" Start Ollama with: ollama serve")
except Exception as e:
print(f"\n✗ Unexpected error: {e}")
# Usage
print("Response: ", end="", flush=True)
for token in safe_generate_stream("Hello, world!", model="llama3.2"):
print(token, end="", flush=True)
print()
```
### Checking Model Availability
```python
def ensure_model_available(model):
"""Check if model is available, provide guidance if not."""
try:
available = ollama.list()
model_names = [m['name'] for m in available.get('models', [])]
if model not in model_names:
print(f"Model '{model}' not available locally")
print(f"Available models: {', '.join(model_names)}")
print(f"\nTo download: ollama pull {model}")
print(f"Browse models: https://ollama.com/search")
return False
return True
except Exception as e:
print(f"Failed to check models: {e}")
return False
# Usage
if ensure_model_available("llama3.2"):
# Proceed with using the model
pass
```
## Best Practices
1. **Always Use Streaming**: Stream responses for better user experience
2. **Ask About Models**: Don't assume a default model; ask users which model they want to use
3. **Verify Connection**: Check Ollama connection during development with curl
4. **Error Handling**: Handle model not found and connection errors gracefully
5. **Context Management**: Manage conversation history to avoid token limits (see the sketch after this list)
6. **Model Selection**: Direct users to https://ollama.com/search to find models
7. **Custom Hosts**: Always ask users for their Ollama URL, don't assume localhost
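For point 5, one simple approach is to cap how many turns are kept while preserving the system prompt. This is a sketch only; the 20-message cap is an arbitrary illustration, not a library feature:
```python
def trim_history(messages, max_messages=20):
    """Keep the system prompt plus only the most recent turns."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-max_messages:]
# Usage: trim before each request in a long-running conversation
# messages = trim_history(messages)
```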
## PEP 723 Inline Script Metadata
When creating standalone Python scripts for users, always include inline script metadata at the top of the file using PEP 723 format. This allows tools like `uv` and `pipx` to automatically manage dependencies.
### Format
```python
# /// script
# requires-python = ">=3.8"
# dependencies = [
# "ollama>=0.1.0",
# ]
# ///
import ollama
# Your code here
```
### Running Scripts
Users can run scripts with PEP 723 metadata using:
```bash
# Using uv (recommended)
uv run script.py
# Using pipx
pipx run script.py
# Traditional approach
pip install ollama
python script.py
```
### Complete Example Script
```python
# /// script
# requires-python = ">=3.8"
# dependencies = [
# "ollama>=0.1.0",
# ]
# ///
import ollama
def main():
"""Simple streaming chat example."""
model = "llama3.2"
# Check connection
try:
ollama.list()
except Exception as e:
print(f"Error: Cannot connect to Ollama - {e}")
print("Make sure Ollama is running: ollama serve")
return
# Stream a response
print("Asking about Python...\n")
stream = ollama.generate(
model=model,
prompt="Explain Python in one sentence",
stream=True
)
print("Response: ", end="", flush=True)
for chunk in stream:
print(chunk['response'], end="", flush=True)
print()
if __name__ == "__main__":
main()
```