Initial commit

Zhongwei Li
2025-11-29 17:59:39 +08:00
commit 0b993003eb
9 changed files with 2987 additions and 0 deletions

skills/ollama/SKILL.md

@@ -0,0 +1,450 @@
---
name: ollama
description: Use this skill whenever the user wants to connect to Ollama or use Ollama in any way in their project. Guides users through integrating Ollama for local AI inference, covering installation, connection setup, model management, and API usage for both Python and Node.js. Helps with text generation, chat interfaces, embeddings, streaming responses, and building AI-powered applications using local LLMs.
---
# Ollama
## Overview
This skill helps users integrate Ollama into their projects for running large language models locally. The skill guides users through setup, connection validation, model management, and API integration for both Python and Node.js applications. Ollama provides a simple API for running models like Llama, Mistral, Gemma, and others locally without cloud dependencies.
## When to Use This Skill
Use this skill when users want to:
- Run large language models locally on their machine
- Build AI-powered applications without cloud dependencies
- Implement text generation, chat, or embeddings functionality
- Stream LLM responses in real-time
- Create RAG (Retrieval-Augmented Generation) systems
- Integrate local AI capabilities into Python or Node.js projects
- Manage Ollama models (pull, list, delete)
- Validate Ollama connectivity and troubleshoot connection issues
## Installation and Setup
### Step 1: Collect Ollama URL
**IMPORTANT**: Always ask users for their Ollama URL. Do not assume it's running locally.
Ask the user: "What is your Ollama server URL?"
Common scenarios:
- **Local installation**: `http://localhost:11434` (default)
- **Remote server**: `http://192.168.1.100:11434`
- **Custom port**: `http://localhost:8080`
- **Docker**: `http://localhost:11434` (if port mapped to 11434)
If the user says they're running Ollama locally or doesn't know the URL, suggest trying `http://localhost:11434`.
### Step 2: Check if Ollama is Installed
Before proceeding, verify if Ollama is installed and running at the provided URL. Users can check by visiting the URL in their browser or running:
```bash
curl <OLLAMA_URL>/api/version
```
If Ollama is not installed, guide users to install it:
**macOS/Linux:**
```bash
curl -fsSL https://ollama.com/install.sh | sh
```
**Windows:**
Download from https://ollama.com/download
**Docker:**
```bash
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
```
### Step 3: Start Ollama Service
Ensure Ollama is running:
**macOS/Linux:**
```bash
ollama serve
```
**Docker:**
```bash
docker start ollama
```
The service typically runs at `http://localhost:11434` by default.
### Step 4: Validate Connection
Use the validation script to test connectivity and list available models.
**IMPORTANT**: The script path is relative to the skill directory. When running the script, either:
1. Use the full path from the skill directory (e.g., `/path/to/ollama/scripts/validate_connection.py`)
2. Change to the skill directory first and then run `python scripts/validate_connection.py`
```bash
# Run from the skill directory
cd /path/to/ollama
python scripts/validate_connection.py <OLLAMA_URL>
```
Example with the user's Ollama URL:
```bash
cd /path/to/ollama
python scripts/validate_connection.py http://192.168.1.100:11434
```
The script will:
- Normalize the URL (remove any path components)
- Check if Ollama is accessible
- Display the Ollama version
- List all installed models with sizes
- Provide troubleshooting guidance if connection fails
**Success output:**
```
✓ Connection successful!
URL: http://localhost:11434
Version: Ollama 0.1.0
Models available: 2
Installed models:
- llama3.2 (4.7 GB)
- mistral (7.2 GB)
```
**Failure output:**
```
✗ Connection failed: Connection refused
URL: http://localhost:11434
Troubleshooting:
1. Ensure Ollama is installed and running
2. Check that the URL is correct
3. Verify Ollama is accessible at the specified URL
4. Try: curl http://localhost:11434/api/version
```
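For reference, a minimal sketch of what such a validation script could look like is shown below; the bundled `scripts/validate_connection.py` may differ in detail. It relies only on the Python standard library and the documented `/api/version` and `/api/tags` endpoints.
```python
# Minimal sketch of a connection validator; the bundled scripts/validate_connection.py
# may differ. Standard library only, using the /api/version and /api/tags endpoints.
import json
import sys
import urllib.request
from urllib.parse import urlparse

def validate(url: str) -> bool:
    # Normalize the URL: keep scheme://host:port, drop any path components
    parsed = urlparse(url)
    base = f"{parsed.scheme}://{parsed.netloc}"
    try:
        with urllib.request.urlopen(f"{base}/api/version", timeout=5) as resp:
            version = json.load(resp).get("version", "unknown")
        with urllib.request.urlopen(f"{base}/api/tags", timeout=5) as resp:
            models = json.load(resp).get("models", [])
    except OSError as exc:  # covers connection refused, DNS failures, timeouts
        print(f"✗ Connection failed: {exc}")
        print(f"  Try: curl {base}/api/version")
        return False
    print(f"✓ Connection successful!  URL: {base}  Version: {version}")
    print(f"  Models available: {len(models)}")
    for model in models:
        print(f"  - {model.get('name')} ({model.get('size', 0) / 1e9:.1f} GB)")
    return True

if __name__ == "__main__":
    url = sys.argv[1] if len(sys.argv) > 1 else "http://localhost:11434"
    sys.exit(0 if validate(url) else 1)
```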
## Model Management
### Pulling Models
Help users download models from the Ollama library. Common models include:
- `llama3.2` - Meta's Llama 3.2 (various sizes: 1B, 3B)
- `llama3.1` - Meta's Llama 3.1 (8B, 70B, 405B)
- `mistral` - Mistral 7B
- `phi3` - Microsoft Phi-3
- `gemma2` - Google Gemma 2
Users can pull models using:
```bash
ollama pull llama3.2
```
Or programmatically using the API (examples in reference docs).
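As a rough sketch, pulling a model through the official `ollama` Python package might look like this (progress fields such as `status`, `completed`, and `total` can vary between library versions):
```python
# Sketch: pull a model through the official `ollama` package (pip install ollama).
# Older versions yield plain dicts; newer versions yield objects that also support
# dict-style access, so .get() is used here.
import ollama

def pull_model(name: str = "llama3.2") -> None:
    for progress in ollama.pull(name, stream=True):
        status = progress.get("status", "")
        completed, total = progress.get("completed"), progress.get("total")
        if completed and total:
            print(f"\r{status}: {completed / total:.0%}", end="", flush=True)
        else:
            print(f"\r{status}", end="", flush=True)
    print("\nPull complete")

pull_model("llama3.2")
```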
### Listing Models
Guide users to list installed models:
```bash
ollama list
```
Or use the validation script to see models with detailed information.
### Removing Models
Help users delete models to free space:
```bash
ollama rm llama3.2
```
### Model Selection Guidance
Help users choose appropriate models based on their needs:
- **Small models (1-3B)**: Fast, good for simple tasks, lower resource requirements
- **Medium models (7-13B)**: Balanced performance and quality
- **Large models (70B+)**: Best quality, require significant resources
## Implementation Guidance
### Python Projects
For Python-based projects, refer to the Python API reference:
- **File**: `references/python_api.md`
- **Usage**: Load this reference when implementing Python integrations
- **Contains**:
- Examples built on the official `ollama` Python package
- Text generation with the Generate API
- Conversational interfaces with the Chat API
- **Streaming responses for real-time output (RECOMMENDED)**
- Embeddings for semantic search
- Complete RAG system example
- Error handling patterns
- PEP 723 inline script metadata for dependencies
- **Single dependency**: the official `ollama` package, declared via PEP 723 metadata
**IMPORTANT**: When creating Python scripts for users, include PEP 723 inline script metadata to declare dependencies. See the reference docs for examples.
**DEFAULT TO STREAMING**: When implementing text generation or chat, use streaming responses unless the user explicitly requests non-streaming.
Common Python use cases:
```python
# generate_stream, chat_stream, and get_embeddings are helper functions
# defined in references/python_api.md.

# Streaming text generation (RECOMMENDED)
for token in generate_stream("Explain quantum computing"):
    print(token, end="", flush=True)

# Streaming chat conversation (RECOMMENDED)
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"}
]
for token in chat_stream(messages):
    print(token, end="", flush=True)

# Non-streaming (use only when needed)
response = generate("Explain quantum computing")

# Embeddings for semantic search
embedding = get_embeddings("Hello, world!")
```
### Node.js Projects
For Node.js-based projects, refer to the Node.js API reference:
- **File**: `references/nodejs_api.md`
- **Usage**: Load this reference when implementing Node.js integrations
- **Contains**:
- Official `ollama` npm package examples
- Alternative Fetch API examples (Node.js 18+)
- Text generation and chat APIs
- **Streaming with async iterators (RECOMMENDED)**
- Embeddings and semantic similarity
- Complete RAG system example
- Error handling and retry logic
- TypeScript support examples
Installation:
```bash
npm install ollama
```
**DEFAULT TO STREAMING**: When implementing text generation or chat, use streaming responses unless the user explicitly requests non-streaming.
Common Node.js use cases:
```javascript
import { Ollama } from 'ollama';
const ollama = new Ollama();
// Streaming text generation (RECOMMENDED)
const stream = await ollama.generate({
model: 'llama3.2',
prompt: 'Explain quantum computing',
stream: true
});
for await (const chunk of stream) {
process.stdout.write(chunk.response);
}
// Streaming chat conversation (RECOMMENDED)
const chatStream = await ollama.chat({
model: 'llama3.2',
messages: [
{ role: 'system', content: 'You are a helpful assistant.' },
{ role: 'user', content: 'What is the capital of France?' }
],
stream: true
});
for await (const chunk of chatStream) {
process.stdout.write(chunk.message.content);
}
// Non-streaming (use only when needed)
const response = await ollama.generate({
model: 'llama3.2',
prompt: 'Explain quantum computing'
});
// Embeddings (prefer a dedicated embedding model such as nomic-embed-text)
const embedding = await ollama.embeddings({
  model: 'nomic-embed-text',
  prompt: 'Hello, world!'
});
```
## Common Integration Patterns
### Text Generation
Generate text completions from prompts. Use cases:
- Content generation
- Code completion
- Question answering
- Summarization
Guide users to use the Generate API with appropriate parameters (temperature, top_p, etc.) for their use case.
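As a brief illustration, a parameterized streaming call with the official `ollama` Python package might look like the sketch below (the values are illustrative; `references/python_api.md` has the fuller version):
```python
# Sketch: streaming generation with sampling options via the `ollama` package.
import ollama

stream = ollama.generate(
    model="llama3.2",  # assumes this model is already pulled
    prompt="Summarize the benefits of local inference in two sentences.",
    stream=True,
    options={
        "temperature": 0.3,   # lower = more deterministic; raise for creative tasks
        "top_p": 0.9,
        "num_predict": 120,   # cap the number of generated tokens
    },
)
for chunk in stream:
    print(chunk["response"], end="", flush=True)
print()
```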
### Conversational Interfaces
Build chat applications with conversation history. Use cases:
- Chatbots
- Virtual assistants
- Customer support
- Interactive tutorials
Guide users to use the Chat API with message history management. Explain the importance of system prompts for behavior control.
### Embeddings & Semantic Search
Generate vector embeddings for text. Use cases:
- Semantic search
- Document similarity
- RAG systems
- Recommendation systems
Guide users to use the Embeddings API and implement cosine similarity for comparing embeddings.
### Streaming Responses
**RECOMMENDED APPROACH**: Always prefer streaming for better user experience.
Stream LLM output token-by-token. Use cases:
- Real-time chat interfaces
- Progressive content generation
- Better user experience for long outputs
- Immediate feedback to users
**When creating code for users, default to streaming API** unless they specifically request non-streaming responses.
Guide users to:
- Enable `stream: true` in API calls
- Handle async iteration (Node.js) or generators (Python)
- Display tokens as they arrive for real-time feedback
- Show progress indicators during generation
### RAG (Retrieval-Augmented Generation)
Combine document retrieval with generation. Use cases:
- Question answering over documents
- Knowledge base chatbots
- Context-aware assistance
Guide users to:
1. Generate embeddings for documents
2. Store embeddings with associated text
3. Search for relevant documents using query embeddings
4. Inject retrieved context into prompts
5. Generate answers with context
Both reference docs include complete RAG system examples.
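As a condensed illustration of these five steps, a minimal in-memory sketch using the `ollama` Python package could look like the following (no vector database; `nomic-embed-text` and `llama3.2` are assumed to be installed):
```python
# Sketch: minimal in-memory RAG with the `ollama` package.
# Assumes `nomic-embed-text` (embeddings) and `llama3.2` (generation) are installed.
import math
import ollama

documents = [
    "Ollama exposes a local HTTP API on port 11434 by default.",
    "Models are downloaded with `ollama pull <name>` and listed with `ollama list`.",
]

def embed(text):
    return ollama.embeddings(model="nomic-embed-text", prompt=text)["embedding"]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# 1-2. Embed documents and store them alongside their text
index = [(doc, embed(doc)) for doc in documents]

# 3. Retrieve the most relevant document for the query
query = "Which port does Ollama listen on?"
query_embedding = embed(query)
context, _ = max(index, key=lambda item: cosine(query_embedding, item[1]))

# 4-5. Inject the retrieved context into the prompt and generate an answer (streaming)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
for chunk in ollama.generate(model="llama3.2", prompt=prompt, stream=True):
    print(chunk["response"], end="", flush=True)
print()
```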
## Best Practices
### Security
- Never hardcode sensitive information
- Use environment variables for configuration (see the sketch after this list)
- Validate and sanitize user inputs before sending to LLM
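For example, reading the server address from an environment variable instead of hardcoding it (here `OLLAMA_HOST`, the variable Ollama's own tooling uses, though any name works):
```python
# Sketch: configure the client from an environment variable rather than a hardcoded URL.
import os
import ollama

host = os.environ.get("OLLAMA_HOST", "http://localhost:11434")
client = ollama.Client(host=host)
models = client.list().get("models", [])
print(f"Using Ollama at {host} ({len(models)} models installed)")
```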
### Performance
- Use streaming for long responses to improve perceived performance
- Cache embeddings for documents that don't change
- Choose appropriate model sizes for your use case
- Consider response time requirements when selecting models
### Error Handling
- Always implement proper error handling for network failures
- Check model availability before making requests
- Provide helpful error messages to users
- Implement retry logic for transient failures
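For the retry point above, a simple wrapper with exponential backoff might look like this sketch (the delays and the broad `except` are placeholders; narrow them for production use):
```python
# Sketch: retry a streaming generate call with exponential backoff on transient failures.
# Note: if a stream fails midway, the retry restarts generation from the beginning.
import time
import ollama

def generate_with_retry(prompt, model="llama3.2", retries=3, base_delay=1.0):
    for attempt in range(retries):
        try:
            for chunk in ollama.generate(model=model, prompt=prompt, stream=True):
                yield chunk["response"]
            return
        except Exception as exc:  # narrow to connection/timeout errors in real code
            if attempt == retries - 1:
                raise
            delay = base_delay * (2 ** attempt)
            print(f"\nTransient error ({exc}); retrying in {delay:.0f}s...")
            time.sleep(delay)

for token in generate_with_retry("Explain retries in one sentence."):
    print(token, end="", flush=True)
print()
```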
### Connection Management
- Validate connections before proceeding with implementation
- Handle connection timeouts gracefully
- For remote Ollama instances, ensure network accessibility
- Use the validation script during development
### Model Management
- Check available disk space before pulling large models (a sketch follows this list)
- Keep only models you actively use
- Inform users about model download sizes
- Provide model selection guidance based on requirements
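For the disk-space check, a quick standard-library sketch could look like this (Ollama stores models under `~/.ollama` by default; the 5 GB threshold is an arbitrary example):
```python
# Sketch: warn if free disk space looks too small for a model download.
import os
import shutil

def enough_disk_space(required_gb: float = 5.0) -> bool:
    # Ollama keeps models under ~/.ollama by default; fall back to the home directory
    models_dir = os.path.expanduser("~/.ollama")
    path = models_dir if os.path.isdir(models_dir) else os.path.expanduser("~")
    free_gb = shutil.disk_usage(path).free / 1e9
    if free_gb < required_gb:
        print(f"Only {free_gb:.1f} GB free at {path}; the model needs roughly {required_gb} GB.")
        return False
    return True

if enough_disk_space(required_gb=5.0):
    print("Enough free space to pull the model.")
```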
### Context Management
- For chat applications, manage conversation history to avoid token limits
- Trim old messages when conversations get too long (see the sketch after this list)
- Consider using summarization for long conversation histories
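For trimming, a minimal strategy that keeps the system prompt plus the most recent messages might look like this sketch (the window size is an arbitrary example; `chat_stream` refers to the helper in `references/python_api.md`):
```python
# Sketch: keep the system prompt plus only the most recent messages before each request.
def trim_history(messages, max_messages=10):
    """Keep the first (system) message and the last `max_messages` entries."""
    if len(messages) <= max_messages + 1:
        return messages
    return [messages[0]] + messages[-max_messages:]

# Usage inside a chat loop, before calling the Chat API:
# messages = trim_history(messages)
# for token in chat_stream(messages):  # chat_stream as defined in references/python_api.md
#     print(token, end="", flush=True)
```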
## Troubleshooting
### Connection Issues
If connection fails:
1. Verify Ollama is installed: `ollama --version`
2. Check if Ollama is running: `curl http://localhost:11434/api/version`
3. Restart Ollama service: `ollama serve`
4. Check firewall settings for remote connections
5. Verify the URL format (should be `http://host:port` with no path)
### Model Not Found
If model is not available:
1. List installed models: `ollama list`
2. Pull the required model: `ollama pull model-name`
3. Verify model name spelling (case-sensitive)
### Out of Memory
If running out of memory:
1. Use a smaller model variant
2. Close other applications
3. Increase system swap space
4. Consider using a machine with more RAM
### Slow Performance
If responses are slow:
1. Use a smaller model
2. Reduce `num_predict` parameter
3. Check CPU/GPU usage
4. Ensure Ollama is using GPU if available
5. Close other resource-intensive applications
## Resources
### scripts/validate_connection.py
Python script to validate Ollama connection and list available models. Normalizes URLs, tests connectivity, displays version information, and provides troubleshooting guidance.
### references/python_api.md
Comprehensive Python API reference with examples for:
- Installation and setup
- Connection verification
- Model management (list, pull, delete)
- Generate API for text completion
- Chat API for conversations
- Streaming responses
- Embeddings and semantic search
- Complete RAG system implementation
- Error handling patterns
- Best practices
### references/nodejs_api.md
Comprehensive Node.js API reference with examples for:
- Installation using npm
- Official `ollama` package usage
- Alternative Fetch API examples
- Model management
- Generate and Chat APIs
- Streaming with async iterators
- Embeddings and semantic similarity
- Complete RAG system implementation
- Error handling and retry logic
- TypeScript support
- Best practices

skills/ollama/references/nodejs_api.md

@@ -0,0 +1,507 @@
# Ollama Node.js API Reference
This reference provides comprehensive examples for integrating Ollama into Node.js projects using the official `ollama` npm package.
**IMPORTANT**: Always use streaming responses for better user experience.
## Table of Contents
1. [Package Setup](#package-setup)
2. [Installation & Setup](#installation--setup)
3. [Verifying Ollama Connection](#verifying-ollama-connection)
4. [Model Selection](#model-selection)
5. [Generate API (Text Completion)](#generate-api-text-completion)
6. [Chat API (Conversational)](#chat-api-conversational)
7. [Embeddings](#embeddings)
8. [Error Handling](#error-handling)
9. [Best Practices](#best-practices)
10. [Complete Example Script](#complete-example-script)
## Package Setup
### ES Modules (package.json)
When creating Node.js scripts for users, always use ES modules. Create a `package.json` with:
```json
{
"type": "module",
"dependencies": {
"ollama": "^0.5.0"
}
}
```
This allows using modern `import` syntax instead of `require`.
### Running Scripts
```bash
# Install dependencies
npm install
# Run script
node script.js
```
## Installation & Setup
### Installation
```bash
npm install ollama
```
### Import
```javascript
import { Ollama } from 'ollama';
```
### Configuration
**IMPORTANT**: Always ask users for their Ollama URL. Do not assume localhost.
```javascript
import { Ollama } from 'ollama';
// Create client with custom URL
const ollama = new Ollama({ host: 'http://localhost:11434' });
// Or for remote Ollama instance
// const ollama = new Ollama({ host: 'http://192.168.1.100:11434' });
```
## Verifying Ollama Connection
### Check Connection (Development)
During development, verify Ollama is running and check available models using curl:
```bash
# Check Ollama is running and get version
curl http://localhost:11434/api/version
# List available models
curl http://localhost:11434/api/tags
```
### Check Ollama Version (Node.js)
```javascript
import { Ollama } from 'ollama';
const ollama = new Ollama();
async function checkOllama() {
try {
// Simple way to verify connection
const models = await ollama.list();
console.log('✓ Connected to Ollama');
console.log(` Available models: ${models.models.length}`);
return true;
} catch (error) {
console.log(`✗ Failed to connect to Ollama: ${error.message}`);
return false;
}
}
// Usage
await checkOllama();
```
## Model Selection
**IMPORTANT**: Always ask users which model they want to use. Don't assume a default.
### Listing Available Models
```javascript
import { Ollama } from 'ollama';
const ollama = new Ollama();
async function listAvailableModels() {
const { models } = await ollama.list();
return models.map(m => m.name);
}
// Usage - show available models to user
const available = await listAvailableModels();
console.log('Available models:');
available.forEach(model => {
console.log(` - ${model}`);
});
```
### Finding Models
If the user doesn't have a model installed or wants to use a different one:
- **Browse models**: Direct them to https://ollama.com/search
- **Popular choices**: llama3.2, llama3.1, mistral, phi3, qwen2.5
- **Specialized models**: codellama (coding), llava (vision), nomic-embed-text (embeddings)
### Model Selection Flow
```javascript
async function selectModel() {
const available = await listAvailableModels();
if (available.length === 0) {
console.log('No models installed!');
console.log('Visit https://ollama.com/search to find models');
console.log('Then run: ollama pull <model-name>');
return null;
}
console.log('Available models:');
available.forEach((model, i) => {
console.log(` ${i + 1}. ${model}`);
});
// In practice, you'd ask the user to choose
return available[0]; // Default to first available
}
```
## Generate API (Text Completion)
### Streaming Text Generation
```javascript
import { Ollama } from 'ollama';
const ollama = new Ollama();
async function generateStream(prompt, model = 'llama3.2') {
const response = await ollama.generate({
model: model,
prompt: prompt,
stream: true
});
for await (const chunk of response) {
process.stdout.write(chunk.response);
}
}
// Usage
process.stdout.write('Response: ');
await generateStream('Why is the sky blue?', 'llama3.2');
process.stdout.write('\n');
```
### With Options (Temperature, Top-P, etc.)
```javascript
async function generateWithOptions(prompt, model = 'llama3.2') {
const response = await ollama.generate({
model: model,
prompt: prompt,
stream: true,
options: {
temperature: 0.7,
top_p: 0.9,
top_k: 40,
num_predict: 100 // Max tokens
}
});
for await (const chunk of response) {
process.stdout.write(chunk.response);
}
}
// Usage
process.stdout.write('Response: ');
await generateWithOptions('Write a haiku about programming');
process.stdout.write('\n');
```
## Chat API (Conversational)
### Streaming Chat
```javascript
import { Ollama } from 'ollama';
const ollama = new Ollama();
async function chatStream(messages, model = 'llama3.2') {
/*
* Chat with a model using conversation history with streaming.
*
* Args:
* messages: Array of message objects with 'role' and 'content'
* role can be 'system', 'user', or 'assistant'
*/
const response = await ollama.chat({
model: model,
messages: messages,
stream: true
});
for await (const chunk of response) {
process.stdout.write(chunk.message.content);
}
}
// Usage
const messages = [
{ role: 'system', content: 'You are a helpful assistant.' },
{ role: 'user', content: 'What is the capital of France?' }
];
process.stdout.write('Response: ');
await chatStream(messages);
process.stdout.write('\n');
```
### Multi-turn Conversation
```javascript
import * as readline from 'readline';
import { Ollama } from 'ollama';

const ollama = new Ollama();
async function conversationLoop(model = 'llama3.2') {
const rl = readline.createInterface({
input: process.stdin,
output: process.stdout
});
const messages = [
{ role: 'system', content: 'You are a helpful assistant.' }
];
const askQuestion = () => {
rl.question('\nYou: ', async (input) => {
if (input.toLowerCase() === 'exit' || input.toLowerCase() === 'quit') {
rl.close();
return;
}
// Add user message
messages.push({ role: 'user', content: input });
// Stream response
process.stdout.write('Assistant: ');
let fullResponse = '';
const response = await ollama.chat({
model: model,
messages: messages,
stream: true
});
for await (const chunk of response) {
const content = chunk.message.content;
process.stdout.write(content);
fullResponse += content;
}
process.stdout.write('\n');
// Add assistant response to history
messages.push({ role: 'assistant', content: fullResponse });
askQuestion();
});
};
askQuestion();
}
// Usage
await conversationLoop();
```
## Embeddings
### Generate Embeddings
```javascript
import { Ollama } from 'ollama';
const ollama = new Ollama();
async function getEmbeddings(text, model = 'nomic-embed-text') {
/*
* Generate embeddings for text.
*
* Note: Use an embedding-specific model like 'nomic-embed-text'
* Regular models can generate embeddings, but dedicated models work better.
*/
const response = await ollama.embeddings({
model: model,
prompt: text
});
return response.embedding;
}
// Usage
const embedding = await getEmbeddings('Hello, world!');
console.log(`Embedding dimension: ${embedding.length}`);
console.log(`First 5 values: ${embedding.slice(0, 5)}`);
```
### Semantic Similarity
```javascript
function cosineSimilarity(vec1, vec2) {
const dotProduct = vec1.reduce((sum, val, i) => sum + val * vec2[i], 0);
const magnitude1 = Math.sqrt(vec1.reduce((sum, val) => sum + val * val, 0));
const magnitude2 = Math.sqrt(vec2.reduce((sum, val) => sum + val * val, 0));
return dotProduct / (magnitude1 * magnitude2);
}
// Usage
const text1 = 'The cat sat on the mat';
const text2 = 'A feline rested on a rug';
const text3 = 'JavaScript is a programming language';
const emb1 = await getEmbeddings(text1);
const emb2 = await getEmbeddings(text2);
const emb3 = await getEmbeddings(text3);
console.log(`Similarity 1-2: ${cosineSimilarity(emb1, emb2).toFixed(3)}`); // High
console.log(`Similarity 1-3: ${cosineSimilarity(emb1, emb3).toFixed(3)}`); // Low
```
## Error Handling
### Comprehensive Error Handling
```javascript
import { Ollama } from 'ollama';
const ollama = new Ollama();
async function* safeGenerateStream(prompt, model = 'llama3.2') {
try {
const response = await ollama.generate({
model: model,
prompt: prompt,
stream: true
});
for await (const chunk of response) {
yield chunk.response;
}
} catch (error) {
// Model not found or other API errors
if (error.message.toLowerCase().includes('not found')) {
console.log(`\n✗ Model '${model}' not found`);
console.log(` Run: ollama pull ${model}`);
console.log(` Or browse models at: https://ollama.com/search`);
} else if (error.code === 'ECONNREFUSED') {
console.log('\n✗ Connection failed. Is Ollama running?');
console.log(' Start Ollama with: ollama serve');
} else {
console.log(`\n✗ Unexpected error: ${error.message}`);
}
}
}
// Usage
process.stdout.write('Response: ');
for await (const token of safeGenerateStream('Hello, world!', 'llama3.2')) {
process.stdout.write(token);
}
process.stdout.write('\n');
```
### Checking Model Availability
```javascript
async function ensureModelAvailable(model) {
try {
const { models } = await ollama.list();
const modelNames = models.map(m => m.name);
if (!modelNames.includes(model)) {
console.log(`Model '${model}' not available locally`);
console.log(`Available models: ${modelNames.join(', ')}`);
console.log(`\nTo download: ollama pull ${model}`);
console.log(`Browse models: https://ollama.com/search`);
return false;
}
return true;
} catch (error) {
console.log(`Failed to check models: ${error.message}`);
return false;
}
}
// Usage
if (await ensureModelAvailable('llama3.2')) {
// Proceed with using the model
}
```
## Best Practices
1. **Always Use Streaming**: Stream responses for better user experience
2. **Ask About Models**: Don't assume models - ask users which model they want to use
3. **Verify Connection**: Check Ollama connection during development with curl
4. **Error Handling**: Handle model not found and connection errors gracefully
5. **Context Management**: Manage conversation history to avoid token limits
6. **Model Selection**: Direct users to https://ollama.com/search to find models
7. **Custom Hosts**: Always ask users for their Ollama URL, don't assume localhost
8. **ES Modules**: Use `"type": "module"` in package.json for modern import syntax
## Complete Example Script
```javascript
// script.js
import { Ollama } from 'ollama';
const ollama = new Ollama();
async function main() {
const model = 'llama3.2';
// Check connection
try {
await ollama.list();
} catch (error) {
console.log(`Error: Cannot connect to Ollama - ${error.message}`);
console.log('Make sure Ollama is running: ollama serve');
return;
}
// Stream a response
console.log('Asking about JavaScript...\n');
const response = await ollama.generate({
model: model,
prompt: 'Explain JavaScript in one sentence',
stream: true
});
process.stdout.write('Response: ');
for await (const chunk of response) {
process.stdout.write(chunk.response);
}
process.stdout.write('\n');
}
main();
```
### package.json
```json
{
"type": "module",
"dependencies": {
"ollama": "^0.5.0"
}
}
```
### Running
```bash
npm install
node script.js
```

skills/ollama/references/python_api.md

@@ -0,0 +1,454 @@
# Ollama Python API Reference
This reference provides comprehensive examples for integrating Ollama into Python projects using the official `ollama` Python library.
**IMPORTANT**: Always use streaming responses for better user experience.
## Table of Contents
1. [Installation & Setup](#installation--setup)
2. [Verifying Ollama Connection](#verifying-ollama-connection)
3. [Model Selection](#model-selection)
4. [Generate API (Text Completion)](#generate-api-text-completion)
5. [Chat API (Conversational)](#chat-api-conversational)
6. [Embeddings](#embeddings)
7. [Error Handling](#error-handling)
8. [Best Practices](#best-practices)
9. [PEP 723 Inline Script Metadata](#pep-723-inline-script-metadata)
## Installation & Setup
### Installation
```bash
pip install ollama
```
### Import
```python
import ollama
```
### Configuration
**IMPORTANT**: Always ask users for their Ollama URL. Do not assume localhost.
```python
# Create client with custom URL
client = ollama.Client(host='http://localhost:11434')
# Or for remote Ollama instance
# client = ollama.Client(host='http://192.168.1.100:11434')
```
## Verifying Ollama Connection
### Check Connection (Development)
During development, verify Ollama is running and check available models using curl:
```bash
# Check Ollama is running and get version
curl http://localhost:11434/api/version
# List available models
curl http://localhost:11434/api/tags
```
### Check Ollama Version (Python)
```python
import ollama
def check_ollama():
"""Check if Ollama is running."""
try:
# Simple way to verify connection
models = ollama.list()
print(f"✓ Connected to Ollama")
print(f" Available models: {len(models.get('models', []))}")
return True
except Exception as e:
print(f"✗ Failed to connect to Ollama: {e}")
return False
# Usage
check_ollama()
```
## Model Selection
**IMPORTANT**: Always ask users which model they want to use. Don't assume a default.
### Listing Available Models
```python
import ollama
def list_available_models():
"""List all locally installed models."""
models = ollama.list()
return [model['name'] for model in models.get('models', [])]
# Usage - show available models to user
available = list_available_models()
print("Available models:")
for model in available:
print(f" - {model}")
```
### Finding Models
If the user doesn't have a model installed or wants to use a different one:
- **Browse models**: Direct them to https://ollama.com/search
- **Popular choices**: llama3.2, llama3.1, mistral, phi3, qwen2.5
- **Specialized models**: codellama (coding), llava (vision), nomic-embed-text (embeddings)
### Model Selection Flow
```python
def select_model():
"""Interactive model selection."""
available = list_available_models()
if not available:
print("No models installed!")
print("Visit https://ollama.com/search to find models")
print("Then run: ollama pull <model-name>")
return None
print("Available models:")
for i, model in enumerate(available, 1):
print(f" {i}. {model}")
# In practice, you'd ask the user to choose
return available[0] # Default to first available
```
## Generate API (Text Completion)
### Streaming Text Generation
```python
import ollama
def generate_stream(prompt, model="llama3.2"):
"""Generate text with streaming (yields tokens as they arrive)."""
stream = ollama.generate(
model=model,
prompt=prompt,
stream=True
)
for chunk in stream:
yield chunk['response']
# Usage
print("Response: ", end="", flush=True)
for token in generate_stream("Why is the sky blue?", model="llama3.2"):
print(token, end="", flush=True)
print()
```
### With Options (Temperature, Top-P, etc.)
```python
def generate_with_options(prompt, model="llama3.2"):
"""Generate with custom sampling parameters."""
stream = ollama.generate(
model=model,
prompt=prompt,
stream=True,
options={
'temperature': 0.7,
'top_p': 0.9,
'top_k': 40,
'num_predict': 100 # Max tokens
}
)
for chunk in stream:
yield chunk['response']
# Usage
print("Response: ", end="", flush=True)
for token in generate_with_options("Write a haiku about programming"):
print(token, end="", flush=True)
print()
```
## Chat API (Conversational)
### Streaming Chat
```python
import ollama
def chat_stream(messages, model="llama3.2"):
"""
Chat with a model using conversation history with streaming.
Args:
messages: List of message dicts with 'role' and 'content'
role can be 'system', 'user', or 'assistant'
"""
stream = ollama.chat(
model=model,
messages=messages,
stream=True
)
for chunk in stream:
yield chunk['message']['content']
# Usage
messages = [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is the capital of France?"}
]
print("Response: ", end="", flush=True)
for token in chat_stream(messages):
print(token, end="", flush=True)
print()
```
### Multi-turn Conversation
```python
def conversation_loop(model="llama3.2"):
"""Interactive chat loop with streaming responses."""
messages = [
{"role": "system", "content": "You are a helpful assistant."}
]
while True:
user_input = input("\nYou: ")
if user_input.lower() in ['exit', 'quit']:
break
# Add user message
messages.append({"role": "user", "content": user_input})
# Stream response
print("Assistant: ", end="", flush=True)
full_response = ""
for token in chat_stream(messages, model):
print(token, end="", flush=True)
full_response += token
print()
# Add assistant response to history
messages.append({"role": "assistant", "content": full_response})
# Usage
conversation_loop()
```
## Embeddings
### Generate Embeddings
```python
import ollama
def get_embeddings(text, model="nomic-embed-text"):
"""
Generate embeddings for text.
Note: Use an embedding-specific model like 'nomic-embed-text'
Regular models can generate embeddings, but dedicated models work better.
"""
response = ollama.embeddings(
model=model,
prompt=text
)
return response['embedding']
# Usage
embedding = get_embeddings("Hello, world!")
print(f"Embedding dimension: {len(embedding)}")
print(f"First 5 values: {embedding[:5]}")
```
### Semantic Similarity
```python
import math
def cosine_similarity(vec1, vec2):
"""Calculate cosine similarity between two vectors."""
dot_product = sum(a * b for a, b in zip(vec1, vec2))
magnitude1 = math.sqrt(sum(a * a for a in vec1))
magnitude2 = math.sqrt(sum(b * b for b in vec2))
return dot_product / (magnitude1 * magnitude2)
# Usage
text1 = "The cat sat on the mat"
text2 = "A feline rested on a rug"
text3 = "Python is a programming language"
emb1 = get_embeddings(text1)
emb2 = get_embeddings(text2)
emb3 = get_embeddings(text3)
print(f"Similarity 1-2: {cosine_similarity(emb1, emb2):.3f}") # High
print(f"Similarity 1-3: {cosine_similarity(emb1, emb3):.3f}") # Low
```
## Error Handling
### Comprehensive Error Handling
```python
import ollama
def safe_generate_stream(prompt, model="llama3.2"):
"""Generate with comprehensive error handling."""
try:
stream = ollama.generate(
model=model,
prompt=prompt,
stream=True
)
for chunk in stream:
yield chunk['response']
except ollama.ResponseError as e:
# Model not found or other API errors
if "not found" in str(e).lower():
print(f"\n✗ Model '{model}' not found")
print(f" Run: ollama pull {model}")
print(f" Or browse models at: https://ollama.com/search")
else:
print(f"\n✗ API Error: {e}")
except ConnectionError:
print("\n✗ Connection failed. Is Ollama running?")
print(" Start Ollama with: ollama serve")
except Exception as e:
print(f"\n✗ Unexpected error: {e}")
# Usage
print("Response: ", end="", flush=True)
for token in safe_generate_stream("Hello, world!", model="llama3.2"):
print(token, end="", flush=True)
print()
```
### Checking Model Availability
```python
def ensure_model_available(model):
"""Check if model is available, provide guidance if not."""
try:
available = ollama.list()
model_names = [m['name'] for m in available.get('models', [])]
if model not in model_names:
print(f"Model '{model}' not available locally")
print(f"Available models: {', '.join(model_names)}")
print(f"\nTo download: ollama pull {model}")
print(f"Browse models: https://ollama.com/search")
return False
return True
except Exception as e:
print(f"Failed to check models: {e}")
return False
# Usage
if ensure_model_available("llama3.2"):
# Proceed with using the model
pass
```
## Best Practices
1. **Always Use Streaming**: Stream responses for better user experience
2. **Ask About Models**: Don't assume models - ask users which model they want to use
3. **Verify Connection**: Check Ollama connection during development with curl
4. **Error Handling**: Handle model not found and connection errors gracefully
5. **Context Management**: Manage conversation history to avoid token limits
6. **Model Selection**: Direct users to https://ollama.com/search to find models
7. **Custom Hosts**: Always ask users for their Ollama URL, don't assume localhost
## PEP 723 Inline Script Metadata
When creating standalone Python scripts for users, always include inline script metadata at the top of the file using PEP 723 format. This allows tools like `uv` and `pipx` to automatically manage dependencies.
### Format
```python
# /// script
# requires-python = ">=3.8"
# dependencies = [
# "ollama>=0.1.0",
# ]
# ///
import ollama
# Your code here
```
### Running Scripts
Users can run scripts with PEP 723 metadata using:
```bash
# Using uv (recommended)
uv run script.py
# Using pipx
pipx run script.py
# Traditional approach
pip install ollama
python script.py
```
### Complete Example Script
```python
# /// script
# requires-python = ">=3.8"
# dependencies = [
# "ollama>=0.1.0",
# ]
# ///
import ollama
def main():
"""Simple streaming chat example."""
model = "llama3.2"
# Check connection
try:
ollama.list()
except Exception as e:
print(f"Error: Cannot connect to Ollama - {e}")
print("Make sure Ollama is running: ollama serve")
return
# Stream a response
print("Asking about Python...\n")
stream = ollama.generate(
model=model,
prompt="Explain Python in one sentence",
stream=True
)
print("Response: ", end="", flush=True)
for chunk in stream:
print(chunk['response'], end="", flush=True)
print()
if __name__ == "__main__":
main()
```