Initial commit

This commit is contained in:
Zhongwei Li
2025-11-30 08:24:51 +08:00
commit 8aebb293cd
31 changed files with 7386 additions and 0 deletions


@@ -0,0 +1,481 @@
# Code Execution Patterns
Complete guide to using code execution with Google Gemini API for computational tasks, data analysis, and problem-solving.
---
## What is Code Execution?
Code Execution allows Gemini models to generate and execute Python code to solve problems requiring computation, enabling the model to:
- Perform precise mathematical calculations
- Analyze data with pandas/numpy
- Generate charts and visualizations
- Implement algorithms
- Process files and data structures
---
## How It Works
1. **Model receives prompt** requiring computation
2. **Model generates Python code** to solve the problem
3. **Code executes in sandbox** (secure, isolated environment)
4. **Results return to model** for incorporation into response
5. **Model explains results** in natural language
---
## Enabling Code Execution
### Basic Setup (SDK)
```typescript
import { GoogleGenAI } from '@google/genai';
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash', // Or gemini-2.5-pro
contents: 'Calculate the sum of first 50 prime numbers',
config: {
tools: [{ codeExecution: {} }] // Enable code execution
}
});
```
### Basic Setup (Fetch)
```typescript
const response = await fetch(
`https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent`,
{
method: 'POST',
headers: {
'Content-Type': 'application/json',
'x-goog-api-key': env.GEMINI_API_KEY,
},
body: JSON.stringify({
tools: [{ code_execution: {} }],
contents: [{ parts: [{ text: 'Calculate...' }] }]
}),
}
);
```
---
## Available Python Packages
### Standard Library
- `math`, `statistics`, `random`
- `datetime`, `time`, `calendar`
- `json`, `csv`, `re`
- `collections`, `itertools`, `functools`
### Data Science
- `numpy` - numerical computing
- `pandas` - data analysis and manipulation
- `scipy` - scientific computing
### Visualization
- `matplotlib` - plotting and charts
- `seaborn` - statistical visualization
**Note**: This is a **limited sandbox environment** - not all PyPI packages are available.
---
## Response Structure
### Parsing Code Execution Results
```typescript
for (const part of response.candidates[0].content.parts) {
// Inline text
if (part.text) {
console.log('Text:', part.text);
}
// Generated code
if (part.executableCode) {
console.log('Language:', part.executableCode.language); // "PYTHON"
console.log('Code:', part.executableCode.code);
}
// Execution results
if (part.codeExecutionResult) {
console.log('Outcome:', part.codeExecutionResult.outcome); // "OUTCOME_OK" or "OUTCOME_FAILED"
console.log('Output:', part.codeExecutionResult.output);
}
}
```
### Example Response
```json
{
"candidates": [{
"content": {
"parts": [
{ "text": "I'll calculate that for you." },
{
"executableCode": {
"language": "PYTHON",
"code": "primes = []\nnum = 2\nwhile len(primes) < 50:\n if is_prime(num):\n primes.append(num)\n num += 1\nprint(sum(primes))"
}
},
{
"codeExecutionResult": {
"outcome": "OUTCOME_OK",
"output": "5117\n"
}
},
{ "text": "The sum is 5117." }
]
}
}]
}
```
---
## Common Patterns
### 1. Mathematical Calculations
```typescript
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: 'Calculate the 100th Fibonacci number',
config: { tools: [{ codeExecution: {} }] }
});
```
**Prompting Tip**: Use phrases like "generate and run code" or "calculate using code" to explicitly request code execution.
### 2. Data Analysis
```typescript
const prompt = `
Analyze this sales data:
month,revenue,customers
Jan,50000,120
Feb,62000,145
Mar,58000,138
Calculate:
1. Total revenue
2. Average revenue per customer
3. Month-over-month growth rate
Use pandas or numpy for analysis.
`;
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: prompt,
config: { tools: [{ codeExecution: {} }] }
});
```
### 3. Chart Generation
```typescript
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: 'Create a bar chart showing prime number distribution by last digit (0-9) for primes under 100',
config: { tools: [{ codeExecution: {} }] }
});
```
**Note**: Generated charts are returned as inline image parts (`part.inlineData`) rather than as text in `codeExecutionResult.output`, so check both when rendering results.
### 4. Algorithm Implementation
```typescript
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: 'Implement quicksort and sort this list: [64, 34, 25, 12, 22, 11, 90]. Show the sorted result.',
config: { tools: [{ codeExecution: {} }] }
});
```
### 5. File Processing (In-Memory)
```typescript
const csvData = `name,age,city
Alice,30,NYC
Bob,25,LA
Charlie,35,Chicago`;
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: `Parse this CSV data and calculate average age:\n\n${csvData}`,
config: { tools: [{ codeExecution: {} }] }
});
```
---
## Chat with Code Execution
### Multi-Turn Computational Conversations
```typescript
const chat = await ai.chats.create({
model: 'gemini-2.5-flash',
config: { tools: [{ codeExecution: {} }] }
});
// First turn
let response = await chat.sendMessage({ message: 'I have a data analysis question' });
console.log(response.text);
// Second turn (will use code execution)
response = await chat.sendMessage({
  message: `
Calculate statistics for: [12, 15, 18, 22, 25, 28, 30]
- Mean
- Median
- Standard deviation
`
});
for (const part of response.candidates[0].content.parts) {
if (part.text) console.log(part.text);
if (part.executableCode) console.log('Code:', part.executableCode.code);
if (part.codeExecutionResult) console.log('Results:', part.codeExecutionResult.output);
}
```
---
## Error Handling
### Checking Execution Outcome
```typescript
for (const part of response.candidates[0].content.parts) {
if (part.codeExecutionResult) {
if (part.codeExecutionResult.outcome === 'OUTCOME_OK') {
console.log('✅ Success:', part.codeExecutionResult.output);
} else if (part.codeExecutionResult.outcome === 'OUTCOME_FAILED') {
console.error('❌ Execution failed:', part.codeExecutionResult.output);
}
}
}
```
### Common Execution Errors
**Timeout**:
```
Error: Execution timed out after 30 seconds
```
**Solution**: Simplify computation or reduce data size.
**Import Error**:
```
ModuleNotFoundError: No module named 'requests'
```
**Solution**: Use only available packages (numpy, pandas, matplotlib, seaborn, scipy).
**Syntax Error**:
```
SyntaxError: invalid syntax
```
**Solution**: Model generated invalid code - try rephrasing prompt or regenerating.
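When resilience matters, a caller can regenerate on `OUTCOME_FAILED`. A minimal retry sketch, assuming the `ai` client from the setup above (the helper name and retry count are illustrative):
```typescript
// Illustrative helper: retry generation when the sandbox reports a failed execution
async function generateWithRetry(prompt: string, maxRetries = 2) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const response = await ai.models.generateContent({
      model: 'gemini-2.5-flash',
      contents: prompt,
      config: { tools: [{ codeExecution: {} }] }
    });
    const parts = response.candidates?.[0]?.content?.parts ?? [];
    const failed = parts.some(
      (p) => p.codeExecutionResult?.outcome === 'OUTCOME_FAILED'
    );
    if (!failed) return response;
    console.warn(`Attempt ${attempt + 1} failed, regenerating...`);
  }
  throw new Error('Code execution kept failing after retries');
}
```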
---
## Best Practices
### ✅ Do
1. **Be Explicit**: Use phrases like "generate and run code" to trigger code execution
2. **Provide Data**: Include data directly in prompt for analysis
3. **Specify Output**: Ask for specific calculations or metrics
4. **Use Available Packages**: Stick to numpy, pandas, matplotlib, scipy
5. **Check Outcome**: Always verify `outcome === 'OUTCOME_OK'`
### ❌ Don't
1. **Network Access**: Code cannot make HTTP requests
2. **File System**: No persistent file storage between executions
3. **Long Computations**: Timeout limits apply (~30 seconds)
4. **External Dependencies**: Can't install new packages
5. **State Persistence**: Each execution is isolated (no global state)
---
## Limitations
### Sandbox Restrictions
- **No Network Access**: Cannot call external APIs
- **No File I/O**: Cannot read/write to disk (in-memory only)
- **Limited Packages**: Only pre-installed packages available
- **Execution Timeout**: ~30 seconds maximum
- **No State**: Each execution is independent
### Supported Models
**Works with**:
- `gemini-2.5-pro`
- `gemini-2.5-flash`
**Does NOT work with**:
- `gemini-2.5-flash-lite` (no code execution support)
- Gemini 1.5 models (use Gemini 2.5)
---
## Advanced Patterns
### Iterative Analysis
```typescript
const chat = await ai.chats.create({
model: 'gemini-2.5-flash',
config: { tools: [{ codeExecution: {} }] }
});
// Step 1: Initial analysis
let response = await chat.sendMessage({ message: 'Analyze data: [10, 20, 30, 40, 50]' });
// Step 2: Follow-up based on results
response = await chat.sendMessage({ message: 'Now calculate the variance' });
// Step 3: Visualization
response = await chat.sendMessage({ message: 'Create a histogram of this data' });
```
### Combining with Function Calling
```typescript
const weatherFunction = {
name: 'get_current_weather',
description: 'Get weather for a city',
parametersJsonSchema: {
type: 'object',
properties: { city: { type: 'string' } },
required: ['city']
}
};
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: 'Get weather for NYC, LA, Chicago. Calculate the average temperature.',
config: {
tools: [
{ functionDeclarations: [weatherFunction] },
{ codeExecution: {} }
]
}
});
// Model will:
// 1. Call get_current_weather for each city
// 2. Generate code to calculate average
// 3. Return result
```
### Data Transformation Pipeline
```typescript
const prompt = `
Transform this data:
Input: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Pipeline:
1. Filter odd numbers
2. Square each number
3. Calculate sum
4. Return result
Use code to process.
`;
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: prompt,
config: { tools: [{ codeExecution: {} }] }
});
```
---
## Optimization Tips
### 1. Clear Instructions
**❌ Vague**:
```typescript
contents: 'Analyze this data'
```
**✅ Specific**:
```typescript
contents: 'Calculate mean, median, and standard deviation for: [12, 15, 18, 22, 25]'
```
### 2. Provide Complete Data
```typescript
const csvData = `...complete dataset...`;
const prompt = `Analyze this CSV data:\n\n${csvData}\n\nCalculate total revenue.`;
```
### 3. Request Code Explicitly
```typescript
contents: 'Generate and run code to calculate the factorial of 20'
```
### 4. Handle Large Datasets
For large data, consider:
- Sampling (analyze subset)
- Aggregation (group by categories)
- Pagination (process in chunks)
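For example, sampling can shrink an array before it is embedded in the prompt. A sketch assuming the `ai` client from earlier (the sample size and data are illustrative):
```typescript
// Illustrative: down-sample a large dataset before embedding it in the prompt
function sample<T>(data: T[], n: number): T[] {
  const shuffled = [...data].sort(() => Math.random() - 0.5);
  return shuffled.slice(0, n);
}

const bigDataset = Array.from({ length: 100_000 }, (_, i) => i * 2);
const subset = sample(bigDataset, 500);

const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: `Analyze this random sample of a larger dataset:\n${JSON.stringify(subset)}\nEstimate the mean and spread.`,
  config: { tools: [{ codeExecution: {} }] }
});
```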
---
## Troubleshooting
### Code Not Executing
**Symptom**: Response has text but no `executableCode`
**Causes**:
1. Code execution not enabled (`tools: [{ codeExecution: {} }]`)
2. Model decided code wasn't necessary
3. Using `gemini-2.5-flash-lite` (doesn't support code execution)
**Solution**: Be explicit in prompt: "Use code to calculate..."
### Timeout Errors
**Symptom**: `OUTCOME_FAILED` with timeout message
**Causes**: Computation too complex or data too large
**Solution**:
- Simplify algorithm
- Reduce data size
- Use more efficient approach
### Import Errors
**Symptom**: `ModuleNotFoundError`
**Causes**: Trying to import unavailable package
**Solution**: Use only available packages (numpy, pandas, matplotlib, seaborn, scipy)
---
## References
- Official Docs: https://ai.google.dev/gemini-api/docs/code-execution
- Templates: See `code-execution.ts` for working examples
- Available Packages: See "Available Python Packages" section above


@@ -0,0 +1,373 @@
# Context Caching Guide
Complete guide to using context caching with Google Gemini API to reduce costs by up to 90%.
---
## What is Context Caching?
Context caching allows you to cache frequently used content (system instructions, large documents, videos) and reuse it across multiple requests, significantly reducing token costs and improving latency.
---
## How It Works
1. **Create a cache** with your repeated content (documents, videos, system instructions)
2. **Set TTL** (time-to-live) for cache expiration
3. **Reference the cache** in subsequent API calls
4. **Pay less** - cached tokens cost ~90% less than regular input tokens
---
## Benefits
### Cost Savings
- **Cached input tokens**: ~90% cheaper than regular tokens
- **Output tokens**: Same price (not cached)
- **Example**: 100K token document cached → ~10K token cost equivalent
### Performance
- **Reduced latency**: Cached content is preprocessed
- **Faster responses**: No need to reprocess large context
- **Consistent results**: Same context every time
### Use Cases
- Large documents analyzed repeatedly
- Long system instructions used across sessions
- Video/audio files queried multiple times
- Consistent conversation context
---
## Cache Creation
### Basic Cache (SDK)
```typescript
import { GoogleGenAI } from '@google/genai';
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
const cache = await ai.caches.create({
model: 'gemini-2.5-flash-001', // Must use explicit version!
config: {
displayName: 'my-cache',
systemInstruction: 'You are a helpful assistant.',
contents: 'Large document content here...',
ttl: '3600s', // 1 hour
}
});
```
### Cache with Expiration Time
```typescript
// Set specific expiration time (timezone-aware)
const expirationTime = new Date(Date.now() + 2 * 60 * 60 * 1000); // 2 hours from now
const cache = await ai.caches.create({
model: 'gemini-2.5-flash-001',
config: {
displayName: 'my-cache',
contents: documentText,
expireTime: expirationTime, // Use expireTime instead of ttl
}
});
```
---
## TTL (Time-To-Live) Guidelines
### Recommended TTL Values
| Use Case | TTL | Reason |
|----------|-----|--------|
| Quick analysis session | 300s (5 min) | Short-lived tasks |
| Extended conversation | 3600s (1 hour) | Standard session length |
| Daily batch processing | 86400s (24 hours) | Reuse across day |
| Long-term analysis | 604800s (7 days) | Maximum allowed |
### TTL vs Expiration Time
**TTL (time-to-live)**:
- Relative duration from cache creation
- Format: `"3600s"` (string with 's' suffix)
- Easy for session-based caching
**Expiration Time**:
- Absolute timestamp
- Must be timezone-aware Date object
- Precise control over cache lifetime
---
## Using a Cache
### Generate Content with Cache (SDK)
```typescript
// Reference the cache by name in the request config
const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash-001', // Same model the cache was created with
  contents: 'Summarize the document',
  config: { cachedContent: cache.name }
});
console.log(response.text);
```
### Multiple Queries with Same Cache
```typescript
const queries = [
'What are the key points?',
'Who are the main characters?',
'What is the conclusion?'
];
for (const query of queries) {
  const response = await ai.models.generateContent({
    model: 'gemini-2.5-flash-001',
    contents: query,
    config: { cachedContent: cache.name }
  });
console.log(`Q: ${query}`);
console.log(`A: ${response.text}\n`);
}
```
---
## Cache Management
### Update Cache TTL
```typescript
// Extend cache lifetime before it expires
await ai.caches.update({
name: cache.name,
config: {
ttl: '7200s' // Extend to 2 hours
}
});
```
### List All Caches
```typescript
const pager = await ai.caches.list();
for await (const cache of pager) {
  console.log(`${cache.displayName}: ${cache.name}`);
  console.log(`Expires: ${cache.expireTime}`);
}
```
### Delete Cache
```typescript
// Delete when no longer needed
await ai.caches.delete({ name: cache.name });
```
---
## Advanced Use Cases
### Caching Video Files
```typescript
import { createPartFromUri } from '@google/genai';
// 1. Upload video (the SDK accepts a file path or Blob)
let videoFile = await ai.files.upload({
  file: './video.mp4'
});
// 2. Wait for processing
while (videoFile.state === 'PROCESSING') {
  await new Promise(resolve => setTimeout(resolve, 2000));
  videoFile = await ai.files.get({ name: videoFile.name });
}
// 3. Create cache with video
const cache = await ai.caches.create({
  model: 'gemini-2.5-flash-001',
  config: {
    displayName: 'video-cache',
    systemInstruction: 'Analyze this video.',
    contents: [createPartFromUri(videoFile.uri, videoFile.mimeType)],
    ttl: '600s'
  }
});
// 4. Query video multiple times
const response1 = await ai.models.generateContent({
  model: 'gemini-2.5-flash-001',
  contents: 'What happens in the first minute?',
  config: { cachedContent: cache.name }
});
const response2 = await ai.models.generateContent({
  model: 'gemini-2.5-flash-001',
  contents: 'Who are the main people?',
  config: { cachedContent: cache.name }
});
```
### Caching with System Instructions
```typescript
const cache = await ai.caches.create({
model: 'gemini-2.5-flash-001',
config: {
displayName: 'legal-expert-cache',
systemInstruction: `
You are a legal expert specializing in contract law.
Always cite relevant sections when making claims.
Use clear, professional language.
`,
contents: largeContractDocument,
ttl: '3600s'
}
});
// System instruction is part of cached context
const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash-001',
  contents: 'Is this contract enforceable?',
  config: { cachedContent: cache.name }
});
```
---
## Important Notes
### Model Version Requirement
**⚠️ You MUST use explicit version suffixes when creating caches:**
```typescript
// ✅ CORRECT
model: 'gemini-2.5-flash-001'
// ❌ WRONG (will fail)
model: 'gemini-2.5-flash'
```
### Cache Expiration
- Caches are **automatically deleted** after TTL expires
- **Cannot recover** expired caches - must recreate
- Update TTL **before expiration** to extend lifetime
### Cost Calculation
```
Regular request: 100,000 input tokens = 100K token cost
With caching (after cache creation):
- Cached tokens: 100,000 × 0.1 (90% discount) = 10K equivalent cost
- New tokens: 1,000 × 1.0 = 1K cost
- Total: 11K equivalent (89% savings!)
```
### Limitations
- Maximum TTL: 7 days (604800s)
- Cache creation costs same as regular tokens (first time only)
- Subsequent uses get 90% discount
- Only input tokens are cached (output tokens never cached)
---
## Best Practices
### When to Use Caching
**Good Use Cases:**
- Large documents queried repeatedly (legal docs, research papers)
- Video/audio files analyzed with different questions
- Long system instructions used across many requests
- Consistent context in multi-turn conversations
**Bad Use Cases:**
- Single-use content (no benefit)
- Frequently changing content
- Short content (<1000 tokens) - minimal savings
- Content used only once per day (cache might expire)
### Optimization Tips
1. **Cache Early**: Create cache at session start
2. **Extend TTL**: Update before expiration if still needed
3. **Monitor Usage**: Track how often cache is reused
4. **Clean Up**: Delete unused caches to avoid clutter
5. **Combine Features**: Pair caching with code execution or grounding for powerful workflows
### Cache Naming
Use descriptive `displayName` for easy identification:
```typescript
// ✅ Good names
displayName: 'financial-report-2024-q3'
displayName: 'legal-contract-acme-corp'
displayName: 'video-analysis-project-x'
// ❌ Vague names
displayName: 'cache1'
displayName: 'test'
```
---
## Troubleshooting
### "Invalid model name" Error
**Problem**: Using `gemini-2.5-flash` instead of `gemini-2.5-flash-001`
**Solution**: Always use explicit version suffix:
```typescript
model: 'gemini-2.5-flash-001' // Correct
```
### Cache Expired Error
**Problem**: Trying to use cache after TTL expired
**Solution**: Check expiration before use or extend TTL proactively:
```typescript
let cache = await ai.caches.get({ name: cacheName });
if (new Date(cache.expireTime) < new Date()) {
  // Cache expired, recreate it
  cache = await ai.caches.create({ ... });
}
```
### High Costs Despite Caching
**Problem**: Creating new cache for each request
**Solution**: Reuse the same cache across multiple requests:
```typescript
// ❌ Wrong - creates new cache each time
for (const query of queries) {
  const cache = await ai.caches.create({ ... }); // Expensive!
  const response = await ai.models.generateContent({
    model: 'gemini-2.5-flash-001',
    contents: query,
    config: { cachedContent: cache.name }
  });
}
// ✅ Correct - create once, use many times
const cache = await ai.caches.create({ ... }); // Create once
for (const query of queries) {
  const response = await ai.models.generateContent({
    model: 'gemini-2.5-flash-001',
    contents: query,
    config: { cachedContent: cache.name }
  });
}
```
---
## References
- Official Docs: https://ai.google.dev/gemini-api/docs/caching
- Cost Optimization: See "Cost Optimization" in main SKILL.md
- Templates: See `context-caching.ts` for working examples


@@ -0,0 +1,59 @@
# Function Calling Patterns
Complete guide to implementing function calling (tool use) with Gemini API.
---
## Basic Pattern
1. Define function declarations
2. Send request with tools
3. Check if model wants to call functions
4. Execute functions
5. Send results back to model
6. Get final response
---
## Function Declaration Schema
```typescript
{
name: string, // Function name (no spaces)
description: string, // What the function does
parametersJsonSchema: { // Subset of OpenAPI schema
type: 'object',
properties: {
[paramName]: {
type: string, // 'string' | 'number' | 'boolean' | 'array' | 'object'
description: string, // Parameter description
enum?: string[] // Optional: allowed values
}
},
required: string[] // Required parameter names
}
}
```
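Putting the schema and the six-step loop together, here is a hedged end-to-end sketch; `getWeather` is a hypothetical local implementation and the prompt is illustrative:
```typescript
import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const weatherDeclaration = {
  name: 'get_current_weather',
  description: 'Get weather for a city',
  parametersJsonSchema: {
    type: 'object',
    properties: { city: { type: 'string', description: 'City name' } },
    required: ['city']
  }
};

// Hypothetical local implementation the model cannot see
function getWeather(city: string) {
  return { city, tempC: 21, condition: 'sunny' };
}

// Steps 2-3: send the request and check for a function call
const first = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'What is the weather in Tokyo?',
  config: { tools: [{ functionDeclarations: [weatherDeclaration] }] }
});

const call = first.functionCalls?.[0];
if (call) {
  // Steps 4-5: execute locally, then send the result back
  const result = getWeather((call.args as any)?.city ?? 'Tokyo');
  const second = await ai.models.generateContent({
    model: 'gemini-2.5-flash',
    contents: [
      { role: 'user', parts: [{ text: 'What is the weather in Tokyo?' }] },
      { role: 'model', parts: [{ functionCall: call }] },
      { role: 'user', parts: [{ functionResponse: { name: call.name, response: result } }] }
    ],
    config: { tools: [{ functionDeclarations: [weatherDeclaration] }] }
  });
  // Step 6: final natural-language answer
  console.log(second.text);
}
```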
---
## Calling Modes
- **AUTO** (default): Model decides when to call
- **ANY**: Force at least one function call
- **NONE**: Disable function calling
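The mode is set through `toolConfig.functionCallingConfig`. A minimal sketch forcing a call, reusing the declaration from the sketch above:
```typescript
import { FunctionCallingConfigMode } from '@google/genai';

const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'What is the weather in Paris?',
  config: {
    tools: [{ functionDeclarations: [weatherDeclaration] }],
    toolConfig: {
      functionCallingConfig: {
        mode: FunctionCallingConfigMode.ANY, // Force at least one function call
        allowedFunctionNames: ['get_current_weather'] // Optional allow-list
      }
    }
  }
});
```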
---
## Parallel vs Compositional
**Parallel**: Independent functions run simultaneously
**Compositional**: Sequential dependencies (A → B → C)
Gemini automatically detects which pattern to use.
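When the model emits parallel calls, `response.functionCalls` holds several entries. A dispatch sketch reusing `response` from the mode sketch above (both handlers are hypothetical stubs):
```typescript
// Illustrative: dispatch parallel function calls from one response
const handlers: Record<string, (args: unknown) => unknown> = {
  get_current_weather: () => ({ tempC: 21 }), // hypothetical stub
  get_air_quality: () => ({ aqi: 42 })        // hypothetical stub
};

for (const call of response.functionCalls ?? []) {
  const result = handlers[call.name ?? '']?.(call.args);
  console.log(`${call.name} ->`, result);
}
```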
---
## Official Docs
https://ai.google.dev/gemini-api/docs/function-calling


@@ -0,0 +1,57 @@
# Generation Configuration Reference
Complete reference for all generation parameters.
---
## All Parameters
```typescript
config: {
temperature: number, // 0.0-2.0 (default: 1.0)
topP: number, // 0.0-1.0 (default: 0.95)
topK: number, // 1-100+ (default: 40)
maxOutputTokens: number, // 1-65536
stopSequences: string[], // Stop at these strings
responseMimeType: string, // 'text/plain' | 'application/json'
candidateCount: number, // Usually 1
thinkingConfig: {
thinkingBudget: number // Max thinking tokens
}
}
```
---
## Parameter Guidelines
### temperature
- **0.0**: Deterministic, focused
- **1.0**: Balanced (default)
- **2.0**: Very creative, random
### topP (nucleus sampling)
- **0.95**: Default, good balance
- Lower = more focused
### topK
- **40**: Default
- Higher = more diversity
### maxOutputTokens
- Always set this to prevent excessive generation
- Max: 65,536 tokens
---
## Use Cases
**Factual tasks**: temperature=0.0, topP=0.8
**Creative tasks**: temperature=1.2, topP=0.95
**Code generation**: temperature=0.3, topP=0.9
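As a sketch, here is the factual preset applied to a request, assuming the `ai` client used throughout these docs:
```typescript
const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'List the boiling point of water at sea level in Celsius.',
  config: {
    temperature: 0.0,     // Deterministic for factual recall
    topP: 0.8,
    maxOutputTokens: 256, // Cap output to avoid runaway generation
    stopSequences: ['\n\n---']
  }
});
console.log(response.text);
```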
---
## Official Docs
https://ai.google.dev/gemini-api/docs/models/generative-models#model-parameters


@@ -0,0 +1,602 @@
# Grounding with Google Search Guide
Complete guide to using grounding with Google Search to connect Gemini models to real-time web information, reducing hallucinations and providing verifiable, up-to-date responses.
---
## What is Grounding?
Grounding connects the Gemini model to Google Search, allowing it to:
- Access real-time information beyond training cutoff
- Reduce hallucinations with fact-checked web sources
- Provide citations and source URLs
- Answer questions about current events
- Verify information against the web
---
## How It Works
1. **Model receives query** (e.g., "Who won Euro 2024?")
2. **Model determines** if current information is needed
3. **Performs Google Search** automatically
4. **Processes search results** (web pages, snippets)
5. **Incorporates findings** into response
6. **Provides citations** with source URLs
---
## Two Grounding APIs
### 1. Google Search (`googleSearch`) - Recommended for Gemini 2.5
**Simple, automatic grounding**:
```typescript
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: 'Who won the euro 2024?',
config: {
tools: [{ googleSearch: {} }]
}
});
```
**Features**:
- Simple configuration (empty object)
- Automatic search when model needs current info
- Available on all Gemini 2.5 models
- Recommended for new projects
### 2. Google Search Retrieval (`googleSearchRetrieval`) - Legacy for Gemini 1.5
**Dynamic threshold control**:
```typescript
import { DynamicRetrievalConfigMode } from '@google/genai';
const response = await ai.models.generateContent({
model: 'gemini-1.5-flash',
contents: 'Who won the euro 2024?',
config: {
tools: [{
googleSearchRetrieval: {
dynamicRetrievalConfig: {
mode: DynamicRetrievalConfigMode.MODE_DYNAMIC,
dynamicThreshold: 0.7 // Ground only when the prediction score >= 0.7
}
}
}]
}
});
```
**Features**:
- Control when searches happen via threshold
- Used with Gemini 1.5 models
- More configuration options
**Recommendation**: Use `googleSearch` for Gemini 2.5 models (simpler and newer).
---
## Basic Usage
### SDK Approach (Gemini 2.5)
```typescript
import { GoogleGenAI } from '@google/genai';
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: 'What are the latest developments in AI?',
config: {
tools: [{ googleSearch: {} }]
}
});
console.log(response.text);
// Check if grounding was used
if (response.candidates[0].groundingMetadata) {
console.log('✓ Search performed');
console.log('Sources:', response.candidates[0].groundingMetadata.webPages);
} else {
console.log('✓ Answered from model knowledge');
}
```
### Fetch Approach (Cloudflare Workers)
```typescript
const response = await fetch(
`https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent`,
{
method: 'POST',
headers: {
'Content-Type': 'application/json',
'x-goog-api-key': env.GEMINI_API_KEY,
},
body: JSON.stringify({
contents: [{ parts: [{ text: 'What are the latest developments in AI?' }] }],
tools: [{ google_search: {} }]
}),
}
);
const data = await response.json();
console.log(data.candidates[0].content.parts[0].text);
```
---
## Grounding Metadata
### Structure
```typescript
{
groundingMetadata: {
// Search queries performed
searchQueries: [
{ text: "euro 2024 winner" }
],
// Web pages retrieved
webPages: [
{
url: "https://example.com/euro-2024",
title: "UEFA Euro 2024 Results",
snippet: "Spain won UEFA Euro 2024..."
}
],
// Citations (inline references)
citations: [
{
startIndex: 42,
endIndex: 47,
uri: "https://example.com/euro-2024"
}
],
// Retrieval queries (alternative search terms)
retrievalQueries: [
{ query: "who won euro 2024 final" }
]
}
}
```
### Accessing Metadata
```typescript
if (response.candidates[0].groundingMetadata) {
const metadata = response.candidates[0].groundingMetadata;
// Display sources
console.log('Sources:');
metadata.webPages?.forEach((page, i) => {
console.log(`${i + 1}. ${page.title}`);
console.log(` ${page.url}`);
});
// Display citations
console.log('\nCitations:');
metadata.citations?.forEach((citation) => {
console.log(`Position ${citation.startIndex}-${citation.endIndex}: ${citation.uri}`);
});
}
```
---
## When to Use Grounding
### ✅ Good Use Cases
**Current Events**:
```typescript
'What happened in the news today?'
'Who won the latest sports championship?'
'What are the current stock prices?'
```
**Recent Developments**:
```typescript
'What are the latest AI breakthroughs?'
'What are recent changes in climate policy?'
```
**Fact-Checking**:
```typescript
'Is this claim true: [claim]?'
'What does the latest research say about [topic]?'
```
**Real-Time Data**:
```typescript
'What is the current weather in Tokyo?'
"What are today's cryptocurrency prices?"
```
### ❌ Not Recommended For
**General Knowledge**:
```typescript
'What is the capital of France?' // Model knows this
'How does photosynthesis work?' // Stable knowledge
```
**Mathematical Calculations**:
```typescript
'What is 15 * 27?' // Use code execution instead
```
**Creative Tasks**:
```typescript
'Write a poem about autumn' // No search needed
```
**Code Generation**:
```typescript
'Write a sorting algorithm' // Internal reasoning sufficient
```
---
## Chat with Grounding
### Multi-Turn Conversations
```typescript
const chat = await ai.chats.create({
model: 'gemini-2.5-flash',
config: {
tools: [{ googleSearch: {} }]
}
});
// First question
let response = await chat.sendMessage({ message: 'What are the latest quantum computing developments?' });
console.log(response.text);
// Display sources
if (response.candidates[0].groundingMetadata) {
const sources = response.candidates[0].groundingMetadata.webPages || [];
console.log(`\nSources: ${sources.length} web pages`);
sources.forEach(s => console.log(`- ${s.title}: ${s.url}`));
}
// Follow-up question
response = await chat.sendMessage({ message: 'Which company made the biggest breakthrough?' });
console.log('\n' + response.text);
```
---
## Combining with Other Features
### Grounding + Function Calling
```typescript
const weatherFunction = {
name: 'get_current_weather',
description: 'Get weather for a location',
parametersJsonSchema: {
type: 'object',
properties: {
location: { type: 'string', description: 'City name' }
},
required: ['location']
}
};
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: 'What is the weather like in the city that won Euro 2024?',
config: {
tools: [
{ googleSearch: {} }, // For finding Euro 2024 winner
{ functionDeclarations: [weatherFunction] } // For weather lookup
]
}
});
// Model will:
// 1. Use Google Search to find the Euro 2024 winner (Spain; the model may resolve the city to Madrid)
// 2. Call get_current_weather function with the city
// 3. Combine both results in response
```
### Grounding + Code Execution
```typescript
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: 'Find the current stock prices for AAPL, GOOGL, MSFT and calculate their average',
config: {
tools: [
{ googleSearch: {} }, // For current stock prices
{ codeExecution: {} } // For averaging
]
}
});
// Model will:
// 1. Search for current stock prices
// 2. Generate code to calculate average
// 3. Execute code with the found prices
// 4. Return result with citations
```
---
## Checking Grounding Usage
### Determine if Search Was Performed
```typescript
const queries = [
'What is 2+2?', // Should NOT use search
'What happened in the news today?' // Should use search
];
for (const query of queries) {
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: query,
config: { tools: [{ googleSearch: {} }] }
});
console.log(`Query: ${query}`);
console.log(`Search used: ${response.candidates[0].groundingMetadata ? 'YES' : 'NO'}`);
console.log();
}
```
**Output**:
```
Query: What is 2+2?
Search used: NO
Query: What happened in the news today?
Search used: YES
```
---
## Dynamic Retrieval (Gemini 1.5)
### Threshold-Based Grounding
```typescript
const response = await ai.models.generateContent({
model: 'gemini-1.5-flash',
contents: 'Who won the euro 2024?',
config: {
tools: [{
googleSearchRetrieval: {
dynamicRetrievalConfig: {
mode: DynamicRetrievalConfigMode.MODE_DYNAMIC,
dynamicThreshold: 0.7 // Ground only when the prediction score >= 0.7
}
}
}]
}
});
if (!response.candidates[0].groundingMetadata) {
  console.log('Model answered from knowledge (prediction score below threshold)');
} else {
  console.log('Search performed (prediction score met threshold)');
}
```
**How It Works**:
- Model assigns the prompt a prediction score (0-1) estimating how much grounding would help
- If score >= threshold → performs search
- If score < threshold → answers from internal knowledge
**Threshold Values**:
- `0.0`: Always search (every response grounded)
- `0.3`: Default - search whenever grounding is moderately useful
- `0.7`: Search only when grounding is clearly useful
- `1.0`: Effectively never search
---
## Best Practices
### ✅ Do
1. **Check Metadata**: Always verify if grounding was used
```typescript
if (response.candidates[0].groundingMetadata) { ... }
```
2. **Display Citations**: Show sources to users for transparency
```typescript
metadata.webPages.forEach(page => {
console.log(`Source: ${page.title} (${page.url})`);
});
```
3. **Use Specific Queries**: Better search results with clear questions
```typescript
// ✅ Good: "What are Microsoft's Q3 2024 earnings?"
// ❌ Vague: "Tell me about Microsoft"
```
4. **Combine Features**: Use with function calling/code execution for powerful workflows
5. **Handle Missing Metadata**: Not all queries trigger search
```typescript
const sources = response.candidates[0].groundingMetadata?.webPages || [];
```
### ❌ Don't
1. **Don't Assume Search Always Happens**: Model decides when to search
2. **Don't Ignore Citations**: They're crucial for fact-checking
3. **Don't Use for Stable Knowledge**: Waste of resources for unchanging facts
4. **Don't Expect Perfect Coverage**: Not all information is on the web
---
## Cost and Performance
### Cost Considerations
- **Added Latency**: Search takes 1-3 seconds typically
- **Token Costs**: Retrieved content counts as input tokens
- **Rate Limits**: Subject to API rate limits
### Optimization
**Use Dynamic Threshold** (Gemini 1.5):
```typescript
dynamicThreshold: 0.7 // Lower = more searches, higher = fewer searches
```
**Cache Grounding Results** (if appropriate):
```typescript
const cache = await ai.caches.create({
model: 'gemini-2.5-flash-001',
config: {
displayName: 'grounding-cache',
tools: [{ googleSearch: {} }],
contents: 'Initial query that triggers search...',
ttl: '3600s'
}
});
// Subsequent queries reuse cached grounding results
```
---
## Troubleshooting
### Grounding Not Working
**Symptom**: No `groundingMetadata` in response
**Causes**:
1. Grounding not enabled: `tools: [{ googleSearch: {} }]`
2. Model decided search wasn't needed (query answerable from knowledge)
3. Google Cloud project not configured (grounding requires GCP)
**Solution**:
- Verify `tools` configuration
- Use queries requiring current information
- Set up Google Cloud project
### Poor Search Quality
**Symptom**: Irrelevant sources or wrong information
**Causes**:
- Vague query
- Search terms ambiguous
- Recent events not yet indexed
**Solution**:
- Make queries more specific
- Include context in prompt
- Verify search queries in metadata
### Citations Missing
**Symptom**: `groundingMetadata` present but no citations
**Explanation**: Citations are **inline references** - they may not always be present if model doesn't directly quote sources.
**Solution**: Check `webPages` instead for full source list
---
## Important Requirements
### Google Cloud Project
**⚠️ Grounding requires a Google Cloud project, not just an API key.**
**Setup**:
1. Create Google Cloud project
2. Enable Generative Language API
3. Configure billing
4. Use API key from that project
**Error if Missing**:
```
Error: Grounding requires Google Cloud project configuration
```
### Model Support
**✅ Supported**:
- All Gemini 2.5 models (`googleSearch`)
- All Gemini 1.5 models (`googleSearchRetrieval`)
**❌ Not Supported**:
- Gemini 1.0 models
---
## Examples
### News Summary
```typescript
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: "Summarize today's top 3 technology news headlines",
config: { tools: [{ googleSearch: {} }] }
});
console.log(response.text);
const metadata = response.candidates[0].groundingMetadata;
metadata?.webPages?.forEach((page, i) => {
  console.log(`${i + 1}. ${page.title}: ${page.url}`);
});
```
### Fact Verification
```typescript
const claim = "The Earth is flat";
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: `Is this claim true: "${claim}"? Use reliable sources to verify.`,
config: { tools: [{ googleSearch: {} }] }
});
console.log(response.text);
```
### Market Research
```typescript
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: 'What are the current trends in electric vehicle adoption in 2024?',
config: { tools: [{ googleSearch: {} }] }
});
console.log(response.text);
console.log('\nSources:');
const metadata = response.candidates[0].groundingMetadata;
metadata?.webPages?.forEach(page => {
  console.log(`- ${page.title}`);
});
```
---
## References
- Official Docs: https://ai.google.dev/gemini-api/docs/grounding
- Google Search Docs: https://ai.google.dev/gemini-api/docs/google-search
- Templates: See `grounding-search.ts` for working examples
- Combined Features: See `combined-advanced.ts` for integration patterns

references/models-guide.md

@@ -0,0 +1,289 @@
# Gemini Models Guide (2025)
**Last Updated**: 2025-11-19 (Gemini 3 preview release)
---
## Gemini 3 Series (Preview - November 2025)
### gemini-3-pro-preview
**Model ID**: `gemini-3-pro-preview`
**Status**: 🆕 Preview release (November 18, 2025)
**Context Windows**:
- Input: TBD (documentation pending)
- Output: TBD (documentation pending)
**Description**: Google's newest and most intelligent AI model with state-of-the-art reasoning and multimodal understanding. Outperforms Gemini 2.5 Pro on every major AI benchmark.
**Best For**:
- Most complex reasoning tasks
- Advanced multimodal analysis (images, videos, PDFs, audio)
- Benchmark-critical applications
- Cutting-edge projects requiring latest capabilities
- Tasks requiring absolute best quality
**Features**:
- ✅ Enhanced multimodal understanding
- ✅ Function calling
- ✅ Streaming
- ✅ System instructions
- ✅ JSON mode
- TBD Thinking mode (documentation pending)
**Knowledge Cutoff**: TBD
**Pricing**: Preview pricing (likely higher than 2.5 Pro)
**⚠️ Preview Status**: Use for evaluation and testing. Consider `gemini-2.5-pro` for production-critical decisions until Gemini 3 reaches stable general availability.
**New Capabilities**:
- Record-breaking benchmark performance
- Enhanced generative UI responses
- Advanced coding capabilities (Google Antigravity integration)
- State-of-the-art multimodal understanding
---
## Current Production Models (Gemini 2.5 - Stable)
### gemini-2.5-pro
**Model ID**: `gemini-2.5-pro`
**Context Windows**:
- Input: 1,048,576 tokens (NOT 2M!)
- Output: 65,536 tokens
**Description**: State-of-the-art thinking model capable of reasoning over complex problems in code, math, and STEM.
**Best For**:
- Complex reasoning tasks
- Advanced code generation and optimization
- Mathematical problem-solving
- Multi-step logical analysis
- STEM applications
**Features**:
- ✅ Thinking mode (enabled by default)
- ✅ Function calling
- ✅ Multimodal (text, images, video, audio, PDFs)
- ✅ Streaming
- ✅ System instructions
- ✅ JSON mode
**Knowledge Cutoff**: January 2025
**Pricing**: Higher cost, use for tasks requiring best quality
---
### gemini-2.5-flash
**Model ID**: `gemini-2.5-flash`
**Context Windows**:
- Input: 1,048,576 tokens
- Output: 65,536 tokens
**Description**: Best price-performance model for large-scale processing, low-latency, and high-volume tasks.
**Best For**:
- General-purpose AI applications
- High-volume API calls
- Agentic workflows
- Cost-sensitive applications
- Production workloads
**Features**:
- ✅ Thinking mode (enabled by default)
- ✅ Function calling
- ✅ Multimodal (text, images, video, audio, PDFs)
- ✅ Streaming
- ✅ System instructions
- ✅ JSON mode
**Knowledge Cutoff**: January 2025
**Pricing**: Best price-performance ratio
**⭐ Recommended**: This is the default choice for most applications
---
### gemini-2.5-flash-lite
**Model ID**: `gemini-2.5-flash-lite`
**Context Windows**:
- Input: 1,048,576 tokens
- Output: 65,536 tokens
**Description**: Most cost-efficient and fastest 2.5 model, optimized for high throughput.
**Best For**:
- High-throughput applications
- Simple text generation
- Cost-critical use cases
- Speed-prioritized workloads
**Features**:
- ✅ Thinking mode (enabled by default)
- ❌ **No function calling** (critical limitation!)
- ✅ Multimodal (text, images, video, audio, PDFs)
- ✅ Streaming
- ✅ System instructions
- ✅ JSON mode
**Knowledge Cutoff**: January 2025
**Pricing**: Lowest cost
**⚠️ Important**: Flash-Lite does NOT support function calling! Use Flash or Pro if you need tool use.
---
## Model Comparison Matrix
| Feature | Pro | Flash | Flash-Lite |
|---------|-----|-------|------------|
| **Thinking Mode** | ✅ Default ON | ✅ Default ON | ✅ Default ON |
| **Function Calling** | ✅ Yes | ✅ Yes | ❌ **NO** |
| **Multimodal** | ✅ Full | ✅ Full | ✅ Full |
| **Streaming** | ✅ Yes | ✅ Yes | ✅ Yes |
| **Input Tokens** | 1,048,576 | 1,048,576 | 1,048,576 |
| **Output Tokens** | 65,536 | 65,536 | 65,536 |
| **Reasoning Quality** | Best | Good | Basic |
| **Speed** | Moderate | Fast | Fastest |
| **Cost** | Highest | Medium | Lowest |
---
## Previous Generation Models (Still Available)
### Gemini 2.0 Flash
**Model ID**: `gemini-2.0-flash`
**Context**: 1M input / 65K output tokens
**Status**: Previous generation, 2.5 Flash recommended instead
### Gemini 1.5 Pro
**Model ID**: `gemini-1.5-pro`
**Context**: 2M input tokens (this is the ONLY model with 2M!)
**Status**: Older model, 2.5 models recommended
---
## Context Window Clarification
**⚠️ CRITICAL CORRECTION**:
**ACCURATE**: Gemini 2.5 models support **1,048,576 input tokens** (approximately 1 million)
**INACCURATE**: Claiming Gemini 2.5 has 2M token context window
**WHY THIS MATTERS**:
- Gemini 1.5 Pro (older model) had 2M tokens
- Gemini 2.5 models (current) have ~1M tokens
- This is a common mistake that causes confusion!
**This skill prevents this error by providing accurate information.**
---
## Model Selection Guide
### Use gemini-2.5-pro When:
- ✅ Complex reasoning required (math, logic, STEM)
- ✅ Advanced code generation and optimization
- ✅ Multi-step problem-solving
- ✅ Quality is more important than cost
- ✅ Tasks require maximum capability
### Use gemini-2.5-flash When:
- ✅ General-purpose AI applications
- ✅ High-volume production workloads
- ✅ Function calling required
- ✅ Agentic workflows
- ✅ Good balance of cost and quality needed
- ⭐ **Recommended default choice**
### Use gemini-2.5-flash-lite When:
- ✅ Simple text generation only
- ✅ No function calling needed
- ✅ High throughput required
- ✅ Cost is primary concern
- ⚠️ **Only if you don't need function calling!**
---
## Common Mistakes
### ❌ Mistake 1: Using Wrong Model Name
```typescript
// WRONG - old model name
model: 'gemini-1.5-pro'
// CORRECT - current model
model: 'gemini-2.5-flash'
```
### ❌ Mistake 2: Claiming 2M Context for 2.5 Models
```typescript
// WRONG ASSUMPTION
// "Gemini 2.5 has 2M token context window"
// CORRECT
// Gemini 2.5 has 1,048,576 input tokens
// Only Gemini 1.5 Pro (older) had 2M
```
### ❌ Mistake 3: Using Flash-Lite for Function Calling
```typescript
// WRONG - Flash-Lite doesn't support function calling!
model: 'gemini-2.5-flash-lite',
config: {
tools: [{ functionDeclarations: [...] }] // This will FAIL
}
// CORRECT
model: 'gemini-2.5-flash', // or gemini-2.5-pro
config: {
tools: [{ functionDeclarations: [...] }]
}
```
---
## Rate Limits (Free vs Paid)
### Free Tier
- **15 RPM** (requests per minute)
- **1M TPM** (tokens per minute)
- **1,500 RPD** (requests per day)
### Paid Tier
- **360 RPM**
- **4M TPM**
- Unlimited daily requests
**Tip**: Monitor your usage and implement rate limiting to stay within quotas.
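A minimal client-side spacing sketch to stay under an RPM quota (the helper and limit value are illustrative):
```typescript
// Illustrative: space out requests to stay under an RPM quota
const RPM_LIMIT = 15; // Free tier
const intervalMs = 60_000 / RPM_LIMIT;
let lastRequestAt = 0;

async function throttled<T>(fn: () => Promise<T>): Promise<T> {
  const wait = Math.max(0, lastRequestAt + intervalMs - Date.now());
  if (wait > 0) await new Promise((resolve) => setTimeout(resolve, wait));
  lastRequestAt = Date.now();
  return fn();
}
```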
---
## Official Documentation
- **Models Overview**: https://ai.google.dev/gemini-api/docs/models
- **Gemini 2.5 Announcement**: https://developers.googleblog.com/en/gemini-2-5-thinking-model-updates/
- **Pricing**: https://ai.google.dev/pricing
---
**Production Tip**: Always use gemini-2.5-flash as your default unless you specifically need Pro's advanced reasoning or want to minimize cost with Flash-Lite (and don't need function calling).


@@ -0,0 +1,58 @@
# Multimodal Guide
Complete guide to using images, video, audio, and PDFs with Gemini API.
---
## Supported Formats
### Images
- JPEG, PNG, WebP, HEIC, HEIF
- Max size: 20MB
### Video
- MP4, MPEG, MOV, AVI, FLV, MPG, WebM, WMV
- Max size: 2GB
- Max length (inline): 2 minutes
### Audio
- MP3, WAV, FLAC, AAC, OGG, OPUS
- Max size: 20MB
### PDFs
- Max size: 30MB
- Text-based PDFs work best
---
## Usage Pattern
```typescript
contents: [
{
parts: [
{ text: 'Your question' },
{
inlineData: {
data: base64EncodedData,
mimeType: 'image/jpeg' // or video/mp4, audio/mp3, application/pdf
}
}
]
}
]
```
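A complete sketch that loads a local image, base64-encodes it, and asks a question about it (the file path is illustrative):
```typescript
import { GoogleGenAI } from '@google/genai';
import fs from 'node:fs';

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Read and base64-encode a local image (path is illustrative)
const imageBytes = fs.readFileSync('./photo.jpg');

const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: [
    {
      parts: [
        { text: 'Describe what is in this photo.' },
        {
          inlineData: {
            data: imageBytes.toString('base64'),
            mimeType: 'image/jpeg'
          }
        }
      ]
    }
  ]
});
console.log(response.text);
```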
---
## Best Practices
- Use specific, detailed prompts
- Combine multiple modalities in one request
- For large files (>2GB), use File API (Phase 2)
---
## Official Docs
https://ai.google.dev/gemini-api/docs/vision


@@ -0,0 +1,235 @@
# SDK Migration Guide
**From**: `@google/generative-ai` (DEPRECATED)
**To**: `@google/genai` (CURRENT)
**Deadline**: November 30, 2025 (deprecated SDK sunset)
---
## Why Migrate?
The `@google/generative-ai` SDK is deprecated and will stop receiving updates on **November 30, 2025**.
The new `@google/genai` SDK:
- ✅ Works with both Gemini API and Vertex AI
- ✅ Supports Gemini 2.0+ features
- ✅ Better TypeScript support
- ✅ Unified API across platforms
- ✅ Active development and updates
---
## Migration Steps
### 1. Update Package
```bash
# Remove deprecated SDK
npm uninstall @google/generative-ai
# Install current SDK
npm install @google/genai@1.27.0
```
### 2. Update Imports
**Old (DEPRECATED)**:
```typescript
import { GoogleGenerativeAI } from '@google/generative-ai';
const genAI = new GoogleGenerativeAI(apiKey);
const model = genAI.getGenerativeModel({ model: 'gemini-2.5-flash' });
```
**New (CURRENT)**:
```typescript
import { GoogleGenAI } from '@google/genai';
const ai = new GoogleGenAI({ apiKey });
// No need to get model separately
```
### 3. Update API Calls
**Old**:
```typescript
const result = await model.generateContent(prompt);
const response = await result.response;
const text = response.text();
```
**New**:
```typescript
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: prompt
});
const text = response.text;
```
### 4. Update Streaming
**Old**:
```typescript
const result = await model.generateContentStream(prompt);
for await (const chunk of result.stream) {
console.log(chunk.text());
}
```
**New**:
```typescript
const response = await ai.models.generateContentStream({
model: 'gemini-2.5-flash',
contents: prompt
});
for await (const chunk of response) {
console.log(chunk.text);
}
```
### 5. Update Chat
**Old**:
```typescript
const chat = model.startChat({
history: []
});
const result = await chat.sendMessage(message);
const response = await result.response;
console.log(response.text());
```
**New**:
```typescript
const chat = ai.chats.create({
  model: 'gemini-2.5-flash',
  history: []
});
const response = await chat.sendMessage({ message });
console.log(response.text);
```
---
## Complete Before/After Example
### Before (Deprecated SDK)
```typescript
import { GoogleGenerativeAI } from '@google/generative-ai';
const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
const model = genAI.getGenerativeModel({ model: 'gemini-2.5-flash' });
// Generate
const result = await model.generateContent('Hello');
const response = await result.response;
console.log(response.text());
// Stream
const streamResult = await model.generateContentStream('Write a story');
for await (const chunk of streamResult.stream) {
console.log(chunk.text());
}
// Chat
const chat = model.startChat();
const chatResult = await chat.sendMessage('Hi');
const chatResponse = await chatResult.response;
console.log(chatResponse.text());
```
### After (Current SDK)
```typescript
import { GoogleGenAI } from '@google/genai';
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
// Generate
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: 'Hello'
});
console.log(response.text);
// Stream
const streamResponse = await ai.models.generateContentStream({
model: 'gemini-2.5-flash',
contents: 'Write a story'
});
for await (const chunk of streamResponse) {
console.log(chunk.text);
}
// Chat
const chat = ai.chats.create({ model: 'gemini-2.5-flash' });
const chatResponse = await chat.sendMessage({ message: 'Hi' });
console.log(chatResponse.text);
```
---
## Key Differences
| Aspect | Old SDK | New SDK |
|--------|---------|---------|
| Package | `@google/generative-ai` | `@google/genai` |
| Class | `GoogleGenerativeAI` | `GoogleGenAI` |
| Model Init | `genAI.getGenerativeModel()` | Specify in each call |
| Text Access | `response.text()` (method) | `response.text` (property) |
| Stream Iteration | `result.stream` | Direct iteration |
| Chat Creation | `model.startChat()` | `ai.chats.create()` |
| Chat Messages | `chat.sendMessage(text)` | `chat.sendMessage({ message })` |
---
## Troubleshooting
### Error: "Cannot find module '@google/generative-ai'"
**Cause**: Old import statement after migration
**Solution**: Update all imports to `@google/genai`
### Error: "Property 'text' does not exist"
**Cause**: Using `response.text()` (method) instead of `response.text` (property)
**Solution**: Remove parentheses: `response.text` not `response.text()`
### Error: "generateContent is not a function"
**Cause**: Trying to call methods on old model object
**Solution**: Use `ai.models.generateContent()` directly
---
## Automated Migration Script
```bash
# Find all files using old SDK
rg "@google/generative-ai" --type ts
# Replace import statements
find . -name "*.ts" -exec sed -i 's/@google\/generative-ai/@google\/genai/g' {} +
# Replace class name
find . -name "*.ts" -exec sed -i 's/GoogleGenerativeAI/GoogleGenAI/g' {} +
```
**⚠️ Note**: This script handles imports but NOT API changes. Manual review required!
---
## Official Resources
- **Migration Guide**: https://ai.google.dev/gemini-api/docs/migrate-to-genai
- **New SDK Docs**: https://github.com/googleapis/js-genai
- **Deprecated SDK**: https://github.com/google-gemini/deprecated-generative-ai-js
---
**Deadline Reminder**: November 30, 2025 - Deprecated SDK sunset


@@ -0,0 +1,81 @@
# Streaming Patterns
Complete guide to implementing streaming with Gemini API.
---
## SDK Approach (Async Iteration)
```typescript
const response = await ai.models.generateContentStream({
model: 'gemini-2.5-flash',
contents: 'Write a story'
});
for await (const chunk of response) {
process.stdout.write(chunk.text);
}
```
**Pros**: Simple, automatic parsing
**Cons**: Requires Node.js or compatible runtime
---
## Fetch Approach (SSE Parsing)
```typescript
const response = await fetch(
'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:streamGenerateContent?alt=sse',
{ /* ... */ }
);
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';
while (true) {
const { done, value } = await reader.read();
if (done) break;
buffer += decoder.decode(value, { stream: true });
const lines = buffer.split('\n');
buffer = lines.pop() || '';
for (const line of lines) {
    if (!line.startsWith('data: ')) continue;
    const payload = line.slice(6).trim();
    if (payload === '[DONE]') continue; // End-of-stream marker
    const data = JSON.parse(payload);
    const text = data.candidates?.[0]?.content?.parts?.[0]?.text;
    if (text) process.stdout.write(text);
}
}
```
**Pros**: Works in any environment
**Cons**: Manual SSE parsing required
---
## SSE Format
```
data: {"candidates":[{"content":{"parts":[{"text":"Hello"}]}}]}
data: {"candidates":[{"content":{"parts":[{"text":" world"}]}}]}
data: [DONE]
```
---
## Best Practices
- Always use the `streamGenerateContent` endpoint with `?alt=sse` for SSE responses
- Handle incomplete chunks in buffer
- Skip empty lines and `[DONE]` markers
- Use streaming for better UX on long responses
---
## Official Docs
https://ai.google.dev/gemini-api/docs/streaming


@@ -0,0 +1,59 @@
# Thinking Mode Guide
Complete guide to thinking mode in Gemini 2.5 models.
---
## What is Thinking Mode?
Gemini 2.5 models "think" internally before responding, improving accuracy on complex tasks.
**Key Points**:
- ✅ Enabled by default on all 2.5 models
- ✅ Internal (raw thoughts are not returned by default)
- ✅ Configurable thinking budget (`thinkingBudget: 0` disables thinking on Flash/Flash-Lite; Pro cannot disable it)
- ✅ Improves reasoning quality
---
## Configuration
```typescript
config: {
thinkingConfig: {
thinkingBudget: 8192 // Max tokens for internal reasoning
}
}
```
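In a full request this looks like the following sketch (the budget value is illustrative):
```typescript
const response = await ai.models.generateContent({
  model: 'gemini-2.5-pro',
  contents: 'Prove that the sum of the first n odd numbers is n^2.',
  config: {
    thinkingConfig: {
      thinkingBudget: 8192 // Allow extra internal reasoning for the proof
    }
  }
});
console.log(response.text);
```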
---
## When to Increase Budget
✅ Complex math/logic problems
✅ Multi-step reasoning
✅ Code optimization
✅ Detailed analysis
---
## When Default is Fine
⏺️ Simple questions
⏺️ Creative writing
⏺️ Translation
⏺️ Summarization
---
## Model Comparison
- **gemini-2.5-pro**: Best for complex reasoning
- **gemini-2.5-flash**: Good balance
- **gemini-2.5-flash-lite**: Basic thinking
---
## Official Docs
https://ai.google.dev/gemini-api/docs/thinking

references/top-errors.md

@@ -0,0 +1,304 @@
# Top Errors and Solutions
22 common Gemini API errors with solutions (Phase 1 + Phase 2).
---
## 1. Using Deprecated SDK
**Error**: `Cannot find module '@google/generative-ai'`
**Cause**: Using old SDK after migration
**Solution**: Install `@google/genai` instead
---
## 2. Wrong Context Window Claims
**Error**: Input exceeds model capacity
**Cause**: Assuming 2M tokens for Gemini 2.5
**Solution**: Gemini 2.5 has 1,048,576 input tokens (NOT 2M!)
---
## 3. Model Not Found
**Error**: `models/gemini-3.0-flash is not found`
**Cause**: Wrong model name
**Solution**: Use: `gemini-2.5-pro`, `gemini-2.5-flash`, or `gemini-2.5-flash-lite`
---
## 4. Function Calling on Flash-Lite
**Error**: Function calling not working
**Cause**: Flash-Lite doesn't support function calling
**Solution**: Use `gemini-2.5-flash` or `gemini-2.5-pro`
---
## 5. Invalid API Key (401)
**Error**: `API key not valid`
**Cause**: Missing or wrong `GEMINI_API_KEY`
**Solution**: Set environment variable correctly
---
## 6. Rate Limit Exceeded (429)
**Error**: `Resource has been exhausted`
**Cause**: Too many requests
**Solution**: Implement exponential backoff
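A minimal backoff sketch (the error-shape check and retry count are assumptions; adjust to your SDK version):
```typescript
// Illustrative: retry with exponential backoff on rate-limit errors
async function withBackoff<T>(fn: () => Promise<T>, maxRetries = 5): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      const isRateLimit = err?.status === 429 || /exhausted/i.test(String(err?.message));
      if (!isRateLimit || attempt >= maxRetries) throw err;
      const delayMs = Math.min(1000 * 2 ** attempt, 30_000);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```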
---
## 7. Streaming Parse Errors
**Error**: Invalid JSON in SSE stream
**Cause**: Incomplete chunk parsing
**Solution**: Use buffer to handle partial chunks
---
## 8. Multimodal Format Errors
**Error**: Invalid base64 or MIME type
**Cause**: Wrong image encoding
**Solution**: Use correct base64 encoding and MIME type
---
## 9. Context Length Exceeded
**Error**: `Request payload size exceeds the limit`
**Cause**: Input too large
**Solution**: Reduce input size (max 1,048,576 tokens)
---
## 10. Chat Not Working with Fetch
**Error**: No chat helper available
**Cause**: Chat helpers are SDK-only
**Solution**: Manually manage conversation history or use SDK
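A sketch of carrying history manually with fetch (field names follow the REST `generateContent` API):
```typescript
// Illustrative: maintain multi-turn history by hand for the REST API
const history: Array<{ role: 'user' | 'model'; parts: { text: string }[] }> = [];

async function send(userText: string): Promise<string> {
  history.push({ role: 'user', parts: [{ text: userText }] });
  const res = await fetch(
    'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent',
    {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'x-goog-api-key': process.env.GEMINI_API_KEY!
      },
      body: JSON.stringify({ contents: history })
    }
  );
  const data = await res.json();
  const reply = data.candidates[0].content.parts[0].text;
  history.push({ role: 'model', parts: [{ text: reply }] });
  return reply;
}
```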
---
## 11. Thinking Mode Not Supported
**Error**: Trying to disable thinking mode
**Cause**: Thinking is on by default for 2.5 models, and 2.5 Pro cannot turn it off
**Solution**: Configure the budget instead; `thinkingBudget: 0` disables thinking on Flash/Flash-Lite only
---
## 12. Parameter Conflicts
**Error**: Unsupported parameters
**Cause**: Using wrong config options
**Solution**: Use only supported parameters (see generation-config.md)
---
## 13. System Instruction Placement
**Error**: System instruction not working
**Cause**: Placed inside contents array
**Solution**: Place at top level, not in contents
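A sketch of correct placement with the SDK (`systemInstruction` belongs in `config`, not inside `contents`):
```typescript
// ✅ Correct - systemInstruction at the top level of config
const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'Summarize this article...',
  config: {
    systemInstruction: 'You are a concise technical editor.'
  }
});
```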
---
## 14. Token Counting Errors
**Error**: Unexpected token usage
**Cause**: Multimodal inputs use more tokens
**Solution**: Images/video/audio count toward token limit
---
## 15. Parallel Function Call Errors
**Error**: Functions not executing in parallel
**Cause**: Dependencies between functions
**Solution**: Gemini auto-detects; ensure functions are independent
---
## Phase 2 Errors
### 16. Invalid Model Version for Caching
**Error**: `Invalid model name for caching`
**Cause**: Using `gemini-2.5-flash` instead of `gemini-2.5-flash-001`
**Solution**: Must use explicit version suffix when creating caches
```typescript
// ✅ Correct
model: 'gemini-2.5-flash-001'
// ❌ Wrong
model: 'gemini-2.5-flash'
```
**Source**: https://ai.google.dev/gemini-api/docs/caching
---
### 17. Cache Expired or Not Found
**Error**: `Cache not found` or `Cache expired`
**Cause**: Trying to use cache after TTL expiration
**Solution**: Check expiration before use or recreate cache
```typescript
let cache = await ai.caches.get({ name: cacheName });
if (new Date(cache.expireTime) < new Date()) {
  // Recreate cache
  cache = await ai.caches.create({ ... });
}
```
---
### 18. Cannot Update Expired Cache TTL
**Error**: `Cannot update expired cache`
**Cause**: Trying to extend TTL after cache already expired
**Solution**: Update TTL before expiration or create new cache
```typescript
// Update TTL before expiration
await ai.caches.update({
name: cache.name,
config: { ttl: '7200s' }
});
```
---
### 19. Code Execution Timeout
**Error**: `Execution timed out after 30 seconds` with `OUTCOME_FAILED`
**Cause**: Python code taking too long to execute
**Solution**: Simplify computation or reduce data size
```typescript
// Check outcome before using results
if (part.codeExecutionResult?.outcome === 'OUTCOME_FAILED') {
console.error('Execution failed:', part.codeExecutionResult.output);
}
```
**Source**: https://ai.google.dev/gemini-api/docs/code-execution
---
### 20. Python Package Not Available
**Error**: `ModuleNotFoundError: No module named 'requests'`
**Cause**: Trying to import package not in sandbox
**Solution**: Use only available packages (numpy, pandas, matplotlib, seaborn, scipy)
**Available Packages**:
- Standard library: math, statistics, json, csv, datetime
- Data science: numpy, pandas, scipy
- Visualization: matplotlib, seaborn
---
### 21. Code Execution on Flash-Lite
**Error**: Code execution not working
**Cause**: `gemini-2.5-flash-lite` doesn't support code execution
**Solution**: Use `gemini-2.5-flash` or `gemini-2.5-pro`
```typescript
// ✅ Correct
model: 'gemini-2.5-flash' // Supports code execution
// ❌ Wrong
model: 'gemini-2.5-flash-lite' // NO code execution support
```
---
### 22. Grounding Requires Google Cloud Project
**Error**: `Grounding requires Google Cloud project configuration`
**Cause**: Using API key not associated with GCP project
**Solution**: Set up Google Cloud project and enable Generative Language API
**Steps**:
1. Create Google Cloud project
2. Enable Generative Language API
3. Configure billing
4. Use API key from that project
**Source**: https://ai.google.dev/gemini-api/docs/grounding
---
## Quick Debugging Checklist
### Phase 1 (Core)
- [ ] Using @google/genai (NOT @google/generative-ai)
- [ ] Model name is gemini-2.5-pro/flash/flash-lite
- [ ] API key is set correctly
- [ ] Input under 1,048,576 tokens
- [ ] Not using Flash-Lite for function calling
- [ ] System instruction at top level
- [ ] Streaming endpoint is streamGenerateContent
- [ ] MIME types are correct for multimodal
### Phase 2 (Advanced)
- [ ] Caching: Using explicit model version (e.g., gemini-2.5-flash-001)
- [ ] Caching: Cache not expired (check expireTime)
- [ ] Code Execution: Not using Flash-Lite
- [ ] Code Execution: Using only available Python packages
- [ ] Grounding: Google Cloud project configured
- [ ] Grounding: Checking groundingMetadata for search results