---
name: google-gemini-api
description: |
  Integrate Gemini API with correct current SDK (@google/genai v1.27+, NOT deprecated @google/generative-ai).
  Supports text generation, multimodal (images/video/audio/PDFs), function calling, and thinking mode. 1M input tokens.
  Use when: integrating Gemini API, implementing multimodal AI, using thinking mode for reasoning, function calling
  with parallel execution, streaming responses, deploying to Cloudflare Workers, building chat, or troubleshooting
  SDK deprecation, context window, model not found, function calling, or multimodal format errors.
  Keywords: gemini api, @google/genai, gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite,
  gemini-3-pro-preview, multimodal gemini, thinking mode, google ai, genai sdk, function calling gemini,
  streaming gemini, gemini vision, gemini video, gemini audio, gemini pdf, system instructions,
  multi-turn chat, DEPRECATED @google/generative-ai, gemini context window, gemini models 2025,
  gemini 1m tokens, gemini tool use, parallel function calling, compositional function calling, gemini 3
license: MIT
---
# Google Gemini API - Complete Guide
**Version**: Phase 2 Complete + Gemini 3 ✅
**Package**: @google/genai@1.30.0 (⚠️ NOT @google/generative-ai)
**Last Updated**: 2025-11-26 (Package update + FileSearch preview)
---
## ⚠️ CRITICAL SDK MIGRATION WARNING
**DEPRECATED SDK**: `@google/generative-ai` (sunset November 30, 2025)
**CURRENT SDK**: `@google/genai` v1.27+
**If you see code using `@google/generative-ai`, it's outdated!**
This skill uses the **correct current SDK** and provides a complete migration guide.
---
## Status
**✅ Phase 1 Complete**:
- ✅ Text Generation (basic + streaming)
- ✅ Multimodal Inputs (images, video, audio, PDFs)
- ✅ Function Calling (basic + parallel execution)
- ✅ System Instructions & Multi-turn Chat
- ✅ Thinking Mode Configuration
- ✅ Generation Parameters (temperature, top-p, top-k, stop sequences)
- ✅ Both Node.js SDK (@google/genai) and fetch approaches
**✅ Phase 2 Complete**:
- ✅ Context Caching (cost optimization with TTL-based caching)
- ✅ Code Execution (built-in Python interpreter and sandbox)
- ✅ Grounding with Google Search (real-time web information + citations)
**📦 Separate Skills**:
- **Embeddings**: See `google-gemini-embeddings` skill for text-embedding-004
---
## Table of Contents
**Phase 1 - Core Features**:
1. [Quick Start](#quick-start)
2. [Current Models (2025)](#current-models-2025)
3. [SDK vs Fetch Approaches](#sdk-vs-fetch-approaches)
4. [Text Generation](#text-generation)
5. [Streaming](#streaming)
6. [Multimodal Inputs](#multimodal-inputs)
7. [Function Calling](#function-calling)
8. [System Instructions](#system-instructions)
9. [Multi-turn Chat](#multi-turn-chat)
10. [Thinking Mode](#thinking-mode)
11. [Generation Configuration](#generation-configuration)
**Phase 2 - Advanced Features**:
12. [Context Caching](#context-caching)
13. [Code Execution](#code-execution)
14. [Grounding with Google Search](#grounding-with-google-search)
**Common Reference**:
15. [Error Handling](#error-handling)
16. [Rate Limits](#rate-limits)
17. [SDK Migration Guide](#sdk-migration-guide)
18. [Production Best Practices](#production-best-practices)
---
## Quick Start
### Installation
**CORRECT SDK:**
```bash
npm install @google/genai@1.30.0
```
**❌ WRONG (DEPRECATED):**
```bash
npm install @google/generative-ai # DO NOT USE!
```
### Environment Setup
```bash
export GEMINI_API_KEY="..."
```
Or create `.env` file:
```
GEMINI_API_KEY=...
```
### First Text Generation (Node.js SDK)
```typescript
import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'Explain quantum computing in simple terms'
});

console.log(response.text);
```
### First Text Generation (Fetch - Cloudflare Workers)
```typescript
const response = await fetch(
  `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent`,
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-goog-api-key': env.GEMINI_API_KEY,
    },
    body: JSON.stringify({
      contents: [{ parts: [{ text: 'Explain quantum computing in simple terms' }] }]
    }),
  }
);

const data = await response.json();
console.log(data.candidates[0].content.parts[0].text);
```
---
## Current Models (2025)
### Gemini 3 Series (Preview - November 2025)
#### gemini-3-pro-preview
- **Context**: TBD (documentation pending)
- **Status**: 🆕 Preview release (November 18, 2025)
- **Description**: Google's newest and most intelligent AI model with state-of-the-art reasoning
- **Best for**: Most complex reasoning tasks, advanced multimodal understanding, benchmark-critical applications
- **Features**: Enhanced multimodal (text, image, video, audio, PDF), function calling, streaming
- **Benchmark Performance**: Reported by Google to outperform Gemini 2.5 Pro across major AI benchmarks
- **⚠️ Preview**: Use for evaluation. Consider gemini-2.5-pro for production until stable release
### Gemini 2.5 Series (General Availability - Stable)
#### gemini-2.5-pro
- **Context**: 1,048,576 input tokens / 65,536 output tokens
- **Description**: State-of-the-art thinking model for complex reasoning
- **Best for**: Code, math, STEM, complex problem-solving
- **Features**: Thinking mode (default on), function calling, multimodal, streaming
- **Knowledge cutoff**: January 2025
#### gemini-2.5-flash
- **Context**: 1,048,576 input tokens / 65,536 output tokens
- **Description**: Best price-performance workhorse model
- **Best for**: Large-scale processing, low-latency, high-volume, agentic use cases
- **Features**: Thinking mode (default on), function calling, multimodal, streaming
- **Knowledge cutoff**: January 2025
#### gemini-2.5-flash-lite
- **Context**: 1,048,576 input tokens / 65,536 output tokens
- **Description**: Cost-optimized, fastest 2.5 model
- **Best for**: High throughput, cost-sensitive applications
- **Features**: Thinking mode (default on), function calling, multimodal, streaming
- **Knowledge cutoff**: January 2025
### Model Feature Matrix
| Feature | 3-Pro (Preview) | 2.5-Pro | 2.5-Flash | 2.5-Flash-Lite |
|---------|-----------------|---------|-----------|----------------|
| Thinking Mode | TBD | ✅ Default ON | ✅ Default ON | ✅ Default ON |
| Function Calling | ✅ | ✅ | ✅ | ✅ |
| Multimodal | ✅ Enhanced | ✅ | ✅ | ✅ |
| Streaming | ✅ | ✅ | ✅ | ✅ |
| System Instructions | ✅ | ✅ | ✅ | ✅ |
| Context Window | TBD | 1,048,576 in | 1,048,576 in | 1,048,576 in |
| Output Tokens | TBD | 65,536 max | 65,536 max | 65,536 max |
| Status | Preview | Stable | Stable | Stable |
### ⚠️ Context Window Correction
**ACCURATE (Gemini 2.5)**: Gemini 2.5 models support **1,048,576 input tokens** (NOT 2M!)
**OUTDATED**: Only Gemini 1.5 Pro (previous generation) had 2M token context window
**GEMINI 3**: Context window specifications pending official documentation
**Common mistake**: Claiming Gemini 2.5 has 2M tokens. It doesn't. This skill prevents this error.
---
## SDK vs Fetch Approaches
### Node.js SDK (@google/genai)
**Pros:**
- Type-safe with TypeScript
- Easier API (simpler syntax)
- Built-in chat helpers
- Automatic SSE parsing for streaming
- Better error handling
**Cons:**
- Requires Node.js or compatible runtime
- Larger bundle size
- May not work in all edge runtimes
**Use when:** Building Node.js apps, Next.js Server Actions/Components, or any environment with Node.js compatibility
### Fetch-based (Direct REST API)
**Pros:**
- Works in **any** JavaScript environment (Cloudflare Workers, Deno, Bun, browsers)
- Minimal dependencies
- Smaller bundle size
- Full control over requests
**Cons:**
- More verbose syntax
- Manual SSE parsing for streaming
- No built-in chat helpers
- Manual error handling
**Use when:** Deploying to Cloudflare Workers, browser clients, or lightweight edge runtimes
---
## Text Generation
### Basic Text Generation (SDK)
```typescript
import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'Write a haiku about artificial intelligence'
});

console.log(response.text);
```
### Basic Text Generation (Fetch)
```typescript
const response = await fetch(
  `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent`,
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-goog-api-key': env.GEMINI_API_KEY,
    },
    body: JSON.stringify({
      contents: [
        {
          parts: [
            { text: 'Write a haiku about artificial intelligence' }
          ]
        }
      ]
    }),
  }
);

const data = await response.json();
console.log(data.candidates[0].content.parts[0].text);
```
### Response Structure
```typescript
{
  text: string, // Convenience accessor for text content
  candidates: [
    {
      content: {
        parts: [
          { text: string } // Generated text
        ],
        role: string // "model"
      },
      finishReason: string, // "STOP" | "MAX_TOKENS" | "SAFETY" | "OTHER"
      index: number
    }
  ],
  usageMetadata: {
    promptTokenCount: number,
    candidatesTokenCount: number,
    totalTokenCount: number
  }
}
```
---
## Streaming
### Streaming with SDK (Async Iteration)
```typescript
const response = await ai.models.generateContentStream({
  model: 'gemini-2.5-flash',
  contents: 'Write a 200-word story about time travel'
});

for await (const chunk of response) {
  process.stdout.write(chunk.text);
}
```
### Streaming with Fetch (SSE Parsing)
```typescript
const response = await fetch(
  // alt=sse requests Server-Sent Events framing (without it, the API streams a JSON array)
  `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:streamGenerateContent?alt=sse`,
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-goog-api-key': env.GEMINI_API_KEY,
    },
    body: JSON.stringify({
      contents: [{ parts: [{ text: 'Write a 200-word story about time travel' }] }]
    }),
  }
);

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop() || '';

  for (const line of lines) {
    if (line.trim() === '' || line.startsWith('data: [DONE]')) continue;
    if (!line.startsWith('data: ')) continue;

    try {
      const data = JSON.parse(line.slice(6));
      const text = data.candidates[0]?.content?.parts[0]?.text;
      if (text) {
        process.stdout.write(text);
      }
    } catch (e) {
      // Skip invalid JSON
    }
  }
}
```
**Key Points:**
- Use the `streamGenerateContent` endpoint with `?alt=sse` (not `generateContent`; without `alt=sse` the API streams a JSON array instead of SSE)
- Parse Server-Sent Events (SSE) format: `data: {json}\n\n`
- Handle incomplete chunks in the buffer
- Skip empty lines and `[DONE]` markers (a Workers-friendly relay sketch follows)
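On Cloudflare Workers you would typically relay the stream to the browser rather than writing to stdout. A minimal relay sketch, assuming a Workers handler with a `GEMINI_API_KEY` binding and a hardcoded prompt for brevity:
```typescript
// Hypothetical Workers handler: relays only the text deltas to the client.
export default {
  async fetch(_req: Request, env: { GEMINI_API_KEY: string }): Promise<Response> {
    const upstream = await fetch(
      'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:streamGenerateContent?alt=sse',
      {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'x-goog-api-key': env.GEMINI_API_KEY,
        },
        body: JSON.stringify({
          contents: [{ parts: [{ text: 'Write a short poem about rivers' }] }]
        })
      }
    );

    let buffer = '';
    const decoder = new TextDecoder();
    const encoder = new TextEncoder();

    // Re-parse the SSE lines and forward only the text chunks
    const transform = new TransformStream<Uint8Array, Uint8Array>({
      transform(chunk, controller) {
        buffer += decoder.decode(chunk, { stream: true });
        const lines = buffer.split('\n');
        buffer = lines.pop() || '';
        for (const line of lines) {
          if (!line.startsWith('data: ')) continue;
          try {
            const data = JSON.parse(line.slice(6));
            const text = data.candidates?.[0]?.content?.parts?.[0]?.text;
            if (text) controller.enqueue(encoder.encode(text));
          } catch {
            // Ignore partial or non-JSON lines
          }
        }
      }
    });

    return new Response(upstream.body!.pipeThrough(transform), {
      headers: { 'Content-Type': 'text/plain; charset=utf-8' }
    });
  }
};
```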
---
## Multimodal Inputs
Gemini 2.5 models support text + images + video + audio + PDFs in the same request.
### Images (Vision)
#### SDK Approach
```typescript
import { GoogleGenAI } from '@google/genai';
import fs from 'fs';

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// From file
const imageData = fs.readFileSync('/path/to/image.jpg');
const base64Image = imageData.toString('base64');

const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: [
    {
      parts: [
        { text: 'What is in this image?' },
        {
          inlineData: {
            data: base64Image,
            mimeType: 'image/jpeg'
          }
        }
      ]
    }
  ]
});

console.log(response.text);
```
#### Fetch Approach
```typescript
const imageData = fs.readFileSync('/path/to/image.jpg');
const base64Image = imageData.toString('base64');

const response = await fetch(
  `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent`,
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-goog-api-key': env.GEMINI_API_KEY,
    },
    body: JSON.stringify({
      contents: [
        {
          parts: [
            { text: 'What is in this image?' },
            {
              inlineData: {
                data: base64Image,
                mimeType: 'image/jpeg'
              }
            }
          ]
        }
      ]
    }),
  }
);

const data = await response.json();
console.log(data.candidates[0].content.parts[0].text);
```
**Supported Image Formats:**
- JPEG (`.jpg`, `.jpeg`)
- PNG (`.png`)
- WebP (`.webp`)
- HEIC (`.heic`)
- HEIF (`.heif`)
**Max Image Size**: 20MB per image
### Video
```typescript
// Video must be < 2 minutes for inline data
const videoData = fs.readFileSync('/path/to/video.mp4');
const base64Video = videoData.toString('base64');

const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: [
    {
      parts: [
        { text: 'Describe what happens in this video' },
        {
          inlineData: {
            data: base64Video,
            mimeType: 'video/mp4'
          }
        }
      ]
    }
  ]
});

console.log(response.text);
```
**Supported Video Formats:**
- MP4 (`.mp4`)
- MPEG (`.mpeg`)
- MOV (`.mov`)
- AVI (`.avi`)
- FLV (`.flv`)
- MPG (`.mpg`)
- WebM (`.webm`)
- WMV (`.wmv`)
**Max Video Length (inline)**: 2 minutes
**Max Video Size**: 2GB (use File API for larger files - Phase 2)
### Audio
```typescript
const audioData = fs.readFileSync('/path/to/audio.mp3');
const base64Audio = audioData.toString('base64');

const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: [
    {
      parts: [
        { text: 'Transcribe and summarize this audio' },
        {
          inlineData: {
            data: base64Audio,
            mimeType: 'audio/mp3'
          }
        }
      ]
    }
  ]
});

console.log(response.text);
```
**Supported Audio Formats:**
- MP3 (`.mp3`)
- WAV (`.wav`)
- FLAC (`.flac`)
- AAC (`.aac`)
- OGG (`.ogg`)
- OPUS (`.opus`)
**Max Audio Size**: 20MB
### PDFs
```typescript
const pdfData = fs.readFileSync('/path/to/document.pdf');
const base64Pdf = pdfData.toString('base64');

const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: [
    {
      parts: [
        { text: 'Summarize the key points in this PDF' },
        {
          inlineData: {
            data: base64Pdf,
            mimeType: 'application/pdf'
          }
        }
      ]
    }
  ]
});

console.log(response.text);
```
**Max PDF Size**: 30MB
**PDF Limitations**: Text-based PDFs work best; scanned images may have lower accuracy
### Multiple Inputs
You can combine multiple modalities in one request:
```typescript
const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: [
    {
      parts: [
        { text: 'Compare these two images and describe the differences:' },
        { inlineData: { data: base64Image1, mimeType: 'image/jpeg' } },
        { inlineData: { data: base64Image2, mimeType: 'image/jpeg' } }
      ]
    }
  ]
});
```
---
## Function Calling
Gemini supports function calling (tool use) to connect models with external APIs and systems.
### Basic Function Calling (SDK)
```typescript
import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Define function declarations
const getCurrentWeather = {
  name: 'get_current_weather',
  description: 'Get the current weather for a location',
  parametersJsonSchema: {
    type: 'object',
    properties: {
      location: {
        type: 'string',
        description: 'City name, e.g. San Francisco'
      },
      unit: {
        type: 'string',
        enum: ['celsius', 'fahrenheit']
      }
    },
    required: ['location']
  }
};

// Make request with tools
const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'What\'s the weather in Tokyo?',
  config: {
    tools: [
      { functionDeclarations: [getCurrentWeather] }
    ]
  }
});

// Check if model wants to call a function
const functionCall = response.candidates[0].content.parts[0].functionCall;

if (functionCall) {
  console.log('Function to call:', functionCall.name);
  console.log('Arguments:', functionCall.args);

  // Execute the function (your implementation)
  const weatherData = await fetchWeather(functionCall.args.location);

  // Send function result back to model
  const finalResponse = await ai.models.generateContent({
    model: 'gemini-2.5-flash',
    contents: [
      'What\'s the weather in Tokyo?',
      response.candidates[0].content, // Original model response with function call
      {
        parts: [
          {
            functionResponse: {
              name: functionCall.name,
              response: weatherData
            }
          }
        ]
      }
    ],
    config: {
      tools: [
        { functionDeclarations: [getCurrentWeather] }
      ]
    }
  });

  console.log(finalResponse.text);
}
```
### Function Calling (Fetch)
```typescript
const response = await fetch(
  `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent`,
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-goog-api-key': env.GEMINI_API_KEY,
    },
    body: JSON.stringify({
      contents: [
        { parts: [{ text: 'What\'s the weather in Tokyo?' }] }
      ],
      tools: [
        {
          functionDeclarations: [
            {
              name: 'get_current_weather',
              description: 'Get the current weather for a location',
              parameters: {
                type: 'object',
                properties: {
                  location: {
                    type: 'string',
                    description: 'City name'
                  }
                },
                required: ['location']
              }
            }
          ]
        }
      ]
    }),
  }
);

const data = await response.json();
const functionCall = data.candidates[0]?.content?.parts[0]?.functionCall;

if (functionCall) {
  // Execute function and send result back (same flow as SDK)
}
```
### Parallel Function Calling
Gemini can call multiple independent functions simultaneously:
```typescript
const tools = [
  {
    functionDeclarations: [
      {
        name: 'get_weather',
        description: 'Get weather for a location',
        parametersJsonSchema: {
          type: 'object',
          properties: {
            location: { type: 'string' }
          },
          required: ['location']
        }
      },
      {
        name: 'get_population',
        description: 'Get population of a city',
        parametersJsonSchema: {
          type: 'object',
          properties: {
            city: { type: 'string' }
          },
          required: ['city']
        }
      }
    ]
  }
];

const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'What is the weather and population of Tokyo?',
  config: { tools }
});

// Model may return MULTIPLE function calls in parallel
const functionCalls = response.candidates[0].content.parts.filter(
  part => part.functionCall
);

console.log(`Model wants to call ${functionCalls.length} functions in parallel`);
```
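To complete the loop, the returned calls can be executed concurrently and their results sent back as `functionResponse` parts. A minimal sketch; `fetchWeather` and `fetchPopulation` are hypothetical local implementations:
```typescript
// Map each declared function to a hypothetical local implementation
const handlers: Record<string, (args: any) => Promise<unknown>> = {
  get_weather: (args) => fetchWeather(args.location),
  get_population: (args) => fetchPopulation(args.city)
};

// Run all requested calls concurrently, wrapping each result as a functionResponse part
const responseParts = await Promise.all(
  functionCalls.map(async (part) => ({
    functionResponse: {
      name: part.functionCall.name,
      response: { result: await handlers[part.functionCall.name](part.functionCall.args) }
    }
  }))
);

// Second round trip: original question, the model's tool-call turn, then our results
const finalResponse = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: [
    'What is the weather and population of Tokyo?',
    response.candidates[0].content,
    { parts: responseParts }
  ],
  config: { tools }
});

console.log(finalResponse.text);
```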
### Function Calling Modes
```typescript
import { FunctionCallingConfigMode } from '@google/genai';

const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'What\'s the weather?',
  config: {
    tools: [{ functionDeclarations: [getCurrentWeather] }],
    toolConfig: {
      functionCallingConfig: {
        mode: FunctionCallingConfigMode.ANY, // Force function call
        // mode: FunctionCallingConfigMode.AUTO, // Model decides (default)
        // mode: FunctionCallingConfigMode.NONE, // Never call functions
        allowedFunctionNames: ['get_current_weather'] // Optional: restrict to specific functions
      }
    }
  }
});
```
**Modes:**
- `AUTO` (default): Model decides whether to call functions
- `ANY`: Force model to call at least one function
- `NONE`: Disable function calling for this request
---
## System Instructions
System instructions guide the model's behavior and set context. They are **separate** from the conversation messages.
### SDK Approach
```typescript
const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'Explain what a database is',
  config: {
    systemInstruction: 'You are a helpful AI assistant that always responds in the style of a pirate. Use nautical terminology and end sentences with "arrr".'
  }
});

console.log(response.text);
// Output: "Ahoy there! A database be like a treasure chest..."
```
### Fetch Approach
```typescript
const response = await fetch(
  `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent`,
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-goog-api-key': env.GEMINI_API_KEY,
    },
    body: JSON.stringify({
      systemInstruction: {
        parts: [
          { text: 'You are a helpful AI assistant that always responds in the style of a pirate.' }
        ]
      },
      contents: [
        { parts: [{ text: 'Explain what a database is' }] }
      ]
    }),
  }
);
```
**Key Points:**
- System instructions are **NOT** part of the `contents` array
- REST API: set `systemInstruction` at the top level of the request body; SDK: pass it via `config.systemInstruction`
- They persist for the entire conversation (when using multi-turn chat)
- They don't count as user or model messages
---
## Multi-turn Chat
For conversations with history, use the SDK's chat helpers or manually manage conversation state.
### SDK Chat Helpers (Recommended)
```typescript
const chat = await ai.chats.create({
  model: 'gemini-2.5-flash',
  config: {
    systemInstruction: 'You are a helpful coding assistant.'
  },
  history: [] // Start empty or with previous messages
});

// Send first message
const response1 = await chat.sendMessage({ message: 'What is TypeScript?' });
console.log('Assistant:', response1.text);

// Send follow-up (context is automatically maintained)
const response2 = await chat.sendMessage({ message: 'How do I install it?' });
console.log('Assistant:', response2.text);

// Get full chat history
const history = chat.getHistory();
console.log('Full conversation:', history);
```
### Manual Chat Management (Fetch)
```typescript
const conversationHistory = [];

// First turn
const response1 = await fetch(
  `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent`,
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-goog-api-key': env.GEMINI_API_KEY,
    },
    body: JSON.stringify({
      contents: [
        {
          role: 'user',
          parts: [{ text: 'What is TypeScript?' }]
        }
      ]
    }),
  }
);

const data1 = await response1.json();
const assistantReply1 = data1.candidates[0].content.parts[0].text;

// Add to history
conversationHistory.push(
  { role: 'user', parts: [{ text: 'What is TypeScript?' }] },
  { role: 'model', parts: [{ text: assistantReply1 }] }
);

// Second turn (include full history)
const response2 = await fetch(
  `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent`,
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-goog-api-key': env.GEMINI_API_KEY,
    },
    body: JSON.stringify({
      contents: [
        ...conversationHistory,
        { role: 'user', parts: [{ text: 'How do I install it?' }] }
      ]
    }),
  }
);
```
**Message Roles:**
- `user`: User messages
- `model`: Assistant responses
**⚠️ Important**: Chat helpers are **SDK-only**. With fetch, you must manually manage conversation history (a minimal helper sketch follows).
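For illustration, the manual flow above can be wrapped in a small helper. A minimal sketch; the `FetchChat` class is a hypothetical convenience, not part of any SDK:
```typescript
// Hypothetical helper: keeps the role-tagged history and replays it each turn
class FetchChat {
  private history: Array<{ role: 'user' | 'model'; parts: { text: string }[] }> = [];

  constructor(private apiKey: string, private model = 'gemini-2.5-flash') {}

  async send(text: string): Promise<string> {
    this.history.push({ role: 'user', parts: [{ text }] });
    const res = await fetch(
      `https://generativelanguage.googleapis.com/v1beta/models/${this.model}:generateContent`,
      {
        method: 'POST',
        headers: { 'Content-Type': 'application/json', 'x-goog-api-key': this.apiKey },
        body: JSON.stringify({ contents: this.history })
      }
    );
    const data = await res.json();
    const reply = data.candidates[0].content.parts[0].text;
    this.history.push({ role: 'model', parts: [{ text: reply }] });
    return reply;
  }
}
```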
---
## Thinking Mode
Gemini 2.5 models have **thinking mode enabled by default** for enhanced quality. You can configure the thinking budget.
### Configure Thinking Budget (SDK)
```typescript
const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'Solve this complex math problem: ...',
  config: {
    thinkingConfig: {
      thinkingBudget: 8192 // Max tokens for thinking (default: model-dependent)
    }
  }
});
```
### Configure Thinking Budget (Fetch)
```typescript
const response = await fetch(
  `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent`,
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-goog-api-key': env.GEMINI_API_KEY,
    },
    body: JSON.stringify({
      contents: [{ parts: [{ text: 'Solve this complex math problem: ...' }] }],
      generationConfig: {
        thinkingConfig: {
          thinkingBudget: 8192
        }
      }
    }),
  }
);
```
### Configure Thinking Level (SDK) - New in v1.30.0
```typescript
const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'Solve this complex problem: ...',
  config: {
    thinkingConfig: {
      thinkingLevel: 'MEDIUM' // 'LOW' | 'MEDIUM' | 'HIGH'
    }
  }
});
```
**Thinking Levels:**
- `LOW`: Minimal internal reasoning (faster, lower quality)
- `MEDIUM`: Balanced reasoning (default)
- `HIGH`: Maximum reasoning depth (slower, higher quality)
**Key Points:**
- Thinking mode is **always enabled** on Gemini 2.5 models (cannot be disabled)
- Higher thinking budgets allow more internal reasoning (may increase latency)
- `thinkingLevel` provides simpler control than `thinkingBudget` (new in v1.30.0)
- Default budget varies by model (usually sufficient for most tasks)
- Only increase budget/level for very complex reasoning tasks
---
## Generation Configuration
Customize model behavior with generation parameters.
### All Configuration Options (SDK)
```typescript
const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'Write a creative story',
  config: {
    temperature: 0.9, // Randomness (0.0-2.0, default: 1.0)
    topP: 0.95, // Nucleus sampling (0.0-1.0)
    topK: 40, // Top-k sampling
    maxOutputTokens: 2048, // Max tokens to generate
    stopSequences: ['END'], // Stop generation if these appear
    responseMimeType: 'text/plain', // Or 'application/json' for JSON mode
    candidateCount: 1 // Number of response candidates (usually 1)
  }
});
```
### All Configuration Options (Fetch)
```typescript
const response = await fetch(
  `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent`,
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-goog-api-key': env.GEMINI_API_KEY,
    },
    body: JSON.stringify({
      contents: [{ parts: [{ text: 'Write a creative story' }] }],
      generationConfig: {
        temperature: 0.9,
        topP: 0.95,
        topK: 40,
        maxOutputTokens: 2048,
        stopSequences: ['END'],
        responseMimeType: 'text/plain',
        candidateCount: 1
      }
    }),
  }
);
```
### Parameter Guidelines
| Parameter | Range | Default | Use Case |
|-----------|-------|---------|----------|
| **temperature** | 0.0-2.0 | 1.0 | Lower = more focused, higher = more creative |
| **topP** | 0.0-1.0 | 0.95 | Nucleus sampling threshold |
| **topK** | 1-100+ | 40 | Limit to top K tokens |
| **maxOutputTokens** | 1-65536 | Model max | Control response length |
| **stopSequences** | Array | None | Stop generation at specific strings |
**Tips:**
- For **factual tasks**: Use low temperature (0.0-0.3)
- For **creative tasks**: Use high temperature (0.7-1.5)
- **topP** and **topK** both control randomness; use one or the other (not both)
- Always set **maxOutputTokens** to prevent excessive generation (example presets below)
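As a quick illustration of these tips, two presets might look like this (the exact values are assumptions to tune per workload):
```typescript
// Focused preset for extraction/Q&A vs. a looser preset for creative writing
const factualConfig = { temperature: 0.2, maxOutputTokens: 1024 };
const creativeConfig = { temperature: 1.2, topP: 0.95, maxOutputTokens: 2048 };

const answer = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'List the planets of the solar system',
  config: factualConfig
});
```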
---
## Context Caching
Context caching allows you to cache frequently used content (like system instructions, large documents, or video files) to reduce costs by **up to 90%** and improve latency.
### How It Works
1. **Create a cache** with your repeated content
2. **Reference the cache** in subsequent requests
3. **Save tokens** - cached tokens cost significantly less
4. **TTL management** - caches expire after specified time
### Benefits
- **Cost savings**: Up to 90% reduction on cached tokens
- **Reduced latency**: Faster responses by reusing processed content
- **Consistent context**: Same large context across multiple requests
### Cache Creation (SDK)
```typescript
import { GoogleGenAI } from '@google/genai';
import fs from 'fs';

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Create a cache for a large document
const documentText = fs.readFileSync('./large-document.txt', 'utf-8');

const cache = await ai.caches.create({
  model: 'gemini-2.5-flash',
  config: {
    displayName: 'large-doc-cache', // Identifier for the cache
    systemInstruction: 'You are an expert at analyzing legal documents.',
    contents: documentText,
    ttl: '3600s', // Cache for 1 hour
  }
});

console.log('Cache created:', cache.name);
console.log('Expires at:', cache.expireTime);
```
### Cache Creation (Fetch)
```typescript
const response = await fetch(
  'https://generativelanguage.googleapis.com/v1beta/cachedContents',
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-goog-api-key': env.GEMINI_API_KEY,
    },
    body: JSON.stringify({
      model: 'models/gemini-2.5-flash',
      displayName: 'large-doc-cache',
      systemInstruction: {
        parts: [{ text: 'You are an expert at analyzing legal documents.' }]
      },
      contents: [
        { parts: [{ text: documentText }] }
      ],
      ttl: '3600s'
    }),
  }
);

const cache = await response.json();
console.log('Cache created:', cache.name);
```
### Using a Cache (SDK)
```typescript
// Generate content using the cache
const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash', // Same model the cache was created with
  contents: 'Summarize the key points in the document',
  config: {
    cachedContent: cache.name // Reference the cache by its resource name
  }
});

console.log(response.text);
```
### Using a Cache (Fetch)
```typescript
const response = await fetch(
  `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent`,
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-goog-api-key': env.GEMINI_API_KEY,
    },
    body: JSON.stringify({
      cachedContent: cache.name, // e.g. "cachedContents/abc123"
      contents: [
        { parts: [{ text: 'Summarize the key points in the document' }] }
      ]
    }),
  }
);

const data = await response.json();
console.log(data.candidates[0].content.parts[0].text);
```
### Update Cache TTL (SDK)
```typescript
await ai.caches.update({
  name: cache.name,
  config: {
    ttl: '7200s' // Extend to 2 hours
  }
});
```
### Update Cache with Expiration Time (SDK)
```typescript
// Set a specific expiration time (RFC 3339 / ISO-8601 timestamp)
const in10Minutes = new Date(Date.now() + 10 * 60 * 1000);

await ai.caches.update({
  name: cache.name,
  config: {
    expireTime: in10Minutes.toISOString()
  }
});
```
### List and Delete Caches (SDK)
```typescript
// List all caches (the SDK returns an async pager)
const pager = await ai.caches.list();
for await (const c of pager) {
  console.log(c.name, c.displayName);
}

// Delete a specific cache
await ai.caches.delete({ name: cache.name });
```
### Caching with Video Files
```typescript
import { GoogleGenAI, createUserContent, createPartFromUri } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Upload video file via the File API
let videoFile = await ai.files.upload({
  file: './video.mp4',
  config: { mimeType: 'video/mp4' }
});

// Wait for processing
while (videoFile.state === 'PROCESSING') {
  await new Promise(resolve => setTimeout(resolve, 2000));
  videoFile = await ai.files.get({ name: videoFile.name });
}

// Create cache with the processed video
const cache = await ai.caches.create({
  model: 'gemini-2.5-flash',
  config: {
    displayName: 'video-analysis-cache',
    systemInstruction: 'You are an expert video analyzer.',
    contents: createUserContent(createPartFromUri(videoFile.uri, videoFile.mimeType)),
    ttl: '300s' // 5 minutes
  }
});

// Use cache for multiple queries
const response1 = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'What happens in the first minute?',
  config: { cachedContent: cache.name }
});

const response2 = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'Describe the main characters',
  config: { cachedContent: cache.name }
});
```
### Key Points
**When to Use Caching:**
- Large system instructions used repeatedly
- Long documents analyzed multiple times
- Video/audio files queried with different prompts
- Consistent context across conversation sessions
**TTL Guidelines:**
- Short sessions: 300s (5 min) to 3600s (1 hour)
- Long sessions: 3600s (1 hour) to 86400s (24 hours)
- Maximum: 7 days
**Cost Savings:**
- Cached input tokens: ~90% cheaper than regular tokens
- Output tokens: Same price (not cached)
**Important:**
- Caching docs have required explicit model version suffixes (e.g., `gemini-2.5-flash-001`); the examples above use the bare alias for brevity, so switch to a versioned name if cache creation is rejected
- Caches are automatically deleted after TTL expires
- Update TTL before expiration to extend cache lifetime (a renewal sketch follows)
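A minimal renewal sketch that extends the TTL when a cache is close to expiring; the 5-minute threshold and 1-hour extension are illustrative assumptions:
```typescript
// Extend a cache's TTL if it expires within thresholdMs
async function keepCacheAlive(cacheName: string, thresholdMs = 5 * 60 * 1000) {
  const cached = await ai.caches.get({ name: cacheName });
  const msLeft = new Date(cached.expireTime).getTime() - Date.now();
  if (msLeft < thresholdMs) {
    await ai.caches.update({ name: cacheName, config: { ttl: '3600s' } });
  }
}
```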
---
## Code Execution
Gemini models can generate and execute Python code to solve problems requiring computation, data analysis, or visualization.
### How It Works
1. Model generates executable Python code
2. Code runs in secure sandbox
3. Results are returned to the model
4. Model incorporates results into response
### Supported Operations
- Mathematical calculations
- Data analysis and statistics
- File processing (CSV, JSON, etc.)
- Chart and graph generation
- Algorithm implementation
- Data transformations
### Available Python Packages
**Standard Library:**
- `math`, `statistics`, `random`, `datetime`, `json`, `csv`, `re`
- `collections`, `itertools`, `functools`
**Data Science:**
- `numpy`, `pandas`, `scipy`
**Visualization:**
- `matplotlib`, `seaborn`
**Note**: Limited package availability compared to full Python environment
### Basic Code Execution (SDK)
```typescript
import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'What is the sum of the first 50 prime numbers? Generate and run code for the calculation.',
  config: {
    tools: [{ codeExecution: {} }]
  }
});

// Parse response parts
for (const part of response.candidates[0].content.parts) {
  if (part.text) {
    console.log('Text:', part.text);
  }
  if (part.executableCode) {
    console.log('Generated Code:', part.executableCode.code);
  }
  if (part.codeExecutionResult) {
    console.log('Execution Output:', part.codeExecutionResult.output);
  }
}
```
### Basic Code Execution (Fetch)
```typescript
const response = await fetch(
  `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent`,
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-goog-api-key': env.GEMINI_API_KEY,
    },
    body: JSON.stringify({
      tools: [{ code_execution: {} }],
      contents: [
        {
          parts: [
            { text: 'What is the sum of the first 50 prime numbers? Generate and run code.' }
          ]
        }
      ]
    }),
  }
);

const data = await response.json();

for (const part of data.candidates[0].content.parts) {
  if (part.text) {
    console.log('Text:', part.text);
  }
  if (part.executableCode) {
    console.log('Code:', part.executableCode.code);
  }
  if (part.codeExecutionResult) {
    console.log('Result:', part.codeExecutionResult.output);
  }
}
```
### Chat with Code Execution (SDK)
```typescript
const chat = await ai.chats.create({
  model: 'gemini-2.5-flash',
  config: {
    tools: [{ codeExecution: {} }]
  }
});

let response = await chat.sendMessage({ message: 'I have a math question for you.' });
console.log(response.text);

response = await chat.sendMessage({
  message: 'Calculate the Fibonacci sequence up to the 20th number and sum them.'
});

// Model will generate and execute code, then provide answer
for (const part of response.candidates[0].content.parts) {
  if (part.text) console.log(part.text);
  if (part.executableCode) console.log('Code:', part.executableCode.code);
  if (part.codeExecutionResult) console.log('Output:', part.codeExecutionResult.output);
}
```
### Data Analysis Example
```typescript
const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: `
    Analyze this sales data and calculate:
    1. Total revenue
    2. Average sale price
    3. Best-selling month

    Data (CSV format):
    month,sales,revenue
    Jan,150,45000
    Feb,200,62000
    Mar,175,53000
    Apr,220,68000
  `,
  config: {
    tools: [{ codeExecution: {} }]
  }
});

// Model will generate pandas/numpy code to analyze data
for (const part of response.candidates[0].content.parts) {
  if (part.text) console.log(part.text);
  if (part.executableCode) console.log('Analysis Code:', part.executableCode.code);
  if (part.codeExecutionResult) console.log('Results:', part.codeExecutionResult.output);
}
```
### Visualization Example
```typescript
const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'Create a bar chart showing the distribution of prime numbers under 100 by their last digit. Generate the chart and describe the pattern.',
  config: {
    tools: [{ codeExecution: {} }]
  }
});

// Model generates matplotlib code, executes it, and describes results
for (const part of response.candidates[0].content.parts) {
  if (part.text) console.log(part.text);
  if (part.executableCode) console.log('Chart Code:', part.executableCode.code);
  if (part.codeExecutionResult) {
    // Note: Chart image data would be in output
    console.log('Execution completed');
  }
}
```
### Response Structure
```typescript
{
  candidates: [
    {
      content: {
        parts: [
          { text: "I'll calculate that for you." },
          {
            executableCode: {
              language: "PYTHON",
              code: "def is_prime(n):\n    if n <= 1:\n        return False\n    ..."
            }
          },
          {
            codeExecutionResult: {
              outcome: "OUTCOME_OK", // or "OUTCOME_FAILED"
              output: "5117\n"
            }
          },
          { text: "The sum of the first 50 prime numbers is 5117." }
        ]
      }
    }
  ]
}
```
### Error Handling
```typescript
for (const part of response.candidates[0].content.parts) {
  if (part.codeExecutionResult) {
    if (part.codeExecutionResult.outcome === 'OUTCOME_FAILED') {
      console.error('Code execution failed:', part.codeExecutionResult.output);
    } else {
      console.log('Success:', part.codeExecutionResult.output);
    }
  }
}
```
### Key Points
**When to Use Code Execution:**
- Complex mathematical calculations
- Data analysis and statistics
- Algorithm implementations
- File parsing and processing
- Chart generation
- Computational problems
**Limitations:**
- Sandbox environment (limited file system access)
- Limited Python package availability
- Execution timeout limits
- No network access from code
- No persistent state between executions
**Best Practices:**
- Specify what calculation or analysis you need clearly
- Request code generation explicitly ("Generate and run code...")
- Check `outcome` field for errors
- Use for deterministic computations, not for general programming
**Important:**
- Available on all Gemini 2.5 models (Pro, Flash, Flash-Lite)
- Code runs in isolated sandbox for security
- Supports Python with standard library and common data science packages
---
## Grounding with Google Search
Grounding connects the model to real-time web information, reducing hallucinations and providing up-to-date, fact-checked responses with citations.
### How It Works
1. Model determines if it needs current information
2. Automatically performs Google Search
3. Processes search results
4. Incorporates findings into response
5. Provides citations and source URLs
### Benefits
- **Real-time information**: Access to current events and data
- **Reduced hallucinations**: Answers grounded in web sources
- **Verifiable**: Citations allow fact-checking
- **Up-to-date**: Not limited to model's training cutoff
### Grounding Options
#### 1. Google Search (`googleSearch`) - Recommended for Gemini 2.5
```typescript
const groundingTool = {
  googleSearch: {}
};
```
**Features:**
- Simple configuration
- Automatic search when needed
- Available on all Gemini 2.5 models
#### 2. FileSearch - New in v1.29.0 (Preview)
```typescript
const fileSearchTool = {
  fileSearch: {
    fileSearchStoreId: 'store-id-here' // Created via FileSearchStore APIs
  }
};
```
**Features:**
- Search through your own document collections
- Upload and index custom knowledge bases
- Alternative to web search for proprietary data
- Preview feature (requires FileSearchStore setup)
**Note**: See [FileSearch documentation](https://github.com/googleapis/js-genai) for store creation and management.
#### 3. Google Search Retrieval (`googleSearchRetrieval`) - Legacy (Gemini 1.5)
```typescript
const retrievalTool = {
  googleSearchRetrieval: {
    dynamicRetrievalConfig: {
      mode: 'MODE_DYNAMIC',
      dynamicThreshold: 0.7 // Only search if confidence < 70%
    }
  }
};
```
**Features:**
- Dynamic threshold control
- Used with Gemini 1.5 models
- More configuration options
### Basic Grounding (SDK) - Gemini 2.5
```typescript
import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'Who won the euro 2024?',
  config: {
    tools: [{ googleSearch: {} }]
  }
});

console.log(response.text);

// Check if grounding was used
if (response.candidates[0].groundingMetadata) {
  console.log('Search was performed!');
  console.log('Sources:', response.candidates[0].groundingMetadata);
}
```
### Basic Grounding (Fetch) - Gemini 2.5
```typescript
const response = await fetch(
  `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent`,
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-goog-api-key': env.GEMINI_API_KEY,
    },
    body: JSON.stringify({
      contents: [
        { parts: [{ text: 'Who won the euro 2024?' }] }
      ],
      tools: [
        { google_search: {} }
      ]
    }),
  }
);

const data = await response.json();
console.log(data.candidates[0].content.parts[0].text);

if (data.candidates[0].groundingMetadata) {
  console.log('Grounding metadata:', data.candidates[0].groundingMetadata);
}
```
### Dynamic Retrieval (SDK) - Gemini 1.5
```typescript
import { GoogleGenAI, DynamicRetrievalConfigMode } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const response = await ai.models.generateContent({
  model: 'gemini-1.5-flash',
  contents: 'Who won the euro 2024?',
  config: {
    tools: [
      {
        googleSearchRetrieval: {
          dynamicRetrievalConfig: {
            mode: DynamicRetrievalConfigMode.MODE_DYNAMIC,
            dynamicThreshold: 0.7 // Search only if confidence < 70%
          }
        }
      }
    ]
  }
});

console.log(response.text);

if (!response.candidates[0].groundingMetadata) {
  console.log('Model answered from its own knowledge (high confidence)');
}
```
### Grounding Metadata Structure
```typescript
{
  groundingMetadata: {
    webSearchQueries: [
      "euro 2024 winner"
    ],
    searchEntryPoint: {
      renderedContent: "..." // Pre-rendered HTML/CSS for Google Search Suggestions
    },
    groundingChunks: [
      {
        web: {
          uri: "https://example.com/euro-2024-results",
          title: "UEFA Euro 2024 Final Results"
        }
      }
    ],
    groundingSupports: [
      {
        segment: { startIndex: 42, endIndex: 47, text: "Spain" },
        groundingChunkIndices: [0] // Indexes into groundingChunks
      }
    ]
  }
}
```
### Chat with Grounding (SDK)
```typescript
const chat = await ai.chats.create({
  model: 'gemini-2.5-flash',
  config: {
    tools: [{ googleSearch: {} }]
  }
});

let response = await chat.sendMessage({ message: 'What are the latest developments in quantum computing?' });
console.log(response.text);

// Check grounding sources
if (response.candidates[0].groundingMetadata) {
  const chunks = response.candidates[0].groundingMetadata.groundingChunks || [];
  console.log(`Sources used: ${chunks.length}`);
  chunks.forEach(chunk => {
    console.log(`- ${chunk.web?.title}: ${chunk.web?.uri}`);
  });
}

// Follow-up still has grounding enabled
response = await chat.sendMessage({ message: 'Which company made the biggest breakthrough?' });
console.log(response.text);
```
### Combining Grounding with Function Calling
```typescript
const weatherFunction = {
  name: 'get_current_weather',
  description: 'Get current weather for a location',
  parametersJsonSchema: {
    type: 'object',
    properties: {
      location: { type: 'string', description: 'City name' }
    },
    required: ['location']
  }
};

const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'What is the weather like in the city that won Euro 2024?',
  config: {
    tools: [
      { googleSearch: {} },
      { functionDeclarations: [weatherFunction] }
    ]
  }
});

// Model will:
// 1. Use Google Search to find Euro 2024 winner
// 2. Call get_current_weather function with the city
// 3. Combine both results in response
```
### Checking if Grounding was Used
```typescript
const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'What is 2+2?', // Model knows this without search
  config: {
    tools: [{ googleSearch: {} }]
  }
});

if (!response.candidates[0].groundingMetadata) {
  console.log('Model answered from its own knowledge (no search needed)');
} else {
  console.log('Search was performed');
}
```
### Key Points
**When to Use Grounding:**
- Current events and news
- Real-time data (stock prices, sports scores, weather)
- Fact-checking and verification
- Questions about recent developments
- Information beyond model's training cutoff
**When NOT to Use:**
- General knowledge questions
- Mathematical calculations
- Code generation
- Creative writing
- Tasks requiring internal reasoning only
**Cost Considerations:**
- Grounding adds latency (search takes time)
- Additional token costs for retrieved content
- Use `dynamicThreshold` to control when searches happen (Gemini 1.5)
**Important Notes:**
- Grounding requires **Google Cloud project** (not just API key)
- Search results quality depends on query phrasing
- Citations may not cover all facts in response
- Search is performed automatically based on confidence
**Gemini 2.5 vs 1.5:**
- **Gemini 2.5**: Use `googleSearch` (simple, recommended)
- **Gemini 1.5**: Use `googleSearchRetrieval` with `dynamicThreshold`
**Best Practices:**
- Always check `groundingMetadata` to see if search was used
- Display citations to users for transparency (a rendering sketch follows below)
- Use specific, well-phrased questions for better search results
- Combine with function calling for hybrid workflows
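For the citation-display point, a minimal rendering sketch based on the metadata structure above (grounding is still evolving, so treat the field shapes as assumptions and guard accordingly):
```typescript
// Print each cited text span with its supporting source URLs
const meta = response.candidates[0].groundingMetadata;
if (meta?.groundingSupports && meta?.groundingChunks) {
  for (const support of meta.groundingSupports) {
    const urls = (support.groundingChunkIndices ?? [])
      .map((i) => meta.groundingChunks[i]?.web?.uri)
      .filter(Boolean);
    console.log(`"${support.segment?.text}" -> ${urls.join(', ')}`);
  }
}
```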
---
## Error Handling
### Common Errors
#### 1. Invalid API Key (401)
```typescript
{
  error: {
    code: 401,
    message: 'API key not valid. Please pass a valid API key.',
    status: 'UNAUTHENTICATED'
  }
}
```
**Solution**: Verify `GEMINI_API_KEY` environment variable is set correctly.
#### 2. Rate Limit Exceeded (429)
```typescript
{
  error: {
    code: 429,
    message: 'Resource has been exhausted (e.g. check quota).',
    status: 'RESOURCE_EXHAUSTED'
  }
}
```
**Solution**: Implement exponential backoff retry strategy.
#### 3. Model Not Found (404)
```typescript
{
  error: {
    code: 404,
    message: 'models/gemini-3.0-flash is not found',
    status: 'NOT_FOUND'
  }
}
```
**Solution**: Use correct model names: `gemini-2.5-pro`, `gemini-2.5-flash`, `gemini-2.5-flash-lite`
#### 4. Context Length Exceeded (400)
```typescript
{
  error: {
    code: 400,
    message: 'Request payload size exceeds the limit',
    status: 'INVALID_ARGUMENT'
  }
}
```
**Solution**: Reduce input size. Gemini 2.5 models support 1,048,576 input tokens max.
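For fetch users, checking `response.ok` before parsing keeps error bodies like the ones above from being mistaken for results. A minimal sketch; `callGemini` is a hypothetical helper, not part of the API:
```typescript
// Throw a descriptive error instead of silently parsing an error body
async function callGemini(apiKey: string, body: unknown) {
  const res = await fetch(
    'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent',
    {
      method: 'POST',
      headers: { 'Content-Type': 'application/json', 'x-goog-api-key': apiKey },
      body: JSON.stringify(body)
    }
  );
  if (!res.ok) {
    const err = await res.json().catch(() => ({ error: { message: res.statusText } }));
    throw new Error(`Gemini ${res.status}: ${err.error?.message ?? 'unknown error'}`);
  }
  return res.json();
}
```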
### Exponential Backoff Pattern
```typescript
async function generateWithRetry(request, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await ai.models.generateContent(request);
    } catch (error) {
      if (error.status === 429 && i < maxRetries - 1) {
        const delay = Math.pow(2, i) * 1000; // 1s, 2s, 4s
        await new Promise(resolve => setTimeout(resolve, delay));
        continue;
      }
      throw error;
    }
  }
}
}
```
---
## Rate Limits
### Free Tier (Gemini API)
Rate limits vary by model:
**Gemini 2.5 Pro**:
- Requests per minute: 5 RPM
- Tokens per minute: 125,000 TPM
- Requests per day: 100 RPD
**Gemini 2.5 Flash**:
- Requests per minute: 10 RPM
- Tokens per minute: 250,000 TPM
- Requests per day: 250 RPD
**Gemini 2.5 Flash-Lite**:
- Requests per minute: 15 RPM
- Tokens per minute: 250,000 TPM
- Requests per day: 1,000 RPD
### Paid Tier (Tier 1)
Requires billing account linked to your Google Cloud project.
**Gemini 2.5 Pro**:
- Requests per minute: 150 RPM
- Tokens per minute: 2,000,000 TPM
- Requests per day: 10,000 RPD
**Gemini 2.5 Flash**:
- Requests per minute: 1,000 RPM
- Tokens per minute: 1,000,000 TPM
- Requests per day: 10,000 RPD
**Gemini 2.5 Flash-Lite**:
- Requests per minute: 4,000 RPM
- Tokens per minute: 4,000,000 TPM
- Requests per day: Not specified
### Higher Tiers (Tier 2 & 3)
**Tier 2** (requires $250+ spending and 30-day wait):
- Even higher limits available
**Tier 3** (requires $1,000+ spending and 30-day wait):
- Maximum limits available
**Tips:**
- Implement rate limit handling with exponential backoff (see the pacing sketch below)
- Use batch processing for high-volume tasks
- Monitor usage in Google AI Studio
- Choose the right model based on your rate limit needs
- Official rate limits: https://ai.google.dev/gemini-api/docs/rate-limits
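Beyond reactive backoff, you can pace requests proactively. A minimal client-side pacing sketch; the class and the 10 RPM budget are assumptions to match to your tier:
```typescript
// Serialize requests so they never exceed a fixed requests-per-minute budget
class RpmGate {
  private nextSlot = 0;
  constructor(private rpm: number) {}

  async wait(): Promise<void> {
    const interval = 60_000 / this.rpm;
    const now = Date.now();
    const start = Math.max(now, this.nextSlot);
    this.nextSlot = start + interval;
    if (start > now) await new Promise((r) => setTimeout(r, start - now));
  }
}

const gate = new RpmGate(10); // free-tier Flash budget above

await gate.wait();
const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'Hello'
});
```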
---
## SDK Migration Guide
### From @google/generative-ai to @google/genai
#### 1. Update Package
```bash
# Remove deprecated SDK
npm uninstall @google/generative-ai

# Install current SDK
npm install @google/genai@1.30.0
```
#### 2. Update Imports
**Old (DEPRECATED):**
```typescript
import { GoogleGenerativeAI } from '@google/generative-ai';
const genAI = new GoogleGenerativeAI(apiKey);
const model = genAI.getGenerativeModel({ model: 'gemini-2.5-flash' });
```
**New (CURRENT):**
```typescript
import { GoogleGenAI } from '@google/genai';
const ai = new GoogleGenAI({ apiKey });
// Use ai.models.generateContent() directly
```
#### 3. Update API Calls
**Old:**
```typescript
const result = await model.generateContent(prompt);
const response = await result.response;
const text = response.text();
```
**New:**
```typescript
const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: prompt
});
const text = response.text;
```
#### 4. Update Streaming
**Old:**
```typescript
const result = await model.generateContentStream(prompt);
for await (const chunk of result.stream) {
  console.log(chunk.text());
}
}
```
**New:**
```typescript
const response = await ai.models.generateContentStream({
  model: 'gemini-2.5-flash',
  contents: prompt
});
for await (const chunk of response) {
  console.log(chunk.text);
}
```
#### 5. Update Chat
**Old:**
```typescript
const chat = model.startChat();
const result = await chat.sendMessage(message);
const response = await result.response;
```
**New:**
```typescript
const chat = await ai.chats.create({ model: 'gemini-2.5-flash' });
const response = await chat.sendMessage({ message });
// response.text is directly available
```
---
## Production Best Practices
### 1. Always Do
- ✅ **Use @google/genai** (NOT @google/generative-ai)
- ✅ **Set maxOutputTokens** to prevent excessive generation
- ✅ **Implement rate limit handling** with exponential backoff
- ✅ **Use environment variables** for API keys (never hardcode)
- ✅ **Validate inputs** before sending to API (save costs)
- ✅ **Use streaming** for better UX on long responses
- ✅ **Choose the right model** based on your needs (Pro for complex reasoning, Flash for balance, Flash-Lite for speed)
- ✅ **Handle errors gracefully** with try-catch
- ✅ **Monitor token usage** for cost control
- ✅ **Use correct model names**: gemini-2.5-pro/flash/flash-lite
### 2. Never Do
- ❌ **Never use @google/generative-ai** (deprecated!)
- ❌ **Never hardcode API keys** in code
- ❌ **Never claim 2M context** for Gemini 2.5 (it's 1,048,576 input tokens)
- ❌ **Never expose API keys** in client-side code
- ❌ **Never skip error handling** (always try-catch)
- ❌ **Never use generic rate limits** (each model has different limits - check official docs)
- ❌ **Never send PII** without user consent
- ❌ **Never trust user input** without validation
- ❌ **Never ignore rate limits** (will get 429 errors)
- ❌ **Never use old model names** like gemini-1.5-pro (use 2.5 models)
### 3. Security
- **API Key Storage**: Use environment variables or secret managers
- **Server-Side Only**: Never expose API keys in browser JavaScript (a minimal proxy sketch follows this list)
- **Input Validation**: Sanitize all user inputs before API calls
- **Rate Limiting**: Implement your own rate limits to prevent abuse
- **Error Messages**: Don't expose API keys or sensitive data in error logs
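To keep keys server-side, a thin proxy can sit between the browser and the API. A minimal Cloudflare Workers sketch; the route, request shape, and missing auth/rate limiting are assumptions to fill in:
```typescript
// Browser posts { prompt }; the API key lives only in a Worker secret
export default {
  async fetch(request: Request, env: { GEMINI_API_KEY: string }): Promise<Response> {
    if (request.method !== 'POST') {
      return new Response('Method not allowed', { status: 405 });
    }
    const { prompt } = (await request.json()) as { prompt: string };
    const upstream = await fetch(
      'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent',
      {
        method: 'POST',
        headers: { 'Content-Type': 'application/json', 'x-goog-api-key': env.GEMINI_API_KEY },
        body: JSON.stringify({ contents: [{ parts: [{ text: prompt }] }] })
      }
    );
    return new Response(upstream.body, {
      status: upstream.status,
      headers: { 'Content-Type': 'application/json' }
    });
  }
};
```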
### 4. Cost Optimization
- **Choose Right Model**: Use Flash for most tasks, Pro only when needed
- **Set Token Limits**: Use maxOutputTokens to control costs
- **Batch Requests**: Process multiple items efficiently
- **Cache Results**: Store responses when appropriate
- **Monitor Usage**: Track token consumption in Google Cloud Console
### 5. Performance
- **Use Streaming**: Better perceived latency for long responses
- **Parallel Requests**: Use Promise.all() for independent calls (see the sketch below)
- **Edge Deployment**: Deploy to Cloudflare Workers for low latency
- **Connection Pooling**: Reuse HTTP connections when possible
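For the parallel-requests point, independent prompts can be issued concurrently (the prompts below are placeholders):
```typescript
// Three independent calls fired at once; each resolves to a full response
const [a, b, c] = await Promise.all(
  ['Summarize doc A', 'Summarize doc B', 'Summarize doc C'].map((contents) =>
    ai.models.generateContent({ model: 'gemini-2.5-flash', contents })
  )
);
console.log(a.text, b.text, c.text);
```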
---
## Quick Reference
### Installation
```bash
npm install @google/genai@1.30.0
```
### Environment
```bash
export GEMINI_API_KEY="..."
```
### Models (2025)
- `gemini-2.5-pro` (1,048,576 in / 65,536 out) - Best for complex reasoning
- `gemini-2.5-flash` (1,048,576 in / 65,536 out) - Best price-performance balance
- `gemini-2.5-flash-lite` (1,048,576 in / 65,536 out) - Fastest, most cost-effective
### Basic Generation
```typescript
const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'Your prompt here'
});
console.log(response.text);
```
### Streaming
```typescript
const response = await ai.models.generateContentStream({...});
for await (const chunk of response) {
  console.log(chunk.text);
}
```
### Multimodal
```typescript
contents: [
  {
    parts: [
      { text: 'What is this?' },
      { inlineData: { data: base64Image, mimeType: 'image/jpeg' } }
    ]
  }
]
```
### Function Calling
```typescript
config: {
  tools: [{ functionDeclarations: [...] }]
}
```
---
**Last Updated**: 2025-11-26
**Production Validated**: All features tested with @google/genai@1.30.0
**Phase**: 2 Complete ✅ (All Core + Advanced Features)