Initial commit

This commit is contained in:
Zhongwei Li
2025-11-30 08:24:51 +08:00
commit 8aebb293cd
31 changed files with 7386 additions and 0 deletions


@@ -0,0 +1,481 @@
# Code Execution Patterns
Complete guide to using code execution with Google Gemini API for computational tasks, data analysis, and problem-solving.
---
## What is Code Execution?
Code Execution allows Gemini models to generate and execute Python code to solve problems requiring computation, enabling the model to:
- Perform precise mathematical calculations
- Analyze data with pandas/numpy
- Generate charts and visualizations
- Implement algorithms
- Process files and data structures
---
## How It Works
1. **Model receives prompt** requiring computation
2. **Model generates Python code** to solve the problem
3. **Code executes in sandbox** (secure, isolated environment)
4. **Results return to model** for incorporation into response
5. **Model explains results** in natural language
---
## Enabling Code Execution
### Basic Setup (SDK)
```typescript
import { GoogleGenAI } from '@google/genai';
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash', // Or gemini-2.5-pro
contents: 'Calculate the sum of first 50 prime numbers',
config: {
tools: [{ codeExecution: {} }] // Enable code execution
}
});
```
### Basic Setup (Fetch)
```typescript
const response = await fetch(
`https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent`,
{
method: 'POST',
headers: {
'Content-Type': 'application/json',
'x-goog-api-key': env.GEMINI_API_KEY,
},
body: JSON.stringify({
tools: [{ code_execution: {} }],
contents: [{ parts: [{ text: 'Calculate...' }] }]
}),
}
);
```
---
## Available Python Packages
### Standard Library
- `math`, `statistics`, `random`
- `datetime`, `time`, `calendar`
- `json`, `csv`, `re`
- `collections`, `itertools`, `functools`
### Data Science
- `numpy` - numerical computing
- `pandas` - data analysis and manipulation
- `scipy` - scientific computing
### Visualization
- `matplotlib` - plotting and charts
- `seaborn` - statistical visualization
**Note**: This is a **limited sandbox environment** - not all PyPI packages are available.
---
## Response Structure
### Parsing Code Execution Results
```typescript
for (const part of response.candidates[0].content.parts) {
// Inline text
if (part.text) {
console.log('Text:', part.text);
}
// Generated code
if (part.executableCode) {
console.log('Language:', part.executableCode.language); // "PYTHON"
console.log('Code:', part.executableCode.code);
}
// Execution results
if (part.codeExecutionResult) {
console.log('Outcome:', part.codeExecutionResult.outcome); // "OUTCOME_OK" or "OUTCOME_FAILED"
console.log('Output:', part.codeExecutionResult.output);
}
}
```
### Example Response
```json
{
"candidates": [{
"content": {
"parts": [
{ "text": "I'll calculate that for you." },
{
"executableCode": {
"language": "PYTHON",
"code": "primes = []\nnum = 2\nwhile len(primes) < 50:\n if is_prime(num):\n primes.append(num)\n num += 1\nprint(sum(primes))"
}
},
{
"codeExecutionResult": {
"outcome": "OUTCOME_OK",
"output": "5117\n"
}
},
{ "text": "The sum is 5117." }
]
}
}]
}
```
---
## Common Patterns
### 1. Mathematical Calculations
```typescript
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: 'Calculate the 100th Fibonacci number',
config: { tools: [{ codeExecution: {} }] }
});
```
**Prompting Tip**: Use phrases like "generate and run code" or "calculate using code" to explicitly request code execution.
### 2. Data Analysis
```typescript
const prompt = `
Analyze this sales data:
month,revenue,customers
Jan,50000,120
Feb,62000,145
Mar,58000,138
Calculate:
1. Total revenue
2. Average revenue per customer
3. Month-over-month growth rate
Use pandas or numpy for analysis.
`;
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: prompt,
config: { tools: [{ codeExecution: {} }] }
});
```
### 3. Chart Generation
```typescript
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: 'Create a bar chart showing prime number distribution by last digit (0-9) for primes under 100',
config: { tools: [{ codeExecution: {} }] }
});
```
**Note**: Generated charts are returned as inline image parts (`part.inlineData`) rather than as text in `codeExecutionResult.output`, so check both when rendering results.
### 4. Algorithm Implementation
```typescript
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: 'Implement quicksort and sort this list: [64, 34, 25, 12, 22, 11, 90]. Show the sorted result.',
config: { tools: [{ codeExecution: {} }] }
});
```
### 5. File Processing (In-Memory)
```typescript
const csvData = `name,age,city
Alice,30,NYC
Bob,25,LA
Charlie,35,Chicago`;
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: `Parse this CSV data and calculate average age:\n\n${csvData}`,
config: { tools: [{ codeExecution: {} }] }
});
```
---
## Chat with Code Execution
### Multi-Turn Computational Conversations
```typescript
const chat = await ai.chats.create({
model: 'gemini-2.5-flash',
config: { tools: [{ codeExecution: {} }] }
});
// First turn
let response = await chat.sendMessage({ message: 'I have a data analysis question' });
console.log(response.text);
// Second turn (will use code execution)
response = await chat.sendMessage({
  message: `
Calculate statistics for: [12, 15, 18, 22, 25, 28, 30]
- Mean
- Median
- Standard deviation
`
});
for (const part of response.candidates[0].content.parts) {
if (part.text) console.log(part.text);
if (part.executableCode) console.log('Code:', part.executableCode.code);
if (part.codeExecutionResult) console.log('Results:', part.codeExecutionResult.output);
}
```
---
## Error Handling
### Checking Execution Outcome
```typescript
for (const part of response.candidates[0].content.parts) {
if (part.codeExecutionResult) {
if (part.codeExecutionResult.outcome === 'OUTCOME_OK') {
console.log('✅ Success:', part.codeExecutionResult.output);
} else if (part.codeExecutionResult.outcome === 'OUTCOME_FAILED') {
console.error('❌ Execution failed:', part.codeExecutionResult.output);
}
}
}
```
### Common Execution Errors
**Timeout**:
```
Error: Execution timed out after 30 seconds
```
**Solution**: Simplify computation or reduce data size.
**Import Error**:
```
ModuleNotFoundError: No module named 'requests'
```
**Solution**: Use only available packages (numpy, pandas, matplotlib, seaborn, scipy).
**Syntax Error**:
```
SyntaxError: invalid syntax
```
**Solution**: Model generated invalid code - try rephrasing prompt or regenerating.
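When resilience matters, a caller can regenerate on `OUTCOME_FAILED`. A minimal retry sketch, assuming the `ai` client from the setup above (the helper name and retry count are illustrative):
```typescript
// Illustrative helper: retry generation when the sandbox reports a failed execution
async function generateWithRetry(prompt: string, maxRetries = 2) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const response = await ai.models.generateContent({
      model: 'gemini-2.5-flash',
      contents: prompt,
      config: { tools: [{ codeExecution: {} }] }
    });
    const parts = response.candidates?.[0]?.content?.parts ?? [];
    const failed = parts.some(
      (p) => p.codeExecutionResult?.outcome === 'OUTCOME_FAILED'
    );
    if (!failed) return response;
    console.warn(`Attempt ${attempt + 1} failed, regenerating...`);
  }
  throw new Error('Code execution kept failing after retries');
}
```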
---
## Best Practices
### ✅ Do
1. **Be Explicit**: Use phrases like "generate and run code" to trigger code execution
2. **Provide Data**: Include data directly in prompt for analysis
3. **Specify Output**: Ask for specific calculations or metrics
4. **Use Available Packages**: Stick to numpy, pandas, matplotlib, scipy
5. **Check Outcome**: Always verify `outcome === 'OUTCOME_OK'`
### ❌ Don't
1. **Network Access**: Code cannot make HTTP requests
2. **File System**: No persistent file storage between executions
3. **Long Computations**: Timeout limits apply (~30 seconds)
4. **External Dependencies**: Can't install new packages
5. **State Persistence**: Each execution is isolated (no global state)
---
## Limitations
### Sandbox Restrictions
- **No Network Access**: Cannot call external APIs
- **No File I/O**: Cannot read/write to disk (in-memory only)
- **Limited Packages**: Only pre-installed packages available
- **Execution Timeout**: ~30 seconds maximum
- **No State**: Each execution is independent
### Supported Models
**Works with**:
- `gemini-2.5-pro`
- `gemini-2.5-flash`
**Does NOT work with**:
- `gemini-2.5-flash-lite` (no code execution support)
- Gemini 1.5 models (use Gemini 2.5)
---
## Advanced Patterns
### Iterative Analysis
```typescript
const chat = await ai.chats.create({
model: 'gemini-2.5-flash',
config: { tools: [{ codeExecution: {} }] }
});
// Step 1: Initial analysis
let response = await chat.sendMessage({ message: 'Analyze data: [10, 20, 30, 40, 50]' });
// Step 2: Follow-up based on results
response = await chat.sendMessage({ message: 'Now calculate the variance' });
// Step 3: Visualization
response = await chat.sendMessage({ message: 'Create a histogram of this data' });
```
### Combining with Function Calling
```typescript
const weatherFunction = {
name: 'get_current_weather',
description: 'Get weather for a city',
parametersJsonSchema: {
type: 'object',
properties: { city: { type: 'string' } },
required: ['city']
}
};
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: 'Get weather for NYC, LA, Chicago. Calculate the average temperature.',
config: {
tools: [
{ functionDeclarations: [weatherFunction] },
{ codeExecution: {} }
]
}
});
// Model will:
// 1. Call get_current_weather for each city
// 2. Generate code to calculate average
// 3. Return result
```
### Data Transformation Pipeline
```typescript
const prompt = `
Transform this data:
Input: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Pipeline:
1. Filter odd numbers
2. Square each number
3. Calculate sum
4. Return result
Use code to process.
`;
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: prompt,
config: { tools: [{ codeExecution: {} }] }
});
```
---
## Optimization Tips
### 1. Clear Instructions
**❌ Vague**:
```typescript
contents: 'Analyze this data'
```
**✅ Specific**:
```typescript
contents: 'Calculate mean, median, and standard deviation for: [12, 15, 18, 22, 25]'
```
### 2. Provide Complete Data
```typescript
const csvData = `...complete dataset...`;
const prompt = `Analyze this CSV data:\n\n${csvData}\n\nCalculate total revenue.`;
```
### 3. Request Code Explicitly
```typescript
contents: 'Generate and run code to calculate the factorial of 20'
```
### 4. Handle Large Datasets
For large data, consider:
- Sampling (analyze subset)
- Aggregation (group by categories)
- Pagination (process in chunks)
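For example, sampling can shrink an array before it is embedded in the prompt. A sketch assuming the `ai` client from earlier (the sample size and data are illustrative):
```typescript
// Illustrative: down-sample a large dataset before embedding it in the prompt
function sample<T>(data: T[], n: number): T[] {
  const shuffled = [...data].sort(() => Math.random() - 0.5);
  return shuffled.slice(0, n);
}

const bigDataset = Array.from({ length: 100_000 }, (_, i) => i * 2);
const subset = sample(bigDataset, 500);

const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: `Analyze this random sample of a larger dataset:\n${JSON.stringify(subset)}\nEstimate the mean and spread.`,
  config: { tools: [{ codeExecution: {} }] }
});
```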
---
## Troubleshooting
### Code Not Executing
**Symptom**: Response has text but no `executableCode`
**Causes**:
1. Code execution not enabled (`tools: [{ codeExecution: {} }]`)
2. Model decided code wasn't necessary
3. Using `gemini-2.5-flash-lite` (doesn't support code execution)
**Solution**: Be explicit in prompt: "Use code to calculate..."
### Timeout Errors
**Symptom**: `OUTCOME_FAILED` with timeout message
**Causes**: Computation too complex or data too large
**Solution**:
- Simplify algorithm
- Reduce data size
- Use more efficient approach
### Import Errors
**Symptom**: `ModuleNotFoundError`
**Causes**: Trying to import unavailable package
**Solution**: Use only available packages (numpy, pandas, matplotlib, seaborn, scipy)
---
## References
- Official Docs: https://ai.google.dev/gemini-api/docs/code-execution
- Templates: See `code-execution.ts` for working examples
- Available Packages: See "Available Python Packages" section above


@@ -0,0 +1,373 @@
# Context Caching Guide
Complete guide to using context caching with Google Gemini API to reduce costs by up to 90%.
---
## What is Context Caching?
Context caching allows you to cache frequently used content (system instructions, large documents, videos) and reuse it across multiple requests, significantly reducing token costs and improving latency.
---
## How It Works
1. **Create a cache** with your repeated content (documents, videos, system instructions)
2. **Set TTL** (time-to-live) for cache expiration
3. **Reference the cache** in subsequent API calls
4. **Pay less** - cached tokens cost ~90% less than regular input tokens
---
## Benefits
### Cost Savings
- **Cached input tokens**: ~90% cheaper than regular tokens
- **Output tokens**: Same price (not cached)
- **Example**: 100K token document cached → ~10K token cost equivalent
### Performance
- **Reduced latency**: Cached content is preprocessed
- **Faster responses**: No need to reprocess large context
- **Consistent results**: Same context every time
### Use Cases
- Large documents analyzed repeatedly
- Long system instructions used across sessions
- Video/audio files queried multiple times
- Consistent conversation context
---
## Cache Creation
### Basic Cache (SDK)
```typescript
import { GoogleGenAI } from '@google/genai';
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
const cache = await ai.caches.create({
model: 'gemini-2.5-flash-001', // Must use explicit version!
config: {
displayName: 'my-cache',
systemInstruction: 'You are a helpful assistant.',
contents: 'Large document content here...',
ttl: '3600s', // 1 hour
}
});
```
### Cache with Expiration Time
```typescript
// Set specific expiration time (timezone-aware)
const expirationTime = new Date(Date.now() + 2 * 60 * 60 * 1000); // 2 hours from now
const cache = await ai.caches.create({
model: 'gemini-2.5-flash-001',
config: {
displayName: 'my-cache',
contents: documentText,
expireTime: expirationTime, // Use expireTime instead of ttl
}
});
```
---
## TTL (Time-To-Live) Guidelines
### Recommended TTL Values
| Use Case | TTL | Reason |
|----------|-----|--------|
| Quick analysis session | 300s (5 min) | Short-lived tasks |
| Extended conversation | 3600s (1 hour) | Standard session length |
| Daily batch processing | 86400s (24 hours) | Reuse across day |
| Long-term analysis | 604800s (7 days) | Maximum allowed |
### TTL vs Expiration Time
**TTL (time-to-live)**:
- Relative duration from cache creation
- Format: `"3600s"` (string with 's' suffix)
- Easy for session-based caching
**Expiration Time**:
- Absolute timestamp
- Must be timezone-aware Date object
- Precise control over cache lifetime
---
## Using a Cache
### Generate Content with Cache (SDK)
```typescript
// Reference the cache by name in the request config
const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash-001', // Same model the cache was created with
  contents: 'Summarize the document',
  config: { cachedContent: cache.name }
});
console.log(response.text);
```
### Multiple Queries with Same Cache
```typescript
const queries = [
'What are the key points?',
'Who are the main characters?',
'What is the conclusion?'
];
for (const query of queries) {
  const response = await ai.models.generateContent({
    model: 'gemini-2.5-flash-001',
    contents: query,
    config: { cachedContent: cache.name }
  });
console.log(`Q: ${query}`);
console.log(`A: ${response.text}\n`);
}
```
---
## Cache Management
### Update Cache TTL
```typescript
// Extend cache lifetime before it expires
await ai.caches.update({
name: cache.name,
config: {
ttl: '7200s' // Extend to 2 hours
}
});
```
### List All Caches
```typescript
const pager = await ai.caches.list();
for await (const cache of pager) {
  console.log(`${cache.displayName}: ${cache.name}`);
  console.log(`Expires: ${cache.expireTime}`);
}
```
### Delete Cache
```typescript
// Delete when no longer needed
await ai.caches.delete({ name: cache.name });
```
---
## Advanced Use Cases
### Caching Video Files
```typescript
import { createPartFromUri } from '@google/genai';
// 1. Upload video (the SDK accepts a file path or Blob)
let videoFile = await ai.files.upload({
  file: './video.mp4'
});
// 2. Wait for processing
while (videoFile.state === 'PROCESSING') {
  await new Promise(resolve => setTimeout(resolve, 2000));
  videoFile = await ai.files.get({ name: videoFile.name });
}
// 3. Create cache with video
const cache = await ai.caches.create({
  model: 'gemini-2.5-flash-001',
  config: {
    displayName: 'video-cache',
    systemInstruction: 'Analyze this video.',
    contents: [createPartFromUri(videoFile.uri, videoFile.mimeType)],
    ttl: '600s'
  }
});
// 4. Query video multiple times
const response1 = await ai.models.generateContent({
  model: 'gemini-2.5-flash-001',
  contents: 'What happens in the first minute?',
  config: { cachedContent: cache.name }
});
const response2 = await ai.models.generateContent({
  model: 'gemini-2.5-flash-001',
  contents: 'Who are the main people?',
  config: { cachedContent: cache.name }
});
```
### Caching with System Instructions
```typescript
const cache = await ai.caches.create({
model: 'gemini-2.5-flash-001',
config: {
displayName: 'legal-expert-cache',
systemInstruction: `
You are a legal expert specializing in contract law.
Always cite relevant sections when making claims.
Use clear, professional language.
`,
contents: largeContractDocument,
ttl: '3600s'
}
});
// System instruction is part of cached context
const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash-001',
  contents: 'Is this contract enforceable?',
  config: { cachedContent: cache.name }
});
```
---
## Important Notes
### Model Version Requirement
**⚠️ You MUST use explicit version suffixes when creating caches:**
```typescript
// ✅ CORRECT
model: 'gemini-2.5-flash-001'
// ❌ WRONG (will fail)
model: 'gemini-2.5-flash'
```
### Cache Expiration
- Caches are **automatically deleted** after TTL expires
- **Cannot recover** expired caches - must recreate
- Update TTL **before expiration** to extend lifetime
### Cost Calculation
```
Regular request: 100,000 input tokens = 100K token cost
With caching (after cache creation):
- Cached tokens: 100,000 × 0.1 (90% discount) = 10K equivalent cost
- New tokens: 1,000 × 1.0 = 1K cost
- Total: 11K equivalent (89% savings!)
```
### Limitations
- Maximum TTL: 7 days (604800s)
- Cache creation costs same as regular tokens (first time only)
- Subsequent uses get 90% discount
- Only input tokens are cached (output tokens never cached)
---
## Best Practices
### When to Use Caching
**Good Use Cases:**
- Large documents queried repeatedly (legal docs, research papers)
- Video/audio files analyzed with different questions
- Long system instructions used across many requests
- Consistent context in multi-turn conversations
**Bad Use Cases:**
- Single-use content (no benefit)
- Frequently changing content
- Short content (<1000 tokens) - minimal savings
- Content used only once per day (cache might expire)
### Optimization Tips
1. **Cache Early**: Create cache at session start
2. **Extend TTL**: Update before expiration if still needed
3. **Monitor Usage**: Track how often cache is reused
4. **Clean Up**: Delete unused caches to avoid clutter
5. **Combine Features**: Pair caching with code execution or grounding for powerful workflows
### Cache Naming
Use descriptive `displayName` for easy identification:
```typescript
// ✅ Good names
displayName: 'financial-report-2024-q3'
displayName: 'legal-contract-acme-corp'
displayName: 'video-analysis-project-x'
// ❌ Vague names
displayName: 'cache1'
displayName: 'test'
```
---
## Troubleshooting
### "Invalid model name" Error
**Problem**: Using `gemini-2.5-flash` instead of `gemini-2.5-flash-001`
**Solution**: Always use explicit version suffix:
```typescript
model: 'gemini-2.5-flash-001' // Correct
```
### Cache Expired Error
**Problem**: Trying to use cache after TTL expired
**Solution**: Check expiration before use or extend TTL proactively:
```typescript
let cache = await ai.caches.get({ name: cacheName });
if (new Date(cache.expireTime) < new Date()) {
  // Cache expired, recreate it
  cache = await ai.caches.create({ ... });
}
```
### High Costs Despite Caching
**Problem**: Creating new cache for each request
**Solution**: Reuse the same cache across multiple requests:
```typescript
// ❌ Wrong - creates new cache each time
for (const query of queries) {
  const cache = await ai.caches.create({ ... }); // Expensive!
  const response = await ai.models.generateContent({
    model: 'gemini-2.5-flash-001',
    contents: query,
    config: { cachedContent: cache.name }
  });
}
// ✅ Correct - create once, use many times
const cache = await ai.caches.create({ ... }); // Create once
for (const query of queries) {
  const response = await ai.models.generateContent({
    model: 'gemini-2.5-flash-001',
    contents: query,
    config: { cachedContent: cache.name }
  });
}
```
---
## References
- Official Docs: https://ai.google.dev/gemini-api/docs/caching
- Cost Optimization: See "Cost Optimization" in main SKILL.md
- Templates: See `context-caching.ts` for working examples


@@ -0,0 +1,59 @@
# Function Calling Patterns
Complete guide to implementing function calling (tool use) with Gemini API.
---
## Basic Pattern
1. Define function declarations
2. Send request with tools
3. Check if model wants to call functions
4. Execute functions
5. Send results back to model
6. Get final response
---
## Function Declaration Schema
```typescript
{
name: string, // Function name (no spaces)
description: string, // What the function does
parametersJsonSchema: { // Subset of OpenAPI schema
type: 'object',
properties: {
[paramName]: {
type: string, // 'string' | 'number' | 'boolean' | 'array' | 'object'
description: string, // Parameter description
enum?: string[] // Optional: allowed values
}
},
required: string[] // Required parameter names
}
}
```
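Putting the schema and the six-step loop together, here is a hedged end-to-end sketch; `getWeather` is a hypothetical local implementation and the prompt is illustrative:
```typescript
import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const weatherDeclaration = {
  name: 'get_current_weather',
  description: 'Get weather for a city',
  parametersJsonSchema: {
    type: 'object',
    properties: { city: { type: 'string', description: 'City name' } },
    required: ['city']
  }
};

// Hypothetical local implementation the model cannot see
function getWeather(city: string) {
  return { city, tempC: 21, condition: 'sunny' };
}

// Steps 2-3: send the request and check for a function call
const first = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'What is the weather in Tokyo?',
  config: { tools: [{ functionDeclarations: [weatherDeclaration] }] }
});

const call = first.functionCalls?.[0];
if (call) {
  // Steps 4-5: execute locally, then send the result back
  const result = getWeather((call.args as any)?.city ?? 'Tokyo');
  const second = await ai.models.generateContent({
    model: 'gemini-2.5-flash',
    contents: [
      { role: 'user', parts: [{ text: 'What is the weather in Tokyo?' }] },
      { role: 'model', parts: [{ functionCall: call }] },
      { role: 'user', parts: [{ functionResponse: { name: call.name, response: result } }] }
    ],
    config: { tools: [{ functionDeclarations: [weatherDeclaration] }] }
  });
  // Step 6: final natural-language answer
  console.log(second.text);
}
```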
---
## Calling Modes
- **AUTO** (default): Model decides when to call
- **ANY**: Force at least one function call
- **NONE**: Disable function calling
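The mode is set through `toolConfig.functionCallingConfig`. A minimal sketch forcing a call, reusing the declaration from the sketch above:
```typescript
import { FunctionCallingConfigMode } from '@google/genai';

const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'What is the weather in Paris?',
  config: {
    tools: [{ functionDeclarations: [weatherDeclaration] }],
    toolConfig: {
      functionCallingConfig: {
        mode: FunctionCallingConfigMode.ANY, // Force at least one function call
        allowedFunctionNames: ['get_current_weather'] // Optional allow-list
      }
    }
  }
});
```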
---
## Parallel vs Compositional
**Parallel**: Independent functions run simultaneously
**Compositional**: Sequential dependencies (A → B → C)
Gemini automatically detects which pattern to use.
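When the model emits parallel calls, `response.functionCalls` holds several entries. A dispatch sketch reusing `response` from the mode sketch above (both handlers are hypothetical stubs):
```typescript
// Illustrative: dispatch parallel function calls from one response
const handlers: Record<string, (args: unknown) => unknown> = {
  get_current_weather: () => ({ tempC: 21 }), // hypothetical stub
  get_air_quality: () => ({ aqi: 42 })        // hypothetical stub
};

for (const call of response.functionCalls ?? []) {
  const result = handlers[call.name ?? '']?.(call.args);
  console.log(`${call.name} ->`, result);
}
```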
---
## Official Docs
https://ai.google.dev/gemini-api/docs/function-calling


@@ -0,0 +1,57 @@
# Generation Configuration Reference
Complete reference for all generation parameters.
---
## All Parameters
```typescript
config: {
temperature: number, // 0.0-2.0 (default: 1.0)
topP: number, // 0.0-1.0 (default: 0.95)
topK: number, // 1-100+ (default: 40)
maxOutputTokens: number, // 1-65536
stopSequences: string[], // Stop at these strings
responseMimeType: string, // 'text/plain' | 'application/json'
candidateCount: number, // Usually 1
thinkingConfig: {
thinkingBudget: number // Max thinking tokens
}
}
```
---
## Parameter Guidelines
### temperature
- **0.0**: Deterministic, focused
- **1.0**: Balanced (default)
- **2.0**: Very creative, random
### topP (nucleus sampling)
- **0.95**: Default, good balance
- Lower = more focused
### topK
- **40**: Default
- Higher = more diversity
### maxOutputTokens
- Always set this to prevent excessive generation
- Max: 65,536 tokens
---
## Use Cases
**Factual tasks**: temperature=0.0, topP=0.8
**Creative tasks**: temperature=1.2, topP=0.95
**Code generation**: temperature=0.3, topP=0.9
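As a sketch, here is the factual preset applied to a request, assuming the `ai` client used throughout these docs:
```typescript
const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'List the boiling point of water at sea level in Celsius.',
  config: {
    temperature: 0.0,     // Deterministic for factual recall
    topP: 0.8,
    maxOutputTokens: 256, // Cap output to avoid runaway generation
    stopSequences: ['\n\n---']
  }
});
console.log(response.text);
```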
---
## Official Docs
https://ai.google.dev/gemini-api/docs/models/generative-models#model-parameters


@@ -0,0 +1,602 @@
# Grounding with Google Search Guide
Complete guide to using grounding with Google Search to connect Gemini models to real-time web information, reducing hallucinations and providing verifiable, up-to-date responses.
---
## What is Grounding?
Grounding connects the Gemini model to Google Search, allowing it to:
- Access real-time information beyond training cutoff
- Reduce hallucinations with fact-checked web sources
- Provide citations and source URLs
- Answer questions about current events
- Verify information against the web
---
## How It Works
1. **Model receives query** (e.g., "Who won Euro 2024?")
2. **Model determines** if current information is needed
3. **Performs Google Search** automatically
4. **Processes search results** (web pages, snippets)
5. **Incorporates findings** into response
6. **Provides citations** with source URLs
---
## Two Grounding APIs
### 1. Google Search (`googleSearch`) - Recommended for Gemini 2.5
**Simple, automatic grounding**:
```typescript
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: 'Who won the euro 2024?',
config: {
tools: [{ googleSearch: {} }]
}
});
```
**Features**:
- Simple configuration (empty object)
- Automatic search when model needs current info
- Available on all Gemini 2.5 models
- Recommended for new projects
### 2. Google Search Retrieval (`googleSearchRetrieval`) - Legacy for Gemini 1.5
**Dynamic threshold control**:
```typescript
import { DynamicRetrievalConfigMode } from '@google/genai';
const response = await ai.models.generateContent({
model: 'gemini-1.5-flash',
contents: 'Who won the euro 2024?',
config: {
tools: [{
googleSearchRetrieval: {
dynamicRetrievalConfig: {
mode: DynamicRetrievalConfigMode.MODE_DYNAMIC,
dynamicThreshold: 0.7 // Ground only when the prediction score >= 0.7
}
}
}]
}
});
```
**Features**:
- Control when searches happen via threshold
- Used with Gemini 1.5 models
- More configuration options
**Recommendation**: Use `googleSearch` for Gemini 2.5 models (simpler and newer).
---
## Basic Usage
### SDK Approach (Gemini 2.5)
```typescript
import { GoogleGenAI } from '@google/genai';
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: 'What are the latest developments in AI?',
config: {
tools: [{ googleSearch: {} }]
}
});
console.log(response.text);
// Check if grounding was used
if (response.candidates[0].groundingMetadata) {
console.log('✓ Search performed');
console.log('Sources:', response.candidates[0].groundingMetadata.webPages);
} else {
console.log('✓ Answered from model knowledge');
}
```
### Fetch Approach (Cloudflare Workers)
```typescript
const response = await fetch(
`https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent`,
{
method: 'POST',
headers: {
'Content-Type': 'application/json',
'x-goog-api-key': env.GEMINI_API_KEY,
},
body: JSON.stringify({
contents: [{ parts: [{ text: 'What are the latest developments in AI?' }] }],
tools: [{ google_search: {} }]
}),
}
);
const data = await response.json();
console.log(data.candidates[0].content.parts[0].text);
```
---
## Grounding Metadata
### Structure
```typescript
{
groundingMetadata: {
// Search queries performed
searchQueries: [
{ text: "euro 2024 winner" }
],
// Web pages retrieved
webPages: [
{
url: "https://example.com/euro-2024",
title: "UEFA Euro 2024 Results",
snippet: "Spain won UEFA Euro 2024..."
}
],
// Citations (inline references)
citations: [
{
startIndex: 42,
endIndex: 47,
uri: "https://example.com/euro-2024"
}
],
// Retrieval queries (alternative search terms)
retrievalQueries: [
{ query: "who won euro 2024 final" }
]
}
}
```
### Accessing Metadata
```typescript
if (response.candidates[0].groundingMetadata) {
const metadata = response.candidates[0].groundingMetadata;
// Display sources
console.log('Sources:');
metadata.webPages?.forEach((page, i) => {
console.log(`${i + 1}. ${page.title}`);
console.log(` ${page.url}`);
});
// Display citations
console.log('\nCitations:');
metadata.citations?.forEach((citation) => {
console.log(`Position ${citation.startIndex}-${citation.endIndex}: ${citation.uri}`);
});
}
```
---
## When to Use Grounding
### ✅ Good Use Cases
**Current Events**:
```typescript
'What happened in the news today?'
'Who won the latest sports championship?'
'What are the current stock prices?'
```
**Recent Developments**:
```typescript
'What are the latest AI breakthroughs?'
'What are recent changes in climate policy?'
```
**Fact-Checking**:
```typescript
'Is this claim true: [claim]?'
'What does the latest research say about [topic]?'
```
**Real-Time Data**:
```typescript
'What is the current weather in Tokyo?'
"What are today's cryptocurrency prices?"
```
### ❌ Not Recommended For
**General Knowledge**:
```typescript
'What is the capital of France?' // Model knows this
'How does photosynthesis work?' // Stable knowledge
```
**Mathematical Calculations**:
```typescript
'What is 15 * 27?' // Use code execution instead
```
**Creative Tasks**:
```typescript
'Write a poem about autumn' // No search needed
```
**Code Generation**:
```typescript
'Write a sorting algorithm' // Internal reasoning sufficient
```
---
## Chat with Grounding
### Multi-Turn Conversations
```typescript
const chat = await ai.chats.create({
model: 'gemini-2.5-flash',
config: {
tools: [{ googleSearch: {} }]
}
});
// First question
let response = await chat.sendMessage({ message: 'What are the latest quantum computing developments?' });
console.log(response.text);
// Display sources
if (response.candidates[0].groundingMetadata) {
const sources = response.candidates[0].groundingMetadata.webPages || [];
console.log(`\nSources: ${sources.length} web pages`);
sources.forEach(s => console.log(`- ${s.title}: ${s.url}`));
}
// Follow-up question
response = await chat.sendMessage({ message: 'Which company made the biggest breakthrough?' });
console.log('\n' + response.text);
```
---
## Combining with Other Features
### Grounding + Function Calling
```typescript
const weatherFunction = {
name: 'get_current_weather',
description: 'Get weather for a location',
parametersJsonSchema: {
type: 'object',
properties: {
location: { type: 'string', description: 'City name' }
},
required: ['location']
}
};
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: 'What is the weather like in the city that won Euro 2024?',
config: {
tools: [
{ googleSearch: {} }, // For finding Euro 2024 winner
{ functionDeclarations: [weatherFunction] } // For weather lookup
]
}
});
// Model will:
// 1. Use Google Search to find the Euro 2024 winner (Spain; the model may resolve the city to Madrid)
// 2. Call get_current_weather function with the city
// 3. Combine both results in response
```
### Grounding + Code Execution
```typescript
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: 'Find the current stock prices for AAPL, GOOGL, MSFT and calculate their average',
config: {
tools: [
{ googleSearch: {} }, // For current stock prices
{ codeExecution: {} } // For averaging
]
}
});
// Model will:
// 1. Search for current stock prices
// 2. Generate code to calculate average
// 3. Execute code with the found prices
// 4. Return result with citations
```
---
## Checking Grounding Usage
### Determine if Search Was Performed
```typescript
const queries = [
'What is 2+2?', // Should NOT use search
'What happened in the news today?' // Should use search
];
for (const query of queries) {
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: query,
config: { tools: [{ googleSearch: {} }] }
});
console.log(`Query: ${query}`);
console.log(`Search used: ${response.candidates[0].groundingMetadata ? 'YES' : 'NO'}`);
console.log();
}
```
**Output**:
```
Query: What is 2+2?
Search used: NO
Query: What happened in the news today?
Search used: YES
```
---
## Dynamic Retrieval (Gemini 1.5)
### Threshold-Based Grounding
```typescript
const response = await ai.models.generateContent({
model: 'gemini-1.5-flash',
contents: 'Who won the euro 2024?',
config: {
tools: [{
googleSearchRetrieval: {
dynamicRetrievalConfig: {
mode: DynamicRetrievalConfigMode.MODE_DYNAMIC,
dynamicThreshold: 0.7 // Ground only when the prediction score >= 0.7
}
}
}]
}
});
if (!response.candidates[0].groundingMetadata) {
  console.log('Model answered from knowledge (prediction score below threshold)');
} else {
  console.log('Search performed (prediction score met threshold)');
}
```
**How It Works**:
- Model assigns the prompt a prediction score (0-1) estimating how much grounding would help
- If score >= threshold → performs search
- If score < threshold → answers from internal knowledge
**Threshold Values**:
- `0.0`: Always search (every response grounded)
- `0.3`: Default - search whenever grounding is moderately useful
- `0.7`: Search only when grounding is clearly useful
- `1.0`: Effectively never search
---
## Best Practices
### ✅ Do
1. **Check Metadata**: Always verify if grounding was used
```typescript
if (response.candidates[0].groundingMetadata) { ... }
```
2. **Display Citations**: Show sources to users for transparency
```typescript
metadata.webPages.forEach(page => {
console.log(`Source: ${page.title} (${page.url})`);
});
```
3. **Use Specific Queries**: Better search results with clear questions
```typescript
// ✅ Good: "What are Microsoft's Q3 2024 earnings?"
// ❌ Vague: "Tell me about Microsoft"
```
4. **Combine Features**: Use with function calling/code execution for powerful workflows
5. **Handle Missing Metadata**: Not all queries trigger search
```typescript
const sources = response.candidates[0].groundingMetadata?.webPages || [];
```
### ❌ Don't
1. **Don't Assume Search Always Happens**: Model decides when to search
2. **Don't Ignore Citations**: They're crucial for fact-checking
3. **Don't Use for Stable Knowledge**: Waste of resources for unchanging facts
4. **Don't Expect Perfect Coverage**: Not all information is on the web
---
## Cost and Performance
### Cost Considerations
- **Added Latency**: Search takes 1-3 seconds typically
- **Token Costs**: Retrieved content counts as input tokens
- **Rate Limits**: Subject to API rate limits
### Optimization
**Use Dynamic Threshold** (Gemini 1.5):
```typescript
dynamicThreshold: 0.7 // Lower = more searches, higher = fewer searches
```
**Cache Grounding Results** (if appropriate):
```typescript
const cache = await ai.caches.create({
model: 'gemini-2.5-flash-001',
config: {
displayName: 'grounding-cache',
tools: [{ googleSearch: {} }],
contents: 'Initial query that triggers search...',
ttl: '3600s'
}
});
// Subsequent queries reuse cached grounding results
```
---
## Troubleshooting
### Grounding Not Working
**Symptom**: No `groundingMetadata` in response
**Causes**:
1. Grounding not enabled: `tools: [{ googleSearch: {} }]`
2. Model decided search wasn't needed (query answerable from knowledge)
3. Google Cloud project not configured (grounding requires GCP)
**Solution**:
- Verify `tools` configuration
- Use queries requiring current information
- Set up Google Cloud project
### Poor Search Quality
**Symptom**: Irrelevant sources or wrong information
**Causes**:
- Vague query
- Search terms ambiguous
- Recent events not yet indexed
**Solution**:
- Make queries more specific
- Include context in prompt
- Verify search queries in metadata
### Citations Missing
**Symptom**: `groundingMetadata` present but no citations
**Explanation**: Citations are **inline references** - they may not always be present if model doesn't directly quote sources.
**Solution**: Check `webPages` instead for full source list
---
## Important Requirements
### Google Cloud Project
**⚠️ Grounding requires a Google Cloud project, not just an API key.**
**Setup**:
1. Create Google Cloud project
2. Enable Generative Language API
3. Configure billing
4. Use API key from that project
**Error if Missing**:
```
Error: Grounding requires Google Cloud project configuration
```
### Model Support
**✅ Supported**:
- All Gemini 2.5 models (`googleSearch`)
- All Gemini 1.5 models (`googleSearchRetrieval`)
**❌ Not Supported**:
- Gemini 1.0 models
---
## Examples
### News Summary
```typescript
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: "Summarize today's top 3 technology news headlines",
config: { tools: [{ googleSearch: {} }] }
});
console.log(response.text);
const metadata = response.candidates[0].groundingMetadata;
metadata?.webPages?.forEach((page, i) => {
  console.log(`${i + 1}. ${page.title}: ${page.url}`);
});
```
### Fact Verification
```typescript
const claim = "The Earth is flat";
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: `Is this claim true: "${claim}"? Use reliable sources to verify.`,
config: { tools: [{ googleSearch: {} }] }
});
console.log(response.text);
```
### Market Research
```typescript
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: 'What are the current trends in electric vehicle adoption in 2024?',
config: { tools: [{ googleSearch: {} }] }
});
console.log(response.text);
console.log('\nSources:');
const metadata = response.candidates[0].groundingMetadata;
metadata?.webPages?.forEach(page => {
  console.log(`- ${page.title}`);
});
```
---
## References
- Official Docs: https://ai.google.dev/gemini-api/docs/grounding
- Google Search Docs: https://ai.google.dev/gemini-api/docs/google-search
- Templates: See `grounding-search.ts` for working examples
- Combined Features: See `combined-advanced.ts` for integration patterns

references/models-guide.md

@@ -0,0 +1,289 @@
# Gemini Models Guide (2025)
**Last Updated**: 2025-11-19 (Gemini 3 preview release)
---
## Gemini 3 Series (Preview - November 2025)
### gemini-3-pro-preview
**Model ID**: `gemini-3-pro-preview`
**Status**: 🆕 Preview release (November 18, 2025)
**Context Windows**:
- Input: TBD (documentation pending)
- Output: TBD (documentation pending)
**Description**: Google's newest and most intelligent AI model with state-of-the-art reasoning and multimodal understanding. Outperforms Gemini 2.5 Pro on every major AI benchmark.
**Best For**:
- Most complex reasoning tasks
- Advanced multimodal analysis (images, videos, PDFs, audio)
- Benchmark-critical applications
- Cutting-edge projects requiring latest capabilities
- Tasks requiring absolute best quality
**Features**:
- ✅ Enhanced multimodal understanding
- ✅ Function calling
- ✅ Streaming
- ✅ System instructions
- ✅ JSON mode
- TBD Thinking mode (documentation pending)
**Knowledge Cutoff**: TBD
**Pricing**: Preview pricing (likely higher than 2.5 Pro)
**⚠️ Preview Status**: Use for evaluation and testing. Consider `gemini-2.5-pro` for production-critical decisions until Gemini 3 reaches stable general availability.
**New Capabilities**:
- Record-breaking benchmark performance
- Enhanced generative UI responses
- Advanced coding capabilities (Google Antigravity integration)
- State-of-the-art multimodal understanding
---
## Current Production Models (Gemini 2.5 - Stable)
### gemini-2.5-pro
**Model ID**: `gemini-2.5-pro`
**Context Windows**:
- Input: 1,048,576 tokens (NOT 2M!)
- Output: 65,536 tokens
**Description**: State-of-the-art thinking model capable of reasoning over complex problems in code, math, and STEM.
**Best For**:
- Complex reasoning tasks
- Advanced code generation and optimization
- Mathematical problem-solving
- Multi-step logical analysis
- STEM applications
**Features**:
- ✅ Thinking mode (enabled by default)
- ✅ Function calling
- ✅ Multimodal (text, images, video, audio, PDFs)
- ✅ Streaming
- ✅ System instructions
- ✅ JSON mode
**Knowledge Cutoff**: January 2025
**Pricing**: Higher cost, use for tasks requiring best quality
---
### gemini-2.5-flash
**Model ID**: `gemini-2.5-flash`
**Context Windows**:
- Input: 1,048,576 tokens
- Output: 65,536 tokens
**Description**: Best price-performance model for large-scale processing, low-latency, and high-volume tasks.
**Best For**:
- General-purpose AI applications
- High-volume API calls
- Agentic workflows
- Cost-sensitive applications
- Production workloads
**Features**:
- ✅ Thinking mode (enabled by default)
- ✅ Function calling
- ✅ Multimodal (text, images, video, audio, PDFs)
- ✅ Streaming
- ✅ System instructions
- ✅ JSON mode
**Knowledge Cutoff**: January 2025
**Pricing**: Best price-performance ratio
**⭐ Recommended**: This is the default choice for most applications
---
### gemini-2.5-flash-lite
**Model ID**: `gemini-2.5-flash-lite`
**Context Windows**:
- Input: 1,048,576 tokens
- Output: 65,536 tokens
**Description**: Most cost-efficient and fastest 2.5 model, optimized for high throughput.
**Best For**:
- High-throughput applications
- Simple text generation
- Cost-critical use cases
- Speed-prioritized workloads
**Features**:
- ✅ Thinking mode (enabled by default)
- ❌ **No function calling** (critical limitation!)
- ✅ Multimodal (text, images, video, audio, PDFs)
- ✅ Streaming
- ✅ System instructions
- ✅ JSON mode
**Knowledge Cutoff**: January 2025
**Pricing**: Lowest cost
**⚠️ Important**: Flash-Lite does NOT support function calling! Use Flash or Pro if you need tool use.
---
## Model Comparison Matrix
| Feature | Pro | Flash | Flash-Lite |
|---------|-----|-------|------------|
| **Thinking Mode** | ✅ Default ON | ✅ Default ON | ✅ Default ON |
| **Function Calling** | ✅ Yes | ✅ Yes | ❌ **NO** |
| **Multimodal** | ✅ Full | ✅ Full | ✅ Full |
| **Streaming** | ✅ Yes | ✅ Yes | ✅ Yes |
| **Input Tokens** | 1,048,576 | 1,048,576 | 1,048,576 |
| **Output Tokens** | 65,536 | 65,536 | 65,536 |
| **Reasoning Quality** | Best | Good | Basic |
| **Speed** | Moderate | Fast | Fastest |
| **Cost** | Highest | Medium | Lowest |
---
## Previous Generation Models (Still Available)
### Gemini 2.0 Flash
**Model ID**: `gemini-2.0-flash`
**Context**: 1M input / 65K output tokens
**Status**: Previous generation, 2.5 Flash recommended instead
### Gemini 1.5 Pro
**Model ID**: `gemini-1.5-pro`
**Context**: 2M input tokens (this is the ONLY model with 2M!)
**Status**: Older model, 2.5 models recommended
---
## Context Window Clarification
**⚠️ CRITICAL CORRECTION**:
**ACCURATE**: Gemini 2.5 models support **1,048,576 input tokens** (approximately 1 million)
**INACCURATE**: Claiming Gemini 2.5 has 2M token context window
**WHY THIS MATTERS**:
- Gemini 1.5 Pro (older model) had 2M tokens
- Gemini 2.5 models (current) have ~1M tokens
- This is a common mistake that causes confusion!
**This skill prevents this error by providing accurate information.**
---
## Model Selection Guide
### Use gemini-2.5-pro When:
- ✅ Complex reasoning required (math, logic, STEM)
- ✅ Advanced code generation and optimization
- ✅ Multi-step problem-solving
- ✅ Quality is more important than cost
- ✅ Tasks require maximum capability
### Use gemini-2.5-flash When:
- ✅ General-purpose AI applications
- ✅ High-volume production workloads
- ✅ Function calling required
- ✅ Agentic workflows
- ✅ Good balance of cost and quality needed
- ⭐ **Recommended default choice**
### Use gemini-2.5-flash-lite When:
- ✅ Simple text generation only
- ✅ No function calling needed
- ✅ High throughput required
- ✅ Cost is primary concern
- ⚠️ **Only if you don't need function calling!**
---
## Common Mistakes
### ❌ Mistake 1: Using Wrong Model Name
```typescript
// WRONG - old model name
model: 'gemini-1.5-pro'
// CORRECT - current model
model: 'gemini-2.5-flash'
```
### ❌ Mistake 2: Claiming 2M Context for 2.5 Models
```typescript
// WRONG ASSUMPTION
// "Gemini 2.5 has 2M token context window"
// CORRECT
// Gemini 2.5 has 1,048,576 input tokens
// Only Gemini 1.5 Pro (older) had 2M
```
### ❌ Mistake 3: Using Flash-Lite for Function Calling
```typescript
// WRONG - Flash-Lite doesn't support function calling!
model: 'gemini-2.5-flash-lite',
config: {
tools: [{ functionDeclarations: [...] }] // This will FAIL
}
// CORRECT
model: 'gemini-2.5-flash', // or gemini-2.5-pro
config: {
tools: [{ functionDeclarations: [...] }]
}
```
---
## Rate Limits (Free vs Paid)
### Free Tier
- **15 RPM** (requests per minute)
- **1M TPM** (tokens per minute)
- **1,500 RPD** (requests per day)
### Paid Tier
- **360 RPM**
- **4M TPM**
- Unlimited daily requests
**Tip**: Monitor your usage and implement rate limiting to stay within quotas.
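A minimal client-side spacing sketch to stay under an RPM quota (the helper and limit value are illustrative):
```typescript
// Illustrative: space out requests to stay under an RPM quota
const RPM_LIMIT = 15; // Free tier
const intervalMs = 60_000 / RPM_LIMIT;
let lastRequestAt = 0;

async function throttled<T>(fn: () => Promise<T>): Promise<T> {
  const wait = Math.max(0, lastRequestAt + intervalMs - Date.now());
  if (wait > 0) await new Promise((resolve) => setTimeout(resolve, wait));
  lastRequestAt = Date.now();
  return fn();
}
```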
---
## Official Documentation
- **Models Overview**: https://ai.google.dev/gemini-api/docs/models
- **Gemini 2.5 Announcement**: https://developers.googleblog.com/en/gemini-2-5-thinking-model-updates/
- **Pricing**: https://ai.google.dev/pricing
---
**Production Tip**: Always use gemini-2.5-flash as your default unless you specifically need Pro's advanced reasoning or want to minimize cost with Flash-Lite (and don't need function calling).


@@ -0,0 +1,58 @@
# Multimodal Guide
Complete guide to using images, video, audio, and PDFs with Gemini API.
---
## Supported Formats
### Images
- JPEG, PNG, WebP, HEIC, HEIF
- Max size: 20MB
### Video
- MP4, MPEG, MOV, AVI, FLV, MPG, WebM, WMV
- Max size: 2GB
- Max length (inline): 2 minutes
### Audio
- MP3, WAV, FLAC, AAC, OGG, OPUS
- Max size: 20MB
### PDFs
- Max size: 30MB
- Text-based PDFs work best
---
## Usage Pattern
```typescript
contents: [
{
parts: [
{ text: 'Your question' },
{
inlineData: {
data: base64EncodedData,
mimeType: 'image/jpeg' // or video/mp4, audio/mp3, application/pdf
}
}
]
}
]
```
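A complete sketch that loads a local image, base64-encodes it, and asks a question about it (the file path is illustrative):
```typescript
import { GoogleGenAI } from '@google/genai';
import fs from 'node:fs';

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Read and base64-encode a local image (path is illustrative)
const imageBytes = fs.readFileSync('./photo.jpg');

const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: [
    {
      parts: [
        { text: 'Describe what is in this photo.' },
        {
          inlineData: {
            data: imageBytes.toString('base64'),
            mimeType: 'image/jpeg'
          }
        }
      ]
    }
  ]
});
console.log(response.text);
```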
---
## Best Practices
- Use specific, detailed prompts
- Combine multiple modalities in one request
- For large files (>2GB), use File API (Phase 2)
---
## Official Docs
https://ai.google.dev/gemini-api/docs/vision


@@ -0,0 +1,235 @@
# SDK Migration Guide
**From**: `@google/generative-ai` (DEPRECATED)
**To**: `@google/genai` (CURRENT)
**Deadline**: November 30, 2025 (deprecated SDK sunset)
---
## Why Migrate?
The `@google/generative-ai` SDK is deprecated and will stop receiving updates on **November 30, 2025**.
The new `@google/genai` SDK:
- ✅ Works with both Gemini API and Vertex AI
- ✅ Supports Gemini 2.0+ features
- ✅ Better TypeScript support
- ✅ Unified API across platforms
- ✅ Active development and updates
---
## Migration Steps
### 1. Update Package
```bash
# Remove deprecated SDK
npm uninstall @google/generative-ai
# Install current SDK
npm install @google/genai@1.27.0
```
### 2. Update Imports
**Old (DEPRECATED)**:
```typescript
import { GoogleGenerativeAI } from '@google/generative-ai';
const genAI = new GoogleGenerativeAI(apiKey);
const model = genAI.getGenerativeModel({ model: 'gemini-2.5-flash' });
```
**New (CURRENT)**:
```typescript
import { GoogleGenAI } from '@google/genai';
const ai = new GoogleGenAI({ apiKey });
// No need to get model separately
```
### 3. Update API Calls
**Old**:
```typescript
const result = await model.generateContent(prompt);
const response = await result.response;
const text = response.text();
```
**New**:
```typescript
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: prompt
});
const text = response.text;
```
### 4. Update Streaming
**Old**:
```typescript
const result = await model.generateContentStream(prompt);
for await (const chunk of result.stream) {
console.log(chunk.text());
}
```
**New**:
```typescript
const response = await ai.models.generateContentStream({
model: 'gemini-2.5-flash',
contents: prompt
});
for await (const chunk of response) {
console.log(chunk.text);
}
```
### 5. Update Chat
**Old**:
```typescript
const chat = model.startChat({
history: []
});
const result = await chat.sendMessage(message);
const response = await result.response;
console.log(response.text());
```
**New**:
```typescript
const chat = ai.chats.create({
  model: 'gemini-2.5-flash',
  history: []
});
const response = await chat.sendMessage({ message });
console.log(response.text);
```
---
## Complete Before/After Example
### Before (Deprecated SDK)
```typescript
import { GoogleGenerativeAI } from '@google/generative-ai';
const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
const model = genAI.getGenerativeModel({ model: 'gemini-2.5-flash' });
// Generate
const result = await model.generateContent('Hello');
const response = await result.response;
console.log(response.text());
// Stream
const streamResult = await model.generateContentStream('Write a story');
for await (const chunk of streamResult.stream) {
console.log(chunk.text());
}
// Chat
const chat = model.startChat();
const chatResult = await chat.sendMessage('Hi');
const chatResponse = await chatResult.response;
console.log(chatResponse.text());
```
### After (Current SDK)
```typescript
import { GoogleGenAI } from '@google/genai';
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
// Generate
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: 'Hello'
});
console.log(response.text);
// Stream
const streamResponse = await ai.models.generateContentStream({
model: 'gemini-2.5-flash',
contents: 'Write a story'
});
for await (const chunk of streamResponse) {
console.log(chunk.text);
}
// Chat
const chat = ai.chats.create({ model: 'gemini-2.5-flash' });
const chatResponse = await chat.sendMessage({ message: 'Hi' });
console.log(chatResponse.text);
```
---
## Key Differences
| Aspect | Old SDK | New SDK |
|--------|---------|---------|
| Package | `@google/generative-ai` | `@google/genai` |
| Class | `GoogleGenerativeAI` | `GoogleGenAI` |
| Model Init | `genAI.getGenerativeModel()` | Specify in each call |
| Text Access | `response.text()` (method) | `response.text` (property) |
| Stream Iteration | `result.stream` | Direct iteration |
| Chat Creation | `model.startChat()` | `ai.chats.create()` |
| Chat Messages | `chat.sendMessage(text)` | `chat.sendMessage({ message })` |
---
## Troubleshooting
### Error: "Cannot find module '@google/generative-ai'"
**Cause**: Old import statement after migration
**Solution**: Update all imports to `@google/genai`
### Error: "Property 'text' does not exist"
**Cause**: Using `response.text()` (method) instead of `response.text` (property)
**Solution**: Remove parentheses: `response.text` not `response.text()`
### Error: "generateContent is not a function"
**Cause**: Trying to call methods on old model object
**Solution**: Use `ai.models.generateContent()` directly
---
## Automated Migration Script
```bash
# Find all files using old SDK
rg "@google/generative-ai" --type ts
# Replace import statements
find . -name "*.ts" -exec sed -i 's/@google\/generative-ai/@google\/genai/g' {} +
# Replace class name
find . -name "*.ts" -exec sed -i 's/GoogleGenerativeAI/GoogleGenAI/g' {} +
```
**⚠️ Note**: This script handles imports but NOT API changes. Manual review required!
---
## Official Resources
- **Migration Guide**: https://ai.google.dev/gemini-api/docs/migrate-to-genai
- **New SDK Docs**: https://github.com/googleapis/js-genai
- **Deprecated SDK**: https://github.com/google-gemini/deprecated-generative-ai-js
---
**Deadline Reminder**: November 30, 2025 - Deprecated SDK sunset


@@ -0,0 +1,81 @@
# Streaming Patterns
Complete guide to implementing streaming with Gemini API.
---
## SDK Approach (Async Iteration)
```typescript
const response = await ai.models.generateContentStream({
model: 'gemini-2.5-flash',
contents: 'Write a story'
});
for await (const chunk of response) {
process.stdout.write(chunk.text);
}
```
**Pros**: Simple, automatic parsing
**Cons**: Requires Node.js or compatible runtime
---
## Fetch Approach (SSE Parsing)
```typescript
const response = await fetch(
'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:streamGenerateContent?alt=sse',
{ /* ... */ }
);
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';
while (true) {
const { done, value } = await reader.read();
if (done) break;
buffer += decoder.decode(value, { stream: true });
const lines = buffer.split('\n');
buffer = lines.pop() || '';
for (const line of lines) {
    if (!line.startsWith('data: ')) continue;
    const payload = line.slice(6).trim();
    if (payload === '[DONE]') continue; // End-of-stream marker
    const data = JSON.parse(payload);
    const text = data.candidates?.[0]?.content?.parts?.[0]?.text;
    if (text) process.stdout.write(text);
}
}
```
**Pros**: Works in any environment
**Cons**: Manual SSE parsing required
---
## SSE Format
```
data: {"candidates":[{"content":{"parts":[{"text":"Hello"}]}}]}
data: {"candidates":[{"content":{"parts":[{"text":" world"}]}}]}
data: [DONE]
```
---
## Best Practices
- Always use the `streamGenerateContent` endpoint with `?alt=sse` for SSE responses
- Handle incomplete chunks in buffer
- Skip empty lines and `[DONE]` markers
- Use streaming for better UX on long responses
---
## Official Docs
https://ai.google.dev/gemini-api/docs/streaming


@@ -0,0 +1,59 @@
# Thinking Mode Guide
Complete guide to thinking mode in Gemini 2.5 models.
---
## What is Thinking Mode?
Gemini 2.5 models "think" internally before responding, improving accuracy on complex tasks.
**Key Points**:
- ✅ Enabled by default on all 2.5 models
- ✅ Internal (raw thoughts are not returned by default)
- ✅ Configurable thinking budget (`thinkingBudget: 0` disables thinking on Flash/Flash-Lite; Pro cannot disable it)
- ✅ Improves reasoning quality
---
## Configuration
```typescript
config: {
thinkingConfig: {
thinkingBudget: 8192 // Max tokens for internal reasoning
}
}
```
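In a full request this looks like the following sketch (the budget value is illustrative):
```typescript
const response = await ai.models.generateContent({
  model: 'gemini-2.5-pro',
  contents: 'Prove that the sum of the first n odd numbers is n^2.',
  config: {
    thinkingConfig: {
      thinkingBudget: 8192 // Allow extra internal reasoning for the proof
    }
  }
});
console.log(response.text);
```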
---
## When to Increase Budget
✅ Complex math/logic problems
✅ Multi-step reasoning
✅ Code optimization
✅ Detailed analysis
---
## When Default is Fine
⏺️ Simple questions
⏺️ Creative writing
⏺️ Translation
⏺️ Summarization
---
## Model Comparison
- **gemini-2.5-pro**: Best for complex reasoning
- **gemini-2.5-flash**: Good balance
- **gemini-2.5-flash-lite**: Basic thinking
---
## Official Docs
https://ai.google.dev/gemini-api/docs/thinking

references/top-errors.md

@@ -0,0 +1,304 @@
# Top Errors and Solutions
22 common Gemini API errors with solutions (Phase 1 + Phase 2).
---
## 1. Using Deprecated SDK
**Error**: `Cannot find module '@google/generative-ai'`
**Cause**: Using old SDK after migration
**Solution**: Install `@google/genai` instead
---
## 2. Wrong Context Window Claims
**Error**: Input exceeds model capacity
**Cause**: Assuming 2M tokens for Gemini 2.5
**Solution**: Gemini 2.5 has 1,048,576 input tokens (NOT 2M!)
---
## 3. Model Not Found
**Error**: `models/gemini-3.0-flash is not found`
**Cause**: Wrong model name
**Solution**: Use: `gemini-2.5-pro`, `gemini-2.5-flash`, or `gemini-2.5-flash-lite`
---
## 4. Function Calling on Flash-Lite
**Error**: Function calling not working
**Cause**: Flash-Lite doesn't support function calling
**Solution**: Use `gemini-2.5-flash` or `gemini-2.5-pro`
---
## 5. Invalid API Key (401)
**Error**: `API key not valid`
**Cause**: Missing or wrong `GEMINI_API_KEY`
**Solution**: Set environment variable correctly
---
## 6. Rate Limit Exceeded (429)
**Error**: `Resource has been exhausted`
**Cause**: Too many requests
**Solution**: Implement exponential backoff
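A minimal backoff sketch (the error-shape check and retry count are assumptions; adjust to your SDK version):
```typescript
// Illustrative: retry with exponential backoff on rate-limit errors
async function withBackoff<T>(fn: () => Promise<T>, maxRetries = 5): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      const isRateLimit = err?.status === 429 || /exhausted/i.test(String(err?.message));
      if (!isRateLimit || attempt >= maxRetries) throw err;
      const delayMs = Math.min(1000 * 2 ** attempt, 30_000);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```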
---
## 7. Streaming Parse Errors
**Error**: Invalid JSON in SSE stream
**Cause**: Incomplete chunk parsing
**Solution**: Use buffer to handle partial chunks
---
## 8. Multimodal Format Errors
**Error**: Invalid base64 or MIME type
**Cause**: Wrong image encoding
**Solution**: Use correct base64 encoding and MIME type
---
## 9. Context Length Exceeded
**Error**: `Request payload size exceeds the limit`
**Cause**: Input too large
**Solution**: Reduce input size (max 1,048,576 tokens)
---
## 10. Chat Not Working with Fetch
**Error**: No chat helper available
**Cause**: Chat helpers are SDK-only
**Solution**: Manually manage conversation history or use SDK
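A sketch of carrying history manually with fetch (field names follow the REST `generateContent` API):
```typescript
// Illustrative: maintain multi-turn history by hand for the REST API
const history: Array<{ role: 'user' | 'model'; parts: { text: string }[] }> = [];

async function send(userText: string): Promise<string> {
  history.push({ role: 'user', parts: [{ text: userText }] });
  const res = await fetch(
    'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent',
    {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'x-goog-api-key': process.env.GEMINI_API_KEY!
      },
      body: JSON.stringify({ contents: history })
    }
  );
  const data = await res.json();
  const reply = data.candidates[0].content.parts[0].text;
  history.push({ role: 'model', parts: [{ text: reply }] });
  return reply;
}
```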
---
## 11. Thinking Mode Not Supported
**Error**: Trying to disable thinking mode
**Cause**: Thinking is on by default for 2.5 models, and 2.5 Pro cannot turn it off
**Solution**: Configure the budget instead; `thinkingBudget: 0` disables thinking on Flash/Flash-Lite only
---
## 12. Parameter Conflicts
**Error**: Unsupported parameters
**Cause**: Using wrong config options
**Solution**: Use only supported parameters (see generation-config.md)
---
## 13. System Instruction Placement
**Error**: System instruction not working
**Cause**: Placed inside contents array
**Solution**: Place at top level, not in contents
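A sketch of correct placement with the SDK (`systemInstruction` belongs in `config`, not inside `contents`):
```typescript
// ✅ Correct - systemInstruction at the top level of config
const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'Summarize this article...',
  config: {
    systemInstruction: 'You are a concise technical editor.'
  }
});
```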
---
## 14. Token Counting Errors
**Error**: Unexpected token usage
**Cause**: Multimodal inputs use more tokens
**Solution**: Images/video/audio count toward token limit
---
## 15. Parallel Function Call Errors
**Error**: Functions not executing in parallel
**Cause**: Dependencies between functions
**Solution**: Gemini auto-detects; ensure functions are independent
---
## Phase 2 Errors
### 16. Invalid Model Version for Caching
**Error**: `Invalid model name for caching`
**Cause**: Using `gemini-2.5-flash` instead of `gemini-2.5-flash-001`
**Solution**: Must use explicit version suffix when creating caches
```typescript
// ✅ Correct
model: 'gemini-2.5-flash-001'
// ❌ Wrong
model: 'gemini-2.5-flash'
```
**Source**: https://ai.google.dev/gemini-api/docs/caching
---
### 17. Cache Expired or Not Found
**Error**: `Cache not found` or `Cache expired`
**Cause**: Trying to use cache after TTL expiration
**Solution**: Check expiration before use or recreate cache
```typescript
let cache = await ai.caches.get({ name: cacheName });
if (new Date(cache.expireTime) < new Date()) {
  // Recreate cache
  cache = await ai.caches.create({ ... });
}
```
---
### 18. Cannot Update Expired Cache TTL
**Error**: `Cannot update expired cache`
**Cause**: Trying to extend TTL after cache already expired
**Solution**: Update TTL before expiration or create new cache
```typescript
// Update TTL before expiration
await ai.caches.update({
name: cache.name,
config: { ttl: '7200s' }
});
```
---
### 19. Code Execution Timeout
**Error**: `Execution timed out after 30 seconds` with `OUTCOME_FAILED`
**Cause**: Python code taking too long to execute
**Solution**: Simplify computation or reduce data size
```typescript
// Check outcome before using results
if (part.codeExecutionResult?.outcome === 'OUTCOME_FAILED') {
console.error('Execution failed:', part.codeExecutionResult.output);
}
```
**Source**: https://ai.google.dev/gemini-api/docs/code-execution
---
### 20. Python Package Not Available
**Error**: `ModuleNotFoundError: No module named 'requests'`
**Cause**: Trying to import package not in sandbox
**Solution**: Use only available packages (numpy, pandas, matplotlib, seaborn, scipy)
**Available Packages**:
- Standard library: math, statistics, json, csv, datetime
- Data science: numpy, pandas, scipy
- Visualization: matplotlib, seaborn
---
### 21. Code Execution on Flash-Lite
**Error**: Code execution not working
**Cause**: `gemini-2.5-flash-lite` doesn't support code execution
**Solution**: Use `gemini-2.5-flash` or `gemini-2.5-pro`
```typescript
// ✅ Correct
model: 'gemini-2.5-flash' // Supports code execution
// ❌ Wrong
model: 'gemini-2.5-flash-lite' // NO code execution support
```
---
### 22. Grounding Requires Google Cloud Project
**Error**: `Grounding requires Google Cloud project configuration`
**Cause**: Using API key not associated with GCP project
**Solution**: Set up Google Cloud project and enable Generative Language API
**Steps**:
1. Create Google Cloud project
2. Enable Generative Language API
3. Configure billing
4. Use API key from that project
**Source**: https://ai.google.dev/gemini-api/docs/grounding
---
## Quick Debugging Checklist
### Phase 1 (Core)
- [ ] Using @google/genai (NOT @google/generative-ai)
- [ ] Model name is gemini-2.5-pro/flash/flash-lite
- [ ] API key is set correctly
- [ ] Input under 1,048,576 tokens
- [ ] Not using Flash-Lite for function calling
- [ ] System instruction at top level
- [ ] Streaming endpoint is streamGenerateContent
- [ ] MIME types are correct for multimodal
### Phase 2 (Advanced)
- [ ] Caching: Using explicit model version (e.g., gemini-2.5-flash-001)
- [ ] Caching: Cache not expired (check expireTime)
- [ ] Code Execution: Not using Flash-Lite
- [ ] Code Execution: Using only available Python packages
- [ ] Grounding: Google Cloud project configured
- [ ] Grounding: Checking groundingMetadata for search results