gh-jezweb-claude-skills-ski…/references/models-guide.md

# Gemini Models Guide (2025)

**Last Updated**: 2025-11-19 (Gemini 3 preview release)

---

## Gemini 3 Series (Preview - November 2025)

### gemini-3-pro-preview

**Model ID**: `gemini-3-pro-preview`

**Status**: 🆕 Preview release (November 18, 2025)

**Context Windows**:
- Input: TBD (documentation pending)
- Output: TBD (documentation pending)

**Description**: Google's newest and most intelligent AI model with state-of-the-art reasoning and multimodal understanding. Outperforms Gemini 2.5 Pro on every major AI benchmark.

**Best For**:
- Most complex reasoning tasks
- Advanced multimodal analysis (images, videos, PDFs, audio)
- Benchmark-critical applications
- Cutting-edge projects requiring latest capabilities
- Tasks requiring absolute best quality

**Features**:
- ✅ Enhanced multimodal understanding
- ✅ Function calling
- ✅ Streaming
- ✅ System instructions
- ✅ JSON mode
- TBD Thinking mode (documentation pending)

**Knowledge Cutoff**: TBD

**Pricing**: Preview pricing (likely higher than 2.5 Pro)

**⚠️ Preview Status**: Use for evaluation and testing. Consider `gemini-2.5-pro` for production-critical decisions until Gemini 3 reaches stable general availability.

**New Capabilities**:
- Record-breaking benchmark performance
- Enhanced generative UI responses
- Advanced coding capabilities (Google Antigravity integration)
- State-of-the-art multimodal understanding

---

## Current Production Models (Gemini 2.5 - Stable)

### gemini-2.5-pro

**Model ID**: `gemini-2.5-pro`

**Context Windows**:
- Input: 1,048,576 tokens (NOT 2M!)
- Output: 65,536 tokens

**Description**: State-of-the-art thinking model capable of reasoning over complex problems in code, math, and STEM.

**Best For**:
- Complex reasoning tasks
- Advanced code generation and optimization
- Mathematical problem-solving
- Multi-step logical analysis
- STEM applications

**Features**:
- ✅ Thinking mode (enabled by default)
- ✅ Function calling
- ✅ Multimodal (text, images, video, audio, PDFs)
- ✅ Streaming
- ✅ System instructions
- ✅ JSON mode

**Knowledge Cutoff**: January 2025

**Pricing**: Higher cost, use for tasks requiring best quality

---

### gemini-2.5-flash

**Model ID**: `gemini-2.5-flash`

**Context Windows**:
- Input: 1,048,576 tokens
- Output: 65,536 tokens

**Description**: Best price-performance model for large-scale processing, low-latency, and high-volume tasks.

**Best For**:
- General-purpose AI applications
- High-volume API calls
- Agentic workflows
- Cost-sensitive applications
- Production workloads

**Features**:
- ✅ Thinking mode (enabled by default)
- ✅ Function calling
- ✅ Multimodal (text, images, video, audio, PDFs)
- ✅ Streaming
- ✅ System instructions
- ✅ JSON mode

**Knowledge Cutoff**: January 2025

**Pricing**: Best price-performance ratio

**⭐ Recommended**: This is the default choice for most applications

---

### gemini-2.5-flash-lite

**Model ID**: `gemini-2.5-flash-lite`

**Context Windows**:
- Input: 1,048,576 tokens
- Output: 65,536 tokens

**Description**: Most cost-efficient and fastest 2.5 model, optimized for high throughput.

**Best For**:
- High-throughput applications
- Simple text generation
- Cost-critical use cases
- Speed-prioritized workloads

**Features**:
- ✅ Thinking mode (enabled by default)
- ❌ **NO function calling** (critical limitation!)
- ✅ Multimodal (text, images, video, audio, PDFs)
- ✅ Streaming
- ✅ System instructions
- ✅ JSON mode

**Knowledge Cutoff**: January 2025

**Pricing**: Lowest cost

**⚠️ Important**: Flash-Lite does NOT support function calling! Use Flash or Pro if you need tool use.

---

## Model Comparison Matrix

| Feature | Pro | Flash | Flash-Lite |
|---------|-----|-------|------------|
| **Thinking Mode** | ✅ Default ON | ✅ Default ON | ✅ Default ON |
| **Function Calling** | ✅ Yes | ✅ Yes | ❌ **NO** |
| **Multimodal** | ✅ Full | ✅ Full | ✅ Full |
| **Streaming** | ✅ Yes | ✅ Yes | ✅ Yes |
| **Input Tokens** | 1,048,576 | 1,048,576 | 1,048,576 |
| **Output Tokens** | 65,536 | 65,536 | 65,536 |
| **Reasoning Quality** | Best | Good | Basic |
| **Speed** | Moderate | Fast | Fastest |
| **Cost** | Highest | Medium | Lowest |

---

## Previous Generation Models (Still Available)

### Gemini 2.0 Flash

**Model ID**: `gemini-2.0-flash`

**Context**: 1M input / 65K output tokens

**Status**: Previous generation, 2.5 Flash recommended instead

### Gemini 1.5 Pro

**Model ID**: `gemini-1.5-pro`

**Context**: 2M input tokens (this is the ONLY model with 2M!)

**Status**: Older model, 2.5 models recommended

---

## Context Window Clarification

**⚠️ CRITICAL CORRECTION**:

**ACCURATE**: Gemini 2.5 models support **1,048,576 input tokens** (approximately 1 million)

**INACCURATE**: Claiming Gemini 2.5 has 2M token context window

**WHY THIS MATTERS**:
- Gemini 1.5 Pro (older model) had 2M tokens
- Gemini 2.5 models (current) have ~1M tokens
- This is a common mistake that causes confusion!

**This skill prevents this error by providing accurate information.**

---

## Model Selection Guide

### Use gemini-2.5-pro When:
- ✅ Complex reasoning required (math, logic, STEM)
- ✅ Advanced code generation and optimization
- ✅ Multi-step problem-solving
- ✅ Quality is more important than cost
- ✅ Tasks require maximum capability

### Use gemini-2.5-flash When:
- ✅ General-purpose AI applications
- ✅ High-volume production workloads
- ✅ Function calling required
- ✅ Agentic workflows
- ✅ Good balance of cost and quality needed
- ⭐ **Recommended default choice**

### Use gemini-2.5-flash-lite When:
- ✅ Simple text generation only
- ✅ No function calling needed
- ✅ High throughput required
- ✅ Cost is primary concern
- ⚠️ **Only if you don't need function calling!**

---

## Common Mistakes

### ❌ Mistake 1: Using Wrong Model Name
```typescript
// WRONG - old model name
model: 'gemini-1.5-pro'

// CORRECT - current model
model: 'gemini-2.5-flash'
```

### ❌ Mistake 2: Claiming 2M Context for 2.5 Models
```typescript
// WRONG ASSUMPTION
// "Gemini 2.5 has 2M token context window"

// CORRECT
// Gemini 2.5 has 1,048,576 input tokens
// Only Gemini 1.5 Pro (older) had 2M
```

### ❌ Mistake 3: Using Flash-Lite for Function Calling
```typescript
// WRONG - Flash-Lite doesn't support function calling!
model: 'gemini-2.5-flash-lite',
config: {
  tools: [{ functionDeclarations: [...] }] // This will FAIL
}

// CORRECT
model: 'gemini-2.5-flash', // or gemini-2.5-pro
config: {
  tools: [{ functionDeclarations: [...] }]
}
```

---

## Rate Limits (Free vs Paid)

### Free Tier
- **15 RPM** (requests per minute)
- **1M TPM** (tokens per minute)
- **1,500 RPD** (requests per day)

### Paid Tier
- **360 RPM**
- **4M TPM**
- Unlimited daily requests

**Tip**: Monitor your usage and implement rate limiting to stay within quotas.

---

## Official Documentation

- **Models Overview**: https://ai.google.dev/gemini-api/docs/models
- **Gemini 2.5 Announcement**: https://developers.googleblog.com/en/gemini-2-5-thinking-model-updates/
- **Pricing**: https://ai.google.dev/pricing

---

**Production Tip**: Always use gemini-2.5-flash as your default unless you specifically need Pro's advanced reasoning or want to minimize cost with Flash-Lite (and don't need function calling).