# Gemini Models Guide (2025)
**Last Updated**: 2025-11-19 (Gemini 3 preview release)

---
## Gemini 3 Series (Preview - November 2025)
### gemini-3-pro-preview
**Model ID**: `gemini-3-pro-preview`
**Status**: 🆕 Preview release (November 18, 2025)
**Context Windows**:
- Input: TBD (documentation pending)
- Output: TBD (documentation pending)
**Description**: Google's newest and most intelligent AI model with state-of-the-art reasoning and multimodal understanding. Outperforms Gemini 2.5 Pro on every major AI benchmark.
**Best For**:
- Most complex reasoning tasks
- Advanced multimodal analysis (images, videos, PDFs, audio)
- Benchmark-critical applications
- Cutting-edge projects requiring latest capabilities
- Tasks requiring absolute best quality
**Features**:
- ✅ Enhanced multimodal understanding
- ✅ Function calling
- ✅ Streaming
- ✅ System instructions
- ✅ JSON mode
- TBD Thinking mode (documentation pending)
**Knowledge Cutoff**: TBD
**Pricing**: Preview pricing (likely higher than 2.5 Pro)
**⚠️ Preview Status**: Use for evaluation and testing. Consider `gemini-2.5-pro` for production-critical decisions until Gemini 3 reaches stable general availability.
**New Capabilities**:
- Record-breaking benchmark performance
- Enhanced generative UI responses
- Advanced coding capabilities (Google Antigravity integration)
- State-of-the-art multimodal understanding
---
## Current Production Models (Gemini 2.5 - Stable)
### gemini-2.5-pro
**Model ID**: `gemini-2.5-pro`
**Context Windows**:
- Input: 1,048,576 tokens (NOT 2M!)
- Output: 65,536 tokens
**Description**: State-of-the-art thinking model capable of reasoning over complex problems in code, math, and STEM.
**Best For**:
- Complex reasoning tasks
- Advanced code generation and optimization
- Mathematical problem-solving
- Multi-step logical analysis
- STEM applications
**Features**:
- ✅ Thinking mode (enabled by default)
- ✅ Function calling
- ✅ Multimodal (text, images, video, audio, PDFs)
- ✅ Streaming
- ✅ System instructions
- ✅ JSON mode
**Knowledge Cutoff**: January 2025
**Pricing**: Higher cost, use for tasks requiring best quality

---
### gemini-2.5-flash
**Model ID**: `gemini-2.5-flash`
**Context Windows**:
- Input: 1,048,576 tokens
- Output: 65,536 tokens
**Description**: Best price-performance model for large-scale processing, low-latency, and high-volume tasks.
**Best For**:
- General-purpose AI applications
- High-volume API calls
- Agentic workflows
- Cost-sensitive applications
- Production workloads
**Features**:
- ✅ Thinking mode (enabled by default)
- ✅ Function calling
- ✅ Multimodal (text, images, video, audio, PDFs)
- ✅ Streaming
- ✅ System instructions
- ✅ JSON mode
**Knowledge Cutoff**: January 2025
**Pricing**: Best price-performance ratio
**⭐ Recommended**: This is the default choice for most applications

---
### gemini-2.5-flash-lite
**Model ID**: `gemini-2.5-flash-lite`
**Context Windows**:
- Input: 1,048,576 tokens
- Output: 65,536 tokens
**Description**: Most cost-efficient and fastest 2.5 model, optimized for high throughput.
**Best For**:
- High-throughput applications
- Simple text generation
- Cost-critical use cases
- Speed-prioritized workloads
**Features**:
- ✅ Thinking mode (enabled by default)
- ❌ **NO function calling** (critical limitation!)
- ✅ Multimodal (text, images, video, audio, PDFs)
- ✅ Streaming
- ✅ System instructions
- ✅ JSON mode
**Knowledge Cutoff**: January 2025
**Pricing**: Lowest cost
**⚠️ Important**: Flash-Lite does NOT support function calling! Use Flash or Pro if you need tool use.

---
## Model Comparison Matrix
| Feature | Pro | Flash | Flash-Lite |
|---------|-----|-------|------------|
| **Thinking Mode** | ✅ Default ON | ✅ Default ON | ✅ Default ON |
| **Function Calling** | ✅ Yes | ✅ Yes | ❌ **NO** |
| **Multimodal** | ✅ Full | ✅ Full | ✅ Full |
| **Streaming** | ✅ Yes | ✅ Yes | ✅ Yes |
| **Input Tokens** | 1,048,576 | 1,048,576 | 1,048,576 |
| **Output Tokens** | 65,536 | 65,536 | 65,536 |
| **Reasoning Quality** | Best | Good | Basic |
| **Speed** | Moderate | Fast | Fastest |
| **Cost** | Highest | Medium | Lowest |
---
## Previous Generation Models (Still Available)
### Gemini 2.0 Flash
**Model ID**: `gemini-2.0-flash`
**Context**: 1M input / 8K output tokens (1,048,576 / 8,192)
**Status**: Previous generation, 2.5 Flash recommended instead
### Gemini 1.5 Pro
**Model ID**: `gemini-1.5-pro`
**Context**: 2M input tokens (this is the ONLY model with 2M!)
**Status**: Older model, 2.5 models recommended

---
## Context Window Clarification
**⚠️ CRITICAL CORRECTION**:
**ACCURATE**: Gemini 2.5 models support **1,048,576 input tokens** (approximately 1 million)
**INACCURATE**: Claiming that Gemini 2.5 has a 2M-token context window
**WHY THIS MATTERS**:
- Gemini 1.5 Pro (older model) had 2M tokens
- Gemini 2.5 models (current) have ~1M tokens
- This is a common mistake that causes confusion!
**This skill prevents this error by providing accurate information.**
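
As a rough illustration of the limits above, the sketch below checks an estimated prompt size against the input limits quoted in this guide. `INPUT_TOKEN_LIMITS` and `fitsInputWindow` are hypothetical names, not part of any SDK. The table is copied by hand from this guide rather than fetched from the API, and the 2,097,152 figure for Gemini 1.5 Pro assumes that "2M" means 2 × 1,048,576. The token estimate itself is assumed to come from elsewhere.

```typescript
// Input-token limits as quoted in this guide (hand-copied, not queried from the API).
const INPUT_TOKEN_LIMITS: Record<string, number> = {
  "gemini-2.5-pro": 1048576,
  "gemini-2.5-flash": 1048576,
  "gemini-2.5-flash-lite": 1048576,
  "gemini-1.5-pro": 2097152, // assumption: "2M" read as 2 * 1,048,576
};

// Hypothetical helper: true if the estimated prompt size fits the model's input window.
function fitsInputWindow(model: string, estimatedTokens: number): boolean {
  const limit = INPUT_TOKEN_LIMITS[model];
  if (limit === undefined) {
    throw new Error(`No input limit recorded for model: ${model}`);
  }
  return estimatedTokens <= limit;
}

// A 1.5M-token prompt fits the older 1.5 Pro window but not any 2.5 model.
const promptTokens = 1500000;
console.log(fitsInputWindow("gemini-1.5-pro", promptTokens));
console.log(fitsInputWindow("gemini-2.5-pro", promptTokens));
```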
---
## Model Selection Guide
### Use gemini-2.5-pro When:
- ✅ Complex reasoning required (math, logic, STEM)
- ✅ Advanced code generation and optimization
- ✅ Multi-step problem-solving
- ✅ Quality is more important than cost
- ✅ Tasks require maximum capability
### Use gemini-2.5-flash When:
- ✅ General-purpose AI applications
- ✅ High-volume production workloads
- ✅ Function calling required
- ✅ Agentic workflows
- ✅ Good balance of cost and quality needed
- ⭐ **Recommended default choice**
### Use gemini-2.5-flash-lite When:
- ✅ Simple text generation only
- ✅ No function calling needed
- ✅ High throughput required
- ✅ Cost is primary concern
- ⚠️ **Only if you don't need function calling!**
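
The selection rules above can be condensed into a small decision function. This is only a sketch of one possible reading of the guide: the `Requirements` fields and the `chooseModel` name are hypothetical, and the order of the checks (reasoning first, then cost) is an assumption rather than something the guide prescribes.

```typescript
type ModelId = "gemini-2.5-pro" | "gemini-2.5-flash" | "gemini-2.5-flash-lite";

// Hypothetical description of what a workload needs.
interface Requirements {
  complexReasoning: boolean; // math, logic, STEM, multi-step problems
  functionCalling: boolean;  // tool use required
  costCritical: boolean;     // cost is the primary concern
}

function chooseModel(req: Requirements): ModelId {
  // Quality over cost: Pro for tasks that need maximum capability.
  if (req.complexReasoning) return "gemini-2.5-pro";
  // Flash-Lite only when cost dominates AND no function calling is needed.
  if (req.costCritical && !req.functionCalling) return "gemini-2.5-flash-lite";
  // Recommended default for everything else.
  return "gemini-2.5-flash";
}

// A cost-sensitive agent that still needs tools stays on Flash.
console.log(chooseModel({ complexReasoning: false, functionCalling: true, costCritical: true }));
```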
---
## Common Mistakes
### ❌ Mistake 1: Using Wrong Model Name
```typescript
// WRONG - old model name
model: 'gemini-1.5-pro'
// CORRECT - current model
model: 'gemini-2.5-flash'
```
### ❌ Mistake 2: Claiming 2M Context for 2.5 Models
```typescript
// WRONG ASSUMPTION
// "Gemini 2.5 has 2M token context window"
// CORRECT
// Gemini 2.5 has 1,048,576 input tokens
// Only Gemini 1.5 Pro (older) had 2M
```
### ❌ Mistake 3: Using Flash-Lite for Function Calling
```typescript
// WRONG - Flash-Lite doesn't support function calling!
model: 'gemini-2.5-flash-lite',
config: {
  tools: [{ functionDeclarations: [...] }] // This will FAIL
}

// CORRECT
model: 'gemini-2.5-flash', // or gemini-2.5-pro
config: {
  tools: [{ functionDeclarations: [...] }]
}
```
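
One way to catch Mistake 3 before a request is sent is a small pre-flight check. The sketch below is an assumption-laden illustration: `FUNCTION_CALLING_SUPPORT` and `assertToolsSupported` are hypothetical names, and the capability table is copied from this guide's comparison matrix rather than queried from the API. Models missing from the table are passed through unchecked.

```typescript
// Function-calling support as stated in this guide (hand-copied, not queried from the API).
const FUNCTION_CALLING_SUPPORT: Record<string, boolean> = {
  "gemini-2.5-pro": true,
  "gemini-2.5-flash": true,
  "gemini-2.5-flash-lite": false,
};

// Hypothetical guard: throws if tools are configured for a model listed as unsupported.
function assertToolsSupported(model: string, toolCount: number): void {
  if (toolCount > 0 && FUNCTION_CALLING_SUPPORT[model] === false) {
    throw new Error(
      `${model} is listed without function calling; use gemini-2.5-flash or gemini-2.5-pro`
    );
  }
}

// Passes silently: Flash with one tool declared.
assertToolsSupported("gemini-2.5-flash", 1);
```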
---
## Rate Limits (Free vs Paid)
### Free Tier
- **15 RPM** (requests per minute)
- **1M TPM** (tokens per minute)
- **1,500 RPD** (requests per day)
### Paid Tier
- **360 RPM**
- **4M TPM**
- Unlimited daily requests
**Tip**: Monitor your usage and implement rate limiting to stay within quotas.
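
A minimal client-side sketch of that tip, assuming the free-tier figure of 15 RPM quoted above. `createLimiter` is a hypothetical helper, not an SDK feature. It covers only a requests-per-window budget (not TPM or RPD), and the caller supplies timestamps in milliseconds, for example from `Date.now()`.

```typescript
// Sliding-window limiter: allows at most `maxRequests` within any `windowMs` span.
function createLimiter(maxRequests: number, windowMs: number) {
  const sent: number[] = []; // timestamps of recorded requests, oldest first
  return {
    // Returns 0 if a request may be sent at `now`, otherwise the wait in ms.
    msUntilAllowed(now: number): number {
      // Drop timestamps that have aged out of the window.
      while (sent.length > 0 && now - sent[0] >= windowMs) sent.shift();
      return sent.length < maxRequests ? 0 : sent[0] + windowMs - now;
    },
    // Call once for every request actually sent.
    record(now: number): void {
      sent.push(now);
    },
  };
}

// Free tier as quoted in this guide: 15 requests per 60 seconds.
const freeTier = createLimiter(15, 60000);
const startedAt = Date.now();
if (freeTier.msUntilAllowed(startedAt) === 0) {
  freeTier.record(startedAt); // the actual API call would go here
}
```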
---
## Official Documentation
- **Models Overview**: https://ai.google.dev/gemini-api/docs/models
- **Gemini 2.5 Announcement**: https://developers.googleblog.com/en/gemini-2-5-thinking-model-updates/
- **Pricing**: https://ai.google.dev/pricing
---
**Production Tip**: Use `gemini-2.5-flash` as your default unless you specifically need Pro's advanced reasoning, or you want to minimize cost with Flash-Lite and don't need function calling.