Gemini Models Guide (2025)
Last Updated: 2025-11-19 (Gemini 3 preview release)
Gemini 3 Series (Preview - November 2025)
gemini-3-pro-preview
Model ID: gemini-3-pro-preview
Status: 🆕 Preview release (November 18, 2025)
Context Windows:
- Input: TBD (documentation pending)
- Output: TBD (documentation pending)
Description: Google's newest and most capable model, featuring state-of-the-art reasoning and multimodal understanding. Google reports that it outperforms Gemini 2.5 Pro across major AI benchmarks.
Best For:
- Most complex reasoning tasks
- Advanced multimodal analysis (images, videos, PDFs, audio)
- Benchmark-critical applications
- Cutting-edge projects requiring latest capabilities
- Tasks requiring absolute best quality
Features:
- ✅ Enhanced multimodal understanding
- ✅ Function calling
- ✅ Streaming
- ✅ System instructions
- ✅ JSON mode
- TBD Thinking mode (documentation pending)
Knowledge Cutoff: TBD
Pricing: Preview pricing (likely higher than 2.5 Pro)
⚠️ Preview Status: Use for evaluation and testing. Consider gemini-2.5-pro for production-critical decisions until Gemini 3 reaches stable general availability.
New Capabilities:
- Record-breaking benchmark performance
- Enhanced generative UI responses
- Advanced coding capabilities (Google Antigravity integration)
- State-of-the-art multimodal understanding
Current Production Models (Gemini 2.5 - Stable)
gemini-2.5-pro
Model ID: gemini-2.5-pro
Context Windows:
- Input: 1,048,576 tokens (NOT 2M!)
- Output: 65,536 tokens
Description: State-of-the-art thinking model capable of reasoning over complex problems in code, math, and STEM.
Best For:
- Complex reasoning tasks
- Advanced code generation and optimization
- Mathematical problem-solving
- Multi-step logical analysis
- STEM applications
Features:
- ✅ Thinking mode (enabled by default)
- ✅ Function calling
- ✅ Multimodal (text, images, video, audio, PDFs)
- ✅ Streaming
- ✅ System instructions
- ✅ JSON mode
Knowledge Cutoff: January 2025
Pricing: Highest cost of the 2.5 family; use when quality matters more than cost
gemini-2.5-flash
Model ID: gemini-2.5-flash
Context Windows:
- Input: 1,048,576 tokens
- Output: 65,536 tokens
Description: Best price-performance model for large-scale processing, low-latency, and high-volume tasks.
Best For:
- General-purpose AI applications
- High-volume API calls
- Agentic workflows
- Cost-sensitive applications
- Production workloads
Features:
- ✅ Thinking mode (enabled by default)
- ✅ Function calling
- ✅ Multimodal (text, images, video, audio, PDFs)
- ✅ Streaming
- ✅ System instructions
- ✅ JSON mode
Knowledge Cutoff: January 2025
Pricing: Best price-performance ratio
⭐ Recommended: This is the default choice for most applications
gemini-2.5-flash-lite
Model ID: gemini-2.5-flash-lite
Context Windows:
- Input: 1,048,576 tokens
- Output: 65,536 tokens
Description: Most cost-efficient and fastest 2.5 model, optimized for high throughput.
Best For:
- High-throughput applications
- Simple text generation
- Cost-critical use cases
- Speed-prioritized workloads
Features:
- ✅ Thinking mode (supported, but OFF by default, unlike Pro and Flash)
- ❌ NO function calling (critical limitation!)
- ✅ Multimodal (text, images, video, audio, PDFs)
- ✅ Streaming
- ✅ System instructions
- ✅ JSON mode
Knowledge Cutoff: January 2025
Pricing: Lowest cost
⚠️ Important: Flash-Lite does NOT support function calling! Use Flash or Pro if you need tool use.
Model Comparison Matrix
| Feature | Pro | Flash | Flash-Lite |
|---|---|---|---|
| Thinking Mode | ✅ Default ON | ✅ Default ON | ✅ Supported (OFF by default) |
| Function Calling | ✅ Yes | ✅ Yes | ❌ NO |
| Multimodal | ✅ Full | ✅ Full | ✅ Full |
| Streaming | ✅ Yes | ✅ Yes | ✅ Yes |
| Input Tokens | 1,048,576 | 1,048,576 | 1,048,576 |
| Output Tokens | 65,536 | 65,536 | 65,536 |
| Reasoning Quality | Best | Good | Basic |
| Speed | Moderate | Fast | Fastest |
| Cost | Highest | Medium | Lowest |
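The matrix above can be mirrored as a small lookup table for client-side validation. A sketch only: the `GEMINI_25_MODELS` constant and `supportsFunctionCalling` helper are our own names (not part of any SDK), and the capability flags simply restate the table — verify them against the current official docs before relying on them.

```javascript
// Capability lookup mirroring the comparison matrix above.
// These flags restate the table; verify against current official docs.
const GEMINI_25_MODELS = {
  'gemini-2.5-pro':        { functionCalling: true,  inputTokens: 1048576, outputTokens: 65536 },
  'gemini-2.5-flash':      { functionCalling: true,  inputTokens: 1048576, outputTokens: 65536 },
  'gemini-2.5-flash-lite': { functionCalling: false, inputTokens: 1048576, outputTokens: 65536 },
};

// Throws on unknown model IDs so typos (e.g. 'gemini-1.5-pro') fail fast.
function supportsFunctionCalling(modelId) {
  const caps = GEMINI_25_MODELS[modelId];
  if (!caps) throw new Error(`Unknown model: ${modelId}`);
  return caps.functionCalling;
}
```

A table like this lets you reject an invalid model/tool combination before spending an API call on it.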
Previous Generation Models (Still Available)
Gemini 2.0 Flash
Model ID: gemini-2.0-flash
Context: 1M input / 65K output tokens
Status: Previous generation, 2.5 Flash recommended instead
Gemini 1.5 Pro
Model ID: gemini-1.5-pro
Context: 2M input tokens (this is the ONLY model with 2M!)
Status: Older model, 2.5 models recommended
Context Window Clarification
⚠️ CRITICAL CORRECTION:
ACCURATE: Gemini 2.5 models support 1,048,576 input tokens (approximately 1 million)
INACCURATE: Claiming Gemini 2.5 has 2M token context window
WHY THIS MATTERS:
- Gemini 1.5 Pro (older model) had 2M tokens
- Gemini 2.5 models (current) have ~1M tokens
- This is a common mistake that causes confusion!
This skill prevents this error by providing accurate information.
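To guard against the 1M-vs-2M confusion in practice, you can pre-check a prompt's rough size before sending it. This is a sketch: the 4-characters-per-token ratio is a common rule of thumb for English text, not a tokenizer, and the function names are illustrative — use the API's token-counting endpoint for exact counts.

```javascript
// Gemini 2.5 input limit: 1,048,576 tokens (NOT 2M -- that was Gemini 1.5 Pro).
const GEMINI_25_INPUT_LIMIT = 1048576;

// Rough heuristic: ~4 characters per token for English text.
// Not a tokenizer; use the API's countTokens for exact counts.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

function fitsGemini25Context(text) {
  return estimateTokens(text) <= GEMINI_25_INPUT_LIMIT;
}
```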
Model Selection Guide
Use gemini-2.5-pro When:
- ✅ Complex reasoning required (math, logic, STEM)
- ✅ Advanced code generation and optimization
- ✅ Multi-step problem-solving
- ✅ Quality is more important than cost
- ✅ Tasks require maximum capability
Use gemini-2.5-flash When:
- ✅ General-purpose AI applications
- ✅ High-volume production workloads
- ✅ Function calling required
- ✅ Agentic workflows
- ✅ Good balance of cost and quality needed
- ⭐ Recommended default choice
Use gemini-2.5-flash-lite When:
- ✅ Simple text generation only
- ✅ No function calling needed
- ✅ High throughput required
- ✅ Cost is primary concern
- ⚠️ Only if you don't need function calling!
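The selection rules above can be condensed into a small helper. A sketch only — the option names (`complexReasoning`, `needsFunctionCalling`, `costCritical`) are our own, and the branches simply encode the three bullet lists.

```javascript
// Encodes the selection guide above: Pro for complex reasoning,
// Flash as the recommended default, Flash-Lite only when function
// calling isn't needed and cost/throughput dominate.
function selectGeminiModel({
  complexReasoning = false,
  needsFunctionCalling = false,
  costCritical = false,
} = {}) {
  if (complexReasoning) return 'gemini-2.5-pro';
  if (costCritical && !needsFunctionCalling) return 'gemini-2.5-flash-lite';
  return 'gemini-2.5-flash'; // recommended default
}
```

Note that `needsFunctionCalling` overrides `costCritical`: the helper never returns Flash-Lite when tools are required.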
Common Mistakes
❌ Mistake 1: Using Wrong Model Name
```javascript
// WRONG - old model name
model: 'gemini-1.5-pro'

// CORRECT - current model
model: 'gemini-2.5-flash'
```
❌ Mistake 2: Claiming 2M Context for 2.5 Models
```javascript
// WRONG ASSUMPTION
// "Gemini 2.5 has 2M token context window"

// CORRECT
// Gemini 2.5 has 1,048,576 input tokens
// Only Gemini 1.5 Pro (older) had 2M
```
❌ Mistake 3: Using Flash-Lite for Function Calling
```javascript
// WRONG - Flash-Lite doesn't support function calling!
model: 'gemini-2.5-flash-lite',
config: {
  tools: [{ functionDeclarations: [...] }] // This will FAIL
}

// CORRECT
model: 'gemini-2.5-flash', // or gemini-2.5-pro
config: {
  tools: [{ functionDeclarations: [...] }]
}
```
Rate Limits (Free vs Paid)
Free Tier
- 15 RPM (requests per minute)
- 1M TPM (tokens per minute)
- 1,500 RPD (requests per day)
Paid Tier
- 360 RPM
- 4M TPM
- Unlimited daily requests
Tip: Monitor your usage and implement client-side rate limiting to stay within quotas. Exact limits vary by model and tier, so check the official rate-limit documentation for current values.
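A minimal client-side guard for an RPM quota might look like the sliding-window sketch below. The class name and structure are illustrative; production code should also handle 429 responses and any Retry-After headers from the API.

```javascript
// Minimal sliding-window request limiter for a requests-per-minute quota.
// Illustrative sketch: production code should also honor 429 responses
// and Retry-After headers from the API.
class RpmLimiter {
  constructor(maxPerMinute) {
    this.maxPerMinute = maxPerMinute;
    this.timestamps = []; // ms timestamps of recent requests
  }

  // Returns true if a request may be sent now (and records it),
  // false if the last-60s window is already full.
  tryAcquire(now = Date.now()) {
    const windowStart = now - 60000;
    this.timestamps = this.timestamps.filter((t) => t > windowStart);
    if (this.timestamps.length >= this.maxPerMinute) return false;
    this.timestamps.push(now);
    return true;
  }
}
```

Usage: construct with your tier's RPM limit (e.g. `new RpmLimiter(15)` for the free tier), call `tryAcquire()` before each request, and delay or queue when it returns false.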
Official Documentation
- Models Overview: https://ai.google.dev/gemini-api/docs/models
- Gemini 2.5 Announcement: https://developers.googleblog.com/en/gemini-2-5-thinking-model-updates/
- Pricing: https://ai.google.dev/pricing
Production Tip: Always use gemini-2.5-flash as your default unless you specifically need Pro's advanced reasoning or want to minimize cost with Flash-Lite (and don't need function calling).