# Gemini Models Guide (2025)

**Last Updated**: 2025-11-19 (Gemini 3 preview release)

---

## Gemini 3 Series (Preview - November 2025)

### gemini-3-pro-preview

**Model ID**: `gemini-3-pro-preview`

**Status**: 🆕 Preview release (November 18, 2025)

**Context Windows**:
- Input: TBD (documentation pending)
- Output: TBD (documentation pending)

**Description**: Google's newest and most intelligent AI model with state-of-the-art reasoning and multimodal understanding. Outperforms Gemini 2.5 Pro on every major AI benchmark.

**Best For**:
- Most complex reasoning tasks
- Advanced multimodal analysis (images, videos, PDFs, audio)
- Benchmark-critical applications
- Cutting-edge projects requiring latest capabilities
- Tasks requiring absolute best quality

**Features**:
- ✅ Enhanced multimodal understanding
- ✅ Function calling
- ✅ Streaming
- ✅ System instructions
- ✅ JSON mode
- TBD: Thinking mode (documentation pending)

**Knowledge Cutoff**: TBD

**Pricing**: Preview pricing (likely higher than 2.5 Pro)

**⚠️ Preview Status**: Use for evaluation and testing. Consider `gemini-2.5-pro` for production-critical decisions until Gemini 3 reaches stable general availability.

**New Capabilities**:
- Record-breaking benchmark performance
- Enhanced generative UI responses
- Advanced coding capabilities (Google Antigravity integration)
- State-of-the-art multimodal understanding

---

## Current Production Models (Gemini 2.5 - Stable)

### gemini-2.5-pro

**Model ID**: `gemini-2.5-pro`

**Context Windows**:
- Input: 1,048,576 tokens (NOT 2M!)
- Output: 65,536 tokens

**Description**: State-of-the-art thinking model capable of reasoning over complex problems in code, math, and STEM.

**Best For**:
- Complex reasoning tasks
- Advanced code generation and optimization
- Mathematical problem-solving
- Multi-step logical analysis
- STEM applications

**Features**:
- ✅ Thinking mode (enabled by default)
- ✅ Function calling
- ✅ Multimodal (text, images, video, audio, PDFs)
- ✅ Streaming
- ✅ System instructions
- ✅ JSON mode

**Knowledge Cutoff**: January 2025

**Pricing**: Higher cost, use for tasks requiring best quality

---

### gemini-2.5-flash

**Model ID**: `gemini-2.5-flash`

**Context Windows**:
- Input: 1,048,576 tokens
- Output: 65,536 tokens

**Description**: Best price-performance model for large-scale processing, low-latency, and high-volume tasks.

**Best For**:
- General-purpose AI applications
- High-volume API calls
- Agentic workflows
- Cost-sensitive applications
- Production workloads

**Features**:
- ✅ Thinking mode (enabled by default)
- ✅ Function calling
- ✅ Multimodal (text, images, video, audio, PDFs)
- ✅ Streaming
- ✅ System instructions
- ✅ JSON mode

**Knowledge Cutoff**: January 2025

**Pricing**: Best price-performance ratio

**⭐ Recommended**: This is the default choice for most applications

---

### gemini-2.5-flash-lite

**Model ID**: `gemini-2.5-flash-lite`

**Context Windows**:
- Input: 1,048,576 tokens
- Output: 65,536 tokens

**Description**: Most cost-efficient and fastest 2.5 model, optimized for high throughput.

**Best For**:
- High-throughput applications
- Simple text generation
- Cost-critical use cases
- Speed-prioritized workloads

**Features**:
- ✅ Thinking mode (enabled by default)
- ❌ **NO function calling** (critical limitation!)
- ✅ Multimodal (text, images, video, audio, PDFs)
- ✅ Streaming
- ✅ System instructions
- ✅ JSON mode

**Knowledge Cutoff**: January 2025

**Pricing**: Lowest cost

**⚠️ Important**: Flash-Lite does NOT support function calling! Use Flash or Pro if you need tool use.

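The per-model limits listed above can be checked before a request is sent. The following is a minimal sketch, not part of any SDK: `MODEL_SPECS`, `PlannedRequest`, and `validateRequest` are hypothetical names, the numbers and the function-calling flags are copied from this guide as written, and the input token count is assumed to be estimated by the caller.

```typescript
// Hypothetical capability table; values are transcribed from this guide.
type ModelId = "gemini-2.5-pro" | "gemini-2.5-flash" | "gemini-2.5-flash-lite";

interface ModelSpec {
  maxInputTokens: number;
  maxOutputTokens: number;
  functionCalling: boolean; // as listed in this guide
}

const MODEL_SPECS: Record<ModelId, ModelSpec> = {
  "gemini-2.5-pro": { maxInputTokens: 1048576, maxOutputTokens: 65536, functionCalling: true },
  "gemini-2.5-flash": { maxInputTokens: 1048576, maxOutputTokens: 65536, functionCalling: true },
  "gemini-2.5-flash-lite": { maxInputTokens: 1048576, maxOutputTokens: 65536, functionCalling: false },
};

// Hypothetical description of a request the caller is about to make.
interface PlannedRequest {
  model: ModelId;
  inputTokens: number; // caller's estimate
  maxOutputTokens: number;
  usesTools: boolean;
}

// Returns a list of human-readable problems; an empty list means the
// request fits the limits recorded in MODEL_SPECS.
function validateRequest(req: PlannedRequest): string[] {
  const spec = MODEL_SPECS[req.model];
  const problems: string[] = [];
  if (req.inputTokens > spec.maxInputTokens) {
    problems.push(`input of ${req.inputTokens} tokens exceeds ${spec.maxInputTokens}`);
  }
  if (req.maxOutputTokens > spec.maxOutputTokens) {
    problems.push(`output budget of ${req.maxOutputTokens} tokens exceeds ${spec.maxOutputTokens}`);
  }
  if (req.usesTools && !spec.functionCalling) {
    problems.push(`${req.model} is listed without function calling; use Flash or Pro`);
  }
  return problems;
}
```

Returning a list instead of throwing on the first problem lets a caller report every mismatch at once, for example an oversized prompt and an unsupported tool declaration in the same request.
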

---

## Model Comparison Matrix

| Feature | Pro | Flash | Flash-Lite |
|---------|-----|-------|------------|
| **Thinking Mode** | ✅ Default ON | ✅ Default ON | ✅ Default ON |
| **Function Calling** | ✅ Yes | ✅ Yes | ❌ **NO** |
| **Multimodal** | ✅ Full | ✅ Full | ✅ Full |
| **Streaming** | ✅ Yes | ✅ Yes | ✅ Yes |
| **Input Tokens** | 1,048,576 | 1,048,576 | 1,048,576 |
| **Output Tokens** | 65,536 | 65,536 | 65,536 |
| **Reasoning Quality** | Best | Good | Basic |
| **Speed** | Moderate | Fast | Fastest |
| **Cost** | Highest | Medium | Lowest |

---

## Previous Generation Models (Still Available)

### Gemini 2.0 Flash

**Model ID**: `gemini-2.0-flash`

**Context**: 1M input / 8K output tokens

**Status**: Previous generation, 2.5 Flash recommended instead

### Gemini 1.5 Pro

**Model ID**: `gemini-1.5-pro`

**Context**: 2M input tokens (this is the ONLY model with 2M!)

**Status**: Older model, 2.5 models recommended

---

## Context Window Clarification

**⚠️ CRITICAL CORRECTION**:

**ACCURATE**: Gemini 2.5 models support **1,048,576 input tokens** (approximately 1 million)

**INACCURATE**: Claiming Gemini 2.5 has a 2M token context window

**WHY THIS MATTERS**:
- Gemini 1.5 Pro (older model) had 2M tokens
- Gemini 2.5 models (current) have ~1M tokens
- This is a common mistake that causes confusion!

**This skill prevents this error by providing accurate information.**

---

## Model Selection Guide

### Use gemini-2.5-pro When:
- ✅ Complex reasoning required (math, logic, STEM)
- ✅ Advanced code generation and optimization
- ✅ Multi-step problem-solving
- ✅ Quality is more important than cost
- ✅ Tasks require maximum capability

### Use gemini-2.5-flash When:
- ✅ General-purpose AI applications
- ✅ High-volume production workloads
- ✅ Function calling required
- ✅ Agentic workflows
- ✅ Good balance of cost and quality needed
- ⭐ **Recommended default choice**

### Use gemini-2.5-flash-lite When:
- ✅ Simple text generation only
- ✅ No function calling needed
- ✅ High throughput required
- ✅ Cost is primary concern
- ⚠️ **Only if you don't need function calling!**

---

## Common Mistakes

### ❌ Mistake 1: Using Wrong Model Name

```typescript
// WRONG - old model name
model: 'gemini-1.5-pro'

// CORRECT - current model
model: 'gemini-2.5-flash'
```

### ❌ Mistake 2: Claiming 2M Context for 2.5 Models

```typescript
// WRONG ASSUMPTION
// "Gemini 2.5 has 2M token context window"

// CORRECT
// Gemini 2.5 has 1,048,576 input tokens
// Only Gemini 1.5 Pro (older) had 2M
```

### ❌ Mistake 3: Using Flash-Lite for Function Calling

```typescript
// WRONG - Flash-Lite doesn't support function calling!
model: 'gemini-2.5-flash-lite',
config: {
  tools: [{ functionDeclarations: [...] }] // This will FAIL
}

// CORRECT
model: 'gemini-2.5-flash', // or gemini-2.5-pro
config: {
  tools: [{ functionDeclarations: [...] }]
}
```

---

## Rate Limits (Free vs Paid)

### Free Tier
- **15 RPM** (requests per minute)
- **1M TPM** (tokens per minute)
- **1,500 RPD** (requests per day)

### Paid Tier
- **360 RPM**
- **4M TPM**
- Unlimited daily requests

**Tip**: Monitor your usage and implement rate limiting to stay within quotas.

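One way to act on the rate-limiting tip is a client-side sliding-window limiter for the RPM quota. This is a sketch under stated assumptions: `RpmLimiter` and `tryAcquire` are hypothetical names, the clock is injected so the behavior can be tested, and only requests per minute are tracked (TPM and RPD would need separate counters).

```typescript
const WINDOW_MS = 60000; // one minute

// Hypothetical helper: allows at most `maxPerMinute` requests in any
// rolling 60-second window.
class RpmLimiter {
  private timestamps: number[] = [];
  private readonly maxPerMinute: number;
  private readonly now: () => number;

  constructor(maxPerMinute: number, now: () => number = () => Date.now()) {
    this.maxPerMinute = maxPerMinute;
    this.now = now;
  }

  // Returns 0 and records the request if it may be sent now;
  // otherwise returns how many milliseconds the caller should wait.
  tryAcquire(): number {
    const t = this.now();
    // Keep only the requests that are still inside the window.
    this.timestamps = this.timestamps.filter((ts) => t - ts < WINDOW_MS);
    if (this.timestamps.length < this.maxPerMinute) {
      this.timestamps.push(t);
      return 0;
    }
    // The next slot opens when the oldest recorded request leaves the window.
    const oldest = Math.min(...this.timestamps);
    return oldest + WINDOW_MS - t;
  }
}
```

With the free-tier figure from this guide, a caller might create `new RpmLimiter(15)` and, before each API call, sleep for the returned number of milliseconds whenever `tryAcquire()` is non-zero, then try again.
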

---

## Official Documentation

- **Models Overview**: https://ai.google.dev/gemini-api/docs/models
- **Gemini 2.5 Announcement**: https://developers.googleblog.com/en/gemini-2-5-thinking-model-updates/
- **Pricing**: https://ai.google.dev/pricing

---

**Production Tip**: Always use `gemini-2.5-flash` as your default unless you specifically need Pro's advanced reasoning or want to minimize cost with Flash-Lite (and don't need function calling).
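
The production tip can be written down as a small selection function. This is an illustrative sketch only: `Gemini25Model`, `WorkloadNeeds`, and `chooseModel` are hypothetical names, and the rule order is one reading of this guide's selection advice, not an official recommendation.

```typescript
type Gemini25Model = "gemini-2.5-pro" | "gemini-2.5-flash" | "gemini-2.5-flash-lite";

// Hypothetical description of what a workload needs.
interface WorkloadNeeds {
  complexReasoning?: boolean; // math, logic, STEM, multi-step problems
  functionCalling?: boolean;  // tool use required
  costCritical?: boolean;     // cost or throughput is the primary concern
}

// Picks a model following this guide: Pro for advanced reasoning,
// Flash-Lite only when cost dominates and no tools are needed,
// and Flash as the default otherwise.
function chooseModel(needs: WorkloadNeeds = {}): Gemini25Model {
  if (needs.complexReasoning) {
    return "gemini-2.5-pro";
  }
  if (needs.costCritical && !needs.functionCalling) {
    return "gemini-2.5-flash-lite";
  }
  return "gemini-2.5-flash";
}
```

For example, `chooseModel({ costCritical: true, functionCalling: true })` falls through to `gemini-2.5-flash`, because the guide lists Flash-Lite as unsuitable for tool use.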