# Gemini Models Guide (2025)

**Last Updated**: 2025-11-19 (Gemini 3 preview release)

---

## Gemini 3 Series (Preview - November 2025)

### gemini-3-pro-preview

**Model ID**: `gemini-3-pro-preview`

**Status**: 🆕 Preview release (November 18, 2025)

**Context Windows**:
- Input: TBD (documentation pending)
- Output: TBD (documentation pending)

**Description**: Google's newest and most intelligent AI model, with state-of-the-art reasoning and multimodal understanding. Google reports that it outperforms Gemini 2.5 Pro on every major AI benchmark.

**Best For**:
- Most complex reasoning tasks
- Advanced multimodal analysis (images, videos, PDFs, audio)
- Benchmark-critical applications
- Cutting-edge projects requiring the latest capabilities
- Tasks requiring the absolute best quality

**Features**:
- ✅ Enhanced multimodal understanding
- ✅ Function calling
- ✅ Streaming
- ✅ System instructions
- ✅ JSON mode
- ❓ Thinking mode (TBD, documentation pending)

**Knowledge Cutoff**: TBD

**Pricing**: Preview pricing (likely higher than 2.5 Pro)

**⚠️ Preview Status**: Use for evaluation and testing. Consider `gemini-2.5-pro` for production-critical workloads until Gemini 3 reaches stable general availability.

**New Capabilities**:
- Record-breaking benchmark performance
- Enhanced generative UI responses
- Advanced coding capabilities (Google Antigravity integration)
- State-of-the-art multimodal understanding

---

## Current Production Models (Gemini 2.5 - Stable)

### gemini-2.5-pro

**Model ID**: `gemini-2.5-pro`

**Context Windows**:
- Input: 1,048,576 tokens (NOT 2M!)
- Output: 65,536 tokens

**Description**: State-of-the-art thinking model capable of reasoning over complex problems in code, math, and STEM.

**Best For**:
- Complex reasoning tasks
- Advanced code generation and optimization
- Mathematical problem-solving
- Multi-step logical analysis
- STEM applications

**Features**:
- ✅ Thinking mode (enabled by default)
- ✅ Function calling
- ✅ Multimodal (text, images, video, audio, PDFs)
- ✅ Streaming
- ✅ System instructions
- ✅ JSON mode

**Knowledge Cutoff**: January 2025

**Pricing**: Higher cost; use for tasks requiring the best quality

---

### gemini-2.5-flash

**Model ID**: `gemini-2.5-flash`

**Context Windows**:
- Input: 1,048,576 tokens
- Output: 65,536 tokens

**Description**: Best price-performance model for large-scale processing, low-latency, and high-volume tasks.

**Best For**:
- General-purpose AI applications
- High-volume API calls
- Agentic workflows
- Cost-sensitive applications
- Production workloads

**Features**:
- ✅ Thinking mode (enabled by default)
- ✅ Function calling
- ✅ Multimodal (text, images, video, audio, PDFs)
- ✅ Streaming
- ✅ System instructions
- ✅ JSON mode

**Knowledge Cutoff**: January 2025

**Pricing**: Best price-performance ratio

**⭐ Recommended**: This is the default choice for most applications.

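For illustration, a minimal request payload for the default model can be sketched as a plain object. The `GenerateContentRequest` interface and `buildRequest` helper below are our own names, not part of any SDK; the `model`/`config` field convention follows the code examples later in this guide, and exact field names should be verified against the official SDK documentation.

```typescript
// Sketch of a generateContent-style request payload (illustrative only -
// verify exact field names against the official SDK docs).
interface GenerateContentRequest {
  model: string;
  contents: string;
  config?: {
    responseMimeType?: string; // 'application/json' enables JSON mode
  };
}

// Default to Flash: full feature set at the best price-performance ratio.
function buildRequest(
  prompt: string,
  model: string = 'gemini-2.5-flash'
): GenerateContentRequest {
  return { model, contents: prompt };
}
```

Defaulting the `model` parameter keeps call sites short while leaving an explicit override for tasks that need Pro.
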
---

### gemini-2.5-flash-lite

**Model ID**: `gemini-2.5-flash-lite`

**Context Windows**:
- Input: 1,048,576 tokens
- Output: 65,536 tokens

**Description**: Most cost-efficient and fastest 2.5 model, optimized for high throughput.

**Best For**:
- High-throughput applications
- Simple text generation
- Cost-critical use cases
- Speed-prioritized workloads

**Features**:
- ✅ Thinking mode (enabled by default)
- ❌ **NO function calling** (critical limitation!)
- ✅ Multimodal (text, images, video, audio, PDFs)
- ✅ Streaming
- ✅ System instructions
- ✅ JSON mode

**Knowledge Cutoff**: January 2025

**Pricing**: Lowest cost

**⚠️ Important**: Flash-Lite does NOT support function calling! Use Flash or Pro if you need tool use.

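Because this failure mode is easy to hit, a small guard (helper and constant names are ours) can reject tool-use requests against Flash-Lite before they ever reach the API:

```typescript
// Models in the 2.5 family that do NOT support function calling.
const NO_TOOL_MODELS = new Set(['gemini-2.5-flash-lite']);

// Throw early if a tool-use request targets a model without function calling.
function assertSupportsTools(model: string): void {
  if (NO_TOOL_MODELS.has(model)) {
    throw new Error(
      `${model} does not support function calling; use gemini-2.5-flash or gemini-2.5-pro`
    );
  }
}
```

Calling this before building a request with `tools` turns a confusing API error into an immediate, descriptive one.
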
---

## Model Comparison Matrix

| Feature | Pro | Flash | Flash-Lite |
|---------|-----|-------|------------|
| **Thinking Mode** | ✅ Default ON | ✅ Default ON | ✅ Default ON |
| **Function Calling** | ✅ Yes | ✅ Yes | ❌ **NO** |
| **Multimodal** | ✅ Full | ✅ Full | ✅ Full |
| **Streaming** | ✅ Yes | ✅ Yes | ✅ Yes |
| **Input Tokens** | 1,048,576 | 1,048,576 | 1,048,576 |
| **Output Tokens** | 65,536 | 65,536 | 65,536 |
| **Reasoning Quality** | Best | Good | Basic |
| **Speed** | Moderate | Fast | Fastest |
| **Cost** | Highest | Medium | Lowest |

---

## Previous Generation Models (Still Available)

### Gemini 2.0 Flash

**Model ID**: `gemini-2.0-flash`

**Context**: 1M input / 65K output tokens

**Status**: Previous generation; 2.5 Flash is recommended instead

### Gemini 1.5 Pro

**Model ID**: `gemini-1.5-pro`

**Context**: 2M input tokens (this is the ONLY model with 2M!)

**Status**: Older model; the 2.5 models are recommended

---

## Context Window Clarification

**⚠️ CRITICAL CORRECTION**:

**ACCURATE**: Gemini 2.5 models support **1,048,576 input tokens** (approximately 1 million)

**INACCURATE**: Claiming Gemini 2.5 has a 2M-token context window

**WHY THIS MATTERS**:
- Gemini 1.5 Pro (older model) had 2M tokens
- Gemini 2.5 models (current) have ~1M tokens
- This is a common mistake that causes confusion!

**This skill prevents this error by providing accurate information.**

---

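To keep these numbers straight in code, the limits can be captured in a lookup table with a rough pre-flight check. This is a sketch (the table and function names are ours); the 4-characters-per-token figure is only a heuristic for English text, so use the API's token-counting endpoint for real numbers.

```typescript
// Input token limits per model, per the sections above.
const INPUT_TOKEN_LIMITS: Record<string, number> = {
  'gemini-2.5-pro': 1_048_576,
  'gemini-2.5-flash': 1_048_576,
  'gemini-2.5-flash-lite': 1_048_576,
  'gemini-1.5-pro': 2_097_152, // the only 2M-token model
};

// Rough pre-flight check: ~4 characters per token for English text.
// Unknown models get a limit of 0 and therefore never "fit".
function fitsContextWindow(model: string, text: string): boolean {
  const approxTokens = Math.ceil(text.length / 4);
  return approxTokens <= (INPUT_TOKEN_LIMITS[model] ?? 0);
}
```
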
## Model Selection Guide

### Use gemini-2.5-pro When:
- ✅ Complex reasoning required (math, logic, STEM)
- ✅ Advanced code generation and optimization
- ✅ Multi-step problem-solving
- ✅ Quality is more important than cost
- ✅ Tasks require maximum capability

### Use gemini-2.5-flash When:
- ✅ General-purpose AI applications
- ✅ High-volume production workloads
- ✅ Function calling required
- ✅ Agentic workflows
- ✅ A good balance of cost and quality is needed
- ⭐ **Recommended default choice**

### Use gemini-2.5-flash-lite When:
- ✅ Simple text generation only
- ✅ No function calling needed
- ✅ High throughput required
- ✅ Cost is the primary concern
- ⚠️ **Only if you don't need function calling!**

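The rules above can be condensed into a small routing helper (the type and function names are ours, not part of any SDK):

```typescript
// Task profile used to route between the three 2.5 models.
type TaskProfile = {
  complexReasoning: boolean; // math, logic, STEM, multi-step analysis
  needsTools: boolean;       // function calling required
  costCritical: boolean;     // cost is the primary concern
};

function pickModel(task: TaskProfile): string {
  if (task.complexReasoning) return 'gemini-2.5-pro';
  // Flash-Lite only when tools are NOT needed and cost dominates.
  if (task.costCritical && !task.needsTools) return 'gemini-2.5-flash-lite';
  return 'gemini-2.5-flash'; // recommended default
}
```

Note the ordering: the tools check takes priority over cost, so a cost-sensitive agentic workflow still lands on Flash rather than the tool-less Flash-Lite.
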
---

## Common Mistakes

### ❌ Mistake 1: Using Wrong Model Name
```typescript
// WRONG - old model name
model: 'gemini-1.5-pro'

// CORRECT - current model
model: 'gemini-2.5-flash'
```

### ❌ Mistake 2: Claiming 2M Context for 2.5 Models
```typescript
// WRONG ASSUMPTION
// "Gemini 2.5 has 2M token context window"

// CORRECT
// Gemini 2.5 has 1,048,576 input tokens
// Only Gemini 1.5 Pro (older) had 2M
```

### ❌ Mistake 3: Using Flash-Lite for Function Calling
```typescript
// WRONG - Flash-Lite doesn't support function calling!
model: 'gemini-2.5-flash-lite',
config: {
  tools: [{ functionDeclarations: [...] }] // This will FAIL
}

// CORRECT
model: 'gemini-2.5-flash', // or gemini-2.5-pro
config: {
  tools: [{ functionDeclarations: [...] }]
}
```

---

## Rate Limits (Free vs Paid)

### Free Tier
- **15 RPM** (requests per minute)
- **1M TPM** (tokens per minute)
- **1,500 RPD** (requests per day)

### Paid Tier
- **360 RPM**
- **4M TPM**
- Unlimited daily requests

**Tip**: Monitor your usage and implement rate limiting to stay within quotas.

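As one way to do this, here is a client-side sliding-window limiter sized to the free tier's 15 RPM. This is a sketch (class name is ours); production code should also handle 429 responses from the API with backoff, since server-side quotas are authoritative.

```typescript
// Sliding-window rate limiter: allows at most `maxRequests` per `windowMs`.
class SlidingWindowLimiter {
  private timestamps: number[] = [];

  constructor(
    private maxRequests: number,
    private windowMs: number = 60_000
  ) {}

  // Returns true and records the request if under the limit, else false.
  tryAcquire(now: number = Date.now()): boolean {
    // Drop timestamps that have aged out of the window.
    this.timestamps = this.timestamps.filter((t) => now - t < this.windowMs);
    if (this.timestamps.length >= this.maxRequests) return false;
    this.timestamps.push(now);
    return true;
  }
}

// Free tier: 15 requests per minute.
const freeTierLimiter = new SlidingWindowLimiter(15);
```

Before each API call, check `freeTierLimiter.tryAcquire()` and queue or delay the request when it returns false.
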
---

## Official Documentation

- **Models Overview**: https://ai.google.dev/gemini-api/docs/models
- **Gemini 2.5 Announcement**: https://developers.googleblog.com/en/gemini-2-5-thinking-model-updates/
- **Pricing**: https://ai.google.dev/pricing

---

**Production Tip**: Always use `gemini-2.5-flash` as your default unless you specifically need Pro's advanced reasoning or want to minimize cost with Flash-Lite (and don't need function calling).