
Gemini Models Guide (2025)

Last Updated: 2025-11-19 (Gemini 3 preview release)


Gemini 3 Series (Preview - November 2025)

gemini-3-pro-preview

Model ID: gemini-3-pro-preview

Status: 🆕 Preview release (November 18, 2025)

Context Windows:

  • Input: TBD (documentation pending)
  • Output: TBD (documentation pending)

Description: Google's newest and most intelligent AI model with state-of-the-art reasoning and multimodal understanding. Outperforms Gemini 2.5 Pro on every major AI benchmark.

Best For:

  • Most complex reasoning tasks
  • Advanced multimodal analysis (images, videos, PDFs, audio)
  • Benchmark-critical applications
  • Cutting-edge projects requiring latest capabilities
  • Tasks requiring absolute best quality

Features:

  • Enhanced multimodal understanding
  • Function calling
  • Streaming
  • System instructions
  • JSON mode
  • Thinking mode: TBD (documentation pending)

Knowledge Cutoff: TBD

Pricing: Preview pricing (likely higher than 2.5 Pro)

⚠️ Preview Status: Use for evaluation and testing. Consider gemini-2.5-pro for production-critical decisions until Gemini 3 reaches stable general availability.

New Capabilities:

  • Record-breaking benchmark performance
  • Enhanced generative UI responses
  • Advanced coding capabilities (Google Antigravity integration)
  • State-of-the-art multimodal understanding

Current Production Models (Gemini 2.5 - Stable)

gemini-2.5-pro

Model ID: gemini-2.5-pro

Context Windows:

  • Input: 1,048,576 tokens (NOT 2M!)
  • Output: 65,536 tokens

Description: State-of-the-art thinking model capable of reasoning over complex problems in code, math, and STEM.

Best For:

  • Complex reasoning tasks
  • Advanced code generation and optimization
  • Mathematical problem-solving
  • Multi-step logical analysis
  • STEM applications

Features:

  • Thinking mode (enabled by default)
  • Function calling
  • Multimodal (text, images, video, audio, PDFs)
  • Streaming
  • System instructions
  • JSON mode

Knowledge Cutoff: January 2025

Pricing: Higher cost; use for tasks requiring best quality


gemini-2.5-flash

Model ID: gemini-2.5-flash

Context Windows:

  • Input: 1,048,576 tokens
  • Output: 65,536 tokens

Description: Best price-performance model for large-scale processing, low-latency, and high-volume tasks.

Best For:

  • General-purpose AI applications
  • High-volume API calls
  • Agentic workflows
  • Cost-sensitive applications
  • Production workloads

Features:

  • Thinking mode (enabled by default)
  • Function calling
  • Multimodal (text, images, video, audio, PDFs)
  • Streaming
  • System instructions
  • JSON mode

Knowledge Cutoff: January 2025

Pricing: Best price-performance ratio

Recommended: This is the default choice for most applications
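
In SDK terms, the recommendation above is just the model string you pass with each request. A minimal sketch of building such a request, assuming the @google/genai JS SDK (the `buildRequest` helper name is ours, not part of the SDK):

```javascript
// Minimal request shape for the Gemini API (JS SDK style, hedged sketch).
// With @google/genai the call would look roughly like:
//   const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
//   const res = await ai.models.generateContent(buildRequest('Hello'));
// SDK surfaces vary by version; treat the call shape as illustrative.

const DEFAULT_MODEL = 'gemini-2.5-flash'; // recommended default per this guide

function buildRequest(prompt, model = DEFAULT_MODEL) {
  return {
    model,            // current model ID, not a 1.5-era name
    contents: prompt, // plain-text prompt for the simple case
  };
}

console.log(buildRequest('Summarize this document.').model);
```

Centralizing the model string in one constant makes it trivial to upgrade (e.g. to a stable Gemini 3 release later) without hunting through call sites.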


gemini-2.5-flash-lite

Model ID: gemini-2.5-flash-lite

Context Windows:

  • Input: 1,048,576 tokens
  • Output: 65,536 tokens

Description: Most cost-efficient and fastest 2.5 model, optimized for high throughput.

Best For:

  • High-throughput applications
  • Simple text generation
  • Cost-critical use cases
  • Speed-prioritized workloads

Features:

  • Thinking mode (enabled by default)
  • NO function calling (critical limitation!)
  • Multimodal (text, images, video, audio, PDFs)
  • Streaming
  • System instructions
  • JSON mode

Knowledge Cutoff: January 2025

Pricing: Lowest cost

⚠️ Important: Flash-Lite does NOT support function calling! Use Flash or Pro if you need tool use.
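
One way to enforce this limitation in code is a small guard that rejects tool-use requests on unsupported models. The capability map below is transcribed from this guide, not queried from the API, and the function name is our own:

```javascript
// Guard against the Flash-Lite limitation described above: fail fast on
// tool-use requests to models this guide lists as lacking function calling.

const FUNCTION_CALLING_SUPPORT = {
  'gemini-2.5-pro': true,
  'gemini-2.5-flash': true,
  'gemini-2.5-flash-lite': false, // NO function calling per this guide
};

function assertFunctionCallingSupported(model) {
  if (FUNCTION_CALLING_SUPPORT[model] === false) {
    throw new Error(
      `${model} does not support function calling; ` +
      `use gemini-2.5-flash or gemini-2.5-pro instead`
    );
  }
}
```

Failing fast client-side gives a clearer error than letting the API reject the request.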


Model Comparison Matrix

| Feature           | Pro        | Flash      | Flash-Lite |
|-------------------|------------|------------|------------|
| Thinking Mode     | Default ON | Default ON | Default ON |
| Function Calling  | Yes        | Yes        | NO         |
| Multimodal        | Full       | Full       | Full       |
| Streaming         | Yes        | Yes        | Yes        |
| Input Tokens      | 1,048,576  | 1,048,576  | 1,048,576  |
| Output Tokens     | 65,536     | 65,536     | 65,536     |
| Reasoning Quality | Best       | Good       | Basic      |
| Speed             | Moderate   | Fast       | Fastest    |
| Cost              | Highest    | Medium     | Lowest     |

Previous Generation Models (Still Available)

Gemini 2.0 Flash

Model ID: gemini-2.0-flash

Context: 1M input / 65K output tokens

Status: Previous generation, 2.5 Flash recommended instead

Gemini 1.5 Pro

Model ID: gemini-1.5-pro

Context: 2M input tokens (this is the ONLY model with 2M!)

Status: Older model, 2.5 models recommended


Context Window Clarification

⚠️ CRITICAL CORRECTION:

ACCURATE: Gemini 2.5 models support 1,048,576 input tokens (approximately 1 million)

INACCURATE: Claiming Gemini 2.5 has 2M token context window

WHY THIS MATTERS:

  • Gemini 1.5 Pro (older model) had 2M tokens
  • Gemini 2.5 models (current) have ~1M tokens
  • This is a common mistake that causes confusion!

This skill prevents this error by providing accurate information.
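
The distinction can be encoded as a simple pre-flight check. The limits below come from this guide (1,048,576 for the 2.5 models, 2,097,152 for the older 1.5 Pro); the token count is assumed to come from a separate counting step such as the API's token-counting endpoint, and the helper name is ours:

```javascript
// Input-limit check using the figures above. Gemini 2.5 models accept
// 1,048,576 input tokens; only the older gemini-1.5-pro accepted 2M.

const INPUT_TOKEN_LIMITS = {
  'gemini-2.5-pro': 1048576,
  'gemini-2.5-flash': 1048576,
  'gemini-2.5-flash-lite': 1048576,
  'gemini-1.5-pro': 2097152, // the only 2M-context model
};

function fitsContextWindow(model, inputTokens) {
  const limit = INPUT_TOKEN_LIMITS[model];
  if (limit === undefined) throw new Error(`Unknown model: ${model}`);
  return inputTokens <= limit;
}
```

A request sized for the mistaken "2M on 2.5" assumption would fail this check before ever hitting the API.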


Model Selection Guide

Use gemini-2.5-pro When:

  • Complex reasoning required (math, logic, STEM)
  • Advanced code generation and optimization
  • Multi-step problem-solving
  • Quality is more important than cost
  • Tasks require maximum capability

Use gemini-2.5-flash When:

  • General-purpose AI applications
  • High-volume production workloads
  • Function calling required
  • Agentic workflows
  • Good balance of cost and quality needed
  • Recommended default choice

Use gemini-2.5-flash-lite When:

  • Simple text generation only
  • No function calling needed
  • High throughput required
  • Cost is primary concern
  • ⚠️ Only if you don't need function calling!
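
The selection rules above can be sketched as a small helper. The boolean flag names are hypothetical, chosen for this example; the priority order (quality first, then tool support, then cost) follows the guide:

```javascript
// Model selection following the guide above: Pro for complex reasoning,
// Flash when tools are needed or as the default, Flash-Lite only for
// cost-critical work that needs no function calling.

function pickModel({ needsTools = false, complexReasoning = false,
                     costCritical = false } = {}) {
  if (complexReasoning) return 'gemini-2.5-pro';    // quality over cost
  if (needsTools) return 'gemini-2.5-flash';        // Flash-Lite lacks tool use
  if (costCritical) return 'gemini-2.5-flash-lite'; // cheapest, simple text only
  return 'gemini-2.5-flash';                        // recommended default
}
```

Note that `needsTools` is checked before `costCritical`: a cost-sensitive workload that also needs function calling must still land on Flash.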

Common Mistakes

Mistake 1: Using Wrong Model Name

```javascript
// WRONG - old model name
model: 'gemini-1.5-pro'

// CORRECT - current model
model: 'gemini-2.5-flash'
```

Mistake 2: Claiming 2M Context for 2.5 Models

```javascript
// WRONG ASSUMPTION
// "Gemini 2.5 has 2M token context window"

// CORRECT
// Gemini 2.5 has 1,048,576 input tokens
// Only Gemini 1.5 Pro (older) had 2M
```

Mistake 3: Using Flash-Lite for Function Calling

```javascript
// WRONG - Flash-Lite doesn't support function calling!
model: 'gemini-2.5-flash-lite',
config: {
  tools: [{ functionDeclarations: [...] }] // This will FAIL
}

// CORRECT
model: 'gemini-2.5-flash', // or gemini-2.5-pro
config: {
  tools: [{ functionDeclarations: [...] }]
}
```
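
For reference, a `functionDeclarations` entry is a name, a description, and a JSON-Schema-style `parameters` object. The `get_weather` tool below is hypothetical, and field casing for schema types has varied across SDK versions (`'object'` vs `'OBJECT'`), so check your SDK's types; treat this as a sketch of the shape:

```javascript
// Example of a filled-in functionDeclarations entry for the config above.
// The tool name, description, and parameter schema are illustrative only.

function weatherToolConfig() {
  return {
    tools: [{
      functionDeclarations: [{
        name: 'get_weather', // hypothetical tool
        description: 'Get the current weather for a city',
        parameters: {
          type: 'object',
          properties: {
            city: { type: 'string', description: 'City name' },
          },
          required: ['city'],
        },
      }],
    }],
  };
}
```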

Rate Limits (Free vs Paid)

Free Tier

  • 15 RPM (requests per minute)
  • 1M TPM (tokens per minute)
  • 1,500 RPD (requests per day)

Paid Tier

  • 360 RPM
  • 4M TPM
  • Unlimited daily requests

Tip: Monitor your usage and implement rate limiting to stay within quotas.
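
A client-side sliding-window limiter for the 15 RPM free-tier cap can be sketched as below. The clock is injectable so behavior is deterministic (in production you would pass `Date.now`); this class name is ours, and a complete quota manager would need the same treatment for the TPM and RPD budgets:

```javascript
// Sliding-window request limiter: allows at most maxPerMinute requests
// in any trailing 60-second window.

class RequestRateLimiter {
  constructor(maxPerMinute, now = Date.now) {
    this.maxPerMinute = maxPerMinute;
    this.now = now;          // injectable clock for testability
    this.timestamps = [];    // times of requests in the current window
  }

  tryAcquire() {
    const t = this.now();
    // Drop requests older than the 60-second window.
    this.timestamps = this.timestamps.filter((ts) => t - ts < 60000);
    if (this.timestamps.length >= this.maxPerMinute) return false;
    this.timestamps.push(t);
    return true;
  }
}
```

Usage: call `tryAcquire()` before each API request and queue or back off when it returns `false`, e.g. `new RequestRateLimiter(15)` for the free tier.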


Official Documentation

  • Gemini API models overview: https://ai.google.dev/gemini-api/docs/models
  • Gemini API rate limits: https://ai.google.dev/gemini-api/docs/rate-limits

Production Tip: Always use gemini-2.5-flash as your default unless you specifically need Pro's advanced reasoning or want to minimize cost with Flash-Lite (and don't need function calling).