Gemini Models Guide (2025)
Last Updated: 2025-11-19 (Gemini 3 preview release)
Gemini 3 Series (Preview - November 2025)
gemini-3-pro-preview
Model ID: gemini-3-pro-preview
Status: 🆕 Preview release (November 18, 2025)
Context Windows:
- Input: TBD (documentation pending)
- Output: TBD (documentation pending)
Description: Google's newest and most capable model, featuring state-of-the-art reasoning and multimodal understanding. Google reports that it outperforms Gemini 2.5 Pro across major AI benchmarks.
Best For:
- Most complex reasoning tasks
- Advanced multimodal analysis (images, videos, PDFs, audio)
- Benchmark-critical applications
- Cutting-edge projects requiring latest capabilities
- Tasks requiring absolute best quality
Features:
- ✅ Enhanced multimodal understanding
- ✅ Function calling
- ✅ Streaming
- ✅ System instructions
- ✅ JSON mode
- TBD Thinking mode (documentation pending)
Knowledge Cutoff: TBD
Pricing: Preview pricing (likely higher than 2.5 Pro)
⚠️ Preview Status: Use for evaluation and testing. Consider gemini-2.5-pro for production-critical decisions until Gemini 3 reaches stable general availability.
New Capabilities:
- Record-breaking benchmark performance
- Enhanced generative UI responses
- Advanced coding capabilities (Google Antigravity integration)
- State-of-the-art multimodal understanding
Current Production Models (Gemini 2.5 - Stable)
gemini-2.5-pro
Model ID: gemini-2.5-pro
Context Windows:
- Input: 1,048,576 tokens (NOT 2M!)
- Output: 65,536 tokens
Description: State-of-the-art thinking model capable of reasoning over complex problems in code, math, and STEM.
Best For:
- Complex reasoning tasks
- Advanced code generation and optimization
- Mathematical problem-solving
- Multi-step logical analysis
- STEM applications
Features:
- ✅ Thinking mode (enabled by default)
- ✅ Function calling
- ✅ Multimodal (text, images, video, audio, PDFs)
- ✅ Streaming
- ✅ System instructions
- ✅ JSON mode
Knowledge Cutoff: January 2025
Pricing: Highest cost of the 2.5 family; use when quality matters more than cost
gemini-2.5-flash
Model ID: gemini-2.5-flash
Context Windows:
- Input: 1,048,576 tokens
- Output: 65,536 tokens
Description: Best price-performance model for large-scale processing, low-latency, and high-volume tasks.
Best For:
- General-purpose AI applications
- High-volume API calls
- Agentic workflows
- Cost-sensitive applications
- Production workloads
Features:
- ✅ Thinking mode (enabled by default)
- ✅ Function calling
- ✅ Multimodal (text, images, video, audio, PDFs)
- ✅ Streaming
- ✅ System instructions
- ✅ JSON mode
Knowledge Cutoff: January 2025
Pricing: Best price-performance ratio
⭐ Recommended: This is the default choice for most applications
gemini-2.5-flash-lite
Model ID: gemini-2.5-flash-lite
Context Windows:
- Input: 1,048,576 tokens
- Output: 65,536 tokens
Description: Most cost-efficient and fastest 2.5 model, optimized for high throughput.
Best For:
- High-throughput applications
- Simple text generation
- Cost-critical use cases
- Speed-prioritized workloads
Features:
- ✅ Thinking mode (supported, but OFF by default, unlike Pro and Flash)
- ❌ NO function calling (critical limitation!)
- ✅ Multimodal (text, images, video, audio, PDFs)
- ✅ Streaming
- ✅ System instructions
- ✅ JSON mode
Knowledge Cutoff: January 2025
Pricing: Lowest cost
⚠️ Important: Flash-Lite does NOT support function calling! Use Flash or Pro if you need tool use.
Model Comparison Matrix
| Feature | Pro | Flash | Flash-Lite |
|---|---|---|---|
| Thinking Mode | ✅ Default ON | ✅ Default ON | ✅ Supported (OFF by default) |
| Function Calling | ✅ Yes | ✅ Yes | ❌ NO |
| Multimodal | ✅ Full | ✅ Full | ✅ Full |
| Streaming | ✅ Yes | ✅ Yes | ✅ Yes |
| Input Tokens | 1,048,576 | 1,048,576 | 1,048,576 |
| Output Tokens | 65,536 | 65,536 | 65,536 |
| Reasoning Quality | Best | Good | Basic |
| Speed | Moderate | Fast | Fastest |
| Cost | Highest | Medium | Lowest |
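The matrix above can be mirrored as a small lookup table for client-side validation. A sketch only: the `GEMINI_25_MODELS` constant and `supportsFunctionCalling` helper are our own names (not part of any SDK), and the capability flags simply restate the table — verify them against the current official docs before relying on them.

```javascript
// Capability lookup mirroring the comparison matrix above.
// These flags restate the table; verify against current official docs.
const GEMINI_25_MODELS = {
  'gemini-2.5-pro':        { functionCalling: true,  inputTokens: 1048576, outputTokens: 65536 },
  'gemini-2.5-flash':      { functionCalling: true,  inputTokens: 1048576, outputTokens: 65536 },
  'gemini-2.5-flash-lite': { functionCalling: false, inputTokens: 1048576, outputTokens: 65536 },
};

// Throws on unknown model IDs so typos (e.g. 'gemini-1.5-pro') fail fast.
function supportsFunctionCalling(modelId) {
  const caps = GEMINI_25_MODELS[modelId];
  if (!caps) throw new Error(`Unknown model: ${modelId}`);
  return caps.functionCalling;
}
```

A table like this lets you reject an invalid model/tool combination before spending an API call on it.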
Previous Generation Models (Still Available)
Gemini 2.0 Flash
Model ID: gemini-2.0-flash
Context: 1M input / 65K output tokens
Status: Previous generation, 2.5 Flash recommended instead
Gemini 1.5 Pro
Model ID: gemini-1.5-pro
Context: 2M input tokens (this is the ONLY model with 2M!)
Status: Older model, 2.5 models recommended
Context Window Clarification
⚠️ CRITICAL CORRECTION:
ACCURATE: Gemini 2.5 models support 1,048,576 input tokens (approximately 1 million)
INACCURATE: Claiming Gemini 2.5 has 2M token context window
WHY THIS MATTERS:
- Gemini 1.5 Pro (older model) had 2M tokens
- Gemini 2.5 models (current) have ~1M tokens
- This is a common mistake that causes confusion!
This skill prevents this error by providing accurate information.
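To guard against the 1M-vs-2M confusion in practice, you can pre-check a prompt's rough size before sending it. This is a sketch: the 4-characters-per-token ratio is a common rule of thumb for English text, not a tokenizer, and the function names are illustrative — use the API's token-counting endpoint for exact counts.

```javascript
// Gemini 2.5 input limit: 1,048,576 tokens (NOT 2M -- that was Gemini 1.5 Pro).
const GEMINI_25_INPUT_LIMIT = 1048576;

// Rough heuristic: ~4 characters per token for English text.
// Not a tokenizer; use the API's countTokens for exact counts.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

function fitsGemini25Context(text) {
  return estimateTokens(text) <= GEMINI_25_INPUT_LIMIT;
}
```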
Model Selection Guide
Use gemini-2.5-pro When:
- ✅ Complex reasoning required (math, logic, STEM)
- ✅ Advanced code generation and optimization
- ✅ Multi-step problem-solving
- ✅ Quality is more important than cost
- ✅ Tasks require maximum capability
Use gemini-2.5-flash When:
- ✅ General-purpose AI applications
- ✅ High-volume production workloads
- ✅ Function calling required
- ✅ Agentic workflows
- ✅ Good balance of cost and quality needed
- ⭐ Recommended default choice
Use gemini-2.5-flash-lite When:
- ✅ Simple text generation only
- ✅ No function calling needed
- ✅ High throughput required
- ✅ Cost is primary concern
- ⚠️ Only if you don't need function calling!
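The selection rules above can be condensed into a small helper. A sketch only — the option names (`complexReasoning`, `needsFunctionCalling`, `costCritical`) are our own, and the branches simply encode the three bullet lists.

```javascript
// Encodes the selection guide above: Pro for complex reasoning,
// Flash as the recommended default, Flash-Lite only when function
// calling isn't needed and cost/throughput dominate.
function selectGeminiModel({
  complexReasoning = false,
  needsFunctionCalling = false,
  costCritical = false,
} = {}) {
  if (complexReasoning) return 'gemini-2.5-pro';
  if (costCritical && !needsFunctionCalling) return 'gemini-2.5-flash-lite';
  return 'gemini-2.5-flash'; // recommended default
}
```

Note that `needsFunctionCalling` overrides `costCritical`: the helper never returns Flash-Lite when tools are required.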
Common Mistakes
❌ Mistake 1: Using Wrong Model Name
```javascript
// WRONG - old model name
model: 'gemini-1.5-pro'

// CORRECT - current model
model: 'gemini-2.5-flash'
```
❌ Mistake 2: Claiming 2M Context for 2.5 Models
```javascript
// WRONG ASSUMPTION
// "Gemini 2.5 has 2M token context window"

// CORRECT
// Gemini 2.5 has 1,048,576 input tokens
// Only Gemini 1.5 Pro (older) had 2M
```
❌ Mistake 3: Using Flash-Lite for Function Calling
```javascript
// WRONG - Flash-Lite doesn't support function calling!
model: 'gemini-2.5-flash-lite',
config: {
  tools: [{ functionDeclarations: [...] }] // This will FAIL
}

// CORRECT
model: 'gemini-2.5-flash', // or gemini-2.5-pro
config: {
  tools: [{ functionDeclarations: [...] }]
}
```
Rate Limits (Free vs Paid)
Free Tier
- 15 RPM (requests per minute)
- 1M TPM (tokens per minute)
- 1,500 RPD (requests per day)
Paid Tier
- 360 RPM
- 4M TPM
- Unlimited daily requests
Tip: Monitor your usage and implement client-side rate limiting to stay within quotas. Exact limits vary by model and tier, so check the official rate-limit documentation for current values.
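A minimal client-side guard for an RPM quota might look like the sliding-window sketch below. The class name and structure are illustrative; production code should also handle 429 responses and any Retry-After headers from the API.

```javascript
// Minimal sliding-window request limiter for a requests-per-minute quota.
// Illustrative sketch: production code should also honor 429 responses
// and Retry-After headers from the API.
class RpmLimiter {
  constructor(maxPerMinute) {
    this.maxPerMinute = maxPerMinute;
    this.timestamps = []; // ms timestamps of recent requests
  }

  // Returns true if a request may be sent now (and records it),
  // false if the last-60s window is already full.
  tryAcquire(now = Date.now()) {
    const windowStart = now - 60000;
    this.timestamps = this.timestamps.filter((t) => t > windowStart);
    if (this.timestamps.length >= this.maxPerMinute) return false;
    this.timestamps.push(now);
    return true;
  }
}
```

Usage: construct with your tier's RPM limit (e.g. `new RpmLimiter(15)` for the free tier), call `tryAcquire()` before each request, and delay or queue when it returns false.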
Official Documentation
- Models Overview: https://ai.google.dev/gemini-api/docs/models
- Gemini 2.5 Announcement: https://developers.googleblog.com/en/gemini-2-5-thinking-model-updates/
- Pricing: https://ai.google.dev/pricing
Production Tip: Always use gemini-2.5-flash as your default unless you specifically need Pro's advanced reasoning or want to minimize cost with Flash-Lite (and don't need function calling).