# Context Caching Guide

Complete guide to using context caching with the Google Gemini API to reduce input-token costs by up to 90%.

---

## What is Context Caching?

Context caching lets you cache frequently reused content (system instructions, large documents, videos) and reference it across multiple requests, significantly reducing token costs and improving latency.

---

## How It Works

1. **Create a cache** with your repeated content (documents, videos, system instructions)
2. **Set a TTL** (time-to-live) for cache expiration
3. **Reference the cache** in subsequent API calls
4. **Pay less** - cached tokens cost ~90% less than regular input tokens
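To make these four steps concrete, here is a minimal end-to-end sketch using the `@google/genai` SDK. The document text and display name are placeholders, and it assumes `GEMINI_API_KEY` is set in the environment (each call is covered in detail in the sections below):

```typescript
import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Placeholder: the large, repeated context you want to cache
const largeDocumentText = 'Large document content here...';

// Steps 1 + 2: create the cache with a 1-hour TTL
const cache = await ai.caches.create({
  model: 'gemini-2.5-flash-001',
  config: {
    displayName: 'lifecycle-demo',
    contents: largeDocumentText,
    ttl: '3600s',
  },
});

// Step 3: reference the cache on each request (original model name + config.cachedContent)
const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash-001',
  contents: 'Give me a one-paragraph summary.',
  config: { cachedContent: cache.name },
});

// Step 4: the cached document tokens are billed at the discounted rate
console.log(response.text);

// Clean up once the session is over
await ai.caches.delete({ name: cache.name });
```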
---

## Benefits

### Cost Savings

- **Cached input tokens**: ~90% cheaper than regular tokens
- **Output tokens**: Same price (not cached)
- **Example**: 100K-token document cached → ~10K-token cost equivalent

### Performance

- **Reduced latency**: Cached content is preprocessed
- **Faster responses**: No need to reprocess large context
- **Consistent results**: Same context every time

### Use Cases

- Large documents analyzed repeatedly
- Long system instructions used across sessions
- Video/audio files queried multiple times
- Consistent conversation context

---

## Cache Creation

### Basic Cache (SDK)

```typescript
import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const cache = await ai.caches.create({
  model: 'gemini-2.5-flash-001', // Must use explicit version!
  config: {
    displayName: 'my-cache',
    systemInstruction: 'You are a helpful assistant.',
    contents: 'Large document content here...',
    ttl: '3600s', // 1 hour
  }
});
```

### Cache with Expiration Time

```typescript
// Set a specific expiration time (RFC 3339 / ISO 8601 timestamp)
const expirationTime = new Date(Date.now() + 2 * 60 * 60 * 1000); // 2 hours from now

const cache = await ai.caches.create({
  model: 'gemini-2.5-flash-001',
  config: {
    displayName: 'my-cache',
    contents: documentText,
    expireTime: expirationTime.toISOString(), // Use expireTime instead of ttl
  }
});
```

---

## TTL (Time-To-Live) Guidelines

### Recommended TTL Values

| Use Case | TTL | Reason |
|----------|-----|--------|
| Quick analysis session | 300s (5 min) | Short-lived tasks |
| Extended conversation | 3600s (1 hour) | Standard session length |
| Daily batch processing | 86400s (24 hours) | Reuse across the day |
| Long-term analysis | 604800s (7 days) | Maximum allowed |

### TTL vs Expiration Time

**TTL (time-to-live)**:
- Relative duration from cache creation
- Format: `"3600s"` (string with 's' suffix)
- Easy for session-based caching

**Expiration Time**:
- Absolute timestamp (an RFC 3339 / ISO 8601 string)
- Precise control over cache lifetime

---

## Using a Cache

### Generate Content with Cache (SDK)

```typescript
// Reference the cache via config.cachedContent; keep the original model name
const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash-001',
  contents: 'Summarize the document',
  config: { cachedContent: cache.name }
});

console.log(response.text);
```

### Multiple Queries with Same Cache

```typescript
const queries = [
  'What are the key points?',
  'Who are the main characters?',
  'What is the conclusion?'
];

for (const query of queries) {
  const response = await ai.models.generateContent({
    model: 'gemini-2.5-flash-001',
    contents: query,
    config: { cachedContent: cache.name }
  });
  console.log(`Q: ${query}`);
  console.log(`A: ${response.text}\n`);
}
```

---

## Cache Management

### Update Cache TTL

```typescript
// Extend cache lifetime before it expires
await ai.caches.update({
  name: cache.name,
  config: {
    ttl: '7200s' // Extend to 2 hours
  }
});
```

### List All Caches

```typescript
// caches.list() returns a pager that supports async iteration
const pager = await ai.caches.list();
for await (const c of pager) {
  console.log(`${c.displayName}: ${c.name}`);
  console.log(`Expires: ${c.expireTime}`);
}
```

### Delete Cache

```typescript
// Delete when no longer needed
await ai.caches.delete({ name: cache.name });
```

---

## Advanced Use Cases

### Caching Video Files

```typescript
import { createPartFromUri, createUserContent } from '@google/genai';

// 1. Upload the video
let videoFile = await ai.files.upload({
  file: './video.mp4',
  config: { mimeType: 'video/mp4' }
});

// 2. Wait for processing to finish
while (videoFile.state === 'PROCESSING') {
  await new Promise(resolve => setTimeout(resolve, 2000));
  videoFile = await ai.files.get({ name: videoFile.name });
}

// 3. Create a cache with the video
const cache = await ai.caches.create({
  model: 'gemini-2.5-flash-001',
  config: {
    displayName: 'video-cache',
    systemInstruction: 'Analyze this video.',
    contents: createUserContent(createPartFromUri(videoFile.uri, videoFile.mimeType)),
    ttl: '600s'
  }
});

// 4. Query the video multiple times
const response1 = await ai.models.generateContent({
  model: 'gemini-2.5-flash-001',
  contents: 'What happens in the first minute?',
  config: { cachedContent: cache.name }
});

const response2 = await ai.models.generateContent({
  model: 'gemini-2.5-flash-001',
  contents: 'Who are the main people?',
  config: { cachedContent: cache.name }
});
```

### Caching with System Instructions

```typescript
const cache = await ai.caches.create({
  model: 'gemini-2.5-flash-001',
  config: {
    displayName: 'legal-expert-cache',
    systemInstruction: `
      You are a legal expert specializing in contract law.
      Always cite relevant sections when making claims.
      Use clear, professional language.
    `,
    contents: largeContractDocument,
    ttl: '3600s'
  }
});

// The system instruction is part of the cached context
const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash-001',
  contents: 'Is this contract enforceable?',
  config: { cachedContent: cache.name }
});
```

---

## Important Notes

### Model Version Requirement

**⚠️ You MUST use explicit version suffixes when creating caches:**

```typescript
// ✅ CORRECT
model: 'gemini-2.5-flash-001'

// ❌ WRONG (will fail)
model: 'gemini-2.5-flash'
```

### Cache Expiration

- Caches are **automatically deleted** after the TTL expires
- Expired caches **cannot be recovered** - they must be recreated
- Update the TTL **before expiration** to extend a cache's lifetime

### Cost Calculation

```
Regular request: 100,000 context tokens + 1,000 query tokens = 101K token cost

With caching (after the one-time cache creation):
- Cached tokens: 100,000 × 0.1 (90% discount) = 10K equivalent
- New tokens:    1,000 × 1.0                  = 1K
- Total:         11K equivalent (~89% savings)
```

### Limitations

- Maximum TTL: 7 days (604800s)
- Cache creation costs the same as regular tokens (first time only)
- Subsequent uses get the 90% discount
- Only input tokens are cached (output tokens are never cached)

---

## Best Practices

### When to Use Caching

✅ **Good Use Cases:**

- Large documents queried repeatedly (legal docs, research papers)
- Video/audio files analyzed with different questions
- Long system instructions used across many requests
- Consistent context in multi-turn conversations

❌ **Bad Use Cases:**

- Single-use content (no benefit)
- Frequently changing content
- Short content (<1000 tokens) - minimal savings
- Content used only once per day (the cache might expire)

### Optimization Tips

1. **Cache Early**: Create the cache at session start
2. **Extend TTL**: Update before expiration if the cache is still needed
3. **Monitor Usage**: Track how often the cache is actually reused (see the sketch after this list)
4. **Clean Up**: Delete unused caches to avoid clutter
5. **Combine Features**: Use caching with code execution and grounding for powerful workflows
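A minimal sketch of tip 3 (Monitor Usage). It reuses `ai` and `cache` from the examples above and assumes the response's `usageMetadata` field reports a `cachedContentTokenCount`, as in current versions of `@google/genai`:

```typescript
// Issue a request against the cache and inspect the token accounting
const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash-001',
  contents: 'What are the key points?',
  config: { cachedContent: cache.name },
});

const usage = response.usageMetadata;
console.log(`Prompt tokens:     ${usage?.promptTokenCount}`);
console.log(`Served from cache: ${usage?.cachedContentTokenCount ?? 0}`);

// If cachedContentTokenCount stays at 0, the cache is not being hit:
// check that config.cachedContent is set and that the cache has not expired.
```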
### Cache Naming

Use a descriptive `displayName` for easy identification:

```typescript
// ✅ Good names
displayName: 'financial-report-2024-q3'
displayName: 'legal-contract-acme-corp'
displayName: 'video-analysis-project-x'

// ❌ Vague names
displayName: 'cache1'
displayName: 'test'
```

---

## Troubleshooting

### "Invalid model name" Error

**Problem**: Using `gemini-2.5-flash` instead of `gemini-2.5-flash-001`

**Solution**: Always use an explicit version suffix:

```typescript
model: 'gemini-2.5-flash-001' // Correct
```

### Cache Expired Error

**Problem**: Trying to use a cache after its TTL expired

**Solution**: Check the expiration before use, or extend the TTL proactively:

```typescript
let cache = await ai.caches.get({ name: cacheName });
if (cache.expireTime && new Date(cache.expireTime) < new Date()) {
  // Cache expired - recreate it
  cache = await ai.caches.create({ ... });
}
```

### High Costs Despite Caching

**Problem**: Creating a new cache for each request

**Solution**: Reuse the same cache across multiple requests:

```typescript
// ❌ Wrong - creates a new cache for every query
for (const query of queries) {
  const cache = await ai.caches.create({ ... }); // Expensive!
  const response = await ai.models.generateContent({
    model: 'gemini-2.5-flash-001',
    contents: query,
    config: { cachedContent: cache.name }
  });
}

// ✅ Correct - create once, use many times
const cache = await ai.caches.create({ ... }); // Create once
for (const query of queries) {
  const response = await ai.models.generateContent({
    model: 'gemini-2.5-flash-001',
    contents: query,
    config: { cachedContent: cache.name }
  });
}
```

---

## References

- Official Docs: https://ai.google.dev/gemini-api/docs/caching
- Cost Optimization: See "Cost Optimization" in the main SKILL.md
- Templates: See `context-caching.ts` for working examples