# Context Caching Guide

Complete guide to using context caching with the Google Gemini API to reduce input-token costs by up to 90%.
## What is Context Caching?

Context caching allows you to cache frequently used content (system instructions, large documents, videos) and reuse it across multiple requests, significantly reducing token costs and improving latency.

### How It Works

1. Create a cache with your repeated content (documents, videos, system instructions)
2. Set a TTL (time-to-live) for cache expiration
3. Reference the cache in subsequent API calls
4. Pay less - cached tokens cost ~90% less than regular input tokens
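Putting those four steps together: a minimal end-to-end sketch using the `@google/genai` SDK, assuming `GEMINI_API_KEY` is set. The placeholder document text and display name are made up for illustration.

```typescript
import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Steps 1-2: create the cache with the repeated content and a TTL
const cache = await ai.caches.create({
  model: 'gemini-2.5-flash-001', // explicit version suffix required (see below)
  config: {
    displayName: 'lifecycle-demo',
    contents: 'Large document content that many requests will share...',
    ttl: '600s', // expire after 10 minutes
  },
});

// Step 3: reference the cache; cached tokens bill at the discounted rate,
// and only the new prompt tokens bill at the full rate
const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash-001', // must match the model the cache was created with
  contents: 'Give me a one-paragraph summary.',
  config: { cachedContent: cache.name },
});
console.log(response.text);

// Clean up once the session is over
await ai.caches.delete({ name: cache.name });
```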
## Benefits

### Cost Savings

- Cached input tokens: ~90% cheaper than regular tokens
- Output tokens: same price (never cached)
- Example: a 100K-token document, cached, bills like ~10K tokens on each reuse

### Performance

- Reduced latency: cached content is preprocessed
- Faster responses: no need to reprocess large context
- Consistent results: the same context every time

### Use Cases

- Large documents analyzed repeatedly
- Long system instructions used across sessions
- Video/audio files queried multiple times
- Consistent conversation context
## Cache Creation

### Basic Cache (SDK)

```typescript
import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const cache = await ai.caches.create({
  model: 'gemini-2.5-flash-001', // Must use an explicit version suffix!
  config: {
    displayName: 'my-cache',
    systemInstruction: 'You are a helpful assistant.',
    contents: 'Large document content here...',
    ttl: '3600s', // 1 hour
  },
});
```
### Cache with Expiration Time

```typescript
// Set a specific expiration time as an ISO 8601 / RFC 3339 timestamp
const expirationTime = new Date(Date.now() + 2 * 60 * 60 * 1000); // 2 hours from now

const cache = await ai.caches.create({
  model: 'gemini-2.5-flash-001',
  config: {
    displayName: 'my-cache',
    contents: documentText,
    expireTime: expirationTime.toISOString(), // Use expireTime instead of ttl
  },
});
```
## TTL (Time-To-Live) Guidelines

### Recommended TTL Values

| Use Case | TTL | Reason |
|---|---|---|
| Quick analysis session | 300s (5 min) | Short-lived tasks |
| Extended conversation | 3600s (1 hour) | Standard session length |
| Daily batch processing | 86400s (24 hours) | Reuse across the day |
| Long-term analysis | 604800s (7 days) | Maximum allowed |
### TTL vs Expiration Time

**TTL (time-to-live):**

- Relative duration from cache creation
- Format: `"3600s"` (string with an `s` suffix)
- Easy for session-based caching

**Expiration Time:**

- Absolute timestamp (`expireTime`, an ISO 8601 / RFC 3339 string in the JS SDK)
- Precise control over cache lifetime
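If you want a single code path that can produce either shape, a small helper like the hypothetical `cacheLifetime` below works; the helper name is illustrative, not part of the SDK.

```typescript
// Hypothetical helper: turn a lifetime in hours into either config shape.
// 'ttl' is a relative duration string, 'expireTime' an absolute timestamp.
function cacheLifetime(hours: number, absolute = false) {
  if (absolute) {
    const expiresAt = new Date(Date.now() + hours * 3600 * 1000);
    return { expireTime: expiresAt.toISOString() };
  }
  return { ttl: `${Math.round(hours * 3600)}s` };
}

// Spread into the cache config, e.g.:
// config: { displayName: 'my-cache', contents: documentText, ...cacheLifetime(2) }
```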
## Using a Cache

### Generate Content with Cache (SDK)

```typescript
// Reference the cache via config.cachedContent;
// the model must match the one the cache was created with
const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash-001',
  contents: 'Summarize the document',
  config: { cachedContent: cache.name },
});
console.log(response.text);
```
### Multiple Queries with Same Cache

```typescript
const queries = [
  'What are the key points?',
  'Who are the main characters?',
  'What is the conclusion?',
];

for (const query of queries) {
  const response = await ai.models.generateContent({
    model: 'gemini-2.5-flash-001',
    contents: query,
    config: { cachedContent: cache.name },
  });
  console.log(`Q: ${query}`);
  console.log(`A: ${response.text}\n`);
}
```
## Cache Management

### Update Cache TTL

```typescript
// Extend the cache lifetime before it expires
await ai.caches.update({
  name: cache.name,
  config: {
    ttl: '7200s', // Extend to 2 hours from now
  },
});
```
### List All Caches

```typescript
// list() returns an async-iterable pager, not a plain array
const pager = await ai.caches.list();
for await (const cache of pager) {
  console.log(`${cache.displayName}: ${cache.name}`);
  console.log(`Expires: ${cache.expireTime}`);
}
```
### Delete Cache

```typescript
// Delete the cache when it is no longer needed
await ai.caches.delete({ name: cache.name });
```
## Advanced Use Cases

### Caching Video Files

```typescript
import { createPartFromUri } from '@google/genai';

// 1. Upload the video via the Files API (a path string works in Node)
let videoFile = await ai.files.upload({ file: './video.mp4' });

// 2. Wait for processing to finish
while (videoFile.state === 'PROCESSING') {
  await new Promise(resolve => setTimeout(resolve, 2000));
  videoFile = await ai.files.get({ name: videoFile.name });
}

// 3. Create a cache referencing the processed video
const cache = await ai.caches.create({
  model: 'gemini-2.5-flash-001',
  config: {
    displayName: 'video-cache',
    systemInstruction: 'Analyze this video.',
    contents: [createPartFromUri(videoFile.uri, videoFile.mimeType)],
    ttl: '600s',
  },
});

// 4. Query the cached video multiple times
const response1 = await ai.models.generateContent({
  model: 'gemini-2.5-flash-001',
  contents: 'What happens in the first minute?',
  config: { cachedContent: cache.name },
});

const response2 = await ai.models.generateContent({
  model: 'gemini-2.5-flash-001',
  contents: 'Who are the main people?',
  config: { cachedContent: cache.name },
});
```
### Caching with System Instructions

```typescript
const cache = await ai.caches.create({
  model: 'gemini-2.5-flash-001',
  config: {
    displayName: 'legal-expert-cache',
    systemInstruction: `
      You are a legal expert specializing in contract law.
      Always cite relevant sections when making claims.
      Use clear, professional language.
    `,
    contents: largeContractDocument,
    ttl: '3600s',
  },
});

// The system instruction is part of the cached context
const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash-001',
  contents: 'Is this contract enforceable?',
  config: { cachedContent: cache.name },
});
```
## Important Notes

### Model Version Requirement

⚠️ You MUST use an explicit version suffix when creating caches:

```typescript
// ✅ CORRECT
model: 'gemini-2.5-flash-001'

// ❌ WRONG (will fail)
model: 'gemini-2.5-flash'
```
### Cache Expiration

- Caches are automatically deleted after the TTL expires
- Expired caches cannot be recovered - they must be recreated
- Update the TTL before expiration to extend a cache's lifetime (see the sketch below)
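A minimal keep-alive sketch of that last point, assuming the `ai` client from earlier; the `refreshCacheIfNeeded` name and the 5-minute margin are illustrative choices.

```typescript
// Illustrative helper: extend a cache's TTL when it is within `marginSeconds`
// of expiring, so long-running sessions never lose their cached context.
async function refreshCacheIfNeeded(cacheName: string, marginSeconds = 300) {
  const cache = await ai.caches.get({ name: cacheName });
  const expiresAt = new Date(cache.expireTime ?? 0).getTime();
  if (expiresAt - Date.now() < marginSeconds * 1000) {
    // Less than `marginSeconds` left: push expiry out to 1 hour from now
    await ai.caches.update({ name: cacheName, config: { ttl: '3600s' } });
  }
}
```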
### Cost Calculation

Regular request: 100,000 input tokens = 100K token cost

With caching (after cache creation):

- Cached tokens: 100,000 × 0.1 (90% discount) = 10K equivalent cost
- New tokens: 1,000 × 1.0 = 1K cost
- Total: 11K equivalent (89% savings)
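The same arithmetic as a runnable sketch. The 10% factor comes from the ~90% discount above; actual per-token dollar prices and any per-hour cache storage charge are deliberately left out.

```typescript
// Effective cost of N requests against a cached context, in "full-price
// token" units. Cache creation bills the cached tokens at the full rate once;
// each request then bills cached tokens at 10% plus new tokens at 100%.
function effectiveTokenCost(
  cachedTokens: number,
  newTokensPerRequest: number,
  requests: number,
): number {
  const creation = cachedTokens;
  const perRequest = cachedTokens * 0.1 + newTokensPerRequest;
  return creation + perRequest * requests;
}

// 100K-token document, 1K-token prompts, 10 requests:
// without caching: 101,000 × 10 = 1,010,000 token-equivalents
// with caching:    100,000 + (10,000 + 1,000) × 10 = 210,000 token-equivalents
console.log(effectiveTokenCost(100_000, 1_000, 10)); // 210000
```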
### Limitations

- Maximum TTL: 7 days (604800s)
- Cache creation costs the same as regular tokens (first time only)
- Subsequent uses get the ~90% discount
- Only input tokens are cached (output tokens are never cached)
## Best Practices

### When to Use Caching

✅ Good Use Cases:

- Large documents queried repeatedly (legal docs, research papers)
- Video/audio files analyzed with different questions
- Long system instructions used across many requests
- Consistent context in multi-turn conversations

❌ Bad Use Cases:

- Single-use content (no benefit)
- Frequently changing content
- Short content (under ~1,000 tokens) - minimal savings
- Content used only once per day (the cache may expire between uses)
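A rough break-even check that follows from the cost model above; the `cachingWorthIt` function and its token-equivalent accounting are an illustrative heuristic, not an official rule.

```typescript
// Caching pays off once the per-use discount outweighs the one-time
// full-price cache creation. With a ~90% discount, that happens at
// roughly two or more reuses of the same cached context.
function cachingWorthIt(cachedTokens: number, expectedReuses: number): boolean {
  const withoutCache = cachedTokens * expectedReuses;
  const withCache = cachedTokens + cachedTokens * 0.1 * expectedReuses;
  return withCache < withoutCache;
}

console.log(cachingWorthIt(100_000, 1)); // false - single use never pays off
console.log(cachingWorthIt(100_000, 2)); // true  - 120K vs 200K token-equivalents
```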
### Optimization Tips

- **Cache early:** create the cache at session start
- **Extend TTL:** update before expiration if the cache is still needed
- **Monitor usage:** track how often each cache is reused
- **Clean up:** delete unused caches to avoid clutter
- **Combine features:** use caching together with code execution and grounding for powerful workflows
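One way to apply the first two tips is a get-or-create wrapper, so a session never accidentally recreates its cache. The `getOrCreateCache` helper and the displayName-based lookup are conventions of this sketch, not SDK features.

```typescript
// Illustrative helper: reuse an existing cache with the given displayName,
// or create one if none exists yet.
async function getOrCreateCache(displayName: string, contents: string) {
  const pager = await ai.caches.list();
  for await (const existing of pager) {
    if (existing.displayName === displayName) return existing;
  }
  return ai.caches.create({
    model: 'gemini-2.5-flash-001',
    config: { displayName, contents, ttl: '3600s' },
  });
}
```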
### Cache Naming

Use a descriptive `displayName` for easy identification:

```typescript
// ✅ Good names
displayName: 'financial-report-2024-q3'
displayName: 'legal-contract-acme-corp'
displayName: 'video-analysis-project-x'

// ❌ Vague names
displayName: 'cache1'
displayName: 'test'
```
## Troubleshooting

### "Invalid model name" Error

**Problem:** Using `gemini-2.5-flash` instead of `gemini-2.5-flash-001`.

**Solution:** Always use an explicit version suffix:

```typescript
model: 'gemini-2.5-flash-001' // Correct
```
### Cache Expired Error

**Problem:** Trying to use a cache after its TTL has expired.

**Solution:** Check the expiration before use, or extend the TTL proactively:

```typescript
let cache = await ai.caches.get({ name: cacheName });
if (new Date(cache.expireTime) < new Date()) {
  // Cache expired - recreate it
  cache = await ai.caches.create({ ... });
}
```
### High Costs Despite Caching

**Problem:** Creating a new cache for each request.

**Solution:** Reuse the same cache across multiple requests:

```typescript
// ❌ Wrong - creates a new cache (at full token price) each time
for (const query of queries) {
  const cache = await ai.caches.create({ ... }); // Expensive!
  const response = await ai.models.generateContent({
    model: 'gemini-2.5-flash-001', contents: query, config: { cachedContent: cache.name },
  });
}

// ✅ Correct - create once, use many times
const cache = await ai.caches.create({ ... }); // Create once
for (const query of queries) {
  const response = await ai.models.generateContent({
    model: 'gemini-2.5-flash-001', contents: query, config: { cachedContent: cache.name },
  });
}
```
## References

- Official Docs: https://ai.google.dev/gemini-api/docs/caching
- Cost Optimization: See "Cost Optimization" in the main SKILL.md
- Templates: See `context-caching.ts` for working examples