# Context Caching Guide

Complete guide to using context caching with the Google Gemini API to reduce input-token costs by up to 90%.

---

## What is Context Caching?

Context caching lets you cache frequently used content (system instructions, large documents, videos) and reuse it across multiple requests, significantly reducing token costs and improving latency.

---

## How It Works

1. **Create a cache** with your repeated content (documents, videos, system instructions)
2. **Set a TTL** (time-to-live) to control cache expiration
3. **Reference the cache** in subsequent API calls
4. **Pay less**: cached tokens cost ~90% less than regular input tokens (sketched below)

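A minimal end-to-end sketch of the full cycle using the `@google/genai` TypeScript SDK; the model name, placeholder content, and TTL are illustrative, and the cache is referenced through `config.cachedContent` as in the examples later in this guide:

```typescript
import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// 1-2. Create a cache over the repeated content, with a 1-hour TTL
const cache = await ai.caches.create({
  model: 'gemini-2.5-flash-001', // explicit version suffix required
  config: {
    contents: 'Large document content here...', // placeholder
    ttl: '3600s',
  },
});

// 3. Reference the cache in an otherwise normal request
const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash-001',
  contents: 'Summarize the document',
  config: { cachedContent: cache.name },
});

// 4. Usage metadata reports how many tokens were served from the cache
console.log(response.text);
console.log(response.usageMetadata?.cachedContentTokenCount);
```
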
---

## Benefits

### Cost Savings
- **Cached input tokens**: ~90% cheaper than regular tokens
- **Output tokens**: same price (never cached)
- **Example**: a 100K-token document, once cached, bills at roughly the cost of 10K regular tokens

### Performance
- **Reduced latency**: cached content is preprocessed
- **Faster responses**: no need to reprocess large context
- **Consistent results**: the same context every time

### Use Cases
- Large documents analyzed repeatedly
- Long system instructions used across sessions
- Video/audio files queried multiple times
- Consistent conversation context

---

## Cache Creation

### Basic Cache (SDK)

```typescript
import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const cache = await ai.caches.create({
  model: 'gemini-2.5-flash-001', // Must use an explicit version suffix!
  config: {
    displayName: 'my-cache',
    systemInstruction: 'You are a helpful assistant.',
    contents: 'Large document content here...',
    ttl: '3600s', // 1 hour
  },
});
```

### Cache with Expiration Time

```typescript
// Set a specific expiration time (RFC 3339 / ISO 8601 timestamp)
const expirationTime = new Date(Date.now() + 2 * 60 * 60 * 1000); // 2 hours from now

const cache = await ai.caches.create({
  model: 'gemini-2.5-flash-001',
  config: {
    displayName: 'my-cache',
    contents: documentText,
    expireTime: expirationTime.toISOString(), // use expireTime instead of ttl
  },
});
```
---

## TTL (Time-To-Live) Guidelines

### Recommended TTL Values

| Use Case | TTL | Reason |
|----------|-----|--------|
| Quick analysis session | 300s (5 min) | Short-lived tasks |
| Extended conversation | 3600s (1 hour) | Standard session length |
| Daily batch processing | 86400s (24 hours) | Reuse across the day |
| Long-term analysis | 604800s (7 days) | Maximum allowed |

### TTL vs Expiration Time

**TTL (time-to-live)**:
- Relative duration from cache creation
- Format: `"3600s"` (string with an 's' suffix)
- Convenient for session-based caching

**Expiration Time**:
- Absolute timestamp
- RFC 3339 string (e.g. from `Date.prototype.toISOString()`)
- Precise control over cache lifetime

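A quick side-by-side sketch of the two styles; `documentText` is assumed to hold the content being cached:

```typescript
// Relative lifetime: the cache lives for 1 hour from creation
const ttlCache = await ai.caches.create({
  model: 'gemini-2.5-flash-001',
  config: { contents: documentText, ttl: '3600s' },
});

// Absolute lifetime: the cache expires at a specific instant
const expireCache = await ai.caches.create({
  model: 'gemini-2.5-flash-001',
  config: {
    contents: documentText,
    expireTime: new Date(Date.now() + 3600_000).toISOString(), // RFC 3339
  },
});
```
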
---

## Using a Cache

### Generate Content with Cache (SDK)

```typescript
// Keep the original model name and reference the cache via config.cachedContent
const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash-001',
  contents: 'Summarize the document',
  config: { cachedContent: cache.name },
});

console.log(response.text);
```

### Multiple Queries with Same Cache

```typescript
const queries = [
  'What are the key points?',
  'Who are the main characters?',
  'What is the conclusion?'
];

for (const query of queries) {
  const response = await ai.models.generateContent({
    model: 'gemini-2.5-flash-001',
    contents: query,
    config: { cachedContent: cache.name },
  });
  console.log(`Q: ${query}`);
  console.log(`A: ${response.text}\n`);
}
```

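Since the queries are independent, they can also be issued concurrently (subject to your rate limits); a sketch:

```typescript
// Fire all queries at once; each request reuses the same cached context
const responses = await Promise.all(
  queries.map(query =>
    ai.models.generateContent({
      model: 'gemini-2.5-flash-001',
      contents: query,
      config: { cachedContent: cache.name },
    })
  )
);

responses.forEach((response, i) => {
  console.log(`Q: ${queries[i]}`);
  console.log(`A: ${response.text}\n`);
});
```
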
---

## Cache Management

### Update Cache TTL

```typescript
// Extend the cache lifetime before it expires
await ai.caches.update({
  name: cache.name,
  config: {
    ttl: '7200s' // extend to 2 hours from now
  }
});
```

### List All Caches

```typescript
// caches.list() returns an async pager, so iterate with for await
const pager = await ai.caches.list();
for await (const cache of pager) {
  console.log(`${cache.displayName}: ${cache.name}`);
  console.log(`Expires: ${cache.expireTime}`);
}
```

### Delete Cache

```typescript
// Delete the cache when it's no longer needed
await ai.caches.delete({ name: cache.name });
```

---

## Advanced Use Cases

### Caching Video Files

```typescript
import { GoogleGenAI, createUserContent, createPartFromUri } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// 1. Upload video via the Files API
let videoFile = await ai.files.upload({
  file: './video.mp4',
  config: { mimeType: 'video/mp4' },
});

// 2. Wait for processing to finish
while (videoFile.state === 'PROCESSING') {
  await new Promise(resolve => setTimeout(resolve, 2000));
  videoFile = await ai.files.get({ name: videoFile.name });
}

// 3. Create a cache that wraps the uploaded video
const cache = await ai.caches.create({
  model: 'gemini-2.5-flash-001',
  config: {
    displayName: 'video-cache',
    systemInstruction: 'Analyze this video.',
    contents: createUserContent(createPartFromUri(videoFile.uri, videoFile.mimeType)),
    ttl: '600s'
  }
});

// 4. Query the video multiple times against the same cache
const response1 = await ai.models.generateContent({
  model: 'gemini-2.5-flash-001',
  contents: 'What happens in the first minute?',
  config: { cachedContent: cache.name },
});

const response2 = await ai.models.generateContent({
  model: 'gemini-2.5-flash-001',
  contents: 'Who are the main people?',
  config: { cachedContent: cache.name },
});
```

### Caching with System Instructions

```typescript
const cache = await ai.caches.create({
  model: 'gemini-2.5-flash-001',
  config: {
    displayName: 'legal-expert-cache',
    systemInstruction: `
      You are a legal expert specializing in contract law.
      Always cite relevant sections when making claims.
      Use clear, professional language.
    `,
    contents: largeContractDocument,
    ttl: '3600s'
  }
});

// The system instruction is part of the cached context
const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash-001',
  contents: 'Is this contract enforceable?',
  config: { cachedContent: cache.name },
});
```

---

## Important Notes

### Model Version Requirement

**⚠️ You MUST use an explicit version suffix when creating caches:**

```typescript
// ✅ CORRECT
model: 'gemini-2.5-flash-001'

// ❌ WRONG (will fail)
model: 'gemini-2.5-flash'
```

### Cache Expiration

- Caches are **automatically deleted** after the TTL expires
- Expired caches **cannot be recovered** and must be recreated
- Update the TTL **before expiration** to extend a cache's lifetime (see the sketch below)

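One way to handle that proactively is a small guard that extends the TTL whenever a cache is close to expiring; a minimal sketch, where the helper name and the 5-minute threshold are arbitrary choices:

```typescript
// Extend the cache's TTL if it expires within the next 5 minutes
async function keepCacheAlive(cacheName: string, extensionTtl = '3600s') {
  const cache = await ai.caches.get({ name: cacheName });
  if (!cache.expireTime) return;

  const msLeft = new Date(cache.expireTime).getTime() - Date.now();
  if (msLeft < 5 * 60 * 1000) {
    await ai.caches.update({ name: cacheName, config: { ttl: extensionTtl } });
  }
}
```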

### Cost Calculation

```
Regular request: 100,000 input tokens = 100K token cost

With caching (after cache creation):
- Cached tokens: 100,000 × 0.1 (90% discount) = 10K equivalent cost
- New tokens:      1,000 × 1.0               =  1K cost
- Total: 11K equivalent (~89% savings)
```
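The same arithmetic as a tiny helper; the 0.1 multiplier encodes the ~90% discount assumed throughout this guide:

```typescript
// Effective billable input tokens for a request that reuses a cache
function effectiveInputTokens(cachedTokens: number, newTokens: number): number {
  const CACHED_RATE = 0.1; // cached tokens bill at ~10% of the regular rate
  return cachedTokens * CACHED_RATE + newTokens;
}

console.log(effectiveInputTokens(100_000, 1_000)); // 11000 ≈ 89% savings
```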

### Limitations

- Maximum TTL: 7 days (604800s)
- Creating a cache bills the content at the regular input-token rate (a one-time cost)
- Subsequent uses get the ~90% discount
- Only input tokens are cached (output tokens are never cached)

---

## Best Practices

### When to Use Caching

✅ **Good Use Cases:**
- Large documents queried repeatedly (legal docs, research papers)
- Video/audio files analyzed with different questions
- Long system instructions used across many requests
- Consistent context in multi-turn conversations

❌ **Bad Use Cases:**
- Single-use content (no benefit)
- Frequently changing content
- Short content (<1,000 tokens): minimal savings
- Content used only once per day (the cache might expire between uses)

### Optimization Tips

1. **Cache early**: create the cache at session start
2. **Extend the TTL**: update it before expiration if the cache is still needed
3. **Monitor usage**: track how often each cache is reused
4. **Clean up**: delete unused caches to avoid clutter
5. **Combine features**: pair caching with code execution and grounding for more powerful workflows

### Cache Naming

Use a descriptive `displayName` for easy identification:

```typescript
// ✅ Good names
displayName: 'financial-report-2024-q3'
displayName: 'legal-contract-acme-corp'
displayName: 'video-analysis-project-x'

// ❌ Vague names
displayName: 'cache1'
displayName: 'test'
```

---

## Troubleshooting

### "Invalid model name" Error

**Problem**: Using `gemini-2.5-flash` instead of `gemini-2.5-flash-001`

**Solution**: Always use an explicit version suffix:

```typescript
model: 'gemini-2.5-flash-001' // Correct
```

### Cache Expired Error

**Problem**: Trying to use a cache after its TTL has expired

**Solution**: Check the expiration before use, or extend the TTL proactively:

```typescript
let cache = await ai.caches.get({ name: cacheName });
if (cache.expireTime && new Date(cache.expireTime) < new Date()) {
  // Cache expired; recreate it
  cache = await ai.caches.create({ ... });
}
```

### High Costs Despite Caching

**Problem**: Creating a new cache for each request

**Solution**: Reuse the same cache across multiple requests:

```typescript
// ❌ Wrong - creates a new cache each time
for (const query of queries) {
  const cache = await ai.caches.create({ ... }); // Expensive!
  const response = await ai.models.generateContent({
    model: 'gemini-2.5-flash-001',
    contents: query,
    config: { cachedContent: cache.name },
  });
}

// ✅ Correct - create once, use many times
const cache = await ai.caches.create({ ... }); // Create once
for (const query of queries) {
  const response = await ai.models.generateContent({
    model: 'gemini-2.5-flash-001',
    contents: query,
    config: { cachedContent: cache.name },
  });
}
```

---

## References

- Official Docs: https://ai.google.dev/gemini-api/docs/caching
- Cost Optimization: See "Cost Optimization" in the main SKILL.md
- Templates: See `context-caching.ts` for working examples