# Context Caching Guide

Complete guide to using context caching with the Google Gemini API to reduce input-token costs by up to 90%.

---

## What is Context Caching?

Context caching lets you cache frequently used content (system instructions, large documents, videos) and reuse it across multiple requests, significantly reducing token costs and improving latency.

---

## How It Works

1. **Create a cache** with your repeated content (documents, videos, system instructions)
2. **Set a TTL** (time-to-live) to control cache expiration
3. **Reference the cache** in subsequent API calls
4. **Pay less**: cached tokens cost ~90% less than regular input tokens (sketched below)

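A minimal end-to-end sketch of the full cycle using the `@google/genai` TypeScript SDK; the model name, placeholder content, and TTL are illustrative, and the cache is referenced through `config.cachedContent` as in the examples later in this guide:

```typescript
import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// 1-2. Create a cache over the repeated content, with a 1-hour TTL
const cache = await ai.caches.create({
  model: 'gemini-2.5-flash-001', // explicit version suffix required
  config: {
    contents: 'Large document content here...', // placeholder
    ttl: '3600s',
  },
});

// 3. Reference the cache in an otherwise normal request
const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash-001',
  contents: 'Summarize the document',
  config: { cachedContent: cache.name },
});

// 4. Usage metadata reports how many tokens were served from the cache
console.log(response.text);
console.log(response.usageMetadata?.cachedContentTokenCount);
```
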
---

## Benefits

### Cost Savings
- **Cached input tokens**: ~90% cheaper than regular tokens
- **Output tokens**: same price (never cached)
- **Example**: a 100K-token document, once cached, bills at roughly the cost of 10K regular tokens

### Performance
- **Reduced latency**: cached content is preprocessed
- **Faster responses**: no need to reprocess large context
- **Consistent results**: the same context every time

### Use Cases
- Large documents analyzed repeatedly
- Long system instructions used across sessions
- Video/audio files queried multiple times
- Consistent conversation context

---

## Cache Creation

### Basic Cache (SDK)

```typescript
import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const cache = await ai.caches.create({
  model: 'gemini-2.5-flash-001', // Must use an explicit version suffix!
  config: {
    displayName: 'my-cache',
    systemInstruction: 'You are a helpful assistant.',
    contents: 'Large document content here...',
    ttl: '3600s', // 1 hour
  },
});
```

### Cache with Expiration Time

```typescript
// Set a specific expiration time (RFC 3339 / ISO 8601 timestamp)
const expirationTime = new Date(Date.now() + 2 * 60 * 60 * 1000); // 2 hours from now

const cache = await ai.caches.create({
  model: 'gemini-2.5-flash-001',
  config: {
    displayName: 'my-cache',
    contents: documentText,
    expireTime: expirationTime.toISOString(), // use expireTime instead of ttl
  },
});
```
---

## TTL (Time-To-Live) Guidelines

### Recommended TTL Values

| Use Case | TTL | Reason |
|----------|-----|--------|
| Quick analysis session | 300s (5 min) | Short-lived tasks |
| Extended conversation | 3600s (1 hour) | Standard session length |
| Daily batch processing | 86400s (24 hours) | Reuse across the day |
| Long-term analysis | 604800s (7 days) | Maximum allowed |

### TTL vs Expiration Time

**TTL (time-to-live)**:
- Relative duration from cache creation
- Format: `"3600s"` (string with an 's' suffix)
- Convenient for session-based caching

**Expiration Time**:
- Absolute timestamp
- RFC 3339 string (e.g. from `Date.prototype.toISOString()`)
- Precise control over cache lifetime

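A quick side-by-side sketch of the two styles; `documentText` is assumed to hold the content being cached:

```typescript
// Relative lifetime: the cache lives for 1 hour from creation
const ttlCache = await ai.caches.create({
  model: 'gemini-2.5-flash-001',
  config: { contents: documentText, ttl: '3600s' },
});

// Absolute lifetime: the cache expires at a specific instant
const expireCache = await ai.caches.create({
  model: 'gemini-2.5-flash-001',
  config: {
    contents: documentText,
    expireTime: new Date(Date.now() + 3600_000).toISOString(), // RFC 3339
  },
});
```
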
---

## Using a Cache

### Generate Content with Cache (SDK)

```typescript
// Keep the original model name and reference the cache via config.cachedContent
const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash-001',
  contents: 'Summarize the document',
  config: { cachedContent: cache.name },
});

console.log(response.text);
```

### Multiple Queries with Same Cache

```typescript
const queries = [
  'What are the key points?',
  'Who are the main characters?',
  'What is the conclusion?'
];

for (const query of queries) {
  const response = await ai.models.generateContent({
    model: 'gemini-2.5-flash-001',
    contents: query,
    config: { cachedContent: cache.name },
  });
  console.log(`Q: ${query}`);
  console.log(`A: ${response.text}\n`);
}
```

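Since the queries are independent, they can also be issued concurrently (subject to your rate limits); a sketch:

```typescript
// Fire all queries at once; each request reuses the same cached context
const responses = await Promise.all(
  queries.map(query =>
    ai.models.generateContent({
      model: 'gemini-2.5-flash-001',
      contents: query,
      config: { cachedContent: cache.name },
    })
  )
);

responses.forEach((response, i) => {
  console.log(`Q: ${queries[i]}`);
  console.log(`A: ${response.text}\n`);
});
```
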
---

## Cache Management

### Update Cache TTL

```typescript
// Extend the cache lifetime before it expires
await ai.caches.update({
  name: cache.name,
  config: {
    ttl: '7200s' // extend to 2 hours from now
  }
});
```

### List All Caches

```typescript
// caches.list() returns an async pager, so iterate with for await
const pager = await ai.caches.list();
for await (const cache of pager) {
  console.log(`${cache.displayName}: ${cache.name}`);
  console.log(`Expires: ${cache.expireTime}`);
}
```

### Delete Cache

```typescript
// Delete the cache when it's no longer needed
await ai.caches.delete({ name: cache.name });
```

---

## Advanced Use Cases

### Caching Video Files

```typescript
import { GoogleGenAI, createUserContent, createPartFromUri } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// 1. Upload video via the Files API
let videoFile = await ai.files.upload({
  file: './video.mp4',
  config: { mimeType: 'video/mp4' },
});

// 2. Wait for processing to finish
while (videoFile.state === 'PROCESSING') {
  await new Promise(resolve => setTimeout(resolve, 2000));
  videoFile = await ai.files.get({ name: videoFile.name });
}

// 3. Create a cache that wraps the uploaded video
const cache = await ai.caches.create({
  model: 'gemini-2.5-flash-001',
  config: {
    displayName: 'video-cache',
    systemInstruction: 'Analyze this video.',
    contents: createUserContent(createPartFromUri(videoFile.uri, videoFile.mimeType)),
    ttl: '600s'
  }
});

// 4. Query the video multiple times against the same cache
const response1 = await ai.models.generateContent({
  model: 'gemini-2.5-flash-001',
  contents: 'What happens in the first minute?',
  config: { cachedContent: cache.name },
});

const response2 = await ai.models.generateContent({
  model: 'gemini-2.5-flash-001',
  contents: 'Who are the main people?',
  config: { cachedContent: cache.name },
});
```

### Caching with System Instructions

```typescript
const cache = await ai.caches.create({
  model: 'gemini-2.5-flash-001',
  config: {
    displayName: 'legal-expert-cache',
    systemInstruction: `
      You are a legal expert specializing in contract law.
      Always cite relevant sections when making claims.
      Use clear, professional language.
    `,
    contents: largeContractDocument,
    ttl: '3600s'
  }
});

// The system instruction is part of the cached context
const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash-001',
  contents: 'Is this contract enforceable?',
  config: { cachedContent: cache.name },
});
```

---

## Important Notes

### Model Version Requirement

**⚠️ You MUST use an explicit version suffix when creating caches:**

```typescript
// ✅ CORRECT
model: 'gemini-2.5-flash-001'

// ❌ WRONG (will fail)
model: 'gemini-2.5-flash'
```

### Cache Expiration

- Caches are **automatically deleted** after the TTL expires
- Expired caches **cannot be recovered** and must be recreated
- Update the TTL **before expiration** to extend a cache's lifetime (see the sketch below)

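One way to handle that proactively is a small guard that extends the TTL whenever a cache is close to expiring; a minimal sketch, where the helper name and the 5-minute threshold are arbitrary choices:

```typescript
// Extend the cache's TTL if it expires within the next 5 minutes
async function keepCacheAlive(cacheName: string, extensionTtl = '3600s') {
  const cache = await ai.caches.get({ name: cacheName });
  if (!cache.expireTime) return;

  const msLeft = new Date(cache.expireTime).getTime() - Date.now();
  if (msLeft < 5 * 60 * 1000) {
    await ai.caches.update({ name: cacheName, config: { ttl: extensionTtl } });
  }
}
```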

### Cost Calculation

```
Regular request: 100,000 input tokens = 100K token cost

With caching (after cache creation):
- Cached tokens: 100,000 × 0.1 (90% discount) = 10K equivalent cost
- New tokens:      1,000 × 1.0               =  1K cost
- Total: 11K equivalent (~89% savings)
```
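The same arithmetic as a tiny helper; the 0.1 multiplier encodes the ~90% discount assumed throughout this guide:

```typescript
// Effective billable input tokens for a request that reuses a cache
function effectiveInputTokens(cachedTokens: number, newTokens: number): number {
  const CACHED_RATE = 0.1; // cached tokens bill at ~10% of the regular rate
  return cachedTokens * CACHED_RATE + newTokens;
}

console.log(effectiveInputTokens(100_000, 1_000)); // 11000 ≈ 89% savings
```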

### Limitations

- Maximum TTL: 7 days (604800s)
- Creating a cache bills the content at the regular input-token rate (a one-time cost)
- Subsequent uses get the ~90% discount
- Only input tokens are cached (output tokens are never cached)

---

## Best Practices

### When to Use Caching

✅ **Good Use Cases:**
- Large documents queried repeatedly (legal docs, research papers)
- Video/audio files analyzed with different questions
- Long system instructions used across many requests
- Consistent context in multi-turn conversations

❌ **Bad Use Cases:**
- Single-use content (no benefit)
- Frequently changing content
- Short content (<1,000 tokens): minimal savings
- Content used only once per day (the cache might expire between uses)

### Optimization Tips

1. **Cache early**: create the cache at session start
2. **Extend the TTL**: update it before expiration if the cache is still needed
3. **Monitor usage**: track how often each cache is reused
4. **Clean up**: delete unused caches to avoid clutter
5. **Combine features**: pair caching with code execution and grounding for more powerful workflows

### Cache Naming

Use a descriptive `displayName` for easy identification:

```typescript
// ✅ Good names
displayName: 'financial-report-2024-q3'
displayName: 'legal-contract-acme-corp'
displayName: 'video-analysis-project-x'

// ❌ Vague names
displayName: 'cache1'
displayName: 'test'
```

---

## Troubleshooting

### "Invalid model name" Error

**Problem**: Using `gemini-2.5-flash` instead of `gemini-2.5-flash-001`

**Solution**: Always use an explicit version suffix:

```typescript
model: 'gemini-2.5-flash-001' // Correct
```

### Cache Expired Error

**Problem**: Trying to use a cache after its TTL has expired

**Solution**: Check the expiration before use, or extend the TTL proactively:

```typescript
let cache = await ai.caches.get({ name: cacheName });
if (cache.expireTime && new Date(cache.expireTime) < new Date()) {
  // Cache expired; recreate it
  cache = await ai.caches.create({ ... });
}
```

### High Costs Despite Caching

**Problem**: Creating a new cache for each request

**Solution**: Reuse the same cache across multiple requests:

```typescript
// ❌ Wrong - creates a new cache each time
for (const query of queries) {
  const cache = await ai.caches.create({ ... }); // Expensive!
  const response = await ai.models.generateContent({
    model: 'gemini-2.5-flash-001',
    contents: query,
    config: { cachedContent: cache.name },
  });
}

// ✅ Correct - create once, use many times
const cache = await ai.caches.create({ ... }); // Create once
for (const query of queries) {
  const response = await ai.models.generateContent({
    model: 'gemini-2.5-flash-001',
    contents: query,
    config: { cachedContent: cache.name },
  });
}
```

---

## References

- Official Docs: https://ai.google.dev/gemini-api/docs/caching
- Cost Optimization: See "Cost Optimization" in the main SKILL.md
- Templates: See `context-caching.ts` for working examples