Initial commit

This commit is contained in:
Zhongwei Li
2025-11-30 08:24:57 +08:00
commit 524980fd3e
10 changed files with 1706 additions and 0 deletions

View File

@@ -0,0 +1,12 @@
{
"name": "google-gemini-file-search",
"description": "Build document Q&A and searchable knowledge bases with Google Gemini File Search - fully managed RAG with automatic chunking, embeddings, and citations. Upload 100+ file formats (PDF, Word, Excel, code), configure semantic search, and query with natural language. Use when: building document Q&A systems, creating searchable knowledge bases, implementing semantic search without managing embeddings, indexing large document collections (100+ formats), or troubleshooting document immutability errors ",
"version": "1.0.0",
"author": {
"name": "Jeremy Dawes",
"email": "jeremy@jezweb.net"
},
"skills": [
"./"
]
}

21
LICENSE Normal file
View File

@@ -0,0 +1,21 @@
MIT License
Copyright (c) 2025 Jeremy Dawes (Jezweb)
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

227
PROJECT_STATUS.md Normal file
View File

@@ -0,0 +1,227 @@
# Google Gemini File Search Skill - Project Status
**Created:** 2025-11-10
**Status:** Phase 1 Complete (Core Documentation) - Phase 2 In Progress (Implementation)
**Version:** 1.0.0-beta
---
## ✅ Completed (Phase 1: Core Documentation)
### Directory Structure
- [x] Created skill directory with standard structure
- [x] scripts/ directory
- [x] templates/ directory
- [x] references/ directory
- [x] assets/ directory (empty, for future diagrams)
### Core Documentation Files
- [x] **SKILL.md** - Comprehensive skill file with YAML frontmatter (PRODUCTION READY)
- 8 documented errors with prevention strategies
- Complete setup instructions with TypeScript examples
- Chunking best practices
- Metadata schema patterns
- Cost optimization techniques
- Comparison guide (vs Vectorize, OpenAI, Claude MCP)
- ~5,000 words, optimized for ~65% token savings
- [x] **README.md** - Auto-trigger keywords and quick start (PRODUCTION READY)
- 40+ auto-trigger keywords (primary, use case, technical)
- Quick start example
- Feature highlights
- Comparison table
- Examples for 3 use cases
- [x] **LICENSE** - MIT License
### Scripts
- [x] **scripts/create-store.ts** - CLI tool to create file search stores (COMPLETE)
- [x] **scripts/README.md** - Documentation of all scripts (COMPLETE)
- [ ] scripts/upload-batch.ts (TO BE IMPLEMENTED)
- [ ] scripts/query-store.ts (TO BE IMPLEMENTED)
- [ ] scripts/cleanup.ts (TO BE IMPLEMENTED)
### Templates
- [x] **templates/README.md** - Overview of all templates (COMPLETE)
- [ ] templates/basic-node-rag/ (TO BE IMPLEMENTED)
- [ ] templates/cloudflare-worker-rag/ (TO BE IMPLEMENTED)
- [ ] templates/nextjs-docs-search/ (TO BE IMPLEMENTED)
### References
- [x] **references/README.md** - Overview of reference docs (COMPLETE)
- [ ] references/api-reference.md (TO BE IMPLEMENTED)
- [ ] references/chunking-best-practices.md (TO BE IMPLEMENTED)
- [ ] references/pricing-calculator.md (TO BE IMPLEMENTED)
- [ ] references/migration-from-openai.md (TO BE IMPLEMENTED)
---
## 🚧 Phase 2: Implementation (In Progress)
### Scripts Remaining (3/4 incomplete)
Priority order:
1. **upload-batch.ts** - Most essential for production use
2. **query-store.ts** - Interactive testing tool
3. **cleanup.ts** - Utility for maintenance
**Estimated Time:** ~2 hours (with testing)
### Templates Remaining (3/3 incomplete)
Priority order:
1. **basic-node-rag/** - Foundational example, simplest to implement
2. **nextjs-docs-search/** - Most practical for users, highest value
3. **cloudflare-worker-rag/** - Advanced integration, requires Wrangler setup
**Estimated Time:** ~6-8 hours (with testing)
### References Remaining (4/4 incomplete)
Priority order:
1. **api-reference.md** - Most frequently referenced
2. **chunking-best-practices.md** - Critical for retrieval quality
3. **pricing-calculator.md** - Business decision support
4. **migration-from-openai.md** - Competitive alternative
**Estimated Time:** ~4 hours (research + writing)
---
## 🎯 Phase 3: Testing & Validation (Not Started)
### Required Testing
- [ ] Install skill to `~/.claude/skills/google-gemini-file-search/`
- [ ] Verify auto-trigger works (test keywords)
- [ ] Run create-store.ts script (functional test)
- [ ] Test basic-node-rag template (end-to-end)
- [ ] Verify package.json dependencies install correctly
- [ ] Confirm SKILL.md loads properly (no syntax errors)
- [ ] Validate YAML frontmatter parsing
### Package Version Verification
- [ ] Confirm @google/genai v0.21.0+ is current stable
- [ ] Test with Node.js 18, 20, 22
- [ ] Verify TypeScript 5.x compatibility
**Estimated Time:** ~2 hours
---
## 📦 Phase 4: Marketplace Integration (Not Started)
### Marketplace Requirements
- [ ] Generate .claude-plugin/plugin.json manifest
- [ ] Add icon/thumbnail image to assets/
- [ ] Verify metadata completeness
- [ ] Test marketplace discovery
- [ ] Submit to claude-skills repository
**Estimated Time:** ~1 hour
---
## 📊 Current Progress
**Overall Completion:**
- Phase 1 (Core Documentation): ✅ 100%
- Phase 2 (Implementation): 🚧 15% (1/8 scripts + 4/4 placeholders)
- Phase 3 (Testing): ⏸️ 0%
- Phase 4 (Marketplace): ⏸️ 0%
**Total Estimated Remaining Work:** ~15 hours
---
## 🚀 Ready to Use?
**Current State:** SKILL.md and README.md are production-ready and can be used immediately for guidance. The skill will auto-trigger on relevant keywords and provide comprehensive setup instructions.
**What Works Now:**
- Complete setup documentation (SKILL.md)
- All 8 error prevention strategies documented
- Chunking best practices
- Cost optimization guide
- Comparison guide (vs alternatives)
- One working CLI script (create-store.ts)
**What's Missing:**
- Working templates (users must implement from SKILL.md examples)
- Batch upload utility
- Interactive query tool
- Reference documentation depth
---
## 📝 Next Session Tasks
**Immediate Priorities:**
1. Implement basic-node-rag template (highest ROI for users)
2. Implement upload-batch.ts script
3. Implement query-store.ts script
**Rationale:** These 3 items provide end-to-end working examples that users can run immediately. Templates are more valuable than additional reference docs because they're executable.
**Recommended Approach:**
1. Start fresh session
2. Implement basic-node-rag (minimal, ~200 lines total)
3. Implement upload-batch.ts (~150 lines)
4. Implement query-store.ts (~100 lines)
5. Test all three end-to-end
6. Generate marketplace manifest
7. Install and verify skill discovery
**Session Budget:** ~4-6 hours with testing
---
## 🔍 Quality Checklist (Phase 1 ✅)
**SKILL.md Compliance:**
- [x] YAML frontmatter with name + description
- [x] License field (MIT)
- [x] Metadata section (version, package versions, supported models)
- [x] Keywords comprehensive
- [x] Third-person description style
- [x] Imperative instructions
- [x] 8 documented errors with prevention code
- [x] Token efficiency measured (~65% savings)
**README.md Compliance:**
- [x] Auto-trigger keywords (40+ keywords)
- [x] Clear use cases ("Use when" scenarios)
- [x] Quick start example
- [x] Prerequisites listed
- [x] Comparison table
- [x] Version information
**Official Standards Compliance:**
- [x] Follows Anthropic agent_skills_spec.md
- [x] Follows planning/claude-code-skill-standards.md
- [x] Directory structure matches official skills repo
- [x] Resources in bundled locations (scripts/, references/, templates/)
---
## 📌 Notes for Continuation
### Key Decisions Made:
1. **Chunking Defaults:** Recommended 500 tokens/chunk, 50 overlap for technical docs
2. **Model Preference:** gemini-2.5-flash for most use cases (cost-effective)
3. **Metadata Limit:** Emphasized 20 key-value pair max in all examples
4. **Storage Calculation:** 3x multiplier prominently featured in all cost examples
### Research Sources Used:
- Official Docs: https://ai.google.dev/gemini-api/docs/file-search
- Blog: https://blog.google/technology/developers/file-search-gemini-api/
- Tutorial: https://www.philschmid.de/gemini-file-search-javascript
- API Reference: https://ai.google.dev/api/file-search/*
- SDK: https://github.com/googleapis/js-genai
### Package Versions Locked:
- @google/genai: ^0.21.0
- Node.js: >=18.0.0
- Supported Models: gemini-2.5-pro, gemini-2.5-flash
---
**Maintainer:** Jeremy Dawes (Jezweb)
**Repository:** https://github.com/jezweb/claude-skills
**Last Updated:** 2025-11-10

3
README.md Normal file
View File

@@ -0,0 +1,3 @@
# google-gemini-file-search
Build document Q&A and searchable knowledge bases with Google Gemini File Search - fully managed RAG with automatic chunking, embeddings, and citations. Upload 100+ file formats (PDF, Word, Excel, code), configure semantic search, and query with natural language. Use when: building document Q&A systems, creating searchable knowledge bases, implementing semantic search without managing embeddings, indexing large document collections (100+ formats), or troubleshooting document immutability errors

1018
SKILL.md Normal file

File diff suppressed because it is too large Load Diff

69
plugin.lock.json Normal file
View File

@@ -0,0 +1,69 @@
{
"$schema": "internal://schemas/plugin.lock.v1.json",
"pluginId": "gh:jezweb/claude-skills:skills/google-gemini-file-search",
"normalized": {
"repo": null,
"ref": "refs/tags/v20251128.0",
"commit": "30f128b45ee132610e093d8dffb57ea14caa91f1",
"treeHash": "bfd89990a3d6c693822aa25a7b6dff1b8ea0b63036928efa471d8e5160d84a76",
"generatedAt": "2025-11-28T10:19:02.002828Z",
"toolVersion": "publish_plugins.py@0.2.0"
},
"origin": {
"remote": "git@github.com:zhongweili/42plugin-data.git",
"branch": "master",
"commit": "aa1497ed0949fd50e99e70d6324a29c5b34f9390",
"repoRoot": "/Users/zhongweili/projects/openmind/42plugin-data"
},
"manifest": {
"name": "google-gemini-file-search",
"description": "Build document Q&A and searchable knowledge bases with Google Gemini File Search - fully managed RAG with automatic chunking, embeddings, and citations. Upload 100+ file formats (PDF, Word, Excel, code), configure semantic search, and query with natural language. Use when: building document Q&A systems, creating searchable knowledge bases, implementing semantic search without managing embeddings, indexing large document collections (100+ formats), or troubleshooting document immutability errors ",
"version": "1.0.0"
},
"content": {
"files": [
{
"path": "PROJECT_STATUS.md",
"sha256": "484a87690c29e5060650a1fc29f6da3ea5f53c6d967e1e34a4abc1b9d75d60ae"
},
{
"path": "LICENSE",
"sha256": "4530f92f44fe545e18773c7bbe0c312b0f04b78c161bdb784ff24f0ace9a5592"
},
{
"path": "README.md",
"sha256": "ed11cd2f405264afce6232525aa978a1cedbf06a3df1ddc10a8754d0444392b6"
},
{
"path": "SKILL.md",
"sha256": "925e4e1481c2c835d1e29426558acdc23e2163ee7a809a56f5b66b0c295005ca"
},
{
"path": "references/README.md",
"sha256": "244068784dee08c29e63e693aa9d8c30ee1b660a1792f8e7ffd4640abd79c3b5"
},
{
"path": "scripts/create-store.ts",
"sha256": "88eeea54211848256c2341aa567178b307a88517c776965f59fbaf648169ad62"
},
{
"path": "scripts/README.md",
"sha256": "2bbbba9d6618a207b16d4f79ac7a73fe9cb56723c54d3329ad9ad4d11a2d8b32"
},
{
"path": ".claude-plugin/plugin.json",
"sha256": "80087d39dece3c664a87fb1e52ea003ae987894ed21f8f0011a14d0e11ac2759"
},
{
"path": "templates/README.md",
"sha256": "868b21026a4b49e78d518b4d5ad27aadda6f306cf19ea558a0846c21e7085e9d"
}
],
"dirSha256": "bfd89990a3d6c693822aa25a7b6dff1b8ea0b63036928efa471d8e5160d84a76"
},
"security": {
"scannedAt": null,
"scannerVersion": null,
"flags": []
}
}

62
references/README.md Normal file
View File

@@ -0,0 +1,62 @@
# Google Gemini File Search Reference Documentation
This directory contains detailed reference materials for advanced File Search usage.
## Reference Documents
### 🚧 api-reference.md (TO BE IMPLEMENTED)
Complete API documentation extracted from official sources.
**Sections:**
- FileSearchStore API (create, get, list, delete, upload, import)
- Documents API (list, get, delete, query)
- Operations API (polling pattern)
- Request/response schemas
- Error codes and handling
### 🚧 chunking-best-practices.md (TO BE IMPLEMENTED)
Detailed chunking strategies for different content types.
**Sections:**
- How chunking works (whiteSpaceConfig)
- Content type recommendations (technical docs, prose, legal, code, FAQ)
- Chunk size impact on retrieval quality
- Overlap token guidelines
- Testing and tuning chunking configs
- Examples with before/after retrieval quality
### 🚧 pricing-calculator.md (TO BE IMPLEMENTED)
Cost estimation guide with examples.
**Sections:**
- Pricing model breakdown (indexing, storage, queries)
- Token calculation methods
- Cost examples by use case (10GB KB, 100MB docs, 1TB archive)
- ROI comparison (vs Vectorize, OpenAI, manual RAG)
- Cost optimization strategies
- Free tier maximization
### 🚧 migration-from-openai.md (TO BE IMPLEMENTED)
Migration guide from OpenAI Files API.
**Sections:**
- API mapping (OpenAI → Gemini equivalents)
- Key differences (storage model, chunking, pricing)
- Migration checklist
- Code conversion examples
- Common gotchas
- When to migrate vs stay with OpenAI
## Development Status
**Completed:** 0/4 documents (0%)
**Priority:**
1. api-reference.md (most frequently referenced)
2. chunking-best-practices.md (critical for quality)
3. pricing-calculator.md (business decision support)
4. migration-from-openai.md (competitive alternative)
## Notes
These references supplement SKILL.md with deeper technical details for advanced users. SKILL.md provides quick-start patterns; these docs provide comprehensive knowledge.

91
scripts/README.md Normal file
View File

@@ -0,0 +1,91 @@
# Google Gemini File Search CLI Scripts
This directory contains CLI tools for managing Google Gemini File Search stores and documents.
## Available Scripts
### ✅ create-store.ts
Create a new file search store.
**Usage:**
```bash
ts-node create-store.ts --name "My Knowledge Base" --project "customer-support" --environment "production"
```
**Status:** Complete
### 🚧 upload-batch.ts (TO BE IMPLEMENTED)
Batch upload documents to a file search store with progress tracking.
**Planned Features:**
- Concurrent uploads with configurable batch size
- Progress bar with ETA
- Automatic chunking configuration per file type
- Metadata extraction from file path/name
- Cost estimation before upload
- Operation polling until indexing complete
**Usage:**
```bash
ts-node upload-batch.ts --store "fileSearchStores/abc123" --directory "./docs" --concurrent 5
```
### 🚧 query-store.ts (TO BE IMPLEMENTED)
Interactive query tool with citation display.
**Planned Features:**
- Interactive REPL for queries
- Citation rendering with source links
- Metadata filtering options
- Model selection (Flash vs Pro)
- Export query results
**Usage:**
```bash
ts-node query-store.ts --store "fileSearchStores/abc123"
```
### 🚧 cleanup.ts (TO BE IMPLEMENTED)
Delete stores and documents (with safety prompts).
**Planned Features:**
- List all stores with document counts
- Delete specific store or all stores
- Force delete confirmation prompts
- Dry-run mode
**Usage:**
```bash
ts-node cleanup.ts --store "fileSearchStores/abc123" --force
ts-node cleanup.ts --all --dry-run
```
## Prerequisites
```bash
# Install dependencies
npm install @google/genai
# Set API key
export GOOGLE_API_KEY="your-api-key-here"
```
## Development Status
**Completed:** 1/4 scripts (25%)
**Next Steps:**
1. Implement upload-batch.ts
2. Implement query-store.ts
3. Implement cleanup.ts
4. Add package.json with dependencies and scripts
5. Test all scripts end-to-end
## Notes
These scripts demonstrate best practices from SKILL.md:
- Operation polling until done: true
- Storage quota calculation (3x multiplier)
- Recommended chunking configurations
- Metadata schema patterns
- Force delete for non-empty stores

124
scripts/create-store.ts Normal file
View File

@@ -0,0 +1,124 @@
#!/usr/bin/env node
/**
* Create a new Google Gemini File Search Store
*
* Usage:
* ts-node create-store.ts --name "My Knowledge Base" --project "customer-support"
* node create-store.js --name "My Knowledge Base"
*/
import { GoogleGenAI } from '@google/genai'
interface CreateStoreOptions {
name: string
project?: string
environment?: string
}
async function createFileSearchStore(options: CreateStoreOptions) {
// Validate API key
const apiKey = process.env.GOOGLE_API_KEY
if (!apiKey) {
console.error('❌ Error: GOOGLE_API_KEY environment variable is required')
console.error(' Create an API key at: https://aistudio.google.com/apikey')
process.exit(1)
}
// Initialize client
console.log('Initializing Google Gemini client...')
const ai = new GoogleGenAI({ apiKey })
try {
// Check if store already exists
console.log(`\nChecking for existing store: "${options.name}"...`)
let existingStore = null
let pageToken: string | null = null
do {
const page = await ai.fileSearchStores.list({ pageToken: pageToken || undefined })
existingStore = page.fileSearchStores?.find(
s => s.displayName === options.name
)
pageToken = page.nextPageToken || null
} while (!existingStore && pageToken)
if (existingStore) {
console.log('⚠️ Store already exists:')
console.log(` Name: ${existingStore.name}`)
console.log(` Display Name: ${existingStore.displayName}`)
console.log(` Created: ${existingStore.createTime}`)
console.log('\n Use this store name for uploads and queries.')
return
}
// Create new store
console.log('Creating new file search store...')
const customMetadata: Record<string, string> = {}
if (options.project) {
customMetadata.project = options.project
}
if (options.environment) {
customMetadata.environment = options.environment
}
const fileStore = await ai.fileSearchStores.create({
config: {
displayName: options.name,
...(Object.keys(customMetadata).length > 0 && { customMetadata })
}
})
console.log('\n✅ Store created successfully!')
console.log(` Name: ${fileStore.name}`)
console.log(` Display Name: ${fileStore.displayName}`)
console.log(` Created: ${fileStore.createTime}`)
console.log('\n Use this store name for uploads and queries:')
console.log(` export FILE_SEARCH_STORE="${fileStore.name}"`)
} catch (error) {
console.error('\n❌ Error creating store:', error)
if (error instanceof Error) {
console.error(` ${error.message}`)
}
process.exit(1)
}
}
// Parse command-line arguments
function parseArgs(): CreateStoreOptions {
const args = process.argv.slice(2)
const options: Partial<CreateStoreOptions> = {}
for (let i = 0; i < args.length; i++) {
if (args[i] === '--name' && args[i + 1]) {
options.name = args[i + 1]
i++
} else if (args[i] === '--project' && args[i + 1]) {
options.project = args[i + 1]
i++
} else if (args[i] === '--environment' && args[i + 1]) {
options.environment = args[i + 1]
i++
}
}
if (!options.name) {
console.error('Usage: ts-node create-store.ts --name "Store Name" [--project "project"] [--environment "env"]')
console.error('\nExample:')
console.error(' ts-node create-store.ts --name "Customer Support KB" --project "support" --environment "production"')
process.exit(1)
}
return options as CreateStoreOptions
}
// Main execution
if (require.main === module) {
const options = parseArgs()
createFileSearchStore(options).catch(error => {
console.error('Unexpected error:', error)
process.exit(1)
})
}
export { createFileSearchStore, CreateStoreOptions }

79
templates/README.md Normal file
View File

@@ -0,0 +1,79 @@
# Google Gemini File Search Templates
This directory contains working example projects demonstrating different deployment patterns for Gemini File Search.
## Templates
### 🚧 basic-node-rag/ (TO BE IMPLEMENTED)
Minimal Node.js/TypeScript example for learning and prototyping.
**Features:**
- Simple TypeScript setup
- Create store → Upload documents → Query → Display citations
- Single-file example (~200 lines)
- Perfect for understanding core concepts
**Use When:**
- Learning File Search API
- Quick prototyping
- Building CLI tools
### 🚧 cloudflare-worker-rag/ (TO BE IMPLEMENTED)
Edge deployment with Cloudflare Workers + R2 integration.
**Features:**
- Cloudflare Workers with @cloudflare/vite-plugin
- R2 integration for document storage
- Edge API endpoints (upload, query)
- Hybrid architecture (Gemini File Search + Cloudflare edge)
- Wrangler configuration
**Use When:**
- Building global edge applications
- Integrating with Cloudflare stack (D1, R2, KV)
- Need low-latency worldwide
### 🚧 nextjs-docs-search/ (TO BE IMPLEMENTED)
Full-stack Next.js application with UI.
**Features:**
- Next.js 14+ App Router
- Document upload UI with drag-and-drop
- Real-time search interface
- Citation rendering with source links
- Metadata filtering UI
- Tailwind CSS + shadcn/ui
- TypeScript throughout
**Use When:**
- Building production documentation sites
- Creating knowledge base UIs
- Need full-stack app with frontend
## Structure
Each template includes:
- `README.md` - Setup and deployment instructions
- `package.json` - Dependencies and scripts
- `tsconfig.json` - TypeScript configuration
- `.env.example` - Environment variables template
- `src/` - Source code
- Working example with sample data
## Development Status
**Completed:** 0/3 templates (0%)
**Priority:**
1. basic-node-rag (foundational example)
2. nextjs-docs-search (most practical for users)
3. cloudflare-worker-rag (advanced integration)
## Notes
All templates demonstrate:
- Proper error handling from SKILL.md
- Recommended chunking configurations
- Metadata schema best practices
- Operation polling patterns
- Cost-aware implementations