Initial commit

This commit is contained in:
Zhongwei Li
2025-11-29 18:23:15 +08:00
commit 962cc0c926
9 changed files with 1076 additions and 0 deletions

View File

@@ -0,0 +1,14 @@
{
"name": "gemini-file-search-tool",
"description": "CLI tool for managing Gemini File Search stores with fully managed RAG",
"version": "0.1.0",
"author": {
"name": "Dennis Vriend"
},
"skills": [
"./skills"
],
"commands": [
"./commands"
]
}

3
README.md Normal file
View File

@@ -0,0 +1,3 @@
# gemini-file-search-tool
CLI tool for managing Gemini File Search stores with fully managed RAG

34
commands/create-store.md Normal file
View File

@@ -0,0 +1,34 @@
---
description: Create a new Gemini File Search store
argument-hint: store-name
---
Create a new Gemini File Search store for document storage and RAG queries.
## Usage
```bash
gemini-file-search-tool create-store "STORE_NAME" [--display-name NAME]
```
## Arguments
- `STORE_NAME`: Store identifier (required)
- `--display-name NAME`: Human-readable display name (optional)
## Examples
```bash
# Create store with auto-generated display name
gemini-file-search-tool create-store "research-papers"
# Create store with custom display name
gemini-file-search-tool create-store "docs" --display-name "Project Documentation"
# Capture output as JSON
gemini-file-search-tool create-store "code" | jq '.name'
```
## Output
Returns JSON with store name, display_name, create_time, and update_time.

33
commands/list-stores.md Normal file
View File

@@ -0,0 +1,33 @@
---
description: List all available File Search stores
argument-hint: (no arguments)
---
List all Gemini File Search stores with their metadata.
## Usage
```bash
gemini-file-search-tool list-stores [-v]
```
## Arguments
- `-v`: Verbose logging (optional)
## Examples
```bash
# List all stores
gemini-file-search-tool list-stores
# List with verbose logging
gemini-file-search-tool list-stores -v
# Filter with jq
gemini-file-search-tool list-stores | jq '.[] | select(.display_name | contains("docs"))'
```
## Output
Returns JSON array of stores with name, display_name, create_time, and update_time.

40
commands/query.md Normal file
View File

@@ -0,0 +1,40 @@
---
description: Query store with natural language RAG
argument-hint: prompt --store name
---
Query a Gemini File Search store with natural language using RAG (Retrieval-Augmented Generation).
## Usage
```bash
gemini-file-search-tool query "PROMPT" --store "STORE_NAME" [OPTIONS]
```
## Arguments
- `PROMPT`: Natural language query (required, positional)
- `--store NAME`: Store to query (required)
- `--query-model {flash|pro}`: Query model (default: flash)
- `--query-grounding-metadata`: Include source citations
- `--show-cost`: Show token usage and cost
- `-v/-vv/-vvv`: Verbosity levels
## Examples
```bash
# Basic query
gemini-file-search-tool query "What is DORA?" --store "papers"
# Query with citations and cost tracking
gemini-file-search-tool query "How does authentication work?" \
--store "codebase" --query-grounding-metadata --show-cost -v
# Query with Pro model
gemini-file-search-tool query "Explain the architecture" \
--store "docs" --query-model pro
```
## Output
Returns JSON with response_text, usage_metadata, grounding_metadata (if requested), and estimated_cost (if requested).

36
commands/sync-cache.md Normal file
View File

@@ -0,0 +1,36 @@
---
description: Sync pending upload operations to cache
argument-hint: --store name
---
Synchronize pending upload operations and update cache with final status.
## Usage
```bash
gemini-file-search-tool sync-cache --store "STORE_NAME" [OPTIONS]
```
## Arguments
- `--store NAME`: Store name (required)
- `--num-workers N`: Parallel workers (default: 4)
- `--text`: Human-readable output (default: JSON)
- `-v`: Verbose logging
## Examples
```bash
# Sync with default workers
gemini-file-search-tool sync-cache --store "papers"
# Sync with 8 parallel workers
gemini-file-search-tool sync-cache --store "codebase" --num-workers 8 -v
# Human-readable output
gemini-file-search-tool sync-cache --store "docs" --text
```
## Output
Returns JSON with synced/failed/pending counts and operation details (or text format with --text).

41
commands/upload.md Normal file
View File

@@ -0,0 +1,41 @@
---
description: Upload documents to store with caching
argument-hint: files... --store name
---
Upload files to a Gemini File Search store with intelligent caching and glob support.
## Usage
```bash
gemini-file-search-tool upload FILES... --store "STORE_NAME" [OPTIONS]
```
## Arguments
- `FILES`: File paths or glob patterns (required)
- `--store NAME`: Target store name (required)
- `--num-workers N`: Concurrent workers (default: CPU cores)
- `--no-wait`: Async upload without polling
- `--rebuild-cache`: Force re-upload all files
- `-v/-vv/-vvv`: Verbosity levels
## Examples
```bash
# Upload single file
gemini-file-search-tool upload document.pdf --store "papers"
# Upload with glob pattern
gemini-file-search-tool upload "docs/**/*.md" --store "documentation" -v
# Upload codebase for Code-RAG
gemini-file-search-tool upload "src/**/*.py" --store "my-codebase" -v
# Async upload with 8 workers
gemini-file-search-tool upload "*.pdf" --store "papers" --no-wait --num-workers 8
```
## Output
Returns JSON array with upload status for each file (completed, skipped, pending, failed).

65
plugin.lock.json Normal file
View File

@@ -0,0 +1,65 @@
{
"$schema": "internal://schemas/plugin.lock.v1.json",
"pluginId": "gh:dnvriend/gemini-file-search-tool:plugins/gemini-file-search-tool",
"normalized": {
"repo": null,
"ref": "refs/tags/v20251128.0",
"commit": "775738ea02c7b4a60ddf2ab33ef233908f8f648d",
"treeHash": "59b75c57d78a7ea6702adb1afcf39c9966f7af7a34f5d24c932b598383bc7769",
"generatedAt": "2025-11-28T10:16:35.283387Z",
"toolVersion": "publish_plugins.py@0.2.0"
},
"origin": {
"remote": "git@github.com:zhongweili/42plugin-data.git",
"branch": "master",
"commit": "aa1497ed0949fd50e99e70d6324a29c5b34f9390",
"repoRoot": "/Users/zhongweili/projects/openmind/42plugin-data"
},
"manifest": {
"name": "gemini-file-search-tool",
"description": "CLI tool for managing Gemini File Search stores with fully managed RAG",
"version": "0.1.0"
},
"content": {
"files": [
{
"path": "README.md",
"sha256": "f318b83982a29ac6c54ea8f56381725c1735d167c75d8e9ab403ff5b8eb2fa66"
},
{
"path": ".claude-plugin/plugin.json",
"sha256": "82968a67ed6150aab8651e6213ebf56e2056990a35fcf4cf0b67d87ada345aa5"
},
{
"path": "commands/sync-cache.md",
"sha256": "4c43c73ba90fbe68fb45fe71cc19f4c8a17fd44a997699c705e596f14b11b367"
},
{
"path": "commands/create-store.md",
"sha256": "fc7a0d901ca119c7528294a8b2f38286f52bc08929e1dfbddd864c0781fa602e"
},
{
"path": "commands/list-stores.md",
"sha256": "1c996a856eee537c3d840c63466edf28f6a7b07d96252c8d81dc8c0830c574b0"
},
{
"path": "commands/upload.md",
"sha256": "bdd85a995d35e8e7e4d9c6df9498c4cd0b6de62281916102ee00f6d46c5e738d"
},
{
"path": "commands/query.md",
"sha256": "4abc12c49191999b365f5b8d1e9757c7227f6b674abdadd8654118f6a0bd44b7"
},
{
"path": "skills/gemini-file-search-tool/SKILL.md",
"sha256": "16704e282d444753294dab1b503242d7558a48c8d6902abcd3f0d16b4b8e8e39"
}
],
"dirSha256": "59b75c57d78a7ea6702adb1afcf39c9966f7af7a34f5d24c932b598383bc7769"
},
"security": {
"scannedAt": null,
"scannerVersion": null,
"flags": []
}
}

View File

@@ -0,0 +1,810 @@
---
name: skill-gemini-file-search-tool
description: Manage Gemini RAG stores with Code-RAG support
---
# When to use
- Managing Gemini File Search stores and documents
- Uploading documents for RAG queries (including codebases)
- Querying stores with natural language
- Building Code-RAG systems for semantic code search
# Gemini File Search Tool Skill
## Purpose
Comprehensive guide for managing Google's fully managed RAG (Retrieval-Augmented Generation) system using the `gemini-file-search-tool` CLI. This tool eliminates the complexity of vector databases, embeddings, and retrieval infrastructure by providing a production-ready interface to Gemini File Search.
## When to Use This Skill
**Use this skill when:**
- Creating and managing Gemini File Search stores
- Uploading documents for semantic search and RAG queries
- Building Code-RAG systems (semantic code search with natural language)
- Querying document stores with natural language
- Managing upload caches and operation status
- Implementing document search in AI applications
**Do NOT use this skill for:**
- General document processing (use document-skills instead)
- File system operations (use Read/Write tools)
- Cloud infrastructure management (use AWS/GCP CLIs)
## CLI Tool: gemini-file-search-tool
Production-ready CLI and Python library for Google's Gemini File Search API, a fully managed RAG system that automatically handles document ingestion, chunking, embedding generation, and semantic retrieval with zero infrastructure overhead.
### Installation
```bash
# Clone repository
git clone https://github.com/dnvriend/gemini-file-search-tool.git
cd gemini-file-search-tool
# Install globally with uv
uv tool install .
# Verify installation
gemini-file-search-tool --help
```
### Prerequisites
**Authentication (Choose one):**
**Option 1: Gemini Developer API (Recommended for development)**
```bash
export GEMINI_API_KEY="your-api-key-here"
# Or
export GOOGLE_API_KEY="your-api-key-here"
```
Get API key from: https://aistudio.google.com/apikey
**Option 2: Vertex AI (For production)**
```bash
export GOOGLE_GENAI_USE_VERTEXAI=true
export GOOGLE_CLOUD_PROJECT="your-project-id"
export GOOGLE_CLOUD_LOCATION="us-central1"
```
### Quick Start
```bash
# Create a store
gemini-file-search-tool create-store "my-docs"
# Upload documents
gemini-file-search-tool upload "*.pdf" --store "my-docs" -v
# Query with natural language
gemini-file-search-tool query "What is the main topic?" --store "my-docs" --show-cost
```
## Progressive Disclosure
<details>
<summary><strong>📖 Store Management Commands (Click to expand)</strong></summary>
### create-store - Create New Store
Create a new Gemini File Search store for document storage and RAG queries.
**Usage:**
```bash
gemini-file-search-tool create-store "STORE_NAME" [--display-name NAME] [-v]
```
**Arguments:**
- `STORE_NAME`: Store identifier (required, positional)
- `--display-name NAME`: Human-readable display name (optional)
- `-v`: Verbose logging
**Examples:**
```bash
# Create store with auto-generated display name
gemini-file-search-tool create-store "research-papers"
# Create store with custom display name
gemini-file-search-tool create-store "docs" --display-name "Project Documentation"
# Capture output for processing
gemini-file-search-tool create-store "code" | jq '.name'
```
**Output:**
```json
{
"name": "fileSearchStores/abc123",
"display_name": "research-papers",
"create_time": "2025-11-20T10:30:00Z",
"update_time": "2025-11-20T10:30:00Z"
}
```
---
### list-stores - List All Stores
List all available Gemini File Search stores with their metadata.
**Usage:**
```bash
gemini-file-search-tool list-stores [-v]
```
**Arguments:**
- `-v`: Verbose logging
**Examples:**
```bash
# List all stores
gemini-file-search-tool list-stores
# List with verbose logging
gemini-file-search-tool list-stores -v
# Filter stores with jq
gemini-file-search-tool list-stores | jq '.[] | select(.display_name | contains("docs"))'
# Count stores
gemini-file-search-tool list-stores | jq 'length'
```
**Output:**
```json
[
{
"name": "fileSearchStores/abc123",
"display_name": "research-papers",
"create_time": "2025-11-20T10:30:00Z",
"update_time": "2025-11-20T10:30:00Z"
}
]
```
---
### get-store - Get Store Details
Get detailed information about a specific store.
**Usage:**
```bash
gemini-file-search-tool get-store "STORE_NAME" [-v]
```
**Arguments:**
- `STORE_NAME`: Store name/ID (accepts display names, IDs, or full resource names)
- `-v`: Verbose logging
**Examples:**
```bash
# Get by display name
gemini-file-search-tool get-store "research-papers"
# Get by ID
gemini-file-search-tool get-store "abc123"
# Extract specific field
gemini-file-search-tool get-store "docs" | jq '.create_time'
```
---
### update-store - Update Store
Update a store's display name or metadata.
**Usage:**
```bash
gemini-file-search-tool update-store "STORE_NAME" --display-name "NEW_NAME" [-v]
```
**Examples:**
```bash
# Rename store
gemini-file-search-tool update-store "docs" --display-name "Production Documentation"
```
---
### delete-store - Delete Store
Delete a Gemini File Search store and all its documents.
**Usage:**
```bash
gemini-file-search-tool delete-store "STORE_NAME" [--force] [-v]
```
**Arguments:**
- `STORE_NAME`: Store to delete (required)
- `--force`: Skip confirmation prompt
- `-v`: Verbose logging
**Examples:**
```bash
# Delete with confirmation
gemini-file-search-tool delete-store "old-docs"
# Delete without confirmation
gemini-file-search-tool delete-store "temp-store" --force
```
**Note:** Shows cache statistics before deletion and automatically removes cache file after successful deletion.
</details>
<details>
<summary><strong>📁 Document Management Commands (Click to expand)</strong></summary>
### upload - Upload Documents
Upload files to a Gemini File Search store with intelligent caching, glob support, and parallel processing.
**Usage:**
```bash
gemini-file-search-tool upload FILES... --store "STORE_NAME" [OPTIONS]
```
**Arguments:**
- `FILES`: File paths or glob patterns (required, positional)
- `--store NAME`: Target store name (required)
- `--title TEXT`: Custom metadata title (optional)
- `--url URL`: Custom metadata URL (optional)
- `--max-tokens N`: Max tokens per chunk (default: 200)
- `--max-overlap N`: Max overlap tokens (default: 20)
- `--num-workers N`: Concurrent workers (default: CPU cores)
- `--skip-validation`: Skip file validation checks
- `--ignore-gitignore`: Ignore .gitignore patterns
- `--dry-run`: Preview files without uploading
- `--rebuild-cache`: Force re-upload all files
- `--no-wait`: Async upload without polling
- `-v/-vv/-vvv`: Verbosity (INFO/DEBUG/TRACE)
**Key Features:**
- **Intelligent Caching**: Automatically skips unchanged files using mtime-based optimization (O(1) check)
- **Glob Patterns**: Supports `*.pdf`, `docs/**/*.md`, `src/**/*.py`
- **Gitignore Support**: Respects `.gitignore` patterns by default
- **Parallel Processing**: Concurrent uploads with configurable workers
- **System File Filtering**: Auto-skips `__pycache__`, `.pyc`, `.DS_Store`
- **Async Mode**: Fire-and-forget uploads with `--no-wait`
**Examples:**
```bash
# Upload single file
gemini-file-search-tool upload document.pdf --store "papers"
# Upload multiple files
gemini-file-search-tool upload doc1.pdf doc2.pdf --store "papers"
# Upload with glob pattern
gemini-file-search-tool upload "*.pdf" --store "papers" -v
# Upload recursive with markdown files
gemini-file-search-tool upload "docs/**/*.md" --store "documentation" -v
# Upload codebase for Code-RAG
gemini-file-search-tool upload "src/**/*.py" --store "my-codebase" -v
# Async upload with 8 workers
gemini-file-search-tool upload "*.pdf" --store "papers" --no-wait --num-workers 8
# Dry-run to preview files
gemini-file-search-tool upload "**/*.py" --store "code" --dry-run -v
# Rebuild cache (force re-upload)
gemini-file-search-tool upload "**/*.py" --store "code" --rebuild-cache
```
**Output:**
```json
[
{"file": "doc1.pdf", "status": "completed", "document": {"name": "documents/123"}},
{"file": "doc2.pdf", "status": "skipped", "reason": "Already exists"},
{"file": "doc3.pdf", "status": "pending", "operation": "operations/456"}
]
```
---
### list-documents - List Documents in Store
List all documents in a Gemini File Search store.
**Usage:**
```bash
gemini-file-search-tool list-documents --store "STORE_NAME" [-v/-vv/-vvv]
```
**Arguments:**
- `--store NAME`: Store name (required)
- `-v/-vv/-vvv`: Verbosity levels
**Examples:**
```bash
# List all documents
gemini-file-search-tool list-documents --store "papers"
# With verbose logging
gemini-file-search-tool list-documents --store "papers" -vv
# Count documents
gemini-file-search-tool list-documents --store "papers" | jq 'length'
```
**Note:** Uses REST API workaround due to SDK bug #1661. Only works with Developer API (not Vertex AI).
</details>
<details>
<summary><strong>🔍 Query Commands (Click to expand)</strong></summary>
### query - Natural Language RAG Queries
Query a Gemini File Search store with natural language using Retrieval-Augmented Generation.
**Usage:**
```bash
gemini-file-search-tool query "PROMPT" --store "STORE_NAME" [OPTIONS]
```
**Arguments:**
- `PROMPT`: Natural language query (required, positional)
- `--store NAME`: Store to query (required)
- `--query-model {flash|pro}`: Model for RAG query (default: flash)
- `--query-grounding-metadata`: Include source citations in response
- `--metadata-filter "key=value"`: Filter documents by metadata
- `--show-cost`: Include token usage and cost estimation
- `-v/-vv/-vvv`: Verbosity (INFO/DEBUG/TRACE)
**Enhancement Options (Use Sparingly):**
- `--enhance-mode {generic|code-rag|obsidian}`: Query optimization mode (disabled by default)
- `--enhancement-model {flash|pro}`: Model for enhancement (default: flash)
- `--show-enhancement`: Display enhanced query
- `--dry-run-enhancement`: Preview enhancement without executing query
**Examples:**
```bash
# Basic query
gemini-file-search-tool query "What is DORA?" --store "papers"
# Query with citations and cost tracking
gemini-file-search-tool query "How does authentication work?" \
--store "codebase" --query-grounding-metadata --show-cost -v
# Query with Pro model
gemini-file-search-tool query "Explain the architecture" \
--store "docs" --query-model pro
# Code-RAG query (semantic code search)
gemini-file-search-tool query "Where is error handling implemented?" \
--store "my-codebase" --query-grounding-metadata
# With metadata filtering
gemini-file-search-tool query "Python best practices" \
--store "docs" --metadata-filter "language=python"
```
**Output:**
```json
{
"response_text": "DORA stands for DevOps Research and Assessment...",
"usage_metadata": {
"prompt_token_count": 150,
"candidates_token_count": 320,
"total_token_count": 470
},
"grounding_metadata": {
"grounding_chunks": [
{
"retrieved_context": {
"title": "/path/to/doc.pdf",
"text": "Relevant excerpt..."
}
}
]
},
"estimated_cost": {
"total_cost_usd": 0.00010725,
"model": "gemini-2.5-flash"
}
}
```
**Important Notes:**
- **Enhancement is disabled by default**: Benchmarks show simple queries work best for RAG
- **Flash model recommended**: 12x more cost-efficient than Pro, adequate for RAG
- Only use `--enhance-mode` for vague or poorly worded queries
</details>
<details>
<summary><strong>💾 Cache Management Commands (Click to expand)</strong></summary>
### sync-cache - Sync Pending Operations
Synchronize pending upload operations and update cache with final status.
**Usage:**
```bash
gemini-file-search-tool sync-cache --store "STORE_NAME" [OPTIONS]
```
**Arguments:**
- `--store NAME`: Store name (required)
- `--num-workers N`: Parallel workers (default: 4)
- `--text`: Human-readable output (default: JSON)
- `-v`: Verbose logging
**Examples:**
```bash
# Sync with default workers
gemini-file-search-tool sync-cache --store "papers"
# Sync with 8 parallel workers
gemini-file-search-tool sync-cache --store "codebase" --num-workers 8 -v
# Human-readable output
gemini-file-search-tool sync-cache --store "docs" --text
```
**When to Use:**
- After `upload --no-wait` operations
- To check status of pending uploads
- To update cache with final document IDs
---
### flush-cache - Clear Cache File
Delete the cache file for a specific store to start fresh.
**Usage:**
```bash
gemini-file-search-tool flush-cache --store "STORE_NAME" [--force] [-v]
```
**Arguments:**
- `--store NAME`: Store name (required)
- `--force`: Skip confirmation prompt
- `-v`: Verbose logging
**Examples:**
```bash
# Flush with confirmation
gemini-file-search-tool flush-cache --store "old-docs"
# Flush without confirmation
gemini-file-search-tool flush-cache --store "temp" --force
```
**When to Use:**
- Before rebuilding cache
- When cache is corrupted
- For clean slate troubleshooting
---
### cache-report - Generate Cache Status Report
Show cache statistics and operation status.
**Usage:**
```bash
gemini-file-search-tool cache-report --store "STORE_NAME" [FILTERS] [OPTIONS]
```
**Arguments:**
- `--store NAME`: Store name (required)
- `--pending-only`: Show only pending operations
- `--errors-only`: Show only failed operations
- `--completed-only`: Show only completed operations
- `--all`: Show all cached files
- `--text`: Human-readable output (default: JSON)
- `-v`: Verbose logging
**Examples:**
```bash
# Show summary + pending operations
gemini-file-search-tool cache-report --store "papers"
# Show only failed operations
gemini-file-search-tool cache-report --store "docs" --errors-only
# Show all cached files
gemini-file-search-tool cache-report --store "code" --all --text
```
</details>
<details>
<summary><strong>💻 Code-RAG: Semantic Code Search (Click to expand)</strong></summary>
### What is Code-RAG?
Code-RAG (Retrieval-Augmented Generation for Code) enables uploading entire codebases and querying them with natural language. This is powerful for:
- **Codebase Onboarding**: New developers ask questions about architecture
- **Code Discovery**: Find implementations without grepping
- **Architecture Analysis**: Understand design patterns and structure
- **Documentation Generation**: Generate contextual docs from code
- **AI Coding Assistants**: Build agents that answer codebase questions
### Code-RAG Workflow
**Step 1: Upload Codebase**
```bash
# Upload all Python files
gemini-file-search-tool upload "src/**/*.py" --store "my-project" -v
# Upload multiple languages
gemini-file-search-tool upload "src/**/*.{py,js,go}" --store "polyglot-project" -v
# Upload with custom chunking
gemini-file-search-tool upload "**/*.py" --store "large-project" \
--max-tokens 500 --max-overlap 50 --num-workers 8
```
**Step 2: Query with Natural Language**
```bash
# Architectural questions
gemini-file-search-tool query "How does the authentication system work?" \
--store "my-project" --query-grounding-metadata -v
# Implementation discovery
gemini-file-search-tool query "Where is error handling for API calls implemented?" \
--store "my-project" --show-cost
# Design pattern analysis
gemini-file-search-tool query "What design patterns are used in this codebase?" \
--store "my-project" --query-model pro
```
**Step 3: Get Source References**
```bash
# Query with full citations
gemini-file-search-tool query "Explain the database layer" \
--store "my-project" --query-grounding-metadata | \
jq '.grounding_metadata.grounding_chunks[].retrieved_context.title'
```
### Code-RAG Best Practices
1. **Comprehensive Docstrings**: Well-documented code provides better semantic search results
2. **Modular Structure**: Clear file/function organization improves retrieval accuracy
3. **Use Flash Model**: Cost-effective and sufficient for code search
4. **Enable Grounding**: Always use `--query-grounding-metadata` to verify sources
5. **Gitignore Support**: Automatically skips `__pycache__`, build artifacts, etc.
### Meta Note
This tool itself was built using Code-RAG! During development, we uploaded the codebase to a Gemini File Search store and queried it to understand implementation details, find bugs, and plan features. The tool enables the very functionality it provides.
</details>
<details>
<summary><strong>⚙️ Advanced Features (Click to expand)</strong></summary>
### Intelligent Caching System
**Location:** `~/.config/gemini-file-search-tool/stores/`
**How it Works:**
- Tracks uploaded files per store in JSON cache files
- Uses mtime-based optimization (O(1) check before O(n) hash)
- For 1000 unchanged files: ~0.1s (mtime) vs ~5-10s (full hash)
- Automatically skips unchanged files on re-upload
- Per-store isolation prevents cache conflicts
**Cache Structure:**
```json
{
"/absolute/path/to/file.py": {
"hash": "sha256-hash",
"mtime": 1731969000.0,
"remote_id": "documents/123",
"last_uploaded": "2025-11-18T22:30:00Z"
}
}
```
**Cache States:**
- **Completed**: Has `remote_id`, file successfully uploaded
- **Pending**: Has `operation`, async upload in progress
- **Missing**: Not in cache, needs upload
---
### Parallel Processing
**Upload Concurrency:**
```bash
# Default: Uses CPU core count
gemini-file-search-tool upload "*.pdf" --store "papers"
# Custom workers
gemini-file-search-tool upload "*.pdf" --store "papers" --num-workers 8
```
**Sync Concurrency:**
```bash
# Parallel operation status checking
gemini-file-search-tool sync-cache --store "papers" --num-workers 8
```
---
### Cost Tracking
**Query Cost Estimation:**
```bash
gemini-file-search-tool query "What is this?" --store "docs" --show-cost
```
**Output:**
```json
{
"estimated_cost": {
"input_cost_usd": 0.00001125,
"output_cost_usd": 0.000096,
"total_cost_usd": 0.00010725,
"model": "gemini-2.5-flash"
}
}
```
**Current Pricing (2025-01):**
- **Flash**: $0.075 input / $0.30 output per 1M tokens
- **Pro**: $1.25 input / $5.00 output per 1M tokens
---
### Authentication Methods
**Developer API (Recommended for Development):**
```bash
export GEMINI_API_KEY="AIza..."
gemini-file-search-tool list-stores
```
**Vertex AI (Recommended for Production):**
```bash
export GOOGLE_GENAI_USE_VERTEXAI=true
export GOOGLE_CLOUD_PROJECT="my-project"
export GOOGLE_CLOUD_LOCATION="us-central1"
gemini-file-search-tool list-stores
```
</details>
<details>
<summary><strong>🔧 Troubleshooting (Click to expand)</strong></summary>
### Common Issues
**Issue: Authentication error - missing API key**
```bash
Error: GEMINI_API_KEY or GOOGLE_API_KEY environment variable is required
```
**Solution:**
1. Get API key from https://aistudio.google.com/apikey
2. Set environment variable:
```bash
export GEMINI_API_KEY="your-api-key"
```
---
**Issue: File not uploading**
**Solution:**
1. Check if file is in cache and unchanged:
```bash
gemini-file-search-tool cache-report --store "your-store" | grep "your-file"
```
2. Force re-upload with `--rebuild-cache`:
```bash
gemini-file-search-tool upload "file.pdf" --store "your-store" --rebuild-cache
```
---
**Issue: Pending operations not completing**
**Solution:**
Run sync-cache to check operation status:
```bash
gemini-file-search-tool sync-cache --store "your-store" -v
```
---
**Issue: Store not found**
**Solution:**
1. List all stores:
```bash
gemini-file-search-tool list-stores
```
2. Use exact display name or ID from list output
---
**Issue: Query returning no results**
**Solution:**
1. Verify documents were uploaded successfully:
```bash
gemini-file-search-tool list-documents --store "your-store"
```
2. Check if uploads are pending:
```bash
gemini-file-search-tool cache-report --store "your-store" --pending-only
```
3. Try simpler query (avoid over-specification)
---
### Getting Help
```bash
# General help
gemini-file-search-tool --help
# Command-specific help
gemini-file-search-tool upload --help
gemini-file-search-tool query --help
# Verbose logging for debugging
gemini-file-search-tool upload "file.pdf" --store "test" -vvv
```
---
### Known Issues
**SDK Bug #1661 - Document Listing**
- `list-documents` uses REST API workaround
- Only works with Developer API (not Vertex AI)
- GitHub: https://github.com/googleapis/python-genai/issues/1661
</details>
## Exit Codes
- `0`: Success
- `1`: Error (authentication, validation, API error)
## Output Formats
**JSON (Default):**
All commands output JSON to stdout, logs to stderr for easy piping:
```bash
gemini-file-search-tool list-stores | jq '.[] | .display_name'
```
**Text (Select Commands):**
Some commands support `--text` flag for human-readable output:
```bash
gemini-file-search-tool cache-report --store "docs" --text
```
## Best Practices
1. **Use Caching**: Let the tool manage duplicates automatically, don't manually track uploads
2. **Flash Model for RAG**: 12x more cost-efficient than Pro, perfectly adequate for semantic search
3. **Enable Grounding**: Always use `--query-grounding-metadata` to verify sources and validate answers
4. **Simple Queries**: Keep queries simple and direct, avoid over-specification (RAG relies on semantic similarity)
5. **Code-RAG Documentation**: Comprehensive docstrings maximize retrieval quality for code search
6. **Parallel Processing**: Use `--num-workers` to speed up large uploads
7. **Async + Sync**: Use `--no-wait` for bulk uploads, then `sync-cache` to check final status
## Resources
- **GitHub**: https://github.com/dnvriend/gemini-file-search-tool
- **Gemini File Search API**: https://ai.google.dev/gemini-api/docs/file-search
- **Google AI Studio**: https://aistudio.google.com/
- **API Keys**: https://aistudio.google.com/apikey
- **Pricing**: https://ai.google.dev/pricing