8.7 KiB
CLI Operations Reference
Complete guide for operating CocoIndex flows using the CLI.
Overview
The CocoIndex CLI (cocoindex command) provides tools for managing and inspecting flows. Most commands require an APP_TARGET argument specifying where flow definitions are located.
Environment Setup
Environment Variables
Create a .env file in the project directory:
# Database connection (required)
COCOINDEX_DATABASE_URL=postgresql://user:password@localhost/cocoindex_db
# Optional: App namespace for organizing flows
COCOINDEX_APP_NAMESPACE=dev
# Optional: Global concurrency limits
COCOINDEX_SOURCE_MAX_INFLIGHT_ROWS=50
COCOINDEX_SOURCE_MAX_INFLIGHT_BYTES=524288000 # 500MB
# Optional: LLM API keys (if using LLM functions)
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
VOYAGE_API_KEY=pa-...
Loading Environment Files
# Default: loads .env from current directory
cocoindex <command> ...
# Specify custom env file
cocoindex --env-file path/to/.env <command> ...
# Specify app directory
cocoindex --app-dir /path/to/project <command> ...
APP_TARGET Format
The APP_TARGET tells the CLI where flow definitions are located:
Python Module
# Load from module name
cocoindex update main
# Load from package module
cocoindex update my_package.flows
Python File
# Load from file path
cocoindex update main.py
# Load from nested file
cocoindex update path/to/flows.py
Specific Flow
# Target specific flow in module
cocoindex update main:MyFlowName
# Target specific flow in file
cocoindex update path/to/flows.py:MyFlowName
Core Commands
setup - Initialize Flow Resources
Create all persistent backends needed by flows (database tables, collections, etc.).
# Setup all flows
cocoindex setup main.py
# Setup specific flow
cocoindex setup main.py:MyFlow
What it does:
- Creates internal storage tables in Postgres
- Creates target resources (database tables, vector collections, graph structures)
- Updates schemas if flow definition changed
- No-op if already set up and no changes needed
When to use:
- First time running a flow
- After modifying flow structure (new fields, new targets)
- After dropping flows to recreate resources
update - Build/Update Target Data
Run transformations and update target data based on current source data.
# One-time update
cocoindex update main.py
# One-time update with setup
cocoindex update --setup main.py
# One-time update specific flow
cocoindex update main.py:TextEmbedding
# Force reexport even if no changes
cocoindex update --reexport main.py
What it does:
- Reads source data
- Applies transformations
- Updates target databases
- Uses incremental processing (only processes changed data)
Options:
--setup- Run setup first if needed--reexport- Reexport all data even if unchanged (useful after data loss)
update -L - Live Update Mode
Continuously monitor source changes and update targets.
# Live update mode
cocoindex update main.py -L
# Live update with setup
cocoindex update --setup main.py -L
# Live update with reexport on initial update
cocoindex update --reexport main.py -L
What it does:
- Performs initial one-time update
- Continuously monitors source changes
- Automatically processes updates
- Runs until aborted (Ctrl-C)
Requires:
- At least one source with change capture enabled:
refresh_intervalparameter on source- Source-specific change capture (Postgres notifications, S3 events, etc.)
Example with refresh interval:
data_scope["documents"] = flow_builder.add_source(
cocoindex.sources.LocalFile(path="documents"),
refresh_interval=datetime.timedelta(minutes=1) # Check every minute
)
drop - Remove Flow Resources
Remove all persistent backends owned by flows.
# Drop all flows
cocoindex drop main.py
# Drop specific flow
cocoindex drop main.py:MyFlow
What it does:
- Drops internal storage tables
- Drops target resources (tables, collections, graphs)
- Cleans up all persistent data
Warning: This is destructive and cannot be undone!
show - Inspect Flow Definition
Display flow structure and statistics.
# Show flow structure
cocoindex show main.py:MyFlow
# Show all flows
cocoindex show main.py
What it shows:
- Flow name and structure
- Sources configured
- Transformations defined
- Targets and their schemas
- Current statistics (if flow is set up)
evaluate - Test Flow Without Updating
Run transformations and dump results to files without updating targets.
# Evaluate flow
cocoindex evaluate main.py:MyFlow
# Specify output directory
cocoindex evaluate main.py:MyFlow --output-dir ./eval_results
# Disable cache
cocoindex evaluate main.py:MyFlow --no-cache
What it does:
- Runs transformations
- Saves results to files (JSON, CSV, etc.)
- Does NOT update targets
- Uses existing cache by default
When to use:
- Testing flow logic before running full update
- Debugging transformation issues
- Inspecting intermediate data
- Validating output format
Options:
--output-dir PATH- Directory for output files (default:eval_{flow_name}_{timestamp})--no-cache- Disable reading from cache (still doesn't write to cache)
Complete Workflow Examples
First-Time Setup and Indexing
# 1. Setup flow resources
cocoindex setup main.py
# 2. Run initial indexing
cocoindex update main.py
# 3. Verify results
cocoindex show main.py
Development Workflow
# 1. Test with evaluate (no side effects)
cocoindex evaluate main.py:MyFlow --output-dir ./test_output
# 2. If looks good, setup and update
cocoindex update --setup main.py:MyFlow
# 3. Check results
cocoindex show main.py:MyFlow
Production Live Updates
# Run with live updates and auto-setup
cocoindex update --setup main.py -L
Rebuild After Changes
# Drop old resources
cocoindex drop main.py
# Setup with new definition
cocoindex setup main.py
# Reindex everything
cocoindex update --reexport main.py
Multiple Flows
# Setup all flows
cocoindex setup main.py
# Update specific flows
cocoindex update main.py:CodeEmbedding
cocoindex update main.py:DocumentEmbedding
# Show all flows
cocoindex show main.py
Common Issues and Solutions
Issue: "Flow not found"
Problem: CLI can't find the flow definition.
Solutions:
# Make sure APP_TARGET is correct
cocoindex show main.py # Should list flows
# Use --app-dir if not in project root
cocoindex --app-dir /path/to/project show main.py
# Check flow name is correct
cocoindex show main.py:CorrectFlowName
Issue: "Database connection failed"
Problem: Can't connect to Postgres.
Solutions:
# Check .env file exists
cat .env | grep COCOINDEX_DATABASE_URL
# Test connection
psql $COCOINDEX_DATABASE_URL
# Use --env-file if .env is elsewhere
cocoindex --env-file /path/to/.env update main.py
Issue: "Schema mismatch"
Problem: Flow definition changed but resources not updated.
Solution:
# Re-run setup to update schemas
cocoindex setup main.py
# Then update data
cocoindex update main.py
Issue: "Live update exits immediately"
Problem: No change capture mechanisms enabled.
Solution: Add refresh_interval or use source-specific change capture:
data_scope["docs"] = flow_builder.add_source(
cocoindex.sources.LocalFile(path="docs"),
refresh_interval=datetime.timedelta(seconds=30) # Add this
)
Advanced Options
Global Options
# Show version
cocoindex --version
# Show help
cocoindex --help
cocoindex update --help
# Specify app directory
cocoindex --app-dir /custom/path update main
# Custom env file
cocoindex --env-file prod.env update main
Performance Tuning
Set environment variables for concurrency:
# In .env file
COCOINDEX_SOURCE_MAX_INFLIGHT_ROWS=100
COCOINDEX_SOURCE_MAX_INFLIGHT_BYTES=1073741824 # 1GB
Or per-source in code:
data_scope["docs"] = flow_builder.add_source(
cocoindex.sources.LocalFile(path="docs"),
max_inflight_rows=50,
max_inflight_bytes=500*1024*1024 # 500MB
)
Best Practices
- Use evaluate before update - Test flow logic without side effects
- Always setup before first update - Or use
--setupflag - Use live updates in production - Keeps targets always fresh
- Set app namespace - Organize flows across environments (dev/staging/prod)
- Monitor with show - Regularly check flow statistics
- Version control .env.example - Document required environment variables
- Use specific flow targets - For selective updates:
main.py:FlowName - Setup after definition changes - Ensures schemas match flow definition