Initial commit
skills/cocoindex/references/cli_operations.md
# CLI Operations Reference

Complete guide for operating CocoIndex flows using the CLI.

## Overview

The CocoIndex CLI (`cocoindex` command) provides tools for managing and inspecting flows. Most commands require an `APP_TARGET` argument specifying where flow definitions are located.

## Environment Setup

### Environment Variables

Create a `.env` file in the project directory:

```bash
# Database connection (required)
COCOINDEX_DATABASE_URL=postgresql://user:password@localhost/cocoindex_db

# Optional: App namespace for organizing flows
COCOINDEX_APP_NAMESPACE=dev

# Optional: Global concurrency limits
COCOINDEX_SOURCE_MAX_INFLIGHT_ROWS=50
COCOINDEX_SOURCE_MAX_INFLIGHT_BYTES=524288000  # 500MB

# Optional: LLM API keys (if using LLM functions)
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
VOYAGE_API_KEY=pa-...
```

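Before running any command, it can help to confirm that the `.env` file defines the required variable. The following is a minimal sketch of such a check, not part of CocoIndex: `parse_env_text` and `missing_required` are hypothetical helpers, and the parser only handles plain `KEY=VALUE` lines with optional trailing comments.

```python
REQUIRED_KEYS = ("COCOINDEX_DATABASE_URL",)  # the only key marked "required" above


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and full-line comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment such as "524288000  # 500MB".
        value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


def missing_required(text: str) -> list[str]:
    """Return the required keys that are absent or empty."""
    values = parse_env_text(text)
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

A typical use would be `missing_required(pathlib.Path(".env").read_text())`, which returns an empty list when the file is usable.
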
### Loading Environment Files

```bash
# Default: loads .env from current directory
cocoindex <command> ...

# Specify custom env file
cocoindex --env-file path/to/.env <command> ...

# Specify app directory
cocoindex --app-dir /path/to/project <command> ...
```

## APP_TARGET Format

The `APP_TARGET` tells the CLI where flow definitions are located. It can be a Python module name, a path to a Python file, or either of those followed by `:FlowName` to select a single flow.

### Python Module
```bash
# Load from module name
cocoindex update main

# Load from package module
cocoindex update my_package.flows
```

### Python File
```bash
# Load from file path
cocoindex update main.py

# Load from nested file
cocoindex update path/to/flows.py
```

### Specific Flow
```bash
# Target specific flow in module
cocoindex update main:MyFlowName

# Target specific flow in file
cocoindex update path/to/flows.py:MyFlowName
```

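To make the three `APP_TARGET` shapes concrete, here is a small sketch that splits a target string into its parts. It illustrates the format described above and is not how the CLI itself parses targets: `AppTarget` and `parse_app_target` are hypothetical names, the `.py` suffix check is a heuristic, and Windows drive letters (`C:\...`) are not handled.

```python
from typing import NamedTuple, Optional


class AppTarget(NamedTuple):
    location: str              # module name or file path
    is_file: bool              # True when the location looks like a .py file
    flow_name: Optional[str]   # None means "all flows in the location"


def parse_app_target(target: str) -> AppTarget:
    """Split 'location[:FlowName]' into its components."""
    location, sep, flow_name = target.rpartition(":")
    if not sep:
        # No colon at all: the whole string is the location.
        location, flow_name = target, None
    if not location or flow_name == "":
        raise ValueError(f"malformed APP_TARGET: {target!r}")
    return AppTarget(location, location.endswith(".py"), flow_name)
```

For example, `parse_app_target("path/to/flows.py:MyFlowName")` yields a file location with the flow name `MyFlowName`, while `parse_app_target("main")` yields a module location with no flow name.
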
## Core Commands

### setup - Initialize Flow Resources

Create all persistent backends needed by flows (database tables, collections, etc.).

```bash
# Setup all flows
cocoindex setup main.py

# Setup specific flow
cocoindex setup main.py:MyFlow
```

**What it does:**
- Creates internal storage tables in Postgres
- Creates target resources (database tables, vector collections, graph structures)
- Updates schemas if flow definition changed
- No-op if already set up and no changes needed

**When to use:**
- First time running a flow
- After modifying flow structure (new fields, new targets)
- After dropping flows to recreate resources

### update - Build/Update Target Data

Run transformations and update target data based on current source data.

```bash
# One-time update
cocoindex update main.py

# One-time update with setup
cocoindex update --setup main.py

# One-time update specific flow
cocoindex update main.py:TextEmbedding

# Force reexport even if no changes
cocoindex update --reexport main.py
```

**What it does:**
- Reads source data
- Applies transformations
- Updates target databases
- Uses incremental processing (only processes changed data)

**Options:**
- `--setup` - Run setup first if needed
- `--reexport` - Reexport all data even if unchanged (useful after data loss)

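When `update` is driven from a script, the option combinations above can be assembled programmatically. The sketch below only builds the argument list using the flags documented in this reference (`--setup`, `--reexport`, `-L`); `build_update_command` is a hypothetical helper, not a CocoIndex API.

```python
import shlex


def build_update_command(
    app_target: str,
    *,
    setup: bool = False,
    reexport: bool = False,
    live: bool = False,
) -> list[str]:
    """Build the argv for `cocoindex update` in the order used in this guide."""
    argv = ["cocoindex", "update"]
    if setup:
        argv.append("--setup")
    if reexport:
        argv.append("--reexport")
    argv.append(app_target)
    if live:
        argv.append("-L")
    return argv


# Printable form of the command, safe to paste into a shell.
command_line = shlex.join(build_update_command("main.py", setup=True, live=True))
```

The resulting list can be passed to `subprocess.run(argv, check=True)`; passing a list rather than a single string avoids shell quoting problems with paths that contain spaces.
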
### update -L - Live Update Mode

Continuously monitor source changes and update targets.

```bash
# Live update mode
cocoindex update main.py -L

# Live update with setup
cocoindex update --setup main.py -L

# Live update with reexport on initial update
cocoindex update --reexport main.py -L
```

**What it does:**
- Performs initial one-time update
- Continuously monitors source changes
- Automatically processes updates
- Runs until aborted (Ctrl-C)

**Requires:**
- At least one source with change capture enabled, through either of:
  - `refresh_interval` parameter on source
  - Source-specific change capture (Postgres notifications, S3 events, etc.)

**Example with refresh interval:**
```python
import datetime
import cocoindex

# Inside a flow definition (flow_builder and data_scope are its arguments)
data_scope["documents"] = flow_builder.add_source(
    cocoindex.sources.LocalFile(path="documents"),
    refresh_interval=datetime.timedelta(minutes=1),  # Check every minute
)
```

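As a conceptual illustration of interval-based change detection (as opposed to push-style change capture), the sketch below compares two snapshots of a directory and reports what changed between them. This is an assumption-laden model for intuition only and says nothing about how CocoIndex implements `refresh_interval` internally; `snapshot` and `diff_snapshots` are hypothetical helpers.

```python
from pathlib import Path


def snapshot(root: Path) -> dict[str, float]:
    """Map each file under root (relative path) to its modification time."""
    return {
        str(path.relative_to(root)): path.stat().st_mtime
        for path in root.rglob("*")
        if path.is_file()
    }


def diff_snapshots(
    previous: dict[str, float], current: dict[str, float]
) -> dict[str, list[str]]:
    """Classify paths as added, changed, or removed between two snapshots."""
    shared = previous.keys() & current.keys()
    return {
        "added": sorted(current.keys() - previous.keys()),
        "changed": sorted(p for p in shared if previous[p] != current[p]),
        "removed": sorted(previous.keys() - current.keys()),
    }
```

A polling loop would take a snapshot, sleep for the interval, take another, and process only the paths reported by `diff_snapshots`. A shorter interval lowers latency at the cost of more frequent scans.
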
### drop - Remove Flow Resources

Remove all persistent backends owned by flows.

```bash
# Drop all flows
cocoindex drop main.py

# Drop specific flow
cocoindex drop main.py:MyFlow
```

**What it does:**
- Drops internal storage tables
- Drops target resources (tables, collections, graphs)
- Cleans up all persistent data

**Warning:** This is destructive and cannot be undone!

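Because `drop` is irreversible, automation scripts may want an explicit guard in front of it. The following is one possible sketch, assuming the script asks the operator to retype the exact target; `build_drop_command` is a hypothetical helper and does not reflect any confirmation behavior of the CLI itself.

```python
def build_drop_command(app_target: str, typed_confirmation: str) -> list[str]:
    """Return the argv for `cocoindex drop` only if the target was retyped exactly."""
    if typed_confirmation != app_target:
        raise ValueError(
            f"confirmation {typed_confirmation!r} does not match target {app_target!r}"
        )
    return ["cocoindex", "drop", app_target]
```

In an interactive script the second argument would come from `input("Retype the target to drop: ")`, so a stray Enter or a generic "yes" never triggers the deletion.
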
### show - Inspect Flow Definition

Display flow structure and statistics.

```bash
# Show flow structure
cocoindex show main.py:MyFlow

# Show all flows
cocoindex show main.py
```

**What it shows:**
- Flow name and structure
- Sources configured
- Transformations defined
- Targets and their schemas
- Current statistics (if flow is set up)

### evaluate - Test Flow Without Updating

Run transformations and dump results to files without updating targets.

```bash
# Evaluate flow
cocoindex evaluate main.py:MyFlow

# Specify output directory
cocoindex evaluate main.py:MyFlow --output-dir ./eval_results

# Disable cache
cocoindex evaluate main.py:MyFlow --no-cache
```

**What it does:**
- Runs transformations
- Saves results to files (JSON, CSV, etc.)
- Does NOT update targets
- Uses existing cache by default

**When to use:**
- Testing flow logic before running full update
- Debugging transformation issues
- Inspecting intermediate data
- Validating output format

**Options:**
- `--output-dir PATH` - Directory for output files (default: `eval_{flow_name}_{timestamp}`)
- `--no-cache` - Skip reading from the cache (`evaluate` does not write to the cache in either case)

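When `--output-dir` is omitted, results land in a directory named after the `eval_{flow_name}_{timestamp}` pattern, so repeated runs accumulate several directories. The sketch below finds the most recent one. It assumes nothing about the timestamp format and sorts by modification time instead; `latest_eval_dir` is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional


def latest_eval_dir(base: Path, flow_name: str) -> Optional[Path]:
    """Return the most recently modified eval_{flow_name}_* directory, if any."""
    # Note: the prefix match also picks up flows whose names start with
    # "{flow_name}_", so use distinct flow names when relying on this.
    candidates = [p for p in base.glob(f"eval_{flow_name}_*") if p.is_dir()]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)
```

Calling `latest_eval_dir(Path("."), "MyFlow")` after an evaluate run gives the directory to inspect, or `None` if no run has produced output yet.
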
## Complete Workflow Examples

### First-Time Setup and Indexing

```bash
# 1. Setup flow resources
cocoindex setup main.py

# 2. Run initial indexing
cocoindex update main.py

# 3. Verify results
cocoindex show main.py
```

### Development Workflow

```bash
# 1. Test with evaluate (no side effects)
cocoindex evaluate main.py:MyFlow --output-dir ./test_output

# 2. If the output looks good, set up and update
cocoindex update --setup main.py:MyFlow

# 3. Check results
cocoindex show main.py:MyFlow
```

### Production Live Updates

```bash
# Run with live updates and auto-setup
cocoindex update --setup main.py -L
```

### Rebuild After Changes

```bash
# Drop old resources
cocoindex drop main.py

# Setup with new definition
cocoindex setup main.py

# Reindex everything
cocoindex update --reexport main.py
```

### Multiple Flows

```bash
# Setup all flows
cocoindex setup main.py

# Update specific flows
cocoindex update main.py:CodeEmbedding
cocoindex update main.py:DocumentEmbedding

# Show all flows
cocoindex show main.py
```

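Multi-step workflows such as the rebuild sequence above are order-sensitive: a later step should not run if an earlier one failed. The sketch below runs a list of commands and stops at the first non-zero exit code. `run_steps` and `REBUILD_STEPS` are hypothetical names, and the runner is injectable so the logic can be exercised without invoking the CLI.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]

# The "Rebuild After Changes" sequence from this guide, as argv lists.
REBUILD_STEPS: list[list[str]] = [
    ["cocoindex", "drop", "main.py"],
    ["cocoindex", "setup", "main.py"],
    ["cocoindex", "update", "--reexport", "main.py"],
]


def run_command(argv: Sequence[str]) -> int:
    """Run one command, inheriting the terminal, and return its exit code."""
    return subprocess.run(list(argv)).returncode


def run_steps(steps: Sequence[Sequence[str]], runner: Runner = run_command) -> int:
    """Run steps in order; stop at the first failure. Returns steps completed."""
    completed = 0
    for argv in steps:
        if runner(argv) != 0:
            break
        completed += 1
    return completed
```

`run_steps(REBUILD_STEPS)` returns `3` when every command succeeds. Any prompt a command shows still appears on the terminal because the subprocess inherits standard input and output.
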
## Common Issues and Solutions

### Issue: "Flow not found"

**Problem:** CLI can't find the flow definition.

**Solutions:**
```bash
# Make sure APP_TARGET is correct
cocoindex show main.py  # Should list flows

# Use --app-dir if not in project root
cocoindex --app-dir /path/to/project show main.py

# Check flow name is correct
cocoindex show main.py:CorrectFlowName
```

### Issue: "Database connection failed"

**Problem:** Can't connect to Postgres.

**Solutions:**
```bash
# Check that .env exists and defines the database URL
grep COCOINDEX_DATABASE_URL .env

# Test connection (load .env into the shell first; psql does not read it)
set -a; source .env; set +a
psql "$COCOINDEX_DATABASE_URL"

# Use --env-file if .env is elsewhere
cocoindex --env-file /path/to/.env update main.py
```

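A malformed connection string is a common cause of connection failures. As a quick sanity check, the URL can be broken into its components with the standard library. This is a minimal sketch: `describe_database_url` is a hypothetical helper, it does not contact the database, and it reports only whether a password is present rather than printing it.

```python
from urllib.parse import urlsplit


def describe_database_url(url: str) -> dict[str, object]:
    """Break a Postgres URL into components without exposing the password."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "has_password": parts.password is not None,
        "host": parts.hostname,
        "port": parts.port or 5432,  # 5432 is the default Postgres port
        "database": parts.path.lstrip("/"),
    }
```

For the example URL from the environment section, this reports host `localhost`, port `5432`, and database `cocoindex_db`. An empty `host` or `database` value points to a typo in the URL.
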
### Issue: "Schema mismatch"

**Problem:** Flow definition changed but resources not updated.

**Solution:**
```bash
# Re-run setup to update schemas
cocoindex setup main.py

# Then update data
cocoindex update main.py
```

### Issue: "Live update exits immediately"

**Problem:** No change capture mechanisms enabled.

**Solution:**
Add `refresh_interval` to at least one source, or use source-specific change capture:
```python
import datetime
import cocoindex

# Inside a flow definition (flow_builder and data_scope are its arguments)
data_scope["docs"] = flow_builder.add_source(
    cocoindex.sources.LocalFile(path="docs"),
    refresh_interval=datetime.timedelta(seconds=30),  # Add this
)
```

## Advanced Options

### Global Options

```bash
# Show version
cocoindex --version

# Show help
cocoindex --help
cocoindex update --help

# Specify app directory
cocoindex --app-dir /custom/path update main

# Custom env file
cocoindex --env-file prod.env update main
```

### Performance Tuning

Set environment variables for concurrency:

```bash
# In .env file
COCOINDEX_SOURCE_MAX_INFLIGHT_ROWS=100
COCOINDEX_SOURCE_MAX_INFLIGHT_BYTES=1073741824  # 1GB
```

Or per-source in code:
```python
import cocoindex

# Inside a flow definition (flow_builder and data_scope are its arguments)
data_scope["docs"] = flow_builder.add_source(
    cocoindex.sources.LocalFile(path="docs"),
    max_inflight_rows=50,
    max_inflight_bytes=500 * 1024 * 1024,  # 500MB
)
```

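The byte limits above are written as raw integers, which are easy to mistype. The short sketch below shows the arithmetic behind them, using binary units (1 MiB = 1024 × 1024 bytes); `mib_to_bytes` and `env_line` are hypothetical helpers for generating the `.env` entry.

```python
MIB = 1024 * 1024  # bytes in one mebibyte


def mib_to_bytes(mib: int) -> int:
    """Convert a size in MiB to bytes."""
    return mib * MIB


def env_line(mib: int) -> str:
    """Format the in-flight byte limit as a .env entry."""
    return f"COCOINDEX_SOURCE_MAX_INFLIGHT_BYTES={mib_to_bytes(mib)}"
```

Here `mib_to_bytes(500)` gives `524288000` and `mib_to_bytes(1024)` gives `1073741824`, matching the two values used in this guide.
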
## Best Practices

1. **Use evaluate before update** - Test flow logic without side effects
2. **Always setup before first update** - Or use `--setup` flag
3. **Use live updates in production** - Keeps targets always fresh
4. **Set app namespace** - Organize flows across environments (dev/staging/prod)
5. **Monitor with show** - Regularly check flow statistics
6. **Version control .env.example** - Document required environment variables
7. **Use specific flow targets** - For selective updates: `main.py:FlowName`
8. **Setup after definition changes** - Ensures schemas match flow definition