2.6 KiB
2.6 KiB
description, argument-hint
| description | argument-hint |
|---|---|
| Manage dual-index corpora | <create|sync> [options] |
Manage corpora that combine both vector search (Qdrant) and full-text search (MeiliSearch) for the same content.
Subcommands:
- create: Create both Qdrant collection and MeiliSearch index
- sync: Index directory to both systems simultaneously
Common Options:
- --json: Output in JSON format
Create Options:
- name: Corpus name (required)
- --type: Corpus type - code or pdf (required)
- --models: Embedding models, comma-separated (default: stella,jina)
Sync Options:
- directory: Directory path to index (required)
- --corpus: Corpus name (required)
- --models: Embedding models (default: stella,jina)
- --file-types: File extensions to index (e.g., .py,.md)
Examples:
/corpus create MyDocs --type pdf --models stella
/corpus sync ~/Documents --corpus MyDocs
/corpus create CodeBase --type code
/corpus sync ~/projects --corpus CodeBase --file-types .py,.js,.md
Execution:
cd ${CLAUDE_PLUGIN_ROOT}
arc corpus $ARGUMENTS
What Is a Corpus?
A corpus combines two search systems:
- Vector search (Qdrant): Semantic similarity, concept matching
- Full-text search (MeiliSearch): Keyword, phrase, boolean operators
This enables hybrid search strategies:
- Broad semantic discovery (vector search)
- Precise keyword refinement (full-text search)
- Combined results for best of both worlds
When to Use Corpus vs Collection:
Use Corpus When:
- Need both semantic and keyword search
- Users search different ways (concepts vs exact terms)
- Want fast keyword filtering of semantic results
- Building search UIs with multiple search modes
Use Collection When:
- Only need semantic search
- Working with embeddings/vectors directly
- Integrating with existing vector workflows
- MeiliSearch not available/needed
How Sync Works:
- Discovers files in directory (respects .gitignore for code)
- Chunks content appropriately (PDFs vs code)
- Generates embeddings with specified models
- Uploads to Qdrant (vector search)
- Indexes to MeiliSearch (full-text search)
- Both indexes share same document IDs and metadata
Performance:
Corpus sync is approximately 2x slower than single-system indexing due to dual upload, but still efficient:
- PDFs: ~5-15/minute
- Source files: 50-100 files/second
Related Commands:
- /collection create - Create vector-only collection
- /index pdf - Index PDFs to vector only
- /index code - Index code to vector only
- /search semantic - Search vector index
- /search text - Search full-text index
Implementation:
- RDR-009: Dual indexing strategy
- RDR-006: Claude Code integration