Initial commit

2025-11-29 18:17:12 +08:00
commit b923023331
12 changed files with 860 additions and 0 deletions
--- a/commands/corpus.md
+++ b/commands/corpus.md
@@ -0,0 +1,97 @@
+---
+description: Manage dual-index corpora
+argument-hint: <create|sync> [options]
+---
+
+Manage corpora that combine both vector search (Qdrant) and full-text search (MeiliSearch) for the same content.
+
+**Subcommands:**
+
+- create: Create both a Qdrant collection and a MeiliSearch index
+- sync: Index a directory into both systems in one pass
+
+**Common Options:**
+
+- --json: Output in JSON format
+
+**Create Options:**
+
+- name: Corpus name (required)
+- --type: Corpus type - code or pdf (required)
+- --models: Embedding models, comma-separated (default: stella,jina)
+
+**Sync Options:**
+
+- directory: Directory path to index (required)
+- --corpus: Corpus name (required)
+- --models: Embedding models, comma-separated (default: stella,jina)
+- --file-types: File extensions to index, comma-separated (e.g., .py,.md)
+
+**Examples:**
+
+```text
+/corpus create MyDocs --type pdf --models stella
+/corpus sync ~/Documents --corpus MyDocs
+/corpus create CodeBase --type code
+/corpus sync ~/projects --corpus CodeBase --file-types .py,.js,.md
+```
+
+**Execution:**
+
+```bash
+cd "${CLAUDE_PLUGIN_ROOT}"
+arc corpus $ARGUMENTS
+```
+
+**What Is a Corpus?**
+
+A corpus combines two search systems:
+1. **Vector search** (Qdrant): Semantic similarity, concept matching
+2. **Full-text search** (MeiliSearch): Keyword, phrase, boolean operators
+
+This enables hybrid search strategies:
+- Broad semantic discovery (vector search)
+- Precise keyword refinement (full-text search)
+- Combined results that pair semantic recall with keyword precision
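
One way to picture the "combined results" strategy is rank fusion over the IDs returned by each index. The sketch below uses reciprocal rank fusion, which is an assumption chosen for illustration; the source does not say how `arc` merges results, and the document IDs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of document IDs into one list, best first.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    so documents returned by both systems rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for a deterministic order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result IDs from each index (not real query output).
semantic_hits = ["doc-3", "doc-1", "doc-7"]
keyword_hits = ["doc-1", "doc-9", "doc-3"]

merged = reciprocal_rank_fusion([semantic_hits, keyword_hits])
print(merged)
```

Fusing by ID only works because both indexes use the same document IDs, as the sync steps state.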
+
+**When to Use Corpus vs Collection:**
+
+**Use Corpus When:**
+- Need both semantic and keyword search
+- Users search in different ways (concepts vs exact terms)
+- Want fast keyword filtering of semantic results
+- Building search UIs with multiple search modes
+
+**Use Collection When:**
+- Only need semantic search
+- Working with embeddings/vectors directly
+- Integrating with existing vector workflows
+- MeiliSearch is unavailable or not needed
+
+**How Sync Works:**
+
+1. Discovers files in the directory (respects .gitignore for code)
+2. Chunks content according to corpus type (PDF vs. code)
+3. Generates embeddings with the specified models
+4. Uploads vectors to Qdrant (vector search)
+5. Indexes text in MeiliSearch (full-text search)
+6. Both indexes share the same document IDs and metadata
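
The steps above can be sketched with plain dictionaries standing in for the two backends. This is a minimal illustration, not the `arc` implementation: `chunk_text`, `chunk_id`, `fake_embed`, and `sync_files` are hypothetical names, the fixed-size chunking and hash-based IDs are assumptions, and no Qdrant or MeiliSearch client is called.

```python
import hashlib


def chunk_text(text, size=200):
    """Split text into fixed-size character chunks (placeholder strategy)."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def chunk_id(path, index):
    """Derive a deterministic ID from the file path and chunk position."""
    return hashlib.sha1(f"{path}:{index}".encode("utf-8")).hexdigest()[:16]


def fake_embed(chunk):
    """Stand-in for a real embedding model: returns a tiny dummy vector."""
    return [float(len(chunk)), float(sum(map(ord, chunk)) % 97)]


def sync_files(files, vector_store, text_index):
    """Write every chunk to both stores under the same ID and metadata."""
    for path, text in files.items():
        for index, chunk in enumerate(chunk_text(text)):
            doc_id = chunk_id(path, index)
            metadata = {"path": path, "chunk": index}
            vector_store[doc_id] = {"vector": fake_embed(chunk), **metadata}
            text_index[doc_id] = {"text": chunk, **metadata}


# In-memory dicts stand in for the Qdrant collection and MeiliSearch index.
vector_store, text_index = {}, {}
files = {"src/app.py": "print('hello')\n" * 30, "README.md": "# Demo\n"}
sync_files(files, vector_store, text_index)
print(len(vector_store), len(text_index))
```

Deriving the ID from the path and chunk position means a re-sync overwrites existing entries instead of duplicating them, and a hit in one index can be looked up in the other.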
+
+**Performance:**
+
+Corpus sync is approximately 2x slower than single-system indexing because every document is uploaded to both systems, but throughput remains reasonable:
+- PDFs: ~5-15 files/minute
+- Source files: 50-100 files/second
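
To turn these rates into a rough time budget, a back-of-the-envelope estimate is enough. The helper below is a hypothetical illustration that uses only the figures quoted above; the corpus sizes (10,000 source files, 300 PDFs) are invented for the example, and real throughput will depend on hardware and models.

```python
def estimate_sync_seconds(file_count, rate_low, rate_high):
    """Return (best_case, worst_case) seconds for a rate range in files/second."""
    return file_count / rate_high, file_count / rate_low


# Source files: 50-100 files/second.
code_best, code_worst = estimate_sync_seconds(10_000, 50, 100)

# PDFs: 5-15 per minute, converted to files/second.
pdf_best, pdf_worst = estimate_sync_seconds(300, 5 / 60, 15 / 60)

print(f"10,000 source files: {code_best:.0f}-{code_worst:.0f} s")
print(f"300 PDFs: {pdf_best / 60:.0f}-{pdf_worst / 60:.0f} min")
```

On these assumptions a 10,000-file code corpus syncs in a few minutes, while a 300-PDF corpus takes roughly 20 to 60 minutes.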
+
+**Related Commands:**
+
+- /collection create - Create vector-only collection
+- /index pdf - Index PDFs to vector only
+- /index code - Index code to vector only
+- /search semantic - Search vector index
+- /search text - Search full-text index
+
+**Implementation:**
+
+- RDR-009: Dual indexing strategy
+- RDR-006: Claude Code integration