---
name: semantic-code-search
description: Expert guidance on using the claude-context MCP for semantic code search. Provides best practices for indexing large codebases, formulating effective search queries, optimizing performance, and integrating vector-based code retrieval into investigation workflows. Use when working with large codebases, optimizing token usage, or when grep/ripgrep searches are insufficient.
allowed-tools: Task
---

# Semantic Code Search Expert

This Skill provides guidance on using the claude-context MCP server for efficient semantic code search across large codebases with hybrid retrieval (BM25 + dense embeddings).

## When to use this Skill

Claude should invoke this Skill when:

- Working with large codebases (10k+ lines of code)
- Semantic understanding beyond keyword matching is needed
- Token consumption needs to be reduced (context usage drops by roughly 40%)
- Traditional grep/ripgrep searches return too many false positives
- Functionality has to be found by concept rather than by exact keywords
- The user asks: "index this codebase", "search semantically", "find where authentication is handled"
- A large-scale investigation is about to be handed to codebase-detective
- The user mentions: "claude-context", "vector search", "semantic search", "index code"
- The token budget is constrained and code retrieval must be efficient

## Core Capabilities of Claude-Context MCP

### Available Tools

1. **mcp__claude-context__index_codebase** - Index a directory with a configurable splitter
2. **mcp__claude-context__search_code** - Natural language semantic search
3. **mcp__claude-context__clear_index** - Remove search indexes
4. **mcp__claude-context__get_indexing_status** - Check indexing progress

### Key Benefits

- **40% Token Reduction**: Retrieve only relevant code snippets instead of loading entire directories
- **Semantic Understanding**: Find code by what it does, not just what it's named
- **Scale**: Handle millions of lines of code efficiently
- **Hybrid Search**: Combines BM25 keyword matching with dense vector embeddings
- **Multi-Round Avoidance**: Get relevant results in one query instead of multiple grep attempts

## Instructions

### Phase 1: Decide If Claude-Context Is Appropriate

**Use Claude-Context When:**

- ✅ The codebase is large (10k+ lines)
- ✅ You need to find functionality by concept ("authentication logic", "payment processing")
- ✅ You are working with an unfamiliar codebase
- ✅ The token budget is limited
- ✅ You need to search across multiple languages/frameworks
- ✅ grep returns hundreds of matches and you need the most relevant ones
- ✅ The investigation requires understanding semantic relationships

**DON'T Use Claude-Context When:**

- ❌ Searching for exact string matches (use grep/ripgrep instead)
- ❌ The codebase is small (<5k lines); the indexing overhead is not worth it
- ❌ Looking for specific file names (use find/glob instead)
- ❌ Searching within 2-3 known files (use the Read tool instead)
- ❌ You need regex pattern matching (use grep/ripgrep instead)
- ❌ You need a time-sensitive quick lookup (indexing takes time)

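The Phase 1 criteria can be read as a small decision procedure. The sketch below is one possible encoding, not part of claude-context: the `SearchNeed` shape and the `chooseSearchTool` name are hypothetical, and two choices are assumptions of this sketch (small codebases fall back to grep, and the 5k-10k line range, which the lists above leave open, defaults to claude-context).

```typescript
type SearchTool = "claude-context" | "grep" | "find/glob" | "Read";

// Hypothetical description of a search task; not an MCP type.
interface SearchNeed {
  linesOfCode: number;         // approximate codebase size
  exactStringOrRegex: boolean; // exact match or regex pattern wanted
  fileNameLookup: boolean;     // looking for a file by name
  knownFileCount: number;      // files already known to hold the answer (0 if none)
}

function chooseSearchTool(need: SearchNeed): SearchTool {
  if (need.fileNameLookup) return "find/glob";
  if (need.exactStringOrRegex) return "grep";
  if (need.knownFileCount > 0 && need.knownFileCount <= 3) return "Read";
  // Below ~5k lines the indexing overhead is not worth it (grep is this sketch's fallback).
  if (need.linesOfCode < 5000) return "grep";
  return "claude-context";
}
```

The checks run from most to least specific, so a regex lookup in a huge codebase still goes to grep rather than to the index.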
### Phase 2: Indexing Best Practices

#### 2.1 Initial Indexing

**Standard Indexing (Recommended):**

```typescript
mcp__claude-context__index_codebase with:
{
  path: "/absolute/path/to/project",
  splitter: "ast",  // Syntax-aware with automatic fallback
  force: false      // Don't re-index if already indexed
}
```

**Why AST Splitter?**

- Preserves code structure (functions and classes stay intact)
- Automatically falls back to character-based splitting for non-code files
- Better semantic coherence in search results

**When to Use LangChain Splitter:**

```typescript
mcp__claude-context__index_codebase with:
{
  path: "/absolute/path/to/project",
  splitter: "langchain",  // Character-based splitting
  force: false
}
```

Use LangChain when:

- The codebase has many configuration/data files (JSON, YAML, XML)
- The project is documentation-heavy (Markdown, text files)
- AST parsing fails frequently for your languages

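As a rough illustration of the splitter choice, the sketch below picks a splitter from a list of file paths. It is a heuristic under stated assumptions: the `chooseSplitter` helper and the extension list are hypothetical, and "many configuration or documentation files" is interpreted here as more than half of all files.

```typescript
type Splitter = "ast" | "langchain";

// Extensions treated as configuration, data, or documentation (an assumption of this sketch).
const NON_CODE_EXTENSIONS: string[] = [".json", ".yaml", ".yml", ".xml", ".md", ".txt"];

function chooseSplitter(filePaths: string[]): Splitter {
  if (filePaths.length === 0) return "ast"; // recommended default
  const nonCode = filePaths.filter((p) => {
    const dot = p.lastIndexOf(".");
    return dot >= 0 && NON_CODE_EXTENSIONS.indexOf(p.slice(dot).toLowerCase()) >= 0;
  }).length;
  // "Many" is interpreted as a strict majority of the files.
  return nonCode / filePaths.length > 0.5 ? "langchain" : "ast";
}
```

Frequent AST parsing failures, the third criterion above, cannot be detected from file names and would still need a manual switch to `"langchain"`.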
#### 2.2 Custom File Extensions

**Include Additional Extensions:**

```typescript
mcp__claude-context__index_codebase with:
{
  path: "/absolute/path/to/project",
  splitter: "ast",
  customExtensions: [".vue", ".svelte", ".astro", ".prisma", ".proto"]
}
```

**Common Custom Extensions by Framework:**

- Vue.js: `[".vue"]`
- Svelte: `[".svelte"]`
- Astro: `[".astro"]`
- Prisma: `[".prisma"]`
- GraphQL: `[".graphql", ".gql"]`
- Protocol Buffers: `[".proto"]`
- Terraform: `[".tf", ".tfvars"]`

#### 2.3 Ignore Patterns

**Default Ignored (Automatic):**

- `node_modules/`, `dist/`, `build/`, `.git/`
- `vendor/`, `target/`, `__pycache__/`

**Add Custom Ignore Patterns:**

```typescript
mcp__claude-context__index_codebase with:
{
  path: "/absolute/path/to/project",
  splitter: "ast",
  ignorePatterns: [
    "generated/**",      // Generated code
    "*.min.js",          // Minified files
    "*.bundle.js",       // Bundled files
    "test-data/**",      // Large test fixtures
    "docs/api/**",       // Auto-generated docs
    ".storybook/**",     // Storybook config
    "*.lock",            // Lock files
    "static/vendor/**"   // Third-party static files
  ]
}
```

**When to Use ignorePatterns:**

- Generated code clutters search results
- Large static assets slow indexing
- Third-party code isn't relevant to your investigation
- Test fixtures create noise

⚠️ **IMPORTANT**: Only use `ignorePatterns` when the user explicitly requests custom filtering. Don't add it by default.

#### 2.4 Force Re-Indexing

**When to Force Re-Index:**

```typescript
mcp__claude-context__index_codebase with:
{
  path: "/absolute/path/to/project",
  splitter: "ast",
  force: true  // ⚠️ Overwrites existing index
}
```

Use `force: true` when:

- The codebase has changed significantly
- A previous indexing run was interrupted
- Switching between splitters (ast ↔ langchain)
- Search results seem outdated
- Adding or removing custom extensions or ignore patterns

**Conflict Handling:**

If indexing is attempted on an already indexed path, ALWAYS:

1. Inform the user that the path is already indexed
2. Ask if they want to force re-index
3. Explain the trade-off (time vs freshness)
4. Only proceed with `force: true` if the user confirms

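The conflict-handling rule amounts to a three-way decision. A minimal sketch, with hypothetical names (`decideIndexing`, `IndexDecision`) that are not part of the MCP:

```typescript
type IndexDecision = "index" | "ask-user" | "force-reindex" | "reuse-existing";

// userAnswer is undefined until the user has been asked about re-indexing.
function decideIndexing(alreadyIndexed: boolean, userAnswer?: "yes" | "no"): IndexDecision {
  if (!alreadyIndexed) return "index";             // first run: call with force: false
  if (userAnswer === undefined) return "ask-user"; // inform the user, explain time vs freshness
  return userAnswer === "yes" ? "force-reindex" : "reuse-existing";
}
```

Only the `"force-reindex"` outcome maps to a call with `force: true`; every other path leaves the existing index untouched.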
#### 2.5 Monitor Indexing Progress

**Check Status:**

```typescript
mcp__claude-context__get_indexing_status with:
{
  path: "/absolute/path/to/project"
}
```

**Status Indicators:**

- `Indexing... (45%)` - Still processing
- `Indexed: 1,234 chunks from 567 files` - Complete
- `Not indexed` - Never indexed or cleared

**Best Practice:**

For large codebases (100k+ lines), check the status about every 30 seconds and relay progress to the user.

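If the status text needs to be acted on programmatically, it can be parsed into a small tagged union. This sketch assumes the three message shapes listed above; the real tool output may be worded differently, and `parseIndexingStatus` is a hypothetical helper.

```typescript
type IndexingStatus =
  | { state: "indexing"; percent: number }
  | { state: "indexed"; chunks: number; files: number }
  | { state: "not-indexed" }
  | { state: "unknown"; raw: string };

function parseIndexingStatus(text: string): IndexingStatus {
  const progress = /Indexing\.\.\.\s*\((\d+(?:\.\d+)?)%\)/.exec(text);
  if (progress) return { state: "indexing", percent: Number(progress[1]) };

  const done = /Indexed:\s*([\d,]+)\s+chunks from\s+([\d,]+)\s+files/.exec(text);
  if (done) {
    // Strip thousands separators such as "1,234" before parsing.
    const toInt = (s: string | undefined) => parseInt((s || "0").replace(/,/g, ""), 10);
    return { state: "indexed", chunks: toInt(done[1]), files: toInt(done[2]) };
  }
  if (/Not indexed/i.test(text)) return { state: "not-indexed" };
  return { state: "unknown", raw: text }; // unrecognized wording: keep the raw text
}
```

Falling back to `"unknown"` instead of throwing keeps a polling loop alive if the message format changes.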
### Phase 3: Search Query Formulation

#### 3.1 Effective Query Patterns

**Concept-Based Queries (Best for Claude-Context):**

```typescript
// ✅ GOOD - Semantic concepts
search_code with query: "user authentication login flow with JWT tokens"
search_code with query: "database connection pooling initialization"
search_code with query: "error handling middleware for HTTP requests"
search_code with query: "WebSocket connection establishment and message handling"
search_code with query: "payment processing with Stripe integration"
```

**Why These Work:**

- Natural language describes WHAT the code does
- Multiple related concepts improve relevance ranking
- Captures intent, not just syntax

**Keyword Queries (Better for grep):**

```typescript
// ⚠️ OKAY - Works but not optimal
search_code with query: "authenticateUser function"
search_code with query: "UserRepository class"
```

**Why Less Optimal:**

- Assumes you know exact naming
- Misses semantically similar code with different names
- Better handled by grep if you know the exact term

**Avoid These:**

```typescript
// ❌ BAD - Too generic
search_code with query: "user"
search_code with query: "function"

// ❌ BAD - Too specific/technical
search_code with query: "express.Router().post('/api/users')"
search_code with query: "class UserService extends BaseService implements IUserService"

// ❌ BAD - Regex patterns (use grep instead)
search_code with query: "func.*Handler|HandlerFunc"
```

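The good, okay, and bad patterns above can be approximated with a few textual checks before a query is sent. The sketch below is a heuristic only: `assessQuery`, its verdict names, and the word-count thresholds are assumptions, and it will misjudge some queries.

```typescript
type QueryVerdict = "ok" | "keyword-style" | "too-generic" | "looks-like-regex" | "looks-like-code";

function assessQuery(query: string): QueryVerdict {
  const trimmed = query.trim();
  // Sequences such as ".*", "|", "\d" or "[^" suggest a job for grep.
  if (/\.\*|\.\+|\||\\[dws]|\[\^/.test(trimmed)) return "looks-like-regex";
  // Call syntax or declaration keywords suggest a literal code fragment.
  if (/\w\(.*\)|\b(?:extends|implements)\b/.test(trimmed)) return "looks-like-code";
  const words = trimmed.split(/\s+/).filter((w) => w.length > 0);
  if (words.length < 2) return "too-generic";     // e.g. "user"
  if (words.length === 2) return "keyword-style"; // works, but grep may serve better
  return "ok";
}
```

A verdict other than `"ok"` is a prompt to rephrase the query as a description of what the code does, or to switch to grep.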
#### 3.2 Query Templates by Use Case

**Finding Authentication/Authorization:**

```typescript
"user login authentication with password validation and session creation"
"JWT token generation and validation middleware"
"OAuth2 authentication flow with Google provider"
"role-based access control permission checking"
"API key authentication verification"
```

**Finding Database Operations:**

```typescript
"user data persistence save to database"
"SQL query execution with prepared statements"
"MongoDB collection find and update operations"
"database transaction commit and rollback handling"
"ORM model definition for user entity"
```

**Finding API Endpoints:**

```typescript
"HTTP POST endpoint for creating new users"
"GraphQL resolver for user queries and mutations"
"REST API handler for updating user profile"
"WebSocket event handler for chat messages"
```

**Finding Business Logic:**

```typescript
"shopping cart calculation with tax and discounts"
"email notification sending after user registration"
"file upload processing with virus scanning"
"report generation with PDF export"
```

**Finding Configuration:**

```typescript
"environment variable configuration loading"
"database connection string setup"
"API rate limiting configuration"
"CORS policy definition for cross-origin requests"
```

**Finding Error Handling:**

```typescript
"global error handler for uncaught exceptions"
"validation error formatting for API responses"
"retry logic for failed HTTP requests"
"logging critical errors to monitoring service"
```

#### 3.3 Extension Filtering

**Filter by File Type:**

```typescript
// Only search TypeScript files
search_code with:
{
  path: "/absolute/path/to/project",
  query: "user authentication",
  extensionFilter: [".ts", ".tsx"]
}

// Only search Go files
search_code with:
{
  path: "/absolute/path/to/project",
  query: "HTTP handler implementation",
  extensionFilter: [".go"]
}

// Search configs only
search_code with:
{
  path: "/absolute/path/to/project",
  query: "database connection settings",
  extensionFilter: [".json", ".yaml", ".env"]
}
```

**When to Use Extension Filters:**

- Multi-language projects (frontend + backend)
- Avoiding irrelevant results from the wrong language
- Focusing on a specific layer (e.g., only the database layer's .go files)
- Searching configuration and code separately

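When search arguments are assembled in code, a small builder can catch the common mistakes (relative paths, extensions without a leading dot, out-of-range limits) before the call is made. The parameter names follow the examples above; the `buildSearchArgs` helper and its validation rules are assumptions of this sketch, not behaviour of the MCP server.

```typescript
interface SearchCodeArgs {
  path: string;
  query: string;
  extensionFilter?: string[];
  limit?: number;
}

function buildSearchArgs(path: string, query: string, extensions: string[] = [], limit = 10): SearchCodeArgs {
  // Accept POSIX ("/...") and Windows ("C:\...") absolute paths only.
  if (!/^(\/|[A-Za-z]:[\\/])/.test(path)) throw new Error("path must be absolute: " + path);
  if (query.trim().length === 0) throw new Error("query must not be empty");
  if (Math.floor(limit) !== limit || limit < 1 || limit > 50) throw new Error("limit must be an integer in 1..50");

  // Normalize "ts" to ".ts" and drop duplicates while keeping order.
  const normalized: string[] = [];
  for (const ext of extensions) {
    const withDot = ext.charAt(0) === "." ? ext : "." + ext;
    if (normalized.indexOf(withDot) < 0) normalized.push(withDot);
  }

  const args: SearchCodeArgs = { path, query: query.trim(), limit };
  if (normalized.length > 0) args.extensionFilter = normalized;
  return args;
}
```

For example, `buildSearchArgs("/absolute/path/to/project", "user authentication", ["ts", ".tsx"])` yields an object with `extensionFilter: [".ts", ".tsx"]` and the default `limit: 10`.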
#### 3.4 Result Limiting

**Default Limit:**

```typescript
search_code with:
{
  path: "/absolute/path/to/project",
  query: "authentication logic",
  limit: 10  // Default: 10 results
}
```

**Adjust Based on Use Case:**

```typescript
// Quick overview - fewest results
limit: 5

// Standard investigation - balanced
limit: 10  // Recommended default

// Comprehensive search - more results
limit: 20

// Exhaustive - find everything
limit: 50  // Maximum allowed
```

**Guideline:**

- Start with 10 results
- If too many false positives → refine query
- If missing relevant code → increase limit
- Never go below 5 (might miss important code)

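The guideline can be sketched as a feedback rule that keeps the limit inside the 5 to 50 range. The bounds come from the text above; the `nextStep` helper, the `Feedback` labels, and the choice to double the limit when results are missing are assumptions made for illustration.

```typescript
const MIN_LIMIT = 5;   // "never go below 5"
const MAX_LIMIT = 50;  // "maximum allowed"

type Feedback = "too-many-false-positives" | "missing-relevant-code" | "good";

function nextStep(currentLimit: number, feedback: Feedback): { limit: number; refineQuery: boolean } {
  const clamp = (n: number) => Math.min(MAX_LIMIT, Math.max(MIN_LIMIT, Math.round(n)));
  if (feedback === "too-many-false-positives") {
    // Noise is a query problem: keep the limit and rephrase instead.
    return { limit: clamp(currentLimit), refineQuery: true };
  }
  if (feedback === "missing-relevant-code") {
    return { limit: clamp(currentLimit * 2), refineQuery: false };
  }
  return { limit: clamp(currentLimit), refineQuery: false };
}
```

Starting from the default of 10, one "missing relevant code" round moves to 20, and repeated rounds stop at 50.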
### Phase 4: Performance Optimization Strategies

#### 4.1 Token Optimization

**Technique 1: Targeted Searches vs Full Directory Reads**

```typescript
// ❌ WASTEFUL - Loads entire directory into context
Read with path: "/project/src/**/*.ts"

// ✅ EFFICIENT - Returns only relevant snippets
search_code with:
{
  query: "user authentication flow",
  extensionFilter: [".ts"],
  limit: 10
}
```

**Token Savings (illustrative figures for this one scenario):**

- Full directory: ~50,000 tokens
- Semantic search: ~5,000 tokens (10 snippets × ~500 tokens each)
- **Savings: 90%**

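The savings figure is simple arithmetic: 10 snippets × 500 tokens = 5,000 tokens, and 1 − 5,000 / 50,000 = 0.9. A small sketch for plugging in other estimates (the `tokenSavings` helper is hypothetical, and all inputs are rough estimates rather than measured values):

```typescript
function tokenSavings(fullReadTokens: number, snippets: number, tokensPerSnippet: number) {
  const searchTokens = snippets * tokensPerSnippet;
  const savedFraction = 1 - searchTokens / fullReadTokens;
  return { searchTokens, savedPercent: Math.round(savedFraction * 100) };
}

// The scenario from the text: 50,000 tokens for the full read vs 10 snippets of ~500 tokens.
const example = tokenSavings(50000, 10, 500); // { searchTokens: 5000, savedPercent: 90 }
```

With `limit: 50` and the same snippet size, the result drops to 50%, which is one reason to keep limits modest.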
**Technique 2: Iterative Refinement**

```typescript
// First search - broad
search_code with query: "user authentication"
// Returns 10 results, review them

// Second search - refined based on findings
search_code with query: "JWT token generation in authentication service"
// Returns more specific results
```

**Why This Works:**

- First search gives context
- Second search uses insights from first search
- Total tokens < loading entire codebase

**Technique 3: Combine with Targeted Reads**

```typescript
// 1. Semantic search to find relevant files
search_code with query: "payment processing logic"
// Returns: src/services/paymentService.ts:45-89

// 2. Read specific file for full context
Read with path: "/project/src/services/paymentService.ts"
```

**Workflow:**

1. Search semantically → get file locations
2. Read specific files → get full context
3. Only load what you need

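To go from search hits to targeted reads, the `file:start-end` references can be parsed and de-duplicated by file. This sketch assumes results are reported in the `src/services/paymentService.ts:45-89` form shown above; the actual output format may differ, and both helpers are hypothetical.

```typescript
interface CodeLocation {
  file: string;
  startLine: number;
  endLine: number;
}

function parseLocation(ref: string): CodeLocation | null {
  // "<file>:<start>" or "<file>:<start>-<end>"
  const m = /^(.+):(\d+)(?:-(\d+))?$/.exec(ref.trim());
  if (!m) return null;
  const start = parseInt(m[2] || "0", 10);
  const end = m[3] ? parseInt(m[3], 10) : start;
  if (start < 1 || end < start) return null;
  return { file: m[1] || "", startLine: start, endLine: end };
}

// Unique files in first-seen order, ready to be passed to Read one by one.
function filesToRead(refs: string[]): string[] {
  const files: string[] = [];
  for (const ref of refs) {
    const loc = parseLocation(ref);
    if (loc && files.indexOf(loc.file) < 0) files.push(loc.file);
  }
  return files;
}
```

Several hits in the same file then cost a single read instead of one per snippet.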
#### 4.2 Indexing Performance

**Optimize Indexing Time:**

1. **Index Once, Search Many**
   - Don't re-index unless the code has changed significantly
   - Check status before re-indexing

2. **Use Appropriate Splitter**
   - AST splitter: Slower indexing, better search results
   - LangChain splitter: Faster indexing, more general results

3. **Strategic Ignore Patterns**
   - Exclude generated code and vendor files
   - Can reduce indexing time by roughly 30-50%

4. **Incremental Approach**
   - For massive projects, index subdirectories separately
   - Example: Index `src/`, `lib/`, `api/` separately

**Indexing Time Expectations:**

| Codebase Size | Splitter  | Expected Time |
|---------------|-----------|---------------|
| 10k lines     | AST       | 30-60 sec     |
| 50k lines     | AST       | 2-5 min       |
| 100k lines    | AST       | 5-10 min      |
| 500k lines    | AST       | 20-30 min     |
| 10k lines     | LangChain | 15-30 sec     |
| 100k lines    | LangChain | 2-4 min       |

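The table can be turned into a lookup for setting user expectations before indexing starts. The figures below are copied from the table; treating each row as an upper bound ("up to N lines") and the `estimateIndexingTime` helper itself are assumptions of this sketch.

```typescript
type Splitter = "ast" | "langchain";

interface Estimate {
  upToLines: number;
  minSeconds: number;
  maxSeconds: number;
}

// Rows from the table above, converted to seconds.
const INDEXING_TIMES: { ast: Estimate[]; langchain: Estimate[] } = {
  ast: [
    { upToLines: 10000, minSeconds: 30, maxSeconds: 60 },
    { upToLines: 50000, minSeconds: 120, maxSeconds: 300 },
    { upToLines: 100000, minSeconds: 300, maxSeconds: 600 },
    { upToLines: 500000, minSeconds: 1200, maxSeconds: 1800 },
  ],
  langchain: [
    { upToLines: 10000, minSeconds: 15, maxSeconds: 30 },
    { upToLines: 100000, minSeconds: 120, maxSeconds: 240 },
  ],
};

function estimateIndexingTime(lines: number, splitter: Splitter): Estimate | null {
  for (const row of INDEXING_TIMES[splitter]) {
    if (lines <= row.upToLines) return row; // first row large enough for this codebase
  }
  return null; // larger than anything in the table: no estimate available
}
```

Returning `null` beyond the largest row avoids extrapolating past the data the table provides.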
### Phase 5: Integration with Code Investigation Workflows

#### 5.1 With Codebase-Detective Agent

**Recommended Workflow:**

```markdown
# When user asks: "How does authentication work?"

## Step 1: Index (if not already indexed)
mcp__claude-context__index_codebase

## Step 2: Semantic Search
search_code with query: "user authentication login flow"
search_code with query: "password validation and hashing"
search_code with query: "session token generation and storage"

## Step 3: Launch Codebase-Detective
Task tool with subagent_type: "code-analysis:detective"
Provide detective with:
- Search results (file locations)
- User's question
- Specific files to investigate

## Step 4: Deep Dive
Detective uses semantic search results as starting points
Reads specific files
Traces code flow
Provides comprehensive analysis
```

**Why This Workflow?**

- Semantic search narrows scope (saves tokens)
- Detective focuses on relevant files (saves time)
- Combined approach: breadth (search) + depth (detective)

#### 5.2 Semantic Search → Grep → Read Pattern

**For Complex Investigations:**

```typescript
// 1. Semantic search for general area
search_code with query: "HTTP request middleware authentication"
// Results: 10 files in middleware/

// 2. Grep for specific patterns in those files
Grep with pattern: "req\.user|req\.auth" in middleware/

// 3. Read exact implementations
Read specific files identified above
```

**When to Use This Pattern:**

- You need both semantic understanding AND exact syntax
- You want to verify search results with grep
- You are investigating specific implementation details

### Phase 6: Troubleshooting and Common Pitfalls

#### 6.1 Indexing Issues

**Problem: "Indexing stuck at 0%"**

Solutions:

1. Check the Node.js version (must be 20.x, NOT 24.x)
2. Verify OPENAI_API_KEY is set
3. Verify MILVUS_TOKEN is set
4. Check that the path is absolute, not relative
5. Ensure the directory exists and is readable

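The first four checks can be run before indexing is attempted. The sketch below is a pure function so it can be exercised without touching the environment; in practice `nodeVersion` would come from `process.versions.node` and `env` from `process.env`. The `preflightProblems` name and input shape are hypothetical, and the directory-exists check is left out because it needs file system access.

```typescript
interface PreflightInput {
  nodeVersion: string;                         // e.g. "20.11.1" or "v20.11.1"
  env: { [name: string]: string | undefined }; // environment variables
  path: string;                                // directory to index
}

function preflightProblems(input: PreflightInput): string[] {
  const problems: string[] = [];

  const major = parseInt(input.nodeVersion.replace(/^v/, "").split(".")[0] || "", 10);
  if (major !== 20) problems.push("Node.js 20.x expected, found " + input.nodeVersion);

  for (const name of ["OPENAI_API_KEY", "MILVUS_TOKEN"]) {
    if (!input.env[name]) problems.push(name + " is not set");
  }

  // Accept POSIX ("/...") and Windows ("C:\...") absolute paths.
  if (!/^(\/|[A-Za-z]:[\\/])/.test(input.path)) problems.push("path is not absolute: " + input.path);

  return problems;
}
```

An empty array means the listed causes are ruled out and the problem lies elsewhere.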
**Problem: "Indexing failed halfway through"**

Solutions:

1. Clear the index: `clear_index`
2. Re-index with `force: true`
3. Check for corrupted files in the codebase
4. Try the LangChain splitter instead of AST

**Problem: "Already indexed but want to update"**

Solution:

1. Ask the user if they want to force re-index
2. Explain the trade-off: time vs freshness
3. Use `force: true` if confirmed

#### 6.2 Search Quality Issues

**Problem: "Search returns irrelevant results"**

Solutions:

1. Make the query more specific:
   - ❌ "user" → ✅ "user login authentication with password"
2. Add an extension filter to narrow scope
3. Reduce the limit to see top results only
4. Try different query phrasing (synonyms, related concepts)

**Problem: "Search misses relevant code"**

Solutions:

1. Broaden the query:
   - ❌ "JWT token validation middleware" → ✅ "authentication verification"
2. Increase the limit (try 20 or 30)
3. Try multiple searches with different keywords
4. Check if the file is actually indexed (it might match an ignore pattern)

**Problem: "Too many results, all seem relevant"**

Solutions:

1. Use extension filters to focus on specific files
2. Combine with follow-up searches:
   - First: Broad search
   - Second: Specific search based on first results
3. Use limit: 5 to see only top matches

#### 6.3 Performance Issues

**Problem: "Indexing takes too long"**

Solutions:

1. Add ignore patterns for generated/vendor code
2. Use the LangChain splitter (faster but less accurate)
3. Index subdirectories separately
4. Check for very large files (>10MB) and exclude them

**Problem: "Search is slow"**

Solutions:

1. Reduce the limit (fewer results = faster)
2. Use extension filters (smaller search space)
3. Check indexing status (still indexing = slow search)

**Problem: "Using too many tokens"**

Solutions:

1. Reduce the search limit
2. Use extension filters
3. Make queries more specific (fewer but better results)
4. Combine search with targeted reads (not full directory reads)

### Phase 7: Real-World Workflow Examples

#### Example 1: Investigating New Codebase

```markdown
User: "I'm new to this project, help me understand the architecture"

## Workflow:

1. Index the codebase
   mcp__claude-context__index_codebase with path: "/project"

2. Search for entry points
   search_code with query: "application startup initialization main function"

3. Search for architecture patterns
   search_code with query: "dependency injection container service registration"
   search_code with query: "routing configuration API endpoint definitions"
   search_code with query: "database connection setup and migrations"

4. Search for domain models
   search_code with query: "core business entities data models"

5. Launch codebase-detective with findings
   Task tool with all search results as context

6. Provide architecture overview to user
```

#### Example 2: Finding and Fixing a Bug

```markdown
User: "Users can't reset their passwords, investigate"

## Workflow:

1. Ensure codebase is indexed
   get_indexing_status with path: "/project"

2. Search for password reset functionality
   search_code with query: "password reset request token generation email"
   search_code with query: "password reset verification token validation"
   search_code with query: "update user password after reset"

3. Find related error handling
   search_code with query: "password reset error handling validation"

4. Narrow down to specific files
   extensionFilter: [".ts", ".tsx"] to focus on TypeScript

5. Read specific implementations
   Read files identified in search

6. Identify bug and propose fix

7. Search for tests
   search_code with query: "password reset test cases" to find where to add tests
```

#### Example 3: Adding New Feature to Existing System

```markdown
User: "Add two-factor authentication to login"

## Workflow:

1. Index codebase (if needed)

2. Find existing authentication
   search_code with query: "user login authentication password verification"

3. Find similar security features
   search_code with query: "token generation validation security verification"

4. Find where to integrate
   search_code with query: "login flow user session creation after authentication"

5. Find database models
   search_code with query: "user model schema database table"

6. Find configuration patterns
   search_code with query: "feature flags configuration settings"

7. Launch codebase-detective with context
   Provide all search results to guide implementation

8. Implement 2FA based on existing patterns
```

#### Example 4: Security Audit

```markdown
User: "Audit the codebase for security issues"

## Workflow:

1. Index entire codebase

2. Search for authentication weaknesses
   search_code with query: "password storage hashing bcrypt authentication"
   search_code with query: "SQL query construction user input database"

3. Search for authorization issues
   search_code with query: "access control permission checking authorization"
   search_code with query: "API endpoint authentication middleware protection"

4. Search for input validation
   search_code with query: "user input validation sanitization XSS prevention"
   search_code with query: "file upload handling validation security"

5. Search for sensitive data handling
   search_code with query: "environment variables secrets API keys configuration"
   search_code with query: "logging sensitive data personal information"

6. Launch codebase-detective for deep analysis
   Investigate each suspicious finding

7. Generate security report
```

#### Example 5: Migration Planning

```markdown
User: "Plan migration from Express to Fastify"

## Workflow:

1. Index codebase

2. Find all Express usage
   search_code with query: "Express router middleware application setup",
     extensionFilter: [".ts", ".js"], limit: 50

3. Find route definitions
   search_code with query: "HTTP route handlers GET POST PUT DELETE endpoints"

4. Find middleware usage
   search_code with query: "middleware authentication error handling CORS"

5. Find specific Express features
   search_code with query: "express static file serving"
   search_code with query: "express session management"
   search_code with query: "express body parser request parsing"

6. Document all findings
   Create migration checklist with file locations

7. Estimate effort
   Count occurrences, identify complex migrations
```

### Phase 8: Best Practices Summary

#### Indexing Best Practices

✅ **DO:**

- Use AST splitter for better semantic coherence
- Index once, search many times
- Check status before re-indexing
- Use absolute paths
- Add custom extensions for framework-specific files
- Use ignore patterns to exclude generated/vendor code

❌ **DON'T:**

- Re-index unnecessarily (wastes time)
- Use relative paths (causes errors)
- Index without checking Node.js version (v20.x required)
- Include minified/bundled files (creates noise)
- Force re-index without user confirmation

#### Search Best Practices

✅ **DO:**

- Use natural language concept queries
- Start with limit: 10, adjust as needed
- Use extension filters for multi-language projects
- Refine queries based on results
- Combine semantic search with targeted file reads

❌ **DON'T:**

- Use overly generic queries ("user", "function")
- Use regex patterns (use grep instead)
- Assume exact naming (defeats semantic search purpose)
- Set limit too low (<5) or too high (>30 usually)
- Load entire directories when search would suffice

#### Performance Best Practices

✅ **DO:**

- Use semantic search to reduce token usage
- Combine search → read specific files
- Monitor indexing progress for large codebases
- Use extension filters to narrow search space
- Clear old indexes when project structure changes significantly

❌ **DON'T:**

- Read entire directories when searching would work
- Index multiple times for the same investigation
- Use limit: 50 when 10 would suffice
- Search without specifying path (searches everything)

#### Workflow Best Practices

✅ **DO:**

- Index at start of investigation
- Use semantic search before launching agents
- Provide search results to codebase-detective
- Combine semantic search with grep for precision
- Iterate on queries based on results

❌ **DON'T:**

- Skip indexing for large codebases
- Launch detective without search context
- Rely solely on semantic search (combine tools)
- Give up after first search (iterate and refine)

## Integration with This Plugin

This Skill works seamlessly with:

1. **Codebase-Detective Agent** (`plugins/code-analysis/agents/codebase-detective.md`)
   - Use semantic search to find starting points
   - Provide search results as context to detective
   - Detective does deep dive investigation

2. **Deep Analysis Skill** (`plugins/code-analysis/skills/deep-analysis/SKILL.md`)
   - Deep analysis invokes detective
   - Detective uses semantic search (from this skill)
   - Full workflow: deep-analysis → detective → semantic-search → investigation

3. **Analyze Command** (`plugins/code-analysis/commands/analyze.md`)
   - Command triggers deep analysis skill
   - Skill guides semantic search usage
   - Complete workflow automation

## Success Criteria

This Skill is successful when:

1. ✅ Codebase is indexed efficiently with appropriate settings
2. ✅ Search queries are formulated semantically for best results
3. ✅ Token usage is optimized (roughly 40% reduction achieved)
4. ✅ Search results are relevant and actionable
5. ✅ User understands when to use semantic search vs grep
6. ✅ Integration with other tools (detective, grep, read) is seamless
7. ✅ Performance is optimized (indexing time, search speed, token usage)

## Quality Checklist

Before completing a semantic search workflow, ensure:

- ✅ Checked if path is already indexed (avoid unnecessary re-indexing)
- ✅ Used appropriate splitter (AST for code, LangChain for docs)
- ✅ Formulated queries using natural language concepts
- ✅ Set reasonable result limits (10-20 typically)
- ✅ Used extension filters when appropriate
- ✅ Provided search results as context to agents
- ✅ Explained to user why semantic search was beneficial
- ✅ Documented file locations for follow-up investigation

## Notes

- Claude-Context MCP requires Node.js v20.x (NOT v24.x)
- Requires OPENAI_API_KEY for embeddings
- Requires MILVUS_TOKEN for Zilliz Cloud vector database
- Achieves ~40% token reduction vs full directory reads
- Uses hybrid search: BM25 (keyword) + dense embeddings (semantic)
- AST splitter preserves code structure better than character-based splitting
- Always use absolute paths, never relative paths
- Semantic search complements grep/ripgrep; it doesn't replace it
- Best for "what does this do?" queries, not "show me line 45"
- Integration with codebase-detective creates a powerful investigation workflow

---

**Maintained by:** Jack Rudenko @ MadAppGang
**Plugin:** code-analysis v1.0.0
**Last Updated:** November 5, 2024