Initial commit

This commit is contained in:
Zhongwei Li
2025-11-29 18:47:28 +08:00
commit ab01918101
4 changed files with 594 additions and 0 deletions

View File

@@ -0,0 +1,533 @@
---
description: Guide for embedding mcp-vector-search as a library in MCP servers
---
# mcp-vector-search Library Integration Guide
Comprehensive guide for embedding mcp-vector-search within your own MCP server to provide semantic search capabilities over bundled documentation or resources.
## Overview
mcp-vector-search can be used as a library to add semantic search to custom MCP servers. This enables you to bundle documentation with your server and provide semantic search without requiring users to configure vector search separately.
**Use cases:**
- Bundle documentation with your MCP server
- Provide semantic search over application-specific knowledge bases
- Combine vector search with your own custom tools
- Deliver turnkey MCP servers with integrated search
## Adding the Dependency
Add mcp-vector-search to your `deps.edn`:
```clojure
{:deps
{org.hugoduncan/mcp-vector-search
{:git/url "https://github.com/hugoduncan/mcp-vector-search"
:git/sha "LATEST_SHA_HERE"}}}
```
**Important**: Replace `LATEST_SHA_HERE` with the latest commit SHA from the repository. You can find this by visiting the repository or running:
```bash
git ls-remote https://github.com/hugoduncan/mcp-vector-search HEAD
```
## Classpath vs Filesystem Sources
mcp-vector-search supports two types of source specifications:
### Filesystem Sources
Use the `:path` key with absolute paths:
```clojure
{:sources [{:path "/Users/me/docs/**/*.md"}]}
```
**Characteristics:**
- Absolute paths starting with `/`
- Files read from the filesystem
- Support file watching for automatic re-indexing
- Used for development and user-specific content
### Classpath Sources
Use the `:class-path` key with relative paths:
```clojure
{:sources [{:class-path "docs/**/*.md"}]}
```
**Characteristics:**
- Relative paths without leading `/`
- Resources discovered from classpath (JARs, resource directories)
- No file watching (read-only resources)
- Used for bundled, library-embedded content
**When to use each:**
- **Filesystem**: Development, user-specific content, files that change
- **Classpath**: Production, bundled resources, static documentation
You can mix both types in a single configuration to combine bundled resources with user-specific content.
## Resource Bundling
### Directory Structure
Organize your project to bundle resources:
```
my-mcp-server/
├── resources/
│ ├── .mcp-vector-search/
│ │ └── config.edn # Configuration for bundled resources
│ └── docs/
│ ├── guide.md
│ ├── api/
│ │ └── reference.md
│ └── examples/
│ └── usage.md
├── src/
│ └── my_server/
│ └── main.clj
└── deps.edn
```
**Key points:**
- Place documentation in `resources/` directory
- Configuration goes in `resources/.mcp-vector-search/config.edn`
- Resources will be bundled in the JAR when you package your server
- Standard Clojure project structure with `resources/` on the classpath
### Configuration for Classpath Resources
Create `resources/.mcp-vector-search/config.edn`:
```clojure
{:description "Search my-server documentation"
:sources [{:class-path "docs/**/*.md"
:name "Documentation"
:type "docs"}]}
```
**Configuration rules:**
- Use `:class-path` instead of `:path`
- Paths are relative (no leading `/`)
- All path spec features work: globs (`**/*.md`), named captures (`(?<category>[^/]+)`)
- Metadata extraction works identically
- File watching (`:watch?`) has no effect on classpath sources
### Starting the Server
In your MCP server's main namespace:
```clojure
(ns my-server.main
(:require
[mcp-vector-search.main :as vector-search]))
(defn -main []
;; Start vector search with classpath config
;; The config at resources/.mcp-vector-search/config.edn
;; will be found automatically on the classpath
(vector-search/start)
;; Your server code here
...)
```
The server will:
1. Search for `.mcp-vector-search/config.edn` on the classpath
2. Discover and index resources from `:class-path` sources
3. Provide the search tool via MCP
## Configuration Loading Order
When using mcp-vector-search as a library, configuration is loaded from the first location found:
1. **Classpath**: `resources/.mcp-vector-search/config.edn` (bundled in JAR)
2. **Project directory**: `./.mcp-vector-search/config.edn` (runtime override)
3. **Home directory**: `~/.mcp-vector-search/config.edn` (user-specific)
This loading order enables:
- **Bundling defaults**: Ship default configuration in your JAR
- **Project overrides**: Provide project-specific config during development
- **User customization**: Allow users to customize via home directory
**Example workflow:**
Development with project override:
```bash
# Create project-specific config during development
mkdir -p .mcp-vector-search
cat > .mcp-vector-search/config.edn <<EOF
{:description "Development search with local files"
:watch? true
:sources [{:path "/Users/me/project/docs/**/*.md"}]}
EOF
```
Production with bundled config:
```bash
# Build JAR - includes resources/.mcp-vector-search/config.edn
clojure -T:build uber
# Run - uses bundled config
java -jar target/my-server.jar
```
## JAR Packaging
### Build Configuration
Ensure `resources/` is in your `:paths` in `deps.edn`:
```clojure
{:paths ["src" "resources"]
:deps {org.clojure/clojure {:mvn/version "1.12.3"}
org.hugoduncan/mcp-vector-search
{:git/url "https://github.com/hugoduncan/mcp-vector-search"
:git/sha "LATEST_SHA_HERE"}}
:aliases
{:build {:deps {io.github.clojure/tools.build {:mvn/version "0.9.6"}}
:ns-default build}}}
```
### Build Script
Create `build.clj`:
```clojure
(ns build
(:require [clojure.tools.build.api :as b]))
(def class-dir "target/classes")
(def uber-file "target/my-server.jar")
(defn uber [_]
(b/copy-dir {:src-dirs ["src" "resources"]
:target-dir class-dir})
(b/compile-clj {:basis (b/create-basis)
:class-dir class-dir})
(b/uber {:class-dir class-dir
:uber-file uber-file
:basis (b/create-basis)
:main 'my-server.main}))
```
**Important**: Include both `"src"` and `"resources"` in `:src-dirs` to bundle all resources.
### Building and Running
```bash
# Build the JAR
clojure -T:build uber
# Verify resources are included
jar tf target/my-server.jar | grep -E "(docs|.mcp-vector-search)"
# Run the server
java -jar target/my-server.jar
```
Users can now run your MCP server with semantic search over your bundled documentation, without any configuration needed.
## Example Configurations
### Bundled API Documentation
```clojure
;; resources/.mcp-vector-search/config.edn
{:description "Search API documentation and examples"
:sources [{:class-path "api-docs/**/*.md"
:category "api"}
{:class-path "examples/**/*.clj"
:ingest :namespace-doc
:category "examples"}]}
```
### Mixed Classpath and Filesystem
Combine bundled resources with filesystem sources:
```clojure
{:description "Search bundled docs and local project files"
:sources [
;; Bundled documentation (from JAR)
{:class-path "lib-docs/**/*.md"
:source "library"
:type "docs"}
;; Local project documentation (filesystem)
{:path "/Users/me/project/docs/**/*.md"
:source "project"
:type "docs"}
;; Local source code (filesystem)
{:path "/Users/me/project/src/**/*.clj"
:ingest :namespace-doc
:source "project"
:type "source"}]}
```
### Multi-Language Documentation
Extract language metadata from paths:
```clojure
{:description "Search multi-language documentation"
:sources [{:class-path "docs/(?<lang>en|es|fr)/**/*.md"
:type "documentation"}]}
```
For resource `docs/en/guide.md`:
- Metadata: `{:type "documentation", :lang "en"}`
### Code Analysis for API Discovery
```clojure
{:description "Search bundled API documentation"
:sources [{:class-path "src/**/*.clj"
:ingest :code-analysis
:visibility :public-only
:category "api"}]}
```
## Complete Example
### Directory Structure
```
my-mcp-server/
├── resources/
│ ├── .mcp-vector-search/
│ │ └── config.edn
│ └── docs/
│ ├── README.md
│ └── api/
│ └── reference.md
├── src/
│ └── my_server/
│ └── main.clj
├── deps.edn
└── build.clj
```
### Configuration File
```clojure
;; resources/.mcp-vector-search/config.edn
{:description "Search my-server documentation"
:sources [{:class-path "docs/**/*.md"
:project "my-server"
:type "documentation"}]}
```
### Main Server Code
```clojure
;; src/my_server/main.clj
(ns my-server.main
(:require
[mcp-vector-search.main :as vector-search]
[mcp-vector-search.server :as mcp-server]))
(defn -main []
;; Start vector search (loads classpath config automatically)
(vector-search/start)
;; Add your custom tools here
;; ...
;; Start MCP server
(mcp-server/start!))
```
### Dependencies
```clojure
;; deps.edn
{:paths ["src" "resources"]
:deps {org.clojure/clojure {:mvn/version "1.12.3"}
org.hugoduncan/mcp-vector-search
{:git/url "https://github.com/hugoduncan/mcp-vector-search"
:git/sha "LATEST_SHA_HERE"}}
:aliases
{:build {:deps {io.github.clojure/tools.build {:mvn/version "0.9.6"}}
:ns-default build}}}
```
### Build Configuration
```clojure
;; build.clj
(ns build
(:require [clojure.tools.build.api :as b]))
(def class-dir "target/classes")
(def uber-file "target/my-server.jar")
(defn uber [_]
(b/copy-dir {:src-dirs ["src" "resources"]
:target-dir class-dir})
(b/compile-clj {:basis (b/create-basis)
:class-dir class-dir})
(b/uber {:class-dir class-dir
:uber-file uber-file
:basis (b/create-basis)
:main 'my-server.main}))
```
## Limitations
### No File Watching for Classpath Sources
Classpath sources do not support file watching:
- Resources in JARs are read-only
- Classpath resources are indexed once at startup
- The `:watch?` flag has no effect on `:class-path` sources
**Workaround for development**: Use filesystem sources instead:
```clojure
;; Development config (local file) - with watching
{:watch? true
:sources [{:path "/Users/me/project/resources/docs/**/*.md"}]}
;; Production config (bundled) - no watching needed
{:sources [{:class-path "docs/**/*.md"}]}
```
### Resource Discovery Performance
Classpath resource discovery walks the entire classpath:
- Use specific base paths to limit scope (e.g., `docs/**/*.md` not `**/*.md`)
- Initial indexing may take longer than filesystem sources
- Avoid overly broad patterns for large classpaths
## Troubleshooting
### Configuration Not Found
**Error**: `No configuration file found`
**Check:**
1. File is in `resources/.mcp-vector-search/config.edn`
2. `resources/` is on the classpath (should be automatic in `:paths`)
3. File is valid EDN syntax
4. No typos in directory or file name
**Verify config is on classpath:**
```clojure
(require '[clojure.java.io :as io])
(io/resource ".mcp-vector-search/config.edn") ;; Should return URL
```
### Resources Not Indexed
**Error**: `No documents indexed from classpath sources`
**Check:**
1. Resources exist in `resources/` directory
2. Path patterns match your directory structure
3. No leading `/` in `:class-path` values (should be relative)
4. Resources are included in the JAR (for packaged servers)
**Verify resources are on classpath:**
```clojure
(require '[clojure.java.io :as io])
(io/resource "docs/guide.md") ;; Should return URL
```
### JAR Build Issues
**Error**: Resources missing from JAR
**Check:**
1. Ensure `resources/` is in `:paths` in `deps.edn`:
```clojure
{:paths ["src" "resources"]
:deps {...}}
```
2. Ensure `resources/` is in `:src-dirs` in `build.clj`:
```clojure
(b/copy-dir {:src-dirs ["src" "resources"]
:target-dir class-dir})
```
3. Verify the JAR contains resources:
```bash
jar tf my-server.jar | grep docs
jar tf my-server.jar | grep .mcp-vector-search
```
### Path Pattern Issues
**Error**: Some resources not matched
**Common mistakes:**
- Using leading `/` with `:class-path` (should be relative)
- Forgetting `**` for recursive matching
- Regex special characters not escaped in captures
**Test patterns:**
```clojure
;; Wrong - leading slash
{:class-path "/docs/**/*.md"}
;; Correct - relative path
{:class-path "docs/**/*.md"}
;; Wrong - single glob doesn't recurse
{:class-path "docs/*.md"} ;; only matches docs/file.md
;; Correct - recursive glob
{:class-path "docs/**/*.md"} ;; matches docs/api/file.md
```
## Tips and Best Practices
**Resource organization:**
- Place all bundled content in `resources/` directory
- Use clear directory structure (e.g., `docs/`, `api/`, `examples/`)
- Include `.mcp-vector-search/config.edn` in `resources/`
- Keep resources focused on bundled content
**Configuration strategy:**
- Ship sensible defaults in `resources/.mcp-vector-search/config.edn`
- Use relative paths with `:class-path` for bundled content
- Document how users can override via project or home directory config
- Provide example configs in your server's README
**Path specifications:**
- Use specific base paths (e.g., `docs/**/*.md` not `**/*.md`)
- Leverage named captures for metadata extraction
- Test path patterns during development
- Avoid overly broad patterns for performance
**Build process:**
- Always include `"resources"` in both `:paths` and `:src-dirs`
- Verify JAR contents after building
- Test the JAR in a clean environment
- Document build requirements in your README
**Development workflow:**
- Use filesystem sources with `:watch?` during development
- Switch to classpath sources for production builds
- Keep development and production configs separate
- Test with both config types before releasing
**Testing:**
- Verify config loading from classpath
- Check resource discovery and indexing
- Test search functionality with bundled resources
- Validate JAR packaging and distribution
## See Also
- **doc/using.md** - Complete configuration reference for all strategies and options
- **doc/path-spec.md** - Formal syntax for path specifications and captures
- **doc/about.md** - Project purpose and use cases
- **README.md** - Quick start guide for mcp-vector-search