
Workers AI Integration Example (@cf/baai/bge-base-en-v1.5)

Complete working example using Cloudflare Workers AI for embeddings with Vectorize.

Model Specifications

  • Model: @cf/baai/bge-base-en-v1.5
  • Dimensions: 768
  • Metric: cosine (recommended)
  • Max Input: ~512 tokens (~2000 characters)
  • Cost: Free
  • Latency: ~50-200ms (edge-optimized)
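For intuition about the recommended cosine metric: it scores vectors by angle, not magnitude, so two embeddings pointing the same direction score 1 regardless of length. This sketch is purely illustrative — Vectorize computes similarity server-side and you never implement it yourself:

```typescript
// Illustrative only: the definition of cosine similarity, the metric
// recommended above. Vectorize handles this internally.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];   // dot product
    normA += a[i] * a[i]; // squared magnitude of a
    normB += b[i] * b[i]; // squared magnitude of b
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```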

Setup

1. Create Vectorize Index

npx wrangler vectorize create docs-search \
  --dimensions=768 \
  --metric=cosine \
  --description="Documentation search with Workers AI"

2. Create Metadata Indexes

npx wrangler vectorize create-metadata-index docs-search \
  --property-name=category --type=string

npx wrangler vectorize create-metadata-index docs-search \
  --property-name=published_at --type=number

3. Configure Wrangler

wrangler.jsonc:

{
  "name": "vectorize-workers-ai-example",
  "main": "src/index.ts",
  "compatibility_date": "2025-10-21",
  "ai": {
    "binding": "AI"
  },
  "vectorize": [
    {
      "binding": "VECTORIZE_INDEX",
      "index_name": "docs-search"
    }
  ]
}

Complete Worker Example

export interface Env {
  AI: Ai;
  VECTORIZE_INDEX: VectorizeIndex;
}

interface Document {
  id: string;
  title: string;
  content: string;
  category?: string;
  url?: string;
}

export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    const url = new URL(request.url);

    // INDEX DOCUMENTS
    if (url.pathname === '/index' && request.method === 'POST') {
      try {
        const { documents } = await request.json() as { documents: Document[] };

        if (!documents || !Array.isArray(documents)) {
          return Response.json({ error: 'Invalid documents array' }, { status: 400 });
        }

        // Extract text for embedding
        const texts = documents.map(doc => doc.content);

        // Generate embeddings (batch)
        const embeddings = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
          text: texts
        });

        // Prepare vectors
        const vectors = documents.map((doc, i) => ({
          id: doc.id,
          values: embeddings.data[i],
          metadata: {
            title: doc.title,
            content: doc.content,
            category: doc.category || 'general',
            url: doc.url,
            published_at: Math.floor(Date.now() / 1000),
          },
        }));

        // Upsert to Vectorize
        await env.VECTORIZE_INDEX.upsert(vectors);

        return Response.json({
          success: true,
          indexed: vectors.length,
          ids: vectors.map(v => v.id),
        });
      } catch (error) {
        return Response.json({
          error: error instanceof Error ? error.message : 'Unknown error',
        }, { status: 500 });
      }
    }

    // SEARCH
    if (url.pathname === '/search' && request.method === 'POST') {
      try {
        const { query, topK = 5, filter } = await request.json() as {
          query: string;
          topK?: number;
          filter?: Record<string, any>;
        };

        if (!query) {
          return Response.json({ error: 'Missing query' }, { status: 400 });
        }

        // Generate query embedding
        const queryEmbedding = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
          text: query,
        });

        // Search Vectorize
        const results = await env.VECTORIZE_INDEX.query(
          queryEmbedding.data[0],
          {
            topK,
            filter,
            returnMetadata: 'all',
            returnValues: false,
          }
        );

        return Response.json({
          query,
          results: results.matches.map(match => ({
            id: match.id,
            score: match.score,
            title: match.metadata?.title,
            content: match.metadata?.content,
            category: match.metadata?.category,
            url: match.metadata?.url,
          })),
          count: results.count,
        });
      } catch (error) {
        return Response.json({
          error: error instanceof Error ? error.message : 'Unknown error',
        }, { status: 500 });
      }
    }

    // DEFAULT: API Documentation
    return Response.json({
      name: 'Vectorize + Workers AI Example',
      endpoints: {
        'POST /index': {
          description: 'Index documents',
          body: {
            documents: [
              {
                id: 'doc-1',
                title: 'Document Title',
                content: 'Document content for embedding',
                category: 'tutorials',
                url: '/docs/getting-started',
              },
            ],
          },
        },
        'POST /search': {
          description: 'Semantic search',
          body: {
            query: 'search query text',
            topK: 5,
            filter: { category: 'tutorials' },
          },
        },
      },
    });
  },
};

Usage Examples

1. Index Documents

curl -X POST https://your-worker.workers.dev/index \
  -H "Content-Type: application/json" \
  -d '{
    "documents": [
      {
        "id": "workers-intro",
        "title": "Introduction to Cloudflare Workers",
        "content": "Cloudflare Workers allow you to deploy serverless code globally across Cloudflare's edge network. Workers run on V8 isolates providing fast cold starts.",
        "category": "documentation",
        "url": "/workers/getting-started"
      },
      {
        "id": "vectorize-intro",
        "title": "Introduction to Vectorize",
        "content": "Vectorize is a globally distributed vector database for semantic search and AI applications. It integrates seamlessly with Workers AI for embedding generation.",
        "category": "documentation",
        "url": "/vectorize/getting-started"
      },
      {
        "id": "d1-intro",
        "title": "Introduction to D1",
        "content": "D1 is Cloudflare's serverless SQL database built on SQLite. It provides familiar SQL semantics with global distribution.",
        "category": "documentation",
        "url": "/d1/getting-started"
      }
    ]
  }'
2. Search

curl -X POST https://your-worker.workers.dev/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "How do I deploy serverless functions?",
    "topK": 3,
    "filter": { "category": "documentation" }
  }'

Response:

{
  "query": "How do I deploy serverless functions?",
  "results": [
    {
      "id": "workers-intro",
      "score": 0.87,
      "title": "Introduction to Cloudflare Workers",
      "content": "Cloudflare Workers allow you to deploy...",
      "category": "documentation",
      "url": "/workers/getting-started"
    },
    {
      "id": "vectorize-intro",
      "score": 0.62,
      "title": "Introduction to Vectorize",
      "content": "Vectorize is a globally distributed...",
      "category": "documentation",
      "url": "/vectorize/getting-started"
    }
  ],
  "count": 2
}
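Cosine scores range from -1 to 1, with higher meaning more similar. If you only want confident matches, you can drop low-scoring results client-side; the 0.7 cutoff below is an illustrative choice, not a Vectorize default:

```typescript
interface SearchResult {
  id: string;
  score: number;
}

// Keep only matches at or above a minimum relevance score.
function filterByScore<T extends SearchResult>(results: T[], minScore = 0.7): T[] {
  return results.filter(r => r.score >= minScore);
}
```

Applied to the response above, only `workers-intro` (0.87) would survive a 0.7 cutoff; `vectorize-intro` (0.62) would be dropped.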

Performance Tips

1. Batch Embeddings

// ✅ Good: Single API call for multiple texts
const embeddings = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
  text: [text1, text2, text3, ...] // Up to 100 texts
});

// ❌ Bad: Multiple API calls
for (const text of texts) {
  const embedding = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
    text
  });
}
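Since each call accepts up to 100 texts, larger corpora need to be split into batches first. A minimal sketch (the helper name is illustrative; the 100 limit is as stated above):

```typescript
// Split an array into consecutive batches of at most `size` items.
function toBatches<T>(items: T[], size = 100): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

Each batch can then be passed to `env.AI.run(...)` in turn and the resulting vectors upserted together.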

2. Optimize Return Data

// Only return what you need
const results = await env.VECTORIZE_INDEX.query(queryVector, {
  topK: 5,
  returnValues: false,     // Skip the 768 floats per result
  returnMetadata: 'all',   // Include all metadata fields
});

3. Use Filters

// Narrow search scope with metadata filters
const lastWeek = Math.floor(Date.now() / 1000) - 7 * 24 * 60 * 60; // Unix seconds

const results = await env.VECTORIZE_INDEX.query(queryVector, {
  topK: 5,
  filter: {
    category: 'documentation',
    published_at: { $gte: lastWeek }
  }
});

Error Handling

async function generateEmbedding(text: string, env: Env): Promise<number[]> {
  try {
    const response = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
      text: text.trim().slice(0, 2000) // Truncate to model limits
    });

    if (!response?.data?.[0]) {
      throw new Error('No embedding returned from Workers AI');
    }

    if (response.data[0].length !== 768) {
      throw new Error(`Expected 768 dimensions, got ${response.data[0].length}`);
    }

    return response.data[0];
  } catch (error) {
    console.error('Workers AI embedding error:', error);
    throw new Error(`Failed to generate embedding: ${error instanceof Error ? error.message : 'Unknown error'}`);
  }
}

Testing Locally

# Install dependencies
npm install

# Run dev server
npx wrangler dev

# Test indexing
curl -X POST http://localhost:8787/index \
  -H "Content-Type: application/json" \
  -d '{"documents":[{"id":"test-1","title":"Test","content":"Test content"}]}'

# Test search
curl -X POST http://localhost:8787/search \
  -H "Content-Type: application/json" \
  -d '{"query":"test","topK":5}'

Deployment

# Deploy to production
npx wrangler deploy

# View logs
npx wrangler tail

Common Issues

"Embedding dimensions don't match"

  • Cause: Index created with wrong dimensions
  • Fix: Ensure index has 768 dimensions for bge-base-en-v1.5

"Text too long for model"

  • Cause: Input text exceeds ~2000 characters
  • Fix: Truncate or chunk text before embedding
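One way to chunk is to split long documents at word boundaries before embedding each piece. A minimal sketch, assuming the ~2000-character cap above (boundary handling is deliberately simplistic):

```typescript
// Split text into chunks that fit the model's ~2000-character input limit,
// preferring word boundaries over hard cuts.
function chunkText(text: string, maxChars = 2000): string[] {
  const chunks: string[] = [];
  let remaining = text.trim();
  while (remaining.length > maxChars) {
    // Break at the last space before the limit; fall back to a hard cut.
    let cut = remaining.lastIndexOf(' ', maxChars);
    if (cut <= 0) cut = maxChars;
    chunks.push(remaining.slice(0, cut).trim());
    remaining = remaining.slice(cut).trim();
  }
  if (remaining.length > 0) chunks.push(remaining);
  return chunks;
}
```

Each chunk can then be embedded and upserted under its own vector ID (e.g. `doc-1#0`, `doc-1#1`), with the parent document ID stored in metadata so results can be grouped back together.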

"Rate limiting"

  • Cause: Too many concurrent requests
  • Fix: Workers AI scales automatically, but add retry logic for safety
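A simple retry wrapper with exponential backoff can be layered around the AI call. Note this is a hand-rolled helper sketch, not a Workers API:

```typescript
// Hypothetical helper: retry an async operation with exponential backoff.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 250,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (i < attempts - 1) {
        // Backoff schedule: 250ms, 500ms, 1000ms, ...
        await new Promise(resolve => setTimeout(resolve, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastError;
}
```

Usage inside the Worker might look like `await withRetry(() => env.AI.run('@cf/baai/bge-base-en-v1.5', { text: texts }))`.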

See Also