Initial commit

2025-11-30 08:24:38 +08:00
commit b41966ed51
12 changed files with 3508 additions and 0 deletions
--- a/references/models-catalog.md
+++ b/references/models-catalog.md
@@ -0,0 +1,245 @@
+# Cloudflare Workers AI - Models Catalog
+
+Complete catalog of Workers AI models organized by task type.
+
+**Last Updated**: 2025-10-21
+**Official Catalog**: https://developers.cloudflare.com/workers-ai/models/
+
+---
+
+## Text Generation (LLMs)
+
+### Meta Llama Models
+
+| Model ID | Size | Best For | Rate Limit |
+|----------|------|----------|------------|
+| `@cf/meta/llama-3.1-8b-instruct` | 8B | General purpose, balanced | 300/min |
+| `@cf/meta/llama-3.1-8b-instruct-fast` | 8B | Faster inference | 300/min |
+| `@cf/meta/llama-3.2-1b-instruct` | 1B | Ultra-fast, simple tasks | 300/min |
+| `@cf/meta/llama-3.2-3b-instruct` | 3B | Fast, good quality | 300/min |
+| `@cf/meta/llama-2-7b-chat-int8` | 7B | Legacy, reliable | 300/min |
+| `@cf/meta/llama-2-13b-chat-awq` | 13B | Higher quality (slower) | 300/min |
+
+### Qwen Models
+
+| Model ID | Size | Best For | Rate Limit |
+|----------|------|----------|------------|
+| `@cf/qwen/qwen1.5-14b-chat-awq` | 14B | High quality, complex reasoning | 150/min |
+| `@cf/qwen/qwen1.5-7b-chat-awq` | 7B | Balanced quality/speed | 300/min |
+| `@cf/qwen/qwen1.5-1.8b-chat` | 1.8B | Fast, lightweight | 720/min |
+| `@cf/qwen/qwen1.5-0.5b-chat` | 0.5B | Ultra-fast, ultra-lightweight | 1500/min |
+
+### Mistral Models
+
+| Model ID | Size | Best For | Rate Limit |
+|----------|------|----------|------------|
+| `@hf/thebloke/mistral-7b-instruct-v0.1-awq` | 7B | Fast, efficient | 400/min |
+| `@hf/thebloke/openhermes-2.5-mistral-7b-awq` | 7B | Instruction following | 300/min |
+
+### DeepSeek Models
+
+| Model ID | Size | Best For | Rate Limit |
+|----------|------|----------|------------|
+| `@cf/deepseek-ai/deepseek-r1-distill-qwen-32b` | 32B | Coding, technical content | 300/min |
+| `@cf/deepseek-ai/deepseek-coder-6.7b-instruct-awq` | 6.7B | Code generation | 300/min |
+
+### Other Models
+
+| Model ID | Size | Best For | Rate Limit |
+|----------|------|----------|------------|
+| `@cf/tinyllama/tinyllama-1.1b-chat-v1.0` | 1.1B | Extremely fast, limited capability | 720/min |
+| `@cf/microsoft/phi-2` | 2.7B | Fast, efficient | 720/min |
+| `@cf/google/gemma-2b-it-lora` | 2B | Instruction tuned | 300/min |
+| `@cf/google/gemma-7b-it-lora` | 7B | Higher quality | 300/min |
+
+---
+
+## Text Embeddings
+
+| Model ID | Dimensions | Best For | Rate Limit |
+|----------|-----------|----------|------------|
+| `@cf/baai/bge-base-en-v1.5` | 768 | General purpose RAG | 3000/min |
+| `@cf/baai/bge-large-en-v1.5` | 1024 | High accuracy search | 1500/min |
+| `@cf/baai/bge-small-en-v1.5` | 384 | Fast, low storage | 3000/min |
+| `@cf/baai/bge-m3` | 1024 | Multilingual | 3000/min |
+
+**Use Cases**: RAG, semantic search, similarity detection, clustering
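
Semantic search over these embeddings usually comes down to comparing vectors. The sketch below ranks documents by cosine similarity to a query vector. The function names and the 3-dimensional vectors are illustrative assumptions, not Workers AI output; real vectors would have the 384, 768, or 1024 dimensions listed in the table.

```typescript
// Hypothetical document shape for this sketch: an ID plus its embedding.
interface EmbeddedDoc {
  id: string;
  vector: number[];
}

// Cosine similarity between two equal-length vectors (1 = same direction).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same number of dimensions");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector has no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return document IDs ordered from most to least similar to the query.
function rankBySimilarity(query: number[], docs: EmbeddedDoc[]): string[] {
  return docs
    .map((d) => ({ id: d.id, score: cosineSimilarity(query, d.vector) }))
    .sort((x, y) => y.score - x.score)
    .map((d) => d.id);
}
```

Vectors from different models should not be compared with each other, since the dimension check fails for mismatched sizes and the vector spaces differ even when sizes match.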
+
+---
+
+## Image Generation
+
+| Model ID | Type | Best For | Rate Limit |
+|----------|------|----------|------------|
+| `@cf/black-forest-labs/flux-1-schnell` | Text-to-Image | Photorealistic, high quality | 720/min |
+| `@cf/stabilityai/stable-diffusion-xl-base-1.0` | Text-to-Image | General purpose | 720/min |
+| `@cf/lykon/dreamshaper-8-lcm` | Text-to-Image | Artistic, stylized | 720/min |
+| `@cf/runwayml/stable-diffusion-v1-5-img2img` | Image-to-Image | Transform images | 1500/min |
+| `@cf/runwayml/stable-diffusion-v1-5-inpainting` | Inpainting | Edit specific areas | 1500/min |
+| `@cf/bytedance/stable-diffusion-xl-lightning` | Text-to-Image | Fast generation | 720/min |
+
+**Output**: PNG images (~5 MB max)
+
+---
+
+## Vision Models
+
+| Model ID | Task | Best For | Rate Limit |
+|----------|------|----------|------------|
+| `@cf/meta/llama-3.2-11b-vision-instruct` | Image Understanding | Q&A, captioning, analysis | 720/min |
+| `@cf/unum/uform-gen2-qwen-500m` | Image Captioning | Fast captions | 720/min |
+
+**Input**: Base64-encoded images
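
As a rough illustration of the Base64 input requirement, the sketch below encodes raw image bytes into a Base64 string. `toBase64` is a hypothetical helper written out by hand so the example does not depend on any particular runtime; most runtimes ship a built-in encoder that would normally be preferable.

```typescript
const BASE64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode bytes as standard Base64 with "=" padding.
function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    // Pack up to three bytes into one 24-bit group; missing bytes count as 0.
    const b0 = bytes[i] ?? 0;
    const b1 = bytes[i + 1] ?? 0;
    const b2 = bytes[i + 2] ?? 0;
    const group = (b0 << 16) | (b1 << 8) | b2;

    // Emit four 6-bit characters, padding where input bytes were missing.
    out += BASE64_ALPHABET.charAt((group >> 18) & 63);
    out += BASE64_ALPHABET.charAt((group >> 12) & 63);
    out += i + 1 < bytes.length ? BASE64_ALPHABET.charAt((group >> 6) & 63) : "=";
    out += i + 2 < bytes.length ? BASE64_ALPHABET.charAt(group & 63) : "=";
  }
  return out;
}
```

Base64 output is about one third larger than the raw bytes, which is worth keeping in mind for large images.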
+
+---
+
+## Translation
+
+| Model ID | Languages | Rate Limit |
+|----------|-----------|------------|
+| `@cf/meta/m2m100-1.2b` | 100+ languages | 720/min |
+
+**Supported Language Pairs**: https://developers.cloudflare.com/workers-ai/models/m2m100-1.2b/
+
+---
+
+## Text Classification
+
+| Model ID | Task | Rate Limit |
+|----------|------|------------|
+| `@cf/huggingface/distilbert-sst-2-int8` | Sentiment analysis | 2000/min |
+| `@hf/thebloke/openhermes-2.5-mistral-7b-awq` | General classification | 300/min |
+
+**Output**: Label + confidence score
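
A label plus a confidence score usually needs a small amount of post-processing before it drives application logic. The sketch below picks the highest-scoring label and rejects it when the score is under a threshold. The `LabelScore` shape and the 0.5 default are assumptions for illustration; check the model's documented response format before relying on them.

```typescript
// Assumed result shape for this sketch: one entry per candidate label.
interface LabelScore {
  label: string;
  score: number; // confidence in the range 0..1
}

// Return the best label, or null when nothing reaches the minimum score.
function topLabel(results: LabelScore[], minScore: number = 0.5): LabelScore | null {
  let best: LabelScore | null = null;
  for (const candidate of results) {
    if (best === null || candidate.score > best.score) {
      best = candidate;
    }
  }
  return best !== null && best.score >= minScore ? best : null;
}
```

Returning `null` instead of a low-confidence label lets the caller fall back to a default or route the input for manual review.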
+
+---
+
+## Automatic Speech Recognition
+
+| Model ID | Best For | Rate Limit |
+|----------|----------|------------|
+| `@cf/openai/whisper` | General transcription | 720/min |
+| `@cf/openai/whisper-tiny-en` | English only, fast | 720/min |
+
+**Input**: Audio files (MP3, WAV, etc.)
+
+---
+
+## Object Detection
+
+| Model ID | Task | Rate Limit |
+|----------|------|------------|
+| `@cf/facebook/detr-resnet-50` | Object detection | 3000/min |
+
+**Output**: Bounding boxes + labels
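
Detections are typically filtered by confidence before use. The sketch below keeps detections above a minimum score, sorts them from most to least confident, and computes a box area. The `Detection` shape (corner coordinates named `xmin`, `ymin`, `xmax`, `ymax`) is an assumption for illustration and may not match the model's actual response.

```typescript
// Assumed shapes for this sketch; verify against the real response format.
interface Box {
  xmin: number;
  ymin: number;
  xmax: number;
  ymax: number;
}

interface Detection {
  label: string;
  score: number;
  box: Box;
}

// Keep detections at or above minScore, most confident first.
// filter() returns a new array, so the input is not mutated by sort().
function confidentDetections(detections: Detection[], minScore: number): Detection[] {
  return detections
    .filter((d) => d.score >= minScore)
    .sort((a, b) => b.score - a.score);
}

// Area of a bounding box in squared pixels (0 for degenerate boxes).
function boxArea(box: Box): number {
  const width = Math.max(0, box.xmax - box.xmin);
  const height = Math.max(0, box.ymax - box.ymin);
  return width * height;
}
```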
+
+---
+
+## Image Classification
+
+| Model ID | Classes | Rate Limit |
+|----------|---------|------------|
+| `@cf/microsoft/resnet-50` | 1000 ImageNet classes | 3000/min |
+
+**Output**: Top-5 predictions with probabilities
+
+---
+
+## Summarization
+
+| Model ID | Best For | Rate Limit |
+|----------|----------|------------|
+| `@cf/facebook/bart-large-cnn` | News articles, documents | 1500/min |
+
+---
+
+## Image-to-Image (Legacy)
+
+| Model ID | Type | Rate Limit |
+|----------|------|------------|
+| `@cf/runwayml/stable-diffusion-v1-5-img2img` | Image-to-Image | 1500/min |
+
+---
+
+## Model Selection Guide
+
+### For Text Generation
+
+**Speed Priority:**
+1. `@cf/qwen/qwen1.5-0.5b-chat` (1500/min)
+2. `@cf/meta/llama-3.2-1b-instruct` (300/min)
+3. `@cf/tinyllama/tinyllama-1.1b-chat-v1.0` (720/min)
+
+**Quality Priority:**
+1. `@cf/qwen/qwen1.5-14b-chat-awq` (150/min)
+2. `@cf/deepseek-ai/deepseek-r1-distill-qwen-32b` (300/min)
+3. `@cf/meta/llama-3.1-8b-instruct` (300/min)
+
+**Balanced:**
+1. `@cf/meta/llama-3.1-8b-instruct` (300/min)
+2. `@hf/thebloke/mistral-7b-instruct-v0.1-awq` (400/min)
+3. `@cf/qwen/qwen1.5-7b-chat-awq` (300/min)
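
The three ranked lists above can be encoded as a lookup with fallback. This is a minimal sketch: `pickTextModel` and the `unavailable` set are hypothetical names, and only the model IDs and their ordering come from the lists.

```typescript
type Priority = "speed" | "quality" | "balanced";

// Candidate models per priority, in the order given in the guide.
const TEXT_MODELS: Record<Priority, readonly string[]> = {
  speed: [
    "@cf/qwen/qwen1.5-0.5b-chat",
    "@cf/meta/llama-3.2-1b-instruct",
    "@cf/tinyllama/tinyllama-1.1b-chat-v1.0",
  ],
  quality: [
    "@cf/qwen/qwen1.5-14b-chat-awq",
    "@cf/deepseek-ai/deepseek-r1-distill-qwen-32b",
    "@cf/meta/llama-3.1-8b-instruct",
  ],
  balanced: [
    "@cf/meta/llama-3.1-8b-instruct",
    "@hf/thebloke/mistral-7b-instruct-v0.1-awq",
    "@cf/qwen/qwen1.5-7b-chat-awq",
  ],
};

// Return the first candidate that is not marked unavailable,
// or undefined when every candidate has been excluded.
function pickTextModel(
  priority: Priority,
  unavailable: ReadonlySet<string> = new Set<string>(),
): string | undefined {
  return TEXT_MODELS[priority].find((id) => !unavailable.has(id));
}
```

Keeping the ordering in data rather than in branching logic makes it easy to update when the catalog changes.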
+
+### For Embeddings
+
+**General Purpose RAG:**
+- `@cf/baai/bge-base-en-v1.5` (768 dims, 3000/min)
+
+**High Accuracy:**
+- `@cf/baai/bge-large-en-v1.5` (1024 dims, 1500/min)
+
+**Fast/Low Storage:**
+- `@cf/baai/bge-small-en-v1.5` (384 dims, 3000/min)
+
+### For Image Generation
+
+**Best Quality:**
+- `@cf/black-forest-labs/flux-1-schnell`
+
+**General Purpose:**
+- `@cf/stabilityai/stable-diffusion-xl-base-1.0`
+
+**Artistic/Stylized:**
+- `@cf/lykon/dreamshaper-8-lcm`
+
+**Fast:**
+- `@cf/bytedance/stable-diffusion-xl-lightning`
+
+---
+
+## Rate Limits Summary
+
+| Task Type | Default Limit | Exceptions |
+|-----------|---------------|------------|
+| Text Generation | 300/min | 150/min (`qwen1.5-14b-chat-awq`); 400-1500/min (select smaller models) |
+| Text Embeddings | 3000/min | 1500/min (`bge-large-en-v1.5`) |
+| Image Generation | 720/min | 1500/min (img2img, inpainting) |
+| Vision Models | 720/min | None |
+| Translation | 720/min | None |
+| Text Classification | 2000/min | 300/min (`openhermes-2.5-mistral-7b-awq`) |
+| Speech Recognition | 720/min | None |
+| Object Detection | 3000/min | None |
+| Image Classification | 3000/min | None |
+| Summarization | 1500/min | None |
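
To stay under a per-minute limit from the client side, a caller can track its own request times. The sketch below is an assumption about how one might do that, not a description of how Workers AI enforces limits: `minIntervalMs` gives the spacing for a steady stream, and `SlidingWindowLimiter` (a hypothetical class) allows a request only when fewer than the limit were sent in the last 60 seconds. Timestamps are passed in explicitly so the logic is easy to test.

```typescript
// Spacing between requests that keeps a steady stream at the limit.
// For example, 300/min works out to one request every 200 ms.
function minIntervalMs(limitPerMinute: number): number {
  if (limitPerMinute <= 0) throw new Error("limit must be positive");
  return 60000 / limitPerMinute;
}

class SlidingWindowLimiter {
  private readonly limitPerMinute: number;
  private readonly sent: number[] = []; // timestamps (ms) of accepted requests

  constructor(limitPerMinute: number) {
    this.limitPerMinute = limitPerMinute;
  }

  // Returns true and records the request if it fits in the current window.
  tryAcquire(nowMs: number): boolean {
    const cutoff = nowMs - 60000;
    // Drop timestamps that are 60 seconds old or older.
    while (this.sent.length > 0 && (this.sent[0] ?? Infinity) <= cutoff) {
      this.sent.shift();
    }
    if (this.sent.length >= this.limitPerMinute) return false;
    this.sent.push(nowMs);
    return true;
  }
}
```

A client-side limiter only reduces the chance of rejected requests; the server-side limit remains the source of truth, so callers should still handle rate-limit errors.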
+
+---
+
+## Pricing (Neurons)
+
+Pricing varies by model. Common examples:
+
+| Model | Input (per 1M tokens) | Output (per 1M tokens) |
+|-------|-----------------------|------------------------|
+| Llama 3.2 1B | $0.027 | $0.201 |
+| Llama 3.1 8B | $0.088 | $0.606 |
+| BGE-base embeddings | $0.005 | N/A |
+| Flux image gen | ~$0.011/image | N/A |
+
+- **Free Tier**: 10,000 neurons/day
+- **Paid Tier**: $0.011 per 1,000 neurons
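
The figures above can be turned into a rough cost estimate. The sketch below uses the Llama 3.1 8B token prices and the $0.011 per 1,000 neurons rate from this section; the function names are hypothetical, and converting dollars back to neurons assumes the token prices are the neuron rate expressed per token, which should be checked against the official pricing page.

```typescript
interface TokenPrice {
  inputPerMillion: number; // USD per 1M input tokens
  outputPerMillion: number; // USD per 1M output tokens
}

// Prices copied from the table in this section.
const LLAMA_3_1_8B: TokenPrice = { inputPerMillion: 0.088, outputPerMillion: 0.606 };
const USD_PER_1000_NEURONS = 0.011;

// Estimated USD cost for a given number of input and output tokens.
function tokenCostUsd(price: TokenPrice, inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1_000_000) * price.inputPerMillion +
    (outputTokens / 1_000_000) * price.outputPerMillion
  );
}

// Convert a USD amount to neurons at the paid-tier rate.
function usdToNeurons(usd: number): number {
  return (usd / USD_PER_1000_NEURONS) * 1000;
}
```

For example, 2M input tokens and 0.5M output tokens on Llama 3.1 8B come to 2 × $0.088 + 0.5 × $0.606 = $0.479. By the same rate, the 10,000 neurons/day free tier corresponds to about $0.11 of usage per day.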
+
+---
+
+## References
+
+- [Official Models Catalog](https://developers.cloudflare.com/workers-ai/models/)
+- [Rate Limits](https://developers.cloudflare.com/workers-ai/platform/limits/)
+- [Pricing](https://developers.cloudflare.com/workers-ai/platform/pricing/)