# Model Providers Reference
Complete reference for SAP AI Core model providers and available models.
**Documentation Source:** [https://github.com/SAP-docs/sap-artificial-intelligence/tree/main/docs/sap-ai-core](https://github.com/SAP-docs/sap-artificial-intelligence/tree/main/docs/sap-ai-core)
**Latest Models:** SAP Note 3437766
---
## Overview
SAP AI Core provides access to models from six providers via the Generative AI Hub. All models are accessed through a unified API, allowing easy switching between providers.
---
## Provider Summary
| Provider | Executable ID | Access Type | Model Categories |
|----------|---------------|-------------|------------------|
| Azure OpenAI | `azure-openai` | Remote | Chat, Embeddings, Vision |
| SAP Open Source | `aicore-opensource` | Local | Chat, Embeddings, Vision |
| Google Vertex AI | `gcp-vertexai` | Remote | Chat, Embeddings, Code |
| AWS Bedrock | `aws-bedrock` | Remote | Chat, Embeddings |
| Mistral AI | `aicore-mistralai` | Local | Chat, Code |
| IBM | `aicore-ibm` | Local | Chat, Code |
---
## 1. Azure OpenAI
**Executable ID:** `azure-openai`
**Access Type:** Remote (Azure-hosted)
### Chat Models
| Model | Context | Capabilities | Use Case |
|-------|---------|--------------|----------|
| gpt-4o | 128K | Chat, Vision | Advanced reasoning, multimodal |
| gpt-4o-mini | 128K | Chat, Vision | Cost-efficient, fast |
| gpt-4-turbo | 128K | Chat, Vision | Previous flagship |
| gpt-4 | 8K/32K | Chat | Reasoning, analysis |
| gpt-35-turbo | 4K/16K | Chat | Fast, economical |
### Embedding Models
| Model | Dimensions | Use Case |
|-------|------------|----------|
| text-embedding-3-large | 3072 | High accuracy embeddings |
| text-embedding-3-small | 1536 | Cost-efficient embeddings |
| text-embedding-ada-002 | 1536 | Legacy embeddings |
### Configuration Example
```json
{
  "name": "azure-gpt4o-config",
  "executableId": "azure-openai",
  "scenarioId": "foundation-models",
  "parameterBindings": [
    {"key": "modelName", "value": "gpt-4o"},
    {"key": "modelVersion", "value": "2024-05-13"}
  ]
}
```
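A configuration like the one above is created by POSTing it to the AI API. The Python sketch below assembles the payload; the `build_model_config` helper is illustrative (not part of any SAP SDK), and the `/v2/lm/configurations` endpoint and header names follow the AI API conventions used elsewhere in this document.

```python
import json

def build_model_config(name, executable_id, model_name, model_version="latest"):
    """Assemble a Generative AI Hub configuration payload (schema as shown above)."""
    return {
        "name": name,
        "executableId": executable_id,
        "scenarioId": "foundation-models",
        "parameterBindings": [
            {"key": "modelName", "value": model_name},
            {"key": "modelVersion", "value": model_version},
        ],
    }

# The same builder covers every provider in this document; only the
# executableId and modelName change.
payload = build_model_config("azure-gpt4o-config", "azure-openai",
                             "gpt-4o", "2024-05-13")
print(json.dumps(payload, indent=2))
# To create the configuration, POST this payload to
# $AI_API_URL/v2/lm/configurations with the Authorization and
# AI-Resource-Group headers shown in the version-check example below.
```

Swapping providers then means calling the same builder with a different `executable_id` and `model_name`, e.g. `("claude-config", "aws-bedrock", "anthropic--claude-3-5-sonnet")`.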
---
## 2. SAP-Hosted Open Source
**Executable ID:** `aicore-opensource`
**Access Type:** Local (SAP-hosted)
### Llama Models
| Model | Parameters | Context | Capabilities |
|-------|------------|---------|--------------|
| llama-3.1-405b | 405B | 128K | Advanced reasoning |
| llama-3.1-70b | 70B | 128K | Strong reasoning |
| llama-3.1-8b | 8B | 128K | Fast, efficient |
| llama-3.2-90b-vision | 90B | 128K | Vision + text |
| llama-3.2-11b-vision | 11B | 128K | Vision + text |
| llama-3.2-3b | 3B | 128K | Lightweight |
| llama-3.2-1b | 1B | 128K | Edge deployment |
### Mistral Models (Open Source)
| Model | Parameters | Context |
|-------|------------|---------|
| mistral-7b-instruct | 7B | 32K |
| mixtral-8x7b | 46.7B | 32K |
### Falcon Models
| Model | Parameters | Context |
|-------|------------|---------|
| falcon-40b | 40B | 2K |
### Configuration Example
```json
{
  "name": "llama-config",
  "executableId": "aicore-opensource",
  "scenarioId": "foundation-models",
  "parameterBindings": [
    {"key": "modelName", "value": "meta--llama-3.1-70b-instruct"},
    {"key": "modelVersion", "value": "latest"}
  ]
}
```
---
## 3. Google Vertex AI
**Executable ID:** `gcp-vertexai`
**Access Type:** Remote (Google Cloud)
### Gemini Models
| Model | Context | Capabilities |
|-------|---------|--------------|
| gemini-2.5-pro | 2M | Chat, Vision, Code, Long context |
| gemini-2.5-flash | 1M | Fast, multimodal |
| gemini-2.5-flash-lite | 1M | Fast, lower-cost multimodal |
| gemini-2.0-flash | 1M | Flash family, multimodal |
| gemini-2.0-flash-lite | 1M | Flash family, lower-cost |
### PaLM 2 Models
| Model | Use Case |
|-------|----------|
| text-bison | Text generation |
| chat-bison | Conversational |
| code-bison | Code generation |
### Embedding Models
| Model | Dimensions |
|-------|------------|
| text-embedding-004 | 768 |
| textembedding-gecko | 768 |
### Configuration Example
```json
{
  "name": "gemini-config",
  "executableId": "gcp-vertexai",
  "scenarioId": "foundation-models",
  "parameterBindings": [
    {"key": "modelName", "value": "gemini-2.5-pro"},
    {"key": "modelVersion", "value": "latest"}
  ]
}
```
---
## 4. AWS Bedrock
**Executable ID:** `aws-bedrock`
**Access Type:** Remote (AWS)
### Anthropic Claude Models
| Model | Context | Capabilities |
|-------|---------|--------------|
| claude-sonnet-4-5 | 200K | Latest, advanced reasoning |
| claude-4-opus | 200K | Highest capability |
| claude-4-sonnet | 200K | Balanced, high performance |
| claude-opus-4-1 | 200K | Extended Opus capabilities |
| claude-3-7-sonnet | 200K | Improved Sonnet 3.5 |
| claude-3-5-sonnet | 200K | Advanced reasoning |
| claude-3-opus | 200K | High capability |
| claude-3-sonnet | 200K | Balanced performance |
| claude-3-haiku | 200K | Fast, efficient |
### Amazon Titan Models
| Model | Use Case |
|-------|----------|
| titan-text-express | General text |
| titan-text-lite | Lightweight |
| titan-embed-text | Embeddings |
### Meta Llama 3 (Bedrock)
| Model | Parameters |
|-------|------------|
| llama-3-70b | 70B |
| llama-3-8b | 8B |
### Configuration Example
```json
{
  "name": "claude-config",
  "executableId": "aws-bedrock",
  "scenarioId": "foundation-models",
  "parameterBindings": [
    {"key": "modelName", "value": "anthropic--claude-3-5-sonnet"},
    {"key": "modelVersion", "value": "latest"}
  ]
}
```
---
## 5. Mistral AI
**Executable ID:** `aicore-mistralai`
**Access Type:** Local (SAP-hosted)
### Models
| Model | Parameters | Context | Use Case |
|-------|------------|---------|----------|
| mistral-large | - | 32K | Advanced reasoning |
| mistral-medium | - | 32K | Balanced |
| mistral-small | - | 32K | Cost-efficient |
| codestral | - | 32K | Code generation |
### Configuration Example
```json
{
  "name": "mistral-config",
  "executableId": "aicore-mistralai",
  "scenarioId": "foundation-models",
  "parameterBindings": [
    {"key": "modelName", "value": "mistralai--mistral-large"},
    {"key": "modelVersion", "value": "latest"}
  ]
}
```
---
## 6. IBM
**Executable ID:** `aicore-ibm`
**Access Type:** Local (SAP-hosted)
### Granite Models
| Model | Parameters | Use Case |
|-------|------------|----------|
| granite-13b-chat | 13B | Conversational |
| granite-13b-instruct | 13B | Task completion |
| granite-code | - | Code generation |
### Configuration Example
```json
{
  "name": "granite-config",
  "executableId": "aicore-ibm",
  "scenarioId": "foundation-models",
  "parameterBindings": [
    {"key": "modelName", "value": "ibm--granite-13b-chat"},
    {"key": "modelVersion", "value": "latest"}
  ]
}
```
---
## Model Selection Guide
### By Use Case
| Use Case | Recommended Models |
|----------|-------------------|
| General chat | gpt-4o, claude-3-5-sonnet, gemini-2.5-pro |
| Code generation | gpt-4o, codestral, claude-3-5-sonnet |
| Long documents | gemini-2.5-pro (2M), claude-3 (200K), gpt-4o (128K) |
| Vision/images | gpt-4o, gemini-2.5-pro, llama-3.2-vision |
| Embeddings | text-embedding-3-large, text-embedding-004 |
| Cost-sensitive | gpt-4o-mini, mistral-small, llama-3.1-8b |
| High throughput | gpt-35-turbo, claude-3-haiku, mistral-small |
### By Budget
| Budget | Tier | Models |
|--------|------|--------|
| Low | Economy | gpt-4o-mini, claude-3-haiku, mistral-small |
| Medium | Standard | gpt-4o, claude-3-sonnet, gemini-2.5-flash |
| High | Premium | claude-3-opus, gpt-4-turbo, gemini-2.5-pro |
### By Capability
| Capability | Best Models |
|------------|-------------|
| Reasoning | claude-3-opus, gpt-4o, llama-3.1-405b |
| Speed | claude-3-haiku, gpt-35-turbo, mistral-small |
| Context length | gemini-2.5-pro (2M), claude-3 (200K) |
| Multimodal | gpt-4o, gemini-2.5-pro, llama-3.2-vision |
| Code | codestral, gpt-4o, claude-3-5-sonnet |
---
## Model Version Management
### Version Strategies
| Strategy | Configuration | Use Case |
|----------|---------------|----------|
| Latest | `"modelVersion": "latest"` | Development, auto-upgrade |
| Pinned | `"modelVersion": "2024-05-13"` | Production stability |
### Checking Available Versions
```bash
curl -X GET "$AI_API_URL/v2/lm/scenarios/foundation-models/models" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default" | \
  jq '.resources[] | select(.model == "gpt-4o") | .versions'
```
### Handling Deprecation
1. Monitor `deprecationDate` in model metadata
2. Plan migration before `retirementDate`
3. Test new version in staging
4. Update configuration with new version
5. Patch existing deployments
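Steps 1 and 2 can be automated as a filter over the model metadata returned by the scenarios endpoint above. The sketch below assumes `deprecationDate` is an ISO date string in each entry of the `resources` list; the exact response shape is an assumption, not taken from the API specification.

```python
from datetime import date, datetime

def models_needing_migration(models, today, horizon_days=90):
    """Flag models whose deprecationDate falls within the planning horizon."""
    flagged = []
    for m in models:
        dep = m.get("deprecationDate")
        if not dep:
            continue  # no announced deprecation
        dep_date = datetime.strptime(dep, "%Y-%m-%d").date()
        if (dep_date - today).days <= horizon_days:
            flagged.append(m["model"])
    return flagged

# Example with hypothetical metadata:
soon = models_needing_migration(
    [{"model": "gpt-4", "deprecationDate": "2025-06-01"},
     {"model": "gpt-4o"}],
    today=date(2025, 5, 1))
```

Running such a check on a schedule gives early warning to start steps 3-5 well before the `retirementDate`.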
---
## Pricing Considerations
Pricing varies by:
- Model complexity (larger = more expensive)
- Input vs output tokens (output often 2-3x input cost)
- Provider region
- Access type (Remote vs Local)
**Reference:** SAP Note 3437766 for current token rates.
### Cost Optimization
1. **Right-size models**: Use smaller models for simple tasks
2. **Batch requests**: Combine multiple queries when possible
3. **Cache responses**: Store and reuse common query results
4. **Limit tokens**: Set appropriate `max_tokens` limits
5. **Use streaming**: No additional cost, better UX
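Response caching (point 3) can be as simple as memoizing on the request. The sketch below keeps an in-memory dict keyed by a hash of model and prompt; `call_model` is a placeholder for your actual inference call, not an SDK function.

```python
import hashlib
import json

_cache = {}

def cached_completion(model, prompt, call_model):
    """Return a cached response for repeated (model, prompt) pairs."""
    key = hashlib.sha256(json.dumps([model, prompt]).encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_model(model, prompt)  # only pay tokens on a miss
    return _cache[key]
```

For production use, a shared cache (e.g. Redis) with a TTL is more appropriate, and caching should be skipped when sampling settings make responses intentionally non-deterministic.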
---
## Rate Limits
Rate limits vary by:
- Service plan tier
- Model provider
- Specific model
**Default limits** (vary by configuration):
- Requests per minute: 60-600
- Tokens per minute: 40K-400K
### Handling Rate Limits
```python
import time

from requests.exceptions import HTTPError

def call_with_retry(func, max_retries=3):
    """Call func, retrying with exponential backoff on HTTP 429 responses."""
    for attempt in range(max_retries):
        try:
            return func()
        except HTTPError as e:
            if e.response.status_code == 429:
                # Back off exponentially: 1s, 2s, 4s, ...
                time.sleep(2 ** attempt)
            else:
                raise  # non-rate-limit errors propagate immediately
    raise RuntimeError("Max retries exceeded")
```
---
## Documentation Links
- Supported Models: [https://github.com/SAP-docs/sap-artificial-intelligence/blob/main/docs/sap-ai-core/supported-models-509e588.md](https://github.com/SAP-docs/sap-artificial-intelligence/blob/main/docs/sap-ai-core/supported-models-509e588.md)
- Generative AI Hub: [https://github.com/SAP-docs/sap-artificial-intelligence/blob/main/docs/sap-ai-core/generative-ai-hub-7db524e.md](https://github.com/SAP-docs/sap-artificial-intelligence/blob/main/docs/sap-ai-core/generative-ai-hub-7db524e.md)
- SAP Note 3437766: Token rates, limits, deprecation dates
- SAP Discovery Center: [https://discovery-center.cloud.sap/serviceCatalog/sap-ai-core](https://discovery-center.cloud.sap/serviceCatalog/sap-ai-core)