# Model Providers Reference

Complete reference for SAP AI Core model providers and available models.

**Documentation Source:** [https://github.com/SAP-docs/sap-artificial-intelligence/tree/main/docs/sap-ai-core](https://github.com/SAP-docs/sap-artificial-intelligence/tree/main/docs/sap-ai-core)

**Latest Models:** SAP Note 3437766

---

## Overview

SAP AI Core provides access to models from six providers via the Generative AI Hub. All models are accessed through a unified API, allowing easy switching between providers.
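
For example, a deployed model is reached through its deployment's inference endpoint. A minimal sketch, assuming an Azure OpenAI-style chat-completions deployment; the host, token, deployment ID, and `api-version` value are placeholders, and other providers may expect a different payload shape:

```python
import requests

# Placeholder values -- substitute your AI API URL, OAuth token,
# and the deployment ID returned when you deploy a model.
AI_API_URL = "https://<your-ai-api-host>"
AUTH_TOKEN = "<oauth-token>"
DEPLOYMENT_ID = "<deployment-id>"

response = requests.post(
    f"{AI_API_URL}/v2/inference/deployments/{DEPLOYMENT_ID}/chat/completions",
    headers={
        "Authorization": f"Bearer {AUTH_TOKEN}",
        "AI-Resource-Group": "default",
    },
    params={"api-version": "2024-02-01"},  # illustrative version string
    json={"messages": [{"role": "user", "content": "Hello"}]},
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```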

---

## Provider Summary

| Provider | Executable ID | Access Type | Model Categories |
|----------|---------------|-------------|------------------|
| Azure OpenAI | `azure-openai` | Remote | Chat, Embeddings, Vision |
| SAP Open Source | `aicore-opensource` | Local | Chat, Embeddings, Vision |
| Google Vertex AI | `gcp-vertexai` | Remote | Chat, Embeddings, Code |
| AWS Bedrock | `aws-bedrock` | Remote | Chat, Embeddings |
| Mistral AI | `aicore-mistralai` | Local | Chat, Code |
| IBM | `aicore-ibm` | Local | Chat, Code |

---

## 1. Azure OpenAI

**Executable ID:** `azure-openai`

**Access Type:** Remote (Azure-hosted)

### Chat Models

| Model | Context | Capabilities | Use Case |
|-------|---------|--------------|----------|
| gpt-4o | 128K | Chat, Vision | Advanced reasoning, multimodal |
| gpt-4o-mini | 128K | Chat, Vision | Cost-efficient, fast |
| gpt-4-turbo | 128K | Chat, Vision | Previous flagship |
| gpt-4 | 8K/32K | Chat | Reasoning, analysis |
| gpt-35-turbo | 4K/16K | Chat | Fast, economical |

### Embedding Models

| Model | Dimensions | Use Case |
|-------|------------|----------|
| text-embedding-3-large | 3072 | High accuracy embeddings |
| text-embedding-3-small | 1536 | Cost-efficient embeddings |
| text-embedding-ada-002 | 1536 | Legacy embeddings |

### Configuration Example

```json
{
  "name": "azure-gpt4o-config",
  "executableId": "azure-openai",
  "scenarioId": "foundation-models",
  "parameterBindings": [
    {"key": "modelName", "value": "gpt-4o"},
    {"key": "modelVersion", "value": "2024-05-13"}
  ]
}
```
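
A configuration like the one above can also be created programmatically. A minimal sketch, assuming the AI API's `POST /v2/lm/configurations` endpoint and the `AI_API_URL`/`AUTH_TOKEN` variables used elsewhere in this document:

```python
import os
import requests

payload = {
    "name": "azure-gpt4o-config",
    "executableId": "azure-openai",
    "scenarioId": "foundation-models",
    "parameterBindings": [
        {"key": "modelName", "value": "gpt-4o"},
        {"key": "modelVersion", "value": "2024-05-13"},
    ],
}

resp = requests.post(
    f"{os.environ['AI_API_URL']}/v2/lm/configurations",
    headers={
        "Authorization": f"Bearer {os.environ['AUTH_TOKEN']}",
        "AI-Resource-Group": "default",
    },
    json=payload,
)
resp.raise_for_status()
print(resp.json()["id"])  # configuration ID, used when creating a deployment
```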

---

## 2. SAP-Hosted Open Source

**Executable ID:** `aicore-opensource`

**Access Type:** Local (SAP-hosted)

### Llama Models

| Model | Parameters | Context | Capabilities |
|-------|------------|---------|--------------|
| llama-3.1-405b | 405B | 128K | Advanced reasoning |
| llama-3.1-70b | 70B | 128K | Strong reasoning |
| llama-3.1-8b | 8B | 128K | Fast, efficient |
| llama-3.2-90b-vision | 90B | 128K | Vision + text |
| llama-3.2-11b-vision | 11B | 128K | Vision + text |
| llama-3.2-3b | 3B | 128K | Lightweight |
| llama-3.2-1b | 1B | 128K | Edge deployment |

### Mistral Models (Open Source)

| Model | Parameters | Context |
|-------|------------|---------|
| mistral-7b-instruct | 7B | 32K |
| mixtral-8x7b | 46.7B | 32K |

### Falcon Models

| Model | Parameters | Context |
|-------|------------|---------|
| falcon-40b | 40B | 2K |

### Configuration Example

```json
{
  "name": "llama-config",
  "executableId": "aicore-opensource",
  "scenarioId": "foundation-models",
  "parameterBindings": [
    {"key": "modelName", "value": "meta--llama-3.1-70b-instruct"},
    {"key": "modelVersion", "value": "latest"}
  ]
}
```
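
Once a configuration exists, a deployment is created from it. A minimal sketch, assuming the AI API's `POST /v2/lm/deployments` endpoint and a configuration ID from the previous step:

```python
import os
import requests

# Placeholder: the ID returned by the configuration-creation call.
configuration_id = "<configuration-id>"

resp = requests.post(
    f"{os.environ['AI_API_URL']}/v2/lm/deployments",
    headers={
        "Authorization": f"Bearer {os.environ['AUTH_TOKEN']}",
        "AI-Resource-Group": "default",
    },
    json={"configurationId": configuration_id},
)
resp.raise_for_status()
print(resp.json()["id"])  # deployment ID; poll its status until RUNNING
```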

---

## 3. Google Vertex AI

**Executable ID:** `gcp-vertexai`

**Access Type:** Remote (Google Cloud)

### Gemini Models

| Model | Context | Capabilities |
|-------|---------|--------------|
| gemini-2.5-pro | 2M | Chat, Vision, Code, Long context |
| gemini-2.5-flash | 1M | Fast, multimodal |
| gemini-2.5-flash-lite | 1M | Fast, lower-cost multimodal |
| gemini-2.0-flash | 1M | Flash family, multimodal |
| gemini-2.0-flash-lite | 1M | Flash family, lower-cost |

### PaLM 2 Models

| Model | Use Case |
|-------|----------|
| text-bison | Text generation |
| chat-bison | Conversational |
| code-bison | Code generation |

### Embedding Models

| Model | Dimensions |
|-------|------------|
| text-embedding-004 | 768 |
| textembedding-gecko | 768 |

### Configuration Example

```json
{
  "name": "gemini-config",
  "executableId": "gcp-vertexai",
  "scenarioId": "foundation-models",
  "parameterBindings": [
    {"key": "modelName", "value": "gemini-2.5-pro"},
    {"key": "modelVersion", "value": "latest"}
  ]
}
```
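
After creating a deployment, poll its status until it reaches `RUNNING` before sending inference requests. A minimal sketch, assuming `GET /v2/lm/deployments/{id}` returns a `status` field:

```python
import os
import time
import requests

def wait_until_running(deployment_id, timeout=600, interval=15):
    """Poll the deployment until it is RUNNING, or fail on timeout/DEAD."""
    deadline = time.time() + timeout
    url = f"{os.environ['AI_API_URL']}/v2/lm/deployments/{deployment_id}"
    headers = {
        "Authorization": f"Bearer {os.environ['AUTH_TOKEN']}",
        "AI-Resource-Group": "default",
    }
    while time.time() < deadline:
        status = requests.get(url, headers=headers).json().get("status")
        if status == "RUNNING":
            return
        if status == "DEAD":
            raise RuntimeError("Deployment failed")
        time.sleep(interval)
    raise TimeoutError("Deployment did not reach RUNNING in time")
```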

---

## 4. AWS Bedrock

**Executable ID:** `aws-bedrock`

**Access Type:** Remote (AWS)

### Anthropic Claude Models

| Model | Context | Capabilities |
|-------|---------|--------------|
| claude-sonnet-4-5 | 200K | Latest, advanced reasoning |
| claude-4-opus | 200K | Highest capability |
| claude-4-sonnet | 200K | Balanced, high performance |
| claude-opus-4-1 | 200K | Extended Opus capabilities |
| claude-3-7-sonnet | 200K | Improved Sonnet 3.5 |
| claude-3-5-sonnet | 200K | Advanced reasoning |
| claude-3-opus | 200K | High capability |
| claude-3-sonnet | 200K | Balanced performance |
| claude-3-haiku | 200K | Fast, efficient |

### Amazon Titan Models

| Model | Use Case |
|-------|----------|
| titan-text-express | General text |
| titan-text-lite | Lightweight |
| titan-embed-text | Embeddings |

### Meta Llama 3 (Bedrock)

| Model | Parameters |
|-------|------------|
| llama-3-70b | 70B |
| llama-3-8b | 8B |

### Configuration Example

```json
{
  "name": "claude-config",
  "executableId": "aws-bedrock",
  "scenarioId": "foundation-models",
  "parameterBindings": [
    {"key": "modelName", "value": "anthropic--claude-3-5-sonnet"},
    {"key": "modelVersion", "value": "latest"}
  ]
}
```
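
Because every provider sits behind the same unified API, moving a workload to a different model or provider is a configuration change rather than a code change. A hedged sketch, assuming deployments can be patched with a new `configurationId` (verify against the AI API reference for your service version before relying on this):

```python
import os
import requests

# Placeholder IDs: an existing deployment and a new configuration
# (for example, one pointing at a different provider or model).
deployment_id = "<deployment-id>"
new_configuration_id = "<configuration-id>"

resp = requests.patch(
    f"{os.environ['AI_API_URL']}/v2/lm/deployments/{deployment_id}",
    headers={
        "Authorization": f"Bearer {os.environ['AUTH_TOKEN']}",
        "AI-Resource-Group": "default",
    },
    json={"configurationId": new_configuration_id},
)
resp.raise_for_status()
```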

---

## 5. Mistral AI

**Executable ID:** `aicore-mistralai`

**Access Type:** Local (SAP-hosted)

### Models

| Model | Parameters | Context | Use Case |
|-------|------------|---------|----------|
| mistral-large | - | 32K | Advanced reasoning |
| mistral-medium | - | 32K | Balanced |
| mistral-small | - | 32K | Cost-efficient |
| codestral | - | 32K | Code generation |

### Configuration Example

```json
{
  "name": "mistral-config",
  "executableId": "aicore-mistralai",
  "scenarioId": "foundation-models",
  "parameterBindings": [
    {"key": "modelName", "value": "mistralai--mistral-large"},
    {"key": "modelVersion", "value": "latest"}
  ]
}
```

---

## 6. IBM

**Executable ID:** `aicore-ibm`

**Access Type:** Local (SAP-hosted)

### Granite Models

| Model | Parameters | Use Case |
|-------|------------|----------|
| granite-13b-chat | 13B | Conversational |
| granite-13b-instruct | 13B | Task completion |
| granite-code | - | Code generation |

### Configuration Example

```json
{
  "name": "granite-config",
  "executableId": "aicore-ibm",
  "scenarioId": "foundation-models",
  "parameterBindings": [
    {"key": "modelName", "value": "ibm--granite-13b-chat"},
    {"key": "modelVersion", "value": "latest"}
  ]
}
```

---

## Model Selection Guide

### By Use Case

| Use Case | Recommended Models |
|----------|-------------------|
| General chat | gpt-4o, claude-3-5-sonnet, gemini-2.5-pro |
| Code generation | gpt-4o, codestral, claude-3-5-sonnet |
| Long documents | gemini-2.5-pro (2M), claude-3 (200K), gpt-4o (128K) |
| Vision/images | gpt-4o, gemini-2.5-pro, llama-3.2-vision |
| Embeddings | text-embedding-3-large, text-embedding-004 |
| Cost-sensitive | gpt-4o-mini, mistral-small, llama-3.1-8b |
| High throughput | gpt-35-turbo, claude-3-haiku, mistral-small |

### By Budget

| Budget | Tier | Models |
|--------|------|--------|
| Low | Economy | gpt-4o-mini, claude-3-haiku, mistral-small |
| Medium | Standard | gpt-4o, claude-3-sonnet, gemini-2.5-flash |
| High | Premium | claude-3-opus, gpt-4-turbo, gemini-2.5-pro |

### By Capability

| Capability | Best Models |
|------------|-------------|
| Reasoning | claude-3-opus, gpt-4o, llama-3.1-405b |
| Speed | claude-3-haiku, gpt-35-turbo, mistral-small |
| Context length | gemini-2.5-pro (2M), claude-3 (200K) |
| Multimodal | gpt-4o, gemini-2.5-pro, llama-3.2-vision |
| Code | codestral, gpt-4o, claude-3-5-sonnet |

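These recommendations can be encoded directly in application code. An illustrative routing table (the model choices mirror the tables above; the keys and defaults are our own, so adjust them to your benchmarks):

```python
# Illustrative defaults derived from the selection tables above.
MODEL_BY_USE_CASE = {
    "general_chat": "gpt-4o",
    "code": "codestral",
    "long_documents": "gemini-2.5-pro",
    "vision": "gpt-4o",
    "embeddings": "text-embedding-3-large",
    "cost_sensitive": "gpt-4o-mini",
    "high_throughput": "claude-3-haiku",
}

def pick_model(use_case: str) -> str:
    """Return a default model for a use case, falling back to general chat."""
    return MODEL_BY_USE_CASE.get(use_case, MODEL_BY_USE_CASE["general_chat"])
```
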
---

## Model Version Management

### Version Strategies

| Strategy | Configuration | Use Case |
|----------|---------------|----------|
| Latest | `"modelVersion": "latest"` | Development, auto-upgrade |
| Pinned | `"modelVersion": "2024-05-13"` | Production stability |

### Checking Available Versions

```bash
curl -X GET "$AI_API_URL/v2/lm/scenarios/foundation-models/models" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default" | \
  jq '.resources[] | select(.model == "gpt-4o") | .versions'
```

### Handling Deprecation

1. Monitor `deprecationDate` in model metadata (see the sketch after this list)
2. Plan migration before `retirementDate`
3. Test new version in staging
4. Update configuration with new version
5. Patch existing deployments

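The model metadata returned by the models endpoint (shown in the curl example above) can drive an automated check. A minimal sketch, assuming each resource optionally carries an ISO-formatted `deprecationDate` field:

```python
import os
from datetime import datetime, timedelta, timezone
import requests

def models_nearing_deprecation(days=90):
    """List models whose deprecationDate falls within the next `days` days."""
    resp = requests.get(
        f"{os.environ['AI_API_URL']}/v2/lm/scenarios/foundation-models/models",
        headers={
            "Authorization": f"Bearer {os.environ['AUTH_TOKEN']}",
            "AI-Resource-Group": "default",
        },
    )
    resp.raise_for_status()
    cutoff = datetime.now(timezone.utc) + timedelta(days=days)
    flagged = []
    for model in resp.json().get("resources", []):
        raw = model.get("deprecationDate")
        if raw and datetime.fromisoformat(raw.replace("Z", "+00:00")) <= cutoff:
            flagged.append(model["model"])
    return flagged
```
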
---

## Pricing Considerations

Pricing varies by:

- Model complexity (larger = more expensive)
- Input vs output tokens (output often 2-3x input cost)
- Provider region
- Access type (Remote vs Local)

**Reference:** SAP Note 3437766 for current token rates.
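
The input/output asymmetry matters when estimating spend. A back-of-the-envelope helper — the per-token rates below are hypothetical placeholders, not SAP prices; take real rates from SAP Note 3437766:

```python
def estimate_cost(input_tokens, output_tokens,
                  input_rate=0.005, output_rate=0.015):
    """Estimate request cost; rates are hypothetical USD per 1K tokens."""
    return (input_tokens / 1000) * input_rate + (output_tokens / 1000) * output_rate

# With output priced at 3x input, a 1,000-in / 1,000-out request costs
# 0.005 + 0.015 = 0.02 -- output dominates even at equal token counts.
print(estimate_cost(1000, 1000))
```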

### Cost Optimization

1. **Right-size models**: Use smaller models for simple tasks
2. **Batch requests**: Combine multiple queries when possible
3. **Cache responses**: Store and reuse common query results
4. **Limit tokens**: Set appropriate `max_tokens` limits (illustrated below)
5. **Use streaming**: No additional cost, better UX

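For item 4, capping `max_tokens` in the request body bounds the more expensive output side. An illustrative chat-completions payload (the exact field name follows the provider's API):

```python
# Illustrative payload: max_tokens caps completion length, which in turn
# bounds output-token spend on providers that bill per token.
payload = {
    "messages": [{"role": "user", "content": "Summarize this ticket in two sentences."}],
    "max_tokens": 128,
    "temperature": 0.2,
}
```
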
---

## Rate Limits

Rate limits vary by:

- Service plan tier
- Model provider
- Specific model

**Default limits** (vary by configuration):

- Requests per minute: 60-600
- Tokens per minute: 40K-400K

### Handling Rate Limits

```python
import time
from requests.exceptions import HTTPError

def call_with_retry(func, max_retries=3):
    """Call func, retrying with exponential backoff on HTTP 429."""
    for attempt in range(max_retries):
        try:
            return func()
        except HTTPError as e:
            if e.response is not None and e.response.status_code == 429:
                # Back off exponentially: 1s, 2s, 4s, ...
                time.sleep(2 ** attempt)
            else:
                raise
    raise Exception("Max retries exceeded")
```
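
For example, wrapping a single inference request (with `url`, `headers`, and `payload` built as in the earlier sketches):

```python
import requests

def send():
    # url, headers, and payload are built as in the earlier sketches.
    resp = requests.post(url, headers=headers, json=payload)
    resp.raise_for_status()  # raises HTTPError (with .response set) on 429
    return resp

response = call_with_retry(send)
```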

---

## Documentation Links

- Supported Models: [https://github.com/SAP-docs/sap-artificial-intelligence/blob/main/docs/sap-ai-core/supported-models-509e588.md](https://github.com/SAP-docs/sap-artificial-intelligence/blob/main/docs/sap-ai-core/supported-models-509e588.md)
- Generative AI Hub: [https://github.com/SAP-docs/sap-artificial-intelligence/blob/main/docs/sap-ai-core/generative-ai-hub-7db524e.md](https://github.com/SAP-docs/sap-artificial-intelligence/blob/main/docs/sap-ai-core/generative-ai-hub-7db524e.md)
- SAP Note 3437766: Token rates, limits, deprecation dates
- SAP Discovery Center: [https://discovery-center.cloud.sap/serviceCatalog/sap-ai-core](https://discovery-center.cloud.sap/serviceCatalog/sap-ai-core)