# Model Providers Reference

Complete reference for SAP AI Core model providers and available models.

**Documentation Source:** https://github.com/SAP-docs/sap-artificial-intelligence/tree/main/docs/sap-ai-core
**Latest Models:** SAP Note 3437766
## Overview
SAP AI Core provides access to models from six providers via the Generative AI Hub. All models are accessed through a unified API, allowing easy switching between providers.
## Provider Summary

| Provider | Executable ID | Access Type | Model Categories |
|----------|---------------|-------------|------------------|
| Azure OpenAI | `azure-openai` | Remote | Chat, Embeddings, Vision |
| SAP Open Source | `aicore-opensource` | Local | Chat, Embeddings, Vision |
| Google Vertex AI | `gcp-vertexai` | Remote | Chat, Embeddings, Code |
| AWS Bedrock | `aws-bedrock` | Remote | Chat, Embeddings |
| Mistral AI | `aicore-mistralai` | Local | Chat, Code |
| IBM | `aicore-ibm` | Local | Chat, Code |
## 1. Azure OpenAI

- **Executable ID:** `azure-openai`
- **Access Type:** Remote (Azure-hosted)
### Chat Models

| Model | Context (tokens) | Capabilities | Use Case |
|-------|------------------|--------------|----------|
| `gpt-4o` | 128K | Chat, Vision | Advanced reasoning, multimodal |
| `gpt-4o-mini` | 128K | Chat, Vision | Cost-efficient, fast |
| `gpt-4-turbo` | 128K | Chat, Vision | Previous flagship |
| `gpt-4` | 8K/32K | Chat | Reasoning, analysis |
| `gpt-35-turbo` | 4K/16K | Chat | Fast, economical |
### Embedding Models

| Model | Dimensions | Use Case |
|-------|------------|----------|
| `text-embedding-3-large` | 3072 | High-accuracy embeddings |
| `text-embedding-3-small` | 1536 | Cost-efficient embeddings |
| `text-embedding-ada-002` | 1536 | Legacy embeddings |
### Configuration Example
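A minimal sketch, in Python with `requests`, of creating a `gpt-4o` configuration through the AI API. The host, token, and resource group are placeholders taken from your AI Core service key; the `foundation-models` scenario ID and the `modelName`/`modelVersion` parameter bindings follow the Generative AI Hub configuration pattern. Verify exact model identifiers against SAP Note 3437766.

```python
import requests

AI_API_URL = "https://<ai-api-host>"  # placeholder: AI_API_URL from your service key
TOKEN = "<oauth-bearer-token>"        # placeholder: token from the service key's auth URL

# Create a configuration binding the azure-openai executable to gpt-4o.
response = requests.post(
    f"{AI_API_URL}/v2/lm/configurations",
    headers={
        "Authorization": f"Bearer {TOKEN}",
        "AI-Resource-Group": "default",
        "Content-Type": "application/json",
    },
    json={
        "name": "gpt-4o-config",
        "executableId": "azure-openai",
        "scenarioId": "foundation-models",
        "parameterBindings": [
            {"key": "modelName", "value": "gpt-4o"},
            {"key": "modelVersion", "value": "latest"},
        ],
    },
)
response.raise_for_status()
print(response.json()["id"])  # configuration ID, referenced when creating a deployment
```

Because all providers sit behind the same API, the configuration examples in the following sections only vary the `executableId` and `modelName` in this request body.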
## 2. SAP-Hosted Open Source

- **Executable ID:** `aicore-opensource`
- **Access Type:** Local (SAP-hosted)
### Llama Models

| Model | Parameters | Context (tokens) | Capabilities |
|-------|------------|------------------|--------------|
| `llama-3.1-405b` | 405B | 128K | Advanced reasoning |
| `llama-3.1-70b` | 70B | 128K | Strong reasoning |
| `llama-3.1-8b` | 8B | 128K | Fast, efficient |
| `llama-3.2-90b-vision` | 90B | 128K | Vision + text |
| `llama-3.2-11b-vision` | 11B | 128K | Vision + text |
| `llama-3.2-3b` | 3B | 128K | Lightweight |
| `llama-3.2-1b` | 1B | 128K | Edge deployment |
### Mistral Models (Open Source)

| Model | Parameters | Context (tokens) |
|-------|------------|------------------|
| `mistral-7b-instruct` | 7B | 32K |
| `mixtral-8x7b` | 46.7B | 32K |
### Falcon Models

| Model | Parameters | Context (tokens) |
|-------|------------|------------------|
| `falcon-40b` | 40B | 2K |
### Configuration Example
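Assuming the same `POST /v2/lm/configurations` call as in the Azure sketch above, only the request body changes; the configuration name and model choice below are illustrative.

```python
# Request body for an SAP-hosted open-source model (same endpoint as above).
body = {
    "name": "llama-70b-config",
    "executableId": "aicore-opensource",
    "scenarioId": "foundation-models",
    "parameterBindings": [
        {"key": "modelName", "value": "llama-3.1-70b"},
        {"key": "modelVersion", "value": "latest"},
    ],
}
```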
## 3. Google Vertex AI

- **Executable ID:** `gcp-vertexai`
- **Access Type:** Remote (Google Cloud)
### Gemini Models

| Model | Context (tokens) | Capabilities |
|-------|------------------|--------------|
| `gemini-2.5-pro` | 2M | Chat, Vision, Code, Long context |
| `gemini-2.5-flash` | 1M | Fast, multimodal |
| `gemini-2.5-flash-lite` | 1M | Fast, lower-cost multimodal |
| `gemini-2.0-flash` | 1M | Flash family, multimodal |
| `gemini-2.0-flash-lite` | 1M | Flash family, lower-cost |
### PaLM 2 Models

| Model | Use Case |
|-------|----------|
| `text-bison` | Text generation |
| `chat-bison` | Conversational |
| `code-bison` | Code generation |
### Embedding Models

| Model | Dimensions |
|-------|------------|
| `text-embedding-004` | 768 |
| `textembedding-gecko` | 768 |
### Configuration Example
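Same endpoint as the Azure sketch; only the provider-specific values change. The name and model below are illustrative.

```python
# Request body for a Google Vertex AI model (same endpoint as above).
body = {
    "name": "gemini-pro-config",
    "executableId": "gcp-vertexai",
    "scenarioId": "foundation-models",
    "parameterBindings": [
        {"key": "modelName", "value": "gemini-2.5-pro"},
        {"key": "modelVersion", "value": "latest"},
    ],
}
```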
## 4. AWS Bedrock

- **Executable ID:** `aws-bedrock`
- **Access Type:** Remote (AWS)
### Anthropic Claude Models

| Model | Context (tokens) | Capabilities |
|-------|------------------|--------------|
| `claude-sonnet-4-5` | 200K | Latest, advanced reasoning |
| `claude-4-opus` | 200K | Highest capability |
| `claude-4-sonnet` | 200K | Balanced, high performance |
| `claude-opus-4-1` | 200K | Extended Opus capabilities |
| `claude-3-7-sonnet` | 200K | Improved Sonnet 3.5 |
| `claude-3-5-sonnet` | 200K | Advanced reasoning |
| `claude-3-opus` | 200K | High capability |
| `claude-3-sonnet` | 200K | Balanced performance |
| `claude-3-haiku` | 200K | Fast, efficient |
### Amazon Titan Models

| Model | Use Case |
|-------|----------|
| `titan-text-express` | General text |
| `titan-text-lite` | Lightweight |
| `titan-embed-text` | Embeddings |
### Meta Llama 3 (Bedrock)

| Model | Parameters |
|-------|------------|
| `llama-3-70b` | 70B |
| `llama-3-8b` | 8B |
### Configuration Example
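Same endpoint as the Azure sketch; the body below is an illustrative Bedrock binding.

```python
# Request body for an AWS Bedrock model (same endpoint as above).
body = {
    "name": "claude-sonnet-config",
    "executableId": "aws-bedrock",
    "scenarioId": "foundation-models",
    "parameterBindings": [
        {"key": "modelName", "value": "claude-3-5-sonnet"},
        {"key": "modelVersion", "value": "latest"},
    ],
}
```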
## 5. Mistral AI

- **Executable ID:** `aicore-mistralai`
- **Access Type:** Local (SAP-hosted)
### Models

| Model | Parameters | Context (tokens) | Use Case |
|-------|------------|------------------|----------|
| `mistral-large` | - | 32K | Advanced reasoning |
| `mistral-medium` | - | 32K | Balanced |
| `mistral-small` | - | 32K | Cost-efficient |
| `codestral` | - | 32K | Code generation |
### Configuration Example
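Same endpoint as the Azure sketch; an illustrative body for an SAP-hosted Mistral model:

```python
# Request body for an SAP-hosted Mistral AI model (same endpoint as above).
body = {
    "name": "mistral-large-config",
    "executableId": "aicore-mistralai",
    "scenarioId": "foundation-models",
    "parameterBindings": [
        {"key": "modelName", "value": "mistral-large"},
        {"key": "modelVersion", "value": "latest"},
    ],
}
```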
## 6. IBM

- **Executable ID:** `aicore-ibm`
- **Access Type:** Local (SAP-hosted)
### Granite Models

| Model | Parameters | Use Case |
|-------|------------|----------|
| `granite-13b-chat` | 13B | Conversational |
| `granite-13b-instruct` | 13B | Task completion |
| `granite-code` | - | Code generation |
### Configuration Example
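Same endpoint as the Azure sketch; an illustrative body for an IBM Granite model:

```python
# Request body for an IBM Granite model (same endpoint as above).
body = {
    "name": "granite-chat-config",
    "executableId": "aicore-ibm",
    "scenarioId": "foundation-models",
    "parameterBindings": [
        {"key": "modelName", "value": "granite-13b-chat"},
        {"key": "modelVersion", "value": "latest"},
    ],
}
```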
## Model Selection Guide

### By Use Case

| Use Case | Recommended Models |
|----------|--------------------|
| General chat | `gpt-4o`, `claude-3-5-sonnet`, `gemini-2.5-pro` |
| Code generation | `gpt-4o`, `codestral`, `claude-3-5-sonnet` |
| Long documents | `gemini-2.5-pro` (2M), `claude-3` (200K), `gpt-4o` (128K) |
| Vision/images | `gpt-4o`, `gemini-2.5-pro`, `llama-3.2-vision` |
| Embeddings | `text-embedding-3-large`, `text-embedding-004` |
| Cost-sensitive | `gpt-4o-mini`, `mistral-small`, `llama-3.1-8b` |
| High throughput | `gpt-35-turbo`, `claude-3-haiku`, `mistral-small` |
### By Budget

| Budget | Tier | Models |
|--------|------|--------|
| Low | Economy | `gpt-4o-mini`, `claude-3-haiku`, `mistral-small` |
| Medium | Standard | `gpt-4o`, `claude-3-sonnet`, `gemini-2.5-flash` |
| High | Premium | `claude-3-opus`, `gpt-4-turbo`, `gemini-2.5-pro` |
### By Capability

| Capability | Best Models |
|------------|-------------|
| Reasoning | `claude-3-opus`, `gpt-4o`, `llama-3.1-405b` |
| Speed | `claude-3-haiku`, `gpt-35-turbo`, `mistral-small` |
| Context length | `gemini-2.5-pro` (2M), `claude-3` (200K) |
| Multimodal | `gpt-4o`, `gemini-2.5-pro`, `llama-3.2-vision` |
| Code | `codestral`, `gpt-4o`, `claude-3-5-sonnet` |
## Model Version Management

### Version Strategies

| Strategy | Configuration | Use Case |
|----------|---------------|----------|
| Latest | `"modelVersion": "latest"` | Development, auto-upgrade |
| Pinned | `"modelVersion": "2024-05-13"` | Production stability |
### Checking Available Versions
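A sketch of querying model metadata through the AI API's scenario endpoint. The response field names used here (`resources`, `versions`, `deprecationDate`, `retirementDate`) are assumptions based on the metadata fields described in the deprecation steps below; verify them against your AI API version.

```python
import requests

AI_API_URL = "https://<ai-api-host>"  # placeholder, as in the configuration examples
TOKEN = "<oauth-bearer-token>"        # placeholder

# List the models (and their versions) exposed by the foundation-models scenario.
resp = requests.get(
    f"{AI_API_URL}/v2/lm/scenarios/foundation-models/models",
    headers={"Authorization": f"Bearer {TOKEN}", "AI-Resource-Group": "default"},
)
resp.raise_for_status()
for model in resp.json().get("resources", []):
    for version in model.get("versions", []):
        print(
            model.get("model"),
            version.get("name"),
            version.get("deprecationDate"),  # assumed field name
            version.get("retirementDate"),   # assumed field name
        )
```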
### Handling Deprecation

- Monitor `deprecationDate` in model metadata
- Plan migration before `retirementDate`
- Test the new version in staging
- Update the configuration with the new version
- Patch existing deployments
## Pricing Considerations
Pricing varies by:
- Model complexity (larger = more expensive)
- Input vs output tokens (output often 2-3x input cost)
- Provider region
- Access type (Remote vs Local)
Reference: SAP Note 3437766 for current token rates.
### Cost Optimization
- Right-size models: Use smaller models for simple tasks
- Batch requests: Combine multiple queries when possible
- Cache responses: Store and reuse common query results
- Limit tokens: Set appropriate `max_tokens` limits
- Use streaming: No additional cost, better UX
## Rate Limits
Rate limits vary by:
- Service plan tier
- Model provider
- Specific model
Default limits (vary by configuration):
- Requests per minute: 60-600
- Tokens per minute: 40K-400K
### Handling Rate Limits
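One common client-side pattern, sketched below: retry on HTTP 429 with exponential backoff, honoring a `Retry-After` header when the provider returns one. The helper name and retry defaults are illustrative, not part of any SAP SDK.

```python
import time

import requests


def post_with_backoff(url: str, max_retries: int = 5, **kwargs) -> requests.Response:
    """POST with exponential backoff on HTTP 429 (rate limit) responses."""
    for attempt in range(max_retries):
        resp = requests.post(url, **kwargs)
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp
        retry_after = resp.headers.get("Retry-After")
        delay = float(retry_after) if retry_after else 2 ** attempt  # 1s, 2s, 4s, ...
        time.sleep(delay)
    raise RuntimeError(f"Still rate-limited after {max_retries} retries")
```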
## Documentation Links
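- SAP AI Core documentation: https://github.com/SAP-docs/sap-artificial-intelligence/tree/main/docs/sap-ai-core
- SAP Note 3437766: latest available models and token rates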