# Model Providers Reference

Complete reference for SAP AI Core model providers and available models.

**Documentation Source:** [https://github.com/SAP-docs/sap-artificial-intelligence/tree/main/docs/sap-ai-core](https://github.com/SAP-docs/sap-artificial-intelligence/tree/main/docs/sap-ai-core)

**Latest Models:** SAP Note 3437766

---

## Overview

SAP AI Core provides access to models from six providers via the Generative AI Hub. All models are accessed through a unified API, allowing easy switching between providers.
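
For example, a deployed model is reached through its deployment's inference endpoint. A minimal sketch, assuming an Azure OpenAI-style chat-completions deployment; the host, token, deployment ID, and `api-version` value are placeholders, and other providers may expect a different payload shape:

```python
import requests

# Placeholder values -- substitute your AI API URL, OAuth token,
# and the deployment ID returned when you deploy a model.
AI_API_URL = "https://<your-ai-api-host>"
AUTH_TOKEN = "<oauth-token>"
DEPLOYMENT_ID = "<deployment-id>"

response = requests.post(
    f"{AI_API_URL}/v2/inference/deployments/{DEPLOYMENT_ID}/chat/completions",
    headers={
        "Authorization": f"Bearer {AUTH_TOKEN}",
        "AI-Resource-Group": "default",
    },
    params={"api-version": "2024-02-01"},  # illustrative version string
    json={"messages": [{"role": "user", "content": "Hello"}]},
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```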

---

## Provider Summary

| Provider | Executable ID | Access Type | Model Categories |
|----------|---------------|-------------|------------------|
| Azure OpenAI | `azure-openai` | Remote | Chat, Embeddings, Vision |
| SAP Open Source | `aicore-opensource` | Local | Chat, Embeddings, Vision |
| Google Vertex AI | `gcp-vertexai` | Remote | Chat, Embeddings, Code |
| AWS Bedrock | `aws-bedrock` | Remote | Chat, Embeddings |
| Mistral AI | `aicore-mistralai` | Local | Chat, Code |
| IBM | `aicore-ibm` | Local | Chat, Code |

---

## 1. Azure OpenAI

**Executable ID:** `azure-openai`

**Access Type:** Remote (Azure-hosted)

### Chat Models

| Model | Context | Capabilities | Use Case |
|-------|---------|--------------|----------|
| gpt-4o | 128K | Chat, Vision | Advanced reasoning, multimodal |
| gpt-4o-mini | 128K | Chat, Vision | Cost-efficient, fast |
| gpt-4-turbo | 128K | Chat, Vision | Previous flagship |
| gpt-4 | 8K/32K | Chat | Reasoning, analysis |
| gpt-35-turbo | 4K/16K | Chat | Fast, economical |

### Embedding Models

| Model | Dimensions | Use Case |
|-------|------------|----------|
| text-embedding-3-large | 3072 | High accuracy embeddings |
| text-embedding-3-small | 1536 | Cost-efficient embeddings |
| text-embedding-ada-002 | 1536 | Legacy embeddings |

### Configuration Example

```json
{
  "name": "azure-gpt4o-config",
  "executableId": "azure-openai",
  "scenarioId": "foundation-models",
  "parameterBindings": [
    {"key": "modelName", "value": "gpt-4o"},
    {"key": "modelVersion", "value": "2024-05-13"}
  ]
}
```
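
A configuration like the one above can also be created programmatically. A minimal sketch, assuming the AI API's `POST /v2/lm/configurations` endpoint and the `AI_API_URL`/`AUTH_TOKEN` variables used elsewhere in this document:

```python
import os
import requests

payload = {
    "name": "azure-gpt4o-config",
    "executableId": "azure-openai",
    "scenarioId": "foundation-models",
    "parameterBindings": [
        {"key": "modelName", "value": "gpt-4o"},
        {"key": "modelVersion", "value": "2024-05-13"},
    ],
}

resp = requests.post(
    f"{os.environ['AI_API_URL']}/v2/lm/configurations",
    headers={
        "Authorization": f"Bearer {os.environ['AUTH_TOKEN']}",
        "AI-Resource-Group": "default",
    },
    json=payload,
)
resp.raise_for_status()
print(resp.json()["id"])  # configuration ID, used when creating a deployment
```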

---

## 2. SAP-Hosted Open Source

**Executable ID:** `aicore-opensource`

**Access Type:** Local (SAP-hosted)

### Llama Models

| Model | Parameters | Context | Capabilities |
|-------|------------|---------|--------------|
| llama-3.1-405b | 405B | 128K | Advanced reasoning |
| llama-3.1-70b | 70B | 128K | Strong reasoning |
| llama-3.1-8b | 8B | 128K | Fast, efficient |
| llama-3.2-90b-vision | 90B | 128K | Vision + text |
| llama-3.2-11b-vision | 11B | 128K | Vision + text |
| llama-3.2-3b | 3B | 128K | Lightweight |
| llama-3.2-1b | 1B | 128K | Edge deployment |

### Mistral Models (Open Source)

| Model | Parameters | Context |
|-------|------------|---------|
| mistral-7b-instruct | 7B | 32K |
| mixtral-8x7b | 46.7B | 32K |

### Falcon Models

| Model | Parameters | Context |
|-------|------------|---------|
| falcon-40b | 40B | 2K |

### Configuration Example

```json
{
  "name": "llama-config",
  "executableId": "aicore-opensource",
  "scenarioId": "foundation-models",
  "parameterBindings": [
    {"key": "modelName", "value": "meta--llama-3.1-70b-instruct"},
    {"key": "modelVersion", "value": "latest"}
  ]
}
```
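
Once a configuration exists, a deployment is created from it. A minimal sketch, assuming the AI API's `POST /v2/lm/deployments` endpoint and a configuration ID from the previous step:

```python
import os
import requests

# Placeholder: the ID returned by the configuration-creation call.
configuration_id = "<configuration-id>"

resp = requests.post(
    f"{os.environ['AI_API_URL']}/v2/lm/deployments",
    headers={
        "Authorization": f"Bearer {os.environ['AUTH_TOKEN']}",
        "AI-Resource-Group": "default",
    },
    json={"configurationId": configuration_id},
)
resp.raise_for_status()
print(resp.json()["id"])  # deployment ID; poll its status until RUNNING
```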

---

## 3. Google Vertex AI

**Executable ID:** `gcp-vertexai`

**Access Type:** Remote (Google Cloud)

### Gemini Models

| Model | Context | Capabilities |
|-------|---------|--------------|
| gemini-2.5-pro | 2M | Chat, Vision, Code, Long context |
| gemini-2.5-flash | 1M | Fast, multimodal |
| gemini-2.5-flash-lite | 1M | Fast, lower-cost multimodal |
| gemini-2.0-flash | 1M | Flash family, multimodal |
| gemini-2.0-flash-lite | 1M | Flash family, lower-cost |

### PaLM 2 Models

| Model | Use Case |
|-------|----------|
| text-bison | Text generation |
| chat-bison | Conversational |
| code-bison | Code generation |

### Embedding Models

| Model | Dimensions |
|-------|------------|
| text-embedding-004 | 768 |
| textembedding-gecko | 768 |

### Configuration Example

```json
{
  "name": "gemini-config",
  "executableId": "gcp-vertexai",
  "scenarioId": "foundation-models",
  "parameterBindings": [
    {"key": "modelName", "value": "gemini-2.5-pro"},
    {"key": "modelVersion", "value": "latest"}
  ]
}
```
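
After creating a deployment, poll its status until it reaches `RUNNING` before sending inference requests. A minimal sketch, assuming `GET /v2/lm/deployments/{id}` returns a `status` field:

```python
import os
import time
import requests

def wait_until_running(deployment_id, timeout=600, interval=15):
    """Poll the deployment until it is RUNNING, or fail on timeout/DEAD."""
    deadline = time.time() + timeout
    url = f"{os.environ['AI_API_URL']}/v2/lm/deployments/{deployment_id}"
    headers = {
        "Authorization": f"Bearer {os.environ['AUTH_TOKEN']}",
        "AI-Resource-Group": "default",
    }
    while time.time() < deadline:
        status = requests.get(url, headers=headers).json().get("status")
        if status == "RUNNING":
            return
        if status == "DEAD":
            raise RuntimeError("Deployment failed")
        time.sleep(interval)
    raise TimeoutError("Deployment did not reach RUNNING in time")
```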

---

## 4. AWS Bedrock

**Executable ID:** `aws-bedrock`

**Access Type:** Remote (AWS)

### Anthropic Claude Models

| Model | Context | Capabilities |
|-------|---------|--------------|
| claude-sonnet-4-5 | 200K | Latest, advanced reasoning |
| claude-4-opus | 200K | Highest capability |
| claude-4-sonnet | 200K | Balanced, high performance |
| claude-opus-4-1 | 200K | Extended Opus capabilities |
| claude-3-7-sonnet | 200K | Improved Sonnet 3.5 |
| claude-3-5-sonnet | 200K | Advanced reasoning |
| claude-3-opus | 200K | High capability |
| claude-3-sonnet | 200K | Balanced performance |
| claude-3-haiku | 200K | Fast, efficient |

### Amazon Titan Models

| Model | Use Case |
|-------|----------|
| titan-text-express | General text |
| titan-text-lite | Lightweight |
| titan-embed-text | Embeddings |

### Meta Llama 3 (Bedrock)

| Model | Parameters |
|-------|------------|
| llama-3-70b | 70B |
| llama-3-8b | 8B |

### Configuration Example

```json
{
  "name": "claude-config",
  "executableId": "aws-bedrock",
  "scenarioId": "foundation-models",
  "parameterBindings": [
    {"key": "modelName", "value": "anthropic--claude-3-5-sonnet"},
    {"key": "modelVersion", "value": "latest"}
  ]
}
```
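
Because every provider sits behind the same unified API, moving a workload to a different model or provider is a configuration change rather than a code change. A hedged sketch, assuming deployments can be patched with a new `configurationId` (verify against the AI API reference for your service version before relying on this):

```python
import os
import requests

# Placeholder IDs: an existing deployment and a new configuration
# (for example, one pointing at a different provider or model).
deployment_id = "<deployment-id>"
new_configuration_id = "<configuration-id>"

resp = requests.patch(
    f"{os.environ['AI_API_URL']}/v2/lm/deployments/{deployment_id}",
    headers={
        "Authorization": f"Bearer {os.environ['AUTH_TOKEN']}",
        "AI-Resource-Group": "default",
    },
    json={"configurationId": new_configuration_id},
)
resp.raise_for_status()
```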

---

## 5. Mistral AI

**Executable ID:** `aicore-mistralai`

**Access Type:** Local (SAP-hosted)

### Models

| Model | Parameters | Context | Use Case |
|-------|------------|---------|----------|
| mistral-large | - | 32K | Advanced reasoning |
| mistral-medium | - | 32K | Balanced |
| mistral-small | - | 32K | Cost-efficient |
| codestral | - | 32K | Code generation |

### Configuration Example

```json
{
  "name": "mistral-config",
  "executableId": "aicore-mistralai",
  "scenarioId": "foundation-models",
  "parameterBindings": [
    {"key": "modelName", "value": "mistralai--mistral-large"},
    {"key": "modelVersion", "value": "latest"}
  ]
}
```

---

## 6. IBM

**Executable ID:** `aicore-ibm`

**Access Type:** Local (SAP-hosted)

### Granite Models

| Model | Parameters | Use Case |
|-------|------------|----------|
| granite-13b-chat | 13B | Conversational |
| granite-13b-instruct | 13B | Task completion |
| granite-code | - | Code generation |

### Configuration Example

```json
{
  "name": "granite-config",
  "executableId": "aicore-ibm",
  "scenarioId": "foundation-models",
  "parameterBindings": [
    {"key": "modelName", "value": "ibm--granite-13b-chat"},
    {"key": "modelVersion", "value": "latest"}
  ]
}
```

---

## Model Selection Guide

### By Use Case

| Use Case | Recommended Models |
|----------|-------------------|
| General chat | gpt-4o, claude-3-5-sonnet, gemini-2.5-pro |
| Code generation | gpt-4o, codestral, claude-3-5-sonnet |
| Long documents | gemini-2.5-pro (2M), claude-3 (200K), gpt-4o (128K) |
| Vision/images | gpt-4o, gemini-2.5-pro, llama-3.2-vision |
| Embeddings | text-embedding-3-large, text-embedding-004 |
| Cost-sensitive | gpt-4o-mini, mistral-small, llama-3.1-8b |
| High throughput | gpt-35-turbo, claude-3-haiku, mistral-small |

### By Budget

| Budget | Tier | Models |
|--------|------|--------|
| Low | Economy | gpt-4o-mini, claude-3-haiku, mistral-small |
| Medium | Standard | gpt-4o, claude-3-sonnet, gemini-2.5-flash |
| High | Premium | claude-3-opus, gpt-4-turbo, gemini-2.5-pro |

### By Capability

| Capability | Best Models |
|------------|-------------|
| Reasoning | claude-3-opus, gpt-4o, llama-3.1-405b |
| Speed | claude-3-haiku, gpt-35-turbo, mistral-small |
| Context length | gemini-2.5-pro (2M), claude-3 (200K) |
| Multimodal | gpt-4o, gemini-2.5-pro, llama-3.2-vision |
| Code | codestral, gpt-4o, claude-3-5-sonnet |

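These recommendations can be encoded directly in application code. An illustrative routing table (the model choices mirror the tables above; the keys and defaults are our own, so adjust them to your benchmarks):

```python
# Illustrative defaults derived from the selection tables above.
MODEL_BY_USE_CASE = {
    "general_chat": "gpt-4o",
    "code": "codestral",
    "long_documents": "gemini-2.5-pro",
    "vision": "gpt-4o",
    "embeddings": "text-embedding-3-large",
    "cost_sensitive": "gpt-4o-mini",
    "high_throughput": "claude-3-haiku",
}

def pick_model(use_case: str) -> str:
    """Return a default model for a use case, falling back to general chat."""
    return MODEL_BY_USE_CASE.get(use_case, MODEL_BY_USE_CASE["general_chat"])
```
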
---

## Model Version Management

### Version Strategies

| Strategy | Configuration | Use Case |
|----------|---------------|----------|
| Latest | `"modelVersion": "latest"` | Development, auto-upgrade |
| Pinned | `"modelVersion": "2024-05-13"` | Production stability |

### Checking Available Versions

```bash
curl -X GET "$AI_API_URL/v2/lm/scenarios/foundation-models/models" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default" | \
  jq '.resources[] | select(.model == "gpt-4o") | .versions'
```

### Handling Deprecation

1. Monitor `deprecationDate` in model metadata (see the sketch after this list)
2. Plan migration before `retirementDate`
3. Test new version in staging
4. Update configuration with new version
5. Patch existing deployments

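The model metadata returned by the models endpoint (shown in the curl example above) can drive an automated check. A minimal sketch, assuming each resource optionally carries an ISO-formatted `deprecationDate` field:

```python
import os
from datetime import datetime, timedelta, timezone
import requests

def models_nearing_deprecation(days=90):
    """List models whose deprecationDate falls within the next `days` days."""
    resp = requests.get(
        f"{os.environ['AI_API_URL']}/v2/lm/scenarios/foundation-models/models",
        headers={
            "Authorization": f"Bearer {os.environ['AUTH_TOKEN']}",
            "AI-Resource-Group": "default",
        },
    )
    resp.raise_for_status()
    cutoff = datetime.now(timezone.utc) + timedelta(days=days)
    flagged = []
    for model in resp.json().get("resources", []):
        raw = model.get("deprecationDate")
        if raw and datetime.fromisoformat(raw.replace("Z", "+00:00")) <= cutoff:
            flagged.append(model["model"])
    return flagged
```
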
---

## Pricing Considerations

Pricing varies by:

- Model complexity (larger = more expensive)
- Input vs output tokens (output often 2-3x input cost)
- Provider region
- Access type (Remote vs Local)

**Reference:** SAP Note 3437766 for current token rates.
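
The input/output asymmetry matters when estimating spend. A back-of-the-envelope helper — the per-token rates below are hypothetical placeholders, not SAP prices; take real rates from SAP Note 3437766:

```python
def estimate_cost(input_tokens, output_tokens,
                  input_rate=0.005, output_rate=0.015):
    """Estimate request cost; rates are hypothetical USD per 1K tokens."""
    return (input_tokens / 1000) * input_rate + (output_tokens / 1000) * output_rate

# With output priced at 3x input, a 1,000-in / 1,000-out request costs
# 0.005 + 0.015 = 0.02 -- output dominates even at equal token counts.
print(estimate_cost(1000, 1000))
```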

### Cost Optimization

1. **Right-size models**: Use smaller models for simple tasks
2. **Batch requests**: Combine multiple queries when possible
3. **Cache responses**: Store and reuse common query results
4. **Limit tokens**: Set appropriate `max_tokens` limits (illustrated below)
5. **Use streaming**: No additional cost, better UX

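For item 4, capping `max_tokens` in the request body bounds the more expensive output side. An illustrative chat-completions payload (the exact field name follows the provider's API):

```python
# Illustrative payload: max_tokens caps completion length, which in turn
# bounds output-token spend on providers that bill per token.
payload = {
    "messages": [{"role": "user", "content": "Summarize this ticket in two sentences."}],
    "max_tokens": 128,
    "temperature": 0.2,
}
```
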
---

## Rate Limits

Rate limits vary by:

- Service plan tier
- Model provider
- Specific model

**Default limits** (vary by configuration):

- Requests per minute: 60-600
- Tokens per minute: 40K-400K

### Handling Rate Limits

```python
import time
from requests.exceptions import HTTPError

def call_with_retry(func, max_retries=3):
    """Call func, retrying with exponential backoff on HTTP 429."""
    for attempt in range(max_retries):
        try:
            return func()
        except HTTPError as e:
            if e.response is not None and e.response.status_code == 429:
                # Back off exponentially: 1s, 2s, 4s, ...
                time.sleep(2 ** attempt)
            else:
                raise
    raise Exception("Max retries exceeded")
```
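
For example, wrapping a single inference request (with `url`, `headers`, and `payload` built as in the earlier sketches):

```python
import requests

def send():
    # url, headers, and payload are built as in the earlier sketches.
    resp = requests.post(url, headers=headers, json=payload)
    resp.raise_for_status()  # raises HTTPError (with .response set) on 429
    return resp

response = call_with_retry(send)
```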

---

## Documentation Links

- Supported Models: [https://github.com/SAP-docs/sap-artificial-intelligence/blob/main/docs/sap-ai-core/supported-models-509e588.md](https://github.com/SAP-docs/sap-artificial-intelligence/blob/main/docs/sap-ai-core/supported-models-509e588.md)
- Generative AI Hub: [https://github.com/SAP-docs/sap-artificial-intelligence/blob/main/docs/sap-ai-core/generative-ai-hub-7db524e.md](https://github.com/SAP-docs/sap-artificial-intelligence/blob/main/docs/sap-ai-core/generative-ai-hub-7db524e.md)
- SAP Note 3437766: Token rates, limits, deprecation dates
- SAP Discovery Center: [https://discovery-center.cloud.sap/serviceCatalog/sap-ai-core](https://discovery-center.cloud.sap/serviceCatalog/sap-ai-core)