
# Model Providers Reference

Complete reference for SAP AI Core model providers and available models.

**Documentation Source**: https://github.com/SAP-docs/sap-artificial-intelligence/tree/main/docs/sap-ai-core

**Latest Models**: SAP Note 3437766


## Overview

SAP AI Core provides access to models from six providers via the Generative AI Hub. All models are served through a unified API, so switching between providers requires only a configuration change, not new client code.
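Because every provider sits behind the same inference API, switching models amounts to pointing at a different deployment. A minimal sketch of this idea — the host and deployment IDs are placeholders, and the request shape is simplified (some providers, e.g. Azure OpenAI, also require an `api-version` query parameter):

```python
def chat_completion_request(api_url, deployment_id, messages,
                            resource_group="default"):
    """Build the pieces of a Generative AI Hub chat-completion request.

    The same request shape works regardless of which provider backs
    the deployment -- only the deployment ID changes.
    """
    return {
        "url": f"{api_url}/v2/inference/deployments/{deployment_id}/chat/completions",
        "headers": {
            "AI-Resource-Group": resource_group,
            "Content-Type": "application/json",
        },
        "json": {"messages": messages},
    }

# Swapping providers = swapping deployment IDs; the payload is unchanged.
messages = [{"role": "user", "content": "Hello"}]
azure_req = chat_completion_request("https://api.ai.example.com", "d-azure123", messages)
bedrock_req = chat_completion_request("https://api.ai.example.com", "d-aws456", messages)
```

In practice you would pass these pieces to an HTTP client along with a bearer token obtained from the service key.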


## Provider Summary

| Provider | Executable ID | Access Type | Model Categories |
|---|---|---|---|
| Azure OpenAI | `azure-openai` | Remote | Chat, Embeddings, Vision |
| SAP Open Source | `aicore-opensource` | Local | Chat, Embeddings, Vision |
| Google Vertex AI | `gcp-vertexai` | Remote | Chat, Embeddings, Code |
| AWS Bedrock | `aws-bedrock` | Remote | Chat, Embeddings |
| Mistral AI | `aicore-mistralai` | Local | Chat, Code |
| IBM | `aicore-ibm` | Local | Chat, Code |

## 1. Azure OpenAI

**Executable ID**: `azure-openai`

**Access Type**: Remote (Azure-hosted)

### Chat Models

| Model | Context | Capabilities | Use Case |
|---|---|---|---|
| gpt-4o | 128K | Chat, Vision | Advanced reasoning, multimodal |
| gpt-4o-mini | 128K | Chat, Vision | Cost-efficient, fast |
| gpt-4-turbo | 128K | Chat, Vision | Previous flagship |
| gpt-4 | 8K/32K | Chat | Reasoning, analysis |
| gpt-35-turbo | 4K/16K | Chat | Fast, economical |

### Embedding Models

| Model | Dimensions | Use Case |
|---|---|---|
| text-embedding-3-large | 3072 | High-accuracy embeddings |
| text-embedding-3-small | 1536 | Cost-efficient embeddings |
| text-embedding-ada-002 | 1536 | Legacy embeddings |

### Configuration Example

```json
{
  "name": "azure-gpt4o-config",
  "executableId": "azure-openai",
  "scenarioId": "foundation-models",
  "parameterBindings": [
    {"key": "modelName", "value": "gpt-4o"},
    {"key": "modelVersion", "value": "2024-05-13"}
  ]
}
```
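A configuration like the one above is registered by POSTing it to the AI API's `/v2/lm/configurations` endpoint. A sketch of assembling that payload — the helper name is ours, and authentication/HTTP handling are omitted:

```python
def build_model_config(name, executable_id, model_name,
                       model_version="latest",
                       scenario_id="foundation-models"):
    """Assemble a configuration payload for POST /v2/lm/configurations."""
    return {
        "name": name,
        "executableId": executable_id,
        "scenarioId": scenario_id,
        "parameterBindings": [
            {"key": "modelName", "value": model_name},
            {"key": "modelVersion", "value": model_version},
        ],
    }

# Reproduces the JSON example above:
config = build_model_config("azure-gpt4o-config", "azure-openai",
                            "gpt-4o", "2024-05-13")
```

The same helper works for every provider in this document; only `executableId` and `modelName` change.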

## 2. SAP-Hosted Open Source

**Executable ID**: `aicore-opensource`

**Access Type**: Local (SAP-hosted)

### Llama Models

| Model | Parameters | Context | Capabilities |
|---|---|---|---|
| llama-3.1-405b | 405B | 128K | Advanced reasoning |
| llama-3.1-70b | 70B | 128K | Strong reasoning |
| llama-3.1-8b | 8B | 128K | Fast, efficient |
| llama-3.2-90b-vision | 90B | 128K | Vision + text |
| llama-3.2-11b-vision | 11B | 128K | Vision + text |
| llama-3.2-3b | 3B | 128K | Lightweight |
| llama-3.2-1b | 1B | 128K | Edge deployment |

### Mistral Models (Open Source)

| Model | Parameters | Context |
|---|---|---|
| mistral-7b-instruct | 7B | 32K |
| mixtral-8x7b | 46.7B | 32K |

### Falcon Models

| Model | Parameters | Context |
|---|---|---|
| falcon-40b | 40B | 2K |

### Configuration Example

```json
{
  "name": "llama-config",
  "executableId": "aicore-opensource",
  "scenarioId": "foundation-models",
  "parameterBindings": [
    {"key": "modelName", "value": "meta--llama-3.1-70b-instruct"},
    {"key": "modelVersion", "value": "latest"}
  ]
}
```

## 3. Google Vertex AI

**Executable ID**: `gcp-vertexai`

**Access Type**: Remote (Google Cloud)

### Gemini Models

| Model | Context | Capabilities |
|---|---|---|
| gemini-2.5-pro | 2M | Chat, Vision, Code, Long context |
| gemini-2.5-flash | 1M | Fast, multimodal |
| gemini-2.5-flash-lite | 1M | Fast, lower-cost multimodal |
| gemini-2.0-flash | 1M | Flash family, multimodal |
| gemini-2.0-flash-lite | 1M | Flash family, lower-cost |

### PaLM 2 Models

| Model | Use Case |
|---|---|
| text-bison | Text generation |
| chat-bison | Conversational |
| code-bison | Code generation |

### Embedding Models

| Model | Dimensions |
|---|---|
| text-embedding-004 | 768 |
| textembedding-gecko | 768 |

### Configuration Example

```json
{
  "name": "gemini-config",
  "executableId": "gcp-vertexai",
  "scenarioId": "foundation-models",
  "parameterBindings": [
    {"key": "modelName", "value": "gemini-1.5-pro"},
    {"key": "modelVersion", "value": "latest"}
  ]
}
```

## 4. AWS Bedrock

**Executable ID**: `aws-bedrock`

**Access Type**: Remote (AWS)

### Anthropic Claude Models

| Model | Context | Capabilities |
|---|---|---|
| claude-sonnet-4-5 | 200K | Latest, advanced reasoning |
| claude-4-opus | 200K | Highest capability |
| claude-4-sonnet | 200K | Balanced, high performance |
| claude-opus-4-1 | 200K | Extended Opus capabilities |
| claude-3-7-sonnet | 200K | Improved Sonnet 3.5 |
| claude-3-5-sonnet | 200K | Advanced reasoning |
| claude-3-opus | 200K | High capability |
| claude-3-sonnet | 200K | Balanced performance |
| claude-3-haiku | 200K | Fast, efficient |

### Amazon Titan Models

| Model | Use Case |
|---|---|
| titan-text-express | General text |
| titan-text-lite | Lightweight |
| titan-embed-text | Embeddings |

### Meta Llama 3 (Bedrock)

| Model | Parameters |
|---|---|
| llama-3-70b | 70B |
| llama-3-8b | 8B |

### Configuration Example

```json
{
  "name": "claude-config",
  "executableId": "aws-bedrock",
  "scenarioId": "foundation-models",
  "parameterBindings": [
    {"key": "modelName", "value": "anthropic--claude-3-5-sonnet"},
    {"key": "modelVersion", "value": "latest"}
  ]
}
```

## 5. Mistral AI

**Executable ID**: `aicore-mistralai`

**Access Type**: Local (SAP-hosted)

### Models

| Model | Parameters | Context | Use Case |
|---|---|---|---|
| mistral-large | - | 32K | Advanced reasoning |
| mistral-medium | - | 32K | Balanced |
| mistral-small | - | 32K | Cost-efficient |
| codestral | - | 32K | Code generation |

### Configuration Example

```json
{
  "name": "mistral-config",
  "executableId": "aicore-mistralai",
  "scenarioId": "foundation-models",
  "parameterBindings": [
    {"key": "modelName", "value": "mistralai--mistral-large"},
    {"key": "modelVersion", "value": "latest"}
  ]
}
```

## 6. IBM

**Executable ID**: `aicore-ibm`

**Access Type**: Local (SAP-hosted)

### Granite Models

| Model | Parameters | Use Case |
|---|---|---|
| granite-13b-chat | 13B | Conversational |
| granite-13b-instruct | 13B | Task completion |
| granite-code | - | Code generation |

### Configuration Example

```json
{
  "name": "granite-config",
  "executableId": "aicore-ibm",
  "scenarioId": "foundation-models",
  "parameterBindings": [
    {"key": "modelName", "value": "ibm--granite-13b-chat"},
    {"key": "modelVersion", "value": "latest"}
  ]
}
```

## Model Selection Guide

### By Use Case

| Use Case | Recommended Models |
|---|---|
| General chat | gpt-4o, claude-3-5-sonnet, gemini-1.5-pro |
| Code generation | gpt-4o, codestral, claude-3-5-sonnet |
| Long documents | gemini-1.5-pro (2M), claude-3 (200K), gpt-4o (128K) |
| Vision/images | gpt-4o, gemini-1.5-pro, llama-3.2-vision |
| Embeddings | text-embedding-3-large, text-embedding-004 |
| Cost-sensitive | gpt-4o-mini, mistral-small, llama-3.1-8b |
| High throughput | gpt-35-turbo, claude-3-haiku, mistral-small |

### By Budget

| Budget | Tier | Models |
|---|---|---|
| Low | Economy | gpt-4o-mini, claude-3-haiku, mistral-small |
| Medium | Standard | gpt-4o, claude-3-sonnet, gemini-1.5-flash |
| High | Premium | claude-3-opus, gpt-4-turbo, gemini-1.5-pro |

### By Capability

| Capability | Best Models |
|---|---|
| Reasoning | claude-3-opus, gpt-4o, llama-3.1-405b |
| Speed | claude-3-haiku, gpt-35-turbo, mistral-small |
| Context length | gemini-1.5-pro (2M), claude-3 (200K) |
| Multimodal | gpt-4o, gemini-1.5-pro, llama-3.2-vision |
| Code | codestral, gpt-4o, claude-3-5-sonnet |

## Model Version Management

### Version Strategies

| Strategy | Configuration | Use Case |
|---|---|---|
| Latest | `"modelVersion": "latest"` | Development, auto-upgrade |
| Pinned | `"modelVersion": "2024-05-13"` | Production stability |

### Checking Available Versions

```bash
curl -X GET "$AI_API_URL/v2/lm/scenarios/foundation-models/models" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default" | \
  jq '.resources[] | select(.model == "gpt-4o") | .versions'
```

### Handling Deprecation

1. Monitor `deprecationDate` in the model metadata
2. Plan the migration before `retirementDate`
3. Test the new version in staging
4. Update the configuration with the new version
5. Patch existing deployments
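Step 1 above can be automated by scanning model metadata for upcoming deprecations. A sketch, assuming each model entry carries an optional ISO-format `deprecationDate` field (field name per the list above); the 90-day warning window is our own choice:

```python
from datetime import date, timedelta

def models_needing_migration(models, warn_days=90, today=None):
    """Return names of models whose deprecationDate is within warn_days."""
    today = today or date.today()
    cutoff = today + timedelta(days=warn_days)
    flagged = []
    for m in models:
        dep = m.get("deprecationDate")
        if dep and date.fromisoformat(dep) <= cutoff:
            flagged.append(m["model"])
    return flagged

# Illustrative metadata, not real deprecation dates:
models = [
    {"model": "gpt-4", "deprecationDate": "2025-06-01"},
    {"model": "gpt-4o"},  # no deprecation announced
]
flagged = models_needing_migration(models, today=date(2025, 4, 1))
```

Running such a check on a schedule gives you lead time to execute steps 2-5 before a model is retired.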

## Pricing Considerations

Pricing varies by:

- Model complexity (larger models are more expensive)
- Input vs. output tokens (output is often 2-3x the input cost)
- Provider region
- Access type (Remote vs. Local)

See SAP Note 3437766 for current token rates.
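As a worked example of the input/output asymmetry, here is a sketch of a per-request cost estimate. The rates are purely illustrative placeholders, not SAP's actual prices (see SAP Note 3437766 for those); they only encode the "output costs ~3x input" shape:

```python
def estimate_cost(input_tokens, output_tokens,
                  input_rate_per_1k=0.005, output_rate_per_1k=0.015):
    """Estimate request cost; output tokens billed at a higher rate."""
    return (input_tokens / 1000) * input_rate_per_1k \
         + (output_tokens / 1000) * output_rate_per_1k

# 2,000 prompt tokens + 500 completion tokens:
# 2 * 0.005 + 0.5 * 0.015 = 0.0175
cost = estimate_cost(2000, 500)
```

Note that even though the prompt here is 4x longer than the completion, the completion still accounts for over 40% of the cost.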

## Cost Optimization

1. **Right-size models**: use smaller models for simple tasks
2. **Batch requests**: combine multiple queries when possible
3. **Cache responses**: store and reuse results for common queries
4. **Limit tokens**: set appropriate `max_tokens` limits
5. **Use streaming**: no additional cost, better UX
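Item 3 (caching) can be as simple as memoizing the completion call. A minimal sketch using `functools.lru_cache` — the `complete` function is a stand-in for a real API call, and the counter exists only to show the cache working:

```python
from functools import lru_cache

calls = {"count": 0}  # tracks how many "API calls" actually happen

@lru_cache(maxsize=1024)
def complete(prompt: str) -> str:
    """Stand-in for a model call; repeat prompts are served from cache."""
    calls["count"] += 1
    return f"answer to: {prompt}"

complete("What is SAP AI Core?")
complete("What is SAP AI Core?")  # cache hit: no second call
```

Caching like this is only appropriate for deterministic usage (e.g. temperature 0) where identical prompts should yield identical answers.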

## Rate Limits

Rate limits vary by:

- Service plan tier
- Model provider
- Specific model

Default limits (vary by configuration):

- Requests per minute: 60-600
- Tokens per minute: 40K-400K

### Handling Rate Limits

```python
import time
from requests.exceptions import HTTPError

def call_with_retry(func, max_retries=3):
    """Call func, retrying with exponential backoff on HTTP 429."""
    for attempt in range(max_retries):
        try:
            return func()
        except HTTPError as e:
            if e.response is not None and e.response.status_code == 429:
                time.sleep(2 ** attempt)  # back off: 1s, 2s, 4s, ...
            else:
                raise  # non-rate-limit errors propagate immediately
    raise RuntimeError(f"Rate limited: gave up after {max_retries} retries")
```