# Generative AI Hub Reference

Complete reference for SAP AI Core Generative AI Hub.

Documentation Source: https://github.com/SAP-docs/sap-artificial-intelligence/tree/main/docs/sap-ai-core

## Overview
The Generative AI Hub integrates large language models (LLMs) into SAP AI Core and SAP AI Launchpad, providing unified access to models from multiple providers.
## Key Features
- Access to LLMs from 6 providers via unified API
- Harmonized API for model switching without code changes
- Prompt experimentation in AI Launchpad UI
- Orchestration workflows with filtering, masking, grounding
- Token-based metering and billing
## Prerequisites
- SAP AI Core with Extended service plan
- Valid service key credentials
- Resource group created
## Global Scenarios

Two scenarios provide generative AI access:

| Scenario ID | Description | Use Case |
|---|---|---|
| `foundation-models` | Direct model access | Single model deployment |
| `orchestration` | Unified multi-model access | Pipeline workflows |
## Model Providers

### 1. Azure OpenAI (`azure-openai`)
Access to OpenAI models via Azure's private instance.
Models:
- GPT-4o, GPT-4o-mini
- GPT-4 Turbo, GPT-4
- GPT-3.5 Turbo
- text-embedding-3-large, text-embedding-3-small
Capabilities: Chat, embeddings, vision
### 2. SAP-Hosted Open Source (`aicore-opensource`)
SAP-hosted open source models via OpenAI-compatible API.
Models:
- Llama 3.1 (8B, 70B, 405B)
- Llama 3.2 (1B, 3B, 11B-Vision, 90B-Vision)
- Mistral 7B, Mixtral 8x7B
- Falcon 40B
Capabilities: Chat, embeddings, vision (select models)
### 3. Google Vertex AI (`gcp-vertexai`)
Access to Google's AI models.
Models:
- Gemini 1.5 Pro, Gemini 1.5 Flash
- Gemini 1.0 Pro
- PaLM 2 (text-bison, chat-bison)
- text-embedding-004
Capabilities: Chat, embeddings, vision, code
### 4. AWS Bedrock (`aws-bedrock`)
Access to models via AWS Bedrock.
Models:
- Anthropic Claude 3.5 Sonnet, Claude 3 Opus/Sonnet/Haiku
- Amazon Titan Text, Titan Embeddings
- Meta Llama 3
- Cohere Command
Capabilities: Chat, embeddings
### 5. Mistral AI (`aicore-mistralai`)
SAP-hosted Mistral models.
Models:
- Mistral Large
- Mistral Medium
- Mistral Small
- Mistral 7B Instruct
- Codestral
Capabilities: Chat, code
### 6. IBM (`aicore-ibm`)
SAP-hosted IBM models.
Models:
- Granite 13B Chat, Granite 13B Instruct
- Granite Code
Capabilities: Chat, code
## API: List Available Models

```bash
curl -X GET "$AI_API_URL/v2/lm/scenarios/foundation-models/models" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default" \
  -H "Content-Type: application/json"
```
### Response Structure

```json
{
  "count": 50,
  "resources": [
    {
      "model": "gpt-4o",
      "accessType": "Remote",
      "displayName": "GPT-4o",
      "provider": "azure-openai",
      "allowedScenarios": ["foundation-models"],
      "executableId": "azure-openai",
      "description": "OpenAI's most advanced model",
      "versions": [
        {
          "name": "2024-05-13",
          "isLatest": true,
          "capabilities": ["text-generation", "chat", "vision"],
          "contextLength": 128000,
          "inputCost": 5.0,
          "outputCost": 15.0,
          "deprecationDate": null,
          "retirementDate": null,
          "isStreamingSupported": true
        }
      ]
    }
  ]
}
```
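The list response can be filtered client-side, for example to find models whose latest version supports chat. A minimal Python sketch, using a trimmed copy of the sample response above (not live data):

```python
def chat_models(response: dict) -> list[str]:
    """Return model names whose latest version lists 'chat' among its capabilities."""
    names = []
    for resource in response.get("resources", []):
        for version in resource.get("versions", []):
            if version.get("isLatest") and "chat" in version.get("capabilities", []):
                names.append(resource["model"])
    return names

# Trimmed sample mirroring the response structure shown above
sample = {
    "resources": [
        {"model": "gpt-4o",
         "versions": [{"name": "2024-05-13", "isLatest": True,
                       "capabilities": ["text-generation", "chat", "vision"]}]},
        {"model": "text-embedding-3-large",
         "versions": [{"name": "1", "isLatest": True,
                       "capabilities": ["embeddings"]}]},
    ]
}

print(chat_models(sample))  # ['gpt-4o']
```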
### Model Metadata Fields

| Field | Description |
|---|---|
| `model` | Model identifier for API calls |
| `accessType` | `"Remote"` (external) or `"Local"` (SAP-hosted) |
| `provider` | Provider identifier |
| `executableId` | Executable ID for deployments |
| `contextLength` | Maximum context window tokens |
| `inputCost` | Cost per 1K input tokens |
| `outputCost` | Cost per 1K output tokens |
| `deprecationDate` | Date the version becomes deprecated |
| `retirementDate` | Date the version is removed |
| `isStreamingSupported` | Streaming capability |
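Given `inputCost` and `outputCost`, the cost of a single call can be estimated client-side. A sketch assuming the per-1K-token basis stated in the table (confirm the actual unit basis for your model in SAP Note 3437766):

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_cost_per_1k: float, output_cost_per_1k: float) -> float:
    """Cost of one call, assuming rates are quoted per 1K tokens."""
    return (input_tokens / 1000) * input_cost_per_1k \
         + (output_tokens / 1000) * output_cost_per_1k

# 2,000 prompt tokens plus 500 completion tokens at the sample GPT-4o rates above
print(estimate_cost(2000, 500, input_cost_per_1k=5.0, output_cost_per_1k=15.0))  # 17.5
```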
## Deploying a Model

### Step 1: Create Configuration

```bash
curl -X POST "$AI_API_URL/v2/lm/configurations" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "gpt4o-deployment-config",
    "executableId": "azure-openai",
    "scenarioId": "foundation-models",
    "parameterBindings": [
      {"key": "modelName", "value": "gpt-4o"},
      {"key": "modelVersion", "value": "latest"}
    ]
  }'
```
### Step 2: Create Deployment

```bash
curl -X POST "$AI_API_URL/v2/lm/deployments" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default" \
  -H "Content-Type: application/json" \
  -d '{
    "configurationId": "<config-id-from-step-1>"
  }'
```
### Step 3: Check Status

```bash
curl -X GET "$AI_API_URL/v2/lm/deployments/<deployment-id>" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default"
```

Wait for status `RUNNING` and note the `deploymentUrl`.
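The three steps above are typically scripted. A hedged Python sketch of the status-polling loop, with the GET call injected as a callable so it can be stubbed; the helper name is hypothetical, not part of any SAP SDK:

```python
import time

def wait_until_running(get_deployment, poll_seconds: float = 10,
                       timeout_seconds: float = 600) -> str:
    """Poll a deployment until it reaches RUNNING and return its deploymentUrl.

    get_deployment is any zero-argument callable returning the deployment JSON,
    e.g. a wrapper around the GET /v2/lm/deployments/<id> call shown above.
    """
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        dep = get_deployment()
        status = dep.get("status")
        if status == "RUNNING":
            return dep["deploymentUrl"]
        if status in ("DEAD", "STOPPED"):
            raise RuntimeError(f"deployment ended in status {status}")
        time.sleep(poll_seconds)
    raise TimeoutError("deployment did not reach RUNNING in time")

# Example with a stub that becomes RUNNING on the second poll:
states = iter([{"status": "PENDING"},
               {"status": "RUNNING", "deploymentUrl": "https://example.invalid/d1"}])
url = wait_until_running(lambda: next(states), poll_seconds=0)
print(url)
```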
## Using the Harmonized API

The harmonized API provides unified access without model-specific code.

### Chat Completion

```bash
curl -X POST "$DEPLOYMENT_URL/chat/completions" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is SAP AI Core?"}
    ],
    "max_tokens": 1000,
    "temperature": 0.7
  }'
```
### With Streaming

```bash
curl -X POST "$DEPLOYMENT_URL/chat/completions" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "stream": true
  }'
```
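Streamed responses arrive as server-sent events. Assuming the harmonized API follows the OpenAI-style `data: {...}` chunk format (the sample lines below are fabricated for illustration, not captured output), content deltas can be extracted like this:

```python
import json

def iter_stream_content(lines):
    """Yield content deltas from OpenAI-style SSE lines, stopping at 'data: [DONE]'."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip keep-alives and blank separators
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            yield delta["content"]

sample = [
    'data: {"choices": [{"delta": {"content": "Once"}}]}',
    'data: {"choices": [{"delta": {"content": " upon"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_content(sample)))  # Once upon
```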
### Embeddings

```bash
curl -X POST "$DEPLOYMENT_URL/embeddings" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-large",
    "input": ["Document chunk to embed"],
    "encoding_format": "float"
  }'
```
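Embedding vectors returned by this endpoint are typically compared with cosine similarity, e.g. for retrieval over document chunks. A self-contained sketch:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0 (identical direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
```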
## Orchestration Deployment

For unified access to multiple models:

### Create Orchestration Deployment

```bash
# Get the orchestration configuration ID
curl -X GET "$AI_API_URL/v2/lm/configurations?scenarioId=orchestration" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default"

# Create the deployment
curl -X POST "$AI_API_URL/v2/lm/deployments" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default" \
  -H "Content-Type: application/json" \
  -d '{
    "configurationId": "<orchestration-config-id>"
  }'
```
### Use Orchestration API

```bash
curl -X POST "$ORCHESTRATION_URL/v2/completion" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default" \
  -H "Content-Type: application/json" \
  -d '{
    "config": {
      "module_configurations": {
        "llm_module_config": {
          "model_name": "gpt-4o",
          "model_version": "latest"
        },
        "templating_module_config": {
          "template": [
            {"role": "user", "content": "{{?prompt}}"}
          ]
        }
      }
    },
    "input_params": {
      "prompt": "What is machine learning?"
    }
  }'
```
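The nested request body is easy to mistype. A small Python helper that assembles it, as a sketch; the field names mirror the curl example above, but the helper itself is hypothetical:

```python
def orchestration_payload(model_name: str, template: list, input_params: dict,
                          model_version: str = "latest") -> dict:
    """Assemble a /v2/completion request body matching the structure shown above."""
    return {
        "config": {
            "module_configurations": {
                "llm_module_config": {
                    "model_name": model_name,
                    "model_version": model_version,
                },
                "templating_module_config": {"template": template},
            }
        },
        "input_params": input_params,
    }

body = orchestration_payload(
    "gpt-4o",
    [{"role": "user", "content": "{{?prompt}}"}],  # {{?prompt}} is filled server-side
    {"prompt": "What is machine learning?"},
)
print(body["config"]["module_configurations"]["llm_module_config"]["model_name"])  # gpt-4o
```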
## Model Version Management

### Auto-Upgrade Strategy

Set `modelVersion` to `"latest"` for automatic upgrades:

```json
{
  "parameterBindings": [
    {"key": "modelName", "value": "gpt-4o"},
    {"key": "modelVersion", "value": "latest"}
  ]
}
```
### Pinned Version Strategy

Specify an exact version for stability:

```json
{
  "parameterBindings": [
    {"key": "modelName", "value": "gpt-4o"},
    {"key": "modelVersion", "value": "2024-05-13"}
  ]
}
```
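The two strategies differ only in the `parameterBindings` they send, so a tiny helper can make the choice explicit. A hypothetical convenience function, not part of any SAP SDK:

```python
def model_bindings(model_name: str, model_version: str = "latest") -> list[dict]:
    """Build the parameterBindings list for a model configuration.

    Pass an explicit version (e.g. "2024-05-13") to pin; "latest" auto-upgrades.
    """
    return [
        {"key": "modelName", "value": model_name},
        {"key": "modelVersion", "value": model_version},
    ]

print(model_bindings("gpt-4o", "2024-05-13"))
```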
### Manual Version Upgrade

Patch the deployment with a new configuration:

```bash
curl -X PATCH "$AI_API_URL/v2/lm/deployments/<deployment-id>" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default" \
  -H "Content-Type: application/json" \
  -d '{
    "configurationId": "<new-config-id>"
  }'
```
## SAP AI Launchpad UI

### Prompt Experimentation

Access: Workspaces → Generative AI Hub → Prompt Editor
Features:
- Interactive prompt testing
- Model selection and parameter tuning
- Variable placeholders
- Image inputs (select models)
- Streaming responses
- Save prompts (manager roles)
### Required Roles

| Role | Capabilities |
|---|---|
| `genai_manager` | Full access, save prompts |
| `genai_experimenter` | Test only, no save |
| `prompt_manager` | Manage saved prompts |
| `prompt_experimenter` | Use saved prompts |
| `prompt_media_executor` | Upload images |
### Prompt Types
- Question Answering: Q&A interactions
- Summarization: Extract key points
- Inferencing: Sentiment, entity extraction
- Transformations: Translation, format conversion
- Expansions: Content generation
## Model Library
View model specifications and benchmarks in AI Launchpad:
Access: Generative AI Hub → Model Library
Information available:
- Model capabilities
- Context window sizes
- Performance benchmarks (win rates, arena scores)
- Cost per token
- Deprecation schedules
## Rate Limits and Quotas
Refer to SAP Note 3437766 for:
- Token conversion rates per model
- Rate limits (requests/minute, tokens/minute)
- Regional availability
- Deprecation dates
### Quota Increase Request

Submit a support ticket:

- Component: `CA-ML-AIC`
- Include: tenant ID, current limits, requested limits, justification
## Best Practices

### Model Selection
| Use Case | Recommended Model |
|---|---|
| General chat | GPT-4o, Claude 3.5 Sonnet |
| Cost-sensitive | GPT-4o-mini, Mistral Small |
| Long context | GPT-4o (128K), Claude 3 (200K) |
| Embeddings | text-embedding-3-large |
| Code | Codestral, GPT-4o |
| Vision | GPT-4o, Gemini 1.5 Pro |
### Cost Optimization

- Use smaller models for simple tasks
- Implement caching for repeated queries
- Set appropriate `max_tokens` limits
- Use streaming for better UX without extra cost
- Monitor token usage via AI Launchpad analytics
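The caching point can be made concrete: key the cache on the full request so identical prompts skip the model call entirely. A minimal in-memory sketch (production use would add TTLs and size bounds); `fake_complete` stands in for a real client call:

```python
import hashlib
import json

class CompletionCache:
    """Cache completions by a hash of the full request dict."""

    def __init__(self, complete):
        self._complete = complete  # callable: request dict -> response text
        self._store = {}
        self.hits = 0

    def ask(self, request: dict) -> str:
        # sort_keys makes the hash stable across key ordering
        key = hashlib.sha256(json.dumps(request, sort_keys=True).encode()).hexdigest()
        if key in self._store:
            self.hits += 1
            return self._store[key]
        answer = self._complete(request)
        self._store[key] = answer
        return answer

calls = []
def fake_complete(req):
    calls.append(req)          # record how often the "model" is actually hit
    return "cached answer"

cache = CompletionCache(fake_complete)
req = {"model": "gpt-4o", "messages": [{"role": "user", "content": "hi"}]}
cache.ask(req)
cache.ask(req)                 # second call served from cache
print(len(calls), cache.hits)  # 1 1
```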
### Reliability
- Implement fallback configurations
- Pin model versions in production
- Monitor deprecation dates
- Test before upgrading versions
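A fallback configuration can be as simple as trying backends in order. A sketch with stub callables standing in for real deployment clients:

```python
def complete_with_fallback(request: dict, backends):
    """Try each (name, callable) backend in order; return the first success."""
    errors = []
    for name, call in backends:
        try:
            return name, call(request)
        except Exception as exc:  # narrow this to transport/rate-limit errors in real code
            errors.append((name, exc))
    raise RuntimeError(f"all backends failed: {errors}")

# Stubs simulating a rate-limited primary and a healthy fallback:
def primary(req):
    raise ConnectionError("rate limited")

def fallback(req):
    return "ok from fallback"

used, answer = complete_with_fallback(
    {}, [("gpt-4o", primary), ("claude-3-5-sonnet", fallback)]
)
print(used, answer)  # claude-3-5-sonnet ok from fallback
```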
## Documentation Links
- Generative AI Hub: https://github.com/SAP-docs/sap-artificial-intelligence/blob/main/docs/sap-ai-core/generative-ai-hub-7db524e.md
- Supported Models: https://github.com/SAP-docs/sap-artificial-intelligence/blob/main/docs/sap-ai-core/supported-models-509e588.md
- SAP Note 3437766: Token rates, limits, deprecation
- SAP Discovery Center: https://discovery-center.cloud.sap/serviceCatalog/sap-ai-core