# Generative AI Hub Reference

Complete reference for SAP AI Core Generative AI Hub.

**Documentation Source:** [https://github.com/SAP-docs/sap-artificial-intelligence/tree/main/docs/sap-ai-core](https://github.com/SAP-docs/sap-artificial-intelligence/tree/main/docs/sap-ai-core)

---

## Overview

The Generative AI Hub integrates large language models (LLMs) into SAP AI Core and SAP AI Launchpad, providing unified access to models from multiple providers.

### Key Features

- Access to LLMs from 6 providers via unified API
- Harmonized API for model switching without code changes
- Prompt experimentation in AI Launchpad UI
- Orchestration workflows with filtering, masking, grounding
- Token-based metering and billing

### Prerequisites

- SAP AI Core with **Extended** service plan
- Valid service key credentials
- Resource group created

---

## Global Scenarios

Two scenarios provide generative AI access:

| Scenario ID | Description | Use Case |
|-------------|-------------|----------|
| `foundation-models` | Direct model access | Single model deployment |
| `orchestration` | Unified multi-model access | Pipeline workflows |

---

## Model Providers

### 1. Azure OpenAI (`azure-openai`)

Access to OpenAI models via Azure's private instance.

**Models:**
- GPT-4o, GPT-4o-mini
- GPT-4 Turbo, GPT-4
- GPT-3.5 Turbo
- text-embedding-3-large, text-embedding-3-small

**Capabilities:** Chat, embeddings, vision

### 2. SAP-Hosted Open Source (`aicore-opensource`)

SAP-hosted open source models via OpenAI-compatible API.

**Models:**
- Llama 3.1 (8B, 70B, 405B)
- Llama 3.2 (1B, 3B, 11B-Vision, 90B-Vision)
- Mistral 7B, Mixtral 8x7B
- Falcon 40B

**Capabilities:** Chat, embeddings, vision (select models)

### 3. Google Vertex AI (`gcp-vertexai`)

Access to Google's AI models.

**Models:**
- Gemini 1.5 Pro, Gemini 1.5 Flash
- Gemini 1.0 Pro
- PaLM 2 (text-bison, chat-bison)
- text-embedding-004

**Capabilities:** Chat, embeddings, vision, code

### 4. AWS Bedrock (`aws-bedrock`)

Access to models via AWS Bedrock.

**Models:**
- Anthropic Claude 3.5 Sonnet, Claude 3 Opus/Sonnet/Haiku
- Amazon Titan Text, Titan Embeddings
- Meta Llama 3
- Cohere Command

**Capabilities:** Chat, embeddings

### 5. Mistral AI (`aicore-mistralai`)

SAP-hosted Mistral models.

**Models:**
- Mistral Large
- Mistral Medium
- Mistral Small
- Mistral 7B Instruct
- Codestral

**Capabilities:** Chat, code

### 6. IBM (`aicore-ibm`)

SAP-hosted IBM models.

**Models:**
- Granite 13B Chat, Granite 13B Instruct
- Granite Code

**Capabilities:** Chat, code

---

## API: List Available Models

```bash
curl -X GET "$AI_API_URL/v2/lm/scenarios/foundation-models/models" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default" \
  -H "Content-Type: application/json"
```

### Response Structure

```json
{
  "count": 50,
  "resources": [
    {
      "model": "gpt-4o",
      "accessType": "Remote",
      "displayName": "GPT-4o",
      "provider": "azure-openai",
      "allowedScenarios": ["foundation-models"],
      "executableId": "azure-openai",
      "description": "OpenAI's most advanced model",
      "versions": [
        {
          "name": "2024-05-13",
          "isLatest": true,
          "capabilities": ["text-generation", "chat", "vision"],
          "contextLength": 128000,
          "inputCost": 5.0,
          "outputCost": 15.0,
          "deprecationDate": null,
          "retirementDate": null,
          "isStreamingSupported": true
        }
      ]
    }
  ]
}
```
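
The response can be filtered on the client side. The following is a minimal Python sketch, assuming the JSON has the shape shown above; the helper name `streaming_models` is hypothetical and not part of any SAP SDK.

```python
import json


def streaming_models(payload: dict) -> list[tuple[str, str]]:
    """Return (model, version) pairs whose latest version supports streaming."""
    result = []
    for resource in payload.get("resources", []):
        for version in resource.get("versions", []):
            if version.get("isLatest") and version.get("isStreamingSupported"):
                result.append((resource["model"], version["name"]))
    return result


# Trimmed copy of the sample response; only the fields the helper reads.
sample = json.loads("""
{
  "count": 1,
  "resources": [
    {
      "model": "gpt-4o",
      "versions": [
        {"name": "2024-05-13", "isLatest": true, "isStreamingSupported": true}
      ]
    }
  ]
}
""")

print(streaming_models(sample))
```

The same loop can be adapted to filter on `capabilities` or `contextLength` instead.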

### Model Metadata Fields

| Field | Description |
|-------|-------------|
| `model` | Model identifier for API calls |
| `accessType` | "Remote" (external) or "Local" (SAP-hosted) |
| `provider` | Provider identifier |
| `executableId` | Executable ID for deployments |
| `contextLength` | Maximum context window tokens |
| `inputCost` | Cost per 1K input tokens |
| `outputCost` | Cost per 1K output tokens |
| `deprecationDate` | Date version becomes deprecated |
| `retirementDate` | Date version is removed |
| `isStreamingSupported` | Streaming capability |
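
As a worked example of the two cost fields, the sketch below estimates the cost of a single request, assuming `inputCost` and `outputCost` are rates per 1,000 tokens as the table states. The function name `estimate_cost` is hypothetical, and the result is in whatever unit the API reports, which this reference does not specify.

```python
def estimate_cost(version: dict, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from the per-1K-token rates of a version entry."""
    input_part = (input_tokens / 1000) * version["inputCost"]
    output_part = (output_tokens / 1000) * version["outputCost"]
    return input_part + output_part


# Rates taken from the sample response for gpt-4o.
gpt4o_version = {"inputCost": 5.0, "outputCost": 15.0}

# 2,000 input tokens and 500 output tokens: 2 * 5.0 + 0.5 * 15.0
print(estimate_cost(gpt4o_version, input_tokens=2000, output_tokens=500))  # 17.5
```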

---

## Deploying a Model

### Step 1: Create Configuration

```bash
curl -X POST "$AI_API_URL/v2/lm/configurations" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "gpt4o-deployment-config",
    "executableId": "azure-openai",
    "scenarioId": "foundation-models",
    "parameterBindings": [
      {"key": "modelName", "value": "gpt-4o"},
      {"key": "modelVersion", "value": "latest"}
    ]
  }'
```

### Step 2: Create Deployment

```bash
curl -X POST "$AI_API_URL/v2/lm/deployments" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default" \
  -H "Content-Type: application/json" \
  -d '{
    "configurationId": "<config-id-from-step-1>"
  }'
```

### Step 3: Check Status

```bash
curl -X GET "$AI_API_URL/v2/lm/deployments/<deployment-id>" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default"
```
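
The status check usually has to be repeated until the deployment is ready. A minimal polling sketch in Python follows. It assumes the response is a JSON object with a `status` field, and it takes the HTTP call as a caller-supplied function so the loop itself stays independent of any client library. The name `wait_until_running` and its parameters are hypothetical.

```python
import time
from typing import Callable


def wait_until_running(fetch_deployment: Callable[[], dict],
                       interval_s: float = 10.0,
                       max_attempts: int = 60) -> dict:
    """Call fetch_deployment until it reports status RUNNING, then return that document."""
    for _ in range(max_attempts):
        deployment = fetch_deployment()
        if deployment.get("status") == "RUNNING":
            return deployment
        # Any other status is treated as "not ready yet" in this sketch.
        time.sleep(interval_s)
    raise TimeoutError(f"deployment not RUNNING after {max_attempts} attempts")


# Stand-in for the GET request: two canned responses instead of real HTTP calls.
canned = iter([
    {"status": "PENDING"},
    {"status": "RUNNING", "deploymentUrl": "https://example.invalid/deployments/d123"},
])
ready = wait_until_running(lambda: next(canned), interval_s=0)
print(ready["deploymentUrl"])
```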

Wait until the deployment status is `RUNNING`, then note the `deploymentUrl` from the response.

---

## Using the Harmonized API

The harmonized API provides unified access without model-specific code.

### Chat Completion

```bash
curl -X POST "$DEPLOYMENT_URL/chat/completions" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is SAP AI Core?"}
    ],
    "max_tokens": 1000,
    "temperature": 0.7
  }'
```
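
The same request can be assembled in Python with the standard library. This is a sketch that mirrors the curl call above and only builds the request object; the function name `build_chat_request` and the placeholder URL are assumptions made for illustration.

```python
import json
import urllib.request


def build_chat_request(deployment_url: str, token: str, model: str,
                       messages: list[dict], resource_group: str = "default",
                       **params) -> urllib.request.Request:
    """Build (but do not send) a chat completion request for a deployment."""
    body = {"model": model, "messages": messages, **params}
    return urllib.request.Request(
        url=f"{deployment_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "AI-Resource-Group": resource_group,
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request(
    "https://example.invalid/deployment",  # placeholder for $DEPLOYMENT_URL
    "TOKEN",                               # placeholder for $AUTH_TOKEN
    "gpt-4o",
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is SAP AI Core?"},
    ],
    max_tokens=1000,
    temperature=0.7,
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it. Switching models then means changing only the `model` argument and the deployment URL.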

### With Streaming

```bash
curl -X POST "$DEPLOYMENT_URL/chat/completions" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "stream": true
  }'
```
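
This reference does not show what a streamed response looks like. The sketch below assumes the OpenAI-style server-sent events format: `data: {json}` lines carrying `choices[].delta.content`, terminated by `data: [DONE]`. If a given model streams in a different format, the parsing has to change accordingly. The function name `iter_stream_text` is hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style SSE lines (assumed format)."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # assumed end-of-stream marker
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Hand-written sample lines standing in for a real HTTP response body.
sample_lines = [
    'data: {"choices": [{"delta": {"content": "Once"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": " upon"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_stream_text(sample_lines)))
```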

### Embeddings

```bash
curl -X POST "$DEPLOYMENT_URL/embeddings" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-large",
    "input": ["Document chunk to embed"],
    "encoding_format": "float"
  }'
```

---

## Orchestration Deployment

For unified access to multiple models:

### Create Orchestration Deployment

```bash
# Get orchestration configuration ID
curl -X GET "$AI_API_URL/v2/lm/configurations?scenarioId=orchestration" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default"

# Create deployment
curl -X POST "$AI_API_URL/v2/lm/deployments" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default" \
  -H "Content-Type: application/json" \
  -d '{
    "configurationId": "<orchestration-config-id>"
  }'
```

### Use Orchestration API

```bash
curl -X POST "$ORCHESTRATION_URL/v2/completion" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default" \
  -H "Content-Type: application/json" \
  -d '{
    "config": {
      "module_configurations": {
        "llm_module_config": {
          "model_name": "gpt-4o",
          "model_version": "latest"
        },
        "templating_module_config": {
          "template": [
            {"role": "user", "content": "{{?prompt}}"}
          ]
        }
      }
    },
    "input_params": {
      "prompt": "What is machine learning?"
    }
  }'
```
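
To illustrate how the `{{?prompt}}` placeholder relates to `input_params`, the sketch below imitates the substitution locally in Python. The real substitution happens inside the orchestration service; this is only an assumption about its effect for the simple case shown above, and the function name `render_template` is hypothetical.

```python
import re

# Matches placeholders of the form {{?name}} as used in the template above.
_PLACEHOLDER = re.compile(r"\{\{\?(\w+)\}\}")


def render_template(template: list[dict], input_params: dict) -> list[dict]:
    """Return a copy of the template with {{?name}} placeholders filled in."""
    def substitute(text: str) -> str:
        # A missing parameter raises KeyError instead of being silently ignored.
        return _PLACEHOLDER.sub(lambda m: str(input_params[m.group(1)]), text)

    return [{**message, "content": substitute(message["content"])}
            for message in template]


template = [{"role": "user", "content": "{{?prompt}}"}]
rendered = render_template(template, {"prompt": "What is machine learning?"})
print(rendered)
```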

---

## Model Version Management

### Auto-Upgrade Strategy

Set `modelVersion` to `"latest"` for automatic upgrades:

```json
{
  "parameterBindings": [
    {"key": "modelName", "value": "gpt-4o"},
    {"key": "modelVersion", "value": "latest"}
  ]
}
```

### Pinned Version Strategy

Specify an exact version for stability:

```json
{
  "parameterBindings": [
    {"key": "modelName", "value": "gpt-4o"},
    {"key": "modelVersion", "value": "2024-05-13"}
  ]
}
```
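
Both strategies differ only in the `modelVersion` value, so a small helper can produce either fragment. This is a sketch; the function name `model_bindings` is hypothetical.

```python
import json


def model_bindings(model_name: str, model_version: str = "latest") -> dict:
    """Build the parameterBindings fragment; omit model_version to auto-upgrade."""
    return {
        "parameterBindings": [
            {"key": "modelName", "value": model_name},
            {"key": "modelVersion", "value": model_version},
        ]
    }


auto_upgrade = model_bindings("gpt-4o")           # follows the latest version
pinned = model_bindings("gpt-4o", "2024-05-13")   # stays on one version
print(json.dumps(pinned, indent=2))
```

Keeping the pinned version in one place makes a later manual upgrade a one-line change.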

### Manual Version Upgrade

Patch the deployment with a new configuration:

```bash
curl -X PATCH "$AI_API_URL/v2/lm/deployments/<deployment-id>" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default" \
  -H "Content-Type: application/json" \
  -d '{
    "configurationId": "<new-config-id>"
  }'
```

---

## SAP AI Launchpad UI

### Prompt Experimentation

Access: **Workspaces** → **Generative AI Hub** → **Prompt Editor**

Features:
- Interactive prompt testing
- Model selection and parameter tuning
- Variable placeholders
- Image inputs (select models)
- Streaming responses
- Save prompts (manager roles)

### Required Roles

| Role | Capabilities |
|------|--------------|
| `genai_manager` | Full access, save prompts |
| `genai_experimenter` | Test only, no save |
| `prompt_manager` | Manage saved prompts |
| `prompt_experimenter` | Use saved prompts |
| `prompt_media_executor` | Upload images |

### Prompt Types

- **Question Answering**: Q&A interactions
- **Summarization**: Extract key points
- **Inferencing**: Sentiment, entity extraction
- **Transformations**: Translation, format conversion
- **Expansions**: Content generation

---

## Model Library

View model specifications and benchmarks in AI Launchpad:

**Access:** Generative AI Hub → Model Library

Information available:
- Model capabilities
- Context window sizes
- Performance benchmarks (win rates, arena scores)
- Cost per token
- Deprecation schedules

---

## Rate Limits and Quotas

Refer to **SAP Note 3437766** for:
- Token conversion rates per model
- Rate limits (requests/minute, tokens/minute)
- Regional availability
- Deprecation dates

### Quota Increase Request

Submit a support ticket:
- Component: `CA-ML-AIC`
- Include: tenant ID, current limits, requested limits, justification

---

## Best Practices

### Model Selection

| Use Case | Recommended Model |
|----------|-------------------|
| General chat | GPT-4o, Claude 3.5 Sonnet |
| Cost-sensitive | GPT-4o-mini, Mistral Small |
| Long context | GPT-4o (128K), Claude 3 (200K) |
| Embeddings | text-embedding-3-large |
| Code | Codestral, GPT-4o |
| Vision | GPT-4o, Gemini 1.5 Pro |

### Cost Optimization

1. Use smaller models for simple tasks
2. Implement caching for repeated queries
3. Set appropriate `max_tokens` limits
4. Use streaming for better UX without extra cost
5. Monitor token usage via AI Launchpad analytics
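
Caching for repeated queries can be sketched as an in-memory lookup keyed on the full request body. The class name `CompletionCache` and the caller-supplied `call_model` function are hypothetical. A cache like this assumes that returning an earlier answer for an identical request is acceptable, which may not hold when sampling with a non-zero `temperature`.

```python
import hashlib
import json
from typing import Callable


class CompletionCache:
    """In-memory cache that reuses responses for byte-identical request bodies."""

    def __init__(self, call_model: Callable[[dict], str]):
        self._call_model = call_model
        self._store: dict[str, str] = {}

    def complete(self, request: dict) -> str:
        # sort_keys makes the key independent of dict insertion order.
        serialized = json.dumps(request, sort_keys=True).encode("utf-8")
        key = hashlib.sha256(serialized).hexdigest()
        if key not in self._store:
            self._store[key] = self._call_model(request)
        return self._store[key]


# Stand-in for a real API call; it records how often it is invoked.
calls = []

def fake_call(request: dict) -> str:
    calls.append(request)
    return f"answer #{len(calls)}"

cache = CompletionCache(fake_call)
question = {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hi"}]}
first = cache.complete(question)
second = cache.complete(dict(question))  # identical content, served from the cache
print(first, second, len(calls))
```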

### Reliability

1. Implement fallback configurations
2. Pin model versions in production
3. Monitor deprecation dates
4. Test before upgrading versions
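
One possible reading of a fallback configuration is an ordered list of models that is tried until one call succeeds. The sketch below shows that idea on the client side; the function name `complete_with_fallback` and the caller-supplied `call_model` function are hypothetical and do not describe a built-in service feature.

```python
from typing import Callable, Sequence


def complete_with_fallback(models: Sequence[str],
                           call_model: Callable[[str], str]) -> tuple[str, str]:
    """Try each model in order; return (model, response) from the first success."""
    errors = []
    for model in models:
        try:
            return model, call_model(model)
        except Exception as exc:  # broad on purpose: any failure triggers the fallback
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


# Stand-in for a real call: the first model is simulated as unavailable.
def fake_call(model: str) -> str:
    if model == "gpt-4o":
        raise ConnectionError("simulated outage")
    return f"response from {model}"

used_model, answer = complete_with_fallback(["gpt-4o", "gpt-4o-mini"], fake_call)
print(used_model, "->", answer)
```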

---

## Documentation Links

- Generative AI Hub: [https://github.com/SAP-docs/sap-artificial-intelligence/blob/main/docs/sap-ai-core/generative-ai-hub-7db524e.md](https://github.com/SAP-docs/sap-artificial-intelligence/blob/main/docs/sap-ai-core/generative-ai-hub-7db524e.md)
- Supported Models: [https://github.com/SAP-docs/sap-artificial-intelligence/blob/main/docs/sap-ai-core/supported-models-509e588.md](https://github.com/SAP-docs/sap-artificial-intelligence/blob/main/docs/sap-ai-core/supported-models-509e588.md)
- SAP Note 3437766: Token rates, limits, deprecation
- SAP Discovery Center: [https://discovery-center.cloud.sap/serviceCatalog/sap-ai-core](https://discovery-center.cloud.sap/serviceCatalog/sap-ai-core)