# Generative AI Hub Reference

Complete reference for SAP AI Core Generative AI Hub.

**Documentation Source:** [https://github.com/SAP-docs/sap-artificial-intelligence/tree/main/docs/sap-ai-core](https://github.com/SAP-docs/sap-artificial-intelligence/tree/main/docs/sap-ai-core)

---

## Overview

The Generative AI Hub integrates large language models (LLMs) into SAP AI Core and SAP AI Launchpad, providing unified access to models from multiple providers.

### Key Features

- Access to LLMs from 6 providers via unified API
- Harmonized API for model switching without code changes
- Prompt experimentation in AI Launchpad UI
- Orchestration workflows with filtering, masking, grounding
- Token-based metering and billing

### Prerequisites

- SAP AI Core with **Extended** service plan
- Valid service key credentials
- Resource group created
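
The curl examples below read `$AI_API_URL` and `$AUTH_TOKEN` from the environment. Here is a minimal sketch of obtaining the token with the OAuth 2.0 client-credentials grant. It assumes the service key's `clientid`, `clientsecret`, and `url` values have been exported as `CLIENT_ID`, `CLIENT_SECRET`, and `AUTH_URL` (hypothetical variable names). It also assumes `AI_API_URL` is taken from the key's `serviceurls.AI_API_URL` entry.

```bash
# Sketch: fetch an OAuth access token for SAP AI Core (client-credentials grant).
# CLIENT_ID, CLIENT_SECRET and AUTH_URL are hypothetical names for the service
# key's clientid, clientsecret and url values.

# Print the "access_token" value from a token response read on stdin.
extract_access_token() {
  sed -n 's/.*"access_token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Request a token and print it (needs network access and valid credentials).
get_token() {
  curl -s -X POST "$AUTH_URL/oauth/token" \
    -u "$CLIENT_ID:$CLIENT_SECRET" \
    -d "grant_type=client_credentials" | extract_access_token
}

# Usage:
#   export AUTH_TOKEN="$(get_token)"
```

The `sed` extraction only handles a simple token response. Where `jq` is installed, `jq -r .access_token` is the sturdier choice.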

---

## Global Scenarios

Two scenarios provide generative AI access:

| Scenario ID | Description | Use Case |
|-------------|-------------|----------|
| `foundation-models` | Direct model access | Single model deployment |
| `orchestration` | Unified multi-model access | Pipeline workflows |

---

## Model Providers

### 1. Azure OpenAI (`azure-openai`)

Access to OpenAI models via Azure's private instance.

**Models:**
- GPT-4o, GPT-4o-mini
- GPT-4 Turbo, GPT-4
- GPT-3.5 Turbo
- text-embedding-3-large, text-embedding-3-small

**Capabilities:** Chat, embeddings, vision

### 2. SAP-Hosted Open Source (`aicore-opensource`)

SAP-hosted open source models via OpenAI-compatible API.

**Models:**
- Llama 3.1 (8B, 70B, 405B)
- Llama 3.2 (1B, 3B, 11B-Vision, 90B-Vision)
- Mistral 7B, Mixtral 8x7B
- Falcon 40B

**Capabilities:** Chat, embeddings, vision (select models)

### 3. Google Vertex AI (`gcp-vertexai`)

Access to Google's AI models.

**Models:**
- Gemini 1.5 Pro, Gemini 1.5 Flash
- Gemini 1.0 Pro
- PaLM 2 (text-bison, chat-bison)
- text-embedding-004

**Capabilities:** Chat, embeddings, vision, code

### 4. AWS Bedrock (`aws-bedrock`)

Access to models via AWS Bedrock.

**Models:**
- Anthropic Claude 3.5 Sonnet, Claude 3 Opus/Sonnet/Haiku
- Amazon Titan Text, Titan Embeddings
- Meta Llama 3
- Cohere Command

**Capabilities:** Chat, embeddings

### 5. Mistral AI (`aicore-mistralai`)

SAP-hosted Mistral models.

**Models:**
- Mistral Large
- Mistral Medium
- Mistral Small
- Mistral 7B Instruct
- Codestral

**Capabilities:** Chat, code

### 6. IBM (`aicore-ibm`)

SAP-hosted IBM models.

**Models:**
- Granite 13B Chat, Granite 13B Instruct
- Granite Code

**Capabilities:** Chat, code

---

## API: List Available Models

```bash
curl -X GET "$AI_API_URL/v2/lm/scenarios/foundation-models/models" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default" \
  -H "Content-Type: application/json"
```

### Response Structure

```json
{
  "count": 50,
  "resources": [
    {
      "model": "gpt-4o",
      "accessType": "Remote",
      "displayName": "GPT-4o",
      "provider": "azure-openai",
      "allowedScenarios": ["foundation-models"],
      "executableId": "azure-openai",
      "description": "OpenAI's most advanced model",
      "versions": [
        {
          "name": "2024-05-13",
          "isLatest": true,
          "capabilities": ["text-generation", "chat", "vision"],
          "contextLength": 128000,
          "inputCost": 5.0,
          "outputCost": 15.0,
          "deprecationDate": null,
          "retirementDate": null,
          "isStreamingSupported": true
        }
      ]
    }
  ]
}
```
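
The model list can be long. For a quick look at just the identifiers in a response shaped like the one above, a text-matching pipeline is enough. This sketch assumes every resource carries a `"model": "<id>"` pair and is not a general JSON parser (`list_model_ids` is a hypothetical helper name).

```bash
# Print each distinct model identifier in a models-list response read on stdin.
# Matches "model": "<id>" pairs as text; keys such as "modelName" do not match
# because the pattern requires a quote right after "model".
list_model_ids() {
  grep -o '"model"[[:space:]]*:[[:space:]]*"[^"]*"' |
    sed 's/^"model"[[:space:]]*:[[:space:]]*"\(.*\)"$/\1/' |
    sort -u
}

# Usage:
#   curl -s "$AI_API_URL/v2/lm/scenarios/foundation-models/models" \
#     -H "Authorization: Bearer $AUTH_TOKEN" \
#     -H "AI-Resource-Group: default" | list_model_ids
```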

### Model Metadata Fields

| Field | Description |
|-------|-------------|
| `model` | Model identifier for API calls |
| `accessType` | "Remote" (external) or "Local" (SAP-hosted) |
| `provider` | Provider identifier |
| `executableId` | Executable ID for deployments |
| `contextLength` | Maximum context window tokens |
| `inputCost` | Cost per 1K input tokens |
| `outputCost` | Cost per 1K output tokens |
| `deprecationDate` | Date version becomes deprecated |
| `retirementDate` | Date version is removed |
| `isStreamingSupported` | Streaming capability |

---

## Deploying a Model

### Step 1: Create Configuration

```bash
curl -X POST "$AI_API_URL/v2/lm/configurations" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "gpt4o-deployment-config",
    "executableId": "azure-openai",
    "scenarioId": "foundation-models",
    "parameterBindings": [
      {"key": "modelName", "value": "gpt-4o"},
      {"key": "modelVersion", "value": "latest"}
    ]
  }'
```
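
The configuration body is the part that changes from model to model. This sketch defines two hypothetical helpers:

- `config_body` prints the Step 1 payload for a given name, executable, model, and version.
- `extract_id` pulls the new configuration's ID out of the create response. It assumes the response returns that ID in an `"id"` field.

Arguments are inserted without JSON escaping, so they should be plain identifiers.

```bash
# Print a foundation-models configuration body.
# Args: config name, executableId (provider), model name, model version (default "latest").
config_body() {
  local name="$1" executable="$2" model="$3" version="${4:-latest}"
  printf '{"name":"%s","executableId":"%s","scenarioId":"foundation-models","parameterBindings":[{"key":"modelName","value":"%s"},{"key":"modelVersion","value":"%s"}]}' \
    "$name" "$executable" "$model" "$version"
}

# Print the first "id" string value found in a JSON response read on stdin.
extract_id() {
  grep -o '"id"[[:space:]]*:[[:space:]]*"[^"]*"' | head -n 1 |
    sed 's/^"id"[[:space:]]*:[[:space:]]*"\(.*\)"$/\1/'
}

# Usage:
#   CONFIG_ID="$(curl -s -X POST "$AI_API_URL/v2/lm/configurations" \
#     -H "Authorization: Bearer $AUTH_TOKEN" \
#     -H "AI-Resource-Group: default" \
#     -H "Content-Type: application/json" \
#     -d "$(config_body gpt4o-deployment-config azure-openai gpt-4o latest)" | extract_id)"
```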

### Step 2: Create Deployment

```bash
curl -X POST "$AI_API_URL/v2/lm/deployments" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default" \
  -H "Content-Type: application/json" \
  -d '{
    "configurationId": "<config-id-from-step-1>"
  }'
```

### Step 3: Check Status

```bash
curl -X GET "$AI_API_URL/v2/lm/deployments/<deployment-id>" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default"
```

Wait for status `RUNNING` and note the `deploymentUrl`.
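
Step 3 usually has to be repeated until the deployment leaves its intermediate states. This polling sketch rests on three assumptions:

- The deployment JSON carries top-level `"status"` and `"deploymentUrl"` string fields.
- `DEAD` is treated as a failed deployment.
- The caller supplies a function that performs the Step 3 request.

`json_field` and `wait_for_running` are hypothetical names.

```bash
# Print the first "name": "value" string field from JSON read on stdin.
json_field() {
  grep -o "\"$1\"[[:space:]]*:[[:space:]]*\"[^\"]*\"" | head -n 1 |
    sed 's/^"[^"]*"[[:space:]]*:[[:space:]]*"\(.*\)"$/\1/'
}

# Poll until the deployment reports RUNNING, then print its deploymentUrl.
# Args: fetch function (prints the deployment JSON), max attempts, seconds between polls.
# Returns 1 on timeout or when the status becomes DEAD.
wait_for_running() {
  local fetch="$1" max="${2:-60}" interval="${3:-10}" body status i
  for ((i = 1; i <= max; i++)); do
    body="$("$fetch")"
    status="$(printf '%s\n' "$body" | json_field status)"
    case "$status" in
      RUNNING) printf '%s\n' "$body" | json_field deploymentUrl; return 0 ;;
      DEAD) echo "deployment failed (status DEAD)" >&2; return 1 ;;
    esac
    sleep "$interval"
  done
  echo "timed out waiting for RUNNING (last status: ${status:-unknown})" >&2
  return 1
}

# Usage:
#   fetch_deployment() {
#     curl -s "$AI_API_URL/v2/lm/deployments/$DEPLOYMENT_ID" \
#       -H "Authorization: Bearer $AUTH_TOKEN" -H "AI-Resource-Group: default"
#   }
#   DEPLOYMENT_URL="$(wait_for_running fetch_deployment 60 10)"
```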

---

## Using the Harmonized API

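The harmonized API provides unified access without model-specific code. Requests go to the `deploymentUrl` from Step 3 (referenced below as `$DEPLOYMENT_URL`). Azure OpenAI deployments additionally expect an `api-version` query parameter on that URL; see the SAP documentation for supported values.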

### Chat Completion

```bash
curl -X POST "$DEPLOYMENT_URL/chat/completions" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is SAP AI Core?"}
    ],
    "max_tokens": 1000,
    "temperature": 0.7
  }'
```
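
Interpolating user text directly into the `-d` string breaks the JSON as soon as the text contains a quote or a newline. This pure-bash sketch uses hypothetical helper names. It escapes backslashes, double quotes, and common control characters, then builds the `messages` and `max_tokens` fields of the request above. Other control characters are not handled.

```bash
# Escape a string for use inside a JSON string literal.
# Backslashes go first so the escapes added afterwards are not doubled.
json_escape() {
  local s=$1
  s=${s//'\'/'\\'}      # backslash       -> \\
  s=${s//'"'/'\"'}      # double quote    -> \"
  s=${s//$'\n'/'\n'}    # newline         -> \n
  s=${s//$'\r'/'\r'}    # carriage return -> \r
  s=${s//$'\t'/'\t'}    # tab             -> \t
  printf '%s' "$s"
}

# Build a chat-completions body with one system and one user message.
# Args: system text, user text, max_tokens (default 1000).
chat_body() {
  local system="$1" user="$2" max_tokens="${3:-1000}"
  printf '{"messages":[{"role":"system","content":"%s"},{"role":"user","content":"%s"}],"max_tokens":%d}' \
    "$(json_escape "$system")" "$(json_escape "$user")" "$max_tokens"
}

# Usage:
#   curl -s -X POST "$DEPLOYMENT_URL/chat/completions" \
#     -H "Authorization: Bearer $AUTH_TOKEN" -H "AI-Resource-Group: default" \
#     -H "Content-Type: application/json" \
#     -d "$(chat_body 'You are a helpful assistant.' "$QUESTION")"
```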

### With Streaming

```bash
curl -N -X POST "$DEPLOYMENT_URL/chat/completions" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "stream": true
  }'
```
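
With `"stream": true`, OpenAI-compatible endpoints send server-sent events. Each chunk arrives on a line prefixed with `data: `, and the stream ends with `data: [DONE]`. Assuming the deployment follows that convention, this sketch prints each chunk's JSON payload and stops at the sentinel. `read_sse_payloads` is a hypothetical name, and only the `data: ` form with a space is handled.

```bash
# Read a server-sent-event stream on stdin and print each event's JSON payload,
# one per line, stopping at the OpenAI-style "[DONE]" sentinel.
read_sse_payloads() {
  local line
  while IFS= read -r line || [ -n "$line" ]; do
    line="${line%$'\r'}"              # tolerate CRLF line endings
    case "$line" in
      'data: [DONE]') return 0 ;;
      'data: '*) printf '%s\n' "${line#data: }" ;;
    esac                              # blank lines and other fields are skipped
  done
}

# Usage:
#   curl -sN -X POST "$DEPLOYMENT_URL/chat/completions" ... | read_sse_payloads
```

`-N` (`--no-buffer`) stops curl from holding chunks back, so they reach the reader as they arrive.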

### Embeddings

```bash
curl -X POST "$DEPLOYMENT_URL/embeddings" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-large",
    "input": ["Document chunk to embed"],
    "encoding_format": "float"
  }'
```
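
Because `input` is a list, several chunks can be embedded in a single request. This sketch uses hypothetical helper names and turns shell arguments into an escaped JSON string array for that field.

```bash
# Escape a string for use inside a JSON string literal (backslash first).
json_escape() {
  local s=$1
  s=${s//'\'/'\\'}
  s=${s//'"'/'\"'}
  s=${s//$'\n'/'\n'}
  s=${s//$'\r'/'\r'}
  s=${s//$'\t'/'\t'}
  printf '%s' "$s"
}

# Print the arguments as a JSON array of strings.
json_string_array() {
  local out="" item
  for item in "$@"; do
    out+="${out:+,}\"$(json_escape "$item")\""   # comma before every item but the first
  done
  printf '[%s]' "$out"
}

# Usage:
#   body="$(printf '{"input":%s,"encoding_format":"float"}' \
#     "$(json_string_array "first chunk" "second chunk")")"
#   curl -s -X POST "$DEPLOYMENT_URL/embeddings" ... -d "$body"
```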

---

## Orchestration Deployment

For unified access to multiple models:

### Create Orchestration Deployment

```bash
# Get orchestration configuration ID
curl -X GET "$AI_API_URL/v2/lm/configurations?scenarioId=orchestration" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default"

# Create deployment
curl -X POST "$AI_API_URL/v2/lm/deployments" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default" \
  -H "Content-Type: application/json" \
  -d '{
    "configurationId": "<orchestration-config-id>"
  }'
```
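
The list call returns every configuration in the scenario, and the list may be empty in a fresh resource group. This sketch defines two hypothetical helpers:

- The first takes the first configuration ID from the list response.
- The second prints a body for creating an orchestration configuration. It assumes the executable ID `orchestration`, which you should verify in your tenant.

```bash
# Print the first "id" value from a configurations-list response on stdin (empty if none).
first_config_id() {
  grep -o '"id"[[:space:]]*:[[:space:]]*"[^"]*"' | head -n 1 |
    sed 's/^"id"[[:space:]]*:[[:space:]]*"\(.*\)"$/\1/'
}

# Print a body for creating an orchestration configuration when none exists yet.
# Assumes executableId "orchestration"; the name (arg 1) is arbitrary.
orchestration_config_body() {
  printf '{"name":"%s","executableId":"orchestration","scenarioId":"orchestration"}' \
    "${1:-orchestration-config}"
}

# Usage:
#   ORCH_CONFIG_ID="$(curl -s "$AI_API_URL/v2/lm/configurations?scenarioId=orchestration" \
#     -H "Authorization: Bearer $AUTH_TOKEN" -H "AI-Resource-Group: default" | first_config_id)"
```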

### Use Orchestration API

```bash
curl -X POST "$ORCHESTRATION_URL/completion" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default" \
  -H "Content-Type: application/json" \
  -d '{
    "orchestration_config": {
      "module_configurations": {
        "llm_module_config": {
          "model_name": "gpt-4o",
          "model_version": "latest"
        },
        "templating_module_config": {
          "template": [
            {"role": "user", "content": "{{?prompt}}"}
          ]
        }
      }
    },
    "input_params": {
      "prompt": "What is machine learning?"
    }
  }'
```
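
Each `{{?name}}` placeholder in the template is filled from the matching key in `input_params`. This sketch shows a hypothetical `build_input_params` helper that assembles that object from `name=value` arguments. It escapes the values so that prompts containing quotes or newlines stay valid JSON.

```bash
# Escape a string for use inside a JSON string literal (backslash first).
json_escape() {
  local s=$1
  s=${s//'\'/'\\'}
  s=${s//'"'/'\"'}
  s=${s//$'\n'/'\n'}
  s=${s//$'\r'/'\r'}
  s=${s//$'\t'/'\t'}
  printf '%s' "$s"
}

# Build an input_params object from name=value arguments.
# Everything after the first "=" is the value, so values may contain "=".
build_input_params() {
  local out="" pair
  for pair in "$@"; do
    out+="${out:+,}\"$(json_escape "${pair%%=*}")\":\"$(json_escape "${pair#*=}")\""
  done
  printf '{%s}' "$out"
}

# Usage:
#   build_input_params prompt="What is machine learning?"
#   # prints {"prompt":"What is machine learning?"}
```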

---

## Model Version Management

### Auto-Upgrade Strategy

Set `modelVersion` to `"latest"` for automatic upgrades:

```json
{
  "parameterBindings": [
    {"key": "modelName", "value": "gpt-4o"},
    {"key": "modelVersion", "value": "latest"}
  ]
}
```

### Pinned Version Strategy

Specify exact version for stability:

```json
{
  "parameterBindings": [
    {"key": "modelName", "value": "gpt-4o"},
    {"key": "modelVersion", "value": "2024-05-13"}
  ]
}
```

### Manual Version Upgrade

Create a new configuration with the target `modelVersion` (as in Step 1), then patch the deployment to reference it:

```bash
curl -X PATCH "$AI_API_URL/v2/lm/deployments/<deployment-id>" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default" \
  -H "Content-Type: application/json" \
  -d '{
    "configurationId": "<new-config-id>"
  }'
```

---

## SAP AI Launchpad UI

### Prompt Experimentation

Access: **Workspaces** → **Generative AI Hub** → **Prompt Editor**

Features:
- Interactive prompt testing
- Model selection and parameter tuning
- Variable placeholders
- Image inputs (select models)
- Streaming responses
- Save prompts (manager roles)

### Required Roles

| Role | Capabilities |
|------|--------------|
| `genai_manager` | Full access, save prompts |
| `genai_experimenter` | Test only, no save |
| `prompt_manager` | Manage saved prompts |
| `prompt_experimenter` | Use saved prompts |
| `prompt_media_executor` | Upload images |

### Prompt Types

- **Question Answering**: Q&A interactions
- **Summarization**: Extract key points
- **Inferencing**: Sentiment, entity extraction
- **Transformations**: Translation, format conversion
- **Expansions**: Content generation

---

## Model Library

View model specifications and benchmarks in AI Launchpad:

**Access:** Generative AI Hub → Model Library

Information available:
- Model capabilities
- Context window sizes
- Performance benchmarks (win rates, arena scores)
- Cost per token
- Deprecation schedules

---

## Rate Limits and Quotas

Refer to **SAP Note 3437766** for:
- Token conversion rates per model
- Rate limits (requests/minute, tokens/minute)
- Regional availability
- Deprecation dates

### Quota Increase Request

Submit a support ticket:
- Component: `CA-ML-AIC`
- Include: tenant ID, current limits, requested limits, justification
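
Staying within the quota does not rule out short bursts above the per-minute limits. Assuming throttled requests come back with HTTP status 429, here is a retry sketch with exponential backoff. `with_backoff` is a hypothetical name, and the request function it wraps prints only the status code.

```bash
# Retry a request function with exponential backoff while it reports HTTP 429.
# Arg 1: function that performs the request and prints only the HTTP status code.
# Args 2-3: max attempts (default 5), initial delay in seconds (default 1).
with_backoff() {
  local request="$1" max="${2:-5}" delay="${3:-1}" attempt code
  for ((attempt = 1; attempt <= max; attempt++)); do
    code="$("$request")"
    if [ "$code" != "429" ]; then
      printf '%s\n' "$code"   # any non-429 status ends the retry loop
      return 0
    fi
    if ((attempt < max)); then
      sleep "$delay"
      delay=$((delay * 2))    # double the wait after each throttled attempt
    fi
  done
  echo "still rate-limited after $max attempts" >&2
  return 1
}

# Usage:
#   send() {
#     curl -s -o response.json -w '%{http_code}' -X POST "$DEPLOYMENT_URL/chat/completions" \
#       -H "Authorization: Bearer $AUTH_TOKEN" -H "AI-Resource-Group: default" \
#       -H "Content-Type: application/json" -d @request.json
#   }
#   with_backoff send 5 1
```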

---

## Best Practices

### Model Selection

| Use Case | Recommended Model |
|----------|-------------------|
| General chat | GPT-4o, Claude 3.5 Sonnet |
| Cost-sensitive | GPT-4o-mini, Mistral Small |
| Long context | GPT-4o (128K), Claude 3 (200K) |
| Embeddings | text-embedding-3-large |
| Code | Codestral, GPT-4o |
| Vision | GPT-4o, Gemini 1.5 Pro |

### Cost Optimization

1. Use smaller models for simple tasks
2. Implement caching for repeated queries
3. Set appropriate `max_tokens` limits
4. Use streaming to show output as it is generated (token usage is the same, so it adds no cost)
5. Monitor token usage via AI Launchpad analytics
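
To cache repeated queries, a simple on-disk cache keyed by a hash of the request body avoids paying twice for identical requests. This sketch assumes:

- a hypothetical `cached_request` helper;
- a caller-supplied function that sends the body and prints the response;
- `CACHE_DIR` as a hypothetical setting;
- GNU `sha256sum` (on macOS, `shasum -a 256` is the equivalent).

```bash
# Return a cached response for an identical request body, or call the request
# function once and store its output. Failed requests are not cached.
# Arg 1: function that sends the body (its first argument) and prints the response.
# Arg 2: request body. CACHE_DIR (hypothetical) defaults to ./.genai-cache.
cached_request() {
  local request="$1" body="$2" dir="${CACHE_DIR:-./.genai-cache}" key file
  key="$(printf '%s' "$body" | sha256sum | cut -d ' ' -f 1)"
  file="$dir/$key.json"
  if [ -f "$file" ]; then
    cat "$file"
    return 0
  fi
  mkdir -p "$dir"
  if "$request" "$body" > "$file.tmp"; then
    mv "$file.tmp" "$file"
    cat "$file"
  else
    rm -f "$file.tmp"
    return 1
  fi
}

# Usage (-f makes curl fail on HTTP errors, so error bodies are not cached):
#   send_chat() {
#     curl -sf -X POST "$DEPLOYMENT_URL/chat/completions" \
#       -H "Authorization: Bearer $AUTH_TOKEN" -H "AI-Resource-Group: default" \
#       -H "Content-Type: application/json" -d "$1"
#   }
#   cached_request send_chat '{"messages":[{"role":"user","content":"What is SAP AI Core?"}]}'
```

Caching fits best where a stored answer is acceptable for a repeated prompt. With a nonzero `temperature`, a fresh call could return different text.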

### Reliability

1. Implement fallback configurations
2. Pin model versions in production
3. Monitor deprecation dates
4. Test before upgrading versions
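
To monitor deprecation dates, the `deprecationDate` and `retirementDate` fields from the model list can be checked in a script. Because `YYYY-MM-DD` dates sort lexicographically in chronological order, a string comparison is enough. This sketch (a hypothetical `date_reached` helper) keeps only the first 10 characters so that full timestamps also work. It treats an empty value or `null` as "not scheduled".

```bash
# Return 0 if an ISO-8601 date (YYYY-MM-DD...) is on or before the reference day.
# Arg 1: date to check (e.g. a deprecationDate); arg 2: reference day, default today (UTC).
date_reached() {
  local when="${1:0:10}" today="${2:-$(date -u +%F)}"
  [ -n "$when" ] && [ "$when" != "null" ] && [[ ! "$when" > "$today" ]]
}

# Usage:
#   date_reached "$DEPRECATION_DATE" && echo "model version is deprecated; plan an upgrade"
```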

---

## Documentation Links

- Generative AI Hub: [https://github.com/SAP-docs/sap-artificial-intelligence/blob/main/docs/sap-ai-core/generative-ai-hub-7db524e.md](https://github.com/SAP-docs/sap-artificial-intelligence/blob/main/docs/sap-ai-core/generative-ai-hub-7db524e.md)
- Supported Models: [https://github.com/SAP-docs/sap-artificial-intelligence/blob/main/docs/sap-ai-core/supported-models-509e588.md](https://github.com/SAP-docs/sap-artificial-intelligence/blob/main/docs/sap-ai-core/supported-models-509e588.md)
- SAP Note 3437766: Token rates, limits, deprecation
- SAP Discovery Center: [https://discovery-center.cloud.sap/serviceCatalog/sap-ai-core](https://discovery-center.cloud.sap/serviceCatalog/sap-ai-core)