Initial commit

2025-11-30 08:54:41 +08:00
commit 47e178c6cb
15 changed files with 5385 additions and 0 deletions
--- a/references/generative-ai-hub.md
+++ b/references/generative-ai-hub.md
@@ -0,0 +1,464 @@
+# Generative AI Hub Reference
+
+Complete reference for SAP AI Core Generative AI Hub.
+
+**Documentation Source:** [https://github.com/SAP-docs/sap-artificial-intelligence/tree/main/docs/sap-ai-core](https://github.com/SAP-docs/sap-artificial-intelligence/tree/main/docs/sap-ai-core)
+
+---
+
+## Overview
+
+The Generative AI Hub integrates large language models (LLMs) into SAP AI Core and SAP AI Launchpad, providing unified access to models from multiple providers.
+
+### Key Features
+
+- Access to LLMs from six providers via a unified API
+- Harmonized API for model switching without code changes
+- Prompt experimentation in AI Launchpad UI
+- Orchestration workflows with filtering, masking, grounding
+- Token-based metering and billing
+
+### Prerequisites
+
+- SAP AI Core with **Extended** service plan
+- Valid service key credentials
+- Resource group created
+
+---
+
+## Global Scenarios
+
+Two scenarios provide generative AI access:
+
+| Scenario ID | Description | Use Case |
+|-------------|-------------|----------|
+| `foundation-models` | Direct model access | Single model deployment |
+| `orchestration` | Unified multi-model access | Pipeline workflows |
+
+---
+
+## Model Providers
+
+### 1. Azure OpenAI (`azure-openai`)
+
+Access to OpenAI models via Azure's private instance.
+
+**Models:**
+- GPT-4o, GPT-4o-mini
+- GPT-4 Turbo, GPT-4
+- GPT-3.5 Turbo
+- text-embedding-3-large, text-embedding-3-small
+
+**Capabilities:** Chat, embeddings, vision
+
+### 2. SAP-Hosted Open Source (`aicore-opensource`)
+
+SAP-hosted open source models via OpenAI-compatible API.
+
+**Models:**
+- Llama 3.1 (8B, 70B, 405B)
+- Llama 3.2 (1B, 3B, 11B-Vision, 90B-Vision)
+- Mistral 7B, Mixtral 8x7B
+- Falcon 40B
+
+**Capabilities:** Chat, embeddings, vision (select models)
+
+### 3. Google Vertex AI (`gcp-vertexai`)
+
+Access to Google's AI models.
+
+**Models:**
+- Gemini 1.5 Pro, Gemini 1.5 Flash
+- Gemini 1.0 Pro
+- PaLM 2 (text-bison, chat-bison)
+- text-embedding-004
+
+**Capabilities:** Chat, embeddings, vision, code
+
+### 4. AWS Bedrock (`aws-bedrock`)
+
+Access to models via AWS Bedrock.
+
+**Models:**
+- Anthropic Claude 3.5 Sonnet, Claude 3 Opus/Sonnet/Haiku
+- Amazon Titan Text, Titan Embeddings
+- Meta Llama 3
+- Cohere Command
+
+**Capabilities:** Chat, embeddings
+
+### 5. Mistral AI (`aicore-mistralai`)
+
+SAP-hosted Mistral models.
+
+**Models:**
+- Mistral Large
+- Mistral Medium
+- Mistral Small
+- Mistral 7B Instruct
+- Codestral
+
+**Capabilities:** Chat, code
+
+### 6. IBM (`aicore-ibm`)
+
+SAP-hosted IBM models.
+
+**Models:**
+- Granite 13B Chat, Granite 13B Instruct
+- Granite Code
+
+**Capabilities:** Chat, code
+
+---
+
+## API: List Available Models
+
+```bash
+curl -X GET "$AI_API_URL/v2/lm/scenarios/foundation-models/models" \
+  -H "Authorization: Bearer $AUTH_TOKEN" \
+  -H "AI-Resource-Group: default" \
+  -H "Content-Type: application/json"
+```
+
+### Response Structure
+
+```json
+{
+  "count": 50,
+  "resources": [
+    {
+      "model": "gpt-4o",
+      "accessType": "Remote",
+      "displayName": "GPT-4o",
+      "provider": "azure-openai",
+      "allowedScenarios": ["foundation-models"],
+      "executableId": "azure-openai",
+      "description": "OpenAI's most advanced model",
+      "versions": [
+        {
+          "name": "2024-05-13",
+          "isLatest": true,
+          "capabilities": ["text-generation", "chat", "vision"],
+          "contextLength": 128000,
+          "inputCost": 5.0,
+          "outputCost": 15.0,
+          "deprecationDate": null,
+          "retirementDate": null,
+          "isStreamingSupported": true
+        }
+      ]
+    }
+  ]
+}
+```
+
+### Model Metadata Fields
+
+| Field | Description |
+|-------|-------------|
+| `model` | Model identifier for API calls |
+| `accessType` | "Remote" (external) or "Local" (SAP-hosted) |
+| `provider` | Provider identifier |
+| `executableId` | Executable ID for deployments |
+| `contextLength` | Maximum context window tokens |
+| `inputCost` | Cost per 1K input tokens |
+| `outputCost` | Cost per 1K output tokens |
+| `deprecationDate` | Date the version becomes deprecated |
+| `retirementDate` | Date the version is removed |
+| `isStreamingSupported` | Streaming capability |
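+
+As a rough illustration of how `inputCost` and `outputCost` combine, the sketch below estimates the cost of one request from its token counts. The helper name `estimate_cost` is hypothetical, and the result is in whatever unit the model listing reports; no currency is stated here.
+
+```bash
+# Hypothetical helper: estimate request cost from token counts and per-1K-token prices.
+# usage: estimate_cost <input_tokens> <output_tokens> <input_cost_per_1k> <output_cost_per_1k>
+estimate_cost() {
+  awk -v it="$1" -v ot="$2" -v ic="$3" -v oc="$4" \
+    'BEGIN { printf "%.4f\n", (it / 1000) * ic + (ot / 1000) * oc }'
+}
+
+# 2,000 input tokens and 500 output tokens at the sample GPT-4o prices above
+estimate_cost 2000 500 5.0 15.0   # prints 17.5000
+```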
+
+---
+
+## Deploying a Model
+
+### Step 1: Create Configuration
+
+```bash
+curl -X POST "$AI_API_URL/v2/lm/configurations" \
+  -H "Authorization: Bearer $AUTH_TOKEN" \
+  -H "AI-Resource-Group: default" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "name": "gpt4o-deployment-config",
+    "executableId": "azure-openai",
+    "scenarioId": "foundation-models",
+    "parameterBindings": [
+      {"key": "modelName", "value": "gpt-4o"},
+      {"key": "modelVersion", "value": "latest"}
+    ]
+  }'
+```
+
+### Step 2: Create Deployment
+
+```bash
+curl -X POST "$AI_API_URL/v2/lm/deployments" \
+  -H "Authorization: Bearer $AUTH_TOKEN" \
+  -H "AI-Resource-Group: default" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "configurationId": "<config-id-from-step-1>"
+  }'
+```
+
+### Step 3: Check Status
+
+```bash
+curl -X GET "$AI_API_URL/v2/lm/deployments/<deployment-id>" \
+  -H "Authorization: Bearer $AUTH_TOKEN" \
+  -H "AI-Resource-Group: default"
+```
+
+Wait for status `RUNNING` and note the `deploymentUrl`.
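+
+The wait can be scripted as a polling loop. This is a minimal sketch, assuming `status` and `deploymentUrl` are top-level string fields in the response; the helper names `json_field`, `fetch_deployment`, and `wait_for_running` are hypothetical, and the `sed`-based extraction is deliberately naive (a JSON parser such as `jq` is more robust where available).
+
+```bash
+# Extract a string field from a JSON document (naive: no escaped quotes, no nesting awareness).
+# usage: json_field <field> <json>
+json_field() {
+  printf '%s\n' "$2" |
+    sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p" |
+    head -n 1
+}
+
+# Fetch the deployment details, as in Step 3.
+fetch_deployment() {
+  curl -sS -X GET "$AI_API_URL/v2/lm/deployments/$1" \
+    -H "Authorization: Bearer $AUTH_TOKEN" \
+    -H "AI-Resource-Group: default"
+}
+
+# Poll until the status is RUNNING, then print the deployment URL.
+# usage: wait_for_running <deployment-id> [max_attempts] [delay_seconds]
+wait_for_running() {
+  local id="$1" max="${2:-30}" delay="${3:-10}" attempt=1 body status
+  while [ "$attempt" -le "$max" ]; do
+    body=$(fetch_deployment "$id") || return 1
+    status=$(json_field status "$body")
+    if [ "$status" = "RUNNING" ]; then
+      json_field deploymentUrl "$body"
+      return 0
+    fi
+    sleep "$delay"
+    attempt=$((attempt + 1))
+  done
+  return 1  # gave up before the deployment reached RUNNING
+}
+```
+
+A call such as `DEPLOYMENT_URL=$(wait_for_running "<deployment-id>")` would then capture the URL for the requests that follow.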
+
+---
+
+## Using the Harmonized API
+
+The harmonized API provides unified access without model-specific code.
+
+### Chat Completion
+
+```bash
+curl -X POST "$DEPLOYMENT_URL/chat/completions" \
+  -H "Authorization: Bearer $AUTH_TOKEN" \
+  -H "AI-Resource-Group: default" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "gpt-4o",
+    "messages": [
+      {"role": "system", "content": "You are a helpful assistant."},
+      {"role": "user", "content": "What is SAP AI Core?"}
+    ],
+    "max_tokens": 1000,
+    "temperature": 0.7
+  }'
+```
+
+### With Streaming
+
+```bash
+curl -X POST "$DEPLOYMENT_URL/chat/completions" \
+  -H "Authorization: Bearer $AUTH_TOKEN" \
+  -H "AI-Resource-Group: default" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "gpt-4o",
+    "messages": [{"role": "user", "content": "Tell me a story"}],
+    "stream": true
+  }'
+```
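+
+A streamed response arrives in chunks rather than as one JSON document. The sketch below assumes the deployment emits OpenAI-style server-sent events: `data: {...}` lines that carry text in `choices[].delta.content`, ending with `data: [DONE]`. The function name `print_stream_deltas` is hypothetical, and the extraction does not handle escaped quotes inside the text.
+
+```bash
+# Print the concatenated text deltas from an OpenAI-style SSE stream read on stdin.
+print_stream_deltas() {
+  local line
+  while IFS= read -r line || [ -n "$line" ]; do
+    case "$line" in
+      "data: [DONE]"*) break ;;          # end-of-stream marker
+      "data: "*)
+        printf '%s' "$line" |
+          sed -n 's/.*"delta"[^}]*"content"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' |
+          tr -d '\n'
+        ;;
+    esac
+  done
+  printf '\n'
+}
+```
+
+To use it, add `-N` (no buffering) to the `curl` command above and pipe its output into `print_stream_deltas`.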
+
+### Embeddings
+
+```bash
+curl -X POST "$DEPLOYMENT_URL/embeddings" \
+  -H "Authorization: Bearer $AUTH_TOKEN" \
+  -H "AI-Resource-Group: default" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "text-embedding-3-large",
+    "input": ["Document chunk to embed"],
+    "encoding_format": "float"
+  }'
+```
+
+---
+
+## Orchestration Deployment
+
+An orchestration deployment provides unified access to multiple models through a single endpoint.
+
+### Create Orchestration Deployment
+
+```bash
+# Get orchestration configuration ID
+curl -X GET "$AI_API_URL/v2/lm/configurations?scenarioId=orchestration" \
+  -H "Authorization: Bearer $AUTH_TOKEN" \
+  -H "AI-Resource-Group: default"
+
+# Create deployment
+curl -X POST "$AI_API_URL/v2/lm/deployments" \
+  -H "Authorization: Bearer $AUTH_TOKEN" \
+  -H "AI-Resource-Group: default" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "configurationId": "<orchestration-config-id>"
+  }'
+```
+
+### Use Orchestration API
+
+```bash
+curl -X POST "$ORCHESTRATION_URL/v2/completion" \
+  -H "Authorization: Bearer $AUTH_TOKEN" \
+  -H "AI-Resource-Group: default" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "config": {
+      "module_configurations": {
+        "llm_module_config": {
+          "model_name": "gpt-4o",
+          "model_version": "latest"
+        },
+        "templating_module_config": {
+          "template": [
+            {"role": "user", "content": "{{?prompt}}"}
+          ]
+        }
+      }
+    },
+    "input_params": {
+      "prompt": "What is machine learning?"
+    }
+  }'
+```
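+
+In the request above, the `{{?prompt}}` placeholder in the template is filled from the matching key in `input_params`. The substitution itself happens inside the orchestration service; the local sketch below only mimics the idea so the relationship is easier to see. The function `render_template` is hypothetical and assumes placeholder names are plain identifiers.
+
+```bash
+# Replace every {{?name}} placeholder in a template string with a value.
+# usage: render_template <template> <name> <value>
+render_template() {
+  local escaped
+  # Escape characters that are special in the replacement part of sed's s command.
+  escaped=$(printf '%s' "$3" | sed 's/[&|\\]/\\&/g')
+  printf '%s\n' "$1" | sed "s|{{?$2}}|$escaped|g"
+}
+
+render_template '{{?prompt}}' prompt 'What is machine learning?'
+```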
+
+---
+
+## Model Version Management
+
+### Auto-Upgrade Strategy
+
+Set `modelVersion` to `"latest"` for automatic upgrades:
+
+```json
+{
+  "parameterBindings": [
+    {"key": "modelName", "value": "gpt-4o"},
+    {"key": "modelVersion", "value": "latest"}
+  ]
+}
+```
+
+### Pinned Version Strategy
+
+Specify an exact version for stability:
+
+```json
+{
+  "parameterBindings": [
+    {"key": "modelName", "value": "gpt-4o"},
+    {"key": "modelVersion", "value": "2024-05-13"}
+  ]
+}
+```
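+
+Both strategies differ only in the `modelVersion` binding, so the configuration payload can be generated from one helper. This is a sketch: the function name `config_payload` is hypothetical, the `executableId` and `scenarioId` are fixed to the values used in Step 1, and the arguments are assumed not to contain characters that need JSON escaping.
+
+```bash
+# Build a configuration payload; the version defaults to "latest" (auto-upgrade).
+# usage: config_payload <config-name> <model-name> [model-version]
+config_payload() {
+  local version="${3:-latest}"
+  printf '{"name": "%s", "executableId": "azure-openai", "scenarioId": "foundation-models", "parameterBindings": [{"key": "modelName", "value": "%s"}, {"key": "modelVersion", "value": "%s"}]}\n' \
+    "$1" "$2" "$version"
+}
+
+config_payload gpt4o-auto gpt-4o               # auto-upgrade
+config_payload gpt4o-pinned gpt-4o 2024-05-13  # pinned
+```
+
+The output can be passed to the configuration request with `-d "$(config_payload ...)"`.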
+
+### Manual Version Upgrade
+
+Patch the deployment with a new configuration:
+
+```bash
+curl -X PATCH "$AI_API_URL/v2/lm/deployments/<deployment-id>" \
+  -H "Authorization: Bearer $AUTH_TOKEN" \
+  -H "AI-Resource-Group: default" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "configurationId": "<new-config-id>"
+  }'
+```
+
+---
+
+## SAP AI Launchpad UI
+
+### Prompt Experimentation
+
+Access: **Workspaces** → **Generative AI Hub** → **Prompt Editor**
+
+Features:
+- Interactive prompt testing
+- Model selection and parameter tuning
+- Variable placeholders
+- Image inputs (select models)
+- Streaming responses
+- Save prompts (manager roles)
+
+### Required Roles
+
+| Role | Capabilities |
+|------|--------------|
+| `genai_manager` | Full access, save prompts |
+| `genai_experimenter` | Test only, no save |
+| `prompt_manager` | Manage saved prompts |
+| `prompt_experimenter` | Use saved prompts |
+| `prompt_media_executor` | Upload images |
+
+### Prompt Types
+
+- **Question Answering**: Q&A interactions
+- **Summarization**: Extract key points
+- **Inferencing**: Sentiment, entity extraction
+- **Transformations**: Translation, format conversion
+- **Expansions**: Content generation
+
+---
+
+## Model Library
+
+View model specifications and benchmarks in AI Launchpad:
+
+**Access:** Generative AI Hub → Model Library
+
+Information available:
+- Model capabilities
+- Context window sizes
+- Performance benchmarks (win rates, arena scores)
+- Cost per token
+- Deprecation schedules
+
+---
+
+## Rate Limits and Quotas
+
+Refer to **SAP Note 3437766** for:
+- Token conversion rates per model
+- Rate limits (requests/minute, tokens/minute)
+- Regional availability
+- Deprecation dates
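+
+Clients that may hit a rate limit usually retry with a growing delay. The sketch below is a generic wrapper, not an SAP-provided mechanism: `retry_with_backoff` is a hypothetical name, and it assumes a rate-limited request surfaces as a non-zero exit status, which `curl --fail` produces for HTTP error responses.
+
+```bash
+# Run a command until it succeeds, doubling the delay after each failure.
+# usage: retry_with_backoff <max_attempts> <initial_delay_seconds> <command...>
+retry_with_backoff() {
+  local max="$1" delay="$2" attempt=1
+  shift 2
+  until "$@"; do
+    if [ "$attempt" -ge "$max" ]; then
+      return 1  # all attempts used up
+    fi
+    sleep "$delay"
+    delay=$((delay * 2))
+    attempt=$((attempt + 1))
+  done
+}
+
+# Example (not executed here):
+# retry_with_backoff 5 2 curl -sS --fail "$AI_API_URL/v2/lm/deployments" \
+#   -H "Authorization: Bearer $AUTH_TOKEN" -H "AI-Resource-Group: default"
+```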
+
+### Quota Increase Request
+
+Submit a support ticket with the following details:
+- Component: `CA-ML-AIC`
+- Include: tenant ID, current limits, requested limits, justification
+
+---
+
+## Best Practices
+
+### Model Selection
+
+| Use Case | Recommended Model |
+|----------|-------------------|
+| General chat | GPT-4o, Claude 3.5 Sonnet |
+| Cost-sensitive | GPT-4o-mini, Mistral Small |
+| Long context | GPT-4o (128K), Claude 3 (200K) |
+| Embeddings | text-embedding-3-large |
+| Code | Codestral, GPT-4o |
+| Vision | GPT-4o, Gemini 1.5 Pro |
+
+### Cost Optimization
+
+1. Use smaller models for simple tasks
+2. Implement caching for repeated queries
+3. Set appropriate `max_tokens` limits
+4. Use streaming for better UX without extra cost
+5. Monitor token usage via AI Launchpad analytics
+
+### Reliability
+
+1. Implement fallback configurations
+2. Pin model versions in production
+3. Monitor deprecation dates
+4. Test before upgrading versions
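+
+One way to read "fallback configurations" is to keep more than one deployment available and try them in order. The sketch below takes that approach under stated assumptions: the function names `send_chat` and `chat_with_fallback` are hypothetical, each URL is a running deployment that accepts the same chat payload, and a failed request is detected through `curl --fail`.
+
+```bash
+# Send a chat completion request to one deployment; non-zero exit on HTTP errors.
+# usage: send_chat <deployment-url> <json-payload>
+send_chat() {
+  curl -sS --fail -X POST "$1/chat/completions" \
+    -H "Authorization: Bearer $AUTH_TOKEN" \
+    -H "AI-Resource-Group: default" \
+    -H "Content-Type: application/json" \
+    -d "$2"
+}
+
+# Try each deployment URL in order and print the first successful response.
+# usage: chat_with_fallback <json-payload> <deployment-url>...
+chat_with_fallback() {
+  local payload="$1" url
+  shift
+  for url in "$@"; do
+    if send_chat "$url" "$payload"; then
+      return 0
+    fi
+  done
+  return 1  # every deployment failed
+}
+```
+
+Because the payload names a model, a fallback to a different model would also need the `model` field adjusted per deployment; the sketch leaves that out.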
+
+---
+
+## Documentation Links
+
+- Generative AI Hub: [https://github.com/SAP-docs/sap-artificial-intelligence/blob/main/docs/sap-ai-core/generative-ai-hub-7db524e.md](https://github.com/SAP-docs/sap-artificial-intelligence/blob/main/docs/sap-ai-core/generative-ai-hub-7db524e.md)
+- Supported Models: [https://github.com/SAP-docs/sap-artificial-intelligence/blob/main/docs/sap-ai-core/supported-models-509e588.md](https://github.com/SAP-docs/sap-artificial-intelligence/blob/main/docs/sap-ai-core/supported-models-509e588.md)
+- SAP Note 3437766: Token rates, limits, deprecation
+- SAP Discovery Center: [https://discovery-center.cloud.sap/serviceCatalog/sap-ai-core](https://discovery-center.cloud.sap/serviceCatalog/sap-ai-core)