# ML Operations Reference

Complete reference for SAP AI Core ML training and operations.

**Documentation Source:** [https://github.com/SAP-docs/sap-artificial-intelligence/tree/main/docs/sap-ai-core](https://github.com/SAP-docs/sap-artificial-intelligence/tree/main/docs/sap-ai-core)

---

## Overview

SAP AI Core uses Argo Workflows for training pipelines, supporting batch jobs for data preprocessing, model training, and inference.

### Key Components

| Component | Description |
|-----------|-------------|
| **Scenarios** | AI use case implementations |
| **Executables** | Reusable workflow templates |
| **Configurations** | Parameters and artifact bindings |
| **Executions** | Running instances of workflows |
| **Artifacts** | Datasets, models, and results |

---

## Workflow Engine

### Argo Workflows

SAP AI Core uses Argo Workflows (a container-native workflow engine), which supports:

- Directed Acyclic Graph (DAG) structures
- Parallel step execution
- Container-based steps
- Data ingestion and preprocessing
- Model training and batch inference

**Limitation:** Not optimized for time-critical tasks due to scheduling overhead.

---

## Prerequisites

### 1. Object Store Secret (Required)

Create a secret named `default` for training output artifacts:

```bash
curl -X POST "$AI_API_URL/v2/admin/objectStoreSecrets" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "default",
    "type": "S3",
    "pathPrefix": "my-bucket/training-output",
    "data": {
      "AWS_ACCESS_KEY_ID": "",
      "AWS_SECRET_ACCESS_KEY": ""
    }
  }'
```

**Note:** Without a `default` secret, training pipelines will fail.

### 2. Docker Registry Secret

For custom training images:

```bash
curl -X POST "$AI_API_URL/v2/admin/dockerRegistrySecrets" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "docker-registry",
    "data": {
      ".dockerconfigjson": ""
    }
  }'
```

### 3. Git Repository

Sync workflow templates from Git:

```bash
curl -X POST "$AI_API_URL/v2/admin/repositories" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "training-repo",
    "url": "https://github.com/org/training-workflows",
    "username": "",
    "password": ""
  }'
```

---

## Workflow Template

### Basic Structure

```yaml
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: text-classifier-training
  annotations:
    scenarios.ai.sap.com/description: "Train text classification model"
    scenarios.ai.sap.com/name: "text-classifier"
    executables.ai.sap.com/description: "Training executable"
    executables.ai.sap.com/name: "text-classifier-train"
    artifacts.ai.sap.com/training-data.kind: "dataset"
    artifacts.ai.sap.com/trained-model.kind: "model"
  labels:
    scenarios.ai.sap.com/id: "text-classifier"
    executables.ai.sap.com/id: "text-classifier-train"
    ai.sap.com/version: "1.0.0"
spec:
  imagePullSecrets:
    - name: docker-registry
  entrypoint: main
  arguments:
    parameters:
      - name: learning_rate
        default: "0.001"
      - name: epochs
        default: "10"
    artifacts:
      - name: training-data
        path: /data/input
        archive:
          none: {}
  templates:
    - name: main
      steps:
        - - name: preprocess
            template: preprocess-data
        - - name: train
            template: train-model
        - - name: evaluate
            template: evaluate-model

    - name: preprocess-data
      container:
        image: my-registry/preprocessing:latest
        command: ["python", "preprocess.py"]
        args: ["--input", "/data/input", "--output", "/data/processed"]

    - name: train-model
      container:
        image: my-registry/training:latest
        command: ["python", "train.py"]
        args:
          - "--data=/data/processed"
          - "--lr={{workflow.parameters.learning_rate}}"
          - "--epochs={{workflow.parameters.epochs}}"
          - "--output=/data/model"
      outputs:
        artifacts:
          - name: trained-model
            path: /data/model
            globalName: trained-model
            archive:
              none: {}

    - name: evaluate-model
      container:
        image: my-registry/evaluation:latest
        command: ["python", "evaluate.py"]
        args: ["--model", "/data/model"]
```

### Annotations Reference

| Annotation | Description |
|------------|-------------|
| `scenarios.ai.sap.com/name` | Human-readable scenario name |
| `scenarios.ai.sap.com/id` | Scenario identifier |
| `executables.ai.sap.com/name` | Executable name |
| `executables.ai.sap.com/id` | Executable identifier |
| `artifacts.ai.sap.com/<artifact-name>.kind` | Artifact type (dataset, model, etc.) |

---

## Artifacts

### Types

| Kind | Description | Use Case |
|------|-------------|----------|
| `dataset` | Training/validation data | Input for training |
| `model` | Trained model | Output from training |
| `resultset` | Inference results | Output from batch inference |
| `other` | Miscellaneous | Logs, metrics, configs |

### Register Input Artifact

```bash
curl -X POST "$AI_API_URL/v2/lm/artifacts" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "training-dataset-v1",
    "kind": "dataset",
    "url": "ai://default/datasets/training-v1",
    "scenarioId": "text-classifier",
    "description": "Training dataset version 1"
  }'
```

### URL Format

- `ai://default/<path>` - Uses the default object store secret
- `ai://<secret-name>/<path>` - Uses a named object store secret

### List Artifacts

```bash
curl -X GET "$AI_API_URL/v2/lm/artifacts?scenarioId=text-classifier" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default"
```

---

## Configurations

### Create Training Configuration

```bash
curl -X POST "$AI_API_URL/v2/lm/configurations" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "text-classifier-config-v1",
    "executableId": "text-classifier-train",
    "scenarioId": "text-classifier",
    "parameterBindings": [
      {"key": "learning_rate", "value": "0.001"},
      {"key": "epochs", "value": "20"},
      {"key": "batch_size", "value": "32"}
    ],
    "inputArtifactBindings": [
      {"key": "training-data", "artifactId": ""}
    ]
  }'
```
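
The configuration request above can also be assembled programmatically, which helps when sweeping hyperparameters. The following is a minimal sketch, not an official client: `build_configuration_payload` is a hypothetical helper, and `<artifact-id>` is a placeholder for the ID returned when the input artifact is registered. It only builds the JSON body and sends nothing.

```python
import json


def build_configuration_payload(name, scenario_id, executable_id,
                                parameters, input_artifacts):
    """Assemble a JSON body shaped like the curl example for /v2/lm/configurations.

    Hypothetical helper for illustration; it performs no HTTP request.
    """
    return {
        "name": name,
        "scenarioId": scenario_id,
        "executableId": executable_id,
        # Binding values are strings in the curl example, so convert numbers.
        "parameterBindings": [
            {"key": key, "value": str(value)}
            for key, value in parameters.items()
        ],
        "inputArtifactBindings": [
            {"key": key, "artifactId": artifact_id}
            for key, artifact_id in input_artifacts.items()
        ],
    }


payload = build_configuration_payload(
    name="text-classifier-config-v1",
    scenario_id="text-classifier",
    executable_id="text-classifier-train",
    parameters={"learning_rate": 0.001, "epochs": 20},
    # "<artifact-id>" is a placeholder, not a real artifact ID.
    input_artifacts={"training-data": "<artifact-id>"},
)
print(json.dumps(payload, indent=2))
```

The binding keys should match the parameter and artifact names declared in the workflow template (`learning_rate`, `epochs`, `training-data` in the template shown earlier).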

---

## Executions

### Create Execution

```bash
curl -X POST "$AI_API_URL/v2/lm/executions" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default" \
  -H "Content-Type: application/json" \
  -d '{
    "configurationId": ""
  }'
```

### Execution Statuses

| Status | Description |
|--------|-------------|
| `UNKNOWN` | Initial state |
| `PENDING` | Queued for execution |
| `RUNNING` | Currently executing |
| `COMPLETED` | Finished successfully |
| `DEAD` | Failed |
| `STOPPED` | Manually stopped |

### Check Execution Status

```bash
curl -X GET "$AI_API_URL/v2/lm/executions/<execution-id>" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default"
```

### Get Execution Logs

```bash
curl -X GET "$AI_API_URL/v2/lm/executions/<execution-id>/logs" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default"
```

### Stop Execution

```bash
curl -X PATCH "$AI_API_URL/v2/lm/executions/<execution-id>" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default" \
  -H "Content-Type: application/json" \
  -d '{"targetStatus": "STOPPED"}'
```

---

## Metrics

### Write Metrics from Training

In your training code:

```python
import os

import requests


def log_metrics(metrics: dict, step: int) -> None:
    """Log metrics to SAP AI Core."""
    api_url = os.environ.get("AICORE_API_URL")
    token = os.environ.get("AICORE_AUTH_TOKEN")
    execution_id = os.environ.get("AICORE_EXECUTION_ID")

    response = requests.post(
        f"{api_url}/v2/lm/executions/{execution_id}/metrics",
        headers={
            "Authorization": f"Bearer {token}",
            "AI-Resource-Group": "default",
            "Content-Type": "application/json",
        },
        json={
            "metrics": [
                {"name": name, "value": value, "step": step}
                for name, value in metrics.items()
            ]
        },
        timeout=30,
    )
    # Surface HTTP errors instead of silently dropping metrics
    response.raise_for_status()


# Usage in training loop.
# epochs, train_epoch() and validate() come from your own training code.
for epoch in range(epochs):
    train_loss = train_epoch()
    val_loss, accuracy = validate()
    log_metrics({
        "train_loss": train_loss,
        "val_loss": val_loss,
        "accuracy": accuracy
    }, step=epoch)
```

### Read Metrics

```bash
curl -X GET "$AI_API_URL/v2/lm/executions/<execution-id>/metrics" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default"
```

---

## Training Schedules

### Create Schedule

```bash
curl -X POST "$AI_API_URL/v2/lm/executionSchedules" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default" \
  -H "Content-Type: application/json" \
  -d '{
    "configurationId": "",
    "cron": "0 0 * * sun",
    "start": "2024-01-01T00:00:00Z",
    "end": "2024-12-31T23:59:59Z"
  }'
```

### Cron Expression Format

SAP AI Core uses 5-field cron expressions with **3-letter day-of-week names**:

```
┌───────── minute (0-59)
│ ┌─────── hour (0-23)
│ │ ┌───── day of month (1-31)
│ │ │ ┌─── month (1-12)
│ │ │ │ ┌─ day of week (mon, tue, wed, thu, fri, sat, sun)
│ │ │ │ │
* * * * *
```

Examples:

- `0 0 * * *` - Daily at midnight
- `0 0 * * sun` - Weekly on Sunday
- `0 0 * * fri` - Weekly on Friday
- `0 0 1 * *` - Monthly on the 1st
- `0 */6 * * *` - Every 6 hours

**Note:** Using `* * * * *` treats the schedule as "Run Always" (continuous check), which differs from standard cron behavior. The minimum interval for pipeline schedules is 1 hour.

### List Schedules

```bash
curl -X GET "$AI_API_URL/v2/lm/executionSchedules" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default"
```

### Delete Schedule

```bash
curl -X DELETE "$AI_API_URL/v2/lm/executionSchedules/<schedule-id>" \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "AI-Resource-Group: default"
```

---

## SAP AI Launchpad

### ML Operations App

Access: **Workspaces** → **ML Operations**

Features:

- View scenarios and executables
- Create/manage configurations
- Run/monitor executions
- View training metrics
- Manage artifacts
- Create schedules

### Required Roles

| Role | Capabilities |
|------|--------------|
| `operations_manager` | Access ML Operations app |
| `mloperations_viewer` | View-only access |
| `mloperations_editor` | Full edit access |

### Comparing Runs

1. Navigate to ML Operations → Executions
2. Select multiple executions
3. Click "Compare"
4. View side-by-side metrics and parameters

---

## Best Practices

### Workflow Design

1. **Modular steps**: Break the workflow into reusable templates
2. **Parameterization**: Expose hyperparameters as workflow parameters
3. **Artifact management**: Define clear input/output artifacts
4. **Error handling**: Include retry logic for flaky operations

### Resource Management

1. **Appropriate sizing**: Match container resources to workload
2. **GPU allocation**: Request GPUs only when needed
3. **Storage**: Use object store for large datasets
4. **Cleanup**: Delete old executions and artifacts

### Monitoring

1. **Log metrics**: Track loss, accuracy, etc. during training
2. **Check logs**: Review execution logs for errors
3. **Compare runs**: Analyze different hyperparameter settings
4. **Set alerts**: Monitor for failed executions

---

## Troubleshooting

### Execution Failed

1. Check execution logs: `GET /v2/lm/executions/{id}/logs`
2. Verify the object store secret exists and is named `default`
3. Check that the Docker image is accessible
4. Verify artifact paths are correct
5. Check that the resource quota is not exceeded

### Artifacts Not Found

1. Verify artifact URL format: `ai://default/<path>`
2. Check object store secret permissions
3. Verify the file exists in the object store
4. Check that the artifact is registered in the correct scenario

### Schedule Not Running

1. Verify the schedule is active (not paused)
2. Check that the cron expression is valid
3. Verify the start/end dates bracket the current time
4. Check that the configuration still exists

---

## Documentation Links

- Training Overview: [https://github.com/SAP-docs/sap-artificial-intelligence/blob/main/docs/sap-ai-core/train-your-model-a9ceb06.md](https://github.com/SAP-docs/sap-artificial-intelligence/blob/main/docs/sap-ai-core/train-your-model-a9ceb06.md)
- ML Operations (Launchpad): [https://github.com/SAP-docs/sap-artificial-intelligence/blob/main/docs/sap-ai-launchpad/ml-operations-df78271.md](https://github.com/SAP-docs/sap-artificial-intelligence/blob/main/docs/sap-ai-launchpad/ml-operations-df78271.md)
- Schedules: [https://github.com/SAP-docs/sap-artificial-intelligence/blob/main/docs/sap-ai-core/create-a-training-schedule-bd409a9.md](https://github.com/SAP-docs/sap-artificial-intelligence/blob/main/docs/sap-ai-core/create-a-training-schedule-bd409a9.md)
- Metrics: [https://github.com/SAP-docs/sap-artificial-intelligence/blob/main/docs/sap-ai-core/view-the-metric-resource-for-an-execution-d85dd44.md](https://github.com/SAP-docs/sap-artificial-intelligence/blob/main/docs/sap-ai-core/view-the-metric-resource-for-an-execution-d85dd44.md)