# Graphs and Pipelines Guide Complete guide for graph/pipeline development in SAP Data Intelligence. ## Table of Contents 1. [Overview](#overview) 2. [Graph Concepts](#graph-concepts) 3. [Creating Graphs](#creating-graphs) 4. [Running Graphs](#running-graphs) 5. [Monitoring](#monitoring) 6. [Error Recovery](#error-recovery) 7. [Scheduling](#scheduling) 8. [Advanced Topics](#advanced-topics) 9. [Best Practices](#best-practices) --- ## Overview Graphs (also called pipelines) are the core execution unit in SAP Data Intelligence. **Definition:** A graph is a network of operators connected via typed input/output ports for data transfer. **Key Features:** - Visual design in Modeler - Two generations (Gen1, Gen2) - Execution monitoring - Error recovery (Gen2) - Scheduling --- ## Graph Concepts ### Operator Generations **Gen1 Operators:** - Legacy operator model - Process-based execution - Manual error handling - Broad compatibility **Gen2 Operators:** - Enhanced error recovery - State management with snapshots - Native multiplexing - Better performance **Critical Rule:** Cannot mix Gen1 and Gen2 operators in the same graph. ### Ports and Data Types **Port Types:** - Input ports: Receive data - Output ports: Send data - Typed connections **Common Data Types:** - string, int32, int64, float32, float64 - blob (binary data) - message (structured data) - table (tabular data) - any (flexible type) ### Graph Structure ``` [Source Operator] ─── port ──→ [Processing Operator] ─── port ──→ [Target Operator] │ │ │ Output Port In/Out Ports Input Port ``` --- ## Creating Graphs ### Using the Modeler 1. **Create New Graph** - Open Modeler - Click "+" to create graph - Select graph name and location 2. **Add Operators** - Browse operator repository - Drag operators to canvas - Or search by name 3. **Connect Operators** - Drag from output port to input port - Verify type compatibility - Configure connection properties 4. **Configure Operators** - Select operator - Set parameters in Properties panel - Configure ports if needed 5. **Validate Graph** - Click Validate button - Review warnings and errors - Fix issues before running ### Graph-Level Configuration **Graph Data Types:** Create custom data types for the graph: ```json { "name": "CustomerRecord", "properties": { "id": "string", "name": "string", "amount": "float64" } } ``` **Graph Parameters:** Define runtime parameters: ``` Parameters: - name: source_path type: string default: /data/input/ - name: batch_size type: int32 default: 1000 ``` ### Groups and Tags **Groups:** - Organize operators visually - Share Docker configuration - Resource allocation **Tags:** - Label operators - Filter and search - Documentation --- ## Running Graphs ### Execution Methods **Manual Execution:** 1. Open graph in Modeler 2. Click "Run" button 3. Configure runtime parameters 4. Monitor execution **Programmatic Execution:** Via Pipeline API or Data Workflow operators. ### Execution Model **Process Model:** - Each operator runs as process - Communication via ports - Coordinated by main engine **Gen2 Features:** - Snapshot checkpoints - State recovery - Exactly-once semantics (configurable) ### Runtime Parameters Pass parameters at execution: ``` source_path = /data/2024/january/ batch_size = 5000 target_table = SALES_JAN_2024 ``` ### Resource Configuration **Memory/CPU:** ```json { "resources": { "requests": { "memory": "1Gi", "cpu": "500m" }, "limits": { "memory": "4Gi", "cpu": "2000m" } } } ``` --- ## Monitoring ### Graph Status | Status | Description | |--------|-------------| | Pending | Waiting to start | | Running | Actively executing | | Completed | Finished successfully | | Failed | Error occurred | | Dead | Terminated unexpectedly | | Stopping | Shutdown in progress | ### Operator Status | Status | Description | |--------|-------------| | Initializing | Setting up | | Running | Processing data | | Stopped | Finished or stopped | | Failed | Error in operator | ### Monitoring Dashboard **Available Metrics:** - Messages processed - Processing time - Memory usage - Error counts **Access:** 1. Open running graph 2. Click "Monitor" tab 3. View real-time statistics ### Diagnostic Information **Collect Diagnostics:** 1. Select running/failed graph 2. Click "Download Diagnostics" 3. Review logs and state **Archive Contents:** - execution.json (execution details) - graphs.json (graph definition) - events.json (execution events) - Operator logs - State snapshots --- ## Error Recovery ### Gen2 Error Recovery **Automatic Recovery:** 1. Enable in graph settings 2. Configure snapshot interval 3. System recovers from last snapshot **Configuration:** ```json { "autoRecovery": { "enabled": true, "snapshotInterval": "60s", "maxRetries": 3 } } ``` ### Snapshots **What's Saved:** - Operator state - Message queues - Processing position **When Snapshots Occur:** - Periodic (configured interval) - On operator request - Before shutdown ### Delivery Guarantees | Mode | Description | |------|-------------| | At-most-once | May lose messages | | At-least-once | May duplicate messages | | Exactly-once | No loss or duplication | **Gen2 Default:** At-least-once with recovery. ### Manual Error Handling **In Script Operators:** ```python def on_input(msg_id, header, body): try: result = process(body) api.send("output", api.Message(result)) except Exception as e: api.logger.error(f"Processing error: {e}") api.send("error", api.Message({ "error": str(e), "input": body })) ``` --- ## Scheduling ### Schedule Graph Executions **Cron Expression Format:** ``` ┌───────────── second (0-59) │ ┌───────────── minute (0-59) │ │ ┌───────────── hour (0-23) │ │ │ ┌───────────── day of month (1-31) │ │ │ │ ┌───────────── month (1-12) │ │ │ │ │ ┌───────────── day of week (0-6, Sun=0) │ │ │ │ │ │ * * * * * * ``` **Examples:** ``` 0 0 * * * * # Every hour 0 0 0 * * * # Daily at midnight 0 0 0 * * 1 # Every Monday 0 0 6 1 * * # 6 AM on first of month 0 */15 * * * * # Every 15 minutes ``` ### Creating Schedule 1. Open graph 2. Click "Schedule" 3. Configure cron expression 4. Set timezone 5. Activate schedule ### Managing Schedules **Actions:** - View scheduled runs - Pause schedule - Resume schedule - Delete schedule - View execution history --- ## Advanced Topics ### Native Multiplexing (Gen2) Connect one output to multiple inputs: ``` [Source] ─┬──→ [Processor A] ├──→ [Processor B] └──→ [Processor C] ``` Or multiple outputs to one input: ``` [Source A] ──┐ [Source B] ──┼──→ [Processor] [Source C] ──┘ ``` ### Graph Snippets Reusable graph fragments: 1. **Create Snippet:** - Select operators - Right-click > "Save as Snippet" - Name and save 2. **Use Snippet:** - Drag snippet to canvas - Configure parameters - Connect to graph ### Parameterization **Substitution Variables:** ``` ${parameter_name} ${ENV.VARIABLE_NAME} ${SYSTEM.TENANT} ``` **In Operator Config:** ``` File Path: ${source_path}/data_${DATE}.csv Connection: ${target_connection} ``` ### Import/Export **Export Graph:** ``` 1. Select graph 2. Right-click > Export 3. Include data types (optional) 4. Save as .zip ``` **Import Graph:** ``` 1. Right-click in repository 2. Import > From file 3. Select .zip file 4. Map dependencies ``` --- ## Best Practices ### Graph Design 1. **Clear Data Flow**: Left-to-right, top-to-bottom 2. **Meaningful Names**: Descriptive operator names 3. **Group Related Operators**: Use groups for organization 4. **Document**: Add descriptions to operators 5. **Validate Often**: Check during development ### Performance 1. **Minimize Cross-Engine Communication** 2. **Use Appropriate Batch Sizes** 3. **Configure Resources**: Memory and CPU 4. **Enable Parallel Processing**: Where applicable 5. **Monitor and Tune**: Use metrics ### Error Handling 1. **Enable Auto-Recovery** (Gen2) 2. **Configure Appropriate Snapshot Interval** 3. **Implement Error Ports**: Route errors 4. **Log Sufficiently**: Debug information 5. **Test Failure Scenarios**: Validate recovery ### Maintenance 1. **Version Control**: Use graph versioning 2. **Document Changes**: Change history 3. **Test Before Deploy**: Validate thoroughly 4. **Monitor Production**: Watch for issues 5. **Clean Up**: Remove unused graphs --- ## Troubleshooting ### Common Issues | Issue | Cause | Solution | |-------|-------|----------| | Port type mismatch | Incompatible types | Use converter operator | | Graph won't start | Resource constraints | Adjust resource config | | Slow performance | Cross-engine overhead | Optimize operator placement | | Recovery fails | Corrupt snapshot | Clear state, restart | | Schedule not running | Incorrect cron | Verify expression | ### Diagnostic Steps 1. Check graph status 2. Review operator logs 3. Download diagnostics 4. Check resource usage 5. Verify connections --- ## Documentation Links - **Using Graphs**: [https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/tree/main/docs/modelingguide/using-graphs](https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/tree/main/docs/modelingguide/using-graphs) - **Creating Graphs**: [https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/tree/main/docs/modelingguide/using-graphs/creating-graphs](https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/tree/main/docs/modelingguide/using-graphs/creating-graphs) - **Graph Examples**: [https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/tree/main/docs/repositoryobjects/data-intelligence-graphs](https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/tree/main/docs/repositoryobjects/data-intelligence-graphs) --- **Last Updated**: 2025-11-22