Graphs and Pipelines Guide
Complete guide for graph/pipeline development in SAP Data Intelligence.
Table of Contents
- Overview
- Graph Concepts
- Creating Graphs
- Running Graphs
- Monitoring
- Error Recovery
- Scheduling
- Advanced Topics
- Best Practices
- Troubleshooting
Overview
Graphs (also called pipelines) are the core execution units in SAP Data Intelligence.
Definition: A graph is a network of operators connected via typed input/output ports for data transfer.
Key Features:
- Visual design in Modeler
- Two generations (Gen1, Gen2)
- Execution monitoring
- Error recovery (Gen2)
- Scheduling
Graph Concepts
Operator Generations
Gen1 Operators:
- Legacy operator model
- Process-based execution
- Manual error handling
- Broad compatibility
Gen2 Operators:
- Enhanced error recovery
- State management with snapshots
- Native multiplexing
- Better performance
Critical Rule: Gen1 and Gen2 operators cannot be mixed in the same graph.
Ports and Data Types
Port Types:
- Input ports: Receive data
- Output ports: Send data
- Typed connections
Common Data Types:
- string, int32, int64, float32, float64
- blob (binary data)
- message (structured data)
- table (tabular data)
- any (flexible type)
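As an illustration of typed ports in a script operator, here is a minimal sketch of a Gen1 Python operator that reads from a string input port and writes to a string output port. The port names input and output are assumptions and must match the ports configured on the operator.

# Minimal Gen1 Python script operator sketch.
# Assumes an input port "input" and an output port "output", both of type string.

def on_input(data):
    # "data" arrives as a Python string because the port type is string
    api.send("output", data.upper())

# Register the handler for the "input" port
api.set_port_callback("input", on_input)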
Graph Structure
[Source Operator] ─── port ──→ [Processing Operator] ─── port ──→ [Target Operator]
        │                              │                                  │
   Output Port                   In/Out Ports                        Input Port
Creating Graphs
Using the Modeler
1. Create New Graph
   - Open Modeler
   - Click "+" to create graph
   - Select graph name and location
2. Add Operators
   - Browse operator repository
   - Drag operators to canvas
   - Or search by name
3. Connect Operators
   - Drag from output port to input port
   - Verify type compatibility
   - Configure connection properties
4. Configure Operators
   - Select operator
   - Set parameters in Properties panel
   - Configure ports if needed
5. Validate Graph
   - Click Validate button
   - Review warnings and errors
   - Fix issues before running
Graph-Level Configuration
Graph Data Types: Create custom data types for the graph:
{
"name": "CustomerRecord",
"properties": {
"id": "string",
"name": "string",
"amount": "float64"
}
}
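As a usage sketch, a Python script operator could construct and send a body that matches the CustomerRecord type above. The output port name and the use of api.Message are assumptions about the operator's configuration and generation.

# Build a body matching the CustomerRecord type declared above and send it.
# The "output" port name is an assumption; it must be typed accordingly.
record = {
    "id": "C-1001",
    "name": "Acme Corp",
    "amount": 2500.0,   # float64 field
}
api.send("output", api.Message(record))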
Graph Parameters: Define runtime parameters:
Parameters:
- name: source_path
type: string
default: /data/input/
- name: batch_size
type: int32
default: 1000
Groups and Tags
Groups:
- Organize operators visually
- Share Docker configuration
- Resource allocation
Tags:
- Label operators
- Filter and search
- Documentation
Running Graphs
Execution Methods
Manual Execution:
- Open graph in Modeler
- Click "Run" button
- Configure runtime parameters
- Monitor execution
Programmatic Execution: Via Pipeline API or Data Workflow operators.
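For the API route, a hedged sketch is shown below: it starts a graph over HTTP using Python requests. The host, endpoint path, payload fields (src, configurationSubstitutions), and the tenant\user authentication format are assumptions; check the Pipeline API reference for your release before relying on them.

# Sketch: start a graph via the Pipeline API over HTTP.
# Endpoint path, payload fields, and auth format are assumptions; verify them
# against the Pipeline API reference for your SAP Data Intelligence release.
import requests

DI_HOST = "https://<your-di-host>"                # placeholder
TENANT, USER, PASSWORD = "default", "modeler_user", "<password>"

resp = requests.post(
    f"{DI_HOST}/app/pipeline-modeler/service/v1/runtime/graphs",
    auth=(f"{TENANT}\\{USER}", PASSWORD),          # tenant\user basic auth (assumed format)
    json={
        "src": "mypackage.mygraph",                # repository path of the graph (assumed field)
        "name": "mygraph-run",
        "configurationSubstitutions": {            # runtime parameters (assumed field name)
            "source_path": "/data/2024/january/",
            "batch_size": "5000",
        },
    },
)
resp.raise_for_status()
print("Run handle:", resp.json().get("handle"))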
Execution Model
Process Model:
- Each operator runs as process
- Communication via ports
- Coordinated by main engine
Gen2 Features:
- Snapshot checkpoints
- State recovery
- Exactly-once semantics (configurable)
Runtime Parameters
Pass parameters at execution:
source_path = /data/2024/january/
batch_size = 5000
target_table = SALES_JAN_2024
Resource Configuration
Memory/CPU (Kubernetes-style resource units: "500m" is half a CPU core, "Gi" is gibibytes):
{
"resources": {
"requests": {
"memory": "1Gi",
"cpu": "500m"
},
"limits": {
"memory": "4Gi",
"cpu": "2000m"
}
}
}
Monitoring
Graph Status
| Status | Description |
|---|---|
| Pending | Waiting to start |
| Running | Actively executing |
| Completed | Finished successfully |
| Failed | Error occurred |
| Dead | Terminated unexpectedly |
| Stopping | Shutdown in progress |
Operator Status
| Status | Description |
|---|---|
| Initializing | Setting up |
| Running | Processing data |
| Stopped | Finished or stopped |
| Failed | Error in operator |
Monitoring Dashboard
Available Metrics:
- Messages processed
- Processing time
- Memory usage
- Error counts
Access:
- Open running graph
- Click "Monitor" tab
- View real-time statistics
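For automated monitoring outside the Modeler, the status of a run can also be polled over the runtime API. The sketch below is an assumption-based starting point; the endpoint path and the handle and status fields are not confirmed by this guide.

# Sketch: poll the status of a running graph via the runtime API.
# Endpoint path and response fields ("handle", "status") are assumptions.
import time
import requests

def wait_for_completion(session, di_host, handle, poll_seconds=15):
    # Poll a graph run until it leaves the pending/running states
    url = f"{di_host}/app/pipeline-modeler/service/v1/runtime/graphs/{handle}"
    while True:
        status = session.get(url).json().get("status")
        print("graph status:", status)
        if status not in ("pending", "running"):
            return status
        time.sleep(poll_seconds)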
Diagnostic Information
Collect Diagnostics:
- Select running/failed graph
- Click "Download Diagnostics"
- Review logs and state
Archive Contents:
- execution.json (execution details)
- graphs.json (graph definition)
- events.json (execution events)
- Operator logs
- State snapshots
Error Recovery
Gen2 Error Recovery
Automatic Recovery:
- Enable in graph settings
- Configure snapshot interval
- System recovers from last snapshot
Configuration:
{
"autoRecovery": {
"enabled": true,
"snapshotInterval": "60s",
"maxRetries": 3
}
}
Snapshots
What's Saved:
- Operator state
- Message queues
- Processing position
When Snapshots Occur:
- Periodic (configured interval)
- On operator request
- Before shutdown
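In a Gen2 Python script operator, taking part in snapshots generally means declaring the operator as stateful and supplying serialize and restore callbacks. The callback names used below (set_initial_snapshot_info, set_serialize_callback, set_restore_callback) are assumptions about the Gen2 Python operator API; verify them in the operator reference for your release.

# Sketch of Gen2 snapshot participation: the engine calls serialize() when it
# takes a snapshot and restore() when it recovers from one.
# Callback and class names are assumptions; check the Gen2 Python operator reference.
import pickle

state = {"rows_processed": 0}

def serialize(epoch):
    # Return the operator state to be stored in the snapshot
    return pickle.dumps(state)

def restore(epoch, serialized_state):
    # Reload state from the last successful snapshot
    global state
    state = pickle.loads(serialized_state)

api.set_initial_snapshot_info(api.InitialProcessInfo(is_stateful=True))
api.set_serialize_callback(serialize)
api.set_restore_callback(restore)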
Delivery Guarantees
| Mode | Description |
|---|---|
| At-most-once | May lose messages |
| At-least-once | May duplicate messages |
| Exactly-once | No loss or duplication |
Gen2 Default: At-least-once with recovery.
Manual Error Handling
In Script Operators (Gen1-style Python shown):
def on_input(data):
    try:
        result = process(data.body)  # process() is the user-defined transformation
        api.send("output", api.Message(result))
    except Exception as e:
        # Log the failure and route the bad record to a dedicated error port
        api.logger.error(f"Processing error: {e}")
        api.send("error", api.Message({
            "error": str(e),
            "input": data.body
        }))

# Register the handler for the "input" port
api.set_port_callback("input", on_input)
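Note that the error output port used above must be declared on the operator and connected to a downstream consumer in the graph (for example, a file or table writer) so that the routed error messages are actually captured.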
Scheduling
Schedule Graph Executions
Cron Expression Format:
┌───────────── second (0-59)
│ ┌───────────── minute (0-59)
│ │ ┌───────────── hour (0-23)
│ │ │ ┌───────────── day of month (1-31)
│ │ │ │ ┌───────────── month (1-12)
│ │ │ │ │ ┌───────────── day of week (0-6, Sun=0)
│ │ │ │ │ │
* * * * * *
Examples:
0 0 * * * * # Every hour
0 0 0 * * * # Daily at midnight
0 0 0 * * 1 # Every Monday
0 0 6 1 * * # 6 AM on first of month
0 */15 * * * * # Every 15 minutes
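Before saving a schedule, it can help to sanity-check the expression. The sketch below is a local helper (not part of the product) that validates the field count and basic numeric ranges of the six-field, second-first format shown above.

# Local helper to sanity-check a six-field cron expression (second-first format).
# It checks field count and numeric ranges only; it does not cover every cron feature.
import re

RANGES = [(0, 59), (0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]  # sec min hour dom mon dow

def validate_cron(expr: str) -> bool:
    fields = expr.split()
    if len(fields) != 6:
        return False
    for field, (lo, hi) in zip(fields, RANGES):
        for part in field.split(","):
            if part == "*" or re.fullmatch(r"\*/\d+", part):
                continue
            m = re.fullmatch(r"(\d+)(?:-(\d+))?", part)
            if not m:
                return False
            values = [int(m.group(1))] + ([int(m.group(2))] if m.group(2) else [])
            if any(v < lo or v > hi for v in values):
                return False
    return True

assert validate_cron("0 */15 * * * *")   # every 15 minutes
assert not validate_cron("0 0 * * *")    # only five fields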
Creating Schedule
- Open graph
- Click "Schedule"
- Configure cron expression
- Set timezone
- Activate schedule
Managing Schedules
Actions:
- View scheduled runs
- Pause schedule
- Resume schedule
- Delete schedule
- View execution history
Advanced Topics
Native Multiplexing (Gen2)
Connect one output to multiple inputs:
[Source] ─┬──→ [Processor A]
          ├──→ [Processor B]
          └──→ [Processor C]
Or multiple outputs to one input:
[Source A] ──┐
[Source B] ──┼──→ [Processor]
[Source C] ──┘
Graph Snippets
Reusable graph fragments:
1. Create Snippet:
   - Select operators
   - Right-click > "Save as Snippet"
   - Name and save
2. Use Snippet:
   - Drag snippet to canvas
   - Configure parameters
   - Connect to graph
Parameterization
Substitution Variables:
${parameter_name}
${ENV.VARIABLE_NAME}
${SYSTEM.TENANT}
In Operator Config:
File Path: ${source_path}/data_${DATE}.csv
Connection: ${target_connection}
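To see how substitution behaves, the sketch below resolves ${...} placeholders against a plain dictionary, which can be handy for testing parameterized values outside the Modeler. The actual substitution is performed by the runtime; this helper only mimics the basic ${name} form.

# Mimic simple ${name} substitution locally for testing parameterized configs.
# Only the basic ${name} form is handled; the runtime's own substitution
# (including ${ENV.*} and ${SYSTEM.*}) is what applies at execution time.
import re

def substitute(template: str, params: dict) -> str:
    return re.sub(
        r"\$\{([^}]+)\}",
        lambda m: str(params.get(m.group(1), m.group(0))),  # leave unknown names untouched
        template,
    )

params = {"source_path": "/data/2024/january", "DATE": "2024-01-31"}
print(substitute("File Path: ${source_path}/data_${DATE}.csv", params))
# -> File Path: /data/2024/january/data_2024-01-31.csv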
Import/Export
Export Graph:
1. Select graph
2. Right-click > Export
3. Include data types (optional)
4. Save as .zip
Import Graph:
1. Right-click in repository
2. Import > From file
3. Select .zip file
4. Map dependencies
Best Practices
Graph Design
- Clear Data Flow: Left-to-right, top-to-bottom
- Meaningful Names: Descriptive operator names
- Group Related Operators: Use groups for organization
- Document: Add descriptions to operators
- Validate Often: Check during development
Performance
- Minimize Cross-Engine Communication
- Use Appropriate Batch Sizes
- Configure Resources: Memory and CPU
- Enable Parallel Processing: Where applicable
- Monitor and Tune: Use metrics
Error Handling
- Enable Auto-Recovery (Gen2)
- Configure Appropriate Snapshot Interval
- Implement Error Ports: Route errors
- Log Sufficiently: Debug information
- Test Failure Scenarios: Validate recovery
Maintenance
- Version Control: Use graph versioning
- Document Changes: Change history
- Test Before Deploy: Validate thoroughly
- Monitor Production: Watch for issues
- Clean Up: Remove unused graphs
Troubleshooting
Common Issues
| Issue | Cause | Solution |
|---|---|---|
| Port type mismatch | Incompatible types | Use converter operator |
| Graph won't start | Resource constraints | Adjust resource config |
| Slow performance | Cross-engine overhead | Optimize operator placement |
| Recovery fails | Corrupt snapshot | Clear state, restart |
| Schedule not running | Incorrect cron | Verify expression |
Diagnostic Steps
- Check graph status
- Review operator logs
- Download diagnostics
- Check resource usage
- Verify connections
Documentation Links
- Using Graphs: https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/tree/main/docs/modelingguide/using-graphs
- Creating Graphs: https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/tree/main/docs/modelingguide/using-graphs/creating-graphs
- Graph Examples: https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/tree/main/docs/repositoryobjects/data-intelligence-graphs
Last Updated: 2025-11-22