zhongwei/gh-secondsky-sap-skills-skills-sap-hana-cloud-data-intelligence

Zhongwei Li e23395aeb2 Initial commit

2025-11-30 08:55:25 +08:00

10 KiB

Graphs and Pipelines Guide

A guide to developing graphs (pipelines) in SAP Data Intelligence, from design through execution, monitoring, and recovery.

Overview
Graph Concepts
Creating Graphs
Running Graphs
Monitoring
Error Recovery
Scheduling
Advanced Topics
Best Practices
Troubleshooting
Documentation Links

Overview

Graphs (also called pipelines) are the core execution unit in SAP Data Intelligence.

Definition: A graph is a network of operators connected via typed input/output ports for data transfer.

Key Features:

Visual design in Modeler
Two generations (Gen1, Gen2)
Execution monitoring
Error recovery (Gen2)
Scheduling

Graph Concepts

Operator Generations

Gen1 Operators:

Legacy operator model
Process-based execution
Manual error handling
Broad compatibility

Gen2 Operators:

Enhanced error recovery
State management with snapshots
Native multiplexing
Better performance

Critical Rule: Gen1 and Gen2 operators cannot be mixed in the same graph.
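
A minimal sketch of how this rule could be checked before deployment, assuming each operator is described by a name and a generation number. `OperatorInfo` and `check_generations` are hypothetical names used for illustration only; they are not part of SAP Data Intelligence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatorInfo:
    name: str
    generation: int  # 1 for Gen1, 2 for Gen2


def check_generations(operators):
    """Return the single generation in use, or raise if generations are mixed."""
    generations = {op.generation for op in operators}
    if len(generations) > 1:
        # Group operator names by generation to make the error actionable.
        by_generation = {
            gen: sorted(op.name for op in operators if op.generation == gen)
            for gen in sorted(generations)
        }
        raise ValueError(f"Graph mixes operator generations: {by_generation}")
    return generations.pop() if generations else None


gen2_graph = [OperatorInfo("reader", 2), OperatorInfo("writer", 2)]
print(check_generations(gen2_graph))
```

The Modeler's own validation remains the authority; a check like this is only useful for catching the problem earlier, for example in a review script.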

Ports and Data Types

Port Types:

Input ports: Receive data
Output ports: Send data
Typed connections

Common Data Types:

string, int32, int64, float32, float64
blob (binary data)
message (structured data)
table (tabular data)
any (flexible type)

Graph Structure

[Source Operator] ─── port ──→ [Processing Operator] ─── port ──→ [Target Operator]
     │                                │                                │
  Output Port                  In/Out Ports                      Input Port
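
The structure above can be modeled as operators with typed ports, where a connection is only accepted if the port types are compatible. The following is a rough sketch under two assumptions: types must match exactly, and `any` is compatible with everything. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Operator:
    name: str
    inputs: dict = field(default_factory=dict)   # port name -> data type
    outputs: dict = field(default_factory=dict)  # port name -> data type


def compatible(out_type, in_type):
    # Assumption: "any" accepts every type; otherwise types must be equal.
    return "any" in (out_type, in_type) or out_type == in_type


def connect(src, out_port, dst, in_port):
    """Validate one connection and return it as an edge tuple."""
    out_type = src.outputs[out_port]
    in_type = dst.inputs[in_port]
    if not compatible(out_type, in_type):
        raise TypeError(
            f"{src.name}.{out_port} ({out_type}) cannot feed "
            f"{dst.name}.{in_port} ({in_type})"
        )
    return (src.name, out_port, dst.name, in_port)


source = Operator("source", outputs={"out": "table"})
processor = Operator("processor", inputs={"in": "table"}, outputs={"out": "message"})
target = Operator("target", inputs={"in": "message"})

edges = [
    connect(source, "out", processor, "in"),
    connect(processor, "out", target, "in"),
]
print(edges)
```

Connecting `source` directly to `target` would raise a `TypeError` here, which corresponds to the "port type mismatch" case where a converter operator is needed.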

Creating Graphs

Using the Modeler

1. Create New Graph
   - Open Modeler
   - Click "+" to create graph
   - Select graph name and location
2. Add Operators
   - Browse operator repository
   - Drag operators to canvas
   - Or search by name
3. Connect Operators
   - Drag from output port to input port
   - Verify type compatibility
   - Configure connection properties
4. Configure Operators
   - Select operator
   - Set parameters in Properties panel
   - Configure ports if needed
5. Validate Graph
   - Click Validate button
   - Review warnings and errors
   - Fix issues before running

Graph-Level Configuration

Graph Data Types: Create custom data types for the graph:

{
  "name": "CustomerRecord",
  "properties": {
    "id": "string",
    "name": "string",
    "amount": "float64"
  }
}
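
To illustrate what such a type definition expresses, here is a small sketch that checks a Python dictionary against the `CustomerRecord` definition above. The mapping from graph type names to Python types is an assumption made for this example, and `validate` is a hypothetical helper.

```python
CUSTOMER_RECORD = {
    "name": "CustomerRecord",
    "properties": {"id": "string", "name": "string", "amount": "float64"},
}

# Assumed mapping from graph data type names to Python types.
PYTHON_TYPES = {
    "string": str,
    "int32": int,
    "int64": int,
    "float32": float,
    "float64": float,
}


def validate(record, data_type):
    """Return a list of problems; an empty list means the record conforms."""
    errors = []
    properties = data_type["properties"]
    for field_name, type_name in properties.items():
        if field_name not in record:
            errors.append(f"missing field: {field_name}")
            continue
        value = record[field_name]
        if not isinstance(value, PYTHON_TYPES[type_name]):
            errors.append(
                f"{field_name}: expected {type_name}, got {type(value).__name__}"
            )
    # Report fields that the type definition does not declare.
    for extra in sorted(record.keys() - properties.keys()):
        errors.append(f"unexpected field: {extra}")
    return errors


print(validate({"id": "C1", "name": "Acme", "amount": 12.5}, CUSTOMER_RECORD))
print(validate({"id": 1, "name": "Acme"}, CUSTOMER_RECORD))
```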

Graph Parameters: Define runtime parameters:

Parameters:
  - name: source_path
    type: string
    default: /data/input/
  - name: batch_size
    type: int32
    default: 1000

Groups and Tags

Groups:

Organize operators visually
Share Docker configuration
Resource allocation

Tags:

Label operators
Filter and search
Documentation

Running Graphs

Execution Methods

Manual Execution:

Open graph in Modeler
Click "Run" button
Configure runtime parameters
Monitor execution

Programmatic Execution: Start graphs through the Pipeline API or from Data Workflow operators.

Execution Model

Process Model:

Each operator runs as a process
Communication via ports
Coordinated by main engine

Gen2 Features:

Snapshot checkpoints
State recovery
Exactly-once semantics (configurable)

Runtime Parameters

Pass parameters at execution:

source_path = /data/2024/january/
batch_size = 5000
target_table = SALES_JAN_2024
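
How runtime values combine with declared defaults can be sketched as follows. This is an illustration, not the platform's resolution logic: it assumes values arrive as strings, that each value is cast to the declared type, and that `target_table` is declared without a default. The `resolve_parameters` helper and the `target_table` declaration are hypothetical.

```python
DEFINITIONS = [
    {"name": "source_path", "type": "string", "default": "/data/input/"},
    {"name": "batch_size", "type": "int32", "default": 1000},
    {"name": "target_table", "type": "string"},  # assumed: required, no default
]

# Assumed casts from declared type names to Python constructors.
CASTS = {"string": str, "int32": int, "int64": int, "float32": float, "float64": float}


def resolve_parameters(definitions, overrides):
    """Merge runtime overrides over declared defaults and cast to declared types."""
    known = {d["name"]: d for d in definitions}
    unknown = sorted(set(overrides) - set(known))
    if unknown:
        raise KeyError(f"undeclared parameters: {unknown}")
    resolved = {}
    for name, definition in known.items():
        if name in overrides:
            raw = overrides[name]
        elif "default" in definition:
            raw = definition["default"]
        else:
            raise KeyError(f"missing required parameter: {name}")
        resolved[name] = CASTS[definition["type"]](raw)
    return resolved


params = resolve_parameters(
    DEFINITIONS,
    {
        "source_path": "/data/2024/january/",
        "batch_size": "5000",
        "target_table": "SALES_JAN_2024",
    },
)
print(params)
```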

Resource Configuration

Memory/CPU:

{
  "resources": {
    "requests": {
      "memory": "1Gi",
      "cpu": "500m"
    },
    "limits": {
      "memory": "4Gi",
      "cpu": "2000m"
    }
  }
}
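
The values above follow the Kubernetes quantity notation (`Gi` for gibibytes, `m` for millicores). As a sanity check, a configuration can be tested to ensure requests do not exceed limits. The sketch below handles only the binary memory suffixes and the `m` CPU suffix; `check_resources` is a hypothetical helper, not a platform API.

```python
MEMORY_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_memory(quantity):
    """Convert a quantity such as '1Gi' to bytes (binary suffixes only)."""
    for suffix, factor in MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain number of bytes


def parse_cpu(quantity):
    """Convert a quantity such as '500m' or '2' to a number of cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def check_resources(resources):
    """Return a list of problems where a request exceeds its limit."""
    requests, limits = resources["requests"], resources["limits"]
    problems = []
    if parse_memory(requests["memory"]) > parse_memory(limits["memory"]):
        problems.append("memory request exceeds memory limit")
    if parse_cpu(requests["cpu"]) > parse_cpu(limits["cpu"]):
        problems.append("cpu request exceeds cpu limit")
    return problems


config = {
    "resources": {
        "requests": {"memory": "1Gi", "cpu": "500m"},
        "limits": {"memory": "4Gi", "cpu": "2000m"},
    }
}
print(check_resources(config["resources"]))
```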

Monitoring

Graph Status

Status	Description
Pending	Waiting to start
Running	Actively executing
Completed	Finished successfully
Failed	Error occurred
Dead	Terminated unexpectedly
Stopping	Shutdown in progress
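
When a graph is started programmatically, the caller usually polls until the graph reaches a final status. The sketch below assumes that Completed, Failed, and Dead are the final statuses, as the descriptions above suggest. `get_status` is a hypothetical callable supplied by the caller (for example, a wrapper around an API request); no real endpoint is shown.

```python
import time
from enum import Enum


class GraphStatus(Enum):
    PENDING = "Pending"
    RUNNING = "Running"
    COMPLETED = "Completed"
    FAILED = "Failed"
    DEAD = "Dead"
    STOPPING = "Stopping"


# Assumption: these statuses are final and will not change again.
TERMINAL = {GraphStatus.COMPLETED, GraphStatus.FAILED, GraphStatus.DEAD}


def wait_until_finished(get_status, interval=5.0, max_polls=120, sleep=time.sleep):
    """Poll get_status() until a terminal status is seen or polls run out."""
    for _ in range(max_polls):
        status = GraphStatus(get_status())
        if status in TERMINAL:
            return status
        sleep(interval)
    raise TimeoutError(f"graph not finished after {max_polls} polls")


# Demonstration with a canned status sequence and no real waiting.
fake_statuses = iter(["Pending", "Running", "Running", "Completed"])
final = wait_until_finished(lambda: next(fake_statuses), sleep=lambda seconds: None)
print(final.value)
```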

Operator Status

Status	Description
Initializing	Setting up
Running	Processing data
Stopped	Finished or stopped
Failed	Error in operator

Monitoring Dashboard

Available Metrics:

Messages processed
Processing time
Memory usage
Error counts

Access:

Open running graph
Click "Monitor" tab
View real-time statistics

Diagnostic Information

Collect Diagnostics:

Select running/failed graph
Click "Download Diagnostics"
Review logs and state

Archive Contents:

execution.json (execution details)
graphs.json (graph definition)
events.json (execution events)
Operator logs
State snapshots

Error Recovery

Gen2 Error Recovery

Automatic Recovery:

Enable in graph settings
Configure snapshot interval
System recovers from last snapshot

Configuration:

{
  "autoRecovery": {
    "enabled": true,
    "snapshotInterval": "60s",
    "maxRetries": 3
  }
}

Snapshots

What's Saved:

Operator state
Message queues
Processing position

When Snapshots Occur:

Periodic (configured interval)
On operator request
Before shutdown

Delivery Guarantees

Mode	Description
At-most-once	May lose messages
At-least-once	May duplicate messages
Exactly-once	No loss or duplication

Gen2 Default: At-least-once with recovery.
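
The practical consequence of at-least-once delivery is that messages processed after the last snapshot are replayed following a failure. The simulation below is a simplified model, not the engine's implementation: it processes a list of messages, fails once at a chosen position, and resumes from the most recent snapshot. All names are hypothetical.

```python
def run_with_recovery(messages, snapshot_every, fail_at):
    """Process messages, crash once at index fail_at, resume from last snapshot."""
    delivered = []
    snapshot = 0   # position stored by the most recent snapshot
    position = 0   # index of the next message to process
    crashed = False
    while position < len(messages):
        if position == fail_at and not crashed:
            crashed = True
            position = snapshot  # recovery rewinds to the last snapshot
            continue
        delivered.append(messages[position])
        position += 1
        if position % snapshot_every == 0:
            snapshot = position
    return delivered


def deduplicate(delivered):
    """Keep the first occurrence of each message, preserving order."""
    seen = set()
    unique = []
    for message in delivered:
        if message not in seen:
            seen.add(message)
            unique.append(message)
    return unique


messages = ["m0", "m1", "m2", "m3", "m4", "m5"]
delivered = run_with_recovery(messages, snapshot_every=2, fail_at=3)
print(delivered)
print(deduplicate(delivered))
```

In this run `m2` is delivered twice because it was processed after the snapshot at position 2 and before the failure. Downstream operators therefore need to be idempotent or deduplicate on a message key, unless exactly-once delivery is configured.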

Manual Error Handling

In Script Operators:

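# Assumes the operator defines an input port "input" and output ports
# "output" and "error"; process() stands in for your own logic.
def on_input(msg_id, header, body):
    try:
        result = process(body)
        api.send("output", api.Message(result))
    except Exception as e:
        # Log the failure and route the offending input to the error port
        # instead of letting the exception stop the graph.
        api.logger.error(f"Processing error: {e}")
        api.send("error", api.Message({
            "error": str(e),
            "input": body
        }))

# Register the handler for the input port.
api.set_port_callback("input", on_input)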
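
The error-routing logic of this handler can be exercised outside the Modeler with a stand-in for the `api` object. Everything in the stub below (the `Message` class, the `sent` list, and the `process` function) is a hypothetical test double written for this example; it only mimics the calls used in the handler and is not the real operator API.

```python
import logging
from types import SimpleNamespace

sent = []  # collects (port, body) pairs instead of sending to real ports


class Message:
    """Test double mirroring the way api.Message is used in the handler."""

    def __init__(self, body, attributes=None):
        self.body = body
        self.attributes = attributes or {}


api = SimpleNamespace(
    Message=Message,
    send=lambda port, message: sent.append((port, message.body)),
    logger=logging.getLogger("operator"),
)


def process(body):
    # Hypothetical business logic: doubles a numeric amount.
    return {"amount": float(body["amount"]) * 2}


def on_input(msg_id, header, body):
    try:
        result = process(body)
        api.send("output", api.Message(result))
    except Exception as e:
        api.logger.error(f"Processing error: {e}")
        api.send("error", api.Message({"error": str(e), "input": body}))


on_input(1, {}, {"amount": "10"})   # valid input goes to "output"
on_input(2, {}, {"amount": "n/a"})  # invalid input goes to "error"
for port, body in sent:
    print(port, body)
```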

Scheduling

Schedule Graph Executions

Cron Expression Format:

┌───────────── second (0-59)
│ ┌───────────── minute (0-59)
│ │ ┌───────────── hour (0-23)
│ │ │ ┌───────────── day of month (1-31)
│ │ │ │ ┌───────────── month (1-12)
│ │ │ │ │ ┌───────────── day of week (0-6, Sun=0)
│ │ │ │ │ │
* * * * * *

Examples:

0 0 * * * *     # Every hour, on the hour
0 0 0 * * *     # Daily at midnight
0 0 0 * * 1     # Every Monday at midnight
0 0 6 1 * *     # At 6 AM on the first day of each month
0 */15 * * * *  # Every 15 minutes
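
To check an expression before activating a schedule, the six-field format above can be evaluated against a timestamp. The following is a minimal sketch that supports only `*`, single numbers, ranges (`1-5`), lists (`1,3`), and steps (`*/15`). It assumes that all six fields must match; the scheduler's actual semantics, for example how day of month and day of week combine, may differ.

```python
from datetime import datetime

# (min, max) per field: second, minute, hour, day of month, month, day of week
FIELD_RANGES = [(0, 59), (0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def expand(field, low, high):
    """Expand one cron field into the set of values it allows."""
    values = set()
    for part in field.split(","):
        base, _, step = part.partition("/")
        step = int(step) if step else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start, end = (int(x) for x in base.split("-"))
        else:
            start = end = int(base)
        values.update(range(start, end + 1, step))
    return values


def matches(expression, moment):
    """Return True if the datetime satisfies the six-field expression."""
    fields = expression.split()
    if len(fields) != 6:
        raise ValueError("expected six fields: sec min hour dom month dow")
    actual = [
        moment.second,
        moment.minute,
        moment.hour,
        moment.day,
        moment.month,
        (moment.weekday() + 1) % 7,  # Python: Mon=0; cron format here: Sun=0
    ]
    return all(
        value in expand(field, low, high)
        for field, (low, high), value in zip(fields, FIELD_RANGES, actual)
    )


print(matches("0 */15 * * * *", datetime(2024, 1, 1, 10, 15, 0)))
print(matches("0 0 0 * * 1", datetime(2024, 1, 2, 0, 0, 0)))
```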

Creating Schedule

Open graph
Click "Schedule"
Configure cron expression
Set timezone
Activate schedule

Managing Schedules

Actions:

View scheduled runs
Pause schedule
Resume schedule
Delete schedule
View execution history

Advanced Topics

Native Multiplexing (Gen2)

Connect one output to multiple inputs:

[Source] ─┬──→ [Processor A]
          ├──→ [Processor B]
          └──→ [Processor C]

Or multiple outputs to one input:

[Source A] ──┐
[Source B] ──┼──→ [Processor]
[Source C] ──┘

Graph Snippets

Reusable graph fragments:

1. Create Snippet
   - Select operators
   - Right-click > "Save as Snippet"
   - Name and save
2. Use Snippet
   - Drag snippet to canvas
   - Configure parameters
   - Connect to graph

Parameterization

Substitution Variables:

${parameter_name}
${ENV.VARIABLE_NAME}
${SYSTEM.TENANT}

In Operator Config:

File Path: ${source_path}/data_${DATE}.csv
Connection: ${target_connection}
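
The effect of substitution can be illustrated with a small resolver for the `${...}` syntax. This is a sketch under stated assumptions, not the platform's resolver: names prefixed with `ENV.` are read from an environment mapping, every other name (including `DATE` or `SYSTEM.TENANT`) must be supplied in the parameters dictionary, and an unknown name raises `KeyError`. The `substitute` function is hypothetical.

```python
import os
import re

# Matches ${name}, where name may contain letters, digits, "_" and ".".
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_.]*)\}")


def substitute(template, parameters, environ=None):
    """Replace every ${name} in template; raise KeyError for unknown names."""
    environ = os.environ if environ is None else environ

    def lookup(match):
        key = match.group(1)
        if key.startswith("ENV."):
            return environ[key[len("ENV."):]]
        return str(parameters[key])

    return PLACEHOLDER.sub(lookup, template)


path = substitute(
    "${source_path}/data_${DATE}.csv",
    {"source_path": "/data/input", "DATE": "2024-01-31"},
)
print(path)
```

Failing on unknown names, rather than leaving the placeholder in place, makes a misspelled parameter visible before the graph reads from a wrong path.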

Import/Export

Export Graph:

1. Select graph
2. Right-click > Export
3. Include data types (optional)
4. Save as .zip

Import Graph:

1. Right-click in repository
2. Import > From file
3. Select .zip file
4. Map dependencies

Best Practices

Graph Design

Clear Data Flow: Left-to-right, top-to-bottom
Meaningful Names: Descriptive operator names
Group Related Operators: Use groups for organization
Document: Add descriptions to operators
Validate Often: Check during development

Performance

Minimize Cross-Engine Communication
Use Appropriate Batch Sizes
Configure Resources: Memory and CPU
Enable Parallel Processing: Where applicable
Monitor and Tune: Use metrics

Error Handling

Enable Auto-Recovery (Gen2)
Configure Appropriate Snapshot Interval
Implement Error Ports: Route errors
Log Sufficiently: Debug information
Test Failure Scenarios: Validate recovery

Maintenance

Version Control: Use graph versioning
Document Changes: Change history
Test Before Deploy: Validate thoroughly
Monitor Production: Watch for issues
Clean Up: Remove unused graphs

Troubleshooting

Common Issues

Issue	Cause	Solution
Port type mismatch	Incompatible types	Use converter operator
Graph won't start	Resource constraints	Adjust resource config
Slow performance	Cross-engine overhead	Optimize operator placement
Recovery fails	Corrupt snapshot	Clear state, restart
Schedule not running	Incorrect cron	Verify expression

Diagnostic Steps

Check graph status
Review operator logs
Download diagnostics
Check resource usage
Verify connections

Documentation Links

Using Graphs: https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/tree/main/docs/modelingguide/using-graphs
Creating Graphs: https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/tree/main/docs/modelingguide/using-graphs/creating-graphs
Graph Examples: https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/tree/main/docs/repositoryobjects/data-intelligence-graphs

Last Updated: 2025-11-22