Data Workflow Operators Guide

Complete guide for data workflow orchestration in SAP Data Intelligence.

Table of Contents

  1. Overview
  2. Workflow Structure
  3. Available Operators
  4. Data Transfer
  5. Remote Execution
  6. Control Flow
  7. Notifications
  8. Best Practices

Overview

Data Workflow operators orchestrate data processing tasks that run for a limited time and finish with either "completed" or "dead" status.

Key Characteristics:

  • Sequential execution via signal passing
  • Operators start after receiving input signal
  • Each operator has input, output, and error ports
  • Unconnected output ports cause graph failure

Important: Do not mix Data Workflow operators with non-Data Workflow operators in the same graph.


Workflow Structure

Required Components

Every data workflow requires:

  • Workflow Trigger: First operator (starts the workflow)
  • Workflow Terminator: Last operator (ends the workflow)

Basic Structure

[Workflow Trigger] -> [Task Operator(s)] -> [Workflow Terminator]
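The structural rules above can be checked mechanically. The following is a minimal sketch, assuming a workflow is described as a Python dictionary that maps each operator name to the operators connected to its output port. This representation and the function name `validate_workflow` are illustrative assumptions, not part of SAP Data Intelligence.

```python
def validate_workflow(graph, trigger="Workflow Trigger", terminator="Workflow Terminator"):
    """Return a list of structural problems found in a hypothetical workflow graph.

    graph maps each operator name to the list of operators fed by its output port.
    """
    problems = []
    if trigger not in graph:
        problems.append("missing Workflow Trigger")
    if terminator not in graph:
        problems.append("missing Workflow Terminator")

    # Operators that receive a signal from some other operator.
    targets = {t for outs in graph.values() for t in outs}
    if trigger in targets:
        problems.append("Workflow Trigger must be the first operator")
    if graph.get(terminator):
        problems.append("Workflow Terminator must be the last operator")

    # Every operator except the terminator needs a connected output port.
    for name, outs in graph.items():
        if name != terminator and not outs:
            problems.append(f"unconnected output port on {name}")
    return problems


ok = {
    "Workflow Trigger": ["Data Transfer"],
    "Data Transfer": ["Workflow Terminator"],
    "Workflow Terminator": [],
}
broken = {
    "Workflow Trigger": ["Data Transfer"],
    "Data Transfer": [],
}
print(validate_workflow(ok))      # []
print(validate_workflow(broken))
```

The second call reports both the missing terminator and the dangling output port on the task operator, the two conditions this guide names as causing a workflow to fail.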

Signal Flow

  1. Workflow Trigger sends initial signal
  2. Each operator waits for input signal
  3. Operator executes task
  4. Operator sends output signal (or error)
  5. Next operator receives signal and executes
  6. Workflow Terminator completes the graph
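The six steps above can be sketched as a small simulation. This is a toy model in plain Python, not the Data Intelligence runtime: each task is a hypothetical callable, and mapping an unhandled error to the "dead" status is an assumption based on the two final statuses named in the Overview.

```python
def run_workflow(tasks):
    """Run tasks in order, passing a signal from one operator to the next.

    tasks is a list of (name, callable) pairs; a callable that raises
    is treated as sending an error signal instead of an output signal.
    Returns the final status and an execution log.
    """
    log = ["Workflow Trigger: signal sent"]
    for name, task in tasks:
        # The operator starts only after it receives the input signal.
        try:
            task()
        except Exception as exc:
            log.append(f"{name}: error ({exc})")
            return "dead", log  # assumption: an unhandled error ends the graph as "dead"
        log.append(f"{name}: output signal sent")
    log.append("Workflow Terminator: graph completed")
    return "completed", log


def failing_task():
    raise RuntimeError("source unavailable")


status, log = run_workflow(
    [("Data Transfer", lambda: None), ("Data Transform", lambda: None)]
)
failed_status, failed_log = run_workflow(
    [("Data Transfer", failing_task), ("Data Transform", lambda: None)]
)
print(status)         # completed
print(failed_status)  # dead
```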

Available Operators

Core Workflow Operators

Operator              Purpose
Workflow Trigger      Initiates workflow execution
Workflow Terminator   Concludes workflow with status
Workflow Split        Duplicates signal for parallel paths
Workflow Merge (AND)  Combines outputs using logical AND
Workflow Merge (OR)   Combines outputs using logical OR

Task Operators

Operator               Purpose
Data Transfer          Move data between systems
Data Transform         Apply data transformations
Pipeline               Execute DI graphs locally or remotely
SAP Data Services Job  Run remote Data Services jobs
SAP HANA Flowgraph     Execute HANA flowgraphs
BW Process Chain       Run BW process chains
Notification           Send email notifications

Data Transfer

Purpose

Transfer data from SAP BW or SAP HANA to cloud storage and other supported targets.

Supported Sources

  • SAP Business Warehouse (BW)
  • SAP HANA

Supported Targets

  • Amazon S3
  • Google Cloud Storage
  • Hadoop Distributed File System (HDFS)
  • SAP Vora

Transfer Modes

Mode                  Description               Best For
BW OLAP               Default BW access         Small datasets
Generated HANA Views  Partition-based transfer  Large datasets
BW ODP                Datastore extraction      Cloud/distributed storage

BW OLAP Mode

  • Default mode for BW sources
  • Uses standard OLAP interface (like RSRT2)
  • Single result set processing
  • Subject to cell export limits
  • Not suitable for large-scale transfers

Generated HANA Views Mode

Requirements:

  • Connection via DI Connection Management
  • SAP BW 4.2.0 or later
  • Working HANA database connection
  • SSL certificates (if required)
  • Query with generated calculation view (no restricted attributes)

Advantage: Transfers partitions separately, enabling large result sets and parallel processing.

BW ODP Mode

Works with: Datastores only

Supported Targets:

  • Azure Data Lake
  • Google Cloud Storage
  • HDFS
  • Alibaba OSS
  • Amazon S3
  • Semantic Data Lake
  • Azure Storage Blob

Note: Partition processing is sequential, not parallel.
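The mode descriptions above amount to a simple decision rule. The following is a rough sketch of that rule as a helper function. The function name, the `"query"` and `"datastore"` labels, and the `large_dataset` flag are hypothetical and exist only for illustration; the sketch does not check the Generated HANA Views requirements listed earlier.

```python
def suggest_transfer_mode(source_kind, large_dataset=False):
    """Suggest a Data Transfer mode based on the mode table in this guide.

    source_kind: "query" for a BW query, "datastore" for a BW datastore
    (hypothetical labels used only in this sketch).
    """
    if source_kind == "datastore":
        # BW ODP works with datastores only; partitions are processed sequentially.
        return "BW ODP"
    if source_kind == "query":
        # Generated HANA Views transfers partitions separately, so it suits
        # large result sets; BW OLAP is the default for small ones.
        return "Generated HANA Views" if large_dataset else "BW OLAP"
    raise ValueError(f"unknown source kind: {source_kind!r}")


print(suggest_transfer_mode("query"))                      # BW OLAP
print(suggest_transfer_mode("query", large_dataset=True))  # Generated HANA Views
print(suggest_transfer_mode("datastore"))                  # BW ODP
```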

Configuration

Source Connection: BW_SYSTEM
Target Connection: S3_BUCKET
Transfer Mode: Generated HANA Views
Source Query: /NAMESPACE/QUERY
Target Path: /data/export/

Remote Execution

Pipeline Operator

Execute SAP Data Intelligence graphs.

Options:

  • Local execution (same DI instance)
  • Remote execution (different DI instance)
  • Parameter passing
  • Synchronous/asynchronous

Configuration:

Graph: /namespace/my_pipeline
Connection: (for remote)
Parameters: key=value pairs
Wait for Completion: Yes/No

SAP Data Services Job

Execute jobs in remote SAP Data Services systems.

Prerequisites:

  • Data Services connection configured
  • Job accessible from DI

Configuration:

Connection: DS_CONNECTION
Repository: REPO_NAME
Job: JOB_NAME
Global Variables: VAR1=VALUE1

SAP HANA Flowgraph

Execute flowgraphs in remote HANA systems.

Prerequisites:

  • HANA connection configured
  • Flowgraph deployed

Configuration:

Connection: HANA_CONNECTION
Flowgraph: SCHEMA.FLOWGRAPH_NAME
Parameters: (if applicable)

BW Process Chain

Execute SAP BW process chains.

Prerequisites:

  • BW connection configured
  • Process chain accessible

Configuration:

Connection: BW_CONNECTION
Process Chain: CHAIN_ID
Variant: (if applicable)
Wait for Completion: Yes/No

Control Flow

Workflow Split

Duplicates incoming signal to multiple output ports.

                    ┌──→ [Task A]
[Trigger] → [Split] ┼──→ [Task B]
                    └──→ [Task C]

Use Case: Parallel execution paths

Workflow Merge (AND)

Combines multiple inputs using logical AND. Sends an output signal only when all inputs have been received.

[Task A] ──┐
[Task B] ──┼──→ [Merge AND] → [Next Task]
[Task C] ──┘

Use Case: Wait for all parallel tasks to complete

Workflow Merge (OR)

Combines multiple inputs using logical OR. Sends an output signal as soon as any input has been received.

[Task A] ──┐
[Task B] ──┼──→ [Merge OR] → [Next Task]
[Task C] ──┘

Use Case: Continue when first task completes
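The difference between the two merge operators can be illustrated with a toy model. The class below is a sketch in plain Python, not the operators' implementation. The name `MergeGate` is hypothetical, and firing at most once per run is an assumption of this model.

```python
class MergeGate:
    """Toy model of a merge operator with a fixed set of input ports."""

    def __init__(self, inputs, mode):
        if mode not in ("AND", "OR"):
            raise ValueError("mode must be 'AND' or 'OR'")
        self.pending = set(inputs)  # inputs that have not signalled yet
        self.total = len(self.pending)
        self.mode = mode
        self.fired = False

    def receive(self, source):
        """Record a signal from source; return True if the gate fires now."""
        self.pending.discard(source)
        if self.fired:
            return False  # assumption: the gate fires at most once
        received = self.total - len(self.pending)
        # AND waits for every input; OR needs only the first one.
        ready = not self.pending if self.mode == "AND" else received >= 1
        if ready:
            self.fired = True
        return ready


and_gate = MergeGate(["Task A", "Task B", "Task C"], "AND")
and_results = [and_gate.receive(t) for t in ["Task A", "Task B", "Task C"]]
or_gate = MergeGate(["Task A", "Task B", "Task C"], "OR")
or_results = [or_gate.receive(t) for t in ["Task B", "Task A", "Task C"]]
print(and_results)  # [False, False, True]
print(or_results)   # [True, False, False]
```

The AND gate releases the next task only after the third signal arrives, while the OR gate releases it on the first signal, whichever task sends it.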

Error Handling

Error Port:

  • All task operators have error ports
  • Connect error port to handle failures
  • Unhandled errors terminate workflow

                    ┌── success ──→ [Continue]
[Task] ─────────────┤
                    └── error ──→ [Error Handler]
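The branching shown above can be sketched in Python as a task whose result is routed to one of two handlers. The helper names `run_task` and `route` and the port labels `"output"` and `"error"` are hypothetical. The sketch only mirrors the idea of connecting a handler to the error port.

```python
def run_task(task):
    """Run task and return (port, payload): 'output' on success, 'error' on failure."""
    try:
        return "output", task()
    except Exception as exc:
        return "error", str(exc)


def route(task, on_success, on_error):
    """Send the task result to the handler connected to the port that fired."""
    port, payload = run_task(task)
    handler = on_success if port == "output" else on_error
    return handler(payload)


def load_rows():
    raise ConnectionError("BW system not reachable")


result = route(
    load_rows,
    on_success=lambda rows: f"continue with {rows}",
    on_error=lambda message: f"alert: {message}",
)
print(result)  # alert: BW system not reachable
```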

Notifications

Notification Operator

Send email notifications during workflow execution.

Configuration:

SMTP Connection: EMAIL_CONNECTION
To: recipients@company.com
CC: (optional)
Subject: Workflow ${workflow.name} - ${status}
Body: The workflow completed at ${timestamp}
Attachment: (optional file path)

Use Cases

  • Success notifications
  • Error alerts
  • Progress updates
  • Audit trail

Template Variables

Variable          Description
${workflow.name}  Workflow name
${status}         Execution status
${timestamp}      Current timestamp
${error.message}  Error details (if failed)
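To preview how a subject or body line might look once the variables are filled in, the substitution can be imitated locally. This is a sketch only; it does not show how the Notification operator resolves variables, and the `render` function is hypothetical. A small regular expression is used because the dotted name in `${workflow.name}` does not match the default identifier pattern of Python's `string.Template`.

```python
import re

# Matches ${name}, where name may contain dots, e.g. ${workflow.name}.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][\w.]*)\}")


def render(template, values):
    """Replace ${name} placeholders; unknown names are left unchanged."""
    def substitute(match):
        key = match.group(1)
        return str(values[key]) if key in values else match.group(0)
    return PLACEHOLDER.sub(substitute, template)


subject = render(
    "Workflow ${workflow.name} - ${status}",
    {"workflow.name": "nightly_export", "status": "completed"},
)
print(subject)  # Workflow nightly_export - completed
```

Leaving unknown placeholders untouched makes a missing value visible in the preview instead of silently producing an empty string.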

Best Practices

Design Principles

  1. Clear Flow: Design linear or clearly branching workflows
  2. Error Handling: Always connect error ports
  3. Notifications: Add alerts for critical failures
  4. Idempotency: Design tasks to be re-runnable

Performance

  1. Parallelize: Use Split/Merge for independent tasks
  2. Optimize Transfers: Choose appropriate transfer mode
  3. Monitor Progress: Track workflow status
  4. Resource Planning: Consider target system load

Reliability

  1. Test Components: Validate each task individually
  2. Handle Failures: Implement retry logic where needed
  3. Clean Up: Manage temporary data
  4. Document: Maintain workflow documentation
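Retry logic is easiest to get right when it is kept separate from the task itself. The following is a minimal sketch of retry with exponential backoff in plain Python. The function name `run_with_retry` and its parameters are illustrative, and the wrapped task is assumed to be idempotent, as recommended under Design Principles.

```python
import time


def run_with_retry(task, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call task until it succeeds or the attempts are exhausted.

    Waits base_delay, then 2 * base_delay, and so on between attempts.
    The task should be idempotent, since it may run more than once.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))


calls = {"count": 0}


def flaky_transfer():
    """Hypothetical task that fails twice before succeeding."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("temporary failure")
    return "transferred"


delays = []
outcome = run_with_retry(flaky_transfer, sleep=delays.append)
print(outcome)  # transferred
print(delays)   # [1.0, 2.0]
```

Passing `sleep` as a parameter keeps the example fast to run and test; in real use the default `time.sleep` applies the delays.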

Example Workflow

[Trigger]
    ↓
[Split]
    ├──→ [Transfer from BW] ──→ [Merge AND]
    └──→ [Transfer from HANA] ─┘
                                    ↓
                              [Transform Data]
                                    ↓
                              [Load to Target]
                                    ↓
                              [Send Notification]
                                    ↓
                              [Terminator]
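The execution order implied by the diagram can be derived from its dependencies. The sketch below encodes the example workflow as a dependency dictionary and uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later) to group operators into stages. The dictionary is a hand-written transcription of the diagram, not an export from Data Intelligence.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each operator maps to the operators it waits for (its predecessors).
workflow = {
    "Split": {"Trigger"},
    "Transfer from BW": {"Split"},
    "Transfer from HANA": {"Split"},
    "Merge AND": {"Transfer from BW", "Transfer from HANA"},
    "Transform Data": {"Merge AND"},
    "Load to Target": {"Transform Data"},
    "Send Notification": {"Load to Target"},
    "Terminator": {"Send Notification"},
}

sorter = TopologicalSorter(workflow)
sorter.prepare()
stages = []
while sorter.is_active():
    # Operators whose inputs have all arrived can run in the same stage.
    ready = sorted(sorter.get_ready())
    stages.append(ready)
    sorter.done(*ready)

for number, stage in enumerate(stages, start=1):
    print(number, stage)
```

The third stage contains both transfers, which is the parallel section between Split and Merge AND; every other stage holds a single operator.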


Last Updated: 2025-11-22