zhongwei/gh-secondsky-sap-skills-skills-sap-hana-cloud-data-intelligence

Fork 0

Files

Zhongwei Li e23395aeb2 Initial commit

2025-11-30 08:55:25 +08:00

9.5 KiB

Raw Permalink Blame History

Structured Data Operators Guide

Complete guide for structured data processing in SAP Data Intelligence.

Overview
Data Transform Operator
Projection Node
Aggregation Node
Join Node
Union Node
Case Node
Consumer Operators
Producer Operators
Resiliency

Overview

Structured data operators provide SQL-like data processing capabilities with visual configuration.

Key Components:

Data Transform: Visual SQL editor for transformations
Consumer Operators: Read structured data from sources
Producer Operators: Write structured data to targets

Supported Formats:

CSV, Parquet, ORC, JSON (files)
Database tables (HANA, SQL databases)
SAP applications (S/4HANA, BW)

Data Transform Operator

The Data Transform operator provides a visual editor for creating SQL-like transformations.

Custom Editor Features

Visual Designer:

Drag-and-drop node creation
Visual connection of data flows
Schema preview at each node

Available Nodes:

Source: Input data connection
Target: Output data connection
Projection: Column operations
Aggregation: GROUP BY operations
Join: Combine datasets
Union: Merge datasets
Case: Conditional logic
Filter: Row filtering

Creating a Data Transform

Add Data Transform operator to graph
Open Custom Editor
Add Source node (connect to input port)
Add transformation nodes
Add Target node (connect to output port)
Configure each node
Validate and close editor

Best Practices

Use meaningful node names
Preview data at each step
Optimize join order for performance
Use filters early to reduce data volume

Projection Node

The Projection node performs column-level operations.

Capabilities

Column Selection:

Include/exclude columns
Rename columns
Reorder columns

Column Transformation:

Apply expressions
Use DTL functions
Create calculated columns

Configuration

Source Columns:
  - CUSTOMER_ID (include)
  - CUSTOMER_NAME (include, rename to NAME)
  - INTERNAL_CODE (exclude)

Calculated Columns:
  - FULL_NAME: CONCAT(FIRST_NAME, ' ', LAST_NAME)
  - ORDER_YEAR: YEAR(ORDER_DATE)

Expression Examples

-- String concatenation
CONCAT(FIRST_NAME, ' ', LAST_NAME)

-- Conditional value
CASE WHEN AMOUNT > 1000 THEN 'High' ELSE 'Low' END

-- Date extraction
YEAR(ORDER_DATE)

-- Null handling
COALESCE(DISCOUNT, 0)

-- Type conversion
TO_DECIMAL(QUANTITY * PRICE, 15, 2)

Aggregation Node

The Aggregation node performs GROUP BY operations with aggregate functions.

Aggregate Functions

Function	Description
COUNT	Count rows
COUNT_DISTINCT	Count unique values
SUM	Sum of values
AVG	Average of values
MIN	Minimum value
MAX	Maximum value
FIRST	First value
LAST	Last value

Configuration

Group By Columns:
  - REGION
  - PRODUCT_CATEGORY

Aggregations:
  - TOTAL_SALES: SUM(SALES_AMOUNT)
  - ORDER_COUNT: COUNT(ORDER_ID)
  - AVG_ORDER: AVG(SALES_AMOUNT)
  - FIRST_ORDER: MIN(ORDER_DATE)
  - LAST_ORDER: MAX(ORDER_DATE)

Example Output Schema

Input:
  ORDER_ID, REGION, PRODUCT_CATEGORY, SALES_AMOUNT, ORDER_DATE

Output:
  REGION, PRODUCT_CATEGORY, TOTAL_SALES, ORDER_COUNT, AVG_ORDER, FIRST_ORDER, LAST_ORDER

Join Node

The Join node combines data from multiple sources.

Join Types

Type	Description
INNER	Only matching rows
LEFT	All left + matching right
RIGHT	All right + matching left
FULL	All rows from both sides

Configuration

Join Type: LEFT
Left Source: ORDERS
Right Source: CUSTOMERS

Join Conditions:
  - ORDERS.CUSTOMER_ID = CUSTOMERS.ID
  - ORDERS.REGION = CUSTOMERS.REGION (optional)

Design Considerations

Performance Tips:

Put smaller table on right side for LEFT joins
Use indexed columns in join conditions
Filter data before joining when possible
Avoid Cartesian products (missing join condition)

Multiple Joins:

Chain Join nodes for 3+ sources
Consider join order for performance
Validate intermediate results

Handling Duplicates

Scenario: Customer has multiple orders
Solution: Aggregate before or after join depending on requirement

Union Node

The Union node combines rows from multiple sources.

Union Types

Type	Description
UNION ALL	Include all rows (with duplicates)
UNION	Include distinct rows only

Configuration

Union Type: UNION ALL

Sources:
  - ORDERS_2023
  - ORDERS_2024

Column Mapping:
  - ORDER_ID -> ORDER_ID
  - CUSTOMER -> CUSTOMER_ID (rename)
  - AMOUNT -> SALES_AMOUNT (rename)

Requirements

Same number of columns from each source
Compatible data types
Column mapping for different names

Adding Source Identifier

Use a calculated column to track data origin:

-- In source 1 projection
'2023' AS DATA_YEAR

-- In source 2 projection
'2024' AS DATA_YEAR

Case Node

The Case node applies conditional logic to route or transform data.

Conditional Expressions

CASE
  WHEN ORDER_TYPE = 'SALES' THEN 'Revenue'
  WHEN ORDER_TYPE = 'RETURN' THEN 'Refund'
  ELSE 'Other'
END AS TRANSACTION_TYPE

Routing Data

Configure multiple output ports based on conditions:

Condition 1: REGION = 'EMEA' -> Output Port 1
Condition 2: REGION = 'APAC' -> Output Port 2
Default: -> Output Port 3

Nested Conditions

CASE
  WHEN AMOUNT > 10000 THEN
    CASE
      WHEN CUSTOMER_TYPE = 'VIP' THEN 'Priority High'
      ELSE 'Priority Medium'
    END
  ELSE 'Priority Low'
END AS PRIORITY

Consumer Operators

Operators that read structured data from sources.

Structured File Consumer

Reads from file storage (S3, Azure, GCS, HDFS, local).

Supported Formats:

CSV (with header options)
Parquet
ORC
JSON (JSON Lines format)

Configuration:

Connection: S3 Connection
Source: s3://bucket/path/*.parquet
Format: Parquet
Schema: Auto-detect or manual
Partition Pruning: date_column > '2024-01-01'

Excel Support:

Read Excel files (.xlsx)
Specify sheet name
Define header row

Structured SQL Consumer

Reads from SQL databases.

Configuration:

Connection: HANA Connection
Table/View: SALES_DATA
Columns: Select columns
Filter: WHERE clause

SAP Application Consumer

Reads from SAP applications via OData or RFC.

Configuration:

Connection: S/4HANA Connection
Entity: A_SalesOrder
Select: OrderID, CustomerID, NetAmount
Filter: $filter=CreationDate gt '2024-01-01'

Producer Operators

Operators that write structured data to targets.

Structured File Producer

Writes to file storage.

Configuration:

Connection: S3 Connection
Target: s3://bucket/output/
Format: Parquet
Partition Columns: YEAR, MONTH
Compression: SNAPPY

Partitioning Strategies:

By column values (e.g., year, region)
By time (hourly, daily)
By size (max rows per file)

Structured Table Producer

Writes to database tables.

Write Modes:

INSERT: Add new rows
UPSERT: Insert or update
DELETE: Remove matching rows
TRUNCATE_INSERT: Clear and reload

Configuration:

Connection: HANA Connection
Table: TARGET_TABLE
Mode: UPSERT
Key Columns: ID, DATE
Batch Size: 10000

SAP Application Producer

Writes to SAP applications.

Configuration:

Connection: S/4HANA Connection
Entity: A_SalesOrder
Operation: POST (create) / PATCH (update)

Resiliency

Structured data operators support resiliency features for reliable processing.

Checkpoint Configuration

Enable Checkpointing: Yes
Checkpoint Interval: 60 seconds
Checkpoint Location: /checkpoint/path

Recovery Behavior

On Failure:

Graph stops at failure point
State saved to checkpoint
Manual or auto restart
Resume from last checkpoint

Best Practices

Enable checkpointing for long-running jobs
Use appropriate checkpoint intervals
Store checkpoints on reliable storage
Monitor checkpoint sizes

Exactly-Once Processing

For exactly-once semantics:

Use UPSERT to database targets
Enable deduplication for file targets
Implement idempotent transformations

Example: End-to-End Pipeline

[Structured File Consumer] -> [Data Transform] -> [Structured Table Producer]
       (CSV files)              |                      (HANA table)
                                |
                    [Projection] - Select columns
                                |
                    [Join] - Enrich with master data
                                |
                    [Aggregation] - Summarize by region
                                |
                    [Case] - Apply business rules

Documentation Links

Structured Data Operators: https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/tree/main/docs/modelingguide/working-with-structureddata-operators
Data Transform: https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/blob/main/docs/modelingguide/working-with-structureddata-operators/data-transform-8fe8c02.md

Last Updated: 2025-11-22

9.5 KiB Raw Permalink Blame History

Structured Data Operators Guide

Table of Contents

Overview

Data Transform Operator

Custom Editor Features

Creating a Data Transform

Best Practices

Projection Node

Capabilities

Configuration

Expression Examples

Aggregation Node

Aggregate Functions

Configuration

Example Output Schema

Join Node

Join Types

Configuration

Design Considerations

Handling Duplicates

Union Node

Union Types

Configuration

Requirements

Adding Source Identifier

Case Node

Conditional Expressions

Routing Data

Nested Conditions

Consumer Operators

Structured File Consumer

Structured SQL Consumer

SAP Application Consumer

Producer Operators

Structured File Producer

Structured Table Producer

SAP Application Producer

Resiliency

Checkpoint Configuration

Recovery Behavior

Best Practices

Exactly-Once Processing

Example: End-to-End Pipeline

Documentation Links

9.5 KiB

Raw Permalink Blame History