Replication Flows Guide

Complete guide for data replication in SAP Data Intelligence.

Table of Contents

  1. Overview
  2. Creating Replication Flows
  3. Supported Sources
  4. Supported Targets
  5. Task Configuration
  6. Filters and Mappings
  7. Delivery Guarantees
  8. Cloud Storage Target Structure
  9. Kafka as Target
  10. Monitoring and Management
  11. Best Practices

Overview

Replication flows enable data movement from sources to targets with support for:

  • Small or large datasets
  • Batch or real-time processing
  • Full or delta (CDC) loading
  • Multiple target types

Key Workflow:

  1. Configure source and target connections
  2. Create replication flow
  3. Add tasks with datasets
  4. Configure filters and mappings
  5. Validate flow
  6. Deploy to tenant repository
  7. Run and monitor

Creating Replication Flows

Prerequisites

  • Source connection created and enabled in Connection Management
  • Target connection configured
  • Appropriate authorizations

Creation Steps

  1. Open Modeler in SAP Data Intelligence

  2. Navigate to Replication Flows

  3. Create new replication flow

  4. Configure source:

    • Select source connection
    • Choose connection type (ABAP, database, etc.)
  5. Configure target:

    • Select target type (database, cloud storage, Kafka)
    • Set target-specific options
  6. Add tasks (see Task Configuration)

  7. Validate the flow

  8. Deploy to tenant repository

  9. Run the flow


Supported Sources

ABAP Systems

  • SAP S/4HANA (Cloud and On-Premise)
  • SAP ECC via SLT
  • SAP BW/4HANA
  • CDS views enabled for extraction

Source Configuration:

Connection Type: ABAP
Extraction Type: CDS / ODP / Table
Package Size: 50000

Databases

  • SAP HANA
  • Azure SQL Database (delta requires schema = username)
  • Other SQL databases via connectors

Supported Targets

Database Targets

SAP HANA Cloud:

  • Write modes: INSERT, UPSERT, DELETE
  • Exactly-once delivery with UPSERT
  • Batch size configuration

Cloud Storage Targets

| Target | Description |
| --- | --- |
| Amazon S3 | AWS object storage |
| Azure Data Lake Storage Gen2 | Microsoft cloud storage |
| Google Cloud Storage | GCP object storage |
| SAP HANA Data Lake | SAP cloud data lake |

Cloud Storage Options:

  • Group Delta By: None, Date, Hour
  • File Type: CSV, Parquet, JSON, JSONLines
  • Suppress Duplicates: Minimize duplicate records

Container Name Limit: 64 characters maximum
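
As a quick illustration of these constraints, the sketch below checks a hypothetical set of target settings against the allowed option values and the 64-character container name limit; the function and setting names are made up for this example and are not part of any SAP Data Intelligence API.

```python
# Minimal validation sketch for the cloud storage options above.
# Names and structure are illustrative, not an SAP API.
ALLOWED_GROUP_DELTA_BY = {"None", "Date", "Hour"}
ALLOWED_FILE_TYPES = {"CSV", "Parquet", "JSON", "JSONLines"}
MAX_CONTAINER_NAME_LEN = 64  # container name limit noted above


def validate_target_settings(container: str, group_delta_by: str, file_type: str) -> list:
    """Return a list of problems with the settings; an empty list means they look valid."""
    problems = []
    if len(container) > MAX_CONTAINER_NAME_LEN:
        problems.append(f"container name exceeds {MAX_CONTAINER_NAME_LEN} characters")
    if group_delta_by not in ALLOWED_GROUP_DELTA_BY:
        problems.append(f"unsupported Group Delta By value: {group_delta_by}")
    if file_type not in ALLOWED_FILE_TYPES:
        problems.append(f"unsupported File Type: {file_type}")
    return problems


print(validate_target_settings("sales-replication", "Date", "Parquet"))  # []
```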

Kafka Target

  • Each dataset maps to a Kafka topic
  • Topic names editable (need not match source)
  • No container name limit

Task Configuration

Tasks define what data to replicate and how.

Task Components

Task:
  - Source dataset (table, view, etc.)
  - Target specification
  - Filter conditions
  - Column mappings
  - Load type (Initial/Delta)

Load Types

| Type | Description |
| --- | --- |
| Initial Load | Full data extraction |
| Delta Load | Changed data only (CDC) |
| Initial + Delta | Full load followed by continuous delta |
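
Taken together, a task can be pictured as a small record combining dataset, target, filter, mapping, and load type. The sketch below is purely illustrative; the class and field names are hypothetical and do not correspond to a replication flow API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical structure (not an SAP API) mirroring the task components above.
@dataclass
class ReplicationTask:
    source_dataset: str                       # source table, CDS view, etc.
    target_object: str                        # target table, file path, or Kafka topic
    load_type: str = "Initial + Delta"        # Initial Load / Delta Load / Initial + Delta
    filter_condition: Optional[str] = None    # optional source filter
    column_mapping: Dict[str, str] = field(default_factory=dict)  # source -> target columns


task = ReplicationTask(
    source_dataset="I_SalesOrder",            # illustrative dataset name
    target_object="SALES_ORDER",
    filter_condition="Region eq 'EMEA'",
    column_mapping={"SalesOrder": "SALES_ORDER_ID", "SoldToParty": "CUSTOMER_ID"},
)
print(task.load_type)  # Initial + Delta
```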

Creating Tasks

  1. Click "Add Task"
  2. Select source object
  3. Configure target (table name, topic, etc.)
  4. Set filters (optional)
  5. Define mappings (optional)
  6. Choose load type

Filters and Mappings

Source Filters

Reduce data volume with filter conditions:

Filter Examples:
- CreationDate ge datetime'2024-01-01T00:00:00'
- Region eq 'EMEA'
- Status in ('ACTIVE', 'PENDING')
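
Multiple conditions are usually combined into a single filter expression. The helper below is a small illustration of composing such strings; the OData-style syntax mirrors the examples above, and the exact operators supported depend on the source connection.

```python
def combine_filters(*conditions: str) -> str:
    """Join individual conditions into one filter expression (OData-style syntax assumed)."""
    return " and ".join(f"({c})" for c in conditions)


print(combine_filters(
    "CreationDate ge datetime'2024-01-01T00:00:00'",
    "Region eq 'EMEA'",
))
# (CreationDate ge datetime'2024-01-01T00:00:00') and (Region eq 'EMEA')
```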

Column Mappings

Auto-mapping: System matches source to target columns automatically

Custom Mapping: Define specific source-to-target column relationships

Custom Mapping Example:
  Source Column    -> Target Column
  SalesOrder       -> SALES_ORDER_ID
  SoldToParty      -> CUSTOMER_ID
  NetAmount        -> AMOUNT
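
A custom mapping is effectively a source-to-target rename dictionary. The snippet below shows that idea applied to a single record; it is a tool-independent sketch, not how the Modeler stores mappings internally.

```python
# Hypothetical illustration: apply the mapping above to one source record.
COLUMN_MAPPING = {
    "SalesOrder": "SALES_ORDER_ID",
    "SoldToParty": "CUSTOMER_ID",
    "NetAmount": "AMOUNT",
}


def apply_mapping(record: dict, mapping: dict) -> dict:
    """Rename source columns to target columns; unmapped columns pass through unchanged."""
    return {mapping.get(col, col): value for col, value in record.items()}


row = {"SalesOrder": "4711", "SoldToParty": "C-100", "NetAmount": 250.0}
print(apply_mapping(row, COLUMN_MAPPING))
# {'SALES_ORDER_ID': '4711', 'CUSTOMER_ID': 'C-100', 'AMOUNT': 250.0}
```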

Data Type Compatibility

Ensure source and target data types are compatible. See references/abap-integration.md for ABAP type mappings.


Delivery Guarantees

Default: At-Least-Once

At-least-once delivery may result in duplicate records during:

  • Recovery from failures
  • Network issues
  • System restarts

Exactly-Once with Database Targets

When using UPSERT to database targets (e.g., HANA Cloud):

  • System eliminates duplicates automatically
  • Achieved through key-based merge operations
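
Conceptually, the key-based merge behaves like an upsert into a keyed store: a redelivered record overwrites the row with the same primary key instead of creating a duplicate. The sketch below illustrates that idea only; it is not the actual HANA mechanism.

```python
# Conceptual sketch of key-based merge (upsert) semantics, not the HANA implementation.
target = {}  # keyed store standing in for the target table


def upsert(record: dict, key_columns: list) -> None:
    """Insert or overwrite by primary key, so redelivered records create no duplicates."""
    key = tuple(record[c] for c in key_columns)
    target[key] = record


upsert({"SALES_ORDER_ID": "4711", "AMOUNT": 250.0}, ["SALES_ORDER_ID"])
upsert({"SALES_ORDER_ID": "4711", "AMOUNT": 250.0}, ["SALES_ORDER_ID"])  # redelivery: no duplicate
print(len(target))  # 1
```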

Suppress Duplicates (Cloud Storage)

For non-database targets:

  • Enable "Suppress Duplicates" during initial load
  • Minimizes but may not eliminate all duplicates

Cloud Storage Target Structure

Directory Hierarchy

/<container-base-path>/
    .sap.rms.container                    # Container metadata
    <tableName>/
        .sap.partfile.metadata            # Dataset metadata
        initial/
            .sap.partfile.metadata
            part-<timestamp>-<workOrderID>-<no>.<ext>
            _SUCCESS                       # Load completion marker
        delta/
            <date(time)-optional>/
                .sap.partfile.metadata
                part-<timestamp>-<workOrderID>-<no>.<ext>

File Formats

| Format | Options |
| --- | --- |
| CSV | Delimiter, header row, encoding |
| Parquet | Compression (SNAPPY, GZIP), compatibility mode |
| JSON | Standard JSON format |
| JSONLines | One JSON object per line |
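
When post-processing part files, the reader depends on the configured file type. The sketch below dispatches on the file extension using pandas; it assumes pandas (plus pyarrow for Parquet) is installed, and the extension names are an assumption since the directory listing above only shows a generic extension placeholder.

```python
import pandas as pd


def read_part_file(path: str) -> pd.DataFrame:
    """Read one part file based on its extension (CSV, Parquet, JSON, JSONLines assumed)."""
    if path.endswith(".csv"):
        return pd.read_csv(path)
    if path.endswith(".parquet"):
        return pd.read_parquet(path)            # requires pyarrow or fastparquet
    if path.endswith(".jsonl"):
        return pd.read_json(path, lines=True)   # one JSON object per line
    if path.endswith(".json"):
        return pd.read_json(path)
    raise ValueError(f"Unsupported part file format: {path}")
```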

Appended Columns

System automatically adds metadata columns:

| Column | Description |
| --- | --- |
| __operation_type | L=Load, I=Insert, U=Update, B=Before, X=Delete, M=Archive |
| __sequence_number | Delta row ordering |
| __timestamp | UTC write timestamp |
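
These columns are what allow a consumer to rebuild current state from delta files: order the rows per key by __sequence_number, then apply or drop them according to __operation_type. The function below is a simplified sketch that assumes the key columns are known; before-image (B) and archive (M) records are left unhandled.

```python
def apply_delta(state: dict, records: list, key_columns: list) -> dict:
    """Fold delta records into `state` (a dict keyed by primary key). Simplified sketch."""
    for rec in sorted(records, key=lambda r: int(r["__sequence_number"])):
        key = tuple(rec[c] for c in key_columns)
        op = rec["__operation_type"]
        if op in ("L", "I", "U"):        # load / insert / update -> upsert the row
            state[key] = {k: v for k, v in rec.items() if not k.startswith("__")}
        elif op == "X":                  # delete -> remove the row
            state.pop(key, None)
        # 'B' (before-image) and 'M' (archive) handling is omitted in this sketch
    return state
```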

Success Marker

The _SUCCESS file indicates:

  • Initial load completion
  • Safe for downstream processing

Kafka as Target

Topic Configuration

  • One topic per source dataset
  • Topic name defaults to dataset name (editable)
  • Configure partitions and replication factor

Serialization

| Format | Description |
| --- | --- |
| AVRO | Schema in message; column names: alphanumeric + underscore only |
| JSON | No schema; flexible structure |

Note: Schema registries are not supported.
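
Because AVRO field names are restricted to alphanumeric characters and underscores, source columns containing other characters (for example the '/' in ABAP namespace fields) deserve attention when AVRO serialization is chosen. The helper below only illustrates that constraint; whether any renaming is needed in a given flow depends on the source.

```python
import re

# AVRO field-name rule: letters, digits, underscore; must not start with a digit.
AVRO_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def sanitize_for_avro(column: str) -> str:
    """Replace characters that are not valid in AVRO field names with underscores."""
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", column)
    return cleaned if AVRO_NAME.match(cleaned) else f"_{cleaned}"


print(sanitize_for_avro("/BIC/ZREGION"))  # _BIC_ZREGION
```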

Message Structure

  • Each source record = one Kafka message (not batched)
  • Message key = concatenated primary key values (underscore separated)

Message Headers

| Header | Values |
| --- | --- |
| kafkaSerializationType | AVRO or JSON |
| opType | L=Load, I=Insert, U=Update, B=Before, X=Delete, M=Archive |
| Seq | Sequential integer (delta order); empty for initial load |
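
A consumer can use the message key, payload, and these headers to route or order records. The sketch below uses the confluent-kafka Python client; the broker address, group id, and topic name are placeholders, and deserialization of the AVRO or JSON payload is left out.

```python
from confluent_kafka import Consumer

# Placeholder connection settings; adjust to your Kafka cluster and topic.
consumer = Consumer({
    "bootstrap.servers": "kafka.example.com:9092",
    "group.id": "replication-flow-reader",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["SALES_ORDER"])  # one topic per replicated dataset

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        headers = dict(msg.headers() or [])            # list of (name, bytes) tuples
        op_type = headers.get("opType", b"").decode()
        seq = headers.get("Seq", b"").decode()          # empty for initial-load records
        key = msg.key().decode() if msg.key() else ""   # concatenated primary key values
        print(f"key={key} op={op_type} seq={seq} bytes={len(msg.value() or b'')}")
finally:
    consumer.close()
```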

Compression Options

  • None
  • GZIP
  • Snappy
  • LZ4
  • Zstandard

Network Configuration

For Kafka behind Cloud Connector:

  • Broker addresses must match the virtual hosts defined in SAP Cloud Connector (SCC)
  • Use identical virtual and internal host values when possible

Monitoring and Management

Monitoring Tools

SAP Data Intelligence Monitoring:

  • View replication flow status
  • Track task execution
  • Monitor data volumes
  • View error logs

Flow Status

| Status | Description |
| --- | --- |
| Deployed | Ready to run |
| Running | Active execution |
| Completed | Successfully finished |
| Failed | Error occurred |
| Stopped | Manually stopped |

Management Operations

| Operation | Description |
| --- | --- |
| Edit | Modify existing flow |
| Undeploy | Remove from runtime |
| Delete | Remove flow definition |
| Clean Up | Remove source artifacts |

Clean Up Source Artifacts

After completing replication:

  1. Navigate to deployed flow
  2. Select "Clean Up"
  3. Confirm; this removes delta pointers and temporary data from the source

Best Practices

Planning

  1. Assess Data Volume: Plan for initial load duration
  2. Choose Delivery Mode: Understand exactly-once requirements
  3. Design Target Schema: Match source structure appropriately
  4. Plan Delta Strategy: Determine grouping (none/date/hour)

Performance

  1. Use Filters: Reduce data volume at source
  2. Optimize Package Size: Balance memory vs. round-trips
  3. Monitor Progress: Track initial and delta loads
  4. Schedule Appropriately: Avoid peak system times

Reliability

  1. Enable Monitoring: Track all flows actively
  2. Handle Duplicates: Design for at-least-once semantics
  3. Validate Before Deploy: Check all configurations
  4. Test with Sample Data: Verify mappings and transformations


Last Updated: 2025-11-22