Replication Flows Guide

Complete guide for data replication in SAP Data Intelligence.

Table of Contents

  1. Overview
  2. Creating Replication Flows
  3. Supported Sources
  4. Supported Targets
  5. Task Configuration
  6. Filters and Mappings
  7. Delivery Guarantees
  8. Cloud Storage Target Structure
  9. Kafka as Target
  10. Monitoring and Management
  11. Best Practices

Overview

Replication flows enable data movement from sources to targets with support for:

  • Small or large datasets
  • Batch or real-time processing
  • Full or delta (CDC) loading
  • Multiple target types

Key Workflow:

  1. Configure source and target connections
  2. Create replication flow
  3. Add tasks with datasets
  4. Configure filters and mappings
  5. Validate flow
  6. Deploy to tenant repository
  7. Run and monitor

Creating Replication Flows

Prerequisites

  • Source connection created and enabled in Connection Management
  • Target connection configured
  • Appropriate authorizations

Creation Steps

  1. Open Modeler in SAP Data Intelligence

  2. Navigate to Replication Flows

  3. Create new replication flow

  4. Configure source:

    • Select source connection
    • Choose connection type (ABAP, database, etc.)
  5. Configure target:

    • Select target type (database, cloud storage, Kafka)
    • Set target-specific options
  6. Add tasks (see Task Configuration)

  7. Validate the flow

  8. Deploy to tenant repository

  9. Run the flow


Supported Sources

ABAP Systems

  • SAP S/4HANA (Cloud and On-Premise)
  • SAP ECC via SLT
  • SAP BW/4HANA
  • CDS views enabled for extraction

Source Configuration:

Connection Type: ABAP
Extraction Type: CDS / ODP / Table
Package Size: 50000

Databases

  • SAP HANA
  • Azure SQL Database (delta requires schema = username)
  • Other SQL databases via connectors

Supported Targets

Database Targets

SAP HANA Cloud:

  • Write modes: INSERT, UPSERT, DELETE
  • Exactly-once delivery with UPSERT
  • Batch size configuration

Cloud Storage Targets

| Target | Description |
| --- | --- |
| Amazon S3 | AWS object storage |
| Azure Data Lake Storage Gen2 | Microsoft cloud storage |
| Google Cloud Storage | GCP object storage |
| SAP HANA Data Lake | SAP cloud data lake |

Cloud Storage Options:

  • Group Delta By: None, Date, Hour
  • File Type: CSV, Parquet, JSON, JSONLines
  • Suppress Duplicates: Minimize duplicate records

Container Name Limit: 64 characters maximum
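
As a quick illustration of these constraints, the sketch below checks a hypothetical set of target settings against the allowed option values and the 64-character container name limit; the function and setting names are made up for this example and are not part of any SAP Data Intelligence API.

```python
# Minimal validation sketch for the cloud storage options above.
# Names and structure are illustrative, not an SAP API.
ALLOWED_GROUP_DELTA_BY = {"None", "Date", "Hour"}
ALLOWED_FILE_TYPES = {"CSV", "Parquet", "JSON", "JSONLines"}
MAX_CONTAINER_NAME_LEN = 64  # container name limit noted above


def validate_target_settings(container: str, group_delta_by: str, file_type: str) -> list:
    """Return a list of problems with the settings; an empty list means they look valid."""
    problems = []
    if len(container) > MAX_CONTAINER_NAME_LEN:
        problems.append(f"container name exceeds {MAX_CONTAINER_NAME_LEN} characters")
    if group_delta_by not in ALLOWED_GROUP_DELTA_BY:
        problems.append(f"unsupported Group Delta By value: {group_delta_by}")
    if file_type not in ALLOWED_FILE_TYPES:
        problems.append(f"unsupported File Type: {file_type}")
    return problems


print(validate_target_settings("sales-replication", "Date", "Parquet"))  # []
```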

Kafka Target

  • Each dataset maps to a Kafka topic
  • Topic names editable (need not match source)
  • No container name limit

Task Configuration

Tasks define what data to replicate and how.

Task Components

Task:
  - Source dataset (table, view, etc.)
  - Target specification
  - Filter conditions
  - Column mappings
  - Load type (Initial/Delta)

Load Types

| Type | Description |
| --- | --- |
| Initial Load | Full data extraction |
| Delta Load | Changed data only (CDC) |
| Initial + Delta | Full load followed by continuous delta |
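
Taken together, a task can be pictured as a small record combining dataset, target, filter, mapping, and load type. The sketch below is purely illustrative; the class and field names are hypothetical and do not correspond to a replication flow API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical structure (not an SAP API) mirroring the task components above.
@dataclass
class ReplicationTask:
    source_dataset: str                       # source table, CDS view, etc.
    target_object: str                        # target table, file path, or Kafka topic
    load_type: str = "Initial + Delta"        # Initial Load / Delta Load / Initial + Delta
    filter_condition: Optional[str] = None    # optional source filter
    column_mapping: Dict[str, str] = field(default_factory=dict)  # source -> target columns


task = ReplicationTask(
    source_dataset="I_SalesOrder",            # illustrative dataset name
    target_object="SALES_ORDER",
    filter_condition="Region eq 'EMEA'",
    column_mapping={"SalesOrder": "SALES_ORDER_ID", "SoldToParty": "CUSTOMER_ID"},
)
print(task.load_type)  # Initial + Delta
```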

Creating Tasks

  1. Click "Add Task"
  2. Select source object
  3. Configure target (table name, topic, etc.)
  4. Set filters (optional)
  5. Define mappings (optional)
  6. Choose load type

Filters and Mappings

Source Filters

Reduce data volume with filter conditions:

Filter Examples:
- CreationDate ge datetime'2024-01-01T00:00:00'
- Region eq 'EMEA'
- Status in ('ACTIVE', 'PENDING')
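
Multiple conditions are usually combined into a single filter expression. The helper below is a small illustration of composing such strings; the OData-style syntax mirrors the examples above, and the exact operators supported depend on the source connection.

```python
def combine_filters(*conditions: str) -> str:
    """Join individual conditions into one filter expression (OData-style syntax assumed)."""
    return " and ".join(f"({c})" for c in conditions)


print(combine_filters(
    "CreationDate ge datetime'2024-01-01T00:00:00'",
    "Region eq 'EMEA'",
))
# (CreationDate ge datetime'2024-01-01T00:00:00') and (Region eq 'EMEA')
```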

Column Mappings

Auto-mapping: System matches source to target columns automatically

Custom Mapping: Define specific source-to-target column relationships

Custom Mapping Example:
  Source Column    -> Target Column
  SalesOrder       -> SALES_ORDER_ID
  SoldToParty      -> CUSTOMER_ID
  NetAmount        -> AMOUNT
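
A custom mapping is effectively a source-to-target rename dictionary. The snippet below shows that idea applied to a single record; it is a tool-independent sketch, not how the Modeler stores mappings internally.

```python
# Hypothetical illustration: apply the mapping above to one source record.
COLUMN_MAPPING = {
    "SalesOrder": "SALES_ORDER_ID",
    "SoldToParty": "CUSTOMER_ID",
    "NetAmount": "AMOUNT",
}


def apply_mapping(record: dict, mapping: dict) -> dict:
    """Rename source columns to target columns; unmapped columns pass through unchanged."""
    return {mapping.get(col, col): value for col, value in record.items()}


row = {"SalesOrder": "4711", "SoldToParty": "C-100", "NetAmount": 250.0}
print(apply_mapping(row, COLUMN_MAPPING))
# {'SALES_ORDER_ID': '4711', 'CUSTOMER_ID': 'C-100', 'AMOUNT': 250.0}
```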

Data Type Compatibility

Ensure source and target data types are compatible. See references/abap-integration.md for ABAP type mappings.


Delivery Guarantees

Default: At-Least-Once

At-least-once delivery may result in duplicate records during:

  • Recovery from failures
  • Network issues
  • System restarts

Exactly-Once with Database Targets

When using UPSERT to database targets (e.g., HANA Cloud):

  • System eliminates duplicates automatically
  • Achieved through key-based merge operations
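
Conceptually, the key-based merge behaves like an upsert into a keyed store: a redelivered record overwrites the row with the same primary key instead of creating a duplicate. The sketch below illustrates that idea only; it is not the actual HANA mechanism.

```python
# Conceptual sketch of key-based merge (upsert) semantics, not the HANA implementation.
target = {}  # keyed store standing in for the target table


def upsert(record: dict, key_columns: list) -> None:
    """Insert or overwrite by primary key, so redelivered records create no duplicates."""
    key = tuple(record[c] for c in key_columns)
    target[key] = record


upsert({"SALES_ORDER_ID": "4711", "AMOUNT": 250.0}, ["SALES_ORDER_ID"])
upsert({"SALES_ORDER_ID": "4711", "AMOUNT": 250.0}, ["SALES_ORDER_ID"])  # redelivery: no duplicate
print(len(target))  # 1
```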

Suppress Duplicates (Cloud Storage)

For non-database targets:

  • Enable "Suppress Duplicates" during initial load
  • Minimizes but may not eliminate all duplicates

Cloud Storage Target Structure

Directory Hierarchy

/<container-base-path>/
    .sap.rms.container                    # Container metadata
    <tableName>/
        .sap.partfile.metadata            # Dataset metadata
        initial/
            .sap.partfile.metadata
            part-<timestamp>-<workOrderID>-<no>.<ext>
            _SUCCESS                       # Load completion marker
        delta/
            <date(time)-optional>/
                .sap.partfile.metadata
                part-<timestamp>-<workOrderID>-<no>.<ext>

File Formats

| Format | Options |
| --- | --- |
| CSV | Delimiter, header row, encoding |
| Parquet | Compression (SNAPPY, GZIP), compatibility mode |
| JSON | Standard JSON format |
| JSONLines | One JSON object per line |
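
When post-processing part files, the reader depends on the configured file type. The sketch below dispatches on the file extension using pandas; it assumes pandas (plus pyarrow for Parquet) is installed, and the extension names are an assumption since the directory listing above only shows a generic extension placeholder.

```python
import pandas as pd


def read_part_file(path: str) -> pd.DataFrame:
    """Read one part file based on its extension (CSV, Parquet, JSON, JSONLines assumed)."""
    if path.endswith(".csv"):
        return pd.read_csv(path)
    if path.endswith(".parquet"):
        return pd.read_parquet(path)            # requires pyarrow or fastparquet
    if path.endswith(".jsonl"):
        return pd.read_json(path, lines=True)   # one JSON object per line
    if path.endswith(".json"):
        return pd.read_json(path)
    raise ValueError(f"Unsupported part file format: {path}")
```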

Appended Columns

System automatically adds metadata columns:

| Column | Description |
| --- | --- |
| __operation_type | L=Load, I=Insert, U=Update, B=Before, X=Delete, M=Archive |
| __sequence_number | Delta row ordering |
| __timestamp | UTC write timestamp |
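
These columns are what allow a consumer to rebuild current state from delta files: order the rows per key by __sequence_number, then apply or drop them according to __operation_type. The function below is a simplified sketch that assumes the key columns are known; before-image (B) and archive (M) records are left unhandled.

```python
def apply_delta(state: dict, records: list, key_columns: list) -> dict:
    """Fold delta records into `state` (a dict keyed by primary key). Simplified sketch."""
    for rec in sorted(records, key=lambda r: int(r["__sequence_number"])):
        key = tuple(rec[c] for c in key_columns)
        op = rec["__operation_type"]
        if op in ("L", "I", "U"):        # load / insert / update -> upsert the row
            state[key] = {k: v for k, v in rec.items() if not k.startswith("__")}
        elif op == "X":                  # delete -> remove the row
            state.pop(key, None)
        # 'B' (before-image) and 'M' (archive) handling is omitted in this sketch
    return state
```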

Success Marker

The _SUCCESS file indicates:

  • Initial load completion
  • Safe for downstream processing

Kafka as Target

Topic Configuration

  • One topic per source dataset
  • Topic name defaults to dataset name (editable)
  • Configure partitions and replication factor

Serialization

| Format | Description |
| --- | --- |
| AVRO | Schema in message; column names: alphanumeric + underscore only |
| JSON | No schema; flexible structure |

Note: Schema registries are not supported.
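
Because AVRO field names are restricted to alphanumeric characters and underscores, source columns containing other characters (for example the '/' in ABAP namespace fields) deserve attention when AVRO serialization is chosen. The helper below only illustrates that constraint; whether any renaming is needed in a given flow depends on the source.

```python
import re

# AVRO field-name rule: letters, digits, underscore; must not start with a digit.
AVRO_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def sanitize_for_avro(column: str) -> str:
    """Replace characters that are not valid in AVRO field names with underscores."""
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", column)
    return cleaned if AVRO_NAME.match(cleaned) else f"_{cleaned}"


print(sanitize_for_avro("/BIC/ZREGION"))  # _BIC_ZREGION
```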

Message Structure

  • Each source record = one Kafka message (not batched)
  • Message key = concatenated primary key values (underscore separated)

Message Headers

| Header | Values |
| --- | --- |
| kafkaSerializationType | AVRO or JSON |
| opType | L=Load, I=Insert, U=Update, B=Before, X=Delete, M=Archive |
| Seq | Sequential integer (delta order); empty for initial load |
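
A consumer can use the message key, payload, and these headers to route or order records. The sketch below uses the confluent-kafka Python client; the broker address, group id, and topic name are placeholders, and deserialization of the AVRO or JSON payload is left out.

```python
from confluent_kafka import Consumer

# Placeholder connection settings; adjust to your Kafka cluster and topic.
consumer = Consumer({
    "bootstrap.servers": "kafka.example.com:9092",
    "group.id": "replication-flow-reader",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["SALES_ORDER"])  # one topic per replicated dataset

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        headers = dict(msg.headers() or [])            # list of (name, bytes) tuples
        op_type = headers.get("opType", b"").decode()
        seq = headers.get("Seq", b"").decode()          # empty for initial-load records
        key = msg.key().decode() if msg.key() else ""   # concatenated primary key values
        print(f"key={key} op={op_type} seq={seq} bytes={len(msg.value() or b'')}")
finally:
    consumer.close()
```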

Compression Options

  • None
  • GZIP
  • Snappy
  • LZ4
  • Zstandard

Network Configuration

For Kafka behind Cloud Connector:

  • Broker addresses must match the virtual hosts defined in SAP Cloud Connector (SCC)
  • Use identical virtual and internal host values when possible

Monitoring and Management

Monitoring Tools

SAP Data Intelligence Monitoring:

  • View replication flow status
  • Track task execution
  • Monitor data volumes
  • View error logs

Flow Status

| Status | Description |
| --- | --- |
| Deployed | Ready to run |
| Running | Active execution |
| Completed | Successfully finished |
| Failed | Error occurred |
| Stopped | Manually stopped |

Management Operations

| Operation | Description |
| --- | --- |
| Edit | Modify existing flow |
| Undeploy | Remove from runtime |
| Delete | Remove flow definition |
| Clean Up | Remove source artifacts |

Clean Up Source Artifacts

After completing replication:

  1. Navigate to deployed flow
  2. Select "Clean Up"
  3. Confirm; this removes delta pointers and temporary data from the source

Best Practices

Planning

  1. Assess Data Volume: Plan for initial load duration
  2. Choose Delivery Mode: Understand exactly-once requirements
  3. Design Target Schema: Match source structure appropriately
  4. Plan Delta Strategy: Determine grouping (none/date/hour)

Performance

  1. Use Filters: Reduce data volume at source
  2. Optimize Package Size: Balance memory vs. round-trips
  3. Monitor Progress: Track initial and delta loads
  4. Schedule Appropriately: Avoid peak system times

Reliability

  1. Enable Monitoring: Track all flows actively
  2. Handle Duplicates: Design for at-least-once semantics
  3. Validate Before Deploy: Check all configurations
  4. Test with Sample Data: Verify mappings and transformations


Last Updated: 2025-11-22