Replication Flows Guide
Complete guide for data replication in SAP Data Intelligence.
Table of Contents
- Overview
- Creating Replication Flows
- Supported Sources
- Supported Targets
- Task Configuration
- Filters and Mappings
- Delivery Guarantees
- Cloud Storage Target Structure
- Kafka as Target
- Monitoring and Management
- Best Practices
- Documentation Links
Overview
Replication flows enable data movement from sources to targets with support for:
- Small or large datasets
- Batch or real-time processing
- Full or delta (CDC) loading
- Multiple target types
Key Workflow:
- Configure source and target connections
- Create replication flow
- Add tasks with datasets
- Configure filters and mappings
- Validate flow
- Deploy to tenant repository
- Run and monitor
Creating Replication Flows
Prerequisites
- Source connection created and enabled in Connection Management
- Target connection configured
- Appropriate authorizations
Creation Steps
- Open Modeler in SAP Data Intelligence
- Navigate to Replication Flows
- Create new replication flow
- Configure source:
  - Select source connection
  - Choose connection type (ABAP, database, etc.)
- Configure target:
  - Select target type (database, cloud storage, Kafka)
  - Set target-specific options
- Add tasks (see Task Configuration)
- Validate the flow
- Deploy to tenant repository
- Run the flow
Supported Sources
ABAP Systems
- SAP S/4HANA (Cloud and On-Premise)
- SAP ECC via SLT
- SAP BW/4HANA
- CDS views with extraction
Source Configuration:
```
Connection Type: ABAP
Extraction Type: CDS / ODP / Table
Package Size: 50000
```
Databases
- SAP HANA
- Azure SQL Database (delta requires schema = username)
- Other SQL databases via connectors
Supported Targets
Database Targets
SAP HANA Cloud:
- Write modes: INSERT, UPSERT, DELETE
- Exactly-once delivery with UPSERT
- Batch size configuration
Cloud Storage Targets
| Target | Description |
|---|---|
| Amazon S3 | AWS object storage |
| Azure Data Lake Storage Gen2 | Microsoft cloud storage |
| Google Cloud Storage | GCP object storage |
| SAP HANA Data Lake | SAP cloud data lake |
Cloud Storage Options:
- Group Delta By: None, Date, Hour
- File Type: CSV, Parquet, JSON, JSONLines
- Suppress Duplicates: Minimize duplicate records
Container Name Limit: 64 characters maximum
Kafka Target
- Each dataset maps to a Kafka topic
- Topic names editable (need not match source)
- No container name limit
Task Configuration
Tasks define what data to replicate and how.
Task Components
Task:
- Source dataset (table, view, etc.)
- Target specification
- Filter conditions
- Column mappings
- Load type (Initial/Delta)
Load Types
| Type | Description |
|---|---|
| Initial Load | Full data extraction |
| Delta Load | Changed data only (CDC) |
| Initial + Delta | Full load followed by continuous delta |
Creating Tasks
- Click "Add Task"
- Select source object
- Configure target (table name, topic, etc.)
- Set filters (optional)
- Define mappings (optional)
- Choose load type
Filters and Mappings
Source Filters
Reduce data volume with filter conditions:
Filter Examples:
- CreationDate ge datetime'2024-01-01T00:00:00'
- Region eq 'EMEA'
- Status in ('ACTIVE', 'PENDING')
Column Mappings
Auto-mapping: System matches source to target columns automatically
Custom Mapping: Define specific source-to-target column relationships
Custom Mapping Example:
```
Source Column -> Target Column
SalesOrder    -> SALES_ORDER_ID
SoldToParty   -> CUSTOMER_ID
NetAmount     -> AMOUNT
```
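For downstream code that needs to apply the same renaming, the minimal Python sketch below uses the column names from the example above; the pass-through behavior for unmapped columns is an assumption for illustration, not a statement about how the replication flow itself handles them.
```python
# Minimal sketch: apply the source-to-target column mapping above to a record.
# Unmapped columns are passed through unchanged (an illustrative assumption).
COLUMN_MAPPING = {
    "SalesOrder": "SALES_ORDER_ID",
    "SoldToParty": "CUSTOMER_ID",
    "NetAmount": "AMOUNT",
}

def map_record(source_record: dict) -> dict:
    """Rename source columns to their target names."""
    return {COLUMN_MAPPING.get(col, col): value for col, value in source_record.items()}

print(map_record({"SalesOrder": "4711", "SoldToParty": "C-100", "NetAmount": 250.0}))
# {'SALES_ORDER_ID': '4711', 'CUSTOMER_ID': 'C-100', 'AMOUNT': 250.0}
```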
Data Type Compatibility
Ensure source and target data types are compatible. See references/abap-integration.md for ABAP type mappings.
Delivery Guarantees
Default: At-Least-Once
At-least-once delivery may result in duplicate records during:
- Recovery from failures
- Network issues
- System restarts
Exactly-Once with Database Targets
When using UPSERT to database targets (e.g., HANA Cloud):
- System eliminates duplicates automatically
- Achieved through key-based merge operations
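To illustrate why key-based merges make replays harmless, here is a sketch using the hdbcli Python client and SAP HANA's UPSERT ... WITH PRIMARY KEY statement; host, credentials, table, and column names are placeholders, and the replication flow performs this merge internally without any such client code.
```python
# Illustrative sketch only: a key-based UPSERT keeps one row per key even
# when the same record arrives twice. All connection and table details are
# placeholder assumptions.
from hdbcli import dbapi  # SAP HANA client for Python

conn = dbapi.connect(address="<hana-host>", port=443, user="<user>", password="<password>")
cursor = conn.cursor()

records = [("4711", "C-100", 250.0), ("4711", "C-100", 250.0)]  # duplicate from a replay

for sales_order_id, customer_id, amount in records:
    # UPSERT ... WITH PRIMARY KEY inserts the row if the key is new and
    # updates it otherwise, so re-applying the same record is idempotent.
    cursor.execute(
        "UPSERT SALES_ORDERS VALUES (?, ?, ?) WITH PRIMARY KEY",
        (sales_order_id, customer_id, amount),
    )

conn.commit()
cursor.close()
conn.close()
```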
Suppress Duplicates (Cloud Storage)
For non-database targets:
- Enable "Suppress Duplicates" during initial load
- Minimizes but may not eliminate all duplicates
Cloud Storage Target Structure
Directory Hierarchy
```
/<container-base-path>/
    .sap.rms.container                              # Container metadata
    <tableName>/
        .sap.partfile.metadata                      # Dataset metadata
        initial/
            .sap.partfile.metadata
            part-<timestamp>-<workOrderID>-<no>.<ext>
            _SUCCESS                                # Load completion marker
        delta/
            <date(time)-optional>/
                .sap.partfile.metadata
                part-<timestamp>-<workOrderID>-<no>.<ext>
```
File Formats
| Format | Options |
|---|---|
| CSV | Delimiter, header row, encoding |
| Parquet | Compression (SNAPPY, GZIP), compatibility mode |
| JSON | Standard JSON format |
| JSONLines | One JSON object per line |
Appended Columns
System automatically adds metadata columns:
| Column | Description |
|---|---|
| `__operation_type` | L=Load, I=Insert, U=Update, B=Before, X=Delete, M=Archive |
| `__sequence_number` | Delta row ordering |
| `__timestamp` | UTC write timestamp |
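To show how these columns can be consumed downstream, the hedged pandas sketch below folds delta part files into the latest state per key; the directory path, key column, and Parquet file type are illustrative assumptions.
```python
# Sketch under assumptions: read delta part files (Parquet here) and keep the
# latest image per key using the appended metadata columns. Path, key column,
# and file format are placeholders.
import glob
import pandas as pd

paths = glob.glob("SALES_ORDERS/delta/**/part-*.parquet", recursive=True)
delta = pd.concat((pd.read_parquet(p) for p in paths), ignore_index=True)

# Keep only the most recent change per key, ordered by the delta sequence.
latest = (
    delta.sort_values("__sequence_number")
         .drop_duplicates(subset=["SALES_ORDER_ID"], keep="last")
)

# Drop keys whose final operation was a delete ('X').
current_state = latest[latest["__operation_type"] != "X"]
print(current_state.head())
```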
Success Marker
The _SUCCESS file indicates:
- Initial load completion
- Safe for downstream processing
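A minimal sketch of honoring the marker, assuming Parquet part files and a locally accessible path (a real cloud storage target would be read through the provider's SDK):
```python
# Only process initial-load part files once the _SUCCESS marker exists.
# The container path and file format are placeholder assumptions.
from pathlib import Path
import pandas as pd

initial_dir = Path("my-container/SALES_ORDERS/initial")  # placeholder path

if (initial_dir / "_SUCCESS").exists():
    part_files = sorted(initial_dir.glob("part-*.parquet"))
    data = pd.concat((pd.read_parquet(p) for p in part_files), ignore_index=True)
    print(f"Initial load complete: {len(data)} rows")
else:
    print("Initial load still running; skipping downstream processing")
```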
Kafka as Target
Topic Configuration
- One topic per source dataset
- Topic name defaults to dataset name (editable)
- Configure partitions and replication factor
Serialization
| Format | Description |
|---|---|
| AVRO | Schema in message; column names: alphanumeric + underscore only |
| JSON | No schema; flexible structure |
Note: Schema registries are not supported.
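A small sketch for checking dataset column names against the AVRO restriction above; the column list is an illustrative placeholder.
```python
# Flag column names that violate the AVRO restriction noted above
# (alphanumeric characters and underscores only).
import re

AVRO_NAME_PATTERN = re.compile(r"^[A-Za-z0-9_]+$")

columns = ["SALES_ORDER_ID", "NET-AMOUNT", "Currency"]  # placeholder column list
invalid = [col for col in columns if not AVRO_NAME_PATTERN.match(col)]
if invalid:
    print(f"Columns not usable with AVRO serialization: {invalid}")
```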
Message Structure
- Each source record = one Kafka message (not batched)
- Message key = concatenated primary key values (underscore separated)
Message Headers
| Header | Values |
|---|---|
| `kafkaSerializationType` | AVRO or JSON |
| `opType` | L=Load, I=Insert, U=Update, B=Before, X=Delete, M=Archive |
| `Seq` | Sequential integer (delta order); empty for initial load |
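A hedged consumer sketch using the kafka-python package (any Kafka client works); the broker address, topic name, and group ID are placeholders.
```python
# Read replicated messages and inspect the headers listed above.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "SALES_ORDERS",                      # topic defaults to the dataset name
    bootstrap_servers="broker:9092",     # placeholder broker address
    group_id="replication-consumer",
    auto_offset_reset="earliest",
)

for msg in consumer:
    headers = {key: value.decode() for key, value in msg.headers}
    op_type = headers.get("opType")      # L/I/U/B/X/M as listed above
    seq = headers.get("Seq")             # empty for initial-load records
    key = msg.key.decode() if msg.key else None  # concatenated primary key values
    print(f"key={key} op={op_type} seq={seq} bytes={len(msg.value)}")
```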
Compression Options
- None
- GZIP
- Snappy
- LZ4
- Zstandard
Network Configuration
For Kafka behind SAP Cloud Connector (SCC):
- Broker addresses must match the virtual hosts defined in SCC
- Use identical virtual and internal host values where possible
Monitoring and Management
Monitoring Tools
SAP Data Intelligence Monitoring:
- View replication flow status
- Track task execution
- Monitor data volumes
- View error logs
Flow Status
| Status | Description |
|---|---|
| Deployed | Ready to run |
| Running | Active execution |
| Completed | Successfully finished |
| Failed | Error occurred |
| Stopped | Manually stopped |
Management Operations
| Operation | Description |
|---|---|
| Edit | Modify existing flow |
| Undeploy | Remove from runtime |
| Delete | Remove flow definition |
| Clean Up | Remove source artifacts |
Clean Up Source Artifacts
After completing replication:
- Navigate to deployed flow
- Select "Clean Up"
- Removes delta pointers and temporary data
Best Practices
Planning
- Assess Data Volume: Plan for initial load duration
- Choose Delivery Mode: Understand exactly-once requirements
- Design Target Schema: Match source structure appropriately
- Plan Delta Strategy: Determine grouping (none/date/hour)
Performance
- Use Filters: Reduce data volume at source
- Optimize Package Size: Balance memory vs. round-trips
- Monitor Progress: Track initial and delta loads
- Schedule Appropriately: Avoid peak system times
Reliability
- Enable Monitoring: Track all flows actively
- Handle Duplicates: Design for at-least-once semantics
- Validate Before Deploy: Check all configurations
- Test with Sample Data: Verify mappings and transformations
Documentation Links
- Replicating Data: https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/tree/main/docs/modelingguide/replicating-data
- Create Replication Flow: https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/blob/main/docs/modelingguide/replicating-data/create-a-replication-flow-a425e34.md
- Cloud Storage Structure: https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/blob/main/docs/modelingguide/replicating-data/cloud-storage-target-structure-12e0f97.md
- Kafka as Target: https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/blob/main/docs/modelingguide/replicating-data/kafka-as-target-b9b819c.md
Last Updated: 2025-11-22