# Replication Flows Guide

Complete guide for data replication in SAP Data Intelligence.

## Table of Contents

1. [Overview](#overview)
2. [Creating Replication Flows](#creating-replication-flows)
3. [Supported Sources](#supported-sources)
4. [Supported Targets](#supported-targets)
5. [Task Configuration](#task-configuration)
6. [Filters and Mappings](#filters-and-mappings)
7. [Delivery Guarantees](#delivery-guarantees)
8. [Cloud Storage Target Structure](#cloud-storage-target-structure)
9. [Kafka as Target](#kafka-as-target)
10. [Monitoring and Management](#monitoring-and-management)
11. [Best Practices](#best-practices)
12. [Documentation Links](#documentation-links)

---

## Overview

Replication flows enable data movement from sources to targets with support for:

- Small or large datasets
- Batch or real-time processing
- Full or delta (CDC) loading
- Multiple target types

**Key Workflow:**

1. Configure source and target connections
2. Create replication flow
3. Add tasks with datasets
4. Configure filters and mappings
5. Validate flow
6. Deploy to tenant repository
7. Run and monitor

---

## Creating Replication Flows

### Prerequisites

- Source connection created and enabled in Connection Management
- Target connection configured
- Appropriate authorizations

### Creation Steps

1. **Open Modeler** in SAP Data Intelligence
2. **Navigate** to Replication Flows
3. **Create new** replication flow
4. **Configure source**:
   - Select source connection
   - Choose connection type (ABAP, database, etc.)
5. **Configure target**:
   - Select target type (database, cloud storage, Kafka)
   - Set target-specific options
6. **Add tasks** (see Task Configuration)
7. **Validate** the flow
8. **Deploy** to tenant repository
9. **Run** the flow

---

## Supported Sources

### ABAP Systems

- SAP S/4HANA (Cloud and On-Premise)
- SAP ECC via SLT
- SAP BW/4HANA
- CDS views enabled for extraction

**Source Configuration:**

```
Connection Type: ABAP
Extraction Type: CDS / ODP / Table
Package Size: 50000
```

### Databases

- SAP HANA
- Azure SQL Database (delta loads require the schema name to match the user name)
- Other SQL databases via connectors

---

## Supported Targets

### Database Targets

**SAP HANA Cloud:**

- Write modes: INSERT, UPSERT, DELETE
- Exactly-once delivery with UPSERT
- Batch size configuration

### Cloud Storage Targets

| Target | Description |
|--------|-------------|
| Amazon S3 | AWS object storage |
| Azure Data Lake Storage Gen2 | Microsoft cloud storage |
| Google Cloud Storage | GCP object storage |
| SAP HANA Data Lake | SAP cloud data lake |

**Cloud Storage Options:**

- Group Delta By: None, Date, Hour
- File Type: CSV, Parquet, JSON, JSONLines
- Suppress Duplicates: Minimize duplicate records

**Container Name Limit:** 64 characters maximum

### Kafka Target

- Each dataset maps to a Kafka topic
- Topic names editable (need not match source)
- No container name limit

---

## Task Configuration

Tasks define what data to replicate and how.

### Task Components

```
Task:
- Source dataset (table, view, etc.)
- Target specification
- Filter conditions
- Column mappings
- Load type (Initial/Delta)
```

### Load Types

| Type | Description |
|------|-------------|
| Initial Load | Full data extraction |
| Delta Load | Changed data only (CDC) |
| Initial + Delta | Full load followed by continuous delta |

### Creating Tasks

1. Click "Add Task"
2. Select source object
3. Configure target (table name, topic, etc.)
4. Set filters (optional)
5. Define mappings (optional)
6. Choose load type

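Conceptually, a task is just this bundle of settings. The following is a minimal sketch of one task as a Python dictionary; every name and value is illustrative and is not the replication-flow API.

```python
# Illustrative only: a task bundles a source dataset, a target, optional
# filters and mappings, and a load type. All names and values are made up.
task = {
    "source_dataset": "SALESORDER",                  # table, view, or CDS view
    "target": {"type": "kafka", "topic": "salesorder"},
    "filters": ["Region eq 'EMEA'"],                 # optional, applied at the source
    "mappings": {"SalesOrder": "SALES_ORDER_ID"},    # optional, source -> target columns
    "load_type": "INITIAL_AND_DELTA",                # initial, delta, or both
}

# A task needs at least a source object, a target, and a load type.
missing = [k for k in ("source_dataset", "target", "load_type") if not task.get(k)]
print("task is complete" if not missing else f"missing settings: {missing}")
```
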
---
## Filters and Mappings

### Source Filters

Reduce data volume with filter conditions:

```
Filter Examples:
- CreationDate ge datetime'2024-01-01T00:00:00'
- Region eq 'EMEA'
- Status in ('ACTIVE', 'PENDING')
```

### Column Mappings

**Auto-mapping:** System matches source to target columns automatically

**Custom Mapping:** Define specific source-to-target column relationships

```
Custom Mapping Example:
Source Column -> Target Column
SalesOrder    -> SALES_ORDER_ID
SoldToParty   -> CUSTOMER_ID
NetAmount     -> AMOUNT
```

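For intuition, a custom mapping behaves like a per-record rename. The sketch below reuses the column names from the example above; the function itself is purely illustrative and not part of the product.

```python
# Illustrative sketch of what a custom column mapping does to a single record.
mapping = {
    "SalesOrder": "SALES_ORDER_ID",
    "SoldToParty": "CUSTOMER_ID",
    "NetAmount": "AMOUNT",
}

def apply_mapping(record: dict, mapping: dict) -> dict:
    """Rename source columns to their target names; unmapped columns pass through."""
    return {mapping.get(column, column): value for column, value in record.items()}

source_row = {"SalesOrder": "4711", "SoldToParty": "C-100", "NetAmount": 99.5}
print(apply_mapping(source_row, mapping))
# {'SALES_ORDER_ID': '4711', 'CUSTOMER_ID': 'C-100', 'AMOUNT': 99.5}
```
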
### Data Type Compatibility

Ensure source and target data types are compatible. See `references/abap-integration.md` for ABAP type mappings.

---

## Delivery Guarantees

### Default: At-Least-Once

At-least-once delivery may result in duplicate records during:

- Recovery from failures
- Network issues
- System restarts

### Exactly-Once with Database Targets

When using UPSERT to database targets (e.g., SAP HANA Cloud):

- System eliminates duplicates automatically
- Achieved through key-based merge operations

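The reason UPSERT yields exactly-once results is that a key-based merge is idempotent: replaying the same change simply overwrites the row it already produced. A minimal sketch of that idea, with a plain Python dictionary standing in for the target table (this is not the actual merge implementation):

```python
# Idempotent, key-based merge: replaying a record does not create a duplicate.
# The dict keyed by primary key stands in for a table written with UPSERT.
target_table = {}

def upsert(row, key="SALES_ORDER_ID"):
    """Insert the row if its key is new, otherwise overwrite the existing row."""
    target_table[row[key]] = row

changes = [
    {"SALES_ORDER_ID": "4711", "AMOUNT": 99.5},
    {"SALES_ORDER_ID": "4712", "AMOUNT": 10.0},
    {"SALES_ORDER_ID": "4711", "AMOUNT": 99.5},  # duplicate delivery after a retry
]

for row in changes:
    upsert(row)

print(len(target_table))  # 2 -- the replayed record did not add a duplicate
```
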
### Suppress Duplicates (Cloud Storage)

For non-database targets:

- Enable "Suppress Duplicates" during initial load
- Minimizes but may not eliminate all duplicates

---

## Cloud Storage Target Structure

### Directory Hierarchy

```
/<container-base-path>/
    .sap.rms.container                       # Container metadata
    <tableName>/
        .sap.partfile.metadata               # Dataset metadata
        initial/
            .sap.partfile.metadata
            part-<timestamp>-<workOrderID>-<no>.<ext>
            _SUCCESS                         # Load completion marker
        delta/
            <date(time)-optional>/
                .sap.partfile.metadata
                part-<timestamp>-<workOrderID>-<no>.<ext>
```

### File Formats

| Format | Options |
|--------|---------|
| CSV | Delimiter, header row, encoding |
| Parquet | Compression (SNAPPY, GZIP), compatibility mode |
| JSON | Standard JSON format |
| JSONLines | One JSON object per line |

### Appended Columns

System automatically adds metadata columns:

| Column | Description |
|--------|-------------|
| `__operation_type` | L=Load, I=Insert, U=Update, B=Before, X=Delete, M=Archive |
| `__sequence_number` | Delta row ordering |
| `__timestamp` | UTC write timestamp |

### Success Marker

The `_SUCCESS` file indicates:

- Initial load completion
- Safe for downstream processing

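A downstream job can combine the marker and the appended columns to process a dataset safely: wait for `_SUCCESS`, then read the initial part files, then apply delta files in `__sequence_number` order. A minimal sketch, assuming the container has been synced to a local directory and the part files are CSV with a header row; the paths and dataset name are placeholders.

```python
# Minimal sketch: consume a replicated dataset from a locally synced copy of
# the cloud storage container. Paths and CSV settings are assumptions.
import csv
from pathlib import Path

base = Path("/data/rms-container/SALESORDER")  # <container-base-path>/<tableName>
initial_dir = base / "initial"

# 1. Only touch the initial load once the _SUCCESS marker exists.
if not (initial_dir / "_SUCCESS").exists():
    raise SystemExit("initial load not finished yet")

def read_parts(paths):
    """Yield rows from part files as dicts keyed by column name."""
    for part in sorted(paths):
        with part.open(newline="") as fh:
            yield from csv.DictReader(fh)

initial_rows = list(read_parts(initial_dir.glob("part-*")))

# 2. Apply delta rows in __sequence_number order; __operation_type tells the
#    consumer whether a row is an insert (I), update (U), delete (X), etc.
delta_rows = list(read_parts((base / "delta").rglob("part-*")))
delta_rows.sort(key=lambda row: int(row["__sequence_number"]))

print(f"{len(initial_rows)} initial rows, {len(delta_rows)} delta rows")
```
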
---
## Kafka as Target

### Topic Configuration

- One topic per source dataset
- Topic name defaults to the dataset name (editable)
- Configure partitions and replication factor

### Serialization

| Format | Description |
|--------|-------------|
| AVRO | Schema in message; column names: alphanumeric + underscore only |
| JSON | No schema; flexible structure |

**Note:** Schema registries are not supported.

### Message Structure

- Each source record = one Kafka message (not batched)
- Message key = concatenated primary key values (underscore separated)

### Message Headers

| Header | Values |
|--------|--------|
| `kafkaSerializationType` | AVRO or JSON |
| `opType` | L=Load, I=Insert, U=Update, B=Before, X=Delete, M=Archive |
| `Seq` | Sequential integer (delta order); empty for initial load |

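On the consuming side, the key and headers carry the operation and ordering information described above. A minimal consumer sketch using the `kafka-python` client, assuming JSON serialization; the broker address and topic name are placeholders.

```python
# Minimal consumer sketch (kafka-python). Broker address, topic name, and
# JSON serialization are assumptions for illustration.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "salesorder",                              # topic: defaults to the dataset name
    bootstrap_servers="kafka.example.com:9092",
    auto_offset_reset="earliest",
)

for message in consumer:
    headers = dict(message.headers)            # list of (name, bytes) tuples -> dict
    op_type = headers.get("opType", b"").decode()   # L, I, U, B, X, M
    seq = headers.get("Seq", b"").decode()          # empty for initial-load messages
    key = message.key.decode() if message.key else ""  # primary keys joined by "_"
    record = json.loads(message.value)              # one source record per message
    print(op_type, seq, key, record)
```

Because delivery is at-least-once by default, consumers should treat the message key together with `Seq` as the basis for idempotent processing.
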
### Compression Options

- None
- GZIP
- Snappy
- LZ4
- Zstandard

### Network Configuration

For Kafka behind SAP Cloud Connector (SCC):

- Broker addresses must match the virtual hosts defined in the SCC
- Use identical virtual and internal host values when possible

---

## Monitoring and Management

### Monitoring Tools

**SAP Data Intelligence Monitoring:**

- View replication flow status
- Track task execution
- Monitor data volumes
- View error logs

### Flow Status

| Status | Description |
|--------|-------------|
| Deployed | Ready to run |
| Running | Active execution |
| Completed | Successfully finished |
| Failed | Error occurred |
| Stopped | Manually stopped |

### Management Operations

| Operation | Description |
|-----------|-------------|
| Edit | Modify existing flow |
| Undeploy | Remove from runtime |
| Delete | Remove flow definition |
| Clean Up | Remove source artifacts |

### Clean Up Source Artifacts

After completing replication:

1. Navigate to the deployed flow
2. Select "Clean Up"
3. The operation removes delta pointers and temporary data

---

## Best Practices

### Planning

1. **Assess Data Volume**: Plan for initial load duration
2. **Choose Delivery Mode**: Understand exactly-once requirements
3. **Design Target Schema**: Match source structure appropriately
4. **Plan Delta Strategy**: Determine grouping (none/date/hour)

### Performance

1. **Use Filters**: Reduce data volume at source
2. **Optimize Package Size**: Balance memory vs. round-trips
3. **Monitor Progress**: Track initial and delta loads
4. **Schedule Appropriately**: Avoid peak system times

### Reliability

1. **Enable Monitoring**: Track all flows actively
2. **Handle Duplicates**: Design for at-least-once semantics
3. **Validate Before Deploy**: Check all configurations
4. **Test with Sample Data**: Verify mappings and transformations

---

## Documentation Links

- **Replicating Data**: [https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/tree/main/docs/modelingguide/replicating-data](https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/tree/main/docs/modelingguide/replicating-data)
- **Create Replication Flow**: [https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/blob/main/docs/modelingguide/replicating-data/create-a-replication-flow-a425e34.md](https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/blob/main/docs/modelingguide/replicating-data/create-a-replication-flow-a425e34.md)
- **Cloud Storage Structure**: [https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/blob/main/docs/modelingguide/replicating-data/cloud-storage-target-structure-12e0f97.md](https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/blob/main/docs/modelingguide/replicating-data/cloud-storage-target-structure-12e0f97.md)
- **Kafka as Target**: [https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/blob/main/docs/modelingguide/replicating-data/kafka-as-target-b9b819c.md](https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/blob/main/docs/modelingguide/replicating-data/kafka-as-target-b9b819c.md)

---

**Last Updated**: 2025-11-22