# Replication Flows Guide

Complete guide for data replication in SAP Data Intelligence.

## Table of Contents

1. [Overview](#overview)
2. [Creating Replication Flows](#creating-replication-flows)
3. [Supported Sources](#supported-sources)
4. [Supported Targets](#supported-targets)
5. [Task Configuration](#task-configuration)
6. [Filters and Mappings](#filters-and-mappings)
7. [Delivery Guarantees](#delivery-guarantees)
8. [Cloud Storage Target Structure](#cloud-storage-target-structure)
9. [Kafka as Target](#kafka-as-target)
10. [Monitoring and Management](#monitoring-and-management)
11. [Best Practices](#best-practices)
12. [Documentation Links](#documentation-links)

---

## Overview

Replication flows enable data movement from sources to targets with support for:

- Small or large datasets
- Batch or real-time processing
- Full or delta (CDC) loading
- Multiple target types

**Key Workflow:**

1. Configure source and target connections
2. Create replication flow
3. Add tasks with datasets
4. Configure filters and mappings
5. Validate flow
6. Deploy to tenant repository
7. Run and monitor

---

## Creating Replication Flows

### Prerequisites

- Source connection created and enabled in Connection Management
- Target connection configured
- Appropriate authorizations

### Creation Steps

1. **Open Modeler** in SAP Data Intelligence
2. **Navigate** to Replication Flows
3. **Create new** replication flow
4. **Configure source**:
   - Select source connection
   - Choose connection type (ABAP, database, etc.)
5. **Configure target**:
   - Select target type (database, cloud storage, Kafka)
   - Set target-specific options
6. **Add tasks** (see Task Configuration)
7. **Validate** the flow
8. **Deploy** to tenant repository
9. **Run** the flow

---

## Supported Sources

### ABAP Systems

- SAP S/4HANA (Cloud and On-Premise)
- SAP ECC via SLT
- SAP BW/4HANA
- CDS views enabled for extraction

**Source Configuration:**

```
Connection Type: ABAP
Extraction Type: CDS / ODP / Table
Package Size: 50000
```

### Databases

- SAP HANA
- Azure SQL Database (delta requires schema = username)
- Other SQL databases via connectors

---

## Supported Targets

### Database Targets

**SAP HANA Cloud:**

- Write modes: INSERT, UPSERT, DELETE
- Exactly-once delivery with UPSERT
- Batch size configuration

### Cloud Storage Targets

| Target | Description |
|--------|-------------|
| Amazon S3 | AWS object storage |
| Azure Data Lake Storage Gen2 | Microsoft cloud storage |
| Google Cloud Storage | GCP object storage |
| SAP HANA Data Lake | SAP cloud data lake |

**Cloud Storage Options:**

- Group Delta By: None, Date, Hour
- File Type: CSV, Parquet, JSON, JSONLines
- Suppress Duplicates: Minimize duplicate records

**Container Name Limit:** 64 characters maximum

### Kafka Target

- Each dataset maps to a Kafka topic
- Topic names editable (need not match source)
- No container name limit

---

## Task Configuration

Tasks define what data to replicate and how.

### Task Components

```
Task:
  - Source dataset (table, view, etc.)
  - Target specification
  - Filter conditions
  - Column mappings
  - Load type (Initial/Delta)
```
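
To make the components concrete, here is a minimal sketch of a task definition expressed as a Python dictionary. It is purely illustrative: the field names (`source_dataset`, `load_type`, and so on) are assumptions for this example, not the format the Modeler actually stores.

```python
# Hypothetical task definition; field names are illustrative only.
task = {
    "source_dataset": "SAP_SALES_ORDERS",          # table, view, or CDS view
    "target": {"type": "hana_cloud", "table": "SALES_ORDERS"},
    "filters": ["Region eq 'EMEA'"],               # optional source filters
    "mappings": {"SalesOrder": "SALES_ORDER_ID"},  # optional column mappings
    "load_type": "initial_and_delta",              # initial / delta / initial + delta
}

print(task["load_type"])
```
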

### Load Types

| Type | Description |
|------|-------------|
| Initial Load | Full data extraction |
| Delta Load | Changed data only (CDC) |
| Initial + Delta | Full load followed by continuous delta |

### Creating Tasks

1. Click "Add Task"
2. Select source object
3. Configure target (table name, topic, etc.)
4. Set filters (optional)
5. Define mappings (optional)
6. Choose load type

---

## Filters and Mappings

### Source Filters

Reduce data volume with filter conditions:

```
Filter Examples:
- CreationDate ge datetime'2024-01-01T00:00:00'
- Region eq 'EMEA'
- Status in ('ACTIVE', 'PENDING')
```
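
To illustrate the operator semantics (`ge` = greater than or equal, `eq` = equals, `in` = membership), here is a small, self-contained Python sketch that applies equivalent conditions to sample records. It only demonstrates which rows such filters would keep; in a replication flow the filtering is pushed down to the source system.

```python
from datetime import datetime

# Sample records standing in for source rows (illustrative data only).
rows = [
    {"CreationDate": datetime(2024, 3, 1), "Region": "EMEA", "Status": "ACTIVE"},
    {"CreationDate": datetime(2023, 6, 1), "Region": "EMEA", "Status": "ACTIVE"},
    {"CreationDate": datetime(2024, 5, 1), "Region": "APJ",  "Status": "PENDING"},
]

def keep(row):
    # CreationDate ge datetime'2024-01-01T00:00:00'
    # Region eq 'EMEA'
    # Status in ('ACTIVE', 'PENDING')
    return (
        row["CreationDate"] >= datetime(2024, 1, 1)
        and row["Region"] == "EMEA"
        and row["Status"] in ("ACTIVE", "PENDING")
    )

print([r for r in rows if keep(r)])  # only the first record passes all three filters
```
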

### Column Mappings

**Auto-mapping:** System matches source to target columns automatically

**Custom Mapping:** Define specific source-to-target column relationships

```
Custom Mapping Example:
Source Column -> Target Column
SalesOrder    -> SALES_ORDER_ID
SoldToParty   -> CUSTOMER_ID
NetAmount     -> AMOUNT
```
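
As a hedged illustration of what such a mapping does to a record, the following Python sketch renames source fields to the target column names. The dictionary mirrors the example above and is not tied to any Data Intelligence API.

```python
# Source-to-target column mapping, mirroring the example above.
COLUMN_MAP = {
    "SalesOrder": "SALES_ORDER_ID",
    "SoldToParty": "CUSTOMER_ID",
    "NetAmount": "AMOUNT",
}

def apply_mapping(record: dict) -> dict:
    """Rename mapped columns; pass unmapped columns through unchanged."""
    return {COLUMN_MAP.get(col, col): value for col, value in record.items()}

print(apply_mapping({"SalesOrder": "4711", "SoldToParty": "C-100", "NetAmount": 250.0}))
# {'SALES_ORDER_ID': '4711', 'CUSTOMER_ID': 'C-100', 'AMOUNT': 250.0}
```
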

### Data Type Compatibility

Ensure source and target data types are compatible. See `references/abap-integration.md` for ABAP type mappings.

---

## Delivery Guarantees

### Default: At-Least-Once

At-least-once delivery may result in duplicate records during:

- Recovery from failures
- Network issues
- System restarts

### Exactly-Once with Database Targets

When using UPSERT to database targets (e.g., SAP HANA Cloud):

- System eliminates duplicates automatically
- Achieved through key-based merge operations
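
The sketch below shows why key-based UPSERT turns at-least-once delivery into an exactly-once result: replaying the same change records against a key-addressed store leaves the final state unchanged. It is a plain-Python simulation of the merge behaviour, not code that talks to SAP HANA Cloud.

```python
# Target table simulated as a dict keyed by primary key.
target = {}

def upsert(table: dict, record: dict, key: str = "SALES_ORDER_ID") -> None:
    """Insert the record, or overwrite the existing row with the same key."""
    table[record[key]] = record

changes = [
    {"SALES_ORDER_ID": "4711", "AMOUNT": 250.0},
    {"SALES_ORDER_ID": "4712", "AMOUNT": 99.0},
]

# First delivery...
for rec in changes:
    upsert(target, rec)

# ...and a redelivery after a simulated recovery: the final state does not change.
for rec in changes:
    upsert(target, rec)

assert len(target) == 2
print(target)
```
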

### Suppress Duplicates (Cloud Storage)

For non-database targets:

- Enable "Suppress Duplicates" during initial load
- Minimizes but may not eliminate all duplicates

---

## Cloud Storage Target Structure

### Directory Hierarchy

```
/<container-base-path>/
    .sap.rms.container                              # Container metadata
    <tableName>/
        .sap.partfile.metadata                      # Dataset metadata
        initial/
            .sap.partfile.metadata
            part-<timestamp>-<workOrderID>-<no>.<ext>
            _SUCCESS                                # Load completion marker
        delta/
            <date(time)-optional>/
                .sap.partfile.metadata
                part-<timestamp>-<workOrderID>-<no>.<ext>
```

### File Formats

| Format | Options |
|--------|---------|
| CSV | Delimiter, header row, encoding |
| Parquet | Compression (SNAPPY, GZIP), compatibility mode |
| JSON | Standard JSON format |
| JSONLines | One JSON object per line |

### Appended Columns

System automatically adds metadata columns:

| Column | Description |
|--------|-------------|
| `__operation_type` | L=Load, I=Insert, U=Update, B=Before, X=Delete, M=Archive |
| `__sequence_number` | Delta row ordering |
| `__timestamp` | UTC write timestamp |
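
Here is a minimal sketch, assuming Parquet part files and pandas/pyarrow on the consumer side, of how downstream code might fold the initial load and the delta files into a current snapshot keyed on the primary key. The base path and the key column name (`SALES_ORDER_ID`) follow the structure above but are otherwise assumptions.

```python
from pathlib import Path

import pandas as pd  # requires pandas plus a Parquet engine such as pyarrow

base = Path("/data/my-container/SALES_ORDERS")  # assumed <container>/<tableName> path
key = "SALES_ORDER_ID"                          # assumed primary key column

state = {}  # current snapshot, keyed by primary key

def apply(df: pd.DataFrame, ordered: bool = False) -> None:
    """Fold rows into the snapshot; delta rows are replayed in sequence order."""
    if ordered:
        df = df.sort_values("__sequence_number")
    for row in df.to_dict("records"):
        op = row["__operation_type"]
        if op in ("L", "I", "U"):        # load / insert / update -> upsert
            state[row[key]] = row
        elif op in ("X", "M"):           # delete / archive -> remove
            state.pop(row[key], None)
        # "B" (before image) rows carry no new state and are skipped

# Part-file names start with a timestamp, so sorting by name approximates load order.
for part in sorted(base.glob("initial/part-*.parquet")):
    apply(pd.read_parquet(part))
for part in sorted(base.glob("delta/**/part-*.parquet")):  # handles optional date/hour grouping
    apply(pd.read_parquet(part), ordered=True)

print(f"{len(state)} current rows")
```
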

### Success Marker

The `_SUCCESS` file indicates:

- Initial load completion
- Safe for downstream processing
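
A small sketch of the typical gating pattern: downstream jobs poll for the `_SUCCESS` marker before touching the `initial/` directory. The local path, timeout, and polling interval are placeholders.

```python
import time
from pathlib import Path

initial_dir = Path("/data/my-container/SALES_ORDERS/initial")  # placeholder path

def wait_for_success(directory: Path, timeout_s: int = 3600, poll_s: int = 30) -> None:
    """Block until the _SUCCESS marker appears, or raise after timeout_s seconds."""
    deadline = time.monotonic() + timeout_s
    while not (directory / "_SUCCESS").exists():
        if time.monotonic() > deadline:
            raise TimeoutError(f"Initial load not complete in {directory}")
        time.sleep(poll_s)

wait_for_success(initial_dir)
print("Initial load complete; safe to process part files.")
```
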

---

## Kafka as Target

### Topic Configuration

- One topic per source dataset
- Topic name defaults to dataset name (editable)
- Configure partitions and replication factor

### Serialization

| Format | Description |
|--------|-------------|
| AVRO | Schema in message; column names: alphanumeric + underscore only |
| JSON | No schema; flexible structure |

**Note:** Schema registries are not supported.
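
Because AVRO restricts field names, a quick pre-flight check on column names can catch problems before deployment. The sketch below applies the standard Avro naming rule (letters, digits, underscore, not starting with a digit); treat it as an approximation of the restriction described above.

```python
import re

# Avro field names: start with a letter or underscore, then letters/digits/underscores.
AVRO_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")

def invalid_avro_columns(columns):
    """Return the column names that would not be valid Avro field names."""
    return [c for c in columns if not AVRO_NAME.match(c)]

print(invalid_avro_columns(["SALES_ORDER_ID", "Net-Amount", "2nd_item", "/BIC/ZFIELD"]))
# ['Net-Amount', '2nd_item', '/BIC/ZFIELD']
```
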

### Message Structure

- Each source record = one Kafka message (not batched)
- Message key = concatenated primary key values (underscore separated)

### Message Headers

| Header | Values |
|--------|--------|
| `kafkaSerializationType` | AVRO or JSON |
| `opType` | L=Load, I=Insert, U=Update, B=Before, X=Delete, M=Archive |
| `Seq` | Sequential integer (delta order); empty for initial load |
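
As a minimal consumer-side sketch (using the third-party `kafka-python` package; the broker address and topic name are placeholders), this shows how the message key and the headers listed above arrive on each record.

```python
from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "SALES_ORDERS",                    # placeholder topic name
    bootstrap_servers=["kafka:9092"],  # placeholder broker address
    auto_offset_reset="earliest",
)

for msg in consumer:
    # Headers arrive as (name, bytes) pairs; values may be None.
    headers = {k: (v.decode() if v else "") for k, v in msg.headers}
    print(
        msg.key,                               # concatenated primary key values
        headers.get("opType"),                 # L/I/U/B/X/M
        headers.get("Seq"),                    # empty for initial load
        headers.get("kafkaSerializationType"), # AVRO or JSON
    )
```
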

### Compression Options

- None
- GZIP
- Snappy
- LZ4
- Zstandard

### Network Configuration

For Kafka behind SAP Cloud Connector (SCC):

- Broker addresses must match virtual hosts in SCC
- Use identical virtual and internal host values when possible

---

## Monitoring and Management

### Monitoring Tools

**SAP Data Intelligence Monitoring:**

- View replication flow status
- Track task execution
- Monitor data volumes
- View error logs

### Flow Status

| Status | Description |
|--------|-------------|
| Deployed | Ready to run |
| Running | Active execution |
| Completed | Successfully finished |
| Failed | Error occurred |
| Stopped | Manually stopped |

### Management Operations

| Operation | Description |
|-----------|-------------|
| Edit | Modify existing flow |
| Undeploy | Remove from runtime |
| Delete | Remove flow definition |
| Clean Up | Remove source artifacts |

### Clean Up Source Artifacts

After completing replication:

1. Navigate to the deployed flow
2. Select "Clean Up"
3. Confirm; this removes delta pointers and temporary data from the source

---

## Best Practices

### Planning

1. **Assess Data Volume**: Plan for initial load duration
2. **Choose Delivery Mode**: Understand exactly-once requirements
3. **Design Target Schema**: Match source structure appropriately
4. **Plan Delta Strategy**: Determine grouping (none/date/hour)

### Performance

1. **Use Filters**: Reduce data volume at source
2. **Optimize Package Size**: Balance memory vs. round-trips
3. **Monitor Progress**: Track initial and delta loads
4. **Schedule Appropriately**: Avoid peak system times

### Reliability

1. **Enable Monitoring**: Track all flows actively
2. **Handle Duplicates**: Design for at-least-once semantics
3. **Validate Before Deploy**: Check all configurations
4. **Test with Sample Data**: Verify mappings and transformations

---

## Documentation Links

- **Replicating Data**: [https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/tree/main/docs/modelingguide/replicating-data](https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/tree/main/docs/modelingguide/replicating-data)
- **Create Replication Flow**: [https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/blob/main/docs/modelingguide/replicating-data/create-a-replication-flow-a425e34.md](https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/blob/main/docs/modelingguide/replicating-data/create-a-replication-flow-a425e34.md)
- **Cloud Storage Structure**: [https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/blob/main/docs/modelingguide/replicating-data/cloud-storage-target-structure-12e0f97.md](https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/blob/main/docs/modelingguide/replicating-data/cloud-storage-target-structure-12e0f97.md)
- **Kafka as Target**: [https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/blob/main/docs/modelingguide/replicating-data/kafka-as-target-b9b819c.md](https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/blob/main/docs/modelingguide/replicating-data/kafka-as-target-b9b819c.md)

---

**Last Updated**: 2025-11-22