Data Acquisition and Preparation Reference
Table of Contents
- Data Flows
- Replication Flows
- Transformation Flows
- Local Tables
- Remote Tables
- Task Chains
- Python Operators
- Data Transformation
- Semantic Onboarding
- File Spaces and Object Store
Data Flows
Data flows provide ETL capabilities for data transformation and loading.
Prerequisites
Required Privileges:
- Data Warehouse General (-R------) - SAP Datasphere access
- Connection (-R------) - Read connections
- Data Warehouse Data Builder (CRUD----) - Create/edit/delete flows
- Space Files (CRUD----) - Manage space objects
- Data Warehouse Data Integration (-RU-----) - Run flows
- Data Warehouse Data Integration (-R--E---) - Schedule flows
Creating a Data Flow
- Navigate to Data Builder
- Select "New Data Flow"
- Add source operators
- Add transformation operators
- Add target operator
- Save and deploy
Key Limitations
- No delta processing: Use replication flows for delta/CDC data instead
- Single target table only per data flow
- Local tables only: Data flows load exclusively to local tables in the repository
- Double quotes unsupported in identifiers (column/table names)
- Spatial data types not supported
- ABAP source preview unavailable (except CDS views and LTR objects)
- Transformation operators cannot be previewed
Advanced Properties
Dynamic Memory Allocation:
| Setting | Memory Range | Use Case |
|---|---|---|
| Small | 1-2 GB | Low volume |
| Medium | 2-3 GB | Standard volume |
| Large | 3-5 GB | High volume |
Additional Options:
- Automatic restart on failure
- Input parameters support
Data Flow Operators
Source Operators:
- Remote tables
- Local tables
- Views
- CSV files
Transformation Operators:
| Operator | Purpose | Configuration |
|---|---|---|
| Join | Combine sources | Join type, conditions |
| Union | Stack sources | Column mapping |
| Projection | Select columns | Include/exclude, rename |
| Filter | Row filtering | Filter conditions |
| Aggregation | Group and aggregate | Group by, aggregates |
| Script | Custom Python | Python code |
| Calculated Column | Derived values | Expression |
Target Operators:
- Local table (new or existing)
- Write mode: truncate-and-insert or delta merge
Join Operations
Join Types:
- Inner Join: Matching rows only
- Left Outer: All left + matching right
- Right Outer: All right + matching left
- Full Outer: All rows from both
- Cross Join: Cartesian product
Join Conditions:
source1.column = source2.column
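As an illustrative aside (not part of the product UI), the same join semantics can be reproduced in pandas; the frames and column names below are hypothetical:

```python
import pandas as pd

left = pd.DataFrame({"id": [1, 2, 3], "region": ["US", "EU", "APAC"]})
right = pd.DataFrame({"id": [2, 3, 4], "amount": [100, 250, 75]})

inner = left.merge(right, on="id", how="inner")        # matching rows only
left_outer = left.merge(right, on="id", how="left")    # all left + matching right
right_outer = left.merge(right, on="id", how="right")  # all right + matching left
full_outer = left.merge(right, on="id", how="outer")   # all rows from both
cross = left.merge(right, how="cross")                 # Cartesian product
```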
Aggregation Operations
Aggregate Functions:
- SUM, AVG, MIN, MAX
- COUNT, COUNT DISTINCT
- FIRST, LAST
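A short pandas sketch of how these aggregates behave over grouped data (the sample data is invented):

```python
import pandas as pd

orders = pd.DataFrame({
    "region": ["US", "US", "EU"],
    "amount": [100.0, 200.0, 50.0],
    "customer": ["C1", "C2", "C1"],
})

summary = orders.groupby("region").agg(
    total=("amount", "sum"),           # SUM
    average=("amount", "mean"),        # AVG
    smallest=("amount", "min"),        # MIN
    orders=("amount", "count"),        # COUNT
    customers=("customer", "nunique"), # COUNT DISTINCT
)
```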
Calculated Columns
Expression Syntax:
CASE WHEN column1 > 100 THEN 'High' ELSE 'Low' END
CONCAT(first_name, ' ', last_name)
ROUND(amount * exchange_rate, 2)
Input Parameters
Define runtime parameters for dynamic filtering:
Parameter Types:
- String
- Integer
- Date
- Timestamp
Usage in Expressions:
WHERE region = :IP_REGION
Running Data Flows
Execution Options:
- Manual run from Data Builder
- Scheduled via task chain
- API trigger
Run Modes:
- Full: Process all data
- Delta: Process changes only (requires delta capture)
Replication Flows
Replicate data from source systems to SAP Datasphere or external targets.
Creating a Replication Flow
- Navigate to Data Builder
- Select "New Replication Flow"
- Add source connection and objects
- Add target connection
- Configure load type and mappings
- Save and deploy
Source Systems
SAP Sources:
- SAP S/4HANA Cloud (ODP, CDS views)
- SAP S/4HANA On-Premise (ODP, SLT, CDS)
- SAP BW/4HANA
- SAP ECC
- SAP HANA
Cloud Storage Sources:
- Amazon S3
- Azure Blob Storage
- Google Cloud Storage
- SFTP
Streaming Sources:
- Apache Kafka
- Confluent Kafka
Target Systems
SAP Datasphere Targets:
- Local tables (managed by replication flow)
External Targets:
- Apache Kafka
- Confluent Kafka
- Google BigQuery
- Amazon S3
- Azure Blob Storage
- Google Cloud Storage
- SFTP
- SAP Signavio
Load Types
| Load Type | Description | Use Case |
|---|---|---|
| Initial Only | One-time full load | Static data |
| Initial + Delta | Full load then changes | Standard replication |
| Real-Time | Continuous streaming | Live data |
Configuration Options
Flow-Level Properties:
| Property | Description | Default |
|---|---|---|
| Delta Load Frequency | Interval for delta changes | Configurable |
| Skip Unmapped Target Columns | Ignore unmapped columns | Optional |
| Merge Data Automatically | Auto-merge for file space targets | Requires consent |
| Source Thread Limit | Parallel threads for source (1-160) | 16 |
| Target Thread Limit | Parallel threads for target (1-160) | 16 |
| Content Type | Template or Native format | Template |
Object-Level Properties:
| Property | Description |
|---|---|
| Load Type | Initial Only, Initial+Delta, Delta Only |
| Delta Capture | Enable CDC tracking |
| ABAP Exit | Custom projection logic |
| Object Thread Count | Thread count for delta operations |
| Delete Before Load | Clear target before loading |
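To make the relationship between these settings concrete, here is a minimal sketch that mirrors them as a plain Python dict with a range check for the documented thread limits; the key names are hypothetical and not an SAP Datasphere API:

```python
# Hypothetical representation of replication flow settings (illustrative only).
flow_settings = {
    "delta_load_frequency": "60min",   # interval for delta changes
    "source_thread_limit": 16,         # 1-160, default 16
    "target_thread_limit": 16,         # 1-160, default 16
    "content_type": "Template",        # "Template" or "Native"
}

for key in ("source_thread_limit", "target_thread_limit"):
    if not 1 <= flow_settings[key] <= 160:
        raise ValueError(f"{key} must be between 1 and 160")
```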
Critical Constraints
- No input parameters: Replication flows do not support input parameters
- Thread limits read-only at design time: Editable only after deployment
- Content Type applies globally: Selection affects all replication objects in the flow
- ABAP systems: Consult SAP Note 3297105 before creating replication flows
Content Type (ABAP Sources)
| Type | Date/Timestamp Handling | Use Case |
|---|---|---|
| Template Type | Applies ISO format requirements | Standard integration |
| Native Type | Dates → strings, timestamps → decimals | Custom formatting |
Filters:
- Define row-level filters on source
- Multiple filter conditions with AND/OR
- Important: For ODP-CDS, filters must apply to primary key fields only
Mappings:
- Automatic column mapping
- Manual mapping overrides
- Exclude columns
Projections:
- Custom SQL expressions
- Column transformations
- Calculated columns
- ABAP Exit for custom projection logic
Sizing and Performance
Thread Configuration:
- Source/Target Thread Limits: 1-160 (default: 16)
- Higher values = more parallelism but more resources
- Consider source system capacity
Capacity Planning:
- Estimate data volume per table
- Consider network bandwidth
- Plan for parallel execution
- RFC fast serialization (SAP Note 3486245) for improved performance
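A back-of-the-envelope sketch for the volume/bandwidth estimate; all figures are invented assumptions, not SAP-published numbers:

```python
# Rough initial-load estimate from table volume and effective bandwidth.
table_size_gb = 500        # assumed volume for one table
bandwidth_mb_s = 100       # assumed effective network throughput

transfer_seconds = table_size_gb * 1024 / bandwidth_mb_s
print(f"Estimated initial load: {transfer_seconds / 3600:.1f} hours")  # ~1.4 hours
```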
Load Balancing:
- Distribute across multiple flows
- Schedule during off-peak hours
- Monitor resource consumption
Unsupported Data Types and Objects
- BLOB, CLOB (large objects)
- Spatial data types
- Custom ABAP types
- Virtual Tables (SAP HANA Smart Data Access)
- Row Tables (use COLUMN TABLE only)
Transformation Flows
Delta-aware transformations with automatic change propagation.
Creating a Transformation Flow
- Navigate to Data Builder
- Select "New Transformation Flow"
- Add source (view or graphical view)
- Add target table
- Configure run settings
- Save and deploy
Key Constraints and Limitations
Data Access Restrictions: Views and Open SQL schema objects cannot be used if they:
- Reference remote tables (except BW Bridge)
- Consume views with data access controls
- Have data access controls applied to them
Loading Constraints:
- Loading delta changes from views is not supported
- Only loads data to local SAP Datasphere repository tables
- Remote tables in BW Bridge spaces must be shared with the SAP Datasphere space
Runtime Options
| Runtime | Storage Target | Use Case |
|---|---|---|
| HANA | SAP HANA Database storage | Standard transformations |
| SPARK | SAP HANA Data Lake Files storage | Large-scale file processing |
Load Types
| Load Type | Description | Requirements |
|---|---|---|
| Initial Only | Full dataset load | None |
| Initial and Delta | Full load then changes | Delta capture enabled on source and target tables |
Input Parameter Constraints
- Cannot be created/edited in Graphical View Editor
- Scheduled flows use default values
- Not supported in Python operations (Spark runtime)
- Cannot be set through task chain input parameters
Source Options
- Graphical view (created inline)
- SQL view (created inline)
- Existing views
Target Table Management
Options:
- Create new local table
- Use existing local table
Column Handling:
- Add new columns automatically
- Map columns manually
- Exclude columns
Run Modes
| Mode | Action | Use Case |
|---|---|---|
| Start | Process delta changes | Regular runs |
| Delete | Remove target records | Cleanup |
| Truncate | Clear and reload | Full refresh |
Delta Processing
Transformation flows track changes automatically:
- Insert: New records
- Update: Modified records
- Delete: Removed records
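A minimal pandas sketch of how insert/update/delete change records could be applied to a target table; the column names and change codes are illustrative assumptions:

```python
import pandas as pd

target = pd.DataFrame({"id": [1, 2], "value": [10, 20]}).set_index("id")
changes = pd.DataFrame({
    "id": [2, 3, 1],
    "value": [25, 30, None],
    "change_type": ["U", "I", "D"],  # update, insert, delete
}).set_index("id")

# Apply deletes first, then upsert inserts/updates over the remainder.
target = target.drop(changes[changes["change_type"] == "D"].index, errors="ignore")
upserts = changes.loc[changes["change_type"].isin(["I", "U"]), ["value"]]
target = upserts.combine_first(target)
```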
File Space Transformations
Transform data in object store (file spaces):
Supported Functions:
- String functions
- Numeric functions
- Date functions
- Conversion functions
Local Tables
Store data directly in SAP Datasphere.
Creating Local Tables
Methods:
- Data Builder > New Table
- Import from CSV
- Create from data flow target
- Create from replication flow target
Storage Options
| Storage | Target System | Use Case |
|---|---|---|
| Disk | SAP HANA Cloud, SAP HANA database | Standard persistent storage |
| In-Memory | SAP HANA Cloud, SAP HANA database | High-performance hot data |
| File | SAP HANA Cloud data lake storage | Large-scale cost-effective storage |
Table Properties
Key Columns:
- Primary key definition
- Unique constraints
Data Types:
- String (VARCHAR)
- Integer (INT, BIGINT)
- Decimal (DECIMAL)
- Date, Time, Timestamp
- Boolean
- Binary
Partitioning
Partition Types:
- Range partitioning (date/numeric)
- Hash partitioning
Benefits:
- Improved query performance
- Parallel processing
- Selective data loading
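For range partitioning on a date column, the boundaries are simple to enumerate; a small illustrative sketch:

```python
from datetime import date

# Yearly range-partition boundaries for an assumed order_date column.
first_year, last_year = 2022, 2025
boundaries = [date(y, 1, 1) for y in range(first_year, last_year + 1)]
# [date(2022, 1, 1), date(2023, 1, 1), date(2024, 1, 1), date(2025, 1, 1)]
```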
Delta Capture
Enable change tracking for incremental processing:
- Enable delta capture on table
- Track insert/update/delete operations
- Query changes with delta tokens
Important Constraint: Once delta capture is enabled and deployed, it cannot be modified or disabled.
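A sketch of incremental consumption from a delta-capture-enabled table, filtering on a change timestamp; the column names follow the change-tracking pattern described above but should be treated as assumptions:

```python
import pandas as pd

last_token = pd.Timestamp("2024-01-01")  # watermark from the previous run
rows = pd.DataFrame({
    "id": [1, 2, 3],
    "Change_Type": ["I", "U", "D"],
    "Change_Date": pd.to_datetime(["2024-01-02", "2024-01-03", "2023-12-30"]),
})

# Only changes newer than the stored watermark are processed.
delta = rows[rows["Change_Date"] > last_token]
```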
Allow Data Transport
Available for dimensions on SAP Business Data Cloud formation tenants:
- Enables data inclusion during repository package transport
- Limited to initializing data during the initial import
- Applies only to: Dimensions, text entities, or relational datasets
Data Maintenance
Operations:
- Insert records
- Update records
- Delete records
- Truncate table
- Load from file
Local Table (File)
Store data in object store:
Supported Formats:
- Parquet
- CSV
- JSON
Use Cases:
- Large datasets
- Cost-effective storage
- Integration with data lakes
Remote Tables
Virtual access to external data without copying.
Importing Remote Tables
- Select connection in source browser
- Choose tables/views to import
- Configure import settings
- Deploy remote table
Data Access Modes
| Mode | Description | Performance |
|---|---|---|
| Remote | Query source directly | Network dependent |
| Replicated | Copy to local storage | Fast queries |
Replication Options
Full Replication:
- Copy all data
- Scheduled refresh
Real-Time Replication:
- Continuous change capture
- Near real-time updates
Partitioned Replication:
- Divide data into partitions
- Parallel loading
Remote Table Properties
Statistics:
- Create statistics for query optimization
- Update statistics periodically
Filters:
- Define partitioning filters
- Limit data volume
Task Chains
Orchestrate multiple data integration tasks.
Creating Task Chains
- Navigate to Data Builder
- Select "New Task Chain"
- Add task nodes
- Configure dependencies
- Save and deploy
Supported Task Types
Repository Objects:
| Task Type | Activity | Description |
|---|---|---|
| Remote Tables | Replicate | Replicate remote table data |
| Views | Persist | Persist view data to storage |
| Intelligent Lookups | Run | Execute intelligent lookup |
| Data Flows | Run | Execute data flow |
| Replication Flows | Run | Run with load type Initial Only |
| Transformation Flows | Run | Execute transformation flow |
| Local Tables | Delete Records | Delete records with Change Type "Deleted" |
| Local Tables (File) | Merge | Merge delta files |
| Local Tables (File) | Optimize | Compact files |
| Local Tables (File) | Delete Records | Remove data |
Non-Repository Objects:
| Task Type | Description |
|---|---|
| Open SQL Procedure | Execute SAP HANA schema procedures |
| BW Bridge Process Chain | Run SAP BW Bridge processes |
Toolbar-Only Objects:
| Task Type | Description |
|---|---|
| API Task | Call external REST APIs |
| Notification Task | Send email notifications |
Nested Objects:
| Task Type | Description |
|---|---|
| Task Chain | Reference locally-created or shared task chains |
Object Prerequisites
- All objects must be deployed before adding to task chains
- SAP HANA Open SQL schema procedures require EXECUTE privileges granted to space users
- Views cannot have data access controls assigned
- Data flows with input parameters use default values during task chain execution
- Views to be persisted may have at most one input parameter, and it must have a default value
Execution Control
Sequential Execution:
- Tasks run one after another
- The next task runs only when the previous task finishes with status Completed
- Failure stops chain execution
Parallel Execution:
- Multiple branches run simultaneously
- Completion condition options:
- ANY: Succeeds when any parallel task completes
- ALL: Succeeds only when all parallel tasks complete
- Synchronization at join points
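The ANY/ALL semantics resemble Python's futures API; a runnable analogy (illustrative only, task chains themselves are configured in the Data Builder):

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def run_branch(name):
    return f"{name} done"

with ThreadPoolExecutor() as pool:
    branches = [pool.submit(run_branch, n) for n in ("branch_a", "branch_b")]
    # ANY: proceed as soon as one branch completes
    done, pending = wait(branches, return_when=FIRST_COMPLETED)
    # ALL: wait(branches) uses ALL_COMPLETED by default
```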
Layout Options:
- Top-Bottom orientation
- Left-Right orientation
- Drag tasks to reorder
Apache Spark Settings:
- Override default Apache Spark Application Settings per task
- Configure memory and executor settings
Input Parameters
Pass parameters to task chain tasks:
Parameter Definition:
name: region
type: string
default: "US"
Parameter Usage:
- Pass to data flows
- Use in filters
- Dynamic configuration
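A minimal sketch of how a defined parameter might flow into a filter at run time; the substitution mechanism is plain Python, not the product API:

```python
# Matches the definition above: name "region", type string, default "US".
params = {"region": "US"}
filter_expr = f"region = '{params['region']}'"  # -> "region = 'US'"
```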
Scheduling
Simple Schedule:
- Daily, weekly, monthly
- Specific time
Cron Expression:
0 0 6 * * ? # Daily at 6 AM
0 0 */4 * * ? # Every 4 hours
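Assuming the six-field layout shown above (seconds first, Quartz-style), a tiny sketch that labels the fields:

```python
fields = ["seconds", "minutes", "hours", "day_of_month", "month", "day_of_week"]
expr = "0 0 6 * * ?"  # daily at 6 AM
print(dict(zip(fields, expr.split())))
# {'seconds': '0', 'minutes': '0', 'hours': '6', 'day_of_month': '*', ...}
```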
Important Scheduling Constraint: If a task chain schedules remote tables that use Replicated (Real-Time) data access, the replication type is converted to batch replication at the next scheduled run, which ends real-time updates.
Email Notifications
Configure notifications for:
- Success
- Failure
- Warning
Recipient Options:
- Tenant users
- External email addresses
Note: Recipients can be searched and selected only after the task chain is deployed.
Export Constraint: CSN/JSON export does not include notification recipients
Python Operators
Custom data processing with Python.
Creating Python Operators
- Add Script operator to data flow
- Define input/output ports
- Write Python code
- Configure execution
Python Script Structure
def transform(data):
    """
    Transform input data.

    Args:
        data: pandas DataFrame

    Returns:
        pandas DataFrame
    """
    # Work on a copy so the input DataFrame is not mutated
    result = data.copy()
    # Derive a new column from an existing one (example logic)
    result['new_column'] = result['existing'].apply(
        lambda v: v.upper() if isinstance(v, str) else v
    )
    return result
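A quick local test of the operator logic with invented sample data (at run time the data flow engine supplies the input DataFrame):

```python
import pandas as pd

sample = pd.DataFrame({"existing": ["alpha", "beta", None]})
print(transform(sample))
#   existing new_column
# 0    alpha      ALPHA
# 1     beta       BETA
# 2     None       None
```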
Available Libraries
- pandas
- numpy
- scipy
- scikit-learn
- datetime
Best Practices
- Keep transformations simple
- Handle null values explicitly
- Log errors appropriately
- Test with sample data
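A short sketch applying these practices together; the column names and conversion rate are illustrative assumptions:

```python
import logging

import pandas as pd

logger = logging.getLogger(__name__)

def transform(data):
    """Null-safe transformation with explicit error logging (illustrative)."""
    result = data.copy()
    if "amount" not in result.columns:
        logger.error("Input is missing the 'amount' column")
        raise ValueError("missing column: amount")
    # Handle nulls explicitly rather than letting them propagate
    result["amount"] = result["amount"].fillna(0)
    result["amount_eur"] = (result["amount"] * 0.92).round(2)  # assumed rate
    return result
```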
Data Transformation
Column-level transformations in graphical views.
Text Transformations
| Function | Description | Example |
|---|---|---|
| Change Case | Upper/lower/title | UPPER(name) |
| Concatenate | Join columns | CONCAT(first, last) |
| Extract | Substring | SUBSTRING(text, 1, 5) |
| Split | Divide by delimiter | SPLIT(full_name, ' ') |
| Find/Replace | Text substitution | REPLACE(text, 'old', 'new') |
Numeric Transformations
| Function | Description |
|---|---|
| ROUND | Round to precision |
| FLOOR | Round down |
| CEIL | Round up |
| ABS | Absolute value |
| MOD | Modulo operation |
Date Transformations
| Function | Description |
|---|---|
| YEAR | Extract year |
| MONTH | Extract month |
| DAY | Extract day |
| DATEDIFF | Date difference |
| ADD_DAYS | Add days to date |
Filter Operations
-- Numeric filter
amount > 1000
-- Text filter
region IN ('US', 'EU', 'APAC')
-- Date filter
order_date >= '2024-01-01'
-- Null handling
customer_name IS NOT NULL
Semantic Onboarding
Import objects with business semantics from SAP systems.
SAP S/4HANA Import
Import CDS views with annotations:
- Semantic types (currency, unit)
- Associations
- Hierarchies
- Text relationships
SAP BW/4HANA Import
Import BW objects:
- InfoObjects
- CompositeProviders
- Queries
- Analysis Authorizations
Import Process
- Select source connection
- Browse available objects
- Select objects to import
- Review semantic mapping
- Deploy imported objects
File Spaces and Object Store
Store and process data in object store.
Creating File Spaces
- System > Configuration > Spaces
- Create new file space
- Configure object store connection
- Set storage limits
Data Loading
Supported Formats:
- Parquet (recommended)
- CSV
- JSON
Loading Methods:
- Replication flows
- Transformation flows
- API upload
In-Memory Acceleration
Enable in-memory storage for faster queries:
- Select table/view
- Enable in-memory storage
- Configure refresh schedule
Premium Outbound Integration
Export data to external systems:
- Configure outbound connection
- Schedule exports
- Monitor transfer status
Documentation Links
- Data Flows: https://help.sap.com/docs/SAP_DATASPHERE/c8a54ee704e94e15926551293243fd1d/e30fd14
- Replication Flows: https://help.sap.com/docs/SAP_DATASPHERE/c8a54ee704e94e15926551293243fd1d/25e2bd7
- Transformation Flows: https://help.sap.com/docs/SAP_DATASPHERE/c8a54ee704e94e15926551293243fd1d/f7161e6
- Task Chains: https://help.sap.com/docs/SAP_DATASPHERE/c8a54ee704e94e15926551293243fd1d/d1afbc2
Last Updated: 2025-11-22