Initial commit

2025-11-30 08:55:25 +08:00
commit e23395aeb2
19 changed files with 6391 additions and 0 deletions
--- a/references/graphs-pipelines.md
+++ b/references/graphs-pipelines.md
@@ -0,0 +1,494 @@
+# Graphs and Pipelines Guide
+
+Complete guide for graph/pipeline development in SAP Data Intelligence.
+
+## Table of Contents
+
+1. [Overview](#overview)
+2. [Graph Concepts](#graph-concepts)
+3. [Creating Graphs](#creating-graphs)
+4. [Running Graphs](#running-graphs)
+5. [Monitoring](#monitoring)
+6. [Error Recovery](#error-recovery)
+7. [Scheduling](#scheduling)
+8. [Advanced Topics](#advanced-topics)
+9. [Best Practices](#best-practices)
+10. [Troubleshooting](#troubleshooting)
+11. [Documentation Links](#documentation-links)
+
+---
+
+## Overview
+
+Graphs (also called pipelines) are the core execution unit in SAP Data Intelligence.
+
+**Definition:**
+A graph is a network of operators connected via typed input/output ports for data transfer.
+
+**Key Features:**
+- Visual design in Modeler
+- Two generations (Gen1, Gen2)
+- Execution monitoring
+- Error recovery (Gen2)
+- Scheduling
+
+---
+
+## Graph Concepts
+
+### Operator Generations
+
+**Gen1 Operators:**
+- Legacy operator model
+- Process-based execution
+- Manual error handling
+- Broad compatibility
+
+**Gen2 Operators:**
+- Enhanced error recovery
+- State management with snapshots
+- Native multiplexing
+- Better performance
+
+**Critical Rule:** A graph cannot mix Gen1 and Gen2 operators; every operator in a graph must belong to the same generation.
+
+### Ports and Data Types
+
+**Port Types:**
+- Input ports: Receive data
+- Output ports: Send data
+- Typed connections
+
+**Common Data Types:**
+- string, int32, int64, float32, float64
+- blob (binary data)
+- message (structured data)
+- table (tabular data)
+- any (flexible type)
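+
+A minimal sketch of how a type-compatibility check between ports could work. This is not the Modeler's actual validation logic: the function name `can_connect` and the rule that `any` matches everything are assumptions made for illustration.
+
+```python
+# Port types listed above; "any" is treated as a wildcard (assumption).
+KNOWN_TYPES = {"string", "int32", "int64", "float32", "float64",
+               "blob", "message", "table", "any"}
+
+
+def can_connect(output_type: str, input_type: str) -> bool:
+    """Return True if an output port may feed an input port (simplified rule)."""
+    for port_type in (output_type, input_type):
+        if port_type not in KNOWN_TYPES:
+            raise ValueError(f"unknown port type: {port_type}")
+    # Assumption: identical types connect, and "any" on either side accepts all.
+    return output_type == input_type or "any" in (output_type, input_type)
+
+
+print(can_connect("string", "string"))
+print(can_connect("table", "message"))
+print(can_connect("int64", "any"))
+```
+
+A mismatch such as `table` to `message` is the case where a converter operator would be placed between the two ports.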
+
+### Graph Structure
+
+```
+[Source Operator] ─── port ──→ [Processing Operator] ─── port ──→ [Target Operator]
+     │                                │                                │
+  Output Port                  In/Out Ports                      Input Port
+```
+
+---
+
+## Creating Graphs
+
+### Using the Modeler
+
+1. **Create New Graph**
+   - Open Modeler
+   - Click "+" to create graph
+   - Select graph name and location
+
+2. **Add Operators**
+   - Browse operator repository
+   - Drag operators to canvas
+   - Or search by name
+
+3. **Connect Operators**
+   - Drag from output port to input port
+   - Verify type compatibility
+   - Configure connection properties
+
+4. **Configure Operators**
+   - Select operator
+   - Set parameters in Properties panel
+   - Configure ports if needed
+
+5. **Validate Graph**
+   - Click Validate button
+   - Review warnings and errors
+   - Fix issues before running
+
+### Graph-Level Configuration
+
+**Graph Data Types:**
+Create custom data types for the graph:
+
+```json
+{
+  "name": "CustomerRecord",
+  "properties": {
+    "id": "string",
+    "name": "string",
+    "amount": "float64"
+  }
+}
+```
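+
+To make the definition concrete, the following sketch checks a Python dictionary against the `CustomerRecord` properties. The `validate` helper and the mapping from type names to Python types are hypothetical and are not part of any SAP Data Intelligence API.
+
+```python
+import json
+
+# Mapping from the type names used in the definition to Python types (assumed).
+PYTHON_TYPES = {"string": str, "int32": int, "int64": int,
+                "float32": float, "float64": float}
+
+DEFINITION = json.loads("""
+{
+  "name": "CustomerRecord",
+  "properties": {"id": "string", "name": "string", "amount": "float64"}
+}
+""")
+
+
+def validate(record: dict, definition: dict) -> list:
+    """Return a list of problems; an empty list means the record conforms."""
+    problems = []
+    for field, type_name in definition["properties"].items():
+        if field not in record:
+            problems.append(f"missing field: {field}")
+        elif not isinstance(record[field], PYTHON_TYPES[type_name]):
+            problems.append(f"{field}: expected {type_name}")
+    return problems
+
+
+print(validate({"id": "C1", "name": "Ada", "amount": 12.5}, DEFINITION))
+print(validate({"id": "C1", "amount": "12.5"}, DEFINITION))
+```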
+
+**Graph Parameters:**
+Define runtime parameters:
+
+```
+Parameters:
+  - name: source_path
+    type: string
+    default: /data/input/
+  - name: batch_size
+    type: int32
+    default: 1000
+```
+
+### Groups and Tags
+
+**Groups:**
+- Organize operators visually
+- Share Docker configuration
+- Resource allocation
+
+**Tags:**
+- Label operators
+- Filter and search
+- Documentation
+
+---
+
+## Running Graphs
+
+### Execution Methods
+
+**Manual Execution:**
+1. Open graph in Modeler
+2. Click "Run" button
+3. Configure runtime parameters
+4. Monitor execution
+
+**Programmatic Execution:**
+Start graphs through the Pipeline API or from Data Workflow operators.
+
+### Execution Model
+
+**Process Model:**
+- Each operator runs as a process
+- Communication via ports
+- Coordinated by main engine
+
+**Gen2 Features:**
+- Snapshot checkpoints
+- State recovery
+- Exactly-once semantics (configurable)
+
+### Runtime Parameters
+
+Pass parameters at execution:
+
+```
+source_path = /data/2024/january/
+batch_size = 5000
+target_table = SALES_JAN_2024
+```
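+
+One way to picture how runtime values override declared defaults is the sketch below. It assumes that declared parameters are coerced to their declared type and that undeclared values such as `target_table` are passed through as strings; `resolve_parameters` is a hypothetical helper, not a product API.
+
+```python
+# Declared graph parameters: name -> (type, default), mirroring the earlier example.
+DECLARED = {
+    "source_path": (str, "/data/input/"),
+    "batch_size": (int, 1000),
+}
+
+
+def resolve_parameters(overrides: dict) -> dict:
+    """Start from the declared defaults, then apply runtime overrides."""
+    resolved = {name: default for name, (_, default) in DECLARED.items()}
+    for name, value in overrides.items():
+        # Undeclared parameters are kept as strings (assumption).
+        cast = DECLARED[name][0] if name in DECLARED else str
+        resolved[name] = cast(value)
+    return resolved
+
+
+print(resolve_parameters({"batch_size": "5000", "target_table": "SALES_JAN_2024"}))
+```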
+
+### Resource Configuration
+
+**Memory/CPU:**
+```json
+{
+  "resources": {
+    "requests": {
+      "memory": "1Gi",
+      "cpu": "500m"
+    },
+    "limits": {
+      "memory": "4Gi",
+      "cpu": "2000m"
+    }
+  }
+}
+```
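+
+The values above use Kubernetes-style quantity notation (`500m` is half a CPU core, `1Gi` is 2^30 bytes). The sketch below parses a small subset of that notation and checks that each request does not exceed its limit. The helper names are hypothetical, and only the `m`, `Ki`, `Mi`, and `Gi` suffixes are handled.
+
+```python
+# Binary suffixes used by Kubernetes-style memory quantities (subset).
+MEMORY_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}
+
+
+def parse_cpu(value: str) -> float:
+    """'500m' -> 0.5 cores, '2' -> 2.0 cores."""
+    return int(value[:-1]) / 1000 if value.endswith("m") else float(value)
+
+
+def parse_memory(value: str) -> int:
+    """'1Gi' -> bytes; plain integers are taken as bytes."""
+    for suffix, factor in MEMORY_UNITS.items():
+        if value.endswith(suffix):
+            return int(value[:-2]) * factor
+    return int(value)
+
+
+def check_resources(resources: dict) -> bool:
+    """True when every request is no larger than its limit."""
+    req, lim = resources["requests"], resources["limits"]
+    return (parse_cpu(req["cpu"]) <= parse_cpu(lim["cpu"])
+            and parse_memory(req["memory"]) <= parse_memory(lim["memory"]))
+
+
+config = {"requests": {"memory": "1Gi", "cpu": "500m"},
+          "limits": {"memory": "4Gi", "cpu": "2000m"}}
+print(check_resources(config))
+```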
+
+---
+
+## Monitoring
+
+### Graph Status
+
+| Status | Description |
+|--------|-------------|
+| Pending | Waiting to start |
+| Running | Actively executing |
+| Completed | Finished successfully |
+| Failed | Error occurred |
+| Dead | Terminated unexpectedly |
+| Stopping | Shutdown in progress |
+
+### Operator Status
+
+| Status | Description |
+|--------|-------------|
+| Initializing | Setting up |
+| Running | Processing data |
+| Stopped | Finished or stopped |
+| Failed | Error in operator |
+
+### Monitoring Dashboard
+
+**Available Metrics:**
+- Messages processed
+- Processing time
+- Memory usage
+- Error counts
+
+**Access:**
+1. Open running graph
+2. Click "Monitor" tab
+3. View real-time statistics
+
+### Diagnostic Information
+
+**Collect Diagnostics:**
+1. Select running/failed graph
+2. Click "Download Diagnostics"
+3. Review logs and state
+
+**Archive Contents:**
+- execution.json (execution details)
+- graphs.json (graph definition)
+- events.json (execution events)
+- Operator logs
+- State snapshots
+
+---
+
+## Error Recovery
+
+### Gen2 Error Recovery
+
+**Automatic Recovery:**
+1. Enable in graph settings
+2. Configure snapshot interval
+3. System recovers from last snapshot
+
+**Configuration:**
+```json
+{
+  "autoRecovery": {
+    "enabled": true,
+    "snapshotInterval": "60s",
+    "maxRetries": 3
+  }
+}
+```
+
+### Snapshots
+
+**What's Saved:**
+- Operator state
+- Message queues
+- Processing position
+
+**When Snapshots Occur:**
+- Periodic (configured interval)
+- On operator request
+- Before shutdown
+
+### Delivery Guarantees
+
+| Mode | Description |
+|------|-------------|
+| At-most-once | May lose messages |
+| At-least-once | May duplicate messages |
+| Exactly-once | No loss or duplication |
+
+**Gen2 Default:** At-least-once with recovery.
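+
+Because at-least-once delivery can replay messages after a recovery, downstream logic usually needs to be idempotent. The sketch below shows one common approach, deduplication by message ID. The `IdempotentSink` class is illustrative only, and it assumes each message carries a stable, unique ID.
+
+```python
+class IdempotentSink:
+    """Applies each message at most once, keyed by a caller-supplied ID."""
+
+    def __init__(self):
+        self.seen_ids = set()
+        self.total = 0.0
+
+    def handle(self, message_id: str, amount: float) -> bool:
+        """Return True if the message was applied, False if it was a duplicate."""
+        if message_id in self.seen_ids:
+            return False
+        self.seen_ids.add(message_id)
+        self.total += amount
+        return True
+
+
+sink = IdempotentSink()
+# "m2" is delivered twice, as can happen after a recovery replay.
+for message_id, amount in [("m1", 10.0), ("m2", 5.0), ("m2", 5.0), ("m3", 1.0)]:
+    sink.handle(message_id, amount)
+print(sink.total)
+```
+
+In a real graph the set of seen IDs would itself need to survive a restart, for example as part of the operator state saved in a snapshot.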
+
+### Manual Error Handling
+
+**In Script Operators:**
+```python
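+# Gen1-style callback: receives one api.Message from the "input" port.
+# The port names "input", "output", and "error" are assumed to exist on the operator.
+def on_input(msg):
+    try:
+        result = process(msg.body)  # process() stands in for your own logic
+        api.send("output", api.Message(result))
+    except Exception as e:
+        api.logger.error(f"Processing error: {e}")
+        # Route the failure to a dedicated error port instead of stopping the graph
+        api.send("error", api.Message({
+            "error": str(e),
+            "input": msg.body
+        }))
+
+api.set_port_callback("input", on_input)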
+
+---
+
+## Scheduling
+
+### Schedule Graph Executions
+
+**Cron Expression Format:**
+```
+┌───────────── second (0-59)
+│ ┌───────────── minute (0-59)
+│ │ ┌───────────── hour (0-23)
+│ │ │ ┌───────────── day of month (1-31)
+│ │ │ │ ┌───────────── month (1-12)
+│ │ │ │ │ ┌───────────── day of week (0-6, Sun=0)
+│ │ │ │ │ │
+* * * * * *
+```
+
+**Examples:**
+```
+0 0 * * * *     # Every hour, on the hour
+0 0 0 * * *     # Daily at midnight
+0 0 0 * * 1     # Every Monday at midnight
+0 0 6 1 * *     # 6 AM on the first day of each month
+0 */15 * * * *  # Every 15 minutes
+```
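+
+To check an expression before activating a schedule, a small matcher like the sketch below can help. It is a simplified model, not the scheduler's implementation: it supports only `*`, `*/n`, and comma-separated integers, and it requires every field to match.
+
+```python
+from datetime import datetime
+
+
+def field_matches(spec: str, value: int) -> bool:
+    """Support '*', '*/n', and comma-separated integers only."""
+    if spec == "*":
+        return True
+    if spec.startswith("*/"):
+        return value % int(spec[2:]) == 0
+    return value in {int(part) for part in spec.split(",")}
+
+
+def cron_matches(expression: str, moment: datetime) -> bool:
+    """Check a six-field expression (second minute hour day month weekday)."""
+    fields = expression.split()
+    if len(fields) != 6:
+        raise ValueError("expected six fields")
+    # datetime.weekday() uses Monday=0; the format above uses Sunday=0.
+    weekday = (moment.weekday() + 1) % 7
+    values = [moment.second, moment.minute, moment.hour,
+              moment.day, moment.month, weekday]
+    return all(field_matches(s, v) for s, v in zip(fields, values))
+
+
+monday_midnight = datetime(2024, 1, 1, 0, 0, 0)  # 1 January 2024 was a Monday
+print(cron_matches("0 0 0 * * 1", monday_midnight))
+print(cron_matches("0 */15 * * * *", datetime(2024, 1, 1, 9, 30, 0)))
+print(cron_matches("0 0 6 1 * *", datetime(2024, 1, 2, 6, 0, 0)))
+```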
+
+### Creating Schedule
+
+1. Open graph
+2. Click "Schedule"
+3. Configure cron expression
+4. Set timezone
+5. Activate schedule
+
+### Managing Schedules
+
+**Actions:**
+- View scheduled runs
+- Pause schedule
+- Resume schedule
+- Delete schedule
+- View execution history
+
+---
+
+## Advanced Topics
+
+### Native Multiplexing (Gen2)
+
+Connect one output to multiple inputs:
+
+```
+[Source] ─┬──→ [Processor A]
+          ├──→ [Processor B]
+          └──→ [Processor C]
+```
+
+Or multiple outputs to one input:
+
+```
+[Source A] ──┐
+[Source B] ──┼──→ [Processor]
+[Source C] ──┘
+```
+
+### Graph Snippets
+
+Reusable graph fragments:
+
+1. **Create Snippet:**
+   - Select operators
+   - Right-click > "Save as Snippet"
+   - Name and save
+
+2. **Use Snippet:**
+   - Drag snippet to canvas
+   - Configure parameters
+   - Connect to graph
+
+### Parameterization
+
+**Substitution Variables:**
+```
+${parameter_name}
+${ENV.VARIABLE_NAME}
+${SYSTEM.TENANT}
+```
+
+**In Operator Config:**
+```
+File Path: ${source_path}/data_${DATE}.csv
+Connection: ${target_connection}
+```
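+
+The substitution itself can be sketched with a regular expression. This is an illustrative resolver, not the product's implementation: the `substitute` helper is hypothetical, and the value for `DATE` is supplied by hand because its source is not specified here.
+
+```python
+import re
+
+# Matches ${name}, including dotted names such as ${ENV.VARIABLE_NAME}.
+PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_.]*)\}")
+
+
+def substitute(template: str, values: dict) -> str:
+    """Replace each ${name} with values[name]; fail loudly on unknown names."""
+    def replace(match):
+        name = match.group(1)
+        if name not in values:
+            raise KeyError(f"no value for ${{{name}}}")
+        return str(values[name])
+    return PLACEHOLDER.sub(replace, template)
+
+
+values = {"source_path": "/data/2024/january", "DATE": "2024-01-31"}
+print(substitute("${source_path}/data_${DATE}.csv", values))
+```
+
+Raising on an unknown name, rather than leaving the placeholder in place, surfaces a missing parameter before the graph reads from a malformed path.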
+
+### Import/Export
+
+**Export Graph:**
+1. Select graph
+2. Right-click > Export
+3. Include data types (optional)
+4. Save as .zip
+
+**Import Graph:**
+1. Right-click in repository
+2. Import > From file
+3. Select .zip file
+4. Map dependencies
+
+---
+
+## Best Practices
+
+### Graph Design
+
+1. **Clear Data Flow**: Left-to-right, top-to-bottom
+2. **Meaningful Names**: Descriptive operator names
+3. **Group Related Operators**: Use groups for organization
+4. **Document**: Add descriptions to operators
+5. **Validate Often**: Check during development
+
+### Performance
+
+1. **Minimize Cross-Engine Communication**
+2. **Use Appropriate Batch Sizes**
+3. **Configure Resources**: Memory and CPU
+4. **Enable Parallel Processing**: Where applicable
+5. **Monitor and Tune**: Use metrics
+
+### Error Handling
+
+1. **Enable Auto-Recovery** (Gen2)
+2. **Configure Appropriate Snapshot Interval**
+3. **Implement Error Ports**: Route errors
+4. **Log Sufficiently**: Debug information
+5. **Test Failure Scenarios**: Validate recovery
+
+### Maintenance
+
+1. **Version Control**: Use graph versioning
+2. **Document Changes**: Change history
+3. **Test Before Deploy**: Validate thoroughly
+4. **Monitor Production**: Watch for issues
+5. **Clean Up**: Remove unused graphs
+
+---
+
+## Troubleshooting
+
+### Common Issues
+
+| Issue | Cause | Solution |
+|-------|-------|----------|
+| Port type mismatch | Incompatible types | Use converter operator |
+| Graph won't start | Resource constraints | Adjust resource config |
+| Slow performance | Cross-engine overhead | Optimize operator placement |
+| Recovery fails | Corrupt snapshot | Clear state, restart |
+| Schedule not running | Incorrect cron | Verify expression |
+
+### Diagnostic Steps
+
+1. Check graph status
+2. Review operator logs
+3. Download diagnostics
+4. Check resource usage
+5. Verify connections
+
+---
+
+## Documentation Links
+
+- **Using Graphs**: [https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/tree/main/docs/modelingguide/using-graphs](https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/tree/main/docs/modelingguide/using-graphs)
+- **Creating Graphs**: [https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/tree/main/docs/modelingguide/using-graphs/creating-graphs](https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/tree/main/docs/modelingguide/using-graphs/creating-graphs)
+- **Graph Examples**: [https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/tree/main/docs/repositoryobjects/data-intelligence-graphs](https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/tree/main/docs/repositoryobjects/data-intelligence-graphs)
+
+---
+
+**Last Updated**: 2025-11-22