Initial commit

2025-11-30 08:55:25 +08:00
commit e23395aeb2
19 changed files with 6391 additions and 0 deletions
--- a/SKILL.md
+++ b/SKILL.md
@@ -0,0 +1,263 @@
+---
+name: sap-hana-cloud-data-intelligence
+description: |
+  Develops data processing pipelines, integrations, and machine learning scenarios in SAP Data Intelligence Cloud. Use when building graphs/pipelines with operators, integrating ABAP/S4HANA systems, creating replication flows, developing ML scenarios with JupyterLab, or using Data Transformation Language functions. Covers Gen1/Gen2 operators, subengines (Python, Node.js, C++), structured data operators, and repository objects.
+license: GPL-3.0
+metadata:
+  version: "1.0.0"
+  last_verified: "2025-11-27"
+---
+
+# SAP HANA Cloud Data Intelligence Skill
+
+This skill provides comprehensive guidance for developing with SAP Data Intelligence Cloud, including pipeline creation, operator development, data integration, and machine learning scenarios.
+
+## When to Use This Skill
+
+Use this skill when:
+
+- Creating or modifying data processing graphs/pipelines
+- Developing custom operators (Gen1 or Gen2)
+- Integrating ABAP-based SAP systems (S/4HANA, BW)
+- Building replication flows for data movement
+- Developing ML scenarios with ML Scenario Manager
+- Working with JupyterLab in Data Intelligence
+- Using Data Transformation Language (DTL) functions
+- Configuring subengines (Python, Node.js, C++)
+- Working with structured data operators
+
+## Core Concepts
+
+### Graphs (Pipelines)
+
+Graphs are networks of operators connected via typed input/output ports for data transfer.
+
+**Two Generations:**
+- **Gen1 Operators**: Legacy operators, broad compatibility
+- **Gen2 Operators**: Enhanced error recovery, state management, snapshots
+
+**Critical Rule**: Graphs cannot mix Gen1 and Gen2 operators - choose one generation per graph.
+
+**Gen2 Advantages:**
+- Automatic error recovery with snapshots
+- State management with periodic checkpoints
+- Native multiplexing (one-to-many, many-to-one)
+- Improved Python3 operator
+
+### Operators
+
+Building blocks that process data within graphs. Each operator has:
+- **Ports**: Typed input/output connections for data flow
+- **Configuration**: Parameters that control behavior
+- **Runtime**: Engine that executes the operator
+
+**Operator Categories:**
+1. Messaging (Kafka, MQTT, NATS)
+2. Storage (Files, HDFS, S3, Azure, GCS)
+3. Database (HANA, SAP BW, SQL)
+4. Script (Python, JavaScript, R, Go)
+5. Data Processing (Transform, Anonymize, Validate)
+6. Machine Learning (TensorFlow, PyTorch, HANA ML)
+7. Integration (OData, REST, SAP CPI)
+8. Workflow (Pipeline, Data Workflow)
+
+### Subengines
+
+Subengines enable operators to run on different runtimes within the same graph.
+
+**Supported Subengines:**
+- **ABAP**: For ABAP Pipeline Engine operators
+- **Python 3.9**: For Python-based operators
+- **Node.js**: For JavaScript-based operators
+- **C++**: For high-performance native operators
+
+**Key Benefit**: Connected operators on the same subengine run in a single OS process for optimal performance.
+
+**Trade-off**: Cross-engine communication requires serialization/deserialization overhead.
+
+## Quick Start Patterns
+
+### Basic Graph Creation
+
+```
+1. Open SAP Data Intelligence Modeler
+2. Create new graph
+3. Add operators from repository
+4. Connect operator ports (matching types)
+5. Configure operator parameters
+6. Validate graph
+7. Execute and monitor
+```
+
+### Replication Flow Pattern
+
+```
+1. Create replication flow in Modeler
+2. Configure source connection (ABAP, HANA, etc.)
+3. Configure target (HANA Cloud, S3, Kafka, etc.)
+4. Add tasks with source objects
+5. Define filters and mappings
+6. Validate flow
+7. Deploy to tenant repository
+8. Run and monitor
+```
+
+**Delivery Guarantees:**
+- Default: At-least-once (may have duplicates)
+- With UPSERT to databases: Exactly-once
+- For cloud storage: Use "Suppress Duplicates" option
+
+### ML Scenario Pattern
+
+```
+1. Open ML Scenario Manager from launchpad
+2. Create new scenario
+3. Add datasets (register data sources)
+4. Create Jupyter notebooks for experiments
+5. Build training pipelines
+6. Track metrics with Metrics Explorer
+7. Version scenario for reproducibility
+8. Deploy model pipeline
+```
+
+## Common Tasks
+
+### ABAP System Integration
+
+For integrating ABAP-based SAP systems:
+
+1. **Prerequisites**: Configure Cloud Connector for on-premise systems
+2. **Connection Setup**: Create ABAP connection in Connection Management
+3. **Metadata Access**: Use Metadata Explorer for object discovery
+4. **Data Sources**: CDS Views, ODP (Operational Data Provisioning), Tables
+
+**Reference**: See `references/abap-integration.md` for detailed setup.
+
+### Structured Data Processing
+
+Use structured data operators for SQL-like transformations:
+
+- **Data Transform**: Visual SQL editor for complex transformations
+- **Aggregation Node**: GROUP BY with aggregation functions
+- **Join Node**: INNER, LEFT, RIGHT, FULL joins
+- **Projection Node**: Column selection and renaming
+- **Union Node**: Combine multiple datasets
+- **Case Node**: Conditional logic
+
+**Reference**: See `references/structured-data-operators.md` for configuration.
+
+### Data Transformation Language
+
+DTL provides SQL-like functions for data processing:
+
+**Function Categories:**
+- String: CONCAT, SUBSTRING, UPPER, LOWER, TRIM, REPLACE
+- Numeric: ABS, CEIL, FLOOR, ROUND, MOD, POWER
+- Date/Time: ADD_DAYS, MONTHS_BETWEEN, EXTRACT, CURRENT_UTCTIMESTAMP
+- Conversion: TO_DATE, TO_STRING, TO_INTEGER, TO_DECIMAL
+- Miscellaneous: CASE, COALESCE, IFNULL, NULLIF
+
+**Reference**: See `references/dtl-functions.md` for complete reference.
+
+## Best Practices
+
+### Graph Design
+
+1. **Choose Generation Early**: Decide Gen1 vs Gen2 before building
+2. **Minimize Cross-Engine Communication**: Group operators by subengine
+3. **Use Appropriate Port Types**: Match data types for efficient transfer
+4. **Enable Snapshots**: For Gen2 graphs, enable auto-recovery
+5. **Validate Before Execution**: Always validate graphs
+
+### Operator Development
+
+1. **Start with Built-in Operators**: Use predefined operators first
+2. **Extend When Needed**: Create custom operators for specific needs
+3. **Use Script Operators**: For quick prototyping with Python/JS
+4. **Version Your Operators**: Track changes with operator versions
+5. **Document Configuration**: Describe all parameters
+
+### Replication Flows
+
+1. **Plan Target Schema**: Understand target structure requirements
+2. **Use Filters**: Reduce data volume with source filters
+3. **Handle Duplicates**: Configure for exactly-once when possible
+4. **Monitor Execution**: Track progress and errors
+5. **Clean Up Artifacts**: Remove source artifacts after completion
+
+### ML Scenarios
+
+1. **Version Early**: Create versions before major changes
+2. **Track All Metrics**: Use SDK for comprehensive tracking
+3. **Use Notebooks for Exploration**: JupyterLab for experimentation
+4. **Productionize with Pipelines**: Convert notebooks to pipelines
+5. **Export/Import for Migration**: Use ZIP export for transfers
+
+## Error Handling
+
+### Common Graph Errors
+
+| Error | Cause | Solution |
+|-------|-------|----------|
+| Port type mismatch | Incompatible data types | Use converter operator or matching types |
+| Gen1/Gen2 mixing | Combined operator generations | Use single generation per graph |
+| Resource exhaustion | Insufficient memory/CPU | Adjust resource requirements |
+| Connection failure | Network or credentials | Verify connection settings |
+| Validation errors | Invalid configuration | Review error messages, fix config |
+
+### Recovery Strategies
+
+**Gen2 Graphs:**
+- Enable automatic recovery in graph settings
+- Configure snapshot intervals
+- Monitor recovery status
+
+**Gen1 Graphs:**
+- Implement manual error handling in operators
+- Use try-catch in script operators
+- Configure retry logic
+
+## Reference Files
+
+For detailed information, see:
+
+- `references/operators-reference.md` - Complete operator catalog (266 operators)
+- `references/abap-integration.md` - ABAP/S4HANA/BW integration with SAP Notes
+- `references/structured-data-operators.md` - Structured data processing
+- `references/dtl-functions.md` - Data Transformation Language (79 functions)
+- `references/ml-scenario-manager.md` - ML Scenario Manager, SDK, artifacts
+- `references/subengines.md` - Python, Node.js, C++ subengine development
+- `references/graphs-pipelines.md` - Graph execution, snapshots, recovery
+- `references/replication-flows.md` - Replication flows, cloud storage, Kafka
+- `references/data-workflow.md` - Data workflow operators, orchestration
+- `references/security-cdc.md` - Security, data protection, CDC methods
+- `references/additional-features.md` - Monitoring, cloud storage services, scenario templates, data types, Git terminal
+- `references/modeling-advanced.md` - Graph snippets, SAP cloud apps, configuration types, 141 graph templates
+
+## Templates
+
+Starter templates are available in `templates/`:
+
+- `templates/basic-graph.json` - Simple data processing graph
+- `templates/replication-flow.json` - Data replication pattern
+- `templates/ml-training-pipeline.json` - ML training workflow
+
+## Documentation Links
+
+**Primary Sources:**
+- GitHub Docs: [https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/tree/main/docs](https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/tree/main/docs)
+- SAP Help Portal: [https://help.sap.com/docs/SAP_DATA_INTELLIGENCE](https://help.sap.com/docs/SAP_DATA_INTELLIGENCE)
+- SAP Developer Center: [https://developers.sap.com/topics/data-intelligence.html](https://developers.sap.com/topics/data-intelligence.html)
+
+**Section-Specific:**
+- Modeling Guide: [https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/tree/main/docs/modelingguide](https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/tree/main/docs/modelingguide)
+- ABAP Integration: [https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/tree/main/docs/abapintegration](https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/tree/main/docs/abapintegration)
+- Machine Learning: [https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/tree/main/docs/machinelearning](https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/tree/main/docs/machinelearning)
+- Function Reference: [https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/tree/main/docs/functionreference](https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/tree/main/docs/functionreference)
+- Repository Objects: [https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/tree/main/docs/repositoryobjects](https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/tree/main/docs/repositoryobjects)
+
+## Version Information
+
+- **Skill Version**: 1.0.0
+- **Last Updated**: 2025-11-27
+- **Documentation Source**: SAP-docs/sap-hana-cloud-data-intelligence (GitHub)