Initial commit

2025-11-30 09:02:46 +08:00
commit b81339e588
8 changed files with 2678 additions and 0 deletions
--- a/commands/transform-table.md
+++ b/commands/transform-table.md
@@ -0,0 +1,186 @@
+---
+name: transform-table
+description: Transform a single database table to staging format with data quality improvements, PII handling, and JSON extraction
+---
+
+# Transform Single Table to Staging
+
+## ⚠️ CRITICAL: This command enforces strict sub-agent delegation
+
+I'll help you transform a database table to staging format using the appropriate staging-transformer sub-agent (Presto/Trino or Hive).
+
+---
+
+## Required Information
+
+Please provide the following details:
+
+### 1. Source Table
+- **Database Name**: Source database (e.g., `client_src`, `demo_db`)
+- **Table Name**: Source table name (e.g., `customer_profiles_histunion`)
+
+### 2. Target Configuration
+- **Staging Database**: Target database (default: `client_stg`)
+- **Lookup Database**: Reference database for rules (default: `client_config`)
+
+### 3. SQL Engine (Optional)
+- **Engine**: Choose one:
+  - `presto` or `trino` - Presto/Trino SQL engine (default)
+  - `hive` - Hive SQL engine
+  - If no engine is specified, Presto/Trino is used
+
+### 4. Transformation Requirements (Optional)
+- **Deduplication**: Required? (checked against `client_config.staging_trnsfrm_rules`)
+- **JSON Columns**: Detected and processed automatically
+- **Join Logic**: Any joins needed? (checked against `additional_rules`)
+
+---
+
+## What I'll Do
+
+### Step 1: Detect SQL Engine
+I will determine the appropriate sub-agent:
+- **Presto/Trino keywords** → `staging-transformer-presto`
+- **Hive keywords** → `staging-transformer-hive`
+- **No specification** → `staging-transformer-presto` (default)
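+
+The routing above can be sketched as a small keyword lookup. This is an illustrative sketch only: the function name `select_sub_agent` and the tokenization rule are assumptions, and only the sub-agent names come from this document.
+
+```python
+import re
+
+# Keyword -> sub-agent mapping; the agent names are the ones listed in Step 1.
+SUB_AGENTS = {
+    "presto": "staging-transformer-presto",
+    "trino": "staging-transformer-presto",
+    "hive": "staging-transformer-hive",
+}
+DEFAULT_AGENT = "staging-transformer-presto"
+
+
+def select_sub_agent(request: str) -> str:
+    """Pick a sub-agent from a free-text request (hypothetical helper)."""
+    # Keep dots and underscores inside a token so that a table such as
+    # client_src.hive_events is not mistaken for an engine keyword.
+    tokens = [t.strip(".") for t in re.findall(r"[\w.]+", request.lower())]
+    for token in tokens:
+        if token in SUB_AGENTS:
+            return SUB_AGENTS[token]  # first engine keyword wins
+    return DEFAULT_AGENT  # no engine mentioned: Presto/Trino
+```
+
+With this sketch, a request ending in "using Hive" routes to `staging-transformer-hive`, while a request that names no engine falls back to `staging-transformer-presto`.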
+
+### Step 2: Delegate to Specialized Agent
+I will invoke the appropriate staging-transformer agent with:
+- Complete table transformation context
+- All mandatory requirements (13 rules)
+- Engine-specific SQL generation
+- Full compliance validation
+
+### Step 3: Sub-Agent Will Execute
+The specialized agent will:
+1. **Validate table existence** (MANDATORY first step)
+2. **Analyze metadata** (columns, types, data samples)
+3. **Check configuration** (deduplication rules, additional rules)
+4. **Detect JSON columns** (automatic processing)
+5. **Generate SQL files**:
+   - `staging/init_queries/{source_db}_{table_name}_init.sql` (Presto)
+   - `staging/queries/{source_db}_{table_name}.sql` (Presto)
+   - `staging/queries/{source_db}_{table_name}_upsert.sql` (if dedup, Presto)
+   - OR `staging_hive/queries/{source_db}_{table_name}.sql` (Hive)
+6. **Update configuration**: `staging/config/src_params.yml` or `staging_hive/config/src_params.yml`
+7. **Create/verify DIG file**: `staging/staging_transformation.dig` or `staging_hive/staging_hive.dig`
+8. **Execute git workflow**: Branch, commit, push, PR creation
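+
+As a rough illustration of the naming convention in steps 5 to 7, the expected file set could be computed as follows. The helper `planned_files` and its parameters are hypothetical; the path templates are copied from the list above.
+
+```python
+from __future__ import annotations
+
+
+def planned_files(source_db: str, table_name: str,
+                  engine: str = "presto", dedup: bool = False) -> list[str]:
+    """List the files a transformation is expected to create or update."""
+    stem = f"{source_db}_{table_name}"
+    if engine == "hive":
+        # Hive uses a single combined query file.
+        return [
+            f"staging_hive/queries/{stem}.sql",
+            "staging_hive/config/src_params.yml",
+            "staging_hive/staging_hive.dig",
+        ]
+    files = [
+        f"staging/init_queries/{stem}_init.sql",
+        f"staging/queries/{stem}.sql",
+    ]
+    if dedup:
+        # The upsert query exists only when deduplication is configured.
+        files.append(f"staging/queries/{stem}_upsert.sql")
+    files += ["staging/config/src_params.yml", "staging/staging_transformation.dig"]
+    return files
+```
+
+A list like this can be compared against the working tree after the sub-agent finishes, to confirm that nothing is missing before the pull request is reviewed.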
+
+---
+
+## Quality Assurance
+
+The sub-agent ensures complete compliance with all requirements:
+
+- ✅ **Column Limit Management** (max 200 columns)
+- ✅ **JSON Detection & Extraction** (automatic)
+- ✅ **Date Processing** (4 outputs per date column)
+- ✅ **Email/Phone Validation** (with hashing)
+- ✅ **String Standardization** (UPPER, TRIM, NULL handling)
+- ✅ **Deduplication Logic** (if configured)
+- ✅ **Join Processing** (if specified)
+- ✅ **Incremental Processing** (state tracking)
+- ✅ **SQL File Creation** (init, incremental, upsert)
+- ✅ **DIG File Management** (conditional creation)
+- ✅ **Configuration Update** (src_params.yml)
+- ✅ **Git Workflow** (complete automation)
+- ✅ **Treasure Data Compatibility** (VARCHAR/BIGINT timestamps)
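+
+To make two of these rules concrete, the sketch below shows the intent of string standardization and of email validation with hashing. It is not the SQL the sub-agent generates: the function names, the loose email pattern, the choice of SHA-256, and the treatment of empty strings as NULL are all assumptions made for illustration.
+
+```python
+import hashlib
+import re
+
+# Deliberately loose pattern (assumption): one "@" and a dot in the domain.
+EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
+
+
+def standardize(value):
+    """TRIM + UPPER, mapping missing or empty values to None (SQL NULL)."""
+    if value is None:
+        return None
+    cleaned = value.strip().upper()
+    return cleaned or None
+
+
+def hash_email(value):
+    """Return a SHA-256 hex digest of a valid email, or None if invalid."""
+    if value is None:
+        return None
+    email = value.strip().lower()
+    if not EMAIL_RE.match(email):
+        return None  # invalid addresses are not hashed
+    return hashlib.sha256(email.encode("utf-8")).hexdigest()
+```
+
+Normalizing case and whitespace before hashing keeps the digest stable across formatting variants of the same address, which is what makes a hashed column usable as a join key.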
+
+---
+
+## Output Files
+
+### For Presto/Trino Engine:
+1. `staging/init_queries/{source_db}_{table_name}_init.sql` - Initial load SQL
+2. `staging/queries/{source_db}_{table_name}.sql` - Incremental SQL
+3. `staging/queries/{source_db}_{table_name}_upsert.sql` - Upsert SQL (if dedup)
+4. `staging/config/src_params.yml` - Updated configuration
+5. `staging/staging_transformation.dig` - Workflow (created if not exists)
+
+### For Hive Engine:
+1. `staging_hive/queries/{source_db}_{table_name}.sql` - Combined SQL
+2. `staging_hive/config/src_params.yml` - Updated configuration
+3. `staging_hive/staging_hive.dig` - Workflow (created if not exists)
+4. `staging_hive/queries/get_max_time.sql` - Template (created if not exists)
+5. `staging_hive/queries/get_stg_rows_for_delete.sql` - Template (created if not exists)
+
+### Plus:
+- Git commit with comprehensive message
+- Pull request with transformation summary
+- Validation report
+
+---
+
+## Example Usage
+
+### Example 1: Presto Engine (Default)
+```
+User: Transform table client_src.customer_profiles_histunion
+→ Engine: Presto (default)
+→ Sub-agent: staging-transformer-presto
+→ Output: staging/ directory files
+```
+
+### Example 2: Hive Engine (Explicit)
+```
+User: Transform table client_src.klaviyo_events_histunion using Hive
+→ Engine: Hive
+→ Sub-agent: staging-transformer-hive
+→ Output: staging_hive/ directory files
+```
+
+### Example 3: With Custom Databases
+```
+User: Transform demo_db.orders_histunion
+      Use demo_db_stg as staging database
+      Use client_config for lookup
+→ Engine: Presto (default)
+→ Custom databases applied
+```
+
+---
+
+## Next Steps After Transformation
+
+1. **Review generated files**:
+   ```bash
+   ls -l staging/queries/
+   ls -l staging/init_queries/
+   cat staging/config/src_params.yml
+   ```
+
+2. **Review Pull Request**:
+   - Check transformation summary
+   - Verify all validation gates passed
+   - Review generated SQL
+
+3. **Test the workflow**:
+   ```bash
+   cd staging
+   # push requires a project name
+   td wf push {project_name}
+   td wf run staging_transformation.dig
+   ```
+
+4. **Monitor execution**:
+   ```sql
+   SELECT * FROM client_config.inc_log
+   WHERE table_name = '{your_table}'
+   ORDER BY inc_value DESC
+   LIMIT 1
+   ```
+
+---
+
+## Production-Ready Guarantee
+
+All transformations will:
+- ✅ Run successfully on first execution
+- ✅ Follow consistent patterns
+- ✅ Include complete error handling
+- ✅ Include comprehensive data quality
+- ✅ Be maintainable and documented
+- ✅ Match production standards exactly
+
+---
+
+**Ready to proceed? Please provide the source database and table name, and I'll delegate to the appropriate staging-transformer agent for complete processing.**