---
name: transform-batch
description: Transform multiple database tables in parallel with maximum efficiency
---

# Transform Multiple Tables to Staging (Batch Mode)

## ⚠️ CRITICAL: This command enables parallel processing for 3x-10x faster transformations

I'll help you transform multiple database tables to staging format using parallel sub-agent execution for maximum performance.

---

## Required Information

Please provide the following details:

### 1. Source Tables
- **Table List**: Comma-separated list of tables (e.g., `table1, table2, table3`)
- **Format**: `database.table_name` or just `table_name` (if same database)
- **Example**: `client_src.customers_histunion, client_src.orders_histunion, client_src.products_histunion`

### 2. Source Configuration
- **Source Database**: Database containing the tables (e.g., `client_src`)
- **Staging Database**: Target database (default: `client_stg`)
- **Lookup Database**: Reference database for rules (default: `client_config`)

### 3. SQL Engine (Optional)
- **Engine**: Choose one:
  - `presto` or `trino` - Presto/Trino SQL engine (default)
  - `hive` - Hive SQL engine
  - `mixed` - Specify engine per table
- If not specified, will default to Presto/Trino for all tables

### 4. Mixed Engine Example (Optional)
If you need different engines for different tables:
```
Transform table1 using Hive, table2 using Presto, table3 using Hive
```

---

## What I'll Do

### Step 1: Parse Table List
I will extract individual tables from your input:
- Parse the comma-separated list
- Detect the database prefix for each table
- Identify the total table count

### Step 2: Detect Engine Strategy
I will determine the processing strategy:
- **Single Engine**: All tables use the same engine
  - Presto/Trino (default) → all tables to `staging-transformer-presto`
  - Hive → all tables to `staging-transformer-hive`
- **Mixed Engines**: Different engines per table
  - Parse the engine specification per table
  - Route each table to the appropriate sub-agent

### Step 3: Launch Parallel Sub-Agents
I will create parallel sub-agent calls:
- **ONE sub-agent per table** (maximum parallelism)
- **Single message with multiple Task calls** (concurrent execution)
- **Each sub-agent processes independently** (no blocking)
- **All sub-agents skip the git workflow** (consolidated at the end)

### Step 4: Monitor Parallel Execution
I will track all sub-agent progress:
- Wait for all sub-agents to complete
- Collect results from each transformation
- Identify any failures or errors
- Report partial success if needed

### Step 5: Consolidate Results
After ALL tables complete successfully:
1. **Aggregate file changes** across all tables
2. **Execute a single git workflow** (sketched below):
   - Create feature branch
   - Commit all changes together
   - Push to remote
   - Create comprehensive PR
3. **Report complete summary**
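The consolidated git step is a plain git-plus-GitHub-CLI sequence. Here is a minimal sketch, assuming `git` and the `gh` CLI are available and the repository has a remote configured; the branch name, commit message, PR title, and the exact directories added are illustrative placeholders, not fixed values.

```bash
# Illustrative sketch only: branch name, commit message, and PR title are placeholders.
git checkout -b feature/batch-staging-transform    # one feature branch for the whole batch
git add staging/ staging_hive/                     # add whichever output directories were generated
git commit -m "Batch transform tables to staging"
git push -u origin feature/batch-staging-transform
gh pr create \
  --title "Batch transform tables to staging" \
  --body "Transformed tables, engines used, and validation results are summarized here"
```

This step runs only if every table succeeds; on partial failure the changes stay local (see Error Handling below).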
---

## Processing Strategy

### Parallel Processing (Recommended for 2+ Tables)
```
User requests: "Transform tables A, B, C"

Main Claude creates 3 parallel sub-agent calls:

┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
│   Sub-Agent 1   │  │   Sub-Agent 2   │  │   Sub-Agent 3   │
│   (Table A)     │  │   (Table B)     │  │   (Table C)     │
│   staging-      │  │   staging-      │  │   staging-      │
│   transformer-  │  │   transformer-  │  │   transformer-  │
│   presto        │  │   presto        │  │   presto        │
└─────────────────┘  └─────────────────┘  └─────────────────┘
         ↓                    ↓                    ↓
    [Files for A]        [Files for B]        [Files for C]
         ↓                    ↓                    ↓
         └────────────────────┴────────────────────┘
                              ↓
                 [Consolidated Git Workflow]
                 [Single PR with all tables]
```

### Performance Benefits:
- **Speed**: N tables finish in roughly the time of one table, instead of N× that time
- **Efficiency**: Optimal resource utilization
- **User Experience**: Faster results for batch operations
- **Scalability**: Can handle 10+ tables efficiently

---

## Quality Assurance (Per Table)

Each sub-agent ensures complete compliance (see the illustrative snippet after this list):

✅ **Column Limit Management** (max 200 columns)
✅ **JSON Detection & Extraction** (automatic)
✅ **Date Processing** (4 outputs per date column)
✅ **Email/Phone Validation** (with hashing)
✅ **String Standardization** (UPPER, TRIM, NULL handling)
✅ **Deduplication Logic** (if configured)
✅ **Join Processing** (if specified)
✅ **Incremental Processing** (state tracking)
✅ **SQL File Creation** (init, incremental, upsert)
✅ **DIG File Management** (conditional creation)
✅ **Configuration Update** (src_params.yml)
✅ **Treasure Data Compatibility** (VARCHAR/BIGINT timestamps)
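To make the checklist concrete, here is an illustrative Presto/Trino fragment of the kind of column rules these gates cover. It is a sketch only: the column names (`customer_name`, `email`, `customer_id`) and the dedup key are hypothetical, ordering by Treasure Data's built-in `time` column is also an assumption, and the real per-table rules come from the lookup database (`client_config`), not hard-coded SQL.

```sql
-- Illustrative only: column names, dedup key, and ordering column are hypothetical examples.
SELECT
  UPPER(TRIM(NULLIF(customer_name, '')))       AS customer_name,  -- string standardization
  TO_HEX(SHA256(TO_UTF8(LOWER(TRIM(email)))))  AS email_hash      -- hashed email identifier
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY time DESC) AS rn  -- keep latest record
  FROM client_src.customers_histunion
) dedup
WHERE rn = 1
```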
---

## Output Files

### For Presto/Trino Engine (per table):
- `staging/init_queries/{source_db}_{table}_init.sql`
- `staging/queries/{source_db}_{table}.sql`
- `staging/queries/{source_db}_{table}_upsert.sql` (if dedup)
- Updated `staging/config/src_params.yml` (all tables)
- `staging/staging_transformation.dig` (created once if it doesn't exist)

### For Hive Engine (per table):
- `staging_hive/queries/{source_db}_{table}.sql`
- Updated `staging_hive/config/src_params.yml` (all tables)
- `staging_hive/staging_hive.dig` (created once if it doesn't exist)
- Template files (created once if they don't exist)

### Plus:
- Single git commit with all tables
- Comprehensive pull request
- Complete validation report for all tables

---

## Example Usage

### Example 1: Same Engine (Presto Default)
```
User: Transform tables: client_src.customers_histunion, client_src.orders_histunion, client_src.products_histunion

→ Parallel execution with 3 staging-transformer-presto agents
→ All files to staging/ directory
→ Single consolidated git workflow
→ Time: ~1x (vs 3x sequential)
```

### Example 2: Same Engine (Hive Explicit)
```
User: Transform tables using Hive: client_src.events_histunion, client_src.profiles_histunion

→ Parallel execution with 2 staging-transformer-hive agents
→ All files to staging_hive/ directory
→ Single consolidated git workflow
→ Time: ~1x (vs 2x sequential)
```

### Example 3: Mixed Engines
```
User: Transform table1 using Hive, table2 using Presto, table3 using Hive

→ Parallel execution:
  - Table1 → staging-transformer-hive
  - Table2 → staging-transformer-presto
  - Table3 → staging-transformer-hive
→ Files distributed to appropriate directories
→ Single consolidated git workflow
→ Time: ~1x (vs 3x sequential)
```

---

## Error Handling

### Partial Success Scenario
If some tables succeed and others fail:

1. **Report Clear Status**:
   ```
   ✅ Successfully transformed: table1, table2
   ❌ Failed: table3 (error message)
   ```
2. **Preserve Successful Work**:
   - Keep files from successful transformations
   - Allow retrying only the failed tables
3. **Git Safety**:
   - Only execute the git workflow if ALL tables succeed
   - Otherwise, keep changes local for review

### Full Failure Scenario
If all tables fail:
- Report a detailed error for each table
- No git workflow execution
- Provide troubleshooting guidance

---

## Next Steps After Batch Transformation

1. **Review Pull Request**:
   ```
   Title: "Batch transform 5 tables to staging"
   Body:
   - Transformed tables: table1, table2, table3, table4, table5
   - Engine: Presto/Trino
   - All validation gates passed ✅
   - Files created: 15 SQL files, 1 config update
   ```

2. **Verify Generated Files**:
   ```bash
   # For Presto
   ls -l staging/queries/
   ls -l staging/init_queries/
   cat staging/config/src_params.yml

   # For Hive
   ls -l staging_hive/queries/
   cat staging_hive/config/src_params.yml
   ```

3. **Test Workflow**:
   ```bash
   cd staging  # or staging_hive
   td wf push
   td wf run staging_transformation.dig  # or staging_hive.dig
   ```

4. **Monitor All Tables**:
   ```sql
   SELECT table_name, inc_value, project_name
   FROM client_config.inc_log
   WHERE table_name IN ('table1', 'table2', 'table3')
   ORDER BY inc_value DESC
   ```

---

## Performance Comparison

| Tables | Sequential Time | Parallel Time | Speedup |
|--------|-----------------|---------------|---------|
| 2      | ~10 min         | ~5 min        | 2x      |
| 3      | ~15 min         | ~5 min        | 3x      |
| 5      | ~25 min         | ~5 min        | 5x      |
| 10     | ~50 min         | ~5 min        | 10x     |

**Note**: Actual times vary based on table complexity and data volume.

---

## Production-Ready Guarantee

All batch transformations will:
- ✅ Execute in parallel for maximum speed
- ✅ Maintain complete quality for each table
- ✅ Provide an atomic git workflow (all or nothing)
- ✅ Include comprehensive error handling
- ✅ Generate maintainable code
- ✅ Match production standards exactly

---

**Ready to proceed? Please provide your table list and I'll launch parallel sub-agents for maximum efficiency!**

**Format Examples:**
- `Transform tables: table1, table2, table3` (same database)
- `Transform client_src.table1, client_src.table2` (explicit database)
- `Transform table1 using Hive, table2 using Presto` (mixed engines)