| name | description |
|---|---|
| transform-batch | Transform multiple database tables in parallel with maximum efficiency |
# Transform Multiple Tables to Staging (Batch Mode)
⚠️ CRITICAL: This command enables parallel processing for 3x-10x faster transformations
I'll help you transform multiple database tables to staging format using parallel sub-agent execution for maximum performance.
## Required Information

Please provide the following details:

### 1. Source Tables

- **Table List**: Comma-separated list of tables (e.g., `table1, table2, table3`)
- **Format**: `database.table_name` or just `table_name` (if same database)
- **Example**: `client_src.customers_histunion, client_src.orders_histunion, client_src.products_histunion`
### 2. Source Configuration

- **Source Database**: Database containing tables (e.g., `client_src`)
- **Staging Database**: Target database (default: `client_stg`)
- **Lookup Database**: Reference database for rules (default: `client_config`)
### 3. SQL Engine (Optional)

- **Engine**: Choose one:
  - `presto` or `trino` - Presto/Trino SQL engine (default)
  - `hive` - Hive SQL engine
  - `mixed` - Specify engine per table
- If not specified, Presto/Trino is used for all tables
### 4. Mixed Engine Example (Optional)

If you need different engines for different tables:

```
Transform table1 using Hive, table2 using Presto, table3 using Hive
```
## What I'll Do

### Step 1: Parse Table List

I will extract individual tables from your input (see the sketch below):

- Parse the comma-separated list
- Detect the database prefix for each table
- Identify the total table count
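As a rough illustration of this step, here is a minimal parsing sketch. The `parse_table_list` helper, the `TableRef` type, and the `default_db` parameter are hypothetical names chosen for the example, not the command's actual implementation:

```python
from typing import NamedTuple, Optional

class TableRef(NamedTuple):
    database: Optional[str]  # None when no explicit "db." prefix was given
    table: str

def parse_table_list(raw: str, default_db: Optional[str] = None) -> list[TableRef]:
    """Split a comma-separated table list and detect database prefixes."""
    refs: list[TableRef] = []
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate trailing or doubled commas
        if "." in entry:
            db, table = entry.split(".", 1)  # "client_src.customers_histunion" form
            refs.append(TableRef(db, table))
        else:
            refs.append(TableRef(default_db, entry))  # bare "customers_histunion" form
    return refs

refs = parse_table_list("client_src.customers_histunion, orders_histunion", "client_src")
print(f"Parsed {len(refs)} tables")  # total table count
```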
### Step 2: Detect Engine Strategy

I will determine the processing strategy (see the routing sketch below):

- **Single Engine**: All tables use the same engine
  - Presto/Trino (default) → all tables to `staging-transformer-presto`
  - Hive → all tables to `staging-transformer-hive`
- **Mixed Engines**: Different engines per table
  - Parse the engine specification per table
  - Route each table to the appropriate sub-agent
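A hedged sketch of the routing decision follows. The `route_tables` helper and the regular expression for per-table `using <engine>` hints are assumptions about how the request text is phrased, not the command's real parser:

```python
import re

AGENTS = {
    "presto": "staging-transformer-presto",
    "trino": "staging-transformer-presto",  # Trino requests share the Presto agent
    "hive": "staging-transformer-hive",
}

def route_tables(request: str, tables: list[str]) -> dict[str, str]:
    """Map each table to a sub-agent, honoring per-table 'using <engine>' hints."""
    routing = {}
    for table in tables:
        hint = re.search(rf"{re.escape(table)}\s+using\s+(\w+)", request, re.IGNORECASE)
        engine = hint.group(1).lower() if hint else "presto"  # default engine
        routing[table] = AGENTS.get(engine, AGENTS["presto"])
    return routing

print(route_tables(
    "Transform table1 using Hive, table2 using Presto, table3 using Hive",
    ["table1", "table2", "table3"],
))
# {'table1': 'staging-transformer-hive', 'table2': 'staging-transformer-presto',
#  'table3': 'staging-transformer-hive'}
```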
### Step 3: Launch Parallel Sub-Agents

I will create parallel sub-agent calls:

- ONE sub-agent per table (maximum parallelism)
- A single message with multiple Task calls (concurrent execution)
- Each sub-agent processes independently (no blocking)
- All sub-agents skip the git workflow (consolidated at the end)

### Step 4: Monitor Parallel Execution

I will track all sub-agent progress (Steps 3 and 4 are sketched together below):

- Wait for all sub-agents to complete
- Collect results from each transformation
- Identify any failures or errors
- Report partial success if needed
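The real command launches sub-agents through multiple Task calls in one message; the thread-pool sketch below only illustrates the same fan-out/fan-in pattern. The `transform_table` worker is a hypothetical stand-in for one sub-agent:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def transform_table(table: str, agent: str) -> dict:
    """Hypothetical stand-in for one sub-agent transforming one table."""
    # ... the real work happens inside the staging-transformer sub-agent ...
    return {"table": table, "agent": agent, "files": []}

def run_batch(routing: dict[str, str]) -> tuple[list[dict], list[dict]]:
    """Launch one worker per table, wait for all, and split the results."""
    succeeded, failed = [], []
    with ThreadPoolExecutor(max_workers=max(1, len(routing))) as pool:  # one per table
        futures = {pool.submit(transform_table, t, a): t for t, a in routing.items()}
        for future in as_completed(futures):  # collect each result as it finishes
            try:
                succeeded.append(future.result())
            except Exception as exc:  # one failed table must not block the others
                failed.append({"table": futures[future], "error": str(exc)})
    return succeeded, failed
```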
### Step 5: Consolidate Results

After ALL tables complete successfully (see the git sketch below):

- Aggregate file changes across all tables
- Execute a single git workflow:
  - Create a feature branch
  - Commit all changes together
  - Push to remote
  - Create a comprehensive PR
- Report a complete summary
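A minimal sketch of the consolidated git workflow, assuming the GitHub CLI (`gh`) is available for PR creation; the branch name, commit message, and staged paths are illustrative, and the real command may differ:

```python
import subprocess

def run(*cmd: str) -> None:
    subprocess.run(cmd, check=True)  # stop immediately if any git step fails

def consolidated_git_workflow(tables: list[str]) -> None:
    """One branch, one commit, one PR for the whole batch (illustrative names)."""
    branch = f"batch-staging-{len(tables)}-tables"  # hypothetical branch name
    run("git", "checkout", "-b", branch)
    run("git", "add", "staging/")  # stage whichever output directories exist
    run("git", "commit", "-m", f"Batch transform {len(tables)} tables to staging")
    run("git", "push", "-u", "origin", branch)
    run("gh", "pr", "create",  # assumes the GitHub CLI is installed
        "--title", f"Batch transform {len(tables)} tables to staging",
        "--body", "Transformed tables: " + ", ".join(tables))
```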
## Processing Strategy

### Parallel Processing (Recommended for 2+ Tables)

```
User requests: "Transform tables A, B, C"

Main Claude creates 3 parallel sub-agent calls:

┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│  Sub-Agent 1    │ │  Sub-Agent 2    │ │  Sub-Agent 3    │
│  (Table A)      │ │  (Table B)      │ │  (Table C)      │
│  staging-       │ │  staging-       │ │  staging-       │
│  transformer-   │ │  transformer-   │ │  transformer-   │
│  presto         │ │  presto         │ │  presto         │
└─────────────────┘ └─────────────────┘ └─────────────────┘
        ↓                   ↓                   ↓
  [Files for A]       [Files for B]       [Files for C]
        ↓                   ↓                   ↓
        └───────────────────┴───────────────────┘
                            ↓
            [Consolidated Git Workflow]
            [Single PR with all tables]
```
**Performance Benefits:**

- **Speed**: N tables complete in roughly the time of one, instead of N× that time
- **Efficiency**: Optimal resource utilization
- **User Experience**: Faster results for batch operations
- **Scalability**: Can handle 10+ tables efficiently
## Quality Assurance (Per Table)
Each sub-agent ensures complete compliance:
- ✅ Column Limit Management (max 200 columns)
- ✅ JSON Detection & Extraction (automatic)
- ✅ Date Processing (4 outputs per date column)
- ✅ Email/Phone Validation (with hashing)
- ✅ String Standardization (UPPER, TRIM, NULL handling)
- ✅ Deduplication Logic (if configured)
- ✅ Join Processing (if specified)
- ✅ Incremental Processing (state tracking)
- ✅ SQL File Creation (init, incremental, upsert)
- ✅ DIG File Management (conditional creation)
- ✅ Configuration Update (src_params.yml)
- ✅ Treasure Data Compatibility (VARCHAR/BIGINT timestamps)
## Output Files

**For Presto/Trino Engine (per table):**

- `staging/init_queries/{source_db}_{table}_init.sql`
- `staging/queries/{source_db}_{table}.sql`
- `staging/queries/{source_db}_{table}_upsert.sql` (if dedup)
- Updated `staging/config/src_params.yml` (all tables)
- `staging/staging_transformation.dig` (created once if not exists)

**For Hive Engine (per table):**

- `staging_hive/queries/{source_db}_{table}.sql`
- Updated `staging_hive/config/src_params.yml` (all tables)
- `staging_hive/staging_hive.dig` (created once if not exists)
- Template files (created once if not exist)
**Plus:**
- Single git commit with all tables
- Comprehensive pull request
- Complete validation report for all tables
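To make the naming convention above concrete, here is a small hypothetical helper that reproduces the per-table file paths; `expected_files` and its `dedup` flag are illustrative only:

```python
def expected_files(engine: str, source_db: str, table: str, dedup: bool = False) -> list[str]:
    """List the SQL files one table's transformation produces."""
    base = f"{source_db}_{table}"
    if engine == "hive":
        return [f"staging_hive/queries/{base}.sql"]
    files = [
        f"staging/init_queries/{base}_init.sql",
        f"staging/queries/{base}.sql",
    ]
    if dedup:  # the upsert query exists only when deduplication is configured
        files.append(f"staging/queries/{base}_upsert.sql")
    return files

print(expected_files("presto", "client_src", "customers_histunion", dedup=True))
```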
## Example Usage

### Example 1: Same Engine (Presto Default)
User: Transform tables: client_src.customers_histunion, client_src.orders_histunion, client_src.products_histunion
→ Parallel execution with 3 staging-transformer-presto agents
→ All files to staging/ directory
→ Single consolidated git workflow
→ Time: ~1x (vs 3x sequential)
### Example 2: Same Engine (Hive Explicit)
User: Transform tables using Hive: client_src.events_histunion, client_src.profiles_histunion
→ Parallel execution with 2 staging-transformer-hive agents
→ All files to staging_hive/ directory
→ Single consolidated git workflow
→ Time: ~1x (vs 2x sequential)
### Example 3: Mixed Engines
User: Transform table1 using Hive, table2 using Presto, table3 using Hive
→ Parallel execution:
- Table1 → staging-transformer-hive
- Table2 → staging-transformer-presto
- Table3 → staging-transformer-hive
→ Files distributed to appropriate directories
→ Single consolidated git workflow
→ Time: ~1x (vs 3x sequential)
## Error Handling

### Partial Success Scenario

If some tables succeed and others fail (see the reporting sketch below):

1. **Report Clear Status:**
   - ✅ Successfully transformed: table1, table2
   - ❌ Failed: table3 (error message)
2. **Preserve Successful Work:**
   - Keep files from successful transformations
   - Allow retry of only the failed tables
3. **Git Safety:**
   - Only execute the git workflow if ALL tables succeed
   - Otherwise, keep changes local for review
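Continuing the hypothetical `run_batch` sketch from Step 4, the status report and the all-or-nothing git gate might look like this:

```python
def report_status(succeeded: list[dict], failed: list[dict]) -> bool:
    """Print the batch status; return True only when it is safe to run git."""
    if succeeded:
        print("✅ Successfully transformed:", ", ".join(r["table"] for r in succeeded))
    for f in failed:
        print(f"❌ Failed: {f['table']} ({f['error']})")
    if failed:  # offer a retry command covering only the failed tables
        print("Retry with: Transform tables:", ", ".join(f["table"] for f in failed))
    return not failed  # the git workflow runs only on full success
```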
### Full Failure Scenario
If all tables fail:
- Report detailed error for each table
- No git workflow execution
- Provide troubleshooting guidance
## Next Steps After Batch Transformation

1. **Review Pull Request:**

   ```
   Title: "Batch transform 5 tables to staging"
   Body:
   - Transformed tables: table1, table2, table3, table4, table5
   - Engine: Presto/Trino
   - All validation gates passed ✅
   - Files created: 15 SQL files, 1 config update
   ```

2. **Verify Generated Files:**

   ```bash
   # For Presto
   ls -l staging/queries/
   ls -l staging/init_queries/
   cat staging/config/src_params.yml

   # For Hive
   ls -l staging_hive/queries/
   cat staging_hive/config/src_params.yml
   ```

3. **Test Workflow:**

   ```bash
   cd staging   # or staging_hive
   td wf push
   td wf run staging_transformation.dig   # or staging_hive.dig
   ```

4. **Monitor All Tables:**

   ```sql
   SELECT table_name, inc_value, project_name
   FROM client_config.inc_log
   WHERE table_name IN ('table1', 'table2', 'table3')
   ORDER BY inc_value DESC
   ```
## Performance Comparison
| Tables | Sequential Time | Parallel Time | Speedup |
|---|---|---|---|
| 2 | ~10 min | ~5 min | 2x |
| 3 | ~15 min | ~5 min | 3x |
| 5 | ~25 min | ~5 min | 5x |
| 10 | ~50 min | ~5 min | 10x |
Note: Actual times vary based on table complexity and data volume.
## Production-Ready Guarantee
All batch transformations will:
- ✅ Execute in parallel for maximum speed
- ✅ Maintain complete quality for each table
- ✅ Provide atomic git workflow (all or nothing)
- ✅ Include comprehensive error handling
- ✅ Generate maintainable code
- ✅ Match production standards exactly
Ready to proceed? Please provide your table list and I'll launch parallel sub-agents for maximum efficiency!
**Format Examples:**

- `Transform tables: table1, table2, table3` (same database)
- `Transform client_src.table1, client_src.table2` (explicit database)
- `Transform table1 using Hive, table2 using Presto` (mixed engines)