Initial commit
295
commands/transform-batch.md
Normal file
@@ -0,0 +1,295 @@
---
name: transform-batch
description: Transform multiple database tables in parallel with maximum efficiency
---

# Transform Multiple Tables to Staging (Batch Mode)

## ⚠️ CRITICAL: This command enables parallel processing for 3x-10x faster transformations

I'll help you transform multiple database tables to staging format using parallel sub-agent execution for maximum performance.

---

## Required Information

Please provide the following details:

### 1. Source Tables
- **Table List**: Comma-separated list of tables (e.g., `table1, table2, table3`)
- **Format**: `database.table_name` or just `table_name` (if same database)
- **Example**: `client_src.customers_histunion, client_src.orders_histunion, client_src.products_histunion`

### 2. Source Configuration
- **Source Database**: Database containing tables (e.g., `client_src`)
- **Staging Database**: Target database (default: `client_stg`)
- **Lookup Database**: Reference database for rules (default: `client_config`)

### 3. SQL Engine (Optional)
- **Engine**: Choose one:
  - `presto` or `trino` - Presto/Trino SQL engine (default)
  - `hive` - Hive SQL engine
  - `mixed` - Specify engine per table
- If not specified, will default to Presto/Trino for all tables

### 4. Mixed Engine Example (Optional)
If you need different engines for different tables:
```
Transform table1 using Hive, table2 using Presto, table3 using Hive
```

---

## What I'll Do

### Step 1: Parse Table List
I will extract individual tables from your input:
- Parse comma-separated list
- Detect database prefix for each table
- Identify total table count

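As a rough illustration of the parsing step, the sketch below splits a comma-separated list and separates an optional database prefix from each table name. The names `TableRef`, `parse_table_list`, and `default_db` are hypothetical and are not part of the command itself.

```python
from typing import List, NamedTuple, Optional


class TableRef(NamedTuple):
    """One parsed entry from the table list (hypothetical helper type)."""
    database: Optional[str]
    table: str


def parse_table_list(raw: str, default_db: Optional[str] = None) -> List[TableRef]:
    """Split a comma-separated table list and detect each database prefix."""
    refs: List[TableRef] = []
    for item in raw.split(","):
        name = item.strip()
        if not name:
            continue  # ignore empty entries such as a trailing comma
        if "." in name:
            # `database.table_name` form: split on the first dot only
            database, table = name.split(".", 1)
        else:
            # bare `table_name` form: fall back to the shared database
            database, table = default_db, name
        refs.append(TableRef(database, table))
    return refs


tables = parse_table_list(
    "client_src.customers_histunion, orders_histunion", default_db="client_src"
)
table_count = len(tables)
```

The total table count then follows directly from the length of the parsed list.
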
### Step 2: Detect Engine Strategy
I will determine processing strategy:
- **Single Engine**: All tables use same engine
  - Presto/Trino (default) → All tables to `staging-transformer-presto`
  - Hive → All tables to `staging-transformer-hive`
- **Mixed Engines**: Different engines per table
  - Parse engine specification per table
  - Route each table to appropriate sub-agent

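The routing rule above could be sketched as follows. This assumes the leading `Transform` keyword has already been stripped from the request and that each entry looks like `table` or `table using Engine`; the function name `route_tables` and the regular expression are illustrative only.

```python
import re
from typing import Dict

# Sub-agent names as given in this document; presto and trino share one agent.
AGENTS = {
    "presto": "staging-transformer-presto",
    "trino": "staging-transformer-presto",
    "hive": "staging-transformer-hive",
}


def route_tables(spec: str, default_engine: str = "presto") -> Dict[str, str]:
    """Map each table in `spec` to the sub-agent that should process it."""
    routes: Dict[str, str] = {}
    for item in spec.split(","):
        match = re.fullmatch(
            r"\s*(\S+)(?:\s+using\s+(\w+))?\s*", item, flags=re.IGNORECASE
        )
        if match is None:
            raise ValueError(f"cannot parse table entry: {item!r}")
        table = match.group(1)
        # Entries without an explicit engine fall back to the default.
        engine = (match.group(2) or default_engine).lower()
        if engine not in AGENTS:
            raise ValueError(f"unknown engine {engine!r} for table {table!r}")
        routes[table] = AGENTS[engine]
    return routes


routes = route_tables("table1 using Hive, table2 using Presto, table3")
```
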
### Step 3: Launch Parallel Sub-Agents
I will create parallel sub-agent calls:
- **ONE sub-agent per table** (maximum parallelism)
- **Single message with multiple Task calls** (concurrent execution)
- **Each sub-agent processes independently** (no blocking)
- **All sub-agents skip git workflow** (consolidated at end)

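The real sub-agents are launched as Task calls in a single message, not from Python. As an analogy only, the fan-out pattern resembles starting one coroutine per table and awaiting them together; `run_sub_agent` below is a hypothetical stand-in whose body is a placeholder.

```python
import asyncio
from typing import Dict, List


async def run_sub_agent(table: str, agent: str) -> Dict[str, str]:
    """Hypothetical stand-in for one sub-agent run (no real transformation)."""
    await asyncio.sleep(0.01)  # placeholder for the per-table work
    return {"table": table, "agent": agent, "status": "ok"}


async def launch_all(routes: Dict[str, str]) -> List[Dict[str, str]]:
    # One coroutine per table, all started together so none blocks the others.
    jobs = [run_sub_agent(table, agent) for table, agent in routes.items()]
    # gather() returns results in the same order as the submitted jobs.
    return await asyncio.gather(*jobs)


results = asyncio.run(
    launch_all(
        {
            "A": "staging-transformer-presto",
            "B": "staging-transformer-presto",
            "C": "staging-transformer-presto",
        }
    )
)
```
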
### Step 4: Monitor Parallel Execution
I will track all sub-agent progress:
- Wait for all sub-agents to complete
- Collect results from each transformation
- Identify any failures or errors
- Report partial success if needed

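Once every sub-agent has finished, the outcomes need to be split into successes and failures. A minimal sketch, assuming each outcome is either a result object or an `Exception` captured from the run (the function name `partition_outcomes` is hypothetical):

```python
from typing import Dict, List, Tuple


def partition_outcomes(
    outcomes: Dict[str, object],
) -> Tuple[List[str], Dict[str, str]]:
    """Split per-table outcomes into succeeded tables and failure messages."""
    succeeded: List[str] = []
    failed: Dict[str, str] = {}
    for table, outcome in outcomes.items():
        if isinstance(outcome, Exception):
            failed[table] = str(outcome)  # keep the message for the report
        else:
            succeeded.append(table)
    return succeeded, failed


succeeded, failed = partition_outcomes(
    {
        "table1": {"status": "ok"},
        "table2": {"status": "ok"},
        "table3": RuntimeError("error message"),
    }
)
```

A non-empty `failed` mapping corresponds to the partial-success case described in this document.
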
### Step 5: Consolidate Results
After ALL tables complete successfully:
1. **Aggregate file changes** across all tables
2. **Execute single git workflow**:
   - Create feature branch
   - Commit all changes together
   - Push to remote
   - Create comprehensive PR
3. **Report complete summary**

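One way to picture the single git workflow is as an ordered list of commands built once for the whole batch. The sketch below only constructs the commands and does not run them. It assumes a remote named `origin` and the GitHub CLI (`gh`) for opening the PR; neither is stated in this document, and the branch name and title in the usage lines are made up.

```python
from typing import List


def git_workflow_commands(
    branch: str, files: List[str], title: str, body: str
) -> List[List[str]]:
    """Return the commands for one consolidated workflow, in execution order."""
    if not files:
        raise ValueError("no changed files to commit")
    return [
        ["git", "checkout", "-b", branch],        # create feature branch
        ["git", "add", "--", *files],             # stage files from all tables
        ["git", "commit", "-m", title],           # commit all changes together
        ["git", "push", "-u", "origin", branch],  # push (assumes remote 'origin')
        ["gh", "pr", "create", "--title", title, "--body", body],  # assumes gh CLI
    ]


commands = git_workflow_commands(
    branch="feature/batch-transform",  # hypothetical branch name
    files=["staging/queries/client_src_orders_histunion.sql"],
    title="Batch transform 1 table to staging",
    body="- Transformed tables: orders_histunion",
)
```

Each entry could then be passed to `subprocess.run(cmd, check=True)` so that the sequence stops at the first failing command.
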
---

## Processing Strategy

### Parallel Processing (Recommended for 2+ Tables)
```
User requests: "Transform tables A, B, C"

Main Claude creates 3 parallel sub-agent calls:

┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
│ Sub-Agent 1     │  │ Sub-Agent 2     │  │ Sub-Agent 3     │
│ (Table A)       │  │ (Table B)       │  │ (Table C)       │
│ staging-        │  │ staging-        │  │ staging-        │
│ transformer-    │  │ transformer-    │  │ transformer-    │
│ presto          │  │ presto          │  │ presto          │
└─────────────────┘  └─────────────────┘  └─────────────────┘
         ↓                    ↓                    ↓
   [Files for A]        [Files for B]        [Files for C]
         ↓                    ↓                    ↓
         └────────────────────┴────────────────────┘
                              ↓
                 [Consolidated Git Workflow]
                 [Single PR with all tables]
```

### Performance Benefits:
- **Speed**: N tables finish in roughly the time of one table (~1x) instead of N × that time
- **Efficiency**: Optimal resource utilization
- **User Experience**: Faster results for batch operations
- **Scalability**: Can handle 10+ tables efficiently

---

## Quality Assurance (Per Table)

Each sub-agent ensures complete compliance:

- ✅ **Column Limit Management** (max 200 columns)
- ✅ **JSON Detection & Extraction** (automatic)
- ✅ **Date Processing** (4 outputs per date column)
- ✅ **Email/Phone Validation** (with hashing)
- ✅ **String Standardization** (UPPER, TRIM, NULL handling)
- ✅ **Deduplication Logic** (if configured)
- ✅ **Join Processing** (if specified)
- ✅ **Incremental Processing** (state tracking)
- ✅ **SQL File Creation** (init, incremental, upsert)
- ✅ **DIG File Management** (conditional creation)
- ✅ **Configuration Update** (src_params.yml)
- ✅ **Treasure Data Compatibility** (VARCHAR/BIGINT timestamps)

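To make the string standardization item concrete, here is a small Python analogy of what UPPER, TRIM, and NULL handling could mean for a single value. The generated transformations are SQL, and the exact NULL rule is not specified here; treating empty strings and the literal text `NULL` as missing is an assumption made for this sketch.

```python
from typing import Optional


def standardize_string(value: Optional[str]) -> Optional[str]:
    """Trim, upper-case, and normalize missing values (illustrative only)."""
    if value is None:
        return None
    cleaned = value.strip().upper()
    # Assumption: empty strings and the literal text "NULL" count as missing.
    if cleaned in ("", "NULL"):
        return None
    return cleaned
```
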
---

## Output Files

### For Presto/Trino Engine (per table):
- `staging/init_queries/{source_db}_{table}_init.sql`
- `staging/queries/{source_db}_{table}.sql`
- `staging/queries/{source_db}_{table}_upsert.sql` (if dedup)
- Updated `staging/config/src_params.yml` (all tables)
- `staging/staging_transformation.dig` (created once if not exists)

### For Hive Engine (per table):
- `staging_hive/queries/{source_db}_{table}.sql`
- Updated `staging_hive/config/src_params.yml` (all tables)
- `staging_hive/staging_hive.dig` (created once if not exists)
- Template files (created once if not exist)

### Plus:
- Single git commit with all tables
- Comprehensive pull request
- Complete validation report for all tables

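The per-table SQL paths listed above follow a fixed naming pattern, so they can be derived from the database, table, and engine. The helper below is a hypothetical convenience for checking that a batch produced the expected files; it covers only the per-table SQL files, not the shared config, `.dig`, or template files.

```python
from typing import List


def expected_sql_files(
    source_db: str, table: str, engine: str = "presto", dedup: bool = False
) -> List[str]:
    """List the per-table SQL files named in this document for one table."""
    name = f"{source_db}_{table}"
    if engine in ("presto", "trino"):
        files = [
            f"staging/init_queries/{name}_init.sql",
            f"staging/queries/{name}.sql",
        ]
        if dedup:
            # the upsert query exists only when deduplication is configured
            files.append(f"staging/queries/{name}_upsert.sql")
        return files
    if engine == "hive":
        return [f"staging_hive/queries/{name}.sql"]
    raise ValueError(f"unknown engine: {engine!r}")


presto_files = expected_sql_files("client_src", "orders_histunion", dedup=True)
hive_files = expected_sql_files("client_src", "events_histunion", engine="hive")
```
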
---

## Example Usage

### Example 1: Same Engine (Presto Default)
```
User: Transform tables: client_src.customers_histunion, client_src.orders_histunion, client_src.products_histunion

→ Parallel execution with 3 staging-transformer-presto agents
→ All files to staging/ directory
→ Single consolidated git workflow
→ Time: ~1x (vs 3x sequential)
```

### Example 2: Same Engine (Hive Explicit)
```
User: Transform tables using Hive: client_src.events_histunion, client_src.profiles_histunion

→ Parallel execution with 2 staging-transformer-hive agents
→ All files to staging_hive/ directory
→ Single consolidated git workflow
→ Time: ~1x (vs 2x sequential)
```

### Example 3: Mixed Engines
```
User: Transform table1 using Hive, table2 using Presto, table3 using Hive

→ Parallel execution:
  - Table1 → staging-transformer-hive
  - Table2 → staging-transformer-presto
  - Table3 → staging-transformer-hive
→ Files distributed to appropriate directories
→ Single consolidated git workflow
→ Time: ~1x (vs 3x sequential)
```

---

## Error Handling

### Partial Success Scenario
If some tables succeed and others fail:

1. **Report Clear Status**:
   ```
   ✅ Successfully transformed: table1, table2
   ❌ Failed: table3 (error message)
   ```

2. **Preserve Successful Work**:
   - Keep files from successful transformations
   - Allow retry of only failed tables

3. **Git Safety**:
   - Only execute git workflow if ALL tables succeed
   - Otherwise, keep changes local for review

### Full Failure Scenario
If all tables fail:
- Report detailed error for each table
- No git workflow execution
- Provide troubleshooting guidance

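The status report and the git safety rule can be expressed as two small functions. This is a sketch under the assumption that successes arrive as a list of table names and failures as a mapping from table name to error message; both function names are hypothetical.

```python
from typing import Dict, List


def build_status_report(succeeded: List[str], failed: Dict[str, str]) -> str:
    """Format the per-batch status in the style shown in this section."""
    lines = []
    if succeeded:
        lines.append("✅ Successfully transformed: " + ", ".join(succeeded))
    for table, message in failed.items():
        lines.append(f"❌ Failed: {table} ({message})")
    return "\n".join(lines)


def should_run_git_workflow(succeeded: List[str], failed: Dict[str, str]) -> bool:
    """Git runs only when at least one table succeeded and none failed."""
    return bool(succeeded) and not failed


report = build_status_report(["table1", "table2"], {"table3": "error message"})
run_git = should_run_git_workflow(["table1", "table2"], {"table3": "error message"})
retry_list = ["table3"]  # only the failed tables need to be resubmitted
```

In the partial-success case `run_git` is `False`, so the files from `table1` and `table2` stay on disk for review while only `table3` is retried.
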
---

## Next Steps After Batch Transformation

1. **Review Pull Request**:
   ```
   Title: "Batch transform 5 tables to staging"

   Body:
   - Transformed tables: table1, table2, table3, table4, table5
   - Engine: Presto/Trino
   - All validation gates passed ✅
   - Files created: 15 SQL files, 1 config update
   ```

2. **Verify Generated Files**:
   ```bash
   # For Presto
   ls -l staging/queries/
   ls -l staging/init_queries/
   cat staging/config/src_params.yml

   # For Hive
   ls -l staging_hive/queries/
   cat staging_hive/config/src_params.yml
   ```

3. **Test Workflow**:
   ```bash
   cd staging  # or staging_hive
   td wf push
   td wf run staging_transformation.dig  # or staging_hive.dig
   ```

4. **Monitor All Tables**:
   ```sql
   SELECT table_name, inc_value, project_name
   FROM client_config.inc_log
   WHERE table_name IN ('table1', 'table2', 'table3')
   ORDER BY inc_value DESC
   ```

---

## Performance Comparison

| Tables | Sequential Time | Parallel Time | Speedup |
|--------|----------------|---------------|---------|
| 2 | ~10 min | ~5 min | 2x |
| 3 | ~15 min | ~5 min | 3x |
| 5 | ~25 min | ~5 min | 5x |
| 10 | ~50 min | ~5 min | 10x |

**Note**: Actual times vary based on table complexity and data volume.

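The figures in the table follow an idealized model: every table takes about the same time (~5 min here), and all sub-agents truly run at once with no overhead. Under those assumptions, which real batches will only approximate, the arithmetic is:

```python
from typing import Tuple


def estimate_minutes(
    table_count: int, minutes_per_table: float = 5.0
) -> Tuple[float, float, float]:
    """Return (sequential, parallel, speedup) under the idealized model."""
    if table_count < 1:
        raise ValueError("table_count must be at least 1")
    sequential = table_count * minutes_per_table  # tables run one after another
    parallel = minutes_per_table                  # all tables run at the same time
    return sequential, parallel, sequential / parallel


sequential, parallel, speedup = estimate_minutes(3)
```

For three tables this gives 15 minutes sequential, 5 minutes parallel, and a 3x speedup, matching the second row of the table. When tables differ in size, the parallel time is bounded by the slowest table rather than the average.
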
---

## Production-Ready Guarantee

All batch transformations will:
- ✅ Execute in parallel for maximum speed
- ✅ Maintain complete quality for each table
- ✅ Provide atomic git workflow (all or nothing)
- ✅ Include comprehensive error handling
- ✅ Generate maintainable code
- ✅ Match production standards exactly

---

**Ready to proceed? Please provide your table list and I'll launch parallel sub-agents for maximum efficiency!**

**Format Examples:**
- `Transform tables: table1, table2, table3` (same database)
- `Transform client_src.table1, client_src.table2` (explicit database)
- `Transform table1 using Hive, table2 using Presto` (mixed engines)