---
name: histunion-batch
description: Create hist-union workflows for multiple tables in batch with parallel processing
---

# Create Batch Hist-Union Workflows

## ⚠️ CRITICAL: This command processes multiple tables efficiently with schema validation

I'll help you create hist-union workflows for multiple tables at once, with proper schema validation for each table.

---

## Required Information

### 1. Table List
Provide table names in any format (comma-separated or one per line):

**Option A - Base names:**
```
client_src.klaviyo_events, client_src.shopify_products, client_src.onetrust_profiles
```

**Option B - Hist names:**
```
client_src.klaviyo_events_hist
client_src.shopify_products_hist
client_src.onetrust_profiles_hist
```

**Option C - Mixed formats:**
```
client_src.klaviyo_events, client_src.shopify_products_hist, client_src.onetrust_profiles
```

**Option D - List format:**
```
- client_src.klaviyo_events
- client_src.shopify_products
- client_src.onetrust_profiles
```

### 2. Lookup Database (Optional)
- **Lookup/Config Database**: Database for inc_log watermark table
- **Default**: `client_config` (will be used if not specified)

---

## What I'll Do

### Step 1: Parse All Table Names
I will parse and normalize all table names:
```
For each table in list:
1. Extract database and base name
2. Remove _hist or _histunion suffix if present
3. Derive:
   - Inc table: {database}.{base_name}
   - Hist table: {database}.{base_name}_hist
   - Target table: {database}.{base_name}_histunion
```

### Step 2: Get Schemas for All Tables via MCP Tool
**CRITICAL**: I will get exact schemas for EVERY table:
```
For each table:
1. Call mcp__mcc_treasuredata__describe_table for inc table
   - Get complete column list
   - Get exact column order
   - Get data types

2. Call mcp__mcc_treasuredata__describe_table for hist table
   - Get complete column list
   - Get exact column order
   - Get data types

3. Compare schemas:
   - Document column differences
   - Note any extra columns in inc vs hist
   - Record exact column order
```

**Note**: This may require multiple MCP calls. I'll process them efficiently.

### Step 3: Check Full Load Status for Each Table
I will check each table against full load list:
```
For each table:
IF table_name IN ('klaviyo_lists', 'klaviyo_metric_data'):
    template[table] = 'FULL_LOAD'  # Case 3
ELSE:
    IF inc_schema == hist_schema:
        template[table] = 'IDENTICAL'  # Case 1
    ELSE:
        template[table] = 'EXTRA_COLUMNS'  # Case 2
```

### Step 4: Generate SQL Files for All Tables
I will create SQL file for each table in ONE response:
```
For each table, create: hist_union/queries/{base_name}.sql

With correct template based on schema analysis:
- Case 1: Identical schemas
- Case 2: Inc has extra columns
- Case 3: Full load

All files created in parallel using multiple Write tool calls
```

### Step 5: Update Digdag Workflow
I will update workflow with all tables:
```
File: hist_union/hist_union_runner.dig

Structure:
+hist_union_tasks:
  _parallel: true

  +{table1_name}_histunion:
    td>: queries/{table1_name}.sql

  +{table2_name}_histunion:
    td>: queries/{table2_name}.sql

  +{table3_name}_histunion:
    td>: queries/{table3_name}.sql

  ... (all tables)
```

### Step 6: Verify Quality Gates for All Tables
Before delivering, I will verify for EACH table:
```
For each table:
✅ MCP tool used for both inc and hist schemas
✅ Schema differences identified
✅ Correct template selected
✅ All inc columns present in exact order
✅ NULL handling correct for missing columns
✅ Watermarks included for both hist and inc
✅ Parallel execution configured
```

---

## Batch Processing Strategy

### Efficient MCP Usage
```
1. Collect all table names first
2. Make MCP calls for all inc tables
3. Make MCP calls for all hist tables
4. Compare all schemas in batch
5. Generate all SQL files in ONE response
6. Update workflow once with all tasks
```

### Parallel File Generation
I will use multiple Write tool calls in a SINGLE response:
```
Single Response Contains:
- Write: hist_union/queries/table1.sql
- Write: hist_union/queries/table2.sql
- Write: hist_union/queries/table3.sql
- ... (all tables)
- Edit: hist_union/hist_union_runner.dig (add all tasks)
```

---

## Output

I will generate:

### For N Tables:
1. **hist_union/queries/{table1}.sql** - SQL for table 1
2. **hist_union/queries/{table2}.sql** - SQL for table 2
3. **hist_union/queries/{table3}.sql** - SQL for table 3
4. ... (one SQL file per table)
5. **hist_union/hist_union_runner.dig** - Updated workflow with all tables

### Workflow Structure:
```yaml
timezone: UTC

_export:
  td:
    database: {database}
  lkup_db: {lkup_db}

+create_inc_log_table:
  td>:
  query: |
    CREATE TABLE IF NOT EXISTS ${lkup_db}.inc_log (
      table_name varchar,
      project_name varchar,
      inc_value bigint
    )

+hist_union_tasks:
  _parallel: true

  +table1_histunion:
    td>: queries/table1.sql

  +table2_histunion:
    td>: queries/table2.sql

  +table3_histunion:
    td>: queries/table3.sql

  # ... all tables processed in parallel
```

---

## Progress Reporting

During processing, I will report:

### Phase 1: Parsing
```
Parsing table names...
✅ Found 5 tables to process:
  1. client_src.klaviyo_events
  2. client_src.shopify_products
  3. client_src.onetrust_profiles
  4. client_src.klaviyo_lists (FULL LOAD)
  5. client_src.users
```

### Phase 2: Schema Retrieval
```
Retrieving schemas via MCP tool...
✅ Got schema for client_src.klaviyo_events (inc)
✅ Got schema for client_src.klaviyo_events_hist (hist)
✅ Got schema for client_src.shopify_products (inc)
✅ Got schema for client_src.shopify_products_hist (hist)
... (all tables)
```

### Phase 3: Schema Analysis
```
Analyzing schemas...
✅ Table 1: Identical schemas - Use Case 1
✅ Table 2: Inc has extra 'incremental_date' - Use Case 2
✅ Table 3: Identical schemas - Use Case 1
✅ Table 4: FULL LOAD - Use Case 3
✅ Table 5: Identical schemas - Use Case 1
```

### Phase 4: File Generation
```
Generating all files...
✅ Created hist_union/queries/klaviyo_events.sql
✅ Created hist_union/queries/shopify_products.sql
✅ Created hist_union/queries/onetrust_profiles.sql
✅ Created hist_union/queries/klaviyo_lists.sql (FULL LOAD)
✅ Created hist_union/queries/users.sql
✅ Updated hist_union/hist_union_runner.dig with 5 parallel tasks
```

---

## Special Handling

### Mixed Databases
If tables are from different databases:
```
✅ Supported - Each SQL file uses correct database
✅ Workflow uses primary database in _export
✅ Individual tasks can override if needed
```

### Full Load Tables in Batch
```
✅ Automatically detected (klaviyo_lists, klaviyo_metric_data)
✅ Uses Case 3 template (DROP + CREATE, no WHERE)
✅ Still updates watermarks
✅ Processed in parallel with other tables
```

### Schema Differences
```
✅ Each table analyzed independently
✅ NULL handling applied only where needed
✅ Exact column order maintained per table
✅ Template selection per table based on schema
```

---

## Performance Benefits

### Why Batch Processing?
- ✅ **Faster**: All files created in one response
- ✅ **Consistent**: Single workflow file with all tasks
- ✅ **Efficient**: Parallel MCP calls where possible
- ✅ **Complete**: All tables configured together
- ✅ **Parallel Execution**: All tasks run concurrently in Treasure Data

### Execution Efficiency
```
Sequential Processing:
Table 1: 10 min
Table 2: 10 min
Table 3: 10 min
Total: 30 minutes

Parallel Processing:
All tables: ~10 minutes (depending on slowest table)
```

---

## Next Steps After Generation

1. **Review All Generated Files**:
   ```bash
   ls -la hist_union/queries/
   cat hist_union/hist_union_runner.dig
   ```

2. **Verify Workflow Syntax**:
   ```bash
   cd hist_union
   td wf check hist_union_runner.dig
   ```

3. **Run Batch Workflow**:
   ```bash
   td wf run hist_union_runner.dig
   ```

4. **Monitor Progress**:
   ```bash
   td wf logs hist_union_runner.dig
   ```

5. **Verify All Results**:
   ```sql
   -- Check watermarks for all tables
   SELECT * FROM {lkup_db}.inc_log
   WHERE project_name = 'hist_union'
   ORDER BY table_name;

   -- Check row counts for all histunion tables
   SELECT
     '{table1}_histunion' as table_name,
     COUNT(*) as row_count
   FROM {database}.{table1}_histunion
   UNION ALL
   SELECT
     '{table2}_histunion',
     COUNT(*)
   FROM {database}.{table2}_histunion
   -- ... (for all tables)
   ```

---

## Example

### Input
```
Create hist-union for these tables:
- client_src.klaviyo_events
- client_src.shopify_products_hist
- client_src.onetrust_profiles
- client_src.klaviyo_lists
```

### Output Summary
```
✅ Processed 4 tables:

1. klaviyo_events (Incremental - Case 1: Identical schemas)
   - Inc: client_src.klaviyo_events
   - Hist: client_src.klaviyo_events_hist
   - Target: client_src.klaviyo_events_histunion

2. shopify_products (Incremental - Case 2: Inc has extra columns)
   - Inc: client_src.shopify_products
   - Hist: client_src.shopify_products_hist
   - Target: client_src.shopify_products_histunion
   - Extra columns in inc: incremental_date

3. onetrust_profiles (Incremental - Case 1: Identical schemas)
   - Inc: client_src.onetrust_profiles
   - Hist: client_src.onetrust_profiles_hist
   - Target: client_src.onetrust_profiles_histunion

4. klaviyo_lists (FULL LOAD - Case 3)
   - Inc: client_src.klaviyo_lists
   - Hist: client_src.klaviyo_lists_hist
   - Target: client_src.klaviyo_lists_histunion

Created 4 SQL files + 1 workflow file
All tasks configured for parallel execution
```

---

## Production-Ready Guarantee

All generated code will:
- ✅ Use exact schemas from MCP tool for every table
- ✅ Handle schema differences correctly per table
- ✅ Use correct template based on individual table analysis
- ✅ Process all tables in parallel for maximum efficiency
- ✅ Maintain exact column order per table
- ✅ Include proper NULL handling where needed
- ✅ Update watermarks for all tables
- ✅ Follow Presto/Trino SQL syntax
- ✅ Be production-tested and proven

---

**Ready to proceed? Please provide your list of tables and I'll generate complete hist-union workflows for all of them using exact schemas from MCP tool and production-tested templates.**