Initial commit

Zhongwei Li
2025-11-30 09:02:36 +08:00
commit 19e906ecca
7 changed files with 1584 additions and 0 deletions

commands/histunion-batch.md Normal file
@@ -0,0 +1,420 @@
---
name: histunion-batch
description: Create hist-union workflows for multiple tables in batch with parallel processing
---
# Create Batch Hist-Union Workflows
## ⚠️ CRITICAL: This command processes multiple tables efficiently with schema validation
I'll help you create hist-union workflows for multiple tables at once, with proper schema validation for each table.
---
## Required Information
### 1. Table List
Provide table names in any format (comma-separated or one per line):
**Option A - Base names:**
```
client_src.klaviyo_events, client_src.shopify_products, client_src.onetrust_profiles
```
**Option B - Hist names:**
```
client_src.klaviyo_events_hist
client_src.shopify_products_hist
client_src.onetrust_profiles_hist
```
**Option C - Mixed formats:**
```
client_src.klaviyo_events, client_src.shopify_products_hist, client_src.onetrust_profiles
```
**Option D - List format:**
```
- client_src.klaviyo_events
- client_src.shopify_products
- client_src.onetrust_profiles
```
### 2. Lookup Database (Optional)
- **Lookup/Config Database**: Database for inc_log watermark table
- **Default**: `client_config` (will be used if not specified)
---
## What I'll Do
### Step 1: Parse All Table Names
I will parse and normalize all table names:
```
For each table in list:
1. Extract database and base name
2. Remove _hist or _histunion suffix if present
3. Derive:
- Inc table: {database}.{base_name}
- Hist table: {database}.{base_name}_hist
- Target table: {database}.{base_name}_histunion
```
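A minimal Python sketch of this normalization (illustrative only; the actual command parses free-form input, and the helper name is hypothetical):

```python
def normalize(table: str) -> dict:
    """Derive inc/hist/histunion names from any accepted input form."""
    # Accept "- db.table" list items as well as bare "db.table" names
    database, _, name = table.strip().lstrip("- ").partition(".")
    # Strip a trailing _histunion or _hist suffix to recover the base name
    for suffix in ("_histunion", "_hist"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    return {
        "inc": f"{database}.{name}",
        "hist": f"{database}.{name}_hist",
        "target": f"{database}.{name}_histunion",
    }
```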
### Step 2: Get Schemas for All Tables via MCP Tool
**CRITICAL**: I will get exact schemas for EVERY table:
```
For each table:
1. Call mcp__mcc_treasuredata__describe_table for inc table
- Get complete column list
- Get exact column order
- Get data types
2. Call mcp__mcc_treasuredata__describe_table for hist table
- Get complete column list
- Get exact column order
- Get data types
3. Compare schemas:
- Document column differences
- Note any extra columns in inc vs hist
- Record exact column order
```
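The comparison step can be sketched like this (the ordered `(column, type)` list shape is an assumption about the describe_table output, not the documented MCP payload):

```python
def compare_schemas(inc_cols, hist_cols):
    """Compare two schemas given as ordered (column, type) lists."""
    hist_names = {c for c, _ in hist_cols}
    # Columns present in inc but absent from hist need NULL padding later
    extra_in_inc = [c for c, _ in inc_cols if c not in hist_names]
    # Identical means same columns in the same order
    identical = [c for c, _ in inc_cols] == [c for c, _ in hist_cols]
    return {"identical": identical, "extra_in_inc": extra_in_inc}
```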
**Note**: This may require multiple MCP calls. I'll process them efficiently.
### Step 3: Check Full Load Status for Each Table
I will check each table against full load list:
```
For each table:
IF table_name IN ('klaviyo_lists', 'klaviyo_metric_data'):
template[table] = 'FULL_LOAD' # Case 3
ELSE:
IF inc_schema == hist_schema:
template[table] = 'IDENTICAL' # Case 1
ELSE:
template[table] = 'EXTRA_COLUMNS' # Case 2
```
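The decision above reduces to a small function (a sketch; the full-load table list mirrors the one in this document):

```python
FULL_LOAD_TABLES = {"klaviyo_lists", "klaviyo_metric_data"}

def select_template(base_name: str, inc_schema, hist_schema) -> str:
    """Pick the SQL template case for one table."""
    if base_name in FULL_LOAD_TABLES:
        return "FULL_LOAD"      # Case 3: DROP + CREATE, no WHERE
    if inc_schema == hist_schema:
        return "IDENTICAL"      # Case 1: plain UNION ALL
    return "EXTRA_COLUMNS"      # Case 2: NULL-pad the hist SELECT
```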
### Step 4: Generate SQL Files for All Tables
I will create a SQL file for each table in ONE response:
```
For each table, create: hist_union/queries/{base_name}.sql
With correct template based on schema analysis:
- Case 1: Identical schemas
- Case 2: Inc has extra columns
- Case 3: Full load
All files created in parallel using multiple Write tool calls
```
### Step 5: Update Digdag Workflow
I will update the workflow with all tables:
```
File: hist_union/hist_union_runner.dig
Structure:
+hist_union_tasks:
_parallel: true
+{table1_name}_histunion:
td>: queries/{table1_name}.sql
+{table2_name}_histunion:
td>: queries/{table2_name}.sql
+{table3_name}_histunion:
td>: queries/{table3_name}.sql
... (all tables)
```
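Rendering that task block is mechanical; a sketch under the assumption that inputs are already-normalized base names:

```python
def render_tasks(base_names):
    """Render the +hist_union_tasks block for a list of base table names."""
    lines = ["+hist_union_tasks:", "  _parallel: true"]
    for name in base_names:
        lines.append(f"  +{name}_histunion:")
        lines.append(f"    td>: queries/{name}.sql")
    return "\n".join(lines)
```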
### Step 6: Verify Quality Gates for All Tables
Before delivering, I will verify for EACH table:
```
For each table:
✅ MCP tool used for both inc and hist schemas
✅ Schema differences identified
✅ Correct template selected
✅ All inc columns present in exact order
✅ NULL handling correct for missing columns
✅ Watermarks included for both hist and inc
✅ Parallel execution configured
```
---
## Batch Processing Strategy
### Efficient MCP Usage
```
1. Collect all table names first
2. Make MCP calls for all inc tables
3. Make MCP calls for all hist tables
4. Compare all schemas in batch
5. Generate all SQL files in ONE response
6. Update workflow once with all tasks
```
### Parallel File Generation
I will use multiple Write tool calls in a SINGLE response:
```
Single Response Contains:
- Write: hist_union/queries/table1.sql
- Write: hist_union/queries/table2.sql
- Write: hist_union/queries/table3.sql
- ... (all tables)
- Edit: hist_union/hist_union_runner.dig (add all tasks)
```
---
## Output
I will generate:
### For N Tables:
1. **hist_union/queries/{table1}.sql** - SQL for table 1
2. **hist_union/queries/{table2}.sql** - SQL for table 2
3. **hist_union/queries/{table3}.sql** - SQL for table 3
4. ... (one SQL file per table)
5. **hist_union/hist_union_runner.dig** - Updated workflow with all tables
### Workflow Structure:
```yaml
timezone: UTC
_export:
td:
database: {database}
lkup_db: {lkup_db}
+create_inc_log_table:
td>:
query: |
CREATE TABLE IF NOT EXISTS ${lkup_db}.inc_log (
table_name varchar,
project_name varchar,
inc_value bigint
)
+hist_union_tasks:
_parallel: true
+table1_histunion:
td>: queries/table1.sql
+table2_histunion:
td>: queries/table2.sql
+table3_histunion:
td>: queries/table3.sql
# ... all tables processed in parallel
```
---
## Progress Reporting
During processing, I will report:
### Phase 1: Parsing
```
Parsing table names...
✅ Found 5 tables to process:
1. client_src.klaviyo_events
2. client_src.shopify_products
3. client_src.onetrust_profiles
4. client_src.klaviyo_lists (FULL LOAD)
5. client_src.users
```
### Phase 2: Schema Retrieval
```
Retrieving schemas via MCP tool...
✅ Got schema for client_src.klaviyo_events (inc)
✅ Got schema for client_src.klaviyo_events_hist (hist)
✅ Got schema for client_src.shopify_products (inc)
✅ Got schema for client_src.shopify_products_hist (hist)
... (all tables)
```
### Phase 3: Schema Analysis
```
Analyzing schemas...
✅ Table 1: Identical schemas - Use Case 1
✅ Table 2: Inc has extra 'incremental_date' - Use Case 2
✅ Table 3: Identical schemas - Use Case 1
✅ Table 4: FULL LOAD - Use Case 3
✅ Table 5: Identical schemas - Use Case 1
```
### Phase 4: File Generation
```
Generating all files...
✅ Created hist_union/queries/klaviyo_events.sql
✅ Created hist_union/queries/shopify_products.sql
✅ Created hist_union/queries/onetrust_profiles.sql
✅ Created hist_union/queries/klaviyo_lists.sql (FULL LOAD)
✅ Created hist_union/queries/users.sql
✅ Updated hist_union/hist_union_runner.dig with 5 parallel tasks
```
---
## Special Handling
### Mixed Databases
If tables are from different databases:
```
✅ Supported - Each SQL file uses correct database
✅ Workflow uses primary database in _export
✅ Individual tasks can override if needed
```
### Full Load Tables in Batch
```
✅ Automatically detected (klaviyo_lists, klaviyo_metric_data)
✅ Uses Case 3 template (DROP + CREATE, no WHERE)
✅ Still updates watermarks
✅ Processed in parallel with other tables
```
### Schema Differences
```
✅ Each table analyzed independently
✅ NULL handling applied only where needed
✅ Exact column order maintained per table
✅ Template selection per table based on schema
```
---
## Performance Benefits
### Why Batch Processing?
- **Faster**: All files created in one response
- **Consistent**: Single workflow file with all tasks
- **Efficient**: Parallel MCP calls where possible
- **Complete**: All tables configured together
- **Parallel Execution**: All tasks run concurrently in Treasure Data
### Execution Efficiency
```
Sequential Processing:
Table 1: 10 min
Table 2: 10 min
Table 3: 10 min
Total: 30 minutes
Parallel Processing:
  All tables: ~10 minutes (bounded by the slowest table)
```
---
## Next Steps After Generation
1. **Review All Generated Files**:
```bash
ls -la hist_union/queries/
cat hist_union/hist_union_runner.dig
```
2. **Verify Workflow Syntax**:
```bash
cd hist_union
td wf check hist_union_runner.dig
```
3. **Run Batch Workflow**:
```bash
td wf run hist_union_runner.dig
```
4. **Monitor Progress**:
```bash
td wf logs hist_union_runner.dig
```
5. **Verify All Results**:
```sql
-- Check watermarks for all tables
SELECT * FROM {lkup_db}.inc_log
WHERE project_name = 'hist_union'
ORDER BY table_name;
-- Check row counts for all histunion tables
SELECT
'{table1}_histunion' as table_name,
COUNT(*) as row_count
FROM {database}.{table1}_histunion
UNION ALL
SELECT
'{table2}_histunion',
COUNT(*)
FROM {database}.{table2}_histunion
-- ... (for all tables)
```
---
## Example
### Input
```
Create hist-union for these tables:
- client_src.klaviyo_events
- client_src.shopify_products_hist
- client_src.onetrust_profiles
- client_src.klaviyo_lists
```
### Output Summary
```
✅ Processed 4 tables:
1. klaviyo_events (Incremental - Case 1: Identical schemas)
- Inc: client_src.klaviyo_events
- Hist: client_src.klaviyo_events_hist
- Target: client_src.klaviyo_events_histunion
2. shopify_products (Incremental - Case 2: Inc has extra columns)
- Inc: client_src.shopify_products
- Hist: client_src.shopify_products_hist
- Target: client_src.shopify_products_histunion
- Extra columns in inc: incremental_date
3. onetrust_profiles (Incremental - Case 1: Identical schemas)
- Inc: client_src.onetrust_profiles
- Hist: client_src.onetrust_profiles_hist
- Target: client_src.onetrust_profiles_histunion
4. klaviyo_lists (FULL LOAD - Case 3)
- Inc: client_src.klaviyo_lists
- Hist: client_src.klaviyo_lists_hist
- Target: client_src.klaviyo_lists_histunion
Created 4 SQL files + 1 workflow file
All tasks configured for parallel execution
```
---
## Production-Ready Guarantee
All generated code will:
- ✅ Use exact schemas from MCP tool for every table
- ✅ Handle schema differences correctly per table
- ✅ Use correct template based on individual table analysis
- ✅ Process all tables in parallel for maximum efficiency
- ✅ Maintain exact column order per table
- ✅ Include proper NULL handling where needed
- ✅ Update watermarks for all tables
- ✅ Follow Presto/Trino SQL syntax
- ✅ Be production-tested and proven
---
**Ready to proceed? Please provide your list of tables and I'll generate complete hist-union workflows for all of them using exact schemas from MCP tool and production-tested templates.**

@@ -0,0 +1,339 @@
---
name: histunion-create
description: Create hist-union workflow for combining historical and incremental table data
---
# Create Hist-Union Workflow
## ⚠️ CRITICAL: This command enforces strict schema validation and template adherence
I'll help you create a production-ready hist-union workflow to combine historical and incremental table data.
---
## Required Information
Please provide the following details:
### 1. Table Names
You can provide table names in any of these formats:
- **Base name**: `client_src.klaviyo_events` (I'll derive hist and histunion names)
- **Hist name**: `client_src.klaviyo_events_hist` (I'll derive inc and histunion names)
- **Explicit**: Inc: `client_src.klaviyo_events`, Hist: `client_src.klaviyo_events_hist`
### 2. Lookup Database (Optional)
- **Lookup/Config Database**: Database for inc_log watermark table
- **Default**: `client_config` (will be used if not specified)
---
## What I'll Do
### Step 1: Parse Table Names Intelligently
I will automatically derive all three table names:
```
From your input, I'll extract:
- Database name
- Base table name (removing _hist or _histunion if present)
- Inc table: {database}.{base_name}
- Hist table: {database}.{base_name}_hist
- Target table: {database}.{base_name}_histunion
```
### Step 2: Get Exact Schemas via MCP Tool (MANDATORY)
I will use MCP tool to get exact column information:
```
1. Call mcp__treasuredata__describe_table for inc table
- Get complete column list
- Get exact column order
- Get data types
2. Call mcp__treasuredata__describe_table for hist table
- Get complete column list
- Get exact column order
- Get data types
3. Compare schemas:
- Identify columns in inc but not in hist
- Identify any schema differences
- Document column order
```
### Step 3: Check Full Load Status
I will check if table requires full load processing:
```
IF table_name IN ('klaviyo_lists', 'klaviyo_metric_data'):
Use FULL LOAD template (Case 3)
- DROP TABLE and recreate
- Load ALL data (no WHERE clause)
- Still update watermarks
ELSE:
Use INCREMENTAL template (Case 1 or 2)
- CREATE TABLE IF NOT EXISTS
- Filter using inc_log watermarks
- Update watermarks after insert
```
### Step 4: Select Correct SQL Template
Based on schema comparison:
```
IF full_load_table:
Template = Case 3 (Full Load)
ELIF inc_schema == hist_schema:
Template = Case 1 (Identical schemas)
ELSE:
Template = Case 2 (Inc has extra columns)
```
### Step 5: Generate SQL File
I will create the SQL file with the exact schema:
```
File: hist_union/queries/{base_table_name}.sql
Structure:
- CREATE TABLE (or DROP + CREATE for full load)
- Use EXACT inc table schema
- Maintain exact column order
- INSERT INTO with UNION ALL:
- Historical SELECT
- Add NULL for columns missing in hist
- Use inc_log watermark (skip for full load)
- Incremental SELECT
- Use all columns in exact order
- Use inc_log watermark (skip for full load)
- UPDATE watermarks:
- Update hist table watermark
- Update inc table watermark
```
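The NULL-padding part of the historical SELECT can be sketched as follows (column lists are plain name lists in inc-table order; table and column names here are illustrative, not from a real schema):

```python
def hist_select(inc_cols, hist_cols, hist_table):
    """Build the historical SELECT, NULL-padding columns absent from hist."""
    hist_set = set(hist_cols)
    # Keep inc-table column order; substitute NULL where hist lacks a column
    exprs = [c if c in hist_set else f"NULL AS {c}" for c in inc_cols]
    return "SELECT\n  " + ",\n  ".join(exprs) + f"\nFROM {hist_table}"
```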
### Step 6: Create or Update Digdag Workflow
I will update the workflow file:
```
File: hist_union/hist_union_runner.dig
If file doesn't exist, create with:
- timezone: UTC
- _export section with database and lkup_db
- +create_inc_log_table task
- +hist_union_tasks section with _parallel: true
Add new task:
+hist_union_tasks:
_parallel: true
+{table_name}_histunion:
td>: queries/{table_name}.sql
```
### Step 7: Verify Quality Gates
Before delivering, I will verify:
```
✅ MCP tool used for both inc and hist table schemas
✅ Schema differences identified and documented
✅ Correct template selected (Case 1, 2, or 3)
✅ All inc table columns present in CREATE TABLE
✅ Exact column order maintained from inc schema
✅ NULL added for columns missing in hist table (if applicable)
✅ Watermark updates present for both hist and inc tables
✅ _parallel: true configured for concurrent execution
✅ No schedule block in workflow file
✅ Correct lkup_db set (client_config or user-specified)
```
---
## Output
I will generate:
### For Single Table:
1. **hist_union/queries/{table_name}.sql** - SQL for combining hist and inc data
2. **hist_union/hist_union_runner.dig** - Updated workflow file
### File Contents:
**SQL File Structure:**
```sql
-- CREATE TABLE using exact inc table schema
CREATE TABLE IF NOT EXISTS {database}.{table_name}_histunion (
-- All columns from inc table in exact order
...
);
-- INSERT with UNION ALL
INSERT INTO {database}.{table_name}_histunion
-- Historical data (with NULL for missing columns if needed)
SELECT ...
FROM {database}.{table_name}_hist
WHERE time > COALESCE((SELECT MAX(inc_value) FROM {lkup_db}.inc_log ...), 0)
UNION ALL
-- Incremental data
SELECT ...
FROM {database}.{table_name}
WHERE time > COALESCE((SELECT MAX(inc_value) FROM {lkup_db}.inc_log ...), 0);
-- Update watermarks
INSERT INTO {lkup_db}.inc_log ...
```
**Workflow File Structure:**
```yaml
timezone: UTC
_export:
td:
database: {database}
lkup_db: {lkup_db}
+create_inc_log_table:
td>:
query: |
CREATE TABLE IF NOT EXISTS ${lkup_db}.inc_log (...)
+hist_union_tasks:
_parallel: true
+{table_name}_histunion:
td>: queries/{table_name}.sql
```
---
## Special Cases
### Full Load Tables
For `klaviyo_lists` and `klaviyo_metric_data`:
```sql
-- DROP TABLE (fresh start each run)
DROP TABLE IF EXISTS {database}.{table_name}_histunion;
-- CREATE TABLE (no IF NOT EXISTS)
CREATE TABLE {database}.{table_name}_histunion (...);
-- INSERT with NO WHERE clause (load all data)
INSERT INTO {database}.{table_name}_histunion
SELECT ... FROM {database}.{table_name}_hist
UNION ALL
SELECT ... FROM {database}.{table_name};
-- Still update watermarks (for tracking)
INSERT INTO {lkup_db}.inc_log ...
```
### Schema Differences
When inc table has columns that hist table doesn't:
```sql
-- CREATE uses inc schema (includes all columns)
CREATE TABLE IF NOT EXISTS {database}.{table_name}_histunion (
incremental_date varchar, -- Extra column from inc
...other columns...
);
-- Hist SELECT adds NULL for missing columns
SELECT
NULL as incremental_date, -- NULL for missing column
...other columns...
FROM {database}.{table_name}_hist
UNION ALL
-- Inc SELECT uses all columns
SELECT
incremental_date, -- Actual value
...other columns...
FROM {database}.{table_name}
```
---
## Next Steps After Generation
1. **Review Generated Files**:
```bash
cat hist_union/queries/{table_name}.sql
cat hist_union/hist_union_runner.dig
```
2. **Verify SQL Syntax**:
```bash
cd hist_union
td wf check hist_union_runner.dig
```
3. **Run Workflow**:
```bash
td wf run hist_union_runner.dig
```
4. **Verify Results**:
```sql
-- Check row counts
SELECT COUNT(*) FROM {database}.{table_name}_histunion;
-- Check watermarks
SELECT * FROM {lkup_db}.inc_log
WHERE project_name = 'hist_union'
ORDER BY table_name;
-- Sample data
SELECT * FROM {database}.{table_name}_histunion
LIMIT 10;
```
---
## Examples
### Example 1: Simple Table Name
```
User: "Create hist-union for client_src.shopify_products"
I will derive:
- Inc: client_src.shopify_products
- Hist: client_src.shopify_products_hist
- Target: client_src.shopify_products_histunion
- Lookup DB: client_config (default)
```
### Example 2: Hist Table Name
```
User: "Add client_src.klaviyo_events_hist to hist_union"
I will derive:
- Inc: client_src.klaviyo_events
- Hist: client_src.klaviyo_events_hist
- Target: client_src.klaviyo_events_histunion
- Lookup DB: client_config (default)
```
### Example 3: Custom Lookup DB
```
User: "Create hist-union for mc_src.users with lookup db mc_config"
I will use:
- Inc: mc_src.users
- Hist: mc_src.users_hist
- Target: mc_src.users_histunion
- Lookup DB: mc_config (user-specified)
```
---
## Production-Ready Guarantee
All generated code will:
- ✅ Use exact schemas from MCP tool (no guessing)
- ✅ Handle schema differences correctly
- ✅ Use correct template based on full load check
- ✅ Maintain exact column order
- ✅ Include proper NULL handling
- ✅ Update watermarks correctly
- ✅ Use parallel execution for efficiency
- ✅ Follow Presto/Trino SQL syntax
- ✅ Be production-tested and proven
---
**Ready to proceed? Please provide the table name(s) and I'll generate your complete hist-union workflow using exact schemas from MCP tool and production-tested templates.**

@@ -0,0 +1,381 @@
---
name: histunion-validate
description: Validate hist-union workflow and SQL files against production quality gates
---
# Validate Hist-Union Workflows
## ⚠️ CRITICAL: This command validates all hist-union files against production quality gates
I'll help you validate your hist-union workflow files to ensure they meet production standards.
---
## What Gets Validated
### 1. Workflow File Structure
**File**: `hist_union/hist_union_runner.dig`
Checks:
- ✅ Valid YAML syntax
- ✅ Required sections present (timezone, _export, tasks)
- ✅ inc_log table creation task exists
- ✅ hist_union_tasks section present
- ✅ `_parallel: true` configured for concurrent execution
- ✅ No schedule block (schedules should be external)
- ✅ Correct lkup_db variable usage
- ✅ All SQL files referenced exist
### 2. SQL File Structure
**Files**: `hist_union/queries/*.sql`
For each SQL file, checks:
- ✅ Valid SQL syntax (Presto/Trino compatible)
- ✅ CREATE TABLE statement present
- ✅ INSERT INTO with UNION ALL structure
- ✅ Watermark filtering using inc_log (for incremental tables)
- ✅ Watermark updates for both hist and inc tables
- ✅ Correct project_name = 'hist_union' in watermark updates
- ✅ No backticks (use double quotes for reserved keywords)
- ✅ Consistent table naming (inc, hist, histunion)
### 3. Schema Validation
**Requires MCP access to Treasure Data**
For each table pair, checks:
- ✅ Inc table exists and is accessible
- ✅ Hist table exists and is accessible
- ✅ CREATE TABLE columns match inc table schema
- ✅ Column order matches inc table schema
- ✅ NULL handling for columns missing in hist table
- ✅ All inc table columns present in SQL
- ✅ UNION ALL column counts match
### 4. Template Compliance
Checks against template requirements:
- ✅ Full load tables use correct template (DROP + no WHERE)
- ✅ Incremental tables use correct template (CREATE IF NOT EXISTS + WHERE)
- ✅ Watermark updates present for both tables
- ✅ COALESCE used for watermark defaults
- ✅ Correct table name variables used
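Several of the syntax-mode checks above need no database access at all; a rough sketch (pattern names and messages are illustrative, and real validation would parse the SQL rather than grep it):

```python
import re

def lint_sql(sql: str) -> list:
    """Quick pattern checks for one hist-union SQL file (Mode 1 style)."""
    issues = []
    if "`" in sql:
        issues.append("backticks found; Presto/Trino uses double quotes")
    if not re.search(r"CREATE TABLE", sql, re.IGNORECASE):
        issues.append("missing CREATE TABLE statement")
    if not re.search(r"UNION ALL", sql, re.IGNORECASE):
        issues.append("missing UNION ALL")
    if "'hist_union'" not in sql:
        issues.append("watermark update missing project_name = 'hist_union'")
    return issues
```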
---
## Validation Modes
### Mode 1: Syntax Validation (Fast)
**No MCP required** - Validates file structure and SQL syntax only
```bash
Use when: Quick syntax check without database access
Checks: File structure, YAML syntax, SQL syntax, basic patterns
Duration: ~10 seconds
```
### Mode 2: Schema Validation (Complete)
**Requires MCP** - Validates against actual table schemas
```bash
Use when: Pre-deployment validation, full compliance check
Checks: Everything in Mode 1 + schema matching, column validation
Duration: ~30-60 seconds (depends on table count)
```
---
## What I'll Do
### Step 1: Scan Files
```
Scanning hist_union directory...
✅ Found workflow file: hist_union_runner.dig
✅ Found N SQL files in queries/
```
### Step 2: Validate Workflow File
```
Validating hist_union_runner.dig...
✅ YAML syntax valid
✅ timezone set to UTC
✅ _export section present with td.database and lkup_db
✅ +create_inc_log_table task present
✅ +hist_union_tasks section present
✅ _parallel: true configured
✅ No schedule block found
✅ All referenced SQL files exist
```
### Step 3: Validate Each SQL File
```
Validating hist_union/queries/klaviyo_events.sql...
✅ SQL syntax valid (Presto/Trino compatible)
✅ CREATE TABLE statement found
✅ Table name: client_src.klaviyo_events_histunion
✅ INSERT INTO with UNION ALL structure found
✅ Watermark filtering present for hist table
✅ Watermark filtering present for inc table
✅ Watermark update for hist table found
✅ Watermark update for inc table found
✅ project_name = 'hist_union' verified
✅ No backticks found (using double quotes)
```
### Step 4: Schema Validation (Mode 2 Only)
```
Validating schemas via MCP tool...
Table: klaviyo_events
✅ Inc table exists: client_src.klaviyo_events
✅ Hist table exists: client_src.klaviyo_events_hist
✅ Retrieved inc schema: 45 columns
✅ Retrieved hist schema: 44 columns
✅ Schema difference: inc has 'incremental_date', hist does not
✅ CREATE TABLE matches inc schema (45 columns)
✅ Column order matches inc schema
✅ NULL handling present for 'incremental_date' in hist SELECT
✅ All 45 inc columns present in SQL
✅ UNION ALL column counts match (45 = 45)
```
### Step 5: Template Compliance Check
```
Checking template compliance...
Table: klaviyo_lists
⚠️ Full load table detected
✅ Uses Case 3 template (DROP TABLE + no WHERE clause)
✅ Watermarks still updated
Table: klaviyo_events
✅ Incremental table
✅ Uses Case 2 template (inc has extra columns)
✅ CREATE TABLE IF NOT EXISTS used
✅ WHERE clauses present for watermark filtering
✅ COALESCE with default value 0
```
### Step 6: Generate Validation Report
```
Generating validation report...
✅ Report created with all findings
✅ Errors highlighted (if any)
✅ Warnings noted (if any)
✅ Recommendations provided (if any)
```
---
## Validation Report Format
### Summary Section
```
═══════════════════════════════════════════════════════════
HIST-UNION VALIDATION REPORT
═══════════════════════════════════════════════════════════
Validation Mode: [Syntax Only / Full Schema Validation]
Timestamp: 2024-10-13 14:30:00 UTC
Workflow File: hist_union/hist_union_runner.dig
SQL Files: 5
Overall Status: ✅ PASSED / ❌ FAILED / ⚠️ WARNINGS
```
### Detailed Results
```
───────────────────────────────────────────────────────────
WORKFLOW FILE: hist_union_runner.dig
───────────────────────────────────────────────────────────
✅ YAML Syntax: Valid
✅ Structure: Complete (all required sections present)
✅ Parallel Execution: Configured (_parallel: true)
✅ inc_log Task: Present
✅ Schedule: None (correct)
✅ SQL References: All 5 files exist
───────────────────────────────────────────────────────────
SQL FILE: queries/klaviyo_events.sql
───────────────────────────────────────────────────────────
✅ SQL Syntax: Valid (Presto/Trino)
✅ Template: Case 2 (Inc has extra columns)
✅ Table: client_src.klaviyo_events_histunion
✅ CREATE TABLE: Present
✅ UNION ALL: Correct structure
✅ Watermarks: Both hist and inc updates present
✅ NULL Handling: Correct for 'incremental_date'
✅ Schema Match: All 45 columns present in correct order
───────────────────────────────────────────────────────────
SQL FILE: queries/klaviyo_lists.sql
───────────────────────────────────────────────────────────
✅ SQL Syntax: Valid (Presto/Trino)
✅ Template: Case 3 (Full load)
⚠️ Table Type: FULL LOAD table
✅ DROP TABLE: Present
✅ CREATE TABLE: Correct (no IF NOT EXISTS)
✅ WHERE Clauses: Absent (correct for full load)
✅ UNION ALL: Correct structure
✅ Watermarks: Both hist and inc updates present
✅ Schema Match: All 52 columns present in correct order
... (for all SQL files)
```
### Issues Section (if any)
```
───────────────────────────────────────────────────────────
ISSUES FOUND
───────────────────────────────────────────────────────────
❌ ERROR: queries/shopify_products.sql
- Line 15: Column 'incremental_date' missing in CREATE TABLE
- Expected: 'incremental_date varchar' based on inc table schema
- Fix: Add 'incremental_date varchar' to CREATE TABLE statement
❌ ERROR: queries/users.sql
- Line 45: Using backticks around column "index"
- Fix: Replace `index` with "index" (Presto/Trino requires double quotes)
⚠️ WARNING: hist_union_runner.dig
- Line 25: Task +shopify_variants_histunion references non-existent SQL file
- Expected: queries/shopify_variants.sql
- Fix: Create missing SQL file or remove task
⚠️ WARNING: queries/onetrust_profiles.sql
- Missing watermark update for hist table
- Should have: INSERT INTO inc_log for onetrust_profiles_hist
- Fix: Add watermark update after UNION ALL insert
```
### Recommendations Section
```
───────────────────────────────────────────────────────────
RECOMMENDATIONS
───────────────────────────────────────────────────────────
💡 Consider adding these improvements:
1. Add comments to SQL files explaining schema differences
2. Document which tables are full load vs incremental
3. Add error handling tasks in workflow
4. Consider adding validation queries after inserts
💡 Performance optimizations:
1. Review parallel task limit based on TD account
2. Consider partitioning very large tables
3. Review watermark index on inc_log table
```
---
## Error Categories
### Critical Errors (Must Fix)
- ❌ Invalid YAML syntax in workflow
- ❌ Invalid SQL syntax
- ❌ Missing required sections (CREATE, INSERT, watermarks)
- ❌ Column count mismatch in UNION ALL
- ❌ Schema mismatch with inc table
- ❌ Referenced SQL files don't exist
- ❌ Inc or hist table doesn't exist in TD
### Warnings (Should Fix)
- ⚠️ Using backticks instead of double quotes
- ⚠️ Missing NULL handling for extra columns
- ⚠️ Wrong template for full load tables
- ⚠️ Watermark updates incomplete
- ⚠️ Column order doesn't match schema
### Info (Nice to Have)
- Could add more comments
- Could optimize query structure
- Could add data validation queries
---
## Usage Examples
### Example 1: Quick Syntax Check
```
User: "Validate my hist-union files"
I will:
1. Scan hist_union directory
2. Validate workflow YAML syntax
3. Validate all SQL file syntax
4. Check file references
5. Generate report with findings
```
### Example 2: Full Validation with Schema Check
```
User: "Validate hist-union files with full schema check"
I will:
1. Scan hist_union directory
2. Validate workflow and SQL syntax
3. Use MCP tool to get all table schemas
4. Compare CREATE TABLE with actual schemas
5. Verify column order and NULL handling
6. Check template compliance
7. Generate comprehensive report
```
### Example 3: Validate Specific File
```
User: "Validate just the klaviyo_events.sql file"
I will:
1. Read queries/klaviyo_events.sql
2. Validate SQL syntax
3. Check template structure
4. Optionally get schema via MCP
5. Generate focused report for this file
```
---
## Next Steps After Validation
### If Validation Passes
```bash
✅ All checks passed!
Next steps:
1. Deploy to Treasure Data: td wf push hist_union
2. Run workflow: td wf run hist_union_runner.dig
3. Monitor execution: td wf logs hist_union_runner.dig
4. Verify results in target tables
```
### If Validation Fails
```bash
❌ Validation found N errors and M warnings
Next steps:
1. Review validation report for details
2. Fix all critical errors (❌)
3. Address warnings (⚠️) if possible
4. Re-run validation
5. Deploy only after all errors are resolved
```
---
## Production-Ready Checklist
Before deploying, ensure:
- [ ] Workflow file YAML syntax is valid
- [ ] All SQL files have valid Presto/Trino syntax
- [ ] All referenced SQL files exist
- [ ] inc_log table creation task present
- [ ] Parallel execution configured
- [ ] No schedule blocks in workflow
- [ ] All CREATE TABLE statements match inc schemas
- [ ] Column order matches inc table schemas
- [ ] NULL handling present for schema differences
- [ ] Watermark updates present for all tables
- [ ] Full load tables use correct template
- [ ] No backticks in SQL (use double quotes)
- [ ] All table references are correct
---
**Ready to validate? Specify validation mode (syntax-only or full-schema) and I'll run comprehensive validation against all production quality gates.**