Initial commit

commands/transform-batch.md (new file, 295 lines)

---
name: transform-batch
description: Transform multiple database tables in parallel with maximum efficiency
---

# Transform Multiple Tables to Staging (Batch Mode)

## ⚠️ CRITICAL: This command enables parallel processing for 3x-10x faster transformations

I'll help you transform multiple database tables to staging format using parallel sub-agent execution for maximum performance.

---

## Required Information

Please provide the following details:

### 1. Source Tables
- **Table List**: Comma-separated list of tables (e.g., `table1, table2, table3`)
- **Format**: `database.table_name` or just `table_name` (if all tables are in the same database)
- **Example**: `client_src.customers_histunion, client_src.orders_histunion, client_src.products_histunion`

### 2. Source Configuration
- **Source Database**: Database containing the tables (e.g., `client_src`)
- **Staging Database**: Target database (default: `client_stg`)
- **Lookup Database**: Reference database for rules (default: `client_config`)

### 3. SQL Engine (Optional)
- **Engine**: Choose one:
  - `presto` or `trino` - Presto/Trino SQL engine (default)
  - `hive` - Hive SQL engine
  - `mixed` - Specify an engine per table
- If not specified, I'll default to Presto/Trino for all tables

### 4. Mixed Engine Example (Optional)
If you need different engines for different tables:
```
Transform table1 using Hive, table2 using Presto, table3 using Hive
```

---

## What I'll Do

### Step 1: Parse Table List
I will extract the individual tables from your input:
- Parse the comma-separated list
- Detect the database prefix for each table
- Identify the total table count

### Step 2: Detect Engine Strategy
I will determine the processing strategy:
- **Single Engine**: All tables use the same engine
  - Presto/Trino (default) → all tables to `staging-transformer-presto`
  - Hive → all tables to `staging-transformer-hive`
- **Mixed Engines**: Different engines per table
  - Parse the engine specification per table
  - Route each table to the appropriate sub-agent

### Step 3: Launch Parallel Sub-Agents
I will create parallel sub-agent calls:
- **ONE sub-agent per table** (maximum parallelism)
- **Single message with multiple Task calls** (concurrent execution)
- **Each sub-agent processes independently** (no blocking)
- **All sub-agents skip the git workflow** (consolidated at the end)

### Step 4: Monitor Parallel Execution
I will track the progress of all sub-agents:
- Wait for all sub-agents to complete
- Collect results from each transformation
- Identify any failures or errors
- Report partial success if needed

### Step 5: Consolidate Results
After ALL tables complete successfully:
1. **Aggregate file changes** across all tables
2. **Execute a single git workflow** (sketched below):
   - Create a feature branch
   - Commit all changes together
   - Push to remote
   - Create a comprehensive PR
3. **Report a complete summary**
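For reference, the consolidated workflow amounts to roughly the following sketch, assuming the GitHub CLI is available; the branch name, table names, and PR text are illustrative placeholders that I'll derive from your actual batch:

```bash
# Hypothetical consolidated git workflow for a 3-table batch (names are examples).
git checkout -b feature/staging-batch-transform
git add staging/ staging_hive/
git commit -m "Add staging transformations for customers, orders, products"
git push -u origin feature/staging-batch-transform
gh pr create \
  --title "Batch transform 3 tables to staging" \
  --body "Transformed tables: customers, orders, products. All validation gates passed."
```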
---

## Processing Strategy

### Parallel Processing (Recommended for 2+ Tables)
```
User requests: "Transform tables A, B, C"

Main Claude creates 3 parallel sub-agent calls:

┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
│  Sub-Agent 1    │  │  Sub-Agent 2    │  │  Sub-Agent 3    │
│  (Table A)      │  │  (Table B)      │  │  (Table C)      │
│  staging-       │  │  staging-       │  │  staging-       │
│  transformer-   │  │  transformer-   │  │  transformer-   │
│  presto         │  │  presto         │  │  presto         │
└─────────────────┘  └─────────────────┘  └─────────────────┘
        ↓                    ↓                    ↓
  [Files for A]        [Files for B]        [Files for C]
        ↓                    ↓                    ↓
        └────────────────────┴────────────────────┘
                             ↓
               [Consolidated Git Workflow]
               [Single PR with all tables]
```

### Performance Benefits:
- **Speed**: N tables finish in roughly the time of one, instead of N× the time
- **Efficiency**: Optimal resource utilization
- **User Experience**: Faster results for batch operations
- **Scalability**: Can handle 10+ tables efficiently

---

## Quality Assurance (Per Table)

Each sub-agent ensures complete compliance:

✅ **Column Limit Management** (max 200 columns)
✅ **JSON Detection & Extraction** (automatic)
✅ **Date Processing** (4 outputs per date column)
✅ **Email/Phone Validation** (with hashing; see the sketch below)
✅ **String Standardization** (UPPER, TRIM, NULL handling)
✅ **Deduplication Logic** (if configured)
✅ **Join Processing** (if specified)
✅ **Incremental Processing** (state tracking)
✅ **SQL File Creation** (init, incremental, upsert)
✅ **DIG File Management** (conditional creation)
✅ **Configuration Update** (src_params.yml)
✅ **Treasure Data Compatibility** (VARCHAR/BIGINT timestamps)
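To make two of these checks concrete, here is a minimal Presto/Trino-style sketch of the string standardization and email handling; the column names and the validation regex are illustrative assumptions, and CLAUDE.md remains the authoritative template:

```sql
-- Hypothetical cleaned-column sketch (column names and regex are placeholders).
SELECT
  NULLIF(UPPER(TRIM(first_name)), '')                          AS first_name,
  NULLIF(UPPER(TRIM(email)), '')                                AS email_std,
  TO_HEX(SHA256(TO_UTF8(LOWER(TRIM(email)))))                   AS email_hash,
  CASE
    WHEN REGEXP_LIKE(TRIM(email), '^[^@\s]+@[^@\s]+\.[^@\s]+$') THEN 'TRUE'
    ELSE 'FALSE'
  END                                                           AS email_valid
FROM client_src.customers_histunion
```

Note that the validity flag is emitted as a VARCHAR 'TRUE'/'FALSE', in line with the Treasure Data rule that BOOLEAN types are avoided.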
---

## Output Files

### For Presto/Trino Engine (per table):
- `staging/init_queries/{source_db}_{table}_init.sql`
- `staging/queries/{source_db}_{table}.sql`
- `staging/queries/{source_db}_{table}_upsert.sql` (if dedup)
- Updated `staging/config/src_params.yml` (all tables)
- `staging/staging_transformation.dig` (created once if it does not already exist)

### For Hive Engine (per table):
- `staging_hive/queries/{source_db}_{table}.sql`
- Updated `staging_hive/config/src_params.yml` (all tables)
- `staging_hive/staging_hive.dig` (created once if it does not already exist)
- Template files (created once if they do not already exist)

### Plus:
- A single git commit covering all tables
- A comprehensive pull request
- A complete validation report for all tables

---

## Example Usage

### Example 1: Same Engine (Presto Default)
```
User: Transform tables: client_src.customers_histunion, client_src.orders_histunion, client_src.products_histunion

→ Parallel execution with 3 staging-transformer-presto agents
→ All files to staging/ directory
→ Single consolidated git workflow
→ Time: ~1x (vs 3x sequential)
```

### Example 2: Same Engine (Hive Explicit)
```
User: Transform tables using Hive: client_src.events_histunion, client_src.profiles_histunion

→ Parallel execution with 2 staging-transformer-hive agents
→ All files to staging_hive/ directory
→ Single consolidated git workflow
→ Time: ~1x (vs 2x sequential)
```

### Example 3: Mixed Engines
```
User: Transform table1 using Hive, table2 using Presto, table3 using Hive

→ Parallel execution:
  - Table1 → staging-transformer-hive
  - Table2 → staging-transformer-presto
  - Table3 → staging-transformer-hive
→ Files distributed to appropriate directories
→ Single consolidated git workflow
→ Time: ~1x (vs 3x sequential)
```

---

## Error Handling

### Partial Success Scenario
If some tables succeed and others fail:

1. **Report Clear Status**:
   ```
   ✅ Successfully transformed: table1, table2
   ❌ Failed: table3 (error message)
   ```

2. **Preserve Successful Work**:
   - Keep files from successful transformations
   - Allow retry of only the failed tables

3. **Git Safety**:
   - Only execute the git workflow if ALL tables succeed
   - Otherwise, keep changes local for review

### Full Failure Scenario
If all tables fail:
- Report a detailed error for each table
- No git workflow execution
- Provide troubleshooting guidance

---

## Next Steps After Batch Transformation

1. **Review Pull Request**:
   ```
   Title: "Batch transform 5 tables to staging"

   Body:
   - Transformed tables: table1, table2, table3, table4, table5
   - Engine: Presto/Trino
   - All validation gates passed ✅
   - Files created: 15 SQL files, 1 config update
   ```

2. **Verify Generated Files**:
   ```bash
   # For Presto
   ls -l staging/queries/
   ls -l staging/init_queries/
   cat staging/config/src_params.yml

   # For Hive
   ls -l staging_hive/queries/
   cat staging_hive/config/src_params.yml
   ```

3. **Test Workflow**:
   ```bash
   cd staging   # or staging_hive
   td wf push
   td wf run staging_transformation.dig   # or staging_hive.dig
   ```

4. **Monitor All Tables**:
   ```sql
   SELECT table_name, inc_value, project_name
   FROM client_config.inc_log
   WHERE table_name IN ('table1', 'table2', 'table3')
   ORDER BY inc_value DESC
   ```

---

## Performance Comparison

| Tables | Sequential Time | Parallel Time | Speedup |
|--------|-----------------|---------------|---------|
| 2      | ~10 min         | ~5 min        | 2x      |
| 3      | ~15 min         | ~5 min        | 3x      |
| 5      | ~25 min         | ~5 min        | 5x      |
| 10     | ~50 min         | ~5 min        | 10x     |

**Note**: Actual times vary based on table complexity and data volume.

---

## Production-Ready Guarantee

All batch transformations will:
- ✅ Execute in parallel for maximum speed
- ✅ Maintain complete quality for each table
- ✅ Provide an atomic git workflow (all or nothing)
- ✅ Include comprehensive error handling
- ✅ Generate maintainable code
- ✅ Match production standards exactly

---

**Ready to proceed? Please provide your table list and I'll launch parallel sub-agents for maximum efficiency!**

**Format Examples:**
- `Transform tables: table1, table2, table3` (same database)
- `Transform client_src.table1, client_src.table2` (explicit database)
- `Transform table1 using Hive, table2 using Presto` (mixed engines)

commands/transform-table.md (new file, 186 lines)

---
name: transform-table
description: Transform a single database table to staging format with data quality improvements, PII handling, and JSON extraction
---

# Transform Single Table to Staging

## ⚠️ CRITICAL: This command enforces strict sub-agent delegation

I'll help you transform a database table to staging format using the appropriate staging-transformer sub-agent (Presto/Trino or Hive).

---

## Required Information

Please provide the following details:

### 1. Source Table
- **Database Name**: Source database (e.g., `client_src`, `demo_db`)
- **Table Name**: Source table name (e.g., `customer_profiles_histunion`)

### 2. Target Configuration
- **Staging Database**: Target database (default: `client_stg`)
- **Lookup Database**: Reference database for rules (default: `client_config`)

### 3. SQL Engine (Optional)
- **Engine**: Choose one:
  - `presto` or `trino` - Presto/Trino SQL engine (default)
  - `hive` - Hive SQL engine
- If not specified, I'll default to Presto/Trino

### 4. Transformation Requirements (Optional)
- **Deduplication**: Required? (I'll check `client_config.staging_trnsfrm_rules`)
- **JSON Columns**: I'll auto-detect and process these
- **Join Logic**: Any joins needed? (I'll check `additional_rules`)

---

## What I'll Do

### Step 1: Detect SQL Engine
I will determine the appropriate sub-agent:
- **Presto/Trino keywords** → `staging-transformer-presto`
- **Hive keywords** → `staging-transformer-hive`
- **No specification** → `staging-transformer-presto` (default)

### Step 2: Delegate to Specialized Agent
I will invoke the appropriate staging-transformer agent with:
- Complete table transformation context
- All mandatory requirements (13 rules)
- Engine-specific SQL generation
- Full compliance validation

### Step 3: Sub-Agent Will Execute
The specialized agent will:
1. **Validate table existence** (MANDATORY first step)
2. **Analyze metadata** (columns, types, data samples)
3. **Check configuration** (deduplication rules, additional rules)
4. **Detect JSON columns** (automatic processing)
5. **Generate SQL files** (the incremental filter they share is sketched after this list):
   - `staging/init_queries/{source_db}_{table_name}_init.sql` (Presto)
   - `staging/queries/{source_db}_{table_name}.sql` (Presto)
   - `staging/queries/{source_db}_{table_name}_upsert.sql` (if dedup, Presto)
   - OR `staging_hive/queries/{source_db}_{table_name}.sql` (Hive)
6. **Update configuration**: `staging/config/src_params.yml` or `staging_hive/config/src_params.yml`
7. **Create/verify DIG file**: `staging/staging_transformation.dig` or `staging_hive/staging_hive.dig`
8. **Execute git workflow**: Commit, branch, push, PR creation
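The incremental queries all hinge on the inc_log lookup pattern described in the validation command. A minimal sketch, assuming the source table keeps the standard `time` column; the table name, project name, and `${lkup_db}` value are placeholders resolved from your configuration:

```sql
-- Hypothetical incremental filter (names are placeholders).
SELECT *
FROM client_src.customer_profiles_histunion
WHERE time > (
  SELECT COALESCE(MAX(inc_value), 0)
  FROM ${lkup_db}.inc_log
  WHERE table_name = 'customer_profiles_histunion'
    AND project_name = 'staging_transformation'
)
```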
---

## Quality Assurance

The sub-agent ensures complete compliance with all requirements:

✅ **Column Limit Management** (max 200 columns)
✅ **JSON Detection & Extraction** (automatic)
✅ **Date Processing** (4 outputs per date column; see the sketch below)
✅ **Email/Phone Validation** (with hashing)
✅ **String Standardization** (UPPER, TRIM, NULL handling)
✅ **Deduplication Logic** (if configured)
✅ **Join Processing** (if specified)
✅ **Incremental Processing** (state tracking)
✅ **SQL File Creation** (init, incremental, upsert)
✅ **DIG File Management** (conditional creation)
✅ **Configuration Update** (src_params.yml)
✅ **Git Workflow** (complete automation)
✅ **Treasure Data Compatibility** (VARCHAR/BIGINT timestamps)
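As a concrete illustration of the date rule, here is a minimal Presto/Trino-style sketch of the four outputs for one date column; the column name and format string are placeholders, and CLAUDE.md defines the exact patterns:

```sql
-- Hypothetical 4-output expansion for one date column (names and format are placeholders).
SELECT
  created_at                                                      AS created_at,
  FORMAT_DATETIME(FROM_UNIXTIME(TD_TIME_PARSE(created_at)),
                  'yyyy-MM-dd HH:mm:ss')                          AS created_at_std,
  TD_TIME_PARSE(created_at)                                       AS created_at_unixtime,
  SUBSTR(FORMAT_DATETIME(FROM_UNIXTIME(TD_TIME_PARSE(created_at)),
                         'yyyy-MM-dd HH:mm:ss'), 1, 10)           AS created_at_date
FROM client_src.customer_profiles_histunion
```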
---

## Output Files

### For Presto/Trino Engine:
1. `staging/init_queries/{source_db}_{table_name}_init.sql` - Initial load SQL
2. `staging/queries/{source_db}_{table_name}.sql` - Incremental SQL
3. `staging/queries/{source_db}_{table_name}_upsert.sql` - Upsert SQL (if dedup)
4. `staging/config/src_params.yml` - Updated configuration
5. `staging/staging_transformation.dig` - Workflow (created if it does not exist)

### For Hive Engine:
1. `staging_hive/queries/{source_db}_{table_name}.sql` - Combined SQL
2. `staging_hive/config/src_params.yml` - Updated configuration
3. `staging_hive/staging_hive.dig` - Workflow (created if it does not exist)
4. `staging_hive/queries/get_max_time.sql` - Template (created if it does not exist)
5. `staging_hive/queries/get_stg_rows_for_delete.sql` - Template (created if it does not exist)

### Plus:
- Git commit with a comprehensive message
- Pull request with a transformation summary
- Validation report

---

## Example Usage

### Example 1: Presto Engine (Default)
```
User: Transform table client_src.customer_profiles_histunion
→ Engine: Presto (default)
→ Sub-agent: staging-transformer-presto
→ Output: staging/ directory files
```

### Example 2: Hive Engine (Explicit)
```
User: Transform table client_src.klaviyo_events_histunion using Hive
→ Engine: Hive
→ Sub-agent: staging-transformer-hive
→ Output: staging_hive/ directory files
```

### Example 3: With Custom Databases
```
User: Transform demo_db.orders_histunion
      Use demo_db_stg as staging database
      Use client_config for lookup
→ Engine: Presto (default)
→ Custom databases applied
```

---

## Next Steps After Transformation

1. **Review generated files**:
   ```bash
   ls -l staging/queries/
   ls -l staging/init_queries/
   cat staging/config/src_params.yml
   ```

2. **Review Pull Request**:
   - Check transformation summary
   - Verify all validation gates passed
   - Review generated SQL

3. **Test the workflow**:
   ```bash
   cd staging
   td wf push
   td wf run staging_transformation.dig
   ```

4. **Monitor execution**:
   ```sql
   SELECT * FROM client_config.inc_log
   WHERE table_name = '{your_table}'
   ORDER BY inc_value DESC
   LIMIT 1
   ```

---

## Production-Ready Guarantee

All transformations will:
- ✅ Work the first time
- ✅ Follow consistent patterns
- ✅ Include complete error handling
- ✅ Include comprehensive data quality checks
- ✅ Be maintainable and documented
- ✅ Match production standards exactly

---

**Ready to proceed? Please provide the source database and table name, and I'll delegate to the appropriate staging-transformer agent for complete processing.**

commands/transform-validation.md (new file, 363 lines)

---
name: transform-validation
description: Validate staging transformation files against CLAUDE.md compliance and quality gates
---

# Validate Staging Transformation

## ⚠️ CRITICAL: This validates against strict production quality gates

I'll validate your staging transformation files for compliance with CLAUDE.md specifications and production standards.

---

## What I'll Validate

### Quality Gates (ALL MUST PASS)

#### 1. File Structure Compliance
- ✅ Correct directory structure (staging/ or staging_hive/)
- ✅ Proper file naming conventions
- ✅ All required files present
- ✅ No missing dependencies

#### 2. SQL File Validation
- ✅ **Incremental SQL** (`staging/queries/{source_db}_{table}.sql`)
  - Correct FROM clause with `{source_database}.{source_table}`
  - Correct WHERE clause with incremental logic
  - Proper database references
- ✅ **Initial Load SQL** (`staging/init_queries/{source_db}_{table}_init.sql`)
  - Full table scan (no WHERE clause)
  - Same transformations as the incremental query
- ✅ **Upsert SQL** (`staging/queries/{source_db}_{table}_upsert.sql`)
  - Only present if deduplication exists
  - Correct DELETE and INSERT logic
  - Work table pattern used (see the sketch below)
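For orientation, a hedged sketch of the work-table flow this gate looks for; the table and key names are placeholders, and the actual upsert template in CLAUDE.md governs the exact statements:

```sql
-- Hypothetical upsert flow (tables and keys are placeholders).
-- 1. Remove staging rows that are about to be replaced.
DELETE FROM client_stg.customer_profiles
WHERE customer_id_std IN (
  SELECT customer_id_std FROM client_stg.customer_profiles_work
);

-- 2. Insert the freshly transformed rows from the work table.
INSERT INTO client_stg.customer_profiles
SELECT * FROM client_stg.customer_profiles_work;
```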
#### 3. Data Transformation Standards
- ✅ **Column Processing**:
  - All columns transformed (except `time`)
  - Column limit compliance (max 200)
  - Proper type casting
- ✅ **Date Columns** (CRITICAL):
  - ALL date columns have 4 outputs (original, _std, _unixtime, _date)
  - `time` column NOT transformed
  - Correct FORMAT_DATETIME patterns (Presto) or date_format (Hive)
- ✅ **JSON Columns** (CRITICAL):
  - ALL JSON columns detected and processed
  - Top-level key extraction completed
  - Nested objects handled (up to 2 levels)
  - Array handling with TRY_CAST
  - All extractions wrapped with NULLIF(UPPER(...), '')
- ✅ **Email Columns**:
  - Original + _std + _hash + _valid outputs
  - Correct validation regex
  - SHA256 hashing applied
- ✅ **Phone Columns**:
  - Exact pattern compliance
  - phone_std, phone_hash, phone_valid outputs
  - No deviations from templates

#### 4. Data Processing Order
- ✅ **Clean → Join → Dedupe sequence followed** (see the sketch below)
- ✅ Joins use cleaned columns (not raw)
- ✅ Deduplication uses cleaned columns (not raw)
- ✅ No raw column names in PARTITION BY
- ✅ Proper CTE structure
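A minimal sketch of what that order implies for the deduplication step, assuming a cleaned `customer_id_std` key and latest-record-wins ordering; the real partition columns come from the configured dedup rules:

```sql
-- Hypothetical clean → dedupe structure (join step omitted; names are placeholders).
WITH cleaned AS (
  SELECT
    NULLIF(UPPER(TRIM(customer_id)), '') AS customer_id_std,
    time
  FROM client_src.customer_profiles_histunion
)
SELECT customer_id_std, time
FROM (
  SELECT
    c.*,
    ROW_NUMBER() OVER (PARTITION BY customer_id_std ORDER BY time DESC) AS rn
  FROM cleaned c
) ranked
WHERE rn = 1
```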
#### 5. Configuration File Validation
- ✅ **src_params.yml structure**:
  - Valid YAML syntax
  - All required fields present
  - Correct dependency group structure
  - Proper table configuration
- ✅ **Table Configuration** (see the sketch below):
  - `name`, `source_db`, `staging_table` present
  - `has_dedup` boolean correct
  - `partition_columns` matches dedup logic
  - `mode` set correctly (inc/full)
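Pulling those fields together, a hedged sketch of a single table entry; the field names are the ones listed above, but the surrounding layout (the `tables:` list and dependency grouping) is an assumption rather than the authoritative template:

```yaml
# Hypothetical src_params.yml fragment (layout is illustrative).
tables:
  - name: customer_profiles_histunion
    source_db: client_src
    staging_table: customer_profiles
    has_dedup: true
    partition_columns: customer_id_std
    mode: inc
```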
#### 6. DIG File Validation
- ✅ **Loop-based architecture** used (see the sketch below)
- ✅ No repetitive table blocks
- ✅ Correct template structure
- ✅ Proper error handling
- ✅ Inc_log table creation
- ✅ Conditional processing logic
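As a rough illustration of loop-based rather than per-table blocks, a minimal digdag sketch assuming the table list is included from src_params.yml; the task names and parameter wiring are assumptions, and the documented DIG template remains the source of truth:

```yaml
# Hypothetical loop-based workflow fragment (names and parameters are placeholders).
_export:
  !include : config/src_params.yml

+transform_all_tables:
  for_each>:
    table: ${tables}
  _do:
    +run_incremental:
      td>: queries/${table.source_db}_${table.name}.sql
      database: ${staging_db}
```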
#### 7. Treasure Data Compatibility
- ✅ **Forbidden Types**:
  - No TIMESTAMP types (must be VARCHAR/BIGINT)
  - No BOOLEAN types (must be VARCHAR)
  - No DOUBLE for timestamps
- ✅ **Function Compatibility**:
  - Presto: FORMAT_DATETIME, TD_TIME_PARSE
  - Hive: date_format, unix_timestamp, from_unixtime
- ✅ **Type Casting** (see the sketch below):
  - TRY_CAST used for safety
  - All timestamps cast to VARCHAR or BIGINT
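A short Presto-style sketch of how those type rules typically look in a select list; the table and column names are invented for illustration:

```sql
-- Hypothetical type handling (table and column names are placeholders).
SELECT
  TRY_CAST(order_total AS DOUBLE)   AS order_total,         -- safe numeric cast
  CAST(is_active AS VARCHAR)        AS is_active,           -- BOOLEAN stored as VARCHAR
  CAST(updated_at AS VARCHAR)       AS updated_at,          -- timestamp kept as VARCHAR
  TD_TIME_PARSE(updated_at)         AS updated_at_unixtime  -- timestamp as BIGINT
FROM client_src.orders_histunion
```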
#### 8. Incremental Processing
- ✅ **State Management**:
  - Correct inc_log table usage
  - COALESCE(MAX(inc_value), 0) pattern
  - Proper WHERE clause filtering
- ✅ **Database References**:
  - Source table: `{source_database}.{source_table}`
  - Inc_log: `${lkup_db}.inc_log`
  - Correct project_name filter

#### 9. Engine-Specific Validation

**For Presto/Trino:**
- ✅ Files in `staging/` directory
- ✅ Three SQL files (init, incremental, upsert if dedup)
- ✅ Presto-compatible functions used
- ✅ JSON: json_extract_scalar with NULLIF(UPPER(...), '')

**For Hive:**
- ✅ Files in `staging_hive/` directory
- ✅ Single SQL file with INSERT OVERWRITE
- ✅ Template files present (get_max_time.sql, get_stg_rows_for_delete.sql)
- ✅ Hive-compatible functions used
- ✅ JSON: REGEXP_EXTRACT with NULLIF(UPPER(...), '')
- ✅ Timestamp keyword escaped with backticks (see the sketch below)
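For the Hive side, a hedged sketch of the last two rules; the JSON key and regex are illustrative assumptions:

```sql
-- Hypothetical Hive snippet (column, key, and regex are placeholders).
SELECT
  NULLIF(UPPER(REGEXP_EXTRACT(attributes, '"plan"\\s*:\\s*"([^"]+)"', 1)), '') AS attributes_plan,
  `timestamp`                                                                  AS `timestamp`
FROM client_src.events_histunion
```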
---

## Validation Options

### Option 1: Validate Specific Table
Provide:
- **Table Name**: e.g., `client_src.customer_profiles`
- **Engine**: `presto` or `hive`

I will:
1. Find all related files for the table
2. Check against ALL quality gates
3. Report detailed findings with line numbers
4. Provide remediation guidance

### Option 2: Validate All Tables
Say: **"validate all"** or **"validate all Presto"** or **"validate all Hive"**

I will:
1. Find all transformation files
2. Validate each table against the quality gates
3. Report comprehensive project status
4. Identify any inconsistencies

### Option 3: Validate Configuration Only
Say: **"validate config"**

I will:
1. Check `staging/config/src_params.yml` (Presto)
2. Check `staging_hive/config/src_params.yml` (Hive)
3. Validate YAML syntax
4. Verify table configurations
5. Check dependency groups

---

## Validation Process

### Step 1: File Discovery
I will locate all relevant files:
- SQL files (init, queries, upsert)
- Configuration files (src_params.yml)
- Workflow files (staging_transformation.dig or staging_hive.dig)

### Step 2: Load and Parse
I will read all files:
- Parse SQL syntax
- Parse YAML structure
- Extract transformation logic
- Identify patterns

### Step 3: Quality Gate Checks
I will verify each gate systematically:
- Execute all validation rules
- Identify violations
- Collect evidence (line numbers, code samples)
- Categorize severity

### Step 4: Generate Report

#### Pass Report (if all gates pass)
```
✅ VALIDATION PASSED

Table: client_src.customer_profiles
Engine: Presto
Files validated: 3

Quality Gates: 9/9 PASSED
✅ File Structure Compliance
✅ SQL File Validation
✅ Data Transformation Standards
✅ Data Processing Order
✅ Configuration File Validation
✅ DIG File Validation
✅ Treasure Data Compatibility
✅ Incremental Processing
✅ Engine-Specific Validation

Transformation Details:
- Columns: 45 (within 200 limit)
- Date columns: 3 (12 outputs - all have 4 outputs ✅)
- JSON columns: 2 (all processed ✅)
- Email columns: 1 (validated with hashing ✅)
- Phone columns: 1 (exact pattern compliance ✅)
- Deduplication: Yes (customer_id)
- Joins: None

No issues found. Transformation is production-ready.
```

#### Fail Report (if any gate fails)
```
❌ VALIDATION FAILED

Table: client_src.customer_profiles
Engine: Presto
Files validated: 3

Quality Gates: 6/9 PASSED

✅ File Structure Compliance
✅ SQL File Validation
❌ Data Transformation Standards - FAILED
   - Date column 'created_at' missing outputs
     Line 45: Only has _std output, missing _unixtime and _date
   - JSON column 'attributes' not processed
     Line 67: Raw column passed through without extraction

✅ Data Processing Order
✅ Configuration File Validation
❌ DIG File Validation - FAILED
   - Using old repetitive block structure
     File: staging/staging_transformation.dig
     Issue: Should use loop-based architecture

✅ Treasure Data Compatibility
✅ Incremental Processing
❌ Engine-Specific Validation - FAILED
   - JSON extraction missing NULLIF(UPPER(...), '') wrapper
     Line 72: json_extract_scalar used without wrapper
     Should be: NULLIF(UPPER(json_extract_scalar(...)), '')

CRITICAL ISSUES (Must Fix):
1. Add missing date outputs for 'created_at' column
2. Process 'attributes' JSON column with key extraction
3. Migrate DIG file to loop-based architecture
4. Wrap all JSON extractions with NULLIF(UPPER(...), '')

RECOMMENDATIONS:
- Re-read CLAUDE.md date processing requirements
- Check JSON column detection logic
- Use DIG template from documentation
- Review JSON extraction patterns

Re-validate after fixing issues.
```

---

## Common Issues Detected

### Data Transformation Violations
- Missing date column outputs (not all 4)
- JSON columns not processed
- Raw columns used in deduplication
- Phone/email patterns deviating from the templates

### Configuration Violations
- Invalid YAML syntax
- Missing required fields
- Incorrect dependency group structure
- Table name mismatches

### DIG File Violations
- Old repetitive block structure
- Missing error handling
- Incorrect template variables
- No inc_log creation

### Engine-Specific Violations
- **Presto**: Wrong date functions, missing TRY_CAST
- **Hive**: Wrong REGEXP_EXTRACT patterns, `timestamp` not escaped

### Incremental Processing Violations
- Wrong WHERE clause pattern
- Incorrect database references
- Missing COALESCE in the inc_log lookup

---

## Remediation Guidance

### For Each Failed Gate:
1. **Reference CLAUDE.md** - Re-read the relevant section
2. **Check Examples** - Review the sample code in the documentation
3. **Fix Violations** - Apply the exact patterns from the templates
4. **Re-validate** - Run validation again until it passes

### Quick Fixes:

**Date Columns Missing Outputs:**
```sql
-- Add all 4 outputs for each date column
column AS column,
FORMAT_DATETIME(...) AS column_std,
TD_TIME_PARSE(...) AS column_unixtime,
SUBSTR(...) AS column_date
```

**JSON Columns Not Processed:**
```sql
-- Sample the data, extract the keys, add the extractions
NULLIF(UPPER(json_extract_scalar(json_col, '$.key')), '') AS json_col_key
```

**DIG File Old Structure:**
- Delete the old file
- Use the loop-based template from the documentation
- Update src_params.yml instead

---

## Next Steps After Validation

### If Validation Passes
✅ The transformation is production-ready:
- Deploy with confidence
- Test workflow execution
- Monitor inc_log for health

### If Validation Fails
❌ Fix the reported issues:
1. Address critical issues first
2. Apply the exact patterns from CLAUDE.md
3. Re-validate until all gates pass
4. **DO NOT deploy failing transformations**

---

## Production Quality Assurance

This validation ensures:
- ✅ Transformations work correctly
- ✅ Data quality is maintained
- ✅ No data loss or corruption
- ✅ Consistent patterns across tables
- ✅ Maintainable code
- ✅ Compliance with standards

---

**What would you like to validate?**

Options:
1. **Validate specific table**: Provide the table name and engine
2. **Validate all tables**: Say "validate all"
3. **Validate configuration**: Say "validate config"