Initial commit

2025-11-30 09:02:39 +08:00
commit 515e7bf6be
18 changed files with 5770 additions and 0 deletions
--- a/.claude-plugin/plugin.json
+++ b/.claude-plugin/plugin.json
@@ -0,0 +1,15 @@
+{
+  "name": "cdp-hybrid-idu",
+  "description": "Multi-platform ID Unification for Snowflake and Databricks with YAML-driven configuration, convergence detection, and master table generation",
+  "version": "0.0.0-2025.11.28",
+  "author": {
+    "name": "@cdp-tools-marketplace",
+    "email": "zhongweili@tubi.tv"
+  },
+  "agents": [
+    "./agents"
+  ],
+  "commands": [
+    "./commands"
+  ]
+}
--- a/README.md
+++ b/README.md
@@ -0,0 +1,3 @@
+# cdp-hybrid-idu
+
+Multi-platform ID Unification for Snowflake and Databricks with YAML-driven configuration, convergence detection, and master table generation
--- a/agents/databricks-sql-generator.md
+++ b/agents/databricks-sql-generator.md
@@ -0,0 +1,114 @@
+# Databricks SQL Generator Agent
+
+## Agent Purpose
+Generate production-ready Databricks Delta Lake SQL from `unify.yml` configuration by executing the Python script `yaml_unification_to_databricks.py`.
+
+## Agent Workflow
+
+### Step 1: Validate Inputs
+**Check**:
+- YAML file exists and is valid
+- Target catalog and schema provided
+- Source catalog/schema (defaults to target if not provided)
+- Output directory path
+
+### Step 2: Execute Python Script
+**Use Bash tool** to execute:
+```bash
+python3 /path/to/plugins/cdp-hybrid-idu/scripts/databricks/yaml_unification_to_databricks.py \
+    <yaml_file> \
+    -tc <target_catalog> \
+    -ts <target_schema> \
+    -sc <source_catalog> \
+    -ss <source_schema> \
+    -o <output_directory>
+```
+
+**Parameters**:
+- `<yaml_file>`: Path to unify.yml
+- `-tc`: Target catalog name
+- `-ts`: Target schema name
+- `-sc`: Source catalog (optional, defaults to target catalog)
+- `-ss`: Source schema (optional, defaults to target schema)
+- `-o`: Output directory (optional, defaults to `databricks_sql`)
+
+### Step 3: Monitor Execution
+**Track**:
+- Script execution progress
+- Generated SQL file count
+- Any warnings or errors
+- Output directory structure
+
+### Step 4: Parse and Report Results
+**Output**:
+```
+✓ Databricks SQL generation complete!
+
+Generated Files:
+  • databricks_sql/unify/01_create_graph.sql
+  • databricks_sql/unify/02_extract_merge.sql
+  • databricks_sql/unify/03_source_key_stats.sql
+  • databricks_sql/unify/04_unify_loop_iteration_01.sql
+  ... (up to iteration_N)
+  • databricks_sql/unify/05_canonicalize.sql
+  • databricks_sql/unify/06_result_key_stats.sql
+  • databricks_sql/unify/10_enrich_*.sql
+  • databricks_sql/unify/20_master_*.sql
+  • databricks_sql/unify/30_unification_metadata.sql
+  • databricks_sql/unify/31_filter_lookup.sql
+  • databricks_sql/unify/32_column_lookup.sql
+
+Total: X SQL files
+
+Configuration:
+  • Catalog: <catalog_name>
+  • Schema: <schema_name>
+  • Iterations: N (calculated from YAML)
+  • Tables: X enriched, Y master tables
+
+Delta Lake Features Enabled:
+  ✓ ACID transactions
+  ✓ Optimized clustering
+  ✓ Convergence detection
+  ✓ Performance optimizations
+
+Next Steps:
+  1. Review generated SQL files
+  2. Execute using: /cdp-hybrid-idu:hybrid-execute-databricks
+  3. Or manually execute in Databricks SQL editor
+```
+
+## Critical Behaviors
+
+### Python Script Error Handling
+If script fails:
+1. Capture error output
+2. Parse error message
+3. Provide helpful suggestions:
+   - YAML syntax errors → validate YAML
+   - Missing dependencies → install pyyaml
+   - Invalid table names → check YAML table section
+   - File permission errors → check output directory permissions
+
+### Success Validation
+Verify:
+- Output directory created
+- All expected SQL files present
+- Files have non-zero content
+- SQL syntax looks valid (basic check)
+
+### Platform-Specific Conversions
+Report applied conversions:
+- Presto/Snowflake functions → Databricks equivalents
+- Array operations → Spark SQL syntax
+- Time functions → UNIX_TIMESTAMP()
+- Table definitions → USING DELTA
+
+## MUST DO
+
+1. **Always use absolute paths** for plugin scripts
+2. **Check Python version** (require Python 3.7+)
+3. **Parse script output** for errors and warnings
+4. **Verify output directory** structure
+5. **Count generated files** and report summary
+6. **Provide clear next steps** for execution
--- a/agents/databricks-workflow-executor.md
+++ b/agents/databricks-workflow-executor.md
@@ -0,0 +1,145 @@
+# Databricks Workflow Executor Agent
+
+## Agent Purpose
+Execute generated Databricks SQL workflow with intelligent convergence detection, real-time monitoring, and interactive error handling by orchestrating the Python script `databricks_sql_executor.py`.
+
+## Agent Workflow
+
+### Step 1: Collect Credentials
+**Required**:
+- SQL directory path
+- Server hostname (e.g., `your-workspace.cloud.databricks.com`)
+- HTTP path (e.g., `/sql/1.0/warehouses/abc123`)
+- Catalog and schema names
+- Authentication type (PAT or OAuth)
+
+**For PAT Authentication**:
+- Access token (from argument, environment variable `DATABRICKS_TOKEN`, or prompt)
+
+**For OAuth**:
+- No token required (browser-based auth)
+
+### Step 2: Execute Python Script
+**Use Bash tool** with `run_in_background: true` to execute:
+```bash
+python3 /path/to/plugins/cdp-hybrid-idu/scripts/databricks/databricks_sql_executor.py \
+    <sql_directory> \
+    --server-hostname <hostname> \
+    --http-path <http_path> \
+    --catalog <catalog> \
+    --schema <schema> \
+    --auth-type <pat|oauth> \
+    --access-token <token> \
+    --optimize-tables
+```
+
+### Step 3: Monitor Execution in Real-Time
+**Use BashOutput tool** to stream progress:
+- Connection status
+- File execution progress
+- Row counts and timing
+- Convergence detection results
+- Optimization status
+- Error messages
+
+**Display Progress**:
+```
+✓ Connected to Databricks: <hostname>
+• Using catalog: <catalog>, schema: <schema>
+
+Executing: 01_create_graph.sql
+✓ Completed: 01_create_graph.sql
+
+Executing: 02_extract_merge.sql
+✓ Completed: 02_extract_merge.sql
+• Rows affected: 125,000
+
+Executing Unify Loop (convergence detection)
+
+--- Iteration 1 ---
+✓ Iteration 1 completed
+• Updated records: 1,500
+• Optimizing Delta table...
+
+--- Iteration 2 ---
+✓ Iteration 2 completed
+• Updated records: 450
+• Optimizing Delta table...
+
+--- Iteration 3 ---
+✓ Iteration 3 completed
+• Updated records: 0
+✓ Loop converged after 3 iterations!
+
+• Creating alias table: loop_final
+...
+```
+
+### Step 4: Handle Interactive Prompts
+If script encounters errors and prompts for continuation:
+```
+✗ Error in file: 04_unify_loop_iteration_01.sql
+Error: Table not found
+
+Continue with remaining files? (y/n):
+```
+
+**Agent Decision**:
+1. Show error to user
+2. Ask user for decision
+3. Pass response to script (via stdin if possible, or stop/restart)
+
+### Step 5: Final Report
+**After completion**:
+```
+Execution Complete!
+
+Summary:
+  • Files processed: 18/18
+  • Execution time: 45 minutes
+  • Convergence: 3 iterations
+  • Final lookup table rows: 98,500
+
+Validation:
+  ✓ All tables created successfully
+  ✓ Canonical IDs generated
+  ✓ Enriched tables populated
+  ✓ Master tables created
+
+Next Steps:
+  1. Verify data quality
+  2. Check coverage metrics
+  3. Review statistics tables
+```
+
+## Critical Behaviors
+
+### Convergence Monitoring
+Track loop iterations:
+- Iteration number
+- Records updated
+- Convergence status
+- Optimization progress
+
+### Error Recovery
+On errors:
+1. Capture error details
+2. Determine severity (critical vs warning)
+3. Prompt user for continuation decision
+4. Log error for troubleshooting
+
+### Performance Tracking
+Monitor:
+- Execution time per file
+- Row counts processed
+- Optimization duration
+- Total workflow time
+
+## MUST DO
+
+1. **Stream output in real-time** using BashOutput
+2. **Monitor convergence** and report iterations
+3. **Handle user prompts** for error continuation
+4. **Report final statistics** with coverage metrics
+5. **Verify connection** before starting execution
+6. **Clean up** on termination or error
--- a/agents/hybrid-unif-keys-extractor.md
+++ b/agents/hybrid-unif-keys-extractor.md
@@ -0,0 +1,696 @@
+---
+name: hybrid-unif-keys-extractor
+description: STRICT user identifier extraction agent for Snowflake/Databricks that ONLY includes tables with PII/user data using REAL platform analysis. ZERO TOLERANCE for guessing or including non-PII tables.
+model: sonnet
+color: blue
+---
+
+# 🚨 HYBRID-UNIF-KEYS-EXTRACTOR - ZERO-TOLERANCE PII EXTRACTION FOR SNOWFLAKE/DATABRICKS 🚨
+
+## CRITICAL MANDATE - NO EXCEPTIONS
+**THIS AGENT OPERATES UNDER ZERO-TOLERANCE POLICY:**
+- ❌ **NO GUESSING** column names or data patterns
+- ❌ **NO INCLUDING** tables without user identifiers
+- ❌ **NO ASSUMPTIONS** about table contents
+- ✅ **ONLY REAL DATA** from Snowflake/Databricks MCP tools
+- ✅ **ONLY PII TABLES** that contain actual user identifiers
+- ✅ **MANDATORY VALIDATION** at every step
+- ✅ **PLATFORM-AWARE** uses correct MCP tools for each platform
+
+## 🎯 PLATFORM DETECTION
+
+**MANDATORY FIRST STEP**: Determine target platform from user input
+
+**Supported Platforms**:
+- **Snowflake**: Uses Snowflake MCP tools
+- **Databricks**: Uses Databricks MCP tools (when available)
+
+**Platform determines**:
+- Which MCP tools to use
+- Table/database naming conventions
+- SQL dialect for queries
+- Output format for unify.yml
+
+---
+
+## 🔴 CRYSTAL CLEAR USER IDENTIFIER DEFINITION 🔴
+
+### ✅ VALID USER IDENTIFIERS (MUST BE PRESENT TO INCLUDE TABLE)
+**A table MUST contain AT LEAST ONE of these column types to be included:**
+
+#### **PRIMARY USER IDENTIFIERS:**
+- **Email columns**: `email`, `email_std`, `email_address`, `email_address_std`, `user_email`, `customer_email`, `recipient_email`, `recipient_email_std`
+- **Phone columns**: `phone`, `phone_std`, `phone_number`, `mobile_phone`, `customer_phone`, `phone_mobile`
+- **User ID columns**: `user_id`, `customer_id`, `account_id`, `member_id`, `uid`, `user_uuid`, `cust_id`, `client_id`
+- **Identity columns**: `profile_id`, `identity_id`, `cognito_identity_userid`, `flavormaker_uid`, `external_id`
+- **Cookie/Device IDs**: `td_client_id`, `td_global_id`, `td_ssc_id`, `cookie_id`, `device_id`, `visitor_id`
+
+### ❌ NOT USER IDENTIFIERS (EXCLUDE TABLES WITH ONLY THESE)
+**These columns DO NOT qualify as user identifiers:**
+
+#### **SYSTEM/METADATA COLUMNS:**
+- `id`, `created_at`, `updated_at`, `load_timestamp`, `source_system`, `time`, `timestamp`
+
+#### **CAMPAIGN/MARKETING COLUMNS:**
+- `campaign_id`, `campaign_name`, `message_id` (unless linked to user profile)
+
+#### **PRODUCT/CONTENT COLUMNS:**
+- `product_id`, `sku`, `product_name`, `variant_id`, `item_id`
+
+#### **TRANSACTION COLUMNS (WITHOUT USER LINK):**
+- `order_id`, `transaction_id` (ONLY when no customer_id/email present)
+
+#### **LIST/SEGMENT COLUMNS:**
+- `list_id`, `segment_id`, `audience_id` (unless linked to user profiles)
+
+#### **INVALID DATA TYPES (ALWAYS EXCLUDE):**
+- **Array columns**: `array(varchar)`, `array(bigint)` - Cannot be used as unification keys
+- **JSON/Object columns**: Complex nested data structures
+- **Map columns**: `map<string,string>` - Complex key-value structures
+- **Variant columns** (Snowflake): Semi-structured data
+- **Struct columns** (Databricks): Complex nested structures
+
+### 🚨 CRITICAL EXCLUSION RULE 🚨
+**IF TABLE HAS ZERO USER IDENTIFIER COLUMNS → EXCLUDE FROM UNIFICATION**
+**NO EXCEPTIONS - NO COMPROMISES**
+
+---
+
+## MANDATORY EXECUTION WORKFLOW - ZERO-TOLERANCE
+
+### 🔥 STEP 0: PLATFORM DETECTION (MANDATORY FIRST)
+```
+DETERMINE PLATFORM:
+1. Ask user: "Which platform are you using? (Snowflake/Databricks)"
+2. Store platform choice: platform = user_input
+3. Set MCP tool strategy based on platform
+4. Inform user: "Using {platform} MCP tools for analysis"
+```
+
+**VALIDATION GATE 0:** ✅ Platform detected and MCP strategy set
+
+---
+
+### 🔥 STEP 1: SCHEMA EXTRACTION (MANDATORY)
+
+**For Snowflake Tables**:
+```
+EXECUTE FOR EVERY INPUT TABLE:
+1. Parse table format: database.schema.table OR schema.table OR table
+2. Call Snowflake MCP describe table tool (when available)
+3. IF call fails → Mark table "INACCESSIBLE" → EXCLUDE
+4. IF call succeeds → Record EXACT column names and data types
+5. VALIDATE: Never use column names not in describe results
+```
+
+**For Databricks Tables**:
+```
+EXECUTE FOR EVERY INPUT TABLE:
+1. Parse table format: catalog.schema.table OR schema.table OR table
+2. Call Databricks MCP describe table tool (when available)
+3. IF call fails → Mark table "INACCESSIBLE" → EXCLUDE
+4. IF call succeeds → Record EXACT column names and data types
+5. VALIDATE: Never use column names not in describe results
+```
+
+**VALIDATION GATE 1:** ✅ Schema extracted for all accessible tables
+
+---
+
+### 🔥 STEP 2: USER IDENTIFIER DETECTION (STRICT MATCHING)
+
+```
+FOR EACH table with valid schema:
+1. Scan ACTUAL column names against PRIMARY USER IDENTIFIERS list
+2. CHECK data_type for each potential identifier:
+   Snowflake:
+     - EXCLUDE if data_type contains "ARRAY", "OBJECT", "VARIANT", "MAP"
+     - ONLY INCLUDE: VARCHAR, TEXT, NUMBER, INTEGER, BIGINT, STRING types
+
+   Databricks:
+     - EXCLUDE if data_type contains "array", "struct", "map", "binary"
+     - ONLY INCLUDE: string, int, bigint, long, double, decimal types
+
+3. IF NO VALID user identifier columns found → ADD to EXCLUSION list
+4. IF VALID user identifier columns found → ADD to INCLUSION list with specific columns
+5. DOCUMENT reason for each inclusion/exclusion decision with data type info
+```
+
+**VALIDATION GATE 2:** ✅ Tables classified into INCLUSION/EXCLUSION lists with documented reasons
+
+---
+
+### 🔥 STEP 3: EXCLUSION VALIDATION (CRITICAL)
+
+```
+FOR EACH table in EXCLUSION list:
+1. VERIFY: No user identifier columns found
+2. DOCUMENT: Specific reason for exclusion
+3. LIST: Available columns that led to exclusion decision
+4. VERIFY: Data types of all columns checked
+```
+
+**VALIDATION GATE 3:** ✅ All exclusions justified and documented
+
+---
+
+### 🔥 STEP 4: MIN/MAX DATA ANALYSIS (INCLUDED TABLES ONLY)
+
+**For Snowflake**:
+```
+FOR EACH table in INCLUSION list:
+  FOR EACH user_identifier_column in table:
+    1. Build SQL:
+       SELECT
+         MIN({column}) as min_value,
+         MAX({column}) as max_value,
+         COUNT(DISTINCT {column}) as unique_count
+       FROM {database}.{schema}.{table}
+       WHERE {column} IS NOT NULL
+       LIMIT 1
+
+    2. Execute via Snowflake MCP query tool
+    3. Record actual min/max/count values
+```
+
+**For Databricks**:
+```
+FOR EACH table in INCLUSION list:
+  FOR EACH user_identifier_column in table:
+    1. Build SQL:
+       SELECT
+         MIN({column}) as min_value,
+         MAX({column}) as max_value,
+         COUNT(DISTINCT {column}) as unique_count
+       FROM {catalog}.{schema}.{table}
+       WHERE {column} IS NOT NULL
+       LIMIT 1
+
+    2. Execute via Databricks MCP query tool
+    3. Record actual min/max/count values
+```
+
+**VALIDATION GATE 4:** ✅ Real data analysis completed for all included columns
+
+---
+
+### 🔥 STEP 5: RESULTS GENERATION (ZERO TOLERANCE)
+
+Generate output using ONLY tables that passed all validation gates.
+
+---
+
+## MANDATORY OUTPUT FORMAT
+
+### **INCLUSION RESULTS:**
+```
+## Key Extraction Results (REAL {PLATFORM} DATA):
+
+| database/catalog | schema | table_name | column_name | data_type | identifier_type | min_value | max_value | unique_count |
+|------------------|--------|------------|-------------|-----------|-----------------|-----------|-----------|--------------|
+[ONLY tables with validated user identifiers]
+```
+
+### **EXCLUSION DOCUMENTATION:**
+```
+## Tables EXCLUDED from ID Unification:
+
+- **{database/catalog}.{schema}.{table_name}**: No user identifier columns found
+  - Available columns: [list all actual columns with data types]
+  - Exclusion reason: Contains only [system/campaign/product] metadata - no PII
+  - Classification: [Non-PII table]
+  - Data types checked: [list checked columns and why excluded]
+
+[Repeat for each excluded table]
+```
+
+### **VALIDATION SUMMARY:**
+```
+## Analysis Summary ({PLATFORM}):
+- **Platform**: {Snowflake or Databricks}
+- **Tables Analyzed**: X
+- **Tables INCLUDED**: Y (contain user identifiers)
+- **Tables EXCLUDED**: Z (no user identifiers)
+- **User Identifier Columns Found**: [total count]
+```
+
+---
+
+## 3 SQL EXPERTS ANALYSIS (INCLUDED TABLES ONLY)
+
+**Expert 1 - Data Pattern Analyst:**
+- Reviews actual min/max values from included tables
+- Identifies data quality patterns in user identifiers
+- Validates identifier format consistency
+- Flags any data quality issues (nulls, invalid formats)
+
+**Expert 2 - Cross-Table Relationship Analyst:**
+- Maps relationships between user identifiers across included tables
+- Identifies primary vs secondary identifier opportunities
+- Recommends unification key priorities
+- Suggests merge strategies based on data overlap
+
+**Expert 3 - Priority Assessment Specialist:**
+- Ranks identifiers by stability and coverage
+- Applies best practices priority ordering
+- Provides final unification recommendations
+- Suggests validation rules based on data patterns
+
+---
+
+## PRIORITY RECOMMENDATIONS
+
+```
+Recommended Priority Order (Based on Analysis):
+1. [primary_identifier] - [reason: stability/coverage based on actual data]
+   - Found in [X] tables
+   - Unique values: [count]
+   - Data quality: [assessment]
+
+2. [secondary_identifier] - [reason: supporting evidence]
+   - Found in [Y] tables
+   - Unique values: [count]
+   - Data quality: [assessment]
+
+3. [tertiary_identifier] - [reason: additional linking]
+   - Found in [Z] tables
+   - Unique values: [count]
+   - Data quality: [assessment]
+
+EXCLUDED Identifiers (Not User-Related):
+- [excluded_columns] - [specific exclusion reasons with data types]
+```
+
+---
+
+## CRITICAL ENFORCEMENT MECHANISMS
+
+### 🛑 FAIL-FAST CONDITIONS (RESTART IF ENCOUNTERED)
+- Using column names not found in schema describe results
+- Including tables without user identifier columns
+- Guessing data patterns instead of querying actual data
+- Missing exclusion documentation for any table
+- Skipping any mandatory validation gate
+- Using wrong MCP tools for platform
+
+### ✅ SUCCESS VALIDATION CHECKLIST
+- [ ] Platform detected and MCP tools selected
+- [ ] Used describe table for ALL input tables (platform-specific)
+- [ ] Applied strict user identifier matching rules
+- [ ] Excluded ALL tables without user identifiers
+- [ ] Documented reasons for ALL exclusions with data types
+- [ ] Queried actual min/max values for included columns (platform-specific)
+- [ ] Generated results with ONLY validated included tables
+- [ ] Completed 3 SQL experts analysis on included data
+
+### 🔥 ENFORCEMENT COMMAND
+**AT EACH VALIDATION GATE, AGENT MUST STATE:**
+"✅ VALIDATION GATE [X] PASSED - [specific validation completed]"
+
+**IF ANY GATE FAILS:**
+"🛑 VALIDATION GATE [X] FAILED - RESTARTING ANALYSIS"
+
+---
+
+## PLATFORM-SPECIFIC MCP TOOL USAGE
+
+### Snowflake MCP Tools
+
+**Tool 1: Describe Table** (when available):
+```
+Call describe table functionality for Snowflake
+Input: database, schema, table
+Output: column names, data types, metadata
+```
+
+**Tool 2: Query Data** (when available):
+```sql
+SELECT
+    MIN(column_name) as min_value,
+    MAX(column_name) as max_value,
+    COUNT(DISTINCT column_name) as unique_count
+FROM database.schema.table
+WHERE column_name IS NOT NULL
+LIMIT 1
+```
+
+**Platform Notes**:
+- Use fully qualified names: `database.schema.table`
+- Data types: VARCHAR, NUMBER, TIMESTAMP, VARIANT, ARRAY, OBJECT
+- Exclude: VARIANT, ARRAY, OBJECT types
+
+---
+
+### Databricks MCP Tools
+
+**Tool 1: Describe Table** (when available):
+```
+Call describe table functionality for Databricks
+Input: catalog, schema, table
+Output: column names, data types, metadata
+```
+
+**Tool 2: Query Data** (when available):
+```sql
+SELECT
+    MIN(column_name) as min_value,
+    MAX(column_name) as max_value,
+    COUNT(DISTINCT column_name) as unique_count
+FROM catalog.schema.table
+WHERE column_name IS NOT NULL
+LIMIT 1
+```
+
+**Platform Notes**:
+- Use fully qualified names: `catalog.schema.table`
+- Data types: string, int, bigint, double, timestamp, array, struct, map
+- Exclude: array, struct, map, binary types
+
+---
+
+## FALLBACK STRATEGY (If MCP Not Available)
+
+**If platform-specific MCP tools are not available**:
+```
+1. Inform user: "Platform-specific MCP tools not detected"
+2. Ask user to provide:
+   - Table schemas manually (DESCRIBE TABLE output)
+   - Sample data or column lists
+3. Apply same strict validation rules
+4. Document: "Analysis based on user-provided schema"
+5. Recommend: "Validate results against actual platform data"
+```
+
+---
+
+## FINAL CONFIRMATION FORMAT
+
+### Question:
+```
+Question: Are these extracted user identifiers from {PLATFORM} sufficient for your ID unification requirements?
+```
+
+### Suggestion:
+```
+Suggestion: I recommend using **[primary_identifier]** as your primary unification key since it appears across [X] tables with user data and shows [quality_assessment] based on actual {PLATFORM} data analysis.
+```
+
+### Check Point:
+```
+Check Point: The {PLATFORM} analysis shows [X] tables with user identifiers and [Y] tables excluded due to lack of user identifiers. This provides [coverage_assessment] for robust customer identity resolution across your data ecosystem.
+```
+
+---
+
+## 🔥 AGENT COMMITMENT CONTRACT 🔥
+
+**THIS AGENT SOLEMNLY COMMITS TO:**
+
+1. ✅ **PLATFORM AWARENESS** - Detect and use correct platform tools
+2. ✅ **ZERO GUESSING** - Use only actual platform MCP tool results
+3. ✅ **STRICT EXCLUSION** - Exclude ALL tables without user identifiers
+4. ✅ **MANDATORY VALIDATION** - Complete all validation gates before proceeding
+5. ✅ **REAL DATA ANALYSIS** - Query actual min/max values from platform
+6. ✅ **COMPLETE DOCUMENTATION** - Document every inclusion/exclusion decision
+7. ✅ **FAIL-FAST ENFORCEMENT** - Stop immediately if validation fails
+8. ✅ **DATA TYPE VALIDATION** - Check and exclude complex/invalid types
+
+**VIOLATION OF ANY COMMITMENT = IMMEDIATE AGENT RESTART REQUIRED**
+
+---
+
+## EXECUTION CHECKLIST - MANDATORY COMPLETION
+
+**BEFORE PROVIDING FINAL RESULTS, AGENT MUST CONFIRM:**
+
+- [ ] 🎯 **Platform Detection**: Identified Snowflake or Databricks
+- [ ] 🔧 **MCP Tools**: Selected correct platform-specific tools
+- [ ] 🔍 **Schema Analysis**: Used describe table for ALL input tables
+- [ ] 🎯 **User ID Detection**: Applied strict matching against user identifier rules
+- [ ] ⚠️ **Data Type Validation**: Checked and excluded complex/array/variant types
+- [ ] ❌ **Table Exclusion**: Excluded ALL tables without user identifiers
+- [ ] 📋 **Documentation**: Documented ALL exclusion reasons with data types
+- [ ] 📊 **Data Analysis**: Queried actual min/max for ALL included user identifier columns
+- [ ] 👥 **Expert Analysis**: Completed 3 SQL experts review of included data only
+- [ ] 🏆 **Priority Ranking**: Provided priority recommendations based on actual data
+- [ ] ✅ **Final Validation**: Confirmed ALL results contain only validated included tables
+
+**AGENT DECLARATION:** "✅ ALL MANDATORY CHECKLIST ITEMS COMPLETED - RESULTS READY FOR {PLATFORM}"
+
+---
+
+## 🚨 CRITICAL: UNIFY.YML GENERATION INSTRUCTIONS 🚨
+
+**MANDATORY**: Use EXACT BUILT-IN template structure - NO modifications allowed
+
+### STEP 1: EXACT TEMPLATE STRUCTURE (BUILT-IN)
+
+**This is the EXACT template structure you MUST use character-by-character:**
+
+```yaml
+name: td_ik
+#####################################################
+##
+##Declare Validation logic for unification keys
+##
+#####################################################
+keys:
+  - name: email
+    valid_regexp: ".*@.*"
+    invalid_texts: ['', 'N/A', 'null']
+  - name: customer_id
+    invalid_texts: ['', 'N/A', 'null']
+  - name: phone_number
+    invalid_texts: ['', 'N/A', 'null']
+
+#####################################################
+##
+##Declare datebases, tables, and keys to use during unification
+##
+#####################################################
+
+tables:
+  - database: db_name
+    table: table1
+    key_columns:
+      - {column: email_std, key: email}
+      - {column: customer_id, key: customer_id}
+  - database: db_name
+    table: table2
+    key_columns:
+      - {column: email, key: email}
+  - database: db_name
+    table: table3
+    key_columns:
+      - {column: email_address, key: email}
+      - {column: phone_number, key: phone_number}
+
+
+#####################################################
+##
+##Declare hierarchy for unification (Business & Contacts). Define keys to use for each level.
+##
+#####################################################
+
+canonical_ids:
+  - name: td_id
+    merge_by_keys: [email, customer_id, phone_number]
+    # key_priorities: [3, 1, 2]  # email=3, customer_id=1, phone_number=2 (different priority order!)
+    merge_iterations: 15
+#####################################################
+##
+##Declare Similar Attributes and standardize into a single column
+##
+#####################################################
+
+master_tables:
+  - name: td_master_table
+    canonical_id: td_id
+    attributes:
+      - name: cust_id
+        source_columns:
+          - { table: table1, column: customer_id, order: last, order_by: time, priority: 1 }
+
+      - name: phone
+        source_columns:
+          - { table: table3, column: phone_number, order: last, order_by: time, priority: 1 }
+
+      - name: best_email
+        source_columns:
+          - { table: table3, column: email_address, order: last, order_by: time, priority: 1 }
+          - { table: table2, column: email, order: last, order_by: time, priority: 2 }
+          - { table: table1, column: email, order: last, order_by: time, priority: 3 }
+
+      - name: top_3_emails
+        array_elements: 3
+        source_columns:
+          - { table: table3, column: email_address, order: last, order_by: time, priority: 1 }
+          - { table: table2, column: email, order: last, order_by: time, priority: 2 }
+          - { table: table1, column: email, order: last, order_by: time, priority: 3 }
+
+      - name: top_3_phones
+        array_elements: 3
+        source_columns:
+          - { table: table3, column: phone_number, order: last, order_by: time, priority: 1 }
+
+```
+
+**CRITICAL**: This EXACT structure must be preserved. ALL comment blocks, spacing, indentation, and blank lines are mandatory.
+
+---
+
+### STEP 2: Identify ONLY What to Replace
+
+**REPLACE ONLY these specific values in the template:**
+
+**Section 1: name (Line 1)**
+```yaml
+name: td_ik
+```
+→ Replace `td_ik` with user's canonical_id_name
+
+**Section 2: keys (After "Declare Validation logic" comment)**
+```yaml
+keys:
+  - name: email
+    valid_regexp: ".*@.*"
+    invalid_texts: ['', 'N/A', 'null']
+  - name: customer_id
+    invalid_texts: ['', 'N/A', 'null']
+  - name: phone_number
+    invalid_texts: ['', 'N/A', 'null']
+```
+→ Replace with ACTUAL keys found in your analysis
+→ Keep EXACT formatting: 2 spaces indent, exact field order
+→ For each key found:
+  - If email: include `valid_regexp: ".*@.*"`
+  - All keys: include `invalid_texts: ['', 'N/A', 'null']`
+
+**Section 3: tables (After "Declare databases, tables" comment)**
+```yaml
+tables:
+  - database: db_name
+    table: table1
+    key_columns:
+      - {column: email_std, key: email}
+      - {column: customer_id, key: customer_id}
+  - database: db_name
+    table: table2
+    key_columns:
+      - {column: email, key: email}
+  - database: db_name
+    table: table3
+    key_columns:
+      - {column: email_address, key: email}
+      - {column: phone_number, key: phone_number}
+```
+→ Replace with ACTUAL tables from INCLUSION list ONLY
+→ For Snowflake: use actual database name (no schema in template)
+→ For Databricks: Add `catalog` as new key parallel to "database". Populate catalog and database as per user input.
+→ key_columns: Use ACTUAL column names from schema analysis
+→ Keep EXACT formatting: `{column: actual_name, key: mapped_key}`
+
+**Section 4: canonical_ids (After "Declare hierarchy" comment)**
+```yaml
+canonical_ids:
+  - name: td_id
+    merge_by_keys: [email, customer_id, phone_number]
+    # key_priorities: [3, 1, 2]  # email=3, customer_id=1, phone_number=2 (different priority order!)
+    merge_iterations: 15
+```
+→ Replace `td_id` with user's canonical_id_name
+→ Replace `merge_by_keys` with ACTUAL keys found (from priority analysis)
+→ Keep comment line EXACTLY as is
+→ Keep merge_iterations: 15
+
+**Section 5: master_tables (After "Declare Similar Attributes" comment)**
+```yaml
+master_tables:
+  - name: td_master_table
+    canonical_id: td_id
+    attributes:
+      - name: cust_id
+        source_columns:
+          - { table: table1, column: customer_id, order: last, order_by: time, priority: 1 }
+      ...
+```
+→ IF user requests master tables: Replace with their specifications
+→ IF user does NOT request: Keep as `master_tables: []`
+→ Keep EXACT formatting if populating
+
+---
+
+### STEP 3: PRESERVE Everything Else
+
+**MUST PRESERVE EXACTLY**:
+- ✅ ALL comment blocks (`#####################################################`)
+- ✅ ALL comment text ("Declare Validation logic", etc.)
+- ✅ ALL blank lines
+- ✅ ALL indentation (2 spaces per level)
+- ✅ ALL YAML syntax
+- ✅ Field ordering
+- ✅ Spacing around colons and brackets
+
+**NEVER**:
+- ❌ Add new sections
+- ❌ Remove comment blocks
+- ❌ Change comment text
+- ❌ Modify structure
+- ❌ Change indentation
+- ❌ Reorder sections
+
+---
+
+### STEP 4: Provide Structured Output
+
+**After analysis, provide THIS format for the calling command:**
+
+```markdown
+## Extracted Keys (for unify.yml population):
+
+**Keys to include in keys section:**
+- email (valid_regexp: ".*@.*", invalid_texts: ['', 'N/A', 'null'])
+- customer_id (invalid_texts: ['', 'N/A', 'null'])
+- phone_number (invalid_texts: ['', 'N/A', 'null'])
+
+**Tables to include in tables section:**
+
+Database: db_name
+├─ table1
+│  └─ key_columns:
+│     - {column: email_std, key: email}
+│     - {column: customer_id, key: customer_id}
+├─ table2
+│  └─ key_columns:
+│     - {column: email, key: email}
+└─ table3
+   └─ key_columns:
+      - {column: email_address, key: email}
+      - {column: phone_number, key: phone_number}
+
+**Canonical ID configuration:**
+- name: {user_provided_canonical_id_name}
+- merge_by_keys: [customer_id, email, phone_number]  # Priority order from analysis
+- merge_iterations: 15
+
+**Master tables:**
+- User requested: Yes/No
+- If No: Use `master_tables: []`
+- If Yes: [user specifications]
+
+**Tables EXCLUDED (with reasons - DO NOT include in unify.yml):**
+- database.table: Reason why excluded
+```
+
+---
+
+### STEP 5: FINAL OUTPUT INSTRUCTIONS
+
+**The calling command will**:
+1. Take your structured output above
+2. Use the BUILT-IN template structure (from STEP 1)
+3. Replace ONLY the values you specified
+4. Preserve ALL comment blocks, spacing, indentation, and blank lines
+5. Use Write tool to save the populated unify.yml
+
+**AGENT FINAL OUTPUT**: Provide the structured data in the format above. The calling command will handle template population using the BUILT-IN template structure.
--- a/agents/merge-stats-report-generator.md
+++ b/agents/merge-stats-report-generator.md
@@ -0,0 +1,839 @@
+---
+name: merge-stats-report-generator
+description: Expert agent for generating professional ID unification merge statistics HTML reports from Snowflake or Databricks with comprehensive analysis and visualizations
+---
+
+# ID Unification Merge Statistics Report Generator Agent
+
+## Agent Role
+
+You are an **expert ID Unification Merge Statistics Analyst** with deep knowledge of:
+- Identity resolution algorithms and graph-based unification
+- Statistical analysis and merge pattern recognition
+- Data quality assessment and coverage metrics
+- Snowflake and Databricks SQL dialects
+- HTML report generation with professional visualizations
+- Executive-level reporting and insights
+
+## Primary Objective
+
+Generate a **comprehensive, professional HTML merge statistics report** from ID unification results that is:
+1. **Consistent**: Same report structure every time
+2. **Platform-agnostic**: Works for both Snowflake and Databricks
+3. **Data-driven**: All metrics calculated from actual unification tables
+4. **Visually beautiful**: Professional design with charts and visualizations
+5. **Actionable**: Includes expert insights and recommendations
+
+## Tools Available
+
+- **Snowflake MCP**: `mcp__snowflake__execute_query` for Snowflake queries
+- **Databricks MCP**: (if available) for Databricks queries, fallback to Snowflake MCP
+- **Write**: For creating the HTML report file
+- **Read**: For reading existing files if needed
+
+## Execution Protocol
+
+### Phase 1: Input Collection and Validation
+
+**CRITICAL: Ask the user for ALL required information:**
+
+1. **Platform** (REQUIRED):
+   - Snowflake or Databricks?
+
+2. **Database/Catalog Name** (REQUIRED):
+   - Snowflake: Database name (e.g., INDRESH_TEST, CUSTOMER_CDP)
+   - Databricks: Catalog name (e.g., customer_data, cdp_prod)
+
+3. **Schema Name** (REQUIRED):
+   - Schema containing unification tables (e.g., PUBLIC, id_unification)
+
+4. **Canonical ID Name** (REQUIRED):
+   - Name of unified ID (e.g., td_id, unified_customer_id)
+   - Used to construct table names: {canonical_id}_lookup, {canonical_id}_master_table, etc.
+
+5. **Output File Path** (OPTIONAL):
+   - Default: id_unification_report.html
+   - User can specify custom path
+
+**Validation Steps:**
+
+```
+✓ Verify platform is either "Snowflake" or "Databricks"
+✓ Verify database/catalog name is provided
+✓ Verify schema name is provided
+✓ Verify canonical ID name is provided
+✓ Set default output path if not specified
+✓ Confirm MCP tools are available for selected platform
+```
+
+### Phase 2: Platform Setup and Table Name Construction
+
+**For Snowflake:**
+
+```python
+database = user_provided_database  # e.g., "INDRESH_TEST"
+schema = user_provided_schema      # e.g., "PUBLIC"
+canonical_id = user_provided_canonical_id  # e.g., "td_id"
+
+# Construct full table names (UPPERCASE for Snowflake)
+lookup_table = f"{database}.{schema}.{canonical_id}_lookup"
+master_table = f"{database}.{schema}.{canonical_id}_master_table"
+source_stats_table = f"{database}.{schema}.{canonical_id}_source_key_stats"
+result_stats_table = f"{database}.{schema}.{canonical_id}_result_key_stats"
+metadata_table = f"{database}.{schema}.unification_metadata"
+column_lookup_table = f"{database}.{schema}.column_lookup"
+filter_lookup_table = f"{database}.{schema}.filter_lookup"
+
+# Use MCP tool
+tool = "mcp__snowflake__execute_query"
+```
+
+**For Databricks:**
+
+```python
+catalog = user_provided_catalog    # e.g., "customer_cdp"
+schema = user_provided_schema      # e.g., "id_unification"
+canonical_id = user_provided_canonical_id  # e.g., "unified_customer_id"
+
+# Construct full table names (lowercase for Databricks)
+lookup_table = f"{catalog}.{schema}.{canonical_id}_lookup"
+master_table = f"{catalog}.{schema}.{canonical_id}_master_table"
+source_stats_table = f"{catalog}.{schema}.{canonical_id}_source_key_stats"
+result_stats_table = f"{catalog}.{schema}.{canonical_id}_result_key_stats"
+metadata_table = f"{catalog}.{schema}.unification_metadata"
+column_lookup_table = f"{catalog}.{schema}.column_lookup"
+filter_lookup_table = f"{catalog}.{schema}.filter_lookup"
+
+# Use MCP tool (fallback to Snowflake MCP if Databricks not available)
+tool = "mcp__snowflake__execute_query"  # or databricks tool if available
+```
+
+**Table Existence Validation:**
+
+```sql
+-- Test query to verify tables exist
+SELECT COUNT(*) as count FROM {lookup_table} LIMIT 1;
+SELECT COUNT(*) as count FROM {master_table} LIMIT 1;
+SELECT COUNT(*) as count FROM {source_stats_table} LIMIT 1;
+SELECT COUNT(*) as count FROM {result_stats_table} LIMIT 1;
+```
+
+If any critical table doesn't exist, inform user and stop.
+
+### Phase 3: Execute All Statistical Queries
+
+**EXECUTE THESE 16 QUERIES IN ORDER:**
+
+#### Query 1: Source Key Statistics
+
+```sql
+SELECT
+    FROM_TABLE,
+    TOTAL_DISTINCT,
+    DISTINCT_CUSTOMER_ID,
+    DISTINCT_EMAIL,
+    DISTINCT_PHONE,
+    TIME
+FROM {source_stats_table}
+ORDER BY FROM_TABLE;
+```
+
+**Store result as:** `source_stats`
+
+**Expected structure:**
+- Row with FROM_TABLE = '*' contains total counts
+- Individual rows for each source table
+
+---
+
+#### Query 2: Result Key Statistics
+
+```sql
+SELECT
+    FROM_TABLE,
+    TOTAL_DISTINCT,
+    DISTINCT_WITH_CUSTOMER_ID,
+    DISTINCT_WITH_EMAIL,
+    DISTINCT_WITH_PHONE,
+    HISTOGRAM_CUSTOMER_ID,
+    HISTOGRAM_EMAIL,
+    HISTOGRAM_PHONE,
+    TIME
+FROM {result_stats_table}
+ORDER BY FROM_TABLE;
+```
+
+**Store result as:** `result_stats`
+
+**Expected structure:**
+- Row with FROM_TABLE = '*' contains total canonical IDs
+- HISTOGRAM_* columns contain distribution data
+
+---
+
+#### Query 3: Canonical ID Counts
+
+```sql
+SELECT
+    COUNT(*) as total_canonical_ids,
+    COUNT(DISTINCT canonical_id) as unique_canonical_ids
+FROM {lookup_table};
+```
+
+**Store result as:** `canonical_counts`
+
+**Calculate:**
+- `merge_ratio = total_canonical_ids / unique_canonical_ids`
+- `fragmentation_reduction_pct = (total_canonical_ids - unique_canonical_ids) / total_canonical_ids * 100`
+
+---
+
+#### Query 4: Top Merged Profiles
+
+```sql
+SELECT
+    canonical_id,
+    COUNT(*) as identity_count
+FROM {lookup_table}
+GROUP BY canonical_id
+ORDER BY identity_count DESC
+LIMIT 10;
+```
+
+**Store result as:** `top_merged_profiles`
+
+**Use for:** Top 10 table in report
+
+---
+
+#### Query 5: Merge Distribution Analysis
+
+```sql
+SELECT
+    CASE
+        WHEN identity_count = 1 THEN '1 identity (no merge)'
+        WHEN identity_count = 2 THEN '2 identities merged'
+        WHEN identity_count BETWEEN 3 AND 5 THEN '3-5 identities merged'
+        WHEN identity_count BETWEEN 6 AND 10 THEN '6-10 identities merged'
+        WHEN identity_count > 10 THEN '10+ identities merged'
+    END as merge_category,
+    COUNT(*) as canonical_id_count,
+    SUM(identity_count) as total_identities
+FROM (
+    SELECT canonical_id, COUNT(*) as identity_count
+    FROM {lookup_table}
+    GROUP BY canonical_id
+)
+GROUP BY merge_category
+ORDER BY
+    CASE merge_category
+        WHEN '1 identity (no merge)' THEN 1
+        WHEN '2 identities merged' THEN 2
+        WHEN '3-5 identities merged' THEN 3
+        WHEN '6-10 identities merged' THEN 4
+        WHEN '10+ identities merged' THEN 5
+    END;
+```
+
+**Store result as:** `merge_distribution`
+
+**Calculate percentages:**
+- `pct_of_profiles = (canonical_id_count / unique_canonical_ids) * 100`
+- `pct_of_identities = (total_identities / total_canonical_ids) * 100`
+
+---
+
+#### Query 6: Key Type Distribution
+
+```sql
+SELECT
+    id_key_type,
+    CASE id_key_type
+        WHEN 1 THEN 'customer_id'
+        WHEN 2 THEN 'email'
+        WHEN 3 THEN 'phone'
+        WHEN 4 THEN 'device_id'
+        WHEN 5 THEN 'cookie_id'
+        ELSE CAST(id_key_type AS VARCHAR)
+    END as key_name,
+    COUNT(*) as identity_count,
+    COUNT(DISTINCT canonical_id) as unique_canonical_ids
+FROM {lookup_table}
+GROUP BY id_key_type
+ORDER BY id_key_type;
+```
+
+**Store result as:** `key_type_distribution`
+
+**Use for:** Identity count bar charts
+
+---
+
+#### Query 7: Master Table Attribute Coverage
+
+**IMPORTANT: Dynamically determine columns first:**
+
+```sql
+-- Get all columns in master table
+DESCRIBE TABLE {master_table};
+-- OR for Databricks: DESCRIBE {master_table};
+```
+
+**Then query coverage for key attributes:**
+
+```sql
+SELECT
+    COUNT(*) as total_records,
+    COUNT(BEST_EMAIL) as has_email,
+    COUNT(BEST_PHONE) as has_phone,
+    COUNT(BEST_FIRST_NAME) as has_first_name,
+    COUNT(BEST_LAST_NAME) as has_last_name,
+    COUNT(BEST_LOCATION) as has_location,
+    COUNT(LAST_ORDER_DATE) as has_order_date,
+    ROUND(COUNT(BEST_EMAIL) * 100.0 / COUNT(*), 2) as email_coverage_pct,
+    ROUND(COUNT(BEST_PHONE) * 100.0 / COUNT(*), 2) as phone_coverage_pct,
+    ROUND(COUNT(BEST_FIRST_NAME) * 100.0 / COUNT(*), 2) as name_coverage_pct,
+    ROUND(COUNT(BEST_LOCATION) * 100.0 / COUNT(*), 2) as location_coverage_pct
+FROM {master_table};
+```
+
+**Store result as:** `master_coverage`
+
+**Adapt query based on actual columns available**
+
+---
+
+#### Query 8: Master Table Sample Records
+
+```sql
+SELECT *
+FROM {master_table}
+LIMIT 5;
+```
+
+**Store result as:** `master_samples`
+
+**Use for:** Sample records table in report
+
+---
+
+#### Query 9: Unification Metadata (Optional)
+
+```sql
+SELECT
+    CANONICAL_ID_NAME,
+    CANONICAL_ID_TYPE
+FROM {metadata_table};
+```
+
+**Store result as:** `metadata` (optional, may not exist)
+
+---
+
+#### Query 10: Column Lookup Configuration (Optional)
+
+```sql
+SELECT
+    DATABASE_NAME,
+    TABLE_NAME,
+    COLUMN_NAME,
+    KEY_NAME
+FROM {column_lookup_table}
+ORDER BY TABLE_NAME, KEY_NAME;
+```
+
+**Store result as:** `column_mappings` (optional)
+
+---
+
+#### Query 11: Filter Lookup Configuration (Optional)
+
+```sql
+SELECT
+    KEY_NAME,
+    INVALID_TEXTS,
+    VALID_REGEXP
+FROM {filter_lookup_table};
+```
+
+**Store result as:** `validation_rules` (optional)
+
+---
+
+#### Query 12: Master Table Record Count
+
+```sql
+SELECT COUNT(*) as total_records
+FROM {master_table};
+```
+
+**Store result as:** `master_count`
+
+**Validation:** Should equal `unique_canonical_ids`
+
+---
+
+#### Query 13: Deduplication Rate Calculation
+
+```sql
+WITH source_stats AS (
+    SELECT
+        DISTINCT_CUSTOMER_ID as source_customer_id,
+        DISTINCT_EMAIL as source_email,
+        DISTINCT_PHONE as source_phone
+    FROM {source_stats_table}
+    WHERE FROM_TABLE = '*'
+),
+result_stats AS (
+    SELECT TOTAL_DISTINCT as final_canonical_ids
+    FROM {result_stats_table}
+    WHERE FROM_TABLE = '*'
+)
+SELECT
+    source_customer_id,
+    source_email,
+    source_phone,
+    final_canonical_ids,
+    ROUND((source_customer_id - final_canonical_ids) * 100.0 / NULLIF(source_customer_id, 0), 1) as customer_id_dedup_pct,
+    ROUND((source_email - final_canonical_ids) * 100.0 / NULLIF(source_email, 0), 1) as email_dedup_pct,
+    ROUND((source_phone - final_canonical_ids) * 100.0 / NULLIF(source_phone, 0), 1) as phone_dedup_pct
+FROM source_stats, result_stats;
+```
+
+**Store result as:** `deduplication_rates`
+
+---
+
+### Phase 4: Data Processing and Metric Calculation
+
+**Calculate all derived metrics:**
+
+1. **Executive Summary Metrics:**
+   ```python
+   unified_profiles = unique_canonical_ids  # from Query 3
+   total_identities = total_canonical_ids   # from Query 3
+   merge_ratio = total_identities / unified_profiles
+   convergence_iterations = 4  # default or parse from logs if available
+   ```
+
+2. **Fragmentation Reduction:**
+   ```python
+   reduction_pct = ((total_identities - unified_profiles) / total_identities) * 100
+   ```
+
+3. **Deduplication Rates:**
+   ```python
+   customer_id_dedup = deduplication_rates['customer_id_dedup_pct']
+   email_dedup = deduplication_rates['email_dedup_pct']
+   phone_dedup = deduplication_rates['phone_dedup_pct']
+   ```
+
+4. **Merge Distribution Percentages:**
+   ```python
+   for category in merge_distribution:
+       category['pct_profiles'] = (category['canonical_id_count'] / unified_profiles) * 100
+       category['pct_identities'] = (category['total_identities'] / total_identities) * 100
+   ```
+
+5. **Data Quality Score:**
+   ```python
+   quality_scores = [
+       master_coverage['email_coverage_pct'],
+       master_coverage['phone_coverage_pct'],
+       master_coverage['name_coverage_pct'],
+       # ... other coverage metrics
+   ]
+   overall_quality = sum(quality_scores) / len(quality_scores)
+   ```
+
+### Phase 5: HTML Report Generation
+
+**CRITICAL: Use EXACT HTML structure from reference report**
+
+**HTML Template Structure:**
+
+```html
+<!DOCTYPE html>
+<html lang="en">
+<head>
+    <meta charset="UTF-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>ID Unification Merge Statistics Report</title>
+    <style>
+        /* EXACT CSS from reference report */
+        /* Copy all styles exactly */
+    </style>
+</head>
+<body>
+    <div class="container">
+        <header>
+            <h1>ID Unification Merge Statistics Report</h1>
+            <p>Comprehensive Identity Resolution Performance Analysis</p>
+        </header>
+
+        <div class="metadata">
+            <div class="metadata-item">
+                <strong>Database/Catalog:</strong> {database_or_catalog}.{schema}
+            </div>
+            <div class="metadata-item">
+                <strong>Canonical ID:</strong> {canonical_id}
+            </div>
+            <div class="metadata-item">
+                <strong>Generated:</strong> {current_date}
+            </div>
+            <div class="metadata-item">
+                <strong>Platform:</strong> {platform}
+            </div>
+        </div>
+
+        <div class="content">
+            <!-- Section 1: Executive Summary -->
+            <div class="section">
+                <h2 class="section-title">Executive Summary</h2>
+                <div class="metrics-grid">
+                    <div class="metric-card">
+                        <div class="metric-label">Unified Profiles</div>
+                        <div class="metric-value">{unified_profiles:,}</div>
+                        <div class="metric-sublabel">Canonical Customer IDs</div>
+                    </div>
+                    <div class="metric-card">
+                        <div class="metric-label">Total Identities</div>
+                        <div class="metric-value">{total_identities:,}</div>
+                        <div class="metric-sublabel">Raw identity records merged</div>
+                    </div>
+                    <div class="metric-card">
+                        <div class="metric-label">Merge Ratio</div>
+                        <div class="metric-value">{merge_ratio:.2f}:1</div>
+                        <div class="metric-sublabel">Identities per customer</div>
+                    </div>
+                    <div class="metric-card">
+                        <div class="metric-label">Convergence</div>
+                        <div class="metric-value">{convergence_iterations}</div>
+                        <div class="metric-sublabel">Iterations</div>
+                    </div>
+                </div>
+                <div class="insight-box">
+                    <h4>Key Findings</h4>
+                    <ul>
+                        <li><strong>Excellent Merge Performance:</strong> Successfully unified {total_identities:,} identity records into {unified_profiles:,} canonical customer profiles, achieving a {reduction_pct:.1f}% reduction in identity fragmentation.</li>
+                        <!-- Add more insights based on data -->
+                    </ul>
+                </div>
+            </div>
+
+            <!-- Section 2: Identity Resolution Performance -->
+            <div class="section">
+                <h2 class="section-title">Identity Resolution Performance</h2>
+                <table class="stats-table">
+                    <thead>
+                        <tr>
+                            <th>Identity Key Type</th>
+                            <th>Source Distinct Count</th>
+                            <th>Final Canonical IDs</th>
+                            <th>Deduplication Rate</th>
+                            <th>Quality Score</th>
+                        </tr>
+                    </thead>
+                    <tbody>
+                        <!-- For each key type in key_type_distribution -->
+                        <tr>
+                            <td><strong>{key_name}</strong></td>
+                            <td>{source_count:,}</td>
+                            <td>{unique_canonical_ids:,}</td>
+                            <td><span class="highlight">{dedup_pct:.1f}% reduction</span></td>
+                            <td><span class="badge badge-success">Excellent</span></td>
+                        </tr>
+                        <!-- Repeat for each key -->
+                    </tbody>
+                </table>
+                <!-- Add bar charts, insights, etc. -->
+            </div>
+
+            <!-- Section 3: Merge Distribution Analysis -->
+            <!-- Section 4: Top Merged Profiles -->
+            <!-- Section 5: Source Table Configuration -->
+            <!-- Section 6: Master Table Data Quality -->
+            <!-- Section 7: Convergence Performance -->
+            <!-- Section 8: Expert Recommendations -->
+            <!-- Section 9: Summary Statistics -->
+        </div>
+
+        <footer>
+            <div class="footer-note">
+                <p><strong>Report Generated:</strong> {current_date}</p>
+                <p><strong>Platform:</strong> {platform} ({database}.{schema})</p>
+                <p><strong>Workflow:</strong> Hybrid ID Unification</p>
+            </div>
+        </footer>
+    </div>
+</body>
+</html>
+```
+
+**Data Insertion Rules:**
+
+1. **Numbers**: Format with commas (e.g., 19,512)
+2. **Percentages**: Round to 1 decimal place (e.g., 74.7%)
+3. **Ratios**: Round to 2 decimal places (e.g., 3.95:1)
+4. **Dates**: Use YYYY-MM-DD format
+5. **Platform**: Capitalize (Snowflake or Databricks)
+
+**Dynamic Content Generation:**
+
+- For each metric card: Insert actual calculated values
+- For each table row: Loop through result sets
+- For each bar chart: Calculate width percentages
+- For each insight: Generate based on data patterns
+
+### Phase 6: Report Validation and Output
+
+**Pre-Output Validation:**
+
+```python
+validations = [
+    ("All sections have data", check_all_sections_populated()),
+    ("Calculations are correct", verify_calculations()),
+    ("Percentages sum properly", check_percentage_sums()),
+    ("No missing values", check_no_nulls()),
+    ("HTML is well-formed", validate_html_syntax())
+]
+
+for validation_name, result in validations:
+    if not result:
+        raise ValueError(f"Validation failed: {validation_name}")
+```
+
+**File Output:**
+
+```python
+# Use Write tool to save HTML
+Write(
+    file_path=output_path,
+    content=html_content
+)
+
+# Verify file was written
+if file_exists(output_path):
+    file_size = get_file_size(output_path)
+    print(f"✓ Report generated: {output_path}")
+    print(f"✓ File size: {file_size} KB")
+else:
+    raise Error("Failed to write report file")
+```
+
+**Success Summary:**
+
+```
+✓ Report generated successfully
+✓ Location: {output_path}
+✓ File size: {size} KB
+✓ Sections: 9
+✓ Statistics queries: 16
+✓ Unified profiles: {unified_profiles:,}
+✓ Data quality score: {overall_quality:.1f}%
+✓ Ready for viewing and PDF export
+
+Next steps:
+1. Open {output_path} in your browser
+2. Review merge statistics and insights
+3. Print to PDF for distribution (Ctrl+P or Cmd+P)
+4. Share with stakeholders
+```
+
+---
+
+## Error Handling
+
+### Handle These Scenarios:
+
+1. **Tables Not Found:**
+   ```
+   Error: Table {lookup_table} does not exist
+
+   Possible causes:
+   - Canonical ID name is incorrect
+   - Unification workflow not completed
+   - Database/schema name is wrong
+
+   Please verify:
+   - Database/Catalog: {database}
+   - Schema: {schema}
+   - Canonical ID: {canonical_id}
+   - Expected table: {canonical_id}_lookup
+   ```
+
+2. **No Data in Tables:**
+   ```
+   Error: Tables exist but contain no data
+
+   This indicates the unification workflow may have failed.
+
+   Please:
+   1. Check workflow execution logs
+   2. Verify source tables have data
+   3. Re-run the unification workflow
+   4. Try again after successful completion
+   ```
+
+3. **MCP Tools Unavailable:**
+   ```
+   Error: Cannot connect to {platform}
+
+   MCP tools for {platform} are not available.
+
+   Please:
+   1. Verify MCP server configuration
+   2. Check network connectivity
+   3. Validate credentials
+   4. Contact administrator if issue persists
+   ```
+
+4. **Permission Errors:**
+   ```
+   Error: Access denied to {table}
+
+   You don't have SELECT permission on this table.
+
+   Please:
+   1. Request SELECT permission from administrator
+   2. Verify your role has access
+   3. For Snowflake: GRANT SELECT ON SCHEMA {schema} TO {user}
+   4. For Databricks: GRANT SELECT ON {table} TO {user}
+   ```
+
+5. **Column Not Found:**
+   ```
+   Warning: Column {column_name} not found in master table
+
+   Skipping coverage calculation for this attribute.
+   Report will be generated without this metric.
+   ```
+
+---
+
+## Quality Standards
+
+### Report Must Meet These Criteria:
+
+✅ **Accuracy**: All metrics calculated correctly from source data
+✅ **Completeness**: All 9 sections populated with data
+✅ **Consistency**: Same HTML structure every time
+✅ **Readability**: Clear tables, charts, and insights
+✅ **Professional**: Executive-ready formatting and language
+✅ **Actionable**: Includes specific recommendations
+✅ **Validated**: All calculations double-checked
+✅ **Browser-compatible**: Works in Chrome, Firefox, Safari, Edge
+✅ **PDF-ready**: Exports cleanly to PDF
+✅ **Responsive**: Adapts to different screen sizes
+
+---
+
+## Expert Analysis Guidelines
+
+### When Writing Insights:
+
+1. **Be Data-Driven**: Reference specific metrics
+   - "Successfully unified 19,512 identities into 4,940 profiles"
+   - NOT: "Good unification performance"
+
+2. **Provide Context**: Compare to benchmarks
+   - "4-iteration convergence is excellent (typical is 8-12)"
+   - "74.7% fragmentation reduction exceeds industry average of 60%"
+
+3. **Identify Patterns**: Highlight interesting findings
+   - "89% of profiles have 3-5 identities, indicating normal multi-channel engagement"
+   - "Top merged profile has 38 identities - worth investigating"
+
+4. **Give Actionable Recommendations**:
+   - "Review profiles with 20+ merges for data quality issues"
+   - "Implement incremental processing for efficiency"
+
+5. **Assess Quality**: Grade and explain
+   - "Email coverage: 100% - Excellent for marketing"
+   - "Phone coverage: 99.39% - Near-perfect, 30 missing values"
+
+### Badge Assignment:
+
+- **Excellent**: 95-100% coverage or <5% deduplication
+- **Good**: 85-94% coverage or 5-15% deduplication
+- **Needs Improvement**: <85% coverage or >15% deduplication
+
+---
+
+## Platform-Specific Adaptations
+
+### Snowflake Specifics:
+
+- Use UPPERCASE for all identifiers (DATABASE, SCHEMA, TABLE, COLUMN)
+- Use `ARRAY_CONSTRUCT()` for array creation
+- Use `OBJECT_CONSTRUCT()` for objects
+- Date format: `TO_CHAR(CURRENT_DATE(), 'YYYY-MM-DD')`
+
+### Databricks Specifics:
+
+- Use lowercase for identifiers (catalog, schema, table, column)
+- Use `ARRAY()` for array creation
+- Use `STRUCT()` for objects
+- Date format: `DATE_FORMAT(CURRENT_DATE(), 'yyyy-MM-dd')`
+
+---
+
+## Success Checklist
+
+Before marking task complete:
+
+- [ ] All required user inputs collected
+- [ ] Platform and table names validated
+- [ ] All 16 queries executed successfully
+- [ ] All metrics calculated correctly
+- [ ] HTML report generated with all sections
+- [ ] File written to specified path
+- [ ] Success summary displayed to user
+- [ ] No errors or warnings in output
+
+---
+
+## Final Agent Output
+
+**When complete, output this exact format:**
+
+```
+════════════════════════════════════════════════════════════════
+  ID UNIFICATION MERGE STATISTICS REPORT - GENERATION COMPLETE
+════════════════════════════════════════════════════════════════
+
+Platform:               {platform}
+Database/Catalog:       {database}
+Schema:                 {schema}
+Canonical ID:           {canonical_id}
+
+STATISTICS SUMMARY
+──────────────────────────────────────────────────────────────
+Unified Profiles:       {unified_profiles:,}
+Total Identities:       {total_identities:,}
+Merge Ratio:            {merge_ratio:.2f}:1
+Fragmentation Reduction: {reduction_pct:.1f}%
+Data Quality Score:     {quality_score:.1f}%
+
+REPORT DETAILS
+──────────────────────────────────────────────────────────────
+Output File:            {output_path}
+File Size:              {file_size} KB
+Sections Included:      9
+Queries Executed:       16
+Generation Time:        {generation_time} seconds
+
+NEXT STEPS
+──────────────────────────────────────────────────────────────
+1. Open {output_path} in your web browser
+2. Review merge statistics and expert insights
+3. Export to PDF: Press Ctrl+P (Windows) or Cmd+P (Mac)
+4. Share with stakeholders and decision makers
+
+✓ Report generation successful!
+════════════════════════════════════════════════════════════════
+```
+
+---
+
+**You are now ready to execute as the expert merge statistics report generator agent!**
--- a/agents/snowflake-sql-generator.md
+++ b/agents/snowflake-sql-generator.md
@@ -0,0 +1,114 @@
+# Snowflake SQL Generator Agent
+
+## Agent Purpose
+Generate production-ready Snowflake SQL from `unify.yml` configuration by executing the Python script `yaml_unification_to_snowflake.py`.
+
+## Agent Workflow
+
+### Step 1: Validate Inputs
+**Check**:
+- YAML file exists and is valid
+- Target database and schema provided
+- Source database/schema (defaults to target database/PUBLIC if not provided)
+- Output directory path
+
+### Step 2: Execute Python Script
+**Use Bash tool** to execute:
+```bash
+python3 /path/to/plugins/cdp-hybrid-idu/scripts/snowflake/yaml_unification_to_snowflake.py \
+    <yaml_file> \
+    -d <target_database> \
+    -s <target_schema> \
+    -sd <source_database> \
+    -ss <source_schema> \
+    -o <output_directory>
+```
+
+**Parameters**:
+- `<yaml_file>`: Path to unify.yml
+- `-d`: Target database name
+- `-s`: Target schema name
+- `-sd`: Source database (optional, defaults to target database)
+- `-ss`: Source schema (optional, defaults to PUBLIC)
+- `-o`: Output directory (optional, defaults to `snowflake_sql`)
+
+### Step 3: Monitor Execution
+**Track**:
+- Script execution progress
+- Generated SQL file count
+- Any warnings or errors
+- Output directory structure
+
+### Step 4: Parse and Report Results
+**Output**:
+```
+✓ Snowflake SQL generation complete!
+
+Generated Files:
+  • snowflake_sql/unify/01_create_graph.sql
+  • snowflake_sql/unify/02_extract_merge.sql
+  • snowflake_sql/unify/03_source_key_stats.sql
+  • snowflake_sql/unify/04_unify_loop_iteration_01.sql
+  ... (up to iteration_N)
+  • snowflake_sql/unify/05_canonicalize.sql
+  • snowflake_sql/unify/06_result_key_stats.sql
+  • snowflake_sql/unify/10_enrich_*.sql
+  • snowflake_sql/unify/20_master_*.sql
+  • snowflake_sql/unify/30_unification_metadata.sql
+  • snowflake_sql/unify/31_filter_lookup.sql
+  • snowflake_sql/unify/32_column_lookup.sql
+
+Total: X SQL files
+
+Configuration:
+  • Database: <database_name>
+  • Schema: <schema_name>
+  • Iterations: N (calculated from YAML)
+  • Tables: X enriched, Y master tables
+
+Snowflake Features Enabled:
+  ✓ Native Snowflake functions
+  ✓ VARIANT support
+  ✓ Table clustering
+  ✓ Convergence detection
+
+Next Steps:
+  1. Review generated SQL files
+  2. Execute using: /cdp-hybrid-idu:hybrid-execute-snowflake
+  3. Or manually execute in Snowflake SQL worksheet
+```
+
+## Critical Behaviors
+
+### Python Script Error Handling
+If script fails:
+1. Capture error output
+2. Parse error message
+3. Provide helpful suggestions:
+   - YAML syntax errors → validate YAML
+   - Missing dependencies → install pyyaml
+   - Invalid table names → check YAML table section
+   - File permission errors → check output directory permissions
+
+### Success Validation
+Verify:
+- Output directory created
+- All expected SQL files present
+- Files have non-zero content
+- SQL syntax looks valid (basic check)
+
+### Platform-Specific Conversions
+Report applied conversions:
+- Presto/Databricks functions → Snowflake equivalents
+- Array operations → ARRAY_CONSTRUCT/FLATTEN syntax
+- Time functions → DATE_PART(epoch_second, ...)
+- Table definitions → Snowflake syntax
+
+## MUST DO
+
+1. **Always use absolute paths** for plugin scripts
+2. **Check Python version** (require Python 3.7+)
+3. **Parse script output** for errors and warnings
+4. **Verify output directory** structure
+5. **Count generated files** and report summary
+6. **Provide clear next steps** for execution
--- a/agents/snowflake-workflow-executor.md
+++ b/agents/snowflake-workflow-executor.md
@@ -0,0 +1,138 @@
+# Snowflake Workflow Executor Agent
+
+## Agent Purpose
+Execute generated Snowflake SQL workflow with intelligent convergence detection, real-time monitoring, and interactive error handling by orchestrating the Python script `snowflake_sql_executor.py`.
+
+## Agent Workflow
+
+### Step 1: Collect Credentials
+**Required**:
+- SQL directory path
+- Account name
+- Username
+- Database and schema names
+- Warehouse name (defaults to `COMPUTE_WH`)
+
+**Authentication Options**:
+- Password (from argument, environment variable `SNOWFLAKE_PASSWORD`, or prompt)
+- SSO (externalbrowser)
+- Key-pair (using environment variables)
+
+### Step 2: Execute Python Script
+**Use Bash tool** with `run_in_background: true` to execute:
+```bash
+python3 /path/to/plugins/cdp-hybrid-idu/scripts/snowflake/snowflake_sql_executor.py \
+    <sql_directory> \
+    --account <account> \
+    --user <user> \
+    --database <database> \
+    --schema <schema> \
+    --warehouse <warehouse> \
+    --password <password>
+```
+
+### Step 3: Monitor Execution in Real-Time
+**Use BashOutput tool** to stream progress:
+- Connection status
+- File execution progress
+- Row counts and timing
+- Convergence detection results
+- Error messages
+
+**Display Progress**:
+```
+✓ Connected to Snowflake: <account>
+• Using database: <database>, schema: <schema>
+
+Executing: 01_create_graph.sql
+✓ Completed: 01_create_graph.sql
+
+Executing: 02_extract_merge.sql
+✓ Completed: 02_extract_merge.sql
+• Rows affected: 125,000
+
+Executing Unify Loop (convergence detection)
+
+--- Iteration 1 ---
+✓ Iteration 1 completed
+• Updated records: 1,500
+
+--- Iteration 2 ---
+✓ Iteration 2 completed
+• Updated records: 450
+
+--- Iteration 3 ---
+✓ Iteration 3 completed
+• Updated records: 0
+✓ Loop converged after 3 iterations!
+
+• Creating alias table: loop_final
+...
+```
+
+### Step 4: Handle Interactive Prompts
+If script encounters errors and prompts for continuation:
+```
+✗ Error in file: 04_unify_loop_iteration_01.sql
+Error: Table not found
+
+Continue with remaining files? (y/n):
+```
+
+**Agent Decision**:
+1. Show error to user
+2. Ask user for decision
+3. Pass response to script
+
+### Step 5: Final Report
+**After completion**:
+```
+Execution Complete!
+
+Summary:
+  • Files processed: 18/18
+  • Execution time: 45 minutes
+  • Convergence: 3 iterations
+  • Final lookup table rows: 98,500
+
+Validation:
+  ✓ All tables created successfully
+  ✓ Canonical IDs generated
+  ✓ Enriched tables populated
+  ✓ Master tables created
+
+Next Steps:
+  1. Verify data quality
+  2. Check coverage metrics
+  3. Review statistics tables
+```
+
+## Critical Behaviors
+
+### Convergence Monitoring
+Track loop iterations:
+- Iteration number
+- Records updated
+- Convergence status
+
+### Error Recovery
+On errors:
+1. Capture error details
+2. Determine severity (critical vs warning)
+3. Prompt user for continuation decision
+4. Log error for troubleshooting
+
+### Performance Tracking
+Monitor:
+- Execution time per file
+- Row counts processed
+- Total workflow time
+
+## MUST DO
+
+1. **Stream output in real-time** using BashOutput
+2. **Monitor convergence** and report iterations
+3. **Handle user prompts** for error continuation
+4. **Report final statistics** with coverage metrics
+5. **Verify connection** before starting execution
+6. **Clean up** on termination or error
--- a/agents/yaml-configuration-builder.md
+++ b/agents/yaml-configuration-builder.md
@@ -0,0 +1,382 @@
+# YAML Configuration Builder Agent
+
+## Agent Purpose
+Interactive agent to help users create proper `unify.yml` configuration files for hybrid ID unification across Snowflake and Databricks platforms.
+
+## Agent Capabilities
+- Guide users through YAML creation step-by-step
+- Validate configuration in real-time
+- Provide examples and best practices
+- Support both simple and complex configurations
+- Ensure platform compatibility (Snowflake and Databricks)
+
+---
+
+## Agent Workflow
+
+### Step 1: Project Name and Scope
+**Collect**:
+- Unification project name
+- Brief description of use case
+
+**Example Interaction**:
+```
+Question: What would you like to name this unification project?
+Suggestion: Use a descriptive name like 'customer_unification' or 'user_identity_resolution'
+
+User input: customer_360
+
+✓ Project name: customer_360
+```
+
+---
+
+### Step 2: Define Keys (User Identifiers)
+**Collect**:
+- Key names (email, customer_id, phone_number, etc.)
+- Validation rules for each key:
+  - `valid_regexp`: Regex pattern for format validation
+  - `invalid_texts`: Array of values to exclude
+
+**Example Interaction**:
+```
+Question: What user identifier columns (keys) do you want to use for unification?
+
+Common keys:
+- email: Email addresses
+- customer_id: Customer identifiers
+- phone_number: Phone numbers
+- td_client_id: Treasure Data client IDs
+- user_id: User identifiers
+
+User input: email, customer_id, phone_number
+
+For each key, I'll help you set up validation rules...
+
+Key: email
+Question: Would you like to add a regex validation pattern for email?
+Suggestion: Use ".*@.*" for basic email validation or more strict patterns
+
+User input: .*@.*
+
+Question: What values should be considered invalid?
+Suggestion: Common invalid values: '', 'N/A', 'null', 'unknown'
+
+User input: '', 'N/A', 'null'
+
+✓ Key 'email' configured with regex validation and 3 invalid values
+```
+
+**Generate YAML Section**:
+```yaml
+keys:
+  - name: email
+    valid_regexp: ".*@.*"
+    invalid_texts: ['', 'N/A', 'null']
+  - name: customer_id
+    invalid_texts: ['', 'N/A', 'null']
+  - name: phone_number
+    invalid_texts: ['', 'N/A', 'null']
+```
+
+---
+
+### Step 3: Map Tables to Keys
+**Collect**:
+- Source table names
+- Key column mappings for each table
+
+**Example Interaction**:
+```
+Question: What source tables contain user identifiers?
+
+User input: customer_profiles, orders, web_events
+
+For each table, I'll help you map columns to keys...
+
+Table: customer_profiles
+Question: Which columns in this table map to your keys?
+
+Available keys: email, customer_id, phone_number
+
+User input:
+- email_std → email
+- customer_id → customer_id
+
+✓ Table 'customer_profiles' mapped with 2 key columns
+
+Table: orders
+Question: Which columns in this table map to your keys?
+
+User input:
+- email_address → email
+- phone → phone_number
+
+✓ Table 'orders' mapped with 2 key columns
+```
+
+**Generate YAML Section**:
+```yaml
+tables:
+  - table: customer_profiles
+    key_columns:
+      - {column: email_std, key: email}
+      - {column: customer_id, key: customer_id}
+  - table: orders
+    key_columns:
+      - {column: email_address, key: email}
+      - {column: phone, key: phone_number}
+  - table: web_events
+    key_columns:
+      - {column: user_email, key: email}
+```
+
+---
+
+### Step 4: Configure Canonical ID
+**Collect**:
+- Canonical ID name
+- Merge keys (priority order)
+- Iteration count (optional)
+
+**Example Interaction**:
+```
+Question: What would you like to name the canonical ID column?
+Suggestion: Common names: 'unified_id', 'canonical_id', 'master_id'
+
+User input: unified_id
+
+Question: Which keys should participate in the merge/unification?
+Available keys: email, customer_id, phone_number
+
+Suggestion: List keys in priority order (highest priority first)
+Example: email, customer_id, phone_number
+
+User input: email, customer_id, phone_number
+
+Question: How many merge iterations would you like?
+Suggestion:
+  - Leave blank to auto-calculate based on complexity
+  - Typical range: 3-10 iterations
+  - More keys/tables = more iterations needed
+
+User input: (blank - auto-calculate)
+
+✓ Canonical ID 'unified_id' configured with 3 merge keys
+✓ Iterations will be auto-calculated
+```
+
+**Generate YAML Section**:
+```yaml
+canonical_ids:
+  - name: unified_id
+    merge_by_keys: [email, customer_id, phone_number]
+    # merge_iterations: 15auto-calculated
+```
+
+---
+
+### Step 5: Configure Master Tables (Optional)
+**Collect**:
+- Master table names
+- Attributes to aggregate
+- Source column priorities
+
+**Example Interaction**:
+```
+Question: Would you like to create master tables with aggregated attributes?
+(Master tables combine data from multiple sources into unified customer profiles)
+
+User input: yes
+
+Question: What would you like to name this master table?
+Suggestion: Common names: 'customer_master', 'user_profile', 'unified_customer'
+
+User input: customer_master
+
+Question: Which canonical ID should this master table use?
+Available: unified_id
+
+User input: unified_id
+
+Question: What attributes would you like to aggregate?
+
+Attribute 1:
+  Name: best_email
+  Type: single value or array?
+  User input: single value
+
+  Source columns (priority order):
+  1. Table: customer_profiles, Column: email_std, Order by: time
+  2. Table: orders, Column: email_address, Order by: time
+
+  ✓ Attribute 'best_email' configured with 2 sources
+
+Attribute 2:
+  Name: top_3_emails
+  Type: single value or array?
+  User input: array
+  Array size: 3
+
+  Source columns (priority order):
+  1. Table: customer_profiles, Column: email_std, Order by: time
+  2. Table: orders, Column: email_address, Order by: time
+
+  ✓ Attribute 'top_3_emails' configured as array with 2 sources
+```
+
+**Generate YAML Section**:
+```yaml
+master_tables:
+  - name: customer_master
+    canonical_id: unified_id
+    attributes:
+      - name: best_email
+        source_columns:
+          - {table: customer_profiles, column: email_std, priority: 1, order_by: time}
+          - {table: orders, column: email_address, priority: 2, order_by: time}
+      - name: top_3_emails
+        array_elements: 3
+        source_columns:
+          - {table: customer_profiles, column: email_std, priority: 1, order_by: time}
+          - {table: orders, column: email_address, priority: 2, order_by: time}
+```
+
+---
+
+### Step 6: Validation and Finalization
+**Perform**:
+1. Validate complete YAML structure
+2. Check all references
+3. Suggest optimizations
+4. Write final `unify.yml` file
+
+**Example Output**:
+```
+Validating configuration...
+
+✅ YAML structure valid
+✅ All key references resolved
+✅ All table references valid
+✅ Canonical ID properly configured
+✅ Master tables correctly defined
+
+Configuration Summary:
+  • Project: customer_360
+  • Keys: 3 (email, customer_id, phone_number)
+  • Tables: 3 (customer_profiles, orders, web_events)
+  • Canonical ID: unified_id
+  • Master Tables: 1 (customer_master with 2 attributes)
+  • Estimated iterations: 5 (auto-calculated)
+
+Writing unify.yml...
+
+✓ Configuration file created successfully!
+
+File location: ./unify.yml
+```
+
+---
+
+## Agent Output
+
+### Success
+Returns complete `unify.yml` with:
+- All sections properly structured
+- Valid YAML syntax
+- Optimized configuration
+- Ready for SQL generation
+
+### Validation
+Performs checks:
+- YAML syntax validation
+- Reference integrity
+- Best practices compliance
+- Platform compatibility
+
+---
+
+## Agent Behavior Guidelines
+
+### Be Interactive
+- Ask clear questions
+- Provide examples
+- Suggest best practices
+- Validate responses
+
+### Be Helpful
+- Explain concepts when needed
+- Offer suggestions
+- Show examples
+- Guide through complex scenarios
+
+### Be Thorough
+- Don't skip validation
+- Check all references
+- Ensure completeness
+- Verify platform compatibility
+
+---
+
+## Example Complete YAML Output
+
+```yaml
+name: customer_360
+
+keys:
+  - name: email
+    valid_regexp: ".*@.*"
+    invalid_texts: ['', 'N/A', 'null', 'unknown']
+  - name: customer_id
+    invalid_texts: ['', 'N/A', 'null']
+  - name: phone_number
+    invalid_texts: ['', 'N/A', 'null']
+
+tables:
+  - table: customer_profiles
+    key_columns:
+      - {column: email_std, key: email}
+      - {column: customer_id, key: customer_id}
+  - table: orders
+    key_columns:
+      - {column: email_address, key: email}
+      - {column: phone, key: phone_number}
+  - table: web_events
+    key_columns:
+      - {column: user_email, key: email}
+
+canonical_ids:
+  - name: unified_id
+    merge_by_keys: [email, customer_id, phone_number]
+    merge_iterations: 15
+
+master_tables:
+  - name: customer_master
+    canonical_id: unified_id
+    attributes:
+      - name: best_email
+        source_columns:
+          - {table: customer_profiles, column: email_std, priority: 1, order_by: time}
+          - {table: orders, column: email_address, priority: 2, order_by: time}
+      - name: primary_phone
+        source_columns:
+          - {table: orders, column: phone, priority: 1, order_by: time}
+      - name: top_3_emails
+        array_elements: 3
+        source_columns:
+          - {table: customer_profiles, column: email_std, priority: 1, order_by: time}
+          - {table: orders, column: email_address, priority: 2, order_by: time}
+```
+
+---
+
+## CRITICAL: Agent Must
+
+1. **Always validate** YAML syntax before writing file
+2. **Check all references** (keys, tables, canonical_ids)
+3. **Provide examples** for complex configurations
+4. **Suggest optimizations** based on use case
+5. **Write valid YAML** that works with both Snowflake and Databricks generators
+6. **Use proper indentation** (2 spaces per level)
+7. **Quote string values** where necessary
+8. **Test regex patterns** before adding to configuration
--- a/commands/hybrid-execute-databricks.md
+++ b/commands/hybrid-execute-databricks.md
@@ -0,0 +1,387 @@
+---
+name: hybrid-execute-databricks
+description: Execute Databricks ID unification workflow with convergence detection and monitoring
+---
+
+# Execute Databricks ID Unification Workflow
+
+## Overview
+
+Execute your generated Databricks SQL workflow with intelligent convergence detection, real-time monitoring, and interactive error handling. This command orchestrates the complete unification process from graph creation to master table generation.
+
+---
+
+## What You Need
+
+### Required Inputs
+1. **SQL Directory**: Path to generated SQL files (e.g., `databricks_sql/unify/`)
+2. **Server Hostname**: Your Databricks workspace URL (e.g., `your-workspace.cloud.databricks.com`)
+3. **HTTP Path**: SQL Warehouse or cluster path (e.g., `/sql/1.0/warehouses/abc123`)
+4. **Catalog**: Target catalog name
+5. **Schema**: Target schema name
+
+### Authentication
+**Option 1: Personal Access Token (PAT)**
+- Access token from Databricks workspace
+- Can be provided as argument or via environment variable `DATABRICKS_TOKEN`
+
+**Option 2: OAuth**
+- Browser-based authentication
+- No token required, will open browser for login
+
+---
+
+## What I'll Do
+
+### Step 1: Connection Setup
+- Connect to your Databricks workspace
+- Validate credentials and permissions
+- Set catalog and schema context
+- Verify SQL directory exists
+
+### Step 2: Execution Plan
+Display execution plan with:
+- All SQL files in execution order
+- File types (Setup, Loop Iteration, Enrichment, Master Table, etc.)
+- Estimated steps and dependencies
+
+### Step 3: SQL Execution
+I'll call the **databricks-workflow-executor agent** to:
+- Execute SQL files in proper sequence
+- Skip loop iteration files (handled separately)
+- Monitor progress with real-time feedback
+- Track row counts and execution times
+
+### Step 4: Unify Loop with Convergence Detection
+**Intelligent Loop Execution**:
+```
+Iteration 1:
+  ✓ Execute unify SQL
+  • Check convergence: 1500 records updated
+  • Optimize Delta table
+  → Continue to iteration 2
+
+Iteration 2:
+  ✓ Execute unify SQL
+  • Check convergence: 450 records updated
+  • Optimize Delta table
+  → Continue to iteration 3
+
+Iteration 3:
+  ✓ Execute unify SQL
+  • Check convergence: 0 records updated
+  ✓ CONVERGED! Stop loop
+```
+
+**Features**:
+- Runs until convergence (updated_count = 0)
+- Maximum 30 iterations safety limit
+- Auto-optimization after each iteration
+- Creates alias table (loop_final) for downstream processing
+
+### Step 5: Post-Loop Processing
+- Execute canonicalization step
+- Generate result statistics
+- Enrich source tables with canonical IDs
+- Create master tables
+- Generate metadata and lookup tables
+
+### Step 6: Final Report
+Provide:
+- Total execution time
+- Files processed successfully
+- Convergence statistics
+- Final table row counts
+- Next steps and recommendations
+
+---
+
+## Command Usage
+
+### Interactive Mode (Recommended)
+```
+/cdp-hybrid-idu:hybrid-execute-databricks
+
+I'll prompt you for:
+- SQL directory path
+- Databricks server hostname
+- HTTP path
+- Catalog and schema
+- Authentication method
+```
+
+### Advanced Mode
+Provide all parameters upfront:
+```
+SQL directory: databricks_sql/unify/
+Server hostname: your-workspace.cloud.databricks.com
+HTTP path: /sql/1.0/warehouses/abc123
+Catalog: my_catalog
+Schema: my_schema
+Auth type: pat (or oauth)
+Access token: dapi... (if using PAT)
+```
+
+---
+
+## Execution Features
+
+### 1. Convergence Detection
+**Algorithm**:
+```sql
+SELECT COUNT(*) as updated_count FROM (
+    SELECT leader_ns, leader_id, follower_ns, follower_id
+    FROM current_iteration
+    EXCEPT
+    SELECT leader_ns, leader_id, follower_ns, follower_id
+    FROM previous_iteration
+) diff
+```
+
+**Stops when**: updated_count = 0
+
+### 2. Delta Table Optimization
+After major operations:
+```sql
+OPTIMIZE table_name
+```
+Benefits:
+- Compacts small files
+- Improves query performance
+- Reduces storage costs
+- Optimizes clustering
+
+### 3. Interactive Error Handling
+If an error occurs:
+```
+✗ File: 04_unify_loop_iteration_01.sql
+Error: Table not found: source_table
+
+Continue with remaining files? (y/n):
+```
+
+You can choose to:
+- Continue: Skip failed file, continue with rest
+- Stop: Halt execution for investigation
+
+### 4. Real-Time Monitoring
+Track progress with:
+- ✓ Completed steps (green)
+- • Progress indicators (cyan)
+- ✗ Failed steps (red)
+- ⚠ Warnings (yellow)
+- Row counts and execution times
+
+### 5. Alias Table Creation
+After convergence, creates:
+```sql
+CREATE OR REPLACE TABLE catalog.schema.unified_id_graph_unify_loop_final
+AS SELECT * FROM catalog.schema.unified_id_graph_unify_loop_3
+```
+
+This allows downstream SQL to reference `loop_final` regardless of actual iteration count.
+
+---
+
+## Technical Details
+
+### Python Script Execution
+The agent executes:
+```bash
+python3 scripts/databricks/databricks_sql_executor.py \
+    databricks_sql/unify/ \
+    --server-hostname your-workspace.databricks.com \
+    --http-path /sql/1.0/warehouses/abc123 \
+    --catalog my_catalog \
+    --schema my_schema \
+    --auth-type pat \
+    --optimize-tables
+```
+
+### Execution Order
+1. **Setup Phase** (01-03):
+   - Create graph table (loop_0)
+   - Extract and merge identities
+   - Generate source statistics
+
+2. **Unification Loop** (04):
+   - Run iterations until convergence
+   - Check after EVERY iteration
+   - Stop when updated_count = 0
+   - Create loop_final alias
+
+3. **Canonicalization** (05):
+   - Create canonical ID lookup
+   - Create keys and tables metadata
+   - Rename final graph table
+
+4. **Statistics** (06):
+   - Generate result key statistics
+   - Create histograms
+   - Calculate coverage metrics
+
+5. **Enrichment** (10-19):
+   - Add canonical IDs to source tables
+   - Create enriched_* tables
+
+6. **Master Tables** (20-29):
+   - Aggregate attributes
+   - Apply priority rules
+   - Create unified customer profiles
+
+7. **Metadata** (30-39):
+   - Unification metadata
+   - Filter lookup tables
+   - Column lookup tables
+
+### Connection Management
+- Establishes single connection for entire workflow
+- Uses connection pooling for efficiency
+- Automatic reconnection on timeout
+- Proper cleanup on completion or error
+
+---
+
+## Example Execution
+
+### Input
+```
+SQL directory: databricks_sql/unify/
+Server hostname: dbc-12345-abc.cloud.databricks.com
+HTTP path: /sql/1.0/warehouses/6789abcd
+Catalog: customer_data
+Schema: id_unification
+Auth type: pat
+```
+
+### Output
+```
+✓ Connected to Databricks: dbc-12345-abc.cloud.databricks.com
+• Using catalog: customer_data, schema: id_unification
+
+Starting Databricks SQL Execution
+• Catalog: customer_data
+• Schema: id_unification
+• Delta tables: ✓ enabled
+
+Executing: 01_create_graph.sql
+✓ 01_create_graph.sql: Executed successfully
+
+Executing: 02_extract_merge.sql
+✓ 02_extract_merge.sql: Executed successfully
+• Rows affected: 125000
+
+Executing: 03_source_key_stats.sql
+✓ 03_source_key_stats.sql: Executed successfully
+
+Executing Unify Loop Before Canonicalization
+
+--- Iteration 1 ---
+✓ Iteration 1 completed
+• Rows processed: 125000
+• Updated records: 1500
+• Optimizing Delta table
+
+--- Iteration 2 ---
+✓ Iteration 2 completed
+• Rows processed: 125000
+• Updated records: 450
+• Optimizing Delta table
+
+--- Iteration 3 ---
+✓ Iteration 3 completed
+• Rows processed: 125000
+• Updated records: 0
+✓ Loop converged after 3 iterations
+
+• Creating alias table for final iteration
+✓ Alias table 'unified_id_graph_unify_loop_final' created
+
+Executing: 05_canonicalize.sql
+✓ 05_canonicalize.sql: Executed successfully
+
+[... continues with enrichment, master tables, metadata ...]
+
+Execution Complete
+• Files processed: 18/18
+• Final unified_id_lookup rows: 98,500
+
+• Disconnected from Databricks
+```
+
+---
+
+## Monitoring and Troubleshooting
+
+### Check Execution Progress
+During execution, you can monitor:
+- Databricks SQL Warehouse query history
+- Delta table sizes and row counts
+- Execution logs in Databricks workspace
+
+### Common Issues
+
+**Issue**: Connection timeout
+**Solution**: Check network access, verify credentials, ensure SQL Warehouse is running
+
+**Issue**: Table not found
+**Solution**: Verify catalog/schema permissions, check source table names in YAML
+
+**Issue**: Loop doesn't converge
+**Solution**: Check data quality, increase max_iterations, review key validation rules
+
+**Issue**: Out of memory
+**Solution**: Increase SQL Warehouse size, optimize clustering, reduce batch sizes
+
+**Issue**: Permission denied
+**Solution**: Verify catalog/schema permissions, check Unity Catalog access controls
+
+### Performance Optimization
+- Use larger SQL Warehouse for faster execution
+- Enable auto-scaling for variable workloads
+- Optimize Delta tables regularly
+- Use clustering on frequently joined columns
+
+---
+
+## Post-Execution Validation
+**DO NOT RUN THESE VALIDATION. JUST PRESENT TO USER TO RUN ON DATABRICKS**
+
+### Check Coverage
+```sql
+SELECT
+    COUNT(*) as total_records,
+    COUNT(unified_id) as records_with_id,
+    COUNT(unified_id) * 100.0 / COUNT(*) as coverage_percent
+FROM catalog.schema.enriched_customer_profiles;
+```
+
+### Verify Master Table
+```sql
+SELECT COUNT(*) as unified_customers
+FROM catalog.schema.customer_master;
+```
+
+### Review Statistics
+```sql
+SELECT * FROM catalog.schema.unified_id_result_key_stats
+WHERE from_table = '*';
+```
+
+---
+
+## Success Criteria
+
+Execution successful when:
+- ✅ All SQL files processed without critical errors
+- ✅ Unification loop converged (updated_count = 0)
+- ✅ Canonical IDs generated for all eligible records
+- ✅ Enriched tables created successfully
+- ✅ Master tables populated with attributes
+- ✅ Coverage metrics meet expectations
+
+---
+
+**Ready to execute your Databricks ID unification workflow?**
+
+Provide your SQL directory path and Databricks connection details to begin!
--- a/commands/hybrid-execute-snowflake.md
+++ b/commands/hybrid-execute-snowflake.md
@@ -0,0 +1,401 @@
+---
+name: hybrid-execute-snowflake
+description: Execute Snowflake ID unification workflow with convergence detection and monitoring
+---
+
+# Execute Snowflake ID Unification Workflow
+
+## Overview
+
+Execute your generated Snowflake SQL workflow with intelligent convergence detection, real-time monitoring, and interactive error handling. This command orchestrates the complete unification process from graph creation to master table generation.
+
+---
+
+## What You Need
+
+### Required Inputs
+1. **SQL Directory**: Path to generated SQL files (e.g., `snowflake_sql/unify/`)
+2. **Account**: Snowflake account name (e.g., `myaccount` from `myaccount.snowflakecomputing.com`)
+3. **User**: Snowflake username
+4. **Database**: Target database name
+5. **Schema**: Target schema name
+6. **Warehouse**: Compute warehouse name (defaults to `COMPUTE_WH`)
+
+### Authentication
+**Option 1: Password**
+- Can be provided as argument or via environment variable `SNOWFLAKE_PASSWORD` via environment file (.env) `SNOWFLAKE_PASSWORD`
+- Will prompt if not provided
+
+**Option 2: SSO (externalbrowser)**
+- Opens browser for authentication
+- No password required
+
+**Option 3: Key-Pair**
+- Private key path via `SNOWFLAKE_PRIVATE_KEY_PATH`
+- Passphrase via `SNOWFLAKE_PRIVATE_KEY_PASSPHRASE`
+
+---
+
+## What I'll Do
+
+### Step 1: Connection Setup
+- Connect to your Snowflake account
+- Validate credentials and permissions
+- Set database and schema context
+- Verify SQL directory exists
+- Activate warehouse
+
+### Step 2: Execution Plan
+Display execution plan with:
+- All SQL files in execution order
+- File types (Setup, Loop Iteration, Enrichment, Master Table, etc.)
+- Estimated steps and dependencies
+
+### Step 3: SQL Execution
+I'll call the **snowflake-workflow-executor agent** to:
+- Execute SQL files in proper sequence
+- Skip loop iteration files (handled separately)
+- Monitor progress with real-time feedback
+- Track row counts and execution times
+
+### Step 4: Unify Loop with Convergence Detection
+**Intelligent Loop Execution**:
+```
+Iteration 1:
+  ✓ Execute unify SQL
+  • Check convergence: 1500 records updated
+  → Continue to iteration 2
+
+Iteration 2:
+  ✓ Execute unify SQL
+  • Check convergence: 450 records updated
+  → Continue to iteration 3
+
+Iteration 3:
+  ✓ Execute unify SQL
+  • Check convergence: 0 records updated
+  ✓ CONVERGED! Stop loop
+```
+
+**Features**:
+- Runs until convergence (updated_count = 0)
+- Maximum 30 iterations safety limit
+- Creates alias table (loop_final) for downstream processing
+
+### Step 5: Post-Loop Processing
+- Execute canonicalization step
+- Generate result statistics
+- Enrich source tables with canonical IDs
+- Create master tables
+- Generate metadata and lookup tables
+
+### Step 6: Final Report
+Provide:
+- Total execution time
+- Files processed successfully
+- Convergence statistics
+- Final table row counts
+- Next steps and recommendations
+
+---
+
+## Command Usage
+
+### Interactive Mode (Recommended)
+```
+/cdp-hybrid-idu:hybrid-execute-snowflake
+
+I'll prompt you for:
+- SQL directory path
+- Snowflake account name
+- Username
+- Database and schema
+- Warehouse name
+- Authentication method
+```
+
+### Advanced Mode
+Provide all parameters upfront:
+```
+SQL directory: snowflake_sql/unify/
+Account: myaccount
+User: myuser
+Database: my_database
+Schema: my_schema
+Warehouse: COMPUTE_WH
+Password: (will prompt if not in environment)
+```
+
+---
+
+## Execution Features
+
+### 1. Convergence Detection
+**Algorithm**:
+```sql
+SELECT COUNT(*) as updated_count FROM (
+    SELECT leader_ns, leader_id, follower_ns, follower_id
+    FROM current_iteration
+    EXCEPT
+    SELECT leader_ns, leader_id, follower_ns, follower_id
+    FROM previous_iteration
+) diff
+```
+
+**Stops when**: updated_count = 0
+
+### 2. Interactive Error Handling
+If an error occurs:
+```
+✗ File: 04_unify_loop_iteration_01.sql
+Error: Table not found: source_table
+
+Continue with remaining files? (y/n):
+```
+
+You can choose to:
+- Continue: Skip failed file, continue with rest
+- Stop: Halt execution for investigation
+
+### 3. Real-Time Monitoring
+Track progress with:
+- ✓ Completed steps (green)
+- • Progress indicators (cyan)
+- ✗ Failed steps (red)
+- ⚠ Warnings (yellow)
+- Row counts and execution times
+
+### 4. Alias Table Creation
+After convergence, creates:
+```sql
+CREATE OR REPLACE TABLE database.schema.unified_id_graph_unify_loop_final
+AS SELECT * FROM database.schema.unified_id_graph_unify_loop_3
+```
+
+This allows downstream SQL to reference `loop_final` regardless of actual iteration count.
+
+---
+
+## Technical Details
+
+### Python Script Execution
+The agent executes:
+```bash
+python3 scripts/snowflake/snowflake_sql_executor.py \
+    snowflake_sql/unify/ \
+    --account myaccount \
+    --user myuser \
+    --database my_database \
+    --schema my_schema \
+    --warehouse COMPUTE_WH
+```
+
+### Execution Order
+1. **Setup Phase** (01-03):
+   - Create graph table (loop_0)
+   - Extract and merge identities
+   - Generate source statistics
+
+2. **Unification Loop** (04):
+   - Run iterations until convergence
+   - Check after EVERY iteration
+   - Stop when updated_count = 0
+   - Create loop_final alias
+
+3. **Canonicalization** (05):
+   - Create canonical ID lookup
+   - Create keys and tables metadata
+   - Rename final graph table
+
+4. **Statistics** (06):
+   - Generate result key statistics
+   - Create histograms
+   - Calculate coverage metrics
+
+5. **Enrichment** (10-19):
+   - Add canonical IDs to source tables
+   - Create enriched_* tables
+
+6. **Master Tables** (20-29):
+   - Aggregate attributes
+   - Apply priority rules
+   - Create unified customer profiles
+
+7. **Metadata** (30-39):
+   - Unification metadata
+   - Filter lookup tables
+   - Column lookup tables
+
+### Connection Management
+- Establishes single connection for entire workflow
+- Uses connection pooling for efficiency
+- Automatic reconnection on timeout
+- Proper cleanup on completion or error
+
+---
+
+## Example Execution
+
+### Input
+```
+SQL directory: snowflake_sql/unify/
+Account: myorg-myaccount
+User: analytics_user
+Database: customer_data
+Schema: id_unification
+Warehouse: LARGE_WH
+```
+
+### Output
+```
+✓ Connected to Snowflake: myorg-myaccount
+• Using database: customer_data, schema: id_unification
+
+Starting Snowflake SQL Execution
+• Database: customer_data
+• Schema: id_unification
+
+Executing: 01_create_graph.sql
+✓ 01_create_graph.sql: Executed successfully
+
+Executing: 02_extract_merge.sql
+✓ 02_extract_merge.sql: Executed successfully
+• Rows affected: 125000
+
+Executing: 03_source_key_stats.sql
+✓ 03_source_key_stats.sql: Executed successfully
+
+Executing Unify Loop Before Canonicalization
+
+--- Iteration 1 ---
+✓ Iteration 1 completed
+• Rows processed: 125000
+• Updated records: 1500
+
+--- Iteration 2 ---
+✓ Iteration 2 completed
+• Rows processed: 125000
+• Updated records: 450
+
+--- Iteration 3 ---
+✓ Iteration 3 completed
+• Rows processed: 125000
+• Updated records: 0
+✓ Loop converged after 3 iterations
+
+• Creating alias table for final iteration
+✓ Alias table 'unified_id_graph_unify_loop_final' created
+
+Executing: 05_canonicalize.sql
+✓ 05_canonicalize.sql: Executed successfully
+
+[... continues with enrichment, master tables, metadata ...]
+
+Execution Complete
+• Files processed: 18/18
+• Final unified_id_lookup rows: 98,500
+
+• Disconnected from Snowflake
+```
+
+---
+
+## Monitoring and Troubleshooting
+
+### Check Execution Progress
+During execution, you can monitor:
+- Snowflake query history
+- Table sizes and row counts
+- Warehouse utilization
+- Execution logs
+
+### Common Issues
+
+**Issue**: Connection timeout
+**Solution**: Check network access, verify credentials, ensure warehouse is running
+
+**Issue**: Table not found
+**Solution**: Verify database/schema permissions, check source table names in YAML
+
+**Issue**: Loop doesn't converge
+**Solution**: Check data quality, increase max_iterations, review key validation rules
+
+**Issue**: Warehouse suspended
+**Solution**: Ensure auto-resume is enabled, manually resume warehouse if needed
+
+**Issue**: Permission denied
+**Solution**: Verify database/schema permissions, check role assignments
+
+### Performance Optimization
+- Use larger warehouse for faster execution (L, XL, 2XL, etc.)
+- Enable multi-cluster warehouse for concurrency
+- Use clustering keys on frequently joined columns
+- Monitor query profiles for optimization opportunities
+
+---
+
+## Post-Execution Validation 
+**DO NOT RUN THESE VALIDATION. JUST PRESENT TO USER TO RUN ON SNOWFLAKE**
+
+### Check Coverage
+```sql
+SELECT
+    COUNT(*) as total_records,
+    COUNT(unified_id) as records_with_id,
+    COUNT(unified_id) * 100.0 / COUNT(*) as coverage_percent
+FROM database.schema.enriched_customer_profiles;
+```
+
+### Verify Master Table
+```sql
+SELECT COUNT(*) as unified_customers
+FROM database.schema.customer_master;
+```
+
+### Review Statistics
+```sql
+SELECT * FROM database.schema.unified_id_result_key_stats
+WHERE from_table = '*';
+```
+
+---
+
+## Success Criteria
+
+Execution successful when:
+- ✅ All SQL files processed without critical errors
+- ✅ Unification loop converged (updated_count = 0)
+- ✅ Canonical IDs generated for all eligible records
+- ✅ Enriched tables created successfully
+- ✅ Master tables populated with attributes
+- ✅ Coverage metrics meet expectations
+
+---
+
+## Authentication Examples
+
+### Using Password
+```bash
+export SNOWFLAKE_PASSWORD='your_password'
+/cdp-hybrid-idu:hybrid-execute-snowflake
+```
+
+### Using SSO
+```bash
+/cdp-hybrid-idu:hybrid-execute-snowflake
+# Will prompt: Use SSO authentication? (y/n): y
+# Opens browser for authentication
+```
+
+### Using Key-Pair
+```bash
+export SNOWFLAKE_PRIVATE_KEY_PATH='/path/to/key.p8'
+export SNOWFLAKE_PRIVATE_KEY_PASSPHRASE='passphrase'
+/cdp-hybrid-idu:hybrid-execute-snowflake
+```
+
+---
+
+**Ready to execute your Snowflake ID unification workflow?**
+
+Provide your SQL directory path and Snowflake connection details to begin!
--- a/commands/hybrid-generate-databricks.md
+++ b/commands/hybrid-generate-databricks.md
@@ -0,0 +1,285 @@
+---
+name: hybrid-generate-databricks
+description: Generate Databricks Delta Lake SQL from YAML configuration for ID unification
+---
+
+# Generate Databricks SQL from YAML
+
+## Overview
+
+Generate production-ready Databricks SQL workflow from your `unify.yml` configuration file. This command creates Delta Lake optimized SQL files with ACID transactions, clustering, and platform-specific function conversions.
+
+---
+
+## What You Need
+
+### Required Inputs
+1. **YAML Configuration File**: Path to your `unify.yml`
+2. **Target Catalog**: Databricks Unity Catalog name
+3. **Target Schema**: Schema name within the catalog
+
+### Optional Inputs
+4. **Source Catalog**: Catalog containing source tables (defaults to target catalog)
+5. **Source Schema**: Schema containing source tables (defaults to target schema)
+6. **Output Directory**: Where to save generated SQL (defaults to `databricks_sql/`)
+
+---
+
+## What I'll Do
+
+### Step 1: Validation
+- Verify `unify.yml` exists and is valid
+- Check YAML syntax and structure
+- Validate keys, tables, and configuration sections
+
+### Step 2: SQL Generation
+I'll call the **databricks-sql-generator agent** to:
+- Execute `yaml_unification_to_databricks.py` Python script
+- Apply Databricks-specific SQL conversions:
+  - `ARRAY_SIZE` → `SIZE`
+  - `ARRAY_CONSTRUCT` → `ARRAY`
+  - `OBJECT_CONSTRUCT` → `STRUCT`
+  - `COLLECT_LIST` for aggregations
+  - `FLATTEN` for array operations
+  - `UNIX_TIMESTAMP()` for time functions
+- Generate Delta Lake table definitions with clustering
+- Create convergence detection logic
+- Build cryptographic hashing for canonical IDs
+
+### Step 3: Output Organization
+Generate complete SQL workflow in this structure:
+```
+databricks_sql/unify/
+├── 01_create_graph.sql              # Initialize graph with USING DELTA
+├── 02_extract_merge.sql             # Extract identities with validation
+├── 03_source_key_stats.sql          # Source statistics with GROUPING SETS
+├── 04_unify_loop_iteration_*.sql    # Loop iterations (auto-calculated count)
+├── 05_canonicalize.sql              # Canonical ID creation with key masks
+├── 06_result_key_stats.sql          # Result statistics with histograms
+├── 10_enrich_*.sql                  # Enrich each source table
+├── 20_master_*.sql                  # Master tables with attribute aggregation
+├── 30_unification_metadata.sql      # Metadata tables
+├── 31_filter_lookup.sql             # Validation rules lookup
+└── 32_column_lookup.sql             # Column mapping lookup
+```
+
+### Step 4: Summary Report
+Provide:
+- Total SQL files generated
+- Estimated execution order
+- Delta Lake optimizations included
+- Key features enabled
+- Next steps for execution
+
+---
+
+## Command Usage
+
+### Basic Usage
+```
+/cdp-hybrid-idu:hybrid-generate-databricks
+
+I'll prompt you for:
+- YAML file path
+- Target catalog
+- Target schema
+```
+
+### Advanced Usage
+Provide all parameters upfront:
+```
+YAML file: /path/to/unify.yml
+Target catalog: my_catalog
+Target schema: my_schema
+Source catalog: source_catalog (optional)
+Source schema: source_schema (optional)
+Output directory: custom_output/ (optional)
+```
+
+---
+
+## Generated SQL Features
+
+### Delta Lake Optimizations
+- **ACID Transactions**: `USING DELTA` for all tables
+- **Clustering**: `CLUSTER BY (follower_id)` on graph tables
+- **Table Properties**: Optimized for large-scale joins
+
+### Advanced Capabilities
+1. **Dynamic Iteration Count**: Auto-calculates based on:
+   - Number of merge keys
+   - Number of tables
+   - Data complexity (configurable via YAML)
+
+2. **Key-Specific Hashing**: Each key uses unique cryptographic mask:
+   ```
+   Key Type 1 (email):       0ffdbcf0c666ce190d
+   Key Type 2 (customer_id): 61a821f2b646a4e890
+   Key Type 3 (phone):       acd2206c3f88b3ee27
+   ```
+
+3. **Validation Rules**:
+   - `valid_regexp`: Regex pattern filtering
+   - `invalid_texts`: NOT IN clause with NULL handling
+   - Combined AND logic for strict validation
+
+4. **Master Table Attributes**:
+   - Single value: `MAX_BY(attr, order)` with COALESCE
+   - Array values: `SLICE(CONCAT(arrays), 1, N)`
+   - Priority-based selection
+
+### Platform-Specific Conversions
+The generator automatically converts:
+- Presto functions → Databricks equivalents
+- Snowflake functions → Databricks equivalents
+- Array operations → Spark SQL syntax
+- Window functions → optimized versions
+- Time functions → UNIX_TIMESTAMP()
+
+---
+
+## Example Workflow
+
+### Input YAML (`unify.yml`)
+```yaml
+name: customer_unification
+
+keys:
+  - name: email
+    valid_regexp: ".*@.*"
+    invalid_texts: ['', 'N/A', 'null']
+  - name: customer_id
+    invalid_texts: ['', 'N/A']
+
+tables:
+  - table: customer_profiles
+    key_columns:
+      - {column: email_std, key: email}
+      - {column: customer_id, key: customer_id}
+
+canonical_ids:
+  - name: unified_id
+    merge_by_keys: [email, customer_id]
+    merge_iterations: 15
+
+master_tables:
+  - name: customer_master
+    canonical_id: unified_id
+    attributes:
+      - name: best_email
+        source_columns:
+          - {table: customer_profiles, column: email_std, priority: 1}
+```
+
+### Generated Output
+```
+databricks_sql/unify/
+├── 01_create_graph.sql              # Creates unified_id_graph_unify_loop_0
+├── 02_extract_merge.sql             # Merges customer_profiles keys
+├── 03_source_key_stats.sql          # Stats by table
+├── 04_unify_loop_iteration_01.sql   # First iteration
+├── 04_unify_loop_iteration_02.sql   # Second iteration
+├── ...                              # Up to iteration_05
+├── 05_canonicalize.sql              # Creates unified_id_lookup
+├── 06_result_key_stats.sql          # Final statistics
+├── 10_enrich_customer_profiles.sql  # Adds unified_id column
+├── 20_master_customer_master.sql    # Creates customer_master table
+├── 30_unification_metadata.sql      # Metadata
+├── 31_filter_lookup.sql             # Validation rules
+└── 32_column_lookup.sql             # Column mappings
+```
+
+---
+
+## Next Steps After Generation
+
+### Option 1: Execute Immediately
+Use the execution command:
+```
+/cdp-hybrid-idu:hybrid-execute-databricks
+```
+
+### Option 2: Review First
+1. Examine generated SQL files
+2. Verify table names and transformations
+3. Test with sample data
+4. Execute manually or via execution command
+
+### Option 3: Customize
+1. Modify generated SQL as needed
+2. Add custom logic or transformations
+3. Execute using Databricks SQL editor or execution command
+
+---
+
+## Technical Details
+
+### Python Script Execution
+The agent executes:
+```bash
+python3 scripts/databricks/yaml_unification_to_databricks.py \
+    unify.yml \
+    -tc my_catalog \
+    -ts my_schema \
+    -sc source_catalog \
+    -ss source_schema \
+    -o databricks_sql
+```
+
+### SQL File Naming Convention
+- `01-09`: Setup and initialization
+- `10-19`: Source table enrichment
+- `20-29`: Master table creation
+- `30-39`: Metadata and lookup tables
+- `04_*_NN`: Loop iterations (auto-numbered)
+
+### Convergence Detection
+Each loop iteration includes:
+```sql
+-- Check if graph changed
+SELECT COUNT(*) FROM (
+    SELECT leader_ns, leader_id, follower_ns, follower_id
+    FROM iteration_N
+    EXCEPT
+    SELECT leader_ns, leader_id, follower_ns, follower_id
+    FROM iteration_N_minus_1
+) diff
+```
+Stops when count = 0
+
+---
+
+## Troubleshooting
+
+### Common Issues
+
+**Issue**: YAML validation error
+**Solution**: Check YAML syntax, ensure proper indentation, verify all required fields
+
+**Issue**: Table not found error
+**Solution**: Verify source catalog/schema, check table names in YAML
+
+**Issue**: Python script error
+**Solution**: Ensure Python 3.7+ installed, check pyyaml dependency
+
+**Issue**: Too many/few iterations
+**Solution**: Adjust `merge_iterations` in canonical_ids section of YAML
+
+---
+
+## Success Criteria
+
+Generated SQL will:
+- ✅ Be valid Databricks Spark SQL
+- ✅ Use Delta Lake for ACID transactions
+- ✅ Include proper clustering for performance
+- ✅ Have convergence detection built-in
+- ✅ Support incremental processing
+- ✅ Generate comprehensive statistics
+- ✅ Work without modification on Databricks
+
+---
+
+**Ready to generate Databricks SQL from your YAML configuration?**
+
+Provide your YAML file path and target catalog/schema to begin!
--- a/commands/hybrid-generate-snowflake.md
+++ b/commands/hybrid-generate-snowflake.md
@@ -0,0 +1,288 @@
+---
+name: hybrid-generate-snowflake
+description: Generate Snowflake SQL from YAML configuration for ID unification
+---
+
+# Generate Snowflake SQL from YAML
+
+## Overview
+
+Generate production-ready Snowflake SQL workflow from your `unify.yml` configuration file. This command creates Snowflake-native SQL files with proper clustering, VARIANT support, and platform-specific function conversions.
+
+---
+
+## What You Need
+
+### Required Inputs
+1. **YAML Configuration File**: Path to your `unify.yml`
+2. **Target Database**: Snowflake database name
+3. **Target Schema**: Schema name within the database
+
+### Optional Inputs
+4. **Source Database**: Database containing source tables (defaults to target database)
+5. **Source Schema**: Schema containing source tables (defaults to PUBLIC)
+6. **Output Directory**: Where to save generated SQL (defaults to `snowflake_sql/`)
+
+---
+
+## What I'll Do
+
+### Step 1: Validation
+- Verify `unify.yml` exists and is valid
+- Check YAML syntax and structure
+- Validate keys, tables, and configuration sections
+
+### Step 2: SQL Generation
+I'll call the **snowflake-sql-generator agent** to:
+- Execute `yaml_unification_to_snowflake.py` Python script
+- Generate Snowflake table definitions with clustering
+- Create convergence detection logic
+- Build cryptographic hashing for canonical IDs
+
+### Step 3: Output Organization
+Generate complete SQL workflow in this structure:
+```
+snowflake_sql/unify/
+├── 01_create_graph.sql              # Initialize graph table
+├── 02_extract_merge.sql             # Extract identities with validation
+├── 03_source_key_stats.sql          # Source statistics with GROUPING SETS
+├── 04_unify_loop_iteration_*.sql    # Loop iterations (auto-calculated count)
+├── 05_canonicalize.sql              # Canonical ID creation with key masks
+├── 06_result_key_stats.sql          # Result statistics with histograms
+├── 10_enrich_*.sql                  # Enrich each source table
+├── 20_master_*.sql                  # Master tables with attribute aggregation
+├── 30_unification_metadata.sql      # Metadata tables
+├── 31_filter_lookup.sql             # Validation rules lookup
+└── 32_column_lookup.sql             # Column mapping lookup
+```
+
+### Step 4: Summary Report
+Provide:
+- Total SQL files generated
+- Estimated execution order
+- Snowflake optimizations included
+- Key features enabled
+- Next steps for execution
+
+---
+
+## Command Usage
+
+### Basic Usage
+```
+/cdp-hybrid-idu:hybrid-generate-snowflake
+
+I'll prompt you for:
+- YAML file path
+- Target database
+- Target schema
+```
+
+### Advanced Usage
+Provide all parameters upfront:
+```
+YAML file: /path/to/unify.yml
+Target database: my_database
+Target schema: my_schema
+Source database: source_database (optional)
+Source schema: PUBLIC (optional, defaults to PUBLIC)
+Output directory: custom_output/ (optional)
+```
+
+---
+
+## Generated SQL Features
+
+### Snowflake Optimizations
+- **Clustering**: `CLUSTER BY (follower_id)` on graph tables
+- **VARIANT Support**: Flexible data structures for arrays and objects
+- **Native Functions**: Snowflake-specific optimized functions
+
+### Advanced Capabilities
+1. **Dynamic Iteration Count**: Auto-calculates based on:
+   - Number of merge keys
+   - Number of tables
+   - Data complexity (configurable via YAML)
+
+2. **Key-Specific Hashing**: Each key uses unique cryptographic mask:
+   ```
+   Key Type 1 (email):       0ffdbcf0c666ce190d
+   Key Type 2 (customer_id): 61a821f2b646a4e890
+   Key Type 3 (phone):       acd2206c3f88b3ee27
+   ```
+
+3. **Validation Rules**:
+   - `valid_regexp`: REGEXP_LIKE pattern filtering
+   - `invalid_texts`: NOT IN clause with proper NULL handling
+   - Combined AND logic for strict validation
+
+4. **Master Table Attributes**:
+   - Single value: `MAX_BY(attr, order)` with COALESCE
+   - Array values: `ARRAY_SLICE(ARRAY_CAT(arrays), 0, N)`
+   - Priority-based selection
+
+### Platform-Specific Conversions
+The generator automatically converts:
+- Presto functions → Snowflake equivalents
+- Databricks functions → Snowflake equivalents
+- Array operations → ARRAY_CONSTRUCT/FLATTEN syntax
+- Window functions → optimized versions
+- Time functions → DATE_PART(epoch_second, CURRENT_TIMESTAMP())
+
+---
+
+## Example Workflow
+
+### Input YAML (`unify.yml`)
+```yaml
+name: customer_unification
+
+keys:
+  - name: email
+    valid_regexp: ".*@.*"
+    invalid_texts: ['', 'N/A', 'null']
+  - name: customer_id
+    invalid_texts: ['', 'N/A']
+
+tables:
+  - table: customer_profiles
+    key_columns:
+      - {column: email_std, key: email}
+      - {column: customer_id, key: customer_id}
+
+canonical_ids:
+  - name: unified_id
+    merge_by_keys: [email, customer_id]
+    merge_iterations: 15
+
+master_tables:
+  - name: customer_master
+    canonical_id: unified_id
+    attributes:
+      - name: best_email
+        source_columns:
+          - {table: customer_profiles, column: email_std, priority: 1}
+```
+
+### Generated Output
+```
+snowflake_sql/unify/
+├── 01_create_graph.sql              # Creates unified_id_graph_unify_loop_0
+├── 02_extract_merge.sql             # Merges customer_profiles keys
+├── 03_source_key_stats.sql          # Stats by table
+├── 04_unify_loop_iteration_01.sql   # First iteration
+├── 04_unify_loop_iteration_02.sql   # Second iteration
+├── ...                              # Up to iteration_05
+├── 05_canonicalize.sql              # Creates unified_id_lookup
+├── 06_result_key_stats.sql          # Final statistics
+├── 10_enrich_customer_profiles.sql  # Adds unified_id column
+├── 20_master_customer_master.sql    # Creates customer_master table
+├── 30_unification_metadata.sql      # Metadata
+├── 31_filter_lookup.sql             # Validation rules
+└── 32_column_lookup.sql             # Column mappings
+```
+
+---
+
+## Next Steps After Generation
+
+### Option 1: Execute Immediately
+Use the execution command:
+```
+/cdp-hybrid-idu:hybrid-execute-snowflake
+```
+
+### Option 2: Review First
+1. Examine generated SQL files
+2. Verify table names and transformations
+3. Test with sample data
+4. Execute manually or via execution command
+
+### Option 3: Customize
+1. Modify generated SQL as needed
+2. Add custom logic or transformations
+3. Execute using Snowflake SQL worksheet or execution command
+
+---
+
+## Technical Details
+
+### Python Script Execution
+The agent executes:
+```bash
+python3 scripts/snowflake/yaml_unification_to_snowflake.py \
+    unify.yml \
+    -d my_database \
+    -s my_schema \
+    -sd source_database \
+    -ss source_schema \
+    -o snowflake_sql
+```
+
+### SQL File Naming Convention
+- `01-09`: Setup and initialization
+- `10-19`: Source table enrichment
+- `20-29`: Master table creation
+- `30-39`: Metadata and lookup tables
+- `04_*_NN`: Loop iterations (auto-numbered)
+
+### Convergence Detection
+Each loop iteration includes:
+```sql
+-- Check if graph changed
+SELECT COUNT(*) FROM (
+    SELECT leader_ns, leader_id, follower_ns, follower_id
+    FROM iteration_N
+    EXCEPT
+    SELECT leader_ns, leader_id, follower_ns, follower_id
+    FROM iteration_N_minus_1
+) diff
+```
+Stops when count = 0
+
+### Snowflake-Specific Features
+- **LATERAL FLATTEN**: Array expansion for id_ns_array processing
+- **ARRAY_CONSTRUCT**: Building arrays from multiple columns
+- **OBJECT_CONSTRUCT**: Creating structured objects for key-value pairs
+- **ARRAYS_OVERLAP**: Checking array membership
+- **SPLIT_PART**: String splitting for leader key parsing
+
+---
+
+## Troubleshooting
+
+### Common Issues
+
+**Issue**: YAML validation error
+**Solution**: Check YAML syntax, ensure proper indentation, verify all required fields
+
+**Issue**: Table not found error
+**Solution**: Verify source database/schema, check table names in YAML
+
+**Issue**: Python script error
+**Solution**: Ensure Python 3.7+ installed, check pyyaml dependency
+
+**Issue**: Too many/few iterations
+**Solution**: Adjust `merge_iterations` in canonical_ids section of YAML
+
+**Issue**: VARIANT column errors
+**Solution**: Snowflake VARIANT type handling is automatic, ensure proper casting in custom SQL
+
+---
+
+## Success Criteria
+
+Generated SQL will:
+- ✅ Be valid Snowflake SQL
+- ✅ Use native Snowflake functions
+- ✅ Include proper clustering for performance
+- ✅ Have convergence detection built-in
+- ✅ Support VARIANT types for flexible data
+- ✅ Generate comprehensive statistics
+- ✅ Work without modification on Snowflake
+
+---
+
+**Ready to generate Snowflake SQL from your YAML configuration?**
+
+Provide your YAML file path and target database/schema to begin!
--- a/commands/hybrid-setup.md
+++ b/commands/hybrid-setup.md
@@ -0,0 +1,308 @@
+---
+name: hybrid-setup
+description: Complete end-to-end hybrid ID unification setup - automatically analyzes tables, generates config, creates SQL, and executes workflow for Snowflake and Databricks
+---
+
+# Hybrid ID Unification Complete Setup
+
+## Overview
+
+I'll guide you through the complete hybrid ID unification setup process for Snowflake and/or Databricks platforms. This is an **automated, end-to-end workflow** that will:
+
+1. **Analyze your tables automatically** using platform MCP tools with strict PII detection
+2. **Generate YAML configuration** from real schema and data analysis
+3. **Choose target platform(s)** (Snowflake, Databricks, or both)
+4. **Generate platform-specific SQL** optimized for each engine
+5. **Execute workflows** with convergence detection and monitoring
+6. **Provide deployment guidance** and operating instructions
+
+**Key Features**:
+- 🔍 **Automated Table Analysis**: Uses Snowflake/Databricks MCP tools to analyze actual tables
+- ✅ **Strict PII Detection**: Zero tolerance - only includes tables with real user identifiers
+- 📊 **Real Data Validation**: Queries actual data to validate patterns and quality
+- 🎯 **Smart Recommendations**: Expert analysis provides merge strategy and priorities
+- 🚀 **End-to-End Automation**: From table analysis to workflow execution
+
+---
+
+## What You Need to Provide
+
+### 1. Unification Requirements (For Automated Analysis)
+- **Platform**: Snowflake or Databricks
+- **Tables**: List of source tables to analyze
+  - Format (Snowflake): `database.schema.table` or `schema.table` or `table`
+  - Format (Databricks): `catalog.schema.table` or `schema.table` or `table`
+- **Canonical ID Name**: Name for your unified ID (e.g., `td_id`, `unified_customer_id`)
+- **Merge Iterations**: Number of unification loops (default: 10)
+- **Master Tables**: (Optional) Attribute aggregation specifications
+
+**Note**: The system will automatically:
+- Extract user identifiers from actual table schemas
+- Validate data patterns from real data
+- Apply appropriate validation rules based on data analysis
+- Generate merge strategy recommendations
+
+### 2. Platform Selection
+- **Databricks**: Unity Catalog with Delta Lake
+- **Snowflake**: Database with proper permissions
+- **Both**: Generate SQL for both platforms
+
+### 3. Target Configurations
+
+**For Databricks**:
+- **Catalog**: Target catalog name
+- **Schema**: Target schema name
+- **Source Catalog** (optional): Source data catalog
+- **Source Schema** (optional): Source data schema
+
+**For Snowflake**:
+- **Database**: Target database name
+- **Schema**: Target schema name
+- **Source Schema** (optional): Source data schema
+
+### 4. Execution Credentials (if executing)
+
+**For Databricks**:
+- **Server Hostname**: your-workspace.databricks.com
+- **HTTP Path**: /sql/1.0/warehouses/your-warehouse-id
+- **Authentication**: PAT (Personal Access Token) or OAuth
+
+**For Snowflake**:
+- **Account**: Snowflake account name
+- **User**: Username
+- **Password**: Password or use SSO/key-pair
+- **Warehouse**: Compute warehouse name
+
+---
+
+## What I'll Do
+
+### Step 1: Automated YAML Configuration Generation
+I'll use the **hybrid-unif-config-creator** command to automatically generate your `unify.yml` file:
+
+**Automated Analysis Approach** (Recommended):
+- Analyze your actual tables using platform MCP tools (Snowflake/Databricks)
+- Extract user identifiers with STRICT PII detection (zero tolerance for guessing)
+- Validate data patterns from real table data
+- Generate unify.yml with exact template compliance
+- Only include tables with actual user identifiers
+- Document excluded tables with detailed reasons
+
+**What I'll do**:
+- Call the **hybrid-unif-keys-extractor agent** to analyze tables
+- Query actual schema and data using platform MCP tools
+- Detect valid user identifiers (email, customer_id, phone, etc.)
+- Exclude tables without PII with full documentation
+- Generate production-ready unify.yml automatically
+
+**Alternative - Manual Configuration**:
+- If MCP tools are unavailable, I'll guide you through manual configuration
+- Interactive prompts for keys, tables, and validation rules
+- Step-by-step YAML building with validation
+
+### Step 2: Platform Selection and Configuration
+I'll help you:
+- Choose between Databricks, Snowflake, or both
+- Collect platform-specific configuration (catalog/database, schema names)
+- Determine source/target separation strategy
+- Decide on execution or generation-only mode
+
+### Step 3: SQL Generation
+
+**For Databricks** (if selected):
+I'll call the **databricks-sql-generator agent** to:
+- Execute `yaml_unification_to_databricks.py` script
+- Generate Delta Lake optimized SQL workflow
+- Create output directory: `databricks_sql/unify/`
+- Generate 15+ SQL files with proper execution order
+
+**For Snowflake** (if selected):
+I'll call the **snowflake-sql-generator agent** to:
+- Execute `yaml_unification_to_snowflake.py` script
+- Generate Snowflake-native SQL workflow
+- Create output directory: `snowflake_sql/unify/`
+- Generate 15+ SQL files with proper execution order
+
+### Step 4: Workflow Execution (Optional)
+
+**For Databricks** (if execution requested):
+I'll call the **databricks-workflow-executor agent** to:
+- Execute `databricks_sql_executor.py` script
+- Connect to your Databricks workspace
+- Run SQL files in proper sequence
+- Monitor convergence and progress
+- Optimize Delta tables
+- Report final statistics
+
+**For Snowflake** (if execution requested):
+I'll call the **snowflake-workflow-executor agent** to:
+- Execute `snowflake_sql_executor.py` script
+- Connect to your Snowflake account
+- Run SQL files in proper sequence
+- Monitor convergence and progress
+- Report final statistics
+
+### Step 5: Deployment Guidance
+I'll provide:
+- Configuration summary
+- Generated files overview
+- Deployment instructions
+- Operating guidelines
+- Monitoring recommendations
+
+---
+
+## Interactive Workflow
+
+This command orchestrates the complete end-to-end flow by calling specialized commands in sequence:
+
+### Phase 1: Configuration Creation
+**I'll ask you for**:
+- Platform (Snowflake or Databricks)
+- Tables to analyze
+- Canonical ID name
+- Merge iterations
+
+**Then I'll**:
+- Call `/cdp-hybrid-idu:hybrid-unif-config-creator` internally
+- Analyze your tables automatically
+- Generate `unify.yml` with strict PII detection
+- Show you the configuration for review
+
+### Phase 2: SQL Generation
+**I'll ask you**:
+- Which platform(s) to generate SQL for (can be different from source)
+- Output directory preferences
+
+**Then I'll**:
+- Call `/cdp-hybrid-idu:hybrid-generate-snowflake` (if Snowflake selected)
+- Call `/cdp-hybrid-idu:hybrid-generate-databricks` (if Databricks selected)
+- Generate 15+ optimized SQL files per platform
+- Show you the execution plan
+
+### Phase 3: Workflow Execution (Optional)
+**I'll ask you**:
+- Do you want to execute now or later?
+- Connection credentials if executing
+
+**Then I'll**:
+- Call `/cdp-hybrid-idu:hybrid-execute-snowflake` (if Snowflake selected)
+- Call `/cdp-hybrid-idu:hybrid-execute-databricks` (if Databricks selected)
+- Monitor convergence and progress
+- Report final statistics
+
+**Throughout the process**:
+- **Questions**: When I need your input
+- **Suggestions**: Recommended approaches based on best practices
+- **Validation**: Real-time checks on your choices
+- **Explanations**: Help you understand concepts and options
+
+---
+
+## Expected Output
+
+### Files Created (Platform-specific):
+
+**For Databricks**:
+```
+databricks_sql/unify/
+├── 01_create_graph.sql              # Initialize identity graph
+├── 02_extract_merge.sql             # Extract and merge identities
+├── 03_source_key_stats.sql          # Source statistics
+├── 04_unify_loop_iteration_*.sql    # Iterative unification (N files)
+├── 05_canonicalize.sql              # Canonical ID creation
+├── 06_result_key_stats.sql          # Result statistics
+├── 10_enrich_*.sql                  # Source table enrichment (N files)
+├── 20_master_*.sql                  # Master table creation (N files)
+├── 30_unification_metadata.sql      # Metadata tables
+├── 31_filter_lookup.sql             # Validation rules
+└── 32_column_lookup.sql             # Column mappings
+```
+
+**For Snowflake**:
+```
+snowflake_sql/unify/
+├── 01_create_graph.sql              # Initialize identity graph
+├── 02_extract_merge.sql             # Extract and merge identities
+├── 03_source_key_stats.sql          # Source statistics
+├── 04_unify_loop_iteration_*.sql    # Iterative unification (N files)
+├── 05_canonicalize.sql              # Canonical ID creation
+├── 06_result_key_stats.sql          # Result statistics
+├── 10_enrich_*.sql                  # Source table enrichment (N files)
+├── 20_master_*.sql                  # Master table creation (N files)
+├── 30_unification_metadata.sql      # Metadata tables
+├── 31_filter_lookup.sql             # Validation rules
+└── 32_column_lookup.sql             # Column mappings
+```
+
+**Configuration**:
+```
+unify.yml                            # YAML configuration (created interactively)
+```
+
+---
+
+## Success Criteria
+
+All generated files will:
+- ✅ Be platform-optimized and production-ready
+- ✅ Use proper SQL dialects (Databricks Spark SQL or Snowflake SQL)
+- ✅ Include convergence detection logic
+- ✅ Support incremental processing
+- ✅ Generate comprehensive statistics
+- ✅ Work without modification on target platforms
+
+---
+
+## Getting Started
+
+**Ready to begin?** I'll use the **hybrid-unif-config-creator** to automatically analyze your tables and generate the YAML configuration.
+
+Please provide:
+
+1. **Platform**: Which platform contains your data?
+   - Snowflake or Databricks
+
+2. **Tables**: Which source tables should I analyze?
+   - Format (Snowflake): `database.schema.table` or `schema.table` or `table`
+   - Format (Databricks): `catalog.schema.table` or `schema.table` or `table`
+   - Example: `customer_db.public.customers`, `orders`, `web_events.user_activity`
+
+3. **Canonical ID Name**: What should I call the unified ID?
+   - Example: `td_id`, `unified_customer_id`, `master_id`
+   - Default: `td_id`
+
+4. **Merge Iterations** (optional): How many unification loops?
+   - Default: 10
+   - Range: 2-30
+
+5. **Target Platform(s)** for SQL generation:
+   - Same as source, or generate for both platforms
+
+**Example**:
+```
+I want to set up hybrid ID unification for:
+
+Platform: Snowflake
+Tables:
+- customer_db.public.customer_profiles
+- customer_db.public.orders
+- marketing_db.public.campaigns
+- event_db.public.web_events
+
+Canonical ID: unified_customer_id
+Merge Iterations: 10
+Generate SQL for: Snowflake (or both Snowflake and Databricks)
+```
+
+**What I'll do next**:
+1. ✅ Analyze your tables using Snowflake MCP tools
+2. ✅ Extract user identifiers with strict PII detection
+3. ✅ Generate unify.yml automatically
+4. ✅ Generate platform-specific SQL files
+5. ✅ Execute workflow (if requested)
+6. ✅ Provide deployment guidance
+
+---
+
+**Let's get started with your hybrid ID unification setup!**
--- a/commands/hybrid-unif-config-creator.md
+++ b/commands/hybrid-unif-config-creator.md
@@ -0,0 +1,491 @@
+---
+name: hybrid-unif-config-creator
+description: Auto-generate unify.yml configuration for Snowflake/Databricks by extracting user identifiers from actual tables using strict PII detection
+---
+
+# Unify Configuration Creator for Snowflake/Databricks
+
+## Overview
+
+I'll automatically generate a production-ready `unify.yml` configuration file for your Snowflake or Databricks ID unification by:
+
+1. **Analyzing your actual tables** using platform-specific MCP tools
+2. **Extracting user identifiers** with zero-tolerance PII detection
+3. **Validating data patterns** from real table data
+4. **Generating unify.yml** using the exact template format
+5. **Providing recommendations** for merge strategies and priorities
+
+**This command uses STRICT analysis - only tables with actual user identifiers will be included.**
+
+---
+
+## What You Need to Provide
+
+### 1. Platform Selection
+- **Snowflake**: For Snowflake databases
+- **Databricks**: For Databricks Unity Catalog tables
+
+### 2. Tables to Analyze
+Provide tables you want to analyze for ID unification:
+- **Format (Snowflake)**: `database.schema.table` or `schema.table` or `table`
+- **Format (Databricks)**: `catalog.schema.table` or `schema.table` or `table`
+- **Example**: `customer_data.public.customers`, `orders`, `web_events.user_activity`
+
+### 3. Canonical ID Configuration
+- **Name**: Name for your unified ID (default: `td_id`)
+- **Merge Iterations**: Number of unification loop iterations (default: 10)
+- **Incremental Iterations**: Iterations for incremental processing (default: 5)
+
+### 4. Output Configuration (Optional)
+- **Output File**: Where to save unify.yml (default: `unify.yml`)
+- **Template Path**: Path to template if using custom (default: uses built-in exact template)
+
+---
+
+## What I'll Do
+
+### Step 1: Platform Detection and Validation
+```
+1. Confirm platform (Snowflake or Databricks)
+2. Verify MCP tools are available for the platform
+3. Set up platform-specific query patterns
+4. Inform you of the analysis approach
+```
+
+### Step 2: Key Extraction with hybrid-unif-keys-extractor Agent
+I'll launch the **hybrid-unif-keys-extractor agent** to:
+
+**Schema Analysis**:
+- Use platform MCP tools to describe each table
+- Extract exact column names and data types
+- Identify accessible vs inaccessible tables
+
+**User Identifier Detection**:
+- Apply STRICT matching rules for user identifiers:
+  - ✅ Email columns (email, email_std, email_address, etc.)
+  - ✅ Phone columns (phone, phone_number, mobile_phone, etc.)
+  - ✅ User IDs (user_id, customer_id, account_id, etc.)
+  - ✅ Cookie/Device IDs (td_client_id, cookie_id, etc.)
+  - ❌ System columns (id, created_at, time, etc.)
+  - ❌ Complex types (arrays, maps, objects, variants, structs)
+
+**Data Validation**:
+- Query actual MIN/MAX values from each identified column
+- Analyze data patterns and quality
+- Count unique values per identifier
+- Detect data quality issues
+
+**Table Classification**:
+- **INCLUDED**: Tables with valid user identifiers
+- **EXCLUDED**: Tables without user identifiers (fully documented why)
+
+**Expert Analysis**:
+- 3 SQL experts review the data
+- Provide priority recommendations
+- Suggest validation rules based on actual data patterns
+
+### Step 3: Unify.yml Generation
+
+**CRITICAL**: Using the **EXACT BUILT-IN template structure** (embedded in hybrid-unif-keys-extractor agent)
+
+**Template Usage Process**:
+```
+1. Receive structured data from hybrid-unif-keys-extractor agent:
+   - Keys with validation rules
+   - Tables with column mappings
+   - Canonical ID configuration
+   - Master tables specification
+
+2. Use BUILT-IN template structure (see agent documentation)
+
+3. ONLY replace these specific values:
+   - Line 1: name: {canonical_id_name}
+   - keys section: actual keys found
+   - tables section: actual tables with actual columns
+   - canonical_ids section: name and merge_by_keys
+   - master_tables section: [] or user specifications
+
+4. PRESERVE everything else:
+   - ALL comment blocks (#####...)
+   - ALL comment text ("Declare Validation logic", etc.)
+   - ALL spacing and indentation (2 spaces per level)
+   - ALL blank lines
+   - EXACT YAML structure
+
+5. Use Write tool to save populated unify.yml
+```
+
+**I'll generate**:
+
+**Section 1: Canonical ID Name**
+```yaml
+name: {your_canonical_id_name}
+```
+
+**Section 2: Keys with Validation**
+```yaml
+keys:
+  - name: email
+    valid_regexp: ".*@.*"
+    invalid_texts: ['', 'N/A', 'null']
+  - name: customer_id
+    invalid_texts: ['', 'N/A', 'null']
+  - name: phone_number
+    invalid_texts: ['', 'N/A', 'null']
+```
+*Populated with actual keys found in your tables*
+
+**Section 3: Tables with Key Column Mappings**
+```yaml
+tables:
+  - database: {database/catalog}
+    table: {table_name}
+    key_columns:
+      - {column: actual_column_name, key: mapped_key}
+      - {column: another_column, key: another_key}
+```
+*Only tables with valid user identifiers, with EXACT column names from schema analysis*
+
+**Section 4: Canonical IDs Configuration**
+```yaml
+canonical_ids:
+  - name: {your_canonical_id_name}
+    merge_by_keys: [email, customer_id, phone_number]
+    merge_iterations: 15
+```
+*Based on extracted keys and your configuration*
+
+**Section 5: Master Tables (Optional)**
+```yaml
+master_tables:
+  - name: {canonical_id_name}_master_table
+    canonical_id: {canonical_id_name}
+    attributes:
+      - name: best_email
+        source_columns:
+          - {table: table1, column: email, order: last, order_by: time, priority: 1}
+          - {table: table2, column: email_address, order: last, order_by: time, priority: 2}
+```
+*If you request master table configuration, I'll help set up attribute aggregation*
+
+### Step 4: Validation and Review
+
+After generation:
+```
+1. Show complete unify.yml content
+2. Highlight key sections:
+   - Keys found: [list]
+   - Tables included: [count]
+   - Tables excluded: [count] with reasons
+   - Merge strategy: [keys and priorities]
+3. Provide recommendations for optimization
+4. Ask for your approval before saving
+```
+
+### Step 5: File Output
+
+```
+1. Write unify.yml to specified location
+2. Create backup of existing file if present
+3. Provide file summary:
+   - Keys configured: X
+   - Tables configured: Y
+   - Validation rules: Z
+4. Show next steps for using the configuration
+```
+
+---
+
+## Example Workflow
+
+**Input**:
+```
+Platform: Snowflake
+Tables:
+  - customer_data.public.customers
+  - customer_data.public.orders
+  - web_data.public.events
+Canonical ID Name: unified_customer_id
+Output: snowflake_unify.yml
+```
+
+**Process**:
+```
+✓ Platform: Snowflake MCP tools detected
+✓ Analyzing 3 tables...
+
+Schema Analysis:
+  ✓ customer_data.public.customers - 12 columns
+  ✓ customer_data.public.orders - 8 columns
+  ✓ web_data.public.events - 15 columns
+
+User Identifier Detection:
+  ✓ customers: email, customer_id (2 identifiers)
+  ✓ orders: customer_id, email_address (2 identifiers)
+  ✗ events: NO user identifiers found
+    Available columns: event_id, session_id, page_url, timestamp, ...
+    Reason: Contains only event tracking data - no PII
+
+Data Analysis:
+  ✓ email: 45,123 unique values, format valid
+  ✓ customer_id: 45,089 unique values, numeric
+  ✓ email_address: 12,456 unique values, format valid
+
+Expert Analysis Complete:
+  Priority 1: customer_id (most stable, highest coverage)
+  Priority 2: email (good coverage, some quality issues)
+  Priority 3: phone_number (not found)
+
+Generating unify.yml...
+  ✓ Keys section: 2 keys configured
+  ✓ Tables section: 2 tables configured
+  ✓ Canonical IDs: unified_customer_id
+  ✓ Validation rules: Applied based on data patterns
+
+Tables EXCLUDED:
+  - web_data.public.events: No user identifiers
+```
+
+**Output (snowflake_unify.yml)**:
+```yaml
+name: unified_customer_id
+
+keys:
+  - name: email
+    valid_regexp: ".*@.*"
+    invalid_texts: ['', 'N/A', 'null']
+  - name: customer_id
+    invalid_texts: ['', 'N/A', 'null']
+
+tables:
+  - database: customer_data
+    table: customers
+    key_columns:
+      - {column: email, key: email}
+      - {column: customer_id, key: customer_id}
+  - database: customer_data
+    table: orders
+    key_columns:
+      - {column: email_address, key: email}
+      - {column: customer_id, key: customer_id}
+
+canonical_ids:
+  - name: unified_customer_id
+    merge_by_keys: [customer_id, email]
+    merge_iterations: 15
+
+master_tables: []
+```
+
+---
+
+## Key Features
+
+### 🔍 **STRICT PII Detection**
+- Zero tolerance for guessing
+- Only includes tables with actual user identifiers
+- Documents why tables are excluded
+- Based on REAL schema and data analysis
+
+### ✅ **Exact Template Compliance**
+- Uses BUILT-IN exact template structure (embedded in hybrid-unif-keys-extractor agent)
+- NO modifications to template format
+- Preserves all comment sections
+- Maintains exact YAML structure
+- Portable across all systems
+
+### 📊 **Real Data Analysis**
+- Queries actual MIN/MAX values
+- Counts unique identifiers
+- Validates data patterns
+- Identifies quality issues
+
+### 🎯 **Platform-Aware**
+- Uses correct MCP tools for each platform
+- Respects platform naming conventions
+- Applies platform-specific data type rules
+- Generates platform-compatible SQL references
+
+### 📋 **Complete Documentation**
+- Documents all excluded tables with reasons
+- Lists available columns for excluded tables
+- Explains why columns don't qualify as user identifiers
+- Provides expert recommendations
+
+---
+
+## Output Format
+
+**The generated unify.yml will have EXACTLY this structure:**
+
+```yaml
+name: {canonical_id_name}
+#####################################################
+##
+##Declare Validation logic for unification keys
+##
+#####################################################
+keys:
+  - name: {key1}
+    valid_regexp: "{pattern}"
+    invalid_texts: ['{val1}', '{val2}', '{val3}']
+  - name: {key2}
+    invalid_texts: ['{val1}', '{val2}', '{val3}']
+
+#####################################################
+##
+##Declare databases, tables, and keys to use during unification
+##
+#####################################################
+
+tables:
+  - database: {db/catalog}
+    table: {table}
+    key_columns:
+      - {column: {col}, key: {key}}
+
+#####################################################
+##
+##Declare hierarchy for unification. Define keys to use for each level.
+##
+#####################################################
+
+canonical_ids:
+  - name: {canonical_id_name}
+    merge_by_keys: [{key1}, {key2}, ...]
+    merge_iterations: {number}
+
+#####################################################
+##
+##Declare Similar Attributes and standardize into a single column
+##
+#####################################################
+
+master_tables:
+  - name: {canonical_id_name}_master_table
+    canonical_id: {canonical_id_name}
+    attributes:
+      - name: {attribute}
+        source_columns:
+          - {table: {t}, column: {c}, order: last, order_by: time, priority: 1}
+```
+
+**NO deviations from this structure - EXACT template compliance guaranteed.**
+
+---
+
+## Prerequisites
+
+### Required:
+- ✅ Snowflake or Databricks platform access
+- ✅ Platform-specific MCP tools configured (may use fallback if unavailable)
+- ✅ Read permissions on tables to be analyzed
+- ✅ Tables must exist and be accessible
+
+### Optional:
+- Custom unify.yml template path (if not using default)
+- Master table attribute specifications
+- Custom validation rules
+
+---
+
+## Expected Timeline
+
+| Step | Duration |
+|------|----------|
+| Platform detection | < 1 min |
+| Schema analysis (per table) | 5-10 sec |
+| Data analysis (per identifier) | 10-20 sec |
+| Expert analysis | 1-2 min |
+| YAML generation | < 1 min |
+| **Total (for 5 tables)** | **~3-5 min** |
+
+---
+
+## Error Handling
+
+### Common Issues:
+
+**Issue**: MCP tools not available for platform
+**Solution**:
+- I'll inform you and provide fallback options
+- You can provide schema information manually
+- I'll still generate unify.yml with validation warnings
+
+**Issue**: No tables have user identifiers
+**Solution**:
+- I'll show you why tables were excluded
+- Suggest alternative tables to analyze
+- Explain what constitutes a user identifier
+
+**Issue**: Table not accessible
+**Solution**:
+- Document which tables are inaccessible
+- Continue with accessible tables
+- Recommend permission checks
+
+**Issue**: Complex data types found
+**Solution**:
+- Exclude complex type columns (arrays, structs, maps)
+- Explain why they can't be used for unification
+- Suggest alternative columns if available
+
+---
+
+## Success Criteria
+
+Generated unify.yml will:
+- ✅ Use EXACT template structure - NO modifications
+- ✅ Contain ONLY tables with validated user identifiers
+- ✅ Include ONLY columns that actually exist in tables
+- ✅ Have validation rules based on actual data patterns
+- ✅ Be ready for immediate use with hybrid-generate-snowflake or hybrid-generate-databricks
+- ✅ Work without any manual edits
+- ✅ Include comprehensive documentation in comments
+
+---
+
+## Next Steps After Generation
+
+1. **Review the generated unify.yml**
+   - Verify tables and columns are correct
+   - Check validation rules are appropriate
+   - Review merge strategy and priorities
+
+2. **Generate SQL for your platform**:
+   - Snowflake: `/cdp-hybrid-idu:hybrid-generate-snowflake`
+   - Databricks: `/cdp-hybrid-idu:hybrid-generate-databricks`
+
+3. **Execute the workflow**:
+   - Snowflake: `/cdp-hybrid-idu:hybrid-execute-snowflake`
+   - Databricks: `/cdp-hybrid-idu:hybrid-execute-databricks`
+
+4. **Monitor convergence and results**
+
+---
+
+## Getting Started
+
+**Ready to begin?**
+
+Please provide:
+
+1. **Platform**: Snowflake or Databricks
+2. **Tables**: List of tables to analyze (full paths)
+3. **Canonical ID Name**: Name for your unified ID (e.g., `unified_customer_id`)
+4. **Output File** (optional): Where to save unify.yml (default: `unify.yml`)
+
+**Example**:
+```
+Platform: Snowflake
+Tables:
+  - customer_db.public.customers
+  - customer_db.public.orders
+  - marketing_db.public.campaigns
+Canonical ID: unified_id
+Output: snowflake_unify.yml
+```
+
+---
+
+**I'll analyze your tables and generate a production-ready unify.yml configuration!**
--- a/commands/hybrid-unif-config-validate.md
+++ b/commands/hybrid-unif-config-validate.md
@@ -0,0 +1,337 @@
+---
+name: hybrid-unif-config-validate
+description: Validate YAML configuration for hybrid ID unification before SQL generation
+---
+
+# Validate Hybrid ID Unification YAML
+
+## Overview
+
+Validate your `unify.yml` configuration file to ensure it's properly structured and ready for SQL generation. This command checks syntax, structure, validation rules, and provides recommendations for optimization.
+
+---
+
+## What You Need
+
+### Required Input
+1. **YAML Configuration File**: Path to your `unify.yml`
+
+---
+
+## What I'll Do
+
+### Step 1: File Validation
+- Verify file exists and is readable
+- Check YAML syntax (proper indentation, quotes, etc.)
+- Ensure all required sections are present
+
+### Step 2: Structure Validation
+Check presence and structure of:
+- **name**: Unification project name
+- **keys**: Key definitions with validation rules
+- **tables**: Source tables with key column mappings
+- **canonical_ids**: Canonical ID configuration
+- **master_tables**: Master table definitions (optional)
+
+### Step 3: Content Validation
+Validate individual sections:
+
+**Keys Section**:
+- ✓ Each key has a unique name
+- ✓ `valid_regexp` is a valid regex pattern (if provided)
+- ✓ `invalid_texts` is an array (if provided)
+- ⚠ Recommend validation rules if missing
+
+**Tables Section**:
+- ✓ Each table has a name
+- ✓ Each table has at least one key_column
+- ✓ All referenced keys exist in keys section
+- ✓ Column names are valid identifiers
+- ⚠ Check for duplicate table definitions
+
+**Canonical IDs Section**:
+- ✓ Has a name (will be canonical ID column name)
+- ✓ `merge_by_keys` references existing keys
+- ✓ `merge_iterations` is a positive integer (if provided)
+- ⚠ Suggest optimal iteration count if not specified
+
+**Master Tables Section** (if present):
+- ✓ Each master table has a name and canonical_id
+- ✓ Referenced canonical_id exists
+- ✓ Attributes have proper structure
+- ✓ Source tables in attributes exist
+- ✓ Priority values are valid
+- ⚠ Check for attribute conflicts
+
+### Step 4: Cross-Reference Validation
+- ✓ All merge_by_keys exist in keys section
+- ✓ All key_columns reference defined keys
+- ✓ All master table source tables exist in tables section
+- ✓ Canonical ID names don't conflict with existing columns
+
+### Step 5: Best Practices Check
+Provide recommendations for:
+- Key validation rules
+- Iteration count optimization
+- Master table attribute priorities
+- Performance considerations
+
+### Step 6: Validation Report
+Generate comprehensive report with:
+- ✅ Passed checks
+- ⚠ Warnings (non-critical issues)
+- ❌ Errors (must fix before generation)
+- 💡 Recommendations for improvement
+
+---
+
+## Command Usage
+
+### Basic Usage
+```
+/cdp-hybrid-idu:hybrid-unif-config-validate
+
+I'll prompt you for:
+- YAML file path
+```
+
+### Direct Usage
+```
+YAML file: /path/to/unify.yml
+```
+
+---
+
+## Example Validation
+
+### Input YAML
+```yaml
+name: customer_unification
+
+keys:
+  - name: email
+    valid_regexp: ".*@.*"
+    invalid_texts: ['', 'N/A', 'null']
+  - name: customer_id
+    invalid_texts: ['', 'N/A']
+
+tables:
+  - table: customer_profiles
+    key_columns:
+      - {column: email_std, key: email}
+      - {column: customer_id, key: customer_id}
+  - table: orders
+    key_columns:
+      - {column: email_address, key: email}
+
+canonical_ids:
+  - name: unified_id
+    merge_by_keys: [email, customer_id]
+    merge_iterations: 15
+
+master_tables:
+  - name: customer_master
+    canonical_id: unified_id
+    attributes:
+      - name: best_email
+        source_columns:
+          - {table: customer_profiles, column: email_std, priority: 1}
+          - {table: orders, column: email_address, priority: 2}
+```
+
+### Validation Report
+```
+✅ YAML VALIDATION SUCCESSFUL
+
+File Structure:
+  ✅ Valid YAML syntax
+  ✅ All required sections present
+  ✅ Proper indentation and formatting
+
+Keys Section (2 keys):
+  ✅ email: Valid regex pattern, invalid_texts defined
+  ✅ customer_id: Invalid_texts defined
+  ⚠ Consider adding valid_regexp for customer_id for better validation
+
+Tables Section (2 tables):
+  ✅ customer_profiles: 2 key columns mapped
+  ✅ orders: 1 key column mapped
+  ✅ All referenced keys exist
+
+Canonical IDs Section:
+  ✅ Name: unified_id
+  ✅ Merge keys: email, customer_id (both exist)
+  ✅ Iterations: 15 (recommended range: 10-20)
+
+Master Tables Section (1 master table):
+  ✅ customer_master: References unified_id
+  ✅ Attribute 'best_email': 2 sources with priorities
+  ✅ All source tables exist
+
+Cross-References:
+  ✅ All merge_by_keys defined in keys section
+  ✅ All key_columns reference existing keys
+  ✅ All master table sources exist
+  ✅ No canonical ID name conflicts
+
+Recommendations:
+  💡 Consider adding valid_regexp for customer_id (e.g., "^[A-Z0-9]+$")
+  💡 Add more master table attributes for richer customer profiles
+  💡 Consider array attributes (top_3_emails) for historical tracking
+
+Summary:
+  ✅ 0 errors
+  ⚠ 1 warning
+  💡 3 recommendations
+
+✓ Configuration is ready for SQL generation!
+```
+
+---
+
+## Validation Checks
+
+### Required Checks (Must Pass)
+- [ ] File exists and is readable
+- [ ] Valid YAML syntax
+- [ ] `name` field present
+- [ ] `keys` section present with at least one key
+- [ ] `tables` section present with at least one table
+- [ ] `canonical_ids` section present
+- [ ] All merge_by_keys exist in keys section
+- [ ] All key_columns reference defined keys
+- [ ] No duplicate key names
+- [ ] No duplicate table names
+
+### Warning Checks (Recommended)
+- [ ] Keys have validation rules (valid_regexp or invalid_texts)
+- [ ] Merge_iterations specified (otherwise auto-calculated)
+- [ ] Master tables defined for unified customer view
+- [ ] Source tables have unique key combinations
+- [ ] Attribute priorities are sequential
+
+### Best Practice Checks
+- [ ] Email keys have email regex pattern
+- [ ] Phone keys have phone validation
+- [ ] Invalid_texts include common null values ('', 'N/A', 'null')
+- [ ] Master tables use time-based order_by for recency
+- [ ] Array attributes for historical data (top_3_emails, etc.)
+
+---
+
+## Common Validation Errors
+
+### Syntax Errors
+**Error**: `Invalid YAML: mapping values are not allowed here`
+**Solution**: Check indentation (use spaces, not tabs), ensure colons have space after them
+
+**Error**: `Invalid YAML: could not find expected ':'`
+**Solution**: Check for missing colons in key-value pairs
+
+### Structure Errors
+**Error**: `Missing required section: keys`
+**Solution**: Add keys section with at least one key definition
+
+**Error**: `Empty tables section`
+**Solution**: Add at least one table with key_columns
+
+### Reference Errors
+**Error**: `Key 'phone' referenced in table 'orders' but not defined in keys section`
+**Solution**: Add phone key to keys section or remove reference
+
+**Error**: `Merge key 'phone_number' not found in keys section`
+**Solution**: Add phone_number to keys section or remove from merge_by_keys
+
+**Error**: `Master table source 'customer_360' not found in tables section`
+**Solution**: Add customer_360 to tables section or use correct table name
+
+### Value Errors
+**Error**: `merge_iterations must be a positive integer, got: 'auto'`
+**Solution**: Either remove merge_iterations (auto-calculate) or specify integer (e.g., 15)
+
+**Error**: `Priority must be a positive integer, got: 'high'`
+**Solution**: Use numeric priority (1 for highest, 2 for second, etc.)
+
+---
+
+## Validation Levels
+
+### Strict Mode (Default)
+- Fails on any structural errors
+- Warns on missing best practices
+- Recommends optimizations
+
+### Lenient Mode
+- Only fails on critical syntax errors
+- Allows missing optional fields
+- Minimal warnings
+
+---
+
+## Platform-Specific Validation
+
+### Databricks-Specific
+- ✓ Table names compatible with Unity Catalog
+- ✓ Column names valid for Spark SQL
+- ⚠ Check for reserved keywords (DATABASE, TABLE, etc.)
+
+### Snowflake-Specific
+- ✓ Table names compatible with Snowflake
+- ✓ Column names valid for Snowflake SQL
+- ⚠ Check for reserved keywords (ACCOUNT, SCHEMA, etc.)
+
+---
+
+## What Happens Next
+
+### If Validation Passes
+```
+✅ Configuration validated successfully!
+
+Ready for:
+  • SQL generation (Databricks or Snowflake)
+  • Direct execution after generation
+
+Next steps:
+  1. /cdp-hybrid-idu:hybrid-generate-databricks
+  2. /cdp-hybrid-idu:hybrid-generate-snowflake
+  3. /cdp-hybrid-idu:hybrid-setup (complete workflow)
+```
+
+### If Validation Fails
+```
+❌ Configuration has errors that must be fixed
+
+Errors (must fix):
+  1. Missing required section: canonical_ids
+  2. Undefined key 'phone' referenced in table 'orders'
+
+Suggestions:
+  • Add canonical_ids section with name and merge_by_keys
+  • Add phone key to keys section or remove from orders
+
+Would you like help fixing these issues? (y/n)
+```
+
+I can help you:
+- Fix syntax errors
+- Add missing sections
+- Define proper validation rules
+- Optimize configuration
+
+---
+
+## Success Criteria
+
+Validation passes when:
+- ✅ YAML syntax is valid
+- ✅ All required sections present
+- ✅ All references resolved
+- ✅ No structural errors
+- ✅ Ready for SQL generation
+
+---
+
+**Ready to validate your YAML configuration?**
+
+Provide your `unify.yml` file path to begin validation!
--- a/commands/hybrid-unif-merge-stats-creator.md
+++ b/commands/hybrid-unif-merge-stats-creator.md
@@ -0,0 +1,726 @@
+---
+name: hybrid-unif-merge-stats-creator
+description: Generate professional HTML/PDF merge statistics report from ID unification results for Snowflake or Databricks with expert analysis and visualizations
+---
+
+# ID Unification Merge Statistics Report Generator
+
+## Overview
+
+I'll generate a **comprehensive, professional HTML report** analyzing your ID unification merge statistics with:
+
+- 📊 **Executive Summary** with key performance indicators
+- 📈 **Identity Resolution Performance** analysis and deduplication rates
+- 🎯 **Merge Distribution** patterns and complexity analysis
+- 👥 **Top Merged Profiles** highlighting complex identity resolutions
+- ✅ **Data Quality Metrics** with coverage percentages
+- 🚀 **Convergence Analysis** showing iteration performance
+- 💡 **Expert Recommendations** for optimization and next steps
+
+**Platform Support:**
+- ✅ Snowflake (using Snowflake MCP tools)
+- ✅ Databricks (using Databricks MCP tools)
+
+**Output Format:**
+- Beautiful HTML report with charts, tables, and visualizations
+- PDF-ready (print to PDF from browser)
+- Consistent formatting every time
+- Platform-agnostic design
+
+---
+
+## What You Need to Provide
+
+### 1. Platform Selection
+- **Snowflake**: For Snowflake-based ID unification
+- **Databricks**: For Databricks-based ID unification
+
+### 2. Database/Catalog Configuration
+
+**For Snowflake:**
+- **Database Name**: Where your unification tables are stored (e.g., `INDRESH_TEST`, `CUSTOMER_CDP`)
+- **Schema Name**: Schema containing tables (e.g., `PUBLIC`, `ID_UNIFICATION`)
+
+**For Databricks:**
+- **Catalog Name**: Unity Catalog name (e.g., `customer_data`, `cdp_prod`)
+- **Schema Name**: Schema containing tables (e.g., `id_unification`, `unified_profiles`)
+
+### 3. Canonical ID Configuration
+- **Canonical ID Name**: Name used for your unified ID (e.g., `td_id`, `unified_customer_id`, `master_id`)
+  - This is used to find the correct tables: `{canonical_id}_lookup`, `{canonical_id}_master_table`, etc.
+
+### 4. Output Configuration (Optional)
+- **Output File Path**: Where to save the HTML report (default: `id_unification_report.html`)
+- **Report Title**: Custom title for the report (default: "ID Unification Merge Statistics Report")
+
+---
+
+## What I'll Do
+
+### Step 1: Platform Detection and Validation
+
+**Snowflake:**
+```
+1. Verify Snowflake MCP tools are available
+2. Test connection to specified database.schema
+3. Validate canonical ID tables exist:
+   - {database}.{schema}.{canonical_id}_lookup
+   - {database}.{schema}.{canonical_id}_master_table
+   - {database}.{schema}.{canonical_id}_source_key_stats
+   - {database}.{schema}.{canonical_id}_result_key_stats
+4. Confirm access permissions
+```
+
+**Databricks:**
+```
+1. Verify Databricks MCP tools are available (or use Snowflake fallback)
+2. Test connection to specified catalog.schema
+3. Validate canonical ID tables exist
+4. Confirm access permissions
+```
+
+### Step 2: Data Collection with Expert Analysis
+
+I'll execute **16 specialized queries** to collect comprehensive statistics:
+
+**Core Statistics Queries:**
+
+1. **Source Key Statistics**
+   - Pre-unification identity counts
+   - Distinct values per key type (customer_id, email, phone, etc.)
+   - Per-table breakdowns
+
+2. **Result Key Statistics**
+   - Post-unification canonical ID counts
+   - Distribution histograms
+   - Coverage per key type
+
+3. **Canonical ID Metrics**
+   - Total identities processed
+   - Unique canonical IDs created
+   - Merge ratio calculation
+
+4. **Top Merged Profiles**
+   - Top 10 most complex merges
+   - Identity count per canonical ID
+   - Merge complexity scoring
+
+5. **Merge Distribution Analysis**
+   - Categorization (2, 3-5, 6-10, 10+ identities)
+   - Percentage distribution
+   - Pattern analysis
+
+6. **Key Type Distribution**
+   - Identity breakdown by type
+   - Namespace analysis
+   - Cross-key coverage
+
+7. **Master Table Quality Metrics**
+   - Attribute coverage percentages
+   - Data completeness analysis
+   - Sample record extraction
+
+8. **Configuration Metadata**
+   - Unification settings
+   - Column mappings
+   - Validation rules
+
+**Platform-Specific SQL Adaptation:**
+
+For **Snowflake**:
+```sql
+SELECT COUNT(*) as total_identities,
+       COUNT(DISTINCT canonical_id) as unique_canonical_ids
+FROM {database}.{schema}.{canonical_id}_lookup;
+```
+
+For **Databricks**:
+```sql
+SELECT COUNT(*) as total_identities,
+       COUNT(DISTINCT canonical_id) as unique_canonical_ids
+FROM {catalog}.{schema}.{canonical_id}_lookup;
+```
+
+### Step 3: Statistical Analysis and Calculations
+
+I'll perform expert-level calculations:
+
+**Deduplication Rates:**
+```
+For each key type:
+- Source distinct count (pre-unification)
+- Final canonical IDs (post-unification)
+- Deduplication % = (source - final) / source * 100
+```
+
+**Merge Ratios:**
+```
+- Average identities per customer = total_identities / unique_canonical_ids
+- Distribution across categories
+- Outlier detection (10+ merges)
+```
+
+**Convergence Analysis:**
+```
+- Parse from execution logs if available
+- Calculate from iteration metadata tables
+- Estimate convergence quality
+```
+
+**Data Quality Scores:**
+```
+- Coverage % for each attribute
+- Completeness assessment
+- Quality grading (Excellent, Good, Needs Improvement)
+```
+
+### Step 4: HTML Report Generation
+
+I'll generate a **pixel-perfect HTML report** with:
+
+**Design Features:**
+- ✨ Modern gradient design (purple theme)
+- 📊 Interactive visualizations (progress bars, horizontal bar charts)
+- 🎨 Color-coded badges and status indicators
+- 📱 Responsive layout (works on all devices)
+- 🖨️ Print-optimized CSS for PDF export
+
+**Report Structure:**
+
+```html
+<!DOCTYPE html>
+<html>
+  <head>
+    - Professional CSS styling
+    - Chart/visualization styles
+    - Print media queries
+  </head>
+  <body>
+    <header>
+      - Report title
+      - Executive tagline
+    </header>
+
+    <metadata-bar>
+      - Database/Catalog info
+      - Canonical ID name
+      - Generation timestamp
+      - Platform indicator
+    </metadata-bar>
+
+    <section: Executive Summary>
+      - 4 KPI metric cards
+      - Key findings insight box
+    </section>
+
+    <section: Identity Resolution Performance>
+      - Source vs result comparison table
+      - Deduplication rate analysis
+      - Horizontal bar charts
+      - Expert insights
+    </section>
+
+    <section: Merge Distribution Analysis>
+      - Category breakdown table
+      - Distribution visualizations
+      - Pattern analysis insights
+    </section>
+
+    <section: Top Merged Profiles>
+      - Top 10 ranked table
+      - Complexity badges
+      - Investigation recommendations
+    </section>
+
+    <section: Source Table Configuration>
+      - Column mapping table
+      - Source contributions
+      - Multi-key strategy analysis
+    </section>
+
+    <section: Master Table Data Quality>
+      - 6 coverage cards with progress bars
+      - Sample records table
+      - Quality assessment
+    </section>
+
+    <section: Convergence Performance>
+      - Iteration breakdown table
+      - Convergence progression chart
+      - Efficiency analysis
+    </section>
+
+    <section: Expert Recommendations>
+      - 4 recommendation cards
+      - Strategic next steps
+      - Downstream activation ideas
+    </section>
+
+    <section: Summary Statistics>
+      - Comprehensive metrics table
+      - All key numbers documented
+    </section>
+
+    <footer>
+      - Generation metadata
+      - Platform information
+      - Report description
+    </footer>
+  </body>
+</html>
+```
+
+### Step 5: Quality Validation and Output
+
+**Pre-Output Validation:**
+```
+1. Verify all sections have data
+2. Check calculations are correct
+3. Validate percentages sum properly
+4. Ensure no missing values
+5. Confirm HTML is well-formed
+```
+
+**File Output:**
+```
+1. Write HTML to specified path
+2. Create backup if file exists
+3. Set proper file permissions
+4. Verify file was written successfully
+```
+
+**Report Summary:**
+```
+✓ Report generated: {file_path}
+✓ File size: {size} KB
+✓ Sections included: 9
+✓ Statistics queries: 16
+✓ Data quality score: {score}%
+✓ Ready for: Browser viewing, PDF export, sharing
+```
+
+---
+
+## Example Workflow
+
+### Snowflake Example
+
+**User Input:**
+```
+Platform: Snowflake
+Database: INDRESH_TEST
+Schema: PUBLIC
+Canonical ID: td_id
+Output: snowflake_merge_report.html
+```
+
+**Process:**
+```
+✓ Connected to Snowflake via MCP
+✓ Database: INDRESH_TEST.PUBLIC validated
+✓ Tables found:
+  - td_id_lookup (19,512 records)
+  - td_id_master_table (4,940 records)
+  - td_id_source_key_stats (4 records)
+  - td_id_result_key_stats (4 records)
+
+Executing queries:
+  ✓ Query 1: Source statistics retrieved
+  ✓ Query 2: Result statistics retrieved
+  ✓ Query 3: Canonical ID counts (19,512 → 4,940)
+  ✓ Query 4: Top 10 merged profiles identified
+  ✓ Query 5: Merge distribution calculated
+  ✓ Query 6: Key type distribution analyzed
+  ✓ Query 7: Master table coverage (100% email, 99.39% phone)
+  ✓ Query 8: Sample records extracted
+  ✓ Query 9-11: Metadata retrieved
+
+Calculating metrics:
+  ✓ Merge ratio: 3.95:1
+  ✓ Fragmentation reduction: 74.7%
+  ✓ Deduplication rates:
+    - customer_id: 23.9%
+    - email: 32.0%
+    - phone: 14.8%
+  ✓ Data quality score: 99.7%
+
+Generating HTML report:
+  ✓ Executive summary section
+  ✓ Performance analysis section
+  ✓ Merge distribution section
+  ✓ Top profiles section
+  ✓ Source configuration section
+  ✓ Data quality section
+  ✓ Convergence section
+  ✓ Recommendations section
+  ✓ Summary statistics section
+
+✓ Report saved: snowflake_merge_report.html (142 KB)
+✓ Open in browser to view
+✓ Print to PDF for distribution
+```
+
+**Generated Report Contents:**
+```
+Executive Summary:
+  - 4,940 unified profiles
+  - 19,512 total identities
+  - 3.95:1 merge ratio
+  - 74.7% fragmentation reduction
+
+Identity Resolution:
+  - customer_id: 6,489 → 4,940 (23.9% reduction)
+  - email: 7,261 → 4,940 (32.0% reduction)
+  - phone: 5,762 → 4,910 (14.8% reduction)
+
+Merge Distribution:
+  - 89.0% profiles: 3-5 identities (normal)
+  - 8.1% profiles: 6-10 identities (high engagement)
+  - 2.3% profiles: 10+ identities (complex)
+
+Top Merged Profile:
+  - mS9ssBEh4EsN: 38 identities merged
+
+Data Quality:
+  - Email: 100% coverage
+  - Phone: 99.39% coverage
+  - Names: 100% coverage
+  - Location: 100% coverage
+
+Expert Recommendations:
+  - Implement incremental processing
+  - Monitor profiles with 20+ merges
+  - Enable downstream activation
+  - Set up quality monitoring
+```
+
+### Databricks Example
+
+**User Input:**
+```
+Platform: Databricks
+Catalog: customer_cdp
+Schema: id_unification
+Canonical ID: unified_customer_id
+Output: databricks_merge_report.html
+```
+
+**Process:**
+```
+✓ Connected to Databricks (or using Snowflake MCP fallback)
+✓ Catalog: customer_cdp.id_unification validated
+✓ Tables found:
+  - unified_customer_id_lookup
+  - unified_customer_id_master_table
+  - unified_customer_id_source_key_stats
+  - unified_customer_id_result_key_stats
+
+[Same query execution and report generation as Snowflake]
+
+✓ Report saved: databricks_merge_report.html
+```
+
+---
+
+## Key Features
+
+### 🎯 **Consistency Guarantee**
+- **Same report every time**: Deterministic HTML generation
+- **Platform-agnostic design**: Works identically on Snowflake and Databricks
+- **Version controlled**: Report structure is fixed and versioned
+
+### 🔍 **Expert Analysis**
+- **16 specialized queries**: Comprehensive data collection
+- **Calculated metrics**: Deduplication rates, merge ratios, quality scores
+- **Pattern detection**: Identify anomalies and outliers
+- **Strategic insights**: Actionable recommendations
+
+### 📊 **Professional Visualizations**
+- **KPI metric cards**: Large, colorful summary metrics
+- **Progress bars**: Coverage percentages with animations
+- **Horizontal bar charts**: Distribution comparisons
+- **Color-coded badges**: Status indicators (Excellent, Good, Needs Review)
+- **Tables with hover effects**: Interactive data exploration
+
+### 🌍 **Platform Flexibility**
+- **Snowflake**: Uses `mcp__snowflake__execute_query` tool
+- **Databricks**: Uses Databricks MCP tools (with fallback options)
+- **Automatic SQL adaptation**: Platform-specific query generation
+- **Table name resolution**: Handles catalog vs database differences
+
+### 📋 **Comprehensive Coverage**
+
+**9 Report Sections:**
+1. Executive Summary (4 KPIs + findings)
+2. Identity Resolution Performance (deduplication analysis)
+3. Merge Distribution Analysis (categorized breakdown)
+4. Top Merged Profiles (complexity ranking)
+5. Source Table Configuration (mappings)
+6. Master Table Data Quality (coverage metrics)
+7. Convergence Performance (iteration analysis)
+8. Expert Recommendations (strategic guidance)
+9. Summary Statistics (complete metrics)
+
+**16 Statistical Queries:**
+- Source/result key statistics
+- Canonical ID counts and distributions
+- Merge pattern analysis
+- Quality coverage metrics
+- Configuration metadata
+
+---
+
+## Table Naming Conventions
+
+The command automatically finds tables based on your canonical ID name:
+
+### Required Tables
+
+For canonical ID = `{canonical_id}`:
+
+1. **Lookup Table**: `{canonical_id}_lookup`
+   - Contains: canonical_id, id, id_key_type
+   - Used for: Merge ratio, distribution, top profiles
+
+2. **Master Table**: `{canonical_id}_master_table`
+   - Contains: {canonical_id}, best_* attributes
+   - Used for: Data quality coverage
+
+3. **Source Stats**: `{canonical_id}_source_key_stats`
+   - Contains: from_table, total_distinct, distinct_*
+   - Used for: Pre-unification baseline
+
+4. **Result Stats**: `{canonical_id}_result_key_stats`
+   - Contains: from_table, total_distinct, histogram_*
+   - Used for: Post-unification results
+
+### Optional Tables
+
+5. **Unification Metadata**: `unification_metadata`
+   - Contains: canonical_id_name, canonical_id_type
+   - Used for: Configuration documentation
+
+6. **Column Lookup**: `column_lookup`
+   - Contains: table_name, column_name, key_name
+   - Used for: Source table mappings
+
+7. **Filter Lookup**: `filter_lookup`
+   - Contains: key_name, invalid_texts, valid_regexp
+   - Used for: Validation rules
+
+**All tables must be in the same database.schema (Snowflake) or catalog.schema (Databricks)**
+
+---
+
+## Output Format
+
+### HTML Report Features
+
+**Styling:**
+- Gradient purple theme (#667eea to #764ba2)
+- Modern typography (system fonts)
+- Responsive grid layouts
+- Smooth hover animations
+- Print-optimized media queries
+
+**Sections:**
+- Header with gradient background
+- Metadata bar with key info
+- 9 content sections with analysis
+- Footer with generation details
+
+**Visualizations:**
+- Metric cards (4 in executive summary)
+- Progress bars (6 in data quality)
+- Horizontal bar charts (3 throughout report)
+- Tables with sorting and hover effects
+- Insight boxes with recommendations
+
+**Interactivity:**
+- Hover effects on cards and tables
+- Animated progress bars
+- Expandable insight boxes
+- Responsive layout adapts to screen size
+
+### PDF Export
+
+To create a PDF from the HTML report:
+
+1. Open HTML file in browser
+2. Press Ctrl+P (Windows) or Cmd+P (Mac)
+3. Select "Save as PDF"
+4. Choose landscape orientation for better chart visibility
+5. Enable background graphics for full styling
+
+---
+
+## Error Handling
+
+### Common Issues and Solutions
+
+**Issue: "Tables not found"**
+```
+Solution:
+1. Verify canonical ID name is correct
+2. Check database/catalog and schema names
+3. Ensure unification workflow completed successfully
+4. Confirm table naming: {canonical_id}_lookup, {canonical_id}_master_table, etc.
+```
+
+**Issue: "MCP tools not available"**
+```
+Solution:
+1. For Snowflake: Verify Snowflake MCP server is configured
+2. For Databricks: Fall back to Snowflake MCP with proper connection string
+3. Check network connectivity
+4. Validate credentials
+```
+
+**Issue: "No data in statistics tables"**
+```
+Solution:
+1. Verify unification workflow ran completely
+2. Check that statistics SQL files were executed
+3. Confirm data exists in lookup and master tables
+4. Re-run the unification workflow if needed
+```
+
+**Issue: "Permission denied"**
+```
+Solution:
+1. Verify READ access to all tables
+2. For Snowflake: Grant SELECT on schema
+3. For Databricks: Grant USE CATALOG, USE SCHEMA, SELECT
+4. Check role/user permissions
+```
+
+---
+
+## Success Criteria
+
+Generated report will:
+
+- ✅ **Open successfully** in all modern browsers (Chrome, Firefox, Safari, Edge)
+- ✅ **Display all 9 sections** with complete data
+- ✅ **Show accurate calculations** for all metrics
+- ✅ **Include visualizations** (charts, progress bars, tables)
+- ✅ **Render consistently** every time it's generated
+- ✅ **Export cleanly to PDF** with proper formatting
+- ✅ **Match the reference design** (same HTML/CSS structure)
+- ✅ **Contain expert insights** and recommendations
+- ✅ **Be production-ready** for stakeholder distribution
+
+---
+
+## Usage Examples
+
+### Quick Start (Snowflake)
+
+```
+/cdp-hybrid-idu:hybrid-unif-merge-stats-creator
+
+> Platform: Snowflake
+> Database: PROD_CDP
+> Schema: ID_UNIFICATION
+> Canonical ID: master_customer_id
+> Output: (press Enter for default)
+
+✓ Report generated: id_unification_report.html
+```
+
+### Custom Output Path
+
+```
+/cdp-hybrid-idu:hybrid-unif-merge-stats-creator
+
+> Platform: Databricks
+> Catalog: analytics_prod
+> Schema: unified_ids
+> Canonical ID: td_id
+> Output: /reports/weekly/td_id_stats_2025-10-15.html
+
+✓ Report generated: /reports/weekly/td_id_stats_2025-10-15.html
+```
+
+### Multiple Environments
+
+Generate reports for different environments:
+
+```bash
+# Production
+/hybrid-unif-merge-stats-creator
+  Platform: Snowflake
+  Database: PROD_CDP
+  Output: prod_merge_stats.html
+
+# Staging
+/hybrid-unif-merge-stats-creator
+  Platform: Snowflake
+  Database: STAGING_CDP
+  Output: staging_merge_stats.html
+
+# Compare metrics across environments
+```
+
+---
+
+## Best Practices
+
+### Regular Reporting
+
+1. **Weekly Reports**: Track merge performance over time
+2. **Post-Workflow Reports**: Generate after each unification run
+3. **Quality Audits**: Monthly deep-dive analysis
+4. **Stakeholder Updates**: Executive-friendly format
+
+### Comparative Analysis
+
+Generate reports at different stages:
+- After initial unification setup
+- After incremental updates
+- After data quality improvements
+- Across different customer segments
+
+### Archive and Versioning
+
+```
+reports/
+  2025-10-15_td_id_merge_stats.html
+  2025-10-08_td_id_merge_stats.html
+  2025-10-01_td_id_merge_stats.html
+```
+
+Track improvements over time by comparing:
+- Merge ratios
+- Data quality scores
+- Convergence iterations
+- Deduplication rates
+
+---
+
+## Getting Started
+
+**Ready to generate your merge statistics report?**
+
+Please provide:
+
+1. **Platform**: Snowflake or Databricks?
+2. **Database/Catalog**: Where are your unification tables?
+3. **Schema**: Which schema contains the tables?
+4. **Canonical ID**: What's the name of your unified ID? (e.g., td_id)
+5. **Output Path** (optional): Where to save the report?
+
+**Example:**
+```
+I want to generate a merge statistics report for:
+
+Platform: Snowflake
+Database: INDRESH_TEST
+Schema: PUBLIC
+Canonical ID: td_id
+Output: my_unification_report.html
+```
+
+---
+
+**I'll analyze your ID unification results and create a comprehensive, beautiful HTML report with expert insights!**
--- a/plugin.lock.json
+++ b/plugin.lock.json
@@ -0,0 +1,101 @@
+{
+  "$schema": "internal://schemas/plugin.lock.v1.json",
+  "pluginId": "gh:treasure-data/aps_claude_tools:plugins/cdp-hybrid-idu",
+  "normalized": {
+    "repo": null,
+    "ref": "refs/tags/v20251128.0",
+    "commit": "58382efafa00d9c88bf68f0ba2be494e310d9827",
+    "treeHash": "04cbd3c0d2b818afaf15f92f7e5fb2880103cdbfd513d9926f323c5b7722f625",
+    "generatedAt": "2025-11-28T10:28:44.950550Z",
+    "toolVersion": "publish_plugins.py@0.2.0"
+  },
+  "origin": {
+    "remote": "git@github.com:zhongweili/42plugin-data.git",
+    "branch": "master",
+    "commit": "aa1497ed0949fd50e99e70d6324a29c5b34f9390",
+    "repoRoot": "/Users/zhongweili/projects/openmind/42plugin-data"
+  },
+  "manifest": {
+    "name": "cdp-hybrid-idu",
+    "description": "Multi-platform ID Unification for Snowflake and Databricks with YAML-driven configuration, convergence detection, and master table generation",
+    "version": null
+  },
+  "content": {
+    "files": [
+      {
+        "path": "README.md",
+        "sha256": "4e50b588ce6c220815a4ca869c68f41fe23cbaf05846fb306e7b2cbf127ed8f8"
+      },
+      {
+        "path": "agents/hybrid-unif-keys-extractor.md",
+        "sha256": "d2c92a61393209f0835f0118240254a7fa6f209aa62ec87d0ab253723055a7da"
+      },
+      {
+        "path": "agents/merge-stats-report-generator.md",
+        "sha256": "6e8fda43a277dfef132566b44a3dee23a632171641dcde0151d7602f43bcb5e8"
+      },
+      {
+        "path": "agents/databricks-sql-generator.md",
+        "sha256": "ae3ce3874d7c00599fcef09718cb612e551aac89896e3c75aa1194332179df9d"
+      },
+      {
+        "path": "agents/databricks-workflow-executor.md",
+        "sha256": "ecc4fcf94d470fe27f078e8722297921469852d596035e3d9d5b5d32aa2b0435"
+      },
+      {
+        "path": "agents/yaml-configuration-builder.md",
+        "sha256": "da90f575f8f0f7e33fba1ad720c73556e029227fcf135cd9fe4a9a1d3fb77be3"
+      },
+      {
+        "path": "agents/snowflake-sql-generator.md",
+        "sha256": "783ee1653bca7e0bb2647b953b4c05390e08686f7454c8e1a9e572851e8e0fc8"
+      },
+      {
+        "path": "agents/snowflake-workflow-executor.md",
+        "sha256": "f5f5352f47cfdd5a52769988ed9893f5a64de2a236f145b315838d475babca2c"
+      },
+      {
+        "path": ".claude-plugin/plugin.json",
+        "sha256": "a156b276659131718eab652f7b9806ab00bf59318ee07e22a585e3cb13da5e93"
+      },
+      {
+        "path": "commands/hybrid-setup.md",
+        "sha256": "9a287a1c414323cd6db2c5f3197fcfde531d337168e2696bc7f4896113ae40b6"
+      },
+      {
+        "path": "commands/hybrid-generate-databricks.md",
+        "sha256": "aff13cf95a74cd71dff35e3a4cd4ba2f287a7b3091f84cdb914d80e00bfe29ad"
+      },
+      {
+        "path": "commands/hybrid-generate-snowflake.md",
+        "sha256": "0dc460f41ee3c8130aa9a52537686fec6818e7a37a802040b8a570d8f89eaf77"
+      },
+      {
+        "path": "commands/hybrid-unif-config-creator.md",
+        "sha256": "3e14989f811e5ef198cff306e9203ec6bfa5f3772daa3a0f08292595574ab73c"
+      },
+      {
+        "path": "commands/hybrid-execute-databricks.md",
+        "sha256": "ad78068c5b96d310d1d620c00572c100915e0706a5312c0649b09a8165bbc79c"
+      },
+      {
+        "path": "commands/hybrid-unif-config-validate.md",
+        "sha256": "a413582bc43a23ad1addde134007bd6a3174b14d71c10dcbbd5f7824a6a97fb0"
+      },
+      {
+        "path": "commands/hybrid-unif-merge-stats-creator.md",
+        "sha256": "0c00db96f02559d212e502702eea5d3a02de8fedbca92f64eebd8ed430f96341"
+      },
+      {
+        "path": "commands/hybrid-execute-snowflake.md",
+        "sha256": "63fbc27f3350cd1d910d0dc4c588a14fd39e1d7ebda29ed6fce3584967bac4c4"
+      }
+    ],
+    "dirSha256": "04cbd3c0d2b818afaf15f92f7e5fb2880103cdbfd513d9926f323c5b7722f625"
+  },
+  "security": {
+    "scannedAt": null,
+    "scannerVersion": null,
+    "flags": []
+  }
+}