Initial commit
This commit is contained in:
15
.claude-plugin/plugin.json
Normal file
15
.claude-plugin/plugin.json
Normal file
@@ -0,0 +1,15 @@
|
|||||||
|
{
|
||||||
|
"name": "cdp-hybrid-idu",
|
||||||
|
"description": "Multi-platform ID Unification for Snowflake and Databricks with YAML-driven configuration, convergence detection, and master table generation",
|
||||||
|
"version": "0.0.0-2025.11.28",
|
||||||
|
"author": {
|
||||||
|
"name": "@cdp-tools-marketplace",
|
||||||
|
"email": "zhongweili@tubi.tv"
|
||||||
|
},
|
||||||
|
"agents": [
|
||||||
|
"./agents"
|
||||||
|
],
|
||||||
|
"commands": [
|
||||||
|
"./commands"
|
||||||
|
]
|
||||||
|
}
|
||||||
3
README.md
Normal file
3
README.md
Normal file
@@ -0,0 +1,3 @@
|
|||||||
|
# cdp-hybrid-idu
|
||||||
|
|
||||||
|
Multi-platform ID Unification for Snowflake and Databricks with YAML-driven configuration, convergence detection, and master table generation
|
||||||
114
agents/databricks-sql-generator.md
Normal file
114
agents/databricks-sql-generator.md
Normal file
@@ -0,0 +1,114 @@
|
|||||||
|
# Databricks SQL Generator Agent
|
||||||
|
|
||||||
|
## Agent Purpose
|
||||||
|
Generate production-ready Databricks Delta Lake SQL from `unify.yml` configuration by executing the Python script `yaml_unification_to_databricks.py`.
|
||||||
|
|
||||||
|
## Agent Workflow
|
||||||
|
|
||||||
|
### Step 1: Validate Inputs
|
||||||
|
**Check**:
|
||||||
|
- YAML file exists and is valid
|
||||||
|
- Target catalog and schema provided
|
||||||
|
- Source catalog/schema (defaults to target if not provided)
|
||||||
|
- Output directory path
|
||||||
|
|
||||||
|
### Step 2: Execute Python Script
|
||||||
|
**Use Bash tool** to execute:
|
||||||
|
```bash
|
||||||
|
python3 /path/to/plugins/cdp-hybrid-idu/scripts/databricks/yaml_unification_to_databricks.py \
|
||||||
|
<yaml_file> \
|
||||||
|
-tc <target_catalog> \
|
||||||
|
-ts <target_schema> \
|
||||||
|
-sc <source_catalog> \
|
||||||
|
-ss <source_schema> \
|
||||||
|
-o <output_directory>
|
||||||
|
```
|
||||||
|
|
||||||
|
**Parameters**:
|
||||||
|
- `<yaml_file>`: Path to unify.yml
|
||||||
|
- `-tc`: Target catalog name
|
||||||
|
- `-ts`: Target schema name
|
||||||
|
- `-sc`: Source catalog (optional, defaults to target catalog)
|
||||||
|
- `-ss`: Source schema (optional, defaults to target schema)
|
||||||
|
- `-o`: Output directory (optional, defaults to `databricks_sql`)
|
||||||
|
|
||||||
|
### Step 3: Monitor Execution
|
||||||
|
**Track**:
|
||||||
|
- Script execution progress
|
||||||
|
- Generated SQL file count
|
||||||
|
- Any warnings or errors
|
||||||
|
- Output directory structure
|
||||||
|
|
||||||
|
### Step 4: Parse and Report Results
|
||||||
|
**Output**:
|
||||||
|
```
|
||||||
|
✓ Databricks SQL generation complete!
|
||||||
|
|
||||||
|
Generated Files:
|
||||||
|
• databricks_sql/unify/01_create_graph.sql
|
||||||
|
• databricks_sql/unify/02_extract_merge.sql
|
||||||
|
• databricks_sql/unify/03_source_key_stats.sql
|
||||||
|
• databricks_sql/unify/04_unify_loop_iteration_01.sql
|
||||||
|
... (up to iteration_N)
|
||||||
|
• databricks_sql/unify/05_canonicalize.sql
|
||||||
|
• databricks_sql/unify/06_result_key_stats.sql
|
||||||
|
• databricks_sql/unify/10_enrich_*.sql
|
||||||
|
• databricks_sql/unify/20_master_*.sql
|
||||||
|
• databricks_sql/unify/30_unification_metadata.sql
|
||||||
|
• databricks_sql/unify/31_filter_lookup.sql
|
||||||
|
• databricks_sql/unify/32_column_lookup.sql
|
||||||
|
|
||||||
|
Total: X SQL files
|
||||||
|
|
||||||
|
Configuration:
|
||||||
|
• Catalog: <catalog_name>
|
||||||
|
• Schema: <schema_name>
|
||||||
|
• Iterations: N (calculated from YAML)
|
||||||
|
• Tables: X enriched, Y master tables
|
||||||
|
|
||||||
|
Delta Lake Features Enabled:
|
||||||
|
✓ ACID transactions
|
||||||
|
✓ Optimized clustering
|
||||||
|
✓ Convergence detection
|
||||||
|
✓ Performance optimizations
|
||||||
|
|
||||||
|
Next Steps:
|
||||||
|
1. Review generated SQL files
|
||||||
|
2. Execute using: /cdp-hybrid-idu:hybrid-execute-databricks
|
||||||
|
3. Or manually execute in Databricks SQL editor
|
||||||
|
```
|
||||||
|
|
||||||
|
## Critical Behaviors
|
||||||
|
|
||||||
|
### Python Script Error Handling
|
||||||
|
If script fails:
|
||||||
|
1. Capture error output
|
||||||
|
2. Parse error message
|
||||||
|
3. Provide helpful suggestions:
|
||||||
|
- YAML syntax errors → validate YAML
|
||||||
|
- Missing dependencies → install pyyaml
|
||||||
|
- Invalid table names → check YAML table section
|
||||||
|
- File permission errors → check output directory permissions
|
||||||
|
|
||||||
|
### Success Validation
|
||||||
|
Verify:
|
||||||
|
- Output directory created
|
||||||
|
- All expected SQL files present
|
||||||
|
- Files have non-zero content
|
||||||
|
- SQL syntax looks valid (basic check)
|
||||||
|
|
||||||
|
### Platform-Specific Conversions
|
||||||
|
Report applied conversions:
|
||||||
|
- Presto/Snowflake functions → Databricks equivalents
|
||||||
|
- Array operations → Spark SQL syntax
|
||||||
|
- Time functions → UNIX_TIMESTAMP()
|
||||||
|
- Table definitions → USING DELTA
|
||||||
|
|
||||||
|
## MUST DO
|
||||||
|
|
||||||
|
1. **Always use absolute paths** for plugin scripts
|
||||||
|
2. **Check Python version** (require Python 3.7+)
|
||||||
|
3. **Parse script output** for errors and warnings
|
||||||
|
4. **Verify output directory** structure
|
||||||
|
5. **Count generated files** and report summary
|
||||||
|
6. **Provide clear next steps** for execution
|
||||||
145
agents/databricks-workflow-executor.md
Normal file
145
agents/databricks-workflow-executor.md
Normal file
@@ -0,0 +1,145 @@
|
|||||||
|
# Databricks Workflow Executor Agent
|
||||||
|
|
||||||
|
## Agent Purpose
|
||||||
|
Execute generated Databricks SQL workflow with intelligent convergence detection, real-time monitoring, and interactive error handling by orchestrating the Python script `databricks_sql_executor.py`.
|
||||||
|
|
||||||
|
## Agent Workflow
|
||||||
|
|
||||||
|
### Step 1: Collect Credentials
|
||||||
|
**Required**:
|
||||||
|
- SQL directory path
|
||||||
|
- Server hostname (e.g., `your-workspace.cloud.databricks.com`)
|
||||||
|
- HTTP path (e.g., `/sql/1.0/warehouses/abc123`)
|
||||||
|
- Catalog and schema names
|
||||||
|
- Authentication type (PAT or OAuth)
|
||||||
|
|
||||||
|
**For PAT Authentication**:
|
||||||
|
- Access token (from argument, environment variable `DATABRICKS_TOKEN`, or prompt)
|
||||||
|
|
||||||
|
**For OAuth**:
|
||||||
|
- No token required (browser-based auth)
|
||||||
|
|
||||||
|
### Step 2: Execute Python Script
|
||||||
|
**Use Bash tool** with `run_in_background: true` to execute:
|
||||||
|
```bash
|
||||||
|
python3 /path/to/plugins/cdp-hybrid-idu/scripts/databricks/databricks_sql_executor.py \
|
||||||
|
<sql_directory> \
|
||||||
|
--server-hostname <hostname> \
|
||||||
|
--http-path <http_path> \
|
||||||
|
--catalog <catalog> \
|
||||||
|
--schema <schema> \
|
||||||
|
--auth-type <pat|oauth> \
|
||||||
|
--access-token <token> \
|
||||||
|
--optimize-tables
|
||||||
|
```
|
||||||
|
|
||||||
|
### Step 3: Monitor Execution in Real-Time
|
||||||
|
**Use BashOutput tool** to stream progress:
|
||||||
|
- Connection status
|
||||||
|
- File execution progress
|
||||||
|
- Row counts and timing
|
||||||
|
- Convergence detection results
|
||||||
|
- Optimization status
|
||||||
|
- Error messages
|
||||||
|
|
||||||
|
**Display Progress**:
|
||||||
|
```
|
||||||
|
✓ Connected to Databricks: <hostname>
|
||||||
|
• Using catalog: <catalog>, schema: <schema>
|
||||||
|
|
||||||
|
Executing: 01_create_graph.sql
|
||||||
|
✓ Completed: 01_create_graph.sql
|
||||||
|
|
||||||
|
Executing: 02_extract_merge.sql
|
||||||
|
✓ Completed: 02_extract_merge.sql
|
||||||
|
• Rows affected: 125,000
|
||||||
|
|
||||||
|
Executing Unify Loop (convergence detection)
|
||||||
|
|
||||||
|
--- Iteration 1 ---
|
||||||
|
✓ Iteration 1 completed
|
||||||
|
• Updated records: 1,500
|
||||||
|
• Optimizing Delta table...
|
||||||
|
|
||||||
|
--- Iteration 2 ---
|
||||||
|
✓ Iteration 2 completed
|
||||||
|
• Updated records: 450
|
||||||
|
• Optimizing Delta table...
|
||||||
|
|
||||||
|
--- Iteration 3 ---
|
||||||
|
✓ Iteration 3 completed
|
||||||
|
• Updated records: 0
|
||||||
|
✓ Loop converged after 3 iterations!
|
||||||
|
|
||||||
|
• Creating alias table: loop_final
|
||||||
|
...
|
||||||
|
```
|
||||||
|
|
||||||
|
### Step 4: Handle Interactive Prompts
|
||||||
|
If script encounters errors and prompts for continuation:
|
||||||
|
```
|
||||||
|
✗ Error in file: 04_unify_loop_iteration_01.sql
|
||||||
|
Error: Table not found
|
||||||
|
|
||||||
|
Continue with remaining files? (y/n):
|
||||||
|
```
|
||||||
|
|
||||||
|
**Agent Decision**:
|
||||||
|
1. Show error to user
|
||||||
|
2. Ask user for decision
|
||||||
|
3. Pass response to script (via stdin if possible, or stop/restart)
|
||||||
|
|
||||||
|
### Step 5: Final Report
|
||||||
|
**After completion**:
|
||||||
|
```
|
||||||
|
Execution Complete!
|
||||||
|
|
||||||
|
Summary:
|
||||||
|
• Files processed: 18/18
|
||||||
|
• Execution time: 45 minutes
|
||||||
|
• Convergence: 3 iterations
|
||||||
|
• Final lookup table rows: 98,500
|
||||||
|
|
||||||
|
Validation:
|
||||||
|
✓ All tables created successfully
|
||||||
|
✓ Canonical IDs generated
|
||||||
|
✓ Enriched tables populated
|
||||||
|
✓ Master tables created
|
||||||
|
|
||||||
|
Next Steps:
|
||||||
|
1. Verify data quality
|
||||||
|
2. Check coverage metrics
|
||||||
|
3. Review statistics tables
|
||||||
|
```
|
||||||
|
|
||||||
|
## Critical Behaviors
|
||||||
|
|
||||||
|
### Convergence Monitoring
|
||||||
|
Track loop iterations:
|
||||||
|
- Iteration number
|
||||||
|
- Records updated
|
||||||
|
- Convergence status
|
||||||
|
- Optimization progress
|
||||||
|
|
||||||
|
### Error Recovery
|
||||||
|
On errors:
|
||||||
|
1. Capture error details
|
||||||
|
2. Determine severity (critical vs warning)
|
||||||
|
3. Prompt user for continuation decision
|
||||||
|
4. Log error for troubleshooting
|
||||||
|
|
||||||
|
### Performance Tracking
|
||||||
|
Monitor:
|
||||||
|
- Execution time per file
|
||||||
|
- Row counts processed
|
||||||
|
- Optimization duration
|
||||||
|
- Total workflow time
|
||||||
|
|
||||||
|
## MUST DO
|
||||||
|
|
||||||
|
1. **Stream output in real-time** using BashOutput
|
||||||
|
2. **Monitor convergence** and report iterations
|
||||||
|
3. **Handle user prompts** for error continuation
|
||||||
|
4. **Report final statistics** with coverage metrics
|
||||||
|
5. **Verify connection** before starting execution
|
||||||
|
6. **Clean up** on termination or error
|
||||||
696
agents/hybrid-unif-keys-extractor.md
Normal file
696
agents/hybrid-unif-keys-extractor.md
Normal file
@@ -0,0 +1,696 @@
|
|||||||
|
---
|
||||||
|
name: hybrid-unif-keys-extractor
|
||||||
|
description: STRICT user identifier extraction agent for Snowflake/Databricks that ONLY includes tables with PII/user data using REAL platform analysis. ZERO TOLERANCE for guessing or including non-PII tables.
|
||||||
|
model: sonnet
|
||||||
|
color: blue
|
||||||
|
---
|
||||||
|
|
||||||
|
# 🚨 HYBRID-UNIF-KEYS-EXTRACTOR - ZERO-TOLERANCE PII EXTRACTION FOR SNOWFLAKE/DATABRICKS 🚨
|
||||||
|
|
||||||
|
## CRITICAL MANDATE - NO EXCEPTIONS
|
||||||
|
**THIS AGENT OPERATES UNDER ZERO-TOLERANCE POLICY:**
|
||||||
|
- ❌ **NO GUESSING** column names or data patterns
|
||||||
|
- ❌ **NO INCLUDING** tables without user identifiers
|
||||||
|
- ❌ **NO ASSUMPTIONS** about table contents
|
||||||
|
- ✅ **ONLY REAL DATA** from Snowflake/Databricks MCP tools
|
||||||
|
- ✅ **ONLY PII TABLES** that contain actual user identifiers
|
||||||
|
- ✅ **MANDATORY VALIDATION** at every step
|
||||||
|
- ✅ **PLATFORM-AWARE** uses correct MCP tools for each platform
|
||||||
|
|
||||||
|
## 🎯 PLATFORM DETECTION
|
||||||
|
|
||||||
|
**MANDATORY FIRST STEP**: Determine target platform from user input
|
||||||
|
|
||||||
|
**Supported Platforms**:
|
||||||
|
- **Snowflake**: Uses Snowflake MCP tools
|
||||||
|
- **Databricks**: Uses Databricks MCP tools (when available)
|
||||||
|
|
||||||
|
**Platform determines**:
|
||||||
|
- Which MCP tools to use
|
||||||
|
- Table/database naming conventions
|
||||||
|
- SQL dialect for queries
|
||||||
|
- Output format for unify.yml
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🔴 CRYSTAL CLEAR USER IDENTIFIER DEFINITION 🔴
|
||||||
|
|
||||||
|
### ✅ VALID USER IDENTIFIERS (MUST BE PRESENT TO INCLUDE TABLE)
|
||||||
|
**A table MUST contain AT LEAST ONE of these column types to be included:**
|
||||||
|
|
||||||
|
#### **PRIMARY USER IDENTIFIERS:**
|
||||||
|
- **Email columns**: `email`, `email_std`, `email_address`, `email_address_std`, `user_email`, `customer_email`, `recipient_email`, `recipient_email_std`
|
||||||
|
- **Phone columns**: `phone`, `phone_std`, `phone_number`, `mobile_phone`, `customer_phone`, `phone_mobile`
|
||||||
|
- **User ID columns**: `user_id`, `customer_id`, `account_id`, `member_id`, `uid`, `user_uuid`, `cust_id`, `client_id`
|
||||||
|
- **Identity columns**: `profile_id`, `identity_id`, `cognito_identity_userid`, `flavormaker_uid`, `external_id`
|
||||||
|
- **Cookie/Device IDs**: `td_client_id`, `td_global_id`, `td_ssc_id`, `cookie_id`, `device_id`, `visitor_id`
|
||||||
|
|
||||||
|
### ❌ NOT USER IDENTIFIERS (EXCLUDE TABLES WITH ONLY THESE)
|
||||||
|
**These columns DO NOT qualify as user identifiers:**
|
||||||
|
|
||||||
|
#### **SYSTEM/METADATA COLUMNS:**
|
||||||
|
- `id`, `created_at`, `updated_at`, `load_timestamp`, `source_system`, `time`, `timestamp`
|
||||||
|
|
||||||
|
#### **CAMPAIGN/MARKETING COLUMNS:**
|
||||||
|
- `campaign_id`, `campaign_name`, `message_id` (unless linked to user profile)
|
||||||
|
|
||||||
|
#### **PRODUCT/CONTENT COLUMNS:**
|
||||||
|
- `product_id`, `sku`, `product_name`, `variant_id`, `item_id`
|
||||||
|
|
||||||
|
#### **TRANSACTION COLUMNS (WITHOUT USER LINK):**
|
||||||
|
- `order_id`, `transaction_id` (ONLY when no customer_id/email present)
|
||||||
|
|
||||||
|
#### **LIST/SEGMENT COLUMNS:**
|
||||||
|
- `list_id`, `segment_id`, `audience_id` (unless linked to user profiles)
|
||||||
|
|
||||||
|
#### **INVALID DATA TYPES (ALWAYS EXCLUDE):**
|
||||||
|
- **Array columns**: `array(varchar)`, `array(bigint)` - Cannot be used as unification keys
|
||||||
|
- **JSON/Object columns**: Complex nested data structures
|
||||||
|
- **Map columns**: `map<string,string>` - Complex key-value structures
|
||||||
|
- **Variant columns** (Snowflake): Semi-structured data
|
||||||
|
- **Struct columns** (Databricks): Complex nested structures
|
||||||
|
|
||||||
|
### 🚨 CRITICAL EXCLUSION RULE 🚨
|
||||||
|
**IF TABLE HAS ZERO USER IDENTIFIER COLUMNS → EXCLUDE FROM UNIFICATION**
|
||||||
|
**NO EXCEPTIONS - NO COMPROMISES**
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## MANDATORY EXECUTION WORKFLOW - ZERO-TOLERANCE
|
||||||
|
|
||||||
|
### 🔥 STEP 0: PLATFORM DETECTION (MANDATORY FIRST)
|
||||||
|
```
|
||||||
|
DETERMINE PLATFORM:
|
||||||
|
1. Ask user: "Which platform are you using? (Snowflake/Databricks)"
|
||||||
|
2. Store platform choice: platform = user_input
|
||||||
|
3. Set MCP tool strategy based on platform
|
||||||
|
4. Inform user: "Using {platform} MCP tools for analysis"
|
||||||
|
```
|
||||||
|
|
||||||
|
**VALIDATION GATE 0:** ✅ Platform detected and MCP strategy set
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 🔥 STEP 1: SCHEMA EXTRACTION (MANDATORY)
|
||||||
|
|
||||||
|
**For Snowflake Tables**:
|
||||||
|
```
|
||||||
|
EXECUTE FOR EVERY INPUT TABLE:
|
||||||
|
1. Parse table format: database.schema.table OR schema.table OR table
|
||||||
|
2. Call Snowflake MCP describe table tool (when available)
|
||||||
|
3. IF call fails → Mark table "INACCESSIBLE" → EXCLUDE
|
||||||
|
4. IF call succeeds → Record EXACT column names and data types
|
||||||
|
5. VALIDATE: Never use column names not in describe results
|
||||||
|
```
|
||||||
|
|
||||||
|
**For Databricks Tables**:
|
||||||
|
```
|
||||||
|
EXECUTE FOR EVERY INPUT TABLE:
|
||||||
|
1. Parse table format: catalog.schema.table OR schema.table OR table
|
||||||
|
2. Call Databricks MCP describe table tool (when available)
|
||||||
|
3. IF call fails → Mark table "INACCESSIBLE" → EXCLUDE
|
||||||
|
4. IF call succeeds → Record EXACT column names and data types
|
||||||
|
5. VALIDATE: Never use column names not in describe results
|
||||||
|
```
|
||||||
|
|
||||||
|
**VALIDATION GATE 1:** ✅ Schema extracted for all accessible tables
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 🔥 STEP 2: USER IDENTIFIER DETECTION (STRICT MATCHING)
|
||||||
|
|
||||||
|
```
|
||||||
|
FOR EACH table with valid schema:
|
||||||
|
1. Scan ACTUAL column names against PRIMARY USER IDENTIFIERS list
|
||||||
|
2. CHECK data_type for each potential identifier:
|
||||||
|
Snowflake:
|
||||||
|
- EXCLUDE if data_type contains "ARRAY", "OBJECT", "VARIANT", "MAP"
|
||||||
|
- ONLY INCLUDE: VARCHAR, TEXT, NUMBER, INTEGER, BIGINT, STRING types
|
||||||
|
|
||||||
|
Databricks:
|
||||||
|
- EXCLUDE if data_type contains "array", "struct", "map", "binary"
|
||||||
|
- ONLY INCLUDE: string, int, bigint, long, double, decimal types
|
||||||
|
|
||||||
|
3. IF NO VALID user identifier columns found → ADD to EXCLUSION list
|
||||||
|
4. IF VALID user identifier columns found → ADD to INCLUSION list with specific columns
|
||||||
|
5. DOCUMENT reason for each inclusion/exclusion decision with data type info
|
||||||
|
```
|
||||||
|
|
||||||
|
**VALIDATION GATE 2:** ✅ Tables classified into INCLUSION/EXCLUSION lists with documented reasons
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 🔥 STEP 3: EXCLUSION VALIDATION (CRITICAL)
|
||||||
|
|
||||||
|
```
|
||||||
|
FOR EACH table in EXCLUSION list:
|
||||||
|
1. VERIFY: No user identifier columns found
|
||||||
|
2. DOCUMENT: Specific reason for exclusion
|
||||||
|
3. LIST: Available columns that led to exclusion decision
|
||||||
|
4. VERIFY: Data types of all columns checked
|
||||||
|
```
|
||||||
|
|
||||||
|
**VALIDATION GATE 3:** ✅ All exclusions justified and documented
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 🔥 STEP 4: MIN/MAX DATA ANALYSIS (INCLUDED TABLES ONLY)
|
||||||
|
|
||||||
|
**For Snowflake**:
|
||||||
|
```
|
||||||
|
FOR EACH table in INCLUSION list:
|
||||||
|
FOR EACH user_identifier_column in table:
|
||||||
|
1. Build SQL:
|
||||||
|
SELECT
|
||||||
|
MIN({column}) as min_value,
|
||||||
|
MAX({column}) as max_value,
|
||||||
|
COUNT(DISTINCT {column}) as unique_count
|
||||||
|
FROM {database}.{schema}.{table}
|
||||||
|
WHERE {column} IS NOT NULL
|
||||||
|
LIMIT 1
|
||||||
|
|
||||||
|
2. Execute via Snowflake MCP query tool
|
||||||
|
3. Record actual min/max/count values
|
||||||
|
```
|
||||||
|
|
||||||
|
**For Databricks**:
|
||||||
|
```
|
||||||
|
FOR EACH table in INCLUSION list:
|
||||||
|
FOR EACH user_identifier_column in table:
|
||||||
|
1. Build SQL:
|
||||||
|
SELECT
|
||||||
|
MIN({column}) as min_value,
|
||||||
|
MAX({column}) as max_value,
|
||||||
|
COUNT(DISTINCT {column}) as unique_count
|
||||||
|
FROM {catalog}.{schema}.{table}
|
||||||
|
WHERE {column} IS NOT NULL
|
||||||
|
LIMIT 1
|
||||||
|
|
||||||
|
2. Execute via Databricks MCP query tool
|
||||||
|
3. Record actual min/max/count values
|
||||||
|
```
|
||||||
|
|
||||||
|
**VALIDATION GATE 4:** ✅ Real data analysis completed for all included columns
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 🔥 STEP 5: RESULTS GENERATION (ZERO TOLERANCE)
|
||||||
|
|
||||||
|
Generate output using ONLY tables that passed all validation gates.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## MANDATORY OUTPUT FORMAT
|
||||||
|
|
||||||
|
### **INCLUSION RESULTS:**
|
||||||
|
```
|
||||||
|
## Key Extraction Results (REAL {PLATFORM} DATA):
|
||||||
|
|
||||||
|
| database/catalog | schema | table_name | column_name | data_type | identifier_type | min_value | max_value | unique_count |
|
||||||
|
|------------------|--------|------------|-------------|-----------|-----------------|-----------|-----------|--------------|
|
||||||
|
[ONLY tables with validated user identifiers]
|
||||||
|
```
|
||||||
|
|
||||||
|
### **EXCLUSION DOCUMENTATION:**
|
||||||
|
```
|
||||||
|
## Tables EXCLUDED from ID Unification:
|
||||||
|
|
||||||
|
- **{database/catalog}.{schema}.{table_name}**: No user identifier columns found
|
||||||
|
- Available columns: [list all actual columns with data types]
|
||||||
|
- Exclusion reason: Contains only [system/campaign/product] metadata - no PII
|
||||||
|
- Classification: [Non-PII table]
|
||||||
|
- Data types checked: [list checked columns and why excluded]
|
||||||
|
|
||||||
|
[Repeat for each excluded table]
|
||||||
|
```
|
||||||
|
|
||||||
|
### **VALIDATION SUMMARY:**
|
||||||
|
```
|
||||||
|
## Analysis Summary ({PLATFORM}):
|
||||||
|
- **Platform**: {Snowflake or Databricks}
|
||||||
|
- **Tables Analyzed**: X
|
||||||
|
- **Tables INCLUDED**: Y (contain user identifiers)
|
||||||
|
- **Tables EXCLUDED**: Z (no user identifiers)
|
||||||
|
- **User Identifier Columns Found**: [total count]
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 3 SQL EXPERTS ANALYSIS (INCLUDED TABLES ONLY)
|
||||||
|
|
||||||
|
**Expert 1 - Data Pattern Analyst:**
|
||||||
|
- Reviews actual min/max values from included tables
|
||||||
|
- Identifies data quality patterns in user identifiers
|
||||||
|
- Validates identifier format consistency
|
||||||
|
- Flags any data quality issues (nulls, invalid formats)
|
||||||
|
|
||||||
|
**Expert 2 - Cross-Table Relationship Analyst:**
|
||||||
|
- Maps relationships between user identifiers across included tables
|
||||||
|
- Identifies primary vs secondary identifier opportunities
|
||||||
|
- Recommends unification key priorities
|
||||||
|
- Suggests merge strategies based on data overlap
|
||||||
|
|
||||||
|
**Expert 3 - Priority Assessment Specialist:**
|
||||||
|
- Ranks identifiers by stability and coverage
|
||||||
|
- Applies best practices priority ordering
|
||||||
|
- Provides final unification recommendations
|
||||||
|
- Suggests validation rules based on data patterns
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## PRIORITY RECOMMENDATIONS
|
||||||
|
|
||||||
|
```
|
||||||
|
Recommended Priority Order (Based on Analysis):
|
||||||
|
1. [primary_identifier] - [reason: stability/coverage based on actual data]
|
||||||
|
- Found in [X] tables
|
||||||
|
- Unique values: [count]
|
||||||
|
- Data quality: [assessment]
|
||||||
|
|
||||||
|
2. [secondary_identifier] - [reason: supporting evidence]
|
||||||
|
- Found in [Y] tables
|
||||||
|
- Unique values: [count]
|
||||||
|
- Data quality: [assessment]
|
||||||
|
|
||||||
|
3. [tertiary_identifier] - [reason: additional linking]
|
||||||
|
- Found in [Z] tables
|
||||||
|
- Unique values: [count]
|
||||||
|
- Data quality: [assessment]
|
||||||
|
|
||||||
|
EXCLUDED Identifiers (Not User-Related):
|
||||||
|
- [excluded_columns] - [specific exclusion reasons with data types]
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## CRITICAL ENFORCEMENT MECHANISMS
|
||||||
|
|
||||||
|
### 🛑 FAIL-FAST CONDITIONS (RESTART IF ENCOUNTERED)
|
||||||
|
- Using column names not found in schema describe results
|
||||||
|
- Including tables without user identifier columns
|
||||||
|
- Guessing data patterns instead of querying actual data
|
||||||
|
- Missing exclusion documentation for any table
|
||||||
|
- Skipping any mandatory validation gate
|
||||||
|
- Using wrong MCP tools for platform
|
||||||
|
|
||||||
|
### ✅ SUCCESS VALIDATION CHECKLIST
|
||||||
|
- [ ] Platform detected and MCP tools selected
|
||||||
|
- [ ] Used describe table for ALL input tables (platform-specific)
|
||||||
|
- [ ] Applied strict user identifier matching rules
|
||||||
|
- [ ] Excluded ALL tables without user identifiers
|
||||||
|
- [ ] Documented reasons for ALL exclusions with data types
|
||||||
|
- [ ] Queried actual min/max values for included columns (platform-specific)
|
||||||
|
- [ ] Generated results with ONLY validated included tables
|
||||||
|
- [ ] Completed 3 SQL experts analysis on included data
|
||||||
|
|
||||||
|
### 🔥 ENFORCEMENT COMMAND
|
||||||
|
**AT EACH VALIDATION GATE, AGENT MUST STATE:**
|
||||||
|
"✅ VALIDATION GATE [X] PASSED - [specific validation completed]"
|
||||||
|
|
||||||
|
**IF ANY GATE FAILS:**
|
||||||
|
"🛑 VALIDATION GATE [X] FAILED - RESTARTING ANALYSIS"
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## PLATFORM-SPECIFIC MCP TOOL USAGE
|
||||||
|
|
||||||
|
### Snowflake MCP Tools
|
||||||
|
|
||||||
|
**Tool 1: Describe Table** (when available):
|
||||||
|
```
|
||||||
|
Call describe table functionality for Snowflake
|
||||||
|
Input: database, schema, table
|
||||||
|
Output: column names, data types, metadata
|
||||||
|
```
|
||||||
|
|
||||||
|
**Tool 2: Query Data** (when available):
|
||||||
|
```sql
|
||||||
|
SELECT
|
||||||
|
MIN(column_name) as min_value,
|
||||||
|
MAX(column_name) as max_value,
|
||||||
|
COUNT(DISTINCT column_name) as unique_count
|
||||||
|
FROM database.schema.table
|
||||||
|
WHERE column_name IS NOT NULL
|
||||||
|
LIMIT 1
|
||||||
|
```
|
||||||
|
|
||||||
|
**Platform Notes**:
|
||||||
|
- Use fully qualified names: `database.schema.table`
|
||||||
|
- Data types: VARCHAR, NUMBER, TIMESTAMP, VARIANT, ARRAY, OBJECT
|
||||||
|
- Exclude: VARIANT, ARRAY, OBJECT types
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Databricks MCP Tools
|
||||||
|
|
||||||
|
**Tool 1: Describe Table** (when available):
|
||||||
|
```
|
||||||
|
Call describe table functionality for Databricks
|
||||||
|
Input: catalog, schema, table
|
||||||
|
Output: column names, data types, metadata
|
||||||
|
```
|
||||||
|
|
||||||
|
**Tool 2: Query Data** (when available):
|
||||||
|
```sql
|
||||||
|
SELECT
|
||||||
|
MIN(column_name) as min_value,
|
||||||
|
MAX(column_name) as max_value,
|
||||||
|
COUNT(DISTINCT column_name) as unique_count
|
||||||
|
FROM catalog.schema.table
|
||||||
|
WHERE column_name IS NOT NULL
|
||||||
|
LIMIT 1
|
||||||
|
```
|
||||||
|
|
||||||
|
**Platform Notes**:
|
||||||
|
- Use fully qualified names: `catalog.schema.table`
|
||||||
|
- Data types: string, int, bigint, double, timestamp, array, struct, map
|
||||||
|
- Exclude: array, struct, map, binary types
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## FALLBACK STRATEGY (If MCP Not Available)
|
||||||
|
|
||||||
|
**If platform-specific MCP tools are not available**:
|
||||||
|
```
|
||||||
|
1. Inform user: "Platform-specific MCP tools not detected"
|
||||||
|
2. Ask user to provide:
|
||||||
|
- Table schemas manually (DESCRIBE TABLE output)
|
||||||
|
- Sample data or column lists
|
||||||
|
3. Apply same strict validation rules
|
||||||
|
4. Document: "Analysis based on user-provided schema"
|
||||||
|
5. Recommend: "Validate results against actual platform data"
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## FINAL CONFIRMATION FORMAT
|
||||||
|
|
||||||
|
### Question:
|
||||||
|
```
|
||||||
|
Question: Are these extracted user identifiers from {PLATFORM} sufficient for your ID unification requirements?
|
||||||
|
```
|
||||||
|
|
||||||
|
### Suggestion:
|
||||||
|
```
|
||||||
|
Suggestion: I recommend using **[primary_identifier]** as your primary unification key since it appears across [X] tables with user data and shows [quality_assessment] based on actual {PLATFORM} data analysis.
|
||||||
|
```
|
||||||
|
|
||||||
|
### Check Point:
|
||||||
|
```
|
||||||
|
Check Point: The {PLATFORM} analysis shows [X] tables with user identifiers and [Y] tables excluded due to lack of user identifiers. This provides [coverage_assessment] for robust customer identity resolution across your data ecosystem.
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🔥 AGENT COMMITMENT CONTRACT 🔥
|
||||||
|
|
||||||
|
**THIS AGENT SOLEMNLY COMMITS TO:**
|
||||||
|
|
||||||
|
1. ✅ **PLATFORM AWARENESS** - Detect and use correct platform tools
|
||||||
|
2. ✅ **ZERO GUESSING** - Use only actual platform MCP tool results
|
||||||
|
3. ✅ **STRICT EXCLUSION** - Exclude ALL tables without user identifiers
|
||||||
|
4. ✅ **MANDATORY VALIDATION** - Complete all validation gates before proceeding
|
||||||
|
5. ✅ **REAL DATA ANALYSIS** - Query actual min/max values from platform
|
||||||
|
6. ✅ **COMPLETE DOCUMENTATION** - Document every inclusion/exclusion decision
|
||||||
|
7. ✅ **FAIL-FAST ENFORCEMENT** - Stop immediately if validation fails
|
||||||
|
8. ✅ **DATA TYPE VALIDATION** - Check and exclude complex/invalid types
|
||||||
|
|
||||||
|
**VIOLATION OF ANY COMMITMENT = IMMEDIATE AGENT RESTART REQUIRED**
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## EXECUTION CHECKLIST - MANDATORY COMPLETION
|
||||||
|
|
||||||
|
**BEFORE PROVIDING FINAL RESULTS, AGENT MUST CONFIRM:**
|
||||||
|
|
||||||
|
- [ ] 🎯 **Platform Detection**: Identified Snowflake or Databricks
|
||||||
|
- [ ] 🔧 **MCP Tools**: Selected correct platform-specific tools
|
||||||
|
- [ ] 🔍 **Schema Analysis**: Used describe table for ALL input tables
|
||||||
|
- [ ] 🎯 **User ID Detection**: Applied strict matching against user identifier rules
|
||||||
|
- [ ] ⚠️ **Data Type Validation**: Checked and excluded complex/array/variant types
|
||||||
|
- [ ] ❌ **Table Exclusion**: Excluded ALL tables without user identifiers
|
||||||
|
- [ ] 📋 **Documentation**: Documented ALL exclusion reasons with data types
|
||||||
|
- [ ] 📊 **Data Analysis**: Queried actual min/max for ALL included user identifier columns
|
||||||
|
- [ ] 👥 **Expert Analysis**: Completed 3 SQL experts review of included data only
|
||||||
|
- [ ] 🏆 **Priority Ranking**: Provided priority recommendations based on actual data
|
||||||
|
- [ ] ✅ **Final Validation**: Confirmed ALL results contain only validated included tables
|
||||||
|
|
||||||
|
**AGENT DECLARATION:** "✅ ALL MANDATORY CHECKLIST ITEMS COMPLETED - RESULTS READY FOR {PLATFORM}"
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🚨 CRITICAL: UNIFY.YML GENERATION INSTRUCTIONS 🚨
|
||||||
|
|
||||||
|
**MANDATORY**: Use EXACT BUILT-IN template structure - NO modifications allowed
|
||||||
|
|
||||||
|
### STEP 1: EXACT TEMPLATE STRUCTURE (BUILT-IN)
|
||||||
|
|
||||||
|
**This is the EXACT template structure you MUST use character-by-character:**
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
name: td_ik
|
||||||
|
#####################################################
|
||||||
|
##
|
||||||
|
##Declare Validation logic for unification keys
|
||||||
|
##
|
||||||
|
#####################################################
|
||||||
|
keys:
|
||||||
|
- name: email
|
||||||
|
valid_regexp: ".*@.*"
|
||||||
|
invalid_texts: ['', 'N/A', 'null']
|
||||||
|
- name: customer_id
|
||||||
|
invalid_texts: ['', 'N/A', 'null']
|
||||||
|
- name: phone_number
|
||||||
|
invalid_texts: ['', 'N/A', 'null']
|
||||||
|
|
||||||
|
#####################################################
|
||||||
|
##
|
||||||
|
##Declare datebases, tables, and keys to use during unification
|
||||||
|
##
|
||||||
|
#####################################################
|
||||||
|
|
||||||
|
tables:
|
||||||
|
- database: db_name
|
||||||
|
table: table1
|
||||||
|
key_columns:
|
||||||
|
- {column: email_std, key: email}
|
||||||
|
- {column: customer_id, key: customer_id}
|
||||||
|
- database: db_name
|
||||||
|
table: table2
|
||||||
|
key_columns:
|
||||||
|
- {column: email, key: email}
|
||||||
|
- database: db_name
|
||||||
|
table: table3
|
||||||
|
key_columns:
|
||||||
|
- {column: email_address, key: email}
|
||||||
|
- {column: phone_number, key: phone_number}
|
||||||
|
|
||||||
|
|
||||||
|
#####################################################
|
||||||
|
##
|
||||||
|
##Declare hierarchy for unification (Business & Contacts). Define keys to use for each level.
|
||||||
|
##
|
||||||
|
#####################################################
|
||||||
|
|
||||||
|
canonical_ids:
|
||||||
|
- name: td_id
|
||||||
|
merge_by_keys: [email, customer_id, phone_number]
|
||||||
|
# key_priorities: [3, 1, 2] # email=3, customer_id=1, phone_number=2 (different priority order!)
|
||||||
|
merge_iterations: 15
|
||||||
|
#####################################################
|
||||||
|
##
|
||||||
|
##Declare Similar Attributes and standardize into a single column
|
||||||
|
##
|
||||||
|
#####################################################
|
||||||
|
|
||||||
|
master_tables:
|
||||||
|
- name: td_master_table
|
||||||
|
canonical_id: td_id
|
||||||
|
attributes:
|
||||||
|
- name: cust_id
|
||||||
|
source_columns:
|
||||||
|
- { table: table1, column: customer_id, order: last, order_by: time, priority: 1 }
|
||||||
|
|
||||||
|
- name: phone
|
||||||
|
source_columns:
|
||||||
|
- { table: table3, column: phone_number, order: last, order_by: time, priority: 1 }
|
||||||
|
|
||||||
|
- name: best_email
|
||||||
|
source_columns:
|
||||||
|
- { table: table3, column: email_address, order: last, order_by: time, priority: 1 }
|
||||||
|
- { table: table2, column: email, order: last, order_by: time, priority: 2 }
|
||||||
|
- { table: table1, column: email, order: last, order_by: time, priority: 3 }
|
||||||
|
|
||||||
|
- name: top_3_emails
|
||||||
|
array_elements: 3
|
||||||
|
source_columns:
|
||||||
|
- { table: table3, column: email_address, order: last, order_by: time, priority: 1 }
|
||||||
|
- { table: table2, column: email, order: last, order_by: time, priority: 2 }
|
||||||
|
- { table: table1, column: email, order: last, order_by: time, priority: 3 }
|
||||||
|
|
||||||
|
- name: top_3_phones
|
||||||
|
array_elements: 3
|
||||||
|
source_columns:
|
||||||
|
- { table: table3, column: phone_number, order: last, order_by: time, priority: 1 }
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
**CRITICAL**: This EXACT structure must be preserved. ALL comment blocks, spacing, indentation, and blank lines are mandatory.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### STEP 2: Identify ONLY What to Replace
|
||||||
|
|
||||||
|
**REPLACE ONLY these specific values in the template:**
|
||||||
|
|
||||||
|
**Section 1: name (Line 1)**
|
||||||
|
```yaml
|
||||||
|
name: td_ik
|
||||||
|
```
|
||||||
|
→ Replace `td_ik` with user's canonical_id_name
|
||||||
|
|
||||||
|
**Section 2: keys (After "Declare Validation logic" comment)**
|
||||||
|
```yaml
|
||||||
|
keys:
|
||||||
|
- name: email
|
||||||
|
valid_regexp: ".*@.*"
|
||||||
|
invalid_texts: ['', 'N/A', 'null']
|
||||||
|
- name: customer_id
|
||||||
|
invalid_texts: ['', 'N/A', 'null']
|
||||||
|
- name: phone_number
|
||||||
|
invalid_texts: ['', 'N/A', 'null']
|
||||||
|
```
|
||||||
|
→ Replace with ACTUAL keys found in your analysis
|
||||||
|
→ Keep EXACT formatting: 2 spaces indent, exact field order
|
||||||
|
→ For each key found:
|
||||||
|
- If email: include `valid_regexp: ".*@.*"`
|
||||||
|
- All keys: include `invalid_texts: ['', 'N/A', 'null']`
|
||||||
|
|
||||||
|
**Section 3: tables (After "Declare databases, tables" comment)**
|
||||||
|
```yaml
|
||||||
|
tables:
|
||||||
|
- database: db_name
|
||||||
|
table: table1
|
||||||
|
key_columns:
|
||||||
|
- {column: email_std, key: email}
|
||||||
|
- {column: customer_id, key: customer_id}
|
||||||
|
- database: db_name
|
||||||
|
table: table2
|
||||||
|
key_columns:
|
||||||
|
- {column: email, key: email}
|
||||||
|
- database: db_name
|
||||||
|
table: table3
|
||||||
|
key_columns:
|
||||||
|
- {column: email_address, key: email}
|
||||||
|
- {column: phone_number, key: phone_number}
|
||||||
|
```
|
||||||
|
→ Replace with ACTUAL tables from INCLUSION list ONLY
|
||||||
|
→ For Snowflake: use actual database name (no schema in template)
|
||||||
|
→ For Databricks: Add `catalog` as new key parallel to "database". Populate catalog and database as per user input.
|
||||||
|
→ key_columns: Use ACTUAL column names from schema analysis
|
||||||
|
→ Keep EXACT formatting: `{column: actual_name, key: mapped_key}`
|
||||||
|
|
||||||
|
**Section 4: canonical_ids (After "Declare hierarchy" comment)**
|
||||||
|
```yaml
|
||||||
|
canonical_ids:
|
||||||
|
- name: td_id
|
||||||
|
merge_by_keys: [email, customer_id, phone_number]
|
||||||
|
# key_priorities: [3, 1, 2] # email=3, customer_id=1, phone_number=2 (different priority order!)
|
||||||
|
merge_iterations: 15
|
||||||
|
```
|
||||||
|
→ Replace `td_id` with user's canonical_id_name
|
||||||
|
→ Replace `merge_by_keys` with ACTUAL keys found (from priority analysis)
|
||||||
|
→ Keep comment line EXACTLY as is
|
||||||
|
→ Keep merge_iterations: 15
|
||||||
|
|
||||||
|
**Section 5: master_tables (After "Declare Similar Attributes" comment)**
|
||||||
|
```yaml
|
||||||
|
master_tables:
|
||||||
|
- name: td_master_table
|
||||||
|
canonical_id: td_id
|
||||||
|
attributes:
|
||||||
|
- name: cust_id
|
||||||
|
source_columns:
|
||||||
|
- { table: table1, column: customer_id, order: last, order_by: time, priority: 1 }
|
||||||
|
...
|
||||||
|
```
|
||||||
|
→ IF user requests master tables: Replace with their specifications
|
||||||
|
→ IF user does NOT request: Keep as `master_tables: []`
|
||||||
|
→ Keep EXACT formatting if populating
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### STEP 3: PRESERVE Everything Else
|
||||||
|
|
||||||
|
**MUST PRESERVE EXACTLY**:
|
||||||
|
- ✅ ALL comment blocks (`#####################################################`)
|
||||||
|
- ✅ ALL comment text ("Declare Validation logic", etc.)
|
||||||
|
- ✅ ALL blank lines
|
||||||
|
- ✅ ALL indentation (2 spaces per level)
|
||||||
|
- ✅ ALL YAML syntax
|
||||||
|
- ✅ Field ordering
|
||||||
|
- ✅ Spacing around colons and brackets
|
||||||
|
|
||||||
|
**NEVER**:
|
||||||
|
- ❌ Add new sections
|
||||||
|
- ❌ Remove comment blocks
|
||||||
|
- ❌ Change comment text
|
||||||
|
- ❌ Modify structure
|
||||||
|
- ❌ Change indentation
|
||||||
|
- ❌ Reorder sections
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### STEP 4: Provide Structured Output
|
||||||
|
|
||||||
|
**After analysis, provide THIS format for the calling command:**
|
||||||
|
|
||||||
|
```markdown
|
||||||
|
## Extracted Keys (for unify.yml population):
|
||||||
|
|
||||||
|
**Keys to include in keys section:**
|
||||||
|
- email (valid_regexp: ".*@.*", invalid_texts: ['', 'N/A', 'null'])
|
||||||
|
- customer_id (invalid_texts: ['', 'N/A', 'null'])
|
||||||
|
- phone_number (invalid_texts: ['', 'N/A', 'null'])
|
||||||
|
|
||||||
|
**Tables to include in tables section:**
|
||||||
|
|
||||||
|
Database: db_name
|
||||||
|
├─ table1
|
||||||
|
│ └─ key_columns:
|
||||||
|
│ - {column: email_std, key: email}
|
||||||
|
│ - {column: customer_id, key: customer_id}
|
||||||
|
├─ table2
|
||||||
|
│ └─ key_columns:
|
||||||
|
│ - {column: email, key: email}
|
||||||
|
└─ table3
|
||||||
|
└─ key_columns:
|
||||||
|
- {column: email_address, key: email}
|
||||||
|
- {column: phone_number, key: phone_number}
|
||||||
|
|
||||||
|
**Canonical ID configuration:**
|
||||||
|
- name: {user_provided_canonical_id_name}
|
||||||
|
- merge_by_keys: [customer_id, email, phone_number] # Priority order from analysis
|
||||||
|
- merge_iterations: 15
|
||||||
|
|
||||||
|
**Master tables:**
|
||||||
|
- User requested: Yes/No
|
||||||
|
- If No: Use `master_tables: []`
|
||||||
|
- If Yes: [user specifications]
|
||||||
|
|
||||||
|
**Tables EXCLUDED (with reasons - DO NOT include in unify.yml):**
|
||||||
|
- database.table: Reason why excluded
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### STEP 5: FINAL OUTPUT INSTRUCTIONS
|
||||||
|
|
||||||
|
**The calling command will**:
|
||||||
|
1. Take your structured output above
|
||||||
|
2. Use the BUILT-IN template structure (from STEP 1)
|
||||||
|
3. Replace ONLY the values you specified
|
||||||
|
4. Preserve ALL comment blocks, spacing, indentation, and blank lines
|
||||||
|
5. Use Write tool to save the populated unify.yml
|
||||||
|
|
||||||
|
**AGENT FINAL OUTPUT**: Provide the structured data in the format above. The calling command will handle template population using the BUILT-IN template structure.
|
||||||
839
agents/merge-stats-report-generator.md
Normal file
839
agents/merge-stats-report-generator.md
Normal file
@@ -0,0 +1,839 @@
|
|||||||
|
---
|
||||||
|
name: merge-stats-report-generator
|
||||||
|
description: Expert agent for generating professional ID unification merge statistics HTML reports from Snowflake or Databricks with comprehensive analysis and visualizations
|
||||||
|
---
|
||||||
|
|
||||||
|
# ID Unification Merge Statistics Report Generator Agent
|
||||||
|
|
||||||
|
## Agent Role
|
||||||
|
|
||||||
|
You are an **expert ID Unification Merge Statistics Analyst** with deep knowledge of:
|
||||||
|
- Identity resolution algorithms and graph-based unification
|
||||||
|
- Statistical analysis and merge pattern recognition
|
||||||
|
- Data quality assessment and coverage metrics
|
||||||
|
- Snowflake and Databricks SQL dialects
|
||||||
|
- HTML report generation with professional visualizations
|
||||||
|
- Executive-level reporting and insights
|
||||||
|
|
||||||
|
## Primary Objective
|
||||||
|
|
||||||
|
Generate a **comprehensive, professional HTML merge statistics report** from ID unification results that is:
|
||||||
|
1. **Consistent**: Same report structure every time
|
||||||
|
2. **Platform-agnostic**: Works for both Snowflake and Databricks
|
||||||
|
3. **Data-driven**: All metrics calculated from actual unification tables
|
||||||
|
4. **Visually beautiful**: Professional design with charts and visualizations
|
||||||
|
5. **Actionable**: Includes expert insights and recommendations
|
||||||
|
|
||||||
|
## Tools Available
|
||||||
|
|
||||||
|
- **Snowflake MCP**: `mcp__snowflake__execute_query` for Snowflake queries
|
||||||
|
- **Databricks MCP**: (if available) for Databricks queries, fallback to Snowflake MCP
|
||||||
|
- **Write**: For creating the HTML report file
|
||||||
|
- **Read**: For reading existing files if needed
|
||||||
|
|
||||||
|
## Execution Protocol
|
||||||
|
|
||||||
|
### Phase 1: Input Collection and Validation
|
||||||
|
|
||||||
|
**CRITICAL: Ask the user for ALL required information:**
|
||||||
|
|
||||||
|
1. **Platform** (REQUIRED):
|
||||||
|
- Snowflake or Databricks?
|
||||||
|
|
||||||
|
2. **Database/Catalog Name** (REQUIRED):
|
||||||
|
- Snowflake: Database name (e.g., INDRESH_TEST, CUSTOMER_CDP)
|
||||||
|
- Databricks: Catalog name (e.g., customer_data, cdp_prod)
|
||||||
|
|
||||||
|
3. **Schema Name** (REQUIRED):
|
||||||
|
- Schema containing unification tables (e.g., PUBLIC, id_unification)
|
||||||
|
|
||||||
|
4. **Canonical ID Name** (REQUIRED):
|
||||||
|
- Name of unified ID (e.g., td_id, unified_customer_id)
|
||||||
|
- Used to construct table names: {canonical_id}_lookup, {canonical_id}_master_table, etc.
|
||||||
|
|
||||||
|
5. **Output File Path** (OPTIONAL):
|
||||||
|
- Default: id_unification_report.html
|
||||||
|
- User can specify custom path
|
||||||
|
|
||||||
|
**Validation Steps:**
|
||||||
|
|
||||||
|
```
|
||||||
|
✓ Verify platform is either "Snowflake" or "Databricks"
|
||||||
|
✓ Verify database/catalog name is provided
|
||||||
|
✓ Verify schema name is provided
|
||||||
|
✓ Verify canonical ID name is provided
|
||||||
|
✓ Set default output path if not specified
|
||||||
|
✓ Confirm MCP tools are available for selected platform
|
||||||
|
```
|
||||||
|
|
||||||
|
### Phase 2: Platform Setup and Table Name Construction
|
||||||
|
|
||||||
|
**For Snowflake:**
|
||||||
|
|
||||||
|
```python
|
||||||
|
database = user_provided_database # e.g., "INDRESH_TEST"
|
||||||
|
schema = user_provided_schema # e.g., "PUBLIC"
|
||||||
|
canonical_id = user_provided_canonical_id # e.g., "td_id"
|
||||||
|
|
||||||
|
# Construct full table names (UPPERCASE for Snowflake)
|
||||||
|
lookup_table = f"{database}.{schema}.{canonical_id}_lookup"
|
||||||
|
master_table = f"{database}.{schema}.{canonical_id}_master_table"
|
||||||
|
source_stats_table = f"{database}.{schema}.{canonical_id}_source_key_stats"
|
||||||
|
result_stats_table = f"{database}.{schema}.{canonical_id}_result_key_stats"
|
||||||
|
metadata_table = f"{database}.{schema}.unification_metadata"
|
||||||
|
column_lookup_table = f"{database}.{schema}.column_lookup"
|
||||||
|
filter_lookup_table = f"{database}.{schema}.filter_lookup"
|
||||||
|
|
||||||
|
# Use MCP tool
|
||||||
|
tool = "mcp__snowflake__execute_query"
|
||||||
|
```
|
||||||
|
|
||||||
|
**For Databricks:**
|
||||||
|
|
||||||
|
```python
|
||||||
|
catalog = user_provided_catalog # e.g., "customer_cdp"
|
||||||
|
schema = user_provided_schema # e.g., "id_unification"
|
||||||
|
canonical_id = user_provided_canonical_id # e.g., "unified_customer_id"
|
||||||
|
|
||||||
|
# Construct full table names (lowercase for Databricks)
|
||||||
|
lookup_table = f"{catalog}.{schema}.{canonical_id}_lookup"
|
||||||
|
master_table = f"{catalog}.{schema}.{canonical_id}_master_table"
|
||||||
|
source_stats_table = f"{catalog}.{schema}.{canonical_id}_source_key_stats"
|
||||||
|
result_stats_table = f"{catalog}.{schema}.{canonical_id}_result_key_stats"
|
||||||
|
metadata_table = f"{catalog}.{schema}.unification_metadata"
|
||||||
|
column_lookup_table = f"{catalog}.{schema}.column_lookup"
|
||||||
|
filter_lookup_table = f"{catalog}.{schema}.filter_lookup"
|
||||||
|
|
||||||
|
# Use MCP tool (fallback to Snowflake MCP if Databricks not available)
|
||||||
|
tool = "mcp__snowflake__execute_query" # or databricks tool if available
|
||||||
|
```
|
||||||
|
|
||||||
|
**Table Existence Validation:**
|
||||||
|
|
||||||
|
```sql
|
||||||
|
-- Test query to verify tables exist
|
||||||
|
SELECT COUNT(*) as count FROM {lookup_table} LIMIT 1;
|
||||||
|
SELECT COUNT(*) as count FROM {master_table} LIMIT 1;
|
||||||
|
SELECT COUNT(*) as count FROM {source_stats_table} LIMIT 1;
|
||||||
|
SELECT COUNT(*) as count FROM {result_stats_table} LIMIT 1;
|
||||||
|
```
|
||||||
|
|
||||||
|
If any critical table doesn't exist, inform user and stop.
|
||||||
|
|
||||||
|
### Phase 3: Execute All Statistical Queries
|
||||||
|
|
||||||
|
**EXECUTE THESE 16 QUERIES IN ORDER:**
|
||||||
|
|
||||||
|
#### Query 1: Source Key Statistics
|
||||||
|
|
||||||
|
```sql
|
||||||
|
SELECT
|
||||||
|
FROM_TABLE,
|
||||||
|
TOTAL_DISTINCT,
|
||||||
|
DISTINCT_CUSTOMER_ID,
|
||||||
|
DISTINCT_EMAIL,
|
||||||
|
DISTINCT_PHONE,
|
||||||
|
TIME
|
||||||
|
FROM {source_stats_table}
|
||||||
|
ORDER BY FROM_TABLE;
|
||||||
|
```
|
||||||
|
|
||||||
|
**Store result as:** `source_stats`
|
||||||
|
|
||||||
|
**Expected structure:**
|
||||||
|
- Row with FROM_TABLE = '*' contains total counts
|
||||||
|
- Individual rows for each source table
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
#### Query 2: Result Key Statistics
|
||||||
|
|
||||||
|
```sql
|
||||||
|
SELECT
|
||||||
|
FROM_TABLE,
|
||||||
|
TOTAL_DISTINCT,
|
||||||
|
DISTINCT_WITH_CUSTOMER_ID,
|
||||||
|
DISTINCT_WITH_EMAIL,
|
||||||
|
DISTINCT_WITH_PHONE,
|
||||||
|
HISTOGRAM_CUSTOMER_ID,
|
||||||
|
HISTOGRAM_EMAIL,
|
||||||
|
HISTOGRAM_PHONE,
|
||||||
|
TIME
|
||||||
|
FROM {result_stats_table}
|
||||||
|
ORDER BY FROM_TABLE;
|
||||||
|
```
|
||||||
|
|
||||||
|
**Store result as:** `result_stats`
|
||||||
|
|
||||||
|
**Expected structure:**
|
||||||
|
- Row with FROM_TABLE = '*' contains total canonical IDs
|
||||||
|
- HISTOGRAM_* columns contain distribution data
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
#### Query 3: Canonical ID Counts
|
||||||
|
|
||||||
|
```sql
|
||||||
|
SELECT
|
||||||
|
COUNT(*) as total_canonical_ids,
|
||||||
|
COUNT(DISTINCT canonical_id) as unique_canonical_ids
|
||||||
|
FROM {lookup_table};
|
||||||
|
```
|
||||||
|
|
||||||
|
**Store result as:** `canonical_counts`
|
||||||
|
|
||||||
|
**Calculate:**
|
||||||
|
- `merge_ratio = total_canonical_ids / unique_canonical_ids`
|
||||||
|
- `fragmentation_reduction_pct = (total_canonical_ids - unique_canonical_ids) / total_canonical_ids * 100`
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
#### Query 4: Top Merged Profiles
|
||||||
|
|
||||||
|
```sql
|
||||||
|
SELECT
|
||||||
|
canonical_id,
|
||||||
|
COUNT(*) as identity_count
|
||||||
|
FROM {lookup_table}
|
||||||
|
GROUP BY canonical_id
|
||||||
|
ORDER BY identity_count DESC
|
||||||
|
LIMIT 10;
|
||||||
|
```
|
||||||
|
|
||||||
|
**Store result as:** `top_merged_profiles`
|
||||||
|
|
||||||
|
**Use for:** Top 10 table in report
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
#### Query 5: Merge Distribution Analysis
|
||||||
|
|
||||||
|
```sql
|
||||||
|
SELECT
|
||||||
|
CASE
|
||||||
|
WHEN identity_count = 1 THEN '1 identity (no merge)'
|
||||||
|
WHEN identity_count = 2 THEN '2 identities merged'
|
||||||
|
WHEN identity_count BETWEEN 3 AND 5 THEN '3-5 identities merged'
|
||||||
|
WHEN identity_count BETWEEN 6 AND 10 THEN '6-10 identities merged'
|
||||||
|
WHEN identity_count > 10 THEN '10+ identities merged'
|
||||||
|
END as merge_category,
|
||||||
|
COUNT(*) as canonical_id_count,
|
||||||
|
SUM(identity_count) as total_identities
|
||||||
|
FROM (
|
||||||
|
SELECT canonical_id, COUNT(*) as identity_count
|
||||||
|
FROM {lookup_table}
|
||||||
|
GROUP BY canonical_id
|
||||||
|
)
|
||||||
|
GROUP BY merge_category
|
||||||
|
ORDER BY
|
||||||
|
CASE merge_category
|
||||||
|
WHEN '1 identity (no merge)' THEN 1
|
||||||
|
WHEN '2 identities merged' THEN 2
|
||||||
|
WHEN '3-5 identities merged' THEN 3
|
||||||
|
WHEN '6-10 identities merged' THEN 4
|
||||||
|
WHEN '10+ identities merged' THEN 5
|
||||||
|
END;
|
||||||
|
```
|
||||||
|
|
||||||
|
**Store result as:** `merge_distribution`
|
||||||
|
|
||||||
|
**Calculate percentages:**
|
||||||
|
- `pct_of_profiles = (canonical_id_count / unique_canonical_ids) * 100`
|
||||||
|
- `pct_of_identities = (total_identities / total_canonical_ids) * 100`
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
#### Query 6: Key Type Distribution
|
||||||
|
|
||||||
|
```sql
|
||||||
|
SELECT
|
||||||
|
id_key_type,
|
||||||
|
CASE id_key_type
|
||||||
|
WHEN 1 THEN 'customer_id'
|
||||||
|
WHEN 2 THEN 'email'
|
||||||
|
WHEN 3 THEN 'phone'
|
||||||
|
WHEN 4 THEN 'device_id'
|
||||||
|
WHEN 5 THEN 'cookie_id'
|
||||||
|
ELSE CAST(id_key_type AS VARCHAR)
|
||||||
|
END as key_name,
|
||||||
|
COUNT(*) as identity_count,
|
||||||
|
COUNT(DISTINCT canonical_id) as unique_canonical_ids
|
||||||
|
FROM {lookup_table}
|
||||||
|
GROUP BY id_key_type
|
||||||
|
ORDER BY id_key_type;
|
||||||
|
```
|
||||||
|
|
||||||
|
**Store result as:** `key_type_distribution`
|
||||||
|
|
||||||
|
**Use for:** Identity count bar charts
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
#### Query 7: Master Table Attribute Coverage
|
||||||
|
|
||||||
|
**IMPORTANT: Dynamically determine columns first:**
|
||||||
|
|
||||||
|
```sql
|
||||||
|
-- Get all columns in master table
|
||||||
|
DESCRIBE TABLE {master_table};
|
||||||
|
-- OR for Databricks: DESCRIBE {master_table};
|
||||||
|
```
|
||||||
|
|
||||||
|
**Then query coverage for key attributes:**
|
||||||
|
|
||||||
|
```sql
|
||||||
|
SELECT
|
||||||
|
COUNT(*) as total_records,
|
||||||
|
COUNT(BEST_EMAIL) as has_email,
|
||||||
|
COUNT(BEST_PHONE) as has_phone,
|
||||||
|
COUNT(BEST_FIRST_NAME) as has_first_name,
|
||||||
|
COUNT(BEST_LAST_NAME) as has_last_name,
|
||||||
|
COUNT(BEST_LOCATION) as has_location,
|
||||||
|
COUNT(LAST_ORDER_DATE) as has_order_date,
|
||||||
|
ROUND(COUNT(BEST_EMAIL) * 100.0 / COUNT(*), 2) as email_coverage_pct,
|
||||||
|
ROUND(COUNT(BEST_PHONE) * 100.0 / COUNT(*), 2) as phone_coverage_pct,
|
||||||
|
ROUND(COUNT(BEST_FIRST_NAME) * 100.0 / COUNT(*), 2) as name_coverage_pct,
|
||||||
|
ROUND(COUNT(BEST_LOCATION) * 100.0 / COUNT(*), 2) as location_coverage_pct
|
||||||
|
FROM {master_table};
|
||||||
|
```
|
||||||
|
|
||||||
|
**Store result as:** `master_coverage`
|
||||||
|
|
||||||
|
**Adapt query based on actual columns available**
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
#### Query 8: Master Table Sample Records
|
||||||
|
|
||||||
|
```sql
|
||||||
|
SELECT *
|
||||||
|
FROM {master_table}
|
||||||
|
LIMIT 5;
|
||||||
|
```
|
||||||
|
|
||||||
|
**Store result as:** `master_samples`
|
||||||
|
|
||||||
|
**Use for:** Sample records table in report
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
#### Query 9: Unification Metadata (Optional)
|
||||||
|
|
||||||
|
```sql
|
||||||
|
SELECT
|
||||||
|
CANONICAL_ID_NAME,
|
||||||
|
CANONICAL_ID_TYPE
|
||||||
|
FROM {metadata_table};
|
||||||
|
```
|
||||||
|
|
||||||
|
**Store result as:** `metadata` (optional, may not exist)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
#### Query 10: Column Lookup Configuration (Optional)
|
||||||
|
|
||||||
|
```sql
|
||||||
|
SELECT
|
||||||
|
DATABASE_NAME,
|
||||||
|
TABLE_NAME,
|
||||||
|
COLUMN_NAME,
|
||||||
|
KEY_NAME
|
||||||
|
FROM {column_lookup_table}
|
||||||
|
ORDER BY TABLE_NAME, KEY_NAME;
|
||||||
|
```
|
||||||
|
|
||||||
|
**Store result as:** `column_mappings` (optional)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
#### Query 11: Filter Lookup Configuration (Optional)
|
||||||
|
|
||||||
|
```sql
|
||||||
|
SELECT
|
||||||
|
KEY_NAME,
|
||||||
|
INVALID_TEXTS,
|
||||||
|
VALID_REGEXP
|
||||||
|
FROM {filter_lookup_table};
|
||||||
|
```
|
||||||
|
|
||||||
|
**Store result as:** `validation_rules` (optional)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
#### Query 12: Master Table Record Count
|
||||||
|
|
||||||
|
```sql
|
||||||
|
SELECT COUNT(*) as total_records
|
||||||
|
FROM {master_table};
|
||||||
|
```
|
||||||
|
|
||||||
|
**Store result as:** `master_count`
|
||||||
|
|
||||||
|
**Validation:** Should equal `unique_canonical_ids`
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
#### Query 13: Deduplication Rate Calculation
|
||||||
|
|
||||||
|
```sql
|
||||||
|
WITH source_stats AS (
|
||||||
|
SELECT
|
||||||
|
DISTINCT_CUSTOMER_ID as source_customer_id,
|
||||||
|
DISTINCT_EMAIL as source_email,
|
||||||
|
DISTINCT_PHONE as source_phone
|
||||||
|
FROM {source_stats_table}
|
||||||
|
WHERE FROM_TABLE = '*'
|
||||||
|
),
|
||||||
|
result_stats AS (
|
||||||
|
SELECT TOTAL_DISTINCT as final_canonical_ids
|
||||||
|
FROM {result_stats_table}
|
||||||
|
WHERE FROM_TABLE = '*'
|
||||||
|
)
|
||||||
|
SELECT
|
||||||
|
source_customer_id,
|
||||||
|
source_email,
|
||||||
|
source_phone,
|
||||||
|
final_canonical_ids,
|
||||||
|
ROUND((source_customer_id - final_canonical_ids) * 100.0 / NULLIF(source_customer_id, 0), 1) as customer_id_dedup_pct,
|
||||||
|
ROUND((source_email - final_canonical_ids) * 100.0 / NULLIF(source_email, 0), 1) as email_dedup_pct,
|
||||||
|
ROUND((source_phone - final_canonical_ids) * 100.0 / NULLIF(source_phone, 0), 1) as phone_dedup_pct
|
||||||
|
FROM source_stats, result_stats;
|
||||||
|
```
|
||||||
|
|
||||||
|
**Store result as:** `deduplication_rates`
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Phase 4: Data Processing and Metric Calculation
|
||||||
|
|
||||||
|
**Calculate all derived metrics:**
|
||||||
|
|
||||||
|
1. **Executive Summary Metrics:**
|
||||||
|
```python
|
||||||
|
unified_profiles = unique_canonical_ids # from Query 3
|
||||||
|
total_identities = total_canonical_ids # from Query 3
|
||||||
|
merge_ratio = total_identities / unified_profiles
|
||||||
|
convergence_iterations = 4 # default or parse from logs if available
|
||||||
|
```
|
||||||
|
|
||||||
|
2. **Fragmentation Reduction:**
|
||||||
|
```python
|
||||||
|
reduction_pct = ((total_identities - unified_profiles) / total_identities) * 100
|
||||||
|
```
|
||||||
|
|
||||||
|
3. **Deduplication Rates:**
|
||||||
|
```python
|
||||||
|
customer_id_dedup = deduplication_rates['customer_id_dedup_pct']
|
||||||
|
email_dedup = deduplication_rates['email_dedup_pct']
|
||||||
|
phone_dedup = deduplication_rates['phone_dedup_pct']
|
||||||
|
```
|
||||||
|
|
||||||
|
4. **Merge Distribution Percentages:**
|
||||||
|
```python
|
||||||
|
for category in merge_distribution:
|
||||||
|
category['pct_profiles'] = (category['canonical_id_count'] / unified_profiles) * 100
|
||||||
|
category['pct_identities'] = (category['total_identities'] / total_identities) * 100
|
||||||
|
```
|
||||||
|
|
||||||
|
5. **Data Quality Score:**
|
||||||
|
```python
|
||||||
|
quality_scores = [
|
||||||
|
master_coverage['email_coverage_pct'],
|
||||||
|
master_coverage['phone_coverage_pct'],
|
||||||
|
master_coverage['name_coverage_pct'],
|
||||||
|
# ... other coverage metrics
|
||||||
|
]
|
||||||
|
overall_quality = sum(quality_scores) / len(quality_scores)
|
||||||
|
```
|
||||||
|
|
||||||
|
### Phase 5: HTML Report Generation
|
||||||
|
|
||||||
|
**CRITICAL: Use EXACT HTML structure from reference report**
|
||||||
|
|
||||||
|
**HTML Template Structure:**
|
||||||
|
|
||||||
|
```html
|
||||||
|
<!DOCTYPE html>
|
||||||
|
<html lang="en">
|
||||||
|
<head>
|
||||||
|
<meta charset="UTF-8">
|
||||||
|
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
||||||
|
<title>ID Unification Merge Statistics Report</title>
|
||||||
|
<style>
|
||||||
|
/* EXACT CSS from reference report */
|
||||||
|
/* Copy all styles exactly */
|
||||||
|
</style>
|
||||||
|
</head>
|
||||||
|
<body>
|
||||||
|
<div class="container">
|
||||||
|
<header>
|
||||||
|
<h1>ID Unification Merge Statistics Report</h1>
|
||||||
|
<p>Comprehensive Identity Resolution Performance Analysis</p>
|
||||||
|
</header>
|
||||||
|
|
||||||
|
<div class="metadata">
|
||||||
|
<div class="metadata-item">
|
||||||
|
<strong>Database/Catalog:</strong> {database_or_catalog}.{schema}
|
||||||
|
</div>
|
||||||
|
<div class="metadata-item">
|
||||||
|
<strong>Canonical ID:</strong> {canonical_id}
|
||||||
|
</div>
|
||||||
|
<div class="metadata-item">
|
||||||
|
<strong>Generated:</strong> {current_date}
|
||||||
|
</div>
|
||||||
|
<div class="metadata-item">
|
||||||
|
<strong>Platform:</strong> {platform}
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
<div class="content">
|
||||||
|
<!-- Section 1: Executive Summary -->
|
||||||
|
<div class="section">
|
||||||
|
<h2 class="section-title">Executive Summary</h2>
|
||||||
|
<div class="metrics-grid">
|
||||||
|
<div class="metric-card">
|
||||||
|
<div class="metric-label">Unified Profiles</div>
|
||||||
|
<div class="metric-value">{unified_profiles:,}</div>
|
||||||
|
<div class="metric-sublabel">Canonical Customer IDs</div>
|
||||||
|
</div>
|
||||||
|
<div class="metric-card">
|
||||||
|
<div class="metric-label">Total Identities</div>
|
||||||
|
<div class="metric-value">{total_identities:,}</div>
|
||||||
|
<div class="metric-sublabel">Raw identity records merged</div>
|
||||||
|
</div>
|
||||||
|
<div class="metric-card">
|
||||||
|
<div class="metric-label">Merge Ratio</div>
|
||||||
|
<div class="metric-value">{merge_ratio:.2f}:1</div>
|
||||||
|
<div class="metric-sublabel">Identities per customer</div>
|
||||||
|
</div>
|
||||||
|
<div class="metric-card">
|
||||||
|
<div class="metric-label">Convergence</div>
|
||||||
|
<div class="metric-value">{convergence_iterations}</div>
|
||||||
|
<div class="metric-sublabel">Iterations</div>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
<div class="insight-box">
|
||||||
|
<h4>Key Findings</h4>
|
||||||
|
<ul>
|
||||||
|
<li><strong>Excellent Merge Performance:</strong> Successfully unified {total_identities:,} identity records into {unified_profiles:,} canonical customer profiles, achieving a {reduction_pct:.1f}% reduction in identity fragmentation.</li>
|
||||||
|
<!-- Add more insights based on data -->
|
||||||
|
</ul>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
<!-- Section 2: Identity Resolution Performance -->
|
||||||
|
<div class="section">
|
||||||
|
<h2 class="section-title">Identity Resolution Performance</h2>
|
||||||
|
<table class="stats-table">
|
||||||
|
<thead>
|
||||||
|
<tr>
|
||||||
|
<th>Identity Key Type</th>
|
||||||
|
<th>Source Distinct Count</th>
|
||||||
|
<th>Final Canonical IDs</th>
|
||||||
|
<th>Deduplication Rate</th>
|
||||||
|
<th>Quality Score</th>
|
||||||
|
</tr>
|
||||||
|
</thead>
|
||||||
|
<tbody>
|
||||||
|
<!-- For each key type in key_type_distribution -->
|
||||||
|
<tr>
|
||||||
|
<td><strong>{key_name}</strong></td>
|
||||||
|
<td>{source_count:,}</td>
|
||||||
|
<td>{unique_canonical_ids:,}</td>
|
||||||
|
<td><span class="highlight">{dedup_pct:.1f}% reduction</span></td>
|
||||||
|
<td><span class="badge badge-success">Excellent</span></td>
|
||||||
|
</tr>
|
||||||
|
<!-- Repeat for each key -->
|
||||||
|
</tbody>
|
||||||
|
</table>
|
||||||
|
<!-- Add bar charts, insights, etc. -->
|
||||||
|
</div>
|
||||||
|
|
||||||
|
<!-- Section 3: Merge Distribution Analysis -->
|
||||||
|
<!-- Section 4: Top Merged Profiles -->
|
||||||
|
<!-- Section 5: Source Table Configuration -->
|
||||||
|
<!-- Section 6: Master Table Data Quality -->
|
||||||
|
<!-- Section 7: Convergence Performance -->
|
||||||
|
<!-- Section 8: Expert Recommendations -->
|
||||||
|
<!-- Section 9: Summary Statistics -->
|
||||||
|
</div>
|
||||||
|
|
||||||
|
<footer>
|
||||||
|
<div class="footer-note">
|
||||||
|
<p><strong>Report Generated:</strong> {current_date}</p>
|
||||||
|
<p><strong>Platform:</strong> {platform} ({database}.{schema})</p>
|
||||||
|
<p><strong>Workflow:</strong> Hybrid ID Unification</p>
|
||||||
|
</div>
|
||||||
|
</footer>
|
||||||
|
</div>
|
||||||
|
</body>
|
||||||
|
</html>
|
||||||
|
```
|
||||||
|
|
||||||
|
**Data Insertion Rules:**
|
||||||
|
|
||||||
|
1. **Numbers**: Format with commas (e.g., 19,512)
|
||||||
|
2. **Percentages**: Round to 1 decimal place (e.g., 74.7%)
|
||||||
|
3. **Ratios**: Round to 2 decimal places (e.g., 3.95:1)
|
||||||
|
4. **Dates**: Use YYYY-MM-DD format
|
||||||
|
5. **Platform**: Capitalize (Snowflake or Databricks)
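
These rules can be centralized in small formatting helpers so every section renders consistently; a minimal sketch:

```python
def fmt_int(n):
    """Numbers: thousands separators, e.g. 19,512."""
    return f"{n:,}"

def fmt_pct(x):
    """Percentages: one decimal place, e.g. 74.7%."""
    return f"{x:.1f}%"

def fmt_ratio(x):
    """Ratios: two decimal places, e.g. 3.95:1."""
    return f"{x:.2f}:1"

def fmt_date(d):
    """Dates: YYYY-MM-DD."""
    return d.strftime("%Y-%m-%d")
```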
|
||||||
|
|
||||||
|
**Dynamic Content Generation:**
|
||||||
|
|
||||||
|
- For each metric card: Insert actual calculated values
|
||||||
|
- For each table row: Loop through result sets
|
||||||
|
- For each bar chart: Calculate width percentages
|
||||||
|
- For each insight: Generate based on data patterns
|
||||||
|
|
||||||
|
### Phase 6: Report Validation and Output
|
||||||
|
|
||||||
|
**Pre-Output Validation:**
|
||||||
|
|
||||||
|
```python
|
||||||
|
validations = [
|
||||||
|
("All sections have data", check_all_sections_populated()),
|
||||||
|
("Calculations are correct", verify_calculations()),
|
||||||
|
("Percentages sum properly", check_percentage_sums()),
|
||||||
|
("No missing values", check_no_nulls()),
|
||||||
|
("HTML is well-formed", validate_html_syntax())
|
||||||
|
]
|
||||||
|
|
||||||
|
for validation_name, result in validations:
|
||||||
|
if not result:
|
||||||
|
raise ValueError(f"Validation failed: {validation_name}")
|
||||||
|
```
|
||||||
|
|
||||||
|
**File Output:**
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Use Write tool to save HTML
|
||||||
|
Write(
|
||||||
|
file_path=output_path,
|
||||||
|
content=html_content
|
||||||
|
)
|
||||||
|
|
||||||
|
# Verify file was written
|
||||||
|
if file_exists(output_path):
|
||||||
|
file_size = get_file_size(output_path)
|
||||||
|
print(f"✓ Report generated: {output_path}")
|
||||||
|
print(f"✓ File size: {file_size} KB")
|
||||||
|
else:
|
||||||
|
raise RuntimeError("Failed to write report file")
|
||||||
|
```
|
||||||
|
|
||||||
|
**Success Summary:**
|
||||||
|
|
||||||
|
```
|
||||||
|
✓ Report generated successfully
|
||||||
|
✓ Location: {output_path}
|
||||||
|
✓ File size: {size} KB
|
||||||
|
✓ Sections: 9
|
||||||
|
✓ Statistics queries: 16
|
||||||
|
✓ Unified profiles: {unified_profiles:,}
|
||||||
|
✓ Data quality score: {overall_quality:.1f}%
|
||||||
|
✓ Ready for viewing and PDF export
|
||||||
|
|
||||||
|
Next steps:
|
||||||
|
1. Open {output_path} in your browser
|
||||||
|
2. Review merge statistics and insights
|
||||||
|
3. Print to PDF for distribution (Ctrl+P or Cmd+P)
|
||||||
|
4. Share with stakeholders
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Error Handling
|
||||||
|
|
||||||
|
### Handle These Scenarios:
|
||||||
|
|
||||||
|
1. **Tables Not Found:**
|
||||||
|
```
|
||||||
|
Error: Table {lookup_table} does not exist
|
||||||
|
|
||||||
|
Possible causes:
|
||||||
|
- Canonical ID name is incorrect
|
||||||
|
- Unification workflow not completed
|
||||||
|
- Database/schema name is wrong
|
||||||
|
|
||||||
|
Please verify:
|
||||||
|
- Database/Catalog: {database}
|
||||||
|
- Schema: {schema}
|
||||||
|
- Canonical ID: {canonical_id}
|
||||||
|
- Expected table: {canonical_id}_lookup
|
||||||
|
```
|
||||||
|
|
||||||
|
2. **No Data in Tables:**
|
||||||
|
```
|
||||||
|
Error: Tables exist but contain no data
|
||||||
|
|
||||||
|
This indicates the unification workflow may have failed.
|
||||||
|
|
||||||
|
Please:
|
||||||
|
1. Check workflow execution logs
|
||||||
|
2. Verify source tables have data
|
||||||
|
3. Re-run the unification workflow
|
||||||
|
4. Try again after successful completion
|
||||||
|
```
|
||||||
|
|
||||||
|
3. **MCP Tools Unavailable:**
|
||||||
|
```
|
||||||
|
Error: Cannot connect to {platform}
|
||||||
|
|
||||||
|
MCP tools for {platform} are not available.
|
||||||
|
|
||||||
|
Please:
|
||||||
|
1. Verify MCP server configuration
|
||||||
|
2. Check network connectivity
|
||||||
|
3. Validate credentials
|
||||||
|
4. Contact administrator if issue persists
|
||||||
|
```
|
||||||
|
|
||||||
|
4. **Permission Errors:**
|
||||||
|
```
|
||||||
|
Error: Access denied to {table}
|
||||||
|
|
||||||
|
You don't have SELECT permission on this table.
|
||||||
|
|
||||||
|
Please:
|
||||||
|
1. Request SELECT permission from administrator
|
||||||
|
2. Verify your role has access
|
||||||
|
3. For Snowflake: GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO ROLE {role}
|
||||||
|
4. For Databricks: GRANT SELECT ON TABLE {table} TO {user}
|
||||||
|
```
|
||||||
|
|
||||||
|
5. **Column Not Found:**
|
||||||
|
```
|
||||||
|
Warning: Column {column_name} not found in master table
|
||||||
|
|
||||||
|
Skipping coverage calculation for this attribute.
|
||||||
|
Report will be generated without this metric.
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Quality Standards
|
||||||
|
|
||||||
|
### Report Must Meet These Criteria:
|
||||||
|
|
||||||
|
✅ **Accuracy**: All metrics calculated correctly from source data
|
||||||
|
✅ **Completeness**: All 9 sections populated with data
|
||||||
|
✅ **Consistency**: Same HTML structure every time
|
||||||
|
✅ **Readability**: Clear tables, charts, and insights
|
||||||
|
✅ **Professional**: Executive-ready formatting and language
|
||||||
|
✅ **Actionable**: Includes specific recommendations
|
||||||
|
✅ **Validated**: All calculations double-checked
|
||||||
|
✅ **Browser-compatible**: Works in Chrome, Firefox, Safari, Edge
|
||||||
|
✅ **PDF-ready**: Exports cleanly to PDF
|
||||||
|
✅ **Responsive**: Adapts to different screen sizes
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Expert Analysis Guidelines
|
||||||
|
|
||||||
|
### When Writing Insights:
|
||||||
|
|
||||||
|
1. **Be Data-Driven**: Reference specific metrics
|
||||||
|
- "Successfully unified 19,512 identities into 4,940 profiles"
|
||||||
|
- NOT: "Good unification performance"
|
||||||
|
|
||||||
|
2. **Provide Context**: Compare to benchmarks
|
||||||
|
- "4-iteration convergence is excellent (typical is 8-12)"
|
||||||
|
- "74.7% fragmentation reduction exceeds industry average of 60%"
|
||||||
|
|
||||||
|
3. **Identify Patterns**: Highlight interesting findings
|
||||||
|
- "89% of profiles have 3-5 identities, indicating normal multi-channel engagement"
|
||||||
|
- "Top merged profile has 38 identities - worth investigating"
|
||||||
|
|
||||||
|
4. **Give Actionable Recommendations**:
|
||||||
|
- "Review profiles with 20+ merges for data quality issues"
|
||||||
|
- "Implement incremental processing for efficiency"
|
||||||
|
|
||||||
|
5. **Assess Quality**: Grade and explain
|
||||||
|
- "Email coverage: 100% - Excellent for marketing"
|
||||||
|
- "Phone coverage: 99.39% - Near-perfect, 30 missing values"
|
||||||
|
|
||||||
|
### Badge Assignment:
|
||||||
|
|
||||||
|
- **Excellent**: 95-100% coverage or <5% deduplication
|
||||||
|
- **Good**: 85-94% coverage or 5-15% deduplication
|
||||||
|
- **Needs Improvement**: <85% coverage or >15% deduplication
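
A hedged sketch of how the coverage thresholds above might be encoded (the helper name is illustrative, not part of the plugin):

```python
def coverage_badge(coverage_pct):
    """Map a coverage percentage to the badge labels used in the report."""
    if coverage_pct >= 95:
        return "Excellent"
    if coverage_pct >= 85:
        return "Good"
    return "Needs Improvement"
```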
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Platform-Specific Adaptations
|
||||||
|
|
||||||
|
### Snowflake Specifics:
|
||||||
|
|
||||||
|
- Use UPPERCASE for all identifiers (DATABASE, SCHEMA, TABLE, COLUMN)
|
||||||
|
- Use `ARRAY_CONSTRUCT()` for array creation
|
||||||
|
- Use `OBJECT_CONSTRUCT()` for objects
|
||||||
|
- Date format: `TO_CHAR(CURRENT_DATE(), 'YYYY-MM-DD')`
|
||||||
|
|
||||||
|
### Databricks Specifics:
|
||||||
|
|
||||||
|
- Use lowercase for identifiers (catalog, schema, table, column)
|
||||||
|
- Use `ARRAY()` for array creation
|
||||||
|
- Use `STRUCT()` for objects
|
||||||
|
- Date format: `DATE_FORMAT(CURRENT_DATE(), 'yyyy-MM-dd')`
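
Because the report generator must emit slightly different SQL per platform, such differences can be kept in a small lookup; a sketch based only on the two date expressions listed above:

```python
CURRENT_DATE_SQL = {
    "snowflake":  "TO_CHAR(CURRENT_DATE(), 'YYYY-MM-DD')",
    "databricks": "DATE_FORMAT(CURRENT_DATE(), 'yyyy-MM-dd')",
}

def current_date_expr(platform):
    """Return the platform-specific expression for today's date as YYYY-MM-DD."""
    return CURRENT_DATE_SQL[platform.lower()]
```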
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Success Checklist
|
||||||
|
|
||||||
|
Before marking task complete:
|
||||||
|
|
||||||
|
- [ ] All required user inputs collected
|
||||||
|
- [ ] Platform and table names validated
|
||||||
|
- [ ] All 16 queries executed successfully
|
||||||
|
- [ ] All metrics calculated correctly
|
||||||
|
- [ ] HTML report generated with all sections
|
||||||
|
- [ ] File written to specified path
|
||||||
|
- [ ] Success summary displayed to user
|
||||||
|
- [ ] No errors or warnings in output
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Final Agent Output
|
||||||
|
|
||||||
|
**When complete, output this exact format:**
|
||||||
|
|
||||||
|
```
|
||||||
|
════════════════════════════════════════════════════════════════
|
||||||
|
ID UNIFICATION MERGE STATISTICS REPORT - GENERATION COMPLETE
|
||||||
|
════════════════════════════════════════════════════════════════
|
||||||
|
|
||||||
|
Platform: {platform}
|
||||||
|
Database/Catalog: {database}
|
||||||
|
Schema: {schema}
|
||||||
|
Canonical ID: {canonical_id}
|
||||||
|
|
||||||
|
STATISTICS SUMMARY
|
||||||
|
──────────────────────────────────────────────────────────────
|
||||||
|
Unified Profiles: {unified_profiles:,}
|
||||||
|
Total Identities: {total_identities:,}
|
||||||
|
Merge Ratio: {merge_ratio:.2f}:1
|
||||||
|
Fragmentation Reduction: {reduction_pct:.1f}%
|
||||||
|
Data Quality Score: {quality_score:.1f}%
|
||||||
|
|
||||||
|
REPORT DETAILS
|
||||||
|
──────────────────────────────────────────────────────────────
|
||||||
|
Output File: {output_path}
|
||||||
|
File Size: {file_size} KB
|
||||||
|
Sections Included: 9
|
||||||
|
Queries Executed: 16
|
||||||
|
Generation Time: {generation_time} seconds
|
||||||
|
|
||||||
|
NEXT STEPS
|
||||||
|
──────────────────────────────────────────────────────────────
|
||||||
|
1. Open {output_path} in your web browser
|
||||||
|
2. Review merge statistics and expert insights
|
||||||
|
3. Export to PDF: Press Ctrl+P (Windows) or Cmd+P (Mac)
|
||||||
|
4. Share with stakeholders and decision makers
|
||||||
|
|
||||||
|
✓ Report generation successful!
|
||||||
|
════════════════════════════════════════════════════════════════
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
**You are now ready to execute as the expert merge statistics report generator agent!**
|
||||||
114
agents/snowflake-sql-generator.md
Normal file
114
agents/snowflake-sql-generator.md
Normal file
@@ -0,0 +1,114 @@
|
|||||||
|
# Snowflake SQL Generator Agent
|
||||||
|
|
||||||
|
## Agent Purpose
|
||||||
|
Generate production-ready Snowflake SQL from `unify.yml` configuration by executing the Python script `yaml_unification_to_snowflake.py`.
|
||||||
|
|
||||||
|
## Agent Workflow
|
||||||
|
|
||||||
|
### Step 1: Validate Inputs
|
||||||
|
**Check**:
|
||||||
|
- YAML file exists and is valid
|
||||||
|
- Target database and schema provided
|
||||||
|
- Source database/schema (defaults to target database/PUBLIC if not provided)
|
||||||
|
- Output directory path
|
||||||
|
|
||||||
|
### Step 2: Execute Python Script
|
||||||
|
**Use Bash tool** to execute:
|
||||||
|
```bash
|
||||||
|
python3 /path/to/plugins/cdp-hybrid-idu/scripts/snowflake/yaml_unification_to_snowflake.py \
|
||||||
|
<yaml_file> \
|
||||||
|
-d <target_database> \
|
||||||
|
-s <target_schema> \
|
||||||
|
-sd <source_database> \
|
||||||
|
-ss <source_schema> \
|
||||||
|
-o <output_directory>
|
||||||
|
```
|
||||||
|
|
||||||
|
**Parameters**:
|
||||||
|
- `<yaml_file>`: Path to unify.yml
|
||||||
|
- `-d`: Target database name
|
||||||
|
- `-s`: Target schema name
|
||||||
|
- `-sd`: Source database (optional, defaults to target database)
|
||||||
|
- `-ss`: Source schema (optional, defaults to PUBLIC)
|
||||||
|
- `-o`: Output directory (optional, defaults to `snowflake_sql`)
|
||||||
|
|
||||||
|
### Step 3: Monitor Execution
|
||||||
|
**Track**:
|
||||||
|
- Script execution progress
|
||||||
|
- Generated SQL file count
|
||||||
|
- Any warnings or errors
|
||||||
|
- Output directory structure
|
||||||
|
|
||||||
|
### Step 4: Parse and Report Results
|
||||||
|
**Output**:
|
||||||
|
```
|
||||||
|
✓ Snowflake SQL generation complete!
|
||||||
|
|
||||||
|
Generated Files:
|
||||||
|
• snowflake_sql/unify/01_create_graph.sql
|
||||||
|
• snowflake_sql/unify/02_extract_merge.sql
|
||||||
|
• snowflake_sql/unify/03_source_key_stats.sql
|
||||||
|
• snowflake_sql/unify/04_unify_loop_iteration_01.sql
|
||||||
|
... (up to iteration_N)
|
||||||
|
• snowflake_sql/unify/05_canonicalize.sql
|
||||||
|
• snowflake_sql/unify/06_result_key_stats.sql
|
||||||
|
• snowflake_sql/unify/10_enrich_*.sql
|
||||||
|
• snowflake_sql/unify/20_master_*.sql
|
||||||
|
• snowflake_sql/unify/30_unification_metadata.sql
|
||||||
|
• snowflake_sql/unify/31_filter_lookup.sql
|
||||||
|
• snowflake_sql/unify/32_column_lookup.sql
|
||||||
|
|
||||||
|
Total: X SQL files
|
||||||
|
|
||||||
|
Configuration:
|
||||||
|
• Database: <database_name>
|
||||||
|
• Schema: <schema_name>
|
||||||
|
• Iterations: N (calculated from YAML)
|
||||||
|
• Tables: X enriched, Y master tables
|
||||||
|
|
||||||
|
Snowflake Features Enabled:
|
||||||
|
✓ Native Snowflake functions
|
||||||
|
✓ VARIANT support
|
||||||
|
✓ Table clustering
|
||||||
|
✓ Convergence detection
|
||||||
|
|
||||||
|
Next Steps:
|
||||||
|
1. Review generated SQL files
|
||||||
|
2. Execute using: /cdp-hybrid-idu:hybrid-execute-snowflake
|
||||||
|
3. Or manually execute in Snowflake SQL worksheet
|
||||||
|
```
|
||||||
|
|
||||||
|
## Critical Behaviors
|
||||||
|
|
||||||
|
### Python Script Error Handling
|
||||||
|
If script fails:
|
||||||
|
1. Capture error output
|
||||||
|
2. Parse error message
|
||||||
|
3. Provide helpful suggestions:
|
||||||
|
- YAML syntax errors → validate YAML
|
||||||
|
- Missing dependencies → install pyyaml
|
||||||
|
- Invalid table names → check YAML table section
|
||||||
|
- File permission errors → check output directory permissions
|
||||||
|
|
||||||
|
### Success Validation
|
||||||
|
Verify:
|
||||||
|
- Output directory created
|
||||||
|
- All expected SQL files present
|
||||||
|
- Files have non-zero content
|
||||||
|
- SQL syntax looks valid (basic check)
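
A minimal sketch of this verification (directory name and checks are illustrative):

```python
from pathlib import Path

def verify_output(output_dir="snowflake_sql/unify"):
    """Confirm the generator produced non-empty .sql files."""
    sql_files = sorted(Path(output_dir).glob("*.sql"))
    if not sql_files:
        raise FileNotFoundError(f"No SQL files found in {output_dir}")
    empty = [f.name for f in sql_files if f.stat().st_size == 0]
    if empty:
        raise ValueError(f"Empty SQL files: {empty}")
    print(f"✓ {len(sql_files)} SQL files generated in {output_dir}")
```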
|
||||||
|
|
||||||
|
### Platform-Specific Conversions
|
||||||
|
Report applied conversions:
|
||||||
|
- Presto/Databricks functions → Snowflake equivalents
|
||||||
|
- Array operations → ARRAY_CONSTRUCT/FLATTEN syntax
|
||||||
|
- Time functions → DATE_PART(epoch_second, ...)
|
||||||
|
- Table definitions → Snowflake syntax
|
||||||
|
|
||||||
|
## MUST DO
|
||||||
|
|
||||||
|
1. **Always use absolute paths** for plugin scripts
|
||||||
|
2. **Check Python version** (require Python 3.7+)
|
||||||
|
3. **Parse script output** for errors and warnings
|
||||||
|
4. **Verify output directory** structure
|
||||||
|
5. **Count generated files** and report summary
|
||||||
|
6. **Provide clear next steps** for execution
|
||||||
138
agents/snowflake-workflow-executor.md
Normal file
138
agents/snowflake-workflow-executor.md
Normal file
@@ -0,0 +1,138 @@
|
|||||||
|
# Snowflake Workflow Executor Agent
|
||||||
|
|
||||||
|
## Agent Purpose
|
||||||
|
Execute generated Snowflake SQL workflow with intelligent convergence detection, real-time monitoring, and interactive error handling by orchestrating the Python script `snowflake_sql_executor.py`.
|
||||||
|
|
||||||
|
## Agent Workflow
|
||||||
|
|
||||||
|
### Step 1: Collect Credentials
|
||||||
|
**Required**:
|
||||||
|
- SQL directory path
|
||||||
|
- Account name
|
||||||
|
- Username
|
||||||
|
- Database and schema names
|
||||||
|
- Warehouse name (defaults to `COMPUTE_WH`)
|
||||||
|
|
||||||
|
**Authentication Options**:
|
||||||
|
- Password (from argument, environment variable `SNOWFLAKE_PASSWORD`, or prompt)
|
||||||
|
- SSO (externalbrowser)
|
||||||
|
- Key-pair (using environment variables)
|
||||||
|
|
||||||
|
### Step 2: Execute Python Script
|
||||||
|
**Use Bash tool** with `run_in_background: true` to execute:
|
||||||
|
```bash
|
||||||
|
python3 /path/to/plugins/cdp-hybrid-idu/scripts/snowflake/snowflake_sql_executor.py \
|
||||||
|
<sql_directory> \
|
||||||
|
--account <account> \
|
||||||
|
--user <user> \
|
||||||
|
--database <database> \
|
||||||
|
--schema <schema> \
|
||||||
|
--warehouse <warehouse> \
|
||||||
|
--password <password>
|
||||||
|
```
|
||||||
|
|
||||||
|
### Step 3: Monitor Execution in Real-Time
|
||||||
|
**Use BashOutput tool** to stream progress:
|
||||||
|
- Connection status
|
||||||
|
- File execution progress
|
||||||
|
- Row counts and timing
|
||||||
|
- Convergence detection results
|
||||||
|
- Error messages
|
||||||
|
|
||||||
|
**Display Progress**:
|
||||||
|
```
|
||||||
|
✓ Connected to Snowflake: <account>
|
||||||
|
• Using database: <database>, schema: <schema>
|
||||||
|
|
||||||
|
Executing: 01_create_graph.sql
|
||||||
|
✓ Completed: 01_create_graph.sql
|
||||||
|
|
||||||
|
Executing: 02_extract_merge.sql
|
||||||
|
✓ Completed: 02_extract_merge.sql
|
||||||
|
• Rows affected: 125,000
|
||||||
|
|
||||||
|
Executing Unify Loop (convergence detection)
|
||||||
|
|
||||||
|
--- Iteration 1 ---
|
||||||
|
✓ Iteration 1 completed
|
||||||
|
• Updated records: 1,500
|
||||||
|
|
||||||
|
--- Iteration 2 ---
|
||||||
|
✓ Iteration 2 completed
|
||||||
|
• Updated records: 450
|
||||||
|
|
||||||
|
--- Iteration 3 ---
|
||||||
|
✓ Iteration 3 completed
|
||||||
|
• Updated records: 0
|
||||||
|
✓ Loop converged after 3 iterations!
|
||||||
|
|
||||||
|
• Creating alias table: loop_final
|
||||||
|
...
|
||||||
|
```
|
||||||
|
|
||||||
|
### Step 4: Handle Interactive Prompts
|
||||||
|
If script encounters errors and prompts for continuation:
|
||||||
|
```
|
||||||
|
✗ Error in file: 04_unify_loop_iteration_01.sql
|
||||||
|
Error: Table not found
|
||||||
|
|
||||||
|
Continue with remaining files? (y/n):
|
||||||
|
```
|
||||||
|
|
||||||
|
**Agent Decision**:
|
||||||
|
1. Show error to user
|
||||||
|
2. Ask user for decision
|
||||||
|
3. Pass response to script
|
||||||
|
|
||||||
|
### Step 5: Final Report
|
||||||
|
**After completion**:
|
||||||
|
```
|
||||||
|
Execution Complete!
|
||||||
|
|
||||||
|
Summary:
|
||||||
|
• Files processed: 18/18
|
||||||
|
• Execution time: 45 minutes
|
||||||
|
• Convergence: 3 iterations
|
||||||
|
• Final lookup table rows: 98,500
|
||||||
|
|
||||||
|
Validation:
|
||||||
|
✓ All tables created successfully
|
||||||
|
✓ Canonical IDs generated
|
||||||
|
✓ Enriched tables populated
|
||||||
|
✓ Master tables created
|
||||||
|
|
||||||
|
Next Steps:
|
||||||
|
1. Verify data quality
|
||||||
|
2. Check coverage metrics
|
||||||
|
3. Review statistics tables
|
||||||
|
```
|
||||||
|
|
||||||
|
## Critical Behaviors
|
||||||
|
|
||||||
|
### Convergence Monitoring
|
||||||
|
Track loop iterations:
|
||||||
|
- Iteration number
|
||||||
|
- Records updated
|
||||||
|
- Convergence status
|
||||||
|
|
||||||
|
### Error Recovery
|
||||||
|
On errors:
|
||||||
|
1. Capture error details
|
||||||
|
2. Determine severity (critical vs warning)
|
||||||
|
3. Prompt user for continuation decision
|
||||||
|
4. Log error for troubleshooting
|
||||||
|
|
||||||
|
### Performance Tracking
|
||||||
|
Monitor:
|
||||||
|
- Execution time per file
|
||||||
|
- Row counts processed
|
||||||
|
- Total workflow time
|
||||||
|
|
||||||
|
## MUST DO
|
||||||
|
|
||||||
|
1. **Stream output in real-time** using BashOutput
|
||||||
|
2. **Monitor convergence** and report iterations
|
||||||
|
3. **Handle user prompts** for error continuation
|
||||||
|
4. **Report final statistics** with coverage metrics
|
||||||
|
5. **Verify connection** before starting execution
|
||||||
|
6. **Clean up** on termination or error
|
||||||
382
agents/yaml-configuration-builder.md
Normal file
382
agents/yaml-configuration-builder.md
Normal file
@@ -0,0 +1,382 @@
|
|||||||
|
# YAML Configuration Builder Agent
|
||||||
|
|
||||||
|
## Agent Purpose
|
||||||
|
Interactive agent to help users create proper `unify.yml` configuration files for hybrid ID unification across Snowflake and Databricks platforms.
|
||||||
|
|
||||||
|
## Agent Capabilities
|
||||||
|
- Guide users through YAML creation step-by-step
|
||||||
|
- Validate configuration in real-time
|
||||||
|
- Provide examples and best practices
|
||||||
|
- Support both simple and complex configurations
|
||||||
|
- Ensure platform compatibility (Snowflake and Databricks)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Agent Workflow
|
||||||
|
|
||||||
|
### Step 1: Project Name and Scope
|
||||||
|
**Collect**:
|
||||||
|
- Unification project name
|
||||||
|
- Brief description of use case
|
||||||
|
|
||||||
|
**Example Interaction**:
|
||||||
|
```
|
||||||
|
Question: What would you like to name this unification project?
|
||||||
|
Suggestion: Use a descriptive name like 'customer_unification' or 'user_identity_resolution'
|
||||||
|
|
||||||
|
User input: customer_360
|
||||||
|
|
||||||
|
✓ Project name: customer_360
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Step 2: Define Keys (User Identifiers)
|
||||||
|
**Collect**:
|
||||||
|
- Key names (email, customer_id, phone_number, etc.)
|
||||||
|
- Validation rules for each key:
|
||||||
|
- `valid_regexp`: Regex pattern for format validation
|
||||||
|
- `invalid_texts`: Array of values to exclude
|
||||||
|
|
||||||
|
**Example Interaction**:
|
||||||
|
```
|
||||||
|
Question: What user identifier columns (keys) do you want to use for unification?
|
||||||
|
|
||||||
|
Common keys:
|
||||||
|
- email: Email addresses
|
||||||
|
- customer_id: Customer identifiers
|
||||||
|
- phone_number: Phone numbers
|
||||||
|
- td_client_id: Treasure Data client IDs
|
||||||
|
- user_id: User identifiers
|
||||||
|
|
||||||
|
User input: email, customer_id, phone_number
|
||||||
|
|
||||||
|
For each key, I'll help you set up validation rules...
|
||||||
|
|
||||||
|
Key: email
|
||||||
|
Question: Would you like to add a regex validation pattern for email?
|
||||||
|
Suggestion: Use ".*@.*" for basic email validation or more strict patterns
|
||||||
|
|
||||||
|
User input: .*@.*
|
||||||
|
|
||||||
|
Question: What values should be considered invalid?
|
||||||
|
Suggestion: Common invalid values: '', 'N/A', 'null', 'unknown'
|
||||||
|
|
||||||
|
User input: '', 'N/A', 'null'
|
||||||
|
|
||||||
|
✓ Key 'email' configured with regex validation and 3 invalid values
|
||||||
|
```
|
||||||
|
|
||||||
|
**Generate YAML Section**:
|
||||||
|
```yaml
|
||||||
|
keys:
|
||||||
|
- name: email
|
||||||
|
valid_regexp: ".*@.*"
|
||||||
|
invalid_texts: ['', 'N/A', 'null']
|
||||||
|
- name: customer_id
|
||||||
|
invalid_texts: ['', 'N/A', 'null']
|
||||||
|
- name: phone_number
|
||||||
|
invalid_texts: ['', 'N/A', 'null']
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Step 3: Map Tables to Keys
|
||||||
|
**Collect**:
|
||||||
|
- Source table names
|
||||||
|
- Key column mappings for each table
|
||||||
|
|
||||||
|
**Example Interaction**:
|
||||||
|
```
|
||||||
|
Question: What source tables contain user identifiers?
|
||||||
|
|
||||||
|
User input: customer_profiles, orders, web_events
|
||||||
|
|
||||||
|
For each table, I'll help you map columns to keys...
|
||||||
|
|
||||||
|
Table: customer_profiles
|
||||||
|
Question: Which columns in this table map to your keys?
|
||||||
|
|
||||||
|
Available keys: email, customer_id, phone_number
|
||||||
|
|
||||||
|
User input:
|
||||||
|
- email_std → email
|
||||||
|
- customer_id → customer_id
|
||||||
|
|
||||||
|
✓ Table 'customer_profiles' mapped with 2 key columns
|
||||||
|
|
||||||
|
Table: orders
|
||||||
|
Question: Which columns in this table map to your keys?
|
||||||
|
|
||||||
|
User input:
|
||||||
|
- email_address → email
|
||||||
|
- phone → phone_number
|
||||||
|
|
||||||
|
✓ Table 'orders' mapped with 2 key columns
|
||||||
|
```
|
||||||
|
|
||||||
|
**Generate YAML Section**:
|
||||||
|
```yaml
|
||||||
|
tables:
|
||||||
|
- table: customer_profiles
|
||||||
|
key_columns:
|
||||||
|
- {column: email_std, key: email}
|
||||||
|
- {column: customer_id, key: customer_id}
|
||||||
|
- table: orders
|
||||||
|
key_columns:
|
||||||
|
- {column: email_address, key: email}
|
||||||
|
- {column: phone, key: phone_number}
|
||||||
|
- table: web_events
|
||||||
|
key_columns:
|
||||||
|
- {column: user_email, key: email}
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Step 4: Configure Canonical ID
|
||||||
|
**Collect**:
|
||||||
|
- Canonical ID name
|
||||||
|
- Merge keys (priority order)
|
||||||
|
- Iteration count (optional)
|
||||||
|
|
||||||
|
**Example Interaction**:
|
||||||
|
```
|
||||||
|
Question: What would you like to name the canonical ID column?
|
||||||
|
Suggestion: Common names: 'unified_id', 'canonical_id', 'master_id'
|
||||||
|
|
||||||
|
User input: unified_id
|
||||||
|
|
||||||
|
Question: Which keys should participate in the merge/unification?
|
||||||
|
Available keys: email, customer_id, phone_number
|
||||||
|
|
||||||
|
Suggestion: List keys in priority order (highest priority first)
|
||||||
|
Example: email, customer_id, phone_number
|
||||||
|
|
||||||
|
User input: email, customer_id, phone_number
|
||||||
|
|
||||||
|
Question: How many merge iterations would you like?
|
||||||
|
Suggestion:
|
||||||
|
- Leave blank to auto-calculate based on complexity
|
||||||
|
- Typical range: 3-10 iterations
|
||||||
|
- More keys/tables = more iterations needed
|
||||||
|
|
||||||
|
User input: (blank - auto-calculate)
|
||||||
|
|
||||||
|
✓ Canonical ID 'unified_id' configured with 3 merge keys
|
||||||
|
✓ Iterations will be auto-calculated
|
||||||
|
```
|
||||||
|
|
||||||
|
**Generate YAML Section**:
|
||||||
|
```yaml
|
||||||
|
canonical_ids:
|
||||||
|
- name: unified_id
|
||||||
|
merge_by_keys: [email, customer_id, phone_number]
|
||||||
|
# merge_iterations: auto-calculated (omit to let the generator decide)
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Step 5: Configure Master Tables (Optional)
|
||||||
|
**Collect**:
|
||||||
|
- Master table names
|
||||||
|
- Attributes to aggregate
|
||||||
|
- Source column priorities
|
||||||
|
|
||||||
|
**Example Interaction**:
|
||||||
|
```
|
||||||
|
Question: Would you like to create master tables with aggregated attributes?
|
||||||
|
(Master tables combine data from multiple sources into unified customer profiles)
|
||||||
|
|
||||||
|
User input: yes
|
||||||
|
|
||||||
|
Question: What would you like to name this master table?
|
||||||
|
Suggestion: Common names: 'customer_master', 'user_profile', 'unified_customer'
|
||||||
|
|
||||||
|
User input: customer_master
|
||||||
|
|
||||||
|
Question: Which canonical ID should this master table use?
|
||||||
|
Available: unified_id
|
||||||
|
|
||||||
|
User input: unified_id
|
||||||
|
|
||||||
|
Question: What attributes would you like to aggregate?
|
||||||
|
|
||||||
|
Attribute 1:
|
||||||
|
Name: best_email
|
||||||
|
Type: single value or array?
|
||||||
|
User input: single value
|
||||||
|
|
||||||
|
Source columns (priority order):
|
||||||
|
1. Table: customer_profiles, Column: email_std, Order by: time
|
||||||
|
2. Table: orders, Column: email_address, Order by: time
|
||||||
|
|
||||||
|
✓ Attribute 'best_email' configured with 2 sources
|
||||||
|
|
||||||
|
Attribute 2:
|
||||||
|
Name: top_3_emails
|
||||||
|
Type: single value or array?
|
||||||
|
User input: array
|
||||||
|
Array size: 3
|
||||||
|
|
||||||
|
Source columns (priority order):
|
||||||
|
1. Table: customer_profiles, Column: email_std, Order by: time
|
||||||
|
2. Table: orders, Column: email_address, Order by: time
|
||||||
|
|
||||||
|
✓ Attribute 'top_3_emails' configured as array with 2 sources
|
||||||
|
```
|
||||||
|
|
||||||
|
**Generate YAML Section**:
|
||||||
|
```yaml
|
||||||
|
master_tables:
|
||||||
|
- name: customer_master
|
||||||
|
canonical_id: unified_id
|
||||||
|
attributes:
|
||||||
|
- name: best_email
|
||||||
|
source_columns:
|
||||||
|
- {table: customer_profiles, column: email_std, priority: 1, order_by: time}
|
||||||
|
- {table: orders, column: email_address, priority: 2, order_by: time}
|
||||||
|
- name: top_3_emails
|
||||||
|
array_elements: 3
|
||||||
|
source_columns:
|
||||||
|
- {table: customer_profiles, column: email_std, priority: 1, order_by: time}
|
||||||
|
- {table: orders, column: email_address, priority: 2, order_by: time}
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Step 6: Validation and Finalization
|
||||||
|
**Perform**:
|
||||||
|
1. Validate complete YAML structure
|
||||||
|
2. Check all references
|
||||||
|
3. Suggest optimizations
|
||||||
|
4. Write final `unify.yml` file
|
||||||
|
|
||||||
|
**Example Output**:
|
||||||
|
```
|
||||||
|
Validating configuration...
|
||||||
|
|
||||||
|
✅ YAML structure valid
|
||||||
|
✅ All key references resolved
|
||||||
|
✅ All table references valid
|
||||||
|
✅ Canonical ID properly configured
|
||||||
|
✅ Master tables correctly defined
|
||||||
|
|
||||||
|
Configuration Summary:
|
||||||
|
• Project: customer_360
|
||||||
|
• Keys: 3 (email, customer_id, phone_number)
|
||||||
|
• Tables: 3 (customer_profiles, orders, web_events)
|
||||||
|
• Canonical ID: unified_id
|
||||||
|
• Master Tables: 1 (customer_master with 2 attributes)
|
||||||
|
• Estimated iterations: 5 (auto-calculated)
|
||||||
|
|
||||||
|
Writing unify.yml...
|
||||||
|
|
||||||
|
✓ Configuration file created successfully!
|
||||||
|
|
||||||
|
File location: ./unify.yml
|
||||||
|
```
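
Behind the scenes, the structural validation can be as simple as loading the file and checking cross-references; a sketch assuming `pyyaml` is installed:

```python
import yaml

def validate_unify_yaml(path="unify.yml"):
    """Light structural check: keys referenced by tables and canonical_ids must exist."""
    with open(path) as f:
        cfg = yaml.safe_load(f)

    key_names = {k["name"] for k in cfg.get("keys", [])}

    for table in cfg.get("tables", []):
        for kc in table.get("key_columns", []):
            assert kc["key"] in key_names, f"{table['table']}: unknown key {kc['key']}"

    for cid in cfg.get("canonical_ids", []):
        for key in cid.get("merge_by_keys", []):
            assert key in key_names, f"{cid['name']}: unknown merge key {key}"

    return cfg
```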
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Agent Output
|
||||||
|
|
||||||
|
### Success
|
||||||
|
Returns complete `unify.yml` with:
|
||||||
|
- All sections properly structured
|
||||||
|
- Valid YAML syntax
|
||||||
|
- Optimized configuration
|
||||||
|
- Ready for SQL generation
|
||||||
|
|
||||||
|
### Validation
|
||||||
|
Performs checks:
|
||||||
|
- YAML syntax validation
|
||||||
|
- Reference integrity
|
||||||
|
- Best practices compliance
|
||||||
|
- Platform compatibility
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Agent Behavior Guidelines
|
||||||
|
|
||||||
|
### Be Interactive
|
||||||
|
- Ask clear questions
|
||||||
|
- Provide examples
|
||||||
|
- Suggest best practices
|
||||||
|
- Validate responses
|
||||||
|
|
||||||
|
### Be Helpful
|
||||||
|
- Explain concepts when needed
|
||||||
|
- Offer suggestions
|
||||||
|
- Show examples
|
||||||
|
- Guide through complex scenarios
|
||||||
|
|
||||||
|
### Be Thorough
|
||||||
|
- Don't skip validation
|
||||||
|
- Check all references
|
||||||
|
- Ensure completeness
|
||||||
|
- Verify platform compatibility
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Example Complete YAML Output
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
name: customer_360
|
||||||
|
|
||||||
|
keys:
|
||||||
|
- name: email
|
||||||
|
valid_regexp: ".*@.*"
|
||||||
|
invalid_texts: ['', 'N/A', 'null', 'unknown']
|
||||||
|
- name: customer_id
|
||||||
|
invalid_texts: ['', 'N/A', 'null']
|
||||||
|
- name: phone_number
|
||||||
|
invalid_texts: ['', 'N/A', 'null']
|
||||||
|
|
||||||
|
tables:
|
||||||
|
- table: customer_profiles
|
||||||
|
key_columns:
|
||||||
|
- {column: email_std, key: email}
|
||||||
|
- {column: customer_id, key: customer_id}
|
||||||
|
- table: orders
|
||||||
|
key_columns:
|
||||||
|
- {column: email_address, key: email}
|
||||||
|
- {column: phone, key: phone_number}
|
||||||
|
- table: web_events
|
||||||
|
key_columns:
|
||||||
|
- {column: user_email, key: email}
|
||||||
|
|
||||||
|
canonical_ids:
|
||||||
|
- name: unified_id
|
||||||
|
merge_by_keys: [email, customer_id, phone_number]
|
||||||
|
merge_iterations: 15
|
||||||
|
|
||||||
|
master_tables:
|
||||||
|
- name: customer_master
|
||||||
|
canonical_id: unified_id
|
||||||
|
attributes:
|
||||||
|
- name: best_email
|
||||||
|
source_columns:
|
||||||
|
- {table: customer_profiles, column: email_std, priority: 1, order_by: time}
|
||||||
|
- {table: orders, column: email_address, priority: 2, order_by: time}
|
||||||
|
- name: primary_phone
|
||||||
|
source_columns:
|
||||||
|
- {table: orders, column: phone, priority: 1, order_by: time}
|
||||||
|
- name: top_3_emails
|
||||||
|
array_elements: 3
|
||||||
|
source_columns:
|
||||||
|
- {table: customer_profiles, column: email_std, priority: 1, order_by: time}
|
||||||
|
- {table: orders, column: email_address, priority: 2, order_by: time}
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## CRITICAL: Agent Must
|
||||||
|
|
||||||
|
1. **Always validate** YAML syntax before writing file
|
||||||
|
2. **Check all references** (keys, tables, canonical_ids)
|
||||||
|
3. **Provide examples** for complex configurations
|
||||||
|
4. **Suggest optimizations** based on use case
|
||||||
|
5. **Write valid YAML** that works with both Snowflake and Databricks generators
|
||||||
|
6. **Use proper indentation** (2 spaces per level)
|
||||||
|
7. **Quote string values** where necessary
|
||||||
|
8. **Test regex patterns** before adding to configuration
|
||||||
387
commands/hybrid-execute-databricks.md
Normal file
387
commands/hybrid-execute-databricks.md
Normal file
@@ -0,0 +1,387 @@
|
|||||||
|
---
|
||||||
|
name: hybrid-execute-databricks
|
||||||
|
description: Execute Databricks ID unification workflow with convergence detection and monitoring
|
||||||
|
---
|
||||||
|
|
||||||
|
# Execute Databricks ID Unification Workflow
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
Execute your generated Databricks SQL workflow with intelligent convergence detection, real-time monitoring, and interactive error handling. This command orchestrates the complete unification process from graph creation to master table generation.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## What You Need
|
||||||
|
|
||||||
|
### Required Inputs
|
||||||
|
1. **SQL Directory**: Path to generated SQL files (e.g., `databricks_sql/unify/`)
|
||||||
|
2. **Server Hostname**: Your Databricks workspace URL (e.g., `your-workspace.cloud.databricks.com`)
|
||||||
|
3. **HTTP Path**: SQL Warehouse or cluster path (e.g., `/sql/1.0/warehouses/abc123`)
|
||||||
|
4. **Catalog**: Target catalog name
|
||||||
|
5. **Schema**: Target schema name
|
||||||
|
|
||||||
|
### Authentication
|
||||||
|
**Option 1: Personal Access Token (PAT)**
|
||||||
|
- Access token from Databricks workspace
|
||||||
|
- Can be provided as argument or via environment variable `DATABRICKS_TOKEN`
|
||||||
|
|
||||||
|
**Option 2: OAuth**
|
||||||
|
- Browser-based authentication
|
||||||
|
- No token required, will open browser for login
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## What I'll Do
|
||||||
|
|
||||||
|
### Step 1: Connection Setup
|
||||||
|
- Connect to your Databricks workspace
|
||||||
|
- Validate credentials and permissions
|
||||||
|
- Set catalog and schema context
|
||||||
|
- Verify SQL directory exists
|
||||||
|
|
||||||
|
### Step 2: Execution Plan
|
||||||
|
Display execution plan with:
|
||||||
|
- All SQL files in execution order
|
||||||
|
- File types (Setup, Loop Iteration, Enrichment, Master Table, etc.)
|
||||||
|
- Estimated steps and dependencies
|
||||||
|
|
||||||
|
### Step 3: SQL Execution
|
||||||
|
I'll call the **databricks-workflow-executor agent** to:
|
||||||
|
- Execute SQL files in proper sequence
|
||||||
|
- Skip loop iteration files (handled separately)
|
||||||
|
- Monitor progress with real-time feedback
|
||||||
|
- Track row counts and execution times
|
||||||
|
|
||||||
|
### Step 4: Unify Loop with Convergence Detection
|
||||||
|
**Intelligent Loop Execution**:
|
||||||
|
```
|
||||||
|
Iteration 1:
|
||||||
|
✓ Execute unify SQL
|
||||||
|
• Check convergence: 1500 records updated
|
||||||
|
• Optimize Delta table
|
||||||
|
→ Continue to iteration 2
|
||||||
|
|
||||||
|
Iteration 2:
|
||||||
|
✓ Execute unify SQL
|
||||||
|
• Check convergence: 450 records updated
|
||||||
|
• Optimize Delta table
|
||||||
|
→ Continue to iteration 3
|
||||||
|
|
||||||
|
Iteration 3:
|
||||||
|
✓ Execute unify SQL
|
||||||
|
• Check convergence: 0 records updated
|
||||||
|
✓ CONVERGED! Stop loop
|
||||||
|
```
|
||||||
|
|
||||||
|
**Features**:
|
||||||
|
- Runs until convergence (updated_count = 0)
|
||||||
|
- Maximum 30 iterations safety limit
|
||||||
|
- Auto-optimization after each iteration
|
||||||
|
- Creates alias table (loop_final) for downstream processing
|
||||||
|
|
||||||
|
### Step 5: Post-Loop Processing
|
||||||
|
- Execute canonicalization step
|
||||||
|
- Generate result statistics
|
||||||
|
- Enrich source tables with canonical IDs
|
||||||
|
- Create master tables
|
||||||
|
- Generate metadata and lookup tables
|
||||||
|
|
||||||
|
### Step 6: Final Report
|
||||||
|
Provide:
|
||||||
|
- Total execution time
|
||||||
|
- Files processed successfully
|
||||||
|
- Convergence statistics
|
||||||
|
- Final table row counts
|
||||||
|
- Next steps and recommendations
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Command Usage
|
||||||
|
|
||||||
|
### Interactive Mode (Recommended)
|
||||||
|
```
|
||||||
|
/cdp-hybrid-idu:hybrid-execute-databricks
|
||||||
|
|
||||||
|
I'll prompt you for:
|
||||||
|
- SQL directory path
|
||||||
|
- Databricks server hostname
|
||||||
|
- HTTP path
|
||||||
|
- Catalog and schema
|
||||||
|
- Authentication method
|
||||||
|
```
|
||||||
|
|
||||||
|
### Advanced Mode
|
||||||
|
Provide all parameters upfront:
|
||||||
|
```
|
||||||
|
SQL directory: databricks_sql/unify/
|
||||||
|
Server hostname: your-workspace.cloud.databricks.com
|
||||||
|
HTTP path: /sql/1.0/warehouses/abc123
|
||||||
|
Catalog: my_catalog
|
||||||
|
Schema: my_schema
|
||||||
|
Auth type: pat (or oauth)
|
||||||
|
Access token: dapi... (if using PAT)
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Execution Features
|
||||||
|
|
||||||
|
### 1. Convergence Detection
|
||||||
|
**Algorithm**:
|
||||||
|
```sql
|
||||||
|
SELECT COUNT(*) as updated_count FROM (
|
||||||
|
SELECT leader_ns, leader_id, follower_ns, follower_id
|
||||||
|
FROM current_iteration
|
||||||
|
EXCEPT
|
||||||
|
SELECT leader_ns, leader_id, follower_ns, follower_id
|
||||||
|
FROM previous_iteration
|
||||||
|
) diff
|
||||||
|
```
|
||||||
|
|
||||||
|
**Stops when**: updated_count = 0
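
In pseudocode, the driver loop the executor script follows looks roughly like this (function names are illustrative, not the script's actual API):

```python
MAX_ITERATIONS = 30  # safety limit noted above

def run_unify_loop(run_iteration_sql, count_updated, optimize_table):
    """Run iterations until the EXCEPT-based diff reports zero updates."""
    for i in range(1, MAX_ITERATIONS + 1):
        run_iteration_sql(i)        # execute the unify SQL for iteration i
        updated = count_updated(i)  # convergence query shown above
        optimize_table(i)           # OPTIMIZE the Delta table
        print(f"Iteration {i}: {updated} records updated")
        if updated == 0:
            print(f"Converged after {i} iterations")
            return i
    raise RuntimeError("Loop did not converge within the safety limit")
```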
|
||||||
|
|
||||||
|
### 2. Delta Table Optimization
|
||||||
|
After major operations:
|
||||||
|
```sql
|
||||||
|
OPTIMIZE table_name
|
||||||
|
```
|
||||||
|
Benefits:
|
||||||
|
- Compacts small files
|
||||||
|
- Improves query performance
|
||||||
|
- Reduces storage costs
|
||||||
|
- Optimizes clustering
|
||||||
|
|
||||||
|
### 3. Interactive Error Handling
|
||||||
|
If an error occurs:
|
||||||
|
```
|
||||||
|
✗ File: 04_unify_loop_iteration_01.sql
|
||||||
|
Error: Table not found: source_table
|
||||||
|
|
||||||
|
Continue with remaining files? (y/n):
|
||||||
|
```
|
||||||
|
|
||||||
|
You can choose to:
|
||||||
|
- Continue: Skip failed file, continue with rest
|
||||||
|
- Stop: Halt execution for investigation
|
||||||
|
|
||||||
|
### 4. Real-Time Monitoring
|
||||||
|
Track progress with:
|
||||||
|
- ✓ Completed steps (green)
|
||||||
|
- • Progress indicators (cyan)
|
||||||
|
- ✗ Failed steps (red)
|
||||||
|
- ⚠ Warnings (yellow)
|
||||||
|
- Row counts and execution times
|
||||||
|
|
||||||
|
### 5. Alias Table Creation
|
||||||
|
After convergence, creates:
|
||||||
|
```sql
|
||||||
|
CREATE OR REPLACE TABLE catalog.schema.unified_id_graph_unify_loop_final
|
||||||
|
AS SELECT * FROM catalog.schema.unified_id_graph_unify_loop_3
|
||||||
|
```
|
||||||
|
|
||||||
|
This allows downstream SQL to reference `loop_final` regardless of actual iteration count.
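
For illustration, building the alias statement only requires the catalog/schema, the canonical ID name, and the iteration at which the loop converged (the helper below is a sketch, not the executor's actual code):

```python
def alias_statement(catalog, schema, canonical_id, final_iteration):
    """Build the CREATE OR REPLACE statement for the loop_final alias table."""
    base = f"{catalog}.{schema}.{canonical_id}_graph_unify_loop"
    return (
        f"CREATE OR REPLACE TABLE {base}_final "
        f"AS SELECT * FROM {base}_{final_iteration}"
    )
```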
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Technical Details
|
||||||
|
|
||||||
|
### Python Script Execution
|
||||||
|
The agent executes:
|
||||||
|
```bash
|
||||||
|
python3 scripts/databricks/databricks_sql_executor.py \
|
||||||
|
databricks_sql/unify/ \
|
||||||
|
--server-hostname your-workspace.databricks.com \
|
||||||
|
--http-path /sql/1.0/warehouses/abc123 \
|
||||||
|
--catalog my_catalog \
|
||||||
|
--schema my_schema \
|
||||||
|
--auth-type pat \
|
||||||
|
--optimize-tables
|
||||||
|
```
|
||||||
|
|
||||||
|
### Execution Order
|
||||||
|
1. **Setup Phase** (01-03):
|
||||||
|
- Create graph table (loop_0)
|
||||||
|
- Extract and merge identities
|
||||||
|
- Generate source statistics
|
||||||
|
|
||||||
|
2. **Unification Loop** (04):
|
||||||
|
- Run iterations until convergence
|
||||||
|
- Check after EVERY iteration
|
||||||
|
- Stop when updated_count = 0
|
||||||
|
- Create loop_final alias
|
||||||
|
|
||||||
|
3. **Canonicalization** (05):
|
||||||
|
- Create canonical ID lookup
|
||||||
|
- Create keys and tables metadata
|
||||||
|
- Rename final graph table
|
||||||
|
|
||||||
|
4. **Statistics** (06):
|
||||||
|
- Generate result key statistics
|
||||||
|
- Create histograms
|
||||||
|
- Calculate coverage metrics
|
||||||
|
|
||||||
|
5. **Enrichment** (10-19):
|
||||||
|
- Add canonical IDs to source tables
|
||||||
|
- Create enriched_* tables
|
||||||
|
|
||||||
|
6. **Master Tables** (20-29):
|
||||||
|
- Aggregate attributes
|
||||||
|
- Apply priority rules
|
||||||
|
- Create unified customer profiles
|
||||||
|
|
||||||
|
7. **Metadata** (30-39):
|
||||||
|
- Unification metadata
|
||||||
|
- Filter lookup tables
|
||||||
|
- Column lookup tables
|
||||||
|
|
||||||
|
### Connection Management
|
||||||
|
- Establishes single connection for entire workflow
|
||||||
|
- Uses connection pooling for efficiency
|
||||||
|
- Automatic reconnection on timeout
|
||||||
|
- Proper cleanup on completion or error
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Example Execution
|
||||||
|
|
||||||
|
### Input
|
||||||
|
```
|
||||||
|
SQL directory: databricks_sql/unify/
|
||||||
|
Server hostname: dbc-12345-abc.cloud.databricks.com
|
||||||
|
HTTP path: /sql/1.0/warehouses/6789abcd
|
||||||
|
Catalog: customer_data
|
||||||
|
Schema: id_unification
|
||||||
|
Auth type: pat
|
||||||
|
```
|
||||||
|
|
||||||
|
### Output
|
||||||
|
```
|
||||||
|
✓ Connected to Databricks: dbc-12345-abc.cloud.databricks.com
|
||||||
|
• Using catalog: customer_data, schema: id_unification
|
||||||
|
|
||||||
|
Starting Databricks SQL Execution
|
||||||
|
• Catalog: customer_data
|
||||||
|
• Schema: id_unification
|
||||||
|
• Delta tables: ✓ enabled
|
||||||
|
|
||||||
|
Executing: 01_create_graph.sql
|
||||||
|
✓ 01_create_graph.sql: Executed successfully
|
||||||
|
|
||||||
|
Executing: 02_extract_merge.sql
|
||||||
|
✓ 02_extract_merge.sql: Executed successfully
|
||||||
|
• Rows affected: 125000
|
||||||
|
|
||||||
|
Executing: 03_source_key_stats.sql
|
||||||
|
✓ 03_source_key_stats.sql: Executed successfully
|
||||||
|
|
||||||
|
Executing Unify Loop Before Canonicalization
|
||||||
|
|
||||||
|
--- Iteration 1 ---
|
||||||
|
✓ Iteration 1 completed
|
||||||
|
• Rows processed: 125000
|
||||||
|
• Updated records: 1500
|
||||||
|
• Optimizing Delta table
|
||||||
|
|
||||||
|
--- Iteration 2 ---
|
||||||
|
✓ Iteration 2 completed
|
||||||
|
• Rows processed: 125000
|
||||||
|
• Updated records: 450
|
||||||
|
• Optimizing Delta table
|
||||||
|
|
||||||
|
--- Iteration 3 ---
|
||||||
|
✓ Iteration 3 completed
|
||||||
|
• Rows processed: 125000
|
||||||
|
• Updated records: 0
|
||||||
|
✓ Loop converged after 3 iterations
|
||||||
|
|
||||||
|
• Creating alias table for final iteration
|
||||||
|
✓ Alias table 'unified_id_graph_unify_loop_final' created
|
||||||
|
|
||||||
|
Executing: 05_canonicalize.sql
|
||||||
|
✓ 05_canonicalize.sql: Executed successfully
|
||||||
|
|
||||||
|
[... continues with enrichment, master tables, metadata ...]
|
||||||
|
|
||||||
|
Execution Complete
|
||||||
|
• Files processed: 18/18
|
||||||
|
• Final unified_id_lookup rows: 98,500
|
||||||
|
|
||||||
|
• Disconnected from Databricks
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Monitoring and Troubleshooting
|
||||||
|
|
||||||
|
### Check Execution Progress
|
||||||
|
During execution, you can monitor:
|
||||||
|
- Databricks SQL Warehouse query history
|
||||||
|
- Delta table sizes and row counts
|
||||||
|
- Execution logs in Databricks workspace
|
||||||
|
|
||||||
|
### Common Issues
|
||||||
|
|
||||||
|
**Issue**: Connection timeout
|
||||||
|
**Solution**: Check network access, verify credentials, ensure SQL Warehouse is running
|
||||||
|
|
||||||
|
**Issue**: Table not found
|
||||||
|
**Solution**: Verify catalog/schema permissions, check source table names in YAML
|
||||||
|
|
||||||
|
**Issue**: Loop doesn't converge
|
||||||
|
**Solution**: Check data quality, increase max_iterations, review key validation rules
|
||||||
|
|
||||||
|
**Issue**: Out of memory
|
||||||
|
**Solution**: Increase SQL Warehouse size, optimize clustering, reduce batch sizes
|
||||||
|
|
||||||
|
**Issue**: Permission denied
|
||||||
|
**Solution**: Verify catalog/schema permissions, check Unity Catalog access controls
|
||||||
|
|
||||||
|
### Performance Optimization
|
||||||
|
- Use larger SQL Warehouse for faster execution
|
||||||
|
- Enable auto-scaling for variable workloads
|
||||||
|
- Optimize Delta tables regularly
|
||||||
|
- Use clustering on frequently joined columns
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Post-Execution Validation
|
||||||
|
**DO NOT RUN THESE VALIDATIONS. JUST PRESENT THEM TO THE USER TO RUN ON DATABRICKS.**
|
||||||
|
|
||||||
|
### Check Coverage
|
||||||
|
```sql
|
||||||
|
SELECT
|
||||||
|
COUNT(*) as total_records,
|
||||||
|
COUNT(unified_id) as records_with_id,
|
||||||
|
COUNT(unified_id) * 100.0 / COUNT(*) as coverage_percent
|
||||||
|
FROM catalog.schema.enriched_customer_profiles;
|
||||||
|
```
|
||||||
|
|
||||||
|
### Verify Master Table
|
||||||
|
```sql
|
||||||
|
SELECT COUNT(*) as unified_customers
|
||||||
|
FROM catalog.schema.customer_master;
|
||||||
|
```
|
||||||
|
|
||||||
|
### Review Statistics
|
||||||
|
```sql
|
||||||
|
SELECT * FROM catalog.schema.unified_id_result_key_stats
|
||||||
|
WHERE from_table = '*';
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Success Criteria
|
||||||
|
|
||||||
|
Execution successful when:
|
||||||
|
- ✅ All SQL files processed without critical errors
|
||||||
|
- ✅ Unification loop converged (updated_count = 0)
|
||||||
|
- ✅ Canonical IDs generated for all eligible records
|
||||||
|
- ✅ Enriched tables created successfully
|
||||||
|
- ✅ Master tables populated with attributes
|
||||||
|
- ✅ Coverage metrics meet expectations
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
**Ready to execute your Databricks ID unification workflow?**
|
||||||
|
|
||||||
|
Provide your SQL directory path and Databricks connection details to begin!
|
||||||
401
commands/hybrid-execute-snowflake.md
Normal file
401
commands/hybrid-execute-snowflake.md
Normal file
@@ -0,0 +1,401 @@
|
|||||||
|
---
|
||||||
|
name: hybrid-execute-snowflake
|
||||||
|
description: Execute Snowflake ID unification workflow with convergence detection and monitoring
|
||||||
|
---
|
||||||
|
|
||||||
|
# Execute Snowflake ID Unification Workflow
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
Execute your generated Snowflake SQL workflow with intelligent convergence detection, real-time monitoring, and interactive error handling. This command orchestrates the complete unification process from graph creation to master table generation.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## What You Need
|
||||||
|
|
||||||
|
### Required Inputs
|
||||||
|
1. **SQL Directory**: Path to generated SQL files (e.g., `snowflake_sql/unify/`)
|
||||||
|
2. **Account**: Snowflake account name (e.g., `myaccount` from `myaccount.snowflakecomputing.com`)
|
||||||
|
3. **User**: Snowflake username
|
||||||
|
4. **Database**: Target database name
|
||||||
|
5. **Schema**: Target schema name
|
||||||
|
6. **Warehouse**: Compute warehouse name (defaults to `COMPUTE_WH`)
|
||||||
|
|
||||||
|
### Authentication
|
||||||
|
**Option 1: Password**
|
||||||
|
- Can be provided as an argument, via the environment variable `SNOWFLAKE_PASSWORD`, or from an environment file (.env)
|
||||||
|
- Will prompt if not provided
|
||||||
|
|
||||||
|
**Option 2: SSO (externalbrowser)**
|
||||||
|
- Opens browser for authentication
|
||||||
|
- No password required
|
||||||
|
|
||||||
|
**Option 3: Key-Pair**
|
||||||
|
- Private key path via `SNOWFLAKE_PRIVATE_KEY_PATH`
|
||||||
|
- Passphrase via `SNOWFLAKE_PRIVATE_KEY_PASSPHRASE`
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## What I'll Do
|
||||||
|
|
||||||
|
### Step 1: Connection Setup
|
||||||
|
- Connect to your Snowflake account
|
||||||
|
- Validate credentials and permissions
|
||||||
|
- Set database and schema context
|
||||||
|
- Verify SQL directory exists
|
||||||
|
- Activate warehouse
|
||||||
|
|
||||||
|
### Step 2: Execution Plan
|
||||||
|
Display execution plan with:
|
||||||
|
- All SQL files in execution order
|
||||||
|
- File types (Setup, Loop Iteration, Enrichment, Master Table, etc.)
|
||||||
|
- Estimated steps and dependencies
|
||||||
|
|
||||||
|
### Step 3: SQL Execution
|
||||||
|
I'll call the **snowflake-workflow-executor agent** to:
|
||||||
|
- Execute SQL files in proper sequence
|
||||||
|
- Skip loop iteration files (handled separately)
|
||||||
|
- Monitor progress with real-time feedback
|
||||||
|
- Track row counts and execution times
|
||||||
|
|
||||||
|
### Step 4: Unify Loop with Convergence Detection
|
||||||
|
**Intelligent Loop Execution**:
|
||||||
|
```
|
||||||
|
Iteration 1:
|
||||||
|
✓ Execute unify SQL
|
||||||
|
• Check convergence: 1500 records updated
|
||||||
|
→ Continue to iteration 2
|
||||||
|
|
||||||
|
Iteration 2:
|
||||||
|
✓ Execute unify SQL
|
||||||
|
• Check convergence: 450 records updated
|
||||||
|
→ Continue to iteration 3
|
||||||
|
|
||||||
|
Iteration 3:
|
||||||
|
✓ Execute unify SQL
|
||||||
|
• Check convergence: 0 records updated
|
||||||
|
✓ CONVERGED! Stop loop
|
||||||
|
```
|
||||||
|
|
||||||
|
**Features**:
|
||||||
|
- Runs until convergence (updated_count = 0)
|
||||||
|
- Maximum 30 iterations safety limit
|
||||||
|
- Creates alias table (loop_final) for downstream processing
|
||||||
|
|
||||||
|
### Step 5: Post-Loop Processing
|
||||||
|
- Execute canonicalization step
|
||||||
|
- Generate result statistics
|
||||||
|
- Enrich source tables with canonical IDs
|
||||||
|
- Create master tables
|
||||||
|
- Generate metadata and lookup tables
|
||||||
|
|
||||||
|
### Step 6: Final Report
|
||||||
|
Provide:
|
||||||
|
- Total execution time
|
||||||
|
- Files processed successfully
|
||||||
|
- Convergence statistics
|
||||||
|
- Final table row counts
|
||||||
|
- Next steps and recommendations
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Command Usage
|
||||||
|
|
||||||
|
### Interactive Mode (Recommended)
|
||||||
|
```
|
||||||
|
/cdp-hybrid-idu:hybrid-execute-snowflake
|
||||||
|
|
||||||
|
I'll prompt you for:
|
||||||
|
- SQL directory path
|
||||||
|
- Snowflake account name
|
||||||
|
- Username
|
||||||
|
- Database and schema
|
||||||
|
- Warehouse name
|
||||||
|
- Authentication method
|
||||||
|
```
|
||||||
|
|
||||||
|
### Advanced Mode
|
||||||
|
Provide all parameters upfront:
|
||||||
|
```
|
||||||
|
SQL directory: snowflake_sql/unify/
|
||||||
|
Account: myaccount
|
||||||
|
User: myuser
|
||||||
|
Database: my_database
|
||||||
|
Schema: my_schema
|
||||||
|
Warehouse: COMPUTE_WH
|
||||||
|
Password: (will prompt if not in environment)
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Execution Features
|
||||||
|
|
||||||
|
### 1. Convergence Detection
|
||||||
|
**Algorithm**:
|
||||||
|
```sql
|
||||||
|
SELECT COUNT(*) as updated_count FROM (
|
||||||
|
SELECT leader_ns, leader_id, follower_ns, follower_id
|
||||||
|
FROM current_iteration
|
||||||
|
EXCEPT
|
||||||
|
SELECT leader_ns, leader_id, follower_ns, follower_id
|
||||||
|
FROM previous_iteration
|
||||||
|
) diff
|
||||||
|
```
|
||||||
|
|
||||||
|
**Stops when**: updated_count = 0
|
||||||
|
|
||||||
|
### 2. Interactive Error Handling
|
||||||
|
If an error occurs:
|
||||||
|
```
|
||||||
|
✗ File: 04_unify_loop_iteration_01.sql
|
||||||
|
Error: Table not found: source_table
|
||||||
|
|
||||||
|
Continue with remaining files? (y/n):
|
||||||
|
```
|
||||||
|
|
||||||
|
You can choose to:
|
||||||
|
- Continue: Skip failed file, continue with rest
|
||||||
|
- Stop: Halt execution for investigation
|
||||||
|
|
||||||
|
### 3. Real-Time Monitoring
|
||||||
|
Track progress with:
|
||||||
|
- ✓ Completed steps (green)
|
||||||
|
- • Progress indicators (cyan)
|
||||||
|
- ✗ Failed steps (red)
|
||||||
|
- ⚠ Warnings (yellow)
|
||||||
|
- Row counts and execution times
|
||||||
|
|
||||||
|
### 4. Alias Table Creation
|
||||||
|
After convergence, creates:
|
||||||
|
```sql
|
||||||
|
CREATE OR REPLACE TABLE database.schema.unified_id_graph_unify_loop_final
|
||||||
|
AS SELECT * FROM database.schema.unified_id_graph_unify_loop_3
|
||||||
|
```
|
||||||
|
|
||||||
|
This allows downstream SQL to reference `loop_final` regardless of actual iteration count.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Technical Details
|
||||||
|
|
||||||
|
### Python Script Execution
|
||||||
|
The agent executes:
|
||||||
|
```bash
|
||||||
|
python3 scripts/snowflake/snowflake_sql_executor.py \
|
||||||
|
snowflake_sql/unify/ \
|
||||||
|
--account myaccount \
|
||||||
|
--user myuser \
|
||||||
|
--database my_database \
|
||||||
|
--schema my_schema \
|
||||||
|
--warehouse COMPUTE_WH
|
||||||
|
```
|
||||||
|
|
||||||
|
### Execution Order
|
||||||
|
1. **Setup Phase** (01-03):
|
||||||
|
- Create graph table (loop_0)
|
||||||
|
- Extract and merge identities
|
||||||
|
- Generate source statistics
|
||||||
|
|
||||||
|
2. **Unification Loop** (04):
|
||||||
|
- Run iterations until convergence
|
||||||
|
- Check after EVERY iteration
|
||||||
|
- Stop when updated_count = 0
|
||||||
|
- Create loop_final alias
|
||||||
|
|
||||||
|
3. **Canonicalization** (05):
|
||||||
|
- Create canonical ID lookup
|
||||||
|
- Create keys and tables metadata
|
||||||
|
- Rename final graph table
|
||||||
|
|
||||||
|
4. **Statistics** (06):
|
||||||
|
- Generate result key statistics
|
||||||
|
- Create histograms
|
||||||
|
- Calculate coverage metrics
|
||||||
|
|
||||||
|
5. **Enrichment** (10-19):
|
||||||
|
- Add canonical IDs to source tables
|
||||||
|
- Create enriched_* tables
|
||||||
|
|
||||||
|
6. **Master Tables** (20-29):
|
||||||
|
- Aggregate attributes
|
||||||
|
- Apply priority rules
|
||||||
|
- Create unified customer profiles
|
||||||
|
|
||||||
|
7. **Metadata** (30-39):
|
||||||
|
- Unification metadata
|
||||||
|
- Filter lookup tables
|
||||||
|
- Column lookup tables
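As a rough illustration of how the numeric prefixes above translate into an execution plan (the file layout matches the generated output, but the grouping logic is an assumption, not the real executor code):

```python
from pathlib import Path

def plan_execution(sql_dir: str):
    """Split generated SQL files into pre-loop, loop, and post-loop phases by prefix."""
    files = sorted(Path(sql_dir).glob("*.sql"), key=lambda p: p.name)
    before_loop = [p for p in files if p.name < "04"]             # 01-03 setup
    loop_files  = [p for p in files if p.name.startswith("04_")]  # convergence loop
    after_loop  = [p for p in files if p.name >= "05"]            # 05+ canonicalize onwards
    return before_loop, loop_files, after_loop

# Example: before, loop, after = plan_execution("snowflake_sql/unify/")
```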
|
||||||
|
|
||||||
|
### Connection Management
|
||||||
|
- Establishes single connection for entire workflow
|
||||||
|
- Uses connection pooling for efficiency
|
||||||
|
- Automatic reconnection on timeout
|
||||||
|
- Proper cleanup on completion or error
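A minimal sketch of this single-connection pattern using the standard `snowflake-connector-python` package; the real `snowflake_sql_executor.py` may handle retries, pooling, and authentication differently:

```python
import os
import snowflake.connector

def execute_workflow(statements, account, user, database, schema, warehouse):
    """Open one connection, run every statement on it, and always clean up."""
    conn = snowflake.connector.connect(
        account=account,
        user=user,
        password=os.environ.get("SNOWFLAKE_PASSWORD"),
        database=database,
        schema=schema,
        warehouse=warehouse,
    )
    try:
        cur = conn.cursor()
        for stmt in statements:
            cur.execute(stmt)   # same connection reused for the entire workflow
    finally:
        conn.close()            # cleanup on completion or error
```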
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Example Execution
|
||||||
|
|
||||||
|
### Input
|
||||||
|
```
|
||||||
|
SQL directory: snowflake_sql/unify/
|
||||||
|
Account: myorg-myaccount
|
||||||
|
User: analytics_user
|
||||||
|
Database: customer_data
|
||||||
|
Schema: id_unification
|
||||||
|
Warehouse: LARGE_WH
|
||||||
|
```
|
||||||
|
|
||||||
|
### Output
|
||||||
|
```
|
||||||
|
✓ Connected to Snowflake: myorg-myaccount
|
||||||
|
• Using database: customer_data, schema: id_unification
|
||||||
|
|
||||||
|
Starting Snowflake SQL Execution
|
||||||
|
• Database: customer_data
|
||||||
|
• Schema: id_unification
|
||||||
|
|
||||||
|
Executing: 01_create_graph.sql
|
||||||
|
✓ 01_create_graph.sql: Executed successfully
|
||||||
|
|
||||||
|
Executing: 02_extract_merge.sql
|
||||||
|
✓ 02_extract_merge.sql: Executed successfully
|
||||||
|
• Rows affected: 125000
|
||||||
|
|
||||||
|
Executing: 03_source_key_stats.sql
|
||||||
|
✓ 03_source_key_stats.sql: Executed successfully
|
||||||
|
|
||||||
|
Executing Unify Loop Before Canonicalization
|
||||||
|
|
||||||
|
--- Iteration 1 ---
|
||||||
|
✓ Iteration 1 completed
|
||||||
|
• Rows processed: 125000
|
||||||
|
• Updated records: 1500
|
||||||
|
|
||||||
|
--- Iteration 2 ---
|
||||||
|
✓ Iteration 2 completed
|
||||||
|
• Rows processed: 125000
|
||||||
|
• Updated records: 450
|
||||||
|
|
||||||
|
--- Iteration 3 ---
|
||||||
|
✓ Iteration 3 completed
|
||||||
|
• Rows processed: 125000
|
||||||
|
• Updated records: 0
|
||||||
|
✓ Loop converged after 3 iterations
|
||||||
|
|
||||||
|
• Creating alias table for final iteration
|
||||||
|
✓ Alias table 'unified_id_graph_unify_loop_final' created
|
||||||
|
|
||||||
|
Executing: 05_canonicalize.sql
|
||||||
|
✓ 05_canonicalize.sql: Executed successfully
|
||||||
|
|
||||||
|
[... continues with enrichment, master tables, metadata ...]
|
||||||
|
|
||||||
|
Execution Complete
|
||||||
|
• Files processed: 18/18
|
||||||
|
• Final unified_id_lookup rows: 98,500
|
||||||
|
|
||||||
|
• Disconnected from Snowflake
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Monitoring and Troubleshooting
|
||||||
|
|
||||||
|
### Check Execution Progress
|
||||||
|
During execution, you can monitor:
|
||||||
|
- Snowflake query history
|
||||||
|
- Table sizes and row counts
|
||||||
|
- Warehouse utilization
|
||||||
|
- Execution logs
|
||||||
|
|
||||||
|
### Common Issues
|
||||||
|
|
||||||
|
**Issue**: Connection timeout
|
||||||
|
**Solution**: Check network access, verify credentials, ensure warehouse is running
|
||||||
|
|
||||||
|
**Issue**: Table not found
|
||||||
|
**Solution**: Verify database/schema permissions, check source table names in YAML
|
||||||
|
|
||||||
|
**Issue**: Loop doesn't converge
|
||||||
|
**Solution**: Check data quality, increase the iteration limit (`merge_iterations` in the YAML), and review key validation rules
|
||||||
|
|
||||||
|
**Issue**: Warehouse suspended
|
||||||
|
**Solution**: Ensure auto-resume is enabled, manually resume warehouse if needed
|
||||||
|
|
||||||
|
**Issue**: Permission denied
|
||||||
|
**Solution**: Verify database/schema permissions, check role assignments
|
||||||
|
|
||||||
|
### Performance Optimization
|
||||||
|
- Use larger warehouse for faster execution (L, XL, 2XL, etc.)
|
||||||
|
- Enable multi-cluster warehouse for concurrency
|
||||||
|
- Use clustering keys on frequently joined columns
|
||||||
|
- Monitor query profiles for optimization opportunities
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Post-Execution Validation
|
||||||
|
**DO NOT RUN THESE VALIDATION QUERIES. JUST PRESENT THEM TO THE USER TO RUN ON SNOWFLAKE.**
|
||||||
|
|
||||||
|
### Check Coverage
|
||||||
|
```sql
|
||||||
|
SELECT
|
||||||
|
COUNT(*) as total_records,
|
||||||
|
COUNT(unified_id) as records_with_id,
|
||||||
|
COUNT(unified_id) * 100.0 / COUNT(*) as coverage_percent
|
||||||
|
FROM database.schema.enriched_customer_profiles;
|
||||||
|
```
|
||||||
|
|
||||||
|
### Verify Master Table
|
||||||
|
```sql
|
||||||
|
SELECT COUNT(*) as unified_customers
|
||||||
|
FROM database.schema.customer_master;
|
||||||
|
```
|
||||||
|
|
||||||
|
### Review Statistics
|
||||||
|
```sql
|
||||||
|
SELECT * FROM database.schema.unified_id_result_key_stats
|
||||||
|
WHERE from_table = '*';
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Success Criteria
|
||||||
|
|
||||||
|
Execution successful when:
|
||||||
|
- ✅ All SQL files processed without critical errors
|
||||||
|
- ✅ Unification loop converged (updated_count = 0)
|
||||||
|
- ✅ Canonical IDs generated for all eligible records
|
||||||
|
- ✅ Enriched tables created successfully
|
||||||
|
- ✅ Master tables populated with attributes
|
||||||
|
- ✅ Coverage metrics meet expectations
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Authentication Examples
|
||||||
|
|
||||||
|
### Using Password
|
||||||
|
```bash
|
||||||
|
export SNOWFLAKE_PASSWORD='your_password'
|
||||||
|
/cdp-hybrid-idu:hybrid-execute-snowflake
|
||||||
|
```
|
||||||
|
|
||||||
|
### Using SSO
|
||||||
|
```bash
|
||||||
|
/cdp-hybrid-idu:hybrid-execute-snowflake
|
||||||
|
# Will prompt: Use SSO authentication? (y/n): y
|
||||||
|
# Opens browser for authentication
|
||||||
|
```
|
||||||
|
|
||||||
|
### Using Key-Pair
|
||||||
|
```bash
|
||||||
|
export SNOWFLAKE_PRIVATE_KEY_PATH='/path/to/key.p8'
|
||||||
|
export SNOWFLAKE_PRIVATE_KEY_PASSPHRASE='passphrase'
|
||||||
|
/cdp-hybrid-idu:hybrid-execute-snowflake
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
**Ready to execute your Snowflake ID unification workflow?**
|
||||||
|
|
||||||
|
Provide your SQL directory path and Snowflake connection details to begin!
|
||||||
285
commands/hybrid-generate-databricks.md
Normal file
@@ -0,0 +1,285 @@
|
|||||||
|
---
|
||||||
|
name: hybrid-generate-databricks
|
||||||
|
description: Generate Databricks Delta Lake SQL from YAML configuration for ID unification
|
||||||
|
---
|
||||||
|
|
||||||
|
# Generate Databricks SQL from YAML
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
Generate a production-ready Databricks SQL workflow from your `unify.yml` configuration file. This command creates Delta Lake-optimized SQL files with ACID transactions, clustering, and platform-specific function conversions.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## What You Need
|
||||||
|
|
||||||
|
### Required Inputs
|
||||||
|
1. **YAML Configuration File**: Path to your `unify.yml`
|
||||||
|
2. **Target Catalog**: Databricks Unity Catalog name
|
||||||
|
3. **Target Schema**: Schema name within the catalog
|
||||||
|
|
||||||
|
### Optional Inputs
|
||||||
|
4. **Source Catalog**: Catalog containing source tables (defaults to target catalog)
|
||||||
|
5. **Source Schema**: Schema containing source tables (defaults to target schema)
|
||||||
|
6. **Output Directory**: Where to save generated SQL (defaults to `databricks_sql/`)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## What I'll Do
|
||||||
|
|
||||||
|
### Step 1: Validation
|
||||||
|
- Verify `unify.yml` exists and is valid
|
||||||
|
- Check YAML syntax and structure
|
||||||
|
- Validate keys, tables, and configuration sections
|
||||||
|
|
||||||
|
### Step 2: SQL Generation
|
||||||
|
I'll call the **databricks-sql-generator agent** to:
|
||||||
|
- Execute `yaml_unification_to_databricks.py` Python script
|
||||||
|
- Apply Databricks-specific SQL conversions:
|
||||||
|
- `ARRAY_SIZE` → `SIZE`
|
||||||
|
- `ARRAY_CONSTRUCT` → `ARRAY`
|
||||||
|
- `OBJECT_CONSTRUCT` → `STRUCT`
|
||||||
|
- `COLLECT_LIST` for aggregations
|
||||||
|
- `FLATTEN` for array operations
|
||||||
|
- `UNIX_TIMESTAMP()` for time functions
|
||||||
|
- Generate Delta Lake table definitions with clustering
|
||||||
|
- Create convergence detection logic
|
||||||
|
- Build cryptographic hashing for canonical IDs
|
||||||
|
|
||||||
|
### Step 3: Output Organization
|
||||||
|
Generate complete SQL workflow in this structure:
|
||||||
|
```
|
||||||
|
databricks_sql/unify/
|
||||||
|
├── 01_create_graph.sql # Initialize graph with USING DELTA
|
||||||
|
├── 02_extract_merge.sql # Extract identities with validation
|
||||||
|
├── 03_source_key_stats.sql # Source statistics with GROUPING SETS
|
||||||
|
├── 04_unify_loop_iteration_*.sql # Loop iterations (auto-calculated count)
|
||||||
|
├── 05_canonicalize.sql # Canonical ID creation with key masks
|
||||||
|
├── 06_result_key_stats.sql # Result statistics with histograms
|
||||||
|
├── 10_enrich_*.sql # Enrich each source table
|
||||||
|
├── 20_master_*.sql # Master tables with attribute aggregation
|
||||||
|
├── 30_unification_metadata.sql # Metadata tables
|
||||||
|
├── 31_filter_lookup.sql # Validation rules lookup
|
||||||
|
└── 32_column_lookup.sql # Column mapping lookup
|
||||||
|
```
|
||||||
|
|
||||||
|
### Step 4: Summary Report
|
||||||
|
Provide:
|
||||||
|
- Total SQL files generated
|
||||||
|
- Estimated execution order
|
||||||
|
- Delta Lake optimizations included
|
||||||
|
- Key features enabled
|
||||||
|
- Next steps for execution
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Command Usage
|
||||||
|
|
||||||
|
### Basic Usage
|
||||||
|
```
|
||||||
|
/cdp-hybrid-idu:hybrid-generate-databricks
|
||||||
|
|
||||||
|
I'll prompt you for:
|
||||||
|
- YAML file path
|
||||||
|
- Target catalog
|
||||||
|
- Target schema
|
||||||
|
```
|
||||||
|
|
||||||
|
### Advanced Usage
|
||||||
|
Provide all parameters upfront:
|
||||||
|
```
|
||||||
|
YAML file: /path/to/unify.yml
|
||||||
|
Target catalog: my_catalog
|
||||||
|
Target schema: my_schema
|
||||||
|
Source catalog: source_catalog (optional)
|
||||||
|
Source schema: source_schema (optional)
|
||||||
|
Output directory: custom_output/ (optional)
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Generated SQL Features
|
||||||
|
|
||||||
|
### Delta Lake Optimizations
|
||||||
|
- **ACID Transactions**: `USING DELTA` for all tables
|
||||||
|
- **Clustering**: `CLUSTER BY (follower_id)` on graph tables
|
||||||
|
- **Table Properties**: Optimized for large-scale joins
|
||||||
|
|
||||||
|
### Advanced Capabilities
|
||||||
|
1. **Dynamic Iteration Count**: Auto-calculates based on:
|
||||||
|
- Number of merge keys
|
||||||
|
- Number of tables
|
||||||
|
- Data complexity (configurable via YAML)
|
||||||
|
|
||||||
|
2. **Key-Specific Hashing**: Each key uses a unique cryptographic mask (see the sketch after this list):
|
||||||
|
```
|
||||||
|
Key Type 1 (email): 0ffdbcf0c666ce190d
|
||||||
|
Key Type 2 (customer_id): 61a821f2b646a4e890
|
||||||
|
Key Type 3 (phone): acd2206c3f88b3ee27
|
||||||
|
```
|
||||||
|
|
||||||
|
3. **Validation Rules**:
|
||||||
|
- `valid_regexp`: Regex pattern filtering
|
||||||
|
- `invalid_texts`: NOT IN clause with NULL handling
|
||||||
|
- Combined AND logic for strict validation
|
||||||
|
|
||||||
|
4. **Master Table Attributes**:
|
||||||
|
- Single value: `MAX_BY(attr, order)` with COALESCE
|
||||||
|
- Array values: `SLICE(CONCAT(arrays), 1, N)`
|
||||||
|
- Priority-based selection
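For illustration, the key-specific hashing idea from item 2 above can be sketched as follows; the mask values mirror the examples shown earlier, but the SHA-1 scheme and concatenation order are assumptions, not the generator's actual algorithm:

```python
import hashlib

# Per-key masks keep identical raw values under different key types from colliding.
KEY_MASKS = {
    "email": "0ffdbcf0c666ce190d",
    "customer_id": "61a821f2b646a4e890",
    "phone": "acd2206c3f88b3ee27",
}

def namespaced_hash(key_name: str, value: str) -> str:
    """Hash a normalized identifier value under its key-specific mask."""
    mask = KEY_MASKS[key_name]
    return hashlib.sha1(f"{mask}:{value.strip().lower()}".encode("utf-8")).hexdigest()

print(namespaced_hash("email", "User@Example.com"))
```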
|
||||||
|
|
||||||
|
### Platform-Specific Conversions
|
||||||
|
The generator automatically converts:
|
||||||
|
- Presto functions → Databricks equivalents
|
||||||
|
- Snowflake functions → Databricks equivalents
|
||||||
|
- Array operations → Spark SQL syntax
|
||||||
|
- Window functions → optimized versions
|
||||||
|
- Time functions → UNIX_TIMESTAMP()
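A simplified sketch of this kind of function-name rewrite; the mapping below covers only a few of the conversions listed above, and the real `yaml_unification_to_databricks.py` may rewrite SQL quite differently:

```python
import re

SNOWFLAKE_TO_DATABRICKS = {
    "ARRAY_SIZE": "SIZE",
    "ARRAY_CONSTRUCT": "ARRAY",
    "OBJECT_CONSTRUCT": "STRUCT",
}

def convert_functions(sql: str) -> str:
    """Replace whole-word function names, e.g. ARRAY_SIZE(ids) -> SIZE(ids)."""
    for src, dst in SNOWFLAKE_TO_DATABRICKS.items():
        sql = re.sub(rf"\b{src}\b", dst, sql, flags=re.IGNORECASE)
    return sql

print(convert_functions("SELECT ARRAY_SIZE(ids) FROM t"))  # SELECT SIZE(ids) FROM t
```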
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Example Workflow
|
||||||
|
|
||||||
|
### Input YAML (`unify.yml`)
|
||||||
|
```yaml
|
||||||
|
name: customer_unification
|
||||||
|
|
||||||
|
keys:
|
||||||
|
- name: email
|
||||||
|
valid_regexp: ".*@.*"
|
||||||
|
invalid_texts: ['', 'N/A', 'null']
|
||||||
|
- name: customer_id
|
||||||
|
invalid_texts: ['', 'N/A']
|
||||||
|
|
||||||
|
tables:
|
||||||
|
- table: customer_profiles
|
||||||
|
key_columns:
|
||||||
|
- {column: email_std, key: email}
|
||||||
|
- {column: customer_id, key: customer_id}
|
||||||
|
|
||||||
|
canonical_ids:
|
||||||
|
- name: unified_id
|
||||||
|
merge_by_keys: [email, customer_id]
|
||||||
|
merge_iterations: 15
|
||||||
|
|
||||||
|
master_tables:
|
||||||
|
- name: customer_master
|
||||||
|
canonical_id: unified_id
|
||||||
|
attributes:
|
||||||
|
- name: best_email
|
||||||
|
source_columns:
|
||||||
|
- {table: customer_profiles, column: email_std, priority: 1}
|
||||||
|
```
|
||||||
|
|
||||||
|
### Generated Output
|
||||||
|
```
|
||||||
|
databricks_sql/unify/
|
||||||
|
├── 01_create_graph.sql # Creates unified_id_graph_unify_loop_0
|
||||||
|
├── 02_extract_merge.sql # Merges customer_profiles keys
|
||||||
|
├── 03_source_key_stats.sql # Stats by table
|
||||||
|
├── 04_unify_loop_iteration_01.sql # First iteration
|
||||||
|
├── 04_unify_loop_iteration_02.sql # Second iteration
|
||||||
|
├── ... # Up to iteration_05
|
||||||
|
├── 05_canonicalize.sql # Creates unified_id_lookup
|
||||||
|
├── 06_result_key_stats.sql # Final statistics
|
||||||
|
├── 10_enrich_customer_profiles.sql # Adds unified_id column
|
||||||
|
├── 20_master_customer_master.sql # Creates customer_master table
|
||||||
|
├── 30_unification_metadata.sql # Metadata
|
||||||
|
├── 31_filter_lookup.sql # Validation rules
|
||||||
|
└── 32_column_lookup.sql # Column mappings
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Next Steps After Generation
|
||||||
|
|
||||||
|
### Option 1: Execute Immediately
|
||||||
|
Use the execution command:
|
||||||
|
```
|
||||||
|
/cdp-hybrid-idu:hybrid-execute-databricks
|
||||||
|
```
|
||||||
|
|
||||||
|
### Option 2: Review First
|
||||||
|
1. Examine generated SQL files
|
||||||
|
2. Verify table names and transformations
|
||||||
|
3. Test with sample data
|
||||||
|
4. Execute manually or via execution command
|
||||||
|
|
||||||
|
### Option 3: Customize
|
||||||
|
1. Modify generated SQL as needed
|
||||||
|
2. Add custom logic or transformations
|
||||||
|
3. Execute using Databricks SQL editor or execution command
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Technical Details
|
||||||
|
|
||||||
|
### Python Script Execution
|
||||||
|
The agent executes:
|
||||||
|
```bash
|
||||||
|
python3 scripts/databricks/yaml_unification_to_databricks.py \
|
||||||
|
unify.yml \
|
||||||
|
-tc my_catalog \
|
||||||
|
-ts my_schema \
|
||||||
|
-sc source_catalog \
|
||||||
|
-ss source_schema \
|
||||||
|
-o databricks_sql
|
||||||
|
```
|
||||||
|
|
||||||
|
### SQL File Naming Convention
|
||||||
|
- `01-09`: Setup and initialization
|
||||||
|
- `10-19`: Source table enrichment
|
||||||
|
- `20-29`: Master table creation
|
||||||
|
- `30-39`: Metadata and lookup tables
|
||||||
|
- `04_*_NN`: Loop iterations (auto-numbered)
|
||||||
|
|
||||||
|
### Convergence Detection
|
||||||
|
Each loop iteration includes:
|
||||||
|
```sql
|
||||||
|
-- Check if graph changed
|
||||||
|
SELECT COUNT(*) FROM (
|
||||||
|
SELECT leader_ns, leader_id, follower_ns, follower_id
|
||||||
|
FROM iteration_N
|
||||||
|
EXCEPT
|
||||||
|
SELECT leader_ns, leader_id, follower_ns, follower_id
|
||||||
|
FROM iteration_N_minus_1
|
||||||
|
) diff
|
||||||
|
```
|
||||||
|
Stops when count = 0
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Troubleshooting
|
||||||
|
|
||||||
|
### Common Issues
|
||||||
|
|
||||||
|
**Issue**: YAML validation error
|
||||||
|
**Solution**: Check YAML syntax, ensure proper indentation, verify all required fields
|
||||||
|
|
||||||
|
**Issue**: Table not found error
|
||||||
|
**Solution**: Verify source catalog/schema, check table names in YAML
|
||||||
|
|
||||||
|
**Issue**: Python script error
|
||||||
|
**Solution**: Ensure Python 3.7+ is installed and the `pyyaml` dependency is available
|
||||||
|
|
||||||
|
**Issue**: Too many/few iterations
|
||||||
|
**Solution**: Adjust `merge_iterations` in canonical_ids section of YAML
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Success Criteria
|
||||||
|
|
||||||
|
Generated SQL will:
|
||||||
|
- ✅ Be valid Databricks Spark SQL
|
||||||
|
- ✅ Use Delta Lake for ACID transactions
|
||||||
|
- ✅ Include proper clustering for performance
|
||||||
|
- ✅ Have convergence detection built-in
|
||||||
|
- ✅ Support incremental processing
|
||||||
|
- ✅ Generate comprehensive statistics
|
||||||
|
- ✅ Work without modification on Databricks
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
**Ready to generate Databricks SQL from your YAML configuration?**
|
||||||
|
|
||||||
|
Provide your YAML file path and target catalog/schema to begin!
|
||||||
288
commands/hybrid-generate-snowflake.md
Normal file
@@ -0,0 +1,288 @@
|
|||||||
|
---
|
||||||
|
name: hybrid-generate-snowflake
|
||||||
|
description: Generate Snowflake SQL from YAML configuration for ID unification
|
||||||
|
---
|
||||||
|
|
||||||
|
# Generate Snowflake SQL from YAML
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
Generate a production-ready Snowflake SQL workflow from your `unify.yml` configuration file. This command creates Snowflake-native SQL files with proper clustering, VARIANT support, and platform-specific function conversions.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## What You Need
|
||||||
|
|
||||||
|
### Required Inputs
|
||||||
|
1. **YAML Configuration File**: Path to your `unify.yml`
|
||||||
|
2. **Target Database**: Snowflake database name
|
||||||
|
3. **Target Schema**: Schema name within the database
|
||||||
|
|
||||||
|
### Optional Inputs
|
||||||
|
4. **Source Database**: Database containing source tables (defaults to target database)
|
||||||
|
5. **Source Schema**: Schema containing source tables (defaults to PUBLIC)
|
||||||
|
6. **Output Directory**: Where to save generated SQL (defaults to `snowflake_sql/`)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## What I'll Do
|
||||||
|
|
||||||
|
### Step 1: Validation
|
||||||
|
- Verify `unify.yml` exists and is valid
|
||||||
|
- Check YAML syntax and structure
|
||||||
|
- Validate keys, tables, and configuration sections
|
||||||
|
|
||||||
|
### Step 2: SQL Generation
|
||||||
|
I'll call the **snowflake-sql-generator agent** to:
|
||||||
|
- Execute `yaml_unification_to_snowflake.py` Python script
|
||||||
|
- Generate Snowflake table definitions with clustering
|
||||||
|
- Create convergence detection logic
|
||||||
|
- Build cryptographic hashing for canonical IDs
|
||||||
|
|
||||||
|
### Step 3: Output Organization
|
||||||
|
Generate complete SQL workflow in this structure:
|
||||||
|
```
|
||||||
|
snowflake_sql/unify/
|
||||||
|
├── 01_create_graph.sql # Initialize graph table
|
||||||
|
├── 02_extract_merge.sql # Extract identities with validation
|
||||||
|
├── 03_source_key_stats.sql # Source statistics with GROUPING SETS
|
||||||
|
├── 04_unify_loop_iteration_*.sql # Loop iterations (auto-calculated count)
|
||||||
|
├── 05_canonicalize.sql # Canonical ID creation with key masks
|
||||||
|
├── 06_result_key_stats.sql # Result statistics with histograms
|
||||||
|
├── 10_enrich_*.sql # Enrich each source table
|
||||||
|
├── 20_master_*.sql # Master tables with attribute aggregation
|
||||||
|
├── 30_unification_metadata.sql # Metadata tables
|
||||||
|
├── 31_filter_lookup.sql # Validation rules lookup
|
||||||
|
└── 32_column_lookup.sql # Column mapping lookup
|
||||||
|
```
|
||||||
|
|
||||||
|
### Step 4: Summary Report
|
||||||
|
Provide:
|
||||||
|
- Total SQL files generated
|
||||||
|
- Estimated execution order
|
||||||
|
- Snowflake optimizations included
|
||||||
|
- Key features enabled
|
||||||
|
- Next steps for execution
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Command Usage
|
||||||
|
|
||||||
|
### Basic Usage
|
||||||
|
```
|
||||||
|
/cdp-hybrid-idu:hybrid-generate-snowflake
|
||||||
|
|
||||||
|
I'll prompt you for:
|
||||||
|
- YAML file path
|
||||||
|
- Target database
|
||||||
|
- Target schema
|
||||||
|
```
|
||||||
|
|
||||||
|
### Advanced Usage
|
||||||
|
Provide all parameters upfront:
|
||||||
|
```
|
||||||
|
YAML file: /path/to/unify.yml
|
||||||
|
Target database: my_database
|
||||||
|
Target schema: my_schema
|
||||||
|
Source database: source_database (optional)
|
||||||
|
Source schema: PUBLIC (optional, defaults to PUBLIC)
|
||||||
|
Output directory: custom_output/ (optional)
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Generated SQL Features
|
||||||
|
|
||||||
|
### Snowflake Optimizations
|
||||||
|
- **Clustering**: `CLUSTER BY (follower_id)` on graph tables
|
||||||
|
- **VARIANT Support**: Flexible data structures for arrays and objects
|
||||||
|
- **Native Functions**: Snowflake-specific optimized functions
|
||||||
|
|
||||||
|
### Advanced Capabilities
|
||||||
|
1. **Dynamic Iteration Count**: Auto-calculates based on:
|
||||||
|
- Number of merge keys
|
||||||
|
- Number of tables
|
||||||
|
- Data complexity (configurable via YAML)
|
||||||
|
|
||||||
|
2. **Key-Specific Hashing**: Each key uses a unique cryptographic mask:
|
||||||
|
```
|
||||||
|
Key Type 1 (email): 0ffdbcf0c666ce190d
|
||||||
|
Key Type 2 (customer_id): 61a821f2b646a4e890
|
||||||
|
Key Type 3 (phone): acd2206c3f88b3ee27
|
||||||
|
```
|
||||||
|
|
||||||
|
3. **Validation Rules**:
|
||||||
|
- `valid_regexp`: REGEXP_LIKE pattern filtering
|
||||||
|
- `invalid_texts`: NOT IN clause with proper NULL handling
|
||||||
|
- Combined AND logic for strict validation
|
||||||
|
|
||||||
|
4. **Master Table Attributes** (see the sketch after this list):
|
||||||
|
- Single value: `MAX_BY(attr, order)` with COALESCE
|
||||||
|
- Array values: `ARRAY_SLICE(ARRAY_CAT(arrays), 0, N)`
|
||||||
|
- Priority-based selection
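To make item 4 concrete, here is an illustrative helper that turns a `source_columns` spec into the MAX_BY/COALESCE expression described above; the generated fragment is an assumption about the output shape, not the exact SQL the generator emits:

```python
def single_value_attribute(name: str, source_columns: list) -> str:
    """Build a priority-ordered COALESCE of MAX_BY expressions for one attribute."""
    ordered = sorted(source_columns, key=lambda c: c["priority"])
    parts = [
        f"MAX_BY({c['table']}.{c['column']}, {c['table']}.{c['order_by']})"
        for c in ordered
    ]
    return f"COALESCE({', '.join(parts)}) AS {name}"

spec = [
    {"table": "customer_profiles", "column": "email_std", "order_by": "time", "priority": 1},
    {"table": "orders", "column": "email_address", "order_by": "time", "priority": 2},
]
print(single_value_attribute("best_email", spec))
```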
|
||||||
|
|
||||||
|
### Platform-Specific Conversions
|
||||||
|
The generator automatically converts:
|
||||||
|
- Presto functions → Snowflake equivalents
|
||||||
|
- Databricks functions → Snowflake equivalents
|
||||||
|
- Array operations → ARRAY_CONSTRUCT/FLATTEN syntax
|
||||||
|
- Window functions → optimized versions
|
||||||
|
- Time functions → DATE_PART(epoch_second, CURRENT_TIMESTAMP())
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Example Workflow
|
||||||
|
|
||||||
|
### Input YAML (`unify.yml`)
|
||||||
|
```yaml
|
||||||
|
name: customer_unification
|
||||||
|
|
||||||
|
keys:
|
||||||
|
- name: email
|
||||||
|
valid_regexp: ".*@.*"
|
||||||
|
invalid_texts: ['', 'N/A', 'null']
|
||||||
|
- name: customer_id
|
||||||
|
invalid_texts: ['', 'N/A']
|
||||||
|
|
||||||
|
tables:
|
||||||
|
- table: customer_profiles
|
||||||
|
key_columns:
|
||||||
|
- {column: email_std, key: email}
|
||||||
|
- {column: customer_id, key: customer_id}
|
||||||
|
|
||||||
|
canonical_ids:
|
||||||
|
- name: unified_id
|
||||||
|
merge_by_keys: [email, customer_id]
|
||||||
|
merge_iterations: 15
|
||||||
|
|
||||||
|
master_tables:
|
||||||
|
- name: customer_master
|
||||||
|
canonical_id: unified_id
|
||||||
|
attributes:
|
||||||
|
- name: best_email
|
||||||
|
source_columns:
|
||||||
|
- {table: customer_profiles, column: email_std, priority: 1}
|
||||||
|
```
|
||||||
|
|
||||||
|
### Generated Output
|
||||||
|
```
|
||||||
|
snowflake_sql/unify/
|
||||||
|
├── 01_create_graph.sql # Creates unified_id_graph_unify_loop_0
|
||||||
|
├── 02_extract_merge.sql # Merges customer_profiles keys
|
||||||
|
├── 03_source_key_stats.sql # Stats by table
|
||||||
|
├── 04_unify_loop_iteration_01.sql # First iteration
|
||||||
|
├── 04_unify_loop_iteration_02.sql # Second iteration
|
||||||
|
├── ... # Up to iteration_05
|
||||||
|
├── 05_canonicalize.sql # Creates unified_id_lookup
|
||||||
|
├── 06_result_key_stats.sql # Final statistics
|
||||||
|
├── 10_enrich_customer_profiles.sql # Adds unified_id column
|
||||||
|
├── 20_master_customer_master.sql # Creates customer_master table
|
||||||
|
├── 30_unification_metadata.sql # Metadata
|
||||||
|
├── 31_filter_lookup.sql # Validation rules
|
||||||
|
└── 32_column_lookup.sql # Column mappings
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Next Steps After Generation
|
||||||
|
|
||||||
|
### Option 1: Execute Immediately
|
||||||
|
Use the execution command:
|
||||||
|
```
|
||||||
|
/cdp-hybrid-idu:hybrid-execute-snowflake
|
||||||
|
```
|
||||||
|
|
||||||
|
### Option 2: Review First
|
||||||
|
1. Examine generated SQL files
|
||||||
|
2. Verify table names and transformations
|
||||||
|
3. Test with sample data
|
||||||
|
4. Execute manually or via execution command
|
||||||
|
|
||||||
|
### Option 3: Customize
|
||||||
|
1. Modify generated SQL as needed
|
||||||
|
2. Add custom logic or transformations
|
||||||
|
3. Execute using Snowflake SQL worksheet or execution command
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Technical Details
|
||||||
|
|
||||||
|
### Python Script Execution
|
||||||
|
The agent executes:
|
||||||
|
```bash
|
||||||
|
python3 scripts/snowflake/yaml_unification_to_snowflake.py \
|
||||||
|
unify.yml \
|
||||||
|
-d my_database \
|
||||||
|
-s my_schema \
|
||||||
|
-sd source_database \
|
||||||
|
-ss source_schema \
|
||||||
|
-o snowflake_sql
|
||||||
|
```
|
||||||
|
|
||||||
|
### SQL File Naming Convention
|
||||||
|
- `01-09`: Setup and initialization
|
||||||
|
- `10-19`: Source table enrichment
|
||||||
|
- `20-29`: Master table creation
|
||||||
|
- `30-39`: Metadata and lookup tables
|
||||||
|
- `04_*_NN`: Loop iterations (auto-numbered)
|
||||||
|
|
||||||
|
### Convergence Detection
|
||||||
|
Each loop iteration includes:
|
||||||
|
```sql
|
||||||
|
-- Check if graph changed
|
||||||
|
SELECT COUNT(*) FROM (
|
||||||
|
SELECT leader_ns, leader_id, follower_ns, follower_id
|
||||||
|
FROM iteration_N
|
||||||
|
EXCEPT
|
||||||
|
SELECT leader_ns, leader_id, follower_ns, follower_id
|
||||||
|
FROM iteration_N_minus_1
|
||||||
|
) diff
|
||||||
|
```
|
||||||
|
Stops when count = 0
|
||||||
|
|
||||||
|
### Snowflake-Specific Features
|
||||||
|
- **LATERAL FLATTEN**: Array expansion for id_ns_array processing
|
||||||
|
- **ARRAY_CONSTRUCT**: Building arrays from multiple columns
|
||||||
|
- **OBJECT_CONSTRUCT**: Creating structured objects for key-value pairs
|
||||||
|
- **ARRAYS_OVERLAP**: Checking array membership
|
||||||
|
- **SPLIT_PART**: String splitting for leader key parsing
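For example, the LATERAL FLATTEN pattern referenced above might appear in the generated SQL roughly like this; the table name follows the loop_final alias used earlier, but the internal layout of `id_ns_array` is an assumption:

```python
# Illustrative query string only; the generator's actual FLATTEN usage may differ.
FLATTEN_EXAMPLE = """
SELECT g.leader_id,
       f.value:id::STRING AS follower_id,
       f.value:ns::STRING AS follower_ns
FROM unified_id_graph_unify_loop_final g,
     LATERAL FLATTEN(input => g.id_ns_array) f
"""
print(FLATTEN_EXAMPLE)
```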
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Troubleshooting
|
||||||
|
|
||||||
|
### Common Issues
|
||||||
|
|
||||||
|
**Issue**: YAML validation error
|
||||||
|
**Solution**: Check YAML syntax, ensure proper indentation, verify all required fields
|
||||||
|
|
||||||
|
**Issue**: Table not found error
|
||||||
|
**Solution**: Verify source database/schema, check table names in YAML
|
||||||
|
|
||||||
|
**Issue**: Python script error
|
||||||
|
**Solution**: Ensure Python 3.7+ is installed and the `pyyaml` dependency is available
|
||||||
|
|
||||||
|
**Issue**: Too many/few iterations
|
||||||
|
**Solution**: Adjust `merge_iterations` in canonical_ids section of YAML
|
||||||
|
|
||||||
|
**Issue**: VARIANT column errors
|
||||||
|
**Solution**: Snowflake handles VARIANT types automatically; ensure proper casting in any custom SQL
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Success Criteria
|
||||||
|
|
||||||
|
Generated SQL will:
|
||||||
|
- ✅ Be valid Snowflake SQL
|
||||||
|
- ✅ Use native Snowflake functions
|
||||||
|
- ✅ Include proper clustering for performance
|
||||||
|
- ✅ Have convergence detection built-in
|
||||||
|
- ✅ Support VARIANT types for flexible data
|
||||||
|
- ✅ Generate comprehensive statistics
|
||||||
|
- ✅ Work without modification on Snowflake
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
**Ready to generate Snowflake SQL from your YAML configuration?**
|
||||||
|
|
||||||
|
Provide your YAML file path and target database/schema to begin!
|
||||||
308
commands/hybrid-setup.md
Normal file
@@ -0,0 +1,308 @@
|
|||||||
|
---
|
||||||
|
name: hybrid-setup
|
||||||
|
description: Complete end-to-end hybrid ID unification setup - automatically analyzes tables, generates config, creates SQL, and executes workflow for Snowflake and Databricks
|
||||||
|
---
|
||||||
|
|
||||||
|
# Hybrid ID Unification Complete Setup
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
I'll guide you through the complete hybrid ID unification setup process for Snowflake and/or Databricks platforms. This is an **automated, end-to-end workflow** that will:
|
||||||
|
|
||||||
|
1. **Analyze your tables automatically** using platform MCP tools with strict PII detection
|
||||||
|
2. **Generate YAML configuration** from real schema and data analysis
|
||||||
|
3. **Choose target platform(s)** (Snowflake, Databricks, or both)
|
||||||
|
4. **Generate platform-specific SQL** optimized for each engine
|
||||||
|
5. **Execute workflows** with convergence detection and monitoring
|
||||||
|
6. **Provide deployment guidance** and operating instructions
|
||||||
|
|
||||||
|
**Key Features**:
|
||||||
|
- 🔍 **Automated Table Analysis**: Uses Snowflake/Databricks MCP tools to analyze actual tables
|
||||||
|
- ✅ **Strict PII Detection**: Zero tolerance - only includes tables with real user identifiers
|
||||||
|
- 📊 **Real Data Validation**: Queries actual data to validate patterns and quality
|
||||||
|
- 🎯 **Smart Recommendations**: Expert analysis provides merge strategy and priorities
|
||||||
|
- 🚀 **End-to-End Automation**: From table analysis to workflow execution
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## What You Need to Provide
|
||||||
|
|
||||||
|
### 1. Unification Requirements (For Automated Analysis)
|
||||||
|
- **Platform**: Snowflake or Databricks
|
||||||
|
- **Tables**: List of source tables to analyze
|
||||||
|
- Format (Snowflake): `database.schema.table` or `schema.table` or `table`
|
||||||
|
- Format (Databricks): `catalog.schema.table` or `schema.table` or `table`
|
||||||
|
- **Canonical ID Name**: Name for your unified ID (e.g., `td_id`, `unified_customer_id`)
|
||||||
|
- **Merge Iterations**: Number of unification loops (default: 10)
|
||||||
|
- **Master Tables**: (Optional) Attribute aggregation specifications
|
||||||
|
|
||||||
|
**Note**: The system will automatically:
|
||||||
|
- Extract user identifiers from actual table schemas
|
||||||
|
- Validate data patterns from real data
|
||||||
|
- Apply appropriate validation rules based on data analysis
|
||||||
|
- Generate merge strategy recommendations
|
||||||
|
|
||||||
|
### 2. Platform Selection
|
||||||
|
- **Databricks**: Unity Catalog with Delta Lake
|
||||||
|
- **Snowflake**: Database with proper permissions
|
||||||
|
- **Both**: Generate SQL for both platforms
|
||||||
|
|
||||||
|
### 3. Target Configurations
|
||||||
|
|
||||||
|
**For Databricks**:
|
||||||
|
- **Catalog**: Target catalog name
|
||||||
|
- **Schema**: Target schema name
|
||||||
|
- **Source Catalog** (optional): Source data catalog
|
||||||
|
- **Source Schema** (optional): Source data schema
|
||||||
|
|
||||||
|
**For Snowflake**:
|
||||||
|
- **Database**: Target database name
|
||||||
|
- **Schema**: Target schema name
|
||||||
|
- **Source Schema** (optional): Source data schema
|
||||||
|
|
||||||
|
### 4. Execution Credentials (if executing)
|
||||||
|
|
||||||
|
**For Databricks**:
|
||||||
|
- **Server Hostname**: your-workspace.databricks.com
|
||||||
|
- **HTTP Path**: /sql/1.0/warehouses/your-warehouse-id
|
||||||
|
- **Authentication**: PAT (Personal Access Token) or OAuth
|
||||||
|
|
||||||
|
**For Snowflake**:
|
||||||
|
- **Account**: Snowflake account name
|
||||||
|
- **User**: Username
|
||||||
|
- **Password**: Password or use SSO/key-pair
|
||||||
|
- **Warehouse**: Compute warehouse name
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## What I'll Do
|
||||||
|
|
||||||
|
### Step 1: Automated YAML Configuration Generation
|
||||||
|
I'll use the **hybrid-unif-config-creator** command to automatically generate your `unify.yml` file:
|
||||||
|
|
||||||
|
**Automated Analysis Approach** (Recommended):
|
||||||
|
- Analyze your actual tables using platform MCP tools (Snowflake/Databricks)
|
||||||
|
- Extract user identifiers with STRICT PII detection (zero tolerance for guessing)
|
||||||
|
- Validate data patterns from real table data
|
||||||
|
- Generate unify.yml with exact template compliance
|
||||||
|
- Only include tables with actual user identifiers
|
||||||
|
- Document excluded tables with detailed reasons
|
||||||
|
|
||||||
|
**What I'll do**:
|
||||||
|
- Call the **hybrid-unif-keys-extractor agent** to analyze tables
|
||||||
|
- Query actual schema and data using platform MCP tools
|
||||||
|
- Detect valid user identifiers (email, customer_id, phone, etc.)
|
||||||
|
- Exclude tables without PII with full documentation
|
||||||
|
- Generate production-ready unify.yml automatically
|
||||||
|
|
||||||
|
**Alternative - Manual Configuration**:
|
||||||
|
- If MCP tools are unavailable, I'll guide you through manual configuration
|
||||||
|
- Interactive prompts for keys, tables, and validation rules
|
||||||
|
- Step-by-step YAML building with validation
|
||||||
|
|
||||||
|
### Step 2: Platform Selection and Configuration
|
||||||
|
I'll help you:
|
||||||
|
- Choose between Databricks, Snowflake, or both
|
||||||
|
- Collect platform-specific configuration (catalog/database, schema names)
|
||||||
|
- Determine source/target separation strategy
|
||||||
|
- Decide on execution or generation-only mode
|
||||||
|
|
||||||
|
### Step 3: SQL Generation
|
||||||
|
|
||||||
|
**For Databricks** (if selected):
|
||||||
|
I'll call the **databricks-sql-generator agent** to:
|
||||||
|
- Execute `yaml_unification_to_databricks.py` script
|
||||||
|
- Generate Delta Lake optimized SQL workflow
|
||||||
|
- Create output directory: `databricks_sql/unify/`
|
||||||
|
- Generate 15+ SQL files with proper execution order
|
||||||
|
|
||||||
|
**For Snowflake** (if selected):
|
||||||
|
I'll call the **snowflake-sql-generator agent** to:
|
||||||
|
- Execute `yaml_unification_to_snowflake.py` script
|
||||||
|
- Generate Snowflake-native SQL workflow
|
||||||
|
- Create output directory: `snowflake_sql/unify/`
|
||||||
|
- Generate 15+ SQL files with proper execution order
|
||||||
|
|
||||||
|
### Step 4: Workflow Execution (Optional)
|
||||||
|
|
||||||
|
**For Databricks** (if execution requested):
|
||||||
|
I'll call the **databricks-workflow-executor agent** to:
|
||||||
|
- Execute `databricks_sql_executor.py` script
|
||||||
|
- Connect to your Databricks workspace
|
||||||
|
- Run SQL files in proper sequence
|
||||||
|
- Monitor convergence and progress
|
||||||
|
- Optimize Delta tables
|
||||||
|
- Report final statistics
|
||||||
|
|
||||||
|
**For Snowflake** (if execution requested):
|
||||||
|
I'll call the **snowflake-workflow-executor agent** to:
|
||||||
|
- Execute `snowflake_sql_executor.py` script
|
||||||
|
- Connect to your Snowflake account
|
||||||
|
- Run SQL files in proper sequence
|
||||||
|
- Monitor convergence and progress
|
||||||
|
- Report final statistics
|
||||||
|
|
||||||
|
### Step 5: Deployment Guidance
|
||||||
|
I'll provide:
|
||||||
|
- Configuration summary
|
||||||
|
- Generated files overview
|
||||||
|
- Deployment instructions
|
||||||
|
- Operating guidelines
|
||||||
|
- Monitoring recommendations
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Interactive Workflow
|
||||||
|
|
||||||
|
This command orchestrates the complete end-to-end flow by calling specialized commands in sequence:
|
||||||
|
|
||||||
|
### Phase 1: Configuration Creation
|
||||||
|
**I'll ask you for**:
|
||||||
|
- Platform (Snowflake or Databricks)
|
||||||
|
- Tables to analyze
|
||||||
|
- Canonical ID name
|
||||||
|
- Merge iterations
|
||||||
|
|
||||||
|
**Then I'll**:
|
||||||
|
- Call `/cdp-hybrid-idu:hybrid-unif-config-creator` internally
|
||||||
|
- Analyze your tables automatically
|
||||||
|
- Generate `unify.yml` with strict PII detection
|
||||||
|
- Show you the configuration for review
|
||||||
|
|
||||||
|
### Phase 2: SQL Generation
|
||||||
|
**I'll ask you**:
|
||||||
|
- Which platform(s) to generate SQL for (can be different from source)
|
||||||
|
- Output directory preferences
|
||||||
|
|
||||||
|
**Then I'll**:
|
||||||
|
- Call `/cdp-hybrid-idu:hybrid-generate-snowflake` (if Snowflake selected)
|
||||||
|
- Call `/cdp-hybrid-idu:hybrid-generate-databricks` (if Databricks selected)
|
||||||
|
- Generate 15+ optimized SQL files per platform
|
||||||
|
- Show you the execution plan
|
||||||
|
|
||||||
|
### Phase 3: Workflow Execution (Optional)
|
||||||
|
**I'll ask you**:
|
||||||
|
- Do you want to execute now or later?
|
||||||
|
- Connection credentials if executing
|
||||||
|
|
||||||
|
**Then I'll**:
|
||||||
|
- Call `/cdp-hybrid-idu:hybrid-execute-snowflake` (if Snowflake selected)
|
||||||
|
- Call `/cdp-hybrid-idu:hybrid-execute-databricks` (if Databricks selected)
|
||||||
|
- Monitor convergence and progress
|
||||||
|
- Report final statistics
|
||||||
|
|
||||||
|
**Throughout the process**:
|
||||||
|
- **Questions**: When I need your input
|
||||||
|
- **Suggestions**: Recommended approaches based on best practices
|
||||||
|
- **Validation**: Real-time checks on your choices
|
||||||
|
- **Explanations**: Help you understand concepts and options
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Expected Output
|
||||||
|
|
||||||
|
### Files Created (Platform-specific):
|
||||||
|
|
||||||
|
**For Databricks**:
|
||||||
|
```
|
||||||
|
databricks_sql/unify/
|
||||||
|
├── 01_create_graph.sql # Initialize identity graph
|
||||||
|
├── 02_extract_merge.sql # Extract and merge identities
|
||||||
|
├── 03_source_key_stats.sql # Source statistics
|
||||||
|
├── 04_unify_loop_iteration_*.sql # Iterative unification (N files)
|
||||||
|
├── 05_canonicalize.sql # Canonical ID creation
|
||||||
|
├── 06_result_key_stats.sql # Result statistics
|
||||||
|
├── 10_enrich_*.sql # Source table enrichment (N files)
|
||||||
|
├── 20_master_*.sql # Master table creation (N files)
|
||||||
|
├── 30_unification_metadata.sql # Metadata tables
|
||||||
|
├── 31_filter_lookup.sql # Validation rules
|
||||||
|
└── 32_column_lookup.sql # Column mappings
|
||||||
|
```
|
||||||
|
|
||||||
|
**For Snowflake**:
|
||||||
|
```
|
||||||
|
snowflake_sql/unify/
|
||||||
|
├── 01_create_graph.sql # Initialize identity graph
|
||||||
|
├── 02_extract_merge.sql # Extract and merge identities
|
||||||
|
├── 03_source_key_stats.sql # Source statistics
|
||||||
|
├── 04_unify_loop_iteration_*.sql # Iterative unification (N files)
|
||||||
|
├── 05_canonicalize.sql # Canonical ID creation
|
||||||
|
├── 06_result_key_stats.sql # Result statistics
|
||||||
|
├── 10_enrich_*.sql # Source table enrichment (N files)
|
||||||
|
├── 20_master_*.sql # Master table creation (N files)
|
||||||
|
├── 30_unification_metadata.sql # Metadata tables
|
||||||
|
├── 31_filter_lookup.sql # Validation rules
|
||||||
|
└── 32_column_lookup.sql # Column mappings
|
||||||
|
```
|
||||||
|
|
||||||
|
**Configuration**:
|
||||||
|
```
|
||||||
|
unify.yml # YAML configuration (created interactively)
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Success Criteria
|
||||||
|
|
||||||
|
All generated files will:
|
||||||
|
- ✅ Be platform-optimized and production-ready
|
||||||
|
- ✅ Use proper SQL dialects (Databricks Spark SQL or Snowflake SQL)
|
||||||
|
- ✅ Include convergence detection logic
|
||||||
|
- ✅ Support incremental processing
|
||||||
|
- ✅ Generate comprehensive statistics
|
||||||
|
- ✅ Work without modification on target platforms
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Getting Started
|
||||||
|
|
||||||
|
**Ready to begin?** I'll use the **hybrid-unif-config-creator** to automatically analyze your tables and generate the YAML configuration.
|
||||||
|
|
||||||
|
Please provide:
|
||||||
|
|
||||||
|
1. **Platform**: Which platform contains your data?
|
||||||
|
- Snowflake or Databricks
|
||||||
|
|
||||||
|
2. **Tables**: Which source tables should I analyze?
|
||||||
|
- Format (Snowflake): `database.schema.table` or `schema.table` or `table`
|
||||||
|
- Format (Databricks): `catalog.schema.table` or `schema.table` or `table`
|
||||||
|
- Example: `customer_db.public.customers`, `orders`, `web_events.user_activity`
|
||||||
|
|
||||||
|
3. **Canonical ID Name**: What should I call the unified ID?
|
||||||
|
- Example: `td_id`, `unified_customer_id`, `master_id`
|
||||||
|
- Default: `td_id`
|
||||||
|
|
||||||
|
4. **Merge Iterations** (optional): How many unification loops?
|
||||||
|
- Default: 10
|
||||||
|
- Range: 2-30
|
||||||
|
|
||||||
|
5. **Target Platform(s)** for SQL generation:
|
||||||
|
- Same as source, or generate for both platforms
|
||||||
|
|
||||||
|
**Example**:
|
||||||
|
```
|
||||||
|
I want to set up hybrid ID unification for:
|
||||||
|
|
||||||
|
Platform: Snowflake
|
||||||
|
Tables:
|
||||||
|
- customer_db.public.customer_profiles
|
||||||
|
- customer_db.public.orders
|
||||||
|
- marketing_db.public.campaigns
|
||||||
|
- event_db.public.web_events
|
||||||
|
|
||||||
|
Canonical ID: unified_customer_id
|
||||||
|
Merge Iterations: 10
|
||||||
|
Generate SQL for: Snowflake (or both Snowflake and Databricks)
|
||||||
|
```
|
||||||
|
|
||||||
|
**What I'll do next**:
|
||||||
|
1. ✅ Analyze your tables using Snowflake MCP tools
|
||||||
|
2. ✅ Extract user identifiers with strict PII detection
|
||||||
|
3. ✅ Generate unify.yml automatically
|
||||||
|
4. ✅ Generate platform-specific SQL files
|
||||||
|
5. ✅ Execute workflow (if requested)
|
||||||
|
6. ✅ Provide deployment guidance
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
**Let's get started with your hybrid ID unification setup!**
|
||||||
491
commands/hybrid-unif-config-creator.md
Normal file
@@ -0,0 +1,491 @@
|
|||||||
|
---
|
||||||
|
name: hybrid-unif-config-creator
|
||||||
|
description: Auto-generate unify.yml configuration for Snowflake/Databricks by extracting user identifiers from actual tables using strict PII detection
|
||||||
|
---
|
||||||
|
|
||||||
|
# Unify Configuration Creator for Snowflake/Databricks
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
I'll automatically generate a production-ready `unify.yml` configuration file for your Snowflake or Databricks ID unification by:
|
||||||
|
|
||||||
|
1. **Analyzing your actual tables** using platform-specific MCP tools
|
||||||
|
2. **Extracting user identifiers** with zero-tolerance PII detection
|
||||||
|
3. **Validating data patterns** from real table data
|
||||||
|
4. **Generating unify.yml** using the exact template format
|
||||||
|
5. **Providing recommendations** for merge strategies and priorities
|
||||||
|
|
||||||
|
**This command uses STRICT analysis - only tables with actual user identifiers will be included.**
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## What You Need to Provide
|
||||||
|
|
||||||
|
### 1. Platform Selection
|
||||||
|
- **Snowflake**: For Snowflake databases
|
||||||
|
- **Databricks**: For Databricks Unity Catalog tables
|
||||||
|
|
||||||
|
### 2. Tables to Analyze
|
||||||
|
Provide tables you want to analyze for ID unification:
|
||||||
|
- **Format (Snowflake)**: `database.schema.table` or `schema.table` or `table`
|
||||||
|
- **Format (Databricks)**: `catalog.schema.table` or `schema.table` or `table`
|
||||||
|
- **Example**: `customer_data.public.customers`, `orders`, `web_events.user_activity`
|
||||||
|
|
||||||
|
### 3. Canonical ID Configuration
|
||||||
|
- **Name**: Name for your unified ID (default: `td_id`)
|
||||||
|
- **Merge Iterations**: Number of unification loop iterations (default: 10)
|
||||||
|
- **Incremental Iterations**: Iterations for incremental processing (default: 5)
|
||||||
|
|
||||||
|
### 4. Output Configuration (Optional)
|
||||||
|
- **Output File**: Where to save unify.yml (default: `unify.yml`)
|
||||||
|
- **Template Path**: Path to template if using custom (default: uses built-in exact template)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## What I'll Do
|
||||||
|
|
||||||
|
### Step 1: Platform Detection and Validation
|
||||||
|
```
|
||||||
|
1. Confirm platform (Snowflake or Databricks)
|
||||||
|
2. Verify MCP tools are available for the platform
|
||||||
|
3. Set up platform-specific query patterns
|
||||||
|
4. Inform you of the analysis approach
|
||||||
|
```
|
||||||
|
|
||||||
|
### Step 2: Key Extraction with hybrid-unif-keys-extractor Agent
|
||||||
|
I'll launch the **hybrid-unif-keys-extractor agent** to:
|
||||||
|
|
||||||
|
**Schema Analysis**:
|
||||||
|
- Use platform MCP tools to describe each table
|
||||||
|
- Extract exact column names and data types
|
||||||
|
- Identify accessible vs inaccessible tables
|
||||||
|
|
||||||
|
**User Identifier Detection**:
|
||||||
|
- Apply STRICT matching rules for user identifiers:
|
||||||
|
- ✅ Email columns (email, email_std, email_address, etc.)
|
||||||
|
- ✅ Phone columns (phone, phone_number, mobile_phone, etc.)
|
||||||
|
- ✅ User IDs (user_id, customer_id, account_id, etc.)
|
||||||
|
- ✅ Cookie/Device IDs (td_client_id, cookie_id, etc.)
|
||||||
|
- ❌ System columns (id, created_at, time, etc.)
|
||||||
|
- ❌ Complex types (arrays, maps, objects, variants, structs)
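A minimal sketch of the strict classification above — the pattern and exclusion lists are illustrative, not the agent's actual rules:

```python
import re

IDENTIFIER_PATTERNS = [r"email", r"phone", r"(user|customer|account)_id", r"(td_client|cookie)_id"]
SYSTEM_COLUMNS = {"id", "created_at", "updated_at", "time"}
COMPLEX_TYPES = {"ARRAY", "MAP", "OBJECT", "VARIANT", "STRUCT"}

def is_user_identifier(column: str, data_type: str) -> bool:
    """Accept a column only if it matches an identifier pattern and is a simple scalar type."""
    name = column.lower()
    if name in SYSTEM_COLUMNS or data_type.upper().split("(")[0] in COMPLEX_TYPES:
        return False
    return any(re.search(p, name) for p in IDENTIFIER_PATTERNS)

print(is_user_identifier("email_std", "VARCHAR(255)"))  # True
print(is_user_identifier("session_id", "VARCHAR(64)"))  # False - not a user identifier
```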
|
||||||
|
|
||||||
|
**Data Validation**:
|
||||||
|
- Query actual MIN/MAX values from each identified column
|
||||||
|
- Analyze data patterns and quality
|
||||||
|
- Count unique values per identifier
|
||||||
|
- Detect data quality issues
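The data-validation queries could look roughly like the sketch below; the exact statements the agent issues through the MCP tools may differ:

```python
def profile_query(table_fqn: str, column: str) -> str:
    """Build a simple profiling query: value range, distinct count, and row count."""
    return f"""
        SELECT MIN({column})            AS min_value,
               MAX({column})            AS max_value,
               COUNT(DISTINCT {column}) AS distinct_values,
               COUNT(*)                 AS total_rows
        FROM {table_fqn}
        WHERE {column} IS NOT NULL
    """

print(profile_query("customer_data.public.customers", "email"))
```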
|
||||||
|
|
||||||
|
**Table Classification**:
|
||||||
|
- **INCLUDED**: Tables with valid user identifiers
|
||||||
|
- **EXCLUDED**: Tables without user identifiers (fully documented why)
|
||||||
|
|
||||||
|
**Expert Analysis**:
|
||||||
|
- Three SQL expert reviewers analyze the data
|
||||||
|
- Provide priority recommendations
|
||||||
|
- Suggest validation rules based on actual data patterns
|
||||||
|
|
||||||
|
### Step 3: Unify.yml Generation
|
||||||
|
|
||||||
|
**CRITICAL**: Using the **EXACT BUILT-IN template structure** (embedded in hybrid-unif-keys-extractor agent)
|
||||||
|
|
||||||
|
**Template Usage Process**:
|
||||||
|
```
|
||||||
|
1. Receive structured data from hybrid-unif-keys-extractor agent:
|
||||||
|
- Keys with validation rules
|
||||||
|
- Tables with column mappings
|
||||||
|
- Canonical ID configuration
|
||||||
|
- Master tables specification
|
||||||
|
|
||||||
|
2. Use BUILT-IN template structure (see agent documentation)
|
||||||
|
|
||||||
|
3. ONLY replace these specific values:
|
||||||
|
- Line 1: name: {canonical_id_name}
|
||||||
|
- keys section: actual keys found
|
||||||
|
- tables section: actual tables with actual columns
|
||||||
|
- canonical_ids section: name and merge_by_keys
|
||||||
|
- master_tables section: [] or user specifications
|
||||||
|
|
||||||
|
4. PRESERVE everything else:
|
||||||
|
- ALL comment blocks (#####...)
|
||||||
|
- ALL comment text ("Declare Validation logic", etc.)
|
||||||
|
- ALL spacing and indentation (2 spaces per level)
|
||||||
|
- ALL blank lines
|
||||||
|
- EXACT YAML structure
|
||||||
|
|
||||||
|
5. Use Write tool to save populated unify.yml
|
||||||
|
```
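A small sketch of the populate-only-these-values idea: the template text (with its comment banners and spacing) stays verbatim and only named placeholders are filled in. The placeholder names below are illustrative; the agent's built-in template remains the authoritative structure:

```python
from string import Template

UNIFY_TEMPLATE = Template("""name: $canonical_id_name

keys:
$keys_block

tables:
$tables_block
""")

def render_unify_yaml(canonical_id_name: str, keys_block: str, tables_block: str) -> str:
    """Substitute values into the fixed template without touching its structure."""
    return UNIFY_TEMPLATE.substitute(
        canonical_id_name=canonical_id_name,
        keys_block=keys_block,
        tables_block=tables_block,
    )
```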
|
||||||
|
|
||||||
|
**I'll generate**:
|
||||||
|
|
||||||
|
**Section 1: Canonical ID Name**
|
||||||
|
```yaml
|
||||||
|
name: {your_canonical_id_name}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Section 2: Keys with Validation**
|
||||||
|
```yaml
|
||||||
|
keys:
|
||||||
|
- name: email
|
||||||
|
valid_regexp: ".*@.*"
|
||||||
|
invalid_texts: ['', 'N/A', 'null']
|
||||||
|
- name: customer_id
|
||||||
|
invalid_texts: ['', 'N/A', 'null']
|
||||||
|
- name: phone_number
|
||||||
|
invalid_texts: ['', 'N/A', 'null']
|
||||||
|
```
|
||||||
|
*Populated with actual keys found in your tables*
|
||||||
|
|
||||||
|
**Section 3: Tables with Key Column Mappings**
|
||||||
|
```yaml
|
||||||
|
tables:
|
||||||
|
- database: {database/catalog}
|
||||||
|
table: {table_name}
|
||||||
|
key_columns:
|
||||||
|
- {column: actual_column_name, key: mapped_key}
|
||||||
|
- {column: another_column, key: another_key}
|
||||||
|
```
|
||||||
|
*Only tables with valid user identifiers, with EXACT column names from schema analysis*
|
||||||
|
|
||||||
|
**Section 4: Canonical IDs Configuration**
|
||||||
|
```yaml
|
||||||
|
canonical_ids:
|
||||||
|
- name: {your_canonical_id_name}
|
||||||
|
merge_by_keys: [email, customer_id, phone_number]
|
||||||
|
merge_iterations: 15
|
||||||
|
```
|
||||||
|
*Based on extracted keys and your configuration*
|
||||||
|
|
||||||
|
**Section 5: Master Tables (Optional)**
|
||||||
|
```yaml
|
||||||
|
master_tables:
|
||||||
|
- name: {canonical_id_name}_master_table
|
||||||
|
canonical_id: {canonical_id_name}
|
||||||
|
attributes:
|
||||||
|
- name: best_email
|
||||||
|
source_columns:
|
||||||
|
- {table: table1, column: email, order: last, order_by: time, priority: 1}
|
||||||
|
- {table: table2, column: email_address, order: last, order_by: time, priority: 2}
|
||||||
|
```
|
||||||
|
*If you request master table configuration, I'll help set up attribute aggregation*
|
||||||
|
|
||||||
|
### Step 4: Validation and Review
|
||||||
|
|
||||||
|
After generation:
|
||||||
|
```
|
||||||
|
1. Show complete unify.yml content
|
||||||
|
2. Highlight key sections:
|
||||||
|
- Keys found: [list]
|
||||||
|
- Tables included: [count]
|
||||||
|
- Tables excluded: [count] with reasons
|
||||||
|
- Merge strategy: [keys and priorities]
|
||||||
|
3. Provide recommendations for optimization
|
||||||
|
4. Ask for your approval before saving
|
||||||
|
```
|
||||||
|
|
||||||
|
### Step 5: File Output
|
||||||
|
|
||||||
|
```
|
||||||
|
1. Write unify.yml to specified location
|
||||||
|
2. Create backup of existing file if present
|
||||||
|
3. Provide file summary:
|
||||||
|
- Keys configured: X
|
||||||
|
- Tables configured: Y
|
||||||
|
- Validation rules: Z
|
||||||
|
4. Show next steps for using the configuration
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Example Workflow
|
||||||
|
|
||||||
|
**Input**:
|
||||||
|
```
|
||||||
|
Platform: Snowflake
|
||||||
|
Tables:
|
||||||
|
- customer_data.public.customers
|
||||||
|
- customer_data.public.orders
|
||||||
|
- web_data.public.events
|
||||||
|
Canonical ID Name: unified_customer_id
|
||||||
|
Output: snowflake_unify.yml
|
||||||
|
```
|
||||||
|
|
||||||
|
**Process**:
|
||||||
|
```
|
||||||
|
✓ Platform: Snowflake MCP tools detected
|
||||||
|
✓ Analyzing 3 tables...
|
||||||
|
|
||||||
|
Schema Analysis:
|
||||||
|
✓ customer_data.public.customers - 12 columns
|
||||||
|
✓ customer_data.public.orders - 8 columns
|
||||||
|
✓ web_data.public.events - 15 columns
|
||||||
|
|
||||||
|
User Identifier Detection:
|
||||||
|
✓ customers: email, customer_id (2 identifiers)
|
||||||
|
✓ orders: customer_id, email_address (2 identifiers)
|
||||||
|
✗ events: NO user identifiers found
|
||||||
|
Available columns: event_id, session_id, page_url, timestamp, ...
|
||||||
|
Reason: Contains only event tracking data - no PII
|
||||||
|
|
||||||
|
Data Analysis:
|
||||||
|
✓ email: 45,123 unique values, format valid
|
||||||
|
✓ customer_id: 45,089 unique values, numeric
|
||||||
|
✓ email_address: 12,456 unique values, format valid
|
||||||
|
|
||||||
|
Expert Analysis Complete:
|
||||||
|
Priority 1: customer_id (most stable, highest coverage)
|
||||||
|
Priority 2: email (good coverage, some quality issues)
|
||||||
|
Priority 3: phone_number (not found)
|
||||||
|
|
||||||
|
Generating unify.yml...
|
||||||
|
✓ Keys section: 2 keys configured
|
||||||
|
✓ Tables section: 2 tables configured
|
||||||
|
✓ Canonical IDs: unified_customer_id
|
||||||
|
✓ Validation rules: Applied based on data patterns
|
||||||
|
|
||||||
|
Tables EXCLUDED:
|
||||||
|
- web_data.public.events: No user identifiers
|
||||||
|
```
|
||||||
|
|
||||||
|
**Output (snowflake_unify.yml)**:
|
||||||
|
```yaml
|
||||||
|
name: unified_customer_id
|
||||||
|
|
||||||
|
keys:
|
||||||
|
- name: email
|
||||||
|
valid_regexp: ".*@.*"
|
||||||
|
invalid_texts: ['', 'N/A', 'null']
|
||||||
|
- name: customer_id
|
||||||
|
invalid_texts: ['', 'N/A', 'null']
|
||||||
|
|
||||||
|
tables:
|
||||||
|
- database: customer_data
|
||||||
|
table: customers
|
||||||
|
key_columns:
|
||||||
|
- {column: email, key: email}
|
||||||
|
- {column: customer_id, key: customer_id}
|
||||||
|
- database: customer_data
|
||||||
|
table: orders
|
||||||
|
key_columns:
|
||||||
|
- {column: email_address, key: email}
|
||||||
|
- {column: customer_id, key: customer_id}
|
||||||
|
|
||||||
|
canonical_ids:
|
||||||
|
- name: unified_customer_id
|
||||||
|
merge_by_keys: [customer_id, email]
|
||||||
|
merge_iterations: 15
|
||||||
|
|
||||||
|
master_tables: []
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Key Features
|
||||||
|
|
||||||
|
### 🔍 **STRICT PII Detection**
|
||||||
|
- Zero tolerance for guessing
|
||||||
|
- Only includes tables with actual user identifiers
|
||||||
|
- Documents why tables are excluded
|
||||||
|
- Based on REAL schema and data analysis
|
||||||
|
|
||||||
|
### ✅ **Exact Template Compliance**
|
||||||
|
- Uses BUILT-IN exact template structure (embedded in hybrid-unif-keys-extractor agent)
|
||||||
|
- NO modifications to template format
|
||||||
|
- Preserves all comment sections
|
||||||
|
- Maintains exact YAML structure
|
||||||
|
- Portable across all systems
|
||||||
|
|
||||||
|
### 📊 **Real Data Analysis**
|
||||||
|
- Queries actual MIN/MAX values
|
||||||
|
- Counts unique identifiers
|
||||||
|
- Validates data patterns
|
||||||
|
- Identifies quality issues
|
||||||
|
|
||||||
|
### 🎯 **Platform-Aware**
|
||||||
|
- Uses correct MCP tools for each platform
|
||||||
|
- Respects platform naming conventions
|
||||||
|
- Applies platform-specific data type rules
|
||||||
|
- Generates platform-compatible SQL references
|
||||||
|
|
||||||
|
### 📋 **Complete Documentation**
|
||||||
|
- Documents all excluded tables with reasons
|
||||||
|
- Lists available columns for excluded tables
|
||||||
|
- Explains why columns don't qualify as user identifiers
|
||||||
|
- Provides expert recommendations
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Output Format
|
||||||
|
|
||||||
|
**The generated unify.yml will have EXACTLY this structure:**
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
name: {canonical_id_name}
|
||||||
|
#####################################################
|
||||||
|
##
|
||||||
|
##Declare Validation logic for unification keys
|
||||||
|
##
|
||||||
|
#####################################################
|
||||||
|
keys:
|
||||||
|
- name: {key1}
|
||||||
|
valid_regexp: "{pattern}"
|
||||||
|
invalid_texts: ['{val1}', '{val2}', '{val3}']
|
||||||
|
- name: {key2}
|
||||||
|
invalid_texts: ['{val1}', '{val2}', '{val3}']
|
||||||
|
|
||||||
|
#####################################################
|
||||||
|
##
|
||||||
|
##Declare databases, tables, and keys to use during unification
|
||||||
|
##
|
||||||
|
#####################################################
|
||||||
|
|
||||||
|
tables:
|
||||||
|
- database: {db/catalog}
|
||||||
|
table: {table}
|
||||||
|
key_columns:
|
||||||
|
- {column: {col}, key: {key}}
|
||||||
|
|
||||||
|
#####################################################
|
||||||
|
##
|
||||||
|
##Declare hierarchy for unification. Define keys to use for each level.
|
||||||
|
##
|
||||||
|
#####################################################
|
||||||
|
|
||||||
|
canonical_ids:
|
||||||
|
- name: {canonical_id_name}
|
||||||
|
merge_by_keys: [{key1}, {key2}, ...]
|
||||||
|
merge_iterations: {number}
|
||||||
|
|
||||||
|
#####################################################
|
||||||
|
##
|
||||||
|
##Declare Similar Attributes and standardize into a single column
|
||||||
|
##
|
||||||
|
#####################################################
|
||||||
|
|
||||||
|
master_tables:
|
||||||
|
- name: {canonical_id_name}_master_table
|
||||||
|
canonical_id: {canonical_id_name}
|
||||||
|
attributes:
|
||||||
|
- name: {attribute}
|
||||||
|
source_columns:
|
||||||
|
- {table: {t}, column: {c}, order: last, order_by: time, priority: 1}
|
||||||
|
```
|
||||||
|
|
||||||
|
**NO deviations from this structure - EXACT template compliance guaranteed.**
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Prerequisites
|
||||||
|
|
||||||
|
### Required:
|
||||||
|
- ✅ Snowflake or Databricks platform access
|
||||||
|
- ✅ Platform-specific MCP tools configured (may use fallback if unavailable)
|
||||||
|
- ✅ Read permissions on tables to be analyzed
|
||||||
|
- ✅ Tables must exist and be accessible
|
||||||
|
|
||||||
|
### Optional:
|
||||||
|
- Custom unify.yml template path (if not using default)
|
||||||
|
- Master table attribute specifications
|
||||||
|
- Custom validation rules
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Expected Timeline
|
||||||
|
|
||||||
|
| Step | Duration |
|
||||||
|
|------|----------|
|
||||||
|
| Platform detection | < 1 min |
|
||||||
|
| Schema analysis (per table) | 5-10 sec |
|
||||||
|
| Data analysis (per identifier) | 10-20 sec |
|
||||||
|
| Expert analysis | 1-2 min |
|
||||||
|
| YAML generation | < 1 min |
|
||||||
|
| **Total (for 5 tables)** | **~3-5 min** |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Error Handling
|
||||||
|
|
||||||
|
### Common Issues:
|
||||||
|
|
||||||
|
**Issue**: MCP tools not available for platform
|
||||||
|
**Solution**:
|
||||||
|
- I'll inform you and provide fallback options
|
||||||
|
- You can provide schema information manually
|
||||||
|
- I'll still generate unify.yml with validation warnings
|
||||||
|
|
||||||
|
**Issue**: No tables have user identifiers
|
||||||
|
**Solution**:
|
||||||
|
- I'll show you why tables were excluded
|
||||||
|
- Suggest alternative tables to analyze
|
||||||
|
- Explain what constitutes a user identifier
|
||||||
|
|
||||||
|
**Issue**: Table not accessible
|
||||||
|
**Solution**:
|
||||||
|
- Document which tables are inaccessible
|
||||||
|
- Continue with accessible tables
|
||||||
|
- Recommend permission checks
|
||||||
|
|
||||||
|
**Issue**: Complex data types found
|
||||||
|
**Solution**:
|
||||||
|
- Exclude complex type columns (arrays, structs, maps)
|
||||||
|
- Explain why they can't be used for unification
|
||||||
|
- Suggest alternative columns if available
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Success Criteria
|
||||||
|
|
||||||
|
Generated unify.yml will:
|
||||||
|
- ✅ Use EXACT template structure - NO modifications
|
||||||
|
- ✅ Contain ONLY tables with validated user identifiers
|
||||||
|
- ✅ Include ONLY columns that actually exist in tables
|
||||||
|
- ✅ Have validation rules based on actual data patterns
|
||||||
|
- ✅ Be ready for immediate use with hybrid-generate-snowflake or hybrid-generate-databricks
|
||||||
|
- ✅ Work without any manual edits
|
||||||
|
- ✅ Include comprehensive documentation in comments
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Next Steps After Generation
|
||||||
|
|
||||||
|
1. **Review the generated unify.yml**
|
||||||
|
- Verify tables and columns are correct
|
||||||
|
- Check validation rules are appropriate
|
||||||
|
- Review merge strategy and priorities
|
||||||
|
|
||||||
|
2. **Generate SQL for your platform**:
|
||||||
|
- Snowflake: `/cdp-hybrid-idu:hybrid-generate-snowflake`
|
||||||
|
- Databricks: `/cdp-hybrid-idu:hybrid-generate-databricks`
|
||||||
|
|
||||||
|
3. **Execute the workflow**:
|
||||||
|
- Snowflake: `/cdp-hybrid-idu:hybrid-execute-snowflake`
|
||||||
|
- Databricks: `/cdp-hybrid-idu:hybrid-execute-databricks`
|
||||||
|
|
||||||
|
4. **Monitor convergence and results**
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Getting Started
|
||||||
|
|
||||||
|
**Ready to begin?**
|
||||||
|
|
||||||
|
Please provide:
|
||||||
|
|
||||||
|
1. **Platform**: Snowflake or Databricks
|
||||||
|
2. **Tables**: List of tables to analyze (full paths)
|
||||||
|
3. **Canonical ID Name**: Name for your unified ID (e.g., `unified_customer_id`)
|
||||||
|
4. **Output File** (optional): Where to save unify.yml (default: `unify.yml`)
|
||||||
|
|
||||||
|
**Example**:
|
||||||
|
```
|
||||||
|
Platform: Snowflake
|
||||||
|
Tables:
|
||||||
|
- customer_db.public.customers
|
||||||
|
- customer_db.public.orders
|
||||||
|
- marketing_db.public.campaigns
|
||||||
|
Canonical ID: unified_id
|
||||||
|
Output: snowflake_unify.yml
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
**I'll analyze your tables and generate a production-ready unify.yml configuration!**
|
||||||
337
commands/hybrid-unif-config-validate.md
Normal file
337
commands/hybrid-unif-config-validate.md
Normal file
@@ -0,0 +1,337 @@
|
|||||||
|
---
|
||||||
|
name: hybrid-unif-config-validate
|
||||||
|
description: Validate YAML configuration for hybrid ID unification before SQL generation
|
||||||
|
---
|
||||||
|
|
||||||
|
# Validate Hybrid ID Unification YAML
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
Validate your `unify.yml` configuration file to ensure it's properly structured and ready for SQL generation. This command checks syntax, structure, validation rules, and provides recommendations for optimization.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## What You Need
|
||||||
|
|
||||||
|
### Required Input
|
||||||
|
1. **YAML Configuration File**: Path to your `unify.yml`
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## What I'll Do
|
||||||
|
|
||||||
|
### Step 1: File Validation
|
||||||
|
- Verify file exists and is readable
|
||||||
|
- Check YAML syntax (proper indentation, quotes, etc.)
|
||||||
|
- Ensure all required sections are present
|
||||||
|
|
||||||
|
### Step 2: Structure Validation
|
||||||
|
Check presence and structure of:
|
||||||
|
- **name**: Unification project name
|
||||||
|
- **keys**: Key definitions with validation rules
|
||||||
|
- **tables**: Source tables with key column mappings
|
||||||
|
- **canonical_ids**: Canonical ID configuration
|
||||||
|
- **master_tables**: Master table definitions (optional)
|
||||||
|
|
||||||
|
### Step 3: Content Validation
|
||||||
|
Validate individual sections:
|
||||||
|
|
||||||
|
**Keys Section**:
|
||||||
|
- ✓ Each key has a unique name
|
||||||
|
- ✓ `valid_regexp` is a valid regex pattern (if provided)
|
||||||
|
- ✓ `invalid_texts` is an array (if provided)
|
||||||
|
- ⚠ Recommend validation rules if missing
|
||||||
|
|
||||||
|
**Tables Section**:
|
||||||
|
- ✓ Each table has a name
|
||||||
|
- ✓ Each table has at least one key_column
|
||||||
|
- ✓ All referenced keys exist in keys section
|
||||||
|
- ✓ Column names are valid identifiers
|
||||||
|
- ⚠ Check for duplicate table definitions
|
||||||
|
|
||||||
|
**Canonical IDs Section**:
|
||||||
|
- ✓ Has a name (will be canonical ID column name)
|
||||||
|
- ✓ `merge_by_keys` references existing keys
|
||||||
|
- ✓ `merge_iterations` is a positive integer (if provided)
|
||||||
|
- ⚠ Suggest optimal iteration count if not specified
|
||||||
|
|
||||||
|
**Master Tables Section** (if present):
|
||||||
|
- ✓ Each master table has a name and canonical_id
|
||||||
|
- ✓ Referenced canonical_id exists
|
||||||
|
- ✓ Attributes have proper structure
|
||||||
|
- ✓ Source tables in attributes exist
|
||||||
|
- ✓ Priority values are valid
|
||||||
|
- ⚠ Check for attribute conflicts
|
||||||
|
|
||||||
|
### Step 4: Cross-Reference Validation
|
||||||
|
- ✓ All merge_by_keys exist in keys section
|
||||||
|
- ✓ All key_columns reference defined keys
|
||||||
|
- ✓ All master table source tables exist in tables section
|
||||||
|
- ✓ Canonical ID names don't conflict with existing columns
|
||||||
|
|
||||||
|
### Step 5: Best Practices Check
|
||||||
|
Provide recommendations for:
|
||||||
|
- Key validation rules
|
||||||
|
- Iteration count optimization
|
||||||
|
- Master table attribute priorities
|
||||||
|
- Performance considerations
|
||||||
|
|
||||||
|
### Step 6: Validation Report
|
||||||
|
Generate comprehensive report with:
|
||||||
|
- ✅ Passed checks
|
||||||
|
- ⚠ Warnings (non-critical issues)
|
||||||
|
- ❌ Errors (must fix before generation)
|
||||||
|
- 💡 Recommendations for improvement
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Command Usage
|
||||||
|
|
||||||
|
### Basic Usage
|
||||||
|
```
|
||||||
|
/cdp-hybrid-idu:hybrid-unif-config-validate
|
||||||
|
|
||||||
|
I'll prompt you for:
|
||||||
|
- YAML file path
|
||||||
|
```
|
||||||
|
|
||||||
|
### Direct Usage
|
||||||
|
```
|
||||||
|
YAML file: /path/to/unify.yml
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Example Validation
|
||||||
|
|
||||||
|
### Input YAML
|
||||||
|
```yaml
|
||||||
|
name: customer_unification
|
||||||
|
|
||||||
|
keys:
|
||||||
|
- name: email
|
||||||
|
valid_regexp: ".*@.*"
|
||||||
|
invalid_texts: ['', 'N/A', 'null']
|
||||||
|
- name: customer_id
|
||||||
|
invalid_texts: ['', 'N/A']
|
||||||
|
|
||||||
|
tables:
|
||||||
|
- table: customer_profiles
|
||||||
|
key_columns:
|
||||||
|
- {column: email_std, key: email}
|
||||||
|
- {column: customer_id, key: customer_id}
|
||||||
|
- table: orders
|
||||||
|
key_columns:
|
||||||
|
- {column: email_address, key: email}
|
||||||
|
|
||||||
|
canonical_ids:
|
||||||
|
- name: unified_id
|
||||||
|
merge_by_keys: [email, customer_id]
|
||||||
|
merge_iterations: 15
|
||||||
|
|
||||||
|
master_tables:
|
||||||
|
- name: customer_master
|
||||||
|
canonical_id: unified_id
|
||||||
|
attributes:
|
||||||
|
- name: best_email
|
||||||
|
source_columns:
|
||||||
|
- {table: customer_profiles, column: email_std, priority: 1}
|
||||||
|
- {table: orders, column: email_address, priority: 2}
|
||||||
|
```
|
||||||
|
|
||||||
|
### Validation Report
|
||||||
|
```
|
||||||
|
✅ YAML VALIDATION SUCCESSFUL
|
||||||
|
|
||||||
|
File Structure:
|
||||||
|
✅ Valid YAML syntax
|
||||||
|
✅ All required sections present
|
||||||
|
✅ Proper indentation and formatting
|
||||||
|
|
||||||
|
Keys Section (2 keys):
|
||||||
|
✅ email: Valid regex pattern, invalid_texts defined
|
||||||
|
✅ customer_id: Invalid_texts defined
|
||||||
|
⚠ Consider adding valid_regexp for customer_id for better validation
|
||||||
|
|
||||||
|
Tables Section (2 tables):
|
||||||
|
✅ customer_profiles: 2 key columns mapped
|
||||||
|
✅ orders: 1 key column mapped
|
||||||
|
✅ All referenced keys exist
|
||||||
|
|
||||||
|
Canonical IDs Section:
|
||||||
|
✅ Name: unified_id
|
||||||
|
✅ Merge keys: email, customer_id (both exist)
|
||||||
|
✅ Iterations: 15 (recommended range: 10-20)
|
||||||
|
|
||||||
|
Master Tables Section (1 master table):
|
||||||
|
✅ customer_master: References unified_id
|
||||||
|
✅ Attribute 'best_email': 2 sources with priorities
|
||||||
|
✅ All source tables exist
|
||||||
|
|
||||||
|
Cross-References:
|
||||||
|
✅ All merge_by_keys defined in keys section
|
||||||
|
✅ All key_columns reference existing keys
|
||||||
|
✅ All master table sources exist
|
||||||
|
✅ No canonical ID name conflicts
|
||||||
|
|
||||||
|
Recommendations:
|
||||||
|
💡 Consider adding valid_regexp for customer_id (e.g., "^[A-Z0-9]+$")
|
||||||
|
💡 Add more master table attributes for richer customer profiles
|
||||||
|
💡 Consider array attributes (top_3_emails) for historical tracking
|
||||||
|
|
||||||
|
Summary:
|
||||||
|
✅ 0 errors
|
||||||
|
⚠ 1 warning
|
||||||
|
💡 3 recommendations
|
||||||
|
|
||||||
|
✓ Configuration is ready for SQL generation!
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Validation Checks
|
||||||
|
|
||||||
|
### Required Checks (Must Pass)
|
||||||
|
- [ ] File exists and is readable
|
||||||
|
- [ ] Valid YAML syntax
|
||||||
|
- [ ] `name` field present
|
||||||
|
- [ ] `keys` section present with at least one key
|
||||||
|
- [ ] `tables` section present with at least one table
|
||||||
|
- [ ] `canonical_ids` section present
|
||||||
|
- [ ] All merge_by_keys exist in keys section
|
||||||
|
- [ ] All key_columns reference defined keys
|
||||||
|
- [ ] No duplicate key names
|
||||||
|
- [ ] No duplicate table names
|
||||||
|
|
||||||
|
### Warning Checks (Recommended)
|
||||||
|
- [ ] Keys have validation rules (valid_regexp or invalid_texts)
|
||||||
|
- [ ] Merge_iterations specified (otherwise auto-calculated)
|
||||||
|
- [ ] Master tables defined for unified customer view
|
||||||
|
- [ ] Source tables have unique key combinations
|
||||||
|
- [ ] Attribute priorities are sequential
|
||||||
|
|
||||||
|
### Best Practice Checks
|
||||||
|
- [ ] Email keys have email regex pattern
|
||||||
|
- [ ] Phone keys have phone validation
|
||||||
|
- [ ] Invalid_texts include common null values ('', 'N/A', 'null')
|
||||||
|
- [ ] Master tables use time-based order_by for recency
|
||||||
|
- [ ] Array attributes for historical data (top_3_emails, etc.)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Common Validation Errors
|
||||||
|
|
||||||
|
### Syntax Errors
|
||||||
|
**Error**: `Invalid YAML: mapping values are not allowed here`
|
||||||
|
**Solution**: Check indentation (use spaces, not tabs), ensure colons have space after them
|
||||||
|
|
||||||
|
**Error**: `Invalid YAML: could not find expected ':'`
|
||||||
|
**Solution**: Check for missing colons in key-value pairs
|
||||||
|
|
||||||
|
### Structure Errors
|
||||||
|
**Error**: `Missing required section: keys`
|
||||||
|
**Solution**: Add keys section with at least one key definition
|
||||||
|
|
||||||
|
**Error**: `Empty tables section`
|
||||||
|
**Solution**: Add at least one table with key_columns
|
||||||
|
|
||||||
|
### Reference Errors
|
||||||
|
**Error**: `Key 'phone' referenced in table 'orders' but not defined in keys section`
|
||||||
|
**Solution**: Add phone key to keys section or remove reference
|
||||||
|
|
||||||
|
**Error**: `Merge key 'phone_number' not found in keys section`
|
||||||
|
**Solution**: Add phone_number to keys section or remove from merge_by_keys
|
||||||
|
|
||||||
|
**Error**: `Master table source 'customer_360' not found in tables section`
|
||||||
|
**Solution**: Add customer_360 to tables section or use correct table name
|
||||||
|
|
||||||
|
### Value Errors
|
||||||
|
**Error**: `merge_iterations must be a positive integer, got: 'auto'`
|
||||||
|
**Solution**: Either remove merge_iterations (auto-calculate) or specify integer (e.g., 15)
|
||||||
|
|
||||||
|
**Error**: `Priority must be a positive integer, got: 'high'`
|
||||||
|
**Solution**: Use numeric priority (1 for highest, 2 for second, etc.)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Validation Levels
|
||||||
|
|
||||||
|
### Strict Mode (Default)
|
||||||
|
- Fails on any structural errors
|
||||||
|
- Warns on missing best practices
|
||||||
|
- Recommends optimizations
|
||||||
|
|
||||||
|
### Lenient Mode
|
||||||
|
- Only fails on critical syntax errors
|
||||||
|
- Allows missing optional fields
|
||||||
|
- Minimal warnings
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Platform-Specific Validation
|
||||||
|
|
||||||
|
### Databricks-Specific
|
||||||
|
- ✓ Table names compatible with Unity Catalog
|
||||||
|
- ✓ Column names valid for Spark SQL
|
||||||
|
- ⚠ Check for reserved keywords (DATABASE, TABLE, etc.)
|
||||||
|
|
||||||
|
### Snowflake-Specific
|
||||||
|
- ✓ Table names compatible with Snowflake
|
||||||
|
- ✓ Column names valid for Snowflake SQL
|
||||||
|
- ⚠ Check for reserved keywords (ACCOUNT, SCHEMA, etc.)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## What Happens Next
|
||||||
|
|
||||||
|
### If Validation Passes
|
||||||
|
```
|
||||||
|
✅ Configuration validated successfully!
|
||||||
|
|
||||||
|
Ready for:
|
||||||
|
• SQL generation (Databricks or Snowflake)
|
||||||
|
• Direct execution after generation
|
||||||
|
|
||||||
|
Next steps:
|
||||||
|
1. /cdp-hybrid-idu:hybrid-generate-databricks
|
||||||
|
2. /cdp-hybrid-idu:hybrid-generate-snowflake
|
||||||
|
3. /cdp-hybrid-idu:hybrid-setup (complete workflow)
|
||||||
|
```
|
||||||
|
|
||||||
|
### If Validation Fails
|
||||||
|
```
|
||||||
|
❌ Configuration has errors that must be fixed
|
||||||
|
|
||||||
|
Errors (must fix):
|
||||||
|
1. Missing required section: canonical_ids
|
||||||
|
2. Undefined key 'phone' referenced in table 'orders'
|
||||||
|
|
||||||
|
Suggestions:
|
||||||
|
• Add canonical_ids section with name and merge_by_keys
|
||||||
|
• Add phone key to keys section or remove from orders
|
||||||
|
|
||||||
|
Would you like help fixing these issues? (y/n)
|
||||||
|
```
|
||||||
|
|
||||||
|
I can help you:
|
||||||
|
- Fix syntax errors
|
||||||
|
- Add missing sections
|
||||||
|
- Define proper validation rules
|
||||||
|
- Optimize configuration
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Success Criteria
|
||||||
|
|
||||||
|
Validation passes when:
|
||||||
|
- ✅ YAML syntax is valid
|
||||||
|
- ✅ All required sections present
|
||||||
|
- ✅ All references resolved
|
||||||
|
- ✅ No structural errors
|
||||||
|
- ✅ Ready for SQL generation
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
**Ready to validate your YAML configuration?**
|
||||||
|
|
||||||
|
Provide your `unify.yml` file path to begin validation!
|
||||||
726
commands/hybrid-unif-merge-stats-creator.md
Normal file
726
commands/hybrid-unif-merge-stats-creator.md
Normal file
@@ -0,0 +1,726 @@
|
|||||||
|
---
|
||||||
|
name: hybrid-unif-merge-stats-creator
|
||||||
|
description: Generate professional HTML/PDF merge statistics report from ID unification results for Snowflake or Databricks with expert analysis and visualizations
|
||||||
|
---
|
||||||
|
|
||||||
|
# ID Unification Merge Statistics Report Generator
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
I'll generate a **comprehensive, professional HTML report** analyzing your ID unification merge statistics with:
|
||||||
|
|
||||||
|
- 📊 **Executive Summary** with key performance indicators
|
||||||
|
- 📈 **Identity Resolution Performance** analysis and deduplication rates
|
||||||
|
- 🎯 **Merge Distribution** patterns and complexity analysis
|
||||||
|
- 👥 **Top Merged Profiles** highlighting complex identity resolutions
|
||||||
|
- ✅ **Data Quality Metrics** with coverage percentages
|
||||||
|
- 🚀 **Convergence Analysis** showing iteration performance
|
||||||
|
- 💡 **Expert Recommendations** for optimization and next steps
|
||||||
|
|
||||||
|
**Platform Support:**
|
||||||
|
- ✅ Snowflake (using Snowflake MCP tools)
|
||||||
|
- ✅ Databricks (using Databricks MCP tools)
|
||||||
|
|
||||||
|
**Output Format:**
|
||||||
|
- Beautiful HTML report with charts, tables, and visualizations
|
||||||
|
- PDF-ready (print to PDF from browser)
|
||||||
|
- Consistent formatting every time
|
||||||
|
- Platform-agnostic design
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## What You Need to Provide
|
||||||
|
|
||||||
|
### 1. Platform Selection
|
||||||
|
- **Snowflake**: For Snowflake-based ID unification
|
||||||
|
- **Databricks**: For Databricks-based ID unification
|
||||||
|
|
||||||
|
### 2. Database/Catalog Configuration
|
||||||
|
|
||||||
|
**For Snowflake:**
|
||||||
|
- **Database Name**: Where your unification tables are stored (e.g., `INDRESH_TEST`, `CUSTOMER_CDP`)
|
||||||
|
- **Schema Name**: Schema containing tables (e.g., `PUBLIC`, `ID_UNIFICATION`)
|
||||||
|
|
||||||
|
**For Databricks:**
|
||||||
|
- **Catalog Name**: Unity Catalog name (e.g., `customer_data`, `cdp_prod`)
|
||||||
|
- **Schema Name**: Schema containing tables (e.g., `id_unification`, `unified_profiles`)
|
||||||
|
|
||||||
|
### 3. Canonical ID Configuration
|
||||||
|
- **Canonical ID Name**: Name used for your unified ID (e.g., `td_id`, `unified_customer_id`, `master_id`)
|
||||||
|
- This is used to find the correct tables: `{canonical_id}_lookup`, `{canonical_id}_master_table`, etc.
|
||||||
|
|
||||||
|
### 4. Output Configuration (Optional)
|
||||||
|
- **Output File Path**: Where to save the HTML report (default: `id_unification_report.html`)
|
||||||
|
- **Report Title**: Custom title for the report (default: "ID Unification Merge Statistics Report")
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## What I'll Do
|
||||||
|
|
||||||
|
### Step 1: Platform Detection and Validation
|
||||||
|
|
||||||
|
**Snowflake:**
|
||||||
|
```
|
||||||
|
1. Verify Snowflake MCP tools are available
|
||||||
|
2. Test connection to specified database.schema
|
||||||
|
3. Validate canonical ID tables exist:
|
||||||
|
- {database}.{schema}.{canonical_id}_lookup
|
||||||
|
- {database}.{schema}.{canonical_id}_master_table
|
||||||
|
- {database}.{schema}.{canonical_id}_source_key_stats
|
||||||
|
- {database}.{schema}.{canonical_id}_result_key_stats
|
||||||
|
4. Confirm access permissions
|
||||||
|
```
|
||||||
|
|
||||||
|
**Databricks:**
|
||||||
|
```
|
||||||
|
1. Verify Databricks MCP tools are available (or use Snowflake fallback)
|
||||||
|
2. Test connection to specified catalog.schema
|
||||||
|
3. Validate canonical ID tables exist
|
||||||
|
4. Confirm access permissions
|
||||||
|
```
|
||||||
|
|
||||||
|
### Step 2: Data Collection with Expert Analysis
|
||||||
|
|
||||||
|
I'll execute **16 specialized queries** to collect comprehensive statistics:
|
||||||
|
|
||||||
|
**Core Statistics Queries:**
|
||||||
|
|
||||||
|
1. **Source Key Statistics**
|
||||||
|
- Pre-unification identity counts
|
||||||
|
- Distinct values per key type (customer_id, email, phone, etc.)
|
||||||
|
- Per-table breakdowns
|
||||||
|
|
||||||
|
2. **Result Key Statistics**
|
||||||
|
- Post-unification canonical ID counts
|
||||||
|
- Distribution histograms
|
||||||
|
- Coverage per key type
|
||||||
|
|
||||||
|
3. **Canonical ID Metrics**
|
||||||
|
- Total identities processed
|
||||||
|
- Unique canonical IDs created
|
||||||
|
- Merge ratio calculation
|
||||||
|
|
||||||
|
4. **Top Merged Profiles**
|
||||||
|
- Top 10 most complex merges
|
||||||
|
- Identity count per canonical ID
|
||||||
|
- Merge complexity scoring
|
||||||
|
|
||||||
|
5. **Merge Distribution Analysis**
|
||||||
|
- Categorization (2, 3-5, 6-10, 10+ identities)
|
||||||
|
- Percentage distribution
|
||||||
|
- Pattern analysis
|
||||||
|
|
||||||
|
6. **Key Type Distribution**
|
||||||
|
- Identity breakdown by type
|
||||||
|
- Namespace analysis
|
||||||
|
- Cross-key coverage
|
||||||
|
|
||||||
|
7. **Master Table Quality Metrics**
|
||||||
|
- Attribute coverage percentages
|
||||||
|
- Data completeness analysis
|
||||||
|
- Sample record extraction
|
||||||
|
|
||||||
|
8. **Configuration Metadata**
|
||||||
|
- Unification settings
|
||||||
|
- Column mappings
|
||||||
|
- Validation rules
|
||||||
|
|
||||||
|
**Platform-Specific SQL Adaptation:**
|
||||||
|
|
||||||
|
For **Snowflake**:
|
||||||
|
```sql
|
||||||
|
SELECT COUNT(*) as total_identities,
|
||||||
|
COUNT(DISTINCT canonical_id) as unique_canonical_ids
|
||||||
|
FROM {database}.{schema}.{canonical_id}_lookup;
|
||||||
|
```
|
||||||
|
|
||||||
|
For **Databricks**:
|
||||||
|
```sql
|
||||||
|
SELECT COUNT(*) as total_identities,
|
||||||
|
COUNT(DISTINCT canonical_id) as unique_canonical_ids
|
||||||
|
FROM {catalog}.{schema}.{canonical_id}_lookup;
|
||||||
|
```
|
||||||
|
|
||||||
|
### Step 3: Statistical Analysis and Calculations
|
||||||
|
|
||||||
|
I'll perform expert-level calculations:
|
||||||
|
|
||||||
|
**Deduplication Rates:**
|
||||||
|
```
|
||||||
|
For each key type:
|
||||||
|
- Source distinct count (pre-unification)
|
||||||
|
- Final canonical IDs (post-unification)
|
||||||
|
- Deduplication % = (source - final) / source * 100
|
||||||
|
```
|
||||||
|
|
||||||
|
**Merge Ratios:**
|
||||||
|
```
|
||||||
|
- Average identities per customer = total_identities / unique_canonical_ids
|
||||||
|
- Distribution across categories
|
||||||
|
- Outlier detection (10+ merges)
|
||||||
|
```
|
||||||
|
|
||||||
|
**Convergence Analysis:**
|
||||||
|
```
|
||||||
|
- Parse from execution logs if available
|
||||||
|
- Calculate from iteration metadata tables
|
||||||
|
- Estimate convergence quality
|
||||||
|
```
|
||||||
|
|
||||||
|
**Data Quality Scores:**
|
||||||
|
```
|
||||||
|
- Coverage % for each attribute
|
||||||
|
- Completeness assessment
|
||||||
|
- Quality grading (Excellent, Good, Needs Improvement)
|
||||||
|
```
|
||||||
|
|
||||||
|
### Step 4: HTML Report Generation
|
||||||
|
|
||||||
|
I'll generate a **pixel-perfect HTML report** with:
|
||||||
|
|
||||||
|
**Design Features:**
|
||||||
|
- ✨ Modern gradient design (purple theme)
|
||||||
|
- 📊 Interactive visualizations (progress bars, horizontal bar charts)
|
||||||
|
- 🎨 Color-coded badges and status indicators
|
||||||
|
- 📱 Responsive layout (works on all devices)
|
||||||
|
- 🖨️ Print-optimized CSS for PDF export
|
||||||
|
|
||||||
|
**Report Structure:**
|
||||||
|
|
||||||
|
```html
|
||||||
|
<!DOCTYPE html>
|
||||||
|
<html>
|
||||||
|
<head>
|
||||||
|
- Professional CSS styling
|
||||||
|
- Chart/visualization styles
|
||||||
|
- Print media queries
|
||||||
|
</head>
|
||||||
|
<body>
|
||||||
|
<header>
|
||||||
|
- Report title
|
||||||
|
- Executive tagline
|
||||||
|
</header>
|
||||||
|
|
||||||
|
<metadata-bar>
|
||||||
|
- Database/Catalog info
|
||||||
|
- Canonical ID name
|
||||||
|
- Generation timestamp
|
||||||
|
- Platform indicator
|
||||||
|
</metadata-bar>
|
||||||
|
|
||||||
|
<section: Executive Summary>
|
||||||
|
- 4 KPI metric cards
|
||||||
|
- Key findings insight box
|
||||||
|
</section>
|
||||||
|
|
||||||
|
<section: Identity Resolution Performance>
|
||||||
|
- Source vs result comparison table
|
||||||
|
- Deduplication rate analysis
|
||||||
|
- Horizontal bar charts
|
||||||
|
- Expert insights
|
||||||
|
</section>
|
||||||
|
|
||||||
|
<section: Merge Distribution Analysis>
|
||||||
|
- Category breakdown table
|
||||||
|
- Distribution visualizations
|
||||||
|
- Pattern analysis insights
|
||||||
|
</section>
|
||||||
|
|
||||||
|
<section: Top Merged Profiles>
|
||||||
|
- Top 10 ranked table
|
||||||
|
- Complexity badges
|
||||||
|
- Investigation recommendations
|
||||||
|
</section>
|
||||||
|
|
||||||
|
<section: Source Table Configuration>
|
||||||
|
- Column mapping table
|
||||||
|
- Source contributions
|
||||||
|
- Multi-key strategy analysis
|
||||||
|
</section>
|
||||||
|
|
||||||
|
<section: Master Table Data Quality>
|
||||||
|
- 6 coverage cards with progress bars
|
||||||
|
- Sample records table
|
||||||
|
- Quality assessment
|
||||||
|
</section>
|
||||||
|
|
||||||
|
<section: Convergence Performance>
|
||||||
|
- Iteration breakdown table
|
||||||
|
- Convergence progression chart
|
||||||
|
- Efficiency analysis
|
||||||
|
</section>
|
||||||
|
|
||||||
|
<section: Expert Recommendations>
|
||||||
|
- 4 recommendation cards
|
||||||
|
- Strategic next steps
|
||||||
|
- Downstream activation ideas
|
||||||
|
</section>
|
||||||
|
|
||||||
|
<section: Summary Statistics>
|
||||||
|
- Comprehensive metrics table
|
||||||
|
- All key numbers documented
|
||||||
|
</section>
|
||||||
|
|
||||||
|
<footer>
|
||||||
|
- Generation metadata
|
||||||
|
- Platform information
|
||||||
|
- Report description
|
||||||
|
</footer>
|
||||||
|
</body>
|
||||||
|
</html>
|
||||||
|
```
|
||||||
|
|
||||||
|
### Step 5: Quality Validation and Output
|
||||||
|
|
||||||
|
**Pre-Output Validation:**
|
||||||
|
```
|
||||||
|
1. Verify all sections have data
|
||||||
|
2. Check calculations are correct
|
||||||
|
3. Validate percentages sum properly
|
||||||
|
4. Ensure no missing values
|
||||||
|
5. Confirm HTML is well-formed
|
||||||
|
```
|
||||||
|
|
||||||
|
**File Output:**
|
||||||
|
```
|
||||||
|
1. Write HTML to specified path
|
||||||
|
2. Create backup if file exists
|
||||||
|
3. Set proper file permissions
|
||||||
|
4. Verify file was written successfully
|
||||||
|
```
|
||||||
|
|
||||||
|
**Report Summary:**
|
||||||
|
```
|
||||||
|
✓ Report generated: {file_path}
|
||||||
|
✓ File size: {size} KB
|
||||||
|
✓ Sections included: 9
|
||||||
|
✓ Statistics queries: 16
|
||||||
|
✓ Data quality score: {score}%
|
||||||
|
✓ Ready for: Browser viewing, PDF export, sharing
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Example Workflow
|
||||||
|
|
||||||
|
### Snowflake Example
|
||||||
|
|
||||||
|
**User Input:**
|
||||||
|
```
|
||||||
|
Platform: Snowflake
|
||||||
|
Database: INDRESH_TEST
|
||||||
|
Schema: PUBLIC
|
||||||
|
Canonical ID: td_id
|
||||||
|
Output: snowflake_merge_report.html
|
||||||
|
```
|
||||||
|
|
||||||
|
**Process:**
|
||||||
|
```
|
||||||
|
✓ Connected to Snowflake via MCP
|
||||||
|
✓ Database: INDRESH_TEST.PUBLIC validated
|
||||||
|
✓ Tables found:
|
||||||
|
- td_id_lookup (19,512 records)
|
||||||
|
- td_id_master_table (4,940 records)
|
||||||
|
- td_id_source_key_stats (4 records)
|
||||||
|
- td_id_result_key_stats (4 records)
|
||||||
|
|
||||||
|
Executing queries:
|
||||||
|
✓ Query 1: Source statistics retrieved
|
||||||
|
✓ Query 2: Result statistics retrieved
|
||||||
|
✓ Query 3: Canonical ID counts (19,512 → 4,940)
|
||||||
|
✓ Query 4: Top 10 merged profiles identified
|
||||||
|
✓ Query 5: Merge distribution calculated
|
||||||
|
✓ Query 6: Key type distribution analyzed
|
||||||
|
✓ Query 7: Master table coverage (100% email, 99.39% phone)
|
||||||
|
✓ Query 8: Sample records extracted
|
||||||
|
✓ Query 9-11: Metadata retrieved
|
||||||
|
|
||||||
|
Calculating metrics:
|
||||||
|
✓ Merge ratio: 3.95:1
|
||||||
|
✓ Fragmentation reduction: 74.7%
|
||||||
|
✓ Deduplication rates:
|
||||||
|
- customer_id: 23.9%
|
||||||
|
- email: 32.0%
|
||||||
|
- phone: 14.8%
|
||||||
|
✓ Data quality score: 99.7%
|
||||||
|
|
||||||
|
Generating HTML report:
|
||||||
|
✓ Executive summary section
|
||||||
|
✓ Performance analysis section
|
||||||
|
✓ Merge distribution section
|
||||||
|
✓ Top profiles section
|
||||||
|
✓ Source configuration section
|
||||||
|
✓ Data quality section
|
||||||
|
✓ Convergence section
|
||||||
|
✓ Recommendations section
|
||||||
|
✓ Summary statistics section
|
||||||
|
|
||||||
|
✓ Report saved: snowflake_merge_report.html (142 KB)
|
||||||
|
✓ Open in browser to view
|
||||||
|
✓ Print to PDF for distribution
|
||||||
|
```
|
||||||
|
|
||||||
|
**Generated Report Contents:**
|
||||||
|
```
|
||||||
|
Executive Summary:
|
||||||
|
- 4,940 unified profiles
|
||||||
|
- 19,512 total identities
|
||||||
|
- 3.95:1 merge ratio
|
||||||
|
- 74.7% fragmentation reduction
|
||||||
|
|
||||||
|
Identity Resolution:
|
||||||
|
- customer_id: 6,489 → 4,940 (23.9% reduction)
|
||||||
|
- email: 7,261 → 4,940 (32.0% reduction)
|
||||||
|
- phone: 5,762 → 4,910 (14.8% reduction)
|
||||||
|
|
||||||
|
Merge Distribution:
|
||||||
|
- 89.0% profiles: 3-5 identities (normal)
|
||||||
|
- 8.1% profiles: 6-10 identities (high engagement)
|
||||||
|
- 2.3% profiles: 10+ identities (complex)
|
||||||
|
|
||||||
|
Top Merged Profile:
|
||||||
|
- mS9ssBEh4EsN: 38 identities merged
|
||||||
|
|
||||||
|
Data Quality:
|
||||||
|
- Email: 100% coverage
|
||||||
|
- Phone: 99.39% coverage
|
||||||
|
- Names: 100% coverage
|
||||||
|
- Location: 100% coverage
|
||||||
|
|
||||||
|
Expert Recommendations:
|
||||||
|
- Implement incremental processing
|
||||||
|
- Monitor profiles with 20+ merges
|
||||||
|
- Enable downstream activation
|
||||||
|
- Set up quality monitoring
|
||||||
|
```
|
||||||
|
|
||||||
|
### Databricks Example
|
||||||
|
|
||||||
|
**User Input:**
|
||||||
|
```
|
||||||
|
Platform: Databricks
|
||||||
|
Catalog: customer_cdp
|
||||||
|
Schema: id_unification
|
||||||
|
Canonical ID: unified_customer_id
|
||||||
|
Output: databricks_merge_report.html
|
||||||
|
```
|
||||||
|
|
||||||
|
**Process:**
|
||||||
|
```
|
||||||
|
✓ Connected to Databricks (or using Snowflake MCP fallback)
|
||||||
|
✓ Catalog: customer_cdp.id_unification validated
|
||||||
|
✓ Tables found:
|
||||||
|
- unified_customer_id_lookup
|
||||||
|
- unified_customer_id_master_table
|
||||||
|
- unified_customer_id_source_key_stats
|
||||||
|
- unified_customer_id_result_key_stats
|
||||||
|
|
||||||
|
[Same query execution and report generation as Snowflake]
|
||||||
|
|
||||||
|
✓ Report saved: databricks_merge_report.html
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Key Features
|
||||||
|
|
||||||
|
### 🎯 **Consistency Guarantee**
|
||||||
|
- **Same report every time**: Deterministic HTML generation
|
||||||
|
- **Platform-agnostic design**: Works identically on Snowflake and Databricks
|
||||||
|
- **Version controlled**: Report structure is fixed and versioned
|
||||||
|
|
||||||
|
### 🔍 **Expert Analysis**
|
||||||
|
- **16 specialized queries**: Comprehensive data collection
|
||||||
|
- **Calculated metrics**: Deduplication rates, merge ratios, quality scores
|
||||||
|
- **Pattern detection**: Identify anomalies and outliers
|
||||||
|
- **Strategic insights**: Actionable recommendations
|
||||||
|
|
||||||
|
### 📊 **Professional Visualizations**
|
||||||
|
- **KPI metric cards**: Large, colorful summary metrics
|
||||||
|
- **Progress bars**: Coverage percentages with animations
|
||||||
|
- **Horizontal bar charts**: Distribution comparisons
|
||||||
|
- **Color-coded badges**: Status indicators (Excellent, Good, Needs Review)
|
||||||
|
- **Tables with hover effects**: Interactive data exploration
|
||||||
|
|
||||||
|
### 🌍 **Platform Flexibility**
|
||||||
|
- **Snowflake**: Uses `mcp__snowflake__execute_query` tool
|
||||||
|
- **Databricks**: Uses Databricks MCP tools (with fallback options)
|
||||||
|
- **Automatic SQL adaptation**: Platform-specific query generation
|
||||||
|
- **Table name resolution**: Handles catalog vs database differences
|
||||||
|
|
||||||
|
### 📋 **Comprehensive Coverage**
|
||||||
|
|
||||||
|
**9 Report Sections:**
|
||||||
|
1. Executive Summary (4 KPIs + findings)
|
||||||
|
2. Identity Resolution Performance (deduplication analysis)
|
||||||
|
3. Merge Distribution Analysis (categorized breakdown)
|
||||||
|
4. Top Merged Profiles (complexity ranking)
|
||||||
|
5. Source Table Configuration (mappings)
|
||||||
|
6. Master Table Data Quality (coverage metrics)
|
||||||
|
7. Convergence Performance (iteration analysis)
|
||||||
|
8. Expert Recommendations (strategic guidance)
|
||||||
|
9. Summary Statistics (complete metrics)
|
||||||
|
|
||||||
|
**16 Statistical Queries:**
|
||||||
|
- Source/result key statistics
|
||||||
|
- Canonical ID counts and distributions
|
||||||
|
- Merge pattern analysis
|
||||||
|
- Quality coverage metrics
|
||||||
|
- Configuration metadata
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Table Naming Conventions
|
||||||
|
|
||||||
|
The command automatically finds tables based on your canonical ID name:
|
||||||
|
|
||||||
|
### Required Tables
|
||||||
|
|
||||||
|
For canonical ID = `{canonical_id}`:
|
||||||
|
|
||||||
|
1. **Lookup Table**: `{canonical_id}_lookup`
|
||||||
|
- Contains: canonical_id, id, id_key_type
|
||||||
|
- Used for: Merge ratio, distribution, top profiles
|
||||||
|
|
||||||
|
2. **Master Table**: `{canonical_id}_master_table`
|
||||||
|
- Contains: {canonical_id}, best_* attributes
|
||||||
|
- Used for: Data quality coverage
|
||||||
|
|
||||||
|
3. **Source Stats**: `{canonical_id}_source_key_stats`
|
||||||
|
- Contains: from_table, total_distinct, distinct_*
|
||||||
|
- Used for: Pre-unification baseline
|
||||||
|
|
||||||
|
4. **Result Stats**: `{canonical_id}_result_key_stats`
|
||||||
|
- Contains: from_table, total_distinct, histogram_*
|
||||||
|
- Used for: Post-unification results
|
||||||
|
|
||||||
|
### Optional Tables
|
||||||
|
|
||||||
|
5. **Unification Metadata**: `unification_metadata`
|
||||||
|
- Contains: canonical_id_name, canonical_id_type
|
||||||
|
- Used for: Configuration documentation
|
||||||
|
|
||||||
|
6. **Column Lookup**: `column_lookup`
|
||||||
|
- Contains: table_name, column_name, key_name
|
||||||
|
- Used for: Source table mappings
|
||||||
|
|
||||||
|
7. **Filter Lookup**: `filter_lookup`
|
||||||
|
- Contains: key_name, invalid_texts, valid_regexp
|
||||||
|
- Used for: Validation rules
|
||||||
|
|
||||||
|
**All tables must be in the same database.schema (Snowflake) or catalog.schema (Databricks)**
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Output Format
|
||||||
|
|
||||||
|
### HTML Report Features
|
||||||
|
|
||||||
|
**Styling:**
|
||||||
|
- Gradient purple theme (#667eea to #764ba2)
|
||||||
|
- Modern typography (system fonts)
|
||||||
|
- Responsive grid layouts
|
||||||
|
- Smooth hover animations
|
||||||
|
- Print-optimized media queries
|
||||||
|
|
||||||
|
**Sections:**
|
||||||
|
- Header with gradient background
|
||||||
|
- Metadata bar with key info
|
||||||
|
- 9 content sections with analysis
|
||||||
|
- Footer with generation details
|
||||||
|
|
||||||
|
**Visualizations:**
|
||||||
|
- Metric cards (4 in executive summary)
|
||||||
|
- Progress bars (6 in data quality)
|
||||||
|
- Horizontal bar charts (3 throughout report)
|
||||||
|
- Tables with sorting and hover effects
|
||||||
|
- Insight boxes with recommendations
|
||||||
|
|
||||||
|
**Interactivity:**
|
||||||
|
- Hover effects on cards and tables
|
||||||
|
- Animated progress bars
|
||||||
|
- Expandable insight boxes
|
||||||
|
- Responsive layout adapts to screen size
|
||||||
|
|
||||||
|
### PDF Export
|
||||||
|
|
||||||
|
To create a PDF from the HTML report:
|
||||||
|
|
||||||
|
1. Open HTML file in browser
|
||||||
|
2. Press Ctrl+P (Windows) or Cmd+P (Mac)
|
||||||
|
3. Select "Save as PDF"
|
||||||
|
4. Choose landscape orientation for better chart visibility
|
||||||
|
5. Enable background graphics for full styling
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Error Handling
|
||||||
|
|
||||||
|
### Common Issues and Solutions
|
||||||
|
|
||||||
|
**Issue: "Tables not found"**
|
||||||
|
```
|
||||||
|
Solution:
|
||||||
|
1. Verify canonical ID name is correct
|
||||||
|
2. Check database/catalog and schema names
|
||||||
|
3. Ensure unification workflow completed successfully
|
||||||
|
4. Confirm table naming: {canonical_id}_lookup, {canonical_id}_master_table, etc.
|
||||||
|
```
|
||||||
|
|
||||||
|
**Issue: "MCP tools not available"**
|
||||||
|
```
|
||||||
|
Solution:
|
||||||
|
1. For Snowflake: Verify Snowflake MCP server is configured
|
||||||
|
2. For Databricks: Fall back to Snowflake MCP with proper connection string
|
||||||
|
3. Check network connectivity
|
||||||
|
4. Validate credentials
|
||||||
|
```
|
||||||
|
|
||||||
|
**Issue: "No data in statistics tables"**
|
||||||
|
```
|
||||||
|
Solution:
|
||||||
|
1. Verify unification workflow ran completely
|
||||||
|
2. Check that statistics SQL files were executed
|
||||||
|
3. Confirm data exists in lookup and master tables
|
||||||
|
4. Re-run the unification workflow if needed
|
||||||
|
```
|
||||||
|
|
||||||
|
**Issue: "Permission denied"**
|
||||||
|
```
|
||||||
|
Solution:
|
||||||
|
1. Verify READ access to all tables
|
||||||
|
2. For Snowflake: Grant SELECT on schema
|
||||||
|
3. For Databricks: Grant USE CATALOG, USE SCHEMA, SELECT
|
||||||
|
4. Check role/user permissions
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Success Criteria
|
||||||
|
|
||||||
|
Generated report will:
|
||||||
|
|
||||||
|
- ✅ **Open successfully** in all modern browsers (Chrome, Firefox, Safari, Edge)
|
||||||
|
- ✅ **Display all 9 sections** with complete data
|
||||||
|
- ✅ **Show accurate calculations** for all metrics
|
||||||
|
- ✅ **Include visualizations** (charts, progress bars, tables)
|
||||||
|
- ✅ **Render consistently** every time it's generated
|
||||||
|
- ✅ **Export cleanly to PDF** with proper formatting
|
||||||
|
- ✅ **Match the reference design** (same HTML/CSS structure)
|
||||||
|
- ✅ **Contain expert insights** and recommendations
|
||||||
|
- ✅ **Be production-ready** for stakeholder distribution
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Usage Examples
|
||||||
|
|
||||||
|
### Quick Start (Snowflake)
|
||||||
|
|
||||||
|
```
|
||||||
|
/cdp-hybrid-idu:hybrid-unif-merge-stats-creator
|
||||||
|
|
||||||
|
> Platform: Snowflake
|
||||||
|
> Database: PROD_CDP
|
||||||
|
> Schema: ID_UNIFICATION
|
||||||
|
> Canonical ID: master_customer_id
|
||||||
|
> Output: (press Enter for default)
|
||||||
|
|
||||||
|
✓ Report generated: id_unification_report.html
|
||||||
|
```
|
||||||
|
|
||||||
|
### Custom Output Path
|
||||||
|
|
||||||
|
```
|
||||||
|
/cdp-hybrid-idu:hybrid-unif-merge-stats-creator
|
||||||
|
|
||||||
|
> Platform: Databricks
|
||||||
|
> Catalog: analytics_prod
|
||||||
|
> Schema: unified_ids
|
||||||
|
> Canonical ID: td_id
|
||||||
|
> Output: /reports/weekly/td_id_stats_2025-10-15.html
|
||||||
|
|
||||||
|
✓ Report generated: /reports/weekly/td_id_stats_2025-10-15.html
|
||||||
|
```
|
||||||
|
|
||||||
|
### Multiple Environments
|
||||||
|
|
||||||
|
Generate reports for different environments:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Production
|
||||||
|
/hybrid-unif-merge-stats-creator
|
||||||
|
Platform: Snowflake
|
||||||
|
Database: PROD_CDP
|
||||||
|
Output: prod_merge_stats.html
|
||||||
|
|
||||||
|
# Staging
|
||||||
|
/hybrid-unif-merge-stats-creator
|
||||||
|
Platform: Snowflake
|
||||||
|
Database: STAGING_CDP
|
||||||
|
Output: staging_merge_stats.html
|
||||||
|
|
||||||
|
# Compare metrics across environments
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Best Practices
|
||||||
|
|
||||||
|
### Regular Reporting
|
||||||
|
|
||||||
|
1. **Weekly Reports**: Track merge performance over time
|
||||||
|
2. **Post-Workflow Reports**: Generate after each unification run
|
||||||
|
3. **Quality Audits**: Monthly deep-dive analysis
|
||||||
|
4. **Stakeholder Updates**: Executive-friendly format
|
||||||
|
|
||||||
|
### Comparative Analysis
|
||||||
|
|
||||||
|
Generate reports at different stages:
|
||||||
|
- After initial unification setup
|
||||||
|
- After incremental updates
|
||||||
|
- After data quality improvements
|
||||||
|
- Across different customer segments
|
||||||
|
|
||||||
|
### Archive and Versioning
|
||||||
|
|
||||||
|
```
|
||||||
|
reports/
|
||||||
|
2025-10-15_td_id_merge_stats.html
|
||||||
|
2025-10-08_td_id_merge_stats.html
|
||||||
|
2025-10-01_td_id_merge_stats.html
|
||||||
|
```
|
||||||
|
|
||||||
|
Track improvements over time by comparing:
|
||||||
|
- Merge ratios
|
||||||
|
- Data quality scores
|
||||||
|
- Convergence iterations
|
||||||
|
- Deduplication rates
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Getting Started
|
||||||
|
|
||||||
|
**Ready to generate your merge statistics report?**
|
||||||
|
|
||||||
|
Please provide:
|
||||||
|
|
||||||
|
1. **Platform**: Snowflake or Databricks?
|
||||||
|
2. **Database/Catalog**: Where are your unification tables?
|
||||||
|
3. **Schema**: Which schema contains the tables?
|
||||||
|
4. **Canonical ID**: What's the name of your unified ID? (e.g., td_id)
|
||||||
|
5. **Output Path** (optional): Where to save the report?
|
||||||
|
|
||||||
|
**Example:**
|
||||||
|
```
|
||||||
|
I want to generate a merge statistics report for:
|
||||||
|
|
||||||
|
Platform: Snowflake
|
||||||
|
Database: INDRESH_TEST
|
||||||
|
Schema: PUBLIC
|
||||||
|
Canonical ID: td_id
|
||||||
|
Output: my_unification_report.html
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
**I'll analyze your ID unification results and create a comprehensive, beautiful HTML report with expert insights!**
|
||||||
101
plugin.lock.json
Normal file
101
plugin.lock.json
Normal file
@@ -0,0 +1,101 @@
|
|||||||
|
{
|
||||||
|
"$schema": "internal://schemas/plugin.lock.v1.json",
|
||||||
|
"pluginId": "gh:treasure-data/aps_claude_tools:plugins/cdp-hybrid-idu",
|
||||||
|
"normalized": {
|
||||||
|
"repo": null,
|
||||||
|
"ref": "refs/tags/v20251128.0",
|
||||||
|
"commit": "58382efafa00d9c88bf68f0ba2be494e310d9827",
|
||||||
|
"treeHash": "04cbd3c0d2b818afaf15f92f7e5fb2880103cdbfd513d9926f323c5b7722f625",
|
||||||
|
"generatedAt": "2025-11-28T10:28:44.950550Z",
|
||||||
|
"toolVersion": "publish_plugins.py@0.2.0"
|
||||||
|
},
|
||||||
|
"origin": {
|
||||||
|
"remote": "git@github.com:zhongweili/42plugin-data.git",
|
||||||
|
"branch": "master",
|
||||||
|
"commit": "aa1497ed0949fd50e99e70d6324a29c5b34f9390",
|
||||||
|
"repoRoot": "/Users/zhongweili/projects/openmind/42plugin-data"
|
||||||
|
},
|
||||||
|
"manifest": {
|
||||||
|
"name": "cdp-hybrid-idu",
|
||||||
|
"description": "Multi-platform ID Unification for Snowflake and Databricks with YAML-driven configuration, convergence detection, and master table generation",
|
||||||
|
"version": null
|
||||||
|
},
|
||||||
|
"content": {
|
||||||
|
"files": [
|
||||||
|
{
|
||||||
|
"path": "README.md",
|
||||||
|
"sha256": "4e50b588ce6c220815a4ca869c68f41fe23cbaf05846fb306e7b2cbf127ed8f8"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"path": "agents/hybrid-unif-keys-extractor.md",
|
||||||
|
"sha256": "d2c92a61393209f0835f0118240254a7fa6f209aa62ec87d0ab253723055a7da"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"path": "agents/merge-stats-report-generator.md",
|
||||||
|
"sha256": "6e8fda43a277dfef132566b44a3dee23a632171641dcde0151d7602f43bcb5e8"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"path": "agents/databricks-sql-generator.md",
|
||||||
|
"sha256": "ae3ce3874d7c00599fcef09718cb612e551aac89896e3c75aa1194332179df9d"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"path": "agents/databricks-workflow-executor.md",
|
||||||
|
"sha256": "ecc4fcf94d470fe27f078e8722297921469852d596035e3d9d5b5d32aa2b0435"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"path": "agents/yaml-configuration-builder.md",
|
||||||
|
"sha256": "da90f575f8f0f7e33fba1ad720c73556e029227fcf135cd9fe4a9a1d3fb77be3"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"path": "agents/snowflake-sql-generator.md",
|
||||||
|
"sha256": "783ee1653bca7e0bb2647b953b4c05390e08686f7454c8e1a9e572851e8e0fc8"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"path": "agents/snowflake-workflow-executor.md",
|
||||||
|
"sha256": "f5f5352f47cfdd5a52769988ed9893f5a64de2a236f145b315838d475babca2c"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"path": ".claude-plugin/plugin.json",
|
||||||
|
"sha256": "a156b276659131718eab652f7b9806ab00bf59318ee07e22a585e3cb13da5e93"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"path": "commands/hybrid-setup.md",
|
||||||
|
"sha256": "9a287a1c414323cd6db2c5f3197fcfde531d337168e2696bc7f4896113ae40b6"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"path": "commands/hybrid-generate-databricks.md",
|
||||||
|
"sha256": "aff13cf95a74cd71dff35e3a4cd4ba2f287a7b3091f84cdb914d80e00bfe29ad"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"path": "commands/hybrid-generate-snowflake.md",
|
||||||
|
"sha256": "0dc460f41ee3c8130aa9a52537686fec6818e7a37a802040b8a570d8f89eaf77"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"path": "commands/hybrid-unif-config-creator.md",
|
||||||
|
"sha256": "3e14989f811e5ef198cff306e9203ec6bfa5f3772daa3a0f08292595574ab73c"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"path": "commands/hybrid-execute-databricks.md",
|
||||||
|
"sha256": "ad78068c5b96d310d1d620c00572c100915e0706a5312c0649b09a8165bbc79c"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"path": "commands/hybrid-unif-config-validate.md",
|
||||||
|
"sha256": "a413582bc43a23ad1addde134007bd6a3174b14d71c10dcbbd5f7824a6a97fb0"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"path": "commands/hybrid-unif-merge-stats-creator.md",
|
||||||
|
"sha256": "0c00db96f02559d212e502702eea5d3a02de8fedbca92f64eebd8ed430f96341"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"path": "commands/hybrid-execute-snowflake.md",
|
||||||
|
"sha256": "63fbc27f3350cd1d910d0dc4c588a14fd39e1d7ebda29ed6fce3584967bac4c4"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"dirSha256": "04cbd3c0d2b818afaf15f92f7e5fb2880103cdbfd513d9926f323c5b7722f625"
|
||||||
|
},
|
||||||
|
"security": {
|
||||||
|
"scannedAt": null,
|
||||||
|
"scannerVersion": null,
|
||||||
|
"flags": []
|
||||||
|
}
|
||||||
|
}
|
||||||
Reference in New Issue
Block a user