---
name: hybrid-setup
description: Complete end-to-end hybrid ID unification setup - automatically analyzes tables, generates config, creates SQL, and executes workflow for Snowflake and Databricks
---

# Hybrid ID Unification Complete Setup

## Overview

I'll guide you through the complete hybrid ID unification setup process for Snowflake and/or Databricks platforms. This is an **automated, end-to-end workflow** that will:

1. **Analyze your tables automatically** using platform MCP tools with strict PII detection
2. **Generate YAML configuration** from real schema and data analysis
3. **Choose target platform(s)** (Snowflake, Databricks, or both)
4. **Generate platform-specific SQL** optimized for each engine
5. **Execute workflows** with convergence detection and monitoring
6. **Provide deployment guidance** and operating instructions

**Key Features**:

- 🔍 **Automated Table Analysis**: Uses Snowflake/Databricks MCP tools to analyze actual tables
- ✅ **Strict PII Detection**: Zero tolerance - only includes tables with real user identifiers
- 📊 **Real Data Validation**: Queries actual data to validate patterns and quality
- 🎯 **Smart Recommendations**: Expert analysis provides merge strategy and priorities
- 🚀 **End-to-End Automation**: From table analysis to workflow execution

---

## What You Need to Provide

### 1. Unification Requirements (For Automated Analysis)

- **Platform**: Snowflake or Databricks
- **Tables**: List of source tables to analyze
  - Format (Snowflake): `database.schema.table` or `schema.table` or `table`
  - Format (Databricks): `catalog.schema.table` or `schema.table` or `table`
- **Canonical ID Name**: Name for your unified ID (e.g., `td_id`, `unified_customer_id`)
- **Merge Iterations**: Number of unification loops (default: 10)
- **Master Tables**: (Optional) Attribute aggregation specifications

**Note**: The system will automatically:

- Extract user identifiers from actual table schemas
- Validate data patterns from real data
- Apply appropriate validation rules based on data analysis
- Generate merge strategy recommendations

### 2. Platform Selection

- **Databricks**: Unity Catalog with Delta Lake
- **Snowflake**: Database with proper permissions
- **Both**: Generate SQL for both platforms

### 3. Target Configurations

**For Databricks**:

- **Catalog**: Target catalog name
- **Schema**: Target schema name
- **Source Catalog** (optional): Source data catalog
- **Source Schema** (optional): Source data schema

**For Snowflake**:

- **Database**: Target database name
- **Schema**: Target schema name
- **Source Schema** (optional): Source data schema

### 4. Execution Credentials (if executing)

**For Databricks**:

- **Server Hostname**: your-workspace.databricks.com
- **HTTP Path**: /sql/1.0/warehouses/your-warehouse-id
- **Authentication**: PAT (Personal Access Token) or OAuth

**For Snowflake**:

- **Account**: Snowflake account name
- **User**: Username
- **Password**: Password or use SSO/key-pair
- **Warehouse**: Compute warehouse name
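These credentials are only needed if you want me to execute the generated workflows. The bundled executor scripts manage the actual connections; purely for orientation, a minimal connectivity check using these same values might look like the sketch below. This assumes the `databricks-sql-connector` and `snowflake-connector-python` packages are installed, and every value shown is a placeholder.

```python
# Illustrative connectivity check only; the bundled executor scripts
# (databricks_sql_executor.py / snowflake_sql_executor.py) manage their own
# connections. Assumes databricks-sql-connector and snowflake-connector-python
# are installed; every value below is a placeholder.
from databricks import sql as databricks_sql
import snowflake.connector

# Databricks: server hostname + HTTP path + PAT (OAuth is also supported)
with databricks_sql.connect(
    server_hostname="your-workspace.databricks.com",
    http_path="/sql/1.0/warehouses/your-warehouse-id",
    access_token="your-personal-access-token",
) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT current_catalog()")
        print("Databricks OK:", cur.fetchone())

# Snowflake: account + user + password (or SSO / key-pair) + warehouse
sf_conn = snowflake.connector.connect(
    account="your_account",
    user="your_user",
    password="your_password",
    warehouse="your_warehouse",
)
try:
    cur = sf_conn.cursor()
    cur.execute("SELECT CURRENT_WAREHOUSE()")
    print("Snowflake OK:", cur.fetchone())
finally:
    sf_conn.close()
```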
---

## What I'll Do

### Step 1: Automated YAML Configuration Generation

I'll use the **hybrid-unif-config-creator** command to automatically generate your `unify.yml` file:

**Automated Analysis Approach** (Recommended):

- Analyze your actual tables using platform MCP tools (Snowflake/Databricks)
- Extract user identifiers with STRICT PII detection (zero tolerance for guessing)
- Validate data patterns from real table data
- Generate `unify.yml` with exact template compliance
- Only include tables with actual user identifiers
- Document excluded tables with detailed reasons

**What I'll do**:

- Call the **hybrid-unif-keys-extractor agent** to analyze tables
- Query actual schema and data using platform MCP tools
- Detect valid user identifiers (email, customer_id, phone, etc.)
- Exclude tables without PII, with full documentation of the reasons
- Generate a production-ready `unify.yml` automatically

**Alternative - Manual Configuration**:

- If MCP tools are unavailable, I'll guide you through manual configuration
- Interactive prompts for keys, tables, and validation rules
- Step-by-step YAML building with validation
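For orientation only: a generated configuration broadly declares the canonical ID, the user-identifier keys, and the columns that map to those keys in each included table. The sketch below is hypothetical; every field name, key, and table in it is an assumption for illustration, and the real file is produced by **hybrid-unif-config-creator** from its own template.

```python
# Hypothetical sketch of the general shape a unify.yml can take.
# All field names, keys, and tables below are illustrative assumptions;
# the actual schema is dictated by the hybrid-unif-config-creator template.
import yaml  # requires PyYAML

EXAMPLE_UNIFY_YML = """
canonical_id: unified_customer_id      # the canonical ID name you chose
merge_iterations: 10                   # unification loop count (2-30)
keys:                                  # identifiers detected by the analysis
  - name: email
    invalid_texts: ['', 'null', 'N/A']
  - name: customer_id
tables:                                # only tables with real PII are included
  - table: customer_db.public.customer_profiles
    key_columns:
      - {column: email_address, key: email}
      - {column: cust_id, key: customer_id}
  - table: customer_db.public.orders
    key_columns:
      - {column: customer_id, key: customer_id}
"""

config = yaml.safe_load(EXAMPLE_UNIFY_YML)
print(f"{len(config['tables'])} tables, {len(config['keys'])} keys")
```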
### Step 2: Platform Selection and Configuration

I'll help you:

- Choose between Databricks, Snowflake, or both
- Collect platform-specific configuration (catalog/database, schema names)
- Determine the source/target separation strategy
- Decide on execution or generation-only mode

### Step 3: SQL Generation

**For Databricks** (if selected), I'll call the **databricks-sql-generator agent** to:

- Execute the `yaml_unification_to_databricks.py` script
- Generate a Delta Lake optimized SQL workflow
- Create the output directory: `databricks_sql/unify/`
- Generate 15+ SQL files with proper execution order

**For Snowflake** (if selected), I'll call the **snowflake-sql-generator agent** to:

- Execute the `yaml_unification_to_snowflake.py` script
- Generate a Snowflake-native SQL workflow
- Create the output directory: `snowflake_sql/unify/`
- Generate 15+ SQL files with proper execution order

### Step 4: Workflow Execution (Optional)

**For Databricks** (if execution requested), I'll call the **databricks-workflow-executor agent** to:

- Execute the `databricks_sql_executor.py` script
- Connect to your Databricks workspace
- Run SQL files in proper sequence
- Monitor convergence and progress
- Optimize Delta tables
- Report final statistics

**For Snowflake** (if execution requested), I'll call the **snowflake-workflow-executor agent** to:

- Execute the `snowflake_sql_executor.py` script
- Connect to your Snowflake account
- Run SQL files in proper sequence
- Monitor convergence and progress
- Report final statistics

### Step 5: Deployment Guidance

I'll provide:

- Configuration summary
- Generated files overview
- Deployment instructions
- Operating guidelines
- Monitoring recommendations

---

## Interactive Workflow

This command orchestrates the complete end-to-end flow by calling specialized commands in sequence:

### Phase 1: Configuration Creation

**I'll ask you for**:

- Platform (Snowflake or Databricks)
- Tables to analyze
- Canonical ID name
- Merge iterations

**Then I'll**:

- Call `/cdp-hybrid-idu:hybrid-unif-config-creator` internally
- Analyze your tables automatically
- Generate `unify.yml` with strict PII detection
- Show you the configuration for review

### Phase 2: SQL Generation

**I'll ask you**:

- Which platform(s) to generate SQL for (can be different from the source platform)
- Output directory preferences

**Then I'll**:

- Call `/cdp-hybrid-idu:hybrid-generate-snowflake` (if Snowflake selected)
- Call `/cdp-hybrid-idu:hybrid-generate-databricks` (if Databricks selected)
- Generate 15+ optimized SQL files per platform
- Show you the execution plan

### Phase 3: Workflow Execution (Optional)

**I'll ask you**:

- Do you want to execute now or later?
- Connection credentials, if executing

**Then I'll**:

- Call `/cdp-hybrid-idu:hybrid-execute-snowflake` (if Snowflake selected)
- Call `/cdp-hybrid-idu:hybrid-execute-databricks` (if Databricks selected)
- Monitor convergence and progress
- Report final statistics

**Throughout the process**:

- **Questions**: When I need your input
- **Suggestions**: Recommended approaches based on best practices
- **Validation**: Real-time checks on your choices
- **Explanations**: Help you understand concepts and options

---

## Expected Output

### Files Created (Platform-specific)

**For Databricks**:

```
databricks_sql/unify/
├── 01_create_graph.sql              # Initialize identity graph
├── 02_extract_merge.sql             # Extract and merge identities
├── 03_source_key_stats.sql          # Source statistics
├── 04_unify_loop_iteration_*.sql    # Iterative unification (N files)
├── 05_canonicalize.sql              # Canonical ID creation
├── 06_result_key_stats.sql          # Result statistics
├── 10_enrich_*.sql                  # Source table enrichment (N files)
├── 20_master_*.sql                  # Master table creation (N files)
├── 30_unification_metadata.sql      # Metadata tables
├── 31_filter_lookup.sql             # Validation rules
└── 32_column_lookup.sql             # Column mappings
```

**For Snowflake**:

```
snowflake_sql/unify/
├── 01_create_graph.sql              # Initialize identity graph
├── 02_extract_merge.sql             # Extract and merge identities
├── 03_source_key_stats.sql          # Source statistics
├── 04_unify_loop_iteration_*.sql    # Iterative unification (N files)
├── 05_canonicalize.sql              # Canonical ID creation
├── 06_result_key_stats.sql          # Result statistics
├── 10_enrich_*.sql                  # Source table enrichment (N files)
├── 20_master_*.sql                  # Master table creation (N files)
├── 30_unification_metadata.sql      # Metadata tables
├── 31_filter_lookup.sql             # Validation rules
└── 32_column_lookup.sql             # Column mappings
```

**Configuration**:

```
unify.yml    # YAML configuration (created interactively)
```
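The numeric prefixes in the listings above define the execution order; the executor scripts run the files in that sequence and layer convergence monitoring, statistics, and (on Databricks) table optimization on top. Purely as a rough illustration of the sequencing idea, and assuming an open `conn` like the connectivity sketch shown earlier, the core loop amounts to something like this:

```python
# Rough illustration of prefix-ordered execution only; the real sequencing,
# convergence detection, and reporting live in snowflake_sql_executor.py /
# databricks_sql_executor.py. Assumes `conn` is an open connection (see the
# earlier credentials sketch). Note: sorted() is plain lexicographic ordering,
# and splitting on ';' is naive for statements containing literal semicolons;
# the bundled executors handle both robustly.
from pathlib import Path

def run_in_order(conn, sql_dir: str = "snowflake_sql/unify") -> None:
    """Execute the generated SQL files in ascending filename order."""
    for sql_file in sorted(Path(sql_dir).glob("*.sql")):
        with conn.cursor() as cur:
            for statement in sql_file.read_text().split(";"):
                if statement.strip():
                    cur.execute(statement)
        print(f"Completed {sql_file.name}")
```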
---

## Success Criteria

All generated files will:

- ✅ Be platform-optimized and production-ready
- ✅ Use proper SQL dialects (Databricks Spark SQL or Snowflake SQL)
- ✅ Include convergence detection logic
- ✅ Support incremental processing
- ✅ Generate comprehensive statistics
- ✅ Work without modification on target platforms

---

## Getting Started

**Ready to begin?** I'll use the **hybrid-unif-config-creator** to automatically analyze your tables and generate the YAML configuration.

Please provide:

1. **Platform**: Which platform contains your data?
   - Snowflake or Databricks
2. **Tables**: Which source tables should I analyze?
   - Format (Snowflake): `database.schema.table` or `schema.table` or `table`
   - Format (Databricks): `catalog.schema.table` or `schema.table` or `table`
   - Example: `customer_db.public.customers`, `orders`, `web_events.user_activity`
3. **Canonical ID Name**: What should I call the unified ID?
   - Example: `td_id`, `unified_customer_id`, `master_id`
   - Default: `td_id`
4. **Merge Iterations** (optional): How many unification loops?
   - Default: 10
   - Range: 2-30
5. **Target Platform(s)** for SQL generation:
   - Same as source, or generate for both platforms

**Example**:

```
I want to set up hybrid ID unification for:

Platform: Snowflake
Tables:
  - customer_db.public.customer_profiles
  - customer_db.public.orders
  - marketing_db.public.campaigns
  - event_db.public.web_events
Canonical ID: unified_customer_id
Merge Iterations: 10
Generate SQL for: Snowflake (or both Snowflake and Databricks)
```

**What I'll do next**:

1. ✅ Analyze your tables using Snowflake MCP tools
2. ✅ Extract user identifiers with strict PII detection
3. ✅ Generate `unify.yml` automatically
4. ✅ Generate platform-specific SQL files
5. ✅ Execute workflow (if requested)
6. ✅ Provide deployment guidance

---

**Let's get started with your hybrid ID unification setup!**