Initial commit

2025-11-30 09:02:49 +08:00
commit 1c95d6eb21
13 changed files with 3089 additions and 0 deletions
--- a/commands/unify-create-config.md
+++ b/commands/unify-create-config.md
@@ -0,0 +1,314 @@
+---
+name: unify-create-config
+description: Generate core ID unification configuration files (unify.yml and id_unification.dig)
+---
+
+# Create Core Unification Configuration
+
+## Overview
+
+I'll generate core ID unification configuration files using the **id-unification-creator** specialized agent.
+
+This command creates **TD-COMPLIANT** unification files:
+- ✅ **DYNAMIC CONFIGURATION** - Based on prep table analysis
+- ✅ **METHOD-SPECIFIC** - `persistent_id` OR `canonical_id` (never both)
+- ✅ **REGIONAL ENDPOINTS** - Correct URL for your region
+- ✅ **SCHEMA VALIDATION** - Prevents first-run failures
+
+---
+
+## Prerequisites
+
+**REQUIRED**: Prep table configuration must exist:
+- `unification/config/environment.yml` - Client configuration
+- `unification/config/src_prep_params.yml` - Prep table mappings
+
+If you haven't created these yet, run:
+- `/cdp-unification:unify-create-prep` first, OR
+- `/cdp-unification:unify-setup` for complete end-to-end setup
+
+---
+
+## What You Need to Provide
+
+### 1. ID Method Selection
+Choose ONE method:
+
+**Option A: persistent_id (RECOMMENDED)**
+- Stable IDs that persist across updates
+- Better for customer data platforms
+- Recommended for most use cases
+- **Provide persistent_id name** (e.g., `td_claude_id`, `stable_customer_id`)
+
+**Option B: canonical_id**
+- Traditional approach with merge capabilities
+- Good for legacy systems
+- **Provide canonical_id name** (e.g., `master_customer_id`)
+
+### 2. Update Strategy
+- **Full Refresh**: Reprocess all data each time (`full_refresh: true`)
+- **Incremental**: Process only new/updated records (`full_refresh: false`)
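+
+As a rough sketch (following the `id_unification.dig` layout shown later in this file, not an extra setting), the choice ends up as one flag in the workflow's `content` block:
+
+```yaml
+content:
+  full_refresh: false    # Incremental: process only new/updated records
+  # full_refresh: true   # Full Refresh: reprocess all data each time
+```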
+
+### 3. Regional Endpoint
+Choose your Treasure Data region:
+- **US**: https://api-cdp.treasuredata.com/unifications/workflow_call
+- **EU**: https://api-cdp.eu01.treasuredata.com/unifications/workflow_call
+- **Asia Pacific**: https://api-cdp.ap02.treasuredata.com/unifications/workflow_call
+- **Japan**: https://api-cdp.treasuredata.co.jp/unifications/workflow_call
+
+### 4. Unification Name
+- Name for this unification project (e.g., `claude`, `customer_360`)
+
+---
+
+## What I'll Do
+
+### Step 1: Validate Prerequisites
+I'll check that these files exist:
+- `unification/config/environment.yml`
+- `unification/config/src_prep_params.yml`
+
+And extract:
+- Client short name
+- Unified input table name
+- All prep table configurations with column mappings
+
+### Step 2: Extract Key Information
+I'll parse `src_prep_params.yml` to identify:
+- All unique `alias_as` column names
+- Key types: email, phone, td_client_id, td_global_id, customer_id, etc.
+- Complete list of available keys for `merge_by_keys`
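+
+For illustration, assume `src_prep_params.yml` contains two prep tables whose `alias_as` values are `email` and `td_client_id` for one, and `email` and `customer_id` for the other (hypothetical input). The de-duplicated key list would then look roughly like this:
+
+```yaml
+# alias_as values found (hypothetical): email, td_client_id, email, customer_id
+# Unique keys carried into unify.yml:
+keys:
+  - name: email
+    invalid_texts: ['']
+  - name: td_client_id
+    invalid_texts: ['']
+  - name: customer_id
+    invalid_texts: ['']
+```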
+
+### Step 3: Generate unification/config/unify.yml
+I'll create:
+```yaml
+name: {unif_name}
+
+keys:
+  - name: email
+    invalid_texts: ['']
+  - name: td_client_id
+    invalid_texts: ['']
+  - name: phone
+    invalid_texts: ['']
+  # ... ALL detected key types
+
+tables:
+  - database: ${client_short_name}_${stg}
+    table: ${globals.unif_input_tbl}
+    incremental_columns: [time]
+    key_columns:
+      - {column: email, key: email}
+      - {column: td_client_id, key: td_client_id}
+      - {column: phone, key: phone}
+      # ... ALL alias_as columns mapped
+
+# ONLY ONE of these sections (based on your selection):
+persistent_ids:
+  - name: {persistent_id_name}
+    merge_by_keys: [email, td_client_id, phone, ...]
+    merge_iterations: 15
+
+# OR
+
+canonical_ids:
+  - name: {canonical_id_name}
+    merge_by_keys: [email, td_client_id, phone, ...]
+    merge_iterations: 15
+```
+
+### Step 4: Validate and Update Schema (CRITICAL)
+I'll prevent first-run failures by:
+1. Reading `unify.yml` to extract `merge_by_keys` list
+2. Reading `queries/create_schema.sql` to check existing columns
+3. Comparing required vs existing columns
+4. Updating `create_schema.sql` if any columns are missing:
+   - Add all keys from `merge_by_keys` as varchar
+   - Add source, time, ingest_time columns
+   - Update BOTH table definitions (main and tmp)
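+
+A worked example of the comparison, written as YAML only for readability (the field names below are illustrative and are not TD configuration keys; the existing column list is assumed):
+
+```yaml
+merge_by_keys:    [email, td_client_id, phone]                      # read from unify.yml
+existing_columns: [email, td_client_id, source, time, ingest_time]  # assumed current create_schema.sql
+missing_columns:  [phone]   # would be added as varchar to BOTH the main and tmp definitions
+```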
+
+### Step 5: Generate unification/id_unification.dig
+I'll create:
+```yaml
+timezone: UTC
+
+_export:
+  !include : config/environment.yml
+  !include : config/src_prep_params.yml
+
++call_unification:
+  http_call>: {REGIONAL_ENDPOINT_URL}
+  headers:
+    - authorization: ${secret:td.apikey}
+    - content-type: application/json
+  method: POST
+  retry: true
+  content_format: json
+  content:
+    run_persistent_ids: {true/false}    # ONLY if persistent_id
+    run_canonical_ids: {true/false}     # ONLY if canonical_id
+    run_enrichments: true
+    run_master_tables: true
+    full_refresh: {true/false}
+    keep_debug_tables: true
+    unification:
+      !include : config/unify.yml
+```
+
+---
+
+## Expected Output
+
+### Files Created
+```
+unification/
+├── config/
+│   └── unify.yml                      ✓ Dynamic configuration
+├── queries/
+│   └── create_schema.sql              ✓ Updated with all columns
+└── id_unification.dig                 ✓ Core unification workflow
+```
+
+### Example unify.yml (persistent_id method)
+```yaml
+name: customer_360
+
+keys:
+  - name: email
+    invalid_texts: ['']
+  - name: td_client_id
+    invalid_texts: ['']
+    valid_regexp: '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
+
+tables:
+  - database: ${client_short_name}_${stg}
+    table: ${globals.unif_input_tbl}
+    incremental_columns: [time]
+    key_columns:
+      - {column: email, key: email}
+      - {column: td_client_id, key: td_client_id}
+
+persistent_ids:
+  - name: td_claude_id
+    merge_by_keys: [email, td_client_id]
+    merge_iterations: 15
+```
+
+### Example id_unification.dig (US region, incremental)
+```yaml
+timezone: UTC
+
+_export:
+  !include : config/environment.yml
+  !include : config/src_prep_params.yml
+
++call_unification:
+  http_call>: https://api-cdp.treasuredata.com/unifications/workflow_call
+  headers:
+    - authorization: ${secret:td.apikey}
+    - content-type: application/json
+  method: POST
+  retry: true
+  content_format: json
+  content:
+    run_persistent_ids: true
+    run_enrichments: true
+    run_master_tables: true
+    full_refresh: false
+    keep_debug_tables: true
+    unification:
+      !include : config/unify.yml
+```
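+
+For comparison, a sketch of what changes when `canonical_id` is selected instead (`master_customer_id` is just the sample name from the method selection above; everything not shown stays as in the persistent_id examples):
+
+```yaml
+# In config/unify.yml: replace the persistent_ids section with
+canonical_ids:
+  - name: master_customer_id
+    merge_by_keys: [email, td_client_id]
+    merge_iterations: 15
+
+# In id_unification.dig, under content: replace run_persistent_ids with
+# run_canonical_ids: true
+```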
+
+---
+
+## Critical Requirements
+
+### ✅ Dynamic Configuration
+- All keys detected from `src_prep_params.yml`
+- All column mappings from prep analysis
+- Method-specific configuration (never both)
+
+### ⚠️ Schema Completeness
+- `create_schema.sql` MUST contain ALL columns from `merge_by_keys`
+- Prevents "column not found" errors on first run
+- Updates both main and tmp table definitions
+
+### ⚠️ Config File Inclusion
+- `id_unification.dig` MUST include BOTH config files in `_export`:
+  - `environment.yml` - For `${client_short_name}_${stg}`
+  - `src_prep_params.yml` - For `${globals.unif_input_tbl}`
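+
+To see why both includes matter, here is how the variables would resolve under assumed values (`mck` is only a sample client name):
+
+```yaml
+# From config/environment.yml (assumed values)
+client_short_name: mck
+stg: stg
+
+# From config/src_prep_params.yml
+globals:
+  unif_input_tbl: unif_input
+
+# Resulting references in unify.yml:
+#   database: ${client_short_name}_${stg}  -> mck_stg
+#   table:    ${globals.unif_input_tbl}    -> unif_input
+```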
+
+### ⚠️ Regional Endpoint
+- Must use exact URL for selected region
+- Different endpoints for US, EU, Asia Pacific, Japan
+
+---
+
+## Validation Checklist
+
+Before completing, I'll verify:
+- [ ] unify.yml contains all detected key types
+- [ ] key_columns section maps ALL alias_as columns
+- [ ] Only ONE ID method section exists
+- [ ] merge_by_keys includes ALL available keys
+- [ ] **CRITICAL**: create_schema.sql contains ALL columns from merge_by_keys
+- [ ] **CRITICAL**: Both table definitions updated (main and tmp)
+- [ ] id_unification.dig has correct regional endpoint
+- [ ] **CRITICAL**: _export includes BOTH config files
+- [ ] Workflow flags match selected method only
+- [ ] Proper TD YAML/DIG syntax
+
+---
+
+## Success Criteria
+
+All generated files will:
+- ✅ **TD-COMPLIANT** - Work without modification in TD
+- ✅ **DYNAMICALLY CONFIGURED** - Based on actual prep analysis
+- ✅ **METHOD-ACCURATE** - Exact implementation of selected method
+- ✅ **REGIONALLY CORRECT** - Proper endpoint for region
+- ✅ **SCHEMA-COMPLETE** - All required columns present
+
+---
+
+## Next Steps
+
+After creating core config, you can:
+1. **Test unification workflow**: `td wf run unification/id_unification.dig`
+2. **Add enrichment**: Use `/cdp-unification:unify-setup` to add staging enrichment
+3. **Create main orchestrator**: Combine prep + unification + enrichment
+
+---
+
+## Getting Started
+
+**Ready to create core unification config?** Please provide:
+
+1. **ID Method**:
+   - Choose: `persistent_id` or `canonical_id`
+   - Provide ID name: e.g., `td_claude_id`
+
+2. **Update Strategy**:
+   - Choose: `incremental` or `full_refresh`
+
+3. **Regional Endpoint**:
+   - Choose: `US`, `EU`, `Asia Pacific`, or `Japan`
+
+4. **Unification Name**:
+   - e.g., `customer_360`, `claude`
+
+**Example:**
+```
+ID Method: persistent_id
+ID Name: td_claude_id
+Update Strategy: incremental
+Region: US
+Unification Name: customer_360
+```
+
+I'll call the **id-unification-creator** agent to generate all core unification files.
+
+---
+
+**Let's create your unification configuration!**
--- a/commands/unify-create-prep.md
+++ b/commands/unify-create-prep.md
@@ -0,0 +1,233 @@
+---
+name: unify-create-prep
+description: Generate prep table creation files and configuration for ID unification
+---
+
+# Create Prep Table Configuration
+
+## Overview
+
+I'll generate prep table creation files and configuration using the **dynamic-prep-creation** specialized agent.
+
+This command creates **PRODUCTION-READY** prep table files:
+- ⚠️ **EXACT TEMPLATES** - No modifications allowed
+- ⚠️ **ZERO CHANGES** - Character-for-character accuracy
+- ✅ **GENERIC FILES** - Reusable across all projects
+- ✅ **DYNAMIC CONFIGURATION** - Adapts to your table structure
+
+---
+
+## What You Need to Provide
+
+### 1. Table Analysis Results
+If you've already run key extraction:
+- Provide the list of **included tables** with their user identifier columns
+- I can use the results from `/cdp-unification:unify-extract-keys`
+
+OR provide directly:
+- **Source tables**: database.table_name format
+- **User identifier columns**: For each table, which columns contain identifiers
+
+### 2. Client Configuration
+- **Client short name**: Your client identifier (e.g., `mck`, `client_name`)
+- **Database suffixes**:
+  - Source database suffix (default: `src`)
+  - Staging database suffix (default: `stg`)
+  - Lookup database (default: `config`)
+
+### 3. Column Mappings
+For each table, specify which columns to include and their unified aliases:
+- **Email columns** → alias: `email`
+- **Phone columns** → alias: `phone`
+- **Customer ID columns** → alias: `customer_id`
+- **TD Client ID** → alias: `td_client_id`
+- **TD Global ID** → alias: `td_global_id`
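+
+As a sketch, one such mapping in `src_prep_params.yml` might look like the following, using the same entry shape as the example structure later in this file (`mobile_phone` is a hypothetical source column):
+
+```yaml
+columns:
+  - col:
+    name: mobile_phone    # hypothetical column name in the source table
+    alias_as: phone       # unified alias used as the unification key
+```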
+
+---
+
+## What I'll Do
+
+### Step 1: Create Directory Structure
+I'll create:
+- `unification/config/` directory
+- `unification/queries/` directory
+
+### Step 2: Generate Generic Files (EXACT TEMPLATES)
+I'll create these files with **ZERO MODIFICATIONS**:
+
+**⚠️ `unification/dynmic_prep_creation.dig`** (EXACT filename - no 'a' in dynmic)
+- Generic prep workflow
+- Handles schema creation, table looping, and data insertion
+- Uses variables from config files
+
+**⚠️ `unification/queries/create_schema.sql`**
+- Generic schema creation for unified input table
+- Creates both main and tmp tables
+
+**⚠️ `unification/queries/loop_on_tables.sql`**
+- Complex production SQL for dynamic table processing
+- Generates prep table SQL and unified input table SQL
+- Handles incremental logic and deduplication
+
+**⚠️ `unification/queries/unif_input_tbl.sql`**
+- DSAR processing and data cleaning
+- Exclusion list management for masked data
+- Dynamic column detection and insertion
+
+### Step 3: Generate Dynamic Configuration Files
+
+**`unification/config/environment.yml`**
+```yaml
+client_short_name: {your_client_name}
+src: src
+stg: stg
+gld: gld
+lkup: config
+```
+
+**`unification/config/src_prep_params.yml`**
+- Dynamic table configuration based on your table analysis
+- Column mappings with unified aliases
+- Prep table naming conventions
+
+### Step 4: Dynamic Column Detection (CRITICAL)
+For `unif_input_tbl.sql`, I'll:
+1. Query Treasure Data schema: `information_schema.columns`
+2. Detect all columns besides email, phone, source, ingest_time, time
+3. Auto-generate column list for data_cleaned CTE
+4. Replace placeholder with actual columns
+
+---
+
+## Expected Output
+
+### Generic Files (EXACT - NO CHANGES)
+```
+unification/
+├── dynmic_prep_creation.dig          ⚠️ EXACT filename
+├── queries/
+│   ├── create_schema.sql             ⚠️ EXACT content
+│   ├── loop_on_tables.sql            ⚠️ EXACT content
+│   └── unif_input_tbl.sql            ⚠️ WITH dynamic columns
+```
+
+### Dynamic Configuration Files
+```
+unification/config/
+├── environment.yml                   ✓ Client-specific
+└── src_prep_params.yml              ✓ Table-specific
+```
+
+### Example src_prep_params.yml Structure
+```yaml
+globals:
+  unif_input_tbl: unif_input
+
+prep_tbls:
+  - src_tbl: user_events
+    src_db: ${client_short_name}_${stg}
+    snk_db: ${client_short_name}_${stg}
+    snk_tbl: ${src_tbl}_prep
+    columns:
+      - col:
+        name: user_email
+        alias_as: email
+      - col:
+        name: td_client_id
+        alias_as: td_client_id
+
+  - src_tbl: customers
+    src_db: ${client_short_name}_${stg}
+    snk_db: ${client_short_name}_${stg}
+    snk_tbl: ${src_tbl}_prep
+    columns:
+      - col:
+        name: email
+        alias_as: email
+      - col:
+        name: customer_id
+        alias_as: customer_id
+```
+
+---
+
+## Critical Requirements
+
+### ⚠️ NEVER MODIFY GENERIC FILES
+- **dynmic_prep_creation.dig**: EXACT template, character-for-character
+- **create_schema.sql**: EXACT SQL, no changes
+- **loop_on_tables.sql**: EXACT complex SQL, no modifications
+- **unif_input_tbl.sql**: EXACT template + dynamic column replacement
+
+### ✅ DYNAMIC CONFIGURATION ONLY
+- **environment.yml**: Client-specific variables
+- **src_prep_params.yml**: Table-specific mappings
+
+### 🚨 CRITICAL FILENAME
+- **MUST be "dynmic_prep_creation.dig"** (NO 'a' in dynmic)
+- This is intentional - production systems expect this exact name
+
+### 🚨 NO TIME COLUMN
+- **NEVER ADD** `time` column to src_prep_params.yml
+- Time is auto-generated by SQL template
+- Only include actual identifier columns
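+
+A minimal illustration of the rule, with assumed column names:
+
+```yaml
+columns:
+  - col:
+    name: email
+    alias_as: email
+  # Do not add an entry like this one; the SQL template generates time itself:
+  # - col:
+  #   name: time
+  #   alias_as: time
+```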
+
+---
+
+## Validation Checklist
+
+Before completing, I'll verify:
+- [ ] File named "dynmic_prep_creation.dig" exists
+- [ ] Content matches template character-for-character
+- [ ] All variable placeholders preserved
+- [ ] Queries folder contains exact SQL files
+- [ ] Config folder contains YAML files
+- [ ] Dynamic columns inserted in unif_input_tbl.sql
+- [ ] No time column in src_prep_params.yml
+- [ ] All directories created
+
+---
+
+## Success Criteria
+
+All generated files will:
+- ✅ **EXACT TEMPLATES** - Character-for-character accuracy
+- ✅ **PRODUCTION-READY** - Deployable to TD without changes
+- ✅ **DYNAMIC CONFIGURATION** - Adapts to table structure
+- ✅ **DSAR COMPLIANT** - Includes exclusion list processing
+- ✅ **INCREMENTAL PROCESSING** - Supports time-based updates
+
+---
+
+## Next Steps
+
+After prep creation, you can:
+1. **Test prep workflow**: `td wf run unification/dynmic_prep_creation.dig`
+2. **Create unification config**: Use `/cdp-unification:unify-create-config`
+3. **Complete full setup**: Use `/cdp-unification:unify-setup`
+
+---
+
+## Getting Started
+
+**Ready to create prep tables?** Please provide:
+
+1. **Table list with columns**:
+   ```
+   Table: analytics.user_events
+   Columns: user_email (email), td_client_id (td_client_id)
+
+   Table: crm.customers
+   Columns: email (email), customer_id (customer_id)
+   ```
+
+2. **Client configuration**:
+   ```
+   Client short name: mck
+   ```
+
+I'll call the **dynamic-prep-creation** agent to generate all prep files with exact templates.
+
+---
+
+**Let's create your prep table configuration!**
--- a/commands/unify-extract-keys.md
+++ b/commands/unify-extract-keys.md
@@ -0,0 +1,191 @@
+---
+name: unify-extract-keys
+description: Extract and validate user identifier columns from tables using live Treasure Data analysis
+---
+
+# Extract and Validate User Identifiers
+
+## Overview
+
+I'll analyze your Treasure Data tables to extract and validate user identifier columns using the **unif-keys-extractor** specialized agent.
+
+This command performs **ZERO-TOLERANCE** identifier extraction:
+- ❌ **NO GUESSING** - Only uses real Treasure Data MCP tools
+- ❌ **NO ASSUMPTIONS** - Every table is analyzed with live data
+- ✅ **STRICT VALIDATION** - Only includes tables with actual user identifiers
+- ✅ **COMPREHENSIVE ANALYSIS** - Review from three SQL expert perspectives, plus priority recommendations
+
+---
+
+## What You Need to Provide
+
+### Table List
+Provide the tables you want to analyze for ID unification:
+- **Format**: `database.table_name`
+- **Example**: `analytics.user_events`, `crm.customers`, `web.pageviews`
+
+---
+
+## What I'll Do
+
+### Step 1: Schema Extraction (MANDATORY)
+For each table, I'll:
+- Call `mcp__mcc_treasuredata__describe_table(table, database)`
+- Extract EXACT column names and data types
+- Identify tables that are inaccessible
+
+### Step 2: User Identifier Detection (STRICT MATCHING)
+I'll scan for valid user identifier columns:
+
+**✅ VALID USER IDENTIFIERS:**
+- **Email columns**: email, email_std, email_address, user_email, customer_email
+- **Phone columns**: phone, phone_std, phone_number, mobile_phone, customer_phone
+- **User ID columns**: user_id, customer_id, account_id, member_id, uid, user_uuid
+- **Identity columns**: profile_id, identity_id, cognito_identity_userid
+- **Cookie/Device IDs**: td_client_id, td_global_id, td_ssc_id, cookie_id, device_id
+
+**❌ NOT USER IDENTIFIERS (EXCLUDED):**
+- System columns: id, created_at, updated_at, load_timestamp
+- Campaign columns: campaign_id, message_id
+- Product columns: product_id, sku, variant_id
+- Complex types: array, map, json columns
+
+### Step 3: Exclusion Validation (CRITICAL)
+For tables WITHOUT user identifiers, I'll:
+- Document the exclusion reason
+- List available columns for transparency
+- Explain why the table doesn't qualify
+
+### Step 4: Min/Max Data Analysis (INCLUDED TABLES ONLY)
+For tables WITH user identifiers, I'll:
+- Query actual data: `SELECT MIN(column), MAX(column) FROM table`
+- Validate data patterns and formats
+- Assess data quality
+
+### Step 5: 3 SQL Experts Analysis
+I'll provide structured analysis from three perspectives:
+1. **Data Pattern Analyst**: Reviews actual min/max values and data quality
+2. **Cross-Table Relationship Analyst**: Maps identifier relationships across tables
+3. **Priority Assessment Specialist**: Ranks identifiers by stability and coverage
+
+### Step 6: Priority Recommendations
+I'll provide:
+- Recommended priority ordering (TD standard)
+- Reasoning for each recommendation
+- Compatibility assessment across tables
+
+---
+
+## Expected Output
+
+### Key Extraction Results Table
+```
+| database_name | table_name | column_name | data_type | identifier_type | min_value | max_value |
+|---------------|------------|-------------|-----------|-----------------|-----------|-----------|
+| analytics     | user_events| user_email  | varchar   | email           | a@test.com| z@test.com|
+| analytics     | user_events| td_client_id| varchar   | cookie_id       | 00000000-.| ffffffff-.|
+| crm           | customers  | email       | varchar   | email           | admin@... | user@...  |
+```
+
+### Exclusion Documentation
+```
+## Tables EXCLUDED from ID Unification:
+
+- **analytics.product_catalog**: No user identifier columns found
+  - Available columns: [product_id, sku, product_name, category, price]
+  - Exclusion reason: Contains only product metadata - no PII
+  - Classification: Non-PII table
+```
+
+### Validation Summary
+```
+## Analysis Summary:
+- **Tables Analyzed**: 5
+- **Tables INCLUDED**: 3 (contain user identifiers)
+- **Tables EXCLUDED**: 2 (no user identifiers)
+- **User Identifier Columns Found**: 8
+```
+
+### 3 SQL Experts Analysis
+```
+**Expert 1 - Data Pattern Analyst:**
+- Email columns show valid format patterns across 2 tables
+- td_client_id shows UUID format with good coverage
+- Data quality: High (95%+ non-null for email)
+
+**Expert 2 - Cross-Table Relationship Analyst:**
+- Email appears in analytics.user_events and crm.customers (primary link)
+- td_client_id unique to analytics.user_events (secondary ID)
+- Recommendation: Email as primary key for unification
+
+**Expert 3 - Priority Assessment Specialist:**
+- Priority 1: email (stable, cross-table presence, good coverage)
+- Priority 2: td_client_id (system-generated, analytics-specific)
+- Recommended merge_by_keys: [email, td_client_id]
+```
+
+### Priority Recommendations (TD Standard)
+```
+Recommended Priority Order (TD Standard):
+1. email - Stable identifier across multiple tables with high coverage
+2. td_client_id - System-generated ID for analytics tracking
+3. phone - Secondary contact identifier (if available)
+
+EXCLUDED Identifiers (Not User-Related):
+- product_id - Product reference, not user identifier
+- campaign_id - Campaign metadata, not user-specific
+```
+
+---
+
+## Validation Gates
+
+I'll pass through these mandatory validation gates:
+- ✅ **GATE 1**: Schema extracted for all accessible tables
+- ✅ **GATE 2**: Tables classified into INCLUSION/EXCLUSION lists
+- ✅ **GATE 3**: All exclusions justified and documented
+- ✅ **GATE 4**: Real data analysis completed for included columns
+- ✅ **GATE 5**: 3 SQL experts analysis completed
+- ✅ **GATE 6**: Priority recommendations provided
+
+---
+
+## Next Steps
+
+After key extraction, you can:
+1. **Proceed with full setup**: Use `/cdp-unification:unify-setup` to continue with complete configuration
+2. **Create prep tables**: Use `/cdp-unification:unify-create-prep` with the extracted keys
+3. **Review and adjust**: Discuss the results and make adjustments to table selection
+
+---
+
+## Communication Pattern
+
+I'll use **TD Copilot standard format**:
+
+**Question**: Are these extracted user identifiers sufficient for your ID unification requirements?
+
+**Suggestion**: I recommend using **email** as your primary unification key since it appears across multiple tables with good data quality.
+
+**Check Point**: The analysis shows X tables with user identifiers and Y tables excluded. This provides comprehensive coverage for customer identity resolution.
+
+---
+
+## Getting Started
+
+**Ready to extract user identifiers?** Please provide your table list:
+
+**Example:**
+```
+Please analyze these tables for ID unification:
+- analytics.user_events
+- crm.customers
+- web.pageviews
+- marketing.campaigns
+```
+
+I'll call the **unif-keys-extractor** agent to perform comprehensive analysis with ZERO-TOLERANCE validation.
+
+---
+
+**Let's begin the analysis!**
--- a/commands/unify-setup.md
+++ b/commands/unify-setup.md
@@ -0,0 +1,200 @@
+---
+name: unify-setup
+description: Complete end-to-end ID unification setup from table analysis to deployment
+---
+
+# Complete ID Unification Setup
+
+## Overview
+
+I'll guide you through the complete ID unification setup process for Treasure Data CDP. This is an interactive, end-to-end workflow that will:
+
+1. **Extract and validate user identifiers** from your tables
+2. **Help you choose the right ID method** (canonical_id vs persistent_id)
+3. **Generate prep table configurations** for data standardization
+4. **Create core unification files** (unify.yml and id_unification.dig)
+5. **Set up staging enrichment** for post-unification processing
+6. **Create orchestration workflow** (unif_runner.dig) to run everything in sequence
+
+---
+
+## What You Need to Provide
+
+### 1. Table List
+Please provide the list of tables you want to include in ID unification:
+- Format: `database.table_name` (e.g., `analytics.user_events`, `crm.customers`)
+- I'll analyze each table using Treasure Data MCP tools to extract user identifiers
+
+### 2. Client Configuration
+- **Client short name**: Your client identifier (e.g., `mck`, `client`)
+- **Unification name**: Name for this unification project (e.g., `claude`, `customer_360`)
+- **Lookup/Config database suffix**: (default: `config`)
+  - Creates database: `${client_short_name}_${lookup_suffix}` (e.g., `client_config`)
+  - ⚠️ **I WILL CREATE THIS DATABASE** if it doesn't exist
+
+### 3. ID Method Selection
+I'll explain the options and help you choose:
+- **persistent_id**: Stable IDs that persist across updates (recommended for most cases)
+- **canonical_id**: Traditional approach with merge capabilities
+
+### 4. Update Strategy
+- **Incremental**: Process only new/updated records
+- **Full Refresh**: Reprocess all data each time
+
+### 5. Regional Endpoint
+- **US**: https://api-cdp.treasuredata.com
+- **EU**: https://api-cdp.eu01.treasuredata.com
+- **Asia Pacific**: https://api-cdp.ap02.treasuredata.com
+- **Japan**: https://api-cdp.treasuredata.co.jp
+
+---
+
+## What I'll Do
+
+### Step 1: Extract and Validate Keys (via unif-keys-extractor agent)
+I'll:
+- Use Treasure Data MCP tools to analyze table schemas
+- Extract user identifier columns (email, phone, td_client_id, etc.)
+- Query sample data to validate identifier patterns
+- Provide 3 SQL experts analysis of key relationships
+- Recommend priority ordering for unification keys
+- Exclude tables without user identifiers
+
+### Step 2: Configuration Guidance
+I'll:
+- Explain canonical_id vs persistent_id concepts
+- Recommend best approach for your use case
+- Discuss incremental vs full refresh strategies
+- Help you understand regional endpoint requirements
+
+### Step 3: Generate Prep Tables (via dynamic-prep-creation agent)
+I'll create:
+- `unification/dynmic_prep_creation.dig` - Prep workflow
+- `unification/queries/create_schema.sql` - Schema creation
+- `unification/queries/loop_on_tables.sql` - Dynamic loop logic
+- `unification/queries/unif_input_tbl.sql` - DSAR processing and data cleaning
+- `unification/config/environment.yml` - Client configuration
+- `unification/config/src_prep_params.yml` - Dynamic table mappings
+
+### Step 4: Generate Core Unification (via id-unification-creator agent)
+I'll create:
+- `unification/config/unify.yml` - Unification configuration with keys and tables
+- `unification/id_unification.dig` - Core unification workflow with HTTP API call
+- Updated `unification/queries/create_schema.sql` - Schema with all required columns
+
+### Step 5: Generate Staging Enrichment (via unification-staging-enricher agent)
+I'll create:
+- `unification/config/stage_enrich.yml` - Enrichment configuration
+- `unification/enrich/queries/generate_join_query.sql` - Join query generation
+- `unification/enrich/queries/execute_join_presto.sql` - Presto execution
+- `unification/enrich/queries/execute_join_hive.sql` - Hive execution
+- `unification/enrich/queries/enrich_tbl_creation.sql` - Table creation
+- `unification/enrich_runner.dig` - Enrichment workflow
+
+### Step 6: Create Main Orchestration
+I'll create:
+- `unification/unif_runner.dig` - Main workflow that calls:
+  - prep_creation → id_unification → enrichment (in sequence)
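+
+The sequence could be sketched as below. This is an outline under stated assumptions, not the exact template: the task names are illustrative, and the real file also carries `_export` and `_error:` sections that are omitted here.
+
+```yaml
++prep_creation:
+  require>: dynmic_prep_creation
+
++id_unification:
+  require>: id_unification
+
++enrichment:
+  require>: enrich_runner
+```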
+
+### Step 7: ⚠️ MANDATORY VALIDATION (NEW!)
+**CRITICAL**: Before deployment, I MUST run comprehensive validation:
+- `/cdp-unification:unify-validate` command
+- Validates ALL files against exact templates
+- Checks database and table existence
+- Verifies configuration consistency
+- **BLOCKS deployment if ANY validation fails**
+
+**If validation FAILS:**
+- I will show exact fix commands
+- You must fix all errors
+- Re-run validation until 100% pass
+- Only then proceed to deployment
+
+**If validation PASSES:**
+- Proceed to deployment with confidence
+- All files are production-ready
+
+### Step 8: Deployment Guidance
+I'll provide:
+- Configuration summary
+- Deployment instructions
+- Operating guidelines
+- Monitoring recommendations
+
+---
+
+## Interactive Workflow
+
+I'll use the **TD Copilot communication pattern** throughout:
+
+- **Question**: When I need your input or choice
+- **Suggestion**: When I recommend a specific approach
+- **Check Point**: When you should verify understanding
+
+---
+
+## Expected Output
+
+### Files Created (All under `unification/` directory):
+
+**Workflows:**
+- `unif_runner.dig` - Main orchestration workflow
+- `dynmic_prep_creation.dig` - Prep table creation
+- `id_unification.dig` - Core unification
+- `enrich_runner.dig` - Staging enrichment
+
+**Configuration:**
+- `config/environment.yml` - Client settings
+- `config/src_prep_params.yml` - Prep table mappings
+- `config/unify.yml` - Unification configuration
+- `config/stage_enrich.yml` - Enrichment configuration
+
+**SQL Templates:**
+- `queries/create_schema.sql` - Schema creation
+- `queries/loop_on_tables.sql` - Dynamic loop logic
+- `queries/unif_input_tbl.sql` - DSAR and data cleaning
+- `enrich/queries/generate_join_query.sql` - Join generation
+- `enrich/queries/execute_join_presto.sql` - Presto execution
+- `enrich/queries/execute_join_hive.sql` - Hive execution
+- `enrich/queries/enrich_tbl_creation.sql` - Table creation
+
+---
+
+## Success Criteria
+
+All generated files will:
+- ✅ Be TD-compliant and deployment-ready
+- ✅ Use exact templates from documentation
+- ✅ Include comprehensive error handling
+- ✅ Follow TD Copilot standards
+- ✅ Work without modification in Treasure Data
+- ✅ Support incremental processing
+- ✅ Include DSAR processing
+- ✅ Generate proper master tables
+
+---
+
+## Getting Started
+
+**Ready to begin?** Please provide:
+
+1. Your table list (database.table_name format)
+2. Client short name
+3. Unification name
+
+I'll start by analyzing your tables with the unif-keys-extractor agent to extract and validate user identifiers.
+
+**Example:**
+```
+I want to set up ID unification for:
+- analytics.user_events
+- crm.customers
+- web.pageviews
+
+Client: mck
+Unification name: customer_360
+```
+
+---
+
+**Let's get started!**
--- a/commands/unify-validate.md
+++ b/commands/unify-validate.md
@@ -0,0 +1,194 @@
+---
+name: unify-validate
+description: Validate all ID unification files against exact templates before deployment
+---
+
+# ID Unification Validation Command
+
+## Purpose
+
+**MANDATORY validation gate** that checks ALL generated unification files against exact templates from agent prompts. This prevents deployment of incorrect configurations.
+
+**⚠️ CRITICAL**: This command MUST complete successfully before `td wf push` or workflow execution.
+
+---
+
+## What This Command Validates
+
+### 1. File Existence Check
+- ✅ `unification/unif_runner.dig` exists
+- ✅ `unification/dynmic_prep_creation.dig` exists
+- ✅ `unification/id_unification.dig` exists
+- ✅ `unification/enrich_runner.dig` exists
+- ✅ `unification/config/environment.yml` exists
+- ✅ `unification/config/src_prep_params.yml` exists
+- ✅ `unification/config/unify.yml` exists
+- ✅ `unification/config/stage_enrich.yml` exists
+- ✅ All SQL files in `unification/queries/` exist
+- ✅ All SQL files in `unification/enrich/queries/` exist
+
+### 2. Template Compliance Check
+
+**unif_runner.dig Validation:**
+- ✅ Uses `require>` operator (NOT `call>`)
+- ✅ No `echo>` operators with subtasks
+- ✅ Matches exact template from `/plugins/cdp-unification/prompt.md` lines 186-217
+- ✅ Has `_error:` section with email_alert
+- ✅ Includes both `config/environment.yml` and `config/src_prep_params.yml`
+
+**stage_enrich.yml Validation:**
+- ✅ RULE 1: `unif_input` table has `column` and `key` both using `alias_as`
+- ✅ RULE 2: Staging tables have `column` using `col.name` and `key` using `alias_as`
+- ✅ All key_columns match actual columns from `src_prep_params.yml`
+- ✅ No template columns (like adobe_clickstream, loyalty_id_std)
+- ✅ Table names match `src_tbl` (NO _prep suffix)
+
+**enrich_runner.dig Validation:**
+- ✅ Matches exact template from `unification-staging-enricher.md` lines 261-299
+- ✅ Includes all 3 config files in `_export`
+- ✅ Uses `td_for_each>` for dynamic execution
+- ✅ Has Presto and Hive conditional execution
+
+### 3. Database & Table Existence Check
+- ✅ `${client_short_name}_${src}` database exists
+- ✅ `${client_short_name}_${stg}` database exists
+- ✅ `${client_short_name}_${gld}` database exists (if used)
+- ✅ `${client_short_name}_${lkup}` database exists
+- ✅ `cdp_unification_${unif_name}` database exists
+- ✅ `${client_short_name}_${lkup}.exclusion_list` table exists
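+
+To make the naming patterns concrete, the sketch below expands them with Python's `string.Template`, which uses the same `${name}` placeholder syntax. The `expected_databases` helper and the sample values are illustrative assumptions; the real values come from `environment.yml`.
+
+```python
+from string import Template
+
+# Database name patterns from the checklist above.
+REQUIRED_PATTERNS = [
+    "${client_short_name}_${src}",
+    "${client_short_name}_${stg}",
+    "${client_short_name}_${lkup}",
+    "cdp_unification_${unif_name}",
+]
+OPTIONAL_PATTERNS = ["${client_short_name}_${gld}"]
+
+
+def expected_databases(env):
+    """Expand the patterns with values from env (hypothetical helper)."""
+    names = [Template(p).substitute(env) for p in REQUIRED_PATTERNS]
+    # The gold database is only checked "if used"; this sketch treats a set `gld` value as "used".
+    if env.get("gld"):
+        names += [Template(p).substitute(env) for p in OPTIONAL_PATTERNS]
+    return names
+```
+
+With the illustrative values `client_short_name: client`, `src: src`, `stg: stg`, `lkup: config`, and `unif_name: customer_360`, the required names expand to `client_src`, `client_stg`, `client_config`, and `cdp_unification_customer_360`. A missing key raises `KeyError`, which is one way an undefined variable would surface.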
+
+### 4. Configuration Validation
+- ✅ All variables in `environment.yml` are defined
+- ✅ All tables in `src_prep_params.yml` exist in source database
+- ✅ All columns in `src_prep_params.yml` exist in source tables
+- ✅ `unify.yml` merge_by_keys match `src_prep_params.yml` alias_as columns
+- ✅ No undefined variables (${...})
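+
+The undefined-variable check could be approximated as follows. This is a sketch, not the validator itself: `undefined_variables` is a hypothetical helper, it only recognizes simple `${name}` and `${a.b}` placeholders (not arbitrary expressions), and it assumes the caller supplies the set of defined top-level names, including any built-in ones.
+
+```python
+import re
+
+# Simple placeholders such as ${stg} or ${td.each.tbl}.
+PLACEHOLDER_RE = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_.]*)\}")
+
+
+def undefined_variables(text, defined):
+    """Return placeholder names whose top-level name is not in defined, in first-seen order."""
+    missing = []
+    for name in PLACEHOLDER_RE.findall(text):
+        root = name.split(".")[0]  # ${td.each.tbl} is checked by its root, "td"
+        if root not in defined and name not in missing:
+            missing.append(name)
+    return missing
+```
+
+Each name is reported once, so a placeholder repeated across several lines yields a single entry.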
+
+### 5. YAML Syntax Check
+- ✅ All YAML files have valid syntax
+- ✅ Proper indentation (2 spaces)
+- ✅ No tabs in YAML files
+- ✅ All strings properly quoted where needed
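+
+The tab and indentation rules can be checked line by line without a YAML parser, as in the sketch below. The `yaml_whitespace_issues` helper is hypothetical and deliberately heuristic: it does not validate YAML syntax or quoting, and it may flag legitimately indented lines inside multi-line block scalars.
+
+```python
+def yaml_whitespace_issues(text):
+    """Return (line_number, message) pairs for tab and indentation problems (hypothetical helper)."""
+    issues = []
+    for lineno, line in enumerate(text.splitlines(), start=1):
+        if not line.strip():
+            continue  # blank lines carry no indentation information
+        if "\t" in line:
+            issues.append((lineno, "tab character"))
+            continue
+        indent = len(line) - len(line.lstrip(" "))
+        if indent % 2 != 0:
+            issues.append((lineno, "indentation is not a multiple of 2 spaces"))
+    return issues
+```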
+
+---
+
+## Validation Report Format
+
+```
+╔══════════════════════════════════════════════════════════════╗
+║          ID UNIFICATION VALIDATION REPORT                    ║
+╚══════════════════════════════════════════════════════════════╝
+
+[1/5] File Existence Check
+  ✅ unification/unif_runner.dig
+  ✅ unification/dynmic_prep_creation.dig
+  ✅ unification/id_unification.dig
+  ✅ unification/enrich_runner.dig
+  ✅ unification/config/environment.yml
+  ✅ unification/config/src_prep_params.yml
+  ✅ unification/config/unify.yml
+  ✅ unification/config/stage_enrich.yml
+  ✅ 3/3 SQL files in queries/
+  ✅ 4/4 SQL files in enrich/queries/
+
+[2/5] Template Compliance Check
+  ✅ unif_runner.dig uses require> operator
+  ✅ unif_runner.dig has no echo> conflicts
+  ✅ stage_enrich.yml RULE 1 compliant (unif_input table)
+  ✅ stage_enrich.yml RULE 2 compliant (staging tables)
+  ❌ stage_enrich.yml has incorrect mapping on line 23
+      Expected: column: email_address_std
+      Found:    column: email
+      FIX: Update line 23 to use col.name from src_prep_params.yml
+
+[3/5] Database & Table Existence
+  ✅ client_src exists
+  ✅ client_stg exists
+  ✅ client_gld exists
+  ✅ client_config exists
+  ❌ client_config.exclusion_list does NOT exist
+      FIX: Run: td query -d client_config -T presto -w "CREATE TABLE IF NOT EXISTS exclusion_list (key_value VARCHAR, key_name VARCHAR, tbls ARRAY(VARCHAR), note VARCHAR)"
+
+[4/5] Configuration Validation
+  ✅ All variables defined in environment.yml
+  ✅ Source table client_stg.snowflake_orders exists
+  ✅ All columns exist in source table
+  ✅ unify.yml keys match src_prep_params.yml
+
+[5/5] YAML Syntax Check
+  ✅ All YAML files have valid syntax
+  ✅ Proper indentation
+  ✅ No tabs found
+
+╔══════════════════════════════════════════════════════════════╗
+║                    VALIDATION SUMMARY                        ║
+╚══════════════════════════════════════════════════════════════╝
+
+Total Checks: 45
+Passed: 43 ✅
+Failed: 2 ❌
+
+❌ VALIDATION FAILED - DO NOT DEPLOY
+
+Required Actions:
+1. Fix stage_enrich.yml line 23 mapping
+2. Create client_config.exclusion_list table
+
+Re-run validation after fixes: /cdp-unification:unify-validate
+```
+
+---
+
+## Error Codes
+
+- **EXIT 0**: All validations passed ✅
+- **EXIT 1**: File existence failures
+- **EXIT 2**: Template compliance failures
+- **EXIT 3**: Database/table missing
+- **EXIT 4**: Configuration errors
+- **EXIT 5**: YAML syntax errors
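+
+One way to map failed check categories to these codes is sketched below. The category keys and the `exit_code` helper are hypothetical, and the rule for several failing categories is an assumption: this document does not say which code wins, so the sketch returns the lowest-numbered one.
+
+```python
+# Exit codes from the list above, keyed by illustrative category names.
+EXIT_CODES = {
+    "file_existence": 1,
+    "template_compliance": 2,
+    "database": 3,
+    "configuration": 4,
+    "yaml_syntax": 5,
+}
+
+
+def exit_code(failed_categories):
+    """Return 0 when nothing failed, else the lowest code among the failed categories (assumption)."""
+    codes = [EXIT_CODES[category] for category in failed_categories]
+    return min(codes) if codes else 0
+```
+
+Under that assumption, a run with one template compliance failure and one missing table would exit with code 2.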
+
+---
+
+## Usage
+
+**Standalone:**
+```
+/cdp-unification:unify-validate
+```
+
+**Auto-triggered in unify-setup** (MANDATORY step before deployment)
+
+**Manual validation before deployment:**
+```
+cd unification
+/cdp-unification:unify-validate
+```
+
+- If validation PASSES → Proceed with `td wf push unification`
+- If validation FAILS → Fix errors and re-validate
+
+---
+
+## Integration with unify-setup
+
+The `/cdp-unification:unify-setup` command will automatically:
+1. Generate all unification files
+2. **RUN VALIDATION** (this command)
+3. **BLOCK deployment** if validation fails
+4. **Show fix instructions** for each error
+5. **Auto-retry validation** after fixes
+6. Only proceed to deployment after 100% validation success
+
+---
+
+## Success Criteria
+
+- ✅ **ALL checks must pass** before deployment is allowed
+- ✅ **No exceptions** - even 1 failure blocks deployment
+- ✅ **Detailed error messages** with exact fix instructions
+- ✅ **Auto-remediation suggestions** where possible
+
+---
+
+**Let's validate your unification files!**