Initial commit

commands/unify-create-config.md
@@ -0,0 +1,314 @@
---
name: unify-create-config
description: Generate core ID unification configuration files (unify.yml and id_unification.dig)
---

# Create Core Unification Configuration

## Overview

I'll generate core ID unification configuration files using the **id-unification-creator** specialized agent.

This command creates **TD-COMPLIANT** unification files:
- ✅ **DYNAMIC CONFIGURATION** - Based on prep table analysis
- ✅ **METHOD-SPECIFIC** - `persistent_id` OR `canonical_id` (never both)
- ✅ **REGIONAL ENDPOINTS** - Correct URL for your region
- ✅ **SCHEMA VALIDATION** - Prevents first-run failures

---

## Prerequisites

**REQUIRED**: Prep table configuration must exist:
- `unification/config/environment.yml` - Client configuration
- `unification/config/src_prep_params.yml` - Prep table mappings

If you haven't created these yet, run:
- `/cdp-unification:unify-create-prep` first, OR
- `/cdp-unification:unify-setup` for complete end-to-end setup

---

## What You Need to Provide

### 1. ID Method Selection
Choose ONE method:

**Option A: persistent_id (RECOMMENDED)**
- Stable IDs that persist across updates
- Better for customer data platforms
- Recommended for most use cases
- **Provide persistent_id name** (e.g., `td_claude_id`, `stable_customer_id`)

**Option B: canonical_id**
- Traditional approach with merge capabilities
- Good for legacy systems
- **Provide canonical_id name** (e.g., `master_customer_id`)

### 2. Update Strategy
- **Full Refresh**: Reprocess all data each time (`full_refresh: true`)
- **Incremental**: Process only new/updated records (`full_refresh: false`)

### 3. Regional Endpoint
Choose your Treasure Data region:
- **US**: https://api-cdp.treasuredata.com/unifications/workflow_call
- **EU**: https://api-cdp.eu01.treasuredata.com/unifications/workflow_call
- **Asia Pacific**: https://api-cdp.ap02.treasuredata.com/unifications/workflow_call
- **Japan**: https://api-cdp.treasuredata.co.jp/unifications/workflow_call
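
The region choice could be resolved to its URL with a small lookup like the sketch below. The URLs are copied from the list above; the names `REGIONAL_ENDPOINTS` and `unification_endpoint` are hypothetical and are not part of the agent.

```python
# Region names and URLs are taken from the list above.
# The dictionary and function names are illustrative assumptions.
REGIONAL_ENDPOINTS = {
    "US": "https://api-cdp.treasuredata.com/unifications/workflow_call",
    "EU": "https://api-cdp.eu01.treasuredata.com/unifications/workflow_call",
    "Asia Pacific": "https://api-cdp.ap02.treasuredata.com/unifications/workflow_call",
    "Japan": "https://api-cdp.treasuredata.co.jp/unifications/workflow_call",
}


def unification_endpoint(region: str) -> str:
    """Return the workflow_call URL for a region name (case-insensitive)."""
    normalized = region.strip().lower()
    for name, url in REGIONAL_ENDPOINTS.items():
        if name.lower() == normalized:
            return url
    known = ", ".join(REGIONAL_ENDPOINTS)
    raise ValueError(f"Unknown region {region!r}; expected one of: {known}")
```

Failing loudly on an unknown region, rather than defaulting to US, keeps a typo from silently pointing the workflow at the wrong endpoint.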

### 4. Unification Name
- Name for this unification project (e.g., `claude`, `customer_360`)

---

## What I'll Do

### Step 1: Validate Prerequisites
I'll check that these files exist:
- `unification/config/environment.yml`
- `unification/config/src_prep_params.yml`

And extract:
- Client short name
- Unified input table name
- All prep table configurations with column mappings

### Step 2: Extract Key Information
I'll parse `src_prep_params.yml` to identify:
- All unique `alias_as` column names
- Key types: email, phone, td_client_id, td_global_id, customer_id, etc.
- Complete list of available keys for `merge_by_keys`
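
As a rough sketch of this step, the function below collects the unique `alias_as` values from an already-parsed `src_prep_params.yml`. It assumes the file has been loaded into plain dictionaries and lists by some YAML parser; the function name and the tolerance for two possible `col` nestings are assumptions, not the agent's actual code.

```python
def collect_unification_keys(prep_params: dict) -> list[str]:
    """Return unique alias_as values in first-seen order.

    `prep_params` is assumed to be src_prep_params.yml already parsed into
    Python dicts and lists.
    """
    keys: list[str] = []
    for table in prep_params.get("prep_tbls", []):
        for entry in table.get("columns", []):
            # Accept either {"col": {"name": ..., "alias_as": ...}} or
            # {"col": None, "name": ..., "alias_as": ...}.
            column = entry.get("col") or entry
            alias = column.get("alias_as")
            if alias and alias not in keys:
                keys.append(alias)
    return keys
```

The resulting list is what would feed both the `keys:` section and `merge_by_keys` in `unify.yml`.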

### Step 3: Generate unification/config/unify.yml
I'll create:
```yaml
name: {unif_name}

keys:
  - name: email
    invalid_texts: ['']
  - name: td_client_id
    invalid_texts: ['']
  - name: phone
    invalid_texts: ['']
  # ... ALL detected key types

tables:
  - database: ${client_short_name}_${stg}
    table: ${globals.unif_input_tbl}
    incremental_columns: [time]
    key_columns:
      - {column: email, key: email}
      - {column: td_client_id, key: td_client_id}
      - {column: phone, key: phone}
      # ... ALL alias_as columns mapped

# ONLY ONE of these sections (based on your selection):
persistent_ids:
  - name: {persistent_id_name}
    merge_by_keys: [email, td_client_id, phone, ...]
    merge_iterations: 15

# OR

canonical_ids:
  - name: {canonical_id_name}
    merge_by_keys: [email, td_client_id, phone, ...]
    merge_iterations: 15
```

### Step 4: Validate and Update Schema (CRITICAL)
I'll prevent first-run failures by:
1. Reading `unify.yml` to extract `merge_by_keys` list
2. Reading `queries/create_schema.sql` to check existing columns
3. Comparing required vs existing columns
4. Updating `create_schema.sql` if missing columns:
   - Add all keys from `merge_by_keys` as varchar
   - Add source, time, ingest_time columns
   - Update BOTH table definitions (main and tmp)
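
The comparison in steps 3 and 4 can be pictured as a simple set difference. This is a minimal sketch: it assumes the column names have already been extracted from `create_schema.sql` by some other means, and `missing_schema_columns` is a hypothetical helper name.

```python
# Columns that Step 4 says must exist in addition to the merge keys.
BASE_COLUMNS = ["source", "time", "ingest_time"]


def missing_schema_columns(merge_by_keys, existing_columns):
    """Return required columns that are absent from the schema, in order."""
    # dict.fromkeys removes duplicates while keeping first-seen order.
    required = list(dict.fromkeys(list(merge_by_keys) + BASE_COLUMNS))
    existing = {column.lower() for column in existing_columns}
    return [column for column in required if column.lower() not in existing]
```

An empty result means the schema already covers every key; any name returned would need to be added to both the main and tmp table definitions.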

### Step 5: Generate unification/id_unification.dig
I'll create:
```yaml
timezone: UTC

_export:
  !include : config/environment.yml
  !include : config/src_prep_params.yml

+call_unification:
  http_call>: {REGIONAL_ENDPOINT_URL}
  headers:
    - authorization: ${secret:td.apikey}
    - content-type: application/json
  method: POST
  retry: true
  content_format: json
  content:
    run_persistent_ids: {true/false}  # ONLY if persistent_id
    run_canonical_ids: {true/false}   # ONLY if canonical_id
    run_enrichments: true
    run_master_tables: true
    full_refresh: {true/false}
    keep_debug_tables: true
    unification:
      !include : config/unify.yml
```
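
The method-specific flags in the `content:` block could be derived as in the sketch below. The flag names come from the template above; the function name and the choice to omit the unused `run_*_ids` flag entirely (as the persistent_id example in this document does) are assumptions.

```python
def workflow_content_flags(id_method: str, full_refresh: bool) -> dict:
    """Build the scalar flags for the http_call `content` block."""
    if id_method not in ("persistent_id", "canonical_id"):
        raise ValueError(f"id_method must be persistent_id or canonical_id, got {id_method!r}")

    flags = {}
    # Emit only the flag for the selected method, never both.
    if id_method == "persistent_id":
        flags["run_persistent_ids"] = True
    else:
        flags["run_canonical_ids"] = True

    flags.update(
        run_enrichments=True,
        run_master_tables=True,
        full_refresh=full_refresh,
        keep_debug_tables=True,
    )
    return flags
```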

---

## Expected Output

### Files Created
```
unification/
├── config/
│   └── unify.yml                ✓ Dynamic configuration
├── queries/
│   └── create_schema.sql        ✓ Updated with all columns
└── id_unification.dig           ✓ Core unification workflow
```

### Example unify.yml (persistent_id method)
```yaml
name: customer_360

keys:
  - name: email
    invalid_texts: ['']
  - name: td_client_id
    invalid_texts: ['']
    valid_regexp: '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'

tables:
  - database: ${client_short_name}_${stg}
    table: ${globals.unif_input_tbl}
    incremental_columns: [time]
    key_columns:
      - {column: email, key: email}
      - {column: td_client_id, key: td_client_id}

persistent_ids:
  - name: td_claude_id
    merge_by_keys: [email, td_client_id]
    merge_iterations: 15
```
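
To see what the `valid_regexp` for `td_client_id` accepts, the pattern can be tried locally. This sketch uses Python's `re` module, which is an assumption for illustration only: the unification service applies the pattern with its own regex engine, though this pattern uses only basic syntax.

```python
import re

# Pattern copied verbatim from the valid_regexp in the example above.
TD_CLIENT_ID_PATTERN = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
)


def is_valid_td_client_id(value: str) -> bool:
    """True when the value is a lowercase, hyphenated UUID-style string."""
    return TD_CLIENT_ID_PATTERN.match(value) is not None
```

Note that the character class is lowercase only, so an uppercase UUID would be rejected as a key value under this pattern.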

### Example id_unification.dig (US region, incremental)
```yaml
timezone: UTC

_export:
  !include : config/environment.yml
  !include : config/src_prep_params.yml

+call_unification:
  http_call>: https://api-cdp.treasuredata.com/unifications/workflow_call
  headers:
    - authorization: ${secret:td.apikey}
    - content-type: application/json
  method: POST
  retry: true
  content_format: json
  content:
    run_persistent_ids: true
    run_enrichments: true
    run_master_tables: true
    full_refresh: false
    keep_debug_tables: true
    unification:
      !include : config/unify.yml
```

---

## Critical Requirements

### ✅ Dynamic Configuration
- All keys detected from `src_prep_params.yml`
- All column mappings from prep analysis
- Method-specific configuration (never both)

### ⚠️ Schema Completeness
- `create_schema.sql` MUST contain ALL columns from `merge_by_keys`
- Prevents "column not found" errors on first run
- Updates both main and tmp table definitions

### ⚠️ Config File Inclusion
- `id_unification.dig` MUST include BOTH config files in `_export`:
  - `environment.yml` - For `${client_short_name}_${stg}`
  - `src_prep_params.yml` - For `${globals.unif_input_tbl}`

### ⚠️ Regional Endpoint
- Must use exact URL for selected region
- Different endpoints for US, EU, Asia Pacific, Japan

---

## Validation Checklist

Before completing, I'll verify:
- [ ] unify.yml contains all detected key types
- [ ] key_columns section maps ALL alias_as columns
- [ ] Only ONE ID method section exists
- [ ] merge_by_keys includes ALL available keys
- [ ] **CRITICAL**: create_schema.sql contains ALL columns from merge_by_keys
- [ ] **CRITICAL**: Both table definitions updated (main and tmp)
- [ ] id_unification.dig has correct regional endpoint
- [ ] **CRITICAL**: _export includes BOTH config files
- [ ] Workflow flags match selected method only
- [ ] Proper TD YAML/DIG syntax
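
Two of the checklist items ("Only ONE ID method section exists" and keys used in `merge_by_keys` being declared) lend themselves to a mechanical check. The sketch below assumes `unify.yml` has already been parsed into a dictionary; `validate_id_method` is a hypothetical name, not a function of the agent.

```python
ID_SECTIONS = ("persistent_ids", "canonical_ids")


def validate_id_method(unify_config: dict) -> str:
    """Return the single ID section present, or raise ValueError."""
    present = [section for section in ID_SECTIONS if unify_config.get(section)]
    if len(present) != 1:
        raise ValueError(f"Expected exactly one of {ID_SECTIONS}, found {present}")

    # Every key referenced by merge_by_keys should be declared under `keys:`.
    declared = {key["name"] for key in unify_config.get("keys", [])}
    for id_definition in unify_config[present[0]]:
        undeclared = [
            key for key in id_definition.get("merge_by_keys", [])
            if key not in declared
        ]
        if undeclared:
            raise ValueError(f"merge_by_keys uses undeclared keys: {undeclared}")
    return present[0]
```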

---

## Success Criteria

All generated files will:
- ✅ **TD-COMPLIANT** - Work without modification in TD
- ✅ **DYNAMICALLY CONFIGURED** - Based on actual prep analysis
- ✅ **METHOD-ACCURATE** - Exact implementation of selected method
- ✅ **REGIONALLY CORRECT** - Proper endpoint for region
- ✅ **SCHEMA-COMPLETE** - All required columns present

---

## Next Steps

After creating core config, you can:
1. **Test unification workflow**: `digdag run unification/id_unification.dig`
2. **Add enrichment**: Use `/cdp-unification:unify-setup` to add staging enrichment
3. **Create main orchestrator**: Combine prep + unification + enrichment

---

## Getting Started

**Ready to create core unification config?** Please provide:

1. **ID Method**:
   - Choose: `persistent_id` or `canonical_id`
   - Provide ID name: e.g., `td_claude_id`

2. **Update Strategy**:
   - Choose: `incremental` or `full_refresh`

3. **Regional Endpoint**:
   - Choose: `US`, `EU`, `Asia Pacific`, or `Japan`

4. **Unification Name**:
   - e.g., `customer_360`, `claude`

**Example:**
```
ID Method: persistent_id
ID Name: td_claude_id
Update Strategy: incremental
Region: US
Unification Name: customer_360
```

I'll call the **id-unification-creator** agent to generate all core unification files.

---

**Let's create your unification configuration!**
commands/unify-create-prep.md
@@ -0,0 +1,233 @@
---
name: unify-create-prep
description: Generate prep table creation files and configuration for ID unification
---

# Create Prep Table Configuration

## Overview

I'll generate prep table creation files and configuration using the **dynamic-prep-creation** specialized agent.

This command creates **PRODUCTION-READY** prep table files:
- ⚠️ **EXACT TEMPLATES** - No modifications allowed
- ⚠️ **ZERO CHANGES** - Character-for-character accuracy
- ✅ **GENERIC FILES** - Reusable across all projects
- ✅ **DYNAMIC CONFIGURATION** - Adapts to your table structure

---

## What You Need to Provide

### 1. Table Analysis Results
If you've already run key extraction:
- Provide the list of **included tables** with their user identifier columns
- I can use the results from `/cdp-unification:unify-extract-keys`

OR provide directly:
- **Source tables**: database.table_name format
- **User identifier columns**: For each table, which columns contain identifiers

### 2. Client Configuration
- **Client short name**: Your client identifier (e.g., `mck`, `client_name`)
- **Database suffixes**:
  - Source database suffix (default: `src`)
  - Staging database suffix (default: `stg`)
  - Lookup database (default: `config`)

### 3. Column Mappings
For each table, specify which columns to include and their unified aliases:
- **Email columns** → alias: `email`
- **Phone columns** → alias: `phone`
- **Customer ID columns** → alias: `customer_id`
- **TD Client ID** → alias: `td_client_id`
- **TD Global ID** → alias: `td_global_id`

---

## What I'll Do

### Step 1: Create Directory Structure
I'll create:
- `unification/config/` directory
- `unification/queries/` directory

### Step 2: Generate Generic Files (EXACT TEMPLATES)
I'll create these files with **ZERO MODIFICATIONS**:

**⚠️ `unification/dynmic_prep_creation.dig`** (EXACT filename - no 'a' in dynmic)
- Generic prep workflow
- Handles schema creation, table looping, and data insertion
- Uses variables from config files

**⚠️ `unification/queries/create_schema.sql`**
- Generic schema creation for unified input table
- Creates both main and tmp tables

**⚠️ `unification/queries/loop_on_tables.sql`**
- Complex production SQL for dynamic table processing
- Generates prep table SQL and unified input table SQL
- Handles incremental logic and deduplication

**⚠️ `unification/queries/unif_input_tbl.sql`**
- DSAR processing and data cleaning
- Exclusion list management for masked data
- Dynamic column detection and insertion

### Step 3: Generate Dynamic Configuration Files

**`unification/config/environment.yml`**
```yaml
client_short_name: {your_client_name}
src: src
stg: stg
gld: gld
lkup: references
```

**`unification/config/src_prep_params.yml`**
- Dynamic table configuration based on your table analysis
- Column mappings with unified aliases
- Prep table naming conventions

### Step 4: Dynamic Column Detection (CRITICAL)
For `unif_input_tbl.sql`, I'll:
1. Query Treasure Data schema: `information_schema.columns`
2. Detect all columns besides email, phone, source, ingest_time, time
3. Auto-generate column list for data_cleaned CTE
4. Replace placeholder with actual columns
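
Steps 1 and 2 above might look like the following sketch. The `information_schema.columns` fields used (`table_schema`, `table_name`, `column_name`, `ordinal_position`) are the standard Presto/Trino ones; the function names are hypothetical, and running the query against Treasure Data is left out.

```python
# Columns that Step 4 says are handled explicitly by the template.
FIXED_COLUMNS = {"email", "phone", "source", "ingest_time", "time"}


def information_schema_query(database: str, table: str) -> str:
    """Build the column-listing query for one table (identifiers assumed trusted)."""
    return (
        "SELECT column_name FROM information_schema.columns "
        f"WHERE table_schema = '{database}' AND table_name = '{table}' "
        "ORDER BY ordinal_position"
    )


def dynamic_columns(all_columns):
    """Keep only the columns that are not in the fixed set, preserving order."""
    return [name for name in all_columns if name.lower() not in FIXED_COLUMNS]
```

The list returned by `dynamic_columns` is what would replace the placeholder in the `data_cleaned` CTE.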

---

## Expected Output

### Generic Files (EXACT - NO CHANGES)
```
unification/
├── dynmic_prep_creation.dig          ⚠️ EXACT filename
├── queries/
│   ├── create_schema.sql             ⚠️ EXACT content
│   ├── loop_on_tables.sql            ⚠️ EXACT content
│   └── unif_input_tbl.sql            ⚠️ WITH dynamic columns
```

### Dynamic Configuration Files
```
unification/config/
├── environment.yml                   ✓ Client-specific
└── src_prep_params.yml               ✓ Table-specific
```

### Example src_prep_params.yml Structure
```yaml
globals:
  unif_input_tbl: unif_input

prep_tbls:
  - src_tbl: user_events
    src_db: ${client_short_name}_${stg}
    snk_db: ${client_short_name}_${stg}
    snk_tbl: ${src_tbl}_prep
    columns:
      - col:
          name: user_email
          alias_as: email
      - col:
          name: td_client_id
          alias_as: td_client_id

  - src_tbl: customers
    src_db: ${client_short_name}_${stg}
    snk_db: ${client_short_name}_${stg}
    snk_tbl: ${src_tbl}_prep
    columns:
      - col:
          name: email
          alias_as: email
      - col:
          name: customer_id
          alias_as: customer_id
```

---

## Critical Requirements

### ⚠️ NEVER MODIFY GENERIC FILES
- **dynmic_prep_creation.dig**: EXACT template, character-for-character
- **create_schema.sql**: EXACT SQL, no changes
- **loop_on_tables.sql**: EXACT complex SQL, no modifications
- **unif_input_tbl.sql**: EXACT template + dynamic column replacement

### ✅ DYNAMIC CONFIGURATION ONLY
- **environment.yml**: Client-specific variables
- **src_prep_params.yml**: Table-specific mappings

### 🚨 CRITICAL FILENAME
- **MUST be "dynmic_prep_creation.dig"** (NO 'a' in dynmic)
- This is intentional - production systems expect this exact name

### 🚨 NO TIME COLUMN
- **NEVER ADD** `time` column to src_prep_params.yml
- Time is auto-generated by SQL template
- Only include actual identifier columns

---

## Validation Checklist

Before completing, I'll verify:
- [ ] File named "dynmic_prep_creation.dig" exists
- [ ] Content matches template character-for-character
- [ ] All variable placeholders preserved
- [ ] Queries folder contains exact SQL files
- [ ] Config folder contains YAML files
- [ ] Dynamic columns inserted in unif_input_tbl.sql
- [ ] No time column in src_prep_params.yml
- [ ] All directories created

---

## Success Criteria

All generated files will:
- ✅ **EXACT TEMPLATES** - Character-for-character accuracy
- ✅ **PRODUCTION-READY** - Deployable to TD without changes
- ✅ **DYNAMIC CONFIGURATION** - Adapts to table structure
- ✅ **DSAR COMPLIANT** - Includes exclusion list processing
- ✅ **INCREMENTAL PROCESSING** - Supports time-based updates

---

## Next Steps

After prep creation, you can:
1. **Test prep workflow**: `digdag run unification/dynmic_prep_creation.dig`
2. **Create unification config**: Use `/cdp-unification:unify-create-config`
3. **Complete full setup**: Use `/cdp-unification:unify-setup`

---

## Getting Started

**Ready to create prep tables?** Please provide:

1. **Table list with columns**:
   ```
   Table: analytics.user_events
   Columns: user_email (email), td_client_id (td_client_id)

   Table: crm.customers
   Columns: email (email), customer_id (customer_id)
   ```

2. **Client configuration**:
   ```
   Client short name: mck
   ```

I'll call the **dynamic-prep-creation** agent to generate all prep files with exact templates.

---

**Let's create your prep table configuration!**
commands/unify-extract-keys.md
@@ -0,0 +1,191 @@
---
name: unify-extract-keys
description: Extract and validate user identifier columns from tables using live Treasure Data analysis
---

# Extract and Validate User Identifiers

## Overview

I'll analyze your Treasure Data tables to extract and validate user identifier columns using the **unif-keys-extractor** specialized agent.

This command performs **ZERO-TOLERANCE** identifier extraction:
- ❌ **NO GUESSING** - Only uses real Treasure Data MCP tools
- ❌ **NO ASSUMPTIONS** - Every table is analyzed with live data
- ✅ **STRICT VALIDATION** - Only includes tables with actual user identifiers
- ✅ **COMPREHENSIVE ANALYSIS** - Review from three SQL expert perspectives, plus priority recommendations

---

## What You Need to Provide

### Table List
Provide the tables you want to analyze for ID unification:
- **Format**: `database.table_name`
- **Example**: `analytics.user_events`, `crm.customers`, `web.pageviews`

---

## What I'll Do

### Step 1: Schema Extraction (MANDATORY)
For each table, I'll:
- Call `mcp__mcc_treasuredata__describe_table(table, database)`
- Extract EXACT column names and data types
- Identify tables that are inaccessible

### Step 2: User Identifier Detection (STRICT MATCHING)
I'll scan for valid user identifier columns:

**✅ VALID USER IDENTIFIERS:**
- **Email columns**: email, email_std, email_address, user_email, customer_email
- **Phone columns**: phone, phone_std, phone_number, mobile_phone, customer_phone
- **User ID columns**: user_id, customer_id, account_id, member_id, uid, user_uuid
- **Identity columns**: profile_id, identity_id, cognito_identity_userid
- **Cookie/Device IDs**: td_client_id, td_global_id, td_ssc_id, cookie_id, device_id

**❌ NOT USER IDENTIFIERS (EXCLUDED):**
- System columns: id, created_at, updated_at, load_timestamp
- Campaign columns: campaign_id, message_id
- Product columns: product_id, sku, variant_id
- Complex types: array, map, json columns
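
The strict matching described above amounts to an allow-list lookup plus a data-type check. The sketch below copies the column names from the lists above; the category labels, the function name, and the prefix test for complex types are assumptions rather than the agent's actual rules.

```python
# Allow-list copied from the "VALID USER IDENTIFIERS" bullets above.
USER_IDENTIFIER_COLUMNS = {
    "email": {"email", "email_std", "email_address", "user_email", "customer_email"},
    "phone": {"phone", "phone_std", "phone_number", "mobile_phone", "customer_phone"},
    "user_id": {"user_id", "customer_id", "account_id", "member_id", "uid", "user_uuid"},
    "identity": {"profile_id", "identity_id", "cognito_identity_userid"},
    "cookie_device": {"td_client_id", "td_global_id", "td_ssc_id", "cookie_id", "device_id"},
}
COMPLEX_TYPE_PREFIXES = ("array", "map", "json")


def classify_column(name: str, data_type: str):
    """Return the identifier category for a column, or None if it is excluded."""
    # Complex-typed columns are excluded regardless of their name.
    if data_type.strip().lower().startswith(COMPLEX_TYPE_PREFIXES):
        return None
    lowered = name.strip().lower()
    for category, names in USER_IDENTIFIER_COLUMNS.items():
        if lowered in names:
            return category
    return None
```

A table for which every column classifies as `None` would go on the exclusion list handled in Step 3.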

### Step 3: Exclusion Validation (CRITICAL)
For tables WITHOUT user identifiers, I'll:
- Document the exclusion reason
- List available columns for transparency
- Explain why the table doesn't qualify

### Step 4: Min/Max Data Analysis (INCLUDED TABLES ONLY)
For tables WITH user identifiers, I'll:
- Query actual data: `SELECT MIN(column), MAX(column) FROM table`
- Validate data patterns and formats
- Assess data quality
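
The min/max probe can be generated per identifier column. This is a sketch under stated assumptions: the function name and the `min_value` / `max_value` aliases are illustrative, and identifiers are validated against a conservative pattern before being quoted, since they are interpolated into SQL text.

```python
import re

# Conservative identifier check; real TD names may allow more characters.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def min_max_query(database: str, table: str, column: str) -> str:
    """Build a MIN/MAX probe for one identifier column."""
    for part in (database, table, column):
        if not _IDENTIFIER.fullmatch(part):
            raise ValueError(f"Unsafe identifier: {part!r}")
    return (
        f'SELECT MIN("{column}") AS min_value, MAX("{column}") AS max_value '
        f'FROM "{database}"."{table}"'
    )
```

For example, `min_max_query("analytics", "user_events", "user_email")` produces the query whose results would fill the `min_value` and `max_value` columns of the extraction results table.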

### Step 5: 3 SQL Experts Analysis
I'll provide structured analysis from three perspectives:
1. **Data Pattern Analyst**: Reviews actual min/max values and data quality
2. **Cross-Table Relationship Analyst**: Maps identifier relationships across tables
3. **Priority Assessment Specialist**: Ranks identifiers by stability and coverage

### Step 6: Priority Recommendations
I'll provide:
- Recommended priority ordering (TD standard)
- Reasoning for each recommendation
- Compatibility assessment across tables

---

## Expected Output

### Key Extraction Results Table
```
| database_name | table_name  | column_name  | data_type | identifier_type | min_value  | max_value  |
|---------------|-------------|--------------|-----------|-----------------|------------|------------|
| analytics     | user_events | user_email   | varchar   | email           | a@test.com | z@test.com |
| analytics     | user_events | td_client_id | varchar   | cookie_id       | 00000000-. | ffffffff-. |
| crm           | customers   | email        | varchar   | email           | admin@...  | user@...   |
```

### Exclusion Documentation
```
## Tables EXCLUDED from ID Unification:

- **analytics.product_catalog**: No user identifier columns found
  - Available columns: [product_id, sku, product_name, category, price]
  - Exclusion reason: Contains only product metadata - no PII
  - Classification: Non-PII table
```

### Validation Summary
```
## Analysis Summary:
- **Tables Analyzed**: 5
- **Tables INCLUDED**: 3 (contain user identifiers)
- **Tables EXCLUDED**: 2 (no user identifiers)
- **User Identifier Columns Found**: 8
```

### 3 SQL Experts Analysis
```
**Expert 1 - Data Pattern Analyst:**
- Email columns show valid format patterns across 2 tables
- td_client_id shows UUID format with good coverage
- Data quality: High (95%+ non-null for email)

**Expert 2 - Cross-Table Relationship Analyst:**
- Email appears in analytics.user_events and crm.customers (primary link)
- td_client_id unique to analytics.user_events (secondary ID)
- Recommendation: Email as primary key for unification

**Expert 3 - Priority Assessment Specialist:**
- Priority 1: email (stable, cross-table presence, good coverage)
- Priority 2: td_client_id (system-generated, analytics-specific)
- Recommended merge_by_keys: [email, td_client_id]
```

### Priority Recommendations (TD Standard)
```
Recommended Priority Order (TD Standard):
1. email - Stable identifier across multiple tables with high coverage
2. td_client_id - System-generated ID for analytics tracking
3. phone - Secondary contact identifier (if available)

EXCLUDED Identifiers (Not User-Related):
- product_id - Product reference, not user identifier
- campaign_id - Campaign metadata, not user-specific
```

---

## Validation Gates

I'll pass through these mandatory validation gates:
- ✅ **GATE 1**: Schema extracted for all accessible tables
- ✅ **GATE 2**: Tables classified into INCLUSION/EXCLUSION lists
- ✅ **GATE 3**: All exclusions justified and documented
- ✅ **GATE 4**: Real data analysis completed for included columns
- ✅ **GATE 5**: 3 SQL experts analysis completed
- ✅ **GATE 6**: Priority recommendations provided

---

## Next Steps

After key extraction, you can:
1. **Proceed with full setup**: Use `/cdp-unification:unify-setup` to continue with complete configuration
2. **Create prep tables**: Use `/cdp-unification:unify-create-prep` with the extracted keys
3. **Review and adjust**: Discuss the results and make adjustments to table selection

---

## Communication Pattern

I'll use **TD Copilot standard format**:

**Question**: Are these extracted user identifiers sufficient for your ID unification requirements?

**Suggestion**: I recommend using **email** as your primary unification key since it appears across multiple tables with good data quality.

**Check Point**: The analysis shows X tables with user identifiers and Y tables excluded. This provides comprehensive coverage for customer identity resolution.

---

## Getting Started

**Ready to extract user identifiers?** Please provide your table list:

**Example:**
```
Please analyze these tables for ID unification:
- analytics.user_events
- crm.customers
- web.pageviews
- marketing.campaigns
```

I'll call the **unif-keys-extractor** agent to perform comprehensive analysis with ZERO-TOLERANCE validation.

---

**Let's begin the analysis!**
commands/unify-setup.md
@@ -0,0 +1,200 @@
---
name: unify-setup
description: Complete end-to-end ID unification setup from table analysis to deployment
---

# Complete ID Unification Setup

## Overview

I'll guide you through the complete ID unification setup process for Treasure Data CDP. This is an interactive, end-to-end workflow that will:

1. **Extract and validate user identifiers** from your tables
2. **Help you choose the right ID method** (canonical_id vs persistent_id)
3. **Generate prep table configurations** for data standardization
4. **Create core unification files** (unify.yml and id_unification.dig)
5. **Set up staging enrichment** for post-unification processing
6. **Create orchestration workflow** (unif_runner.dig) to run everything in sequence

---

## What You Need to Provide

### 1. Table List
Please provide the list of tables you want to include in ID unification:
- Format: `database.table_name` (e.g., `analytics.user_events`, `crm.customers`)
- I'll analyze each table using Treasure Data MCP tools to extract user identifiers

### 2. Client Configuration
- **Client short name**: Your client identifier (e.g., `mck`, `client`)
- **Unification name**: Name for this unification project (e.g., `claude`, `customer_360`)
- **Lookup/Config database suffix**: (default: `config`)
  - Creates database: `${client_short_name}_${lookup_suffix}` (e.g., `client_config`)
  - ⚠️ **I WILL CREATE THIS DATABASE** if it doesn't exist

### 3. ID Method Selection
I'll explain the options and help you choose:
- **persistent_id**: Stable IDs that persist across updates (recommended for most cases)
- **canonical_id**: Traditional approach with merge capabilities

### 4. Update Strategy
- **Incremental**: Process only new/updated records
- **Full Refresh**: Reprocess all data each time

### 5. Regional Endpoint
- **US**: https://api-cdp.treasuredata.com
- **EU**: https://api-cdp.eu01.treasuredata.com
- **Asia Pacific**: https://api-cdp.ap02.treasuredata.com
- **Japan**: https://api-cdp.treasuredata.co.jp

---
|
||||
|
||||
## What I'll Do
|
||||
|
||||
### Step 1: Extract and Validate Keys (via unif-keys-extractor agent)
|
||||
I'll:
|
||||
- Use Treasure Data MCP tools to analyze table schemas
|
||||
- Extract user identifier columns (email, phone, td_client_id, etc.)
|
||||
- Query sample data to validate identifier patterns
|
||||
- Provide 3 SQL experts analysis of key relationships
|
||||
- Recommend priority ordering for unification keys
|
||||
- Exclude tables without user identifiers
|
||||
|
||||
### Step 2: Configuration Guidance
|
||||
I'll:
|
||||
- Explain canonical_id vs persistent_id concepts
|
||||
- Recommend best approach for your use case
|
||||
- Discuss incremental vs full refresh strategies
|
||||
- Help you understand regional endpoint requirements
|
||||
|
||||
### Step 3: Generate Prep Tables (via dynamic-prep-creation agent)
|
||||
I'll create:
|
||||
- `unification/dynmic_prep_creation.dig` - Prep workflow
|
||||
- `unification/queries/create_schema.sql` - Schema creation
|
||||
- `unification/queries/loop_on_tables.sql` - Dynamic loop logic
|
||||
- `unification/queries/unif_input_tbl.sql` - DSAR processing and data cleaning
|
||||
- `unification/config/environment.yml` - Client configuration
|
||||
- `unification/config/src_prep_params.yml` - Dynamic table mappings
|
||||
|
||||
### Step 4: Generate Core Unification (via id-unification-creator agent)
|
||||
I'll create:
|
||||
- `unification/config/unify.yml` - Unification configuration with keys and tables
|
||||
- `unification/id_unification.dig` - Core unification workflow with HTTP API call
|
||||
- Updated `unification/queries/create_schema.sql` - Schema with all required columns
|
||||
|
||||
### Step 5: Generate Staging Enrichment (via unification-staging-enricher agent)
|
||||
I'll create:
|
||||
- `unification/config/stage_enrich.yml` - Enrichment configuration
|
||||
- `unification/enrich/queries/generate_join_query.sql` - Join query generation
|
||||
- `unification/enrich/queries/execute_join_presto.sql` - Presto execution
|
||||
- `unification/enrich/queries/execute_join_hive.sql` - Hive execution
|
||||
- `unification/enrich/queries/enrich_tbl_creation.sql` - Table creation
|
||||
- `unification/enrich_runner.dig` - Enrichment workflow
|
||||
|
||||
### Step 6: Create Main Orchestration
|
||||
I'll create:
|
||||
- `unification/unif_runner.dig` - Main workflow that calls:
|
||||
- prep_creation → id_unification → enrichment (in sequence)
|
||||
|
||||
### Step 7: ⚠️ MANDATORY VALIDATION (NEW!)
|
||||
**CRITICAL**: Before deployment, I MUST run comprehensive validation:
|
||||
- `/cdp-unification:unify-validate` command
|
||||
- Validates ALL files against exact templates
|
||||
- Checks database and table existence
|
||||
- Verifies configuration consistency
|
||||
- **BLOCKS deployment if ANY validation fails**
|
||||
|
||||
**If validation FAILS:**
|
||||
- I will show exact fix commands
|
||||
- You must fix all errors
|
||||
- Re-run validation until every check passes
|
||||
- Only then proceed to deployment
|
||||
|
||||
**If validation PASSES:**
|
||||
- Proceed to deployment with confidence
|
||||
- All files are production-ready
|
||||
|
||||
### Step 8: Deployment Guidance
|
||||
I'll provide:
|
||||
- Configuration summary
|
||||
- Deployment instructions
|
||||
- Operating guidelines
|
||||
- Monitoring recommendations
|
||||
|
||||
---
|
||||
|
||||
## Interactive Workflow
|
||||
|
||||
I'll use the **TD Copilot communication pattern** throughout:
|
||||
|
||||
- **Question**: When I need your input or choice
|
||||
- **Suggestion**: When I recommend a specific approach
|
||||
- **Check Point**: When you should verify understanding
|
||||
|
||||
---
|
||||
|
||||
## Expected Output
|
||||
|
||||
### Files Created (All under `unification/` directory):
|
||||
|
||||
**Workflows:**
|
||||
- `unif_runner.dig` - Main orchestration workflow
|
||||
- `dynmic_prep_creation.dig` - Prep table creation
|
||||
- `id_unification.dig` - Core unification
|
||||
- `enrich_runner.dig` - Staging enrichment
|
||||
|
||||
**Configuration:**
|
||||
- `config/environment.yml` - Client settings
|
||||
- `config/src_prep_params.yml` - Prep table mappings
|
||||
- `config/unify.yml` - Unification configuration
|
||||
- `config/stage_enrich.yml` - Enrichment configuration
|
||||
|
||||
**SQL Templates:**
|
||||
- `queries/create_schema.sql` - Schema creation
|
||||
- `queries/loop_on_tables.sql` - Dynamic loop logic
|
||||
- `queries/unif_input_tbl.sql` - DSAR and data cleaning
|
||||
- `enrich/queries/generate_join_query.sql` - Join generation
|
||||
- `enrich/queries/execute_join_presto.sql` - Presto execution
|
||||
- `enrich/queries/execute_join_hive.sql` - Hive execution
|
||||
- `enrich/queries/enrich_tbl_creation.sql` - Table creation
|
||||
|
||||
---
|
||||
|
||||
## Success Criteria
|
||||
|
||||
All generated files will:
|
||||
- ✅ Be TD-compliant and deployment-ready
|
||||
- ✅ Use exact templates from documentation
|
||||
- ✅ Include comprehensive error handling
|
||||
- ✅ Follow TD Copilot standards
|
||||
- ✅ Work without modification in Treasure Data
|
||||
- ✅ Support incremental processing
|
||||
- ✅ Include DSAR processing
|
||||
- ✅ Generate proper master tables
|
||||
|
||||
---
|
||||
|
||||
## Getting Started
|
||||
|
||||
**Ready to begin?** Please provide:
|
||||
|
||||
1. Your table list (database.table_name format)
|
||||
2. Client short name
|
||||
3. Unification name
|
||||
|
||||
I'll start by analyzing your tables with the unif-keys-extractor agent to extract and validate user identifiers.
|
||||
|
||||
**Example:**
|
||||
```
|
||||
I want to set up ID unification for:
|
||||
- analytics.user_events
|
||||
- crm.customers
|
||||
- web.pageviews
|
||||
|
||||
Client: mck
|
||||
Unification name: customer_360
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
**Let's get started!**
|
||||
194
commands/unify-validate.md
Normal file
@@ -0,0 +1,194 @@
|
||||
---
|
||||
name: unify-validate
|
||||
description: Validate all ID unification files against exact templates before deployment
|
||||
---
|
||||
|
||||
# ID Unification Validation Command
|
||||
|
||||
## Purpose
|
||||
|
||||
**MANDATORY validation gate** that checks ALL generated unification files against exact templates from agent prompts. This prevents deployment of incorrect configurations.
|
||||
|
||||
**⚠️ CRITICAL**: This command MUST complete successfully before `td wf push` or workflow execution.
|
||||
|
||||
---
|
||||
|
||||
## What This Command Validates
|
||||
|
||||
### 1. File Existence Check
|
||||
- ✅ `unification/unif_runner.dig` exists
|
||||
- ✅ `unification/dynmic_prep_creation.dig` exists
|
||||
- ✅ `unification/id_unification.dig` exists
|
||||
- ✅ `unification/enrich_runner.dig` exists
|
||||
- ✅ `unification/config/environment.yml` exists
|
||||
- ✅ `unification/config/src_prep_params.yml` exists
|
||||
- ✅ `unification/config/unify.yml` exists
|
||||
- ✅ `unification/config/stage_enrich.yml` exists
|
||||
- ✅ All SQL files in `unification/queries/` exist
|
||||
- ✅ All SQL files in `unification/enrich/queries/` exist
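The file existence check above could be sketched roughly as follows. This is a minimal illustration, not the command's actual implementation: the function name `missing_files` is hypothetical, and the SQL file names are taken from the "Files Created" list in the setup command.

```python
from pathlib import Path

# Paths are relative to the unification/ directory.
# The list mirrors the checks above; the helper name is hypothetical.
REQUIRED_FILES = [
    "unif_runner.dig",
    "dynmic_prep_creation.dig",
    "id_unification.dig",
    "enrich_runner.dig",
    "config/environment.yml",
    "config/src_prep_params.yml",
    "config/unify.yml",
    "config/stage_enrich.yml",
    "queries/create_schema.sql",
    "queries/loop_on_tables.sql",
    "queries/unif_input_tbl.sql",
    "enrich/queries/generate_join_query.sql",
    "enrich/queries/execute_join_presto.sql",
    "enrich/queries/execute_join_hive.sql",
    "enrich/queries/enrich_tbl_creation.sql",
]


def missing_files(root):
    """Return the required paths (relative to root) that are not regular files."""
    base = Path(root)
    return [rel for rel in REQUIRED_FILES if not (base / rel).is_file()]
```

Calling `missing_files("unification")` returns an empty list when every file is present, so a non-empty result can be reported line by line.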
|
||||
|
||||
### 2. Template Compliance Check
|
||||
|
||||
**unif_runner.dig Validation:**
|
||||
- ✅ Uses `require>` operator (NOT `call>`)
|
||||
- ✅ No `echo>` operators with subtasks
|
||||
- ✅ Matches exact template from `/plugins/cdp-unification/prompt.md` lines 186-217
|
||||
- ✅ Has `_error:` section with email_alert
|
||||
- ✅ Includes both `config/environment.yml` and `config/src_prep_params.yml`
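As a rough sketch of the operator checks above, a plain text scan is often enough to catch `call>` where `require>` is expected. The function name `check_runner_operators` is hypothetical, and the scan only looks at operator keys line by line; it does not compare the file against the full template.

```python
import re


def check_runner_operators(dig_text):
    """Return a list of problems found in the text of unif_runner.dig."""
    problems = []
    # Skip comment lines so a commented-out operator is not counted.
    lines = [ln for ln in dig_text.splitlines() if not ln.lstrip().startswith("#")]
    if any(re.match(r"\s*call>\s*:", ln) for ln in lines):
        problems.append("uses call> (expected require>)")
    if not any(re.match(r"\s*require>\s*:", ln) for ln in lines):
        problems.append("no require> operator found")
    if not any(re.match(r"\s*_error\s*:", ln) for ln in lines):
        problems.append("missing _error: section")
    return problems
```

An empty return value means these three checks passed; the remaining template checks would still need a structural comparison.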
|
||||
|
||||
**stage_enrich.yml Validation:**
|
||||
- ✅ RULE 1: `unif_input` table has `column` and `key` both using `alias_as`
|
||||
- ✅ RULE 2: Staging tables have `column` using `col.name` and `key` using `alias_as`
|
||||
- ✅ All key_columns match actual columns from `src_prep_params.yml`
|
||||
- ✅ No template columns (like adobe_clickstream, loyalty_id_std)
|
||||
- ✅ Table names match `src_tbl` (NO _prep suffix)
|
||||
|
||||
**enrich_runner.dig Validation:**
|
||||
- ✅ Matches exact template from `unification-staging-enricher.md` lines 261-299
|
||||
- ✅ Includes all 3 config files in `_export`
|
||||
- ✅ Uses `td_for_each>` for dynamic execution
|
||||
- ✅ Has Presto and Hive conditional execution
|
||||
|
||||
### 3. Database & Table Existence Check
|
||||
- ✅ `${client_short_name}_${src}` database exists
|
||||
- ✅ `${client_short_name}_${stg}` database exists
|
||||
- ✅ `${client_short_name}_${gld}` database exists (if used)
|
||||
- ✅ `${client_short_name}_${lkup}` database exists
|
||||
- ✅ `cdp_unification_${unif_name}` database exists
|
||||
- ✅ `${client_short_name}_${lkup}.exclusion_list` table exists
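The database names above are built from values in `environment.yml`. The sketch below shows how the expected names might be derived and compared against a set of existing databases. It assumes the environment values have already been loaded into a plain dict and that the list of existing databases was fetched elsewhere; both function names are hypothetical.

```python
def expected_databases(env, unif_name):
    """Build the database names the checks above expect.

    env is assumed to hold client_short_name plus the src/stg/gld/lkup
    suffix values; gld is optional, matching the "(if used)" note above.
    """
    client = env["client_short_name"]
    names = [
        f"{client}_{env[layer]}"
        for layer in ("src", "stg", "gld", "lkup")
        if layer in env
    ]
    names.append(f"cdp_unification_{unif_name}")
    return names


def missing_databases(env, unif_name, existing):
    """Return expected database names that are absent from existing."""
    return [db for db in expected_databases(env, unif_name) if db not in existing]
```

With `client_short_name: client` and `lkup: config`, the lookup database resolves to `client_config`, which is the name used in the sample report.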
|
||||
|
||||
### 4. Configuration Validation
|
||||
- ✅ All variables in `environment.yml` are defined
|
||||
- ✅ All tables in `src_prep_params.yml` exist in source database
|
||||
- ✅ All columns in `src_prep_params.yml` exist in source tables
|
||||
- ✅ `unify.yml` merge_by_keys match `src_prep_params.yml` alias_as columns
|
||||
- ✅ No undefined variables (${...})
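One way to approximate the undefined-variable check is to collect every `${name}` reference in a file and compare it with the set of defined names. This sketch is an assumption about how such a check could work: the function name is hypothetical, and it only recognises simple identifiers, so dotted expressions such as `${td.each.tbl}` are deliberately ignored.

```python
import re

# Matches ${name} where name is a simple identifier (no dots or expressions).
VAR_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def undefined_variables(text, defined):
    """Return referenced variable names not in defined, in first-seen order."""
    missing = []
    for name in VAR_PATTERN.findall(text):
        if name not in defined and name not in missing:
            missing.append(name)
    return missing
```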
|
||||
|
||||
### 5. YAML Syntax Check
|
||||
- ✅ All YAML files have valid syntax
|
||||
- ✅ Proper indentation (2 spaces)
|
||||
- ✅ No tabs in YAML files
|
||||
- ✅ All strings properly quoted where needed
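The tab and indentation rules above can be approximated with a line scan like the one below. It is a heuristic sketch only: it does not parse YAML, so syntax validity and quoting would still need a real YAML parser, and block scalars with unusual indentation could be flagged incorrectly. The function name is hypothetical.

```python
def yaml_whitespace_problems(text):
    """Return (line_number, message) pairs for tab and indentation issues."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines carry no indentation information
        if "\t" in line:
            problems.append((number, "tab character"))
            continue
        indent = len(line) - len(line.lstrip(" "))
        if indent % 2 != 0:
            problems.append((number, "indentation is not a multiple of 2 spaces"))
    return problems
```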
|
||||
|
||||
---
|
||||
|
||||
## Validation Report Format
|
||||
|
||||
```
|
||||
╔══════════════════════════════════════════════════════════════╗
|
||||
║ ID UNIFICATION VALIDATION REPORT ║
|
||||
╚══════════════════════════════════════════════════════════════╝
|
||||
|
||||
[1/5] File Existence Check
|
||||
✅ unification/unif_runner.dig
|
||||
✅ unification/dynmic_prep_creation.dig
|
||||
✅ unification/id_unification.dig
|
||||
✅ unification/enrich_runner.dig
|
||||
✅ unification/config/environment.yml
|
||||
✅ unification/config/src_prep_params.yml
|
||||
✅ unification/config/unify.yml
|
||||
✅ unification/config/stage_enrich.yml
|
||||
✅ 3/3 SQL files in queries/
|
||||
✅ 4/4 SQL files in enrich/queries/
|
||||
|
||||
[2/5] Template Compliance Check
|
||||
✅ unif_runner.dig uses require> operator
|
||||
✅ unif_runner.dig has no echo> conflicts
|
||||
✅ stage_enrich.yml RULE 1 compliant (unif_input table)
|
||||
✅ stage_enrich.yml RULE 2 compliant (staging tables)
|
||||
❌ stage_enrich.yml has incorrect mapping on line 23
|
||||
Expected: column: email_address_std
|
||||
Found: column: email
|
||||
FIX: Update line 23 to use col.name from src_prep_params.yml
|
||||
|
||||
[3/5] Database & Table Existence
|
||||
✅ client_src exists
|
||||
✅ client_stg exists
|
||||
✅ client_gld exists
|
||||
✅ client_config exists
|
||||
❌ client_config.exclusion_list does NOT exist
|
||||
FIX: Run: td query -d client_config -T presto -w "CREATE TABLE IF NOT EXISTS exclusion_list (key_value VARCHAR, key_name VARCHAR, tbls ARRAY(VARCHAR), note VARCHAR)"
|
||||
|
||||
[4/5] Configuration Validation
|
||||
✅ All variables defined in environment.yml
|
||||
✅ Source table client_stg.snowflake_orders exists
|
||||
✅ All columns exist in source table
|
||||
✅ unify.yml keys match src_prep_params.yml
|
||||
|
||||
[5/5] YAML Syntax Check
|
||||
✅ All YAML files have valid syntax
|
||||
✅ Proper indentation
|
||||
✅ No tabs found
|
||||
|
||||
╔══════════════════════════════════════════════════════════════╗
|
||||
║ VALIDATION SUMMARY ║
|
||||
╚══════════════════════════════════════════════════════════════╝
|
||||
|
||||
Total Checks: 45
|
||||
Passed: 43 ✅
|
||||
Failed: 2 ❌
|
||||
|
||||
❌ VALIDATION FAILED - DO NOT DEPLOY
|
||||
|
||||
Required Actions:
|
||||
1. Fix stage_enrich.yml line 23 mapping
|
||||
2. Create client_config.exclusion_list table
|
||||
|
||||
Re-run validation after fixes: /cdp-unification:unify-validate
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Error Codes
|
||||
|
||||
- **EXIT 0**: All validations passed ✅
|
||||
- **EXIT 1**: File existence failures
|
||||
- **EXIT 2**: Template compliance failures
|
||||
- **EXIT 3**: Database/table missing
|
||||
- **EXIT 4**: Configuration errors
|
||||
- **EXIT 5**: YAML syntax errors
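The list above does not say which code is returned when several categories fail at once. The sketch below assumes the lowest-numbered failing category wins; that rule, the category keys, and the function name are all illustrative assumptions rather than documented behaviour.

```python
# Category keys are hypothetical labels for the five check groups above.
EXIT_CODES = {
    "file_existence": 1,
    "template_compliance": 2,
    "database_table": 3,
    "configuration": 4,
    "yaml_syntax": 5,
}


def exit_code(failed_categories):
    """Return 0 when nothing failed, else the lowest code among the failures.

    Picking the lowest code is an assumption, not documented behaviour.
    """
    codes = [EXIT_CODES[category] for category in failed_categories]
    return min(codes) if codes else 0
```

Under this assumption, the sample report (one template compliance failure and one missing table) would exit with code 2.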
|
||||
|
||||
---
|
||||
|
||||
## Usage
|
||||
|
||||
**Standalone:**
|
||||
```
|
||||
/cdp-unification:unify-validate
|
||||
```
|
||||
|
||||
**Auto-triggered in unify-setup** (MANDATORY step before deployment)
|
||||
|
||||
**Manual validation before deployment:**
|
||||
```
|
||||
cd unification
|
||||
/cdp-unification:unify-validate
|
||||
```
|
||||
|
||||
If validation PASSES → Proceed with `td wf push unification`
|
||||
If validation FAILS → Fix errors and re-validate
|
||||
|
||||
---
|
||||
|
||||
## Integration with unify-setup
|
||||
|
||||
The `/cdp-unification:unify-setup` command will automatically:
|
||||
1. Generate all unification files
|
||||
2. **RUN VALIDATION** (this command)
|
||||
3. **BLOCK deployment** if validation fails
|
||||
4. **Show fix instructions** for each error
|
||||
5. **Auto-retry validation** after fixes
|
||||
6. Only proceed to deployment after 100% validation success
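The gate described in the steps above can be sketched as a validate, fix, re-validate loop. This is a simplified model under stated assumptions: `validate` and `fix` are hypothetical callables supplied by the caller, and the retry limit is an assumption, since the source does not specify one.

```python
def validation_gate(validate, fix, max_retries=3):
    """Return True when deployment may proceed, False when it stays blocked.

    validate() returns a list of error messages (empty means all checks pass).
    fix(errors) attempts to resolve them. max_retries is an assumed limit.
    """
    errors = validate()
    attempts = 0
    while errors and attempts < max_retries:
        fix(errors)            # show or apply fix instructions
        errors = validate()    # re-validate after the fixes
        attempts += 1
    return not errors
```

Deployment would only be offered when the function returns `True`, which matches the rule that a single remaining failure blocks it.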
|
||||
|
||||
---
|
||||
|
||||
## Success Criteria
|
||||
|
||||
✅ **ALL checks must pass** before deployment is allowed
|
||||
✅ **No exceptions** - even 1 failure blocks deployment
|
||||
✅ **Detailed error messages** with exact fix instructions
|
||||
✅ **Auto-remediation suggestions** where possible
|
||||
|
||||
---
|
||||
|
||||
**Let's validate your unification files!**
|
||||