Initial commit
This commit is contained in:
191
commands/unify-extract-keys.md
Normal file
191
commands/unify-extract-keys.md
Normal file
@@ -0,0 +1,191 @@
|
||||
---
|
||||
name: unify-extract-keys
|
||||
description: Extract and validate user identifier columns from tables using live Treasure Data analysis
|
||||
---
|
||||
|
||||
# Extract and Validate User Identifiers
|
||||
|
||||
## Overview
|
||||
|
||||
I'll analyze your Treasure Data tables to extract and validate user identifier columns using the **unif-keys-extractor** specialized agent.
|
||||
|
||||
This command performs **ZERO-TOLERANCE** identifier extraction:
|
||||
- ❌ **NO GUESSING** - Only uses real Treasure Data MCP tools
|
||||
- ❌ **NO ASSUMPTIONS** - Every table is analyzed with live data
|
||||
- ✅ **STRICT VALIDATION** - Only includes tables with actual user identifiers
|
||||
- ✅ **COMPREHENSIVE ANALYSIS** - 3 SQL experts review and priority recommendations
|
||||
|
||||
---
|
||||
|
||||
## What You Need to Provide
|
||||
|
||||
### Table List
|
||||
Provide the tables you want to analyze for ID unification:
|
||||
- **Format**: `database.table_name`
|
||||
- **Example**: `analytics.user_events`, `crm.customers`, `web.pageviews`
|
||||
|
||||
---
|
||||
|
||||
## What I'll Do
|
||||
|
||||
### Step 1: Schema Extraction (MANDATORY)
|
||||
For each table, I'll:
|
||||
- Call `mcp__mcc_treasuredata__describe_table(table, database)`
|
||||
- Extract EXACT column names and data types
|
||||
- Identify tables that are inaccessible
|
||||
|
||||
### Step 2: User Identifier Detection (STRICT MATCHING)
|
||||
I'll scan for valid user identifier columns:
|
||||
|
||||
**✅ VALID USER IDENTIFIERS:**
|
||||
- **Email columns**: email, email_std, email_address, user_email, customer_email
|
||||
- **Phone columns**: phone, phone_std, phone_number, mobile_phone, customer_phone
|
||||
- **User ID columns**: user_id, customer_id, account_id, member_id, uid, user_uuid
|
||||
- **Identity columns**: profile_id, identity_id, cognito_identity_userid
|
||||
- **Cookie/Device IDs**: td_client_id, td_global_id, td_ssc_id, cookie_id, device_id
|
||||
|
||||
**❌ NOT USER IDENTIFIERS (EXCLUDED):**
|
||||
- System columns: id, created_at, updated_at, load_timestamp
|
||||
- Campaign columns: campaign_id, message_id
|
||||
- Product columns: product_id, sku, variant_id
|
||||
- Complex types: array, map, json columns
|
||||
|
||||
### Step 3: Exclusion Validation (CRITICAL)
|
||||
For tables WITHOUT user identifiers, I'll:
|
||||
- Document the exclusion reason
|
||||
- List available columns for transparency
|
||||
- Explain why the table doesn't qualify
|
||||
|
||||
### Step 4: Min/Max Data Analysis (INCLUDED TABLES ONLY)
|
||||
For tables WITH user identifiers, I'll:
|
||||
- Query actual data: `SELECT MIN(column), MAX(column) FROM table`
|
||||
- Validate data patterns and formats
|
||||
- Assess data quality
|
||||
|
||||
### Step 5: 3 SQL Experts Analysis
|
||||
I'll provide structured analysis from three perspectives:
|
||||
1. **Data Pattern Analyst**: Reviews actual min/max values and data quality
|
||||
2. **Cross-Table Relationship Analyst**: Maps identifier relationships across tables
|
||||
3. **Priority Assessment Specialist**: Ranks identifiers by stability and coverage
|
||||
|
||||
### Step 6: Priority Recommendations
|
||||
I'll provide:
|
||||
- Recommended priority ordering (TD standard)
|
||||
- Reasoning for each recommendation
|
||||
- Compatibility assessment across tables
|
||||
|
||||
---
|
||||
|
||||
## Expected Output
|
||||
|
||||
### Key Extraction Results Table
|
||||
```
|
||||
| database_name | table_name | column_name | data_type | identifier_type | min_value | max_value |
|
||||
|---------------|------------|-------------|-----------|-----------------|-----------|-----------|
|
||||
| analytics | user_events| user_email | varchar | email | a@test.com| z@test.com|
|
||||
| analytics | user_events| td_client_id| varchar | cookie_id | 00000000-.| ffffffff-.|
|
||||
| crm | customers | email | varchar | email | admin@... | user@... |
|
||||
```
|
||||
|
||||
### Exclusion Documentation
|
||||
```
|
||||
## Tables EXCLUDED from ID Unification:
|
||||
|
||||
- **analytics.product_catalog**: No user identifier columns found
|
||||
- Available columns: [product_id, sku, product_name, category, price]
|
||||
- Exclusion reason: Contains only product metadata - no PII
|
||||
- Classification: Non-PII table
|
||||
```
|
||||
|
||||
### Validation Summary
|
||||
```
|
||||
## Analysis Summary:
|
||||
- **Tables Analyzed**: 5
|
||||
- **Tables INCLUDED**: 3 (contain user identifiers)
|
||||
- **Tables EXCLUDED**: 2 (no user identifiers)
|
||||
- **User Identifier Columns Found**: 8
|
||||
```
|
||||
|
||||
### 3 SQL Experts Analysis
|
||||
```
|
||||
**Expert 1 - Data Pattern Analyst:**
|
||||
- Email columns show valid format patterns across 2 tables
|
||||
- td_client_id shows UUID format with good coverage
|
||||
- Data quality: High (95%+ non-null for email)
|
||||
|
||||
**Expert 2 - Cross-Table Relationship Analyst:**
|
||||
- Email appears in analytics.user_events and crm.customers (primary link)
|
||||
- td_client_id unique to analytics.user_events (secondary ID)
|
||||
- Recommendation: Email as primary key for unification
|
||||
|
||||
**Expert 3 - Priority Assessment Specialist:**
|
||||
- Priority 1: email (stable, cross-table presence, good coverage)
|
||||
- Priority 2: td_client_id (system-generated, analytics-specific)
|
||||
- Recommended merge_by_keys: [email, td_client_id]
|
||||
```
|
||||
|
||||
### Priority Recommendations (TD Standard)
|
||||
```
|
||||
Recommended Priority Order (TD Standard):
|
||||
1. email - Stable identifier across multiple tables with high coverage
|
||||
2. td_client_id - System-generated ID for analytics tracking
|
||||
3. phone - Secondary contact identifier (if available)
|
||||
|
||||
EXCLUDED Identifiers (Not User-Related):
|
||||
- product_id - Product reference, not user identifier
|
||||
- campaign_id - Campaign metadata, not user-specific
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Validation Gates
|
||||
|
||||
I'll pass through these mandatory validation gates:
|
||||
- ✅ **GATE 1**: Schema extracted for all accessible tables
|
||||
- ✅ **GATE 2**: Tables classified into INCLUSION/EXCLUSION lists
|
||||
- ✅ **GATE 3**: All exclusions justified and documented
|
||||
- ✅ **GATE 4**: Real data analysis completed for included columns
|
||||
- ✅ **GATE 5**: 3 SQL experts analysis completed
|
||||
- ✅ **GATE 6**: Priority recommendations provided
|
||||
|
||||
---
|
||||
|
||||
## Next Steps
|
||||
|
||||
After key extraction, you can:
|
||||
1. **Proceed with full setup**: Use `/cdp-unification:unify-setup` to continue with complete configuration
|
||||
2. **Create prep tables**: Use `/cdp-unification:unify-create-prep` with the extracted keys
|
||||
3. **Review and adjust**: Discuss the results and make adjustments to table selection
|
||||
|
||||
---
|
||||
|
||||
## Communication Pattern
|
||||
|
||||
I'll use **TD Copilot standard format**:
|
||||
|
||||
**Question**: Are these extracted user identifiers sufficient for your ID unification requirements?
|
||||
|
||||
**Suggestion**: I recommend using **email** as your primary unification key since it appears across multiple tables with good data quality.
|
||||
|
||||
**Check Point**: The analysis shows X tables with user identifiers and Y tables excluded. This provides comprehensive coverage for customer identity resolution.
|
||||
|
||||
---
|
||||
|
||||
## Getting Started
|
||||
|
||||
**Ready to extract user identifiers?** Please provide your table list:
|
||||
|
||||
**Example:**
|
||||
```
|
||||
Please analyze these tables for ID unification:
|
||||
- analytics.user_events
|
||||
- crm.customers
|
||||
- web.pageviews
|
||||
- marketing.campaigns
|
||||
```
|
||||
|
||||
I'll call the **unif-keys-extractor** agent to perform comprehensive analysis with ZERO-TOLERANCE validation.
|
||||
|
||||
---
|
||||
|
||||
**Let's begin the analysis!**
|
||||
Reference in New Issue
Block a user