6.6 KiB
name, description
| name | description |
|---|---|
| unify-extract-keys | Extract and validate user identifier columns from tables using live Treasure Data analysis |
Extract and Validate User Identifiers
Overview
I'll analyze your Treasure Data tables to extract and validate user identifier columns using the unif-keys-extractor specialized agent.
This command performs ZERO-TOLERANCE identifier extraction:
- ❌ NO GUESSING - Only uses real Treasure Data MCP tools
- ❌ NO ASSUMPTIONS - Every table is analyzed with live data
- ✅ STRICT VALIDATION - Only includes tables with actual user identifiers
- ✅ COMPREHENSIVE ANALYSIS - 3 SQL experts review and priority recommendations
What You Need to Provide
Table List
Provide the tables you want to analyze for ID unification:
- Format:
database.table_name - Example:
analytics.user_events,crm.customers,web.pageviews
What I'll Do
Step 1: Schema Extraction (MANDATORY)
For each table, I'll:
- Call
mcp__mcc_treasuredata__describe_table(table, database) - Extract EXACT column names and data types
- Identify tables that are inaccessible
Step 2: User Identifier Detection (STRICT MATCHING)
I'll scan for valid user identifier columns:
✅ VALID USER IDENTIFIERS:
- Email columns: email, email_std, email_address, user_email, customer_email
- Phone columns: phone, phone_std, phone_number, mobile_phone, customer_phone
- User ID columns: user_id, customer_id, account_id, member_id, uid, user_uuid
- Identity columns: profile_id, identity_id, cognito_identity_userid
- Cookie/Device IDs: td_client_id, td_global_id, td_ssc_id, cookie_id, device_id
❌ NOT USER IDENTIFIERS (EXCLUDED):
- System columns: id, created_at, updated_at, load_timestamp
- Campaign columns: campaign_id, message_id
- Product columns: product_id, sku, variant_id
- Complex types: array, map, json columns
Step 3: Exclusion Validation (CRITICAL)
For tables WITHOUT user identifiers, I'll:
- Document the exclusion reason
- List available columns for transparency
- Explain why the table doesn't qualify
Step 4: Min/Max Data Analysis (INCLUDED TABLES ONLY)
For tables WITH user identifiers, I'll:
- Query actual data:
SELECT MIN(column), MAX(column) FROM table - Validate data patterns and formats
- Assess data quality
Step 5: 3 SQL Experts Analysis
I'll provide structured analysis from three perspectives:
- Data Pattern Analyst: Reviews actual min/max values and data quality
- Cross-Table Relationship Analyst: Maps identifier relationships across tables
- Priority Assessment Specialist: Ranks identifiers by stability and coverage
Step 6: Priority Recommendations
I'll provide:
- Recommended priority ordering (TD standard)
- Reasoning for each recommendation
- Compatibility assessment across tables
Expected Output
Key Extraction Results Table
| database_name | table_name | column_name | data_type | identifier_type | min_value | max_value |
|---------------|------------|-------------|-----------|-----------------|-----------|-----------|
| analytics | user_events| user_email | varchar | email | a@test.com| z@test.com|
| analytics | user_events| td_client_id| varchar | cookie_id | 00000000-.| ffffffff-.|
| crm | customers | email | varchar | email | admin@... | user@... |
Exclusion Documentation
## Tables EXCLUDED from ID Unification:
- **analytics.product_catalog**: No user identifier columns found
- Available columns: [product_id, sku, product_name, category, price]
- Exclusion reason: Contains only product metadata - no PII
- Classification: Non-PII table
Validation Summary
## Analysis Summary:
- **Tables Analyzed**: 5
- **Tables INCLUDED**: 3 (contain user identifiers)
- **Tables EXCLUDED**: 2 (no user identifiers)
- **User Identifier Columns Found**: 8
3 SQL Experts Analysis
**Expert 1 - Data Pattern Analyst:**
- Email columns show valid format patterns across 2 tables
- td_client_id shows UUID format with good coverage
- Data quality: High (95%+ non-null for email)
**Expert 2 - Cross-Table Relationship Analyst:**
- Email appears in analytics.user_events and crm.customers (primary link)
- td_client_id unique to analytics.user_events (secondary ID)
- Recommendation: Email as primary key for unification
**Expert 3 - Priority Assessment Specialist:**
- Priority 1: email (stable, cross-table presence, good coverage)
- Priority 2: td_client_id (system-generated, analytics-specific)
- Recommended merge_by_keys: [email, td_client_id]
Priority Recommendations (TD Standard)
Recommended Priority Order (TD Standard):
1. email - Stable identifier across multiple tables with high coverage
2. td_client_id - System-generated ID for analytics tracking
3. phone - Secondary contact identifier (if available)
EXCLUDED Identifiers (Not User-Related):
- product_id - Product reference, not user identifier
- campaign_id - Campaign metadata, not user-specific
Validation Gates
I'll pass through these mandatory validation gates:
- ✅ GATE 1: Schema extracted for all accessible tables
- ✅ GATE 2: Tables classified into INCLUSION/EXCLUSION lists
- ✅ GATE 3: All exclusions justified and documented
- ✅ GATE 4: Real data analysis completed for included columns
- ✅ GATE 5: 3 SQL experts analysis completed
- ✅ GATE 6: Priority recommendations provided
Next Steps
After key extraction, you can:
- Proceed with full setup: Use
/cdp-unification:unify-setupto continue with complete configuration - Create prep tables: Use
/cdp-unification:unify-create-prepwith the extracted keys - Review and adjust: Discuss the results and make adjustments to table selection
Communication Pattern
I'll use TD Copilot standard format:
Question: Are these extracted user identifiers sufficient for your ID unification requirements?
Suggestion: I recommend using email as your primary unification key since it appears across multiple tables with good data quality.
Check Point: The analysis shows X tables with user identifiers and Y tables excluded. This provides comprehensive coverage for customer identity resolution.
Getting Started
Ready to extract user identifiers? Please provide your table list:
Example:
Please analyze these tables for ID unification:
- analytics.user_events
- crm.customers
- web.pageviews
- marketing.campaigns
I'll call the unif-keys-extractor agent to perform comprehensive analysis with ZERO-TOLERANCE validation.
Let's begin the analysis!