Files
gh-treasure-data-aps-claude…/commands/unify-extract-keys.md
2025-11-30 09:02:49 +08:00

6.6 KiB

name, description
name description
unify-extract-keys Extract and validate user identifier columns from tables using live Treasure Data analysis

Extract and Validate User Identifiers

Overview

I'll analyze your Treasure Data tables to extract and validate user identifier columns using the unif-keys-extractor specialized agent.

This command performs ZERO-TOLERANCE identifier extraction:

  • NO GUESSING - Only uses real Treasure Data MCP tools
  • NO ASSUMPTIONS - Every table is analyzed with live data
  • STRICT VALIDATION - Only includes tables with actual user identifiers
  • COMPREHENSIVE ANALYSIS - 3 SQL experts review and priority recommendations

What You Need to Provide

Table List

Provide the tables you want to analyze for ID unification:

  • Format: database.table_name
  • Example: analytics.user_events, crm.customers, web.pageviews

What I'll Do

Step 1: Schema Extraction (MANDATORY)

For each table, I'll:

  • Call mcp__mcc_treasuredata__describe_table(table, database)
  • Extract EXACT column names and data types
  • Identify tables that are inaccessible

Step 2: User Identifier Detection (STRICT MATCHING)

I'll scan for valid user identifier columns:

VALID USER IDENTIFIERS:

  • Email columns: email, email_std, email_address, user_email, customer_email
  • Phone columns: phone, phone_std, phone_number, mobile_phone, customer_phone
  • User ID columns: user_id, customer_id, account_id, member_id, uid, user_uuid
  • Identity columns: profile_id, identity_id, cognito_identity_userid
  • Cookie/Device IDs: td_client_id, td_global_id, td_ssc_id, cookie_id, device_id

NOT USER IDENTIFIERS (EXCLUDED):

  • System columns: id, created_at, updated_at, load_timestamp
  • Campaign columns: campaign_id, message_id
  • Product columns: product_id, sku, variant_id
  • Complex types: array, map, json columns

Step 3: Exclusion Validation (CRITICAL)

For tables WITHOUT user identifiers, I'll:

  • Document the exclusion reason
  • List available columns for transparency
  • Explain why the table doesn't qualify

Step 4: Min/Max Data Analysis (INCLUDED TABLES ONLY)

For tables WITH user identifiers, I'll:

  • Query actual data: SELECT MIN(column), MAX(column) FROM table
  • Validate data patterns and formats
  • Assess data quality

Step 5: 3 SQL Experts Analysis

I'll provide structured analysis from three perspectives:

  1. Data Pattern Analyst: Reviews actual min/max values and data quality
  2. Cross-Table Relationship Analyst: Maps identifier relationships across tables
  3. Priority Assessment Specialist: Ranks identifiers by stability and coverage

Step 6: Priority Recommendations

I'll provide:

  • Recommended priority ordering (TD standard)
  • Reasoning for each recommendation
  • Compatibility assessment across tables

Expected Output

Key Extraction Results Table

| database_name | table_name | column_name | data_type | identifier_type | min_value | max_value |
|---------------|------------|-------------|-----------|-----------------|-----------|-----------|
| analytics     | user_events| user_email  | varchar   | email           | a@test.com| z@test.com|
| analytics     | user_events| td_client_id| varchar   | cookie_id       | 00000000-.| ffffffff-.|
| crm           | customers  | email       | varchar   | email           | admin@... | user@...  |

Exclusion Documentation

## Tables EXCLUDED from ID Unification:

- **analytics.product_catalog**: No user identifier columns found
  - Available columns: [product_id, sku, product_name, category, price]
  - Exclusion reason: Contains only product metadata - no PII
  - Classification: Non-PII table

Validation Summary

## Analysis Summary:
- **Tables Analyzed**: 5
- **Tables INCLUDED**: 3 (contain user identifiers)
- **Tables EXCLUDED**: 2 (no user identifiers)
- **User Identifier Columns Found**: 8

3 SQL Experts Analysis

**Expert 1 - Data Pattern Analyst:**
- Email columns show valid format patterns across 2 tables
- td_client_id shows UUID format with good coverage
- Data quality: High (95%+ non-null for email)

**Expert 2 - Cross-Table Relationship Analyst:**
- Email appears in analytics.user_events and crm.customers (primary link)
- td_client_id unique to analytics.user_events (secondary ID)
- Recommendation: Email as primary key for unification

**Expert 3 - Priority Assessment Specialist:**
- Priority 1: email (stable, cross-table presence, good coverage)
- Priority 2: td_client_id (system-generated, analytics-specific)
- Recommended merge_by_keys: [email, td_client_id]

Priority Recommendations (TD Standard)

Recommended Priority Order (TD Standard):
1. email - Stable identifier across multiple tables with high coverage
2. td_client_id - System-generated ID for analytics tracking
3. phone - Secondary contact identifier (if available)

EXCLUDED Identifiers (Not User-Related):
- product_id - Product reference, not user identifier
- campaign_id - Campaign metadata, not user-specific

Validation Gates

I'll pass through these mandatory validation gates:

  • GATE 1: Schema extracted for all accessible tables
  • GATE 2: Tables classified into INCLUSION/EXCLUSION lists
  • GATE 3: All exclusions justified and documented
  • GATE 4: Real data analysis completed for included columns
  • GATE 5: 3 SQL experts analysis completed
  • GATE 6: Priority recommendations provided

Next Steps

After key extraction, you can:

  1. Proceed with full setup: Use /cdp-unification:unify-setup to continue with complete configuration
  2. Create prep tables: Use /cdp-unification:unify-create-prep with the extracted keys
  3. Review and adjust: Discuss the results and make adjustments to table selection

Communication Pattern

I'll use TD Copilot standard format:

Question: Are these extracted user identifiers sufficient for your ID unification requirements?

Suggestion: I recommend using email as your primary unification key since it appears across multiple tables with good data quality.

Check Point: The analysis shows X tables with user identifiers and Y tables excluded. This provides comprehensive coverage for customer identity resolution.


Getting Started

Ready to extract user identifiers? Please provide your table list:

Example:

Please analyze these tables for ID unification:
- analytics.user_events
- crm.customers
- web.pageviews
- marketing.campaigns

I'll call the unif-keys-extractor agent to perform comprehensive analysis with ZERO-TOLERANCE validation.


Let's begin the analysis!