---
name: unify-setup
description: Complete end-to-end ID unification setup from table analysis to deployment
---

Complete ID Unification Setup

Overview

I'll guide you through the complete ID unification setup process for Treasure Data CDP. This is an interactive, end-to-end workflow that will:

  1. Extract and validate user identifiers from your tables
  2. Help you choose the right ID method (canonical_id vs persistent_id)
  3. Generate prep table configurations for data standardization
  4. Create core unification files (unify.yml and id_unification.dig)
  5. Set up staging enrichment for post-unification processing
  6. Create orchestration workflow (unif_runner.dig) to run everything in sequence

What You Need to Provide

1. Table List

Please provide the list of tables you want to include in ID unification:

  • Format: database.table_name (e.g., analytics.user_events, crm.customers)
  • I'll analyze each table using Treasure Data MCP tools to extract user identifiers

2. Client Configuration

  • Client short name: Your client identifier (e.g., mck, client)
  • Unification name: Name for this unification project (e.g., claude, customer_360)
  • Lookup/Config database suffix: (default: config)
    • Creates database: ${client_short_name}_${lookup_suffix} (e.g., client_config)
    • ⚠️ I WILL CREATE THIS DATABASE if it doesn't exist
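
For orientation, these settings typically end up as plain parameters in unification/config/environment.yml. A minimal sketch, assuming hypothetical key names (the generated file may use different ones):

```yaml
# Hypothetical sketch of unification/config/environment.yml
# Key names are illustrative assumptions, not the exact keys the command generates.
client_short_name: client
unification_name: customer_360
lookup_suffix: config

# Derived lookup/config database: ${client_short_name}_${lookup_suffix}
lookup_db: client_config
```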

3. ID Method Selection

I'll explain the options and help you choose:

  • persistent_id: Stable IDs that persist across updates (recommended for most cases)
  • canonical_id: Traditional approach with merge capabilities

4. Update Strategy

  • Incremental: Process only new/updated records
  • Full Refresh: Reprocess all data each time
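
A minimal sketch of how this choice is often carried as a single workflow parameter (the flag name is an assumption; the generated files may express it differently):

```yaml
# Hypothetical fragment of a .dig workflow: export the strategy to all tasks
_export:
  incremental: true   # set to false to force a full refresh on every run
```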

5. Regional Endpoint

  • Your Treasure Data region (e.g., US, EU, Tokyo), which determines the API endpoint the unification workflow calls (see the sketch below)

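A minimal sketch of how the endpoint might be parameterized (the key name is an assumption; confirm the correct endpoint for your account's region):

```yaml
# Hypothetical fragment of unification/config/environment.yml
# td_api_endpoint is an assumed parameter name; the actual key may differ.
td_api_endpoint: api.treasuredata.com          # US region
# td_api_endpoint: api.eu01.treasuredata.com   # EU region
# td_api_endpoint: api.treasuredata.co.jp      # Tokyo region
```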
What I'll Do

Step 1: Extract and Validate Keys (via unif-keys-extractor agent)

I'll:

  • Use Treasure Data MCP tools to analyze table schemas
  • Extract user identifier columns (email, phone, td_client_id, etc.)
  • Query sample data to validate identifier patterns
  • Provide key-relationship analysis from three SQL-expert perspectives
  • Recommend priority ordering for unification keys
  • Exclude tables without user identifiers
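
To set expectations, the agent's output boils down to a priority-ordered key list plus exclusions. An illustrative sketch only, not the agent's exact report format:

```yaml
# Hypothetical summary of extracted keys, ordered by recommended priority
keys:
  - name: email
    found_in: [analytics.user_events, crm.customers]
    notes: validated against sample rows; standardized in prep
  - name: td_client_id
    found_in: [analytics.user_events, web.pageviews]
    notes: validated against sample rows
excluded_tables:
  - some_db.table_without_user_identifiers   # placeholder example
```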

Step 2: Configuration Guidance

I'll:

  • Explain canonical_id vs persistent_id concepts
  • Recommend best approach for your use case
  • Discuss incremental vs full refresh strategies
  • Help you understand regional endpoint requirements

Step 3: Generate Prep Tables (via dynamic-prep-creation agent)

I'll create:

  • unification/dynmic_prep_creation.dig - Prep workflow
  • unification/queries/create_schema.sql - Schema creation
  • unification/queries/loop_on_tables.sql - Dynamic loop logic
  • unification/queries/unif_input_tbl.sql - DSAR processing and data cleaning
  • unification/config/environment.yml - Client configuration
  • unification/config/src_prep_params.yml - Dynamic table mappings
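
As a rough illustration of what the dynamic table mappings look like (key names and structure are assumptions; the agent's template is authoritative):

```yaml
# Hypothetical sketch of unification/config/src_prep_params.yml
src_tables:
  - database: analytics
    table: user_events
    key_columns: [email, td_client_id]
  - database: crm
    table: customers
    key_columns: [email, phone]
```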

Step 4: Generate Core Unification (via id-unification-creator agent)

I'll create:

  • unification/config/unify.yml - Unification configuration with keys and tables
  • unification/id_unification.dig - Core unification workflow with HTTP API call
  • Updated unification/queries/create_schema.sql - Schema with all required columns
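
For context, a heavily abridged sketch of unify.yml in the canonical_id style; the persistent_id variant and the exact field names come from the generated file, not from this sketch:

```yaml
# Hypothetical, abridged sketch of unification/config/unify.yml
name: customer_360
keys:
  - name: email
    invalid_texts: ['', 'null']
  - name: td_client_id
tables:
  - database: client_config
    table: prep_user_events        # illustrative prep table name
    key_columns:
      - {column: email, key: email}
      - {column: td_client_id, key: td_client_id}
canonical_ids:
  - name: unified_customer_id
    merge_by_keys: [email, td_client_id]
```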

Step 5: Generate Staging Enrichment (via unification-staging-enricher agent)

I'll create:

  • unification/config/stage_enrich.yml - Enrichment configuration
  • unification/enrich/queries/generate_join_query.sql - Join query generation
  • unification/enrich/queries/execute_join_presto.sql - Presto execution
  • unification/enrich/queries/execute_join_hive.sql - Hive execution
  • unification/enrich/queries/enrich_tbl_creation.sql - Table creation
  • unification/enrich_runner.dig - Enrichment workflow
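
Purely for orientation, an assumed shape for the enrichment configuration (the real schema comes from the agent's template):

```yaml
# Purely illustrative sketch of unification/config/stage_enrich.yml
unified_id_column: unified_customer_id
target_tables:
  - database: analytics
    table: user_events
    join_key: td_client_id
  - database: crm
    table: customers
    join_key: email
```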

Step 6: Create Main Orchestration

I'll create:

  • unification/unif_runner.dig - Main workflow that calls:
    • prep_creation → id_unification → enrichment (in sequence)
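
A minimal sketch of the orchestration, assuming Digdag's call> operator and the file names listed above (task names are illustrative; top-level tasks run in sequence):

```yaml
# Hypothetical sketch of unification/unif_runner.dig
timezone: UTC

+prep_creation:
  call>: dynmic_prep_creation.dig

+id_unification:
  call>: id_unification.dig

+enrichment:
  call>: enrich_runner.dig
```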

Step 7: ⚠️ MANDATORY VALIDATION (NEW!)

CRITICAL: Before deployment, I MUST run comprehensive validation:

  • /cdp-unification:unify-validate command
  • Validates ALL files against exact templates
  • Checks database and table existence
  • Verifies configuration consistency
  • BLOCKS deployment if ANY validation fails

If validation FAILS:

  • I will show exact fix commands
  • You must fix all errors
  • Re-run validation until 100% pass
  • Only then proceed to deployment

If validation PASSES:

  • Proceed to deployment with confidence
  • All files are production-ready

Step 8: Deployment Guidance

I'll provide:

  • Configuration summary
  • Deployment instructions
  • Operating guidelines
  • Monitoring recommendations

Interactive Workflow

I'll use the TD Copilot communication pattern throughout:

  • Question: When I need your input or choice
  • Suggestion: When I recommend a specific approach
  • Check Point: When you should verify understanding

Expected Output

Files Created (All under unification/ directory):

Workflows:

  • unif_runner.dig - Main orchestration workflow
  • dynmic_prep_creation.dig - Prep table creation
  • id_unification.dig - Core unification
  • enrich_runner.dig - Staging enrichment

Configuration:

  • config/environment.yml - Client settings
  • config/src_prep_params.yml - Prep table mappings
  • config/unify.yml - Unification configuration
  • config/stage_enrich.yml - Enrichment configuration

SQL Templates:

  • queries/create_schema.sql - Schema creation
  • queries/loop_on_tables.sql - Dynamic loop logic
  • queries/unif_input_tbl.sql - DSAR and data cleaning
  • enrich/queries/generate_join_query.sql - Join generation
  • enrich/queries/execute_join_presto.sql - Presto execution
  • enrich/queries/execute_join_hive.sql - Hive execution
  • enrich/queries/enrich_tbl_creation.sql - Table creation

Success Criteria

All generated files will:

  • Be TD-compliant and deployment-ready
  • Use exact templates from documentation
  • Include comprehensive error handling
  • Follow TD Copilot standards
  • Work without modification in Treasure Data
  • Support incremental processing
  • Include DSAR processing
  • Generate proper master tables

Getting Started

Ready to begin? Please provide:

  1. Your table list (database.table_name format)
  2. Client short name
  3. Unification name

I'll start by analyzing your tables with the unif-keys-extractor agent to extract and validate user identifiers.

Example:

I want to set up ID unification for:
- analytics.user_events
- crm.customers
- web.pageviews

Client: mck
Unification name: customer_360

Let's get started!