---
name: unify-setup
description: Complete end-to-end ID unification setup from table analysis to deployment
---

Complete ID Unification Setup

Overview

I'll guide you through the complete ID unification setup process for Treasure Data CDP. This is an interactive, end-to-end workflow that will:

  1. Extract and validate user identifiers from your tables
  2. Help you choose the right ID method (canonical_id vs persistent_id)
  3. Generate prep table configurations for data standardization
  4. Create core unification files (unify.yml and id_unification.dig)
  5. Set up staging enrichment for post-unification processing
  6. Create orchestration workflow (unif_runner.dig) to run everything in sequence

What You Need to Provide

1. Table List

Please provide the list of tables you want to include in ID unification:

  • Format: database.table_name (e.g., analytics.user_events, crm.customers)
  • I'll analyze each table using Treasure Data MCP tools to extract user identifiers

2. Client Configuration

  • Client short name: Your client identifier (e.g., mck, client)
  • Unification name: Name for this unification project (e.g., claude, customer_360)
  • Lookup/Config database suffix: (default: config)
    • Creates database: ${client_short_name}_${lookup_suffix} (e.g., client_config)
    • ⚠️ I WILL CREATE THIS DATABASE if it doesn't exist
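
For orientation, these settings typically end up as plain parameters in unification/config/environment.yml. A minimal sketch, assuming hypothetical key names (the generated file may use different ones):

```yaml
# Hypothetical sketch of unification/config/environment.yml
# Key names are illustrative assumptions, not the exact keys the command generates.
client_short_name: client
unification_name: customer_360
lookup_suffix: config

# Derived lookup/config database: ${client_short_name}_${lookup_suffix}
lookup_db: client_config
```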

3. ID Method Selection

I'll explain the options and help you choose:

  • persistent_id: Stable IDs that persist across updates (recommended for most cases)
  • canonical_id: Traditional approach with merge capabilities

4. Update Strategy

  • Incremental: Process only new/updated records
  • Full Refresh: Reprocess all data each time
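
A minimal sketch of how this choice is often carried as a single workflow parameter (the flag name is an assumption; the generated files may express it differently):

```yaml
# Hypothetical fragment of a .dig workflow: export the strategy to all tasks
_export:
  incremental: true   # set to false to force a full refresh on every run
```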

5. Regional Endpoint

  • Your Treasure Data region (e.g., US, EU, Tokyo), which determines the API endpoint the unification workflow calls (see the sketch below)

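A minimal sketch of how the endpoint might be parameterized (the key name is an assumption; confirm the correct endpoint for your account's region):

```yaml
# Hypothetical fragment of unification/config/environment.yml
# td_api_endpoint is an assumed parameter name; the actual key may differ.
td_api_endpoint: api.treasuredata.com          # US region
# td_api_endpoint: api.eu01.treasuredata.com   # EU region
# td_api_endpoint: api.treasuredata.co.jp      # Tokyo region
```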
What I'll Do

Step 1: Extract and Validate Keys (via unif-keys-extractor agent)

I'll:

  • Use Treasure Data MCP tools to analyze table schemas
  • Extract user identifier columns (email, phone, td_client_id, etc.)
  • Query sample data to validate identifier patterns
  • Provide key-relationship analysis from three SQL-expert perspectives
  • Recommend priority ordering for unification keys
  • Exclude tables without user identifiers
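
To set expectations, the agent's output boils down to a priority-ordered key list plus exclusions. An illustrative sketch only, not the agent's exact report format:

```yaml
# Hypothetical summary of extracted keys, ordered by recommended priority
keys:
  - name: email
    found_in: [analytics.user_events, crm.customers]
    notes: validated against sample rows; standardized in prep
  - name: td_client_id
    found_in: [analytics.user_events, web.pageviews]
    notes: validated against sample rows
excluded_tables:
  - some_db.table_without_user_identifiers   # placeholder example
```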

Step 2: Configuration Guidance

I'll:

  • Explain canonical_id vs persistent_id concepts
  • Recommend best approach for your use case
  • Discuss incremental vs full refresh strategies
  • Help you understand regional endpoint requirements

Step 3: Generate Prep Tables (via dynamic-prep-creation agent)

I'll create:

  • unification/dynmic_prep_creation.dig - Prep workflow
  • unification/queries/create_schema.sql - Schema creation
  • unification/queries/loop_on_tables.sql - Dynamic loop logic
  • unification/queries/unif_input_tbl.sql - DSAR processing and data cleaning
  • unification/config/environment.yml - Client configuration
  • unification/config/src_prep_params.yml - Dynamic table mappings
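
As a rough illustration of what the dynamic table mappings look like (key names and structure are assumptions; the agent's template is authoritative):

```yaml
# Hypothetical sketch of unification/config/src_prep_params.yml
src_tables:
  - database: analytics
    table: user_events
    key_columns: [email, td_client_id]
  - database: crm
    table: customers
    key_columns: [email, phone]
```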

Step 4: Generate Core Unification (via id-unification-creator agent)

I'll create:

  • unification/config/unify.yml - Unification configuration with keys and tables
  • unification/id_unification.dig - Core unification workflow with HTTP API call
  • Updated unification/queries/create_schema.sql - Schema with all required columns
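
For context, a heavily abridged sketch of unify.yml in the canonical_id style; the persistent_id variant and the exact field names come from the generated file, not from this sketch:

```yaml
# Hypothetical, abridged sketch of unification/config/unify.yml
name: customer_360
keys:
  - name: email
    invalid_texts: ['', 'null']
  - name: td_client_id
tables:
  - database: client_config
    table: prep_user_events        # illustrative prep table name
    key_columns:
      - {column: email, key: email}
      - {column: td_client_id, key: td_client_id}
canonical_ids:
  - name: unified_customer_id
    merge_by_keys: [email, td_client_id]
```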

Step 5: Generate Staging Enrichment (via unification-staging-enricher agent)

I'll create:

  • unification/config/stage_enrich.yml - Enrichment configuration
  • unification/enrich/queries/generate_join_query.sql - Join query generation
  • unification/enrich/queries/execute_join_presto.sql - Presto execution
  • unification/enrich/queries/execute_join_hive.sql - Hive execution
  • unification/enrich/queries/enrich_tbl_creation.sql - Table creation
  • unification/enrich_runner.dig - Enrichment workflow
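
Purely for orientation, an assumed shape for the enrichment configuration (the real schema comes from the agent's template):

```yaml
# Purely illustrative sketch of unification/config/stage_enrich.yml
unified_id_column: unified_customer_id
target_tables:
  - database: analytics
    table: user_events
    join_key: td_client_id
  - database: crm
    table: customers
    join_key: email
```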

Step 6: Create Main Orchestration

I'll create:

  • unification/unif_runner.dig - Main workflow that calls:
    • prep_creation → id_unification → enrichment (in sequence)
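
A minimal sketch of the orchestration, assuming Digdag's call> operator and the file names listed above (task names are illustrative; top-level tasks run in sequence):

```yaml
# Hypothetical sketch of unification/unif_runner.dig
timezone: UTC

+prep_creation:
  call>: dynmic_prep_creation.dig

+id_unification:
  call>: id_unification.dig

+enrichment:
  call>: enrich_runner.dig
```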

Step 7: ⚠️ MANDATORY VALIDATION (NEW!)

CRITICAL: Before deployment, I MUST run comprehensive validation:

  • /cdp-unification:unify-validate command
  • Validates ALL files against exact templates
  • Checks database and table existence
  • Verifies configuration consistency
  • BLOCKS deployment if ANY validation fails

If validation FAILS:

  • I will show exact fix commands
  • You must fix all errors
  • Re-run validation until 100% pass
  • Only then proceed to deployment

If validation PASSES:

  • Proceed to deployment with confidence
  • All files are production-ready

Step 8: Deployment Guidance

I'll provide:

  • Configuration summary
  • Deployment instructions
  • Operating guidelines
  • Monitoring recommendations

Interactive Workflow

I'll use the TD Copilot communication pattern throughout:

  • Question: When I need your input or choice
  • Suggestion: When I recommend a specific approach
  • Check Point: When you should verify understanding

Expected Output

Files Created (All under unification/ directory):

Workflows:

  • unif_runner.dig - Main orchestration workflow
  • dynmic_prep_creation.dig - Prep table creation
  • id_unification.dig - Core unification
  • enrich_runner.dig - Staging enrichment

Configuration:

  • config/environment.yml - Client settings
  • config/src_prep_params.yml - Prep table mappings
  • config/unify.yml - Unification configuration
  • config/stage_enrich.yml - Enrichment configuration

SQL Templates:

  • queries/create_schema.sql - Schema creation
  • queries/loop_on_tables.sql - Dynamic loop logic
  • queries/unif_input_tbl.sql - DSAR and data cleaning
  • enrich/queries/generate_join_query.sql - Join generation
  • enrich/queries/execute_join_presto.sql - Presto execution
  • enrich/queries/execute_join_hive.sql - Hive execution
  • enrich/queries/enrich_tbl_creation.sql - Table creation

Success Criteria

All generated files will:

  • Be TD-compliant and deployment-ready
  • Use exact templates from documentation
  • Include comprehensive error handling
  • Follow TD Copilot standards
  • Work without modification in Treasure Data
  • Support incremental processing
  • Include DSAR processing
  • Generate proper master tables

Getting Started

Ready to begin? Please provide:

  1. Your table list (database.table_name format)
  2. Client short name
  3. Unification name

I'll start by analyzing your tables with the unif-keys-extractor agent to extract and validate user identifiers.

Example:

I want to set up ID unification for:
- analytics.user_events
- crm.customers
- web.pageviews

Client: mck
Unification name: customer_360

Let's get started!