201 lines
6.7 KiB
Markdown
201 lines
6.7 KiB
Markdown
---
|
|
name: unify-setup
|
|
description: Complete end-to-end ID unification setup from table analysis to deployment
|
|
---
|
|
|
|
# Complete ID Unification Setup
|
|
|
|
## Overview
|
|
|
|
I'll guide you through the complete ID unification setup process for Treasure Data CDP. This is an interactive, end-to-end workflow that will:
|
|
|
|
1. **Extract and validate user identifiers** from your tables
|
|
2. **Help you choose the right ID method** (canonical_id vs persistent_id)
|
|
3. **Generate prep table configurations** for data standardization
|
|
4. **Create core unification files** (unify.yml and id_unification.dig)
|
|
5. **Set up staging enrichment** for post-unification processing
|
|
6. **Create orchestration workflow** (unif_runner.dig) to run everything in sequence
|
|
|
|
---
|
|
|
|
## What You Need to Provide
|
|
|
|
### 1. Table List
|
|
Please provide the list of tables you want to include in ID unification:
|
|
- Format: `database.table_name` (e.g., `analytics.user_events`, `crm.customers`)
|
|
- I'll analyze each table using Treasure Data MCP tools to extract user identifiers
|
|
|
|
### 2. Client Configuration
|
|
- **Client short name**: Your client identifier (e.g., `mck`, `client`)
|
|
- **Unification name**: Name for this unification project (e.g., `claude`, `customer_360`)
|
|
- **Lookup/Config database suffix**: (default: `config`)
|
|
- Creates database: `${client_short_name}_${lookup_suffix}` (e.g., `client_config`)
|
|
- ⚠️ **I WILL CREATE THIS DATABASE** if it doesn't exist
|
|
|
|
### 3. ID Method Selection
|
|
I'll explain the options and help you choose:
|
|
- **persistent_id**: Stable IDs that persist across updates (recommended for most cases)
|
|
- **canonical_id**: Traditional approach with merge capabilities
|
|
|
|
### 4. Update Strategy
|
|
- **Incremental**: Process only new/updated records
|
|
- **Full Refresh**: Reprocess all data each time
|
|
|
|
### 5. Regional Endpoint
|
|
- **US**: https://api-cdp.treasuredata.com
|
|
- **EU**: https://api-cdp.eu01.treasuredata.com
|
|
- **Asia Pacific**: https://api-cdp.ap02.treasuredata.com
|
|
- **Japan**: https://api-cdp.treasuredata.co.jp
|
|
|
|
---
|
|
|
|
## What I'll Do
|
|
|
|
### Step 1: Extract and Validate Keys (via unif-keys-extractor agent)
|
|
I'll:
|
|
- Use Treasure Data MCP tools to analyze table schemas
|
|
- Extract user identifier columns (email, phone, td_client_id, etc.)
|
|
- Query sample data to validate identifier patterns
|
|
- Provide 3 SQL experts analysis of key relationships
|
|
- Recommend priority ordering for unification keys
|
|
- Exclude tables without user identifiers
|
|
|
|
### Step 2: Configuration Guidance
|
|
I'll:
|
|
- Explain canonical_id vs persistent_id concepts
|
|
- Recommend best approach for your use case
|
|
- Discuss incremental vs full refresh strategies
|
|
- Help you understand regional endpoint requirements
|
|
|
|
### Step 3: Generate Prep Tables (via dynamic-prep-creation agent)
|
|
I'll create:
|
|
- `unification/dynmic_prep_creation.dig` - Prep workflow
|
|
- `unification/queries/create_schema.sql` - Schema creation
|
|
- `unification/queries/loop_on_tables.sql` - Dynamic loop logic
|
|
- `unification/queries/unif_input_tbl.sql` - DSAR processing and data cleaning
|
|
- `unification/config/environment.yml` - Client configuration
|
|
- `unification/config/src_prep_params.yml` - Dynamic table mappings
|
|
|
|
### Step 4: Generate Core Unification (via id-unification-creator agent)
|
|
I'll create:
|
|
- `unification/config/unify.yml` - Unification configuration with keys and tables
|
|
- `unification/id_unification.dig` - Core unification workflow with HTTP API call
|
|
- Updated `unification/queries/create_schema.sql` - Schema with all required columns
|
|
|
|
### Step 5: Generate Staging Enrichment (via unification-staging-enricher agent)
|
|
I'll create:
|
|
- `unification/config/stage_enrich.yml` - Enrichment configuration
|
|
- `unification/enrich/queries/generate_join_query.sql` - Join query generation
|
|
- `unification/enrich/queries/execute_join_presto.sql` - Presto execution
|
|
- `unification/enrich/queries/execute_join_hive.sql` - Hive execution
|
|
- `unification/enrich/queries/enrich_tbl_creation.sql` - Table creation
|
|
- `unification/enrich_runner.dig` - Enrichment workflow
|
|
|
|
### Step 6: Create Main Orchestration
|
|
I'll create:
|
|
- `unification/unif_runner.dig` - Main workflow that calls:
|
|
- prep_creation → id_unification → enrichment (in sequence)
|
|
|
|
### Step 7: ⚠️ MANDATORY VALIDATION (NEW!)
|
|
**CRITICAL**: Before deployment, I MUST run comprehensive validation:
|
|
- `/cdp-unification:unify-validate` command
|
|
- Validates ALL files against exact templates
|
|
- Checks database and table existence
|
|
- Verifies configuration consistency
|
|
- **BLOCKS deployment if ANY validation fails**
|
|
|
|
**If validation FAILS:**
|
|
- I will show exact fix commands
|
|
- You must fix all errors
|
|
- Re-run validation until 100% pass
|
|
- Only then proceed to deployment
|
|
|
|
**If validation PASSES:**
|
|
- Proceed to deployment with confidence
|
|
- All files are production-ready
|
|
|
|
### Step 8: Deployment Guidance
|
|
I'll provide:
|
|
- Configuration summary
|
|
- Deployment instructions
|
|
- Operating guidelines
|
|
- Monitoring recommendations
|
|
|
|
---
|
|
|
|
## Interactive Workflow
|
|
|
|
I'll use the **TD Copilot communication pattern** throughout:
|
|
|
|
- **Question**: When I need your input or choice
|
|
- **Suggestion**: When I recommend a specific approach
|
|
- **Check Point**: When you should verify understanding
|
|
|
|
---
|
|
|
|
## Expected Output
|
|
|
|
### Files Created (All under `unification/` directory):
|
|
|
|
**Workflows:**
|
|
- `unif_runner.dig` - Main orchestration workflow
|
|
- `dynmic_prep_creation.dig` - Prep table creation
|
|
- `id_unification.dig` - Core unification
|
|
- `enrich_runner.dig` - Staging enrichment
|
|
|
|
**Configuration:**
|
|
- `config/environment.yml` - Client settings
|
|
- `config/src_prep_params.yml` - Prep table mappings
|
|
- `config/unify.yml` - Unification configuration
|
|
- `config/stage_enrich.yml` - Enrichment configuration
|
|
|
|
**SQL Templates:**
|
|
- `queries/create_schema.sql` - Schema creation
|
|
- `queries/loop_on_tables.sql` - Dynamic loop logic
|
|
- `queries/unif_input_tbl.sql` - DSAR and data cleaning
|
|
- `enrich/queries/generate_join_query.sql` - Join generation
|
|
- `enrich/queries/execute_join_presto.sql` - Presto execution
|
|
- `enrich/queries/execute_join_hive.sql` - Hive execution
|
|
- `enrich/queries/enrich_tbl_creation.sql` - Table creation
|
|
|
|
---
|
|
|
|
## Success Criteria
|
|
|
|
All generated files will:
|
|
- ✅ Be TD-compliant and deployment-ready
|
|
- ✅ Use exact templates from documentation
|
|
- ✅ Include comprehensive error handling
|
|
- ✅ Follow TD Copilot standards
|
|
- ✅ Work without modification in Treasure Data
|
|
- ✅ Support incremental processing
|
|
- ✅ Include DSAR processing
|
|
- ✅ Generate proper master tables
|
|
|
|
---
|
|
|
|
## Getting Started
|
|
|
|
**Ready to begin?** Please provide:
|
|
|
|
1. Your table list (database.table_name format)
|
|
2. Client short name
|
|
3. Unification name
|
|
|
|
I'll start by analyzing your tables with the unif-keys-extractor agent to extract and validate user identifiers.
|
|
|
|
**Example:**
|
|
```
|
|
I want to set up ID unification for:
|
|
- analytics.user_events
|
|
- crm.customers
|
|
- web.pageviews
|
|
|
|
Client: mck
|
|
Unification name: customer_360
|
|
```
|
|
|
|
---
|
|
|
|
**Let's get started!**
|