Files
2025-11-30 09:02:49 +08:00

201 lines
6.7 KiB
Markdown

---
name: unify-setup
description: Complete end-to-end ID unification setup from table analysis to deployment
---
# Complete ID Unification Setup
## Overview
I'll guide you through the complete ID unification setup process for Treasure Data CDP. This is an interactive, end-to-end workflow that will:
1. **Extract and validate user identifiers** from your tables
2. **Help you choose the right ID method** (canonical_id vs persistent_id)
3. **Generate prep table configurations** for data standardization
4. **Create core unification files** (unify.yml and id_unification.dig)
5. **Set up staging enrichment** for post-unification processing
6. **Create orchestration workflow** (unif_runner.dig) to run everything in sequence
---
## What You Need to Provide
### 1. Table List
Please provide the list of tables you want to include in ID unification:
- Format: `database.table_name` (e.g., `analytics.user_events`, `crm.customers`)
- I'll analyze each table using Treasure Data MCP tools to extract user identifiers
### 2. Client Configuration
- **Client short name**: Your client identifier (e.g., `mck`, `client`)
- **Unification name**: Name for this unification project (e.g., `claude`, `customer_360`)
- **Lookup/Config database suffix**: (default: `config`)
- Creates database: `${client_short_name}_${lookup_suffix}` (e.g., `client_config`)
- ⚠️ **I WILL CREATE THIS DATABASE** if it doesn't exist
### 3. ID Method Selection
I'll explain the options and help you choose:
- **persistent_id**: Stable IDs that persist across updates (recommended for most cases)
- **canonical_id**: Traditional approach with merge capabilities
### 4. Update Strategy
- **Incremental**: Process only new/updated records
- **Full Refresh**: Reprocess all data each time
### 5. Regional Endpoint
- **US**: https://api-cdp.treasuredata.com
- **EU**: https://api-cdp.eu01.treasuredata.com
- **Asia Pacific**: https://api-cdp.ap02.treasuredata.com
- **Japan**: https://api-cdp.treasuredata.co.jp
---
## What I'll Do
### Step 1: Extract and Validate Keys (via unif-keys-extractor agent)
I'll:
- Use Treasure Data MCP tools to analyze table schemas
- Extract user identifier columns (email, phone, td_client_id, etc.)
- Query sample data to validate identifier patterns
- Provide 3 SQL experts analysis of key relationships
- Recommend priority ordering for unification keys
- Exclude tables without user identifiers
### Step 2: Configuration Guidance
I'll:
- Explain canonical_id vs persistent_id concepts
- Recommend best approach for your use case
- Discuss incremental vs full refresh strategies
- Help you understand regional endpoint requirements
### Step 3: Generate Prep Tables (via dynamic-prep-creation agent)
I'll create:
- `unification/dynmic_prep_creation.dig` - Prep workflow
- `unification/queries/create_schema.sql` - Schema creation
- `unification/queries/loop_on_tables.sql` - Dynamic loop logic
- `unification/queries/unif_input_tbl.sql` - DSAR processing and data cleaning
- `unification/config/environment.yml` - Client configuration
- `unification/config/src_prep_params.yml` - Dynamic table mappings
### Step 4: Generate Core Unification (via id-unification-creator agent)
I'll create:
- `unification/config/unify.yml` - Unification configuration with keys and tables
- `unification/id_unification.dig` - Core unification workflow with HTTP API call
- Updated `unification/queries/create_schema.sql` - Schema with all required columns
### Step 5: Generate Staging Enrichment (via unification-staging-enricher agent)
I'll create:
- `unification/config/stage_enrich.yml` - Enrichment configuration
- `unification/enrich/queries/generate_join_query.sql` - Join query generation
- `unification/enrich/queries/execute_join_presto.sql` - Presto execution
- `unification/enrich/queries/execute_join_hive.sql` - Hive execution
- `unification/enrich/queries/enrich_tbl_creation.sql` - Table creation
- `unification/enrich_runner.dig` - Enrichment workflow
### Step 6: Create Main Orchestration
I'll create:
- `unification/unif_runner.dig` - Main workflow that calls:
- prep_creation → id_unification → enrichment (in sequence)
### Step 7: ⚠️ MANDATORY VALIDATION (NEW!)
**CRITICAL**: Before deployment, I MUST run comprehensive validation:
- `/cdp-unification:unify-validate` command
- Validates ALL files against exact templates
- Checks database and table existence
- Verifies configuration consistency
- **BLOCKS deployment if ANY validation fails**
**If validation FAILS:**
- I will show exact fix commands
- You must fix all errors
- Re-run validation until 100% pass
- Only then proceed to deployment
**If validation PASSES:**
- Proceed to deployment with confidence
- All files are production-ready
### Step 8: Deployment Guidance
I'll provide:
- Configuration summary
- Deployment instructions
- Operating guidelines
- Monitoring recommendations
---
## Interactive Workflow
I'll use the **TD Copilot communication pattern** throughout:
- **Question**: When I need your input or choice
- **Suggestion**: When I recommend a specific approach
- **Check Point**: When you should verify understanding
---
## Expected Output
### Files Created (All under `unification/` directory):
**Workflows:**
- `unif_runner.dig` - Main orchestration workflow
- `dynmic_prep_creation.dig` - Prep table creation
- `id_unification.dig` - Core unification
- `enrich_runner.dig` - Staging enrichment
**Configuration:**
- `config/environment.yml` - Client settings
- `config/src_prep_params.yml` - Prep table mappings
- `config/unify.yml` - Unification configuration
- `config/stage_enrich.yml` - Enrichment configuration
**SQL Templates:**
- `queries/create_schema.sql` - Schema creation
- `queries/loop_on_tables.sql` - Dynamic loop logic
- `queries/unif_input_tbl.sql` - DSAR and data cleaning
- `enrich/queries/generate_join_query.sql` - Join generation
- `enrich/queries/execute_join_presto.sql` - Presto execution
- `enrich/queries/execute_join_hive.sql` - Hive execution
- `enrich/queries/enrich_tbl_creation.sql` - Table creation
---
## Success Criteria
All generated files will:
- ✅ Be TD-compliant and deployment-ready
- ✅ Use exact templates from documentation
- ✅ Include comprehensive error handling
- ✅ Follow TD Copilot standards
- ✅ Work without modification in Treasure Data
- ✅ Support incremental processing
- ✅ Include DSAR processing
- ✅ Generate proper master tables
---
## Getting Started
**Ready to begin?** Please provide:
1. Your table list (database.table_name format)
2. Client short name
3. Unification name
I'll start by analyzing your tables with the unif-keys-extractor agent to extract and validate user identifiers.
**Example:**
```
I want to set up ID unification for:
- analytics.user_events
- crm.customers
- web.pageviews
Client: mck
Unification name: customer_360
```
---
**Let's get started!**