---
name: unify-create-config
description: Generate core ID unification configuration files (unify.yml and id_unification.dig)
---

# Create Core Unification Configuration

## Overview

I'll generate core ID unification configuration files using the **id-unification-creator** specialized agent.

This command creates **TD-COMPLIANT** unification files:
- ✅ **DYNAMIC CONFIGURATION** - Based on prep table analysis
- ✅ **METHOD-SPECIFIC** - persistent_id OR canonical_id (never both)
- ✅ **REGIONAL ENDPOINTS** - Correct URL for your region
- ✅ **SCHEMA VALIDATION** - Prevents first-run failures

---

## Prerequisites

**REQUIRED**: Prep table configuration must exist:
- `unification/config/environment.yml` - Client configuration
- `unification/config/src_prep_params.yml` - Prep table mappings

If you haven't created these yet, run:
- `/cdp-unification:unify-create-prep` first, OR
- `/cdp-unification:unify-setup` for complete end-to-end setup

---

## What You Need to Provide

### 1. ID Method Selection
Choose ONE method:

**Option A: persistent_id (RECOMMENDED)**
- Stable IDs that persist across updates
- Better for customer data platforms
- Recommended for most use cases
- **Provide persistent_id name** (e.g., `td_claude_id`, `stable_customer_id`)

**Option B: canonical_id**
- Traditional approach with merge capabilities
- Good for legacy systems
- **Provide canonical_id name** (e.g., `master_customer_id`)

### 2. Update Strategy
- **Full Refresh**: Reprocess all data each time (`full_refresh: true`)
- **Incremental**: Process only new/updated records (`full_refresh: false`)

### 3. Regional Endpoint
Choose your Treasure Data region:
- **US**: https://api-cdp.treasuredata.com/unifications/workflow_call
- **EU**: https://api-cdp.eu01.treasuredata.com/unifications/workflow_call
- **Asia Pacific**: https://api-cdp.ap02.treasuredata.com/unifications/workflow_call
- **Japan**: https://api-cdp.treasuredata.co.jp/unifications/workflow_call

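The region choice amounts to a lookup table. A minimal Python sketch, assuming a hypothetical helper named `endpoint_for` and the region labels used in this document; the URLs are copied from the list above and nothing here is part of the agent itself:

```python
# Endpoint URLs copied verbatim from the region list above.
REGIONAL_ENDPOINTS = {
    "US": "https://api-cdp.treasuredata.com/unifications/workflow_call",
    "EU": "https://api-cdp.eu01.treasuredata.com/unifications/workflow_call",
    "Asia Pacific": "https://api-cdp.ap02.treasuredata.com/unifications/workflow_call",
    "Japan": "https://api-cdp.treasuredata.co.jp/unifications/workflow_call",
}


def endpoint_for(region: str) -> str:
    """Return the workflow_call URL for a region label (hypothetical helper)."""
    key = region.strip()
    if key not in REGIONAL_ENDPOINTS:
        expected = ", ".join(REGIONAL_ENDPOINTS)
        raise ValueError(f"Unknown region {region!r}; expected one of: {expected}")
    return REGIONAL_ENDPOINTS[key]
```

Rejecting unknown labels outright, rather than defaulting to US, keeps a typo from silently pointing the workflow at the wrong region.
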
### 4. Unification Name
- Name for this unification project (e.g., `claude`, `customer_360`)

---

## What I'll Do

### Step 1: Validate Prerequisites
I'll check that these files exist:
- `unification/config/environment.yml`
- `unification/config/src_prep_params.yml`

And extract:
- Client short name
- Unified input table name
- All prep table configurations with column mappings

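The existence check in this step can be sketched in a few lines of Python. This is an illustration only: `missing_prerequisites` is a hypothetical name, and the sketch assumes the two paths are resolved relative to the project root.

```python
from pathlib import Path

# The two prerequisite files named in this step.
REQUIRED_CONFIGS = (
    "unification/config/environment.yml",
    "unification/config/src_prep_params.yml",
)


def missing_prerequisites(project_root: str = ".") -> list[str]:
    """Return the required config paths that are absent under project_root."""
    root = Path(project_root)
    return [rel for rel in REQUIRED_CONFIGS if not (root / rel).is_file()]
```

An empty list means both files are present; anything else is the list to report back before suggesting `/cdp-unification:unify-create-prep`.
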
### Step 2: Extract Key Information
I'll parse `src_prep_params.yml` to identify:
- All unique `alias_as` column names
- Key types: email, phone, td_client_id, td_global_id, customer_id, etc.
- Complete list of available keys for `merge_by_keys`

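Collecting the unique `alias_as` names could look roughly like the sketch below. The layout of `src_prep_params.yml` is not shown in this document, so the `prep_tbls` and `columns` field names are assumptions, as is the function name `collect_alias_names`; the input is a dictionary that has already been parsed from YAML.

```python
def collect_alias_names(prep_params: dict) -> list[str]:
    """Return unique alias_as values in first-seen order.

    Assumes a layout like {"prep_tbls": [{"columns": [{"alias_as": ...}]}]},
    which is a guess at the file structure, not a documented schema.
    """
    seen: list[str] = []
    for table in prep_params.get("prep_tbls", []):
        for column in table.get("columns", []):
            alias = column.get("alias_as")
            if alias and alias not in seen:
                seen.append(alias)
    return seen
```

Keeping first-seen order makes the generated `keys` and `merge_by_keys` lists stable between runs. Separating identifier keys from non-key columns is left out of this sketch.
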
### Step 3: Generate unification/config/unify.yml
I'll create:
```yaml
name: {unif_name}

keys:
  - name: email
    invalid_texts: ['']
  - name: td_client_id
    invalid_texts: ['']
  - name: phone
    invalid_texts: ['']
  # ... ALL detected key types

tables:
  - database: ${client_short_name}_${stg}
    table: ${globals.unif_input_tbl}
    incremental_columns: [time]
    key_columns:
      - {column: email, key: email}
      - {column: td_client_id, key: td_client_id}
      - {column: phone, key: phone}
      # ... ALL alias_as columns mapped

# ONLY ONE of these sections (based on your selection):
persistent_ids:
  - name: {persistent_id_name}
    merge_by_keys: [email, td_client_id, phone, ...]
    merge_iterations: 15

# OR

canonical_ids:
  - name: {canonical_id_name}
    merge_by_keys: [email, td_client_id, phone, ...]
    merge_iterations: 15
```

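The "only one section" rule in the template above can be illustrated with a small Python sketch. `build_id_section` is a hypothetical helper, not the agent's implementation; it returns a dictionary holding exactly one of `persistent_ids` or `canonical_ids`, with the same fields the template shows.

```python
VALID_METHODS = ("persistent_id", "canonical_id")


def build_id_section(method: str, id_name: str, merge_keys: list[str],
                     merge_iterations: int = 15) -> dict:
    """Build the single ID section for unify.yml (hypothetical helper)."""
    if method not in VALID_METHODS:
        raise ValueError(f"method must be one of {VALID_METHODS}, got {method!r}")
    # The section name is the plural of the method: persistent_ids / canonical_ids.
    return {
        f"{method}s": [
            {
                "name": id_name,
                "merge_by_keys": list(merge_keys),
                "merge_iterations": merge_iterations,
            }
        ]
    }
```

Because the section key is derived from the single `method` argument, the result cannot contain both sections at once.
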
### Step 4: Validate and Update Schema (CRITICAL)
I'll prevent first-run failures by:
1. Reading `unify.yml` to extract `merge_by_keys` list
2. Reading `queries/create_schema.sql` to check existing columns
3. Comparing required vs existing columns
4. Updating `create_schema.sql` if missing columns:
   - Add all keys from `merge_by_keys` as varchar
   - Add source, time, ingest_time columns
   - Update BOTH table definitions (main and tmp)

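The comparison in items 1 to 3 can be sketched as follows. This is a rough illustration under stated assumptions: `create_schema.sql` is taken to contain plain `CREATE TABLE ... ( col type, ... );` statements with simple column types (no commas inside a type), and the function names are hypothetical.

```python
import re

# Columns this step requires in addition to the merge keys.
ALWAYS_REQUIRED = ("source", "time", "ingest_time")

_CREATE_TABLE = re.compile(
    r"create\s+table\s+(?:if\s+not\s+exists\s+)?([\w.${}]+)\s*\((.*?)\)\s*(?:;|\Z)",
    re.IGNORECASE | re.DOTALL,
)


def columns_by_table(schema_sql: str) -> dict[str, set[str]]:
    """Map each CREATE TABLE target to its lower-cased column names."""
    tables: dict[str, set[str]] = {}
    for name, body in _CREATE_TABLE.findall(schema_sql):
        tables[name] = {
            part.split()[0].strip('"`').lower()
            for part in body.split(",")
            if part.strip()
        }
    return tables


def missing_columns(schema_sql: str, merge_by_keys: list[str]) -> dict[str, list[str]]:
    """For every table definition, list required columns that are absent."""
    required = [key.lower() for key in merge_by_keys] + list(ALWAYS_REQUIRED)
    return {
        table: [col for col in required if col not in cols]
        for table, cols in columns_by_table(schema_sql).items()
    }
```

Reporting per table matters here because the main and tmp definitions can drift apart; a key present in one but not the other still needs the update in item 4.
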
### Step 5: Generate unification/id_unification.dig
I'll create:
```yaml
timezone: UTC

_export:
  !include : config/environment.yml
  !include : config/src_prep_params.yml

+call_unification:
  http_call>: {REGIONAL_ENDPOINT_URL}
  headers:
    - authorization: ${secret:td.apikey}
    - content-type: application/json
  method: POST
  retry: true
  content_format: json
  content:
    run_persistent_ids: {true/false}  # ONLY if persistent_id
    run_canonical_ids: {true/false}   # ONLY if canonical_id
    run_enrichments: true
    run_master_tables: true
    full_refresh: {true/false}
    keep_debug_tables: true
    unification:
      !include : config/unify.yml
```

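How the `content` flags follow from the two user choices can be shown with a short Python sketch. `build_call_content` is a hypothetical helper; it emits only the `run_*` flag for the selected method, mirrors the fixed flags from the template above, and leaves out the `unification` include, which stays a Digdag `!include`.

```python
def build_call_content(method: str, full_refresh: bool) -> dict:
    """Build the flag portion of the http_call content (hypothetical helper)."""
    if method not in ("persistent_id", "canonical_id"):
        raise ValueError(f"unsupported ID method: {method!r}")
    return {
        # Only the flag for the selected method is emitted.
        f"run_{method}s": True,
        "run_enrichments": True,
        "run_master_tables": True,
        "full_refresh": full_refresh,
        "keep_debug_tables": True,
    }
```

For example, `build_call_content("persistent_id", full_refresh=False)` yields the same flags as an incremental persistent_id workflow.
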
---

## Expected Output

### Files Created
```
unification/
├── config/
│   └── unify.yml              ✓ Dynamic configuration
├── queries/
│   └── create_schema.sql      ✓ Updated with all columns
└── id_unification.dig         ✓ Core unification workflow
```

### Example unify.yml (persistent_id method)
```yaml
name: customer_360

keys:
  - name: email
    invalid_texts: ['']
  - name: td_client_id
    invalid_texts: ['']
    valid_regexp: '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'

tables:
  - database: ${client_short_name}_${stg}
    table: ${globals.unif_input_tbl}
    incremental_columns: [time]
    key_columns:
      - {column: email, key: email}
      - {column: td_client_id, key: td_client_id}

persistent_ids:
  - name: td_claude_id
    merge_by_keys: [email, td_client_id]
    merge_iterations: 15
```

### Example id_unification.dig (US region, incremental)
```yaml
timezone: UTC

_export:
  !include : config/environment.yml
  !include : config/src_prep_params.yml

+call_unification:
  http_call>: https://api-cdp.treasuredata.com/unifications/workflow_call
  headers:
    - authorization: ${secret:td.apikey}
    - content-type: application/json
  method: POST
  retry: true
  content_format: json
  content:
    run_persistent_ids: true
    run_enrichments: true
    run_master_tables: true
    full_refresh: false
    keep_debug_tables: true
    unification:
      !include : config/unify.yml
```

---

## Critical Requirements

### ✅ Dynamic Configuration
- All keys detected from `src_prep_params.yml`
- All column mappings from prep analysis
- Method-specific configuration (never both)

### ⚠️ Schema Completeness
- `create_schema.sql` MUST contain ALL columns from `merge_by_keys`
- Prevents "column not found" errors on first run
- Updates both main and tmp table definitions

### ⚠️ Config File Inclusion
- `id_unification.dig` MUST include BOTH config files in `_export`:
  - `environment.yml` - For `${client_short_name}_${stg}`
  - `src_prep_params.yml` - For `${globals.unif_input_tbl}`

### ⚠️ Regional Endpoint
- Must use exact URL for selected region
- Different endpoints for US, EU, Asia Pacific, Japan

---

## Validation Checklist

Before completing, I'll verify:
- [ ] unify.yml contains all detected key types
- [ ] key_columns section maps ALL alias_as columns
- [ ] Only ONE ID method section exists
- [ ] merge_by_keys includes ALL available keys
- [ ] **CRITICAL**: create_schema.sql contains ALL columns from merge_by_keys
- [ ] **CRITICAL**: Both table definitions updated (main and tmp)
- [ ] id_unification.dig has correct regional endpoint
- [ ] **CRITICAL**: _export includes BOTH config files
- [ ] Workflow flags match selected method only
- [ ] Proper TD YAML/DIG syntax

---

## Success Criteria

All generated files will:
- ✅ **TD-COMPLIANT** - Work without modification in TD
- ✅ **DYNAMICALLY CONFIGURED** - Based on actual prep analysis
- ✅ **METHOD-ACCURATE** - Exact implementation of selected method
- ✅ **REGIONALLY CORRECT** - Proper endpoint for region
- ✅ **SCHEMA-COMPLETE** - All required columns present

---

## Next Steps

After creating core config, you can:
1. **Test unification workflow**: `dig run unification/id_unification.dig`
2. **Add enrichment**: Use `/cdp-unification:unify-setup` to add staging enrichment
3. **Create main orchestrator**: Combine prep + unification + enrichment

---

## Getting Started

**Ready to create core unification config?** Please provide:

1. **ID Method**:
   - Choose: `persistent_id` or `canonical_id`
   - Provide ID name: e.g., `td_claude_id`

2. **Update Strategy**:
   - Choose: `incremental` or `full_refresh`

3. **Regional Endpoint**:
   - Choose: `US`, `EU`, `Asia Pacific`, or `Japan`

4. **Unification Name**:
   - e.g., `customer_360`, `claude`

**Example:**
```
ID Method: persistent_id
ID Name: td_claude_id
Update Strategy: incremental
Region: US
Unification Name: customer_360
```

I'll call the **id-unification-creator** agent to generate all core unification files.

---

**Let's create your unification configuration!**