---
name: unify-create-config
description: Generate core ID unification configuration files (unify.yml and id_unification.dig)
---
# Create Core Unification Configuration
## Overview
I'll generate core ID unification configuration files using the **id-unification-creator** specialized agent.
This command creates **TD-COMPLIANT** unification files:
- **DYNAMIC CONFIGURATION** - Based on prep table analysis
- **METHOD-SPECIFIC** - `persistent_id` OR `canonical_id` (never both)
- **REGIONAL ENDPOINTS** - Correct URL for your region
- **SCHEMA VALIDATION** - Prevents first-run failures
---
## Prerequisites
**REQUIRED**: Prep table configuration must exist:
- `unification/config/environment.yml` - Client configuration
- `unification/config/src_prep_params.yml` - Prep table mappings
If you haven't created these yet, run:
- `/cdp-unification:unify-create-prep` first, OR
- `/cdp-unification:unify-setup` for complete end-to-end setup
---
## What You Need to Provide
### 1. ID Method Selection
Choose ONE method:
**Option A: persistent_id (RECOMMENDED)**
- Stable IDs that persist across updates
- Better for customer data platforms
- Recommended for most use cases
- **Provide persistent_id name** (e.g., `td_claude_id`, `stable_customer_id`)
**Option B: canonical_id**
- Traditional approach with merge capabilities
- Good for legacy systems
- **Provide canonical_id name** (e.g., `master_customer_id`)
### 2. Update Strategy
- **Full Refresh**: Reprocess all data each time (`full_refresh: true`)
- **Incremental**: Process only new/updated records (`full_refresh: false`)
### 3. Regional Endpoint
Choose your Treasure Data region:
- **US**: https://api-cdp.treasuredata.com/unifications/workflow_call
- **EU**: https://api-cdp.eu01.treasuredata.com/unifications/workflow_call
- **Asia Pacific**: https://api-cdp.ap02.treasuredata.com/unifications/workflow_call
- **Japan**: https://api-cdp.treasuredata.co.jp/unifications/workflow_call
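
The region choice can be resolved to its endpoint with a simple lookup. The sketch below is only an illustration: the URLs are copied from the list above, while `REGION_ENDPOINTS` and `resolve_endpoint` are hypothetical names, not part of any Treasure Data tooling.

```python
# Endpoint URLs copied from the region list above.
REGION_ENDPOINTS = {
    "US": "https://api-cdp.treasuredata.com/unifications/workflow_call",
    "EU": "https://api-cdp.eu01.treasuredata.com/unifications/workflow_call",
    "Asia Pacific": "https://api-cdp.ap02.treasuredata.com/unifications/workflow_call",
    "Japan": "https://api-cdp.treasuredata.co.jp/unifications/workflow_call",
}


def resolve_endpoint(region: str) -> str:
    """Return the unification endpoint for a region name (hypothetical helper).

    Matching ignores case and extra whitespace, so "us" and " Asia  Pacific "
    both resolve; anything else raises ValueError instead of guessing.
    """
    normalized = " ".join(region.split()).casefold()
    for name, url in REGION_ENDPOINTS.items():
        if name.casefold() == normalized:
            return url
    raise ValueError(
        f"Unknown region {region!r}; expected one of {sorted(REGION_ENDPOINTS)}"
    )
```

Failing loudly on an unrecognized region is deliberate here, since the endpoint must match the selected region exactly.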
### 4. Unification Name
- Name for this unification project (e.g., `claude`, `customer_360`)
---
## What I'll Do
### Step 1: Validate Prerequisites
I'll check that these files exist:
- `unification/config/environment.yml`
- `unification/config/src_prep_params.yml`
And extract:
- Client short name
- Unified input table name
- All prep table configurations with column mappings
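
The existence check in this step could look roughly like the following. It is a minimal sketch that assumes the command runs from the project root; `missing_prerequisites` is a hypothetical helper name, and only the two file paths come from this document.

```python
from pathlib import Path

# The two prep configuration files this command depends on.
REQUIRED_CONFIGS = (
    "unification/config/environment.yml",
    "unification/config/src_prep_params.yml",
)


def missing_prerequisites(project_root: str = ".") -> list[str]:
    """Return the required config paths that do not exist under project_root."""
    root = Path(project_root)
    return [rel for rel in REQUIRED_CONFIGS if not (root / rel).is_file()]
```

An empty list means both files are present; otherwise the returned paths indicate that `/cdp-unification:unify-create-prep` should be run first.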
### Step 2: Extract Key Information
I'll parse `src_prep_params.yml` to identify:
- All unique `alias_as` column names
- Key types: email, phone, td_client_id, td_global_id, customer_id, etc.
- Complete list of available keys for `merge_by_keys`
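
One way to gather the unique `alias_as` names is to walk the parsed YAML recursively, which avoids depending on the exact nesting of `src_prep_params.yml`. The sketch below assumes the file has already been loaded into Python dicts and lists by a YAML parser; the function name and the sample layout are hypothetical and only illustrate the idea.

```python
def collect_alias_columns(node, found=None):
    """Collect every string value stored under an `alias_as` key.

    Walks nested dicts and lists, keeps first-seen order, and skips duplicates.
    """
    if found is None:
        found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "alias_as" and isinstance(value, str):
                if value not in found:
                    found.append(value)
            else:
                collect_alias_columns(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_alias_columns(item, found)
    return found


# Hypothetical parsed content; the real file layout may differ.
sample = {
    "globals": {"unif_input_tbl": "unif_input"},
    "prep_tbls": [
        {"columns": [{"name": "email_address", "alias_as": "email"},
                     {"name": "phone_no", "alias_as": "phone"}]},
        {"columns": [{"name": "mail", "alias_as": "email"},
                     {"name": "cookie", "alias_as": "td_client_id"}]},
    ],
}
print(collect_alias_columns(sample))  # ['email', 'phone', 'td_client_id']
```

The resulting list is what feeds `keys`, `key_columns`, and `merge_by_keys` in the next step.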
### Step 3: Generate unification/config/unify.yml
I'll create:
```yaml
name: {unif_name}
keys:
  - name: email
    invalid_texts: ['']
  - name: td_client_id
    invalid_texts: ['']
  - name: phone
    invalid_texts: ['']
  # ... ALL detected key types
tables:
  - database: ${client_short_name}_${stg}
    table: ${globals.unif_input_tbl}
    incremental_columns: [time]
    key_columns:
      - {column: email, key: email}
      - {column: td_client_id, key: td_client_id}
      - {column: phone, key: phone}
      # ... ALL alias_as columns mapped
# ONLY ONE of these sections (based on your selection):
persistent_ids:
  - name: {persistent_id_name}
    merge_by_keys: [email, td_client_id, phone, ...]
    merge_iterations: 15
# OR
canonical_ids:
  - name: {canonical_id_name}
    merge_by_keys: [email, td_client_id, phone, ...]
    merge_iterations: 15
```
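
The template above can be assembled from the detected keys and the chosen ID method. The following is a minimal sketch, not the agent's actual implementation: `build_unify_config` is a hypothetical helper that returns a plain dict mirroring the `unify.yml` layout, and writing it out as YAML is left to a YAML library of your choice.

```python
def build_unify_config(name, keys, id_method, id_name, merge_iterations=15):
    """Build a dict mirroring unify.yml with exactly one ID method section."""
    if id_method not in ("persistent_id", "canonical_id"):
        raise ValueError("id_method must be 'persistent_id' or 'canonical_id'")
    config = {
        "name": name,
        # One key definition per detected alias_as column.
        "keys": [{"name": key, "invalid_texts": [""]} for key in keys],
        "tables": [
            {
                # Placeholders are kept verbatim; the workflow resolves them.
                "database": "${client_short_name}_${stg}",
                "table": "${globals.unif_input_tbl}",
                "incremental_columns": ["time"],
                "key_columns": [{"column": key, "key": key} for key in keys],
            }
        ],
    }
    # Emit only the section that matches the selected method, never both.
    section = "persistent_ids" if id_method == "persistent_id" else "canonical_ids"
    config[section] = [
        {
            "name": id_name,
            "merge_by_keys": list(keys),
            "merge_iterations": merge_iterations,
        }
    ]
    return config
```

For example, `build_unify_config("customer_360", ["email", "td_client_id"], "persistent_id", "td_claude_id")` yields a dict with a `persistent_ids` section and no `canonical_ids` key.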
### Step 4: Validate and Update Schema (CRITICAL)
I'll prevent first-run failures by:
1. Reading `unify.yml` to extract `merge_by_keys` list
2. Reading `queries/create_schema.sql` to check existing columns
3. Comparing required vs existing columns
4. Updating `create_schema.sql` if any columns are missing:
   - Add all keys from `merge_by_keys` as varchar
   - Add source, time, ingest_time columns
   - Update BOTH table definitions (main and tmp)
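
The comparison in this step can be sketched as below. The contents of `queries/create_schema.sql` are not shown in this document, so the sketch assumes plain `CREATE TABLE [IF NOT EXISTS] name (col type, ...)` statements; `table_columns` and `missing_columns` are hypothetical helpers built on a deliberately simple parser, not a full SQL grammar.

```python
import re

# Columns required in addition to the merge_by_keys columns (from Step 4).
REQUIRED_EXTRA = ["source", "time", "ingest_time"]

CREATE_TABLE = re.compile(
    r"create\s+table\s+(?:if\s+not\s+exists\s+)?([\w.${}]+)\s*\(",
    re.IGNORECASE,
)


def table_columns(sql: str) -> dict[str, list[str]]:
    """Map each CREATE TABLE name in sql to its lower-cased column names."""
    tables = {}
    for match in CREATE_TABLE.finditer(sql):
        depth, start, pieces = 1, match.end(), []
        body_end = len(sql)
        for pos in range(match.end(), len(sql)):
            ch = sql[pos]
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0:  # closing paren of the column list
                    body_end = pos
                    break
            elif ch == "," and depth == 1:  # separator between column definitions
                pieces.append(sql[start:pos])
                start = pos + 1
        pieces.append(sql[start:body_end])
        tables[match.group(1)] = [
            piece.split()[0].strip('"`').lower() for piece in pieces if piece.strip()
        ]
    return tables


def missing_columns(sql: str, merge_by_keys: list[str]) -> dict[str, list[str]]:
    """Return, per table, the required columns that its definition lacks."""
    required = list(merge_by_keys) + [c for c in REQUIRED_EXTRA if c not in merge_by_keys]
    return {
        table: [col for col in required if col.lower() not in cols]
        for table, cols in table_columns(sql).items()
    }
```

Because the result is keyed by table, a gap in either the main or the tmp definition shows up separately, which matches the requirement to update both.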
### Step 5: Generate unification/id_unification.dig
I'll create:
```yaml
timezone: UTC
_export:
  !include : config/environment.yml
  !include : config/src_prep_params.yml
+call_unification:
  http_call>: {REGIONAL_ENDPOINT_URL}
  headers:
    - authorization: ${secret:td.apikey}
    - content-type: application/json
  method: POST
  retry: true
  content_format: json
  content:
    run_persistent_ids: {true/false}  # ONLY if persistent_id
    run_canonical_ids: {true/false}   # ONLY if canonical_id
    run_enrichments: true
    run_master_tables: true
    full_refresh: {true/false}
    keep_debug_tables: true
    unification:
      !include : config/unify.yml
```
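
The method-specific flags in `content` follow directly from the two user choices. As a rough sketch, with `build_call_content` being a hypothetical helper and the `unification` entry (the included `unify.yml`) left out, the mapping could be expressed as:

```python
def build_call_content(id_method: str, update_strategy: str) -> dict:
    """Build the workflow_call `content` flags for one ID method and strategy.

    Only the flag for the selected method is emitted, so a persistent_id
    workflow never carries run_canonical_ids and vice versa.
    """
    if id_method not in ("persistent_id", "canonical_id"):
        raise ValueError("id_method must be 'persistent_id' or 'canonical_id'")
    if update_strategy not in ("incremental", "full_refresh"):
        raise ValueError("update_strategy must be 'incremental' or 'full_refresh'")
    method_flag = (
        "run_persistent_ids" if id_method == "persistent_id" else "run_canonical_ids"
    )
    return {
        method_flag: True,
        "run_enrichments": True,
        "run_master_tables": True,
        # Incremental runs process only new/updated records.
        "full_refresh": update_strategy == "full_refresh",
        "keep_debug_tables": True,
    }
```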
---
## Expected Output
### Files Created
```
unification/
├── config/
│   └── unify.yml              ✓ Dynamic configuration
├── queries/
│   └── create_schema.sql      ✓ Updated with all columns
└── id_unification.dig         ✓ Core unification workflow
```
### Example unify.yml (persistent_id method)
```yaml
name: customer_360
keys:
  - name: email
    invalid_texts: ['']
  - name: td_client_id
    invalid_texts: ['']
    valid_regexp: '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
tables:
  - database: ${client_short_name}_${stg}
    table: ${globals.unif_input_tbl}
    incremental_columns: [time]
    key_columns:
      - {column: email, key: email}
      - {column: td_client_id, key: td_client_id}
persistent_ids:
  - name: td_claude_id
    merge_by_keys: [email, td_client_id]
    merge_iterations: 15
```
### Example id_unification.dig (US region, incremental)
```yaml
timezone: UTC
_export:
  !include : config/environment.yml
  !include : config/src_prep_params.yml
+call_unification:
  http_call>: https://api-cdp.treasuredata.com/unifications/workflow_call
  headers:
    - authorization: ${secret:td.apikey}
    - content-type: application/json
  method: POST
  retry: true
  content_format: json
  content:
    run_persistent_ids: true
    run_enrichments: true
    run_master_tables: true
    full_refresh: false
    keep_debug_tables: true
    unification:
      !include : config/unify.yml
```
---
## Critical Requirements
### ✅ Dynamic Configuration
- All keys detected from `src_prep_params.yml`
- All column mappings from prep analysis
- Method-specific configuration (never both)
### ⚠️ Schema Completeness
- `create_schema.sql` MUST contain ALL columns from `merge_by_keys`
- Prevents "column not found" errors on first run
- Updates both main and tmp table definitions
### ⚠️ Config File Inclusion
- `id_unification.dig` MUST include BOTH config files in `_export`:
  - `environment.yml` - For `${client_short_name}_${stg}`
  - `src_prep_params.yml` - For `${globals.unif_input_tbl}`
### ⚠️ Regional Endpoint
- Must use exact URL for selected region
- Different endpoints for US, EU, Asia Pacific, Japan
---
## Validation Checklist
Before completing, I'll verify:
- [ ] unify.yml contains all detected key types
- [ ] key_columns section maps ALL alias_as columns
- [ ] Only ONE ID method section exists
- [ ] merge_by_keys includes ALL available keys
- [ ] **CRITICAL**: create_schema.sql contains ALL columns from merge_by_keys
- [ ] **CRITICAL**: Both table definitions updated (main and tmp)
- [ ] id_unification.dig has correct regional endpoint
- [ ] **CRITICAL**: _export includes BOTH config files
- [ ] Workflow flags match selected method only
- [ ] Proper TD YAML/DIG syntax
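
Several of these checks are mechanical and can be run against the parsed `unify.yml`. The sketch below is illustrative only: `validate_unify_config` is a hypothetical helper that takes the configuration as a dict (already loaded by a YAML parser) and returns a list of problems, covering the single-method, `merge_by_keys`, and `key_columns` items from the checklist.

```python
def validate_unify_config(config: dict) -> list[str]:
    """Return human-readable problems found in a parsed unify.yml dict."""
    problems = []

    # Exactly one of persistent_ids / canonical_ids may be present.
    sections = [s for s in ("persistent_ids", "canonical_ids") if config.get(s)]
    if len(sections) != 1:
        problems.append(f"expected exactly one ID method section, found {sections}")

    declared = {key["name"] for key in config.get("keys", [])}
    mapped = {
        mapping["key"]
        for table in config.get("tables", [])
        for mapping in table.get("key_columns", [])
    }

    # Every merge_by_keys entry must be a declared key.
    for section in sections:
        for id_def in config[section]:
            unknown = [k for k in id_def.get("merge_by_keys", []) if k not in declared]
            if unknown:
                problems.append(
                    f"{section}.{id_def.get('name')}: merge_by_keys not in keys: {unknown}"
                )

    # Every declared key should be mapped by at least one table column.
    unmapped = sorted(declared - mapped)
    if unmapped:
        problems.append(f"keys without a key_columns mapping: {unmapped}")
    return problems
```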
---
## Success Criteria
All generated files will:
- **TD-COMPLIANT** - Work without modification in TD
- **DYNAMICALLY CONFIGURED** - Based on actual prep analysis
- **METHOD-ACCURATE** - Exact implementation of selected method
- **REGIONALLY CORRECT** - Proper endpoint for region
- **SCHEMA-COMPLETE** - All required columns present
---
## Next Steps
After creating the core config, you can:
1. **Test unification workflow**: `dig run unification/id_unification.dig`
2. **Add enrichment**: Use `/cdp-unification:unify-setup` to add staging enrichment
3. **Create main orchestrator**: Combine prep + unification + enrichment
---
## Getting Started
**Ready to create core unification config?** Please provide:
1. **ID Method**:
   - Choose: `persistent_id` or `canonical_id`
   - Provide ID name: e.g., `td_claude_id`
2. **Update Strategy**:
   - Choose: `incremental` or `full_refresh`
3. **Regional Endpoint**:
   - Choose: `US`, `EU`, `Asia Pacific`, or `Japan`
4. **Unification Name**:
   - e.g., `customer_360`, `claude`
**Example:**
```
ID Method: persistent_id
ID Name: td_claude_id
Update Strategy: incremental
Region: US
Unification Name: customer_360
```
I'll call the **id-unification-creator** agent to generate all core unification files.

---
**Let's create your unification configuration!**