Files
gh-treasure-data-aps-claude…/commands/unify-create-prep.md
2025-11-30 09:02:49 +08:00

234 lines
6.6 KiB
Markdown

---
name: unify-create-prep
description: Generate prep table creation files and configuration for ID unification
---
# Create Prep Table Configuration
## Overview
I'll generate prep table creation files and configuration using the **dynamic-prep-creation** specialized agent.
This command creates **PRODUCTION-READY** prep table files:
- ⚠️ **EXACT TEMPLATES** - No modifications allowed
- ⚠️ **ZERO CHANGES** - Character-for-character accuracy
-**GENERIC FILES** - Reusable across all projects
-**DYNAMIC CONFIGURATION** - Adapts to your table structure
---
## What You Need to Provide
### 1. Table Analysis Results
If you've already run key extraction:
- Provide the list of **included tables** with their user identifier columns
- I can use the results from `/cdp-unification:unify-extract-keys`
OR provide directly:
- **Source tables**: database.table_name format
- **User identifier columns**: For each table, which columns contain identifiers
### 2. Client Configuration
- **Client short name**: Your client identifier (e.g., `mck`, `client_name`)
- **Database suffixes**:
- Source database suffix (default: `src`)
- Staging database suffix (default: `stg`)
- Lookup database (default: `config`)
### 3. Column Mappings
For each table, specify which columns to include and their unified aliases:
- **Email columns** → alias: `email`
- **Phone columns** → alias: `phone`
- **Customer ID columns** → alias: `customer_id`
- **TD Client ID** → alias: `td_client_id`
- **TD Global ID** → alias: `td_global_id`
---
## What I'll Do
### Step 1: Create Directory Structure
I'll create:
- `unification/config/` directory
- `unification/queries/` directory
### Step 2: Generate Generic Files (EXACT TEMPLATES)
I'll create these files with **ZERO MODIFICATIONS**:
**⚠️ `unification/dynmic_prep_creation.dig`** (EXACT filename - no 'a' in dynmic)
- Generic prep workflow
- Handles schema creation, table looping, and data insertion
- Uses variables from config files
**⚠️ `unification/queries/create_schema.sql`**
- Generic schema creation for unified input table
- Creates both main and tmp tables
**⚠️ `unification/queries/loop_on_tables.sql`**
- Complex production SQL for dynamic table processing
- Generates prep table SQL and unified input table SQL
- Handles incremental logic and deduplication
**⚠️ `unification/queries/unif_input_tbl.sql`**
- DSAR processing and data cleaning
- Exclusion list management for masked data
- Dynamic column detection and insertion
### Step 3: Generate Dynamic Configuration Files
**`unification/config/environment.yml`**
```yaml
client_short_name: {your_client_name}
src: src
stg: stg
gld: gld
lkup: references
```
**`unification/config/src_prep_params.yml`**
- Dynamic table configuration based on your table analysis
- Column mappings with unified aliases
- Prep table naming conventions
### Step 4: Dynamic Column Detection (CRITICAL)
For `unif_input_tbl.sql`, I'll:
1. Query Treasure Data schema: `information_schema.columns`
2. Detect all columns besides email, phone, source, ingest_time, time
3. Auto-generate column list for data_cleaned CTE
4. Replace placeholder with actual columns
---
## Expected Output
### Generic Files (EXACT - NO CHANGES)
```
unification/
├── dynmic_prep_creation.dig ⚠️ EXACT filename
├── queries/
│ ├── create_schema.sql ⚠️ EXACT content
│ ├── loop_on_tables.sql ⚠️ EXACT content
│ └── unif_input_tbl.sql ⚠️ WITH dynamic columns
```
### Dynamic Configuration Files
```
unification/config/
├── environment.yml ✓ Client-specific
└── src_prep_params.yml ✓ Table-specific
```
### Example src_prep_params.yml Structure
```yaml
globals:
unif_input_tbl: unif_input
prep_tbls:
- src_tbl: user_events
src_db: ${client_short_name}_${stg}
snk_db: ${client_short_name}_${stg}
snk_tbl: ${src_tbl}_prep
columns:
- col:
name: user_email
alias_as: email
- col:
name: td_client_id
alias_as: td_client_id
- src_tbl: customers
src_db: ${client_short_name}_${stg}
snk_db: ${client_short_name}_${stg}
snk_tbl: ${src_tbl}_prep
columns:
- col:
name: email
alias_as: email
- col:
name: customer_id
alias_as: customer_id
```
---
## Critical Requirements
### ⚠️ NEVER MODIFY GENERIC FILES
- **dynmic_prep_creation.dig**: EXACT template, character-for-character
- **create_schema.sql**: EXACT SQL, no changes
- **loop_on_tables.sql**: EXACT complex SQL, no modifications
- **unif_input_tbl.sql**: EXACT template + dynamic column replacement
### ✅ DYNAMIC CONFIGURATION ONLY
- **environment.yml**: Client-specific variables
- **src_prep_params.yml**: Table-specific mappings
### 🚨 CRITICAL FILENAME
- **MUST be "dynmic_prep_creation.dig"** (NO 'a' in dynmic)
- This is intentional - production systems expect this exact name
### 🚨 NO TIME COLUMN
- **NEVER ADD** `time` column to src_prep_params.yml
- Time is auto-generated by SQL template
- Only include actual identifier columns
---
## Validation Checklist
Before completing, I'll verify:
- [ ] File named "dynmic_prep_creation.dig" exists
- [ ] Content matches template character-for-character
- [ ] All variable placeholders preserved
- [ ] Queries folder contains exact SQL files
- [ ] Config folder contains YAML files
- [ ] Dynamic columns inserted in unif_input_tbl.sql
- [ ] No time column in src_prep_params.yml
- [ ] All directories created
---
## Success Criteria
All generated files will:
-**EXACT TEMPLATES** - Character-for-character accuracy
-**PRODUCTION-READY** - Deployable to TD without changes
-**DYNAMIC CONFIGURATION** - Adapts to table structure
-**DSAR COMPLIANT** - Includes exclusion list processing
-**INCREMENTAL PROCESSING** - Supports time-based updates
---
## Next Steps
After prep creation, you can:
1. **Test prep workflow**: `dig run unification/dynmic_prep_creation.dig`
2. **Create unification config**: Use `/cdp-unification:unify-create-config`
3. **Complete full setup**: Use `/cdp-unification:unify-setup`
---
## Getting Started
**Ready to create prep tables?** Please provide:
1. **Table list with columns**:
```
Table: analytics.user_events
Columns: user_email (email), td_client_id (td_client_id)
Table: crm.customers
Columns: email (email), customer_id (customer_id)
```
2. **Client configuration**:
```
Client short name: mck
```
I'll call the **dynamic-prep-creation** agent to generate all prep files with exact templates.
---
**Let's create your prep table configuration!**