Files
gh-treasure-data-aps-claude…/commands/unify-create-prep.md
2025-11-30 09:02:49 +08:00

6.6 KiB

name, description
name description
unify-create-prep Generate prep table creation files and configuration for ID unification

Create Prep Table Configuration

Overview

I'll generate prep table creation files and configuration using the dynamic-prep-creation specialized agent.

This command creates PRODUCTION-READY prep table files:

  • ⚠️ EXACT TEMPLATES - No modifications allowed
  • ⚠️ ZERO CHANGES - Character-for-character accuracy
  • GENERIC FILES - Reusable across all projects
  • DYNAMIC CONFIGURATION - Adapts to your table structure

What You Need to Provide

1. Table Analysis Results

If you've already run key extraction:

  • Provide the list of included tables with their user identifier columns
  • I can use the results from /cdp-unification:unify-extract-keys

OR provide directly:

  • Source tables: database.table_name format
  • User identifier columns: For each table, which columns contain identifiers

2. Client Configuration

  • Client short name: Your client identifier (e.g., mck, client_name)
  • Database suffixes:
    • Source database suffix (default: src)
    • Staging database suffix (default: stg)
    • Lookup database (default: config)

3. Column Mappings

For each table, specify which columns to include and their unified aliases:

  • Email columns → alias: email
  • Phone columns → alias: phone
  • Customer ID columns → alias: customer_id
  • TD Client ID → alias: td_client_id
  • TD Global ID → alias: td_global_id

What I'll Do

Step 1: Create Directory Structure

I'll create:

  • unification/config/ directory
  • unification/queries/ directory

Step 2: Generate Generic Files (EXACT TEMPLATES)

I'll create these files with ZERO MODIFICATIONS:

⚠️ unification/dynmic_prep_creation.dig (EXACT filename - no 'a' in dynmic)

  • Generic prep workflow
  • Handles schema creation, table looping, and data insertion
  • Uses variables from config files

⚠️ unification/queries/create_schema.sql

  • Generic schema creation for unified input table
  • Creates both main and tmp tables

⚠️ unification/queries/loop_on_tables.sql

  • Complex production SQL for dynamic table processing
  • Generates prep table SQL and unified input table SQL
  • Handles incremental logic and deduplication

⚠️ unification/queries/unif_input_tbl.sql

  • DSAR processing and data cleaning
  • Exclusion list management for masked data
  • Dynamic column detection and insertion

Step 3: Generate Dynamic Configuration Files

unification/config/environment.yml

client_short_name: {your_client_name}
src: src
stg: stg
gld: gld
lkup: references

unification/config/src_prep_params.yml

  • Dynamic table configuration based on your table analysis
  • Column mappings with unified aliases
  • Prep table naming conventions

Step 4: Dynamic Column Detection (CRITICAL)

For unif_input_tbl.sql, I'll:

  1. Query Treasure Data schema: information_schema.columns
  2. Detect all columns besides email, phone, source, ingest_time, time
  3. Auto-generate column list for data_cleaned CTE
  4. Replace placeholder with actual columns

Expected Output

Generic Files (EXACT - NO CHANGES)

unification/
├── dynmic_prep_creation.dig          ⚠️ EXACT filename
├── queries/
│   ├── create_schema.sql             ⚠️ EXACT content
│   ├── loop_on_tables.sql            ⚠️ EXACT content
│   └── unif_input_tbl.sql            ⚠️ WITH dynamic columns

Dynamic Configuration Files

unification/config/
├── environment.yml                   ✓ Client-specific
└── src_prep_params.yml              ✓ Table-specific

Example src_prep_params.yml Structure

globals:
  unif_input_tbl: unif_input

prep_tbls:
  - src_tbl: user_events
    src_db: ${client_short_name}_${stg}
    snk_db: ${client_short_name}_${stg}
    snk_tbl: ${src_tbl}_prep
    columns:
      - col:
        name: user_email
        alias_as: email
      - col:
        name: td_client_id
        alias_as: td_client_id

  - src_tbl: customers
    src_db: ${client_short_name}_${stg}
    snk_db: ${client_short_name}_${stg}
    snk_tbl: ${src_tbl}_prep
    columns:
      - col:
        name: email
        alias_as: email
      - col:
        name: customer_id
        alias_as: customer_id

Critical Requirements

⚠️ NEVER MODIFY GENERIC FILES

  • dynmic_prep_creation.dig: EXACT template, character-for-character
  • create_schema.sql: EXACT SQL, no changes
  • loop_on_tables.sql: EXACT complex SQL, no modifications
  • unif_input_tbl.sql: EXACT template + dynamic column replacement

DYNAMIC CONFIGURATION ONLY

  • environment.yml: Client-specific variables
  • src_prep_params.yml: Table-specific mappings

🚨 CRITICAL FILENAME

  • MUST be "dynmic_prep_creation.dig" (NO 'a' in dynmic)
  • This is intentional - production systems expect this exact name

🚨 NO TIME COLUMN

  • NEVER ADD time column to src_prep_params.yml
  • Time is auto-generated by SQL template
  • Only include actual identifier columns

Validation Checklist

Before completing, I'll verify:

  • File named "dynmic_prep_creation.dig" exists
  • Content matches template character-for-character
  • All variable placeholders preserved
  • Queries folder contains exact SQL files
  • Config folder contains YAML files
  • Dynamic columns inserted in unif_input_tbl.sql
  • No time column in src_prep_params.yml
  • All directories created

Success Criteria

All generated files will:

  • EXACT TEMPLATES - Character-for-character accuracy
  • PRODUCTION-READY - Deployable to TD without changes
  • DYNAMIC CONFIGURATION - Adapts to table structure
  • DSAR COMPLIANT - Includes exclusion list processing
  • INCREMENTAL PROCESSING - Supports time-based updates

Next Steps

After prep creation, you can:

  1. Test prep workflow: dig run unification/dynmic_prep_creation.dig
  2. Create unification config: Use /cdp-unification:unify-create-config
  3. Complete full setup: Use /cdp-unification:unify-setup

Getting Started

Ready to create prep tables? Please provide:

  1. Table list with columns:

    Table: analytics.user_events
    Columns: user_email (email), td_client_id (td_client_id)
    
    Table: crm.customers
    Columns: email (email), customer_id (customer_id)
    
  2. Client configuration:

    Client short name: mck
    

I'll call the dynamic-prep-creation agent to generate all prep files with exact templates.


Let's create your prep table configuration!