Files
gh-treasure-data-aps-claude…/commands/transform-table.md
2025-11-30 09:02:46 +08:00

5.9 KiB

name, description
name description
transform-table Transform a single database table to staging format with data quality improvements, PII handling, and JSON extraction

Transform Single Table to Staging

⚠️ CRITICAL: This command enforces strict sub-agent delegation

I'll help you transform a database table to staging format using the appropriate staging-transformer sub-agent (Presto/Trino or Hive).


Required Information

Please provide the following details:

1. Source Table

  • Database Name: Source database (e.g., client_src, demo_db)
  • Table Name: Source table name (e.g., customer_profiles_histunion)

2. Target Configuration

  • Staging Database: Target database (default: client_stg)
  • Lookup Database: Reference database for rules (default: client_config)

3. SQL Engine (Optional)

  • Engine: Choose one:
    • presto or trino - Presto/Trino SQL engine (default)
    • hive - Hive SQL engine
    • If not specified, will default to Presto/Trino

4. Transformation Requirements (Optional)

  • Deduplication: Required? (will check client_config.staging_trnsfrm_rules)
  • JSON Columns: Will auto-detect and process
  • Join Logic: Any joins needed? (will check additional_rules)

What I'll Do

Step 1: Detect SQL Engine

I will determine the appropriate sub-agent:

  • Presto/Trino keywordsstaging-transformer-presto
  • Hive keywordsstaging-transformer-hive
  • No specificationstaging-transformer-presto (default)

Step 2: Delegate to Specialized Agent

I will invoke the appropriate staging-transformer agent with:

  • Complete table transformation context
  • All mandatory requirements (13 rules)
  • Engine-specific SQL generation
  • Full compliance validation

Step 3: Sub-Agent Will Execute

The specialized agent will:

  1. Validate table existence (MANDATORY first step)
  2. Analyze metadata (columns, types, data samples)
  3. Check configuration (deduplication rules, additional rules)
  4. Detect JSON columns (automatic processing)
  5. Generate SQL files:
    • staging/init_queries/{source_db}_{table_name}_init.sql (Presto)
    • staging/queries/{source_db}_{table_name}.sql (Presto)
    • staging/queries/{source_db}_{table_name}_upsert.sql (if dedup, Presto)
    • OR staging_hive/queries/{source_db}_{table_name}.sql (Hive)
  6. Update configuration: staging/config/src_params.yml or staging_hive/config/src_params.yml
  7. Create/verify DIG file: staging/staging_transformation.dig or staging_hive/staging_hive.dig
  8. Execute git workflow: Commit, branch, push, PR creation

Quality Assurance

The sub-agent ensures complete compliance with all requirements:

Column Limit Management (max 200 columns) JSON Detection & Extraction (automatic) Date Processing (4 outputs per date column) Email/Phone Validation (with hashing) String Standardization (UPPER, TRIM, NULL handling) Deduplication Logic (if configured) Join Processing (if specified) Incremental Processing (state tracking) SQL File Creation (init, incremental, upsert) DIG File Management (conditional creation) Configuration Update (src_params.yml) Git Workflow (complete automation) Treasure Data Compatibility (VARCHAR/BIGINT timestamps)


Output Files

For Presto/Trino Engine:

  1. staging/init_queries/{source_db}_{table_name}_init.sql - Initial load SQL
  2. staging/queries/{source_db}_{table_name}.sql - Incremental SQL
  3. staging/queries/{source_db}_{table_name}_upsert.sql - Upsert SQL (if dedup)
  4. staging/config/src_params.yml - Updated configuration
  5. staging/staging_transformation.dig - Workflow (created if not exists)

For Hive Engine:

  1. staging_hive/queries/{source_db}_{table_name}.sql - Combined SQL
  2. staging_hive/config/src_params.yml - Updated configuration
  3. staging_hive/staging_hive.dig - Workflow (created if not exists)
  4. staging_hive/queries/get_max_time.sql - Template (created if not exists)
  5. staging_hive/queries/get_stg_rows_for_delete.sql - Template (created if not exists)

Plus:

  • Git commit with comprehensive message
  • Pull request with transformation summary
  • Validation report

Example Usage

Example 1: Presto Engine (Default)

User: Transform table client_src.customer_profiles_histunion
→ Engine: Presto (default)
→ Sub-agent: staging-transformer-presto
→ Output: staging/ directory files

Example 2: Hive Engine (Explicit)

User: Transform table client_src.klaviyo_events_histunion using Hive
→ Engine: Hive
→ Sub-agent: staging-transformer-hive
→ Output: staging_hive/ directory files

Example 3: With Custom Databases

User: Transform demo_db.orders_histunion
      Use demo_db_stg as staging database
      Use client_config for lookup
→ Engine: Presto (default)
→ Custom databases applied

Next Steps After Transformation

  1. Review generated files:

    ls -l staging/queries/
    ls -l staging/init_queries/
    cat staging/config/src_params.yml
    
  2. Review Pull Request:

    • Check transformation summary
    • Verify all validation gates passed
    • Review generated SQL
  3. Test the workflow:

    cd staging
    td wf push
    td wf run staging_transformation.dig
    
  4. Monitor execution:

    SELECT * FROM client_config.inc_log
    WHERE table_name = '{your_table}'
    ORDER BY inc_value DESC
    LIMIT 1
    

Production-Ready Guarantee

All transformations will:

  • Work the first time
  • Follow consistent patterns
  • Include complete error handling
  • Include comprehensive data quality
  • Be maintainable and documented
  • Match production standards exactly

Ready to proceed? Please provide the source database and table name, and I'll delegate to the appropriate staging-transformer agent for complete processing.