zhongwei/gh-treasure-data-aps-claude-tools-plugins-cdp-staging

Files

Zhongwei Li b81339e588 Initial commit

2025-11-30 09:02:46 +08:00

5.9 KiB

Raw Permalink Blame History

name, description

name	description
transform-table	Transform a single database table to staging format with data quality improvements, PII handling, and JSON extraction

Transform Single Table to Staging

⚠️ CRITICAL: This command enforces strict sub-agent delegation

I'll help you transform a database table to staging format using the appropriate staging-transformer sub-agent (Presto/Trino or Hive).

Required Information

Please provide the following details:

1. Source Table

Database Name: Source database (e.g., client_src, demo_db)
Table Name: Source table name (e.g., customer_profiles_histunion)

2. Target Configuration

Staging Database: Target database (default: client_stg)
Lookup Database: Reference database for rules (default: client_config)

3. SQL Engine (Optional)

Engine: Choose one:
- presto or trino - Presto/Trino SQL engine (default)
- hive - Hive SQL engine
- If not specified, will default to Presto/Trino

4. Transformation Requirements (Optional)

Deduplication: Required? (will check client_config.staging_trnsfrm_rules)
JSON Columns: Will auto-detect and process
Join Logic: Any joins needed? (will check additional_rules)

What I'll Do

Step 1: Detect SQL Engine

I will determine the appropriate sub-agent:

Presto/Trino keywords → staging-transformer-presto
Hive keywords → staging-transformer-hive
No specification → staging-transformer-presto (default)

Step 2: Delegate to Specialized Agent

I will invoke the appropriate staging-transformer agent with:

Complete table transformation context
All mandatory requirements (13 rules)
Engine-specific SQL generation
Full compliance validation

Step 3: Sub-Agent Will Execute

The specialized agent will:

Validate table existence (MANDATORY first step)
Analyze metadata (columns, types, data samples)
Check configuration (deduplication rules, additional rules)
Detect JSON columns (automatic processing)
Generate SQL files:
- staging/init_queries/{source_db}_{table_name}_init.sql (Presto)
- staging/queries/{source_db}_{table_name}.sql (Presto)
- staging/queries/{source_db}_{table_name}_upsert.sql (if dedup, Presto)
- OR staging_hive/queries/{source_db}_{table_name}.sql (Hive)
Update configuration: staging/config/src_params.yml or staging_hive/config/src_params.yml
Create/verify DIG file: staging/staging_transformation.dig or staging_hive/staging_hive.dig
Execute git workflow: Commit, branch, push, PR creation

Quality Assurance

The sub-agent ensures complete compliance with all requirements:

✅ Column Limit Management (max 200 columns) ✅ JSON Detection & Extraction (automatic) ✅ Date Processing (4 outputs per date column) ✅ Email/Phone Validation (with hashing) ✅ String Standardization (UPPER, TRIM, NULL handling) ✅ Deduplication Logic (if configured) ✅ Join Processing (if specified) ✅ Incremental Processing (state tracking) ✅ SQL File Creation (init, incremental, upsert) ✅ DIG File Management (conditional creation) ✅ Configuration Update (src_params.yml) ✅ Git Workflow (complete automation) ✅ Treasure Data Compatibility (VARCHAR/BIGINT timestamps)

Output Files

For Presto/Trino Engine:

staging/init_queries/{source_db}_{table_name}_init.sql - Initial load SQL
staging/queries/{source_db}_{table_name}.sql - Incremental SQL
staging/queries/{source_db}_{table_name}_upsert.sql - Upsert SQL (if dedup)
staging/config/src_params.yml - Updated configuration
staging/staging_transformation.dig - Workflow (created if not exists)

For Hive Engine:

staging_hive/queries/{source_db}_{table_name}.sql - Combined SQL
staging_hive/config/src_params.yml - Updated configuration
staging_hive/staging_hive.dig - Workflow (created if not exists)
staging_hive/queries/get_max_time.sql - Template (created if not exists)
staging_hive/queries/get_stg_rows_for_delete.sql - Template (created if not exists)

Plus:

Git commit with comprehensive message
Pull request with transformation summary
Validation report

Example Usage

Example 1: Presto Engine (Default)

User: Transform table client_src.customer_profiles_histunion
→ Engine: Presto (default)
→ Sub-agent: staging-transformer-presto
→ Output: staging/ directory files

Example 2: Hive Engine (Explicit)

User: Transform table client_src.klaviyo_events_histunion using Hive
→ Engine: Hive
→ Sub-agent: staging-transformer-hive
→ Output: staging_hive/ directory files

Example 3: With Custom Databases

User: Transform demo_db.orders_histunion
      Use demo_db_stg as staging database
      Use client_config for lookup
→ Engine: Presto (default)
→ Custom databases applied

Next Steps After Transformation

Review generated files:

ls -l staging/queries/
ls -l staging/init_queries/
cat staging/config/src_params.yml

Review Pull Request:
- Check transformation summary
- Verify all validation gates passed
- Review generated SQL

Test the workflow:

cd staging
td wf push
td wf run staging_transformation.dig

Monitor execution:

SELECT * FROM client_config.inc_log
WHERE table_name = '{your_table}'
ORDER BY inc_value DESC
LIMIT 1

Production-Ready Guarantee

All transformations will:

✅ Work the first time
✅ Follow consistent patterns
✅ Include complete error handling
✅ Include comprehensive data quality
✅ Be maintainable and documented
✅ Match production standards exactly

Ready to proceed? Please provide the source database and table name, and I'll delegate to the appropriate staging-transformer agent for complete processing.

5.9 KiB Raw Permalink Blame History