Initial commit

Zhongwei Li
2025-11-30 09:02:46 +08:00
commit b81339e588
8 changed files with 2678 additions and 0 deletions

commands/transform-table.md Normal file

@@ -0,0 +1,186 @@
---
name: transform-table
description: Transform a single database table to staging format with data quality improvements, PII handling, and JSON extraction
---
# Transform Single Table to Staging
## ⚠️ CRITICAL: This command enforces strict sub-agent delegation
I'll help you transform a database table to staging format using the appropriate staging-transformer sub-agent (Presto/Trino or Hive).
---
## Required Information
Please provide the following details:
### 1. Source Table
- **Database Name**: Source database (e.g., `client_src`, `demo_db`)
- **Table Name**: Source table name (e.g., `customer_profiles_histunion`)
### 2. Target Configuration
- **Staging Database**: Target database (default: `client_stg`)
- **Lookup Database**: Reference database for rules (default: `client_config`)
### 3. SQL Engine (Optional)
- **Engine**: Choose one:
  - `presto` or `trino` - Presto/Trino SQL engine (default)
  - `hive` - Hive SQL engine
- If not specified, will default to Presto/Trino
### 4. Transformation Requirements (Optional)
- **Deduplication**: Required? (will check `client_config.staging_trnsfrm_rules`)
- **JSON Columns**: Will auto-detect and process
- **Join Logic**: Any joins needed? (will check `additional_rules`)
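
The inputs and defaults listed above can be pictured as a small record. The following is only a sketch: the class name `TransformRequest` and the helper `from_qualified_name` are hypothetical and do not come from the command itself; only the default values (`client_stg`, `client_config`, `presto`) are taken from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransformRequest:
    """Hypothetical container for the inputs this command asks for."""

    source_db: str                    # e.g. "client_src"
    table_name: str                   # e.g. "customer_profiles_histunion"
    staging_db: str = "client_stg"    # documented default
    lookup_db: str = "client_config"  # documented default
    engine: str = "presto"            # documented default

    @classmethod
    def from_qualified_name(cls, qualified: str, **overrides) -> "TransformRequest":
        """Split 'database.table' into its two parts and reject anything else."""
        parts = qualified.strip().split(".")
        if len(parts) != 2 or not all(parts):
            raise ValueError(f"expected 'database.table', got {qualified!r}")
        return cls(source_db=parts[0], table_name=parts[1], **overrides)


# Mirrors the custom-database scenario: only the staging database is overridden.
req = TransformRequest.from_qualified_name(
    "demo_db.orders_histunion", staging_db="demo_db_stg"
)
print(req)
```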
---
## What I'll Do
### Step 1: Detect SQL Engine
I will determine the appropriate sub-agent:
- **Presto/Trino keywords** → `staging-transformer-presto`
- **Hive keywords** → `staging-transformer-hive`
- **No specification** → `staging-transformer-presto` (default)
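
As a rough illustration of the routing rule above, the sketch below maps a free-text request to a sub-agent name. The function name `select_sub_agent` and the whole-word match on "hive" are assumptions made for this example; the command does not define which keywords count as engine keywords or how conflicting keywords are resolved.

```python
import re

PRESTO_AGENT = "staging-transformer-presto"
HIVE_AGENT = "staging-transformer-hive"


def select_sub_agent(request_text: str) -> str:
    """Return the sub-agent name for a free-text request.

    Assumption: a whole-word, case-insensitive "hive" selects the Hive agent.
    Everything else, including an explicit "presto" or "trino", falls through
    to the Presto/Trino agent, which is the documented default.
    """
    if re.search(r"\bhive\b", request_text, flags=re.IGNORECASE):
        return HIVE_AGENT
    return PRESTO_AGENT


print(select_sub_agent("Transform table client_src.klaviyo_events_histunion using Hive"))
print(select_sub_agent("Transform table client_src.customer_profiles_histunion"))
```

The word-boundary match is a deliberate choice in this sketch so that a table name such as `archive_hive_log` does not switch engines by accident.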
### Step 2: Delegate to Specialized Agent
I will invoke the appropriate staging-transformer agent with:
- Complete table transformation context
- All mandatory requirements (13 rules)
- Engine-specific SQL generation
- Full compliance validation
### Step 3: Sub-Agent Will Execute
The specialized agent will:
1. **Validate table existence** (MANDATORY first step)
2. **Analyze metadata** (columns, types, data samples)
3. **Check configuration** (deduplication rules, additional rules)
4. **Detect JSON columns** (automatic processing)
5. **Generate SQL files**:
   - `staging/init_queries/{source_db}_{table_name}_init.sql` (Presto)
   - `staging/queries/{source_db}_{table_name}.sql` (Presto)
   - `staging/queries/{source_db}_{table_name}_upsert.sql` (if dedup, Presto)
   - OR `staging_hive/queries/{source_db}_{table_name}.sql` (Hive)
6. **Update configuration**: `staging/config/src_params.yml` or `staging_hive/config/src_params.yml`
7. **Create/verify DIG file**: `staging/staging_transformation.dig` or `staging_hive/staging_hive.dig`
8. **Execute git workflow**: Commit, branch, push, PR creation
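
The file-naming pattern in step 5 can be checked mechanically. This is a minimal sketch, assuming a hypothetical helper `expected_sql_files`; the path templates are copied from the list above, while the function itself is not part of the sub-agent.

```python
from typing import List


def expected_sql_files(
    source_db: str, table_name: str, engine: str = "presto", dedup: bool = False
) -> List[str]:
    """List the per-table SQL files the sub-agent is documented to generate."""
    stem = f"{source_db}_{table_name}"
    if engine == "hive":
        # Hive uses a single combined query file.
        return [f"staging_hive/queries/{stem}.sql"]
    if engine not in ("presto", "trino"):
        raise ValueError(f"unsupported engine: {engine!r}")
    files = [
        f"staging/init_queries/{stem}_init.sql",  # initial load
        f"staging/queries/{stem}.sql",            # incremental load
    ]
    if dedup:
        # The upsert query exists only when deduplication is configured.
        files.append(f"staging/queries/{stem}_upsert.sql")
    return files


for path in expected_sql_files("client_src", "customer_profiles_histunion", dedup=True):
    print(path)
```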
---
## Quality Assurance
The sub-agent ensures complete compliance with all requirements:
- **Column Limit Management** (max 200 columns)
- **JSON Detection & Extraction** (automatic)
- **Date Processing** (4 outputs per date column)
- **Email/Phone Validation** (with hashing)
- **String Standardization** (UPPER, TRIM, NULL handling)
- **Deduplication Logic** (if configured)
- **Join Processing** (if specified)
- **Incremental Processing** (state tracking)
- **SQL File Creation** (init, incremental, upsert)
- **DIG File Management** (conditional creation)
- **Configuration Update** (src_params.yml)
- **Git Workflow** (complete automation)
- **Treasure Data Compatibility** (VARCHAR/BIGINT timestamps)
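
To make two of these rules concrete, here is a row-level sketch of string standardization and email hashing. The sub-agent generates SQL, so this Python version only illustrates the intended behavior. Several details are assumptions not stated in this document: the set of placeholder strings treated as NULL, the loose email pattern, and the choice of SHA-256 over a trimmed, lower-cased address.

```python
import hashlib
import re
from typing import Optional

# Assumption: these placeholder strings are treated as missing values.
NULL_TOKENS = {"", "NULL", "N/A", "NONE"}

# Assumption: a deliberately loose shape check, not a full RFC 5322 validator.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def standardize_string(value: Optional[str]) -> Optional[str]:
    """TRIM and UPPER a string, mapping empty or placeholder values to None."""
    if value is None:
        return None
    cleaned = value.strip().upper()
    return None if cleaned in NULL_TOKENS else cleaned


def hash_email(value: Optional[str]) -> Optional[str]:
    """Return a SHA-256 hex digest of a valid email, or None if it fails the check."""
    if value is None:
        return None
    email = value.strip().lower()
    if not EMAIL_RE.match(email):
        return None
    return hashlib.sha256(email.encode("utf-8")).hexdigest()


print(standardize_string("  acme corp "))
print(hash_email("not-an-email"))
```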
---
## Output Files
### For Presto/Trino Engine:
1. `staging/init_queries/{source_db}_{table_name}_init.sql` - Initial load SQL
2. `staging/queries/{source_db}_{table_name}.sql` - Incremental SQL
3. `staging/queries/{source_db}_{table_name}_upsert.sql` - Upsert SQL (if dedup)
4. `staging/config/src_params.yml` - Updated configuration
5. `staging/staging_transformation.dig` - Workflow (created if not exists)
### For Hive Engine:
1. `staging_hive/queries/{source_db}_{table_name}.sql` - Combined SQL
2. `staging_hive/config/src_params.yml` - Updated configuration
3. `staging_hive/staging_hive.dig` - Workflow (created if not exists)
4. `staging_hive/queries/get_max_time.sql` - Template (created if not exists)
5. `staging_hive/queries/get_stg_rows_for_delete.sql` - Template (created if not exists)
### Plus:
- Git commit with comprehensive message
- Pull request with transformation summary
- Validation report
---
## Example Usage
### Example 1: Presto Engine (Default)
```
User: Transform table client_src.customer_profiles_histunion
→ Engine: Presto (default)
→ Sub-agent: staging-transformer-presto
→ Output: staging/ directory files
```
### Example 2: Hive Engine (Explicit)
```
User: Transform table client_src.klaviyo_events_histunion using Hive
→ Engine: Hive
→ Sub-agent: staging-transformer-hive
→ Output: staging_hive/ directory files
```
### Example 3: With Custom Databases
```
User: Transform demo_db.orders_histunion
Use demo_db_stg as staging database
Use client_config for lookup
→ Engine: Presto (default)
→ Custom databases applied
```
---
## Next Steps After Transformation
1. **Review generated files**:
```bash
ls -l staging/queries/
ls -l staging/init_queries/
cat staging/config/src_params.yml
```
2. **Review Pull Request**:
   - Check transformation summary
   - Verify all validation gates passed
   - Review generated SQL
3. **Test the workflow**:
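```bash
cd staging
# td wf push expects a project name; <project_name> is a placeholder to replace
td wf push <project_name>
# Runs the workflow locally from the .dig file
td wf run staging_transformation.dig
```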
4. **Monitor execution**:
```sql
SELECT * FROM client_config.inc_log
WHERE table_name = '{your_table}'
ORDER BY inc_value DESC
LIMIT 1
```
---
## Production-Ready Guarantee
All transformations will:
- ✅ Work the first time
- ✅ Follow consistent patterns
- ✅ Include complete error handling
- ✅ Include comprehensive data quality
- ✅ Be maintainable and documented
- ✅ Match production standards exactly
---
**Ready to proceed? Please provide the source database and table name, and I'll delegate to the appropriate staging-transformer agent for complete processing.**