zhongwei/gh-treasure-data-aps-claude-tools-plugins-cdp-hybrid-idu

Files

Zhongwei Li 515e7bf6be Initial commit

2025-11-30 09:02:39 +08:00

9.6 KiB

Raw Permalink Blame History

name, description

name	description
hybrid-execute-snowflake	Execute Snowflake ID unification workflow with convergence detection and monitoring

Execute Snowflake ID Unification Workflow

Overview

Execute your generated Snowflake SQL workflow with intelligent convergence detection, real-time monitoring, and interactive error handling. This command orchestrates the complete unification process from graph creation to master table generation.

What You Need

Required Inputs

SQL Directory: Path to generated SQL files (e.g., snowflake_sql/unify/)
Account: Snowflake account name (e.g., myaccount from myaccount.snowflakecomputing.com)
User: Snowflake username
Database: Target database name
Schema: Target schema name
Warehouse: Compute warehouse name (defaults to COMPUTE_WH)

Authentication

Option 1: Password

Can be provided as argument or via environment variable SNOWFLAKE_PASSWORD via environment file (.env) SNOWFLAKE_PASSWORD
Will prompt if not provided

Option 2: SSO (externalbrowser)

Opens browser for authentication
No password required

Option 3: Key-Pair

Private key path via SNOWFLAKE_PRIVATE_KEY_PATH
Passphrase via SNOWFLAKE_PRIVATE_KEY_PASSPHRASE

What I'll Do

Step 1: Connection Setup

Connect to your Snowflake account
Validate credentials and permissions
Set database and schema context
Verify SQL directory exists
Activate warehouse

Step 2: Execution Plan

Display execution plan with:

All SQL files in execution order
File types (Setup, Loop Iteration, Enrichment, Master Table, etc.)
Estimated steps and dependencies

Step 3: SQL Execution

I'll call the snowflake-workflow-executor agent to:

Execute SQL files in proper sequence
Skip loop iteration files (handled separately)
Monitor progress with real-time feedback
Track row counts and execution times

Step 4: Unify Loop with Convergence Detection

Intelligent Loop Execution:

Iteration 1:
  ✓ Execute unify SQL
  • Check convergence: 1500 records updated
  → Continue to iteration 2

Iteration 2:
  ✓ Execute unify SQL
  • Check convergence: 450 records updated
  → Continue to iteration 3

Iteration 3:
  ✓ Execute unify SQL
  • Check convergence: 0 records updated
  ✓ CONVERGED! Stop loop

Features:

Runs until convergence (updated_count = 0)
Maximum 30 iterations safety limit
Creates alias table (loop_final) for downstream processing

Step 5: Post-Loop Processing

Execute canonicalization step
Generate result statistics
Enrich source tables with canonical IDs
Create master tables
Generate metadata and lookup tables

Step 6: Final Report

Provide:

Total execution time
Files processed successfully
Convergence statistics
Final table row counts
Next steps and recommendations

Command Usage

Interactive Mode (Recommended)

/cdp-hybrid-idu:hybrid-execute-snowflake

I'll prompt you for:
- SQL directory path
- Snowflake account name
- Username
- Database and schema
- Warehouse name
- Authentication method

Advanced Mode

Provide all parameters upfront:

SQL directory: snowflake_sql/unify/
Account: myaccount
User: myuser
Database: my_database
Schema: my_schema
Warehouse: COMPUTE_WH
Password: (will prompt if not in environment)

Execution Features

1. Convergence Detection

Algorithm:

SELECT COUNT(*) as updated_count FROM (
    SELECT leader_ns, leader_id, follower_ns, follower_id
    FROM current_iteration
    EXCEPT
    SELECT leader_ns, leader_id, follower_ns, follower_id
    FROM previous_iteration
) diff

Stops when: updated_count = 0

2. Interactive Error Handling

If an error occurs:

✗ File: 04_unify_loop_iteration_01.sql
Error: Table not found: source_table

Continue with remaining files? (y/n):

You can choose to:

Continue: Skip failed file, continue with rest
Stop: Halt execution for investigation

3. Real-Time Monitoring

Track progress with:

✓ Completed steps (green)
• Progress indicators (cyan)
✗ Failed steps (red)
⚠ Warnings (yellow)
Row counts and execution times

4. Alias Table Creation

After convergence, creates:

CREATE OR REPLACE TABLE database.schema.unified_id_graph_unify_loop_final
AS SELECT * FROM database.schema.unified_id_graph_unify_loop_3

This allows downstream SQL to reference loop_final regardless of actual iteration count.

Technical Details

Python Script Execution

The agent executes:

python3 scripts/snowflake/snowflake_sql_executor.py \
    snowflake_sql/unify/ \
    --account myaccount \
    --user myuser \
    --database my_database \
    --schema my_schema \
    --warehouse COMPUTE_WH

Execution Order

Setup Phase (01-03):
- Create graph table (loop_0)
- Extract and merge identities
- Generate source statistics
Unification Loop (04):
- Run iterations until convergence
- Check after EVERY iteration
- Stop when updated_count = 0
- Create loop_final alias
Canonicalization (05):
- Create canonical ID lookup
- Create keys and tables metadata
- Rename final graph table
Statistics (06):
- Generate result key statistics
- Create histograms
- Calculate coverage metrics
Enrichment (10-19):
- Add canonical IDs to source tables
- Create enriched_* tables
Master Tables (20-29):
- Aggregate attributes
- Apply priority rules
- Create unified customer profiles
Metadata (30-39):
- Unification metadata
- Filter lookup tables
- Column lookup tables

Connection Management

Establishes single connection for entire workflow
Uses connection pooling for efficiency
Automatic reconnection on timeout
Proper cleanup on completion or error

Example Execution

Input

SQL directory: snowflake_sql/unify/
Account: myorg-myaccount
User: analytics_user
Database: customer_data
Schema: id_unification
Warehouse: LARGE_WH

Output

✓ Connected to Snowflake: myorg-myaccount
• Using database: customer_data, schema: id_unification

Starting Snowflake SQL Execution
• Database: customer_data
• Schema: id_unification

Executing: 01_create_graph.sql
✓ 01_create_graph.sql: Executed successfully

Executing: 02_extract_merge.sql
✓ 02_extract_merge.sql: Executed successfully
• Rows affected: 125000

Executing: 03_source_key_stats.sql
✓ 03_source_key_stats.sql: Executed successfully

Executing Unify Loop Before Canonicalization

--- Iteration 1 ---
✓ Iteration 1 completed
• Rows processed: 125000
• Updated records: 1500

--- Iteration 2 ---
✓ Iteration 2 completed
• Rows processed: 125000
• Updated records: 450

--- Iteration 3 ---
✓ Iteration 3 completed
• Rows processed: 125000
• Updated records: 0
✓ Loop converged after 3 iterations

• Creating alias table for final iteration
✓ Alias table 'unified_id_graph_unify_loop_final' created

Executing: 05_canonicalize.sql
✓ 05_canonicalize.sql: Executed successfully

[... continues with enrichment, master tables, metadata ...]

Execution Complete
• Files processed: 18/18
• Final unified_id_lookup rows: 98,500

• Disconnected from Snowflake

Monitoring and Troubleshooting

Check Execution Progress

During execution, you can monitor:

Snowflake query history
Table sizes and row counts
Warehouse utilization
Execution logs

Common Issues

Issue: Connection timeout Solution: Check network access, verify credentials, ensure warehouse is running

Issue: Table not found Solution: Verify database/schema permissions, check source table names in YAML

Issue: Loop doesn't converge Solution: Check data quality, increase max_iterations, review key validation rules

Issue: Warehouse suspended Solution: Ensure auto-resume is enabled, manually resume warehouse if needed

Issue: Permission denied Solution: Verify database/schema permissions, check role assignments

Performance Optimization

Use larger warehouse for faster execution (L, XL, 2XL, etc.)
Enable multi-cluster warehouse for concurrency
Use clustering keys on frequently joined columns
Monitor query profiles for optimization opportunities

Post-Execution Validation

DO NOT RUN THESE VALIDATION. JUST PRESENT TO USER TO RUN ON SNOWFLAKE

Check Coverage

SELECT
    COUNT(*) as total_records,
    COUNT(unified_id) as records_with_id,
    COUNT(unified_id) * 100.0 / COUNT(*) as coverage_percent
FROM database.schema.enriched_customer_profiles;

Verify Master Table

SELECT COUNT(*) as unified_customers
FROM database.schema.customer_master;

Review Statistics

SELECT * FROM database.schema.unified_id_result_key_stats
WHERE from_table = '*';

Success Criteria

Execution successful when:

✅ All SQL files processed without critical errors
✅ Unification loop converged (updated_count = 0)
✅ Canonical IDs generated for all eligible records
✅ Enriched tables created successfully
✅ Master tables populated with attributes
✅ Coverage metrics meet expectations

Authentication Examples

Using Password

export SNOWFLAKE_PASSWORD='your_password'
/cdp-hybrid-idu:hybrid-execute-snowflake

Using SSO

/cdp-hybrid-idu:hybrid-execute-snowflake
# Will prompt: Use SSO authentication? (y/n): y
# Opens browser for authentication

Using Key-Pair

export SNOWFLAKE_PRIVATE_KEY_PATH='/path/to/key.p8'
export SNOWFLAKE_PRIVATE_KEY_PASSPHRASE='passphrase'
/cdp-hybrid-idu:hybrid-execute-snowflake

Ready to execute your Snowflake ID unification workflow?

Provide your SQL directory path and Snowflake connection details to begin!

9.6 KiB Raw Permalink Blame History