
Databricks Workflow Executor Agent

Agent Purpose

Execute the generated Databricks SQL workflow with intelligent convergence detection, real-time monitoring, and interactive error handling by orchestrating the Python script databricks_sql_executor.py.

Agent Workflow

Step 1: Collect Credentials

Required:

  • SQL directory path
  • Server hostname (e.g., your-workspace.cloud.databricks.com)
  • HTTP path (e.g., /sql/1.0/warehouses/abc123)
  • Catalog and schema names
  • Authentication type (PAT or OAuth)

For PAT Authentication:

  • Access token (from argument, environment variable DATABRICKS_TOKEN, or prompt)

For OAuth:

  • No token required (browser-based auth)
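
The PAT lookup order above (argument, then DATABRICKS_TOKEN, then prompt) can be sketched as follows; resolve_access_token and its prompt parameter are illustrative helpers, not part of the script:

```python
import os

def resolve_access_token(arg_token=None, prompt=input):
    """Resolve a PAT in precedence order: explicit argument,
    DATABRICKS_TOKEN environment variable, interactive prompt."""
    if arg_token:
        return arg_token
    env_token = os.environ.get("DATABRICKS_TOKEN")
    if env_token:
        return env_token
    return prompt("Enter Databricks access token: ")
```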

Step 2: Execute Python Script

Use the Bash tool with run_in_background: true to execute:

python3 /path/to/plugins/cdp-hybrid-idu/scripts/databricks/databricks_sql_executor.py \
    <sql_directory> \
    --server-hostname <hostname> \
    --http-path <http_path> \
    --catalog <catalog> \
    --schema <schema> \
    --auth-type <pat|oauth> \
    --access-token <token> \
    --optimize-tables

Omit --access-token when --auth-type is oauth (browser-based auth requires no token).

Step 3: Monitor Execution in Real-Time

Use the BashOutput tool to stream progress:

  • Connection status
  • File execution progress
  • Row counts and timing
  • Convergence detection results
  • Optimization status
  • Error messages

Display Progress:

✓ Connected to Databricks: <hostname>
• Using catalog: <catalog>, schema: <schema>

Executing: 01_create_graph.sql
✓ Completed: 01_create_graph.sql

Executing: 02_extract_merge.sql
✓ Completed: 02_extract_merge.sql
• Rows affected: 125,000

Executing Unify Loop (convergence detection)

--- Iteration 1 ---
✓ Iteration 1 completed
• Updated records: 1,500
• Optimizing Delta table...

--- Iteration 2 ---
✓ Iteration 2 completed
• Updated records: 450
• Optimizing Delta table...

--- Iteration 3 ---
✓ Iteration 3 completed
• Updated records: 0
✓ Loop converged after 3 iterations!

• Creating alias table: loop_final
...
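
The unify loop above converges when an iteration updates zero records. A minimal sketch of that detection logic, where run_iteration and the iteration cap are hypothetical stand-ins for the script's per-iteration SQL execution:

```python
def run_until_converged(run_iteration, max_iterations=20):
    """Repeat the unify iteration until no records change.
    `run_iteration(n)` stands in for executing iteration n's SQL
    and must return the updated-record count."""
    for n in range(1, max_iterations + 1):
        updated = run_iteration(n)
        print(f"--- Iteration {n} ---")
        print(f"• Updated records: {updated:,}")
        if updated == 0:
            print(f"✓ Loop converged after {n} iterations!")
            return n
    raise RuntimeError(f"No convergence after {max_iterations} iterations")
```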

Step 4: Handle Interactive Prompts

If the script encounters an error and prompts for continuation:

✗ Error in file: 04_unify_loop_iteration_01.sql
Error: Table not found

Continue with remaining files? (y/n):

Agent Decision:

  1. Show error to user
  2. Ask user for decision
  3. Pass the response to the script (via stdin if possible; otherwise stop and restart)

Step 5: Final Report

After completion:

Execution Complete!

Summary:
  • Files processed: 18/18
  • Execution time: 45 minutes
  • Convergence: 3 iterations
  • Final lookup table rows: 98,500

Validation:
  ✓ All tables created successfully
  ✓ Canonical IDs generated
  ✓ Enriched tables populated
  ✓ Master tables created

Next Steps:
  1. Verify data quality
  2. Check coverage metrics
  3. Review statistics tables
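
Assembling the report above from collected statistics might look like this; the stats keys are assumptions for illustration, gathered from the executor's streamed output:

```python
def format_summary(stats):
    """Render the final execution report from collected stats."""
    return "\n".join([
        "Execution Complete!",
        "",
        "Summary:",
        f"  • Files processed: {stats['done']}/{stats['total']}",
        f"  • Execution time: {stats['minutes']} minutes",
        f"  • Convergence: {stats['iterations']} iterations",
        f"  • Final lookup table rows: {stats['rows']:,}",
    ])
```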

Critical Behaviors

Convergence Monitoring

Track loop iterations:

  • Iteration number
  • Records updated
  • Convergence status
  • Optimization progress

Error Recovery

On errors:

  1. Capture error details
  2. Determine severity (critical vs warning)
  3. Prompt user for continuation decision
  4. Log error for troubleshooting
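
A minimal severity triage might look like the sketch below; the keyword lists are illustrative assumptions, not the script's actual classification:

```python
def classify_error(message):
    """Rough triage: connection, auth, and permission failures are
    critical (stop the run); other errors, such as a missing
    optional table, are warnings (user may choose to continue)."""
    critical = ("connection", "authentication", "permission denied")
    text = message.lower()
    if any(keyword in text for keyword in critical):
        return "critical"
    return "warning"
```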

Performance Tracking

Monitor:

  • Execution time per file
  • Row counts processed
  • Optimization duration
  • Total workflow time
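
Per-phase timing can be captured with a small context manager, for example:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, log):
    """Record the wall-clock duration of one workflow phase
    (per-file execution, optimization, total run) into `log`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        log[label] = time.perf_counter() - start
```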

MUST DO

  1. Stream output in real-time using BashOutput
  2. Monitor convergence and report iterations
  3. Handle user prompts for error continuation
  4. Report final statistics with coverage metrics
  5. Verify connection before starting execution
  6. Clean up on termination or error