
Databricks Workflow Executor Agent

Agent Purpose

Execute the generated Databricks SQL workflow with intelligent convergence detection, real-time monitoring, and interactive error handling by orchestrating the Python script databricks_sql_executor.py.

Agent Workflow

Step 1: Collect Credentials

Required:

  • SQL directory path
  • Server hostname (e.g., your-workspace.cloud.databricks.com)
  • HTTP path (e.g., /sql/1.0/warehouses/abc123)
  • Catalog and schema names
  • Authentication type (PAT or OAuth)

For PAT Authentication:

  • Access token (from argument, environment variable DATABRICKS_TOKEN, or prompt)

For OAuth:

  • No token required (browser-based auth)
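
The PAT lookup order above (argument, then DATABRICKS_TOKEN, then prompt) can be sketched as follows; resolve_access_token and its prompt parameter are illustrative helpers, not part of the script:

```python
import os

def resolve_access_token(arg_token=None, prompt=input):
    """Resolve a PAT in precedence order: explicit argument,
    DATABRICKS_TOKEN environment variable, interactive prompt."""
    if arg_token:
        return arg_token
    env_token = os.environ.get("DATABRICKS_TOKEN")
    if env_token:
        return env_token
    return prompt("Enter Databricks access token: ")
```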

Step 2: Execute Python Script

Use the Bash tool with run_in_background: true to execute:

python3 /path/to/plugins/cdp-hybrid-idu/scripts/databricks/databricks_sql_executor.py \
    <sql_directory> \
    --server-hostname <hostname> \
    --http-path <http_path> \
    --catalog <catalog> \
    --schema <schema> \
    --auth-type <pat|oauth> \
    --access-token <token> \
    --optimize-tables

Omit --access-token when --auth-type is oauth (browser-based auth requires no token).

Step 3: Monitor Execution in Real-Time

Use the BashOutput tool to stream progress:

  • Connection status
  • File execution progress
  • Row counts and timing
  • Convergence detection results
  • Optimization status
  • Error messages

Display Progress:

✓ Connected to Databricks: <hostname>
• Using catalog: <catalog>, schema: <schema>

Executing: 01_create_graph.sql
✓ Completed: 01_create_graph.sql

Executing: 02_extract_merge.sql
✓ Completed: 02_extract_merge.sql
• Rows affected: 125,000

Executing Unify Loop (convergence detection)

--- Iteration 1 ---
✓ Iteration 1 completed
• Updated records: 1,500
• Optimizing Delta table...

--- Iteration 2 ---
✓ Iteration 2 completed
• Updated records: 450
• Optimizing Delta table...

--- Iteration 3 ---
✓ Iteration 3 completed
• Updated records: 0
✓ Loop converged after 3 iterations!

• Creating alias table: loop_final
...
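
The unify loop above converges when an iteration updates zero records. A minimal sketch of that detection logic, where run_iteration and the iteration cap are hypothetical stand-ins for the script's per-iteration SQL execution:

```python
def run_until_converged(run_iteration, max_iterations=20):
    """Repeat the unify iteration until no records change.
    `run_iteration(n)` stands in for executing iteration n's SQL
    and must return the updated-record count."""
    for n in range(1, max_iterations + 1):
        updated = run_iteration(n)
        print(f"--- Iteration {n} ---")
        print(f"• Updated records: {updated:,}")
        if updated == 0:
            print(f"✓ Loop converged after {n} iterations!")
            return n
    raise RuntimeError(f"No convergence after {max_iterations} iterations")
```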

Step 4: Handle Interactive Prompts

If the script encounters an error and prompts for continuation:

✗ Error in file: 04_unify_loop_iteration_01.sql
Error: Table not found

Continue with remaining files? (y/n):

Agent Decision:

  1. Show error to user
  2. Ask user for decision
  3. Pass the response to the script (via stdin if possible; otherwise stop and restart)

Step 5: Final Report

After completion:

Execution Complete!

Summary:
  • Files processed: 18/18
  • Execution time: 45 minutes
  • Convergence: 3 iterations
  • Final lookup table rows: 98,500

Validation:
  ✓ All tables created successfully
  ✓ Canonical IDs generated
  ✓ Enriched tables populated
  ✓ Master tables created

Next Steps:
  1. Verify data quality
  2. Check coverage metrics
  3. Review statistics tables
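
Assembling the report above from collected statistics might look like this; the stats keys are assumptions for illustration, gathered from the executor's streamed output:

```python
def format_summary(stats):
    """Render the final execution report from collected stats."""
    return "\n".join([
        "Execution Complete!",
        "",
        "Summary:",
        f"  • Files processed: {stats['done']}/{stats['total']}",
        f"  • Execution time: {stats['minutes']} minutes",
        f"  • Convergence: {stats['iterations']} iterations",
        f"  • Final lookup table rows: {stats['rows']:,}",
    ])
```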

Critical Behaviors

Convergence Monitoring

Track loop iterations:

  • Iteration number
  • Records updated
  • Convergence status
  • Optimization progress

Error Recovery

On errors:

  1. Capture error details
  2. Determine severity (critical vs warning)
  3. Prompt user for continuation decision
  4. Log error for troubleshooting
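
A minimal severity triage might look like the sketch below; the keyword lists are illustrative assumptions, not the script's actual classification:

```python
def classify_error(message):
    """Rough triage: connection, auth, and permission failures are
    critical (stop the run); other errors, such as a missing
    optional table, are warnings (user may choose to continue)."""
    critical = ("connection", "authentication", "permission denied")
    text = message.lower()
    if any(keyword in text for keyword in critical):
        return "critical"
    return "warning"
```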

Performance Tracking

Monitor:

  • Execution time per file
  • Row counts processed
  • Optimization duration
  • Total workflow time
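
Per-phase timing can be captured with a small context manager, for example:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, log):
    """Record the wall-clock duration of one workflow phase
    (per-file execution, optimization, total run) into `log`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        log[label] = time.perf_counter() - start
```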

MUST DO

  1. Stream output in real-time using BashOutput
  2. Monitor convergence and report iterations
  3. Handle user prompts for error continuation
  4. Report final statistics with coverage metrics
  5. Verify connection before starting execution
  6. Clean up on termination or error