zhongwei/gh-linus-mcmanamey-unify-2-1-plugin

Files

Zhongwei Li 506a828b22 Initial commit

2025-11-30 08:37:55 +08:00

6.8 KiB

Executable File

Raw Blame History

description: Fires off a agent in the background to complete tasks autonomously argument-hint: [user-prompt] | [task-file-name] allowed-tools: Read, Task, TodoWrite

Background PySpark Data Engineer Agent

Launch a PySpark data engineer agent to work autonomously in the background on ETL tasks, data pipeline fixes, or code reviews.

Usage

Option 1: Direct prompt

/background "Fix the validation issues in g_xa_mg_statsclasscount.py"

Option 2: Task file from .claude/tasks/

/background code_review_fixes_task_list.md

Variables

TASK_INPUT: Either a direct prompt string or a task file name from .claude/tasks/
TASK_FILE_PATH: Full path to task file if using a task file
PROMPT_CONTENT: The actual prompt to send to the agent

Instructions

1. Determine Task Source

Check if $ARGUMENTS looks like a file name (ends with .md or contains no spaces):

If YES: It's a task file name from .claude/tasks/
If NO: It's a direct user prompt

2. Load Task Content

If using task file:

List all available task files in .claude/tasks/ directory
Find the task file matching the provided name (exact match or partial match)
Read the task file content
Use the full task file content as the prompt

If using direct prompt:

Use the $ARGUMENTS directly as the prompt

3. Launch PySpark Data Engineer Agent

Launch the specialized pyspark-data-engineer agent using the Task tool:

Important Configuration:

subagent_type: pyspark-data-engineer
model: sonnet (default) or opus for complex tasks
description: Short 3-5 word description based on task type
prompt: Complete, detailed instructions including:
- The task content (from file or direct prompt)
- Explicit instruction to follow .claude/CLAUDE.md best practices
- Instruction to run quality gates (syntax check, linting, formatting)
- Instruction to create a comprehensive final report

Prompt Template:

You are a PySpark data engineer working on the Unify 2.1 Data Migration project using Azure Synapse Analytics.

CRITICAL INSTRUCTIONS:
- Read and follow ALL guidelines in .claude/CLAUDE.md
- Use .claude/rules/python_rules.md for coding standards
- Maximum line length: 240 characters
- No blank lines inside functions
- Use @synapse_error_print_handler decorator on all methods
- Use NotebookLogger for all logging (not print statements)
- Use TableUtilities methods for DataFrame operations

TASK TO COMPLETE:
{TASK_CONTENT}

QUALITY GATES (MUST RUN BEFORE COMPLETION):
1. Syntax validation: python3 -m py_compile <file_path>
2. Linting: ruff check python_files/
3. Formatting: ruff format python_files/

FINAL REPORT REQUIREMENTS:
Provide a comprehensive report including:
1. Summary of changes made
2. Files modified with line numbers
3. Quality gate results (syntax, linting, formatting)
4. Testing recommendations
5. Any issues encountered and resolutions
6. Next steps or follow-up tasks

Work autonomously and complete all tasks in the list. Use your available tools to read files, make edits, run tests, and validate your work.

4. Inform User

After launching the agent, inform the user:

Agent has been launched in the background
Task being worked on (summary)
Estimated completion time (if known from task file)
The agent will work autonomously and provide a final report

Task File Structure

Expected task file format in .claude/tasks/:

# Task Title

**Date Created**: YYYY-MM-DD
**Priority**: HIGH/MEDIUM/LOW
**Estimated Total Time**: X minutes
**Files Affected**: N

## Task 1: Description
**File**: path/to/file.py
**Line**: 123
**Estimated Time**: X minutes
**Severity**: CRITICAL/HIGH/MEDIUM/LOW

**Current Code**:
```python
# code

Required Fix:

# fixed code

Reason: Explanation Testing: How to verify

(Repeat for each task)


## Examples

### Example 1: Using Task File

User: /background code_review_fixes_task_list.md

Agent Response:

Lists available task files
Finds and reads code_review_fixes_task_list.md
Launches pyspark-data-engineer agent with task content
Informs user: "PySpark data engineer agent launched to complete 9 code review fixes (est. 27 minutes)"


### Example 2: Using Direct Prompt

User: /background "Add data validation methods to the statsclasscount gold table and ensure they are called in the transform method"

Agent Response:

Uses the prompt directly
Launches pyspark-data-engineer agent with the prompt
Informs user: "PySpark data engineer agent launched to add data validation methods"


### Example 3: Partial Task File Name Match

User: /background code_review

Agent Response:

Lists task files and finds "code_review_fixes_task_list.md"
Confirms match with user or proceeds if unambiguous
Launches agent with task content


## Available Task Files

List available task files from `.claude/tasks/` directory when user runs the command without arguments or with "list" argument:

/background /background list


Output:

Available task files in .claude/tasks/:

code_review_fixes_task_list.md (9 tasks, 27 min, HIGH priority)

Usage: /background - Run agent with task file /background "your prompt" - Run agent with direct prompt /background list - Show available task files


## Agent Workflow

The pyspark-data-engineer agent will:

1. **Read Context**: Load .claude/CLAUDE.md, .claude/rules/python_rules.md
2. **Analyze Tasks**: Break down task list into actionable items
3. **Execute Changes**: Read files, make edits, apply fixes
4. **Validate Work**: Run syntax checks, linting, formatting
5. **Test Changes**: Execute relevant tests if available
6. **Generate Report**: Comprehensive summary of all work completed

## Best Practices

### For Task Files
- Keep tasks atomic and well-defined
- Include file paths and line numbers
- Provide current code and required fix
- Specify testing requirements
- Estimate time for each task
- Prioritize tasks (CRITICAL, HIGH, MEDIUM, LOW)

### For Direct Prompts
- Be specific about files and functionality
- Reference table/database names
- Specify layer (bronze, silver, gold)
- Include any business requirements
- Mention quality requirements

## Success Criteria

Agent task completion requires:
- ✅ All code changes implemented
- ✅ Syntax validation passes (python3 -m py_compile)
- ✅ Linting passes (ruff check)
- ✅ Code formatted (ruff format)
- ✅ No new issues introduced
- ✅ Comprehensive final report provided

## Notes

- The agent has access to all project files and tools
- It follows medallion architecture patterns (bronze/silver/gold)
- It uses established utilities (SparkOptimiser, TableUtilities, NotebookLogger)
- It respects project coding standards (240 char lines, no blanks in functions)
- It works autonomously without requiring additional user input
- Results are reported back when complete

6.8 KiB Executable File Raw Blame History

Background PySpark Data Engineer Agent

Usage

Variables

Instructions

1. Determine Task Source

2. Load Task Content

3. Launch PySpark Data Engineer Agent

4. Inform User

Task File Structure

6.8 KiB

Executable File

Raw Blame History