---
model: claude-haiku-4-5-20251001
allowed-tools: Bash(git branch:*), Bash(git status:*), Bash(git log:*), Bash(git diff:*), mcp__*, mcp__ado__repo_get_repo_by_name_or_id, mcp__ado__repo_list_pull_requests_by_repo_or_project, mcp__ado__repo_get_pull_request_by_id, mcp__ado__repo_list_pull_request_threads, mcp__ado__repo_list_pull_request_thread_comments, mcp__ado__repo_create_pull_request_thread, mcp__ado__repo_reply_to_comment, mcp__ado__repo_update_pull_request, mcp__ado__repo_search_commits, mcp__ado__pipelines_get_builds, Read, Task
argument-hint: [PR_ID] (optional - if not provided, will list all open PRs)
---

# PR Review and Approval

## Task

Review open pull requests in the current repository and approve/complete them if they meet quality standards.

## Instructions

### 1. Get Repository Information

- Use `mcp__ado__repo_get_repo_by_name_or_id` with:
  - Project: `Program Unify`
  - Repository: `unify_2_1_dm_synapse_env_d10`
- Extract repository ID: `d3fa6f02-bfdf-428d-825c-7e7bd4e7f338`

### 2. List Open Pull Requests

- Use `mcp__ado__repo_list_pull_requests_by_repo_or_project` with:
  - Repository ID: `d3fa6f02-bfdf-428d-825c-7e7bd4e7f338`
  - Status: `Active`
- If `$ARGUMENTS` is provided, filter to that specific PR ID
- Display all open PRs with key details (ID, title, source/target branches, author)

### 3. Review Each Pull Request

For each PR (or the specified PR):

#### 3.1 Get PR Details

- Use `mcp__ado__repo_get_pull_request_by_id` to get full PR details
- Check merge status - if conflicts exist, stop and report

#### 3.2 Get PR Changes

- Use `mcp__ado__repo_search_commits` to get the commits in the PR
- Identify the files changed and the scope of the changes

#### 3.3 Review Code Quality

Read the changed files and evaluate:
1. **Code Quality & Maintainability**
   - Proper use of type hints and descriptive variable names
   - Maximum line length (240 chars) compliance
   - No blank lines inside functions
   - Proper import organisation
   - Use of the `@synapse_error_print_handler` decorator
   - Proper error handling with meaningful messages
2. **PySpark Best Practices**
   - DataFrame operations over raw SQL
   - Proper use of `TableUtilities` methods
   - Correct logging with `NotebookLogger`
   - Proper session management
3. **ETL Pattern Compliance**
   - Follows the ETL class pattern for Silver/Gold layers
   - Proper extract/transform/load method structure
   - Correct database and table naming conventions
4. **Standards Compliance**
   - Follows project coding standards from `.claude/rules/python_rules.md`
   - No missing docstrings (unless explicitly instructed to omit)
   - Proper use of configuration from `configuration.yaml`

#### 3.4 Review DevOps Considerations

1. **CI/CD Integration**
   - Changes compatible with the existing pipeline
   - No breaking changes to the deployment process
2. **Configuration & Infrastructure**
   - Proper environment detection pattern
   - Azure integration handled correctly
   - No hardcoded paths or credentials
3. **Testing & Quality Gates**
   - Syntax validation would pass
   - Linting compliance (`ruff check`)
   - Test coverage for new functionality

#### 3.5 Deep PySpark Analysis (Conditional)

**Only execute if the PR modifies PySpark ETL code.**

Check whether the PR changes affect:
- `python_files/pipeline_operations/bronze_layer_deployment.py`
- `python_files/pipeline_operations/silver_dag_deployment.py`
- `python_files/pipeline_operations/gold_dag_deployment.py`
- Any files in `python_files/silver/`
- Any files in `python_files/gold/`
- `python_files/utilities/session_optimiser.py`

**If PySpark files are modified, use the Task tool to launch the pyspark-engineer agent:**

```
Task tool parameters:
- subagent_type: "pyspark-engineer"
- description: "Deep PySpark analysis for PR #[PR_ID]"
- prompt: "
  Perform expert-level PySpark analysis for PR #[PR_ID]:

  PR Details:
  - Title: [PR_TITLE]
  - Changed Files: [LIST_OF_CHANGED_FILES]
  - Source Branch: [SOURCE_BRANCH]
  - Target Branch: [TARGET_BRANCH]

  Review Requirements:
  1. Read all changed PySpark files
  2. Analyse transformation logic for:
     - Partitioning strategies and data skew
     - Shuffle optimisation opportunities
     - Broadcast join usage and optimisation
     - Memory management and caching strategies
     - DataFrame operation efficiency
  3. Validate Medallion Architecture compliance:
     - Bronze layer: raw data preservation patterns
     - Silver layer: cleansing and standardisation
     - Gold layer: business model optimisation
  4. Check performance considerations:
     - Identify potential bottlenecks
     - Suggest optimisation opportunities
     - Validate cost-efficiency patterns
  5. Verify test coverage:
     - Check for pytest test files
     - Validate test completeness
     - Suggest missing test scenarios
  6. Review production readiness:
     - Error handling for data pipeline failures
     - Idempotent operation design
     - Monitoring and logging completeness

  Provide detailed findings in this format:

  ## PySpark Analysis Results

  ### Critical Issues (blocking)
  - [List any critical performance or correctness issues]

  ### Performance Optimisations
  - [Specific optimisation recommendations]

  ### Architecture Compliance
  - [Medallion architecture adherence assessment]

  ### Test Coverage
  - [Test completeness and gaps]

  ### Recommendations
  - [Specific actionable improvements]

  Return your analysis for integration into the PR review.
  "
```

**Integration of PySpark Analysis:**
- If the pyspark-engineer agent identifies critical issues → add them to the review comments
- If optimisations are suggested → add them as optional improvement comments
- If architecture violations are found → add them as required changes
- Include all findings in the final review summary

### 4. Provide Review Comments

- Use `mcp__ado__repo_list_pull_request_threads` to check existing review comments
- If issues are found, use `mcp__ado__repo_create_pull_request_thread` to add:
  - Specific file-level comments with line numbers
  - A clear description of each issue
  - Suggested improvements
- Mark threads as `Active` status if changes are required

### 5. Approve and Complete PR (if satisfied)

**Only proceed if ALL criteria are met:**
- No merge conflicts
- Code quality standards met
- PySpark best practices followed
- ETL patterns correct
- No DevOps concerns
- Proper error handling and logging
- Standards compliant
- **PySpark analysis (if performed) shows no critical issues**
- **Performance optimisations either implemented or deferred with justification**
- **Medallion architecture compliance validated**

**If approved:**
1. Use `mcp__ado__repo_update_pull_request` with:
   - Set `autoComplete: true`
   - Set `mergeStrategy: "NoFastForward"` (or `"Squash"` if there are many small commits)
   - Set `deleteSourceBranch: false` (preserve branch history)
   - Set `transitionWorkItems: true`
   - Add an approval comment explaining what was reviewed
2. Confirm completion with a summary:
   - PR ID and title
   - Number of commits reviewed
   - Key changes identified
   - Approval rationale

### 6. Report Results

Provide a comprehensive summary:
- Total open PRs reviewed
- PRs approved and completed (with IDs)
- PRs requiring changes (with a summary of issues)
- PRs blocked by merge conflicts
- **PySpark analysis findings (if performed)**
- **Performance optimisation recommendations**

## Important Notes

- **No deferrals**: All identified issues must be addressed before approval
- **Immediate action**: If improvements are needed, request them now - no "future work" comments
- **Thorough review**: Check both code quality AND DevOps considerations
- **Professional objectivity**: Prioritise technical accuracy over validation
- **Merge conflicts**: Do NOT approve PRs with merge conflicts - report them for manual resolution
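## Appendix: ETL Pattern Reference

For reviewers unfamiliar with the extract/transform/load class pattern checked in step 3.3, the sketch below shows the shape a compliant Silver-layer class is expected to have. This is a minimal illustrative sketch, not project code: the decorator here is a stand-in for the real `@synapse_error_print_handler`, the class name and plain-Python data are hypothetical, and the actual implementation would operate on PySpark DataFrames via `TableUtilities` and log through `NotebookLogger`.

```python
# Hypothetical sketch of the Silver/Gold ETL class pattern (step 3.3).
# The decorator below is a STAND-IN for the project's real
# `synapse_error_print_handler`; the class and data are illustrative only.
import functools


def synapse_error_print_handler(func):
    """Stand-in decorator: print and re-raise any error from an ETL step."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            print(f"[ERROR] {func.__name__} failed: {exc}")
            raise
    return wrapper


class SilverCustomerETL:
    """Illustrates the extract/transform/load method structure the review checks for."""

    def __init__(self, source_table: str, target_table: str) -> None:
        self.source_table = source_table
        self.target_table = target_table

    @synapse_error_print_handler
    def extract(self) -> list[dict]:
        # Real code would read a DataFrame from the Bronze-layer source table.
        return [{"id": 1, "name": " Alice "}, {"id": 2, "name": "Bob"}]

    @synapse_error_print_handler
    def transform(self, rows: list[dict]) -> list[dict]:
        # Cleansing/standardisation typical of the Silver layer.
        return [{**row, "name": row["name"].strip()} for row in rows]

    @synapse_error_print_handler
    def load(self, rows: list[dict]) -> int:
        # Real code would write to the Silver-layer target table.
        return len(rows)

    def run(self) -> int:
        return self.load(self.transform(self.extract()))
```

A reviewer applying step 3.3 would check that each pipeline stage is a decorated method on a class like this, rather than loose top-level functions, and that failures surface with meaningful messages instead of being silently swallowed.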