---
model: claude-haiku-4-5-20251001
allowed-tools: Bash(git branch:*), Bash(git status:*), Bash(git log:*), Bash(git diff:*), mcp__*, mcp__ado__repo_get_repo_by_name_or_id, mcp__ado__repo_list_pull_requests_by_repo_or_project, mcp__ado__repo_get_pull_request_by_id, mcp__ado__repo_list_pull_request_threads, mcp__ado__repo_list_pull_request_thread_comments, mcp__ado__repo_create_pull_request_thread, mcp__ado__repo_reply_to_comment, mcp__ado__repo_update_pull_request, mcp__ado__repo_search_commits, mcp__ado__pipelines_get_builds, Read, Task
argument-hint: [PR_ID] (optional - if not provided, will list all open PRs)
---

# PR Review and Approval

## Task

Review open pull requests in the current repository and approve/complete them if they meet quality standards.

## Instructions

### 1. Get Repository Information

- Use `mcp__ado__repo_get_repo_by_name_or_id` with:
  - Project: `Program Unify`
  - Repository: `unify_2_1_dm_synapse_env_d10`
- Extract repository ID: `d3fa6f02-bfdf-428d-825c-7e7bd4e7f338`

### 2. List Open Pull Requests

- Use `mcp__ado__repo_list_pull_requests_by_repo_or_project` with:
  - Repository ID: `d3fa6f02-bfdf-428d-825c-7e7bd4e7f338`
  - Status: `Active`
- If `$ARGUMENTS` is provided, filter to that specific PR ID
- Display all open PRs with key details (ID, title, source/target branches, author)

### 3. Review Each Pull Request

For each PR (or the specified PR):

#### 3.1 Get PR Details

- Use `mcp__ado__repo_get_pull_request_by_id` to get full PR details
- Check merge status - if conflicts exist, stop and report

#### 3.2 Get PR Changes

- Use `mcp__ado__repo_search_commits` to get the commits in the PR
- Identify the files changed and the scope of the changes

#### 3.3 Review Code Quality

Read the changed files and evaluate:
1. **Code Quality & Maintainability**
   - Proper use of type hints and descriptive variable names
   - Maximum line length (240 chars) compliance
   - No blank lines inside functions
   - Proper import organisation
   - Use of the `@synapse_error_print_handler` decorator
   - Proper error handling with meaningful messages
2. **PySpark Best Practices**
   - DataFrame operations over raw SQL
   - Proper use of `TableUtilities` methods
   - Correct logging with `NotebookLogger`
   - Proper session management
3. **ETL Pattern Compliance**
   - Follows the ETL class pattern for Silver/Gold layers
   - Proper extract/transform/load method structure
   - Correct database and table naming conventions
4. **Standards Compliance**
   - Follows project coding standards from `.claude/rules/python_rules.md`
   - No missing docstrings (unless explicitly instructed to omit)
   - Proper use of configuration from `configuration.yaml`

#### 3.4 Review DevOps Considerations

1. **CI/CD Integration**
   - Changes compatible with the existing pipeline
   - No breaking changes to the deployment process
2. **Configuration & Infrastructure**
   - Proper environment detection pattern
   - Azure integration handled correctly
   - No hardcoded paths or credentials
3. **Testing & Quality Gates**
   - Syntax validation would pass
   - Linting compliance (`ruff check`)
   - Test coverage for new functionality

#### 3.5 Deep PySpark Analysis (Conditional)

**Only execute if the PR modifies PySpark ETL code.**

Check whether the PR changes affect:
- `python_files/pipeline_operations/bronze_layer_deployment.py`
- `python_files/pipeline_operations/silver_dag_deployment.py`
- `python_files/pipeline_operations/gold_dag_deployment.py`
- Any files in `python_files/silver/`
- Any files in `python_files/gold/`
- `python_files/utilities/session_optimiser.py`

**If PySpark files are modified, use the Task tool to launch the pyspark-engineer agent:**

```
Task tool parameters:
- subagent_type: "pyspark-engineer"
- description: "Deep PySpark analysis for PR #[PR_ID]"
- prompt: "
  Perform expert-level PySpark analysis for PR #[PR_ID]:

  PR Details:
  - Title: [PR_TITLE]
  - Changed Files: [LIST_OF_CHANGED_FILES]
  - Source Branch: [SOURCE_BRANCH]
  - Target Branch: [TARGET_BRANCH]

  Review Requirements:
  1. Read all changed PySpark files
  2. Analyse transformation logic for:
     - Partitioning strategies and data skew
     - Shuffle optimisation opportunities
     - Broadcast join usage and optimisation
     - Memory management and caching strategies
     - DataFrame operation efficiency
  3. Validate Medallion Architecture compliance:
     - Bronze layer: raw data preservation patterns
     - Silver layer: cleansing and standardisation
     - Gold layer: business model optimisation
  4. Check performance considerations:
     - Identify potential bottlenecks
     - Suggest optimisation opportunities
     - Validate cost-efficiency patterns
  5. Verify test coverage:
     - Check for pytest test files
     - Validate test completeness
     - Suggest missing test scenarios
  6. Review production readiness:
     - Error handling for data pipeline failures
     - Idempotent operation design
     - Monitoring and logging completeness

  Provide detailed findings in this format:

  ## PySpark Analysis Results

  ### Critical Issues (blocking)
  - [List any critical performance or correctness issues]

  ### Performance Optimisations
  - [Specific optimisation recommendations]

  ### Architecture Compliance
  - [Medallion architecture adherence assessment]

  ### Test Coverage
  - [Test completeness and gaps]

  ### Recommendations
  - [Specific actionable improvements]

  Return your analysis for integration into the PR review.
  "
```

**Integration of PySpark Analysis:**
- If the pyspark-engineer agent identifies critical issues → add them to the review comments
- If optimisations are suggested → add them as optional improvement comments
- If architecture violations are found → add them as required changes
- Include all findings in the final review summary

### 4. Provide Review Comments

- Use `mcp__ado__repo_list_pull_request_threads` to check existing review comments
- If issues are found, use `mcp__ado__repo_create_pull_request_thread` to add:
  - Specific file-level comments with line numbers
  - A clear description of each issue
  - Suggested improvements
- Mark threads as `Active` status if changes are required

### 5. Approve and Complete PR (if satisfied)

**Only proceed if ALL criteria are met:**
- No merge conflicts
- Code quality standards met
- PySpark best practices followed
- ETL patterns correct
- No DevOps concerns
- Proper error handling and logging
- Standards compliant
- **PySpark analysis (if performed) shows no critical issues**
- **Performance optimisations either implemented or deferred with justification**
- **Medallion architecture compliance validated**

**If approved:**
1. Use `mcp__ado__repo_update_pull_request` with:
   - Set `autoComplete: true`
   - Set `mergeStrategy: "NoFastForward"` (or `"Squash"` if there are many small commits)
   - Set `deleteSourceBranch: false` (preserve branch history)
   - Set `transitionWorkItems: true`
   - Add an approval comment explaining what was reviewed
2. Confirm completion with a summary:
   - PR ID and title
   - Number of commits reviewed
   - Key changes identified
   - Approval rationale

### 6. Report Results

Provide a comprehensive summary:
- Total open PRs reviewed
- PRs approved and completed (with IDs)
- PRs requiring changes (with a summary of issues)
- PRs blocked by merge conflicts
- **PySpark analysis findings (if performed)**
- **Performance optimisation recommendations**

## Important Notes

- **No deferrals**: All identified issues must be addressed before approval
- **Immediate action**: If improvements are needed, request them now - no "future work" comments
- **Thorough review**: Check both code quality AND DevOps considerations
- **Professional objectivity**: Prioritise technical accuracy over validation
- **Merge conflicts**: Do NOT approve PRs with merge conflicts - report them for manual resolution
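## Appendix: ETL Pattern Reference

For reviewers unfamiliar with the extract/transform/load class pattern checked in step 3.3, the sketch below shows the shape a compliant Silver-layer class is expected to have. This is a minimal illustrative sketch, not project code: the decorator here is a stand-in for the real `@synapse_error_print_handler`, the class name and plain-Python data are hypothetical, and the actual implementation would operate on PySpark DataFrames via `TableUtilities` and log through `NotebookLogger`.

```python
# Hypothetical sketch of the Silver/Gold ETL class pattern (step 3.3).
# The decorator below is a STAND-IN for the project's real
# `synapse_error_print_handler`; the class and data are illustrative only.
import functools


def synapse_error_print_handler(func):
    """Stand-in decorator: print and re-raise any error from an ETL step."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            print(f"[ERROR] {func.__name__} failed: {exc}")
            raise
    return wrapper


class SilverCustomerETL:
    """Illustrates the extract/transform/load method structure the review checks for."""

    def __init__(self, source_table: str, target_table: str) -> None:
        self.source_table = source_table
        self.target_table = target_table

    @synapse_error_print_handler
    def extract(self) -> list[dict]:
        # Real code would read a DataFrame from the Bronze-layer source table.
        return [{"id": 1, "name": " Alice "}, {"id": 2, "name": "Bob"}]

    @synapse_error_print_handler
    def transform(self, rows: list[dict]) -> list[dict]:
        # Cleansing/standardisation typical of the Silver layer.
        return [{**row, "name": row["name"].strip()} for row in rows]

    @synapse_error_print_handler
    def load(self, rows: list[dict]) -> int:
        # Real code would write to the Silver-layer target table.
        return len(rows)

    def run(self) -> int:
        return self.load(self.transform(self.extract()))
```

A reviewer applying step 3.3 would check that each pipeline stage is a decorated method on a class like this, rather than loose top-level functions, and that failures surface with meaningful messages instead of being silently swallowed.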