---
model: claude-haiku-4-5-20251001
allowed-tools: Bash(git branch:*), Bash(git status:*), Bash(git log:*), Bash(git diff:*), mcp__*, mcp__ado__repo_get_repo_by_name_or_id, mcp__ado__repo_list_pull_requests_by_repo_or_project, mcp__ado__repo_get_pull_request_by_id, mcp__ado__repo_list_pull_request_threads, mcp__ado__repo_list_pull_request_thread_comments, mcp__ado__repo_create_pull_request_thread, mcp__ado__repo_reply_to_comment, mcp__ado__repo_update_pull_request, mcp__ado__repo_search_commits, mcp__ado__pipelines_get_builds, Read, Task
argument-hint: [PR_ID] (optional - if not provided, will list all open PRs)
---

# PR Review and Approval
## Task
Review open pull requests in the current repository and approve/complete them if they meet quality standards.
## Instructions
### 1. Get Repository Information
- Use `mcp__ado__repo_get_repo_by_name_or_id` with:
  - Project: `Program Unify`
  - Repository: `unify_2_1_dm_synapse_env_d10`
- Extract the repository ID: `d3fa6f02-bfdf-428d-825c-7e7bd4e7f338`
### 2. List Open Pull Requests
- Use `mcp__ado__repo_list_pull_requests_by_repo_or_project` with:
  - Repository ID: `d3fa6f02-bfdf-428d-825c-7e7bd4e7f338`
  - Status: `Active`
- If `$ARGUMENTS` is provided, filter to that specific PR ID
- Display all open PRs with key details (ID, title, source/target branches, author)
### 3. Review Each Pull Request
For each PR (or the specified PR):
#### 3.1 Get PR Details
- Use `mcp__ado__repo_get_pull_request_by_id` to get full PR details
- Check merge status - if conflicts exist, stop and report
#### 3.2 Get PR Changes
- Use `mcp__ado__repo_search_commits` to get commits in the PR
- Identify files changed and scope of changes
#### 3.3 Review Code Quality
Read changed files and evaluate:
1. **Code Quality & Maintainability**
   - Proper use of type hints and descriptive variable names
   - Maximum line length (240 chars) compliance
   - No blank lines inside functions
   - Proper import organization
   - Use of the `@synapse_error_print_handler` decorator
   - Proper error handling with meaningful messages
2. **PySpark Best Practices**
   - DataFrame operations over raw SQL
   - Proper use of `TableUtilities` methods
   - Correct logging with `NotebookLogger`
   - Proper session management
3. **ETL Pattern Compliance**
   - Follows the ETL class pattern for Silver/Gold layers
   - Proper extract/transform/load method structure
   - Correct database and table naming conventions
4. **Standards Compliance**
   - Follows project coding standards from `.claude/rules/python_rules.md`
   - Docstrings present (unless explicitly instructed to omit)
   - Proper use of configuration from `configuration.yaml`
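Two of the checks above, the 240-character line limit and the ban on blank lines inside functions, are mechanical. A minimal sketch of how they could be verified follows; it is illustrative only, not the project's actual linter.

```python
import ast

def check_style(source: str, max_len: int = 240) -> list[str]:
    """Flag lines over max_len and blank lines inside function bodies."""
    lines = source.splitlines()
    issues = [f"line {i} exceeds {max_len} chars"
              for i, line in enumerate(lines, 1) if len(line) > max_len]
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # scan the raw lines of the function body for blanks
            for i in range(node.lineno, node.end_lineno + 1):
                if not lines[i - 1].strip():
                    issues.append(f"blank line inside function '{node.name}' at line {i}")
    return issues
```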
#### 3.4 Review DevOps Considerations
1. **CI/CD Integration**
   - Changes are compatible with the existing pipeline
   - No breaking changes to the deployment process
2. **Configuration & Infrastructure**
   - Proper environment detection pattern
   - Azure integration handled correctly
   - No hardcoded paths or credentials
3. **Testing & Quality Gates**
   - Syntax validation would pass
   - Linting compliance (`ruff check`)
   - Test coverage for new functionality
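The "environment detection pattern" mentioned above typically keys off runtime signals rather than hardcoded paths. A minimal sketch, assuming env-var detection; the variable names here (`AZURE_SERVICE`, `SYNAPSE_ENV`) are illustrative placeholders, and the real deployment may use different signals entirely.

```python
import os

def detect_environment() -> str:
    """Return 'synapse' when runtime signals indicate Azure Synapse, else 'local'.

    The env-var names checked here are placeholders, not confirmed
    deployment variables."""
    if os.environ.get("AZURE_SERVICE") or os.environ.get("SYNAPSE_ENV"):
        return "synapse"
    return "local"
```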
#### 3.5 Deep PySpark Analysis (Conditional)
**Only execute if PR modifies PySpark ETL code**
Check if PR changes affect:
- `python_files/pipeline_operations/bronze_layer_deployment.py`
- `python_files/pipeline_operations/silver_dag_deployment.py`
- `python_files/pipeline_operations/gold_dag_deployment.py`
- Any files in `python_files/silver/`
- Any files in `python_files/gold/`
- `python_files/utilities/session_optimiser.py`
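The conditional trigger above reduces to simple path matching against the changed-file list. A sketch of that gate, using the exact paths listed:

```python
PYSPARK_TRIGGER_FILES = (
    "python_files/pipeline_operations/bronze_layer_deployment.py",
    "python_files/pipeline_operations/silver_dag_deployment.py",
    "python_files/pipeline_operations/gold_dag_deployment.py",
    "python_files/utilities/session_optimiser.py",
)
PYSPARK_TRIGGER_DIRS = ("python_files/silver/", "python_files/gold/")

def needs_pyspark_review(changed_files: list[str]) -> bool:
    """True when any changed path matches the trigger files or directories."""
    return any(
        f in PYSPARK_TRIGGER_FILES or f.startswith(PYSPARK_TRIGGER_DIRS)
        for f in changed_files
    )
```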
**If PySpark files are modified, use Task tool to launch pyspark-engineer agent:**
```
Task tool parameters:
- subagent_type: "pyspark-engineer"
- description: "Deep PySpark analysis for PR #[PR_ID]"
- prompt: "
Perform expert-level PySpark analysis for PR #[PR_ID]:
PR Details:
- Title: [PR_TITLE]
- Changed Files: [LIST_OF_CHANGED_FILES]
- Source Branch: [SOURCE_BRANCH]
- Target Branch: [TARGET_BRANCH]
Review Requirements:
1. Read all changed PySpark files
2. Analyze transformation logic for:
   - Partitioning strategies and data skew
   - Shuffle optimisation opportunities
   - Broadcast join usage and optimisation
   - Memory management and caching strategies
   - DataFrame operation efficiency
3. Validate Medallion Architecture compliance:
   - Bronze layer: Raw data preservation patterns
   - Silver layer: Cleansing and standardization
   - Gold layer: Business model optimisation
4. Check performance considerations:
   - Identify potential bottlenecks
   - Suggest optimisation opportunities
   - Validate cost-efficiency patterns
5. Verify test coverage:
   - Check for pytest test files
   - Validate test completeness
   - Suggest missing test scenarios
6. Review production readiness:
   - Error handling for data pipeline failures
   - Idempotent operation design
   - Monitoring and logging completeness
Provide detailed findings in this format:
## PySpark Analysis Results
### Critical Issues (blocking)
- [List any critical performance or correctness issues]
### Performance Optimisations
- [Specific optimisation recommendations]
### Architecture Compliance
- [Medallion architecture adherence assessment]
### Test Coverage
- [Test completeness and gaps]
### Recommendations
- [Specific actionable improvements]
Return your analysis for integration into the PR review.
"
```
**Integration of PySpark Analysis:**
- If pyspark-engineer identifies critical issues → Add to review comments
- If optimisations suggested → Add as optional improvement comments
- If architecture violations found → Add as required changes
- Include all findings in final review summary
### 4. Provide Review Comments
- Use `mcp__ado__repo_list_pull_request_threads` to check existing review comments
- If issues are found, use `mcp__ado__repo_create_pull_request_thread` to add:
  - Specific file-level comments with line numbers
  - A clear description of each issue
  - Suggested improvements
- Mark threads as `Active` if changes are required
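A file-level thread payload can be shaped as below. The field names mirror the Azure DevOps REST pull-request-thread schema; the MCP tool may accept a different parameter shape, so treat this as an assumption rather than the tool's actual signature.

```python
def build_review_thread(file_path: str, line: int, message: str) -> dict:
    """Shape a file-level review comment anchored to a line.

    Field names follow the Azure DevOps REST thread payload (an assumption
    about what the MCP tool expects)."""
    return {
        "comments": [{"content": message, "commentType": "text"}],
        "status": "active",  # thread stays open until changes are made
        "threadContext": {
            "filePath": file_path,
            "rightFileStart": {"line": line, "offset": 1},
            "rightFileEnd": {"line": line, "offset": 1},
        },
    }
```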
### 5. Approve and Complete PR (if satisfied)
**Only proceed if ALL of the following criteria are met:**
- No merge conflicts
- Code quality standards met
- PySpark best practices followed
- ETL patterns correct
- No DevOps concerns
- Proper error handling and logging
- Standards compliant
- **PySpark analysis (if performed) shows no critical issues**
- **Performance optimisations either implemented or deferred with justification**
- **Medallion architecture compliance validated**
**If approved:**
1. Use `mcp__ado__repo_update_pull_request` with:
   - Set `autoComplete: true`
   - Set `mergeStrategy: "NoFastForward"` (or `"Squash"` if many small commits)
   - Set `deleteSourceBranch: false` (preserve branch history)
   - Set `transitionWorkItems: true`
   - Add an approval comment explaining what was reviewed
2. Confirm completion with a summary:
   - PR ID and title
   - Number of commits reviewed
   - Key changes identified
   - Approval rationale
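The approval settings above can be collected into a single options payload. Key names here follow the Azure DevOps `GitPullRequestCompletionOptions` shape; the MCP tool's exact parameters are an assumption, so this is a sketch rather than the tool's real interface.

```python
def build_completion_options(squash: bool = False) -> dict:
    """Auto-complete settings matching the approval checklist.

    Key names follow Azure DevOps GitPullRequestCompletionOptions
    (an assumption about the MCP tool's parameter shape)."""
    return {
        "autoComplete": True,
        "completionOptions": {
            "mergeStrategy": "squash" if squash else "noFastForward",
            "deleteSourceBranch": False,  # preserve branch history
            "transitionWorkItems": True,
        },
    }
```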
### 6. Report Results
Provide a comprehensive summary:
- Total open PRs reviewed
- PRs approved and completed (with IDs)
- PRs requiring changes (with summary of issues)
- PRs blocked by merge conflicts
- **PySpark analysis findings (if performed)**
- **Performance optimisation recommendations**
## Important Notes
- **No deferrals**: All identified issues must be addressed before approval
- **Immediate action**: If improvements are needed, request them now; do not leave "future work" comments
- **Thorough review**: Check both code quality AND DevOps considerations
- **Professional objectivity**: Prioritize technical accuracy over validation
- **Merge conflicts**: Do NOT approve PRs with merge conflicts - report them for manual resolution