Initial commit

commands/background.md (new file, 237 lines, executable)

---
|
||||
description: Fires off an agent in the background to complete tasks autonomously
|
||||
argument-hint: [user-prompt] | [task-file-name]
|
||||
allowed-tools: Read, Task, TodoWrite
|
||||
---
|
||||
|
||||
# Background PySpark Data Engineer Agent
|
||||
|
||||
Launch a PySpark data engineer agent to work autonomously in the background on ETL tasks, data pipeline fixes, or code reviews.
|
||||
|
||||
## Usage
|
||||
|
||||
**Option 1: Direct prompt**
|
||||
```
|
||||
/background "Fix the validation issues in g_xa_mg_statsclasscount.py"
|
||||
```
|
||||
|
||||
**Option 2: Task file from .claude/tasks/**
|
||||
```
|
||||
/background code_review_fixes_task_list.md
|
||||
```
|
||||
|
||||
## Variables
|
||||
|
||||
- `TASK_INPUT`: Either a direct prompt string or a task file name from `.claude/tasks/`
|
||||
- `TASK_FILE_PATH`: Full path to task file if using a task file
|
||||
- `PROMPT_CONTENT`: The actual prompt to send to the agent
|
||||
|
||||
## Instructions
|
||||
|
||||
### 1. Determine Task Source
|
||||
|
||||
Check whether `$ARGUMENTS` looks like a file name (ends with `.md` or contains no spaces); a shell sketch of this check appears after the list:
|
||||
- If YES: It's a task file name from `.claude/tasks/`
|
||||
- If NO: It's a direct user prompt
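A minimal bash sketch of this heuristic, assuming for illustration that `$ARGUMENTS` is available as a shell variable (in practice the command applies this logic itself rather than running a script):

```bash
# Hypothetical helper: decide whether the argument names a task file or is a free-form prompt.
ARG="$ARGUMENTS"
case "$ARG" in
  *.md)  KIND="task-file" ;;       # ends with .md -> treat as a task file
  *" "*) KIND="direct-prompt" ;;   # contains spaces -> treat as a direct prompt
  *)     KIND="task-file" ;;       # single token without spaces -> assume task file
esac
echo "$KIND"
```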
|
||||
|
||||
### 2. Load Task Content
|
||||
|
||||
**If using task file:**
|
||||
1. List all available task files in `.claude/tasks/` directory
|
||||
2. Find the task file matching the provided name (exact match first, then partial match; see the sketch at the end of this step)
|
||||
3. Read the task file content
|
||||
4. Use the full task file content as the prompt
|
||||
|
||||
**If using direct prompt:**
|
||||
1. Use the `$ARGUMENTS` directly as the prompt
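For the task-file path above, a minimal bash sketch of the listing and partial-name matching; the matching behaviour shown here (case-insensitive, first hit wins) is an assumption:

```bash
NAME="code_review"                                           # value taken from $ARGUMENTS
ls .claude/tasks/*.md                                        # 1. list available task files
MATCH=$(ls .claude/tasks/ | grep -i -- "$NAME" | head -1)    # 2. prefer an exact match; fall back to partial
cat ".claude/tasks/$MATCH"                                   # 3. use the matched file content as the prompt
```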
|
||||
|
||||
### 3. Launch PySpark Data Engineer Agent
|
||||
|
||||
Launch the specialized `pyspark-data-engineer` agent using the Task tool:
|
||||
|
||||
**Important Configuration:**
|
||||
- **subagent_type**: `pyspark-data-engineer`
|
||||
- **model**: `sonnet` (default) or `opus` for complex tasks
|
||||
- **description**: Short 3-5 word description based on task type
|
||||
- **prompt**: Complete, detailed instructions including:
|
||||
- The task content (from file or direct prompt)
|
||||
- Explicit instruction to follow `.claude/CLAUDE.md` best practices
|
||||
- Instruction to run quality gates (syntax check, linting, formatting)
|
||||
- Instruction to create a comprehensive final report
|
||||
|
||||
**Prompt Template:**
|
||||
```
|
||||
You are a PySpark data engineer working on the Unify 2.1 Data Migration project using Azure Synapse Analytics.
|
||||
|
||||
CRITICAL INSTRUCTIONS:
|
||||
- Read and follow ALL guidelines in .claude/CLAUDE.md
|
||||
- Use .claude/rules/python_rules.md for coding standards
|
||||
- Maximum line length: 240 characters
|
||||
- No blank lines inside functions
|
||||
- Use @synapse_error_print_handler decorator on all methods
|
||||
- Use NotebookLogger for all logging (not print statements)
|
||||
- Use TableUtilities methods for DataFrame operations
|
||||
|
||||
TASK TO COMPLETE:
|
||||
{TASK_CONTENT}
|
||||
|
||||
QUALITY GATES (MUST RUN BEFORE COMPLETION):
|
||||
1. Syntax validation: python3 -m py_compile <file_path>
|
||||
2. Linting: ruff check python_files/
|
||||
3. Formatting: ruff format python_files/
|
||||
|
||||
FINAL REPORT REQUIREMENTS:
|
||||
Provide a comprehensive report including:
|
||||
1. Summary of changes made
|
||||
2. Files modified with line numbers
|
||||
3. Quality gate results (syntax, linting, formatting)
|
||||
4. Testing recommendations
|
||||
5. Any issues encountered and resolutions
|
||||
6. Next steps or follow-up tasks
|
||||
|
||||
Work autonomously and complete all tasks in the list. Use your available tools to read files, make edits, run tests, and validate your work.
|
||||
```
|
||||
|
||||
### 4. Inform User
|
||||
|
||||
After launching the agent, inform the user:
|
||||
- Agent has been launched in the background
|
||||
- Task being worked on (summary)
|
||||
- Estimated completion time (if known from task file)
|
||||
- The agent will work autonomously and provide a final report
|
||||
|
||||
## Task File Structure
|
||||
|
||||
Expected task file format in `.claude/tasks/`:
|
||||
|
||||
```markdown
|
||||
# Task Title
|
||||
|
||||
**Date Created**: YYYY-MM-DD
|
||||
**Priority**: HIGH/MEDIUM/LOW
|
||||
**Estimated Total Time**: X minutes
|
||||
**Files Affected**: N
|
||||
|
||||
## Task 1: Description
|
||||
**File**: path/to/file.py
|
||||
**Line**: 123
|
||||
**Estimated Time**: X minutes
|
||||
**Severity**: CRITICAL/HIGH/MEDIUM/LOW
|
||||
|
||||
**Current Code**:
|
||||
```python
|
||||
# code
|
||||
```
|
||||
|
||||
**Required Fix**:
|
||||
```python
|
||||
# fixed code
|
||||
```
|
||||
|
||||
**Reason**: Explanation
|
||||
**Testing**: How to verify
|
||||
|
||||
---
|
||||
|
||||
(Repeat for each task)
|
||||
```
|
||||
|
||||
## Examples
|
||||
|
||||
### Example 1: Using Task File
|
||||
```
|
||||
User: /background code_review_fixes_task_list.md
|
||||
|
||||
Agent Response:
|
||||
1. Lists available task files
|
||||
2. Finds and reads code_review_fixes_task_list.md
|
||||
3. Launches pyspark-data-engineer agent with task content
|
||||
4. Informs user: "PySpark data engineer agent launched to complete 9 code review fixes (est. 27 minutes)"
|
||||
```
|
||||
|
||||
### Example 2: Using Direct Prompt
|
||||
```
|
||||
User: /background "Add data validation methods to the statsclasscount gold table and ensure they are called in the transform method"
|
||||
|
||||
Agent Response:
|
||||
1. Uses the prompt directly
|
||||
2. Launches pyspark-data-engineer agent with the prompt
|
||||
3. Informs user: "PySpark data engineer agent launched to add data validation methods"
|
||||
```
|
||||
|
||||
### Example 3: Partial Task File Name Match
|
||||
```
|
||||
User: /background code_review
|
||||
|
||||
Agent Response:
|
||||
1. Lists task files and finds "code_review_fixes_task_list.md"
|
||||
2. Confirms match with user or proceeds if unambiguous
|
||||
3. Launches agent with task content
|
||||
```
|
||||
|
||||
## Available Task Files
|
||||
|
||||
List available task files from the `.claude/tasks/` directory when the user runs the command without arguments or with the "list" argument:
|
||||
|
||||
```
|
||||
/background
|
||||
/background list
|
||||
```
|
||||
|
||||
Output:
|
||||
```
|
||||
Available task files in .claude/tasks/:
|
||||
1. code_review_fixes_task_list.md (9 tasks, 27 min, HIGH priority)
|
||||
|
||||
Usage:
|
||||
/background <task-file-name> - Run agent with task file
|
||||
/background "your prompt" - Run agent with direct prompt
|
||||
/background list - Show available task files
|
||||
```
|
||||
|
||||
## Agent Workflow
|
||||
|
||||
The pyspark-data-engineer agent will:
|
||||
|
||||
1. **Read Context**: Load .claude/CLAUDE.md, .claude/rules/python_rules.md
|
||||
2. **Analyze Tasks**: Break down task list into actionable items
|
||||
3. **Execute Changes**: Read files, make edits, apply fixes
|
||||
4. **Validate Work**: Run syntax checks, linting, formatting
|
||||
5. **Test Changes**: Execute relevant tests if available
|
||||
6. **Generate Report**: Comprehensive summary of all work completed
|
||||
|
||||
## Best Practices
|
||||
|
||||
### For Task Files
|
||||
- Keep tasks atomic and well-defined
|
||||
- Include file paths and line numbers
|
||||
- Provide current code and required fix
|
||||
- Specify testing requirements
|
||||
- Estimate time for each task
|
||||
- Prioritize tasks (CRITICAL, HIGH, MEDIUM, LOW)
|
||||
|
||||
### For Direct Prompts
|
||||
- Be specific about files and functionality
|
||||
- Reference table/database names
|
||||
- Specify layer (bronze, silver, gold)
|
||||
- Include any business requirements
|
||||
- Mention quality requirements
|
||||
|
||||
## Success Criteria
|
||||
|
||||
Agent task completion requires:
|
||||
- ✅ All code changes implemented
|
||||
- ✅ Syntax validation passes (python3 -m py_compile)
|
||||
- ✅ Linting passes (ruff check)
|
||||
- ✅ Code formatted (ruff format)
|
||||
- ✅ No new issues introduced
|
||||
- ✅ Comprehensive final report provided
|
||||
|
||||
## Notes
|
||||
|
||||
- The agent has access to all project files and tools
|
||||
- It follows medallion architecture patterns (bronze/silver/gold)
|
||||
- It uses established utilities (SparkOptimiser, TableUtilities, NotebookLogger)
|
||||
- It respects project coding standards (240 char lines, no blanks in functions)
|
||||
- It works autonomously without requiring additional user input
|
||||
- Results are reported back when complete

commands/branch-cleanup.md (new file, 181 lines, executable)

---
|
||||
allowed-tools: Bash(git branch:*), Bash(git checkout:*), Bash(git push:*), Bash(git merge:*), Bash(gh:*), Read, Grep
|
||||
argument-hint: [--dry-run] | [--force] | [--remote-only] | [--local-only]
|
||||
description: Use PROACTIVELY to clean up merged branches, stale remotes, and organize branch structure
|
||||
|
||||
---
|
||||
|
||||
# Git Branch Cleanup & Organization
|
||||
|
||||
Clean up merged branches and organize repository structure: $ARGUMENTS
|
||||
|
||||
## Current Repository State
|
||||
|
||||
- All branches: !`git branch -a`
|
||||
- Recent branches: !`git for-each-ref --count=10 --sort=-committerdate refs/heads/ --format='%(refname:short) - %(committerdate:relative)'`
|
||||
- Remote branches: !`git branch -r`
|
||||
- Merged branches: !`git branch --merged main 2>/dev/null || git branch --merged master 2>/dev/null || echo "No main/master branch found"`
|
||||
- Current branch: !`git branch --show-current`
|
||||
|
||||
## Task
|
||||
|
||||
Perform comprehensive branch cleanup and organization based on the repository state and provided arguments.
|
||||
|
||||
## Cleanup Operations
|
||||
|
||||
### 1. Identify Branches for Cleanup
|
||||
- **Merged branches**: Find local branches already merged into main/master
|
||||
- **Stale remote branches**: Identify remote-tracking branches whose upstream branches no longer exist on the remote
|
||||
- **Old branches**: Detect branches with no recent activity (>30 days)
|
||||
- **Feature branches**: Organize `feature/*`, `hotfix/*`, and `release/*` branches
|
||||
|
||||
### 2. Safety Checks Before Deletion
|
||||
- Verify branches are actually merged using `git merge-base` (see the sketch after this list)
|
||||
- Check if branches have unpushed commits
|
||||
- Confirm branches aren't the current working branch
|
||||
- Validate against protected branch patterns
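A bash sketch of these checks for a single candidate branch; the branch and base names are placeholders:

```bash
BRANCH="feature/user-auth"   # candidate for deletion (placeholder)
BASE="main"
# Merged check: the branch tip is an ancestor of the base branch.
git merge-base --is-ancestor "$BRANCH" "$BASE" && echo "merged into $BASE"
# Unpushed-commit check: list commits on the branch that its upstream does not have.
git log --oneline "${BRANCH}@{upstream}..${BRANCH}" 2>/dev/null
# Current-branch check: never delete the branch that is checked out.
[ "$(git branch --show-current)" = "$BRANCH" ] && echo "refusing: currently checked out"
```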
|
||||
|
||||
### 3. Branch Categories to Handle
|
||||
- **Safe to delete**: Merged feature branches, old hotfix branches
|
||||
- **Needs review**: Unmerged branches with old commits
|
||||
- **Keep**: Main branches (main, master, develop), active feature branches
|
||||
- **Archive**: Long-running branches that might need preservation
|
||||
|
||||
### 4. Remote Branch Synchronization
|
||||
- Remove remote-tracking branches whose remote branches have been deleted (see the sketch after this list)
|
||||
- Prune remote references with `git remote prune origin`
|
||||
- Update branch tracking relationships
|
||||
- Clean up remote branch references
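A bash sketch of the remote synchronization step; these commands are safe to run repeatedly:

```bash
# Drop remote-tracking refs whose branches no longer exist on origin.
git fetch --prune origin
# Equivalent cleanup without fetching new objects:
git remote prune origin
# Show local branches whose configured upstream is now gone.
git branch -vv | grep ': gone]'
```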
|
||||
|
||||
## Command Modes
|
||||
|
||||
### Default Mode (Interactive)
|
||||
1. Show branch analysis with recommendations
|
||||
2. Ask for confirmation before each deletion
|
||||
3. Provide summary of actions taken
|
||||
4. Offer to push deletions to remote
|
||||
|
||||
### Dry Run Mode (`--dry-run`)
|
||||
1. Show what would be deleted without making changes
|
||||
2. Display branch analysis and recommendations
|
||||
3. Provide cleanup statistics
|
||||
4. Exit without modifying repository
|
||||
|
||||
### Force Mode (`--force`)
|
||||
1. Delete merged branches without confirmation
|
||||
2. Clean up stale remotes automatically
|
||||
3. Provide summary of all actions taken
|
||||
4. Use with caution - no undo capability
|
||||
|
||||
### Remote Only (`--remote-only`)
|
||||
1. Only clean up remote-tracking branches
|
||||
2. Synchronize with actual remote state
|
||||
3. Remove stale remote references
|
||||
4. Keep all local branches intact
|
||||
|
||||
### Local Only (`--local-only`)
|
||||
1. Only clean up local branches
|
||||
2. Don't affect remote-tracking branches
|
||||
3. Keep remote synchronization intact
|
||||
4. Focus on local workspace organization
|
||||
|
||||
## Safety Features
|
||||
|
||||
### Pre-cleanup Validation
|
||||
- Ensure working directory is clean
|
||||
- Check for uncommitted changes
|
||||
- Verify current branch is safe (not target for deletion)
|
||||
- Create backup references if requested
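A small bash sketch of the clean-tree check and an optional backup of every branch tip (the backup file location is an assumption):

```bash
# Abort if the working tree has uncommitted changes.
[ -z "$(git status --porcelain)" ] || { echo "working tree not clean"; exit 1; }
# Optional backup: record every local branch tip before touching anything.
git for-each-ref --format='%(refname:short) %(objectname)' refs/heads/ > /tmp/branch-backup.txt
```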
|
||||
|
||||
### Protected Branches
|
||||
Never delete branches matching these patterns:
|
||||
- `main`, `master`, `develop`, `staging`, `production`
|
||||
- `release/*` (unless explicitly confirmed)
|
||||
- Current working branch
|
||||
- Branches with unpushed commits (unless forced)
|
||||
|
||||
### Recovery Information
|
||||
- Display git reflog references for deleted branches
|
||||
- Provide commands to recover accidentally deleted branches
|
||||
- Show SHA hashes for branch tips before deletion
|
||||
- Create recovery script if multiple branches deleted
|
||||
|
||||
## Branch Organization Features
|
||||
|
||||
### Naming Convention Enforcement
|
||||
- Suggest renaming branches to follow team conventions
|
||||
- Organize branches by type (feature/, bugfix/, hotfix/)
|
||||
- Identify branches that don't follow naming patterns
|
||||
- Provide batch renaming suggestions
|
||||
|
||||
### Branch Tracking Setup
|
||||
- Set up proper upstream tracking for feature branches
|
||||
- Configure push/pull behavior for new branches
|
||||
- Identify branches missing upstream configuration
|
||||
- Fix broken tracking relationships
|
||||
|
||||
## Output and Reporting
|
||||
|
||||
### Cleanup Summary
|
||||
```
|
||||
Branch Cleanup Summary:
|
||||
✅ Deleted 3 merged feature branches
|
||||
✅ Removed 5 stale remote references
|
||||
✅ Cleaned up 2 old hotfix branches
|
||||
⚠️ Found 1 unmerged branch requiring attention
|
||||
📊 Repository now has 8 active branches (was 18)
|
||||
```
|
||||
|
||||
### Recovery Instructions
|
||||
```
|
||||
Branch Recovery Commands:
|
||||
git checkout -b feature/user-auth 1a2b3c4d # Recover feature/user-auth
|
||||
git push origin feature/user-auth # Restore to remote
|
||||
```
|
||||
|
||||
## Best Practices
|
||||
|
||||
### Regular Maintenance Schedule
|
||||
- Run cleanup weekly for active repositories
|
||||
- Use `--dry-run` first to review changes
|
||||
- Coordinate with team before major cleanups
|
||||
- Document any non-standard branches to preserve
|
||||
|
||||
### Team Coordination
|
||||
- Communicate branch deletion plans with team
|
||||
- Check if anyone has work-in-progress on old branches
|
||||
- Use GitHub/GitLab branch protection rules
|
||||
- Maintain shared documentation of branch policies
|
||||
|
||||
### Branch Lifecycle Management
|
||||
- Delete feature branches immediately after merge
|
||||
- Keep release branches until next major release
|
||||
- Archive long-term experimental branches
|
||||
- Use tags to mark important branch states before deletion
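For the last point, a small bash sketch; the `archive/` tag naming convention is an assumption:

```bash
# Preserve the tip of a long-running branch as a tag before deleting it.
git tag "archive/feature-old-experiment" "feature/old-experiment"
git push origin "archive/feature-old-experiment"    # keep the tag on the remote
git branch -D "feature/old-experiment"              # now safe to delete locally
git push origin --delete "feature/old-experiment"   # and on the remote
```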
|
||||
|
||||
## Example Usage
|
||||
|
||||
```bash
|
||||
# Safe interactive cleanup
|
||||
/branch-cleanup
|
||||
|
||||
# See what would be cleaned without changes
|
||||
/branch-cleanup --dry-run
|
||||
|
||||
# Clean only remote tracking branches
|
||||
/branch-cleanup --remote-only
|
||||
|
||||
# Force cleanup of merged branches
|
||||
/branch-cleanup --force
|
||||
|
||||
# Clean only local branches
|
||||
/branch-cleanup --local-only
|
||||
```
|
||||
|
||||
## Integration with GitHub/GitLab
|
||||
|
||||
If GitHub CLI or GitLab CLI is available:
|
||||
- Check PR status before deleting branches
|
||||
- Verify branches are actually merged in web interface
|
||||
- Clean up both local and remote branches consistently
|
||||
- Update branch protection rules if needed

commands/code-review.md (new file, 70 lines, executable)

---
|
||||
allowed-tools: Read, Bash, Grep, Glob
|
||||
argument-hint: [file-path] | [commit-hash] | --full
|
||||
description: Comprehensive code quality review with security, performance, and architecture analysis
|
||||
|
||||
---
|
||||
|
||||
# Code Quality Review
|
||||
|
||||
Perform comprehensive code quality review: $ARGUMENTS
|
||||
|
||||
## Current State
|
||||
|
||||
- Git status: !`git status --porcelain`
|
||||
- Recent changes: !`git diff --stat HEAD~5`
|
||||
- Repository info: !`git log --oneline -5`
|
||||
- Build status: !`npm run build --dry-run 2>/dev/null || echo "No build script"`
|
||||
|
||||
## Task
|
||||
|
||||
Follow these steps to conduct a thorough code review:
|
||||
|
||||
1. **Repository Analysis**
|
||||
- Examine the repository structure and identify the primary language/framework
|
||||
- Check for configuration files (package.json, requirements.txt, Cargo.toml, etc.)
|
||||
- Review README and documentation for context
|
||||
|
||||
2. **Code Quality Assessment**
|
||||
- Scan for code smells, anti-patterns, and potential bugs
|
||||
- Check for consistent coding style and naming conventions
|
||||
- Identify unused imports, variables, or dead code
|
||||
- Review error handling and logging practices
|
||||
|
||||
3. **Security Review**
|
||||
- Look for common security vulnerabilities (SQL injection, XSS, etc.)
|
||||
- Check for hardcoded secrets, API keys, or passwords (a quick-scan sketch appears after this checklist)
|
||||
- Review authentication and authorization logic
|
||||
- Examine input validation and sanitization
|
||||
|
||||
4. **Performance Analysis**
|
||||
- Identify potential performance bottlenecks
|
||||
- Check for inefficient algorithms or database queries
|
||||
- Review memory usage patterns and potential leaks
|
||||
- Analyze bundle size and optimization opportunities
|
||||
|
||||
5. **Architecture & Design**
|
||||
- Evaluate code organization and separation of concerns
|
||||
- Check for proper abstraction and modularity
|
||||
- Review dependency management and coupling
|
||||
- Assess scalability and maintainability
|
||||
|
||||
6. **Testing Coverage**
|
||||
- Check existing test coverage and quality
|
||||
- Identify areas lacking proper testing
|
||||
- Review test structure and organization
|
||||
- Suggest additional test scenarios
|
||||
|
||||
7. **Documentation Review**
|
||||
- Evaluate code comments and inline documentation
|
||||
- Check API documentation completeness
|
||||
- Review README and setup instructions
|
||||
- Identify areas needing better documentation
|
||||
|
||||
8. **Recommendations**
|
||||
- Prioritize issues by severity (critical, high, medium, low)
|
||||
- Provide specific, actionable recommendations
|
||||
- Suggest tools and practices for improvement
|
||||
- Create a summary report with next steps
|
||||
|
||||
Remember to be constructive and provide specific examples with file paths and line numbers where applicable.
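As a starting point for the security pass in step 3, a hedged bash sketch of a quick hardcoded-secrets scan; the patterns and file globs are illustrative, not exhaustive, and every hit still needs manual review:

```bash
# Rough first pass for hardcoded credentials in common source and config files.
grep -rniE '(api[_-]?key|secret|password|token)[[:space:]]*[:=]' \
  --include='*.py' --include='*.js' --include='*.ts' --include='*.json' \
  --exclude-dir=node_modules --exclude-dir=.git .
```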

commands/create-feature.md (new file, 130 lines, executable)

---
|
||||
allowed-tools: Read, Write, Edit, Bash
|
||||
argument-hint: [feature-name] | [feature-type] [name]
|
||||
description: Scaffold new feature with boilerplate code, tests, and documentation
|
||||
|
||||
---
|
||||
|
||||
# Create Feature
|
||||
|
||||
Scaffold new feature: $ARGUMENTS
|
||||
|
||||
## Current Project Context
|
||||
|
||||
- Project structure: !`find . -maxdepth 2 -type d \( -name src -o -name components -o -name features \) | head -5`
|
||||
- Current branch: !`git branch --show-current`
|
||||
- Package info: @package.json or @Cargo.toml or @requirements.txt (if exists)
|
||||
- Architecture docs: @docs/architecture.md or @README.md (if exists)
|
||||
|
||||
## Task
|
||||
|
||||
Follow this systematic approach to create a new feature: $ARGUMENTS
|
||||
|
||||
1. **Feature Planning**
|
||||
- Define the feature requirements and acceptance criteria
|
||||
- Break down the feature into smaller, manageable tasks
|
||||
- Identify affected components and potential impact areas
|
||||
- Plan the API/interface design before implementation
|
||||
|
||||
2. **Research and Analysis**
|
||||
- Study existing codebase patterns and conventions
|
||||
- Identify similar features for consistency
|
||||
- Research external dependencies or libraries needed
|
||||
- Review any relevant documentation or specifications
|
||||
|
||||
3. **Architecture Design**
|
||||
- Design the feature architecture and data flow
|
||||
- Plan database schema changes if needed
|
||||
- Define API endpoints and contracts
|
||||
- Consider scalability and performance implications
|
||||
|
||||
4. **Environment Setup**
|
||||
- Create a new feature branch: `git checkout -b feature/$ARGUMENTS`
|
||||
- Ensure development environment is up to date
|
||||
- Install any new dependencies required
|
||||
- Set up feature flags if applicable
|
||||
|
||||
5. **Implementation Strategy**
|
||||
- Start with core functionality and build incrementally
|
||||
- Follow the project's coding standards and patterns
|
||||
- Implement proper error handling and validation
|
||||
- Use dependency injection and maintain loose coupling
|
||||
|
||||
6. **Database Changes (if applicable)**
|
||||
- Create migration scripts for schema changes
|
||||
- Ensure backward compatibility
|
||||
- Plan for rollback scenarios
|
||||
- Test migrations on sample data
|
||||
|
||||
7. **API Development**
|
||||
- Implement API endpoints with proper HTTP status codes
|
||||
- Add request/response validation
|
||||
- Implement proper authentication and authorization
|
||||
- Document API contracts and examples
|
||||
|
||||
8. **Frontend Implementation (if applicable)**
|
||||
- Create reusable components following project patterns
|
||||
- Implement responsive design and accessibility
|
||||
- Add proper state management
|
||||
- Handle loading and error states
|
||||
|
||||
9. **Testing Implementation**
|
||||
- Write unit tests for core business logic
|
||||
- Create integration tests for API endpoints
|
||||
- Add end-to-end tests for user workflows
|
||||
- Test error scenarios and edge cases
|
||||
|
||||
10. **Security Considerations**
|
||||
- Implement proper input validation and sanitization
|
||||
- Add authorization checks for sensitive operations
|
||||
- Review for common security vulnerabilities
|
||||
- Ensure data protection and privacy compliance
|
||||
|
||||
11. **Performance Optimization**
|
||||
- Optimize database queries and indexes
|
||||
- Implement caching where appropriate
|
||||
- Monitor memory usage and optimize algorithms
|
||||
- Consider lazy loading and pagination
|
||||
|
||||
12. **Documentation**
|
||||
- Add inline code documentation and comments
|
||||
- Update API documentation
|
||||
- Create user documentation if needed
|
||||
- Update project README if applicable
|
||||
|
||||
13. **Code Review Preparation**
|
||||
- Run all tests and ensure they pass
|
||||
- Run linting and formatting tools
|
||||
- Check for code coverage and quality metrics
|
||||
- Perform self-review of the changes
|
||||
|
||||
14. **Integration Testing**
|
||||
- Test feature integration with existing functionality
|
||||
- Verify feature flags work correctly
|
||||
- Test deployment and rollback procedures
|
||||
- Validate monitoring and logging
|
||||
|
||||
15. **Commit and Push**
|
||||
- Create atomic commits with descriptive messages
|
||||
- Follow conventional commit format if project uses it
|
||||
- Push feature branch: `git push origin feature/$ARGUMENTS`
|
||||
|
||||
16. **Pull Request Creation**
|
||||
- Create PR with comprehensive description
|
||||
- Include screenshots or demos if applicable
|
||||
- Add appropriate labels and reviewers
|
||||
- Link to any related issues or specifications
|
||||
|
||||
17. **Quality Assurance**
|
||||
- Coordinate with QA team for testing
|
||||
- Address any bugs or issues found
|
||||
- Verify accessibility and usability requirements
|
||||
- Test on different environments and browsers
|
||||
|
||||
18. **Deployment Planning**
|
||||
- Plan feature rollout strategy
|
||||
- Set up monitoring and alerting
|
||||
- Prepare rollback procedures
|
||||
- Schedule deployment and communication
|
||||
|
||||
Remember to maintain code quality, follow project conventions, and prioritize user experience throughout the development process.
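A condensed bash sketch of the git-facing steps above (4, 15, and 16); the feature name, base branch, and PR title are placeholders:

```bash
FEATURE="user-auth"                     # placeholder for $ARGUMENTS
git checkout -b "feature/$FEATURE"      # step 4: create the feature branch
# ... implement, test, and commit incrementally ...
git push -u origin "feature/$FEATURE"   # step 15: push with upstream tracking
gh pr create --base main --title "feat: $FEATURE" \
  --body "Summary, screenshots, and linked issues go here"   # step 16: open the PR
```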

commands/create-pr.md (new file, 19 lines, executable)

# Create Pull Request Command
|
||||
|
||||
Create a new branch, commit changes, and submit a pull request.
|
||||
|
||||
## Behavior
|
||||
- Creates a new branch based on current changes
|
||||
- Formats modified files using Biome
|
||||
- Analyzes changes and automatically splits into logical commits when appropriate
|
||||
- Each commit focuses on a single logical change or feature
|
||||
- Creates descriptive commit messages for each logical unit
|
||||
- Pushes branch to remote
|
||||
- Creates pull request with proper summary and test plan
|
||||
|
||||
## Guidelines for Automatic Commit Splitting
|
||||
- Split commits by feature, component, or concern
|
||||
- Keep related file changes together in the same commit
|
||||
- Separate refactoring from feature additions
|
||||
- Ensure each commit can be understood independently
|
||||
- Multiple unrelated changes should be split into separate commits
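A rough bash sketch of the flow this command automates; the branch name is a placeholder and the exact Biome invocation is an assumption that depends on the project setup:

```bash
git checkout -b feature/my-change     # new branch for the current changes
npx @biomejs/biome format --write .   # format modified files (assumed invocation)
git add -p                            # stage hunks so each commit stays one logical change
git commit -m "feat: add the first logical change"
git push -u origin feature/my-change
gh pr create --fill --base main       # open the PR; edit the body to add a summary and test plan
```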

commands/create-prd.md (new file, 36 lines, executable)

---
|
||||
allowed-tools: Read, Write, Edit, Grep, Glob
|
||||
argument-hint: [feature-name] | --template | --interactive
|
||||
description: Create Product Requirements Document (PRD) for new features
|
||||
model: sonnet
|
||||
---
|
||||
|
||||
# Create Product Requirements Document
|
||||
|
||||
You are an experienced Product Manager. Create a Product Requirements Document (PRD) for a feature we are adding to the product: **$ARGUMENTS**
|
||||
|
||||
**IMPORTANT:**
|
||||
- Focus on the feature and user needs, not technical implementation
|
||||
- Do not include any time estimates
|
||||
|
||||
## Product Context
|
||||
|
||||
1. **Product Documentation**: @product-development/resources/product.md (to understand the product)
|
||||
2. **Feature Documentation**: @product-development/current-feature/feature.md (to understand the feature idea)
|
||||
3. **JTBD Documentation**: @product-development/current-feature/JTBD.md (to understand the Jobs to be Done)
|
||||
|
||||
## Task
|
||||
|
||||
Create a comprehensive PRD document that captures the what, why, and how of the product:
|
||||
|
||||
1. Use the PRD template from `@product-development/resources/PRD-template.md`
|
||||
2. Based on the feature documentation, create a PRD that defines:
|
||||
- Problem statement and user needs
|
||||
- Feature specifications and scope
|
||||
- Success metrics and acceptance criteria
|
||||
- User experience requirements
|
||||
- Technical considerations (high-level only)
|
||||
|
||||
3. Output the completed PRD to `product-development/current-feature/PRD.md`
|
||||
|
||||
Focus on creating a comprehensive PRD that clearly defines the feature requirements while maintaining alignment with user needs and business objectives.

commands/create-pull-request.md (new file, 126 lines, executable)

# How to Create a Pull Request Using GitHub CLI
|
||||
|
||||
This guide explains how to create pull requests using GitHub CLI in our project.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
1. Install GitHub CLI if you haven't already:
|
||||
|
||||
```bash
|
||||
# macOS
|
||||
brew install gh
|
||||
|
||||
# Windows
|
||||
winget install --id GitHub.cli
|
||||
|
||||
# Linux
|
||||
# Follow instructions at https://github.com/cli/cli/blob/trunk/docs/install_linux.md
|
||||
```
|
||||
|
||||
2. Authenticate with GitHub:
|
||||
```bash
|
||||
gh auth login
|
||||
```
|
||||
|
||||
## Creating a New Pull Request
|
||||
|
||||
1. First, prepare your PR description following the template in `.github/pull_request_template.md`
|
||||
|
||||
2. Use the `gh pr create` command to create a new pull request:
|
||||
|
||||
```bash
|
||||
# Basic command structure
|
||||
gh pr create --title "✨(scope): Your descriptive title" --body "Your PR description" --base main --draft
|
||||
```
|
||||
|
||||
For more complex PR descriptions with proper formatting, use the `--body-file` option with the exact PR template structure:
|
||||
|
||||
```bash
|
||||
# Create PR with proper template structure
|
||||
gh pr create --title "✨(scope): Your descriptive title" --body-file <(echo -e "## Issue\n\n- resolve:\n\n## Why is this change needed?\nYour description here.\n\n## What would you like reviewers to focus on?\n- Point 1\n- Point 2\n\n## Testing Verification\nHow you tested these changes.\n\n## What was done\npr_agent:summary\n\n## Detailed Changes\npr_agent:walkthrough\n\n## Additional Notes\nAny additional notes.") --base main --draft
|
||||
```
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **PR Title Format**: Use conventional commit format with emojis
|
||||
|
||||
- Always include an appropriate emoji at the beginning of the title
|
||||
- Use the actual emoji character (not the code representation like `:sparkles:`)
|
||||
- Examples:
|
||||
- `✨(supabase): Add staging remote configuration`
|
||||
- `🐛(auth): Fix login redirect issue`
|
||||
- `📝(readme): Update installation instructions`
|
||||
|
||||
2. **Description Template**: Always use our PR template structure from `.github/pull_request_template.md`:
|
||||
|
||||
- Issue reference
|
||||
- Why the change is needed
|
||||
- Review focus points
|
||||
- Testing verification
|
||||
- PR-Agent sections (keep `pr_agent:summary` and `pr_agent:walkthrough` tags intact)
|
||||
- Additional notes
|
||||
|
||||
3. **Template Accuracy**: Ensure your PR description precisely follows the template structure:
|
||||
|
||||
- Don't modify or rename the PR-Agent sections (`pr_agent:summary` and `pr_agent:walkthrough`)
|
||||
- Keep all section headers exactly as they appear in the template
|
||||
- Don't add custom sections that aren't in the template
|
||||
|
||||
4. **Draft PRs**: Start as draft when the work is in progress
|
||||
- Use `--draft` flag in the command
|
||||
- Convert to ready for review when complete using `gh pr ready`
|
||||
|
||||
### Common Mistakes to Avoid
|
||||
|
||||
1. **Incorrect Section Headers**: Always use the exact section headers from the template
|
||||
2. **Modifying PR-Agent Sections**: Don't remove or modify the `pr_agent:summary` and `pr_agent:walkthrough` placeholders
|
||||
3. **Adding Custom Sections**: Stick to the sections defined in the template
|
||||
4. **Using Outdated Templates**: Always refer to the current `.github/pull_request_template.md` file
|
||||
|
||||
### Missing Sections
|
||||
|
||||
Always include all template sections, even if some are marked as "N/A" or "None".
|
||||
|
||||
## Additional GitHub CLI PR Commands
|
||||
|
||||
Here are some additional useful GitHub CLI commands for managing PRs:
|
||||
|
||||
```bash
|
||||
# List your open pull requests
|
||||
gh pr list --author "@me"
|
||||
|
||||
# Check PR status
|
||||
gh pr status
|
||||
|
||||
# View a specific PR
|
||||
gh pr view <PR-NUMBER>
|
||||
|
||||
# Check out a PR branch locally
|
||||
gh pr checkout <PR-NUMBER>
|
||||
|
||||
# Convert a draft PR to ready for review
|
||||
gh pr ready <PR-NUMBER>
|
||||
|
||||
# Add reviewers to a PR
|
||||
gh pr edit <PR-NUMBER> --add-reviewer username1,username2
|
||||
|
||||
# Merge a PR
|
||||
gh pr merge <PR-NUMBER> --squash
|
||||
```
|
||||
|
||||
## Using Templates for PR Creation
|
||||
|
||||
To simplify PR creation with consistent descriptions, you can create a template file:
|
||||
|
||||
1. Create a file named `pr-template.md` with your PR template
|
||||
2. Use it when creating PRs:
|
||||
|
||||
```bash
|
||||
gh pr create --title "feat(scope): Your title" --body-file pr-template.md --base main --draft
|
||||
```
|
||||
|
||||
## Related Documentation
|
||||
|
||||
- [PR Template](.github/pull_request_template.md)
|
||||
- [Conventional Commits](https://www.conventionalcommits.org/)
|
||||
- [GitHub CLI documentation](https://cli.github.com/manual/)

commands/describe.md (new file, 197 lines, executable)

---
|
||||
allowed-tools: Read, mcp__mcp-server-motherduck__query, Grep, Glob, Bash
|
||||
argument-hint: [file-path] (optional - defaults to currently open file)
|
||||
description: Add comprehensive descriptive comments to code files, focusing on data flow, joining logic, and business context
|
||||
---
|
||||
|
||||
# Add Descriptive Comments to Code
|
||||
|
||||
Add detailed, descriptive comments to the selected file: $ARGUMENTS
|
||||
|
||||
## Current Context
|
||||
|
||||
- Currently open file: !`echo $CLAUDE_OPEN_FILE`
|
||||
- File layer detection: !`basename $(dirname $CLAUDE_OPEN_FILE) 2>/dev/null || echo "unknown"`
|
||||
- Git status: !`git status --porcelain $CLAUDE_OPEN_FILE 2>/dev/null || echo "Not in git"`
|
||||
|
||||
## Task
|
||||
|
||||
You will add comprehensive descriptive comments to the **currently open file** (or the file specified in $ARGUMENTS if provided).
|
||||
|
||||
### Instructions
|
||||
|
||||
1. **Determine Target File**
|
||||
- If $ARGUMENTS contains a file path, use that file
|
||||
- Otherwise, use the currently open file from the IDE
|
||||
- Verify the file exists and is readable
|
||||
|
||||
2. **Analyze File Context**
|
||||
- Identify the file type (silver/gold layer transformation, utility, pipeline operation)
|
||||
- Read and understand the complete file structure
|
||||
- Identify the ETL pattern (extract, transform, load methods)
|
||||
- Map out all DataFrame operations and transformations
|
||||
|
||||
3. **Analyze Data Sources and Schemas**
|
||||
- Use DuckDB MCP to query relevant source tables if available:
|
||||
```sql
|
||||
-- Example: Check schema of source table
|
||||
DESCRIBE table_name;
|
||||
SELECT * FROM table_name LIMIT 5;
|
||||
```
|
||||
- Reference `.claude/memory/data_dictionary/` for column definitions and business context
|
||||
- Identify all source tables being read (bronze/silver layer)
|
||||
- Document the schema of input and output DataFrames
|
||||
|
||||
4. **Document Joining Logic (Priority Focus)**
|
||||
- For each join operation, add comments explaining:
|
||||
- **WHY** the join is happening (business reason)
|
||||
- **WHAT** tables are being joined
|
||||
- **JOIN TYPE** (left, inner, outer) and why that type was chosen
|
||||
- **JOIN KEYS** and their meaning
|
||||
- **EXPECTED CARDINALITY** (1:1, 1:many, many:many)
|
||||
- **NULL HANDLING** strategy for unmatched records
|
||||
|
||||
Example format:
|
||||
```python
|
||||
# JOIN: Link incidents to persons involved
|
||||
# Type: LEFT JOIN (preserve all incidents even if person data missing)
|
||||
# Keys: incident_id (unique identifier from FVMS system)
|
||||
# Expected: 1:many (one incident can have multiple persons)
|
||||
# Nulls: Person details will be NULL for incidents with no associated persons
|
||||
joined_df = incident_df.join(person_df, on="incident_id", how="left")
|
||||
```
|
||||
|
||||
5. **Document Transformations Step-by-Step**
|
||||
- Add inline comments explaining each transformation
|
||||
- Describe column derivations and calculations
|
||||
- Explain business rules being applied
|
||||
- Document any data quality fixes or cleansing
|
||||
- Note any deduplication logic
|
||||
|
||||
6. **Document Data Quality Patterns**
|
||||
- Explain null handling strategies
|
||||
- Document default values and their business meaning
|
||||
- Describe validation rules
|
||||
- Note any data type conversions
|
||||
|
||||
7. **Add Function/Method Documentation**
|
||||
- Add docstring-style comments at the start of each method explaining:
|
||||
- Purpose of the method
|
||||
- Input: Source tables and their schemas
|
||||
- Output: Resulting table and schema
|
||||
- Business logic summary
|
||||
|
||||
Example format:
|
||||
```python
|
||||
def transform(self) -> DataFrame:
|
||||
"""
|
||||
Transform incident data with person and location enrichment.
|
||||
|
||||
Input: bronze_fvms.b_fvms_incident (raw incident records)
|
||||
Output: silver_fvms.s_fvms_incident (validated, enriched incidents)
|
||||
|
||||
Transformations:
|
||||
1. Join with person table to add demographic details
|
||||
2. Join with address table to add location coordinates
|
||||
3. Apply business rules for incident classification
|
||||
4. Deduplicate based on incident_id and date_created
|
||||
5. Add row hash for change detection
|
||||
|
||||
Business Context:
|
||||
- Incidents represent family violence events recorded in FVMS
|
||||
- Each incident may involve multiple persons (victims, offenders)
|
||||
- Location data enables geographic analysis and reporting
|
||||
"""
|
||||
```
|
||||
|
||||
8. **Add Header Comments**
|
||||
- Add a comprehensive header at the top of the file explaining:
|
||||
- File purpose and business context
|
||||
- Source systems and tables
|
||||
- Target table and database
|
||||
- Key transformations and business rules
|
||||
- Dependencies on other tables or processes
|
||||
|
||||
9. **Variable Naming Context**
|
||||
- When variable names are abbreviated or unclear, add comments explaining:
|
||||
- What the variable represents
|
||||
- The business meaning of the data
|
||||
- Expected data types and formats
|
||||
- Reference data dictionary entries if available
|
||||
|
||||
10. **Use Data Dictionary References**
|
||||
- Check `.claude/memory/data_dictionary/` for column definitions
|
||||
- Reference these definitions in comments to explain field meanings
|
||||
- Link business terminology to technical column names
|
||||
- Example: `# offence_code: Maps to ANZSOC classification system (see data_dict/cms_offence_codes.md)`
|
||||
|
||||
11. **Query DuckDB for Context (When Available)**
|
||||
- Use MCP DuckDB tool to inspect actual data patterns:
|
||||
- Check distinct values: `SELECT DISTINCT column_name FROM table LIMIT 20;`
|
||||
- Verify join relationships: `SELECT COUNT(*) FROM table1 JOIN table2 ...`
|
||||
- Understand data distributions: `SELECT column, COUNT(*) FROM table GROUP BY column;`
|
||||
- Use insights from queries to write more accurate comments
|
||||
|
||||
12. **Preserve Code Formatting Standards**
|
||||
- Do NOT add blank lines inside functions (project standard)
|
||||
- Maximum line length: 240 characters
|
||||
- Maintain existing indentation
|
||||
- Keep comments concise but informative
|
||||
- Use inline comments for single-line explanations
|
||||
- Use block comments for multi-step processes
|
||||
|
||||
13. **Focus Areas by File Type**
|
||||
|
||||
**Silver Layer Files (`python_files/silver/`):**
|
||||
- Document source bronze tables
|
||||
- Explain validation rules
|
||||
- Describe enumeration mappings
|
||||
- Note data cleansing operations
|
||||
|
||||
**Gold Layer Files (`python_files/gold/`):**
|
||||
- Document all source silver tables
|
||||
- Explain aggregation logic
|
||||
- Describe business metrics calculations
|
||||
- Note analytical transformations
|
||||
|
||||
**Utility Files (`python_files/utilities/`):**
|
||||
- Explain helper function purposes
|
||||
- Document parameter meanings
|
||||
- Describe return values
|
||||
- Note edge cases handled
|
||||
|
||||
14. **Comment Quality Guidelines**
|
||||
- Comments should explain **WHY**, not just **WHAT**
|
||||
- Avoid obvious comments (e.g., don't say "create dataframe" for `df = spark.createDataFrame()`)
|
||||
- Focus on business context and data relationships
|
||||
- Use proper grammar and complete sentences
|
||||
- Be concise but thorough
|
||||
- Think like a new developer reading the code for the first time
|
||||
|
||||
15. **Final Validation**
|
||||
- Run syntax check: `python3 -m py_compile <file>`
|
||||
- Run linting: `ruff check <file>`
|
||||
- Format code: `ruff format <file>`
|
||||
- Ensure all comments are accurate and helpful
|
||||
|
||||
## Example Output Structure
|
||||
|
||||
After adding comments, the file should have:
|
||||
- ✅ Comprehensive header explaining file purpose
|
||||
- ✅ Method-level documentation for extract/transform/load
|
||||
- ✅ Detailed join operation comments (business reason, type, keys, cardinality)
|
||||
- ✅ Step-by-step transformation explanations
|
||||
- ✅ Data quality and validation logic documented
|
||||
- ✅ Variable context for unclear names
|
||||
- ✅ References to data dictionary where applicable
|
||||
- ✅ Business context linking technical operations to real-world meaning
|
||||
|
||||
## Important Notes
|
||||
- **ALWAYS** use Australian English spelling conventions throughout the comments and documentation
|
||||
- **DO NOT** remove or modify existing functionality
|
||||
- **DO NOT** change code structure or logic
|
||||
- **ONLY** add descriptive comments
|
||||
- **PRESERVE** all existing comments
|
||||
- **MAINTAIN** project coding standards (no blank lines in functions, 240 char max)
|
||||
- **USE** the data dictionary and DuckDB queries to provide accurate context
|
||||
- **THINK** about the user who will read this code - walk them through the logic clearly

commands/dev-agent.md (new file, 88 lines, executable)

# PySpark Azure Synapse Expert Agent
|
||||
|
||||
## Overview
|
||||
Expert data engineer specializing in PySpark development within Azure Synapse Analytics environment. Focuses on scalable data processing, optimization, and enterprise-grade solutions.
|
||||
|
||||
## Core Competencies
|
||||
|
||||
### PySpark Expertise
|
||||
- Advanced DataFrame/Dataset operations
|
||||
- Performance optimization and tuning
|
||||
- Custom UDFs and aggregations
|
||||
- Spark SQL query optimization
|
||||
- Memory management and partitioning strategies
|
||||
|
||||
### Azure Synapse Mastery
|
||||
- Synapse Spark pools configuration
|
||||
- Integration with Azure Data Lake Storage
|
||||
- Synapse Pipelines orchestration
|
||||
- Serverless SQL pools interaction
|
||||
|
||||
|
||||
### Data Engineering Skills
|
||||
- ETL/ELT pipeline design
|
||||
- Data quality and validation frameworks
|
||||
|
||||
## Technical Stack
|
||||
|
||||
### Languages & Frameworks
|
||||
- **Primary**: Python, PySpark
|
||||
- **Secondary**: SQL, PowerShell
|
||||
- **Libraries**: pandas, numpy, pytest
|
||||
|
||||
### Azure Services
|
||||
- Azure Synapse Analytics
|
||||
- Azure Data Lake Storage Gen2
|
||||
- Azure Key Vault
|
||||
|
||||
### Tools & Platforms
|
||||
- Git/Azure DevOps
|
||||
- Jupyter/Synapse Notebooks
|
||||
|
||||
## Responsibilities
|
||||
|
||||
### Development
|
||||
- Design optimized PySpark jobs for large-scale data processing
|
||||
- Implement data transformation logic with performance considerations
|
||||
- Create reusable libraries and frameworks
|
||||
- Build automated testing suites for data pipelines
|
||||
|
||||
### Optimization
|
||||
- Analyze and tune Spark job performance
|
||||
- Optimize cluster configurations and resource allocation
|
||||
- Implement caching strategies and data skew handling
|
||||
- Monitor and troubleshoot production workloads
|
||||
|
||||
### Architecture
|
||||
- Design scalable data lake architectures
|
||||
- Establish data partitioning and storage strategies
|
||||
- Define data governance and security protocols
|
||||
- Create disaster recovery and backup procedures
|
||||
|
||||
## Best Practices
|
||||
**CRITICAL**: Read `.claude/CLAUDE.md` for best practices.
|
||||
|
||||
|
||||
### Performance
|
||||
- Leverage broadcast joins and bucketing
|
||||
- Optimize shuffle operations and partition sizes
|
||||
- Use appropriate file formats (Parquet, Delta)
|
||||
- Implement incremental processing patterns
|
||||
|
||||
### Security
|
||||
- Implement row-level and column-level security
|
||||
- Use managed identities and service principals
|
||||
- Encrypt data at rest and in transit
|
||||
- Follow least privilege access principles
|
||||
|
||||
## Communication Style
|
||||
- Provides technical solutions with clear performance implications
|
||||
- Focuses on scalable, production-ready implementations
|
||||
- Emphasizes best practices and enterprise patterns
|
||||
- Delivers concise explanations with practical examples
|
||||
|
||||
## Key Metrics
|
||||
- Pipeline execution time and resource utilization
|
||||
- Data quality scores and SLA compliance
|
||||
- Cost optimization and resource efficiency
|
||||
- System reliability and uptime statistics

commands/explain-code.md (new file, 194 lines, executable)

# Analyze and Explain Code Functionality
|
||||
|
||||
Analyze and explain code functionality
|
||||
|
||||
## Instructions
|
||||
|
||||
Follow this systematic approach to explain code: **$ARGUMENTS**
|
||||
|
||||
1. **Code Context Analysis**
|
||||
- Identify the programming language and framework
|
||||
- Understand the broader context and purpose of the code
|
||||
- Identify the file location and its role in the project
|
||||
- Review related imports, dependencies, and configurations
|
||||
|
||||
2. **High-Level Overview**
|
||||
- Provide a summary of what the code does
|
||||
- Explain the main purpose and functionality
|
||||
- Identify the problem the code is solving
|
||||
- Describe how it fits into the larger system
|
||||
|
||||
3. **Code Structure Breakdown**
|
||||
- Break down the code into logical sections
|
||||
- Identify classes, functions, and methods
|
||||
- Explain the overall architecture and design patterns
|
||||
- Map out data flow and control flow
|
||||
|
||||
4. **Line-by-Line Analysis**
|
||||
- Explain complex or non-obvious lines of code
|
||||
- Describe variable declarations and their purposes
|
||||
- Explain function calls and their parameters
|
||||
- Clarify conditional logic and loops
|
||||
|
||||
5. **Algorithm and Logic Explanation**
|
||||
- Describe the algorithm or approach being used
|
||||
- Explain the logic behind complex calculations
|
||||
- Break down nested conditions and loops
|
||||
- Clarify recursive or asynchronous operations
|
||||
|
||||
6. **Data Structures and Types**
|
||||
- Explain data types and structures being used
|
||||
- Describe how data is transformed or processed
|
||||
- Explain object relationships and hierarchies
|
||||
- Clarify input and output formats
|
||||
|
||||
7. **Framework and Library Usage**
|
||||
- Explain framework-specific patterns and conventions
|
||||
- Describe library functions and their purposes
|
||||
- Explain API calls and their expected responses
|
||||
- Clarify configuration and setup code
|
||||
|
||||
8. **Error Handling and Edge Cases**
|
||||
- Explain error handling mechanisms
|
||||
- Describe exception handling and recovery
|
||||
- Identify edge cases being handled
|
||||
- Explain validation and defensive programming
|
||||
|
||||
9. **Performance Considerations**
|
||||
- Identify performance-critical sections
|
||||
- Explain optimization techniques being used
|
||||
- Describe complexity and scalability implications
|
||||
- Point out potential bottlenecks or inefficiencies
|
||||
|
||||
10. **Security Implications**
|
||||
- Identify security-related code sections
|
||||
- Explain authentication and authorization logic
|
||||
- Describe input validation and sanitization
|
||||
- Point out potential security vulnerabilities
|
||||
|
||||
11. **Testing and Debugging**
|
||||
- Explain how the code can be tested
|
||||
- Identify debugging points and logging
|
||||
- Describe mock data or test scenarios
|
||||
- Explain test helpers and utilities
|
||||
|
||||
12. **Dependencies and Integrations**
|
||||
- Explain external service integrations
|
||||
- Describe database operations and queries
|
||||
- Explain API interactions and protocols
|
||||
- Clarify third-party library usage
|
||||
|
||||
**Explanation Format Examples:**
|
||||
|
||||
**For Complex Algorithms:**
|
||||
```
|
||||
This function implements a depth-first search algorithm:
|
||||
|
||||
1. Line 1-3: Initialize a stack with the starting node and a visited set
|
||||
2. Line 4-8: Main loop - continue until stack is empty
|
||||
3. Line 9-11: Pop a node and check if it's the target
|
||||
4. Line 12-15: Add unvisited neighbors to the stack
|
||||
5. Line 16: Return null if target not found
|
||||
|
||||
Time Complexity: O(V + E) where V is vertices and E is edges
|
||||
Space Complexity: O(V) for the visited set and stack
|
||||
```
|
||||
|
||||
**For API Integration Code:**
|
||||
```
|
||||
This code handles user authentication with a third-party service:
|
||||
|
||||
1. Extract credentials from request headers
|
||||
2. Validate credential format and required fields
|
||||
3. Make API call to authentication service
|
||||
4. Handle response and extract user data
|
||||
5. Create session token and set cookies
|
||||
6. Return user profile or error response
|
||||
|
||||
Error Handling: Catches network errors, invalid credentials, and service unavailability
|
||||
Security: Uses HTTPS, validates inputs, and sanitizes responses
|
||||
```
|
||||
|
||||
**For Database Operations:**
|
||||
```
|
||||
This function performs a complex database query with joins:
|
||||
|
||||
1. Build base query with primary table
|
||||
2. Add LEFT JOIN for related user data
|
||||
3. Apply WHERE conditions for filtering
|
||||
4. Add ORDER BY for consistent sorting
|
||||
5. Implement pagination with LIMIT/OFFSET
|
||||
6. Execute query and handle potential errors
|
||||
7. Transform raw results into domain objects
|
||||
|
||||
Performance Notes: Uses indexes on filtered columns, implements connection pooling
|
||||
```
|
||||
|
||||
13. **Common Patterns and Idioms**
|
||||
- Identify language-specific patterns and idioms
|
||||
- Explain design patterns being implemented
|
||||
- Describe architectural patterns in use
|
||||
- Clarify naming conventions and code style
|
||||
|
||||
14. **Potential Improvements**
|
||||
- Suggest code improvements and optimizations
|
||||
- Identify possible refactoring opportunities
|
||||
- Point out maintainability concerns
|
||||
- Recommend best practices and standards
|
||||
|
||||
15. **Related Code and Context**
|
||||
- Reference related functions and classes
|
||||
- Explain how this code interacts with other components
|
||||
- Describe the calling context and usage patterns
|
||||
- Point to relevant documentation and resources
|
||||
|
||||
16. **Debugging and Troubleshooting**
|
||||
- Explain how to debug issues in this code
|
||||
- Identify common failure points
|
||||
- Describe logging and monitoring approaches
|
||||
- Suggest testing strategies
|
||||
|
||||
**Language-Specific Considerations:**
|
||||
|
||||
**JavaScript/TypeScript:**
|
||||
- Explain async/await and Promise handling
|
||||
- Describe closure and scope behavior
|
||||
- Clarify this binding and arrow functions
|
||||
- Explain event handling and callbacks
|
||||
|
||||
**Python:**
|
||||
- Explain list comprehensions and generators
|
||||
- Describe decorator usage and purpose
|
||||
- Clarify context managers and with statements
|
||||
- Explain class inheritance and method resolution
|
||||
|
||||
**Java:**
|
||||
- Explain generics and type parameters
|
||||
- Describe annotation usage and processing
|
||||
- Clarify stream operations and lambda expressions
|
||||
- Explain exception hierarchy and handling
|
||||
|
||||
**C#:**
|
||||
- Explain LINQ queries and expressions
|
||||
- Describe async/await and Task handling
|
||||
- Clarify delegate and event usage
|
||||
- Explain nullable reference types
|
||||
|
||||
**Go:**
|
||||
- Explain goroutines and channel usage
|
||||
- Describe interface implementation
|
||||
- Clarify error handling patterns
|
||||
- Explain package structure and imports
|
||||
|
||||
**Rust:**
|
||||
- Explain ownership and borrowing
|
||||
- Describe lifetime annotations
|
||||
- Clarify pattern matching and Option/Result types
|
||||
- Explain trait implementations
|
||||
|
||||
Remember to:
|
||||
- Use clear, non-technical language when possible
|
||||
- Provide examples and analogies for complex concepts
|
||||
- Structure explanations logically from high-level to detailed
|
||||
- Include visual diagrams or flowcharts when helpful
|
||||
- Tailor the explanation level to the intended audience

commands/local-commit.md (new file, 361 lines, executable)

---
|
||||
allowed-tools: Bash(git add:*), Bash(git status:*), Bash(git commit:*), Bash(git diff:*), Bash(git log:*), Bash(git push:*), Bash(git pull:*), Bash(git branch:*), mcp__ado__repo_list_branches_by_repo, mcp__ado__repo_search_commits, mcp__ado__repo_create_pull_request, mcp__ado__repo_get_pull_request_by_id, mcp__ado__repo_get_repo_by_name_or_id, mcp__ado__wit_add_work_item_comment, mcp__ado__wit_get_work_item
|
||||
argument-hint: [message] | --no-verify | --amend | --pr-s | --pr-d | --pr-m
|
||||
description: Create well-formatted commits with conventional commit format and emoji, integrated with Azure DevOps
|
||||
---
|
||||
|
||||
# Smart Git Commit with Azure DevOps Integration
|
||||
|
||||
Create well-formatted commit: $ARGUMENTS
|
||||
|
||||
## Repository Configuration
|
||||
- **Project**: Program Unify
|
||||
- **Repository ID**: e030ea00-2f85-4b19-88c3-05a864d7298d
|
||||
- **Repository Name**: unify_2_1_dm_synapse_env_d10
|
||||
- **Branch Structure**: `feature/* → staging → develop → main`
|
||||
- **Main Branch**: main
|
||||
|
||||
## Implementation Logic for Claude
|
||||
|
||||
When processing this command, Claude should:
|
||||
|
||||
1. **Detect Repository**: Check if current repo is `unify_2_1_dm_synapse_env_d10`
|
||||
- Use `git remote -v` or check current directory path
|
||||
- Can also use `mcp__ado__repo_get_repo_by_name_or_id` to verify
|
||||
|
||||
2. **Parse Arguments**: Extract flags from `$ARGUMENTS`
|
||||
- **PR Flags**:
|
||||
- `--pr-s`: Set target = `staging`
|
||||
- `--pr-d`: Set target = `develop`
|
||||
- `--pr-m`: Set target = `main`
|
||||
- `--pr` (no suffix): ERROR if the repository is unify_2_1_dm_synapse_env_d10; otherwise target = `develop`
|
||||
|
||||
3. **Validate Current Branch** (if PR flag provided):
|
||||
- Get current branch: `git branch --show-current`
|
||||
- For `--pr-s`: Require `feature/*` branch (reject `staging`, `develop`, `main`)
|
||||
- For `--pr-d`: Require `staging` branch exactly
|
||||
- For `--pr-m`: Require `develop` branch exactly
|
||||
- If validation fails: Show clear error and exit
|
||||
|
||||
4. **Execute Commit Workflow**:
|
||||
- Stage changes (`git add .`)
|
||||
- Create commit with emoji conventional format
|
||||
- Run pre-commit hooks (unless `--no-verify`)
|
||||
- Push to current branch
|
||||
|
||||
5. **Create Pull Request** (if PR flag):
|
||||
- Call `mcp__ado__repo_create_pull_request` with:
|
||||
- `repository_id`: e030ea00-2f85-4b19-88c3-05a864d7298d
|
||||
- `source_branch`: Current branch from step 3
|
||||
- `target_branch`: Target from step 2
|
||||
- `title`: Extract from commit message
|
||||
- `description`: Generate with summary and test plan
|
||||
- Return PR URL to user
|
||||
|
||||
6. **Add Work Item Comments Automatically** (if PR was created in step 5):
|
||||
- **Condition Check**: Only execute if:
|
||||
- A PR was created in step 5 (any `--pr-*` flag was used)
|
||||
- PR creation was successful and returned a PR ID
|
||||
- **Get Work Items from PR**:
|
||||
- Use `mcp__ado__repo_get_pull_request_by_id` with:
|
||||
- `repositoryId`: e030ea00-2f85-4b19-88c3-05a864d7298d
|
||||
- `pullRequestId`: PR ID from step 5
|
||||
- `includeWorkItemRefs`: true
|
||||
- Extract work item IDs from the PR response
|
||||
- If no work items found, log info message and skip to next step
|
||||
- **Add Comments to Each Work Item**:
|
||||
- For each work item ID extracted from PR:
|
||||
- Use `mcp__ado__wit_get_work_item` to verify work item exists
|
||||
- Generate comment with:
|
||||
- PR title and number
|
||||
- Commit message and SHA
|
||||
- File changes summary from `git diff --stat`
|
||||
- Link to PR in Azure DevOps
|
||||
- Link to commit in Azure DevOps
|
||||
- **IMPORTANT**: Do NOT include any footer text like "Automatically added by /local-commit command" or similar attribution
|
||||
- Call `mcp__ado__wit_add_work_item_comment` with:
|
||||
- `project`: "Program Unify"
|
||||
- `workItemId`: Current work item ID
|
||||
- `comment`: Generated comment with HTML formatting
|
||||
- `format`: "html"
|
||||
- Log success/failure for each work item
|
||||
- If ANY work item fails, warn but don't fail the commit
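
The following bash sketch is a minimal illustration of steps 1–3 above (repository detection, flag parsing and branch validation). It is not the actual implementation, which runs inside Claude; the variable names, target mapping and error messages are assumptions about how the logic could be expressed.

```bash
# Illustrative sketch of steps 1-3 (not the actual implementation)
REPO_NAME="unify_2_1_dm_synapse_env_d10"
CURRENT_BRANCH="$(git branch --show-current)"
IS_TARGET_REPO=false
git remote -v | grep -q "$REPO_NAME" && IS_TARGET_REPO=true

TARGET=""
case "$ARGUMENTS" in
  *--pr-s*) TARGET="staging"
            [[ "$CURRENT_BRANCH" == feature/* ]] || { echo "ERROR: --pr-s requires a feature/* branch"; exit 1; } ;;
  *--pr-d*) TARGET="develop"
            [[ "$CURRENT_BRANCH" == "staging" ]] || { echo "ERROR: --pr-d must be run from staging"; exit 1; } ;;
  *--pr-m*) TARGET="main"
            [[ "$CURRENT_BRANCH" == "develop" ]] || { echo "ERROR: --pr-m must be run from develop"; exit 1; } ;;
  *--pr*)   if $IS_TARGET_REPO; then
              echo "ERROR: --pr alone is not allowed here; use --pr-s, --pr-d or --pr-m"; exit 1
            else
              TARGET="develop"   # legacy behaviour for other repositories
            fi ;;
esac
```

The more specific `--pr-s`/`--pr-d`/`--pr-m` patterns are matched before the bare `--pr` pattern, mirroring the precedence described in step 2.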
|
||||
|
||||
## Current Repository State
|
||||
|
||||
- Git status: !`git status --short`
|
||||
- Current branch: !`git branch --show-current`
|
||||
- Staged changes: !`git diff --cached --stat`
|
||||
- Unstaged changes: !`git diff --stat`
|
||||
- Recent commits: !`git log --oneline -5`
|
||||
|
||||
## What This Command Does
|
||||
|
||||
1. Analyzes current git status and changes
|
||||
2. If no files staged, stages all modified files with `git add`
|
||||
3. Reviews changes with `git diff`
|
||||
4. Analyzes for multiple logical changes
|
||||
5. For complex changes, suggests split commits
|
||||
6. Creates commit with emoji conventional format
|
||||
7. Automatically runs pre-commit hooks (ruff lint/format, trailing whitespace, etc.)
|
||||
- Pre-commit may modify files (auto-fixes)
|
||||
- If files are modified, they'll be re-staged automatically
|
||||
- Use `--no-verify` to skip hooks in emergencies only
|
||||
8. **NEW**: With PR flags, creates Azure DevOps pull request after push
|
||||
- Uses `mcp__ado__repo_create_pull_request` to create PR
|
||||
- Automatically links work items if commit message contains work item IDs
|
||||
- **IMPORTANT Branch Flow Rules** (unify_2_1_dm_synapse_env_d10 ONLY):
|
||||
- `--pr-s`: Feature branch → `staging` (standard feature PR)
|
||||
- `--pr-d`: `staging` → `develop` (promote staging to develop)
|
||||
- `--pr-m`: `develop` → `main` (promote develop to production)
|
||||
- `--pr`: **NOT ALLOWED** - must specify `-s`, `-d`, or `-m` for this repository
|
||||
- **For OTHER repositories**: `--pr` creates PR to `develop` branch (legacy behavior)
|
||||
9. **NEW**: Automatically adds comments to linked work items after PR creation
|
||||
- Retrieves work items linked to the PR using `mcp__ado__repo_get_pull_request_by_id`
|
||||
- Automatically adds comment to each linked work item with:
|
||||
- PR title and number
|
||||
- Commit message and SHA
|
||||
- Summary of file changes
|
||||
- Direct link to PR in Azure DevOps
|
||||
- Direct link to commit in Azure DevOps
|
||||
- **IMPORTANT**: No footer attribution text (e.g., "Automatically added by /local-commit command")
|
||||
- Validates work items exist before commenting
|
||||
- Continues even if some work items fail (warns only)
|
||||
|
||||
## Commit Message Format
|
||||
|
||||
### Type + Emoji Mapping
|
||||
- ✨ `feat`: New feature
|
||||
- 🐛 `fix`: Bug fix
|
||||
- 📝 `docs`: Documentation
|
||||
- 💄 `style`: Formatting/style
|
||||
- ♻️ `refactor`: Code refactoring
|
||||
- ⚡️ `perf`: Performance improvements
|
||||
- ✅ `test`: Tests
|
||||
- 🔧 `chore`: Tooling, configuration
|
||||
- 🚀 `ci`: CI/CD improvements
|
||||
- ⏪️ `revert`: Reverting changes
|
||||
- 🚨 `fix`: Compiler/linter warnings
|
||||
- 🔒️ `fix`: Security issues
|
||||
- 🩹 `fix`: Simple non-critical fix
|
||||
- 🚑️ `fix`: Critical hotfix
|
||||
- 🎨 `style`: Code structure/format
|
||||
- 🔥 `fix`: Remove code/files
|
||||
- 📦️ `chore`: Dependencies
|
||||
- 🌱 `chore`: Seed files
|
||||
- 🧑‍💻 `chore`: Developer experience
|
||||
- 🏷️ `feat`: Types
|
||||
- 💬 `feat`: Text/literals
|
||||
- 🌐 `feat`: i18n/l10n
|
||||
- 💡 `feat`: Business logic
|
||||
- 📱 `feat`: Responsive design
|
||||
- 🚸 `feat`: UX improvements
|
||||
- ♿️ `feat`: Accessibility
|
||||
- 🗃️ `db`: Database changes
|
||||
- 🚩 `feat`: Feature flags
|
||||
- ⚰️ `refactor`: Remove dead code
|
||||
- 🦺 `feat`: Validation
|
||||
|
||||
## Commit Strategy
|
||||
|
||||
### Single Commit (Default)
|
||||
```bash
|
||||
git add .
|
||||
git commit -m "✨ feat: implement user auth"
|
||||
```
|
||||
|
||||
### Multiple Commits (Complex Changes)
|
||||
```bash
|
||||
# Stage and commit separately
|
||||
git add src/auth.py
|
||||
git commit -m "✨ feat: add authentication module"
|
||||
|
||||
git add tests/test_auth.py
|
||||
git commit -m "✅ test: add auth unit tests"
|
||||
|
||||
git add docs/auth.md
|
||||
git commit -m "📝 docs: document auth API"
|
||||
|
||||
# Push all commits
|
||||
git push
|
||||
```
|
||||
|
||||
## Pre-Commit Hooks
|
||||
|
||||
Your project uses pre-commit with:
|
||||
- **Ruff**: Linting with auto-fix + formatting
|
||||
- **Standard hooks**: Trailing whitespace, AST check, YAML/JSON/TOML validation
|
||||
- **Security**: Private key detection
|
||||
- **Quality**: Debug statement detection, merge conflict check
|
||||
|
||||
**Important**: Pre-commit hooks will auto-fix issues and may modify your files. The commit process will:
|
||||
1. Run pre-commit hooks
|
||||
2. If hooks modify files, automatically re-stage them
|
||||
3. Complete the commit with all fixes applied
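
A minimal sketch of that retry behaviour, assuming hooks that modify files cause the first commit attempt to fail (`$COMMIT_MSG` is a placeholder, not part of this command):

```bash
git add .
if ! git commit -m "$COMMIT_MSG"; then   # hooks may auto-fix files and abort the commit
    git add -u                           # re-stage the hook-modified files
    git commit -m "$COMMIT_MSG"          # retry with the fixes applied
fi
```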
|
||||
|
||||
## Command Options
|
||||
|
||||
- `--no-verify`: Skip pre-commit checks (emergency use only)
|
||||
- `--amend`: Amend previous commit
|
||||
- **`--pr-s`**: Create PR to `staging` branch (feature → staging)
|
||||
- **`--pr-d`**: Create PR to `develop` branch (staging → develop)
|
||||
- **`--pr-m`**: Create PR to `main` branch (develop → main)
|
||||
- `--pr`: Legacy flag for other repositories (creates PR to `develop`)
|
||||
- **NOT ALLOWED** in unify_2_1_dm_synapse_env_d10 - must use `-s`, `-d`, or `-m`
|
||||
- Default: Run all pre-commit hooks and create new commit
|
||||
- **Automatic Work Item Comments**: When using any PR flag, work items linked to the PR will automatically receive comments with commit details (no footer attribution)
|
||||
|
||||
## Azure DevOps Integration Features
|
||||
|
||||
### Pull Request Workflow (PR Flags)
|
||||
When using PR flags, the command will:
|
||||
1. Commit changes locally
|
||||
2. Push to remote branch
|
||||
3. Validate repository and branch configuration:
|
||||
- **THIS repo (unify_2_1_dm_synapse_env_d10)**: Requires explicit flag (`--pr-s`, `--pr-d`, or `--pr-m`)
|
||||
- `--pr-s`: Current feature branch → `staging`
|
||||
- `--pr-d`: Must be on `staging` branch → `develop`
|
||||
- `--pr-m`: Must be on `develop` branch → `main`
|
||||
- `--pr` alone: **ERROR** - must specify target
|
||||
- **OTHER repos**: `--pr` creates PR to `develop` (all other flags ignored)
|
||||
4. Use `mcp__ado__repo_create_pull_request` to create PR with:
|
||||
- **Title**: Extracted from commit message
|
||||
- **Description**: Full commit details with summary and test plan
|
||||
- **Source Branch**: Current branch
|
||||
- **Target Branch**: Determined by flag and repository
|
||||
- **Work Items**: Auto-linked from commit message (e.g., "fixes #12345")
|
||||
|
||||
### Viewing Commit History
|
||||
You can view commit history using:
|
||||
- `mcp__ado__repo_search_commits` - Search commits by branch, author, date range
|
||||
- Traditional `git log` - For local history
|
||||
|
||||
### Branch Management
|
||||
- `mcp__ado__repo_list_branches_by_repo` - View all Azure DevOps branches
|
||||
- `git branch` - View local branches
|
||||
|
||||
## Branch Validation Rules (unify_2_1_dm_synapse_env_d10)
|
||||
|
||||
Before creating a PR, the command validates:
|
||||
|
||||
### --pr-s (Feature → Staging)
|
||||
- ✅ **ALLOWED**: Any `feature/*` branch
|
||||
- ❌ **BLOCKED**: `staging`, `develop`, `main` branches
|
||||
- **Target**: `staging`
|
||||
|
||||
### --pr-d (Staging → Develop)
|
||||
- ✅ **ALLOWED**: Only `staging` branch
|
||||
- ❌ **BLOCKED**: All other branches (including `feature/*`)
|
||||
- **Target**: `develop`
|
||||
|
||||
### --pr-m (Develop → Main)
|
||||
- ✅ **ALLOWED**: Only `develop` branch
|
||||
- ❌ **BLOCKED**: All other branches (including `staging`, `feature/*`)
|
||||
- **Target**: `main`
|
||||
|
||||
### --pr (Legacy - NOT ALLOWED)
|
||||
- ❌ **BLOCKED**: All branches in unify_2_1_dm_synapse_env_d10
|
||||
- 💡 **Error Message**: "Must use --pr-s, --pr-d, or --pr-m for this repository"
|
||||
- ✅ **ALLOWED**: All other repositories (targets `develop`)
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Let pre-commit work** - Don't use `--no-verify` unless absolutely necessary
|
||||
2. **Atomic commits** - One logical change per commit
|
||||
3. **Descriptive messages** - Emoji + type + clear description
|
||||
4. **Review before commit** - Always check `git diff`
|
||||
5. **Clean history** - Split complex changes into multiple commits
|
||||
6. **Trust the hooks** - They maintain code quality automatically
|
||||
7. **Use correct PR flag** - `--pr-s` for features, `--pr-d` for staging promotion, `--pr-m` for production
|
||||
8. **Link work items** - Reference Azure DevOps work items in commit messages (e.g., "#43815") to enable automatic PR linking
|
||||
9. **Validate branch** - Ensure you're on the correct branch before using `--pr-d` or `--pr-m`
|
||||
10. **Work item linking** - Work items linked to PRs will automatically receive comments with commit details
|
||||
11. **Keep stakeholders informed** - Use PR flags to ensure work items are automatically updated with progress
|
||||
|
||||
## Example Workflows
|
||||
|
||||
### Simple Commit
|
||||
```bash
|
||||
/commit "fix: resolve enum import error"
|
||||
```
|
||||
|
||||
### Commit with Work Item
|
||||
```bash
|
||||
/commit "feat: add enum imports for Synapse environment"
|
||||
```
|
||||
|
||||
### Commit and Create PR (Feature to Staging)
|
||||
```bash
|
||||
/commit --pr-s "feat: refactor commit command with ADO MCP integration"
|
||||
```
|
||||
This will:
|
||||
1. Create commit locally
|
||||
2. Push to current branch
|
||||
3. Create PR: `feature/xyz → staging`
|
||||
4. Link work items automatically if mentioned in commit message
|
||||
|
||||
### Promote Staging to Develop
|
||||
```bash
|
||||
# First checkout staging branch
|
||||
git checkout staging
|
||||
git pull origin staging
|
||||
|
||||
# Then commit and create PR
|
||||
/commit --pr-d "release: promote staging changes to develop"
|
||||
```
|
||||
This will:
|
||||
1. Create commit on `staging` branch
|
||||
2. Push to `staging`
|
||||
3. Create PR: `staging → develop`
|
||||
|
||||
### Promote Develop to Main (Production)
|
||||
```bash
|
||||
# First checkout develop branch
|
||||
git checkout develop
|
||||
git pull origin develop
|
||||
|
||||
# Then commit and create PR
|
||||
/commit --pr-m "release: promote develop to production"
|
||||
```
|
||||
This will:
|
||||
1. Create commit on `develop` branch
|
||||
2. Push to `develop`
|
||||
3. Create PR: `develop → main`
|
||||
|
||||
### Error: Using --pr without suffix
|
||||
```bash
|
||||
/commit --pr "feat: some feature"
|
||||
```
|
||||
**Result**: ERROR - unify_2_1_dm_synapse_env_d10 requires explicit PR target (`--pr-s`, `--pr-d`, or `--pr-m`)
|
||||
|
||||
### Feature PR with Automatic Work Item Comments
|
||||
```bash
|
||||
# On feature/xyz branch
|
||||
/commit --pr-s "feat(user-auth): implement OAuth2 authentication #12345"
|
||||
```
|
||||
This will:
|
||||
1. Create commit on feature branch
|
||||
2. Push to feature branch
|
||||
3. Create PR: `feature/xyz → staging`
|
||||
4. Link work item #12345 to the PR
|
||||
5. Automatically add comment to work item #12345 with:
|
||||
- PR title and number
|
||||
- Commit message and SHA
|
||||
- File changes summary
|
||||
- Link to PR in Azure DevOps
|
||||
- Link to commit in Azure DevOps
|
||||
- (No footer attribution text)
|
||||
|
||||
### Staging to Develop PR with Multiple Work Items
|
||||
```bash
|
||||
# On staging branch
|
||||
/commit --pr-d "release: promote staging to develop - fixes #12345, #67890"
|
||||
```
|
||||
This will:
|
||||
1. Create commit on `staging` branch
|
||||
2. Push to `staging`
|
||||
3. Create PR: `staging → develop`
|
||||
4. Link work items #12345 and #67890 to the PR
|
||||
5. Automatically add comments to both work items with PR and commit details (without footer attribution)
|
||||
|
||||
**Note**: Work items are automatically detected from commit message and linked to PR. Comments are added automatically to all linked work items without any footer text.
|
||||
125
commands/multi-agent.md
Executable file
@@ -0,0 +1,125 @@
|
||||
---
|
||||
description: Discuss multi-agent workflow strategy for a specific task
|
||||
argument-hint: [task-description]
|
||||
allowed-tools: Read, Task, TodoWrite
|
||||
---
|
||||
|
||||
# Multi-Agent Workflow Discussion
|
||||
|
||||
Prepare to discuss how you will use a multi-agent workflow to ${ARGUMENTS}.
|
||||
|
||||
## Instructions
|
||||
|
||||
1. **Analyze the Task**: ${ARGUMENTS}
|
||||
- Break down the complexity
|
||||
- Identify parallelizable components
|
||||
- Determine if multi-agent approach is optimal
|
||||
|
||||
2. **Evaluate Approach**:
|
||||
- Should this use `/background` (single agent) or `/orchestrate` (multiple agents)?
|
||||
- How many agents would be optimal?
|
||||
- What are the dependencies between subtasks?
|
||||
|
||||
3. **Design Strategy**:
|
||||
- Outline the orchestration plan
|
||||
- Define agent responsibilities
|
||||
- Specify communication format (JSON responses)
|
||||
- Identify quality gates
|
||||
|
||||
4. **Provide Recommendations**:
|
||||
- Best command to use (`/background` vs `/orchestrate`)
|
||||
- Number of agents (if using orchestrate: 2-8)
|
||||
- Subtask breakdown
|
||||
- Estimated completion time
|
||||
- Resource requirements
|
||||
|
||||
## Output Format
|
||||
|
||||
Provide a structured discussion covering:
|
||||
|
||||
1. **Task Analysis**: Understanding of "${ARGUMENTS}"
|
||||
2. **Complexity Assessment**: Simple/Moderate/High
|
||||
3. **Recommended Approach**: `/background` or `/orchestrate`
|
||||
4. **Agent Breakdown** (if orchestrate):
|
||||
- Agent 1: Responsibility
|
||||
- Agent 2: Responsibility
|
||||
- ...
|
||||
5. **Dependencies**: None / Sequential / Hybrid
|
||||
6. **Estimated Time**: X minutes
|
||||
7. **Next Steps**: Commands to run
|
||||
|
||||
## Examples
|
||||
|
||||
### Example 1: Simple Task
|
||||
```
|
||||
User: /aa_command "optimize the g_x_mg_statsclasscount table"
|
||||
|
||||
Output:
|
||||
Task Analysis: Single gold table optimization - focused work
|
||||
Complexity: Moderate
|
||||
Recommended: /background
|
||||
Reasoning: Single file, sequential optimization steps
|
||||
Estimated Time: 15 minutes
|
||||
Next Step: /background "Optimize g_x_mg_statsclasscount.py for performance"
|
||||
```
|
||||
|
||||
### Example 2: Complex Task
|
||||
```
|
||||
User: /aa_command "implement data quality framework across all layers"
|
||||
|
||||
Output:
|
||||
Task Analysis: Cross-layer feature implementation
|
||||
Complexity: High
|
||||
Recommended: /orchestrate
|
||||
Agent Breakdown:
|
||||
- Agent 1: Design quality schema and base classes
|
||||
- Agent 2: Implement bronze layer validators
|
||||
- Agent 3: Implement silver layer validators
|
||||
- Agent 4: Implement gold layer validators
|
||||
- Agent 5: Create validation tests
|
||||
- Agent 6: Update documentation
|
||||
Dependencies: Agent 1 must complete before 2-4 (hybrid approach)
|
||||
Estimated Time: 45-60 minutes
|
||||
Next Step: Create task file and run /orchestrate data_quality_framework.md
|
||||
```
|
||||
|
||||
### Example 3: Medium Complexity
|
||||
```
|
||||
User: /aa_command "fix all linting errors in the silver layer"
|
||||
|
||||
Output:
|
||||
Task Analysis: Code quality fixes across multiple files
|
||||
Complexity: Moderate-High
|
||||
Recommended: /orchestrate (if >15 files) or /background (if <15 files)
|
||||
Agent Breakdown:
|
||||
- Agent 1: Fix linting in silver_cms files
|
||||
- Agent 2: Fix linting in silver_fvms files
|
||||
- Agent 3: Fix linting in silver_nicherms files
|
||||
Dependencies: None (fully parallel)
|
||||
Estimated Time: 20-30 minutes
|
||||
Next Step: /orchestrate "Fix linting errors: silver_cms, silver_fvms, silver_nicherms in parallel"
|
||||
```
|
||||
|
||||
## Usage
|
||||
|
||||
```bash
|
||||
# Discuss strategy for any task
|
||||
/aa_command "optimize all gold tables for performance"
|
||||
|
||||
# Get recommendations for feature implementation
|
||||
/aa_command "add monitoring and alerting to the pipeline"
|
||||
|
||||
# Plan refactoring work
|
||||
/aa_command "refactor all ETL classes to use new base class pattern"
|
||||
|
||||
# Evaluate testing strategy
|
||||
/aa_command "write comprehensive tests for the medallion architecture"
|
||||
```
|
||||
|
||||
## Notes
|
||||
|
||||
- This command helps you plan before executing
|
||||
- Use this to determine optimal agent strategy
|
||||
- Creates a blueprint for `/background` or `/orchestrate` commands
|
||||
- Considers parallelism, dependencies, and complexity
|
||||
- Provides concrete next steps and command examples
|
||||
54
commands/my-devops-tasks.md
Executable file
@@ -0,0 +1,54 @@
|
||||
# ADO MCP Task Retrieval Prompt
|
||||
|
||||
Use the Azure DevOps MCP tools to retrieve all user stories and tasks assigned to me that are currently in "New", "Active", "Committed", or "Backlog" states. Create a comprehensive markdown document with the following structure:
|
||||
|
||||
## Query Parameters
|
||||
- **Assigned To**: @Me
|
||||
- **Work Item Types**: User Story, Task, Bug
|
||||
- **States**: New, Active, Committed, Backlog
|
||||
- **Include**: All active iterations and backlog
|
||||
|
||||
## Required Output Format
|
||||
|
||||
```markdown
|
||||
# My Active Work Items
|
||||
|
||||
## Summary
|
||||
- **Total Items**: {count}
|
||||
- **By Type**: {breakdown by work item type}
|
||||
- **By State**: {breakdown by state}
|
||||
- **Last Updated**: {current date}
|
||||
|
||||
## Work Items
|
||||
|
||||
### {Work Item Type} - {ID}: {Title}
|
||||
**URL**: {URL to work item}
|
||||
**Status**: {State} | **Priority**: {Priority} | **Effort**: {Story Points/Original Estimate}
|
||||
**Iteration**: {Iteration Path} | **Area**: {Area Path}
|
||||
|
||||
**Description Summary**:
|
||||
{Provide a 2-3 sentence summary of the description/acceptance criteria}
|
||||
|
||||
**Key Details**:
|
||||
- **Created**: {Created Date}
|
||||
- **Tags**: {Tags if any}
|
||||
- **Parent**: {Parent work item if applicable}
|
||||
|
||||
**[View in ADO]({URL to work item})**
|
||||
|
||||
---
|
||||
```
|
||||
|
||||
## Specific Requirements
|
||||
|
||||
1. **Summarize Descriptions**: For each work item, provide a concise 2-3 sentence summary of the description and acceptance criteria, focusing on the core objective and deliverables.
|
||||
|
||||
2. **Clickable URLs**: Ensure all Azure DevOps URLs are properly formatted as clickable markdown links, including a link to the actual work item.
|
||||
|
||||
3. **Sort Order**: Sort by Priority (High to Low), then by State (Active, Committed, New, Backlog), then by Story Points/Effort (High to Low).
|
||||
|
||||
4. **Data Validation**: If any work items have missing key fields (Priority, Effort, etc.), note this in the output.
|
||||
|
||||
5. **Additional Context**: Include any relevant comments from the last 7 days if present.
|
||||
|
||||
Execute this query and generate the markdown document with all my currently assigned work items.
|
||||
510
commands/orchestrate.md
Executable file
@@ -0,0 +1,510 @@
|
||||
---
|
||||
description: Orchestrate multiple generic agents working in parallel on complex tasks
|
||||
argument-hint: [user-prompt] | [task-file-name]
|
||||
allowed-tools: Read, Task, TodoWrite
|
||||
---
|
||||
|
||||
# Multi-Agent Orchestrator
|
||||
|
||||
Launch an orchestrator agent that coordinates multiple generic agents working in parallel on complex, decomposable tasks. All agents communicate via JSON format for structured coordination.
|
||||
|
||||
## Usage
|
||||
|
||||
**Option 1: Direct prompt**
|
||||
```
|
||||
/orchestrate "Analyze all gold tables, identify optimization opportunities, and implement improvements across the codebase"
|
||||
```
|
||||
|
||||
**Option 2: Task file from .claude/tasks/**
|
||||
```
|
||||
/orchestrate multi_agent_pipeline_optimization.md
|
||||
```
|
||||
|
||||
**Option 3: List available orchestration tasks**
|
||||
```
|
||||
/orchestrate list
|
||||
```
|
||||
|
||||
## Variables
|
||||
|
||||
- `TASK_INPUT`: Either a direct prompt string or a task file name from `.claude/tasks/`
|
||||
- `TASK_FILE_PATH`: Full path to task file if using a task file
|
||||
- `PROMPT_CONTENT`: The actual prompt to send to the orchestrator agent
|
||||
|
||||
## Instructions
|
||||
|
||||
### 1. Determine Task Source
|
||||
|
||||
Check if `$ARGUMENTS` looks like a file name (ends with `.md` or contains no spaces):
|
||||
- If YES: It's a task file name from `.claude/tasks/`
|
||||
- If NO: It's a direct user prompt
|
||||
- If "list": Show available orchestration task files
|
||||
|
||||
### 2. Load Task Content
|
||||
|
||||
**If using task file:**
|
||||
1. List all available task files in `.claude/tasks/` directory
|
||||
2. Find the task file matching the provided name (exact match or partial match)
|
||||
3. Read the task file content
|
||||
4. Use the full task file content as the prompt
|
||||
|
||||
**If using direct prompt:**
|
||||
1. Use the `$ARGUMENTS` directly as the prompt
|
||||
|
||||
**If "list" command:**
|
||||
1. Show all available orchestration task files with metadata
|
||||
2. Exit without launching agents
|
||||
|
||||
### 3. Launch Orchestrator Agent
|
||||
|
||||
Launch the orchestrator agent using the Task tool with the following configuration:
|
||||
|
||||
**Important Configuration:**
|
||||
- **subagent_type**: `general-purpose`
|
||||
- **model**: `sonnet` (default) or `opus` for highly complex orchestrations
|
||||
- **description**: Short 3-5 word description (e.g., "Orchestrate pipeline optimization")
|
||||
- **prompt**: Complete orchestrator instructions (see template below)
|
||||
|
||||
**Orchestrator Prompt Template:**
|
||||
```
|
||||
You are an ORCHESTRATOR AGENT coordinating multiple generic worker agents on a complex project task.
|
||||
|
||||
PROJECT CONTEXT:
|
||||
- Project: Unify 2.1 Data Migration using Azure Synapse Analytics
|
||||
- Architecture: Medallion pattern (Bronze/Silver/Gold layers)
|
||||
- Primary Language: PySpark Python
|
||||
- Follow: .claude/CLAUDE.md and .claude/rules/python_rules.md
|
||||
|
||||
YOUR ORCHESTRATOR RESPONSIBILITIES:
|
||||
1. Analyze the main task and decompose it into 2-8 independent subtasks
|
||||
2. Launch multiple generic worker agents (use Task tool with subagent_type="general-purpose")
|
||||
3. Provide each worker agent with:
|
||||
- Clear, self-contained instructions
|
||||
- Required context (file paths, requirements)
|
||||
- Expected JSON response format
|
||||
4. Collect and aggregate all worker responses
|
||||
5. Validate completeness and consistency
|
||||
6. Produce final consolidated report
|
||||
|
||||
MAIN TASK TO ORCHESTRATE:
|
||||
{TASK_CONTENT}
|
||||
|
||||
WORKER AGENT COMMUNICATION PROTOCOL:
|
||||
Each worker agent MUST return results in this JSON format:
|
||||
```json
|
||||
{
|
||||
"agent_id": "unique_identifier",
|
||||
"task_assigned": "brief description",
|
||||
"status": "completed|failed|partial",
|
||||
"results": {
|
||||
"files_modified": ["path/to/file1.py", "path/to/file2.py"],
|
||||
"changes_summary": "description of changes",
|
||||
"metrics": {
|
||||
"lines_added": 0,
|
||||
"lines_removed": 0,
|
||||
"functions_added": 0,
|
||||
"issues_fixed": 0
|
||||
}
|
||||
},
|
||||
"quality_checks": {
|
||||
"syntax_check": "passed|failed",
|
||||
"linting": "passed|failed",
|
||||
"formatting": "passed|failed"
|
||||
},
|
||||
"issues_encountered": ["issue1", "issue2"],
|
||||
"recommendations": ["recommendation1", "recommendation2"],
|
||||
"execution_time_seconds": 0
|
||||
}
|
||||
```
|
||||
|
||||
WORKER AGENT PROMPT TEMPLATE:
|
||||
When launching each worker agent, use this prompt structure:
|
||||
|
||||
```
|
||||
You are a WORKER AGENT (ID: {agent_id}) reporting to an orchestrator.
|
||||
|
||||
CRITICAL: You MUST return your results in JSON format as specified below.
|
||||
|
||||
PROJECT CONTEXT:
|
||||
- Read and follow: .claude/CLAUDE.md and .claude/rules/python_rules.md
|
||||
- Coding Standards: 240 char lines, no blanks in functions, type hints required
|
||||
- Use: @synapse_error_print_handler decorator, NotebookLogger, TableUtilities
|
||||
|
||||
YOUR ASSIGNED SUBTASK:
|
||||
{subtask_description}
|
||||
|
||||
FILES TO WORK ON:
|
||||
{file_list}
|
||||
|
||||
REQUIREMENTS:
|
||||
{specific_requirements}
|
||||
|
||||
QUALITY GATES (MUST RUN):
|
||||
1. python3 -m py_compile <modified_files>
|
||||
2. ruff check python_files/
|
||||
3. ruff format python_files/
|
||||
|
||||
REQUIRED JSON RESPONSE FORMAT:
|
||||
```json
|
||||
{
|
||||
"agent_id": "{agent_id}",
|
||||
"task_assigned": "{subtask_description}",
|
||||
"status": "completed",
|
||||
"results": {
|
||||
"files_modified": [],
|
||||
"changes_summary": "",
|
||||
"metrics": {
|
||||
"lines_added": 0,
|
||||
"lines_removed": 0,
|
||||
"functions_added": 0,
|
||||
"issues_fixed": 0
|
||||
}
|
||||
},
|
||||
"quality_checks": {
|
||||
"syntax_check": "passed|failed",
|
||||
"linting": "passed|failed",
|
||||
"formatting": "passed|failed"
|
||||
},
|
||||
"issues_encountered": [],
|
||||
"recommendations": [],
|
||||
"execution_time_seconds": 0
|
||||
}
|
||||
```
|
||||
|
||||
Work autonomously, complete your task, run quality gates, and return the JSON response.
|
||||
```
|
||||
|
||||
ORCHESTRATION WORKFLOW:
|
||||
1. **Task Decomposition**: Break main task into 2-8 independent subtasks
|
||||
2. **Agent Assignment**: Create unique agent IDs (agent_1, agent_2, etc.)
|
||||
3. **Parallel Launch**: Launch all worker agents simultaneously using Task tool
|
||||
4. **Monitor Progress**: Track each agent's completion
|
||||
5. **Collect Results**: Parse JSON responses from each worker agent
|
||||
6. **Validate Output**: Ensure all quality checks passed
|
||||
7. **Aggregate Results**: Combine all worker outputs
|
||||
8. **Generate Report**: Create comprehensive orchestration summary
|
||||
|
||||
FINAL ORCHESTRATOR REPORT FORMAT:
|
||||
```json
|
||||
{
|
||||
"orchestration_summary": {
|
||||
"main_task": "{original task description}",
|
||||
"total_agents_launched": 0,
|
||||
"successful_agents": 0,
|
||||
"failed_agents": 0,
|
||||
"total_execution_time_seconds": 0
|
||||
},
|
||||
"agent_results": [
|
||||
{worker_agent_json_response_1},
|
||||
{worker_agent_json_response_2},
|
||||
...
|
||||
],
|
||||
"consolidated_metrics": {
|
||||
"total_files_modified": 0,
|
||||
"total_lines_added": 0,
|
||||
"total_lines_removed": 0,
|
||||
"total_functions_added": 0,
|
||||
"total_issues_fixed": 0
|
||||
},
|
||||
"quality_validation": {
|
||||
"all_syntax_checks_passed": true,
|
||||
"all_linting_passed": true,
|
||||
"all_formatting_passed": true
|
||||
},
|
||||
"consolidated_issues": [],
|
||||
"consolidated_recommendations": [],
|
||||
"next_steps": []
|
||||
}
|
||||
```
|
||||
|
||||
BEST PRACTICES:
|
||||
- Keep subtasks independent (no dependencies between worker agents)
|
||||
- Provide complete context to each worker agent
|
||||
- Launch all agents in parallel for maximum efficiency
|
||||
- Validate JSON responses from each worker
|
||||
- Aggregate metrics and results systematically
|
||||
- Flag any worker failures or incomplete results
|
||||
- Provide actionable next steps
|
||||
|
||||
Work autonomously and orchestrate the complete task execution.
|
||||
```
|
||||
|
||||
### 4. Inform User
|
||||
|
||||
After launching the orchestrator, inform the user:
|
||||
- Orchestrator agent has been launched
|
||||
- Main task being orchestrated (summary)
|
||||
- Expected number of worker agents to be spawned
|
||||
- Estimated completion time (if known)
|
||||
- The orchestrator will coordinate all work and provide a consolidated JSON report
|
||||
|
||||
## Task File Structure
|
||||
|
||||
Expected orchestration task file format in `.claude/tasks/`:
|
||||
|
||||
```markdown
|
||||
# Orchestration Task Title
|
||||
|
||||
**Date Created**: YYYY-MM-DD
|
||||
**Priority**: HIGH/MEDIUM/LOW
|
||||
**Estimated Total Time**: X minutes
|
||||
**Complexity**: High/Medium/Low
|
||||
**Recommended Worker Agents**: N
|
||||
|
||||
## Main Objective
|
||||
Clear description of the overall goal
|
||||
|
||||
## Success Criteria
|
||||
- [ ] Criterion 1
|
||||
- [ ] Criterion 2
|
||||
- [ ] Criterion 3
|
||||
|
||||
## Suggested Subtask Decomposition
|
||||
|
||||
### Subtask 1: Title
|
||||
**Scope**: Files/components affected
|
||||
**Estimated Time**: X minutes
|
||||
**Dependencies**: None or list other subtasks
|
||||
|
||||
**Description**: What needs to be done
|
||||
|
||||
**Expected Outputs**:
|
||||
- Output 1
|
||||
- Output 2
|
||||
|
||||
---
|
||||
|
||||
### Subtask 2: Title
|
||||
**Scope**: Files/components affected
|
||||
**Estimated Time**: X minutes
|
||||
**Dependencies**: None or list other subtasks
|
||||
|
||||
**Description**: What needs to be done
|
||||
|
||||
**Expected Outputs**:
|
||||
- Output 1
|
||||
- Output 2
|
||||
|
||||
---
|
||||
|
||||
(Repeat for each suggested subtask)
|
||||
|
||||
## Quality Requirements
|
||||
- All code must pass syntax validation
|
||||
- All code must pass linting
|
||||
- All code must be formatted
|
||||
- All agents must return valid JSON
|
||||
|
||||
## Aggregation Requirements
|
||||
- How to combine results from worker agents
|
||||
- Validation steps for consolidated output
|
||||
- Reporting requirements
|
||||
```
|
||||
|
||||
## Examples
|
||||
|
||||
### Example 1: Pipeline Optimization
|
||||
```
|
||||
User: /orchestrate "Analyze and optimize all gold layer tables for performance"
|
||||
|
||||
Orchestrator launches 5 worker agents:
|
||||
- agent_1: Analyze g_x_mg_* tables
|
||||
- agent_2: Analyze g_xa_* tables
|
||||
- agent_3: Review joins and aggregations
|
||||
- agent_4: Check indexing strategies
|
||||
- agent_5: Validate query plans
|
||||
|
||||
Each agent reports back with JSON results
|
||||
Orchestrator aggregates findings and produces consolidated report
|
||||
```
|
||||
|
||||
### Example 2: Code Quality Sweep
|
||||
```
|
||||
User: /orchestrate code_quality_improvement.md
|
||||
|
||||
Orchestrator reads task file with 8 categories
|
||||
Launches 8 worker agents in parallel:
|
||||
- agent_1: Fix linting issues in bronze layer
|
||||
- agent_2: Fix linting issues in silver layer
|
||||
- agent_3: Fix linting issues in gold layer
|
||||
- agent_4: Add missing type hints
|
||||
- agent_5: Update error handling
|
||||
- agent_6: Improve logging
|
||||
- agent_7: Optimize imports
|
||||
- agent_8: Update documentation
|
||||
|
||||
Collects JSON from all 8 agents
|
||||
Validates quality checks
|
||||
Produces aggregated metrics report
|
||||
```
|
||||
|
||||
### Example 3: Feature Implementation
|
||||
```
|
||||
User: /orchestrate "Implement data validation framework across all layers"
|
||||
|
||||
Orchestrator decomposes into:
|
||||
- agent_1: Design validation schema
|
||||
- agent_2: Implement bronze validators
|
||||
- agent_3: Implement silver validators
|
||||
- agent_4: Implement gold validators
|
||||
- agent_5: Create validation tests
|
||||
- agent_6: Update documentation
|
||||
|
||||
Coordinates execution
|
||||
Collects results in JSON format
|
||||
Validates completeness
|
||||
Generates implementation report
|
||||
```
|
||||
|
||||
## JSON Response Validation
|
||||
|
||||
The orchestrator MUST validate each worker agent response contains:
|
||||
|
||||
**Required Fields:**
|
||||
- `agent_id`: String, unique identifier
|
||||
- `task_assigned`: String, description of assigned work
|
||||
- `status`: String, one of ["completed", "failed", "partial"]
|
||||
- `results`: Object with:
|
||||
- `files_modified`: Array of strings
|
||||
- `changes_summary`: String
|
||||
- `metrics`: Object with numeric values
|
||||
- `quality_checks`: Object with pass/fail values
|
||||
- `issues_encountered`: Array of strings
|
||||
- `recommendations`: Array of strings
|
||||
- `execution_time_seconds`: Number
|
||||
|
||||
**Validation Checks:**
|
||||
- All required fields present
|
||||
- Status is valid enum value
|
||||
- Arrays are properly formatted
|
||||
- Metrics are numeric
|
||||
- Quality checks are pass/fail
|
||||
- JSON is well-formed and parseable
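
As a rough illustration only, a worker response saved to disk could be structurally checked with `jq` along these lines; the field names come from the protocol above, while the file name and the exact set of checks are assumptions:

```bash
# Minimal structural validation of one worker response (illustrative only)
RESPONSE="agent_1_response.json"
jq -e '
  .agent_id and .task_assigned
  and (.status | IN("completed", "failed", "partial"))
  and (.results.files_modified | type == "array")
  and (.results.metrics | all(.[]; type == "number"))
  and (.execution_time_seconds | type == "number")
' "$RESPONSE" > /dev/null || echo "Invalid worker response: $RESPONSE"
```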
|
||||
|
||||
## Agent Coordination Patterns
|
||||
|
||||
### Pattern 1: Parallel Independent Tasks
|
||||
```
|
||||
Orchestrator launches all agents simultaneously
|
||||
No dependencies between agents
|
||||
Each agent works on separate files/components
|
||||
Results aggregated at end
|
||||
```
|
||||
|
||||
### Pattern 2: Sequential with Handoff (Not Recommended)
|
||||
```
|
||||
Orchestrator launches agent_1
|
||||
Waits for agent_1 JSON response
|
||||
Uses agent_1 results to inform agent_2 prompt
|
||||
Launches agent_2 with context from agent_1
|
||||
Continues chain
|
||||
```
|
||||
|
||||
### Pattern 3: Hybrid (Parallel Groups)
|
||||
```
|
||||
Orchestrator identifies 2-3 independent groups
|
||||
Launches all agents in group 1 in parallel
|
||||
Waits for group 1 completion
|
||||
Launches all agents in group 2 with context from group 1
|
||||
Aggregates results from all groups
|
||||
```
|
||||
|
||||
## Success Criteria
|
||||
|
||||
Orchestration task completion requires:
|
||||
- ✅ All worker agents launched successfully
|
||||
- ✅ All worker agents returned valid JSON responses
|
||||
- ✅ All quality checks passed across all agents
|
||||
- ✅ No unresolved issues or failures
|
||||
- ✅ Consolidated metrics calculated correctly
|
||||
- ✅ Comprehensive orchestration report provided
|
||||
- ✅ All files syntax validated
|
||||
- ✅ All files linted and formatted
|
||||
|
||||
## Best Practices
|
||||
|
||||
### For Orchestrator Design
|
||||
- Keep worker tasks independent when possible
|
||||
- Provide complete context to each worker
|
||||
- Assign unique, meaningful agent IDs
|
||||
- Specify clear JSON response requirements
|
||||
- Validate all JSON responses
|
||||
- Handle worker failures gracefully
|
||||
- Aggregate results systematically
|
||||
- Provide actionable consolidated report
|
||||
|
||||
### For Worker Agent Design
|
||||
- Make each subtask self-contained
|
||||
- Include all necessary context in prompt
|
||||
- Specify exact file paths and requirements
|
||||
- Define clear success criteria
|
||||
- Require JSON response format
|
||||
- Include quality gate validation
|
||||
- Request execution metrics
|
||||
|
||||
### For Task Decomposition
|
||||
- Break into 2-8 independent subtasks
|
||||
- Avoid inter-agent dependencies
|
||||
- Balance workload across agents
|
||||
- Group related work logically
|
||||
- Consider file/component boundaries
|
||||
- Respect layer separation (bronze/silver/gold)
|
||||
|
||||
## Error Handling
|
||||
|
||||
### Worker Agent Failures
|
||||
If a worker agent fails:
|
||||
1. Orchestrator captures failure details
|
||||
2. Marks agent status as "failed" in JSON
|
||||
3. Continues with other agents
|
||||
4. Reports failure in final summary
|
||||
5. Suggests recovery steps
|
||||
|
||||
### JSON Parse Errors
|
||||
If worker returns invalid JSON:
|
||||
1. Orchestrator logs parse error
|
||||
2. Attempts to extract partial results
|
||||
3. Marks agent response as invalid
|
||||
4. Flags for manual review
|
||||
5. Continues with valid responses
|
||||
|
||||
### Quality Check Failures
|
||||
If worker's quality checks fail:
|
||||
1. Orchestrator flags the failure
|
||||
2. Includes failure details in report
|
||||
3. Prevents final approval
|
||||
4. Suggests corrective actions
|
||||
5. May relaunch worker with corrections
|
||||
|
||||
## Performance Optimization
|
||||
|
||||
### Parallel Execution
|
||||
- Launch all independent agents simultaneously
|
||||
- Use Task tool with multiple concurrent calls
|
||||
- Maximize parallelism for faster completion
|
||||
- Monitor resource utilization
|
||||
|
||||
### Agent Sizing
|
||||
- 2-8 agents: Optimal for most tasks
|
||||
- <2 agents: Consider using single agent instead
|
||||
- >8 agents: May have coordination overhead
|
||||
- Balance granularity vs overhead
|
||||
|
||||
### Context Management
|
||||
- Provide minimal necessary context
|
||||
- Avoid duplicating shared information
|
||||
- Use references to shared documentation
|
||||
- Keep prompts focused and concise
|
||||
|
||||
## Notes
|
||||
|
||||
- Orchestrator coordinates but doesn't do actual code changes
|
||||
- Worker agents are general-purpose and autonomous
|
||||
- All communication uses structured JSON format
|
||||
- Quality validation is mandatory across all agents
|
||||
- Failed agents don't block other agents
|
||||
- Orchestrator produces human-readable summary
|
||||
- JSON enables programmatic result processing
|
||||
- Pattern scales from 2 to 8 parallel agents
|
||||
- Best for complex, decomposable tasks
|
||||
- Overkill for simple, atomic tasks
|
||||
84
commands/performance-monitoring.md
Executable file
@@ -0,0 +1,84 @@
|
||||
---
|
||||
allowed-tools: Read, Bash, Grep, Glob
|
||||
argument-hint: [monitoring-type] | --apm | --rum | --custom
|
||||
description: Setup comprehensive application performance monitoring with metrics, alerting, and observability
|
||||
|
||||
---
|
||||
|
||||
# Add Performance Monitoring
|
||||
|
||||
Setup application performance monitoring: **$ARGUMENTS**
|
||||
|
||||
## Instructions
|
||||
|
||||
1. **Performance Monitoring Strategy**
|
||||
- Define key performance indicators (KPIs) and service level objectives (SLOs)
|
||||
- Identify critical user journeys and performance bottlenecks
|
||||
- Plan monitoring architecture and data collection strategy
|
||||
- Assess existing monitoring infrastructure and integration points
|
||||
- Define alerting thresholds and escalation procedures
|
||||
|
||||
2. **Application Performance Monitoring (APM)**
|
||||
- Set up comprehensive APM solution (New Relic, Datadog, AppDynamics)
|
||||
- Configure distributed tracing for request lifecycle visibility
|
||||
- Implement custom metrics and performance tracking
|
||||
- Set up transaction monitoring and error tracking
|
||||
- Configure performance profiling and diagnostics
|
||||
|
||||
3. **Real User Monitoring (RUM)**
|
||||
- Implement client-side performance tracking and web vitals monitoring
|
||||
- Set up user experience metrics collection (LCP, FID, CLS, TTFB)
|
||||
- Configure custom performance metrics for user interactions
|
||||
- Monitor page load performance and resource loading
|
||||
- Track user journey performance across different devices
|
||||
|
||||
4. **Server Performance Monitoring**
|
||||
- Monitor system metrics (CPU, memory, disk, network)
|
||||
- Set up process and application-level monitoring
|
||||
- Configure event loop lag and garbage collection monitoring
|
||||
- Implement custom server performance metrics
|
||||
- Monitor resource utilization and capacity planning
|
||||
|
||||
5. **Database Performance Monitoring**
|
||||
- Track database query performance and slow query identification
|
||||
- Monitor database connection pool utilization
|
||||
- Set up database performance metrics and alerting
|
||||
- Implement query execution plan analysis
|
||||
- Monitor database resource usage and optimization opportunities
|
||||
|
||||
6. **Error Tracking and Monitoring**
|
||||
- Implement comprehensive error tracking (Sentry, Bugsnag, Rollbar)
|
||||
- Configure error categorization and impact analysis
|
||||
- Set up error alerting and notification systems
|
||||
- Track error trends and resolution metrics
|
||||
- Implement error context and debugging information
|
||||
|
||||
7. **Custom Metrics and Dashboards**
|
||||
- Implement business metrics tracking (Prometheus, StatsD; see the sketch at the end of this command)
|
||||
- Create performance dashboards and visualizations
|
||||
- Configure custom alerting rules and thresholds
|
||||
- Set up performance trend analysis and reporting
|
||||
- Implement performance regression detection
|
||||
|
||||
8. **Alerting and Notification System**
|
||||
- Configure intelligent alerting based on performance thresholds
|
||||
- Set up multi-channel notifications (email, Slack, PagerDuty)
|
||||
- Implement alert escalation and on-call procedures
|
||||
- Configure alert fatigue prevention and noise reduction
|
||||
- Set up performance incident management workflows
|
||||
|
||||
9. **Performance Testing Integration**
|
||||
- Integrate monitoring with load testing and performance testing
|
||||
- Set up continuous performance testing and monitoring
|
||||
- Configure performance baseline tracking and comparison
|
||||
- Implement performance test result analysis and reporting
|
||||
- Monitor performance under different load scenarios
|
||||
|
||||
10. **Performance Optimization Recommendations**
|
||||
- Generate actionable performance insights and recommendations
|
||||
- Implement automated performance analysis and reporting
|
||||
- Set up performance optimization tracking and measurement
|
||||
- Configure performance improvement validation
|
||||
- Create performance optimization prioritization frameworks
|
||||
|
||||
Focus on monitoring strategies that provide actionable insights for performance optimization. Ensure monitoring overhead is minimal and doesn't impact application performance.
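
As a hedged illustration of the custom-metrics step (item 7), the simplest possible instrumentation is a one-line StatsD emit over UDP; the metric name, host and port below are placeholders, not values required by this command:

```bash
# Emit a single StatsD timing metric (250 ms) to a local agent on the default port
echo "app.request.duration:250|ms" | nc -u -w1 localhost 8125
```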
|
||||
268
commands/pr-deploy-workflow.md
Executable file
@@ -0,0 +1,268 @@
|
||||
---
|
||||
model: claude-haiku-4-5-20251001
|
||||
allowed-tools: SlashCommand, Bash(git:*), mcp__ado__repo_get_pull_request_by_id, mcp__ado__repo_list_pull_requests_by_repo_or_project
|
||||
argument-hint: [commit-message]
|
||||
description: Complete deployment workflow - commit, PR to staging, review, then staging to develop
|
||||
---
|
||||
|
||||
|
||||
# Complete Deployment Workflow
|
||||
|
||||
Automates the full deployment workflow with integrated PR review:
|
||||
1. Commit feature changes and create PR to staging
|
||||
2. Automatically review the PR for quality and standards
|
||||
3. Fix any issues identified in review (with iteration loop)
|
||||
4. After PR is approved and merged, create PR from staging to develop
|
||||
|
||||
## What This Does
|
||||
|
||||
1. Calls `/pr-feature-to-staging` to commit and create feature → staging PR
|
||||
2. Calls `/pr-review` to automatically review the PR
|
||||
3. If review identifies issues → calls `/pr-fix-pr-review` and loops back to review
|
||||
4. If review passes → waits for user to merge staging PR
|
||||
5. Calls `/pr-staging-to-develop` to create staging → develop PR
|
||||
|
||||
## Implementation Logic
|
||||
|
||||
### Step 1: Create Feature PR to Staging
|
||||
Use `SlashCommand` tool to execute:
|
||||
```
|
||||
/pr-feature-to-staging $ARGUMENTS
|
||||
```
|
||||
|
||||
**Expected Output:**
|
||||
- PR URL and PR ID
|
||||
- Work item comments added
|
||||
- Source and target branches confirmed
|
||||
|
||||
**Extract from output:**
|
||||
- PR ID (needed for review step)
|
||||
- PR number (for user reference)
|
||||
|
||||
### Step 2: Automated PR Review
|
||||
Use `SlashCommand` tool to execute:
|
||||
```
|
||||
/pr-review [PR_ID]
|
||||
```
|
||||
|
||||
**The review will evaluate:**
|
||||
- Code quality and maintainability
|
||||
- PySpark best practices
|
||||
- ETL pattern compliance
|
||||
- Standards compliance from `.claude/rules/python_rules.md`
|
||||
- DevOps considerations
|
||||
- Merge conflicts
|
||||
|
||||
**Review Outcomes:**
|
||||
|
||||
#### Outcome A: Review Passes (PR Approved)
|
||||
Review output will indicate:
|
||||
- "PR approved and set to auto-complete"
|
||||
- No active review comments requiring changes
|
||||
- All quality gates passed
|
||||
|
||||
**Action:** Proceed to Step 4
|
||||
|
||||
#### Outcome B: Review Requires Changes
|
||||
Review output will indicate:
|
||||
- Active review comments with specific issues
|
||||
- Quality standards not met
|
||||
- Files requiring modifications
|
||||
|
||||
**Action:** Proceed to Step 3
|
||||
|
||||
### Step 3: Fix Review Issues (if needed)
|
||||
**Only execute if Step 2 identified issues**
|
||||
|
||||
Use `SlashCommand` tool to execute:
|
||||
```
|
||||
/pr-fix-pr-review [PR_ID]
|
||||
```
|
||||
|
||||
**This will:**
|
||||
1. Retrieve all active review comments
|
||||
2. Make code changes to address feedback
|
||||
3. Run quality gates (syntax, lint, format)
|
||||
4. Commit fixes and push to feature branch
|
||||
5. Reply to review threads
|
||||
6. Update the PR automatically
|
||||
|
||||
**After fixes are applied:**
|
||||
- Loop back to Step 2 to re-review
|
||||
- Continue iterating until review passes
|
||||
|
||||
**Iteration Logic:**
|
||||
```
|
||||
LOOP while review has active issues:
|
||||
1. /pr-fix-pr-review [PR_ID]
|
||||
2. /pr-review [PR_ID]
|
||||
3. Check review outcome
|
||||
4. If approved → exit loop
|
||||
5. If still has issues → continue loop
|
||||
END LOOP
|
||||
```
|
||||
|
||||
### Step 4: Wait for Staging PR Merge
|
||||
After PR review passes and is approved, inform user:
|
||||
```
|
||||
✅ PR Review Passed - PR Approved and Ready
|
||||
|
||||
PR #[PR_ID] has been reviewed and approved with auto-complete enabled.
|
||||
|
||||
Review Summary:
|
||||
- Code quality: ✓ Passed
|
||||
- PySpark best practices: ✓ Passed
|
||||
- ETL patterns: ✓ Passed
|
||||
- Standards compliance: ✓ Passed
|
||||
- No merge conflicts
|
||||
|
||||
Next Steps:
|
||||
1. The PR will auto-merge when all policies are satisfied
|
||||
2. Once merged to staging, I'll create the staging → develop PR
|
||||
|
||||
Would you like me to:
|
||||
a) Create the staging → develop PR now (if staging merge is complete)
|
||||
b) Wait for you to confirm the staging merge
|
||||
c) Check the PR status
|
||||
|
||||
Enter choice (a/b/c):
|
||||
```
|
||||
|
||||
**User Responses:**
|
||||
- **a**: Immediately proceed to Step 5
|
||||
- **b**: Wait for user confirmation, then proceed to Step 5
|
||||
- **c**: Use `mcp__ado__repo_get_pull_request_by_id` to check if PR is merged, then guide user
|
||||
|
||||
### Step 5: Create Staging to Develop PR
|
||||
Use `SlashCommand` tool to execute:
|
||||
```
|
||||
/pr-staging-to-develop
|
||||
```
|
||||
|
||||
**This will:**
|
||||
1. Create PR: staging → develop
|
||||
2. Handle any merge conflicts
|
||||
3. Return PR URL for tracking
|
||||
|
||||
**Final Output:**
|
||||
```
|
||||
🚀 Deployment Workflow Complete
|
||||
|
||||
Feature → Staging:
|
||||
- PR #[PR_ID] - Reviewed and Merged ✓
|
||||
|
||||
Staging → Develop:
|
||||
- PR #[NEW_PR_ID] - Created and Ready for Review
|
||||
- URL: [PR_URL]
|
||||
|
||||
Summary:
|
||||
1. Feature PR created and reviewed
|
||||
2. All quality gates passed
|
||||
3. PR approved and merged to staging
|
||||
4. Staging PR created for develop
|
||||
|
||||
The workflow is complete. The staging → develop PR is now ready for final review and deployment.
|
||||
```
|
||||
|
||||
## Example Usage
|
||||
|
||||
### Full Workflow with Work Item
|
||||
```bash
|
||||
/pr-deploy-workflow "feat(gold): add X_MG_Offender linkage table #45497"
|
||||
```
|
||||
|
||||
**This will:**
|
||||
1. Create commit on feature branch
|
||||
2. Create PR: feature → staging
|
||||
3. Comment on work item #45497
|
||||
4. Automatically review PR for quality
|
||||
5. Fix any issues identified (with iteration)
|
||||
6. Wait for staging PR merge
|
||||
7. Create PR: staging → develop
|
||||
|
||||
### Full Workflow Without Work Item
|
||||
```bash
|
||||
/pr-deploy-workflow "refactor: optimise session management"
|
||||
```
|
||||
|
||||
**This will:**
|
||||
1. Create commit on feature branch
|
||||
2. Create PR: feature → staging
|
||||
3. Automatically review PR
|
||||
4. Fix any issues (iterative)
|
||||
5. Wait for merge confirmation
|
||||
6. Create staging → develop PR
|
||||
|
||||
## Review Iteration Example
|
||||
|
||||
**Scenario:** Review finds 3 issues in the initial PR
|
||||
|
||||
```
|
||||
Step 1: /pr-feature-to-staging "feat: add new table"
|
||||
→ PR #5678 created
|
||||
|
||||
Step 2: /pr-review 5678
|
||||
→ Found 3 issues:
|
||||
- Missing type hints in function
|
||||
- Line exceeds 240 characters
|
||||
- Missing @synapse_error_print_handler decorator
|
||||
|
||||
Step 3: /pr-fix-pr-review 5678
|
||||
→ Fixed all 3 issues
|
||||
→ Committed and pushed
|
||||
→ PR updated
|
||||
|
||||
Step 2 (again): /pr-review 5678
|
||||
→ All issues resolved
|
||||
→ PR approved ✓
|
||||
|
||||
Step 4: Wait for merge confirmation
|
||||
|
||||
Step 5: /pr-staging-to-develop
|
||||
→ PR #5679 created (staging → develop)
|
||||
|
||||
Complete!
|
||||
```
|
||||
|
||||
## Error Handling
|
||||
|
||||
### PR Creation Fails
|
||||
- Display error from `/pr-feature-to-staging`
|
||||
- Guide user to resolve (branch validation, git issues)
|
||||
- Do not proceed to review step
|
||||
|
||||
### Review Cannot Complete
|
||||
- Display specific blocker (merge conflicts, missing files)
|
||||
- Guide user to manual resolution
|
||||
- Offer to retry review after fix
|
||||
|
||||
### Fix PR Review Fails
|
||||
- Display specific errors (quality gates, git issues)
|
||||
- Offer manual intervention option
|
||||
- Allow user to fix locally and skip to next step
|
||||
|
||||
### Staging PR Already Exists
|
||||
- Use `mcp__ado__repo_list_pull_requests_by_repo_or_project` to check existing PRs
|
||||
- Inform user of existing PR
|
||||
- Ask if they want to create anyway or use existing
|
||||
|
||||
## Notes
|
||||
|
||||
- **Automated Review**: Quality gates are enforced automatically
|
||||
- **Iterative Fixes**: Will loop through fix → review until approved
|
||||
- **Semi-Automated Merge**: User must confirm staging merge before final PR
|
||||
- **Work Item Tracking**: Automatic comments on linked work items
|
||||
- **Quality First**: Won't proceed if review fails and can't auto-fix
|
||||
- **Graceful Degradation**: Offers manual intervention at each step if automation fails
|
||||
|
||||
## Quality Gates Enforced
|
||||
|
||||
The integrated `/pr-review` checks:
|
||||
1. Code quality (type hints, line length, formatting)
|
||||
2. PySpark best practices (DataFrame ops, logging, session mgmt)
|
||||
3. ETL pattern compliance (class structure, decorators)
|
||||
4. Standards from `.claude/rules/python_rules.md`
|
||||
5. No merge conflicts
|
||||
6. Proper error handling
|
||||
|
||||
All must pass before proceeding to staging → develop PR.
|
||||
233
commands/pr-feature-to-staging.md
Executable file
@@ -0,0 +1,233 @@
|
||||
---
|
||||
model: claude-haiku-4-5-20251001
|
||||
allowed-tools: Bash(git add:*), Bash(git status:*), Bash(git commit:*), Bash(git diff:*), Bash(git log:*), Bash(git push:*), Bash(git pull:*), Bash(git branch:*), mcp__*, mcp__ado__repo_list_branches_by_repo, mcp__ado__repo_search_commits, mcp__ado__repo_create_pull_request, mcp__ado__repo_get_pull_request_by_id, mcp__ado__repo_get_repo_by_name_or_id, mcp__ado__wit_add_work_item_comment, mcp__ado__wit_get_work_item, Read, Glob
|
||||
argument-hint:
|
||||
description: Automatically analyze changes and create PR from current feature branch to staging
|
||||
---
|
||||
|
||||
# Create Feature PR to Staging
|
||||
|
||||
Automatically analyzes repository changes, generates appropriate commit message, and creates pull request to `staging`.
|
||||
|
||||
## Repository Configuration
|
||||
- **Project**: Program Unify
|
||||
- **Repository ID**: e030ea00-2f85-4b19-88c3-05a864d7298d
|
||||
- **Repository Name**: unify_2_1_dm_synapse_env_d10
|
||||
- **Target Branch**: `staging` (fixed)
|
||||
- **Source Branch**: Current feature branch
|
||||
|
||||
## Current Repository State
|
||||
|
||||
- Git status: !`git status --short`
|
||||
- Current branch: !`git branch --show-current`
|
||||
- Staged changes: !`git diff --cached --stat`
|
||||
- Unstaged changes: !`git diff --stat`
|
||||
- Recent commits: !`git log --oneline -5`
|
||||
|
||||
## Implementation Logic
|
||||
|
||||
### 1. Validate Current Branch
|
||||
- Get current branch: `git branch --show-current`
|
||||
- **REQUIRE**: Branch must start with `feature/`
|
||||
- **BLOCK**: `staging`, `develop`, `main` branches
|
||||
- If validation fails: Show clear error and exit
|
||||
|
||||
### 2. Analyze Changes and Generate Commit Message
|
||||
- Run `git status --short` to see modified files
|
||||
- Run `git diff --stat` to see change statistics
|
||||
- Run `git diff` to analyze actual code changes
|
||||
- **Automatically determine**:
|
||||
- **Type**: Based on file changes (feat, fix, refactor, docs, test, chore, etc.)
|
||||
- **Scope**: From file paths (bronze, silver, gold, utilities, pipeline, etc.)
|
||||
- **Description**: Concise summary of what changed (e.g., "add person address table", "fix deduplication logic")
|
||||
- **Work Items**: Extract from branch name pattern (e.g., feature/46225-description → #46225)
|
||||
- **Analysis Rules**:
|
||||
- New files in gold/silver/bronze → `feat`
|
||||
- Modified transformation logic → `refactor` or `fix`
|
||||
- Test files → `test`
|
||||
- Documentation → `docs`
|
||||
- Utilities/session_optimiser → `refactor` or `feat`
|
||||
- Multiple file types → prioritize feat > fix > refactor
|
||||
- Gold layer → scope: `(gold)`
|
||||
- Silver layer → scope: `(silver)` or `(silver_<database>)`
|
||||
- Bronze layer → scope: `(bronze)`
|
||||
- Generate commit message in format: `emoji type(scope): description #workitem`
|
||||
|
||||
### 3. Execute Commit Workflow
|
||||
- Stage all changes: `git add .`
|
||||
- Create commit with auto-generated emoji conventional format
|
||||
- Run pre-commit hooks (ruff lint/format, YAML validation, etc.)
|
||||
- Push to current feature branch
|
||||
|
||||
### 4. Create Pull Request
|
||||
- Use `mcp__ado__repo_create_pull_request` with:
|
||||
- `repositoryId`: e030ea00-2f85-4b19-88c3-05a864d7298d
|
||||
- `sourceRefName`: Current feature branch (refs/heads/feature/*)
|
||||
- `targetRefName`: refs/heads/staging
|
||||
- `title`: Extract from auto-generated commit message
|
||||
- `description`: Brief summary with bullet points based on analyzed changes
|
||||
- Return PR URL to user
|
||||
|
||||
### 5. Add Work Item Comments (Automatic)
|
||||
If PR creation was successful:
|
||||
- Get work items linked to PR using `mcp__ado__repo_get_pull_request_by_id`
|
||||
- For each linked work item:
|
||||
- Verify work item exists with `mcp__ado__wit_get_work_item`
|
||||
- Generate comment with:
|
||||
- PR title and number
|
||||
- Commit message and SHA
|
||||
- File changes summary from `git diff --stat`
|
||||
- Link to PR in Azure DevOps
|
||||
- Link to commit in Azure DevOps
|
||||
- Add comment using `mcp__ado__wit_add_work_item_comment`
|
||||
- Use HTML format for rich formatting
|
||||
- **IMPORTANT**: Do NOT include footer attribution text
|
||||
- **IMPORTANT**: always use Australian English in all messages and descriptions
|
||||
- **IMPORTANT**: do not mention that you are using Australian English in any messages or descriptions
|
||||
|
||||
## Commit Message Format
|
||||
|
||||
### Type + Emoji Mapping
|
||||
- ✨ `feat`: New feature
|
||||
- 🐛 `fix`: Bug fix
|
||||
- 📝 `docs`: Documentation
|
||||
- 💄 `style`: Formatting/style
|
||||
- ♻️ `refactor`: Code refactoring
|
||||
- ⚡️ `perf`: Performance improvements
|
||||
- ✅ `test`: Tests
|
||||
- 🔧 `chore`: Tooling, configuration
|
||||
- 🚀 `ci`: CI/CD improvements
|
||||
- 🗃️ `db`: Database changes
|
||||
- 🔥 `fix`: Remove code/files
|
||||
- 📦️ `chore`: Dependencies
|
||||
- 🚸 `feat`: UX improvements
|
||||
- 🦺 `feat`: Validation
|
||||
|
||||
### Example Format
|
||||
```
|
||||
✨ feat(gold): add X_MG_Offender linkage table #45497
|
||||
```
|
||||
|
||||
### Auto-Generation Logic
|
||||
|
||||
**File Path Analysis**:
|
||||
- `python_files/gold/*.py` → scope: `(gold)`
|
||||
- `python_files/silver/s_fvms_*.py` → scope: `(silver_fvms)` or `(silver)`
|
||||
- `python_files/silver/s_cms_*.py` → scope: `(silver_cms)` or `(silver)`
|
||||
- `python_files/bronze/*.py` → scope: `(bronze)`
|
||||
- `python_files/utilities/*.py` → scope: `(utilities)`
|
||||
- `python_files/pipeline_operations/*.py` → scope: `(pipeline)`
|
||||
- `python_files/testing/*.py` → scope: `(test)`
|
||||
- `.claude/**`, `*.md` → scope: `(docs)`
|
||||
|
||||
**Change Type Detection**:
|
||||
- New files (`A` in git status) → `feat` ✨
|
||||
- Modified transformation/ETL files → `refactor` ♻️
|
||||
- Bug fixes (keywords: fix, bug, error, issue) → `fix` 🐛
|
||||
- Test files → `test` ✅
|
||||
- Documentation files → `docs` 📝
|
||||
- Configuration files → `chore` 🔧
|
||||
|
||||
**Description Generation**:
|
||||
- Extract meaningful operation from file names and diffs
|
||||
- New table: "add <table_name> table"
|
||||
- Modified logic: "improve/update <functionality>"
|
||||
- Bug fix: "fix <issue_description>"
|
||||
- Refactor: "refactor <component> for <reason>"
|
||||
|
||||
**Work Item Extraction**:
|
||||
- Branch name pattern: `feature/<number>-description` → `#<number>`
|
||||
- Multiple numbers: Extract first occurrence
|
||||
- No number in branch: No work item reference added
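
The scope, change-type, and work-item rules above can be sketched roughly as follows. This is illustrative only: the helper names, the regexes, and the empty-string fallback are not part of the command, they just mirror the heuristics described in this section.

```python
import re

# Illustrative mappings mirroring the File Path Analysis rules above
SCOPE_RULES = [
    (r"python_files/gold/", "gold"),
    (r"python_files/silver/s_fvms_", "silver_fvms"),
    (r"python_files/silver/s_cms_", "silver_cms"),
    (r"python_files/bronze/", "bronze"),
    (r"python_files/utilities/", "utilities"),
    (r"python_files/pipeline_operations/", "pipeline"),
    (r"python_files/testing/", "test"),
    (r"\.claude/|\.md$", "docs"),
]

def detect_scope(file_path: str) -> str:
    for pattern, scope in SCOPE_RULES:
        if re.search(pattern, file_path):
            return scope
    return ""  # no specific scope matched

def extract_work_item(branch_name: str) -> str:
    # Branch pattern feature/<number>-description -> #<number>; first number wins
    match = re.search(r"feature/(\d+)", branch_name)
    return f"#{match.group(1)}" if match else ""

print(extract_work_item("feature/46225-add-person-address-table"))  # #46225
print(detect_scope("python_files/gold/g_occ_person_address.py"))    # gold
```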
|
||||
|
||||
## What This Command Does
|
||||
|
||||
1. Validates you're on a feature branch (feature/*)
|
||||
2. Analyzes git changes to determine type, scope, and description
|
||||
3. Extracts work item numbers from branch name
|
||||
4. Auto-generates commit message with conventional emoji format
|
||||
5. Stages all modified files
|
||||
6. Creates commit with auto-generated message
|
||||
7. Runs pre-commit hooks (auto-fixes code quality issues)
|
||||
8. Pushes to current feature branch
|
||||
9. Creates PR from feature branch → staging
|
||||
10. Automatically adds comments to linked work items with PR details
|
||||
|
||||
## Pre-Commit Hooks
|
||||
|
||||
Your project uses pre-commit with:
|
||||
- **Ruff**: Linting with auto-fix + formatting
|
||||
- **Standard hooks**: Trailing whitespace, YAML/JSON validation
|
||||
- **Security**: Private key detection
|
||||
|
||||
Pre-commit hooks will auto-fix issues and may modify files. The commit process will:
|
||||
1. Run hooks
|
||||
2. Auto-stage modified files
|
||||
3. Complete commit with fixes applied
|
||||
|
||||
## Example Usage
|
||||
|
||||
### Automatic Feature PR
|
||||
```bash
|
||||
/pr-feature-to-staging
|
||||
```
|
||||
**On branch**: `feature/46225-add-person-address-table`
|
||||
**Changed files**: `python_files/gold/g_occ_person_address.py` (new file)
|
||||
|
||||
**Auto-generated commit**: `✨ feat(gold): add person address table #46225`
|
||||
|
||||
This will:
|
||||
1. Analyze changes (new gold layer file)
|
||||
2. Extract work item #46225 from branch name
|
||||
3. Auto-generate commit message
|
||||
4. Commit and push to feature branch
|
||||
5. Create PR: `feature/46225-add-person-address-table → staging`
|
||||
6. Link work item #46225
|
||||
7. Add automatic comment to work item #46225 with PR details
|
||||
|
||||
### Multiple File Changes
|
||||
**On branch**: `feature/46789-refactor-deduplication`
|
||||
**Changed files**:
|
||||
- `python_files/silver/s_fvms_incident.py` (modified)
|
||||
- `python_files/silver/s_cms_offence_report.py` (modified)
|
||||
- `python_files/utilities/session_optimiser.py` (modified)
|
||||
|
||||
**Auto-generated commit**: `♻️ refactor(silver): improve deduplication logic #46789`
|
||||
|
||||
### Fix Bug
|
||||
**On branch**: `feature/47123-fix-timestamp-parsing`
|
||||
**Changed files**: `python_files/utilities/session_optimiser.py` (modified, TableUtilities.clean_date_time_columns)
|
||||
|
||||
**Auto-generated commit**: `🐛 fix(utilities): correct timestamp parsing for null values #47123`
|
||||
|
||||
## Error Handling
|
||||
|
||||
### Not on Feature Branch
|
||||
```bash
|
||||
# Error: On staging branch
|
||||
/pr-feature-to-staging
|
||||
```
|
||||
**Result**: ERROR - Must be on feature/* branch. Current: staging
|
||||
|
||||
### Invalid Branch
|
||||
```bash
|
||||
# Error: On develop or main branch
|
||||
/pr-feature-to-staging
|
||||
```
|
||||
**Result**: ERROR - Cannot create feature PR from develop/main branch
|
||||
|
||||
### No Changes to Commit
|
||||
```bash
|
||||
# Error: Working directory clean
|
||||
/pr-feature-to-staging
|
||||
```
|
||||
**Result**: ERROR - No changes to commit. Working directory is clean.
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Work on feature branches** - Always create PRs from `feature/*` branches
|
||||
2. **Include work item in branch name** - Use pattern `feature/<work-item>-description` (e.g., `feature/46225-add-person-address`)
|
||||
3. **Make focused changes** - Keep changes related to a single feature/fix for accurate commit message generation
|
||||
4. **Let pre-commit work** - Hooks maintain code quality automatically
|
||||
5. **Review changes** - Check `git status` before running command to ensure only intended files are modified
|
||||
6. **Trust the automation** - The command analyzes your changes and generates appropriate conventional commit messages
|
||||
294
commands/pr-fix-pr-review.md
Executable file
@@ -0,0 +1,294 @@
|
||||
---
|
||||
model: claude-haiku-4-5-20251001
|
||||
allowed-tools: Bash(git:*), Read, Edit, Write, Task, mcp__*, mcp__ado__repo_get_pull_request_by_id, mcp__ado__repo_list_pull_request_threads, mcp__ado__repo_list_pull_request_thread_comments, mcp__ado__repo_reply_to_comment, mcp__ado__repo_resolve_comment, mcp__ado__repo_get_repo_by_name_or_id, mcp__ado__wit_add_work_item_comment
|
||||
argument-hint: [PR_ID]
|
||||
description: Address PR review feedback and update pull request
|
||||
---
|
||||
|
||||
# Fix PR Review Issues
|
||||
|
||||
Address feedback from PR review comments, make necessary code changes, and update the pull request.
|
||||
|
||||
## Repository Configuration
|
||||
- **Project**: Program Unify
|
||||
- **Repository ID**: d3fa6f02-bfdf-428d-825c-7e7bd4e7f338
|
||||
- **Repository Name**: unify_2_1_dm_synapse_env_d10
|
||||
|
||||
## What This Does
|
||||
|
||||
1. Retrieves PR details and all active review comments
|
||||
2. Analyzes review feedback and identifies required changes
|
||||
3. Makes code changes to address each review comment
|
||||
4. Commits changes with descriptive message
|
||||
5. Pushes to feature branch (automatically updates PR)
|
||||
6. Replies to review threads confirming fixes
|
||||
7. Resolves review threads when appropriate
|
||||
|
||||
## Implementation Logic
|
||||
|
||||
### 1. Get PR Information
|
||||
- Use \`mcp__ado__repo_get_pull_request_by_id\` with PR_ID from \`$ARGUMENTS\`
|
||||
- Extract source branch, target branch, and PR title
|
||||
- Validate PR is still active
|
||||
|
||||
### 2. Retrieve Review Comments
|
||||
- Use \`mcp__ado__repo_list_pull_request_threads\` to get all threads
|
||||
- Filter for active threads (status = "Active")
|
||||
- For each thread, use \`mcp__ado__repo_list_pull_request_thread_comments\` to get details
|
||||
- Display all review comments with:
|
||||
- File path and line number
|
||||
- Reviewer name
|
||||
- Comment content
|
||||
- Thread ID (for later replies)
|
||||
|
||||
### 3. Checkout Feature Branch
|
||||
```bash
git fetch origin
git checkout <source-branch-name>
git pull origin <source-branch-name>
```
|
||||
|
||||
### 4. Address Each Review Comment
|
||||
|
||||
**Categorise review comments first:**
|
||||
|
||||
#### Standard Code Quality Issues
|
||||
Handle directly with Edit tool for:
|
||||
- Type hints
|
||||
- Line length violations
|
||||
- Formatting issues
|
||||
- Missing decorators
|
||||
- Import organization
|
||||
- Variable naming
|
||||
|
||||
**Implementation:**
|
||||
1. Read affected file using Read tool
|
||||
2. Analyze the feedback and determine required changes
|
||||
3. Make code changes using Edit tool
|
||||
4. Validate changes meet project standards
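
As an illustration of the kind of fix handled here, a hypothetical missing-type-hint and missing-decorator comment might be resolved as below. The method, table name, and class context are placeholders; the decorator and logger are assumed to come from the project's `python_files/utilities/session_optimiser.py` and to be imported in the real file.

```python
from pyspark.sql import DataFrame

# BEFORE - flagged for missing type hints, decorator, and logging (hypothetical method)
def extract_offence_reports(self):
    offence_sdf = self.spark.read.table("silver_cms.s_cms_offence_report")
    return offence_sdf

# AFTER - type hints added, @synapse_error_print_handler applied, NotebookLogger used
@synapse_error_print_handler
def extract_offence_reports(self) -> DataFrame:
    offence_sdf = self.spark.read.table("silver_cms.s_cms_offence_report")
    self.logger.info(f"Extracted {offence_sdf.count()} offence report records")
    return offence_sdf
```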
|
||||
|
||||
#### Complex PySpark Issues
|
||||
**Use pyspark-engineer agent for:**
|
||||
- Performance optimisation requests
|
||||
- Partitioning strategy changes
|
||||
- Shuffle optimisation
|
||||
- Broadcast join refactoring
|
||||
- Memory management improvements
|
||||
- Medallion architecture violations
|
||||
- Complex transformation logic
|
||||
|
||||
**Trigger criteria:**
|
||||
- Review comment mentions: "performance", "optimisation", "partitioning", "shuffle", "memory", "medallion", "bronze/silver/gold layer"
|
||||
- Files affected in: \`python_files/pipeline_operations/\`, \`python_files/silver/\`, \`python_files/gold/\`, \`python_files/utilities/session_optimiser.py\`
|
||||
|
||||
**Use Task tool to launch pyspark-engineer agent:**
|
||||
|
||||
\`\`\`
|
||||
Task tool parameters:
|
||||
- subagent_type: "pyspark-engineer"
|
||||
- description: "Implement PySpark fixes for PR #[PR_ID]"
|
||||
- prompt: "
|
||||
Address PySpark review feedback for PR #[PR_ID]:
|
||||
|
||||
Review Comment Details:
|
||||
[For each PySpark-related comment, include:]
|
||||
- File: [FILE_PATH]
|
||||
- Line: [LINE_NUMBER]
|
||||
- Reviewer Feedback: [COMMENT_TEXT]
|
||||
- Thread ID: [THREAD_ID]
|
||||
|
||||
Implementation Requirements:
|
||||
1. Read all affected files
|
||||
2. Implement fixes following these standards:
|
||||
- Maximum line length: 240 characters
|
||||
- No blank lines inside functions
|
||||
- Proper type hints for all functions
|
||||
- Use @synapse_error_print_handler decorator
|
||||
- PySpark DataFrame operations (not SQL)
|
||||
- Suffix _sdf for all DataFrames
|
||||
- Follow medallion architecture patterns
|
||||
3. Optimize for:
|
||||
- Performance and cost-efficiency
|
||||
- Data skew handling
|
||||
- Memory management
|
||||
- Proper partitioning strategies
|
||||
4. Ensure production readiness:
|
||||
- Error handling
|
||||
- Logging with NotebookLogger
|
||||
- Idempotent operations
|
||||
5. Run quality gates:
|
||||
- Syntax validation: python3 -m py_compile
|
||||
- Linting: ruff check python_files/
|
||||
- Formatting: ruff format python_files/
|
||||
|
||||
Return:
|
||||
1. List of files modified
|
||||
2. Summary of changes made
|
||||
3. Explanation of how each review comment was addressed
|
||||
4. Any additional optimisations implemented
|
||||
"
|
||||
\`\`\`
|
||||
|
||||
**Integration:**
|
||||
- pyspark-engineer will read, modify, and validate files
|
||||
- Agent will run quality gates automatically
|
||||
- You will receive summary of changes
|
||||
- Use summary for commit message and review replies
|
||||
|
||||
#### Validation for All Changes
|
||||
Regardless of method (direct Edit or pyspark-engineer agent):
|
||||
- Maximum line length: 240 characters
|
||||
- No blank lines inside functions
|
||||
- Proper type hints
|
||||
- Use of \`@synapse_error_print_handler\` decorator
|
||||
- PySpark best practices from \`.claude/rules/python_rules.md\`
|
||||
- Document all fixes for commit message
|
||||
|
||||
### 5. Validate Changes
|
||||
Run quality gates:
|
||||
```bash
# Syntax check
python3 -m py_compile <changed-file>

# Linting
ruff check python_files/

# Format
ruff format python_files/
```
|
||||
|
||||
### 6. Commit and Push
|
||||
```bash
git add .
git commit -m "♻️ refactor: address PR review feedback - <brief-summary>"
git push origin <source-branch>
```
|
||||
|
||||
**Commit Message Format:**
|
||||
```
♻️ refactor: address PR review feedback

Fixes applied:
- <file1>: <description of fix>
- <file2>: <description of fix>
- ...

Review comments addressed in PR #<PR_ID>
```
|
||||
|
||||
### 7. Reply to Review Threads
|
||||
For each addressed comment:
|
||||
- Use \`mcp__ado__repo_reply_to_comment\` to add reply:
|
||||
```
✅ Fixed in commit <SHA>

Changes made:
- <specific change description>
```
|
||||
- Use \`mcp__ado__repo_resolve_comment\` to mark thread as resolved (if appropriate)
|
||||
|
||||
### 8. Report Results
|
||||
Provide summary:
|
||||
```
PR Review Fixes Completed

PR: #<PR_ID> - <PR_Title>
Branch: <source-branch> → <target-branch>

Review Comments Addressed: <count>
Files Modified: <file-list>
Commit SHA: <sha>

Quality Gates:
✓ Syntax validation passed
✓ Linting passed
✓ Code formatting applied

The PR has been updated and is ready for re-review.
```
|
||||
|
||||
## Error Handling
|
||||
|
||||
### No PR ID Provided
|
||||
If \`$ARGUMENTS\` is empty:
|
||||
- Use \`mcp__ado__repo_list_pull_requests_by_repo_or_project\` to list open PRs
|
||||
- Display all PRs created by current user
|
||||
- Prompt user to specify PR ID
|
||||
|
||||
### No Active Review Comments
|
||||
If no active review threads found:
|
||||
```
No active review comments found for PR #<PR_ID>.

The PR may already be approved or have no feedback requiring changes.
Would you like me to re-run /pr-review to check current status?
```
|
||||
|
||||
### Merge Conflicts
|
||||
If \`git pull\` results in merge conflicts:
|
||||
1. Display conflict files
|
||||
2. Guide user through resolution:
|
||||
- Show conflicting sections
|
||||
- Suggest resolution based on context
|
||||
- Use Edit tool to resolve
|
||||
3. Complete merge commit
|
||||
4. Continue with review fixes
|
||||
|
||||
### Quality Gate Failures
|
||||
If syntax check or linting fails:
|
||||
1. Display specific errors
|
||||
2. Fix automatically if possible
|
||||
3. Re-run quality gates
|
||||
4. Only proceed to commit when all gates pass
|
||||
|
||||
## Example Usage
|
||||
|
||||
### Fix Review for Specific PR
|
||||
```bash
/pr-fix-pr-review 5642
```
|
||||
|
||||
### Fix Review for Latest PR
|
||||
```bash
/pr-fix-pr-review
```
|
||||
(Will list your open PRs if ID not provided)
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Read all comments first** - Understand full scope before making changes
|
||||
2. **Make targeted fixes** - Address each comment specifically
|
||||
3. **Run quality gates** - Ensure changes meet project standards
|
||||
4. **Descriptive replies** - Explain what was changed and why
|
||||
5. **Resolve appropriately** - Only resolve threads when fix is complete
|
||||
6. **Test locally** - Consider running relevant tests if available
|
||||
|
||||
## Integration with /deploy-workflow
|
||||
|
||||
This command is automatically called by \`/deploy-workflow\` when:
|
||||
- \`/pr-review\` identifies issues requiring changes
|
||||
- The workflow needs to iterate on PR quality before merging
|
||||
|
||||
The workflow will loop:
|
||||
1. \`/pr-review\` → identifies issues (may include pyspark-engineer deep analysis)
|
||||
2. \`/pr-fix-pr-review\` → addresses issues
|
||||
- Standard fixes: Direct Edit tool usage
|
||||
- Complex PySpark fixes: pyspark-engineer agent handles implementation
|
||||
3. \`/pr-review\` → re-validates
|
||||
4. Repeat until PR is approved
|
||||
|
||||
**PySpark-Engineer Integration:**
|
||||
- Automatically triggered for performance and architecture issues
|
||||
- Ensures optimised, production-ready PySpark code
|
||||
- Maintains consistency with medallion architecture patterns
|
||||
- Validates test coverage and quality gates
|
||||
|
||||
## Notes
|
||||
|
||||
- **Automatic PR Update**: Pushing to source branch automatically updates the PR
|
||||
- **No New PR Created**: This updates the existing PR, doesn't create a new one
|
||||
- **Preserves History**: All review iterations are preserved in commit history
|
||||
- **Thread Management**: Replies and resolutions are tracked in Azure DevOps
|
||||
- **Quality First**: Will not commit changes that fail quality gates
|
||||
- **Intelligent Delegation**: Routes simple fixes to Edit tool, complex PySpark issues to specialist agent
|
||||
- **Expert Optimisation**: pyspark-engineer ensures performance and architecture best practices
|
||||
206
commands/pr-review.md
Executable file
@@ -0,0 +1,206 @@
|
||||
---
|
||||
model: claude-haiku-4-5-20251001
|
||||
allowed-tools: Bash(git branch:*), Bash(git status:*), Bash(git log:*), Bash(git diff:*), mcp__*, mcp__ado__repo_get_repo_by_name_or_id, mcp__ado__repo_list_pull_requests_by_repo_or_project, mcp__ado__repo_get_pull_request_by_id, mcp__ado__repo_list_pull_request_threads, mcp__ado__repo_list_pull_request_thread_comments, mcp__ado__repo_create_pull_request_thread, mcp__ado__repo_reply_to_comment, mcp__ado__repo_update_pull_request, mcp__ado__repo_search_commits, mcp__ado__pipelines_get_builds, Read, Task
|
||||
argument-hint: [PR_ID] (optional - if not provided, will list all open PRs)
|
||||
---

# PR Review and Approval
|
||||
## Task
|
||||
Review open pull requests in the current repository and approve/complete them if they meet quality standards.
|
||||
|
||||
## Instructions
|
||||
|
||||
### 1. Get Repository Information
|
||||
- Use `mcp__ado__repo_get_repo_by_name_or_id` with:
|
||||
- Project: `Program Unify`
|
||||
- Repository: `unify_2_1_dm_synapse_env_d10`
|
||||
- Extract repository ID: `d3fa6f02-bfdf-428d-825c-7e7bd4e7f338`
|
||||
|
||||
### 2. List Open Pull Requests
|
||||
- Use `mcp__ado__repo_list_pull_requests_by_repo_or_project` with:
|
||||
- Repository ID: `d3fa6f02-bfdf-428d-825c-7e7bd4e7f338`
|
||||
- Status: `Active`
|
||||
- If `$ARGUMENTS` provided, filter to that specific PR ID
|
||||
- Display all open PRs with key details (ID, title, source/target branches, author)
|
||||
|
||||
### 3. Review Each Pull Request
|
||||
For each PR (or the specified PR):
|
||||
|
||||
#### 3.1 Get PR Details
|
||||
- Use `mcp__ado__repo_get_pull_request_by_id` to get full PR details
|
||||
- Check merge status - if conflicts exist, stop and report
|
||||
|
||||
#### 3.2 Get PR Changes
|
||||
- Use `mcp__ado__repo_search_commits` to get commits in the PR
|
||||
- Identify files changed and scope of changes
|
||||
|
||||
#### 3.3 Review Code Quality
|
||||
Read changed files and evaluate:
|
||||
1. **Code Quality & Maintainability**
|
||||
- Proper use of type hints and descriptive variable names
|
||||
- Maximum line length (240 chars) compliance
|
||||
- No blank lines inside functions
|
||||
- Proper import organization
|
||||
- Use of `@synapse_error_print_handler` decorator
|
||||
- Proper error handling with meaningful messages
|
||||
|
||||
2. **PySpark Best Practices**
|
||||
- DataFrame operations over raw SQL
|
||||
- Proper use of `TableUtilities` methods
|
||||
- Correct logging with `NotebookLogger`
|
||||
- Proper session management
|
||||
|
||||
3. **ETL Pattern Compliance**
|
||||
- Follows ETL class pattern for Silver/Gold layers
|
||||
- Proper extract/transform/load method structure
|
||||
- Correct database and table naming conventions
|
||||
|
||||
4. **Standards Compliance**
|
||||
- Follows project coding standards from `.claude/rules/python_rules.md`
|
||||
- No missing docstrings (unless explicitly instructed to omit)
|
||||
- Proper use of configuration from `configuration.yaml`
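
For instance, reviewers would flag raw SQL strings and `print`-based logging in favour of the DataFrame API and `NotebookLogger`. A rough before/after sketch (the table and the `status` column are illustrative, and `spark` / `logger` are assumed to exist in the ETL class):

```python
# Flagged in review - raw SQL string and print-based logging
incident_sdf = spark.sql("SELECT * FROM silver_fvms.s_fvms_incident WHERE status = 'OPEN'")
print(incident_sdf.count())

# Preferred - DataFrame API and NotebookLogger
incident_sdf = spark.read.table("silver_fvms.s_fvms_incident").filter("status = 'OPEN'")
logger.info(f"Open incidents: {incident_sdf.count()}")
```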
|
||||
|
||||
#### 3.4 Review DevOps Considerations
|
||||
1. **CI/CD Integration**
|
||||
- Changes compatible with existing pipeline
|
||||
- No breaking changes to deployment process
|
||||
|
||||
2. **Configuration & Infrastructure**
|
||||
- Proper environment detection pattern
|
||||
- Azure integration handled correctly
|
||||
- No hardcoded paths or credentials
|
||||
|
||||
3. **Testing & Quality Gates**
|
||||
- Syntax validation would pass
|
||||
- Linting compliance (ruff check)
|
||||
- Test coverage for new functionality
|
||||
|
||||
#### 3.5 Deep PySpark Analysis (Conditional)
|
||||
**Only execute if PR modifies PySpark ETL code**
|
||||
|
||||
Check if PR changes affect:
|
||||
- `python_files/pipeline_operations/bronze_layer_deployment.py`
|
||||
- `python_files/pipeline_operations/silver_dag_deployment.py`
|
||||
- `python_files/pipeline_operations/gold_dag_deployment.py`
|
||||
- Any files in `python_files/silver/`
|
||||
- Any files in `python_files/gold/`
|
||||
- `python_files/utilities/session_optimiser.py`
|
||||
|
||||
**If PySpark files are modified, use Task tool to launch pyspark-engineer agent:**
|
||||
|
||||
```
|
||||
Task tool parameters:
|
||||
- subagent_type: "pyspark-engineer"
|
||||
- description: "Deep PySpark analysis for PR #[PR_ID]"
|
||||
- prompt: "
|
||||
Perform expert-level PySpark analysis for PR #[PR_ID]:
|
||||
|
||||
PR Details:
|
||||
- Title: [PR_TITLE]
|
||||
- Changed Files: [LIST_OF_CHANGED_FILES]
|
||||
- Source Branch: [SOURCE_BRANCH]
|
||||
- Target Branch: [TARGET_BRANCH]
|
||||
|
||||
Review Requirements:
|
||||
1. Read all changed PySpark files
|
||||
2. Analyze transformation logic for:
|
||||
- Partitioning strategies and data skew
|
||||
- Shuffle optimisation opportunities
|
||||
- Broadcast join usage and optimisation
|
||||
- Memory management and caching strategies
|
||||
- DataFrame operation efficiency
|
||||
3. Validate Medallion Architecture compliance:
|
||||
- Bronze layer: Raw data preservation patterns
|
||||
- Silver layer: Cleansing and standardization
|
||||
- Gold layer: Business model optimisation
|
||||
4. Check performance considerations:
|
||||
- Identify potential bottlenecks
|
||||
- Suggest optimisation opportunities
|
||||
- Validate cost-efficiency patterns
|
||||
5. Verify test coverage:
|
||||
- Check for pytest test files
|
||||
- Validate test completeness
|
||||
- Suggest missing test scenarios
|
||||
6. Review production readiness:
|
||||
- Error handling for data pipeline failures
|
||||
- Idempotent operation design
|
||||
- Monitoring and logging completeness
|
||||
|
||||
Provide detailed findings in this format:
|
||||
|
||||
## PySpark Analysis Results
|
||||
|
||||
### Critical Issues (blocking)
|
||||
- [List any critical performance or correctness issues]
|
||||
|
||||
### Performance Optimisations
|
||||
- [Specific optimisation recommendations]
|
||||
|
||||
### Architecture Compliance
|
||||
- [Medallion architecture adherence assessment]
|
||||
|
||||
### Test Coverage
|
||||
- [Test completeness and gaps]
|
||||
|
||||
### Recommendations
|
||||
- [Specific actionable improvements]
|
||||
|
||||
Return your analysis for integration into the PR review.
|
||||
"
|
||||
```
|
||||
|
||||
**Integration of PySpark Analysis:**
|
||||
- If pyspark-engineer identifies critical issues → Add to review comments
|
||||
- If optimisations suggested → Add as optional improvement comments
|
||||
- If architecture violations found → Add as required changes
|
||||
- Include all findings in final review summary
|
||||
|
||||
### 4. Provide Review Comments
|
||||
- Use `mcp__ado__repo_list_pull_request_threads` to check existing review comments
|
||||
- If issues found, use `mcp__ado__repo_create_pull_request_thread` to add:
|
||||
- Specific file-level comments with line numbers
|
||||
- Clear description of issues
|
||||
- Suggested improvements
|
||||
- Mark as `Active` status if changes required
|
||||
|
||||
### 5. Approve and Complete PR (if satisfied)
|
||||
**Only proceed if ALL criteria met:**
|
||||
- No merge conflicts
|
||||
- Code quality standards met
|
||||
- PySpark best practices followed
|
||||
- ETL patterns correct
|
||||
- No DevOps concerns
|
||||
- Proper error handling and logging
|
||||
- Standards compliant
|
||||
- **PySpark analysis (if performed) shows no critical issues**
|
||||
- **Performance optimisations either implemented or deferred with justification**
|
||||
- **Medallion architecture compliance validated**
|
||||
|
||||
**If approved:**
|
||||
1. Use `mcp__ado__repo_update_pull_request` with:
|
||||
- Set `autoComplete: true`
|
||||
- Set `mergeStrategy: "NoFastForward"` (or "Squash" if many small commits)
|
||||
- Set `deleteSourceBranch: false` (preserve branch history)
|
||||
- Set `transitionWorkItems: true`
|
||||
- Add approval comment explaining what was reviewed
|
||||
|
||||
2. Confirm completion with summary:
|
||||
- PR ID and title
|
||||
- Number of commits reviewed
|
||||
- Key changes identified
|
||||
- Approval rationale
|
||||
|
||||
### 6. Report Results
|
||||
Provide comprehensive summary:
|
||||
- Total open PRs reviewed
|
||||
- PRs approved and completed (with IDs)
|
||||
- PRs requiring changes (with summary of issues)
|
||||
- PRs blocked by merge conflicts
|
||||
- **PySpark analysis findings (if performed)**
|
||||
- **Performance optimisation recommendations**
|
||||
|
||||
## Important Notes
|
||||
- **No deferrals**: All identified issues must be addressed before approval
|
||||
- **Immediate action**: If improvements needed, request them now - no "future work" comments
|
||||
- **Thorough review**: Check both code quality AND DevOps considerations
|
||||
- **Professional objectivity**: Prioritize technical accuracy over validation
|
||||
- **Merge conflicts**: Do NOT approve PRs with merge conflicts - report them for manual resolution
|
||||
35
commands/pr-staging-to-develop.md
Executable file
@@ -0,0 +1,35 @@
|
||||
---
|
||||
model: claude-haiku-4-5-20251001
|
||||
allowed-tools: Bash(git add:*), Bash(git status:*), Bash(git commit:*), Bash(git diff:*), Bash(git log:*), Bash(git push:*), Bash(git pull:*), Bash(git branch:*), mcp__*, mcp__ado__repo_list_branches_by_repo, mcp__ado__repo_search_commits, mcp__ado__repo_create_pull_request, mcp__ado__repo_get_pull_request_by_id, mcp__ado__repo_get_repo_by_name_or_id, mcp__ado__wit_add_work_item_comment, mcp__ado__wit_get_work_item
|
||||
argument-hint: [message] | --no-verify | --amend | --pr-s | --pr-d | --pr-m
|
||||
---

# Create Remote PR: staging → develop
|
||||
## Task
|
||||
Create a pull request from remote `staging` branch to remote `develop` branch using Azure DevOps MCP tools.
|
||||
|
||||
## Instructions
|
||||
|
||||
### 1. Create PR
|
||||
- Use `mcp__ado__repo_create_pull_request` tool
|
||||
- Source: `refs/heads/staging` (remote only - do NOT push local branches)
|
||||
- Target: `refs/heads/develop`
|
||||
- Repository ID: `d3fa6f02-bfdf-428d-825c-7e7bd4e7f338`
|
||||
- Title: Clear, concise description with conventional commit emoji
|
||||
- Description: Brief bullet points summarising changes (keep short)
|
||||
|
||||
### 2. Check for Merge Conflicts
|
||||
- Use `mcp__ado__repo_get_pull_request_by_id` to verify PR status
|
||||
- If merge conflicts exist, resolve them:
|
||||
1. Create temporary branch from `origin/staging`
|
||||
2. Merge `origin/develop` into temp branch
|
||||
3. Resolve conflicts using Edit tool
|
||||
4. Commit resolution: `🔀 Merge origin/develop into staging - resolve conflicts for PR #XXXX`
|
||||
5. Push resolved merge to `origin/staging`
|
||||
6. Clean up temp branch
|
||||
|
||||
### 3. Success Criteria
|
||||
- PR created successfully
|
||||
- No merge conflicts preventing approval
|
||||
- PR ready for reviewer approval
|
||||
|
||||
storageexplorer://v=1&accountid=%2Fsubscriptions%2F646e3673-7a99-4617-9f7e-47857fa18002%2FresourceGroups%2FAuE-Atlas-DataPlatform-DEV-RG%2Fproviders%2FMicrosoft.Storage%2FstorageAccounts%2Fauedatamigdevlake&subscriptionid=646e3673-7a99-4617-9f7e-47857fa18002&resourcetype=Azure.FileShare&resourcename=atldev01ndsdb1
|
||||
183
commands/prime-claude.md
Normal file
@@ -0,0 +1,183 @@
|
||||
---
|
||||
name: prime-claude-md
|
||||
description: Distill CLAUDE.md to essentials, moving detailed knowledge into skills for on-demand loading. Reduces context pollution by 80-90%.
|
||||
args: [--analyze-only] | [--backup] | [--apply]
|
||||
---
|
||||
|
||||
# Prime CLAUDE.md
|
||||
|
||||
Distill your CLAUDE.md file to only essential information, moving detailed knowledge into skills.
|
||||
|
||||
## Problem
|
||||
|
||||
Large CLAUDE.md files (400+ lines) are loaded into context for EVERY conversation:
|
||||
- Wastes 5,000-15,000 tokens per conversation
|
||||
- Reduces space for actual work
|
||||
- Slows Claude's responses
|
||||
- 80% of the content is rarely needed
|
||||
|
||||
## Solution
|
||||
|
||||
**Prime your CLAUDE.md**:
|
||||
1. Keep only critical architecture and coding standards
|
||||
2. Move detailed knowledge into skills (loaded on-demand)
|
||||
3. Reduce from 400+ lines to ~100 lines
|
||||
4. Save 80-90% context per conversation
|
||||
|
||||
## Usage
|
||||
|
||||
### Analyze Current CLAUDE.md
|
||||
```bash
|
||||
/prime-claude-md --analyze-only
|
||||
```
|
||||
Shows what would be moved to skills without making changes.
|
||||
|
||||
### Create Backup and Apply
|
||||
```bash
|
||||
/prime-claude-md --backup --apply
|
||||
```
|
||||
1. Backs up current CLAUDE.md to CLAUDE.md.backup
|
||||
2. Creates supporting skills with detailed knowledge
|
||||
3. Replaces CLAUDE.md with distilled version
|
||||
4. Documents what was moved where
|
||||
|
||||
### Just Apply (No Backup)
|
||||
```bash
|
||||
/prime-claude-md --apply
|
||||
```
|
||||
|
||||
## What Gets Distilled
|
||||
|
||||
### Kept in CLAUDE.md (Essential)
|
||||
- Critical architecture concepts (high-level only)
|
||||
- Mandatory coding standards (line length, blank lines, decorators)
|
||||
- Quality gates (syntax check, linting, formatting)
|
||||
- Essential commands (2-3 most common)
|
||||
- References to skills for details
|
||||
|
||||
### Moved to Skills (Detailed Knowledge)
|
||||
|
||||
**project-architecture** skill:
|
||||
- Detailed medallion architecture
|
||||
- Pipeline execution flow
|
||||
- Data source details
|
||||
- Azure integration specifics
|
||||
- Configuration management
|
||||
- Testing architecture
|
||||
|
||||
**project-commands** skill:
|
||||
- Complete make command reference
|
||||
- All development workflows
|
||||
- Azure operations
|
||||
- Database operations
|
||||
- Git operations
|
||||
- Troubleshooting commands
|
||||
|
||||
**pyspark-patterns** skill:
|
||||
- TableUtilities method documentation
|
||||
- ETL class pattern details
|
||||
- Logging standards
|
||||
- DataFrame operation patterns
|
||||
- JDBC connection patterns
|
||||
- Performance tips
|
||||
|
||||
## Results
|
||||
|
||||
**Before Priming**:
|
||||
- CLAUDE.md: 420 lines
|
||||
- Context cost: ~12,000 tokens per conversation
|
||||
- Skills: 0
|
||||
- Knowledge: Always loaded
|
||||
|
||||
**After Priming**:
|
||||
- CLAUDE.md: ~100 lines (76% reduction)
|
||||
- Context cost: ~2,000 tokens per conversation (83% savings)
|
||||
- Skills: 3 specialized skills
|
||||
- Knowledge: Loaded only when needed
|
||||
|
||||
## Example Distilled CLAUDE.md
|
||||
|
||||
```markdown
|
||||
# CLAUDE.md
|
||||
|
||||
**CRITICAL**: READ `.claude/rules/python_rules.md`
|
||||
|
||||
## Architecture
|
||||
Medallion: Bronze → Silver → Gold
|
||||
Core: `session_optimiser.py` (SparkOptimiser, NotebookLogger, TableUtilities)
|
||||
|
||||
## Essential Commands
|
||||
python3 -m py_compile <file> # Must run
|
||||
ruff check python_files/ # Must pass
|
||||
make run_all # Full pipeline
|
||||
|
||||
## Coding Standards
|
||||
- Line length: 240 chars
|
||||
- No blank lines in functions
|
||||
- Use @synapse_error_print_handler
|
||||
- Use logger (not print)
|
||||
|
||||
## Skills Available
|
||||
- project-architecture: Detailed architecture
|
||||
- project-commands: Complete command reference
|
||||
- pyspark-patterns: PySpark best practices
|
||||
```
|
||||
|
||||
## Benefits
|
||||
|
||||
1. **Faster conversations**: Less context overhead
|
||||
2. **Better responses**: More room for actual work
|
||||
3. **On-demand knowledge**: Load only what you need
|
||||
4. **Maintainable**: Easier to update focused skills
|
||||
5. **Reusable pattern**: Apply to any repository
|
||||
|
||||
## Applying to Other Repositories
|
||||
|
||||
This command is repository-agnostic. To use on another repo:
|
||||
|
||||
1. Run `/prime-claude-md --analyze-only` to see what you have
|
||||
2. Command will identify:
|
||||
- Architectural concepts
|
||||
- Command references
|
||||
- Coding standards
|
||||
- Configuration details
|
||||
3. Creates appropriate skills based on content
|
||||
4. Run `/prime-claude-md --apply` when ready
|
||||
|
||||
## Files Created
|
||||
|
||||
```
|
||||
.claude/
|
||||
├── CLAUDE.md # Distilled (100 lines)
|
||||
├── CLAUDE.md.backup # Original (if --backup used)
|
||||
└── skills/
|
||||
├── project-architecture/
|
||||
│ └── skill.md # Architecture details
|
||||
├── project-commands/
|
||||
│ └── skill.md # Command reference
|
||||
└── pyspark-patterns/ # (project-specific)
|
||||
└── skill.md # Code patterns
|
||||
```
|
||||
|
||||
## Philosophy
|
||||
|
||||
**CLAUDE.md should answer**: "What's special about this repo?"
|
||||
|
||||
**Skills should answer**: "How do I do X in detail?"
|
||||
|
||||
## Task Execution
|
||||
|
||||
I will:
|
||||
1. Read current CLAUDE.md (both project and global if exists)
|
||||
2. Analyze content and categorize
|
||||
3. Create distilled CLAUDE.md (essential only)
|
||||
4. Create supporting skills with detailed knowledge
|
||||
5. If --backup: Save CLAUDE.md.backup
|
||||
6. If --apply: Replace CLAUDE.md with distilled version
|
||||
7. Generate summary report of changes
|
||||
|
||||
---
|
||||
|
||||
**Current Project**: Unify Data Migration (PySpark/Azure Synapse)
|
||||
|
||||
Let me analyze your CLAUDE.md and create the distilled version with supporting skills.
|
||||
607
commands/pyspark-errors.md
Executable file
@@ -0,0 +1,607 @@
|
||||
# PySpark Error Fixing Command
|
||||
|
||||
## Objective
|
||||
Execute `make gold_table` and systematically fix all errors encountered in the PySpark gold layer file using specialized agents. Errors may be code-based (syntax, type, runtime) or logical (incorrect joins, missing data, business rule violations).
|
||||
|
||||
## Agent Workflow (MANDATORY)
|
||||
|
||||
### Phase 1: Error Fixing with pyspark-engineer
|
||||
**CRITICAL**: All PySpark error fixing MUST be performed by the `pyspark-engineer` agent. Do NOT attempt to fix errors directly.
|
||||
|
||||
1. Launch the `pyspark-engineer` agent with:
|
||||
- Full error stack trace and context
|
||||
- Target file path
|
||||
- All relevant schema information from MCP server
|
||||
- Data dictionary references
|
||||
|
||||
2. The pyspark-engineer will:
|
||||
- Validate MCP server connectivity
|
||||
- Query schemas and foreign key relationships
|
||||
- Analyze and fix all errors systematically
|
||||
- Apply fixes following project coding standards
|
||||
- Run quality gates (py_compile, ruff check, ruff format)
|
||||
|
||||
### Phase 2: Code Review with code-reviewer
|
||||
**CRITICAL**: After pyspark-engineer completes fixes, MUST launch the `code-reviewer` agent.
|
||||
|
||||
1. Launch the `code-reviewer` agent with:
|
||||
- Path to the fixed file(s)
|
||||
- Context: "PySpark gold layer error fixes"
|
||||
- Request comprehensive review focusing on:
|
||||
- PySpark best practices
|
||||
- Join logic correctness
|
||||
- Schema alignment
|
||||
- Business rule implementation
|
||||
- Code quality and standards adherence
|
||||
|
||||
2. The code-reviewer will provide:
|
||||
- Detailed feedback on all issues found
|
||||
- Security vulnerabilities
|
||||
- Performance optimization opportunities
|
||||
- Code quality improvements needed
|
||||
|
||||
### Phase 3: Iterative Refinement (MANDATORY LOOP)
|
||||
**CRITICAL**: The review-refactor cycle MUST continue until code-reviewer is 100% satisfied.
|
||||
|
||||
1. If code-reviewer identifies ANY issues:
|
||||
- Launch pyspark-engineer again with code-reviewer's feedback
|
||||
- pyspark-engineer implements all recommended changes
|
||||
- Launch code-reviewer again to re-validate
|
||||
|
||||
2. Repeat Phase 1 → Phase 2 → Phase 3 until:
|
||||
- code-reviewer explicitly states: "✓ 100% SATISFIED - No further changes required"
|
||||
- Zero issues, warnings, or concerns remain
|
||||
- All quality gates pass
|
||||
- All business rules validated
|
||||
|
||||
3. Only then is the error fixing task complete.
|
||||
|
||||
**DO NOT PROCEED TO COMPLETION** until code-reviewer gives explicit 100% satisfaction confirmation.
|
||||
|
||||
## Pre-Execution Requirements
|
||||
|
||||
### 1. Python Coding Standards (CRITICAL - READ FIRST)
|
||||
**MANDATORY**: All code MUST follow `.claude/rules/python_rules.md` standards:
|
||||
- **Line 19**: Use DataFrame API not Spark SQL
|
||||
- **Line 20**: Do NOT use DataFrame aliases (e.g., `.alias("l")`) or `col()` function - use direct string references or `df["column"]` syntax
|
||||
- **Line 8**: Limit line length to 240 characters
|
||||
- **Line 9-10**: Single line per statement, no carriage returns mid-statement
|
||||
- **Line 10, 12**: No blank lines inside functions
|
||||
- **Line 11**: Close parentheses on the last line of code
|
||||
- **Line 5**: Use type hints for all function parameters and return values
|
||||
- **Line 18**: Import statements only at the start of file, never inside functions
|
||||
- **Line 16**: Run `ruff check` and `ruff format` before finalizing
|
||||
- Import only necessary PySpark functions: `from pyspark.sql.functions import when, coalesce, lit` (NO col() usage - use direct references instead)
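
A quick illustration of the column-reference rule. The DataFrames and column names are placeholders; only the referencing style matters here:

```python
from pyspark.sql.functions import when, coalesce, lit

# left_sdf / right_sdf and the column names are placeholders
# NOT allowed - aliases and col():
#   left_sdf.alias("l").join(right_sdf.alias("r"), col("l.case_id") == col("r.case_id"), "left")
# Allowed - direct string references or df["column"] syntax
joined_sdf = left_sdf.join(right_sdf, left_sdf["case_id"] == right_sdf["case_id"], how="left")
status_sdf = joined_sdf.withColumn("status_code", coalesce(joined_sdf["status_code"], lit("UNKNOWN")))
flagged_sdf = status_sdf.withColumn("is_open", when(status_sdf["status_code"] == lit("OPEN"), lit(True)).otherwise(lit(False)))
```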
|
||||
|
||||
### 2. Identify Target File
|
||||
- Default target: `python_files/gold/<INSERT FILE NAME>.py`
|
||||
- Override via Makefile: `G_RUN_FILE_NAME` variable (line 63)
|
||||
- Verify file exists before execution
|
||||
|
||||
### 3. Environment Context
|
||||
- **Runtime Environment**: Local development (not Azure Synapse)
|
||||
- **Working Directory**: `/workspaces/unify_2_1_dm_synapse_env_d10`
|
||||
- **Python Version**: 3.11+
|
||||
- **Spark Mode**: Local cluster (`local[*]`)
|
||||
- **Data Location**: `/workspaces/data` (parquet files)
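
For orientation, a minimal local read against this environment might look like the sketch below. It is debug-only: project code normally obtains its session via `SparkOptimiser` in `session_optimiser.py`, and the exact parquet path layout under `/workspaces/data` is assumed.

```python
from pyspark.sql import SparkSession

# Local-only sketch; project code obtains its session via SparkOptimiser instead
spark = SparkSession.builder.master("local[*]").appName("gold_debug").getOrCreate()
offence_sdf = spark.read.parquet("/workspaces/data/silver_cms/s_cms_offence_report")  # path layout assumed
offence_sdf.printSchema()
offence_sdf.show(5, truncate=False)
```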
|
||||
|
||||
### 4. Available Resources
|
||||
- **Data Dictionary**: `.claude/data_dictionary/*.md` - schema definitions for all CMS, FVMS, NicheRMS tables
|
||||
- **Configuration**: `configuration.yaml` - database lists, null replacements, Azure settings
|
||||
- **MCP Schema Server**: `mcp-server-motherduck` - live schema access via MCP (REQUIRED for schema verification)
|
||||
- **Utilities Module**: `python_files/utilities/session_optimiser.py` - TableUtilities, NotebookLogger, decorators
|
||||
- **Example Files**: Other `python_files/gold/g_*.py` files for reference patterns
|
||||
|
||||
### 5. MCP Server Validation (CRITICAL)
|
||||
**BEFORE PROCEEDING**, verify MCP server connectivity:
|
||||
|
||||
1. **Test MCP Server Connection**:
|
||||
- Attempt to query any known table schema via MCP
|
||||
- Example test: Query schema for a common table (e.g., `silver_cms.s_cms_offence_report`)
|
||||
|
||||
2. **Validation Criteria**:
|
||||
- MCP server must respond with valid schema data
|
||||
- Schema must include column names, data types, and nullability
|
||||
- Response must be recent (not cached/stale data)
|
||||
|
||||
3. **Failure Handling**:
|
||||
```
|
||||
⚠️ STOP: MCP Server Not Available
|
||||
|
||||
The MCP server (mcp-server-motherduck) is not responding or not providing valid schema data.
|
||||
|
||||
This command requires live schema access to:
|
||||
- Verify column names and data types
|
||||
- Validate join key compatibility
|
||||
- Check foreign key relationships
|
||||
- Ensure accurate schema matching
|
||||
|
||||
Actions Required:
|
||||
1. Check MCP server status and configuration
|
||||
2. Verify MotherDuck connection credentials
|
||||
3. Ensure schema database is accessible
|
||||
4. Restart MCP server if necessary
|
||||
|
||||
Cannot proceed with error fixing without verified schema access.
|
||||
Use data dictionary files as fallback, but warn user of potential schema drift.
|
||||
```
|
||||
|
||||
4. **Success Confirmation**:
|
||||
```
|
||||
✓ MCP Server Connected
|
||||
✓ Schema data available
|
||||
✓ Proceeding with error fixing workflow
|
||||
```
|
||||
|
||||
## Error Detection Strategy
|
||||
|
||||
### Phase 1: Execute and Capture Errors
|
||||
1. Run: `make gold_table`
|
||||
2. Capture full stack trace including:
|
||||
- Error type (AttributeError, KeyError, AnalysisException, etc.)
|
||||
- Line number and function name
|
||||
- Failed DataFrame operation
|
||||
- Column names involved
|
||||
- Join conditions if applicable
|
||||
|
||||
### Phase 2: Categorize Error Types
|
||||
|
||||
#### A. Code-Based Errors
|
||||
|
||||
**Syntax/Import Errors**
|
||||
- Missing imports from `pyspark.sql.functions`
|
||||
- Incorrect function signatures
|
||||
- Type hint violations
|
||||
- Decorator usage errors
|
||||
|
||||
**Runtime Errors**
|
||||
- `AnalysisException`: Column not found, table doesn't exist
|
||||
- `AttributeError`: Calling non-existent DataFrame methods
|
||||
- `KeyError`: Dictionary access failures
|
||||
- `TypeError`: Incompatible data types in operations
|
||||
|
||||
**DataFrame Schema Errors**
|
||||
- Column name mismatches (case sensitivity)
|
||||
- Duplicate column names after joins
|
||||
- Missing required columns for downstream operations
|
||||
- Incorrect column aliases
|
||||
|
||||
#### B. Logical Errors
|
||||
|
||||
**Join Issues**
|
||||
- **Incorrect Join Keys**: Joining on wrong columns (e.g., `offence_report_id` vs `cms_offence_report_id`)
|
||||
- **Missing Table Aliases**: Ambiguous column references after joins
|
||||
- **Wrong Join Types**: Using `inner` when `left` is required (or vice versa)
|
||||
- **Cartesian Products**: Missing join conditions causing data explosion
|
||||
- **Broadcast Misuse**: Not using `broadcast()` for small dimension tables
|
||||
- **Duplicate Join Keys**: Multiple rows with same key causing row multiplication
|
||||
|
||||
**Aggregation Problems**
|
||||
- Incorrect `groupBy()` columns
|
||||
- Missing aggregation functions (`first()`, `last()`, `collect_list()`)
|
||||
- Wrong window specifications
|
||||
- Aggregating on nullable columns without `coalesce()`
|
||||
|
||||
**Business Rule Violations**
|
||||
- Incorrect date/time logic (e.g., using `reported_date_time` when `date_created` should be fallback)
|
||||
- Missing null handling for critical fields
|
||||
- Status code logic errors
|
||||
- Incorrect coalesce order
|
||||
|
||||
**Data Quality Issues**
|
||||
- Expected vs actual row counts (use `logger.info(f"Expected X rows, got {df.count()}")`)
|
||||
- Null propagation in critical columns
|
||||
- Duplicate records not being handled
|
||||
- Missing deduplication logic
|
||||
|
||||
## Systematic Debugging Process
|
||||
|
||||
### Step 1: Schema Verification
|
||||
For each source table mentioned in the error:
|
||||
|
||||
1. **PRIMARY: Query MCP Server for Schema** (MANDATORY FIRST STEP):
|
||||
- Use MCP tools to query table schema from MotherDuck
|
||||
- Extract column names, data types, nullability, and constraints
|
||||
- Verify foreign key relationships for join operations
|
||||
- Cross-reference with error column names
|
||||
|
||||
**Example MCP Query Pattern**:
|
||||
```
|
||||
Query: "Get schema for table silver_cms.s_cms_offence_report"
|
||||
Expected Response: Column list with types and constraints
|
||||
```
|
||||
|
||||
**If MCP Server Fails**:
|
||||
- STOP and warn user (see Section 4: MCP Server Validation)
|
||||
- Do NOT proceed with fixing without schema verification
|
||||
- Suggest user check MCP server configuration
|
||||
|
||||
2. **SECONDARY: Verify Schema Using Data Dictionary** (as supplementary reference):
|
||||
- Read `.claude/data_dictionary/{source}_{table}.md`
|
||||
- Compare MCP schema vs data dictionary for consistency
|
||||
- Note any schema drift or discrepancies
|
||||
- Alert user if schemas don't match
|
||||
|
||||
3. **Check Table Existence**:
|
||||
```python
|
||||
spark.sql("SHOW TABLES IN silver_cms").show()
|
||||
```
|
||||
|
||||
4. **Inspect Actual Runtime Schema** (validate MCP data):
|
||||
```python
|
||||
df = spark.read.table("silver_cms.s_cms_offence_report")
|
||||
df.printSchema()
|
||||
df.select([col for col in df.columns[:10]]).show(5, truncate=False)
|
||||
```
|
||||
|
||||
**Compare**:
|
||||
- MCP schema vs Spark runtime schema
|
||||
- Report any mismatches to user
|
||||
- Use runtime schema as source of truth if conflicts exist
|
||||
|
||||
5. **Use DuckDB Schema** (if available, as additional validation):
|
||||
- Query schema.db for column definitions
|
||||
- Check foreign key relationships
|
||||
- Validate join key data types
|
||||
- Triangulate: MCP + DuckDB + Data Dictionary should align
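
A lightweight way to triangulate the runtime schema against what MCP or the data dictionary reports is to compare `df.dtypes` with a hand-entered expectation. The expected types below are illustrative values, and `spark` / `logger` are assumed from the surrounding session:

```python
# Expected types as reported by MCP / the data dictionary (hand-entered, illustrative)
expected_types = {"cms_offence_report_id": "bigint", "case_file_id": "bigint", "reported_date_time": "timestamp", "date_created": "timestamp"}
offence_sdf = spark.read.table("silver_cms.s_cms_offence_report")
runtime_types = dict(offence_sdf.dtypes)
for column_name, expected_type in expected_types.items():
    actual_type = runtime_types.get(column_name)
    if actual_type is None:
        logger.error(f"Schema drift: column {column_name} missing at runtime")
    elif actual_type != expected_type:
        logger.error(f"Schema drift: {column_name} is {actual_type}, expected {expected_type}")
```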
|
||||
|
||||
### Step 2: Join Logic Validation
|
||||
|
||||
For each join operation:
|
||||
|
||||
1. **Use MCP Server to Validate Join Relationships**:
|
||||
- Query foreign key constraints from MCP schema server
|
||||
- Identify correct join column names and data types
|
||||
- Verify parent-child table relationships
|
||||
- Confirm join key nullability (affects join results)
|
||||
|
||||
**Example MCP Queries**:
|
||||
```
|
||||
Query: "Show foreign keys for table silver_cms.s_cms_offence_report"
|
||||
Query: "What columns link s_cms_offence_report to s_cms_case_file?"
|
||||
Query: "Get data type for column cms_offence_report_id in silver_cms.s_cms_offence_report"
|
||||
```
|
||||
|
||||
**If MCP Returns No Foreign Keys**:
|
||||
- Fall back to data dictionary documentation
|
||||
- Check `.claude/data_dictionary/` for relationship diagrams
|
||||
- Manually verify join logic with business analyst
|
||||
|
||||
2. **Verify Join Keys Exist** (using MCP-confirmed column names):
|
||||
```python
|
||||
left_df.select("join_key_column").show(5)
|
||||
right_df.select("join_key_column").show(5)
|
||||
```
|
||||
|
||||
3. **Check Join Key Data Type Compatibility** (cross-reference with MCP schema):
|
||||
```python
|
||||
# Verify types match MCP schema expectations
|
||||
left_df.select("join_key_column").dtypes
|
||||
right_df.select("join_key_column").dtypes
|
||||
```
|
||||
|
||||
4. **Check Join Key Uniqueness**:
|
||||
```python
|
||||
left_df.groupBy("join_key_column").count().filter("count > 1").show()
|
||||
```
|
||||
|
||||
5. **Validate Join Type**:
|
||||
- `left`: Keep all left records (most common for fact-to-dimension)
|
||||
- `inner`: Only matching records
|
||||
- Use `broadcast()` for small lookup tables (< 10MB)
|
||||
- Confirm join type matches MCP foreign key relationship (nullable FK → left join)
|
||||
|
||||
6. **Handle Ambiguous Columns**:
|
||||
```python
|
||||
# BEFORE (causes ambiguity if both tables have same column names)
|
||||
joined_df = left_df.join(right_df, on="common_id", how="left")
|
||||
|
||||
# AFTER (select specific columns to avoid ambiguity)
|
||||
left_cols = [c for c in left_df.columns]
|
||||
right_cols = ["dimension_field"]
|
||||
joined_df = left_df.join(right_df, on="common_id", how="left").select(left_cols + right_cols)
|
||||
```
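
Where a small lookup table is involved (step 5 above), a broadcast join avoids a shuffle. The table and column names here are illustrative, and the string join key keeps only one copy of the key column:

```python
from pyspark.sql.functions import broadcast

# status_lookup_sdf is assumed to be a small dimension table (well under 10MB)
status_lookup_subset_sdf = status_lookup_sdf.select("status_code", "status_description")
enriched_sdf = offence_sdf.join(broadcast(status_lookup_subset_sdf), on="status_code", how="left")
```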
|
||||
|
||||
### Step 3: Aggregation Verification
|
||||
|
||||
1. **Check groupBy Columns**:
|
||||
- Must include all columns not being aggregated
|
||||
- Verify columns exist in DataFrame
|
||||
|
||||
2. **Validate Aggregation Functions**:
|
||||
```python
|
||||
from pyspark.sql.functions import min, max, first, count, sum, coalesce, lit
|
||||
|
||||
aggregated = df.groupBy("key").agg(min("date_column").alias("earliest_date"), max("date_column").alias("latest_date"), first("dimension_column", ignorenulls=True).alias("dimension"), count("*").alias("record_count"), coalesce(sum("amount"), lit(0)).alias("total_amount"))
|
||||
```
|
||||
|
||||
3. **Test Aggregation Logic**:
|
||||
- Run aggregation on small sample
|
||||
- Compare counts before/after
|
||||
- Check for unexpected nulls
|
||||
|
||||
### Step 4: Business Rule Testing
|
||||
|
||||
1. **Verify Timestamp Logic**:
|
||||
```python
|
||||
from pyspark.sql.functions import when
|
||||
|
||||
df.select("reported_date_time", "date_created", when(df["reported_date_time"].isNotNull(), df["reported_date_time"]).otherwise(df["date_created"]).alias("final_timestamp")).show(10)
|
||||
```
|
||||
|
||||
2. **Test Null Handling**:
|
||||
```python
|
||||
from pyspark.sql.functions import coalesce, lit
|
||||
|
||||
df.select("primary_field", "fallback_field", coalesce(df["primary_field"], df["fallback_field"], lit(0)).alias("result")).show(10)
|
||||
```
|
||||
|
||||
3. **Validate Status/Lookup Logic**:
|
||||
- Check status code mappings against data dictionary
|
||||
- Verify conditional logic matches business requirements
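
An illustrative status mapping using `when`/`otherwise`; the codes shown are placeholders and must be confirmed against `.claude/data_dictionary/` before use:

```python
from pyspark.sql.functions import when, lit

# Placeholder codes - confirm the real mappings in the data dictionary
mapped_sdf = offence_sdf.withColumn("status_description", when(offence_sdf["status_code"] == lit("OP"), lit("Open")).when(offence_sdf["status_code"] == lit("CL"), lit("Closed")).otherwise(lit("Unknown")))
mapped_sdf.groupBy("status_code", "status_description").count().show(20, truncate=False)
```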
|
||||
|
||||
## Common Error Patterns and Fixes
|
||||
|
||||
### Pattern 1: Column Not Found After Join
|
||||
**Error**: `AnalysisException: Column 'offence_report_id' not found`
|
||||
|
||||
**Root Cause**: Incorrect column name - verify column exists using MCP schema
|
||||
|
||||
**Fix**:
|
||||
```python
|
||||
# BEFORE - wrong column name
|
||||
df = left_df.join(right_df, on="offence_report_id", how="left")
|
||||
|
||||
# AFTER - MCP-verified correct column name
|
||||
df = left_df.join(right_df, on="cms_offence_report_id", how="left")
|
||||
|
||||
# If joining on different column names between tables:
|
||||
df = left_df.join(
|
||||
right_df,
|
||||
left_df["cms_offence_report_id"] == right_df["offence_report_id"],
|
||||
how="left"
|
||||
)
|
||||
```
|
||||
|
||||
### Pattern 2: Duplicate Column Names
|
||||
**Error**: Multiple columns with same name causing selection issues
|
||||
|
||||
**Fix**:
|
||||
```python
|
||||
# BEFORE - causes duplicate 'id' column
|
||||
joined = left_df.join(right_df, left_df["id"] == right_df["id"], how="left")
|
||||
|
||||
# AFTER - drop the duplicate 'id' from the right side after the join
joined = left_df.join(right_df, left_df["id"] == right_df["id"], how="left").drop(right_df["id"])
|
||||
|
||||
# OR - rename columns to avoid duplicates
|
||||
right_df_renamed = right_df.withColumnRenamed("id", "related_id")
|
||||
joined = left_df.join(right_df_renamed, left_df["id"] == right_df_renamed["related_id"], how="left")
|
||||
```
|
||||
|
||||
### Pattern 3: Incorrect Aggregation
|
||||
**Error**: Column not in GROUP BY causing aggregation failure
|
||||
|
||||
**Fix**:
|
||||
```python
|
||||
from pyspark.sql.functions import min, first
|
||||
|
||||
# BEFORE - non-aggregated column not in groupBy
|
||||
df.groupBy("key1").agg(min("date_field"), "non_aggregated_field")
|
||||
|
||||
# AFTER - all non-grouped columns must be aggregated
|
||||
df = df.groupBy("key1").agg(min("date_field").alias("min_date"), first("non_aggregated_field", ignorenulls=True).alias("non_aggregated_field"))
|
||||
```
|
||||
|
||||
### Pattern 4: Join Key Mismatch
|
||||
**Error**: No matching records or unexpected cartesian product
|
||||
|
||||
**Fix**:
|
||||
```python
|
||||
left_df.select("join_key").show(20)
|
||||
right_df.select("join_key").show(20)
|
||||
left_df.select("join_key").dtypes
|
||||
right_df.select("join_key").dtypes
|
||||
left_df.filter(left_df["join_key"].isNull()).count()
|
||||
right_df.filter(right_df["join_key"].isNull()).count()
|
||||
result = left_df.join(right_df, left_df["join_key"].cast("int") == right_df["join_key"].cast("int"), how="left")
|
||||
```
|
||||
|
||||
### Pattern 5: Missing Null Handling
|
||||
**Error**: Unexpected nulls propagating through transformations
|
||||
|
||||
**Fix**:
|
||||
```python
|
||||
from pyspark.sql.functions import coalesce, lit
|
||||
|
||||
# BEFORE - NULL if either field is NULL
|
||||
df = df.withColumn("result", df["field1"] + df["field2"])
|
||||
|
||||
# AFTER - handle nulls with coalesce
|
||||
df = df.withColumn("result", coalesce(df["field1"], lit(0)) + coalesce(df["field2"], lit(0)))
|
||||
```
|
||||
|
||||
## Validation Requirements
|
||||
|
||||
After fixing errors, validate:
|
||||
|
||||
1. **Row Counts**: Log and verify expected vs actual counts at each transformation
|
||||
2. **Schema**: Ensure output schema matches target table requirements
|
||||
3. **Nulls**: Check critical columns for unexpected nulls
|
||||
4. **Duplicates**: Verify uniqueness of ID columns
|
||||
5. **Data Ranges**: Check timestamp ranges and numeric bounds
|
||||
6. **Join Results**: Sample joined records to verify correctness
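
The checks above can be wrapped into a small helper. This is a sketch only: the ID column, critical-column list, and minimum row count are inputs you would supply per table, and `logger` is assumed to be a `NotebookLogger` instance:

```python
from pyspark.sql import DataFrame

def validate_output(df: DataFrame, id_column: str, critical_columns: list[str], expected_min_rows: int) -> None:
    row_count = df.count()
    logger.info(f"Row count: {row_count} (expected at least {expected_min_rows})")
    duplicate_ids = df.groupBy(id_column).count().filter("count > 1").count()
    logger.info(f"Duplicate {id_column} values: {duplicate_ids}")
    for column_name in critical_columns:
        null_count = df.filter(df[column_name].isNull()).count()
        logger.info(f"Nulls in {column_name}: {null_count}")
```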
|
||||
|
||||
## Logging Requirements
|
||||
|
||||
Use `NotebookLogger` throughout:
|
||||
|
||||
```python
|
||||
logger = NotebookLogger()
|
||||
|
||||
# Start of operation
|
||||
logger.info(f"Starting extraction from {table_name}")
|
||||
|
||||
# After DataFrame creation
|
||||
logger.info(f"Extracted {df.count()} records from {table_name}")
|
||||
|
||||
# After join
|
||||
logger.info(f"Join completed: {joined_df.count()} records (expected ~X)")
|
||||
|
||||
# After transformation
|
||||
logger.info(f"Transformation complete: {final_df.count()} records")
|
||||
|
||||
# On error
|
||||
logger.error(f"Failed to process {table_name}: {error_message}")
|
||||
|
||||
# On success
|
||||
logger.success(f"Successfully loaded {target_table_name}")
|
||||
```
|
||||
|
||||
## Quality Gates (Must Run After Fixes)
|
||||
|
||||
```bash
|
||||
# 1. Syntax validation
|
||||
python3 -m py_compile python_files/gold/g_x_mg_cms_mo.py
|
||||
|
||||
# 2. Code quality check
|
||||
ruff check python_files/gold/g_x_mg_cms_mo.py
|
||||
|
||||
# 3. Format code
|
||||
ruff format python_files/gold/g_x_mg_cms_mo.py
|
||||
|
||||
# 4. Run fixed code
|
||||
make gold_table
|
||||
```
|
||||
|
||||
## Key Principles for PySpark Engineer Agent
|
||||
|
||||
1. **CRITICAL: Agent Workflow Required**: ALL error fixing must follow the 3-phase agent workflow (pyspark-engineer → code-reviewer → iterative refinement until 100% satisfied)
|
||||
2. **CRITICAL: Validate MCP Server First**: Before starting, verify MCP server connectivity and schema availability. STOP and warn user if unavailable.
|
||||
3. **Always Query MCP Schema First**: Use MCP server to get authoritative schema data before fixing any errors. Cross-reference with data dictionary.
|
||||
4. **Use MCP for Join Validation**: Query foreign key relationships from MCP to ensure correct join logic and column names.
|
||||
5. **DataFrame API Without Aliases or col()**: Use DataFrame API (NOT Spark SQL). NO DataFrame aliases. NO col() function. Use direct string references (e.g., `"column_name"`) or df["column"] syntax (e.g., `df["column_name"]`). Import only needed functions (e.g., `from pyspark.sql.functions import when, coalesce`)
|
||||
6. **Test Incrementally**: Fix one error at a time, validate, then proceed
|
||||
7. **Log Everything**: Add logging at every transformation step
|
||||
8. **Handle Nulls**: Always consider null cases in business logic (check MCP nullability constraints)
|
||||
9. **Verify Join Logic**: Check join keys, types, and uniqueness before implementing (use MCP data types)
|
||||
10. **Use Utilities**: Leverage `TableUtilities` methods (add_row_hash, save_as_table, clean_date_time_columns)
|
||||
11. **Follow Patterns**: Reference working gold layer files for established patterns
|
||||
12. **Validate Business Rules**: Confirm logic with MCP schema, data dictionary, and user story requirements
|
||||
13. **Clean Code**: Adhere to project standards (240 char line length, no blank lines in functions, type hints, single line per statement)
|
||||
14. **Triple-Check Schemas**: When schema mismatch occurs, verify MCP → Runtime → Data Dictionary alignment and report discrepancies
|
||||
15. **Code Review Loop**: Continue refactoring until code-reviewer explicitly confirms 100% satisfaction with zero remaining issues
|
||||
|
||||
## Example Workflow with MCP Integration
|
||||
|
||||
### Scenario: Fixing AnalysisException - Column 'offence_report_id' not found
|
||||
|
||||
**Step 1: MCP Server Validation**
|
||||
```
|
||||
Agent: Testing MCP server connectivity...
|
||||
MCP Query: "List databases available"
|
||||
MCP Response: ✓ [bronze_cms, silver_cms, silver_fvms, gold_data_model, ...]
|
||||
|
||||
Agent: ✓ MCP Server Connected
|
||||
Agent: ✓ Schema data available
|
||||
Agent: ✓ Proceeding with error fixing workflow
|
||||
```
|
||||
|
||||
**Step 2: Query Schema via MCP**
|
||||
```
|
||||
Agent: Querying schema for silver_cms.s_cms_offence_report via MCP...
|
||||
MCP Query: "DESCRIBE TABLE silver_cms.s_cms_offence_report"
|
||||
MCP Response:
|
||||
- cms_offence_report_id (BIGINT, PRIMARY KEY, NOT NULL)
|
||||
- case_file_id (BIGINT, FOREIGN KEY → s_cms_case_file.cms_case_file_id, NULLABLE)
|
||||
- reported_date_time (TIMESTAMP, NULLABLE)
|
||||
- date_created (TIMESTAMP, NOT NULL)
|
||||
- status_code (VARCHAR, NULLABLE)
|
||||
... (50+ more columns)
|
||||
|
||||
Agent: Found correct column name: 'cms_offence_report_id' (not 'offence_report_id')
|
||||
```
|
||||
|
||||
**Step 3: Verify Join Relationship via MCP**
|
||||
```
|
||||
Agent: Checking foreign key relationship for join...
|
||||
MCP Query: "Show foreign keys from s_cms_offence_report to s_cms_case_file"
|
||||
MCP Response:
|
||||
- s_cms_offence_report.case_file_id → s_cms_case_file.cms_case_file_id
|
||||
- Relationship: MANY-TO-ONE
|
||||
- Nullability: case_file_id is NULLABLE → Use LEFT JOIN
|
||||
|
||||
Agent: ✓ Join should be: LEFT JOIN on case_file_id = cms_case_file_id
|
||||
```
|
||||
|
||||
**Step 4: Apply Fix with MCP-Verified Schema**
|
||||
```python
|
||||
# BEFORE (error)
|
||||
offence_df = spark.read.table("silver_cms.s_cms_offence_report")
|
||||
case_df = spark.read.table("silver_cms.s_cms_case_file")
|
||||
joined = offence_df.join(case_df, on="offence_report_id", how="left")
|
||||
|
||||
# AFTER (MCP-verified) - Use DataFrame API with correct column names
|
||||
# MCP-confirmed FK: case_file_id → cms_case_file_id
|
||||
# MCP-confirmed nullable FK → LEFT JOIN
|
||||
# MCP-confirmed PK: cms_offence_report_id
|
||||
case_df_subset = case_df.select("cms_case_file_id", "case_file_number")
|
||||
joined = offence_df.join(case_df_subset, offence_df["case_file_id"] == case_df_subset["cms_case_file_id"], how="left")
|
||||
joined = joined.withColumnRenamed("cms_offence_report_id", "offence_report_id")
|
||||
```
|
||||
|
||||
**Step 5: Validate Fix**
|
||||
```
|
||||
Agent: Running fixed code...
|
||||
Agent: ✓ No AnalysisException
|
||||
Agent: ✓ Join executed successfully
|
||||
Agent: Row count: 45,823 (expected ~45,000)
|
||||
Agent: ✓ Schema matches MCP expectations
|
||||
```
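
A minimal sketch of how the Step 5 checks could be run in the notebook itself, assuming the `joined` DataFrame from Step 4 (swap `print` for the project's logging utility):

```python
# Post-fix validation sketch (assumes `joined` from Step 4)
row_count = joined.count()
print(f"Row count after join: {row_count}")
assert row_count > 0, "Join produced no rows - re-check join keys against the MCP schema"
assert "offence_report_id" in joined.columns, "Renamed key column missing from output schema"
```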
|
||||
|
||||
## Success Criteria
|
||||
|
||||
### Phase 1: Initial Error Fixing (pyspark-engineer)
|
||||
- [ ] **MCP Server validated and responding** (MANDATORY FIRST CHECK)
|
||||
- [ ] Schema verified via MCP server for all source tables
|
||||
- [ ] Foreign key relationships confirmed via MCP queries
|
||||
- [ ] All syntax errors resolved
|
||||
- [ ] All runtime errors fixed
|
||||
- [ ] Join logic validated and correct (using MCP-confirmed column names and types)
|
||||
- [ ] DataFrame API used (NOT Spark SQL) per python_rules.md line 19
|
||||
- [ ] NO DataFrame aliases or col() function used - direct string references or df["column"] syntax only (per python_rules.md line 20)
|
||||
- [ ] Code follows python_rules.md standards: 240 char lines, no blank lines in functions, single line per statement, imports at top only
|
||||
- [ ] Row counts logged and reasonable
|
||||
- [ ] Business rules implemented correctly
|
||||
- [ ] Output schema matches requirements (cross-referenced with MCP schema)
|
||||
- [ ] Code passes quality gates (py_compile, ruff check, ruff format)
|
||||
- [ ] `make gold_table` executes successfully
|
||||
- [ ] Target table created/updated in `gold_data_model` database
|
||||
- [ ] No schema drift reported between MCP, Runtime, and Data Dictionary sources
|
||||
|
||||
### Phase 2: Code Review (code-reviewer)
|
||||
- [ ] code-reviewer agent launched with fixed code
|
||||
- [ ] Comprehensive review completed covering:
|
||||
- [ ] PySpark best practices adherence
|
||||
- [ ] Join logic correctness
|
||||
- [ ] Schema alignment validation
|
||||
- [ ] Business rule implementation accuracy
|
||||
- [ ] Code quality and standards compliance
|
||||
- [ ] Security vulnerabilities (none found)
|
||||
- [ ] Performance optimization opportunities addressed
|
||||
|
||||
### Phase 3: Iterative Refinement (MANDATORY UNTIL 100% SATISFIED)
|
||||
- [ ] All code-reviewer feedback items addressed by pyspark-engineer
|
||||
- [ ] Re-review completed by code-reviewer
|
||||
- [ ] Iteration cycle repeated until code-reviewer explicitly confirms:
|
||||
- [ ] **"✓ 100% SATISFIED - No further changes required"**
|
||||
- [ ] Zero remaining issues, warnings, or concerns
|
||||
- [ ] All quality gates pass
|
||||
- [ ] All business rules validated
|
||||
- [ ] Code meets production-ready standards
|
||||
|
||||
### Final Approval
|
||||
- [ ] **code-reviewer has explicitly confirmed 100% satisfaction**
|
||||
- [ ] No outstanding issues or concerns remain
|
||||
- [ ] Task is complete and ready for production deployment
|
||||
116
commands/refactor-code.md
Executable file
@@ -0,0 +1,116 @@
|
||||
# Intelligently Refactor and Improve Code Quality
|
||||
|
||||
Intelligently refactor and improve code quality
|
||||
|
||||
## Instructions
|
||||
|
||||
Follow this systematic approach to refactor code: **$ARGUMENTS**
|
||||
|
||||
1. **Pre-Refactoring Analysis**
|
||||
- Identify the code that needs refactoring and the reasons why
|
||||
- Understand the current functionality and behavior completely
|
||||
- Review existing tests and documentation
|
||||
- Identify all dependencies and usage points
|
||||
|
||||
2. **Test Coverage Verification**
|
||||
- Ensure comprehensive test coverage exists for the code being refactored
|
||||
- If tests are missing, write them BEFORE starting refactoring
|
||||
- Run all tests to establish a baseline
|
||||
- Document current behavior with additional tests if needed
|
||||
|
||||
3. **Refactoring Strategy**
|
||||
- Define clear goals for the refactoring (performance, readability, maintainability)
|
||||
- Choose appropriate refactoring techniques:
|
||||
- Extract Method/Function
|
||||
- Extract Class/Component
|
||||
- Rename Variable/Method
|
||||
- Move Method/Field
|
||||
- Replace Conditional with Polymorphism
|
||||
- Eliminate Dead Code
|
||||
- Plan the refactoring in small, incremental steps
|
||||
|
||||
4. **Environment Setup**
|
||||
- Create a new branch: `git checkout -b refactor/$ARGUMENTS`
|
||||
- Ensure all tests pass before starting
|
||||
- Set up any additional tooling needed (profilers, analyzers)
|
||||
|
||||
5. **Incremental Refactoring** (see the bash sketch after this list)
|
||||
- Make small, focused changes one at a time
|
||||
- Run tests after each change to ensure nothing breaks
|
||||
- Commit working changes frequently with descriptive messages
|
||||
- Use IDE refactoring tools when available for safety
|
||||
|
||||
6. **Code Quality Improvements**
|
||||
- Improve naming conventions for clarity
|
||||
- Eliminate code duplication (DRY principle)
|
||||
- Simplify complex conditional logic
|
||||
- Reduce method/function length and complexity
|
||||
- Improve separation of concerns
|
||||
|
||||
7. **Performance Optimizations**
|
||||
- Identify and eliminate performance bottlenecks
|
||||
- Optimize algorithms and data structures
|
||||
- Reduce unnecessary computations
|
||||
- Improve memory usage patterns
|
||||
|
||||
8. **Design Pattern Application**
|
||||
- Apply appropriate design patterns where beneficial
|
||||
- Improve abstraction and encapsulation
|
||||
- Enhance modularity and reusability
|
||||
- Reduce coupling between components
|
||||
|
||||
9. **Error Handling Improvement**
|
||||
- Standardize error handling approaches
|
||||
- Improve error messages and logging
|
||||
- Add proper exception handling
|
||||
- Enhance resilience and fault tolerance
|
||||
|
||||
10. **Documentation Updates**
|
||||
- Update code comments to reflect changes
|
||||
- Revise API documentation if interfaces changed
|
||||
- Update inline documentation and examples
|
||||
- Ensure comments are accurate and helpful
|
||||
|
||||
11. **Testing Enhancements**
|
||||
- Add tests for any new code paths created
|
||||
- Improve existing test quality and coverage
|
||||
- Remove or update obsolete tests
|
||||
- Ensure tests are still meaningful and effective
|
||||
|
||||
12. **Static Analysis**
|
||||
- Run linting tools to catch style and potential issues
|
||||
- Use static analysis tools to identify problems
|
||||
- Check for security vulnerabilities
|
||||
- Verify code complexity metrics
|
||||
|
||||
13. **Performance Verification**
|
||||
- Run performance benchmarks if applicable
|
||||
- Compare before/after metrics
|
||||
- Ensure refactoring didn't degrade performance
|
||||
- Document any performance improvements
|
||||
|
||||
14. **Integration Testing**
|
||||
- Run full test suite to ensure no regressions
|
||||
- Test integration with dependent systems
|
||||
- Verify all functionality works as expected
|
||||
- Test edge cases and error scenarios
|
||||
|
||||
15. **Code Review Preparation**
|
||||
- Review all changes for quality and consistency
|
||||
- Ensure refactoring goals were achieved
|
||||
- Prepare clear explanation of changes made
|
||||
- Document benefits and rationale
|
||||
|
||||
16. **Documentation of Changes**
|
||||
- Create a summary of refactoring changes
|
||||
- Document any breaking changes or new patterns
|
||||
- Update project documentation if needed
|
||||
- Explain benefits and reasoning for future reference
|
||||
|
||||
17. **Deployment Considerations**
|
||||
- Plan deployment strategy for refactored code
|
||||
- Consider feature flags for gradual rollout
|
||||
- Prepare rollback procedures
|
||||
- Set up monitoring for the refactored components
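
As referenced in step 5, a minimal bash sketch of the incremental loop; the branch name is hypothetical and the test/lint commands should be swapped for whatever the project already uses:

```bash
git checkout -b refactor/extract-validation-helpers   # hypothetical branch name
pytest -q                                             # establish a green baseline first
# ...make one small, focused change...
pytest -q                                             # re-run the suite after every change
ruff check . && ruff format --check .                 # static analysis gate (steps 6 and 12)
git add -p && git commit -m "refactor: extract validation helpers (no behavior change)"
```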
|
||||
|
||||
Remember: Refactoring should preserve external behavior while improving internal structure. Always prioritize safety over speed, and maintain comprehensive test coverage throughout the process.
|
||||
37
commands/setup-docker-containers.md
Executable file
@@ -0,0 +1,37 @@
|
||||
---
|
||||
allowed-tools: Read, Write, Edit, Bash
|
||||
argument-hint: [environment-type] | --development | --production | --microservices | --compose
|
||||
description: Setup Docker containerization with multi-stage builds and development workflows
|
||||
model: sonnet
|
||||
---
|
||||
|
||||
# Setup Docker Containers
|
||||
|
||||
Set up comprehensive Docker containerization for development and production: **$ARGUMENTS**
|
||||
|
||||
## Current Project State
|
||||
|
||||
- Application type: @package.json or @requirements.txt (detect Node.js, Python, etc.)
|
||||
- Existing Docker: @Dockerfile or @docker-compose.yml (if exists)
|
||||
- Dependencies: !`find . -name "package-lock.json" -o -name "poetry.lock" -o -name "Pipfile.lock" | wc -l`
|
||||
- Services needed: Database, cache, message queue detection from configs
|
||||
|
||||
## Task
|
||||
|
||||
Implement production-ready Docker containerization with optimized builds and development workflows:
|
||||
|
||||
**Environment Type**: Use $ARGUMENTS to specify development, production, microservices, or Docker Compose setup
|
||||
|
||||
**Containerization Strategy**:
|
||||
1. **Dockerfile Creation** - Multi-stage builds, layer optimization, security best practices (see the CLI sketch after this list)
|
||||
2. **Development Workflow** - Hot reloading, volume mounts, debugging capabilities
|
||||
3. **Production Optimization** - Image size reduction, security scanning, health checks
|
||||
4. **Multi-Service Setup** - Docker Compose, service discovery, networking configuration
|
||||
5. **CI/CD Integration** - Build automation, registry management, deployment pipelines
|
||||
6. **Monitoring & Logs** - Container observability, log aggregation, resource monitoring
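
A minimal CLI sketch of the build and development flow above, assuming a multi-stage Dockerfile with `builder` and `runtime` stages and Docker Compose v2; image names and ports are hypothetical:

```bash
docker build --target runtime -t myapp:prod .           # production image from the final stage
docker build --target builder -t myapp:dev .            # development image with build tooling
docker run --rm -p 8000:8000 -v "$PWD":/app myapp:dev   # dev workflow: volume mount for hot reloading
docker compose up --build                               # multi-service setup from docker-compose.yml
```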
|
||||
|
||||
**Security Features**: Non-root users, minimal base images, vulnerability scanning, secrets management.
|
||||
|
||||
**Performance Optimization**: Layer caching, build contexts, multi-platform builds, and resource constraints.
|
||||
|
||||
**Output**: Complete Docker setup with optimized containers, development workflows, production deployment, and comprehensive documentation.
|
||||
153
commands/ultra-think.md
Executable file
@@ -0,0 +1,153 @@
|
||||
# Deep Analysis and Problem Solving Mode
|
||||
|
||||
Deep analysis and problem solving mode
|
||||
|
||||
## Instructions
|
||||
|
||||
1. **Initialize Ultra Think Mode**
|
||||
- Acknowledge the request for enhanced analytical thinking
|
||||
- Set context for deep, systematic reasoning
|
||||
- Prepare to explore the problem space comprehensively
|
||||
|
||||
2. **Parse the Problem or Question**
|
||||
- Extract the core challenge from: **$ARGUMENTS**
|
||||
- Identify all stakeholders and constraints
|
||||
- Recognize implicit requirements and hidden complexities
|
||||
- Question assumptions and surface unknowns
|
||||
|
||||
3. **Multi-Dimensional Analysis**
|
||||
Approach the problem from multiple angles:
|
||||
|
||||
### Technical Perspective
|
||||
- Analyze technical feasibility and constraints
|
||||
- Consider scalability, performance, and maintainability
|
||||
- Evaluate security implications
|
||||
- Assess technical debt and future-proofing
|
||||
|
||||
### Business Perspective
|
||||
- Understand business value and ROI
|
||||
- Consider time-to-market pressures
|
||||
- Evaluate competitive advantages
|
||||
- Assess risk vs. reward trade-offs
|
||||
|
||||
### User Perspective
|
||||
- Analyze user needs and pain points
|
||||
- Consider usability and accessibility
|
||||
- Evaluate user experience implications
|
||||
- Think about edge cases and user journeys
|
||||
|
||||
### System Perspective
|
||||
- Consider system-wide impacts
|
||||
- Analyze integration points
|
||||
- Evaluate dependencies and coupling
|
||||
- Think about emergent behaviors
|
||||
|
||||
4. **Generate Multiple Solutions**
|
||||
- Brainstorm at least 3-5 different approaches
|
||||
- For each approach, consider:
|
||||
- Pros and cons
|
||||
- Implementation complexity
|
||||
- Resource requirements
|
||||
- Potential risks
|
||||
- Long-term implications
|
||||
- Include both conventional and creative solutions
|
||||
- Consider hybrid approaches
|
||||
|
||||
5. **Deep Dive Analysis**
|
||||
For the most promising solutions:
|
||||
- Create detailed implementation plans
|
||||
- Identify potential pitfalls and mitigation strategies
|
||||
- Consider phased approaches and MVPs
|
||||
- Analyze second and third-order effects
|
||||
- Think through failure modes and recovery
|
||||
|
||||
6. **Cross-Domain Thinking**
|
||||
- Draw parallels from other industries or domains
|
||||
- Apply design patterns from different contexts
|
||||
- Consider biological or natural system analogies
|
||||
- Look for innovative combinations of existing solutions
|
||||
|
||||
7. **Challenge and Refine**
|
||||
- Play devil's advocate with each solution
|
||||
- Identify weaknesses and blind spots
|
||||
- Consider "what if" scenarios
|
||||
- Stress-test assumptions
|
||||
- Look for unintended consequences
|
||||
|
||||
8. **Synthesize Insights**
|
||||
- Combine insights from all perspectives
|
||||
- Identify key decision factors
|
||||
- Highlight critical trade-offs
|
||||
- Summarize innovative discoveries
|
||||
- Present a nuanced view of the problem space
|
||||
|
||||
9. **Provide Structured Recommendations**
|
||||
Present findings in a clear structure:
|
||||
```
|
||||
## Problem Analysis
|
||||
- Core challenge
|
||||
- Key constraints
|
||||
- Critical success factors
|
||||
|
||||
## Solution Options
|
||||
### Option 1: [Name]
|
||||
- Description
|
||||
- Pros/Cons
|
||||
- Implementation approach
|
||||
- Risk assessment
|
||||
|
||||
### Option 2: [Name]
|
||||
[Similar structure]
|
||||
|
||||
## Recommendation
|
||||
- Recommended approach
|
||||
- Rationale
|
||||
- Implementation roadmap
|
||||
- Success metrics
|
||||
- Risk mitigation plan
|
||||
|
||||
## Alternative Perspectives
|
||||
- Contrarian view
|
||||
- Future considerations
|
||||
- Areas for further research
|
||||
```
|
||||
|
||||
10. **Meta-Analysis**
|
||||
- Reflect on the thinking process itself
|
||||
- Identify areas of uncertainty
|
||||
- Acknowledge biases or limitations
|
||||
- Suggest additional expertise needed
|
||||
- Provide confidence levels for recommendations
|
||||
|
||||
## Usage Examples
|
||||
|
||||
```bash
|
||||
# Architectural decision
|
||||
/project:ultra-think Should we migrate to microservices or improve our monolith?
|
||||
|
||||
# Complex problem solving
|
||||
/project:ultra-think How do we scale our system to handle 10x traffic while reducing costs?
|
||||
|
||||
# Strategic planning
|
||||
/project:ultra-think What technology stack should we choose for our next-gen platform?
|
||||
|
||||
# Design challenge
|
||||
/project:ultra-think How can we improve our API to be more developer-friendly while maintaining backward compatibility?
|
||||
```
|
||||
|
||||
## Key Principles
|
||||
|
||||
- **First Principles Thinking**: Break down to fundamental truths
|
||||
- **Systems Thinking**: Consider interconnections and feedback loops
|
||||
- **Probabilistic Thinking**: Work with uncertainties and ranges
|
||||
- **Inversion**: Consider what to avoid, not just what to do
|
||||
- **Second-Order Thinking**: Consider consequences of consequences
|
||||
|
||||
## Output Expectations
|
||||
|
||||
- Comprehensive analysis (typically 2-4 pages of insights)
|
||||
- Multiple viable solutions with trade-offs
|
||||
- Clear reasoning chains
|
||||
- Acknowledgment of uncertainties
|
||||
- Actionable recommendations
|
||||
- Novel insights or perspectives
|
||||
672
commands/update-docs.md
Executable file
@@ -0,0 +1,672 @@
|
||||
---
|
||||
allowed-tools: Read, Write, Edit, Bash, Grep, Glob, Task, mcp__*
|
||||
argument-hint: [doc-type] | --generate-local | --sync-to-wiki | --regenerate | --all | --validate
|
||||
description: Generate documentation locally to ./docs/ then sync to Azure DevOps wiki (local-first workflow)
|
||||
model: sonnet
|
||||
---
|
||||
|
||||
# Data Pipeline Documentation - Local-First Workflow
|
||||
|
||||
Generate documentation locally in `./docs/` directory, then sync to Azure DevOps wiki: $ARGUMENTS
|
||||
|
||||
## Architecture: Local-First Documentation
|
||||
|
||||
```
|
||||
Source Code → Generate Docs → ./docs/ (version controlled) → Sync to Wiki
|
||||
```
|
||||
|
||||
**Benefits:**
|
||||
- ✅ Documentation version controlled in git
|
||||
- ✅ Review locally before wiki publish
|
||||
- ✅ No regeneration needed for wiki sync
|
||||
- ✅ Git diff shows doc changes
|
||||
- ✅ Reusable across multiple targets (wiki, GitHub Pages, PDF)
|
||||
- ✅ Offline access to documentation
|
||||
|
||||
## Repository Information
|
||||
|
||||
- Repository: unify_2_1_dm_synapse_env_d10
|
||||
- Local docs: `./docs/` (mirrors repo structure)
|
||||
- Wiki base: 'Unify 2.1 Data Migration Technical Documentation'/'Data Migration Pipeline'/unify_2_1_dm_synapse_env_d10/
|
||||
- Exclusions: @.docsignore (similar to .gitignore)
|
||||
|
||||
## Documentation Workflows
|
||||
|
||||
### --generate-local: Generate Documentation Locally
|
||||
|
||||
Generate comprehensive documentation and save to `./docs/` directory.
|
||||
|
||||
#### Step 1: Scan Repository for Files
|
||||
|
||||
```bash
|
||||
# Get all documentable files (exclude .docsignore patterns)
|
||||
git ls-files "*.py" "*.yaml" "*.yml" "*.md" | grep -v -f <(git ls-files --ignored --exclude-standard --exclude-from=.docsignore)
|
||||
```
|
||||
|
||||
**Target files:**
|
||||
- Python files: `python_files/**/*.py`
|
||||
- Configuration: `configuration.yaml`
|
||||
- Existing markdown: `README.md` (validate/enhance)
|
||||
|
||||
**Exclude (from .docsignore):**
|
||||
- `__pycache__/`, `*.pyc`, `.venv/`
|
||||
- `.claude/`, `docs/`, `*.duckdb`
|
||||
- See `.docsignore` for complete list
|
||||
|
||||
#### Step 2: Launch Code-Documenter Agent
|
||||
|
||||
Use Task tool to launch code-documenter agent:
|
||||
|
||||
```
|
||||
Generate comprehensive documentation for repository files:
|
||||
|
||||
**Scope:**
|
||||
- Target: All Python files in python_files/ (utilities, bronze, silver, gold, testing)
|
||||
- Configuration files: configuration.yaml
|
||||
- Exclude: Files matching .docsignore patterns
|
||||
|
||||
**Documentation Requirements:**
|
||||
For Python files:
|
||||
- File purpose and overview
|
||||
- Architecture and design patterns (medallion, ETL, etc.)
|
||||
- Class and function documentation
|
||||
- Data flow explanations
|
||||
- Business logic descriptions
|
||||
- Dependencies and imports
|
||||
- Usage examples
|
||||
- Testing information
|
||||
- Related Azure DevOps work items
|
||||
|
||||
For Configuration files:
|
||||
- Configuration structure
|
||||
- All configuration sections explained
|
||||
- Environment variables
|
||||
- Azure integration settings
|
||||
- Usage examples
|
||||
|
||||
**Output Format:**
|
||||
- Markdown format suitable for wiki
|
||||
- File naming: source_file.py → docs/path/source_file.py.md
|
||||
- Clear heading structure
|
||||
- Code examples with syntax highlighting
|
||||
- Cross-references to related files
|
||||
- Professional, concise language
|
||||
- NO attribution footers (e.g., "Documentation By: Claude Code")
|
||||
|
||||
**Output Location:**
|
||||
Save all generated documentation to ./docs/ directory maintaining source structure:
|
||||
- python_files/utilities/session_optimiser.py → docs/python_files/utilities/session_optimiser.py.md
|
||||
- python_files/gold/g_address.py → docs/python_files/gold/g_address.py.md
|
||||
- configuration.yaml → docs/configuration.yaml.md
|
||||
|
||||
**Directory Index Files:**
|
||||
Generate README.md for each directory with:
|
||||
- Directory purpose
|
||||
- List of files with brief descriptions
|
||||
- Architecture overview for layer directories
|
||||
- Navigation links
|
||||
```
|
||||
|
||||
#### Step 3: Generate Directory Index Files
|
||||
|
||||
Create `README.md` files for each directory:
|
||||
|
||||
**Root Index (docs/README.md):**
|
||||
- Overall documentation structure
|
||||
- Navigation to main sections
|
||||
- Medallion architecture overview
|
||||
- Link to wiki
|
||||
|
||||
**Layer Indexes:**
|
||||
- `docs/python_files/README.md` - Pipeline overview
|
||||
- `docs/python_files/utilities/README.md` - Core utilities index
|
||||
- `docs/python_files/bronze/README.md` - Bronze layer overview
|
||||
- `docs/python_files/silver/README.md` - Silver layer overview
|
||||
- `docs/python_files/silver/cms/README.md` - CMS tables index
|
||||
- `docs/python_files/silver/fvms/README.md` - FVMS tables index
|
||||
- `docs/python_files/silver/nicherms/README.md` - NicheRMS tables index
|
||||
- `docs/python_files/gold/README.md` - Gold layer overview
|
||||
- `docs/python_files/testing/README.md` - Testing documentation
|
||||
|
||||
#### Step 4: Validation
|
||||
|
||||
Verify generated documentation:
|
||||
- All source files have corresponding .md files in ./docs/
|
||||
- Directory structure matches source repository
|
||||
- Index files (README.md) created for directories
|
||||
- Markdown formatting is valid
|
||||
- No files from .docsignore included
|
||||
- Cross-references are valid
|
||||
|
||||
#### Step 5: Summary Report
|
||||
|
||||
Provide detailed report:
|
||||
```markdown
|
||||
## Documentation Generation Complete
|
||||
|
||||
### Files Documented:
|
||||
- Python files: [count]
|
||||
- Configuration files: [count]
|
||||
- Total documentation files: [count]
|
||||
|
||||
### Directory Structure:
|
||||
- Utilities: [file count]
|
||||
- Bronze layer: [file count]
|
||||
- Silver layer: [file count by database]
|
||||
- Gold layer: [file count]
|
||||
- Testing: [file count]
|
||||
|
||||
### Index Files Created:
|
||||
- Root index: docs/README.md
|
||||
- Layer indexes: [list]
|
||||
- Database indexes: [list]
|
||||
|
||||
### Location:
|
||||
All documentation saved to: ./docs/
|
||||
|
||||
### Next Steps:
|
||||
1. Review generated documentation: `ls -R ./docs/`
|
||||
2. Make any manual edits if needed
|
||||
3. Commit to git: `git add docs/`
|
||||
4. Sync to wiki: `/update-docs --sync-to-wiki`
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### --sync-to-wiki: Sync Local Docs to Azure DevOps Wiki
|
||||
|
||||
Copy documentation from `./docs/` to Azure DevOps wiki (no regeneration).
|
||||
|
||||
#### Step 1: Scan Local Documentation
|
||||
|
||||
```bash
|
||||
# Find all .md files in ./docs/
|
||||
find ./docs -name "*.md" -type f
|
||||
```
|
||||
|
||||
**Path Mapping Logic:**
|
||||
|
||||
Local path → Wiki path conversion:
|
||||
```
|
||||
./docs/python_files/utilities/session_optimiser.py.md
|
||||
↓
|
||||
Unify 2.1 Data Migration Technical Documentation/
|
||||
Data Migration Pipeline/
|
||||
unify_2_1_dm_synapse_env_d10/
|
||||
python_files/utilities/session_optimiser.py
|
||||
```
|
||||
|
||||
**Mapping rules:**
|
||||
1. Remove `./docs/` prefix
|
||||
2. Remove the `.md` extension (`README.md` becomes a page named `README`)
|
||||
3. Prepend wiki base path
|
||||
4. Use forward slashes for wiki paths
|
||||
|
||||
#### Step 2: Read and Process Each Documentation File
|
||||
|
||||
For each `.md` file in `./docs/`:
|
||||
1. Read markdown content
|
||||
2. Extract metadata (if present)
|
||||
3. Generate wiki path from local path
|
||||
4. Prepare content for wiki format
|
||||
5. Add footer with metadata:
|
||||
```markdown
|
||||
---
|
||||
**Metadata:**
|
||||
- Source: [file path in repo]
|
||||
- Last Updated: [date]
|
||||
- Related Work Items: [links if available]
|
||||
```
|
||||
|
||||
#### Step 3: Create/Update Wiki Pages Using ADO MCP
|
||||
|
||||
Use Azure DevOps MCP to create or update each wiki page:
|
||||
|
||||
```bash
|
||||
# For each documentation file:
|
||||
# 1. Check if wiki page exists
|
||||
# 2. Create new page if not exists
|
||||
# 3. Update existing page if exists
|
||||
# 4. Verify success
|
||||
|
||||
# Example for session_optimiser.py.md:
|
||||
Local: ./docs/python_files/utilities/session_optimiser.py.md
|
||||
Wiki: Unify 2.1 Data Migration Technical Documentation/Data Migration Pipeline/unify_2_1_dm_synapse_env_d10/python_files/utilities/session_optimiser.py
|
||||
Action: Create/Update wiki page with content
|
||||
```
|
||||
|
||||
**ADO MCP Operations:**
|
||||
```python
|
||||
# Pseudo-code for sync operation
|
||||
for doc_file in find_all_docs():
|
||||
wiki_path = local_to_wiki_path(doc_file)
|
||||
content = read_file(doc_file)
|
||||
|
||||
# Use MCP to create/update
|
||||
mcp__Azure_DevOps__create_or_update_wiki_page(
|
||||
path=wiki_path,
|
||||
content=content
|
||||
)
|
||||
```
|
||||
|
||||
#### Step 4: Verification
|
||||
|
||||
After sync, verify:
|
||||
- All .md files from ./docs/ have corresponding wiki pages
|
||||
- Wiki path structure matches local structure
|
||||
- Content is properly formatted in wiki
|
||||
- No sync errors
|
||||
- Wiki pages accessible in Azure DevOps
|
||||
|
||||
#### Step 5: Summary Report
|
||||
|
||||
Provide detailed sync report:
|
||||
```markdown
|
||||
## Wiki Sync Complete
|
||||
|
||||
### Pages Synced:
|
||||
- Total pages: [count]
|
||||
- Created new: [count]
|
||||
- Updated existing: [count]
|
||||
|
||||
### By Directory:
|
||||
- Utilities: [count] pages
|
||||
- Bronze: [count] pages
|
||||
- Silver: [count] pages
|
||||
- CMS: [count] pages
|
||||
- FVMS: [count] pages
|
||||
- NicheRMS: [count] pages
|
||||
- Gold: [count] pages
|
||||
- Testing: [count] pages
|
||||
|
||||
### Wiki Location:
|
||||
Base: Unify 2.1 Data Migration Technical Documentation/Data Migration Pipeline/unify_2_1_dm_synapse_env_d10/
|
||||
|
||||
### Verification:
|
||||
- All pages synced successfully: [✅/❌]
|
||||
- Path structure correct: [✅/❌]
|
||||
- Content formatting valid: [✅/❌]
|
||||
|
||||
### Errors:
|
||||
[List any sync failures and reasons]
|
||||
|
||||
### Next Steps:
|
||||
1. Verify pages in Azure DevOps wiki
|
||||
2. Check navigation and cross-references
|
||||
3. Share wiki URL with team
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### --regenerate: Regenerate Specific File(s)
|
||||
|
||||
Update documentation for specific file(s) without full regeneration.
|
||||
|
||||
**Usage:**
|
||||
```bash
|
||||
# Single file
|
||||
/update-docs --regenerate python_files/gold/g_address.py
|
||||
|
||||
# Multiple files
|
||||
/update-docs --regenerate python_files/gold/g_address.py python_files/gold/g_cms_address.py
|
||||
|
||||
# Entire directory
|
||||
/update-docs --regenerate python_files/utilities/
|
||||
```
|
||||
|
||||
**Process:**
|
||||
1. Launch code-documenter agent for specified file(s)
|
||||
2. Generate updated documentation
|
||||
3. Save to ./docs/ (overwrite existing)
|
||||
4. Report files updated
|
||||
5. Optionally sync to wiki
|
||||
|
||||
**Output:**
|
||||
```markdown
|
||||
## Documentation Regenerated
|
||||
|
||||
### Files Updated:
|
||||
- python_files/gold/g_address.py → docs/python_files/gold/g_address.py.md
|
||||
|
||||
### Next Steps:
|
||||
1. Review updated documentation
|
||||
2. Commit changes: `git add docs/python_files/gold/g_address.py.md`
|
||||
3. Sync to wiki: `/update-docs --sync-to-wiki --directory python_files/gold/`
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### --all: Complete Workflow
|
||||
|
||||
Execute complete documentation workflow: generate local + sync to wiki.
|
||||
|
||||
**Process:**
|
||||
1. Execute `--generate-local` workflow
|
||||
2. Validate generated documentation
|
||||
3. Execute `--sync-to-wiki` workflow
|
||||
4. Provide comprehensive summary
|
||||
|
||||
**Use when:**
|
||||
- Initial documentation setup
|
||||
- Major refactoring or restructuring
|
||||
- Adding new layers or modules
|
||||
- Quarterly documentation refresh
|
||||
|
||||
---
|
||||
|
||||
### --validate: Documentation Validation
|
||||
|
||||
Validate documentation completeness and accuracy.
|
||||
|
||||
**Validation Checks:**
|
||||
|
||||
1. **Completeness:** (see the sketch after this list)
|
||||
- All source files have documentation
|
||||
- All directories have index files (README.md)
|
||||
- No missing cross-references
|
||||
|
||||
2. **Accuracy:**
|
||||
- Documented functions exist in source
|
||||
- Schema documentation matches actual tables
|
||||
- Configuration docs match configuration.yaml
|
||||
|
||||
3. **Quality:**
|
||||
- Valid markdown syntax
|
||||
- Proper heading structure
|
||||
- Code blocks properly formatted
|
||||
- No broken links
|
||||
|
||||
4. **Sync Status:**
|
||||
- ./docs/ files match wiki pages
|
||||
- No uncommitted documentation changes
|
||||
- Wiki pages up to date
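
As referenced under Completeness, a minimal bash sketch of that check, assuming the local-first layout where each source file maps to `docs/<path>.md`:

```bash
# List tracked Python files that have no generated doc page yet
git ls-files 'python_files/*.py' | while IFS= read -r f; do
  [ -f "docs/${f}.md" ] || echo "Missing documentation: docs/${f}.md"
done
```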
|
||||
|
||||
**Validation Report:**
|
||||
```markdown
|
||||
## Documentation Validation Results
|
||||
|
||||
### Completeness: [✅/❌]
|
||||
- Files without docs: [count]
|
||||
- Missing index files: [count]
|
||||
- Missing cross-references: [count]
|
||||
|
||||
### Accuracy: [✅/❌]
|
||||
- Schema mismatches: [count]
|
||||
- Outdated function docs: [count]
|
||||
- Configuration drift: [count]
|
||||
|
||||
### Quality: [✅/❌]
|
||||
- Markdown syntax errors: [count]
|
||||
- Broken links: [count]
|
||||
- Formatting issues: [count]
|
||||
|
||||
### Sync Status: [✅/❌]
|
||||
- Out-of-sync files: [count]
|
||||
- Uncommitted changes: [count]
|
||||
- Wiki drift: [count]
|
||||
|
||||
### Actions Required:
|
||||
[List of fixes needed]
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Optional Workflow Modifiers
|
||||
|
||||
### --layer: Target Specific Layer
|
||||
|
||||
Generate/sync documentation for specific layer only.
|
||||
|
||||
```bash
|
||||
/update-docs --generate-local --layer utilities
|
||||
/update-docs --generate-local --layer gold
|
||||
/update-docs --sync-to-wiki --layer silver
|
||||
```
|
||||
|
||||
### --directory: Target Specific Directory
|
||||
|
||||
Generate/sync documentation for specific directory.
|
||||
|
||||
```bash
|
||||
/update-docs --generate-local --directory python_files/gold/
|
||||
/update-docs --sync-to-wiki --directory python_files/utilities/
|
||||
```
|
||||
|
||||
### --only-modified: Sync Only Changed Files
|
||||
|
||||
Sync only files modified since last sync (based on git status).
|
||||
|
||||
```bash
|
||||
/update-docs --sync-to-wiki --only-modified
|
||||
```
|
||||
|
||||
**Process:**
|
||||
1. Check git status for modified .md files in ./docs/ (see the sketch after this list)
|
||||
2. Sync only those files to wiki
|
||||
3. Faster than full sync
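
A minimal sketch of that detection step, based on `git status` as described above:

```bash
# Uncommitted .md changes under ./docs/ (sketch; prints the path column of porcelain output)
git status --porcelain -- 'docs/*.md' | awk '{print $NF}'
```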
|
||||
|
||||
---
|
||||
|
||||
## Code-Documenter Agent Integration
|
||||
|
||||
### When to Use Code-Documenter Agent:
|
||||
|
||||
**Always use Task tool with subagent_type="code-documenter" for:**
|
||||
1. **Initial documentation generation** (--generate-local)
|
||||
2. **File regeneration** (--regenerate)
|
||||
3. **Complex transformations** - ETL logic, medallion patterns
|
||||
4. **Architecture documentation** - High-level system design
|
||||
|
||||
### Agent Invocation Pattern:
|
||||
|
||||
```markdown
|
||||
Launch code-documenter agent with:
|
||||
- Target files: [list of files or directories]
|
||||
- Documentation scope: comprehensive documentation
|
||||
- Focus areas: [medallion architecture | ETL logic | utilities | testing]
|
||||
- Output format: Wiki-ready markdown
|
||||
- Output location: ./docs/ (maintain source structure)
|
||||
- Exclude patterns: Files from .docsignore
|
||||
- Quality requirements: Professional, accurate, no attribution footers
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Path Mapping Reference
|
||||
|
||||
### Local to Wiki Path Conversion
|
||||
|
||||
**Function logic:**
|
||||
```python
|
||||
def local_to_wiki_path(local_path: str) -> str:
|
||||
"""
|
||||
Convert local docs path to Azure DevOps wiki path
|
||||
|
||||
Args:
|
||||
local_path: Path like ./docs/python_files/utilities/session_optimiser.py.md
|
||||
|
||||
Returns:
|
||||
Wiki path like: Unify 2.1 Data Migration Technical Documentation/.../session_optimiser.py
|
||||
"""
|
||||
# Remove ./docs/ prefix
|
||||
relative = local_path.replace('./docs/', '')
|
||||
|
||||
    # Remove the .md extension (README.md becomes a page named README, per the examples below)
    if relative.endswith('.md'):
        relative = relative[:-3]
|
||||
|
||||
# Build wiki path
|
||||
wiki_base = "Unify 2.1 Data Migration Technical Documentation/Data Migration Pipeline/unify_2_1_dm_synapse_env_d10"
|
||||
wiki_path = f"{wiki_base}/{relative}"
|
||||
|
||||
return wiki_path
|
||||
```
|
||||
|
||||
**Examples:**
|
||||
```
|
||||
./docs/README.md
|
||||
→ Unify 2.1 Data Migration Technical Documentation/Data Migration Pipeline/unify_2_1_dm_synapse_env_d10/README
|
||||
|
||||
./docs/python_files/utilities/session_optimiser.py.md
|
||||
→ Unify 2.1 Data Migration Technical Documentation/Data Migration Pipeline/unify_2_1_dm_synapse_env_d10/python_files/utilities/session_optimiser.py
|
||||
|
||||
./docs/python_files/gold/g_address.py.md
|
||||
→ Unify 2.1 Data Migration Technical Documentation/Data Migration Pipeline/unify_2_1_dm_synapse_env_d10/python_files/gold/g_address.py
|
||||
|
||||
./docs/configuration.yaml.md
|
||||
→ Unify 2.1 Data Migration Technical Documentation/Data Migration Pipeline/unify_2_1_dm_synapse_env_d10/configuration.yaml
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Azure DevOps MCP Commands
|
||||
|
||||
### Wiki Operations:
|
||||
|
||||
```bash
|
||||
# Create wiki page
|
||||
mcp__Azure_DevOps__create_wiki_page(
|
||||
path="Unify 2.1 Data Migration Technical Documentation/Data Migration Pipeline/unify_2_1_dm_synapse_env_d10/python_files/utilities/session_optimiser.py",
|
||||
content="[markdown content]"
|
||||
)
|
||||
|
||||
# Update wiki page
|
||||
mcp__Azure_DevOps__update_wiki_page(
|
||||
path="[wiki page path]",
|
||||
content="[updated markdown content]"
|
||||
)
|
||||
|
||||
# List wiki pages in directory
|
||||
mcp__Azure_DevOps__list_wiki_pages(
|
||||
path="Unify 2.1 Data Migration Technical Documentation/Data Migration Pipeline/unify_2_1_dm_synapse_env_d10/python_files/gold"
|
||||
)
|
||||
|
||||
# Delete wiki page (cleanup)
|
||||
mcp__Azure_DevOps__delete_wiki_page(
|
||||
path="[wiki page path]"
|
||||
)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Guidelines
|
||||
|
||||
### DO:
|
||||
- ✅ Generate documentation locally first (./docs/)
|
||||
- ✅ Review and edit documentation before wiki sync
|
||||
- ✅ Commit documentation to git with code changes
|
||||
- ✅ Use code-documenter agent for comprehensive docs
|
||||
- ✅ Respect .docsignore patterns
|
||||
- ✅ Maintain directory structure matching source repo
|
||||
- ✅ Generate index files (README.md) for directories
|
||||
- ✅ Use --only-modified for incremental wiki updates
|
||||
- ✅ Validate documentation regularly
|
||||
- ✅ Link to Azure DevOps work items in docs
|
||||
|
||||
### DO NOT:
|
||||
- ❌ Generate documentation directly to wiki (bypass ./docs/)
|
||||
- ❌ Skip local review before wiki publish
|
||||
- ❌ Document files in .docsignore (__pycache__/, *.pyc, .env)
|
||||
- ❌ Include attribution footers ("Documentation By: Claude Code")
|
||||
- ❌ Duplicate documentation in multiple locations
|
||||
- ❌ Create wiki pages without proper path structure
|
||||
- ❌ Forget to update documentation when code changes
|
||||
- ❌ Sync to wiki without validating locally first
|
||||
|
||||
---
|
||||
|
||||
## Documentation Quality Standards
|
||||
|
||||
### For Python Files:
|
||||
- Clear file purpose and overview
|
||||
- Architecture and design pattern explanations
|
||||
- Class and function documentation with type hints
|
||||
- Data flow diagrams for ETL transformations
|
||||
- Business logic explanations
|
||||
- Usage examples with code snippets
|
||||
- Testing information and coverage
|
||||
- Dependencies and related files
|
||||
- Related Azure DevOps work items
|
||||
|
||||
### For Configuration Files:
|
||||
- Section-by-section explanation
|
||||
- Environment variable documentation
|
||||
- Azure integration details
|
||||
- Usage examples
|
||||
- Valid value ranges and constraints
|
||||
|
||||
### For Index Files (README.md):
|
||||
- Directory purpose and overview
|
||||
- File listing with brief descriptions
|
||||
- Architecture context (for layers)
|
||||
- Navigation links to sub-sections
|
||||
- Key concepts and patterns
|
||||
|
||||
### Markdown Quality:
|
||||
- Clear heading hierarchy (H1 → H2 → H3)
|
||||
- Code blocks with language specification
|
||||
- Tables for structured data
|
||||
- Cross-references using relative links
|
||||
- No broken links
|
||||
- Professional, concise language
|
||||
- Valid markdown syntax
|
||||
|
||||
---
|
||||
|
||||
## Git Integration
|
||||
|
||||
### Commit Documentation with Code:
|
||||
|
||||
```bash
|
||||
# Add both code and documentation
|
||||
git add python_files/gold/g_address.py docs/python_files/gold/g_address.py.md
|
||||
git commit -m "feat(gold): add g_address table with documentation"
|
||||
|
||||
# View documentation changes
|
||||
git diff docs/
|
||||
|
||||
# Documentation visible in PR reviews
|
||||
```
|
||||
|
||||
### Pre-commit Hook (Optional):
|
||||
|
||||
```bash
|
||||
# Validate documentation before commit
|
||||
# In .git/hooks/pre-commit:
|
||||
/update-docs --validate
|
||||
```
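
A minimal executable sketch of such a hook that needs no extra tooling (the slash command above is conceptual); it blocks the commit when a staged Python file has no matching doc page:

```bash
#!/usr/bin/env bash
# .git/hooks/pre-commit
set -euo pipefail
missing=0
while IFS= read -r f; do
  if [ ! -f "docs/${f}.md" ]; then
    echo "Missing documentation: docs/${f}.md (for ${f})"
    missing=1
  fi
done < <(git diff --cached --name-only --diff-filter=ACM -- 'python_files/*.py')
exit "$missing"
```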
|
||||
|
||||
---
|
||||
|
||||
## Output Summary Template
|
||||
|
||||
After any workflow completion, provide:
|
||||
|
||||
### 1. Workflow Executed:
|
||||
- Command: [command used]
|
||||
- Scope: [what was processed]
|
||||
- Duration: [time taken]
|
||||
|
||||
### 2. Documentation Generated/Updated:
|
||||
- Files processed: [count and list]
|
||||
- Location: ./docs/
|
||||
- Size: [total documentation size]
|
||||
|
||||
### 3. Wiki Sync Results (if applicable):
|
||||
- Pages created: [count]
|
||||
- Pages updated: [count]
|
||||
- Wiki path: [base path]
|
||||
- Status: [success/partial/failed]
|
||||
|
||||
### 4. Validation Results:
|
||||
- Completeness: [✅/❌]
|
||||
- Accuracy: [✅/❌]
|
||||
- Quality: [✅/❌]
|
||||
- Issues found: [count and details]
|
||||
|
||||
### 5. Next Steps:
|
||||
- Recommended actions
|
||||
- Areas needing attention
|
||||
- Suggested improvements
|
||||
326
commands/write-tests.md
Executable file
@@ -0,0 +1,326 @@
|
||||
---
|
||||
allowed-tools: Read, Write, Edit, Bash
|
||||
argument-hint: [target-file] | [test-type] | --unit | --integration | --data-validation | --medallion
|
||||
description: Write comprehensive pytest tests for PySpark data pipelines with live data validation
|
||||
model: sonnet
|
||||
---
|
||||
|
||||
# Write Tests - pytest + PySpark with Live Data
|
||||
|
||||
Write comprehensive pytest tests for PySpark data pipelines using **LIVE DATA** sources: **$ARGUMENTS**
|
||||
|
||||
## Current Testing Context
|
||||
|
||||
- Test framework: !`[ -f pytest.ini ] && echo "pytest configured" || echo "pytest setup needed"`
|
||||
- Target: $ARGUMENTS (file/layer to test)
|
||||
- Test location: !`ls -d tests/ test/ 2>/dev/null | head -1 || echo "tests/ (will create)"`
|
||||
- Live data available: Bronze/Silver/Gold layers with real FVMS, CMS, NicheRMS tables
|
||||
|
||||
## Core Principle: TEST WITH LIVE DATA
|
||||
|
||||
**ALWAYS use real data from Bronze/Silver/Gold layers**. No mocked data unless absolutely necessary.
|
||||
|
||||
## pytest Testing Framework
|
||||
|
||||
### 1. Test File Organization
|
||||
|
||||
```
|
||||
tests/
|
||||
├── conftest.py # Shared fixtures (Spark session, live data)
|
||||
├── test_bronze_ingestion.py # Bronze layer validation
|
||||
├── test_silver_transformations.py # Silver layer ETL
|
||||
├── test_gold_aggregations.py # Gold layer analytics
|
||||
├── test_utilities.py # TableUtilities, NotebookLogger
|
||||
└── integration/
|
||||
└── test_end_to_end_pipeline.py
|
||||
```
|
||||
|
||||
### 2. Essential pytest Fixtures (conftest.py)
|
||||
|
||||
```python
|
||||
import pytest
|
||||
from pyspark.sql import SparkSession
|
||||
from python_files.utilities.session_optimiser import SparkOptimiser
|
||||
|
||||
@pytest.fixture(scope="session")
|
||||
def spark():
|
||||
"""Shared Spark session for all tests - reuses SparkOptimiser"""
|
||||
session = SparkOptimiser.get_optimised_spark_session()
|
||||
yield session
|
||||
session.stop()
|
||||
|
||||
@pytest.fixture(scope="session")
|
||||
def bronze_data(spark):
|
||||
"""Live bronze layer data - REAL DATA"""
|
||||
return spark.table("bronze_fvms.b_vehicle_master")
|
||||
|
||||
@pytest.fixture(scope="session")
|
||||
def silver_data(spark):
|
||||
"""Live silver layer data - REAL DATA"""
|
||||
return spark.table("silver_fvms.s_vehicle_master")
|
||||
|
||||
@pytest.fixture
|
||||
def sample_live_data(bronze_data):
|
||||
"""Small sample from live data for fast tests"""
|
||||
return bronze_data.limit(100)
|
||||
```
|
||||
|
||||
### 3. pytest Test Patterns
|
||||
|
||||
#### Pattern 1: Unit Tests (Individual Functions)
|
||||
|
||||
```python
|
||||
# tests/test_utilities.py
|
||||
import pytest
|
||||
from python_files.utilities.session_optimiser import TableUtilities
|
||||
|
||||
class TestTableUtilities:
|
||||
def test_add_row_hash_creates_hash_column(self, spark, sample_live_data):
|
||||
"""Verify add_row_hash() creates hash_key column"""
|
||||
result = TableUtilities.add_row_hash(sample_live_data, ["vehicle_id"])
|
||||
assert "hash_key" in result.columns
|
||||
assert result.count() == sample_live_data.count()
|
||||
|
||||
def test_drop_duplicates_simple_removes_exact_duplicates(self, spark):
|
||||
"""Test deduplication on live data"""
|
||||
# Use LIVE data with known duplicates
|
||||
raw_data = spark.table("bronze_fvms.b_vehicle_events")
|
||||
result = TableUtilities.drop_duplicates_simple(raw_data)
|
||||
assert result.count() <= raw_data.count()
|
||||
|
||||
@pytest.mark.parametrize("date_col", ["created_date", "updated_date", "event_date"])
|
||||
def test_clean_date_time_columns_handles_all_formats(self, spark, bronze_data, date_col):
|
||||
"""Parameterized test for date cleaning"""
|
||||
if date_col in bronze_data.columns:
|
||||
result = TableUtilities.clean_date_time_columns(bronze_data, [date_col])
|
||||
assert date_col in result.columns
|
||||
```
|
||||
|
||||
#### Pattern 2: Integration Tests (End-to-End)
|
||||
|
||||
```python
|
||||
# tests/integration/test_end_to_end_pipeline.py
|
||||
import pytest
|
||||
from python_files.silver.fvms.s_vehicle_master import VehicleMaster
|
||||
|
||||
class TestSilverVehicleMasterPipeline:
|
||||
def test_full_etl_with_live_bronze_data(self, spark):
|
||||
"""Test complete Bronze → Silver transformation with LIVE data"""
|
||||
# Extract: Read LIVE bronze data
|
||||
bronze_table = "bronze_fvms.b_vehicle_master"
|
||||
bronze_df = spark.table(bronze_table)
|
||||
initial_count = bronze_df.count()
|
||||
|
||||
# Transform & Load: Run actual ETL class
|
||||
etl = VehicleMaster(bronze_table_name=bronze_table)
|
||||
|
||||
# Validate: Check LIVE silver output
|
||||
silver_df = spark.table("silver_fvms.s_vehicle_master")
|
||||
assert silver_df.count() > 0
|
||||
assert "hash_key" in silver_df.columns
|
||||
assert "load_timestamp" in silver_df.columns
|
||||
|
||||
# Data quality: No nulls in critical fields
|
||||
assert silver_df.filter("vehicle_id IS NULL").count() == 0
|
||||
```
|
||||
|
||||
#### Pattern 3: Data Validation (Live Data Checks)
|
||||
|
||||
```python
|
||||
# tests/test_data_validation.py
|
||||
import pytest
|
||||
|
||||
class TestBronzeLayerDataQuality:
|
||||
"""Validate live data quality in Bronze layer"""
|
||||
|
||||
    def test_bronze_vehicle_master_has_recent_data(self, spark):
        """Verify bronze layer contains recent records (load_timestamp within the last 30 days)"""
        from datetime import datetime, timedelta
        from pyspark.sql.functions import max as spark_max

        df = spark.table("bronze_fvms.b_vehicle_master")
        max_date = df.select(spark_max("load_timestamp")).collect()[0][0]

        # max_date is a Python datetime after collect(), so compare against datetime.now()
        assert datetime.now() - max_date <= timedelta(days=30)
|
||||
|
||||
def test_bronze_to_silver_row_counts_match_expectations(self, spark):
|
||||
"""Validate row count transformation logic"""
|
||||
bronze = spark.table("bronze_fvms.b_vehicle_master")
|
||||
silver = spark.table("silver_fvms.s_vehicle_master")
|
||||
|
||||
# After deduplication, silver <= bronze
|
||||
assert silver.count() <= bronze.count()
|
||||
|
||||
@pytest.mark.slow
|
||||
def test_hash_key_uniqueness_on_live_data(self, spark):
|
||||
"""Verify hash_key uniqueness in Silver layer (full scan)"""
|
||||
df = spark.table("silver_fvms.s_vehicle_master")
|
||||
total = df.count()
|
||||
unique = df.select("hash_key").distinct().count()
|
||||
|
||||
assert total == unique, f"Duplicate hash_keys found: {total - unique}"
|
||||
```
|
||||
|
||||
#### Pattern 4: Schema Validation
|
||||
|
||||
```python
|
||||
# tests/test_schema_validation.py
|
||||
import pytest
|
||||
from pyspark.sql.types import StringType, IntegerType, TimestampType
|
||||
|
||||
class TestSchemaConformance:
|
||||
def test_silver_vehicle_schema_matches_expected(self, spark):
|
||||
"""Validate Silver layer schema against business requirements"""
|
||||
df = spark.table("silver_fvms.s_vehicle_master")
|
||||
schema_dict = {field.name: field.dataType for field in df.schema.fields}
|
||||
|
||||
# Critical fields must exist
|
||||
assert "vehicle_id" in schema_dict
|
||||
assert "hash_key" in schema_dict
|
||||
assert "load_timestamp" in schema_dict
|
||||
|
||||
# Type validation
|
||||
assert isinstance(schema_dict["vehicle_id"], StringType)
|
||||
assert isinstance(schema_dict["load_timestamp"], TimestampType)
|
||||
```
|
||||
|
||||
### 4. pytest Markers & Configuration
|
||||
|
||||
**pytest.ini**:
|
||||
```ini
|
||||
[pytest]
|
||||
testpaths = tests
|
||||
python_files = test_*.py
|
||||
python_classes = Test*
|
||||
python_functions = test_*
|
||||
markers =
|
||||
slow: marks tests as slow (deselect with '-m "not slow"')
|
||||
integration: marks tests as integration tests
|
||||
unit: marks tests as unit tests
|
||||
live_data: tests that require live data access
|
||||
addopts =
|
||||
-v
|
||||
--tb=short
|
||||
--strict-markers
|
||||
--disable-warnings
|
||||
```
|
||||
|
||||
**Run specific test types**:
|
||||
```bash
|
||||
pytest tests/test_utilities.py -v # Single file
|
||||
pytest -m unit # Only unit tests
|
||||
pytest -m "not slow" # Skip slow tests
|
||||
pytest -k "vehicle" # Tests matching "vehicle"
|
||||
pytest --maxfail=1 # Stop on first failure
|
||||
pytest -n auto # Parallel execution (pytest-xdist)
|
||||
```
|
||||
|
||||
### 5. Advanced pytest Features
|
||||
|
||||
#### Parametrized Tests
|
||||
```python
|
||||
@pytest.mark.parametrize("table_name,expected_min_count", [
|
||||
("bronze_fvms.b_vehicle_master", 1000),
|
||||
("bronze_cms.b_customer_master", 500),
|
||||
("bronze_nicherms.b_booking_master", 2000),
|
||||
])
|
||||
def test_bronze_tables_have_minimum_rows(spark, table_name, expected_min_count):
|
||||
"""Validate minimum row counts across multiple live tables"""
|
||||
df = spark.table(table_name)
|
||||
assert df.count() >= expected_min_count
|
||||
```
|
||||
|
||||
#### Fixtures with Live Data Sampling
|
||||
```python
|
||||
@pytest.fixture
def stratified_sample(bronze_data):
    """Stratified sample from live data for statistical tests"""
    return bronze_data.sampleBy("vehicle_type", fractions={"Car": 0.1, "Truck": 0.1}, seed=42)
|
||||
```
|
||||
|
||||
### 6. Testing Best Practices
|
||||
|
||||
**DO**:
|
||||
- ✅ Use `spark.table()` to read LIVE Bronze/Silver/Gold data
|
||||
- ✅ Test with `.limit(100)` for speed, full dataset for validation
|
||||
- ✅ Use `@pytest.fixture(scope="session")` for Spark session (reuse)
|
||||
- ✅ Test actual ETL classes (e.g., `VehicleMaster()`)
|
||||
- ✅ Validate data quality (nulls, duplicates, date ranges)
|
||||
- ✅ Use `pytest.mark.parametrize` for testing multiple tables
|
||||
- ✅ Clean up test outputs in teardown fixtures (see the fixture sketch after these lists)
|
||||
|
||||
**DON'T**:
|
||||
- ❌ Create mock/fake data (use real data samples)
|
||||
- ❌ Skip testing because "data is too large" (use `.limit()`)
|
||||
- ❌ Write tests that modify production tables
|
||||
- ❌ Ignore schema validation
|
||||
- ❌ Forget to test error handling with real edge cases
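
As referenced in the DO list, a minimal sketch of a cleanup (teardown) fixture; the scratch table name is hypothetical:

```python
import pytest


@pytest.fixture
def scratch_table(spark):
    """Yield a scratch table name, then drop it in teardown so tests leave no outputs behind"""
    table_name = "gold_data_model.tmp_pytest_scratch"  # hypothetical scratch table
    yield table_name
    spark.sql(f"DROP TABLE IF EXISTS {table_name}")
```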
|
||||
|
||||
### 7. Example: Complete Test File
|
||||
|
||||
```python
|
||||
# tests/test_silver_vehicle_master.py
|
||||
import pytest
|
||||
from pyspark.sql.functions import col, count, when
|
||||
from python_files.silver.fvms.s_vehicle_master import VehicleMaster
|
||||
|
||||
class TestSilverVehicleMaster:
|
||||
"""Test Silver layer VehicleMaster ETL with LIVE data"""
|
||||
|
||||
@pytest.fixture(scope="class")
|
||||
def silver_df(self, spark):
|
||||
"""Live Silver data - computed once per test class"""
|
||||
return spark.table("silver_fvms.s_vehicle_master")
|
||||
|
||||
def test_all_required_columns_exist(self, silver_df):
|
||||
"""Validate schema completeness"""
|
||||
required = ["vehicle_id", "hash_key", "load_timestamp", "registration_number"]
|
||||
        missing = [c for c in required if c not in silver_df.columns]
|
||||
assert not missing, f"Missing columns: {missing}"
|
||||
|
||||
def test_no_nulls_in_primary_key(self, silver_df):
|
||||
"""Primary key cannot be null"""
|
||||
null_count = silver_df.filter(col("vehicle_id").isNull()).count()
|
||||
assert null_count == 0
|
||||
|
||||
def test_hash_key_generated_for_all_rows(self, silver_df):
|
||||
"""Every row must have hash_key"""
|
||||
total = silver_df.count()
|
||||
with_hash = silver_df.filter(col("hash_key").isNotNull()).count()
|
||||
assert total == with_hash
|
||||
|
||||
@pytest.mark.slow
|
||||
def test_deduplication_effectiveness(self, spark):
|
||||
"""Compare Bronze vs Silver row counts"""
|
||||
bronze = spark.table("bronze_fvms.b_vehicle_master")
|
||||
silver = spark.table("silver_fvms.s_vehicle_master")
|
||||
|
||||
bronze_count = bronze.count()
|
||||
silver_count = silver.count()
|
||||
dedup_rate = (bronze_count - silver_count) / bronze_count * 100
|
||||
|
||||
print(f"Deduplication removed {dedup_rate:.2f}% of rows")
|
||||
assert silver_count <= bronze_count
|
||||
```
|
||||
|
||||
## Execution Workflow
|
||||
|
||||
1. **Read target file** ($ARGUMENTS) - Understand transformation logic
|
||||
2. **Identify live data sources** - Find Bronze/Silver tables used
|
||||
3. **Create test file** - `tests/test_<target>.py`
|
||||
4. **Write fixtures** - Setup Spark session, load live data samples
|
||||
5. **Write unit tests** - Test individual utility functions
|
||||
6. **Write integration tests** - Test full ETL with live data
|
||||
7. **Write validation tests** - Check data quality on live tables
|
||||
8. **Run tests**: `pytest tests/test_<target>.py -v`
|
||||
9. **Verify coverage**: Ensure >80% coverage of transformation logic (see the command below)
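
Coverage can be checked with pytest-cov (assumed to be installed), for example:

```bash
pytest tests/ -v --cov=python_files --cov-report=term-missing --cov-fail-under=80
```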
|
||||
|
||||
## Output Deliverables
|
||||
|
||||
- ✅ pytest test file with 10+ test cases
|
||||
- ✅ conftest.py with reusable fixtures
|
||||
- ✅ pytest.ini configuration
|
||||
- ✅ Tests use LIVE data from Bronze/Silver/Gold
|
||||
- ✅ All tests pass: `pytest -v`
|
||||
- ✅ Documentation comments showing live data usage
|
||||