Initial commit

Author: Zhongwei Li
Date:   2025-11-30 08:37:55 +08:00
Commit: 506a828b22

59 changed files with 18515 additions and 0 deletions

commands/background.md (new executable file, 237 lines)

---
description: Fires off an agent in the background to complete tasks autonomously
argument-hint: [user-prompt] | [task-file-name]
allowed-tools: Read, Task, TodoWrite
---
# Background PySpark Data Engineer Agent
Launch a PySpark data engineer agent to work autonomously in the background on ETL tasks, data pipeline fixes, or code reviews.
## Usage
**Option 1: Direct prompt**
```
/background "Fix the validation issues in g_xa_mg_statsclasscount.py"
```
**Option 2: Task file from .claude/tasks/**
```
/background code_review_fixes_task_list.md
```
## Variables
- `TASK_INPUT`: Either a direct prompt string or a task file name from `.claude/tasks/`
- `TASK_FILE_PATH`: Full path to task file if using a task file
- `PROMPT_CONTENT`: The actual prompt to send to the agent
## Instructions
### 1. Determine Task Source
Check if `$ARGUMENTS` looks like a file name (ends with `.md` or contains no spaces):
- If YES: It's a task file name from `.claude/tasks/`
- If NO: It's a direct user prompt
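A minimal bash sketch of this heuristic, assuming the raw arguments are available in a shell variable (the variable name and the listing fallback are illustrative, not part of the command contract):
```bash
#!/usr/bin/env bash
ARGS="$*"   # raw command arguments (illustrative)
if [[ -z "$ARGS" || "$ARGS" == "list" ]]; then
  echo "No prompt given: list task files in .claude/tasks/"
elif [[ "$ARGS" == *.md || "$ARGS" != *" "* ]]; then
  echo "Treat as task file: .claude/tasks/$ARGS"
else
  echo "Treat as direct prompt: $ARGS"
fi
```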
### 2. Load Task Content
**If using task file:**
1. List all available task files in `.claude/tasks/` directory
2. Find the task file matching the provided name (exact match or partial match)
3. Read the task file content
4. Use the full task file content as the prompt
**If using direct prompt:**
1. Use the `$ARGUMENTS` directly as the prompt
### 3. Launch PySpark Data Engineer Agent
Launch the specialized `pyspark-data-engineer` agent using the Task tool:
**Important Configuration:**
- **subagent_type**: `pyspark-data-engineer`
- **model**: `sonnet` (default) or `opus` for complex tasks
- **description**: Short 3-5 word description based on task type
- **prompt**: Complete, detailed instructions including:
- The task content (from file or direct prompt)
- Explicit instruction to follow `.claude/CLAUDE.md` best practices
- Instruction to run quality gates (syntax check, linting, formatting)
- Instruction to create a comprehensive final report
**Prompt Template:**
```
You are a PySpark data engineer working on the Unify 2.1 Data Migration project using Azure Synapse Analytics.
CRITICAL INSTRUCTIONS:
- Read and follow ALL guidelines in .claude/CLAUDE.md
- Use .claude/rules/python_rules.md for coding standards
- Maximum line length: 240 characters
- No blank lines inside functions
- Use @synapse_error_print_handler decorator on all methods
- Use NotebookLogger for all logging (not print statements)
- Use TableUtilities methods for DataFrame operations
TASK TO COMPLETE:
{TASK_CONTENT}
QUALITY GATES (MUST RUN BEFORE COMPLETION):
1. Syntax validation: python3 -m py_compile <file_path>
2. Linting: ruff check python_files/
3. Formatting: ruff format python_files/
FINAL REPORT REQUIREMENTS:
Provide a comprehensive report including:
1. Summary of changes made
2. Files modified with line numbers
3. Quality gate results (syntax, linting, formatting)
4. Testing recommendations
5. Any issues encountered and resolutions
6. Next steps or follow-up tasks
Work autonomously and complete all tasks in the list. Use your available tools to read files, make edits, run tests, and validate your work.
```
### 4. Inform User
After launching the agent, inform the user:
- Agent has been launched in the background
- Task being worked on (summary)
- Estimated completion time (if known from task file)
- The agent will work autonomously and provide a final report
## Task File Structure
Expected task file format in `.claude/tasks/`:
```markdown
# Task Title
**Date Created**: YYYY-MM-DD
**Priority**: HIGH/MEDIUM/LOW
**Estimated Total Time**: X minutes
**Files Affected**: N
## Task 1: Description
**File**: path/to/file.py
**Line**: 123
**Estimated Time**: X minutes
**Severity**: CRITICAL/HIGH/MEDIUM/LOW
**Current Code**:
```python
# code
```
**Required Fix**:
```python
# fixed code
```
**Reason**: Explanation
**Testing**: How to verify
---
(Repeat for each task)
```
## Examples
### Example 1: Using Task File
```
User: /background code_review_fixes_task_list.md
Agent Response:
1. Lists available task files
2. Finds and reads code_review_fixes_task_list.md
3. Launches pyspark-data-engineer agent with task content
4. Informs user: "PySpark data engineer agent launched to complete 9 code review fixes (est. 27 minutes)"
```
### Example 2: Using Direct Prompt
```
User: /background "Add data validation methods to the statsclasscount gold table and ensure they are called in the transform method"
Agent Response:
1. Uses the prompt directly
2. Launches pyspark-data-engineer agent with the prompt
3. Informs user: "PySpark data engineer agent launched to add data validation methods"
```
### Example 3: Partial Task File Name Match
```
User: /background code_review
Agent Response:
1. Lists task files and finds "code_review_fixes_task_list.md"
2. Confirms match with user or proceeds if unambiguous
3. Launches agent with task content
```
## Available Task Files
List the available task files from the `.claude/tasks/` directory when the user runs the command without arguments or with the `list` argument:
```
/background
/background list
```
Output:
```
Available task files in .claude/tasks/:
1. code_review_fixes_task_list.md (9 tasks, 27 min, HIGH priority)
Usage:
/background <task-file-name> - Run agent with task file
/background "your prompt" - Run agent with direct prompt
/background list - Show available task files
```
## Agent Workflow
The pyspark-data-engineer agent will:
1. **Read Context**: Load .claude/CLAUDE.md, .claude/rules/python_rules.md
2. **Analyze Tasks**: Break down task list into actionable items
3. **Execute Changes**: Read files, make edits, apply fixes
4. **Validate Work**: Run syntax checks, linting, formatting
5. **Test Changes**: Execute relevant tests if available
6. **Generate Report**: Comprehensive summary of all work completed
## Best Practices
### For Task Files
- Keep tasks atomic and well-defined
- Include file paths and line numbers
- Provide current code and required fix
- Specify testing requirements
- Estimate time for each task
- Prioritize tasks (CRITICAL, HIGH, MEDIUM, LOW)
### For Direct Prompts
- Be specific about files and functionality
- Reference table/database names
- Specify layer (bronze, silver, gold)
- Include any business requirements
- Mention quality requirements
## Success Criteria
Agent task completion requires:
- ✅ All code changes implemented
- ✅ Syntax validation passes (python3 -m py_compile)
- ✅ Linting passes (ruff check)
- ✅ Code formatted (ruff format)
- ✅ No new issues introduced
- ✅ Comprehensive final report provided
## Notes
- The agent has access to all project files and tools
- It follows medallion architecture patterns (bronze/silver/gold)
- It uses established utilities (SparkOptimiser, TableUtilities, NotebookLogger)
- It respects project coding standards (240 char lines, no blanks in functions)
- It works autonomously without requiring additional user input
- Results are reported back when complete

commands/branch-cleanup.md (new executable file, 181 lines)

---
allowed-tools: Bash(git branch:*), Bash(git checkout:*), Bash(git push:*), Bash(git merge:*), Bash(gh:*), Read, Grep
argument-hint: [--dry-run] | [--force] | [--remote-only] | [--local-only]
description: Use PROACTIVELY to clean up merged branches and stale remotes, and to organize branch structure
---
# Git Branch Cleanup & Organization
Clean up merged branches and organize repository structure: $ARGUMENTS
## Current Repository State
- All branches: !`git branch -a`
- Recent branches: !`git for-each-ref --count=10 --sort=-committerdate refs/heads/ --format='%(refname:short) - %(committerdate:relative)'`
- Remote branches: !`git branch -r`
- Merged branches: !`git branch --merged main 2>/dev/null || git branch --merged master 2>/dev/null || echo "No main/master branch found"`
- Current branch: !`git branch --show-current`
## Task
Perform comprehensive branch cleanup and organization based on the repository state and provided arguments.
## Cleanup Operations
### 1. Identify Branches for Cleanup
- **Merged branches**: Find local branches already merged into main/master
- **Stale remote-tracking branches**: Identify remote-tracking branches whose upstream branches no longer exist on the remote
- **Old branches**: Detect branches with no recent activity (>30 days)
- **Feature branches**: Organize `feature/*`, `hotfix/*`, and `release/*` branches
### 2. Safety Checks Before Deletion
- Verify branches are actually merged using `git merge-base`
- Check if branches have unpushed commits
- Confirm branches aren't the current working branch
- Validate against protected branch patterns
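A hedged sketch of these checks for a single candidate branch, assuming `main` is the integration branch and `$BRANCH` names the branch under review:
```bash
#!/usr/bin/env bash
BRANCH="feature/example"   # candidate branch (illustrative)
# Merged check: the branch tip must be an ancestor of main.
if git merge-base --is-ancestor "$BRANCH" main; then
  echo "$BRANCH is fully merged into main"
fi
# Unpushed-commit check: anything on the branch but not on its upstream?
if [ -n "$(git log --oneline "${BRANCH}@{upstream}..${BRANCH}" 2>/dev/null)" ]; then
  echo "$BRANCH has unpushed commits; do not delete"
fi
# Current-branch check: never delete the branch that is checked out.
if [ "$(git branch --show-current)" = "$BRANCH" ]; then
  echo "Refusing to touch the current branch: $BRANCH"
fi
```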
### 3. Branch Categories to Handle
- **Safe to delete**: Merged feature branches, old hotfix branches
- **Needs review**: Unmerged branches with old commits
- **Keep**: Main branches (main, master, develop), active feature branches
- **Archive**: Long-running branches that might need preservation
### 4. Remote Branch Synchronization
- Remove remote-tracking branches for branches deleted on the remote
- Prune remote references with `git remote prune origin`
- Update branch tracking relationships
- Clean up remote branch references
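One way these steps can look in practice (the `origin` remote name is an assumption):
```bash
# Remove remote-tracking refs for branches that were deleted on origin.
git fetch --prune origin
# Equivalent standalone prune of stale remote-tracking references.
git remote prune origin
# List local branches whose upstream is gone (review before deleting).
git for-each-ref --format='%(refname:short) %(upstream:track)' refs/heads \
  | awk '$2 == "[gone]" {print $1}'
```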
## Command Modes
### Default Mode (Interactive)
1. Show branch analysis with recommendations
2. Ask for confirmation before each deletion
3. Provide summary of actions taken
4. Offer to push deletions to remote
### Dry Run Mode (`--dry-run`)
1. Show what would be deleted without making changes
2. Display branch analysis and recommendations
3. Provide cleanup statistics
4. Exit without modifying repository
### Force Mode (`--force`)
1. Delete merged branches without confirmation
2. Clean up stale remotes automatically
3. Provide summary of all actions taken
4. Use with caution - no undo capability
### Remote Only (`--remote-only`)
1. Only clean up remote-tracking branches
2. Synchronize with actual remote state
3. Remove stale remote references
4. Keep all local branches intact
### Local Only (`--local-only`)
1. Only clean up local branches
2. Don't affect remote-tracking branches
3. Keep remote synchronization intact
4. Focus on local workspace organization
## Safety Features
### Pre-cleanup Validation
- Ensure working directory is clean
- Check for uncommitted changes
- Verify current branch is safe (not target for deletion)
- Create backup references if requested
### Protected Branches
Never delete branches matching these patterns:
- `main`, `master`, `develop`, `staging`, `production`
- `release/*` (unless explicitly confirmed)
- Current working branch
- Branches with unpushed commits (unless forced)
### Recovery Information
- Display git reflog references for deleted branches
- Provide commands to recover accidentally deleted branches
- Show SHA hashes for branch tips before deletion
- Create recovery script if multiple branches deleted
## Branch Organization Features
### Naming Convention Enforcement
- Suggest renaming branches to follow team conventions
- Organize branches by type (feature/, bugfix/, hotfix/)
- Identify branches that don't follow naming patterns
- Provide batch renaming suggestions
### Branch Tracking Setup
- Set up proper upstream tracking for feature branches
- Configure push/pull behavior for new branches
- Identify branches missing upstream configuration
- Fix broken tracking relationships
## Output and Reporting
### Cleanup Summary
```
Branch Cleanup Summary:
✅ Deleted 3 merged feature branches
✅ Removed 5 stale remote references
✅ Cleaned up 2 old hotfix branches
⚠️ Found 1 unmerged branch requiring attention
📊 Repository now has 8 active branches (was 18)
```
### Recovery Instructions
```
Branch Recovery Commands:
git checkout -b feature/user-auth 1a2b3c4d # Recover feature/user-auth
git push origin feature/user-auth # Restore to remote
```
## Best Practices
### Regular Maintenance Schedule
- Run cleanup weekly for active repositories
- Use `--dry-run` first to review changes
- Coordinate with team before major cleanups
- Document any non-standard branches to preserve
### Team Coordination
- Communicate branch deletion plans with team
- Check if anyone has work-in-progress on old branches
- Use GitHub/GitLab branch protection rules
- Maintain shared documentation of branch policies
### Branch Lifecycle Management
- Delete feature branches immediately after merge
- Keep release branches until next major release
- Archive long-term experimental branches
- Use tags to mark important branch states before deletion
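A short sketch of the tag-then-delete step mentioned above, using an `archive/` tag prefix as an illustrative convention:
```bash
BRANCH="feature/legacy-export"                   # long-running branch to archive (illustrative)
git tag "archive/${BRANCH#feature/}" "$BRANCH"   # preserve the branch tip as a tag
git push origin "archive/${BRANCH#feature/}"     # keep the archive tag on the remote
git branch -D "$BRANCH"                          # delete the local branch
git push origin --delete "$BRANCH"               # delete the remote branch
```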
## Example Usage
```bash
# Safe interactive cleanup
/branch-cleanup
# See what would be cleaned without changes
/branch-cleanup --dry-run
# Clean only remote tracking branches
/branch-cleanup --remote-only
# Force cleanup of merged branches
/branch-cleanup --force
# Clean only local branches
/branch-cleanup --local-only
```
## Integration with GitHub/GitLab
If GitHub CLI or GitLab CLI is available:
- Check PR status before deleting branches
- Verify branches are actually merged in web interface
- Clean up both local and remote branches consistently
- Update branch protection rules if needed

commands/code-review.md (new executable file, 70 lines)

---
allowed-tools: Read, Bash, Grep, Glob
argument-hint: [file-path] | [commit-hash] | --full
description: Comprehensive code quality review with security, performance, and architecture analysis
---
# Code Quality Review
Perform comprehensive code quality review: $ARGUMENTS
## Current State
- Git status: !`git status --porcelain`
- Recent changes: !`git diff --stat HEAD~5`
- Repository info: !`git log --oneline -5`
- Build status: !`npm run build --dry-run 2>/dev/null || echo "No build script"`
## Task
Follow these steps to conduct a thorough code review:
1. **Repository Analysis**
- Examine the repository structure and identify the primary language/framework
- Check for configuration files (package.json, requirements.txt, Cargo.toml, etc.)
- Review README and documentation for context
2. **Code Quality Assessment**
- Scan for code smells, anti-patterns, and potential bugs
- Check for consistent coding style and naming conventions
- Identify unused imports, variables, or dead code
- Review error handling and logging practices
3. **Security Review**
- Look for common security vulnerabilities (SQL injection, XSS, etc.)
- Check for hardcoded secrets, API keys, or passwords (a quick grep sketch is included at the end of this command file)
- Review authentication and authorization logic
- Examine input validation and sanitization
4. **Performance Analysis**
- Identify potential performance bottlenecks
- Check for inefficient algorithms or database queries
- Review memory usage patterns and potential leaks
- Analyze bundle size and optimization opportunities
5. **Architecture & Design**
- Evaluate code organization and separation of concerns
- Check for proper abstraction and modularity
- Review dependency management and coupling
- Assess scalability and maintainability
6. **Testing Coverage**
- Check existing test coverage and quality
- Identify areas lacking proper testing
- Review test structure and organization
- Suggest additional test scenarios
7. **Documentation Review**
- Evaluate code comments and inline documentation
- Check API documentation completeness
- Review README and setup instructions
- Identify areas needing better documentation
8. **Recommendations**
- Prioritize issues by severity (critical, high, medium, low)
- Provide specific, actionable recommendations
- Suggest tools and practices for improvement
- Create a summary report with next steps
Remember to be constructive and provide specific examples with file paths and line numbers where applicable.
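As flagged in the security step above, a quick, non-exhaustive grep pass for hardcoded secrets might look like this (the patterns and file globs are illustrative only):
```bash
# Flag likely hardcoded credentials; review matches manually and expect false positives.
grep -rniE "(api[_-]?key|secret|password|token)[[:space:]]*[:=]" \
  --include="*.py" --include="*.js" --include="*.ts" \
  --exclude-dir=node_modules --exclude-dir=.git .
```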

commands/create-feature.md (new executable file, 130 lines)

---
allowed-tools: Read, Write, Edit, Bash
argument-hint: [feature-name] | [feature-type] [name]
description: Scaffold new feature with boilerplate code, tests, and documentation
---
# Create Feature
Scaffold new feature: $ARGUMENTS
## Current Project Context
- Project structure: !`find . -maxdepth 2 -type d \( -name src -o -name components -o -name features \) | head -5`
- Current branch: !`git branch --show-current`
- Package info: @package.json or @Cargo.toml or @requirements.txt (if exists)
- Architecture docs: @docs/architecture.md or @README.md (if exists)
## Task
Follow this systematic approach to create a new feature: $ARGUMENTS
1. **Feature Planning**
- Define the feature requirements and acceptance criteria
- Break down the feature into smaller, manageable tasks
- Identify affected components and potential impact areas
- Plan the API/interface design before implementation
2. **Research and Analysis**
- Study existing codebase patterns and conventions
- Identify similar features for consistency
- Research external dependencies or libraries needed
- Review any relevant documentation or specifications
3. **Architecture Design**
- Design the feature architecture and data flow
- Plan database schema changes if needed
- Define API endpoints and contracts
- Consider scalability and performance implications
4. **Environment Setup**
- Create a new feature branch: `git checkout -b feature/$ARGUMENTS`
- Ensure development environment is up to date
- Install any new dependencies required
- Set up feature flags if applicable
5. **Implementation Strategy**
- Start with core functionality and build incrementally
- Follow the project's coding standards and patterns
- Implement proper error handling and validation
- Use dependency injection and maintain loose coupling
6. **Database Changes (if applicable)**
- Create migration scripts for schema changes
- Ensure backward compatibility
- Plan for rollback scenarios
- Test migrations on sample data
7. **API Development**
- Implement API endpoints with proper HTTP status codes
- Add request/response validation
- Implement proper authentication and authorization
- Document API contracts and examples
8. **Frontend Implementation (if applicable)**
- Create reusable components following project patterns
- Implement responsive design and accessibility
- Add proper state management
- Handle loading and error states
9. **Testing Implementation**
- Write unit tests for core business logic
- Create integration tests for API endpoints
- Add end-to-end tests for user workflows
- Test error scenarios and edge cases
10. **Security Considerations**
- Implement proper input validation and sanitization
- Add authorization checks for sensitive operations
- Review for common security vulnerabilities
- Ensure data protection and privacy compliance
11. **Performance Optimization**
- Optimize database queries and indexes
- Implement caching where appropriate
- Monitor memory usage and optimize algorithms
- Consider lazy loading and pagination
12. **Documentation**
- Add inline code documentation and comments
- Update API documentation
- Create user documentation if needed
- Update project README if applicable
13. **Code Review Preparation**
- Run all tests and ensure they pass
- Run linting and formatting tools
- Check for code coverage and quality metrics
- Perform self-review of the changes
14. **Integration Testing**
- Test feature integration with existing functionality
- Verify feature flags work correctly
- Test deployment and rollback procedures
- Validate monitoring and logging
15. **Commit and Push**
- Create atomic commits with descriptive messages
- Follow conventional commit format if project uses it
- Push feature branch: `git push origin feature/$ARGUMENTS`
16. **Pull Request Creation**
- Create PR with comprehensive description
- Include screenshots or demos if applicable
- Add appropriate labels and reviewers
- Link to any related issues or specifications
17. **Quality Assurance**
- Coordinate with QA team for testing
- Address any bugs or issues found
- Verify accessibility and usability requirements
- Test on different environments and browsers
18. **Deployment Planning**
- Plan feature rollout strategy
- Set up monitoring and alerting
- Prepare rollback procedures
- Schedule deployment and communication
Remember to maintain code quality, follow project conventions, and prioritize user experience throughout the development process.

commands/create-pr.md (new executable file, 19 lines)

# Create Pull Request Command
Create a new branch, commit changes, and submit a pull request.
## Behavior
- Creates a new branch based on current changes
- Formats modified files using Biome
- Analyzes changes and automatically splits into logical commits when appropriate
- Each commit focuses on a single logical change or feature
- Creates descriptive commit messages for each logical unit
- Pushes branch to remote
- Creates pull request with proper summary and test plan
## Guidelines for Automatic Commit Splitting
- Split commits by feature, component, or concern
- Keep related file changes together in the same commit
- Separate refactoring from feature additions
- Ensure each commit can be understood independently
- Multiple unrelated changes should be split into separate commits
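A minimal sketch of what the resulting split can look like once the changes have been grouped (the file paths and messages are illustrative):
```bash
# Commit 1: refactoring only, kept separate from new behaviour.
git add src/utils/formatDate.ts
git commit -m "refactor(utils): extract date formatting helper"
# Commit 2: the feature that builds on the refactor, with its tests.
git add src/reports/summary.ts src/reports/summary.test.ts
git commit -m "feat(reports): add weekly summary export"
# Push the branch and open the pull request.
git push -u origin "$(git branch --show-current)"
```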

commands/create-prd.md (new executable file, 36 lines)

---
allowed-tools: Read, Write, Edit, Grep, Glob
argument-hint: [feature-name] | --template | --interactive
description: Create Product Requirements Document (PRD) for new features
model: sonnet
---
# Create Product Requirements Document
You are an experienced Product Manager. Create a Product Requirements Document (PRD) for a feature we are adding to the product: **$ARGUMENTS**
**IMPORTANT:**
- Focus on the feature and user needs, not technical implementation
- Do not include any time estimates
## Product Context
1. **Product Documentation**: @product-development/resources/product.md (to understand the product)
2. **Feature Documentation**: @product-development/current-feature/feature.md (to understand the feature idea)
3. **JTBD Documentation**: @product-development/current-feature/JTBD.md (to understand the Jobs to be Done)
## Task
Create a comprehensive PRD document that captures the what, why, and how of the product:
1. Use the PRD template from `@product-development/resources/PRD-template.md`
2. Based on the feature documentation, create a PRD that defines:
- Problem statement and user needs
- Feature specifications and scope
- Success metrics and acceptance criteria
- User experience requirements
- Technical considerations (high-level only)
3. Output the completed PRD to `product-development/current-feature/PRD.md`
Focus on creating a comprehensive PRD that clearly defines the feature requirements while maintaining alignment with user needs and business objectives.

commands/create-pull-request.md (new executable file, 126 lines)

# How to Create a Pull Request Using GitHub CLI
This guide explains how to create pull requests using GitHub CLI in our project.
## Prerequisites
1. Install GitHub CLI if you haven't already:
```bash
# macOS
brew install gh
# Windows
winget install --id GitHub.cli
# Linux
# Follow instructions at https://github.com/cli/cli/blob/trunk/docs/install_linux.md
```
2. Authenticate with GitHub:
```bash
gh auth login
```
## Creating a New Pull Request
1. First, prepare your PR description following the template in `.github/pull_request_template.md`
2. Use the `gh pr create` command to create a new pull request:
```bash
# Basic command structure
gh pr create --title "✨(scope): Your descriptive title" --body "Your PR description" --base main --draft
```
For more complex PR descriptions with proper formatting, use the `--body-file` option with the exact PR template structure:
```bash
# Create PR with proper template structure
gh pr create --title "✨(scope): Your descriptive title" --body-file <(echo -e "## Issue\n\n- resolve:\n\n## Why is this change needed?\nYour description here.\n\n## What would you like reviewers to focus on?\n- Point 1\n- Point 2\n\n## Testing Verification\nHow you tested these changes.\n\n## What was done\npr_agent:summary\n\n## Detailed Changes\npr_agent:walkthrough\n\n## Additional Notes\nAny additional notes.") --base main --draft
```
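If the process-substitution form above gets hard to read, one equivalent alternative is to write the body to a temporary file with a heredoc first and pass that to `--body-file`:
```bash
cat > /tmp/pr-body.md <<'EOF'
## Issue

- resolve:

## Why is this change needed?
Your description here.

## What would you like reviewers to focus on?
- Point 1
- Point 2

## Testing Verification
How you tested these changes.

## What was done
pr_agent:summary

## Detailed Changes
pr_agent:walkthrough

## Additional Notes
Any additional notes.
EOF
gh pr create --title "✨(scope): Your descriptive title" --body-file /tmp/pr-body.md --base main --draft
```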
## Best Practices
1. **PR Title Format**: Use conventional commit format with emojis
- Always include an appropriate emoji at the beginning of the title
- Use the actual emoji character (not the code representation like `:sparkles:`)
- Examples:
- `✨(supabase): Add staging remote configuration`
- `🐛(auth): Fix login redirect issue`
- `📝(readme): Update installation instructions`
2. **Description Template**: Always use our PR template structure from `.github/pull_request_template.md`:
- Issue reference
- Why the change is needed
- Review focus points
- Testing verification
- PR-Agent sections (keep `pr_agent:summary` and `pr_agent:walkthrough` tags intact)
- Additional notes
3. **Template Accuracy**: Ensure your PR description precisely follows the template structure:
- Don't modify or rename the PR-Agent sections (`pr_agent:summary` and `pr_agent:walkthrough`)
- Keep all section headers exactly as they appear in the template
- Don't add custom sections that aren't in the template
4. **Draft PRs**: Start as draft when the work is in progress
- Use `--draft` flag in the command
- Convert to ready for review when complete using `gh pr ready`
### Common Mistakes to Avoid
1. **Incorrect Section Headers**: Always use the exact section headers from the template
2. **Modifying PR-Agent Sections**: Don't remove or modify the `pr_agent:summary` and `pr_agent:walkthrough` placeholders
3. **Adding Custom Sections**: Stick to the sections defined in the template
4. **Using Outdated Templates**: Always refer to the current `.github/pull_request_template.md` file
### Missing Sections
Always include all template sections, even if some are marked as "N/A" or "None".
## Additional GitHub CLI PR Commands
Here are some additional useful GitHub CLI commands for managing PRs:
```bash
# List your open pull requests
gh pr list --author "@me"
# Check PR status
gh pr status
# View a specific PR
gh pr view <PR-NUMBER>
# Check out a PR branch locally
gh pr checkout <PR-NUMBER>
# Convert a draft PR to ready for review
gh pr ready <PR-NUMBER>
# Add reviewers to a PR
gh pr edit <PR-NUMBER> --add-reviewer username1,username2
# Merge a PR
gh pr merge <PR-NUMBER> --squash
```
## Using Templates for PR Creation
To simplify PR creation with consistent descriptions, you can create a template file:
1. Create a file named `pr-template.md` with your PR template
2. Use it when creating PRs:
```bash
gh pr create --title "feat(scope): Your title" --body-file pr-template.md --base main --draft
```
## Related Documentation
- [PR Template](.github/pull_request_template.md)
- [Conventional Commits](https://www.conventionalcommits.org/)
- [GitHub CLI documentation](https://cli.github.com/manual/)

commands/describe.md (new executable file, 197 lines)

---
allowed-tools: Read, mcp__mcp-server-motherduck__query, Grep, Glob, Bash
argument-hint: [file-path] (optional - defaults to currently open file)
description: Add comprehensive descriptive comments to code files, focusing on data flow, joining logic, and business context
---
# Add Descriptive Comments to Code
Add detailed, descriptive comments to the selected file: $ARGUMENTS
## Current Context
- Currently open file: !`echo $CLAUDE_OPEN_FILE`
- File layer detection: !`basename $(dirname $CLAUDE_OPEN_FILE) 2>/dev/null || echo "unknown"`
- Git status: !`git status --porcelain $CLAUDE_OPEN_FILE 2>/dev/null || echo "Not in git"`
## Task
You will add comprehensive descriptive comments to the **currently open file** (or the file specified in $ARGUMENTS if provided).
### Instructions
1. **Determine Target File**
- If $ARGUMENTS contains a file path, use that file
- Otherwise, use the currently open file from the IDE
- Verify the file exists and is readable
2. **Analyze File Context**
- Identify the file type (silver/gold layer transformation, utility, pipeline operation)
- Read and understand the complete file structure
- Identify the ETL pattern (extract, transform, load methods)
- Map out all DataFrame operations and transformations
3. **Analyze Data Sources and Schemas**
- Use DuckDB MCP to query relevant source tables if available:
```sql
-- Example: Check schema of source table
DESCRIBE table_name;
SELECT * FROM table_name LIMIT 5;
```
- Reference `.claude/memory/data_dictionary/` for column definitions and business context
- Identify all source tables being read (bronze/silver layer)
- Document the schema of input and output DataFrames
4. **Document Joining Logic (Priority Focus)**
- For each join operation, add comments explaining:
- **WHY** the join is happening (business reason)
- **WHAT** tables are being joined
- **JOIN TYPE** (left, inner, outer) and why that type was chosen
- **JOIN KEYS** and their meaning
- **EXPECTED CARDINALITY** (1:1, 1:many, many:many)
- **NULL HANDLING** strategy for unmatched records
Example format:
```python
# JOIN: Link incidents to persons involved
# Type: LEFT JOIN (preserve all incidents even if person data missing)
# Keys: incident_id (unique identifier from FVMS system)
# Expected: 1:many (one incident can have multiple persons)
# Nulls: Person details will be NULL for incidents with no associated persons
joined_df = incident_df.join(person_df, on="incident_id", how="left")
```
5. **Document Transformations Step-by-Step**
- Add inline comments explaining each transformation
- Describe column derivations and calculations
- Explain business rules being applied
- Document any data quality fixes or cleansing
- Note any deduplication logic
6. **Document Data Quality Patterns**
- Explain null handling strategies
- Document default values and their business meaning
- Describe validation rules
- Note any data type conversions
7. **Add Function/Method Documentation**
- Add docstring-style comments at the start of each method explaining:
- Purpose of the method
- Input: Source tables and their schemas
- Output: Resulting table and schema
- Business logic summary
Example format:
```python
def transform(self) -> DataFrame:
"""
Transform incident data with person and location enrichment.
Input: bronze_fvms.b_fvms_incident (raw incident records)
Output: silver_fvms.s_fvms_incident (validated, enriched incidents)
Transformations:
1. Join with person table to add demographic details
2. Join with address table to add location coordinates
3. Apply business rules for incident classification
4. Deduplicate based on incident_id and date_created
5. Add row hash for change detection
Business Context:
- Incidents represent family violence events recorded in FVMS
- Each incident may involve multiple persons (victims, offenders)
- Location data enables geographic analysis and reporting
"""
```
8. **Add Header Comments**
- Add a comprehensive header at the top of the file explaining:
- File purpose and business context
- Source systems and tables
- Target table and database
- Key transformations and business rules
- Dependencies on other tables or processes
9. **Variable Naming Context**
- When variable names are abbreviated or unclear, add comments explaining:
- What the variable represents
- The business meaning of the data
- Expected data types and formats
- Reference data dictionary entries if available
10. **Use Data Dictionary References**
- Check `.claude/memory/data_dictionary/` for column definitions
- Reference these definitions in comments to explain field meanings
- Link business terminology to technical column names
- Example: `# offence_code: Maps to ANZSOC classification system (see data_dict/cms_offence_codes.md)`
11. **Query DuckDB for Context (When Available)**
- Use MCP DuckDB tool to inspect actual data patterns:
- Check distinct values: `SELECT DISTINCT column_name FROM table LIMIT 20;`
- Verify join relationships: `SELECT COUNT(*) FROM table1 JOIN table2 ...`
- Understand data distributions: `SELECT column, COUNT(*) FROM table GROUP BY column;`
- Use insights from queries to write more accurate comments
12. **Preserve Code Formatting Standards**
- Do NOT add blank lines inside functions (project standard)
- Maximum line length: 240 characters
- Maintain existing indentation
- Keep comments concise but informative
- Use inline comments for single-line explanations
- Use block comments for multi-step processes
13. **Focus Areas by File Type**
**Silver Layer Files (`python_files/silver/`):**
- Document source bronze tables
- Explain validation rules
- Describe enumeration mappings
- Note data cleansing operations
**Gold Layer Files (`python_files/gold/`):**
- Document all source silver tables
- Explain aggregation logic
- Describe business metrics calculations
- Note analytical transformations
**Utility Files (`python_files/utilities/`):**
- Explain helper function purposes
- Document parameter meanings
- Describe return values
- Note edge cases handled
14. **Comment Quality Guidelines**
- Comments should explain **WHY**, not just **WHAT**
- Avoid obvious comments (e.g., don't say "create dataframe" for `df = spark.createDataFrame()`)
- Focus on business context and data relationships
- Use proper grammar and complete sentences
- Be concise but thorough
- Think like a new developer reading the code for the first time
15. **Final Validation**
- Run syntax check: `python3 -m py_compile <file>`
- Run linting: `ruff check <file>`
- Format code: `ruff format <file>`
- Ensure all comments are accurate and helpful
## Example Output Structure
After adding comments, the file should have:
- ✅ Comprehensive header explaining file purpose
- ✅ Method-level documentation for extract/transform/load
- ✅ Detailed join operation comments (business reason, type, keys, cardinality)
- ✅ Step-by-step transformation explanations
- ✅ Data quality and validation logic documented
- ✅ Variable context for unclear names
- ✅ References to data dictionary where applicable
- ✅ Business context linking technical operations to real-world meaning
## Important Notes
- **ALWAYS** use Australian English spelling conventions throughout the comments and documentation
- **DO NOT** remove or modify existing functionality
- **DO NOT** change code structure or logic
- **ONLY** add descriptive comments
- **PRESERVE** all existing comments
- **MAINTAIN** project coding standards (no blank lines in functions, 240 char max)
- **USE** the data dictionary and DuckDB queries to provide accurate context
- **THINK** about the user who will read this code - walk them through the logic clearly

commands/dev-agent.md (new executable file, 88 lines)

# PySpark Azure Synapse Expert Agent
## Overview
Expert data engineer specializing in PySpark development within Azure Synapse Analytics environment. Focuses on scalable data processing, optimization, and enterprise-grade solutions.
## Core Competencies
### PySpark Expertise
- Advanced DataFrame/Dataset operations
- Performance optimization and tuning
- Custom UDFs and aggregations
- Spark SQL query optimization
- Memory management and partitioning strategies
### Azure Synapse Mastery
- Synapse Spark pools configuration
- Integration with Azure Data Lake Storage
- Synapse Pipelines orchestration
- Serverless SQL pools interaction
### Data Engineering Skills
- ETL/ELT pipeline design
- Data quality and validation frameworks
## Technical Stack
### Languages & Frameworks
- **Primary**: Python, PySpark
- **Secondary**: SQL, PowerShell
- **Libraries**: pandas, numpy, pytest
### Azure Services
- Azure Synapse Analytics
- Azure Data Lake Storage Gen2
- Azure Key Vault
### Tools & Platforms
- Git/Azure DevOps
- Jupyter/Synapse Notebooks
## Responsibilities
### Development
- Design optimized PySpark jobs for large-scale data processing
- Implement data transformation logic with performance considerations
- Create reusable libraries and frameworks
- Build automated testing suites for data pipelines
### Optimization
- Analyze and tune Spark job performance
- Optimize cluster configurations and resource allocation
- Implement caching strategies and data skew handling
- Monitor and troubleshoot production workloads
### Architecture
- Design scalable data lake architectures
- Establish data partitioning and storage strategies
- Define data governance and security protocols
- Create disaster recovery and backup procedures
## Best Practices
**CRITICAL**: Read `.claude/CLAUDE.md` for best practices
### Performance
- Leverage broadcast joins and bucketing
- Optimize shuffle operations and partition sizes
- Use appropriate file formats (Parquet, Delta)
- Implement incremental processing patterns
### Security
- Implement row-level and column-level security
- Use managed identities and service principals
- Encrypt data at rest and in transit
- Follow least privilege access principles
## Communication Style
- Provides technical solutions with clear performance implications
- Focuses on scalable, production-ready implementations
- Emphasizes best practices and enterprise patterns
- Delivers concise explanations with practical examples
## Key Metrics
- Pipeline execution time and resource utilization
- Data quality scores and SLA compliance
- Cost optimization and resource efficiency
- System reliability and uptime statistics

commands/explain-code.md (new executable file, 194 lines)

# Analyze and Explain Code Functionality
## Instructions
Follow this systematic approach to explain code: **$ARGUMENTS**
1. **Code Context Analysis**
- Identify the programming language and framework
- Understand the broader context and purpose of the code
- Identify the file location and its role in the project
- Review related imports, dependencies, and configurations
2. **High-Level Overview**
- Provide a summary of what the code does
- Explain the main purpose and functionality
- Identify the problem the code is solving
- Describe how it fits into the larger system
3. **Code Structure Breakdown**
- Break down the code into logical sections
- Identify classes, functions, and methods
- Explain the overall architecture and design patterns
- Map out data flow and control flow
4. **Line-by-Line Analysis**
- Explain complex or non-obvious lines of code
- Describe variable declarations and their purposes
- Explain function calls and their parameters
- Clarify conditional logic and loops
5. **Algorithm and Logic Explanation**
- Describe the algorithm or approach being used
- Explain the logic behind complex calculations
- Break down nested conditions and loops
- Clarify recursive or asynchronous operations
6. **Data Structures and Types**
- Explain data types and structures being used
- Describe how data is transformed or processed
- Explain object relationships and hierarchies
- Clarify input and output formats
7. **Framework and Library Usage**
- Explain framework-specific patterns and conventions
- Describe library functions and their purposes
- Explain API calls and their expected responses
- Clarify configuration and setup code
8. **Error Handling and Edge Cases**
- Explain error handling mechanisms
- Describe exception handling and recovery
- Identify edge cases being handled
- Explain validation and defensive programming
9. **Performance Considerations**
- Identify performance-critical sections
- Explain optimization techniques being used
- Describe complexity and scalability implications
- Point out potential bottlenecks or inefficiencies
10. **Security Implications**
- Identify security-related code sections
- Explain authentication and authorization logic
- Describe input validation and sanitization
- Point out potential security vulnerabilities
11. **Testing and Debugging**
- Explain how the code can be tested
- Identify debugging points and logging
- Describe mock data or test scenarios
- Explain test helpers and utilities
12. **Dependencies and Integrations**
- Explain external service integrations
- Describe database operations and queries
- Explain API interactions and protocols
- Clarify third-party library usage
**Explanation Format Examples:**
**For Complex Algorithms:**
```
This function implements a depth-first search algorithm:
1. Line 1-3: Initialize a stack with the starting node and a visited set
2. Line 4-8: Main loop - continue until stack is empty
3. Line 9-11: Pop a node and check if it's the target
4. Line 12-15: Add unvisited neighbors to the stack
5. Line 16: Return null if target not found
Time Complexity: O(V + E) where V is vertices and E is edges
Space Complexity: O(V) for the visited set and stack
```
**For API Integration Code:**
```
This code handles user authentication with a third-party service:
1. Extract credentials from request headers
2. Validate credential format and required fields
3. Make API call to authentication service
4. Handle response and extract user data
5. Create session token and set cookies
6. Return user profile or error response
Error Handling: Catches network errors, invalid credentials, and service unavailability
Security: Uses HTTPS, validates inputs, and sanitizes responses
```
**For Database Operations:**
```
This function performs a complex database query with joins:
1. Build base query with primary table
2. Add LEFT JOIN for related user data
3. Apply WHERE conditions for filtering
4. Add ORDER BY for consistent sorting
5. Implement pagination with LIMIT/OFFSET
6. Execute query and handle potential errors
7. Transform raw results into domain objects
Performance Notes: Uses indexes on filtered columns, implements connection pooling
```
13. **Common Patterns and Idioms**
- Identify language-specific patterns and idioms
- Explain design patterns being implemented
- Describe architectural patterns in use
- Clarify naming conventions and code style
14. **Potential Improvements**
- Suggest code improvements and optimizations
- Identify possible refactoring opportunities
- Point out maintainability concerns
- Recommend best practices and standards
15. **Related Code and Context**
- Reference related functions and classes
- Explain how this code interacts with other components
- Describe the calling context and usage patterns
- Point to relevant documentation and resources
16. **Debugging and Troubleshooting**
- Explain how to debug issues in this code
- Identify common failure points
- Describe logging and monitoring approaches
- Suggest testing strategies
**Language-Specific Considerations:**
**JavaScript/TypeScript:**
- Explain async/await and Promise handling
- Describe closure and scope behavior
- Clarify this binding and arrow functions
- Explain event handling and callbacks
**Python:**
- Explain list comprehensions and generators
- Describe decorator usage and purpose
- Clarify context managers and with statements
- Explain class inheritance and method resolution
**Java:**
- Explain generics and type parameters
- Describe annotation usage and processing
- Clarify stream operations and lambda expressions
- Explain exception hierarchy and handling
**C#:**
- Explain LINQ queries and expressions
- Describe async/await and Task handling
- Clarify delegate and event usage
- Explain nullable reference types
**Go:**
- Explain goroutines and channel usage
- Describe interface implementation
- Clarify error handling patterns
- Explain package structure and imports
**Rust:**
- Explain ownership and borrowing
- Describe lifetime annotations
- Clarify pattern matching and Option/Result types
- Explain trait implementations
Remember to:
- Use clear, non-technical language when possible
- Provide examples and analogies for complex concepts
- Structure explanations logically from high-level to detailed
- Include visual diagrams or flowcharts when helpful
- Tailor the explanation level to the intended audience

commands/local-commit.md (new executable file, 361 lines)

---
allowed-tools: Bash(git add:*), Bash(git status:*), Bash(git commit:*), Bash(git diff:*), Bash(git log:*), Bash(git push:*), Bash(git pull:*), Bash(git branch:*), mcp__ado__repo_list_branches_by_repo, mcp__ado__repo_search_commits, mcp__ado__repo_create_pull_request, mcp__ado__repo_get_pull_request_by_id, mcp__ado__repo_get_repo_by_name_or_id, mcp__ado__wit_add_work_item_comment, mcp__ado__wit_get_work_item
argument-hint: [message] | --no-verify | --amend | --pr-s | --pr-d | --pr-m
description: Create well-formatted commits with conventional commit format and emoji, integrated with Azure DevOps
---
# Smart Git Commit with Azure DevOps Integration
Create well-formatted commit: $ARGUMENTS
## Repository Configuration
- **Project**: Program Unify
- **Repository ID**: e030ea00-2f85-4b19-88c3-05a864d7298d
- **Repository Name**: unify_2_1_dm_synapse_env_d10
- **Branch Structure**: `feature/* → staging → develop → main`
- **Main Branch**: main
## Implementation Logic for Claude
When processing this command, Claude should:
1. **Detect Repository**: Check if current repo is `unify_2_1_dm_synapse_env_d10`
- Use `git remote -v` or check current directory path
- Can also use `mcp__ado__repo_get_repo_by_name_or_id` to verify
2. **Parse Arguments**: Extract flags from `$ARGUMENTS`
- **PR Flags**:
- `--pr-s`: Set target = `staging`
- `--pr-d`: Set target = `develop`
- `--pr-m`: Set target = `main`
- `--pr` (no suffix): ERROR if unify_2_1_dm_synapse_env_d10, else target = `develop`
3. **Validate Current Branch** (if PR flag provided):
- Get current branch: `git branch --show-current`
- For `--pr-s`: Require `feature/*` branch (reject `staging`, `develop`, `main`)
- For `--pr-d`: Require `staging` branch exactly
- For `--pr-m`: Require `develop` branch exactly
- If validation fails: Show a clear error and exit (see the bash sketch at the end of this section)
4. **Execute Commit Workflow**:
- Stage changes (`git add .` )
- Create commit with emoji conventional format
- Run pre-commit hooks (unless `--no-verify`)
- Push to current branch
5. **Create Pull Request** (if PR flag):
- Call `mcp__ado__repo_create_pull_request` with:
- `repository_id`: e030ea00-2f85-4b19-88c3-05a864d7298d
- `source_branch`: Current branch from step 3
- `target_branch`: Target from step 2
- `title`: Extract from commit message
- `description`: Generate with summary and test plan
- Return PR URL to user
6. **Add Work Item Comments Automatically** (if PR was created in step 5):
- **Condition Check**: Only execute if:
- A PR was created in step 5 (any `--pr-*` flag was used)
- PR creation was successful and returned a PR ID
- **Get Work Items from PR**:
- Use `mcp__ado__repo_get_pull_request_by_id` with:
- `repositoryId`: e030ea00-2f85-4b19-88c3-05a864d7298d
- `pullRequestId`: PR ID from step 5
- `includeWorkItemRefs`: true
- Extract work item IDs from the PR response
- If no work items found, log info message and skip to next step
- **Add Comments to Each Work Item**:
- For each work item ID extracted from PR:
- Use `mcp__ado__wit_get_work_item` to verify work item exists
- Generate comment with:
- PR title and number
- Commit message and SHA
- File changes summary from `git diff --stat`
- Link to PR in Azure DevOps
- Link to commit in Azure DevOps
- **IMPORTANT**: Do NOT include any footer text like "Automatically added by /local-commit command" or similar attribution
- Call `mcp__ado__wit_add_work_item_comment` with:
- `project`: "Program Unify"
- `workItemId`: Current work item ID
- `comment`: Generated comment with HTML formatting
- `format`: "html"
- Log success/failure for each work item
- If ANY work item fails, warn but don't fail the commit
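A bash sketch of the branch-validation rules from step 3 (argument parsing is simplified and illustrative):
```bash
FLAG="$1"                                  # e.g. --pr-s, --pr-d, --pr-m
CURRENT="$(git branch --show-current)"
case "$FLAG" in
  --pr-s)
    [[ "$CURRENT" == feature/* ]] || { echo "ERROR: --pr-s requires a feature/* branch"; exit 1; }
    TARGET="staging" ;;
  --pr-d)
    [[ "$CURRENT" == "staging" ]] || { echo "ERROR: --pr-d must be run from staging"; exit 1; }
    TARGET="develop" ;;
  --pr-m)
    [[ "$CURRENT" == "develop" ]] || { echo "ERROR: --pr-m must be run from develop"; exit 1; }
    TARGET="main" ;;
  --pr)
    echo "ERROR: use --pr-s, --pr-d, or --pr-m for this repository"; exit 1 ;;
esac
echo "PR will target: $CURRENT -> $TARGET"
```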
## Current Repository State
- Git status: !`git status --short`
- Current branch: !`git branch --show-current`
- Staged changes: !`git diff --cached --stat`
- Unstaged changes: !`git diff --stat`
- Recent commits: !`git log --oneline -5`
## What This Command Does
1. Analyzes current git status and changes
2. If no files staged, stages all modified files with `git add`
3. Reviews changes with `git diff`
4. Analyzes for multiple logical changes
5. For complex changes, suggests split commits
6. Creates commit with emoji conventional format
7. Automatically runs pre-commit hooks (ruff lint/format, trailing whitespace, etc.)
- Pre-commit may modify files (auto-fixes)
- If files are modified, they'll be re-staged automatically
- Use `--no-verify` to skip hooks in emergencies only
8. **NEW**: With PR flags, creates Azure DevOps pull request after push
- Uses `mcp__ado__repo_create_pull_request` to create PR
- Automatically links work items if commit message contains work item IDs
- **IMPORTANT Branch Flow Rules** (unify_2_1_dm_synapse_env_d10 ONLY):
- `--pr-s`: Feature branch → `staging` (standard feature PR)
- `--pr-d`: `staging` → `develop` (promote staging to develop)
- `--pr-m`: `develop` → `main` (promote develop to production)
- `--pr`: **NOT ALLOWED** - must specify `-s`, `-d`, or `-m` for this repository
- **For OTHER repositories**: `--pr` creates PR to `develop` branch (legacy behavior)
9. **NEW**: Automatically adds comments to linked work items after PR creation
- Retrieves work items linked to the PR using `mcp__ado__repo_get_pull_request_by_id`
- Automatically adds comment to each linked work item with:
- PR title and number
- Commit message and SHA
- Summary of file changes
- Direct link to PR in Azure DevOps
- Direct link to commit in Azure DevOps
- **IMPORTANT**: No footer attribution text (e.g., "Automatically added by /local-commit command")
- Validates work items exist before commenting
- Continues even if some work items fail (warns only)
## Commit Message Format
### Type + Emoji Mapping
- ✨ `feat`: New feature
- 🐛 `fix`: Bug fix
- 📝 `docs`: Documentation
- 💄 `style`: Formatting/style
- ♻️ `refactor`: Code refactoring
- ⚡️ `perf`: Performance improvements
- ✅ `test`: Tests
- 🔧 `chore`: Tooling, configuration
- 🚀 `ci`: CI/CD improvements
- ⏪️ `revert`: Reverting changes
- 🚨 `fix`: Compiler/linter warnings
- 🔒️ `fix`: Security issues
- 🩹 `fix`: Simple non-critical fix
- 🚑️ `fix`: Critical hotfix
- 🎨 `style`: Code structure/format
- 🔥 `fix`: Remove code/files
- 📦️ `chore`: Dependencies
- 🌱 `chore`: Seed files
- 🧑‍💻 `chore`: Developer experience
- 🏷️ `feat`: Types
- 💬 `feat`: Text/literals
- 🌐 `feat`: i18n/l10n
- 💡 `feat`: Business logic
- 📱 `feat`: Responsive design
- 🚸 `feat`: UX improvements
- ♿️ `feat`: Accessibility
- 🗃️ `db`: Database changes
- 🚩 `feat`: Feature flags
- ⚰️ `refactor`: Remove dead code
- 🦺 `feat`: Validation
## Commit Strategy
### Single Commit (Default)
```bash
git add .
git commit -m "✨ feat: implement user auth"
```
### Multiple Commits (Complex Changes)
```bash
# Stage and commit separately
git add src/auth.py
git commit -m "✨ feat: add authentication module"
git add tests/test_auth.py
git commit -m "✅ test: add auth unit tests"
git add docs/auth.md
git commit -m "📝 docs: document auth API"
# Push all commits
git push
```
## Pre-Commit Hooks
Your project uses pre-commit with:
- **Ruff**: Linting with auto-fix + formatting
- **Standard hooks**: Trailing whitespace, AST check, YAML/JSON/TOML validation
- **Security**: Private key detection
- **Quality**: Debug statement detection, merge conflict check
**Important**: Pre-commit hooks will auto-fix issues and may modify your files. The commit process will:
1. Run pre-commit hooks
2. If hooks modify files, automatically re-stage them
3. Complete the commit with all fixes applied
## Command Options
- `--no-verify`: Skip pre-commit checks (emergency use only)
- `--amend`: Amend previous commit
- **`--pr-s`**: Create PR to `staging` branch (feature → staging)
- **`--pr-d`**: Create PR to `develop` branch (staging → develop)
- **`--pr-m`**: Create PR to `main` branch (develop → main)
- `--pr`: Legacy flag for other repositories (creates PR to `develop`)
- **NOT ALLOWED** in unify_2_1_dm_synapse_env_d10 - must use `-s`, `-d`, or `-m`
- Default: Run all pre-commit hooks and create new commit
- **Automatic Work Item Comments**: When using any PR flag, work items linked to the PR will automatically receive comments with commit details (no footer attribution)
## Azure DevOps Integration Features
### Pull Request Workflow (PR Flags)
When using PR flags, the command will:
1. Commit changes locally
2. Push to remote branch
3. Validate repository and branch configuration:
- **THIS repo (unify_2_1_dm_synapse_env_d10)**: Requires explicit flag (`--pr-s`, `--pr-d`, or `--pr-m`)
- `--pr-s`: Current feature branch → `staging`
- `--pr-d`: Must be on `staging` branch → `develop`
- `--pr-m`: Must be on `develop` branch → `main`
- `--pr` alone: **ERROR** - must specify target
- **OTHER repos**: `--pr` creates PR to `develop` (all other flags ignored)
4. Use `mcp__ado__repo_create_pull_request` to create PR with:
- **Title**: Extracted from commit message
- **Description**: Full commit details with summary and test plan
- **Source Branch**: Current branch
- **Target Branch**: Determined by flag and repository
- **Work Items**: Auto-linked from commit message (e.g., "fixes #12345")
### Viewing Commit History
You can view commit history using:
- `mcp__ado__repo_search_commits` - Search commits by branch, author, date range
- Traditional `git log` - For local history
### Branch Management
- `mcp__ado__repo_list_branches_by_repo` - View all Azure DevOps branches
- `git branch` - View local branches
## Branch Validation Rules (unify_2_1_dm_synapse_env_d10)
Before creating a PR, the command validates:
### --pr-s (Feature → Staging)
- ✅ **ALLOWED**: Any `feature/*` branch
- ❌ **BLOCKED**: `staging`, `develop`, `main` branches
- **Target**: `staging`
### --pr-d (Staging → Develop)
- ✅ **ALLOWED**: Only `staging` branch
- ❌ **BLOCKED**: All other branches (including `feature/*`)
- **Target**: `develop`
### --pr-m (Develop → Main)
- ✅ **ALLOWED**: Only `develop` branch
- ❌ **BLOCKED**: All other branches (including `staging`, `feature/*`)
- **Target**: `main`
### --pr (Legacy - NOT ALLOWED)
- ❌ **BLOCKED**: All branches in unify_2_1_dm_synapse_env_d10
- 💡 **Error Message**: "Must use --pr-s, --pr-d, or --pr-m for this repository"
- ✅ **ALLOWED**: All other repositories (targets `develop`)
## Best Practices
1. **Let pre-commit work** - Don't use `--no-verify` unless absolutely necessary
2. **Atomic commits** - One logical change per commit
3. **Descriptive messages** - Emoji + type + clear description
4. **Review before commit** - Always check `git diff`
5. **Clean history** - Split complex changes into multiple commits
6. **Trust the hooks** - They maintain code quality automatically
7. **Use correct PR flag** - `--pr-s` for features, `--pr-d` for staging promotion, `--pr-m` for production
8. **Link work items** - Reference Azure DevOps work items in commit messages (e.g., "#43815") to enable automatic PR linking
9. **Validate branch** - Ensure you're on the correct branch before using `--pr-d` or `--pr-m`
10. **Work item linking** - Work items linked to PRs will automatically receive comments with commit details
11. **Keep stakeholders informed** - Use PR flags to ensure work items are automatically updated with progress
## Example Workflows
### Simple Commit
```bash
/commit "fix: resolve enum import error"
```
### Commit with Work Item
```bash
/commit "feat: add enum imports for Synapse environment"
```
### Commit and Create PR (Feature to Staging)
```bash
/commit --pr-s "feat: refactor commit command with ADO MCP integration"
```
This will:
1. Create commit locally
2. Push to current branch
3. Create PR: `feature/xyz → staging`
4. Link work items automatically if mentioned in commit message
### Promote Staging to Develop
```bash
# First checkout staging branch
git checkout staging
git pull origin staging
# Then commit and create PR
/commit --pr-d "release: promote staging changes to develop"
```
This will:
1. Create commit on `staging` branch
2. Push to `staging`
3. Create PR: `staging → develop`
### Promote Develop to Main (Production)
```bash
# First checkout develop branch
git checkout develop
git pull origin develop
# Then commit and create PR
/commit --pr-m "release: promote develop to production"
```
This will:
1. Create commit on `develop` branch
2. Push to `develop`
3. Create PR: `develop → main`
### Error: Using --pr without suffix
```bash
/commit --pr "feat: some feature"
```
**Result**: ERROR - unify_2_1_dm_synapse_env_d10 requires explicit PR target (`--pr-s`, `--pr-d`, or `--pr-m`)
### Feature PR with Automatic Work Item Comments
```bash
# On feature/xyz branch
/commit --pr-s "feat(user-auth): implement OAuth2 authentication #12345"
```
This will:
1. Create commit on feature branch
2. Push to feature branch
3. Create PR: `feature/xyz → staging`
4. Link work item #12345 to the PR
5. Automatically add comment to work item #12345 with:
- PR title and number
- Commit message and SHA
- File changes summary
- Link to PR in Azure DevOps
- Link to commit in Azure DevOps
- (No footer attribution text)
### Staging to Develop PR with Multiple Work Items
```bash
# On staging branch
/commit --pr-d "release: promote staging to develop - fixes #12345, #67890"
```
This will:
1. Create commit on `staging` branch
2. Push to `staging`
3. Create PR: `staging → develop`
4. Link work items #12345 and #67890 to the PR
5. Automatically add comments to both work items with PR and commit details (without footer attribution)
**Note**: Work items are automatically detected from commit message and linked to PR. Comments are added automatically to all linked work items without any footer text.
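For reference, the work item detection described above amounts to a simple pattern match on the commit message. A minimal sketch, assuming the `#<number>` convention shown in the examples:

```python
import re

def extract_work_items(commit_message: str) -> list[str]:
    # Finds every "#<digits>" reference, e.g. "#12345" and "#67890".
    return re.findall(r"#(\d+)", commit_message)

extract_work_items("release: promote staging to develop - fixes #12345, #67890")
# -> ['12345', '67890']
```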

125
commands/multi-agent.md Executable file
View File

@@ -0,0 +1,125 @@
---
description: Discuss multi-agent workflow strategy for a specific task
argument-hint: [task-description]
allowed-tools: Read, Task, TodoWrite
---
# Multi-Agent Workflow Discussion
Prepare to discuss how you will use a multi-agent workflow to ${ARGUMENTS}.
## Instructions
1. **Analyze the Task**: ${ARGUMENTS}
- Break down the complexity
- Identify parallelizable components
- Determine if multi-agent approach is optimal
2. **Evaluate Approach**:
- Should this use `/background` (single agent) or `/orchestrate` (multiple agents)?
- How many agents would be optimal?
- What are the dependencies between subtasks?
3. **Design Strategy**:
- Outline the orchestration plan
- Define agent responsibilities
- Specify communication format (JSON responses)
- Identify quality gates
4. **Provide Recommendations**:
- Best command to use (`/background` vs `/orchestrate`)
- Number of agents (if using orchestrate: 2-8)
- Subtask breakdown
- Estimated completion time
- Resource requirements
## Output Format
Provide a structured discussion covering:
1. **Task Analysis**: Understanding of "${ARGUMENTS}"
2. **Complexity Assessment**: Simple/Moderate/High
3. **Recommended Approach**: `/background` or `/orchestrate`
4. **Agent Breakdown** (if orchestrate):
- Agent 1: Responsibility
- Agent 2: Responsibility
- ...
5. **Dependencies**: None / Sequential / Hybrid
6. **Estimated Time**: X minutes
7. **Next Steps**: Commands to run
## Examples
### Example 1: Simple Task
```
User: /aa_command "optimize the g_x_mg_statsclasscount table"
Output:
Task Analysis: Single gold table optimization - focused work
Complexity: Moderate
Recommended: /background
Reasoning: Single file, sequential optimization steps
Estimated Time: 15 minutes
Next Step: /background "Optimize g_x_mg_statsclasscount.py for performance"
```
### Example 2: Complex Task
```
User: /aa_command "implement data quality framework across all layers"
Output:
Task Analysis: Cross-layer feature implementation
Complexity: High
Recommended: /orchestrate
Agent Breakdown:
- Agent 1: Design quality schema and base classes
- Agent 2: Implement bronze layer validators
- Agent 3: Implement silver layer validators
- Agent 4: Implement gold layer validators
- Agent 5: Create validation tests
- Agent 6: Update documentation
Dependencies: Agent 1 must complete before 2-4 (hybrid approach)
Estimated Time: 45-60 minutes
Next Step: Create task file and run /orchestrate data_quality_framework.md
```
### Example 3: Medium Complexity
```
User: /aa_command "fix all linting errors in the silver layer"
Output:
Task Analysis: Code quality fixes across multiple files
Complexity: Moderate-High
Recommended: /orchestrate (if >15 files) or /background (if <15 files)
Agent Breakdown:
- Agent 1: Fix linting in silver_cms files
- Agent 2: Fix linting in silver_fvms files
- Agent 3: Fix linting in silver_nicherms files
Dependencies: None (fully parallel)
Estimated Time: 20-30 minutes
Next Step: /orchestrate "Fix linting errors: silver_cms, silver_fvms, silver_nicherms in parallel"
```
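The examples above follow a rough rule of thumb that could be captured in a few lines. A sketch, assuming the 15-file threshold from Example 3 (the threshold itself is only a guideline, not a fixed rule of this command):

```python
def recommend_command(file_count: int, parallelisable: bool) -> str:
    # Many independent files favour /orchestrate; otherwise a single
    # /background agent is usually simpler and has less coordination overhead.
    if parallelisable and file_count >= 15:
        return "/orchestrate"
    return "/background"
```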
## Usage
```bash
# Discuss strategy for any task
/aa_command "optimize all gold tables for performance"
# Get recommendations for feature implementation
/aa_command "add monitoring and alerting to the pipeline"
# Plan refactoring work
/aa_command "refactor all ETL classes to use new base class pattern"
# Evaluate testing strategy
/aa_command "write comprehensive tests for the medallion architecture"
```
## Notes
- This command helps you plan before executing
- Use this to determine optimal agent strategy
- Creates a blueprint for `/background` or `/orchestrate` commands
- Considers parallelism, dependencies, and complexity
- Provides concrete next steps and command examples

54
commands/my-devops-tasks.md Executable file
View File

@@ -0,0 +1,54 @@
# ADO MCP Task Retrieval Prompt
Use the Azure DevOps MCP tools to retrieve all user stories, tasks, and bugs assigned to me that are currently in "New", "Active", "Committed", or "Backlog" states. Create a comprehensive markdown document with the following structure:
## Query Parameters
- **Assigned To**: @Me
- **Work Item Types**: User Story, Task, Bug
- **States**: New, Active, Committed, Backlog
- **Include**: All active iterations and backlog
## Required Output Format
```markdown
# My Active Work Items
## Summary
- **Total Items**: {count}
- **By Type**: {breakdown by work item type}
- **By State**: {breakdown by state}
- **Last Updated**: {current date}
## Work Items
### {Work Item Type} - {ID}: {Title}
**URL**: {URL to work item}
**Status**: {State} | **Priority**: {Priority} | **Effort**: {Story Points/Original Estimate}
**Iteration**: S{Iteration Path} | **Area**: {Area Path}
**Description Summary**:
{Provide a 2-3 sentence summary of the description/acceptance criteria}
**Key Details**:
- **Created**: {Created Date}
- **Tags**: {Tags if any}
- **Parent**: {Parent work item if applicable}
**[View in ADO]({URL to work item})**
---
```
## Specific Requirements
1. **Summarize Descriptions**: For each work item, provide a concise 2-3 sentence summary of the description and acceptance criteria, focusing on the core objective and deliverables.
2. **Clickable URLs**: Ensure all Azure DevOps URLs are properly formatted as clickable markdown links, including the link to the actual work item.
3. **Sort Order**: Sort by Priority (High to Low), then by State (Active, Committed, New, Backlog), then by Story Points/Effort (High to Low).
4. **Data Validation**: If any work items have missing key fields (Priority, Effort, etc.), note this in the output.
5. **Additional Context**: Include any relevant comments from the last 7 days if present.
Execute this query and generate the markdown document with all my currently assigned work items.
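For context, the query parameters above roughly correspond to the WIQL shown below. This is a sketch only; the exact query issued by the ADO MCP tools may differ, and the field names follow standard Azure DevOps conventions.

```python
# Approximate WIQL equivalent of the query parameters above (illustrative only).
WIQL_QUERY = """
SELECT [System.Id], [System.Title], [System.State], [System.WorkItemType]
FROM WorkItems
WHERE [System.AssignedTo] = @Me
  AND [System.WorkItemType] IN ('User Story', 'Task', 'Bug')
  AND [System.State] IN ('New', 'Active', 'Committed', 'Backlog')
ORDER BY [Microsoft.VSTS.Common.Priority] ASC, [System.State] ASC
"""
```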

510
commands/orchestrate.md Executable file
View File

@@ -0,0 +1,510 @@
---
description: Orchestrate multiple generic agents working in parallel on complex tasks
argument-hint: [user-prompt] | [task-file-name]
allowed-tools: Read, Task, TodoWrite
---
# Multi-Agent Orchestrator
Launch an orchestrator agent that coordinates multiple generic agents working in parallel on complex, decomposable tasks. All agents communicate via JSON format for structured coordination.
## Usage
**Option 1: Direct prompt**
```
/orchestrate "Analyze all gold tables, identify optimization opportunities, and implement improvements across the codebase"
```
**Option 2: Task file from .claude/tasks/**
```
/orchestrate multi_agent_pipeline_optimization.md
```
**Option 3: List available orchestration tasks**
```
/orchestrate list
```
## Variables
- `TASK_INPUT`: Either a direct prompt string or a task file name from `.claude/tasks/`
- `TASK_FILE_PATH`: Full path to task file if using a task file
- `PROMPT_CONTENT`: The actual prompt to send to the orchestrator agent
## Instructions
### 1. Determine Task Source
Check if `$ARGUMENTS` looks like a file name (ends with `.md` or contains no spaces):
- If YES: It's a task file name from `.claude/tasks/`
- If NO: It's a direct user prompt
- If "list": Show available orchestration task files
### 2. Load Task Content
**If using task file:**
1. List all available task files in `.claude/tasks/` directory
2. Find the task file matching the provided name (exact match or partial match)
3. Read the task file content
4. Use the full task file content as the prompt
**If using direct prompt:**
1. Use the `$ARGUMENTS` directly as the prompt
**If "list" command:**
1. Show all available orchestration task files with metadata
2. Exit without launching agents
### 3. Launch Orchestrator Agent
Launch the orchestrator agent using the Task tool with the following configuration:
**Important Configuration:**
- **subagent_type**: `general-purpose`
- **model**: `sonnet` (default) or `opus` for highly complex orchestrations
- **description**: Short 3-5 word description (e.g., "Orchestrate pipeline optimization")
- **prompt**: Complete orchestrator instructions (see template below)
**Orchestrator Prompt Template:**
```
You are an ORCHESTRATOR AGENT coordinating multiple generic worker agents on a complex project task.
PROJECT CONTEXT:
- Project: Unify 2.1 Data Migration using Azure Synapse Analytics
- Architecture: Medallion pattern (Bronze/Silver/Gold layers)
- Primary Language: PySpark Python
- Follow: .claude/CLAUDE.md and .claude/rules/python_rules.md
YOUR ORCHESTRATOR RESPONSIBILITIES:
1. Analyze the main task and decompose it into 2-8 independent subtasks
2. Launch multiple generic worker agents (use Task tool with subagent_type="general-purpose")
3. Provide each worker agent with:
- Clear, self-contained instructions
- Required context (file paths, requirements)
- Expected JSON response format
4. Collect and aggregate all worker responses
5. Validate completeness and consistency
6. Produce final consolidated report
MAIN TASK TO ORCHESTRATE:
{TASK_CONTENT}
WORKER AGENT COMMUNICATION PROTOCOL:
Each worker agent MUST return results in this JSON format:
```json
{
"agent_id": "unique_identifier",
"task_assigned": "brief description",
"status": "completed|failed|partial",
"results": {
"files_modified": ["path/to/file1.py", "path/to/file2.py"],
"changes_summary": "description of changes",
"metrics": {
"lines_added": 0,
"lines_removed": 0,
"functions_added": 0,
"issues_fixed": 0
}
},
"quality_checks": {
"syntax_check": "passed|failed",
"linting": "passed|failed",
"formatting": "passed|failed"
},
"issues_encountered": ["issue1", "issue2"],
"recommendations": ["recommendation1", "recommendation2"],
"execution_time_seconds": 0
}
```
WORKER AGENT PROMPT TEMPLATE:
When launching each worker agent, use this prompt structure:
```
You are a WORKER AGENT (ID: {agent_id}) reporting to an orchestrator.
CRITICAL: You MUST return your results in JSON format as specified below.
PROJECT CONTEXT:
- Read and follow: .claude/CLAUDE.md and .claude/rules/python_rules.md
- Coding Standards: 240 char lines, no blanks in functions, type hints required
- Use: @synapse_error_print_handler decorator, NotebookLogger, TableUtilities
YOUR ASSIGNED SUBTASK:
{subtask_description}
FILES TO WORK ON:
{file_list}
REQUIREMENTS:
{specific_requirements}
QUALITY GATES (MUST RUN):
1. python3 -m py_compile <modified_files>
2. ruff check python_files/
3. ruff format python_files/
REQUIRED JSON RESPONSE FORMAT:
```json
{
"agent_id": "{agent_id}",
"task_assigned": "{subtask_description}",
"status": "completed",
"results": {
"files_modified": [],
"changes_summary": "",
"metrics": {
"lines_added": 0,
"lines_removed": 0,
"functions_added": 0,
"issues_fixed": 0
}
},
"quality_checks": {
"syntax_check": "passed|failed",
"linting": "passed|failed",
"formatting": "passed|failed"
},
"issues_encountered": [],
"recommendations": [],
"execution_time_seconds": 0
}
```
Work autonomously, complete your task, run quality gates, and return the JSON response.
```
ORCHESTRATION WORKFLOW:
1. **Task Decomposition**: Break main task into 2-8 independent subtasks
2. **Agent Assignment**: Create unique agent IDs (agent_1, agent_2, etc.)
3. **Parallel Launch**: Launch all worker agents simultaneously using Task tool
4. **Monitor Progress**: Track each agent's completion
5. **Collect Results**: Parse JSON responses from each worker agent
6. **Validate Output**: Ensure all quality checks passed
7. **Aggregate Results**: Combine all worker outputs
8. **Generate Report**: Create comprehensive orchestration summary
FINAL ORCHESTRATOR REPORT FORMAT:
```json
{
"orchestration_summary": {
"main_task": "{original task description}",
"total_agents_launched": 0,
"successful_agents": 0,
"failed_agents": 0,
"total_execution_time_seconds": 0
},
"agent_results": [
{worker_agent_json_response_1},
{worker_agent_json_response_2},
...
],
"consolidated_metrics": {
"total_files_modified": 0,
"total_lines_added": 0,
"total_lines_removed": 0,
"total_functions_added": 0,
"total_issues_fixed": 0
},
"quality_validation": {
"all_syntax_checks_passed": true,
"all_linting_passed": true,
"all_formatting_passed": true
},
"consolidated_issues": [],
"consolidated_recommendations": [],
"next_steps": []
}
```
BEST PRACTICES:
- Keep subtasks independent (no dependencies between worker agents)
- Provide complete context to each worker agent
- Launch all agents in parallel for maximum efficiency
- Validate JSON responses from each worker
- Aggregate metrics and results systematically
- Flag any worker failures or incomplete results
- Provide actionable next steps
Work autonomously and orchestrate the complete task execution.
```
### 4. Inform User
After launching the orchestrator, inform the user:
- Orchestrator agent has been launched
- Main task being orchestrated (summary)
- Expected number of worker agents to be spawned
- Estimated completion time (if known)
- The orchestrator will coordinate all work and provide a consolidated JSON report
## Task File Structure
Expected orchestration task file format in `.claude/tasks/`:
```markdown
# Orchestration Task Title
**Date Created**: YYYY-MM-DD
**Priority**: HIGH/MEDIUM/LOW
**Estimated Total Time**: X minutes
**Complexity**: High/Medium/Low
**Recommended Worker Agents**: N
## Main Objective
Clear description of the overall goal
## Success Criteria
- [ ] Criterion 1
- [ ] Criterion 2
- [ ] Criterion 3
## Suggested Subtask Decomposition
### Subtask 1: Title
**Scope**: Files/components affected
**Estimated Time**: X minutes
**Dependencies**: None or list other subtasks
**Description**: What needs to be done
**Expected Outputs**:
- Output 1
- Output 2
---
### Subtask 2: Title
**Scope**: Files/components affected
**Estimated Time**: X minutes
**Dependencies**: None or list other subtasks
**Description**: What needs to be done
**Expected Outputs**:
- Output 1
- Output 2
---
(Repeat for each suggested subtask)
## Quality Requirements
- All code must pass syntax validation
- All code must pass linting
- All code must be formatted
- All agents must return valid JSON
## Aggregation Requirements
- How to combine results from worker agents
- Validation steps for consolidated output
- Reporting requirements
```
## Examples
### Example 1: Pipeline Optimization
```
User: /orchestrate "Analyze and optimize all gold layer tables for performance"
Orchestrator launches 5 worker agents:
- agent_1: Analyze g_x_mg_* tables
- agent_2: Analyze g_xa_* tables
- agent_3: Review joins and aggregations
- agent_4: Check indexing strategies
- agent_5: Validate query plans
Each agent reports back with JSON results
Orchestrator aggregates findings and produces consolidated report
```
### Example 2: Code Quality Sweep
```
User: /orchestrate code_quality_improvement.md
Orchestrator reads task file with 8 categories
Launches 8 worker agents in parallel:
- agent_1: Fix linting issues in bronze layer
- agent_2: Fix linting issues in silver layer
- agent_3: Fix linting issues in gold layer
- agent_4: Add missing type hints
- agent_5: Update error handling
- agent_6: Improve logging
- agent_7: Optimize imports
- agent_8: Update documentation
Collects JSON from all 8 agents
Validates quality checks
Produces aggregated metrics report
```
### Example 3: Feature Implementation
```
User: /orchestrate "Implement data validation framework across all layers"
Orchestrator decomposes into:
- agent_1: Design validation schema
- agent_2: Implement bronze validators
- agent_3: Implement silver validators
- agent_4: Implement gold validators
- agent_5: Create validation tests
- agent_6: Update documentation
Coordinates execution
Collects results in JSON format
Validates completeness
Generates implementation report
```
## JSON Response Validation
The orchestrator MUST validate each worker agent response contains:
**Required Fields:**
- `agent_id`: String, unique identifier
- `task_assigned`: String, description of assigned work
- `status`: String, one of ["completed", "failed", "partial"]
- `results`: Object with:
- `files_modified`: Array of strings
- `changes_summary`: String
- `metrics`: Object with numeric values
- `quality_checks`: Object with pass/fail values
- `issues_encountered`: Array of strings
- `recommendations`: Array of strings
- `execution_time_seconds`: Number
**Validation Checks:**
- All required fields present
- Status is valid enum value
- Arrays are properly formatted
- Metrics are numeric
- Quality checks are pass/fail
- JSON is well-formed and parseable
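As a rough illustration, the checks above could be implemented along the following lines; the helper name and exact error messages are illustrative only.

```python
import json

REQUIRED_FIELDS = {"agent_id", "task_assigned", "status", "results", "quality_checks",
                   "issues_encountered", "recommendations", "execution_time_seconds"}
VALID_STATUSES = {"completed", "failed", "partial"}

def validate_worker_response(raw: str) -> tuple[bool, list[str]]:
    # Returns (is_valid, problems) for a single worker agent JSON reply.
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, [f"invalid JSON: {exc}"]
    if not isinstance(payload, dict):
        return False, ["response is not a JSON object"]
    problems = [f"missing field: {field}" for field in sorted(REQUIRED_FIELDS - payload.keys())]
    if payload.get("status") not in VALID_STATUSES:
        problems.append(f"invalid status: {payload.get('status')!r}")
    results = payload.get("results", {})
    if not isinstance(results.get("files_modified", []), list):
        problems.append("results.files_modified must be an array")
    if not isinstance(payload.get("execution_time_seconds", 0), (int, float)):
        problems.append("execution_time_seconds must be numeric")
    return not problems, problems
```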
## Agent Coordination Patterns
### Pattern 1: Parallel Independent Tasks
```
Orchestrator launches all agents simultaneously
No dependencies between agents
Each agent works on separate files/components
Results aggregated at end
```
### Pattern 2: Sequential with Handoff (Not Recommended)
```
Orchestrator launches agent_1
Waits for agent_1 JSON response
Uses agent_1 results to inform agent_2 prompt
Launches agent_2 with context from agent_1
Continues chain
```
### Pattern 3: Hybrid (Parallel Groups)
```
Orchestrator identifies 2-3 independent groups
Launches all agents in group 1 in parallel
Waits for group 1 completion
Launches all agents in group 2 with context from group 1
Aggregates results from all groups
```
## Success Criteria
Orchestration task completion requires:
- ✅ All worker agents launched successfully
- ✅ All worker agents returned valid JSON responses
- ✅ All quality checks passed across all agents
- ✅ No unresolved issues or failures
- ✅ Consolidated metrics calculated correctly
- ✅ Comprehensive orchestration report provided
- ✅ All files syntax validated
- ✅ All files linted and formatted
## Best Practices
### For Orchestrator Design
- Keep worker tasks independent when possible
- Provide complete context to each worker
- Assign unique, meaningful agent IDs
- Specify clear JSON response requirements
- Validate all JSON responses
- Handle worker failures gracefully
- Aggregate results systematically
- Provide actionable consolidated report
### For Worker Agent Design
- Make each subtask self-contained
- Include all necessary context in prompt
- Specify exact file paths and requirements
- Define clear success criteria
- Require JSON response format
- Include quality gate validation
- Request execution metrics
### For Task Decomposition
- Break into 2-8 independent subtasks
- Avoid inter-agent dependencies
- Balance workload across agents
- Group related work logically
- Consider file/component boundaries
- Respect layer separation (bronze/silver/gold)
## Error Handling
### Worker Agent Failures
If a worker agent fails:
1. Orchestrator captures failure details
2. Marks agent status as "failed" in JSON
3. Continues with other agents
4. Reports failure in final summary
5. Suggests recovery steps
### JSON Parse Errors
If worker returns invalid JSON:
1. Orchestrator logs parse error
2. Attempts to extract partial results
3. Marks agent response as invalid
4. Flags for manual review
5. Continues with valid responses
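One possible recovery strategy for step 2 is sketched below: pull the outermost JSON object out of a noisy reply and attempt to parse it. This is an assumption about how partial extraction might work, not a prescribed implementation.

```python
import json
import re

def extract_partial_json(reply: str) -> dict | None:
    # Best-effort recovery: grab the outermost {...} block from a noisy worker
    # reply and try to parse it; return None if nothing salvageable is found.
    match = re.search(r"\{.*\}", reply, flags=re.DOTALL)
    if not match:
        return None
    try:
        parsed = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None
```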
### Quality Check Failures
If worker's quality checks fail:
1. Orchestrator flags the failure
2. Includes failure details in report
3. Prevents final approval
4. Suggests corrective actions
5. May relaunch worker with corrections
## Performance Optimization
### Parallel Execution
- Launch all independent agents simultaneously
- Use Task tool with multiple concurrent calls
- Maximize parallelism for faster completion
- Monitor resource utilization
### Agent Sizing
- 2-8 agents: Optimal for most tasks
- <2 agents: Consider using single agent instead
- >8 agents: May have coordination overhead
- Balance granularity vs overhead
### Context Management
- Provide minimal necessary context
- Avoid duplicating shared information
- Use references to shared documentation
- Keep prompts focused and concise
## Notes
- Orchestrator coordinates but doesn't do actual code changes
- Worker agents are general-purpose and autonomous
- All communication uses structured JSON format
- Quality validation is mandatory across all agents
- Failed agents don't block other agents
- Orchestrator produces human-readable summary
- JSON enables programmatic result processing
- Pattern scales from 2 to 8 parallel agents
- Best for complex, decomposable tasks
- Overkill for simple, atomic tasks

View File

@@ -0,0 +1,84 @@
---
allowed-tools: Read, Bash, Grep, Glob
argument-hint: [monitoring-type] | --apm | --rum | --custom
description: Setup comprehensive application performance monitoring with metrics, alerting, and observability
---
# Add Performance Monitoring
Setup application performance monitoring: **$ARGUMENTS**
## Instructions
1. **Performance Monitoring Strategy**
- Define key performance indicators (KPIs) and service level objectives (SLOs)
- Identify critical user journeys and performance bottlenecks
- Plan monitoring architecture and data collection strategy
- Assess existing monitoring infrastructure and integration points
- Define alerting thresholds and escalation procedures
2. **Application Performance Monitoring (APM)**
- Set up comprehensive APM solution (New Relic, Datadog, AppDynamics)
- Configure distributed tracing for request lifecycle visibility
- Implement custom metrics and performance tracking
- Set up transaction monitoring and error tracking
- Configure performance profiling and diagnostics
3. **Real User Monitoring (RUM)**
- Implement client-side performance tracking and web vitals monitoring
- Set up user experience metrics collection (LCP, FID, CLS, TTFB)
- Configure custom performance metrics for user interactions
- Monitor page load performance and resource loading
- Track user journey performance across different devices
4. **Server Performance Monitoring**
- Monitor system metrics (CPU, memory, disk, network)
- Set up process and application-level monitoring
- Configure event loop lag and garbage collection monitoring
- Implement custom server performance metrics
- Monitor resource utilization and capacity planning
5. **Database Performance Monitoring**
- Track database query performance and slow query identification
- Monitor database connection pool utilization
- Set up database performance metrics and alerting
- Implement query execution plan analysis
- Monitor database resource usage and optimization opportunities
6. **Error Tracking and Monitoring**
- Implement comprehensive error tracking (Sentry, Bugsnag, Rollbar)
- Configure error categorization and impact analysis
- Set up error alerting and notification systems
- Track error trends and resolution metrics
- Implement error context and debugging information
7. **Custom Metrics and Dashboards**
   - Implement business metrics tracking (Prometheus, StatsD; see the sketch after this list)
- Create performance dashboards and visualizations
- Configure custom alerting rules and thresholds
- Set up performance trend analysis and reporting
- Implement performance regression detection
8. **Alerting and Notification System**
- Configure intelligent alerting based on performance thresholds
- Set up multi-channel notifications (email, Slack, PagerDuty)
- Implement alert escalation and on-call procedures
- Configure alert fatigue prevention and noise reduction
- Set up performance incident management workflows
9. **Performance Testing Integration**
- Integrate monitoring with load testing and performance testing
- Set up continuous performance testing and monitoring
- Configure performance baseline tracking and comparison
- Implement performance test result analysis and reporting
- Monitor performance under different load scenarios
10. **Performance Optimization Recommendations**
- Generate actionable performance insights and recommendations
- Implement automated performance analysis and reporting
- Set up performance optimization tracking and measurement
- Configure performance improvement validation
- Create performance optimization prioritization frameworks
Focus on monitoring strategies that provide actionable insights for performance optimization. Ensure monitoring overhead is minimal and doesn't impact application performance.
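As one concrete example of the custom metrics in step 7, a minimal Python sketch using the `prometheus_client` library is shown below. The metric names, labels, and port are placeholders; real KPIs, labels, and alerting thresholds come from the monitoring strategy defined in step 1.

```python
from prometheus_client import Counter, Histogram, start_http_server
import random
import time

# Placeholder metric names and labels; replace with the KPIs defined in step 1.
REQUESTS_TOTAL = Counter("app_requests_total", "Total requests processed", ["endpoint", "status"])
REQUEST_LATENCY = Histogram("app_request_latency_seconds", "Request latency in seconds", ["endpoint"])

def handle_request(endpoint: str) -> None:
    with REQUEST_LATENCY.labels(endpoint=endpoint).time():
        time.sleep(random.uniform(0.01, 0.05))  # stand-in for real request handling
    REQUESTS_TOTAL.labels(endpoint=endpoint, status="200").inc()

if __name__ == "__main__":
    start_http_server(8000)  # exposes /metrics for Prometheus to scrape
    while True:
        handle_request("/api/orders")
```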

268
commands/pr-deploy-workflow.md Executable file
View File

@@ -0,0 +1,268 @@
---
model: claude-haiku-4-5-20251001
allowed-tools: SlashCommand, Bash(git:*), mcp__ado__repo_get_pull_request_by_id, mcp__ado__repo_list_pull_requests_by_repo_or_project
argument-hint: [commit-message]
description: Complete deployment workflow - commit, PR to staging, review, then staging to develop
---
# Complete Deployment Workflow
Automates the full deployment workflow with integrated PR review:
1. Commit feature changes and create PR to staging
2. Automatically review the PR for quality and standards
3. Fix any issues identified in review (with iteration loop)
4. After PR is approved and merged, create PR from staging to develop
## What This Does
1. Calls `/pr-feature-to-staging` to commit and create feature → staging PR
2. Calls `/pr-review` to automatically review the PR
3. If review identifies issues → calls `/pr-fix-pr-review` and loops back to review
4. If review passes → waits for user to merge staging PR
5. Calls `/pr-staging-to-develop` to create staging → develop PR
## Implementation Logic
### Step 1: Create Feature PR to Staging
Use `SlashCommand` tool to execute:
```
/pr-feature-to-staging $ARGUMENTS
```
**Expected Output:**
- PR URL and PR ID
- Work item comments added
- Source and target branches confirmed
**Extract from output:**
- PR ID (needed for review step)
- PR number (for user reference)
### Step 2: Automated PR Review
Use `SlashCommand` tool to execute:
```
/pr-review [PR_ID]
```
**The review will evaluate:**
- Code quality and maintainability
- PySpark best practices
- ETL pattern compliance
- Standards compliance from `.claude/rules/python_rules.md`
- DevOps considerations
- Merge conflicts
**Review Outcomes:**
#### Outcome A: Review Passes (PR Approved)
Review output will indicate:
- "PR approved and set to auto-complete"
- No active review comments requiring changes
- All quality gates passed
**Action:** Proceed to Step 4
#### Outcome B: Review Requires Changes
Review output will indicate:
- Active review comments with specific issues
- Quality standards not met
- Files requiring modifications
**Action:** Proceed to Step 3
### Step 3: Fix Review Issues (if needed)
**Only execute if Step 2 identified issues**
Use `SlashCommand` tool to execute:
```
/pr-fix-pr-review [PR_ID]
```
**This will:**
1. Retrieve all active review comments
2. Make code changes to address feedback
3. Run quality gates (syntax, lint, format)
4. Commit fixes and push to feature branch
5. Reply to review threads
6. Update the PR automatically
**After fixes are applied:**
- Loop back to Step 2 to re-review
- Continue iterating until review passes
**Iteration Logic:**
```
LOOP while review has active issues:
1. /pr-fix-pr-review [PR_ID]
2. /pr-review [PR_ID]
3. Check review outcome
4. If approved → exit loop
5. If still has issues → continue loop
END LOOP
```
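The same loop, expressed as a minimal Python sketch. The `review` and `fix` callables are hypothetical stand-ins for the `/pr-review` and `/pr-fix-pr-review` slash commands, and the iteration cap is a suggested safeguard rather than part of the documented workflow.

```python
from typing import Callable

def review_until_approved(pr_id: int, review: Callable[[int], list[str]], fix: Callable[[int], None], max_iterations: int = 5) -> bool:
    # Iterate review -> fix until no active issues remain, with a cap so a
    # stubborn PR is escalated for manual attention instead of looping forever.
    for _ in range(max_iterations):
        issues = review(pr_id)   # stand-in for /pr-review
        if not issues:
            return True          # approved: exit the loop
        fix(pr_id)               # stand-in for /pr-fix-pr-review
    return False                 # escalate for manual review
```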
### Step 4: Wait for Staging PR Merge
After PR review passes and is approved, inform user:
```
✅ PR Review Passed - PR Approved and Ready
PR #[PR_ID] has been reviewed and approved with auto-complete enabled.
Review Summary:
- Code quality: ✓ Passed
- PySpark best practices: ✓ Passed
- ETL patterns: ✓ Passed
- Standards compliance: ✓ Passed
- No merge conflicts
Next Steps:
1. The PR will auto-merge when all policies are satisfied
2. Once merged to staging, I'll create the staging → develop PR
Would you like me to:
a) Create the staging → develop PR now (if staging merge is complete)
b) Wait for you to confirm the staging merge
c) Check the PR status
Enter choice (a/b/c):
```
**User Responses:**
- **a**: Immediately proceed to Step 5
- **b**: Wait for user confirmation, then proceed to Step 5
- **c**: Use `mcp__ado__repo_get_pull_request_by_id` to check if PR is merged, then guide user
### Step 5: Create Staging to Develop PR
Use `SlashCommand` tool to execute:
```
/pr-staging-to-develop
```
**This will:**
1. Create PR: staging → develop
2. Handle any merge conflicts
3. Return PR URL for tracking
**Final Output:**
```
🚀 Deployment Workflow Complete
Feature → Staging:
- PR #[PR_ID] - Reviewed and Merged ✓
Staging → Develop:
- PR #[NEW_PR_ID] - Created and Ready for Review
- URL: [PR_URL]
Summary:
1. Feature PR created and reviewed
2. All quality gates passed
3. PR approved and merged to staging
4. Staging PR created for develop
The workflow is complete. The staging → develop PR is now ready for final review and deployment.
```
## Example Usage
### Full Workflow with Work Item
```bash
/deploy-workflow "feat(gold): add X_MG_Offender linkage table #45497"
```
**This will:**
1. Create commit on feature branch
2. Create PR: feature → staging
3. Comment on work item #45497
4. Automatically review PR for quality
5. Fix any issues identified (with iteration)
6. Wait for staging PR merge
7. Create PR: staging → develop
### Full Workflow Without Work Item
```bash
/deploy-workflow "refactor: optimise session management"
```
**This will:**
1. Create commit on feature branch
2. Create PR: feature → staging
3. Automatically review PR
4. Fix any issues (iterative)
5. Wait for merge confirmation
6. Create staging → develop PR
## Review Iteration Example
**Scenario:** Review finds 3 issues in the initial PR
```
Step 1: /pr-feature-to-staging "feat: add new table"
→ PR #5678 created
Step 2: /pr-review 5678
→ Found 3 issues:
- Missing type hints in function
- Line exceeds 240 characters
- Missing @synapse_error_print_handler decorator
Step 3: /pr-fix-pr-review 5678
→ Fixed all 3 issues
→ Committed and pushed
→ PR updated
Step 2 (again): /pr-review 5678
→ All issues resolved
→ PR approved ✓
Step 4: Wait for merge confirmation
Step 5: /pr-staging-to-develop
→ PR #5679 created (staging → develop)
Complete!
```
## Error Handling
### PR Creation Fails
- Display error from `/pr-feature-to-staging`
- Guide user to resolve (branch validation, git issues)
- Do not proceed to review step
### Review Cannot Complete
- Display specific blocker (merge conflicts, missing files)
- Guide user to manual resolution
- Offer to retry review after fix
### Fix PR Review Fails
- Display specific errors (quality gates, git issues)
- Offer manual intervention option
- Allow user to fix locally and skip to next step
### Staging PR Already Exists
- Use `mcp__ado__repo_list_pull_requests_by_repo_or_project` to check existing PRs
- Inform user of existing PR
- Ask if they want to create anyway or use existing
## Notes
- **Automated Review**: Quality gates are enforced automatically
- **Iterative Fixes**: Will loop through fix → review until approved
- **Semi-Automated Merge**: User must confirm staging merge before final PR
- **Work Item Tracking**: Automatic comments on linked work items
- **Quality First**: Won't proceed if review fails and can't auto-fix
- **Graceful Degradation**: Offers manual intervention at each step if automation fails
## Quality Gates Enforced
The integrated `/pr-review` checks:
1. Code quality (type hints, line length, formatting)
2. PySpark best practices (DataFrame ops, logging, session mgmt)
3. ETL pattern compliance (class structure, decorators)
4. Standards from `.claude/rules/python_rules.md`
5. No merge conflicts
6. Proper error handling
All must pass before proceeding to staging → develop PR.

233
commands/pr-feature-to-staging.md Executable file
View File

@@ -0,0 +1,233 @@
---
model: claude-haiku-4-5-20251001
allowed-tools: Bash(git add:*), Bash(git status:*), Bash(git commit:*), Bash(git diff:*), Bash(git log:*), Bash(git push:*), Bash(git pull:*), Bash(git branch:*), mcp__*, mcp__ado__repo_list_branches_by_repo, mcp__ado__repo_search_commits, mcp__ado__repo_create_pull_request, mcp__ado__repo_get_pull_request_by_id, mcp__ado__repo_get_repo_by_name_or_id, mcp__ado__wit_add_work_item_comment, mcp__ado__wit_get_work_item, Read, Glob
argument-hint:
description: Automatically analyze changes and create PR from current feature branch to staging
---
# Create Feature PR to Staging
Automatically analyzes repository changes, generates appropriate commit message, and creates pull request to `staging`.
## Repository Configuration
- **Project**: Program Unify
- **Repository ID**: e030ea00-2f85-4b19-88c3-05a864d7298d
- **Repository Name**: unify_2_1_dm_synapse_env_d10
- **Target Branch**: `staging` (fixed)
- **Source Branch**: Current feature branch
## Current Repository State
- Git status: !`git status --short`
- Current branch: !`git branch --show-current`
- Staged changes: !`git diff --cached --stat`
- Unstaged changes: !`git diff --stat`
- Recent commits: !`git log --oneline -5`
## Implementation Logic
### 1. Validate Current Branch
- Get current branch: `git branch --show-current`
- **REQUIRE**: Branch must start with `feature/`
- **BLOCK**: `staging`, `develop`, `main` branches
- If validation fails: Show clear error and exit
### 2. Analyze Changes and Generate Commit Message
- Run `git status --short` to see modified files
- Run `git diff --stat` to see change statistics
- Run `git diff` to analyze actual code changes
- **Automatically determine**:
- **Type**: Based on file changes (feat, fix, refactor, docs, test, chore, etc.)
- **Scope**: From file paths (bronze, silver, gold, utilities, pipeline, etc.)
- **Description**: Concise summary of what changed (e.g., "add person address table", "fix deduplication logic")
- **Work Items**: Extract from branch name pattern (e.g., feature/46225-description → #46225)
- **Analysis Rules**:
- New files in gold/silver/bronze → `feat`
- Modified transformation logic → `refactor` or `fix`
- Test files → `test`
- Documentation → `docs`
- Utilities/session_optimiser → `refactor` or `feat`
- Multiple file types → prioritize feat > fix > refactor
- Gold layer → scope: `(gold)`
- Silver layer → scope: `(silver)` or `(silver_<database>)`
- Bronze layer → scope: `(bronze)`
- Generate commit message in format: `emoji type(scope): description #workitem`
### 3. Execute Commit Workflow
- Stage all changes: `git add .`
- Create commit with auto-generated emoji conventional format
- Run pre-commit hooks (ruff lint/format, YAML validation, etc.)
- Push to current feature branch
### 4. Create Pull Request
- Use `mcp__ado__repo_create_pull_request` with:
- `repositoryId`: e030ea00-2f85-4b19-88c3-05a864d7298d
- `sourceRefName`: Current feature branch (refs/heads/feature/*)
- `targetRefName`: refs/heads/staging
- `title`: Extract from auto-generated commit message
- `description`: Brief summary with bullet points based on analyzed changes
- Return PR URL to user
### 5. Add Work Item Comments (Automatic)
If PR creation was successful:
- Get work items linked to PR using `mcp__ado__repo_get_pull_request_by_id`
- For each linked work item:
- Verify work item exists with `mcp__ado__wit_get_work_item`
- Generate comment with:
- PR title and number
- Commit message and SHA
- File changes summary from `git diff --stat`
- Link to PR in Azure DevOps
- Link to commit in Azure DevOps
- Add comment using `mcp__ado__wit_add_work_item_comment`
- Use HTML format for rich formatting
- **IMPORTANT**: Do NOT include footer attribution text
- **IMPORTANT**: Always use Australian English in all messages and descriptions
- **IMPORTANT**: Do not mention that you are using Australian English in any messages or descriptions
## Commit Message Format
### Type + Emoji Mapping
- ✨ `feat`: New feature
- 🐛 `fix`: Bug fix
- 📝 `docs`: Documentation
- 💄 `style`: Formatting/style
- ♻️ `refactor`: Code refactoring
- ⚡️ `perf`: Performance improvements
- ✅ `test`: Tests
- 🔧 `chore`: Tooling, configuration
- 🚀 `ci`: CI/CD improvements
- 🗃️ `db`: Database changes
- 🔥 `fix`: Remove code/files
- 📦️ `chore`: Dependencies
- 🚸 `feat`: UX improvements
- 🦺 `feat`: Validation
### Example Format
```
✨ feat(gold): add X_MG_Offender linkage table #45497
```
### Auto-Generation Logic
**File Path Analysis**:
- `python_files/gold/*.py` → scope: `(gold)`
- `python_files/silver/s_fvms_*.py` → scope: `(silver_fvms)` or `(silver)`
- `python_files/silver/s_cms_*.py` → scope: `(silver_cms)` or `(silver)`
- `python_files/bronze/*.py` → scope: `(bronze)`
- `python_files/utilities/*.py` → scope: `(utilities)`
- `python_files/pipeline_operations/*.py` → scope: `(pipeline)`
- `python_files/testing/*.py` → scope: `(test)`
- `.claude/**`, `*.md` → scope: `(docs)`
**Change Type Detection**:
- New files (`A` in git status) → `feat` ✨
- Modified transformation/ETL files → `refactor` ♻️
- Bug fixes (keywords: fix, bug, error, issue) → `fix` 🐛
- Test files → `test` ✅
- Documentation files → `docs` 📝
- Configuration files → `chore` 🔧
**Description Generation**:
- Extract meaningful operation from file names and diffs
- New table: "add <table_name> table"
- Modified logic: "improve/update <functionality>"
- Bug fix: "fix <issue_description>"
- Refactor: "refactor <component> for <reason>"
**Work Item Extraction**:
- Branch name pattern: `feature/<number>-description` → `#<number>`
- Multiple numbers: Extract first occurrence
- No number in branch: No work item reference added
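A minimal Python sketch of the scope and work item rules above. The mapping and helper names are illustrative; the command itself derives this information from the git diff rather than from a lookup table.

```python
import re

# Illustrative prefix-to-scope mapping based on the file path rules above.
SCOPE_BY_PATH = {
    "python_files/gold/": "gold",
    "python_files/silver/": "silver",
    "python_files/bronze/": "bronze",
    "python_files/utilities/": "utilities",
    "python_files/pipeline_operations/": "pipeline",
    "python_files/testing/": "test",
}

def infer_scope(changed_file: str) -> str:
    for prefix, scope in SCOPE_BY_PATH.items():
        if changed_file.startswith(prefix):
            return scope
    return "docs"  # .claude/** and *.md fall through to the docs scope

def work_item_from_branch(branch: str) -> str | None:
    # feature/46225-add-person-address-table -> "#46225"
    match = re.match(r"feature/(\d+)", branch)
    return f"#{match.group(1)}" if match else None

infer_scope("python_files/gold/g_occ_person_address.py")          # -> "gold"
work_item_from_branch("feature/46225-add-person-address-table")   # -> "#46225"
```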
## What This Command Does
1. Validates you're on a feature branch (feature/*)
2. Analyzes git changes to determine type, scope, and description
3. Extracts work item numbers from branch name
4. Auto-generates commit message with conventional emoji format
5. Stages all modified files
6. Creates commit with auto-generated message
7. Runs pre-commit hooks (auto-fixes code quality issues)
8. Pushes to current feature branch
9. Creates PR from feature branch → staging
10. Automatically adds comments to linked work items with PR details
## Pre-Commit Hooks
Your project uses pre-commit with:
- **Ruff**: Linting with auto-fix + formatting
- **Standard hooks**: Trailing whitespace, YAML/JSON validation
- **Security**: Private key detection
Pre-commit hooks will auto-fix issues and may modify files. The commit process will:
1. Run hooks
2. Auto-stage modified files
3. Complete commit with fixes applied
## Example Usage
### Automatic Feature PR
```bash
/pr-feature-to-staging
```
**On branch**: `feature/46225-add-person-address-table`
**Changed files**: `python_files/gold/g_occ_person_address.py` (new file)
**Auto-generated commit**: `✨ feat(gold): add person address table #46225`
This will:
1. Analyze changes (new gold layer file)
2. Extract work item #46225 from branch name
3. Auto-generate commit message
4. Commit and push to feature branch
5. Create PR: `feature/46225-add-person-address-table → staging`
6. Link work item #46225
7. Add automatic comment to work item #46225 with PR details
### Multiple File Changes
**On branch**: `feature/46789-refactor-deduplication`
**Changed files**:
- `python_files/silver/s_fvms_incident.py` (modified)
- `python_files/silver/s_cms_offence_report.py` (modified)
- `python_files/utilities/session_optimiser.py` (modified)
**Auto-generated commit**: `♻️ refactor(silver): improve deduplication logic #46789`
### Fix Bug
**On branch**: `feature/47123-fix-timestamp-parsing`
**Changed files**: `python_files/utilities/session_optimiser.py` (modified, TableUtilities.clean_date_time_columns)
**Auto-generated commit**: `🐛 fix(utilities): correct timestamp parsing for null values #47123`
## Error Handling
### Not on Feature Branch
```bash
# Error: On staging branch
/pr-feature-to-staging
```
**Result**: ERROR - Must be on feature/* branch. Current: staging
### Invalid Branch
```bash
# Error: On develop or main branch
/pr-feature-to-staging
```
**Result**: ERROR - Cannot create feature PR from develop/main branch
### No Changes to Commit
```bash
# Error: Working directory clean
/pr-feature-to-staging
```
**Result**: ERROR - No changes to commit. Working directory is clean.
## Best Practices
1. **Work on feature branches** - Always create PRs from `feature/*` branches
2. **Include work item in branch name** - Use pattern `feature/<work-item>-description` (e.g., `feature/46225-add-person-address`)
3. **Make focused changes** - Keep changes related to a single feature/fix for accurate commit message generation
4. **Let pre-commit work** - Hooks maintain code quality automatically
5. **Review changes** - Check `git status` before running command to ensure only intended files are modified
6. **Trust the automation** - The command analyzes your changes and generates appropriate conventional commit messages

294
commands/pr-fix-pr-review.md Executable file
View File

@@ -0,0 +1,294 @@
---
model: claude-haiku-4-5-20251001
allowed-tools: Bash(git:*), Read, Edit, Write, Task, mcp__*, mcp__ado__repo_get_pull_request_by_id, mcp__ado__repo_list_pull_request_threads, mcp__ado__repo_list_pull_request_thread_comments, mcp__ado__repo_reply_to_comment, mcp__ado__repo_resolve_comment, mcp__ado__repo_get_repo_by_name_or_id, mcp__ado__wit_add_work_item_comment
argument-hint: [PR_ID]
description: Address PR review feedback and update pull request
---
# Fix PR Review Issues
Address feedback from PR review comments, make necessary code changes, and update the pull request.
## Repository Configuration
- **Project**: Program Unify
- **Repository ID**: d3fa6f02-bfdf-428d-825c-7e7bd4e7f338
- **Repository Name**: unify_2_1_dm_synapse_env_d10
## What This Does
1. Retrieves PR details and all active review comments
2. Analyzes review feedback and identifies required changes
3. Makes code changes to address each review comment
4. Commits changes with descriptive message
5. Pushes to feature branch (automatically updates PR)
6. Replies to review threads confirming fixes
7. Resolves review threads when appropriate
## Implementation Logic
### 1. Get PR Information
- Use `mcp__ado__repo_get_pull_request_by_id` with PR_ID from `$ARGUMENTS`
- Extract source branch, target branch, and PR title
- Validate PR is still active
### 2. Retrieve Review Comments
- Use `mcp__ado__repo_list_pull_request_threads` to get all threads
- Filter for active threads (status = "Active")
- For each thread, use `mcp__ado__repo_list_pull_request_thread_comments` to get details
- Display all review comments with:
- File path and line number
- Reviewer name
- Comment content
- Thread ID (for later replies)
### 3. Checkout Feature Branch
```bash
git fetch origin
git checkout <source-branch-name>
git pull origin <source-branch-name>
```
### 4. Address Each Review Comment
**Categorise review comments first:**
#### Standard Code Quality Issues
Handle directly with Edit tool for:
- Type hints
- Line length violations
- Formatting issues
- Missing decorators
- Import organization
- Variable naming
**Implementation:**
1. Read affected file using Read tool
2. Analyze the feedback and determine required changes
3. Make code changes using Edit tool
4. Validate changes meet project standards
#### Complex PySpark Issues
**Use pyspark-engineer agent for:**
- Performance optimisation requests
- Partitioning strategy changes
- Shuffle optimisation
- Broadcast join refactoring
- Memory management improvements
- Medallion architecture violations
- Complex transformation logic
**Trigger criteria:**
- Review comment mentions: "performance", "optimisation", "partitioning", "shuffle", "memory", "medallion", "bronze/silver/gold layer"
- Files affected in: `python_files/pipeline_operations/`, `python_files/silver/`, `python_files/gold/`, `python_files/utilities/session_optimiser.py`
**Use Task tool to launch pyspark-engineer agent:**
```
Task tool parameters:
- subagent_type: "pyspark-engineer"
- description: "Implement PySpark fixes for PR #[PR_ID]"
- prompt: "
Address PySpark review feedback for PR #[PR_ID]:
Review Comment Details:
[For each PySpark-related comment, include:]
- File: [FILE_PATH]
- Line: [LINE_NUMBER]
- Reviewer Feedback: [COMMENT_TEXT]
- Thread ID: [THREAD_ID]
Implementation Requirements:
1. Read all affected files
2. Implement fixes following these standards:
- Maximum line length: 240 characters
- No blank lines inside functions
- Proper type hints for all functions
- Use @synapse_error_print_handler decorator
- PySpark DataFrame operations (not SQL)
- Suffix _sdf for all DataFrames
- Follow medallion architecture patterns
3. Optimize for:
- Performance and cost-efficiency
- Data skew handling
- Memory management
- Proper partitioning strategies
4. Ensure production readiness:
- Error handling
- Logging with NotebookLogger
- Idempotent operations
5. Run quality gates:
- Syntax validation: python3 -m py_compile
- Linting: ruff check python_files/
- Formatting: ruff format python_files/
Return:
1. List of files modified
2. Summary of changes made
3. Explanation of how each review comment was addressed
4. Any additional optimisations implemented
"
```
**Integration:**
- pyspark-engineer will read, modify, and validate files
- Agent will run quality gates automatically
- You will receive summary of changes
- Use summary for commit message and review replies
#### Validation for All Changes
Regardless of method (direct Edit or pyspark-engineer agent):
- Maximum line length: 240 characters
- No blank lines inside functions
- Proper type hints
- Use of `@synapse_error_print_handler` decorator
- PySpark best practices from `.claude/rules/python_rules.md`
- Document all fixes for commit message
### 5. Validate Changes
Run quality gates:
```bash
# Syntax check
python3 -m py_compile <changed-file>
# Linting
ruff check python_files/
# Format
ruff format python_files/
```
### 6. Commit and Push
```bash
git add .
git commit -m "♻️ refactor: address PR review feedback - <brief-summary>"
git push origin <source-branch>
```
**Commit Message Format:**
```
♻️ refactor: address PR review feedback
Fixes applied:
- <file1>: <description of fix>
- <file2>: <description of fix>
- ...
Review comments addressed in PR #<PR_ID>
```
### 7. Reply to Review Threads
For each addressed comment:
- Use `mcp__ado__repo_reply_to_comment` to add reply:
```
✅ Fixed in commit <SHA>
Changes made:
- <specific change description>
```
- Use `mcp__ado__repo_resolve_comment` to mark thread as resolved (if appropriate)
### 8. Report Results
Provide summary:
```
PR Review Fixes Completed
PR: #<PR_ID> - <PR_Title>
Branch: <source-branch> → <target-branch>
Review Comments Addressed: <count>
Files Modified: <file-list>
Commit SHA: <sha>
Quality Gates:
✓ Syntax validation passed
✓ Linting passed
✓ Code formatting applied
The PR has been updated and is ready for re-review.
```
## Error Handling
### No PR ID Provided
If `$ARGUMENTS` is empty:
- Use `mcp__ado__repo_list_pull_requests_by_repo_or_project` to list open PRs
- Display all PRs created by current user
- Prompt user to specify PR ID
### No Active Review Comments
If no active review threads found:
```
No active review comments found for PR #<PR_ID>.
The PR may already be approved or have no feedback requiring changes.
Would you like me to re-run /pr-review to check current status?
```
### Merge Conflicts
If `git pull` results in merge conflicts:
1. Display conflict files
2. Guide user through resolution:
- Show conflicting sections
- Suggest resolution based on context
- Use Edit tool to resolve
3. Complete merge commit
4. Continue with review fixes
### Quality Gate Failures
If syntax check or linting fails:
1. Display specific errors
2. Fix automatically if possible
3. Re-run quality gates
4. Only proceed to commit when all gates pass
## Example Usage
### Fix Review for Specific PR
```bash
/pr-fix-pr-review 5642
```
### Fix Review for Latest PR
```bash
/pr-fix-pr-review
```
(Will list your open PRs if ID not provided)
## Best Practices
1. **Read all comments first** - Understand full scope before making changes
2. **Make targeted fixes** - Address each comment specifically
3. **Run quality gates** - Ensure changes meet project standards
4. **Descriptive replies** - Explain what was changed and why
5. **Resolve appropriately** - Only resolve threads when fix is complete
6. **Test locally** - Consider running relevant tests if available
## Integration with /deploy-workflow
This command is automatically called by `/deploy-workflow` when:
- `/pr-review` identifies issues requiring changes
- The workflow needs to iterate on PR quality before merging
The workflow will loop:
1. `/pr-review` → identifies issues (may include pyspark-engineer deep analysis)
2. `/pr-fix-pr-review` → addresses issues
   - Standard fixes: Direct Edit tool usage
   - Complex PySpark fixes: pyspark-engineer agent handles implementation
3. `/pr-review` → re-validates
4. Repeat until PR is approved
**PySpark-Engineer Integration:**
- Automatically triggered for performance and architecture issues
- Ensures optimised, production-ready PySpark code
- Maintains consistency with medallion architecture patterns
- Validates test coverage and quality gates
## Notes
- **Automatic PR Update**: Pushing to source branch automatically updates the PR
- **No New PR Created**: This updates the existing PR, doesn't create a new one
- **Preserves History**: All review iterations are preserved in commit history
- **Thread Management**: Replies and resolutions are tracked in Azure DevOps
- **Quality First**: Will not commit changes that fail quality gates
- **Intelligent Delegation**: Routes simple fixes to Edit tool, complex PySpark issues to specialist agent
- **Expert Optimisation**: pyspark-engineer ensures performance and architecture best practices

206
commands/pr-review.md Executable file
View File

@@ -0,0 +1,206 @@
---
model: claude-haiku-4-5-20251001
allowed-tools: Bash(git branch:*), Bash(git status:*), Bash(git log:*), Bash(git diff:*), mcp__*, mcp__ado__repo_get_repo_by_name_or_id, mcp__ado__repo_list_pull_requests_by_repo_or_project, mcp__ado__repo_get_pull_request_by_id, mcp__ado__repo_list_pull_request_threads, mcp__ado__repo_list_pull_request_thread_comments, mcp__ado__repo_create_pull_request_thread, mcp__ado__repo_reply_to_comment, mcp__ado__repo_update_pull_request, mcp__ado__repo_search_commits, mcp__ado__pipelines_get_builds, Read, Task
argument-hint: [PR_ID] (optional - if not provided, will list all open PRs)
---
# PR Review and Approval
## Task
Review open pull requests in the current repository and approve/complete them if they meet quality standards.
## Instructions
### 1. Get Repository Information
- Use `mcp__ado__repo_get_repo_by_name_or_id` with:
- Project: `Program Unify`
- Repository: `unify_2_1_dm_synapse_env_d10`
- Extract repository ID: `d3fa6f02-bfdf-428d-825c-7e7bd4e7f338`
### 2. List Open Pull Requests
- Use `mcp__ado__repo_list_pull_requests_by_repo_or_project` with:
- Repository ID: `d3fa6f02-bfdf-428d-825c-7e7bd4e7f338`
- Status: `Active`
- If `$ARGUMENTS` provided, filter to that specific PR ID
- Display all open PRs with key details (ID, title, source/target branches, author)
### 3. Review Each Pull Request
For each PR (or the specified PR):
#### 3.1 Get PR Details
- Use `mcp__ado__repo_get_pull_request_by_id` to get full PR details
- Check merge status - if conflicts exist, stop and report
#### 3.2 Get PR Changes
- Use `mcp__ado__repo_search_commits` to get commits in the PR
- Identify files changed and scope of changes
#### 3.3 Review Code Quality
Read changed files and evaluate:
1. **Code Quality & Maintainability**
- Proper use of type hints and descriptive variable names
- Maximum line length (240 chars) compliance
- No blank lines inside functions
- Proper import organization
- Use of `@synapse_error_print_handler` decorator
- Proper error handling with meaningful messages
2. **PySpark Best Practices**
- DataFrame operations over raw SQL
- Proper use of `TableUtilities` methods
- Correct logging with `NotebookLogger`
- Proper session management
3. **ETL Pattern Compliance**
- Follows ETL class pattern for Silver/Gold layers
- Proper extract/transform/load method structure
- Correct database and table naming conventions
4. **Standards Compliance**
- Follows project coding standards from `.claude/rules/python_rules.md`
- No missing docstrings (unless explicitly instructed to omit)
- Proper use of configuration from `configuration.yaml`
#### 3.4 Review DevOps Considerations
1. **CI/CD Integration**
- Changes compatible with existing pipeline
- No breaking changes to deployment process
2. **Configuration & Infrastructure**
- Proper environment detection pattern
- Azure integration handled correctly
- No hardcoded paths or credentials
3. **Testing & Quality Gates**
- Syntax validation would pass
- Linting compliance (ruff check)
- Test coverage for new functionality
#### 3.5 Deep PySpark Analysis (Conditional)
**Only execute if PR modifies PySpark ETL code**
Check if PR changes affect:
- `python_files/pipeline_operations/bronze_layer_deployment.py`
- `python_files/pipeline_operations/silver_dag_deployment.py`
- `python_files/pipeline_operations/gold_dag_deployment.py`
- Any files in `python_files/silver/`
- Any files in `python_files/gold/`
- `python_files/utilities/session_optimiser.py`
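The file check above reduces to a simple prefix match. A sketch, with the helper name purely illustrative:

```python
PYSPARK_TRIGGER_PATHS = (
    "python_files/pipeline_operations/bronze_layer_deployment.py",
    "python_files/pipeline_operations/silver_dag_deployment.py",
    "python_files/pipeline_operations/gold_dag_deployment.py",
    "python_files/silver/",
    "python_files/gold/",
    "python_files/utilities/session_optimiser.py",
)

def needs_deep_pyspark_review(changed_files: list[str]) -> bool:
    # True if any changed file falls under one of the trigger paths above.
    return any(path.startswith(PYSPARK_TRIGGER_PATHS) for path in changed_files)
```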
**If PySpark files are modified, use Task tool to launch pyspark-engineer agent:**
```
Task tool parameters:
- subagent_type: "pyspark-engineer"
- description: "Deep PySpark analysis for PR #[PR_ID]"
- prompt: "
Perform expert-level PySpark analysis for PR #[PR_ID]:
PR Details:
- Title: [PR_TITLE]
- Changed Files: [LIST_OF_CHANGED_FILES]
- Source Branch: [SOURCE_BRANCH]
- Target Branch: [TARGET_BRANCH]
Review Requirements:
1. Read all changed PySpark files
2. Analyze transformation logic for:
- Partitioning strategies and data skew
- Shuffle optimisation opportunities
- Broadcast join usage and optimisation
- Memory management and caching strategies
- DataFrame operation efficiency
3. Validate Medallion Architecture compliance:
- Bronze layer: Raw data preservation patterns
- Silver layer: Cleansing and standardization
- Gold layer: Business model optimisation
4. Check performance considerations:
- Identify potential bottlenecks
- Suggest optimisation opportunities
- Validate cost-efficiency patterns
5. Verify test coverage:
- Check for pytest test files
- Validate test completeness
- Suggest missing test scenarios
6. Review production readiness:
- Error handling for data pipeline failures
- Idempotent operation design
- Monitoring and logging completeness
Provide detailed findings in this format:
## PySpark Analysis Results
### Critical Issues (blocking)
- [List any critical performance or correctness issues]
### Performance Optimisations
- [Specific optimisation recommendations]
### Architecture Compliance
- [Medallion architecture adherence assessment]
### Test Coverage
- [Test completeness and gaps]
### Recommendations
- [Specific actionable improvements]
Return your analysis for integration into the PR review.
"
```
**Integration of PySpark Analysis:**
- If pyspark-engineer identifies critical issues → Add to review comments
- If optimisations suggested → Add as optional improvement comments
- If architecture violations found → Add as required changes
- Include all findings in final review summary
### 4. Provide Review Comments
- Use `mcp__ado__repo_list_pull_request_threads` to check existing review comments
- If issues found, use `mcp__ado__repo_create_pull_request_thread` to add:
- Specific file-level comments with line numbers
- Clear description of issues
- Suggested improvements
- Mark as `Active` status if changes required
### 5. Approve and Complete PR (if satisfied)
**Only proceed if ALL criteria met:**
- No merge conflicts
- Code quality standards met
- PySpark best practices followed
- ETL patterns correct
- No DevOps concerns
- Proper error handling and logging
- Standards compliant
- **PySpark analysis (if performed) shows no critical issues**
- **Performance optimisations either implemented or deferred with justification**
- **Medallion architecture compliance validated**
**If approved:**
1. Use `mcp__ado__repo_update_pull_request` with:
- Set `autoComplete: true`
- Set `mergeStrategy: "NoFastForward"` (or "Squash" if many small commits)
- Set `deleteSourceBranch: false` (preserve branch history)
- Set `transitionWorkItems: true`
- Add approval comment explaining what was reviewed
2. Confirm completion with summary:
- PR ID and title
- Number of commits reviewed
- Key changes identified
- Approval rationale
### 6. Report Results
Provide comprehensive summary:
- Total open PRs reviewed
- PRs approved and completed (with IDs)
- PRs requiring changes (with summary of issues)
- PRs blocked by merge conflicts
- **PySpark analysis findings (if performed)**
- **Performance optimisation recommendations**
## Important Notes
- **No deferrals**: All identified issues must be addressed before approval
- **Immediate action**: If improvements needed, request them now - no "future work" comments
- **Thorough review**: Check both code quality AND DevOps considerations
- **Professional objectivity**: Prioritize technical accuracy over simply validating the author's changes
- **Merge conflicts**: Do NOT approve PRs with merge conflicts - report them for manual resolution

View File

@@ -0,0 +1,35 @@
---
model: claude-haiku-4-5-20251001
allowed-tools: Bash(git add:*), Bash(git status:*), Bash(git commit:*), Bash(git diff:*), Bash(git log:*), Bash(git push:*), Bash(git pull:*), Bash(git branch:*), mcp__*, mcp__ado__repo_list_branches_by_repo, mcp__ado__repo_search_commits, mcp__ado__repo_create_pull_request, mcp__ado__repo_get_pull_request_by_id, mcp__ado__repo_get_repo_by_name_or_id, mcp__ado__wit_add_work_item_comment, mcp__ado__wit_get_work_item
argument-hint: [message] | --no-verify | --amend | --pr-s | --pr-d | --pr-m
---
# Create Remote PR: staging → develop
## Task
Create a pull request from remote `staging` branch to remote `develop` branch using Azure DevOps MCP tools.
## Instructions
### 1. Create PR
- Use `mcp__ado__repo_create_pull_request` tool
- Source: `refs/heads/staging` (remote only - do NOT push local branches)
- Target: `refs/heads/develop`
- Repository ID: `d3fa6f02-bfdf-428d-825c-7e7bd4e7f338`
- Title: Clear, concise description with conventional commit emoji
- Description: Brief bullet points summarising changes (keep short)
### 2. Check for Merge Conflicts
- Use `mcp__ado__repo_get_pull_request_by_id` to verify PR status
- If merge conflicts exist, resolve them:
1. Create temporary branch from `origin/staging`
2. Merge `origin/develop` into temp branch
3. Resolve conflicts using Edit tool
4. Commit resolution: `🔀 Merge origin/develop into staging - resolve conflicts for PR #XXXX`
5. Push resolved merge to `origin/staging`
6. Clean up temp branch
### 3. Success Criteria
- PR created successfully
- No merge conflicts preventing approval
- PR ready for reviewer approval

183
commands/prime-claude.md Normal file
View File

@@ -0,0 +1,183 @@
---
name: prime-claude-md
description: Distill CLAUDE.md to essentials, moving detailed knowledge into skills for on-demand loading. Reduces context pollution by 80-90%.
argument-hint: [--analyze-only] | [--backup] | [--apply]
---
# Prime CLAUDE.md
Distill your CLAUDE.md file to only essential information, moving detailed knowledge into skills.
## Problem
Large CLAUDE.md files (400+ lines) are loaded into context for EVERY conversation:
- Wastes 5,000-15,000 tokens per conversation
- Reduces space for actual work
- Slows Claude's responses
- 80% of the content is rarely needed
## Solution
**Prime your CLAUDE.md**:
1. Keep only critical architecture and coding standards
2. Move detailed knowledge into skills (loaded on-demand)
3. Reduce from 400+ lines to ~100 lines
4. Save 80-90% context per conversation
## Usage
### Analyze Current CLAUDE.md
```bash
/prime-claude-md --analyze-only
```
Shows what would be moved to skills without making changes.
### Create Backup and Apply
```bash
/prime-claude-md --backup --apply
```
1. Backs up current CLAUDE.md to CLAUDE.md.backup
2. Creates supporting skills with detailed knowledge
3. Replaces CLAUDE.md with distilled version
4. Documents what was moved where
### Just Apply (No Backup)
```bash
/prime-claude-md --apply
```
## What Gets Distilled
### Kept in CLAUDE.md (Essential)
- Critical architecture concepts (high-level only)
- Mandatory coding standards (line length, blank lines, decorators)
- Quality gates (syntax check, linting, formatting)
- Essential commands (2-3 most common)
- References to skills for details
### Moved to Skills (Detailed Knowledge)
**project-architecture** skill:
- Detailed medallion architecture
- Pipeline execution flow
- Data source details
- Azure integration specifics
- Configuration management
- Testing architecture
**project-commands** skill:
- Complete make command reference
- All development workflows
- Azure operations
- Database operations
- Git operations
- Troubleshooting commands
**pyspark-patterns** skill:
- TableUtilities method documentation
- ETL class pattern details
- Logging standards
- DataFrame operation patterns
- JDBC connection patterns
- Performance tips
## Results
**Before Priming**:
- CLAUDE.md: 420 lines
- Context cost: ~12,000 tokens per conversation
- Skills: 0
- Knowledge: Always loaded
**After Priming**:
- CLAUDE.md: ~100 lines (76% reduction)
- Context cost: ~2,000 tokens per conversation (83% savings)
- Skills: 3 specialized skills
- Knowledge: Loaded only when needed
## Example Distilled CLAUDE.md
```markdown
# CLAUDE.md
**CRITICAL**: READ `.claude/rules/python_rules.md`
## Architecture
Medallion: Bronze → Silver → Gold
Core: `session_optimiser.py` (SparkOptimiser, NotebookLogger, TableUtilities)
## Essential Commands
python3 -m py_compile <file> # Must run
ruff check python_files/ # Must pass
make run_all # Full pipeline
## Coding Standards
- Line length: 240 chars
- No blank lines in functions
- Use @synapse_error_print_handler
- Use logger (not print)
## Skills Available
- project-architecture: Detailed architecture
- project-commands: Complete command reference
- pyspark-patterns: PySpark best practices
```
## Benefits
1. **Faster conversations**: Less context overhead
2. **Better responses**: More room for actual work
3. **On-demand knowledge**: Load only what you need
4. **Maintainable**: Easier to update focused skills
5. **Reusable pattern**: Apply to any repository
## Applying to Other Repositories
This command is repository-agnostic. To use on another repo:
1. Run `/prime-claude-md --analyze-only` to see what you have
2. Command will identify:
- Architectural concepts
- Command references
- Coding standards
- Configuration details
3. Creates appropriate skills based on content
4. Run `/prime-claude-md --apply` when ready
## Files Created
```
.claude/
├── CLAUDE.md # Distilled (100 lines)
├── CLAUDE.md.backup # Original (if --backup used)
└── skills/
├── project-architecture/
│ └── skill.md # Architecture details
├── project-commands/
│ └── skill.md # Command reference
└── pyspark-patterns/ # (project-specific)
└── skill.md # Code patterns
```
## Philosophy
**CLAUDE.md should answer**: "What's special about this repo?"
**Skills should answer**: "How do I do X in detail?"
## Task Execution
I will:
1. Read current CLAUDE.md (both project and global if exists)
2. Analyze content and categorize
3. Create distilled CLAUDE.md (essential only)
4. Create supporting skills with detailed knowledge
5. If --backup: Save CLAUDE.md.backup
6. If --apply: Replace CLAUDE.md with distilled version
7. Generate summary report of changes
---
**Current Project**: Unify Data Migration (PySpark/Azure Synapse)
Let me analyze your CLAUDE.md and create the distilled version with supporting skills.

607
commands/pyspark-errors.md Executable file
View File

@@ -0,0 +1,607 @@
# PySpark Error Fixing Command
## Objective
Execute `make gold_table` and systematically fix all errors encountered in the PySpark gold layer file using specialized agents. Errors may be code-based (syntax, type, runtime) or logical (incorrect joins, missing data, business rule violations).
## Agent Workflow (MANDATORY)
### Phase 1: Error Fixing with pyspark-engineer
**CRITICAL**: All PySpark error fixing MUST be performed by the `pyspark-engineer` agent. Do NOT attempt to fix errors directly.
1. Launch the `pyspark-engineer` agent with:
- Full error stack trace and context
- Target file path
- All relevant schema information from MCP server
- Data dictionary references
2. The pyspark-engineer will:
- Validate MCP server connectivity
- Query schemas and foreign key relationships
- Analyze and fix all errors systematically
- Apply fixes following project coding standards
- Run quality gates (py_compile, ruff check, ruff format)
### Phase 2: Code Review with code-reviewer
**CRITICAL**: After pyspark-engineer completes fixes, MUST launch the `code-reviewer` agent.
1. Launch the `code-reviewer` agent with:
- Path to the fixed file(s)
- Context: "PySpark gold layer error fixes"
- Request comprehensive review focusing on:
- PySpark best practices
- Join logic correctness
- Schema alignment
- Business rule implementation
- Code quality and standards adherence
2. The code-reviewer will provide:
- Detailed feedback on all issues found
- Security vulnerabilities
- Performance optimization opportunities
- Code quality improvements needed
### Phase 3: Iterative Refinement (MANDATORY LOOP)
**CRITICAL**: The review-refactor cycle MUST continue until code-reviewer is 100% satisfied.
1. If code-reviewer identifies ANY issues:
- Launch pyspark-engineer again with code-reviewer's feedback
- pyspark-engineer implements all recommended changes
- Launch code-reviewer again to re-validate
2. Repeat Phase 1 → Phase 2 → Phase 3 until:
- code-reviewer explicitly states: "✓ 100% SATISFIED - No further changes required"
- Zero issues, warnings, or concerns remain
- All quality gates pass
- All business rules validated
3. Only then is the error fixing task complete.
**DO NOT PROCEED TO COMPLETION** until code-reviewer gives explicit 100% satisfaction confirmation.
## Pre-Execution Requirements
### 1. Python Coding Standards (CRITICAL - READ FIRST)
**MANDATORY**: All code MUST follow `.claude/rules/python_rules.md` standards (a short compliant example follows this list):
- **Line 19**: Use DataFrame API not Spark SQL
- **Line 20**: Do NOT use DataFrame aliases (e.g., `.alias("l")`) or `col()` function - use direct string references or `df["column"]` syntax
- **Line 8**: Limit line length to 240 characters
- **Line 9-10**: Single line per statement, no carriage returns mid-statement
- **Line 10, 12**: No blank lines inside functions
- **Line 11**: Close parentheses on the last line of code
- **Line 5**: Use type hints for all function parameters and return values
- **Line 18**: Import statements only at the start of file, never inside functions
- **Line 16**: Run `ruff check` and `ruff format` before finalizing
- Import only necessary PySpark functions: `from pyspark.sql.functions import when, coalesce, lit` (NO col() usage - use direct references instead)
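For illustration, a single compliant statement might look like this minimal sketch (the `offence_df` DataFrame and its columns are assumed for the example):

```python
from pyspark.sql.functions import when
final_df = offence_df.withColumn("final_timestamp", when(offence_df["reported_date_time"].isNotNull(), offence_df["reported_date_time"]).otherwise(offence_df["date_created"]))
```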
### 2. Identify Target File
- Default target: `python_files/gold/<INSERT FILE NAME>.py`
- Override via Makefile: `G_RUN_FILE_NAME` variable (line 63)
- Verify file exists before execution
### 3. Environment Context
- **Runtime Environment**: Local development (not Azure Synapse)
- **Working Directory**: `/workspaces/unify_2_1_dm_synapse_env_d10`
- **Python Version**: 3.11+
- **Spark Mode**: Local cluster (`local[*]`)
- **Data Location**: `/workspaces/data` (parquet files)
### 4. Available Resources
- **Data Dictionary**: `.claude/data_dictionary/*.md` - schema definitions for all CMS, FVMS, NicheRMS tables
- **Configuration**: `configuration.yaml` - database lists, null replacements, Azure settings
- **MCP Schema Server**: `mcp-server-motherduck` - live schema access via MCP (REQUIRED for schema verification)
- **Utilities Module**: `python_files/utilities/session_optimiser.py` - TableUtilities, NotebookLogger, decorators
- **Example Files**: Other `python_files/gold/g_*.py` files for reference patterns
### 5. MCP Server Validation (CRITICAL)
**BEFORE PROCEEDING**, verify MCP server connectivity:
1. **Test MCP Server Connection**:
- Attempt to query any known table schema via MCP
- Example test: Query schema for a common table (e.g., `silver_cms.s_cms_offence_report`)
2. **Validation Criteria**:
- MCP server must respond with valid schema data
- Schema must include column names, data types, and nullability
- Response must be recent (not cached/stale data)
3. **Failure Handling**:
```
⚠️ STOP: MCP Server Not Available
The MCP server (mcp-server-motherduck) is not responding or not providing valid schema data.
This command requires live schema access to:
- Verify column names and data types
- Validate join key compatibility
- Check foreign key relationships
- Ensure accurate schema matching
Actions Required:
1. Check MCP server status and configuration
2. Verify MotherDuck connection credentials
3. Ensure schema database is accessible
4. Restart MCP server if necessary
Cannot proceed with error fixing without verified schema access.
Use data dictionary files as fallback, but warn user of potential schema drift.
```
4. **Success Confirmation**:
```
✓ MCP Server Connected
✓ Schema data available
✓ Proceeding with error fixing workflow
```
## Error Detection Strategy
### Phase 1: Execute and Capture Errors
1. Run: `make gold_table`
2. Capture full stack trace including:
- Error type (AttributeError, KeyError, AnalysisException, etc.)
- Line number and function name
- Failed DataFrame operation
- Column names involved
- Join conditions if applicable
### Phase 2: Categorize Error Types
#### A. Code-Based Errors
**Syntax/Import Errors**
- Missing imports from `pyspark.sql.functions`
- Incorrect function signatures
- Type hint violations
- Decorator usage errors
**Runtime Errors**
- `AnalysisException`: Column not found, table doesn't exist
- `AttributeError`: Calling non-existent DataFrame methods
- `KeyError`: Dictionary access failures
- `TypeError`: Incompatible data types in operations
**DataFrame Schema Errors**
- Column name mismatches (case sensitivity)
- Duplicate column names after joins
- Missing required columns for downstream operations
- Incorrect column aliases
#### B. Logical Errors
**Join Issues**
- **Incorrect Join Keys**: Joining on wrong columns (e.g., `offence_report_id` vs `cms_offence_report_id`)
- **Missing Table Aliases**: Ambiguous column references after joins
- **Wrong Join Types**: Using `inner` when `left` is required (or vice versa)
- **Cartesian Products**: Missing join conditions causing data explosion
- **Broadcast Misuse**: Not using `broadcast()` for small dimension tables
- **Duplicate Join Keys**: Multiple rows with same key causing row multiplication
**Aggregation Problems**
- Incorrect `groupBy()` columns
- Missing aggregation functions (`first()`, `last()`, `collect_list()`)
- Wrong window specifications
- Aggregating on nullable columns without `coalesce()`
**Business Rule Violations**
- Incorrect date/time logic (e.g., using `reported_date_time` when `date_created` should be fallback)
- Missing null handling for critical fields
- Status code logic errors
- Incorrect coalesce order
**Data Quality Issues**
- Expected vs actual row counts (use `logger.info(f"Expected X rows, got {df.count()}")`)
- Null propagation in critical columns
- Duplicate records not being handled
- Missing deduplication logic
## Systematic Debugging Process
### Step 1: Schema Verification
For each source table mentioned in the error:
1. **PRIMARY: Query MCP Server for Schema** (MANDATORY FIRST STEP):
- Use MCP tools to query table schema from MotherDuck
- Extract column names, data types, nullability, and constraints
- Verify foreign key relationships for join operations
- Cross-reference with error column names
**Example MCP Query Pattern**:
```
Query: "Get schema for table silver_cms.s_cms_offence_report"
Expected Response: Column list with types and constraints
```
**If MCP Server Fails**:
- STOP and warn user (see Section 4: MCP Server Validation)
- Do NOT proceed with fixing without schema verification
- Suggest user check MCP server configuration
2. **SECONDARY: Verify Schema Using Data Dictionary** (as supplementary reference):
- Read `.claude/data_dictionary/{source}_{table}.md`
- Compare MCP schema vs data dictionary for consistency
- Note any schema drift or discrepancies
- Alert user if schemas don't match
3. **Check Table Existence**:
```python
spark.sql("SHOW TABLES IN silver_cms").show()
```
4. **Inspect Actual Runtime Schema** (validate MCP data):
```python
df = spark.read.table("silver_cms.s_cms_offence_report")
df.printSchema()
df.select(df.columns[:10]).show(5, truncate=False)
```
**Compare** (a schema comparison sketch follows this step):
- MCP schema vs Spark runtime schema
- Report any mismatches to user
- Use runtime schema as source of truth if conflicts exist
5. **Use DuckDB Schema** (if available, as additional validation):
- Query schema.db for column definitions
- Check foreign key relationships
- Validate join key data types
- Triangulate: MCP + DuckDB + Data Dictionary should align
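One hedged way to surface mismatches between the MCP-reported columns and the runtime schema (the `mcp_columns` set is assumed to be built from the MCP response; `spark` and `logger` are the existing session and NotebookLogger):

```python
mcp_columns = {"cms_offence_report_id", "case_file_id", "reported_date_time", "date_created", "status_code"}
runtime_columns = set(spark.read.table("silver_cms.s_cms_offence_report").columns)
logger.info(f"Reported by MCP but missing at runtime: {sorted(mcp_columns - runtime_columns)}")
logger.info(f"Present at runtime but not reported by MCP: {sorted(runtime_columns - mcp_columns)}")
```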
### Step 2: Join Logic Validation
For each join operation:
1. **Use MCP Server to Validate Join Relationships**:
- Query foreign key constraints from MCP schema server
- Identify correct join column names and data types
- Verify parent-child table relationships
- Confirm join key nullability (affects join results)
**Example MCP Queries**:
```
Query: "Show foreign keys for table silver_cms.s_cms_offence_report"
Query: "What columns link s_cms_offence_report to s_cms_case_file?"
Query: "Get data type for column cms_offence_report_id in silver_cms.s_cms_offence_report"
```
**If MCP Returns No Foreign Keys**:
- Fall back to data dictionary documentation
- Check `.claude/data_dictionary/` for relationship diagrams
- Manually verify join logic with business analyst
2. **Verify Join Keys Exist** (using MCP-confirmed column names):
```python
left_df.select("join_key_column").show(5)
right_df.select("join_key_column").show(5)
```
3. **Check Join Key Data Type Compatibility** (cross-reference with MCP schema):
```python
# Verify types match MCP schema expectations
left_df.select("join_key_column").dtypes
right_df.select("join_key_column").dtypes
```
4. **Check Join Key Uniqueness**:
```python
left_df.groupBy("join_key_column").count().filter("count > 1").show()
```
5. **Validate Join Type**:
- `left`: Keep all left records (most common for fact-to-dimension)
- `inner`: Only matching records
- Use `broadcast()` for small lookup tables (< 10MB)
- Confirm join type matches MCP foreign key relationship (nullable FK → left join)
6. **Handle Ambiguous Columns**:
```python
# BEFORE (causes ambiguity if both tables have same column names)
joined_df = left_df.join(right_df, on="common_id", how="left")
# AFTER (select specific columns to avoid ambiguity)
left_cols = [c for c in left_df.columns]
right_cols = ["dimension_field"]
joined_df = left_df.join(right_df, on="common_id", how="left").select(left_cols + right_cols)
```
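In addition to the checks above, a left anti-join gives a quick measure of how many left-side rows will fail to match - a hedged sketch reusing the generic names from the snippets above:

```python
unmatched_df = left_df.join(right_df, left_df["join_key_column"] == right_df["join_key_column"], how="left_anti")
logger.info(f"Left rows with no match on the right: {unmatched_df.count()}")
```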
### Step 3: Aggregation Verification
1. **Check groupBy Columns**:
- Must include all columns not being aggregated
- Verify columns exist in DataFrame
2. **Validate Aggregation Functions**:
```python
from pyspark.sql.functions import min, max, first, count, sum, coalesce, lit
aggregated = df.groupBy("key").agg(min("date_column").alias("earliest_date"), max("date_column").alias("latest_date"), first("dimension_column", ignorenulls=True).alias("dimension"), count("*").alias("record_count"), coalesce(sum("amount"), lit(0)).alias("total_amount"))
```
3. **Test Aggregation Logic** (a small sketch follows this step):
- Run aggregation on small sample
- Compare counts before/after
- Check for unexpected nulls
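A minimal sketch of that check, reusing the `df` and `aggregated` DataFrames from the example above:

```python
null_dimension_count = aggregated.filter(aggregated["dimension"].isNull()).count()
logger.info(f"Aggregation input rows: {df.count()}, output rows: {aggregated.count()}, null dimension values: {null_dimension_count}")
```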
### Step 4: Business Rule Testing
1. **Verify Timestamp Logic**:
```python
from pyspark.sql.functions import when
df.select("reported_date_time", "date_created", when(df["reported_date_time"].isNotNull(), df["reported_date_time"]).otherwise(df["date_created"]).alias("final_timestamp")).show(10)
```
2. **Test Null Handling**:
```python
from pyspark.sql.functions import coalesce, lit
df.select("primary_field", "fallback_field", coalesce(df["primary_field"], df["fallback_field"], lit(0)).alias("result")).show(10)
```
3. **Validate Status/Lookup Logic**:
- Check status code mappings against data dictionary
- Verify conditional logic matches business requirements
## Common Error Patterns and Fixes
### Pattern 1: Column Not Found After Join
**Error**: `AnalysisException: Column 'offence_report_id' not found`
**Root Cause**: Incorrect column name - verify column exists using MCP schema
**Fix**:
```python
# BEFORE - wrong column name
df = left_df.join(right_df, on="offence_report_id", how="left")
# AFTER - MCP-verified correct column name
df = left_df.join(right_df, on="cms_offence_report_id", how="left")
# If joining on different column names between tables:
df = left_df.join(right_df, left_df["cms_offence_report_id"] == right_df["offence_report_id"], how="left")
```
### Pattern 2: Duplicate Column Names
**Error**: Multiple columns with same name causing selection issues
**Fix**:
```python
# BEFORE - causes duplicate 'id' column
joined = left_df.join(right_df, left_df["id"] == right_df["id"], how="left")
# AFTER - drop the duplicate column from the right side after the join
joined = left_df.join(right_df, left_df["id"] == right_df["id"], how="left").drop(right_df["id"])
# OR - rename columns to avoid duplicates
right_df_renamed = right_df.withColumnRenamed("id", "related_id")
joined = left_df.join(right_df_renamed, left_df["id"] == right_df_renamed["related_id"], how="left")
```
### Pattern 3: Incorrect Aggregation
**Error**: Column not in GROUP BY causing aggregation failure
**Fix**:
```python
from pyspark.sql.functions import min, first
# BEFORE - non-aggregated column not in groupBy
df.groupBy("key1").agg(min("date_field"), "non_aggregated_field")
# AFTER - all non-grouped columns must be aggregated
df = df.groupBy("key1").agg(min("date_field").alias("min_date"), first("non_aggregated_field", ignorenulls=True).alias("non_aggregated_field"))
```
### Pattern 4: Join Key Mismatch
**Error**: No matching records or unexpected cartesian product
**Fix**:
```python
left_df.select("join_key").show(20)
right_df.select("join_key").show(20)
left_df.select("join_key").dtypes
right_df.select("join_key").dtypes
left_df.filter(left_df["join_key"].isNull()).count()
right_df.filter(right_df["join_key"].isNull()).count()
result = left_df.join(right_df, left_df["join_key"].cast("int") == right_df["join_key"].cast("int"), how="left")
```
### Pattern 5: Missing Null Handling
**Error**: Unexpected nulls propagating through transformations
**Fix**:
```python
from pyspark.sql.functions import coalesce, lit
# BEFORE - NULL if either field is NULL
df = df.withColumn("result", df["field1"] + df["field2"])
# AFTER - handle nulls with coalesce
df = df.withColumn("result", coalesce(df["field1"], lit(0)) + coalesce(df["field2"], lit(0)))
```
## Validation Requirements
After fixing errors, validate (a short sketch follows this list):
1. **Row Counts**: Log and verify expected vs actual counts at each transformation
2. **Schema**: Ensure output schema matches target table requirements
3. **Nulls**: Check critical columns for unexpected nulls
4. **Duplicates**: Verify uniqueness of ID columns
5. **Data Ranges**: Check timestamp ranges and numeric bounds
6. **Join Results**: Sample joined records to verify correctness
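A hedged sketch of the row count, duplicate, and null checks (the `final_df` name and `id_column` are assumed for illustration):

```python
logger.info(f"Final row count: {final_df.count()}")
duplicate_ids = final_df.groupBy("id_column").count().filter("count > 1").count()
logger.info(f"Duplicate id_column values: {duplicate_ids}")
null_ids = final_df.filter(final_df["id_column"].isNull()).count()
logger.info(f"Null id_column values: {null_ids}")
```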
## Logging Requirements
Use `NotebookLogger` throughout:
```python
logger = NotebookLogger()
# Start of operation
logger.info(f"Starting extraction from {table_name}")
# After DataFrame creation
logger.info(f"Extracted {df.count()} records from {table_name}")
# After join
logger.info(f"Join completed: {joined_df.count()} records (expected ~X)")
# After transformation
logger.info(f"Transformation complete: {final_df.count()} records")
# On error
logger.error(f"Failed to process {table_name}: {error_message}")
# On success
logger.success(f"Successfully loaded {target_table_name}")
```
## Quality Gates (Must Run After Fixes)
```bash
# 1. Syntax validation
python3 -m py_compile python_files/gold/g_x_mg_cms_mo.py
# 2. Code quality check
ruff check python_files/gold/g_x_mg_cms_mo.py
# 3. Format code
ruff format python_files/gold/g_x_mg_cms_mo.py
# 4. Run fixed code
make gold_table
```
## Key Principles for PySpark Engineer Agent
1. **CRITICAL: Agent Workflow Required**: ALL error fixing must follow the 3-phase agent workflow (pyspark-engineer → code-reviewer → iterative refinement until 100% satisfied)
2. **CRITICAL: Validate MCP Server First**: Before starting, verify MCP server connectivity and schema availability. STOP and warn user if unavailable.
3. **Always Query MCP Schema First**: Use MCP server to get authoritative schema data before fixing any errors. Cross-reference with data dictionary.
4. **Use MCP for Join Validation**: Query foreign key relationships from MCP to ensure correct join logic and column names.
5. **DataFrame API Without Aliases or col()**: Use DataFrame API (NOT Spark SQL). NO DataFrame aliases. NO col() function. Use direct string references (e.g., `"column_name"`) or df["column"] syntax (e.g., `df["column_name"]`). Import only needed functions (e.g., `from pyspark.sql.functions import when, coalesce`)
6. **Test Incrementally**: Fix one error at a time, validate, then proceed
7. **Log Everything**: Add logging at every transformation step
8. **Handle Nulls**: Always consider null cases in business logic (check MCP nullability constraints)
9. **Verify Join Logic**: Check join keys, types, and uniqueness before implementing (use MCP data types)
10. **Use Utilities**: Leverage `TableUtilities` methods (add_row_hash, save_as_table, clean_date_time_columns)
11. **Follow Patterns**: Reference working gold layer files for established patterns
12. **Validate Business Rules**: Confirm logic with MCP schema, data dictionary, and user story requirements
13. **Clean Code**: Adhere to project standards (240 char line length, no blank lines in functions, type hints, single line per statement)
14. **Triple-Check Schemas**: When schema mismatch occurs, verify MCP → Runtime → Data Dictionary alignment and report discrepancies
15. **Code Review Loop**: Continue refactoring until code-reviewer explicitly confirms 100% satisfaction with zero remaining issues
## Example Workflow with MCP Integration
### Scenario: Fixing AnalysisException - Column 'offence_report_id' not found
**Step 1: MCP Server Validation**
```
Agent: Testing MCP server connectivity...
MCP Query: "List databases available"
MCP Response: ✓ [bronze_cms, silver_cms, silver_fvms, gold_data_model, ...]
Agent: ✓ MCP Server Connected
Agent: ✓ Schema data available
Agent: ✓ Proceeding with error fixing workflow
```
**Step 2: Query Schema via MCP**
```
Agent: Querying schema for silver_cms.s_cms_offence_report via MCP...
MCP Query: "DESCRIBE TABLE silver_cms.s_cms_offence_report"
MCP Response:
- cms_offence_report_id (BIGINT, PRIMARY KEY, NOT NULL)
- case_file_id (BIGINT, FOREIGN KEY → s_cms_case_file.cms_case_file_id, NULLABLE)
- reported_date_time (TIMESTAMP, NULLABLE)
- date_created (TIMESTAMP, NOT NULL)
- status_code (VARCHAR, NULLABLE)
... (50+ more columns)
Agent: Found correct column name: 'cms_offence_report_id' (not 'offence_report_id')
```
**Step 3: Verify Join Relationship via MCP**
```
Agent: Checking foreign key relationship for join...
MCP Query: "Show foreign keys from s_cms_offence_report to s_cms_case_file"
MCP Response:
- s_cms_offence_report.case_file_id → s_cms_case_file.cms_case_file_id
- Relationship: MANY-TO-ONE
- Nullability: case_file_id is NULLABLE → Use LEFT JOIN
Agent: ✓ Join should be: LEFT JOIN on case_file_id = cms_case_file_id
```
**Step 4: Apply Fix with MCP-Verified Schema**
```python
# BEFORE (error)
offence_df = spark.read.table("silver_cms.s_cms_offence_report")
case_df = spark.read.table("silver_cms.s_cms_case_file")
joined = offence_df.join(case_df, on="offence_report_id", how="left")
# AFTER (MCP-verified) - Use DataFrame API with correct column names
# MCP-confirmed FK: case_file_id → cms_case_file_id
# MCP-confirmed nullable FK → LEFT JOIN
# MCP-confirmed PK: cms_offence_report_id
case_df_subset = case_df.select("cms_case_file_id", "case_file_number")
joined = offence_df.join(case_df_subset, offence_df["case_file_id"] == case_df_subset["cms_case_file_id"], how="left")
joined = joined.withColumnRenamed("cms_offence_report_id", "offence_report_id")
```
**Step 5: Validate Fix**
```
Agent: Running fixed code...
Agent: ✓ No AnalysisException
Agent: ✓ Join executed successfully
Agent: Row count: 45,823 (expected ~45,000)
Agent: ✓ Schema matches MCP expectations
```
## Success Criteria
### Phase 1: Initial Error Fixing (pyspark-engineer)
- [ ] **MCP Server validated and responding** (MANDATORY FIRST CHECK)
- [ ] Schema verified via MCP server for all source tables
- [ ] Foreign key relationships confirmed via MCP queries
- [ ] All syntax errors resolved
- [ ] All runtime errors fixed
- [ ] Join logic validated and correct (using MCP-confirmed column names and types)
- [ ] DataFrame API used (NOT Spark SQL) per python_rules.md line 19
- [ ] NO DataFrame aliases or col() function used - direct string references or df["column"] syntax only (per python_rules.md line 20)
- [ ] Code follows python_rules.md standards: 240 char lines, no blank lines in functions, single line per statement, imports at top only
- [ ] Row counts logged and reasonable
- [ ] Business rules implemented correctly
- [ ] Output schema matches requirements (cross-referenced with MCP schema)
- [ ] Code passes quality gates (py_compile, ruff check, ruff format)
- [ ] `make gold_table` executes successfully
- [ ] Target table created/updated in `gold_data_model` database
- [ ] No schema drift reported between MCP, Runtime, and Data Dictionary sources
### Phase 2: Code Review (code-reviewer)
- [ ] code-reviewer agent launched with fixed code
- [ ] Comprehensive review completed covering:
- [ ] PySpark best practices adherence
- [ ] Join logic correctness
- [ ] Schema alignment validation
- [ ] Business rule implementation accuracy
- [ ] Code quality and standards compliance
- [ ] Security vulnerabilities (none found)
- [ ] Performance optimization opportunities addressed
### Phase 3: Iterative Refinement (MANDATORY UNTIL 100% SATISFIED)
- [ ] All code-reviewer feedback items addressed by pyspark-engineer
- [ ] Re-review completed by code-reviewer
- [ ] Iteration cycle repeated until code-reviewer explicitly confirms:
- [ ] **"✓ 100% SATISFIED - No further changes required"**
- [ ] Zero remaining issues, warnings, or concerns
- [ ] All quality gates pass
- [ ] All business rules validated
- [ ] Code meets production-ready standards
### Final Approval
- [ ] **code-reviewer has explicitly confirmed 100% satisfaction**
- [ ] No outstanding issues or concerns remain
- [ ] Task is complete and ready for production deployment

116
commands/refactor-code.md Executable file
View File

@@ -0,0 +1,116 @@
# Intelligently Refactor and Improve Code Quality
## Instructions
Follow this systematic approach to refactor code: **$ARGUMENTS**
1. **Pre-Refactoring Analysis**
- Identify the code that needs refactoring and the reasons why
- Understand the current functionality and behavior completely
- Review existing tests and documentation
- Identify all dependencies and usage points
2. **Test Coverage Verification**
- Ensure comprehensive test coverage exists for the code being refactored
- If tests are missing, write them BEFORE starting refactoring
- Run all tests to establish a baseline
- Document current behavior with additional tests if needed
3. **Refactoring Strategy**
- Define clear goals for the refactoring (performance, readability, maintainability)
- Choose appropriate refactoring techniques:
- Extract Method/Function (see the short sketch at the end of this document)
- Extract Class/Component
- Rename Variable/Method
- Move Method/Field
- Replace Conditional with Polymorphism
- Eliminate Dead Code
- Plan the refactoring in small, incremental steps
4. **Environment Setup**
- Create a new branch: `git checkout -b refactor/$ARGUMENTS`
- Ensure all tests pass before starting
- Set up any additional tooling needed (profilers, analyzers)
5. **Incremental Refactoring**
- Make small, focused changes one at a time
- Run tests after each change to ensure nothing breaks
- Commit working changes frequently with descriptive messages
- Use IDE refactoring tools when available for safety
6. **Code Quality Improvements**
- Improve naming conventions for clarity
- Eliminate code duplication (DRY principle)
- Simplify complex conditional logic
- Reduce method/function length and complexity
- Improve separation of concerns
7. **Performance Optimizations**
- Identify and eliminate performance bottlenecks
- Optimize algorithms and data structures
- Reduce unnecessary computations
- Improve memory usage patterns
8. **Design Pattern Application**
- Apply appropriate design patterns where beneficial
- Improve abstraction and encapsulation
- Enhance modularity and reusability
- Reduce coupling between components
9. **Error Handling Improvement**
- Standardize error handling approaches
- Improve error messages and logging
- Add proper exception handling
- Enhance resilience and fault tolerance
10. **Documentation Updates**
- Update code comments to reflect changes
- Revise API documentation if interfaces changed
- Update inline documentation and examples
- Ensure comments are accurate and helpful
11. **Testing Enhancements**
- Add tests for any new code paths created
- Improve existing test quality and coverage
- Remove or update obsolete tests
- Ensure tests are still meaningful and effective
12. **Static Analysis**
- Run linting tools to catch style and potential issues
- Use static analysis tools to identify problems
- Check for security vulnerabilities
- Verify code complexity metrics
13. **Performance Verification**
- Run performance benchmarks if applicable
- Compare before/after metrics
- Ensure refactoring didn't degrade performance
- Document any performance improvements
14. **Integration Testing**
- Run full test suite to ensure no regressions
- Test integration with dependent systems
- Verify all functionality works as expected
- Test edge cases and error scenarios
15. **Code Review Preparation**
- Review all changes for quality and consistency
- Ensure refactoring goals were achieved
- Prepare clear explanation of changes made
- Document benefits and rationale
16. **Documentation of Changes**
- Create a summary of refactoring changes
- Document any breaking changes or new patterns
- Update project documentation if needed
- Explain benefits and reasoning for future reference
17. **Deployment Considerations**
- Plan deployment strategy for refactored code
- Consider feature flags for gradual rollout
- Prepare rollback procedures
- Set up monitoring for the refactored components
Remember: Refactoring should preserve external behavior while improving internal structure. Always prioritize safety over speed, and maintain comprehensive test coverage throughout the process.
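As a small illustration of the first technique listed in step 3, an Extract Method refactoring in Python (hypothetical example) might look like:

```python
# BEFORE - validation logic buried inside a longer function
def process_order(order: dict) -> float:
    if not order.get("items") or order.get("total", 0) <= 0:
        raise ValueError("Invalid order")
    return order["total"] * 0.9  # apply a 10% discount (illustrative)

# AFTER - validation extracted into a named, independently testable function
def validate_order(order: dict) -> None:
    if not order.get("items") or order.get("total", 0) <= 0:
        raise ValueError("Invalid order")

def process_order(order: dict) -> float:
    validate_order(order)
    return order["total"] * 0.9  # apply a 10% discount (illustrative)
```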

View File

@@ -0,0 +1,37 @@
---
allowed-tools: Read, Write, Edit, Bash
argument-hint: [environment-type] | --development | --production | --microservices | --compose
description: Setup Docker containerization with multi-stage builds and development workflows
model: sonnet
---
# Setup Docker Containers
Setup comprehensive Docker containerization for development and production: **$ARGUMENTS**
## Current Project State
- Application type: @package.json or @requirements.txt (detect Node.js, Python, etc.)
- Existing Docker: @Dockerfile or @docker-compose.yml (if exists)
- Dependencies: !`find . -name "package-lock.json" -o -name "poetry.lock" -o -name "Pipfile.lock" | wc -l`
- Services needed: Database, cache, message queue detection from configs
## Task
Implement production-ready Docker containerization with optimized builds and development workflows:
**Environment Type**: Use $ARGUMENTS to specify development, production, microservices, or Docker Compose setup
**Containerization Strategy**:
1. **Dockerfile Creation** - Multi-stage builds, layer optimization, security best practices
2. **Development Workflow** - Hot reloading, volume mounts, debugging capabilities
3. **Production Optimization** - Image size reduction, security scanning, health checks
4. **Multi-Service Setup** - Docker Compose, service discovery, networking configuration
5. **CI/CD Integration** - Build automation, registry management, deployment pipelines
6. **Monitoring & Logs** - Container observability, log aggregation, resource monitoring
**Security Features**: Non-root users, minimal base images, vulnerability scanning, secrets management.
**Performance Optimization**: Layer caching, build contexts, multi-platform builds, and resource constraints.
**Output**: Complete Docker setup with optimized containers, development workflows, production deployment, and comprehensive documentation.

153
commands/ultra-think.md Executable file
View File

@@ -0,0 +1,153 @@
# Deep Analysis and Problem Solving Mode
## Instructions
1. **Initialize Ultra Think Mode**
- Acknowledge the request for enhanced analytical thinking
- Set context for deep, systematic reasoning
- Prepare to explore the problem space comprehensively
2. **Parse the Problem or Question**
- Extract the core challenge from: **$ARGUMENTS**
- Identify all stakeholders and constraints
- Recognize implicit requirements and hidden complexities
- Question assumptions and surface unknowns
3. **Multi-Dimensional Analysis**
Approach the problem from multiple angles:
### Technical Perspective
- Analyze technical feasibility and constraints
- Consider scalability, performance, and maintainability
- Evaluate security implications
- Assess technical debt and future-proofing
### Business Perspective
- Understand business value and ROI
- Consider time-to-market pressures
- Evaluate competitive advantages
- Assess risk vs. reward trade-offs
### User Perspective
- Analyze user needs and pain points
- Consider usability and accessibility
- Evaluate user experience implications
- Think about edge cases and user journeys
### System Perspective
- Consider system-wide impacts
- Analyze integration points
- Evaluate dependencies and coupling
- Think about emergent behaviors
4. **Generate Multiple Solutions**
- Brainstorm at least 3-5 different approaches
- For each approach, consider:
- Pros and cons
- Implementation complexity
- Resource requirements
- Potential risks
- Long-term implications
- Include both conventional and creative solutions
- Consider hybrid approaches
5. **Deep Dive Analysis**
For the most promising solutions:
- Create detailed implementation plans
- Identify potential pitfalls and mitigation strategies
- Consider phased approaches and MVPs
- Analyze second and third-order effects
- Think through failure modes and recovery
6. **Cross-Domain Thinking**
- Draw parallels from other industries or domains
- Apply design patterns from different contexts
- Consider biological or natural system analogies
- Look for innovative combinations of existing solutions
7. **Challenge and Refine**
- Play devil's advocate with each solution
- Identify weaknesses and blind spots
- Consider "what if" scenarios
- Stress-test assumptions
- Look for unintended consequences
8. **Synthesize Insights**
- Combine insights from all perspectives
- Identify key decision factors
- Highlight critical trade-offs
- Summarize innovative discoveries
- Present a nuanced view of the problem space
9. **Provide Structured Recommendations**
Present findings in a clear structure:
```
## Problem Analysis
- Core challenge
- Key constraints
- Critical success factors
## Solution Options
### Option 1: [Name]
- Description
- Pros/Cons
- Implementation approach
- Risk assessment
### Option 2: [Name]
[Similar structure]
## Recommendation
- Recommended approach
- Rationale
- Implementation roadmap
- Success metrics
- Risk mitigation plan
## Alternative Perspectives
- Contrarian view
- Future considerations
- Areas for further research
```
10. **Meta-Analysis**
- Reflect on the thinking process itself
- Identify areas of uncertainty
- Acknowledge biases or limitations
- Suggest additional expertise needed
- Provide confidence levels for recommendations
## Usage Examples
```bash
# Architectural decision
/project:ultra-think Should we migrate to microservices or improve our monolith?
# Complex problem solving
/project:ultra-think How do we scale our system to handle 10x traffic while reducing costs?
# Strategic planning
/project:ultra-think What technology stack should we choose for our next-gen platform?
# Design challenge
/project:ultra-think How can we improve our API to be more developer-friendly while maintaining backward compatibility?
```
## Key Principles
- **First Principles Thinking**: Break down to fundamental truths
- **Systems Thinking**: Consider interconnections and feedback loops
- **Probabilistic Thinking**: Work with uncertainties and ranges
- **Inversion**: Consider what to avoid, not just what to do
- **Second-Order Thinking**: Consider consequences of consequences
## Output Expectations
- Comprehensive analysis (typically 2-4 pages of insights)
- Multiple viable solutions with trade-offs
- Clear reasoning chains
- Acknowledgment of uncertainties
- Actionable recommendations
- Novel insights or perspectives

672
commands/update-docs.md Executable file
View File

@@ -0,0 +1,672 @@
---
allowed-tools: Read, Write, Edit, Bash, Grep, Glob, Task, mcp__*
argument-hint: [doc-type] | --generate-local | --sync-to-wiki | --regenerate | --all | --validate
description: Generate documentation locally to ./docs/ then sync to Azure DevOps wiki (local-first workflow)
model: sonnet
---
# Data Pipeline Documentation - Local-First Workflow
Generate documentation locally in `./docs/` directory, then sync to Azure DevOps wiki: $ARGUMENTS
## Architecture: Local-First Documentation
```
Source Code → Generate Docs → ./docs/ (version controlled) → Sync to Wiki
```
**Benefits:**
- ✅ Documentation version controlled in git
- ✅ Review locally before wiki publish
- ✅ No regeneration needed for wiki sync
- ✅ Git diff shows doc changes
- ✅ Reusable across multiple targets (wiki, GitHub Pages, PDF)
- ✅ Offline access to documentation
## Repository Information
- Repository: unify_2_1_dm_synapse_env_d10
- Local docs: `./docs/` (mirrors repo structure)
- Wiki base: 'Unify 2.1 Data Migration Technical Documentation'/'Data Migration Pipeline'/unify_2_1_dm_synapse_env_d10/
- Exclusions: @.docsignore (similar to .gitignore)
## Documentation Workflows
### --generate-local: Generate Documentation Locally
Generate comprehensive documentation and save to `./docs/` directory.
#### Step 1: Scan Repository for Files
```bash
# Get all documentable files (exclude .docsignore patterns)
git ls-files "*.py" "*.yaml" "*.yml" "*.md" | grep -v -f <(git ls-files --ignored --exclude-standard --exclude-from=.docsignore)
```
**Target files:**
- Python files: `python_files/**/*.py`
- Configuration: `configuration.yaml`
- Existing markdown: `README.md` (validate/enhance)
**Exclude (from .docsignore):**
- `__pycache__/`, `*.pyc`, `.venv/`
- `.claude/`, `docs/`, `*.duckdb`
- See `.docsignore` for complete list
#### Step 2: Launch Code-Documenter Agent
Use Task tool to launch code-documenter agent:
```
Generate comprehensive documentation for repository files:
**Scope:**
- Target: All Python files in python_files/ (utilities, bronze, silver, gold, testing)
- Configuration files: configuration.yaml
- Exclude: Files matching .docsignore patterns
**Documentation Requirements:**
For Python files:
- File purpose and overview
- Architecture and design patterns (medallion, ETL, etc.)
- Class and function documentation
- Data flow explanations
- Business logic descriptions
- Dependencies and imports
- Usage examples
- Testing information
- Related Azure DevOps work items
For Configuration files:
- Configuration structure
- All configuration sections explained
- Environment variables
- Azure integration settings
- Usage examples
**Output Format:**
- Markdown format suitable for wiki
- File naming: source_file.py → docs/path/source_file.py.md
- Clear heading structure
- Code examples with syntax highlighting
- Cross-references to related files
- Professional, concise language
- NO attribution footers (e.g., "Documentation By: Claude Code")
**Output Location:**
Save all generated documentation to ./docs/ directory maintaining source structure:
- python_files/utilities/session_optimiser.py → docs/python_files/utilities/session_optimiser.py.md
- python_files/gold/g_address.py → docs/python_files/gold/g_address.py.md
- configuration.yaml → docs/configuration.yaml.md
**Directory Index Files:**
Generate README.md for each directory with:
- Directory purpose
- List of files with brief descriptions
- Architecture overview for layer directories
- Navigation links
```
#### Step 3: Generate Directory Index Files
Create `README.md` files for each directory:
**Root Index (docs/README.md):**
- Overall documentation structure
- Navigation to main sections
- Medallion architecture overview
- Link to wiki
**Layer Indexes:**
- `docs/python_files/README.md` - Pipeline overview
- `docs/python_files/utilities/README.md` - Core utilities index
- `docs/python_files/bronze/README.md` - Bronze layer overview
- `docs/python_files/silver/README.md` - Silver layer overview
- `docs/python_files/silver/cms/README.md` - CMS tables index
- `docs/python_files/silver/fvms/README.md` - FVMS tables index
- `docs/python_files/silver/nicherms/README.md` - NicheRMS tables index
- `docs/python_files/gold/README.md` - Gold layer overview
- `docs/python_files/testing/README.md` - Testing documentation
#### Step 4: Validation
Verify generated documentation:
- All source files have corresponding .md files in ./docs/
- Directory structure matches source repository
- Index files (README.md) created for directories
- Markdown formatting is valid
- No files from .docsignore included
- Cross-references are valid
#### Step 5: Summary Report
Provide detailed report:
```markdown
## Documentation Generation Complete
### Files Documented:
- Python files: [count]
- Configuration files: [count]
- Total documentation files: [count]
### Directory Structure:
- Utilities: [file count]
- Bronze layer: [file count]
- Silver layer: [file count by database]
- Gold layer: [file count]
- Testing: [file count]
### Index Files Created:
- Root index: docs/README.md
- Layer indexes: [list]
- Database indexes: [list]
### Location:
All documentation saved to: ./docs/
### Next Steps:
1. Review generated documentation: `ls -R ./docs/`
2. Make any manual edits if needed
3. Commit to git: `git add docs/`
4. Sync to wiki: `/update-docs --sync-to-wiki`
```
---
### --sync-to-wiki: Sync Local Docs to Azure DevOps Wiki
Copy documentation from `./docs/` to Azure DevOps wiki (no regeneration).
#### Step 1: Scan Local Documentation
```bash
# Find all .md files in ./docs/
find ./docs -name "*.md" -type f
```
**Path Mapping Logic:**
Local path → Wiki path conversion:
```
./docs/python_files/utilities/session_optimiser.py.md
Unify 2.1 Data Migration Technical Documentation/
Data Migration Pipeline/
unify_2_1_dm_synapse_env_d10/
python_files/utilities/session_optimiser.py
```
**Mapping rules:**
1. Remove `./docs/` prefix
2. Remove `.md` extension (unless README.md → README)
3. Prepend wiki base path
4. Use forward slashes for wiki paths
#### Step 2: Read and Process Each Documentation File
For each `.md` file in `./docs/`:
1. Read markdown content
2. Extract metadata (if present)
3. Generate wiki path from local path
4. Prepare content for wiki format
5. Add footer with metadata:
```markdown
---
**Metadata:**
- Source: [file path in repo]
- Last Updated: [date]
- Related Work Items: [links if available]
```
#### Step 3: Create/Update Wiki Pages Using ADO MCP
Use Azure DevOps MCP to create or update each wiki page:
```bash
# For each documentation file:
# 1. Check if wiki page exists
# 2. Create new page if not exists
# 3. Update existing page if exists
# 4. Verify success
# Example for session_optimiser.py.md:
Local: ./docs/python_files/utilities/session_optimiser.py.md
Wiki: Unify 2.1 Data Migration Technical Documentation/Data Migration Pipeline/unify_2_1_dm_synapse_env_d10/python_files/utilities/session_optimiser.py
Action: Create/Update wiki page with content
```
**ADO MCP Operations:**
```python
# Pseudo-code for sync operation
for doc_file in find_all_docs():
wiki_path = local_to_wiki_path(doc_file)
content = read_file(doc_file)
# Use MCP to create/update
mcp__Azure_DevOps__create_or_update_wiki_page(
path=wiki_path,
content=content
)
```
#### Step 4: Verification
After sync, verify:
- All .md files from ./docs/ have corresponding wiki pages
- Wiki path structure matches local structure
- Content is properly formatted in wiki
- No sync errors
- Wiki pages accessible in Azure DevOps
#### Step 5: Summary Report
Provide detailed sync report:
```markdown
## Wiki Sync Complete
### Pages Synced:
- Total pages: [count]
- Created new: [count]
- Updated existing: [count]
### By Directory:
- Utilities: [count] pages
- Bronze: [count] pages
- Silver: [count] pages
- CMS: [count] pages
- FVMS: [count] pages
- NicheRMS: [count] pages
- Gold: [count] pages
- Testing: [count] pages
### Wiki Location:
Base: Unify 2.1 Data Migration Technical Documentation/Data Migration Pipeline/unify_2_1_dm_synapse_env_d10/
### Verification:
- All pages synced successfully: [✅/❌]
- Path structure correct: [✅/❌]
- Content formatting valid: [✅/❌]
### Errors:
[List any sync failures and reasons]
### Next Steps:
1. Verify pages in Azure DevOps wiki
2. Check navigation and cross-references
3. Share wiki URL with team
```
---
### --regenerate: Regenerate Specific File(s)
Update documentation for specific file(s) without full regeneration.
**Usage:**
```bash
# Single file
/update-docs --regenerate python_files/gold/g_address.py
# Multiple files
/update-docs --regenerate python_files/gold/g_address.py python_files/gold/g_cms_address.py
# Entire directory
/update-docs --regenerate python_files/utilities/
```
**Process:**
1. Launch code-documenter agent for specified file(s)
2. Generate updated documentation
3. Save to ./docs/ (overwrite existing)
4. Report files updated
5. Optionally sync to wiki
**Output:**
```markdown
## Documentation Regenerated
### Files Updated:
- python_files/gold/g_address.py → docs/python_files/gold/g_address.py.md
### Next Steps:
1. Review updated documentation
2. Commit changes: `git add docs/python_files/gold/g_address.py.md`
3. Sync to wiki: `/update-docs --sync-to-wiki --directory python_files/gold/`
```
---
### --all: Complete Workflow
Execute complete documentation workflow: generate local + sync to wiki.
**Process:**
1. Execute `--generate-local` workflow
2. Validate generated documentation
3. Execute `--sync-to-wiki` workflow
4. Provide comprehensive summary
**Use when:**
- Initial documentation setup
- Major refactoring or restructuring
- Adding new layers or modules
- Quarterly documentation refresh
---
### --validate: Documentation Validation
Validate documentation completeness and accuracy.
**Validation Checks** (a completeness-check sketch follows this list):
1. **Completeness:**
- All source files have documentation
- All directories have index files (README.md)
- No missing cross-references
2. **Accuracy:**
- Documented functions exist in source
- Schema documentation matches actual tables
- Configuration docs match configuration.yaml
3. **Quality:**
- Valid markdown syntax
- Proper heading structure
- Code blocks properly formatted
- No broken links
4. **Sync Status:**
- ./docs/ files match wiki pages
- No uncommitted documentation changes
- Wiki pages up to date
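A minimal sketch of check 1 (completeness), assuming the repository and `./docs/` layout described above:

```python
from pathlib import Path
source_files = sorted(Path("python_files").rglob("*.py"))
missing_docs = [str(src) for src in source_files if not Path("docs", f"{src}.md").exists()]
print(f"Source files without documentation: {len(missing_docs)}")
for path in missing_docs:
    print(f"  - {path}")
```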
**Validation Report:**
```markdown
## Documentation Validation Results
### Completeness: [✅/❌]
- Files without docs: [count]
- Missing index files: [count]
- Missing cross-references: [count]
### Accuracy: [✅/❌]
- Schema mismatches: [count]
- Outdated function docs: [count]
- Configuration drift: [count]
### Quality: [✅/❌]
- Markdown syntax errors: [count]
- Broken links: [count]
- Formatting issues: [count]
### Sync Status: [✅/❌]
- Out-of-sync files: [count]
- Uncommitted changes: [count]
- Wiki drift: [count]
### Actions Required:
[List of fixes needed]
```
---
## Optional Workflow Modifiers
### --layer: Target Specific Layer
Generate/sync documentation for specific layer only.
```bash
/update-docs --generate-local --layer utilities
/update-docs --generate-local --layer gold
/update-docs --sync-to-wiki --layer silver
```
### --directory: Target Specific Directory
Generate/sync documentation for specific directory.
```bash
/update-docs --generate-local --directory python_files/gold/
/update-docs --sync-to-wiki --directory python_files/utilities/
```
### --only-modified: Sync Only Changed Files
Sync only files modified since last sync (based on git status).
```bash
/update-docs --sync-to-wiki --only-modified
```
**Process** (a small sketch follows):
1. Check git status for modified .md files in ./docs/
2. Sync only those files to wiki
3. Faster than full sync
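A small Python sketch of step 1 (the porcelain parsing is simplified and renames are ignored):

```python
import subprocess
result = subprocess.run(["git", "status", "--porcelain", "docs/"], capture_output=True, text=True, check=True)
modified_docs = [line[3:] for line in result.stdout.splitlines() if line.endswith(".md")]
print(f"Documentation files to sync: {modified_docs}")
```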
---
## Code-Documenter Agent Integration
### When to Use Code-Documenter Agent:
**Always use Task tool with subagent_type="code-documenter" for:**
1. **Initial documentation generation** (--generate-local)
2. **File regeneration** (--regenerate)
3. **Complex transformations** - ETL logic, medallion patterns
4. **Architecture documentation** - High-level system design
### Agent Invocation Pattern:
```markdown
Launch code-documenter agent with:
- Target files: [list of files or directories]
- Documentation scope: comprehensive documentation
- Focus areas: [medallion architecture | ETL logic | utilities | testing]
- Output format: Wiki-ready markdown
- Output location: ./docs/ (maintain source structure)
- Exclude patterns: Files from .docsignore
- Quality requirements: Professional, accurate, no attribution footers
```
---
## Path Mapping Reference
### Local to Wiki Path Conversion
**Function logic:**
```python
def local_to_wiki_path(local_path: str) -> str:
"""
Convert local docs path to Azure DevOps wiki path
Args:
local_path: Path like ./docs/python_files/utilities/session_optimiser.py.md
Returns:
Wiki path like: Unify 2.1 Data Migration Technical Documentation/.../session_optimiser.py
"""
# Remove ./docs/ prefix
relative = local_path.replace('./docs/', '')
# Handle README.md (keep as README)
if relative.endswith('/README.md'):
relative = relative # Keep README.md
elif relative.endswith('.md'):
relative = relative[:-3] # Remove .md extension
# Build wiki path
wiki_base = "Unify 2.1 Data Migration Technical Documentation/Data Migration Pipeline/unify_2_1_dm_synapse_env_d10"
wiki_path = f"{wiki_base}/{relative}"
return wiki_path
```
**Examples:**
```
./docs/README.md
→ Unify 2.1 Data Migration Technical Documentation/Data Migration Pipeline/unify_2_1_dm_synapse_env_d10/README
./docs/python_files/utilities/session_optimiser.py.md
→ Unify 2.1 Data Migration Technical Documentation/Data Migration Pipeline/unify_2_1_dm_synapse_env_d10/python_files/utilities/session_optimiser.py
./docs/python_files/gold/g_address.py.md
→ Unify 2.1 Data Migration Technical Documentation/Data Migration Pipeline/unify_2_1_dm_synapse_env_d10/python_files/gold/g_address.py
./docs/configuration.yaml.md
→ Unify 2.1 Data Migration Technical Documentation/Data Migration Pipeline/unify_2_1_dm_synapse_env_d10/configuration.yaml
```
---
## Azure DevOps MCP Commands
### Wiki Operations:
```bash
# Create wiki page
mcp__Azure_DevOps__create_wiki_page(
path="Unify 2.1 Data Migration Technical Documentation/Data Migration Pipeline/unify_2_1_dm_synapse_env_d10/python_files/utilities/session_optimiser.py",
content="[markdown content]"
)
# Update wiki page
mcp__Azure_DevOps__update_wiki_page(
path="[wiki page path]",
content="[updated markdown content]"
)
# List wiki pages in directory
mcp__Azure_DevOps__list_wiki_pages(
path="Unify 2.1 Data Migration Technical Documentation/Data Migration Pipeline/unify_2_1_dm_synapse_env_d10/python_files/gold"
)
# Delete wiki page (cleanup)
mcp__Azure_DevOps__delete_wiki_page(
path="[wiki page path]"
)
```
---
## Guidelines
### DO:
- ✅ Generate documentation locally first (./docs/)
- ✅ Review and edit documentation before wiki sync
- ✅ Commit documentation to git with code changes
- ✅ Use code-documenter agent for comprehensive docs
- ✅ Respect .docsignore patterns
- ✅ Maintain directory structure matching source repo
- ✅ Generate index files (README.md) for directories
- ✅ Use --only-modified for incremental wiki updates
- ✅ Validate documentation regularly
- ✅ Link to Azure DevOps work items in docs
### DO NOT:
- ❌ Generate documentation directly to wiki (bypass ./docs/)
- ❌ Skip local review before wiki publish
- ❌ Document files in .docsignore (__pycache__/, *.pyc, .env)
- ❌ Include attribution footers ("Documentation By: Claude Code")
- ❌ Duplicate documentation in multiple locations
- ❌ Create wiki pages without proper path structure
- ❌ Forget to update documentation when code changes
- ❌ Sync to wiki without validating locally first
---
## Documentation Quality Standards
### For Python Files:
- Clear file purpose and overview
- Architecture and design pattern explanations
- Class and function documentation with type hints
- Data flow diagrams for ETL transformations
- Business logic explanations
- Usage examples with code snippets
- Testing information and coverage
- Dependencies and related files
- Related Azure DevOps work items
### For Configuration Files:
- Section-by-section explanation
- Environment variable documentation
- Azure integration details
- Usage examples
- Valid value ranges and constraints
### For Index Files (README.md):
- Directory purpose and overview
- File listing with brief descriptions
- Architecture context (for layers)
- Navigation links to sub-sections
- Key concepts and patterns
### Markdown Quality:
- Clear heading hierarchy (H1 → H2 → H3)
- Code blocks with language specification
- Tables for structured data
- Cross-references using relative links
- No broken links
- Professional, concise language
- Valid markdown syntax
---
## Git Integration
### Commit Documentation with Code:
```bash
# Add both code and documentation
git add python_files/gold/g_address.py docs/python_files/gold/g_address.py.md
git commit -m "feat(gold): add g_address table with documentation"
# View documentation changes
git diff docs/
# Documentation visible in PR reviews
```
### Pre-commit Hook (Optional):
```bash
# Validate documentation before commit
# In .git/hooks/pre-commit:
/update-docs --validate
```
---
## Output Summary Template
After any workflow completion, provide:
### 1. Workflow Executed:
- Command: [command used]
- Scope: [what was processed]
- Duration: [time taken]
### 2. Documentation Generated/Updated:
- Files processed: [count and list]
- Location: ./docs/
- Size: [total documentation size]
### 3. Wiki Sync Results (if applicable):
- Pages created: [count]
- Pages updated: [count]
- Wiki path: [base path]
- Status: [success/partial/failed]
### 4. Validation Results:
- Completeness: [✅/❌]
- Accuracy: [✅/❌]
- Quality: [✅/❌]
- Issues found: [count and details]
### 5. Next Steps:
- Recommended actions
- Areas needing attention
- Suggested improvements

326
commands/write-tests.md Executable file
View File

@@ -0,0 +1,326 @@
---
allowed-tools: Read, Write, Edit, Bash
argument-hint: [target-file] | [test-type] | --unit | --integration | --data-validation | --medallion
description: Write comprehensive pytest tests for PySpark data pipelines with live data validation
model: sonnet
---
# Write Tests - pytest + PySpark with Live Data
Write comprehensive pytest tests for PySpark data pipelines using **LIVE DATA** sources: **$ARGUMENTS**
## Current Testing Context
- Test framework: !`[ -f pytest.ini ] && echo "pytest configured" || echo "pytest setup needed"`
- Target: $ARGUMENTS (file/layer to test)
- Test location: !`ls -d tests/ test/ 2>/dev/null | head -1 || echo "tests/ (will create)"`
- Live data available: Bronze/Silver/Gold layers with real FVMS, CMS, NicheRMS tables
## Core Principle: TEST WITH LIVE DATA
**ALWAYS use real data from Bronze/Silver/Gold layers**. No mocked data unless absolutely necessary.
## pytest Testing Framework
### 1. Test File Organization
```
tests/
├── conftest.py # Shared fixtures (Spark session, live data)
├── test_bronze_ingestion.py # Bronze layer validation
├── test_silver_transformations.py # Silver layer ETL
├── test_gold_aggregations.py # Gold layer analytics
├── test_utilities.py # TableUtilities, NotebookLogger
└── integration/
└── test_end_to_end_pipeline.py
```
### 2. Essential pytest Fixtures (conftest.py)
```python
import pytest
from pyspark.sql import SparkSession
from python_files.utilities.session_optimiser import SparkOptimiser
@pytest.fixture(scope="session")
def spark():
"""Shared Spark session for all tests - reuses SparkOptimiser"""
session = SparkOptimiser.get_optimised_spark_session()
yield session
session.stop()
@pytest.fixture(scope="session")
def bronze_data(spark):
"""Live bronze layer data - REAL DATA"""
return spark.table("bronze_fvms.b_vehicle_master")
@pytest.fixture(scope="session")
def silver_data(spark):
"""Live silver layer data - REAL DATA"""
return spark.table("silver_fvms.s_vehicle_master")
@pytest.fixture
def sample_live_data(bronze_data):
"""Small sample from live data for fast tests"""
return bronze_data.limit(100)
```
### 3. pytest Test Patterns
#### Pattern 1: Unit Tests (Individual Functions)
```python
# tests/test_utilities.py
import pytest
from python_files.utilities.session_optimiser import TableUtilities
class TestTableUtilities:
def test_add_row_hash_creates_hash_column(self, spark, sample_live_data):
"""Verify add_row_hash() creates hash_key column"""
result = TableUtilities.add_row_hash(sample_live_data, ["vehicle_id"])
assert "hash_key" in result.columns
assert result.count() == sample_live_data.count()
def test_drop_duplicates_simple_removes_exact_duplicates(self, spark):
"""Test deduplication on live data"""
# Use LIVE data with known duplicates
raw_data = spark.table("bronze_fvms.b_vehicle_events")
result = TableUtilities.drop_duplicates_simple(raw_data)
assert result.count() <= raw_data.count()
@pytest.mark.parametrize("date_col", ["created_date", "updated_date", "event_date"])
def test_clean_date_time_columns_handles_all_formats(self, spark, bronze_data, date_col):
"""Parameterized test for date cleaning"""
        if date_col not in bronze_data.columns:
            pytest.skip(f"{date_col} not present in bronze_fvms.b_vehicle_master")
        result = TableUtilities.clean_date_time_columns(bronze_data, [date_col])
        assert date_col in result.columns
```
#### Pattern 2: Integration Tests (End-to-End)
```python
# tests/integration/test_end_to_end_pipeline.py
import pytest
from python_files.silver.fvms.s_vehicle_master import VehicleMaster
class TestSilverVehicleMasterPipeline:
def test_full_etl_with_live_bronze_data(self, spark):
"""Test complete Bronze → Silver transformation with LIVE data"""
# Extract: Read LIVE bronze data
bronze_table = "bronze_fvms.b_vehicle_master"
bronze_df = spark.table(bronze_table)
        initial_count = bronze_df.count()
        assert initial_count > 0, "Bronze source table should not be empty"
# Transform & Load: Run actual ETL class
etl = VehicleMaster(bronze_table_name=bronze_table)
# Validate: Check LIVE silver output
silver_df = spark.table("silver_fvms.s_vehicle_master")
assert silver_df.count() > 0
assert "hash_key" in silver_df.columns
assert "load_timestamp" in silver_df.columns
# Data quality: No nulls in critical fields
assert silver_df.filter("vehicle_id IS NULL").count() == 0
```
#### Pattern 3: Data Validation (Live Data Checks)
```python
# tests/test_data_validation.py
import pytest
class TestBronzeLayerDataQuality:
"""Validate live data quality in Bronze layer"""
    def test_bronze_vehicle_master_has_recent_data(self, spark):
        """Verify bronze layer contains recent records"""
        from datetime import datetime
        from pyspark.sql.functions import max as spark_max
        df = spark.table("bronze_fvms.b_vehicle_master")
        max_loaded = df.select(spark_max("load_timestamp")).collect()[0][0]
        # collect() returns a Python datetime, so compare against datetime.now(); data should be under 30 days old
        assert (datetime.now() - max_loaded).days <= 30
def test_bronze_to_silver_row_counts_match_expectations(self, spark):
"""Validate row count transformation logic"""
bronze = spark.table("bronze_fvms.b_vehicle_master")
silver = spark.table("silver_fvms.s_vehicle_master")
# After deduplication, silver <= bronze
assert silver.count() <= bronze.count()
@pytest.mark.slow
def test_hash_key_uniqueness_on_live_data(self, spark):
"""Verify hash_key uniqueness in Silver layer (full scan)"""
df = spark.table("silver_fvms.s_vehicle_master")
total = df.count()
unique = df.select("hash_key").distinct().count()
assert total == unique, f"Duplicate hash_keys found: {total - unique}"
```
#### Pattern 4: Schema Validation
```python
# tests/test_schema_validation.py
import pytest
from pyspark.sql.types import StringType, TimestampType
class TestSchemaConformance:
def test_silver_vehicle_schema_matches_expected(self, spark):
"""Validate Silver layer schema against business requirements"""
df = spark.table("silver_fvms.s_vehicle_master")
schema_dict = {field.name: field.dataType for field in df.schema.fields}
# Critical fields must exist
assert "vehicle_id" in schema_dict
assert "hash_key" in schema_dict
assert "load_timestamp" in schema_dict
# Type validation
assert isinstance(schema_dict["vehicle_id"], StringType)
assert isinstance(schema_dict["load_timestamp"], TimestampType)
```
### 4. pytest Markers & Configuration
**pytest.ini**:
```ini
[pytest]
testpaths = tests
python_files = test_*.py
python_classes = Test*
python_functions = test_*
markers =
slow: marks tests as slow (deselect with '-m "not slow"')
integration: marks tests as integration tests
unit: marks tests as unit tests
live_data: tests that require live data access
addopts =
-v
--tb=short
--strict-markers
--disable-warnings
```
**Run specific test types**:
```bash
pytest tests/test_utilities.py -v # Single file
pytest -m unit # Only unit tests
pytest -m "not slow" # Skip slow tests
pytest -k "vehicle" # Tests matching "vehicle"
pytest --maxfail=1 # Stop on first failure
pytest -n auto # Parallel execution (pytest-xdist)
```
### 5. Advanced pytest Features
#### Parametrized Tests
```python
@pytest.mark.parametrize("table_name,expected_min_count", [
("bronze_fvms.b_vehicle_master", 1000),
("bronze_cms.b_customer_master", 500),
("bronze_nicherms.b_booking_master", 2000),
])
def test_bronze_tables_have_minimum_rows(spark, table_name, expected_min_count):
"""Validate minimum row counts across multiple live tables"""
df = spark.table(table_name)
assert df.count() >= expected_min_count
```
#### Fixtures with Live Data Sampling
```python
@pytest.fixture
def stratified_sample(bronze_data):
"""Stratified sample from live data for statistical tests"""
    # Fixed seed keeps the stratified sample reproducible across test runs
    return bronze_data.sampleBy("vehicle_type", fractions={"Car": 0.1, "Truck": 0.1}, seed=42)
```
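If useful, a hedged example of a test that consumes this fixture. The Car and Truck strata are assumptions carried over from the fractions above, so align them with the real `vehicle_type` domain:
```python
def test_stratified_sample_preserves_requested_vehicle_types(stratified_sample):
    """Sample should be non-empty and contain only the strata requested in the fixture"""
    types_in_sample = {row["vehicle_type"] for row in stratified_sample.select("vehicle_type").distinct().collect()}
    assert stratified_sample.count() > 0
    assert types_in_sample.issubset({"Car", "Truck"})
```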
### 6. Testing Best Practices
**DO**:
- ✅ Use `spark.table()` to read LIVE Bronze/Silver/Gold data
- ✅ Test with `.limit(100)` for speed, full dataset for validation
- ✅ Use `@pytest.fixture(scope="session")` for Spark session (reuse)
- ✅ Test actual ETL classes (e.g., `VehicleMaster()`)
- ✅ Validate data quality (nulls, duplicates, date ranges)
- ✅ Use `pytest.mark.parametrize` for testing multiple tables
- ✅ Clean up test outputs in teardown fixtures (see the teardown sketch after this list)
**DON'T**:
- ❌ Create mock/fake data (use real data samples)
- ❌ Skip testing because "data is too large" (use `.limit()`)
- ❌ Write tests that modify production tables
- ❌ Ignore schema validation
- ❌ Forget to test error handling with real edge cases
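For the clean-up point in the DO list, a minimal teardown sketch. It assumes a scratch schema such as `test_outputs` exists for test writes; the table name is hypothetical, and `spark` is the session fixture from conftest.py:
```python
import pytest

@pytest.fixture
def scratch_table(spark):
    """Yield a scratch table name and drop it after the test, so live tables stay untouched"""
    table_name = "test_outputs.tmp_silver_vehicle_master"  # hypothetical sandbox table
    yield table_name
    spark.sql(f"DROP TABLE IF EXISTS {table_name}")
```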
### 7. Example: Complete Test File
```python
# tests/test_silver_vehicle_master.py
import pytest
from pyspark.sql.functions import col, count, when
from python_files.silver.fvms.s_vehicle_master import VehicleMaster
class TestSilverVehicleMaster:
"""Test Silver layer VehicleMaster ETL with LIVE data"""
@pytest.fixture(scope="class")
def silver_df(self, spark):
"""Live Silver data - computed once per test class"""
return spark.table("silver_fvms.s_vehicle_master")
def test_all_required_columns_exist(self, silver_df):
"""Validate schema completeness"""
required = ["vehicle_id", "hash_key", "load_timestamp", "registration_number"]
        missing = [c for c in required if c not in silver_df.columns]
assert not missing, f"Missing columns: {missing}"
def test_no_nulls_in_primary_key(self, silver_df):
"""Primary key cannot be null"""
null_count = silver_df.filter(col("vehicle_id").isNull()).count()
assert null_count == 0
def test_hash_key_generated_for_all_rows(self, silver_df):
"""Every row must have hash_key"""
total = silver_df.count()
with_hash = silver_df.filter(col("hash_key").isNotNull()).count()
assert total == with_hash
@pytest.mark.slow
def test_deduplication_effectiveness(self, spark):
"""Compare Bronze vs Silver row counts"""
bronze = spark.table("bronze_fvms.b_vehicle_master")
silver = spark.table("silver_fvms.s_vehicle_master")
bronze_count = bronze.count()
silver_count = silver.count()
dedup_rate = (bronze_count - silver_count) / bronze_count * 100
print(f"Deduplication removed {dedup_rate:.2f}% of rows")
assert silver_count <= bronze_count
```
## Execution Workflow
1. **Read target file** ($ARGUMENTS) - Understand transformation logic
2. **Identify live data sources** - Find Bronze/Silver tables used
3. **Create test file** - `tests/test_<target>.py`
4. **Write fixtures** - Setup Spark session, load live data samples
5. **Write unit tests** - Test individual utility functions
6. **Write integration tests** - Test full ETL with live data
7. **Write validation tests** - Check data quality on live tables
8. **Run tests**: `pytest tests/test_<target>.py -v`
9. **Verify coverage**: Ensure >80% coverage of transformation logic, e.g. `pytest --cov=python_files --cov-report=term-missing` (assumes the pytest-cov plugin is installed)
## Output Deliverables
- ✅ pytest test file with 10+ test cases
- ✅ conftest.py with reusable fixtures
- ✅ pytest.ini configuration
- ✅ Tests use LIVE data from Bronze/Silver/Gold
- ✅ All tests pass: `pytest -v`
- ✅ Documentation comments showing live data usage