# Add Descriptive Comments to Code
Add detailed, descriptive comments to the selected file: $ARGUMENTS
## Current Context

- Currently open file: !`echo $CLAUDE_OPEN_FILE`
- File layer detection: !`basename $(dirname $CLAUDE_OPEN_FILE) 2>/dev/null || echo "unknown"`
- Git status: !`git status --porcelain $CLAUDE_OPEN_FILE 2>/dev/null || echo "Not in git"`
## Task

You will add comprehensive descriptive comments to the currently open file (or the file specified in $ARGUMENTS if provided).

## Instructions

1. **Determine Target File**
   - If $ARGUMENTS contains a file path, use that file
   - Otherwise, use the currently open file from the IDE
   - Verify the file exists and is readable
2. **Analyze File Context**
   - Identify the file type (silver/gold layer transformation, utility, pipeline operation)
   - Read and understand the complete file structure
   - Identify the ETL pattern (extract, transform, load methods)
   - Map out all DataFrame operations and transformations
3. **Analyze Data Sources and Schemas**
   - Use the DuckDB MCP tool to query relevant source tables if available:
     ```sql
     -- Example: Check schema and sample data of a source table
     DESCRIBE table_name;
     SELECT * FROM table_name LIMIT 5;
     ```
   - Reference `.claude/memory/data_dictionary/` for column definitions and business context
   - Identify all source tables being read (bronze/silver layer)
   - Document the schema of input and output DataFrames
4. **Document Joining Logic (Priority Focus)**
   - For each join operation, add comments explaining:
     - WHY the join is happening (business reason)
     - WHAT tables are being joined
     - JOIN TYPE (left, inner, outer) and why that type was chosen
     - JOIN KEYS and their meaning
     - EXPECTED CARDINALITY (1:1, 1:many, many:many)
     - NULL HANDLING strategy for unmatched records
   - Example format:
     ```python
     # JOIN: Link incidents to persons involved
     # Type: LEFT JOIN (preserve all incidents even if person data missing)
     # Keys: incident_id (unique identifier from FVMS system)
     # Expected: 1:many (one incident can have multiple persons)
     # Nulls: Person details will be NULL for incidents with no associated persons
     joined_df = incident_df.join(person_df, on="incident_id", how="left")
     ```
5. **Document Transformations Step-by-Step**
   - Add inline comments explaining each transformation
   - Describe column derivations and calculations
   - Explain business rules being applied
   - Document any data quality fixes or cleansing
   - Note any deduplication logic
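As an illustration of this step-by-step commenting style (a hypothetical plain-Python sketch; real project code would operate on Spark DataFrames, and the field names here are invented):

```python
def clean_incident_records(records: list) -> list:
    # Step 1: Normalise postcode to a 4-digit string
    # (the source system pads postcodes inconsistently, e.g. "800" instead of "0800")
    records = [{**r, "postcode": str(r["postcode"]).zfill(4)} for r in records]
    # Step 2: Deduplicate on incident_id, keeping the highest version
    # (the source system re-submits corrected records under the same incident_id)
    latest = {}
    for r in records:
        if r["incident_id"] not in latest or r["version"] > latest[r["incident_id"]]["version"]:
            latest[r["incident_id"]] = r
    return list(latest.values())
```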
6. **Document Data Quality Patterns**
   - Explain null handling strategies
   - Document default values and their business meaning
   - Describe validation rules
   - Note any data type conversions
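A null-handling comment might look like this (a hypothetical plain-Python sketch; the "U" placeholder and field names are invented for illustration):

```python
def apply_sex_default(records: list) -> list:
    # DEFAULT: sex = "U" (Unknown) when the source value is missing
    # Business meaning: "U" is the agreed reporting placeholder, so downstream
    # aggregations never silently drop records with missing demographics
    return [{**r, "sex": r.get("sex") or "U"} for r in records]
```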
7. **Add Function/Method Documentation**
   - Add docstring-style comments at the start of each method explaining:
     - Purpose of the method
     - Input: source tables and their schemas
     - Output: resulting table and schema
     - Business logic summary
   - Example format:
     ```python
     def transform(self) -> DataFrame:
         """
         Transform incident data with person and location enrichment.

         Input: bronze_fvms.b_fvms_incident (raw incident records)
         Output: silver_fvms.s_fvms_incident (validated, enriched incidents)

         Transformations:
         1. Join with person table to add demographic details
         2. Join with address table to add location coordinates
         3. Apply business rules for incident classification
         4. Deduplicate based on incident_id and date_created
         5. Add row hash for change detection

         Business Context:
         - Incidents represent family violence events recorded in FVMS
         - Each incident may involve multiple persons (victims, offenders)
         - Location data enables geographic analysis and reporting
         """
     ```
8. **Add Header Comments**
   - Add a comprehensive header at the top of the file explaining:
     - File purpose and business context
     - Source systems and tables
     - Target table and database
     - Key transformations and business rules
     - Dependencies on other tables or processes
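A header of this shape covers the checklist above (illustrative only; the file, table, and system names are hypothetical):

```python
# =====================================================================
# s_fvms_incident.py - Silver layer transformation for FVMS incidents
# Purpose: Validate and enrich raw incident records for reporting
# Sources: bronze_fvms.b_fvms_incident, bronze_fvms.b_fvms_person
# Target:  silver_fvms.s_fvms_incident
# Key rules: deduplicate on incident_id; standardise location fields
# Depends on: s_fvms_person must be loaded earlier in the pipeline run
# =====================================================================
```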
9. **Variable Naming Context**
   - When variable names are abbreviated or unclear, add comments explaining:
     - What the variable represents
     - The business meaning of the data
     - Expected data types and formats
   - Reference data dictionary entries if available
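For example (the variable name, format, and dictionary entry are hypothetical):

```python
# po_ref: Protection Order reference number issued by the court registry
#         (string, format "PO-YYYY-NNNNNN"; see .claude/memory/data_dictionary/)
po_ref = "PO-2023-001482"
```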
10. **Use Data Dictionary References**
    - Check `.claude/memory/data_dictionary/` for column definitions
    - Reference these definitions in comments to explain field meanings
    - Link business terminology to technical column names
    - Example:
      ```python
      # offence_code: Maps to ANZSOC classification system (see data_dict/cms_offence_codes.md)
      ```
11. **Query DuckDB for Context (When Available)**
    - Use the MCP DuckDB tool to inspect actual data patterns:
      - Check distinct values: `SELECT DISTINCT column_name FROM table LIMIT 20;`
      - Verify join relationships: `SELECT COUNT(*) FROM table1 JOIN table2 ...`
      - Understand data distributions: `SELECT column, COUNT(*) FROM table GROUP BY column;`
    - Use insights from these queries to write more accurate comments
12. **Preserve Code Formatting Standards**
    - Do NOT add blank lines inside functions (project standard)
    - Maximum line length: 240 characters
    - Maintain existing indentation
    - Keep comments concise but informative
    - Use inline comments for single-line explanations
    - Use block comments for multi-step processes
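The two comment styles side by side (a hypothetical sketch; note there are no blank lines inside the function, per the project standard):

```python
def summarise_daily_total(values: list) -> int:
    # Block comment introducing a multi-step process:
    # 1) drop None entries (the source feed emits blanks as None)
    # 2) total the remainder for the daily reporting metric
    cleaned = [v for v in values if v is not None]  # inline: one-line explanation
    return sum(cleaned)
```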
13. **Focus Areas by File Type**
    - Silver layer files (`python_files/silver/`):
      - Document source bronze tables
      - Explain validation rules
      - Describe enumeration mappings
      - Note data cleansing operations
    - Gold layer files (`python_files/gold/`):
      - Document all source silver tables
      - Explain aggregation logic
      - Describe business metrics calculations
      - Note analytical transformations
    - Utility files (`python_files/utilities/`):
      - Explain helper function purposes
      - Document parameter meanings
      - Describe return values
      - Note edge cases handled
14. **Comment Quality Guidelines**
    - Comments should explain WHY, not just WHAT
    - Avoid obvious comments (e.g., don't say "create dataframe" for `df = spark.createDataFrame()`)
    - Focus on business context and data relationships
    - Use proper grammar and complete sentences
    - Be concise but thorough
    - Think like a new developer reading the code for the first time
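To illustrate the WHY-over-WHAT rule (a hypothetical example; the escalation rule described in the comment is invented):

```python
def flag_repeat_incidents(incident_counts: dict) -> dict:
    # WHAT (avoid): "return True where count > 1"
    # WHY (prefer): persons with more than one recorded incident are routed to
    # the repeat-engagement workflow, so this flag drives case prioritisation
    return {person_id: count > 1 for person_id, count in incident_counts.items()}
```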
15. **Final Validation**
    - Run a syntax check: `python3 -m py_compile <file>`
    - Run linting: `ruff check <file>`
    - Format the code: `ruff format <file>`
    - Ensure all comments are accurate and helpful
## Example Output Structure

After adding comments, the file should have:
- ✅ Comprehensive header explaining file purpose
- ✅ Method-level documentation for extract/transform/load
- ✅ Detailed join operation comments (business reason, type, keys, cardinality)
- ✅ Step-by-step transformation explanations
- ✅ Data quality and validation logic documented
- ✅ Variable context for unclear names
- ✅ References to data dictionary where applicable
- ✅ Business context linking technical operations to real-world meaning
## Important Notes
- ALWAYS use Australian English spelling conventions throughout the comments and documentation
- DO NOT remove or modify existing functionality
- DO NOT change code structure or logic
- ONLY add descriptive comments
- PRESERVE all existing comments
- MAINTAIN project coding standards (no blank lines in functions, 240 char max)
- USE the data dictionary and DuckDB queries to provide accurate context
- THINK about the user who will read this code - walk them through the logic clearly