---
allowed-tools: Read, mcp__mcp-server-motherduck__query, Grep, Glob, Bash
argument-hint: [file-path] (optional - defaults to currently open file)
description: Add comprehensive descriptive comments to code files, focusing on data flow, joining logic, and business context
---

# Add Descriptive Comments to Code

Add detailed, descriptive comments to the selected file: $ARGUMENTS

## Current Context

- Currently open file: !`echo $CLAUDE_OPEN_FILE`
- File layer detection: !`basename $(dirname $CLAUDE_OPEN_FILE) 2>/dev/null || echo "unknown"`
- Git status: !`git status --porcelain $CLAUDE_OPEN_FILE 2>/dev/null || echo "Not in git"`

## Task

You will add comprehensive descriptive comments to the **currently open file** (or the file specified in $ARGUMENTS if provided).

### Instructions

1. **Determine Target File**
   - If $ARGUMENTS contains a file path, use that file
   - Otherwise, use the currently open file from the IDE
   - Verify the file exists and is readable

2. **Analyze File Context**
   - Identify the file type (silver/gold layer transformation, utility, pipeline operation)
   - Read and understand the complete file structure
   - Identify the ETL pattern (extract, transform, load methods)
   - Map out all DataFrame operations and transformations

3. **Analyze Data Sources and Schemas**
   - Use DuckDB MCP to query relevant source tables if available:

     ```sql
     -- Example: Check schema of source table
     DESCRIBE table_name;
     SELECT * FROM table_name LIMIT 5;
     ```

   - Reference `.claude/memory/data_dictionary/` for column definitions and business context
   - Identify all source tables being read (bronze/silver layer)
   - Document the schema of input and output DataFrames

4. **Document Joining Logic (Priority Focus)**
   - For each join operation, add comments explaining:
     - **WHY** the join is happening (business reason)
     - **WHAT** tables are being joined
     - **JOIN TYPE** (left, inner, outer) and why that type was chosen
     - **JOIN KEYS** and their meaning
     - **EXPECTED CARDINALITY** (1:1, 1:many, many:many)
     - **NULL HANDLING** strategy for unmatched records

   Example format:

   ```python
   # JOIN: Link incidents to persons involved
   # Type: LEFT JOIN (preserve all incidents even if person data missing)
   # Keys: incident_id (unique identifier from FVMS system)
   # Expected: 1:many (one incident can have multiple persons)
   # Nulls: Person details will be NULL for incidents with no associated persons
   joined_df = incident_df.join(person_df, on="incident_id", how="left")
   ```

5. **Document Transformations Step-by-Step**
   - Add inline comments explaining each transformation
   - Describe column derivations and calculations
   - Explain business rules being applied
   - Document any data quality fixes or cleansing
   - Note any deduplication logic

6. **Document Data Quality Patterns**
   - Explain null handling strategies
   - Document default values and their business meaning
   - Describe validation rules
   - Note any data type conversions
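   Example format (illustrative only - the columns, default codes, and DataFrame names shown are hypothetical):

   ```python
   from pyspark.sql import functions as F
   # NULL HANDLING: date_resolved is NULL while an incident is still under investigation
   # DEFAULT: "U" (unknown) marks records whose verification status has not been confirmed
   # TYPE CONVERSION: postcode arrives as an integer from the source extract; cast to string to preserve leading zeros (e.g. "0800")
   cleaned_df = raw_df.withColumn("postcode", F.col("postcode").cast("string")).fillna({"verified_flag": "U"})
   ```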
7. **Add Function/Method Documentation**
   - Add docstring-style comments at the start of each method explaining:
     - Purpose of the method
     - Input: Source tables and their schemas
     - Output: Resulting table and schema
     - Business logic summary

   Example format:

   ```python
   def transform(self) -> DataFrame:
       """
       Transform incident data with person and location enrichment.

       Input: bronze_fvms.b_fvms_incident (raw incident records)
       Output: silver_fvms.s_fvms_incident (validated, enriched incidents)

       Transformations:
       1. Join with person table to add demographic details
       2. Join with address table to add location coordinates
       3. Apply business rules for incident classification
       4. Deduplicate based on incident_id and date_created
       5. Add row hash for change detection

       Business Context:
       - Incidents represent family violence events recorded in FVMS
       - Each incident may involve multiple persons (victims, offenders)
       - Location data enables geographic analysis and reporting
       """
   ```

8. **Add Header Comments**
   - Add a comprehensive header at the top of the file explaining:
     - File purpose and business context
     - Source systems and tables
     - Target table and database
     - Key transformations and business rules
     - Dependencies on other tables or processes

9. **Variable Naming Context**
   - When variable names are abbreviated or unclear, add comments explaining:
     - What the variable represents
     - The business meaning of the data
     - Expected data types and formats
     - Reference data dictionary entries if available

10. **Use Data Dictionary References**
    - Check `.claude/memory/data_dictionary/` for column definitions
    - Reference these definitions in comments to explain field meanings
    - Link business terminology to technical column names
    - Example: `# offence_code: Maps to ANZSOC classification system (see data_dict/cms_offence_codes.md)`

11. **Query DuckDB for Context (When Available)**
    - Use MCP DuckDB tool to inspect actual data patterns:
      - Check distinct values: `SELECT DISTINCT column_name FROM table LIMIT 20;`
      - Verify join relationships: `SELECT COUNT(*) FROM table1 JOIN table2 ...`
      - Understand data distributions: `SELECT column, COUNT(*) FROM table GROUP BY column;`
    - Use insights from queries to write more accurate comments

12. **Preserve Code Formatting Standards**
    - Do NOT add blank lines inside functions (project standard)
    - Maximum line length: 240 characters
    - Maintain existing indentation
    - Keep comments concise but informative
    - Use inline comments for single-line explanations
    - Use block comments for multi-step processes

13. **Focus Areas by File Type**

    **Silver Layer Files (`python_files/silver/`):**
    - Document source bronze tables
    - Explain validation rules
    - Describe enumeration mappings
    - Note data cleansing operations

    **Gold Layer Files (`python_files/gold/`):**
    - Document all source silver tables
    - Explain aggregation logic
    - Describe business metrics calculations
    - Note analytical transformations

    **Utility Files (`python_files/utilities/`):**
    - Explain helper function purposes
    - Document parameter meanings
    - Describe return values
    - Note edge cases handled

14. **Comment Quality Guidelines**
    - Comments should explain **WHY**, not just **WHAT**
    - Avoid obvious comments (e.g., don't say "create dataframe" for `df = spark.createDataFrame()`)
    - Focus on business context and data relationships
    - Use proper grammar and complete sentences
    - Be concise but thorough
    - Think like a new developer reading the code for the first time
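    Example contrasting the two (hypothetical snippet - `record_status` and its codes are illustrative):

    ```python
    # Avoid (restates WHAT the code already says):
    # Filter the dataframe on record_status
    active_df = person_df.filter(F.col("record_status") == "A")
    # Prefer (explains WHY the filter exists):
    # Only status "A" (active) rows are current; superseded versions remain in the source and must be excluded from reporting
    active_df = person_df.filter(F.col("record_status") == "A")
    ```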
15. **Final Validation**
    - Run syntax check: `python3 -m py_compile <file_path>`
    - Run linting: `ruff check <file_path>`
    - Format code: `ruff format <file_path>`
    - Ensure all comments are accurate and helpful

## Example Output Structure

After adding comments, the file should have:

- ✅ Comprehensive header explaining file purpose
- ✅ Method-level documentation for extract/transform/load
- ✅ Detailed join operation comments (business reason, type, keys, cardinality)
- ✅ Step-by-step transformation explanations
- ✅ Data quality and validation logic documented
- ✅ Variable context for unclear names
- ✅ References to data dictionary where applicable
- ✅ Business context linking technical operations to real-world meaning

## Important Notes

- **ALWAYS** use Australian English spelling conventions throughout the comments and documentation
- **DO NOT** remove or modify existing functionality
- **DO NOT** change code structure or logic
- **ONLY** add descriptive comments
- **PRESERVE** all existing comments
- **MAINTAIN** project coding standards (no blank lines in functions, 240 char max)
- **USE** the data dictionary and DuckDB queries to provide accurate context
- **THINK** about the user who will read this code - walk them through the logic clearly
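## Worked Example (Hypothetical)

For reference, a short sketch of the expected commenting style applied to a hypothetical silver-layer transform. The table, column, and status-code names are invented for illustration, and real files follow the project's ETL class pattern shown in step 7 rather than this standalone signature.

```python
# =============================================================================
# Purpose: Build the silver-layer incident table enriched with person details.
# Sources: bronze_fvms.b_fvms_incident, bronze_fvms.b_fvms_person
# Target:  silver_fvms.s_fvms_incident
# Notes:   Incident-to-person is 1:many; incidents without a linked person are retained.
# =============================================================================
from pyspark.sql import DataFrame
from pyspark.sql import functions as F


def transform(incident_df: DataFrame, person_df: DataFrame) -> DataFrame:
    """Enrich incidents with person details and derive a readable status description."""
    # JOIN: attach person details for reporting; LEFT preserves incidents with no linked person (person columns NULL for those rows)
    enriched_df = incident_df.join(person_df, on="incident_id", how="left")
    # Status "O" denotes an open incident in this hypothetical reference list; all other codes are treated as closed
    return enriched_df.withColumn("status_desc", F.when(F.col("status") == "O", "Open").otherwise("Closed"))
```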