# Add Descriptive Comments to Code
Add detailed, descriptive comments to the selected file: $ARGUMENTS
## Current Context

- Currently open file: !`echo $CLAUDE_OPEN_FILE`
- File layer detection: !`basename $(dirname $CLAUDE_OPEN_FILE) 2>/dev/null || echo "unknown"`
- Git status: !`git status --porcelain $CLAUDE_OPEN_FILE 2>/dev/null || echo "Not in git"`
## Task

You will add comprehensive descriptive comments to the currently open file (or the file specified in $ARGUMENTS if provided).

## Instructions

1. **Determine Target File**
   - If $ARGUMENTS contains a file path, use that file
   - Otherwise, use the currently open file from the IDE
   - Verify the file exists and is readable
2. **Analyze File Context**
   - Identify the file type (silver/gold layer transformation, utility, pipeline operation)
   - Read and understand the complete file structure
   - Identify the ETL pattern (extract, transform, load methods)
   - Map out all DataFrame operations and transformations
3. **Analyze Data Sources and Schemas**
   - Use the DuckDB MCP tool to query relevant source tables if available:
     ```sql
     -- Example: Check schema and sample data of a source table
     DESCRIBE table_name;
     SELECT * FROM table_name LIMIT 5;
     ```
   - Reference `.claude/memory/data_dictionary/` for column definitions and business context
   - Identify all source tables being read (bronze/silver layer)
   - Document the schema of input and output DataFrames
4. **Document Joining Logic (Priority Focus)**
   - For each join operation, add comments explaining:
     - WHY the join is happening (business reason)
     - WHAT tables are being joined
     - JOIN TYPE (left, inner, outer) and why that type was chosen
     - JOIN KEYS and their meaning
     - EXPECTED CARDINALITY (1:1, 1:many, many:many)
     - NULL HANDLING strategy for unmatched records
   - Example format:
     ```python
     # JOIN: Link incidents to persons involved
     # Type: LEFT JOIN (preserve all incidents even if person data missing)
     # Keys: incident_id (unique identifier from FVMS system)
     # Expected: 1:many (one incident can have multiple persons)
     # Nulls: Person details will be NULL for incidents with no associated persons
     joined_df = incident_df.join(person_df, on="incident_id", how="left")
     ```
5. **Document Transformations Step-by-Step**
   - Add inline comments explaining each transformation
   - Describe column derivations and calculations
   - Explain business rules being applied
   - Document any data quality fixes or cleansing
   - Note any deduplication logic
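As an illustration of this step-by-step commenting style (a hypothetical plain-Python sketch; real project code would operate on Spark DataFrames, and the field names here are invented):

```python
def clean_incident_records(records: list) -> list:
    # Step 1: Normalise postcode to a 4-digit string
    # (the source system pads postcodes inconsistently, e.g. "800" instead of "0800")
    records = [{**r, "postcode": str(r["postcode"]).zfill(4)} for r in records]
    # Step 2: Deduplicate on incident_id, keeping the highest version
    # (the source system re-submits corrected records under the same incident_id)
    latest = {}
    for r in records:
        if r["incident_id"] not in latest or r["version"] > latest[r["incident_id"]]["version"]:
            latest[r["incident_id"]] = r
    return list(latest.values())
```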
6. **Document Data Quality Patterns**
   - Explain null handling strategies
   - Document default values and their business meaning
   - Describe validation rules
   - Note any data type conversions
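A null-handling comment might look like this (a hypothetical plain-Python sketch; the "U" placeholder and field names are invented for illustration):

```python
def apply_sex_default(records: list) -> list:
    # DEFAULT: sex = "U" (Unknown) when the source value is missing
    # Business meaning: "U" is the agreed reporting placeholder, so downstream
    # aggregations never silently drop records with missing demographics
    return [{**r, "sex": r.get("sex") or "U"} for r in records]
```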
7. **Add Function/Method Documentation**
   - Add docstring-style comments at the start of each method explaining:
     - Purpose of the method
     - Input: source tables and their schemas
     - Output: resulting table and schema
     - Business logic summary
   - Example format:
     ```python
     def transform(self) -> DataFrame:
         """
         Transform incident data with person and location enrichment.

         Input: bronze_fvms.b_fvms_incident (raw incident records)
         Output: silver_fvms.s_fvms_incident (validated, enriched incidents)

         Transformations:
         1. Join with person table to add demographic details
         2. Join with address table to add location coordinates
         3. Apply business rules for incident classification
         4. Deduplicate based on incident_id and date_created
         5. Add row hash for change detection

         Business Context:
         - Incidents represent family violence events recorded in FVMS
         - Each incident may involve multiple persons (victims, offenders)
         - Location data enables geographic analysis and reporting
         """
     ```
8. **Add Header Comments**
   - Add a comprehensive header at the top of the file explaining:
     - File purpose and business context
     - Source systems and tables
     - Target table and database
     - Key transformations and business rules
     - Dependencies on other tables or processes
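A header of this shape covers the checklist above (illustrative only; the file, table, and system names are hypothetical):

```python
# =====================================================================
# s_fvms_incident.py - Silver layer transformation for FVMS incidents
# Purpose: Validate and enrich raw incident records for reporting
# Sources: bronze_fvms.b_fvms_incident, bronze_fvms.b_fvms_person
# Target:  silver_fvms.s_fvms_incident
# Key rules: deduplicate on incident_id; standardise location fields
# Depends on: s_fvms_person must be loaded earlier in the pipeline run
# =====================================================================
```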
9. **Variable Naming Context**
   - When variable names are abbreviated or unclear, add comments explaining:
     - What the variable represents
     - The business meaning of the data
     - Expected data types and formats
   - Reference data dictionary entries if available
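For example (the variable name, format, and dictionary entry are hypothetical):

```python
# po_ref: Protection Order reference number issued by the court registry
#         (string, format "PO-YYYY-NNNNNN"; see .claude/memory/data_dictionary/)
po_ref = "PO-2023-001482"
```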
10. **Use Data Dictionary References**
    - Check `.claude/memory/data_dictionary/` for column definitions
    - Reference these definitions in comments to explain field meanings
    - Link business terminology to technical column names
    - Example:
      ```python
      # offence_code: Maps to ANZSOC classification system (see data_dict/cms_offence_codes.md)
      ```
11. **Query DuckDB for Context (When Available)**
    - Use the MCP DuckDB tool to inspect actual data patterns:
      - Check distinct values: `SELECT DISTINCT column_name FROM table LIMIT 20;`
      - Verify join relationships: `SELECT COUNT(*) FROM table1 JOIN table2 ...`
      - Understand data distributions: `SELECT column, COUNT(*) FROM table GROUP BY column;`
    - Use insights from these queries to write more accurate comments
12. **Preserve Code Formatting Standards**
    - Do NOT add blank lines inside functions (project standard)
    - Maximum line length: 240 characters
    - Maintain existing indentation
    - Keep comments concise but informative
    - Use inline comments for single-line explanations
    - Use block comments for multi-step processes
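The two comment styles side by side (a hypothetical sketch; note there are no blank lines inside the function, per the project standard):

```python
def summarise_daily_total(values: list) -> int:
    # Block comment introducing a multi-step process:
    # 1) drop None entries (the source feed emits blanks as None)
    # 2) total the remainder for the daily reporting metric
    cleaned = [v for v in values if v is not None]  # inline: one-line explanation
    return sum(cleaned)
```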
13. **Focus Areas by File Type**
    - Silver layer files (`python_files/silver/`):
      - Document source bronze tables
      - Explain validation rules
      - Describe enumeration mappings
      - Note data cleansing operations
    - Gold layer files (`python_files/gold/`):
      - Document all source silver tables
      - Explain aggregation logic
      - Describe business metrics calculations
      - Note analytical transformations
    - Utility files (`python_files/utilities/`):
      - Explain helper function purposes
      - Document parameter meanings
      - Describe return values
      - Note edge cases handled
14. **Comment Quality Guidelines**
    - Comments should explain WHY, not just WHAT
    - Avoid obvious comments (e.g., don't say "create dataframe" for `df = spark.createDataFrame()`)
    - Focus on business context and data relationships
    - Use proper grammar and complete sentences
    - Be concise but thorough
    - Think like a new developer reading the code for the first time
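To illustrate the WHY-over-WHAT rule (a hypothetical example; the escalation rule described in the comment is invented):

```python
def flag_repeat_incidents(incident_counts: dict) -> dict:
    # WHAT (avoid): "return True where count > 1"
    # WHY (prefer): persons with more than one recorded incident are routed to
    # the repeat-engagement workflow, so this flag drives case prioritisation
    return {person_id: count > 1 for person_id, count in incident_counts.items()}
```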
15. **Final Validation**
    - Run a syntax check: `python3 -m py_compile <file>`
    - Run linting: `ruff check <file>`
    - Format the code: `ruff format <file>`
    - Ensure all comments are accurate and helpful
## Example Output Structure

After adding comments, the file should have:
- ✅ Comprehensive header explaining file purpose
- ✅ Method-level documentation for extract/transform/load
- ✅ Detailed join operation comments (business reason, type, keys, cardinality)
- ✅ Step-by-step transformation explanations
- ✅ Data quality and validation logic documented
- ✅ Variable context for unclear names
- ✅ References to data dictionary where applicable
- ✅ Business context linking technical operations to real-world meaning
## Important Notes
- ALWAYS use Australian English spelling conventions throughout the comments and documentation
- DO NOT remove or modify existing functionality
- DO NOT change code structure or logic
- ONLY add descriptive comments
- PRESERVE all existing comments
- MAINTAIN project coding standards (no blank lines in functions, 240 char max)
- USE the data dictionary and DuckDB queries to provide accurate context
- THINK about the user who will read this code - walk them through the logic clearly