---
allowed-tools: Read, mcp__mcp-server-motherduck__query, Grep, Glob, Bash
argument-hint: [file-path] (optional - defaults to currently open file)
description: Add comprehensive descriptive comments to code files, focusing on data flow, joining logic, and business context
---
# Add Descriptive Comments to Code

Add detailed, descriptive comments to the selected file: $ARGUMENTS

## Current Context

- Currently open file: !`echo $CLAUDE_OPEN_FILE`
- File layer detection: !`basename $(dirname $CLAUDE_OPEN_FILE) 2>/dev/null || echo "unknown"`
- Git status: !`git status --porcelain $CLAUDE_OPEN_FILE 2>/dev/null || echo "Not in git"`
## Task

You will add comprehensive descriptive comments to the **currently open file** (or the file specified in $ARGUMENTS if provided).

### Instructions

1. **Determine Target File**
   - If $ARGUMENTS contains a file path, use that file
   - Otherwise, use the currently open file from the IDE
   - Verify the file exists and is readable
2. **Analyze File Context**
   - Identify the file type (silver/gold layer transformation, utility, pipeline operation)
   - Read and understand the complete file structure
   - Identify the ETL pattern (extract, transform, load methods)
   - Map out all DataFrame operations and transformations
3. **Analyze Data Sources and Schemas**
   - Use DuckDB MCP to query relevant source tables if available:

     ```sql
     -- Example: Check schema of source table
     DESCRIBE table_name;
     SELECT * FROM table_name LIMIT 5;
     ```

   - Reference `.claude/memory/data_dictionary/` for column definitions and business context
   - Identify all source tables being read (bronze/silver layer)
   - Document the schema of input and output DataFrames
4. **Document Joining Logic (Priority Focus)**
   - For each join operation, add comments explaining:
     - **WHY** the join is happening (business reason)
     - **WHAT** tables are being joined
     - **JOIN TYPE** (left, inner, outer) and why that type was chosen
     - **JOIN KEYS** and their meaning
     - **EXPECTED CARDINALITY** (1:1, 1:many, many:many)
     - **NULL HANDLING** strategy for unmatched records

   Example format:

   ```python
   # JOIN: Link incidents to persons involved
   # Type: LEFT JOIN (preserve all incidents even if person data missing)
   # Keys: incident_id (unique identifier from FVMS system)
   # Expected: 1:many (one incident can have multiple persons)
   # Nulls: Person details will be NULL for incidents with no associated persons
   joined_df = incident_df.join(person_df, on="incident_id", how="left")
   ```
5. **Document Transformations Step-by-Step**
   - Add inline comments explaining each transformation
   - Describe column derivations and calculations
   - Explain business rules being applied
   - Document any data quality fixes or cleansing
   - Note any deduplication logic
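   Example format (a minimal sketch; the DataFrames, column names and specific rules shown are illustrative assumptions, not taken from any particular file):

   ```python
   from pyspark.sql import functions as F
   # Standardise reported_date: the source system delivers it as a dd/MM/yyyy string
   cleaned_df = raw_df.withColumn("reported_date", F.to_date(F.col("reported_date"), "dd/MM/yyyy"))
   # Derived column: weekend flag based on the day of week of reported_date, used for reporting breakdowns
   cleaned_df = cleaned_df.withColumn("is_weekend", F.dayofweek(F.col("reported_date")).isin(1, 7))
   # Deduplicate: source extracts can resend the same incident, so keep one row per incident_id and date_created
   deduped_df = cleaned_df.dropDuplicates(["incident_id", "date_created"])
   ```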
6. **Document Data Quality Patterns**
   - Explain null handling strategies
   - Document default values and their business meaning
   - Describe validation rules
   - Note any data type conversions
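   Example format (an illustrative sketch only; the columns, default value and valid range are assumptions):

   ```python
   from pyspark.sql import functions as F
   # Null handling: a missing region_code means the address could not be geocoded; default to "UNKNOWN"
   # so that downstream group-bys retain these rows rather than silently dropping them
   validated_df = input_df.withColumn("region_code", F.coalesce(F.col("region_code"), F.lit("UNKNOWN")))
   # Type conversion: age arrives as a string from the source; cast to integer (unparseable values become NULL)
   validated_df = validated_df.withColumn("age", F.col("age").cast("int"))
   # Validation rule: ages outside 0-130 are treated as data entry errors and set to NULL
   validated_df = validated_df.withColumn("age", F.when(F.col("age").between(0, 130), F.col("age")))
   ```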
7. **Add Function/Method Documentation**
   - Add docstring-style comments at the start of each method explaining:
     - Purpose of the method
     - Input: Source tables and their schemas
     - Output: Resulting table and schema
     - Business logic summary

   Example format:

   ```python
   def transform(self) -> DataFrame:
       """
       Transform incident data with person and location enrichment.

       Input: bronze_fvms.b_fvms_incident (raw incident records)
       Output: silver_fvms.s_fvms_incident (validated, enriched incidents)

       Transformations:
       1. Join with person table to add demographic details
       2. Join with address table to add location coordinates
       3. Apply business rules for incident classification
       4. Deduplicate based on incident_id and date_created
       5. Add row hash for change detection

       Business Context:
       - Incidents represent family violence events recorded in FVMS
       - Each incident may involve multiple persons (victims, offenders)
       - Location data enables geographic analysis and reporting
       """
   ```
8. **Add Header Comments**
   - Add a comprehensive header at the top of the file explaining:
     - File purpose and business context
     - Source systems and tables
     - Target table and database
     - Key transformations and business rules
     - Dependencies on other tables or processes
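   Example format (an illustrative sketch; the table names and rules echo the examples above and are assumptions, not a real file):

   ```python
   """
   Silver-layer transformation: FVMS incident enrichment.

   Purpose: Validate and enrich raw incident records for downstream reporting.
   Sources: bronze_fvms.b_fvms_incident plus the related bronze person table
   Target: silver_fvms.s_fvms_incident
   Key rules: standardise dates, apply incident classification rules, deduplicate on incident_id and date_created.
   Dependencies: requires the bronze FVMS load for the same batch date to have completed.
   """
   ```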
9. **Variable Naming Context**
   - When variable names are abbreviated or unclear, add comments explaining:
     - What the variable represents
     - The business meaning of the data
     - Expected data types and formats
     - Reference data dictionary entries if available
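   Example format (illustrative only; the variable and column names are assumptions):

   ```python
   # dob_df: person date-of-birth records from the bronze person table (expected to be one row per person_id)
   # lga_cd: Local Government Area code stored as a zero-padded string (see the data dictionary entry if one exists)
   enriched_df = incident_df.join(dob_df, on="person_id", how="left")
   ```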
10. **Use Data Dictionary References**
    - Check `.claude/memory/data_dictionary/` for column definitions
    - Reference these definitions in comments to explain field meanings
    - Link business terminology to technical column names
    - Example: `# offence_code: Maps to ANZSOC classification system (see data_dict/cms_offence_codes.md)`
11. **Query DuckDB for Context (When Available)**
    - Use MCP DuckDB tool to inspect actual data patterns:
      - Check distinct values: `SELECT DISTINCT column_name FROM table LIMIT 20;`
      - Verify join relationships: `SELECT COUNT(*) FROM table1 JOIN table2 ...`
      - Understand data distributions: `SELECT column, COUNT(*) FROM table GROUP BY column;`
    - Use insights from queries to write more accurate comments
12. **Preserve Code Formatting Standards**
    - Do NOT add blank lines inside functions (project standard)
    - Maximum line length: 240 characters
    - Maintain existing indentation
    - Keep comments concise but informative
    - Use inline comments for single-line explanations
    - Use block comments for multi-step processes
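    Example format (an illustrative sketch of the two comment styles; the function and columns are assumptions, and no blank lines appear inside the function):

    ```python
    from pyspark.sql import functions as F
    def enrich_incidents(incident_df, person_df):
        # Block comment for a multi-step process:
        # 1. Attach person demographics to each incident
        # 2. Flag incidents where no matching person record was found
        joined_df = incident_df.join(person_df, on="incident_id", how="left")
        flagged_df = joined_df.withColumn("missing_person_record", F.col("person_id").isNull())  # inline comment for a single-line explanation
        return flagged_df
    ```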
13. **Focus Areas by File Type**

    **Silver Layer Files (`python_files/silver/`):**
    - Document source bronze tables
    - Explain validation rules
    - Describe enumeration mappings
    - Note data cleansing operations

    **Gold Layer Files (`python_files/gold/`):**
    - Document all source silver tables
    - Explain aggregation logic
    - Describe business metrics calculations
    - Note analytical transformations
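    For gold-layer files, an aggregation comment might look like this (a hedged sketch; the metric, grouping columns and consumer named are assumptions):

    ```python
    from pyspark.sql import functions as F
    # Gold metric: monthly incident counts per region, consumed by the reporting dashboard
    # Grain: one row per (region_code, incident_month)
    monthly_counts_df = incident_df.groupBy("region_code", "incident_month").agg(F.count("*").alias("incident_count"))
    ```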
    **Utility Files (`python_files/utilities/`):**
    - Explain helper function purposes
    - Document parameter meanings
    - Describe return values
    - Note edge cases handled
14. **Comment Quality Guidelines**
    - Comments should explain **WHY**, not just **WHAT**
    - Avoid obvious comments (e.g., don't say "create dataframe" for `df = spark.createDataFrame()`)
    - Focus on business context and data relationships
    - Use proper grammar and complete sentences
    - Be concise but thorough
    - Think like a new developer reading the code for the first time
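    For instance (an illustrative contrast; the column and status value are assumptions):

    ```python
    from pyspark.sql import functions as F
    # WHAT (avoid: merely restates the code): filter the dataframe
    # WHY (preferred: explains the business reason): only ACTIVE person records are current;
    # superseded records stay in bronze for audit but must not appear in reporting
    active_df = person_df.filter(F.col("record_status") == "ACTIVE")
    ```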
15. **Final Validation**
    - Run syntax check: `python3 -m py_compile <file>`
    - Run linting: `ruff check <file>`
    - Format code: `ruff format <file>`
    - Ensure all comments are accurate and helpful
## Example Output Structure

After adding comments, the file should have:

- ✅ Comprehensive header explaining file purpose
- ✅ Method-level documentation for extract/transform/load
- ✅ Detailed join operation comments (business reason, type, keys, cardinality)
- ✅ Step-by-step transformation explanations
- ✅ Data quality and validation logic documented
- ✅ Variable context for unclear names
- ✅ References to data dictionary where applicable
- ✅ Business context linking technical operations to real-world meaning
## Important Notes

- **ALWAYS** use Australian English spelling conventions throughout the comments and documentation
- **DO NOT** remove or modify existing functionality
- **DO NOT** change code structure or logic
- **ONLY** add descriptive comments
- **PRESERVE** all existing comments
- **MAINTAIN** project coding standards (no blank lines in functions, 240 character line maximum)
- **USE** the data dictionary and DuckDB queries to provide accurate context
- **THINK** about the user who will read this code and walk them through the logic clearly
|