# Assessing Code Quality

## Purpose

Analyze code quality indicators beyond architecture and produce EVIDENCE-BASED findings: a quality scorecard with concrete examples, file/line references, and actionable recommendations.

## When to Use

- Coordinator delegates quality assessment after subsystem catalog completion
- Need to identify specific, actionable quality issues beyond architectural patterns
- Output feeds into architect handover or improvement planning

## Core Principle: Evidence Over Speculation

**Good quality analysis provides concrete evidence. Poor quality analysis makes educated guesses.**

```
❌ BAD: "Payment service likely lacks error handling"
✅ GOOD: "Payment service process_payment() (line 127) raises generic Exception, loses error context"

❌ BAD: "Functions may be too long"
✅ GOOD: "src/api/orders.py:process_order() is 145 lines (lines 89-234)"

❌ BAD: "Error handling unclear"
✅ GOOD: "3 different error patterns: orders.py uses exceptions, payment.py returns codes, user.py mixes both"
```

**Your goal:** Document WHAT YOU OBSERVED, not what you guess might be true.

## The Speculation Trap (from baseline testing)

**Common rationalization:** "I don't have full context, so I'll make educated guesses to provide value."

**Reality:** Speculation masquerading as analysis is worse than saying "insufficient information."

**Baseline failure mode:**

- Document says "Error handling likely mixes patterns"
- Based on: Filename alone, not actual code review
- Should say: "Reviewed file structure only - implementation details not analyzed"

## Weasel Words - BANNED

If you catch yourself writing these, STOP:

| Banned Phrase | Why Banned | Replace With |
|---------------|------------|--------------|
| "Likely" | Speculation | "Observed in file.py:line" OR "Not analyzed" |
| "May" | Hedge | "Found X" OR "Did not review X" |
| "Unclear if" | Evasion | "Code shows X" OR "Insufficient information to assess X" |
| "Appears to" | Guessing | "Lines 12-45 implement X" OR "Not examined" |
| "Probably" | Assumption | Concrete observation OR admit limitation |
| "Should" | Inference | "Currently does X" (observation) |
| "Suggests" | Reading between lines | Quote actual code OR acknowledge gap |

**From baseline testing:** Agents will use these to fill gaps. Skill must ban them explicitly.
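A mechanical scan catches these faster than proofreading. A minimal sketch (the phrase list mirrors the table above; the report path is whatever draft you pass in - both are assumptions, and legitimate uses such as quoted code will also be flagged, so treat hits as prompts to re-check, not hard failures):

```python
import re
import sys

# Phrase list taken from the banned-words table above - extend as needed.
BANNED = ["likely", "may", "unclear if", "appears to", "probably", "should", "suggests"]

def find_weasel_words(path: str) -> list[tuple[int, str, str]]:
    """Return (line number, phrase, line text) for every banned phrase in the file."""
    hits = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            for phrase in BANNED:
                # Word boundaries so "may" does not match inside "dismay".
                if re.search(rf"\b{re.escape(phrase)}\b", line, re.IGNORECASE):
                    hits.append((lineno, phrase, line.strip()))
    return hits

if __name__ == "__main__":
    for lineno, phrase, text in find_weasel_words(sys.argv[1]):
        print(f"line {lineno}: banned phrase '{phrase}': {text}")
```

Run it against the draft before submitting. Zero hits is necessary, not sufficient.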
## Evidence Requirements

**Every finding MUST include:**

1. **File path** - Exact location
   - ✅ `src/services/payment.py`
   - ❌ "payment service"
2. **Line numbers** - Specific range
   - ✅ "Lines 127-189 (63 lines)"
   - ❌ "Long function"
3. **Code example or description** - What you saw
   - ✅ "Function has 8 nested if statements"
   - ✅ "Quoted: `except Exception as e: pass`"
   - ❌ "Complex logic"
4. **Severity with reasoning** - Why this level
   - ✅ "Critical: Swallows exceptions in payment processing"
   - ❌ "High priority issue"

**If you didn't examine implementation:**

- ✅ "Reviewed imports and structure only - implementation not analyzed"
- ❌ "Function likely does X"

## Severity Framework (REQUIRED)

**Define criteria BEFORE rating anything:**

**Critical:**
- Security vulnerability (injection, exposed secrets, auth bypass)
- Data loss/corruption risk
- Blocks deployment
- Example: Hardcoded credentials, SQL injection, unhandled exceptions in critical path

**High:**
- Frequent source of bugs
- High maintenance burden
- Performance degradation
- Example: 200-line functions, 15% code duplication, N+1 queries

**Medium:**
- Moderate maintainability concern
- Technical debt accumulation
- Example: Missing docstrings, inconsistent error handling, magic numbers

**Low:**
- Minor improvement
- Style/cosmetic
- Example: Verbose naming, minor duplication (< 5%)

**From baseline testing:** Agents will use severity labels without defining them. Enforce framework first.

## Observation vs. Inference

**Distinguish clearly:**

**Observation** (what you saw):
- "Function process_order() is 145 lines"
- "3 files contain identical validation logic (lines quoted)"
- "No docstrings found in src/api/ (0/12 functions)"

**Inference** (what it might mean):
- "145-line function suggests complexity - recommend review"
- "Identical validation in 3 files indicates duplication - recommend extraction"
- "Missing docstrings may hinder onboarding - recommend adding"

**Always lead with observation; inference is optional:**

```markdown
**Observed:** `src/api/orders.py:process_order()` is 145 lines (lines 89-234) with 12 decision points
**Assessment:** High complexity - recommend splitting into smaller functions
```

## "Insufficient Information" is Valid

**When you haven't examined something, say so:**

✅ **Honest limitation:**

```markdown
## Testing Coverage
**Not analyzed** - Test files not reviewed during subsystem cataloging
**Recommendation:** Review test/ directory to assess coverage
```

❌ **Speculation:**

```markdown
## Testing Coverage
Testing likely exists but coverage unclear. May have integration tests. Probably needs more unit tests.
```

**From baseline testing:** Agents filled every section with speculation. Skill must make "not analyzed" acceptable.

## Minimal Viable Quality Assessment

**If time/sample constraints prevent comprehensive analysis:**

1. **Acknowledge scope explicitly**
   - "Reviewed 5 files from 8 subsystems"
   - "Implementation details not examined (imports/structure only)"
2. **Document what you CAN observe** (see the sketch after this list)
   - File sizes, function counts (from grep/wc)
   - Imports (coupling indicators)
   - Structure (directory organization)
3. **List gaps explicitly**
   - "Not analyzed: Test coverage, runtime behavior, security patterns"
4. **Mark confidence appropriately**
   - "Low confidence: Small sample, structural review only"

**Better to be honest about limitations than to guess.**
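Even a structural-only pass can produce file:line evidence. A minimal sketch for Python codebases (standard library only; the 50-line threshold is an illustrative assumption, not a rule from this skill):

```python
import ast
import sys

LENGTH_THRESHOLD = 50  # Illustrative assumption - tune per codebase.

def long_functions(path: str) -> list[str]:
    """Report functions longer than LENGTH_THRESHOLD lines, with file:line evidence."""
    with open(path, encoding="utf-8") as f:
        tree = ast.parse(f.read(), filename=path)
    findings = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            length = node.end_lineno - node.lineno + 1
            if length > LENGTH_THRESHOLD:
                # Output matches the evidence format this skill requires.
                findings.append(
                    f"{path}:{node.name}() is {length} lines "
                    f"(lines {node.lineno}-{node.end_lineno})"
                )
    return findings

if __name__ == "__main__":
    for finding in long_functions(sys.argv[1]):
        print(finding)
```

Its output mirrors the GOOD examples above, e.g. `src/api/orders.py:process_order() is 145 lines (lines 89-234)` - an observation, not an inference.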
## Output Contract

Write to `05-quality-assessment.md`:

```markdown
# Code Quality Assessment

**Analysis Date:** YYYY-MM-DD
**Scope:** [What you reviewed - be specific]
**Methodology:** [How you analyzed - tools, sampling strategy]
**Confidence:** [High/Medium/Low with reasoning]

## Severity Framework
[Define Critical/High/Medium/Low with examples]

## Findings

### [Category: Complexity/Duplication/etc.]
**Observed:**
- [File path:line numbers] [Concrete observation]
- [Quote code or describe specifically]

**Severity:** [Level] - [Reasoning based on framework]

**Recommendation:** [Specific action]

### [Repeat for each finding]

## What Was NOT Analyzed
- [Explicitly list gaps]
- [No speculation about these areas]

## Recommendations
[Prioritized by severity]

## Limitations
- Sample size: [N files from M subsystems]
- Methodology constraints: [No tools, time pressure, etc.]
- Confidence assessment: [Why Medium/Low]
```

## Success Criteria

**You succeeded when:**

- Every finding has file:line evidence
- No weasel words ("likely", "may", "unclear if")
- Severity framework defined before use
- Observations distinguished from inferences
- Gaps acknowledged as "not analyzed"
- Output follows contract

**You failed when:**

- Speculation instead of evidence
- Vague findings ("functions may be long")
- Undefined severity ratings
- "Unclear if" everywhere
- Professional-looking guesswork

## Common Rationalizations (STOP SIGNALS)

| Excuse | Reality |
|--------|---------|
| "I'll provide value with educated guesses" | Speculation is not analysis. Be honest about gaps. |
| "I can infer from file names" | File names ≠ implementation. Acknowledge limitation. |
| "Stakeholders expect complete analysis" | They expect accurate analysis. Partial truth > full speculation. |
| "I reviewed it during cataloging" | Catalog focus ≠ quality focus. If you didn't examine for quality, say so. |
| "Professional reports don't say 'I don't know'" | Professional reports distinguish facts from limitations. |

## Integration with Workflow

Typically invoked after subsystem catalog completion, before or alongside diagram generation. Feeds findings into final report and architect handover.
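Before handover, a minimal self-check that the required sections of the output contract exist (heading names copied from the template above; it checks presence only, not quality of content):

```python
# Headings taken from the output contract template above.
REQUIRED_SECTIONS = [
    "## Severity Framework",
    "## Findings",
    "## What Was NOT Analyzed",
    "## Recommendations",
    "## Limitations",
]

def missing_sections(path: str = "05-quality-assessment.md") -> list[str]:
    """Return required headings that are absent from the assessment file."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    return [heading for heading in REQUIRED_SECTIONS if heading not in text]

if __name__ == "__main__":
    missing = missing_sections()
    if missing:
        print("Output contract violated - missing sections:")
        for heading in missing:
            print(f"  {heading}")
    else:
        print("All required sections present.")
```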