Documenting System Architecture
Purpose
Synthesize subsystem catalogs and architecture diagrams into final, stakeholder-ready architecture reports that serve multiple audiences through clear structure, comprehensive navigation, and actionable findings.
When to Use
- Coordinator delegates final report generation from validated artifacts
- Have 02-subsystem-catalog.md and 03-diagrams.md as inputs
- Task specifies writing to 04-final-report.md
- Need to produce executive-readable architecture documentation
- Output represents deliverable for stakeholders
Core Principle: Synthesis Over Concatenation
Good reports synthesize information into insights. Poor reports concatenate source documents.
Your goal: Create a coherent narrative with extracted patterns, concerns, and recommendations - not a copy-paste of inputs.
Document Structure
Required Sections
1. Front Matter
- Document title
- Version number
- Analysis date
- Classification (if needed)
2. Table of Contents
- Multi-level hierarchy (H2, H3, H4)
- Anchor links to all major sections
- Quick navigation for readers
3. Executive Summary (2-3 paragraphs)
- High-level system overview
- Key architectural patterns
- Major concerns and confidence assessment
- Should be readable standalone by leadership
4. System Overview
- Purpose and scope
- Technology stack
- System context (external dependencies)
5. Architecture Diagrams
- Embed all diagrams from 03-diagrams.md
- Add contextual analysis after each diagram
- Cross-reference to subsystem catalog
6. Subsystem Catalog
- One detailed entry per subsystem
- Synthesize from 02-subsystem-catalog.md (don't just copy)
- Add cross-references to diagrams and findings
7. Key Findings
- Architectural Patterns: Identified across subsystems
- Technical Concerns: Extracted from catalog concerns
- Recommendations: Actionable next steps with priorities
8. Appendices
- Methodology: How analysis was performed
- Confidence Levels: Rationale for confidence ratings
- Assumptions & Limitations: What you inferred, what's missing
Synthesis Strategies
Pattern Identification
Look across subsystems for recurring patterns:
From catalog observations:
- Subsystem A: "Dependency injection for testability"
- Subsystem B: "All external services injected"
- Subsystem C: "Injected dependencies for testing"
Synthesize into pattern:
### Dependency Injection Pattern
**Observed in**: Authentication Service, API Gateway, User Service
**Description**: External dependencies are injected rather than directly instantiated, enabling test isolation and loose coupling.
**Benefits**:
- Testability: Mock dependencies in unit tests
- Flexibility: Swap implementations without code changes
- Loose coupling: Services depend on interfaces, not concrete implementations
**Trade-offs**:
- Initial complexity: Requires dependency wiring infrastructure
- Runtime overhead: Minimal (dependency resolution at startup)
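As a rough illustration of the grouping step, the sketch below scans catalog observations for keywords and keeps only patterns that recur across subsystems. The keyword map, the `find_patterns` helper, the assignment of observations to named subsystems, and the "Repository" note are all assumptions made for this example; real synthesis still needs human judgment about what counts as the same pattern.

```python
from collections import defaultdict

# Hypothetical keyword map: pattern name -> substrings that hint at it.
PATTERN_KEYWORDS = {
    "Dependency Injection": ("inject",),
    "Repository Pattern": ("repository",),
}


def find_patterns(observations, min_subsystems=2):
    """Return {pattern: sorted subsystem names} for patterns seen in
    at least `min_subsystems` different subsystems."""
    hits = defaultdict(set)
    for subsystem, notes in observations.items():
        for note in notes:
            lowered = note.lower()
            for pattern, keywords in PATTERN_KEYWORDS.items():
                if any(keyword in lowered for keyword in keywords):
                    hits[pattern].add(subsystem)
    return {p: sorted(s) for p, s in hits.items() if len(s) >= min_subsystems}


# Observations quoted from the catalog example; the subsystem names and
# the repository note are illustrative assumptions.
observations = {
    "Authentication Service": ["Dependency injection for testability"],
    "API Gateway": ["All external services injected"],
    "User Service": [
        "Injected dependencies for testing",
        "Repository for user records",  # hypothetical, seen in one subsystem only
    ],
}

patterns = find_patterns(observations)
for name, subsystems in patterns.items():
    print(f"### {name} Pattern")
    print(f"**Observed in**: {', '.join(subsystems)}")
```

The threshold of two subsystems is a design choice: a trait seen in a single subsystem stays in that subsystem's catalog entry rather than being promoted to a cross-cutting pattern.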
Concern Extraction
Find concerns buried in catalog entries:
Catalog entries:
- API Gateway: "Rate limiter uses in-memory storage (doesn't scale horizontally)"
- Database Layer: "Connection pool max size hardcoded (should be configurable)"
- Data Service: "Large analytics queries can cause database load spikes"
Synthesize into findings:
## Technical Concerns
### 1. Rate Limiter Scalability Issue
**Severity**: Medium
**Affected Subsystem**: [API Gateway](#api-gateway)
**Issue**: In-memory rate limiting prevents horizontal scaling. If multiple gateway instances run, each maintains separate counters, allowing clients to exceed intended limits by distributing requests across instances.
**Impact**:
- Cannot scale gateway horizontally without distributed rate limiting
- Potential for rate limit bypass under load balancing
- Inconsistent rate limit enforcement
**Remediation**:
1. **Immediate** (next sprint): Document limitation, add monitoring alerts
2. **Short-term** (next quarter): Migrate to Redis-backed rate limiter
3. **Validation**: Test rate limiting with multiple gateway instances
**Priority**: High (blocks horizontal scaling)
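One way to keep extracted concerns uniform is to hold them in a small record and render the finding from it. This is a minimal sketch: the `Concern` class, its field names, and the naive anchor rule are assumptions for illustration, not part of the skill itself.

```python
from dataclasses import dataclass, field


@dataclass
class Concern:
    """One extracted concern (hypothetical structure)."""
    title: str
    severity: str
    subsystem: str
    issue: str
    impact: list = field(default_factory=list)

    def anchor(self):
        # Naive anchor: lowercase and hyphenate the subsystem name.
        return self.subsystem.lower().replace(" ", "-")

    def to_markdown(self, number):
        lines = [
            f"### {number}. {self.title}",
            f"**Severity**: {self.severity}",
            f"**Affected Subsystem**: [{self.subsystem}](#{self.anchor()})",
            f"**Issue**: {self.issue}",
            "**Impact**:",
        ]
        lines += [f"- {item}" for item in self.impact]
        return "\n".join(lines)


concern = Concern(
    title="Rate Limiter Scalability Issue",
    severity="Medium",
    subsystem="API Gateway",
    issue="In-memory rate limiting prevents horizontal scaling.",
    impact=["Inconsistent rate limit enforcement"],
)
print(concern.to_markdown(1))
```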
Recommendation Prioritization
Prioritize recommendations by combining severity scoring, impact assessment, and timeline buckets:
Severity Scoring (for each concern/recommendation)
Critical:
- Blocks deployment or core functionality
- Security vulnerability (data exposure, injection, auth bypass)
- Data corruption or loss risk
- Service outage potential
- Examples: SQL injection, hardcoded credentials, unhandled critical exceptions
High:
- Significant maintainability impact
- High effort to modify or extend
- Frequent source of bugs
- Performance degradation under load
- Examples: God objects, extreme duplication, shotgun surgery, N+1 queries
Medium:
- Moderate maintainability concern
- Refactoring beneficial but not urgent
- Technical debt accumulation
- Examples: Long functions, missing documentation, inconsistent error handling
Low:
- Minor quality improvement
- Cosmetic or style issues
- Nice-to-have enhancements
- Examples: Magic numbers, verbose naming, minor duplication
Impact Assessment Matrix
Use 2-dimensional scoring: Severity × Frequency
| Severity | High Frequency | Medium Frequency | Low Frequency |
|---|---|---|---|
| Critical | P1 - Fix immediately | P1 - Fix immediately | P2 - Fix ASAP |
| High | P2 - Fix ASAP | P2 - Fix ASAP | P3 - Plan for sprint |
| Medium | P3 - Plan for sprint | P4 - Backlog | P4 - Backlog |
| Low | P4 - Backlog | P4 - Backlog | P5 - Optional |
Frequency assessment:
- High: Affects core user workflows, used constantly, blocking development
- Medium: Affects some workflows, occasional impact, periodic friction
- Low: Edge case, rarely encountered, minimal operational impact
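The matrix above can be encoded as a plain lookup table so that priorities are assigned consistently. The sketch below is one possible encoding; the `PRIORITY_MATRIX` and `priority_for` names are assumptions for this example.

```python
# Severity x Frequency -> priority, transcribed from the matrix above.
PRIORITY_MATRIX = {
    "critical": {"high": "P1", "medium": "P1", "low": "P2"},
    "high": {"high": "P2", "medium": "P2", "low": "P3"},
    "medium": {"high": "P3", "medium": "P4", "low": "P4"},
    "low": {"high": "P4", "medium": "P4", "low": "P5"},
}


def priority_for(severity, frequency):
    """Look up the priority for a severity/frequency pair (case-insensitive)."""
    try:
        return PRIORITY_MATRIX[severity.lower()][frequency.lower()]
    except KeyError:
        raise ValueError(
            f"unknown severity/frequency: {severity!r}/{frequency!r}"
        ) from None


print(priority_for("High", "Medium"))
print(priority_for("Critical", "Low"))
```

Raising on unknown labels, rather than defaulting to a low priority, keeps a typo such as "Hgih" from silently demoting a concern.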
Timeline Buckets
Immediate (This Week / Next Sprint):
- P1 priorities (Critical severity with high or medium frequency)
- Security vulnerabilities
- Blocking deployment or development
- Quick wins (high impact, low effort)
Short-Term (1-3 Months / Next Quarter):
- P2 priorities (High severity with high or medium frequency, or Critical severity with low frequency)
- Significant maintainability improvements
- Performance optimizations
- Breaking circular dependencies
Medium-Term (3-6 Months):
- P3 priorities (Medium severity+high frequency or high+low)
- Architectural refactoring
- Technical debt paydown
- System-wide improvements
Long-Term (6-12+ Months):
- P4-P5 priorities (Medium severity with medium or low frequency, Low severity, backlog items)
- Nice-to-have improvements
- Experimental optimizations
- Deferred enhancements
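The bucket rules above can be sketched as a small mapping with an escalation path for security and blocking issues. The function name and its `security` and `blocking` flags are assumptions for illustration; "quick wins" are not modeled here because they depend on an effort estimate.

```python
TIMELINE_BY_PRIORITY = {
    "P1": "Immediate (This Week / Next Sprint)",
    "P2": "Short-Term (1-3 Months / Next Quarter)",
    "P3": "Medium-Term (3-6 Months)",
    "P4": "Long-Term (6-12+ Months)",
    "P5": "Long-Term (6-12+ Months)",
}


def timeline_for(priority, security=False, blocking=False):
    """Map a priority to its timeline bucket.

    Security vulnerabilities and issues that block deployment or
    development go to the Immediate bucket regardless of priority.
    """
    if security or blocking:
        return TIMELINE_BY_PRIORITY["P1"]
    return TIMELINE_BY_PRIORITY[priority]


print(timeline_for("P3"))
print(timeline_for("P2", security=True))
```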
Prioritized Recommendation Format
## Recommendations
### Immediate (This Week / Next Sprint) - P1
**1. Fix Rate Limiter Scalability Vulnerability**
- **Severity:** Critical (blocks horizontal scaling)
- **Frequency:** High (affects all gateway scaling attempts)
- **Priority:** P1
- **Impact:** Cannot scale API gateway, potential rate limit bypass
- **Effort:** Medium (2-3 days migration to Redis)
- **Action:**
1. Document current limitation in ops runbook (Day 1)
2. Add monitoring for rate limit violations (Day 1)
3. Migrate to Redis-backed rate limiter (Days 2-3)
4. Validate with load testing (Day 3)
**2. Remove Hardcoded Database Credentials**
- **Severity:** Critical (security vulnerability)
- **Frequency:** Low (only affects DB config rotation)
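- **Priority:** P1 (the matrix gives P2 for Critical × Low; escalated because security vulnerabilities always go in the Immediate bucket)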
- **Impact:** Credentials exposed in source control, rotation requires code deployment
- **Effort:** Low (< 1 day)
- **Action:**
1. Move credentials to environment variables
2. Update deployment configs
3. Rotate compromised credentials
### Short-Term (1-3 Months / Next Quarter) - P2
**3. Extract Common Validation Framework**
- **Severity:** High (high duplication, shotgun surgery for validation changes)
- **Frequency:** High (every new API endpoint)
- **Priority:** P2
- **Impact:** 3 duplicate validation implementations, 15% code duplication
- **Effort:** Medium (1 week to extract + migrate)
- **Action:**
1. Design validation framework API (2 days)
2. Implement core framework (2 days)
3. Migrate existing validators (2 days)
4. Document validation patterns (1 day)
**4. Externalize Database Pool Configuration**
- **Severity:** High (hardcoded limits cause connection exhaustion)
- **Frequency:** Medium (occurs during load spikes)
- **Priority:** P2
- **Impact:** Connection pool exhaustion during traffic spikes
- **Effort:** Low (2 days)
- **Action:**
1. Move pool config to environment variables
2. Add runtime pool size adjustment
3. Document tuning guidelines
### Medium-Term (3-6 Months) - P3
**5. Break User ↔ Notification Circular Dependency**
- **Severity:** Medium (architectural coupling)
- **Frequency:** Medium (affects modifications to either subsystem)
- **Priority:** P3
- **Impact:** Difficult to modify either service independently
- **Effort:** High (2-3 weeks, requires event bus introduction)
- **Action:**
1. Design event bus architecture (1 week)
2. Implement notification via events (1 week)
3. Migrate user service to publish events (3 days)
4. Remove direct dependency (2 days)
**6. Add Docstrings to Public API (27% → 90% coverage)**
- **Severity:** Medium (maintainability concern)
- **Frequency:** Medium (affects onboarding, API understanding)
- **Priority:** P3
- **Impact:** Poor API discoverability, onboarding friction
- **Effort:** Medium (2-3 weeks distributed work)
- **Action:**
1. Establish docstring standard (1 day)
2. Document public APIs in batches (2 weeks)
3. Add pre-commit hook to enforce (1 day)
### Long-Term (6-12+ Months) - P4-P5
**7. Evaluate Circuit Breaker Effectiveness**
- **Severity:** Low (optimization opportunity)
- **Frequency:** Low (affects only failure scenarios)
- **Priority:** P4
- **Impact:** Potential false positives, could improve resilience
- **Effort:** Medium (1 week testing + analysis)
- **Action:** Load testing + monitoring analysis when capacity allows
**8. Extract Magic Numbers to Configuration**
- **Severity:** Low (code quality improvement)
- **Frequency:** Low (rarely needs changing)
- **Priority:** P5
- **Impact:** Minor maintainability improvement
- **Effort:** Low (2-3 days)
- **Action:** Backlog item, tackle during related refactoring
Priority Summary Table
Include summary table for quick scanning:
## Priority Summary
| Priority | Count | Severity Distribution | Total Effort |
|----------|-------|----------------------|--------------|
| **P1** (Immediate) | 2 | Critical: 2 | 4 days |
| **P2** (Short-term) | 2 | High: 2 | 2.5 weeks |
| **P3** (Medium-term) | 2 | Medium: 2 | 5-6 weeks |
| **P4-P5** (Long-term) | 2 | Low: 2 | 2 weeks |
| **Total** | 8 | - | ~10 weeks |
**Recommended sprint allocation:**
- Sprint 1: P1 items (4 days) + start P2.3 validation framework
- Sprint 2: Complete P2.3 + P2.4 database pool config
- Quarter 2: P3 items (architectural improvements)
- Backlog: P4-P5 items (opportunistic improvements)
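The count and severity columns of a summary table like the one above can be derived from the recommendation list instead of being tallied by hand. The sketch below assumes each recommendation is reduced to a `(priority, severity)` pair; the `priority_summary` helper is hypothetical, and the effort column is left out because effort estimates are ranges that need manual judgment.

```python
from collections import Counter


def priority_summary(recommendations):
    """Render a markdown summary table from (priority, severity) pairs."""
    by_priority = {}
    for priority, severity in recommendations:
        by_priority.setdefault(priority, Counter())[severity] += 1

    rows = [
        "| Priority | Count | Severity Distribution |",
        "|----------|-------|----------------------|",
    ]
    for priority in sorted(by_priority):
        counts = by_priority[priority]
        distribution = ", ".join(
            f"{severity}: {n}" for severity, n in sorted(counts.items())
        )
        rows.append(f"| **{priority}** | {sum(counts.values())} | {distribution} |")
    rows.append(f"| **Total** | {len(recommendations)} | - |")
    return "\n".join(rows)


# Priority/severity pairs of the eight example recommendations.
recommendations = [
    ("P1", "Critical"), ("P1", "Critical"),
    ("P2", "High"), ("P2", "High"),
    ("P3", "Medium"), ("P3", "Medium"),
    ("P4", "Low"), ("P5", "Low"),
]
print(priority_summary(recommendations))
```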
Cross-Referencing Strategy
Bidirectional Links
Subsystem → Diagram:
## Authentication Service
[...subsystem details...]
**Component Architecture**: See [Authentication Service Components](#auth-service-components) diagram
**Dependencies**: [API Gateway](#api-gateway), [Database Layer](#database-layer)
Diagram → Subsystem:
### Authentication Service Components
[...diagram...]
**Description**: This component diagram shows internal structure of the Authentication Service. For additional operational details, see [Authentication Service](#authentication-service) in the subsystem catalog.
Finding → Subsystem:
### Rate Limiter Scalability Issue
**Affected Subsystem**: [API Gateway](#api-gateway)
[...concern details...]
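Because every cross-reference depends on a heading anchor that actually exists, a quick check for dangling links can be worth running before delivery. The sketch below approximates GitHub-style anchor generation (lowercase, punctuation removed, spaces turned into hyphens); other renderers may slug headings differently, and this version ignores duplicate headings and headings inside code fences. Both helper names are assumptions.

```python
import re


def slugify(heading):
    """Approximate a GitHub-style heading anchor."""
    text = heading.strip().lower()
    text = re.sub(r"[^\w\s-]", "", text)  # drop punctuation such as ':'
    return re.sub(r"\s", "-", text)       # each whitespace char becomes '-'


def broken_links(markdown):
    """Return sorted in-document link targets that match no heading."""
    headings = re.findall(r"^#{1,6}[ \t]+(.+)$", markdown, flags=re.MULTILINE)
    anchors = {slugify(h) for h in headings}
    targets = re.findall(r"\]\(#([^)]+)\)", markdown)
    return sorted({t for t in targets if t not in anchors})


report = (
    "## API Gateway\n"
    "**Dependencies**: [Database Layer](#database-layer)\n"
    "### Rate Limiter Scalability Issue\n"
    "**Affected Subsystem**: [API Gateway](#api-gateway)\n"
)
print(broken_links(report))
```

In this sample the link to the Database Layer is reported because the report has no matching heading yet.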
Navigation Patterns
Table of contents with anchor links:
## Table of Contents
1. [Executive Summary](#executive-summary)
2. [System Overview](#system-overview)
- [Purpose and Scope](#purpose-and-scope)
- [Technology Stack](#technology-stack)
3. [Architecture Diagrams](#architecture-diagrams)
- [Level 1: Context](#level-1-context)
- [Level 2: Container](#level-2-container)
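A table of contents like this one can be generated from the report's headings rather than maintained by hand. The following is a minimal sketch that assumes GitHub-style anchors and a bulleted (not numbered) layout; `build_toc` and `slugify` are hypothetical helpers, and headings inside code fences are not filtered out.

```python
import re


def slugify(heading):
    """Approximate a GitHub-style heading anchor."""
    text = heading.strip().lower()
    text = re.sub(r"[^\w\s-]", "", text)
    return re.sub(r"\s", "-", text)


def build_toc(markdown, min_level=2, max_level=4):
    """Build a nested bullet list linking to H2-H4 headings by default."""
    entries = []
    pattern = r"^(#{1,6})[ \t]+(.+)$"
    for match in re.finditer(pattern, markdown, flags=re.MULTILINE):
        level = len(match.group(1))
        if not min_level <= level <= max_level:
            continue
        title = match.group(2).strip()
        indent = "  " * (level - min_level)
        entries.append(f"{indent}- [{title}](#{slugify(title)})")
    return "\n".join(entries)


doc = (
    "# Architecture Report\n"
    "## System Overview\n"
    "### Purpose and Scope\n"
    "## Architecture Diagrams\n"
    "### Level 1: Context\n"
)
print(build_toc(doc))
```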
Multi-Audience Considerations
Executive Audience
What they need:
- Executive summary ONLY (should be self-contained)
- High-level patterns and risks
- Business impact of concerns
- Clear recommendations with timelines
Document design:
- Put executive summary first
- Make it readable standalone (no forward references)
- Focus on "why this matters" over "how it works"
Architect Audience
What they need:
- System overview + architecture diagrams + key findings
- Pattern analysis with trade-offs
- Dependency relationships
- Design decisions and rationale
Document design:
- System overview explains context
- Diagrams show structure at multiple levels
- Findings synthesize patterns and concerns
- Cross-references enable non-linear reading
Engineer Audience
What they need:
- Subsystem catalog with technical details
- Component diagrams showing internal structure
- Technology stack specifics
- File references and entry points
Document design:
- Detailed subsystem catalog
- Component-level diagrams
- Technology stack section with versions/frameworks
- Code/file references where available
Operations Audience
What they need:
- Technical concerns with remediation
- Dependency mapping
- Confidence levels (what's validated vs assumed)
- Recommendations with priorities
Document design:
- Technical concerns section up front
- Clear remediation steps
- Appendix with assumptions/limitations
- Prioritized recommendations
Optional Enhancements
Visual Aids
Subsystem Quick Reference Table:
## Appendix D: Subsystem Quick Reference
| Subsystem | Location | Confidence | Key Concerns | Dependencies |
|-----------|----------|------------|--------------|--------------|
| API Gateway | /src/gateway/ | High | Rate limiter scalability | Auth, User, Data, Logging |
| Auth Service | /src/services/auth/ | High | None | Database, Cache, Logging |
| User Service | /src/services/users/ | High | None | Database, Cache, Notification |
Pattern Summary Matrix:
## Architectural Patterns Summary
| Pattern | Subsystems Using | Benefits | Trade-offs |
|---------|------------------|----------|------------|
| Dependency Injection | Auth, Gateway, User | Testability, flexibility | Initial complexity |
| Repository Pattern | User, Data | Data access abstraction | Extra layer |
| Circuit Breaker | Gateway | Fault isolation | False positives |
Reading Guide
## How to Read This Document
**For Executives** (5 minutes):
- Read [Executive Summary](#executive-summary) only
- Optionally skim [Recommendations](#recommendations)
**For Architects** (30 minutes):
- Read [Executive Summary](#executive-summary)
- Read [System Overview](#system-overview)
- Review [Architecture Diagrams](#architecture-diagrams)
- Read [Key Findings](#key-findings)
**For Engineers** (1 hour):
- Read [System Overview](#system-overview)
- Study [Architecture Diagrams](#architecture-diagrams) (all levels)
- Read [Subsystem Catalog](#subsystem-catalog) for relevant services
- Review [Technical Concerns](#technical-concerns)
**For Operations** (45 minutes):
- Read [Executive Summary](#executive-summary)
- Study [Technical Concerns](#technical-concerns)
- Review [Recommendations](#recommendations)
- Read [Appendix C: Assumptions and Limitations](#appendix-c-assumptions-and-limitations)
Glossary
## Appendix E: Glossary
**Circuit Breaker**: Fault tolerance pattern that prevents cascading failures by temporarily blocking requests to failing services.
**Dependency Injection**: Design pattern where dependencies are provided to components rather than constructed internally, enabling testability and loose coupling.
**Repository Pattern**: Data access abstraction that separates business logic from data persistence concerns.
**Optimistic Locking**: Concurrency control technique assuming conflicts are rare, using version checks rather than locks.
Success Criteria
You succeeded when:
- Executive summary (2-3 paragraphs) distills key information
- Table of contents provides multi-level navigation
- Cross-references (30+) enable non-linear reading
- Patterns synthesized (not just listed from catalog)
- Concerns extracted and prioritized
- Recommendations actionable with timelines
- Diagrams integrated with contextual analysis
- Appendices document methodology, confidence, assumptions
- Professional structure (document metadata, clear hierarchy)
- Written to 04-final-report.md
You failed when:
- Simple concatenation of source documents
- No executive summary, or one that requires reading the full document
- Missing table of contents
- No cross-references between sections
- Patterns just copied from catalog (not synthesized)
- Concerns buried without extraction
- Recommendations vague or unprioritized
- Diagrams pasted without context
- Missing appendices
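Some of these criteria are mechanical and can be checked automatically before handing the report to a validator. The sketch below checks only for required section headings and a minimum cross-reference count; the heading names and the `check_report` helper are assumptions, and nothing here can judge whether patterns were actually synthesized rather than copied.

```python
import re

# Assumed heading titles, taken from the required sections list.
REQUIRED_HEADINGS = (
    "Table of Contents",
    "Executive Summary",
    "System Overview",
    "Architecture Diagrams",
    "Subsystem Catalog",
    "Key Findings",
)


def check_report(markdown, min_cross_refs=30):
    """Return a list of mechanical problems found in the report text."""
    headings = {
        h.strip()
        for h in re.findall(r"^#{1,6}[ \t]+(.+)$", markdown, flags=re.MULTILINE)
    }
    problems = [
        f"missing section: {name}"
        for name in REQUIRED_HEADINGS
        if name not in headings
    ]
    cross_refs = len(re.findall(r"\]\(#[^)]+\)", markdown))
    if cross_refs < min_cross_refs:
        problems.append(
            f"only {cross_refs} cross-references (expected {min_cross_refs}+)"
        )
    return problems


draft = "## Executive Summary\nSee [API Gateway](#api-gateway).\n"
for problem in check_report(draft):
    print(problem)
```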
Best Practices from Baseline Testing
What Works
✅ Comprehensive synthesis - Identify patterns, extract concerns, create narrative
✅ Professional structure - Document metadata, TOC, clear hierarchy, appendices
✅ Multi-level navigation - 20+ TOC entries, 40+ cross-references
✅ Executive summary - Self-contained 2-3 paragraph distillation
✅ Actionable findings - Concerns with severity/impact/remediation, recommendations with timelines
✅ Transparency - Confidence levels, assumptions, limitations documented
✅ Diagram integration - Embedded with contextual analysis and cross-refs
✅ Multi-audience - Executive summary + technical depth + appendices
Synthesis Patterns
Pattern identification:
- Look across multiple subsystems for recurring themes
- Group by pattern name (e.g., "Repository Pattern")
- Document which subsystems use it
- Explain benefits and trade-offs
Concern extraction:
- Find concerns in subsystem catalog entries
- Elevate to Key Findings section
- Add severity, impact, remediation
- Prioritize by timeline (immediate/short/medium/long)
Recommendation structure:
- Group by timeline
- Specific actions (not vague suggestions)
- Validation steps
- Priority indicators
Integration with Workflow
This skill is typically invoked as:
- Coordinator completes and validates subsystem catalog
- Coordinator completes and validates architecture diagrams
- Coordinator writes task specification for final report
- YOU read both source documents systematically
- YOU synthesize patterns, extract concerns, create recommendations
- YOU build professional report structure with navigation
- YOU write to 04-final-report.md
- Validator (optional) checks for synthesis quality, navigation, completeness
Your role: Transform analysis artifacts into stakeholder-ready documentation through synthesis, organization, and professional presentation.