# Example: System Architecture Documentation with Mermaid Diagrams
Complete workflow for creating comprehensive system architecture documentation for a distributed Grey Haven application.
## Context
**Project**: Multi-Tenant SaaS Platform (TanStack Start + Cloudflare Workers + FastAPI + PostgreSQL)
**Problem**: New developers take 3-4 weeks to understand the system architecture, which makes onboarding expensive
**Goal**: Create comprehensive architecture documentation that reduces onboarding time to <1 week
**Initial State**:
- No architecture documentation
- Tribal knowledge spread across 8 senior developers
- New hires ask the same questions repeatedly
- 3-4 weeks until a new developer is productive
- Architecture decisions not documented (ADRs missing)
## Step 1: System Overview with Mermaid
### High-Level Architecture Diagram
```mermaid
graph TB
subgraph "Client Layer"
Browser[Web Browser]
Mobile[Mobile App]
end
subgraph "Edge Layer (Cloudflare Workers)"
Gateway[API Gateway]
Auth[Auth Service]
Cache[KV Cache]
end
subgraph "Application Layer"
Frontend[TanStack Start<br/>React 19]
Backend[FastAPI Backend<br/>Python 3.12]
end
subgraph "Data Layer"
PostgreSQL[(PostgreSQL<br/>PlanetScale)]
Redis[(Redis Cache<br/>Upstash)]
S3[(R2 Object Storage<br/>Cloudflare)]
end
subgraph "External Services"
Stripe[Stripe<br/>Payments]
SendGrid[SendGrid<br/>Email]
DataDog[DataDog<br/>Monitoring]
end
Browser --> Gateway
Mobile --> Gateway
Gateway --> Auth
Gateway --> Frontend
Gateway --> Backend
Auth --> Cache
Frontend --> PostgreSQL
Backend --> PostgreSQL
Backend --> Redis
Backend --> S3
Backend --> Stripe
Backend --> SendGrid
Backend -.telemetry.-> DataDog
```
## Step 2: Request Flow Sequence Diagrams
### User Authentication Flow
```mermaid
sequenceDiagram
actor User
participant Browser
participant Gateway as API Gateway<br/>(Cloudflare Worker)
participant Auth as Auth Service<br/>(Cloudflare Worker)
participant KV as KV Cache
participant DB as PostgreSQL
User->>Browser: Enter email/password
Browser->>Gateway: POST /auth/login
Gateway->>Auth: Validate credentials
Auth->>DB: Query user by email
DB-->>Auth: User record
Auth->>Auth: Verify password against stored hash
alt Valid Credentials
Auth->>Auth: Generate JWT token
Auth->>KV: Store session (token -> user_id)
KV-->>Auth: OK
Auth-->>Gateway: {token, user}
Gateway-->>Browser: 200 OK {token, user}
Browser->>Browser: Store token in localStorage
Browser-->>User: Redirect to dashboard
else Invalid Credentials
Auth-->>Gateway: 401 Unauthorized
Gateway-->>Browser: {error: "INVALID_CREDENTIALS"}
Browser-->>User: Show error message
end
```
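The login branch of the diagram can be sketched in Python using only the standard library. This is a minimal illustration, not the project's Auth Service: the `sessions` dict stands in for the KV cache, PBKDF2 is an assumed hashing scheme (the source does not name one), and `SECRET_KEY`, `hash_password`, `issue_token`, and `login` are hypothetical names.

```python
import base64
import hashlib
import hmac
import json
import os
import time

SECRET_KEY = b"demo-signing-key"  # assumption: a real key would come from a secret store
sessions: dict[str, str] = {}  # stand-in for the KV cache: token -> user_id


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a password hash with PBKDF2-HMAC-SHA256 (assumed scheme)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as compact JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def issue_token(user_id: str, ttl_seconds: int = 3600) -> str:
    """Build an HS256-signed JWT with a subject and an expiry claim."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    claims = {"sub": user_id, "exp": int(time.time()) + ttl_seconds}
    payload = _b64url(json.dumps(claims).encode())
    signing_input = f"{header}.{payload}".encode()
    signature = hmac.new(SECRET_KEY, signing_input, hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url(signature)}"


def login(users: dict, email: str, password: str) -> tuple[int, dict]:
    """Return (status_code, body), mirroring the two branches of the diagram."""
    user = users.get(email)
    if user is None:
        return 401, {"error": "INVALID_CREDENTIALS"}
    candidate = hash_password(password, user["salt"])
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(candidate, user["hashed_password"]):
        return 401, {"error": "INVALID_CREDENTIALS"}
    token = issue_token(user["id"])
    sessions[token] = user["id"]  # "Store session (token -> user_id)"
    return 200, {"token": token, "user": {"id": user["id"], "email": email}}


salt = os.urandom(16)
users = {
    "ada@example.com": {
        "id": "user_1",
        "salt": salt,
        "hashed_password": hash_password("s3cret", salt),
    }
}
ok_status, ok_body = login(users, "ada@example.com", "s3cret")
bad_status, bad_body = login(users, "ada@example.com", "wrong")
```

An unknown email and a wrong password return the same error body, so a caller cannot tell which accounts exist.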
### Multi-Tenant Data Access Flow
```mermaid
sequenceDiagram
participant Client
participant Gateway
participant Backend as FastAPI Backend
participant DB as PostgreSQL<br/>(Row-Level Security)
Client->>Gateway: GET /api/orders<br/>Authorization: Bearer
Gateway->>Gateway: Validate JWT token
Gateway->>Gateway: Extract tenant_id from token
Gateway->>Backend: Forward request<br/>X-Tenant-ID: tenant_123
Backend->>Backend: Set session context<br/>SET app.tenant_id = 'tenant_123'
Backend->>DB: SELECT * FROM orders<br/>(RLS automatically filters by tenant)
Note over DB: Row-Level Security Policy:<br/>CREATE POLICY tenant_isolation ON orders<br/>FOR SELECT USING (tenant_id = current_setting('app.tenant_id'))
DB-->>Backend: Orders for tenant_123 only
Backend-->>Gateway: {orders: [...]}
Gateway-->>Client: 200 OK {orders: [...]}
```
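The backend half of this flow might look roughly like the sketch below. It assumes a DB-API style cursor with `%s` placeholders; `RecordingCursor` and `fetch_orders_for_tenant` are hypothetical names, and the cursor only records statements so the example runs without a database. PostgreSQL's `set_config(name, value, is_local)` is used in place of a literal `SET`, because it accepts the tenant ID as a bound parameter and, with `is_local` set to true, limits the setting to the current transaction.

```python
class RecordingCursor:
    """Hypothetical stand-in for a DB-API cursor: records statements instead of running them."""

    def __init__(self):
        self.executed = []

    def execute(self, sql, params=()):
        self.executed.append((sql, tuple(params)))


def fetch_orders_for_tenant(cursor, headers: dict) -> None:
    """Set the tenant context from the forwarded header, then query orders."""
    tenant_id = headers.get("X-Tenant-ID")
    if not tenant_id:
        # Fail closed: never run a tenant-scoped query without a tenant.
        raise PermissionError("missing X-Tenant-ID header")
    # Third argument true = setting applies to the current transaction only.
    cursor.execute("SELECT set_config('app.tenant_id', %s, true)", (tenant_id,))
    # No WHERE clause needed: the RLS policy filters rows by app.tenant_id.
    cursor.execute("SELECT * FROM orders")


cursor = RecordingCursor()
fetch_orders_for_tenant(cursor, {"X-Tenant-ID": "tenant_123"})
```

Scoping the setting to the transaction matters when connections are pooled, since a session-level value could otherwise carry over to the next request served by the same connection.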
## Step 3: Data Flow Diagram
### Order Processing Data Flow
```mermaid
flowchart LR
User[User Creates Order] --> Validation[Validate Order Data]
Validation --> Stock{Check Stock<br/>Availability}
Stock -->|Insufficient| Error[Return 400 Error]
Stock -->|Available| Reserve[Reserve Inventory]
Reserve --> Payment[Process Payment<br/>via Stripe]
Payment -->|Failed| Release[Release Reservation]
Release --> Error
Payment -->|Success| CreateOrder[Create Order<br/>in Database]
CreateOrder --> Queue[Queue Email<br/>Confirmation]
Queue --> Cache[Invalidate<br/>User Cache]
Cache --> Success[Return Order]
Success --> Async[Async: Send Email<br/>via SendGrid]
Success --> Metrics[Update Metrics<br/>in DataDog]
```
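The reserve, pay, and release path in this flowchart is a compensation pattern: a failed payment undoes the reservation before the error is returned. A minimal in-memory sketch follows. `process_order`, `PaymentError`, the error codes, and the 201 status are illustrative assumptions, and `charge` is a placeholder callable rather than the Stripe API.

```python
class PaymentError(Exception):
    """Raised by the charge callable when a payment is declined."""


def process_order(stock: dict, items: dict, charge) -> tuple[int, dict]:
    """items maps product_id -> quantity; charge raises PaymentError on failure."""
    # Validate order data
    if not items or any(qty <= 0 for qty in items.values()):
        return 400, {"error": "INVALID_ORDER"}
    # Check stock availability
    if any(stock.get(pid, 0) < qty for pid, qty in items.items()):
        return 400, {"error": "INSUFFICIENT_STOCK"}
    # Reserve inventory
    for pid, qty in items.items():
        stock[pid] -= qty
    try:
        charge(items)
    except PaymentError:
        # Release reservation so the stock is available to other orders
        for pid, qty in items.items():
            stock[pid] += qty
        return 400, {"error": "PAYMENT_FAILED"}
    # Create order (email queueing and cache invalidation omitted here)
    return 201, {"status": "created", "items": dict(items)}


def declined(_items):
    raise PaymentError("card declined")


stock = {"sku-1": 5}
ok = process_order(stock, {"sku-1": 2}, lambda items: None)
failed = process_order(stock, {"sku-1": 2}, declined)
```

After both calls the stock for `sku-1` is 3: the successful order consumed two units, and the failed one returned its reservation.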
## Step 4: Database Schema ER Diagram
```mermaid
erDiagram
TENANT ||--o{ USER : has
TENANT ||--o{ ORDER : has
USER ||--o{ ORDER : places
ORDER ||--|{ ORDER_ITEM : contains
PRODUCT ||--o{ ORDER_ITEM : included_in
TENANT ||--o{ PRODUCT : owns
TENANT {
uuid id PK
string name
string subdomain UK
timestamp created_at
}
USER {
uuid id PK
uuid tenant_id FK
string email UK
string hashed_password
string role
timestamp created_at
}
PRODUCT {
uuid id PK
uuid tenant_id FK
string name
decimal price
int stock
}
ORDER {
uuid id PK
uuid tenant_id FK
uuid user_id FK
decimal subtotal
decimal tax
decimal total
string status
timestamp created_at
}
ORDER_ITEM {
uuid id PK
uuid order_id FK
uuid product_id FK
int quantity
decimal unit_price
}
```
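The `subtotal`, `tax`, and `total` columns on `ORDER` can be derived from its `ORDER_ITEM` rows. The sketch below shows one way to do that with `Decimal` so that currency values are not subject to binary floating-point rounding. The 8% tax rate, the half-up rounding rule, and the `order_totals` helper are assumptions for illustration only.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP


@dataclass(frozen=True)
class OrderItem:
    """Mirrors the ORDER_ITEM entity (ids simplified to strings)."""
    product_id: str
    quantity: int
    unit_price: Decimal


def order_totals(items, tax_rate: Decimal) -> dict:
    """Compute subtotal, tax, and total, rounded to whole cents."""
    cent = Decimal("0.01")
    subtotal = sum((i.unit_price * i.quantity for i in items), Decimal("0"))
    subtotal = subtotal.quantize(cent, rounding=ROUND_HALF_UP)
    tax = (subtotal * tax_rate).quantize(cent, rounding=ROUND_HALF_UP)
    return {"subtotal": subtotal, "tax": tax, "total": subtotal + tax}


items = [
    OrderItem("p1", 2, Decimal("19.99")),
    OrderItem("p2", 1, Decimal("5.00")),
]
totals = order_totals(items, Decimal("0.08"))
# totals: subtotal 44.98, tax 3.60, total 48.58
```

Storing `unit_price` on the order item, separately from `PRODUCT.price`, presumably keeps historical orders stable when a product's price changes later.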
## Step 5: Deployment Architecture
```mermaid
graph TB
subgraph "Development"
DevBranch[Feature Branch]
DevEnv[Dev Environment<br/>Cloudflare Preview]
end
subgraph "Staging"
MainBranch[Main Branch]
StageEnv[Staging Environment<br/>staging.greyhaven.com]
StageDB[(Staging PostgreSQL)]
end
subgraph "Production"
Release[Release Tag]
ProdWorkers[Cloudflare Workers<br/>300+ Datacenters]
ProdDB[(Production PostgreSQL<br/>PlanetScale)]
ProdCache[(Redis Cache<br/>Upstash)]
end
DevBranch -->|git push| CI1[GitHub Actions]
CI1 -->|Deploy| DevEnv
DevBranch -->|PR Merged| MainBranch
MainBranch -->|Deploy| CI2[GitHub Actions]
CI2 -->|Run Tests| TestSuite
TestSuite -->|Success| StageEnv
StageEnv --> StageDB
MainBranch -->|git tag v1.0.0| Release
Release -->|Deploy| CI3[GitHub Actions]
CI3 -->|Canary 10%| ProdWorkers
CI3 -->|Monitor 10 min| Metrics
Metrics -->|Success| FullDeploy[100% Rollout]
FullDeploy --> ProdWorkers
ProdWorkers --> ProdDB
ProdWorkers --> ProdCache
```
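The canary step (10% of traffic, monitored for 10 minutes) implies a promote-or-stop decision based on metrics. The diagram does not say which metrics or thresholds are used, so the sketch below assumes a simple error-rate gate. The `canary_decision` name, the 1% threshold, and the `rollback` and `hold` outcomes are hypothetical.

```python
def canary_decision(samples, max_error_rate=0.01):
    """Decide the next rollout step from monitoring samples.

    samples: list of (requests, errors) tuples collected during the window.
    Returns "promote", "rollback", or "hold".
    """
    total_requests = sum(requests for requests, _ in samples)
    total_errors = sum(errors for _, errors in samples)
    if total_requests == 0:
        # No canary traffic means no evidence either way.
        return "hold"
    error_rate = total_errors / total_requests
    return "promote" if error_rate <= max_error_rate else "rollback"


healthy = canary_decision([(1000, 2), (1200, 3)])   # 5 errors in 2200 requests
unhealthy = canary_decision([(1000, 50)])           # 5% error rate
```

Aggregating over the whole window, rather than judging each sample alone, keeps one noisy minute from deciding the rollout.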
## Step 6: State Machine Diagram for Order Status
```mermaid
stateDiagram-v2
[*] --> Pending: Order Created
Pending --> Processing: Payment Confirmed
Pending --> Cancelled: Payment Failed
Processing --> Shipped: Fulfillment Complete
Processing --> Cancelled: Out of Stock
Shipped --> Delivered: Tracking Confirmed
Shipped --> Returned: Customer Return
Delivered --> Returned: Return Requested
Returned --> Refunded: Return Approved
Cancelled --> [*]
Delivered --> [*]
Refunded --> [*]
note right of Pending
Inventory reserved
Payment processing
end note
note right of Processing
Items picked
Preparing shipment
end note
note right of Shipped
Tracking number assigned
In transit
end note
```
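The transitions in this diagram can be enforced in code with a lookup table, so that an invalid status change fails loudly instead of being written to the database. A minimal sketch, in which `ALLOWED_TRANSITIONS` and `transition` are illustrative names:

```python
# Each state maps to the set of states it may move to, as drawn in the diagram.
ALLOWED_TRANSITIONS = {
    "Pending": {"Processing", "Cancelled"},
    "Processing": {"Shipped", "Cancelled"},
    "Shipped": {"Delivered", "Returned"},
    "Delivered": {"Returned"},
    "Returned": {"Refunded"},
    "Cancelled": set(),  # terminal
    "Refunded": set(),   # terminal
}


def transition(current: str, target: str) -> str:
    """Return the new status, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"invalid transition: {current} -> {target}")
    return target


status = "Pending"
status = transition(status, "Processing")
status = transition(status, "Shipped")
```

Keeping the table as data also makes it easy to check against the diagram during review.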
## Step 7: Architecture Decision Records (ADRs)
### ADR-001: Choose Cloudflare Workers for Edge Computing
```markdown
# ADR-001: Use Cloudflare Workers for API Gateway and Auth
**Date**: 2024-01-15
**Status**: Accepted
**Decision Makers**: Engineering Team
## Context
We need an edge computing platform for API gateway, authentication, and caching that:
- Provides global low latency (<50ms p95)
- Scales automatically without management
- Integrates with our CDN infrastructure
- Supports multi-tenant architecture
## Decision
We will use Cloudflare Workers for edge computing with KV for session storage.
## Alternatives Considered
1. **AWS Lambda@Edge**: Good performance but vendor lock-in, higher cost
2. **Traditional Load Balancer**: Single region, no edge caching
3. **Self-hosted Edge Nodes**: Complex deployment, maintenance overhead
## Consequences
**Positive**:
- Global deployment (300+ datacenters) with <50ms latency worldwide
- Auto-scaling to zero cost when idle
- Built-in DDoS protection and WAF
- KV storage for session caching (sub-millisecond reads)
- Tight CPU time limits (10 ms per request on the Workers Free plan, higher on paid plans) encourage efficient code
**Negative**:
- CPU time limits require careful optimization of hot paths
- Cold starts (though <10ms typically)
- Limited to JavaScript/TypeScript/Rust/Python (via Pyodide)
- No native PostgreSQL driver (must use HTTP-based client)
## Implementation
- API Gateway: Handles routing, CORS, rate limiting
- Auth Service: JWT validation, session management (KV)
- Cache Layer: API response caching (KV + Cache API)
## Monitoring
- Worker CPU time (aim for <500μs p95)
- KV cache hit rate (aim for >95%)
- Edge response time (aim for <50ms p95)
```
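The three monitoring targets in ADR-001 can be checked mechanically once the raw measurements are exported. The sketch below assumes the samples arrive as plain lists and uses a nearest-rank 95th percentile. `p95` and `check_targets` are hypothetical helpers, not part of any monitoring SDK.

```python
def p95(values):
    """Nearest-rank 95th percentile of a non-empty sequence."""
    if not values:
        raise ValueError("p95 requires at least one sample")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def check_targets(cpu_us, edge_ms, kv_hits, kv_reads):
    """Compare measurements against the ADR-001 monitoring targets."""
    return {
        "cpu_p95_under_500us": p95(cpu_us) < 500,
        "kv_hit_rate_over_95pct": kv_hits / kv_reads > 0.95,
        "edge_p95_under_50ms": p95(edge_ms) < 50,
    }


cpu_us = list(range(100, 500, 20))                    # 20 samples, 100..480 microseconds
edge_ms = [12, 18, 25, 31, 47, 49, 52, 20, 22, 30]    # one slow request at 52 ms
report = check_targets(cpu_us, edge_ms, kv_hits=970, kv_reads=1000)
```

In this sample data the CPU and cache targets pass, while the edge latency target fails because, with only ten samples, the nearest-rank p95 is the slowest request.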
### ADR-002: PostgreSQL with Row-Level Security for Multi-Tenancy
````markdown
# ADR-002: PostgreSQL Row-Level Security (RLS) for Multi-Tenant Isolation
**Date**: 2024-01-20
**Status**: Accepted
## Context
Multi-tenant SaaS requires strict data isolation. Accidental cross-tenant data access would be a critical security breach.
## Decision
Use PostgreSQL Row-Level Security (RLS) policies to enforce tenant isolation at the database level.
## Implementation
```sql
-- Enable RLS on every tenant-scoped table (orders shown here)
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;
-- Create a policy that filters rows by the session's tenant_id
CREATE POLICY tenant_isolation ON orders
FOR ALL
USING (tenant_id = current_setting('app.tenant_id', true)::uuid);
-- The application sets the tenant context (the tenant's UUID) on every request
SET app.tenant_id = '';
```
## Consequences
**Positive**:
- Database-level enforcement: application query bugs cannot expose other tenants' rows, provided the application connects as a role that is not the table owner, not a superuser, and lacks BYPASSRLS
- Automatic filtering on all queries (including those generated by ORMs)
- Performance: the RLS predicate can use an index on tenant_id
**Negative**:
- Requires setting the tenant context on every request, and resetting it when connections are pooled
- Slightly more complex query plans
## Monitoring
- Weekly audit: Check for tables missing RLS
- Quarterly penetration test: Attempt cross-tenant access
````
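The weekly audit listed under Monitoring could be automated with a query against PostgreSQL's `pg_tables` view, whose `rowsecurity` column reports whether RLS is enabled on a table. The sketch below assumes tenant tables live in the `public` schema. `AUDIT_SQL`, `tables_missing_rls`, and the example table names are hypothetical, and the rows are hard-coded so the example runs without a database.

```python
# Tables in the public schema that do not have row-level security enabled.
AUDIT_SQL = """
SELECT tablename
FROM pg_tables
WHERE schemaname = 'public' AND NOT rowsecurity
ORDER BY tablename
"""


def tables_missing_rls(rows, exempt=frozenset()):
    """Filter audit rows down to tables that should have RLS but do not.

    rows: iterable of (tablename,) tuples, as cursor.fetchall() would
    return for AUDIT_SQL. exempt: tables that intentionally have no
    tenant column.
    """
    return [name for (name,) in rows if name not in exempt]


# Example rows standing in for a real query result.
rows = [("audit_log",), ("schema_migrations",)]
missing = tables_missing_rls(rows, exempt={"schema_migrations"})
```

A non-empty `missing` list would be the signal to fail the audit job or open a ticket.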
## Results
### Before
- No architecture documentation
- 3-4 weeks until a new developer is productive
- 15+ hours/week answering architecture questions
- Architecture decisions lost to time
- Difficult to identify bottlenecks
### After
- Comprehensive architecture docs with 8 Mermaid diagrams
- 5 Architecture Decision Records documenting key choices
- Documentation in Git (versioned, reviewed)
- Interactive diagrams (clickable, navigable)
### Improvements
- Onboarding time: 3-4 weeks → 4-5 days (75% reduction)
- Architecture questions: 15 hrs/week → 2 hrs/week (87% reduction)
- New developer productivity: Week 4 → Week 1
- Time to understand data flow: 2 weeks → 1 day
### Developer Feedback
- "The sequence diagrams made auth flow crystal clear"
- "ERD diagram helped me understand relationships immediately"
- "ADRs answered 'why did we choose X?' questions"
## Key Lessons
1. **Mermaid Diagrams**: Version-controlled and reviewable, so they can be updated in the same pull request as the code they describe
2. **Multiple Perspectives**: System, sequence, data flow, deployment diagrams all needed
3. **ADRs are Critical**: "Why" is as important as "what"
4. **Progressive Disclosure**: Overview first, then drill into details
5. **Keep Diagrams Simple**: One concept per diagram, not everything at once
## Prevention Measures
**Implemented**:
- [x] All architecture docs in Git (versioned)
- [x] Mermaid diagrams (not static images)
- [x] ADR template for all major decisions
- [x] Onboarding checklist includes reading architecture docs
**Ongoing**:
- [ ] Auto-generate diagrams from code (infrastructure as code)
- [ ] Quarterly architecture review (docs up-to-date?)
- [ ] New ADR for every major technical decision
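For the first ongoing item, generating diagrams from code can start very small: render a dependency map as Mermaid text and commit the output. The sketch below assumes the dependencies are already available as a dictionary. The `to_mermaid` helper and the `services` map are illustrative, and a real setup would derive the map from infrastructure-as-code definitions.

```python
def to_mermaid(edges: dict[str, list[str]], direction: str = "TB") -> str:
    """Render a {source: [targets]} dependency map as a Mermaid graph."""
    lines = [f"graph {direction}"]
    # Sort for deterministic output, so regenerated diagrams diff cleanly in Git.
    for source in sorted(edges):
        for target in sorted(edges[source]):
            lines.append(f"    {source} --> {target}")
    return "\n".join(lines)


services = {
    "Gateway": ["Auth", "Frontend", "Backend"],
    "Backend": ["PostgreSQL", "Redis"],
}
diagram = to_mermaid(services)
print(diagram)
```

Running the generator in CI and failing the build when the committed diagram differs is one way to keep the documentation from drifting.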
---
Related: [openapi-generation.md](openapi-generation.md) | [coverage-validation.md](coverage-validation.md) | [Return to INDEX](INDEX.md)