# Security Architecture Review ## Overview Systematically review designs for security issues. Core principle: **Checklist-driven review finds issues that intuition misses**. **Key insight**: Ad-hoc review finds obvious issues. Systematic checklist finds subtle gaps. ## When to Use Load this skill when: - Reviewing architecture designs pre-implementation - Conducting security audits of existing systems - Evaluating third-party integrations - Pre-deployment security validation **Symptoms you need this**: - "Is this design secure?" - Reviewing microservices, APIs, data pipelines - Pre-launch security check - Compliance audit preparation **Don't use for**: - Threat modeling new systems (use `ordis/security-architect/threat-modeling`) - Designing controls (use `ordis/security-architect/security-controls-design`) - Secure patterns (use `ordis/security-architect/secure-by-design-patterns`) ## Review Process ### Step 1: Understand System Before checklists, understand: - **Components**: Services, databases, queues, external APIs - **Data flows**: Where data enters/exits, trust boundaries - **Users/actors**: Who accesses what (end users, admins, services) - **Deployment**: Cloud/on-prem, network topology ### Step 2: Apply Checklists For EACH area, go through checklist systematically. Check = verify present and secure. ### Step 3: Document Findings For each issue found: - **Description**: What's wrong - **Severity**: Critical/High/Medium/Low - **Impact**: What can attacker do - **Recommendation**: How to fix ## Checklist 1: Authentication ### Credential Storage - [ ] Passwords hashed with strong algorithm (bcrypt, Argon2, scrypt) - [ ] NOT hashed with weak algorithms (MD5, SHA1, plain SHA256) - [ ] Salt used per-password (not global salt or no salt) - [ ] Key derivation with sufficient iterations (bcrypt cost ≥12, Argon2 with recommended params) **Common Issues**: - ❌ MD5/SHA1 hashing → CRITICAL - ❌ No salt → HIGH - ❌ Global salt → MEDIUM - ❌ Low iteration count → MEDIUM ### Multi-Factor Authentication - [ ] MFA available for all users (or at least admins) - [ ] TOTP/hardware token support (not just SMS) - [ ] Backup codes for account recovery - [ ] Cannot bypass MFA via alternate login paths **Common Issues**: - ❌ No MFA → HIGH (for privileged accounts) - ❌ SMS-only 2FA → MEDIUM (SIM swapping risk) - ❌ MFA bypass path exists → HIGH ### Session Management - [ ] Session tokens have expiration (not indefinite) - [ ] Token expiry reasonable (hours for web, days max for mobile) - [ ] Token rotation on sensitive actions (password change, permission change) - [ ] Logout invalidates tokens (server-side revocation) - [ ] Tokens stored securely (HttpOnly, Secure cookies or secure storage) **Common Issues**: - ❌ Non-expiring tokens → HIGH - ❌ No logout/revocation → MEDIUM - ❌ Tokens in localStorage (XSS vulnerable) → MEDIUM ### Password Policies - [ ] Minimum length enforced (≥12 characters recommended) - [ ] No maximum length (or very high max, ≥128 chars) - [ ] Password history (prevent reuse of last N passwords) - [ ] Account lockout after failed attempts (5-10 tries, temporary lockout) - [ ] No common password blacklist (check against known-weak passwords) **Common Issues**: - ❌ No password policy → MEDIUM - ❌ Short max length (<20 chars) → LOW - ❌ No lockout → MEDIUM (brute-force vulnerability) ## Checklist 2: Authorization ### Access Control Model - [ ] Clear access control model (RBAC, ABAC, MAC documented) - [ ] Permissions defined per resource type - [ ] Authorization checks at API layer AND data layer - [ ] Consistent enforcement across all endpoints **Common Issues**: - ❌ No defined model (ad-hoc checks) → HIGH - ❌ Authorization only at API (not data layer) → MEDIUM - ❌ Inconsistent (some endpoints skip checks) → HIGH ### Privilege Escalation Prevention - [ ] Users cannot modify their own roles/permissions - [ ] API doesn't trust client-provided role claims - [ ] Admin actions require separate authentication/approval - [ ] No hidden admin endpoints without authorization **Common Issues**: - ❌ User can edit own role in profile → CRITICAL - ❌ Trusting `X-User-Role` header from client → CRITICAL - ❌ Hidden `/admin` path without authz → HIGH ### Resource-Level Authorization - [ ] Authorization checks "Can THIS user access THIS resource?" - [ ] Not just "Is user logged in?" or "Is user admin?" - [ ] Users can only access their own data (unless explicitly shared) - [ ] Object-level permission checks (IDOR prevention) **Common Issues**: - ❌ Check login only, not resource ownership → HIGH (IDOR vulnerability) - ❌ `/api/users/{user_id}` allows any authenticated user → HIGH ### Default-Deny Principle - [ ] All endpoints require authentication by default - [ ] Explicit allow-list for public endpoints - [ ] Authorization failures return 403 Forbidden (not 404) - [ ] No "development mode" backdoors in production **Common Issues**: - ❌ Default-allow (must explicitly mark secure) → HIGH - ❌ Returning 404 instead of 403 → LOW (information disclosure) - ❌ Debug/dev endpoints enabled in production → CRITICAL ## Checklist 3: Secrets Management ### Secrets Storage - [ ] No secrets in source code - [ ] No secrets in config files committed to Git - [ ] No secrets in environment variables visible in UI - [ ] Secrets stored in secrets manager (Vault, AWS Secrets Manager, etc.) - [ ] Secrets encrypted at rest **Common Issues**: - ❌ Secrets in Git history → CRITICAL - ❌ Secrets in config.yaml → CRITICAL - ❌ Database password in Dockerfile → HIGH ### Secrets Rotation - [ ] Secrets have rotation schedule (monthly/quarterly) - [ ] Rotation is automated or documented - [ ] Application supports zero-downtime rotation - [ ] Old secrets revoked after rotation **Common Issues**: - ❌ No rotation policy → MEDIUM - ❌ Manual rotation without documentation → LOW - ❌ Rotation requires downtime → LOW ### Access Control to Secrets - [ ] Secrets access restricted by service identity - [ ] Least privilege (service accesses only its secrets) - [ ] Audit log of secrets access - [ ] No shared secrets across services **Common Issues**: - ❌ All services can access all secrets → MEDIUM - ❌ No audit log → LOW - ❌ Shared API key for multiple services → MEDIUM ## Checklist 4: Data Flow ### Trust Boundary Identification - [ ] Trust boundaries documented (external → API → services → database) - [ ] Each boundary has validation rules - [ ] No implicit trust based on network location **Common Issues**: - ❌ No documented boundaries → MEDIUM - ❌ "Internal network = trusted" assumption → HIGH ### Input Validation at Boundaries - [ ] All external input validated (type, format, range) - [ ] Validation at EVERY trust boundary (not just entry point) - [ ] Allow-list validation (define what's allowed, reject rest) - [ ] Input validation errors logged **Common Issues**: - ❌ No validation at internal boundaries → MEDIUM - ❌ Deny-list only (blacklist can be bypassed) → MEDIUM - ❌ Validation at UI only (not API) → HIGH ### Output Encoding for Context - [ ] Output encoded for destination context (HTML, SQL, shell, etc.) - [ ] SQL parameterized queries (no string concatenation) - [ ] HTML escaped when rendering user data - [ ] JSON encoded when returning API responses **Common Issues**: - ❌ String concatenation for SQL → CRITICAL (SQL injection) - ❌ Unescaped HTML → HIGH (XSS) - ❌ Unescaped shell commands → CRITICAL (command injection) ### Data Classification and Handling - [ ] Sensitive data identified (PII, secrets, financial) - [ ] Sensitive data encrypted in transit - [ ] Sensitive data encrypted at rest - [ ] Sensitive data not logged - [ ] Data retention policy defined **Common Issues**: - ❌ PII in plaintext logs → HIGH - ❌ No encryption for sensitive data → HIGH - ❌ No data retention policy → MEDIUM ## Checklist 5: Network Security ### TLS/Encryption in Transit - [ ] All external traffic uses TLS (HTTPS, not HTTP) - [ ] TLS 1.2 or 1.3 only (TLS 1.0/1.1 disabled) - [ ] Strong cipher suites (no RC4, 3DES, MD5) - [ ] HSTS header present (HTTP Strict Transport Security) **Common Issues**: - ❌ HTTP allowed → CRITICAL - ❌ TLS 1.0 enabled → HIGH - ❌ Weak ciphers → MEDIUM - ❌ No HSTS → LOW ### Certificate Validation - [ ] TLS certificates validated (not self-signed in production) - [ ] Certificate expiry monitored - [ ] Certificate revocation checking (OCSP/CRL) - [ ] No certificate validation bypass in code **Common Issues**: - ❌ Self-signed certs in production → HIGH - ❌ `verify=False` in code → CRITICAL - ❌ No expiry monitoring → MEDIUM ### Network Segmentation - [ ] Database not directly accessible from internet - [ ] Services in separate subnets/VPCs - [ ] Admin interfaces on separate network - [ ] Internal services use mTLS or VPN **Common Issues**: - ❌ Database exposed to internet → CRITICAL - ❌ All services in same flat network → MEDIUM - ❌ Admin panel on public internet → HIGH ### Firewall Rules and Least-Necessary Access - [ ] Firewall rules documented - [ ] Default-deny firewall policy - [ ] Only necessary ports open - [ ] Source IP restrictions for sensitive endpoints **Common Issues**: - ❌ Default-allow firewall → HIGH - ❌ All ports open → MEDIUM - ❌ No source restrictions → MEDIUM ## Severity Guidelines Use these guidelines to assign severity: ### Critical - Direct path to data breach or system compromise - **Examples**: SQL injection, secrets in Git, authentication bypass ### High - Significant security impact, requires immediate attention - **Examples**: Weak password hashing, no MFA for admins, IDOR vulnerability ### Medium - Security weakness that should be addressed - **Examples**: No password policy, inconsistent authorization, no audit logging ### Low - Minor issue or hardening opportunity - **Examples**: Information disclosure via error messages, missing HSTS header ## Review Report Template ```markdown # Security Architecture Review **System**: [System Name] **Review Date**: [Date] **Reviewer**: [Name] ## Executive Summary [1-2 paragraph overview: Critical/High count, key findings, overall risk level] ## Findings ### Critical (Count: X) #### 1. [Finding Title] - **Area**: Authentication / Authorization / Secrets / Data Flow / Network - **Description**: [What's wrong] - **Impact**: [What attacker can do] - **Recommendation**: [How to fix] - **Affected Components**: [API Service, Database, etc.] ### High (Count: X) [Same format as Critical] ### Medium (Count: X) [Same format] ### Low (Count: X) [Same format] ## Summary Table | Finding | Severity | Area | Status | |---------|----------|------|--------| | SQL injection in /api/users | Critical | Data Flow | Open | | Secrets in Git | Critical | Secrets | Open | | No MFA for admins | High | Authentication | Open | ## Next Steps 1. Address Critical findings immediately (timeline: 1 week) 2. Address High findings (timeline: 2-4 weeks) 3. Schedule Medium/Low findings (timeline: 1-3 months) ``` ## Quick Reference: Checklist Summary | Area | Key Checks | |------|------------| | **Authentication** | Strong hashing (bcrypt), MFA, token expiry, lockout policy | | **Authorization** | RBAC/ABAC model, resource-level checks, default-deny, no privilege escalation | | **Secrets** | Not in Git, secrets manager, rotation policy, access control | | **Data Flow** | Trust boundaries, input validation, output encoding, data classification | | **Network** | TLS 1.2+, certificate validation, segmentation, firewall rules | ## Common Mistakes ### ❌ Ad-Hoc Review Without Checklist **Wrong**: Review design intuitively, find obvious issues, call it done **Right**: Systematically go through all 5 checklists, check every item **Why**: Intuition finds 50% of issues. Checklist finds 90%. ### ❌ Stopping After Finding First Issues **Wrong**: Find SQL injection, report it, stop review **Right**: Complete full checklist review even after finding critical issues **Why**: Systems often have multiple unrelated vulnerabilities. ### ❌ Vague Recommendations **Wrong**: "Improve authentication security" **Right**: "Replace MD5 with bcrypt (cost factor 12), add salt per-password" **Why**: Specific recommendations are actionable. Vague ones are ignored. ### ❌ Missing Severity Assignment **Wrong**: List issues without severity **Right**: Assign Critical/High/Medium/Low to prioritize fixes **Why**: Teams need to know what to fix first. ## Cross-References **Use BEFORE this skill**: - `ordis/security-architect/threat-modeling` - Understand threats, then review if design addresses them **Use WITH this skill**: - `muna/technical-writer/documentation-structure` - Document review as ADR or report **Use AFTER this skill**: - `ordis/security-architect/security-controls-design` - Design fixes for identified issues ## Real-World Impact **Systematic reviews using these checklists:** - **Pre-launch review caught SQL injection** in `/api/users/{id}` endpoint (would have been CRITICAL production vulnerability) - **Checklist found secrets in Git history** across 3 repositories (developers missed it in manual review) - **Authorization review found 12 IDOR vulnerabilities** where `/api/orders/{order_id}` didn't check ownership (intuitive review found only 2) **Key lesson**: **Systematic checklist review finds 3-5x more issues than ad-hoc intuitive review.**