Files
gh-tachyon-beep-skillpacks-…/skills/using-security-architect/security-architecture-review.md
2025-11-30 08:59:46 +08:00

434 lines
13 KiB
Markdown

# Security Architecture Review
## Overview
Systematically review designs for security issues. Core principle: **Checklist-driven review finds issues that intuition misses**.
**Key insight**: Ad-hoc review finds obvious issues. Systematic checklist finds subtle gaps.
## When to Use
Load this skill when:
- Reviewing architecture designs pre-implementation
- Conducting security audits of existing systems
- Evaluating third-party integrations
- Pre-deployment security validation
**Symptoms you need this**:
- "Is this design secure?"
- Reviewing microservices, APIs, data pipelines
- Pre-launch security check
- Compliance audit preparation
**Don't use for**:
- Threat modeling new systems (use `ordis/security-architect/threat-modeling`)
- Designing controls (use `ordis/security-architect/security-controls-design`)
- Secure patterns (use `ordis/security-architect/secure-by-design-patterns`)
## Review Process
### Step 1: Understand System
Before checklists, understand:
- **Components**: Services, databases, queues, external APIs
- **Data flows**: Where data enters/exits, trust boundaries
- **Users/actors**: Who accesses what (end users, admins, services)
- **Deployment**: Cloud/on-prem, network topology
### Step 2: Apply Checklists
For EACH area, go through checklist systematically. Check = verify present and secure.
### Step 3: Document Findings
For each issue found:
- **Description**: What's wrong
- **Severity**: Critical/High/Medium/Low
- **Impact**: What can attacker do
- **Recommendation**: How to fix
## Checklist 1: Authentication
### Credential Storage
- [ ] Passwords hashed with strong algorithm (bcrypt, Argon2, scrypt)
- [ ] NOT hashed with weak algorithms (MD5, SHA1, plain SHA256)
- [ ] Salt used per-password (not global salt or no salt)
- [ ] Key derivation with sufficient iterations (bcrypt cost ≥12, Argon2 with recommended params)
**Common Issues**:
- ❌ MD5/SHA1 hashing → CRITICAL
- ❌ No salt → HIGH
- ❌ Global salt → MEDIUM
- ❌ Low iteration count → MEDIUM
### Multi-Factor Authentication
- [ ] MFA available for all users (or at least admins)
- [ ] TOTP/hardware token support (not just SMS)
- [ ] Backup codes for account recovery
- [ ] Cannot bypass MFA via alternate login paths
**Common Issues**:
- ❌ No MFA → HIGH (for privileged accounts)
- ❌ SMS-only 2FA → MEDIUM (SIM swapping risk)
- ❌ MFA bypass path exists → HIGH
### Session Management
- [ ] Session tokens have expiration (not indefinite)
- [ ] Token expiry reasonable (hours for web, days max for mobile)
- [ ] Token rotation on sensitive actions (password change, permission change)
- [ ] Logout invalidates tokens (server-side revocation)
- [ ] Tokens stored securely (HttpOnly, Secure cookies or secure storage)
**Common Issues**:
- ❌ Non-expiring tokens → HIGH
- ❌ No logout/revocation → MEDIUM
- ❌ Tokens in localStorage (XSS vulnerable) → MEDIUM
### Password Policies
- [ ] Minimum length enforced (≥12 characters recommended)
- [ ] No maximum length (or very high max, ≥128 chars)
- [ ] Password history (prevent reuse of last N passwords)
- [ ] Account lockout after failed attempts (5-10 tries, temporary lockout)
- [ ] No common password blacklist (check against known-weak passwords)
**Common Issues**:
- ❌ No password policy → MEDIUM
- ❌ Short max length (<20 chars) → LOW
- ❌ No lockout → MEDIUM (brute-force vulnerability)
## Checklist 2: Authorization
### Access Control Model
- [ ] Clear access control model (RBAC, ABAC, MAC documented)
- [ ] Permissions defined per resource type
- [ ] Authorization checks at API layer AND data layer
- [ ] Consistent enforcement across all endpoints
**Common Issues**:
- ❌ No defined model (ad-hoc checks) → HIGH
- ❌ Authorization only at API (not data layer) → MEDIUM
- ❌ Inconsistent (some endpoints skip checks) → HIGH
### Privilege Escalation Prevention
- [ ] Users cannot modify their own roles/permissions
- [ ] API doesn't trust client-provided role claims
- [ ] Admin actions require separate authentication/approval
- [ ] No hidden admin endpoints without authorization
**Common Issues**:
- ❌ User can edit own role in profile → CRITICAL
- ❌ Trusting `X-User-Role` header from client → CRITICAL
- ❌ Hidden `/admin` path without authz → HIGH
### Resource-Level Authorization
- [ ] Authorization checks "Can THIS user access THIS resource?"
- [ ] Not just "Is user logged in?" or "Is user admin?"
- [ ] Users can only access their own data (unless explicitly shared)
- [ ] Object-level permission checks (IDOR prevention)
**Common Issues**:
- ❌ Check login only, not resource ownership → HIGH (IDOR vulnerability)
-`/api/users/{user_id}` allows any authenticated user → HIGH
### Default-Deny Principle
- [ ] All endpoints require authentication by default
- [ ] Explicit allow-list for public endpoints
- [ ] Authorization failures return 403 Forbidden (not 404)
- [ ] No "development mode" backdoors in production
**Common Issues**:
- ❌ Default-allow (must explicitly mark secure) → HIGH
- ❌ Returning 404 instead of 403 → LOW (information disclosure)
- ❌ Debug/dev endpoints enabled in production → CRITICAL
## Checklist 3: Secrets Management
### Secrets Storage
- [ ] No secrets in source code
- [ ] No secrets in config files committed to Git
- [ ] No secrets in environment variables visible in UI
- [ ] Secrets stored in secrets manager (Vault, AWS Secrets Manager, etc.)
- [ ] Secrets encrypted at rest
**Common Issues**:
- ❌ Secrets in Git history → CRITICAL
- ❌ Secrets in config.yaml → CRITICAL
- ❌ Database password in Dockerfile → HIGH
### Secrets Rotation
- [ ] Secrets have rotation schedule (monthly/quarterly)
- [ ] Rotation is automated or documented
- [ ] Application supports zero-downtime rotation
- [ ] Old secrets revoked after rotation
**Common Issues**:
- ❌ No rotation policy → MEDIUM
- ❌ Manual rotation without documentation → LOW
- ❌ Rotation requires downtime → LOW
### Access Control to Secrets
- [ ] Secrets access restricted by service identity
- [ ] Least privilege (service accesses only its secrets)
- [ ] Audit log of secrets access
- [ ] No shared secrets across services
**Common Issues**:
- ❌ All services can access all secrets → MEDIUM
- ❌ No audit log → LOW
- ❌ Shared API key for multiple services → MEDIUM
## Checklist 4: Data Flow
### Trust Boundary Identification
- [ ] Trust boundaries documented (external → API → services → database)
- [ ] Each boundary has validation rules
- [ ] No implicit trust based on network location
**Common Issues**:
- ❌ No documented boundaries → MEDIUM
- ❌ "Internal network = trusted" assumption → HIGH
### Input Validation at Boundaries
- [ ] All external input validated (type, format, range)
- [ ] Validation at EVERY trust boundary (not just entry point)
- [ ] Allow-list validation (define what's allowed, reject rest)
- [ ] Input validation errors logged
**Common Issues**:
- ❌ No validation at internal boundaries → MEDIUM
- ❌ Deny-list only (blacklist can be bypassed) → MEDIUM
- ❌ Validation at UI only (not API) → HIGH
### Output Encoding for Context
- [ ] Output encoded for destination context (HTML, SQL, shell, etc.)
- [ ] SQL parameterized queries (no string concatenation)
- [ ] HTML escaped when rendering user data
- [ ] JSON encoded when returning API responses
**Common Issues**:
- ❌ String concatenation for SQL → CRITICAL (SQL injection)
- ❌ Unescaped HTML → HIGH (XSS)
- ❌ Unescaped shell commands → CRITICAL (command injection)
### Data Classification and Handling
- [ ] Sensitive data identified (PII, secrets, financial)
- [ ] Sensitive data encrypted in transit
- [ ] Sensitive data encrypted at rest
- [ ] Sensitive data not logged
- [ ] Data retention policy defined
**Common Issues**:
- ❌ PII in plaintext logs → HIGH
- ❌ No encryption for sensitive data → HIGH
- ❌ No data retention policy → MEDIUM
## Checklist 5: Network Security
### TLS/Encryption in Transit
- [ ] All external traffic uses TLS (HTTPS, not HTTP)
- [ ] TLS 1.2 or 1.3 only (TLS 1.0/1.1 disabled)
- [ ] Strong cipher suites (no RC4, 3DES, MD5)
- [ ] HSTS header present (HTTP Strict Transport Security)
**Common Issues**:
- ❌ HTTP allowed → CRITICAL
- ❌ TLS 1.0 enabled → HIGH
- ❌ Weak ciphers → MEDIUM
- ❌ No HSTS → LOW
### Certificate Validation
- [ ] TLS certificates validated (not self-signed in production)
- [ ] Certificate expiry monitored
- [ ] Certificate revocation checking (OCSP/CRL)
- [ ] No certificate validation bypass in code
**Common Issues**:
- ❌ Self-signed certs in production → HIGH
-`verify=False` in code → CRITICAL
- ❌ No expiry monitoring → MEDIUM
### Network Segmentation
- [ ] Database not directly accessible from internet
- [ ] Services in separate subnets/VPCs
- [ ] Admin interfaces on separate network
- [ ] Internal services use mTLS or VPN
**Common Issues**:
- ❌ Database exposed to internet → CRITICAL
- ❌ All services in same flat network → MEDIUM
- ❌ Admin panel on public internet → HIGH
### Firewall Rules and Least-Necessary Access
- [ ] Firewall rules documented
- [ ] Default-deny firewall policy
- [ ] Only necessary ports open
- [ ] Source IP restrictions for sensitive endpoints
**Common Issues**:
- ❌ Default-allow firewall → HIGH
- ❌ All ports open → MEDIUM
- ❌ No source restrictions → MEDIUM
## Severity Guidelines
Use these guidelines to assign severity:
### Critical
- Direct path to data breach or system compromise
- **Examples**: SQL injection, secrets in Git, authentication bypass
### High
- Significant security impact, requires immediate attention
- **Examples**: Weak password hashing, no MFA for admins, IDOR vulnerability
### Medium
- Security weakness that should be addressed
- **Examples**: No password policy, inconsistent authorization, no audit logging
### Low
- Minor issue or hardening opportunity
- **Examples**: Information disclosure via error messages, missing HSTS header
## Review Report Template
```markdown
# Security Architecture Review
**System**: [System Name]
**Review Date**: [Date]
**Reviewer**: [Name]
## Executive Summary
[1-2 paragraph overview: Critical/High count, key findings, overall risk level]
## Findings
### Critical (Count: X)
#### 1. [Finding Title]
- **Area**: Authentication / Authorization / Secrets / Data Flow / Network
- **Description**: [What's wrong]
- **Impact**: [What attacker can do]
- **Recommendation**: [How to fix]
- **Affected Components**: [API Service, Database, etc.]
### High (Count: X)
[Same format as Critical]
### Medium (Count: X)
[Same format]
### Low (Count: X)
[Same format]
## Summary Table
| Finding | Severity | Area | Status |
|---------|----------|------|--------|
| SQL injection in /api/users | Critical | Data Flow | Open |
| Secrets in Git | Critical | Secrets | Open |
| No MFA for admins | High | Authentication | Open |
## Next Steps
1. Address Critical findings immediately (timeline: 1 week)
2. Address High findings (timeline: 2-4 weeks)
3. Schedule Medium/Low findings (timeline: 1-3 months)
```
## Quick Reference: Checklist Summary
| Area | Key Checks |
|------|------------|
| **Authentication** | Strong hashing (bcrypt), MFA, token expiry, lockout policy |
| **Authorization** | RBAC/ABAC model, resource-level checks, default-deny, no privilege escalation |
| **Secrets** | Not in Git, secrets manager, rotation policy, access control |
| **Data Flow** | Trust boundaries, input validation, output encoding, data classification |
| **Network** | TLS 1.2+, certificate validation, segmentation, firewall rules |
## Common Mistakes
### ❌ Ad-Hoc Review Without Checklist
**Wrong**: Review design intuitively, find obvious issues, call it done
**Right**: Systematically go through all 5 checklists, check every item
**Why**: Intuition finds 50% of issues. Checklist finds 90%.
### ❌ Stopping After Finding First Issues
**Wrong**: Find SQL injection, report it, stop review
**Right**: Complete full checklist review even after finding critical issues
**Why**: Systems often have multiple unrelated vulnerabilities.
### ❌ Vague Recommendations
**Wrong**: "Improve authentication security"
**Right**: "Replace MD5 with bcrypt (cost factor 12), add salt per-password"
**Why**: Specific recommendations are actionable. Vague ones are ignored.
### ❌ Missing Severity Assignment
**Wrong**: List issues without severity
**Right**: Assign Critical/High/Medium/Low to prioritize fixes
**Why**: Teams need to know what to fix first.
## Cross-References
**Use BEFORE this skill**:
- `ordis/security-architect/threat-modeling` - Understand threats, then review if design addresses them
**Use WITH this skill**:
- `muna/technical-writer/documentation-structure` - Document review as ADR or report
**Use AFTER this skill**:
- `ordis/security-architect/security-controls-design` - Design fixes for identified issues
## Real-World Impact
**Systematic reviews using these checklists:**
- **Pre-launch review caught SQL injection** in `/api/users/{id}` endpoint (would have been CRITICAL production vulnerability)
- **Checklist found secrets in Git history** across 3 repositories (developers missed it in manual review)
- **Authorization review found 12 IDOR vulnerabilities** where `/api/orders/{order_id}` didn't check ownership (intuitive review found only 2)
**Key lesson**: **Systematic checklist review finds 3-5x more issues than ad-hoc intuitive review.**