# Threat Modeling

## Overview

Systematic identification of security threats using proven frameworks. Threat modeling finds threats that intuition misses by applying structured methodologies.

**Core Principle**: Security intuition finds obvious threats. Systematic threat modeling finds subtle, critical threats that lead to real vulnerabilities.

## When to Use

Load this skill when:

- Designing new systems or features (before implementation)
- Adding security-sensitive functionality (auth, data handling, APIs)
- Reviewing existing designs for security gaps
- Investigating security incidents (what else could be exploited?)
- User mentions: "threat model", "security risks", "what could go wrong", "attack surface"

**Use BEFORE implementation**: threats found after deployment are often an order of magnitude more expensive to fix than threats found during design.

## Don't Use For

- **Implementing specific security controls** (use security-controls-design)
- **Code-level security patterns** (use secure-by-design-patterns)
- **Reviewing existing designs for completeness** (use security-architecture-review)
- **Compliance mapping** (use compliance-awareness-and-mapping)
- **Documenting threats after they're identified** (use documenting-threats-and-controls)

This skill is for IDENTIFYING threats systematically. Once threats are identified, route to appropriate skills for designing controls, implementing patterns, or documenting decisions.

## The STRIDE Framework

**STRIDE** is a systematic threat enumeration framework. Apply to EVERY component, interface, and data flow.

### S - Spoofing Identity

**Definition**: Attacker pretends to be someone/something else

**Questions to Ask**:
- Can attacker claim a different identity?
- Is authentication required? Can it be bypassed?
- Are credentials properly validated?
- Can tokens/sessions be stolen or forged?

**Example Threats**:
- Stolen authentication tokens
- Forged JWT signatures
- Session hijacking via XSS
- API key leakage in logs

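
Forged or tampered tokens are usually defeated by signing them with a server-side key and verifying the signature with a constant-time comparison. The following is a minimal sketch using only Python's standard library (`hmac`, `hashlib`, `secrets`). The token format `<user_id>.<signature>` and the names `SECRET_KEY`, `issue_token`, and `verify_token` are hypothetical; production systems would normally use a vetted JWT or session library that also handles expiry and key rotation.

```python
import hashlib
import hmac
import secrets
from typing import Optional

# Hypothetical server-side signing key; in practice, load it from a secrets manager.
SECRET_KEY = secrets.token_bytes(32)


def _sign(payload: str) -> str:
    """Hex-encoded HMAC-SHA256 of the payload under SECRET_KEY."""
    return hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()


def issue_token(user_id: str) -> str:
    """Issue a token of the (hypothetical) form '<user_id>.<signature>'."""
    return f"{user_id}.{_sign(user_id)}"


def verify_token(token: str) -> Optional[str]:
    """Return the user_id if the signature is valid, otherwise None."""
    user_id, sep, signature = token.rpartition(".")
    if not sep:
        return None  # malformed: no signature part
    expected = _sign(user_id)
    # compare_digest avoids short-circuiting on the first mismatching byte.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None
    return user_id


token = issue_token("alice")
print(verify_token(token))                              # alice
print(verify_token("admin." + token.split(".", 1)[1]))  # None: signature is bound to "alice"
```

Signing addresses forgery, not theft: a stolen valid token still verifies, which is why the questions above also cover session theft (for example via XSS).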
### T - Tampering with Data

**Definition**: Unauthorized modification of data or code

**Questions to Ask**:
- Can attacker modify data in transit? (MITM)
- Can attacker modify data at rest? (database, files, config)
- Can attacker modify code? (supply chain, config injection)
- **Can configuration override security properties?** (CRITICAL - often missed)

**Example Threats**:
- Configuration files modifying security_level properties
- YAML/JSON injection overriding access controls
- Database tampering if integrity protection (MAC or authenticated encryption) is missing
- Code injection via deserialization

**⚠️ Property Override Pattern** (VULN-004):
```yaml
# Plugin declares security_level=UNOFFICIAL in code
# Attacker adds to YAML config:
plugins:
  datasource:
    security_level: SECRET  # OVERRIDES code declaration!
```

Always ask: **"Can external configuration override security-critical properties?"**

### R - Repudiation

**Definition**: Attacker denies performing an action (no audit trail)

**Questions to Ask**:
- Are security-relevant actions logged?
- Can logs be tampered with or deleted?
- Is logging sufficient for forensics?
- Can attacker perform reconnaissance without detection?

**Example Threats**:
- No logging of failed authorization attempts
- Logs stored without integrity protection (MAC, signatures)
- Insufficient detail for incident response
- Log injection attacks

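
One way to give logs the integrity protection mentioned above is a hash chain: each entry's MAC also covers the previous entry's MAC, so editing an entry, or deleting or reordering earlier entries, invalidates the chain. This is a minimal standard-library sketch; `LOG_KEY` and the `AuditLog` class are hypothetical, and it assumes the key is held by the logging service rather than by the components being audited.

```python
import hashlib
import hmac
import json

# Hypothetical key held only by the logging service.
LOG_KEY = b"replace-with-a-key-from-a-secret-store"


class AuditLog:
    """Append-only log in which each entry's MAC chains to the previous MAC."""

    def __init__(self) -> None:
        self.entries = []  # list of {"record": dict, "mac": str}

    @staticmethod
    def _mac(prev_mac: str, record: dict) -> str:
        data = prev_mac.encode() + json.dumps(record, sort_keys=True).encode()
        return hmac.new(LOG_KEY, data, hashlib.sha256).hexdigest()

    def append(self, actor: str, action: str, outcome: str) -> None:
        record = {"actor": actor, "action": action, "outcome": outcome}
        prev_mac = self.entries[-1]["mac"] if self.entries else ""
        self.entries.append({"record": record, "mac": self._mac(prev_mac, record)})

    def verify(self) -> bool:
        """Recompute the chain; False if an entry was edited or earlier entries removed/reordered."""
        prev_mac = ""
        for entry in self.entries:
            expected = self._mac(prev_mac, entry["record"])
            if not hmac.compare_digest(expected.encode(), entry["mac"].encode()):
                return False
            prev_mac = entry["mac"]
        return True


log = AuditLog()
log.append("alice", "read:report-42", "denied")
log.append("alice", "read:report-42", "allowed")
print(log.verify())  # True
log.entries[0]["record"]["outcome"] = "allowed"  # attacker rewrites history
print(log.verify())  # False
```

The chain alone does not detect truncation of the newest entries; a common complement is periodically copying the latest MAC to a separate system.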
### I - Information Disclosure

**Definition**: Exposure of information to unauthorized parties

**Questions to Ask**:
- What data is exposed in responses, logs, errors?
- Can attacker enumerate resources or users?
- Are temporary files, caches, or memory properly cleared?
- Can attacker infer sensitive data from timing/behavior?

**Example Threats**:
- Secrets in error messages or stack traces
- Timing attacks revealing password validity
- Cache poisoning exposing other users' data
- Path traversal reading arbitrary files

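
A common source of the timing threat above is a login path that returns early for unknown usernames and only hashes the password for real accounts, so response time reveals which users exist. A minimal sketch of the usual countermeasure, assuming PBKDF2 from Python's `hashlib` and a hypothetical in-memory `USERS` store: always run the same key derivation, then compare in constant time.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor; tune for your hardware


def hash_password(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)


# Hypothetical user store: username -> (salt, derived key)
_alice_salt = os.urandom(16)
USERS = {"alice": (_alice_salt, hash_password("correct horse", _alice_salt))}

# Dummy credentials for unknown usernames, so both paths do the same work.
_DUMMY_SALT = os.urandom(16)
_DUMMY_KEY = hash_password(os.urandom(16).hex(), _DUMMY_SALT)


def login(username: str, password: str) -> bool:
    """Return True only for a known user with the correct password."""
    salt, stored = USERS.get(username, (_DUMMY_SALT, _DUMMY_KEY))
    candidate = hash_password(password, salt)  # always runs, even for unknown users
    matches = hmac.compare_digest(candidate, stored)
    return matches and username in USERS


print(login("alice", "correct horse"))  # True
print(login("mallory", "guess"))        # False
```

Callers should also return the same error message for "unknown user" and "wrong password", or the response body leaks what the timing no longer does.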
### D - Denial of Service

**Definition**: Making system unavailable or degrading performance

**Questions to Ask**:
- Are there resource limits (CPU, memory, connections)?
- Can attacker trigger expensive operations?
- Is rate limiting implemented?
- Can attacker cause crashes or hangs?

**Example Threats**:
- Unbounded recursion or loops
- Memory exhaustion via large payloads
- Algorithmic complexity attacks (e.g., hash collisions)
- Crash via malformed input

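
For the rate-limiting question, a token bucket is a common building block: each client's bucket refills at a fixed rate, and expensive operations can be charged more than one token. The `TokenBucket` class below is a hypothetical single-process sketch; the injectable `clock` makes it deterministic to test, and a real deployment would typically keep one bucket per client (for example keyed by API key), often in a shared store.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts of up to `capacity` tokens, refilled at `rate` tokens/second."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Consume `cost` tokens if available; charge expensive operations more."""
        now = self.clock()
        # Refill for the elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Simulated clock so the example is deterministic.
fake_now = [0.0]
bucket = TokenBucket(capacity=3, rate=1.0, clock=lambda: fake_now[0])
print([bucket.allow() for _ in range(4)])  # [True, True, True, False]
fake_now[0] += 1.0                         # one second later: one token refilled
print(bucket.allow())                      # True
```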
### E - Elevation of Privilege

**Definition**: Gaining capabilities beyond what's authorized

**Questions to Ask**:
- Can attacker access admin functions?
- Can attacker escalate from low to high privilege?
- Are privilege checks performed at every layer?
- **Can type system be bypassed?** (ADR-004 pattern)

**Example Threats**:
- Missing authorization checks on sensitive endpoints
- Horizontal privilege escalation (access other users' data)
- Vertical privilege escalation (user → admin)
- Duck typing allowing security bypass

## Attack Tree Construction

**Purpose**: Visual/structured representation of attack paths from goal → exploitation

### Attack Tree Format

```
ROOT: Attacker Goal (e.g., "Access classified data")
├─ BRANCH 1: Attack Vector
│  ├─ LEAF: Specific Exploit (with feasibility)
│  └─ LEAF: Alternative Exploit
└─ BRANCH 2: Alternative Vector
   └─ LEAF: Exploit Method
```

### Example: Configuration Override Attack (VULN-004)

```
ROOT: Access classified data with insufficient clearance
├─ Override Plugin Security Level
│  ├─ Inject security_level into YAML config ⭐ (VULN-004 - actually happened)
│  ├─ Modify plugin source code (requires code access)
│  └─ Bypass registry to register malicious plugin (ADR-003 gap)
├─ Exploit Trusted Downgrade
│  ├─ Compromise high-clearance component (supply chain)
│  └─ Abuse legitimate downgrade path (ADR-005 gap)
└─ Bypass Type System
   └─ Duck-type plugin without BasePlugin inheritance (ADR-004 gap)
```

**⭐ = Easiest/highest risk path**

### How to Build Attack Trees

1. **Start with attacker goal**: What does attacker want? (data access, DoS, privilege escalation)
2. **Branch by attack vector**: How could they achieve it? (config, network, code)
3. **Leaf nodes are specific exploits**: Concrete technical steps
4. **Mark feasibility**: Easy, Medium, Hard (or Low/Med/High effort)
5. **Identify easiest path**: This is your highest priority to mitigate

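
Steps 4 and 5 can be mechanized once feasibility is recorded on each leaf. A minimal sketch, assuming every branch is an OR node (the attacker needs only one child to succeed, as in the trees above) and mapping Easy/Medium/Hard to efforts 1/2/3. The `Node` class is hypothetical, and the ratings in the example tree are illustrative: the original analysis only establishes that YAML injection was the easiest path.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

EFFORT = {"Easy": 1, "Medium": 2, "Hard": 3}


@dataclass
class Node:
    name: str
    feasibility: Optional[str] = None  # set on leaves only
    children: List["Node"] = field(default_factory=list)


def easiest_path(node: Node) -> Tuple[int, List[str]]:
    """Return (effort, path) of the cheapest root-to-leaf attack path (OR semantics)."""
    if not node.children:
        return EFFORT[node.feasibility], [node.name]
    effort, path = min(easiest_path(child) for child in node.children)
    return effort, [node.name] + path


tree = Node("Access classified data with insufficient clearance", children=[
    Node("Override plugin security level", children=[
        Node("Inject security_level into YAML config", "Easy"),
        Node("Modify plugin source code", "Hard"),
        Node("Bypass registry to register malicious plugin", "Medium"),
    ]),
    Node("Exploit trusted downgrade", children=[
        Node("Compromise high-clearance component", "Hard"),
        Node("Abuse legitimate downgrade path", "Medium"),
    ]),
    Node("Bypass type system", children=[
        Node("Duck-type plugin without BasePlugin inheritance", "Medium"),
    ]),
])

effort, path = easiest_path(tree)
print(effort)  # 1
print(" -> ".join(path))
```

The `min` encodes OR semantics; a step that requires several sub-steps together would instead need their combined effort.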
## Enforcement Gap Analysis

**Pattern**: Security properties must be enforced at EVERY layer. Single-layer enforcement fails.

### Layers to Check

**For any security property (e.g., security_level, access control, data classification):**

1. **Schema/Type Layer**: Is property type-safe? Can it be None/invalid?
2. **Registration Layer**: Is component registered? Can attacker bypass registry?
3. **Construction Layer**: Is property immutable after creation? Can it be modified?
4. **Runtime Layer**: Is property checked before sensitive operations?
5. **Post-Operation Layer**: Is result validated against expected property?

### Example: MLS Security Level Enforcement (ADR-002 → 005)

| Layer | Gap Found | Fix Required |
|-------|-----------|--------------|
| **Registry** | Plugin not registered at all (ADR-003) | Central plugin registry with runtime checks |
| **Type System** | Protocol allows duck typing bypass (ADR-004) | ABC with sealed methods, not Protocol |
| **Immutability** | security_level could be mutated (VULN-009) | Frozen dataclass + runtime checks |
| **Trust** | Trusted downgrade assumes no compromise (ADR-005) | Strict mode disables trusted downgrade |

**Key Insight**: Each gap was found AFTER implementation. Systematic enforcement gap analysis would have caught all four upfront.

### How to Apply

For each security property:
1. **List all layers** where property matters
2. **Ask per layer**: "Can attacker bypass this layer?"
3. **Design defense-in-depth**: Redundant checks at multiple layers

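
Steps 1 and 2 amount to a property-by-layer inventory. A minimal sketch, assuming the five layer names from Layers to Check and a hypothetical hand-filled `design` inventory recording where each property is currently enforced; every reported gap becomes a "can attacker bypass this layer?" question.

```python
from typing import Dict, List, Set

# The five layers from "Layers to Check".
LAYERS = ("schema", "registration", "construction", "runtime", "post_operation")


def enforcement_gaps(enforced: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Map each security property to the layers where it is NOT enforced."""
    unknown = {layer for layers in enforced.values() for layer in layers} - set(LAYERS)
    if unknown:
        raise ValueError(f"unknown layers: {sorted(unknown)}")
    return {prop: [layer for layer in LAYERS if layer not in layers]
            for prop, layers in enforced.items()}


# Hypothetical inventory for one design review.
design = {
    "security_level": {"schema", "runtime"},
    "authentication": set(LAYERS),
}
for prop, gaps in enforcement_gaps(design).items():
    print(f"{prop}: {', '.join(gaps) if gaps else 'no gaps'}")
# security_level: registration, construction, post_operation
# authentication: no gaps
```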
## Risk Scoring

**Purpose**: Prioritize threats by (Likelihood × Impact)

### Likelihood Scale

| Score | Likelihood | Criteria |
|-------|-----------|----------|
| **3** | High | Easy to exploit, attacker has means and motive, no special access needed |
| **2** | Medium | Requires some skill or access, exploit path exists but not trivial |
| **1** | Low | Requires significant expertise, insider access, or rare conditions |

### Impact Scale

| Score | Impact | Criteria |
|-------|--------|----------|
| **3** | High | Complete system compromise, data breach, financial loss, safety risk |
| **2** | Medium | Partial compromise, limited data exposure, service degradation |
| **1** | Low | Minor information leakage, temporary DoS, limited scope |

### Risk Matrix

```
                 IMPACT
                1   2   3
              ┌───┬───┬───┐
            3 │ M │ H │ C │   C = Critical (fix immediately)
LIKELIHOOD  2 │ L │ M │ H │   H = High (fix before launch)
            1 │ L │ L │ M │   M = Medium (fix soon)
              └───┴───┴───┘   L = Low (fix if time permits)
```

### Example: VULN-004 Config Override

- **Likelihood**: 3 (High) - YAML files easily modified with filesystem access
- **Impact**: 3 (High) - Bypass MLS enforcement, access classified data
- **Risk Score**: 9 (Critical) - **Fix immediately**

### Example: ADR-004 Type System Bypass

- **Likelihood**: 2 (Medium) - Requires knowing that a duck-typed plugin bypasses the type checks
- **Impact**: 3 (High) - Complete security bypass
- **Risk Score**: 6 (High) - **Fix before launch**

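
The scoring rules fit in a few lines of code. A minimal sketch with a hypothetical `Threat` type; the rating thresholds (9 Critical, 6 High, 3 to 4 Medium, 1 to 2 Low) are read off the risk matrix and reproduce all nine of its cells.

```python
from typing import List, NamedTuple


class Threat(NamedTuple):
    name: str
    likelihood: int  # 1-3, per the Likelihood Scale
    impact: int      # 1-3, per the Impact Scale

    @property
    def score(self) -> int:
        return self.likelihood * self.impact

    @property
    def rating(self) -> str:
        # Thresholds read off the 3x3 risk matrix.
        if self.score >= 9:
            return "Critical"
        if self.score >= 6:
            return "High"
        if self.score >= 3:
            return "Medium"
        return "Low"


def prioritize(threats: List[Threat]) -> List[Threat]:
    """Highest risk first; ties keep their input order."""
    return sorted(threats, key=lambda t: t.score, reverse=True)


threats = [
    Threat("ADR-005 trusted downgrade", 1, 3),
    Threat("VULN-004 config override", 3, 3),
    Threat("ADR-004 type system bypass", 2, 3),
]
for t in prioritize(threats):
    print(f"{t.score} {t.rating:<8} {t.name}")
```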
## Threat Modeling Workflow

### Step 1: System Decomposition

Break system into components:
1. **Entry points**: APIs, file uploads, configuration, user input
2. **Data stores**: Databases, caches, logs, files
3. **External dependencies**: Third-party APIs, libraries, services
4. **Trust boundaries**: Where privilege level changes, network boundaries
5. **Security-critical components**: Auth, access control, crypto, secrets management

### Step 2: Apply STRIDE per Component

For EACH component/interface, systematically ask STRIDE questions:

**Example: Plugin Configuration Component**

| STRIDE | Threat Found | Priority |
|--------|-------------|----------|
| **S** | None (no identity claims) | - |
| **T** | Config tampering to override security_level (VULN-004) | Critical |
| **R** | Config changes not logged | Medium |
| **I** | Config may contain secrets in plaintext | High |
| **D** | Malformed YAML causes parser crash | Low |
| **E** | Config override elevates plugin privilege | Critical |

### Step 3: Build Attack Trees

For each high-priority threat, construct attack tree:
- Goal: What does attacker want?
- Vectors: How could they get it?
- Exploits: Specific technical steps

Mark easiest paths with ⭐.

### Step 4: Check Enforcement Gaps

For each security property (authentication, authorization, encryption):
1. List enforcement layers (schema, registry, runtime, etc.)
2. Check each layer for gaps
3. Design redundant checks (defense-in-depth)

### Step 5: Score and Prioritize

- Calculate Likelihood × Impact for each threat
- Sort by risk score (highest first)
- Set mitigation deadlines (Critical → immediate, High → before launch)

### Step 6: Document Threats

Create threat model document:
```markdown
# Threat Model: [System Name]

## Scope
[Components, entry points, trust boundaries]

## Threats Identified

### THREAT-001: Configuration Override Attack (CRITICAL)
**STRIDE**: Tampering, Elevation of Privilege
**Attack Tree**: [Include tree diagram or text description]
**Risk Score**: 9 (L:3 × I:3)
**Mitigation**: Forbid security_level in config (schema), runtime verification, frozen dataclass

### THREAT-002: [Next threat...]

## Enforcement Gaps
[List gaps found in defense-in-depth analysis]

## Risk Matrix
[Include prioritized threat list]
```

## Common Patterns That Intuition Misses

### Pattern 1: Property Override via Configuration

**Symptom**: Security property declared in code, but configuration system allows overriding it

**Example**: VULN-004 - Plugin declares security_level in code, YAML config overrides it

**How to Spot**:
- Code declares security property (access_level, security_level, role)
- Configuration system loads external data (YAML, JSON, database)
- No explicit check that config cannot override security properties

**Mitigation**: Schema MUST forbid security properties in config, runtime verification

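
A minimal sketch of both halves of this mitigation, contrasted with the vulnerable loader. Everything here is hypothetical: `Plugin`, `FORBIDDEN_CONFIG_KEYS`, `naive_load`, and `safe_load` stand in for whatever plugin and config machinery a real system uses, and in practice the schema half is often expressed with a schema library that rejects forbidden keys.

```python
from typing import Any, Dict, Type

# Security-critical keys that configuration may never set (illustrative list).
FORBIDDEN_CONFIG_KEYS = frozenset({"security_level", "access_level", "role"})


class ConfigError(ValueError):
    pass


class Plugin:
    """Hypothetical plugin whose clearance is declared in code."""
    DECLARED_SECURITY_LEVEL = "UNOFFICIAL"

    def __init__(self, options: Dict[str, Any]) -> None:
        self.options = options
        self.security_level = self.DECLARED_SECURITY_LEVEL


def naive_load(plugin_cls: Type[Plugin], config: Dict[str, Any]) -> Plugin:
    """VULNERABLE: copies every config key onto the instance."""
    plugin = plugin_cls({})
    for key, value in config.items():
        setattr(plugin, key, value)  # "security_level: SECRET" overrides the code
    return plugin


def safe_load(plugin_cls: Type[Plugin], config: Dict[str, Any]) -> Plugin:
    # Schema layer: reject security-critical keys before constructing anything.
    forbidden = FORBIDDEN_CONFIG_KEYS.intersection(config)
    if forbidden:
        raise ConfigError(f"config may not set {sorted(forbidden)}")
    plugin = plugin_cls(dict(config))  # options stay in their own namespace
    # Runtime layer: catches overrides introduced during construction,
    # e.g. a subclass constructor that changes security_level.
    if plugin.security_level != plugin_cls.DECLARED_SECURITY_LEVEL:
        raise ConfigError("security_level diverged from code declaration")
    return plugin


attacker_config = {"batch_size": 100, "security_level": "SECRET"}
print(naive_load(Plugin, attacker_config).security_level)  # SECRET
try:
    safe_load(Plugin, attacker_config)
except ConfigError as exc:
    print(exc)  # config may not set ['security_level']
```

A real system would repeat the runtime comparison before sensitive operations, not only at load time.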
### Pattern 2: Enforcement at One Layer Only

**Symptom**: Security check at one layer, but attacker can bypass that layer

**Example**: ADR-003 - MLS checks assume plugin is registered, but no check that plugin IS registered

**How to Spot**:
- Security check at schema/type layer but not runtime
- Trust in single source of truth (registry, type system) without verification
- No redundant checks

**Mitigation**: Defense-in-depth - check at schema, registry, runtime, post-operation

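
A minimal sketch of the missing runtime layer from ADR-003: the access check takes clearance from the registry and treats an unregistered plugin as an error, instead of trusting whatever `security_level` the object reports about itself. `PluginRegistry`, `check_read`, `DemoPlugin`, and the three-level ordering are hypothetical.

```python
from typing import Dict

# Hypothetical ordering, lowest to highest.
LEVELS = ["UNOFFICIAL", "OFFICIAL", "SECRET"]


class UnregisteredPluginError(RuntimeError):
    pass


class PluginRegistry:
    """Central registry: the only trusted source of plugin clearances."""

    def __init__(self) -> None:
        self._clearance: Dict[object, str] = {}

    def register(self, plugin: object, level: str) -> None:
        if level not in LEVELS:
            raise ValueError(f"unknown level {level!r}")
        self._clearance[plugin] = level

    def clearance_of(self, plugin: object) -> str:
        try:
            return self._clearance[plugin]
        except KeyError:
            raise UnregisteredPluginError(f"{plugin!r} was never registered") from None


def check_read(registry: PluginRegistry, plugin: object, data_level: str) -> None:
    """Runtime layer: verify registration AND clearance before every read."""
    clearance = registry.clearance_of(plugin)  # raises if the registry was bypassed
    if LEVELS.index(clearance) < LEVELS.index(data_level):
        raise PermissionError(f"{clearance} plugin may not read {data_level} data")


class DemoPlugin:
    security_level = "SECRET"  # self-reported value; check_read ignores it


registry = PluginRegistry()
trusted, rogue = DemoPlugin(), DemoPlugin()
registry.register(trusted, "OFFICIAL")

check_read(registry, trusted, "OFFICIAL")  # passes silently
for plugin, level in [(trusted, "SECRET"), (rogue, "OFFICIAL")]:
    try:
        check_read(registry, plugin, level)
    except (PermissionError, UnregisteredPluginError) as exc:
        print(type(exc).__name__)  # PermissionError, then UnregisteredPluginError
```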
### Pattern 3: Type System as Security Boundary

**Symptom**: Relying on type system (Protocol, interface) for security enforcement

**Example**: ADR-004 - Protocol typing allows duck typing to bypass BasePlugin

**How to Spot**:
- Security property defined in Protocol or interface
- No nominal type enforcement (isinstance check, ABC)
- Runtime doesn't verify actual type, just duck typing compatibility

**Mitigation**: Use ABC with sealed methods, runtime isinstance checks

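
The ADR-004 gap can be reproduced in a few lines. This sketch assumes a hypothetical `PluginProtocol`, `BasePlugin`, and `DuckPlugin`. Python has no native sealed methods, so the "sealed" member is emulated with `__init_subclass__`. Because `ABC.register()` can make `isinstance` succeed without inheritance, the hypothetical `require_plugin` checks the MRO rather than calling `isinstance`.

```python
from abc import ABC, abstractmethod
from typing import List, Protocol, runtime_checkable


@runtime_checkable
class PluginProtocol(Protocol):
    """Structural type: any object with these members satisfies it."""
    security_level: str

    def load(self) -> List[str]: ...


class BasePlugin(ABC):
    """Nominal base: plugins must inherit explicitly and cannot redefine clearance."""

    _SEALED = ("security_level",)

    def __init_subclass__(cls, **kwargs) -> None:
        super().__init_subclass__(**kwargs)
        # Emulate a sealed member: refuse subclasses that redefine it.
        for name in BasePlugin._SEALED:
            if name in cls.__dict__:
                raise TypeError(f"{cls.__name__} may not override {name!r}")

    def __init__(self, security_level: str) -> None:
        self._security_level = security_level

    @property
    def security_level(self) -> str:
        return self._security_level

    @abstractmethod
    def load(self) -> List[str]: ...


class DuckPlugin:
    """Never inherits BasePlugin, but matches the Protocol's shape."""
    security_level = "SECRET"

    def load(self) -> List[str]:
        return ["classified rows"]


def require_plugin(obj: object) -> BasePlugin:
    """Runtime nominal check; the MRO test also defeats BasePlugin.register()."""
    if BasePlugin not in type(obj).__mro__:
        raise TypeError(f"{type(obj).__name__} does not inherit BasePlugin")
    return obj


duck = DuckPlugin()
print(isinstance(duck, PluginProtocol))  # True: the structural check is satisfied
print(isinstance(duck, BasePlugin))      # False: the nominal check is not
```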
### Pattern 4: Trusted Component Assumptions

**Symptom**: Assuming high-privilege component will never be compromised

**Example**: ADR-005 - Trusted downgrade assumes high-clearance component is always safe

**How to Spot**:
- Component granted special privileges ("trusted")
- No monitoring or verification of trusted component behavior
- Insider threat or supply chain compromise not in threat model

**Mitigation**: Trust but verify - log all actions, anomaly detection, strict mode without trust

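
A minimal sketch of "trust but verify" for a downgrade path, assuming a hypothetical `downgrade` function, a hypothetical `trusted` allow-list, and a three-level ordering. Strict mode refuses all downgrades; otherwise every downgrade is written to an audit logger. The anomaly detection that would consume that log is out of scope here.

```python
import logging
from typing import Dict, Set

audit = logging.getLogger("mls.audit")  # hypothetical audit channel

LEVELS = ["UNOFFICIAL", "OFFICIAL", "SECRET"]  # hypothetical ordering, low to high


class DowngradeRefused(PermissionError):
    pass


def downgrade(record: Dict[str, str], component: str, trusted: Set[str],
              target_level: str, strict: bool) -> Dict[str, str]:
    """Relabel a record to a lower level on behalf of a trusted component."""
    if strict:
        # Strict mode: no component is trusted to downgrade, however privileged.
        raise DowngradeRefused(f"{component}: downgrade disabled in strict mode")
    if component not in trusted:
        raise DowngradeRefused(f"{component}: not a trusted downgrader")
    if LEVELS.index(target_level) >= LEVELS.index(record["level"]):
        raise ValueError("target level must be lower than the current level")
    # Trust but verify: every downgrade leaves a record for review.
    audit.warning("DOWNGRADE %s: %s -> %s by %s",
                  record["id"], record["level"], target_level, component)
    return {**record, "level": target_level}


record = {"id": "r-17", "level": "SECRET"}
print(downgrade(record, "sanitizer", {"sanitizer"}, "OFFICIAL", strict=False)["level"])  # OFFICIAL
try:
    downgrade(record, "sanitizer", {"sanitizer"}, "OFFICIAL", strict=True)
except DowngradeRefused as exc:
    print(exc)  # sanitizer: downgrade disabled in strict mode
```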
### Pattern 5: Immutability Assumption

**Symptom**: Assuming language feature (frozen, const, final) provides security

**Example**: VULN-009 - Frozen dataclass but `__dict__` bypass possible

**How to Spot**:
- Security property marked frozen/immutable via language feature
- No runtime check that property hasn't changed
- Language feature has known bypasses (`__dict__`, `__setattr__`)

**Mitigation**: Language feature + runtime verification + test all bypass methods

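
The VULN-009 bypasses are easy to reproduce. This sketch uses a hypothetical `PluginSecurity` dataclass to show that `frozen=True` only blocks ordinary assignment. It pairs the dataclass with one possible form of runtime verification: a comparison against a value recorded at registration, assuming that record lives somewhere the attacker cannot write.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Dict


@dataclass(frozen=True)
class PluginSecurity:
    name: str
    security_level: str


# Runtime verification: values recorded at registration time, kept outside
# the object an attacker can reach. Names here are hypothetical.
_registered_levels: Dict[str, str] = {}


def register(plugin: PluginSecurity) -> None:
    _registered_levels[plugin.name] = plugin.security_level


def verify_unchanged(plugin: PluginSecurity) -> None:
    if _registered_levels.get(plugin.name) != plugin.security_level:
        raise RuntimeError(f"{plugin.name}: security_level changed after registration")


p = PluginSecurity("datasource", "UNOFFICIAL")
register(p)

try:
    p.security_level = "SECRET"  # blocked by frozen=True
except FrozenInstanceError:
    print("direct assignment blocked")

object.__setattr__(p, "security_level", "SECRET")  # bypass 1: succeeds
p.__dict__["security_level"] = "TOP SECRET"         # bypass 2: also succeeds
print(p.security_level)                             # TOP SECRET

try:
    verify_unchanged(p)
except RuntimeError as exc:
    print(exc)  # datasource: security_level changed after registration
```

Adding `slots=True` (Python 3.10+) removes the `__dict__` route but not `object.__setattr__`, which is why each bypass method needs its own test.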
## Quick Reference Checklist

**Use this checklist for every threat modeling session:**

### Pre-Session
- [ ] Identify scope (components, entry points, trust boundaries)
- [ ] Gather architecture diagrams, API specs, data flow diagrams

### STRIDE Application
- [ ] Apply S.T.R.I.D.E to EVERY component/interface
- [ ] Document threats found per category
- [ ] Check for property override patterns
- [ ] Check for enforcement gap patterns

### Attack Trees
- [ ] Build attack tree for each high-priority threat
- [ ] Mark easiest exploitation paths
- [ ] Identify prerequisites (what attacker needs)

### Risk Scoring
- [ ] Score Likelihood (1-3) for each threat
- [ ] Score Impact (1-3) for each threat
- [ ] Calculate Risk = L × I
- [ ] Prioritize by risk score

### Enforcement Gaps
- [ ] List security properties (authentication, authorization, encryption, etc.)
- [ ] For each property, check: Schema? Registry? Runtime? Post-op?
- [ ] Identify gaps in defense-in-depth

### Documentation
- [ ] Create threat model document
- [ ] Include attack trees, risk matrix, mitigation plans
- [ ] Share with team for review

## Common Mistakes

### ❌ Intuitive Threat Finding Only
**Wrong**: "I'll just think about what could go wrong"
**Right**: Systematically apply STRIDE to every component

**Why**: Intuition finds obvious threats. STRIDE finds subtle, critical threats like VULN-004.

### ❌ Threat Modeling After Implementation
**Wrong**: "Let's build it first, then threat model"
**Right**: Threat model BEFORE implementation

**Why**: Threats found post-implementation require expensive re-architecture. Threats found in design are cheap to fix.

### ❌ Single-Layer Validation
**Wrong**: "Schema validates config, so it's secure"
**Right**: Validate at schema, registry, runtime, post-operation

**Why**: Attackers bypass single layers. Defense-in-depth catches them.

### ❌ Trusting Language Features for Security
**Wrong**: "It's frozen=True, so it can't be modified"
**Right**: Language feature + runtime verification + test bypass methods

**Why**: Language features have bypasses (VULN-009). Always verify.

### ❌ Skipping Risk Scoring
**Wrong**: "All threats are important, fix them all"
**Right**: Score L×I, prioritize Critical/High, fix Low only if time permits

**Why**: Resources are limited. Critical threats must be fixed first.

## Real-World Examples

### Example 1: VULN-004 - Configuration Override Attack

**System**: Plugin system with YAML configuration and MLS security levels

**STRIDE Analysis**:
- **T** (Tampering): Config file tampering ✓
- **E** (Elevation): Override security_level property ✓ **← Caught by STRIDE**

**Attack Tree**:
```
Goal: Access classified data
└─ Override security_level to SECRET
   ├─ Inject security_level: SECRET into YAML ⭐ (easiest)
   ├─ Modify source code (harder)
   └─ Compromise plugin registry (harder)
```

**Risk Score**: L:3 × I:3 = 9 (Critical)

**Mitigation**: Forbid security_level in config schema + runtime verification

### Example 2: ADR-002 → 005 - MLS Design Gaps

**System**: Multi-Level Security enforcement for plugins

**Enforcement Gap Analysis**:
1. **Registry Layer**: No check that plugin is registered (ADR-003) ✓
2. **Type Layer**: Protocol allows duck typing (ADR-004) ✓
3. **Immutability**: security_level could be mutated (VULN-009) ✓
4. **Trust**: Trusted downgrade assumes no compromise (ADR-005) ✓

**All four gaps found by systematic enforcement analysis** - would have prevented 3 follow-up ADRs.

**Risk Scores**:
- ADR-003 (registry): L:2 × I:3 = 6 (High)
- ADR-004 (type): L:2 × I:3 = 6 (High)
- ADR-005 (trust): L:1 × I:3 = 3 (Medium)

## When NOT to Threat Model

**Don't threat model for**:
- Non-security features (UI styling, analytics dashboards with no sensitive data)
- Changes that don't touch attack surface (refactoring internal code, renaming variables)
- Systems with no sensitive data and no attack value (internal dev tools, prototypes)

**Quick test**: If attacker can't gain anything (data, money, access, disruption), threat modeling may be overkill.

## Cross-References

### Load These Skills Together

**For comprehensive security**:
- `ordis/security-architect/threat-modeling` (this skill) - Find threats
- `ordis/security-architect/security-controls-design` - Design mitigations
- `ordis/security-architect/secure-by-design-patterns` - Prevent threats at architecture level

**For documentation**:
- `ordis/security-architect/documenting-threats-and-controls` - Document threat model
- `muna/technical-writer/documentation-structure` - Structure threat docs as ADRs

## Summary

**Threat modeling IS systematic threat discovery using STRIDE, attack trees, and risk scoring.**

**Key Principles**:
1. **STRIDE every component** - systematic beats intuition
2. **Build attack trees** - find easiest exploitation paths
3. **Check enforcement gaps** - defense-in-depth at every layer
4. **Score risks** - L × I prioritization
5. **Do it early** - before implementation, when fixes are cheap

**Meta-rule**: If you're designing something security-sensitive and you haven't threat modeled it, you've missed critical threats. Always threat model first.