gh-tachyon-beep-skillpacks-…/skills/using-security-architect/threat-modeling.md
2025-11-30 08:59:46 +08:00

# Threat Modeling
## Overview
Systematic identification of security threats using proven frameworks. Threat modeling finds threats that intuition misses by applying structured methodologies.
**Core Principle**: Security intuition finds obvious threats. Systematic threat modeling finds subtle, critical threats that lead to real vulnerabilities.
## When to Use
Load this skill when:
- Designing new systems or features (before implementation)
- Adding security-sensitive functionality (auth, data handling, APIs)
- Reviewing existing designs for security gaps
- Investigating security incidents (what else could be exploited?)
- User mentions: "threat model", "security risks", "what could go wrong", "attack surface"
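**Use BEFORE implementation** - threats found after deployment typically cost far more to fix than threats found during design, because the fix often means re-architecture rather than a design change.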
## Don't Use For
- **Implementing specific security controls** (use security-controls-design)
- **Code-level security patterns** (use secure-by-design-patterns)
- **Reviewing existing designs for completeness** (use security-architecture-review)
- **Compliance mapping** (use compliance-awareness-and-mapping)
- **Documenting threats after they're identified** (use documenting-threats-and-controls)
This skill is for IDENTIFYING threats systematically. Once threats are identified, route to appropriate skills for designing controls, implementing patterns, or documenting decisions.
## The STRIDE Framework
**STRIDE** is a systematic threat enumeration framework. Apply to EVERY component, interface, and data flow.
### S - Spoofing Identity
**Definition**: Attacker pretends to be someone/something else
**Questions to Ask**:
- Can attacker claim a different identity?
- Is authentication required? Can it be bypassed?
- Are credentials properly validated?
- Can tokens/sessions be stolen or forged?
**Example Threats**:
- Stolen authentication tokens
- Forged JWT signatures
- Session hijacking via XSS
- API key leakage in logs
### T - Tampering with Data
**Definition**: Unauthorized modification of data or code
**Questions to Ask**:
- Can attacker modify data in transit? (MITM)
- Can attacker modify data at rest? (database, files, config)
- Can attacker modify code? (supply chain, config injection)
- **Can configuration override security properties?** (CRITICAL - often missed)
**Example Threats**:
- Configuration files modifying security_level properties
- YAML/JSON injection overriding access controls
- Database tampering if encryption/MAC missing
- Code injection via deserialization
**⚠️ Property Override Pattern** (VULN-004):
```yaml
# Plugin declares security_level=UNOFFICIAL in code
# Attacker adds to YAML config:
plugins:
  datasource:
    security_level: SECRET  # OVERRIDES code declaration!
```
Always ask: **"Can external configuration override security-critical properties?"**
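
One way to answer this question at load time is to reject any configuration that names a security-critical key. The following is a minimal sketch, assuming the config has already been parsed into nested dicts and lists; `FORBIDDEN_KEYS`, `find_forbidden_keys`, and `validate_config` are hypothetical names chosen for illustration, not part of any existing system.

```python
from typing import Any, Iterator

# Hypothetical set of properties that only code may declare.
FORBIDDEN_KEYS = frozenset({"security_level"})


def find_forbidden_keys(config: Any, path: str = "") -> Iterator[str]:
    """Yield the dotted path of every forbidden key in a parsed config."""
    if isinstance(config, dict):
        for key, value in config.items():
            here = f"{path}.{key}" if path else str(key)
            if key in FORBIDDEN_KEYS:
                yield here
            # Keep descending: the key may be nested at any depth.
            yield from find_forbidden_keys(value, here)
    elif isinstance(config, list):
        for index, item in enumerate(config):
            yield from find_forbidden_keys(item, f"{path}[{index}]")


def validate_config(config: Any) -> None:
    """Raise ValueError if the config tries to set a security property."""
    violations = list(find_forbidden_keys(config))
    if violations:
        raise ValueError(f"config may not set security properties: {violations}")


# The override from the YAML example above, as parsed data.
malicious = {"plugins": {"datasource": {"security_level": "SECRET"}}}
print(list(find_forbidden_keys(malicious)))
```

A check like this covers only the schema layer. It does not replace verifying the property again at runtime.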
### R - Repudiation
**Definition**: Attacker denies performing an action (no audit trail)
**Questions to Ask**:
- Are security-relevant actions logged?
- Can logs be tampered with or deleted?
- Is logging sufficient for forensics?
- Can attacker perform reconnaissance without detection?
**Example Threats**:
- No logging of failed authorization attempts
- Logs stored without integrity protection (MAC, signatures)
- Insufficient detail for incident response
- Log injection attacks
### I - Information Disclosure
**Definition**: Exposure of information to unauthorized parties
**Questions to Ask**:
- What data is exposed in responses, logs, errors?
- Can attacker enumerate resources or users?
- Are temporary files, caches, or memory properly cleared?
- Can attacker infer sensitive data from timing/behavior?
**Example Threats**:
- Secrets in error messages or stack traces
- Timing attacks revealing password validity
- Cache poisoning exposing other users' data
- Path traversal reading arbitrary files
### D - Denial of Service
**Definition**: Making system unavailable or degrading performance
**Questions to Ask**:
- Are there resource limits (CPU, memory, connections)?
- Can attacker trigger expensive operations?
- Is rate limiting implemented?
- Can attacker cause crashes or hangs?
**Example Threats**:
- Unbounded recursion or loops
- Memory exhaustion via large payloads
- Algorithmic complexity attacks (e.g., hash collisions)
- Crash via malformed input
### E - Elevation of Privilege
**Definition**: Gaining capabilities beyond what's authorized
**Questions to Ask**:
- Can attacker access admin functions?
- Can attacker escalate from low to high privilege?
- Are privilege checks performed at every layer?
- **Can type system be bypassed?** (ADR-004 pattern)
**Example Threats**:
- Missing authorization checks on sensitive endpoints
- Horizontal privilege escalation (access other users' data)
- Vertical privilege escalation (user → admin)
- Duck typing allowing security bypass
## Attack Tree Construction
**Purpose**: Visual/structured representation of attack paths from goal → exploitation
### Attack Tree Format
```
ROOT: Attacker Goal (e.g., "Access classified data")
├─ BRANCH 1: Attack Vector
│  ├─ LEAF: Specific Exploit (with feasibility)
│  └─ LEAF: Alternative Exploit
└─ BRANCH 2: Alternative Vector
   └─ LEAF: Exploit Method
```
### Example: Configuration Override Attack (VULN-004)
```
ROOT: Access classified data with insufficient clearance
├─ Override Plugin Security Level
│  ├─ Inject security_level into YAML config ⭐ (VULN-004 - actually happened)
│  ├─ Modify plugin source code (requires code access)
│  └─ Bypass registry to register malicious plugin (ADR-003 gap)
├─ Exploit Trusted Downgrade
│  ├─ Compromise high-clearance component (supply chain)
│  └─ Abuse legitimate downgrade path (ADR-005 gap)
└─ Bypass Type System
   └─ Duck-type plugin without BasePlugin inheritance (ADR-004 gap)
```
**⭐ = Easiest/highest risk path**
### How to Build Attack Trees
1. **Start with attacker goal**: What does attacker want? (data access, DoS, privilege escalation)
2. **Branch by attack vector**: How could they achieve it? (config, network, code)
3. **Leaf nodes are specific exploits**: Concrete technical steps
4. **Mark feasibility**: Easy, Medium, Hard (or Low/Med/High effort)
5. **Identify easiest path**: This is your highest priority to mitigate
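
The five steps above can be sketched as a small data structure. This is an illustrative sketch only: `Node`, `EFFORT`, `leaf_paths`, and `easiest_path` are hypothetical names, and the effort ratings on the leaves are assumptions made for the example (the source tree marks only the YAML injection as the easiest path).

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple

# Hypothetical effort scale: a lower number means an easier exploit.
EFFORT = {"easy": 1, "medium": 2, "hard": 3}


@dataclass
class Node:
    name: str
    effort: Optional[str] = None  # set on leaf nodes only
    children: List["Node"] = field(default_factory=list)


def leaf_paths(
    node: Node, prefix: Tuple[str, ...] = ()
) -> Iterator[Tuple[Tuple[str, ...], str]]:
    """Yield (path from root, effort) for every leaf exploit."""
    path = prefix + (node.name,)
    if not node.children:
        if node.effort not in EFFORT:
            raise ValueError(f"leaf {node.name!r} needs an effort rating")
        yield path, node.effort
    for child in node.children:
        yield from leaf_paths(child, path)


def easiest_path(root: Node) -> Tuple[Tuple[str, ...], str]:
    """Return the leaf path with the lowest effort (first one wins ties)."""
    return min(leaf_paths(root), key=lambda item: EFFORT[item[1]])


# Part of the VULN-004 tree; effort ratings are illustrative assumptions.
tree = Node("Access classified data with insufficient clearance", children=[
    Node("Override Plugin Security Level", children=[
        Node("Inject security_level into YAML config", "easy"),
        Node("Modify plugin source code", "hard"),
    ]),
    Node("Bypass Type System", children=[
        Node("Duck-type plugin without BasePlugin inheritance", "medium"),
    ]),
])

path, effort = easiest_path(tree)
print(" -> ".join(path), f"({effort})")
```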
## Enforcement Gap Analysis
**Pattern**: Security properties must be enforced at EVERY layer. Single-layer enforcement fails.
### Layers to Check
**For any security property (e.g., security_level, access control, data classification):**
1. **Schema/Type Layer**: Is property type-safe? Can it be None/invalid?
2. **Registration Layer**: Is component registered? Can attacker bypass registry?
3. **Construction Layer**: Is property immutable after creation? Can it be modified?
4. **Runtime Layer**: Is property checked before sensitive operations?
5. **Post-Operation Layer**: Is result validated against expected property?
### Example: MLS Security Level Enforcement (ADR-002 → 005)
| Layer | Gap Found | Fix Required |
|-------|-----------|--------------|
| **Registry** | Plugin not registered at all (ADR-003) | Central plugin registry with runtime checks |
| **Type System** | Protocol allows duck typing bypass (ADR-004) | ABC with sealed methods, not Protocol |
| **Immutability** | security_level could be mutated (VULN-009) | Frozen dataclass + runtime checks |
| **Trust** | Trusted downgrade assumes no compromise (ADR-005) | Strict mode disables trusted downgrade |
**Key Insight**: Each gap was found AFTER implementation. Systematic enforcement gap analysis would have caught all four upfront.
### How to Apply
For each security property:
1. **List all layers** where property matters
2. **Ask per layer**: "Can attacker bypass this layer?"
3. **Design defense-in-depth**: Redundant checks at multiple layers
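
A gap analysis of this kind can be recorded as plain data and checked mechanically. The sketch below assumes the five layers listed under "Layers to Check"; `LAYERS`, `find_gaps`, and the sample input are hypothetical and only illustrate the bookkeeping, not how any layer is actually enforced.

```python
from typing import Dict, List, Set

# The five layers from "Layers to Check", in order.
LAYERS = ("schema", "registration", "construction", "runtime", "post_operation")


def find_gaps(enforced: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Map each security property to the layers where nothing enforces it."""
    unknown = {layer for layers in enforced.values() for layer in layers} - set(LAYERS)
    if unknown:
        raise ValueError(f"unknown layer names: {sorted(unknown)}")
    return {
        prop: [layer for layer in LAYERS if layer not in layers]
        for prop, layers in enforced.items()
    }


# Illustrative input: which layers currently check each property.
current = {
    "security_level": {"schema", "runtime"},
    "authorization": set(LAYERS),
}

for prop, gaps in find_gaps(current).items():
    print(prop, "gaps:", gaps or "none")
```

A property with an empty gap list still needs the per-layer question asked by hand; the table only shows where no check exists at all.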
## Risk Scoring
**Purpose**: Prioritize threats by (Likelihood × Impact)
### Likelihood Scale
| Score | Likelihood | Criteria |
|-------|-----------|----------|
| **3** | High | Easy to exploit, attacker has means and motive, no special access needed |
| **2** | Medium | Requires some skill or access, exploit path exists but not trivial |
| **1** | Low | Requires significant expertise, insider access, or rare conditions |
### Impact Scale
| Score | Impact | Criteria |
|-------|--------|----------|
| **3** | High | Complete system compromise, data breach, financial loss, safety risk |
| **2** | Medium | Partial compromise, limited data exposure, service degradation |
| **1** | Low | Minor information leakage, temporary DoS, limited scope |
### Risk Matrix
```
                 IMPACT
                1   2   3
              ┌───┬───┬───┐
            3 │ M │ H │ C │   C = Critical (fix immediately)
LIKELIHOOD  2 │ L │ M │ H │   H = High (fix before launch)
            1 │ L │ L │ M │   M = Medium (fix soon)
              └───┴───┴───┘   L = Low (fix if time permits)
```
### Example: VULN-004 Config Override
- **Likelihood**: 3 (High) - YAML files easily modified with filesystem access
- **Impact**: 3 (High) - Bypass MLS enforcement, access classified data
- **Risk Score**: 9 (Critical) - **Fix immediately**
### Example: ADR-004 Type System Bypass
- **Likelihood**: 2 (Medium) - Requires knowing to create duck-typed plugin
- **Impact**: 3 (High) - Complete security bypass
- **Risk Score**: 6 (High) - **Fix before launch**
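
The scoring arithmetic is simple enough to automate. Here is a minimal sketch, with thresholds read off the risk matrix above (9 is Critical, 6 is High, 3 to 4 is Medium, 1 to 2 is Low); the function names `risk_score` and `risk_rating` are hypothetical.

```python
def risk_score(likelihood: int, impact: int) -> int:
    """Return Likelihood x Impact, with both inputs on the 1-3 scale."""
    for name, value in (("likelihood", likelihood), ("impact", impact)):
        if value not in (1, 2, 3):
            raise ValueError(f"{name} must be 1, 2, or 3, got {value!r}")
    return likelihood * impact


def risk_rating(score: int) -> str:
    """Translate a score into the rating used by the risk matrix."""
    if score >= 9:
        return "Critical"
    if score >= 6:
        return "High"
    if score >= 3:
        return "Medium"
    return "Low"


# The two worked examples above.
for name, likelihood, impact in (("VULN-004", 3, 3), ("ADR-004", 2, 3)):
    score = risk_score(likelihood, impact)
    print(f"{name}: {score} ({risk_rating(score)})")
```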
## Threat Modeling Workflow
### Step 1: System Decomposition
Break system into components:
1. **Entry points**: APIs, file uploads, configuration, user input
2. **Data stores**: Databases, caches, logs, files
3. **External dependencies**: Third-party APIs, libraries, services
4. **Trust boundaries**: Where privilege level changes, network boundaries
5. **Security-critical components**: Auth, access control, crypto, secrets management
### Step 2: Apply STRIDE per Component
For EACH component/interface, systematically ask STRIDE questions:
**Example: Plugin Configuration Component**
| STRIDE | Threat Found | Priority |
|--------|-------------|----------|
| **S** | None (no identity claims) | - |
| **T** | Config tampering to override security_level (VULN-004) | Critical |
| **R** | Config changes not logged | Medium |
| **I** | Config may contain secrets in plaintext | High |
| **D** | Malformed YAML causes parser crash | Low |
| **E** | Config override elevates plugin privilege | Critical |
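
To make sure no component or category is skipped, the worksheet can be kept as data and checked for blanks. This is a sketch under assumptions: `STRIDE`, `unreviewed`, and the "plugin registry" entry are hypothetical, and an explicit "None" answer with a reason counts as reviewed, as in the **S** row above.

```python
from typing import Dict, List, Tuple

STRIDE = {
    "S": "Spoofing",
    "T": "Tampering",
    "R": "Repudiation",
    "I": "Information Disclosure",
    "D": "Denial of Service",
    "E": "Elevation of Privilege",
}


def unreviewed(worksheet: Dict[str, Dict[str, str]]) -> List[Tuple[str, str]]:
    """Return (component, category letter) pairs with no recorded answer."""
    missing = []
    for component, answers in worksheet.items():
        for letter in STRIDE:
            if not answers.get(letter, "").strip():
                missing.append((component, letter))
    return missing


worksheet = {
    "plugin configuration": {
        "S": "None (no identity claims)",
        "T": "Config tampering to override security_level (VULN-004)",
        "R": "Config changes not logged",
        "I": "Config may contain secrets in plaintext",
        "D": "Malformed YAML causes parser crash",
        "E": "Config override elevates plugin privilege",
    },
    # Hypothetical second component that has only been partly reviewed.
    "plugin registry": {"T": "Registry entries can be replaced"},
}

for component, letter in unreviewed(worksheet):
    print(f"TODO: {component}: {STRIDE[letter]}")
```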
### Step 3: Build Attack Trees
For each high-priority threat, construct attack tree:
- Goal: What does attacker want?
- Vectors: How could they get it?
- Exploits: Specific technical steps
Mark easiest paths with ⭐.
### Step 4: Check Enforcement Gaps
For each security property (authentication, authorization, encryption):
1. List enforcement layers (schema, registry, runtime, etc.)
2. Check each layer for gaps
3. Design redundant checks (defense-in-depth)
### Step 5: Score and Prioritize
- Calculate Likelihood × Impact for each threat
- Sort by risk score (highest first)
- Set mitigation deadlines (Critical → immediate, High → before launch)
### Step 6: Document Threats
Create threat model document:
```markdown
# Threat Model: [System Name]
## Scope
[Components, entry points, trust boundaries]
## Threats Identified
### THREAT-001: Configuration Override Attack (CRITICAL)
**STRIDE**: Tampering, Elevation of Privilege
**Attack Tree**: [Include tree diagram or text description]
**Risk Score**: 9 (L:3 × I:3)
**Mitigation**: Forbid security_level in config (schema), runtime verification, frozen dataclass
### THREAT-002: [Next threat...]
## Enforcement Gaps
[List gaps found in defense-in-depth analysis]
## Risk Matrix
[Include prioritized threat list]
```
## Common Patterns That Intuition Misses
### Pattern 1: Property Override via Configuration
**Symptom**: Security property declared in code, but configuration system allows overriding it
**Example**: VULN-004 - Plugin declares security_level in code, YAML config overrides it
**How to Spot**:
- Code declares security property (access_level, security_level, role)
- Configuration system loads external data (YAML, JSON, database)
- No explicit check that config cannot override security properties
**Mitigation**: Schema MUST forbid security properties in config, runtime verification
### Pattern 2: Enforcement at One Layer Only
**Symptom**: Security check at one layer, but attacker can bypass that layer
**Example**: ADR-003 - MLS checks assume plugin is registered, but no check that plugin IS registered
**How to Spot**:
- Security check at schema/type layer but not runtime
- Trust in single source of truth (registry, type system) without verification
- No redundant checks
**Mitigation**: Defense-in-depth - check at schema, registry, runtime, post-operation
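
As an illustration of the runtime half of this mitigation, the sketch below verifies registration at the point of use instead of assuming it. Everything here is hypothetical (`PluginRegistry`, `run_plugin`, `UnregisteredPluginError`); it shows the shape of the check, not the design described in ADR-003.

```python
from typing import Callable, Dict


class UnregisteredPluginError(RuntimeError):
    """Raised when a plugin is used without having been registered."""


class PluginRegistry:
    def __init__(self) -> None:
        self._plugins: Dict[str, Callable[[], str]] = {}

    def register(self, name: str, plugin: Callable[[], str]) -> None:
        if name in self._plugins:
            raise ValueError(f"plugin {name!r} is already registered")
        self._plugins[name] = plugin

    def is_registered(self, name: str, plugin: Callable[[], str]) -> bool:
        # Identity check: the same name with a different object does not count.
        return self._plugins.get(name) is plugin


def run_plugin(registry: PluginRegistry, name: str, plugin: Callable[[], str]) -> str:
    """Runtime layer: verify registration before doing any sensitive work."""
    if not registry.is_registered(name, plugin):
        raise UnregisteredPluginError(f"plugin {name!r} is not registered")
    return plugin()


def datasource() -> str:
    return "rows"


def impostor() -> str:
    return "classified rows"


registry = PluginRegistry()
registry.register("datasource", datasource)
print(run_plugin(registry, "datasource", datasource))
```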
### Pattern 3: Type System as Security Boundary
**Symptom**: Relying on type system (Protocol, interface) for security enforcement
**Example**: ADR-004 - Protocol typing allows duck typing to bypass BasePlugin
**How to Spot**:
- Security property defined in Protocol or interface
- No nominal type enforcement (isinstance check, ABC)
- Runtime doesn't verify actual type, just duck typing compatibility
**Mitigation**: Use ABC with sealed methods, runtime isinstance checks
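
The difference between structural and nominal checks can be shown in a few lines. In this sketch `PluginProtocol`, `BasePlugin`, and `RoguePlugin` are illustrative stand-ins (the real `BasePlugin` from ADR-004 is not shown in this document), and the "sealed methods" part of the mitigation is not covered.

```python
from abc import ABC, abstractmethod
from typing import Protocol, runtime_checkable


@runtime_checkable
class PluginProtocol(Protocol):
    """Structural type: anything with these methods matches."""

    def security_level(self) -> str: ...
    def load(self) -> str: ...


class BasePlugin(ABC):
    """Nominal type: only declared subclasses match."""

    @abstractmethod
    def security_level(self) -> str: ...

    @abstractmethod
    def load(self) -> str: ...


class RoguePlugin:
    """Duck-typed plugin: right method names, no BasePlugin inheritance."""

    def security_level(self) -> str:
        return "SECRET"

    def load(self) -> str:
        return "classified rows"


def accept_structural(plugin: object) -> bool:
    return isinstance(plugin, PluginProtocol)


def accept_nominal(plugin: object) -> bool:
    return isinstance(plugin, BasePlugin)


rogue = RoguePlugin()
print("structural check accepts rogue:", accept_structural(rogue))
print("nominal check accepts rogue:", accept_nominal(rogue))
```

The nominal check only proves lineage. An attacker who can run code could still subclass `BasePlugin` or call `BasePlugin.register(RoguePlugin)` to create a virtual subclass, so the check is best paired with a registry and runtime verification.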
### Pattern 4: Trusted Component Assumptions
**Symptom**: Assuming high-privilege component will never be compromised
**Example**: ADR-005 - Trusted downgrade assumes high-clearance component is always safe
**How to Spot**:
- Component granted special privileges ("trusted")
- No monitoring or verification of trusted component behavior
- Insider threat or supply chain compromise not in threat model
**Mitigation**: Trust but verify - log all actions, anomaly detection, strict mode without trust
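
A rough sketch of "log all actions" plus "strict mode without trust" follows. It is an assumption-heavy illustration: the two-level ordering in `LEVELS`, the `downgrade` function, and the logger name are all hypothetical, and ADR-005 may define the downgrade path quite differently.

```python
import logging

logger = logging.getLogger("downgrade_audit")  # hypothetical audit logger name

# Hypothetical ordering of the two levels named in this document.
LEVELS = {"UNOFFICIAL": 0, "SECRET": 1}


class DowngradeRefused(PermissionError):
    """Raised when strict mode blocks a trusted downgrade."""


def downgrade(component: str, from_level: str, to_level: str, *, strict: bool) -> str:
    """Return the new level, logging every attempt; refuse in strict mode."""
    if LEVELS[to_level] >= LEVELS[from_level]:
        raise ValueError(f"{from_level} -> {to_level} is not a downgrade")
    if strict:
        logger.warning("refused downgrade %s -> %s by %s", from_level, to_level, component)
        raise DowngradeRefused(f"strict mode: {component} may not downgrade")
    # Non-strict mode still leaves an audit trail for later review.
    logger.warning("trusted downgrade %s -> %s by %s", from_level, to_level, component)
    return to_level


print(downgrade("exporter", "SECRET", "UNOFFICIAL", strict=False))
```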
### Pattern 5: Immutability Assumption
**Symptom**: Assuming language feature (frozen, const, final) provides security
**Example**: VULN-009 - Frozen dataclass, but a bypass via `__dict__` is possible
**How to Spot**:
- Security property marked frozen/immutable via language feature
- No runtime check that property hasn't changed
- Language feature has known bypasses (`__dict__`, `__setattr__`)
**Mitigation**: Language feature + runtime verification + test all bypass methods
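
The bypasses are easy to reproduce with the standard library. In the sketch below, `PluginInfo` and `verify_level` are hypothetical names; the runtime check assumes the declared level was recorded somewhere the plugin cannot reach, which this document does not specify.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class PluginInfo:
    name: str
    security_level: str


def verify_level(info: PluginInfo, declared: str) -> None:
    """Runtime check against the level recorded at registration time."""
    if info.security_level != declared:
        raise RuntimeError(
            f"{info.name}: security_level is {info.security_level!r}, "
            f"declared {declared!r}"
        )


first = PluginInfo("datasource", "UNOFFICIAL")
second = PluginInfo("datasource", "UNOFFICIAL")
declared = "UNOFFICIAL"  # assumed to be stored outside the plugin's reach

try:
    first.security_level = "SECRET"  # ordinary assignment is blocked
except FrozenInstanceError:
    print("normal assignment blocked")

# Bypass 1: write straight into the instance dictionary.
first.__dict__["security_level"] = "SECRET"
# Bypass 2: call the base-class setter, skipping the frozen check.
object.__setattr__(second, "security_level", "SECRET")

print(first.security_level, second.security_level)

try:
    verify_level(first, declared)
except RuntimeError as exc:
    print("runtime verification caught it:", exc)
```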
## Quick Reference Checklist
**Use this checklist for every threat modeling session:**
### Pre-Session
- [ ] Identify scope (components, entry points, trust boundaries)
- [ ] Gather architecture diagrams, API specs, data flow diagrams
### STRIDE Application
- [ ] Apply STRIDE to EVERY component/interface
- [ ] Document threats found per category
- [ ] Check for property override patterns
- [ ] Check for enforcement gap patterns
### Attack Trees
- [ ] Build attack tree for each high-priority threat
- [ ] Mark easiest exploitation paths
- [ ] Identify prerequisites (what attacker needs)
### Risk Scoring
- [ ] Score Likelihood (1-3) for each threat
- [ ] Score Impact (1-3) for each threat
- [ ] Calculate Risk = L × I
- [ ] Prioritize by risk score
### Enforcement Gaps
- [ ] List security properties (auth, authorization, encryption, etc.)
- [ ] For each property, check: Schema? Registry? Runtime? Post-op?
- [ ] Identify gaps in defense-in-depth
### Documentation
- [ ] Create threat model document
- [ ] Include attack trees, risk matrix, mitigation plans
- [ ] Share with team for review
## Common Mistakes
### ❌ Intuitive Threat Finding Only
**Wrong**: "I'll just think about what could go wrong"
**Right**: Systematically apply STRIDE to every component
**Why**: Intuition finds obvious threats. STRIDE finds subtle, critical threats like VULN-004.
### ❌ Threat Modeling After Implementation
**Wrong**: "Let's build it first, then threat model"
**Right**: Threat model BEFORE implementation
**Why**: Threats found post-implementation require expensive re-architecture. Threats found in design are cheap to fix.
### ❌ Single-Layer Validation
**Wrong**: "Schema validates config, so it's secure"
**Right**: Validate at schema, registry, runtime, post-operation
**Why**: Attackers bypass single layers. Defense-in-depth catches them.
### ❌ Trusting Language Features for Security
**Wrong**: "It's frozen=True, so it can't be modified"
**Right**: Language feature + runtime verification + test bypass methods
**Why**: Language features have bypasses (VULN-009). Always verify.
### ❌ Skipping Risk Scoring
**Wrong**: "All threats are important, fix them all"
**Right**: Score L×I, prioritize Critical/High, fix Low only if time permits
**Why**: Resources are limited. Critical threats must be fixed first.
## Real-World Examples
### Example 1: VULN-004 - Configuration Override Attack
**System**: Plugin system with YAML configuration and MLS security levels
**STRIDE Analysis**:
- **T** (Tampering): Config file tampering ✓
- **E** (Elevation): Override security_level property ✓ **← Caught by STRIDE**
**Attack Tree**:
```
Goal: Access classified data
└─ Override security_level to SECRET
   ├─ Inject security_level: SECRET into YAML ⭐ (easiest)
   ├─ Modify source code (harder)
   └─ Compromise plugin registry (harder)
```
**Risk Score**: L:3 × I:3 = 9 (Critical)
**Mitigation**: Forbid security_level in config schema + runtime verification
### Example 2: ADR-002 → 005 - MLS Design Gaps
**System**: Multi-Level Security enforcement for plugins
**Enforcement Gap Analysis**:
1. **Registry Layer**: No check plugin is registered (ADR-003) ✓
2. **Type Layer**: Protocol allows duck typing (ADR-004) ✓
3. **Immutability**: security_level could be mutated (VULN-009) ✓
4. **Trust**: Trusted downgrade assumes no compromise (ADR-005) ✓
**All four gaps found by systematic enforcement analysis** - would have prevented 3 follow-up ADRs.
**Risk Scores**:
- ADR-003 (registry): L:2 × I:3 = 6 (High)
- ADR-004 (type): L:2 × I:3 = 6 (High)
- ADR-005 (trust): L:1 × I:3 = 3 (Medium)
## When NOT to Threat Model
**Don't threat model for**:
- Non-security features (UI styling, analytics dashboards with no sensitive data)
- Changes that don't touch attack surface (refactoring internal code, renaming variables)
- Systems with no sensitive data and no attack value (internal dev tools, prototypes)
**Quick test**: If attacker can't gain anything (data, money, access, disruption), threat modeling may be overkill.
## Cross-References
### Load These Skills Together
**For comprehensive security**:
- `ordis/security-architect/threat-modeling` (this skill) - Find threats
- `ordis/security-architect/security-controls-design` - Design mitigations
- `ordis/security-architect/secure-by-design-patterns` - Prevent threats at architecture level
**For documentation**:
- `ordis/security-architect/documenting-threats-and-controls` - Document threat model
- `muna/technical-writer/documentation-structure` - Structure threat docs as ADRs
## Summary
**Threat modeling IS systematic threat discovery using STRIDE, attack trees, and risk scoring.**
**Key Principles**:
1. **STRIDE every component** - systematic beats intuition
2. **Build attack trees** - find easiest exploitation paths
3. **Check enforcement gaps** - defense-in-depth at every layer
4. **Score risks** - L × I prioritization
5. **Do it early** - before implementation, when fixes are cheap
**Meta-rule**: If you're designing something security-sensitive and you haven't threat modeled it, assume you've missed critical threats. Always threat model first.