# Threat Modeling

## Overview

Systematic identification of security threats using proven frameworks. Threat modeling finds threats that intuition misses by applying structured methodologies.

**Core Principle**: Security intuition finds obvious threats. Systematic threat modeling finds subtle, critical threats that lead to real vulnerabilities.

## When to Use

Load this skill when:
- Designing new systems or features (before implementation)
- Adding security-sensitive functionality (auth, data handling, APIs)
- Reviewing existing designs for security gaps
- Investigating security incidents (what else could be exploited?)
- User mentions: "threat model", "security risks", "what could go wrong", "attack surface"

**Use BEFORE implementation** - threats found after deployment are typically far more expensive to fix than threats caught at design time.

## Don't Use For

- **Implementing specific security controls** (use security-controls-design)
- **Code-level security patterns** (use secure-by-design-patterns)
- **Reviewing existing designs for completeness** (use security-architecture-review)
- **Compliance mapping** (use compliance-awareness-and-mapping)
- **Documenting threats after they're identified** (use documenting-threats-and-controls)

This skill is for IDENTIFYING threats systematically. Once threats are identified, route to appropriate skills for designing controls, implementing patterns, or documenting decisions.

## The STRIDE Framework

**STRIDE** is a systematic threat enumeration framework. Apply it to EVERY component, interface, and data flow.

### S - Spoofing Identity

**Definition**: Attacker pretends to be someone/something else

**Questions to Ask**:
- Can attacker claim a different identity?
- Is authentication required? Can it be bypassed?
- Are credentials properly validated?
- Can tokens/sessions be stolen or forged?

**Example Threats**:
- Stolen authentication tokens
- Forged JWT signatures
- Session hijacking via XSS
- API key leakage in logs


### T - Tampering with Data

**Definition**: Unauthorized modification of data or code

**Questions to Ask**:
- Can attacker modify data in transit? (MITM)
- Can attacker modify data at rest? (database, files, config)
- Can attacker modify code? (supply chain, config injection)
- **Can configuration override security properties?** (CRITICAL - often missed)

**Example Threats**:
- Configuration files modifying security_level properties
- YAML/JSON injection overriding access controls
- Database tampering if encryption/MAC missing
- Code injection via deserialization

**⚠️ Property Override Pattern** (VULN-004):
```yaml
# Plugin declares security_level=UNOFFICIAL in code
# Attacker adds to YAML config:
plugins:
  datasource:
    security_level: SECRET  # OVERRIDES code declaration!
```

Always ask: **"Can external configuration override security-critical properties?"**
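One way to close this gap at the schema layer is to reject security-critical keys while loading configuration. The sketch below is illustrative only: `FORBIDDEN_CONFIG_KEYS`, `ConfigOverrideError`, and `reject_security_overrides` are hypothetical names, and it assumes the YAML has already been parsed into nested dicts and lists.

```python
from typing import Any, Mapping

# Hypothetical set of properties that only code may declare.
FORBIDDEN_CONFIG_KEYS = frozenset({"security_level"})


class ConfigOverrideError(ValueError):
    """Raised when configuration tries to set a security-critical property."""


def reject_security_overrides(config: Mapping[str, Any], path: str = "") -> None:
    """Walk a parsed config tree and fail on any forbidden key, at any depth."""
    for key, value in config.items():
        location = f"{path}.{key}" if path else str(key)
        if key in FORBIDDEN_CONFIG_KEYS:
            raise ConfigOverrideError(f"{location} may not be set from configuration")
        if isinstance(value, Mapping):
            reject_security_overrides(value, location)
        elif isinstance(value, list):
            # Lists can hide nested mappings, so descend into them as well.
            for index, item in enumerate(value):
                if isinstance(item, Mapping):
                    reject_security_overrides(item, f"{location}[{index}]")
```

Calling `reject_security_overrides({"plugins": {"datasource": {"security_level": "SECRET"}}})` raises `ConfigOverrideError` naming `plugins.datasource.security_level`. This covers one layer only; a runtime check is still needed.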


### R - Repudiation

**Definition**: Attacker denies performing an action (no audit trail)

**Questions to Ask**:
- Are security-relevant actions logged?
- Can logs be tampered with or deleted?
- Is logging sufficient for forensics?
- Can attacker perform reconnaissance without detection?

**Example Threats**:
- No logging of failed authorization attempts
- Logs stored without integrity protection (MAC, signatures)
- Insufficient detail for incident response
- Log injection attacks
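
To make "integrity protection (MAC, signatures)" concrete, here is a minimal sketch of an HMAC-chained audit log built on the standard library. The function names and the in-memory list are assumptions for illustration; a real system would also need key management and protected storage.

```python
import hashlib
import hmac
import json


def _entry_mac(key: bytes, prev_mac: str, event: dict) -> str:
    """MAC over the previous entry's MAC plus this event, chaining the entries."""
    payload = prev_mac.encode() + json.dumps(event, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def append_entry(log: list, key: bytes, event: dict) -> None:
    """Append an event together with its chained MAC."""
    prev_mac = log[-1]["mac"] if log else ""
    log.append({"event": event, "mac": _entry_mac(key, prev_mac, event)})


def verify_log(log: list, key: bytes) -> bool:
    """Return False if any entry was modified, reordered, or removed mid-chain."""
    prev_mac = ""
    for entry in log:
        expected = _entry_mac(key, prev_mac, entry["event"])
        if not hmac.compare_digest(expected, entry["mac"]):
            return False
        prev_mac = entry["mac"]
    return True
```

Note the limit of this sketch: removing entries from the END of the log is not detected. Catching that needs an external anchor, such as periodically recording the latest MAC somewhere the attacker cannot write.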


### I - Information Disclosure

**Definition**: Exposure of information to unauthorized parties

**Questions to Ask**:
- What data is exposed in responses, logs, errors?
- Can attacker enumerate resources or users?
- Are temporary files, caches, or memory properly cleared?
- Can attacker infer sensitive data from timing/behavior?

**Example Threats**:
- Secrets in error messages or stack traces
- Timing attacks revealing password validity
- Cache poisoning exposing other users' data
- Path traversal reading arbitrary files
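
For the timing threat, a common mitigation is constant-time comparison. The sketch below assumes a high-entropy API key whose SHA-256 digest is stored server-side; `check_api_key` is a hypothetical helper, and user passwords would need a dedicated password-hashing scheme instead.

```python
import hashlib
import hmac


def check_api_key(presented: str, expected_sha256_hex: str) -> bool:
    """Compare digests with hmac.compare_digest, which avoids early-exit timing leaks."""
    presented_digest = hashlib.sha256(presented.encode()).hexdigest()
    return hmac.compare_digest(presented_digest, expected_sha256_hex)
```

A plain `==` on strings can return as soon as the first differing character is found, which is the behavior a timing attack measures.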


### D - Denial of Service

**Definition**: Making system unavailable or degrading performance

**Questions to Ask**:
- Are there resource limits (CPU, memory, connections)?
- Can attacker trigger expensive operations?
- Is rate limiting implemented?
- Can attacker cause crashes or hangs?

**Example Threats**:
- Unbounded recursion or loops
- Memory exhaustion via large payloads
- Algorithmic complexity attacks (e.g., hash collisions)
- Crash via malformed input


### E - Elevation of Privilege

**Definition**: Gaining capabilities beyond what's authorized

**Questions to Ask**:
- Can attacker access admin functions?
- Can attacker escalate from low to high privilege?
- Are privilege checks performed at every layer?
- **Can type system be bypassed?** (ADR-004 pattern)

**Example Threats**:
- Missing authorization checks on sensitive endpoints
- Horizontal privilege escalation (access other users' data)
- Vertical privilege escalation (user → admin)
- Duck typing allowing security bypass


## Attack Tree Construction

**Purpose**: Visual/structured representation of attack paths from goal → exploitation

### Attack Tree Format

```
ROOT: Attacker Goal (e.g., "Access classified data")
├─ BRANCH 1: Attack Vector
│  ├─ LEAF: Specific Exploit (with feasibility)
│  └─ LEAF: Alternative Exploit
└─ BRANCH 2: Alternative Vector
   └─ LEAF: Exploit Method
```

### Example: Configuration Override Attack (VULN-004)

```
ROOT: Access classified data with insufficient clearance
├─ Override Plugin Security Level
│  ├─ Inject security_level into YAML config ⭐ (VULN-004 - actually happened)
│  ├─ Modify plugin source code (requires code access)
│  └─ Bypass registry to register malicious plugin (ADR-003 gap)
├─ Exploit Trusted Downgrade
│  ├─ Compromise high-clearance component (supply chain)
│  └─ Abuse legitimate downgrade path (ADR-005 gap)
└─ Bypass Type System
   └─ Duck-type plugin without BasePlugin inheritance (ADR-004 gap)
```

**⭐ = Easiest/highest risk path**

### How to Build Attack Trees

1. **Start with attacker goal**: What does attacker want? (data access, DoS, privilege escalation)
2. **Branch by attack vector**: How could they achieve it? (config, network, code)
3. **Leaf nodes are specific exploits**: Concrete technical steps
4. **Mark feasibility**: Easy, Medium, Hard (or Low/Med/High effort)
5. **Identify easiest path**: This is your highest priority to mitigate
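
The five steps can be sketched as a small data structure. Everything here is illustrative: `AttackNode`, `EFFORT`, and `easiest_path` are hypothetical names, the effort labels on the leaves are assumed ratings, and every branch is treated as an OR (any one child achieves the parent).

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Illustrative effort scale; lower means easier for the attacker.
EFFORT = {"easy": 1, "medium": 2, "hard": 3}


@dataclass
class AttackNode:
    name: str
    effort: str | None = None  # feasibility label, expected on leaf nodes only
    children: list[AttackNode] = field(default_factory=list)


def easiest_path(node: AttackNode) -> tuple[int, list[str]]:
    """Return (effort, path) for the cheapest leaf below node, assuming OR branches."""
    if not node.children:
        return EFFORT[node.effort], [node.name]
    cost, path = min(
        (easiest_path(child) for child in node.children),
        key=lambda result: result[0],
    )
    return cost, [node.name, *path]


# Tree shape taken from the VULN-004 example; effort ratings are assumptions.
tree = AttackNode("Access classified data", children=[
    AttackNode("Override Plugin Security Level", children=[
        AttackNode("Inject security_level into YAML config", effort="easy"),
        AttackNode("Modify plugin source code", effort="hard"),
    ]),
    AttackNode("Bypass Type System", children=[
        AttackNode("Duck-type plugin without BasePlugin inheritance", effort="medium"),
    ]),
])
```

`easiest_path(tree)` returns the YAML injection path, which is the one to mitigate first.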


## Enforcement Gap Analysis

**Pattern**: Security properties must be enforced at EVERY layer. Single-layer enforcement fails.

### Layers to Check

**For any security property (e.g., security_level, access control, data classification):**

1. **Schema/Type Layer**: Is property type-safe? Can it be None/invalid?
2. **Registration Layer**: Is component registered? Can attacker bypass registry?
3. **Construction Layer**: Is property immutable after creation? Can it be modified?
4. **Runtime Layer**: Is property checked before sensitive operations?
5. **Post-Operation Layer**: Is result validated against expected property?

### Example: MLS Security Level Enforcement (ADR-002 → 005)

| Layer | Gap Found | Fix Required |
|-------|-----------|--------------|
| **Registry** | Plugin not registered at all (ADR-003) | Central plugin registry with runtime checks |
| **Type System** | Protocol allows duck typing bypass (ADR-004) | ABC with sealed methods, not Protocol |
| **Immutability** | security_level could be mutated (VULN-009) | Frozen dataclass + runtime checks |
| **Trust** | Trusted downgrade assumes no compromise (ADR-005) | Strict mode disables trusted downgrade |

**Key Insight**: Each gap was found AFTER implementation. Systematic enforcement gap analysis would have caught all four upfront.

### How to Apply

For each security property:
1. **List all layers** where property matters
2. **Ask per layer**: "Can attacker bypass this layer?"
3. **Design defense-in-depth**: Redundant checks at multiple layers
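
A lightweight way to run this analysis is to record which layers enforce each property and list what is missing. The sketch below is an assumption about how such a worksheet might look; the layer names mirror the five layers listed earlier, and `find_enforcement_gaps` is a hypothetical helper.

```python
from __future__ import annotations

LAYERS = ("schema", "registration", "construction", "runtime", "post_operation")


def find_enforcement_gaps(enforced: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each security property to the layers where no check exists."""
    gaps: dict[str, list[str]] = {}
    for prop, layers in enforced.items():
        unknown = layers - set(LAYERS)
        if unknown:
            raise ValueError(f"unknown layers for {prop}: {sorted(unknown)}")
        missing = [layer for layer in LAYERS if layer not in layers]
        if missing:
            gaps[prop] = missing
    return gaps
```

For example, `find_enforcement_gaps({"security_level": {"schema", "runtime"}})` reports `registration`, `construction`, and `post_operation` as unchecked layers for `security_level`.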


## Risk Scoring

**Purpose**: Prioritize threats by (Likelihood × Impact)

### Likelihood Scale

| Score | Likelihood | Criteria |
|-------|-----------|----------|
| **3** | High | Easy to exploit, attacker has means and motive, no special access needed |
| **2** | Medium | Requires some skill or access, exploit path exists but not trivial |
| **1** | Low | Requires significant expertise, insider access, or rare conditions |

### Impact Scale

| Score | Impact | Criteria |
|-------|--------|----------|
| **3** | High | Complete system compromise, data breach, financial loss, safety risk |
| **2** | Medium | Partial compromise, limited data exposure, service degradation |
| **1** | Low | Minor information leakage, temporary DoS, limited scope |

### Risk Matrix

```
                 IMPACT
                 1   2   3
               ┌───┬───┬───┐
LIKELIHOOD   3 │ M │ H │ C │  C = Critical (fix immediately)
             2 │ L │ M │ H │  H = High (fix before launch)
             1 │ L │ L │ M │  M = Medium (fix soon)
               └───┴───┴───┘  L = Low (fix if time permits)
```

### Example: VULN-004 Config Override

- **Likelihood**: 3 (High) - YAML files easily modified with filesystem access
- **Impact**: 3 (High) - Bypass MLS enforcement, access classified data
- **Risk Score**: 9 (Critical) - **Fix immediately**

### Example: ADR-004 Type System Bypass

- **Likelihood**: 2 (Medium) - Requires knowing to create duck-typed plugin
- **Impact**: 3 (High) - Complete security bypass
- **Risk Score**: 6 (High) - **Fix before launch**
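
The matrix and the two examples can be expressed as a small function. The thresholds below are read off the matrix cells (9 is Critical, 6 is High, 3-4 is Medium, 1-2 is Low); `risk_rating` itself is a hypothetical helper, not part of any referenced tool.

```python
def risk_rating(likelihood: int, impact: int) -> tuple[int, str]:
    """Return (score, rating) for likelihood and impact values on the 1-3 scales."""
    for label, value in (("likelihood", likelihood), ("impact", impact)):
        if value not in (1, 2, 3):
            raise ValueError(f"{label} must be 1, 2, or 3, got {value!r}")
    score = likelihood * impact
    if score >= 9:
        rating = "Critical"
    elif score >= 6:
        rating = "High"
    elif score >= 3:
        rating = "Medium"
    else:
        rating = "Low"
    return score, rating
```

With this mapping, `risk_rating(3, 3)` gives `(9, "Critical")` for VULN-004 and `risk_rating(2, 3)` gives `(6, "High")` for ADR-004.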


## Threat Modeling Workflow

### Step 1: System Decomposition

Break system into components:
1. **Entry points**: APIs, file uploads, configuration, user input
2. **Data stores**: Databases, caches, logs, files
3. **External dependencies**: Third-party APIs, libraries, services
4. **Trust boundaries**: Where privilege level changes, network boundaries
5. **Security-critical components**: Auth, access control, crypto, secrets management


### Step 2: Apply STRIDE per Component

For EACH component/interface, systematically ask STRIDE questions:

**Example: Plugin Configuration Component**

| STRIDE | Threat Found | Priority |
|--------|-------------|----------|
| **S** | None (no identity claims) | - |
| **T** | Config tampering to override security_level (VULN-004) | Critical |
| **R** | Config changes not logged | Medium |
| **I** | Config may contain secrets in plaintext | High |
| **D** | Malformed YAML causes parser crash | Low |
| **E** | Config override elevates plugin privilege | Critical |


### Step 3: Build Attack Trees

For each high-priority threat, construct attack tree:
- Goal: What does attacker want?
- Vectors: How could they get it?
- Exploits: Specific technical steps

Mark easiest paths with ⭐.


### Step 4: Check Enforcement Gaps

For each security property (authentication, authorization, encryption):
1. List enforcement layers (schema, registry, runtime, etc.)
2. Check each layer for gaps
3. Design redundant checks (defense-in-depth)


### Step 5: Score and Prioritize

- Calculate Likelihood × Impact for each threat
- Sort by risk score (highest first)
- Set mitigation deadlines (Critical → immediate, High → before launch)


### Step 6: Document Threats

Create threat model document:
```markdown
# Threat Model: [System Name]

## Scope
[Components, entry points, trust boundaries]

## Threats Identified

### THREAT-001: Configuration Override Attack (CRITICAL)
**STRIDE**: Tampering, Elevation of Privilege
**Attack Tree**: [Include tree diagram or text description]
**Risk Score**: 9 (L:3 × I:3)
**Mitigation**: Forbid security_level in config (schema), runtime verification, frozen dataclass

### THREAT-002: [Next threat...]

## Enforcement Gaps
[List gaps found in defense-in-depth analysis]

## Risk Matrix
[Include prioritized threat list]
```


## Common Patterns That Intuition Misses

### Pattern 1: Property Override via Configuration

**Symptom**: Security property declared in code, but configuration system allows overriding it

**Example**: VULN-004 - Plugin declares security_level in code, YAML config overrides it

**How to Spot**:
- Code declares security property (access_level, security_level, role)
- Configuration system loads external data (YAML, JSON, database)
- No explicit check that config cannot override security properties

**Mitigation**: Schema MUST forbid security properties in config, runtime verification


### Pattern 2: Enforcement at One Layer Only

**Symptom**: Security check at one layer, but attacker can bypass that layer

**Example**: ADR-003 - MLS checks assume plugin is registered, but no check that plugin IS registered

**How to Spot**:
- Security check at schema/type layer but not runtime
- Trust in single source of truth (registry, type system) without verification
- No redundant checks

**Mitigation**: Defense-in-depth - check at schema, registry, runtime, post-operation
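
As a sketch of the redundant check, the runtime path below re-verifies registration instead of assuming it. `PluginRegistry`, `UnregisteredPluginError`, and `run_plugin` are hypothetical names, and `load()` stands in for whatever sensitive operation the plugin performs.

```python
class UnregisteredPluginError(PermissionError):
    """Raised when a plugin is used without having gone through the registry."""


class PluginRegistry:
    def __init__(self) -> None:
        self._plugins: dict[str, object] = {}

    def register(self, name: str, plugin: object) -> None:
        if name in self._plugins:
            raise ValueError(f"plugin {name!r} is already registered")
        self._plugins[name] = plugin

    def require_registered(self, name: str, plugin: object) -> None:
        # Identity check: the object in use must be the exact one that was registered.
        if name not in self._plugins or self._plugins[name] is not plugin:
            raise UnregisteredPluginError(f"plugin {name!r} is not registered")


def run_plugin(registry: PluginRegistry, name: str, plugin) -> str:
    """Runtime layer: confirm registration before the sensitive operation."""
    registry.require_registered(name, plugin)
    return plugin.load()
```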


### Pattern 3: Type System as Security Boundary

**Symptom**: Relying on type system (Protocol, interface) for security enforcement

**Example**: ADR-004 - Protocol typing allows duck typing to bypass BasePlugin

**How to Spot**:
- Security property defined in Protocol or interface
- No nominal type enforcement (isinstance check, ABC)
- Runtime doesn't verify actual type, just duck typing compatibility

**Mitigation**: Use ABC with sealed methods, runtime isinstance checks
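
The difference between structural and nominal checks can be shown in a few lines. This is a minimal sketch with hypothetical classes (`PluginProtocol`, `Impostor`); only `BasePlugin` is a name taken from the examples above, and its body here is assumed.

```python
from abc import ABC, abstractmethod
from typing import Protocol, runtime_checkable


@runtime_checkable
class PluginProtocol(Protocol):
    security_level: str

    def load(self) -> str: ...


class BasePlugin(ABC):
    security_level: str = "UNOFFICIAL"

    @abstractmethod
    def load(self) -> str: ...


class Impostor:
    """Never inherits from BasePlugin, but has the right attribute names."""
    security_level = "SECRET"

    def load(self) -> str:
        return "data"


def accept_structural(plugin: object) -> bool:
    # Protocol check: passes for anything with matching attributes.
    return isinstance(plugin, PluginProtocol)


def accept_nominal(plugin: object) -> bool:
    # ABC check: passes only for real (or explicitly registered) subclasses.
    return isinstance(plugin, BasePlugin)
```

Here `accept_structural(Impostor())` is `True` while `accept_nominal(Impostor())` is `False`. A nominal check is not sufficient on its own: an attacker who can run code can subclass `BasePlugin` or call `BasePlugin.register(...)`, so it should be paired with a registry check.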


### Pattern 4: Trusted Component Assumptions

**Symptom**: Assuming high-privilege component will never be compromised

**Example**: ADR-005 - Trusted downgrade assumes high-clearance component is always safe

**How to Spot**:
- Component granted special privileges ("trusted")
- No monitoring or verification of trusted component behavior
- Insider threat or supply chain compromise not in threat model

**Mitigation**: Trust but verify - log all actions, anomaly detection, strict mode without trust


### Pattern 5: Immutability Assumption

**Symptom**: Assuming language feature (frozen, const, final) provides security

**Example**: VULN-009 - Frozen dataclass but `__dict__` bypass possible

**How to Spot**:
- Security property marked frozen/immutable via language feature
- No runtime check that property hasn't changed
- Language feature has known bypasses (writes to `__dict__`, `object.__setattr__`)

**Mitigation**: Language feature + runtime verification + test all bypass methods
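
The bypass is easy to reproduce. In the sketch below, `PluginInfo`, `DECLARED_LEVELS`, and `verify_security_level` are hypothetical; the point is that the runtime check compares against a record held separately from the object being checked.

```python
from dataclasses import FrozenInstanceError, dataclass

# Hypothetical record of what each plugin declared in code, held separately
# from the plugin objects themselves.
DECLARED_LEVELS = {"datasource": "UNOFFICIAL"}


class TamperedPropertyError(RuntimeError):
    """Raised when a security property no longer matches its declared value."""


@dataclass(frozen=True)
class PluginInfo:
    name: str
    security_level: str


def verify_security_level(info: PluginInfo) -> None:
    """Runtime verification: compare against the independently held declaration."""
    if DECLARED_LEVELS.get(info.name) != info.security_level:
        raise TamperedPropertyError(f"security_level of {info.name!r} was changed")


info = PluginInfo("datasource", "UNOFFICIAL")
verify_security_level(info)  # passes: value matches the declaration

try:
    info.security_level = "SECRET"  # ordinary assignment is blocked by frozen=True
except FrozenInstanceError:
    pass

# frozen=True only overrides __setattr__ on the class; this call goes around it.
object.__setattr__(info, "security_level", "SECRET")
```

After the last line, `info.security_level` is `"SECRET"` despite `frozen=True`, and `verify_security_level(info)` raises `TamperedPropertyError`.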


## Quick Reference Checklist

**Use this checklist for every threat modeling session:**

### Pre-Session
- [ ] Identify scope (components, entry points, trust boundaries)
- [ ] Gather architecture diagrams, API specs, data flow diagrams

### STRIDE Application
- [ ] Apply STRIDE to EVERY component/interface
- [ ] Document threats found per category
- [ ] Check for property override patterns
- [ ] Check for enforcement gap patterns

### Attack Trees
- [ ] Build attack tree for each high-priority threat
- [ ] Mark easiest exploitation paths
- [ ] Identify prerequisites (what attacker needs)

### Risk Scoring
- [ ] Score Likelihood (1-3) for each threat
- [ ] Score Impact (1-3) for each threat
- [ ] Calculate Risk = L × I
- [ ] Prioritize by risk score

### Enforcement Gaps
- [ ] List security properties (auth, authorization, encryption, etc.)
- [ ] For each property, check: Schema? Registry? Runtime? Post-op?
- [ ] Identify gaps in defense-in-depth

### Documentation
- [ ] Create threat model document
- [ ] Include attack trees, risk matrix, mitigation plans
- [ ] Share with team for review


## Common Mistakes

### ❌ Intuitive Threat Finding Only
**Wrong**: "I'll just think about what could go wrong"
**Right**: Systematically apply STRIDE to every component

**Why**: Intuition finds obvious threats. STRIDE finds subtle, critical threats like VULN-004.

### ❌ Threat Modeling After Implementation
**Wrong**: "Let's build it first, then threat model"
**Right**: Threat model BEFORE implementation

**Why**: Threats found post-implementation require expensive re-architecture. Threats found in design are cheap to fix.

### ❌ Single-Layer Validation
**Wrong**: "Schema validates config, so it's secure"
**Right**: Validate at schema, registry, runtime, post-operation

**Why**: Attackers bypass single layers. Defense-in-depth catches them.

### ❌ Trusting Language Features for Security
**Wrong**: "It's frozen=True, so it can't be modified"
**Right**: Language feature + runtime verification + test bypass methods

**Why**: Language features have bypasses (VULN-009). Always verify.

### ❌ Skipping Risk Scoring
**Wrong**: "All threats are important, fix them all"
**Right**: Score L×I, prioritize Critical/High, fix Low only if time permits

**Why**: Resources are limited. Critical threats must be fixed first.


## Real-World Examples

### Example 1: VULN-004 - Configuration Override Attack

**System**: Plugin system with YAML configuration and MLS security levels

**STRIDE Analysis**:
- **T** (Tampering): Config file tampering ✓
- **E** (Elevation): Override security_level property ✓ **← Caught by STRIDE**

**Attack Tree**:
```
Goal: Access classified data
└─ Override security_level to SECRET
   ├─ Inject security_level: SECRET into YAML ⭐ (easiest)
   ├─ Modify source code (harder)
   └─ Compromise plugin registry (harder)
```

**Risk Score**: L:3 × I:3 = 9 (Critical)

**Mitigation**: Forbid security_level in config schema + runtime verification


### Example 2: ADR-002 → 005 - MLS Design Gaps

**System**: Multi-Level Security enforcement for plugins

**Enforcement Gap Analysis**:
1. **Registry Layer**: No check plugin is registered (ADR-003) ✓
2. **Type Layer**: Protocol allows duck typing (ADR-004) ✓
3. **Immutability**: security_level could be mutated (VULN-009) ✓
4. **Trust**: Trusted downgrade assumes no compromise (ADR-005) ✓

**All four gaps are the kind that systematic enforcement gap analysis surfaces** - applied upfront, it would have prevented 3 follow-up ADRs.

**Risk Scores**:
- ADR-003 (registry): L:2 × I:3 = 6 (High)
- ADR-004 (type): L:2 × I:3 = 6 (High)
- ADR-005 (trust): L:1 × I:3 = 3 (Medium)


## When NOT to Threat Model

**Don't threat model for**:
- Non-security features (UI styling, analytics dashboards with no sensitive data)
- Changes that don't touch attack surface (refactoring internal code, renaming variables)
- Systems with no sensitive data and no attack value (internal dev tools, prototypes)

**Quick test**: If attacker can't gain anything (data, money, access, disruption), threat modeling may be overkill.


## Cross-References

### Load These Skills Together

**For comprehensive security**:
- `ordis/security-architect/threat-modeling` (this skill) - Find threats
- `ordis/security-architect/security-controls-design` - Design mitigations
- `ordis/security-architect/secure-by-design-patterns` - Prevent threats at architecture level

**For documentation**:
- `ordis/security-architect/documenting-threats-and-controls` - Document threat model
- `muna/technical-writer/documentation-structure` - Structure threat docs as ADRs


## Summary

**Threat modeling IS systematic threat discovery using STRIDE, attack trees, and risk scoring.**

**Key Principles**:
1. **STRIDE every component** - systematic beats intuition
2. **Build attack trees** - find easiest exploitation paths
3. **Check enforcement gaps** - defense-in-depth at every layer
4. **Score risks** - L × I prioritization
5. **Do it early** - before implementation, when fixes are cheap

**Meta-rule**: If you're designing something security-sensitive and you haven't threat modeled it, you've missed critical threats. Always threat model first.