Initial commit

2025-11-30 08:59:46 +08:00
commit 4fad9f51e1
12 changed files with 5049 additions and 0 deletions
--- a/skills/using-security-architect/threat-modeling.md
+++ b/skills/using-security-architect/threat-modeling.md
@@ -0,0 +1,565 @@
+
+# Threat Modeling
+
+## Overview
+
+Systematic identification of security threats using proven frameworks. Threat modeling finds threats that intuition misses by applying structured methodologies.
+
+**Core Principle**: Security intuition finds obvious threats. Systematic threat modeling finds subtle, critical threats that lead to real vulnerabilities.
+
+## When to Use
+
+Load this skill when:
+- Designing new systems or features (before implementation)
+- Adding security-sensitive functionality (auth, data handling, APIs)
+- Reviewing existing designs for security gaps
+- Investigating security incidents (what else could be exploited?)
+- User mentions: "threat model", "security risks", "what could go wrong", "attack surface"
+
+**Use BEFORE implementation** - threats found after deployment are typically far more expensive to fix than threats caught at design time.
+
+## Don't Use For
+
+- **Implementing specific security controls** (use security-controls-design)
+- **Code-level security patterns** (use secure-by-design-patterns)
+- **Reviewing existing designs for completeness** (use security-architecture-review)
+- **Compliance mapping** (use compliance-awareness-and-mapping)
+- **Documenting threats after they're identified** (use documenting-threats-and-controls)
+
+This skill is for IDENTIFYING threats systematically. Once threats are identified, route to appropriate skills for designing controls, implementing patterns, or documenting decisions.
+
+## The STRIDE Framework
+
+**STRIDE** is a systematic threat enumeration framework. Apply to EVERY component, interface, and data flow.
+
+### S - Spoofing Identity
+
+**Definition**: Attacker pretends to be someone/something else
+
+**Questions to Ask**:
+- Can attacker claim a different identity?
+- Is authentication required? Can it be bypassed?
+- Are credentials properly validated?
+- Can tokens/sessions be stolen or forged?
+
+**Example Threats**:
+- Stolen authentication tokens
+- Forged JWT signatures
+- Session hijacking via XSS
+- API key leakage in logs
+
+
+### T - Tampering with Data
+
+**Definition**: Unauthorized modification of data or code
+
+**Questions to Ask**:
+- Can attacker modify data in transit? (MITM)
+- Can attacker modify data at rest? (database, files, config)
+- Can attacker modify code? (supply chain, config injection)
+- **Can configuration override security properties?** (CRITICAL - often missed)
+
+**Example Threats**:
+- Configuration files modifying security_level properties
+- YAML/JSON injection overriding access controls
+- Database tampering if encryption/MAC missing
+- Code injection via deserialization
+
+**⚠️ Property Override Pattern** (VULN-004):
+```yaml
+# Plugin declares security_level=UNOFFICIAL in code
+# Attacker adds to YAML config:
+plugins:
+  datasource:
+    security_level: SECRET  # OVERRIDES code declaration!
+```
+
+Always ask: **"Can external configuration override security-critical properties?"**
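+
+One way to act on that question is a load-time check that rejects security-critical keys in parsed config. The following is a minimal sketch, not the project's actual fix: `FORBIDDEN_KEYS`, `ConfigOverrideError`, and `validate_plugin_config` are hypothetical names, and the config is assumed to be already parsed into a dict.
+
+```python
+# Hypothetical names throughout; this is an illustration, not a real API.
+FORBIDDEN_KEYS = frozenset({"security_level"})
+
+
+class ConfigOverrideError(ValueError):
+    """Raised when config tries to set a security-critical property."""
+
+
+def validate_plugin_config(config: dict) -> dict:
+    """Reject any plugin section that sets a forbidden key."""
+    for name, options in config.get("plugins", {}).items():
+        blocked = FORBIDDEN_KEYS & set(options or {})
+        if blocked:
+            raise ConfigOverrideError(
+                f"plugin {name!r} may not set {sorted(blocked)} via config"
+            )
+    return config
+
+
+# Parsed form of the YAML above (parsing itself is out of scope here).
+malicious = {"plugins": {"datasource": {"security_level": "SECRET"}}}
+# The "path" option is an assumed, harmless setting.
+benign = {"plugins": {"datasource": {"path": "/data/input.csv"}}}
+```
+
+Calling `validate_plugin_config(malicious)` raises `ConfigOverrideError`, while `benign` passes through unchanged. A deny-list like this is only one layer; the code-declared value should still be verified at runtime.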
+
+
+### R - Repudiation
+
+**Definition**: Attacker denies performing an action (no audit trail)
+
+**Questions to Ask**:
+- Are security-relevant actions logged?
+- Can logs be tampered with or deleted?
+- Is logging sufficient for forensics?
+- Can attacker perform reconnaissance without detection?
+
+**Example Threats**:
+- No logging of failed authorization attempts
+- Logs stored without integrity protection (MAC, signatures)
+- Insufficient detail for incident response
+- Log injection attacks
+
+
+### I - Information Disclosure
+
+**Definition**: Exposure of information to unauthorized parties
+
+**Questions to Ask**:
+- What data is exposed in responses, logs, errors?
+- Can attacker enumerate resources or users?
+- Are temporary files, caches, or memory properly cleared?
+- Can attacker infer sensitive data from timing/behavior?
+
+**Example Threats**:
+- Secrets in error messages or stack traces
+- Timing attacks revealing password validity
+- Cache poisoning or cache deception exposing other users' data
+- Path traversal reading arbitrary files
+
+
+### D - Denial of Service
+
+**Definition**: Making system unavailable or degrading performance
+
+**Questions to Ask**:
+- Are there resource limits (CPU, memory, connections)?
+- Can attacker trigger expensive operations?
+- Is rate limiting implemented?
+- Can attacker cause crashes or hangs?
+
+**Example Threats**:
+- Unbounded recursion or loops
+- Memory exhaustion via large payloads
+- Algorithmic complexity attacks (e.g., hash collisions)
+- Crash via malformed input
+
+
+### E - Elevation of Privilege
+
+**Definition**: Gaining capabilities beyond what's authorized
+
+**Questions to Ask**:
+- Can attacker access admin functions?
+- Can attacker escalate from low to high privilege?
+- Are privilege checks performed at every layer?
+- **Can type system be bypassed?** (ADR-004 pattern)
+
+**Example Threats**:
+- Missing authorization checks on sensitive endpoints
+- Horizontal privilege escalation (access other users' data)
+- Vertical privilege escalation (user → admin)
+- Duck typing allowing security bypass
+
+
+## Attack Tree Construction
+
+**Purpose**: Visual/structured representation of attack paths from goal → exploitation
+
+### Attack Tree Format
+
+```
+ROOT: Attacker Goal (e.g., "Access classified data")
+├─ BRANCH 1: Attack Vector
+│  ├─ LEAF: Specific Exploit (with feasibility)
+│  └─ LEAF: Alternative Exploit
+└─ BRANCH 2: Alternative Vector
+   └─ LEAF: Exploit Method
+```
+
+### Example: Configuration Override Attack (VULN-004)
+
+```
+ROOT: Access classified data with insufficient clearance
+├─ Override Plugin Security Level
+│  ├─ Inject security_level into YAML config ⭐ (VULN-004 - actually happened)
+│  ├─ Modify plugin source code (requires code access)
+│  └─ Bypass registry to register malicious plugin (ADR-003 gap)
+├─ Exploit Trusted Downgrade
+│  ├─ Compromise high-clearance component (supply chain)
+│  └─ Abuse legitimate downgrade path (ADR-005 gap)
+└─ Bypass Type System
+   └─ Duck-type plugin without BasePlugin inheritance (ADR-004 gap)
+```
+
+**⭐ = Easiest/highest risk path**
+
+### How to Build Attack Trees
+
+1. **Start with attacker goal**: What does attacker want? (data access, DoS, privilege escalation)
+2. **Branch by attack vector**: How could they achieve it? (config, network, code)
+3. **Leaf nodes are specific exploits**: Concrete technical steps
+4. **Mark feasibility**: Easy, Medium, Hard (or Low/Med/High effort)
+5. **Identify easiest path**: This is your highest priority to mitigate
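+
+The five steps above can be sketched as a small tree structure with a search for the lowest-effort leaf. This is an illustrative sketch only: `AttackNode` and `easiest_path` are hypothetical names, the effort scale (1 = Easy, 2 = Medium, 3 = Hard) is an assumption, and the effort values assigned to each leaf are illustrative guesses rather than measured data.
+
+```python
+from dataclasses import dataclass, field
+from typing import List, Optional, Tuple
+
+
+@dataclass
+class AttackNode:
+    """A goal, vector, or exploit; only leaves carry an effort estimate."""
+    name: str
+    effort: Optional[int] = None  # assumed scale: 1 = Easy, 2 = Medium, 3 = Hard
+    children: List["AttackNode"] = field(default_factory=list)
+
+
+def easiest_path(node: AttackNode) -> Tuple[int, List[str]]:
+    """Return (effort, path) for the lowest-effort leaf under node."""
+    if not node.children:
+        if node.effort is None:
+            raise ValueError(f"leaf {node.name!r} has no effort estimate")
+        return node.effort, [node.name]
+    effort, path = min(easiest_path(child) for child in node.children)
+    return effort, [node.name, *path]
+
+
+# Effort values below are illustrative assumptions.
+tree = AttackNode("Access classified data", children=[
+    AttackNode("Override plugin security level", children=[
+        AttackNode("Inject security_level into YAML config", effort=1),
+        AttackNode("Modify plugin source code", effort=3),
+    ]),
+    AttackNode("Bypass type system", children=[
+        AttackNode("Duck-type plugin without BasePlugin", effort=2),
+    ]),
+])
+
+effort, path = easiest_path(tree)
+```
+
+Here `path` ends at the YAML injection leaf, which matches the ⭐ path in the example tree. The sketch treats every branch as an alternative (OR); trees with steps that must all succeed (AND) would need a different cost rule.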
+
+
+## Enforcement Gap Analysis
+
+**Pattern**: Security properties must be enforced at EVERY layer. Single-layer enforcement fails.
+
+### Layers to Check
+
+**For any security property (e.g., security_level, access control, data classification):**
+
+1. **Schema/Type Layer**: Is property type-safe? Can it be None/invalid?
+2. **Registration Layer**: Is component registered? Can attacker bypass registry?
+3. **Construction Layer**: Is property immutable after creation? Can it be modified?
+4. **Runtime Layer**: Is property checked before sensitive operations?
+5. **Post-Operation Layer**: Is result validated against expected property?
+
+### Example: MLS Security Level Enforcement (ADR-002 → 005)
+
+| Layer | Gap Found | Fix Required |
+|-------|-----------|--------------|
+| **Registry** | Plugin not registered at all (ADR-003) | Central plugin registry with runtime checks |
+| **Type System** | Protocol allows duck typing bypass (ADR-004) | ABC with sealed methods, not Protocol |
+| **Immutability** | security_level could be mutated (VULN-009) | Frozen dataclass + runtime checks |
+| **Trust** | Trusted downgrade assumes no compromise (ADR-005) | Strict mode disables trusted downgrade |
+
+**Key Insight**: Each gap was found AFTER implementation. Systematic enforcement gap analysis would have caught all four upfront.
+
+### How to Apply
+
+For each security property:
+1. **List all layers** where property matters
+2. **Ask per layer**: "Can attacker bypass this layer?"
+3. **Design defense-in-depth**: Redundant checks at multiple layers
+
+
+## Risk Scoring
+
+**Purpose**: Prioritize threats by (Likelihood × Impact)
+
+### Likelihood Scale
+
+| Score | Likelihood | Criteria |
+|-------|-----------|----------|
+| **3** | High | Easy to exploit, attacker has means and motive, no special access needed |
+| **2** | Medium | Requires some skill or access, exploit path exists but not trivial |
+| **1** | Low | Requires significant expertise, insider access, or rare conditions |
+
+### Impact Scale
+
+| Score | Impact | Criteria |
+|-------|--------|----------|
+| **3** | High | Complete system compromise, data breach, financial loss, safety risk |
+| **2** | Medium | Partial compromise, limited data exposure, service degradation |
+| **1** | Low | Minor information leakage, temporary DoS, limited scope |
+
+### Risk Matrix
+
+```
+         IMPACT
+         1   2   3
+       ┌───┬───┬───┐
+     3 │ M │ H │ C │  C = Critical (fix immediately)
+L    2 │ L │ M │ H │  H = High (fix before launch)
+I    1 │ L │ L │ M │  M = Medium (fix soon)
+K      └───┴───┴───┘  L = Low (fix if time permits)
+```
+
+### Example: VULN-004 Config Override
+
+- **Likelihood**: 3 (High) - YAML files easily modified with filesystem access
+- **Impact**: 3 (High) - Bypass MLS enforcement, access classified data
+- **Risk Score**: 9 (Critical) - **Fix immediately**
+
+### Example: ADR-004 Type System Bypass
+
+- **Likelihood**: 2 (Medium) - Requires knowing to create duck-typed plugin
+- **Impact**: 3 (High) - Complete security bypass
+- **Risk Score**: 6 (High) - **Fix before launch**
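+
+The scoring above reduces to a small lookup. The sketch below assumes the thresholds implied by the risk matrix (9 = Critical, 6 = High, 3-4 = Medium, 1-2 = Low); `risk_rating` is a hypothetical helper name.
+
+```python
+from typing import Tuple
+
+
+def risk_rating(likelihood: int, impact: int) -> Tuple[int, str]:
+    """Return (score, rating) for likelihood and impact values of 1-3."""
+    for value in (likelihood, impact):
+        if value not in (1, 2, 3):
+            raise ValueError("likelihood and impact must be 1, 2, or 3")
+    score = likelihood * impact
+    # Thresholds mirror the matrix: only the products 1, 2, 3, 4, 6, 9 occur.
+    if score >= 9:
+        rating = "Critical"
+    elif score >= 6:
+        rating = "High"
+    elif score >= 3:
+        rating = "Medium"
+    else:
+        rating = "Low"
+    return score, rating
+
+
+vuln_004 = risk_rating(3, 3)  # (9, "Critical")
+adr_004 = risk_rating(2, 3)   # (6, "High")
+```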
+
+
+## Threat Modeling Workflow
+
+### Step 1: System Decomposition
+
+Break system into components:
+1. **Entry points**: APIs, file uploads, configuration, user input
+2. **Data stores**: Databases, caches, logs, files
+3. **External dependencies**: Third-party APIs, libraries, services
+4. **Trust boundaries**: Where privilege level changes, network boundaries
+5. **Security-critical components**: Auth, access control, crypto, secrets management
+
+
+### Step 2: Apply STRIDE per Component
+
+For EACH component/interface, systematically ask STRIDE questions:
+
+**Example: Plugin Configuration Component**
+
+| STRIDE | Threat Found | Priority |
+|--------|-------------|----------|
+| **S** | None (no identity claims) | - |
+| **T** | Config tampering to override security_level (VULN-004) | Critical |
+| **R** | Config changes not logged | Medium |
+| **I** | Config may contain secrets in plaintext | High |
+| **D** | Malformed YAML causes parser crash | Low |
+| **E** | Config override elevates plugin privilege | Critical |
+
+
+### Step 3: Build Attack Trees
+
+For each high-priority threat, construct attack tree:
+- Goal: What does attacker want?
+- Vectors: How could they get it?
+- Exploits: Specific technical steps
+
+Mark easiest paths with ⭐.
+
+
+### Step 4: Check Enforcement Gaps
+
+For each security property (authentication, authorization, encryption):
+1. List enforcement layers (schema, registry, runtime, etc.)
+2. Check each layer for gaps
+3. Design redundant checks (defense-in-depth)
+
+
+### Step 5: Score and Prioritize
+
+- Calculate Likelihood × Impact for each threat
+- Sort by risk score (highest first)
+- Set mitigation deadlines (Critical → immediate, High → before launch)
+
+
+### Step 6: Document Threats
+
+Create threat model document:
+```markdown
+# Threat Model: [System Name]
+
+## Scope
+[Components, entry points, trust boundaries]
+
+## Threats Identified
+
+### THREAT-001: Configuration Override Attack (CRITICAL)
+**STRIDE**: Tampering, Elevation of Privilege
+**Attack Tree**: [Include tree diagram or text description]
+**Risk Score**: 9 (L:3 × I:3)
+**Mitigation**: Forbid security_level in config (schema), runtime verification, frozen dataclass
+
+### THREAT-002: [Next threat...]
+
+## Enforcement Gaps
+[List gaps found in defense-in-depth analysis]
+
+## Risk Matrix
+[Include prioritized threat list]
+```
+
+
+## Common Patterns That Intuition Misses
+
+### Pattern 1: Property Override via Configuration
+
+**Symptom**: Security property declared in code, but configuration system allows overriding it
+
+**Example**: VULN-004 - Plugin declares security_level in code, YAML config overrides it
+
+**How to Spot**:
+- Code declares security property (access_level, security_level, role)
+- Configuration system loads external data (YAML, JSON, database)
+- No explicit check that config cannot override security properties
+
+**Mitigation**: Schema MUST forbid security properties in config, runtime verification
+
+
+### Pattern 2: Enforcement at One Layer Only
+
+**Symptom**: Security check at one layer, but attacker can bypass that layer
+
+**Example**: ADR-003 - MLS checks assume plugin is registered, but no check that plugin IS registered
+
+**How to Spot**:
+- Security check at schema/type layer but not runtime
+- Trust in single source of truth (registry, type system) without verification
+- No redundant checks
+
+**Mitigation**: Defense-in-depth - check at schema, registry, runtime, post-operation
+
+
+### Pattern 3: Type System as Security Boundary
+
+**Symptom**: Relying on type system (Protocol, interface) for security enforcement
+
+**Example**: ADR-004 - Protocol typing allows duck typing to bypass BasePlugin
+
+**How to Spot**:
+- Security property defined in Protocol or interface
+- No nominal type enforcement (isinstance check, ABC)
+- Runtime doesn't verify actual type, just duck typing compatibility
+
+**Mitigation**: Use ABC with sealed methods, runtime isinstance checks
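+
+The difference between structural and nominal checks can be shown in a few lines. This is a sketch under assumptions: `PluginProtocol`, `RoguePlugin`, and `accept_plugin` are hypothetical names, and `BasePlugin` here is a stand-in, not the real class from the ADRs.
+
+```python
+from abc import ABC, abstractmethod
+from typing import Protocol, runtime_checkable
+
+
+@runtime_checkable
+class PluginProtocol(Protocol):
+    """Structural type: anything with the right attributes matches."""
+    security_level: str
+
+    def run(self) -> str: ...
+
+
+class BasePlugin(ABC):
+    """Nominal type: only real subclasses pass isinstance."""
+    security_level: str = "UNOFFICIAL"
+
+    @abstractmethod
+    def run(self) -> str: ...
+
+
+class RoguePlugin:
+    """Never inherits BasePlugin, but has the same shape."""
+    security_level = "SECRET"
+
+    def run(self) -> str:
+        return "rogue"
+
+
+def accept_plugin(plugin: object) -> BasePlugin:
+    """Runtime nominal check: reject anything not derived from BasePlugin."""
+    if not isinstance(plugin, BasePlugin):
+        raise TypeError(f"{type(plugin).__name__} does not inherit BasePlugin")
+    return plugin
+
+
+passes_protocol = isinstance(RoguePlugin(), PluginProtocol)  # structural match
+passes_abc = isinstance(RoguePlugin(), BasePlugin)           # nominal mismatch
+```
+
+`RoguePlugin` satisfies the Protocol check but is rejected by `accept_plugin`. Note that `isinstance` against an ABC is itself not airtight: `BasePlugin.register(RoguePlugin)` would make the check pass, so registration should be controlled as well.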
+
+
+### Pattern 4: Trusted Component Assumptions
+
+**Symptom**: Assuming high-privilege component will never be compromised
+
+**Example**: ADR-005 - Trusted downgrade assumes high-clearance component is always safe
+
+**How to Spot**:
+- Component granted special privileges ("trusted")
+- No monitoring or verification of trusted component behavior
+- Insider threat or supply chain compromise not in threat model
+
+**Mitigation**: Trust but verify - log all actions, anomaly detection, strict mode without trust
+
+
+### Pattern 5: Immutability Assumption
+
+**Symptom**: Assuming language feature (frozen, const, final) provides security
+
+**Example**: VULN-009 - Frozen dataclass, but a `__dict__` bypass is possible
+
+**How to Spot**:
+- Security property marked frozen/immutable via language feature
+- No runtime check that property hasn't changed
+- Language feature has known bypasses (`__dict__`, `__setattr__`)
+
+**Mitigation**: Language feature + runtime verification + test all bypass methods
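+
+To make the bypass concrete, the sketch below shows two ways a regular (non-slots) frozen dataclass can be mutated in Python, followed by a runtime check that detects the change. `PluginInfo` and `verify_level` are hypothetical names, and the exact bypass behind VULN-009 is not specified in this document, so treat this as an illustration of the general pattern.
+
+```python
+from dataclasses import FrozenInstanceError, dataclass
+
+
+@dataclass(frozen=True)
+class PluginInfo:
+    name: str
+    security_level: str
+
+
+def verify_level(info: PluginInfo, declared: str) -> None:
+    """Runtime check against the level recorded when the object was created."""
+    if info.security_level != declared:
+        raise RuntimeError(f"{info.name}: security_level changed after creation")
+
+
+info = PluginInfo(name="datasource", security_level="UNOFFICIAL")
+declared = info.security_level  # trusted copy kept outside the object
+
+try:
+    info.security_level = "SECRET"  # normal assignment is blocked
+except FrozenInstanceError:
+    pass
+
+# Both of these sidestep frozen=True on a regular (non-slots) dataclass.
+object.__setattr__(info, "security_level", "SECRET")
+info.__dict__["security_level"] = "TOP SECRET"
+```
+
+After the last two lines, `verify_level(info, declared)` raises `RuntimeError`, which is the kind of runtime verification the mitigation calls for. Where the trusted copy of `declared` lives is a design decision; it should sit somewhere the attacker cannot also reach.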
+
+
+## Quick Reference Checklist
+
+**Use this checklist for every threat modeling session:**
+
+### Pre-Session
+- [ ] Identify scope (components, entry points, trust boundaries)
+- [ ] Gather architecture diagrams, API specs, data flow diagrams
+
+### STRIDE Application
+- [ ] Apply STRIDE to EVERY component/interface
+- [ ] Document threats found per category
+- [ ] Check for property override patterns
+- [ ] Check for enforcement gap patterns
+
+### Attack Trees
+- [ ] Build attack tree for each high-priority threat
+- [ ] Mark easiest exploitation paths
+- [ ] Identify prerequisites (what attacker needs)
+
+### Risk Scoring
+- [ ] Score Likelihood (1-3) for each threat
+- [ ] Score Impact (1-3) for each threat
+- [ ] Calculate Risk = L × I
+- [ ] Prioritize by risk score
+
+### Enforcement Gaps
+- [ ] List security properties (auth, authorization, encryption, etc.)
+- [ ] For each property, check: Schema? Registry? Runtime? Post-op?
+- [ ] Identify gaps in defense-in-depth
+
+### Documentation
+- [ ] Create threat model document
+- [ ] Include attack trees, risk matrix, mitigation plans
+- [ ] Share with team for review
+
+
+## Common Mistakes
+
+### ❌ Intuitive Threat Finding Only
+**Wrong**: "I'll just think about what could go wrong"
+**Right**: Systematically apply STRIDE to every component
+
+**Why**: Intuition finds obvious threats. STRIDE finds subtle, critical threats like VULN-004.
+
+### ❌ Threat Modeling After Implementation
+**Wrong**: "Let's build it first, then threat model"
+**Right**: Threat model BEFORE implementation
+
+**Why**: Threats found post-implementation require expensive re-architecture. Threats found in design are cheap to fix.
+
+### ❌ Single-Layer Validation
+**Wrong**: "Schema validates config, so it's secure"
+**Right**: Validate at schema, registry, runtime, post-operation
+
+**Why**: Attackers bypass single layers. Defense-in-depth catches them.
+
+### ❌ Trusting Language Features for Security
+**Wrong**: "It's frozen=True, so it can't be modified"
+**Right**: Language feature + runtime verification + test bypass methods
+
+**Why**: Language features have bypasses (VULN-009). Always verify.
+
+### ❌ Skipping Risk Scoring
+**Wrong**: "All threats are important, fix them all"
+**Right**: Score L×I, prioritize Critical/High, fix Low only if time permits
+
+**Why**: Resources are limited. Critical threats must be fixed first.
+
+
+## Real-World Examples
+
+### Example 1: VULN-004 - Configuration Override Attack
+
+**System**: Plugin system with YAML configuration and MLS security levels
+
+**STRIDE Analysis**:
+- **T** (Tampering): Config file tampering ✓
+- **E** (Elevation): Override security_level property ✓ **← Caught by STRIDE**
+
+**Attack Tree**:
+```
+Goal: Access classified data
+└─ Override security_level to SECRET
+   ├─ Inject security_level: SECRET into YAML ⭐ (easiest)
+   ├─ Modify source code (harder)
+   └─ Compromise plugin registry (harder)
+```
+
+**Risk Score**: L:3 × I:3 = 9 (Critical)
+
+**Mitigation**: Forbid security_level in config schema + runtime verification
+
+
+### Example 2: ADR-002 → 005 - MLS Design Gaps
+
+**System**: Multi-Level Security enforcement for plugins
+
+**Enforcement Gap Analysis**:
+1. **Registry Layer**: No check plugin is registered (ADR-003) ✓
+2. **Type Layer**: Protocol allows duck typing (ADR-004) ✓
+3. **Immutability**: security_level could be mutated (VULN-009) ✓
+4. **Trust**: Trusted downgrade assumes no compromise (ADR-005) ✓
+
+**Systematic enforcement gap analysis would have caught all four gaps upfront**, avoiding 3 follow-up ADRs.
+
+**Risk Scores**:
+- ADR-003 (registry): L:2 × I:3 = 6 (High)
+- ADR-004 (type): L:2 × I:3 = 6 (High)
+- ADR-005 (trust): L:1 × I:3 = 3 (Medium)
+
+
+## When NOT to Threat Model
+
+**Don't threat model for**:
+- Non-security features (UI styling, analytics dashboards with no sensitive data)
+- Changes that don't touch attack surface (refactoring internal code, renaming variables)
+- Systems with no sensitive data and no attack value (internal dev tools, prototypes)
+
+**Quick test**: If attacker can't gain anything (data, money, access, disruption), threat modeling may be overkill.
+
+
+## Cross-References
+
+### Load These Skills Together
+
+**For comprehensive security**:
+- `ordis/security-architect/threat-modeling` (this skill) - Find threats
+- `ordis/security-architect/security-controls-design` - Design mitigations
+- `ordis/security-architect/secure-by-design-patterns` - Prevent threats at architecture level
+
+**For documentation**:
+- `ordis/security-architect/documenting-threats-and-controls` - Document threat model
+- `muna/technical-writer/documentation-structure` - Structure threat docs as ADRs
+
+
+## Summary
+
+**Threat modeling IS systematic threat discovery using STRIDE, attack trees, and risk scoring.**
+
+**Key Principles**:
+1. **STRIDE every component** - systematic beats intuition
+2. **Build attack trees** - find easiest exploitation paths
+3. **Check enforcement gaps** - defense-in-depth at every layer
+4. **Score risks** - L × I prioritization
+5. **Do it early** - before implementation, when fixes are cheap
+
+**Meta-rule**: If you're designing something security-sensitive and you haven't threat modeled it, you have very likely missed critical threats. Always threat model first.