538 lines
18 KiB
Markdown
538 lines
18 KiB
Markdown
|
|
# Security Controls Design
|
|
|
|
## Overview
|
|
|
|
Design security controls as **layered defenses at trust boundaries**. Core principle: Apply systematic checks at every boundary to ensure no single control failure compromises security.
|
|
|
|
**Key insight**: List specific controls after identifying WHERE to apply them (trust boundaries first, then controls).
|
|
|
|
## When to Use
|
|
|
|
Load this skill when:
|
|
- Implementing authentication/authorization systems
|
|
- Hardening API endpoints, databases, file storage
|
|
- Designing data protection mechanisms
|
|
- Securing communication channels
|
|
- Protecting sensitive operations
|
|
|
|
**Symptoms you need this**:
|
|
- "How do I secure this API/database/upload feature?"
|
|
- "What controls should I implement?"
|
|
- "How do I prevent unauthorized access to X?"
|
|
- "How do I harden this system?"
|
|
|
|
**Don't use for**:
|
|
- Threat modeling (use `ordis/security-architect/threat-modeling` first)
|
|
- Code-level security patterns (use `ordis/security-architect/secure-code-patterns`)
|
|
- Reviewing existing designs (use `ordis/security-architect/security-architecture-review`)
|
|
|
|
## Core Methodology: Trust Boundaries First
|
|
|
|
**DON'T start with**: "What controls should I implement?"
|
|
|
|
**DO start with**: "Where are the trust boundaries?"
|
|
|
|
###Step 1: Identify Trust Boundaries
|
|
|
|
Trust boundaries are **points where data/requests cross from less-trusted to more-trusted zones**.
|
|
|
|
**Common boundaries:**
|
|
- Internet → API Gateway
|
|
- API Gateway → Application Server
|
|
- Application → Database
|
|
- Application → File Storage
|
|
- Unauthenticated → Authenticated
|
|
- User Role → Admin Role
|
|
- External Service → Internal Service
|
|
|
|
**Example: File Upload System**
|
|
```
|
|
Trust Boundaries:
|
|
1. User Browser → Upload Endpoint (UNTRUSTED → APP)
|
|
2. Upload Endpoint → Virus Scanner (APP → SCANNER)
|
|
3. Scanner → Storage (SCANNER → S3)
|
|
4. Storage → Display (S3 → USER)
|
|
5. Storage → Internal Processing (S3 → APP)
|
|
```
|
|
|
|
### Step 2: Apply Defense-in-Depth at Each Boundary
|
|
|
|
For EACH boundary, apply multiple control layers. If one fails, others provide backup.
|
|
|
|
## Defense-in-Depth Checklist
|
|
|
|
Use this checklist at **every trust boundary**:
|
|
|
|
### Layer 1: Validation (First Line)
|
|
- [ ] **Input validation**: Type, format, size, allowed values
|
|
- [ ] **Sanitization**: Remove dangerous characters, escape output
|
|
- [ ] **Canonicalization**: Resolve to standard form (prevent bypass)
|
|
|
|
### Layer 2: Authentication (Who Are You?)
|
|
- [ ] **Identity verification**: Credentials, tokens, certificates
|
|
- [ ] **Multi-factor authentication**: For sensitive boundaries
|
|
- [ ] **Session management**: Secure tokens, expiration, rotation
|
|
|
|
### Layer 3: Authorization (What Can You Do?)
|
|
- [ ] **Access control checks**: RBAC, ABAC, resource-level
|
|
- [ ] **Least privilege enforcement**: Grant minimum necessary
|
|
- [ ] **Privilege escalation prevention**: No path to higher access
|
|
|
|
### Layer 4: Rate Limiting (Abuse Prevention)
|
|
- [ ] **Request rate limits**: Per-IP, per-user, per-endpoint
|
|
- [ ] **Resource quotas**: Prevent resource exhaustion
|
|
- [ ] **Anomaly detection**: Flag unusual patterns
|
|
|
|
### Layer 5: Audit Logging (Detective)
|
|
- [ ] **Security event logging**: Who, what, when, where, outcome
|
|
- [ ] **Tamper-proof logs**: Write-only for applications
|
|
- [ ] **Alerting**: Automated detection of suspicious activity
|
|
|
|
### Layer 6: Encryption (Confidentiality)
|
|
- [ ] **Data in transit**: TLS 1.3, certificate validation
|
|
- [ ] **Data at rest**: Encryption for sensitive data
|
|
- [ ] **Key management**: Secure storage, rotation, separation
|
|
|
|
**Example Application** (API Authentication Boundary):
|
|
```
|
|
Internet → API Gateway boundary:
|
|
|
|
Layer 1 (Validation):
|
|
- Validate Authorization header present and well-formed
|
|
- Check request size limits (prevent DoS)
|
|
- Validate content-type and payload structure
|
|
|
|
Layer 2 (Authentication):
|
|
- Verify JWT signature (RS256, public key validation)
|
|
- Check token expiration (exp claim)
|
|
- Verify token not revoked (check Redis revocation list)
|
|
|
|
Layer 3 (Authorization):
|
|
- Extract scopes from token
|
|
- Verify endpoint requires scope present in token
|
|
- Check resource-level permissions (can user access THIS resource?)
|
|
|
|
Layer 4 (Rate Limiting):
|
|
- Per-token: 1000 requests/minute
|
|
- Per-IP: 100 requests/minute (catch token sharing)
|
|
- Per-endpoint: Stricter limits on write operations
|
|
|
|
Layer 5 (Audit Logging):
|
|
- Log authentication attempts (success/failure)
|
|
- Log authorization decisions (allowed/denied)
|
|
- Log resource access (who accessed what)
|
|
|
|
Layer 6 (Encryption):
|
|
- Enforce TLS 1.3 only (reject unencrypted)
|
|
- Validate certificate chain
|
|
- Store tokens encrypted in session store
|
|
```
|
|
|
|
**If ANY layer fails, others provide defense.**
|
|
|
|
## Fail-Secure Patterns
|
|
|
|
When a control fails, system should **default to secure state** (deny access, close connection, reject request).
|
|
|
|
### Fail-Closed (Secure) vs Fail-Open (Insecure)
|
|
|
|
| Situation | Fail-Open (❌ BAD) | Fail-Closed (✅ GOOD) |
|
|
|-----------|-------------------|---------------------|
|
|
| **Auth service down** | Allow all requests through | Deny all requests until service recovers |
|
|
| **Token validation fails** | Treat as valid | Reject request |
|
|
| **Database unreachable** | Skip permission check | Deny access |
|
|
| **Rate limit store unavailable** | No rate limiting | Apply strictest default limit |
|
|
| **Audit log fails to write** | Continue operation | Reject operation |
|
|
|
|
### Examples of Fail-Secure Implementation
|
|
|
|
**Example 1: Authentication Service Failure**
|
|
```python
|
|
def authenticate_request(request):
|
|
try:
|
|
token = extract_token(request)
|
|
user = auth_service.validate_token(token) # External service call
|
|
return user
|
|
except AuthServiceUnavailable:
|
|
# ❌ FAIL-OPEN: return AnonymousUser() # Let them through
|
|
# ✅ FAIL-CLOSED: raise Unauthorized("Authentication service unavailable")
|
|
raise Unauthorized("Authentication service unavailable")
|
|
except InvalidToken:
|
|
raise Unauthorized("Invalid token")
|
|
```
|
|
|
|
**Example 2: Rate Limiter Failure**
|
|
```python
|
|
def check_rate_limit(user_id):
|
|
try:
|
|
redis.incr(f"rate:{user_id}")
|
|
count = redis.get(f"rate:{user_id}")
|
|
if count > LIMIT:
|
|
raise RateLimitExceeded()
|
|
except RedisConnectionError:
|
|
# ❌ FAIL-OPEN: return # Let request through
|
|
# ✅ FAIL-CLOSED: Apply strictest default limit
|
|
# If Redis is down, apply aggressive in-memory rate limit
|
|
in_memory_limiter.check(user_id, limit=10) # Much stricter than normal
|
|
```
|
|
|
|
**Example 3: Database Permission Check**
|
|
```python
|
|
def can_user_access_resource(user_id, resource_id):
|
|
try:
|
|
permission = db.query(
|
|
"SELECT can_read FROM permissions WHERE user_id = ? AND resource_id = ?",
|
|
user_id, resource_id
|
|
)
|
|
return permission.can_read
|
|
except DatabaseConnectionError:
|
|
# ❌ FAIL-OPEN: return True # Assume they have access
|
|
# ✅ FAIL-CLOSED: return False # Deny access if can't verify
|
|
logger.error(f"DB unavailable, denying access for user={user_id} resource={resource_id}")
|
|
return False
|
|
```
|
|
|
|
**Example 4: File Type Validation**
|
|
```python
|
|
def validate_file_upload(file):
|
|
# Layer 1: Check extension
|
|
if file.extension not in ALLOWED_EXTENSIONS:
|
|
raise ValidationError("Invalid file type")
|
|
|
|
# Layer 2: Check magic bytes
|
|
try:
|
|
magic_bytes = file.read(16)
|
|
if not is_valid_magic_bytes(magic_bytes):
|
|
# ✅ FAIL-CLOSED: If magic bytes don't match, reject
|
|
# Even if extension passed, magic bytes take precedence
|
|
raise ValidationError("File content doesn't match extension")
|
|
except Exception as e:
|
|
# ❌ FAIL-OPEN: return True # Couldn't check, assume valid
|
|
# ✅ FAIL-CLOSED: raise ValidationError("Could not validate file")
|
|
raise ValidationError(f"Could not validate file: {e}")
|
|
```
|
|
|
|
**Principle**: **When in doubt, deny**. It's better to have a false positive (deny legitimate request) than false negative (allow malicious request).
|
|
|
|
## Least Privilege Principle
|
|
|
|
Grant **minimum necessary access** for each component to perform its function. No more.
|
|
|
|
### Application Method
|
|
|
|
**For each component, ask three questions:**
|
|
1. **What does it NEED to do?** (functional requirements)
|
|
2. **What's the MINIMUM access to achieve that?** (reduce scope)
|
|
3. **What can it NEVER do?** (explicit denials)
|
|
|
|
### Example: Database Access Roles
|
|
|
|
**Web Application Role:**
|
|
```sql
|
|
-- What it NEEDS: Read customers, write audit logs
|
|
GRANT SELECT ON customers TO web_app_user;
|
|
GRANT INSERT ON audit_logs TO web_app_user;
|
|
|
|
-- What's MINIMUM: No DELETE, no UPDATE on audit logs (immutable), no admin tables
|
|
REVOKE DELETE ON customers FROM web_app_user;
|
|
REVOKE ALL ON admin_users FROM web_app_user;
|
|
|
|
-- Explicit NEVER: Cannot modify audit logs (tamper-proof)
|
|
REVOKE UPDATE, DELETE ON audit_logs FROM web_app_user;
|
|
|
|
-- Row-level security: Only active customers
|
|
CREATE POLICY web_app_access ON customers
|
|
FOR SELECT TO web_app_user
|
|
USING (status = 'active');
|
|
```
|
|
|
|
**Analytics Role:**
|
|
```sql
|
|
-- What it NEEDS: Read non-PII customer data for analytics
|
|
-- What's MINIMUM: View with PII columns excluded
|
|
CREATE VIEW customers_analytics AS
|
|
SELECT customer_id, country, subscription_tier, created_at
|
|
FROM customers; -- Excludes: name, email, address
|
|
|
|
GRANT SELECT ON customers_analytics TO analytics_user;
|
|
|
|
-- What it can NEVER do: Access PII, modify data, see payment info
|
|
REVOKE ALL ON customers FROM analytics_user;
|
|
REVOKE ALL ON payment_info FROM analytics_user;
|
|
SET default_transaction_read_only = true FOR analytics_user;
|
|
```
|
|
|
|
### File System Permissions
|
|
|
|
**Application Server:**
|
|
```bash
|
|
# What it NEEDS: Read config, write logs, read/write uploads
|
|
/etc/app/config/ → Read-only (owner: root, chmod 640, group: app)
|
|
/var/log/app/ → Write-only (owner: app, chmod 200, append-only)
|
|
/var/uploads/ → Read/write (owner: app, chmod 700)
|
|
|
|
# What it can NEVER do: Write to config, execute from uploads
|
|
/etc/app/config/ → No write permissions
|
|
/var/uploads/ → Mount with noexec flag (prevent execution)
|
|
```
|
|
|
|
### API Scopes (OAuth2 Pattern)
|
|
|
|
```python
|
|
# User requests minimal scopes
|
|
scopes_requested = ["read:profile", "read:posts"]
|
|
|
|
# DON'T grant admin scopes by default
|
|
# DO grant only what was requested and approved
|
|
token = create_token(user, scopes=scopes_requested)
|
|
|
|
# At each endpoint, verify scope
|
|
@require_scope("write:posts")
|
|
def create_post(request):
|
|
# This endpoint is inaccessible with read:posts scope
|
|
pass
|
|
```
|
|
|
|
**Principle**: **Default deny, explicit allow**. Start with no access, grant only what's needed.
|
|
|
|
## Separation of Duties
|
|
|
|
**No single component/person/account should have complete control** over a critical operation.
|
|
|
|
### Patterns
|
|
|
|
#### Pattern 1: Multi-Signature Approvals
|
|
|
|
**Example: Production Deployments**
|
|
```yaml
|
|
# Require 2 approvals from different teams
|
|
approvals:
|
|
required: 2
|
|
teams:
|
|
- engineering-leads
|
|
- security-team
|
|
|
|
# Cannot approve own PR
|
|
prevent_self_approval: true
|
|
```
|
|
|
|
#### Pattern 2: Split Responsibilities
|
|
|
|
**Example: Payment Processing**
|
|
```python
|
|
# Component A: Initiates payment (can create, cannot approve)
|
|
payment_service.initiate_payment(amount, account)
|
|
|
|
# Component B: Approves payment (can approve, cannot create)
|
|
# Different credentials, different service
|
|
approval_service.approve_payment(payment_id)
|
|
|
|
# Component C: Executes payment (can execute, cannot create/approve)
|
|
# Only accepts approved payments
|
|
execution_service.execute_payment(approved_payment_id)
|
|
```
|
|
|
|
**No single service can create AND approve AND execute a payment.**
|
|
|
|
#### Pattern 3: Key Splitting
|
|
|
|
**Example: Encryption Key Management**
|
|
```python
|
|
# Master key split into 3 shares using Shamir Secret Sharing
|
|
# Require 2 of 3 shares to reconstruct
|
|
shares = split_key(master_key, threshold=2, num_shares=3)
|
|
|
|
# Distribute to different teams/locations
|
|
security_team.store(shares[0])
|
|
ops_team.store(shares[1])
|
|
compliance_team.store(shares[2])
|
|
|
|
# Reconstruction requires 2 teams to cooperate
|
|
reconstructed = reconstruct_key([shares[0], shares[1]])
|
|
```
|
|
|
|
#### Pattern 4: Admin Operations Require Approval
|
|
|
|
**Example: Database Admin Actions**
|
|
```python
|
|
# Admin initiates action (creates request, cannot execute)
|
|
admin_request = AdminRequest(
|
|
action="DELETE_USER",
|
|
user_id=12345,
|
|
reason="GDPR erasure request",
|
|
requested_by=admin_id
|
|
)
|
|
|
|
# Second admin reviews and approves (cannot initiate)
|
|
reviewer.approve(admin_request, reviewer_id=different_admin_id)
|
|
|
|
# System executes after approval (automated, no single admin control)
|
|
if admin_request.is_approved():
|
|
execute_admin_action(admin_request)
|
|
```
|
|
|
|
**Principle**: **Break critical paths into multiple steps requiring different actors.**
|
|
|
|
## Control Verification Method
|
|
|
|
For **each control you design**, ask: **"What if this control fails?"**
|
|
|
|
### Verification Checklist
|
|
|
|
**For each control:**
|
|
1. **What attack does this prevent?** (threat it addresses)
|
|
2. **How can this control fail?** (failure modes)
|
|
3. **What happens if it fails?** (impact)
|
|
4. **What's the next layer of defense?** (backup control)
|
|
5. **Is failure logged/detected?** (observability)
|
|
|
|
### Example: API Token Validation
|
|
|
|
**Control**: Verify JWT signature before processing request
|
|
|
|
1. **What attack**: Prevents forged tokens, ensures authenticity
|
|
2. **How it can fail**:
|
|
- Public key unavailable (service down)
|
|
- Expired token not caught (clock skew)
|
|
- Token revocation list unavailable (Redis down)
|
|
- Signature algorithm downgrade attack (accept HS256 instead of RS256)
|
|
3. **What if it fails**:
|
|
- Public key unavailable → Fail-closed (deny all requests)
|
|
- Expired token → Layer 2: Check expiration explicitly
|
|
- Revocation list down → Layer 3: Apply strict rate limits as fallback
|
|
- Algorithm downgrade → Layer 4: Explicitly require RS256, reject others
|
|
4. **Next layer**:
|
|
- Authorization checks (even with valid token, check permissions)
|
|
- Rate limiting (limit damage from compromised token)
|
|
- Audit logging (detect unusual access patterns)
|
|
5. **Failure logged**: Yes → Log signature validation failures, alert on spike
|
|
|
|
**Outcome**: Designed 4 layers of defense against token attacks.
|
|
|
|
### Example: File Upload Validation
|
|
|
|
**Control**: Check file extension against allowlist
|
|
|
|
1. **What attack**: Prevents upload of executable files (.exe, .sh)
|
|
2. **How it can fail**:
|
|
- Attacker renames malware.exe → malware.jpg
|
|
- Double extension: malware.jpg.exe
|
|
- Case variation: malware.ExE
|
|
3. **What if it fails**: Malicious file stored, potentially executed
|
|
4. **Next layers**:
|
|
- Layer 2: Magic byte verification (check file content, not name)
|
|
- Layer 3: Antivirus scanning (detect known malware)
|
|
- Layer 4: File reprocessing (re-encode images, destroying embedded code)
|
|
- Layer 5: noexec mount (storage prevents execution)
|
|
- Layer 6: Separate domain for user content (CSP prevents XSS)
|
|
5. **Failure logged**: Yes → Log validation failures, rejected files
|
|
|
|
**Outcome**: Extension check is Layer 1 of 6. If bypassed, 5 more layers prevent exploitation.
|
|
|
|
## Quick Reference: Control Selection
|
|
|
|
**For every trust boundary, apply this checklist:**
|
|
|
|
| Layer | Control Type | Example |
|
|
|-------|--------------|---------|
|
|
| **1. Validation** | Input checking | Size limits, type validation, sanitization |
|
|
| **2. Authentication** | Identity verification | JWT validation, certificate checks, MFA |
|
|
| **3. Authorization** | Permission checks | RBAC, resource-level access, least privilege |
|
|
| **4. Rate Limiting** | Abuse prevention | Per-user limits, anomaly detection, quotas |
|
|
| **5. Audit Logging** | Detective | Security events, tamper-proof logs, alerting |
|
|
| **6. Encryption** | Confidentiality | TLS in transit, encryption at rest, key management |
|
|
|
|
**For each control:**
|
|
- Define fail-secure behavior (what happens if it fails?)
|
|
- Apply least privilege (minimum necessary access)
|
|
- Verify separation of duties (no single point of complete control)
|
|
- Test "what if this fails?" (ensure backup layers exist)
|
|
|
|
## Common Mistakes
|
|
|
|
### ❌ Designing Controls Before Identifying Boundaries
|
|
|
|
**Wrong**: "I need authentication and authorization and rate limiting"
|
|
|
|
**Right**: "Where are my trust boundaries? → Internet→API, API→Database → At each: apply layered controls"
|
|
|
|
**Why**: Controls are meaningless without knowing WHERE to apply them.
|
|
|
|
|
|
### ❌ Single Layer of Defense
|
|
|
|
**Wrong**: "Authentication is enough security"
|
|
|
|
**Right**: "Authentication + Authorization + Rate Limiting + Audit Logging"
|
|
|
|
**Why**: If authentication is bypassed (bug, misconfiguration), other layers provide defense.
|
|
|
|
|
|
### ❌ Fail-Open Defaults
|
|
|
|
**Wrong**:
|
|
```python
|
|
try:
|
|
user = auth_service.validate(token)
|
|
except ServiceUnavailable:
|
|
user = AnonymousUser() # Let them through
|
|
```
|
|
|
|
**Right**:
|
|
```python
|
|
try:
|
|
user = auth_service.validate(token)
|
|
except ServiceUnavailable:
|
|
raise Unauthorized("Auth service unavailable")
|
|
```
|
|
|
|
**Why**: Control failure should result in secure state (deny), not insecure state (allow).
|
|
|
|
|
|
### ❌ Excessive Privileges
|
|
|
|
**Wrong**: Grant web application full database access (SELECT, INSERT, UPDATE, DELETE on all tables)
|
|
|
|
**Right**: Grant only needed operations per table (SELECT on customers, INSERT-only on audit_logs)
|
|
|
|
**Why**: Minimizes damage from compromised application (SQL injection, stolen credentials).
|
|
|
|
|
|
### ❌ Single Point of Control
|
|
|
|
**Wrong**: One admin account can initiate, approve, and execute critical operations
|
|
|
|
**Right**: Separate accounts for initiate vs approve, require multi-signature
|
|
|
|
**Why**: Prevents single compromised account from complete system control.
|
|
|
|
|
|
### ❌ No Verification of "What If This Fails?"
|
|
|
|
**Wrong**: Design controls, assume they work
|
|
|
|
**Right**: For each control, ask "how can this fail?" and design backup layers
|
|
|
|
**Why**: Controls fail due to bugs, misconfigurations, attacks. Backup layers provide resilience.
|
|
|
|
## Cross-References
|
|
|
|
**Use BEFORE this skill**:
|
|
- `ordis/security-architect/threat-modeling` - Identify threats first, then design controls to address them
|
|
|
|
**Use WITH this skill**:
|
|
- `muna/technical-writer/documentation-structure` - Document control architecture as ADR
|
|
|
|
**Use AFTER this skill**:
|
|
- `ordis/security-architect/security-architecture-review` - Review controls for completeness
|
|
|
|
## Real-World Impact
|
|
|
|
**Well-designed controls using this methodology:**
|
|
- Multi-layered API authentication catching token forgery even when signature validation was bypassed (algorithm confusion attack)
|
|
- Database access controls limiting SQL injection damage to read-only operations (least privilege prevented data deletion)
|
|
- File upload defenses stopping malware despite extension check bypass (magic bytes + antivirus + reprocessing layers)
|
|
|
|
**Key lesson**: **Systematic application of defense-in-depth at trust boundaries is more effective than ad-hoc control selection.**
|