# Performance Optimization for Simulations
**When to use this skill**: When simulations run below target frame rate (typically 60 FPS for PC, 30 FPS for mobile), especially with large agent counts (100+ units), complex AI, physics calculations, or proximity queries. Critical for RTS games, crowd simulations, ecosystem models, traffic systems, and any scenario requiring 1000+ active entities.
**What this skill provides**: Systematic methodology for performance optimization using profiling-driven decisions, spatial partitioning patterns, level-of-detail (LOD) systems, time-slicing, caching strategies, data-oriented design, and selective multithreading. Focuses on achieving 60 FPS at scale while maintaining gameplay quality.
## Core Concepts
### The Optimization Hierarchy (Critical Order)
**ALWAYS optimize in this order** - each level provides 10-100× improvement:
1. **PROFILE FIRST** (0.5-1 hour investment)
   - Identify the actual bottleneck with a profiler
   - Measure baseline performance
   - Set target frame time budgets
   - **Never guess** - 80% of time is usually in 20% of code
2. **Algorithmic Optimizations** (10-100× improvement)
   - Fix O(n²) → O(n) or O(n log n)
   - Spatial partitioning for proximity queries
   - Replace brute force with smart algorithms
   - **Biggest wins**, do these FIRST
3. **Level of Detail (LOD)** (2-10× improvement)
   - Reduce computation for distant/unimportant entities
   - Smooth transitions (no popping)
   - Priority-based update frequencies
   - Behavior LOD + visual LOD
4. **Time-Slicing** (2-5× improvement)
   - Spread work across multiple frames
   - Frame time budgets per system
   - Priority queues for important work
   - Amortize expensive operations
5. **Caching** (2-10× improvement)
   - Avoid redundant calculations
   - LRU eviction + TTL
   - Proper invalidation
   - Bounded memory usage
6. **Data-Oriented Design** (1.5-3× improvement)
   - Cache-friendly memory layouts
   - Struct of Arrays (SoA) vs Array of Structs (AoS)
   - Minimize pointer chasing
   - Batch operations on contiguous data
7. **Multithreading** (1.5-4× improvement)
   - ONLY if still needed after the above
   - Job systems for data parallelism
   - Avoid locks and race conditions
   - Complexity cost is high
**Example**: RTS with 1000 units at 10 FPS → 60 FPS
- Profile: Vision checks are 80% of frame time
- Spatial partitioning: O(n²) → O(n) = 50× faster → 40 FPS
- LOD: Distant units update less = 1.5× faster → 60 FPS
- Done in 30 minutes vs 2 hours of trial-and-error
### Profiling Methodology
**Three-step profiling process**:
1. **Capture Baseline** (before optimization)
   - Total frame time
   - Time per major system (AI, physics, rendering, pathfinding)
   - CPU-bound vs GPU-bound
   - Memory allocations per frame
   - Cache misses (if the profiler supports it)
2. **Identify Bottleneck** (80/20 rule)
   - Sort functions by time spent
   - Focus on the top 3-5 functions (usually 80% of time)
   - Understand WHY they're slow (algorithm, data layout, cache misses)
3. **Validate Improvement** (after each optimization)
   - Measure the same metrics
   - Calculate the speedup ratio
   - Check for regressions (new bottlenecks)
   - Iterate until the target is met
**Profiling Tools**:
- **Python**: cProfile, line_profiler, memory_profiler, py-spy
- **C++**: VTune, perf, Instruments (Mac), Very Sleepy
- **Unity**: Unity Profiler, Deep Profile mode
- **Unreal**: Unreal Insights, stat commands
- **Browser**: Chrome DevTools Performance tab
**Example Profiling Output**:
```
Total frame time: 100ms (10 FPS)

Function                  Time    % of Frame
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
update_vision_checks()    80ms    80%   ← BOTTLENECK
update_ai()               10ms    10%
update_pathfinding()       5ms     5%
update_physics()           3ms     3%
render()                   2ms     2%

Diagnosis: O(n²) vision checks (1000 units × 1000 = 1M checks/frame)
Solution: Spatial partitioning → O(n) checks
```
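The capture step can be scripted with Python's built-in `cProfile`. A minimal sketch, where `update_vision_checks` and `game_frame` are stand-ins for your real per-frame systems:

```python
import cProfile
import io
import pstats

def update_vision_checks():
    # Stand-in for a real system; replace with your own update functions
    total = 0
    for i in range(200_000):
        total += i * i
    return total

def game_frame():
    update_vision_checks()

# Capture a baseline: run a batch of frames under the profiler
profiler = cProfile.Profile()
profiler.enable()
for _ in range(10):
    game_frame()
profiler.disable()

# Sort by cumulative time to surface the top bottlenecks
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats('cumulative').print_stats(5)
print(stream.getvalue())
```

The same profile can be dumped to a file with `stats.dump_stats(...)` and inspected later, or captured without code changes via `python -m cProfile -o out.prof game.py`.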
### Spatial Partitioning
**Problem**: Proximity queries are O(n²) when checking every entity against every other
- 100 entities = 10,000 checks
- 1,000 entities = 1,000,000 checks (death)
- 10,000 entities = 100,000,000 checks (impossible)
**Solution**: Divide space into regions, only check entities in nearby regions
**Spatial Hash Grid** (simplest, fastest for uniform distribution)
- Divide world into fixed-size cells (e.g., 50×50 units)
- Hash entity position to cell(s)
- Query: Check only entities in neighboring cells
- Complexity: O(n) to build, O(1) average query
- Best for: Mostly uniform entity distribution
**Quadtree** (adaptive, good for clustered entities)
- Recursively subdivide space into 4 quadrants
- Split when cell exceeds threshold (e.g., 10 entities)
- Query: Descend tree, check overlapping nodes
- Complexity: O(n log n) to build, O(log n) average query
- Best for: Entities clustered in areas
**Octree** (3D version of quadtree)
- Recursively subdivide 3D space into 8 octants
- Same benefits as quadtree for 3D worlds
- Best for: 3D flight sims, space games, underwater
**Decision Framework**:
```
Spatial Partitioning Choice:
├─ 2D WORLD with UNIFORM DISTRIBUTION?
│ └─ Use Spatial Hash Grid (simplest, fastest)
├─ 2D WORLD with CLUSTERED ENTITIES?
│ └─ Use Quadtree (adapts to density)
├─ 3D WORLD?
│ └─ Use Octree (3D quadtree)
└─ VERY LARGE WORLD (multiple km²)?
└─ Use Hierarchical Grid (multiple grids at different scales)
```
**Performance Impact**:
- 1000 units: O(n²) = 1,000,000 checks → O(n) = 1,000 checks = **1000× faster**
- Typical speedup: 50-100× in practice (accounting for grid overhead)
### Level of Detail (LOD)
**Concept**: Reduce computation for entities that don't need full precision
**Distance-Based LOD Levels**:
- **LOD 0** (0-50 units from camera): Full detail
  - Full AI decision-making (10 Hz)
  - Precise pathfinding
  - Detailed animations
  - All visual effects
- **LOD 1** (50-100 units): Reduced detail
  - Simplified AI (5 Hz)
  - Coarse pathfinding (waypoints only)
  - Simplified animations
  - Reduced effects
- **LOD 2** (100-200 units): Minimal detail
  - Basic AI (1 Hz)
  - Straight-line movement
  - Static pose or simple animation
  - No effects
- **LOD 3** (200+ units): Culled or dormant
  - State update only (0.2 Hz)
  - No pathfinding
  - Billboards or invisible
  - No physics
**Importance-Based LOD** (better than distance alone):
```python
def calculate_lod_level(entity, camera, player):
    # Multiple factors determine importance
    distance = entity.distance_to(camera)
    is_player_unit = entity.team == player.team
    is_in_combat = entity.in_combat
    is_selected = entity in player.selection

    # Important entities always get high LOD
    if is_selected:
        return 0  # Always full detail
    if is_player_unit and is_in_combat:
        return 0  # Player's units in combat = critical

    # Distance-based for others
    if distance < 50:
        return 0
    elif distance < 100:
        return 1
    elif distance < 200:
        return 2
    else:
        return 3
```
**Smooth LOD Transitions** (avoid popping):
- **Hysteresis**: Different thresholds for upgrading vs downgrading
  - Upgrade LOD at 90 units
  - Downgrade LOD at 110 units
  - 20-unit buffer prevents thrashing
- **Time delay**: Wait N seconds before downgrading LOD
  - Prevents rapid flicker at the boundary
- **Blend animations**: Cross-fade between LOD levels
  - 0.5-1 second blend
**Behavior LOD Examples**:
| System | LOD 0 (Full) | LOD 1 (Reduced) | LOD 2 (Minimal) | LOD 3 (Dormant) |
|--------|--------------|-----------------|-----------------|-----------------|
| **AI** | Behavior tree 10 Hz | Simple FSM 5 Hz | Follow path 1 Hz | State only 0.2 Hz |
| **Pathfinding** | Full A* | Hierarchical | Straight line | None |
| **Vision** | 360° scan 10 Hz | Forward cone 5 Hz | None | None |
| **Physics** | Full collision | Bounding box | None | None |
| **Animation** | Full skeleton | 5 bones | Static pose | None |
| **Audio** | 3D positioned | 2D ambient | None | None |
**Performance Impact**:
- 1000 units: 100% at LOD 0 vs 20% at LOD 0 + 80% at LOD 1-3 = **3-5× faster**
### Time-Slicing
**Concept**: Spread expensive operations across multiple frames to stay within frame budget
**Frame Time Budget** (60 FPS = 16.67ms per frame):
```
Frame Budget (16.67ms total):
├─ Rendering: 6ms (36%)
├─ AI: 4ms (24%)
├─ Physics: 3ms (18%)
├─ Pathfinding: 2ms (12%)
└─ Other: 1.67ms (10%)
```
**Time-Slicing Pattern 1: Fixed Budget Per Frame**
```python
import heapq
import itertools
import time

class TimeSlicedSystem:
    def __init__(self, budget_ms=2.0):
        self.budget_s = budget_ms / 1000.0  # perf_counter measures seconds
        self.pending_work = []
        self._counter = itertools.count()   # tie-breaker so work items never compare

    def add_work(self, work_item, priority=0):
        # Priority queue: higher priority = processed first
        heapq.heappush(self.pending_work, (-priority, next(self._counter), work_item))

    def update(self, dt):
        start_time = time.perf_counter()
        processed = 0
        while self.pending_work and (time.perf_counter() - start_time) < self.budget_s:
            _, _, work_item = heapq.heappop(self.pending_work)
            work_item.execute()
            processed += 1
        return processed

# Usage: Pathfinding
pathfinding_system = TimeSlicedSystem(budget_ms=2.0)
for unit in units_needing_paths:
    priority = calculate_priority(unit)  # Player units = high priority
    pathfinding_system.add_work(PathfindRequest(unit), priority)

# Each frame: process as many as fit in the 2ms budget
paths_found = pathfinding_system.update(dt)
```
**Time-Slicing Pattern 2: Amortized Updates**
```python
class AmortizedUpdateManager:
    def __init__(self, entities, updates_per_frame=200):
        self.entities = entities
        self.updates_per_frame = updates_per_frame
        self.current_index = 0

    def update(self, dt):
        # Update N entities per frame, round-robin
        for _ in range(self.updates_per_frame):
            entity = self.entities[self.current_index]
            entity.expensive_update(dt)
            self.current_index = (self.current_index + 1) % len(self.entities)

# All entities updated every N frames:
# 1000 entities / 200 per frame = every 5 frames = 12 Hz at 60 FPS

# Priority-based amortization
def update_with_priority(entities, frame_count):
    for entity in entities:
        # Distance-based update frequency
        distance = entity.distance_to_camera()
        if distance < 50:
            entity.update()  # Every frame (60 Hz)
        elif distance < 100 and frame_count % 2 == 0:
            entity.update()  # Every 2 frames (30 Hz)
        elif distance < 200 and frame_count % 5 == 0:
            entity.update()  # Every 5 frames (12 Hz)
        elif frame_count % 30 == 0:
            entity.update()  # Every 30 frames (2 Hz)
```
**Time-Slicing Pattern 3: Incremental Processing**
```python
import heapq

class IncrementalPathfinder:
    """Find a path over multiple frames instead of blocking."""
    def __init__(self, max_nodes_per_frame=100):
        self.max_nodes = max_nodes_per_frame
        self.open_set = []
        self.closed_set = set()
        self.current_request = None

    def start_pathfind(self, start, goal):
        self.current_request = PathRequest(start, goal)
        heapq.heappush(self.open_set, (0, start))
        return self.current_request

    def step(self):
        """Process up to max_nodes this frame; return True if done."""
        if not self.current_request:
            return True
        nodes_processed = 0
        while self.open_set and nodes_processed < self.max_nodes:
            cost, current = heapq.heappop(self.open_set)
            if current == self.current_request.goal:
                self.current_request.path = reconstruct_path(current)
                self.current_request.complete = True
                return True
            # Expand neighbors, push them onto open_set...
            nodes_processed += 1
        return False  # Not done yet, continue next frame

# Usage
pathfinder = IncrementalPathfinder(max_nodes_per_frame=100)
request = pathfinder.start_pathfind(unit.pos, target.pos)

# Each frame (inside the game loop - NOT a blocking while loop):
if not request.complete:
    pathfinder.step()  # Process up to 100 nodes, spread over frames
```
**Performance Impact**:
- 1000 expensive updates: 1000/frame → 200/frame = **5× faster**
- Pathfinding: Blocking 50ms → 2ms budget = stays at 60 FPS
### Caching Strategies
**When to Cache**:
- Expensive calculations used repeatedly (pathfinding, line-of-sight)
- Results that change infrequently (static paths, terrain visibility)
- Deterministic results (same input = same output)
**Cache Design Pattern**:
```python
import time

class PerformanceCache:
    def __init__(self, max_size=10000, ttl_seconds=60.0):
        self.cache = {}         # key -> value
        self.max_size = max_size
        self.ttl = ttl_seconds
        self.access_times = {}  # LRU tracking
        self.insert_times = {}  # TTL tracking

    def get(self, key):
        current_time = time.time()
        if key not in self.cache:
            return None
        # Check TTL (time-to-live)
        if current_time - self.insert_times[key] > self.ttl:
            self.invalidate(key)
            return None
        # Update LRU
        self.access_times[key] = current_time
        return self.cache[key]

    def put(self, key, value):
        current_time = time.time()
        # Evict if full (LRU eviction)
        if len(self.cache) >= self.max_size:
            # Find the least recently used entry
            lru_key = min(self.access_times, key=self.access_times.get)
            self.invalidate(lru_key)
        self.cache[key] = value
        self.access_times[key] = current_time
        self.insert_times[key] = current_time

    def invalidate(self, key):
        """Explicit invalidation when data changes"""
        if key in self.cache:
            del self.cache[key]
            del self.access_times[key]
            del self.insert_times[key]

    def invalidate_region(self, x, y, radius):
        """Invalidate all cache entries in region (e.g., terrain changed)"""
        keys_to_remove = [key for key in self.cache
                          if self._key_in_region(key, x, y, radius)]
        for key in keys_to_remove:
            self.invalidate(key)

    def _key_in_region(self, key, x, y, radius):
        # Assumes keys are (start_x, start_y, goal_x, goal_y) tuples,
        # matching the path-cache usage below
        sx, sy, gx, gy = key
        r_sq = radius * radius
        return ((sx - x)**2 + (sy - y)**2 <= r_sq or
                (gx - x)**2 + (gy - y)**2 <= r_sq)

# Usage: Path caching
path_cache = PerformanceCache(max_size=5000, ttl_seconds=30.0)

def get_or_calculate_path(start, goal):
    # Quantize to grid for cache key (allow slight position variance)
    key = (round(start.x), round(start.y), round(goal.x), round(goal.y))
    cached = path_cache.get(key)
    if cached:
        return cached  # Cache hit!
    # Cache miss - calculate
    path = expensive_pathfinding(start, goal)
    path_cache.put(key, path)
    return path

# Invalidate when terrain changes
def on_building_placed(x, y):
    path_cache.invalidate_region(x, y, radius=100)
```
**Cache Invalidation Strategies**:
1. **Time-To-Live (TTL)**: Expire after N seconds
   - Good for: dynamic environments (traffic, weather)
   - Example: path cache with a 30-second TTL
2. **Event-Based**: Invalidate on specific events
   - Good for: known change triggers (building placed, obstacle moved)
   - Example: invalidate paths when a wall is built
3. **Hybrid**: TTL + event-based
   - Good for: most scenarios
   - Example: 60-second TTL OR invalidate on terrain change
**Performance Impact**:
- Pathfinding with 60% cache hit rate: 40% of requests calculate = **2.5× faster**
- Line-of-sight with 80% cache hit rate: 20% of requests calculate = **5× faster**
### Data-Oriented Design (DOD)
**Concept**: Organize data for cache-friendly access patterns
**Array of Structs (AoS)** - Traditional OOP approach:
```python
class Unit:
    def __init__(self):
        self.x = 0.0
        self.y = 0.0
        self.health = 100
        self.damage = 10
        # ... 20 more fields ...

units = [Unit() for _ in range(1000)]

# Update positions (cache-unfriendly)
for unit in units:
    unit.x += unit.velocity_x * dt  # Loads the entire Unit object per unit
    unit.y += unit.velocity_y * dt  # Only 2 fields used, wasting cache
```
**Struct of Arrays (SoA)** - DOD approach:
```python
class UnitSystem:
    def __init__(self, count):
        # Separate arrays for each component
        self.positions_x = [0.0] * count
        self.positions_y = [0.0] * count
        self.velocities_x = [0.0] * count
        self.velocities_y = [0.0] * count
        self.health = [100] * count
        self.damage = [10] * count
        # ... more arrays ...

units = UnitSystem(1000)

# Update positions (cache-friendly)
for i in range(len(units.positions_x)):
    units.positions_x[i] += units.velocities_x[i] * dt  # Sequential memory access
    units.positions_y[i] += units.velocities_y[i] * dt  # Ideal for the CPU cache
```
**Why SoA is Faster**:
- CPU cache lines are 64 bytes
- AoS: Load 1-2 units per cache line (if Unit is 32-64 bytes)
- SoA: Load 8-16 floats per cache line (4 bytes each)
- **4-8× better cache utilization** = 1.5-3× faster in practice
**When to Use SoA**:
- Batch operations on many entities (position updates, damage calculations)
- Systems that only need 1-2 fields from entity
- Performance-critical inner loops
**When AoS is Okay**:
- Small entity counts (< 100)
- Operations needing many fields
- Prototyping (DOD is optimization, not default)
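One Python-specific caveat: a plain `list` of floats stores pointers to boxed objects, so the SoA example above conveys the layout but not the contiguous memory itself. A sketch using the standard-library `array` module for unboxed, contiguous storage (`UnitSystemSoA` and `integrate` are illustrative names; in practice NumPy arrays with vectorized operations are the usual choice):

```python
from array import array

class UnitSystemSoA:
    """SoA with contiguous, unboxed double storage via stdlib `array`."""
    def __init__(self, count):
        self.positions_x = array('d', [0.0] * count)
        self.positions_y = array('d', [0.0] * count)
        self.velocities_x = array('d', [1.0] * count)
        self.velocities_y = array('d', [2.0] * count)

    def integrate(self, dt):
        # Tight loop over contiguous doubles; with NumPy this whole loop
        # collapses to a single vectorized add: positions_x += velocities_x * dt
        px, py = self.positions_x, self.positions_y
        vx, vy = self.velocities_x, self.velocities_y
        for i in range(len(px)):
            px[i] += vx[i] * dt
            py[i] += vy[i] * dt

units = UnitSystemSoA(1000)
units.integrate(0.016)
```

The layout also serializes cheaply (`array.tobytes()`), which helps when handing batches to native extensions or worker processes.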
**ECS Architecture** (combines SoA + component composition):
```python
from dataclasses import dataclass

# Components (pure data)
@dataclass
class Position:
    x: float
    y: float

@dataclass
class Velocity:
    x: float
    y: float

@dataclass
class Health:
    current: int
    max: int

# Systems (pure logic)
class MovementSystem:
    def update(self, positions, velocities, dt):
        # Batch process all entities with Position + Velocity
        for i in range(len(positions)):
            positions[i].x += velocities[i].x * dt
            positions[i].y += velocities[i].y * dt

class CombatSystem:
    def update(self, positions, health, attacks):
        # Only process entities with Position + Health + Attack
        ...

# Entity is just an ID
@dataclass
class Entity:
    id: int

entities = [Entity(id=i) for i in range(1000)]
```
**Performance Impact**:
- Cache-friendly data layout: 1.5-3× faster for batch operations
- ECS architecture: Enables efficient multithreading (no shared mutable state)
### Multithreading (Use Sparingly)
**When to Multithread**:
- ✅ After all other optimizations (if still needed)
- ✅ Embarrassingly parallel work (no dependencies)
- ✅ Long-running tasks (benefit outweighs overhead)
- ✅ Native code (C++, Rust) - avoids GIL
**When NOT to Multithread**:
- ❌ Python CPU-bound code (GIL limits to 1 core)
- ❌ Before trying simpler optimizations
- ❌ Lots of shared mutable state (locking overhead)
- ❌ Small tasks (thread overhead > savings)
**Job System Pattern** (best practice):
```python
from concurrent.futures import ThreadPoolExecutor

class JobSystem:
    def __init__(self, num_workers=4):
        self.executor = ThreadPoolExecutor(max_workers=num_workers)

    def submit_batch(self, jobs):
        """Submit a list of independent jobs, return futures"""
        return [self.executor.submit(job.execute) for job in jobs]

    def wait_all(self, futures):
        """Wait for all jobs to complete"""
        return [future.result() for future in futures]

# Good: Parallel pathfinding (independent tasks)
job_system = JobSystem(num_workers=4)
path_jobs = [PathfindJob(unit.pos, unit.target) for unit in units_needing_paths]
futures = job_system.submit_batch(path_jobs)

# Do other work while pathfinding runs...

# Collect results
paths = job_system.wait_all(futures)
```
**Data Parallelism Pattern** (no shared mutable state):
```python
from concurrent.futures import ThreadPoolExecutor

def update_positions_parallel(positions, velocities, dt, num_workers=4):
    """Update positions in parallel batches"""
    def update_batch(start_idx, end_idx):
        # Each worker gets an exclusive slice (no locks needed)
        for i in range(start_idx, end_idx):
            positions[i].x += velocities[i].x * dt
            positions[i].y += velocities[i].y * dt

    # Split work into batches
    batch_size = len(positions) // num_workers
    with ThreadPoolExecutor(max_workers=num_workers) as executor:
        futures = []
        for worker_id in range(num_workers):
            start = worker_id * batch_size
            end = start + batch_size if worker_id < num_workers - 1 else len(positions)
            futures.append(executor.submit(update_batch, start, end))
        # Wait for all workers
        for future in futures:
            future.result()
```
**Common Multithreading Pitfalls**:
1. **Race Conditions** (shared mutable state)
```python
# BAD: Multiple threads modifying the same list
for unit in units:
    threading.Thread(target=unit.update, args=(all_units,)).start()
    # Each thread reads/writes all_units = data race!

# GOOD: Read-only shared data
for unit in units:
    # units is read-only for all threads;
    # each unit only modifies itself (exclusive ownership)
    threading.Thread(target=unit.update, args=(units,)).start()
```
2. **False Sharing** (cache line contention)
```python
# BAD: Adjacent array elements share a cache line
shared_counters = [0] * 8  # 8 threads updating 8 counters
# Thread 0 updates shared_counters[0], Thread 1 updates shared_counters[1].
# Both live on the same 64-byte cache line = cache thrashing!
# (Illustrative in Python; false sharing matters most in native code.)

# GOOD: Pad each counter onto its own cache line
class PaddedCounter:
    def __init__(self):
        self.value = 0
        self.padding = [0] * 15  # Push neighbors onto other cache lines

shared_counters = [PaddedCounter() for _ in range(8)]
```
3. **Excessive Locking** (defeats parallelism)
```python
# BAD: Single lock for everything
lock = threading.Lock()

def update_unit(unit):
    with lock:  # Only 1 thread can work at a time!
        unit.update()

# GOOD: Lock-free or fine-grained locking
def update_unit(unit):
    unit.update()  # Each unit is independent, no lock needed
```
**Performance Impact**:
- 4 cores: Ideal speedup = 4×, realistic = 2-3× (overhead, Amdahl's law)
- Python: Minimal (GIL), use multiprocessing or native extensions
- C++/Rust: Good (2-3× on 4 cores for parallelizable work)
## Decision Frameworks
### Framework 1: Systematic Optimization Process
**Use this process EVERY time performance is inadequate**:
```
Step 1: PROFILE (mandatory, do first)
├─ Capture baseline metrics
├─ Identify top 3-5 bottlenecks (80% of time)
└─ Understand WHY slow (algorithm, data, cache)
Step 2: ALGORITHMIC (10-100× gains)
├─ Is bottleneck O(n²) or worse?
│ ├─ Proximity queries? → Spatial partitioning
│ ├─ Pathfinding? → Hierarchical, flow fields, or caching
│ └─ Sorting? → Better algorithm or less frequent
├─ Is bottleneck doing redundant work?
│ └─ Add caching with LRU + TTL
└─ Measure improvement, re-profile
Step 3: LOD (2-10× gains)
├─ Can distant entities use less detail?
│ ├─ Distance-based LOD levels (4 levels)
│ ├─ Importance weighting (player units > NPC)
│ └─ Smooth transitions (hysteresis, blending)
└─ Measure improvement, re-profile
Step 4: TIME-SLICING (2-5× gains)
├─ Can work spread across multiple frames?
│ ├─ Set frame budget per system (2-4ms typical)
│ ├─ Priority queue (important work first)
│ └─ Amortized updates (N entities per frame)
└─ Measure improvement, re-profile
Step 5: DATA-ORIENTED DESIGN (1.5-3× gains)
├─ Is bottleneck cache-unfriendly?
│ ├─ Convert AoS → SoA for batch operations
│ ├─ Group hot data together
│ └─ Minimize pointer chasing
└─ Measure improvement, re-profile
Step 6: MULTITHREADING (1.5-4× gains, high complexity)
├─ Still below target after above?
│ ├─ Identify embarrassingly parallel work
│ ├─ Job system for independent tasks
│ ├─ Data parallelism (no shared mutable state)
│ └─ Avoid locks (lock-free or per-entity ownership)
└─ Measure improvement, re-profile
Step 7: VALIDATE
├─ Met target frame rate? → Done!
├─ Still slow? → Return to Step 1, find new bottleneck
└─ Regression? → Revert and try different approach
```
**Example Application** (1000-unit RTS at 10 FPS):
1. Profile: Vision checks are 80% (80ms/100ms frame)
2. Algorithmic: Add spatial hash grid → 40 FPS (15ms vision checks)
3. LOD: Distant units update at 5 Hz → 55 FPS (11ms vision)
4. Time-slicing: 2ms pathfinding budget → 60 FPS ✅ **Done**
5. (Skip DOD and multithreading - already at target)
### Framework 2: Choosing Spatial Partitioning
```
START: What's my proximity query scenario?
├─ 2D WORLD with UNIFORM ENTITY DISTRIBUTION?
│ └─ Use SPATIAL HASH GRID
│ - Cell size = 2× query radius (e.g., vision range 50 → cells 100×100)
│ - O(n) build, O(1) query
│ - Simplest to implement
│ - Example: RTS units on open battlefield
├─ 2D WORLD with CLUSTERED ENTITIES?
│ └─ Use QUADTREE
│ - Split threshold = 10-20 entities per node
│ - Max depth = 8-10 levels
│ - O(n log n) build, O(log n) query
│ - Example: City simulation (dense downtown, sparse suburbs)
├─ 3D WORLD?
│ └─ Use OCTREE
│ - Same as quadtree, but 8 children per node
│ - Example: Space game, underwater sim
├─ VERY LARGE WORLD (> 10 km²)?
│ └─ Use HIERARCHICAL GRID
│ - Coarse grid (1km cells) + fine grid (50m cells) per coarse cell
│ - Example: MMO world, open-world game
└─ ENTITIES MOSTLY STATIONARY?
└─ Use STATIC QUADTREE/OCTREE
- Build once, query many times
- Example: Building placement, static obstacles
```
**Implementation Complexity**:
- Spatial Hash Grid: **1-2 hours** (simple)
- Quadtree: **3-5 hours** (moderate)
- Octree: **4-6 hours** (moderate)
- Hierarchical Grid: **6-10 hours** (complex)
**Performance Characteristics**:
| Method | Build Time | Query Time | Memory | Best For |
|--------|------------|------------|--------|----------|
| Hash Grid | O(n) | O(1) avg | Low | Uniform distribution |
| Quadtree | O(n log n) | O(log n) avg | Medium | Clustered entities |
| Octree | O(n log n) | O(log n) avg | Medium | 3D worlds |
| Hierarchical | O(n) | O(1) avg | Higher | Massive worlds |
### Framework 3: LOD Level Assignment
```
For each entity, assign LOD level based on:
├─ IMPORTANCE (highest priority)
│ ├─ Player-controlled? → LOD 0 (always full detail)
│ ├─ Player's team AND in combat? → LOD 0
│ ├─ Selected units? → LOD 0
│ ├─ Quest-critical NPCs? → LOD 0
│ └─ Otherwise, use distance-based...
├─ DISTANCE FROM CAMERA (secondary)
│ ├─ 0-50 units → LOD 0 (full detail)
│ │ - Update: 60 Hz (every frame)
│ │ - AI: Full behavior tree
│ │ - Pathfinding: Precise A*
│ │ - Animation: Full skeleton
│ │
│ ├─ 50-100 units → LOD 1 (reduced)
│ │ - Update: 30 Hz (every 2 frames)
│ │ - AI: Simplified FSM
│ │ - Pathfinding: Hierarchical
│ │ - Animation: 10 bones
│ │
│ ├─ 100-200 units → LOD 2 (minimal)
│ │ - Update: 12 Hz (every 5 frames)
│ │ - AI: Basic scripted
│ │ - Pathfinding: Waypoints
│ │ - Animation: Static pose
│ │
│ └─ 200+ units → LOD 3 (culled)
│ - Update: 2 Hz (every 30 frames)
│ - AI: State only (no decisions)
│ - Pathfinding: None
│ - Animation: None (invisible or billboard)
└─ SCREEN SIZE (tertiary)
├─ Occluded or < 5 pixels? → LOD 3 (culled)
└─ Small on screen? → Bump LOD down 1 level
```
**Hysteresis to Prevent LOD Thrashing**:
```python
# Without hysteresis (bad - flickers)
lod = 0 if distance < 100 else 1
# Entity at 99-101 units: LOD flip-flops every frame!

# With hysteresis (good - stable)
if distance < 90:
    lod = 0      # Upgrade at 90
elif distance > 110:
    lod = 1      # Downgrade at 110
# else: keep the current LOD
# The 20-unit buffer prevents thrashing
```
### Framework 4: When to Use Multithreading
```
Should I multithread this system?
├─ ALREADY optimized algorithmic/LOD/caching?
│ └─ NO → Do those FIRST (10-100× gains vs 2-4× for threading)
├─ WORK IS EMBARRASSINGLY PARALLEL?
│ ├─ Independent tasks (pathfinding requests)? → YES, good candidate
│ ├─ Lots of shared mutable state? → NO, locking kills performance
│ └─ Need results immediately? → NO, adds latency
├─ TASK DURATION > 1ms?
│ ├─ YES → Threading overhead is small % of work
│ └─ NO → Overhead dominates, not worth it
├─ PYTHON or NATIVE CODE?
│ ├─ Python → Use multiprocessing (avoid GIL) or native extensions
│ └─ C++/Rust → ThreadPool or job system works well
├─ COMPLEXITY COST JUSTIFIED?
│ ├─ Can maintain code with debugging difficulty? → Consider it
│ └─ Team inexperienced with threading? → Avoid (bugs are costly)
└─ EXPECTED SPEEDUP > 1.5×?
├─ 4 cores: Realistic = 2-3× (not 4× due to overhead)
├─ Worth complexity? → Your call
└─ Not worth it? → Try other optimizations first
```
**Threading Decision Tree Example**:
```
Scenario: Pathfinding for 100 units
├─ Already using caching? YES (60% hit rate)
├─ Work is parallel? YES (each path independent)
├─ Task duration? 5ms per path (good for threading)
├─ Language? Python (GIL problem)
│ └─ Solution: Use multiprocessing or native pathfinding library
├─ Complexity justified? 100 paths × 5ms = 500ms → 60ms with 8 workers
│ └─ YES, worth it (8× speedup)
Decision: Use multiprocessing.Pool with 8 workers
```
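The decision above can be sketched with the standard-library `multiprocessing.Pool`, which side-steps the GIL by using worker processes. `find_path` here is a hypothetical stand-in for a real CPU-bound pathfinder; requests and results must be picklable:

```python
from multiprocessing import Pool

def find_path(request):
    """Stand-in for a CPU-heavy pathfind: (start, goal) -> list of waypoints.
    A real implementation would run A* over the nav grid here."""
    start, goal = request
    steps = 10
    # Placeholder work: interpolate a straight-line path
    return [(start[0] + (goal[0] - start[0]) * t / steps,
             start[1] + (goal[1] - start[1]) * t / steps)
            for t in range(steps + 1)]

if __name__ == '__main__':
    requests = [((0, 0), (x, x)) for x in range(1, 101)]  # 100 units
    # Worker processes run in parallel on separate cores
    with Pool(processes=8) as pool:
        paths = pool.map(find_path, requests)
```

Note that spawning processes and pickling requests has real overhead, so this only pays off when each path takes milliseconds of CPU time, as in the scenario above.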
### Framework 5: Frame Time Budget Allocation
**60 FPS = 16.67ms per frame, 30 FPS = 33.33ms per frame**
**Budget Template** (adjust based on game type):
```
60 FPS Frame Budget (16.67ms total):
├─ Rendering: 6.0ms (36%)
│ ├─ Culling: 1.0ms
│ ├─ Draw calls: 4.0ms
│ └─ Post-processing: 1.0ms
├─ AI: 3.5ms (21%)
│ ├─ Behavior trees: 2.0ms
│ ├─ Sensors/perception: 1.0ms
│ └─ Decision-making: 0.5ms
├─ Physics: 3.0ms (18%)
│ ├─ Broad-phase: 0.5ms
│ ├─ Narrow-phase: 1.5ms
│ └─ Constraint solving: 1.0ms
├─ Pathfinding: 2.0ms (12%)
│ ├─ New paths: 1.5ms
│ └─ Path following: 0.5ms
├─ Gameplay: 1.0ms (6%)
│ ├─ Economy updates: 0.3ms
│ ├─ Event processing: 0.4ms
│ └─ UI updates: 0.3ms
└─ Buffer: 1.17ms (7%)
└─ Unexpected spikes, GC, etc.
```
**Budget by Game Type**:
| Game Type | Rendering | AI | Physics | Pathfinding | Gameplay |
|-----------|-----------|-----|---------|-------------|----------|
| **RTS** | 30% | 30% | 10% | 20% | 10% |
| **FPS** | 50% | 15% | 20% | 5% | 10% |
| **City Builder** | 35% | 20% | 5% | 15% | 25% |
| **Physics Sim** | 30% | 5% | 50% | 5% | 10% |
| **Turn-Based** | 60% | 15% | 5% | 10% | 10% |
**Enforcement Pattern**:
```python
import time

class FrameBudgetMonitor:
    def __init__(self):
        self.budgets = {
            'rendering': 6.0,
            'ai': 3.5,
            'physics': 3.0,
            'pathfinding': 2.0,
            'gameplay': 1.0,
        }
        self.measurements = {key: [] for key in self.budgets}

    def measure(self, system_name, func):
        start = time.perf_counter()
        result = func()
        elapsed_ms = (time.perf_counter() - start) * 1000
        self.measurements[system_name].append(elapsed_ms)
        # Alert if over budget
        if elapsed_ms > self.budgets[system_name]:
            print(f"⚠️ {system_name} over budget: "
                  f"{elapsed_ms:.2f}ms / {self.budgets[system_name]:.2f}ms")
        return result

    def report(self):
        print("Frame Time Budget Report:")
        for system, budget in self.budgets.items():
            samples = self.measurements[system]
            if not samples:
                continue
            avg = sum(samples) / len(samples)
            pct = (avg / budget) * 100
            print(f"  {system}: {avg:.2f}ms / {budget:.2f}ms ({pct:.0f}%)")

# Usage
monitor = FrameBudgetMonitor()

def game_loop():
    monitor.measure('ai', lambda: update_ai(units))
    monitor.measure('physics', lambda: update_physics(world))
    monitor.measure('pathfinding', lambda: update_pathfinding(units))
    monitor.measure('rendering', lambda: render_scene(camera))
    if frame_count % 300 == 0:  # Every 5 seconds at 60 FPS
        monitor.report()
```
## Implementation Patterns
### Pattern 1: Spatial Hash Grid for Proximity Queries
**Problem**: Checking every unit against every other unit for vision/attack is O(n²)
- 1000 units = 1,000,000 checks per frame = death
**Solution**: Spatial hash grid divides world into cells, only check nearby cells
```python
import math
from collections import defaultdict
class SpatialHashGrid:
"""
Spatial partitioning using hash grid for O(1) average query time.
Best for: Uniform entity distribution, 2D worlds
Cell size rule: 2× maximum query radius
"""
def __init__(self, cell_size=100):
self.cell_size = cell_size
self.grid = defaultdict(list) # (cell_x, cell_y) -> [entities]
def _hash(self, x, y):
"""Convert world position to cell coordinates"""
cell_x = int(math.floor(x / self.cell_size))
cell_y = int(math.floor(y / self.cell_size))
return (cell_x, cell_y)
def clear(self):
"""Clear all entities (call at start of frame)"""
self.grid.clear()
def insert(self, entity):
"""Insert entity into grid"""
cell = self._hash(entity.x, entity.y)
self.grid[cell].append(entity)
def query_radius(self, x, y, radius):
"""
Find all entities within radius of (x, y).
Returns: List of entities in range
Complexity: O(k) where k = entities in nearby cells (typically 10-50)
"""
# Calculate which cells to check
min_cell_x = int(math.floor((x - radius) / self.cell_size))
max_cell_x = int(math.floor((x + radius) / self.cell_size))
min_cell_y = int(math.floor((y - radius) / self.cell_size))
max_cell_y = int(math.floor((y + radius) / self.cell_size))
candidates = []
# Check all cells in range
for cell_x in range(min_cell_x, max_cell_x + 1):
for cell_y in range(min_cell_y, max_cell_y + 1):
cell = (cell_x, cell_y)
candidates.extend(self.grid.get(cell, []))
# Filter by exact distance (candidates may be outside radius)
results = []
radius_sq = radius * radius
for entity in candidates:
dx = entity.x - x
dy = entity.y - y
dist_sq = dx * dx + dy * dy
if dist_sq <= radius_sq:
results.append(entity)
return results
def query_rect(self, min_x, min_y, max_x, max_y):
"""Find all entities in rectangular region"""
min_cell_x = int(math.floor(min_x / self.cell_size))
max_cell_x = int(math.floor(max_x / self.cell_size))
min_cell_y = int(math.floor(min_y / self.cell_size))
max_cell_y = int(math.floor(max_y / self.cell_size))
results = []
for cell_x in range(min_cell_x, max_cell_x + 1):
for cell_y in range(min_cell_y, max_cell_y + 1):
cell = (cell_x, cell_y)
results.extend(self.grid.get(cell, []))
return results
# Usage Example
class Unit:
    def __init__(self, x, y, team):
        self.x = x
        self.y = y
        self.team = team
        self.vision_range = 50
        self.attack_range = 20
        self.health = 100
        self.damage = 10

def game_loop():
    # random() and random_team() are illustrative stand-ins for spawn logic
    units = [Unit(random() * 1000, random() * 1000, random_team())
             for _ in range(1000)]
# Cell size = 2× max query radius (vision range)
spatial_grid = SpatialHashGrid(cell_size=100)
while running:
# Rebuild grid each frame (units move)
spatial_grid.clear()
for unit in units:
spatial_grid.insert(unit)
# Update units
for unit in units:
# OLD (O(n²)): Check all 1000 units = 1,000,000 checks
# enemies = [u for u in units if u.team != unit.team and distance(u, unit) < vision_range]
# NEW (O(k)): Check ~10-50 units in nearby cells
nearby = spatial_grid.query_radius(unit.x, unit.y, unit.vision_range)
enemies = [u for u in nearby if u.team != unit.team]
# Attack enemies in range
for enemy in enemies:
dist_sq = (unit.x - enemy.x)**2 + (unit.y - enemy.y)**2
if dist_sq <= unit.attack_range**2:
enemy.health -= unit.damage
# Performance: O(n²) → O(n)
# 1000 units: 1,000,000 checks → ~30,000 checks (nearby cells only)
# Speedup: ~30-50× for vision/attack queries
```
### Pattern 2: Quadtree for Clustered Entities
**When to use**: Entities cluster in specific areas (cities, battlefields) with sparse regions
```python
class Quadtree:
"""
Adaptive spatial partitioning for clustered entity distributions.
Best for: Non-uniform distribution, entities cluster in areas
Automatically subdivides dense regions
"""
class Node:
def __init__(self, x, y, width, height, max_entities=10, max_depth=8):
self.x = x
self.y = y
self.width = width
self.height = height
self.max_entities = max_entities
self.max_depth = max_depth
self.entities = []
self.children = None # [NW, NE, SW, SE] when subdivided
def is_leaf(self):
return self.children is None
def contains(self, entity):
"""Check if entity is within this node's bounds"""
return (self.x <= entity.x < self.x + self.width and
self.y <= entity.y < self.y + self.height)
def subdivide(self):
"""Split into 4 quadrants"""
hw = self.width / 2 # half width
hh = self.height / 2 # half height
# Create 4 children: NW, NE, SW, SE
self.children = [
Quadtree.Node(self.x, self.y, hw, hh,
self.max_entities, self.max_depth - 1), # NW
Quadtree.Node(self.x + hw, self.y, hw, hh,
self.max_entities, self.max_depth - 1), # NE
Quadtree.Node(self.x, self.y + hh, hw, hh,
self.max_entities, self.max_depth - 1), # SW
Quadtree.Node(self.x + hw, self.y + hh, hw, hh,
self.max_entities, self.max_depth - 1), # SE
]
# Move entities to children
for entity in self.entities:
for child in self.children:
if child.contains(entity):
child.insert(entity)
break
self.entities.clear()
def insert(self, entity):
"""Insert entity into quadtree"""
if not self.contains(entity):
return False
if self.is_leaf():
self.entities.append(entity)
# Subdivide if over capacity and can go deeper
if len(self.entities) > self.max_entities and self.max_depth > 0:
self.subdivide()
else:
# Insert into appropriate child
for child in self.children:
if child.insert(entity):
break
return True
def query_radius(self, x, y, radius, results):
"""Find entities within radius of (x, y)"""
# Check if search circle intersects this node
closest_x = max(self.x, min(x, self.x + self.width))
closest_y = max(self.y, min(y, self.y + self.height))
dx = x - closest_x
dy = y - closest_y
dist_sq = dx * dx + dy * dy
if dist_sq > radius * radius:
return # No intersection
if self.is_leaf():
# Check entities in this leaf
radius_sq = radius * radius
for entity in self.entities:
dx = entity.x - x
dy = entity.y - y
if dx * dx + dy * dy <= radius_sq:
results.append(entity)
else:
# Recurse into children
for child in self.children:
child.query_radius(x, y, radius, results)
def __init__(self, world_width, world_height, max_entities=10, max_depth=8):
self.root = Quadtree.Node(0, 0, world_width, world_height,
max_entities, max_depth)
def insert(self, entity):
self.root.insert(entity)
def query_radius(self, x, y, radius):
results = []
self.root.query_radius(x, y, radius, results)
return results
# Usage
quadtree = Quadtree(world_width=1000, world_height=1000,
max_entities=10, max_depth=8)
# Insert entities
for unit in units:
quadtree.insert(unit)
# Query
enemies_nearby = quadtree.query_radius(player.x, player.y, radius=50)  # vision range
# Performance: O(log n) average query
# Adapts to entity distribution automatically
```
### Pattern 3: Distance-Based LOD System
**Problem**: All entities update at full frequency, wasting CPU on distant entities
**Solution**: Update frequency based on distance from camera/player
```python
class LODSystem:
"""
Level-of-detail system with smooth transitions and importance weighting.
LOD 0: Full detail (near camera, important entities)
LOD 1: Reduced detail (medium distance)
LOD 2: Minimal detail (far distance)
LOD 3: Dormant (very far, culled)
"""
# LOD configuration
LOD_LEVELS = [
{
'name': 'LOD_0_FULL',
'distance_min': 0,
'distance_max': 50,
'update_hz': 60, # Every frame
'ai_enabled': True,
'pathfinding': 'full', # Precise A*
'animation': 'full', # Full skeleton
'physics': 'full' # Full collision
},
{
'name': 'LOD_1_REDUCED',
'distance_min': 50,
'distance_max': 100,
'update_hz': 30, # Every 2 frames
'ai_enabled': True,
'pathfinding': 'hierarchical',
'animation': 'reduced', # 10 bones
'physics': 'bbox' # Bounding box only
},
{
'name': 'LOD_2_MINIMAL',
'distance_min': 100,
'distance_max': 200,
'update_hz': 12, # Every 5 frames
'ai_enabled': False, # Scripted only
'pathfinding': 'waypoints',
'animation': 'static', # Static pose
'physics': 'none'
},
{
'name': 'LOD_3_CULLED',
'distance_min': 200,
'distance_max': float('inf'),
'update_hz': 2, # Every 30 frames
'ai_enabled': False,
'pathfinding': 'none',
'animation': 'none',
'physics': 'none'
}
]
def __init__(self, camera, player):
self.camera = camera
self.player = player
self.frame_count = 0
# Hysteresis to prevent LOD thrashing
self.hysteresis = 20 # Units of distance buffer
def calculate_lod(self, entity):
"""
Calculate LOD level for entity based on importance and distance.
Priority:
1. Importance (player-controlled, in combat, selected)
2. Distance from camera
3. Screen size
"""
# Important entities always get highest LOD
if self._is_important(entity):
return 0
# Distance-based LOD
distance = self._distance_to_camera(entity)
# Current LOD (for hysteresis)
current_lod = getattr(entity, 'lod_level', 0)
# Determine LOD level with hysteresis
for i, lod in enumerate(self.LOD_LEVELS):
if i < current_lod:
# Upgrading (closer): Use min distance
if distance <= lod['distance_max'] - self.hysteresis:
return i
else:
# Downgrading (farther): Use max distance
if distance <= lod['distance_max'] + self.hysteresis:
return i
return len(self.LOD_LEVELS) - 1
def _is_important(self, entity):
"""Check if entity is important (always highest LOD)"""
return (entity.player_controlled or
entity.selected or
(entity.team == self.player.team and entity.in_combat))
def _distance_to_camera(self, entity):
dx = entity.x - self.camera.x
dy = entity.y - self.camera.y
return math.sqrt(dx * dx + dy * dy)
def should_update(self, entity):
"""Check if entity should update this frame"""
lod_level = entity.lod_level
lod_config = self.LOD_LEVELS[lod_level]
update_hz = lod_config['update_hz']
if update_hz >= 60:
return True # Every frame
# Calculate frame interval
frame_interval = 60 // update_hz # 60 FPS baseline
# Offset by entity ID to spread updates across frames
return (self.frame_count + entity.id) % frame_interval == 0
def update(self, entities):
"""Update LOD levels and entities"""
self.frame_count += 1
# Update LOD levels (cheap, do every frame)
for entity in entities:
entity.lod_level = self.calculate_lod(entity)
# Update entities based on LOD (expensive, time-sliced)
for entity in entities:
if self.should_update(entity):
lod_config = self.LOD_LEVELS[entity.lod_level]
self._update_entity(entity, lod_config)
def _update_entity(self, entity, lod_config):
"""Update entity according to LOD configuration"""
if lod_config['ai_enabled']:
entity.update_ai()
if lod_config['pathfinding'] == 'full':
entity.update_pathfinding_full()
elif lod_config['pathfinding'] == 'hierarchical':
entity.update_pathfinding_hierarchical()
elif lod_config['pathfinding'] == 'waypoints':
entity.update_pathfinding_waypoints()
if lod_config['animation'] != 'none':
entity.update_animation(lod_config['animation'])
if lod_config['physics'] == 'full':
entity.update_physics_full()
elif lod_config['physics'] == 'bbox':
entity.update_physics_bbox()
# Usage
lod_system = LODSystem(camera, player)
def game_loop():
lod_system.update(units)
# Only entities that should_update() this frame were updated
# Performance: 1000 units all at LOD 0 → mixed LOD levels
# Typical distribution: 100 LOD0 + 300 LOD1 + 400 LOD2 + 200 LOD3
# Effective updates: 100 + 150 + 80 + 7 = 337 updates/frame
# Speedup: 1000 → 337 = 3× faster
```
### Pattern 4: Time-Sliced Pathfinding with Priority Queue
**Problem**: 100 path requests × 5ms each = 500ms frame time (2 FPS)
**Solution**: Process paths over multiple frames with priority (player units first)
```python
import heapq
import time
from enum import Enum
class PathPriority(Enum):
"""Priority levels for pathfinding requests"""
CRITICAL = 0 # Player-controlled, combat
HIGH = 1 # Player's units
NORMAL = 2 # Visible units
LOW = 3 # Off-screen units
class PathRequest:
    def __init__(self, entity, start, goal, priority):
        self.entity = entity
        self.start = start
        self.goal = goal
        self.priority = priority
        self.path = None
        self.complete = False
        self.timestamp = time.time()

    def __lt__(self, other):
        # heapq compares requests when two heap entries share a priority
        # value; earlier requests win ties
        return self.timestamp < other.timestamp
class TimeSlicedPathfinder:
"""
Pathfinding system with frame time budget and priority queue.
Features:
- 2ms frame budget (stays at 60 FPS)
- Priority queue (important requests first)
- Incremental pathfinding (spread work over frames)
- Request timeout (abandon old requests)
"""
def __init__(self, budget_ms=2.0, timeout_seconds=5.0):
self.budget = budget_ms / 1000.0 # Convert to seconds
self.timeout = timeout_seconds
self.pending = [] # Priority queue: (priority, request)
self.active_request = None
self.pathfinder = AStarPathfinder() # Your pathfinding implementation
# Statistics
self.stats = {
'requests_submitted': 0,
'requests_completed': 0,
'requests_timeout': 0,
'avg_time_to_completion': 0
}
def submit_request(self, entity, start, goal, priority=PathPriority.NORMAL):
"""Submit pathfinding request with priority"""
request = PathRequest(entity, start, goal, priority)
heapq.heappush(self.pending, (priority.value, request))
self.stats['requests_submitted'] += 1
return request
def update(self, dt):
"""
Process pathfinding requests within frame budget.
Returns: Number of paths completed this frame
"""
start_time = time.perf_counter()
completed = 0
while time.perf_counter() - start_time < self.budget:
# Get next request
if not self.active_request:
if not self.pending:
break # No more work
priority, request = heapq.heappop(self.pending)
# Check timeout
if time.time() - request.timestamp > self.timeout:
self.stats['requests_timeout'] += 1
continue
self.active_request = request
self.pathfinder.start(request.start, request.goal)
# Process active request incrementally
# (process up to 100 nodes this frame)
done = self.pathfinder.step(max_nodes=100)
if done:
# Request complete
self.active_request.path = self.pathfinder.get_path()
self.active_request.complete = True
self.active_request.entity.path = self.active_request.path
time_to_complete = time.time() - self.active_request.timestamp
self._update_avg_time(time_to_complete)
self.stats['requests_completed'] += 1
self.active_request = None
completed += 1
return completed
def _update_avg_time(self, time_to_complete):
"""Update moving average of completion time"""
alpha = 0.1 # Smoothing factor
current_avg = self.stats['avg_time_to_completion']
self.stats['avg_time_to_completion'] = (
alpha * time_to_complete + (1 - alpha) * current_avg
)
def get_stats(self):
"""Get performance statistics"""
pending_count = len(self.pending) + (1 if self.active_request else 0)
return {
**self.stats,
'pending_requests': pending_count,
'completion_rate': (
self.stats['requests_completed'] / max(1, self.stats['requests_submitted'])
)
}
# Usage
pathfinder = TimeSlicedPathfinder(budget_ms=2.0)
def game_loop():
# Submit pathfinding requests
for unit in units_needing_paths:
# Determine priority
if unit.player_controlled:
priority = PathPriority.CRITICAL
elif unit.team == player.team:
priority = PathPriority.HIGH
elif unit.visible:
priority = PathPriority.NORMAL
else:
priority = PathPriority.LOW
pathfinder.submit_request(unit, unit.pos, unit.target, priority)
# Process paths (stays within 2ms budget)
paths_completed = pathfinder.update(dt)
# Every 5 seconds, print stats
if frame_count % 300 == 0:
stats = pathfinder.get_stats()
print(f"Pathfinding: {stats['requests_completed']} complete, "
f"{stats['pending_requests']} pending, "
f"avg time: {stats['avg_time_to_completion']:.2f}s")
# Performance:
# Without time-slicing: 100 paths × 5ms = 500ms frame (2 FPS)
# With time-slicing: 2ms budget per frame = 60 FPS maintained
# Paths complete over multiple frames, but high-priority paths finish first
```
### Pattern 5: LRU Cache with TTL for Pathfinding
**Problem**: Recalculating same paths repeatedly wastes CPU
**Solution**: Cache paths with LRU eviction and time-to-live
```python
import time
from collections import OrderedDict
class PathCache:
"""
LRU cache with TTL for pathfinding results.
Features:
- LRU eviction (least recently used)
- TTL expiration (paths become stale)
- Region invalidation (terrain changes)
- Bounded memory (max size)
"""
def __init__(self, max_size=5000, ttl_seconds=30.0):
self.cache = OrderedDict() # Maintains insertion order for LRU
self.max_size = max_size
self.ttl = ttl_seconds
self.insert_times = {}
# Statistics
self.stats = {
'hits': 0,
'misses': 0,
'evictions': 0,
'expirations': 0,
'invalidations': 0
}
def _make_key(self, start, goal):
"""Create cache key from start/goal positions"""
# Quantize to grid (allows position variance within cell)
# Cell size = 5 units (units within 5 units share same path)
return (
round(start[0] / 5) * 5,
round(start[1] / 5) * 5,
round(goal[0] / 5) * 5,
round(goal[1] / 5) * 5
)
def get(self, start, goal):
"""
Get cached path if available and not expired.
Returns: Path if cached and valid, None otherwise
"""
key = self._make_key(start, goal)
current_time = time.time()
if key not in self.cache:
self.stats['misses'] += 1
return None
# Check TTL
if current_time - self.insert_times[key] > self.ttl:
# Expired
del self.cache[key]
del self.insert_times[key]
self.stats['expirations'] += 1
self.stats['misses'] += 1
return None
# Cache hit - move to end (most recently used)
self.cache.move_to_end(key)
self.stats['hits'] += 1
return self.cache[key]
def put(self, start, goal, path):
"""Store path in cache"""
key = self._make_key(start, goal)
current_time = time.time()
# Evict if at capacity (LRU)
if len(self.cache) >= self.max_size and key not in self.cache:
# Remove oldest (first item in OrderedDict)
oldest_key = next(iter(self.cache))
del self.cache[oldest_key]
del self.insert_times[oldest_key]
self.stats['evictions'] += 1
# Store path
self.cache[key] = path
self.insert_times[key] = current_time
# Move to end (most recently used)
self.cache.move_to_end(key)
def invalidate_region(self, x, y, radius):
"""
Invalidate all cached paths in region.
Call when terrain changes (building placed, wall destroyed, etc.)
"""
radius_sq = radius * radius
keys_to_remove = []
for key in self.cache:
start_x, start_y, goal_x, goal_y = key
# Check if start or goal in affected region
dx_start = start_x - x
dy_start = start_y - y
dx_goal = goal_x - x
dy_goal = goal_y - y
if (dx_start * dx_start + dy_start * dy_start <= radius_sq or
dx_goal * dx_goal + dy_goal * dy_goal <= radius_sq):
keys_to_remove.append(key)
for key in keys_to_remove:
del self.cache[key]
del self.insert_times[key]
self.stats['invalidations'] += 1
def get_hit_rate(self):
"""Calculate cache hit rate"""
total = self.stats['hits'] + self.stats['misses']
if total == 0:
return 0.0
return self.stats['hits'] / total
def get_stats(self):
"""Get cache statistics"""
return {
**self.stats,
'size': len(self.cache),
'hit_rate': self.get_hit_rate()
}
# Usage
path_cache = PathCache(max_size=5000, ttl_seconds=30.0)
def find_path(start, goal):
# Try cache first
cached_path = path_cache.get(start, goal)
if cached_path:
return cached_path # Cache hit!
# Cache miss - calculate path
path = expensive_pathfinding(start, goal)
path_cache.put(start, goal, path)
return path
# Invalidate when terrain changes
def on_building_placed(building):
# Invalidate paths near building
path_cache.invalidate_region(building.x, building.y, radius=100)
# Print stats periodically
def print_cache_stats():
stats = path_cache.get_stats()
print(f"Path Cache: {stats['size']}/{path_cache.max_size} entries, "
f"hit rate: {stats['hit_rate']:.1%}, "
f"{stats['hits']} hits, {stats['misses']} misses")
# Performance:
# 60% hit rate: Only 40% of requests calculate = 2.5× faster
# 80% hit rate: Only 20% of requests calculate = 5× faster
```
### Pattern 6: Job System for Parallel Work
**When to use**: Native code (C++/Rust) with embarrassingly parallel work
```cpp
#include <vector>
#include <thread>
#include <queue>
#include <mutex>
#include <condition_variable>
#include <functional>
#include <atomic>
/**
* Job system for data-parallel work.
*
* Features:
* - Worker thread pool
* - Lock-free job submission (mostly)
* - Wait-for-completion
* - No shared mutable state (data parallelism)
*/
class JobSystem {
public:
using Job = std::function<void()>;
JobSystem(int num_workers = std::thread::hardware_concurrency()) {
workers.reserve(num_workers);
for (int i = 0; i < num_workers; ++i) {
workers.emplace_back([this]() { this->worker_loop(); });
}
}
~JobSystem() {
{
std::unique_lock<std::mutex> lock(queue_mutex);
shutdown = true;
}
queue_cv.notify_all();
for (auto& worker : workers) {
worker.join();
}
}
// Submit single job
void submit(Job job) {
{
std::unique_lock<std::mutex> lock(queue_mutex);
job_queue.push(std::move(job));
}
queue_cv.notify_one();
}
// Submit batch of jobs and wait for all to complete
void submit_batch_and_wait(const std::vector<Job>& jobs) {
std::atomic<int> remaining{static_cast<int>(jobs.size())};
std::mutex wait_mutex;
std::condition_variable wait_cv;
        for (const auto& job : jobs) {
            submit([&, job]() {
                job();
                if (--remaining == 0) {
                    // Take the lock before notifying so the waiter cannot miss
                    // the wakeup between its predicate check and its wait()
                    std::lock_guard<std::mutex> lock(wait_mutex);
                    wait_cv.notify_one();
                }
            });
        }
// Wait for all jobs to complete
std::unique_lock<std::mutex> lock(wait_mutex);
wait_cv.wait(lock, [&]() { return remaining == 0; });
}
private:
void worker_loop() {
while (true) {
Job job;
{
std::unique_lock<std::mutex> lock(queue_mutex);
queue_cv.wait(lock, [this]() {
return !job_queue.empty() || shutdown;
});
if (shutdown && job_queue.empty()) {
return;
}
job = std::move(job_queue.front());
job_queue.pop();
}
job();
}
}
std::vector<std::thread> workers;
std::queue<Job> job_queue;
std::mutex queue_mutex;
std::condition_variable queue_cv;
bool shutdown = false;
};
// Usage Example: Parallel position updates
struct Unit {
float x, y;
float vx, vy;
void update(float dt) {
x += vx * dt;
y += vy * dt;
}
};
void update_units_parallel(std::vector<Unit>& units, float dt, JobSystem& job_system) {
const int num_workers = 8;
const int batch_size = units.size() / num_workers;
std::vector<JobSystem::Job> jobs;
for (int worker_id = 0; worker_id < num_workers; ++worker_id) {
int start = worker_id * batch_size;
int end = (worker_id == num_workers - 1) ? units.size() : start + batch_size;
jobs.push_back([&units, dt, start, end]() {
// Each worker updates exclusive slice (no locks needed)
for (int i = start; i < end; ++i) {
units[i].update(dt);
}
});
}
job_system.submit_batch_and_wait(jobs);
}
// Performance: 4 cores = 2-3× speedup (accounting for overhead)
```
## Common Pitfalls
### Pitfall 1: Premature Optimization (Most Common!)
**Symptoms**:
- Jumping to complex solutions (multithreading) before measuring bottleneck
- Micro-optimizing (sqrt → squared distance) without profiling
- Optimizing code that's 1% of frame time
**Why it fails**:
- You optimize the wrong thing (80% of time elsewhere)
- Complex solutions add bugs without benefit
- Time wasted that could go to real bottleneck
**Example**:
```python
# BAD: Premature micro-optimization
# Replaced sqrt with squared distance (saves 0.1ms)
# But vision checks are only 1% of frame time!
dist_sq = dx*dx + dy*dy
if dist_sq < range_sq: # Micro-optimization
# ...
# GOOD: Profile first, found pathfinding is 80% of frame time
# Added path caching (saves 40ms!)
cached_path = path_cache.get(start, goal)
if cached_path:
return cached_path
```
**Solution**:
1. ✅ **Profile FIRST** - measure where time is actually spent
2. ✅ **Focus on top bottleneck** (80/20 rule)
3. ✅ **Measure improvement** - validate optimization helped
4. ✅ **Repeat** - find next bottleneck
**Quote**: "Premature optimization is the root of all evil" - Donald Knuth
### Pitfall 2: LOD Popping (Visual Artifacts)
**Symptoms**:
- Units suddenly appear/disappear at LOD boundaries
- Animation quality jumps (smooth → jerky)
- Players notice "fake" LOD transitions
**Why it fails**:
- No hysteresis: Entity at 99-101 units flip-flops between LOD 0/1 every frame
- Instant transitions: LOD 0 → LOD 3 in one frame (jarring)
- Distance-only: Ignores importance (player's units should always be high detail)
**Example**:
```python
# BAD: No hysteresis (causes popping)
if distance < 100:
lod = 0
else:
lod = 1
# Entity at 99.5 units: LOD 0
# Entity moves to 100.5 units: LOD 1
# Entity moves to 99.5 units: LOD 0 (flicker!)
# GOOD: Hysteresis + importance + blend
if is_important(entity):
lod = 0 # Always full detail for player units
elif distance < 90:
lod = 0 # Upgrade at 90
elif distance > 110:
lod = 1 # Downgrade at 110
# else: keep current LOD
# 20-unit buffer prevents thrashing
# Blend between LOD levels over 0.5 seconds
blend_factor = (time.time() - lod_transition_start) / 0.5
```
**Solution**:
1. ✅ **Hysteresis** - different thresholds for upgrade (90) vs downgrade (110)
2. ✅ **Importance weighting** - player units, selected units always high LOD
3. ✅ **Blend transitions** - cross-fade over 0.5-1 second
4. ✅ **Time delay** - wait N seconds before downgrading LOD
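The hysteresis rule can be distilled into a runnable two-level helper (the 90/110 thresholds mirror the snippet above and are illustrative):

```python
def select_lod(distance, current_lod, upgrade_at=90.0, downgrade_at=110.0):
    """Two-threshold LOD selection: upgrade below 90, downgrade above 110.

    Inside the 90-110 band the current LOD is kept, so an entity
    hovering near the boundary never flip-flops between levels.
    """
    if distance < upgrade_at:
        return 0
    if distance > downgrade_at:
        return 1
    return current_lod  # inside the hysteresis band: no change

# An entity oscillating around 100 units no longer thrashes:
lod = 0
for d in [99.5, 100.5, 99.5, 100.5]:
    lod = select_lod(d, lod)
# lod stays 0 (the 110 downgrade threshold was never crossed)
```

The same shape extends to N levels by giving each level its own upgrade/downgrade pair.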
### Pitfall 3: Thread Contention and Race Conditions
**Symptoms**:
- Crashes with "list modified during iteration"
- Nondeterministic behavior (works sometimes)
- Slower with multithreading than without (due to locking)
**Why it fails**:
- Multiple threads read/write shared mutable state (data race)
- Excessive locking serializes code (defeats parallelism)
- False sharing - adjacent data on same cache line thrashes
**Example**:
```python
# BAD: Race condition (shared mutable list)
def update_unit_threaded(unit, all_units):
# Thread 1 reads all_units
# Thread 2 modifies all_units (adds/removes unit)
# Thread 1 crashes: "list changed during iteration"
for other in all_units:
if collides(unit, other):
all_units.remove(other) # RACE!
# BAD: Excessive locking (serialized)
lock = threading.Lock()
def update_unit(unit):
with lock: # Only 1 thread works at a time!
unit.update()
# GOOD: Data parallelism (no shared mutable state)
# Note: CPython's GIL limits thread speedup for pure-Python work; this
# ownership pattern pays off with native extensions or ProcessPoolExecutor
from concurrent.futures import ThreadPoolExecutor

def update_units_parallel(units, num_workers=4):
batch_size = len(units) // num_workers
def update_batch(start, end):
# Exclusive ownership - no locks needed
for i in range(start, end):
units[i].update() # Only modifies units[i]
with ThreadPoolExecutor(max_workers=num_workers) as executor:
futures = []
for worker_id in range(num_workers):
start = worker_id * batch_size
end = start + batch_size if worker_id < num_workers - 1 else len(units)
futures.append(executor.submit(update_batch, start, end))
# Wait for all
for future in futures:
future.result()
```
**Solution**:
1. ✅ **Avoid shared mutable state** - each thread owns exclusive data
2. ✅ **Read-only sharing** - threads can read shared data if no writes
3. ✅ **Message passing** - communicate via queues instead of shared memory
4. ✅ **Lock-free algorithms** - atomic operations, compare-and-swap
5. ✅ **Test with thread sanitizer** - detects data races
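Message passing (solution 3) can be sketched with the standard library's queue.Queue: workers receive immutable task tuples and send results back, and only the main thread writes to shared data. (Threads here demonstrate the communication pattern; CPython's GIL means no speedup for pure-Python math.)

```python
import queue
import threading

def worker(task_q, result_q):
    """Pull (index, position, velocity, dt) tasks; push (index, new_position)."""
    while True:
        task = task_q.get()
        if task is None:  # Sentinel: shut down
            task_q.task_done()
            return
        idx, (x, y), (vx, vy), dt = task
        result_q.put((idx, (x + vx * dt, y + vy * dt)))
        task_q.task_done()

def update_positions(positions, velocities, dt, num_workers=4):
    task_q, result_q = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(task_q, result_q))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for i, (pos, vel) in enumerate(zip(positions, velocities)):
        task_q.put((i, pos, vel, dt))
    task_q.join()  # Wait for all real tasks to finish
    for _ in range(num_workers):
        task_q.put(None)  # One sentinel per worker
    for t in threads:
        t.join()
    # Only the main thread writes back to the shared list
    while not result_q.empty():
        idx, new_pos = result_q.get()
        positions[idx] = new_pos
    return positions

positions = [(0.0, 0.0), (10.0, 10.0)]
velocities = [(1.0, 0.0), (0.0, 2.0)]
update_positions(positions, velocities, dt=1.0)
# positions -> [(1.0, 0.0), (10.0, 12.0)]
```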
### Pitfall 4: Cache Invalidation Bugs
**Symptoms**:
- Units walk through walls (stale paths cached)
- Memory leak (cache grows unbounded)
- Crashes after long play sessions (out of memory)
**Why it fails**:
- No invalidation: Cache never updates when terrain changes
- No TTL: Old paths stay forever, become invalid
- No eviction: Cache grows until memory exhausted
**Example**:
```python
# BAD: No invalidation, no TTL, unbounded growth
cache = {}
def get_path(start, goal):
key = (start, goal)
if key in cache:
return cache[key] # May be stale!
path = pathfind(start, goal)
cache[key] = path # Cache grows forever!
return path
# Building placed, but cached paths not invalidated
def place_building(x, y):
buildings.append(Building(x, y))
# BUG: Paths through this area still cached!
# GOOD: LRU + TTL + invalidation
cache = PathCache(max_size=5000, ttl_seconds=30.0)
def get_path(start, goal):
cached = cache.get(start, goal)
if cached:
return cached
path = pathfind(start, goal)
cache.put(start, goal, path)
return path
def place_building(x, y):
buildings.append(Building(x, y))
cache.invalidate_region(x, y, radius=100) # Clear affected paths
```
**Solution**:
1. ✅ **TTL (time-to-live)** - expire entries after N seconds
2. ✅ **Event-based invalidation** - clear cache when terrain changes
3. ✅ **LRU eviction** - remove least recently used when full
4. ✅ **Bounded size** - set max_size to prevent unbounded growth
### Pitfall 5: Forgetting to Rebuild Spatial Grid
**Symptoms**:
- Units see enemies that are no longer there
- Collision detection misses fast-moving objects
- Query results are stale (from previous frame)
**Why it fails**:
- Entities move every frame, but grid not rebuilt
- Grid contains stale positions
**Example**:
```python
# BAD: Grid built once, never updated
spatial_grid = SpatialHashGrid(cell_size=100)
for unit in units:
spatial_grid.insert(unit)
def game_loop():
# Units move
for unit in units:
unit.x += unit.vx * dt
unit.y += unit.vy * dt
# Query stale grid (positions from frame 0!)
enemies = spatial_grid.query_radius(player.x, player.y, 50)
# GOOD: Rebuild grid every frame
def game_loop():
# Move units
for unit in units:
unit.x += unit.vx * dt
unit.y += unit.vy * dt
# Rebuild spatial grid (fast: O(n))
spatial_grid.clear()
for unit in units:
spatial_grid.insert(unit)
# Query with current positions
enemies = spatial_grid.query_radius(player.x, player.y, 50)
```
**Solution**:
1. ✅ **Rebuild every frame** - spatial_grid.clear() + insert all entities
2. ✅ **Or use dynamic structure** - quadtree with update() method
3. ✅ **Profile rebuild cost** - should be < 1ms for 1000 entities
### Pitfall 6: Optimization Without Validation
**Symptoms**:
- "Optimized" code runs slower
- New bottleneck created elsewhere
- Unsure if optimization helped
**Why it fails**:
- No before/after measurements
- Optimization moved bottleneck to different system
- Assumptions about cost were wrong
**Example**:
```python
# BAD: No measurement
def optimize_pathfinding():
# Made some changes...
# Hope it's faster?
pass
# GOOD: Measure before and after
def optimize_pathfinding():
    # Measure baseline (t0 avoids shadowing the 'start' position argument)
    t0 = time.perf_counter()
    for i in range(100):
        path = pathfind(start, goal)
    baseline_ms = (time.perf_counter() - t0) * 1000
    print(f"Baseline: {baseline_ms:.2f}ms for 100 paths")
    # Apply optimization...
    add_path_caching()
    # Measure improvement
    t0 = time.perf_counter()
    for i in range(100):
        path = pathfind(start, goal)
    optimized_ms = (time.perf_counter() - t0) * 1000
    print(f"Optimized: {optimized_ms:.2f}ms for 100 paths")
    speedup = baseline_ms / optimized_ms
    print(f"Speedup: {speedup:.1f}×")
# Baseline: 500ms for 100 paths
# Optimized: 200ms for 100 paths
# Speedup: 2.5×
```
**Solution**:
1. ✅ **Measure baseline** before optimization
2. ✅ **Measure improvement** after optimization
3. ✅ **Calculate speedup** - validate it helped
4. ✅ **Re-profile** - check for new bottlenecks
5. ✅ **Regression test** - ensure gameplay still works
### Pitfall 7: Ignoring Amdahl's Law (Diminishing Returns)
**Concept**: Speedup limited by serial portion of code
**Amdahl's Law**: `Speedup = 1 / ((1 - P) + P/N)`
- P = portion that can be parallelized (e.g., 0.75 = 75%)
- N = number of cores (e.g., 4)
**Example**:
- 75% of code parallelizable, 4 cores
- Speedup = 1 / ((1 - 0.75) + 0.75/4) = 1 / (0.25 + 0.1875) = 2.29×
- **Not 4×!** Serial portion limits speedup
**Why it matters**:
- Multithreading has diminishing returns
- Focus on parallelizing largest portions first
- Some tasks can't be parallelized (Amdahl's law ceiling)
**Solution**:
1. ✅ **Parallelize largest bottleneck** first (maximize P)
2. ✅ **Set realistic expectations** (2-3× on 4 cores, not 4×)
3. ✅ **Measure actual speedup** - compare to theoretical maximum
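The formula is worth checking numerically (a one-function sketch):

```python
def amdahl_speedup(parallel_fraction, num_cores):
    """Maximum speedup per Amdahl's Law: 1 / ((1 - P) + P/N)."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / num_cores)

# 75% parallelizable on 4 cores: ~2.29x, not 4x
print(round(amdahl_speedup(0.75, 4), 2))       # 2.29
# Even unlimited cores cannot beat the 1 / (1 - P) ceiling:
print(round(amdahl_speedup(0.75, 10**9), 2))   # 4.0
```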
### Pitfall 8: Sorting Every Frame (Expensive!)
**Symptoms**:
- 3-5ms spent sorting units by distance
- Sorting is top function in profiler
**Why it fails**:
- O(n log n) sort is expensive for large N
- Entity distances change slowly (don't need exact sort every frame)
**Example**:
```python
# BAD: Full sort every frame
def update():
# O(n log n) = 1000 × log(1000) ≈ 10,000 operations
units_sorted = sorted(units, key=lambda u: distance_to_camera(u))
# Update closest units
for unit in units_sorted[:100]:
unit.update()
# GOOD: Sort every N frames, or use approximate sort
def update():
# Re-sort every 10 frames only
if frame_count % 10 == 0:
global units_sorted
units_sorted = sorted(units, key=lambda u: distance_to_camera(u))
# Use slightly stale sort (good enough!)
for unit in units_sorted[:100]:
unit.update()
# BETTER: Use spatial partitioning (no sorting needed!)
def update():
    # Query only entities near the camera (no global sort needed)
    nearby_units = spatial_grid.query_radius(camera.x, camera.y, radius=200)
# Update nearby units
for unit in nearby_units:
unit.update()
```
**Solution**:
1. ✅ **Sort less frequently** - every 5-10 frames is fine
2. ✅ **Approximate sort** - bucketing instead of exact sort
3. ✅ **Spatial queries** - avoid sorting entirely (use grid/quadtree)
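The approximate-sort idea from solution 2 can be sketched as O(n) distance bucketing (the 50-unit ring width and entity tuples are illustrative):

```python
import math

def bucket_by_distance(entities, cx, cy, bucket_width=50.0, num_buckets=8):
    """O(n) binning into coarse distance rings around (cx, cy)."""
    buckets = [[] for _ in range(num_buckets)]
    for e in entities:
        d = math.hypot(e[0] - cx, e[1] - cy)
        idx = min(int(d / bucket_width), num_buckets - 1)
        buckets[idx].append(e)
    return buckets

def nearest_first(entities, cx, cy, limit):
    """Roughly nearest-first iteration without a full O(n log n) sort."""
    picked = []
    for bucket in bucket_by_distance(entities, cx, cy):
        for e in bucket:
            if len(picked) == limit:
                return picked
            picked.append(e)
    return picked

entities = [(120.0, 0.0), (10.0, 0.0), (300.0, 0.0), (60.0, 0.0)]
closest_two = nearest_first(entities, 0.0, 0.0, limit=2)
# -> [(10.0, 0.0), (60.0, 0.0)]  (bucket order, not an exact sort)
```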
## Real-World Examples
### Example 1: Unity DOTS (Data-Oriented Technology Stack)
**What it is**: Unity's high-performance ECS (Entity Component System) architecture
**Key optimizations**:
1. **Struct of Arrays (SoA)** - Components stored in contiguous arrays
- Traditional: `List<GameObject>` with components scattered in memory
- DOTS: `NativeArray<Position>`, `NativeArray<Velocity>` - cache-friendly
- Result: 1.5-3× faster for batch operations
2. **Job System** - Data parallelism across CPU cores
- Each job processes exclusive slice of entities
- No locks (data ownership model)
- Result: 2-4× speedup on 4-8 core CPUs
3. **Burst Compiler** - LLVM-based code generation
- Generates SIMD instructions (AVX2, SSE)
- Removes bounds checks, optimizes math
- Result: 2-10× faster than standard C#
**Performance**: 10,000 entities at 60 FPS (vs 1,000 in traditional Unity)
**When to use**:
- ✅ 1000+ entities needing updates
- ✅ Batch operations (position updates, physics, AI)
- ✅ Performance-critical simulations
**When NOT to use**:
- ❌ Small entity counts (< 100)
- ❌ Gameplay prototyping (ECS is complex)
- ❌ Unique entities with lots of one-off logic
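The SoA layout can be illustrated outside Unity, with the standard library's array module standing in for NativeArray (a sketch of the idea, not DOTS code):

```python
from array import array

# AoS (array of structures): objects scattered across the heap
class UnitAoS:
    def __init__(self, x, y, vx, vy):
        self.x, self.y, self.vx, self.vy = x, y, vx, vy

# SoA (structure of arrays): each field contiguous in memory
class UnitsSoA:
    def __init__(self, n):
        self.x = array('f', [0.0] * n)
        self.y = array('f', [0.0] * n)
        self.vx = array('f', [1.0] * n)
        self.vy = array('f', [1.0] * n)

    def integrate(self, dt):
        # Tight loop over contiguous data: cache-friendly, SIMD-friendly
        x, y, vx, vy = self.x, self.y, self.vx, self.vy
        for i in range(len(x)):
            x[i] += vx[i] * dt
            y[i] += vy[i] * dt

units = UnitsSoA(1000)
units.integrate(0.016)  # every position advances by vx*dt, vy*dt
```

In compiled languages the contiguous layout is what enables the cache and SIMD wins; in Python the pattern maps naturally onto NumPy arrays.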
### Example 2: Supreme Commander (RTS with 1000+ Units)
**Challenge**: Support 1000+ units in RTS battles at 30-60 FPS
**Optimizations**:
1. **Flow Fields for Pathfinding**
- Pre-compute direction field from goal
- Each unit follows field (O(1) per unit)
- Alternative to A* per unit (O(n log n) each)
- Result: 100× faster pathfinding for groups
2. **LOD for Unit AI**
- LOD 0 (< 50 units from camera): Full behavior tree
- LOD 1 (50-100 units): Simplified FSM
- LOD 2 (100+ units): Scripted behavior
- Result: 3-5× fewer AI updates per frame
3. **Spatial Partitioning for Weapons**
- Grid-based broad-phase for weapon targeting
- Only check units in weapon range cells
- Result: O(n²) → O(n) for combat calculations
4. **Time-Sliced Sim**
- Economy updates: Every 10 frames
- Unit production: Every 5 frames
- Visual effects: Based on distance LOD
- Result: Consistent frame rate under load
**Performance**: 1000 units at 30 FPS, 500 units at 60 FPS
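A minimal flow-field sketch: BFS outward from the goal writes one step direction per passable cell, then every unit reads its cell's direction in O(1) per frame. Grid layout and the `next_step` helper are assumptions for illustration:

```python
from collections import deque

def build_flow_field(grid, goal):
    """BFS from the goal; each passable cell stores a step toward the goal.

    grid[y][x] is True for passable cells; returns {(x, y): (dx, dy)}."""
    h, w = len(grid), len(grid[0])
    field = {goal: (0, 0)}
    frontier = deque([goal])
    while frontier:
        cx, cy = frontier.popleft()
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = cx + dx, cy + dy
            if 0 <= nx < w and 0 <= ny < h and grid[ny][nx] and (nx, ny) not in field:
                field[(nx, ny)] = (-dx, -dy)  # step back toward the cell we came from
                frontier.append((nx, ny))
    return field

def next_step(field, pos):
    dx, dy = field.get(pos, (0, 0))  # O(1) lookup per unit per frame
    return (pos[0] + dx, pos[1] + dy)
```

One BFS serves any number of units heading to the same goal, which is why this scales so much better than per-unit A*.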
**Lessons**:
- Flow fields > A* for large unit groups
- LOD critical for maintaining frame rate at scale
- Spatial partitioning is non-negotiable for 1000+ units
### Example 3: Total War (20,000+ Soldiers in Battles)
**Challenge**: Render and simulate 20,000 individual soldiers at 30-60 FPS
**Optimizations**:
1. **Hierarchical LOD**
- LOD 0 (< 20m): Full skeleton, detailed model
- LOD 1 (20-50m): Reduced skeleton, simpler model
- LOD 2 (50-100m): Impostor (textured quad)
- LOD 3 (100m+): Single pixel or culled
- Result: 10× fewer vertices rendered
2. **Formation-Based AI**
- Units in formation share single pathfinding result
- Individual units offset from formation center
- Result: 100× fewer pathfinding calculations
3. **Batched Rendering**
- Instanced rendering for identical soldiers
- 1 draw call for 100 soldiers (vs 100 draw calls)
- Result: 10× fewer draw calls
4. **Simplified Physics**
- Full physics for nearby units (< 20m)
- Ragdolls for deaths near camera
- Simplified collision for distant units
- Result: 5× fewer physics calculations
**Performance**: 20,000 units at 30-60 FPS (depending on settings)
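Formation-based AI (optimization 2) can be sketched as one shared path result plus per-soldier offsets — the offset math below is a hypothetical minimal version, with offsets given as (forward, right) pairs relative to the formation's facing:

```python
def formation_positions(path_point, facing, offsets):
    """One pathfinding result shared by the formation; each soldier
    stands at its own offset from the formation's current path point.

    facing is a unit vector along the path; offsets are (forward, right) pairs."""
    fx, fy = facing
    rx, ry = fy, -fx  # right-hand perpendicular to the facing direction
    px, py = path_point
    return [(px + f * fx + r * rx, py + f * fy + r * ry) for f, r in offsets]
```

With 100 soldiers per formation, one path plus 100 cheap offset evaluations replaces 100 full pathfinding searches.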
**Lessons**:
- Visual LOD as important as simulation LOD
- Formation-based AI avoids redundant pathfinding
- Instanced rendering critical for large unit counts
### Example 4: Cities Skylines (Traffic Simulation)
**Challenge**: Simulate 10,000+ vehicles with realistic traffic at 30 FPS
**Optimizations**:
1. **Hierarchical Pathfinding**
- Highway network → arterial roads → local streets
- Pre-compute high-level paths, refine locally
- Result: 20× faster pathfinding for long routes
2. **Path Caching**
- Common routes cached (home → work, work → home)
- 60-80% cache hit rate
- Result: 2.5-5× fewer pathfinding calculations
3. **Dynamic Cost Adjustment**
- Road segments track vehicle density
- Congested roads have higher pathfinding cost
- Vehicles reroute around congestion
- Result: Emergent traffic patterns
4. **Despawn Distant Vehicles**
- Vehicles > 500m from camera despawned
- Statistics tracked, respawn when relevant
- Result: Effective vehicle count reduced 50%
**Performance**: 10,000 active vehicles at 30 FPS
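Path caching (optimization 2) can be sketched as a cache keyed by (origin, destination) cells, with hit-rate tracking so the 60-80% target is measurable and an invalidation hook for when the road network changes. `compute_path` is a caller-supplied function, an assumption for illustration:

```python
class PathCache:
    """Cache pathfinding results keyed by (origin, destination)."""
    def __init__(self, compute_path):
        self.compute_path = compute_path
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def get(self, origin, dest):
        key = (origin, dest)
        if key in self.cache:
            self.hits += 1
        else:
            self.misses += 1
            self.cache[key] = self.compute_path(origin, dest)
        return self.cache[key]

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

    def invalidate(self):
        """Call when the road network changes; cached paths may be stale."""
        self.cache.clear()
```

A production version would also expire entries (TTL) so congestion-aware costs stay fresh.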
**Lessons**:
- Hierarchical pathfinding essential for city-scale maps
- Path caching provides huge wins (60%+ hit rate common)
- Despawning off-screen entities maintains performance
### Example 5: Factorio (Mega-Factory Optimization)
**Challenge**: Simulate 100,000+ entities (belts, inserters, assemblers) at 60 FPS
**Optimizations**:
1. **Update Skipping**
- Idle machines don't update (no input/output)
- Active set typically 10-20% of total entities
- Result: 5-10× fewer updates per tick
2. **Chunk-Based Simulation**
- World divided into 32×32 tile chunks
- Inactive chunks (no player nearby) update less often
- Result: Effective world size reduced 80%
3. **Belt Optimization**
- Items on belts compressed into contiguous arrays
- Lane-based updates (not per-item)
- Result: 10× faster belt simulation
4. **Electrical Network Caching**
- Power grid solved once, cached until topology changes
- Only recalculate when grid modified
- Result: 100× fewer electrical calculations
**Performance**: 60 FPS with 100,000+ entities (in optimized factories)
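Update skipping (optimization 1) can be sketched as an active/sleeping split: entities that report no remaining work drop out of the update loop until an event wakes them. The `update()`-returns-bool interface is a hypothetical convention for this sketch:

```python
class SleepScheduler:
    """Only entities in the active set are ticked; idle entities sleep until woken.

    Entities are assumed to expose update() returning True while they
    still have work to do (hypothetical interface)."""
    def __init__(self):
        self.active = set()
        self.sleeping = set()

    def add(self, entity):
        self.active.add(entity)

    def wake(self, entity):
        # Called by events, e.g. an item arriving on a belt or power returning
        if entity in self.sleeping:
            self.sleeping.discard(entity)
            self.active.add(entity)

    def tick(self):
        for entity in list(self.active):
            if not entity.update():
                self.active.discard(entity)
                self.sleeping.add(entity)  # no work left; stop ticking it
```

With an active set of 10-20% of entities, per-tick cost drops 5-10× without changing behavior, provided every wake condition is covered by an event.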
**Lessons**:
- Update skipping (sleeping entities) provides huge wins
- Chunk-based simulation scales to massive worlds
- Cache static calculations (power grid, fluid networks)
## Cross-References
### Within Bravos/Simulation-Tactics
**This skill applies to ALL other simulation skills**:
- **traffic-and-pathfinding** ← Optimize pathfinding with caching, time-slicing
- **ai-and-agent-simulation** ← LOD for AI, time-sliced behavior trees
- **physics-simulation-patterns** ← Spatial partitioning for collision, broad-phase
- **ecosystem-simulation** ← LOD for distant populations, time-sliced updates
- **weather-and-time** ← Particle budgets, LOD for effects
- **economic-simulation-patterns** ← Time-slicing for economy updates
**Related skills in this skillpack**:
- **spatial-partitioning** (planned) - Deep dive into quadtrees, octrees, grids
- **ecs-architecture** (planned) - Data-oriented design, component systems
### External Skillpacks
**Yzmir/Performance-Optimization** (if exists):
- Profiling tools and methodology
- Memory optimization (pooling, allocators)
- Cache optimization (data layouts)
**Yzmir/Algorithms-and-Data-Structures** (if exists):
- Spatial data structures (quadtree, k-d tree, BVH)
- Priority queues (for time-slicing)
- LRU cache implementation
**Axiom/Game-Engine-Patterns** (if exists):
- Update loop patterns
- Frame time management
- Object pooling
## Testing Checklist
Use this checklist to verify optimization is complete and correct:
### 1. Profiling
- [ ] Captured baseline performance (frame time, FPS)
- [ ] Identified top 3-5 bottlenecks (80% of time)
- [ ] Understood WHY each bottleneck is slow (algorithm, data, cache)
- [ ] Documented baseline metrics for comparison
### 2. Algorithmic Optimization
- [ ] Checked for O(n²) algorithms (proximity queries, collisions)
- [ ] Applied spatial partitioning where appropriate (grid, quadtree)
- [ ] Validated spatial queries return correct results
- [ ] Measured improvement (should be 10-100×)
### 3. Level of Detail (LOD)
- [ ] Defined LOD levels (typically 4: full, reduced, minimal, culled)
- [ ] Implemented distance-based LOD assignment
- [ ] Added importance weighting (player units, selected units)
- [ ] Implemented hysteresis to prevent LOD thrashing
- [ ] Verified no visual popping artifacts
- [ ] Measured improvement (should be 2-10×)
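The hysteresis item above can be sketched as a distance-based LOD selector that only crosses a boundary once the entity is past it by a margin, so units hovering near a threshold don't flicker between levels. Thresholds and margin are illustrative values:

```python
LOD_ENTER = [0.0, 50.0, 100.0, 200.0]  # minimum distance for LOD 0..3
HYSTERESIS = 10.0                       # margin before switching levels

def update_lod(current_lod, distance):
    """Distance-based LOD with hysteresis; steps one level per call."""
    target = 0
    for lod, threshold in enumerate(LOD_ENTER):
        if distance >= threshold:
            target = lod
    if target == current_lod:
        return current_lod
    if target > current_lod:  # moving to a coarser LOD
        boundary = LOD_ENTER[current_lod + 1]
        return current_lod + 1 if distance >= boundary + HYSTERESIS else current_lod
    else:                     # moving to a finer LOD
        boundary = LOD_ENTER[current_lod]
        return current_lod - 1 if distance <= boundary - HYSTERESIS else current_lod
```

An entity at 55m stays at LOD 0 (needs 60m to demote), and a LOD 1 entity at 45m stays at LOD 1 (needs 40m to promote) — no thrashing around the 50m boundary.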
### 4. Time-Slicing
- [ ] Set frame time budget per system (e.g., 2ms for pathfinding)
- [ ] Implemented priority queue (important work first)
- [ ] Verified budget is respected (doesn't exceed limit)
- [ ] Checked that high-priority work completes quickly
- [ ] Measured improvement (should be 2-5×)
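The budget-plus-priority-queue items above can be sketched as a queue that pops the highest-priority work until the frame budget is spent; leftover requests carry over to the next frame. Budget and priority scheme are illustrative:

```python
import heapq
import time

class TimeSlicedQueue:
    """Runs highest-priority requests until the per-frame budget is exhausted."""
    def __init__(self, budget_ms=2.0):
        self.budget_s = budget_ms / 1000.0
        self.heap = []
        self.counter = 0  # tie-breaker so heapq never compares work callables

    def submit(self, priority, work):
        # Negate priority: heapq is a min-heap, we want highest priority first
        heapq.heappush(self.heap, (-priority, self.counter, work))
        self.counter += 1

    def run_frame(self):
        start = time.perf_counter()
        done = 0
        while self.heap and time.perf_counter() - start < self.budget_s:
            _, _, work = heapq.heappop(self.heap)
            work()
            done += 1
        return done  # unfinished work stays queued for the next frame
```

Checking the budget between items (rather than before starting) means one oversized item can still overrun; splitting large requests keeps the budget honest.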
### 5. Caching
- [ ] Identified redundant calculations to cache
- [ ] Implemented cache with LRU eviction
- [ ] Added TTL (time-to-live) expiration
- [ ] Implemented invalidation triggers (terrain changes, etc.)
- [ ] Verified cache hit rate (aim for 60-80%)
- [ ] Checked no stale data bugs (units walking through walls)
- [ ] Measured improvement (should be 2-10×)
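The LRU-eviction and TTL items above can be sketched together with an `OrderedDict`; the injectable clock is an assumption to make expiry testable:

```python
import time
from collections import OrderedDict

class LruTtlCache:
    """LRU cache with TTL expiry and explicit invalidation."""
    def __init__(self, max_size=1024, ttl=5.0, clock=time.monotonic):
        self.max_size = max_size
        self.ttl = ttl
        self.clock = clock                # injectable for deterministic tests
        self.entries = OrderedDict()      # key -> (value, inserted_at)

    def get(self, key):
        entry = self.entries.get(key)
        if entry is None:
            return None
        value, inserted_at = entry
        if self.clock() - inserted_at > self.ttl:
            del self.entries[key]         # expired: treat as a miss
            return None
        self.entries.move_to_end(key)     # mark as recently used
        return value

    def put(self, key, value):
        self.entries[key] = (value, self.clock())
        self.entries.move_to_end(key)
        while len(self.entries) > self.max_size:
            self.entries.popitem(last=False)  # evict least-recently used

    def invalidate_all(self):
        """Trigger on world changes (e.g. terrain edits) to avoid stale results."""
        self.entries.clear()
```

TTL bounds how stale a cached result can get; invalidation triggers handle changes that must take effect immediately.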
### 6. Data-Oriented Design (if applicable)
- [ ] Identified batch operations on many entities
- [ ] Converted AoS → SoA for hot data
- [ ] Verified memory layout is cache-friendly
- [ ] Measured improvement (should be 1.5-3×)
### 7. Multithreading (if needed)
- [ ] Verified all simpler optimizations done first
- [ ] Identified embarrassingly parallel work
- [ ] Implemented job system or data parallelism
- [ ] Verified no race conditions (test with thread sanitizer)
- [ ] Checked performance gain justifies complexity
- [ ] Measured improvement (should be 1.5-4×)
### 8. Validation
- [ ] Met target frame rate (60 FPS or 30 FPS)
- [ ] Verified no gameplay regressions (units behave correctly)
- [ ] Checked no visual artifacts (LOD popping, etc.)
- [ ] Tested at target entity count (e.g., 1000 units)
- [ ] Tested edge cases (10,000 units, worst-case scenarios)
- [ ] Documented final performance metrics
- [ ] Calculated total speedup (baseline → optimized)
### 9. Before/After Comparison
| Metric | Before | After | Improvement |
|--------|--------|-------|-------------|
| **Frame Time** | ___ms | ___ms | ___× faster |
| **FPS** | ___ | ___ | ___ |
| **Bottleneck System Time** | ___ms | ___ms | ___× faster |
| **Entity Count (target FPS)** | ___ | ___ | ___× more |
| **Memory Usage** | ___MB | ___MB | ___ |
### 10. Regression Tests
- [ ] Units still path correctly (no walking through walls)
- [ ] AI behavior unchanged (same decisions)
- [ ] Combat calculations correct (same damage)
- [ ] No crashes or exceptions
- [ ] No memory leaks (long play session test)
- [ ] Deterministic results (same input → same output)
**Remember**:
1. **Profile FIRST** - measure before guessing
2. **Algorithmic optimization** provides biggest wins (10-100×)
3. **LOD and time-slicing** are essential for 1000+ entities
4. **Multithreading is LAST resort** - complexity cost is high
5. **Validate improvement** - measure before/after, check for regressions
**Success criteria**: Target frame rate achieved (60 FPS) with desired entity count (1000+) and no gameplay compromises.