# Deferred Loading

> **Definition**: A performance optimization pattern that postpones the initialization, loading, or execution of resources until they are actually needed, reducing startup time and memory consumption.

**Navigation**: [← Dynamic Manifests](./dynamic-manifests.md) | [↑ Best Practices](../README.md) | [→ Progressive Disclosure](./progressive-disclosure.md)

---

## Table of Contents

- [What Is It?](#what-is-it) ← Start here
- [Why Defer?](#why-defer)
- [Basic Patterns](#basic-patterns) ← Quick implementation
- [Strategies](#strategies) ← For practitioners
- [Lazy Initialization](#lazy-initialization)
- [Advanced Techniques](#advanced-techniques) ← For architects
- [MCP Skills Implementation](#mcp-skills-implementation)
- [Performance Optimization](#performance-optimization)

---

## What Is It?

Deferred loading delays resource initialization until first use.

### Visual Comparison

```
EAGER LOADING (Traditional)
─────────────────────────────────────
App Starts
  ↓
Load Module A ──────── 2s
Load Module B ──────── 3s
Load Module C ──────── 2s
Load Module D ──────── 1s
Load Module E ──────── 2s
  ↓
App Ready ──────────── 10s total
Memory: 500 MB

User uses Module A only
→ Modules B, C, D, E wasted startup time

DEFERRED LOADING (Optimized)
─────────────────────────────────────
App Starts
  ↓
Minimal Initialization ── 0.5s
  ↓
App Ready ──────────── 0.5s total
Memory: 50 MB

User requests Module A
  ↓
Load Module A ──────── 2s
Use Module A
Memory: 150 MB

(Modules B, C, D, E never loaded!)
```

### Key Metrics

| Metric | Eager Loading | Deferred Loading | Improvement |
|--------|---------------|------------------|-------------|
| **Startup Time** | 10s | 0.5s | **95% faster** |
| **Initial Memory** | 500 MB | 50 MB | **90% less** |
| **Time to First Use** | 10s | 2.5s | **75% faster** |
| **Unused Resource Waste** | High | None | **Eliminated** |

---

## Why Defer?

### Problem 1: Slow Startup

```python
# BAD: Load everything at startup
import heavy_ml_library   # 5 seconds
import video_processing   # 3 seconds
import blockchain_tools   # 4 seconds
import data_analysis      # 2 seconds

# User wants to do simple text search...
# But waited 14 seconds for unused features!
```

### Problem 2: Memory Waste

```javascript
// BAD: Initialize all components
const videoEditor = new VideoEditor();        // 200 MB
const imageProcessor = new ImageProcessor();  // 150 MB
const audioMixer = new AudioMixer();          // 100 MB

// User only uses text editor
// But 450 MB wasted on unused features!
```

### Problem 3: Unused Resources

```java
// BAD: Connect to all services at startup
DatabaseConnection db = new DatabaseConnection();
CacheService cache = new CacheService();
PaymentGateway payments = new PaymentGateway();
EmailService email = new EmailService();
AnalyticsService analytics = new AnalyticsService();

// User views read-only content
// But connected to write services unnecessarily!
```
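All three problems share the same cure: hide the expensive construction behind an accessor that runs on first use. Here is a minimal Python sketch of the deferred alternative to Problem 3, assuming the same hypothetical `DatabaseConnection` and `PaymentGateway` classes:

```python
class Services:
    """Deferred alternative: connect to a service only on first use."""

    def __init__(self):
        self._db = None        # Nothing connected at startup
        self._payments = None

    @property
    def db(self):
        if self._db is None:                  # First access connects
            self._db = DatabaseConnection()   # hypothetical class
        return self._db

    @property
    def payments(self):
        if self._payments is None:
            self._payments = PaymentGateway()  # hypothetical class
        return self._payments

services = Services()  # Instant: no connections opened
# A read-only request path never touches services.payments
```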
### The Solution

**Defer everything non-essential**:

- ✅ Fast startup
- ✅ Low memory footprint
- ✅ Load only what's used
- ✅ Better resource utilization

---

## Basic Patterns

### Pattern 1: Lazy Import (Python)

**Bad: Eager Import**

```python
# At module level - loads immediately
import pandas as pd
import numpy as np
import tensorflow as tf

def analyze_data(data):
    df = pd.DataFrame(data)
    return df.describe()
```

**Good: Deferred Import**

```python
# No imports at module level

def analyze_data(data):
    # Import only when the function is called
    import pandas as pd
    df = pd.DataFrame(data)
    return df.describe()
```

**Even Better: Cached Lazy Import**

```python
_pandas = None

def get_pandas():
    global _pandas
    if _pandas is None:
        import pandas as pd
        _pandas = pd
    return _pandas

def analyze_data(data):
    pd = get_pandas()  # First call imports, subsequent calls use the cache
    df = pd.DataFrame(data)
    return df.describe()
```

(Python itself caches imported modules in `sys.modules`, so the explicit cache mainly saves the repeated import-statement lookup inside a hot function.)

### Pattern 2: Lazy Initialization (JavaScript)

**Bad: Eager Initialization**

```javascript
class DataService {
  constructor() {
    // Initialize immediately
    this.database = new DatabaseConnection();
    this.cache = new CacheLayer();
    this.validator = new DataValidator();
  }
}

// Even if never used, all initialized!
const service = new DataService();
```

**Good: Deferred Initialization**

```javascript
class DataService {
  constructor() {
    // Don't initialize anything yet
    this._database = null;
    this._cache = null;
    this._validator = null;
  }

  get database() {
    // Initialize on first access
    if (!this._database) {
      this._database = new DatabaseConnection();
    }
    return this._database;
  }

  get cache() {
    if (!this._cache) {
      this._cache = new CacheLayer();
    }
    return this._cache;
  }

  get validator() {
    if (!this._validator) {
      this._validator = new DataValidator();
    }
    return this._validator;
  }
}

const service = new DataService();  // Fast, nothing initialized
service.database.query(...);        // Now database is initialized
```

### Pattern 3: Function Decorators (Python)

**Decorator for Lazy Loading**

```python
from functools import wraps

def lazy_load(loader_func):
    """Decorator that defers execution until the first call.

    Note: the result is cached once, so this suits zero-argument
    loaders; later calls ignore any new arguments.
    """
    _cached = None

    @wraps(loader_func)
    def wrapper(*args, **kwargs):
        nonlocal _cached
        if _cached is None:
            _cached = loader_func(*args, **kwargs)
        return _cached

    return wrapper

# Usage
@lazy_load
def load_ml_model():
    print("Loading ML model...")  # Only prints once
    import tensorflow as tf
    return tf.keras.models.load_model('model.h5')

# First call: loads the model
model = load_ml_model()

# Subsequent calls: return the cached model
model = load_ml_model()  # Instant, uses cache
```

---

## Strategies

### Strategy 1: Tiered Loading

Load resources in priority order:

```python
import asyncio

class Application:
    def __init__(self):
        # Tier 1: Critical (load immediately)
        self.config = load_config()
        self.logger = setup_logging()

        # Tier 2: Important (load after startup)
        self._core_modules = None

        # Tier 3: Optional (load on demand)
        self._advanced_features = {}

    async def start(self):
        """Fast startup - only Tier 1"""
        print("App ready!")
        # Tier 2: Load in the background
        asyncio.create_task(self._load_core_modules())

    async def _load_core_modules(self):
        """Load Tier 2 in the background"""
        await asyncio.sleep(0)  # Yield to the event loop
        self._core_modules = load_core_modules()

    def get_feature(self, feature_name):
        """Tier 3: Load on explicit request"""
        if feature_name not in self._advanced_features:
            self._advanced_features[feature_name] = load_feature(feature_name)
        return self._advanced_features[feature_name]
```
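A minimal sketch of driving this tiered startup, using the `Application` class above; the stub loaders are hypothetical stand-ins, since `load_config`, `setup_logging`, `load_core_modules`, and `load_feature` are not defined in this document:

```python
import asyncio

# Hypothetical stubs for the loaders Application expects
def load_config():
    return {"env": "dev"}

def setup_logging():
    print("logging ready")

def load_core_modules():
    return ["core"]

def load_feature(name):
    return f"<{name} feature>"

async def main():
    app = Application()      # Tier 1 loads here (config + logging)
    await app.start()        # Returns immediately; Tier 2 loads in background
    await asyncio.sleep(0)   # Give the background task a turn
    print(app.get_feature("report"))  # Tier 3: loaded on this request

asyncio.run(main())
```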
### Strategy 2: Dependency-Based Loading

Load dependencies only when needed:

```python
import sys

class SkillManager:
    def __init__(self):
        self.skills = {}
        # Import names (not pip package names) for each skill's dependencies
        self.skill_dependencies = {
            'data_analysis': ['pandas', 'numpy'],
            'ml_training': ['tensorflow', 'sklearn'],
            'web_scraping': ['requests', 'bs4']
        }

    def load_skill(self, skill_name):
        """Load a skill and its dependencies on demand"""
        if skill_name in self.skills:
            return self.skills[skill_name]

        # Load dependencies first
        deps = self.skill_dependencies.get(skill_name, [])
        for dep in deps:
            self._ensure_dependency(dep)

        # Load the skill
        skill = self._import_skill(skill_name)
        self.skills[skill_name] = skill
        return skill

    def _ensure_dependency(self, dep_name):
        """Lazily load a dependency"""
        if dep_name not in sys.modules:
            __import__(dep_name)

    def _import_skill(self, skill_name):
        """Dynamically import a skill module"""
        module = __import__(f'skills.{skill_name}', fromlist=['Skill'])
        return module.Skill()
```

### Strategy 3: Code Splitting (JavaScript)

Split code into chunks loaded on demand:

```javascript
// main.js - Always loaded
import { setupApp } from './core.js';
setupApp();

// Features loaded on demand
async function enableAdvancedMode() {
  // Dynamic import - only loads when called
  const { AdvancedFeatures } = await import('./advanced.js');
  return new AdvancedFeatures();
}

async function startVideoEditing() {
  // Large dependency loaded only when needed
  const { VideoEditor } = await import('./video-editor.js');
  return new VideoEditor();
}

// User clicks "Advanced Mode"
button.onclick = async () => {
  const features = await enableAdvancedMode();
  features.activate();
};
```

### Strategy 4: Resource Pooling

Reuse loaded resources efficiently:

```python
from typing import Callable, Dict
import weakref

class ResourcePool:
    """Pool of lazily-loaded resources held by weak references"""

    def __init__(self):
        self._resources: Dict[str, weakref.ref] = {}
        self._loaders: Dict[str, Callable] = {}

    def register(self, name: str, loader: Callable):
        """Register a resource loader"""
        self._loaders[name] = loader

    def get(self, name: str):
        """Get a resource, loading it if necessary"""
        # Already loaded (and not yet garbage collected)?
        if name in self._resources:
            resource = self._resources[name]()
            if resource is not None:
                return resource

        # Load the resource
        if name not in self._loaders:
            raise ValueError(f"No loader for {name}")
        resource = self._loaders[name]()

        # Store a weak reference (allows garbage collection; note that
        # some built-in types, e.g. list, don't support weak references)
        self._resources[name] = weakref.ref(resource)
        return resource

# Usage
pool = ResourcePool()
pool.register('ml_model', lambda: load_ml_model())
pool.register('database', lambda: DatabaseConnection())

# First call: loads
model = pool.get('ml_model')

# Second call: reuses if still in memory
model = pool.get('ml_model')

# If garbage collected, reloads automatically
```

---

## Lazy Initialization

### Pattern: Singleton with Lazy Loading

```python
class Singleton:
    _instance = None

    def __new__(cls):
        # Lazy initialization: create only on first access
        if cls._instance is None:
            print("Creating singleton instance...")
            cls._instance = super().__new__(cls)
            cls._instance._initialize()
        return cls._instance

    def _initialize(self):
        """Heavy initialization deferred until first use"""
        self.data = self._load_heavy_data()
        self.connection = self._establish_connection()

    def _load_heavy_data(self):
        print("Loading heavy data...")
        return [...]  # Expensive operation

    def _establish_connection(self):
        print("Establishing connection...")
        return Connection()  # Expensive operation

# First call: initializes
instance1 = Singleton()  # Prints: Creating singleton instance...

# Subsequent calls: reuse the instance
instance2 = Singleton()  # No output, instant
```
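As written, two threads can race through `__new__` and initialize twice. A sketch of a thread-safe variant using double-checked locking; the `threading.Lock` guard is an addition, not part of the original pattern:

```python
import threading

class ThreadSafeSingleton:
    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        # First check avoids taking the lock on the hot path
        if cls._instance is None:
            with cls._lock:
                # Second check: another thread may have won the race
                if cls._instance is None:
                    instance = super().__new__(cls)
                    instance._initialize()
                    cls._instance = instance
        return cls._instance

    def _initialize(self):
        self.data = []  # Placeholder for expensive setup
```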
### Pattern: Lazy Properties

```python
class LazyProperty:
    """Descriptor for lazy property loading"""

    def __init__(self, func):
        self.func = func
        self.attr_name = f'_lazy_{func.__name__}'

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self

        # Check if already loaded
        if not hasattr(obj, self.attr_name):
            # Load and cache
            value = self.func(obj)
            setattr(obj, self.attr_name, value)

        return getattr(obj, self.attr_name)

class DataProcessor:
    @LazyProperty
    def expensive_resource(self):
        """Only loaded on first access"""
        print("Loading expensive resource...")
        return load_expensive_resource()

    @LazyProperty
    def ml_model(self):
        """Only loaded on first access"""
        print("Loading ML model...")
        return load_ml_model()

# Usage
processor = DataProcessor()  # Fast, nothing loaded

# First access: loads
resource = processor.expensive_resource  # Prints: Loading expensive resource...

# Second access: cached
resource = processor.expensive_resource  # No output, instant

# ML model not accessed = never loaded
```

### Pattern: Lazy Collections

```python
class LazyList:
    """List that loads its items on first access"""

    def __init__(self, loader_func):
        self._loader = loader_func
        self._items = None

    def _ensure_loaded(self):
        if self._items is None:
            print("Loading items...")
            self._items = self._loader()

    def __getitem__(self, index):
        self._ensure_loaded()
        return self._items[index]

    def __len__(self):
        self._ensure_loaded()
        return len(self._items)

    def __iter__(self):
        self._ensure_loaded()
        return iter(self._items)

# Usage
def load_large_dataset():
    print("Expensive database query...")
    return [1, 2, 3, 4, 5]

lazy_data = LazyList(load_large_dataset)  # Fast, nothing loaded

# First access triggers the load
print(lazy_data[0])  # Prints: Loading items... then 1

# Subsequent access uses the cache
print(lazy_data[1])  # Prints: 2 (no loading message)
```
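For properties like these, the standard library offers an equivalent since Python 3.8: `functools.cached_property` computes on first access and caches the value on the instance, replacing the hand-rolled descriptor above. A minimal sketch, reusing the same (undefined, illustrative) loader:

```python
from functools import cached_property

class DataProcessor:
    @cached_property
    def expensive_resource(self):
        """Computed once on first access, then stored on the instance."""
        print("Loading expensive resource...")
        return load_expensive_resource()  # hypothetical loader, as above
```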
---

## Advanced Techniques

### Technique 1: Asynchronous Lazy Loading

Load resources in the background without blocking:

```python
import asyncio
from typing import Any, Optional

class AsyncLazyLoader:
    def __init__(self, loader_coro_func):
        self._loader = loader_coro_func  # Coroutine *function*, called on demand
        self._value: Optional[Any] = None
        self._loading_task: Optional[asyncio.Task] = None

    async def get(self):
        """Get the value, loading it if necessary"""
        # Already loaded? (assumes the loaded value is never None)
        if self._value is not None:
            return self._value

        # Already loading?
        if self._loading_task is not None:
            return await self._loading_task

        # Start loading
        self._loading_task = asyncio.create_task(self._load())
        return await self._loading_task

    async def _load(self):
        """Perform the actual loading"""
        print("Loading resource...")
        self._value = await self._loader()
        return self._value

# Usage
async def load_api_data():
    await asyncio.sleep(2)  # Simulate a slow API
    return {"data": "loaded"}

async def main():
    loader = AsyncLazyLoader(load_api_data)

    # Multiple concurrent calls share the same load
    results = await asyncio.gather(
        loader.get(),  # Starts loading
        loader.get(),  # Awaits the same load
        loader.get()   # Awaits the same load
    )
    # Only loads once!

asyncio.run(main())
```

### Technique 2: Preemptive Loading

Start loading before it's needed, based on predictions:

```python
import asyncio

class PreemptiveLoader:
    def __init__(self):
        self._cache = {}
        self._loading = {}

    def preload(self, resource_name, loader_func):
        """Start loading in the background"""
        if resource_name not in self._cache and resource_name not in self._loading:
            self._loading[resource_name] = asyncio.create_task(
                self._load_resource(resource_name, loader_func)
            )

    async def _load_resource(self, name, loader):
        """Background loading"""
        self._cache[name] = await loader()
        del self._loading[name]

    async def get(self, resource_name, loader_func):
        """Get a resource (it may already be loaded!)"""
        # Already cached?
        if resource_name in self._cache:
            return self._cache[resource_name]

        # Currently loading?
        if resource_name in self._loading:
            await self._loading[resource_name]
            return self._cache[resource_name]

        # Start loading now
        self._cache[resource_name] = await loader_func()
        return self._cache[resource_name]

# Usage (inside an async context)
loader = PreemptiveLoader()

# User hovers over the "Advanced Features" button:
# predict they might click, start loading
loader.preload('advanced_features', load_advanced_features)

# User clicks the button:
# already loaded (or nearly done)!
features = await loader.get('advanced_features', load_advanced_features)
```

### Technique 3: Conditional Loading with Context

Load different resources based on context:

```python
class ContextAwareLoader:
    def __init__(self):
        self._loaded_modules = {}

    def load_for_context(self, context):
        """Load only the modules needed for this context"""
        required_modules = self._determine_required_modules(context)

        loaded = {}
        for module_name in required_modules:
            loaded[module_name] = self._get_or_load(module_name)
        return loaded

    def _determine_required_modules(self, context):
        """Figure out what's needed"""
        modules = ['core']  # Always needed

        if context.language == 'python':
            modules.extend(['python_linter', 'python_formatter'])
        if context.has_tests:
            modules.append('test_runner')
        if context.is_web_project:
            modules.extend(['http_server', 'browser_tools'])

        return modules

    def _get_or_load(self, module_name):
        """Lazy load with caching"""
        if module_name not in self._loaded_modules:
            print(f"Loading {module_name}...")
            # _import_module: app-specific dynamic import helper (not shown)
            self._loaded_modules[module_name] = self._import_module(module_name)
        return self._loaded_modules[module_name]

# Usage (Context is an illustrative configuration object)
loader = ContextAwareLoader()

# Python project: loads Python-specific tools
context = Context(language='python', has_tests=True)
modules = loader.load_for_context(context)
# Loads: core, python_linter, python_formatter, test_runner

# JavaScript project: different tools
context = Context(language='javascript', is_web_project=True)
modules = loader.load_for_context(context)
# Loads: core, http_server, browser_tools
```

### Technique 4: Priority-Based Loading

Load resources by priority, deferring low-priority items:

```python
import asyncio
from enum import Enum

class Priority(Enum):
    CRITICAL = 1
    HIGH = 2
    MEDIUM = 3
    LOW = 4

class PriorityLoader:
    def __init__(self):
        self._resources = {}
        self._load_queue = {p: [] for p in Priority}

    def register(self, name, loader, priority=Priority.MEDIUM):
        """Register a resource to load"""
        self._load_queue[priority].append((name, loader))

    async def load_by_priority(self):
        """Load resources in priority order"""
        for priority in Priority:
            tasks = []
            for name, loader in self._load_queue[priority]:
                tasks.append(self._load_resource(name, loader))

            # Load all items at this priority level
            await asyncio.gather(*tasks)

            # Yield to the event loop between priority levels
            await asyncio.sleep(0)

    async def _load_resource(self, name, loader):
        """Load a single resource"""
        print(f"Loading {name}...")
        self._resources[name] = await loader()

# Usage (inside an async context)
loader = PriorityLoader()

# Register resources with priorities
loader.register('config', load_config, Priority.CRITICAL)
loader.register('logger', setup_logging, Priority.CRITICAL)
loader.register('database', connect_db, Priority.HIGH)
loader.register('cache', setup_cache, Priority.HIGH)
loader.register('analytics', init_analytics, Priority.LOW)
loader.register('ml_model', load_ml_model, Priority.LOW)

# Load in priority order
await loader.load_by_priority()
# Order: config, logger (critical) → database, cache (high) → analytics, ml_model (low)
```
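A minimal runnable sketch of the priority ordering, using the `PriorityLoader` and `Priority` classes above with hypothetical stub loaders in place of the real `load_config`, `connect_db`, etc.:

```python
import asyncio

def make_stub(name):
    async def loader():
        await asyncio.sleep(0.01)  # Simulate I/O
        return f"<{name}>"
    return loader

async def main():
    loader = PriorityLoader()
    loader.register('config', make_stub('config'), Priority.CRITICAL)
    loader.register('database', make_stub('database'), Priority.HIGH)
    loader.register('analytics', make_stub('analytics'), Priority.LOW)

    await loader.load_by_priority()
    # Prints "Loading config..." before "Loading database..."
    # before "Loading analytics..."

asyncio.run(main())
```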
---

## MCP Skills Implementation

### Pattern: Lazy MCP Skill Loading

```python
from pathlib import Path

class MCPSkillManager:
    """Lazy loading manager for MCP skills"""

    def __init__(self):
        self._skills = {}
        self._skill_metadata = self._scan_available_skills()

    def _scan_available_skills(self):
        """Quick scan: only read metadata, don't load skills"""
        metadata = {}
        for skill_file in Path('skills').glob('*.md'):
            # Parse the YAML frontmatter only (fast); see the sketch below
            meta = self._parse_frontmatter(skill_file)
            metadata[meta['name']] = {
                'file': skill_file,
                'triggers': meta.get('triggers', []),
                'description': meta.get('description', '')
            }
        return metadata

    def get_skill(self, skill_name):
        """Get a skill, loading it on first access"""
        # Already loaded?
        if skill_name in self._skills:
            return self._skills[skill_name]

        # Load now
        if skill_name not in self._skill_metadata:
            raise ValueError(f"Skill {skill_name} not found")

        skill = self._load_skill(skill_name)
        self._skills[skill_name] = skill
        return skill

    def _load_skill(self, skill_name):
        """Actually load the skill (expensive)"""
        meta = self._skill_metadata[skill_name]
        skill_file = meta['file']

        print(f"Loading skill: {skill_name}")

        # Read the full content
        content = skill_file.read_text()

        # Initialize the skill (Skill is an application class, not shown)
        return Skill(name=skill_name, content=content, metadata=meta)

    def find_skills_for_trigger(self, trigger):
        """Find skills that match a trigger (no loading!)"""
        matches = []
        for name, meta in self._skill_metadata.items():
            if trigger in meta['triggers']:
                matches.append(name)
        return matches

# Usage
manager = MCPSkillManager()  # Fast, scans metadata only

# User types "debug"
matching_skills = manager.find_skills_for_trigger('debug')  # Fast, no loading
# Returns: ['python-debugger', 'javascript-debugger']

# User selects 'python-debugger'
skill = manager.get_skill('python-debugger')  # Loads now
skill.execute()
```
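`_parse_frontmatter` is referenced above but not defined here. A minimal sketch, shown as a standalone function and assuming skill files begin with a YAML frontmatter block delimited by `---` lines, with PyYAML available:

```python
import yaml  # PyYAML, assumed available

def _parse_frontmatter(skill_file):
    """Read only the YAML frontmatter between the leading '---' lines."""
    text = skill_file.read_text()
    if not text.startswith('---'):
        return {}
    # Split off the block between the first two '---' delimiters
    _, frontmatter, _ = text.split('---', 2)
    return yaml.safe_load(frontmatter) or {}
```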
### Pattern: Progressive Skill Loading

```python
import asyncio

class ProgressiveSkillLoader:
    """Load skills progressively based on usage patterns"""

    def __init__(self):
        self.tier1_skills = []  # Always loaded
        self.tier2_skills = []  # Load after startup
        self.tier3_skills = []  # Load on demand

    async def initialize(self):
        """Fast startup with tiered loading"""
        # Tier 1: Essential skills (load immediately)
        self.tier1_skills = [
            await self._load_skill('basic-search'),
            await self._load_skill('file-operations')
        ]

        # Tier 2: Common skills (load in background)
        asyncio.create_task(self._load_tier2())

        # Tier 3: Specialized skills (load on request)
        # Not loaded yet!

    async def _load_tier2(self):
        """Background loading of common skills"""
        await asyncio.sleep(0)  # Yield to the event loop
        self.tier2_skills = [
            await self._load_skill('git-operations'),
            await self._load_skill('code-analysis')
        ]

    async def get_skill(self, skill_name):
        """Get a skill from the appropriate tier"""
        # Check the already-loaded tiers, including previously
        # requested Tier 3 skills (avoids reloading duplicates)
        for skill in (*self.tier1_skills, *self.tier2_skills, *self.tier3_skills):
            if skill.name == skill_name:
                return skill

        # Tier 3: Load on demand
        skill = await self._load_skill(skill_name)
        self.tier3_skills.append(skill)
        return skill
```

---

## Performance Optimization

### Optimization 1: Memoization

Cache expensive computations:

```python
from functools import lru_cache

@lru_cache(maxsize=128)
def expensive_calculation(n):
    """Result cached automatically"""
    print(f"Computing for {n}...")
    # Expensive operation
    return n ** 2

# First call: computes
result = expensive_calculation(5)  # Prints: Computing for 5...

# Second call: cached
result = expensive_calculation(5)  # No output, instant
```

### Optimization 2: Time-Based Caching

Refresh the cache periodically:

```python
from datetime import datetime, timedelta

class TimedCache:
    def __init__(self, loader_func, ttl_seconds=300):
        self.loader = loader_func
        self.ttl = timedelta(seconds=ttl_seconds)
        self._cached_value = None
        self._cached_time = None

    def get(self):
        """Get the value, reloading if expired"""
        now = datetime.now()

        # Cache miss or expired?
        if (self._cached_value is None
                or self._cached_time is None
                or now - self._cached_time > self.ttl):
            print("Reloading from source...")
            self._cached_value = self.loader()
            self._cached_time = now

        return self._cached_value

# Usage
def load_api_data():
    return fetch_from_api()  # Illustrative API call

cache = TimedCache(load_api_data, ttl_seconds=60)

# First call: loads
data = cache.get()  # Prints: Reloading from source...

# Within 60 seconds: cached
data = cache.get()  # No output

# After 60 seconds: reloads
data = cache.get()  # Prints: Reloading from source...
```

### Optimization 3: Load Monitoring

Track what gets loaded:

```python
import sys
import time
from datetime import datetime

class MonitoredLoader:
    def __init__(self):
        self._load_stats = {}

    def load(self, name, loader_func):
        """Load with performance monitoring"""
        start = time.perf_counter()  # Monotonic clock for timing
        result = loader_func()
        elapsed = time.perf_counter() - start

        # Record stats (sys.getsizeof reports shallow size only)
        self._load_stats[name] = {
            'load_time': elapsed,
            'loaded_at': datetime.now(),
            'size': sys.getsizeof(result)
        }

        print(f"Loaded {name} in {elapsed:.2f}s ({self._load_stats[name]['size']} bytes)")
        return result

    def print_stats(self):
        """Show loading statistics"""
        print("\nLoad Statistics:")
        for name, stats in self._load_stats.items():
            print(f"  {name}: {stats['load_time']:.2f}s, {stats['size']} bytes")

# Usage
loader = MonitoredLoader()

ml_model = loader.load('ml_model', load_ml_model)
database = loader.load('database', connect_database)

loader.print_stats()
# Load Statistics:
#   ml_model: 3.45s, 524288 bytes
#   database: 0.12s, 1024 bytes
```
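For the memoized functions from Optimization 1, the standard library already tracks comparable statistics: functions wrapped with `functools.lru_cache` expose hit and miss counts via `cache_info()`. A small example:

```python
from functools import lru_cache

@lru_cache(maxsize=128)
def expensive_calculation(n):
    return n ** 2

expensive_calculation(5)  # Miss: computed
expensive_calculation(5)  # Hit: cached
print(expensive_calculation.cache_info())
# CacheInfo(hits=1, misses=1, maxsize=128, currsize=1)
```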
---

## Related Concepts

### Deferred Loading ← [Dynamic Manifests](./dynamic-manifests.md)

- Dynamic manifests: tell you **what's available**
- Deferred loading: determines **when to load**

Flow:

1. Dynamic manifest query: "These tools are available"
2. Deferred loading decision: "Don't load yet, wait for first use"
3. User requests a tool
4. Deferred loading: "Now load the tool"

### Deferred Loading ← [Progressive Disclosure](./progressive-disclosure.md)

- Progressive disclosure: **what to show** to users
- Deferred loading: **when to initialize** resources

Example:

- Progressive disclosure: "Show basic features, hide advanced"
- Deferred loading: "Don't load advanced feature code until the user accesses it"

Both patterns work together:

```
User opens app
  ↓
[Progressive Disclosure] → UI shows: Basic features only
[Deferred Loading]       → Code loaded: Basic modules only

User clicks "Advanced"
  ↓
[Progressive Disclosure] → UI reveals: Advanced features
[Deferred Loading]       → Code loads: Advanced modules (now)
```

---

## Summary

Deferred loading optimizes performance by:

1. **Lazy Initialization**: Create objects only when needed
2. **Tiered Loading**: Load critical resources first, others later
3. **Code Splitting**: Split code into chunks loaded on demand
4. **Caching**: Reuse loaded resources efficiently
5. **Monitoring**: Track what gets loaded and when

Key principle: **Don't pay for what you don't use**

---

**Navigation**: [← Dynamic Manifests](./dynamic-manifests.md) | [↑ Best Practices](../README.md) | [→ Progressive Disclosure](./progressive-disclosure.md)

**Last Updated**: 2025-10-20