Deferred Loading
Definition: A performance optimization pattern that postpones the initialization, loading, or execution of resources until they are actually needed, reducing startup time and memory consumption.
Navigation: ← Dynamic Manifests | ↑ Best Practices | → Progressive Disclosure
Table of Contents
- What Is It? ← Start here
- Why Defer?
- Basic Patterns ← Quick implementation
- Strategies ← For practitioners
- Lazy Initialization
- Advanced Techniques ← For architects
- MCP Skills Implementation
- Performance Optimization
What Is It?
Deferred loading delays resource initialization until first use.
Visual Comparison
    EAGER LOADING (Traditional)
    ─────────────────────────────────────
    App Starts
         ↓
    Load Module A ──────── 2s
    Load Module B ──────── 3s
    Load Module C ──────── 2s
    Load Module D ──────── 1s
    Load Module E ──────── 2s
         ↓
    App Ready ──────────── 10s total
    Memory: 500 MB

    User uses Module A only
    → Modules B, C, D, E wasted startup time

    DEFERRED LOADING (Optimized)
    ─────────────────────────────────────
    App Starts
         ↓
    Minimal Initialization ── 0.5s
         ↓
    App Ready ──────────── 0.5s total
    Memory: 50 MB

    User requests Module A
         ↓
    Load Module A ──────── 2s
    Use Module A
    Memory: 150 MB

    (Modules B, C, D, E never loaded!)
Key Metrics
| Metric | Eager Loading | Deferred Loading | Improvement |
|---|---|---|---|
| Startup Time | 10s | 0.5s | 95% faster |
| Initial Memory | 500 MB | 50 MB | 90% less |
| Time to First Use | 10s | 2.5s | 75% faster |
| Unused Resource Waste | High | Zero | Eliminated |
Why Defer?
Problem 1: Slow Startup
    # BAD: Load everything at startup
    import heavy_ml_library   # 5 seconds
    import video_processing   # 3 seconds
    import blockchain_tools   # 4 seconds
    import data_analysis      # 2 seconds

    # User wants to do simple text search...
    # But waited 14 seconds for unused features!
Problem 2: Memory Waste
    // BAD: Initialize all components
    const videoEditor = new VideoEditor();       // 200 MB
    const imageProcessor = new ImageProcessor(); // 150 MB
    const audioMixer = new AudioMixer();         // 100 MB

    // User only uses text editor
    // But 450 MB wasted on unused features!
Problem 3: Unused Resources
    // BAD: Connect to all services at startup
    DatabaseConnection db = new DatabaseConnection();
    CacheService cache = new CacheService();
    PaymentGateway payments = new PaymentGateway();
    EmailService email = new EmailService();
    AnalyticsService analytics = new AnalyticsService();

    // User views read-only content
    // But connected to write services unnecessarily!
The Solution
Defer everything non-essential:
- ✅ Fast startup
- ✅ Low memory footprint
- ✅ Load only what's used
- ✅ Better resource utilization
Basic Patterns
Pattern 1: Lazy Import (Python)
Bad: Eager Import
    # At module level - loads immediately
    import pandas as pd
    import numpy as np
    import tensorflow as tf

    def analyze_data(data):
        df = pd.DataFrame(data)
        return df.describe()
Good: Deferred Import
    # No imports at module level
    def analyze_data(data):
        # Import only when function is called
        import pandas as pd
        df = pd.DataFrame(data)
        return df.describe()
Even Better: Cached Lazy Import
    _pandas = None

    def get_pandas():
        global _pandas
        if _pandas is None:
            import pandas as pd
            _pandas = pd
        return _pandas

    def analyze_data(data):
        pd = get_pandas()  # First call imports, subsequent calls use cache
        df = pd.DataFrame(data)
        return df.describe()
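A module-level variant of the same idea uses PEP 562 (Python 3.7+), which lets a module define `__getattr__` so a heavy import runs only when the attribute is first touched. A minimal sketch:

    # utils.py
    def __getattr__(name):
        # Called only when `name` isn't already defined in the module
        if name == "pd":
            import pandas as pd
            globals()["pd"] = pd  # cache: later lookups skip __getattr__
            return pd
        raise AttributeError(f"module {__name__!r} has no attribute {name!r}")

Code that does `from utils import pd` (or reads `utils.pd`) triggers the import on first access instead of at startup.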
Pattern 2: Lazy Initialization (JavaScript)
Bad: Eager Initialization
    class DataService {
      constructor() {
        // Initialize immediately
        this.database = new DatabaseConnection();
        this.cache = new CacheLayer();
        this.validator = new DataValidator();
      }
    }

    // Even if never used, all initialized!
    const service = new DataService();
Good: Deferred Initialization
    class DataService {
      constructor() {
        // Don't initialize anything yet
        this._database = null;
        this._cache = null;
        this._validator = null;
      }

      get database() {
        // Initialize on first access
        if (!this._database) {
          this._database = new DatabaseConnection();
        }
        return this._database;
      }

      get cache() {
        if (!this._cache) {
          this._cache = new CacheLayer();
        }
        return this._cache;
      }

      get validator() {
        if (!this._validator) {
          this._validator = new DataValidator();
        }
        return this._validator;
      }
    }

    const service = new DataService(); // Fast, nothing initialized
    service.database.query(...);       // Now database is initialized
Pattern 3: Function Decorators (Python)
Decorator for Lazy Loading
    from functools import wraps

    def lazy_load(loader_func):
        """Decorator that defers execution until first call"""
        _cached = None

        @wraps(loader_func)
        def wrapper(*args, **kwargs):
            nonlocal _cached
            if _cached is None:
                _cached = loader_func(*args, **kwargs)
            # Note: arguments passed after the first call are ignored,
            # so this suits zero-argument loaders
            return _cached
        return wrapper

    # Usage
    @lazy_load
    def load_ml_model():
        print("Loading ML model...")  # Only prints once
        import tensorflow as tf
        return tf.keras.models.load_model('model.h5')

    # First call: loads model
    model = load_ml_model()

    # Subsequent calls: returns cached model
    model = load_ml_model()  # Instant, uses cache
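For a zero-argument loader like this, the standard library already provides the caching: `functools.cache` (Python 3.9+, equivalent to `lru_cache(maxsize=None)`) memoizes the result without a hand-written wrapper:

    from functools import cache

    @cache  # Python 3.9+; use lru_cache(maxsize=None) on older versions
    def load_ml_model():
        import tensorflow as tf
        return tf.keras.models.load_model('model.h5')

One difference from the `lazy_load` sketch above: `cache` keys on the arguments, so a loader called with different arguments keeps one cached result per argument combination.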
Strategies
Strategy 1: Tiered Loading
Load resources in priority order:
    import asyncio

    class Application:
        def __init__(self):
            # Tier 1: Critical (load immediately)
            self.config = load_config()
            self.logger = setup_logging()

            # Tier 2: Important (load after startup)
            self._core_modules = None

            # Tier 3: Optional (load on demand)
            self._advanced_features = {}

        async def start(self):
            """Fast startup - only Tier 1"""
            print("App ready!")

            # Tier 2: Load in background
            asyncio.create_task(self._load_core_modules())

        async def _load_core_modules(self):
            """Load Tier 2 in background"""
            await asyncio.sleep(0)  # Yield to event loop
            self._core_modules = load_core_modules()

        def get_feature(self, feature_name):
            """Tier 3: Load on explicit request"""
            if feature_name not in self._advanced_features:
                self._advanced_features[feature_name] = load_feature(feature_name)
            return self._advanced_features[feature_name]
Strategy 2: Dependency-Based Loading
Load dependencies only when needed:
    import sys

    class SkillManager:
        def __init__(self):
            self.skills = {}
            # Importable module names (e.g. sklearn, not the pip
            # package name scikit-learn)
            self.skill_dependencies = {
                'data_analysis': ['pandas', 'numpy'],
                'ml_training': ['tensorflow', 'sklearn'],
                'web_scraping': ['requests', 'bs4']
            }

        def load_skill(self, skill_name):
            """Load skill and its dependencies on demand"""
            if skill_name in self.skills:
                return self.skills[skill_name]

            # Load dependencies first
            deps = self.skill_dependencies.get(skill_name, [])
            for dep in deps:
                self._ensure_dependency(dep)

            # Load skill
            skill = self._import_skill(skill_name)
            self.skills[skill_name] = skill
            return skill

        def _ensure_dependency(self, dep_name):
            """Lazy load dependency"""
            if dep_name not in sys.modules:
                __import__(dep_name)

        def _import_skill(self, skill_name):
            """Dynamically import skill module"""
            module = __import__(f'skills.{skill_name}', fromlist=['Skill'])
            return module.Skill()
Strategy 3: Code Splitting (JavaScript)
Split code into chunks loaded on demand:
    // main.js - Always loaded
    import { setupApp } from './core.js';
    setupApp();

    // Features loaded on demand
    async function enableAdvancedMode() {
      // Dynamic import - only loads when called
      const { AdvancedFeatures } = await import('./advanced.js');
      return new AdvancedFeatures();
    }

    async function startVideoEditing() {
      // Large dependency loaded only when needed
      const { VideoEditor } = await import('./video-editor.js');
      return new VideoEditor();
    }

    // User clicks "Advanced Mode"
    button.onclick = async () => {
      const features = await enableAdvancedMode();
      features.activate();
    };
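Python has no bundler-driven code splitting, but `importlib.import_module` gives the same load-on-demand effect for heavy modules. A small sketch (the `video_editor` module name is illustrative):

    import importlib

    def start_video_editing():
        # Heavy module imported only when the feature is invoked;
        # importlib caches it in sys.modules after the first call
        video_editor = importlib.import_module('video_editor')
        return video_editor.VideoEditor()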
Strategy 4: Resource Pooling
Reuse loaded resources efficiently:
    from typing import Callable, Dict
    import weakref

    class ResourcePool:
        """Pool of lazily-loaded resources with weak references"""

        def __init__(self):
            self._resources: Dict[str, weakref.ref] = {}
            self._loaders: Dict[str, Callable] = {}

        def register(self, name: str, loader: Callable):
            """Register a resource loader"""
            self._loaders[name] = loader

        def get(self, name: str):
            """Get resource, loading if necessary"""
            # Check if already loaded
            if name in self._resources:
                resource = self._resources[name]()
                if resource is not None:
                    return resource

            # Load resource
            if name not in self._loaders:
                raise ValueError(f"No loader for {name}")
            resource = self._loaders[name]()

            # Store weak reference (allows garbage collection)
            self._resources[name] = weakref.ref(resource)
            return resource

    # Usage
    pool = ResourcePool()
    pool.register('ml_model', lambda: load_ml_model())
    pool.register('database', lambda: DatabaseConnection())

    # First call: loads
    model = pool.get('ml_model')

    # Second call: reuses if still in memory
    model = pool.get('ml_model')

    # If garbage collected, reloads automatically
Lazy Initialization
Pattern: Singleton with Lazy Loading
    class Singleton:
        _instance = None

        def __new__(cls):
            # Lazy initialization: create only on first access
            if cls._instance is None:
                print("Creating singleton instance...")
                cls._instance = super().__new__(cls)
                cls._instance._initialize()
            return cls._instance

        def _initialize(self):
            """Heavy initialization deferred until first use"""
            self.data = self._load_heavy_data()
            self.connection = self._establish_connection()

        def _load_heavy_data(self):
            print("Loading heavy data...")
            return [...]  # Expensive operation

        def _establish_connection(self):
            print("Establishing connection...")
            return Connection()  # Expensive operation

    # First call: initializes
    instance1 = Singleton()  # Prints: Creating singleton instance...

    # Subsequent calls: reuses
    instance2 = Singleton()  # No output, instant
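One caveat: the check-then-create in `__new__` is not thread-safe; two threads can both observe `_instance is None` and initialize twice. If instances may be created from multiple threads, a double-checked locking sketch:

    import threading

    class ThreadSafeSingleton:
        _instance = None
        _lock = threading.Lock()

        def __new__(cls):
            # Unlocked check keeps the common (already-created) path cheap
            if cls._instance is None:
                with cls._lock:
                    # Re-check under the lock: another thread may have
                    # created the instance while we waited
                    if cls._instance is None:
                        cls._instance = super().__new__(cls)
                        cls._instance._initialize()  # as in the example above
            return cls._instance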
Pattern: Lazy Properties
    class LazyProperty:
        """Descriptor for lazy property loading"""

        def __init__(self, func):
            self.func = func
            self.attr_name = f'_lazy_{func.__name__}'

        def __get__(self, obj, objtype=None):
            if obj is None:
                return self

            # Check if already loaded
            if not hasattr(obj, self.attr_name):
                # Load and cache
                value = self.func(obj)
                setattr(obj, self.attr_name, value)
            return getattr(obj, self.attr_name)

    class DataProcessor:
        @LazyProperty
        def expensive_resource(self):
            """Only loaded on first access"""
            print("Loading expensive resource...")
            return load_expensive_resource()

        @LazyProperty
        def ml_model(self):
            """Only loaded on first access"""
            print("Loading ML model...")
            return load_ml_model()

    # Usage
    processor = DataProcessor()  # Fast, nothing loaded

    # First access: loads
    resource = processor.expensive_resource  # Prints: Loading expensive resource...

    # Second access: cached
    resource = processor.expensive_resource  # No output, instant

    # ML model not accessed = never loaded
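For the common case, the standard library implements this descriptor for you: `functools.cached_property` (Python 3.8+) computes on first access and stores the result on the instance:

    from functools import cached_property

    class DataProcessor:
        @cached_property
        def ml_model(self):
            # Runs once; the result is stored on the instance and
            # read directly on later accesses
            print("Loading ML model...")
            return load_ml_model()

The hand-rolled `LazyProperty` remains useful when you need custom behavior (invalidation, logging), but `cached_property` covers the plain case.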
Pattern: Lazy Collections
    class LazyList:
        """List that loads items on first access"""

        def __init__(self, loader_func):
            self._loader = loader_func
            self._items = None

        def _ensure_loaded(self):
            if self._items is None:
                print("Loading items...")
                self._items = self._loader()

        def __getitem__(self, index):
            self._ensure_loaded()
            return self._items[index]

        def __len__(self):
            self._ensure_loaded()
            return len(self._items)

        def __iter__(self):
            self._ensure_loaded()
            return iter(self._items)

    # Usage
    def load_large_dataset():
        print("Expensive database query...")
        return [1, 2, 3, 4, 5]

    lazy_data = LazyList(load_large_dataset)  # Fast, nothing loaded

    # First access triggers load
    print(lazy_data[0])  # Prints: Loading items... then 1

    # Subsequent access uses cache
    print(lazy_data[1])  # Prints: 2 (no loading message)
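`LazyList` defers the first load but still materializes everything at once. When items can be produced incrementally, a generator defers every element, so only consumed rows are ever loaded. A sketch (`fetch_rows` and `handle` are hypothetical):

    def iter_large_dataset(batch_size=1000):
        """Yield rows batch by batch instead of building the full list."""
        offset = 0
        while True:
            batch = fetch_rows(offset, batch_size)  # hypothetical query
            if not batch:
                return
            yield from batch
            offset += batch_size

    # Nothing is queried until iteration begins, and stopping early
    # skips the remaining batches entirely
    for row in iter_large_dataset():
        handle(row)  # hypothetical consumer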
Advanced Techniques
Technique 1: Asynchronous Lazy Loading
Load resources in background without blocking:
    import asyncio
    from typing import Any, Optional

    class AsyncLazyLoader:
        def __init__(self, loader_func):
            # Store the coroutine *function*, not a coroutine object,
            # so it can be called and awaited when loading starts
            self._loader = loader_func
            self._value: Optional[Any] = None
            self._loading_task: Optional[asyncio.Task] = None

        async def get(self):
            """Get value, loading if necessary"""
            # Already loaded?
            if self._value is not None:
                return self._value

            # Already loading?
            if self._loading_task is not None:
                return await self._loading_task

            # Start loading
            self._loading_task = asyncio.create_task(self._load())
            return await self._loading_task

        async def _load(self):
            """Perform the actual loading"""
            print("Loading resource...")
            self._value = await self._loader()
            return self._value

    # Usage (inside an async function)
    async def load_api_data():
        await asyncio.sleep(2)  # Simulate slow API
        return {"data": "loaded"}

    loader = AsyncLazyLoader(load_api_data)

    # Multiple concurrent calls share same load
    results = await asyncio.gather(
        loader.get(),  # Starts loading
        loader.get(),  # Waits for same load
        loader.get()   # Waits for same load
    )
    # Only loads once!
Technique 2: Preemptive Loading
Start loading before needed, based on predictions:
    import asyncio

    class PreemptiveLoader:
        def __init__(self):
            self._cache = {}
            self._loading = {}

        def preload(self, resource_name, loader_func):
            """Start loading in background"""
            if resource_name not in self._cache and resource_name not in self._loading:
                self._loading[resource_name] = asyncio.create_task(
                    self._load_resource(resource_name, loader_func)
                )

        async def _load_resource(self, name, loader):
            """Background loading"""
            self._cache[name] = await loader()
            del self._loading[name]

        async def get(self, resource_name, loader_func):
            """Get resource (may already be loaded!)"""
            # Already cached?
            if resource_name in self._cache:
                return self._cache[resource_name]

            # Currently loading?
            if resource_name in self._loading:
                await self._loading[resource_name]
                return self._cache[resource_name]

            # Start loading now
            self._cache[resource_name] = await loader_func()
            return self._cache[resource_name]

    # Usage (inside an async function)
    loader = PreemptiveLoader()

    # User hovers over "Advanced Features" button
    # Predict they might click, start loading
    loader.preload('advanced_features', load_advanced_features)

    # User clicks button
    # Already loaded (or nearly done)!
    features = await loader.get('advanced_features', load_advanced_features)
Technique 3: Conditional Loading with Context
Load different resources based on context:
    import importlib

    class ContextAwareLoader:
        def __init__(self):
            self._loaded_modules = {}

        def load_for_context(self, context):
            """Load only modules needed for this context"""
            required_modules = self._determine_required_modules(context)

            loaded = {}
            for module_name in required_modules:
                loaded[module_name] = self._get_or_load(module_name)
            return loaded

        def _determine_required_modules(self, context):
            """Figure out what's needed"""
            modules = ['core']  # Always needed

            if context.language == 'python':
                modules.extend(['python_linter', 'python_formatter'])
            if context.has_tests:
                modules.append('test_runner')
            if context.is_web_project:
                modules.extend(['http_server', 'browser_tools'])
            return modules

        def _get_or_load(self, module_name):
            """Lazy load with caching"""
            if module_name not in self._loaded_modules:
                print(f"Loading {module_name}...")
                self._loaded_modules[module_name] = importlib.import_module(module_name)
            return self._loaded_modules[module_name]

    # Usage
    loader = ContextAwareLoader()

    # Python project: loads Python-specific tools
    context = Context(language='python', has_tests=True)
    modules = loader.load_for_context(context)
    # Loads: core, python_linter, python_formatter, test_runner

    # JavaScript project: different tools
    context = Context(language='javascript', is_web_project=True)
    modules = loader.load_for_context(context)
    # Loads: core, http_server, browser_tools
Technique 4: Priority-Based Loading
Load resources by priority, deferring low-priority items:
    import asyncio
    from enum import Enum

    class Priority(Enum):
        CRITICAL = 1
        HIGH = 2
        MEDIUM = 3
        LOW = 4

    class PriorityLoader:
        def __init__(self):
            self._resources = {}
            self._load_queue = {p: [] for p in Priority}

        def register(self, name, loader, priority=Priority.MEDIUM):
            """Register a resource to load"""
            self._load_queue[priority].append((name, loader))

        async def load_by_priority(self):
            """Load resources in priority order"""
            for priority in Priority:
                tasks = []
                for name, loader in self._load_queue[priority]:
                    tasks.append(self._load_resource(name, loader))

                # Load all items at this priority level
                await asyncio.gather(*tasks)

                # Yield to event loop between priority levels
                await asyncio.sleep(0)

        async def _load_resource(self, name, loader):
            """Load a single resource"""
            print(f"Loading {name}...")
            self._resources[name] = await loader()

    # Usage (inside an async function)
    loader = PriorityLoader()

    # Register resources with priorities
    loader.register('config', load_config, Priority.CRITICAL)
    loader.register('logger', setup_logging, Priority.CRITICAL)
    loader.register('database', connect_db, Priority.HIGH)
    loader.register('cache', setup_cache, Priority.HIGH)
    loader.register('analytics', init_analytics, Priority.LOW)
    loader.register('ml_model', load_ml_model, Priority.LOW)

    # Load in priority order
    await loader.load_by_priority()
    # Order: config, logger (critical) → database, cache (high) → analytics, ml_model (low)
MCP Skills Implementation
Pattern: Lazy MCP Skill Loading
    from pathlib import Path

    class MCPSkillManager:
        """Lazy loading manager for MCP skills"""

        def __init__(self):
            self._skills = {}
            self._skill_metadata = self._scan_available_skills()

        def _scan_available_skills(self):
            """Quick scan: only read metadata, don't load skills"""
            metadata = {}
            for skill_file in Path('skills').glob('*.md'):
                # Parse YAML frontmatter only (fast)
                meta = self._parse_frontmatter(skill_file)
                metadata[meta['name']] = {
                    'file': skill_file,
                    'triggers': meta.get('triggers', []),
                    'description': meta.get('description', '')
                }
            return metadata

        def get_skill(self, skill_name):
            """Get skill, loading on first access"""
            # Already loaded?
            if skill_name in self._skills:
                return self._skills[skill_name]

            # Load now
            if skill_name not in self._skill_metadata:
                raise ValueError(f"Skill {skill_name} not found")

            skill = self._load_skill(skill_name)
            self._skills[skill_name] = skill
            return skill

        def _load_skill(self, skill_name):
            """Actually load the skill (expensive)"""
            meta = self._skill_metadata[skill_name]
            skill_file = meta['file']

            print(f"Loading skill: {skill_name}")

            # Read full content
            content = skill_file.read_text()

            # Initialize skill
            return Skill(name=skill_name, content=content, metadata=meta)

        def find_skills_for_trigger(self, trigger):
            """Find skills that match trigger (no loading!)"""
            matches = []
            for name, meta in self._skill_metadata.items():
                if trigger in meta['triggers']:
                    matches.append(name)
            return matches

    # Usage
    manager = MCPSkillManager()  # Fast, scans metadata only

    # User types "debug"
    matching_skills = manager.find_skills_for_trigger('debug')  # Fast, no loading
    # Returns: ['python-debugger', 'javascript-debugger']

    # User selects 'python-debugger'
    skill = manager.get_skill('python-debugger')  # Loads now
    skill.execute()
Pattern: Progressive Skill Loading
    import asyncio

    class ProgressiveSkillLoader:
        """Load skills progressively based on usage patterns"""

        def __init__(self):
            self.tier1_skills = []  # Always loaded
            self.tier2_skills = []  # Load after startup
            self.tier3_skills = []  # Load on demand

        async def initialize(self):
            """Fast startup with tiered loading"""
            # Tier 1: Essential skills (load immediately)
            self.tier1_skills = [
                await self._load_skill('basic-search'),
                await self._load_skill('file-operations')
            ]

            # Tier 2: Common skills (load in background)
            asyncio.create_task(self._load_tier2())

            # Tier 3: Specialized skills (load on request)
            # Not loaded yet!

        async def _load_tier2(self):
            """Background loading of common skills"""
            await asyncio.sleep(0)  # Yield to event loop
            self.tier2_skills = [
                await self._load_skill('git-operations'),
                await self._load_skill('code-analysis')
            ]

        async def get_skill(self, skill_name):
            """Get skill from appropriate tier"""
            # Check Tier 1
            for skill in self.tier1_skills:
                if skill.name == skill_name:
                    return skill

            # Check Tier 2
            for skill in self.tier2_skills:
                if skill.name == skill_name:
                    return skill

            # Tier 3: Load on demand
            skill = await self._load_skill(skill_name)
            self.tier3_skills.append(skill)
            return skill
Performance Optimization
Optimization 1: Memoization
Cache expensive computations:
    from functools import lru_cache

    @lru_cache(maxsize=128)
    def expensive_calculation(n):
        """Result cached automatically"""
        print(f"Computing for {n}...")
        # Expensive operation
        return n ** 2

    # First call: computes
    result = expensive_calculation(5)  # Prints: Computing for 5...

    # Second call: cached
    result = expensive_calculation(5)  # No output, instant
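`lru_cache` also exposes statistics, which makes it easy to verify the cache is actually being hit:

    print(expensive_calculation.cache_info())
    # e.g. CacheInfo(hits=1, misses=1, maxsize=128, currsize=1)

    expensive_calculation.cache_clear()  # reset if the underlying data changes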
Optimization 2: Time-Based Caching
Refresh cache periodically:
    from datetime import datetime, timedelta

    class TimedCache:
        def __init__(self, loader_func, ttl_seconds=300):
            self.loader = loader_func
            self.ttl = timedelta(seconds=ttl_seconds)
            self._cached_value = None
            self._cached_time = None

        def get(self):
            """Get value, reloading if expired"""
            now = datetime.now()

            # Cache miss or expired?
            if (self._cached_value is None or
                    self._cached_time is None or
                    now - self._cached_time > self.ttl):
                print("Reloading from source...")
                self._cached_value = self.loader()
                self._cached_time = now

            return self._cached_value

    # Usage
    def load_api_data():
        return fetch_from_api()

    cache = TimedCache(load_api_data, ttl_seconds=60)

    # First call: loads
    data = cache.get()  # Prints: Reloading from source...

    # Within 60 seconds: cached
    data = cache.get()  # No output

    # After 60 seconds: reloads
    data = cache.get()  # Prints: Reloading from source...
Optimization 3: Load Monitoring
Track what gets loaded:
    import sys
    import time
    from datetime import datetime

    class MonitoredLoader:
        def __init__(self):
            self._load_stats = {}

        def load(self, name, loader_func):
            """Load with performance monitoring"""
            start = time.time()
            result = loader_func()
            elapsed = time.time() - start

            # Record stats (note: sys.getsizeof is shallow - it does
            # not count objects the result merely references)
            self._load_stats[name] = {
                'load_time': elapsed,
                'loaded_at': datetime.now(),
                'size': sys.getsizeof(result)
            }

            print(f"Loaded {name} in {elapsed:.2f}s ({self._load_stats[name]['size']} bytes)")
            return result

        def print_stats(self):
            """Show loading statistics"""
            print("\nLoad Statistics:")
            for name, stats in self._load_stats.items():
                print(f"  {name}: {stats['load_time']:.2f}s, {stats['size']} bytes")

    # Usage
    loader = MonitoredLoader()

    ml_model = loader.load('ml_model', load_ml_model)
    database = loader.load('database', connect_database)

    loader.print_stats()
    # Load Statistics:
    #   ml_model: 3.45s, 524288 bytes
    #   database: 0.12s, 1024 bytes
Related Concepts
Deferred Loading vs. Dynamic Manifests
- Dynamic manifests: Tell you what's available
- Deferred loading: Determines when to load
Flow (a code sketch follows this list):
- Dynamic manifest query: "These tools are available"
- Deferred loading decision: "Don't load yet, wait for first use"
- User requests tool
- Deferred loading: "Now load the tool"
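A minimal sketch of that hand-off (class and method names are illustrative, not a real MCP API):

    class ToolRegistry:
        def __init__(self, manifest):
            # manifest: name -> loader callable; metadata only, nothing loaded
            self._manifest = manifest
            self._loaded = {}

        def list_tools(self):
            """Dynamic manifest: answers *what* exists, without loading."""
            return list(self._manifest)

        def get_tool(self, name):
            """Deferred loading: answers *when* - only on first request."""
            if name not in self._loaded:
                self._loaded[name] = self._manifest[name]()
            return self._loaded[name]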
Deferred Loading vs. Progressive Disclosure
- Progressive disclosure: What to show to users
- Deferred loading: When to initialize resources
Example:
- Progressive disclosure: "Show basic features, hide advanced"
- Deferred loading: "Don't load advanced feature code until user accesses it"
Both patterns work together:
    User opens app
         ↓
    [Progressive Disclosure]
    → UI shows: Basic features only
    [Deferred Loading]
    → Code loaded: Basic modules only

    User clicks "Advanced"
         ↓
    [Progressive Disclosure]
    → UI reveals: Advanced features
    [Deferred Loading]
    → Code loads: Advanced modules (now)
Summary
Deferred loading optimizes performance by:
- Lazy Initialization: Create objects only when needed
- Tiered Loading: Load critical resources first, others later
- Code Splitting: Split code into chunks loaded on demand
- Caching: Reuse loaded resources efficiently
- Monitoring: Track what gets loaded and when
Key principle: Don't pay for what you don't use
Navigation: ← Dynamic Manifests | ↑ Best Practices | → Progressive Disclosure
Last Updated: 2025-10-20