---
name: gpui-performance
description: Performance optimization techniques for GPUI including rendering optimization, layout performance, memory management, and profiling strategies. Use when the user needs to optimize GPUI application performance or debug performance issues.
---

# GPUI Performance Optimization

## Metadata

This skill provides comprehensive guidance on optimizing GPUI applications for rendering performance, memory efficiency, and overall runtime speed.

## Instructions

### Rendering Optimization

#### Understanding the Render Cycle

```
State Change → cx.notify() → Render → Layout → Paint → Display
```

**Key Points**:
- Only call `cx.notify()` when state actually changes
- Minimize work in the `render()` method
- Cache expensive computations
- Reduce element count and nesting

#### Avoiding Unnecessary Renders

```rust
// BAD: Renders on every frame
impl MyComponent {
    fn start_animation(&mut self, cx: &mut ViewContext<Self>) {
        cx.spawn(|_this, mut cx| async move {
            loop {
                cx.update(|_, cx| cx.notify()).ok(); // Forces a rerender every tick!
                Timer::after(Duration::from_millis(16)).await;
            }
        })
        .detach();
    }
}

// GOOD: Only render when state changes
impl MyComponent {
    fn update_value(&mut self, new_value: i32, cx: &mut ViewContext<Self>) {
        if self.value != new_value {
            self.value = new_value;
            cx.notify(); // Only notify on an actual change
        }
    }
}
```

#### Optimize Subscription Updates

```rust
// BAD: Always rerenders on model change
let _subscription = cx.observe(&model, |_, _, cx| {
    cx.notify(); // Rerenders even if nothing relevant changed
});

// GOOD: Selective updates
let _subscription = cx.observe(&model, |this, model, cx| {
    let data = model.read(cx);
    // Only rerender if the relevant field changed
    if data.relevant_field != this.cached_field {
        this.cached_field = data.relevant_field.clone();
        cx.notify();
    }
});
```

#### Memoization Pattern

```rust
use std::cell::RefCell;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

struct MemoizedComponent {
    model: Model<Data>, // `Data` stands in for your model's state type
    cached_result: RefCell<Option<(u64, String)>>, // (hash, result)
}

impl MemoizedComponent {
    fn expensive_computation(&self, cx: &ViewContext<Self>) -> String {
        let data = self.model.read(cx);

        // Calculate a hash of the input
        let mut hasher = DefaultHasher::new();
        data.relevant_fields.hash(&mut hasher);
        let hash = hasher.finish();

        // Return the cached result if the input is unchanged
        if let Some((cached_hash, cached_result)) = &*self.cached_result.borrow() {
            if *cached_hash == hash {
                return cached_result.clone();
            }
        }

        // Compute and cache
        let result = perform_expensive_computation(&data);
        *self.cached_result.borrow_mut() = Some((hash, result.clone()));
        result
    }
}
```

### Layout Performance

#### Minimize Layout Complexity

```rust
// BAD: Deep nesting
div()
    .flex()
    .child(
        div()
            .flex()
            .child(
                div()
                    .flex()
                    .child(div().child("Content")),
            ),
    )

// GOOD: Flat structure
div()
    .flex()
    .flex_col()
    .gap_4()
    .child("Header")
    .child("Content")
    .child("Footer")
```

#### Use Fixed Sizing When Possible

```rust
// BETTER: Fixed sizes (no layout calculation)
div()
    .w(px(200.))
    .h(px(100.))
    .child("Fixed size")

// SLOWER: Dynamic sizing (requires layout calculation)
div()
    .w_full()
    .h_full()
    .child("Dynamic size")
```

#### Avoid Layout Thrashing

```rust
// BAD: Reading layout during render
impl Render for BadComponent {
    fn render(&mut self, cx: &mut ViewContext<Self>) -> impl IntoElement {
        let width = cx.window_bounds().get_bounds().size.width;
        // Using the width immediately causes layout thrashing
        div().w(width)
    }
}

// GOOD: Cache layout-dependent values
struct GoodComponent {
    cached_width: Pixels,
}

impl GoodComponent {
    fn on_window_resize(&mut self, cx: &mut ViewContext<Self>) {
        let width = cx.window_bounds().get_bounds().size.width;
        if self.cached_width != width {
            self.cached_width = width;
            cx.notify();
        }
    }
}
```
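The render side then only reads the cached value. A minimal sketch of the matching `Render` implementation, assuming `on_window_resize` is hooked up to a resize notification elsewhere:

```rust
impl Render for GoodComponent {
    fn render(&mut self, _cx: &mut ViewContext<Self>) -> impl IntoElement {
        // No layout queries here: render only consumes the value cached by
        // on_window_resize, so measuring and painting stay decoupled.
        div().w(self.cached_width).child("Uses cached width")
    }
}
```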
#### Virtual Scrolling for Long Lists

```rust
struct VirtualList {
    items: Vec<SharedString>,
    scroll_offset: f32,
    viewport_height: f32,
    item_height: f32,
}

impl Render for VirtualList {
    fn render(&mut self, cx: &mut ViewContext<Self>) -> impl IntoElement {
        // Calculate the visible range
        let start_index = (self.scroll_offset / self.item_height).floor() as usize;
        let visible_count = (self.viewport_height / self.item_height).ceil() as usize;
        let end_index = (start_index + visible_count).min(self.items.len());

        // Only render the visible items
        div()
            .h(px(self.viewport_height))
            .overflow_y_scroll()
            .on_scroll(cx.listener(|this, event, cx| {
                this.scroll_offset = event.scroll_offset.y;
                cx.notify();
            }))
            .child(
                div()
                    // Spacer with the full content height keeps the scrollbar accurate
                    .h(px(self.items.len() as f32 * self.item_height))
                    .child(
                        div()
                            .absolute()
                            .top(px(start_index as f32 * self.item_height))
                            .children(self.items[start_index..end_index].iter().map(|item| {
                                div().h(px(self.item_height)).child(item.clone())
                            })),
                    ),
            )
    }
}
```

### Memory Management

#### Preventing Memory Leaks

```rust
// LEAK: Subscription not stored
impl BadView {
    fn new(model: Model<Data>, cx: &mut ViewContext<Self>) -> Self {
        cx.observe(&model, |_, _, cx| cx.notify()); // Leak!
        Self { model }
    }
}

// CORRECT: Store the subscription
struct GoodView {
    model: Model<Data>,
    _subscription: Subscription, // Cleaned up on Drop
}

impl GoodView {
    fn new(model: Model<Data>, cx: &mut ViewContext<Self>) -> Self {
        let _subscription = cx.observe(&model, |_, _, cx| cx.notify());
        Self { model, _subscription }
    }
}
```

#### Avoid Circular References

```rust
// BAD: Circular reference
struct CircularRef {
    self_view: Option<View<CircularRef>>, // Circular!
}

// GOOD: Use weak references or redesign
struct NoCycle {
    other_view: View<OtherView>, // No cycle
}
```
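When a back-reference is genuinely needed, a weak handle breaks the cycle. A minimal sketch, assuming the GPUI version in use exposes `WeakView` with `downgrade`/`upgrade` (the `ParentView`/`ChildView` names are placeholders; check the API of the version you build against):

```rust
struct ChildView {
    // A weak handle does not keep the parent alive, so dropping the parent
    // releases both views instead of leaking the pair.
    parent: WeakView<ParentView>,
}

impl ChildView {
    fn notify_parent(&mut self, cx: &mut ViewContext<Self>) {
        // Upgrade only for the duration of the call; if the parent is
        // already gone this is a no-op rather than a dangling reference.
        if let Some(parent) = self.parent.upgrade() {
            parent.update(cx, |_parent, cx| cx.notify());
        }
    }
}
```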
#### Bounded Collections

```rust
use std::collections::VecDeque;

const MAX_HISTORY: usize = 100;

struct BoundedHistory {
    items: VecDeque<Item>,
}

impl BoundedHistory {
    fn add_item(&mut self, item: Item) {
        self.items.push_back(item);

        // Maintain the size limit
        while self.items.len() > MAX_HISTORY {
            self.items.pop_front();
        }
    }
}
```

#### Reuse Allocations

```rust
use std::fmt::Write;

struct BufferedComponent {
    buffer: String, // Reused across operations
}

impl BufferedComponent {
    fn format_data(&mut self, data: &[Item]) -> &str {
        self.buffer.clear(); // Reuse the existing allocation
        for item in data {
            writeln!(&mut self.buffer, "{}", item.name).ok();
        }
        &self.buffer
    }
}
```

### Profiling Strategies

#### CPU Profiling with cargo-flamegraph

```bash
# Install
cargo install flamegraph

# Profile the application
cargo flamegraph --bin your-app

# With specific features
cargo flamegraph --bin your-app --features profiling

# Opens flamegraph.svg showing the CPU time distribution
```

#### Memory Profiling

```bash
# valgrind (Linux)
valgrind --tool=massif --massif-out-file=massif.out ./target/release/your-app
ms_print massif.out

# heaptrack (Linux)
heaptrack ./target/release/your-app
heaptrack_gui heaptrack.your-app.*.gz

# Instruments (macOS)
instruments -t "Allocations" ./target/release/your-app
```

#### Custom Performance Monitoring

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

struct PerformanceMonitor {
    frame_times: VecDeque<Duration>,
    max_samples: usize,
}

impl PerformanceMonitor {
    fn new() -> Self {
        Self {
            frame_times: VecDeque::with_capacity(100),
            max_samples: 100,
        }
    }

    fn record_frame(&mut self, duration: Duration) {
        self.frame_times.push_back(duration);
        if self.frame_times.len() > self.max_samples {
            self.frame_times.pop_front();
        }

        // Warn if the frame is slow (> 16ms for 60fps)
        if duration.as_millis() > 16 {
            eprintln!("⚠️ Slow frame: {}ms", duration.as_millis());
        }
    }

    fn average_fps(&self) -> f64 {
        if self.frame_times.is_empty() {
            return 0.0;
        }
        let total: Duration = self.frame_times.iter().sum();
        let avg = total / self.frame_times.len() as u32;
        // Seconds-based math avoids truncating sub-millisecond frames
        1.0 / avg.as_secs_f64()
    }

    fn percentile(&self, p: f64) -> Duration {
        if self.frame_times.is_empty() {
            return Duration::ZERO;
        }
        let mut sorted: Vec<_> = self.frame_times.iter().copied().collect();
        sorted.sort();
        let index = (sorted.len() as f64 * p) as usize;
        sorted[index.min(sorted.len() - 1)]
    }
}

// Usage in a component
impl MyView {
    fn measure_render<F>(&mut self, f: F, cx: &mut ViewContext<Self>)
    where
        F: FnOnce(&mut Self, &mut ViewContext<Self>),
    {
        let start = Instant::now();
        f(self, cx);
        let elapsed = start.elapsed();
        self.perf_monitor.record_frame(elapsed);

        // Log stats periodically
        if self.frame_count % 60 == 0 {
            println!(
                "Avg FPS: {:.1}, p95: {}ms, p99: {}ms",
                self.perf_monitor.average_fps(),
                self.perf_monitor.percentile(0.95).as_millis(),
                self.perf_monitor.percentile(0.99).as_millis(),
            );
        }
    }
}
```

#### Benchmark with Criterion

```rust
// benches/component_bench.rs
use criterion::{black_box, criterion_group, criterion_main, BenchmarkId, Criterion};

fn render_benchmark(c: &mut Criterion) {
    let mut group = c.benchmark_group("rendering");

    for size in [10, 100, 1000].iter() {
        group.bench_with_input(BenchmarkId::from_parameter(size), size, |b, &size| {
            b.iter(|| {
                App::test(|cx| {
                    let items = vec![Item::default(); size];
                    let view = cx.new_view(|cx| ListView::new(items, cx));
                    view.update(cx, |view, cx| {
                        black_box(view.render(cx));
                    });
                });
            });
        });
    }

    group.finish();
}

criterion_group!(benches, render_benchmark);
criterion_main!(benches);
```
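To wire this up, `criterion` typically goes under `[dev-dependencies]` and the bench file is declared in `Cargo.toml` with a `[[bench]]` section (`name = "component_bench"`, `harness = false`) so Cargo hands control to Criterion instead of the default test harness; `cargo bench` then runs it and writes HTML reports under `target/criterion/`. The `App::test`, `ListView`, and `Item` names above stand in for your own test harness and component types.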
### Batching Updates

```rust
// BAD: Multiple individual updates
for item in items {
    self.model.update(cx, |model, cx| {
        model.add_item(item); // Triggers a rerender each time!
        cx.notify();
    });
}

// GOOD: Batch into a single update
self.model.update(cx, |model, cx| {
    for item in items {
        model.add_item(item);
    }
    cx.notify(); // Single rerender
});
```

### Async Rendering Optimization

```rust
struct AsyncView {
    loading_state: Model<LoadingState>,
}

impl AsyncView {
    fn load_data(&mut self, cx: &mut ViewContext<Self>) {
        let loading_state = self.loading_state.clone();

        // Show the loading state immediately
        self.loading_state.update(cx, |state, cx| {
            *state = LoadingState::Loading;
            cx.notify();
        });

        // Load asynchronously
        cx.spawn(|_, mut cx| async move {
            // Fetch the data
            let data = fetch_data().await?;

            // Update the state once
            cx.update_model(&loading_state, |state, cx| {
                *state = LoadingState::Loaded(data);
                cx.notify();
            })?;

            Ok::<_, anyhow::Error>(())
        })
        .detach();
    }
}
```

### Caching Strategies

#### Result Caching

```rust
use std::cell::RefCell;
use std::collections::HashMap;

struct CachedRenderer {
    // `CachedElement` is an application-defined wrapper around a reusable element
    cache: RefCell<HashMap<String, CachedElement>>,
}

impl CachedRenderer {
    fn render_cached(
        &self,
        key: String,
        render_fn: impl FnOnce() -> AnyElement,
    ) -> AnyElement {
        let mut cache = self.cache.borrow_mut();
        cache
            .entry(key)
            .or_insert_with(|| CachedElement::new(render_fn()))
            .element
            .clone()
    }

    fn invalidate(&self, key: &str) {
        self.cache.borrow_mut().remove(key);
    }
}
```

## Resources

### Performance Targets

**Rendering**:
- Target: 60 FPS (16.67ms per frame)
- Render + Layout: ~10ms
- Paint: ~6ms
- Warning: Any frame > 16ms

**Memory**:
- Monitor heap growth
- Warning: Steady increase (leak)
- Target: Stable after initialization

**Startup**:
- Window display: < 100ms
- Fully interactive: < 500ms

(The sketch at the end of this skill encodes these budgets as constants for use with the `PerformanceMonitor` above.)

### Profiling Tools

**CPU Profiling**:
- cargo-flamegraph: Visualize CPU time
- perf (Linux): System-level profiling
- Instruments (macOS): Apple's profiler

**Memory Profiling**:
- valgrind/massif: Memory usage tracking
- heaptrack: Heap allocation tracking
- Instruments: Memory allocations

**Benchmarking**:
- criterion: Statistical benchmarking
- cargo bench: Built-in benchmarks
- hyperfine: Command-line benchmarking

### Best Practices

1. **Measure First**: Profile before optimizing
2. **Minimize Renders**: Only `cx.notify()` when necessary
3. **Cache Results**: Memoize expensive computations
4. **Batch Updates**: Group state changes
5. **Virtual Scrolling**: For long lists
6. **Flat Layouts**: Avoid deep nesting
7. **Fixed Sizing**: When possible
8. **Monitor Memory**: Watch for leaks
9. **Async Loading**: Don't block the UI
10. **Test Performance**: Include benchmarks

### Common Bottlenecks

- Subscriptions created in render (memory leak)
- Expensive computation in render
- Deep component nesting
- Unnecessary rerenders
- Layout thrashing
- Large lists without virtualization
- Memory leaks from circular references
- Unbounded collections
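To make the rendering and startup targets above checkable in code, the budgets can be written down as constants and compared against measured durations, in the spirit of the `PerformanceMonitor` from the profiling section. A minimal sketch; the constant and function names are illustrative, not GPUI APIs:

```rust
use std::time::Duration;

// Budgets taken from the Performance Targets section (illustrative names).
const FRAME_BUDGET: Duration = Duration::from_micros(16_670); // ~60 FPS
const STARTUP_WINDOW_BUDGET: Duration = Duration::from_millis(100);
const STARTUP_INTERACTIVE_BUDGET: Duration = Duration::from_millis(500);

/// Report a frame that exceeded the 60 FPS budget.
fn check_frame_budget(frame_time: Duration) {
    if frame_time > FRAME_BUDGET {
        eprintln!("frame over budget: {frame_time:?} (budget {FRAME_BUDGET:?})");
    }
}

/// Report startup phases that exceeded their budgets.
fn check_startup_budget(window_shown: Duration, interactive: Duration) {
    if window_shown > STARTUP_WINDOW_BUDGET {
        eprintln!("window display over budget: {window_shown:?}");
    }
    if interactive > STARTUP_INTERACTIVE_BUDGET {
        eprintln!("time to interactive over budget: {interactive:?}");
    }
}
```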