--- name: Data Visualization description: "Comprehensive data visualization skill covering visual execution and technical implementation. Includes perceptual foundations, chart selection, layout algorithms, and library guidance. Load on-demand when building charts, graphs, dashboards, or any visual data representation." version: 1.0.0 --- # Data Visualization Visualization is communication. Every visual element must serve understanding. ## Critical Rules 🚨 **Use established algorithms.** Graph layout, tree layout, spatial indexingβ€”these problems are solved. Check dagre, d3-force, ELK.js before implementing anything custom. 🚨 **Choose encodings by perceptual accuracy.** Position beats length beats angle beats area beats color. Prefer bar charts over pie charts over bubble charts. 🚨 **Never rely on color alone.** 8% of men are colorblind. Use shape, pattern, or labels as backup encoding. 🚨 **Match rendering to scale.** SVG for <1000 elements, Canvas for 1000-10000, WebGL for >10000. --- ## 1. Visual Encoding ### Marks & Channels **Marks** are geometric primitives representing data: - Points (scatter plots, dot plots) - Lines (line charts, network edges) - Areas (bar charts, area charts, maps) **Channels** are visual properties applied to marks: - Position (x, y coordinates) - Size (length, area, volume) - Color (hue, saturation, lightness) - Shape (circle, square, triangle) - Orientation (angle, slope) ### Cleveland & McGill Hierarchy (1984) Visual encodings ranked by perceptual accuracy: 1. **Position along common scale** (most accurate) 2. Position on non-aligned scales 3. Length 4. Angle/slope 5. Area 6. Volume 7. **Color saturation/hue** (least accurate) **Implication:** Bar charts (position) > pie charts (angle) > bubble charts (area) ### Preattentive Attributes Properties processed in <250ms without conscious effort: - Color (hue, saturation) - Form (orientation, length, width, size, shape) - Spatial position - Motion Use preattentive attributes for the most important dataβ€”they "pop out" automatically. ### Channel Effectiveness by Data Type | Data Type | Best Channels | |-----------|---------------| | Quantitative | Position, length, angle, area | | Ordinal | Position, density, saturation | | Categorical | Shape, hue, spatial region | --- ## 2. Interaction Design ### Shneiderman's Mantra (1996) "Overview first, zoom and filter, then details on demand" 1. **Overview** β€” Show entire dataset, establish context 2. **Zoom & Filter** β€” Reduce complexity, focus on subset 3. **Details on Demand** β€” Tooltips, click-to-expand, drill-down ### Interaction Patterns | Pattern | Use Case | |---------|----------| | Brushing & linking | Cross-highlighting across coordinated views | | Focus + context | Fisheye lens, detail-on-demand panels | | Direct manipulation | Drag nodes, resize elements, reorder | | Animated transitions | Help users track changes between states | | Pan & zoom | Navigate large visualizations | | Filtering | Reduce data to relevant subset | | Selection | Highlight specific data points | --- ## 3. Chart Selection ### By Question Type | Question | Chart Type | Why | |----------|------------|-----| | How do values compare? | Bar chart | Position encoding is most accurate | | How has this changed over time? | Line chart | Shows trends, handles many points | | What's the distribution? | Histogram, box plot | Shows spread, outliers, shape | | What's the relationship? | Scatter plot | Reveals correlation, clusters | | What's the part-to-whole? | Stacked bar, treemap | Shows composition | | What are the connections? | Network graph, Sankey | Shows relationships, flows | | What's the hierarchy? | Tree, sunburst, treemap | Shows parent-child structure | | Where is it? | Choropleth, symbol map | Geographic context | ### By Data Volume | Volume | Approach | |--------|----------| | <20 points | Simple charts, direct labeling | | 20-500 | Standard visualization | | 500-5000 | Consider aggregation, filtering | | 5000+ | Aggregation mandatory, or Canvas/WebGL | ### Common Anti-Patterns - ❌ Pie charts with >5 slices (use bar chart) - ❌ 3D charts without strong justification - ❌ Dual-axis with unrelated scales (misleading) - ❌ Non-zero baselines for bar charts (distorts perception) - ❌ Truncated axes without clear indication --- ## 4. Color ### Palette Types | Type | Use Case | Examples | |------|----------|----------| | Sequential | Low to high values | Blues, Greens, Viridis | | Diverging | Values diverge from midpoint | RdBu, BrBG, Spectral | | Categorical | Distinct categories | Set2, Tableau10, Category10 | ### Colorblind Safety - 8% of men, 0.5% of women have color vision deficiency - **Never rely on color alone**β€”use shape, pattern, labels - Safe sequential: viridis, cividis, plasma - Safe categorical: ColorBrewer's colorblind-safe options - Test with: Coblis, Sim Daltonism, Chrome DevTools ### Perceptual Uniformity - **Avoid rainbow colormaps** (jet)β€”perceptual steps are uneven - Use viridis, parula, cividis for sequential data - These ensure equal perceptual distance between values ### Color Guidelines - 4.5:1 contrast ratio for text (WCAG AA) - 3:1 contrast for UI components - Max 7-10 distinct categorical colors - Use saturation/lightness variation for emphasis --- ## 5. Layout Algorithms 🚨 **Before implementing ANY layout algorithm, check if a library exists.** ### Algorithm β†’ Library Mapping | Problem | Algorithm | Libraries | |---------|-----------|-----------| | Layered/DAG graphs | Sugiyama (1981) | dagre, ELK.js | | Force-directed networks | Fruchterman-Reingold (1991) | d3-force, Cytoscape.js | | Tree layouts | Reingold-Tilford (1981) | d3-hierarchy | | Treemaps | Squarified (2000) | d3-hierarchy, ECharts | | Circle packing | Wang (2006) | d3-hierarchy | | Sankey diagrams | β€” | d3-sankey | | Chord diagrams | β€” | d3-chord | | Large graphs (10k+) | WebGL + spatial indexing | Sigma.js, G6, deck.gl | | Spatial queries | Quadtree, R-tree | d3-quadtree, rbush | | Edge crossing minimization | Barth (2002) | Built into dagre/ELK | ### When to Use Each Layout | Layout | Best For | |--------|----------| | Sugiyama (dagre) | Flowcharts, dependency graphs, DAGs with direction | | Force-directed | Social networks, organic relationships, exploration | | Tree | Hierarchies with single parent per node | | Treemap | Hierarchies with quantitative values | | Circular | Emphasizing central nodes, ring structures | | Matrix | Dense graphs where edges would overlap | **These problems are solved. Never implement from scratch.** --- ## 6. Rendering & Performance ### Rendering Technology Thresholds ``` <1000 elements β†’ SVG - DOM events work naturally - Accessibility (ARIA) supported - Crisp at any zoom level - CSS styling 1000-10000 β†’ Canvas - Batch rendering - Manual hit testing required - Lower memory footprint - requestAnimationFrame for animation >10000 β†’ WebGL - GPU acceleration - Sigma.js, deck.gl, regl - Complex setup - Limited text rendering ``` ### Performance Patterns | Pattern | When to Use | |---------|-------------| | Web Workers | Layout computation (never block main thread) | | Spatial indexing | Hit detection with quadtree/R-tree | | Level-of-detail | Simplify distant/small elements | | Viewport culling | Only render visible elements | | Debouncing | Expensive interactions (zoom, filter) | | Virtualization | Long lists of chart components | | Aggregation | Too many data points to render individually | ### Anti-Patterns - ❌ 5000 SVG nodes (use Canvas) - ❌ Layout computation on main thread - ❌ Hit testing without spatial indexing - ❌ Rendering off-screen elements - ❌ Animating thousands of elements individually --- ## 7. Libraries ### Graph Layouts | Library | Best For | Notes | |---------|----------|-------| | dagre | Layered DAGs, flowcharts | Sugiyama algorithm, good defaults | | dagre-d3 | dagre + D3 rendering | SVG output | | ELK.js | Complex layouts, compound graphs | Eclipse Layout Kernel, highly configurable | | d3-force | Organic networks | Fruchterman-Reingold, customizable forces | | Cytoscape.js | Graph analysis + visualization | Rich algorithm library | | Sigma.js | Large graphs (10k+) | WebGL rendering | | G6/AntV | Enterprise graphs | Full-featured, Chinese ecosystem | | vis-network | Quick prototypes | Easy API, limited customization | ### Charting | Library | Best For | Notes | |---------|----------|-------| | D3.js | Custom, highly interactive | Low-level, maximum control | | Observable Plot | Quick exploration | D3 team, excellent defaults | | Recharts | React integration | Declarative, composable | | Victory | React integration | Animation support | | ECharts | Feature-rich dashboards | Great mobile, large dataset support | | Vega-Lite | Grammar of graphics | Declarative JSON spec | | Chart.js | Simple charts | Easy setup, limited customization | | Plotly | Scientific visualization | 3D support, interactivity | ### When to Use D3 vs Higher-Level Libraries **Use D3 when:** - Need complete control over rendering - Building novel/custom visualizations - Integrating with existing SVG/Canvas code - Performance-critical with custom optimizations **Use higher-level libraries when:** - Standard chart types suffice - Faster development time matters - Team less experienced with D3 - Need built-in responsiveness/animation --- ## 8. Composition & Layout ### Project Composition (Dashboard Level) - **Visual hierarchy** β€” Guide eye to most important first - **Grid systems** β€” Align elements for coherence - **Grouping** β€” Related visualizations together - **White space** β€” Breathing room, not wasted space - **Reading flow** β€” Z-pattern or F-pattern for Western audiences ### Chart Composition (Single Chart) | Element | Guidelines | |---------|------------| | Title | Clear, descriptive; top-left or centered above | | Subtitle | Additional context; smaller, below title | | Axes | Labeled with units; tick marks at meaningful intervals | | Legend | Embedded when possible; external if complex | | Aspect ratio | Affects slope perception; 45Β° banking for trends | | Margins | Enough for labels; consistent across charts | ### Aspect Ratio Guidelines - **Line charts:** ~16:9 for trends (banking to 45Β°) - **Bar charts:** Depends on number of bars - **Scatter plots:** Often square (1:1) for correlation - **Maps:** Preserve geographic proportions --- ## 9. Annotation ### Annotation Types | Type | Purpose | |------|---------| | Title | The "what" β€” identifies the visualization | | Subtitle | Additional context, data source | | Caption | The "so what" β€” key insight or takeaway | | Axis labels | Variable names and units | | Legend | Decode color/shape/size mappings | | Callouts | Highlight specific data points | | Reference lines | Benchmarks, targets, averages | | Source citation | Data provenance | ### Best Practices - **Annotate the insight, not just the data** β€” "Sales peaked in Q3" not just "Sales over time" - **Use callouts sparingly** β€” Highlight 1-3 key points maximum - **Direct labeling** β€” Embed labels in chart when possible (vs separate legend) - **Provide context** β€” Benchmarks, historical reference, targets - **Layer information** β€” Overview visible, details on interaction ### Text Hierarchy 1. Title (largest, boldest) 2. Subtitle/caption 3. Axis titles 4. Tick labels 5. Annotations 6. Source (smallest) --- ## 10. Accessibility ### WCAG Requirements - **AA minimum** (AAA preferred) - 4.5:1 contrast ratio for normal text - 3:1 contrast for large text and UI components - No information conveyed by color alone ### Keyboard Navigation - Tab through interactive elements - Arrow keys for traversing data points - Enter/Space for selection - Escape to cancel/close ### Screen Reader Support ```html Monthly Sales 2024 Bar chart showing sales increasing from $10M in January to $15M in December ``` - Use ARIA labels and roles - Provide text alternatives - Announce dynamic updates with live regions - Structure for logical reading order ### Alternative Representations - **Data tables** β€” Provide as fallback for all charts - **Text summaries** β€” Describe key insights - **Sonification** β€” Audio representation for time-series - **Tactile graphics** β€” For physical accessibility --- ## 11. Anti-Patterns Summary ### Design Anti-Patterns | Anti-Pattern | Why It's Wrong | What to Do | |--------------|----------------|------------| | 3D charts | Distorts perception | Use 2D | | Pie >5 slices | Hard to compare | Use bar chart | | Dual unrelated axes | Misleading correlation | Separate charts | | Non-zero baseline | Exaggerates differences | Start at zero | | Rainbow colormap | Perceptually uneven | Use viridis | | Color-only encoding | Excludes colorblind | Add shape/pattern | | Chart junk | Distracts from data | Remove decoration | | Overplotting | Hides data density | Aggregate or jitter | ### Implementation Anti-Patterns | Anti-Pattern | Why It's Wrong | What to Do | |--------------|----------------|------------| | Custom graph layout | Reinventing solved problem | Use dagre/ELK | | 5000 SVG nodes | Poor performance | Use Canvas | | Main thread layout | Blocks UI | Use Web Worker | | No spatial indexing | Slow hit detection | Use quadtree | | Rendering off-screen | Wasted computation | Viewport culling | --- ## 12. Academic Foundations ### Seminal Papers | Paper | Year | Contribution | |-------|------|--------------| | Cleveland & McGill "Graphical Perception" | 1984 | Visual encoding hierarchy | | Shneiderman "The Eyes Have It" | 1996 | Overview-zoom-filter-details mantra | | Gansner et al. "Drawing Directed Graphs" | 1993 | Foundation for dagre | | Fruchterman & Reingold "Force-directed Placement" | 1991 | Foundation for d3-force | | Sugiyama et al. "Hierarchical Systems" | 1981 | Layered graph layout | | Barth et al. "Bilayer Cross Counting" | 2002 | Edge crossing minimization | | Brewer "Color Use Guidelines" | 1994 | ColorBrewer palettes | ### Essential Resources | Resource | Type | Focus | |----------|------|-------| | ColorBrewer (colorbrewer2.org) | Tool | Accessible color palettes | | From Data to Viz (data-to-viz.com) | Guide | Chart selection decision tree | | Visualization Analysis & Design (Munzner) | Textbook | Comprehensive theory | | Data Visualisation (Kirk) | Textbook | Practitioner guide | | Visual Display of Quantitative Information (Tufte) | Textbook | Data-ink ratio, chart junk | | D3 Gallery (observablehq.com/@d3/gallery) | Examples | Implementation patterns | --- ## Summary 🚨 **Before implementing visualization:** 1. **What question are you answering?** β†’ Select chart type 2. **What's your data volume?** β†’ Select rendering technology 3. **Is there an established algorithm?** β†’ Use the library 4. **Is it accessible?** β†’ Color, keyboard, screen reader 5. **Does it follow perceptual best practices?** β†’ Encoding hierarchy