gh-ntcoding-claude-skillz-d…/skills/data-visualization/SKILL.md
2025-11-30 08:44:41 +08:00

name: Data Visualization
description: Comprehensive data visualization skill covering visual execution and technical implementation. Includes perceptual foundations, chart selection, layout algorithms, and library guidance. Load on-demand when building charts, graphs, dashboards, or any visual data representation.
version: 1.0.0

Data Visualization

Visualization is communication. Every visual element must serve understanding.

Critical Rules

🚨 Use established algorithms. Graph layout, tree layout, spatial indexing—these problems are solved. Check dagre, d3-force, ELK.js before implementing anything custom.

🚨 Choose encodings by perceptual accuracy. Position beats length beats angle beats area beats color. Prefer bar charts over pie charts over bubble charts.

🚨 Never rely on color alone. About 8% of men and 0.5% of women have a color vision deficiency. Use shape, pattern, or labels as a backup encoding.

🚨 Match rendering to scale. SVG for <1000 elements, Canvas for 1000-10000, WebGL for >10000.


1. Visual Encoding

Marks & Channels

Marks are geometric primitives representing data:

  • Points (scatter plots, dot plots)
  • Lines (line charts, network edges)
  • Areas (bar charts, area charts, maps)

Channels are visual properties applied to marks:

  • Position (x, y coordinates)
  • Size (length, area, volume)
  • Color (hue, saturation, lightness)
  • Shape (circle, square, triangle)
  • Orientation (angle, slope)

Cleveland & McGill Hierarchy (1984)

Visual encodings ranked by perceptual accuracy:

  1. Position along common scale (most accurate)
  2. Position on non-aligned scales
  3. Length
  4. Angle/slope
  5. Area
  6. Volume
  7. Color saturation/hue (least accurate)

Implication: Bar charts (position) > pie charts (angle) > bubble charts (area)

Preattentive Attributes

Properties processed in <250ms without conscious effort:

  • Color (hue, saturation)
  • Form (orientation, length, width, size, shape)
  • Spatial position
  • Motion

Use preattentive attributes for the most important data—they "pop out" automatically.

Channel Effectiveness by Data Type

| Data Type | Best Channels |
|---|---|
| Quantitative | Position, length, angle, area |
| Ordinal | Position, density, saturation |
| Categorical | Shape, hue, spatial region |
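
The table above can be read as a lookup: for a given data type, take the first listed channel that is still free. A minimal sketch of that idea follows; `DataKind`, `channelRanking`, and `pickChannel` are hypothetical names introduced for illustration, not part of any library.

```typescript
// Rankings copied from the table above, best channel first.
type DataKind = "quantitative" | "ordinal" | "categorical";

const channelRanking: Record<DataKind, string[]> = {
  quantitative: ["position", "length", "angle", "area"],
  ordinal: ["position", "density", "saturation"],
  categorical: ["shape", "hue", "spatial region"],
};

// Hypothetical helper: return the best-ranked channel that no other field
// is already using, or undefined when every listed channel is taken.
function pickChannel(kind: DataKind, taken: string[] = []): string | undefined {
  const free = channelRanking[kind].filter((channel) => taken.indexOf(channel) === -1);
  return free[0];
}

const firstChoice = pickChannel("quantitative");                 // "position"
const secondChoice = pickChannel("quantitative", ["position"]);  // "length"
const noneLeft = pickChannel("categorical", ["shape", "hue", "spatial region"]); // undefined
```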

2. Interaction Design

Shneiderman's Mantra (1996)

"Overview first, zoom and filter, then details on demand"

  1. Overview — Show entire dataset, establish context
  2. Zoom & Filter — Reduce complexity, focus on subset
  3. Details on Demand — Tooltips, click-to-expand, drill-down

Interaction Patterns

| Pattern | Use Case |
|---|---|
| Brushing & linking | Cross-highlighting across coordinated views |
| Focus + context | Fisheye lens, detail-on-demand panels |
| Direct manipulation | Drag nodes, resize elements, reorder |
| Animated transitions | Help users track changes between states |
| Pan & zoom | Navigate large visualizations |
| Filtering | Reduce data to relevant subset |
| Selection | Highlight specific data points |
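
Brushing and linking usually comes down to sharing a set of selected record ids between views. The sketch below assumes a hypothetical `SalesRow` record and a rectangular brush in one view's data space; the rendering of either view is left out.

```typescript
// Hypothetical record shared by two coordinated views.
interface SalesRow { id: number; region: string; revenue: number; units: number; }

// Hypothetical brush: a rectangle in the data space of a revenue/units scatter plot.
interface BrushRect { minRevenue: number; maxRevenue: number; minUnits: number; maxUnits: number; }

// Brushing: collect the ids of rows that fall inside the brush.
function brushedIds(rows: SalesRow[], brush: BrushRect): number[] {
  return rows
    .filter((row) =>
      row.revenue >= brush.minRevenue && row.revenue <= brush.maxRevenue &&
      row.units >= brush.minUnits && row.units <= brush.maxUnits)
    .map((row) => row.id);
}

// Linking: a second view flags its own marks whose ids were brushed elsewhere.
function linkedHighlight(rows: SalesRow[], selectedIds: number[]): boolean[] {
  return rows.map((row) => selectedIds.indexOf(row.id) !== -1);
}

const salesRows: SalesRow[] = [
  { id: 1, region: "North", revenue: 120, units: 10 },
  { id: 2, region: "South", revenue: 80, units: 4 },
  { id: 3, region: "East", revenue: 150, units: 12 },
];
const brushed = brushedIds(salesRows, { minRevenue: 100, maxRevenue: 200, minUnits: 5, maxUnits: 15 }); // [1, 3]
const highlightFlags = linkedHighlight(salesRows, brushed); // [true, false, true]
```

Keeping the selection as plain ids means each view can decide for itself how to draw a highlighted mark.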

3. Chart Selection

By Question Type

| Question | Chart Type | Why |
|---|---|---|
| How do values compare? | Bar chart | Position encoding is most accurate |
| How has this changed over time? | Line chart | Shows trends, handles many points |
| What's the distribution? | Histogram, box plot | Shows spread, outliers, shape |
| What's the relationship? | Scatter plot | Reveals correlation, clusters |
| What's the part-to-whole? | Stacked bar, treemap | Shows composition |
| What are the connections? | Network graph, Sankey | Shows relationships, flows |
| What's the hierarchy? | Tree, sunburst, treemap | Shows parent-child structure |
| Where is it? | Choropleth, symbol map | Geographic context |

By Data Volume

| Volume | Approach |
|---|---|
| <20 points | Simple charts, direct labeling |
| 20-500 | Standard visualization |
| 500-5000 | Consider aggregation, filtering |
| 5000+ | Aggregation mandatory, or Canvas/WebGL |
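
One simple form of the aggregation mentioned above is equal-width binning along x, with one summary value per bin. This is a sketch under that assumption; `XYSample`, `XBin`, and `binByX` are hypothetical names, and real projects may prefer a library binning utility.

```typescript
interface XYSample { x: number; y: number; }
interface XBin { x0: number; x1: number; count: number; meanY: number; }

// Aggregate samples into `binCount` equal-width bins along x (binCount is
// assumed to be a positive integer). Empty bins are dropped from the result.
function binByX(samples: XYSample[], binCount: number): XBin[] {
  if (samples.length === 0 || binCount < 1) return [];
  const minX = samples.reduce((m, s) => Math.min(m, s.x), Infinity);
  const maxX = samples.reduce((m, s) => Math.max(m, s.x), -Infinity);
  const width = (maxX - minX) / binCount || 1; // fall back to 1 when all x are equal

  const slots: { count: number; sumY: number }[] = [];
  for (let i = 0; i < binCount; i++) slots.push({ count: 0, sumY: 0 });

  samples.forEach((s) => {
    // Clamp so the maximum x lands in the last bin instead of one past it.
    const index = Math.min(binCount - 1, Math.floor((s.x - minX) / width));
    const slot = slots[index];
    if (slot) { slot.count += 1; slot.sumY += s.y; }
  });

  return slots
    .map((slot, i) => ({
      x0: minX + i * width,
      x1: minX + (i + 1) * width,
      count: slot.count,
      meanY: slot.count > 0 ? slot.sumY / slot.count : 0,
    }))
    .filter((bin) => bin.count > 0);
}

// Ten samples with y = x, reduced to two bins.
const rawSamples: XYSample[] = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9].map((v) => ({ x: v, y: v }));
const twoBins = binByX(rawSamples, 2); // counts 5 and 5, mean y 2 and 7
```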

Common Anti-Patterns

  • Pie charts with >5 slices (use bar chart)
  • 3D charts without strong justification
  • Dual-axis with unrelated scales (misleading)
  • Non-zero baselines for bar charts (distorts perception)
  • Truncated axes without clear indication
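
A short worked example shows why a non-zero baseline distorts bar charts. Take two values, 90 and 100: with a baseline of 80 the bars are drawn 10 and 20 units tall, so one looks twice the other although the data differ by about 11%. The function names below are hypothetical; the lie factor follows Tufte's definition (effect shown in the graphic divided by effect in the data).

```typescript
// Ratio of drawn bar heights when the value axis starts at `baseline`.
function drawnRatio(smaller: number, larger: number, baseline: number): number {
  return (larger - baseline) / (smaller - baseline);
}

// Lie factor: relative change shown in the graphic divided by the
// relative change in the data. A value of 1 means no distortion.
function lieFactor(first: number, second: number, baseline: number): number {
  const shownEffect = ((second - baseline) - (first - baseline)) / (first - baseline);
  const dataEffect = (second - first) / first;
  return shownEffect / dataEffect;
}

const honestRatio = drawnRatio(90, 100, 0);     // 100 / 90, about 1.11
const truncatedRatio = drawnRatio(90, 100, 80); // 20 / 10 = 2
const truncatedLie = lieFactor(90, 100, 80);    // about 9
const zeroBaselineLie = lieFactor(90, 100, 0);  // 1
```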

4. Color

Palette Types

| Type | Use Case | Examples |
|---|---|---|
| Sequential | Low to high values | Blues, Greens, Viridis |
| Diverging | Values diverge from midpoint | RdBu, BrBG, Spectral |
| Categorical | Distinct categories | Set2, Tableau10, Category10 |

Colorblind Safety

  • 8% of men, 0.5% of women have color vision deficiency
  • Never rely on color alone—use shape, pattern, labels
  • Safe sequential: viridis, cividis, plasma
  • Safe categorical: ColorBrewer's colorblind-safe options
  • Test with: Coblis, Sim Daltonism, Chrome DevTools
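
One way to avoid relying on color alone is to give every category both a color and a shape, so either channel identifies it. The sketch below is illustrative: the helper name is hypothetical, and the four hex values are taken from the Okabe-Ito palette, an assumption not named in this document.

```typescript
interface MarkStyle { color: string; shape: string; }

// Assumption: four colors from the Okabe-Ito colorblind-safe palette.
const safeColors = ["#E69F00", "#56B4E9", "#009E73", "#0072B2"];
const markShapes = ["circle", "square", "triangle", "diamond"];

// Give each category a unique color AND a unique shape. Refuses to run
// when there are more categories than distinct shapes or colors.
function encodeCategories(categories: string[]): Record<string, MarkStyle> {
  const limit = Math.min(safeColors.length, markShapes.length);
  if (categories.length > limit) {
    throw new Error("Too many categories for redundant encoding: " + categories.length);
  }
  const styles: Record<string, MarkStyle> = {};
  categories.forEach((category, i) => {
    styles[category] = {
      color: safeColors[i] ?? "#000000",
      shape: markShapes[i] ?? "circle",
    };
  });
  return styles;
}

const categoryStyles = encodeCategories(["Desktop", "Mobile", "Tablet"]);
// categoryStyles["Mobile"] is { color: "#56B4E9", shape: "square" }
```

Failing loudly above four categories is a deliberate choice here: recycling shapes would quietly break the backup encoding.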

Perceptual Uniformity

  • Avoid rainbow colormaps (jet)—perceptual steps are uneven
  • Use viridis, parula, cividis for sequential data
  • These are designed so that equal steps in the data look like roughly equal steps in color

Color Guidelines

  • 4.5:1 contrast ratio for text (WCAG AA)
  • 3:1 contrast for UI components
  • Max 7-10 distinct categorical colors
  • Use saturation/lightness variation for emphasis
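
The contrast thresholds above can be checked in code. The sketch below follows the WCAG 2.x relative-luminance and contrast-ratio formulas; the function names are hypothetical, and colors are assumed to be six-digit `#rrggbb` strings.

```typescript
// Convert one 8-bit sRGB channel to linear light (WCAG 2.x definition).
function linearChannel(value8bit: number): number {
  const c = value8bit / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance of a "#rrggbb" color: 0 for black, 1 for white.
function relativeLuminance(hex: string): number {
  const r = parseInt(hex.slice(1, 3), 16);
  const g = parseInt(hex.slice(3, 5), 16);
  const b = parseInt(hex.slice(5, 7), 16);
  return 0.2126 * linearChannel(r) + 0.7152 * linearChannel(g) + 0.0722 * linearChannel(b);
}

// Contrast ratio between two colors, from 1 (identical) to 21 (black on white).
function contrastRatio(hexA: string, hexB: string): number {
  const la = relativeLuminance(hexA);
  const lb = relativeLuminance(hexB);
  return (Math.max(la, lb) + 0.05) / (Math.min(la, lb) + 0.05);
}

const blackOnWhite = contrastRatio("#000000", "#ffffff");                // about 21
const grayTextPassesAA = contrastRatio("#767676", "#ffffff") >= 4.5;     // true
const lighterGrayPassesAA = contrastRatio("#777777", "#ffffff") >= 4.5;  // false
```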

5. Layout Algorithms

🚨 Before implementing ANY layout algorithm, check if a library exists.

Algorithm → Library Mapping

| Problem | Algorithm | Libraries |
|---|---|---|
| Layered/DAG graphs | Sugiyama (1981) | dagre, ELK.js |
| Force-directed networks | Force simulation, e.g. Fruchterman-Reingold (1991) | d3-force, Cytoscape.js |
| Tree layouts | Reingold-Tilford (1981) | d3-hierarchy |
| Treemaps | Squarified (2000) | d3-hierarchy, ECharts |
| Circle packing | Wang (2006) | d3-hierarchy |
| Sankey diagrams | | d3-sankey |
| Chord diagrams | | d3-chord |
| Large graphs (10k+) | WebGL + spatial indexing | Sigma.js, G6, deck.gl |
| Spatial queries | Quadtree, R-tree | d3-quadtree, rbush |
| Edge crossing minimization | Barth (2002) | Built into dagre/ELK |

When to Use Each Layout

| Layout | Best For |
|---|---|
| Sugiyama (dagre) | Flowcharts, dependency graphs, DAGs with direction |
| Force-directed | Social networks, organic relationships, exploration |
| Tree | Hierarchies with single parent per node |
| Treemap | Hierarchies with quantitative values |
| Circular | Emphasizing central nodes, ring structures |
| Matrix | Dense graphs where edges would overlap |

These problems are solved. Never implement from scratch.
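
Choosing which library to reach for can start from the graph's structure. The sketch below is a rough heuristic, not a layout algorithm: it only classifies a directed graph as a tree, a DAG, or a general graph, so the matching library can do the actual layout. `chooseLayout` and its types are hypothetical, every edge endpoint is assumed to appear in `nodeIds`, and sending cyclic graphs to force-directed layout is a simplification (layered tools can often handle cycles too).

```typescript
interface DirectedEdge { from: string; to: string; }
type LayoutChoice = "tree" | "layered" | "force-directed";

function chooseLayout(nodeIds: string[], edges: DirectedEdge[]): LayoutChoice {
  const inDegree: Record<string, number> = {};
  const childrenOf: Record<string, string[]> = {};
  nodeIds.forEach((id) => { inDegree[id] = 0; childrenOf[id] = []; });
  edges.forEach((edge) => {
    inDegree[edge.to] = (inDegree[edge.to] ?? 0) + 1;
    const list = childrenOf[edge.from];
    if (list) list.push(edge.to);
  });

  // Kahn's algorithm: repeatedly remove nodes with no remaining incoming
  // edges. If some nodes can never be removed, the graph has a cycle.
  const remaining: Record<string, number> = {};
  nodeIds.forEach((id) => { remaining[id] = inDegree[id] ?? 0; });
  const queue = nodeIds.filter((id) => remaining[id] === 0);
  let removed = 0;
  while (queue.length > 0) {
    const id = queue.shift();
    if (id === undefined) break;
    removed += 1;
    (childrenOf[id] ?? []).forEach((child) => {
      const left = (remaining[child] ?? 0) - 1;
      remaining[child] = left;
      if (left === 0) queue.push(child);
    });
  }
  if (removed !== nodeIds.length) return "force-directed";

  // Acyclic: a tree needs exactly one root and at most one parent per node.
  const rootCount = nodeIds.filter((id) => inDegree[id] === 0).length;
  const singleParent = nodeIds.every((id) => (inDegree[id] ?? 0) <= 1);
  return singleParent && rootCount === 1 ? "tree" : "layered";
}

const orgChart = chooseLayout(["a", "b", "c"], [{ from: "a", to: "b" }, { from: "a", to: "c" }]); // "tree"
const buildGraph = chooseLayout(["a", "b", "c", "d"], [
  { from: "a", to: "b" }, { from: "a", to: "c" }, { from: "b", to: "d" }, { from: "c", to: "d" },
]); // "layered"
const mutualFollow = chooseLayout(["a", "b"], [{ from: "a", to: "b" }, { from: "b", to: "a" }]); // "force-directed"
```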


6. Rendering & Performance

Rendering Technology Thresholds

<1000 elements    → SVG
                    - DOM events work naturally
                    - Accessibility (ARIA) supported
                    - Crisp at any zoom level
                    - CSS styling

1000-10000        → Canvas
                    - Batch rendering
                    - Manual hit testing required
                    - Lower memory footprint
                    - requestAnimationFrame for animation

>10000            → WebGL
                    - GPU acceleration
                    - Sigma.js, deck.gl, regl
                    - Complex setup
                    - Limited text rendering
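
The thresholds above translate directly into a small selection function. This is a minimal sketch: the cut-offs are the rule-of-thumb values from this section rather than hard limits, and `chooseRenderer` is a hypothetical helper.

```typescript
type RendererKind = "svg" | "canvas" | "webgl";

// Map an element count to a rendering technology using the thresholds above.
function chooseRenderer(elementCount: number): RendererKind {
  if (elementCount < 1000) return "svg";
  if (elementCount <= 10000) return "canvas";
  return "webgl";
}

const smallChart = chooseRenderer(250);      // "svg"
const denseScatter = chooseRenderer(5000);   // "canvas"
const hugeNetwork = chooseRenderer(50000);   // "webgl"
```

In practice the choice may also depend on the trade-offs listed for each technology, such as needing DOM events or ARIA support.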

Performance Patterns

| Pattern | When to Use |
|---|---|
| Web Workers | Layout computation (never block main thread) |
| Spatial indexing | Hit detection with quadtree/R-tree |
| Level-of-detail | Simplify distant/small elements |
| Viewport culling | Only render visible elements |
| Debouncing | Expensive interactions (zoom, filter) |
| Virtualization | Long lists of chart components |
| Aggregation | Too many data points to render individually |
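
Viewport culling, for instance, can be as small as a bounding-box test before drawing. The sketch below assumes circular marks and a rectangular viewport in the same coordinate space; `ViewRect`, `Glyph`, and `visibleGlyphs` are hypothetical names.

```typescript
interface ViewRect { x: number; y: number; width: number; height: number; }
interface Glyph { cx: number; cy: number; radius: number; }

// Keep only glyphs whose bounding box overlaps the viewport. The test is
// conservative: a glyph that barely misses a corner may still be kept.
function visibleGlyphs(glyphs: Glyph[], view: ViewRect): Glyph[] {
  return glyphs.filter((g) =>
    g.cx + g.radius >= view.x &&
    g.cx - g.radius <= view.x + view.width &&
    g.cy + g.radius >= view.y &&
    g.cy - g.radius <= view.y + view.height);
}

const sceneGlyphs: Glyph[] = [
  { cx: 5, cy: 5, radius: 1 },    // fully inside
  { cx: 50, cy: 50, radius: 1 },  // far outside
  { cx: -0.5, cy: 5, radius: 1 }, // straddles the left edge
];
const toDraw = visibleGlyphs(sceneGlyphs, { x: 0, y: 0, width: 10, height: 10 }); // 2 glyphs
```

This linear scan checks every glyph; for very large scenes, the spatial indexing row in the table is the usual next step.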

Anti-Patterns

  • 5000 SVG nodes (use Canvas)
  • Layout computation on main thread
  • Hit testing without spatial indexing
  • Rendering off-screen elements
  • Animating thousands of elements individually

7. Libraries

Graph Layouts

| Library | Best For | Notes |
|---|---|---|
| dagre | Layered DAGs, flowcharts | Sugiyama algorithm, good defaults |
| dagre-d3 | dagre + D3 rendering | SVG output |
| ELK.js | Complex layouts, compound graphs | Eclipse Layout Kernel, highly configurable |
| d3-force | Organic networks | Velocity Verlet force simulation, customizable forces |
| Cytoscape.js | Graph analysis + visualization | Rich algorithm library |
| Sigma.js | Large graphs (10k+) | WebGL rendering |
| G6/AntV | Enterprise graphs | Full-featured, Chinese ecosystem |
| vis-network | Quick prototypes | Easy API, limited customization |

Charting

| Library | Best For | Notes |
|---|---|---|
| D3.js | Custom, highly interactive | Low-level, maximum control |
| Observable Plot | Quick exploration | D3 team, excellent defaults |
| Recharts | React integration | Declarative, composable |
| Victory | React integration | Animation support |
| ECharts | Feature-rich dashboards | Great mobile, large dataset support |
| Vega-Lite | Grammar of graphics | Declarative JSON spec |
| Chart.js | Simple charts | Easy setup, limited customization |
| Plotly | Scientific visualization | 3D support, interactivity |

When to Use D3 vs Higher-Level Libraries

Use D3 when:

  • Need complete control over rendering
  • Building novel/custom visualizations
  • Integrating with existing SVG/Canvas code
  • Performance-critical with custom optimizations

Use higher-level libraries when:

  • Standard chart types suffice
  • Faster development time matters
  • Team less experienced with D3
  • Need built-in responsiveness/animation

8. Composition & Layout

Project Composition (Dashboard Level)

  • Visual hierarchy — Guide eye to most important first
  • Grid systems — Align elements for coherence
  • Grouping — Related visualizations together
  • White space — Breathing room, not wasted space
  • Reading flow — Z-pattern or F-pattern for Western audiences

Chart Composition (Single Chart)

| Element | Guidelines |
|---|---|
| Title | Clear, descriptive; top-left or centered above |
| Subtitle | Additional context; smaller, below title |
| Axes | Labeled with units; tick marks at meaningful intervals |
| Legend | Embedded when possible; external if complex |
| Aspect ratio | Affects slope perception; 45° banking for trends |
| Margins | Enough for labels; consistent across charts |

Aspect Ratio Guidelines

  • Line charts: Pick a ratio that draws typical segments near 45° (banking to 45°); ~16:9 is a common starting point for trends
  • Bar charts: Depends on number of bars
  • Scatter plots: Often square (1:1) for correlation
  • Maps: Preserve geographic proportions
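
Banking to 45° can be estimated from the data. The sketch below uses the median absolute slope heuristic, one of several banking methods: if a segment has data slope s, it is drawn with slope s · (xRange / yRange) · (height / width), so setting the median drawn slope to 1 gives width / height = median|s| · xRange / yRange. `bankedAspectRatio` is a hypothetical helper.

```typescript
interface SeriesPoint { x: number; y: number; }

// Width/height ratio at which the median absolute segment slope is drawn
// at 45 degrees. Returns undefined for flat or degenerate series.
function bankedAspectRatio(series: SeriesPoint[]): number | undefined {
  const slopes: number[] = [];
  series.forEach((curr, i) => {
    const prev = series[i - 1];
    if (prev === undefined) return; // first point has no previous segment
    const dx = curr.x - prev.x;
    if (dx !== 0) slopes.push(Math.abs((curr.y - prev.y) / dx));
  });
  if (slopes.length === 0) return undefined;

  slopes.sort((a, b) => a - b);
  const mid = Math.floor(slopes.length / 2);
  const median = slopes.length % 2 === 1
    ? (slopes[mid] ?? 0)
    : ((slopes[mid - 1] ?? 0) + (slopes[mid] ?? 0)) / 2;

  const xRange = series.reduce((m, p) => Math.max(m, p.x), -Infinity)
    - series.reduce((m, p) => Math.min(m, p.x), Infinity);
  const yRange = series.reduce((m, p) => Math.max(m, p.y), -Infinity)
    - series.reduce((m, p) => Math.min(m, p.y), Infinity);
  if (median === 0 || yRange === 0) return undefined;
  return (median * xRange) / yRange;
}

// A zigzag with slopes of +2 and -2 over x in [0, 4] and y in [0, 2].
const zigzag: SeriesPoint[] = [
  { x: 0, y: 0 }, { x: 1, y: 2 }, { x: 2, y: 0 }, { x: 3, y: 2 }, { x: 4, y: 0 },
];
const zigzagRatio = bankedAspectRatio(zigzag); // 4, i.e. four times as wide as tall
```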

9. Annotation

Annotation Types

| Type | Purpose |
|---|---|
| Title | The "what": identifies the visualization |
| Subtitle | Additional context, data source |
| Caption | The "so what": key insight or takeaway |
| Axis labels | Variable names and units |
| Legend | Decode color/shape/size mappings |
| Callouts | Highlight specific data points |
| Reference lines | Benchmarks, targets, averages |
| Source citation | Data provenance |

Best Practices

  • Annotate the insight, not just the data — "Sales peaked in Q3" not just "Sales over time"
  • Use callouts sparingly — Highlight 1-3 key points maximum
  • Direct labeling — Embed labels in chart when possible (vs separate legend)
  • Provide context — Benchmarks, historical reference, targets
  • Layer information — Overview visible, details on interaction

Text Hierarchy

  1. Title (largest, boldest)
  2. Subtitle/caption
  3. Axis titles
  4. Tick labels
  5. Annotations
  6. Source (smallest)

10. Accessibility

WCAG Requirements

  • AA minimum (AAA preferred)
  • 4.5:1 contrast ratio for normal text
  • 3:1 contrast for large text and UI components
  • No information conveyed by color alone

Keyboard Navigation

  • Tab through interactive elements
  • Arrow keys for traversing data points
  • Enter/Space for selection
  • Escape to cancel/close
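
The key bindings above can be modeled as a pure state transition, which keeps them easy to test apart from any DOM code. This is a sketch: `ChartFocus` and `handleChartKey` are hypothetical, the key strings are standard `KeyboardEvent.key` values, and Tab is left to the browser's normal focus order.

```typescript
interface ChartFocus { index: number; selected: number | null; detailsOpen: boolean; }

// Return the next focus state for a key press over `pointCount` data points.
function handleChartKey(key: string, state: ChartFocus, pointCount: number): ChartFocus {
  const last = Math.max(0, pointCount - 1);
  switch (key) {
    case "ArrowRight":
    case "ArrowDown": // move to the next data point, stopping at the end
      return { index: Math.min(last, state.index + 1), selected: state.selected, detailsOpen: state.detailsOpen };
    case "ArrowLeft":
    case "ArrowUp": // move to the previous data point, stopping at the start
      return { index: Math.max(0, state.index - 1), selected: state.selected, detailsOpen: state.detailsOpen };
    case "Enter":
    case " ": // Space: select the focused point and show its details
      return { index: state.index, selected: state.index, detailsOpen: true };
    case "Escape": // cancel the selection and close details
      return { index: state.index, selected: null, detailsOpen: false };
    default:
      return state;
  }
}

const startFocus: ChartFocus = { index: 0, selected: null, detailsOpen: false };
const afterRight = handleChartKey("ArrowRight", startFocus, 12); // index 1
const afterEnter = handleChartKey("Enter", afterRight, 12);      // selected 1, details open
const afterEscape = handleChartKey("Escape", afterEnter, 12);    // selection cleared
```

A `keydown` listener on the chart container could call this function and then move focus or update ARIA attributes from the returned state.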

Screen Reader Support

<svg role="img" aria-labelledby="chart-title chart-desc">
  <title id="chart-title">Monthly Sales 2024</title>
  <desc id="chart-desc">Bar chart showing sales increasing from $10M in January to $15M in December</desc>
</svg>
  • Use ARIA labels and roles
  • Provide text alternatives
  • Announce dynamic updates with live regions
  • Structure for logical reading order

Alternative Representations

  • Data tables — Provide as fallback for all charts
  • Text summaries — Describe key insights
  • Sonification — Audio representation for time-series
  • Tactile graphics — For physical accessibility
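
The data-table fallback and the text summary can both be generated from the same data that drives the chart. The sketch below is one possible shape; `LabeledValue`, `describeSeries`, and `fallbackTable` are hypothetical, and the sample values are illustrative only.

```typescript
interface LabeledValue { label: string; value: number; }

// Text summary: name the lowest and highest points of a series.
function describeSeries(title: string, rows: LabeledValue[]): string {
  if (rows.length === 0) return title + ": no data.";
  const low = rows.reduce((best, row) => (row.value < best.value ? row : best));
  const high = rows.reduce((best, row) => (row.value > best.value ? row : best));
  return title + ": " + rows.length + " values. Lowest is " + low.value +
    " (" + low.label + "); highest is " + high.value + " (" + high.label + ").";
}

// Data table fallback: a header row followed by one row per data point,
// ready to be rendered as an HTML table next to the chart.
function fallbackTable(rows: LabeledValue[], header: string[]): string[][] {
  return [header].concat(rows.map((row) => [row.label, String(row.value)]));
}

const quarterSales: LabeledValue[] = [
  { label: "Jan", value: 10 },
  { label: "Feb", value: 12 },
  { label: "Mar", value: 15 },
];
const seriesSummary = describeSeries("Monthly sales ($M)", quarterSales);
// "Monthly sales ($M): 3 values. Lowest is 10 (Jan); highest is 15 (Mar)."
const tableCells = fallbackTable(quarterSales, ["Month", "Sales ($M)"]);
```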

11. Anti-Patterns Summary

Design Anti-Patterns

| Anti-Pattern | Why It's Wrong | What to Do |
|---|---|---|
| 3D charts | Distorts perception | Use 2D |
| Pie >5 slices | Hard to compare | Use bar chart |
| Dual unrelated axes | Misleading correlation | Separate charts |
| Non-zero baseline | Exaggerates differences | Start at zero |
| Rainbow colormap | Perceptually uneven | Use viridis |
| Color-only encoding | Excludes colorblind | Add shape/pattern |
| Chart junk | Distracts from data | Remove decoration |
| Overplotting | Hides data density | Aggregate or jitter |

Implementation Anti-Patterns

| Anti-Pattern | Why It's Wrong | What to Do |
|---|---|---|
| Custom graph layout | Reinventing solved problem | Use dagre/ELK |
| 5000 SVG nodes | Poor performance | Use Canvas |
| Main thread layout | Blocks UI | Use Web Worker |
| No spatial indexing | Slow hit detection | Use quadtree |
| Rendering off-screen | Wasted computation | Viewport culling |

12. Academic Foundations

Seminal Papers

| Paper | Year | Contribution |
|---|---|---|
| Cleveland & McGill "Graphical Perception" | 1984 | Visual encoding hierarchy |
| Shneiderman "The Eyes Have It" | 1996 | Overview-zoom-filter-details mantra |
| Gansner et al. "A Technique for Drawing Directed Graphs" | 1993 | Foundation for dagre |
| Fruchterman & Reingold "Force-directed Placement" | 1991 | Classic force-directed layout model |
| Sugiyama et al. "Hierarchical Systems" | 1981 | Layered graph layout |
| Barth et al. "Bilayer Cross Counting" | 2002 | Fast crossing counting for edge crossing minimization |
| Brewer "Color Use Guidelines" | 1994 | ColorBrewer palettes |

Essential Resources

| Resource | Type | Focus |
|---|---|---|
| ColorBrewer (colorbrewer2.org) | Tool | Accessible color palettes |
| From Data to Viz (data-to-viz.com) | Guide | Chart selection decision tree |
| Visualization Analysis & Design (Munzner) | Textbook | Comprehensive theory |
| Data Visualisation (Kirk) | Textbook | Practitioner guide |
| Visual Display of Quantitative Information (Tufte) | Textbook | Data-ink ratio, chart junk |
| D3 Gallery (observablehq.com/@d3/gallery) | Examples | Implementation patterns |

Summary

🚨 Before implementing visualization:

  1. What question are you answering? → Select chart type
  2. What's your data volume? → Select rendering technology
  3. Is there an established algorithm? → Use the library
  4. Is it accessible? → Color, keyboard, screen reader
  5. Does it follow perceptual best practices? → Encoding hierarchy