---
name: Data Visualization
description: "Comprehensive data visualization skill covering visual execution and technical implementation. Includes perceptual foundations, chart selection, layout algorithms, and library guidance. Load on-demand when building charts, graphs, dashboards, or any visual data representation."
version: 1.0.0
---
# Data Visualization
Visualization is communication. Every visual element must serve understanding.
## Critical Rules
🚨 **Use established algorithms.** Graph layout, tree layout, and spatial indexing are solved problems. Check dagre, d3-force, and ELK.js before implementing anything custom.

🚨 **Choose encodings by perceptual accuracy.** Position beats length beats angle beats area beats color. Prefer bar charts over pie charts over bubble charts.

🚨 **Never rely on color alone.** About 8% of men are colorblind. Use shape, pattern, or labels as backup encoding.

🚨 **Match rendering to scale.** As a rule of thumb: SVG for <1000 elements, Canvas for 1000-10000, WebGL for >10000.

---
## 1. Visual Encoding
### Marks & Channels
**Marks** are geometric primitives representing data:
- Points (scatter plots, dot plots)
- Lines (line charts, network edges)
- Areas (bar charts, area charts, maps)

**Channels** are visual properties applied to marks:
- Position (x, y coordinates)
- Size (length, area, volume)
- Color (hue, saturation, lightness)
- Shape (circle, square, triangle)
- Orientation (angle, slope)
### Cleveland & McGill Hierarchy (1984)
Visual encodings ranked by perceptual accuracy:
1. **Position along common scale** (most accurate)
2. Position on non-aligned scales
3. Length
4. Angle/slope
5. Area
6. Volume
7. **Color saturation/hue** (least accurate)

**Implication:** Bar charts (position) > pie charts (angle) > bubble charts (area)
### Preattentive Attributes
Properties processed in <250ms without conscious effort:
- Color (hue, saturation)
- Form (orientation, length, width, size, shape)
- Spatial position
- Motion

Use preattentive attributes for the most important data, because they "pop out" automatically.
### Channel Effectiveness by Data Type
| Data Type | Best Channels |
|-----------|---------------|
| Quantitative | Position, length, angle, area |
| Ordinal | Position, density, saturation |
| Categorical | Shape, hue, spatial region |
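
The table above can be read as an ordered lookup. A minimal sketch, assuming a field is given the first listed channel that is not already in use; `DataKind`, `BEST_CHANNELS`, and `pickChannel` are hypothetical names, not part of any library:

```typescript
// Hypothetical helper: channel lists mirror the table above, best first.
type DataKind = "quantitative" | "ordinal" | "categorical";

const BEST_CHANNELS: Record<DataKind, string[]> = {
  quantitative: ["position", "length", "angle", "area"],
  ordinal: ["position", "density", "saturation"],
  categorical: ["shape", "hue", "spatial region"],
};

// Return the most effective free channel for this data type,
// or undefined when every listed channel is already taken.
function pickChannel(kind: DataKind, used: string[] = []): string | undefined {
  const free = BEST_CHANNELS[kind].filter((channel) => used.indexOf(channel) === -1);
  return free.length > 0 ? free[0] : undefined;
}

const firstChoice = pickChannel("quantitative");
const fallbackChoice = pickChannel("quantitative", ["position", "length"]);
```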
---
## 2. Interaction Design
### Shneiderman's Mantra (1996)
"Overview first, zoom and filter, then details on demand"
1. **Overview** — Show entire dataset, establish context
2. **Zoom & Filter** — Reduce complexity, focus on subset
3. **Details on Demand** — Tooltips, click-to-expand, drill-down
### Interaction Patterns
| Pattern | Use Case |
|---------|----------|
| Brushing & linking | Cross-highlighting across coordinated views |
| Focus + context | Fisheye lens, detail-on-demand panels |
| Direct manipulation | Drag nodes, resize elements, reorder |
| Animated transitions | Help users track changes between states |
| Pan & zoom | Navigate large visualizations |
| Filtering | Reduce data to relevant subset |
| Selection | Highlight specific data points |
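
Brushing and linking usually comes down to sharing record ids between views. A minimal sketch, assuming every view's records carry the same `id` field; `brushedIds` and `linkedRows` are hypothetical helpers, and the sample data is made up for illustration:

```typescript
// Assumption: each datum carries an id shared by every coordinated view.
interface BrushDatum { id: string; x: number; y: number; }
interface BrushExtent { x0: number; y0: number; x1: number; y1: number; }

// Brushing: collect ids of points inside the dragged rectangle.
// The corners may arrive in any order, so normalize them first.
function brushedIds(data: BrushDatum[], extent: BrushExtent): string[] {
  const xLo = Math.min(extent.x0, extent.x1);
  const xHi = Math.max(extent.x0, extent.x1);
  const yLo = Math.min(extent.y0, extent.y1);
  const yHi = Math.max(extent.y0, extent.y1);
  return data
    .filter((d) => d.x >= xLo && d.x <= xHi && d.y >= yLo && d.y <= yHi)
    .map((d) => d.id);
}

// Linking: a second view keeps only its own records whose ids were brushed.
function linkedRows<T extends { id: string }>(rows: T[], ids: string[]): T[] {
  return rows.filter((row) => ids.indexOf(row.id) !== -1);
}

const scatterData: BrushDatum[] = [
  { id: "a", x: 1, y: 1 },
  { id: "b", x: 5, y: 5 },
  { id: "c", x: 9, y: 2 },
];
const tableRows = [
  { id: "a", region: "North" },
  { id: "b", region: "South" },
  { id: "c", region: "East" },
];
const brushed = brushedIds(scatterData, { x0: 6, y0: 6, x1: 0, y1: 0 });
const highlightedRows = linkedRows(tableRows, brushed);
```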
---
## 3. Chart Selection
### By Question Type
| Question | Chart Type | Why |
|----------|------------|-----|
| How do values compare? | Bar chart | Position encoding is most accurate |
| How has this changed over time? | Line chart | Shows trends, handles many points |
| What's the distribution? | Histogram, box plot | Shows spread, outliers, shape |
| What's the relationship? | Scatter plot | Reveals correlation, clusters |
| What's the part-to-whole? | Stacked bar, treemap | Shows composition |
| What are the connections? | Network graph, Sankey | Shows relationships, flows |
| What's the hierarchy? | Tree, sunburst, treemap | Shows parent-child structure |
| Where is it? | Choropleth, symbol map | Geographic context |
### By Data Volume
| Volume | Approach |
|--------|----------|
| <20 points | Simple charts, direct labeling |
| 20-500 | Standard visualization |
| 500-5000 | Consider aggregation, filtering |
| 5000+ | Aggregation mandatory, or Canvas/WebGL |
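
For the larger volumes above, aggregation can be as simple as fixed-width binning along x. A minimal sketch, assuming numeric x/y points and equal-width bins; `binSeries` and its summary fields are hypothetical names:

```typescript
interface XYPoint { x: number; y: number; }
interface BinSummary {
  x0: number; x1: number;   // bin edges along x
  count: number;            // points that fell into the bin
  min: number; max: number; mean: number;
}

// Collapse many points into at most binCount summaries (empty bins are dropped).
function binSeries(points: XYPoint[], binCount: number): BinSummary[] {
  const n = Math.floor(binCount);
  if (points.length === 0 || n < 1) return [];
  let xMin = Infinity;
  let xMax = -Infinity;
  for (const p of points) {
    if (p.x < xMin) xMin = p.x;
    if (p.x > xMax) xMax = p.x;
  }
  const width = (xMax - xMin) / n || 1; // guard against a zero-width x range
  const bins: BinSummary[] = [];
  const sums: number[] = [];
  for (let i = 0; i < n; i++) {
    bins.push({ x0: xMin + i * width, x1: xMin + (i + 1) * width,
                count: 0, min: Infinity, max: -Infinity, mean: 0 });
    sums.push(0);
  }
  for (const p of points) {
    // The maximum x would land one past the end, so clamp it into the last bin.
    const i = Math.min(n - 1, Math.floor((p.x - xMin) / width));
    bins[i].count += 1;
    bins[i].min = Math.min(bins[i].min, p.y);
    bins[i].max = Math.max(bins[i].max, p.y);
    sums[i] += p.y;
  }
  for (let i = 0; i < n; i++) {
    if (bins[i].count > 0) bins[i].mean = sums[i] / bins[i].count;
  }
  return bins.filter((b) => b.count > 0);
}
```

Drawing one min-max band plus a mean line per bin keeps outliers visible while cutting the mark count to the number of bins.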
### Common Anti-Patterns
- ❌ Pie charts with >5 slices (use bar chart)
- ❌ 3D charts without strong justification
- ❌ Dual-axis with unrelated scales (misleading)
- ❌ Non-zero baselines for bar charts (distorts perception)
- ❌ Truncated axes without clear indication
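
The baseline rule can be enforced when the axis domain is computed. A minimal sketch, assuming a linear scale; `barDomain` and `lengthFromAxisStart` are hypothetical helpers, and the values 92 and 98 are made-up sample data:

```typescript
// Always include zero in a bar chart's value domain.
function barDomain(values: number[]): [number, number] {
  let lo = 0;
  let hi = 0;
  for (const v of values) {
    if (v < lo) lo = v;
    if (v > hi) hi = v;
  }
  return [lo, hi];
}

// Linear scale: drawn length in pixels, measured from the start of the axis.
function lengthFromAxisStart(value: number, domain: [number, number], pixels: number): number {
  return ((value - domain[0]) / (domain[1] - domain[0])) * pixels;
}

// Truncated axis [90, 100]: a 6.5% difference is drawn as a 4x difference.
const truncatedRatio =
  lengthFromAxisStart(98, [90, 100], 100) / lengthFromAxisStart(92, [90, 100], 100);

// Zero-based axis: the drawn ratio matches the data ratio (98 / 92).
const honestDomain = barDomain([92, 98]);
const honestRatio =
  lengthFromAxisStart(98, honestDomain, 100) / lengthFromAxisStart(92, honestDomain, 100);
```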
---
## 4. Color
### Palette Types
| Type | Use Case | Examples |
|------|----------|----------|
| Sequential | Low to high values | Blues, Greens, Viridis |
| Diverging | Values diverge from midpoint | RdBu, BrBG, Spectral |
| Categorical | Distinct categories | Set2, Tableau10, Category10 |
### Colorblind Safety
- 8% of men, 0.5% of women have color vision deficiency
- **Never rely on color alone**—use shape, pattern, labels
- Safe sequential: viridis, cividis, plasma
- Safe categorical: ColorBrewer's colorblind-safe options
- Test with: Coblis, Sim Daltonism, Chrome DevTools
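
One way to apply the "never rely on color alone" rule is to assign every category a color and a shape together. A minimal sketch: the hex values are taken from the Okabe-Ito palette as an assumption (any verified colorblind-safe palette would do), and `encodeCategories` is a hypothetical helper:

```typescript
interface CategoryStyle { color: string; shape: string; }

// Assumption: a subset of the Okabe-Ito palette; swap in your own verified palette.
const SAFE_COLORS = ["#E69F00", "#56B4E9", "#009E73", "#D55E00", "#0072B2"];
const MARK_SHAPES = ["circle", "square", "triangle", "diamond", "cross"];

// Pair each category with a distinct color AND a distinct shape, so the
// categories stay distinguishable even if the colors are not.
function encodeCategories(categories: string[]): { [category: string]: CategoryStyle } {
  if (categories.length > MARK_SHAPES.length) {
    throw new Error("Too many categories for distinct shapes; group or facet instead.");
  }
  const styles: { [category: string]: CategoryStyle } = {};
  categories.forEach((category, i) => {
    styles[category] = { color: SAFE_COLORS[i], shape: MARK_SHAPES[i] };
  });
  return styles;
}

const regionStyles = encodeCategories(["North", "South", "East"]);
```

Refusing more categories than shapes is a deliberate choice in this sketch: cycling would reintroduce color-only differences.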
### Perceptual Uniformity
- **Avoid rainbow colormaps** (jet)—perceptual steps are uneven
- Use viridis, cividis, or parula for sequential data
- These maps are designed so that equal steps in the data look like approximately equal steps in color
### Color Guidelines
- 4.5:1 contrast ratio for text (WCAG AA)
- 3:1 contrast for UI components
- Max 7-10 distinct categorical colors
- Use saturation/lightness variation for emphasis
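
The contrast thresholds above can be checked in code using the WCAG 2.x relative-luminance formula. A minimal sketch, assuming six-digit `#rrggbb` colors only; `relativeLuminance` and `contrastRatio` are hypothetical helper names:

```typescript
// WCAG 2.x relative luminance of an sRGB color given as "#rrggbb".
function relativeLuminance(hex: string): number {
  const value = parseInt(hex.replace("#", ""), 16);
  const channels = [(value >> 16) & 255, (value >> 8) & 255, value & 255].map((c) => {
    const s = c / 255;
    // Undo the sRGB transfer curve to get linear light.
    return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
  });
  return 0.2126 * channels[0] + 0.7152 * channels[1] + 0.0722 * channels[2];
}

// Contrast ratio from 1 (identical) to 21 (black on white); order does not matter.
function contrastRatio(a: string, b: string): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  return (Math.max(la, lb) + 0.05) / (Math.min(la, lb) + 0.05);
}

const labelPassesAA = contrastRatio("#767676", "#ffffff") >= 4.5; // normal text
const markPassesUI = contrastRatio("#767676", "#ffffff") >= 3;    // UI components
```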
---
## 5. Layout Algorithms
🚨 **Before implementing ANY layout algorithm, check if a library exists.**
### Algorithm → Library Mapping
| Problem | Algorithm | Libraries |
|---------|-----------|-----------|
| Layered/DAG graphs | Sugiyama (1981) | dagre, ELK.js |
| Force-directed networks | Spring/force simulation (e.g., Fruchterman-Reingold, 1991) | d3-force, Cytoscape.js |
| Tree layouts | Reingold-Tilford (1981) | d3-hierarchy |
| Treemaps | Squarified (2000) | d3-hierarchy, ECharts |
| Circle packing | Wang (2006) | d3-hierarchy |
| Sankey diagrams | — | d3-sankey |
| Chord diagrams | — | d3-chord |
| Large graphs (10k+) | WebGL + spatial indexing | Sigma.js, G6, deck.gl |
| Spatial queries | Quadtree, R-tree | d3-quadtree, rbush |
| Edge crossing minimization | Layer-by-layer heuristics; Barth et al. (2002) for counting crossings | Built into dagre/ELK |
### When to Use Each Layout
| Layout | Best For |
|--------|----------|
| Sugiyama (dagre) | Flowcharts, dependency graphs, DAGs with direction |
| Force-directed | Social networks, organic relationships, exploration |
| Tree | Hierarchies with single parent per node |
| Treemap | Hierarchies with quantitative values |
| Circular | Emphasizing central nodes, ring structures |
| Matrix | Dense graphs where edges would overlap |

**These problems are solved. Never implement from scratch.**

---
## 6. Rendering & Performance
### Rendering Technology Thresholds
```
<1000 elements → SVG
  - DOM events work naturally
  - Accessibility (ARIA) supported
  - Crisp at any zoom level
  - CSS styling

1000-10000 → Canvas
  - Batch rendering
  - Manual hit testing required
  - Lower memory footprint
  - requestAnimationFrame for animation

>10000 → WebGL
  - GPU acceleration
  - Sigma.js, deck.gl, regl
  - Complex setup
  - Limited text rendering
```
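
The thresholds above translate directly into a small selector. A minimal sketch, treating the cut-offs as rules of thumb rather than hard limits; `RenderTech` and `chooseRenderer` are hypothetical names:

```typescript
type RenderTech = "svg" | "canvas" | "webgl";

// Pick a rendering technology from the expected number of drawn elements.
function chooseRenderer(elementCount: number): RenderTech {
  if (elementCount < 1000) return "svg";
  if (elementCount <= 10000) return "canvas";
  return "webgl";
}

const dashboardTech = chooseRenderer(250);
const networkTech = chooseRenderer(50000);
```

Count rendered marks (nodes plus edges plus labels), not data rows, since aggregation may shrink one well below the other.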
### Performance Patterns
| Pattern | When to Use |
|---------|-------------|
| Web Workers | Layout computation (never block main thread) |
| Spatial indexing | Hit detection with quadtree/R-tree |
| Level-of-detail | Simplify distant/small elements |
| Viewport culling | Only render visible elements |
| Debouncing | Expensive interactions (zoom, filter) |
| Virtualization | Long lists of chart components |
| Aggregation | Too many data points to render individually |
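
Viewport culling, for example, is an overlap test run before drawing. A minimal sketch, assuming axis-aligned bounding boxes in the same coordinate space as the viewport; `Bounds` and `visibleItems` are hypothetical names:

```typescript
interface Bounds { x: number; y: number; width: number; height: number; }

// Keep only the items whose bounding box overlaps the viewport rectangle.
function visibleItems<T extends Bounds>(items: T[], viewport: Bounds): T[] {
  return items.filter((item) =>
    item.x < viewport.x + viewport.width &&
    item.x + item.width > viewport.x &&
    item.y < viewport.y + viewport.height &&
    item.y + item.height > viewport.y
  );
}

const demoViewport: Bounds = { x: 0, y: 0, width: 800, height: 600 };
const demoNodes = [
  { label: "inside", x: 100, y: 100, width: 40, height: 20 },
  { label: "partly visible", x: 790, y: 590, width: 40, height: 20 },
  { label: "off-screen", x: 2000, y: 100, width: 40, height: 20 },
];
const toDraw = visibleItems(demoNodes, demoViewport);
```

This linear scan is fine for a few thousand items; beyond that, the spatial indexing row above applies and the scan would be replaced by a range query against a library index.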
### Anti-Patterns
- ❌ 5000 SVG nodes (use Canvas)
- ❌ Layout computation on main thread
- ❌ Hit testing without spatial indexing
- ❌ Rendering off-screen elements
- ❌ Animating thousands of elements individually
---
## 7. Libraries
### Graph Layouts
| Library | Best For | Notes |
|---------|----------|-------|
| dagre | Layered DAGs, flowcharts | Sugiyama algorithm, good defaults |
| dagre-d3 | dagre + D3 rendering | SVG output |
| ELK.js | Complex layouts, compound graphs | Eclipse Layout Kernel, highly configurable |
| d3-force | Organic networks | Velocity Verlet force simulation, customizable forces |
| Cytoscape.js | Graph analysis + visualization | Rich algorithm library |
| Sigma.js | Large graphs (10k+) | WebGL rendering |
| G6/AntV | Enterprise graphs | Full-featured, Chinese ecosystem |
| vis-network | Quick prototypes | Easy API, limited customization |
### Charting
| Library | Best For | Notes |
|---------|----------|-------|
| D3.js | Custom, highly interactive | Low-level, maximum control |
| Observable Plot | Quick exploration | D3 team, excellent defaults |
| Recharts | React integration | Declarative, composable |
| Victory | React integration | Animation support |
| ECharts | Feature-rich dashboards | Great mobile, large dataset support |
| Vega-Lite | Grammar of graphics | Declarative JSON spec |
| Chart.js | Simple charts | Easy setup, limited customization |
| Plotly | Scientific visualization | 3D support, interactivity |
### When to Use D3 vs Higher-Level Libraries
**Use D3 when:**
- Need complete control over rendering
- Building novel/custom visualizations
- Integrating with existing SVG/Canvas code
- Performance-critical with custom optimizations

**Use higher-level libraries when:**
- Standard chart types suffice
- Faster development time matters
- Team less experienced with D3
- Need built-in responsiveness/animation
---
## 8. Composition & Layout
### Project Composition (Dashboard Level)
- **Visual hierarchy** — Guide eye to most important first
- **Grid systems** — Align elements for coherence
- **Grouping** — Related visualizations together
- **White space** — Breathing room, not wasted space
- **Reading flow** — Z-pattern or F-pattern for Western audiences
### Chart Composition (Single Chart)
| Element | Guidelines |
|---------|------------|
| Title | Clear, descriptive; top-left or centered above |
| Subtitle | Additional context; smaller, below title |
| Axes | Labeled with units; tick marks at meaningful intervals |
| Legend | Embedded when possible; external if complex |
| Aspect ratio | Affects slope perception; 45° banking for trends |
| Margins | Enough for labels; consistent across charts |
### Aspect Ratio Guidelines
- **Line charts:** Wide formats (~16:9) often suit trends; choose the ratio so that typical line slopes sit near 45° (banking to 45°)
- **Bar charts:** Depends on number of bars
- **Scatter plots:** Often square (1:1) for correlation
- **Maps:** Preserve geographic proportions
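
Banking to 45° can be estimated from the data. A minimal sketch using the median-absolute-slope heuristic, which is one of several banking methods and is chosen here as an assumption; `bankedAspectRatio` is a hypothetical helper that returns width divided by height:

```typescript
interface SeriesPoint { x: number; y: number; }

// Returns the width/height ratio at which the median segment slope is drawn
// at 45 degrees, or undefined when the series has no usable slope.
function bankedAspectRatio(points: SeriesPoint[]): number | undefined {
  if (points.length < 2) return undefined;
  let xMin = Infinity, xMax = -Infinity, yMin = Infinity, yMax = -Infinity;
  for (const p of points) {
    xMin = Math.min(xMin, p.x); xMax = Math.max(xMax, p.x);
    yMin = Math.min(yMin, p.y); yMax = Math.max(yMax, p.y);
  }
  const xRange = xMax - xMin;
  const yRange = yMax - yMin;
  if (xRange === 0 || yRange === 0) return undefined;

  // Slopes measured in range-normalized units, i.e. as drawn on a square plot.
  const slopes: number[] = [];
  for (let i = 1; i < points.length; i++) {
    const dx = (points[i].x - points[i - 1].x) / xRange;
    const dy = (points[i].y - points[i - 1].y) / yRange;
    if (dx !== 0) slopes.push(Math.abs(dy / dx));
  }
  if (slopes.length === 0) return undefined;
  slopes.sort((a, b) => a - b);
  const mid = Math.floor(slopes.length / 2);
  const median = slopes.length % 2 === 1 ? slopes[mid] : (slopes[mid - 1] + slopes[mid]) / 2;

  // On a plot of width w and height h the drawn slope is median * (h / w),
  // so it equals 1 (45 degrees) when w / h equals the median.
  return median > 0 ? median : undefined;
}

// A single peak spanning the full height over half the width suggests 2:1.
const peakRatio = bankedAspectRatio([{ x: 0, y: 0 }, { x: 1, y: 1 }, { x: 2, y: 0 }]);
```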
---
## 9. Annotation
### Annotation Types
| Type | Purpose |
|------|---------|
| Title | The "what" — identifies the visualization |
| Subtitle | Additional context, data source |
| Caption | The "so what" — key insight or takeaway |
| Axis labels | Variable names and units |
| Legend | Decode color/shape/size mappings |
| Callouts | Highlight specific data points |
| Reference lines | Benchmarks, targets, averages |
| Source citation | Data provenance |
### Best Practices
- **Annotate the insight, not just the data** — "Sales peaked in Q3" not just "Sales over time"
- **Use callouts sparingly** — Highlight 1-3 key points maximum
- **Direct labeling** — Embed labels in chart when possible (vs separate legend)
- **Provide context** — Benchmarks, historical reference, targets
- **Layer information** — Overview visible, details on interaction
### Text Hierarchy
1. Title (largest, boldest)
2. Subtitle/caption
3. Axis titles
4. Tick labels
5. Annotations
6. Source (smallest)
---
## 10. Accessibility
### WCAG Requirements
- **AA minimum** (AAA preferred)
- 4.5:1 contrast ratio for normal text
- 3:1 contrast for large text and UI components
- No information conveyed by color alone
### Keyboard Navigation
- Tab through interactive elements
- Arrow keys for traversing data points
- Enter/Space for selection
- Escape to cancel/close
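
The key bindings above can be kept in one pure function so they are testable without a DOM. A minimal sketch, assuming the strings are `KeyboardEvent.key` values and that Tab is left to the browser's normal focus order; `ChartFocus` and `onChartKey` are hypothetical names:

```typescript
interface ChartFocus {
  index: number;            // data point that currently has focus
  selected: number | null;  // data point chosen with Enter/Space, if any
}

// Map a key press to the next focus state for a chart with pointCount points.
function onChartKey(state: ChartFocus, key: string, pointCount: number): ChartFocus {
  const last = Math.max(0, pointCount - 1);
  switch (key) {
    case "ArrowRight":
    case "ArrowDown":
      return { index: Math.min(last, state.index + 1), selected: state.selected };
    case "ArrowLeft":
    case "ArrowUp":
      return { index: Math.max(0, state.index - 1), selected: state.selected };
    case "Enter":
    case " ": // KeyboardEvent.key for the space bar
      return { index: state.index, selected: state.index };
    case "Escape":
      return { index: state.index, selected: null };
    default:
      return state; // unhandled keys leave the state untouched
  }
}

const startFocus: ChartFocus = { index: 0, selected: null };
const afterArrow = onChartKey(startFocus, "ArrowRight", 3);
const afterEnter = onChartKey(afterArrow, "Enter", 3);
```

A keydown listener on the focusable chart container would call this and then move the visible focus ring and update the live region.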
### Screen Reader Support
```html
<svg role="img" aria-labelledby="chart-title chart-desc">
<title id="chart-title">Monthly Sales 2024</title>
<desc id="chart-desc">Bar chart showing sales increasing from $10M in January to $15M in December</desc>
</svg>
```
- Use ARIA labels and roles
- Provide text alternatives
- Announce dynamic updates with live regions
- Structure for logical reading order
### Alternative Representations
- **Data tables** — Provide as fallback for all charts
- **Text summaries** — Describe key insights
- **Sonification** — Audio representation for time-series
- **Tactile graphics** — For physical accessibility
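
The data-table and text-summary fallbacks can be generated from the same rows that feed the chart. A minimal sketch, assuming simple label/value rows; `ChartRow`, `escapeHtml`, `toDataTable`, and `summarize` are hypothetical helpers:

```typescript
interface ChartRow { label: string; value: number; }

// Escape text before placing it in markup.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Data-table fallback: a captioned table with scoped header cells.
function toDataTable(caption: string, rows: ChartRow[]): string {
  const body = rows
    .map((r) => `<tr><th scope="row">${escapeHtml(r.label)}</th><td>${r.value}</td></tr>`)
    .join("");
  return `<table><caption>${escapeHtml(caption)}</caption>` +
    `<thead><tr><th scope="col">Label</th><th scope="col">Value</th></tr></thead>` +
    `<tbody>${body}</tbody></table>`;
}

// Text summary: states the size and extremes of the series.
function summarize(rows: ChartRow[]): string {
  if (rows.length === 0) return "No data.";
  let low = rows[0];
  let high = rows[0];
  for (const r of rows) {
    if (r.value < low.value) low = r;
    if (r.value > high.value) high = r;
  }
  return `${rows.length} values, from a low of ${low.value} (${low.label}) ` +
    `to a high of ${high.value} (${high.label}).`;
}
```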
---
## 11. Anti-Patterns Summary
### Design Anti-Patterns
| Anti-Pattern | Why It's Wrong | What to Do |
|--------------|----------------|------------|
| 3D charts | Distorts perception | Use 2D |
| Pie >5 slices | Hard to compare | Use bar chart |
| Dual unrelated axes | Misleading correlation | Separate charts |
| Non-zero baseline | Exaggerates differences | Start at zero |
| Rainbow colormap | Perceptually uneven | Use viridis |
| Color-only encoding | Excludes colorblind | Add shape/pattern |
| Chart junk | Distracts from data | Remove decoration |
| Overplotting | Hides data density | Aggregate or jitter |
### Implementation Anti-Patterns
| Anti-Pattern | Why It's Wrong | What to Do |
|--------------|----------------|------------|
| Custom graph layout | Reinventing solved problem | Use dagre/ELK |
| 5000 SVG nodes | Poor performance | Use Canvas |
| Main thread layout | Blocks UI | Use Web Worker |
| No spatial indexing | Slow hit detection | Use quadtree |
| Rendering off-screen | Wasted computation | Viewport culling |
---
## 12. Academic Foundations
### Seminal Papers
| Paper | Year | Contribution |
|-------|------|--------------|
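| Cleveland & McGill "Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods" | 1984 | Visual encoding hierarchy |
| Shneiderman "The Eyes Have It: A Task by Data Type Taxonomy for Information Visualizations" | 1996 | Overview-zoom-filter-details mantra |
| Gansner et al. "A Technique for Drawing Directed Graphs" | 1993 | Foundation for dagre |
| Fruchterman & Reingold "Graph Drawing by Force-directed Placement" | 1991 | Classic force-directed layout algorithm |
| Sugiyama et al. "Methods for Visual Understanding of Hierarchical System Structures" | 1981 | Layered graph layout |
| Barth et al. "Simple and Efficient Bilayer Cross Counting" | 2002 | Fast crossing counting for layered layouts |
| Brewer "Color Use Guidelines for Mapping and Visualization" | 1994 | Basis for ColorBrewer palettes |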
### Essential Resources
| Resource | Type | Focus |
|----------|------|-------|
| ColorBrewer (colorbrewer2.org) | Tool | Accessible color palettes |
| From Data to Viz (data-to-viz.com) | Guide | Chart selection decision tree |
| Visualization Analysis & Design (Munzner) | Textbook | Comprehensive theory |
| Data Visualisation (Kirk) | Textbook | Practitioner guide |
| Visual Display of Quantitative Information (Tufte) | Textbook | Data-ink ratio, chart junk |
| D3 Gallery (observablehq.com/@d3/gallery) | Examples | Implementation patterns |
---
## Summary
🚨 **Before implementing visualization:**
1. **What question are you answering?** → Select chart type
2. **What's your data volume?** → Select rendering technology
3. **Is there an established algorithm?** → Use the library
4. **Is it accessible?** → Color, keyboard, screen reader
5. **Does it follow perceptual best practices?** → Encoding hierarchy