Files
2025-11-30 08:38:26 +08:00

498 lines
15 KiB
Markdown

# Data Visualization
This resource provides cognitive design principles specifically for charts, dashboards, and data-driven presentations.
**Covered topics:**
1. Chart selection via task-encoding alignment
2. Dashboard hierarchy and grouping
3. Progressive disclosure for interactive exploration
4. Narrative data visualization
---
## Why Data Visualization Needs Cognitive Design
### WHY This Matters
**Core insight:** Data visualization is cognitive communication - the goal is insight transfer from data to human understanding. Success requires matching visual encodings to human perceptual strengths.
**Common problems:**
- Wrong chart type chosen (pie when bar would be clearer)
- Dashboards overwhelming users with too many metrics
- Interactive tools causing anxiety (users fear getting lost)
- Data stories lacking clear narrative guidance
**How cognitive principles help:**
- Choose encodings that exploit perceptual accuracy (Cleveland & McGill hierarchy)
- Design dashboards within working memory limits (chunking, hierarchy)
- Structure exploration to reduce cognitive load (progressive disclosure, visible state)
- Guide interpretation through annotations and narrative flow
**Mental model:** Data visualization is a translation problem - translating abstract data into visual form that human perception can process efficiently.
---
## What You'll Learn
**Four key areas:**
1. **Chart Selection:** Match visualization type to user task and perceptual capabilities
2. **Dashboard Design:** Organize multiple visualizations for at-a-glance comprehension
3. **Interactive Exploration:** Enable data investigation without overwhelming or losing users
4. **Narrative Visualization:** Tell data stories with clear flow and guided interpretation
---
## Why Chart Selection Matters
### WHY This Matters
**Core insight:** Not all visual encodings serve all tasks equally well - position/length enable 5-10x more accurate comparison than angle/area.
**Cleveland & McGill's Perceptual Hierarchy (most to least accurate):**
1. Position along common scale
2. Position along non-aligned scales
3. Length
4. Angle/Slope
5. Area
6. Volume
7. Color hue
8. Color saturation
**Implication:** Chart type choice directly determines user accuracy in extracting insights.
**Mental model:** Different charts are tools for different jobs - bar chart is like precision calipers (accurate comparison), pie chart is like eyeballing (rough proportion sense).
---
### WHAT to Apply
#### Task-to-Chart Mapping
**User Task: Compare Values**
**Best choice:** Bar chart (horizontal or vertical)
- **Why:** Position/length encoding (top of hierarchy)
- **Enables:** Precise magnitude comparison, easy ranking
- **Example:** Comparing sales across 6 regions
**Avoid:** Pie chart
- **Why:** Angle/area encoding (lower accuracy)
- **Problem:** Difficult to judge which slice is larger when differences are subtle
**Application rule:**
```
For precise comparison → bar chart with common baseline
Sort descending for easy ranking
Add data labels for exact values (dual coding)
```
---
**User Task: Show Trend Over Time**
**Best choice:** Line chart
- **Why:** Position along time axis shows progression naturally
- **Enables:** Pattern detection (rising, falling, cyclical), slope perception
- **Example:** Monthly revenue over 2 years
**Avoid:** Bar chart for trends (acceptable but line is clearer), stacked area (hard to compare non-bottom series)
**Application rule:**
```
Time always on x-axis (left to right)
Use line for continuous data, bar for discrete periods
Limit to 5-7 lines max (more becomes tangled mess)
Annotate significant events ("Product launch Q2")
```
---
**User Task: Understand Distribution**
**Best choice:** Histogram or box plot
- **Why:** Shows shape (normal, skewed, bimodal), spread, outliers
- **Enables:** Pattern recognition (is data normally distributed?), outlier identification
- **Example:** Distribution of customer order values
**Avoid:** Pie charts (can't show distribution), bar charts of individual values (too granular)
**Application rule:**
```
Histogram: Choose appropriate bin width (too narrow = noise, too wide = hide patterns)
Box plot: Shows median, quartiles, outliers clearly
Include sample size in label for context
```
---
**User Task: Show Part-to-Whole Relationship**
**Acceptable choice:** Pie chart (BUT only if ≤5 slices and differences clear)
- **Why:** Circular metaphor intuitive for "whole"
- **Limitations:** Hard to judge angles, tiny slices unreadable
**Better choice:** Stacked bar chart or treemap
- **Why:** Position/length still more accurate than angle
- **Enables:** Easier comparison of parts
**Application rule:**
```
If using pie: ≤5 slices, sort descending, label percentages directly
Better: Stacked bar (compare lengths) or treemap (hierarchical parts)
Always ask: "Do users need precise comparison?" → If yes, avoid pie
```
---
**User Task: Find Outliers or Clusters**
**Best choice:** Scatterplot
- **Why:** 2D position enables pattern detection, outliers pop out preattentively
- **Enables:** Correlation visualization, cluster identification
- **Example:** Relationship between marketing spend and revenue by product
**Application rule:**
```
Add trend line if correlation expected
Use color/shape for categorical third variable (max 5-7 categories)
Label outliers with annotations
Consider log scale if data spans orders of magnitude
```
---
**User Task: Compare Multiple Variables**
**Best choice:** Small multiples (repeated chart structure)
- **Why:** Consistent structure enables pattern comparison across conditions
- **Enables:** Seeing how patterns change by category
- **Example:** Monthly sales trends for each product category (separate line chart per category)
**Application rule:**
```
Keep axes consistent across multiples for fair comparison
Arrange in logical order (by average, alphabetically, geographically)
Limit to 6-12 multiples (more requires pagination)
Highlight one for focus if needed
```
---
## Why Dashboard Design Matters
### WHY This Matters
**Core insight:** Dashboards combine many visualizations competing for limited attention - without clear hierarchy and grouping, users experience cognitive overload.
**Problems dashboards solve:**
- At-a-glance status monitoring
- Quick decision-making
- Pattern detection across metrics
**Cognitive challenges:**
- Information density overwhelming working memory
- No clear entry point (where to look first?)
- Related metrics not visually grouped
- Excessive scrolling breaks mental model
**Mental model:** Dashboard is like airplane cockpit - instruments grouped by system, critical alerts preattentively salient, most important gauges in pilot's direct view.
---
### WHAT to Apply
#### Visual Hierarchy for Dashboards
**Principle:** Establish clear focal point with size, position, and contrast
**Application pattern:**
**Primary KPIs (3-4 max):**
```
- Position: Top-left (F-pattern start)
- Size: Large numbers (36-48px)
- Style: Bold, high contrast
- Purpose: "What's the current status?" answered in 5 seconds
```
**Secondary Metrics (5-10):**
```
- Position: Below or right of primary KPIs
- Size: Medium (18-24px)
- Grouping: Clustered by relationship (all conversion metrics together)
- Purpose: Supporting detail for primary KPIs
```
**Tertiary/Details:**
```
- Position: Lower sections or drill-down
- Size: Smaller (12-16px)
- Access: Details on demand (click to expand)
- Purpose: Deep investigation, not monitoring
```
**Example:** E-commerce dashboard - Primary KPIs top-left (revenue, conversion, users), secondary metrics mid (channels, products), tertiary bottom (details, history).
---
#### Gestalt Grouping for Clarity
**Principle:** Use proximity, similarity, and whitespace to show relationships
**Proximity:** Related metrics close, unrelated separated by whitespace
**Similarity:** Consistent styling (same charts styled same way, color encoding consistent)
**Example:** Traffic panel (sessions, pageviews, bounce), Conversion panel (rate, revenue, AOV) separated by whitespace, red for errors throughout
---
#### Chunking for Working Memory
**Principle:** Limit concurrent visualizations to 4±1 major groups
**Application rule:** Group >15 metrics into 3-5 categories, each with 3-5 metrics max = 9-25 metrics organized into 4±1 chunks (fits working memory).
---
#### Preattentive Salience for Alerts
**Principle:** Use red color ONLY for threshold violations requiring immediate action
**Application rule:** Normal = gray, alert = red (critical only). Avoid red for all negatives (causes alert fatigue). Example: Revenue down 2% = gray, down 15% = red threshold violation.
---
#### Data-Ink Ratio Maximization
**Principle:** Remove decorative elements; maximize ink showing actual data
**Remove:** Heavy gridlines, 3D effects, gradients, excessive decimals, decorative icons, chart borders
**Keep:** Data (points/lines/bars), minimal axes, direct labels, meaningful annotations
---
## Why Progressive Disclosure Matters
### WHY This Matters
**Core insight:** Users fear getting lost in complex data exploration - progressive disclosure (overview first, zoom/filter, then details) provides safety net and reduces cognitive load.
**Shneiderman's Mantra:** "Overview first, zoom and filter, then details on demand"
**Benefits:**
- Reduces initial cognitive load (don't show everything at once)
- Provides context before detail (prevents disorientation)
- Enables personalized investigation (users drill into what they care about)
- Lowers anxiety (visible state shows "where am I", undo provides safety)
**Mental model:** Like Google Maps - start with full map view (overview), zoom to neighborhood (filter), click building for details (details on demand).
---
### WHAT to Apply
#### Overview Level
**Purpose:** Provide context and entry point
**Design principles:**
```
Show: Overall pattern, main trends, big picture
Hide: Granular details, outliers (show in drill-down)
Enable: Quick understanding of "what's the story here?"
```
**Example:**
```
Sales Dashboard Overview:
- Total sales number (aggregated)
- Trend line (last 90 days)
- Top 3 performing regions (summary)
User understands: "Sales are up 12% this quarter, West region leading"
```
---
#### Filter/Zoom Level
**Purpose:** Focus on subset of interest
**Design principles:**
```
Show: Filtered results clearly
Externalize: Active filters as visible chips/tags (recognition, not recall)
Enable: Easy modification (click chip to remove filter)
Provide: Clear "Reset all filters" option
```
**Example:**
```
User clicks "West region" from overview:
- Dashboard updates to show only West region data
- Visible chip at top: "Region: West" with X to remove
- Breadcrumb: "All Regions > West"
- "Reset" button returns to overview
User sees: "I'm viewing West region data, can easily get back"
```
---
#### Details on Demand Level
**Purpose:** Reveal specifics only when needed
**Design principles:**
```
Trigger: Hover, click, or expand action (not visible by default)
Show: Granular data, additional metrics, raw numbers
Position: Tooltip, modal, or expandable section (doesn't disrupt main view)
```
**Example:**
```
User hovers over data point on trend line:
- Tooltip appears: "June 15, 2024: $45,234 sales, 234 orders, $193 AOV"
- User moves mouse away: Tooltip disappears
- Main chart unchanged (not cluttered with all this detail by default)
```
---
#### State Visibility
**Principle:** Always show current navigation state - don't make users remember
**Critical state to externalize:**
```
✓ Active filters (as removable chips)
✓ Current zoom level or time range
✓ Drilled-down path (breadcrumbs)
✓ Sorting/grouping applied
✓ Comparison baseline (if comparing to previous period)
```
**Example implementation:**
```
Dashboard header always shows:
- Time range: "Last 30 days" (clickable to change)
- Filters: [Region: West] [Product: Widget] (each with X to remove)
- Comparison: "vs. previous 30 days" (toggle on/off)
- Breadcrumb: "Overview > West > Widget Sales"
User never wonders: "What am I looking at? How did I get here?"
```
---
## Why Narrative Visualization Matters
### WHY This Matters
**Core insight:** People naturally seek stories with cause-effect and chronology - structuring data as narrative chunks information meaningfully and aids retention.
**Benefits of narrative structure:**
- Easier to process (story arc vs heap of facts)
- Better retention (narrative = memorable)
- Guided interpretation (annotations prevent misunderstanding)
- Emotional engagement (storytelling activates empathy)
**Mental model:** Data journalism isn't a data dump - it's a guided tour where the designer leads readers to insights.
---
### WHAT to Apply
#### Narrative Structure
**Classic arc:** Context → Problem/Question → Data/Evidence → Resolution/Insight
**Application to visualization:**
**Context (What's the situation?):**
```
- Title that sets stage: "U.S. Solar Energy Adoption 2010-2024"
- Subtitle: "How policy changes drove growth"
- Brief intro text or annotation providing background
```
**Problem/Question (What are we investigating?):**
```
- Visual question posed: "Did tax incentives accelerate adoption?"
- Annotation highlighting the question point on chart
```
**Data/Evidence (What does data show?):**
```
- Chart showing trend with clear inflection point
- Annotation: "Tax credit introduced 2015" with arrow to spike
- Supporting data: % increase after policy vs before
```
**Resolution/Insight (What did we learn?):**
```
- Conclusion annotation: "Adoption rate tripled post-incentive"
- Implications: "States with stronger incentives saw faster growth"
- Call-to-action or next question (optional)
```
---
#### Annotation Strategy
**Principle:** Guide attention to key insights; prevent misinterpretation
**Types of annotations:**
**Callout boxes:**
```
Purpose: Highlight main insight
Position: Near relevant data point
Style: Contrasting color or background
Content: 1-2 sentences max ("Sales spiked 45% after campaign launch")
```
**Arrows/Lines:**
```
Purpose: Connect explanation to data
Use: Point from text to specific data element
Example: Arrow from "Product launch" text to spike in line chart
```
**Shaded regions:**
```
Purpose: Mark time periods or ranges
Use: Highlight recession periods, policy changes, events
Example: Gray shaded region "COVID-19 lockdown Mar-May 2020"
```
**Direct labels:**
```
Purpose: Replace legend lookups
Use: Label lines/bars directly instead of requiring legend
Example: Line chart with "North region" text next to the line (not legend box)
```
**Application rule:**
```
Annotate: Main insight (what users should notice)
Annotate: Unusual events (explain outliers/anomalies)
Don't annotate: Obvious patterns (if trend is clear, don't state "going up")
Test: Can user grasp message without annotations? If not, add guidance
```
---
#### Scrollytelling
**Definition:** Visualization that updates or highlights aspects as user scrolls
**Benefits:**
- Maintains context (chart stays visible while story progresses)
- Reveals complexity gradually (progressive disclosure in narrative form)
- Engaging (interactive storytelling)
**Application pattern:**
**Example progression:** Start with full context (0%) → Highlight specific periods as user scrolls (33%, 66%) → End with full picture + projection (100%). Chart stays visible, smooth transitions, user can scroll back, provide skip option.