Evaluation Tools

This resource provides systematic checklists and frameworks for evaluating designs against cognitive principles.

Tools covered:

  1. Cognitive Design Checklist (general interface/visualization evaluation)
  2. Visualization Audit Framework (4-criteria data visualization quality assessment)

Why Systematic Evaluation

WHY This Matters

Core insight: Cognitive design has multiple dimensions - visibility, hierarchy, chunking, simplicity, memory support, feedback, consistency, and integrity. Ad-hoc review often misses issues in one or more of them.

Benefits of systematic evaluation:

  • Comprehensive coverage: Ensures all cognitive principles checked
  • Objective assessment: Reduces subjective bias
  • Catches issues early: Before launch or during design critiques
  • Team alignment: Shared criteria for quality
  • Measurable improvement: Track fixes over time

Mental model: Like a pre-flight checklist for pilots - systematically verify all critical systems before takeoff.

Without systematic evaluation: missed cognitive issues, inconsistent quality, user confusion that could have been prevented.

Use when:

  • Conducting design reviews/critiques
  • Evaluating existing designs for improvement
  • Quality assurance before launch
  • Diagnosing why design feels "off"
  • Teaching/mentoring cognitive design

What You'll Learn

Two complementary tools:

Cognitive Design Checklist: General-purpose evaluation for any interface, visualization, or content

  • Quick questions across 8 dimensions
  • Suitable for any design context
  • 10-15 minutes for thorough review

Visualization Audit Framework: Specialized 4-criteria assessment for data visualizations

  • Clarity, Efficiency, Integrity, Aesthetics
  • Systematic quality scoring
  • 15-30 minutes depending on complexity

Why Cognitive Design Checklist

WHY This Matters

Purpose: Catch glaring cognitive problems before they reach users.

Coverage areas:

  1. Visibility & Comprehension (can users see and understand?)
  2. Visual Hierarchy (what gets noticed first?)
  3. Chunking & Organization (fits working memory?)
  4. Simplicity & Clarity (extraneous elements removed?)
  5. Memory Support (state externalized?)
  6. Feedback & Interaction (immediate responses?)
  7. Consistency (patterns maintained?)
  8. Scanning Patterns (layout leverages F/Z-pattern?)

Mental model: Like a doctor's diagnostic checklist - systematically check each vital sign.


WHAT to Check

1. Visibility & Immediate Comprehension

Goal: Core message/purpose graspable in ≤5 seconds

Checklist:

  • Can users identify the purpose/main message within 5 seconds? (5-second test)
  • Is important information visible without scrolling (above fold)?
  • Is text/content legible? (sufficient size, contrast, line length)
  • Are interactive elements distinguishable from static content?

Test: 5-second test (show design, ask what they remember). Pass: Identify purpose. Fail: Remember decoration or miss point. Common failures: Cluttered layout, poor contrast, content buried below fold. Fix priorities: CRITICAL (contrast), HIGH (5-second clarity), MEDIUM (hierarchy)


2. Visual Hierarchy

Goal: Users can distinguish primary vs secondary vs tertiary content

Checklist:

  • Is visual hierarchy clear? (size, contrast, position differentiate importance)
  • Do headings/labels form clear levels? (H1 > H2 > H3 > body)
  • Does design pass "squint test"? (important elements still visible when blurred)
  • Are calls-to-action visually prominent?

Test: Squint test (blur design). Pass: Important elements visible when blurred. Fail: Everything same weight. Common failures: Everything same size, primary CTA not distinguished, decoration more prominent than data. Fix priorities: HIGH (primary not prominent), MEDIUM (heading hierarchy), LOW (minor adjustments)
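
The squint test can be approximated in software by blurring a screenshot and checking what still stands out. A minimal sketch using Pillow; the file names are placeholders:

```python
# pip install Pillow
from PIL import Image, ImageFilter

# Blur a screenshot to simulate squinting; if the primary CTA and
# headings are no longer distinguishable, the hierarchy is too flat.
screenshot = Image.open("design.png")          # placeholder path
blurred = screenshot.filter(ImageFilter.GaussianBlur(radius=8))
blurred.save("design_squint.png")
```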


3. Chunking & Organization

Goal: Information grouped to fit working memory capacity (4±1 chunks, max 7)

Checklist:

  • Are long lists broken into categories? (≤7 items per unbroken list)
  • Are related items visually grouped? (proximity, backgrounds, whitespace)
  • Is navigation organized into logical categories? (≤7 top-level items)
  • Are form fields grouped by relationship? (personal info, account, preferences)

Test: Count ungrouped items. Pass: ≤7 items or clear grouping. Fail: >7 items ungrouped. Common failures: 15+ flat navigation, 30-field ungrouped form, 20 equal-weight metrics. Fix priorities: CRITICAL (>10 ungrouped), HIGH (7-10 ungrouped), MEDIUM (clearer groups)
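
These counts and priorities reduce to a simple triage rule. A minimal sketch, assuming a hypothetical `chunking_priority` helper (the ≤7 pass rule and the 7-10 HIGH band overlap at exactly 7; this sketch treats 7 as a pass):

```python
def chunking_priority(item_count: int, grouped: bool = False) -> str:
    """Triage a list or menu by the chunking thresholds above."""
    if grouped or item_count <= 7:
        return "PASS"       # fits working memory, or grouping chunks it already
    if item_count <= 10:
        return "HIGH"       # break into categories soon
    return "CRITICAL"       # far exceeds working memory; regroup immediately

print(chunking_priority(15))  # CRITICAL -- e.g. a 15-item flat navigation
```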


4. Simplicity & Clarity

Goal: Every element serves user goal; extraneous elements removed

Checklist:

  • Can you justify every visual element? (Does it convey information or improve usability?)
  • Is data-ink ratio high? (maximize ink showing data, minimize decoration)
  • Are decorative elements eliminated? (chartjunk, unnecessary lines, ornaments)
  • Is terminology familiar or explained? (no unexplained jargon)

Test: Point to each element, ask "What purpose?" Pass: Every element justified. Fail: Decorative/unclear elements. Common failures: Chartjunk (3D, backgrounds, excess gridlines), jargon, redundant elements. Fix priorities: HIGH (decoration competing with data), MEDIUM (unexplained terms), LOW (minor simplification)


5. Memory Support

Goal: Users don't need to remember what could be shown (recognition over recall)

Checklist:

  • Is current system state visible? (active filters, current page, progress through flow)
  • Are navigation breadcrumbs provided? (where am I, how did I get here)
  • For multi-step processes, is progress shown? (wizard step X of Y)
  • Are options presented rather than requiring recall? (dropdowns vs typed commands)

Test: Identify what users must remember. Ask "Could this be shown?" Pass: State visible. Fail: Relying on memory. Common failures: No active filter indication, no progress indicator, hidden state. Fix priorities: CRITICAL (lost in flow), HIGH (critical state invisible), MEDIUM (minor memory aids)


6. Feedback & Interaction

Goal: Every action gets immediate, clear feedback

Checklist:

  • Do all interactive elements provide immediate feedback? (hover states, click feedback)
  • Are loading states shown? (spinners, progress bars for waits >1 second)
  • Do form fields validate inline? (immediate feedback, not after submit)
  • Are error messages contextual? (next to problem, not top of page)
  • Are success confirmations shown? ("Saved", checkmarks)

Test: Click/interact. Pass: Feedback within 100ms. Fail: No feedback or delayed >1s without indicator. Common failures: No hover states, no loading indicator, errors not contextual, no success confirmation. Fix priorities: CRITICAL (no feedback for critical actions), HIGH (delayed without loading), MEDIUM (missing hover)
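
The timing rule can be stated as a small check. A minimal sketch, assuming a hypothetical `feedback_check` helper; the handling of the 100 ms-1 s band is an interpretation, since the checklist only fixes the two endpoints:

```python
def feedback_check(latency_ms: float, has_indicator: bool) -> str:
    """Apply the timing rule above: feedback within 100 ms passes;
    delays over 1 s fail unless a loading indicator is shown."""
    if latency_ms <= 100:
        return "PASS"
    if latency_ms > 1000 and not has_indicator:
        return "FAIL"        # no feedback for over a second
    return "OK" if has_indicator else "BORDERLINE"  # 100 ms-1 s gray zone

print(feedback_check(2500, has_indicator=False))  # FAIL -- show a spinner
```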


7. Consistency

Goal: Repeated patterns throughout (terminology, layout, interactions, visual style)

Checklist:

  • Is terminology consistent? (same words for same concepts)
  • Are UI patterns consistent? (buttons, links, inputs styled uniformly)
  • Is color usage consistent? (red = error, green = success throughout)
  • Are interaction patterns predictable? (click/tap behavior consistent)

Test: List similar elements, check consistency. Pass: Identical styling/behavior. Fail: Unjustified variations. Common failures: Inconsistent terminology ("Email" vs "E-mail"), visual inconsistency (button styles vary), semantic inconsistency (red used for both errors and negative values). Fix priorities: HIGH (terminology), MEDIUM (visual styling), LOW (minor patterns)
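
The terminology check is one of the few that can be partially automated. A minimal sketch, assuming UI labels are available as a flat list of strings; the normalization rule is illustrative:

```python
from collections import defaultdict

def terminology_variants(labels: list[str]) -> dict[str, set[str]]:
    """Group labels that normalize to the same key; any group with
    more than one spelling is a consistency finding (HIGH priority)."""
    groups: dict[str, set[str]] = defaultdict(set)
    for label in labels:
        key = label.lower().replace("-", "").replace(" ", "")
        groups[key].add(label)
    return {key: spellings for key, spellings in groups.items()
            if len(spellings) > 1}

print(terminology_variants(["Email", "E-mail", "Sign in", "Sign In"]))
# {'email': {'Email', 'E-mail'}, 'signin': {'Sign in', 'Sign In'}}
# (set ordering may vary)
```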


8. Scanning Patterns

Goal: Layout leverages predictable F-pattern or Z-pattern scanning

Checklist:

  • Is primary content positioned top-left? (where scanning starts)
  • For text-heavy content, does layout follow F-pattern? (top horizontal, then down left, short mid horizontal)
  • For visual-heavy content, does layout follow Z-pattern? (top-left to top-right, diagonal to bottom-left, then bottom-right)
  • Are terminal actions positioned bottom-right? (where scanning ends)

Test: Trace eye movement (F/Z pattern). Pass: Critical elements on path. Fail: Important content off path. Common failures: Primary CTA bottom-left (off Z-pattern), key info middle-right (off F-pattern), patterns ignored. Fix priorities: MEDIUM (CTA off path), LOW (secondary optimization)
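
For teams running this checklist repeatedly, recording results as structured findings keeps reviews comparable. A minimal sketch, assuming hypothetical `Finding` and `triage` helpers; the dimension names come from the eight sections above:

```python
from dataclasses import dataclass

DIMENSIONS = ["Visibility", "Hierarchy", "Chunking", "Simplicity",
              "Memory", "Feedback", "Consistency", "Scanning"]

@dataclass
class Finding:
    dimension: str   # one of DIMENSIONS
    priority: str    # CRITICAL / HIGH / MEDIUM / LOW
    note: str

def triage(findings: list[Finding]) -> list[Finding]:
    """Order review findings so CRITICAL issues surface first."""
    rank = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}
    return sorted(findings, key=lambda f: rank[f.priority])

review = [Finding("Consistency", "MEDIUM", "button styles vary"),
          Finding("Chunking", "CRITICAL", "15 ungrouped metrics")]
for f in triage(review):
    print(f"{f.priority}: {f.dimension} - {f.note}")
# CRITICAL: Chunking - 15 ungrouped metrics
# MEDIUM: Consistency - button styles vary
```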


Why Visualization Audit Framework

WHY This Matters

Purpose: Comprehensive quality assessment for data visualizations across four independent dimensions.

Key insight: Visualization quality requires success on ALL four criteria - high score on one doesn't compensate for failure on another.

Four Criteria:

  1. Clarity: Immediately understandable and unambiguous
  2. Efficiency: Minimal cognitive effort to extract information
  3. Integrity: Truthful and free from misleading distortions
  4. Aesthetics: Visually pleasing and appropriate

Mental model: Like evaluating a car - needs to be safe (integrity), functional (efficiency), easy to use (clarity), and pleasant (aesthetics). Missing any dimension makes it poor overall.

Use when:

  • Evaluating data visualizations (charts, dashboards, infographics)
  • Choosing between visualization alternatives
  • Quality assurance before publication
  • Diagnosing why visualization isn't working

WHAT to Audit

Criterion 1: Clarity

Question: Is visualization immediately understandable and unambiguous?

Checklist:

  • Is main message obvious or clearly annotated?
  • Are axes labeled with units?
  • Is legend clear and necessary? (or use direct labels if possible)
  • Is title descriptive? (conveys what's being shown)
  • Are annotations used to guide interpretation?
  • Is chart type appropriate for message?

5-Second Test:

  • Show visualization for 5 seconds
  • Ask: "What's the main point?"
    • Pass: Correctly identify main insight
    • Fail: Confused or remember decorative elements instead

Scoring:

  • 5 (Excellent): Main message graspable in <5 seconds, perfectly labeled
  • 4 (Good): Clear with minor improvements needed (e.g., better title)
  • 3 (Adequate): Understandable but requires effort
  • 2 (Needs work): Ambiguous or missing critical labels
  • 1 (Poor): Incomprehensible

Criterion 2: Efficiency

Question: Can users extract information with minimal cognitive effort?

Checklist:

  • Are encodings appropriate for task? (position/length for comparison, not angle/area)
  • Is chart type matched to user task? (compare → bar, trend → line, distribution → histogram)
  • Is comparison easy? (common baseline, aligned scales)
  • Is cross-referencing minimized? (direct labels instead of legend lookups)
  • Are cognitive shortcuts enabled? (sorting by value, highlighting key points)

Encoding Check:

  • Identify user task (compare, see trend, find outliers)
  • Check encoding against Cleveland & McGill hierarchy
    • Pass: Position/length used for precise comparisons
    • Fail: Angle/area/color used when position would work better
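
This check becomes mechanical with a ranked table of encodings. A minimal sketch; the ranking is a common simplification of the Cleveland & McGill results, and `encoding_check` is a hypothetical helper:

```python
# Lower rank = more accurate perceptual judgment (simplified hierarchy)
ENCODING_RANK = {
    "position_common_scale": 1,   # bars/dots on a shared axis
    "position_nonaligned": 2,     # small multiples, separate axes
    "length": 3,                  # unaligned bars
    "angle": 4,                   # pie slices
    "area": 5,                    # bubbles, treemaps
    "color_saturation": 6,        # heatmaps
}

def encoding_check(task: str, encoding: str) -> str:
    """Flag encodings weaker than length when the task is precise comparison."""
    if task == "compare" and ENCODING_RANK[encoding] > ENCODING_RANK["length"]:
        return f"FAIL: {encoding} used where position/length would be more accurate"
    return "PASS"

print(encoding_check("compare", "angle"))
# FAIL: angle used where position/length would be more accurate
```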

Scoring:

  • 5 (Excellent): Optimal encoding, zero wasted cognitive effort
  • 4 (Good): Appropriate with minor inefficiencies
  • 3 (Adequate): Works but more effort than necessary
  • 2 (Needs work): Poor encoding choice (pie when bar would be better)
  • 1 (Poor): Wrong chart type for task

Criterion 3: Integrity

Question: Is visualization truthful and free from misleading distortions?

Checklist:

  • Do axes start at zero (or clearly note truncation)?
  • Are scale intervals uniform?
  • Is data complete? (not cherry-picked dates hiding context)
  • Are comparisons fair? (same scale for compared items)
  • Is context provided? (baselines, historical comparison, benchmarks)
  • Are limitations noted? (sample size, data source, margin of error)

Integrity Tests:

  1. Axis test: Does y-axis start at zero for bar charts? If not, is truncation clearly noted?
    • Pass: Zero baseline or explicit truncation note
    • Fail: Truncated axis exaggerating differences without disclosure
  2. Completeness test: Is full relevant time period shown? Or cherry-picked subset?
    • Pass: Complete data with context
    • Fail: Selective dates hiding broader trend
  3. Fairness test: Are compared items on same scale?
    • Pass: Common scale enables fair comparison
    • Fail: Dual-axis manipulation creates false correlation
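
The axis test can be quantified with something like Tufte's lie factor: the ratio of the effect shown in the graphic to the effect in the data. A minimal sketch for a truncated bar-chart baseline, assuming a hypothetical `bar_lie_factor` helper and illustrative values:

```python
def bar_lie_factor(value_a: float, value_b: float, axis_start: float) -> float:
    """Ratio of the visual difference between two bars to the true data
    difference. A zero baseline gives 1.0 (honest); truncation inflates it.
    Assumes value_a != value_b and both values above axis_start."""
    visual_effect = (value_b - axis_start) / (value_a - axis_start) - 1
    data_effect = value_b / value_a - 1
    return visual_effect / data_effect

# Illustrative bars of 85 and 100 on an axis starting at 80:
print(round(bar_lie_factor(85, 100, 80), 1))
# 17.0 -- the visual gap is about 17x the true relative difference
```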

Scoring:

  • 5 (Excellent): Completely honest, full context provided
  • 4 (Good): Honest with minor context improvements possible
  • 3 (Adequate): Not misleading but could provide more context
  • 2 (Needs work): Distortions present (truncated axis, cherry-picked data)
  • 1 (Poor): Actively misleading (severe distortions, no context)

CRITICAL: Scores below 3 on integrity are unacceptable - fix immediately


Criterion 4: Aesthetics

Question: Is visualization visually pleasing and appropriate for context?

Checklist:

  • Is visual design professional and polished?
  • Is color palette appropriate? (not garish, suits content tone)
  • Is whitespace used effectively? (not cramped, not wasteful)
  • Are typography choices appropriate? (readable, professional)
  • Does style match context? (serious for finance, friendly for consumer)

Important: Aesthetics should NEVER undermine clarity or integrity

Scoring:

  • 5 (Excellent): Beautiful and appropriate, enhances engagement
  • 4 (Good): Pleasant and professional
  • 3 (Adequate): Acceptable, not ugly but not polished
  • 2 (Needs work): Amateurish or inappropriate style
  • 1 (Poor): Ugly or completely inappropriate

Trade-off Note: If forced to choose, prioritize Clarity and Integrity over Aesthetics


Using the 4-Criteria Framework

Process:

Step 1: Evaluate each criterion independently

  • Score Clarity (1-5)
  • Score Efficiency (1-5)
  • Score Integrity (1-5)
  • Score Aesthetics (1-5)

Step 2: Calculate average

  • Average score = (Clarity + Efficiency + Integrity + Aesthetics) / 4
  • Pass threshold: ≥3.5 average
  • Critical failures: Any individual score <3 requires attention

Step 3: Identify weakest dimension

  • Which criterion has lowest score?
  • This is your primary improvement target

Step 4: Prioritize fixes

  1. CRITICAL: Integrity < 3 (fix immediately - misleading is unacceptable)
  2. HIGH: Clarity or Efficiency < 3 (users can't understand or use it)
  3. MEDIUM: Aesthetics < 3 (affects engagement)
  4. LOW: Scores 3-4 (optimization, not critical)

Step 5: Verify fixes don't harm other dimensions

  • Example: Improving aesthetics shouldn't reduce clarity
  • Example: Improving efficiency shouldn't compromise integrity
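
These five steps condense into a few lines of code. A minimal sketch, assuming a hypothetical `audit` helper; the thresholds and priorities come from Steps 2-4 (LOW-priority 3-4 scores are left to the reviewer):

```python
def audit(clarity: int, efficiency: int, integrity: int, aesthetics: int) -> dict:
    """Score a visualization on the four criteria (each 1-5) and
    apply the pass threshold and fix priorities from Steps 2-4."""
    scores = {"Clarity": clarity, "Efficiency": efficiency,
              "Integrity": integrity, "Aesthetics": aesthetics}
    fixes = []
    if integrity < 3:
        fixes.append("CRITICAL: Integrity")        # misleading is unacceptable
    for name in ("Clarity", "Efficiency"):
        if scores[name] < 3:
            fixes.append(f"HIGH: {name}")          # can't understand or use it
    if aesthetics < 3:
        fixes.append("MEDIUM: Aesthetics")         # affects engagement
    average = sum(scores.values()) / 4
    return {
        "average": average,
        "passes": average >= 3.5 and not fixes,    # any score <3 blocks a pass
        "weakest": min(scores, key=scores.get),    # Step 3: improvement target
        "fixes": fixes,
    }
```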

Examples of Evaluation in Practice

Example 1: Dashboard Review Using Checklist

Context: Team dashboard with 20 metrics, users overwhelmed and missing alerts

Checklist Results:

  • ✗ Visibility: Too cluttered, 20 equal-weight metrics
  • ✗ Hierarchy: Everything same size, alerts not prominent
  • ✗ Chunking: 15 ungrouped metrics (exceeds working memory)
  • ✗ Simplicity: Chartjunk (gridlines, 3D, gradients)
  • ✗ Memory: No active filter indication
  • ✓ Feedback: Hover states, loading indicators present
  • ⚠️ Consistency: Mostly consistent, minor button variations
  • ✗ Scanning: Key KPI bottom-right (off F-pattern)

Fixes: (1) Reduce to 3-4 primary KPIs top-left, group others. (2) Remove chartjunk, establish hierarchy. (3) Show active filters as chips. (4) Standardize buttons.

Outcome: Users grasp status in 5 seconds, find alerts immediately


Example 2: Bar Chart Audit Using 4 Criteria

Context: Q4 sales bar chart for presentation

Audit Scores:

  • Clarity (4/5): Clear title/labels, direct bar labels. Minor: Could annotate top performer.
  • Efficiency (5/5): Optimal position/length encoding, sorted descending, common baseline.
  • Integrity (2/5 - CRITICAL): Y-axis starts at 80 (exaggerates differences); no historical context.
  • Aesthetics (4/5): Clean, professional. Minor: Could use brand colors.

Average: 3.75/5 (above the 3.5 threshold), but Integrity <3 is a critical failure - the chart does not pass as-is.

Fixes: (1) Start y-axis at zero or add break symbol + "Axis truncated" note. (2) Add Q3 baseline for context. (3) Annotate: "West region led Q4 with 23% increase."

After fixes: Clarity 5/5, Efficiency 5/5, Integrity 5/5, Aesthetics 4/5 = 4.75/5 Excellent
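
Running these scores through the `audit` sketch shown earlier reproduces the arithmetic:

```python
before = audit(clarity=4, efficiency=5, integrity=2, aesthetics=4)
print(before["average"], before["passes"], before["fixes"])
# 3.75 False ['CRITICAL: Integrity']

after = audit(clarity=5, efficiency=5, integrity=5, aesthetics=4)
print(after["average"], after["passes"])
# 4.75 True
```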