# Information Architecture: Advanced Methodology
This document covers advanced techniques for card sorting analysis, taxonomy design, navigation optimization, and findability improvement.
## Table of Contents
1. [Card Sorting Analysis](#1-card-sorting-analysis)
2. [Taxonomy Design Principles](#2-taxonomy-design-principles)
3. [Navigation Depth & Breadth Optimization](#3-navigation-depth--breadth-optimization)
4. [Information Scent & Findability](#4-information-scent--findability)
5. [Advanced Topics](#5-advanced-topics)
---
## 1. Card Sorting Analysis
### Analyzing Card Sort Results
**Goal**: Extract meaningful patterns from user groupings
### Similarity Matrix
**What it is**: Shows how often users grouped two cards together
**How to calculate**:
- For each pair of cards, count how many users put them in the same group
- Express as percentage: (# users who grouped together) / (total users)
**Example**:
| Card | Sign Up | First Login | Quick Start | Reports | Dashboards |
|------|---------|-------------|-------------|---------|------------|
| Sign Up | - | 85% | 90% | 15% | 10% |
| First Login | 85% | - | 88% | 12% | 8% |
| Quick Start | 90% | 88% | - | 10% | 12% |
| Reports | 15% | 12% | 10% | - | 75% |
| Dashboards | 10% | 8% | 12% | 75% | - |
**Interpretation**:
- **Strong clustering** (>70%): "Sign Up", "First Login", "Quick Start" belong together → "Getting Started" category
- **Strong clustering** (75%): "Reports" and "Dashboards" belong together → "Analytics" category
- **Weak links** (<20%): "Getting Started" and "Analytics" are distinct categories
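As a computational sketch (assuming raw results are stored as one list of groups per participant; the data and names here are illustrative, not a fixed tool API):

```python
from itertools import combinations

# Hypothetical raw data: one card sort per participant, where each
# sort is a list of groups and each group is a set of card labels.
sorts = [
    [{"Sign Up", "First Login", "Quick Start"}, {"Reports", "Dashboards"}],
    [{"Sign Up", "Quick Start"}, {"First Login"}, {"Reports", "Dashboards"}],
    # ... one entry per participant
]

cards = sorted({card for sort in sorts for group in sort for card in group})

def similarity(card_a, card_b):
    """Fraction of participants who placed both cards in the same group."""
    together = sum(
        any(card_a in group and card_b in group for group in sort)
        for sort in sorts
    )
    return together / len(sorts)

pair_similarity = {pair: similarity(*pair) for pair in combinations(cards, 2)}
for (a, b), score in sorted(pair_similarity.items(), key=lambda kv: -kv[1]):
    print(f"{a} + {b}: {score:.0%}")
```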
### Dendrogram (Hierarchical Clustering)
**What it is**: Tree diagram showing hierarchical relationships
**How to create**:
1. Start with each card as its own cluster
2. Iteratively merge closest clusters (highest similarity)
3. Continue until all cards in one cluster
**Interpreting dendrograms**:
- **Short branches**: High agreement (merge early)
- **Long branches**: Low agreement (merge late)
- **Clusters**: Cut tree at appropriate height to identify categories
**Example**:
```
                     All Cards
                         |
         ________________+________________
         |                               |
  Getting Started                    Features
         |                               |
     ____+____                      _____+_____
    |         |                    |           |
 Sign Up First Login           Analytics   Settings
                                   |
                               ____+____
                              |         |
                           Reports Dashboards
```
**Insight**: Users see clear distinction between "Getting Started" (onboarding tasks) and "Features" (ongoing use).
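Beyond a handful of cards, the clustering is usually computed rather than drawn by hand. A minimal sketch with SciPy, feeding it the similarity matrix from Section 1 (distance taken as 1 - similarity; `method="average"` is one common linkage choice, not the only one):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform

labels = ["Sign Up", "First Login", "Quick Start", "Reports", "Dashboards"]
similarity = np.array([          # values from the similarity matrix above
    [1.00, 0.85, 0.90, 0.15, 0.10],
    [0.85, 1.00, 0.88, 0.12, 0.08],
    [0.90, 0.88, 1.00, 0.10, 0.12],
    [0.15, 0.12, 0.10, 1.00, 0.75],
    [0.10, 0.08, 0.12, 0.75, 1.00],
])

# Convert similarity to distance, then to the condensed form linkage expects.
condensed = squareform(1 - similarity, checks=False)
merges = linkage(condensed, method="average")

dendrogram(merges, labels=labels)   # short branches merge early = high agreement
plt.tight_layout()
plt.show()
```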
### Agreement Score (Consensus)
**What it is**: How much users agree on groupings
**Calculation methods**:
1. **Category agreement**: % of users who created similar category
- Example: 18/20 users (90%) created "Getting Started" category
2. **Pairwise agreement**: Average similarity across all card pairs
- Formula: Sum(all pairwise similarities) / Number of pairs
- High score (>70%) = strong consensus
- Low score (<50%) = weak consensus, need refinement
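A sketch of the pairwise calculation, with illustrative similarity values like those in the matrix above:

```python
# Pairwise similarities for every card pair (e.g., from the Section 1 sketch).
pair_similarity = {
    ("Sign Up", "First Login"): 0.85,
    ("Sign Up", "Quick Start"): 0.90,
    ("First Login", "Quick Start"): 0.88,
    ("Reports", "Dashboards"): 0.75,
    ("Sign Up", "Reports"): 0.15,
    # ... remaining pairs
}

agreement = sum(pair_similarity.values()) / len(pair_similarity)
print(f"Pairwise agreement: {agreement:.0%}")  # >70% strong, <50% weak
```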
**When consensus is low**:
- Cards may be ambiguous (clarify labels)
- Users have different mental models (consider multiple navigation paths)
- Category is too broad (split into subcategories)
### Outlier Cards
**What they are**: Cards that don't fit anywhere consistently
**How to identify**: Low similarity with all other cards (<30% with any card)
**Common reasons**:
- Card label is unclear → Rewrite card
- Content doesn't belong in product → Remove
- Content is unique → Create standalone category or utility link
**Example**: "Billing" card — 15 users put it in "Settings", 3 in "Account", 2 didn't categorize it
- **Action**: Clarify if "Billing" is settings (configuration) or account (transactions)
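Identification is mechanical once pairwise similarities exist; a sketch (the 0.30 threshold mirrors the <30% rule above, and the names are illustrative):

```python
def find_outliers(pair_similarity, cards, threshold=0.30):
    """Cards whose highest similarity with any other card is below threshold."""
    return [
        card
        for card in cards
        if max(s for pair, s in pair_similarity.items() if card in pair) < threshold
    ]

# Usage with the pair_similarity dict from the similarity-matrix sketch:
# find_outliers(pair_similarity, cards)  ->  e.g. ["Billing"]
```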
---
## 2. Taxonomy Design Principles
### Mutually Exclusive, Collectively Exhaustive (MECE)
**Principle**: Categories don't overlap AND cover all content
**Mutually exclusive**: Each item belongs to exactly ONE category
- **Bad**: "Products" and "Best Sellers" (best sellers are also products — overlap)
- **Good**: "Products" (all) and "Featured" (separate facet or tag)
**Collectively exhaustive**: Every item has a category
- **Bad**: Categories: "Electronics", "Clothing" — but you also sell "Books" (gap)
- **Good**: Add "Books" OR create "Other" catch-all
**Testing MECE**:
1. List all content items
2. Try to categorize each
3. If item fits >1 category → not mutually exclusive
4. If item fits 0 categories → not collectively exhaustive
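Once content and category assignments live in a spreadsheet or CMS export, this test can be scripted; a sketch with illustrative data:

```python
# Hypothetical CMS export: item -> set of categories it was filed under.
assignments = {
    "USB-C cable": {"Electronics"},
    "Bestselling novel": {"Books", "Best Sellers"},   # fits two categories
    "Garden hose": set(),                             # fits none
}

overlaps = {item: cats for item, cats in assignments.items() if len(cats) > 1}
gaps = [item for item, cats in assignments.items() if not cats]

if overlaps:
    print("Not mutually exclusive:", overlaps)
if gaps:
    print("Not collectively exhaustive:", gaps)
```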
### Polyhierarchy vs. Faceted Classification
**Polyhierarchy**: Item can live in multiple places in hierarchy
- **Example**: "iPhone case" could be in:
- Electronics > Accessories > Phone Accessories
- Gifts > Under $50 > Tech Gifts
- **Pro**: Matches multiple user mental models
- **Con**: Confusing (where is "canonical" location?), hard to maintain
**Faceted classification**: Item has ONE location, multiple orthogonal attributes
- **Example**: "iPhone case" is in Electronics (primary category)
- Facet 1: Category = Electronics
- Facet 2: Price = Under $50
- Facet 3: Use Case = Gifts
- **Pro**: Clear, flexible filtering, scalable
- **Con**: Requires good facet design
**When to use each**:
- **Polyhierarchy**: Small content sets (<500 items), clear user need for multiple paths
- **Faceted**: Large content sets (>500 items), many attributes, users need flexible filtering
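In data terms, a faceted item keeps one canonical category plus orthogonal attributes that filters combine over. A sketch (all names and values illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class Product:
    name: str
    category: str                                     # single canonical location
    price: float
    use_cases: set[str] = field(default_factory=set)  # orthogonal facet

catalog = [
    Product("iPhone case", "Electronics", 29.99, {"Gifts"}),
    Product("Garden hose", "Home & Garden", 19.99),
]

# Faceted filtering: combine facets without moving items in the hierarchy.
tech_gifts_under_50 = [
    p for p in catalog if p.price < 50 and "Gifts" in p.use_cases
]
```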
### Controlled Vocabulary vs. Folksonomy
**Controlled vocabulary**: Preset tags, curated by admins
- **Example**: "Authentication", "API", "Database" (exact tags, no variations)
- **Pro**: Consistency, findability, no duplication ("Auth" vs "Authentication")
- **Con**: Requires maintenance, may miss user terminology
**Folksonomy**: User-generated tags, anyone can create
- **Example**: Users tag articles with whatever terms they want
- **Pro**: Emergent, captures user language, low maintenance
- **Con**: Inconsistent, duplicates, noise ("Auth", "Authentication", "auth", "Authn")
**Hybrid approach** (recommended):
- Controlled vocabulary for core categories and facets
- Folksonomy for supplementary tags (with moderation)
- Periodically review folksonomy tags → promote common ones to controlled vocabulary
**Tag moderation**:
- Merge synonyms: "Auth" → "Authentication"
- Remove noise: "asdf", "test"
- Suggest tags: When user types "auth", suggest "Authentication"
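The merge and noise-removal steps are typically backed by a curated synonym map; a minimal sketch:

```python
# Hypothetical synonym map maintained by the taxonomy owner.
CANONICAL = {"auth": "Authentication", "authn": "Authentication"}
NOISE = {"asdf", "test"}

def moderate(tag: str) -> str | None:
    """Merge synonyms into their canonical tag; drop noise tags entirely."""
    key = tag.strip().lower()
    if key in NOISE:
        return None
    return CANONICAL.get(key, tag.strip())

assert moderate("Auth") == "Authentication"
assert moderate("asdf") is None
```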
### Category Size & Balance
**Guideline**: Aim for balanced category sizes (no one category dominates)
**Red flags**:
- **One huge category**: "Other" with 60% of items → need better taxonomy
- **Many tiny categories**: 20 categories, each with 2-5 items → over-categorization, consolidate
- **Unbalanced tree**: One branch 5 levels deep, others 2 levels → inconsistent complexity
**Target distribution**:
- Top-level categories: 5-9 categories
- Each category: Roughly equal # of items (within 2× of each other)
- If one category much larger: Split into subcategories
**Example**: E-commerce with 1000 products
- **Bad**: Electronics (600), Clothing (300), Books (80), Other (20)
- **Good**: Electronics (250), Clothing (250), Books (250), Home & Garden (250)
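The "within 2x" guideline is easy to audit from category counts; a sketch using the bad distribution above:

```python
# Item counts per top-level category (the "bad" example above).
counts = {"Electronics": 600, "Clothing": 300, "Books": 80, "Other": 20}

largest = max(counts, key=counts.get)
smallest = min(counts, key=counts.get)
if counts[largest] > 2 * counts[smallest]:
    print(f"Unbalanced: {largest} ({counts[largest]}) is more than "
          f"2x {smallest} ({counts[smallest]}); consider splitting {largest}")
```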
### Taxonomy Evolution
**Principle**: Taxonomies grow and change — design for evolution
**Strategies**:
1. **Leave room for growth**: Don't create 10 top-level categories if you'll need 15 next year
2. **Use "Other" temporarily**: New category emerging but not big enough yet? Use "Other" until critical mass
3. **Versioning**: Date taxonomy versions, track changes over time
4. **Deprecation**: Don't delete categories immediately — mark "deprecated", redirect users, then remove after transition period
**Example**: Software product adding ML features
- **Today**: 20 ML-related articles scattered across "Advanced", "API", "Tutorials"
- **Transition**: Create "Machine Learning" subcategory under "Advanced"
- **Future**: 100 ML articles → Promote "Machine Learning" to top-level category
---
## 3. Navigation Depth & Breadth Optimization
### Hick's Law & Choice Overload
**Hick's Law**: Decision time increases logarithmically with number of choices
**Formula**: Time = a + b × log₂(n + 1)
- More choices → longer time to decide
**Implications for IA**:
- **5-9 items per level**: Sweet spot (Miller's "7±2")
- **>12 items**: Users feel overwhelmed, scan inefficiently
- **<3 items**: Feels unnecessarily nested
**Example**:
- 100 items, flat (1 level, 100 choices): Overwhelming
- 100 items, 2 levels (10 × 10): Manageable
- 100 items, 4 levels (3 × 3 × 3 × 4): Too many clicks
**Optimal for 100 items**: 3 levels (5 × 5 × 4) or (7 × 7 × 2)
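A rough way to compare shapes is to sum one Hick's-Law decision per level along a path (a and b are empirical constants; the values below are placeholders, not measurements):

```python
from math import log2, prod

def path_time(branching, a=0.5, b=0.15):
    """Total decision time (s) down one path: one Hick's-Law choice per level."""
    return sum(a + b * log2(n + 1) for n in branching)

for shape in [(100,), (10, 10), (5, 5, 4)]:
    assert prod(shape) >= 100           # each shape must cover ~100 items
    print(shape, f"{path_time(shape):.2f}s")
```

Note that the pure log model favors the flat list, because Hick's Law assumes practiced choices over ordered options; scanning an unfamiliar 100-item menu is closer to linear, which is why the 5-9 items-per-level guideline still wins in practice.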
### The "3-Click Rule" Myth
**Myth**: Users abandon if content requires >3 clicks
**Reality**: Users tolerate clicks if:
1. **Progress is clear**: Breadcrumbs, page titles show "getting closer"
2. **Information scent is strong**: Each click brings them closer to goal (see Section 4)
3. **No dead ends**: Every click leads somewhere useful
**Research** (UIE study): Users successfully completed tasks requiring 5-12 clicks when navigation was clear
**Guideline**: Minimize clicks, but prioritize clarity over absolute number
- **Good**: 5 clear, purposeful clicks
- **Bad**: 2 clicks but confusing labels, users backtrack
### Breadth-First vs. Depth-First Navigation
**Breadth-first** (shallow, many top-level options):
- **Structure**: 10-15 top-level categories, 2-3 levels deep
- **Best for**: Browsing, exploration, users know general area but not exact item
- **Example**: News sites, e-commerce homepages
**Depth-first** (narrow, few top-level but deep):
- **Structure**: 3-5 top-level categories, 4-6 levels deep
- **Best for**: Specific lookup, expert users, hierarchical domains
- **Example**: Technical documentation, academic libraries
**Hybrid** (recommended for most):
- **Structure**: 5-7 top-level categories, 3-4 levels deep
- **Supplement with**: Search, filters, related links to "shortcut" across hierarchy
### Progressive Disclosure
**Principle**: Start simple, reveal complexity on-demand
**Techniques**:
1. **Hub-and-spoke**: Overview page → Detailed pages
- Hub: "Getting Started" with 5 clear entry points
- Spokes: Detailed guides linked from hub
2. **Accordion/Collapse**: Hide detail until user expands
- Navigation: Show categories, hide subcategories until expanded
- Content: Show summary, expand for full text
3. **Tiered navigation**: Primary nav (always visible) + secondary nav (contextual)
- Primary: "Products", "Support", "About"
- Secondary (when in "Products"): "Electronics", "Clothing", "Books"
4. **"More..." links**: Show top N items, hide rest until "Show more" clicked
- Navigation: Top 5 categories visible, "+3 more" link expands
**Anti-pattern**: Mega-menus showing everything at once (overwhelming)
---
## 4. Information Scent & Findability
### Information Scent
**Definition**: Cues that indicate whether a path will lead to desired information
**Strong scent**: Clear labels, descriptive headings, users click confidently
**Weak scent**: Vague labels, users guess, backtrack often
**Example**:
- **Weak scent**: "Solutions" → What's in there? (generic)
- **Strong scent**: "Developer API Documentation" → Clear what's inside
**Optimizing information scent**:
1. **Specific labels** (not generic):
- Bad: "Resources" → Too vague
- Good: "Code Samples", "Video Tutorials", "White Papers" → Specific
2. **Trigger words** (match user vocabulary):
- Card sort reveals users say "How do I..." → Label category "How-To Guides"
- Users search "pricing" → Ensure "Pricing" in nav, not "Plans" or "Subscription"
3. **Descriptive breadcrumbs**:
- Bad: "Home > Section 1 > Page 3" → No meaning
- Good: "Home > Developer Docs > API Reference" → Clear path
4. **Preview text**: Show snippet of content under link
- Navigation item: "API Reference" + "Complete list of endpoints and parameters"
### Findability Metrics
**Key metrics to track**:
1. **Time to find**: How long to locate content?
- **Target**: <30 sec for simple tasks, <2 min for complex
- **Measurement**: Task completion time in usability tests
2. **Success rate**: % of users who find the content
- **Target**: ≥70% (tree test), ≥80% (live site with search)
- **Measurement**: Tree test results, task success in usability tests
3. **Search vs. browse**: Do users search or navigate?
- **Good**: 40-60% browse, 40-60% search (both work)
- **Bad**: 90% search (navigation broken), 90% browse (search broken)
- **Measurement**: Analytics (search usage %, nav click-through)
4. **Search refinement rate**: % of searches that are refined (a computation sketch follows this list)
- **Target**: <30% (users find on first search)
- **Bad**: >50% (users search, refine, search again → poor results)
- **Measurement**: Analytics (queries per session)
5. **Bounce rate by entry point**: % of visitors leaving immediately
- **Target**: <40% for landing pages
- **Bad**: >60% (users don't find what they expected)
- **Measurement**: Analytics (bounce rate by page)
6. **Navigation abandonment**: % who start navigating, then leave
- **Target**: <20%
- **Bad**: >40% (users get lost, give up)
- **Measurement**: Analytics (drop-off in navigation funnels)
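Several of these fall out of a raw analytics event stream. A sketch computing the search refinement rate (metric 4), assuming a hypothetical `(session, event, detail)` log format:

```python
from collections import defaultdict

# Hypothetical analytics events: (session_id, event_type, detail).
events = [
    ("s1", "search", "pricing"),
    ("s1", "search", "subscription pricing"),   # refined query
    ("s2", "search", "api reference"),
    ("s2", "click", "/docs/api"),
]

queries_by_session = defaultdict(list)
for session, kind, detail in events:
    if kind == "search":
        queries_by_session[session].append(detail)

refined = sum(1 for q in queries_by_session.values() if len(q) > 1)
rate = refined / len(queries_by_session)
print(f"Search refinement rate: {rate:.0%}")    # target <30%
```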
### Search vs. Navigation Trade-offs
**When search is preferred**:
- Large content sets (>5000 items)
- Users know exactly what they want ("lookup" mode)
- Diverse content types (hard to categorize consistently)
**When navigation is preferred**:
- Smaller content sets (<500 items)
- Users browsing, exploring ("discovery" mode)
- Hierarchical domains (clear parent-child relationships)
**Best practice**: Offer BOTH
- Navigation for discovery, context, exploration
- Search for lookup, speed, known-item finding
**Optimizing search**:
- **Autocomplete**: Suggest as user types (see the sketch below)
- **Filters**: Narrow results by category, date, type
- **Best bets**: Featured results for common queries
- **Zero-results page**: Suggest alternatives, show popular content
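Of these, autocomplete is the simplest to prototype: a prefix match over the controlled vocabulary is often enough before graduating to a real search index. A sketch (vocabulary is illustrative):

```python
VOCABULARY = ["Pricing", "Products", "Profile Settings", "API Reference"]

def autocomplete(prefix: str, limit: int = 5) -> list[str]:
    """Suggest vocabulary terms matching what the user has typed so far."""
    p = prefix.strip().lower()
    return [t for t in VOCABULARY if t.lower().startswith(p)][:limit]

print(autocomplete("pr"))   # ['Pricing', 'Products', 'Profile Settings']
```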
**Optimizing navigation**:
- **Clear labels**: Match user vocabulary (card sort insights)
- **Faceted filters**: Browse + filter combination
- **Related links**: Help users discover adjacent content
- **Breadcrumbs**: Show path, enable backtracking
---
## 5. Advanced Topics
### Mental Models & User Research
**Mental model**: User's internal representation of how system works
**Why it matters**: Navigation should match user's mental model, not company's org chart
**Researching mental models**:
1. **Card sorting**: Reveals how users group/label content
2. **User interviews**: Ask "How would you organize this?" "What would you call this?"
3. **Tree testing**: Validates if proposed structure matches mental model
4. **First-click testing**: Where do users expect to find X?
**Common mismatches**:
- **Company thinks**: "Features" (technical view)
- **Users think**: "What can I do?" (task view)
- **Solution**: Rename to task-based labels ("Create Report", "Share Dashboard")
**Example**: SaaS product
- **Internal (wrong)**: "Modules" → "Synergistic Solutions" → "Widget Management"
- **User mental model (right)**: "Features" → "Reporting" → "Custom Reports"
### Cross-Cultural IA
**Challenge**: Different cultures have different categorization preferences
**Examples**:
- **Alphabetical**: Works for Latin scripts, not ideographic (Chinese, Japanese)
- **Color coding**: Red = danger (Western), Red = luck (Chinese)
- **Icons**: Mailbox icon = email (US), doesn't translate (many countries have different mailbox designs)
**Strategies**:
1. **Localization testing**: Card sort with target culture users
2. **Avoid culturally-specific metaphors**: "Home run", "touchdown" (US sports)
3. **Simple, universal labels**: "Home", "Search", "Help" (widely understood)
4. **Icons + text**: Don't rely on icons alone
### IA Governance
**Problem**: Taxonomy degrades over time without maintenance
**Governance framework**:
1. **Roles**:
- **Content owner**: Publishes content, assigns categories/tags
- **Taxonomy owner**: Maintains category structure, adds/removes categories
- **IA steward**: Monitors usage, recommends improvements
2. **Processes**:
- **Quarterly review**: Check taxonomy usage, identify issues
- **Change request**: How to propose new categories or restructure
- **Deprecation**: Process for removing outdated categories
- **Tag moderation**: Review user-generated tags, merge synonyms
3. **Metrics to monitor** (an audit sketch follows this list):
- % content in "Other" or "Uncategorized" (should be <5%)
- Empty categories (no content) — remove or consolidate
- Oversized categories (>50% of content) — split into subcategories
4. **Tools**:
- CMS with taxonomy management
- Analytics to track usage
- Automated alerts (e.g., "Category X has no content")
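The monitoring metrics in item 3 are straightforward to script against a content export; a sketch, with illustrative data and the thresholds from above:

```python
# Hypothetical export: category -> number of content items.
counts = {"Docs": 400, "Tutorials": 90, "Other": 60, "Legacy API": 0}
total = sum(counts.values())

other_share = counts.get("Other", 0) / total
if other_share > 0.05:
    print(f'"Other" holds {other_share:.0%} of content (target <5%)')

for category, n in counts.items():
    if n == 0:
        print(f"Empty category: {category}; remove or consolidate")
    elif n / total > 0.50:
        print(f"Oversized category: {category} ({n / total:.0%}); split it")
```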
### Personalization & Dynamic IA
**Concept**: Navigation adapts to user
**Approaches**:
1. **Audience-based**: Show different nav for different user types
- "For Developers", "For Marketers", "For Executives"
2. **History-based**: Prioritize recently visited or frequently used
- "Recently Viewed", "Your Favorites"
3. **Context-based**: Show nav relevant to current task
- "Related Articles", "Next Steps"
4. **Adaptive search**: Results ranked by user's past behavior
**Caution**: Don't over-personalize
- Users need consistency to build mental model
- Personalization should augment, not replace, standard navigation
### IA for Voice & AI Interfaces
**Challenge**: Traditional visual hierarchy doesn't work for voice
**Strategies**:
1. **Flat structure**: No deep nesting (can't show menu)
2. **Natural language categories**: "Where can I find information about X?" vs. "Navigate to Category > Subcategory"
3. **Conversational**: "What would you like to do?" vs. "Select option 1, 2, or 3"
4. **Context-aware**: Remember user's previous question, continue conversation
**Example**:
- **Web**: Home > Products > Electronics > Phones
- **Voice**: "Show me phones" → "Here are our top phone options..."
---
## Summary
**Card sorting** reveals user mental models through similarity matrices, dendrograms, and consensus scores. Outliers indicate unclear content.
**Taxonomy design** follows MECE principle (mutually exclusive, collectively exhaustive). Use faceted classification for scale, controlled vocabulary for consistency, and plan for evolution.
**Navigation optimization** balances breadth (many choices) vs. depth (many clicks). Optimal: 5-9 items per level, 3-4 levels deep. Progressive disclosure reduces initial complexity.
**Information scent** guides users with clear labels, trigger words, and descriptive breadcrumbs. Track findability metrics: time to find (<30 sec), success rate (≥70%), search vs. browse balance (40-60% each).
**Advanced techniques** include mental model research (card sort, interviews), cross-cultural adaptation, governance frameworks, personalization, and voice interface design.
**The goal**: Users can predict where information lives and find it quickly, regardless of access method.