# Information Architecture: Advanced Methodology

This document covers advanced techniques for card sorting analysis, taxonomy design, navigation optimization, and findability improvement.

## Table of Contents

1. [Card Sorting Analysis](#1-card-sorting-analysis)
2. [Taxonomy Design Principles](#2-taxonomy-design-principles)
3. [Navigation Depth & Breadth Optimization](#3-navigation-depth--breadth-optimization)
4. [Information Scent & Findability](#4-information-scent--findability)
5. [Advanced Topics](#5-advanced-topics)

---

## 1. Card Sorting Analysis

### Analyzing Card Sort Results

**Goal**: Extract meaningful patterns from user groupings

### Similarity Matrix

**What it is**: Shows how often users grouped two cards together

**How to calculate**:
- For each pair of cards, count how many users put them in the same group
- Express as percentage: (# users who grouped together) / (total users)

**Example**:

|             | Sign Up | First Login | Quick Start | Reports | Dashboards |
|-------------|---------|-------------|-------------|---------|------------|
| Sign Up     | -       | 85%         | 90%         | 15%     | 10%        |
| First Login | 85%     | -           | 88%         | 12%     | 8%         |
| Quick Start | 90%     | 88%         | -           | 10%     | 12%        |
| Reports     | 15%     | 12%         | 10%         | -       | 75%        |
| Dashboards  | 10%     | 8%          | 12%         | 75%     | -          |

**Interpretation**:
- **Strong clustering** (>70%): "Sign Up", "First Login", "Quick Start" belong together → "Getting Started" category
- **Strong clustering** (75%): "Reports" and "Dashboards" belong together → "Analytics" category
- **Weak links** (<20%): "Getting Started" and "Analytics" are distinct categories

### Dendrogram (Hierarchical Clustering)

**What it is**: Tree diagram showing hierarchical relationships

**How to create**:
1. Start with each card as its own cluster
2. Iteratively merge closest clusters (highest similarity)
3. Continue until all cards in one cluster

**Interpreting dendrograms**:
- **Short branches**: High agreement (merge early)
- **Long branches**: Low agreement (merge late)
- **Clusters**: Cut tree at appropriate height to identify categories

**Example**:

```
                       All Cards
                           |
       ____________________+_____________________
       |                                        |
Getting Started                             Features
       |                                        |
   ____+____                              _____+_____
   |       |                              |         |
Sign Up  First Login                  Analytics  Settings
                                          |
                                      ____+____
                                      |       |
                                   Reports  Dashboards
```

**Insight**: Users see clear distinction between "Getting Started" (onboarding tasks) and "Features" (ongoing use).
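Both artifacts above can be computed directly from raw sort data. The sketch below is a minimal illustration, not a specific tool's workflow: the `sorts` structure (one dict of group → cards per participant) and all labels are invented for the example, and SciPy's standard hierarchical clustering stands in for whatever your card-sorting tool uses. It builds the pairwise similarity matrix, converts it to distances, and produces the merge sequence a dendrogram visualizes.

```python
from itertools import combinations

import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform

# Illustrative raw data: each participant's sort, {group_label: [cards]}
sorts = [
    {"Start": ["Sign Up", "First Login", "Quick Start"],
     "Analytics": ["Reports", "Dashboards"]},
    {"Onboarding": ["Sign Up", "Quick Start"],
     "Account": ["First Login"],
     "Data": ["Reports", "Dashboards"]},
    {"Setup": ["Sign Up", "First Login", "Quick Start"],
     "Insights": ["Reports", "Dashboards"]},
]

cards = sorted({c for sort in sorts for group in sort.values() for c in group})
n = len(cards)

def pair_similarity(a, b):
    """Fraction of participants who placed cards a and b in the same group."""
    same = sum(
        any(a in group and b in group for group in sort.values())
        for sort in sorts
    )
    return same / len(sorts)

# Square similarity matrix (1.0 on the diagonal)
sim = np.ones((n, n))
for i, j in combinations(range(n), 2):
    sim[i, j] = sim[j, i] = pair_similarity(cards[i], cards[j])

# Clustering works on distances, so invert: distance = 1 - similarity.
# squareform() condenses the symmetric matrix; "average" = UPGMA linkage.
merges = linkage(squareform(1 - sim, checks=False), method="average")
print(merges)  # each row: [cluster_i, cluster_j, merge_distance, new_size]
# dendrogram(merges, labels=cards) draws the tree (requires matplotlib)
```

Average linkage is used here as a reasonable default; single or complete linkage merge more or less eagerly, which shifts where you would cut the tree.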
### Agreement Score (Consensus)

**What it is**: How much users agree on groupings

**Calculation methods**:
1. **Category agreement**: % of users who created a similar category
   - Example: 18/20 users (90%) created a "Getting Started" category
2. **Pairwise agreement**: Average similarity across all card pairs
   - Formula: Sum(all pairwise similarities) / Number of pairs (computed in the sketch after the next subsection)
   - High score (>70%) = strong consensus
   - Low score (<50%) = weak consensus, needs refinement

**When consensus is low**:
- Cards may be ambiguous (clarify labels)
- Users have different mental models (consider multiple navigation paths)
- Category is too broad (split into subcategories)

### Outlier Cards

**What they are**: Cards that don't fit anywhere consistently

**How to identify**: Low similarity with all other cards (<30% with any card)

**Common reasons**:
- Card label is unclear → Rewrite card
- Content doesn't belong in product → Remove
- Content is unique → Create standalone category or utility link

**Example**: "Billing" card — 15 users put it in "Settings", 3 in "Account", 2 didn't categorize it
- **Action**: Clarify whether "Billing" is settings (configuration) or account (transactions)
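Continuing the sketch above (reusing `sim`, `cards`, `n`, and `combinations` from it), both the pairwise agreement score and the outlier check reduce to a few lines over the same matrix, with the thresholds taken from this section:

```python
# Reuses sim, cards, n, and combinations from the clustering sketch above.

# Pairwise agreement: mean similarity over all distinct card pairs
pairs = list(combinations(range(n), 2))
agreement = sum(sim[i, j] for i, j in pairs) / len(pairs)
print(f"Pairwise agreement: {agreement:.0%}")  # >70% strong, <50% weak

# Outliers: cards whose highest similarity with ANY other card is under 30%
for i, card in enumerate(cards):
    best = max(sim[i, j] for j in range(n) if j != i)
    if best < 0.30:
        print(f"Outlier: {card} (max similarity {best:.0%})")
```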
**Use "Other" temporarily**: New category emerging but not big enough yet? Use "Other" until critical mass 3. **Versioning**: Date taxonomy versions, track changes over time 4. **Deprecation**: Don't delete categories immediately — mark "deprecated", redirect users, then remove after transition period **Example**: Software product adding ML features - **Today**: 20 ML-related articles scattered across "Advanced", "API", "Tutorials" - **Transition**: Create "Machine Learning" subcategory under "Advanced" - **Future**: 100 ML articles → Promote "Machine Learning" to top-level category --- ## 3. Navigation Depth & Breadth Optimization ### Hick's Law & Choice Overload **Hick's Law**: Decision time increases logarithmically with number of choices **Formula**: Time = a + b × log₂(n + 1) - More choices → longer time to decide **Implications for IA**: - **5-9 items per level**: Sweet spot (Miller's "7±2") - **>12 items**: Users feel overwhelmed, scan inefficiently - **<3 items**: Feels unnecessarily nested **Example**: - 100 items, flat (1 level, 100 choices): Overwhelming - 100 items, 2 levels (10 × 10): Manageable - 100 items, 4 levels (3 × 3 × 3 × 4): Too many clicks **Optimal for 100 items**: 3 levels (5 × 5 × 4) or (7 × 7 × 2) ### The "3-Click Rule" Myth **Myth**: Users abandon if content requires >3 clicks **Reality**: Users tolerate clicks if: 1. **Progress is clear**: Breadcrumbs, page titles show "getting closer" 2. **Information scent is strong**: Each click brings them closer to goal (see Section 4) 3. **No dead ends**: Every click leads somewhere useful **Research** (UIE study): Users successfully completed tasks requiring 5-12 clicks when navigation was clear **Guideline**: Minimize clicks, but prioritize clarity over absolute number - **Good**: 5 clear, purposeful clicks - **Bad**: 2 clicks but confusing labels, users backtrack ### Breadth-First vs. Depth-First Navigation **Breadth-first** (shallow, many top-level options): - **Structure**: 10-15 top-level categories, 2-3 levels deep - **Best for**: Browsing, exploration, users know general area but not exact item - **Example**: News sites, e-commerce homepages **Depth-first** (narrow, few top-level but deep): - **Structure**: 3-5 top-level categories, 4-6 levels deep - **Best for**: Specific lookup, expert users, hierarchical domains - **Example**: Technical documentation, academic libraries **Hybrid** (recommended for most): - **Structure**: 5-7 top-level categories, 3-4 levels deep - **Supplement with**: Search, filters, related links to "shortcut" across hierarchy ### Progressive Disclosure **Principle**: Start simple, reveal complexity on-demand **Techniques**: 1. **Hub-and-spoke**: Overview page → Detailed pages - Hub: "Getting Started" with 5 clear entry points - Spokes: Detailed guides linked from hub 2. **Accordion/Collapse**: Hide detail until user expands - Navigation: Show categories, hide subcategories until expanded - Content: Show summary, expand for full text 3. **Tiered navigation**: Primary nav (always visible) + secondary nav (contextual) - Primary: "Products", "Support", "About" - Secondary (when in "Products"): "Electronics", "Clothing", "Books" 4. **"More..." links**: Show top N items, hide rest until "Show more" clicked - Navigation: Top 5 categories visible, "+3 more" link expands **Anti-pattern**: Mega-menus showing everything at once (overwhelming) --- ## 4. 
Information Scent & Findability ### Information Scent **Definition**: Cues that indicate whether a path will lead to desired information **Strong scent**: Clear labels, descriptive headings, users click confidently **Weak scent**: Vague labels, users guess, backtrack often **Example**: - **Weak scent**: "Solutions" → What's in there? (generic) - **Strong scent**: "Developer API Documentation" → Clear what's inside **Optimizing information scent**: 1. **Specific labels** (not generic): - Bad: "Resources" → Too vague - Good: "Code Samples", "Video Tutorials", "White Papers" → Specific 2. **Trigger words** (match user vocabulary): - Card sort reveals users say "How do I..." → Label category "How-To Guides" - Users search "pricing" → Ensure "Pricing" in nav, not "Plans" or "Subscription" 3. **Descriptive breadcrumbs**: - Bad: "Home > Section 1 > Page 3" → No meaning - Good: "Home > Developer Docs > API Reference" → Clear path 4. **Preview text**: Show snippet of content under link - Navigation item: "API Reference" + "Complete list of endpoints and parameters" ### Findability Metrics **Key metrics to track**: 1. **Time to find**: How long to locate content? - **Target**: <30 sec for simple tasks, <2 min for complex - **Measurement**: Task completion time in usability tests 2. **Success rate**: % of users who find content? - **Target**: ≥70% (tree test), ≥80% (live site with search) - **Measurement**: Tree test results, task success in usability tests 3. **Search vs. browse**: Do users search or navigate? - **Good**: 40-60% browse, 40-60% search (both work) - **Bad**: 90% search (navigation broken), 90% browse (search broken) - **Measurement**: Analytics (search usage %, nav click-through) 4. **Search refinement rate**: % of searches that are refined? - **Target**: <30% (users find on first search) - **Bad**: >50% (users search, refine, search again → poor results) - **Measurement**: Analytics (queries per session) 5. **Bounce rate by entry point**: % leaving immediately? - **Target**: <40% for landing pages - **Bad**: >60% (users don't find what they expected) - **Measurement**: Analytics (bounce rate by page) 6. **Navigation abandonment**: % who start navigating, then leave? - **Target**: <20% - **Bad**: >40% (users get lost, give up) - **Measurement**: Analytics (drop-off in navigation funnels) ### Search vs. Navigation Trade-offs **When search is preferred**: - Large content sets (>5000 items) - Users know exactly what they want ("lookup" mode) - Diverse content types (hard to categorize consistently) **When navigation is preferred**: - Smaller content sets (<500 items) - Users browsing, exploring ("discovery" mode) - Hierarchical domains (clear parent-child relationships) **Best practice**: Offer BOTH - Navigation for discovery, context, exploration - Search for lookup, speed, known-item finding **Optimizing search**: - **Autocomplete**: Suggest as user types - **Filters**: Narrow results by category, date, type - **Best bets**: Featured results for common queries - **Zero-results page**: Suggest alternatives, show popular content **Optimizing navigation**: - **Clear labels**: Match user vocabulary (card sort insights) - **Faceted filters**: Browse + filter combination - **Related links**: Help users discover adjacent content - **Breadcrumbs**: Show path, enable backtracking --- ## 5. 
### The "3-Click Rule" Myth

**Myth**: Users abandon if content requires >3 clicks

**Reality**: Users tolerate clicks if:
1. **Progress is clear**: Breadcrumbs, page titles show "getting closer"
2. **Information scent is strong**: Each click brings them closer to goal (see Section 4)
3. **No dead ends**: Every click leads somewhere useful

**Research** (UIE study): Users successfully completed tasks requiring 5-12 clicks when navigation was clear

**Guideline**: Minimize clicks, but prioritize clarity over absolute number
- **Good**: 5 clear, purposeful clicks
- **Bad**: 2 clicks but confusing labels, users backtrack

### Breadth-First vs. Depth-First Navigation

**Breadth-first** (shallow, many top-level options):
- **Structure**: 10-15 top-level categories, 2-3 levels deep
- **Best for**: Browsing, exploration, users know general area but not exact item
- **Example**: News sites, e-commerce homepages

**Depth-first** (narrow, few top-level but deep):
- **Structure**: 3-5 top-level categories, 4-6 levels deep
- **Best for**: Specific lookup, expert users, hierarchical domains
- **Example**: Technical documentation, academic libraries

**Hybrid** (recommended for most):
- **Structure**: 5-7 top-level categories, 3-4 levels deep
- **Supplement with**: Search, filters, related links to "shortcut" across hierarchy

### Progressive Disclosure

**Principle**: Start simple, reveal complexity on-demand

**Techniques**:
1. **Hub-and-spoke**: Overview page → Detailed pages
   - Hub: "Getting Started" with 5 clear entry points
   - Spokes: Detailed guides linked from hub
2. **Accordion/Collapse**: Hide detail until user expands
   - Navigation: Show categories, hide subcategories until expanded
   - Content: Show summary, expand for full text
3. **Tiered navigation**: Primary nav (always visible) + secondary nav (contextual)
   - Primary: "Products", "Support", "About"
   - Secondary (when in "Products"): "Electronics", "Clothing", "Books"
4. **"More..." links**: Show top N items, hide rest until "Show more" clicked
   - Navigation: Top 5 categories visible, "+3 more" link expands

**Anti-pattern**: Mega-menus showing everything at once (overwhelming)

---

## 4. Information Scent & Findability

### Information Scent

**Definition**: Cues that indicate whether a path will lead to desired information

**Strong scent**: Clear labels, descriptive headings, users click confidently
**Weak scent**: Vague labels, users guess, backtrack often

**Example**:
- **Weak scent**: "Solutions" → What's in there? (generic)
- **Strong scent**: "Developer API Documentation" → Clear what's inside

**Optimizing information scent**:
1. **Specific labels** (not generic):
   - Bad: "Resources" → Too vague
   - Good: "Code Samples", "Video Tutorials", "White Papers" → Specific
2. **Trigger words** (match user vocabulary):
   - Card sort reveals users say "How do I..." → Label category "How-To Guides"
   - Users search "pricing" → Ensure "Pricing" in nav, not "Plans" or "Subscription"
3. **Descriptive breadcrumbs**:
   - Bad: "Home > Section 1 > Page 3" → No meaning
   - Good: "Home > Developer Docs > API Reference" → Clear path
4. **Preview text**: Show snippet of content under link
   - Navigation item: "API Reference" + "Complete list of endpoints and parameters"

### Findability Metrics

**Key metrics to track** (a sketch for computing two of them from analytics data follows this list):
1. **Time to find**: How long to locate content?
   - **Target**: <30 sec for simple tasks, <2 min for complex
   - **Measurement**: Task completion time in usability tests
2. **Success rate**: % of users who find the content
   - **Target**: ≥70% (tree test), ≥80% (live site with search)
   - **Measurement**: Tree test results, task success in usability tests
3. **Search vs. browse**: Do users search or navigate?
   - **Good**: 40-60% browse, 40-60% search (both work)
   - **Bad**: 90% search (navigation broken), 90% browse (search broken)
   - **Measurement**: Analytics (search usage %, nav click-through)
4. **Search refinement rate**: % of searches that are refined
   - **Target**: <30% (users find on first search)
   - **Bad**: >50% (users search, refine, search again → poor results)
   - **Measurement**: Analytics (queries per session)
5. **Bounce rate by entry point**: % leaving immediately
   - **Target**: <40% for landing pages
   - **Bad**: >60% (users don't find what they expected)
   - **Measurement**: Analytics (bounce rate by page)
6. **Navigation abandonment**: % who start navigating, then leave
   - **Target**: <20%
   - **Bad**: >40% (users get lost, give up)
   - **Measurement**: Analytics (drop-off in navigation funnels)
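As promised above, a hedged sketch of computing two of these metrics from an event log. The log format (session id, event type, detail) and all field values are made up for illustration, and per-session query counts stand in as a simple proxy for refinement:

```python
from collections import defaultdict

# Hypothetical analytics export: (session_id, event, detail)
events = [
    ("s1", "search", "pricing"),
    ("s1", "search", "subscription cost"),  # refined query
    ("s1", "page_view", "/pricing"),
    ("s2", "nav_click", "Products"),
    ("s2", "page_view", "/products/phones"),
    ("s3", "search", "api keys"),
    ("s3", "page_view", "/docs/api"),
]

searches = defaultdict(int)   # session -> number of queries
nav_clicks = set()            # sessions that browsed via navigation
for session, event, _ in events:
    if event == "search":
        searches[session] += 1
    elif event == "nav_click":
        nav_clicks.add(session)

total = len({session for session, _, _ in events})
search_sessions = set(searches)

# Search refinement rate: sessions needing >1 query (target <30%)
refined = sum(1 for count in searches.values() if count > 1)
print(f"Refinement rate: {refined / len(search_sessions):.0%}")

# Search vs. browse split (target: both in the 40-60% band)
print(f"Searched: {len(search_sessions) / total:.0%}, "
      f"browsed: {len(nav_clicks) / total:.0%}")
```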
### Search vs. Navigation Trade-offs

**When search is preferred**:
- Large content sets (>5000 items)
- Users know exactly what they want ("lookup" mode)
- Diverse content types (hard to categorize consistently)

**When navigation is preferred**:
- Smaller content sets (<500 items)
- Users browsing, exploring ("discovery" mode)
- Hierarchical domains (clear parent-child relationships)

**Best practice**: Offer BOTH
- Navigation for discovery, context, exploration
- Search for lookup, speed, known-item finding

**Optimizing search**:
- **Autocomplete**: Suggest as user types
- **Filters**: Narrow results by category, date, type
- **Best bets**: Featured results for common queries
- **Zero-results page**: Suggest alternatives, show popular content

**Optimizing navigation**:
- **Clear labels**: Match user vocabulary (card sort insights)
- **Faceted filters**: Browse + filter combination
- **Related links**: Help users discover adjacent content
- **Breadcrumbs**: Show path, enable backtracking

---

## 5. Advanced Topics

### Mental Models & User Research

**Mental model**: User's internal representation of how system works

**Why it matters**: Navigation should match user's mental model, not company's org chart

**Researching mental models**:
1. **Card sorting**: Reveals how users group/label content
2. **User interviews**: Ask "How would you organize this?" "What would you call this?"
3. **Tree testing**: Validates if proposed structure matches mental model
4. **First-click testing**: Where do users expect to find X?

**Common mismatches**:
- **Company thinks**: "Features" (technical view)
- **Users think**: "What can I do?" (task view)
- **Solution**: Rename to task-based labels ("Create Report", "Share Dashboard")

**Example**: SaaS product
- **Internal (wrong)**: "Modules" → "Synergistic Solutions" → "Widget Management"
- **User mental model (right)**: "Features" → "Reporting" → "Custom Reports"

### Cross-Cultural IA

**Challenge**: Different cultures have different categorization preferences

**Examples**:
- **Alphabetical**: Works for Latin scripts, not ideographic (Chinese, Japanese)
- **Color coding**: Red = danger (Western), Red = luck (Chinese)
- **Icons**: Mailbox icon = email (US), doesn't translate (many countries have different mailbox designs)

**Strategies**:
1. **Localization testing**: Card sort with target culture users
2. **Avoid culturally-specific metaphors**: "Home run", "touchdown" (US sports)
3. **Simple, universal labels**: "Home", "Search", "Help" (widely understood)
4. **Icons + text**: Don't rely on icons alone

### IA Governance

**Problem**: Taxonomy degrades over time without maintenance

**Governance framework**:
1. **Roles**:
   - **Content owner**: Publishes content, assigns categories/tags
   - **Taxonomy owner**: Maintains category structure, adds/removes categories
   - **IA steward**: Monitors usage, recommends improvements
2. **Processes**:
   - **Quarterly review**: Check taxonomy usage, identify issues
   - **Change request**: How to propose new categories or restructure
   - **Deprecation**: Process for removing outdated categories
   - **Tag moderation**: Review user-generated tags, merge synonyms
3. **Metrics to monitor**:
   - % content in "Other" or "Uncategorized" (should be <5%)
   - Empty categories (no content) — remove or consolidate
   - Oversized categories (>50% of content) — split into subcategories
4. **Tools**:
   - CMS with taxonomy management
   - Analytics to track usage
   - Automated alerts (e.g., "Category X has no content")

### Personalization & Dynamic IA

**Concept**: Navigation adapts to user

**Approaches**:
1. **Audience-based**: Show different nav for different user types
   - "For Developers", "For Marketers", "For Executives"
2. **History-based**: Prioritize recently visited or frequently used
   - "Recently Viewed", "Your Favorites"
3. **Context-based**: Show nav relevant to current task
   - "Related Articles", "Next Steps"
4. **Adaptive search**: Results ranked by user's past behavior

**Caution**: Don't over-personalize
- Users need consistency to build mental model
- Personalization should augment, not replace, standard navigation

### IA for Voice & AI Interfaces

**Challenge**: Traditional visual hierarchy doesn't work for voice

**Strategies**:
1. **Flat structure**: No deep nesting (can't show menu)
2. **Natural language categories**: "Where can I find information about X?" vs. "Navigate to Category > Subcategory"
3. **Conversational**: "What would you like to do?" vs. "Select option 1, 2, or 3"
4. **Context-aware**: Remember user's previous question, continue conversation

**Example**:
- **Web**: Home > Products > Electronics > Phones
- **Voice**: "Show me phones" → "Here are our top phone options..."

---

## Summary

**Card sorting** reveals user mental models through similarity matrices, dendrograms, and consensus scores. Outliers indicate unclear content.

**Taxonomy design** follows the MECE principle (mutually exclusive, collectively exhaustive). Use faceted classification for scale, controlled vocabulary for consistency, and plan for evolution.

**Navigation optimization** balances breadth (many choices) vs. depth (many clicks). Optimal: 5-9 items per level, 3-4 levels deep. Progressive disclosure reduces initial complexity.

**Information scent** guides users with clear labels, trigger words, and descriptive breadcrumbs. Track findability metrics: time to find (<30 sec), success rate (≥70%), search vs. browse balance (40-60% each).

**Advanced techniques** include mental model research (card sort, interviews), cross-cultural adaptation, governance frameworks, personalization, and voice interface design.

**The goal**: Users can predict where information lives and find it quickly, regardless of access method.