Risk-Based Testing Guide

Purpose

This guide replaces the traditional Test Pyramid (70/20/10 ratio) with a Value-Based Testing Framework that prioritizes business risk and practical test limits. The goal is to write tests that matter, not to chase coverage metrics.

Problem solved: The traditional Test Pyramid approach generates excessive tests (~200 per Story) by mechanically testing every conditional branch. This creates a maintenance burden without proportional business value.

Solution: Risk-Based Testing with clear prioritization criteria and enforced limits (realistic goal of 2-7 tests per Story, hard maximum of 28).

Core Philosophy

Guillermo Rauch's Principle

"Write tests. Not too many. Mostly integration."

Key Insights

  1. Test business value, not code coverage - 80% coverage means nothing if critical payment flow isn't tested
  2. Manual testing has value - Not every scenario needs automated test duplication
  3. Each test has maintenance cost - More tests = more refactoring overhead
  4. Integration tests catch real bugs - Unit tests catch edge cases in isolation
  5. E2E tests validate user value - Only E2E proves the feature actually works end-to-end

Minimum Viable Testing Philosophy

Start Minimal, Justify Additions

Baseline for every Story:

  • 2 E2E tests per endpoint: Positive scenario (happy path) + Negative scenario (critical error)
  • 0 Integration tests (E2E covers full stack by default)
  • 0 Unit tests (E2E covers simple logic by default)

Realistic goal: 2-7 tests per Story (not 10-28!)

Additional tests ONLY with critical justification:

  • Test #3 and beyond: Each requires documented answer to "Why does this test OUR business logic (not framework/library/database)?"
  • Priority ≥15 required for all additional tests
  • Auto-trim to 7 tests if plan exceeds realistic goal
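The auto-trim rule can be sketched as a small function. This is only an illustration, assuming the rule stated in the Quick Reference (keep the 2 baseline E2E tests plus the highest-Priority extras until the plan holds 7 tests). The plan entry shape and the names `autoTrim` and `REALISTIC_GOAL` are hypothetical, not part of any existing tooling.

```javascript
// Hypothetical plan entry shape (not defined by this guide):
//   { name: string, baseline: boolean, priority: number }
const REALISTIC_GOAL = 7;

function autoTrim(plan, goal = REALISTIC_GOAL) {
  // Plans at or under the goal are returned unchanged (as a copy).
  if (plan.length <= goal) return plan.slice();

  // Baseline E2E tests are never trimmed.
  const baseline = plan.filter((t) => t.baseline);
  const slots = Math.max(0, goal - baseline.length);

  // filter() returns a copy, so sorting does not reorder the caller's plan.
  const extras = plan
    .filter((t) => !t.baseline)
    .sort((a, b) => b.priority - a.priority)
    .slice(0, slots);

  return [...baseline, ...extras];
}
```

With 2 baseline tests and 7 extras, the sketch keeps the 2 baseline tests and the 5 highest-Priority extras. Ties keep their original plan order because `Array.prototype.sort` is stable in current JavaScript engines.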

Critical Justification Questions

Before adding ANY test beyond 2 baseline E2E, answer:

  1. Does this test OUR business logic?

    • YES: Tax calculation with country-specific rules (OUR algorithm)
    • NO: bcrypt hashing (library behavior)
    • NO: Prisma query execution (framework behavior)
    • NO: PostgreSQL LIKE operator (database behavior)
  2. Is this NOT already covered by the 2 baseline E2E tests?

    • YES: E2E doesn't exercise all branches of complex calculation
    • NO: E2E test already validates the full flow end-to-end
  3. Priority ≥15?

    • YES: Money, security, data integrity
    • NO: Skip, manual testing sufficient
  4. Unique business value?

    • YES: Tests different scenario than existing tests
    • NO: Duplicate coverage

If ANY answer is NO → SKIP this test
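As a rough sketch, the four questions can be encoded as a gate that a planning script or review checklist might apply to each candidate test. The field names (`testsOurLogic`, `coveredByBaselineE2E`, `priority`, `uniqueValue`) and the function name `shouldAddTest` are hypothetical and chosen only for this illustration.

```javascript
// Each check mirrors one of the four justification questions.
// A candidate passes only if every check holds.
function shouldAddTest(candidate) {
  const checks = [
    ['tests OUR business logic', candidate.testsOurLogic === true],
    // The candidate must add coverage that the baseline E2E tests lack.
    ['not already covered by baseline E2E', candidate.coveredByBaselineE2E === false],
    ['Priority >= 15', candidate.priority >= 15],
    ['unique business value', candidate.uniqueValue === true],
  ];

  const failed = checks.filter(([, ok]) => !ok).map(([label]) => label);
  return { add: failed.length === 0, failed };
}
```

Returning the list of failed checks, rather than a bare boolean, leaves a record of why a scenario was skipped.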

Risk Priority Matrix

Calculation Formula

Priority = Business Impact (1-5) × Probability of Failure (1-5)

Result ranges:

  • Priority ≥15 (15-25): MUST test - critical scenarios
  • Priority 9-14: SHOULD test if not already covered
  • Priority ≤8 (1-8): SKIP - manual testing sufficient
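A minimal sketch of the formula and the three result ranges, assuming both scores are integers from 1 to 5. The function names `riskPriority` and `classify` are illustrative only.

```javascript
// Priority = Business Impact (1-5) x Probability of Failure (1-5)
function riskPriority(impact, probability) {
  for (const [name, score] of [['impact', impact], ['probability', probability]]) {
    if (!Number.isInteger(score) || score < 1 || score > 5) {
      throw new RangeError(`${name} must be an integer from 1 to 5, got ${score}`);
    }
  }
  return impact * probability;
}

// Maps a priority to the MUST / SHOULD / SKIP ranges listed above.
function classify(priority) {
  if (priority >= 15) return 'MUST';
  if (priority >= 9) return 'SHOULD';
  return 'SKIP';
}
```

For example, an Impact 4 scenario with Probability 3 scores 12 and classifies as SHOULD, while Impact 2 with Probability 4 scores 8 and classifies as SKIP.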

Business Impact Scoring (1-5)

| Score | Impact Level | Examples |
|-------|--------------|----------|
| 5 | Critical | Money loss, security breach, data corruption, legal liability |
| 4 | High | Core business flow breaks (cannot complete purchase, cannot login) |
| 3 | Medium | Feature partially broken (search works but pagination fails) |
| 2 | Low | Minor UX issue (button disabled state wrong, tooltip missing) |
| 1 | Trivial | Cosmetic bug (color slightly off, spacing issue) |

Probability of Failure Scoring (1-5)

| Score | Probability | Indicators |
|-------|-------------|------------|
| 5 | Very High (>50%) | Complex algorithm, external API, new technology, no existing tests |
| 4 | High (25-50%) | Multiple dependencies, concurrency, state management |
| 3 | Medium (10-25%) | Standard CRUD, framework defaults, well-tested patterns |
| 2 | Low (5-10%) | Simple logic, established library, copy-paste from working code |
| 1 | Very Low (<5%) | Trivial assignment, framework-generated code |

Priority Matrix Table

|          | Probability 1 | Probability 2 | Probability 3 | Probability 4 | Probability 5 |
|----------|---------------|---------------|---------------|---------------|---------------|
| Impact 5 | 5 (SKIP) | 10 (SHOULD) | 15 (MUST) | 20 (MUST) | 25 (MUST) |
| Impact 4 | 4 (SKIP) | 8 (SKIP) | 12 (SHOULD) | 16 (MUST) | 20 (MUST) |
| Impact 3 | 3 (SKIP) | 6 (SKIP) | 9 (SHOULD) | 12 (SHOULD) | 15 (MUST) |
| Impact 2 | 2 (SKIP) | 4 (SKIP) | 6 (SKIP) | 8 (SKIP) | 10 (SHOULD) |
| Impact 1 | 1 (SKIP) | 2 (SKIP) | 3 (SKIP) | 4 (SKIP) | 5 (SKIP) |

Test Type Decision Tree

Step 1: Calculate Risk Priority

Use Risk Priority Matrix above.

Step 2: Select Test Type

IF Priority ≥15 → Proceed to Step 3
ELSE IF Priority 9-14 → Check Anti-Duplication (Step 4), then Step 3
ELSE Priority ≤8 → SKIP (manual testing sufficient)

Step 3: Choose Test Level

E2E Test (2-5 max per Story):

  • BASELINE (ALWAYS): 2 E2E tests per endpoint
    • Test 1: Positive scenario (happy path validating main AC)
    • Test 2: Negative scenario (critical error handling)
  • ADDITIONAL (tests #3-5): ONLY if Priority ≥15 AND justified
    • Critical edge case from manual testing
    • Second endpoint (if Story implements multiple endpoints)
  • Examples:
    • User registers → receives email → confirms → can login
    • User adds product → proceeds to checkout → pays → sees confirmation
    • User uploads file → sees progress → file appears in list

Integration Test (0-8 max per Story):

  • DEFAULT: 0 Integration tests (2 E2E tests cover full stack by default)
  • ADD ONLY if: E2E doesn't cover interaction completely AND Priority ≥15 AND justified
  • Examples:
    • Transaction rollback on error (E2E tests happy path only)
    • Concurrent request handling (E2E tests single request)
    • External API error scenarios (500, timeout) with Priority ≥15
  • MANDATORY SKIP:
    • Simple pass-through calls (E2E already validates end-to-end)
    • Testing framework integrations (Prisma client, TypeORM repository, Express app)
    • Testing database query execution (database engine behavior)

Unit Test (0-15 max per Story):

  • DEFAULT: 0 Unit tests (2 E2E tests cover simple logic by default)
  • ADD ONLY for complex business logic with Priority ≥15:
    • Financial calculations (tax, discount, currency conversion) WITH COMPLEX RULES
    • Security algorithms (password strength, permission matrix) WITH CUSTOM LOGIC
    • Complex business algorithms (scoring, matching, ranking) WITH MULTIPLE FACTORS
  • MANDATORY SKIP - DO NOT create unit tests for:
    • Simple CRUD operations (already covered by E2E)
    • Framework code (Express middleware, React hooks, FastAPI dependencies)
    • Library functions (bcrypt hashing, jsonwebtoken signing, axios requests)
    • Database queries (Prisma findMany, TypeORM query builder, SQL joins)
    • Getters/setters or simple property access
    • Trivial conditionals (if (user) return user.name, status === 'active')
    • Pass-through functions (wrappers without logic)
    • Performance/load testing (benchmarks, stress tests, scalability validation)

Step 4: Anti-Duplication Check

Before writing ANY test, verify:

  1. Is this scenario already covered by E2E?

    • E2E tests payment flow → SKIP unit test for calculateTotal()
    • E2E tests login → SKIP unit test for validateEmail()
  2. Is this testing framework code?

    • Testing Express app.use() → SKIP
    • Testing React useState → SKIP
    • Testing Prisma findMany() → SKIP
  3. Does this add unique business value?

    • E2E tests happy path → Unit test for edge case (negative price) → KEEP
    • Integration test already validates DB transaction → SKIP duplicate unit test
  4. Is this a one-line function?

    • getFullName() { return firstName + lastName } → SKIP (E2E covers it)

Test Limits Per Story

Enforced Limits with Realistic Goals

| Test Type | Minimum | Realistic Goal | Maximum | Purpose |
|-----------|---------|----------------|---------|---------|
| E2E | 2 | 2 | 5 | Baseline: positive + negative per endpoint |
| Integration | 0 | 0-2 | 8 | ONLY if E2E doesn't cover interaction |
| Unit | 0 | 0-3 | 15 | ONLY complex business logic (financial/security/algorithms) |
| TOTAL | 2 | 2-7 | 28 | Start minimal, add only with justification |

Key Change: Test limits are now CEILINGS (maximum allowed), NOT targets to fill. Start with 2 E2E tests, add more only with critical justification.
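The limits table can be checked mechanically. The following is a sketch only: the `LIMITS` object restates the minimum and maximum columns of the table, while the function name `checkLimits` and the shape of its result are assumptions made for illustration.

```javascript
// Minimum and maximum per test type, taken from the limits table.
const LIMITS = {
  e2e: { min: 2, max: 5 },
  integration: { min: 0, max: 8 },
  unit: { min: 0, max: 15 },
};
const REALISTIC_TOTAL = 7; // upper end of the 2-7 realistic goal
const MAX_TOTAL = 28; // hard ceiling

// counts: e.g. { e2e: 2, integration: 0, unit: 1 }; missing types count as 0.
function checkLimits(counts) {
  const problems = [];
  let total = 0;

  for (const [type, { min, max }] of Object.entries(LIMITS)) {
    const n = counts[type] ?? 0;
    total += n;
    if (n < min) problems.push(`${type}: ${n} is below the minimum of ${min}`);
    if (n > max) problems.push(`${type}: ${n} exceeds the maximum of ${max}`);
  }
  if (total > MAX_TOTAL) problems.push(`total: ${total} exceeds the maximum of ${MAX_TOTAL}`);

  return { total, withinRealisticGoal: total <= REALISTIC_TOTAL, problems };
}
```

A plan can pass the hard limits and still report `withinRealisticGoal: false`, which matches the idea that the maximums are ceilings and not targets.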

Rationale for Limits

Why maximum 5 E2E?

  • E2E tests are slow (10-60 seconds each)
  • Each Story typically has 2-4 Acceptance Criteria
  • 1-2 E2E per AC is sufficient
  • Edge cases covered by Integration/Unit tests

Why maximum 8 Integration?

  • Integration tests validate layer interactions
  • Typical Story has 3-5 integration points (API → Service → DB)
  • 1-2 tests per integration point + error scenarios

Why maximum 15 Unit?

  • Only test complex business logic
  • Typical Story has 2-4 complex functions
  • 3-5 tests per function (happy path + edge cases)

Why total maximum 28?

  • Rule of thumb: Stories with >30 tests rarely show proportional bug prevention
  • Maintenance cost grows faster than the test count beyond this point
  • Focus on quality over quantity

Common Over-Testing Anti-Patterns

Anti-Pattern 1: "Every if/else needs a test"

Bad:

// Function with 10 if/else branches
function processOrder(order) {
  if (!order) return null;           // Test 1
  if (!order.items) return null;      // Test 2
  if (order.items.length === 0) return null; // Test 3
  // ... 7 more conditionals
}

Problem: 10 unit tests for trivial validation logic already covered by E2E test that calls processOrder().

Good:

  • 1 E2E test: User submits valid order → success
  • 1 E2E test: User submits invalid order → error message
  • 1 Unit test: Complex tax calculation inside processOrder() (if exists)

Total: 3 tests instead of 12 (10 unit + 2 E2E)

Anti-Pattern 2: "Testing framework code"

Bad:

// Testing Express middleware
test('CORS middleware sets headers', () => {
  // Testing Express, not OUR code
});

// Testing React hook
test('useState updates component', () => {
  // Testing React, not OUR code
});

Good:

  • Trust framework tests (Express/React have thousands of tests)
  • Test OUR business logic that USES framework

Anti-Pattern 3: "Duplicating E2E coverage with Unit tests"

Bad:

// E2E already tests: POST /api/orders → creates order in DB
test('E2E: User can create order', ...);          // E2E
test('Unit: createOrder() inserts to database', ...); // Duplicate!
test('Unit: createOrder() returns order object', ...); // Duplicate!

Good:

// E2E tests full flow
test('E2E: User can create order', ...);

// Unit tests ONLY complex calculation NOT fully exercised by E2E
test('Unit: Bulk discount applied when quantity > 100', ...);
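To make the "Good" case concrete, here is a sketch of what such a unit test might target. The business rule is hypothetical: a 10% discount for orders of more than 100 units, with amounts held in integer cents. The function name `orderTotalCents` and both constants are invented for this example. Plain `throw` checks are used so the snippet runs without a test runner; in a Jest suite the same case table could feed `test.each`.

```javascript
// Hypothetical rule for illustration only: quantity > 100 earns 10% off.
const BULK_THRESHOLD = 100;
const BULK_DISCOUNT_PERCENT = 10;

// Amounts are integer cents so the arithmetic stays exact.
function orderTotalCents(unitPriceCents, quantity) {
  const subtotal = unitPriceCents * quantity;
  if (quantity <= BULK_THRESHOLD) return subtotal;
  return Math.round((subtotal * (100 - BULK_DISCOUNT_PERCENT)) / 100);
}

// Boundary cases around the threshold, which a happy-path E2E test
// is unlikely to exercise.
const boundaryCases = [
  { quantity: 100, expected: 25000 }, // at the threshold: no discount
  { quantity: 101, expected: 22725 }, // just above: 25250 minus 10%
];

for (const { quantity, expected } of boundaryCases) {
  const actual = orderTotalCents(250, quantity);
  if (actual !== expected) {
    throw new Error(`quantity ${quantity}: expected ${expected}, got ${actual}`);
  }
}
```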

Anti-Pattern 4: "Aiming for 80% coverage"

Bad mindset:

  • "We have 75% coverage, need 5 more tests to hit 80%"
  • Writes tests for trivial getters/setters to inflate coverage

Good mindset:

  • "Payment flow is critical (Priority 25) but only has 1 E2E test"
  • "We have 60% coverage but all critical paths tested - DONE"

Anti-Pattern 5: "Testing framework integration"

Bad:

// Testing Express framework behavior
test('Express middleware chain works', () => {
  // Testing Express.js, not OUR code
});

// Testing Prisma client behavior
test('Prisma findMany returns array', () => {
  // Testing Prisma, not OUR code
});

// Testing React hook behavior
test('useState triggers rerender', () => {
  // Testing React, not OUR code
});

Why bad: Frameworks have thousands of tests. Trust the framework, test OUR business logic that USES the framework.

Good:

// Test OUR business logic that uses framework
test('E2E: User can create order', () => {
  // Tests OUR endpoint logic (which happens to use Express + Prisma)
  // But we're validating OUR business rules, not framework behavior
});

Anti-Pattern 6: "Testing database query syntax"

Bad:

// Testing database query execution
test('findByEmail() returns user from database', async () => {
  await prisma.user.findUnique({ where: { email: 'test@example.com' }});
  // Testing Prisma query builder, not OUR logic
});

// Testing SQL JOIN behavior
test('getUserWithOrders() joins tables correctly', () => {
  // Testing PostgreSQL JOIN semantics, not OUR logic
});

Why bad: Database engines have extensive test suites. We're testing PostgreSQL/MySQL, not our code.

Good:

// E2E test already validates query works
test('E2E: User can view order history', () => {
  // Implicitly validates that JOIN query works correctly
  // We test the USER OUTCOME, not the database mechanism
});

// Unit test ONLY for complex query construction logic
test('buildSearchQuery() with multiple filters generates correct WHERE clause', () => {
  // ONLY if we have complex query building logic with business rules
  // NOT testing database execution, testing OUR query builder logic
});
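For illustration, a query builder that would justify its own unit test might look like the sketch below. Everything in it is an assumption: the `products` table, its columns, the filter names, and the PostgreSQL-style `$1` placeholders. The point is that the function is pure string and array construction, so it can be tested without touching a database.

```javascript
// Hypothetical query builder: turns optional filters into a parameterized
// WHERE clause. No database access happens here.
function buildSearchQuery(filters = {}) {
  const clauses = [];
  const values = [];

  // Registers a value and returns its positional placeholder ($1, $2, ...).
  const param = (value) => {
    values.push(value);
    return `$${values.length}`;
  };

  if (filters.name) clauses.push(`name ILIKE ${param(`%${filters.name}%`)}`);
  if (filters.minPrice != null) clauses.push(`price >= ${param(filters.minPrice)}`);
  if (filters.maxPrice != null) clauses.push(`price <= ${param(filters.maxPrice)}`);
  if (filters.inStockOnly) clauses.push('stock > 0');

  const where = clauses.length ? ` WHERE ${clauses.join(' AND ')}` : '';
  return { text: `SELECT id, name, price FROM products${where}`, values };
}
```

A unit test here would assert on the returned `text` and `values` for a combination of filters. Whether the database executes the query correctly remains the job of the E2E test.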

Anti-Pattern 7: "Testing library behavior"

Bad:

// Testing bcrypt library
test('bcrypt hashes password correctly', async () => {
  const hash = await bcrypt.hash('password', 10);
  const valid = await bcrypt.compare('password', hash);
  expect(valid).toBe(true);
  // Testing bcrypt library, not OUR code
});

// Testing jsonwebtoken library
test('JWT token is valid', () => {
  const token = jwt.sign({ userId: 1 }, SECRET);
  const decoded = jwt.verify(token, SECRET);
  // Testing jsonwebtoken library, not OUR code
});

// Testing axios library
test('axios makes HTTP request', async () => {
  await axios.get('https://api.example.com');
  // Testing axios library, not OUR code
});

Why bad: Libraries are already tested by their maintainers. We're duplicating their test suite.

Good:

// E2E test validates full authentication flow
test('E2E: User can login and access protected endpoint', () => {
  // Implicitly validates that bcrypt comparison works
  // AND that JWT token generation/validation works
  // But we test the USER FLOW, not library internals
});

// Unit test ONLY for custom password rules (OUR business logic)
test('validatePasswordStrength() requires 12+ chars with special symbols', () => {
  // Testing OUR custom password policy, not bcrypt itself
});
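A sketch of the kind of custom policy that unit test would exercise. The 12-character minimum and the special-symbol requirement come from the test name above. The definition of "special symbol" (any character that is not an ASCII letter or digit), the error codes, and the return shape are assumptions made for this example.

```javascript
const MIN_LENGTH = 12;
// Assumption: anything that is not an ASCII letter or digit counts as special.
const SPECIAL_SYMBOL = /[^A-Za-z0-9]/;

// Returns which policy rules a password violates; no hashing is involved.
function validatePasswordStrength(password) {
  const errors = [];
  const text = typeof password === 'string' ? password : '';

  if (text.length < MIN_LENGTH) errors.push('too_short');
  if (!SPECIAL_SYMBOL.test(text)) errors.push('missing_special_symbol');

  return { valid: errors.length === 0, errors };
}
```

Because the function only inspects the string, its tests stay fast and never call bcrypt, which keeps them on the "OUR logic" side of the line.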

When to Break the Rules

Scenario 1: Regulatory Compliance

Financial/Healthcare applications:

  • May need >28 tests for audit trail
  • Document WHY each test exists (regulation reference)

Scenario 2: Bug-Prone Legacy Code

If Story modifies legacy code with history of bugs:

  • Increase Unit test limit to 20
  • Add characterization tests

Scenario 3: Public API

If Story creates API consumed by 3rd parties:

  • Increase Integration test limit to 12
  • Test all error codes (400, 401, 403, 404, 429, 500)

Scenario 4: Security-Critical Features

Authentication, authorization, encryption:

  • All scenarios Priority ≥15
  • May reach 28 test maximum legitimately
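The two numeric exceptions above (Unit limit 20 for bug-prone legacy code, Integration limit 12 for a public API) could be captured as overrides on the standard maximums. This is a sketch under assumptions: the scenario keys `legacy` and `publicApi`, the function name `limitsFor`, and the choice to derive the total as the sum of the per-type maximums are all invented here. The regulatory and security scenarios are left out because the guide gives no numeric limits for them.

```javascript
// Standard per-type maximums.
const BASE_LIMITS = { e2e: 5, integration: 8, unit: 15 };

// Hypothetical scenario keys mapped to the raised limits named above.
const OVERRIDES = {
  legacy: { unit: 20 },
  publicApi: { integration: 12 },
};

function limitsFor(scenarios = []) {
  const limits = { ...BASE_LIMITS };

  for (const scenario of scenarios) {
    const override = OVERRIDES[scenario];
    if (!override) throw new Error(`Unknown scenario: ${scenario}`);
    for (const [type, max] of Object.entries(override)) {
      // An override may only raise a limit, never lower it.
      limits[type] = Math.max(limits[type], max);
    }
  }

  // Assumption: the total ceiling is the sum of the per-type maximums.
  return { ...limits, total: limits.e2e + limits.integration + limits.unit };
}
```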

Quick Reference

Decision Flowchart (Minimum Viable Testing)

1. Start with 2 baseline E2E tests (positive + negative) - ALWAYS
   ↓
2. For test #3 and beyond, calculate Risk Priority (Impact × Probability)
   ↓
3. Priority ≥15?
   NO (≤14) → SKIP (manual testing sufficient)
   YES → Proceed to Step 4
   ↓
4. Critical Justification Check (ALL must be YES):
   ❓ Tests OUR business logic? (not framework/library/database)
   ❓ Not already covered by 2 baseline E2E?
   ❓ Unique business value?
   ANY NO? → SKIP
   ALL YES? → Proceed to Step 5
   ↓
5. Select Test Type:
   - User flow? → E2E #3-5 (with justification)
   - E2E doesn't cover interaction? → Integration 0-8 (with justification)
   - OUR complex algorithm? → Unit 0-15 (with justification)
   ↓
6. Verify total ≤7 (realistic goal) or ≤28 (hard limit)
   > 7 tests? → Auto-trim by Priority, keep 2 baseline E2E + top 5 Priority

Red Flags (Stop and Reconsider)

  • "I need to test every branch for coverage" → Focus on business risk, not coverage
  • "This E2E already tests it, but I'll add unit test anyway" → Duplication
  • "Need to test Express middleware behavior" → Testing framework, not OUR code
  • "Need to test Prisma query execution" → Testing database/ORM, not OUR code
  • "Need to test bcrypt hashing" → Testing library, not OUR code
  • "Story has 45 tests" → Exceeds limit, prioritize and trim
  • "Story has 15 tests but includes Prisma/bcrypt/Express tests" → Testing framework/library, remove
  • "Testing getter/setter" → Trivial code, E2E covers it
  • "Need more tests to hit 10 minimum" → Minimum is 2, not 10!

Green Lights (Good Test)

  • "2 E2E tests: positive + negative for main endpoint" → Baseline (ALWAYS)
  • "Tax calculation with country-specific rules, Priority 25" → Unit test (OUR complex logic)
  • "User must complete checkout, Priority 20" → E2E test (user value)
  • "Story has 3 tests: 2 E2E + 1 Unit for OUR tax logic" → Minimum viable!
  • "Story has 5 tests, all test OUR business logic, all Priority ≥15" → Justified and minimal
  • "Skipped 8 scenarios - all were framework/library behavior" → Good filtering!

References

  • Kent Beck, "Test Desiderata" (2018)
  • Ham Vocke, "The Practical Test Pyramid" (martinfowler.com, 2018)
  • Kent C. Dodds, "The Testing Trophy" (2020)
  • Google Testing Blog, "Code Coverage Best Practices" (2020)
  • Netflix Tech Blog, "Testing Strategy at Scale" (2021)
  • Michael Feathers, "Working Effectively with Legacy Code" (2004)
  • OWASP Web Security Testing Guide v4.2 (2020)

Version History

| Version | Date | Changes |
|---------|------|---------|
| 1.0 | 2025-10-31 | Initial Risk-Based Testing framework to replace Test Pyramid (10-28 tests per Story) |
| 2.0.0 | 2025-11-11 | Minimum Viable Testing philosophy: Start with 2 E2E baseline, realistic goal 2-7 tests. Critical justification required for each test beyond baseline. New anti-patterns (5-7) for framework/library/database testing. Updated examples (Login 6→3, Search 7→2, Payment 13→5) |

Version: 2.0.0
Last Updated: 2025-11-11