Risk-Based Testing Guide
Purpose
This guide replaces the traditional Test Pyramid (70/20/10 ratio) with a Value-Based Testing Framework that prioritizes business risk and practical test limits. The goal is to write tests that matter, not to chase coverage metrics.
Problem solved: The traditional Test Pyramid approach generates excessive tests (~200 per Story) by mechanically testing every conditional branch. This creates a maintenance burden without proportional business value.
Solution: Risk-Based Testing with clear prioritization criteria and enforced limits (a realistic goal of 2-7 tests and a hard ceiling of 28 tests per Story).
Core Philosophy
Guillermo Rauch's Principle (popularized by Kent C. Dodds)
"Write tests. Not too many. Mostly integration."
Key Insights
- Test business value, not code coverage - 80% coverage means nothing if the critical payment flow isn't tested
- Manual testing has value - Not every scenario needs automated test duplication
- Each test has maintenance cost - More tests = more refactoring overhead
- Integration tests catch real bugs - Unit tests catch edge cases in isolation
- E2E tests validate user value - Only E2E proves the feature actually works end-to-end
Minimum Viable Testing Philosophy
Start Minimal, Justify Additions
Baseline for every Story:
- 2 E2E tests per endpoint: Positive scenario (happy path) + Negative scenario (critical error)
- 0 Integration tests (E2E covers full stack by default)
- 0 Unit tests (E2E covers simple logic by default)
Realistic goal: 2-7 tests per Story (not 10-28!)
Additional tests ONLY with critical justification:
- Test #3 and beyond: Each requires a documented answer to "Why does this test OUR business logic (not framework/library/database)?"
- Priority ≥15 required for all additional tests
- Auto-trim to 7 tests if the plan exceeds the realistic goal
Critical Justification Questions
Before adding ANY test beyond the 2 baseline E2E tests, answer:
- ❓ Does this test OUR business logic?
  - ✅ YES: Tax calculation with country-specific rules (OUR algorithm)
  - ❌ NO: bcrypt hashing (library behavior)
  - ❌ NO: Prisma query execution (framework behavior)
  - ❌ NO: PostgreSQL LIKE operator (database behavior)
- ❓ Is this already covered by the 2 baseline E2E tests?
  - ✅ NO: E2E doesn't exercise all branches of a complex calculation
  - ❌ YES: E2E test already validates the full flow end-to-end
- ❓ Priority ≥15?
  - ✅ YES: Money, security, data integrity
  - ❌ NO: Skip, manual testing sufficient
- ❓ Unique business value?
  - ✅ YES: Tests a different scenario than existing tests
  - ❌ NO: Duplicate coverage
If ANY answer lands on ❌ → SKIP this test
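The four questions work as an all-or-nothing gate. A minimal sketch of that gate, assuming each candidate test is written down as a plain object; the field names (`testsOurLogic`, `coveredByBaselineE2E`, `priority`, `uniqueValue`) and the sample candidates are hypothetical, not part of any tool:

```javascript
// Hypothetical shape of a candidate test beyond the 2 baseline E2E tests.
// Field names are illustrative only.
function passesJustification(candidate) {
  // Each check mirrors one question; a ❌ answer on any of them means SKIP.
  const checks = [
    candidate.testsOurLogic === true,         // Does this test OUR business logic?
    candidate.coveredByBaselineE2E === false, // Not already covered by baseline E2E?
    candidate.priority >= 15,                 // Priority ≥15?
    candidate.uniqueValue === true,           // Unique business value?
  ];
  return checks.every(Boolean);
}

const candidates = [
  { name: "Tax rules per country", testsOurLogic: true, coveredByBaselineE2E: false, priority: 25, uniqueValue: true },
  { name: "bcrypt hashes password", testsOurLogic: false, coveredByBaselineE2E: false, priority: 20, uniqueValue: true },
  { name: "Login happy path again", testsOurLogic: true, coveredByBaselineE2E: true, priority: 16, uniqueValue: false },
];

for (const c of candidates) {
  // Prints KEEP for the tax rules, SKIP for the other two.
  console.log(`${c.name}: ${passesJustification(c) ? "KEEP" : "SKIP"}`);
}
```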
Risk Priority Matrix
Calculation Formula
Priority = Business Impact (1-5) × Probability of Failure (1-5)
Result ranges:
- Priority ≥15 (15-25): MUST test - critical scenarios
- Priority 9-14: SHOULD test if not already covered
- Priority ≤8 (1-8): SKIP - manual testing sufficient
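The formula and the three bands translate directly into code. A minimal sketch, assuming scores are integers from 1 to 5; `riskPriority` and `classifyPriority` are illustrative names, not an existing library:

```javascript
// Priority = Business Impact (1-5) × Probability of Failure (1-5)
function riskPriority(impact, probability) {
  for (const [label, score] of [["impact", impact], ["probability", probability]]) {
    if (!Number.isInteger(score) || score < 1 || score > 5) {
      throw new RangeError(`${label} must be an integer from 1 to 5, got ${score}`);
    }
  }
  return impact * probability;
}

// Map a priority score onto the bands above: ≥15 MUST, 9-14 SHOULD, ≤8 SKIP.
function classifyPriority(priority) {
  if (priority >= 15) return "MUST";
  if (priority >= 9) return "SHOULD";
  return "SKIP";
}

// Example: money at stake (impact 5) in a new, untested algorithm (probability 5).
const p = riskPriority(5, 5);
console.log(p, classifyPriority(p)); // 25 MUST
```

Feeding every impact/probability pair through these two functions reproduces each cell of the Priority Matrix Table.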
Business Impact Scoring (1-5)
| Score | Impact Level | Examples |
|---|---|---|
| 5 | Critical | Money loss, security breach, data corruption, legal liability |
| 4 | High | Core business flow breaks (cannot complete purchase, cannot login) |
| 3 | Medium | Feature partially broken (search works but pagination fails) |
| 2 | Low | Minor UX issue (button disabled state wrong, tooltip missing) |
| 1 | Trivial | Cosmetic bug (color slightly off, spacing issue) |
Probability of Failure Scoring (1-5)
| Score | Probability | Indicators |
|---|---|---|
| 5 | Very High (>50%) | Complex algorithm, external API, new technology, no existing tests |
| 4 | High (25-50%) | Multiple dependencies, concurrency, state management |
| 3 | Medium (10-25%) | Standard CRUD, framework defaults, well-tested patterns |
| 2 | Low (5-10%) | Simple logic, established library, copy-paste from working code |
| 1 | Very Low (<5%) | Trivial assignment, framework-generated code |
Priority Matrix Table
| | Probability 1 | Probability 2 | Probability 3 | Probability 4 | Probability 5 |
|---|---|---|---|---|---|
| Impact 5 | 5 (SKIP) | 10 (SHOULD) | 15 (MUST) | 20 (MUST) | 25 (MUST) |
| Impact 4 | 4 (SKIP) | 8 (SKIP) | 12 (SHOULD) | 16 (MUST) | 20 (MUST) |
| Impact 3 | 3 (SKIP) | 6 (SKIP) | 9 (SHOULD) | 12 (SHOULD) | 15 (MUST) |
| Impact 2 | 2 (SKIP) | 4 (SKIP) | 6 (SKIP) | 8 (SKIP) | 10 (SHOULD) |
| Impact 1 | 1 (SKIP) | 2 (SKIP) | 3 (SKIP) | 4 (SKIP) | 5 (SKIP) |
Test Type Decision Tree
Step 1: Calculate Risk Priority
Use Risk Priority Matrix above.
Step 2: Select Test Type
IF Priority ≥15 → Proceed to Step 3
ELSE IF Priority 9-14 → Check Anti-Duplication (Step 4), then Step 3
ELSE Priority ≤8 → SKIP (manual testing sufficient)
Step 3: Choose Test Level
E2E Test (2-5 max per Story):
- BASELINE (ALWAYS): 2 E2E tests per endpoint
  - Test 1: Positive scenario (happy path validating main AC)
  - Test 2: Negative scenario (critical error handling)
- ADDITIONAL (3-5): ONLY if Priority ≥15 AND justified
  - Critical edge case from manual testing
  - Second endpoint (if Story implements multiple endpoints)
- Examples:
  - User registers → receives email → confirms → can login
  - User adds product → proceeds to checkout → pays → sees confirmation
  - User uploads file → sees progress → file appears in list
Integration Test (0-8 max per Story):
- DEFAULT: 0 Integration tests (2 E2E tests cover full stack by default)
- ADD ONLY if: E2E doesn't cover interaction completely AND Priority ≥15 AND justified
- Examples:
  - Transaction rollback on error (E2E tests happy path only)
  - Concurrent request handling (E2E tests single request)
  - External API error scenarios (500, timeout) with Priority ≥15
- MANDATORY SKIP:
  - ❌ Simple pass-through calls (E2E already validates end-to-end)
  - ❌ Testing framework integrations (Prisma client, TypeORM repository, Express app)
  - ❌ Testing database query execution (database engine behavior)
Unit Test (0-15 max per Story):
- DEFAULT: 0 Unit tests (2 E2E tests cover simple logic by default)
- ADD ONLY for complex business logic with Priority ≥15:
  - Financial calculations (tax, discount, currency conversion) WITH COMPLEX RULES
  - Security algorithms (password strength, permission matrix) WITH CUSTOM LOGIC
  - Complex business algorithms (scoring, matching, ranking) WITH MULTIPLE FACTORS
- MANDATORY SKIP - DO NOT create unit tests for:
  - ❌ Simple CRUD operations (already covered by E2E)
  - ❌ Framework code (Express middleware, React hooks, FastAPI dependencies)
  - ❌ Library functions (bcrypt hashing, jsonwebtoken signing, axios requests)
  - ❌ Database queries (Prisma findMany, TypeORM query builder, SQL joins)
  - ❌ Getters/setters or simple property access
  - ❌ Trivial conditionals (`if (user) return user.name`, `status === 'active'`)
  - ❌ Pass-through functions (wrappers without logic)
  - ❌ Performance/load testing (benchmarks, stress tests, scalability validation)
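Step 3 can be read as an ordered set of questions plus a per-type ceiling. A rough sketch of that ordering, assuming a hypothetical candidate object whose fields are invented for illustration; checking user flow first, then an uncovered interaction, then complex logic is one reasonable reading of the guide, not a rule it states:

```javascript
// Per-Story ceilings from Step 3 (maximum allowed, not targets).
const CEILINGS = { E2E: 5, Integration: 8, Unit: 15 };

// Hypothetical candidate fields, invented for illustration:
//   priority               - Impact × Probability score (1-25)
//   isFrameworkOrLibrary   - exercises Express, Prisma, bcrypt, etc. rather than OUR code
//   isUserFlow             - validates a complete user journey
//   e2eCoversInteraction   - the baseline E2E tests already exercise this interaction
//   isComplexBusinessLogic - OUR financial, security, or multi-factor algorithm
function chooseTestLevel(candidate, currentCounts) {
  // Anything beyond the baseline needs Priority ≥15 and must test OUR code.
  if (candidate.priority < 15 || candidate.isFrameworkOrLibrary) return null;

  let level = null;
  if (candidate.isUserFlow) level = "E2E";
  else if (!candidate.e2eCoversInteraction) level = "Integration";
  else if (candidate.isComplexBusinessLogic) level = "Unit";
  if (level === null) return null; // nothing the baseline E2E tests miss

  // A level that has reached its ceiling gets no more tests.
  return (currentCounts[level] ?? 0) < CEILINGS[level] ? level : null;
}

// Usage: a Story that so far has only its 2 baseline E2E tests.
const counts = { E2E: 2, Integration: 0, Unit: 0 };
console.log(chooseTestLevel({ priority: 20, e2eCoversInteraction: false }, counts)); // Integration
console.log(chooseTestLevel({ priority: 25, e2eCoversInteraction: true, isComplexBusinessLogic: true }, counts)); // Unit
```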
Step 4: Anti-Duplication Check
Before writing ANY test, verify:
- Is this scenario already covered by E2E?
  - E2E tests payment flow → SKIP unit test for `calculateTotal()`
  - E2E tests login → SKIP unit test for `validateEmail()`
- Is this testing framework code?
  - Testing Express `app.use()` → SKIP
  - Testing React `useState` → SKIP
  - Testing Prisma `findMany()` → SKIP
- Does this add unique business value?
  - E2E tests happy path → Unit test for edge case (negative price) → KEEP
  - Integration test already validates DB transaction → SKIP duplicate unit test
- Is this a one-line function?
  - `getFullName() { return firstName + lastName }` → SKIP (E2E covers it)
Test Limits Per Story
Enforced Limits with Realistic Goals
| Test Type | Minimum | Realistic Goal | Maximum | Purpose |
|---|---|---|---|---|
| E2E | 2 | 2 | 5 | Baseline: positive + negative per endpoint |
| Integration | 0 | 0-2 | 8 | ONLY if E2E doesn't cover interaction |
| Unit | 0 | 0-3 | 15 | ONLY complex business logic (financial/security/algorithms) |
| TOTAL | 2 | 2-7 | 28 | Start minimal, add only with justification |
Key Change: Test limits are now CEILINGS (maximum allowed), NOT targets to fill. Start with 2 E2E tests, add more only with critical justification.
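Because the limits are ceilings, a plan can be checked mechanically before any test is written. A minimal sketch, assuming a plan is a hypothetical object of per-type counts such as `{ E2E: 2, Unit: 1 }`:

```javascript
// Per-type bounds from the "Enforced Limits" table (minimum and hard ceiling).
const LIMITS = {
  E2E: { min: 2, max: 5 },
  Integration: { min: 0, max: 8 },
  Unit: { min: 0, max: 15 },
};
const REALISTIC_TOTAL = 7; // upper end of the 2-7 realistic goal
const MAX_TOTAL = 28; // hard limit

// Returns human-readable problems; an empty array means the plan is within limits.
function checkPlan(plan) {
  const problems = [];
  let total = 0;
  for (const [type, { min, max }] of Object.entries(LIMITS)) {
    const n = plan[type] ?? 0;
    total += n;
    if (n < min) problems.push(`${type}: ${n} is below the minimum of ${min}`);
    if (n > max) problems.push(`${type}: ${n} exceeds the ceiling of ${max}`);
  }
  if (total > MAX_TOTAL) {
    problems.push(`Total: ${total} exceeds the hard limit of ${MAX_TOTAL}`);
  } else if (total > REALISTIC_TOTAL) {
    problems.push(`Total: ${total} is above the realistic goal of ${REALISTIC_TOTAL}; justify each extra test or trim`);
  }
  return problems;
}

console.log(checkPlan({ E2E: 2, Unit: 1 })); // []
console.log(checkPlan({ E2E: 1, Integration: 10 }));
```

The 2-7 goal and the 28-test hard limit get different messages, since only the latter is a hard stop.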
Rationale for Limits
Why maximum 5 E2E?
- E2E tests are slow (10-60 seconds each)
- Each Story typically has 2-4 Acceptance Criteria
- 1-2 E2E per AC is sufficient
- Edge cases covered by Integration/Unit tests
Why maximum 8 Integration?
- Integration tests validate layer interactions
- Typical Story has 3-5 integration points (API → Service → DB)
- 1-2 tests per integration point + error scenarios
Why maximum 15 Unit?
- Only test complex business logic
- Typical Story has 2-4 complex functions
- 3-5 tests per function (happy path + edge cases)
Why total maximum 28?
- Industry data: Stories with >30 tests rarely have proportional bug prevention
- Maintenance cost grows quadratically beyond this point
- Focus on quality over quantity
Common Over-Testing Anti-Patterns
Anti-Pattern 1: "Every if/else needs a test"
Bad:
```javascript
// Function with 10 if/else branches
function processOrder(order) {
  if (!order) return null; // Test 1
  if (!order.items) return null; // Test 2
  if (order.items.length === 0) return null; // Test 3
  // ... 7 more conditionals
}
```
Problem: 10 unit tests for trivial validation logic already covered by E2E test that calls processOrder().
Good:
- 1 E2E test: User submits valid order → success
- 1 E2E test: User submits invalid order → error message
- 1 Unit test: Complex tax calculation inside `processOrder()` (if exists)
Total: 3 tests instead of 12
Anti-Pattern 2: "Testing framework code"
Bad:
```javascript
// Testing Express middleware
test('CORS middleware sets headers', () => {
  // Testing Express, not OUR code
});
// Testing React hook
test('useState updates component', () => {
  // Testing React, not OUR code
});
```
Good:
- Trust framework tests (Express/React have thousands of tests)
- Test OUR business logic that USES framework
Anti-Pattern 3: "Duplicating E2E coverage with Unit tests"
Bad:
```javascript
// E2E already tests: POST /api/orders → creates order in DB
test('E2E: User can create order', () => { /* ... */ }); // E2E
test('Unit: createOrder() inserts to database', () => { /* ... */ }); // Duplicate!
test('Unit: createOrder() returns order object', () => { /* ... */ }); // Duplicate!
```
Good:
```javascript
// E2E tests full flow
test('E2E: User can create order', () => { /* ... */ });
// Unit tests ONLY complex calculation NOT fully exercised by E2E
test('Unit: Bulk discount applied when quantity > 100', () => { /* ... */ });
```
Anti-Pattern 4: "Aiming for 80% coverage"
Bad mindset:
- "We have 75% coverage, need 5 more tests to hit 80%"
- Writes tests for trivial getters/setters to inflate coverage
Good mindset:
- "Payment flow is critical (Priority 25) but only has 1 E2E test"
- "We have 60% coverage but all critical paths tested - DONE"
Anti-Pattern 5: "Testing framework integration"
Bad:
```javascript
// Testing Express framework behavior
test('Express middleware chain works', () => {
  // Testing Express.js, not OUR code
});
// Testing Prisma client behavior
test('Prisma findMany returns array', () => {
  // Testing Prisma, not OUR code
});
// Testing React hook behavior
test('useState triggers rerender', () => {
  // Testing React, not OUR code
});
```
Why bad: Frameworks have thousands of tests. Trust the framework, test OUR business logic that USES the framework.
Good:
```javascript
// Test OUR business logic that uses framework
test('E2E: User can create order', () => {
  // Tests OUR endpoint logic (which happens to use Express + Prisma)
  // But we're validating OUR business rules, not framework behavior
});
```
Anti-Pattern 6: "Testing database query syntax"
Bad:
```javascript
const { PrismaClient } = require('@prisma/client');
const prisma = new PrismaClient();

// Testing database query execution
test('findByEmail() returns user from database', async () => {
  await prisma.user.findUnique({ where: { email: 'test@example.com' } });
  // Testing Prisma query builder, not OUR logic
});
// Testing SQL JOIN behavior
test('getUserWithOrders() joins tables correctly', async () => {
  // Testing PostgreSQL JOIN semantics, not OUR logic
});
```
Why bad: Database engines have extensive test suites. We're testing PostgreSQL/MySQL, not our code.
Good:
```javascript
// E2E test already validates query works
test('E2E: User can view order history', () => {
  // Implicitly validates that JOIN query works correctly
  // We test the USER OUTCOME, not the database mechanism
});
// Unit test ONLY for complex query construction logic
test('buildSearchQuery() with multiple filters generates correct WHERE clause', () => {
  // ONLY if we have complex query building logic with business rules
  // NOT testing database execution, testing OUR query builder logic
});
```
Anti-Pattern 7: "Testing library behavior"
Bad:
```javascript
const bcrypt = require('bcrypt');
const jwt = require('jsonwebtoken');
const axios = require('axios');
const SECRET = 'test-secret'; // placeholder value for illustration

// Testing bcrypt library
test('bcrypt hashes password correctly', async () => {
  const hash = await bcrypt.hash('password', 10);
  const valid = await bcrypt.compare('password', hash);
  expect(valid).toBe(true);
  // Testing bcrypt library, not OUR code
});
// Testing jsonwebtoken library
test('JWT token is valid', () => {
  const token = jwt.sign({ userId: 1 }, SECRET);
  const decoded = jwt.verify(token, SECRET);
  expect(decoded.userId).toBe(1);
  // Testing jsonwebtoken library, not OUR code
});
// Testing axios library
test('axios makes HTTP request', async () => {
  await axios.get('https://api.example.com');
  // Testing axios library, not OUR code
});
```
Why bad: Libraries are already tested by their maintainers. We're duplicating their test suite.
Good:
```javascript
// E2E test validates full authentication flow
test('E2E: User can login and access protected endpoint', () => {
  // Implicitly validates that bcrypt comparison works
  // AND that JWT token generation/validation works
  // But we test the USER FLOW, not library internals
});
// Unit test ONLY for custom password rules (OUR business logic)
test('validatePasswordStrength() requires 12+ chars with special symbols', () => {
  // Testing OUR custom password policy, not bcrypt itself
});
```
When to Break the Rules
Scenario 1: Regulatory Compliance
Financial/Healthcare applications:
- May need >28 tests for audit trail
- Document WHY each test exists (regulation reference)
Scenario 2: Bug-Prone Legacy Code
If Story modifies legacy code with history of bugs:
- Increase Unit test limit to 20
- Add characterization tests
Scenario 3: Public API
If Story creates API consumed by 3rd parties:
- Increase Integration test limit to 12
- Test all error codes (400, 401, 403, 404, 429, 500)
Scenario 4: Security-Critical Features
Authentication, authorization, encryption:
- All scenarios Priority ≥15
- May reach 28 test maximum legitimately
Quick Reference
Decision Flowchart (Minimum Viable Testing)
1. Start with 2 baseline E2E tests (positive + negative) - ALWAYS
↓
2. For test #3 and beyond, calculate Risk Priority (Impact × Probability)
↓
3. Priority ≥15?
NO (≤14) → SKIP (manual testing sufficient)
YES → Proceed to Step 4
↓
4. Critical Justification Check (ALL must be YES):
❓ Tests OUR business logic? (not framework/library/database)
❓ Not already covered by 2 baseline E2E?
❓ Unique business value?
ANY NO? → SKIP
ALL YES? → Proceed to Step 5
↓
5. Select Test Type:
- User flow? → E2E #3-5 (with justification)
- E2E doesn't cover interaction? → Integration 0-8 (with justification)
- Complex OUR algorithm? → Unit 0-15 (with justification)
↓
6. Verify total ≤7 (realistic goal) or ≤28 (hard limit)
> 7 tests? → Auto-trim by Priority, keep 2 baseline E2E + top 5 Priority
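Step 6's auto-trim can be sketched as a sort over the non-baseline tests. The plan entries below are hypothetical, and ties are broken by the original order (JavaScript's `Array.prototype.sort` is stable), which is an assumption since the guide does not say how to break ties:

```javascript
const REALISTIC_GOAL = 7; // 2 baseline E2E + up to 5 more

// Keeps every baseline test, then fills the remaining slots with the highest-priority extras.
function autoTrim(plan, goal = REALISTIC_GOAL) {
  const baseline = plan.filter((t) => t.baseline);
  const extras = plan
    .filter((t) => !t.baseline)
    .sort((a, b) => b.priority - a.priority); // highest priority first; ties keep input order
  const slots = Math.max(0, goal - baseline.length);
  return { kept: [...baseline, ...extras.slice(0, slots)], trimmed: extras.slice(slots) };
}

// Hypothetical plan for a checkout Story: 2 baseline E2E tests plus 6 justified extras.
const plan = [
  { name: "E2E: checkout succeeds", priority: 25, baseline: true },
  { name: "E2E: declined card shows error", priority: 20, baseline: true },
  { name: "Unit: country tax rules", priority: 25 },
  { name: "Integration: rollback on payment failure", priority: 20 },
  { name: "Unit: bulk discount", priority: 20 },
  { name: "Integration: concurrent checkout", priority: 16 },
  { name: "E2E: order history endpoint", priority: 15 },
  { name: "Unit: currency rounding", priority: 15 },
];

const { kept, trimmed } = autoTrim(plan);
console.log(kept.length); // 7
console.log(trimmed.map((t) => t.name)); // [ 'Unit: currency rounding' ]
```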
Red Flags (Stop and Reconsider)
- ❌ "I need to test every branch for coverage" → Focus on business risk, not coverage
- ❌ "This E2E already tests it, but I'll add unit test anyway" → Duplication
- ❌ "Need to test Express middleware behavior" → Testing framework, not OUR code
- ❌ "Need to test Prisma query execution" → Testing database/ORM, not OUR code
- ❌ "Need to test bcrypt hashing" → Testing library, not OUR code
- ❌ "Story has 45 tests" → Exceeds limit, prioritize and trim
- ❌ "Story has 15 tests but includes Prisma/bcrypt/Express tests" → Testing framework/library, remove
- ❌ "Testing getter/setter" → Trivial code, E2E covers it
- ❌ "Need more tests to hit 10 minimum" → Minimum is 2, not 10!
Green Lights (Good Test)
- ✅ "2 E2E tests: positive + negative for main endpoint" → Baseline (ALWAYS)
- ✅ "Tax calculation with country-specific rules, Priority 25" → Unit test (OUR complex logic)
- ✅ "User must complete checkout, Priority 20" → E2E test (user value)
- ✅ "Story has 3 tests: 2 E2E + 1 Unit for OUR tax logic" → Minimum viable!
- ✅ "Story has 5 tests, all test OUR business logic, all Priority ≥15" → Justified and minimal
- ✅ "Skipped 8 scenarios - all were framework/library behavior" → Good filtering!
References
- Kent Beck, "Test Desiderata" (2018)
- Ham Vocke, "The Practical Test Pyramid", martinfowler.com (2018)
- Kent C. Dodds, "The Testing Trophy" (2020)
- Google Testing Blog, "Code Coverage Best Practices" (2020)
- Netflix Tech Blog, "Testing Strategy at Scale" (2021)
- Michael Feathers, "Working Effectively with Legacy Code" (2004)
- OWASP Testing Guide v4.2 (2023)
Version History
| Version | Date | Changes |
|---|---|---|
| 1.0 | 2025-10-31 | Initial Risk-Based Testing framework to replace Test Pyramid (10-28 tests per Story) |
| 2.0.0 | 2025-11-11 | Minimum Viable Testing philosophy: Start with 2 E2E baseline, realistic goal 2-7 tests. Critical justification required for each test beyond baseline. New anti-patterns (5-7) for framework/library/database testing. Updated examples (Login 6→3, Search 7→2, Payment 13→5) |
Version: 2.0.0 Last Updated: 2025-11-11