Testing Strategy
Universal testing philosophy and strategy for modern software projects: principles, organization, and best practices.
SCOPE: Testing philosophy, risk-based strategy, test organization, isolation patterns, what to test. NOT IN SCOPE: Project structure, framework-specific patterns, CI/CD configuration, test tooling setup.
Quick Navigation
- Test Organization: tests/README.md - Directory structure, Story-Level Pattern, running tests
- Test Inventory: tests/unit/REGISTRY.md, tests/integration/REGISTRY.md, tests/e2e/REGISTRY.md
Core Philosophy
Test YOUR Code, Not Frameworks
Focus testing effort on YOUR business logic and integration usage. Do not retest database constraints, ORM internals, framework validation, or third-party library mechanics.
Rule of thumb: If deleting your code wouldn't fail the test, you're testing someone else's code.
Examples
| Verdict | Test Description | Rationale |
|---|---|---|
| ✅ GOOD | Custom validation logic raises exception for invalid input | Tests YOUR validation rules |
| ✅ GOOD | Repository query returns filtered results based on business criteria | Tests YOUR query construction |
| ✅ GOOD | API endpoint returns correct HTTP status for error scenarios | Tests YOUR error handling |
| ❌ BAD | Database enforces UNIQUE constraint on email column | Tests database, not your code |
| ❌ BAD | ORM model has correct column types and lengths | Tests ORM configuration, not logic |
| ❌ BAD | Framework validates request body matches schema | Tests framework validation |
Risk-Based Testing Strategy
Priority Matrix
Score each scenario as Business Impact (1-5) × Probability (1-5) and automate only high-value scenarios; a small scoring sketch follows the table.
| Priority Score | Action | Example Scenarios |
|---|---|---|
| ≥15 | MUST test | Payment processing, authentication, data loss scenarios |
| 10-14 | Consider testing | Edge cases with moderate impact |
| <10 | Skip automated tests | Low-probability edge cases, framework behavior |
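As a sketch, the matrix can be applied mechanically; the helper below simply encodes the Impact × Probability product and the action thresholds above (function names are illustrative, not part of any required tooling):

```python
def priority(business_impact: int, probability: int) -> int:
    """Priority score = Business Impact (1-5) x Probability (1-5)."""
    return business_impact * probability

def test_decision(score: int) -> str:
    """Map a priority score onto the actions from the matrix above."""
    if score >= 15:
        return "MUST test"
    if score >= 10:
        return "Consider testing"
    return "Skip automated tests"

# Payment failure: impact 5, probability 4 -> 20 -> MUST test
print(test_decision(priority(5, 4)))
# Obscure admin-only edge case: impact 2, probability 2 -> 4 -> skip
print(test_decision(priority(2, 2)))
```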
Test Caps (per Story)
Enforce caps to prevent test bloat:
- E2E: 2-5 tests
- Integration: 3-8 tests
- Unit: 5-15 tests
- Total: 10-28 tests per Story
Key principles:
- No minimum limits - Can be 0 tests if no Priority ≥15 scenarios exist
- No test pyramids - Test distribution based on risk, not arbitrary ratios
- Every test must add value - Each test should validate unique Priority ≥15 scenario
Exception: ML/GPU/Hardware-dependent workloads may favor more E2E (5-10), fewer Integration (2-5), minimal Unit (1-3) because behavior is hardware-dependent and mocks lack fidelity. Same 10-28 total cap applies.
Story-Level Testing Pattern
When to Write Tests
Consolidate ALL tests into the Story's final test task, written AFTER implementation and manual verification.
| Task Type | Contains Tests? | Rationale |
|---|---|---|
| Implementation Tasks | ❌ NO tests | Focus on implementation only |
| Final Test Task | ✅ ALL tests | Complete Story coverage after manual verification |
Benefits
- Complete context - Tests written when all code implemented
- No duplication - E2E covers integration paths, no need to retest same code
- Better prioritization - Manual testing identifies Priority ≥15 scenarios before automation
- Atomic delivery - Story delivers working code + comprehensive tests together
Anti-Pattern Example
| ❌ Wrong Approach | ✅ Correct Approach |
|---|---|
| Task 1: Implement feature X + write unit tests<br>Task 2: Update integration + write integration tests<br>Task 3: Add logging + write E2E tests | Task 1: Implement feature X<br>Task 2: Update integration points<br>Task 3: Add logging<br>Task 4 (Final): Write ALL tests (2 E2E, 3 Integration, 8 Unit) |
| Result: Tests scattered, duplication, incomplete coverage | Result: Tests consolidated, no duplication, complete coverage |
Test Organization
Directory Structure
tests/
├── e2e/ # End-to-end tests (full system, real services)
│ ├── test_user_journey.ext
│ └── REGISTRY.md # E2E test inventory
├── integration/ # Integration tests (multiple components, real dependencies)
│ ├── api/
│ ├── services/
│ ├── db/
│ └── REGISTRY.md # Integration test inventory
├── unit/ # Unit tests (single component, mocked dependencies)
│ ├── api/
│ ├── services/
│ ├── db/
│ └── REGISTRY.md # Unit test inventory
└── README.md # Test documentation
Test Inventory (REGISTRY.md)
Each test category has REGISTRY.md with detailed test descriptions:
Purpose:
- Document what each test validates
- Track test counts per Epic/Story
- Provide navigation for test maintenance
Format example:
# E2E Test Registry
## Quality Estimation (Epic 6 - API-69)
**File:** tests/e2e/test_quality_estimation.ext
**Tests (4):**
1. **evaluate_endpoint_batch_splitting** - MetricX batch splitting (segments >128 split into batches)
2. **evaluate_endpoint_gpu_integration** - MetricX-24 GPU service integration
3. **evaluate_endpoint_error_handling** - Service timeout handling (503 status)
4. **evaluate_endpoint_response_format** - Response schema validation
**Total:** 4 E2E tests | **Coverage:** 100% Priority ≥15 scenarios
Test Levels
E2E (End-to-End) Tests
Definition: Full system tests with real external services and complete data flow.
Characteristics:
- Real external APIs/services
- Real database
- Full request-response cycle
- Validates complete user journeys
When to write:
- Critical user workflows (authentication, payments, core features)
- Integration with external services
- Priority ≥15 scenarios that span multiple systems
Example: User registration flow (E2E) vs individual validation function (Unit)
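As an illustration only, an E2E test for such a journey drives the running system over HTTP with nothing mocked. This sketch assumes a hypothetical registration API served locally and uses `httpx`:

```python
import httpx

BASE_URL = "http://localhost:8000"  # assumption: the full stack is already running

def test_user_registration_journey():
    # ARRANGE: a unique payload so the test does not collide with existing data
    payload = {"email": "e2e-user@example.com", "password": "S3cure!pass"}

    # ACT: register, then log in with the same credentials (real DB, real services)
    register = httpx.post(f"{BASE_URL}/register", json=payload)
    login = httpx.post(f"{BASE_URL}/login", json=payload)

    # ASSERT: the complete journey succeeds end to end
    assert register.status_code == 201
    assert login.status_code == 200
    assert "token" in login.json()
```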
Integration Tests
Definition: Tests multiple components together with real dependencies (database, cache, file system).
Characteristics:
- Real database/cache/file system
- Multiple components interact
- May mock external APIs
- Validates component integration
When to write:
- Database query behavior
- Service orchestration
- Component interaction
- API endpoint behavior (without external services)
Example: Repository query with real database vs service logic with mocked repository
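For example (a sketch, not a prescribed API): a repository test against a real test database, where `User`, `UserRepository`, and the `db_session` fixture stand in for your own model, repository, and database fixture:

```python
def test_active_user_filter_excludes_inactive(db_session):
    # ARRANGE: real rows in the real test database (no mocks)
    db_session.add_all([
        User(email="active@example.com", active=True),    # hypothetical model
        User(email="inactive@example.com", active=False),
    ])
    db_session.commit()

    # ACT: exercise YOUR query construction against the real engine
    result = UserRepository(db_session).list_active_users()  # hypothetical repository

    # ASSERT: the filter you wrote behaves as intended
    assert [u.email for u in result] == ["active@example.com"]
```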
Unit Tests
Definition: Tests single component in isolation with mocked dependencies.
Characteristics:
- Fast execution (<1ms per test)
- No external dependencies
- Mocked collaborators
- Validates single responsibility
When to write:
- Business logic validation
- Complex calculations
- Error handling logic
- Custom transformations
Example: Validation function with mocked data vs endpoint with real database
Isolation Patterns
Pattern Comparison
| Pattern | Speed | Complexity | Best For |
|---|---|---|---|
| Data Deletion | ⚡⚡⚡ Fastest | Simple | Default choice (90% of projects) |
| Transaction Rollback | ⚡⚡ Fast | Moderate | Transaction semantics testing |
| Database Recreation | ⚡ Slow | Simple | Maximum isolation paranoia |
Data Deletion (Default)
How it works:
- Create schema once at test session start
- Delete data after each test
- Drop schema at test session end
Benefits:
- Fast (5-8s for 50 tests)
- Simple implementation
- Full isolation between tests
When to use: Default choice for most projects
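A minimal pytest + SQLAlchemy sketch of the pattern; the `Base` metadata import, connection URL, and fixture names are assumptions, not requirements of this strategy:

```python
import pytest
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

from myapp.models import Base  # assumption: your declarative Base with all model metadata

engine = create_engine("postgresql://test:test@localhost/test_db")  # assumption: test DB URL
Session = sessionmaker(bind=engine)

@pytest.fixture(scope="session", autouse=True)
def schema():
    # Create schema once at test session start
    Base.metadata.create_all(engine)
    yield
    # Drop schema at test session end
    Base.metadata.drop_all(engine)

@pytest.fixture(autouse=True)
def clean_tables(schema):
    yield
    # Delete data after each test, children before parents to respect foreign keys
    with engine.begin() as conn:
        for table in reversed(Base.metadata.sorted_tables):
            conn.execute(table.delete())
```

Because the schema is built once and only rows are deleted, per-test overhead stays low, which is what keeps suites in the 5-8s range cited above.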
Transaction Rollback
How it works:
- Start transaction before each test
- Run test code
- Rollback transaction after test
Benefits:
- Good for testing transaction semantics
- Faster than DB recreation
When to use: Testing transaction behavior, savepoints, isolation levels
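A comparable sketch for the rollback approach, reusing the `engine` and `Session` assumed in the previous sketch:

```python
import pytest

@pytest.fixture()
def db_session():
    """Each test runs inside a transaction that is rolled back afterwards."""
    connection = engine.connect()
    transaction = connection.begin()
    session = Session(bind=connection)
    try:
        yield session
    finally:
        session.close()
        transaction.rollback()  # discard everything the test wrote
        connection.close()
```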
Database Recreation
How it works:
- Drop and recreate database before each test
- Apply migrations
- Run test
Benefits:
- Maximum isolation
- Catches migration issues
When to use: Paranoia about shared state, testing migrations
What To Test vs NOT Test
✅ Test (GOOD)
Test YOUR code and integration usage:
| Category | Examples |
|---|---|
| Business logic | Validation rules, orchestration, error handling, computed properties |
| Query construction | Filters, joins, aggregations, pagination |
| API behavior | Request validation, response shape, HTTP status codes |
| Custom validators | Complex validation logic, transformations |
| Integration smoke | Database connectivity, basic CRUD, configuration |
❌ Avoid (BAD)
Don't test framework internals and third-party libraries:
| Category | Examples |
|---|---|
| Database constraints | UNIQUE, FOREIGN KEY, NOT NULL, CHECK constraints |
| ORM internals | Column types, table creation, metadata, relationships |
| Framework validation | Request body validation, dependency injection, routing |
| Third-party libraries | HTTP client behavior, serialization libraries, cryptography |
Testing Patterns
Arrange-Act-Assert
Structure tests clearly:
test_example:
# ARRANGE: Set up test data and dependencies
setup_data()
mock_dependencies()
# ACT: Execute code under test
result = execute_operation()
# ASSERT: Verify outcomes
assert result == expected
verify_side_effects()
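Concretely, the same structure around a real (if trivial) function might look like this; `apply_discount` is invented purely for illustration:

```python
def apply_discount(price: float, percent: float) -> float:
    """Code under test: YOUR business rule."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

def test_apply_discount_reduces_price():
    # ARRANGE
    price, percent = 200.0, 25.0
    # ACT
    result = apply_discount(price, percent)
    # ASSERT
    assert result == 150.0
```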
Benefits:
- Clear test structure
- Easy to read and maintain
- Explicit test phases
Mock at the Seam
Mock at component boundaries, not internals:
| Test Type | What to Mock | What to Use Real |
|---|---|---|
| Unit tests | External dependencies (repositories, APIs, file system) | Business logic |
| Integration tests | External APIs, slow services | Database, cache, your code |
| E2E tests | Nothing (or minimal external services) | Everything |
Anti-pattern: Over-mocking your own code defeats the purpose of integration tests.
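A small sketch of mocking at the repository seam in a unit test, using the standard library's `unittest.mock`; the service and repository here are hypothetical:

```python
from unittest.mock import Mock

class UserService:
    """Code under test: orchestration logic that depends on a repository."""
    def __init__(self, repo):
        self.repo = repo

    def deactivate(self, user_id: int) -> bool:
        user = self.repo.get(user_id)
        if user is None:
            return False
        self.repo.save({**user, "active": False})
        return True

def test_deactivate_saves_inactive_user():
    # Mock the seam (repository); keep the business logic real
    repo = Mock()
    repo.get.return_value = {"id": 1, "active": True}

    assert UserService(repo).deactivate(1) is True
    repo.save.assert_called_once_with({"id": 1, "active": False})
```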
Test Data Builders
Create readable test data:
# Builder pattern for test data
user = build_user(
email="test@example.com",
role="admin",
active=True
)
# Easy to create edge cases
inactive_user = build_user(active=False)
guest_user = build_user(role="guest")
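One way to back the calls above is a plain dataclass with defaults; this is a sketch, and the defaults are illustrative:

```python
from dataclasses import dataclass

@dataclass
class User:
    email: str = "user@example.com"
    role: str = "member"
    active: bool = True

def build_user(**overrides) -> User:
    """Sensible defaults; each test overrides only what it cares about."""
    return User(**overrides)
```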
Benefits:
- Readable test setup
- Easy edge case creation
- Reusable across tests
Common Issues
Flaky Tests
Symptom: Tests pass/fail randomly without code changes
Common causes:
- Shared state between tests (global variables, cached data)
- Time-dependent logic (timestamps, delays)
- External service instability
- Improper cleanup between tests
Solutions:
- Isolate test data (per-test creation, cleanup)
- Mock or inject time-dependent code (see the sketch after this list)
- Use test-specific configurations
- Implement proper teardown
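For the time-dependence cause in particular, one common fix is to inject a clock instead of reading the system time inside the logic; a minimal sketch:

```python
from datetime import datetime, timedelta, timezone

def is_expired(expires_at: datetime, now: datetime | None = None) -> bool:
    """Accepting `now` as a parameter makes the logic deterministic in tests."""
    now = now or datetime.now(timezone.utc)
    return now >= expires_at

def test_is_expired_is_deterministic():
    fixed_now = datetime(2024, 1, 1, tzinfo=timezone.utc)
    assert is_expired(fixed_now - timedelta(seconds=1), now=fixed_now)
    assert not is_expired(fixed_now + timedelta(days=1), now=fixed_now)
```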
Slow Tests
Symptom: Test suite takes too long (>30s for 50 tests)
Common causes:
- Database recreation per test
- Running migrations per test
- No connection pooling
- Too many E2E tests
Solutions:
- Use Data Deletion pattern
- Run migrations once per session
- Optimize test data creation
- Balance test levels (more Unit, fewer E2E)
Test Coupling
Symptom: Changing one component breaks many unrelated tests
Common causes:
- Tests depend on implementation details
- Shared test fixtures across unrelated tests
- Testing framework internals instead of behavior
Solutions:
- Test behavior, not implementation
- Use independent test data per test
- Focus on public APIs, not internal state
Coverage Guidelines
Targets
| Layer | Target | Priority |
|---|---|---|
| Critical business logic | 100% branch coverage | HIGH |
| Repositories/Data access | 90%+ line coverage | HIGH |
| API endpoints | 80%+ line coverage | MEDIUM |
| Utilities/Helpers | 80%+ line coverage | MEDIUM |
| Overall | 80%+ line coverage | MEDIUM |
What Coverage Means
Coverage is a tool, not a goal:
- ✅ High coverage + focused tests = good quality signal
- ❌ High coverage + meaningless tests = false confidence
- ❌ Low coverage = blind spots in testing
Focus on:
- Critical paths covered
- Edge cases tested
- Error handling validated
Not on:
- Arbitrary percentage targets
- Testing getters/setters
- Framework code
Verification Checklist
Strategy
- Risk-based selection (Priority ≥15)
- Test caps enforced (E2E 2-5, Integration 3-8, Unit 5-15)
- Total 10-28 tests per Story
- Tests target YOUR code, not framework internals
- E2E smoke tests for critical integrations
Organization
- Story-Level Test Task Pattern followed
- Tests consolidated in final Story task
- REGISTRY.md files maintained for all test categories
- Test directory structure follows conventions
Isolation
- Isolation pattern chosen (Data Deletion recommended)
- Each test creates own data
- Proper cleanup between tests
- No shared state between tests
Quality
- Tests are order-independent
- Tests run fast (<10s for 50 integration tests)
- No flaky tests
- Coverage ≥80% overall, 100% for critical logic
- Meaningful test names and descriptions
Maintenance
Update Triggers:
- New testing patterns discovered
- Framework version changes affecting tests
- Significant changes to test architecture
- New isolation issues identified
Verification: Review this strategy when starting new projects or experiencing test quality issues.
Last Updated: [CURRENT_DATE] - Initial universal testing strategy