Initial commit

This commit is contained in:
Zhongwei Li
2025-11-30 08:37:27 +08:00
commit 37774aa937
131 changed files with 31137 additions and 0 deletions

View File

@@ -0,0 +1,465 @@
# Testing Strategy
Universal testing philosophy and strategy for modern software projects: principles, organization, and best practices.
> **SCOPE:** Testing philosophy, risk-based strategy, test organization, isolation patterns, what to test. **NOT IN SCOPE:** Project structure, framework-specific patterns, CI/CD configuration, test tooling setup.
## Quick Navigation
- **Tests Organization:** [tests/README.md](../../tests/README.md) - Directory structure, Story-Level Pattern, running tests
- **Test Inventory:** [tests/unit/REGISTRY.md](../../tests/unit/REGISTRY.md), [tests/integration/REGISTRY.md](../../tests/integration/REGISTRY.md), [tests/e2e/REGISTRY.md](../../tests/e2e/REGISTRY.md)
---
## Core Philosophy
### Test YOUR Code, Not Frameworks
**Focus testing effort on YOUR business logic and integration usage.** Do not retest database constraints, ORM internals, framework validation, or third-party library mechanics.
**Rule of thumb:** If deleting your code wouldn't fail the test, you're testing someone else's code.
### Examples
| Verdict | Test Description | Rationale |
|---------|-----------------|-----------|
| ✅ **GOOD** | Custom validation logic raises exception for invalid input | Tests YOUR validation rules |
| ✅ **GOOD** | Repository query returns filtered results based on business criteria | Tests YOUR query construction |
| ✅ **GOOD** | API endpoint returns correct HTTP status for error scenarios | Tests YOUR error handling |
| ❌ **BAD** | Database enforces UNIQUE constraint on email column | Tests database, not your code |
| ❌ **BAD** | ORM model has correct column types and lengths | Tests ORM configuration, not logic |
| ❌ **BAD** | Framework validates request body matches schema | Tests framework validation |
---
## Risk-Based Testing Strategy
### Priority Matrix
**Automate only high-value scenarios** using Business Impact (1-5) × Probability (1-5).
| Priority Score | Action | Example Scenarios |
|----------------|--------|-------------------|
| **≥15** | MUST test | Payment processing, authentication, data loss scenarios |
| **10-14** | Consider testing | Edge cases with moderate impact |
| **<10** | Skip automated tests | Low-probability edge cases, framework behavior |
### Test Caps (per Story)
**Enforce caps to prevent test bloat:**
- **E2E:** 2-5 tests
- **Integration:** 3-8 tests
- **Unit:** 5-15 tests
- **Total:** 10-28 tests per Story
**Key principles:**
- **No minimum limits** - Can be 0 tests if no Priority ≥15 scenarios exist
- **No test pyramids** - Test distribution based on risk, not arbitrary ratios
- **Every test must add value** - Each test should validate unique Priority ≥15 scenario
**Exception:** ML/GPU/Hardware-dependent workloads may favor more E2E (5-10), fewer Integration (2-5), minimal Unit (1-3) because behavior is hardware-dependent and mocks lack fidelity. Same 10-28 total cap applies.
---
## Story-Level Testing Pattern
### When to Write Tests
**Consolidate ALL tests in Story's final test task** AFTER implementation + manual verification.
| Task Type | Contains Tests? | Rationale |
|-----------|----------------|-----------|
| **Implementation Tasks** | ❌ NO tests | Focus on implementation only |
| **Final Test Task** | ✅ ALL tests | Complete Story coverage after manual verification |
### Benefits
1. **Complete context** - Tests written when all code implemented
2. **No duplication** - E2E covers integration paths, no need to retest same code
3. **Better prioritization** - Manual testing identifies Priority ≥15 scenarios before automation
4. **Atomic delivery** - Story delivers working code + comprehensive tests together
### Anti-Pattern Example
| ❌ Wrong Approach | ✅ Correct Approach |
|-------------------|---------------------|
| Task 1: Implement feature X + write unit tests<br>Task 2: Update integration + write integration tests<br>Task 3: Add logging + write E2E tests | Task 1: Implement feature X<br>Task 2: Update integration points<br>Task 3: Add logging<br>**Task 4 (Final): Write ALL tests (2 E2E, 3 Integration, 8 Unit)** |
| **Result:** Tests scattered, duplication, incomplete coverage | **Result:** Tests consolidated, no duplication, complete coverage |
---
## Test Organization
### Directory Structure
```
tests/
├── e2e/ # End-to-end tests (full system, real services)
│ ├── test_user_journey.ext
│ └── REGISTRY.md # E2E test inventory
├── integration/ # Integration tests (multiple components, real dependencies)
│ ├── api/
│ ├── services/
│ ├── db/
│ └── REGISTRY.md # Integration test inventory
├── unit/ # Unit tests (single component, mocked dependencies)
│ ├── api/
│ ├── services/
│ ├── db/
│ └── REGISTRY.md # Unit test inventory
└── README.md # Test documentation
```
### Test Inventory (REGISTRY.md)
**Each test category has REGISTRY.md** with detailed test descriptions:
**Purpose:**
- Document what each test validates
- Track test counts per Epic/Story
- Provide navigation for test maintenance
**Format example:**
```markdown
# E2E Test Registry
## Quality Estimation (Epic 6 - API-69)
**File:** tests/e2e/test_quality_estimation.ext
**Tests (4):**
1. **evaluate_endpoint_batch_splitting** - MetricX batch splitting (segments >128 split into batches)
2. **evaluate_endpoint_gpu_integration** - MetricX-24 GPU service integration
3. **evaluate_endpoint_error_handling** - Service timeout handling (503 status)
4. **evaluate_endpoint_response_format** - Response schema validation
**Total:** 4 E2E tests | **Coverage:** 100% Priority ≥15 scenarios
```
---
## Test Levels
### E2E (End-to-End) Tests
**Definition:** Full system tests with real external services and complete data flow.
**Characteristics:**
- Real external APIs/services
- Real database
- Full request-response cycle
- Validates complete user journeys
**When to write:**
- Critical user workflows (authentication, payments, core features)
- Integration with external services
- Priority ≥15 scenarios that span multiple systems
**Example:** User registration flow (E2E) vs individual validation function (Unit)
### Integration Tests
**Definition:** Tests multiple components together with real dependencies (database, cache, file system).
**Characteristics:**
- Real database/cache/file system
- Multiple components interact
- May mock external APIs
- Validates component integration
**When to write:**
- Database query behavior
- Service orchestration
- Component interaction
- API endpoint behavior (without external services)
**Example:** Repository query with real database vs service logic with mocked repository
### Unit Tests
**Definition:** Tests single component in isolation with mocked dependencies.
**Characteristics:**
- Fast execution (<1ms per test)
- No external dependencies
- Mocked collaborators
- Validates single responsibility
**When to write:**
- Business logic validation
- Complex calculations
- Error handling logic
- Custom transformations
**Example:** Validation function with mocked data vs endpoint with real database
---
## Isolation Patterns
### Pattern Comparison
| Pattern | Speed | Complexity | Best For |
|---------|-------|------------|----------|
| **Data Deletion** | ⚡⚡⚡ Fastest | Simple | Default choice (90% of projects) |
| **Transaction Rollback** | ⚡⚡ Fast | Moderate | Transaction semantics testing |
| **Database Recreation** | ⚡ Slow | Simple | Maximum isolation paranoia |
### Data Deletion (Default)
**How it works:**
1. Create schema once at test session start
2. Delete data after each test
3. Drop schema at test session end
**Benefits:**
- Fast (5-8s for 50 tests)
- Simple implementation
- Full isolation between tests
**When to use:** Default choice for most projects
### Transaction Rollback
**How it works:**
1. Start transaction before each test
2. Run test code
3. Rollback transaction after test
**Benefits:**
- Good for testing transaction semantics
- Faster than DB recreation
**When to use:** Testing transaction behavior, savepoints, isolation levels
### Database Recreation
**How it works:**
1. Drop and recreate database before each test
2. Apply migrations
3. Run test
**Benefits:**
- Maximum isolation
- Catches migration issues
**When to use:** Paranoia about shared state, testing migrations
---
## What To Test vs NOT Test
### ✅ Test (GOOD)
**Test YOUR code and integration usage:**
| Category | Examples |
|----------|----------|
| **Business logic** | Validation rules, orchestration, error handling, computed properties |
| **Query construction** | Filters, joins, aggregations, pagination |
| **API behavior** | Request validation, response shape, HTTP status codes |
| **Custom validators** | Complex validation logic, transformations |
| **Integration smoke** | Database connectivity, basic CRUD, configuration |
### ❌ Avoid (BAD)
**Don't test framework internals and third-party libraries:**
| Category | Examples |
|----------|----------|
| **Database constraints** | UNIQUE, FOREIGN KEY, NOT NULL, CHECK constraints |
| **ORM internals** | Column types, table creation, metadata, relationships |
| **Framework validation** | Request body validation, dependency injection, routing |
| **Third-party libraries** | HTTP client behavior, serialization libraries, cryptography |
---
## Testing Patterns
### Arrange-Act-Assert
**Structure tests clearly:**
```
test_example:
# ARRANGE: Set up test data and dependencies
setup_data()
mock_dependencies()
# ACT: Execute code under test
result = execute_operation()
# ASSERT: Verify outcomes
assert result == expected
verify_side_effects()
```
**Benefits:**
- Clear test structure
- Easy to read and maintain
- Explicit test phases
### Mock at the Seam
**Mock at component boundaries, not internals:**
| Test Type | What to Mock | What to Use Real |
|-----------|--------------|------------------|
| **Unit tests** | External dependencies (repositories, APIs, file system) | Business logic |
| **Integration tests** | External APIs, slow services | Database, cache, your code |
| **E2E tests** | Nothing (or minimal external services) | Everything |
**Anti-pattern:** Over-mocking your own code defeats the purpose of integration tests.
### Test Data Builders
**Create readable test data:**
```
# Builder pattern for test data
user = build_user(
email="test@example.com",
role="admin",
active=True
)
# Easy to create edge cases
inactive_user = build_user(active=False)
guest_user = build_user(role="guest")
```
**Benefits:**
- Readable test setup
- Easy edge case creation
- Reusable across tests
---
## Common Issues
### Flaky Tests
**Symptom:** Tests pass/fail randomly without code changes
**Common causes:**
- Shared state between tests (global variables, cached data)
- Time-dependent logic (timestamps, delays)
- External service instability
- Improper cleanup between tests
**Solutions:**
- Isolate test data (per-test creation, cleanup)
- Mock time-dependent code
- Use test-specific configurations
- Implement proper teardown
### Slow Tests
**Symptom:** Test suite takes too long (>30s for 50 tests)
**Common causes:**
- Database recreation per test
- Running migrations per test
- No connection pooling
- Too many E2E tests
**Solutions:**
- Use Data Deletion pattern
- Run migrations once per session
- Optimize test data creation
- Balance test levels (more Unit, fewer E2E)
### Test Coupling
**Symptom:** Changing one component breaks many unrelated tests
**Common causes:**
- Tests depend on implementation details
- Shared test fixtures across unrelated tests
- Testing framework internals instead of behavior
**Solutions:**
- Test behavior, not implementation
- Use independent test data per test
- Focus on public APIs, not internal state
---
## Coverage Guidelines
### Targets
| Layer | Target | Priority |
|-------|--------|----------|
| **Critical business logic** | 100% branch coverage | HIGH |
| **Repositories/Data access** | 90%+ line coverage | HIGH |
| **API endpoints** | 80%+ line coverage | MEDIUM |
| **Utilities/Helpers** | 80%+ line coverage | MEDIUM |
| **Overall** | 80%+ line coverage | MEDIUM |
### What Coverage Means
**Coverage is a tool, not a goal:**
- ✅ High coverage + focused tests = good quality signal
- ❌ High coverage + meaningless tests = false confidence
- ❌ Low coverage = blind spots in testing
**Focus on:**
- Critical paths covered
- Edge cases tested
- Error handling validated
**Not on:**
- Arbitrary percentage targets
- Testing getters/setters
- Framework code
---
## Verification Checklist
### Strategy
- [ ] Risk-based selection (Priority ≥15)
- [ ] Test caps enforced (E2E 2-5, Integration 3-8, Unit 5-15)
- [ ] Total 10-28 tests per Story
- [ ] Tests target YOUR code, not framework internals
- [ ] E2E smoke tests for critical integrations
### Organization
- [ ] Story-Level Test Task Pattern followed
- [ ] Tests consolidated in final Story task
- [ ] REGISTRY.md files maintained for all test categories
- [ ] Test directory structure follows conventions
### Isolation
- [ ] Isolation pattern chosen (Data Deletion recommended)
- [ ] Each test creates own data
- [ ] Proper cleanup between tests
- [ ] No shared state between tests
### Quality
- [ ] Tests are order-independent
- [ ] Tests run fast (<10s for 50 integration tests)
- [ ] No flaky tests
- [ ] Coverage ≥80% overall, 100% for critical logic
- [ ] Meaningful test names and descriptions
---
## Maintenance
**Update Triggers:**
- New testing patterns discovered
- Framework version changes affecting tests
- Significant changes to test architecture
- New isolation issues identified
**Verification:** Review this strategy when starting new projects or experiencing test quality issues.
**Last Updated:** [CURRENT_DATE] - Initial universal testing strategy