gh-tachyon-beep-skillpacks-…/skills/load-testing-patterns/SKILL.md
2025-11-30 08:59:43 +08:00

---
name: load-testing-patterns
description: Use when designing load tests, choosing tools (k6, JMeter, Gatling), calculating concurrent users from DAU, interpreting latency degradation, identifying bottlenecks, or running spike/soak/stress tests - provides test patterns, anti-patterns, and load calculation frameworks
---
# Load Testing Patterns
## Overview
**Core principle:** Test realistic load patterns, not constant artificial load. Find limits before users do.
**Rule:** Load testing reveals system behavior under stress. Without it, production is your load test.
## Tool Selection Decision Tree
| Your Need | Protocol | Team Skills | Use | Why |
|-----------|----------|-------------|-----|-----|
| Modern API testing | HTTP/REST/GraphQL | JavaScript | **k6** | Best dev experience, CI/CD friendly |
| Enterprise/complex protocols | HTTP/SOAP/JMS/JDBC | Java/GUI comfort | **JMeter** | Mature, comprehensive protocols |
| Python team | HTTP/WebSocket | Python | **Locust** | Pythonic, easy scripting |
| High performance/complex scenarios | HTTP/gRPC | Scala/Java | **Gatling** | Best reports, high throughput |
| Cloud-native at scale | HTTP/WebSocket | Any (SaaS) | **Artillery, Flood.io** | Managed, distributed |
**First choice:** k6 (modern, scriptable, excellent CI/CD integration)
**Why not ApacheBench/wrk:** Too simple for realistic scenarios, no complex user flows
## Test Pattern Library
| Pattern | Purpose | Duration | When to Use |
|---------|---------|----------|-------------|
| **Smoke Test** | Verify test works | 1-2 min | Before every test run |
| **Load Test** | Normal/peak capacity | 10-30 min | Regular capacity validation |
| **Stress Test** | Find breaking point | 20-60 min | Understand limits |
| **Spike Test** | Sudden traffic surge | 5-15 min | Black Friday, launch events |
| **Soak Test** | Memory leaks, stability | 1-8 hours | Pre-release validation |
| **Capacity Test** | Max sustainable load | Variable | Capacity planning |
### Smoke Test
**Goal:** Verify test script works with minimal load
```javascript
// k6 smoke test
export let options = {
  vus: 1,
  duration: '1m',
  thresholds: {
    http_req_duration: ['p(95)<500'], // 95% < 500ms
    http_req_failed: ['rate<0.01'], // <1% errors
  }
}
```
**Purpose:** Catch test script bugs before running expensive full tests
### Load Test (Ramp-Up Pattern)
**Goal:** Test normal and peak expected load
```javascript
// k6 load test with ramp-up
export let options = {
  stages: [
    { duration: '5m', target: 100 }, // Ramp to normal load
    { duration: '10m', target: 100 }, // Hold at normal
    { duration: '5m', target: 200 }, // Ramp to peak
    { duration: '10m', target: 200 }, // Hold at peak
    { duration: '5m', target: 0 }, // Ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500', 'p(99)<1000'],
    http_req_failed: ['rate<0.05'],
  }
}
```
**Pattern:** Gradual ramp-up → sustain → ramp down. Never start at peak.
### Stress Test (Breaking Point)
**Goal:** Find system limits
```javascript
// k6 stress test
export let options = {
  stages: [
    { duration: '5m', target: 100 }, // Normal
    { duration: '5m', target: 300 }, // 1.5x peak (peak = 200 VUs)
    { duration: '5m', target: 600 }, // 3x peak
    { duration: '5m', target: 900 }, // 4.5x peak (expect failure)
    { duration: '10m', target: 0 }, // Recovery
  ]
}
```
**Success:** Identify at what load system degrades (not necessarily breaking completely)
### Spike Test (Sudden Surge)
**Goal:** Test sudden traffic bursts (viral post, email campaign)
```javascript
// k6 spike test
export let options = {
  stages: [
    { duration: '1m', target: 100 }, // Normal
    { duration: '30s', target: 1000 }, // SPIKE to 10x
    { duration: '5m', target: 1000 }, // Hold spike
    { duration: '2m', target: 100 }, // Back to normal
    { duration: '5m', target: 100 }, // Recovery check
  ]
}
```
**Tests:** Auto-scaling, circuit breakers, rate limiting
### Soak Test (Endurance)
**Goal:** Find memory leaks, resource exhaustion over time
```javascript
// k6 soak test
export let options = {
  stages: [
    { duration: '5m', target: 100 }, // Ramp
    { duration: '4h', target: 100 }, // Soak (sustained load)
    { duration: '5m', target: 0 }, // Ramp down
  ]
}
```
**Monitor:** Memory growth, connection leaks, disk space, file descriptors
**Duration:** Minimum 1 hour, ideally 4-8 hours
## Load Calculation Framework
**Problem:** Convert "10,000 daily active users" to concurrent load
### Step 1: DAU to Concurrent Users
```
Concurrent Users = DAU × Concurrency Ratio × Peak Multiplier
Concurrency Ratios by App Type:
- Web apps: 5-10%
- Social media: 10-20%
- Business apps: 20-30% (work hours)
- Gaming: 15-25%
Peak Multiplier: 1.5-2x for safety margin
```
**Example:**
```
DAU = 10,000
Concurrency = 10% (web app)
Peak Multiplier = 1.5
Concurrent Users = 10,000 × 0.10 × 1.5 = 1,500 concurrent users
```
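The Step 1 arithmetic is easy to script so the assumptions stay visible. A minimal sketch, where `estimateConcurrentUsers` and its parameter names are illustrative and not part of k6:

```javascript
// Concurrent Users = DAU × Concurrency Ratio × Peak Multiplier
function estimateConcurrentUsers(dau, concurrencyRatio, peakMultiplier) {
  if (dau < 0 || concurrencyRatio < 0 || peakMultiplier < 0) {
    throw new RangeError('inputs must be non-negative')
  }
  // Round to a whole number of users
  return Math.round(dau * concurrencyRatio * peakMultiplier)
}

// Web app example from above: 10,000 DAU, 10% concurrency, 1.5x peak
const concurrentUsers = estimateConcurrentUsers(10000, 0.10, 1.5)
console.log(concurrentUsers) // 1500
```

Running the same function with the upper end of a range (for example 0.30 and 2 for a business app) gives a quick worst-case bound.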
### Step 2: Concurrent Users to Requests/Second
```
RPS = (Concurrent Users × Requests per Session) / (Session Duration × Think Time Ratio)
Think Time Ratio:
- Active browsing: 0.3-0.5 (30-50% time clicking/typing)
- Reading-heavy: 0.1-0.2 (10-20% active)
- API clients: 0.8-1.0 (80-100% active)
```
**Example:**
```
Concurrent Users = 1,500
Requests per Session = 20
Session Duration = 10 minutes = 600 seconds
Think Time Ratio = 0.3 (web browsing)
RPS = (1,500 × 20) / (600 × 0.3) = 30,000 / 180 = 167 RPS
```
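The Step 2 formula can be scripted the same way. This sketch implements the formula exactly as written above; `estimateRps` is a hypothetical helper name:

```javascript
// RPS = (Concurrent Users × Requests per Session) / (Session Duration × Think Time Ratio)
function estimateRps(concurrentUsers, requestsPerSession, sessionSeconds, thinkTimeRatio) {
  if (sessionSeconds <= 0 || thinkTimeRatio <= 0) {
    throw new RangeError('sessionSeconds and thinkTimeRatio must be positive')
  }
  return (concurrentUsers * requestsPerSession) / (sessionSeconds * thinkTimeRatio)
}

const rps = estimateRps(1500, 20, 600, 0.3)
console.log(Math.round(rps)) // 167
```

With a think time ratio of 1.0 the same inputs give 50 RPS, so the ratio is the most sensitive assumption in this calculation.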
### Step 3: Model Realistic Patterns
Don't use constant load. Use realistic traffic patterns:
```javascript
// Realistic daily pattern
export let options = {
  stages: [
    // Morning ramp
    { duration: '2h', target: 500 }, // 08:00-10:00
    { duration: '2h', target: 1000 }, // 10:00-12:00 (peak)
    // Lunch dip
    { duration: '1h', target: 600 }, // 12:00-13:00
    // Afternoon peak
    { duration: '2h', target: 1200 }, // 13:00-15:00 (peak)
    { duration: '2h', target: 800 }, // 15:00-17:00
    // Evening drop
    { duration: '2h', target: 300 }, // 17:00-19:00
  ]
}
```
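An 11-hour profile is rarely practical to run in full. One option, sketched here as an assumption and not something k6 provides, is to compress the stage durations by a fixed factor for rehearsal runs. `compressStages` is a hypothetical helper and only handles whole-number `s`, `m`, and `h` durations:

```javascript
const UNIT_SECONDS = { s: 1, m: 60, h: 3600 }

// Convert a duration such as '2h' or '30s' to seconds
function toSeconds(duration) {
  const match = /^(\d+)([smh])$/.exec(duration)
  if (!match) throw new Error(`Unsupported duration: ${duration}`)
  return Number(match[1]) * UNIT_SECONDS[match[2]]
}

// Divide every stage duration by `factor`, keeping the targets unchanged
function compressStages(stages, factor) {
  return stages.map(({ duration, target }) => ({
    duration: `${Math.max(1, Math.round(toSeconds(duration) / factor))}s`,
    target,
  }))
}

// A factor of 60 turns each hour into a minute
console.log(compressStages([{ duration: '2h', target: 500 }], 60))
// [ { duration: '120s', target: 500 } ]
```

Compression steepens every ramp, so a compressed run checks the shape of the profile, not the system's response to slow daily growth.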
## Anti-Patterns Catalog
### ❌ Coordinated Omission
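**Symptom:** Closed-loop load generation (each VU waits for a response before sending its next request) sends fewer requests when the system slows down, so the slowest periods are under-sampled and latency is underestimated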
**Why bad:** Hides real latency impact when system slows down
**Fix:** Use arrival rate (requests/sec) not iteration rate
```javascript
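// ❌ Bad - coordinated omission: each VU waits for the response, then sleeps,
// so a slow system automatically receives fewer requests
import http from 'k6/http'
import { sleep } from 'k6'
export default function() {
  http.get('https://api.example.com')
  sleep(1) // Wait 1s between requests
}
// ✅ Good - arrival rate pacing (drop the sleep-based pacing when using this executor)
export let options = {
  scenarios: {
    constant_arrival_rate: {
      executor: 'constant-arrival-rate',
      rate: 100, // Start 100 iterations/s regardless of response time
      timeUnit: '1s',
      duration: '10m',
      preAllocatedVUs: 50,
      maxVUs: 200,
    }
  }
}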
```
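Sizing `preAllocatedVUs` and `maxVUs` for an arrival-rate scenario can be approximated with Little's Law (VUs in flight ≈ arrival rate × iteration duration). A rough sketch, with `estimateVUs` as a hypothetical helper and the iteration durations as assumed values:

```javascript
// VUs needed ≈ iterations started per second × seconds each iteration takes
function estimateVUs(ratePerSecond, iterationSeconds) {
  return Math.ceil(ratePerSecond * iterationSeconds)
}

console.log(estimateVUs(100, 0.5)) // 50  -> enough while iterations take 0.5s
console.log(estimateVUs(100, 2))   // 200 -> needed if iterations slow to 2s
```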
---
### ❌ Cold Start Testing
**Symptom:** Running load test immediately after deployment without warm-up
**Why bad:** JIT compilation, cache warming, connection pooling haven't stabilized
**Fix:** Warm-up phase before measurement
```javascript
// ✅ Good - warm-up phase
export let options = {
  stages: [
    { duration: '2m', target: 50 }, // Warm-up (k6 still records this; exclude it when analyzing)
    { duration: '10m', target: 100 }, // Actual test
  ]
}
```
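Stages alone do not exclude the warm-up from k6's metrics. One way to keep thresholds focused on the steady-state phase is to run the warm-up as its own scenario and scope the threshold with the `scenario` tag that k6 attaches to each metric sample. A sketch, where `withWarmup` is a hypothetical helper and the 500ms limit is an assumed target:

```javascript
// Build k6 options with a separate warm-up scenario that thresholds ignore
function withWarmup(warmupDuration, warmupVUs, measuredStages) {
  return {
    scenarios: {
      warmup: {
        executor: 'ramping-vus',
        stages: [{ duration: warmupDuration, target: warmupVUs }],
      },
      measured: {
        executor: 'ramping-vus',
        startTime: warmupDuration, // Starts when the warm-up stages end
        stages: measuredStages,
      },
    },
    thresholds: {
      // Only samples tagged with the measured scenario count toward this limit
      'http_req_duration{scenario:measured}': ['p(95)<500'],
    },
  }
}

// In a k6 script: export const options = withWarmup('2m', 50, [{ duration: '10m', target: 100 }])
console.log(Object.keys(withWarmup('2m', 50, []).scenarios)) // [ 'warmup', 'measured' ]
```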
---
### ❌ Unrealistic Test Data
**Symptom:** Using same user ID, same query parameters for all virtual users
**Why bad:** Caches give unrealistic performance, doesn't test real database load
**Fix:** Parameterized, realistic data
```javascript
// ❌ Bad - same data for every request
// http.get('https://api.example.com/users/123')
// ✅ Good - parameterized data
import http from 'k6/http'
import { SharedArray } from 'k6/data'
import papaparse from 'https://jslib.k6.io/papaparse/5.1.1/index.js'
// Parse the CSV once and share it read-only across VUs
const csvData = new SharedArray('users', function () {
  return papaparse.parse(open('./users.csv'), { header: true }).data
})
export default function() {
  const user = csvData[__VU % csvData.length] // Each VU gets its own row
  http.get(`https://api.example.com/users/${user.id}`)
}
```
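Indexing by `__VU` alone pins each VU to one row for the whole test. If the data file has more rows than VUs, a sketch like the following rotates through the full file. `rowIndex` and `vuCount` are hypothetical names, and `__ITER` is k6's per-VU iteration counter:

```javascript
// Spread VUs and iterations across the whole data set.
// vu is 1-based (like __VU), iter is 0-based (like __ITER).
function rowIndex(vu, iter, vuCount, rowCount) {
  return (iter * vuCount + (vu - 1)) % rowCount
}

// In a k6 script: const user = csvData[rowIndex(__VU, __ITER, 100, csvData.length)]
console.log(rowIndex(1, 0, 3, 10)) // 0
console.log(rowIndex(3, 0, 3, 10)) // 2
console.log(rowIndex(1, 1, 3, 10)) // 3
```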
---
### ❌ Constant Load Pattern
**Symptom:** Running with constant VUs instead of realistic traffic pattern
**Why bad:** Real traffic has peaks, valleys, not flat line
**Fix:** Use realistic daily/hourly patterns
---
### ❌ Ignoring Think Time
**Symptom:** No delays between requests, hammering API as fast as possible
**Why bad:** Unrealistic user behavior, overestimates load
**Fix:** Add realistic think time based on user behavior
```javascript
// ✅ Good - realistic think time
import http from 'k6/http'
import { sleep } from 'k6'
export default function() {
  http.get('https://api.example.com/products')
  sleep(Math.random() * 3 + 2) // 2-5 seconds browsing
  http.post('https://api.example.com/cart', { /* ... */ })
  sleep(Math.random() * 5 + 5) // 5-10 seconds deciding
  http.post('https://api.example.com/checkout', { /* ... */ })
}
```
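The `Math.random() * range + min` expression recurs for every pause, so it can be wrapped once. A small sketch, where `thinkTime` is a hypothetical helper and not a k6 built-in:

```javascript
// Uniform random think time in [minSeconds, maxSeconds)
function thinkTime(minSeconds, maxSeconds) {
  if (maxSeconds < minSeconds) {
    throw new RangeError('maxSeconds must be >= minSeconds')
  }
  return minSeconds + Math.random() * (maxSeconds - minSeconds)
}

// In a k6 script: sleep(thinkTime(2, 5)) for browsing, sleep(thinkTime(5, 10)) for deciding
const pause = thinkTime(2, 5)
console.log(pause >= 2 && pause < 5) // true
```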
## Result Interpretation Guide
### Latency Degradation Patterns
| Pattern | Cause | What to Check |
|---------|-------|---------------|
| **Linear growth** (2x users → 2x latency) | CPU-bound | Thread pool, CPU usage |
| **Exponential growth** (2x users → 10x latency) | Resource saturation | Connection pools, locks, queues |
| **Sudden cliff** (works until X, then fails) | Hard limit hit | Max connections, memory, file descriptors |
| **Gradual degradation** (slow increase over time) | Memory leak, cache pollution | Memory trends, GC activity |
### Bottleneck Classification
- **Symptom: p95 latency 10x at 2x load** → **Resource saturation** (database connection pool, thread pool, queue)
- **Symptom: Errors increase with load** → **Hard limit** (connection limit, rate limiting, timeout)
- **Symptom: Latency grows over time at constant load** → **Memory leak** or **cache pollution**
- **Symptom: High variance (p50 good, p99 terrible)** → **GC pauses**, **lock contention**, or **slow queries**
### What to Monitor
| Layer | Metrics to Track |
|-------|------------------|
| **Application** | Request rate, error rate, p50/p95/p99 latency, active requests |
| **Runtime** | GC pauses (JVM, .NET), thread pool usage, heap/memory |
| **Database** | Connection pool usage, query latency, lock waits, slow queries |
| **Infrastructure** | CPU %, memory %, disk I/O, network throughput |
| **External** | Third-party API latency, rate limit hits |
### Capacity Planning Formula
```
Safe Capacity = (Breaking Point × Degradation Factor) × Safety Margin
Breaking Point = VUs where p95 latency > threshold
Degradation Factor = 0.7 (start degradation before break)
Safety Margin = 0.5-0.7 (handle traffic spikes)
Example:
- System breaks at 1000 VUs (p95 > 1s)
- Start seeing degradation at 700 VUs (70%)
- Safe capacity: 700 × 0.7 = 490 VUs
```
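The capacity formula above translates directly into a helper. A minimal sketch, with `safeCapacity` as a hypothetical name and the result rounded to whole VUs:

```javascript
// Safe Capacity = (Breaking Point × Degradation Factor) × Safety Margin
function safeCapacity(breakingPointVUs, degradationFactor, safetyMargin) {
  return Math.round(breakingPointVUs * degradationFactor * safetyMargin)
}

console.log(safeCapacity(1000, 0.7, 0.7)) // 490
console.log(safeCapacity(1000, 0.7, 0.5)) // 350
```

The two calls show the effect of the safety margin range: the same 1000 VU breaking point yields between 350 and 490 VUs of safe capacity.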
## Authentication and Session Management
**Problem:** Real APIs require authentication. Using the same token for all virtual users is usually unrealistic (read-only testing is the exception).
### Token Strategy Decision Framework
| Scenario | Strategy | Why |
|----------|----------|-----|
| **Short test (<10 min)** | Pre-generate tokens | Fast, simple, no login load |
| **Long test (soak)** | Login during test + refresh | Realistic, tests auth system |
| **Testing auth system** | Simulate login flow | Auth is part of load |
| **Read-only testing** | Shared token (single user) | Simplest, adequate for API-only tests |
**Default:** Pre-generate tokens for load tests, simulate login for auth system tests
### Pre-Generated Tokens Pattern
**Best for:** API testing where auth system isn't being tested
```javascript
// k6 with pre-generated JWT tokens
import http from 'k6/http'
import { SharedArray } from 'k6/data'
// Load tokens from file (generated externally)
const tokens = new SharedArray('auth tokens', function () {
  return JSON.parse(open('./tokens.json'))
})
export default function() {
  const token = tokens[__VU % tokens.length]
  const headers = {
    'Authorization': `Bearer ${token}`
  }
  http.get('https://api.example.com/protected', { headers })
}
```
**Generate tokens externally:**
```bash
# Script to generate 1000 tokens as a JSON array (so JSON.parse can read the file)
for i in {1..1000}; do
  curl -s -X POST https://api.example.com/login \
    -d "username=loadtest_user_$i&password=test" \
    | jq -r '.token'
done | jq -R . | jq -s . > tokens.json
```
**Pros:** No login load, fast test setup
**Cons:** Tokens may expire during long tests, not testing auth flow
---
### Login Flow Simulation Pattern
**Best for:** Testing auth system, soak tests where tokens expire
```javascript
// k6 with login simulation
import http from 'k6/http'
import { SharedArray } from 'k6/data'
const users = new SharedArray('users', function () {
  return JSON.parse(open('./users.json')) // [{username, password}, ...]
})
export default function() {
  const user = users[__VU % users.length]
  // Login to get token
  const loginRes = http.post('https://api.example.com/login', {
    username: user.username,
    password: user.password
  })
  const token = loginRes.json('token')
  // Use token for subsequent requests
  const headers = { 'Authorization': `Bearer ${token}` }
  http.get('https://api.example.com/protected', { headers })
  http.post('https://api.example.com/data', {}, { headers })
}
```
**Token refresh for long tests:**
```javascript
// k6 with token refresh
import http from 'k6/http'
import { sleep } from 'k6'
// Module-level state is per VU, so each VU tracks its own token
let token = null
let tokenExpiry = 0
export default function() {
  const now = Date.now() / 1000
  // Refresh token if expired or about to expire
  if (!token || now > tokenExpiry - 300) { // Refresh 5 min before expiry
    const loginRes = http.post('https://api.example.com/login', { /* ... */ })
    token = loginRes.json('token')
    tokenExpiry = loginRes.json('expires_at') // Assumed to be a Unix timestamp in seconds
  }
  http.get('https://api.example.com/protected', {
    headers: { 'Authorization': `Bearer ${token}` }
  })
  sleep(1)
}
```
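The refresh condition is the part most likely to be wrong by a unit or a sign, so it can help to isolate it as a pure function and check it outside k6. A sketch, with `needsRefresh` as a hypothetical helper and all times assumed to be Unix seconds:

```javascript
// True when there is no token yet or it expires within marginSeconds
function needsRefresh(token, expiresAtSeconds, nowSeconds, marginSeconds = 300) {
  return !token || nowSeconds > expiresAtSeconds - marginSeconds
}

console.log(needsRefresh(null, 0, 1000))     // true  (no token yet)
console.log(needsRefresh('abc', 2000, 1000)) // false (1000s left)
console.log(needsRefresh('abc', 1200, 1000)) // true  (200s left, inside the 5 min margin)
```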
---
### Session Cookie Management
**For cookie-based auth:**
```javascript
// k6 with session cookies
import http from 'k6/http'
export default function() {
  // k6 keeps a cookie jar per VU and sends stored cookies automatically;
  // http.cookieJar() returns that jar if you need to inspect or set cookies
  const jar = http.cookieJar()
  // Login (sets session cookie)
  http.post('https://example.com/login', {
    username: 'user',
    password: 'pass'
  })
  // Subsequent requests use session cookie automatically
  http.get('https://example.com/dashboard')
  http.get('https://example.com/profile')
}
```
---
### Rate Limiting Detection
**Pattern:** Detect when hitting rate limits during load test
```javascript
// k6 rate limit detection
import http from 'k6/http'
import { check } from 'k6'
export default function() {
  const res = http.get('https://api.example.com/data')
  check(res, {
    'not rate limited': (r) => r.status !== 429
  })
  if (res.status === 429) {
    console.warn(`Rate limited at VU ${__VU}, iteration ${__ITER}`)
    const retryAfter = res.headers['Retry-After']
    console.warn(`Retry-After: ${retryAfter} seconds`)
  }
}
```
**Thresholds for rate limiting:**
```javascript
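// Filtering http_req_failed by status:429 only sees the 429s, so track the share explicitly
import { Rate } from 'k6/metrics'
// After each request: rateLimited.add(res.status === 429)
const rateLimited = new Rate('rate_limited')
export let options = {
  thresholds: {
    rate_limited: ['rate<0.01'] // <1% rate limited
  }
}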
```
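The `Retry-After` header is not always a number of seconds; the HTTP specification also allows an HTTP date. A hedged sketch of a parser that handles both forms follows. `retryAfterSeconds` is a hypothetical helper, and the date branch relies on `Date.parse` accepting HTTP dates in the runtime you use:

```javascript
// Retry-After is either delay-seconds ("120") or an HTTP date
function retryAfterSeconds(headerValue, nowMs = Date.now()) {
  if (headerValue === undefined || headerValue === null) return null
  const value = String(headerValue).trim()
  if (/^\d+$/.test(value)) return Number(value)
  const dateMs = Date.parse(value)
  if (Number.isNaN(dateMs)) return null
  // Never return a negative wait for dates already in the past
  return Math.max(0, Math.ceil((dateMs - nowMs) / 1000))
}

console.log(retryAfterSeconds('120')) // 120
```

Returning `null` for a missing or unparseable header keeps the caller's decision explicit instead of silently treating it as zero.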
## Third-Party Dependency Handling
**Problem:** APIs call external services (payment, email, third-party APIs). Should you mock them?
### Mock vs Real Decision Framework
| External Service | Mock or Real? | Why |
|------------------|---------------|-----|
| **Payment gateway** | Real (sandbox) | Need to test integration, has sandbox mode |
| **Email provider** | Mock | Cost scales with volume ($0.001/email adds up across thousands of sends), no value testing |
| **Third-party API (has staging)** | Real (staging) | Test integration, realistic latency |
| **Third-party API (no staging)** | Mock | Can't load test production, rate limits |
| **Internal microservices** | Real | Testing real integration points |
| **Analytics/tracking** | Mock | High volume, no functional impact |
**Rule:** Use real services if they have sandbox/staging. Mock if expensive, rate-limited, or no test environment.
---
### Service Virtualization with WireMock
**Best for:** Mocking HTTP APIs with realistic responses
```javascript
// k6 test pointing to WireMock
import http from 'k6/http'
import { check } from 'k6'
export default function() {
  // WireMock running on localhost:8080 mocks external API
  // POST matches the stub's request method
  const res = http.post('http://localhost:8080/api/payment/process', {})
  check(res, {
    'payment mock responds': (r) => r.status === 200
  })
}
```
**WireMock stub setup:**
```json
{
  "request": {
    "method": "POST",
    "url": "/api/payment/process"
  },
  "response": {
    "status": 200,
    "jsonBody": {
      "transaction_id": "{{randomValue type='UUID'}}",
      "status": "approved"
    },
    "headers": {
      "Content-Type": "application/json"
    },
    "fixedDelayMilliseconds": 200,
    "transformers": ["response-template"]
  }
}
```
**Why WireMock:** Realistic latency simulation, dynamic responses, stateful mocking
---
### Partial Mocking Pattern
**Pattern:** Mock some services, use real for others
```javascript
// k6 with partial mocking
import http from 'k6/http'
export default function() {
  // Real API (points to staging)
  const productRes = http.get('https://staging-api.example.com/products')
  // Mock email service (points to WireMock)
  http.post('http://localhost:8080/mock/email/send', {
    to: 'user@example.com',
    subject: 'Order confirmation'
  })
  // Real payment sandbox (placeholder URL; use your provider's test endpoint)
  http.post('https://sandbox-payment.example.com/charge', {
    amount: 1000,
    currency: 'usd',
    source: 'tok_visa'
  })
}
```
**Decision criteria:**
- Real: Services with sandbox, need integration validation, low cost
- Mock: No sandbox, expensive, rate-limited, testing failure scenarios
---
### Testing External Service Failures
**Use mocks to simulate failures** (WireMock stub returning a slow 503 after a 5-second delay):
```json
{
  "request": {
    "method": "POST",
    "url": "/api/payment/process"
  },
  "response": {
    "status": 503,
    "jsonBody": {
      "error": "Service temporarily unavailable"
    },
    "fixedDelayMilliseconds": 5000
  }
}
```
**k6 test for resilience:**
```javascript
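import http from 'k6/http'
import { check } from 'k6'
export default function() {
  const res = http.post('http://localhost:8080/api/payment/process', {})
  // This confirms the failure stub behaves as configured; to verify graceful handling,
  // send the request to your own service while it depends on this mock
  check(res, {
    'payment mock returns 503': (r) => r.status === 503,
    'returns within timeout': (r) => r.timings.duration < 6000
  })
}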
```
---
### Cost and Compliance Guardrails
**Before testing with real external services:**
| Check | Why |
|-------|-----|
| **Sandbox mode exists?** | Avoid production costs/rate limits |
| **Cost per request?** | 1000 VUs × 10 req/s × 600s = 6M requests |
| **Rate limits?** | Will you hit external service limits? |
| **Terms of service?** | Does load testing violate TOS? |
| **Data privacy?** | Using real user emails/PII? |
**Example cost calculation:**
```
Email service: $0.001/email
Load test: 100 VUs × 5 emails/s per VU × 600s = 300,000 emails
Cost: 300,000 × $0.001 = $300
Decision: Mock email service, use real payment sandbox (free)
```
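The same estimate can be run for any external service before deciding to mock it. A sketch, assuming the per-VU figure is a per-second send rate (which is what makes the arithmetic above come out to 300,000); `estimateCost` is a hypothetical helper:

```javascript
// Total billable calls = VUs × calls per VU per second × test duration
function estimateCost(vus, callsPerVuPerSecond, durationSeconds, costPerCall) {
  const calls = vus * callsPerVuPerSecond * durationSeconds
  return { calls, cost: calls * costPerCall }
}

const email = estimateCost(100, 5, 600, 0.001)
console.log(email.calls)           // 300000
console.log(email.cost.toFixed(2)) // 300.00
```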
**Compliance:**
- Don't use real user data in load tests (GDPR, privacy)
- Check third-party TOS (some prohibit load testing)
- Use synthetic test data only
## Your First Load Test
**Goal:** Basic load test in one day
**Hour 1-2: Install tool and write smoke test**
```bash
# Install k6
brew install k6 # macOS
# or snap install k6 # Linux
# Create test.js
cat > test.js <<'EOF'
import http from 'k6/http'
import { check, sleep } from 'k6'
export let options = {
  vus: 1,
  duration: '30s'
}
export default function() {
  let res = http.get('https://your-api.com/health')
  check(res, {
    'status is 200': (r) => r.status === 200,
    'response < 500ms': (r) => r.timings.duration < 500
  })
  sleep(1)
}
EOF
# Run smoke test
k6 run test.js
```
**Hour 3-4: Calculate target load**
```
Your DAU: 10,000
Concurrency: 10%
Peak multiplier: 1.5
Target: 10,000 × 0.10 × 1.5 = 1,500 VUs
```
**Hour 5-6: Write load test with ramp-up**
```javascript
export let options = {
  stages: [
    { duration: '5m', target: 750 }, // Ramp to normal (50%)
    { duration: '10m', target: 750 }, // Hold normal
    { duration: '5m', target: 1500 }, // Ramp to peak
    { duration: '10m', target: 1500 }, // Hold peak
    { duration: '5m', target: 0 }, // Ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500', 'p(99)<1000'],
    http_req_failed: ['rate<0.05'] // < 5% errors
  }
}
```
**Hour 7-8: Run test and analyze**
```bash
# Run load test
k6 run --out json=results.json test.js
# Check summary output for:
# - p95/p99 latency trends
# - Error rates
# - When degradation started
```
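The `results.json` file written by `--out json` is line-delimited JSON, with one `Point` entry per metric sample. A minimal post-processing sketch for Node.js follows; `durationsFromNdjson` and `percentile` are hypothetical helpers, the inline sample stands in for the real file, and nearest-rank is only one of several percentile definitions:

```javascript
// Collect http_req_duration samples (ms) from k6's line-delimited JSON output
function durationsFromNdjson(text) {
  return text
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line))
    .filter((entry) => entry.type === 'Point' && entry.metric === 'http_req_duration')
    .map((entry) => entry.data.value)
}

// Nearest-rank percentile: smallest value with at least p% of samples at or below it
function percentile(values, p) {
  if (values.length === 0) return NaN
  const sorted = [...values].sort((a, b) => a - b)
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length))
  return sorted[rank - 1]
}

// With Node.js: const text = require('fs').readFileSync('results.json', 'utf8')
const sample = [
  '{"type":"Point","metric":"http_req_duration","data":{"value":120}}',
  '{"type":"Point","metric":"http_req_duration","data":{"value":480}}',
  '{"type":"Point","metric":"http_reqs","data":{"value":1}}',
].join('\n')
console.log(percentile(durationsFromNdjson(sample), 95)) // 480
```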
**If test fails:** Check thresholds, adjust targets, investigate bottlenecks
## Common Mistakes
### ❌ Testing Production Without Safeguards
**Fix:** Use feature flags, test environment, or controlled percentage
---
### ❌ No Baseline Performance Metrics
**Fix:** Run smoke test first to establish baseline before load testing
---
### ❌ Using Iteration Duration Instead of Arrival Rate
**Fix:** Use `constant-arrival-rate` executor in k6
---
### ❌ Not Warming Up Caches/JIT
**Fix:** 2-5 minute warm-up phase before measurement
## Quick Reference
**Tool Selection:**
- Modern API: k6
- Enterprise: JMeter
- Python team: Locust
**Test Patterns:**
- Smoke: 1 VU, 1 min
- Load: Ramp-up → peak → ramp-down
- Stress: Increase until break
- Spike: Sudden 10x surge
- Soak: 4-8 hours constant
**Load Calculation:**
```
Concurrent = DAU × 0.10 × 1.5
RPS = (Concurrent × Requests/Session) / (Duration × Think Time)
```
**Anti-Patterns:**
- Coordinated omission (use arrival rate)
- Cold start (warm-up first)
- Unrealistic data (parameterize)
- Constant load (use realistic patterns)
**Result Interpretation:**
- Linear growth → CPU-bound
- Exponential growth → Resource saturation
- Sudden cliff → Hard limit
- Gradual degradation → Memory leak
**Authentication:**
- Short tests: Pre-generate tokens
- Long tests: Login + refresh
- Testing auth: Simulate login flow
**Third-Party Dependencies:**
- Has sandbox: Use real (staging/sandbox)
- Expensive/rate-limited: Mock (WireMock)
- No sandbox: Mock
## Bottom Line
**Start with smoke test (1 VU). Calculate realistic load from DAU. Use ramp-up pattern (never start at peak). Monitor p95/p99 latency. Find breaking point before users do.**
Test realistic scenarios with think time, not hammer tests.