---
name: quality-metrics-and-kpis
description: Use when setting up quality dashboards, defining test coverage targets, tracking quality trends, configuring CI/CD quality gates, or reporting quality metrics to stakeholders - provides metric selection, threshold strategies, and dashboard design patterns
---
# Quality Metrics & KPIs
## Overview
**Core principle:** Measure what matters. Track trends, not absolutes. Use metrics to drive action, not for vanity.
**Rule:** Every metric must have a defined threshold and action plan. If a metric doesn't change behavior, stop tracking it.
## Quality Metrics vs Vanity Metrics
| Type | Example | Why | Response |
|------|---------|---------|---------------|
| **Vanity** | "We have 10,000 tests!" | Doesn't indicate quality | Pass rate, flakiness rate |
| **Vanity** | "95% code coverage!" | Can be gamed, doesn't mean tests are good | Coverage delta (new code), mutation score |
| **Actionable** | "Test flakiness: 5% → 2%" | Drives action | Track trend, set target |
| **Actionable** | "P95 build time: 15 min" | Identifies bottleneck | Optimize slow tests |
**Actionable metrics answer:** "What should I fix next?"
---
## Core Quality Metrics
### 1. Test Pass Rate
**Definition:** % of tests that pass on first run
```
Pass Rate = (Passing Tests / Total Tests) × 100
```
**Thresholds:**
- **> 98%:** Healthy
- **95-98%:** Investigate failures
- **< 95%:** Critical (tests are unreliable)
**Why it matters:** Low pass rate means flaky tests or broken code
**Action:** If < 98%, see the flaky-test-prevention skill
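A minimal sketch of the threshold check, assuming pass/fail counts have already been pulled from your CI system (the numbers below are placeholders):
```python
# Placeholder counts; in practice these come from your CI run summary.
passing_tests = 1478
total_tests = 1500

pass_rate = passing_tests / total_tests * 100  # Pass Rate formula from above

if pass_rate > 98:
    status = "healthy"
elif pass_rate >= 95:
    status = "investigate failures"
else:
    status = "critical: tests are unreliable"

print(f"Pass rate: {pass_rate:.1f}% ({status})")
```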
---
### 2. Test Flakiness Rate
**Definition:** % of tests that fail intermittently
```
Flakiness Rate = (Flaky Tests / Total Tests) × 100
```
**How to measure:**
```bash
# Run each test 100 times (--count comes from the pytest-repeat plugin)
pytest --count=100 test_checkout.py
# A test is flaky if it passes on some runs and fails on others (neither 0 nor 100 passes)
```
**Thresholds:**
- **< 1%:** Healthy
- **1-5%:** Moderate (fix soon)
- **> 5%:** Critical (CI is unreliable)
**Action:** Fix flaky tests before adding new tests
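To turn rerun results into a flakiness rate, something like the sketch below works; the `pass_counts` data is illustrative, not the output of any pytest API:
```python
# Test name -> number of passes out of 100 reruns (illustrative data).
pass_counts = {
    "test_checkout_happy_path": 100,  # stable pass
    "test_checkout_timeout": 0,       # consistent failure: broken, not flaky
    "test_checkout_retry": 87,        # flaky: passes only sometimes
}

RUNS = 100
flaky = [name for name, passes in pass_counts.items() if 0 < passes < RUNS]
flakiness_rate = len(flaky) / len(pass_counts) * 100

print(f"Flaky tests: {flaky}")
print(f"Flakiness rate: {flakiness_rate:.1f}%  (target: < 1%)")
```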
---
### 3. Code Coverage
**Definition:** % of code lines executed by tests
```
Coverage = (Executed Lines / Total Lines) × 100
```
**Thresholds (by test type):**
- **Unit tests:** 80-90% of business logic
- **Integration tests:** 60-70% of integration points
- **E2E tests:** 40-50% of critical paths
**Configuration (coverage.py, used via pytest-cov):**
```ini
# .coveragerc
[run]
source = src
omit = */tests/*, */migrations/*
[report]
# Fail the report step if total coverage is below 80%
fail_under = 80
show_missing = True
```
**Anti-pattern:** 100% coverage as goal
**Why it's wrong:** Easy to game (tests that execute code without asserting anything)
**Better metric:** Coverage + mutation score (see mutation-testing skill)
---
### 4. Coverage Delta (New Code)
**Definition:** Coverage of newly added code
**Why it matters:** More actionable than absolute coverage
```bash
# Run the full suite with an XML report, then report coverage for changed lines only.
# diff-cover compares against origin/main and fails below the new-code threshold.
pytest --cov=src --cov-report=xml
diff-cover coverage.xml --compare-branch=origin/main --fail-under=90
```
**Threshold:** 90% for new code (stricter than legacy)
**Action:** Block PR if new code coverage < 90%
---
### 5. Build Time (CI/CD)
**Definition:** Time from commit to merge-ready
**Track by stage:**
- **Lint/format:** < 30s
- **Unit tests:** < 2 min
- **Integration tests:** < 5 min
- **E2E tests:** < 15 min
- **Total PR pipeline:** < 20 min
**Why it matters:** Slow CI blocks developer productivity
**Action:** If build > 20 min, see test-automation-architecture for optimization patterns
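A sketch of a per-stage budget check that could run at the end of the pipeline; the stage names and durations are assumptions, not the output of any particular CI system:
```python
# Stage durations in seconds, e.g. derived from CI step timestamps (illustrative values).
stage_durations = {"lint": 22, "unit": 95, "integration": 280, "e2e": 820}

# Budgets from the list above, in seconds.
budgets = {"lint": 30, "unit": 120, "integration": 300, "e2e": 900}

total_minutes = sum(stage_durations.values()) / 60
print(f"Total pipeline: {total_minutes:.1f} min (budget: 20 min)")

for stage, duration in stage_durations.items():
    if duration > budgets[stage]:
        print(f"Stage '{stage}' over budget: {duration}s > {budgets[stage]}s")
```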
---
### 6. Test Execution Time Trend
**Definition:** How test suite duration changes over time
```python
# Track suite duration in CI and persist it for trend analysis
import json
import time

import pytest

start = time.time()
pytest.main()  # run the full test suite programmatically
duration = time.time() - start

metrics = {"test_duration_seconds": duration, "timestamp": time.time()}
with open("metrics.json", "w") as f:
    json.dump(metrics, f)
```
**Threshold:** < 5% growth per month
**Action:** If growth > 5%/month, parallelize tests or refactor slow tests
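Acting on the 5%/month threshold requires the history of those `metrics.json` snapshots; here is a sketch assuming one snapshot per CI run has been collected under a `metrics/` directory (that layout is an assumption):
```python
import json
from pathlib import Path

# Load snapshots written by the tracking snippet above, one JSON file per CI run.
snapshots = sorted(
    (json.loads(p.read_text()) for p in Path("metrics").glob("*.json")),
    key=lambda m: m["timestamp"],
)

if len(snapshots) >= 2:
    month = 30 * 24 * 3600
    cutoff = snapshots[-1]["timestamp"] - month
    recent = [m for m in snapshots if m["timestamp"] >= cutoff]
    oldest, newest = recent[0], recent[-1]
    growth = (newest["test_duration_seconds"] / oldest["test_duration_seconds"] - 1) * 100
    print(f"Suite duration growth over the last month: {growth:+.1f}%")
    if growth > 5:
        print("Over the 5%/month threshold: parallelize or refactor slow tests")
```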
---
### 7. Defect Escape Rate
**Definition:** Bugs found in production that should have been caught by tests
```
Defect Escape Rate = (Production Defects / (Production Defects + Defects Found Before Release)) × 100
```
**Thresholds:**
- **< 2%:** Excellent
- **2-5%:** Acceptable
- **> 5%:** Tests are missing critical scenarios
**Action:** For each escape, write regression test to prevent recurrence
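A sketch of the calculation for a single release, assuming defects are tagged by where they were found (the counts are placeholders):
```python
# Placeholder counts for one release, e.g. pulled from the issue tracker.
defects_found_before_release = 46  # caught by tests, review, or staging
defects_escaped_to_production = 2  # reported by users or monitoring

total_defects = defects_found_before_release + defects_escaped_to_production
escape_rate = defects_escaped_to_production / total_defects * 100

print(f"Defect escape rate: {escape_rate:.1f}%  (< 2% excellent, > 5% critical)")
```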
---
### 8. Mean Time to Detection (MTTD)
**Definition:** Time from bug introduction to discovery
```
MTTD = Bug Detection Time - Bug Introduction Time
```
**Thresholds:**
- **< 1 hour:** Excellent (caught in CI)
- **1-24 hours:** Good (caught in staging/canary)
- **> 24 hours:** Poor (caught in production)
**Action:** If MTTD > 24h, improve observability (see observability-and-monitoring skill)
---
### 9. Mean Time to Recovery (MTTR)
**Definition:** Time from bug detection to fix deployed
```
MTTR = Fix Deployment Time - Bug Detection Time
```
**Thresholds:**
- **< 1 hour:** Excellent
- **1-8 hours:** Acceptable
- **> 8 hours:** Poor
**Action:** If MTTR > 8h, improve rollback procedures (see testing-in-production skill)
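Both MTTD and MTTR reduce to timestamp arithmetic once an incident carries the three timestamps; the record below is an assumption about what an incident tracker can export, and in practice the values are averaged over many incidents:
```python
from datetime import datetime

# One incident record with illustrative timestamps; real data would come from your tracker.
incident = {
    "introduced_at": datetime(2025, 1, 13, 9, 0),     # commit that introduced the bug
    "detected_at": datetime(2025, 1, 13, 14, 30),     # alert fired / bug reported
    "fix_deployed_at": datetime(2025, 1, 13, 17, 0),  # fix live in production
}

mttd_hours = (incident["detected_at"] - incident["introduced_at"]).total_seconds() / 3600
mttr_hours = (incident["fix_deployed_at"] - incident["detected_at"]).total_seconds() / 3600

print(f"MTTD: {mttd_hours:.1f}h (target: < 24h)")
print(f"MTTR: {mttr_hours:.1f}h (target: < 8h)")
```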
---
## Dashboard Design
### Grafana Dashboard Example
**`grafana-dashboard.json` (abridged panel definitions):**
```json
{
  "panels": [
    {
      "title": "Test Pass Rate (7 days)",
      "targets": [
        {"expr": "sum(tests_passed) / sum(tests_total) * 100"}
      ],
      "thresholds": [
        {"value": 95, "color": "red"},
        {"value": 98, "color": "yellow"},
        {"value": 100, "color": "green"}
      ]
    },
    {
      "title": "Build Time Trend (30 days)",
      "targets": [
        {"expr": "avg_over_time(ci_build_duration_seconds[30d])"}
      ]
    },
    {
      "title": "Coverage Delta (per PR)",
      "targets": [
        {"expr": "coverage_new_code_percent"}
      ],
      "thresholds": [
        {"value": 90, "color": "green"},
        {"value": 80, "color": "yellow"},
        {"value": 0, "color": "red"}
      ]
    }
  ]
}
```
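The pass-rate panel assumes `tests_passed` and `tests_total` series already exist in Prometheus. One way to publish them from CI is the Pushgateway; a sketch using `prometheus_client`, with the gateway address and job name as placeholders:
```python
from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

registry = CollectorRegistry()
tests_total = Gauge("tests_total", "Total tests in the last CI run", registry=registry)
tests_passed = Gauge("tests_passed", "Passing tests in the last CI run", registry=registry)

# Placeholder values; in CI these come from the test runner's summary.
tests_total.set(1500)
tests_passed.set(1478)

# "pushgateway:9091" is a placeholder address for your Pushgateway instance.
push_to_gateway("pushgateway:9091", job="ci_quality_metrics", registry=registry)
```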
---
### CI/CD Quality Gates
**GitHub Actions example:**
```yaml
# .github/workflows/quality-gates.yml
name: Quality Gates
on: [pull_request]

jobs:
  quality-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Install test dependencies
        # pytest-json-report writes test-results.json (including total duration)
        run: pip install pytest pytest-cov pytest-json-report

      - name: Run tests with coverage
        run: pytest --cov=src --cov-report=json --json-report --json-report-file=test-results.json

      - name: Check coverage threshold
        run: |
          COVERAGE=$(jq '.totals.percent_covered' coverage.json)
          if (( $(echo "$COVERAGE < 80" | bc -l) )); then
            echo "Coverage ${COVERAGE}% below 80% threshold"
            exit 1
          fi

      - name: Check build time
        run: |
          DURATION=$(jq '.duration' test-results.json)
          if (( $(echo "$DURATION > 300" | bc -l) )); then
            echo "Build time ${DURATION}s exceeds 5-minute threshold"
            exit 1
          fi
```
---
## Reporting Patterns
### Weekly Quality Report
**Template:**
```markdown
# Quality Report - Week of 2025-01-13
## Summary
- **Test pass rate:** 98.5% (+0.5% from last week)
- **Flakiness rate:** 2.1% (-1.3% from last week) ✅
- **Coverage:** 85.2% (+2.1% from last week) ✅
- **Build time:** 18 min (-2 min from last week) ✅
## Actions Taken
- Fixed 8 flaky tests in checkout flow
- Added integration tests for payment service (+5% coverage)
- Parallelized slow E2E tests (reduced build time by 2 min)
## Action Items
- [ ] Fix remaining 3 flaky tests in user registration
- [ ] Increase coverage of order service (currently 72%)
- [ ] Investigate why staging MTTD increased to 4 hours
```
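The summary block lends itself to generation from two weekly snapshots; a sketch where the snapshot fields mirror the report above and are assumptions about your own metrics store:
```python
# Two weekly snapshots, e.g. loaded from your metrics store (illustrative values).
last_week = {"pass_rate": 98.0, "flakiness": 3.4, "coverage": 83.1, "build_min": 20}
this_week = {"pass_rate": 98.5, "flakiness": 2.1, "coverage": 85.2, "build_min": 18}

labels = {
    "pass_rate": ("Test pass rate", "%"),
    "flakiness": ("Flakiness rate", "%"),
    "coverage": ("Coverage", "%"),
    "build_min": ("Build time", " min"),
}

print("## Summary")
for key, (label, unit) in labels.items():
    delta = this_week[key] - last_week[key]
    print(f"- **{label}:** {this_week[key]}{unit} ({delta:+.1f}{unit} from last week)")
```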
---
### Stakeholder Dashboard (Executive View)
**Metrics to show:**
1. **Quality trend (6 months):** Pass rate over time
2. **Velocity impact:** How long does CI take per PR?
3. **Production stability:** Defect escape rate
4. **Recovery time:** MTTR for incidents
**What NOT to show:**
- Absolute test count (vanity metric)
- Lines of code (meaningless)
- Individual developer metrics (creates wrong incentives)
---
## Anti-Patterns Catalog
### ❌ Coverage as the Only Metric
**Symptom:** "We need 100% coverage!"
**Why bad:** Easy to game with meaningless tests
```python
# ❌ BAD: 100% coverage, 0% value
def calculate_tax(amount):
return amount * 0.08
def test_calculate_tax():
calculate_tax(100) # Executes code, asserts nothing!
```
**Fix:** Use coverage + mutation score
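For contrast, a version that keeps the same line coverage but pins the behaviour down, so a mutation tool would catch the assertion-free variant (`pytest.approx` avoids float-comparison noise):
```python
import pytest

def calculate_tax(amount):
    return amount * 0.08

# ✅ BETTER: same coverage, but a mutant that changes the 0.08 rate now fails.
def test_calculate_tax():
    assert calculate_tax(100) == pytest.approx(8.0)
```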
---
### ❌ Tracking Metrics Without Thresholds
**Symptom:** Dashboard shows metrics but no action taken
**Why bad:** Metrics become noise
**Fix:** Every metric needs:
- **Target threshold** (e.g., flakiness < 1%)
- **Alert level** (e.g., alert if flakiness > 5%)
- **Action plan** (e.g., "Fix flaky tests before adding new features")
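One lightweight way to keep target, alert level, and action together is a checked-in mapping that a CI script evaluates; the metric names and numbers below simply mirror this skill's thresholds and are illustrative:
```python
# metric -> (target, alert level, action); metrics where lower is better are flagged separately.
THRESHOLDS = {
    "flakiness_percent": (1.0, 5.0, "Fix flaky tests before adding new features"),
    "pass_rate_percent": (98.0, 95.0, "See the flaky-test-prevention skill"),
    "new_code_coverage_percent": (90.0, 80.0, "Block the PR until new-code coverage recovers"),
}
LOWER_IS_BETTER = {"flakiness_percent"}

def check(metric: str, value: float) -> str:
    target, alert, action = THRESHOLDS[metric]
    worse = (lambda v, limit: v > limit) if metric in LOWER_IS_BETTER else (lambda v, limit: v < limit)
    if worse(value, alert):
        return f"ALERT: {metric}={value} -> {action}"
    if worse(value, target):
        return f"WARN: {metric}={value} misses target {target}"
    return f"OK: {metric}={value}"

print(check("flakiness_percent", 3.2))  # -> WARN
```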
---
### ❌ Optimizing for Metrics, Not Quality
**Symptom:** Gaming metrics to hit targets
**Example:** Removing tests to increase pass rate
**Fix:** Track multiple complementary metrics (pass rate + flakiness + coverage)
---
### ❌ Measuring Individual Developer Productivity
**Symptom:** "Developer A writes more tests than Developer B"
**Why bad:** Creates wrong incentives (quantity over quality)
**Fix:** Measure team-level metrics, not individual ones
---
## Tool Integration
### SonarQube Metrics
**Quality Gate:**
```properties
# sonar-project.properties
sonar.qualitygate.wait=true
# Metrics tracked:
# - Bugs (target: 0)
# - Vulnerabilities (target: 0)
# - Code smells (target: < 100)
# - Coverage (target: > 80%)
# - Duplications (target: < 3%)
```
---
### Codecov Integration
```yaml
# codecov.yml
coverage:
  status:
    project:
      default:
        target: 80%     # Overall coverage target
        threshold: 2%   # Allow a 2% drop
    patch:
      default:
        target: 90%     # New code must have 90% coverage
        threshold: 0%   # No drops allowed
```
---
## Bottom Line
**Track actionable metrics with defined thresholds. Use metrics to drive improvement, not for vanity.**
**Core dashboard:**
- Test pass rate (> 98%)
- Flakiness rate (< 1%)
- Coverage delta on new code (> 90%)
- Build time (< 20 min)
- Defect escape rate (< 2%)
**Weekly actions:**
- Review metrics against thresholds
- Identify trends (improving/degrading)
- Create action items for violations
- Track progress on improvements
**If you're tracking a metric but not acting on it, stop tracking it. Metrics exist to drive action, not to fill dashboards.**