Initial commit

2025-11-30 08:59:43 +08:00
commit 966ef521f7
25 changed files with 9763 additions and 0 deletions
--- a/skills/fuzz-testing/SKILL.md
+++ b/skills/fuzz-testing/SKILL.md
@@ -0,0 +1,445 @@
+---
+name: fuzz-testing
+description: Use when testing input validation, discovering edge cases, finding security vulnerabilities, testing parsers/APIs with random inputs, or integrating fuzzing tools (AFL, libFuzzer, Atheris) - provides fuzzing strategies, tool selection, and crash triage workflows
+---
+
+# Fuzz Testing
+
+## Overview
+
+**Core principle:** Fuzz testing feeds random/malformed inputs to find crashes, hangs, and security vulnerabilities that manual tests miss.
+
+**Rule:** Fuzzing finds bugs you didn't know to test for. Use it for security-critical code (parsers, validators, APIs).
+
+## Fuzz Testing vs Other Testing
+
+| Test Type | Input | Goal |
+|-----------|-------|------|
+| **Unit Testing** | Known valid/invalid inputs | Verify expected behavior |
+| **Property-Based Testing** | Generated valid inputs | Verify invariants hold |
+| **Fuzz Testing** | Random/malformed inputs | Find crashes, hangs, memory issues |
+
+**Fuzzing finds:** Buffer overflows, null pointer dereferences, infinite loops, unhandled exceptions
+
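+**Fuzzing does NOT find (on its own):** Logic bugs that return a wrong result without crashing (unless the harness asserts an invariant), gradual performance regressions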
+
+---
+
+## When to Use Fuzz Testing
+
+**Good candidates:**
+- Input parsers (JSON, XML, CSV, binary formats)
+- Network protocol handlers
+- Image/video codecs
+- Cryptographic functions
+- User input validators (file uploads, form data)
+- APIs accepting untrusted data
+
+**Poor candidates:**
+- Business logic (use property-based testing)
+- UI interactions (use E2E tests)
+- Database queries (use integration tests)
+
+---
+
+## Tool Selection
+
+| Tool | Language | Type | When to Use |
+|------|----------|------|-------------|
+| **Atheris** | Python | Coverage-guided | Python applications, libraries |
+| **AFL (American Fuzzy Lop)** | C/C++ | Coverage-guided | Native code, high performance |
+| **libFuzzer** | C/C++/Rust | Coverage-guided | Integrated with LLVM/Clang |
+| **Jazzer** | Java/JVM | Coverage-guided | Java applications |
+| **go-fuzz** / `go test -fuzz` | Go | Coverage-guided | Go applications (native fuzzing is built into Go 1.18+) |
+
+**Coverage-guided:** Tracks which code paths are executed, generates inputs to explore new paths
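+
+To make the feedback loop concrete, here is a minimal standard-library sketch of coverage guidance. It is not how Atheris or AFL are implemented; `target`, `run_with_coverage`, `mutate`, and `fuzz` are hypothetical names, and line tracing via `sys.settrace` stands in for real instrumentation.
+
+```python
+import random
+import sys
+
+def target(data: bytes) -> None:
+    """Hypothetical target: fails only on inputs that start with b"FUZ"."""
+    if len(data) > 0 and data[0] == ord("F"):
+        if len(data) > 1 and data[1] == ord("U"):
+            if len(data) > 2 and data[2] == ord("Z"):
+                raise RuntimeError("bug reached")
+
+def run_with_coverage(data: bytes):
+    """Run target once; return (executed line numbers, exception or None)."""
+    lines = set()
+
+    def tracer(frame, event, arg):
+        # Only trace the target's own frame
+        if frame.f_code is target.__code__:
+            if event == "line":
+                lines.add(frame.f_lineno)
+            return tracer
+        return None
+
+    sys.settrace(tracer)
+    try:
+        target(data)
+        return lines, None
+    except RuntimeError as exc:
+        return lines, exc
+    finally:
+        sys.settrace(None)
+
+def mutate(data: bytes, rng: random.Random) -> bytes:
+    """Overwrite one random byte, or append one."""
+    buf = bytearray(data)
+    if buf and rng.random() < 0.5:
+        buf[rng.randrange(len(buf))] = rng.randrange(256)
+    else:
+        buf.append(rng.randrange(256))
+    return bytes(buf)
+
+def fuzz(max_iters: int = 200_000, seed: int = 0):
+    """Return the first crashing input, or None if the budget runs out."""
+    rng = random.Random(seed)
+    corpus = [b""]
+    seen = set()
+    for _ in range(max_iters):
+        candidate = mutate(rng.choice(corpus), rng)
+        lines, exc = run_with_coverage(candidate)
+        if exc is not None:
+            return candidate
+        if not lines <= seen:  # reached a new line: keep this input
+            seen |= lines
+            corpus.append(candidate)
+    return None
+```
+
+Each kept input is a stepping stone: an input starting with `F` unlocks the second branch, and mutating it is far more likely to reach `FU` than guessing all three bytes at once.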
+
+---
+
+## Basic Fuzzing Example (Python + Atheris)
+
+### Installation
+
+```bash
+pip install atheris
+```
+
+---
+
+### Simple Fuzz Test
+
+```python
+import atheris
+import sys
+
+@atheris.instrument_func
+def parse_email(data):
+    """Function under test: split an address into (local, domain)."""
+    if "@" not in data:
+        raise ValueError("Invalid email")
+
+    local, domain = data.split("@", 1)
+
+    if "." not in domain:
+        raise ValueError("Invalid domain")
+
+    # BUG: inputs with more than one "@" are accepted silently.
+    # "user@@example.com" → returns ("user", "@example.com")
+
+    return (local, domain)
+
+@atheris.instrument_func
+def TestOneInput(data):
+    """Fuzz harness - called repeatedly with fuzzer-generated bytes."""
+    try:
+        local, domain = parse_email(data.decode('utf-8', errors='ignore'))
+    except ValueError:
+        # Expected rejection of invalid input - not a crash
+        return
+    # Invariant: a parsed domain never contains "@".
+    # Any uncaught exception (including this AssertionError) is reported as a crash.
+    assert "@" not in domain, f"accepted multiple @: {domain!r}"
+
+atheris.Setup(sys.argv, TestOneInput)
+atheris.Fuzz()
+```
+
+**Run:**
+```bash
+python fuzz_email.py
+```
+
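+**Output (simplified for illustration; real libFuzzer output is formatted differently, and the crashing input is also written to a `crash-<hash>` file):**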
+```
+INFO: Seed: 1234567890
+INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
+#1: NEW coverage: 10 exec/s: 1000
+#100: NEW coverage: 15 exec/s: 5000
+CRASH: input was 'user@@example.com'
+```
+
+---
+
+## Advanced Fuzzing Patterns
+
+### Structured Fuzzing (JSON)
+
+**Problem:** Random bytes rarely form valid JSON, so most inputs are rejected by `json.loads` and never reach the handler under test
+
+```python
+import atheris
+import json
+
+@atheris.instrument_func
+def TestOneInput(data):
+    try:
+        # Parse as JSON
+        obj = json.loads(data.decode('utf-8', errors='ignore'))
+
+        # Fuzz your JSON handler
+        process_user_data(obj)
+    except (json.JSONDecodeError, KeyError, ValueError):
+        pass  # Expected: invalid JSON, missing keys, rejected values
+
+def process_user_data(data):
+    """Rejects {"name": "", "age": -1} cleanly; crashes with TypeError on [] or {"name": 1, "age": 0}."""
+    if len(data["name"]) == 0:
+        raise ValueError("Name cannot be empty")
+    if data["age"] < 0:
+        raise ValueError("Age cannot be negative")
+```
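+
+One way to get past the parser is to build the structure from the fuzzer's bytes instead of hoping the bytes parse. Atheris ships `atheris.FuzzedDataProvider` for this; the sketch below is a hand-rolled standard-library stand-in, and `bytes_to_user` and `structured_input` are hypothetical names.
+
+```python
+import json
+
+def bytes_to_user(data: bytes) -> dict:
+    """Carve a user record out of raw fuzzer bytes."""
+    if len(data) < 2:
+        return {"name": "", "age": 0}
+    age = data[0] - 128        # maps 0..255 to -128..127, so negatives occur
+    name_len = data[1] % 16    # keep names short
+    name = data[2:2 + name_len].decode("utf-8", errors="ignore")
+    return {"name": name, "age": age}
+
+def structured_input(data: bytes) -> str:
+    """Every byte string becomes a syntactically valid JSON document."""
+    return json.dumps(bytes_to_user(data))
+```
+
+Because every input now parses, the fuzzer spends its time on the handler's validation logic. The trade-off is that the JSON parser itself is no longer exercised with malformed text.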
+
+---
+
+### Fuzzing with Corpus (Seed Inputs)
+
+**Corpus:** Collection of valid inputs to start from
+
+```python
+import atheris
+import sys
+import os
+
+# Seed corpus: Valid examples
+CORPUS_DIR = "./corpus"
+os.makedirs(CORPUS_DIR, exist_ok=True)
+
+# Create seed files
+with open(f"{CORPUS_DIR}/valid1.txt", "wb") as f:
+    f.write(b"user@example.com")
+with open(f"{CORPUS_DIR}/valid2.txt", "wb") as f:
+    f.write(b"alice+tag@subdomain.example.org")
+
+@atheris.instrument_func
+def TestOneInput(data):
+    try:
+        parse_email(data.decode('utf-8'))
+    except ValueError:
+        pass
+
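+# Setup() has no corpus parameter; libFuzzer reads corpus directories
+# from the command line: python fuzz_email.py ./corpus
+atheris.Setup(sys.argv, TestOneInput)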
+atheris.Fuzz()
+```
+
+**Benefits:** Faster convergence to interesting inputs
+
+---
+
+## Crash Triage Workflow
+
+### 1. Reproduce Crash
+
+```bash
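+# libFuzzer (which Atheris wraps) saves the crashing input as ./crash-<hash>.
+# To recreate it by hand, write the bytes without a trailing newline:
+printf '%s' 'user@@example.com' > crash.txt
+
+# Replay that single input through the harness
+python fuzz_email.py crash.txt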
+```
+
+---
+
+### 2. Minimize Input
+
+**Find smallest input that triggers crash:**
+
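+```bash
+# Original:  "user@@example.com" (17 bytes)
+# Minimized: "@@." (3 bytes: two "@" plus the "." that the domain check requires)
+
+# Replaying a crash file reproduces the crash; it does not shrink the input.
+# libFuzzer's -minimize_crash=1 flag shrinks inputs for native targets; treat
+# its behavior under Atheris as unverified and fall back to shrinking by hand.
+# Re-run after each manual shrink to confirm the input still crashes:
+python fuzz_email.py crash.txt
+```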
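+
+Shrinking by hand can be automated with a greedy delete-and-retest loop. This is a minimal sketch, not the algorithm any particular fuzzer uses; `minimize` and `triggers_bug` are hypothetical names, and `triggers_bug` is a stand-in predicate for "the parser accepts this input even though the domain still contains an @".
+
+```python
+def triggers_bug(data: bytes) -> bool:
+    """Stand-in check: the input is accepted but the domain contains '@'."""
+    text = data.decode("utf-8", errors="ignore")
+    if "@" not in text:
+        return False
+    local, domain = text.split("@", 1)
+    if "." not in domain:
+        return False
+    return "@" in domain
+
+def minimize(data: bytes, still_fails) -> bytes:
+    """Greedily delete chunks while still_fails(candidate) stays true."""
+    chunk = max(len(data) // 2, 1)
+    while chunk >= 1:
+        i = 0
+        shrunk = False
+        while i < len(data):
+            candidate = data[:i] + data[i + chunk:]
+            if still_fails(candidate):
+                data = candidate   # keep the smaller input, retry same offset
+                shrunk = True
+            else:
+                i += chunk         # this chunk is needed, move on
+        if not shrunk:
+            chunk //= 2            # nothing removable at this size
+    return data
+
+print(minimize(b"user@@example.com", triggers_bug))  # b'@@.'
+```
+
+The loop only terminates at a local minimum: no single byte can be removed, but a different deletion order might reach another equally small input.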
+
+---
+
+### 3. Root Cause Analysis
+
+```python
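+def parse_email(data):
+    # Failing input: data = "user@@example.com"
+    if "@" not in data:
+        raise ValueError("Invalid email")
+
+    # FIX: validate before splitting
+    if data.count("@") != 1:
+        raise ValueError("Email must have exactly one @")
+
+    # Without the check above, split("@", 1) yields
+    # local = "user", domain = "@example.com"; the domain contains ".",
+    # so the address was accepted with a stray "@" in it.
+    local, domain = data.split("@", 1)
+
+    if "." not in domain:
+        raise ValueError("Invalid domain")
+
+    return (local, domain)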
+
+---
+
+### 4. Write Regression Test
+
+```python
+import pytest
+
+def test_email_multiple_at_symbols():
+    """Regression test for fuzz-found bug."""
+    with pytest.raises(ValueError, match="exactly one @"):
+        parse_email("user@@example.com")
+```
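+
+Saved crash files can double as a regression suite. The helper below is a sketch under assumptions: `replay_crashes` is a hypothetical name, and it assumes crash inputs are kept in one directory using the `crash-*` naming that the CI upload step in this document also relies on.
+
+```python
+from pathlib import Path
+
+def replay_crashes(target, crash_dir):
+    """Run target on every saved crash file; return the ones that still fail."""
+    failures = []
+    for path in sorted(Path(crash_dir).glob("crash-*")):
+        try:
+            target(path.read_bytes())
+        except Exception as exc:
+            # Record which file failed and how, instead of stopping at the first
+            failures.append((path.name, type(exc).__name__))
+    return failures
+```
+
+A test can then assert that `replay_crashes(TestOneInput, "crashes/")` returns an empty list, so a fixed bug that resurfaces fails the build immediately.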
+
+---
+
+## Integration with CI/CD
+
+### Continuous Fuzzing (GitHub Actions)
+
+```yaml
+# .github/workflows/fuzz.yml
+name: Fuzz Testing
+
+on:
+  schedule:
+    - cron: '0 2 * * *'  # Nightly at 2 AM
+  workflow_dispatch:
+
+jobs:
+  fuzz:
+    runs-on: ubuntu-latest
+    timeout-minutes: 60  # Hard cap for the whole job
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.11'
+
+      - name: Install dependencies
+        run: pip install atheris
+
+      - name: Run fuzzing
+        # -max_total_time is a libFuzzer flag: stop cleanly after 50 minutes.
+        # A crash exits non-zero, which fails the step and triggers the upload.
+        run: python fuzz_email.py -max_total_time=3000
+
+      - name: Upload crashes
+        if: failure()
+        uses: actions/upload-artifact@v4
+        with:
+          name: fuzz-crashes
+          path: crash-*
+```
+
+**Why nightly:** Fuzzing is CPU-intensive, not suitable for every PR
+
+---
+
+## AFL (C/C++) Example
+
+### Installation
+
+```bash
+# Ubuntu/Debian
+sudo apt-get install afl++
+
+# macOS
+brew install afl++
+```
+
+---
+
+### Fuzz Target
+
+```c
+// fuzz_target.c
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+void parse_command(const char *input) {
+    char buffer[64];
+
+    // BUG: Buffer overflow if input is 64 bytes or longer (strcpy also writes the terminating NUL)!
+    strcpy(buffer, input);
+
+    if (strcmp(buffer, "exit") == 0) {
+        exit(0);
+    }
+}
+
+int main(int argc, char **argv) {
+    if (argc < 2) return 1;
+
+    FILE *f = fopen(argv[1], "rb");
+    if (!f) return 1;
+
+    char buffer[1024];
+    size_t len = fread(buffer, 1, sizeof(buffer) - 1, f);  // leave room for the terminator
+    fclose(f);
+
+    buffer[len] = '\0';
+    parse_command(buffer);
+
+    return 0;
+}
+```
+
+---
+
+### Compile and Run
+
+```bash
+# Compile with AFL instrumentation
+afl-gcc fuzz_target.c -o fuzz_target
+
+# Create corpus directory
+mkdir -p corpus
+echo "exit" > corpus/input1.txt
+
+# Run fuzzer
+afl-fuzz -i corpus -o findings -- ./fuzz_target @@
+```
+
+**Output:**
+```
+american fuzzy lop 4.00a
+  path : findings/queue
+  crashes : 1
+  hangs : 0
+  execs done : 1000000
+```
+
+**Crashes found in:** `findings/default/crashes/` with AFL++ (classic AFL writes to `findings/crashes/`)
+
+---
+
+## Anti-Patterns Catalog
+
+### ❌ Fuzzing Without Sanitizers
+
+**Symptom:** Memory bugs don't crash, just corrupt silently
+
+**Fix:** Compile with AddressSanitizer (ASan)
+
+```bash
+# C/C++: Compile with ASan
+afl-gcc -fsanitize=address fuzz_target.c -o fuzz_target
+
+# Python: pure-Python code is memory-safe, so ASan only matters when
+# fuzzing native extensions; see the Atheris docs for that setup
+```
+
+**What ASan catches:** Buffer overflows, use-after-free, memory leaks
+
+---
+
+### ❌ Ignoring Hangs
+
+**Symptom:** Fuzzer reports hangs, not investigated
+
+**What hangs mean:** Infinite loops, algorithmic complexity attacks
+
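+**Fix:** Investigate each hang, and enforce a per-input time budget so slow inputs are reported instead of swallowed (libFuzzer's `-timeout=<seconds>` flag does the same at the fuzzer level)
+
+```python
+import signal
+import atheris
+
+def timeout_handler(signum, frame):
+    raise TimeoutError("Operation timed out")
+
+# SIGALRM is Unix-only; install the handler once, not on every input
+signal.signal(signal.SIGALRM, timeout_handler)
+
+@atheris.instrument_func
+def TestOneInput(data):
+    signal.alarm(1)  # 1-second budget per input
+
+    try:
+        parse_data(data.decode('utf-8'))  # parse_data: the function under test
+    except ValueError:
+        pass  # Expected rejection; TimeoutError is NOT caught, so hangs surface as crashes
+    finally:
+        signal.alarm(0)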
+```
+
+---
+
+### ❌ No Regression Tests
+
+**Symptom:** Same bugs found repeatedly
+
+**Fix:** Add regression test for every crash
+
+```python
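+# After fuzzing finds a crash on a multiple-@ input
+import pytest
+
+def test_regression_double_at():
+    # match= pins the specific rejection; "@@" already raised "Invalid domain"
+    # before the fix, so a bare ValueError check would pass on the buggy code
+    with pytest.raises(ValueError, match="exactly one @"):
+        parse_email("@@")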
+```
+
+---
+
+## Bottom Line
+
+**Fuzz testing finds crashes and security vulnerabilities by feeding random/malformed inputs. Use it for security-critical code (parsers, validators, APIs).**
+
+**Setup:**
+- Use Atheris (Python), AFL (C/C++), or language-specific fuzzer
+- Start with corpus (valid examples)
+- Run nightly in CI (1-24 hours)
+
+**Workflow:**
+1. Fuzzer finds crash
+2. Minimize crashing input
+3. Root cause analysis
+4. Fix bug
+5. Add regression test
+
+**If your code accepts untrusted input (files, network data, user input), you should be fuzzing it. Fuzzing finds bugs that manual testing misses.**