# Comprehensive Testing Guide

**Complete testing strategy for MXCP servers: MXCP tests, Python unit tests, mocking, test databases, and concurrency safety.**

## Two Types of Tests

### 1. MXCP Tests (Integration Tests)

**Purpose**: Test the full tool/resource/prompt as it will be called by LLMs.

**Located**: In tool YAML files under the `tests:` section

**Run with**: `mxcp test`

**Tests**:
- Tool can be invoked with parameters
- Return type matches specification
- Result structure is correct
- Parameter validation works
**Example**:
```yaml
# tools/get_customers.yml
mxcp: 1
tool:
  name: get_customers
  tests:
    - name: "basic_query"
      arguments:
        - key: city
          value: "Chicago"
      result:
        - customer_id: 3
          name: "Bob Johnson"
```
### 2. Python Unit Tests (Isolation Tests)

**Purpose**: Test Python functions in isolation: mocked dependencies, edge cases, and concurrency.

**Located**: In the `tests/` directory (pytest format)

**Run with**: `pytest` or `python -m pytest`

**Tests**:
- Function logic correctness
- Edge cases and error handling
- Mocked external dependencies
- Concurrency safety
- Result correctness verification
**Example**:
```python
# tests/test_api_wrapper.py
import pytest
from python.api_wrapper import fetch_users


@pytest.mark.asyncio
async def test_fetch_users_correctness():
    """Test that fetch_users returns correct structure"""
    result = await fetch_users(limit=5)

    assert "users" in result
    assert "count" in result
    assert result["count"] == 5
    assert len(result["users"]) == 5
    assert all("id" in user for user in result["users"])
```
## When to Use Which Tests

| Scenario | MXCP Tests | Python Unit Tests |
|----------|------------|-------------------|
| SQL-only tool | ✅ Required | ❌ Not applicable |
| Python tool (no external calls) | ✅ Required | ✅ Recommended |
| Python tool (with API calls) | ✅ Required | ✅ **Required** (with mocking) |
| Python tool (with DB writes) | ✅ Required | ✅ **Required** (test DB) |
| Python tool (async/concurrent) | ✅ Required | ✅ **Required** (concurrency tests) |
## Complete Testing Workflow

### Phase 1: MXCP Tests (Always First)

**For every tool, add test cases to the YAML:**
```yaml
tool:
  name: my_tool
  # ... definition ...
  tests:
    - name: "happy_path"
      arguments:
        - key: param1
          value: "test_value"
      result:
        expected_field: "expected_value"

    - name: "edge_case_empty"
      arguments:
        - key: param1
          value: "nonexistent"
      result: []

    - name: "missing_optional_param"
      arguments: []
      # Should work with defaults
```

**Run**:
```bash
mxcp test tool my_tool
```
### Phase 2: Python Unit Tests (For Python Tools)

**Create the test file structure**:
```bash
mkdir -p tests
touch tests/__init__.py
touch tests/test_my_module.py
```
**Write unit tests with pytest**:
```python
# tests/test_my_module.py
import pytest
from python.my_module import my_function


def test_my_function_correctness():
    """Verify correct results"""
    result = my_function("input")
    assert result["key"] == "expected_value"
    assert len(result["items"]) == 5


def test_my_function_edge_cases():
    """Test edge cases"""
    assert my_function("") == {"error": "Empty input"}
    assert my_function(None) == {"error": "Invalid input"}
```

**Run**:
```bash
pytest tests/
# Or with coverage
pytest --cov=python tests/
```
## Testing SQL Tools with Test Database

**CRITICAL**: SQL tools must be tested with real data to verify result correctness.

### Pattern 1: Use dbt Seeds for Test Data
```bash
# 1. Create test data seed
cat > seeds/test_data.csv <<'EOF'
id,name,value
1,test1,100
2,test2,200
3,test3,300
EOF

# 2. Create schema
cat > seeds/schema.yml <<'EOF'
version: 2
seeds:
  - name: test_data
    columns:
      - name: id
        tests: [unique, not_null]
EOF

# 3. Load test data
dbt seed --select test_data

# 4. Create tool with tests
cat > tools/query_test_data.yml <<'EOF'
mxcp: 1
tool:
  name: query_test_data
  parameters:
    - name: min_value
      type: integer
  return:
    type: array
  source:
    code: |
      SELECT * FROM test_data WHERE value >= $min_value
  tests:
    - name: "filter_200"
      arguments:
        - key: min_value
          value: 200
      result:
        - id: 2
          value: 200
        - id: 3
          value: 300
EOF

# 5. Test
mxcp test tool query_test_data
```
### Pattern 2: Create Test Fixtures in SQL

```sql
-- models/test_fixtures.sql
{{ config(materialized='table') }}

-- Create predictable test data
SELECT 1 as id, 'Alice' as name, 100 as score
UNION ALL
SELECT 2 as id, 'Bob' as name, 200 as score
UNION ALL
SELECT 3 as id, 'Charlie' as name, 150 as score
```
```yaml
# tools/top_scores.yml
tool:
  name: top_scores
  source:
    code: |
      SELECT * FROM test_fixtures ORDER BY score DESC LIMIT $limit
  tests:
    - name: "top_2"
      arguments:
        - key: limit
          value: 2
      result:
        - id: 2
          name: "Bob"
          score: 200
        - id: 3
          name: "Charlie"
          score: 150
```
### Pattern 3: Verify Aggregation Correctness

```yaml
# tools/calculate_stats.yml
tool:
  name: calculate_stats
  source:
    code: |
      SELECT
        COUNT(*) as total_count,
        SUM(score) as total_score,
        AVG(score) as avg_score,
        MAX(score) as max_score
      FROM test_fixtures
  tests:
    - name: "verify_aggregations"
      arguments: []
      result:
        - total_count: 3
          total_score: 450
          avg_score: 150.0
          max_score: 200
```

**If aggregations don't match expected values, the SQL logic is WRONG.**
## Testing Python Tools with Mocking

**CRITICAL**: Python tools with external API calls MUST use mocking in tests.

### Pattern 1: Mock HTTP Calls with pytest-httpx

```bash
# Install
pip install pytest-httpx
```
```python
# python/api_client.py
import httpx


async def fetch_external_data(api_key: str, user_id: int) -> dict:
    """Fetch data from external API"""
    async with httpx.AsyncClient() as client:
        response = await client.get(
            f"https://api.example.com/users/{user_id}",
            headers={"Authorization": f"Bearer {api_key}"}
        )
        response.raise_for_status()
        return response.json()
```
```python
# tests/test_api_client.py
import pytest
import httpx
from python.api_client import fetch_external_data


@pytest.mark.asyncio
async def test_fetch_external_data_success(httpx_mock):
    """Test successful API call with mocked response"""
    # Mock the HTTP call
    httpx_mock.add_response(
        url="https://api.example.com/users/123",
        json={"id": 123, "name": "Test User", "email": "test@example.com"}
    )

    # Call function
    result = await fetch_external_data("fake_api_key", 123)

    # Verify correctness
    assert result["id"] == 123
    assert result["name"] == "Test User"
    assert result["email"] == "test@example.com"


@pytest.mark.asyncio
async def test_fetch_external_data_error(httpx_mock):
    """Test API error handling"""
    httpx_mock.add_response(
        url="https://api.example.com/users/999",
        status_code=404,
        json={"error": "User not found"}
    )

    # raise_for_status() should surface the 404 as an exception
    with pytest.raises(httpx.HTTPStatusError):
        await fetch_external_data("fake_api_key", 999)
```
### Pattern 2: Mock Database Calls

```python
# python/db_operations.py
from mxcp.runtime import db


def get_user_orders(user_id: int) -> list[dict]:
    """Get orders for a user"""
    result = db.execute(
        "SELECT * FROM orders WHERE user_id = $user_id",
        {"user_id": user_id}
    )
    return result.fetchall()
```
```python
# tests/test_db_operations.py
from unittest.mock import Mock, MagicMock
from python.db_operations import get_user_orders


def test_get_user_orders(monkeypatch):
    """Test with mocked database"""
    # Create mock result
    mock_result = MagicMock()
    mock_result.fetchall.return_value = [
        {"order_id": 1, "user_id": 123, "amount": 50.0},
        {"order_id": 2, "user_id": 123, "amount": 75.0}
    ]

    # Mock db.execute
    mock_db = Mock()
    mock_db.execute.return_value = mock_result

    # Inject mock
    import python.db_operations
    monkeypatch.setattr(python.db_operations, "db", mock_db)

    # Test
    orders = get_user_orders(123)

    # Verify
    assert len(orders) == 2
    assert orders[0]["order_id"] == 1
    assert sum(o["amount"] for o in orders) == 125.0
```
### Pattern 3: Mock Third-Party Libraries

```python
# python/stripe_wrapper.py
import stripe


def create_customer(email: str, name: str) -> dict:
    """Create Stripe customer"""
    customer = stripe.Customer.create(email=email, name=name)
    return {"id": customer.id, "email": customer.email}
```
```python
# tests/test_stripe_wrapper.py
from unittest.mock import Mock, patch
from python.stripe_wrapper import create_customer


@patch('stripe.Customer.create')
def test_create_customer(mock_create):
    """Test Stripe customer creation with mock"""
    # Mock Stripe response
    mock_customer = Mock()
    mock_customer.id = "cus_test123"
    mock_customer.email = "test@example.com"
    mock_create.return_value = mock_customer

    # Call function
    result = create_customer("test@example.com", "Test User")

    # Verify correctness
    assert result["id"] == "cus_test123"
    assert result["email"] == "test@example.com"

    # Verify Stripe was called correctly
    mock_create.assert_called_once_with(
        email="test@example.com",
        name="Test User"
    )
```
## Result Correctness Verification

**CRITICAL**: Tests must verify results are CORRECT, not just that code doesn't crash.

### Bad Test (Only checks structure):
```python
def test_calculate_total_bad():
    result = calculate_total([10, 20, 30])
    assert "total" in result  # ❌ Doesn't verify correctness
```

### Good Test (Verifies correct value):
```python
def test_calculate_total_good():
    result = calculate_total([10, 20, 30])
    assert result["total"] == 60      # ✅ Verifies correct calculation
    assert result["count"] == 3       # ✅ Verifies correct count
    assert result["average"] == 20.0  # ✅ Verifies correct average
```
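For reference, a minimal `calculate_total` that would satisfy the good test might look like this (a sketch only; the real function's name and signature are assumed from the tests):

```python
def calculate_total(values: list[float]) -> dict:
    """Return total, count, and average for a list of numbers."""
    total = sum(values)
    count = len(values)
    return {
        "total": total,
        "count": count,
        # Defined behavior for empty input instead of ZeroDivisionError
        "average": total / count if count else 0.0,
    }

result = calculate_total([10, 20, 30])
print(result)  # {'total': 60, 'count': 3, 'average': 20.0}
```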
### Pattern: Test Edge Cases for Correctness

```python
def test_aggregation_correctness():
    """Test various aggregations for correctness"""
    data = [
        {"id": 1, "value": 100},
        {"id": 2, "value": 200},
        {"id": 3, "value": 150}
    ]

    result = aggregate_data(data)

    # Verify each aggregation
    assert result["sum"] == 450    # 100 + 200 + 150
    assert result["avg"] == 150.0  # 450 / 3
    assert result["min"] == 100
    assert result["max"] == 200
    assert result["count"] == 3

    # Verify derived values
    assert result["range"] == 100  # 200 - 100
    assert result["median"] == 150


def test_empty_data_correctness():
    """Test edge case: empty data"""
    result = aggregate_data([])

    assert result["sum"] == 0
    assert result["avg"] == 0.0
    assert result["count"] == 0
    # Ensure no crashes, correct behavior for empty data
```
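An `aggregate_data` that passes both tests could be sketched with the stdlib `statistics` module (the function name, field names, and empty-input defaults are assumptions taken from the tests above):

```python
import statistics

def aggregate_data(data: list[dict]) -> dict:
    """Compute basic aggregations over the 'value' field of each row."""
    values = [row["value"] for row in data]
    if not values:
        # Defined, crash-free behavior for empty input
        return {"sum": 0, "avg": 0.0, "count": 0,
                "min": None, "max": None, "range": None, "median": None}
    return {
        "sum": sum(values),
        "avg": sum(values) / len(values),
        "min": min(values),
        "max": max(values),
        "count": len(values),
        "range": max(values) - min(values),
        "median": statistics.median(values),
    }

stats = aggregate_data([{"id": 1, "value": 100},
                        {"id": 2, "value": 200},
                        {"id": 3, "value": 150}])
print(stats["median"])  # 150
```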
## Concurrency Safety for Python Tools

**CRITICAL**: MXCP tools run as a server - multiple requests can happen simultaneously.

### Common Concurrency Issues

#### ❌ WRONG: Global State with Race Conditions
```python
# python/unsafe_counter.py
counter = 0  # ❌ DANGER: Race condition!


def increment_counter() -> dict:
    global counter
    counter += 1  # ❌ Not thread-safe!
    return {"count": counter}

# Two simultaneous requests could both read counter=5,
# both increment to 6, both write 6 -> one increment lost!
```
#### ✅ CORRECT: Use Thread-Safe Approaches

**Option 1: Avoid shared state (stateless)**
```python
# python/safe_stateless.py
def process_request(data: dict) -> dict:
    """Completely stateless - safe for concurrent calls"""
    result = compute_something(data)
    return {"result": result}

# No global state, no problem!
```
**Option 2: Use thread-safe structures**
```python
# python/safe_with_lock.py
import threading

counter_lock = threading.Lock()
counter = 0


def increment_counter() -> dict:
    global counter
    with counter_lock:  # ✅ Thread-safe
        counter += 1
        current = counter
    return {"count": current}
```
**Option 3: Encapsulate state in a thread-safe class**
```python
# python/safe_atomic.py
from threading import Lock


# Thread-safe counter
class SafeCounter:
    def __init__(self):
        self._value = 0
        self._lock = Lock()

    def increment(self):
        with self._lock:
            self._value += 1
            return self._value


counter = SafeCounter()


def increment_counter() -> dict:
    return {"count": counter.increment()}
```
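A quick stdlib stress check shows the lock doing its job: with 10 threads each incrementing 1,000 times, the final count must be exactly 10,000 - any lost update would produce a smaller number.

```python
import threading

class SafeCounter:
    """Lock-protected counter (same shape as the Option 3 class)."""
    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:
            self._value += 1
            return self._value

counter = SafeCounter()

def worker():
    for _ in range(1000):
        counter.increment()

threads = [threading.Thread(target=worker) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter._value)  # 10000 - no increments were lost
```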
### Concurrency-Safe Patterns

#### Pattern 1: Database as State (DuckDB is thread-safe)

```python
# python/db_counter.py
from mxcp.runtime import db


def increment_counter() -> dict:
    """Use database for state - thread-safe"""
    db.execute("""
        CREATE TABLE IF NOT EXISTS counter (
            id INTEGER PRIMARY KEY,
            value INTEGER
        )
    """)

    db.execute("""
        INSERT INTO counter (id, value) VALUES (1, 1)
        ON CONFLICT(id) DO UPDATE SET value = value + 1
    """)

    result = db.execute("SELECT value FROM counter WHERE id = 1")
    return {"count": result.fetchone()["value"]}
```
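The same upsert-as-increment idea can be tried outside MXCP with the stdlib `sqlite3` module - this illustrates the SQL pattern only, not the MXCP `db` API:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS counter (id INTEGER PRIMARY KEY, value INTEGER)"
)

def increment_counter() -> dict:
    with conn:  # one transaction per increment
        conn.execute(
            "INSERT INTO counter (id, value) VALUES (1, 1) "
            "ON CONFLICT(id) DO UPDATE SET value = value + 1"
        )
    row = conn.execute("SELECT value FROM counter WHERE id = 1").fetchone()
    return {"count": row[0]}

for _ in range(3):
    result = increment_counter()
print(result)  # {'count': 3}
```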
#### Pattern 2: Local Variables Only (Immutable)

```python
# python/safe_processing.py
async def process_data(input_data: list[dict]) -> dict:
    """Local variables only - safe for concurrent calls"""
    # All state is local to this function call
    results = []
    total = 0

    for item in input_data:
        processed = transform(item)  # Pure function
        results.append(processed)
        total += processed["value"]

    return {
        "results": results,
        "total": total,
        "count": len(results)
    }

# When the function returns, all state is discarded
```
#### Pattern 3: Async/Await (Concurrent, Not Parallel)

```python
# python/safe_async.py
import asyncio
import httpx


async def fetch_multiple_users(user_ids: list[int]) -> list[dict]:
    """Concurrent API calls - safe with async"""

    async def fetch_one(user_id: int) -> dict:
        # Each call has its own context - no shared state
        async with httpx.AsyncClient() as client:
            response = await client.get(f"https://api.example.com/users/{user_id}")
            return response.json()

    # Run concurrently, but each fetch_one is independent
    results = await asyncio.gather(*[fetch_one(uid) for uid in user_ids])
    return results
```
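The fan-out behavior is easy to exercise without a real API - here with a stubbed `fetch_one` that simulates latency. Note that `asyncio.gather` returns results in the order the awaitables were passed, not the order they finished:

```python
import asyncio

async def fetch_one(user_id: int) -> dict:
    # Stub standing in for a real HTTP call
    await asyncio.sleep(0.01 * (3 - user_id))  # later IDs finish first
    return {"id": user_id}

async def fetch_multiple_users(user_ids: list[int]) -> list[dict]:
    return await asyncio.gather(*[fetch_one(uid) for uid in user_ids])

users = asyncio.run(fetch_multiple_users([1, 2, 3]))
print([u["id"] for u in users])  # [1, 2, 3] - input order preserved
```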
### Testing Concurrency Safety

```python
# tests/test_concurrency.py
import pytest
import asyncio
from python.my_module import concurrent_function, my_function


@pytest.mark.asyncio
async def test_concurrent_calls_no_race_condition():
    """Test that concurrent calls don't have race conditions"""

    # Run function 100 times concurrently
    tasks = [concurrent_function(i) for i in range(100)]
    results = await asyncio.gather(*tasks)

    # Verify all calls succeeded
    assert len(results) == 100

    # Verify no data corruption
    assert all(isinstance(r, dict) for r in results)

    # If function has a counter, verify correctness
    # (e.g., if each call increments, final count should be 100)


def test_parallel_execution_thread_safe():
    """Test with actual threading"""
    import threading

    results = []
    errors = []

    def worker(n):
        try:
            result = my_function(n)
            results.append(result)
        except Exception as e:
            errors.append(e)

    # Create 50 threads
    threads = [threading.Thread(target=worker, args=(i,)) for i in range(50)]

    # Start all threads
    for t in threads:
        t.start()

    # Wait for completion
    for t in threads:
        t.join()

    # Verify
    assert len(errors) == 0, f"Errors occurred: {errors}"
    assert len(results) == 50
```
## Complete Testing Checklist

### For SQL Tools:

- [ ] MXCP test cases in YAML
- [ ] Test with real seed data
- [ ] Verify result correctness (exact values)
- [ ] Test edge cases (empty results, NULL values)
- [ ] Test filters work correctly
- [ ] Test aggregations are mathematically correct
- [ ] Test with `dbt test` for data quality

### For Python Tools (No External Calls):

- [ ] MXCP test cases in YAML
- [ ] Python unit tests (pytest)
- [ ] Verify result correctness
- [ ] Test edge cases (empty input, NULL, invalid)
- [ ] Test error handling
- [ ] Test concurrency safety (if using shared state)

### For Python Tools (With External API Calls):

- [ ] MXCP test cases in YAML
- [ ] Python unit tests with mocking (pytest + httpx_mock)
- [ ] Mock all external API calls
- [ ] Test success path with mocked responses
- [ ] Test error cases (404, 500, timeout)
- [ ] Verify correct API parameters
- [ ] Test result correctness
- [ ] Test concurrency (multiple simultaneous calls)

### For Python Tools (With Database Operations):

- [ ] MXCP test cases in YAML
- [ ] Python unit tests
- [ ] Use test fixtures/seed data
- [ ] Verify query results correctness
- [ ] Test transactions (if applicable)
- [ ] Test concurrency (DuckDB is thread-safe)
- [ ] Clean up test data after tests
## Project Structure for Testing

```
project/
├── mxcp-site.yml
├── tools/
│   └── my_tool.yml          # Contains MXCP tests
├── python/
│   └── my_module.py         # Python code
├── tests/
│   ├── __init__.py
│   ├── test_my_module.py    # Python unit tests
│   ├── conftest.py          # pytest fixtures
│   └── fixtures/
│       └── test_data.json   # Test data
├── seeds/
│   ├── test_data.csv        # Test database seeds
│   └── schema.yml
└── requirements.txt         # Include: pytest, pytest-asyncio, pytest-httpx, pytest-cov
```
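The `conftest.py` in that tree could hold shared fixtures - for example, loading the JSON test data once per session. This is a sketch; the fixture name and file layout are assumptions:

```python
# tests/conftest.py
import json
import pathlib

import pytest


@pytest.fixture(scope="session")
def sample_data():
    """Shared test data loaded from tests/fixtures/test_data.json"""
    path = pathlib.Path(__file__).parent / "fixtures" / "test_data.json"
    return json.loads(path.read_text())
```

Any test in `tests/` can then declare a `sample_data` parameter and pytest injects the loaded data automatically.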
## Running Tests

```bash
# 1. MXCP tests (always run first)
mxcp validate   # Structure validation
mxcp test       # Integration tests

# 2. dbt tests (if using dbt)
dbt test

# 3. Python unit tests
pytest tests/ -v

# 4. With coverage report
pytest tests/ --cov=python --cov-report=html

# 5. Concurrency stress test (--count requires the pytest-repeat plugin)
pytest tests/test_concurrency.py -v --count=100

# All together
mxcp validate && mxcp test && dbt test && pytest tests/ -v
```
## Summary

**Both types of tests are required**:

1. **MXCP tests** - Verify tools work end-to-end
2. **Python unit tests** - Verify logic, mocking, correctness, concurrency

**Key principles**:
- ✅ **Mock all external calls** - Use pytest-httpx, unittest.mock
- ✅ **Verify result correctness** - Don't just check structure
- ✅ **Use test databases** - SQL tools need real data
- ✅ **Test concurrency** - Tools run as servers
- ✅ **Avoid global mutable state** - Use stateless patterns or locks
- ✅ **Test edge cases** - Empty data, NULL, invalid input

**Before declaring a project done, BOTH test types must pass completely.**