Comprehensive Testing Guide

Complete testing strategy for MXCP servers: MXCP tests, Python unit tests, mocking, test databases, and concurrency safety.

Two Types of Tests

1. MXCP Tests (Integration Tests)

Purpose: Test the full tool/resource/prompt as it will be called by LLMs.

Located: In tool YAML files under tests: section

Run with: mxcp test

Tests:

  • Tool can be invoked with parameters
  • Return type matches specification
  • Result structure is correct
  • Parameter validation works

Example:

# tools/get_customers.yml
mxcp: 1
tool:
  name: get_customers
  tests:
    - name: "basic_query"
      arguments:
        - key: city
          value: "Chicago"
      result:
        - customer_id: 3
          name: "Bob Johnson"

2. Python Unit Tests (Isolation Tests)

Purpose: Test Python functions in isolation with mocking, edge cases, concurrency.

Located: In tests/ directory (pytest format)

Run with: pytest or python -m pytest

Tests:

  • Function logic correctness
  • Edge cases and error handling
  • Mocked external dependencies
  • Concurrency safety
  • Result correctness verification

Example:

# tests/test_api_wrapper.py
import pytest
from python.api_wrapper import fetch_users

@pytest.mark.asyncio
async def test_fetch_users_correctness():
    """Test that fetch_users returns correct structure"""
    result = await fetch_users(limit=5)

    assert "users" in result
    assert "count" in result
    assert result["count"] == 5
    assert len(result["users"]) == 5
    assert all("id" in user for user in result["users"])

When to Use Which Tests

Scenario                         | MXCP Tests | Python Unit Tests
---------------------------------|------------|------------------------------
SQL-only tool                    | Required   | Not applicable
Python tool (no external calls)  | Required   | Recommended
Python tool (with API calls)     | Required   | Required (with mocking)
Python tool (with DB writes)     | Required   | Required (test DB)
Python tool (async/concurrent)   | Required   | Required (concurrency tests)

Complete Testing Workflow

Phase 1: MXCP Tests (Always First)

For every tool, add test cases to YAML:

tool:
  name: my_tool
  # ... definition ...
  tests:
    - name: "happy_path"
      arguments:
        - key: param1
          value: "test_value"
      result:
        expected_field: "expected_value"

    - name: "edge_case_empty"
      arguments:
        - key: param1
          value: "nonexistent"
      result: []

    - name: "missing_optional_param"
      arguments: []
      # Should work with defaults

Run:

mxcp test tool my_tool

Phase 2: Python Unit Tests (For Python Tools)

Create test file structure:

mkdir -p tests
touch tests/__init__.py
touch tests/test_my_module.py

Write unit tests with pytest:

# tests/test_my_module.py
import pytest
from python.my_module import my_function

def test_my_function_correctness():
    """Verify correct results"""
    result = my_function("input")
    assert result["key"] == "expected_value"
    assert len(result["items"]) == 5

def test_my_function_edge_cases():
    """Test edge cases"""
    assert my_function("") == {"error": "Empty input"}
    assert my_function(None) == {"error": "Invalid input"}

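Where several edge cases share the same expected shape, pytest.mark.parametrize keeps them in one test. A minimal sketch, reusing the hypothetical my_function above (the whitespace-only case is an assumed behavior, not from the original):

# tests/test_my_module.py (continued)
@pytest.mark.parametrize("raw_input,expected", [
    ("", {"error": "Empty input"}),      # empty string rejected
    (None, {"error": "Invalid input"}),  # None rejected
    ("   ", {"error": "Empty input"}),   # assumption: whitespace-only treated like empty
])
def test_my_function_rejects_bad_input(raw_input, expected):
    assert my_function(raw_input) == expected
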
Run:

pytest tests/
# Or with coverage
pytest --cov=python tests/
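
Async tests marked with @pytest.mark.asyncio need the pytest-asyncio plugin. A minimal pytest.ini sketch that configures it project-wide (the coverage default is optional):

# pytest.ini
[pytest]
# "auto" runs async def tests without requiring an explicit @pytest.mark.asyncio marker
asyncio_mode = auto
testpaths = tests
# optional defaults: verbose output and coverage over the python/ package
addopts = -v --cov=python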

Testing SQL Tools with Test Database

CRITICAL: SQL tools must be tested with real data to verify result correctness.

Pattern 1: Use dbt Seeds for Test Data

# 1. Create test data seed
cat > seeds/test_data.csv <<'EOF'
id,name,value
1,test1,100
2,test2,200
3,test3,300
EOF

# 2. Create schema
cat > seeds/schema.yml <<'EOF'
version: 2
seeds:
  - name: test_data
    columns:
      - name: id
        tests: [unique, not_null]
EOF

# 3. Load test data
dbt seed --select test_data

# 4. Create tool with tests
cat > tools/query_test_data.yml <<'EOF'
mxcp: 1
tool:
  name: query_test_data
  parameters:
    - name: min_value
      type: integer
  return:
    type: array
  source:
    code: |
      SELECT * FROM test_data WHERE value >= $min_value
  tests:
    - name: "filter_200"
      arguments:
        - key: min_value
          value: 200
      result:
        - id: 2
          value: 200
        - id: 3
          value: 300
EOF

# 5. Test
mxcp test tool query_test_data

Pattern 2: Create Test Fixtures in SQL

-- models/test_fixtures.sql
{{ config(materialized='table') }}

-- Create predictable test data
SELECT 1 as id, 'Alice' as name, 100 as score
UNION ALL
SELECT 2 as id, 'Bob' as name, 200 as score
UNION ALL
SELECT 3 as id, 'Charlie' as name, 150 as score

# tools/top_scores.yml
tool:
  name: top_scores
  source:
    code: |
      SELECT * FROM test_fixtures ORDER BY score DESC LIMIT $limit
  tests:
    - name: "top_2"
      arguments:
        - key: limit
          value: 2
      result:
        - id: 2
          name: "Bob"
          score: 200
        - id: 3
          name: "Charlie"
          score: 150

Pattern 3: Verify Aggregation Correctness

# tools/calculate_stats.yml
tool:
  name: calculate_stats
  source:
    code: |
      SELECT
        COUNT(*) as total_count,
        SUM(score) as total_score,
        AVG(score) as avg_score,
        MAX(score) as max_score
      FROM test_fixtures
  tests:
    - name: "verify_aggregations"
      arguments: []
      result:
        - total_count: 3
          total_score: 450
          avg_score: 150.0
          max_score: 200

If aggregations don't match expected values, the SQL logic is WRONG.

Testing Python Tools with Mocking

CRITICAL: Python tools with external API calls MUST use mocking in tests.

Pattern 1: Mock HTTP Calls with pytest-httpx

# Install
pip install pytest-httpx

# python/api_client.py
import httpx

async def fetch_external_data(api_key: str, user_id: int) -> dict:
    """Fetch data from external API"""
    async with httpx.AsyncClient() as client:
        response = await client.get(
            f"https://api.example.com/users/{user_id}",
            headers={"Authorization": f"Bearer {api_key}"}
        )
        response.raise_for_status()
        return response.json()

# tests/test_api_client.py
import pytest
import httpx
from python.api_client import fetch_external_data

@pytest.mark.asyncio
async def test_fetch_external_data_success(httpx_mock):
    """Test successful API call with mocked response"""
    # Mock the HTTP call
    httpx_mock.add_response(
        url="https://api.example.com/users/123",
        json={"id": 123, "name": "Test User", "email": "test@example.com"}
    )

    # Call function
    result = await fetch_external_data("fake_api_key", 123)

    # Verify correctness
    assert result["id"] == 123
    assert result["name"] == "Test User"
    assert result["email"] == "test@example.com"

@pytest.mark.asyncio
async def test_fetch_external_data_error(httpx_mock):
    """Test API error handling"""
    httpx_mock.add_response(
        url="https://api.example.com/users/999",
        status_code=404,
        json={"error": "User not found"}
    )

    # Should handle error gracefully
    with pytest.raises(httpx.HTTPStatusError):
        await fetch_external_data("fake_api_key", 999)
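
Timeouts can be simulated too: pytest-httpx can raise an exception instead of returning a response. A short sketch, assuming fetch_external_data lets the exception propagate:

@pytest.mark.asyncio
async def test_fetch_external_data_timeout(httpx_mock):
    """Test timeout handling with a mocked exception"""
    # Make the mocked transport raise instead of responding
    httpx_mock.add_exception(httpx.ReadTimeout("request timed out"))

    with pytest.raises(httpx.ReadTimeout):
        await fetch_external_data("fake_api_key", 123)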

Pattern 2: Mock Database Calls

# python/db_operations.py
from mxcp.runtime import db

def get_user_orders(user_id: int) -> list[dict]:
    """Get orders for a user"""
    result = db.execute(
        "SELECT * FROM orders WHERE user_id = $user_id",
        {"user_id": user_id}
    )
    return result.fetchall()

# tests/test_db_operations.py
import pytest
from unittest.mock import Mock, MagicMock
from python.db_operations import get_user_orders

def test_get_user_orders(monkeypatch):
    """Test with mocked database"""
    # Create mock result
    mock_result = MagicMock()
    mock_result.fetchall.return_value = [
        {"order_id": 1, "user_id": 123, "amount": 50.0},
        {"order_id": 2, "user_id": 123, "amount": 75.0}
    ]

    # Mock db.execute
    mock_db = Mock()
    mock_db.execute.return_value = mock_result

    # Inject mock
    import python.db_operations
    monkeypatch.setattr(python.db_operations, "db", mock_db)

    # Test
    orders = get_user_orders(123)

    # Verify
    assert len(orders) == 2
    assert orders[0]["order_id"] == 1
    assert sum(o["amount"] for o in orders) == 125.0
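
    # Also check how the query was issued: the mock records the call
    # (sketch; assumes db.execute receives (sql, params) positionally, as above)
    sql, params = mock_db.execute.call_args.args
    assert "FROM orders" in sql
    assert params == {"user_id": 123}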

Pattern 3: Mock Third-Party Libraries

# python/stripe_wrapper.py
import stripe

def create_customer(email: str, name: str) -> dict:
    """Create Stripe customer"""
    customer = stripe.Customer.create(email=email, name=name)
    return {"id": customer.id, "email": customer.email}

# tests/test_stripe_wrapper.py
import pytest
from unittest.mock import Mock, patch
from python.stripe_wrapper import create_customer

@patch('stripe.Customer.create')
def test_create_customer(mock_create):
    """Test Stripe customer creation with mock"""
    # Mock Stripe response
    mock_customer = Mock()
    mock_customer.id = "cus_test123"
    mock_customer.email = "test@example.com"
    mock_create.return_value = mock_customer

    # Call function
    result = create_customer("test@example.com", "Test User")

    # Verify correctness
    assert result["id"] == "cus_test123"
    assert result["email"] == "test@example.com"

    # Verify Stripe was called correctly
    mock_create.assert_called_once_with(
        email="test@example.com",
        name="Test User"
    )

Result Correctness Verification

CRITICAL: Tests must verify results are CORRECT, not just that code doesn't crash.

Bad Test (Only checks structure):

def test_calculate_total_bad():
    result = calculate_total([10, 20, 30])
    assert "total" in result  # ❌ Doesn't verify correctness

Good Test (Verifies correct value):

def test_calculate_total_good():
    result = calculate_total([10, 20, 30])
    assert result["total"] == 60  # ✅ Verifies correct calculation
    assert result["count"] == 3   # ✅ Verifies correct count
    assert result["average"] == 20.0  # ✅ Verifies correct average
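
For floating-point results, exact equality can be brittle. pytest.approx compares with a tolerance; a small sketch reusing the hypothetical calculate_total:

import pytest

def test_calculate_total_float_average():
    result = calculate_total([10, 20, 25])
    # 55 / 3 is not exactly representable; compare with a tolerance instead of ==
    assert result["average"] == pytest.approx(55 / 3)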

Pattern: Test Edge Cases for Correctness

def test_aggregation_correctness():
    """Test various aggregations for correctness"""
    data = [
        {"id": 1, "value": 100},
        {"id": 2, "value": 200},
        {"id": 3, "value": 150}
    ]

    result = aggregate_data(data)

    # Verify each aggregation
    assert result["sum"] == 450  # 100 + 200 + 150
    assert result["avg"] == 150.0  # 450 / 3
    assert result["min"] == 100
    assert result["max"] == 200
    assert result["count"] == 3

    # Verify derived values
    assert result["range"] == 100  # 200 - 100
    assert result["median"] == 150

def test_empty_data_correctness():
    """Test edge case: empty data"""
    result = aggregate_data([])

    assert result["sum"] == 0
    assert result["avg"] == 0.0
    assert result["count"] == 0
    # Ensure no crashes, correct behavior for empty data

Concurrency Safety for Python Tools

CRITICAL: MXCP tools run inside a server process, so multiple requests can arrive simultaneously.

Common Concurrency Issues

WRONG: Global State with Race Conditions

# python/unsafe_counter.py
counter = 0  # ❌ DANGER: Race condition!

def increment_counter() -> dict:
    global counter
    counter += 1  # ❌ Not thread-safe!
    return {"count": counter}

# Two simultaneous requests could both read counter=5,
# both increment to 6, both write 6 -> one increment lost!

CORRECT: Use Thread-Safe Approaches

Option 1: Avoid shared state (stateless)

# python/safe_stateless.py
def process_request(data: dict) -> dict:
    """Completely stateless - safe for concurrent calls"""
    result = compute_something(data)
    return {"result": result}
    # No global state, no problem!

Option 2: Use thread-safe structures

# python/safe_with_lock.py
import threading

counter_lock = threading.Lock()
counter = 0

def increment_counter() -> dict:
    global counter
    with counter_lock:  # ✅ Thread-safe
        counter += 1
        current = counter
    return {"count": current}

Option 3: Encapsulate state in a thread-safe class

# python/safe_atomic.py
from threading import Lock

# Thread-safe counter
class SafeCounter:
    def __init__(self):
        self._value = 0
        self._lock = Lock()

    def increment(self):
        with self._lock:
            self._value += 1
            return self._value

counter = SafeCounter()

def increment_counter() -> dict:
    return {"count": counter.increment()}

Concurrency-Safe Patterns

Pattern 1: Database as State (DuckDB is thread-safe)

# python/db_counter.py
from mxcp.runtime import db

def increment_counter() -> dict:
    """Use database for state - thread-safe"""
    db.execute("""
        CREATE TABLE IF NOT EXISTS counter (
            id INTEGER PRIMARY KEY,
            value INTEGER
        )
    """)

    db.execute("""
        INSERT INTO counter (id, value) VALUES (1, 1)
        ON CONFLICT(id) DO UPDATE SET value = value + 1
    """)

    result = db.execute("SELECT value FROM counter WHERE id = 1")
    return {"count": result.fetchone()["value"]}

Pattern 2: Local Variables Only (Immutable)

# python/safe_processing.py
async def process_data(input_data: list[dict]) -> dict:
    """Local variables only - safe for concurrent calls"""
    # All state is local to this function call
    results = []
    total = 0

    for item in input_data:
        processed = transform(item)  # Pure function
        results.append(processed)
        total += processed["value"]

    return {
        "results": results,
        "total": total,
        "count": len(results)
    }
    # When function returns, all state is discarded

Pattern 3: Async/Await (Concurrent, Not Parallel)

# python/safe_async.py
import asyncio
import httpx

async def fetch_multiple_users(user_ids: list[int]) -> list[dict]:
    """Concurrent API calls - safe with async"""

    async def fetch_one(user_id: int) -> dict:
        # Each call has its own context - no shared state
        async with httpx.AsyncClient() as client:
            response = await client.get(f"https://api.example.com/users/{user_id}")
            return response.json()

    # Run concurrently, but each fetch_one is independent
    results = await asyncio.gather(*[fetch_one(uid) for uid in user_ids])
    return results
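
By default asyncio.gather propagates the first exception, and the other results never reach the caller. If partial results are acceptable, return_exceptions=True keeps them; a hedged variant of the same sketch (file name hypothetical):

# python/safe_async_tolerant.py
import asyncio
import httpx

async def fetch_multiple_users_tolerant(user_ids: list[int]) -> list[dict]:
    """Like fetch_multiple_users, but one failed lookup does not sink the batch"""

    async def fetch_one(user_id: int) -> dict:
        async with httpx.AsyncClient() as client:
            response = await client.get(f"https://api.example.com/users/{user_id}")
            response.raise_for_status()
            return response.json()

    # return_exceptions=True puts exception objects into the result list
    # instead of raising the first failure
    results = await asyncio.gather(
        *[fetch_one(uid) for uid in user_ids],
        return_exceptions=True,
    )
    return [r for r in results if not isinstance(r, BaseException)]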

Testing Concurrency Safety

# tests/test_concurrency.py
import pytest
import asyncio
from python.my_module import concurrent_function, my_function

@pytest.mark.asyncio
async def test_concurrent_calls_no_race_condition():
    """Test that concurrent calls don't have race conditions"""

    # Run function 100 times concurrently
    tasks = [concurrent_function(i) for i in range(100)]
    results = await asyncio.gather(*tasks)

    # Verify all calls succeeded
    assert len(results) == 100

    # Verify no data corruption
    assert all(isinstance(r, dict) for r in results)

    # If function has a counter, verify correctness
    # (e.g., if each call increments, final count should be 100)

def test_parallel_execution_thread_safe():
    """Test with actual threading"""
    import threading

    results = []
    errors = []

    def worker(n):
        try:
            result = my_function(n)
            results.append(result)
        except Exception as e:
            errors.append(e)

    # Create 50 threads
    threads = [threading.Thread(target=worker, args=(i,)) for i in range(50)]

    # Start all threads
    for t in threads:
        t.start()

    # Wait for completion
    for t in threads:
        t.join()

    # Verify
    assert len(errors) == 0, f"Errors occurred: {errors}"
    assert len(results) == 50
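
For the counter patterns above, the test should also check the final values, since a lost update can pass a structure-only check. A sketch against the SafeCounter-based increment_counter (assumes the module-level counter starts fresh at 0 for this test):

# tests/test_concurrency.py (continued)
from concurrent.futures import ThreadPoolExecutor
from python.safe_atomic import increment_counter

def test_counter_no_lost_updates():
    """100 concurrent increments must produce the counts 1..100 with none lost"""
    with ThreadPoolExecutor(max_workers=10) as pool:
        results = list(pool.map(lambda _: increment_counter(), range(100)))

    counts = sorted(r["count"] for r in results)
    # A race condition would lose increments, producing duplicates or gaps here
    assert counts == list(range(1, 101))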

Complete Testing Checklist

For SQL Tools:

  • MXCP test cases in YAML
  • Test with real seed data
  • Verify result correctness (exact values)
  • Test edge cases (empty results, NULL values)
  • Test filters work correctly
  • Test aggregations are mathematically correct
  • Test with dbt test for data quality

For Python Tools (No External Calls):

  • MXCP test cases in YAML
  • Python unit tests (pytest)
  • Verify result correctness
  • Test edge cases (empty input, NULL, invalid)
  • Test error handling
  • Test concurrency safety (if using shared state)

For Python Tools (With External API Calls):

  • MXCP test cases in YAML
  • Python unit tests with mocking (pytest + httpx_mock)
  • Mock all external API calls
  • Test success path with mocked responses
  • Test error cases (404, 500, timeout)
  • Verify correct API parameters
  • Test result correctness
  • Test concurrency (multiple simultaneous calls)

For Python Tools (With Database Operations):

  • MXCP test cases in YAML
  • Python unit tests
  • Use test fixtures/seed data
  • Verify query results correctness
  • Test transactions (if applicable)
  • Test concurrency (DuckDB is thread-safe)
  • Clean up test data after tests

Project Structure for Testing

project/
├── mxcp-site.yml
├── tools/
│   └── my_tool.yml              # Contains MXCP tests
├── python/
│   └── my_module.py             # Python code
├── tests/
│   ├── __init__.py
│   ├── test_my_module.py        # Python unit tests
│   ├── conftest.py              # pytest fixtures
│   └── fixtures/
│       └── test_data.json       # Test data
├── seeds/
│   ├── test_data.csv            # Test database seeds
│   └── schema.yml
└── requirements.txt             # Include: pytest, pytest-asyncio, pytest-httpx, pytest-cov
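
conftest.py holds fixtures shared across test files. A minimal sketch matching the layout above (tests/fixtures/test_data.json is the hypothetical data file shown in the tree):

# tests/conftest.py
import json
from pathlib import Path

import pytest

FIXTURES_DIR = Path(__file__).parent / "fixtures"

@pytest.fixture
def test_data() -> list[dict]:
    """Load shared test data from tests/fixtures/test_data.json"""
    with open(FIXTURES_DIR / "test_data.json") as f:
        return json.load(f)

@pytest.fixture
def scratch_dir(tmp_path):
    """Per-test scratch directory; pytest removes tmp_path automatically"""
    out = tmp_path / "output"
    out.mkdir()
    return out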

Running Tests

# 1. MXCP tests (always run first)
mxcp validate  # Structure validation
mxcp test      # Integration tests

# 2. dbt tests (if using dbt)
dbt test

# 3. Python unit tests
pytest tests/ -v

# 4. With coverage report
pytest tests/ --cov=python --cov-report=html

# 5. Concurrency stress test (requires the pytest-repeat plugin for --count)
pytest tests/test_concurrency.py -v --count=100

# All together
mxcp validate && mxcp test && dbt test && pytest tests/ -v

Summary

Both types of tests are required:

  1. MXCP tests - Verify tools work end-to-end
  2. Python unit tests - Verify logic, mocking, correctness, concurrency

Key principles:

  • Mock all external calls - Use pytest-httpx, unittest.mock
  • Verify result correctness - Don't just check structure
  • Use test databases - SQL tools need real data
  • Test concurrency - Tools run as servers
  • Avoid global mutable state - Use stateless patterns or locks
  • Test edge cases - Empty data, NULL, invalid input

Before declaring a project done, BOTH test types must pass completely.