Phase 6: Test Suite Generation (NEW v2.0!)
Objective
GENERATE a comprehensive test suite that validates ALL functions of the created skill.
LEARNING: us-crop-monitor v1.0 had ZERO tests. When expanding to v2.0, it was difficult to ensure nothing broke. v2.0 has 25 tests (100% passing) that ensure reliability.
Why Are Tests Critical?
Benefits for Developer:
- ✅ Ensures code works before distribution
- ✅ Detects bugs early (not after client installs!)
- ✅ Allows confident changes (regression testing)
- ✅ Documents expected behavior
Benefits for Client:
- ✅ Confidence in skill ("100% tested")
- ✅ Fewer bugs in production
- ✅ More professional (commercially viable)
Benefits for Agent-Creator:
- ✅ Validates that generated skill actually works
- ✅ Catches errors before the skill is considered "done"
- ✅ Automatic quality gate
Test Structure
tests/ Directory
{skill-name}/
└── tests/
    ├── test_fetch.py        # Tests API client
    ├── test_parse.py        # Tests parsers
    ├── test_analyze.py      # Tests analyses
    ├── test_integration.py  # Tests end-to-end
    ├── test_validation.py   # Tests validators
    ├── test_helpers.py      # Tests helpers (year detection, etc.)
    └── README.md            # How to run tests
Template 1: test_fetch.py
Objective: Validate API client works
#!/usr/bin/env python3
"""
Test suite for {API} client.
Tests all fetch methods with real API data.
"""
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).parent.parent / 'scripts'))

from fetch_{api} import {ApiClient}, DataNotFoundError


def test_get_{metric1}():
    """Test fetching {metric1} data."""
    print("\nTesting get_{metric1}()...")
    try:
        client = {ApiClient}()

        # Test with valid parameters
        result = client.get_{metric1}(
            {entity}='{valid_entity}',
            year=2024
        )

        # Validations
        assert 'data' in result, "Missing 'data' in result"
        assert 'metadata' in result, "Missing 'metadata'"
        assert len(result['data']) > 0, "No data returned"
        assert result['metadata']['from_cache'] in [True, False]

        print(f"  ✓ Fetched {len(result['data'])} records")
        print("  ✓ Metadata present")
        print(f"  ✓ From cache: {result['metadata']['from_cache']}")
        return True
    except Exception as e:
        print(f"  ✗ FAILED: {e}")
        return False


def test_get_{metric2}():
    """Test fetching {metric2} data."""
    # Similar structure...
    pass


def test_error_handling():
    """Test that errors are handled correctly."""
    print("\nTesting error handling...")
    try:
        client = {ApiClient}()

        # Test invalid entity (should raise)
        try:
            result = client.get_{metric1}({entity}='INVALID_ENTITY', year=2024)
            print("  ✗ Should have raised DataNotFoundError")
            return False
        except DataNotFoundError:
            print("  ✓ Correctly raises DataNotFoundError for invalid entity")

        # Test invalid year (should raise)
        try:
            result = client.get_{metric1}({entity}='{valid_entity}', year=2099)
            print("  ✗ Should have raised ValidationError")
            return False
        except Exception:
            print("  ✓ Correctly raises error for future year")

        return True
    except Exception as e:
        print(f"  ✗ Unexpected error: {e}")
        return False


def main():
    """Run all fetch tests."""
    print("=" * 70)
    print("FETCH TESTS - {API} Client")
    print("=" * 70)

    results = []

    # Test each get_* method
    results.append(("get_{metric1}", test_get_{metric1}()))
    results.append(("get_{metric2}", test_get_{metric2}()))
    # ... add a test for ALL get_* methods
    results.append(("error_handling", test_error_handling()))

    # Summary
    print("\n" + "=" * 70)
    print("SUMMARY")
    print("=" * 70)

    passed = sum(1 for _, r in results if r)
    total = len(results)

    for name, result in results:
        status = "✓ PASS" if result else "✗ FAIL"
        print(f"{status}: {name}()")

    print(f"\nResults: {passed}/{total} tests passed")
    return passed == total


if __name__ == "__main__":
    success = main()
    sys.exit(0 if success else 1)
Rule: ONE test function for EACH get_*() method implemented!
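One way to enforce this rule mechanically is a small coverage check that compares the client's get_* methods against the test functions defined in the test module. This is an optional sketch, not part of the template above; check_fetch_coverage is a hypothetical helper:
import inspect
import sys

def check_fetch_coverage(client_cls, test_module):
    """Return get_* methods on the client that lack a matching test_get_* function."""
    methods = [name for name, _ in inspect.getmembers(client_cls, inspect.isfunction)
               if name.startswith("get_")]
    tests = {name for name in dir(test_module) if name.startswith("test_get_")}
    return [m for m in methods if f"test_{m}" not in tests]

# Usage at the bottom of test_fetch.py (hypothetical):
#   missing = check_fetch_coverage(ApiClient, sys.modules[__name__])
#   assert not missing, f"Untested fetch methods: {missing}"
An empty list means every fetch method has at least one dedicated test.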
Template 2: test_parse.py
Objective: Validate parsers
#!/usr/bin/env python3
"""
Test suite for data parsers.
Tests all parse_* modules.
"""
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).parent.parent / 'scripts'))

from parse_{type1} import parse_{type1}_response
from parse_{type2} import parse_{type2}_response


def test_parse_{type1}():
    """Test {type1} parser."""
    print("\nTesting parse_{type1}_response()...")

    # Sample data (real structure from API)
    sample_data = [
        {
            'field1': 'value1',
            'field2': 'value2',
            'Value': '123',
            # ... real API fields
        }
    ]

    try:
        df = parse_{type1}_response(sample_data)

        # Validations
        assert not df.empty, "DataFrame is empty"
        assert 'Value' in df.columns or '{metric}_value' in df.columns
        assert len(df) == len(sample_data)

        print(f"  ✓ Parsed {len(df)} records")
        print(f"  ✓ Columns: {list(df.columns)}")
        return True
    except Exception as e:
        print(f"  ✗ FAILED: {e}")
        import traceback
        traceback.print_exc()
        return False


def test_parse_{type2}():
    """Test {type2} parser."""
    # Similar structure...
    pass


def test_parse_empty_data():
    """Test parser handles empty data gracefully."""
    print("\nTesting empty data handling...")
    try:
        from parse_{type1} import ParseError
        try:
            df = parse_{type1}_response([])
            print("  ✗ Should have raised ParseError")
            return False
        except ParseError as e:
            print(f"  ✓ Correctly raises ParseError: {e}")
            return True
    except Exception as e:
        print(f"  ✗ Unexpected error: {e}")
        return False


def main():
    results = []

    # Test each parser
    results.append(("parse_{type1}", test_parse_{type1}()))
    results.append(("parse_{type2}", test_parse_{type2}()))
    # ... for ALL parsers
    results.append(("empty_data", test_parse_empty_data()))

    # Summary
    passed = sum(1 for _, r in results if r)
    print(f"\nResults: {passed}/{len(results)} passed")
    return passed == len(results)


if __name__ == "__main__":
    sys.exit(0 if main() else 1)
Template 3: test_integration.py
Objective: End-to-end tests (MOST IMPORTANT!)
#!/usr/bin/env python3
"""
Integration tests for {skill-name}.
Tests all analysis functions with REAL API data.
"""
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).parent.parent / 'scripts'))

from analyze_{domain} import (
    {function1},
    {function2},
    {function3},
    # ... import ALL functions
)


def test_{function1}():
    """Test {function1} with auto-year detection."""
    print("\n1. Testing {function1}()...")
    try:
        # Test WITHOUT year (auto-detection)
        result = {function1}({entity}='{valid_entity}')

        # Validations
        assert 'year' in result, "Missing year"
        assert 'year_requested' in result, "Missing year_requested"
        assert 'year_info' in result, "Missing year_info"
        assert result['year'] >= 2024, "Year too old"
        assert result['year_requested'] is None, "Should auto-detect"

        print(f"  ✓ Auto-year detection: {result['year']}")
        print(f"  ✓ Year info: {result['year_info']}")
        print(f"  ✓ Data present: {list(result.keys())}")
        return True
    except Exception as e:
        print(f"  ✗ FAILED: {e}")
        import traceback
        traceback.print_exc()
        return False


def test_{function1}_with_explicit_year():
    """Test {function1} with explicit year."""
    print("\n2. Testing {function1}() with explicit year...")
    try:
        # Test WITH year specified
        result = {function1}({entity}='{valid_entity}', year=2024)

        assert result['year'] == 2024, f"Expected 2024, got {result['year']}"
        assert result['year_requested'] == 2024

        print(f"  ✓ Uses specified year: {result['year']}")
        return True
    except Exception as e:
        print(f"  ✗ FAILED: {e}")
        return False


def test_all_functions_exist():
    """Verify all expected functions are implemented."""
    print("\nVerifying all functions exist...")

    expected_functions = [
        '{function1}',
        '{function2}',
        '{function3}',
        # ... ALL functions
    ]

    missing = []
    for func_name in expected_functions:
        if func_name not in globals():
            missing.append(func_name)

    if missing:
        print(f"  ✗ Missing functions: {missing}")
        return False
    else:
        print(f"  ✓ All {len(expected_functions)} functions present")
        return True


def main():
    """Run all integration tests."""
    print("\n" + "=" * 70)
    print("{SKILL NAME} - INTEGRATION TEST SUITE")
    print("=" * 70)

    results = []

    # Test each function
    results.append(("{function1} auto-year", test_{function1}()))
    results.append(("{function1} explicit-year", test_{function1}_with_explicit_year()))
    # ... repeat for ALL functions
    results.append(("all_functions_exist", test_all_functions_exist()))

    # Summary
    print("\n" + "=" * 70)
    print("FINAL SUMMARY")
    print("=" * 70)

    passed = sum(1 for _, r in results if r)
    total = len(results)

    print(f"\n✓ Passed: {passed}/{total}")
    print(f"✗ Failed: {total - passed}/{total}")

    if passed == total:
        print("\n🎉 ALL TESTS PASSED! SKILL IS PRODUCTION READY!")
    else:
        print(f"\n⚠ {total - passed} test(s) failed - FIX BEFORE RELEASE")

    print("=" * 70)
    return passed == total


if __name__ == "__main__":
    success = main()
    sys.exit(0 if success else 1)
Rule: Minimum 2 tests per analysis function (auto-year + explicit-year)
Template 4: test_helpers.py
Objective: Test year detection helpers
#!/usr/bin/env python3
"""
Test suite for utility helpers.
Tests temporal context detection.
"""
import sys
from pathlib import Path
from datetime import datetime

sys.path.insert(0, str(Path(__file__).parent.parent / 'scripts'))

from utils.helpers import (
    get_current_{domain}_year,
    should_try_previous_year,
    format_year_message
)


def test_get_current_year():
    """Test current year detection."""
    print("\nTesting get_current_{domain}_year()...")
    try:
        year = get_current_{domain}_year()
        current_year = datetime.now().year

        assert year == current_year, f"Expected {current_year}, got {year}"
        print(f"  ✓ Correctly returns: {year}")
        return True
    except Exception as e:
        print(f"  ✗ FAILED: {e}")
        return False


def test_should_try_previous_year():
    """Test seasonal fallback logic."""
    print("\nTesting should_try_previous_year()...")
    try:
        # Test with None (current year)
        result = should_try_previous_year()
        print(f"  ✓ Current year fallback: {result}")

        # Test with specific year
        result_past = should_try_previous_year(2023)
        print(f"  ✓ Past year fallback: {result_past}")
        return True
    except Exception as e:
        print(f"  ✗ FAILED: {e}")
        return False


def test_format_year_message():
    """Test year message formatting."""
    print("\nTesting format_year_message()...")
    try:
        # Test auto-detected
        msg1 = format_year_message(2025, None)
        assert "auto-detected" in msg1.lower() or "2025" in msg1
        print(f"  ✓ Auto-detected: {msg1}")

        # Test requested
        msg2 = format_year_message(2024, 2024)
        assert "2024" in msg2
        print(f"  ✓ Requested: {msg2}")

        # Test fallback
        msg3 = format_year_message(2024, 2025)
        assert "not" in msg3.lower() or "fallback" in msg3.lower()
        print(f"  ✓ Fallback: {msg3}")
        return True
    except Exception as e:
        print(f"  ✗ FAILED: {e}")
        return False


def main():
    results = []
    results.append(("get_current_year", test_get_current_year()))
    results.append(("should_try_previous_year", test_should_try_previous_year()))
    results.append(("format_year_message", test_format_year_message()))

    passed = sum(1 for _, r in results if r)
    print(f"\nResults: {passed}/{len(results)} passed")
    return passed == len(results)


if __name__ == "__main__":
    sys.exit(0 if main() else 1)
Quality Rules for Tests
1. ALL tests must use REAL DATA
❌ FORBIDDEN:
def test_function():
    # Mock data
    mock_data = {'fake': 'data'}
    result = function(mock_data)
    assert result == 'expected'
✅ MANDATORY:
def test_function():
    # Real API call
    client = ApiClient()
    result = client.get_real_data(entity='REAL', year=2024)

    # Validate real response
    assert len(result['data']) > 0
    assert 'metadata' in result
Why?
- Tests with mocks don't guarantee the API is working
- Real tests detect API changes
- The client needs to know it works with REAL data
2. Tests must be FAST
Goal: Complete suite in < 60 seconds
Techniques:
- Use cache: the first test populates the cache, the rest read from it
- Limit requests: don't test 100 entities, test 2-3
- Parallelize where possible (see the runner sketch after the cache example)
# Example: populate the cache once (unittest style)
import unittest

class FetchTests(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        """Populate cache before all tests."""
        client = ApiClient()
        client.get_data('ENTITY1', 2024)  # cached for the other tests

    # Tests in the class then use cached data (fast)
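For parallelism, one option is to run the independent test files concurrently. A minimal sketch, assuming each test file exits 0 on success and 1 on failure (as the templates above do):
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def run_test_file(path):
    """Run one test file in a subprocess; return (name, passed)."""
    proc = subprocess.run([sys.executable, str(path)], capture_output=True)
    return path.name, proc.returncode == 0

test_files = sorted(Path("tests").glob("test_*.py"))
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_test_file, test_files))

for name, passed in results:
    print(f"{'✓' if passed else '✗'} {name}")
sys.exit(0 if all(ok for _, ok in results) else 1)
Only parallelize files that don't race on the same cache entries.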
3. Tests must PASS 100%
Quality Gate: Skill is only "done" when ALL tests pass.
if __name__ == "__main__":
    success = main()
    if not success:
        print("\n❌ SKILL NOT READY - FIX FAILING TESTS")
        sys.exit(1)
    else:
        print("\n✅ SKILL READY FOR DISTRIBUTION")
        sys.exit(0)
Test Coverage Requirements
Minimum Mandatory:
Per module:
- fetch_{api}.py: 1 test per get_*() method + 1 error handling test
- parse_{type}.py: 1 test per main parse function
- analyze_{domain}.py: 2 tests per analysis (auto-year + explicit-year)
- utils/helpers.py: 3 tests (get_year, should_fallback, format_message)
Expected total: 15-30 tests depending on skill size
Example (us-crop-monitor v2.0):
- test_fetch.py: 6 tests (5 get_* + 1 error)
- test_parse.py: 4 tests (4 parsers)
- test_analyze.py: 11 tests (11 functions)
- test_helpers.py: 3 tests
- test_integration.py: 1 end-to-end test
- Total: 25 tests
How to Run Tests
Individual:
python3 tests/test_fetch.py
python3 tests/test_integration.py
Complete suite:
# Run all
for test in tests/test_*.py; do
  python3 "$test" || exit 1
done
# Or with pytest (if available)
pytest tests/
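Note that the templates above use a print-and-return style with their own main(); pytest instead discovers test_* functions and treats assertion failures as test failures. If you adopt pytest, an equivalent test might look like this (a sketch using placeholder names from Template 1, not a fixed API):
def test_get_metric1():
    """pytest-style equivalent: no manual True/False bookkeeping needed."""
    client = ApiClient()  # placeholder client class from Template 1
    result = client.get_metric1(entity="VALID_ENTITY", year=2024)
    assert "data" in result
    assert "metadata" in result
    assert len(result["data"]) > 0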
In CI/CD:
# .github/workflows/test.yml
name: Test Suite
on: [push]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - run: pip install -r requirements.txt
      - run: python3 tests/test_integration.py
Output Example
When tests pass:
======================================================================
US CROP MONITOR - INTEGRATION TEST SUITE
======================================================================
1. current_condition_report()...
   ✓ Year: 2025 | Week: 39
   ✓ Good+Excellent: 66.0%
2. week_over_week_comparison()...
   ✓ Year: 2025 | Weeks: 39 vs 38
   ✓ Delta: -2.2 pts
...
======================================================================
FINAL SUMMARY
======================================================================
✓ Passed: 25/25 tests
✗ Failed: 0/25 tests
🎉 ALL TESTS PASSED! SKILL IS PRODUCTION READY!
======================================================================
When tests fail:
8. yield_analysis()...
   ✗ FAILED: 'yield_bu_per_acre' not in result
...
FINAL SUMMARY:
✓ Passed: 24/25
✗ Failed: 1/25
❌ SKILL NOT READY - FIX FAILING TESTS
Integration with Agent-Creator
When to generate tests:
In Phase 5 (Implementation):
Updated order:
...
8. Implement analyze (analyses)
9. CREATE TESTS (← here!)
   - Generate test_fetch.py
   - Generate test_parse.py
   - Generate test_analyze.py
   - Generate test_helpers.py
   - Generate test_integration.py
10. RUN TESTS
    - Run the test suite
    - If it fails → FIX and re-run
    - Only continue when 100% passing
11. Create examples/
...
Quality Gate:
# The agent-creator should do:
import subprocess
import sys

print("Running test suite...")
exit_code = subprocess.run(['python3', 'tests/test_integration.py']).returncode

if exit_code != 0:
    print("❌ Tests failed - aborting skill generation")
    print("Fix errors above and try again")
    sys.exit(1)

print("✅ All tests passed - continuing...")
Testing Checklist
Before considering the skill "done":
- tests/ directory created
- test_fetch.py with 1 test per get_*() method
- test_parse.py with 1 test per parser
- test_analyze.py with 2 tests per function (auto-year + explicit)
- test_helpers.py with year detection tests
- test_integration.py with end-to-end test
- ALL tests passing (100%)
- Test suite executes in < 60 seconds
- README in tests/ explaining how to run (see the sketch below)
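A minimal tests/README.md can be as short as this (the exact wording is illustrative):
# Tests

Run an individual suite:
  python3 tests/test_fetch.py

Run everything:
  for test in tests/test_*.py; do python3 "$test" || exit 1; done

Every suite exits 0 on success and 1 on failure. The full run should
complete in under 60 seconds (the first run populates the cache).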
Real Example: us-crop-monitor v2.0
Tests created:
- test_new_metrics.py - 5 tests (fetch methods)
- test_year_detection.py - 2 tests (auto-detection)
- test_all_year_detection.py - 4 tests (all functions)
- test_new_analyses.py - 3 tests (new analyses)
- tests/test_integrated_validation.py - 11 tests (comprehensive)
Total: 25 tests, 100% passing
Result:
✓ Passed: 25/25 tests
🎉 ALL TESTS PASSED! SKILL IS PRODUCTION READY!
Benefit: Full confidence v2.0 works before distribution!
Conclusion
ALWAYS generate a test suite!
Skills without tests = prototypes.
Skills with tests = professional products ✅
ROI: tests take about 2 extra hours to create but save 10-20 hours of debugging later!