Phase 6: Test Suite Generation (NEW v2.0!)
Objective
GENERATE a comprehensive test suite that validates ALL functions of the created skill.
LEARNING: us-crop-monitor v1.0 had ZERO tests. When expanding to v2.0, it was difficult to ensure nothing broke. v2.0 has 25 tests (100% passing) that ensure reliability.
Why Are Tests Critical?
Benefits for Developer:
- ✅ Ensures code works before distribution
- ✅ Detects bugs early (not after client installs!)
- ✅ Allows confident changes (regression testing)
- ✅ Documents expected behavior
Benefits for Client:
- ✅ Confidence in skill ("100% tested")
- ✅ Fewer bugs in production
- ✅ More professional (commercially viable)
Benefits for Agent-Creator:
- ✅ Validates that generated skill actually works
- ✅ Catches errors before the skill is considered "done"
- ✅ Automatic quality gate
Test Structure
tests/ Directory
{skill-name}/
└── tests/
    ├── test_fetch.py        # Tests API client
    ├── test_parse.py        # Tests parsers
    ├── test_analyze.py      # Tests analyses
    ├── test_integration.py  # Tests end-to-end
    ├── test_validation.py   # Tests validators
    ├── test_helpers.py      # Tests helpers (year detection, etc.)
    └── README.md            # How to run tests
Template 1: test_fetch.py
Objective: Validate API client works
#!/usr/bin/env python3
"""
Test suite for {API} client.
Tests all fetch methods with real API data.
"""
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).parent.parent / 'scripts'))

from fetch_{api} import {ApiClient}, DataNotFoundError


def test_get_{metric1}():
    """Test fetching {metric1} data."""
    print("\nTesting get_{metric1}()...")
    try:
        client = {ApiClient}()

        # Test with valid parameters
        result = client.get_{metric1}(
            {entity}='{valid_entity}',
            year=2024
        )

        # Validations
        assert 'data' in result, "Missing 'data' in result"
        assert 'metadata' in result, "Missing 'metadata'"
        assert len(result['data']) > 0, "No data returned"
        assert result['metadata']['from_cache'] in [True, False]

        print(f"  ✓ Fetched {len(result['data'])} records")
        print("  ✓ Metadata present")
        print(f"  ✓ From cache: {result['metadata']['from_cache']}")
        return True
    except Exception as e:
        print(f"  ✗ FAILED: {e}")
        return False


def test_get_{metric2}():
    """Test fetching {metric2} data."""
    # Similar structure...
    pass


def test_error_handling():
    """Test that errors are handled correctly."""
    print("\nTesting error handling...")
    try:
        client = {ApiClient}()

        # Test invalid entity (should raise)
        try:
            result = client.get_{metric1}({entity}='INVALID_ENTITY', year=2024)
            print("  ✗ Should have raised DataNotFoundError")
            return False
        except DataNotFoundError:
            print("  ✓ Correctly raises DataNotFoundError for invalid entity")

        # Test invalid year (should raise)
        try:
            result = client.get_{metric1}({entity}='{valid_entity}', year=2099)
            print("  ✗ Should have raised ValidationError")
            return False
        except Exception:
            print("  ✓ Correctly raises error for future year")

        return True
    except Exception as e:
        print(f"  ✗ Unexpected error: {e}")
        return False


def main():
    """Run all fetch tests."""
    print("=" * 70)
    print("FETCH TESTS - {API} Client")
    print("=" * 70)

    results = []

    # Test each get_* method
    results.append(("get_{metric1}", test_get_{metric1}()))
    results.append(("get_{metric2}", test_get_{metric2}()))
    # ... add a test for ALL get_* methods
    results.append(("error_handling", test_error_handling()))

    # Summary
    print("\n" + "=" * 70)
    print("SUMMARY")
    print("=" * 70)

    passed = sum(1 for _, r in results if r)
    total = len(results)

    for name, result in results:
        status = "✓ PASS" if result else "✗ FAIL"
        print(f"{status}: {name}()")

    print(f"\nResults: {passed}/{total} tests passed")
    return passed == total


if __name__ == "__main__":
    success = main()
    sys.exit(0 if success else 1)
Rule: ONE test function for EACH get_*() method implemented!
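One way to enforce this rule mechanically is a small coverage check that compares the client's get_* methods against the test functions defined in the test module. This is an optional sketch, not part of the template above; check_fetch_coverage is a hypothetical helper:
import inspect
import sys

def check_fetch_coverage(client_cls, test_module):
    """Return get_* methods on the client that lack a matching test_get_* function."""
    methods = [name for name, _ in inspect.getmembers(client_cls, inspect.isfunction)
               if name.startswith("get_")]
    tests = {name for name in dir(test_module) if name.startswith("test_get_")}
    return [m for m in methods if f"test_{m}" not in tests]

# Usage at the bottom of test_fetch.py (hypothetical):
#   missing = check_fetch_coverage(ApiClient, sys.modules[__name__])
#   assert not missing, f"Untested fetch methods: {missing}"
An empty list means every fetch method has at least one dedicated test.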
Template 2: test_parse.py
Objective: Validate parsers
#!/usr/bin/env python3
"""
Test suite for data parsers.
Tests all parse_* modules.
"""
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).parent.parent / 'scripts'))

from parse_{type1} import parse_{type1}_response
from parse_{type2} import parse_{type2}_response


def test_parse_{type1}():
    """Test {type1} parser."""
    print("\nTesting parse_{type1}_response()...")

    # Sample data (real structure from API)
    sample_data = [
        {
            'field1': 'value1',
            'field2': 'value2',
            'Value': '123',
            # ... real API fields
        }
    ]

    try:
        df = parse_{type1}_response(sample_data)

        # Validations
        assert not df.empty, "DataFrame is empty"
        assert 'Value' in df.columns or '{metric}_value' in df.columns
        assert len(df) == len(sample_data)

        print(f"  ✓ Parsed {len(df)} records")
        print(f"  ✓ Columns: {list(df.columns)}")
        return True
    except Exception as e:
        print(f"  ✗ FAILED: {e}")
        import traceback
        traceback.print_exc()
        return False


def test_parse_{type2}():
    """Test {type2} parser."""
    # Similar structure...
    pass


def test_parse_empty_data():
    """Test parser handles empty data gracefully."""
    print("\nTesting empty data handling...")
    try:
        from parse_{type1} import ParseError
        try:
            df = parse_{type1}_response([])
            print("  ✗ Should have raised ParseError")
            return False
        except ParseError as e:
            print(f"  ✓ Correctly raises ParseError: {e}")
            return True
    except Exception as e:
        print(f"  ✗ Unexpected error: {e}")
        return False


def main():
    results = []

    # Test each parser
    results.append(("parse_{type1}", test_parse_{type1}()))
    results.append(("parse_{type2}", test_parse_{type2}()))
    # ... for ALL parsers
    results.append(("empty_data", test_parse_empty_data()))

    # Summary
    passed = sum(1 for _, r in results if r)
    print(f"\nResults: {passed}/{len(results)} passed")
    return passed == len(results)


if __name__ == "__main__":
    sys.exit(0 if main() else 1)
Template 3: test_integration.py
Objective: End-to-end tests (MOST IMPORTANT!)
#!/usr/bin/env python3
"""
Integration tests for {skill-name}.
Tests all analysis functions with REAL API data.
"""
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).parent.parent / 'scripts'))

from analyze_{domain} import (
    {function1},
    {function2},
    {function3},
    # ... import ALL functions
)


def test_{function1}():
    """Test {function1} with auto-year detection."""
    print("\n1. Testing {function1}()...")
    try:
        # Test WITHOUT year (auto-detection)
        result = {function1}({entity}='{valid_entity}')

        # Validations
        assert 'year' in result, "Missing year"
        assert 'year_requested' in result, "Missing year_requested"
        assert 'year_info' in result, "Missing year_info"
        assert result['year'] >= 2024, "Year too old"
        assert result['year_requested'] is None, "Should auto-detect"

        print(f"  ✓ Auto-year detection: {result['year']}")
        print(f"  ✓ Year info: {result['year_info']}")
        print(f"  ✓ Data present: {list(result.keys())}")
        return True
    except Exception as e:
        print(f"  ✗ FAILED: {e}")
        import traceback
        traceback.print_exc()
        return False


def test_{function1}_with_explicit_year():
    """Test {function1} with explicit year."""
    print("\n2. Testing {function1}() with explicit year...")
    try:
        # Test WITH year specified
        result = {function1}({entity}='{valid_entity}', year=2024)

        assert result['year'] == 2024, f"Expected 2024, got {result['year']}"
        assert result['year_requested'] == 2024

        print(f"  ✓ Uses specified year: {result['year']}")
        return True
    except Exception as e:
        print(f"  ✗ FAILED: {e}")
        return False


def test_all_functions_exist():
    """Verify all expected functions are implemented."""
    print("\nVerifying all functions exist...")

    expected_functions = [
        '{function1}',
        '{function2}',
        '{function3}',
        # ... ALL functions
    ]

    missing = []
    for func_name in expected_functions:
        if func_name not in globals():
            missing.append(func_name)

    if missing:
        print(f"  ✗ Missing functions: {missing}")
        return False
    else:
        print(f"  ✓ All {len(expected_functions)} functions present")
        return True


def main():
    """Run all integration tests."""
    print("\n" + "=" * 70)
    print("{SKILL NAME} - INTEGRATION TEST SUITE")
    print("=" * 70)

    results = []

    # Test each function
    results.append(("{function1} auto-year", test_{function1}()))
    results.append(("{function1} explicit-year", test_{function1}_with_explicit_year()))
    # ... repeat for ALL functions
    results.append(("all_functions_exist", test_all_functions_exist()))

    # Summary
    print("\n" + "=" * 70)
    print("FINAL SUMMARY")
    print("=" * 70)

    passed = sum(1 for _, r in results if r)
    total = len(results)

    print(f"\n✓ Passed: {passed}/{total}")
    print(f"✗ Failed: {total - passed}/{total}")

    if passed == total:
        print("\n🎉 ALL TESTS PASSED! SKILL IS PRODUCTION READY!")
    else:
        print(f"\n⚠ {total - passed} test(s) failed - FIX BEFORE RELEASE")

    print("=" * 70)
    return passed == total


if __name__ == "__main__":
    success = main()
    sys.exit(0 if success else 1)
Rule: Minimum 2 tests per analysis function (auto-year + explicit-year)
Template 4: test_helpers.py
Objective: Test year detection helpers
#!/usr/bin/env python3
"""
Test suite for utility helpers.
Tests temporal context detection.
"""
import sys
from pathlib import Path
from datetime import datetime

sys.path.insert(0, str(Path(__file__).parent.parent / 'scripts'))

from utils.helpers import (
    get_current_{domain}_year,
    should_try_previous_year,
    format_year_message
)


def test_get_current_year():
    """Test current year detection."""
    print("\nTesting get_current_{domain}_year()...")
    try:
        year = get_current_{domain}_year()
        current_year = datetime.now().year

        assert year == current_year, f"Expected {current_year}, got {year}"
        print(f"  ✓ Correctly returns: {year}")
        return True
    except Exception as e:
        print(f"  ✗ FAILED: {e}")
        return False


def test_should_try_previous_year():
    """Test seasonal fallback logic."""
    print("\nTesting should_try_previous_year()...")
    try:
        # Test with None (current year)
        result = should_try_previous_year()
        print(f"  ✓ Current year fallback: {result}")

        # Test with specific year
        result_past = should_try_previous_year(2023)
        print(f"  ✓ Past year fallback: {result_past}")
        return True
    except Exception as e:
        print(f"  ✗ FAILED: {e}")
        return False


def test_format_year_message():
    """Test year message formatting."""
    print("\nTesting format_year_message()...")
    try:
        # Test auto-detected
        msg1 = format_year_message(2025, None)
        assert "auto-detected" in msg1.lower() or "2025" in msg1
        print(f"  ✓ Auto-detected: {msg1}")

        # Test requested
        msg2 = format_year_message(2024, 2024)
        assert "2024" in msg2
        print(f"  ✓ Requested: {msg2}")

        # Test fallback
        msg3 = format_year_message(2024, 2025)
        assert "not" in msg3.lower() or "fallback" in msg3.lower()
        print(f"  ✓ Fallback: {msg3}")
        return True
    except Exception as e:
        print(f"  ✗ FAILED: {e}")
        return False


def main():
    results = []
    results.append(("get_current_year", test_get_current_year()))
    results.append(("should_try_previous_year", test_should_try_previous_year()))
    results.append(("format_year_message", test_format_year_message()))

    passed = sum(1 for _, r in results if r)
    print(f"\nResults: {passed}/{len(results)} passed")
    return passed == len(results)


if __name__ == "__main__":
    sys.exit(0 if main() else 1)
Quality Rules for Tests
1. ALL tests must use REAL DATA
❌ FORBIDDEN:
def test_function():
    # Mock data
    mock_data = {'fake': 'data'}
    result = function(mock_data)
    assert result == 'expected'
✅ MANDATORY:
def test_function():
    # Real API call
    client = ApiClient()
    result = client.get_real_data(entity='REAL', year=2024)

    # Validate real response
    assert len(result['data']) > 0
    assert 'metadata' in result
Why?
- Tests with mocks don't guarantee the API is working
- Real tests detect API changes
- The client needs to know it works with REAL data
2. Tests must be FAST
Goal: Complete suite in < 60 seconds
Techniques:
- Use cache: the first test populates the cache, the rest read from it
- Limit requests: don't test 100 entities, test 2-3
- Parallelize where possible (see the runner sketch after the cache example)
# Example: populate the cache once (unittest style)
import unittest

class FetchTests(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        """Populate cache before all tests."""
        client = ApiClient()
        client.get_data('ENTITY1', 2024)  # cached for the other tests

    # Tests in the class then use cached data (fast)
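For parallelism, one option is to run the independent test files concurrently. A minimal sketch, assuming each test file exits 0 on success and 1 on failure (as the templates above do):
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def run_test_file(path):
    """Run one test file in a subprocess; return (name, passed)."""
    proc = subprocess.run([sys.executable, str(path)], capture_output=True)
    return path.name, proc.returncode == 0

test_files = sorted(Path("tests").glob("test_*.py"))
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_test_file, test_files))

for name, passed in results:
    print(f"{'✓' if passed else '✗'} {name}")
sys.exit(0 if all(ok for _, ok in results) else 1)
Only parallelize files that don't race on the same cache entries.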
3. Tests must PASS 100%
Quality Gate: Skill is only "done" when ALL tests pass.
if __name__ == "__main__":
    success = main()
    if not success:
        print("\n❌ SKILL NOT READY - FIX FAILING TESTS")
        sys.exit(1)
    else:
        print("\n✅ SKILL READY FOR DISTRIBUTION")
        sys.exit(0)
Test Coverage Requirements
Minimum Mandatory:
Per module:
- fetch_{api}.py: 1 test per get_*() method + 1 error handling test
- parse_{type}.py: 1 test per main parse function
- analyze_{domain}.py: 2 tests per analysis (auto-year + explicit-year)
- utils/helpers.py: 3 tests (get_year, should_fallback, format_message)
Expected total: 15-30 tests depending on skill size
Example (us-crop-monitor v2.0):
- test_fetch.py: 6 tests (5 get_* + 1 error)
- test_parse.py: 4 tests (4 parsers)
- test_analyze.py: 11 tests (11 functions)
- test_helpers.py: 3 tests
- test_integration.py: 1 end-to-end test
- Total: 25 tests
How to Run Tests
Individual:
python3 tests/test_fetch.py
python3 tests/test_integration.py
Complete suite:
# Run all
for test in tests/test_*.py; do
  python3 "$test" || exit 1
done
# Or with pytest (if available)
pytest tests/
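Note that the templates above use a print-and-return style with their own main(); pytest instead discovers test_* functions and treats assertion failures as test failures. If you adopt pytest, an equivalent test might look like this (a sketch using placeholder names from Template 1, not a fixed API):
def test_get_metric1():
    """pytest-style equivalent: no manual True/False bookkeeping needed."""
    client = ApiClient()  # placeholder client class from Template 1
    result = client.get_metric1(entity="VALID_ENTITY", year=2024)
    assert "data" in result
    assert "metadata" in result
    assert len(result["data"]) > 0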
In CI/CD:
# .github/workflows/test.yml
name: Test Suite
on: [push]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - run: pip install -r requirements.txt
      - run: python3 tests/test_integration.py
Output Example
When tests pass:
======================================================================
US CROP MONITOR - INTEGRATION TEST SUITE
======================================================================
1. current_condition_report()...
   ✓ Year: 2025 | Week: 39
   ✓ Good+Excellent: 66.0%
2. week_over_week_comparison()...
   ✓ Year: 2025 | Weeks: 39 vs 38
   ✓ Delta: -2.2 pts
...
======================================================================
FINAL SUMMARY
======================================================================
✓ Passed: 25/25 tests
✗ Failed: 0/25 tests
🎉 ALL TESTS PASSED! SKILL IS PRODUCTION READY!
======================================================================
When tests fail:
8. yield_analysis()...
   ✗ FAILED: 'yield_bu_per_acre' not in result
...
FINAL SUMMARY:
✓ Passed: 24/25
✗ Failed: 1/25
❌ SKILL NOT READY - FIX FAILING TESTS
Integration with Agent-Creator
When to generate tests:
In Phase 5 (Implementation):
Updated order:
...
8. Implement analyze (analyses)
9. CREATE TESTS (← here!)
   - Generate test_fetch.py
   - Generate test_parse.py
   - Generate test_analyze.py
   - Generate test_helpers.py
   - Generate test_integration.py
10. RUN TESTS
    - Run the test suite
    - If it fails → FIX and re-run
    - Only continue when 100% passing
11. Create examples/
...
Quality Gate:
# The agent-creator should do:
import subprocess
import sys

print("Running test suite...")
exit_code = subprocess.run(['python3', 'tests/test_integration.py']).returncode

if exit_code != 0:
    print("❌ Tests failed - aborting skill generation")
    print("Fix errors above and try again")
    sys.exit(1)

print("✅ All tests passed - continuing...")
Testing Checklist
Before considering the skill "done":
- tests/ directory created
- test_fetch.py with 1 test per get_*() method
- test_parse.py with 1 test per parser
- test_analyze.py with 2 tests per function (auto-year + explicit)
- test_helpers.py with year detection tests
- test_integration.py with end-to-end test
- ALL tests passing (100%)
- Test suite executes in < 60 seconds
- README in tests/ explaining how to run (see the sketch below)
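A minimal tests/README.md can be as short as this (the exact wording is illustrative):
# Tests

Run an individual suite:
  python3 tests/test_fetch.py

Run everything:
  for test in tests/test_*.py; do python3 "$test" || exit 1; done

Every suite exits 0 on success and 1 on failure. The full run should
complete in under 60 seconds (the first run populates the cache).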
Real Example: us-crop-monitor v2.0
Tests created:
- test_new_metrics.py - 5 tests (fetch methods)
- test_year_detection.py - 2 tests (auto-detection)
- test_all_year_detection.py - 4 tests (all functions)
- test_new_analyses.py - 3 tests (new analyses)
- tests/test_integrated_validation.py - 11 tests (comprehensive)
Total: 25 tests, 100% passing
Result:
✓ Passed: 25/25 tests
🎉 ALL TESTS PASSED! SKILL IS PRODUCTION READY!
Benefit: Full confidence v2.0 works before distribution!
Conclusion
ALWAYS generate a test suite!
Skills without tests = prototypes.
Skills with tests = professional products ✅
ROI: tests take about 2 extra hours to create but save 10-20 hours of debugging later!