# Phase 6: Test Suite Generation (NEW v2.0!)

## Objective

**GENERATE** a comprehensive test suite that validates ALL functions of the created skill.

**LEARNING:** us-crop-monitor v1.0 had ZERO tests. When expanding to v2.0, it was difficult to ensure nothing broke. v2.0 has 25 tests (100% passing) that ensure reliability.

---

## Why Are Tests Critical?

### Benefits for Developer:
- ✅ Ensures code works before distribution
- ✅ Detects bugs early (not after the client installs!)
- ✅ Allows confident changes (regression testing)
- ✅ Documents expected behavior

### Benefits for Client:
- ✅ Confidence in the skill ("100% tested")
- ✅ Fewer bugs in production
- ✅ More professional (commercially viable)

### Benefits for Agent-Creator:
- ✅ Validates that the generated skill actually works
- ✅ Catches errors before the skill is considered "done"
- ✅ Automatic quality gate

---

## Test Structure

### tests/ Directory

```
{skill-name}/
└── tests/
    ├── test_fetch.py        # Tests API client
    ├── test_parse.py        # Tests parsers
    ├── test_analyze.py      # Tests analyses
    ├── test_integration.py  # Tests end-to-end
    ├── test_validation.py   # Tests validators
    ├── test_helpers.py      # Tests helpers (year detection, etc.)
    └── README.md            # How to run tests
```

---

## Template 1: test_fetch.py

**Objective:** Validate that the API client works

```python
#!/usr/bin/env python3
"""
Test suite for {API} client.
Tests all fetch methods with real API data.
"""

import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).parent.parent / 'scripts'))

from fetch_{api} import {ApiClient}, DataNotFoundError


def test_get_{metric1}():
    """Test fetching {metric1} data."""
    print("\nTesting get_{metric1}()...")

    try:
        client = {ApiClient}()

        # Test with valid parameters
        result = client.get_{metric1}(
            {entity}='{valid_entity}',
            year=2024
        )

        # Validations
        assert 'data' in result, "Missing 'data' in result"
        assert 'metadata' in result, "Missing 'metadata'"
        assert len(result['data']) > 0, "No data returned"
        assert result['metadata']['from_cache'] in [True, False]

        print(f"  ✓ Fetched {len(result['data'])} records")
        print(f"  ✓ Metadata present")
        print(f"  ✓ From cache: {result['metadata']['from_cache']}")
        return True

    except Exception as e:
        print(f"  ✗ FAILED: {e}")
        return False


def test_get_{metric2}():
    """Test fetching {metric2} data."""
    # Similar structure...
    pass


def test_error_handling():
    """Test that errors are handled correctly."""
    print("\nTesting error handling...")

    try:
        client = {ApiClient}()

        # Test invalid entity (should raise)
        try:
            result = client.get_{metric1}({entity}='INVALID_ENTITY', year=2024)
            print("  ✗ Should have raised DataNotFoundError")
            return False
        except DataNotFoundError:
            print("  ✓ Correctly raises DataNotFoundError for invalid entity")

        # Test invalid year (should raise)
        try:
            result = client.get_{metric1}({entity}='{valid}', year=2099)
            print("  ✗ Should have raised ValidationError")
            return False
        except Exception:
            print("  ✓ Correctly raises error for future year")

        return True

    except Exception as e:
        print(f"  ✗ Unexpected error: {e}")
        return False


def main():
    """Run all fetch tests."""
    print("=" * 70)
    print("FETCH TESTS - {API} Client")
    print("=" * 70)

    results = []

    # Test each get_* method
    results.append(("get_{metric1}", test_get_{metric1}()))
    results.append(("get_{metric2}", test_get_{metric2}()))
    # ... add a test for ALL get_* methods
    results.append(("error_handling", test_error_handling()))

    # Summary
    print("\n" + "=" * 70)
    print("SUMMARY")
    print("=" * 70)

    passed = sum(1 for _, r in results if r)
    total = len(results)

    for name, result in results:
        status = "✓ PASS" if result else "✗ FAIL"
        print(f"{status}: {name}()")

    print(f"\nResults: {passed}/{total} tests passed")
    return passed == total


if __name__ == "__main__":
    success = main()
    sys.exit(0 if success else 1)
```

**Rule:** ONE test function for EACH `get_*()` method implemented!
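If you want to enforce this rule mechanically, the fetch test file can end with a coverage check that compares the client's `get_*` methods against the `test_get_*` functions defined in the module. This is an optional sketch, not part of the template above; it reuses the `{ApiClient}` placeholder and assumes it lives inside `test_fetch.py` itself:

```python
import inspect


def test_all_get_methods_covered():
    """Verify every get_* method on the client has a matching test_get_* function."""
    # {ApiClient} is the same placeholder used in the template above
    client_methods = {
        name for name, member in inspect.getmembers({ApiClient}, inspect.isfunction)
        if name.startswith('get_')
    }

    # test_get_foo -> get_foo, collected from this module's namespace
    test_targets = {
        name[len('test_'):] for name in globals()
        if name.startswith('test_get_') and callable(globals()[name])
    }

    missing = client_methods - test_targets
    if missing:
        print(f"  ✗ Missing tests for: {sorted(missing)}")
        return False

    print(f"  ✓ All {len(client_methods)} get_* methods have tests")
    return True
```

Register it in `main()` like any other test so the summary reports it alongside the rest.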
---

## Template 2: test_parse.py

**Objective:** Validate parsers

```python
#!/usr/bin/env python3
"""
Test suite for data parsers.
Tests all parse_* modules.
"""

import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).parent.parent / 'scripts'))

from parse_{type1} import parse_{type1}_response
from parse_{type2} import parse_{type2}_response


def test_parse_{type1}():
    """Test {type1} parser."""
    print("\nTesting parse_{type1}_response()...")

    # Sample data (real structure from API)
    sample_data = [
        {
            'field1': 'value1',
            'field2': 'value2',
            'Value': '123',
            # ... real API fields
        }
    ]

    try:
        df = parse_{type1}_response(sample_data)

        # Validations
        assert not df.empty, "DataFrame is empty"
        assert 'Value' in df.columns or '{metric}_value' in df.columns
        assert len(df) == len(sample_data)

        print(f"  ✓ Parsed {len(df)} records")
        print(f"  ✓ Columns: {list(df.columns)}")
        return True

    except Exception as e:
        print(f"  ✗ FAILED: {e}")
        import traceback
        traceback.print_exc()
        return False


def test_parse_empty_data():
    """Test parser handles empty data gracefully."""
    print("\nTesting empty data handling...")

    try:
        from parse_{type1} import ParseError

        try:
            df = parse_{type1}_response([])
            print("  ✗ Should have raised ParseError")
            return False
        except ParseError as e:
            print(f"  ✓ Correctly raises ParseError: {e}")
            return True

    except Exception as e:
        print(f"  ✗ Unexpected error: {e}")
        return False


def main():
    results = []

    # Test each parser
    results.append(("parse_{type1}", test_parse_{type1}()))
    results.append(("parse_{type2}", test_parse_{type2}()))
    # ... for ALL parsers
    results.append(("empty_data", test_parse_empty_data()))

    # Summary
    passed = sum(1 for _, r in results if r)
    print(f"\nResults: {passed}/{len(results)} passed")
    return passed == len(results)


if __name__ == "__main__":
    sys.exit(0 if main() else 1)
```
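Hard-coded samples are convenient, but (in line with the real-data rule later in this document) a parser can also be exercised against the output of a real fetch. A minimal sketch, reusing the template placeholders and assuming the fetch result exposes its records under `'data'` as in Template 1:

```python
def test_parse_{type1}_real_response():
    """Parse a real API response end-to-end (fetch → parse)."""
    print("\nTesting parse_{type1}_response() on real data...")

    try:
        # Same client placeholder as in Template 1
        from fetch_{api} import {ApiClient}

        client = {ApiClient}()
        raw = client.get_{metric1}({entity}='{valid_entity}', year=2024)

        df = parse_{type1}_response(raw['data'])

        assert not df.empty, "DataFrame is empty"
        assert len(df) == len(raw['data']), "Row count does not match raw records"

        print(f"  ✓ Parsed {len(df)} real records")
        return True

    except Exception as e:
        print(f"  ✗ FAILED: {e}")
        return False
```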
---

## Template 3: test_integration.py

**Objective:** End-to-end tests (MOST IMPORTANT!)

```python
#!/usr/bin/env python3
"""
Integration tests for {skill-name}.
Tests all analysis functions with REAL API data.
"""

import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).parent.parent / 'scripts'))

from analyze_{domain} import (
    {function1},
    {function2},
    {function3},
    # ... import ALL functions
)


def test_{function1}():
    """Test {function1} with auto-year detection."""
    print("\n1. Testing {function1}()...")

    try:
        # Test WITHOUT year (auto-detection)
        result = {function1}({entity}='{valid_entity}')

        # Validations
        assert 'year' in result, "Missing year"
        assert 'year_requested' in result, "Missing year_requested"
        assert 'year_info' in result, "Missing year_info"
        assert result['year'] >= 2024, "Year too old"
        assert result['year_requested'] is None, "Should auto-detect"

        print(f"  ✓ Auto-year detection: {result['year']}")
        print(f"  ✓ Year info: {result['year_info']}")
        print(f"  ✓ Data present: {list(result.keys())}")
        return True

    except Exception as e:
        print(f"  ✗ FAILED: {e}")
        import traceback
        traceback.print_exc()
        return False


def test_{function1}_with_explicit_year():
    """Test {function1} with explicit year."""
    print("\n2. Testing {function1}() with explicit year...")

    try:
        # Test WITH year specified
        result = {function1}({entity}='{valid_entity}', year=2024)

        assert result['year'] == 2024, f"Expected 2024, got {result['year']}"
        assert result['year_requested'] == 2024

        print(f"  ✓ Uses specified year: {result['year']}")
        return True

    except Exception as e:
        print(f"  ✗ FAILED: {e}")
        return False


def test_all_functions_exist():
    """Verify all expected functions are implemented."""
    print("\nVerifying all functions exist...")

    expected_functions = [
        '{function1}',
        '{function2}',
        '{function3}',
        # ... ALL functions
    ]

    missing = []
    for func_name in expected_functions:
        if func_name not in globals():
            missing.append(func_name)

    if missing:
        print(f"  ✗ Missing functions: {missing}")
        return False
    else:
        print(f"  ✓ All {len(expected_functions)} functions present")
        return True


def main():
    """Run all integration tests."""
    print("\n" + "=" * 70)
    print("{SKILL NAME} - INTEGRATION TEST SUITE")
    print("=" * 70)

    results = []

    # Test each function
    results.append(("{function1} auto-year", test_{function1}()))
    results.append(("{function1} explicit-year", test_{function1}_with_explicit_year()))
    # ... repeat for ALL functions
    results.append(("all_functions_exist", test_all_functions_exist()))

    # Summary
    print("\n" + "=" * 70)
    print("FINAL SUMMARY")
    print("=" * 70)

    passed = sum(1 for _, r in results if r)
    total = len(results)

    print(f"\n✓ Passed: {passed}/{total}")
    print(f"✗ Failed: {total - passed}/{total}")

    if passed == total:
        print("\n🎉 ALL TESTS PASSED! SKILL IS PRODUCTION READY!")
    else:
        print(f"\n⚠ {total - passed} test(s) failed - FIX BEFORE RELEASE")

    print("=" * 70)
    return passed == total


if __name__ == "__main__":
    success = main()
    sys.exit(0 if success else 1)
```

**Rule:** Minimum 2 tests per analysis function (auto-year + explicit-year)
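To keep this rule from turning into copy-pasted boilerplate, the integration suite can also drive each analysis function through both modes from a single helper. This is an optional sketch, not part of the template above; it assumes every analysis function accepts the same `{entity}` keyword and returns the `year` / `year_requested` keys shown in the template:

```python
def run_both_year_modes(func, **kwargs):
    """Run one analysis function with auto-detected year, then with an explicit year."""
    checks = []

    # Auto-year: no year argument, the function must detect it
    auto = func(**kwargs)
    checks.append(auto['year_requested'] is None)

    # Explicit year: the function must honor the requested year
    explicit = func(year=2024, **kwargs)
    checks.append(explicit['year'] == 2024)

    return all(checks)


def test_all_functions_both_modes():
    """Exercise every analysis function in auto-year and explicit-year mode."""
    functions = [{function1}, {function2}, {function3}]  # ... ALL functions

    ok = True
    for func in functions:
        passed = run_both_year_modes(func, {entity}='{valid_entity}')
        print(f"  {'✓' if passed else '✗'} {func.__name__}")
        ok = ok and passed

    return ok
```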
""" import sys from pathlib import Path from datetime import datetime sys.path.insert(0, str(Path(__file__).parent.parent / 'scripts')) from utils.helpers import ( get_current_{domain}_year, should_try_previous_year, format_year_message ) def test_get_current_year(): """Test current year detection.""" print("\nTesting get_current_{domain}_year()...") try: year = get_current_{domain}_year() current_year = datetime.now().year assert year == current_year, f"Expected {current_year}, got {year}" print(f" ✓ Correctly returns: {year}") return True except Exception as e: print(f" ✗ FAILED: {e}") return False def test_should_try_previous_year(): """Test seasonal fallback logic.""" print("\nTesting should_try_previous_year()...") try: # Test with None (current year) result = should_try_previous_year() print(f" ✓ Current year fallback: {result}") # Test with specific year result_past = should_try_previous_year(2023) print(f" ✓ Past year fallback: {result_past}") return True except Exception as e: print(f" ✗ FAILED: {e}") return False def test_format_year_message(): """Test year message formatting.""" print("\nTesting format_year_message()...") try: # Test auto-detected msg1 = format_year_message(2025, None) assert "auto-detected" in msg1.lower() or "2025" in msg1 print(f" ✓ Auto-detected: {msg1}") # Test requested msg2 = format_year_message(2024, 2024) assert "2024" in msg2 print(f" ✓ Requested: {msg2}") # Test fallback msg3 = format_year_message(2024, 2025) assert "not" in msg3.lower() or "fallback" in msg3.lower() print(f" ✓ Fallback: {msg3}") return True except Exception as e: print(f" ✗ FAILED: {e}") return False def main(): results = [] results.append(("get_current_year", test_get_current_year())) results.append(("should_try_previous_year", test_should_try_previous_year())) results.append(("format_year_message", test_format_year_message())) passed = sum(1 for _, r in results if r) print(f"\nResults: {passed}/{len(results)} passed") return passed == len(results) if __name__ == "__main__": sys.exit(0 if main() else 1) ``` --- ## Quality Rules for Tests ### 1. ALL tests must use REAL DATA ❌ **FORBIDDEN:** ```python def test_function(): # Mock data mock_data = {'fake': 'data'} result = function(mock_data) assert result == 'expected' ``` ✅ **MANDATORY:** ```python def test_function(): # Real API call client = ApiClient() result = client.get_real_data(entity='REAL', year=2024) # Validate real response assert len(result['data']) > 0 assert 'metadata' in result ``` **Why?** - Tests with mocks don't guarantee API is working - Real tests detect API changes - Client needs to know it works with REAL data --- ### 2. Tests must be FAST **Goal:** Complete suite in < 60 seconds **Techniques:** - Use cache: First test populates cache, rest use cached - Limit requests: Don't test 100 entities, test 2-3 - Parallel where possible ```python # Example: Populate cache once @classmethod def setUpClass(cls): """Populate cache before all tests.""" client = ApiClient() client.get_data('ENTITY1', 2024) # Cache for other tests # Tests then use cached data (fast) ``` --- ### 3. Tests must PASS 100% **Quality Gate:** Skill is only "done" when ALL tests pass. 
---

### 3. Tests must PASS 100%

**Quality Gate:** The skill is only "done" when ALL tests pass.

```python
if __name__ == "__main__":
    success = main()

    if not success:
        print("\n❌ SKILL NOT READY - FIX FAILING TESTS")
        sys.exit(1)
    else:
        print("\n✅ SKILL READY FOR DISTRIBUTION")
        sys.exit(0)
```

---

## Test Coverage Requirements

### Minimum Mandatory:

**Per module:**
- `fetch_{api}.py`: 1 test per `get_*()` method + 1 error handling test
- Each `parse_{type}.py`: 1 test per main function
- `analyze_{domain}.py`: 2 tests per analysis (auto-year + explicit-year)
- `utils/helpers.py`: 3 tests (get_year, should_fallback, format_message)

**Expected total:** 15-30 tests, depending on skill size

**Example (us-crop-monitor v2.0):**
- test_fetch.py: 6 tests (5 get_* + 1 error)
- test_parse.py: 4 tests (4 parsers)
- test_analyze.py: 11 tests (11 functions)
- test_helpers.py: 3 tests
- test_integration.py: 1 end-to-end test
- **Total:** 25 tests

---

## How to Run Tests

### Individual:

```bash
python3 tests/test_fetch.py
python3 tests/test_integration.py
```

### Complete suite:

```bash
# Run all
for test in tests/test_*.py; do
    python3 "$test" || exit 1
done

# Or with pytest (if available)
pytest tests/
```

### In CI/CD:

```yaml
# .github/workflows/test.yml
name: Test Suite
on: [push]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - run: pip install -r requirements.txt
      - run: python3 tests/test_integration.py
```
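The shell loop above can also be written as a small Python runner, which works without bash and reports the total runtime against the < 60-second goal. A sketch, assuming it is saved as `tests/run_all_tests.py` (the file name is illustrative):

```python
#!/usr/bin/env python3
"""Run every tests/test_*.py file and summarize the results (sketch)."""

import subprocess
import sys
import time
from pathlib import Path


def main():
    # The runner's own name does not match test_*.py, so it will not run itself
    test_files = sorted(Path(__file__).parent.glob('test_*.py'))
    start = time.monotonic()
    failures = []

    for test_file in test_files:
        print(f"\n>>> Running {test_file.name}")
        result = subprocess.run([sys.executable, str(test_file)])
        if result.returncode != 0:
            failures.append(test_file.name)

    elapsed = time.monotonic() - start
    print(f"\n{len(test_files) - len(failures)}/{len(test_files)} files passed "
          f"in {elapsed:.1f}s (goal: < 60s)")

    return not failures


if __name__ == "__main__":
    sys.exit(0 if main() else 1)
```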
---

## Output Example

**When tests pass:**

```
======================================================================
US CROP MONITOR - INTEGRATION TEST SUITE
======================================================================

1. current_condition_report()...
   ✓ Year: 2025 | Week: 39
   ✓ Good+Excellent: 66.0%

2. week_over_week_comparison()...
   ✓ Year: 2025 | Weeks: 39 vs 38
   ✓ Delta: -2.2 pts

...

======================================================================
FINAL SUMMARY
======================================================================
✓ Passed: 25/25 tests
✗ Failed: 0/25 tests

🎉 ALL TESTS PASSED! SKILL IS PRODUCTION READY!
======================================================================
```

**When tests fail:**

```
8. yield_analysis()...
   ✗ FAILED: 'yield_bu_per_acre' not in result

...

FINAL SUMMARY:
✓ Passed: 24/25
✗ Failed: 1/25

❌ SKILL NOT READY - FIX FAILING TESTS
```

---

## Integration with Agent-Creator

### When to generate tests:

**In Phase 5 (Implementation):** Updated order:

```
...
8.  Implement analyze (analyses)
9.  CREATE TESTS (← here!)
    - Generate test_fetch.py
    - Generate test_parse.py
    - Generate test_analyze.py
    - Generate test_helpers.py
    - Generate test_integration.py
10. RUN TESTS
    - Run test suite
    - If it fails → FIX and re-run
    - Only continue when 100% passing
11. Create examples/
...
```

### Quality Gate:

```python
# Agent-creator should do:
import subprocess
import sys

print("Running test suite...")
exit_code = subprocess.run(['python3', 'tests/test_integration.py']).returncode

if exit_code != 0:
    print("❌ Tests failed - aborting skill generation")
    print("Fix errors above and try again")
    sys.exit(1)

print("✅ All tests passed - continuing...")
```

---

## Testing Checklist

Before considering the skill "done":

- [ ] tests/ directory created
- [ ] test_fetch.py with 1 test per get_*() method
- [ ] test_parse.py with 1 test per parser
- [ ] test_analyze.py with 2 tests per function (auto-year + explicit)
- [ ] test_helpers.py with year detection tests
- [ ] test_integration.py with end-to-end test
- [ ] ALL tests passing (100%)
- [ ] Test suite executes in < 60 seconds
- [ ] README in tests/ explaining how to run

---

## Real Example: us-crop-monitor v2.0

**Tests created:**
- `test_new_metrics.py` - 5 tests (fetch methods)
- `test_year_detection.py` - 2 tests (auto-detection)
- `test_all_year_detection.py` - 4 tests (all functions)
- `test_new_analyses.py` - 3 tests (new analyses)
- `tests/test_integrated_validation.py` - 11 tests (comprehensive)

**Total:** 25 tests, 100% passing

**Result:**

```
✓ Passed: 25/25 tests
🎉 ALL TESTS PASSED! SKILL IS PRODUCTION READY!
```

**Benefit:** Full confidence that v2.0 works before distribution!

---

## Conclusion

**ALWAYS generate a test suite!**

Skills without tests = prototypes
Skills with tests = professional products ✅

**ROI:** Tests cost an extra ~2 hours to create, but save 10-20 hours of debugging later!