# Phase 6: Test Suite Generation (NEW v2.0!)

## Objective

**GENERATE** a comprehensive test suite that validates ALL functions of the created skill.

**LEARNING:** us-crop-monitor v1.0 had ZERO tests. When expanding to v2.0, it was difficult to ensure nothing broke. v2.0 has 25 tests (100% passing) that ensure reliability.

---

## Why Are Tests Critical?

### Benefits for the Developer:

- ✅ Ensures the code works before distribution
- ✅ Detects bugs early (not after the client installs!)
- ✅ Allows confident changes (regression testing)
- ✅ Documents expected behavior

### Benefits for the Client:

- ✅ Confidence in the skill ("100% tested")
- ✅ Fewer bugs in production
- ✅ More professional (commercially viable)

### Benefits for the Agent-Creator:

- ✅ Validates that the generated skill actually works
- ✅ Catches errors before the skill is considered "done"
- ✅ Automatic quality gate

---

## Test Structure

### tests/ Directory

```
{skill-name}/
└── tests/
    ├── test_fetch.py          # Tests API client
    ├── test_parse.py          # Tests parsers
    ├── test_analyze.py        # Tests analyses
    ├── test_integration.py    # Tests end-to-end
    ├── test_validation.py     # Tests validators
    ├── test_helpers.py        # Tests helpers (year detection, etc.)
    └── README.md              # How to run tests
```
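
The tree above includes a `tests/README.md`. A minimal sketch of what that file might contain (the wording and commands are illustrative; adapt them to the skill):

```markdown
# Tests for {skill-name}

Run an individual test module:

    python3 tests/test_fetch.py

Run the full suite (stops on the first failure):

    for test in tests/test_*.py; do python3 "$test" || exit 1; done

Note: all tests call the real API, so the first run is slower while the cache is populated.
```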

---

## Template 1: test_fetch.py

**Objective:** Validate that the API client works

```python
#!/usr/bin/env python3
"""
Test suite for the {API} client.

Tests all fetch methods with real API data.
"""

import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).parent.parent / 'scripts'))

from fetch_{api} import {ApiClient}, DataNotFoundError


def test_get_{metric1}():
    """Test fetching {metric1} data."""
    print("\nTesting get_{metric1}()...")

    try:
        client = {ApiClient}()

        # Test with valid parameters
        result = client.get_{metric1}(
            {entity}='{valid_entity}',
            year=2024
        )

        # Validations
        assert 'data' in result, "Missing 'data' in result"
        assert 'metadata' in result, "Missing 'metadata'"
        assert len(result['data']) > 0, "No data returned"
        assert result['metadata']['from_cache'] in [True, False]

        print(f" ✓ Fetched {len(result['data'])} records")
        print(" ✓ Metadata present")
        print(f" ✓ From cache: {result['metadata']['from_cache']}")

        return True

    except Exception as e:
        print(f" ✗ FAILED: {e}")
        return False


def test_get_{metric2}():
    """Test fetching {metric2} data."""
    # Similar structure to test_get_{metric1}()...
    pass


def test_error_handling():
    """Test that errors are handled correctly."""
    print("\nTesting error handling...")

    try:
        client = {ApiClient}()

        # Test invalid entity (should raise)
        try:
            client.get_{metric1}({entity}='INVALID_ENTITY', year=2024)
            print(" ✗ Should have raised DataNotFoundError")
            return False
        except DataNotFoundError:
            print(" ✓ Correctly raises DataNotFoundError for invalid entity")

        # Test invalid year (should raise)
        try:
            client.get_{metric1}({entity}='{valid}', year=2099)
            print(" ✗ Should have raised ValidationError")
            return False
        except Exception:
            print(" ✓ Correctly raises an error for a future year")

        return True

    except Exception as e:
        print(f" ✗ Unexpected error: {e}")
        return False


def main():
    """Run all fetch tests."""
    print("=" * 70)
    print("FETCH TESTS - {API} Client")
    print("=" * 70)

    results = []

    # Test each get_* method
    results.append(("get_{metric1}", test_get_{metric1}()))
    results.append(("get_{metric2}", test_get_{metric2}()))
    # ... add a test for ALL get_* methods

    results.append(("error_handling", test_error_handling()))

    # Summary
    print("\n" + "=" * 70)
    print("SUMMARY")
    print("=" * 70)

    passed = sum(1 for _, r in results if r)
    total = len(results)

    for name, result in results:
        status = "✓ PASS" if result else "✗ FAIL"
        print(f"{status}: {name}()")

    print(f"\nResults: {passed}/{total} tests passed")

    return passed == total


if __name__ == "__main__":
    success = main()
    sys.exit(0 if success else 1)
```

**Rule:** ONE test function for EACH `get_*()` method implemented!

---

## Template 2: test_parse.py

**Objective:** Validate the parsers

```python
#!/usr/bin/env python3
"""
Test suite for data parsers.

Tests all parse_* modules.
"""

import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).parent.parent / 'scripts'))

from parse_{type1} import parse_{type1}_response
from parse_{type2} import parse_{type2}_response


def test_parse_{type1}():
    """Test the {type1} parser."""
    print("\nTesting parse_{type1}_response()...")

    # Sample data (real structure from the API)
    sample_data = [
        {
            'field1': 'value1',
            'field2': 'value2',
            'Value': '123',
            # ... real API fields
        }
    ]

    try:
        df = parse_{type1}_response(sample_data)

        # Validations
        assert not df.empty, "DataFrame is empty"
        assert 'Value' in df.columns or '{metric}_value' in df.columns
        assert len(df) == len(sample_data)

        print(f" ✓ Parsed {len(df)} records")
        print(f" ✓ Columns: {list(df.columns)}")

        return True

    except Exception as e:
        print(f" ✗ FAILED: {e}")
        import traceback
        traceback.print_exc()
        return False


def test_parse_{type2}():
    """Test the {type2} parser."""
    # Similar structure to test_parse_{type1}()...
    pass


def test_parse_empty_data():
    """Test that the parser handles empty data gracefully."""
    print("\nTesting empty data handling...")

    try:
        from parse_{type1} import ParseError

        try:
            parse_{type1}_response([])
            print(" ✗ Should have raised ParseError")
            return False
        except ParseError as e:
            print(f" ✓ Correctly raises ParseError: {e}")
            return True

    except Exception as e:
        print(f" ✗ Unexpected error: {e}")
        return False


def main():
    results = []

    # Test each parser
    results.append(("parse_{type1}", test_parse_{type1}()))
    results.append(("parse_{type2}", test_parse_{type2}()))
    # ... for ALL parsers

    results.append(("empty_data", test_parse_empty_data()))

    # Summary
    passed = sum(1 for _, r in results if r)
    print(f"\nResults: {passed}/{len(results)} passed")
    return passed == len(results)


if __name__ == "__main__":
    sys.exit(0 if main() else 1)
```

---

## Template 3: test_integration.py

**Objective:** End-to-end tests (MOST IMPORTANT!)

```python
#!/usr/bin/env python3
"""
Integration tests for {skill-name}.

Tests all analysis functions with REAL API data.
"""

import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).parent.parent / 'scripts'))

from analyze_{domain} import (
    {function1},
    {function2},
    {function3},
    # ... import ALL functions
)


def test_{function1}():
    """Test {function1} with auto-year detection."""
    print("\n1. Testing {function1}()...")

    try:
        # Test WITHOUT year (auto-detection)
        result = {function1}({entity}='{valid_entity}')

        # Validations
        assert 'year' in result, "Missing year"
        assert 'year_requested' in result, "Missing year_requested"
        assert 'year_info' in result, "Missing year_info"
        assert result['year'] >= 2024, "Year too old"
        assert result['year_requested'] is None, "Should auto-detect"

        print(f" ✓ Auto-year detection: {result['year']}")
        print(f" ✓ Year info: {result['year_info']}")
        print(f" ✓ Data present: {list(result.keys())}")

        return True

    except Exception as e:
        print(f" ✗ FAILED: {e}")
        import traceback
        traceback.print_exc()
        return False


def test_{function1}_with_explicit_year():
    """Test {function1} with an explicit year."""
    print("\n2. Testing {function1}() with explicit year...")

    try:
        # Test WITH year specified
        result = {function1}({entity}='{valid_entity}', year=2024)

        assert result['year'] == 2024, f"Expected 2024, got {result['year']}"
        assert result['year_requested'] == 2024

        print(f" ✓ Uses specified year: {result['year']}")

        return True

    except Exception as e:
        print(f" ✗ FAILED: {e}")
        return False


def test_all_functions_exist():
    """Verify all expected functions are implemented."""
    print("\nVerifying all functions exist...")

    expected_functions = [
        '{function1}',
        '{function2}',
        '{function3}',
        # ... ALL functions
    ]

    missing = []
    for func_name in expected_functions:
        # Names imported above via "from analyze_{domain} import ..." land in globals()
        if func_name not in globals():
            missing.append(func_name)

    if missing:
        print(f" ✗ Missing functions: {missing}")
        return False
    else:
        print(f" ✓ All {len(expected_functions)} functions present")
        return True


def main():
    """Run all integration tests."""
    print("\n" + "=" * 70)
    print("{SKILL NAME} - INTEGRATION TEST SUITE")
    print("=" * 70)

    results = []

    # Test each function
    results.append(("{function1} auto-year", test_{function1}()))
    results.append(("{function1} explicit-year", test_{function1}_with_explicit_year()))
    # ... repeat for ALL functions

    results.append(("all_functions_exist", test_all_functions_exist()))

    # Summary
    print("\n" + "=" * 70)
    print("FINAL SUMMARY")
    print("=" * 70)

    passed = sum(1 for _, r in results if r)
    total = len(results)

    print(f"\n✓ Passed: {passed}/{total}")
    print(f"✗ Failed: {total - passed}/{total}")

    if passed == total:
        print("\n🎉 ALL TESTS PASSED! SKILL IS PRODUCTION READY!")
    else:
        print(f"\n⚠ {total - passed} test(s) failed - FIX BEFORE RELEASE")

    print("=" * 70)

    return passed == total


if __name__ == "__main__":
    success = main()
    sys.exit(0 if success else 1)
```

**Rule:** Minimum 2 tests per analysis function (auto-year + explicit-year)

---

## Template 4: test_helpers.py

**Objective:** Test the year-detection helpers

```python
#!/usr/bin/env python3
"""
Test suite for utility helpers.

Tests temporal context detection.
"""

import sys
from pathlib import Path
from datetime import datetime

sys.path.insert(0, str(Path(__file__).parent.parent / 'scripts'))

from utils.helpers import (
    get_current_{domain}_year,
    should_try_previous_year,
    format_year_message
)


def test_get_current_year():
    """Test current year detection."""
    print("\nTesting get_current_{domain}_year()...")

    try:
        year = get_current_{domain}_year()
        current_year = datetime.now().year

        assert year == current_year, f"Expected {current_year}, got {year}"

        print(f" ✓ Correctly returns: {year}")

        return True

    except Exception as e:
        print(f" ✗ FAILED: {e}")
        return False


def test_should_try_previous_year():
    """Test seasonal fallback logic."""
    print("\nTesting should_try_previous_year()...")

    try:
        # Test with None (current year)
        result = should_try_previous_year()
        print(f" ✓ Current year fallback: {result}")

        # Test with a specific year
        result_past = should_try_previous_year(2023)
        print(f" ✓ Past year fallback: {result_past}")

        return True

    except Exception as e:
        print(f" ✗ FAILED: {e}")
        return False


def test_format_year_message():
    """Test year message formatting."""
    print("\nTesting format_year_message()...")

    try:
        # Test auto-detected
        msg1 = format_year_message(2025, None)
        assert "auto-detected" in msg1.lower() or "2025" in msg1
        print(f" ✓ Auto-detected: {msg1}")

        # Test requested
        msg2 = format_year_message(2024, 2024)
        assert "2024" in msg2
        print(f" ✓ Requested: {msg2}")

        # Test fallback
        msg3 = format_year_message(2024, 2025)
        assert "not" in msg3.lower() or "fallback" in msg3.lower()
        print(f" ✓ Fallback: {msg3}")

        return True

    except Exception as e:
        print(f" ✗ FAILED: {e}")
        return False


def main():
    results = []

    results.append(("get_current_year", test_get_current_year()))
    results.append(("should_try_previous_year", test_should_try_previous_year()))
    results.append(("format_year_message", test_format_year_message()))

    passed = sum(1 for _, r in results if r)
    print(f"\nResults: {passed}/{len(results)} passed")

    return passed == len(results)


if __name__ == "__main__":
    sys.exit(0 if main() else 1)
```

---

## Quality Rules for Tests

### 1. ALL tests must use REAL DATA

❌ **FORBIDDEN:**
```python
def test_function():
    # Mock data
    mock_data = {'fake': 'data'}
    result = function(mock_data)
    assert result == 'expected'
```

✅ **MANDATORY:**
```python
def test_function():
    # Real API call
    client = ApiClient()
    result = client.get_real_data(entity='REAL', year=2024)

    # Validate the real response
    assert len(result['data']) > 0
    assert 'metadata' in result
```

**Why?**
- Tests with mocks don't guarantee the API is working
- Real tests detect API changes
- The client needs to know it works with REAL data

---

### 2. Tests must be FAST

**Goal:** Complete suite in < 60 seconds

**Techniques:**
- Use the cache: the first test populates it, the rest read from it
- Limit requests: don't test 100 entities, test 2-3
- Run in parallel where possible

```python
# Example: populate the cache once (unittest style)
import unittest

class FetchTests(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        """Populate cache before all tests."""
        client = ApiClient()
        client.get_data('ENTITY1', 2024)  # cached for the other tests

    # Tests then use cached data (fast)
```

---

### 3. Tests must PASS 100%

**Quality Gate:** The skill is only "done" when ALL tests pass.

```python
if __name__ == "__main__":
    success = main()
    if not success:
        print("\n❌ SKILL NOT READY - FIX FAILING TESTS")
        sys.exit(1)
    else:
        print("\n✅ SKILL READY FOR DISTRIBUTION")
        sys.exit(0)
```

---

## Test Coverage Requirements

### Minimum Mandatory:

**Per module:**
- `fetch_{api}.py`: 1 test per `get_*()` method + 1 error-handling test
- Each `parse_{type}.py`: 1 test per main function
- `analyze_{domain}.py`: 2 tests per analysis (auto-year + explicit-year)
- `utils/helpers.py`: 3 tests (get_year, should_fallback, format_message)

**Expected total:** 15-30 tests depending on skill size

**Example (us-crop-monitor v2.0):**
- test_fetch.py: 6 tests (5 get_* + 1 error)
- test_parse.py: 4 tests (4 parsers)
- test_analyze.py: 11 tests (11 functions)
- test_helpers.py: 3 tests
- test_integration.py: 1 end-to-end test
- **Total:** 25 tests
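
Optionally, a small self-check script can count the `test_` functions per module and flag files that fall short of these minimums. This is a sketch, not part of the required structure; the file name and the thresholds in `MINIMUMS` are assumptions to adjust per skill:

```python
#!/usr/bin/env python3
"""Rough coverage check: count test_* functions in each tests/test_*.py module."""
import re
import sys
from pathlib import Path

# Minimum test counts per module (illustrative values; derive them from the table above)
MINIMUMS = {'test_fetch.py': 2, 'test_parse.py': 2, 'test_analyze.py': 2, 'test_helpers.py': 3}


def count_tests(path: Path) -> int:
    """Count top-level 'def test_*' definitions in a file."""
    return len(re.findall(r'^def test_\w+\(', path.read_text(), flags=re.MULTILINE))


def main() -> bool:
    ok = True
    for test_file in sorted(Path('tests').glob('test_*.py')):
        n = count_tests(test_file)
        minimum = MINIMUMS.get(test_file.name, 1)
        if n < minimum:
            ok = False
        status = "✓" if n >= minimum else "✗"
        print(f"{status} {test_file.name}: {n} tests (minimum {minimum})")
    return ok


if __name__ == "__main__":
    sys.exit(0 if main() else 1)
```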
---

## How to Run Tests

### Individual:
```bash
python3 tests/test_fetch.py
python3 tests/test_integration.py
```

### Complete suite:
```bash
# Run all
for test in tests/test_*.py; do
    python3 "$test" || exit 1
done

# Or with pytest (if available)
pytest tests/
```
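
If a cross-platform runner is preferred over the shell loop, a sketch like the following does the same thing in Python (the file name `run_tests.py` is illustrative): it runs each test module as a subprocess and stops at the first failure.

```python
#!/usr/bin/env python3
"""Run every tests/test_*.py module and stop on the first failure."""
import subprocess
import sys
from pathlib import Path


def main() -> int:
    for test_file in sorted(Path('tests').glob('test_*.py')):
        print(f"\n=== Running {test_file.name} ===")
        result = subprocess.run([sys.executable, str(test_file)])
        if result.returncode != 0:
            print(f"\n❌ {test_file.name} failed - aborting")
            return 1
    print("\n✅ All test modules passed")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```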

### In CI/CD:
```yaml
# .github/workflows/test.yml
name: Test Suite
on: [push]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - run: pip install -r requirements.txt
      - run: python3 tests/test_integration.py
```

---

## Output Example

**When tests pass:**
```
======================================================================
US CROP MONITOR - INTEGRATION TEST SUITE
======================================================================

1. current_condition_report()...
   ✓ Year: 2025 | Week: 39
   ✓ Good+Excellent: 66.0%

2. week_over_week_comparison()...
   ✓ Year: 2025 | Weeks: 39 vs 38
   ✓ Delta: -2.2 pts

...

======================================================================
FINAL SUMMARY
======================================================================

✓ Passed: 25/25 tests
✗ Failed: 0/25 tests

🎉 ALL TESTS PASSED! SKILL IS PRODUCTION READY!
======================================================================
```

**When tests fail:**
```
8. yield_analysis()...
   ✗ FAILED: 'yield_bu_per_acre' not in result

...

FINAL SUMMARY:
✓ Passed: 24/25
✗ Failed: 1/25

❌ SKILL NOT READY - FIX FAILING TESTS
```

---

## Integration with Agent-Creator

### When to generate tests:

**In Phase 5 (Implementation):**

Updated order:
```
...
8. Implement analyze (analyses)
9. CREATE TESTS (← here!)
   - Generate test_fetch.py
   - Generate test_parse.py
   - Generate test_analyze.py
   - Generate test_helpers.py
   - Generate test_integration.py
10. RUN TESTS
    - Run the test suite
    - If it fails → FIX and re-run
    - Only continue when 100% passing
11. Create examples/
...
```

### Quality Gate:

```python
# The agent-creator should do:
import subprocess
import sys

print("Running test suite...")
exit_code = subprocess.run(['python3', 'tests/test_integration.py']).returncode

if exit_code != 0:
    print("❌ Tests failed - aborting skill generation")
    print("Fix errors above and try again")
    sys.exit(1)

print("✅ All tests passed - continuing...")
```

---

## Testing Checklist

Before considering the skill "done":

- [ ] tests/ directory created
- [ ] test_fetch.py with 1 test per get_*() method
- [ ] test_parse.py with 1 test per parser
- [ ] test_analyze.py with 2 tests per function (auto-year + explicit)
- [ ] test_helpers.py with year detection tests
- [ ] test_integration.py with end-to-end test
- [ ] ALL tests passing (100%)
- [ ] Test suite executes in < 60 seconds
- [ ] README in tests/ explaining how to run

---

## Real Example: us-crop-monitor v2.0

**Tests created:**
- `test_new_metrics.py` - 5 tests (fetch methods)
- `test_year_detection.py` - 2 tests (auto-detection)
- `test_all_year_detection.py` - 4 tests (all functions)
- `test_new_analyses.py` - 3 tests (new analyses)
- `tests/test_integrated_validation.py` - 11 tests (comprehensive)

**Total:** 25 tests, 100% passing

**Result:**
```
✓ Passed: 25/25 tests
🎉 ALL TESTS PASSED! SKILL IS PRODUCTION READY!
```

**Benefit:** Full confidence that v2.0 works before distribution!

---

## Conclusion

**ALWAYS generate a test suite!**

Skills without tests = prototypes

Skills with tests = professional products ✅

**ROI:** Tests add ~2 hours of work, but save 10-20 hours of debugging later!