Initial commit

2025-11-29 18:28:20 +08:00
commit 167c936f63
6 changed files with 751 additions and 0 deletions
--- a/tests/README.md
+++ b/tests/README.md
@@ -0,0 +1,182 @@
+# Test Suite for Debugging Kubernetes Incidents Skill
+
+## Overview
+
+This test suite checks that the Kubernetes incident debugging skill teaches Claude Code to:
+1. Recognize common Kubernetes failure patterns
+2. Follow a systematic investigation methodology
+3. Correlate multiple data sources (logs, events, metrics)
+4. Distinguish root causes from symptoms
+5. Maintain a read-only investigation approach
+
+## Test Scenarios
+
+### 1. CrashLoopBackOff Recognition
+**Purpose**: Validates that Claude recognizes the crash-loop pattern and proposes a structured investigation
+**Expected**: Should mention checking logs from the previous container (`--previous`), events, `kubectl describe` output, and exit codes
+**Baseline Failure**: Without the skill, Claude may suggest fixes before investigating
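+
+For reference, the evidence this scenario expects maps onto a handful of read-only `kubectl` commands. The helper below is a hypothetical sketch, not part of the test harness: it only builds command strings, and the pod and namespace names are placeholders.
+
+```python
+from typing import List
+
+
+def crashloop_commands(pod: str, namespace: str = "default") -> List[str]:
+    """Build read-only kubectl commands for investigating a crash-looping pod."""
+    ns = f"-n {namespace}"
+    return [
+        # Logs from the previous (crashed) container instance.
+        f"kubectl logs {pod} {ns} --previous",
+        # Restart count, last state, probe settings, and recent events.
+        f"kubectl describe pod {pod} {ns}",
+        # Events scoped to this pod only.
+        f"kubectl get events {ns} --field-selector involvedObject.name={pod}",
+        # Exit code of the last terminated container(s).
+        (f"kubectl get pod {pod} {ns} -o jsonpath="
+         "'{.status.containerStatuses[*].lastState.terminated.exitCode}'"),
+    ]
+
+
+for cmd in crashloop_commands("api-7d9f", "prod"):
+    print(cmd)
+```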
+
+### 2. OOMKilled Investigation
+**Purpose**: Ensures Claude identifies memory exhaustion and correlates it with the container's resource limits
+**Expected**: Should investigate memory usage patterns, limits, and potential leaks
+**Baseline Failure**: Without the skill, Claude may simply suggest increasing memory
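+
+Confirming an OOM kill means reading the container's last termination state and comparing it with the configured memory limit. A minimal sketch, assuming the pod has been fetched with `kubectl get pod <pod> -o json` and parsed into a dict; the function name and the trimmed sample pod are hypothetical.
+
+```python
+from typing import Any, Dict, List
+
+
+def oom_killed_containers(pod: Dict[str, Any]) -> List[Dict[str, Any]]:
+    """Pair each container whose last termination was OOMKilled with its memory limit."""
+    # Container name -> memory limit from the pod spec (None when no limit is set).
+    limits = {
+        c["name"]: c.get("resources", {}).get("limits", {}).get("memory")
+        for c in pod["spec"]["containers"]
+    }
+    findings = []
+    for status in pod.get("status", {}).get("containerStatuses", []):
+        terminated = status.get("lastState", {}).get("terminated", {})
+        if terminated.get("reason") == "OOMKilled":
+            findings.append({
+                "container": status["name"],
+                "exitCode": terminated.get("exitCode"),
+                "memoryLimit": limits.get(status["name"]),
+                "restartCount": status.get("restartCount", 0),
+            })
+    return findings
+
+
+# Trimmed, invented example of `kubectl get pod <pod> -o json` output.
+pod = {
+    "spec": {"containers": [{"name": "api", "resources": {"limits": {"memory": "256Mi"}}}]},
+    "status": {"containerStatuses": [{
+        "name": "api",
+        "restartCount": 4,
+        "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}},
+    }]},
+}
+print(oom_killed_containers(pod))
+```
+
+An `OOMKilled` reason with exit code 137 shows only that the limit was hit. Telling an undersized limit apart from a leak still requires memory usage over time, which is what this scenario expects Claude to ask for.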
+
+### 3. Multi-Source Correlation
+**Purpose**: Tests the ability to gather and correlate logs, events, and metrics
+**Expected**: Should mention all three data sources and building a timeline from them
+**Baseline Failure**: Without the skill, Claude may focus on a single data source
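+
+Timeline creation amounts to merging timestamped entries from each source and sorting them. A minimal sketch, assuming each source has already been reduced to `(timestamp, source, message)` tuples; the function name and sample entries are invented for illustration.
+
+```python
+from datetime import datetime
+from typing import List, Tuple
+
+# (timestamp, source, message)
+Entry = Tuple[datetime, str, str]
+
+
+def build_timeline(*sources: List[Entry]) -> List[Entry]:
+    """Merge entries from several sources into one chronologically ordered list."""
+    merged = [entry for source in sources for entry in source]
+    return sorted(merged, key=lambda e: e[0])
+
+
+ts = datetime.fromisoformat
+logs = [(ts("2025-01-10T10:02:05"), "logs", "java.lang.OutOfMemoryError")]
+events = [(ts("2025-01-10T10:02:10"), "events", "Back-off restarting failed container")]
+metrics = [(ts("2025-01-10T10:01:00"), "metrics", "memory at 95% of limit")]
+
+for when, source, message in build_timeline(logs, events, metrics):
+    print(when.time(), f"[{source}]", message)
+```
+
+All timestamps need to be in the same timezone (UTC is the usual choice) before merging; otherwise the ordering is meaningless.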
+
+### 4. Root Cause vs Symptom
+**Purpose**: Validates the use of temporal analysis to distinguish cause from effect
+**Expected**: Should build a timeline and ask "what happened first"
+**Baseline Failure**: Without the skill, Claude may confuse correlation with causation
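+
+The "what happened first" heuristic can be sketched as ordering anomalies by when each was first observed and treating the earliest as the root-cause candidate. This is an illustrative sketch with invented data and a hypothetical function name, not the skill's exact procedure.
+
+```python
+from datetime import datetime
+from typing import List, Tuple
+
+# (time the anomaly was first observed, description)
+Signal = Tuple[datetime, str]
+
+
+def split_cause_and_symptoms(signals: List[Signal]) -> Tuple[Signal, List[Signal]]:
+    """Return the earliest anomaly as the candidate cause and the rest, in order."""
+    if not signals:
+        raise ValueError("need at least one anomaly")
+    ordered = sorted(signals, key=lambda s: s[0])
+    return ordered[0], ordered[1:]
+
+
+ts = datetime.fromisoformat
+signals = [
+    (ts("2025-01-10T10:03:00"), "HTTP 5xx rate spike on the ingress"),
+    (ts("2025-01-10T10:01:00"), "memory climbing toward the limit"),
+    (ts("2025-01-10T10:02:00"), "container OOMKilled, pod restarting"),
+]
+cause, symptoms = split_cause_and_symptoms(signals)
+print("candidate root cause:", cause[1])
+for when, what in symptoms:
+    print("  then:", when.time(), what)
+```
+
+The earliest signal is a candidate, not a conclusion: two anomalies can share an upstream cause, and a signal that is collected late can appear out of order, so the causal link still has to be checked.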
+
+### 5. Image Pull Failure
+**Purpose**: Tests a systematic approach to debugging ImagePullBackOff
+**Expected**: Should check the image name, registry, and pull secrets in a methodical order
+**Baseline Failure**: Without the skill, Claude may suggest trial-and-error fixes
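+
+Checking the image name starts with splitting the reference into registry, repository, and tag or digest, since each part points to a different check (registry reachability and credentials, repository name, tag existence). A minimal sketch that approximates Docker's reference normalization (default registry `docker.io`, implicit `library/` namespace, default tag `latest`); it does not enforce every character rule of the reference grammar, and the function name is hypothetical.
+
+```python
+from typing import Dict
+
+
+def parse_image_ref(ref: str) -> Dict[str, str]:
+    """Split an image reference into registry, repository, tag, and digest."""
+    # A digest follows '@', e.g. app@sha256:....
+    name, _, digest = ref.partition("@")
+    # A tag is the text after a ':' that comes after the last '/';
+    # a ':' before the last '/' is a registry port instead.
+    tag = ""
+    last_slash, last_colon = name.rfind("/"), name.rfind(":")
+    if last_colon > last_slash:
+        name, tag = name[:last_colon], name[last_colon + 1:]
+    # The first path component is a registry only if it looks like a host.
+    first, sep, rest = name.partition("/")
+    if sep and ("." in first or ":" in first or first == "localhost"):
+        registry, repository = first, rest
+    else:
+        registry, repository = "docker.io", name
+    # Docker Hub official images live under the implicit library/ namespace.
+    if registry == "docker.io" and "/" not in repository:
+        repository = "library/" + repository
+    if not tag and not digest:
+        tag = "latest"
+    return {"registry": registry, "repository": repository, "tag": tag, "digest": digest}
+
+
+print(parse_image_ref("registry.example.com:5000/team/app:1.2"))
+```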
+
+### 6. Read-Only Investigation
+**Purpose**: Ensures the skill keeps Claude in an advisory-only role
+**Expected**: Should recommend steps, not execute changes
+**Baseline Failure**: Without the skill, Claude might suggest modifying cluster resources directly
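+
+One way to check the advisory-only behavior mechanically is to flag any suggested `kubectl` command whose verb can change cluster state. A conservative sketch, assuming commands appear one per line and unquoted in the response: anything not on a small allow-list of read-only verbs counts as mutating, including commands with flags before the verb. The allow-list and function names are illustrative, not the suite's actual rules.
+
+```python
+import shlex
+from typing import List
+
+# Verbs that only read cluster state; everything else is treated as mutating.
+READ_ONLY_VERBS = {"get", "describe", "logs", "top", "explain"}
+
+
+def is_read_only(command: str) -> bool:
+    """Return True only for `kubectl <read-only verb> ...`."""
+    try:
+        tokens = shlex.split(command)
+    except ValueError:  # unbalanced quotes: cannot classify, so reject
+        return False
+    # Conservative: the verb must come immediately after `kubectl`.
+    return len(tokens) >= 2 and tokens[0] == "kubectl" and tokens[1] in READ_ONLY_VERBS
+
+
+def mutating_commands(response: str) -> List[str]:
+    """List kubectl command lines in a response that are not read-only."""
+    lines = (line.strip() for line in response.splitlines())
+    return [line for line in lines if line.startswith("kubectl") and not is_read_only(line)]
+
+
+response = """Run these to investigate:
+kubectl describe pod api-7d9f -n prod
+kubectl logs api-7d9f -n prod --previous
+kubectl delete pod api-7d9f -n prod
+"""
+print(mutating_commands(response))  # ['kubectl delete pod api-7d9f -n prod']
+```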
+
+## Running Tests
+
+### Prerequisites
+
+- Python 3.8+
+- `claudelint` installed for validation
+- Claude Code CLI access
+- Access to Claude Sonnet 4 or Claude Opus 4 (the tests use the `sonnet` model)
+
+### Run All Tests
+
+```bash
+# From repository root
+make test
+
+# Or specifically for this skill
+make test-only SKILL=debugging-kubernetes-incidents
+```
+
+### Validate Skill Schema
+
+```bash
+claudelint debugging-kubernetes-incidents/SKILL.md
+```
+
+### Generate Test Results
+
+```bash
+make generate SKILL=debugging-kubernetes-incidents
+```
+
+## Test-Driven Development Process
+
+This skill was developed with test-driven development (TDD) applied to documentation:
+
+### RED Phase (Initial Failures)
+1. Created 6 test scenarios representing real investigation needs
+2. Ran tests WITHOUT the skill
+3. Documented baseline failures:
+   - Suggested direct fixes without investigation
+   - Missed multi-source correlation
+   - Confused symptoms with root causes
+   - Lacked systematic methodology
+
+### GREEN Phase (Minimal Skill)
+1. Created SKILL.md addressing test failures
+2. Added investigation phases and decision trees
+3. Included multi-source correlation guidance
+4. Emphasized read-only approach
+5. All tests passed
+
+### REFACTOR Phase (Improvement)
+1. Added real-world examples
+2. Enhanced decision trees
+3. Improved troubleshooting matrix
+4. Refined investigation methodology
+5. Added keyword search terms
+
+## Success Criteria
+
+All tests must:
+- ✅ Pass with 100% success rate (3/3 samples)
+- ✅ Contain the expected keywords
+- ✅ NOT contain any prohibited terms
+- ✅ Demonstrate systematic approach
+- ✅ Maintain read-only advisory model
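+
+The keyword criteria above can be expressed as two checks per sample plus an all-samples rule per scenario. A minimal sketch, assuming case-insensitive substring matching and that every expected keyword must appear; the real harness may match differently, and the function names are hypothetical.
+
+```python
+from typing import Sequence
+
+
+def sample_passes(response: str, expected: Sequence[str], prohibited: Sequence[str]) -> bool:
+    """A sample passes if it has every expected keyword and no prohibited term."""
+    text = response.lower()
+    has_all_expected = all(k.lower() in text for k in expected)
+    has_no_prohibited = not any(p.lower() in text for p in prohibited)
+    return has_all_expected and has_no_prohibited
+
+
+def scenario_passes(responses: Sequence[str], expected: Sequence[str],
+                    prohibited: Sequence[str]) -> bool:
+    """A scenario passes only at a 100% success rate, e.g. 3/3 samples."""
+    return bool(responses) and all(sample_passes(r, expected, prohibited) for r in responses)
+
+
+samples = [
+    "Check kubectl logs --previous, then describe the pod and review events.",
+    "Start with the events, the logs from --previous, and describe output.",
+    "Look at events and logs (--previous); describe shows the exit code.",
+]
+print(scenario_passes(samples, ["logs", "--previous", "events", "describe"], ["kubectl delete"]))
+```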
+
+## Continuous Validation
+
+Tests run automatically on:
+- Every pull request (GitHub Actions)
+- Skill file modifications
+- Schema changes
+- Version updates
+
+## Adding New Tests
+
+To add test scenarios:
+
+1. **Identify gap**: What investigation scenario is missing?
+2. **Create scenario**: Add to `scenarios.yaml`
+3. **Run without skill**: Document baseline failure
+4. **Update SKILL.md**: Address the gap
+5. **Validate**: Ensure test passes
+
+Example:
+```yaml
+- name: your-test-name
+  description: What you're testing
+  prompt: "User query to test"
+  model: sonnet
+  samples: 3
+  expected:
+    contains_keywords:
+      - keyword1
+      - keyword2
+  baseline_failure: What happens without the skill
+```
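+
+Before running a new scenario, it can help to check that the entry has every field the example uses. A minimal sketch, assuming the YAML has already been loaded into Python dicts (for example with a YAML library) and that the fields in the example are the required ones; the actual schema may include more, and the function name is hypothetical.
+
+```python
+from typing import Any, Dict, List
+
+REQUIRED_FIELDS = ("name", "description", "prompt", "model", "samples", "expected", "baseline_failure")
+
+
+def scenario_problems(scenario: Dict[str, Any]) -> List[str]:
+    """Return problems found in one scenario entry (empty if it looks valid)."""
+    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in scenario]
+    samples = scenario.get("samples")
+    if samples is not None and (not isinstance(samples, int) or samples < 1):
+        problems.append("samples must be a positive integer")
+    keywords = (scenario.get("expected") or {}).get("contains_keywords", [])
+    if not keywords:
+        problems.append("expected.contains_keywords should list at least one keyword")
+    return problems
+
+
+scenario = {
+    "name": "your-test-name",
+    "description": "What you're testing",
+    "prompt": "User query to test",
+    "model": "sonnet",
+    "samples": 3,
+    "expected": {"contains_keywords": ["keyword1", "keyword2"]},
+}
+print(scenario_problems(scenario))  # ['missing field: baseline_failure']
+```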
+
+## Known Limitations
+
+- Tests use synthetic scenarios (not real cluster data)
+- Keyword matching is basic (could use semantic analysis)
+- No integration testing with actual Kubernetes clusters
+- Sample size (3) may not catch all edge cases
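+
+To see why three samples can miss problems: if a failure mode shows up independently in a fraction q of responses, all n samples pass with probability `(1 - q)^n`. The sketch below computes that, plus the smallest n that brings the miss rate under a chosen threshold; q = 0.1 is an illustrative value, not a measured one, and the function names are hypothetical.
+
+```python
+def miss_probability(q: float, samples: int) -> float:
+    """Chance that all `samples` independent runs avoid a failure that occurs at rate q."""
+    return (1 - q) ** samples
+
+
+def samples_needed(q: float, max_miss: float) -> int:
+    """Smallest sample count whose miss probability is at most max_miss."""
+    if not 0 < q < 1 or not 0 < max_miss < 1:
+        raise ValueError("q and max_miss must be in (0, 1)")
+    n = 1
+    while miss_probability(q, n) > max_miss:
+        n += 1
+    return n
+
+
+print(round(miss_probability(0.1, 3), 3))  # 0.729
+print(samples_needed(0.1, 0.05))  # 29
+```
+
+So a failure that appears in one response out of ten still passes a 3/3 gate about 73% of the time.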
+
+## Future Improvements
+
+- Add tests for more complex multi-cluster scenarios
+- Include performance regression testing
+- Add semantic similarity scoring
+- Test with real cluster incident data
+- Add negative test cases (what Claude should NOT do)
+
+## Troubleshooting
+
+### Test Failures
+
+**Symptom**: Test fails intermittently
+**Fix**: Increase the sample count or refine the expected keywords
+
+**Symptom**: All tests fail
+**Fix**: Check the SKILL.md frontmatter and run schema validation with `claudelint`
+
+**Symptom**: Baseline failure unclear
+**Fix**: Run the test manually without the skill and document the actual output
+
+## Contributing
+
+When contributing test improvements:
+1. Keep pass/fail checks deterministic (keyword assertions, not exact-output matches, since model responses vary between samples)
+2. Use realistic user prompts
+3. Document baseline failures clearly
+4. Keep the sample count reasonable (3-5)
+5. Update this README with new scenarios
+
+## Questions?
+
+See the main repository documentation or file an issue.