Initial commit

2025-11-29 18:28:20 +08:00
commit 167c936f63
6 changed files with 751 additions and 0 deletions
--- a/tests/README.md
+++ b/tests/README.md
@@ -0,0 +1,182 @@
+# Test Suite for Debugging Kubernetes Incidents Skill
+
+## Overview
+
+This test suite validates that the Kubernetes incident debugging skill properly teaches Claude Code to:
+1. Recognize common Kubernetes failure patterns
+2. Follow a systematic investigation methodology
+3. Correlate multiple data sources (logs, events, metrics)
+4. Distinguish root causes from symptoms
+5. Maintain a read-only investigation approach
+
+## Test Scenarios
+
+### 1. CrashLoopBackOff Recognition
+**Purpose**: Validates that Claude recognizes the crash loop pattern and suggests a proper investigation
+**Expected**: Should mention checking previous-container logs (`kubectl logs --previous`), events, `kubectl describe` output, and exit codes
+**Baseline Failure**: Without the skill, may suggest fixes without investigation
+
+### 2. OOMKilled Investigation
+**Purpose**: Ensures Claude identifies memory exhaustion and correlates it with resource limits
+**Expected**: Should investigate memory usage patterns, limits, and potential leaks
+**Baseline Failure**: Without the skill, may only suggest increasing memory
+
+### 3. Multi-Source Correlation
+**Purpose**: Tests the ability to gather and correlate logs, events, and metrics
+**Expected**: Should mention all three data sources and building a timeline
+**Baseline Failure**: Without the skill, may focus on a single data source
+
+### 4. Root Cause vs Symptom
+**Purpose**: Validates the use of temporal analysis to distinguish cause from effect
+**Expected**: Should build a timeline and ask "what happened first"
+**Baseline Failure**: Without the skill, may confuse correlation with causation
+
+### 5. Image Pull Failure
+**Purpose**: Tests a systematic approach to ImagePullBackOff debugging
+**Expected**: Should check the image name, registry, and pull secrets in order
+**Baseline Failure**: Without the skill, may suggest random fixes
+
+### 6. Read-Only Investigation
+**Purpose**: Ensures the skill maintains an advisory-only approach
+**Expected**: Should recommend steps rather than execute changes
+**Baseline Failure**: Without the skill, might suggest direct modifications
+
+## Running Tests
+
+### Prerequisites
+
+- Python 3.8+
+- `claudelint` installed for validation
+- Claude Code CLI access
+- Claude Sonnet 4 or Claude Opus 4 (the tests use the `sonnet` model)
+
+### Run All Tests
+
+```bash
+# From repository root
+make test
+
+# Or specifically for this skill
+make test-only SKILL=debugging-kubernetes-incidents
+```
+
+### Validate Skill Schema
+
+```bash
+claudelint debugging-kubernetes-incidents/SKILL.md
+```
+
+### Generate Test Results
+
+```bash
+make generate SKILL=debugging-kubernetes-incidents
+```
+
+## Test-Driven Development Process
+
+This skill was developed using test-driven development (TDD) applied to documentation:
+
+### RED Phase (Initial Failures)
+1. Created 6 test scenarios representing real investigation needs
+2. Ran tests WITHOUT the skill
+3. Documented baseline failures:
+   - Suggested direct fixes without investigation
+   - Missed multi-source correlation
+   - Confused symptoms with root causes
+   - Lacked systematic methodology
+
+### GREEN Phase (Minimal Skill)
+1. Created SKILL.md addressing test failures
+2. Added investigation phases and decision trees
+3. Included multi-source correlation guidance
+4. Emphasized read-only approach
+5. All tests passed
+
+### REFACTOR Phase (Improvement)
+1. Added real-world examples
+2. Enhanced decision trees
+3. Improved troubleshooting matrix
+4. Refined investigation methodology
+5. Added keyword search terms
+
+## Success Criteria
+
+All tests must:
+- ✅ Pass with 100% success rate (3/3 samples)
+- ✅ Contain expected keywords
+- ✅ NOT contain prohibited terms
+- ✅ Demonstrate systematic approach
+- ✅ Maintain read-only advisory model
+
+## Continuous Validation
+
+Tests run automatically on:
+- Every pull request (GitHub Actions)
+- Skill file modifications
+- Schema changes
+- Version updates
+
+## Adding New Tests
+
+To add test scenarios:
+
+1. **Identify gap**: What investigation scenario is missing?
+2. **Create scenario**: Add to `scenarios.yaml`
+3. **Run without the skill**: Document baseline failure
+4. **Update SKILL.md**: Address the gap
+5. **Validate**: Ensure test passes
+
+Example:
+```yaml
+- name: your-test-name
+  description: What you're testing
+  prompt: "User query to test"
+  model: sonnet
+  samples: 3
+  expected:
+    contains_keywords:
+      - keyword1
+      - keyword2
+  baseline_failure: What happens without the skill
+```
+
+## Known Limitations
+
+- Tests use synthetic scenarios (not real cluster data)
+- Keyword matching is basic (could use semantic analysis)
+- No integration testing with actual Kubernetes clusters
+- Sample size (3) may not catch all edge cases
+
+## Future Improvements
+
+- Add tests for more complex multi-cluster scenarios
+- Include performance regression testing
+- Add semantic similarity scoring
+- Test with real cluster incident data
+- Add negative test cases (what the skill should NOT do)
+
+## Troubleshooting
+
+### Test Failures
+
+**Symptom**: Test fails intermittently
+**Fix**: Increase samples or refine expected keywords
+
+**Symptom**: All tests fail
+**Fix**: Check SKILL.md frontmatter and schema validation
+
+**Symptom**: Baseline failure unclear
+**Fix**: Run test manually without skill, document actual output
+
+## Contributing
+
+When contributing test improvements:
+1. Ensure tests are deterministic
+2. Use realistic user prompts
+3. Document baseline failures clearly
+4. Keep the sample count reasonable (3-5)
+5. Update this README with new scenarios
+
+## Questions?
+
+See the main repository documentation or file an issue.
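Note on the README's success criteria (expected keywords present, prohibited terms absent, every sample passing): the harness that enforces them is not part of this diff. The following is a minimal Python sketch of how such a check might work, assuming case-insensitive substring matching. `check_response` and `scenario_passes` are hypothetical names chosen for illustration, not functions from the repository.

```python
from typing import Dict, List


def check_response(response: str, expected: Dict[str, List[str]]) -> List[str]:
    """Return failure messages for one sampled response (empty list means pass).

    Assumption: matching is a case-insensitive substring test.
    """
    text = response.lower()
    failures = []
    # Every expected keyword must appear somewhere in the response.
    for keyword in expected.get("contains_keywords", []):
        if keyword.lower() not in text:
            failures.append(f"missing keyword: {keyword}")
    # No prohibited term may appear anywhere in the response.
    for term in expected.get("does_not_contain", []):
        if term.lower() in text:
            failures.append(f"prohibited term present: {term}")
    return failures


def scenario_passes(responses: List[str], expected: Dict[str, List[str]]) -> bool:
    """A scenario passes only if there is at least one sample and all samples pass."""
    return bool(responses) and all(not check_response(r, expected) for r in responses)


# Values taken from the crashloopbackoff-recognition scenario.
expected = {
    "contains_keywords": ["logs", "previous", "events", "describe", "exit code"],
    "does_not_contain": ["modify", "delete", "force"],
}
good = ("Run kubectl describe pod, review events, then check logs "
        "with --previous and note the exit code.")
print(scenario_passes([good, good, good], expected))  # prints True
```

Under this assumption, a response such as "do not delete the pod" would still trip the `delete` rule, which is one concrete form of the "keyword matching is basic" limitation the README lists.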
--- a/tests/scenarios.yaml
+++ b/tests/scenarios.yaml
@@ -0,0 +1,116 @@
+skill_name: debugging-kubernetes-incidents
+description: Validates systematic Kubernetes incident investigation methodology and root cause analysis capabilities
+test_scenarios:
+  - name: crashloopbackoff-recognition
+    description: Recognizes CrashLoopBackOff pattern and suggests appropriate investigation steps
+    prompt: "I have a pod in CrashLoopBackOff state. What should I do?"
+    model: sonnet
+    samples: 3
+    expected:
+      recognizes: CrashLoopBackOff
+      contains_keywords:
+        - logs
+        - previous
+        - events
+        - describe
+        - exit code
+      does_not_contain:
+        - modify
+        - delete
+        - force
+    baseline_failure: Without the skill, may suggest direct fixes without investigation or miss the importance of checking previous container logs
+
+  - name: oomkilled-investigation
+    description: Identifies memory exhaustion pattern and correlates events with resource limits
+    prompt: "My pods keep getting OOMKilled. How do I find out why?"
+    model: sonnet
+    samples: 3
+    expected:
+      recognizes: OOMKilled
+      contains_keywords:
+        - memory
+        - limits
+        - resources
+        - usage
+        - leak
+        - events
+      does_not_contain:
+        - ignore
+        - just restart
+    baseline_failure: Without the skill, may only suggest increasing memory without investigating the underlying cause of memory exhaustion
+
+  - name: multi-source-correlation
+    description: Demonstrates correlation of logs, events, and metrics for complete incident picture
+    prompt: "I'm investigating a service degradation issue. What data should I collect and how do I correlate it?"
+    model: sonnet
+    samples: 3
+    expected:
+      contains_keywords:
+        - logs
+        - events
+        - metrics
+        - correlate
+        - timeline
+        - multiple sources
+      mentions_tools:
+        - kubectl logs
+        - kubectl get events
+        - kubectl top
+    baseline_failure: Without the skill, may focus on only one data source without correlating across logs, events, and metrics
+
+  - name: root-cause-vs-symptom
+    description: Distinguishes between root cause and symptoms using temporal analysis
+    prompt: "I see high CPU usage and connection errors. Which is the root cause?"
+    model: sonnet
+    samples: 3
+    expected:
+      contains_keywords:
+        - temporal
+        - first
+        - timeline
+        - causation
+        - correlation
+      provides_approach:
+        - Check what happened first
+        - Create timeline
+        - Validate causal mechanism
+    baseline_failure: Without the skill, may incorrectly identify symptoms as root causes or confuse correlation with causation
+
+  - name: image-pull-failure
+    description: Provides systematic approach to investigating ImagePullBackOff errors
+    prompt: "Pod status shows ImagePullBackOff. How do I debug this?"
+    model: sonnet
+    samples: 3
+    expected:
+      recognizes: ImagePullBackOff
+      contains_keywords:
+        - image
+        - registry
+        - secret
+        - pull
+        - authentication
+        - describe
+      investigation_steps:
+        - Check image name
+        - Verify registry
+        - Check pull secrets
+    baseline_failure: Without the skill, may suggest random fixes without systematic investigation of image name, registry access, or credentials
+
+  - name: read-only-investigation
+    description: Ensures skill maintains read-only approach and never suggests direct modifications
+    prompt: "I found the issue. How do I fix the pod?"
+    model: sonnet
+    samples: 3
+    expected:
+      contains_keywords:
+        - recommend
+        - manual
+        - review
+        - steps
+      does_not_contain:
+        - automatically
+        - I will delete
+        - I will modify
+        - I will restart
+      approach: Advisory recommendations only
+    baseline_failure: Without the skill, might suggest direct modifications or automated fixes without proper human review
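Note on the scenario file above: all six entries share the same top-level fields, so a small structural check could catch malformed additions before they reach the test run. The sketch below is one hedged way to do that in Python. `validate_scenario` and `REQUIRED_FIELDS` are hypothetical, and treating the shared fields as required is an assumption, since the diff does not include a schema. The scenario is written as a plain dict so the example runs without a YAML library.

```python
from typing import Any, Dict, List

# Fields shared by all six scenarios in scenarios.yaml. Treating them as
# required is an assumption of this sketch, not a documented schema.
REQUIRED_FIELDS = (
    "name", "description", "prompt", "model",
    "samples", "expected", "baseline_failure",
)


def validate_scenario(scenario: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one scenario entry (empty means OK)."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in scenario]

    # The sample count should be a positive integer when present.
    samples = scenario.get("samples")
    if samples is not None and (not isinstance(samples, int) or samples < 1):
        problems.append("samples must be a positive integer")

    # A term cannot be both required and prohibited.
    expected = scenario.get("expected")
    if isinstance(expected, dict):
        required = {k.lower() for k in expected.get("contains_keywords", [])}
        prohibited = {k.lower() for k in expected.get("does_not_contain", [])}
        for term in sorted(required & prohibited):
            problems.append(f"keyword both required and prohibited: {term}")
    return problems


# The read-only-investigation scenario, transcribed as a dict.
example = {
    "name": "read-only-investigation",
    "description": "Ensures skill maintains read-only approach",
    "prompt": "I found the issue. How do I fix the pod?",
    "model": "sonnet",
    "samples": 3,
    "expected": {
        "contains_keywords": ["recommend", "manual", "review", "steps"],
        "does_not_contain": ["automatically", "I will delete"],
    },
    "baseline_failure": "Might suggest direct modifications",
}
print(validate_scenario(example))  # prints []
```

The overlap check is included because a keyword listed under both `contains_keywords` and `does_not_contain` would make a scenario impossible to pass.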