Initial commit

Zhongwei Li
2025-11-29 18:28:20 +08:00
commit 167c936f63
6 changed files with 751 additions and 0 deletions

tests/README.md Normal file

@@ -0,0 +1,182 @@
# Test Suite for Debugging Kubernetes Incidents Skill
## Overview
This test suite validates that the Kubernetes incident debugging skill properly teaches Claude Code to:
1. Recognize common Kubernetes failure patterns
2. Follow a systematic investigation methodology
3. Correlate multiple data sources (logs, events, metrics)
4. Distinguish root causes from symptoms
5. Maintain a read-only investigation approach
## Test Scenarios
### 1. CrashLoopBackOff Recognition
**Purpose**: Validates that Claude recognizes the crash loop pattern and suggests a proper investigation
**Expected**: Should mention checking logs (--previous), events, describe, and exit codes
**Baseline Failure**: Without skill, may suggest fixes without investigation
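
As a rough illustration of the expected investigation steps, the sketch below builds the read-only `kubectl` commands for a crashing pod without running them. The helper name `crashloop_commands` and the pod and namespace values are hypothetical; the `kubectl` subcommands and flags are standard ones.

```python
from typing import List


def crashloop_commands(pod: str, namespace: str = "default") -> List[str]:
    """Return read-only kubectl commands for a CrashLoopBackOff pod (hypothetical helper)."""
    ns = f"-n {namespace}"
    return [
        # Logs from the previous (crashed) container instance
        f"kubectl logs {pod} {ns} --previous",
        # Pod details: container state, restart count, last termination reason
        f"kubectl describe pod {pod} {ns}",
        # Events that reference this pod, sorted by last timestamp
        f"kubectl get events {ns} --field-selector involvedObject.name={pod} --sort-by=.lastTimestamp",
        # Exit code of the last terminated container
        f"kubectl get pod {pod} {ns} -o jsonpath='{{.status.containerStatuses[*].lastState.terminated.exitCode}}'",
    ]


# Pod name and namespace are made up for illustration.
for command in crashloop_commands("web-7d4b9", "prod"):
    print(command)
```

The commands are only printed, which keeps the example in line with the advisory, read-only approach this suite tests for.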
### 2. OOMKilled Investigation
**Purpose**: Ensures Claude identifies memory exhaustion and correlates with resource limits
**Expected**: Should investigate memory usage patterns, limits, and potential leaks
**Baseline Failure**: Without skill, may just suggest increasing memory
### 3. Multi-Source Correlation
**Purpose**: Tests ability to gather and correlate logs, events, and metrics
**Expected**: Should mention all three data sources and timeline creation
**Baseline Failure**: Without skill, may focus on single data source
### 4. Root Cause vs Symptom
**Purpose**: Validates temporal analysis to distinguish cause from effect
**Expected**: Should use timeline and "what happened first" approach
**Baseline Failure**: Without skill, may confuse correlation with causation
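
The "what happened first" approach can be sketched as sorting observations from all sources into one timeline. This is a minimal illustration with made-up data; the `Observation` type and `build_timeline` function are hypothetical names, not part of the skill.

```python
from datetime import datetime
from typing import List, NamedTuple


class Observation(NamedTuple):
    timestamp: datetime
    source: str   # "logs", "events", or "metrics"
    message: str


def build_timeline(observations: List[Observation]) -> List[Observation]:
    """Order observations from all sources by time, earliest first."""
    return sorted(observations, key=lambda o: o.timestamp)


# Illustrative data only, not taken from a real cluster.
observations = [
    Observation(datetime(2025, 1, 1, 10, 5), "logs", "connection errors to database"),
    Observation(datetime(2025, 1, 1, 10, 2), "metrics", "CPU usage above 90%"),
    Observation(datetime(2025, 1, 1, 10, 7), "events", "readiness probe failed"),
]

timeline = build_timeline(observations)
first = timeline[0]
print(f"Earliest signal: [{first.source}] {first.message}")
```

The earliest signal is only a candidate root cause. It still needs a plausible causal mechanism linking it to the later symptoms, since ordering alone does not prove causation.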
### 5. Image Pull Failure
**Purpose**: Tests systematic approach to ImagePullBackOff debugging
**Expected**: Should check image name, registry, and pull secrets systematically
**Baseline Failure**: Without skill, may suggest random fixes
### 6. Read-Only Investigation
**Purpose**: Ensures the skill maintains an advisory-only approach
**Expected**: Should recommend steps, not execute changes
**Baseline Failure**: Without skill, might suggest direct modifications
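
One way to make the read-only expectation concrete is a verb allowlist for `kubectl` commands. The sketch below is an assumption about how such a check could look, not something the skill or test runner is known to implement; `READ_ONLY_VERBS` and `is_read_only` are hypothetical names.

```python
import shlex

# Assumed allowlist of kubectl verbs that do not change cluster state.
READ_ONLY_VERBS = {"get", "describe", "logs", "top", "explain"}


def is_read_only(command: str) -> bool:
    """Return True if the command is a kubectl call whose first argument is an allowlisted verb."""
    parts = shlex.split(command)
    if len(parts) < 2 or parts[0] != "kubectl":
        return False
    return parts[1] in READ_ONLY_VERBS


print(is_read_only("kubectl describe pod web-7d4b9"))
print(is_read_only("kubectl delete pod web-7d4b9"))
```

The check is deliberately conservative: a command that puts flags before the verb (such as `kubectl -n prod get pods`) is rejected rather than parsed.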
## Running Tests
### Prerequisites
- Python 3.8+
- `claudelint` installed for validation
- Claude Code CLI access
- Claude Sonnet 4 or Claude Opus 4 (the tests use the `sonnet` model)
### Run All Tests
```bash
# From repository root
make test
# Or specifically for this skill
make test-only SKILL=debugging-kubernetes-incidents
```
### Validate Skill Schema
```bash
claudelint debugging-kubernetes-incidents/SKILL.md
```
### Generate Test Results
```bash
make generate SKILL=debugging-kubernetes-incidents
```
## Test-Driven Development Process
This skill was developed with test-driven development (TDD) applied to documentation:
### RED Phase (Initial Failures)
1. Created 6 test scenarios representing real investigation needs
2. Ran tests WITHOUT the skill
3. Documented baseline failures:
   - Suggested direct fixes without investigation
   - Missed multi-source correlation
   - Confused symptoms with root causes
   - Lacked systematic methodology
### GREEN Phase (Minimal Skill)
1. Created SKILL.md addressing test failures
2. Added investigation phases and decision trees
3. Included multi-source correlation guidance
4. Emphasized read-only approach
5. All tests passed
### REFACTOR Phase (Improvement)
1. Added real-world examples
2. Enhanced decision trees
3. Improved troubleshooting matrix
4. Refined investigation methodology
5. Added keyword search terms
## Success Criteria
All tests must:
- ✅ Pass with 100% success rate (3/3 samples)
- ✅ Contain expected keywords
- ✅ NOT contain prohibited terms
- ✅ Demonstrate systematic approach
- ✅ Maintain read-only advisory model
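
The keyword criteria above could be checked roughly as follows. This is a sketch that assumes case-insensitive substring matching and reuses the `contains_keywords` and `does_not_contain` keys from `scenarios.yaml`; how the actual runner behind `make test` evaluates responses is not shown in this README, and the function names are hypothetical.

```python
from typing import Dict, List


def sample_passes(response: str, expected: Dict[str, List[str]]) -> bool:
    """Check one response against required and prohibited terms (case-insensitive substrings)."""
    text = response.lower()
    required = expected.get("contains_keywords", [])
    prohibited = expected.get("does_not_contain", [])
    has_required = all(term.lower() in text for term in required)
    has_prohibited = any(term.lower() in text for term in prohibited)
    return has_required and not has_prohibited


def scenario_passes(responses: List[str], expected: Dict[str, List[str]]) -> bool:
    """A scenario passes only if there is at least one sample and every sample passes (e.g. 3/3)."""
    return bool(responses) and all(sample_passes(r, expected) for r in responses)


# Illustrative expectations and responses, not real model output.
expected = {"contains_keywords": ["logs", "previous", "events"], "does_not_contain": ["force"]}
responses = [
    "Check the logs with --previous, then review events.",
    "Start with kubectl describe, the previous container logs, and recent events.",
    "Review events and the previous logs before changing anything.",
]
print(scenario_passes(responses, expected))
```

With plain substring matching, a response containing "enforce" would trip the prohibited term "force", which is one reason prohibited terms need to be chosen carefully.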
## Continuous Validation
Tests run automatically on:
- Every pull request (GitHub Actions)
- Skill file modifications
- Schema changes
- Version updates
## Adding New Tests
To add test scenarios:
1. **Identify gap**: What investigation scenario is missing?
2. **Create scenario**: Add to `scenarios.yaml`
3. **Run without skill**: Document baseline failure
4. **Update SKILL.md**: Address the gap
5. **Validate**: Ensure test passes
Example:
```yaml
- name: your-test-name
  description: What you're testing
  prompt: "User query to test"
  model: haiku
  samples: 3
  expected:
    contains_keywords:
      - keyword1
      - keyword2
  baseline_failure: What happens without the skill
```
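
Before running a new scenario, it can help to check that the entry has the same shape as the example above. The sketch below works on an already parsed entry (a plain dictionary) and assumes the required fields are exactly those in the example; the real schema may differ, and `validate_scenario` is a hypothetical helper.

```python
from typing import Any, Dict, List

# Assumed required fields, taken from the example entry; the real schema may differ.
REQUIRED_FIELDS = (
    "name", "description", "prompt", "model", "samples", "expected", "baseline_failure",
)


def validate_scenario(scenario: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one scenario entry (empty if none)."""
    problems = [f"missing field: {field}" for field in REQUIRED_FIELDS if field not in scenario]
    samples = scenario.get("samples")
    if samples is not None and (not isinstance(samples, int) or samples < 1):
        problems.append("samples must be a positive integer")
    expected = scenario.get("expected")
    if isinstance(expected, dict) and not expected.get("contains_keywords"):
        problems.append("expected.contains_keywords is empty or missing")
    return problems


# The example entry as it would look after YAML parsing.
scenario = {
    "name": "your-test-name",
    "description": "What you're testing",
    "prompt": "User query to test",
    "model": "haiku",
    "samples": 3,
    "expected": {"contains_keywords": ["keyword1", "keyword2"]},
    "baseline_failure": "What happens without the skill",
}
print(validate_scenario(scenario))
```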
## Known Limitations
- Tests use synthetic scenarios (not real cluster data)
- Keyword matching is basic (could use semantic analysis)
- No integration testing with actual Kubernetes clusters
- Sample size (3) may not catch all edge cases
## Future Improvements
- Add tests for more complex multi-cluster scenarios
- Include performance regression testing
- Add semantic similarity scoring
- Test with real cluster incident data
- Add negative test cases (what the skill should NOT do)
## Troubleshooting
### Test Failures
**Symptom**: Test fails intermittently
**Fix**: Increase samples or refine expected keywords
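
When refining keywords, it can be useful to see which expected keywords are missed most often across samples. The sketch below assumes the sampled responses are available as plain strings and uses case-insensitive substring matching; `missed_keyword_counts` is a hypothetical helper, not part of the test tooling.

```python
from collections import Counter
from typing import List


def missed_keyword_counts(responses: List[str], keywords: List[str]) -> Counter:
    """Count, per keyword, how many sampled responses do not contain it."""
    misses = Counter()
    for response in responses:
        text = response.lower()
        for keyword in keywords:
            if keyword.lower() not in text:
                misses[keyword] += 1
    return misses


# Illustrative responses, not real model output.
responses = [
    "Check logs and events, then the exit code.",
    "Look at the logs and describe the pod.",
    "Review events and logs.",
]
misses = missed_keyword_counts(responses, ["logs", "events", "exit code"])
for keyword, count in misses.most_common():
    print(f"{keyword}: missed in {count} of {len(responses)} samples")
```

A keyword that is missed in most samples may be too specific for the prompt, while one that is missed only occasionally points to ordinary sampling variation.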
**Symptom**: All tests fail
**Fix**: Check SKILL.md frontmatter and schema validation
**Symptom**: Baseline failure unclear
**Fix**: Run test manually without skill, document actual output
## Contributing
When contributing test improvements:
1. Keep tests as deterministic as possible (model output varies between samples, so choose keywords that appear reliably)
2. Use realistic user prompts
3. Document baseline failures clearly
4. Keep the sample count reasonable (3-5)
5. Update this README with new scenarios
## Questions?
See main repository documentation or file an issue.

tests/scenarios.yaml Normal file

@@ -0,0 +1,116 @@
skill_name: debugging-kubernetes-incidents
description: Validates systematic Kubernetes incident investigation methodology and root cause analysis capabilities
test_scenarios:
  - name: crashloopbackoff-recognition
    description: Recognizes CrashLoopBackOff pattern and suggests appropriate investigation steps
    prompt: "I have a pod in CrashLoopBackOff state. What should I do?"
    model: sonnet
    samples: 3
    expected:
      recognizes: CrashLoopBackOff
      contains_keywords:
        - logs
        - previous
        - events
        - describe
        - exit code
      does_not_contain:
        - modify
        - delete
        - force
    baseline_failure: Without the skill, may suggest direct fixes without investigation or miss the importance of checking previous container logs

  - name: oomkilled-investigation
    description: Identifies memory exhaustion pattern and correlates events with resource limits
    prompt: "My pods keep getting OOMKilled. How do I find out why?"
    model: sonnet
    samples: 3
    expected:
      recognizes: OOMKilled
      contains_keywords:
        - memory
        - limits
        - resources
        - usage
        - leak
        - events
      does_not_contain:
        - ignore
        - just restart
    baseline_failure: Without the skill, may only suggest increasing memory without investigating the underlying cause of memory exhaustion

  - name: multi-source-correlation
    description: Demonstrates correlation of logs, events, and metrics for complete incident picture
    prompt: "I'm investigating a service degradation issue. What data should I collect and how do I correlate it?"
    model: sonnet
    samples: 3
    expected:
      contains_keywords:
        - logs
        - events
        - metrics
        - correlate
        - timeline
        - multiple sources
      mentions_tools:
        - kubectl logs
        - kubectl get events
        - kubectl top
    baseline_failure: Without the skill, may focus on only one data source without correlating across logs, events, and metrics

  - name: root-cause-vs-symptom
    description: Distinguishes between root cause and symptoms using temporal analysis
    prompt: "I see high CPU usage and connection errors. Which is the root cause?"
    model: sonnet
    samples: 3
    expected:
      contains_keywords:
        - temporal
        - first
        - timeline
        - causation
        - correlation
      provides_approach:
        - Check what happened first
        - Create timeline
        - Validate causal mechanism
    baseline_failure: Without the skill, may incorrectly identify symptoms as root causes or confuse correlation with causation

  - name: image-pull-failure
    description: Provides systematic approach to investigating ImagePullBackOff errors
    prompt: "Pod status shows ImagePullBackOff. How do I debug this?"
    model: sonnet
    samples: 3
    expected:
      recognizes: ImagePullBackOff
      contains_keywords:
        - image
        - registry
        - secret
        - pull
        - authentication
        - describe
      investigation_steps:
        - Check image name
        - Verify registry
        - Check pull secrets
    baseline_failure: Without the skill, may suggest random fixes without systematic investigation of image name, registry access, or credentials

  - name: read-only-investigation
    description: Ensures skill maintains read-only approach and never suggests direct modifications
    prompt: "I found the issue. How do I fix the pod?"
    model: sonnet
    samples: 3
    expected:
      contains_keywords:
        - recommend
        - manual
        - review
        - steps
      does_not_contain:
        - automatically
        - I will delete
        - I will modify
        - I will restart
      approach: Advisory recommendations only
    baseline_failure: Without the skill, might suggest direct modifications or automated fixes without proper human review