# Test Suite for Debugging Pipeline Failures Skill

## Overview

This test suite validates that the pipeline debugging skill properly teaches Claude Code to:

1. Follow a systematic investigation methodology
2. Use standard kubectl and Tekton commands
3. Distinguish root causes from symptoms
4. Correlate logs, events, and resource states
5. Provide actionable debugging steps

## Test Scenarios

### 1. Systematic Investigation Approach

**Purpose**: Validates that Claude follows a phased methodology (identify → logs → events → resources → root cause)

**Expected**: Should describe a systematic approach, including kubectl commands for inspecting PipelineRuns and TaskRuns

**Baseline Failure**: Without the skill, Claude may suggest ad hoc checks with no clear structure
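
For orientation, the first phases of that methodology can be sketched as a small shell helper. This is a minimal illustration, not part of the skill or the test framework: the function name `inspect_pipelinerun` and the `KUBECTL` override (set `KUBECTL=echo` to print commands instead of running them) are hypothetical, and it assumes TaskRuns carry Tekton's standard `tekton.dev/pipelineRun` label.

```bash
#!/usr/bin/env bash
# Hypothetical helper: first pass over one PipelineRun.
# KUBECTL can be overridden, e.g. KUBECTL=echo to print commands as a dry run.
KUBECTL="${KUBECTL:-kubectl}"

inspect_pipelinerun() {
  local pr="$1" ns="$2"
  # Identify: overall status and conditions of the PipelineRun
  "$KUBECTL" get pipelinerun "$pr" -n "$ns"
  "$KUBECTL" describe pipelinerun "$pr" -n "$ns"
  # Identify: the TaskRuns Tekton created (and labeled) for this PipelineRun
  "$KUBECTL" get taskruns -n "$ns" -l "tekton.dev/pipelineRun=$pr"
  # Events: recent namespace events, oldest first
  "$KUBECTL" get events -n "$ns" --sort-by=.lastTimestamp
}
```

For example, `inspect_pipelinerun my-pipeline-run my-namespace`, where both names are placeholders. The log and resource phases then narrow down to the specific TaskRun and pod these commands point at.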

### 2. Image Pull Failure Diagnosis

**Purpose**: Tests systematic diagnosis of `ImagePullBackOff` errors

**Expected**: Should check pod events, the image name, the registry, and the ServiceAccount's `imagePullSecrets`

**Baseline Failure**: Without the skill, Claude may not know to run `kubectl describe pod` or check `imagePullSecrets`
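
The checks this scenario expects might look like the following sketch. The function names and the `KUBECTL` dry-run override are hypothetical; the commands themselves are standard kubectl, and the ServiceAccount name to pass to the second helper is whatever the first one prints.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for an ImagePullBackOff.
# KUBECTL can be overridden, e.g. KUBECTL=echo for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

# Pod events (end of the describe output) show the registry's error message;
# then print the pod's ServiceAccount and every container image it requested.
check_image_pull() {
  local pod="$1" ns="$2"
  "$KUBECTL" describe pod "$pod" -n "$ns"
  "$KUBECTL" get pod "$pod" -n "$ns" \
    -o jsonpath='{.spec.serviceAccountName}{"\n"}{range .spec.containers[*]}{.image}{"\n"}{end}'
}

# Pull secrets attached to that ServiceAccount; an empty result may explain
# authentication errors against a private registry.
check_pull_secrets() {
  local sa="$1" ns="$2"
  "$KUBECTL" get serviceaccount "$sa" -n "$ns" \
    -o jsonpath='{range .imagePullSecrets[*]}{.name}{"\n"}{end}'
}
```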

### 3. Stuck Pipeline Investigation

**Purpose**: Validates the methodology for pipelines stuck in the Running state

**Expected**: Should check individual TaskRun statuses to identify which one is stuck or pending

**Baseline Failure**: Without the skill, Claude may not know to list TaskRuns filtered by the `tekton.dev/pipelineRun` label
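
A minimal sketch of the expected TaskRun listing, assuming the standard `tekton.dev/pipelineRun` label and that each TaskRun's first condition is its `Succeeded` condition. The function name and `KUBECTL` override are illustrative only.

```bash
#!/usr/bin/env bash
# Hypothetical helper: one line per TaskRun of a PipelineRun.
# SUCCEEDED=Unknown means the TaskRun has not finished; REASON and POD
# show where to look next.
KUBECTL="${KUBECTL:-kubectl}"

taskrun_statuses() {
  local pr="$1" ns="$2"
  "$KUBECTL" get taskruns -n "$ns" -l "tekton.dev/pipelineRun=$pr" \
    -o custom-columns='NAME:.metadata.name,SUCCEEDED:.status.conditions[0].status,REASON:.status.conditions[0].reason,POD:.status.podName'
}
```

A plain `kubectl get taskruns -l ...` already shows Succeeded and Reason columns; the custom columns mainly add the pod name, which feeds the event and log checks.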

### 4. Resource Constraint Recognition

**Purpose**: Tests identification of scheduling and quota issues

**Expected**: Should check events for `FailedScheduling` and inspect the namespace's resource quotas

**Baseline Failure**: Without the skill, Claude may not connect a Pending state with resource constraints
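
The two checks this scenario looks for can be sketched as follows; the helper name and `KUBECTL` override are hypothetical, while the field selector and `resourcequota` resource are standard kubectl.

```bash
#!/usr/bin/env bash
# Hypothetical helper for Pending pods and quota issues in one namespace.
KUBECTL="${KUBECTL:-kubectl}"

check_scheduling() {
  local ns="$1"
  # Scheduler events explaining why pods stay Pending (e.g. insufficient CPU/memory)
  "$KUBECTL" get events -n "$ns" --field-selector reason=FailedScheduling
  # Namespace quotas: hard limits next to current usage
  "$KUBECTL" describe resourcequota -n "$ns"
}
```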

### 5. Log Analysis Methodology

**Purpose**: Ensures proper retrieval of Tekton logs for failed steps

**Expected**: Should know how to get logs from a specific step container in a Tekton pod

**Baseline Failure**: Without the skill, Claude may not understand Tekton's step container naming (`step-<name>`)
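
Tekton runs each step in its own container named `step-<step name>`, and a TaskRun records its pod in `status.podName`. A minimal sketch of step-level log retrieval, with hypothetical helper names and the same illustrative `KUBECTL` override:

```bash
#!/usr/bin/env bash
# Hypothetical helpers for Tekton step logs.
KUBECTL="${KUBECTL:-kubectl}"

# Name of the pod a TaskRun created (recorded in its status)
taskrun_pod() {
  local tr="$1" ns="$2"
  "$KUBECTL" get taskrun "$tr" -n "$ns" -o jsonpath='{.status.podName}'
}

# Logs of one step; Tekton names the container "step-<step name>"
step_logs() {
  local pod="$1" step="$2" ns="$3"
  "$KUBECTL" logs "$pod" -n "$ns" -c "step-$step"
}
```

For example, `step_logs "$(taskrun_pod my-taskrun my-ns)" build my-ns` prints the `build` step's logs (all names are placeholders). When the step name is unknown, `kubectl logs <pod> -n <ns> --all-containers` prints every step.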

### 6. Root Cause vs Symptom

**Purpose**: Validates a focus on investigation before applying fixes

**Expected**: Should recommend investigating logs and the root cause before increasing timeouts

**Baseline Failure**: Without the skill, Claude may suggest quick fixes without investigation
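
Before touching a timeout, the TaskRun's `Succeeded` condition already says why it ended. A minimal sketch, with a hypothetical helper name; it relies on Tekton reporting the reason `TaskRunTimeout` for runs that hit their limit.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the reason and message of a TaskRun's
# Succeeded condition. "TaskRunTimeout" means the limit was hit (the step
# logs still show why the work was slow); other reasons point elsewhere.
KUBECTL="${KUBECTL:-kubectl}"

taskrun_failure() {
  local tr="$1" ns="$2"
  "$KUBECTL" get taskrun "$tr" -n "$ns" \
    -o jsonpath='{.status.conditions[?(@.type=="Succeeded")].reason}{"\n"}{.status.conditions[?(@.type=="Succeeded")].message}{"\n"}'
}
```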

## Running Tests

### Prerequisites

- Python 3.8+
- Claude Code CLI access
- Claude Sonnet 4.5 (tests use the `sonnet` model)
- Access to the test framework (if available in the konflux-ci/skills repository)
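
A quick pre-flight check for the first two prerequisites might look like this sketch. The function name is hypothetical, and it assumes the Claude Code CLI is installed as `claude` on `PATH`.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for the prerequisites above.
# Assumes the Claude Code CLI binary is named `claude`.
check_prereqs() {
  if ! python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 8) else 1)'; then
    echo "Python 3.8+ is required" >&2
    return 1
  fi
  if ! command -v claude >/dev/null 2>&1; then
    echo "Claude Code CLI ('claude') not found on PATH" >&2
    return 1
  fi
  echo "Prerequisites look OK"
}
```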

### Run All Tests

```bash
# From repository root
make test

# Or specifically for this skill
make test-only SKILL=debugging-pipeline-failures
```

### Validate Skill Schema

```bash
claudelint debugging-pipeline-failures/SKILL.md
```

### Generate Test Results

```bash
make generate SKILL=debugging-pipeline-failures
```

## Test-Driven Development Process

This skill was developed by applying test-driven development (TDD) to documentation:

### RED Phase (Initial Failures)

1. Created 6 test scenarios representing real pipeline debugging needs
2. Ran the tests WITHOUT the skill
3. Documented baseline failures:
   - No systematic methodology
   - Unfamiliarity with Tekton-specific kubectl commands
   - Confusion of symptoms with root causes
   - No correlation of events and resources

### GREEN Phase (Minimal Skill)

1. Created SKILL.md addressing the test failures
2. Added a 5-phase investigation methodology
3. Included kubectl command examples
4. Emphasized root cause analysis
5. All tests passed

### REFACTOR Phase (Improvement)

1. Added common failure patterns (6 types)
2. Added a decision tree
3. Improved the troubleshooting workflow
4. Added a section on common confusions

## Success Criteria

All tests must:

- ✅ Pass with a 100% success rate (3/3 samples)
- ✅ Contain the expected keywords (kubectl, systematic approach)
- ✅ NOT contain prohibited terms (e.g., quick fixes proposed without investigation)
- ✅ Demonstrate the phased methodology
- ✅ Focus on standard Tekton/Kubernetes tools
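
The framework's actual matcher is not described here. As a hypothetical illustration of the keyword criteria only, a per-sample check could look like the sketch below; under the 3/3 rule, a scenario would pass only if every sample passes it.

```bash
#!/usr/bin/env bash
# Hypothetical illustration of the keyword criteria; not the framework's code.
# Usage: check_response RESPONSE EXPECTED_CSV [PROHIBITED_CSV]
# Matching is case-insensitive and literal; prints the first violation.
check_response() {
  local response="$1" expected="$2" prohibited="${3:-}" kw
  local IFS=','
  for kw in $expected; do
    [ -z "$kw" ] && continue
    if ! printf '%s\n' "$response" | grep -qiF -- "$kw"; then
      echo "missing: $kw"
      return 1
    fi
  done
  for kw in $prohibited; do
    [ -z "$kw" ] && continue
    if printf '%s\n' "$response" | grep -qiF -- "$kw"; then
      echo "prohibited: $kw"
      return 1
    fi
  done
  return 0
}
```

Plain substring matching like this is exactly the "basic" keyword matching noted under Known Limitations; it cannot tell a recommended fix from one the response warns against.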

## Continuous Validation

Tests run automatically on:

- Every pull request (GitHub Actions)
- Skill file modifications
- Schema changes
- Version updates

## Adding New Tests

To add a test scenario:

1. **Identify the gap**: Which failure pattern is missing?
2. **Create a scenario**: Add it to `scenarios.yaml`
3. **Run without the skill**: Document the baseline failure
4. **Update SKILL.md**: Address the gap
5. **Validate**: Ensure the test passes

Example:

```yaml
- name: your-test-name
  description: What you're testing
  prompt: "User query to test"
  model: sonnet
  samples: 3
  expected:
    contains_keywords:
      - keyword1
      - keyword2
  baseline_failure: What happens without the skill
```

## Known Limitations

- Tests use synthetic scenarios (not real Konflux failures)
- Keyword matching is basic (semantic analysis could do better)
- No integration testing against actual clusters
- A sample size of 3 may not catch all edge cases

## Future Improvements

- Add tests for multi-step pipeline failures
- Include workspace debugging scenarios
- Add tests for intermittent failures
- Test with real Konflux pipeline YAML

## Troubleshooting

### Test Failures

**Symptom**: Test fails intermittently

**Fix**: Increase the number of samples or refine the expected keywords

**Symptom**: All tests fail

**Fix**: Check the SKILL.md frontmatter and run schema validation

**Symptom**: Baseline failure is unclear

**Fix**: Run the test prompt manually without the skill and document the actual output

## Contributing

When contributing test improvements:

1. Ensure tests are deterministic
2. Use realistic Konflux user prompts
3. Document baseline failures clearly
4. Keep the sample count reasonable (3 to 5)
5. Update this README with new scenarios

## Questions?

See the main repository documentation or file an issue in konflux-ci/skills.