# Test Suite for Debugging Pipeline Failures Skill

## Overview
This test suite validates that the pipeline debugging skill properly teaches Claude Code to:
- Follow systematic investigation methodology
- Use standard kubectl and Tekton commands
- Distinguish root causes from symptoms
- Correlate logs, events, and resource states
- Provide actionable debugging steps
## Test Scenarios

### 1. Systematic Investigation Approach

- **Purpose:** Validates that Claude follows the phased methodology (identify → logs → events → resources → root cause)
- **Expected:** Should mention a systematic approach with kubectl commands for PipelineRun and TaskRun inspection
- **Baseline Failure:** Without the skill, may suggest random checks without structure
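The phased methodology this scenario checks for can be illustrated with the kind of commands the skill is expected to surface. This is a sketch only; resource names such as `my-pipeline-run` are hypothetical placeholders:

```shell
# Phase 1: identify the failing PipelineRun and read its status message
kubectl get pipelinerun my-pipeline-run -o yaml
kubectl get pipelinerun my-pipeline-run \
  -o jsonpath='{.status.conditions[0].message}'

# Then enumerate the TaskRuns that belong to it, using Tekton's
# standard label, before digging into any individual task
kubectl get taskruns -l tekton.dev/pipelineRun=my-pipeline-run
```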
### 2. Image Pull Failure Diagnosis

- **Purpose:** Tests systematic diagnosis of ImagePullBackOff errors
- **Expected:** Should check pod events, the image name, the registry, and the ServiceAccount's imagePullSecrets
- **Baseline Failure:** Without the skill, may not know to run `kubectl describe pod` or to check imagePullSecrets
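A sketch of the diagnosis path this scenario expects, with hypothetical pod and ServiceAccount names:

```shell
# Read the pod's events to see the exact image pull error
# (bad tag, auth failure, unknown registry, ...)
kubectl describe pod my-task-pod

# Check whether the ServiceAccount running the task carries
# the registry credentials it needs
kubectl get serviceaccount pipeline \
  -o jsonpath='{.imagePullSecrets}'
```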
### 3. Stuck Pipeline Investigation

- **Purpose:** Validates methodology for pipelines stuck in the Running state
- **Expected:** Should check individual TaskRun statuses to identify which one is stuck or pending
- **Baseline Failure:** Without the skill, may not know to list TaskRuns filtered by the pipelineRun label
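The TaskRun-narrowing step can be sketched as follows (names are placeholders):

```shell
# Which TaskRun is the one that never completed?
kubectl get taskruns -l tekton.dev/pipelineRun=my-pipeline-run

# For the stuck TaskRun, find its pod and check whether the pod
# was ever scheduled at all
kubectl get pods -l tekton.dev/taskRun=my-stuck-taskrun
kubectl describe pod my-stuck-taskrun-pod
```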
### 4. Resource Constraint Recognition

- **Purpose:** Tests identification of scheduling and quota issues
- **Expected:** Should check events for FailedScheduling and inspect namespace resource quotas
- **Baseline Failure:** Without the skill, may not connect a Pending state with resource constraints
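The event-and-quota correlation the scenario looks for might look like this (namespace name is a placeholder):

```shell
# Surface scheduling failures recorded as events in the namespace
kubectl get events -n my-namespace \
  --field-selector reason=FailedScheduling

# Compare requested vs. available quota in the same namespace
kubectl describe resourcequota -n my-namespace
```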
### 5. Log Analysis Methodology

- **Purpose:** Ensures proper Tekton log retrieval for failed steps
- **Expected:** Should know how to get logs from specific step containers in Tekton pods
- **Baseline Failure:** Without the skill, may not understand Tekton's step container naming
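Tekton runs each step of a Task as a container named `step-<stepName>`, which is what this scenario probes. A sketch with placeholder names:

```shell
# List the step containers in the TaskRun's pod
kubectl get pod my-task-pod \
  -o jsonpath='{.spec.containers[*].name}'

# Pull logs from one specific step, e.g. a step named "build"
kubectl logs my-task-pod -c step-build

# If the Tekton CLI is available, it can stream all steps at once
tkn taskrun logs my-taskrun -f
```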
### 6. Root Cause vs Symptom

- **Purpose:** Validates focus on investigation before applying fixes
- **Expected:** Should recommend investigating logs and the root cause before increasing timeouts
- **Baseline Failure:** Without the skill, may suggest quick fixes without investigation
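The investigate-first behavior this scenario rewards can be sketched as (placeholder names again):

```shell
# A timeout reason only says *that* the task died, not *why*
kubectl get taskrun my-taskrun \
  -o jsonpath='{.status.conditions[0].reason}'

# The step logs say what the task was doing when it died --
# read these before reaching for a larger timeout
kubectl logs my-task-pod -c step-build --tail=50
```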
## Running Tests

### Prerequisites

- Python 3.8+
- Claude Code CLI access
- Claude Sonnet 4.5 (tests use the `sonnet` model)
- Access to the test framework (if available in the konflux-ci/skills repo)
### Run All Tests

```shell
# From the repository root
make test

# Or run only this skill's tests
make test-only SKILL=debugging-pipeline-failures
```

### Validate Skill Schema

```shell
claudelint debugging-pipeline-failures/SKILL.md
```

### Generate Test Results

```shell
make generate SKILL=debugging-pipeline-failures
```
## Test-Driven Development Process
This skill followed TDD for Documentation:
### RED Phase (Initial Failures)
- Created 6 test scenarios representing real pipeline debugging needs
- Ran tests WITHOUT the skill
- Documented baseline failures:
  - No systematic methodology
  - Didn't know Tekton-specific kubectl commands
  - Confused symptoms with root causes
  - Missing event and resource correlation
### GREEN Phase (Minimal Skill)
- Created SKILL.md addressing test failures
- Added 5-phase investigation methodology
- Included kubectl command examples
- Emphasized root cause analysis
- All tests passed
### REFACTOR Phase (Improvement)
- Added common failure patterns (6 types)
- Enhanced with decision tree
- Improved troubleshooting workflow
- Added common confusions section
## Success Criteria
All tests must:
- ✅ Pass with 100% success rate (3/3 samples)
- ✅ Contain expected keywords (kubectl, systematic approach)
- ✅ NOT contain prohibited terms (quick fixes without investigation)
- ✅ Demonstrate phased methodology
- ✅ Focus on standard Tekton/Kubernetes tools
## Continuous Validation
Tests run automatically on:
- Every pull request (GitHub Actions)
- Skill file modifications
- Schema changes
- Version updates
## Adding New Tests

To add a test scenario:

1. **Identify the gap:** What failure pattern is missing?
2. **Create the scenario:** Add it to `scenarios.yaml`
3. **Run without the skill:** Document the baseline failure
4. **Update SKILL.md:** Address the gap
5. **Validate:** Ensure the test passes
Example:

```yaml
- name: your-test-name
  description: What you're testing
  prompt: "User query to test"
  model: sonnet
  samples: 3
  expected:
    contains_keywords:
      - keyword1
      - keyword2
  baseline_failure: What happens without the skill
```
## Known Limitations
- Tests use synthetic scenarios (not real Konflux failures)
- Keyword matching is basic (could use semantic analysis)
- No integration testing with actual clusters
- Sample size (3) may not catch all edge cases
## Future Improvements
- Add tests for multi-step pipeline failures
- Include workspace debugging scenarios
- Add tests for intermittent failures
- Test with real Konflux pipeline YAML
## Troubleshooting

### Test Failures

- **Symptom:** A test fails intermittently. **Fix:** Increase `samples` or refine the expected keywords.
- **Symptom:** All tests fail. **Fix:** Check the SKILL.md frontmatter and schema validation.
- **Symptom:** The baseline failure is unclear. **Fix:** Run the test manually without the skill and document the actual output.
## Contributing
When contributing test improvements:
- Ensure tests are deterministic
- Use realistic Konflux user prompts
- Document baseline failures clearly
- Keep samples count reasonable (3-5)
- Update this README with new scenarios
## Questions?
See main repository documentation or file an issue in konflux-ci/skills.