# Test Suite for Debugging Pipeline Failures Skill ## Overview This test suite validates that the pipeline debugging skill properly teaches Claude Code to: 1. Follow systematic investigation methodology 2. Use standard kubectl and Tekton commands 3. Distinguish root causes from symptoms 4. Correlate logs, events, and resource states 5. Provide actionable debugging steps ## Test Scenarios ### 1. Systematic Investigation Approach **Purpose**: Validates Claude follows phased methodology (identify → logs → events → resources → root cause) **Expected**: Should mention systematic approach with kubectl commands for PipelineRun and TaskRun inspection **Baseline Failure**: Without skill, may suggest random checks without structure ### 2. Image Pull Failure Diagnosis **Purpose**: Tests systematic diagnosis of ImagePullBackOff errors **Expected**: Should check pod events, image name, registry, and ServiceAccount imagePullSecrets **Baseline Failure**: Without skill, may not know to check pod describe or imagePullSecrets ### 3. Stuck Pipeline Investigation **Purpose**: Validates methodology for pipelines stuck in Running state **Expected**: Should check individual TaskRun statuses to identify which is stuck/pending **Baseline Failure**: Without skill, may not know to list TaskRuns filtered by pipelineRun label ### 4. Resource Constraint Recognition **Purpose**: Tests identification of scheduling and quota issues **Expected**: Should check events for FailedScheduling and namespace resource quotas **Baseline Failure**: Without skill, may not connect Pending state with resource constraints ### 5. Log Analysis Methodology **Purpose**: Ensures proper Tekton log retrieval for failed steps **Expected**: Should know how to get logs from specific step containers in Tekton pods **Baseline Failure**: Without skill, may not understand Tekton step container naming ### 6. Root Cause vs Symptom **Purpose**: Validates focus on investigation before applying fixes **Expected**: Should recommend investigating logs and root cause before increasing timeouts **Baseline Failure**: Without skill, may suggest quick fixes without investigation ## Running Tests ### Prerequisites - Python 3.8+ - Claude Code CLI access - Claude Sonnet 4.5 (tests use `sonnet` model) - Access to test framework (if available in konflux-ci/skills repo) ### Run All Tests ```bash # From repository root make test # Or specifically for this skill make test-only SKILL=debugging-pipeline-failures ``` ### Validate Skill Schema ```bash claudelint debugging-pipeline-failures/SKILL.md ``` ### Generate Test Results ```bash make generate SKILL=debugging-pipeline-failures ``` ## Test-Driven Development Process This skill followed TDD for Documentation: ### RED Phase (Initial Failures) 1. Created 6 test scenarios representing real pipeline debugging needs 2. Ran tests WITHOUT the skill 3. Documented baseline failures: - No systematic methodology - Didn't know Tekton-specific kubectl commands - Confused symptoms with root causes - Missing event and resource correlation ### GREEN Phase (Minimal Skill) 1. Created SKILL.md addressing test failures 2. Added 5-phase investigation methodology 3. Included kubectl command examples 4. Emphasized root cause analysis 5. All tests passed ### REFACTOR Phase (Improvement) 1. Added common failure patterns (6 types) 2. Enhanced with decision tree 3. Improved troubleshooting workflow 4. Added common confusions section ## Success Criteria All tests must: - ✅ Pass with 100% success rate (3/3 samples) - ✅ Contain expected keywords (kubectl, systematic approach) - ✅ NOT contain prohibited terms (quick fixes without investigation) - ✅ Demonstrate phased methodology - ✅ Focus on standard Tekton/Kubernetes tools ## Continuous Validation Tests run automatically on: - Every pull request (GitHub Actions) - Skill file modifications - Schema changes - Version updates ## Adding New Tests To add test scenarios: 1. **Identify gap**: What failure pattern is missing? 2. **Create scenario**: Add to `scenarios.yaml` 3. **Run without skill**: Document baseline failure 4. **Update SKILL.md**: Address the gap 5. **Validate**: Ensure test passes Example: ```yaml - name: your-test-name description: What you're testing prompt: "User query to test" model: sonnet samples: 3 expected: contains_keywords: - keyword1 - keyword2 baseline_failure: What happens without the skill ``` ## Known Limitations - Tests use synthetic scenarios (not real Konflux failures) - Keyword matching is basic (could use semantic analysis) - No integration testing with actual clusters - Sample size (3) may not catch all edge cases ## Future Improvements - Add tests for multi-step pipeline failures - Include workspace debugging scenarios - Add tests for intermittent failures - Test with real Konflux pipeline YAML ## Troubleshooting ### Test Failures **Symptom**: Test fails intermittently **Fix**: Increase samples or refine expected keywords **Symptom**: All tests fail **Fix**: Check SKILL.md frontmatter and schema validation **Symptom**: Baseline failure unclear **Fix**: Run test manually without skill, document actual output ## Contributing When contributing test improvements: 1. Ensure tests are deterministic 2. Use realistic Konflux user prompts 3. Document baseline failures clearly 4. Keep samples count reasonable (3-5) 5. Update this README with new scenarios ## Questions? See main repository documentation or file an issue in konflux-ci/skills.