zhongwei/gh-konflux-ci-skills-skills-debugging-pipeline-failures

Zhongwei Li 3dba6825cf Initial commit

2025-11-30 08:35:19 +08:00

README.md

Test Suite for Debugging Pipeline Failures Skill

Overview

This test suite validates that the pipeline debugging skill properly teaches Claude Code to:

Follow systematic investigation methodology
Use standard kubectl and Tekton commands
Distinguish root causes from symptoms
Correlate logs, events, and resource states
Provide actionable debugging steps

Test Scenarios

1. Systematic Investigation Approach

Purpose: Validates that Claude follows a phased methodology (identify → logs → events → resources → root cause).

Expected: Should describe a systematic approach, with kubectl commands for inspecting PipelineRuns and TaskRuns.

Baseline Failure: Without the skill, Claude may suggest random checks with no structure.

2. Image Pull Failure Diagnosis

Purpose: Tests systematic diagnosis of ImagePullBackOff errors.

Expected: Should check pod events, the image name, the registry, and the ServiceAccount's imagePullSecrets.

Baseline Failure: Without the skill, Claude may not know to run kubectl describe pod or to check imagePullSecrets.
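As a sketch of what a good answer to this scenario covers, the hypothetical helper below (not part of the test framework) takes the parsed output of `kubectl get pod <pod> -o json` and lists containers stuck pulling images, together with the pod's ServiceAccount so its `imagePullSecrets` can be checked next, for example with `kubectl get serviceaccount <sa> -o jsonpath='{.imagePullSecrets}'`. It assumes the standard Kubernetes Pod status fields; because Tekton runs steps as containers and also uses init containers, both lists are scanned.

```python
from typing import Any, Dict, List, Tuple

# Waiting reasons the kubelet reports while it cannot pull an image.
PULL_FAILURE_REASONS = {"ImagePullBackOff", "ErrImagePull"}


def image_pull_failures(pod: Dict[str, Any]) -> Tuple[str, List[Tuple[str, str, str]]]:
    """Hypothetical helper: return (serviceAccountName, [(container, image, reason), ...]).

    `pod` is assumed to be the parsed output of `kubectl get pod <name> -o json`.
    """
    status = pod.get("status", {})
    # Scan init containers as well as regular (step) containers.
    statuses = status.get("initContainerStatuses", []) + status.get("containerStatuses", [])
    failures = []
    for cs in statuses:
        waiting = cs.get("state", {}).get("waiting") or {}
        reason = waiting.get("reason", "")
        if reason in PULL_FAILURE_REASONS:
            failures.append((cs.get("name", ""), cs.get("image", ""), reason))
    # Pods without an explicit ServiceAccount run as "default".
    service_account = pod.get("spec", {}).get("serviceAccountName", "default")
    return service_account, failures
```

Returning the ServiceAccount alongside the failures keeps the next investigation step (checking its pull secrets) attached to the symptom rather than leaving it as a separate guess.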

3. Stuck Pipeline Investigation

Purpose: Validates the methodology for pipelines stuck in the Running state.

Expected: Should check individual TaskRun statuses to identify which one is stuck or pending.

Baseline Failure: Without the skill, Claude may not know to list TaskRuns filtered by the tekton.dev/pipelineRun label.
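For the stuck-pipeline case, the central command is `kubectl get taskruns -l tekton.dev/pipelineRun=<name> -o json`. The hypothetical helper below is a sketch that scans that output and reports every TaskRun whose `Succeeded` condition is not final yet. It assumes Tekton's standard status layout, where `status: "Unknown"` means the TaskRun is still pending or running and the condition's `reason` says which.

```python
from typing import Any, Dict, List, Tuple


def unfinished_taskruns(taskrun_list: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Hypothetical helper: return (TaskRun name, reason) for TaskRuns that have not finished.

    `taskrun_list` is assumed to be the parsed output of
    `kubectl get taskruns -l tekton.dev/pipelineRun=<name> -o json`.
    """
    unfinished = []
    for tr in taskrun_list.get("items", []):
        name = tr.get("metadata", {}).get("name", "")
        conditions = tr.get("status", {}).get("conditions", [])
        succeeded = next((c for c in conditions if c.get("type") == "Succeeded"), None)
        if succeeded is None:
            # No Succeeded condition reported yet.
            unfinished.append((name, "NoCondition"))
        elif succeeded.get("status") == "Unknown":
            # Still in progress; the reason is typically "Pending" or "Running".
            unfinished.append((name, succeeded.get("reason", "")))
    return unfinished
```

A TaskRun reported as pending here is the one whose pod events and scheduling deserve attention next.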

4. Resource Constraint Recognition

Purpose: Tests identification of scheduling and quota issues.

Expected: Should check events for FailedScheduling and namespace resource quotas.

Baseline Failure: Without skill, may not connect Pending state with resource constraints.
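The events half of this check can be done server-side with `kubectl get events -n <ns> --field-selector reason=FailedScheduling`, and quota usage can be read with `kubectl describe resourcequota -n <ns>`. When the event list has already been fetched as JSON, the same filtering is a few lines. The helper name below is hypothetical, and the sketch assumes the standard Event fields `reason`, `message`, and `involvedObject`.

```python
from typing import Any, Dict, List, Tuple


def failed_scheduling(event_list: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Hypothetical helper: return (object name, message) for FailedScheduling events.

    `event_list` is assumed to be the parsed output of `kubectl get events -n <ns> -o json`.
    """
    hits = []
    for ev in event_list.get("items", []):
        if ev.get("reason") == "FailedScheduling":
            obj = ev.get("involvedObject", {})
            # The scheduler's message explains why no node fit (e.g. insufficient CPU).
            hits.append((obj.get("name", ""), ev.get("message", "")))
    return hits
```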

5. Log Analysis Methodology

Purpose: Ensures correct Tekton log retrieval for failed steps.

Expected: Should know how to get logs from specific step containers in Tekton pods.

Baseline Failure: Without the skill, Claude may not understand Tekton's step container naming (step-<step name>).
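Tekton runs each step in its own container named `step-<step name>`, and a TaskRun records its pod in `status.podName`. The hypothetical helper below combines the two into the `kubectl logs` invocation for one step. It only builds the argument list, so the result can be printed or passed to `subprocess.run`.

```python
from typing import Any, Dict, List


def step_logs_command(taskrun: Dict[str, Any], step: str) -> List[str]:
    """Hypothetical helper: build `kubectl logs` argv for one step of a TaskRun.

    `taskrun` is assumed to be the parsed output of `kubectl get taskrun <name> -o json`.
    """
    pod = taskrun.get("status", {}).get("podName")
    if not pod:
        # No pod yet: the problem is upstream of logs (scheduling, quota, image pull).
        raise ValueError("TaskRun has no pod yet; check its conditions and events instead")
    namespace = taskrun.get("metadata", {}).get("namespace", "default")
    # Tekton prefixes every step container name with "step-".
    return ["kubectl", "logs", pod, "-c", f"step-{step}", "-n", namespace]
```

Raising instead of guessing a pod name keeps the helper from hiding the case where logs cannot exist yet.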

6. Root Cause vs Symptom

Purpose: Validates focus on investigation before applying fixes.

Expected: Should recommend investigating logs and root cause before increasing timeouts.

Baseline Failure: Without skill, may suggest quick fixes without investigation.

Running Tests

Prerequisites

Python 3.8+
Claude Code CLI access
Claude Sonnet 4.5 (tests use the sonnet model)
Access to the test framework (if available in the konflux-ci/skills repo)

Run All Tests

```shell
# From repository root
make test

# Or specifically for this skill
make test-only SKILL=debugging-pipeline-failures
```

Validate Skill Schema

```shell
claudelint debugging-pipeline-failures/SKILL.md
```

Generate Test Results

```shell
make generate SKILL=debugging-pipeline-failures
```

Test-Driven Development Process

This skill was developed with test-driven development (TDD) applied to documentation:

RED Phase (Initial Failures)

Created 6 test scenarios representing real pipeline debugging needs
Ran tests WITHOUT the skill
Documented baseline failures:
- No systematic methodology
- Didn't know Tekton-specific kubectl commands
- Confused symptoms with root causes
- Missing event and resource correlation

GREEN Phase (Minimal Skill)

Created SKILL.md addressing test failures
Added 5-phase investigation methodology
Included kubectl command examples
Emphasized root cause analysis
All tests passed

REFACTOR Phase (Improvement)

Added common failure patterns (6 types)
Enhanced with decision tree
Improved troubleshooting workflow
Added common confusions section

Success Criteria

All tests must:

✅ Pass with 100% success rate (3/3 samples)
✅ Contain expected keywords (kubectl, systematic approach)
✅ NOT contain prohibited terms (quick fixes without investigation)
✅ Demonstrate phased methodology
✅ Focus on standard Tekton/Kubernetes tools
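The framework's exact matching rules are not spelled out here. The sketch below assumes case-insensitive substring matching in which every listed keyword must appear, plus a hypothetical list of prohibited terms; a scenario passes only when all of its samples pass, mirroring the 3/3 requirement. Function names are illustrative, not the framework's API.

```python
from typing import Iterable, Sequence


def sample_passes(response: str, keywords: Sequence[str], prohibited: Sequence[str] = ()) -> bool:
    """Assumed rule: every keyword appears and no prohibited term does (case-insensitive)."""
    text = response.lower()
    return all(k.lower() in text for k in keywords) and not any(
        p.lower() in text for p in prohibited
    )


def scenario_passes(responses: Iterable[str], keywords: Sequence[str],
                    prohibited: Sequence[str] = ()) -> bool:
    """A scenario passes only if it has samples and every one of them passes (e.g. 3/3)."""
    results = [sample_passes(r, keywords, prohibited) for r in responses]
    return len(results) > 0 and all(results)
```

Substring matching is deliberately crude: "systematically" satisfies the keyword "systematic", but a response that says the right thing in different words fails.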

Continuous Validation

Tests run automatically on:

Every pull request (GitHub Actions)
Skill file modifications
Schema changes
Version updates

Adding New Tests

To add test scenarios:

Identify gap: What failure pattern is missing?
Create scenario: Add to scenarios.yaml
Run without skill: Document baseline failure
Update SKILL.md: Address the gap
Validate: Ensure test passes

Example:

```yaml
- name: your-test-name
  description: What you're testing
  prompt: "User query to test"
  model: sonnet
  samples: 3
  expected:
    contains_keywords:
      - keyword1
      - keyword2
  baseline_failure: What happens without the skill
```
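Before running a new scenario, it can help to sanity-check the entry. The sketch below assumes the file has already been parsed (for example with PyYAML's `yaml.safe_load`) and that each list item has the shape of the example above. The required-field list and validation rules are assumptions for illustration, not the framework's actual schema.

```python
from typing import Any, Dict, List

# Assumed required fields, taken from the example entry.
REQUIRED_FIELDS = ("name", "description", "prompt", "model", "samples", "expected", "baseline_failure")


def scenario_errors(scenario: Dict[str, Any]) -> List[str]:
    """Hypothetical check: return problems with one scenario entry (empty list if valid)."""
    errors = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in scenario]
    samples = scenario.get("samples")
    # bool is a subclass of int in Python, so reject it explicitly.
    if samples is not None and (not isinstance(samples, int) or isinstance(samples, bool) or samples < 1):
        errors.append("samples must be a positive integer")
    expected = scenario.get("expected")
    keywords = expected.get("contains_keywords") if isinstance(expected, dict) else None
    if not isinstance(keywords, list) or not keywords or not all(isinstance(k, str) for k in keywords):
        errors.append("expected.contains_keywords must be a non-empty list of strings")
    return errors
```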

Known Limitations

Tests use synthetic scenarios (not real Konflux failures)
Keyword matching is basic (could use semantic analysis)
No integration testing with actual clusters
Sample size (3) may not catch all edge cases

Future Improvements

Add tests for multi-step pipeline failures
Include workspace debugging scenarios
Add tests for intermittent failures
Test with real Konflux pipeline YAML

Troubleshooting

Test Failures

Symptom: Test fails intermittently.
Fix: Increase samples or refine expected keywords.

Symptom: All tests fail.
Fix: Check SKILL.md frontmatter and schema validation.

Symptom: Baseline failure unclear.
Fix: Run the test manually without the skill and document the actual output.

Contributing

When contributing test improvements:

Ensure tests are deterministic
Use realistic Konflux user prompts
Document baseline failures clearly
Keep the sample count reasonable (3-5)
Update this README with new scenarios

Questions?

See main repository documentation or file an issue in konflux-ci/skills.