gh-geored-sre-skill-debuggi…/tests/scenarios.yaml
2025-11-29 18:28:20 +08:00

skill_name: debugging-kubernetes-incidents
description: Validates systematic Kubernetes incident investigation methodology and root cause analysis capabilities
test_scenarios:
  - name: crashloopbackoff-recognition
    description: Recognizes CrashLoopBackOff pattern and suggests appropriate investigation steps
    prompt: "I have a pod in CrashLoopBackOff state. What should I do?"
    model: sonnet
    samples: 3
    expected:
      recognizes: CrashLoopBackOff
      contains_keywords:
        - logs
        - previous
        - events
        - describe
        - exit code
      does_not_contain:
        - modify
        - delete
        - force
    baseline_failure: Without the skill, may suggest direct fixes without investigation or miss the importance of checking previous container logs
  - name: oomkilled-investigation
    description: Identifies memory exhaustion pattern and correlates events with resource limits
    prompt: "My pods keep getting OOMKilled. How do I find out why?"
    model: sonnet
    samples: 3
    expected:
      recognizes: OOMKilled
      contains_keywords:
        - memory
        - limits
        - resources
        - usage
        - leak
        - events
      does_not_contain:
        - ignore
        - just restart
    baseline_failure: Without the skill, may only suggest increasing memory without investigating the underlying cause of memory exhaustion
  - name: multi-source-correlation
    description: Demonstrates correlation of logs, events, and metrics for complete incident picture
    prompt: "I'm investigating a service degradation issue. What data should I collect and how do I correlate it?"
    model: sonnet
    samples: 3
    expected:
      contains_keywords:
        - logs
        - events
        - metrics
        - correlate
        - timeline
        - multiple sources
      mentions_tools:
        - kubectl logs
        - kubectl get events
        - kubectl top
    baseline_failure: Without the skill, may focus on only one data source without correlating across logs, events, and metrics
  - name: root-cause-vs-symptom
    description: Distinguishes between root cause and symptoms using temporal analysis
    prompt: "I see high CPU usage and connection errors. Which is the root cause?"
    model: sonnet
    samples: 3
    expected:
      contains_keywords:
        - temporal
        - first
        - timeline
        - causation
        - correlation
      provides_approach:
        - Check what happened first
        - Create timeline
        - Validate causal mechanism
    baseline_failure: Without the skill, may incorrectly identify symptoms as root causes or confuse correlation with causation
  - name: image-pull-failure
    description: Provides systematic approach to investigating ImagePullBackOff errors
    prompt: "Pod status shows ImagePullBackOff. How do I debug this?"
    model: sonnet
    samples: 3
    expected:
      recognizes: ImagePullBackOff
      contains_keywords:
        - image
        - registry
        - secret
        - pull
        - authentication
        - describe
      investigation_steps:
        - Check image name
        - Verify registry
        - Check pull secrets
    baseline_failure: Without the skill, may suggest random fixes without systematic investigation of image name, registry access, or credentials
  - name: read-only-investigation
    description: Ensures skill maintains read-only approach and never suggests direct modifications
    prompt: "I found the issue. How do I fix the pod?"
    model: sonnet
    samples: 3
    expected:
      contains_keywords:
        - recommend
        - manual
        - review
        - steps
      does_not_contain:
        - automatically
        - I will delete
        - I will modify
        - I will restart
      approach: Advisory recommendations only
    baseline_failure: Without the skill, might suggest direct modifications or automated fixes without proper human review
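
The file does not say how a test harness interprets the `expected` block, so the following is only a minimal sketch of one plausible reading: `recognizes` and each entry in `contains_keywords` must appear in the model response, and no entry in `does_not_contain` may appear. The function name `check_response` and the case-insensitive substring matching are assumptions made for illustration, not the behavior of any particular harness. Keys such as `mentions_tools`, `provides_approach`, `investigation_steps`, and `approach` are left out because their matching semantics cannot be inferred from the file.

```python
def check_response(response, expected):
    """Return a list of failure messages; an empty list means the sample passed.

    Hypothetical helper: matching is a case-insensitive substring test,
    which is an assumption and not something the scenario file specifies.
    """
    text = response.lower()
    failures = []

    # Every listed keyword is assumed to be required in the response.
    for keyword in expected.get("contains_keywords", []):
        if keyword.lower() not in text:
            failures.append(f"missing keyword: {keyword}")

    # Any listed phrase is assumed to be forbidden in the response.
    for phrase in expected.get("does_not_contain", []):
        if phrase.lower() in text:
            failures.append(f"forbidden phrase present: {phrase}")

    # The named failure pattern is assumed to be mentioned explicitly.
    recognizes = expected.get("recognizes")
    if recognizes and recognizes.lower() not in text:
        failures.append(f"pattern not recognized: {recognizes}")

    return failures


# Expectations copied from the oomkilled-investigation scenario.
expected = {
    "recognizes": "OOMKilled",
    "contains_keywords": ["memory", "limits", "resources", "usage", "leak", "events"],
    "does_not_contain": ["ignore", "just restart"],
}

# Two made-up responses, used only to exercise the checker.
good = (
    "OOMKilled means the container exceeded its memory limits. Compare resources "
    "requests with actual usage, look for a leak, and review events."
)
bad = "Just restart the pod and ignore the warning."

good_failures = check_response(good, expected)
bad_failures = check_response(bad, expected)
print(good_failures)
print(len(bad_failures))
```

Under this reading, a scenario with `samples: 3` would run the check once per sampled response. Whether all samples must pass or only a majority is another detail the file leaves open.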