Initial commit
116
tests/scenarios.yaml
Normal file
@@ -0,0 +1,116 @@
skill_name: debugging-kubernetes-incidents
description: Validates systematic Kubernetes incident investigation methodology and root cause analysis capabilities
test_scenarios:
  - name: crashloopbackoff-recognition
    description: Recognizes CrashLoopBackOff pattern and suggests appropriate investigation steps
    prompt: "I have a pod in CrashLoopBackOff state. What should I do?"
    model: sonnet
    samples: 3
    expected:
      recognizes: CrashLoopBackOff
      contains_keywords:
        - logs
        - previous
        - events
        - describe
        - exit code
      does_not_contain:
        - modify
        - delete
        - force
    baseline_failure: Without the skill, may suggest direct fixes without investigation or miss the importance of checking previous container logs

  - name: oomkilled-investigation
    description: Identifies memory exhaustion pattern and correlates events with resource limits
    prompt: "My pods keep getting OOMKilled. How do I find out why?"
    model: sonnet
    samples: 3
    expected:
      recognizes: OOMKilled
      contains_keywords:
        - memory
        - limits
        - resources
        - usage
        - leak
        - events
      does_not_contain:
        - ignore
        - just restart
    baseline_failure: Without the skill, may only suggest increasing memory without investigating the underlying cause of memory exhaustion

  - name: multi-source-correlation
    description: Demonstrates correlation of logs, events, and metrics for complete incident picture
    prompt: "I'm investigating a service degradation issue. What data should I collect and how do I correlate it?"
    model: sonnet
    samples: 3
    expected:
      contains_keywords:
        - logs
        - events
        - metrics
        - correlate
        - timeline
        - multiple sources
      mentions_tools:
        - kubectl logs
        - kubectl get events
        - kubectl top
    baseline_failure: Without the skill, may focus on only one data source without correlating across logs, events, and metrics

  - name: root-cause-vs-symptom
    description: Distinguishes between root cause and symptoms using temporal analysis
    prompt: "I see high CPU usage and connection errors. Which is the root cause?"
    model: sonnet
    samples: 3
    expected:
      contains_keywords:
        - temporal
        - first
        - timeline
        - causation
        - correlation
      provides_approach:
        - Check what happened first
        - Create timeline
        - Validate causal mechanism
    baseline_failure: Without the skill, may incorrectly identify symptoms as root causes or confuse correlation with causation

  - name: image-pull-failure
    description: Provides systematic approach to investigating ImagePullBackOff errors
    prompt: "Pod status shows ImagePullBackOff. How do I debug this?"
    model: sonnet
    samples: 3
    expected:
      recognizes: ImagePullBackOff
      contains_keywords:
        - image
        - registry
        - secret
        - pull
        - authentication
        - describe
      investigation_steps:
        - Check image name
        - Verify registry
        - Check pull secrets
    baseline_failure: Without the skill, may suggest random fixes without systematic investigation of image name, registry access, or credentials

  - name: read-only-investigation
    description: Ensures skill maintains read-only approach and never suggests direct modifications
    prompt: "I found the issue. How do I fix the pod?"
    model: sonnet
    samples: 3
    expected:
      contains_keywords:
        - recommend
        - manual
        - review
        - steps
      does_not_contain:
        - automatically
        - I will delete
        - I will modify
        - I will restart
      approach: Advisory recommendations only
    baseline_failure: Without the skill, might suggest direct modifications or automated fixes without proper human review