# skill_digest: 651e129c7e0478c097161a76655c9f54e3566f5510368426e68d88fbbceb77b9
Based on the debugging guide, here's what you should check for a PipelineRun stuck in Running state for over an hour:
## Immediate Checks
**1. Identify which TaskRun(s) are stuck:**
```bash
kubectl get taskruns -l tekton.dev/pipelineRun=<pr-name> -n <namespace>
```
Look for TaskRuns that are either:
- **Pending** → Resource constraints or scheduling issues
- **Running** for too long → Timeout or hanging process
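If you want each TaskRun's status reason in one pass, a custom-columns view works well (a minimal sketch; the column paths assume Tekton's standard `status.conditions` layout):
```bash
# One-line overview per TaskRun: name, Succeeded condition, reason, and start time
kubectl get taskruns -l tekton.dev/pipelineRun=<pr-name> -n <namespace> \
  -o custom-columns=NAME:.metadata.name,SUCCEEDED:.status.conditions[0].status,REASON:.status.conditions[0].reason,STARTED:.status.startTime
```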
**2. Check the stuck TaskRun's pod status:**
```bash
kubectl get pods -l tekton.dev/taskRun=<tr-name> -n <namespace>
```
**3. Review recent events:**
```bash
kubectl get events -n <namespace> --sort-by='.lastTimestamp' | tail -20
```
Look for:
- `FailedScheduling` → Resource quota exceeded or insufficient cluster resources
- `FailedMount` → PVC or workspace issues
- `ImagePullBackOff` → Can't pull container image
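If the event stream is noisy, you can filter on a single reason; events support field selectors on `reason` (shown here for scheduling failures, adjust as needed):
```bash
# Show only scheduling failures, newest last
kubectl get events -n <namespace> --field-selector reason=FailedScheduling --sort-by='.lastTimestamp'
```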
## Common Causes for Stuck Pipelines
### **If TaskRun is Pending:**
```bash
# Check resource quotas
kubectl describe namespace <namespace> | grep -A5 "Resource Quotas"
# Check node resources
kubectl describe node | grep -A5 "Allocated resources"
```
**Likely cause:** Insufficient resources to schedule the pod
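To confirm this, the scheduler's own message on the Pending pod usually names the resource that's short, and the namespace's quota objects show current usage (a sketch using standard kubectl jsonpath):
```bash
# Why the pod isn't scheduling, straight from the PodScheduled condition
kubectl get pod <pod-name> -n <namespace> \
  -o jsonpath='{.status.conditions[?(@.type=="PodScheduled")].message}{"\n"}'
# Quota objects and their current usage in the namespace
kubectl get resourcequota -n <namespace>
```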
### **If TaskRun is Running:**
```bash
# Check the logs for the current step
kubectl logs <pod-name> --all-containers=true -n <namespace>
# Check which step is currently executing
kubectl get taskrun <tr-name> -n <namespace> -o jsonpath='{.status.steps[*].name}{"\n"}{.status.steps[*].terminated}'
```
**Likely causes:**
- A process hanging without producing output
- A stalled network operation (e.g., downloading dependencies)
- A generous task timeout, so Tekton keeps waiting for the step instead of failing the run
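To narrow this down, you can follow the live logs of just the current step (Tekton names step containers `step-<step-name>`) and check what timeout the run is under (a sketch; `spec.timeouts` is the v1 field, older v1beta1 runs use `spec.timeout`):
```bash
# Follow only the current step's container output
kubectl logs <pod-name> -c step-<step-name> -n <namespace> --follow --tail=50
# See the timeouts the PipelineRun was created with
kubectl get pipelinerun <pr-name> -n <namespace> -o jsonpath='{.spec.timeouts}{"\n"}'
```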
### **If Pod hasn't started:**
```bash
kubectl describe pod <pod-name> -n <namespace>
```
Check the Events section for ImagePull errors or volume mount failures.
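The same information can be pulled directly from the container statuses, which is handy when the Events output is long (a minimal jsonpath sketch):
```bash
# Print each container and why it is stuck in Waiting (e.g. ImagePullBackOff)
kubectl get pod <pod-name> -n <namespace> \
  -o jsonpath='{range .status.containerStatuses[*]}{.name}{": "}{.state.waiting.reason}{"\n"}{end}'
```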
## Quick Decision Path
1. **TaskRuns all Pending?** → Resource/quota issue
2. **One TaskRun Running forever?** → Check its logs for hanging process
3. **No pod created yet?** → Image or workspace configuration problem
4. **Pod exists but waiting?** → Check events for FailedMount/ImagePull errors
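If it helps, here's a minimal triage sketch that runs the first checks in one pass, assuming the standard Tekton labels used above (fill in the placeholders):
```bash
PR=<pr-name>; NS=<namespace>
# Which TaskRuns exist and what state are they in?
kubectl get taskruns -l tekton.dev/pipelineRun="$PR" -n "$NS"
# Are their pods scheduled and running?
kubectl get pods -l tekton.dev/pipelineRun="$PR" -n "$NS"
# Anything suspicious in recent events?
kubectl get events -n "$NS" --sort-by='.lastTimestamp' | tail -20
```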
What namespace and PipelineRun name are you working with? I can help you run these diagnostic commands.