263 lines
9.1 KiB
Markdown
263 lines
9.1 KiB
Markdown
---
|
|
description: Quick analysis of must-gather data - runs all analysis scripts and provides comprehensive cluster diagnostics
|
|
argument-hint: [must-gather-path] [component]
|
|
---
|
|
|
|
## Name
|
|
must-gather:analyze
|
|
|
|
## Synopsis
|
|
```
|
|
/must-gather:analyze [must-gather-path] [component]
|
|
```
|
|
|
|
## Description
|
|
|
|
The `analyze` command performs comprehensive analysis of OpenShift must-gather diagnostic data. It runs specialized Python analysis scripts to extract and summarize cluster health information across multiple components.
|
|
|
|
The command can analyze:
|
|
- Cluster version and update status
|
|
- Cluster operator health (degraded, progressing, unavailable)
|
|
- Node conditions and resource status
|
|
- Pod failures, restarts, and crash loops
|
|
- Network configuration and OVN health
|
|
- OVN databases - logical topology, ACLs, pods
|
|
- Kubernetes events (warnings and errors)
|
|
- etcd cluster health and quorum status
|
|
- Persistent volume and claim status
|
|
- Prometheus alerts
|
|
|
|
You can request analysis of the entire cluster or focus on a specific component.
|
|
|
|
## Prerequisites
|
|
|
|
**Required Directory Structure:**
|
|
|
|
Must-gather data typically has this structure:
|
|
```
|
|
must-gather/
|
|
└── registry-ci-openshift-org-origin-...-sha256-<hash>/
|
|
├── cluster-scoped-resources/
|
|
├── namespaces/
|
|
└── ...
|
|
```
|
|
|
|
The actual must-gather directory is the subdirectory with the hash name, not the parent directory.
|
|
|
|
**Required Scripts:**
|
|
|
|
Analysis scripts are bundled with this plugin at:
|
|
```
|
|
<plugin-root>/skills/must-gather-analyzer/scripts/
|
|
├── analyze_clusterversion.py
|
|
├── analyze_clusteroperators.py
|
|
├── analyze_nodes.py
|
|
├── analyze_pods.py
|
|
├── analyze_network.py
|
|
├── analyze_ovn_dbs.py
|
|
├── analyze_events.py
|
|
├── analyze_etcd.py
|
|
└── analyze_pvs.py
|
|
```
|
|
|
|
Where `<plugin-root>` is the directory where this plugin is installed (typically `~/.cursor/commands/ai-helpers/plugins/must-gather/` or similar).
|
|
|
|
## Error Handling
|
|
|
|
**CRITICAL: Script-Only Analysis**
|
|
|
|
- **NEVER** attempt to analyze must-gather data directly using bash commands, grep, or manual file reading
|
|
- **ONLY** use the provided Python scripts in `plugins/must-gather/skills/must-gather-analyzer/scripts/`
|
|
- If scripts are missing or not found:
|
|
1. Stop immediately
|
|
2. Inform the user that the analysis scripts are not available
|
|
3. Ask the user to ensure the scripts are installed at the correct path
|
|
4. Do NOT attempt alternative approaches
|
|
|
|
**Script Availability Check:**
|
|
|
|
Before running any analysis:
|
|
|
|
1. Locate the scripts directory by searching for a known script:
|
|
```bash
|
|
SCRIPT_PATH=$(find ~ -name "analyze_clusteroperators.py" -path "*/must-gather/skills/must-gather-analyzer/scripts/*" 2>/dev/null | head -1)
|
|
|
|
if [ -z "$SCRIPT_PATH" ]; then
|
|
echo "ERROR: Must-gather analysis scripts not found."
|
|
echo "Please ensure the must-gather plugin from ai-helpers is properly installed."
|
|
exit 1
|
|
fi
|
|
|
|
# All scripts are in the same directory, so just get the directory
|
|
SCRIPTS_DIR=$(dirname "$SCRIPT_PATH")
|
|
```
|
|
|
|
2. If scripts cannot be found, STOP and report to the user:
|
|
```
|
|
The must-gather analysis scripts could not be located. Please ensure the must-gather plugin from openshift-eng/ai-helpers is properly installed in your Claude Code plugins directory.
|
|
```
|
|
|
|
## Implementation
|
|
|
|
The command performs the following steps:
|
|
|
|
1. **Validate Must-Gather Path**:
|
|
- If path not provided as argument, ask the user
|
|
- Check if path contains `cluster-scoped-resources/` and `namespaces/` directories
|
|
- If user provides root directory, automatically find the correct subdirectory
|
|
- Verify the path exists and is readable
|
|
|
|
2. **Determine Analysis Scope**:
|
|
|
|
**STEP 1: Check for SPECIFIC component keywords**
|
|
|
|
If the user mentions a specific component, run ONLY that script:
|
|
- "pods", "pod status", "containers", "crashloop", "failing pods" → `analyze_pods.py` ONLY
|
|
- "etcd", "etcd health", "quorum" → `analyze_etcd.py` ONLY
|
|
- "network", "networking", "ovn", "connectivity" → `analyze_network.py` ONLY
|
|
- "ovn databases", "ovn-dbs", "ovn db", "logical switches", "acls" → `analyze_ovn_dbs.py` ONLY
|
|
- "nodes", "node status", "node conditions" → `analyze_nodes.py` ONLY
|
|
- "operators", "cluster operators", "degraded" → `analyze_clusteroperators.py` ONLY
|
|
- "version", "cluster version", "update", "upgrade" → `analyze_clusterversion.py` ONLY
|
|
- "events", "warnings", "errors" → `analyze_events.py` ONLY
|
|
- "storage", "pv", "pvc", "volumes", "persistent" → `analyze_pvs.py` ONLY
|
|
- "alerts", "prometheus", "monitoring" → `analyze_prometheus.py` ONLY
|
|
|
|
**STEP 2: No specific component mentioned**
|
|
|
|
If generic request like "analyze must-gather", "/must-gather:analyze", or "check the cluster", run ALL scripts in this order:
|
|
1. ClusterVersion (`analyze_clusterversion.py`)
|
|
2. Cluster Operators (`analyze_clusteroperators.py`)
|
|
3. Nodes (`analyze_nodes.py`)
|
|
4. Pods - problems only (`analyze_pods.py --problems-only`)
|
|
5. Network (`analyze_network.py`)
|
|
6. Events - warnings only (`analyze_events.py --type Warning --count 50`)
|
|
7. etcd (`analyze_etcd.py`)
|
|
8. Storage (`analyze_pvs.py`)
|
|
9. Monitoring (`analyze_prometheus.py`)
|
|
|
|
3. **Locate Plugin Scripts**:
|
|
- Use the script availability check from the Error Handling section to find the plugin root
|
|
- Store the scripts directory path in `$SCRIPTS_DIR`
|
|
|
|
4. **Execute Analysis Scripts**:
|
|
```bash
|
|
python3 "$SCRIPTS_DIR/<script>.py" <must-gather-path>
|
|
```
|
|
|
|
Example:
|
|
```bash
|
|
python3 "$SCRIPTS_DIR/analyze_clusteroperators.py" ./must-gather.local.123/quay-io-...
|
|
```
|
|
|
|
5. **Synthesize Results**: Generate findings and recommendations based on script output
|
|
|
|
## Return Value
|
|
|
|
The command outputs structured analysis results to stdout:
|
|
|
|
**For Component-Specific Analysis:**
|
|
- Script output for the requested component only
|
|
- Focused findings and recommendations
|
|
|
|
**For Full Analysis:**
|
|
- Organized sections for each component
|
|
- Executive summary of overall cluster health
|
|
- Prioritized list of critical issues
|
|
- Actionable recommendations
|
|
- Suggested log files to review
|
|
|
|
## Output Structure
|
|
|
|
```
|
|
================================================================================
|
|
MUST-GATHER ANALYSIS SUMMARY
|
|
================================================================================
|
|
|
|
[Script outputs organized by component]
|
|
|
|
CLUSTER VERSION:
|
|
[output from analyze_clusterversion.py]
|
|
|
|
CLUSTER OPERATORS:
|
|
[output from analyze_clusteroperators.py]
|
|
|
|
NODES:
|
|
[output from analyze_nodes.py]
|
|
|
|
PROBLEMATIC PODS:
|
|
[output from analyze_pods.py --problems-only]
|
|
|
|
NETWORK STATUS:
|
|
[output from analyze_network.py]
|
|
|
|
WARNING EVENTS (Last 50):
|
|
[output from analyze_events.py --type Warning --count 50]
|
|
|
|
ETCD CLUSTER HEALTH:
|
|
[output from analyze_etcd.py]
|
|
|
|
STORAGE (PVs/PVCs):
|
|
[output from analyze_pvs.py]
|
|
|
|
MONITORING (Alerts):
|
|
[output from analyze_prometheus.py]
|
|
|
|
================================================================================
|
|
FINDINGS AND RECOMMENDATIONS
|
|
================================================================================
|
|
|
|
Critical Issues:
|
|
- [Critical problems requiring immediate attention]
|
|
|
|
Warnings:
|
|
- [Potential issues or degraded components]
|
|
|
|
Recommendations:
|
|
- [Specific next steps for investigation]
|
|
|
|
Logs to Review:
|
|
- [Specific log files to examine based on findings]
|
|
```
|
|
|
|
## Examples
|
|
|
|
1. **Full cluster analysis**:
|
|
```
|
|
/must-gather:analyze ./must-gather/registry-ci-openshift-org-origin-4-20-...-sha256-abc123/
|
|
```
|
|
Runs all analysis scripts and provides comprehensive cluster diagnostics.
|
|
|
|
2. **Analyze pod issues only**:
|
|
```
|
|
/must-gather:analyze ./must-gather/registry-ci-openshift-org-origin-4-20-...-sha256-abc123/ analyze the pod statuses
|
|
```
|
|
Runs only `analyze_pods.py` to focus on pod-related issues.
|
|
|
|
3. **Check etcd health**:
|
|
```
|
|
/must-gather:analyze check etcd health
|
|
```
|
|
Asks for must-gather path, then runs only `analyze_etcd.py`.
|
|
|
|
4. **Network troubleshooting**:
|
|
```
|
|
/must-gather:analyze ./must-gather/registry-ci-openshift-org-origin-4-20-...-sha256-abc123/ show me network issues
|
|
```
|
|
Runs only `analyze_network.py` for network-specific analysis.
|
|
|
|
## Notes
|
|
|
|
- **Must-Gather Path**: Always use the subdirectory containing `cluster-scoped-resources/` and `namespaces/`, not the parent directory
|
|
- **Script Dependencies**: Analysis scripts must be executable and have required Python dependencies installed
|
|
- **Error Handling**: If scripts are not found or must-gather path is invalid, clear error messages are displayed
|
|
- **Cross-Referencing**: The analysis attempts to correlate issues across components (e.g., degraded operator → failing pods)
|
|
- **Pattern Detection**: Identifies patterns like multiple pod failures on the same node
|
|
- **Actionable Output**: Focuses on insights and recommendations rather than raw data dumps
|
|
- **Priority**: Issues are prioritized by severity (Critical > Warning > Info)
|
|
|
|
## Arguments
|
|
|
|
- **$1** (must-gather-path): Optional. Path to the must-gather directory (the subdirectory with the hash name). If not provided, the user will be asked.
|
|
- **$2+** (component): Optional. If keywords for a specific component are detected, only that component's analysis script will run. Otherwise, all scripts run.
|