Name
must-gather:analyze
Synopsis
/must-gather:analyze [must-gather-path] [component]
Description
The analyze command performs comprehensive analysis of OpenShift must-gather diagnostic data. It runs specialized Python analysis scripts to extract and summarize cluster health information across multiple components.
The command can analyze:
- Cluster version and update status
- Cluster operator health (degraded, progressing, unavailable)
- Node conditions and resource status
- Pod failures, restarts, and crash loops
- Network configuration and OVN health
- OVN databases - logical topology, ACLs, pods
- Kubernetes events (warnings and errors)
- etcd cluster health and quorum status
- Persistent volume and claim status
- Prometheus alerts
You can request analysis of the entire cluster or focus on a specific component.
Prerequisites
Required Directory Structure:
Must-gather data typically has this structure:
must-gather/
└── registry-ci-openshift-org-origin-...-sha256-<hash>/
├── cluster-scoped-resources/
├── namespaces/
└── ...
The actual must-gather directory is the subdirectory with the hash name, not the parent directory.
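The directory rule above can be sketched in Python. This is an illustrative helper, not part of the plugin: the function names `is_must_gather_root` and `resolve_must_gather_root` are hypothetical, and the sketch assumes the hash-named subdirectory sits exactly one level below the parent, as in the tree shown.

```python
from pathlib import Path
from typing import Optional

# Directories the document says a valid must-gather root contains.
REQUIRED_DIRS = ("cluster-scoped-resources", "namespaces")


def is_must_gather_root(path: Path) -> bool:
    """Return True if path holds both required subdirectories."""
    return all((path / name).is_dir() for name in REQUIRED_DIRS)


def resolve_must_gather_root(path: str) -> Optional[Path]:
    """Return the usable must-gather directory, or None if none is found.

    Accepts either the hash-named subdirectory itself or its parent
    (hypothetical helper; assumes the subdirectory is one level down).
    """
    candidate = Path(path).expanduser()
    if not candidate.is_dir():
        return None
    if is_must_gather_root(candidate):
        return candidate
    # The user passed the parent directory: look one level down.
    for child in sorted(candidate.iterdir()):
        if child.is_dir() and is_must_gather_root(child):
            return child
    return None
```

Returning `None` instead of raising leaves it to the caller to decide how to report an invalid path.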
Required Scripts:
Analysis scripts are bundled with this plugin at:
<plugin-root>/skills/must-gather-analyzer/scripts/
├── analyze_clusterversion.py
├── analyze_clusteroperators.py
├── analyze_nodes.py
├── analyze_pods.py
├── analyze_network.py
├── analyze_ovn_dbs.py
├── analyze_events.py
├── analyze_etcd.py
├── analyze_pvs.py
└── analyze_prometheus.py
Where <plugin-root> is the directory where this plugin is installed (typically ~/.cursor/commands/ai-helpers/plugins/must-gather/ or similar).
Error Handling
CRITICAL: Script-Only Analysis
- NEVER attempt to analyze must-gather data directly using bash commands, grep, or manual file reading
- ONLY use the provided Python scripts in plugins/must-gather/skills/must-gather-analyzer/scripts/
- If scripts are missing or not found:
  - Stop immediately
  - Inform the user that the analysis scripts are not available
  - Ask the user to ensure the scripts are installed at the correct path
  - Do NOT attempt alternative approaches
Script Availability Check:
Before running any analysis:
- Locate the scripts directory by searching for a known script:
SCRIPT_PATH=$(find ~ -name "analyze_clusteroperators.py" -path "*/must-gather/skills/must-gather-analyzer/scripts/*" 2>/dev/null | head -1)
if [ -z "$SCRIPT_PATH" ]; then
    echo "ERROR: Must-gather analysis scripts not found."
    echo "Please ensure the must-gather plugin from ai-helpers is properly installed."
    exit 1
fi
# All scripts are in the same directory, so just get the directory
SCRIPTS_DIR=$(dirname "$SCRIPT_PATH")
- If scripts cannot be found, STOP and report to the user:
The must-gather analysis scripts could not be located. Please ensure the must-gather plugin from openshift-eng/ai-helpers is properly installed in your Claude Code plugins directory.
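For readers who prefer Python to the shell snippet, the same availability check might look like the sketch below. The function name `find_scripts_dir` is hypothetical, and unlike the `find -path` pattern, this version only accepts a match located directly inside a `.../must-gather/skills/must-gather-analyzer/scripts` directory.

```python
import os
from pathlib import Path
from typing import Optional

# Known script used as a marker, as in the shell check.
KNOWN_SCRIPT = "analyze_clusteroperators.py"
# Trailing path components the scripts directory is expected to have.
EXPECTED_SUFFIX = ("must-gather", "skills", "must-gather-analyzer", "scripts")


def find_scripts_dir(search_root: Path) -> Optional[Path]:
    """Walk search_root and return the first matching scripts directory.

    Hypothetical helper. os.walk skips unreadable directories by default,
    which roughly mirrors the 2>/dev/null in the shell version.
    """
    for dirpath, _dirnames, filenames in os.walk(search_root):
        current = Path(dirpath)
        in_expected_dir = current.parts[-len(EXPECTED_SUFFIX):] == EXPECTED_SUFFIX
        if KNOWN_SCRIPT in filenames and in_expected_dir:
            return current
    return None
```

A caller would typically pass `Path.home()` as `search_root` and stop with the error message above when the result is `None`.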
Implementation
The command performs the following steps:
- Validate Must-Gather Path:
  - If the path is not provided as an argument, ask the user
  - Check that the path contains the cluster-scoped-resources/ and namespaces/ directories
  - If the user provides the root directory, automatically find the correct subdirectory
  - Verify that the path exists and is readable
- Determine Analysis Scope:
STEP 1: Check for SPECIFIC component keywords
If the user mentions a specific component, run ONLY that script:
- "pods", "pod status", "containers", "crashloop", "failing pods" → analyze_pods.py ONLY
- "etcd", "etcd health", "quorum" → analyze_etcd.py ONLY
- "network", "networking", "ovn", "connectivity" → analyze_network.py ONLY
- "ovn databases", "ovn-dbs", "ovn db", "logical switches", "acls" → analyze_ovn_dbs.py ONLY
- "nodes", "node status", "node conditions" → analyze_nodes.py ONLY
- "operators", "cluster operators", "degraded" → analyze_clusteroperators.py ONLY
- "version", "cluster version", "update", "upgrade" → analyze_clusterversion.py ONLY
- "events", "warnings", "errors" → analyze_events.py ONLY
- "storage", "pv", "pvc", "volumes", "persistent" → analyze_pvs.py ONLY
- "alerts", "prometheus", "monitoring" → analyze_prometheus.py ONLY
STEP 2: No specific component mentioned
If the request is generic, such as "analyze must-gather", "/must-gather:analyze", or "check the cluster", run ALL scripts in this order:
- ClusterVersion (analyze_clusterversion.py)
- Cluster Operators (analyze_clusteroperators.py)
- Nodes (analyze_nodes.py)
- Pods - problems only (analyze_pods.py --problems-only)
- Network (analyze_network.py)
- Events - warnings only (analyze_events.py --type Warning --count 50)
- etcd (analyze_etcd.py)
- Storage (analyze_pvs.py)
- Monitoring (analyze_prometheus.py)
- Locate Plugin Scripts:
  - Use the script availability check from the Error Handling section to find the plugin root
  - Store the scripts directory path in $SCRIPTS_DIR
- Execute Analysis Scripts:
python3 "$SCRIPTS_DIR/<script>.py" <must-gather-path>
Example:
python3 "$SCRIPTS_DIR/analyze_clusteroperators.py" ./must-gather.local.123/quay-io-...
- Synthesize Results: Generate findings and recommendations based on script output
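The scope-selection and execution steps can be illustrated with a small Python sketch. The keyword table and script order are copied from the lists above; everything else is an assumption made for illustration: the names `select_scripts` and `build_commands` are hypothetical, matching is plain case-insensitive substring search, the first matching component wins, and the must-gather path is placed before any script flags.

```python
import os

# Keyword table from STEP 1. analyze_ovn_dbs.py is listed first because
# "ovn db" also contains the network keyword "ovn" (ordering is an assumption).
COMPONENT_KEYWORDS = [
    ("analyze_ovn_dbs.py", ["ovn databases", "ovn-dbs", "ovn db", "logical switches", "acls"]),
    ("analyze_pods.py", ["pods", "pod status", "containers", "crashloop", "failing pods"]),
    ("analyze_etcd.py", ["etcd", "etcd health", "quorum"]),
    ("analyze_network.py", ["network", "networking", "ovn", "connectivity"]),
    ("analyze_nodes.py", ["nodes", "node status", "node conditions"]),
    ("analyze_clusteroperators.py", ["operators", "cluster operators", "degraded"]),
    ("analyze_clusterversion.py", ["version", "cluster version", "update", "upgrade"]),
    ("analyze_events.py", ["events", "warnings", "errors"]),
    ("analyze_pvs.py", ["storage", "pv", "pvc", "volumes", "persistent"]),
    ("analyze_prometheus.py", ["alerts", "prometheus", "monitoring"]),
]

# Full-analysis order from STEP 2, including the flags shown there.
FULL_ANALYSIS = [
    ["analyze_clusterversion.py"],
    ["analyze_clusteroperators.py"],
    ["analyze_nodes.py"],
    ["analyze_pods.py", "--problems-only"],
    ["analyze_network.py"],
    ["analyze_events.py", "--type", "Warning", "--count", "50"],
    ["analyze_etcd.py"],
    ["analyze_pvs.py"],
    ["analyze_prometheus.py"],
]


def select_scripts(request: str) -> list:
    """Return [script, *flags] entries to run for the user's request text."""
    text = request.lower()
    for script, keywords in COMPONENT_KEYWORDS:
        if any(keyword in text for keyword in keywords):
            return [[script]]  # component-specific: that script only
    return [list(entry) for entry in FULL_ANALYSIS]


def build_commands(scripts_dir: str, must_gather_path: str, request: str) -> list:
    """Build argv lists suitable for subprocess.run (argument order assumed)."""
    return [
        ["python3", os.path.join(scripts_dir, entry[0]), must_gather_path, *entry[1:]]
        for entry in select_scripts(request)
    ]
```

Pass only the component text to `select_scripts`, not the must-gather path, since a path could contain one of the keywords by accident.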
Return Value
The command outputs structured analysis results to stdout:
For Component-Specific Analysis:
- Script output for the requested component only
- Focused findings and recommendations
For Full Analysis:
- Organized sections for each component
- Executive summary of overall cluster health
- Prioritized list of critical issues
- Actionable recommendations
- Suggested log files to review
Output Structure
================================================================================
MUST-GATHER ANALYSIS SUMMARY
================================================================================
[Script outputs organized by component]
CLUSTER VERSION:
[output from analyze_clusterversion.py]
CLUSTER OPERATORS:
[output from analyze_clusteroperators.py]
NODES:
[output from analyze_nodes.py]
PROBLEMATIC PODS:
[output from analyze_pods.py --problems-only]
NETWORK STATUS:
[output from analyze_network.py]
WARNING EVENTS (Last 50):
[output from analyze_events.py --type Warning --count 50]
ETCD CLUSTER HEALTH:
[output from analyze_etcd.py]
STORAGE (PVs/PVCs):
[output from analyze_pvs.py]
MONITORING (Alerts):
[output from analyze_prometheus.py]
================================================================================
FINDINGS AND RECOMMENDATIONS
================================================================================
Critical Issues:
- [Critical problems requiring immediate attention]
Warnings:
- [Potential issues or degraded components]
Recommendations:
- [Specific next steps for investigation]
Logs to Review:
- [Specific log files to examine based on findings]
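As a rough illustration of how the layout above could be assembled from captured script output, consider the following sketch. The function `format_report` and its parameters are hypothetical, and the 80-character rule width is an assumption read off the template.

```python
RULE = "=" * 80  # separator line; width assumed from the template above


def format_report(component_outputs, critical, warnings, recommendations, logs):
    """Assemble the summary text.

    component_outputs: list of (section title, script output) pairs,
    already in display order. The other arguments are lists of strings.
    Hypothetical helper for illustration only.
    """
    lines = [RULE, "MUST-GATHER ANALYSIS SUMMARY", RULE]
    for title, output in component_outputs:
        lines.append(f"{title}:")
        lines.append(output.rstrip())
    lines += [RULE, "FINDINGS AND RECOMMENDATIONS", RULE]
    sections = (
        ("Critical Issues", critical),
        ("Warnings", warnings),
        ("Recommendations", recommendations),
        ("Logs to Review", logs),
    )
    for heading, items in sections:
        lines.append(f"{heading}:")
        lines.extend(f"- {item}" for item in items)
    return "\n".join(lines)
```

For a component-specific run, the same function can be called with a single entry in `component_outputs`.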
Examples
- Full cluster analysis:
/must-gather:analyze ./must-gather/registry-ci-openshift-org-origin-4-20-...-sha256-abc123/
Runs all analysis scripts and provides comprehensive cluster diagnostics.
- Analyze pod issues only:
/must-gather:analyze ./must-gather/registry-ci-openshift-org-origin-4-20-...-sha256-abc123/ analyze the pod statuses
Runs only analyze_pods.py to focus on pod-related issues.
- Check etcd health:
/must-gather:analyze check etcd health
Asks for must-gather path, then runs only analyze_etcd.py.
- Network troubleshooting:
/must-gather:analyze ./must-gather/registry-ci-openshift-org-origin-4-20-...-sha256-abc123/ show me network issues
Runs only analyze_network.py for network-specific analysis.
Notes
- Must-Gather Path: Always use the subdirectory containing cluster-scoped-resources/ and namespaces/, not the parent directory
- Script Dependencies: Analysis scripts must be executable and have required Python dependencies installed
- Error Handling: If scripts are not found or must-gather path is invalid, clear error messages are displayed
- Cross-Referencing: The analysis attempts to correlate issues across components (e.g., degraded operator → failing pods)
- Pattern Detection: Identifies patterns like multiple pod failures on the same node
- Actionable Output: Focuses on insights and recommendations rather than raw data dumps
- Priority: Issues are prioritized by severity (Critical > Warning > Info)
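The prioritization and pattern-detection notes can be made concrete with a short sketch. The `Finding` record, the function names, and the input shapes are all hypothetical; the document does not specify how findings are represented, only the severity order Critical > Warning > Info.

```python
from collections import Counter
from dataclasses import dataclass

# Severity order stated in the notes above.
SEVERITY_ORDER = {"Critical": 0, "Warning": 1, "Info": 2}


@dataclass
class Finding:
    """Hypothetical record for one issue extracted from script output."""
    severity: str
    component: str
    message: str


def prioritize(findings):
    """Sort findings by severity; unknown severities go last.

    sorted() is stable, so findings of equal severity keep their order.
    """
    return sorted(
        findings,
        key=lambda f: SEVERITY_ORDER.get(f.severity, len(SEVERITY_ORDER)),
    )


def nodes_with_repeated_failures(failed_pods, threshold=2):
    """Return {node: count} for nodes hosting at least `threshold` failed pods.

    failed_pods is an iterable of (pod name, node name) pairs.
    """
    counts = Counter(node for _pod, node in failed_pods)
    return {node: n for node, n in counts.items() if n >= threshold}
```

A node that appears in the second function's result is a candidate for cross-referencing with the node conditions reported by analyze_nodes.py.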
Arguments
- $1 (must-gather-path): Optional. Path to the must-gather directory (the subdirectory with the hash name). If not provided, the user will be asked.
- $2+ (component): Optional. If keywords for a specific component are detected, only that component's analysis script will run. Otherwise, all scripts run.
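Because both arguments are optional, a request such as "check etcd health" has no path in the first position. One possible way to tell the two cases apart is sketched below. The heuristic is an assumption, not documented behavior: the first token is treated as a path only if it starts with a common path prefix or names an existing directory. The function `split_arguments` and its `is_dir` parameter are hypothetical.

```python
import os


def split_arguments(raw, is_dir=os.path.isdir):
    """Split raw command arguments into (must_gather_path, component_text).

    Returns None for the path when the first token does not look like one,
    in which case the caller would ask the user for it. The is_dir parameter
    exists so the filesystem check can be replaced in tests.
    """
    text = raw.strip()
    parts = text.split(maxsplit=1)
    if not parts:
        return None, ""
    first = parts[0]
    rest = parts[1] if len(parts) > 1 else ""
    looks_like_path = first.startswith(("/", "./", "../", "~")) or is_dir(first)
    if looks_like_path:
        return first, rest
    # No recognizable path: the whole input is component text.
    return None, text
```

This sketch splits on whitespace, so a path containing spaces would need quoting support that is not shown here.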