Files
gh-openshift-eng-ai-helpers…/commands/analyze.md
2025-11-30 08:46:06 +08:00

9.1 KiB

description: Quick analysis of must-gather data - runs all analysis scripts and provides comprehensive cluster diagnostics argument-hint: [must-gather-path] [component]

Name

must-gather:analyze

Synopsis

/must-gather:analyze [must-gather-path] [component]

Description

The analyze command performs comprehensive analysis of OpenShift must-gather diagnostic data. It runs specialized Python analysis scripts to extract and summarize cluster health information across multiple components.

The command can analyze:

  • Cluster version and update status
  • Cluster operator health (degraded, progressing, unavailable)
  • Node conditions and resource status
  • Pod failures, restarts, and crash loops
  • Network configuration and OVN health
  • OVN databases - logical topology, ACLs, pods
  • Kubernetes events (warnings and errors)
  • etcd cluster health and quorum status
  • Persistent volume and claim status
  • Prometheus alerts

You can request analysis of the entire cluster or focus on a specific component.

Prerequisites

Required Directory Structure:

Must-gather data typically has this structure:

must-gather/
└── registry-ci-openshift-org-origin-...-sha256-<hash>/
    ├── cluster-scoped-resources/
    ├── namespaces/
    └── ...

The actual must-gather directory is the subdirectory with the hash name, not the parent directory.

Required Scripts:

Analysis scripts are bundled with this plugin at:

<plugin-root>/skills/must-gather-analyzer/scripts/
├── analyze_clusterversion.py
├── analyze_clusteroperators.py
├── analyze_nodes.py
├── analyze_pods.py
├── analyze_network.py
├── analyze_ovn_dbs.py
├── analyze_events.py
├── analyze_etcd.py
└── analyze_pvs.py

Where <plugin-root> is the directory where this plugin is installed (typically ~/.cursor/commands/ai-helpers/plugins/must-gather/ or similar).

Error Handling

CRITICAL: Script-Only Analysis

  • NEVER attempt to analyze must-gather data directly using bash commands, grep, or manual file reading
  • ONLY use the provided Python scripts in plugins/must-gather/skills/must-gather-analyzer/scripts/
  • If scripts are missing or not found:
    1. Stop immediately
    2. Inform the user that the analysis scripts are not available
    3. Ask the user to ensure the scripts are installed at the correct path
    4. Do NOT attempt alternative approaches

Script Availability Check:

Before running any analysis:

  1. Locate the scripts directory by searching for a known script:

    SCRIPT_PATH=$(find ~ -name "analyze_clusteroperators.py" -path "*/must-gather/skills/must-gather-analyzer/scripts/*" 2>/dev/null | head -1)
    
    if [ -z "$SCRIPT_PATH" ]; then
        echo "ERROR: Must-gather analysis scripts not found."
        echo "Please ensure the must-gather plugin from ai-helpers is properly installed."
        exit 1
    fi
    
    # All scripts are in the same directory, so just get the directory
    SCRIPTS_DIR=$(dirname "$SCRIPT_PATH")
    
  2. If scripts cannot be found, STOP and report to the user:

    The must-gather analysis scripts could not be located. Please ensure the must-gather plugin from openshift-eng/ai-helpers is properly installed in your Claude Code plugins directory.
    

Implementation

The command performs the following steps:

  1. Validate Must-Gather Path:

    • If path not provided as argument, ask the user
    • Check if path contains cluster-scoped-resources/ and namespaces/ directories
    • If user provides root directory, automatically find the correct subdirectory
    • Verify the path exists and is readable
  2. Determine Analysis Scope:

    STEP 1: Check for SPECIFIC component keywords

    If the user mentions a specific component, run ONLY that script:

    • "pods", "pod status", "containers", "crashloop", "failing pods" → analyze_pods.py ONLY
    • "etcd", "etcd health", "quorum" → analyze_etcd.py ONLY
    • "network", "networking", "ovn", "connectivity" → analyze_network.py ONLY
    • "ovn databases", "ovn-dbs", "ovn db", "logical switches", "acls" → analyze_ovn_dbs.py ONLY
    • "nodes", "node status", "node conditions" → analyze_nodes.py ONLY
    • "operators", "cluster operators", "degraded" → analyze_clusteroperators.py ONLY
    • "version", "cluster version", "update", "upgrade" → analyze_clusterversion.py ONLY
    • "events", "warnings", "errors" → analyze_events.py ONLY
    • "storage", "pv", "pvc", "volumes", "persistent" → analyze_pvs.py ONLY
    • "alerts", "prometheus", "monitoring" → analyze_prometheus.py ONLY

    STEP 2: No specific component mentioned

    If generic request like "analyze must-gather", "/must-gather:analyze", or "check the cluster", run ALL scripts in this order:

    1. ClusterVersion (analyze_clusterversion.py)
    2. Cluster Operators (analyze_clusteroperators.py)
    3. Nodes (analyze_nodes.py)
    4. Pods - problems only (analyze_pods.py --problems-only)
    5. Network (analyze_network.py)
    6. Events - warnings only (analyze_events.py --type Warning --count 50)
    7. etcd (analyze_etcd.py)
    8. Storage (analyze_pvs.py)
    9. Monitoring (analyze_prometheus.py)
  3. Locate Plugin Scripts:

    • Use the script availability check from the Error Handling section to find the plugin root
    • Store the scripts directory path in $SCRIPTS_DIR
  4. Execute Analysis Scripts:

    python3 "$SCRIPTS_DIR/<script>.py" <must-gather-path>
    

    Example:

    python3 "$SCRIPTS_DIR/analyze_clusteroperators.py" ./must-gather.local.123/quay-io-...
    
  5. Synthesize Results: Generate findings and recommendations based on script output

Return Value

The command outputs structured analysis results to stdout:

For Component-Specific Analysis:

  • Script output for the requested component only
  • Focused findings and recommendations

For Full Analysis:

  • Organized sections for each component
  • Executive summary of overall cluster health
  • Prioritized list of critical issues
  • Actionable recommendations
  • Suggested log files to review

Output Structure

================================================================================
MUST-GATHER ANALYSIS SUMMARY
================================================================================

[Script outputs organized by component]

CLUSTER VERSION:
[output from analyze_clusterversion.py]

CLUSTER OPERATORS:
[output from analyze_clusteroperators.py]

NODES:
[output from analyze_nodes.py]

PROBLEMATIC PODS:
[output from analyze_pods.py --problems-only]

NETWORK STATUS:
[output from analyze_network.py]

WARNING EVENTS (Last 50):
[output from analyze_events.py --type Warning --count 50]

ETCD CLUSTER HEALTH:
[output from analyze_etcd.py]

STORAGE (PVs/PVCs):
[output from analyze_pvs.py]

MONITORING (Alerts):
[output from analyze_prometheus.py]

================================================================================
FINDINGS AND RECOMMENDATIONS
================================================================================

Critical Issues:
- [Critical problems requiring immediate attention]

Warnings:
- [Potential issues or degraded components]

Recommendations:
- [Specific next steps for investigation]

Logs to Review:
- [Specific log files to examine based on findings]

Examples

  1. Full cluster analysis:

    /must-gather:analyze ./must-gather/registry-ci-openshift-org-origin-4-20-...-sha256-abc123/
    

    Runs all analysis scripts and provides comprehensive cluster diagnostics.

  2. Analyze pod issues only:

    /must-gather:analyze ./must-gather/registry-ci-openshift-org-origin-4-20-...-sha256-abc123/ analyze the pod statuses
    

    Runs only analyze_pods.py to focus on pod-related issues.

  3. Check etcd health:

    /must-gather:analyze check etcd health
    

    Asks for must-gather path, then runs only analyze_etcd.py.

  4. Network troubleshooting:

    /must-gather:analyze ./must-gather/registry-ci-openshift-org-origin-4-20-...-sha256-abc123/ show me network issues
    

    Runs only analyze_network.py for network-specific analysis.

Notes

  • Must-Gather Path: Always use the subdirectory containing cluster-scoped-resources/ and namespaces/, not the parent directory
  • Script Dependencies: Analysis scripts must be executable and have required Python dependencies installed
  • Error Handling: If scripts are not found or must-gather path is invalid, clear error messages are displayed
  • Cross-Referencing: The analysis attempts to correlate issues across components (e.g., degraded operator → failing pods)
  • Pattern Detection: Identifies patterns like multiple pod failures on the same node
  • Actionable Output: Focuses on insights and recommendations rather than raw data dumps
  • Priority: Issues are prioritized by severity (Critical > Warning > Info)

Arguments

  • $1 (must-gather-path): Optional. Path to the must-gather directory (the subdirectory with the hash name). If not provided, the user will be asked.
  • $2+ (component): Optional. If keywords for a specific component are detected, only that component's analysis script will run. Otherwise, all scripts run.