Initial commit
14
.claude-plugin/plugin.json
Normal file
@@ -0,0 +1,14 @@
{
  "name": "must-gather",
  "description": "A plugin to analyze and report on must-gather data",
  "version": "0.0.1",
  "author": {
    "name": "openshift"
  },
  "skills": [
    "./skills"
  ],
  "commands": [
    "./commands"
  ]
}
3
README.md
Normal file
@@ -0,0 +1,3 @@
# must-gather

A plugin to analyze and report on must-gather data
262
commands/analyze.md
Normal file
@@ -0,0 +1,262 @@
---
description: Quick analysis of must-gather data - runs all analysis scripts and provides comprehensive cluster diagnostics
argument-hint: [must-gather-path] [component]
---

## Name
must-gather:analyze

## Synopsis
```
/must-gather:analyze [must-gather-path] [component]
```

## Description

The `analyze` command performs comprehensive analysis of OpenShift must-gather diagnostic data. It runs specialized Python analysis scripts to extract and summarize cluster health information across multiple components.

The command can analyze:
- Cluster version and update status
- Cluster operator health (degraded, progressing, unavailable)
- Node conditions and resource status
- Pod failures, restarts, and crash loops
- Network configuration and OVN health
- OVN databases - logical topology, ACLs, pods
- Kubernetes events (warnings and errors)
- etcd cluster health and quorum status
- Persistent volume and claim status
- Prometheus alerts

You can request analysis of the entire cluster or focus on a specific component.

## Prerequisites

**Required Directory Structure:**

Must-gather data typically has this structure:
```
must-gather/
└── registry-ci-openshift-org-origin-...-sha256-<hash>/
    ├── cluster-scoped-resources/
    ├── namespaces/
    └── ...
```

The actual must-gather directory is the subdirectory with the hash name, not the parent directory.
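
For illustration, resolving the hashed subdirectory can be automated. The sketch below is a hypothetical helper (not one of the bundled scripts); it assumes, per the layout above, that the data directory is the child containing `cluster-scoped-resources/` and `namespaces/`:

```python
from pathlib import Path

def resolve_must_gather_root(path: str) -> Path:
    """Return the directory that actually holds the gathered data.

    Accepts either the hashed subdirectory itself or its parent.
    """
    base = Path(path)
    markers = ("cluster-scoped-resources", "namespaces")
    if all((base / m).is_dir() for m in markers):
        return base  # already the hashed subdirectory
    for child in sorted(base.iterdir()):
        if child.is_dir() and all((child / m).is_dir() for m in markers):
            return child  # parent directory was given; descend one level
    raise FileNotFoundError(f"No must-gather data found under {base}")
```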

**Required Scripts:**

Analysis scripts are bundled with this plugin at:
```
<plugin-root>/skills/must-gather-analyzer/scripts/
├── analyze_clusterversion.py
├── analyze_clusteroperators.py
├── analyze_nodes.py
├── analyze_pods.py
├── analyze_network.py
├── analyze_ovn_dbs.py
├── analyze_events.py
├── analyze_etcd.py
├── analyze_prometheus.py
└── analyze_pvs.py
```

Where `<plugin-root>` is the directory where this plugin is installed (typically `~/.cursor/commands/ai-helpers/plugins/must-gather/` or similar).

## Error Handling

**CRITICAL: Script-Only Analysis**

- **NEVER** attempt to analyze must-gather data directly using bash commands, grep, or manual file reading
- **ONLY** use the provided Python scripts in `plugins/must-gather/skills/must-gather-analyzer/scripts/`
- If scripts are missing or not found:
  1. Stop immediately
  2. Inform the user that the analysis scripts are not available
  3. Ask the user to ensure the scripts are installed at the correct path
  4. Do NOT attempt alternative approaches

**Script Availability Check:**

Before running any analysis:

1. Locate the scripts directory by searching for a known script:

   ```bash
   SCRIPT_PATH=$(find ~ -name "analyze_clusteroperators.py" -path "*/must-gather/skills/must-gather-analyzer/scripts/*" 2>/dev/null | head -1)

   if [ -z "$SCRIPT_PATH" ]; then
     echo "ERROR: Must-gather analysis scripts not found."
     echo "Please ensure the must-gather plugin from ai-helpers is properly installed."
     exit 1
   fi

   # All scripts are in the same directory, so just get the directory
   SCRIPTS_DIR=$(dirname "$SCRIPT_PATH")
   ```

2. If scripts cannot be found, STOP and report to the user:

   ```
   The must-gather analysis scripts could not be located. Please ensure the must-gather plugin from openshift-eng/ai-helpers is properly installed in your Claude Code plugins directory.
   ```

## Implementation

The command performs the following steps:

1. **Validate Must-Gather Path**:
   - If the path is not provided as an argument, ask the user
   - Check that the path contains `cluster-scoped-resources/` and `namespaces/` directories
   - If the user provides the root directory, automatically find the correct subdirectory
   - Verify the path exists and is readable

2. **Determine Analysis Scope**:

   **STEP 1: Check for SPECIFIC component keywords**

   If the user mentions a specific component, run ONLY that script (a sketch of this dispatch appears after this list):
   - "pods", "pod status", "containers", "crashloop", "failing pods" → `analyze_pods.py` ONLY
   - "etcd", "etcd health", "quorum" → `analyze_etcd.py` ONLY
   - "network", "networking", "ovn", "connectivity" → `analyze_network.py` ONLY
   - "ovn databases", "ovn-dbs", "ovn db", "logical switches", "acls" → `analyze_ovn_dbs.py` ONLY
   - "nodes", "node status", "node conditions" → `analyze_nodes.py` ONLY
   - "operators", "cluster operators", "degraded" → `analyze_clusteroperators.py` ONLY
   - "version", "cluster version", "update", "upgrade" → `analyze_clusterversion.py` ONLY
   - "events", "warnings", "errors" → `analyze_events.py` ONLY
   - "storage", "pv", "pvc", "volumes", "persistent" → `analyze_pvs.py` ONLY
   - "alerts", "prometheus", "monitoring" → `analyze_prometheus.py` ONLY

   **STEP 2: No specific component mentioned**

   If the request is generic, like "analyze must-gather", "/must-gather:analyze", or "check the cluster", run ALL scripts in this order:
   1. ClusterVersion (`analyze_clusterversion.py`)
   2. Cluster Operators (`analyze_clusteroperators.py`)
   3. Nodes (`analyze_nodes.py`)
   4. Pods - problems only (`analyze_pods.py --problems-only`)
   5. Network (`analyze_network.py`)
   6. Events - warnings only (`analyze_events.py --type Warning --count 50`)
   7. etcd (`analyze_etcd.py`)
   8. Storage (`analyze_pvs.py`)
   9. Monitoring (`analyze_prometheus.py`)

3. **Locate Plugin Scripts**:
   - Use the script availability check from the Error Handling section to find the plugin root
   - Store the scripts directory path in `$SCRIPTS_DIR`

4. **Execute Analysis Scripts**:

   ```bash
   python3 "$SCRIPTS_DIR/<script>.py" <must-gather-path>
   ```

   Example:

   ```bash
   python3 "$SCRIPTS_DIR/analyze_clusteroperators.py" ./must-gather.local.123/quay-io-...
   ```

5. **Synthesize Results**: Generate findings and recommendations based on script output
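
For illustration, the scope selection in step 2 can be expressed as a keyword dispatch table. This is a hypothetical sketch of the logic described above, not shipped code; the names `KEYWORD_TO_SCRIPT`, `FULL_RUN_ORDER`, and `pick_scripts` are inventions for this example:

```python
# Full-run order from STEP 2 above.
FULL_RUN_ORDER = [
    "analyze_clusterversion.py", "analyze_clusteroperators.py",
    "analyze_nodes.py", "analyze_pods.py", "analyze_network.py",
    "analyze_events.py", "analyze_etcd.py", "analyze_pvs.py",
    "analyze_prometheus.py",
]

# STEP 1 mapping; the multi-word OVN-database phrases come first so that
# "ovn db" is not swallowed by the bare "ovn" network keyword.
KEYWORD_TO_SCRIPT = [
    (("ovn databases", "ovn-dbs", "ovn db", "logical switches", "acls"), "analyze_ovn_dbs.py"),
    (("pods", "pod status", "containers", "crashloop", "failing pods"), "analyze_pods.py"),
    (("etcd", "quorum"), "analyze_etcd.py"),
    (("network", "networking", "ovn", "connectivity"), "analyze_network.py"),
    (("nodes", "node status", "node conditions"), "analyze_nodes.py"),
    (("operators", "cluster operators", "degraded"), "analyze_clusteroperators.py"),
    (("version", "update", "upgrade"), "analyze_clusterversion.py"),
    (("events", "warnings", "errors"), "analyze_events.py"),
    (("storage", "pv", "pvc", "volumes", "persistent"), "analyze_pvs.py"),
    (("alerts", "prometheus", "monitoring"), "analyze_prometheus.py"),
]

def pick_scripts(request: str) -> list:
    """Return the analysis scripts to run for a user request."""
    text = request.lower()
    for keywords, script in KEYWORD_TO_SCRIPT:
        if any(kw in text for kw in keywords):
            return [script]    # STEP 1: specific component, run ONLY that script
    return FULL_RUN_ORDER      # STEP 2: generic request, run everything in order
```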

## Return Value

The command outputs structured analysis results to stdout:

**For Component-Specific Analysis:**
- Script output for the requested component only
- Focused findings and recommendations

**For Full Analysis:**
- Organized sections for each component
- Executive summary of overall cluster health
- Prioritized list of critical issues
- Actionable recommendations
- Suggested log files to review

## Output Structure

```
================================================================================
MUST-GATHER ANALYSIS SUMMARY
================================================================================

[Script outputs organized by component]

CLUSTER VERSION:
[output from analyze_clusterversion.py]

CLUSTER OPERATORS:
[output from analyze_clusteroperators.py]

NODES:
[output from analyze_nodes.py]

PROBLEMATIC PODS:
[output from analyze_pods.py --problems-only]

NETWORK STATUS:
[output from analyze_network.py]

WARNING EVENTS (Last 50):
[output from analyze_events.py --type Warning --count 50]

ETCD CLUSTER HEALTH:
[output from analyze_etcd.py]

STORAGE (PVs/PVCs):
[output from analyze_pvs.py]

MONITORING (Alerts):
[output from analyze_prometheus.py]

================================================================================
FINDINGS AND RECOMMENDATIONS
================================================================================

Critical Issues:
- [Critical problems requiring immediate attention]

Warnings:
- [Potential issues or degraded components]

Recommendations:
- [Specific next steps for investigation]

Logs to Review:
- [Specific log files to examine based on findings]
```

## Examples

1. **Full cluster analysis**:

   ```
   /must-gather:analyze ./must-gather/registry-ci-openshift-org-origin-4-20-...-sha256-abc123/
   ```

   Runs all analysis scripts and provides comprehensive cluster diagnostics.

2. **Analyze pod issues only**:

   ```
   /must-gather:analyze ./must-gather/registry-ci-openshift-org-origin-4-20-...-sha256-abc123/ analyze the pod statuses
   ```

   Runs only `analyze_pods.py` to focus on pod-related issues.

3. **Check etcd health**:

   ```
   /must-gather:analyze check etcd health
   ```

   Asks for the must-gather path, then runs only `analyze_etcd.py`.

4. **Network troubleshooting**:

   ```
   /must-gather:analyze ./must-gather/registry-ci-openshift-org-origin-4-20-...-sha256-abc123/ show me network issues
   ```

   Runs only `analyze_network.py` for network-specific analysis.

## Notes

- **Must-Gather Path**: Always use the subdirectory containing `cluster-scoped-resources/` and `namespaces/`, not the parent directory
- **Script Dependencies**: Analysis scripts must be executable and have the required Python dependencies installed
- **Error Handling**: If scripts are not found or the must-gather path is invalid, clear error messages are displayed
- **Cross-Referencing**: The analysis attempts to correlate issues across components (e.g., degraded operator → failing pods)
- **Pattern Detection**: Identifies patterns like multiple pod failures on the same node (see the sketch after this list)
- **Actionable Output**: Focuses on insights and recommendations rather than raw data dumps
- **Priority**: Issues are prioritized by severity (Critical > Warning > Info)
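
As an illustration of that pattern detection, clustered failures can be surfaced by grouping failing pods by node. A minimal hypothetical sketch, assuming `(pod, node)` pairs taken from `analyze_pods.py` output and an illustrative threshold:

```python
from collections import Counter

def nodes_with_clustered_failures(failing_pods, threshold=3):
    """Return nodes hosting at least `threshold` failing pods."""
    counts = Counter(node for _pod, node in failing_pods if node)
    return [node for node, n in counts.items() if n >= threshold]
```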

## Arguments

- **$1** (must-gather-path): Optional. Path to the must-gather directory (the subdirectory with the hash name). If not provided, the user will be asked.
- **$2+** (component): Optional. If keywords for a specific component are detected, only that component's analysis script will run. Otherwise, all scripts run.
266
commands/ovn-dbs.md
Normal file
@@ -0,0 +1,266 @@
---
description: Analyze OVN databases from a must-gather using ovsdb-tool
argument-hint: [must-gather-path]
---

## Name
must-gather:ovn-dbs

## Synopsis
```
/must-gather:ovn-dbs [must-gather-path] [--node <node-name>] [--query <json>]
```

## Description

The `ovn-dbs` command analyzes OVN Northbound and Southbound databases collected from OVN-Kubernetes clusters. It uses `ovsdb-tool` to query the binary database files (`.db`) collected per node, providing detailed information about the logical network topology, pods, ACLs, and routers on each node.

The command automatically maps ovnkube pods to their corresponding nodes by reading pod specifications from the must-gather data.
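
As a concrete illustration of that mapping, the sketch below reads ovnkube-node pod specs from the must-gather layout (`namespaces/<ns>/pods/<pod>/<pod>.yaml`) and pairs each pod with its `spec.nodeName`. It is a hypothetical helper, not part of the plugin, and the glob pattern is an assumption:

```python
import yaml
from pathlib import Path

def map_ovnkube_pods_to_nodes(must_gather: Path) -> dict:
    """Return {ovnkube-node pod name: node name} from gathered pod specs."""
    mapping = {}
    pods_dir = must_gather / "namespaces" / "openshift-ovn-kubernetes" / "pods"
    for pod_yaml in pods_dir.glob("ovnkube-node-*/ovnkube-node-*.yaml"):
        doc = yaml.safe_load(pod_yaml.read_text()) or {}
        name = doc.get("metadata", {}).get("name", pod_yaml.stem)
        mapping[name] = doc.get("spec", {}).get("nodeName", "")
    return mapping
```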

**Two modes of operation:**
1. **Standard Analysis** (default): Runs pre-built analysis showing switches, ports, ACLs, and routers
2. **Query Mode** (`--query`): Runs custom OVSDB JSON queries for specific data extraction

**What it analyzes:**
- **Per-zone logical network topology**
- **Logical Switches** and their ports
- **Pod Logical Switch Ports** with namespace, pod name, and IP addresses
- **Access Control Lists (ACLs)** with priorities, directions, and match rules
- **Logical Routers** and their ports

**Important:** This command only works with must-gathers from interconnect (IC) clusters, where each node/zone has its own database files.

## Prerequisites

The must-gather should contain:
```
network_logs/
└── ovnk_database_store.tar.gz
```

**Required Tools:**

- `ovsdb-tool` must be installed (from the openvswitch package)
- Check with: `which ovsdb-tool`
- Install: `sudo dnf install openvswitch` or `sudo apt install openvswitch-common`

**Analysis Script:**

The script is bundled with this plugin:
```
<plugin-root>/skills/must-gather-analyzer/scripts/analyze_ovn_dbs.py
```

Where `<plugin-root>` is the directory where this plugin is installed (typically `~/.cursor/commands/ai-helpers/plugins/must-gather/` or similar).

Claude will automatically locate it by searching for the script in the plugin installation directory, regardless of your current working directory.

## Implementation

The command performs the following steps:

1. **Locate Analysis Script**:

   ```bash
   SCRIPT_PATH=$(find ~ -name "analyze_ovn_dbs.py" -path "*/must-gather/skills/must-gather-analyzer/scripts/*" 2>/dev/null | head -1)

   if [ -z "$SCRIPT_PATH" ]; then
     echo "ERROR: analyze_ovn_dbs.py script not found."
     echo "Please ensure the must-gather plugin from ai-helpers is properly installed."
     exit 1
   fi

   SCRIPTS_DIR=$(dirname "$SCRIPT_PATH")
   ```

2. **Extract Database Tarball**:
   - Locate `network_logs/ovnk_database_store.tar.gz`
   - Extract if not already extracted
   - Find all `*_nbdb` and `*_sbdb` files

3. **Query Each Zone's Database**:
   For each zone (node), query the Northbound database using `ovsdb-tool query`:

   ```bash
   ovsdb-tool query <zone>_nbdb '["OVN_Northbound", {"op":"select", "table":"<table>", "where":[], "columns":[...]}]'
   ```

4. **Analyze and Display**:
   - **Logical Switches**: Names and port counts
   - **Logical Switch Ports**: Filter for pods (external_ids.pod=true); show namespace, pod name, and IP
   - **ACLs**: Priority, direction, match rules, and actions
   - **Logical Routers**: Names and port counts

5. **Present Zone Summary**:
   - Total counts per zone
   - Detailed breakdowns
   - Sorted and formatted output
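
A minimal sketch of steps 2 and 3, assuming `ovsdb-tool` is on `PATH` and that the extracted per-zone files end in `_nbdb` as described above. This is illustrative, not the bundled `analyze_ovn_dbs.py`:

```python
import json
import subprocess
import tarfile
from pathlib import Path

def extract_nb_databases(must_gather: Path) -> list:
    """Extract the database tarball (if needed) and return the NB db files."""
    tarball = must_gather / "network_logs" / "ovnk_database_store.tar.gz"
    dest = tarball.parent / "ovnk_database_store"
    if not dest.exists():
        with tarfile.open(tarball) as tf:
            tf.extractall(dest)
    return sorted(dest.rglob("*_nbdb"))

def query_nbdb(db_file: Path, table: str, columns: list) -> list:
    """Run one OVSDB select against a standalone database file."""
    txn = json.dumps(["OVN_Northbound",
                      {"op": "select", "table": table, "where": [], "columns": columns}])
    result = subprocess.run(["ovsdb-tool", "query", str(db_file), txn],
                            capture_output=True, text=True, check=True)
    # ovsdb-tool prints a JSON array with one result object per operation.
    return json.loads(result.stdout)[0]["rows"]

# Example: count logical switches per zone.
# for db in extract_nb_databases(Path("./must-gather/...")):
#     print(db.name, len(query_nbdb(db, "Logical_Switch", ["name"])))
```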

## Return Value

The command outputs structured analysis for each node:

```
Found 6 node(s)

================================================================================
Node: ip-10-0-26-145.us-east-2.compute.internal
Pod: ovnkube-node-79cbh
================================================================================
Logical Switches: 4
Logical Switch Ports: 55
ACLs: 7
Logical Routers: 2

LOGICAL SWITCHES (4):
NAME                                                      PORTS
--------------------------------------------------------------------------------
transit_switch                                            6
ip-10-0-1-10.us-east-2.compute.internal                   7
ext_ip-10-0-1-10.us-east-2.compute.internal               2
join                                                      2

POD LOGICAL SWITCH PORTS (5):
NAMESPACE                      POD                                      IP
------------------------------------------------------------------------------------------------------------------------
openshift-dns                  dns-default-abc123                       10.128.0.5
openshift-monitoring           prometheus-k8s-0                         10.128.0.10
openshift-etcd                 etcd-master-0                            10.128.0.3
...

ACCESS CONTROL LISTS (7):
PRIORITY   DIRECTION    ACTION          MATCH
------------------------------------------------------------------------------------------------------------------------
1012       from-lport   allow           inport == @a4743249366342378346 && (ip4.mcast ...
1011       to-lport     drop            (ip4.mcast || mldv1 || mldv2 || ...
1001       to-lport     allow-related   ip4.src==10.128.0.2
...

LOGICAL ROUTERS (2):
NAME                                                      PORTS
--------------------------------------------------------------------------------
ovn_cluster_router                                        3
GR_ip-10-0-1-10.us-east-2.compute.internal                2
```

## Examples

1. **Analyze all nodes in a must-gather**:

   ```
   /must-gather:ovn-dbs ./must-gather/registry-ci-openshift-org-origin-4-20-...-sha256-abc123/
   ```

   Shows the logical network topology for all nodes.

2. **Analyze a specific node**:

   ```
   /must-gather:ovn-dbs ./must-gather/.../ --node ip-10-0-26-145
   ```

   Shows OVN database information only for the specified node (supports partial name matching).

3. **Analyze a master node**:

   ```
   /must-gather:ovn-dbs ./must-gather/.../ --node master-0
   ```

   Filters to a specific master node using partial name matching.

4. **Interactive usage without a path**:

   ```
   /must-gather:ovn-dbs
   ```

   The command will ask for the must-gather path.

5. **Check if a pod exists in OVN**:

   ```
   /must-gather:ovn-dbs ./must-gather/.../
   ```

   Then search the output for the pod name to see which node it is on and its IP allocation.

6. **Investigate ACL rules on a specific node**:

   ```
   /must-gather:ovn-dbs ./must-gather/.../ --node worker-1
   ```

   Review the ACL section for a specific node to understand traffic filtering rules.

7. **Run a custom OVSDB query** (Query Mode):

   ```
   /must-gather:ovn-dbs ./must-gather/.../ --query '["OVN_Northbound", {"op":"select", "table":"ACL", "where":[["priority", ">", 1000]], "columns":["priority","match","action"]}]'
   ```

   Queries ACLs with priority > 1000 across all nodes. Claude can construct the JSON query for any OVSDB table.

8. **Query a specific node with a custom query**:

   ```
   /must-gather:ovn-dbs ./must-gather/.../ --node master-0 --query '["OVN_Northbound", {"op":"select", "table":"Logical_Switch", "where":[], "columns":["name","ports"]}]'
   ```

   Lists all logical switches with their ports on master-0.

9. **Query a specific table** (Claude constructs the JSON):

   Just ask Claude to query a specific OVSDB table and it will construct the appropriate JSON query. For example:
   - "Show all Logical_Router_Static_Route entries"
   - "Find ACLs with action 'drop'"
   - "List Logical_Switch_Port entries where external_ids contains 'openshift-etcd'"

## Error Handling

**Missing ovsdb-tool:**

```
Error: ovsdb-tool not found. Please install openvswitch package.
```

Solution: Install openvswitch: `sudo dnf install openvswitch`

**Missing database tarball:**

```
Error: Database tarball not found: network_logs/ovnk_database_store.tar.gz
```

Solution: Ensure this is a must-gather from an OVN cluster.

**Node not found:**

```
Error: No databases found for node matching 'master-5'

Available nodes:
- ip-10-0-77-117.us-east-2.compute.internal
- ip-10-0-26-145.us-east-2.compute.internal
- ip-10-0-1-194.us-east-2.compute.internal
```

Solution: Use one of the listed node names or a partial match.

## Notes

- **Binary Database Format**: Uses `ovsdb-tool` to read OVSDB binary files directly
- **Per-Node Analysis**: Each node in IC mode has its own database (one NB and one SB per zone)
- **Node Mapping**: Automatically correlates ovnkube pods to nodes by reading pod specs from the must-gather
- **Pod Discovery**: Pods are identified by `external_ids` with `pod=true`
- **IP Extraction**: Pod IPs are parsed from the `addresses` field (format: "MAC IP"); see the sketch after this list
- **ACL Priorities**: Higher-priority ACLs are processed first (shown at top)
- **Node Filtering**: Supports partial name matching for convenience (e.g., "--node master" matches all masters)
- **Query Mode**: Accepts raw OVSDB JSON queries in the format `["OVN_Northbound", {"op":"select", "table":"...", ...}]`
- **Claude Query Construction**: Claude can automatically construct OVSDB JSON queries from natural-language requests
- **Performance**: Querying large databases may take a few seconds per node
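
To make the pod-discovery and IP-extraction conventions above concrete, here is a hypothetical parsing helper. The OVSDB JSON encoding of maps (`["map", [[key, value], ...]]`) and sets, and the `namespace` key in `external_ids`, are assumptions based on the conventions listed above:

```python
def pod_port_info(row: dict):
    """Return (namespace, name, ip) for a pod Logical_Switch_Port row, else None."""
    ext = row.get("external_ids", ["map", []])
    ext_map = dict(ext[1]) if isinstance(ext, list) else {}
    if ext_map.get("pod") != "true":
        return None  # not a pod port (router, transit, ...)
    addresses = row.get("addresses", "")
    if isinstance(addresses, list) and addresses and addresses[0] == "set":
        addresses = addresses[1][0] if addresses[1] else ""
    parts = str(addresses).split()  # format: "MAC IP"
    ip = parts[1] if len(parts) > 1 else ""
    return ext_map.get("namespace", ""), row.get("name", ""), ip
```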

## Use Cases

1. **Verify Pod Network Configuration**:
   - Check if pods are registered in OVN
   - Verify IP address assignments
   - Confirm logical switch port creation

2. **Troubleshoot Connectivity Issues**:
   - Review ACL rules blocking traffic
   - Check if pods are in the correct logical switches
   - Verify router configurations

3. **Understand Topology**:
   - See how zones are interconnected via transit_switch
   - Review gateway router configurations
   - Understand the logical network structure

4. **Audit Network Policies**:
   - See ACL rules generated from NetworkPolicies
   - Identify overly permissive or restrictive rules
   - Check rule priorities and match conditions

## Arguments

- **$1** (must-gather-path): Optional. Path to the must-gather directory containing `network_logs/`. If not provided, the user will be prompted.
- **--node, -n** (node-name): Optional. Filter analysis to a specific node. Supports partial name matching (e.g., "master-0", "ip-10-0-26-145"). If no match is found, displays the list of available nodes.
- **--query, -q** (json-query): Optional. Run a raw OVSDB JSON query instead of the standard analysis. Claude can construct the JSON query based on the OVSDB transaction format. When provided, outputs raw JSON results instead of formatted analysis.
93
plugin.lock.json
Normal file
@@ -0,0 +1,93 @@
{
  "$schema": "internal://schemas/plugin.lock.v1.json",
  "pluginId": "gh:openshift-eng/ai-helpers:plugins/must-gather",
  "normalized": {
    "repo": null,
    "ref": "refs/tags/v20251128.0",
    "commit": "5e7aea9c51347184db2f2a1db1029335d5e6b4b6",
    "treeHash": "6642a98be3c0bebc6a688aeb737c645245616a0bd5a2c1612600b9a431d03716",
    "generatedAt": "2025-11-28T10:27:30.749372Z",
    "toolVersion": "publish_plugins.py@0.2.0"
  },
  "origin": {
    "remote": "git@github.com:zhongweili/42plugin-data.git",
    "branch": "master",
    "commit": "aa1497ed0949fd50e99e70d6324a29c5b34f9390",
    "repoRoot": "/Users/zhongweili/projects/openmind/42plugin-data"
  },
  "manifest": {
    "name": "must-gather",
    "description": "A plugin to analyze and report on must-gather data",
    "version": "0.0.1"
  },
  "content": {
    "files": [
      {
        "path": "README.md",
        "sha256": "62224e0cac5af83035e53ea011e387f43c066eef4c2dd942ec536612c6d5a53c"
      },
      {
        "path": ".claude-plugin/plugin.json",
        "sha256": "9efe7a9dce84b69f125de43024e8760070e603b79d161e1662ea485eae8d02f9"
      },
      {
        "path": "commands/analyze.md",
        "sha256": "d4aa3d91663e16df1ff5176f992e62b99d9b7a0941fed5b8b24587ebcca23a7c"
      },
      {
        "path": "commands/ovn-dbs.md",
        "sha256": "768eb9b9489dae9c511f9d25b3472cbd878d44d06dfa73f29872c5e62c7e3aeb"
      },
      {
        "path": "skills/must-gather-analyzer/SKILL.md",
        "sha256": "7d269e9a90a45600af26df99247f11152b69e9fb3eb09813f2e830dcecd21503"
      },
      {
        "path": "skills/must-gather-analyzer/scripts/analyze_prometheus.py",
        "sha256": "5fd3f1580bf58cc8a0d967d8a4b08654f85c19ee2f3c3202e6e8e6db2469d56b"
      },
      {
        "path": "skills/must-gather-analyzer/scripts/analyze_pvs.py",
        "sha256": "d05ef93f72853eb9c43b19a53d0d1765a9620fef08643e5ce2ce86559dd7f89f"
      },
      {
        "path": "skills/must-gather-analyzer/scripts/analyze_pods.py",
        "sha256": "dc23a7d5822a5f572a2373fc30de6609f701a270bc6f1d0837efcfbada88d00e"
      },
      {
        "path": "skills/must-gather-analyzer/scripts/analyze_network.py",
        "sha256": "6316796bfd2f4bc463fa4681b7b81198dcb9fba513bcec7b31f621314019470f"
      },
      {
        "path": "skills/must-gather-analyzer/scripts/analyze_events.py",
        "sha256": "ff02feac6053c13a3e3a6c4e83c1cb5fc1e5818ada860f351c5dccb7d54724c6"
      },
      {
        "path": "skills/must-gather-analyzer/scripts/analyze_ovn_dbs.py",
        "sha256": "be537b03e1fee5e48ce936b544387e81aa6089a07893fa5ab329cc347980dde9"
      },
      {
        "path": "skills/must-gather-analyzer/scripts/analyze_clusterversion.py",
        "sha256": "f1110d963ba18942861c25a7234d9a01cc0798c4ef76534a5631de857d96805b"
      },
      {
        "path": "skills/must-gather-analyzer/scripts/analyze_nodes.py",
        "sha256": "90c8e2a9fa59b60946d5789cdaa2c602c5089227b4ec7f202c690b21b738d7c4"
      },
      {
        "path": "skills/must-gather-analyzer/scripts/analyze_clusteroperators.py",
        "sha256": "51915d87c97af925f39fbccd16964bde4aafed62c28830882bb544f3ef8d8533"
      },
      {
        "path": "skills/must-gather-analyzer/scripts/analyze_etcd.py",
        "sha256": "03c6ef97197427acbce2c503d4644a113c01aef5e033ddd8ef464bba22e57962"
      }
    ],
    "dirSha256": "6642a98be3c0bebc6a688aeb737c645245616a0bd5a2c1612600b9a431d03716"
  },
  "security": {
    "scannedAt": null,
    "scannerVersion": null,
    "flags": []
  }
}
285
skills/must-gather-analyzer/SKILL.md
Normal file
@@ -0,0 +1,285 @@
---
name: Must-Gather Analyzer
description: |
  Analyze OpenShift must-gather diagnostic data including cluster operators, pods, nodes,
  and network components. Use this skill when the user asks about cluster health, operator status,
  pod issues, node conditions, or wants diagnostic insights from must-gather data.

  Triggers: "analyze must-gather", "check cluster health", "operator status", "pod issues",
  "node status", "failing pods", "degraded operators", "cluster problems", "crashlooping",
  "network issues", "etcd health", "analyze clusteroperators", "analyze pods", "analyze nodes"
---

# Must-Gather Analyzer Skill

Comprehensive analysis of OpenShift must-gather diagnostic data with helper scripts that parse YAML and display output in `oc`-like format.

## Overview

This skill provides analysis for:
- **ClusterVersion**: Current version, update status, and capabilities
- **Cluster Operators**: Status, degradation, and availability
- **Pods**: Health, restarts, crashes, and failures across namespaces
- **Nodes**: Conditions, capacity, and readiness
- **Network**: OVN/SDN diagnostics and connectivity
- **Events**: Warning and error events across namespaces
- **etcd**: Cluster health, member status, and quorum
- **Storage**: PersistentVolume and PersistentVolumeClaim status

## Must-Gather Directory Structure

**Important**: Must-gather data is contained in a subdirectory with a long hash name:
```
must-gather/
└── registry-ci-openshift-org-origin-...-sha256-<hash>/
    ├── cluster-scoped-resources/
    │   ├── config.openshift.io/clusteroperators/
    │   └── core/nodes/
    ├── namespaces/
    │   └── <namespace>/
    │       └── pods/
    │           └── <pod-name>/
    │               └── <pod-name>.yaml
    └── network_logs/
```

The analysis scripts expect the path to the **subdirectory** (the one with the hash), not the root must-gather folder.

## Instructions

### 1. Get Must-Gather Path

Ask the user for the must-gather directory path if not already provided.
- If they provide the root directory, look for the subdirectory with the hash name
- The correct path contains `cluster-scoped-resources/` and `namespaces/` directories

### 2. Choose Analysis Type

Based on the user's request, run the appropriate helper script:

#### ClusterVersion Analysis
```bash
./scripts/analyze_clusterversion.py <must-gather-path>
```

Shows cluster version information similar to `oc get clusterversion`:
- Current version and update status
- Progressing state
- Available updates
- Version conditions
- Enabled capabilities
- Update history

#### Cluster Operators Analysis
```bash
./scripts/analyze_clusteroperators.py <must-gather-path>
```

Shows cluster operator status similar to `oc get clusteroperators`:
- Available, Progressing, Degraded conditions
- Version information
- Time since condition change
- Detailed messages for operators with issues

#### Pods Analysis
```bash
# All namespaces
./scripts/analyze_pods.py <must-gather-path>

# Specific namespace
./scripts/analyze_pods.py <must-gather-path> --namespace <namespace>

# Show only problematic pods
./scripts/analyze_pods.py <must-gather-path> --problems-only
```

Shows pod status similar to `oc get pods -A`:
- Ready/Total containers
- Status (Running, Pending, CrashLoopBackOff, etc.)
- Restart counts
- Age
- Categorized issues (crashlooping, pending, failed)

#### Nodes Analysis
```bash
./scripts/analyze_nodes.py <must-gather-path>

# Show only nodes with issues
./scripts/analyze_nodes.py <must-gather-path> --problems-only
```

Shows node status similar to `oc get nodes`:
- Ready status
- Roles (master, worker)
- Age
- Kubernetes version
- Node conditions (DiskPressure, MemoryPressure, etc.)
- Capacity and allocatable resources

#### Network Analysis
```bash
./scripts/analyze_network.py <must-gather-path>
```

Shows network health:
- Network type (OVN-Kubernetes, OpenShift SDN)
- Network operator status
- OVN pod health
- PodNetworkConnectivityCheck results
- Network-related issues

#### Events Analysis
```bash
# Recent events (last 100)
./scripts/analyze_events.py <must-gather-path>

# Warning events only
./scripts/analyze_events.py <must-gather-path> --type Warning

# Events in a specific namespace
./scripts/analyze_events.py <must-gather-path> --namespace openshift-etcd

# Show last 50 events
./scripts/analyze_events.py <must-gather-path> --count 50
```

Shows cluster events:
- Event type (Warning, Normal)
- Last seen timestamp
- Reason and message
- Affected object
- Event count

#### etcd Analysis
```bash
./scripts/analyze_etcd.py <must-gather-path>
```

Shows etcd cluster health:
- Member health status
- Member list with IDs and URLs
- Endpoint status (leader, version, DB size)
- Quorum status
- Cluster summary

#### Storage Analysis
```bash
# All PVs and PVCs
./scripts/analyze_pvs.py <must-gather-path>

# PVCs in a specific namespace
./scripts/analyze_pvs.py <must-gather-path> --namespace openshift-monitoring
```

Shows storage resources:
- PersistentVolumes (capacity, status, claims)
- PersistentVolumeClaims (binding, capacity)
- Storage classes
- Pending/unbound volumes

#### Monitoring Analysis
```bash
# All alerts
./scripts/analyze_prometheus.py <must-gather-path>

# Alerts in a specific namespace
./scripts/analyze_prometheus.py <must-gather-path> --namespace openshift-monitoring
```

Shows monitoring information:
- Alerts (state, namespace, name, active since, labels)
- Total counts of pending and firing alerts

### 3. Interpret and Report

After running the scripts:
1. Review the summary statistics
2. Focus on items flagged with issues
3. Provide actionable insights and next steps
4. Suggest log analysis for specific components if needed
5. Cross-reference issues (e.g., degraded operator → failing pods → node issues)

## Output Format

All scripts provide:
- **Summary Section**: High-level statistics with emoji indicators
- **Table View**: `oc`-like formatted output
- **Issues Section**: Detailed breakdown of problems

Example summary format:
```
================================================================================
SUMMARY: 25/28 operators healthy
  ⚠️  3 operators with issues
  🔄 1 progressing
  ❌ 2 degraded
================================================================================
```

## Helper Scripts Reference

### scripts/analyze_clusterversion.py
Parses: `cluster-scoped-resources/config.openshift.io/clusterversions/version.yaml`
Output: ClusterVersion table with detailed version info, conditions, and capabilities

### scripts/analyze_clusteroperators.py
Parses: `cluster-scoped-resources/config.openshift.io/clusteroperators/`
Output: ClusterOperator status table with conditions

### scripts/analyze_pods.py
Parses: `namespaces/*/pods/*/*.yaml` (individual pod directories)
Output: Pod status table with issues categorized

### scripts/analyze_nodes.py
Parses: `cluster-scoped-resources/core/nodes/`
Output: Node status table with conditions and capacity

### scripts/analyze_network.py
Parses: `network_logs/`, network operator, OVN resources
Output: Network health summary and diagnostics

### scripts/analyze_events.py
Parses: `namespaces/*/core/events.yaml`
Output: Event table sorted by last occurrence

### scripts/analyze_etcd.py
Parses: `etcd_info/` (endpoint_health.json, member_list.json, endpoint_status.json)
Output: etcd cluster health and member status

### scripts/analyze_pvs.py
Parses: `cluster-scoped-resources/core/persistentvolumes/`, `namespaces/*/core/persistentvolumeclaims.yaml`
Output: PV and PVC status tables
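
For a sense of what these parsers do, here is a minimal sketch in the same spirit as `analyze_events.py`: it walks `namespaces/*/core/events.yaml` (assumed to hold an EventList with an `items` array) and keeps the most recent Warning events. Illustrative only, not the bundled script:

```python
import yaml
from pathlib import Path

def recent_warning_events(must_gather: Path, limit: int = 20) -> list:
    """Collect Warning events across all namespaces, oldest to newest."""
    events = []
    for events_file in must_gather.glob("namespaces/*/core/events.yaml"):
        doc = yaml.safe_load(events_file.read_text()) or {}
        events.extend(ev for ev in doc.get("items", []) if ev.get("type") == "Warning")
    events.sort(key=lambda ev: ev.get("lastTimestamp") or "")
    return events[-limit:]
```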

## Tips for Analysis

1. **Start with Cluster Operators**: They often reveal system-wide issues
2. **Check Timing**: Look at "SINCE" columns to understand when issues started
3. **Follow Dependencies**: Degraded operator → check its namespace pods → check hosting nodes
4. **Look for Patterns**: Multiple pods failing on the same node suggests a node issue
5. **Cross-reference**: Use multiple scripts together for a complete picture

## Common Scenarios

### "Why is my cluster degraded?"
1. Run `analyze_clusteroperators.py` - identify degraded operators
2. Run `analyze_pods.py --namespace <operator-namespace>` - check operator pods
3. Run `analyze_nodes.py` - verify node health

### "Pods keep crashing"
1. Run `analyze_pods.py --problems-only` - find crashlooping pods
2. Check which nodes they're on
3. Run `analyze_nodes.py` - verify node conditions
4. Suggest checking pod logs in the must-gather data

### "Network connectivity issues"
1. Run `analyze_network.py` - check network health
2. Run `analyze_pods.py --namespace openshift-ovn-kubernetes`
3. Check PodNetworkConnectivityCheck results

## Next Steps After Analysis

Based on findings, suggest:
- Examining specific pod logs in `namespaces/<ns>/pods/<pod>/<container>/logs/`
- Reviewing events in `namespaces/<ns>/core/events.yaml`
- Checking audit logs in `audit_logs/`
- Analyzing metrics data if available
- Looking at host service logs in `host_service_logs/`
199
skills/must-gather-analyzer/scripts/analyze_clusteroperators.py
Executable file
@@ -0,0 +1,199 @@
#!/usr/bin/env python3
"""
Analyze ClusterOperator resources from must-gather data.
Displays output similar to 'oc get clusteroperators' command.
"""

import sys
import os
import yaml
from pathlib import Path
from datetime import datetime
from typing import List, Dict, Any, Optional


def parse_clusteroperator(file_path: Path) -> Optional[Dict[str, Any]]:
    """Parse a single clusteroperator YAML file."""
    try:
        with open(file_path, 'r') as f:
            doc = yaml.safe_load(f)
            if doc and doc.get('kind') == 'ClusterOperator':
                return doc
    except Exception as e:
        print(f"Warning: Failed to parse {file_path}: {e}", file=sys.stderr)
    return None


def get_condition_status(conditions: List[Dict], condition_type: str) -> tuple[str, str, str]:
    """
    Get status, reason, and message for a specific condition type.
    Returns (status, reason, message).
    """
    for condition in conditions:
        if condition.get('type') == condition_type:
            status = condition.get('status', 'Unknown')
            reason = condition.get('reason', '')
            message = condition.get('message', '')
            return status, reason, message
    return 'Unknown', '', ''


def calculate_duration(timestamp_str: str) -> str:
    """Calculate duration from timestamp to now."""
    try:
        # Parse Kubernetes timestamp format
        ts = datetime.fromisoformat(timestamp_str.replace('Z', '+00:00'))
        now = datetime.now(ts.tzinfo)
        delta = now - ts

        days = delta.days
        hours = delta.seconds // 3600
        minutes = (delta.seconds % 3600) // 60

        if days > 0:
            return f"{days}d"
        elif hours > 0:
            return f"{hours}h"
        elif minutes > 0:
            return f"{minutes}m"
        else:
            return "<1m"
    except Exception:
        return "unknown"


def get_condition_duration(conditions: List[Dict], condition_type: str) -> str:
    """Get the duration since a condition last transitioned."""
    for condition in conditions:
        if condition.get('type') == condition_type:
            last_transition = condition.get('lastTransitionTime')
            if last_transition:
                return calculate_duration(last_transition)
    return ""


def format_operator_row(operator: Dict[str, Any]) -> Dict[str, str]:
    """Format a ClusterOperator into a row for display."""
    name = operator.get('metadata', {}).get('name', 'unknown')
    conditions = operator.get('status', {}).get('conditions', [])
    versions = operator.get('status', {}).get('versions', [])

    # Get version (first version in the list, usually the operator version)
    version = versions[0].get('version', '') if versions else ''

    # Get condition statuses
    available_status, _, _ = get_condition_status(conditions, 'Available')
    progressing_status, _, _ = get_condition_status(conditions, 'Progressing')
    degraded_status, degraded_reason, degraded_msg = get_condition_status(conditions, 'Degraded')

    # Determine which condition to show duration and message for
    # Priority: Degraded > Progressing > Available (if false)
    if degraded_status == 'True':
        since = get_condition_duration(conditions, 'Degraded')
        message = degraded_msg if degraded_msg else degraded_reason
    elif progressing_status == 'True':
        since = get_condition_duration(conditions, 'Progressing')
        _, prog_reason, prog_msg = get_condition_status(conditions, 'Progressing')
        message = prog_msg if prog_msg else prog_reason
    elif available_status == 'False':
        since = get_condition_duration(conditions, 'Available')
        _, avail_reason, avail_msg = get_condition_status(conditions, 'Available')
        message = avail_msg if avail_msg else avail_reason
    else:
        # All good, show time since available
        since = get_condition_duration(conditions, 'Available')
        message = ''

    return {
        'name': name,
        'version': version,
        'available': available_status,
        'progressing': progressing_status,
        'degraded': degraded_status,
        'since': since,
        'message': message
    }


def print_operators_table(operators: List[Dict[str, str]]):
    """Print operators in a formatted table like 'oc get clusteroperators'."""
    if not operators:
        print("No resources found.")
        return

    # Print header - no width limit on VERSION to match oc output
    print(f"{'NAME':<42} {'VERSION':<50} {'AVAILABLE':<11} {'PROGRESSING':<13} {'DEGRADED':<10} {'SINCE':<7} MESSAGE")

    # Print rows
    for op in operators:
        name = op['name'][:42]
        version = op['version']  # Don't truncate version
        available = op['available'][:11]
        progressing = op['progressing'][:13]
        degraded = op['degraded'][:10]
        since = op['since'][:7]
        message = op['message']

        print(f"{name:<42} {version:<50} {available:<11} {progressing:<13} {degraded:<10} {since:<7} {message}")


def analyze_clusteroperators(must_gather_path: str):
    """Analyze all clusteroperators in a must-gather directory."""
    base_path = Path(must_gather_path)

    # Common paths where clusteroperators might be
    possible_patterns = [
        "cluster-scoped-resources/config.openshift.io/clusteroperators/*.yaml",
        "*/cluster-scoped-resources/config.openshift.io/clusteroperators/*.yaml",
    ]

    clusteroperators = []

    # Find and parse all clusteroperator files
    for pattern in possible_patterns:
        for co_file in base_path.glob(pattern):
            operator = parse_clusteroperator(co_file)
            if operator:
                clusteroperators.append(operator)

    if not clusteroperators:
        print("No resources found.", file=sys.stderr)
        return 1

    # Remove duplicates (same operator from different glob patterns)
    seen = set()
    unique_operators = []
    for op in clusteroperators:
        name = op.get('metadata', {}).get('name')
        if name and name not in seen:
            seen.add(name)
            unique_operators.append(op)

    # Format and sort operators by name
    formatted_ops = [format_operator_row(op) for op in unique_operators]
    formatted_ops.sort(key=lambda x: x['name'])

    # Print results
    print_operators_table(formatted_ops)

    return 0


def main():
    if len(sys.argv) < 2:
        print("Usage: analyze_clusteroperators.py <must-gather-directory>", file=sys.stderr)
        print("\nExample:", file=sys.stderr)
        print("  analyze_clusteroperators.py ./must-gather.local.123456789", file=sys.stderr)
        return 1

    must_gather_path = sys.argv[1]

    if not os.path.isdir(must_gather_path):
        print(f"Error: Directory not found: {must_gather_path}", file=sys.stderr)
        return 1

    return analyze_clusteroperators(must_gather_path)


if __name__ == '__main__':
    sys.exit(main())
261
skills/must-gather-analyzer/scripts/analyze_clusterversion.py
Executable file
@@ -0,0 +1,261 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
Analyze ClusterVersion from must-gather data.
|
||||||
|
Displays output similar to 'oc get clusterversion' command.
|
||||||
|
"""
|
||||||
|
|
||||||
|
import sys
|
||||||
|
import os
|
||||||
|
import yaml
|
||||||
|
from pathlib import Path
|
||||||
|
from datetime import datetime
|
||||||
|
from typing import Dict, Any, Optional
|
||||||
|
|
||||||
|
|
||||||
|
def parse_clusterversion(file_path: Path) -> Optional[Dict[str, Any]]:
|
||||||
|
"""Parse the clusterversion YAML file."""
|
||||||
|
try:
|
||||||
|
with open(file_path, 'r') as f:
|
||||||
|
doc = yaml.safe_load(f)
|
||||||
|
if doc and doc.get('kind') == 'ClusterVersion':
|
||||||
|
return doc
|
||||||
|
except Exception as e:
|
||||||
|
print(f"Warning: Failed to parse {file_path}: {e}", file=sys.stderr)
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def get_condition_status(conditions: list, condition_type: str) -> str:
|
||||||
|
"""Get status for a specific condition type."""
|
||||||
|
for condition in conditions:
|
||||||
|
if condition.get('type') == condition_type:
|
||||||
|
return condition.get('status', 'Unknown')
|
||||||
|
return 'Unknown'
|
||||||
|
|
||||||
|
|
||||||
|
def calculate_duration(timestamp_str: str) -> str:
|
||||||
|
"""Calculate duration from timestamp to now."""
|
||||||
|
try:
|
||||||
|
ts = datetime.fromisoformat(timestamp_str.replace('Z', '+00:00'))
|
||||||
|
now = datetime.now(ts.tzinfo)
|
||||||
|
delta = now - ts
|
||||||
|
|
||||||
|
days = delta.days
|
||||||
|
hours = delta.seconds // 3600
|
||||||
|
minutes = (delta.seconds % 3600) // 60
|
||||||
|
|
||||||
|
if days > 0:
|
||||||
|
return f"{days}d"
|
||||||
|
elif hours > 0:
|
||||||
|
return f"{hours}h"
|
||||||
|
elif minutes > 0:
|
||||||
|
return f"{minutes}m"
|
||||||
|
else:
|
||||||
|
return "<1m"
|
||||||
|
except Exception:
|
||||||
|
return ""
|
||||||
|
|
||||||
|
|
||||||
|
def format_clusterversion(cv: Dict[str, Any]) -> Dict[str, str]:
|
||||||
|
"""Format ClusterVersion for display."""
|
||||||
|
name = cv.get('metadata', {}).get('name', 'version')
|
||||||
|
status = cv.get('status', {})
|
||||||
|
|
||||||
|
# Get version from desired
|
||||||
|
desired = status.get('desired', {})
|
||||||
|
version = desired.get('version', '')
|
||||||
|
|
||||||
|
# Get available updates count
|
||||||
|
available_updates = status.get('availableUpdates')
|
||||||
|
if available_updates and isinstance(available_updates, list):
|
||||||
|
available = str(len(available_updates))
|
||||||
|
elif available_updates is None:
|
||||||
|
available = ''
|
||||||
|
else:
|
||||||
|
available = '0'
|
||||||
|
|
||||||
|
# Get conditions
|
||||||
|
conditions = status.get('conditions', [])
|
||||||
|
progressing = get_condition_status(conditions, 'Progressing')
|
||||||
|
since = ''
|
||||||
|
|
||||||
|
# Get time since progressing started (if true) or since last update
|
||||||
|
for condition in conditions:
|
||||||
|
if condition.get('type') == 'Progressing':
|
||||||
|
last_transition = condition.get('lastTransitionTime')
|
||||||
|
if last_transition:
|
||||||
|
since = calculate_duration(last_transition)
|
||||||
|
break
|
||||||
|
|
||||||
|
# Get status message
|
||||||
|
status_msg = ''
|
||||||
|
for condition in conditions:
|
||||||
|
if condition.get('type') == 'Progressing' and condition.get('status') == 'True':
|
||||||
|
            status_msg = condition.get('message', '')[:80]
            break

    # If not progressing, check if failed
    if progressing != 'True':
        for condition in conditions:
            if condition.get('type') == 'Failing' and condition.get('status') == 'True':
                status_msg = condition.get('message', '')[:80]
                break

    return {
        'name': name,
        'version': version,
        'available': available,
        'progressing': progressing,
        'since': since,
        'status': status_msg
    }


def print_clusterversion_table(cv_info: Dict[str, str]):
    """Print ClusterVersion in a formatted table like 'oc get clusterversion'."""
    # Print header
    print(f"{'NAME':<10} {'VERSION':<50} {'AVAILABLE':<11} {'PROGRESSING':<13} {'SINCE':<7} STATUS")

    # Print row
    name = cv_info['name'][:10]
    version = cv_info['version'][:50]
    available = cv_info['available'][:11]
    progressing = cv_info['progressing'][:13]
    since = cv_info['since'][:7]
    status = cv_info['status']

    print(f"{name:<10} {version:<50} {available:<11} {progressing:<13} {since:<7} {status}")


def print_detailed_info(cv: Dict[str, Any]):
    """Print detailed cluster version information."""
    status = cv.get('status', {})
    spec = cv.get('spec', {})

    print(f"\n{'='*80}")
    print("CLUSTER VERSION DETAILS")
    print(f"{'='*80}")

    # Cluster ID
    cluster_id = spec.get('clusterID', 'unknown')
    print(f"Cluster ID: {cluster_id}")

    # Desired version
    desired = status.get('desired', {})
    print(f"Desired Version: {desired.get('version', 'unknown')}")
    print(f"Desired Image: {desired.get('image', 'unknown')}")

    # Version hash
    version_hash = status.get('versionHash', '')
    if version_hash:
        print(f"Version Hash: {version_hash}")

    # Upstream
    upstream = spec.get('upstream', '')
    if upstream:
        print(f"Update Server: {upstream}")

    # Conditions
    conditions = status.get('conditions', [])
    print("\nCONDITIONS:")
    for condition in conditions:
        cond_type = condition.get('type', 'Unknown')
        cond_status = condition.get('status', 'Unknown')
        last_transition = condition.get('lastTransitionTime', '')
        message = condition.get('message', '')

        # Calculate time since transition
        age = calculate_duration(last_transition) if last_transition else ''

        status_indicator = "✅" if cond_status == "True" else "❌" if cond_status == "False" else "❓"
        print(f"  {status_indicator} {cond_type}: {cond_status} (for {age})")
        if message and cond_status == 'True':
            print(f"     Message: {message[:100]}")

    # Update history
    history = status.get('history', [])
    if history:
        print("\nUPDATE HISTORY (last 5):")
        for i, entry in enumerate(history[:5]):
            state = entry.get('state', 'Unknown')
            version = entry.get('version', 'unknown')
            image = entry.get('image', '')
            completion_time = entry.get('completionTime', '')

            age = calculate_duration(completion_time) if completion_time else ''
            print(f"  {i+1}. {version} - {state} {f'({age} ago)' if age else ''}")

    # Available updates
    available_updates = status.get('availableUpdates')
    if available_updates and isinstance(available_updates, list) and len(available_updates) > 0:
        print(f"\nAVAILABLE UPDATES ({len(available_updates)}):")
        for i, update in enumerate(available_updates[:5]):
            version = update.get('version', 'unknown')
            image = update.get('image', '')
            print(f"  {i+1}. {version}")
    elif available_updates is None:
        print("\nAVAILABLE UPDATES: Unable to retrieve updates")

    # Capabilities
    capabilities = status.get('capabilities', {})
    enabled_caps = capabilities.get('enabledCapabilities', [])
    if enabled_caps:
        print(f"\nENABLED CAPABILITIES ({len(enabled_caps)}):")
        # Print in columns
        for i in range(0, len(enabled_caps), 3):
            caps = enabled_caps[i:i+3]
            print(f"  {', '.join(caps)}")

    print(f"{'='*80}\n")


def analyze_clusterversion(must_gather_path: str):
    """Analyze ClusterVersion in a must-gather directory."""
    base_path = Path(must_gather_path)

    # Find ClusterVersion file
    possible_patterns = [
        "cluster-scoped-resources/config.openshift.io/clusterversions/version.yaml",
        "*/cluster-scoped-resources/config.openshift.io/clusterversions/version.yaml",
    ]

    cv = None
    for pattern in possible_patterns:
        for cv_file in base_path.glob(pattern):
            cv = parse_clusterversion(cv_file)
            if cv:
                break
        if cv:
            break

    if not cv:
        print("No ClusterVersion found.")
        return 1

    # Format and print table
    cv_info = format_clusterversion(cv)
    print_clusterversion_table(cv_info)

    # Print detailed information
    print_detailed_info(cv)

    return 0


def main():
    if len(sys.argv) < 2:
        print("Usage: analyze_clusterversion.py <must-gather-directory>", file=sys.stderr)
        print("\nExample:", file=sys.stderr)
        print("  analyze_clusterversion.py ./must-gather.local.123456789", file=sys.stderr)
        return 1

    must_gather_path = sys.argv[1]

    if not os.path.isdir(must_gather_path):
        print(f"Error: Directory not found: {must_gather_path}", file=sys.stderr)
        return 1

    return analyze_clusterversion(must_gather_path)


if __name__ == '__main__':
    sys.exit(main())
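The table printers in these scripts all lean on the same two idioms: slicing caps a column's width, and the `:<N` f-string specifier pads it to alignment. A standalone sketch of the idiom (the row values here are invented):

```
# Sketch of the column-format idiom used by print_clusterversion_table.
row = {'name': 'version', 'available': 'True', 'since': '4d12h'}
print(f"{'NAME':<10} {'AVAILABLE':<11} SINCE")
print(f"{row['name'][:10]:<10} {row['available'][:11]:<11} {row['since'][:7]}")
```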
206
skills/must-gather-analyzer/scripts/analyze_etcd.py
Executable file
@@ -0,0 +1,206 @@
#!/usr/bin/env python3
"""
Analyze etcd information from must-gather data.
Shows etcd cluster health, member status, and diagnostics.
"""

import sys
import os
import json
from pathlib import Path
from typing import Dict, Any, List, Optional


def parse_etcd_info(must_gather_path: Path) -> Dict[str, Any]:
    """Parse etcd_info directory for cluster health information."""
    etcd_data = {
        'member_health': [],
        'member_list': [],
        'endpoint_health': [],
        'endpoint_status': []
    }

    # Find etcd_info directory
    etcd_dirs = list(must_gather_path.glob("etcd_info")) + \
                list(must_gather_path.glob("*/etcd_info"))

    if not etcd_dirs:
        return etcd_data

    etcd_info_dir = etcd_dirs[0]

    # Parse endpoint health. Member health is derived from the same
    # endpoint_health.json file, so populate both keys from a single read
    # instead of parsing the file twice.
    endpoint_health_file = etcd_info_dir / "endpoint_health.json"
    if endpoint_health_file.exists():
        try:
            with open(endpoint_health_file, 'r') as f:
                data = json.load(f)
            health = data if isinstance(data, list) else [data]
            etcd_data['member_health'] = health
            etcd_data['endpoint_health'] = health
        except Exception as e:
            print(f"Warning: Failed to parse endpoint_health.json: {e}", file=sys.stderr)

    # Parse member list
    member_list_file = etcd_info_dir / "member_list.json"
    if member_list_file.exists():
        try:
            with open(member_list_file, 'r') as f:
                data = json.load(f)
            if isinstance(data, dict) and 'members' in data:
                etcd_data['member_list'] = data['members']
            elif isinstance(data, list):
                etcd_data['member_list'] = data
        except Exception as e:
            print(f"Warning: Failed to parse member_list.json: {e}", file=sys.stderr)

    # Parse endpoint status
    endpoint_status_file = etcd_info_dir / "endpoint_status.json"
    if endpoint_status_file.exists():
        try:
            with open(endpoint_status_file, 'r') as f:
                data = json.load(f)
            etcd_data['endpoint_status'] = data if isinstance(data, list) else [data]
        except Exception as e:
            print(f"Warning: Failed to parse endpoint_status.json: {e}", file=sys.stderr)

    return etcd_data


def print_member_health(members: List[Dict[str, Any]]):
    """Print etcd member health status."""
    if not members:
        print("No member health data found.")
        return

    print("ETCD MEMBER HEALTH")
    print(f"{'ENDPOINT':<60} {'HEALTH':<10} {'TOOK':<10} ERROR")

    for member in members:
        endpoint = member.get('endpoint', 'unknown')[:60]
        health = 'true' if member.get('health') else 'false'
        took = member.get('took', '')
        error = member.get('error', '')

        print(f"{endpoint:<60} {health:<10} {took:<10} {error}")


def print_member_list(members: List[Dict[str, Any]]):
    """Print etcd member list."""
    if not members:
        print("\nNo member list data found.")
        return

    print("\nETCD MEMBER LIST")
    print(f"{'ID':<20} {'NAME':<40} {'PEER URLS':<60} {'CLIENT URLS':<60}")

    for member in members:
        member_id = str(member.get('ID', member.get('id', 'unknown')))[:20]
        name = member.get('name', 'unknown')[:40]
        peer_urls = ','.join(member.get('peerURLs', []))[:60]
        client_urls = ','.join(member.get('clientURLs', []))[:60]

        print(f"{member_id:<20} {name:<40} {peer_urls:<60} {client_urls:<60}")


def print_endpoint_status(endpoints: List[Dict[str, Any]]):
    """Print etcd endpoint status."""
    if not endpoints:
        print("\nNo endpoint status data found.")
        return

    print("\nETCD ENDPOINT STATUS")
    print(f"{'ENDPOINT':<60} {'LEADER':<20} {'VERSION':<10} {'DB SIZE':<10} {'IS LEARNER'}")

    for endpoint in endpoints:
        ep = endpoint.get('Endpoint', 'unknown')[:60]

        status = endpoint.get('Status', {})
        leader = str(status.get('leader', 'unknown'))[:20]
        version = status.get('version', 'unknown')[:10]

        db_size = status.get('dbSize', 0)
        db_size_mb = f"{db_size / (1024*1024):.1f}MB" if db_size else '0MB'

        is_learner = 'true' if status.get('isLearner') else 'false'

        print(f"{ep:<60} {leader:<20} {version:<10} {db_size_mb:<10} {is_learner}")


def print_summary(etcd_data: Dict[str, Any]):
    """Print summary of etcd cluster health."""
    member_health = etcd_data.get('member_health', [])
    member_list = etcd_data.get('member_list', [])

    total_members = len(member_list)
    healthy_members = sum(1 for m in member_health if m.get('health'))

    print(f"\n{'='*80}")
    print("ETCD CLUSTER SUMMARY")
    print(f"{'='*80}")
    print(f"Total Members: {total_members}")
    print(f"Healthy Members: {healthy_members}/{len(member_health) if member_health else total_members}")

    if healthy_members < total_members:
        print("  ⚠️ Warning: Not all members are healthy!")
    elif healthy_members == total_members and total_members > 0:
        print("  ✅ All members healthy")

    # Check for quorum: a majority of members must be healthy
    if total_members >= 3:
        quorum = (total_members // 2) + 1
        if healthy_members >= quorum:
            print(f"  ✅ Quorum achieved ({healthy_members}/{quorum})")
        else:
            print(f"  ❌ Quorum lost! ({healthy_members}/{quorum})")
    print(f"{'='*80}\n")


def analyze_etcd(must_gather_path: str):
    """Analyze etcd information in a must-gather directory."""
    base_path = Path(must_gather_path)

    etcd_data = parse_etcd_info(base_path)

    if not any(etcd_data.values()):
        print("No etcd_info data found in must-gather.")
        print("Expected location: etcd_info/ directory")
        return 1

    # Print summary first
    print_summary(etcd_data)

    # Print detailed information
    print_member_health(etcd_data.get('member_health', []))
    print_member_list(etcd_data.get('member_list', []))
    print_endpoint_status(etcd_data.get('endpoint_status', []))

    return 0


def main():
    if len(sys.argv) < 2:
        print("Usage: analyze_etcd.py <must-gather-directory>", file=sys.stderr)
        print("\nExample:", file=sys.stderr)
        print("  analyze_etcd.py ./must-gather.local.123456789", file=sys.stderr)
        return 1

    must_gather_path = sys.argv[1]

    if not os.path.isdir(must_gather_path):
        print(f"Error: Directory not found: {must_gather_path}", file=sys.stderr)
        return 1

    return analyze_etcd(must_gather_path)


if __name__ == '__main__':
    sys.exit(main())
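The quorum check in `print_summary` uses the usual majority rule for etcd: strictly more than half the members must be healthy. A quick sanity check of the formula (the member counts are hypothetical):

```
# Majority quorum for an etcd cluster of n members: (n // 2) + 1.
for total in (3, 5):
    quorum = (total // 2) + 1
    print(f"{total} members -> quorum of {quorum}")  # 3 -> 2, 5 -> 3
```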
201
skills/must-gather-analyzer/scripts/analyze_events.py
Executable file
@@ -0,0 +1,201 @@
#!/usr/bin/env python3
"""
Analyze Events from must-gather data.
Shows warning and error events sorted by last occurrence.
"""

import sys
import os
import yaml
import argparse
from pathlib import Path
from datetime import datetime
from typing import List, Dict, Any, Optional
from collections import defaultdict


def parse_events_file(file_path: Path) -> List[Dict[str, Any]]:
    """Parse events YAML file which may contain multiple events."""
    events = []
    try:
        with open(file_path, 'r') as f:
            docs = yaml.safe_load_all(f)
            for doc in docs:
                if doc and doc.get('kind') == 'Event':
                    events.append(doc)
                elif doc and doc.get('kind') == 'EventList':
                    # Handle EventList
                    events.extend(doc.get('items', []))
    except Exception as e:
        print(f"Warning: Failed to parse {file_path}: {e}", file=sys.stderr)
    return events


def calculate_age(timestamp_str: str) -> str:
    """Calculate age from timestamp."""
    try:
        ts = datetime.fromisoformat(timestamp_str.replace('Z', '+00:00'))
        now = datetime.now(ts.tzinfo)
        delta = now - ts

        days = delta.days
        hours = delta.seconds // 3600
        minutes = (delta.seconds % 3600) // 60

        if days > 0:
            return f"{days}d"
        elif hours > 0:
            return f"{hours}h"
        elif minutes > 0:
            return f"{minutes}m"
        else:
            return "<1m"
    except Exception:
        return ""


def format_event(event: Dict[str, Any]) -> Dict[str, Any]:
    """Format an event for display."""
    metadata = event.get('metadata', {})

    namespace = metadata.get('namespace', '')
    name = metadata.get('name', 'unknown')

    # Get last timestamp
    last_timestamp = event.get('lastTimestamp') or event.get('eventTime') or metadata.get('creationTimestamp', '')
    age = calculate_age(last_timestamp) if last_timestamp else ''

    # Event details
    event_type = event.get('type', 'Normal')
    reason = event.get('reason', '')
    message = event.get('message', '')
    count = event.get('count', 1)

    # Involved object
    involved = event.get('involvedObject', {})
    obj_kind = involved.get('kind', '')
    obj_name = involved.get('name', '')

    return {
        'namespace': namespace,
        'last_seen': age,
        'type': event_type,
        'reason': reason,
        'object_kind': obj_kind,
        'object_name': obj_name,
        'message': message,
        'count': count,
        'timestamp': last_timestamp
    }


def print_events_table(events: List[Dict[str, Any]]):
    """Print events in a table format."""
    if not events:
        print("No resources found.")
        return

    # Print header
    print(f"{'NAMESPACE':<30} {'LAST SEEN':<10} {'TYPE':<10} {'REASON':<30} {'OBJECT':<40} {'MESSAGE':<60}")

    # Print rows
    for event in events:
        namespace = event['namespace'][:30] if event['namespace'] else '<cluster>'
        last_seen = event['last_seen'][:10]
        event_type = event['type'][:10]
        reason = event['reason'][:30]
        obj = f"{event['object_kind']}/{event['object_name']}"[:40]
        message = event['message'][:60]

        print(f"{namespace:<30} {last_seen:<10} {event_type:<10} {reason:<30} {obj:<40} {message:<60}")


def analyze_events(must_gather_path: str, namespace: Optional[str] = None,
                   event_type: Optional[str] = None, show_count: int = 100):
    """Analyze events in a must-gather directory."""
    base_path = Path(must_gather_path)

    all_events = []

    # Find all events files
    if namespace:
        patterns = [
            f"namespaces/{namespace}/core/events.yaml",
            f"*/namespaces/{namespace}/core/events.yaml",
        ]
    else:
        patterns = [
            "namespaces/*/core/events.yaml",
            "*/namespaces/*/core/events.yaml",
        ]

    for pattern in patterns:
        for events_file in base_path.glob(pattern):
            events = parse_events_file(events_file)
            all_events.extend(events)

    if not all_events:
        print("No resources found.")
        return 1

    # Format events
    formatted_events = [format_event(e) for e in all_events]

    # Filter by type if specified
    if event_type:
        formatted_events = [e for e in formatted_events if e['type'].lower() == event_type.lower()]

    # Sort by timestamp (most recent first)
    formatted_events.sort(key=lambda x: x['timestamp'], reverse=True)

    # Limit count
    if show_count and show_count > 0:
        formatted_events = formatted_events[:show_count]

    # Print results
    print_events_table(formatted_events)

    # Summary
    total = len(formatted_events)
    warnings = sum(1 for e in formatted_events if e['type'] == 'Warning')
    normal = sum(1 for e in formatted_events if e['type'] == 'Normal')

    print(f"\nShowing {total} most recent events")
    if warnings > 0:
        print(f"  ⚠️ {warnings} Warning events")
    if normal > 0:
        print(f"  ℹ️ {normal} Normal events")

    return 0


def main():
    parser = argparse.ArgumentParser(
        description='Analyze events from must-gather data',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  %(prog)s ./must-gather
  %(prog)s ./must-gather --namespace openshift-etcd
  %(prog)s ./must-gather --type Warning
  %(prog)s ./must-gather --count 50
"""
    )

    parser.add_argument('must_gather_path', help='Path to must-gather directory')
    parser.add_argument('-n', '--namespace', help='Filter by namespace')
    parser.add_argument('-t', '--type', help='Filter by event type (Warning, Normal)')
    parser.add_argument('-c', '--count', type=int, default=100,
                        help='Number of events to show (default: 100)')

    args = parser.parse_args()

    if not os.path.isdir(args.must_gather_path):
        print(f"Error: Directory not found: {args.must_gather_path}", file=sys.stderr)
        return 1

    return analyze_events(args.must_gather_path, args.namespace, args.type, args.count)


if __name__ == '__main__':
    sys.exit(main())
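`calculate_age` relies on `datetime.fromisoformat`, which on Python versions before 3.11 does not accept the RFC 3339 `Z` suffix that Kubernetes timestamps use, hence the `replace('Z', '+00:00')`. A minimal sketch of the same conversion (the timestamp is made up):

```
# Why calculate_age rewrites the 'Z' suffix before parsing.
from datetime import datetime, timezone
ts = datetime.fromisoformat("2024-01-01T12:00:00Z".replace('Z', '+00:00'))
print((datetime.now(timezone.utc) - ts).days, "days old")
```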
281
skills/must-gather-analyzer/scripts/analyze_network.py
Executable file
@@ -0,0 +1,281 @@
#!/usr/bin/env python3
"""
Analyze Network resources and diagnostics from must-gather data.
Shows network operator status, OVN pods, and connectivity checks.
"""

import sys
import os
import yaml
from pathlib import Path
from typing import List, Dict, Any, Optional


def parse_yaml_file(file_path: Path) -> Optional[Dict[str, Any]]:
    """Parse a YAML file."""
    try:
        with open(file_path, 'r') as f:
            doc = yaml.safe_load(f)
            return doc
    except Exception as e:
        print(f"Warning: Failed to parse {file_path}: {e}", file=sys.stderr)
        return None


def get_network_type(must_gather_path: Path) -> str:
    """Determine the network type from cluster network config."""
    # First try to find networks.yaml (List object)
    patterns = [
        "cluster-scoped-resources/config.openshift.io/networks.yaml",
        "*/cluster-scoped-resources/config.openshift.io/networks.yaml",
    ]

    for pattern in patterns:
        for network_file in must_gather_path.glob(pattern):
            network_list = parse_yaml_file(network_file)
            if network_list:
                # Handle NetworkList object
                items = network_list.get('items', [])
                if items:
                    # Get the first network item
                    network = items[0]
                    spec = network.get('spec', {})
                    network_type = spec.get('networkType', 'Unknown')
                    if network_type != 'Unknown':
                        return network_type

    # Fallback: try individual network config files
    patterns = [
        "cluster-scoped-resources/config.openshift.io/*.yaml",
    ]

    for pattern in patterns:
        for network_file in must_gather_path.glob(pattern):
            if network_file.name in ['networks.yaml']:
                continue

            network = parse_yaml_file(network_file)
            if network:
                spec = network.get('spec', {})
                network_type = spec.get('networkType', 'Unknown')
                if network_type != 'Unknown':
                    return network_type

    return 'Unknown'


def analyze_network_operator(must_gather_path: Path) -> Optional[Dict[str, Any]]:
    """Analyze network operator status."""
    patterns = [
        "cluster-scoped-resources/config.openshift.io/clusteroperators/network.yaml",
        "*/cluster-scoped-resources/config.openshift.io/clusteroperators/network.yaml",
    ]

    for pattern in patterns:
        for op_file in must_gather_path.glob(pattern):
            operator = parse_yaml_file(op_file)
            if operator:
                conditions = operator.get('status', {}).get('conditions', [])
                result = {}

                for cond in conditions:
                    cond_type = cond.get('type')
                    if cond_type in ['Available', 'Progressing', 'Degraded']:
                        result[cond_type] = cond.get('status', 'Unknown')
                        result[f'{cond_type}_message'] = cond.get('message', '')

                return result

    return None


def analyze_ovn_pods(must_gather_path: Path) -> List[Dict[str, str]]:
    """Analyze OVN-Kubernetes pods."""
    pods = []

    patterns = [
        "namespaces/openshift-ovn-kubernetes/pods/*/*.yaml",
        "*/namespaces/openshift-ovn-kubernetes/pods/*/*.yaml",
    ]

    for pattern in patterns:
        for pod_file in must_gather_path.glob(pattern):
            if pod_file.name == 'pods.yaml':
                continue

            pod = parse_yaml_file(pod_file)
            if pod:
                name = pod.get('metadata', {}).get('name', 'unknown')
                status = pod.get('status', {})
                phase = status.get('phase', 'Unknown')

                container_statuses = status.get('containerStatuses', [])
                total = len(pod.get('spec', {}).get('containers', []))
                ready = sum(1 for cs in container_statuses if cs.get('ready', False))

                pods.append({
                    'name': name,
                    'ready': f"{ready}/{total}",
                    'status': phase
                })

    # Remove duplicates
    seen = set()
    unique_pods = []
    for p in pods:
        if p['name'] not in seen:
            seen.add(p['name'])
            unique_pods.append(p)

    return sorted(unique_pods, key=lambda x: x['name'])


def analyze_connectivity_checks(must_gather_path: Path) -> Dict[str, Any]:
    """Analyze PodNetworkConnectivityCheck resources."""
    # First try to find podnetworkconnectivitychecks.yaml (List object)
    patterns = [
        "pod_network_connectivity_check/podnetworkconnectivitychecks.yaml",
        "*/pod_network_connectivity_check/podnetworkconnectivitychecks.yaml",
    ]

    total_checks = 0
    failed_checks = []

    for pattern in patterns:
        for check_file in must_gather_path.glob(pattern):
            check_list = parse_yaml_file(check_file)
            if check_list:
                items = check_list.get('items', [])
                for check in items:
                    total_checks += 1
                    name = check.get('metadata', {}).get('name', 'unknown')
                    status = check.get('status', {})

                    conditions = status.get('conditions', [])
                    for cond in conditions:
                        if cond.get('type') == 'Reachable' and cond.get('status') == 'False':
                            failed_checks.append({
                                'name': name,
                                'message': cond.get('message', 'Unknown')
                            })

    # If we found the list file, no need to continue
    if total_checks > 0:
        return {
            'total': total_checks,
            'failed': failed_checks
        }

    # Fallback: try individual check files
    patterns = [
        "*/pod_network_connectivity_check/*.yaml",
    ]

    for pattern in patterns:
        for check_file in must_gather_path.glob(pattern):
            if check_file.name == 'podnetworkconnectivitychecks.yaml':
                continue

            check = parse_yaml_file(check_file)
            if check:
                total_checks += 1
                name = check.get('metadata', {}).get('name', 'unknown')
                status = check.get('status', {})

                conditions = status.get('conditions', [])
                for cond in conditions:
                    if cond.get('type') == 'Reachable' and cond.get('status') == 'False':
                        failed_checks.append({
                            'name': name,
                            'message': cond.get('message', 'Unknown')
                        })

    return {
        'total': total_checks,
        'failed': failed_checks
    }


def print_network_summary(network_type: str, operator_status: Optional[Dict],
                          ovn_pods: List[Dict], connectivity: Dict):
    """Print network analysis summary."""
    print(f"{'NETWORK TYPE':<30} {network_type}")
    print()

    if operator_status:
        print("NETWORK OPERATOR STATUS")
        print(f"{'Available':<15} {operator_status.get('Available', 'Unknown')}")
        print(f"{'Progressing':<15} {operator_status.get('Progressing', 'Unknown')}")
        print(f"{'Degraded':<15} {operator_status.get('Degraded', 'Unknown')}")

        if operator_status.get('Degraded') == 'True':
            msg = operator_status.get('Degraded_message', '')
            if msg:
                print(f"  Message: {msg}")
        print()

    if ovn_pods and network_type == 'OVNKubernetes':
        print("OVN-KUBERNETES PODS")
        print(f"{'NAME':<60} {'READY':<10} STATUS")
        for pod in ovn_pods:
            name = pod['name'][:60]
            ready = pod['ready'][:10]
            status = pod['status']
            print(f"{name:<60} {ready:<10} {status}")
        print()

    if connectivity['total'] > 0:
        print(f"NETWORK CONNECTIVITY CHECKS: {connectivity['total']} total")
        if connectivity['failed']:
            print(f"  Failed: {len(connectivity['failed'])}")
            for failed in connectivity['failed'][:10]:  # Show first 10
                print(f"  - {failed['name']}")
                if failed['message']:
                    print(f"    {failed['message'][:100]}")
        else:
            print("  All checks passing")
        print()


def analyze_network(must_gather_path: str):
    """Analyze network resources in a must-gather directory."""
    base_path = Path(must_gather_path)

    # Get network type
    network_type = get_network_type(base_path)

    # Get network operator status
    operator_status = analyze_network_operator(base_path)

    # Get OVN pods if applicable
    ovn_pods = []
    if network_type == 'OVNKubernetes':
        ovn_pods = analyze_ovn_pods(base_path)

    # Get connectivity checks
    connectivity = analyze_connectivity_checks(base_path)

    # Print summary
    print_network_summary(network_type, operator_status, ovn_pods, connectivity)

    return 0


def main():
    if len(sys.argv) < 2:
        print("Usage: analyze_network.py <must-gather-directory>", file=sys.stderr)
        print("\nExample:", file=sys.stderr)
        print("  analyze_network.py ./must-gather.local.123456789", file=sys.stderr)
        return 1

    must_gather_path = sys.argv[1]

    if not os.path.isdir(must_gather_path):
        print(f"Error: Directory not found: {must_gather_path}", file=sys.stderr)
        return 1

    return analyze_network(must_gather_path)


if __name__ == '__main__':
    sys.exit(main())
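`get_network_type` first walks `networks.yaml`, which holds a `NetworkList` whose single cluster `Network` object sits at `items[0]`. A self-contained sketch of that traversal (the inline YAML is an invented sample, not real must-gather output):

```
import yaml

doc = yaml.safe_load("""
kind: NetworkList
items:
- spec:
    networkType: OVNKubernetes
""")
items = doc.get('items', [])
print(items[0].get('spec', {}).get('networkType', 'Unknown') if items else 'Unknown')
```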
224
skills/must-gather-analyzer/scripts/analyze_nodes.py
Executable file
@@ -0,0 +1,224 @@
#!/usr/bin/env python3
"""
Analyze Node resources from must-gather data.
Displays output similar to 'oc get nodes' command.
"""

import sys
import os
import yaml
import argparse
from pathlib import Path
from datetime import datetime
from typing import List, Dict, Any, Optional


def parse_node(file_path: Path) -> Optional[Dict[str, Any]]:
    """Parse a single node YAML file."""
    try:
        with open(file_path, 'r') as f:
            doc = yaml.safe_load(f)
            if doc and doc.get('kind') == 'Node':
                return doc
    except Exception as e:
        print(f"Warning: Failed to parse {file_path}: {e}", file=sys.stderr)
    return None


def calculate_age(creation_timestamp: str) -> str:
    """Calculate age from creation timestamp."""
    try:
        ts = datetime.fromisoformat(creation_timestamp.replace('Z', '+00:00'))
        now = datetime.now(ts.tzinfo)
        delta = now - ts

        days = delta.days
        hours = delta.seconds // 3600

        if days > 0:
            return f"{days}d"
        elif hours > 0:
            return f"{hours}h"
        else:
            return "<1h"
    except Exception:
        return ""


def get_node_roles(labels: Dict[str, str]) -> str:
    """Extract node roles from labels."""
    roles = []
    for key in labels:
        if key.startswith('node-role.kubernetes.io/'):
            role = key.split('/')[-1]
            if role:
                roles.append(role)

    return ','.join(sorted(roles)) if roles else '<none>'


def get_node_status(node: Dict[str, Any]) -> Dict[str, Any]:
    """Extract node status information."""
    metadata = node.get('metadata', {})
    status = node.get('status', {})

    name = metadata.get('name', 'unknown')
    labels = metadata.get('labels', {})
    creation_time = metadata.get('creationTimestamp', '')

    # Get roles
    roles = get_node_roles(labels)

    # Get conditions
    conditions = status.get('conditions', [])
    ready_condition = 'Unknown'
    node_issues = []

    for condition in conditions:
        cond_type = condition.get('type', '')
        cond_status = condition.get('status', 'Unknown')

        if cond_type == 'Ready':
            ready_condition = cond_status
        elif cond_status == 'True' and cond_type in ['MemoryPressure', 'DiskPressure', 'PIDPressure', 'NetworkUnavailable']:
            node_issues.append(cond_type)

    # Determine overall status
    if ready_condition == 'True':
        node_status = 'Ready'
    elif ready_condition == 'False':
        node_status = 'NotReady'
    else:
        node_status = 'Unknown'

    # Add issues to status
    if node_issues:
        node_status = f"{node_status},{','.join(node_issues)}"

    # Get version
    node_info = status.get('nodeInfo', {})
    version = node_info.get('kubeletVersion', '')

    # Get age
    age = calculate_age(creation_time) if creation_time else ''

    # Internal IP
    addresses = status.get('addresses', [])
    internal_ip = ''
    for addr in addresses:
        if addr.get('type') == 'InternalIP':
            internal_ip = addr.get('address', '')
            break

    # OS Image
    os_image = node_info.get('osImage', '')

    return {
        'name': name,
        'status': node_status,
        'roles': roles,
        'age': age,
        'version': version,
        'internal_ip': internal_ip,
        'os_image': os_image,
        'is_problem': node_status != 'Ready' or len(node_issues) > 0
    }


def print_nodes_table(nodes: List[Dict[str, Any]]):
    """Print nodes in a formatted table like 'oc get nodes'."""
    if not nodes:
        print("No resources found.")
        return

    # Print header
    print(f"{'NAME':<50} {'STATUS':<30} {'ROLES':<20} {'AGE':<7} VERSION")

    # Print rows
    for node in nodes:
        name = node['name'][:50]
        status = node['status'][:30]
        roles = node['roles'][:20]
        age = node['age'][:7]
        version = node['version']

        print(f"{name:<50} {status:<30} {roles:<20} {age:<7} {version}")


def analyze_nodes(must_gather_path: str, problems_only: bool = False):
    """Analyze all nodes in a must-gather directory."""
    base_path = Path(must_gather_path)

    # Find all node YAML files
    possible_patterns = [
        "cluster-scoped-resources/core/nodes/*.yaml",
        "*/cluster-scoped-resources/core/nodes/*.yaml",
    ]

    nodes = []

    for pattern in possible_patterns:
        for node_file in base_path.glob(pattern):
            # Skip the nodes.yaml file that contains all nodes
            if node_file.name == 'nodes.yaml':
                continue

            node = parse_node(node_file)
            if node:
                node_status = get_node_status(node)
                nodes.append(node_status)

    if not nodes:
        print("No resources found.")
        return 1

    # Remove duplicates
    seen = set()
    unique_nodes = []
    for n in nodes:
        if n['name'] not in seen:
            seen.add(n['name'])
            unique_nodes.append(n)

    # Sort by name
    unique_nodes.sort(key=lambda x: x['name'])

    # Filter if problems only
    if problems_only:
        unique_nodes = [n for n in unique_nodes if n['is_problem']]
        if not unique_nodes:
            print("No resources found.")
            return 0

    # Print results
    print_nodes_table(unique_nodes)

    return 0


def main():
    parser = argparse.ArgumentParser(
        description='Analyze node resources from must-gather data',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  %(prog)s ./must-gather.local.123456789
  %(prog)s ./must-gather.local.123456789 --problems-only
"""
    )

    parser.add_argument('must_gather_path', help='Path to must-gather directory')
    parser.add_argument('-p', '--problems-only', action='store_true',
                        help='Show only nodes with issues')

    args = parser.parse_args()

    if not os.path.isdir(args.must_gather_path):
        print(f"Error: Directory not found: {args.must_gather_path}", file=sys.stderr)
        return 1

    return analyze_nodes(args.must_gather_path, args.problems_only)


if __name__ == '__main__':
    sys.exit(main())
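`get_node_roles` treats every `node-role.kubernetes.io/<role>` label *key* as a role and joins the sorted results, mirroring the ROLES column of `oc get nodes`. A sketch with invented labels:

```
# Roles come from label keys, not values.
labels = {
    'node-role.kubernetes.io/control-plane': '',
    'node-role.kubernetes.io/master': '',
    'kubernetes.io/hostname': 'node-a',
}
roles = sorted(k.split('/')[-1] for k in labels if k.startswith('node-role.kubernetes.io/'))
print(','.join(roles) if roles else '<none>')  # control-plane,master
```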
444
skills/must-gather-analyzer/scripts/analyze_ovn_dbs.py
Executable file
@@ -0,0 +1,444 @@
#!/usr/bin/env python3
"""
Analyze OVN Northbound and Southbound databases from must-gather.
Uses ovsdb-tool to read binary .db files collected per-node.

Must-gather structure:
  network_logs/
  └── ovnk_database_store.tar.gz
      └── ovnk_database_store/
          ├── ovnkube-node-{pod}_nbdb  (per-zone NBDB)
          ├── ovnkube-node-{pod}_sbdb  (per-zone SBDB)
          └── ...
"""

import subprocess
import json
import sys
import os
import tarfile
import yaml
import argparse
from pathlib import Path
from typing import List, Dict, Any, Optional


class OVNDatabase:
    """Wrapper for querying OVSDB files using ovsdb-tool"""

    def __init__(self, db_path: Path, db_type: str, node_name: str = None):
        self.db_path = db_path
        self.db_type = db_type  # 'nbdb' or 'sbdb'
        self.pod_name = db_path.stem.replace('_nbdb', '').replace('_sbdb', '')
        self.node_name = node_name or self.pod_name  # Use node name if available

    def query(self, table: str, columns: List[str] = None, where: List = None) -> List[Dict]:
        """Query OVSDB table using ovsdb-tool query command"""
        schema = "OVN_Northbound" if self.db_type == "nbdb" else "OVN_Southbound"

        # Build query
        query_op = {
            "op": "select",
            "table": table,
            "where": where or []
        }

        if columns:
            query_op["columns"] = columns

        query_json = json.dumps([schema, query_op])

        try:
            result = subprocess.run(
                ['ovsdb-tool', 'query', str(self.db_path), query_json],
                capture_output=True,
                text=True,
                timeout=30
            )

            if result.returncode != 0:
                print(f"Warning: Query failed for {self.db_path}: {result.stderr}", file=sys.stderr)
                return []

            data = json.loads(result.stdout)
            return data[0].get('rows', [])

        except Exception as e:
            print(f"Warning: Failed to query {table} from {self.db_path}: {e}", file=sys.stderr)
            return []


def build_pod_to_node_mapping(mg_path: Path) -> Dict[str, str]:
    """Build mapping of ovnkube pod names to node names"""
    pod_to_node = {}

    # Look for ovnkube-node pods in openshift-ovn-kubernetes namespace
    ovn_ns_path = mg_path / "namespaces" / "openshift-ovn-kubernetes" / "pods"

    if not ovn_ns_path.exists():
        print(f"Warning: OVN namespace pods not found at {ovn_ns_path}", file=sys.stderr)
        return pod_to_node

    # Find all ovnkube-node pod directories
    for pod_dir in ovn_ns_path.glob("ovnkube-node-*"):
        if not pod_dir.is_dir():
            continue

        pod_name = pod_dir.name
        pod_yaml = pod_dir / f"{pod_name}.yaml"

        if not pod_yaml.exists():
            continue

        try:
            with open(pod_yaml, 'r') as f:
                pod = yaml.safe_load(f)
                node_name = pod.get('spec', {}).get('nodeName')
                if node_name:
                    pod_to_node[pod_name] = node_name
        except Exception as e:
            print(f"Warning: Failed to parse {pod_yaml}: {e}", file=sys.stderr)

    return pod_to_node


def extract_db_tarball(mg_path: Path) -> Path:
    """Extract ovnk_database_store.tar.gz if not already extracted"""
    network_logs = mg_path / "network_logs"
    tarball = network_logs / "ovnk_database_store.tar.gz"
    extract_dir = network_logs / "ovnk_database_store"

    if not tarball.exists():
        print(f"Error: Database tarball not found: {tarball}", file=sys.stderr)
        return None

    # Extract if directory doesn't exist
    if not extract_dir.exists():
        print(f"Extracting {tarball}...")
        with tarfile.open(tarball, 'r:gz') as tar:
            tar.extractall(path=network_logs)

    return extract_dir


def get_nb_databases(db_dir: Path, pod_to_node: Dict[str, str]) -> List[OVNDatabase]:
    """Find all NB database files and map them to nodes"""
    databases = []
    for db in sorted(db_dir.glob("*_nbdb")):
        pod_name = db.stem.replace('_nbdb', '')
        node_name = pod_to_node.get(pod_name)
        databases.append(OVNDatabase(db, 'nbdb', node_name))
    return databases


def get_sb_databases(db_dir: Path, pod_to_node: Dict[str, str]) -> List[OVNDatabase]:
    """Find all SB database files and map them to nodes"""
    databases = []
    for db in sorted(db_dir.glob("*_sbdb")):
        pod_name = db.stem.replace('_sbdb', '')
        node_name = pod_to_node.get(pod_name)
        databases.append(OVNDatabase(db, 'sbdb', node_name))
    return databases


def analyze_logical_switches(db: OVNDatabase):
    """Analyze logical switches in the zone"""
    switches = db.query("Logical_Switch", columns=["name", "ports", "other_config"])

    if not switches:
        print("  No logical switches found.")
        return

    print(f"\n  LOGICAL SWITCHES ({len(switches)}):")
    print(f"  {'NAME':<60} PORTS")
    print(f"  {'-'*80}")

    for sw in switches:
        name = sw.get('name', 'unknown')
        # ports is a UUID set (encoded as ["set", [uuid, ...]]), just count them
        port_count = 0
        ports = sw.get('ports', [])
        if isinstance(ports, list) and len(ports) == 2 and ports[0] == "set":
            port_count = len(ports[1])

        print(f"  {name:<60} {port_count}")


def analyze_logical_switch_ports(db: OVNDatabase):
    """Analyze logical switch ports, focusing on pods"""
    lsps = db.query("Logical_Switch_Port", columns=["name", "external_ids", "addresses"])

    # Filter for pod ports (have pod=true in external_ids)
    pod_ports = []
    for lsp in lsps:
        ext_ids = lsp.get('external_ids', [])
        if isinstance(ext_ids, list) and len(ext_ids) == 2 and ext_ids[0] == "map":
            ext_map = dict(ext_ids[1])
            if ext_map.get('pod') == 'true':
                # Pod name is in the LSP name (format: namespace_podname)
                lsp_name = lsp.get('name', '')
                namespace = ext_map.get('namespace', '')

                # Extract pod name from LSP name
                pod_name = lsp_name
                if lsp_name.startswith(namespace + '_'):
                    pod_name = lsp_name[len(namespace) + 1:]

                # Extract IP from addresses (format can be string "MAC IP" or empty)
                ip = ""
                addrs = lsp.get('addresses', '')
                if isinstance(addrs, str) and addrs:
                    parts = addrs.split()
                    if len(parts) > 1:
                        ip = parts[1]

                pod_ports.append({
                    'name': lsp_name,
                    'namespace': namespace,
                    'pod_name': pod_name,
                    'ip': ip
                })

    if not pod_ports:
        print("  No pod logical switch ports found.")
        return

    print(f"\n  POD LOGICAL SWITCH PORTS ({len(pod_ports)}):")
    print(f"  {'NAMESPACE':<40} {'POD':<45} IP")
    print(f"  {'-'*120}")

    for port in sorted(pod_ports, key=lambda x: (x['namespace'], x['pod_name']))[:20]:  # Show first 20
        namespace = port['namespace'][:40]
        pod_name = port['pod_name'][:45]
        ip = port['ip']

        print(f"  {namespace:<40} {pod_name:<45} {ip}")

    if len(pod_ports) > 20:
        print(f"  ... and {len(pod_ports) - 20} more")


def analyze_acls(db: OVNDatabase):
    """Analyze ACLs in the zone"""
    acls = db.query("ACL", columns=["priority", "direction", "match", "action", "severity"])

    if not acls:
        print("  No ACLs found.")
        return

    print(f"\n  ACCESS CONTROL LISTS ({len(acls)}):")
    print(f"  {'PRIORITY':<10} {'DIRECTION':<15} {'ACTION':<15} MATCH")
    print(f"  {'-'*120}")

    # Show highest priority ACLs first
    sorted_acls = sorted(acls, key=lambda x: x.get('priority', 0), reverse=True)

    for acl in sorted_acls[:15]:  # Show top 15
        priority = acl.get('priority', 0)
        direction = acl.get('direction', '')
        action = acl.get('action', '')
        match = acl.get('match', '')[:70]  # Truncate long matches

        print(f"  {priority:<10} {direction:<15} {action:<15} {match}")

    if len(acls) > 15:
        print(f"  ... and {len(acls) - 15} more")


def analyze_logical_routers(db: OVNDatabase):
    """Analyze logical routers in the zone"""
    routers = db.query("Logical_Router", columns=["name", "ports", "static_routes"])

    if not routers:
        print("  No logical routers found.")
        return

    print(f"\n  LOGICAL ROUTERS ({len(routers)}):")
    print(f"  {'NAME':<60} PORTS")
    print(f"  {'-'*80}")

    for router in routers:
        name = router.get('name', 'unknown')

        # Count ports
        port_count = 0
        ports = router.get('ports', [])
        if isinstance(ports, list) and len(ports) == 2 and ports[0] == "set":
            port_count = len(ports[1])

        print(f"  {name:<60} {port_count}")


def analyze_zone_summary(db: OVNDatabase):
    """Print summary for a zone"""
    # Get counts - for ACLs we need multiple columns to get accurate count
    switches = db.query("Logical_Switch", columns=["name"])
    lsps = db.query("Logical_Switch_Port", columns=["name"])
    acls = db.query("ACL", columns=["priority", "direction", "match"])
    routers = db.query("Logical_Router", columns=["name"])

    print(f"\n{'='*80}")
    print(f"Node: {db.node_name}")
    if db.node_name != db.pod_name:
        print(f"Pod: {db.pod_name}")
    print(f"{'='*80}")
    print(f"  Logical Switches: {len(switches)}")
    print(f"  Logical Switch Ports: {len(lsps)}")
    print(f"  ACLs: {len(acls)}")
    print(f"  Logical Routers: {len(routers)}")


def run_raw_query(mg_path: str, node_filter: str, query_json: str):
    """Run a raw JSON query against OVN databases"""
    base_path = Path(mg_path)

    # Build pod-to-node mapping
    pod_to_node = build_pod_to_node_mapping(base_path)

    # Extract tarball
    db_dir = extract_db_tarball(base_path)
    if not db_dir:
        return 1

    # Get all NB databases
    nb_dbs = get_nb_databases(db_dir, pod_to_node)

    if not nb_dbs:
        print("No Northbound databases found in must-gather.", file=sys.stderr)
        return 1

    # Filter by node if specified
    if node_filter:
        filtered_dbs = [db for db in nb_dbs if node_filter in db.node_name]
        if not filtered_dbs:
            print(f"Error: No databases found for node matching '{node_filter}'", file=sys.stderr)
            print("\nAvailable nodes:", file=sys.stderr)
            for db in nb_dbs:
                print(f"  - {db.node_name}", file=sys.stderr)
            return 1
        nb_dbs = filtered_dbs

    # Run query on each database
    for db in nb_dbs:
        print(f"\n{'='*80}")
        print(f"Node: {db.node_name}")
        if db.node_name != db.pod_name:
            print(f"Pod: {db.pod_name}")
        print(f"{'='*80}\n")

        try:
            # Run the raw query using ovsdb-tool
            result = subprocess.run(
                ['ovsdb-tool', 'query', str(db.db_path), query_json],
                capture_output=True,
                text=True,
                timeout=30
            )

            if result.returncode != 0:
                print(f"Error: Query failed: {result.stderr}", file=sys.stderr)
                continue

            # Pretty print the JSON result
            try:
                data = json.loads(result.stdout)
                print(json.dumps(data, indent=2))
            except json.JSONDecodeError:
                # If not valid JSON, just print raw output
                print(result.stdout)

        except Exception as e:
            print(f"Error: Failed to execute query: {e}", file=sys.stderr)

    return 0


def analyze_northbound_databases(mg_path: str, node_filter: str = None):
    """Analyze all Northbound databases"""
    base_path = Path(mg_path)

    # Build pod-to-node mapping
    pod_to_node = build_pod_to_node_mapping(base_path)

    # Extract tarball
    db_dir = extract_db_tarball(base_path)
    if not db_dir:
        return 1

    # Get all NB databases
    nb_dbs = get_nb_databases(db_dir, pod_to_node)

    if not nb_dbs:
        print("No Northbound databases found in must-gather.", file=sys.stderr)
        return 1

    # Filter by node if specified
    if node_filter:
        filtered_dbs = [db for db in nb_dbs if node_filter in db.node_name]
        if not filtered_dbs:
            print(f"Error: No databases found for node matching '{node_filter}'", file=sys.stderr)
            print("\nAvailable nodes:", file=sys.stderr)
            for db in nb_dbs:
                print(f"  - {db.node_name}", file=sys.stderr)
            return 1
        nb_dbs = filtered_dbs

    print(f"\nFound {len(nb_dbs)} node(s)\n")

    # Analyze each zone
    for db in nb_dbs:
        analyze_zone_summary(db)
        analyze_logical_switches(db)
        analyze_logical_switch_ports(db)
        analyze_acls(db)
        analyze_logical_routers(db)
        print()

    return 0


def main():
    parser = argparse.ArgumentParser(
        description="Analyze OVN databases from must-gather",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  # Analyze all nodes
  analyze_ovn_dbs.py ./must-gather.local.123456789

  # Analyze specific node
  analyze_ovn_dbs.py ./must-gather.local.123456789 --node ip-10-0-26-145

  # Run raw OVSDB query (Claude can construct the JSON)
  analyze_ovn_dbs.py ./must-gather/ --query '["OVN_Northbound", {"op":"select", "table":"ACL", "where":[["priority", ">", 1000]], "columns":["priority","match","action"]}]'

  # Query specific node
  analyze_ovn_dbs.py ./must-gather/ --node master-0 --query '["OVN_Northbound", {"op":"select", "table":"Logical_Switch", "where":[], "columns":["name"]}]'
"""
    )
    parser.add_argument('must_gather_path', help='Path to must-gather directory')
    parser.add_argument('--node', '-n', help='Filter by node name (supports partial matches)')
    parser.add_argument('--query', '-q', help='Run raw OVSDB JSON query instead of standard analysis')

    args = parser.parse_args()

    if not os.path.isdir(args.must_gather_path):
        print(f"Error: Directory not found: {args.must_gather_path}", file=sys.stderr)
        return 1

    # Check if ovsdb-tool is available
    try:
        subprocess.run(['ovsdb-tool', '--version'], capture_output=True, check=True)
    except (subprocess.CalledProcessError, FileNotFoundError):
        print("Error: ovsdb-tool not found. Please install openvswitch package.", file=sys.stderr)
        return 1

    # Run query mode or standard analysis
    if args.query:
        return run_raw_query(args.must_gather_path, args.node, args.query)
    else:
        return analyze_northbound_databases(args.must_gather_path, args.node)


if __name__ == '__main__':
    sys.exit(main())
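Two formats matter when reading these databases: the query handed to `ovsdb-tool` is a JSON pair of schema name and select operation (the shape `OVNDatabase.query` builds), and result rows encode multi-valued columns as `["set", [...]]` and key/value columns as `["map", [[k, v], ...]]`. A sketch of both (the row below is an invented example):

```
import json

# The wire shape OVNDatabase.query() hands to ovsdb-tool:
query = json.dumps(["OVN_Northbound", {
    "op": "select",
    "table": "Logical_Switch",
    "where": [],
    "columns": ["name", "ports"],
}])
print(query)  # usable as: ovsdb-tool query <db-file> '<query>'

# Decoding OVSDB's set encoding, as the analyzers above do:
row = {"name": "join", "ports": ["set", [["uuid", "a1"], ["uuid", "b2"]]]}
ports = row["ports"]
count = len(ports[1]) if isinstance(ports, list) and len(ports) == 2 and ports[0] == "set" else 0
print(count)  # 2
```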
224
skills/must-gather-analyzer/scripts/analyze_pods.py
Executable file
@@ -0,0 +1,224 @@
#!/usr/bin/env python3
"""
Analyze Pod resources from must-gather data.
Displays output similar to 'oc get pods -A' command.
"""

import sys
import os
import yaml
import argparse
from pathlib import Path
from datetime import datetime
from typing import List, Dict, Any, Optional


def parse_pod(file_path: Path) -> Optional[Dict[str, Any]]:
    """Parse a single pod YAML file."""
    try:
        with open(file_path, 'r') as f:
            doc = yaml.safe_load(f)
            if doc and doc.get('kind') == 'Pod':
                return doc
    except Exception as e:
        print(f"Warning: Failed to parse {file_path}: {e}", file=sys.stderr)
    return None


def calculate_age(creation_timestamp: str) -> str:
    """Calculate age from creation timestamp."""
    try:
        ts = datetime.fromisoformat(creation_timestamp.replace('Z', '+00:00'))
        now = datetime.now(ts.tzinfo)
        delta = now - ts

        days = delta.days
        hours = delta.seconds // 3600
        minutes = (delta.seconds % 3600) // 60

        if days > 0:
            return f"{days}d"
        elif hours > 0:
            return f"{hours}h"
        elif minutes > 0:
            return f"{minutes}m"
        else:
            return "<1m"
    except Exception:
        return ""


def get_pod_status(pod: Dict[str, Any]) -> Dict[str, Any]:
    """Extract pod status information."""
    metadata = pod.get('metadata', {})
    status = pod.get('status', {})
    spec = pod.get('spec', {})

    name = metadata.get('name', 'unknown')
    namespace = metadata.get('namespace', 'unknown')
    creation_time = metadata.get('creationTimestamp', '')

    # Get container statuses
    container_statuses = status.get('containerStatuses', [])
    init_container_statuses = status.get('initContainerStatuses', [])

    # Calculate ready containers
    total_containers = len(spec.get('containers', []))
    ready_containers = sum(1 for cs in container_statuses if cs.get('ready', False))

    # Get overall phase
    phase = status.get('phase', 'Unknown')

    # Determine more specific status
    pod_status = phase
    reason = status.get('reason', '')

    # Check for specific container states
    for cs in container_statuses:
        state = cs.get('state', {})
        if 'waiting' in state:
            waiting = state['waiting']
            pod_status = waiting.get('reason', 'Waiting')
        elif 'terminated' in state:
            terminated = state['terminated']
            if terminated.get('exitCode', 0) != 0:
                pod_status = terminated.get('reason', 'Error')

    # Check init containers
    for ics in init_container_statuses:
        state = ics.get('state', {})
        if 'waiting' in state:
            waiting = state['waiting']
            if waiting.get('reason') in ['CrashLoopBackOff', 'ImagePullBackOff', 'ErrImagePull']:
                pod_status = f"Init:{waiting.get('reason', 'Waiting')}"

    # Calculate total restarts
    total_restarts = sum(cs.get('restartCount', 0) for cs in container_statuses)

    # Calculate age
    age = calculate_age(creation_time) if creation_time else ''

    return {
        'namespace': namespace,
        'name': name,
        'ready': f"{ready_containers}/{total_containers}",
        'status': pod_status,
        'restarts': str(total_restarts),
        'age': age,
        'node': spec.get('nodeName', ''),
        'is_problem': pod_status not in ['Running', 'Succeeded', 'Completed'] or total_restarts > 0
    }
def print_pods_table(pods: List[Dict[str, Any]], show_namespace: bool = True):
|
||||||
|
"""Print pods in a formatted table like 'oc get pods'."""
|
||||||
|
if not pods:
|
||||||
|
print("No resources found.")
|
||||||
|
return
|
||||||
|
|
||||||
|
# Print header
|
||||||
|
if show_namespace:
|
||||||
|
print(f"{'NAMESPACE':<42} {'NAME':<50} {'READY':<7} {'STATUS':<20} {'RESTARTS':<9} AGE")
|
||||||
|
else:
|
||||||
|
print(f"{'NAME':<50} {'READY':<7} {'STATUS':<20} {'RESTARTS':<9} AGE")
|
||||||
|
|
||||||
|
# Print rows
|
||||||
|
for pod in pods:
|
||||||
|
name = pod['name'][:50]
|
||||||
|
ready = pod['ready'][:7]
|
||||||
|
status = pod['status'][:20]
|
||||||
|
restarts = pod['restarts'][:9]
|
||||||
|
age = pod['age']
|
||||||
|
|
||||||
|
if show_namespace:
|
||||||
|
namespace = pod['namespace'][:42]
|
||||||
|
print(f"{namespace:<42} {name:<50} {ready:<7} {status:<20} {restarts:<9} {age}")
|
||||||
|
else:
|
||||||
|
print(f"{name:<50} {ready:<7} {status:<20} {restarts:<9} {age}")
|
||||||
|
|
||||||
|
|
||||||
|
def analyze_pods(must_gather_path: str, namespace: Optional[str] = None, problems_only: bool = False):
|
||||||
|
"""Analyze all pods in a must-gather directory."""
|
||||||
|
base_path = Path(must_gather_path)
|
||||||
|
|
||||||
|
pods = []
|
||||||
|
|
||||||
|
# Find all pod YAML files
|
||||||
|
# Structure: namespaces/<namespace>/pods/<pod-name>/<pod-name>.yaml
|
||||||
|
if namespace:
|
||||||
|
# Specific namespace
|
||||||
|
patterns = [
|
||||||
|
f"namespaces/{namespace}/pods/*/*.yaml",
|
||||||
|
f"*/namespaces/{namespace}/pods/*/*.yaml",
|
||||||
|
]
|
||||||
|
else:
|
||||||
|
# All namespaces
|
||||||
|
patterns = [
|
||||||
|
"namespaces/*/pods/*/*.yaml",
|
||||||
|
"*/namespaces/*/pods/*/*.yaml",
|
||||||
|
]
|
||||||
|
|
||||||
|
for pattern in patterns:
|
||||||
|
for pod_file in base_path.glob(pattern):
|
||||||
|
pod = parse_pod(pod_file)
|
||||||
|
if pod:
|
||||||
|
pod_status = get_pod_status(pod)
|
||||||
|
pods.append(pod_status)
|
||||||
|
|
||||||
|
if not pods:
|
||||||
|
print("No resources found.")
|
||||||
|
return 1
|
||||||
|
|
||||||
|
# Remove duplicates
|
||||||
|
seen = set()
|
||||||
|
unique_pods = []
|
||||||
|
for p in pods:
|
||||||
|
key = f"{p['namespace']}/{p['name']}"
|
||||||
|
if key not in seen:
|
||||||
|
seen.add(key)
|
||||||
|
unique_pods.append(p)
|
||||||
|
|
||||||
|
# Sort by namespace, then name
|
||||||
|
unique_pods.sort(key=lambda x: (x['namespace'], x['name']))
|
||||||
|
|
||||||
|
# Filter if problems only
|
||||||
|
if problems_only:
|
||||||
|
unique_pods = [p for p in unique_pods if p['is_problem']]
|
||||||
|
if not unique_pods:
|
||||||
|
print("No resources found.")
|
||||||
|
return 0
|
||||||
|
|
||||||
|
# Print results
|
||||||
|
print_pods_table(unique_pods, show_namespace=(namespace is None))
|
||||||
|
|
||||||
|
return 0
|
||||||
|
|
||||||
|
|
||||||
|
def main():
|
||||||
|
parser = argparse.ArgumentParser(
|
||||||
|
description='Analyze pod resources from must-gather data',
|
||||||
|
formatter_class=argparse.RawDescriptionHelpFormatter,
|
||||||
|
epilog="""
|
||||||
|
Examples:
|
||||||
|
%(prog)s ./must-gather.local.123456789
|
||||||
|
%(prog)s ./must-gather.local.123456789 --namespace openshift-etcd
|
||||||
|
%(prog)s ./must-gather.local.123456789 --problems-only
|
||||||
|
"""
|
||||||
|
)
|
||||||
|
|
||||||
|
parser.add_argument('must_gather_path', help='Path to must-gather directory')
|
||||||
|
parser.add_argument('-n', '--namespace', help='Filter by namespace')
|
||||||
|
parser.add_argument('-p', '--problems-only', action='store_true',
|
||||||
|
help='Show only pods with issues')
|
||||||
|
|
||||||
|
args = parser.parse_args()
|
||||||
|
|
||||||
|
if not os.path.isdir(args.must_gather_path):
|
||||||
|
print(f"Error: Directory not found: {args.must_gather_path}", file=sys.stderr)
|
||||||
|
return 1
|
||||||
|
|
||||||
|
return analyze_pods(args.must_gather_path, args.namespace, args.problems_only)
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == '__main__':
|
||||||
|
sys.exit(main())
|
||||||
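As a sanity check on the STATUS logic above, here is a synthetic pod object (illustrative values, not from a real must-gather, assuming `get_pod_status` is importable from the module): a Running pod whose container is waiting in CrashLoopBackOff should be reported with the waiting reason and flagged as a problem.

```python
# Synthetic pod exercising the CrashLoopBackOff branch of get_pod_status.
crashing_pod = {
    'kind': 'Pod',
    'metadata': {'name': 'demo', 'namespace': 'openshift-demo',
                 'creationTimestamp': '2024-01-01T00:00:00Z'},
    'spec': {'containers': [{'name': 'c'}], 'nodeName': 'worker-0'},
    'status': {
        'phase': 'Running',
        'containerStatuses': [{
            'name': 'c', 'ready': False, 'restartCount': 7,
            'state': {'waiting': {'reason': 'CrashLoopBackOff'}},
        }],
    },
}
row = get_pod_status(crashing_pod)
assert row['status'] == 'CrashLoopBackOff'
assert row['ready'] == '0/1' and row['restarts'] == '7'
assert row['is_problem']
```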
117
skills/must-gather-analyzer/scripts/analyze_prometheus.py
Executable file
@@ -0,0 +1,117 @@
#!/usr/bin/env python3
"""
Analyze Prometheus data from a must-gather archive.
Shows active Prometheus alerts.
"""

import sys
import os
import json
import argparse
from pathlib import Path
from typing import List, Dict, Any, Optional


def parse_json_file(file_path: Path) -> Optional[Dict[str, Any]]:
    """Parse a JSON file."""
    try:
        with open(file_path, 'r', encoding='utf-8') as f:
            doc = json.load(f)
            return doc
    except (FileNotFoundError, json.JSONDecodeError, OSError) as e:
        print(f"Error: Failed to parse {file_path}: {e}", file=sys.stderr)
    return None


def print_alerts_table(alerts):
    """Print alerts in a table format."""
    if not alerts:
        print("No alerts found.")
        return

    print("ALERTS")
    print(f"{'STATE':<10} {'NAMESPACE':<50} {'NAME':<50} {'SEVERITY':<10} {'SINCE':<20} LABELS")

    for alert in alerts:
        state = alert.get('state', '')
        since = alert.get('activeAt', '')[:19] + 'Z'  # timestamps are always UTC.
        labels = alert.get('labels', {})
        namespace = labels.pop('namespace', '')[:50]
        name = labels.pop('alertname', '')[:50]
        severity = labels.pop('severity', '')[:10]

        print(f"{state:<10} {namespace:<50} {name:<50} {severity:<10} {since:<20} {labels}")


def analyze_prometheus(must_gather_path: str, namespace: Optional[str] = None):
    """Analyze Prometheus data in a must-gather directory."""
    base_path = Path(must_gather_path)

    # Retrieve active alerts.
    rules_path = base_path / "monitoring" / "prometheus" / "rules.json"
    rules = parse_json_file(rules_path)

    if rules is None:
        return 1
    status = rules.get("status", "")
    if status != "success":
        print(f"{rules_path}: unexpected status {status}", file=sys.stderr)
        return 1

    if "data" not in rules or "groups" not in rules["data"]:
        print(f"Error: Unexpected JSON structure in {rules_path}", file=sys.stderr)
        return 1

    alerts = []
    for group in rules["data"]["groups"]:
        for rule in group["rules"]:
            if rule["type"] == 'alerting' and rule["state"] != 'inactive':
                for alert in rule["alerts"]:
                    if not namespace or alert.get('labels', {}).get('namespace', '') == namespace:
                        alerts.append(alert)

    # Sort alerts by namespace, alertname and severity.
    alerts.sort(key=lambda x: (x.get('labels', {}).get('namespace', ''),
                               x.get('labels', {}).get('alertname', ''),
                               x.get('labels', {}).get('severity', '')))

    # Print results
    print_alerts_table(alerts)

    # Summary
    total_alerts = len(alerts)
    pending = sum(1 for alert in alerts if alert.get('state') == 'pending')
    firing = sum(1 for alert in alerts if alert.get('state') == 'firing')

    print(f"\n{'='*80}")
    print("SUMMARY")
    print(f"Active alerts: {total_alerts} total ({pending} pending, {firing} firing)")
    print(f"{'='*80}")

    return 0


def main():
    parser = argparse.ArgumentParser(
        description='Analyze Prometheus data from a must-gather archive',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  %(prog)s ./must-gather
  %(prog)s ./must-gather --namespace openshift-monitoring
"""
    )

    parser.add_argument('must_gather_path', help='Path to must-gather directory')
    parser.add_argument('-n', '--namespace', help='Filter information by namespace')

    args = parser.parse_args()

    if not os.path.isdir(args.must_gather_path):
        print(f"Error: Directory not found: {args.must_gather_path}", file=sys.stderr)
        return 1

    return analyze_prometheus(args.must_gather_path, args.namespace)


if __name__ == '__main__':
    sys.exit(main())
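The parser above expects `monitoring/prometheus/rules.json` to mirror the Prometheus `/api/v1/rules` API response: a top-level `status`, then `data.groups`, with alerting rules carrying `type`, `state`, and an `alerts` list. An abridged sketch of that shape (group and alert values are illustrative):

```python
# Abridged shape of monitoring/prometheus/rules.json as consumed by
# analyze_prometheus (mirrors the Prometheus /api/v1/rules response).
rules = {
    "status": "success",
    "data": {"groups": [{
        "name": "general.rules",
        "rules": [{
            "type": "alerting",
            "state": "firing",
            "alerts": [{
                "state": "firing",
                "activeAt": "2024-01-01T00:00:00.000000000Z",
                "labels": {
                    "alertname": "Watchdog",
                    "namespace": "openshift-monitoring",
                    "severity": "none",
                },
            }],
        }],
    }]},
}
```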
235
skills/must-gather-analyzer/scripts/analyze_pvs.py
Executable file
@@ -0,0 +1,235 @@
#!/usr/bin/env python3
"""
Analyze PersistentVolumes and PersistentVolumeClaims from must-gather data.
Shows PV/PVC status, capacity, and binding information.
"""

import sys
import os
import yaml
import argparse
from pathlib import Path
from typing import List, Dict, Any, Optional


def parse_yaml_file(file_path: Path) -> Optional[Dict[str, Any]]:
    """Parse a YAML file."""
    try:
        with open(file_path, 'r') as f:
            doc = yaml.safe_load(f)
            return doc
    except Exception as e:
        print(f"Warning: Failed to parse {file_path}: {e}", file=sys.stderr)
    return None


def format_pv(pv: Dict[str, Any]) -> Dict[str, str]:
    """Format a PersistentVolume for display."""
    name = pv.get('metadata', {}).get('name', 'unknown')
    spec = pv.get('spec', {})
    status = pv.get('status', {})

    capacity = spec.get('capacity', {}).get('storage', '')
    access_modes = ','.join(spec.get('accessModes', []))[:20]
    reclaim_policy = spec.get('persistentVolumeReclaimPolicy', '')
    pv_status = status.get('phase', 'Unknown')

    claim_ref = spec.get('claimRef', {})
    claim = ''
    if claim_ref:
        claim_ns = claim_ref.get('namespace', '')
        claim_name = claim_ref.get('name', '')
        claim = f"{claim_ns}/{claim_name}" if claim_ns else claim_name

    storage_class = spec.get('storageClassName', '')

    return {
        'name': name,
        'capacity': capacity,
        'access_modes': access_modes,
        'reclaim_policy': reclaim_policy,
        'status': pv_status,
        'claim': claim,
        'storage_class': storage_class
    }


def format_pvc(pvc: Dict[str, Any]) -> Dict[str, str]:
    """Format a PersistentVolumeClaim for display."""
    metadata = pvc.get('metadata', {})
    name = metadata.get('name', 'unknown')
    namespace = metadata.get('namespace', 'unknown')
    spec = pvc.get('spec', {})
    status = pvc.get('status', {})

    pvc_status = status.get('phase', 'Unknown')
    volume = spec.get('volumeName', '')
    capacity = status.get('capacity', {}).get('storage', '')
    access_modes = ','.join(status.get('accessModes', []))[:20]
    storage_class = spec.get('storageClassName', '')

    return {
        'namespace': namespace,
        'name': name,
        'status': pvc_status,
        'volume': volume,
        'capacity': capacity,
        'access_modes': access_modes,
        'storage_class': storage_class
    }


def print_pvs_table(pvs: List[Dict[str, str]]):
    """Print PVs in a table format."""
    if not pvs:
        print("No PersistentVolumes found.")
        return

    print("PERSISTENT VOLUMES")
    print(f"{'NAME':<50} {'CAPACITY':<10} {'ACCESS MODES':<20} {'RECLAIM':<10} {'STATUS':<10} {'CLAIM':<40} STORAGECLASS")

    for pv in pvs:
        name = pv['name'][:50]
        capacity = pv['capacity'][:10]
        access = pv['access_modes'][:20]
        reclaim = pv['reclaim_policy'][:10]
        status = pv['status'][:10]
        claim = pv['claim'][:40]
        sc = pv['storage_class']

        print(f"{name:<50} {capacity:<10} {access:<20} {reclaim:<10} {status:<10} {claim:<40} {sc}")


def print_pvcs_table(pvcs: List[Dict[str, str]]):
    """Print PVCs in a table format."""
    if not pvcs:
        print("\nNo PersistentVolumeClaims found.")
        return

    print("\nPERSISTENT VOLUME CLAIMS")
    print(f"{'NAMESPACE':<30} {'NAME':<40} {'STATUS':<10} {'VOLUME':<50} {'CAPACITY':<10} {'ACCESS MODES':<20} STORAGECLASS")

    for pvc in pvcs:
        namespace = pvc['namespace'][:30]
        name = pvc['name'][:40]
        status = pvc['status'][:10]
        volume = pvc['volume'][:50]
        capacity = pvc['capacity'][:10]
        access = pvc['access_modes'][:20]
        sc = pvc['storage_class']

        print(f"{namespace:<30} {name:<40} {status:<10} {volume:<50} {capacity:<10} {access:<20} {sc}")


def analyze_storage(must_gather_path: str, namespace: Optional[str] = None):
    """Analyze PVs and PVCs in a must-gather directory."""
    base_path = Path(must_gather_path)

    # Find PVs (cluster-scoped)
    pv_patterns = [
        "cluster-scoped-resources/core/persistentvolumes/*.yaml",
        "*/cluster-scoped-resources/core/persistentvolumes/*.yaml",
    ]

    pvs = []
    for pattern in pv_patterns:
        for pv_file in base_path.glob(pattern):
            if pv_file.name == 'persistentvolumes.yaml':
                continue
            pv = parse_yaml_file(pv_file)
            if pv and pv.get('kind') == 'PersistentVolume':
                pvs.append(format_pv(pv))

    # Find PVCs (namespace-scoped)
    if namespace:
        pvc_patterns = [
            f"namespaces/{namespace}/core/persistentvolumeclaims.yaml",
            f"*/namespaces/{namespace}/core/persistentvolumeclaims.yaml",
        ]
    else:
        pvc_patterns = [
            "namespaces/*/core/persistentvolumeclaims.yaml",
            "*/namespaces/*/core/persistentvolumeclaims.yaml",
        ]

    pvcs = []
    for pattern in pvc_patterns:
        for pvc_file in base_path.glob(pattern):
            pvc_doc = parse_yaml_file(pvc_file)
            if pvc_doc:
                if pvc_doc.get('kind') == 'PersistentVolumeClaim':
                    pvcs.append(format_pvc(pvc_doc))
                elif pvc_doc.get('kind') == 'List':
                    for item in pvc_doc.get('items', []):
                        if item.get('kind') == 'PersistentVolumeClaim':
                            pvcs.append(format_pvc(item))

    # Remove duplicates
    seen_pvs = set()
    unique_pvs = []
    for pv in pvs:
        if pv['name'] not in seen_pvs:
            seen_pvs.add(pv['name'])
            unique_pvs.append(pv)

    seen_pvcs = set()
    unique_pvcs = []
    for pvc in pvcs:
        key = f"{pvc['namespace']}/{pvc['name']}"
        if key not in seen_pvcs:
            seen_pvcs.add(key)
            unique_pvcs.append(pvc)

    # Sort
    unique_pvs.sort(key=lambda x: x['name'])
    unique_pvcs.sort(key=lambda x: (x['namespace'], x['name']))

    # Print results
    print_pvs_table(unique_pvs)
    print_pvcs_table(unique_pvcs)

    # Summary
    total_pvs = len(unique_pvs)
    bound_pvs = sum(1 for pv in unique_pvs if pv['status'] == 'Bound')
    available_pvs = sum(1 for pv in unique_pvs if pv['status'] == 'Available')

    total_pvcs = len(unique_pvcs)
    bound_pvcs = sum(1 for pvc in unique_pvcs if pvc['status'] == 'Bound')
    pending_pvcs = sum(1 for pvc in unique_pvcs if pvc['status'] == 'Pending')

    print(f"\n{'='*80}")
    print("SUMMARY")
    print(f"PVs: {total_pvs} total ({bound_pvs} bound, {available_pvs} available)")
    print(f"PVCs: {total_pvcs} total ({bound_pvcs} bound, {pending_pvcs} pending)")
    if pending_pvcs > 0:
        print(f"  ⚠️ {pending_pvcs} PVC(s) pending - check storage provisioner")
    print(f"{'='*80}")

    return 0


def main():
    parser = argparse.ArgumentParser(
        description='Analyze PVs and PVCs from must-gather data',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  %(prog)s ./must-gather
  %(prog)s ./must-gather --namespace openshift-monitoring
"""
    )

    parser.add_argument('must_gather_path', help='Path to must-gather directory')
    parser.add_argument('-n', '--namespace', help='Filter PVCs by namespace')

    args = parser.parse_args()

    if not os.path.isdir(args.must_gather_path):
        print(f"Error: Directory not found: {args.must_gather_path}", file=sys.stderr)
        return 1

    return analyze_storage(args.must_gather_path, args.namespace)


if __name__ == '__main__':
    sys.exit(main())
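One detail worth noting in `analyze_storage`: the PVC loader accepts either a bare PersistentVolumeClaim document or a `v1/List` wrapper, since must-gather typically dumps all PVCs of a namespace into a single `persistentvolumeclaims.yaml` list. A small illustration (inline YAML with made-up values, assuming `format_pvc` is importable from the module):

```python
# Demonstrate the List-wrapper branch of the PVC loader.
import yaml

doc = yaml.safe_load("""
kind: List
items:
- kind: PersistentVolumeClaim
  metadata: {name: db-data, namespace: demo}
  spec: {volumeName: pv-001, storageClassName: gp3}
  status:
    phase: Bound
    capacity: {storage: 10Gi}
    accessModes: [ReadWriteOnce]
""")
rows = [format_pvc(item) for item in doc.get('items', [])
        if item.get('kind') == 'PersistentVolumeClaim']
assert rows[0]['status'] == 'Bound' and rows[0]['volume'] == 'pv-001'
```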