Initial commit
14
.claude-plugin/plugin.json
Normal file
@@ -0,0 +1,14 @@
{
  "name": "must-gather",
  "description": "A plugin to analyze and report on must-gather data",
  "version": "0.0.1",
  "author": {
    "name": "openshift"
  },
  "skills": [
    "./skills"
  ],
  "commands": [
    "./commands"
  ]
}
3
README.md
Normal file
@@ -0,0 +1,3 @@
# must-gather

A plugin to analyze and report on must-gather data
262
commands/analyze.md
Normal file
@@ -0,0 +1,262 @@
---
description: Quick analysis of must-gather data - runs all analysis scripts and provides comprehensive cluster diagnostics
argument-hint: [must-gather-path] [component]
---

## Name
must-gather:analyze

## Synopsis
```
/must-gather:analyze [must-gather-path] [component]
```

## Description

The `analyze` command performs comprehensive analysis of OpenShift must-gather diagnostic data. It runs specialized Python analysis scripts to extract and summarize cluster health information across multiple components.

The command can analyze:
- Cluster version and update status
- Cluster operator health (degraded, progressing, unavailable)
- Node conditions and resource status
- Pod failures, restarts, and crash loops
- Network configuration and OVN health
- OVN databases - logical topology, ACLs, pods
- Kubernetes events (warnings and errors)
- etcd cluster health and quorum status
- Persistent volume and claim status
- Prometheus alerts

You can request analysis of the entire cluster or focus on a specific component.

## Prerequisites

**Required Directory Structure:**

Must-gather data typically has this structure:
```
must-gather/
└── registry-ci-openshift-org-origin-...-sha256-<hash>/
    ├── cluster-scoped-resources/
    ├── namespaces/
    └── ...
```

The actual must-gather directory is the subdirectory with the hash name, not the parent directory.

**Required Scripts:**

Analysis scripts are bundled with this plugin at:
```
<plugin-root>/skills/must-gather-analyzer/scripts/
├── analyze_clusterversion.py
├── analyze_clusteroperators.py
├── analyze_nodes.py
├── analyze_pods.py
├── analyze_network.py
├── analyze_ovn_dbs.py
├── analyze_events.py
├── analyze_etcd.py
├── analyze_prometheus.py
└── analyze_pvs.py
```

Where `<plugin-root>` is the directory where this plugin is installed (typically `~/.cursor/commands/ai-helpers/plugins/must-gather/` or similar).

## Error Handling

**CRITICAL: Script-Only Analysis**

- **NEVER** attempt to analyze must-gather data directly using bash commands, grep, or manual file reading
- **ONLY** use the provided Python scripts in `plugins/must-gather/skills/must-gather-analyzer/scripts/`
- If scripts are missing:
  1. Stop immediately
  2. Inform the user that the analysis scripts are not available
  3. Ask the user to ensure the scripts are installed at the correct path
  4. Do NOT attempt alternative approaches

**Script Availability Check:**

Before running any analysis:

1. Locate the scripts directory by searching for a known script:
   ```bash
   SCRIPT_PATH=$(find ~ -name "analyze_clusteroperators.py" -path "*/must-gather/skills/must-gather-analyzer/scripts/*" 2>/dev/null | head -1)

   if [ -z "$SCRIPT_PATH" ]; then
     echo "ERROR: Must-gather analysis scripts not found."
     echo "Please ensure the must-gather plugin from ai-helpers is properly installed."
     exit 1
   fi

   # All scripts are in the same directory, so just get the directory
   SCRIPTS_DIR=$(dirname "$SCRIPT_PATH")
   ```

2. If scripts cannot be found, STOP and report to the user:
   ```
   The must-gather analysis scripts could not be located. Please ensure the must-gather plugin from openshift-eng/ai-helpers is properly installed in your Claude Code plugins directory.
   ```

## Implementation

The command performs the following steps:

1. **Validate Must-Gather Path**:
   - If the path is not provided as an argument, ask the user
   - Check if the path contains `cluster-scoped-resources/` and `namespaces/` directories
   - If the user provides the root directory, automatically find the correct subdirectory
   - Verify the path exists and is readable (a sketch of this check follows)
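
   A minimal sketch of this validation, assuming a hypothetical `MG_PATH` variable holding the user-supplied path (the variable and auto-descent logic are illustrative, not part of the plugin):

   ```bash
   MG_PATH="$1"

   # If the expected directories are not directly under the given path,
   # descend into the first subdirectory that contains them (the hash-named dir).
   if [ ! -d "$MG_PATH/cluster-scoped-resources" ] || [ ! -d "$MG_PATH/namespaces" ]; then
     for d in "$MG_PATH"/*/; do
       if [ -d "${d}cluster-scoped-resources" ] && [ -d "${d}namespaces" ]; then
         MG_PATH="$d"
         break
       fi
     done
   fi

   if [ ! -d "$MG_PATH/cluster-scoped-resources" ]; then
     echo "ERROR: $MG_PATH does not look like a must-gather directory." >&2
     exit 1
   fi
   ```
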
2. **Determine Analysis Scope**:

   **STEP 1: Check for SPECIFIC component keywords**

   If the user mentions a specific component, run ONLY that script:
   - "pods", "pod status", "containers", "crashloop", "failing pods" → `analyze_pods.py` ONLY
   - "etcd", "etcd health", "quorum" → `analyze_etcd.py` ONLY
   - "network", "networking", "ovn", "connectivity" → `analyze_network.py` ONLY
   - "ovn databases", "ovn-dbs", "ovn db", "logical switches", "acls" → `analyze_ovn_dbs.py` ONLY
   - "nodes", "node status", "node conditions" → `analyze_nodes.py` ONLY
   - "operators", "cluster operators", "degraded" → `analyze_clusteroperators.py` ONLY
   - "version", "cluster version", "update", "upgrade" → `analyze_clusterversion.py` ONLY
   - "events", "warnings", "errors" → `analyze_events.py` ONLY
   - "storage", "pv", "pvc", "volumes", "persistent" → `analyze_pvs.py` ONLY
   - "alerts", "prometheus", "monitoring" → `analyze_prometheus.py` ONLY

   **STEP 2: No specific component mentioned**

   For a generic request like "analyze must-gather", "/must-gather:analyze", or "check the cluster", run ALL scripts in this order (a sketch of the full pass follows the list):
   1. ClusterVersion (`analyze_clusterversion.py`)
   2. Cluster Operators (`analyze_clusteroperators.py`)
   3. Nodes (`analyze_nodes.py`)
   4. Pods - problems only (`analyze_pods.py --problems-only`)
   5. Network (`analyze_network.py`)
   6. Events - warnings only (`analyze_events.py --type Warning --count 50`)
   7. etcd (`analyze_etcd.py`)
   8. Storage (`analyze_pvs.py`)
   9. Monitoring (`analyze_prometheus.py`)
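
   A sketch of the full pass, assuming `$SCRIPTS_DIR` was set by the availability check above and a hypothetical `$MG_PATH` holds the validated must-gather path (the loop itself is illustrative, not part of the plugin):

   ```bash
   for script in analyze_clusterversion.py analyze_clusteroperators.py analyze_nodes.py; do
       python3 "$SCRIPTS_DIR/$script" "$MG_PATH"
   done
   python3 "$SCRIPTS_DIR/analyze_pods.py" "$MG_PATH" --problems-only
   python3 "$SCRIPTS_DIR/analyze_network.py" "$MG_PATH"
   python3 "$SCRIPTS_DIR/analyze_events.py" "$MG_PATH" --type Warning --count 50
   python3 "$SCRIPTS_DIR/analyze_etcd.py" "$MG_PATH"
   python3 "$SCRIPTS_DIR/analyze_pvs.py" "$MG_PATH"
   python3 "$SCRIPTS_DIR/analyze_prometheus.py" "$MG_PATH"
   ```
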
3. **Locate Plugin Scripts**:
   - Use the script availability check from the Error Handling section to find the plugin root
   - Store the scripts directory path in `$SCRIPTS_DIR`

4. **Execute Analysis Scripts**:
   ```bash
   python3 "$SCRIPTS_DIR/<script>.py" <must-gather-path>
   ```

   Example:
   ```bash
   python3 "$SCRIPTS_DIR/analyze_clusteroperators.py" ./must-gather.local.123/quay-io-...
   ```

5. **Synthesize Results**: Generate findings and recommendations based on script output

## Return Value

The command outputs structured analysis results to stdout:

**For Component-Specific Analysis:**
- Script output for the requested component only
- Focused findings and recommendations

**For Full Analysis:**
- Organized sections for each component
- Executive summary of overall cluster health
- Prioritized list of critical issues
- Actionable recommendations
- Suggested log files to review

## Output Structure

```
================================================================================
MUST-GATHER ANALYSIS SUMMARY
================================================================================

[Script outputs organized by component]

CLUSTER VERSION:
[output from analyze_clusterversion.py]

CLUSTER OPERATORS:
[output from analyze_clusteroperators.py]

NODES:
[output from analyze_nodes.py]

PROBLEMATIC PODS:
[output from analyze_pods.py --problems-only]

NETWORK STATUS:
[output from analyze_network.py]

WARNING EVENTS (Last 50):
[output from analyze_events.py --type Warning --count 50]

ETCD CLUSTER HEALTH:
[output from analyze_etcd.py]

STORAGE (PVs/PVCs):
[output from analyze_pvs.py]

MONITORING (Alerts):
[output from analyze_prometheus.py]

================================================================================
FINDINGS AND RECOMMENDATIONS
================================================================================

Critical Issues:
- [Critical problems requiring immediate attention]

Warnings:
- [Potential issues or degraded components]

Recommendations:
- [Specific next steps for investigation]

Logs to Review:
- [Specific log files to examine based on findings]
```

## Examples

1. **Full cluster analysis**:
   ```
   /must-gather:analyze ./must-gather/registry-ci-openshift-org-origin-4-20-...-sha256-abc123/
   ```
   Runs all analysis scripts and provides comprehensive cluster diagnostics.

2. **Analyze pod issues only**:
   ```
   /must-gather:analyze ./must-gather/registry-ci-openshift-org-origin-4-20-...-sha256-abc123/ analyze the pod statuses
   ```
   Runs only `analyze_pods.py` to focus on pod-related issues.

3. **Check etcd health**:
   ```
   /must-gather:analyze check etcd health
   ```
   Asks for the must-gather path, then runs only `analyze_etcd.py`.

4. **Network troubleshooting**:
   ```
   /must-gather:analyze ./must-gather/registry-ci-openshift-org-origin-4-20-...-sha256-abc123/ show me network issues
   ```
   Runs only `analyze_network.py` for network-specific analysis.

## Notes

- **Must-Gather Path**: Always use the subdirectory containing `cluster-scoped-resources/` and `namespaces/`, not the parent directory
- **Script Dependencies**: Analysis scripts must be executable and have the required Python dependencies installed
- **Error Handling**: If scripts are not found or the must-gather path is invalid, clear error messages are displayed
- **Cross-Referencing**: The analysis attempts to correlate issues across components (e.g., degraded operator → failing pods)
- **Pattern Detection**: Identifies patterns like multiple pod failures on the same node
- **Actionable Output**: Focuses on insights and recommendations rather than raw data dumps
- **Priority**: Issues are prioritized by severity (Critical > Warning > Info)

## Arguments

- **$1** (must-gather-path): Optional. Path to the must-gather directory (the subdirectory with the hash name). If not provided, the user will be asked.
- **$2+** (component): Optional. If keywords for a specific component are detected, only that component's analysis script will run. Otherwise, all scripts run.
266
commands/ovn-dbs.md
Normal file
@@ -0,0 +1,266 @@
---
description: Analyze OVN databases from a must-gather using ovsdb-tool
argument-hint: [must-gather-path]
---

## Name
must-gather:ovn-dbs

## Synopsis
```
/must-gather:ovn-dbs [must-gather-path] [--node <node-name>] [--query <json>]
```

## Description

The `ovn-dbs` command analyzes OVN Northbound and Southbound databases collected from OVN-Kubernetes clusters. It uses `ovsdb-tool` to query the binary database files (`.db`) collected per node, providing detailed information about the logical network topology, pods, ACLs, and routers on each node.

The command automatically maps ovnkube pods to their corresponding nodes by reading pod specifications from the must-gather data.

**Two modes of operation:**
1. **Standard Analysis** (default): Runs pre-built analysis showing switches, ports, ACLs, and routers
2. **Query Mode** (`--query`): Run custom OVSDB JSON queries for specific data extraction

**What it analyzes:**
- **Per-zone logical network topology**
- **Logical Switches** and their ports
- **Pod Logical Switch Ports** with namespace, pod name, and IP addresses
- **Access Control Lists (ACLs)** with priorities, directions, and match rules
- **Logical Routers** and their ports

**Important:** This command only works with must-gathers from interconnect (IC) mode clusters, where each node/zone has its own database files.

## Prerequisites

The must-gather should contain:
```
network_logs/
└── ovnk_database_store.tar.gz
```

**Required Tools:**

- `ovsdb-tool` must be installed (from the openvswitch package)
- Check with: `which ovsdb-tool`
- Install: `sudo dnf install openvswitch` or `sudo apt install openvswitch-common`

**Analysis Script:**

The script is bundled with this plugin:
```
<plugin-root>/skills/must-gather-analyzer/scripts/analyze_ovn_dbs.py
```

Where `<plugin-root>` is the directory where this plugin is installed (typically `~/.cursor/commands/ai-helpers/plugins/must-gather/` or similar).

Claude will automatically locate it by searching for the script in the plugin installation directory, regardless of your current working directory.

## Implementation

The command performs the following steps:

1. **Locate Analysis Script**:
   ```bash
   SCRIPT_PATH=$(find ~ -name "analyze_ovn_dbs.py" -path "*/must-gather/skills/must-gather-analyzer/scripts/*" 2>/dev/null | head -1)

   if [ -z "$SCRIPT_PATH" ]; then
     echo "ERROR: analyze_ovn_dbs.py script not found."
     echo "Please ensure the must-gather plugin from ai-helpers is properly installed."
     exit 1
   fi

   SCRIPTS_DIR=$(dirname "$SCRIPT_PATH")
   ```

2. **Extract Database Tarball**:
   - Locate `network_logs/ovnk_database_store.tar.gz`
   - Extract it if not already extracted
   - Find all `*_nbdb` and `*_sbdb` files (a sketch of this step follows)
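
   A sketch of this step, assuming the must-gather path is in a hypothetical `MG_PATH` variable and that extracting next to the tarball is acceptable:

   ```bash
   TARBALL="$MG_PATH/network_logs/ovnk_database_store.tar.gz"
   DB_DIR="$MG_PATH/network_logs/ovnk_database_store"

   if [ ! -d "$DB_DIR" ]; then
       mkdir -p "$DB_DIR"
       tar -xzf "$TARBALL" -C "$DB_DIR"
   fi

   # One Northbound (*_nbdb) and one Southbound (*_sbdb) database per zone/node.
   find "$DB_DIR" \( -name '*_nbdb' -o -name '*_sbdb' \) -type f
   ```
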
3. **Query Each Zone's Database**:
   For each zone (node), query the Northbound database using `ovsdb-tool query`:

   ```bash
   ovsdb-tool query <zone>_nbdb '["OVN_Northbound", {"op":"select", "table":"<table>", "where":[], "columns":[...]}]'
   ```
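
   For example, a query that lists pod logical switch ports in one zone's Northbound database might look like this (the `includes` match on `external_ids` follows OVSDB condition syntax; treat it as an illustrative sketch rather than the exact query the script issues):

   ```bash
   ovsdb-tool query <zone>_nbdb '["OVN_Northbound",
     {"op": "select",
      "table": "Logical_Switch_Port",
      "where": [["external_ids", "includes", ["map", [["pod", "true"]]]]],
      "columns": ["name", "addresses", "external_ids"]}]'
   ```
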
4. **Analyze and Display**:
   - **Logical Switches**: Names and port counts
   - **Logical Switch Ports**: Filter for pods (external_ids.pod=true), show namespace, pod name, and IP
   - **ACLs**: Priority, direction, match rules, and actions
   - **Logical Routers**: Names and port counts

5. **Present Zone Summary**:
   - Total counts per zone
   - Detailed breakdowns
   - Sorted and formatted output

## Return Value

The command outputs structured analysis for each node:

```
Found 6 node(s)

================================================================================
Node: ip-10-0-26-145.us-east-2.compute.internal
Pod: ovnkube-node-79cbh
================================================================================
Logical Switches: 4
Logical Switch Ports: 55
ACLs: 7
Logical Routers: 2

LOGICAL SWITCHES (4):
NAME                                            PORTS
--------------------------------------------------------------------------------
transit_switch                                  6
ip-10-0-1-10.us-east-2.compute.internal         7
ext_ip-10-0-1-10.us-east-2.compute.internal     2
join                                            2

POD LOGICAL SWITCH PORTS (5):
NAMESPACE                 POD                         IP
------------------------------------------------------------------------------------------------------------------------
openshift-dns             dns-default-abc123          10.128.0.5
openshift-monitoring      prometheus-k8s-0            10.128.0.10
openshift-etcd            etcd-master-0               10.128.0.3
...

ACCESS CONTROL LISTS (7):
PRIORITY   DIRECTION    ACTION          MATCH
------------------------------------------------------------------------------------------------------------------------
1012       from-lport   allow           inport == @a4743249366342378346 && (ip4.mcast ...
1011       to-lport     drop            (ip4.mcast || mldv1 || mldv2 || ...
1001       to-lport     allow-related   ip4.src==10.128.0.2
...

LOGICAL ROUTERS (2):
NAME                                            PORTS
--------------------------------------------------------------------------------
ovn_cluster_router                              3
GR_ip-10-0-1-10.us-east-2.compute.internal      2
```

## Examples

1. **Analyze all nodes in a must-gather**:
   ```
   /must-gather:ovn-dbs ./must-gather/registry-ci-openshift-org-origin-4-20-...-sha256-abc123/
   ```
   Shows logical network topology for all nodes.

2. **Analyze specific node**:
   ```
   /must-gather:ovn-dbs ./must-gather/.../ --node ip-10-0-26-145
   ```
   Shows OVN database information only for the specified node (supports partial name matching).

3. **Analyze master node**:
   ```
   /must-gather:ovn-dbs ./must-gather/.../ --node master-0
   ```
   Filters to a specific master node using partial name matching.

4. **Interactive usage without path**:
   ```
   /must-gather:ovn-dbs
   ```
   The command will ask for the must-gather path.

5. **Check if pod exists in OVN**:
   ```
   /must-gather:ovn-dbs ./must-gather/.../
   ```
   Then search the output for the pod name to see which node it's on and its IP allocation.

6. **Investigate ACL rules on a specific node**:
   ```
   /must-gather:ovn-dbs ./must-gather/.../ --node worker-1
   ```
   Review the ACL section for a specific node to understand traffic filtering rules.

7. **Run custom OVSDB query** (Query Mode):
   ```
   /must-gather:ovn-dbs ./must-gather/.../ --query '["OVN_Northbound", {"op":"select", "table":"ACL", "where":[["priority", ">", 1000]], "columns":["priority","match","action"]}]'
   ```
   Queries ACLs with priority > 1000 across all nodes. Claude can construct the JSON query for any OVSDB table.

8. **Query specific node with custom query**:
   ```
   /must-gather:ovn-dbs ./must-gather/.../ --node master-0 --query '["OVN_Northbound", {"op":"select", "table":"Logical_Switch", "where":[], "columns":["name","ports"]}]'
   ```
   Lists all logical switches with their ports on master-0.

9. **Query specific table** (Claude constructs JSON):
   Just ask Claude to query a specific OVSDB table and it will construct the appropriate JSON query. For example:
   - "Show all Logical_Router_Static_Route entries"
   - "Find ACLs with action 'drop'"
   - "List Logical_Switch_Port entries where external_ids contains 'openshift-etcd'"

## Error Handling

**Missing ovsdb-tool:**
```
Error: ovsdb-tool not found. Please install openvswitch package.
```
Solution: Install openvswitch: `sudo dnf install openvswitch`

**Missing database tarball:**
```
Error: Database tarball not found: network_logs/ovnk_database_store.tar.gz
```
Solution: Ensure this is a must-gather from an OVN cluster.

**Node not found:**
```
Error: No databases found for node matching 'master-5'

Available nodes:
  - ip-10-0-77-117.us-east-2.compute.internal
  - ip-10-0-26-145.us-east-2.compute.internal
  - ip-10-0-1-194.us-east-2.compute.internal
```
Solution: Use one of the listed node names or a partial match.

## Notes

- **Binary Database Format**: Uses `ovsdb-tool` to read OVSDB binary files directly
- **Per-Node Analysis**: Each node in IC mode has its own database (one NB and one SB per zone)
- **Node Mapping**: Automatically correlates ovnkube pods to nodes by reading pod specs from the must-gather
- **Pod Discovery**: Pods are identified by `external_ids` with `pod=true`
- **IP Extraction**: Pod IPs are parsed from the `addresses` field (format: "MAC IP")
- **ACL Priorities**: Higher-priority ACLs are processed first (shown at top)
- **Node Filtering**: Supports partial name matching for convenience (e.g., "--node master" matches all masters)
- **Query Mode**: Accepts raw OVSDB JSON queries in the format `["OVN_Northbound", {"op":"select", "table":"...", ...}]`
- **Claude Query Construction**: Claude can automatically construct OVSDB JSON queries based on natural language requests
- **Performance**: Querying large databases may take a few seconds per node

## Use Cases

1. **Verify Pod Network Configuration**:
   - Check if pods are registered in OVN
   - Verify IP address assignments
   - Confirm logical switch port creation

2. **Troubleshoot Connectivity Issues**:
   - Review ACL rules blocking traffic
   - Check if pods are in the correct logical switches
   - Verify router configurations

3. **Understand Topology**:
   - See how zones are interconnected via transit_switch
   - Review gateway router configurations
   - Understand the logical network structure

4. **Audit Network Policies**:
   - See ACL rules generated from NetworkPolicies
   - Identify overly permissive or restrictive rules
   - Check rule priorities and match conditions

## Arguments

- **$1** (must-gather-path): Optional. Path to the must-gather directory containing network_logs/. If not provided, the user will be prompted.
- **--node, -n** (node-name): Optional. Filter analysis to a specific node. Supports partial name matching (e.g., "master-0", "ip-10-0-26-145"). If no match is found, displays the list of available nodes.
- **--query, -q** (json-query): Optional. Run a raw OVSDB JSON query instead of the standard analysis. Claude can construct the JSON query based on the OVSDB transaction format. When provided, outputs raw JSON results instead of the formatted analysis.
93
plugin.lock.json
Normal file
@@ -0,0 +1,93 @@
{
  "$schema": "internal://schemas/plugin.lock.v1.json",
  "pluginId": "gh:openshift-eng/ai-helpers:plugins/must-gather",
  "normalized": {
    "repo": null,
    "ref": "refs/tags/v20251128.0",
    "commit": "5e7aea9c51347184db2f2a1db1029335d5e6b4b6",
    "treeHash": "6642a98be3c0bebc6a688aeb737c645245616a0bd5a2c1612600b9a431d03716",
    "generatedAt": "2025-11-28T10:27:30.749372Z",
    "toolVersion": "publish_plugins.py@0.2.0"
  },
  "origin": {
    "remote": "git@github.com:zhongweili/42plugin-data.git",
    "branch": "master",
    "commit": "aa1497ed0949fd50e99e70d6324a29c5b34f9390",
    "repoRoot": "/Users/zhongweili/projects/openmind/42plugin-data"
  },
  "manifest": {
    "name": "must-gather",
    "description": "A plugin to analyze and report on must-gather data",
    "version": "0.0.1"
  },
  "content": {
    "files": [
      {
        "path": "README.md",
        "sha256": "62224e0cac5af83035e53ea011e387f43c066eef4c2dd942ec536612c6d5a53c"
      },
      {
        "path": ".claude-plugin/plugin.json",
        "sha256": "9efe7a9dce84b69f125de43024e8760070e603b79d161e1662ea485eae8d02f9"
      },
      {
        "path": "commands/analyze.md",
        "sha256": "d4aa3d91663e16df1ff5176f992e62b99d9b7a0941fed5b8b24587ebcca23a7c"
      },
      {
        "path": "commands/ovn-dbs.md",
        "sha256": "768eb9b9489dae9c511f9d25b3472cbd878d44d06dfa73f29872c5e62c7e3aeb"
      },
      {
        "path": "skills/must-gather-analyzer/SKILL.md",
        "sha256": "7d269e9a90a45600af26df99247f11152b69e9fb3eb09813f2e830dcecd21503"
      },
      {
        "path": "skills/must-gather-analyzer/scripts/analyze_prometheus.py",
        "sha256": "5fd3f1580bf58cc8a0d967d8a4b08654f85c19ee2f3c3202e6e8e6db2469d56b"
      },
      {
        "path": "skills/must-gather-analyzer/scripts/analyze_pvs.py",
        "sha256": "d05ef93f72853eb9c43b19a53d0d1765a9620fef08643e5ce2ce86559dd7f89f"
      },
      {
        "path": "skills/must-gather-analyzer/scripts/analyze_pods.py",
        "sha256": "dc23a7d5822a5f572a2373fc30de6609f701a270bc6f1d0837efcfbada88d00e"
      },
      {
        "path": "skills/must-gather-analyzer/scripts/analyze_network.py",
        "sha256": "6316796bfd2f4bc463fa4681b7b81198dcb9fba513bcec7b31f621314019470f"
      },
      {
        "path": "skills/must-gather-analyzer/scripts/analyze_events.py",
        "sha256": "ff02feac6053c13a3e3a6c4e83c1cb5fc1e5818ada860f351c5dccb7d54724c6"
      },
      {
        "path": "skills/must-gather-analyzer/scripts/analyze_ovn_dbs.py",
        "sha256": "be537b03e1fee5e48ce936b544387e81aa6089a07893fa5ab329cc347980dde9"
      },
      {
        "path": "skills/must-gather-analyzer/scripts/analyze_clusterversion.py",
        "sha256": "f1110d963ba18942861c25a7234d9a01cc0798c4ef76534a5631de857d96805b"
      },
      {
        "path": "skills/must-gather-analyzer/scripts/analyze_nodes.py",
        "sha256": "90c8e2a9fa59b60946d5789cdaa2c602c5089227b4ec7f202c690b21b738d7c4"
      },
      {
        "path": "skills/must-gather-analyzer/scripts/analyze_clusteroperators.py",
        "sha256": "51915d87c97af925f39fbccd16964bde4aafed62c28830882bb544f3ef8d8533"
      },
      {
        "path": "skills/must-gather-analyzer/scripts/analyze_etcd.py",
        "sha256": "03c6ef97197427acbce2c503d4644a113c01aef5e033ddd8ef464bba22e57962"
      }
    ],
    "dirSha256": "6642a98be3c0bebc6a688aeb737c645245616a0bd5a2c1612600b9a431d03716"
  },
  "security": {
    "scannedAt": null,
    "scannerVersion": null,
    "flags": []
  }
}
285
skills/must-gather-analyzer/SKILL.md
Normal file
@@ -0,0 +1,285 @@
---
name: Must-Gather Analyzer
description: |
  Analyze OpenShift must-gather diagnostic data including cluster operators, pods, nodes,
  and network components. Use this skill when the user asks about cluster health, operator status,
  pod issues, node conditions, or wants diagnostic insights from must-gather data.

  Triggers: "analyze must-gather", "check cluster health", "operator status", "pod issues",
  "node status", "failing pods", "degraded operators", "cluster problems", "crashlooping",
  "network issues", "etcd health", "analyze clusteroperators", "analyze pods", "analyze nodes"
---

# Must-Gather Analyzer Skill

Comprehensive analysis of OpenShift must-gather diagnostic data with helper scripts that parse YAML and display output in `oc`-like format.

## Overview

This skill provides analysis for:
- **ClusterVersion**: Current version, update status, and capabilities
- **Cluster Operators**: Status, degradation, and availability
- **Pods**: Health, restarts, crashes, and failures across namespaces
- **Nodes**: Conditions, capacity, and readiness
- **Network**: OVN/SDN diagnostics and connectivity
- **Events**: Warning and error events across namespaces
- **etcd**: Cluster health, member status, and quorum
- **Storage**: PersistentVolumes and PersistentVolumeClaims status

## Must-Gather Directory Structure

**Important**: Must-gather data is contained in a subdirectory with a long hash name:
```
must-gather/
└── registry-ci-openshift-org-origin-...-sha256-<hash>/
    ├── cluster-scoped-resources/
    │   ├── config.openshift.io/clusteroperators/
    │   └── core/nodes/
    ├── namespaces/
    │   └── <namespace>/
    │       └── pods/
    │           └── <pod-name>/
    │               └── <pod-name>.yaml
    └── network_logs/
```

The analysis scripts expect the path to the **subdirectory** (the one with the hash), not the root must-gather folder.

## Instructions

### 1. Get Must-Gather Path
Ask the user for the must-gather directory path if not already provided.
- If they provide the root directory, look for the subdirectory with the hash name
- The correct path contains `cluster-scoped-resources/` and `namespaces/` directories

### 2. Choose Analysis Type

Based on the user's request, run the appropriate helper script:

#### ClusterVersion Analysis
```bash
./scripts/analyze_clusterversion.py <must-gather-path>
```

Shows cluster version information similar to `oc get clusterversion`:
- Current version and update status
- Progressing state
- Available updates
- Version conditions
- Enabled capabilities
- Update history

#### Cluster Operators Analysis
```bash
./scripts/analyze_clusteroperators.py <must-gather-path>
```

Shows cluster operator status similar to `oc get clusteroperators`:
- Available, Progressing, Degraded conditions
- Version information
- Time since condition change
- Detailed messages for operators with issues

#### Pods Analysis
```bash
# All namespaces
./scripts/analyze_pods.py <must-gather-path>

# Specific namespace
./scripts/analyze_pods.py <must-gather-path> --namespace <namespace>

# Show only problematic pods
./scripts/analyze_pods.py <must-gather-path> --problems-only
```

Shows pod status similar to `oc get pods -A`:
- Ready/Total containers
- Status (Running, Pending, CrashLoopBackOff, etc.)
- Restart counts
- Age
- Categorized issues (crashlooping, pending, failed)

#### Nodes Analysis
```bash
./scripts/analyze_nodes.py <must-gather-path>

# Show only nodes with issues
./scripts/analyze_nodes.py <must-gather-path> --problems-only
```

Shows node status similar to `oc get nodes`:
- Ready status
- Roles (master, worker)
- Age
- Kubernetes version
- Node conditions (DiskPressure, MemoryPressure, etc.)
- Capacity and allocatable resources

#### Network Analysis
```bash
./scripts/analyze_network.py <must-gather-path>
```

Shows network health:
- Network type (OVN-Kubernetes, OpenShift SDN)
- Network operator status
- OVN pod health
- PodNetworkConnectivityCheck results
- Network-related issues

#### Events Analysis
```bash
# Recent events (last 100)
./scripts/analyze_events.py <must-gather-path>

# Warning events only
./scripts/analyze_events.py <must-gather-path> --type Warning

# Events in specific namespace
./scripts/analyze_events.py <must-gather-path> --namespace openshift-etcd

# Show last 50 events
./scripts/analyze_events.py <must-gather-path> --count 50
```

Shows cluster events:
- Event type (Warning, Normal)
- Last seen timestamp
- Reason and message
- Affected object
- Event count

#### etcd Analysis
```bash
./scripts/analyze_etcd.py <must-gather-path>
```

Shows etcd cluster health:
- Member health status
- Member list with IDs and URLs
- Endpoint status (leader, version, DB size)
- Quorum status
- Cluster summary

#### Storage Analysis
```bash
# All PVs and PVCs
./scripts/analyze_pvs.py <must-gather-path>

# PVCs in specific namespace
./scripts/analyze_pvs.py <must-gather-path> --namespace openshift-monitoring
```

Shows storage resources:
- PersistentVolumes (capacity, status, claims)
- PersistentVolumeClaims (binding, capacity)
- Storage classes
- Pending/unbound volumes

#### Monitoring Analysis
```bash
# All alerts
./scripts/analyze_prometheus.py <must-gather-path>

# Alerts in specific namespace
./scripts/analyze_prometheus.py <must-gather-path> --namespace openshift-monitoring
```

Shows monitoring information:
- Alerts (state, namespace, name, active since, labels)
- Totals for pending and firing alerts

### 3. Interpret and Report

After running the scripts:
1. Review the summary statistics
2. Focus on items flagged with issues
3. Provide actionable insights and next steps
4. Suggest log analysis for specific components if needed
5. Cross-reference issues (e.g., degraded operator → failing pods → node issues)

## Output Format

All scripts provide:
- **Summary Section**: High-level statistics with emoji indicators
- **Table View**: `oc`-like formatted output
- **Issues Section**: Detailed breakdown of problems

Example summary format:
```
================================================================================
SUMMARY: 25/28 operators healthy
  ⚠️  3 operators with issues
  🔄 1 progressing
  ❌ 2 degraded
================================================================================
```

## Helper Scripts Reference

### scripts/analyze_clusterversion.py
Parses: `cluster-scoped-resources/config.openshift.io/clusterversions/version.yaml`
Output: ClusterVersion table with detailed version info, conditions, and capabilities

### scripts/analyze_clusteroperators.py
Parses: `cluster-scoped-resources/config.openshift.io/clusteroperators/`
Output: ClusterOperator status table with conditions

### scripts/analyze_pods.py
Parses: `namespaces/*/pods/*/*.yaml` (individual pod directories)
Output: Pod status table with issues categorized

### scripts/analyze_nodes.py
Parses: `cluster-scoped-resources/core/nodes/`
Output: Node status table with conditions and capacity

### scripts/analyze_network.py
Parses: `network_logs/`, network operator, OVN resources
Output: Network health summary and diagnostics

### scripts/analyze_events.py
Parses: `namespaces/*/core/events.yaml`
Output: Event table sorted by last occurrence

### scripts/analyze_etcd.py
Parses: `etcd_info/` (endpoint_health.json, member_list.json, endpoint_status.json)
Output: etcd cluster health and member status

### scripts/analyze_pvs.py
Parses: `cluster-scoped-resources/core/persistentvolumes/`, `namespaces/*/core/persistentvolumeclaims.yaml`
Output: PV and PVC status tables

## Tips for Analysis

1. **Start with Cluster Operators**: They often reveal system-wide issues
2. **Check Timing**: Look at "SINCE" columns to understand when issues started
3. **Follow Dependencies**: Degraded operator → check its namespace pods → check hosting nodes
4. **Look for Patterns**: Multiple pods failing on the same node suggests a node issue
5. **Cross-reference**: Use multiple scripts together for a complete picture

## Common Scenarios

### "Why is my cluster degraded?"
1. Run `analyze_clusteroperators.py` - identify degraded operators
2. Run `analyze_pods.py --namespace <operator-namespace>` - check operator pods
3. Run `analyze_nodes.py` - verify node health (a sketch of this workflow follows)
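
A sketch of that workflow, assuming the must-gather path is in `$MG_PATH` and the degraded operator's pods run in a hypothetical `openshift-example-operator` namespace:

```bash
./scripts/analyze_clusteroperators.py "$MG_PATH"
./scripts/analyze_pods.py "$MG_PATH" --namespace openshift-example-operator
./scripts/analyze_nodes.py "$MG_PATH" --problems-only
```
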
### "Pods keep crashing"
1. Run `analyze_pods.py --problems-only` - find crashlooping pods
2. Check which nodes they're on
3. Run `analyze_nodes.py` - verify node conditions
4. Suggest checking pod logs in must-gather data

### "Network connectivity issues"
1. Run `analyze_network.py` - check network health
2. Run `analyze_pods.py --namespace openshift-ovn-kubernetes`
3. Check PodNetworkConnectivityCheck results

## Next Steps After Analysis

Based on findings, suggest:
- Examining specific pod logs in `namespaces/<ns>/pods/<pod>/<container>/logs/`
- Reviewing events in `namespaces/<ns>/core/events.yaml`
- Checking audit logs in `audit_logs/`
- Analyzing metrics data if available
- Looking at host service logs in `host_service_logs/`
199
skills/must-gather-analyzer/scripts/analyze_clusteroperators.py
Executable file
@@ -0,0 +1,199 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Analyze ClusterOperator resources from must-gather data.
|
||||
Displays output similar to 'oc get clusteroperators' command.
|
||||
"""
|
||||
|
||||
import sys
|
||||
import os
|
||||
import yaml
|
||||
from pathlib import Path
|
||||
from datetime import datetime
|
||||
from typing import List, Dict, Any, Optional
|
||||
|
||||
|
||||
def parse_clusteroperator(file_path: Path) -> Optional[Dict[str, Any]]:
|
||||
"""Parse a single clusteroperator YAML file."""
|
||||
try:
|
||||
with open(file_path, 'r') as f:
|
||||
doc = yaml.safe_load(f)
|
||||
if doc and doc.get('kind') == 'ClusterOperator':
|
||||
return doc
|
||||
except Exception as e:
|
||||
print(f"Warning: Failed to parse {file_path}: {e}", file=sys.stderr)
|
||||
return None
|
||||
|
||||
|
||||
def get_condition_status(conditions: List[Dict], condition_type: str) -> tuple[str, str, str]:
|
||||
"""
|
||||
Get status, reason, and message for a specific condition type.
|
||||
Returns (status, reason, message).
|
||||
"""
|
||||
for condition in conditions:
|
||||
if condition.get('type') == condition_type:
|
||||
status = condition.get('status', 'Unknown')
|
||||
reason = condition.get('reason', '')
|
||||
message = condition.get('message', '')
|
||||
return status, reason, message
|
||||
return 'Unknown', '', ''
|
||||
|
||||
|
||||
def calculate_duration(timestamp_str: str) -> str:
|
||||
"""Calculate duration from timestamp to now."""
|
||||
try:
|
||||
# Parse Kubernetes timestamp format
|
||||
ts = datetime.fromisoformat(timestamp_str.replace('Z', '+00:00'))
|
||||
now = datetime.now(ts.tzinfo)
|
||||
delta = now - ts
|
||||
|
||||
days = delta.days
|
||||
hours = delta.seconds // 3600
|
||||
minutes = (delta.seconds % 3600) // 60
|
||||
|
||||
if days > 0:
|
||||
return f"{days}d"
|
||||
elif hours > 0:
|
||||
return f"{hours}h"
|
||||
elif minutes > 0:
|
||||
return f"{minutes}m"
|
||||
else:
|
||||
return "<1m"
|
||||
except Exception:
|
||||
return "unknown"
|
||||
|
||||
|
||||
def get_condition_duration(conditions: List[Dict], condition_type: str) -> str:
|
||||
"""Get the duration since a condition last transitioned."""
|
||||
for condition in conditions:
|
||||
if condition.get('type') == condition_type:
|
||||
last_transition = condition.get('lastTransitionTime')
|
||||
if last_transition:
|
||||
return calculate_duration(last_transition)
|
||||
return ""
|
||||
|
||||
|
||||
def format_operator_row(operator: Dict[str, Any]) -> Dict[str, str]:
|
||||
"""Format a ClusterOperator into a row for display."""
|
||||
name = operator.get('metadata', {}).get('name', 'unknown')
|
||||
conditions = operator.get('status', {}).get('conditions', [])
|
||||
versions = operator.get('status', {}).get('versions', [])
|
||||
|
||||
# Get version (first version in the list, usually the operator version)
|
||||
version = versions[0].get('version', '') if versions else ''
|
||||
|
||||
# Get condition statuses
|
||||
available_status, _, _ = get_condition_status(conditions, 'Available')
|
||||
progressing_status, _, _ = get_condition_status(conditions, 'Progressing')
|
||||
degraded_status, degraded_reason, degraded_msg = get_condition_status(conditions, 'Degraded')
|
||||
|
||||
# Determine which condition to show duration and message for
|
||||
# Priority: Degraded > Progressing > Available (if false)
|
||||
if degraded_status == 'True':
|
||||
since = get_condition_duration(conditions, 'Degraded')
|
||||
message = degraded_msg if degraded_msg else degraded_reason
|
||||
elif progressing_status == 'True':
|
||||
since = get_condition_duration(conditions, 'Progressing')
|
||||
_, prog_reason, prog_msg = get_condition_status(conditions, 'Progressing')
|
||||
message = prog_msg if prog_msg else prog_reason
|
||||
elif available_status == 'False':
|
||||
since = get_condition_duration(conditions, 'Available')
|
||||
_, avail_reason, avail_msg = get_condition_status(conditions, 'Available')
|
||||
message = avail_msg if avail_msg else avail_reason
|
||||
else:
|
||||
# All good, show time since available
|
||||
since = get_condition_duration(conditions, 'Available')
|
||||
message = ''
|
||||
|
||||
return {
|
||||
'name': name,
|
||||
'version': version,
|
||||
'available': available_status,
|
||||
'progressing': progressing_status,
|
||||
'degraded': degraded_status,
|
||||
'since': since,
|
||||
'message': message
|
||||
}
|
||||
|
||||
|
||||
def print_operators_table(operators: List[Dict[str, str]]):
|
||||
"""Print operators in a formatted table like 'oc get clusteroperators'."""
|
||||
if not operators:
|
||||
print("No resources found.")
|
||||
return
|
||||
|
||||
# Print header - no width limit on VERSION to match oc output
|
||||
print(f"{'NAME':<42} {'VERSION':<50} {'AVAILABLE':<11} {'PROGRESSING':<13} {'DEGRADED':<10} {'SINCE':<7} MESSAGE")
|
||||
|
||||
# Print rows
|
||||
for op in operators:
|
||||
name = op['name'][:42]
|
||||
version = op['version'] # Don't truncate version
|
||||
available = op['available'][:11]
|
||||
progressing = op['progressing'][:13]
|
||||
degraded = op['degraded'][:10]
|
||||
since = op['since'][:7]
|
||||
message = op['message']
|
||||
|
||||
print(f"{name:<42} {version:<50} {available:<11} {progressing:<13} {degraded:<10} {since:<7} {message}")
|
||||
|
||||
|
||||
def analyze_clusteroperators(must_gather_path: str):
|
||||
"""Analyze all clusteroperators in a must-gather directory."""
|
||||
base_path = Path(must_gather_path)
|
||||
|
||||
# Common paths where clusteroperators might be
|
||||
possible_patterns = [
|
||||
"cluster-scoped-resources/config.openshift.io/clusteroperators/*.yaml",
|
||||
"*/cluster-scoped-resources/config.openshift.io/clusteroperators/*.yaml",
|
||||
]
|
||||
|
||||
clusteroperators = []
|
||||
|
||||
# Find and parse all clusteroperator files
|
||||
for pattern in possible_patterns:
|
||||
for co_file in base_path.glob(pattern):
|
||||
operator = parse_clusteroperator(co_file)
|
||||
if operator:
|
||||
clusteroperators.append(operator)
|
||||
|
||||
if not clusteroperators:
|
||||
print("No resources found.", file=sys.stderr)
|
||||
return 1
|
||||
|
||||
# Remove duplicates (same operator from different glob patterns)
|
||||
seen = set()
|
||||
unique_operators = []
|
||||
for op in clusteroperators:
|
||||
name = op.get('metadata', {}).get('name')
|
||||
if name and name not in seen:
|
||||
seen.add(name)
|
||||
unique_operators.append(op)
|
||||
|
||||
# Format and sort operators by name
|
||||
formatted_ops = [format_operator_row(op) for op in unique_operators]
|
||||
formatted_ops.sort(key=lambda x: x['name'])
|
||||
|
||||
# Print results
|
||||
print_operators_table(formatted_ops)
|
||||
|
||||
return 0
|
||||
|
||||
|
||||
def main():
|
||||
if len(sys.argv) < 2:
|
||||
print("Usage: analyze_clusteroperators.py <must-gather-directory>", file=sys.stderr)
|
||||
print("\nExample:", file=sys.stderr)
|
||||
print(" analyze_clusteroperators.py ./must-gather.local.123456789", file=sys.stderr)
|
||||
return 1
|
||||
|
||||
must_gather_path = sys.argv[1]
|
||||
|
||||
if not os.path.isdir(must_gather_path):
|
||||
print(f"Error: Directory not found: {must_gather_path}", file=sys.stderr)
|
||||
return 1
|
||||
|
||||
return analyze_clusteroperators(must_gather_path)
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
sys.exit(main())
|
||||
261
skills/must-gather-analyzer/scripts/analyze_clusterversion.py
Executable file
@@ -0,0 +1,261 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Analyze ClusterVersion from must-gather data.
|
||||
Displays output similar to 'oc get clusterversion' command.
|
||||
"""
|
||||
|
||||
import sys
|
||||
import os
|
||||
import yaml
|
||||
from pathlib import Path
|
||||
from datetime import datetime
|
||||
from typing import Dict, Any, Optional
|
||||
|
||||
|
||||
def parse_clusterversion(file_path: Path) -> Optional[Dict[str, Any]]:
|
||||
"""Parse the clusterversion YAML file."""
|
||||
try:
|
||||
with open(file_path, 'r') as f:
|
||||
doc = yaml.safe_load(f)
|
||||
if doc and doc.get('kind') == 'ClusterVersion':
|
||||
return doc
|
||||
except Exception as e:
|
||||
print(f"Warning: Failed to parse {file_path}: {e}", file=sys.stderr)
|
||||
return None
|
||||
|
||||
|
||||
def get_condition_status(conditions: list, condition_type: str) -> str:
|
||||
"""Get status for a specific condition type."""
|
||||
for condition in conditions:
|
||||
if condition.get('type') == condition_type:
|
||||
return condition.get('status', 'Unknown')
|
||||
return 'Unknown'
|
||||
|
||||
|
||||
def calculate_duration(timestamp_str: str) -> str:
|
||||
"""Calculate duration from timestamp to now."""
|
||||
try:
|
||||
ts = datetime.fromisoformat(timestamp_str.replace('Z', '+00:00'))
|
||||
now = datetime.now(ts.tzinfo)
|
||||
delta = now - ts
|
||||
|
||||
days = delta.days
|
||||
hours = delta.seconds // 3600
|
||||
minutes = (delta.seconds % 3600) // 60
|
||||
|
||||
if days > 0:
|
||||
return f"{days}d"
|
||||
elif hours > 0:
|
||||
return f"{hours}h"
|
||||
elif minutes > 0:
|
||||
return f"{minutes}m"
|
||||
else:
|
||||
return "<1m"
|
||||
except Exception:
|
||||
return ""
|
||||
|
||||
|
||||
def format_clusterversion(cv: Dict[str, Any]) -> Dict[str, str]:
|
||||
"""Format ClusterVersion for display."""
|
||||
name = cv.get('metadata', {}).get('name', 'version')
|
||||
status = cv.get('status', {})
|
||||
|
||||
# Get version from desired
|
||||
desired = status.get('desired', {})
|
||||
version = desired.get('version', '')
|
||||
|
||||
# Get available updates count
|
||||
available_updates = status.get('availableUpdates')
|
||||
if available_updates and isinstance(available_updates, list):
|
||||
available = str(len(available_updates))
|
||||
elif available_updates is None:
|
||||
available = ''
|
||||
else:
|
||||
available = '0'
|
||||
|
||||
# Get conditions
|
||||
conditions = status.get('conditions', [])
|
||||
progressing = get_condition_status(conditions, 'Progressing')
|
||||
since = ''
|
||||
|
||||
# Get time since progressing started (if true) or since last update
|
||||
for condition in conditions:
|
||||
if condition.get('type') == 'Progressing':
|
||||
last_transition = condition.get('lastTransitionTime')
|
||||
if last_transition:
|
||||
since = calculate_duration(last_transition)
|
||||
break
|
||||
|
||||
# Get status message
|
||||
status_msg = ''
|
||||
for condition in conditions:
|
||||
if condition.get('type') == 'Progressing' and condition.get('status') == 'True':
|
||||
status_msg = condition.get('message', '')[:80]
|
||||
break
|
||||
|
||||
# If not progressing, check if failed
|
||||
if progressing != 'True':
|
||||
for condition in conditions:
|
||||
if condition.get('type') == 'Failing' and condition.get('status') == 'True':
|
||||
status_msg = condition.get('message', '')[:80]
|
||||
break
|
||||
|
||||
return {
|
||||
'name': name,
|
||||
'version': version,
|
||||
'available': available,
|
||||
'progressing': progressing,
|
||||
'since': since,
|
||||
'status': status_msg
|
||||
}
|
||||
|
||||
|
||||
def print_clusterversion_table(cv_info: Dict[str, str]):
|
||||
"""Print ClusterVersion in a formatted table like 'oc get clusterversion'."""
|
||||
# Print header
|
||||
print(f"{'NAME':<10} {'VERSION':<50} {'AVAILABLE':<11} {'PROGRESSING':<13} {'SINCE':<7} STATUS")
|
||||
|
||||
# Print row
|
||||
name = cv_info['name'][:10]
|
||||
version = cv_info['version'][:50]
|
||||
available = cv_info['available'][:11]
|
||||
progressing = cv_info['progressing'][:13]
|
||||
since = cv_info['since'][:7]
|
||||
status = cv_info['status']
|
||||
|
||||
print(f"{name:<10} {version:<50} {available:<11} {progressing:<13} {since:<7} {status}")
|
||||
|
||||
|
||||
def print_detailed_info(cv: Dict[str, Any]):
|
||||
"""Print detailed cluster version information."""
|
||||
status = cv.get('status', {})
|
||||
spec = cv.get('spec', {})
|
||||
|
||||
print(f"\n{'='*80}")
|
||||
print("CLUSTER VERSION DETAILS")
|
||||
print(f"{'='*80}")
|
||||
|
||||
# Cluster ID
|
||||
cluster_id = spec.get('clusterID', 'unknown')
|
||||
print(f"Cluster ID: {cluster_id}")
|
||||
|
||||
# Desired version
|
||||
desired = status.get('desired', {})
|
||||
print(f"Desired Version: {desired.get('version', 'unknown')}")
|
||||
print(f"Desired Image: {desired.get('image', 'unknown')}")
|
||||
|
||||
# Version hash
|
||||
version_hash = status.get('versionHash', '')
|
||||
if version_hash:
|
||||
print(f"Version Hash: {version_hash}")
|
||||
|
||||
# Upstream
|
||||
upstream = spec.get('upstream', '')
|
||||
if upstream:
|
||||
print(f"Update Server: {upstream}")
|
||||
|
||||
# Conditions
|
||||
conditions = status.get('conditions', [])
|
||||
print(f"\nCONDITIONS:")
|
||||
for condition in conditions:
|
||||
cond_type = condition.get('type', 'Unknown')
|
||||
cond_status = condition.get('status', 'Unknown')
|
||||
last_transition = condition.get('lastTransitionTime', '')
|
||||
message = condition.get('message', '')
|
||||
|
||||
# Calculate time since transition
|
||||
age = calculate_duration(last_transition) if last_transition else ''
|
||||
|
||||
status_indicator = "✅" if cond_status == "True" else "❌" if cond_status == "False" else "❓"
|
||||
print(f" {status_indicator} {cond_type}: {cond_status} (for {age})")
|
||||
if message and cond_status == 'True':
|
||||
print(f" Message: {message[:100]}")
|
||||
|
||||
# Update history
|
||||
history = status.get('history', [])
|
||||
if history:
|
||||
print(f"\nUPDATE HISTORY (last 5):")
|
||||
for i, entry in enumerate(history[:5]):
|
||||
state = entry.get('state', 'Unknown')
|
||||
version = entry.get('version', 'unknown')
|
||||
image = entry.get('image', '')
|
||||
completion_time = entry.get('completionTime', '')
|
||||
|
||||
age = calculate_duration(completion_time) if completion_time else ''
|
||||
print(f" {i+1}. {version} - {state} {f'({age} ago)' if age else ''}")
|
||||
|
||||
# Available updates
|
||||
available_updates = status.get('availableUpdates')
|
||||
if available_updates and isinstance(available_updates, list) and len(available_updates) > 0:
|
||||
print(f"\nAVAILABLE UPDATES ({len(available_updates)}):")
|
||||
for i, update in enumerate(available_updates[:5]):
|
||||
version = update.get('version', 'unknown')
|
||||
image = update.get('image', '')
|
||||
print(f" {i+1}. {version}")
|
||||
elif available_updates is None:
|
||||
print(f"\nAVAILABLE UPDATES: Unable to retrieve updates")
|
||||
|
||||
# Capabilities
|
||||
capabilities = status.get('capabilities', {})
|
||||
enabled_caps = capabilities.get('enabledCapabilities', [])
|
||||
if enabled_caps:
|
||||
print(f"\nENABLED CAPABILITIES ({len(enabled_caps)}):")
|
||||
# Print in columns
|
||||
for i in range(0, len(enabled_caps), 3):
|
||||
caps = enabled_caps[i:i+3]
|
||||
print(f" {', '.join(caps)}")
|
||||
|
||||
print(f"{'='*80}\n")
|
||||
|
||||
|
||||
def analyze_clusterversion(must_gather_path: str):
|
||||
"""Analyze ClusterVersion in a must-gather directory."""
|
||||
base_path = Path(must_gather_path)
|
||||
|
||||
# Find ClusterVersion file
|
||||
possible_patterns = [
|
||||
"cluster-scoped-resources/config.openshift.io/clusterversions/version.yaml",
|
||||
"*/cluster-scoped-resources/config.openshift.io/clusterversions/version.yaml",
|
||||
]
|
||||
|
||||
cv = None
|
||||
for pattern in possible_patterns:
|
||||
for cv_file in base_path.glob(pattern):
|
||||
cv = parse_clusterversion(cv_file)
|
||||
if cv:
|
||||
break
|
||||
if cv:
|
||||
break
|
||||
|
||||
if not cv:
|
||||
print("No ClusterVersion found.")
|
||||
return 1
|
||||
|
||||
# Format and print table
|
||||
cv_info = format_clusterversion(cv)
|
||||
print_clusterversion_table(cv_info)
|
||||
|
||||
# Print detailed information
|
||||
print_detailed_info(cv)
|
||||
|
||||
return 0
|
||||
|
||||
|
||||
def main():
|
||||
if len(sys.argv) < 2:
|
||||
print("Usage: analyze_clusterversion.py <must-gather-directory>", file=sys.stderr)
|
||||
print("\nExample:", file=sys.stderr)
|
||||
print(" analyze_clusterversion.py ./must-gather.local.123456789", file=sys.stderr)
|
||||
return 1
|
||||
|
||||
must_gather_path = sys.argv[1]
|
||||
|
||||
if not os.path.isdir(must_gather_path):
|
||||
print(f"Error: Directory not found: {must_gather_path}", file=sys.stderr)
|
||||
return 1
|
||||
|
||||
return analyze_clusterversion(must_gather_path)
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
sys.exit(main())
|
||||
206
skills/must-gather-analyzer/scripts/analyze_etcd.py
Executable file
@@ -0,0 +1,206 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Analyze etcd information from must-gather data.
|
||||
Shows etcd cluster health, member status, and diagnostics.
|
||||
"""
|
||||
|
||||
import sys
|
||||
import os
|
||||
import json
|
||||
from pathlib import Path
|
||||
from typing import Dict, Any, List, Optional
|
||||
|
||||
|
||||
def parse_etcd_info(must_gather_path: Path) -> Dict[str, Any]:
|
||||
"""Parse etcd_info directory for cluster health information."""
|
||||
etcd_data = {
|
||||
'member_health': [],
|
||||
'member_list': [],
|
||||
'endpoint_health': [],
|
||||
'endpoint_status': []
|
||||
}
|
||||
|
||||
# Find etcd_info directory
|
||||
etcd_dirs = list(must_gather_path.glob("etcd_info")) + \
|
||||
list(must_gather_path.glob("*/etcd_info"))
|
||||
|
||||
if not etcd_dirs:
|
||||
return etcd_data
|
||||
|
||||
etcd_info_dir = etcd_dirs[0]
|
||||
|
||||
# Parse member health
|
||||
member_health_file = etcd_info_dir / "endpoint_health.json"
|
||||
if member_health_file.exists():
|
||||
try:
|
||||
with open(member_health_file, 'r') as f:
|
||||
data = json.load(f)
|
||||
etcd_data['member_health'] = data if isinstance(data, list) else [data]
|
||||
except Exception as e:
|
||||
print(f"Warning: Failed to parse endpoint_health.json: {e}", file=sys.stderr)
|
||||
|
||||
# Parse member list
|
||||
member_list_file = etcd_info_dir / "member_list.json"
|
||||
if member_list_file.exists():
|
||||
try:
|
||||
with open(member_list_file, 'r') as f:
|
||||
data = json.load(f)
|
||||
if isinstance(data, dict) and 'members' in data:
|
||||
etcd_data['member_list'] = data['members']
|
||||
elif isinstance(data, list):
|
||||
etcd_data['member_list'] = data
|
||||
except Exception as e:
|
||||
print(f"Warning: Failed to parse member_list.json: {e}", file=sys.stderr)
|
||||
|
||||
# Parse endpoint health
|
||||
endpoint_health_file = etcd_info_dir / "endpoint_health.json"
|
||||
if endpoint_health_file.exists():
|
||||
try:
|
||||
with open(endpoint_health_file, 'r') as f:
|
||||
data = json.load(f)
|
||||
etcd_data['endpoint_health'] = data if isinstance(data, list) else [data]
|
||||
except Exception as e:
|
||||
print(f"Warning: Failed to parse endpoint_health.json: {e}", file=sys.stderr)
|
||||
|
||||
# Parse endpoint status
|
||||
endpoint_status_file = etcd_info_dir / "endpoint_status.json"
|
||||
if endpoint_status_file.exists():
|
||||
try:
|
||||
with open(endpoint_status_file, 'r') as f:
|
||||
data = json.load(f)
|
||||
etcd_data['endpoint_status'] = data if isinstance(data, list) else [data]
|
||||
except Exception as e:
|
||||
print(f"Warning: Failed to parse endpoint_status.json: {e}", file=sys.stderr)
|
||||
|
||||
return etcd_data
|
||||
|
||||
|
||||
def print_member_health(members: List[Dict[str, Any]]):
|
||||
"""Print etcd member health status."""
|
||||
if not members:
|
||||
print("No member health data found.")
|
||||
return
|
||||
|
||||
print("ETCD MEMBER HEALTH")
|
||||
print(f"{'ENDPOINT':<60} {'HEALTH':<10} {'TOOK':<10} ERROR")
|
||||
|
||||
for member in members:
|
||||
endpoint = member.get('endpoint', 'unknown')[:60]
|
||||
health = 'true' if member.get('health') else 'false'
|
||||
took = member.get('took', '')
|
||||
error = member.get('error', '')
|
||||
|
||||
print(f"{endpoint:<60} {health:<10} {took:<10} {error}")
|
||||
|
||||
|
||||
def print_member_list(members: List[Dict[str, Any]]):
|
||||
"""Print etcd member list."""
|
||||
if not members:
|
||||
print("\nNo member list data found.")
|
||||
return
|
||||
|
||||
print("\nETCD MEMBER LIST")
|
||||
print(f"{'ID':<20} {'NAME':<40} {'PEER URLS':<60} {'CLIENT URLS':<60}")
|
||||
|
||||
for member in members:
|
||||
member_id = str(member.get('ID', member.get('id', 'unknown')))[:20]
|
||||
name = member.get('name', 'unknown')[:40]
|
||||
peer_urls = ','.join(member.get('peerURLs', []))[:60]
|
||||
client_urls = ','.join(member.get('clientURLs', []))[:60]
|
||||
|
||||
print(f"{member_id:<20} {name:<40} {peer_urls:<60} {client_urls:<60}")
|
||||
|
||||
|
||||
def print_endpoint_status(endpoints: List[Dict[str, Any]]):
|
||||
"""Print etcd endpoint status."""
|
||||
if not endpoints:
|
||||
print("\nNo endpoint status data found.")
|
||||
return
|
||||
|
||||
print("\nETCD ENDPOINT STATUS")
|
||||
print(f"{'ENDPOINT':<60} {'LEADER':<20} {'VERSION':<10} {'DB SIZE':<10} {'IS LEARNER'}")
|
||||
|
||||
for endpoint in endpoints:
|
||||
ep = endpoint.get('Endpoint', 'unknown')[:60]
|
||||
|
||||
status = endpoint.get('Status', {})
|
||||
leader = str(status.get('leader', 'unknown'))[:20]
|
||||
version = status.get('version', 'unknown')[:10]
|
||||
|
||||
db_size = status.get('dbSize', 0)
|
||||
db_size_mb = f"{db_size / (1024*1024):.1f}MB" if db_size else '0MB'
|
||||
|
||||
is_learner = 'true' if status.get('isLearner') else 'false'
|
||||
|
||||
print(f"{ep:<60} {leader:<20} {version:<10} {db_size_mb:<10} {is_learner}")
|
||||
|
||||
|
||||
def print_summary(etcd_data: Dict[str, Any]):
|
||||
"""Print summary of etcd cluster health."""
|
||||
member_health = etcd_data.get('member_health', [])
|
||||
member_list = etcd_data.get('member_list', [])
|
||||
|
||||
total_members = len(member_list)
|
||||
healthy_members = sum(1 for m in member_health if m.get('health'))
|
||||
|
||||
print(f"\n{'='*80}")
|
||||
print(f"ETCD CLUSTER SUMMARY")
|
||||
print(f"{'='*80}")
|
||||
print(f"Total Members: {total_members}")
|
||||
print(f"Healthy Members: {healthy_members}/{len(member_health) if member_health else total_members}")
|
||||
|
||||
if healthy_members < total_members:
|
||||
print(f" ⚠️ Warning: Not all members are healthy!")
|
||||
elif healthy_members == total_members and total_members > 0:
|
||||
print(f" ✅ All members healthy")
|
||||
|
||||
# Check for quorum
|
||||
if total_members >= 3:
|
||||
quorum = (total_members // 2) + 1
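# e.g. a 3-member cluster needs 2 healthy members for quorum; a 5-member cluster needs 3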
|
||||
if healthy_members >= quorum:
|
||||
print(f" ✅ Quorum achieved ({healthy_members}/{quorum})")
|
||||
else:
|
||||
print(f" ❌ Quorum lost! ({healthy_members}/{quorum})")
|
||||
print(f"{'='*80}\n")
|
||||
|
||||
|
||||
def analyze_etcd(must_gather_path: str):
|
||||
"""Analyze etcd information in a must-gather directory."""
|
||||
base_path = Path(must_gather_path)
|
||||
|
||||
etcd_data = parse_etcd_info(base_path)
|
||||
|
||||
if not any(etcd_data.values()):
|
||||
print("No etcd_info data found in must-gather.")
|
||||
print("Expected location: etcd_info/ directory")
|
||||
return 1
|
||||
|
||||
# Print summary first
|
||||
print_summary(etcd_data)
|
||||
|
||||
# Print detailed information
|
||||
print_member_health(etcd_data.get('member_health', []))
|
||||
print_member_list(etcd_data.get('member_list', []))
|
||||
print_endpoint_status(etcd_data.get('endpoint_status', []))
|
||||
|
||||
return 0
|
||||
|
||||
|
||||
def main():
|
||||
if len(sys.argv) < 2:
|
||||
print("Usage: analyze_etcd.py <must-gather-directory>", file=sys.stderr)
|
||||
print("\nExample:", file=sys.stderr)
|
||||
print(" analyze_etcd.py ./must-gather.local.123456789", file=sys.stderr)
|
||||
return 1
|
||||
|
||||
must_gather_path = sys.argv[1]
|
||||
|
||||
if not os.path.isdir(must_gather_path):
|
||||
print(f"Error: Directory not found: {must_gather_path}", file=sys.stderr)
|
||||
return 1
|
||||
|
||||
return analyze_etcd(must_gather_path)
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
sys.exit(main())
|
||||
201
skills/must-gather-analyzer/scripts/analyze_events.py
Executable file
@@ -0,0 +1,201 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Analyze Events from must-gather data.
|
||||
Shows cluster events (Warning and Normal) sorted by most recent occurrence.
|
||||
"""
|
||||
|
||||
import sys
|
||||
import os
|
||||
import yaml
|
||||
import argparse
|
||||
from pathlib import Path
|
||||
from datetime import datetime
|
||||
from typing import List, Dict, Any, Optional
|
||||
|
||||
|
||||
|
||||
def parse_events_file(file_path: Path) -> List[Dict[str, Any]]:
|
||||
"""Parse events YAML file which may contain multiple events."""
|
||||
events = []
|
||||
try:
|
||||
with open(file_path, 'r') as f:
|
||||
docs = yaml.safe_load_all(f)
|
||||
for doc in docs:
|
||||
if doc and doc.get('kind') == 'Event':
|
||||
events.append(doc)
|
||||
elif doc and doc.get('kind') == 'EventList':
|
||||
# Handle EventList
|
||||
events.extend(doc.get('items', []))
|
||||
except Exception as e:
|
||||
print(f"Warning: Failed to parse {file_path}: {e}", file=sys.stderr)
|
||||
return events
|
||||
|
||||
|
||||
def calculate_age(timestamp_str: str) -> str:
|
||||
"""Calculate age from timestamp."""
|
||||
try:
|
||||
ts = datetime.fromisoformat(timestamp_str.replace('Z', '+00:00'))
|
||||
now = datetime.now(ts.tzinfo)
|
||||
delta = now - ts
|
||||
|
||||
days = delta.days
|
||||
hours = delta.seconds // 3600
|
||||
minutes = (delta.seconds % 3600) // 60
|
||||
|
||||
if days > 0:
|
||||
return f"{days}d"
|
||||
elif hours > 0:
|
||||
return f"{hours}h"
|
||||
elif minutes > 0:
|
||||
return f"{minutes}m"
|
||||
else:
|
||||
return "<1m"
|
||||
except Exception:
|
||||
return ""
|
||||
|
||||
|
||||
def format_event(event: Dict[str, Any]) -> Dict[str, Any]:
|
||||
"""Format an event for display."""
|
||||
metadata = event.get('metadata', {})
|
||||
|
||||
namespace = metadata.get('namespace', '')
|
||||
name = metadata.get('name', 'unknown')
|
||||
|
||||
# Get last timestamp
|
||||
last_timestamp = event.get('lastTimestamp') or event.get('eventTime') or metadata.get('creationTimestamp', '')
|
||||
age = calculate_age(last_timestamp) if last_timestamp else ''
|
||||
|
||||
# Event details
|
||||
event_type = event.get('type', 'Normal')
|
||||
reason = event.get('reason', '')
|
||||
message = event.get('message', '')
|
||||
count = event.get('count', 1)
|
||||
|
||||
# Involved object
|
||||
involved = event.get('involvedObject', {})
|
||||
obj_kind = involved.get('kind', '')
|
||||
obj_name = involved.get('name', '')
|
||||
|
||||
return {
|
||||
'namespace': namespace,
|
||||
'last_seen': age,
|
||||
'type': event_type,
|
||||
'reason': reason,
|
||||
'object_kind': obj_kind,
|
||||
'object_name': obj_name,
|
||||
'message': message,
|
||||
'count': count,
|
||||
'timestamp': last_timestamp
|
||||
}
|
||||
|
||||
|
||||
def print_events_table(events: List[Dict[str, Any]]):
|
||||
"""Print events in a table format."""
|
||||
if not events:
|
||||
print("No resources found.")
|
||||
return
|
||||
|
||||
# Print header
|
||||
print(f"{'NAMESPACE':<30} {'LAST SEEN':<10} {'TYPE':<10} {'REASON':<30} {'OBJECT':<40} {'MESSAGE':<60}")
|
||||
|
||||
# Print rows
|
||||
for event in events:
|
||||
namespace = event['namespace'][:30] if event['namespace'] else '<cluster>'
|
||||
last_seen = event['last_seen'][:10]
|
||||
event_type = event['type'][:10]
|
||||
reason = event['reason'][:30]
|
||||
obj = f"{event['object_kind']}/{event['object_name']}"[:40]
|
||||
message = event['message'][:60]
|
||||
|
||||
print(f"{namespace:<30} {last_seen:<10} {event_type:<10} {reason:<30} {obj:<40} {message:<60}")
|
||||
|
||||
|
||||
def analyze_events(must_gather_path: str, namespace: Optional[str] = None,
|
||||
event_type: Optional[str] = None, show_count: int = 100):
|
||||
"""Analyze events in a must-gather directory."""
|
||||
base_path = Path(must_gather_path)
|
||||
|
||||
all_events = []
|
||||
|
||||
# Find all events files
|
||||
if namespace:
|
||||
patterns = [
|
||||
f"namespaces/{namespace}/core/events.yaml",
|
||||
f"*/namespaces/{namespace}/core/events.yaml",
|
||||
]
|
||||
else:
|
||||
patterns = [
|
||||
"namespaces/*/core/events.yaml",
|
||||
"*/namespaces/*/core/events.yaml",
|
||||
]
|
||||
|
||||
for pattern in patterns:
|
||||
for events_file in base_path.glob(pattern):
|
||||
events = parse_events_file(events_file)
|
||||
all_events.extend(events)
|
||||
|
||||
if not all_events:
|
||||
print("No resources found.")
|
||||
return 1
|
||||
|
||||
# Format events
|
||||
formatted_events = [format_event(e) for e in all_events]
|
||||
|
||||
# Filter by type if specified
|
||||
if event_type:
|
||||
formatted_events = [e for e in formatted_events if e['type'].lower() == event_type.lower()]
|
||||
|
||||
# Sort by timestamp (most recent first)
|
||||
formatted_events.sort(key=lambda x: x['timestamp'], reverse=True)
|
||||
|
||||
# Limit count
|
||||
if show_count and show_count > 0:
|
||||
formatted_events = formatted_events[:show_count]
|
||||
|
||||
# Print results
|
||||
print_events_table(formatted_events)
|
||||
|
||||
# Summary
|
||||
total = len(formatted_events)
|
||||
warnings = sum(1 for e in formatted_events if e['type'] == 'Warning')
|
||||
normal = sum(1 for e in formatted_events if e['type'] == 'Normal')
|
||||
|
||||
print(f"\nShowing {total} most recent events")
|
||||
if warnings > 0:
|
||||
print(f" ⚠️ {warnings} Warning events")
|
||||
if normal > 0:
|
||||
print(f" ℹ️ {normal} Normal events")
|
||||
|
||||
return 0
|
||||
|
||||
|
||||
def main():
|
||||
parser = argparse.ArgumentParser(
|
||||
description='Analyze events from must-gather data',
|
||||
formatter_class=argparse.RawDescriptionHelpFormatter,
|
||||
epilog="""
|
||||
Examples:
|
||||
%(prog)s ./must-gather
|
||||
%(prog)s ./must-gather --namespace openshift-etcd
|
||||
%(prog)s ./must-gather --type Warning
|
||||
%(prog)s ./must-gather --count 50
|
||||
"""
|
||||
)
|
||||
|
||||
parser.add_argument('must_gather_path', help='Path to must-gather directory')
|
||||
parser.add_argument('-n', '--namespace', help='Filter by namespace')
|
||||
parser.add_argument('-t', '--type', help='Filter by event type (Warning, Normal)')
|
||||
parser.add_argument('-c', '--count', type=int, default=100,
|
||||
help='Number of events to show (default: 100)')
|
||||
|
||||
args = parser.parse_args()
|
||||
|
||||
if not os.path.isdir(args.must_gather_path):
|
||||
print(f"Error: Directory not found: {args.must_gather_path}", file=sys.stderr)
|
||||
return 1
|
||||
|
||||
return analyze_events(args.must_gather_path, args.namespace, args.type, args.count)
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
sys.exit(main())
|
||||
281
skills/must-gather-analyzer/scripts/analyze_network.py
Executable file
@@ -0,0 +1,281 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Analyze Network resources and diagnostics from must-gather data.
|
||||
Shows network operator status, OVN pods, and connectivity checks.
|
||||
"""
|
||||
|
||||
import sys
|
||||
import os
|
||||
import yaml
|
||||
from pathlib import Path
|
||||
from typing import List, Dict, Any, Optional
|
||||
|
||||
|
||||
def parse_yaml_file(file_path: Path) -> Optional[Dict[str, Any]]:
|
||||
"""Parse a YAML file."""
|
||||
try:
|
||||
with open(file_path, 'r') as f:
|
||||
doc = yaml.safe_load(f)
|
||||
return doc
|
||||
except Exception as e:
|
||||
print(f"Warning: Failed to parse {file_path}: {e}", file=sys.stderr)
|
||||
return None
|
||||
|
||||
|
||||
def get_network_type(must_gather_path: Path) -> str:
|
||||
"""Determine the network type from cluster network config."""
|
||||
# First try to find networks.yaml (List object)
|
||||
patterns = [
|
||||
"cluster-scoped-resources/config.openshift.io/networks.yaml",
|
||||
"*/cluster-scoped-resources/config.openshift.io/networks.yaml",
|
||||
]
|
||||
|
||||
for pattern in patterns:
|
||||
for network_file in must_gather_path.glob(pattern):
|
||||
network_list = parse_yaml_file(network_file)
|
||||
if network_list:
|
||||
# Handle NetworkList object
|
||||
items = network_list.get('items', [])
|
||||
if items:
|
||||
# Get the first network item
|
||||
network = items[0]
|
||||
spec = network.get('spec', {})
|
||||
network_type = spec.get('networkType', 'Unknown')
|
||||
if network_type != 'Unknown':
|
||||
return network_type
|
||||
|
||||
# Fallback: try individual network config files
|
||||
patterns = [
|
||||
"cluster-scoped-resources/config.openshift.io/*.yaml",
|
||||
]
|
||||
|
||||
for pattern in patterns:
|
||||
for network_file in must_gather_path.glob(pattern):
|
||||
if network_file.name in ['networks.yaml']:
|
||||
continue
|
||||
|
||||
network = parse_yaml_file(network_file)
|
||||
if network:
|
||||
spec = network.get('spec', {})
|
||||
network_type = spec.get('networkType', 'Unknown')
|
||||
if network_type != 'Unknown':
|
||||
return network_type
|
||||
|
||||
return 'Unknown'
|
||||
|
||||
|
||||
def analyze_network_operator(must_gather_path: Path) -> Optional[Dict[str, Any]]:
|
||||
"""Analyze network operator status."""
|
||||
patterns = [
|
||||
"cluster-scoped-resources/config.openshift.io/clusteroperators/network.yaml",
|
||||
"*/cluster-scoped-resources/config.openshift.io/clusteroperators/network.yaml",
|
||||
]
|
||||
|
||||
for pattern in patterns:
|
||||
for op_file in must_gather_path.glob(pattern):
|
||||
operator = parse_yaml_file(op_file)
|
||||
if operator:
|
||||
conditions = operator.get('status', {}).get('conditions', [])
|
||||
result = {}
|
||||
|
||||
for cond in conditions:
|
||||
cond_type = cond.get('type')
|
||||
if cond_type in ['Available', 'Progressing', 'Degraded']:
|
||||
result[cond_type] = cond.get('status', 'Unknown')
|
||||
result[f'{cond_type}_message'] = cond.get('message', '')
|
||||
|
||||
return result
|
||||
|
||||
return None
|
||||
|
||||
|
||||
def analyze_ovn_pods(must_gather_path: Path) -> List[Dict[str, str]]:
|
||||
"""Analyze OVN-Kubernetes pods."""
|
||||
pods = []
|
||||
|
||||
patterns = [
|
||||
"namespaces/openshift-ovn-kubernetes/pods/*/*.yaml",
|
||||
"*/namespaces/openshift-ovn-kubernetes/pods/*/*.yaml",
|
||||
]
|
||||
|
||||
for pattern in patterns:
|
||||
for pod_file in must_gather_path.glob(pattern):
|
||||
if pod_file.name == 'pods.yaml':
|
||||
continue
|
||||
|
||||
pod = parse_yaml_file(pod_file)
|
||||
if pod:
|
||||
name = pod.get('metadata', {}).get('name', 'unknown')
|
||||
status = pod.get('status', {})
|
||||
phase = status.get('phase', 'Unknown')
|
||||
|
||||
container_statuses = status.get('containerStatuses', [])
|
||||
total = len(pod.get('spec', {}).get('containers', []))
|
||||
ready = sum(1 for cs in container_statuses if cs.get('ready', False))
|
||||
|
||||
pods.append({
|
||||
'name': name,
|
||||
'ready': f"{ready}/{total}",
|
||||
'status': phase
|
||||
})
|
||||
|
||||
# Remove duplicates
|
||||
seen = set()
|
||||
unique_pods = []
|
||||
for p in pods:
|
||||
if p['name'] not in seen:
|
||||
seen.add(p['name'])
|
||||
unique_pods.append(p)
|
||||
|
||||
return sorted(unique_pods, key=lambda x: x['name'])
|
||||
|
||||
|
||||
def analyze_connectivity_checks(must_gather_path: Path) -> Dict[str, Any]:
|
||||
"""Analyze PodNetworkConnectivityCheck resources."""
|
||||
# First try to find podnetworkconnectivitychecks.yaml (List object)
|
||||
patterns = [
|
||||
"pod_network_connectivity_check/podnetworkconnectivitychecks.yaml",
|
||||
"*/pod_network_connectivity_check/podnetworkconnectivitychecks.yaml",
|
||||
]
|
||||
|
||||
total_checks = 0
|
||||
failed_checks = []
|
||||
|
||||
for pattern in patterns:
|
||||
for check_file in must_gather_path.glob(pattern):
|
||||
check_list = parse_yaml_file(check_file)
|
||||
if check_list:
|
||||
items = check_list.get('items', [])
|
||||
for check in items:
|
||||
total_checks += 1
|
||||
name = check.get('metadata', {}).get('name', 'unknown')
|
||||
status = check.get('status', {})
|
||||
|
||||
conditions = status.get('conditions', [])
|
||||
for cond in conditions:
|
||||
if cond.get('type') == 'Reachable' and cond.get('status') == 'False':
|
||||
failed_checks.append({
|
||||
'name': name,
|
||||
'message': cond.get('message', 'Unknown')
|
||||
})
|
||||
|
||||
# If we found the list file, no need to continue
|
||||
if total_checks > 0:
|
||||
return {
|
||||
'total': total_checks,
|
||||
'failed': failed_checks
|
||||
}
|
||||
|
||||
# Fallback: try individual check files
|
||||
patterns = [
|
||||
"*/pod_network_connectivity_check/*.yaml",
|
||||
]
|
||||
|
||||
for pattern in patterns:
|
||||
for check_file in must_gather_path.glob(pattern):
|
||||
if check_file.name == 'podnetworkconnectivitychecks.yaml':
|
||||
continue
|
||||
|
||||
check = parse_yaml_file(check_file)
|
||||
if check:
|
||||
total_checks += 1
|
||||
name = check.get('metadata', {}).get('name', 'unknown')
|
||||
status = check.get('status', {})
|
||||
|
||||
conditions = status.get('conditions', [])
|
||||
for cond in conditions:
|
||||
if cond.get('type') == 'Reachable' and cond.get('status') == 'False':
|
||||
failed_checks.append({
|
||||
'name': name,
|
||||
'message': cond.get('message', 'Unknown')
|
||||
})
|
||||
|
||||
return {
|
||||
'total': total_checks,
|
||||
'failed': failed_checks
|
||||
}
|
||||
|
||||
|
||||
def print_network_summary(network_type: str, operator_status: Optional[Dict],
|
||||
ovn_pods: List[Dict], connectivity: Dict):
|
||||
"""Print network analysis summary."""
|
||||
print(f"{'NETWORK TYPE':<30} {network_type}")
|
||||
print()
|
||||
|
||||
if operator_status:
|
||||
print("NETWORK OPERATOR STATUS")
|
||||
print(f"{'Available':<15} {operator_status.get('Available', 'Unknown')}")
|
||||
print(f"{'Progressing':<15} {operator_status.get('Progressing', 'Unknown')}")
|
||||
print(f"{'Degraded':<15} {operator_status.get('Degraded', 'Unknown')}")
|
||||
|
||||
if operator_status.get('Degraded') == 'True':
|
||||
msg = operator_status.get('Degraded_message', '')
|
||||
if msg:
|
||||
print(f" Message: {msg}")
|
||||
print()
|
||||
|
||||
if ovn_pods and network_type == 'OVNKubernetes':
|
||||
print("OVN-KUBERNETES PODS")
|
||||
print(f"{'NAME':<60} {'READY':<10} STATUS")
|
||||
for pod in ovn_pods:
|
||||
name = pod['name'][:60]
|
||||
ready = pod['ready'][:10]
|
||||
status = pod['status']
|
||||
print(f"{name:<60} {ready:<10} {status}")
|
||||
print()
|
||||
|
||||
if connectivity['total'] > 0:
|
||||
print(f"NETWORK CONNECTIVITY CHECKS: {connectivity['total']} total")
|
||||
if connectivity['failed']:
|
||||
print(f" Failed: {len(connectivity['failed'])}")
|
||||
for failed in connectivity['failed'][:10]: # Show first 10
|
||||
print(f" - {failed['name']}")
|
||||
if failed['message']:
|
||||
print(f" {failed['message'][:100]}")
|
||||
else:
|
||||
print(" All checks passing")
|
||||
print()
|
||||
|
||||
|
||||
def analyze_network(must_gather_path: str):
|
||||
"""Analyze network resources in a must-gather directory."""
|
||||
base_path = Path(must_gather_path)
|
||||
|
||||
# Get network type
|
||||
network_type = get_network_type(base_path)
|
||||
|
||||
# Get network operator status
|
||||
operator_status = analyze_network_operator(base_path)
|
||||
|
||||
# Get OVN pods if applicable
|
||||
ovn_pods = []
|
||||
if network_type == 'OVNKubernetes':
|
||||
ovn_pods = analyze_ovn_pods(base_path)
|
||||
|
||||
# Get connectivity checks
|
||||
connectivity = analyze_connectivity_checks(base_path)
|
||||
|
||||
# Print summary
|
||||
print_network_summary(network_type, operator_status, ovn_pods, connectivity)
|
||||
|
||||
return 0
|
||||
|
||||
|
||||
def main():
|
||||
if len(sys.argv) < 2:
|
||||
print("Usage: analyze_network.py <must-gather-directory>", file=sys.stderr)
|
||||
print("\nExample:", file=sys.stderr)
|
||||
print(" analyze_network.py ./must-gather.local.123456789", file=sys.stderr)
|
||||
return 1
|
||||
|
||||
must_gather_path = sys.argv[1]
|
||||
|
||||
if not os.path.isdir(must_gather_path):
|
||||
print(f"Error: Directory not found: {must_gather_path}", file=sys.stderr)
|
||||
return 1
|
||||
|
||||
return analyze_network(must_gather_path)
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
sys.exit(main())
|
||||
224
skills/must-gather-analyzer/scripts/analyze_nodes.py
Executable file
@@ -0,0 +1,224 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Analyze Node resources from must-gather data.
|
||||
Displays output similar to 'oc get nodes' command.
|
||||
"""
|
||||
|
||||
import sys
|
||||
import os
|
||||
import yaml
|
||||
import argparse
|
||||
from pathlib import Path
|
||||
from datetime import datetime
|
||||
from typing import List, Dict, Any, Optional
|
||||
|
||||
|
||||
def parse_node(file_path: Path) -> Optional[Dict[str, Any]]:
|
||||
"""Parse a single node YAML file."""
|
||||
try:
|
||||
with open(file_path, 'r') as f:
|
||||
doc = yaml.safe_load(f)
|
||||
if doc and doc.get('kind') == 'Node':
|
||||
return doc
|
||||
except Exception as e:
|
||||
print(f"Warning: Failed to parse {file_path}: {e}", file=sys.stderr)
|
||||
return None
|
||||
|
||||
|
||||
def calculate_age(creation_timestamp: str) -> str:
|
||||
"""Calculate age from creation timestamp."""
|
||||
try:
|
||||
ts = datetime.fromisoformat(creation_timestamp.replace('Z', '+00:00'))
|
||||
now = datetime.now(ts.tzinfo)
|
||||
delta = now - ts
|
||||
|
||||
days = delta.days
|
||||
hours = delta.seconds // 3600
|
||||
|
||||
if days > 0:
|
||||
return f"{days}d"
|
||||
elif hours > 0:
|
||||
return f"{hours}h"
|
||||
else:
|
||||
return "<1h"
|
||||
except Exception:
|
||||
return ""
|
||||
|
||||
|
||||
def get_node_roles(labels: Dict[str, str]) -> str:
|
||||
"""Extract node roles from labels."""
|
||||
roles = []
|
||||
for key in labels:
|
||||
if key.startswith('node-role.kubernetes.io/'):
|
||||
role = key.split('/')[-1]
|
||||
if role:
|
||||
roles.append(role)
|
||||
|
||||
return ','.join(sorted(roles)) if roles else '<none>'
|
||||
|
||||
|
||||
def get_node_status(node: Dict[str, Any]) -> Dict[str, Any]:
|
||||
"""Extract node status information."""
|
||||
metadata = node.get('metadata', {})
|
||||
status = node.get('status', {})
|
||||
|
||||
name = metadata.get('name', 'unknown')
|
||||
labels = metadata.get('labels', {})
|
||||
creation_time = metadata.get('creationTimestamp', '')
|
||||
|
||||
# Get roles
|
||||
roles = get_node_roles(labels)
|
||||
|
||||
# Get conditions
|
||||
conditions = status.get('conditions', [])
|
||||
ready_condition = 'Unknown'
|
||||
node_issues = []
|
||||
|
||||
for condition in conditions:
|
||||
cond_type = condition.get('type', '')
|
||||
cond_status = condition.get('status', 'Unknown')
|
||||
|
||||
if cond_type == 'Ready':
|
||||
ready_condition = cond_status
|
||||
elif cond_status == 'True' and cond_type in ['MemoryPressure', 'DiskPressure', 'PIDPressure', 'NetworkUnavailable']:
|
||||
node_issues.append(cond_type)
|
||||
|
||||
# Determine overall status
|
||||
if ready_condition == 'True':
|
||||
node_status = 'Ready'
|
||||
elif ready_condition == 'False':
|
||||
node_status = 'NotReady'
|
||||
else:
|
||||
node_status = 'Unknown'
|
||||
|
||||
# Add issues to status
|
||||
if node_issues:
|
||||
node_status = f"{node_status},{','.join(node_issues)}"
|
||||
|
||||
# Get version
|
||||
node_info = status.get('nodeInfo', {})
|
||||
version = node_info.get('kubeletVersion', '')
|
||||
|
||||
# Get age
|
||||
age = calculate_age(creation_time) if creation_time else ''
|
||||
|
||||
# Internal IP
|
||||
addresses = status.get('addresses', [])
|
||||
internal_ip = ''
|
||||
for addr in addresses:
|
||||
if addr.get('type') == 'InternalIP':
|
||||
internal_ip = addr.get('address', '')
|
||||
break
|
||||
|
||||
# OS Image
|
||||
os_image = node_info.get('osImage', '')
|
||||
|
||||
return {
|
||||
'name': name,
|
||||
'status': node_status,
|
||||
'roles': roles,
|
||||
'age': age,
|
||||
'version': version,
|
||||
'internal_ip': internal_ip,
|
||||
'os_image': os_image,
|
||||
'is_problem': node_status != 'Ready' or len(node_issues) > 0
|
||||
}
|
||||
|
||||
|
||||
def print_nodes_table(nodes: List[Dict[str, Any]]):
|
||||
"""Print nodes in a formatted table like 'oc get nodes'."""
|
||||
if not nodes:
|
||||
print("No resources found.")
|
||||
return
|
||||
|
||||
# Print header
|
||||
print(f"{'NAME':<50} {'STATUS':<30} {'ROLES':<20} {'AGE':<7} VERSION")
|
||||
|
||||
# Print rows
|
||||
for node in nodes:
|
||||
name = node['name'][:50]
|
||||
status = node['status'][:30]
|
||||
roles = node['roles'][:20]
|
||||
age = node['age'][:7]
|
||||
version = node['version']
|
||||
|
||||
print(f"{name:<50} {status:<30} {roles:<20} {age:<7} {version}")
|
||||
|
||||
|
||||
def analyze_nodes(must_gather_path: str, problems_only: bool = False):
|
||||
"""Analyze all nodes in a must-gather directory."""
|
||||
base_path = Path(must_gather_path)
|
||||
|
||||
# Find all node YAML files
|
||||
possible_patterns = [
|
||||
"cluster-scoped-resources/core/nodes/*.yaml",
|
||||
"*/cluster-scoped-resources/core/nodes/*.yaml",
|
||||
]
|
||||
|
||||
nodes = []
|
||||
|
||||
for pattern in possible_patterns:
|
||||
for node_file in base_path.glob(pattern):
|
||||
# Skip the nodes.yaml file that contains all nodes
|
||||
if node_file.name == 'nodes.yaml':
|
||||
continue
|
||||
|
||||
node = parse_node(node_file)
|
||||
if node:
|
||||
node_status = get_node_status(node)
|
||||
nodes.append(node_status)
|
||||
|
||||
if not nodes:
|
||||
print("No resources found.")
|
||||
return 1
|
||||
|
||||
# Remove duplicates
|
||||
seen = set()
|
||||
unique_nodes = []
|
||||
for n in nodes:
|
||||
if n['name'] not in seen:
|
||||
seen.add(n['name'])
|
||||
unique_nodes.append(n)
|
||||
|
||||
# Sort by name
|
||||
unique_nodes.sort(key=lambda x: x['name'])
|
||||
|
||||
# Filter if problems only
|
||||
if problems_only:
|
||||
unique_nodes = [n for n in unique_nodes if n['is_problem']]
|
||||
if not unique_nodes:
|
||||
print("No resources found.")
|
||||
return 0
|
||||
|
||||
# Print results
|
||||
print_nodes_table(unique_nodes)
|
||||
|
||||
return 0
|
||||
|
||||
|
||||
def main():
|
||||
parser = argparse.ArgumentParser(
|
||||
description='Analyze node resources from must-gather data',
|
||||
formatter_class=argparse.RawDescriptionHelpFormatter,
|
||||
epilog="""
|
||||
Examples:
|
||||
%(prog)s ./must-gather.local.123456789
|
||||
%(prog)s ./must-gather.local.123456789 --problems-only
|
||||
"""
|
||||
)
|
||||
|
||||
parser.add_argument('must_gather_path', help='Path to must-gather directory')
|
||||
parser.add_argument('-p', '--problems-only', action='store_true',
|
||||
help='Show only nodes with issues')
|
||||
|
||||
args = parser.parse_args()
|
||||
|
||||
if not os.path.isdir(args.must_gather_path):
|
||||
print(f"Error: Directory not found: {args.must_gather_path}", file=sys.stderr)
|
||||
return 1
|
||||
|
||||
return analyze_nodes(args.must_gather_path, args.problems_only)
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
sys.exit(main())
|
||||
444
skills/must-gather-analyzer/scripts/analyze_ovn_dbs.py
Executable file
@@ -0,0 +1,444 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Analyze OVN Northbound and Southbound databases from must-gather.
|
||||
Uses ovsdb-tool to read binary .db files collected per-node.
|
||||
|
||||
Must-gather structure:
|
||||
network_logs/
|
||||
└── ovnk_database_store.tar.gz
|
||||
└── ovnk_database_store/
|
||||
├── ovnkube-node-{pod}_nbdb (per-zone NBDB)
|
||||
├── ovnkube-node-{pod}_sbdb (per-zone SBDB)
|
||||
└── ...
|
||||
"""
|
||||
|
||||
import subprocess
|
||||
import json
|
||||
import sys
|
||||
import os
|
||||
import tarfile
|
||||
import yaml
|
||||
import argparse
|
||||
from pathlib import Path
|
||||
from typing import List, Dict, Any, Optional
|
||||
|
||||
|
||||
class OVNDatabase:
|
||||
"""Wrapper for querying OVSDB files using ovsdb-tool"""
|
||||
|
||||
def __init__(self, db_path: Path, db_type: str, node_name: str = None):
|
||||
self.db_path = db_path
|
||||
self.db_type = db_type # 'nbdb' or 'sbdb'
|
||||
self.pod_name = db_path.stem.replace('_nbdb', '').replace('_sbdb', '')
|
||||
self.node_name = node_name or self.pod_name # Use node name if available
|
||||
|
||||
def query(self, table: str, columns: List[str] = None, where: List = None) -> List[Dict]:
|
||||
"""Query OVSDB table using ovsdb-tool query command"""
|
||||
schema = "OVN_Northbound" if self.db_type == "nbdb" else "OVN_Southbound"
|
||||
|
||||
# Build query
|
||||
query_op = {
|
||||
"op": "select",
|
||||
"table": table,
|
||||
"where": where or []
|
||||
}
|
||||
|
||||
if columns:
|
||||
query_op["columns"] = columns
|
||||
|
||||
query_json = json.dumps([schema, query_op])
|
||||
|
||||
try:
|
||||
result = subprocess.run(
|
||||
['ovsdb-tool', 'query', str(self.db_path), query_json],
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=30
|
||||
)
|
||||
|
||||
if result.returncode != 0:
|
||||
print(f"Warning: Query failed for {self.db_path}: {result.stderr}", file=sys.stderr)
|
||||
return []
|
||||
|
||||
data = json.loads(result.stdout)
|
||||
return data[0].get('rows', [])
|
||||
|
||||
except Exception as e:
|
||||
print(f"Warning: Failed to query {table} from {self.db_path}: {e}", file=sys.stderr)
|
||||
return []
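# Illustrative usage (the database file name is a placeholder, not taken from a real must-gather):
#   db = OVNDatabase(Path("network_logs/ovnk_database_store/ovnkube-node-abcde_nbdb"), "nbdb")
#   acls = db.query("ACL", columns=["priority", "match", "action"],
#                   where=[["priority", ">", 1000]])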
|
||||
|
||||
|
||||
def build_pod_to_node_mapping(mg_path: Path) -> Dict[str, str]:
|
||||
"""Build mapping of ovnkube pod names to node names"""
|
||||
pod_to_node = {}
|
||||
|
||||
# Look for ovnkube-node pods in openshift-ovn-kubernetes namespace
|
||||
ovn_ns_path = mg_path / "namespaces" / "openshift-ovn-kubernetes" / "pods"
|
||||
|
||||
if not ovn_ns_path.exists():
|
||||
print(f"Warning: OVN namespace pods not found at {ovn_ns_path}", file=sys.stderr)
|
||||
return pod_to_node
|
||||
|
||||
# Find all ovnkube-node pod directories
|
||||
for pod_dir in ovn_ns_path.glob("ovnkube-node-*"):
|
||||
if not pod_dir.is_dir():
|
||||
continue
|
||||
|
||||
pod_name = pod_dir.name
|
||||
pod_yaml = pod_dir / f"{pod_name}.yaml"
|
||||
|
||||
if not pod_yaml.exists():
|
||||
continue
|
||||
|
||||
try:
|
||||
with open(pod_yaml, 'r') as f:
|
||||
pod = yaml.safe_load(f)
|
||||
node_name = pod.get('spec', {}).get('nodeName')
|
||||
if node_name:
|
||||
pod_to_node[pod_name] = node_name
|
||||
except Exception as e:
|
||||
print(f"Warning: Failed to parse {pod_yaml}: {e}", file=sys.stderr)
|
||||
|
||||
return pod_to_node
|
||||
|
||||
|
||||
def extract_db_tarball(mg_path: Path) -> Path:
|
||||
"""Extract ovnk_database_store.tar.gz if not already extracted"""
|
||||
network_logs = mg_path / "network_logs"
|
||||
tarball = network_logs / "ovnk_database_store.tar.gz"
|
||||
extract_dir = network_logs / "ovnk_database_store"
|
||||
|
||||
if not tarball.exists():
|
||||
print(f"Error: Database tarball not found: {tarball}", file=sys.stderr)
|
||||
return None
|
||||
|
||||
# Extract if directory doesn't exist
|
||||
if not extract_dir.exists():
|
||||
print(f"Extracting {tarball}...")
|
||||
with tarfile.open(tarball, 'r:gz') as tar:
|
||||
tar.extractall(path=network_logs)
|
||||
|
||||
return extract_dir
|
||||
|
||||
|
||||
def get_nb_databases(db_dir: Path, pod_to_node: Dict[str, str]) -> List[OVNDatabase]:
|
||||
"""Find all NB database files and map them to nodes"""
|
||||
databases = []
|
||||
for db in sorted(db_dir.glob("*_nbdb")):
|
||||
pod_name = db.stem.replace('_nbdb', '')
|
||||
node_name = pod_to_node.get(pod_name)
|
||||
databases.append(OVNDatabase(db, 'nbdb', node_name))
|
||||
return databases
|
||||
|
||||
|
||||
def get_sb_databases(db_dir: Path, pod_to_node: Dict[str, str]) -> List[OVNDatabase]:
|
||||
"""Find all SB database files and map them to nodes"""
|
||||
databases = []
|
||||
for db in sorted(db_dir.glob("*_sbdb")):
|
||||
pod_name = db.stem.replace('_sbdb', '')
|
||||
node_name = pod_to_node.get(pod_name)
|
||||
databases.append(OVNDatabase(db, 'sbdb', node_name))
|
||||
return databases
|
||||
|
||||
|
||||
def analyze_logical_switches(db: OVNDatabase):
|
||||
"""Analyze logical switches in the zone"""
|
||||
switches = db.query("Logical_Switch", columns=["name", "ports", "other_config"])
|
||||
|
||||
if not switches:
|
||||
print(" No logical switches found.")
|
||||
return
|
||||
|
||||
print(f"\n LOGICAL SWITCHES ({len(switches)}):")
|
||||
print(f" {'NAME':<60} PORTS")
|
||||
print(f" {'-'*80}")
|
||||
|
||||
for sw in switches:
|
||||
name = sw.get('name', 'unknown')
|
||||
# ports is a UUID set, just count them
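# OVSDB's JSON encoding wraps multi-valued columns as ["set", [uuid1, uuid2, ...]];
# a column holding a single value is emitted bare instead.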
|
||||
port_count = 0
|
||||
ports = sw.get('ports', [])
|
||||
if isinstance(ports, list) and len(ports) == 2 and ports[0] == "set":
|
||||
port_count = len(ports[1])
|
||||
|
||||
print(f" {name:<60} {port_count}")
|
||||
|
||||
|
||||
def analyze_logical_switch_ports(db: OVNDatabase):
|
||||
"""Analyze logical switch ports, focusing on pods"""
|
||||
lsps = db.query("Logical_Switch_Port", columns=["name", "external_ids", "addresses"])
|
||||
|
||||
# Filter for pod ports (have pod=true in external_ids)
|
||||
pod_ports = []
|
||||
for lsp in lsps:
|
||||
ext_ids = lsp.get('external_ids', [])
|
||||
if isinstance(ext_ids, list) and len(ext_ids) == 2 and ext_ids[0] == "map":
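# OVSDB encodes map columns as ["map", [[key, value], ...]],
# e.g. ["map", [["pod", "true"], ["namespace", "openshift-dns"]]] (values illustrative)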
|
||||
ext_map = dict(ext_ids[1])
|
||||
if ext_map.get('pod') == 'true':
|
||||
# Pod name is in the LSP name (format: namespace_podname)
|
||||
lsp_name = lsp.get('name', '')
|
||||
namespace = ext_map.get('namespace', '')
|
||||
|
||||
# Extract pod name from LSP name
|
||||
pod_name = lsp_name
|
||||
if lsp_name.startswith(namespace + '_'):
|
||||
pod_name = lsp_name[len(namespace) + 1:]
|
||||
|
||||
# Extract IP from addresses (format can be string "MAC IP" or empty)
|
||||
ip = ""
|
||||
addrs = lsp.get('addresses', '')
|
||||
if isinstance(addrs, str) and addrs:
|
||||
parts = addrs.split()
|
||||
if len(parts) > 1:
|
||||
ip = parts[1]
|
||||
|
||||
pod_ports.append({
|
||||
'name': lsp_name,
|
||||
'namespace': namespace,
|
||||
'pod_name': pod_name,
|
||||
'ip': ip
|
||||
})
|
||||
|
||||
if not pod_ports:
|
||||
print(" No pod logical switch ports found.")
|
||||
return
|
||||
|
||||
print(f"\n POD LOGICAL SWITCH PORTS ({len(pod_ports)}):")
|
||||
print(f" {'NAMESPACE':<40} {'POD':<45} IP")
|
||||
print(f" {'-'*120}")
|
||||
|
||||
for port in sorted(pod_ports, key=lambda x: (x['namespace'], x['pod_name']))[:20]: # Show first 20
|
||||
namespace = port['namespace'][:40]
|
||||
pod_name = port['pod_name'][:45]
|
||||
ip = port['ip']
|
||||
|
||||
print(f" {namespace:<40} {pod_name:<45} {ip}")
|
||||
|
||||
if len(pod_ports) > 20:
|
||||
print(f" ... and {len(pod_ports) - 20} more")
|
||||
|
||||
|
||||
def analyze_acls(db: OVNDatabase):
|
||||
"""Analyze ACLs in the zone"""
|
||||
acls = db.query("ACL", columns=["priority", "direction", "match", "action", "severity"])
|
||||
|
||||
if not acls:
|
||||
print(" No ACLs found.")
|
||||
return
|
||||
|
||||
print(f"\n ACCESS CONTROL LISTS ({len(acls)}):")
|
||||
print(f" {'PRIORITY':<10} {'DIRECTION':<15} {'ACTION':<15} MATCH")
|
||||
print(f" {'-'*120}")
|
||||
|
||||
# Show highest priority ACLs first
|
||||
sorted_acls = sorted(acls, key=lambda x: x.get('priority', 0), reverse=True)
|
||||
|
||||
for acl in sorted_acls[:15]: # Show top 15
|
||||
priority = acl.get('priority', 0)
|
||||
direction = acl.get('direction', '')
|
||||
action = acl.get('action', '')
|
||||
match = acl.get('match', '')[:70] # Truncate long matches
|
||||
|
||||
print(f" {priority:<10} {direction:<15} {action:<15} {match}")
|
||||
|
||||
if len(acls) > 15:
|
||||
print(f" ... and {len(acls) - 15} more")
|
||||
|
||||
|
||||
def analyze_logical_routers(db: OVNDatabase):
|
||||
"""Analyze logical routers in the zone"""
|
||||
routers = db.query("Logical_Router", columns=["name", "ports", "static_routes"])
|
||||
|
||||
if not routers:
|
||||
print(" No logical routers found.")
|
||||
return
|
||||
|
||||
print(f"\n LOGICAL ROUTERS ({len(routers)}):")
|
||||
print(f" {'NAME':<60} PORTS")
|
||||
print(f" {'-'*80}")
|
||||
|
||||
for router in routers:
|
||||
name = router.get('name', 'unknown')
|
||||
|
||||
# Count ports
|
||||
port_count = 0
|
||||
ports = router.get('ports', [])
|
||||
if isinstance(ports, list) and len(ports) == 2 and ports[0] == "set":
|
||||
port_count = len(ports[1])
|
||||
|
||||
print(f" {name:<60} {port_count}")
|
||||
|
||||
|
||||
def analyze_zone_summary(db: OVNDatabase):
|
||||
"""Print summary for a zone"""
|
||||
# Get counts - for ACLs we need multiple columns to get accurate count
|
||||
switches = db.query("Logical_Switch", columns=["name"])
|
||||
lsps = db.query("Logical_Switch_Port", columns=["name"])
|
||||
acls = db.query("ACL", columns=["priority", "direction", "match"])
|
||||
routers = db.query("Logical_Router", columns=["name"])
|
||||
|
||||
print(f"\n{'='*80}")
|
||||
print(f"Node: {db.node_name}")
|
||||
if db.node_name != db.pod_name:
|
||||
print(f"Pod: {db.pod_name}")
|
||||
print(f"{'='*80}")
|
||||
print(f" Logical Switches: {len(switches)}")
|
||||
print(f" Logical Switch Ports: {len(lsps)}")
|
||||
print(f" ACLs: {len(acls)}")
|
||||
print(f" Logical Routers: {len(routers)}")
|
||||
|
||||
|
||||
def run_raw_query(mg_path: str, node_filter: str, query_json: str):
|
||||
"""Run a raw JSON query against OVN databases"""
|
||||
base_path = Path(mg_path)
|
||||
|
||||
# Build pod-to-node mapping
|
||||
pod_to_node = build_pod_to_node_mapping(base_path)
|
||||
|
||||
# Extract tarball
|
||||
db_dir = extract_db_tarball(base_path)
|
||||
if not db_dir:
|
||||
return 1
|
||||
|
||||
# Get all NB databases
|
||||
nb_dbs = get_nb_databases(db_dir, pod_to_node)
|
||||
|
||||
if not nb_dbs:
|
||||
print("No Northbound databases found in must-gather.", file=sys.stderr)
|
||||
return 1
|
||||
|
||||
# Filter by node if specified
|
||||
if node_filter:
|
||||
filtered_dbs = [db for db in nb_dbs if node_filter in db.node_name]
|
||||
if not filtered_dbs:
|
||||
print(f"Error: No databases found for node matching '{node_filter}'", file=sys.stderr)
|
||||
print(f"\nAvailable nodes:", file=sys.stderr)
|
||||
for db in nb_dbs:
|
||||
print(f" - {db.node_name}", file=sys.stderr)
|
||||
return 1
|
||||
nb_dbs = filtered_dbs
|
||||
|
||||
# Run query on each database
|
||||
for db in nb_dbs:
|
||||
print(f"\n{'='*80}")
|
||||
print(f"Node: {db.node_name}")
|
||||
if db.node_name != db.pod_name:
|
||||
print(f"Pod: {db.pod_name}")
|
||||
print(f"{'='*80}\n")
|
||||
|
||||
try:
|
||||
# Run the raw query using ovsdb-tool
|
||||
result = subprocess.run(
|
||||
['ovsdb-tool', 'query', str(db.db_path), query_json],
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=30
|
||||
)
|
||||
|
||||
if result.returncode != 0:
|
||||
print(f"Error: Query failed: {result.stderr}", file=sys.stderr)
|
||||
continue
|
||||
|
||||
# Pretty print the JSON result
|
||||
try:
|
||||
data = json.loads(result.stdout)
|
||||
print(json.dumps(data, indent=2))
|
||||
except json.JSONDecodeError:
|
||||
# If not valid JSON, just print raw output
|
||||
print(result.stdout)
|
||||
|
||||
except Exception as e:
|
||||
print(f"Error: Failed to execute query: {e}", file=sys.stderr)
|
||||
|
||||
return 0
|
||||
|
||||
|
||||
def analyze_northbound_databases(mg_path: str, node_filter: str = None):
|
||||
"""Analyze all Northbound databases"""
|
||||
base_path = Path(mg_path)
|
||||
|
||||
# Build pod-to-node mapping
|
||||
pod_to_node = build_pod_to_node_mapping(base_path)
|
||||
|
||||
# Extract tarball
|
||||
db_dir = extract_db_tarball(base_path)
|
||||
if not db_dir:
|
||||
return 1
|
||||
|
||||
# Get all NB databases
|
||||
nb_dbs = get_nb_databases(db_dir, pod_to_node)
|
||||
|
||||
if not nb_dbs:
|
||||
print("No Northbound databases found in must-gather.", file=sys.stderr)
|
||||
return 1
|
||||
|
||||
# Filter by node if specified
|
||||
if node_filter:
|
||||
filtered_dbs = [db for db in nb_dbs if node_filter in db.node_name]
|
||||
if not filtered_dbs:
|
||||
print(f"Error: No databases found for node matching '{node_filter}'", file=sys.stderr)
|
||||
print(f"\nAvailable nodes:", file=sys.stderr)
|
||||
for db in nb_dbs:
|
||||
print(f" - {db.node_name}", file=sys.stderr)
|
||||
return 1
|
||||
nb_dbs = filtered_dbs
|
||||
|
||||
print(f"\nFound {len(nb_dbs)} node(s)\n")
|
||||
|
||||
# Analyze each zone
|
||||
for db in nb_dbs:
|
||||
analyze_zone_summary(db)
|
||||
analyze_logical_switches(db)
|
||||
analyze_logical_switch_ports(db)
|
||||
analyze_acls(db)
|
||||
analyze_logical_routers(db)
|
||||
print()
|
||||
|
||||
return 0
|
||||
|
||||
|
||||
def main():
|
||||
parser = argparse.ArgumentParser(
|
||||
description="Analyze OVN databases from must-gather",
|
||||
formatter_class=argparse.RawDescriptionHelpFormatter,
|
||||
epilog="""
|
||||
Examples:
|
||||
# Analyze all nodes
|
||||
analyze_ovn_dbs.py ./must-gather.local.123456789
|
||||
|
||||
# Analyze specific node
|
||||
analyze_ovn_dbs.py ./must-gather.local.123456789 --node ip-10-0-26-145
|
||||
|
||||
# Run raw OVSDB query (Claude can construct the JSON)
|
||||
analyze_ovn_dbs.py ./must-gather/ --query '["OVN_Northbound", {"op":"select", "table":"ACL", "where":[["priority", ">", 1000]], "columns":["priority","match","action"]}]'
|
||||
|
||||
# Query specific node
|
||||
analyze_ovn_dbs.py ./must-gather/ --node master-0 --query '["OVN_Northbound", {"op":"select", "table":"Logical_Switch", "where":[], "columns":["name"]}]'
|
||||
"""
|
||||
)
|
||||
parser.add_argument('must_gather_path', help='Path to must-gather directory')
|
||||
parser.add_argument('--node', '-n', help='Filter by node name (supports partial matches)')
|
||||
parser.add_argument('--query', '-q', help='Run raw OVSDB JSON query instead of standard analysis')
|
||||
|
||||
args = parser.parse_args()
|
||||
|
||||
if not os.path.isdir(args.must_gather_path):
|
||||
print(f"Error: Directory not found: {args.must_gather_path}", file=sys.stderr)
|
||||
return 1
|
||||
|
||||
# Check if ovsdb-tool is available
|
||||
try:
|
||||
subprocess.run(['ovsdb-tool', '--version'], capture_output=True, check=True)
|
||||
except (subprocess.CalledProcessError, FileNotFoundError):
|
||||
print("Error: ovsdb-tool not found. Please install openvswitch package.", file=sys.stderr)
|
||||
return 1
|
||||
|
||||
# Run query mode or standard analysis
|
||||
if args.query:
|
||||
return run_raw_query(args.must_gather_path, args.node, args.query)
|
||||
else:
|
||||
return analyze_northbound_databases(args.must_gather_path, args.node)
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
sys.exit(main())
|
||||
224
skills/must-gather-analyzer/scripts/analyze_pods.py
Executable file
@@ -0,0 +1,224 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Analyze Pod resources from must-gather data.
|
||||
Displays output similar to 'oc get pods -A' command.
|
||||
"""
|
||||
|
||||
import sys
|
||||
import os
|
||||
import yaml
|
||||
import argparse
|
||||
from pathlib import Path
|
||||
from datetime import datetime
|
||||
from typing import List, Dict, Any, Optional
|
||||
|
||||
|
||||
def parse_pod(file_path: Path) -> Optional[Dict[str, Any]]:
|
||||
"""Parse a single pod YAML file."""
|
||||
try:
|
||||
with open(file_path, 'r') as f:
|
||||
doc = yaml.safe_load(f)
|
||||
if doc and doc.get('kind') == 'Pod':
|
||||
return doc
|
||||
except Exception as e:
|
||||
print(f"Warning: Failed to parse {file_path}: {e}", file=sys.stderr)
|
||||
return None
|
||||
|
||||
|
||||
def calculate_age(creation_timestamp: str) -> str:
|
||||
"""Calculate age from creation timestamp."""
|
||||
try:
|
||||
ts = datetime.fromisoformat(creation_timestamp.replace('Z', '+00:00'))
|
||||
now = datetime.now(ts.tzinfo)
|
||||
delta = now - ts
|
||||
|
||||
days = delta.days
|
||||
hours = delta.seconds // 3600
|
||||
minutes = (delta.seconds % 3600) // 60
|
||||
|
||||
if days > 0:
|
||||
return f"{days}d"
|
||||
elif hours > 0:
|
||||
return f"{hours}h"
|
||||
elif minutes > 0:
|
||||
return f"{minutes}m"
|
||||
else:
|
||||
return "<1m"
|
||||
except Exception:
|
||||
return ""
|
||||
|
||||
|
||||
def get_pod_status(pod: Dict[str, Any]) -> Dict[str, Any]:
|
||||
"""Extract pod status information."""
|
||||
metadata = pod.get('metadata', {})
|
||||
status = pod.get('status', {})
|
||||
spec = pod.get('spec', {})
|
||||
|
||||
name = metadata.get('name', 'unknown')
|
||||
namespace = metadata.get('namespace', 'unknown')
|
||||
creation_time = metadata.get('creationTimestamp', '')
|
||||
|
||||
# Get container statuses
|
||||
container_statuses = status.get('containerStatuses', [])
|
||||
init_container_statuses = status.get('initContainerStatuses', [])
|
||||
|
||||
# Calculate ready containers
|
||||
total_containers = len(spec.get('containers', []))
|
||||
ready_containers = sum(1 for cs in container_statuses if cs.get('ready', False))
|
||||
|
||||
# Get overall phase
|
||||
phase = status.get('phase', 'Unknown')
|
||||
|
||||
# Determine more specific status
|
||||
pod_status = phase
|
||||
reason = status.get('reason', '')
|
||||
|
||||
# Check for specific container states
|
||||
for cs in container_statuses:
|
||||
state = cs.get('state', {})
|
||||
if 'waiting' in state:
|
||||
waiting = state['waiting']
|
||||
pod_status = waiting.get('reason', 'Waiting')
|
||||
elif 'terminated' in state:
|
||||
terminated = state['terminated']
|
||||
if terminated.get('exitCode', 0) != 0:
|
||||
pod_status = terminated.get('reason', 'Error')
|
||||
|
||||
# Check init containers
|
||||
for ics in init_container_statuses:
|
||||
state = ics.get('state', {})
|
||||
if 'waiting' in state:
|
||||
waiting = state['waiting']
|
||||
if waiting.get('reason') in ['CrashLoopBackOff', 'ImagePullBackOff', 'ErrImagePull']:
|
||||
pod_status = f"Init:{waiting.get('reason', 'Waiting')}"
|
||||
|
||||
# Calculate total restarts
|
||||
total_restarts = sum(cs.get('restartCount', 0) for cs in container_statuses)
|
||||
|
||||
# Calculate age
|
||||
age = calculate_age(creation_time) if creation_time else ''
|
||||
|
||||
return {
|
||||
'namespace': namespace,
|
||||
'name': name,
|
||||
'ready': f"{ready_containers}/{total_containers}",
|
||||
'status': pod_status,
|
||||
'restarts': str(total_restarts),
|
||||
'age': age,
|
||||
'node': spec.get('nodeName', ''),
|
||||
'is_problem': pod_status not in ['Running', 'Succeeded', 'Completed'] or total_restarts > 0
|
||||
}
|
||||
|
||||
|
||||
def print_pods_table(pods: List[Dict[str, Any]], show_namespace: bool = True):
|
||||
"""Print pods in a formatted table like 'oc get pods'."""
|
||||
if not pods:
|
||||
print("No resources found.")
|
||||
return
|
||||
|
||||
# Print header
|
||||
if show_namespace:
|
||||
print(f"{'NAMESPACE':<42} {'NAME':<50} {'READY':<7} {'STATUS':<20} {'RESTARTS':<9} AGE")
|
||||
else:
|
||||
print(f"{'NAME':<50} {'READY':<7} {'STATUS':<20} {'RESTARTS':<9} AGE")
|
||||
|
||||
# Print rows
|
||||
for pod in pods:
|
||||
name = pod['name'][:50]
|
||||
ready = pod['ready'][:7]
|
||||
status = pod['status'][:20]
|
||||
restarts = pod['restarts'][:9]
|
||||
age = pod['age']
|
||||
|
||||
if show_namespace:
|
||||
namespace = pod['namespace'][:42]
|
||||
print(f"{namespace:<42} {name:<50} {ready:<7} {status:<20} {restarts:<9} {age}")
|
||||
else:
|
||||
print(f"{name:<50} {ready:<7} {status:<20} {restarts:<9} {age}")
|
||||
|
||||
|
||||
def analyze_pods(must_gather_path: str, namespace: Optional[str] = None, problems_only: bool = False):
|
||||
"""Analyze all pods in a must-gather directory."""
|
||||
base_path = Path(must_gather_path)
|
||||
|
||||
pods = []
|
||||
|
||||
# Find all pod YAML files
|
||||
# Structure: namespaces/<namespace>/pods/<pod-name>/<pod-name>.yaml
|
||||
if namespace:
|
||||
# Specific namespace
|
||||
patterns = [
|
||||
f"namespaces/{namespace}/pods/*/*.yaml",
|
||||
f"*/namespaces/{namespace}/pods/*/*.yaml",
|
||||
]
|
||||
else:
|
||||
# All namespaces
|
||||
patterns = [
|
||||
"namespaces/*/pods/*/*.yaml",
|
||||
"*/namespaces/*/pods/*/*.yaml",
|
||||
]
|
||||
|
||||
for pattern in patterns:
|
||||
for pod_file in base_path.glob(pattern):
|
||||
pod = parse_pod(pod_file)
|
||||
if pod:
|
||||
pod_status = get_pod_status(pod)
|
||||
pods.append(pod_status)
|
||||
|
||||
if not pods:
|
||||
print("No resources found.")
|
||||
return 1
|
||||
|
||||
# Remove duplicates
|
||||
seen = set()
|
||||
unique_pods = []
|
||||
for p in pods:
|
||||
key = f"{p['namespace']}/{p['name']}"
|
||||
if key not in seen:
|
||||
seen.add(key)
|
||||
unique_pods.append(p)
|
||||
|
||||
# Sort by namespace, then name
|
||||
unique_pods.sort(key=lambda x: (x['namespace'], x['name']))
|
||||
|
||||
# Filter if problems only
|
||||
if problems_only:
|
||||
unique_pods = [p for p in unique_pods if p['is_problem']]
|
||||
if not unique_pods:
|
||||
print("No resources found.")
|
||||
return 0
|
||||
|
||||
# Print results
|
||||
print_pods_table(unique_pods, show_namespace=(namespace is None))
|
||||
|
||||
return 0
|
||||
|
||||
|
||||
def main():
|
||||
parser = argparse.ArgumentParser(
|
||||
description='Analyze pod resources from must-gather data',
|
||||
formatter_class=argparse.RawDescriptionHelpFormatter,
|
||||
epilog="""
|
||||
Examples:
|
||||
%(prog)s ./must-gather.local.123456789
|
||||
%(prog)s ./must-gather.local.123456789 --namespace openshift-etcd
|
||||
%(prog)s ./must-gather.local.123456789 --problems-only
|
||||
"""
|
||||
)
|
||||
|
||||
parser.add_argument('must_gather_path', help='Path to must-gather directory')
|
||||
parser.add_argument('-n', '--namespace', help='Filter by namespace')
|
||||
parser.add_argument('-p', '--problems-only', action='store_true',
|
||||
help='Show only pods with issues')
|
||||
|
||||
args = parser.parse_args()
|
||||
|
||||
if not os.path.isdir(args.must_gather_path):
|
||||
print(f"Error: Directory not found: {args.must_gather_path}", file=sys.stderr)
|
||||
return 1
|
||||
|
||||
return analyze_pods(args.must_gather_path, args.namespace, args.problems_only)
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
sys.exit(main())
|
||||
117
skills/must-gather-analyzer/scripts/analyze_prometheus.py
Executable file
@@ -0,0 +1,117 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Analyze Prometheus data from must-gather data.
|
||||
Shows active Prometheus alerts parsed from the collected rules data.
|
||||
"""
|
||||
|
||||
import sys
|
||||
import os
|
||||
import json
|
||||
import argparse
|
||||
from pathlib import Path
|
||||
from typing import List, Dict, Any, Optional
|
||||
|
||||
def parse_json_file(file_path: Path) -> Optional[Dict[str, Any]]:
|
||||
"""Parse a JSON file."""
|
||||
try:
|
||||
with open(file_path, 'r', encoding='utf-8') as f:
|
||||
doc = json.load(f)
|
||||
return doc
|
||||
except (FileNotFoundError, json.JSONDecodeError, OSError) as e:
|
||||
print(f"Error: Failed to parse {file_path}: {e}", file=sys.stderr)
|
||||
return None
|
||||
|
||||
def print_alerts_table(alerts):
|
||||
"""Print alerts in a table format."""
|
||||
if not alerts:
|
||||
print("No alerts found.")
|
||||
return
|
||||
|
||||
print("ALERTS")
|
||||
print(f"{'STATE':<10} {'NAMESPACE':<50} {'NAME':<50} {'SEVERITY':<10} {'SINCE':<20} LABELS")
|
||||
|
||||
for alert in alerts:
|
||||
state = alert.get('state', '')
|
||||
since = alert.get('activeAt', '')[:19] + 'Z' # timestamps are always UTC.
|
||||
labels = alert.get('labels', {})
|
||||
namespace = labels.pop('namespace', '')[:50]
|
||||
name = labels.pop('alertname', '')[:50]
|
||||
severity = labels.pop('severity', '')[:10]
|
||||
|
||||
print(f"{state:<10} {namespace:<50} {name:<50} {severity:<10} {since:<20} {labels}")
|
||||
|
||||
|
||||
def analyze_prometheus(must_gather_path: str, namespace: Optional[str] = None):
|
||||
"""Analyze Prometheus data in a must-gather directory."""
|
||||
base_path = Path(must_gather_path)
|
||||
|
||||
# Retrieve active alerts.
|
||||
rules_path = base_path / "monitoring" / "prometheus" / "rules.json"
|
||||
rules = parse_json_file(rules_path)
|
||||
|
||||
if rules is None:
|
||||
return 1
|
||||
status = rules.get("status", "")
|
||||
if status != "success":
|
||||
print(f"{rules_path}: unexpected status {status}", file=sys.stderr)
|
||||
return 1
|
||||
|
||||
if "data" not in rules or "groups" not in rules["data"]:
|
||||
print(f"Error: Unexpected JSON structure in {rules_path}", file=sys.stderr)
|
||||
return 1
|
||||
|
||||
alerts = []
|
||||
for group in rules["data"]["groups"]:
|
||||
for rule in group["rules"]:
|
||||
if rule["type"] == 'alerting' and rule["state"] != 'inactive':
|
||||
for alert in rule["alerts"]:
|
||||
if namespace is None or namespace == '':
|
||||
alerts.append(alert)
|
||||
elif alert.get('labels', {}).get('namespace', '') == namespace:
|
||||
alerts.append(alert)
|
||||
|
||||
# Sort alerts by namespace, alertname and severity.
|
||||
alerts.sort(key=lambda x: (x.get('labels', {}).get('namespace', ''), x.get('labels', {}).get('alertname', ''), x.get('labels', {}).get('severity', '')))
|
||||
|
||||
# Print results
|
||||
print_alerts_table(alerts)
|
||||
|
||||
# Summary
|
||||
total_alerts = len(alerts)
|
||||
pending = sum(1 for alert in alerts if alert.get('state') == 'pending')
|
||||
firing = sum(1 for alert in alerts if alert.get('state') == 'firing')
|
||||
|
||||
print(f"\n{'='*80}")
|
||||
print(f"SUMMARY")
|
||||
print(f"Active alerts: {total_alerts} total ({pending} pending, {firing} firing)")
|
||||
print(f"{'='*80}")
|
||||
|
||||
return 0
|
||||
|
||||
|
||||
def main():
|
||||
parser = argparse.ArgumentParser(
|
||||
description='Analyze Prometheus data from must-gather data',
|
||||
formatter_class=argparse.RawDescriptionHelpFormatter,
|
||||
epilog="""
|
||||
Examples:
|
||||
%(prog)s ./must-gather
|
||||
%(prog)s ./must-gather --namespace openshift-monitoring
|
||||
"""
|
||||
)
|
||||
|
||||
parser.add_argument('must_gather_path', help='Path to must-gather directory')
|
||||
parser.add_argument('-n', '--namespace', help='Filter information by namespace')
|
||||
|
||||
args = parser.parse_args()
|
||||
|
||||
if not os.path.isdir(args.must_gather_path):
|
||||
print(f"Error: Directory not found: {args.must_gather_path}", file=sys.stderr)
|
||||
return 1
|
||||
|
||||
return analyze_prometheus(args.must_gather_path, args.namespace)
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
sys.exit(main())
|
||||
|
||||
235
skills/must-gather-analyzer/scripts/analyze_pvs.py
Executable file
@@ -0,0 +1,235 @@
#!/usr/bin/env python3
"""
Analyze PersistentVolumes and PersistentVolumeClaims from must-gather data.
Shows PV/PVC status, capacity, and binding information.
"""

import sys
import os
import yaml
import argparse
from pathlib import Path
from typing import List, Dict, Any, Optional


def parse_yaml_file(file_path: Path) -> Optional[Dict[str, Any]]:
    """Parse a YAML file."""
    try:
        with open(file_path, 'r') as f:
            doc = yaml.safe_load(f)
        return doc
    except Exception as e:
        print(f"Warning: Failed to parse {file_path}: {e}", file=sys.stderr)
        return None


def format_pv(pv: Dict[str, Any]) -> Dict[str, str]:
    """Format a PersistentVolume for display."""
    name = pv.get('metadata', {}).get('name', 'unknown')
    spec = pv.get('spec', {})
    status = pv.get('status', {})

    capacity = spec.get('capacity', {}).get('storage', '')
    access_modes = ','.join(spec.get('accessModes', []))[:20]
    reclaim_policy = spec.get('persistentVolumeReclaimPolicy', '')
    pv_status = status.get('phase', 'Unknown')

    claim_ref = spec.get('claimRef', {})
    claim = ''
    if claim_ref:
        claim_ns = claim_ref.get('namespace', '')
        claim_name = claim_ref.get('name', '')
        claim = f"{claim_ns}/{claim_name}" if claim_ns else claim_name

    storage_class = spec.get('storageClassName', '')

    return {
        'name': name,
        'capacity': capacity,
        'access_modes': access_modes,
        'reclaim_policy': reclaim_policy,
        'status': pv_status,
        'claim': claim,
        'storage_class': storage_class
    }


def format_pvc(pvc: Dict[str, Any]) -> Dict[str, str]:
    """Format a PersistentVolumeClaim for display."""
    metadata = pvc.get('metadata', {})
    name = metadata.get('name', 'unknown')
    namespace = metadata.get('namespace', 'unknown')
    spec = pvc.get('spec', {})
    status = pvc.get('status', {})

    pvc_status = status.get('phase', 'Unknown')
    volume = spec.get('volumeName', '')
    capacity = status.get('capacity', {}).get('storage', '')
    access_modes = ','.join(status.get('accessModes', []))[:20]
    storage_class = spec.get('storageClassName', '')

    return {
        'namespace': namespace,
        'name': name,
        'status': pvc_status,
        'volume': volume,
        'capacity': capacity,
        'access_modes': access_modes,
        'storage_class': storage_class
    }


def print_pvs_table(pvs: List[Dict[str, str]]):
    """Print PVs in a table format."""
    if not pvs:
        print("No PersistentVolumes found.")
        return

    print("PERSISTENT VOLUMES")
    print(f"{'NAME':<50} {'CAPACITY':<10} {'ACCESS MODES':<20} {'RECLAIM':<10} {'STATUS':<10} {'CLAIM':<40} STORAGECLASS")

    for pv in pvs:
        name = pv['name'][:50]
        capacity = pv['capacity'][:10]
        access = pv['access_modes'][:20]
        reclaim = pv['reclaim_policy'][:10]
        status = pv['status'][:10]
        claim = pv['claim'][:40]
        sc = pv['storage_class']

        print(f"{name:<50} {capacity:<10} {access:<20} {reclaim:<10} {status:<10} {claim:<40} {sc}")


def print_pvcs_table(pvcs: List[Dict[str, str]]):
    """Print PVCs in a table format."""
    if not pvcs:
        print("\nNo PersistentVolumeClaims found.")
        return

    print("\nPERSISTENT VOLUME CLAIMS")
    print(f"{'NAMESPACE':<30} {'NAME':<40} {'STATUS':<10} {'VOLUME':<50} {'CAPACITY':<10} {'ACCESS MODES':<20} STORAGECLASS")

    for pvc in pvcs:
        namespace = pvc['namespace'][:30]
        name = pvc['name'][:40]
        status = pvc['status'][:10]
        volume = pvc['volume'][:50]
        capacity = pvc['capacity'][:10]
        access = pvc['access_modes'][:20]
        sc = pvc['storage_class']

        print(f"{namespace:<30} {name:<40} {status:<10} {volume:<50} {capacity:<10} {access:<20} {sc}")


def analyze_storage(must_gather_path: str, namespace: Optional[str] = None):
    """Analyze PVs and PVCs in a must-gather directory."""
    base_path = Path(must_gather_path)

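    # NOTE: every glob below is tried at two depths, because must-gather content
    # is typically nested one level down under an image-named subdirectory.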
    # Find PVs (cluster-scoped)
    pv_patterns = [
        "cluster-scoped-resources/core/persistentvolumes/*.yaml",
        "*/cluster-scoped-resources/core/persistentvolumes/*.yaml",
    ]

    pvs = []
    for pattern in pv_patterns:
        for pv_file in base_path.glob(pattern):
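            # Skip the aggregated persistentvolumes.yaml list file, if present;
            # only the per-PV documents are parsed here.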
            if pv_file.name == 'persistentvolumes.yaml':
                continue
            pv = parse_yaml_file(pv_file)
            if pv and pv.get('kind') == 'PersistentVolume':
                pvs.append(format_pv(pv))

    # Find PVCs (namespace-scoped)
    if namespace:
        pvc_patterns = [
            f"namespaces/{namespace}/core/persistentvolumeclaims.yaml",
            f"*/namespaces/{namespace}/core/persistentvolumeclaims.yaml",
        ]
    else:
        pvc_patterns = [
            "namespaces/*/core/persistentvolumeclaims.yaml",
            "*/namespaces/*/core/persistentvolumeclaims.yaml",
        ]

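    # The aggregated persistentvolumeclaims.yaml may hold a single PVC document
    # or a v1 List; both shapes are handled below.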
    pvcs = []
    for pattern in pvc_patterns:
        for pvc_file in base_path.glob(pattern):
            pvc_doc = parse_yaml_file(pvc_file)
            if pvc_doc:
                if pvc_doc.get('kind') == 'PersistentVolumeClaim':
                    pvcs.append(format_pvc(pvc_doc))
                elif pvc_doc.get('kind') == 'List':
                    for item in pvc_doc.get('items', []):
                        if item.get('kind') == 'PersistentVolumeClaim':
                            pvcs.append(format_pvc(item))

    # Remove duplicates
    seen_pvs = set()
    unique_pvs = []
    for pv in pvs:
        if pv['name'] not in seen_pvs:
            seen_pvs.add(pv['name'])
            unique_pvs.append(pv)

    seen_pvcs = set()
    unique_pvcs = []
    for pvc in pvcs:
        key = f"{pvc['namespace']}/{pvc['name']}"
        if key not in seen_pvcs:
            seen_pvcs.add(key)
            unique_pvcs.append(pvc)

    # Sort
    unique_pvs.sort(key=lambda x: x['name'])
    unique_pvcs.sort(key=lambda x: (x['namespace'], x['name']))

    # Print results
    print_pvs_table(unique_pvs)
    print_pvcs_table(unique_pvcs)

    # Summary
    total_pvs = len(unique_pvs)
    bound_pvs = sum(1 for pv in unique_pvs if pv['status'] == 'Bound')
    available_pvs = sum(1 for pv in unique_pvs if pv['status'] == 'Available')

    total_pvcs = len(unique_pvcs)
    bound_pvcs = sum(1 for pvc in unique_pvcs if pvc['status'] == 'Bound')
    pending_pvcs = sum(1 for pvc in unique_pvcs if pvc['status'] == 'Pending')

    print(f"\n{'='*80}")
    print(f"SUMMARY")
    print(f"PVs: {total_pvs} total ({bound_pvs} bound, {available_pvs} available)")
    print(f"PVCs: {total_pvcs} total ({bound_pvcs} bound, {pending_pvcs} pending)")
    if pending_pvcs > 0:
        print(f"  ⚠️  {pending_pvcs} PVC(s) pending - check storage provisioner")
    print(f"{'='*80}")

    return 0


def main():
    parser = argparse.ArgumentParser(
        description='Analyze PVs and PVCs from must-gather data',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  %(prog)s ./must-gather
  %(prog)s ./must-gather --namespace openshift-monitoring
        """
    )

    parser.add_argument('must_gather_path', help='Path to must-gather directory')
    parser.add_argument('-n', '--namespace', help='Filter PVCs by namespace')

    args = parser.parse_args()

    if not os.path.isdir(args.must_gather_path):
        print(f"Error: Directory not found: {args.must_gather_path}", file=sys.stderr)
        return 1

    return analyze_storage(args.must_gather_path, args.namespace)


if __name__ == '__main__':
    sys.exit(main())