Initial commit
14
.claude-plugin/plugin.json
Normal file
@@ -0,0 +1,14 @@
{
  "name": "must-gather",
  "description": "A plugin to analyze and report on must-gather data",
  "version": "0.0.1",
  "author": {
    "name": "openshift"
  },
  "skills": [
    "./skills"
  ],
  "commands": [
    "./commands"
  ]
}
3
README.md
Normal file
@@ -0,0 +1,3 @@
# must-gather

A plugin to analyze and report on must-gather data
262
commands/analyze.md
Normal file
@@ -0,0 +1,262 @@
---
description: Quick analysis of must-gather data - runs all analysis scripts and provides comprehensive cluster diagnostics
argument-hint: [must-gather-path] [component]
---

## Name
must-gather:analyze

## Synopsis
```
/must-gather:analyze [must-gather-path] [component]
```

## Description

The `analyze` command performs comprehensive analysis of OpenShift must-gather diagnostic data. It runs specialized Python analysis scripts to extract and summarize cluster health information across multiple components.

The command can analyze:
- Cluster version and update status
- Cluster operator health (degraded, progressing, unavailable)
- Node conditions and resource status
- Pod failures, restarts, and crash loops
- Network configuration and OVN health
- OVN databases - logical topology, ACLs, pods
- Kubernetes events (warnings and errors)
- etcd cluster health and quorum status
- Persistent volume and claim status
- Prometheus alerts

You can request analysis of the entire cluster or focus on a specific component.

## Prerequisites

**Required Directory Structure:**

Must-gather data typically has this structure:
```
must-gather/
└── registry-ci-openshift-org-origin-...-sha256-<hash>/
    ├── cluster-scoped-resources/
    ├── namespaces/
    └── ...
```

The actual must-gather directory is the subdirectory with the hash name, not the parent directory.
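
For illustration, resolving the hashed subdirectory can be automated. The sketch below is a hypothetical helper (not one of the bundled scripts); it assumes, per the layout above, that the data directory is the child containing `cluster-scoped-resources/` and `namespaces/`:

```python
from pathlib import Path

def resolve_must_gather_root(path: str) -> Path:
    """Return the directory that actually holds the gathered data.

    Accepts either the hashed subdirectory itself or its parent.
    """
    base = Path(path)
    markers = ("cluster-scoped-resources", "namespaces")
    if all((base / m).is_dir() for m in markers):
        return base  # already the hashed subdirectory
    for child in sorted(base.iterdir()):
        if child.is_dir() and all((child / m).is_dir() for m in markers):
            return child  # parent directory was given; descend one level
    raise FileNotFoundError(f"No must-gather data found under {base}")
```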

**Required Scripts:**

Analysis scripts are bundled with this plugin at:
```
<plugin-root>/skills/must-gather-analyzer/scripts/
├── analyze_clusterversion.py
├── analyze_clusteroperators.py
├── analyze_nodes.py
├── analyze_pods.py
├── analyze_network.py
├── analyze_ovn_dbs.py
├── analyze_events.py
├── analyze_etcd.py
├── analyze_prometheus.py
└── analyze_pvs.py
```

Where `<plugin-root>` is the directory where this plugin is installed (typically `~/.cursor/commands/ai-helpers/plugins/must-gather/` or similar).

## Error Handling

**CRITICAL: Script-Only Analysis**

- **NEVER** attempt to analyze must-gather data directly using bash commands, grep, or manual file reading
- **ONLY** use the provided Python scripts in `plugins/must-gather/skills/must-gather-analyzer/scripts/`
- If scripts are missing or not found:
  1. Stop immediately
  2. Inform the user that the analysis scripts are not available
  3. Ask the user to ensure the scripts are installed at the correct path
  4. Do NOT attempt alternative approaches

**Script Availability Check:**

Before running any analysis:

1. Locate the scripts directory by searching for a known script:

   ```bash
   SCRIPT_PATH=$(find ~ -name "analyze_clusteroperators.py" -path "*/must-gather/skills/must-gather-analyzer/scripts/*" 2>/dev/null | head -1)

   if [ -z "$SCRIPT_PATH" ]; then
     echo "ERROR: Must-gather analysis scripts not found."
     echo "Please ensure the must-gather plugin from ai-helpers is properly installed."
     exit 1
   fi

   # All scripts are in the same directory, so just get the directory
   SCRIPTS_DIR=$(dirname "$SCRIPT_PATH")
   ```

2. If scripts cannot be found, STOP and report to the user:

   ```
   The must-gather analysis scripts could not be located. Please ensure the must-gather plugin from openshift-eng/ai-helpers is properly installed in your Claude Code plugins directory.
   ```

## Implementation

The command performs the following steps:

1. **Validate Must-Gather Path**:
   - If the path is not provided as an argument, ask the user
   - Check that the path contains `cluster-scoped-resources/` and `namespaces/` directories
   - If the user provides the root directory, automatically find the correct subdirectory
   - Verify the path exists and is readable

2. **Determine Analysis Scope**:

   **STEP 1: Check for SPECIFIC component keywords**

   If the user mentions a specific component, run ONLY that script (a sketch of this dispatch appears after this list):
   - "pods", "pod status", "containers", "crashloop", "failing pods" → `analyze_pods.py` ONLY
   - "etcd", "etcd health", "quorum" → `analyze_etcd.py` ONLY
   - "network", "networking", "ovn", "connectivity" → `analyze_network.py` ONLY
   - "ovn databases", "ovn-dbs", "ovn db", "logical switches", "acls" → `analyze_ovn_dbs.py` ONLY
   - "nodes", "node status", "node conditions" → `analyze_nodes.py` ONLY
   - "operators", "cluster operators", "degraded" → `analyze_clusteroperators.py` ONLY
   - "version", "cluster version", "update", "upgrade" → `analyze_clusterversion.py` ONLY
   - "events", "warnings", "errors" → `analyze_events.py` ONLY
   - "storage", "pv", "pvc", "volumes", "persistent" → `analyze_pvs.py` ONLY
   - "alerts", "prometheus", "monitoring" → `analyze_prometheus.py` ONLY

   **STEP 2: No specific component mentioned**

   If the request is generic, like "analyze must-gather", "/must-gather:analyze", or "check the cluster", run ALL scripts in this order:
   1. ClusterVersion (`analyze_clusterversion.py`)
   2. Cluster Operators (`analyze_clusteroperators.py`)
   3. Nodes (`analyze_nodes.py`)
   4. Pods - problems only (`analyze_pods.py --problems-only`)
   5. Network (`analyze_network.py`)
   6. Events - warnings only (`analyze_events.py --type Warning --count 50`)
   7. etcd (`analyze_etcd.py`)
   8. Storage (`analyze_pvs.py`)
   9. Monitoring (`analyze_prometheus.py`)

3. **Locate Plugin Scripts**:
   - Use the script availability check from the Error Handling section to find the plugin root
   - Store the scripts directory path in `$SCRIPTS_DIR`

4. **Execute Analysis Scripts**:

   ```bash
   python3 "$SCRIPTS_DIR/<script>.py" <must-gather-path>
   ```

   Example:

   ```bash
   python3 "$SCRIPTS_DIR/analyze_clusteroperators.py" ./must-gather.local.123/quay-io-...
   ```

5. **Synthesize Results**: Generate findings and recommendations based on script output
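
For illustration, the scope selection in step 2 can be expressed as a keyword dispatch table. This is a hypothetical sketch of the logic described above, not shipped code; the names `KEYWORD_TO_SCRIPT`, `FULL_RUN_ORDER`, and `pick_scripts` are inventions for this example:

```python
# Full-run order from STEP 2 above.
FULL_RUN_ORDER = [
    "analyze_clusterversion.py", "analyze_clusteroperators.py",
    "analyze_nodes.py", "analyze_pods.py", "analyze_network.py",
    "analyze_events.py", "analyze_etcd.py", "analyze_pvs.py",
    "analyze_prometheus.py",
]

# STEP 1 mapping; the multi-word OVN-database phrases come first so that
# "ovn db" is not swallowed by the bare "ovn" network keyword.
KEYWORD_TO_SCRIPT = [
    (("ovn databases", "ovn-dbs", "ovn db", "logical switches", "acls"), "analyze_ovn_dbs.py"),
    (("pods", "pod status", "containers", "crashloop", "failing pods"), "analyze_pods.py"),
    (("etcd", "quorum"), "analyze_etcd.py"),
    (("network", "networking", "ovn", "connectivity"), "analyze_network.py"),
    (("nodes", "node status", "node conditions"), "analyze_nodes.py"),
    (("operators", "cluster operators", "degraded"), "analyze_clusteroperators.py"),
    (("version", "update", "upgrade"), "analyze_clusterversion.py"),
    (("events", "warnings", "errors"), "analyze_events.py"),
    (("storage", "pv", "pvc", "volumes", "persistent"), "analyze_pvs.py"),
    (("alerts", "prometheus", "monitoring"), "analyze_prometheus.py"),
]

def pick_scripts(request: str) -> list:
    """Return the analysis scripts to run for a user request."""
    text = request.lower()
    for keywords, script in KEYWORD_TO_SCRIPT:
        if any(kw in text for kw in keywords):
            return [script]    # STEP 1: specific component, run ONLY that script
    return FULL_RUN_ORDER      # STEP 2: generic request, run everything in order
```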

## Return Value

The command outputs structured analysis results to stdout:

**For Component-Specific Analysis:**
- Script output for the requested component only
- Focused findings and recommendations

**For Full Analysis:**
- Organized sections for each component
- Executive summary of overall cluster health
- Prioritized list of critical issues
- Actionable recommendations
- Suggested log files to review

## Output Structure

```
================================================================================
MUST-GATHER ANALYSIS SUMMARY
================================================================================

[Script outputs organized by component]

CLUSTER VERSION:
[output from analyze_clusterversion.py]

CLUSTER OPERATORS:
[output from analyze_clusteroperators.py]

NODES:
[output from analyze_nodes.py]

PROBLEMATIC PODS:
[output from analyze_pods.py --problems-only]

NETWORK STATUS:
[output from analyze_network.py]

WARNING EVENTS (Last 50):
[output from analyze_events.py --type Warning --count 50]

ETCD CLUSTER HEALTH:
[output from analyze_etcd.py]

STORAGE (PVs/PVCs):
[output from analyze_pvs.py]

MONITORING (Alerts):
[output from analyze_prometheus.py]

================================================================================
FINDINGS AND RECOMMENDATIONS
================================================================================

Critical Issues:
- [Critical problems requiring immediate attention]

Warnings:
- [Potential issues or degraded components]

Recommendations:
- [Specific next steps for investigation]

Logs to Review:
- [Specific log files to examine based on findings]
```

## Examples

1. **Full cluster analysis**:

   ```
   /must-gather:analyze ./must-gather/registry-ci-openshift-org-origin-4-20-...-sha256-abc123/
   ```

   Runs all analysis scripts and provides comprehensive cluster diagnostics.

2. **Analyze pod issues only**:

   ```
   /must-gather:analyze ./must-gather/registry-ci-openshift-org-origin-4-20-...-sha256-abc123/ analyze the pod statuses
   ```

   Runs only `analyze_pods.py` to focus on pod-related issues.

3. **Check etcd health**:

   ```
   /must-gather:analyze check etcd health
   ```

   Asks for the must-gather path, then runs only `analyze_etcd.py`.

4. **Network troubleshooting**:

   ```
   /must-gather:analyze ./must-gather/registry-ci-openshift-org-origin-4-20-...-sha256-abc123/ show me network issues
   ```

   Runs only `analyze_network.py` for network-specific analysis.

## Notes

- **Must-Gather Path**: Always use the subdirectory containing `cluster-scoped-resources/` and `namespaces/`, not the parent directory
- **Script Dependencies**: Analysis scripts must be executable and have the required Python dependencies installed
- **Error Handling**: If scripts are not found or the must-gather path is invalid, clear error messages are displayed
- **Cross-Referencing**: The analysis attempts to correlate issues across components (e.g., degraded operator → failing pods)
- **Pattern Detection**: Identifies patterns like multiple pod failures on the same node (see the sketch after this list)
- **Actionable Output**: Focuses on insights and recommendations rather than raw data dumps
- **Priority**: Issues are prioritized by severity (Critical > Warning > Info)
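
As an illustration of that pattern detection, clustered failures can be surfaced by grouping failing pods by node. A minimal hypothetical sketch, assuming `(pod, node)` pairs taken from `analyze_pods.py` output and an illustrative threshold:

```python
from collections import Counter

def nodes_with_clustered_failures(failing_pods, threshold=3):
    """Return nodes hosting at least `threshold` failing pods."""
    counts = Counter(node for _pod, node in failing_pods if node)
    return [node for node, n in counts.items() if n >= threshold]
```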

## Arguments

- **$1** (must-gather-path): Optional. Path to the must-gather directory (the subdirectory with the hash name). If not provided, the user will be asked.
- **$2+** (component): Optional. If keywords for a specific component are detected, only that component's analysis script will run. Otherwise, all scripts run.
266
commands/ovn-dbs.md
Normal file
@@ -0,0 +1,266 @@
---
description: Analyze OVN databases from a must-gather using ovsdb-tool
argument-hint: [must-gather-path]
---

## Name
must-gather:ovn-dbs

## Synopsis
```
/must-gather:ovn-dbs [must-gather-path] [--node <node-name>] [--query <json>]
```

## Description

The `ovn-dbs` command analyzes OVN Northbound and Southbound databases collected from OVN-Kubernetes clusters. It uses `ovsdb-tool` to query the binary database files (`.db`) collected per node, providing detailed information about the logical network topology, pods, ACLs, and routers on each node.

The command automatically maps ovnkube pods to their corresponding nodes by reading pod specifications from the must-gather data.
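
As a concrete illustration of that mapping, the sketch below reads ovnkube-node pod specs from the must-gather layout (`namespaces/<ns>/pods/<pod>/<pod>.yaml`) and pairs each pod with its `spec.nodeName`. It is a hypothetical helper, not part of the plugin, and the glob pattern is an assumption:

```python
import yaml
from pathlib import Path

def map_ovnkube_pods_to_nodes(must_gather: Path) -> dict:
    """Return {ovnkube-node pod name: node name} from gathered pod specs."""
    mapping = {}
    pods_dir = must_gather / "namespaces" / "openshift-ovn-kubernetes" / "pods"
    for pod_yaml in pods_dir.glob("ovnkube-node-*/ovnkube-node-*.yaml"):
        doc = yaml.safe_load(pod_yaml.read_text()) or {}
        name = doc.get("metadata", {}).get("name", pod_yaml.stem)
        mapping[name] = doc.get("spec", {}).get("nodeName", "")
    return mapping
```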

**Two modes of operation:**
1. **Standard Analysis** (default): Runs pre-built analysis showing switches, ports, ACLs, and routers
2. **Query Mode** (`--query`): Runs custom OVSDB JSON queries for specific data extraction

**What it analyzes:**
- **Per-zone logical network topology**
- **Logical Switches** and their ports
- **Pod Logical Switch Ports** with namespace, pod name, and IP addresses
- **Access Control Lists (ACLs)** with priorities, directions, and match rules
- **Logical Routers** and their ports

**Important:** This command only works with must-gathers from interconnect (IC) clusters, where each node/zone has its own database files.

## Prerequisites

The must-gather should contain:
```
network_logs/
└── ovnk_database_store.tar.gz
```

**Required Tools:**

- `ovsdb-tool` must be installed (from the openvswitch package)
- Check with: `which ovsdb-tool`
- Install: `sudo dnf install openvswitch` or `sudo apt install openvswitch-common`

**Analysis Script:**

The script is bundled with this plugin:
```
<plugin-root>/skills/must-gather-analyzer/scripts/analyze_ovn_dbs.py
```

Where `<plugin-root>` is the directory where this plugin is installed (typically `~/.cursor/commands/ai-helpers/plugins/must-gather/` or similar).

Claude will automatically locate it by searching for the script in the plugin installation directory, regardless of your current working directory.

## Implementation

The command performs the following steps:

1. **Locate Analysis Script**:

   ```bash
   SCRIPT_PATH=$(find ~ -name "analyze_ovn_dbs.py" -path "*/must-gather/skills/must-gather-analyzer/scripts/*" 2>/dev/null | head -1)

   if [ -z "$SCRIPT_PATH" ]; then
     echo "ERROR: analyze_ovn_dbs.py script not found."
     echo "Please ensure the must-gather plugin from ai-helpers is properly installed."
     exit 1
   fi

   SCRIPTS_DIR=$(dirname "$SCRIPT_PATH")
   ```

2. **Extract Database Tarball**:
   - Locate `network_logs/ovnk_database_store.tar.gz`
   - Extract if not already extracted
   - Find all `*_nbdb` and `*_sbdb` files

3. **Query Each Zone's Database**:
   For each zone (node), query the Northbound database using `ovsdb-tool query`:

   ```bash
   ovsdb-tool query <zone>_nbdb '["OVN_Northbound", {"op":"select", "table":"<table>", "where":[], "columns":[...]}]'
   ```

4. **Analyze and Display**:
   - **Logical Switches**: Names and port counts
   - **Logical Switch Ports**: Filter for pods (external_ids.pod=true); show namespace, pod name, and IP
   - **ACLs**: Priority, direction, match rules, and actions
   - **Logical Routers**: Names and port counts

5. **Present Zone Summary**:
   - Total counts per zone
   - Detailed breakdowns
   - Sorted and formatted output
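
A minimal sketch of steps 2 and 3, assuming `ovsdb-tool` is on `PATH` and that the extracted per-zone files end in `_nbdb` as described above. This is illustrative, not the bundled `analyze_ovn_dbs.py`:

```python
import json
import subprocess
import tarfile
from pathlib import Path

def extract_nb_databases(must_gather: Path) -> list:
    """Extract the database tarball (if needed) and return the NB db files."""
    tarball = must_gather / "network_logs" / "ovnk_database_store.tar.gz"
    dest = tarball.parent / "ovnk_database_store"
    if not dest.exists():
        with tarfile.open(tarball) as tf:
            tf.extractall(dest)
    return sorted(dest.rglob("*_nbdb"))

def query_nbdb(db_file: Path, table: str, columns: list) -> list:
    """Run one OVSDB select against a standalone database file."""
    txn = json.dumps(["OVN_Northbound",
                      {"op": "select", "table": table, "where": [], "columns": columns}])
    result = subprocess.run(["ovsdb-tool", "query", str(db_file), txn],
                            capture_output=True, text=True, check=True)
    # ovsdb-tool prints a JSON array with one result object per operation.
    return json.loads(result.stdout)[0]["rows"]

# Example: count logical switches per zone.
# for db in extract_nb_databases(Path("./must-gather/...")):
#     print(db.name, len(query_nbdb(db, "Logical_Switch", ["name"])))
```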

## Return Value

The command outputs structured analysis for each node:

```
Found 6 node(s)

================================================================================
Node: ip-10-0-26-145.us-east-2.compute.internal
Pod: ovnkube-node-79cbh
================================================================================
Logical Switches: 4
Logical Switch Ports: 55
ACLs: 7
Logical Routers: 2

LOGICAL SWITCHES (4):
NAME                                                      PORTS
--------------------------------------------------------------------------------
transit_switch                                            6
ip-10-0-1-10.us-east-2.compute.internal                   7
ext_ip-10-0-1-10.us-east-2.compute.internal               2
join                                                      2

POD LOGICAL SWITCH PORTS (5):
NAMESPACE                      POD                                      IP
------------------------------------------------------------------------------------------------------------------------
openshift-dns                  dns-default-abc123                       10.128.0.5
openshift-monitoring           prometheus-k8s-0                         10.128.0.10
openshift-etcd                 etcd-master-0                            10.128.0.3
...

ACCESS CONTROL LISTS (7):
PRIORITY   DIRECTION    ACTION          MATCH
------------------------------------------------------------------------------------------------------------------------
1012       from-lport   allow           inport == @a4743249366342378346 && (ip4.mcast ...
1011       to-lport     drop            (ip4.mcast || mldv1 || mldv2 || ...
1001       to-lport     allow-related   ip4.src==10.128.0.2
...

LOGICAL ROUTERS (2):
NAME                                                      PORTS
--------------------------------------------------------------------------------
ovn_cluster_router                                        3
GR_ip-10-0-1-10.us-east-2.compute.internal                2
```

## Examples

1. **Analyze all nodes in a must-gather**:

   ```
   /must-gather:ovn-dbs ./must-gather/registry-ci-openshift-org-origin-4-20-...-sha256-abc123/
   ```

   Shows the logical network topology for all nodes.

2. **Analyze a specific node**:

   ```
   /must-gather:ovn-dbs ./must-gather/.../ --node ip-10-0-26-145
   ```

   Shows OVN database information only for the specified node (supports partial name matching).

3. **Analyze a master node**:

   ```
   /must-gather:ovn-dbs ./must-gather/.../ --node master-0
   ```

   Filters to a specific master node using partial name matching.

4. **Interactive usage without a path**:

   ```
   /must-gather:ovn-dbs
   ```

   The command will ask for the must-gather path.

5. **Check if a pod exists in OVN**:

   ```
   /must-gather:ovn-dbs ./must-gather/.../
   ```

   Then search the output for the pod name to see which node it is on and its IP allocation.

6. **Investigate ACL rules on a specific node**:

   ```
   /must-gather:ovn-dbs ./must-gather/.../ --node worker-1
   ```

   Review the ACL section for a specific node to understand traffic filtering rules.

7. **Run a custom OVSDB query** (Query Mode):

   ```
   /must-gather:ovn-dbs ./must-gather/.../ --query '["OVN_Northbound", {"op":"select", "table":"ACL", "where":[["priority", ">", 1000]], "columns":["priority","match","action"]}]'
   ```

   Queries ACLs with priority > 1000 across all nodes. Claude can construct the JSON query for any OVSDB table.

8. **Query a specific node with a custom query**:

   ```
   /must-gather:ovn-dbs ./must-gather/.../ --node master-0 --query '["OVN_Northbound", {"op":"select", "table":"Logical_Switch", "where":[], "columns":["name","ports"]}]'
   ```

   Lists all logical switches with their ports on master-0.

9. **Query a specific table** (Claude constructs the JSON):

   Just ask Claude to query a specific OVSDB table and it will construct the appropriate JSON query. For example:
   - "Show all Logical_Router_Static_Route entries"
   - "Find ACLs with action 'drop'"
   - "List Logical_Switch_Port entries where external_ids contains 'openshift-etcd'"

## Error Handling

**Missing ovsdb-tool:**

```
Error: ovsdb-tool not found. Please install openvswitch package.
```

Solution: Install openvswitch: `sudo dnf install openvswitch`

**Missing database tarball:**

```
Error: Database tarball not found: network_logs/ovnk_database_store.tar.gz
```

Solution: Ensure this is a must-gather from an OVN cluster.

**Node not found:**

```
Error: No databases found for node matching 'master-5'

Available nodes:
- ip-10-0-77-117.us-east-2.compute.internal
- ip-10-0-26-145.us-east-2.compute.internal
- ip-10-0-1-194.us-east-2.compute.internal
```

Solution: Use one of the listed node names or a partial match.

## Notes

- **Binary Database Format**: Uses `ovsdb-tool` to read OVSDB binary files directly
- **Per-Node Analysis**: Each node in IC mode has its own database (one NB and one SB per zone)
- **Node Mapping**: Automatically correlates ovnkube pods to nodes by reading pod specs from the must-gather
- **Pod Discovery**: Pods are identified by `external_ids` with `pod=true`
- **IP Extraction**: Pod IPs are parsed from the `addresses` field (format: "MAC IP"); see the sketch after this list
- **ACL Priorities**: Higher-priority ACLs are processed first (shown at top)
- **Node Filtering**: Supports partial name matching for convenience (e.g., "--node master" matches all masters)
- **Query Mode**: Accepts raw OVSDB JSON queries in the format `["OVN_Northbound", {"op":"select", "table":"...", ...}]`
- **Claude Query Construction**: Claude can automatically construct OVSDB JSON queries from natural-language requests
- **Performance**: Querying large databases may take a few seconds per node
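
To make the pod-discovery and IP-extraction conventions above concrete, here is a hypothetical parsing helper. The OVSDB JSON encoding of maps (`["map", [[key, value], ...]]`) and sets, and the `namespace` key in `external_ids`, are assumptions based on the conventions listed above:

```python
def pod_port_info(row: dict):
    """Return (namespace, name, ip) for a pod Logical_Switch_Port row, else None."""
    ext = row.get("external_ids", ["map", []])
    ext_map = dict(ext[1]) if isinstance(ext, list) else {}
    if ext_map.get("pod") != "true":
        return None  # not a pod port (router, transit, ...)
    addresses = row.get("addresses", "")
    if isinstance(addresses, list) and addresses and addresses[0] == "set":
        addresses = addresses[1][0] if addresses[1] else ""
    parts = str(addresses).split()  # format: "MAC IP"
    ip = parts[1] if len(parts) > 1 else ""
    return ext_map.get("namespace", ""), row.get("name", ""), ip
```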

## Use Cases

1. **Verify Pod Network Configuration**:
   - Check if pods are registered in OVN
   - Verify IP address assignments
   - Confirm logical switch port creation

2. **Troubleshoot Connectivity Issues**:
   - Review ACL rules blocking traffic
   - Check if pods are in the correct logical switches
   - Verify router configurations

3. **Understand Topology**:
   - See how zones are interconnected via transit_switch
   - Review gateway router configurations
   - Understand the logical network structure

4. **Audit Network Policies**:
   - See ACL rules generated from NetworkPolicies
   - Identify overly permissive or restrictive rules
   - Check rule priorities and match conditions

## Arguments

- **$1** (must-gather-path): Optional. Path to the must-gather directory containing `network_logs/`. If not provided, the user will be prompted.
- **--node, -n** (node-name): Optional. Filter analysis to a specific node. Supports partial name matching (e.g., "master-0", "ip-10-0-26-145"). If no match is found, displays the list of available nodes.
- **--query, -q** (json-query): Optional. Run a raw OVSDB JSON query instead of the standard analysis. Claude can construct the JSON query based on the OVSDB transaction format. When provided, outputs raw JSON results instead of formatted analysis.
93
plugin.lock.json
Normal file
@@ -0,0 +1,93 @@
{
  "$schema": "internal://schemas/plugin.lock.v1.json",
  "pluginId": "gh:openshift-eng/ai-helpers:plugins/must-gather",
  "normalized": {
    "repo": null,
    "ref": "refs/tags/v20251128.0",
    "commit": "5e7aea9c51347184db2f2a1db1029335d5e6b4b6",
    "treeHash": "6642a98be3c0bebc6a688aeb737c645245616a0bd5a2c1612600b9a431d03716",
    "generatedAt": "2025-11-28T10:27:30.749372Z",
    "toolVersion": "publish_plugins.py@0.2.0"
  },
  "origin": {
    "remote": "git@github.com:zhongweili/42plugin-data.git",
    "branch": "master",
    "commit": "aa1497ed0949fd50e99e70d6324a29c5b34f9390",
    "repoRoot": "/Users/zhongweili/projects/openmind/42plugin-data"
  },
  "manifest": {
    "name": "must-gather",
    "description": "A plugin to analyze and report on must-gather data",
    "version": "0.0.1"
  },
  "content": {
    "files": [
      {
        "path": "README.md",
        "sha256": "62224e0cac5af83035e53ea011e387f43c066eef4c2dd942ec536612c6d5a53c"
      },
      {
        "path": ".claude-plugin/plugin.json",
        "sha256": "9efe7a9dce84b69f125de43024e8760070e603b79d161e1662ea485eae8d02f9"
      },
      {
        "path": "commands/analyze.md",
        "sha256": "d4aa3d91663e16df1ff5176f992e62b99d9b7a0941fed5b8b24587ebcca23a7c"
      },
      {
        "path": "commands/ovn-dbs.md",
        "sha256": "768eb9b9489dae9c511f9d25b3472cbd878d44d06dfa73f29872c5e62c7e3aeb"
      },
      {
        "path": "skills/must-gather-analyzer/SKILL.md",
        "sha256": "7d269e9a90a45600af26df99247f11152b69e9fb3eb09813f2e830dcecd21503"
      },
      {
        "path": "skills/must-gather-analyzer/scripts/analyze_prometheus.py",
        "sha256": "5fd3f1580bf58cc8a0d967d8a4b08654f85c19ee2f3c3202e6e8e6db2469d56b"
      },
      {
        "path": "skills/must-gather-analyzer/scripts/analyze_pvs.py",
        "sha256": "d05ef93f72853eb9c43b19a53d0d1765a9620fef08643e5ce2ce86559dd7f89f"
      },
      {
        "path": "skills/must-gather-analyzer/scripts/analyze_pods.py",
        "sha256": "dc23a7d5822a5f572a2373fc30de6609f701a270bc6f1d0837efcfbada88d00e"
      },
      {
        "path": "skills/must-gather-analyzer/scripts/analyze_network.py",
        "sha256": "6316796bfd2f4bc463fa4681b7b81198dcb9fba513bcec7b31f621314019470f"
      },
      {
        "path": "skills/must-gather-analyzer/scripts/analyze_events.py",
        "sha256": "ff02feac6053c13a3e3a6c4e83c1cb5fc1e5818ada860f351c5dccb7d54724c6"
      },
      {
        "path": "skills/must-gather-analyzer/scripts/analyze_ovn_dbs.py",
        "sha256": "be537b03e1fee5e48ce936b544387e81aa6089a07893fa5ab329cc347980dde9"
      },
      {
        "path": "skills/must-gather-analyzer/scripts/analyze_clusterversion.py",
        "sha256": "f1110d963ba18942861c25a7234d9a01cc0798c4ef76534a5631de857d96805b"
      },
      {
        "path": "skills/must-gather-analyzer/scripts/analyze_nodes.py",
        "sha256": "90c8e2a9fa59b60946d5789cdaa2c602c5089227b4ec7f202c690b21b738d7c4"
      },
      {
        "path": "skills/must-gather-analyzer/scripts/analyze_clusteroperators.py",
        "sha256": "51915d87c97af925f39fbccd16964bde4aafed62c28830882bb544f3ef8d8533"
      },
      {
        "path": "skills/must-gather-analyzer/scripts/analyze_etcd.py",
        "sha256": "03c6ef97197427acbce2c503d4644a113c01aef5e033ddd8ef464bba22e57962"
      }
    ],
    "dirSha256": "6642a98be3c0bebc6a688aeb737c645245616a0bd5a2c1612600b9a431d03716"
  },
  "security": {
    "scannedAt": null,
    "scannerVersion": null,
    "flags": []
  }
}
285
skills/must-gather-analyzer/SKILL.md
Normal file
@@ -0,0 +1,285 @@
---
name: Must-Gather Analyzer
description: |
  Analyze OpenShift must-gather diagnostic data including cluster operators, pods, nodes,
  and network components. Use this skill when the user asks about cluster health, operator status,
  pod issues, node conditions, or wants diagnostic insights from must-gather data.

  Triggers: "analyze must-gather", "check cluster health", "operator status", "pod issues",
  "node status", "failing pods", "degraded operators", "cluster problems", "crashlooping",
  "network issues", "etcd health", "analyze clusteroperators", "analyze pods", "analyze nodes"
---

# Must-Gather Analyzer Skill

Comprehensive analysis of OpenShift must-gather diagnostic data with helper scripts that parse YAML and display output in `oc`-like format.

## Overview

This skill provides analysis for:
- **ClusterVersion**: Current version, update status, and capabilities
- **Cluster Operators**: Status, degradation, and availability
- **Pods**: Health, restarts, crashes, and failures across namespaces
- **Nodes**: Conditions, capacity, and readiness
- **Network**: OVN/SDN diagnostics and connectivity
- **Events**: Warning and error events across namespaces
- **etcd**: Cluster health, member status, and quorum
- **Storage**: PersistentVolume and PersistentVolumeClaim status

## Must-Gather Directory Structure

**Important**: Must-gather data is contained in a subdirectory with a long hash name:
```
must-gather/
└── registry-ci-openshift-org-origin-...-sha256-<hash>/
    ├── cluster-scoped-resources/
    │   ├── config.openshift.io/clusteroperators/
    │   └── core/nodes/
    ├── namespaces/
    │   └── <namespace>/
    │       └── pods/
    │           └── <pod-name>/
    │               └── <pod-name>.yaml
    └── network_logs/
```

The analysis scripts expect the path to the **subdirectory** (the one with the hash), not the root must-gather folder.

## Instructions

### 1. Get Must-Gather Path

Ask the user for the must-gather directory path if not already provided.
- If they provide the root directory, look for the subdirectory with the hash name
- The correct path contains `cluster-scoped-resources/` and `namespaces/` directories

### 2. Choose Analysis Type

Based on the user's request, run the appropriate helper script:

#### ClusterVersion Analysis
```bash
./scripts/analyze_clusterversion.py <must-gather-path>
```

Shows cluster version information similar to `oc get clusterversion`:
- Current version and update status
- Progressing state
- Available updates
- Version conditions
- Enabled capabilities
- Update history

#### Cluster Operators Analysis
```bash
./scripts/analyze_clusteroperators.py <must-gather-path>
```

Shows cluster operator status similar to `oc get clusteroperators`:
- Available, Progressing, Degraded conditions
- Version information
- Time since condition change
- Detailed messages for operators with issues

#### Pods Analysis
```bash
# All namespaces
./scripts/analyze_pods.py <must-gather-path>

# Specific namespace
./scripts/analyze_pods.py <must-gather-path> --namespace <namespace>

# Show only problematic pods
./scripts/analyze_pods.py <must-gather-path> --problems-only
```

Shows pod status similar to `oc get pods -A`:
- Ready/Total containers
- Status (Running, Pending, CrashLoopBackOff, etc.)
- Restart counts
- Age
- Categorized issues (crashlooping, pending, failed)

#### Nodes Analysis
```bash
./scripts/analyze_nodes.py <must-gather-path>

# Show only nodes with issues
./scripts/analyze_nodes.py <must-gather-path> --problems-only
```

Shows node status similar to `oc get nodes`:
- Ready status
- Roles (master, worker)
- Age
- Kubernetes version
- Node conditions (DiskPressure, MemoryPressure, etc.)
- Capacity and allocatable resources

#### Network Analysis
```bash
./scripts/analyze_network.py <must-gather-path>
```

Shows network health:
- Network type (OVN-Kubernetes, OpenShift SDN)
- Network operator status
- OVN pod health
- PodNetworkConnectivityCheck results
- Network-related issues

#### Events Analysis
```bash
# Recent events (last 100)
./scripts/analyze_events.py <must-gather-path>

# Warning events only
./scripts/analyze_events.py <must-gather-path> --type Warning

# Events in a specific namespace
./scripts/analyze_events.py <must-gather-path> --namespace openshift-etcd

# Show last 50 events
./scripts/analyze_events.py <must-gather-path> --count 50
```

Shows cluster events:
- Event type (Warning, Normal)
- Last seen timestamp
- Reason and message
- Affected object
- Event count

#### etcd Analysis
```bash
./scripts/analyze_etcd.py <must-gather-path>
```

Shows etcd cluster health:
- Member health status
- Member list with IDs and URLs
- Endpoint status (leader, version, DB size)
- Quorum status
- Cluster summary

#### Storage Analysis
```bash
# All PVs and PVCs
./scripts/analyze_pvs.py <must-gather-path>

# PVCs in a specific namespace
./scripts/analyze_pvs.py <must-gather-path> --namespace openshift-monitoring
```

Shows storage resources:
- PersistentVolumes (capacity, status, claims)
- PersistentVolumeClaims (binding, capacity)
- Storage classes
- Pending/unbound volumes

#### Monitoring Analysis
```bash
# All alerts
./scripts/analyze_prometheus.py <must-gather-path>

# Alerts in a specific namespace
./scripts/analyze_prometheus.py <must-gather-path> --namespace openshift-monitoring
```

Shows monitoring information:
- Alerts (state, namespace, name, active since, labels)
- Total counts of pending and firing alerts

### 3. Interpret and Report

After running the scripts:
1. Review the summary statistics
2. Focus on items flagged with issues
3. Provide actionable insights and next steps
4. Suggest log analysis for specific components if needed
5. Cross-reference issues (e.g., degraded operator → failing pods → node issues)

## Output Format

All scripts provide:
- **Summary Section**: High-level statistics with emoji indicators
- **Table View**: `oc`-like formatted output
- **Issues Section**: Detailed breakdown of problems

Example summary format:
```
================================================================================
SUMMARY: 25/28 operators healthy
  ⚠️  3 operators with issues
  🔄 1 progressing
  ❌ 2 degraded
================================================================================
```

## Helper Scripts Reference

### scripts/analyze_clusterversion.py
Parses: `cluster-scoped-resources/config.openshift.io/clusterversions/version.yaml`
Output: ClusterVersion table with detailed version info, conditions, and capabilities

### scripts/analyze_clusteroperators.py
Parses: `cluster-scoped-resources/config.openshift.io/clusteroperators/`
Output: ClusterOperator status table with conditions

### scripts/analyze_pods.py
Parses: `namespaces/*/pods/*/*.yaml` (individual pod directories)
Output: Pod status table with issues categorized

### scripts/analyze_nodes.py
Parses: `cluster-scoped-resources/core/nodes/`
Output: Node status table with conditions and capacity

### scripts/analyze_network.py
Parses: `network_logs/`, network operator, OVN resources
Output: Network health summary and diagnostics

### scripts/analyze_events.py
Parses: `namespaces/*/core/events.yaml`
Output: Event table sorted by last occurrence

### scripts/analyze_etcd.py
Parses: `etcd_info/` (endpoint_health.json, member_list.json, endpoint_status.json)
Output: etcd cluster health and member status

### scripts/analyze_pvs.py
Parses: `cluster-scoped-resources/core/persistentvolumes/`, `namespaces/*/core/persistentvolumeclaims.yaml`
Output: PV and PVC status tables
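
For a sense of what these parsers do, here is a minimal sketch in the same spirit as `analyze_events.py`: it walks `namespaces/*/core/events.yaml` (assumed to hold an EventList with an `items` array) and keeps the most recent Warning events. Illustrative only, not the bundled script:

```python
import yaml
from pathlib import Path

def recent_warning_events(must_gather: Path, limit: int = 20) -> list:
    """Collect Warning events across all namespaces, oldest to newest."""
    events = []
    for events_file in must_gather.glob("namespaces/*/core/events.yaml"):
        doc = yaml.safe_load(events_file.read_text()) or {}
        events.extend(ev for ev in doc.get("items", []) if ev.get("type") == "Warning")
    events.sort(key=lambda ev: ev.get("lastTimestamp") or "")
    return events[-limit:]
```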

## Tips for Analysis

1. **Start with Cluster Operators**: They often reveal system-wide issues
2. **Check Timing**: Look at "SINCE" columns to understand when issues started
3. **Follow Dependencies**: Degraded operator → check its namespace pods → check hosting nodes
4. **Look for Patterns**: Multiple pods failing on the same node suggests a node issue
5. **Cross-reference**: Use multiple scripts together for a complete picture

## Common Scenarios

### "Why is my cluster degraded?"
1. Run `analyze_clusteroperators.py` - identify degraded operators
2. Run `analyze_pods.py --namespace <operator-namespace>` - check operator pods
3. Run `analyze_nodes.py` - verify node health

### "Pods keep crashing"
1. Run `analyze_pods.py --problems-only` - find crashlooping pods
2. Check which nodes they're on
3. Run `analyze_nodes.py` - verify node conditions
4. Suggest checking pod logs in the must-gather data

### "Network connectivity issues"
1. Run `analyze_network.py` - check network health
2. Run `analyze_pods.py --namespace openshift-ovn-kubernetes`
3. Check PodNetworkConnectivityCheck results

## Next Steps After Analysis

Based on findings, suggest:
- Examining specific pod logs in `namespaces/<ns>/pods/<pod>/<container>/logs/`
- Reviewing events in `namespaces/<ns>/core/events.yaml`
- Checking audit logs in `audit_logs/`
- Analyzing metrics data if available
- Looking at host service logs in `host_service_logs/`
199
skills/must-gather-analyzer/scripts/analyze_clusteroperators.py
Executable file
@@ -0,0 +1,199 @@
#!/usr/bin/env python3
"""
Analyze ClusterOperator resources from must-gather data.
Displays output similar to 'oc get clusteroperators' command.
"""

import sys
import os
import yaml
from pathlib import Path
from datetime import datetime
from typing import List, Dict, Any, Optional


def parse_clusteroperator(file_path: Path) -> Optional[Dict[str, Any]]:
    """Parse a single clusteroperator YAML file."""
    try:
        with open(file_path, 'r') as f:
            doc = yaml.safe_load(f)
            if doc and doc.get('kind') == 'ClusterOperator':
                return doc
    except Exception as e:
        print(f"Warning: Failed to parse {file_path}: {e}", file=sys.stderr)
    return None


def get_condition_status(conditions: List[Dict], condition_type: str) -> tuple[str, str, str]:
    """
    Get status, reason, and message for a specific condition type.
    Returns (status, reason, message).
    """
    for condition in conditions:
        if condition.get('type') == condition_type:
            status = condition.get('status', 'Unknown')
            reason = condition.get('reason', '')
            message = condition.get('message', '')
            return status, reason, message
    return 'Unknown', '', ''


def calculate_duration(timestamp_str: str) -> str:
    """Calculate duration from timestamp to now."""
    try:
        # Parse Kubernetes timestamp format
        ts = datetime.fromisoformat(timestamp_str.replace('Z', '+00:00'))
        now = datetime.now(ts.tzinfo)
        delta = now - ts

        days = delta.days
        hours = delta.seconds // 3600
        minutes = (delta.seconds % 3600) // 60

        if days > 0:
            return f"{days}d"
        elif hours > 0:
            return f"{hours}h"
        elif minutes > 0:
            return f"{minutes}m"
        else:
            return "<1m"
    except Exception:
        return "unknown"


def get_condition_duration(conditions: List[Dict], condition_type: str) -> str:
    """Get the duration since a condition last transitioned."""
    for condition in conditions:
        if condition.get('type') == condition_type:
            last_transition = condition.get('lastTransitionTime')
            if last_transition:
                return calculate_duration(last_transition)
    return ""


def format_operator_row(operator: Dict[str, Any]) -> Dict[str, str]:
    """Format a ClusterOperator into a row for display."""
    name = operator.get('metadata', {}).get('name', 'unknown')
    conditions = operator.get('status', {}).get('conditions', [])
    versions = operator.get('status', {}).get('versions', [])

    # Get version (first version in the list, usually the operator version)
    version = versions[0].get('version', '') if versions else ''

    # Get condition statuses
    available_status, _, _ = get_condition_status(conditions, 'Available')
    progressing_status, _, _ = get_condition_status(conditions, 'Progressing')
    degraded_status, degraded_reason, degraded_msg = get_condition_status(conditions, 'Degraded')

    # Determine which condition to show duration and message for
    # Priority: Degraded > Progressing > Available (if false)
    if degraded_status == 'True':
        since = get_condition_duration(conditions, 'Degraded')
        message = degraded_msg if degraded_msg else degraded_reason
    elif progressing_status == 'True':
        since = get_condition_duration(conditions, 'Progressing')
        _, prog_reason, prog_msg = get_condition_status(conditions, 'Progressing')
        message = prog_msg if prog_msg else prog_reason
    elif available_status == 'False':
        since = get_condition_duration(conditions, 'Available')
        _, avail_reason, avail_msg = get_condition_status(conditions, 'Available')
        message = avail_msg if avail_msg else avail_reason
    else:
        # All good, show time since available
        since = get_condition_duration(conditions, 'Available')
        message = ''

    return {
        'name': name,
        'version': version,
        'available': available_status,
        'progressing': progressing_status,
        'degraded': degraded_status,
        'since': since,
        'message': message
    }


def print_operators_table(operators: List[Dict[str, str]]):
    """Print operators in a formatted table like 'oc get clusteroperators'."""
    if not operators:
        print("No resources found.")
        return

    # Print header - no width limit on VERSION to match oc output
    print(f"{'NAME':<42} {'VERSION':<50} {'AVAILABLE':<11} {'PROGRESSING':<13} {'DEGRADED':<10} {'SINCE':<7} MESSAGE")

    # Print rows
    for op in operators:
        name = op['name'][:42]
        version = op['version']  # Don't truncate version
        available = op['available'][:11]
        progressing = op['progressing'][:13]
        degraded = op['degraded'][:10]
        since = op['since'][:7]
        message = op['message']

        print(f"{name:<42} {version:<50} {available:<11} {progressing:<13} {degraded:<10} {since:<7} {message}")


def analyze_clusteroperators(must_gather_path: str):
    """Analyze all clusteroperators in a must-gather directory."""
    base_path = Path(must_gather_path)

    # Common paths where clusteroperators might be
    possible_patterns = [
        "cluster-scoped-resources/config.openshift.io/clusteroperators/*.yaml",
        "*/cluster-scoped-resources/config.openshift.io/clusteroperators/*.yaml",
    ]

    clusteroperators = []

    # Find and parse all clusteroperator files
    for pattern in possible_patterns:
        for co_file in base_path.glob(pattern):
            operator = parse_clusteroperator(co_file)
            if operator:
                clusteroperators.append(operator)

    if not clusteroperators:
        print("No resources found.", file=sys.stderr)
        return 1

    # Remove duplicates (same operator from different glob patterns)
    seen = set()
    unique_operators = []
    for op in clusteroperators:
        name = op.get('metadata', {}).get('name')
        if name and name not in seen:
            seen.add(name)
            unique_operators.append(op)

    # Format and sort operators by name
    formatted_ops = [format_operator_row(op) for op in unique_operators]
    formatted_ops.sort(key=lambda x: x['name'])

    # Print results
    print_operators_table(formatted_ops)

    return 0


def main():
    if len(sys.argv) < 2:
        print("Usage: analyze_clusteroperators.py <must-gather-directory>", file=sys.stderr)
        print("\nExample:", file=sys.stderr)
        print("  analyze_clusteroperators.py ./must-gather.local.123456789", file=sys.stderr)
        return 1

    must_gather_path = sys.argv[1]

    if not os.path.isdir(must_gather_path):
        print(f"Error: Directory not found: {must_gather_path}", file=sys.stderr)
        return 1

    return analyze_clusteroperators(must_gather_path)


if __name__ == '__main__':
    sys.exit(main())
261
skills/must-gather-analyzer/scripts/analyze_clusterversion.py
Executable file
@@ -0,0 +1,261 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
Analyze ClusterVersion from must-gather data.
|
||||||
|
Displays output similar to 'oc get clusterversion' command.
|
||||||
|
"""
|
||||||
|
|
||||||
|
import sys
|
||||||
|
import os
|
||||||
|
import yaml
|
||||||
|
from pathlib import Path
|
||||||
|
from datetime import datetime
|
||||||
|
from typing import Dict, Any, Optional
|
||||||
|
|
||||||
|
|
||||||
|
def parse_clusterversion(file_path: Path) -> Optional[Dict[str, Any]]:
|
||||||
|
"""Parse the clusterversion YAML file."""
|
||||||
|
try:
|
||||||
|
with open(file_path, 'r') as f:
|
||||||
|
doc = yaml.safe_load(f)
|
||||||
|
if doc and doc.get('kind') == 'ClusterVersion':
|
||||||
|
return doc
|
||||||
|
except Exception as e:
|
||||||
|
print(f"Warning: Failed to parse {file_path}: {e}", file=sys.stderr)
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def get_condition_status(conditions: list, condition_type: str) -> str:
|
||||||
|
"""Get status for a specific condition type."""
|
||||||
|
for condition in conditions:
|
||||||
|
if condition.get('type') == condition_type:
|
||||||
|
return condition.get('status', 'Unknown')
|
||||||
|
return 'Unknown'
|
||||||
|
|
||||||
|
|
||||||
|
def calculate_duration(timestamp_str: str) -> str:
|
||||||
|
"""Calculate duration from timestamp to now."""
|
||||||
|
try:
|
||||||
|
ts = datetime.fromisoformat(timestamp_str.replace('Z', '+00:00'))
|
||||||
|
now = datetime.now(ts.tzinfo)
|
||||||
|
delta = now - ts
|
||||||
|
|
||||||
|
days = delta.days
|
||||||
|
hours = delta.seconds // 3600
|
||||||
|
minutes = (delta.seconds % 3600) // 60
|
||||||
|
|
||||||
|
if days > 0:
|
||||||
|
return f"{days}d"
|
||||||
|
elif hours > 0:
|
||||||
|
return f"{hours}h"
|
||||||
|
elif minutes > 0:
|
||||||
|
return f"{minutes}m"
|
||||||
|
else:
|
||||||
|
return "<1m"
|
||||||
|
except Exception:
|
||||||
|
return ""
|
||||||
|
|
||||||
|
|
||||||
|
def format_clusterversion(cv: Dict[str, Any]) -> Dict[str, str]:
|
||||||
|
"""Format ClusterVersion for display."""
|
||||||
|
name = cv.get('metadata', {}).get('name', 'version')
|
||||||
|
status = cv.get('status', {})
|
||||||
|
|
||||||
|
# Get version from desired
|
||||||
|
desired = status.get('desired', {})
|
||||||
|
version = desired.get('version', '')
|
||||||
|
|
||||||
|
# Get available updates count
|
||||||
|
available_updates = status.get('availableUpdates')
|
||||||
|
if available_updates and isinstance(available_updates, list):
|
||||||
|
available = str(len(available_updates))
|
||||||
|
elif available_updates is None:
|
||||||
|
available = ''
|
||||||
|
else:
|
||||||
|
available = '0'
|
||||||
|
|
||||||
|
# Get conditions
|
||||||
|
conditions = status.get('conditions', [])
|
||||||
|
progressing = get_condition_status(conditions, 'Progressing')
|
||||||
|
since = ''
|
||||||
|
|
||||||
|
# Get time since progressing started (if true) or since last update
|
||||||
|
for condition in conditions:
|
||||||
|
if condition.get('type') == 'Progressing':
|
||||||
|
last_transition = condition.get('lastTransitionTime')
|
||||||
|
if last_transition:
|
||||||
|
since = calculate_duration(last_transition)
|
||||||
|
break
|
||||||
|
|
||||||
|
# Get status message
|
||||||
|
status_msg = ''
|
||||||
|
for condition in conditions:
|
||||||
|
if condition.get('type') == 'Progressing' and condition.get('status') == 'True':
|
||||||
|
            status_msg = condition.get('message', '')[:80]
            break

    # If not progressing, check if failed
    if progressing != 'True':
        for condition in conditions:
            if condition.get('type') == 'Failing' and condition.get('status') == 'True':
                status_msg = condition.get('message', '')[:80]
                break

    return {
        'name': name,
        'version': version,
        'available': available,
        'progressing': progressing,
        'since': since,
        'status': status_msg
    }


def print_clusterversion_table(cv_info: Dict[str, str]):
    """Print ClusterVersion in a formatted table like 'oc get clusterversion'."""
    # Print header
    print(f"{'NAME':<10} {'VERSION':<50} {'AVAILABLE':<11} {'PROGRESSING':<13} {'SINCE':<7} STATUS")

    # Print row
    name = cv_info['name'][:10]
    version = cv_info['version'][:50]
    available = cv_info['available'][:11]
    progressing = cv_info['progressing'][:13]
    since = cv_info['since'][:7]
    status = cv_info['status']

    print(f"{name:<10} {version:<50} {available:<11} {progressing:<13} {since:<7} {status}")


def print_detailed_info(cv: Dict[str, Any]):
    """Print detailed cluster version information."""
    status = cv.get('status', {})
    spec = cv.get('spec', {})

    print(f"\n{'='*80}")
    print("CLUSTER VERSION DETAILS")
    print(f"{'='*80}")

    # Cluster ID
    cluster_id = spec.get('clusterID', 'unknown')
    print(f"Cluster ID: {cluster_id}")

    # Desired version
    desired = status.get('desired', {})
    print(f"Desired Version: {desired.get('version', 'unknown')}")
    print(f"Desired Image: {desired.get('image', 'unknown')}")

    # Version hash
    version_hash = status.get('versionHash', '')
    if version_hash:
        print(f"Version Hash: {version_hash}")

    # Upstream
    upstream = spec.get('upstream', '')
    if upstream:
        print(f"Update Server: {upstream}")

    # Conditions
    conditions = status.get('conditions', [])
    print("\nCONDITIONS:")
    for condition in conditions:
        cond_type = condition.get('type', 'Unknown')
        cond_status = condition.get('status', 'Unknown')
        last_transition = condition.get('lastTransitionTime', '')
        message = condition.get('message', '')

        # Calculate time since transition
        age = calculate_duration(last_transition) if last_transition else ''

        status_indicator = "✅" if cond_status == "True" else "❌" if cond_status == "False" else "❓"
        print(f"  {status_indicator} {cond_type}: {cond_status} (for {age})")
        if message and cond_status == 'True':
            print(f"     Message: {message[:100]}")

    # Update history
    history = status.get('history', [])
    if history:
        print("\nUPDATE HISTORY (last 5):")
        for i, entry in enumerate(history[:5]):
            state = entry.get('state', 'Unknown')
            version = entry.get('version', 'unknown')
            image = entry.get('image', '')
            completion_time = entry.get('completionTime', '')

            age = calculate_duration(completion_time) if completion_time else ''
            print(f"  {i+1}. {version} - {state} {f'({age} ago)' if age else ''}")

    # Available updates
    available_updates = status.get('availableUpdates')
    if available_updates and isinstance(available_updates, list) and len(available_updates) > 0:
        print(f"\nAVAILABLE UPDATES ({len(available_updates)}):")
        for i, update in enumerate(available_updates[:5]):
            version = update.get('version', 'unknown')
            image = update.get('image', '')
            print(f"  {i+1}. {version}")
    elif available_updates is None:
        print("\nAVAILABLE UPDATES: Unable to retrieve updates")

    # Capabilities
    capabilities = status.get('capabilities', {})
    enabled_caps = capabilities.get('enabledCapabilities', [])
    if enabled_caps:
        print(f"\nENABLED CAPABILITIES ({len(enabled_caps)}):")
        # Print in columns
        for i in range(0, len(enabled_caps), 3):
            caps = enabled_caps[i:i+3]
            print(f"  {', '.join(caps)}")

    print(f"{'='*80}\n")


def analyze_clusterversion(must_gather_path: str):
    """Analyze ClusterVersion in a must-gather directory."""
    base_path = Path(must_gather_path)

    # Find ClusterVersion file
    possible_patterns = [
        "cluster-scoped-resources/config.openshift.io/clusterversions/version.yaml",
        "*/cluster-scoped-resources/config.openshift.io/clusterversions/version.yaml",
    ]

    cv = None
    for pattern in possible_patterns:
        for cv_file in base_path.glob(pattern):
            cv = parse_clusterversion(cv_file)
            if cv:
                break
        if cv:
            break

    if not cv:
        print("No ClusterVersion found.")
        return 1

    # Format and print table
    cv_info = format_clusterversion(cv)
    print_clusterversion_table(cv_info)

    # Print detailed information
    print_detailed_info(cv)

    return 0


def main():
    if len(sys.argv) < 2:
        print("Usage: analyze_clusterversion.py <must-gather-directory>", file=sys.stderr)
        print("\nExample:", file=sys.stderr)
        print("  analyze_clusterversion.py ./must-gather.local.123456789", file=sys.stderr)
        return 1

    must_gather_path = sys.argv[1]

    if not os.path.isdir(must_gather_path):
        print(f"Error: Directory not found: {must_gather_path}", file=sys.stderr)
        return 1

    return analyze_clusterversion(must_gather_path)


if __name__ == '__main__':
    sys.exit(main())
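The table printers in these scripts all lean on the same two idioms: slicing caps a column's width, and the `:<N` f-string specifier pads it to alignment. A standalone sketch of the idiom (the row values here are invented):

```
# Sketch of the column-format idiom used by print_clusterversion_table.
row = {'name': 'version', 'available': 'True', 'since': '4d12h'}
print(f"{'NAME':<10} {'AVAILABLE':<11} SINCE")
print(f"{row['name'][:10]:<10} {row['available'][:11]:<11} {row['since'][:7]}")
```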
206
skills/must-gather-analyzer/scripts/analyze_etcd.py
Executable file
@@ -0,0 +1,206 @@
#!/usr/bin/env python3
"""
Analyze etcd information from must-gather data.
Shows etcd cluster health, member status, and diagnostics.
"""

import sys
import os
import json
from pathlib import Path
from typing import Dict, Any, List, Optional


def parse_etcd_info(must_gather_path: Path) -> Dict[str, Any]:
    """Parse etcd_info directory for cluster health information."""
    etcd_data = {
        'member_health': [],
        'member_list': [],
        'endpoint_health': [],
        'endpoint_status': []
    }

    # Find etcd_info directory
    etcd_dirs = list(must_gather_path.glob("etcd_info")) + \
                list(must_gather_path.glob("*/etcd_info"))

    if not etcd_dirs:
        return etcd_data

    etcd_info_dir = etcd_dirs[0]

    # Parse endpoint health. Member health is derived from the same
    # endpoint_health.json file, so populate both keys from a single read
    # instead of parsing the file twice.
    endpoint_health_file = etcd_info_dir / "endpoint_health.json"
    if endpoint_health_file.exists():
        try:
            with open(endpoint_health_file, 'r') as f:
                data = json.load(f)
            health = data if isinstance(data, list) else [data]
            etcd_data['member_health'] = health
            etcd_data['endpoint_health'] = health
        except Exception as e:
            print(f"Warning: Failed to parse endpoint_health.json: {e}", file=sys.stderr)

    # Parse member list
    member_list_file = etcd_info_dir / "member_list.json"
    if member_list_file.exists():
        try:
            with open(member_list_file, 'r') as f:
                data = json.load(f)
            if isinstance(data, dict) and 'members' in data:
                etcd_data['member_list'] = data['members']
            elif isinstance(data, list):
                etcd_data['member_list'] = data
        except Exception as e:
            print(f"Warning: Failed to parse member_list.json: {e}", file=sys.stderr)

    # Parse endpoint status
    endpoint_status_file = etcd_info_dir / "endpoint_status.json"
    if endpoint_status_file.exists():
        try:
            with open(endpoint_status_file, 'r') as f:
                data = json.load(f)
            etcd_data['endpoint_status'] = data if isinstance(data, list) else [data]
        except Exception as e:
            print(f"Warning: Failed to parse endpoint_status.json: {e}", file=sys.stderr)

    return etcd_data


def print_member_health(members: List[Dict[str, Any]]):
    """Print etcd member health status."""
    if not members:
        print("No member health data found.")
        return

    print("ETCD MEMBER HEALTH")
    print(f"{'ENDPOINT':<60} {'HEALTH':<10} {'TOOK':<10} ERROR")

    for member in members:
        endpoint = member.get('endpoint', 'unknown')[:60]
        health = 'true' if member.get('health') else 'false'
        took = member.get('took', '')
        error = member.get('error', '')

        print(f"{endpoint:<60} {health:<10} {took:<10} {error}")


def print_member_list(members: List[Dict[str, Any]]):
    """Print etcd member list."""
    if not members:
        print("\nNo member list data found.")
        return

    print("\nETCD MEMBER LIST")
    print(f"{'ID':<20} {'NAME':<40} {'PEER URLS':<60} {'CLIENT URLS':<60}")

    for member in members:
        member_id = str(member.get('ID', member.get('id', 'unknown')))[:20]
        name = member.get('name', 'unknown')[:40]
        peer_urls = ','.join(member.get('peerURLs', []))[:60]
        client_urls = ','.join(member.get('clientURLs', []))[:60]

        print(f"{member_id:<20} {name:<40} {peer_urls:<60} {client_urls:<60}")


def print_endpoint_status(endpoints: List[Dict[str, Any]]):
    """Print etcd endpoint status."""
    if not endpoints:
        print("\nNo endpoint status data found.")
        return

    print("\nETCD ENDPOINT STATUS")
    print(f"{'ENDPOINT':<60} {'LEADER':<20} {'VERSION':<10} {'DB SIZE':<10} {'IS LEARNER'}")

    for endpoint in endpoints:
        ep = endpoint.get('Endpoint', 'unknown')[:60]

        status = endpoint.get('Status', {})
        leader = str(status.get('leader', 'unknown'))[:20]
        version = status.get('version', 'unknown')[:10]

        db_size = status.get('dbSize', 0)
        db_size_mb = f"{db_size / (1024*1024):.1f}MB" if db_size else '0MB'

        is_learner = 'true' if status.get('isLearner') else 'false'

        print(f"{ep:<60} {leader:<20} {version:<10} {db_size_mb:<10} {is_learner}")


def print_summary(etcd_data: Dict[str, Any]):
    """Print summary of etcd cluster health."""
    member_health = etcd_data.get('member_health', [])
    member_list = etcd_data.get('member_list', [])

    total_members = len(member_list)
    healthy_members = sum(1 for m in member_health if m.get('health'))

    print(f"\n{'='*80}")
    print("ETCD CLUSTER SUMMARY")
    print(f"{'='*80}")
    print(f"Total Members: {total_members}")
    print(f"Healthy Members: {healthy_members}/{len(member_health) if member_health else total_members}")

    if healthy_members < total_members:
        print("  ⚠️ Warning: Not all members are healthy!")
    elif healthy_members == total_members and total_members > 0:
        print("  ✅ All members healthy")

    # Check for quorum: a majority of members must be healthy
    if total_members >= 3:
        quorum = (total_members // 2) + 1
        if healthy_members >= quorum:
            print(f"  ✅ Quorum achieved ({healthy_members}/{quorum})")
        else:
            print(f"  ❌ Quorum lost! ({healthy_members}/{quorum})")
    print(f"{'='*80}\n")


def analyze_etcd(must_gather_path: str):
    """Analyze etcd information in a must-gather directory."""
    base_path = Path(must_gather_path)

    etcd_data = parse_etcd_info(base_path)

    if not any(etcd_data.values()):
        print("No etcd_info data found in must-gather.")
        print("Expected location: etcd_info/ directory")
        return 1

    # Print summary first
    print_summary(etcd_data)

    # Print detailed information
    print_member_health(etcd_data.get('member_health', []))
    print_member_list(etcd_data.get('member_list', []))
    print_endpoint_status(etcd_data.get('endpoint_status', []))

    return 0


def main():
    if len(sys.argv) < 2:
        print("Usage: analyze_etcd.py <must-gather-directory>", file=sys.stderr)
        print("\nExample:", file=sys.stderr)
        print("  analyze_etcd.py ./must-gather.local.123456789", file=sys.stderr)
        return 1

    must_gather_path = sys.argv[1]

    if not os.path.isdir(must_gather_path):
        print(f"Error: Directory not found: {must_gather_path}", file=sys.stderr)
        return 1

    return analyze_etcd(must_gather_path)


if __name__ == '__main__':
    sys.exit(main())
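The quorum check in `print_summary` uses the usual majority rule for etcd: strictly more than half the members must be healthy. A quick sanity check of the formula (the member counts are hypothetical):

```
# Majority quorum for an etcd cluster of n members: (n // 2) + 1.
for total in (3, 5):
    quorum = (total // 2) + 1
    print(f"{total} members -> quorum of {quorum}")  # 3 -> 2, 5 -> 3
```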
201
skills/must-gather-analyzer/scripts/analyze_events.py
Executable file
@@ -0,0 +1,201 @@
#!/usr/bin/env python3
"""
Analyze Events from must-gather data.
Shows warning and error events sorted by last occurrence.
"""

import sys
import os
import yaml
import argparse
from pathlib import Path
from datetime import datetime
from typing import List, Dict, Any, Optional
from collections import defaultdict


def parse_events_file(file_path: Path) -> List[Dict[str, Any]]:
    """Parse events YAML file which may contain multiple events."""
    events = []
    try:
        with open(file_path, 'r') as f:
            docs = yaml.safe_load_all(f)
            for doc in docs:
                if doc and doc.get('kind') == 'Event':
                    events.append(doc)
                elif doc and doc.get('kind') == 'EventList':
                    # Handle EventList
                    events.extend(doc.get('items', []))
    except Exception as e:
        print(f"Warning: Failed to parse {file_path}: {e}", file=sys.stderr)
    return events


def calculate_age(timestamp_str: str) -> str:
    """Calculate age from timestamp."""
    try:
        ts = datetime.fromisoformat(timestamp_str.replace('Z', '+00:00'))
        now = datetime.now(ts.tzinfo)
        delta = now - ts

        days = delta.days
        hours = delta.seconds // 3600
        minutes = (delta.seconds % 3600) // 60

        if days > 0:
            return f"{days}d"
        elif hours > 0:
            return f"{hours}h"
        elif minutes > 0:
            return f"{minutes}m"
        else:
            return "<1m"
    except Exception:
        return ""


def format_event(event: Dict[str, Any]) -> Dict[str, Any]:
    """Format an event for display."""
    metadata = event.get('metadata', {})

    namespace = metadata.get('namespace', '')
    name = metadata.get('name', 'unknown')

    # Get last timestamp
    last_timestamp = event.get('lastTimestamp') or event.get('eventTime') or metadata.get('creationTimestamp', '')
    age = calculate_age(last_timestamp) if last_timestamp else ''

    # Event details
    event_type = event.get('type', 'Normal')
    reason = event.get('reason', '')
    message = event.get('message', '')
    count = event.get('count', 1)

    # Involved object
    involved = event.get('involvedObject', {})
    obj_kind = involved.get('kind', '')
    obj_name = involved.get('name', '')

    return {
        'namespace': namespace,
        'last_seen': age,
        'type': event_type,
        'reason': reason,
        'object_kind': obj_kind,
        'object_name': obj_name,
        'message': message,
        'count': count,
        'timestamp': last_timestamp
    }


def print_events_table(events: List[Dict[str, Any]]):
    """Print events in a table format."""
    if not events:
        print("No resources found.")
        return

    # Print header
    print(f"{'NAMESPACE':<30} {'LAST SEEN':<10} {'TYPE':<10} {'REASON':<30} {'OBJECT':<40} {'MESSAGE':<60}")

    # Print rows
    for event in events:
        namespace = event['namespace'][:30] if event['namespace'] else '<cluster>'
        last_seen = event['last_seen'][:10]
        event_type = event['type'][:10]
        reason = event['reason'][:30]
        obj = f"{event['object_kind']}/{event['object_name']}"[:40]
        message = event['message'][:60]

        print(f"{namespace:<30} {last_seen:<10} {event_type:<10} {reason:<30} {obj:<40} {message:<60}")


def analyze_events(must_gather_path: str, namespace: Optional[str] = None,
                   event_type: Optional[str] = None, show_count: int = 100):
    """Analyze events in a must-gather directory."""
    base_path = Path(must_gather_path)

    all_events = []

    # Find all events files
    if namespace:
        patterns = [
            f"namespaces/{namespace}/core/events.yaml",
            f"*/namespaces/{namespace}/core/events.yaml",
        ]
    else:
        patterns = [
            "namespaces/*/core/events.yaml",
            "*/namespaces/*/core/events.yaml",
        ]

    for pattern in patterns:
        for events_file in base_path.glob(pattern):
            events = parse_events_file(events_file)
            all_events.extend(events)

    if not all_events:
        print("No resources found.")
        return 1

    # Format events
    formatted_events = [format_event(e) for e in all_events]

    # Filter by type if specified
    if event_type:
        formatted_events = [e for e in formatted_events if e['type'].lower() == event_type.lower()]

    # Sort by timestamp (most recent first)
    formatted_events.sort(key=lambda x: x['timestamp'], reverse=True)

    # Limit count
    if show_count and show_count > 0:
        formatted_events = formatted_events[:show_count]

    # Print results
    print_events_table(formatted_events)

    # Summary
    total = len(formatted_events)
    warnings = sum(1 for e in formatted_events if e['type'] == 'Warning')
    normal = sum(1 for e in formatted_events if e['type'] == 'Normal')

    print(f"\nShowing {total} most recent events")
    if warnings > 0:
        print(f"  ⚠️ {warnings} Warning events")
    if normal > 0:
        print(f"  ℹ️ {normal} Normal events")

    return 0


def main():
    parser = argparse.ArgumentParser(
        description='Analyze events from must-gather data',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  %(prog)s ./must-gather
  %(prog)s ./must-gather --namespace openshift-etcd
  %(prog)s ./must-gather --type Warning
  %(prog)s ./must-gather --count 50
"""
    )

    parser.add_argument('must_gather_path', help='Path to must-gather directory')
    parser.add_argument('-n', '--namespace', help='Filter by namespace')
    parser.add_argument('-t', '--type', help='Filter by event type (Warning, Normal)')
    parser.add_argument('-c', '--count', type=int, default=100,
                        help='Number of events to show (default: 100)')

    args = parser.parse_args()

    if not os.path.isdir(args.must_gather_path):
        print(f"Error: Directory not found: {args.must_gather_path}", file=sys.stderr)
        return 1

    return analyze_events(args.must_gather_path, args.namespace, args.type, args.count)


if __name__ == '__main__':
    sys.exit(main())
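`calculate_age` relies on `datetime.fromisoformat`, which on Python versions before 3.11 does not accept the RFC 3339 `Z` suffix that Kubernetes timestamps use, hence the `replace('Z', '+00:00')`. A minimal sketch of the same conversion (the timestamp is made up):

```
# Why calculate_age rewrites the 'Z' suffix before parsing.
from datetime import datetime, timezone
ts = datetime.fromisoformat("2024-01-01T12:00:00Z".replace('Z', '+00:00'))
print((datetime.now(timezone.utc) - ts).days, "days old")
```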
281
skills/must-gather-analyzer/scripts/analyze_network.py
Executable file
@@ -0,0 +1,281 @@
#!/usr/bin/env python3
"""
Analyze Network resources and diagnostics from must-gather data.
Shows network operator status, OVN pods, and connectivity checks.
"""

import sys
import os
import yaml
from pathlib import Path
from typing import List, Dict, Any, Optional


def parse_yaml_file(file_path: Path) -> Optional[Dict[str, Any]]:
    """Parse a YAML file."""
    try:
        with open(file_path, 'r') as f:
            doc = yaml.safe_load(f)
            return doc
    except Exception as e:
        print(f"Warning: Failed to parse {file_path}: {e}", file=sys.stderr)
        return None


def get_network_type(must_gather_path: Path) -> str:
    """Determine the network type from cluster network config."""
    # First try to find networks.yaml (List object)
    patterns = [
        "cluster-scoped-resources/config.openshift.io/networks.yaml",
        "*/cluster-scoped-resources/config.openshift.io/networks.yaml",
    ]

    for pattern in patterns:
        for network_file in must_gather_path.glob(pattern):
            network_list = parse_yaml_file(network_file)
            if network_list:
                # Handle NetworkList object
                items = network_list.get('items', [])
                if items:
                    # Get the first network item
                    network = items[0]
                    spec = network.get('spec', {})
                    network_type = spec.get('networkType', 'Unknown')
                    if network_type != 'Unknown':
                        return network_type

    # Fallback: try individual network config files
    patterns = [
        "cluster-scoped-resources/config.openshift.io/*.yaml",
    ]

    for pattern in patterns:
        for network_file in must_gather_path.glob(pattern):
            if network_file.name in ['networks.yaml']:
                continue

            network = parse_yaml_file(network_file)
            if network:
                spec = network.get('spec', {})
                network_type = spec.get('networkType', 'Unknown')
                if network_type != 'Unknown':
                    return network_type

    return 'Unknown'


def analyze_network_operator(must_gather_path: Path) -> Optional[Dict[str, Any]]:
    """Analyze network operator status."""
    patterns = [
        "cluster-scoped-resources/config.openshift.io/clusteroperators/network.yaml",
        "*/cluster-scoped-resources/config.openshift.io/clusteroperators/network.yaml",
    ]

    for pattern in patterns:
        for op_file in must_gather_path.glob(pattern):
            operator = parse_yaml_file(op_file)
            if operator:
                conditions = operator.get('status', {}).get('conditions', [])
                result = {}

                for cond in conditions:
                    cond_type = cond.get('type')
                    if cond_type in ['Available', 'Progressing', 'Degraded']:
                        result[cond_type] = cond.get('status', 'Unknown')
                        result[f'{cond_type}_message'] = cond.get('message', '')

                return result

    return None


def analyze_ovn_pods(must_gather_path: Path) -> List[Dict[str, str]]:
    """Analyze OVN-Kubernetes pods."""
    pods = []

    patterns = [
        "namespaces/openshift-ovn-kubernetes/pods/*/*.yaml",
        "*/namespaces/openshift-ovn-kubernetes/pods/*/*.yaml",
    ]

    for pattern in patterns:
        for pod_file in must_gather_path.glob(pattern):
            if pod_file.name == 'pods.yaml':
                continue

            pod = parse_yaml_file(pod_file)
            if pod:
                name = pod.get('metadata', {}).get('name', 'unknown')
                status = pod.get('status', {})
                phase = status.get('phase', 'Unknown')

                container_statuses = status.get('containerStatuses', [])
                total = len(pod.get('spec', {}).get('containers', []))
                ready = sum(1 for cs in container_statuses if cs.get('ready', False))

                pods.append({
                    'name': name,
                    'ready': f"{ready}/{total}",
                    'status': phase
                })

    # Remove duplicates
    seen = set()
    unique_pods = []
    for p in pods:
        if p['name'] not in seen:
            seen.add(p['name'])
            unique_pods.append(p)

    return sorted(unique_pods, key=lambda x: x['name'])


def analyze_connectivity_checks(must_gather_path: Path) -> Dict[str, Any]:
    """Analyze PodNetworkConnectivityCheck resources."""
    # First try to find podnetworkconnectivitychecks.yaml (List object)
    patterns = [
        "pod_network_connectivity_check/podnetworkconnectivitychecks.yaml",
        "*/pod_network_connectivity_check/podnetworkconnectivitychecks.yaml",
    ]

    total_checks = 0
    failed_checks = []

    for pattern in patterns:
        for check_file in must_gather_path.glob(pattern):
            check_list = parse_yaml_file(check_file)
            if check_list:
                items = check_list.get('items', [])
                for check in items:
                    total_checks += 1
                    name = check.get('metadata', {}).get('name', 'unknown')
                    status = check.get('status', {})

                    conditions = status.get('conditions', [])
                    for cond in conditions:
                        if cond.get('type') == 'Reachable' and cond.get('status') == 'False':
                            failed_checks.append({
                                'name': name,
                                'message': cond.get('message', 'Unknown')
                            })

    # If we found the list file, no need to continue
    if total_checks > 0:
        return {
            'total': total_checks,
            'failed': failed_checks
        }

    # Fallback: try individual check files
    patterns = [
        "*/pod_network_connectivity_check/*.yaml",
    ]

    for pattern in patterns:
        for check_file in must_gather_path.glob(pattern):
            if check_file.name == 'podnetworkconnectivitychecks.yaml':
                continue

            check = parse_yaml_file(check_file)
            if check:
                total_checks += 1
                name = check.get('metadata', {}).get('name', 'unknown')
                status = check.get('status', {})

                conditions = status.get('conditions', [])
                for cond in conditions:
                    if cond.get('type') == 'Reachable' and cond.get('status') == 'False':
                        failed_checks.append({
                            'name': name,
                            'message': cond.get('message', 'Unknown')
                        })

    return {
        'total': total_checks,
        'failed': failed_checks
    }


def print_network_summary(network_type: str, operator_status: Optional[Dict],
                          ovn_pods: List[Dict], connectivity: Dict):
    """Print network analysis summary."""
    print(f"{'NETWORK TYPE':<30} {network_type}")
    print()

    if operator_status:
        print("NETWORK OPERATOR STATUS")
        print(f"{'Available':<15} {operator_status.get('Available', 'Unknown')}")
        print(f"{'Progressing':<15} {operator_status.get('Progressing', 'Unknown')}")
        print(f"{'Degraded':<15} {operator_status.get('Degraded', 'Unknown')}")

        if operator_status.get('Degraded') == 'True':
            msg = operator_status.get('Degraded_message', '')
            if msg:
                print(f"  Message: {msg}")
        print()

    if ovn_pods and network_type == 'OVNKubernetes':
        print("OVN-KUBERNETES PODS")
        print(f"{'NAME':<60} {'READY':<10} STATUS")
        for pod in ovn_pods:
            name = pod['name'][:60]
            ready = pod['ready'][:10]
            status = pod['status']
            print(f"{name:<60} {ready:<10} {status}")
        print()

    if connectivity['total'] > 0:
        print(f"NETWORK CONNECTIVITY CHECKS: {connectivity['total']} total")
        if connectivity['failed']:
            print(f"  Failed: {len(connectivity['failed'])}")
            for failed in connectivity['failed'][:10]:  # Show first 10
                print(f"  - {failed['name']}")
                if failed['message']:
                    print(f"    {failed['message'][:100]}")
        else:
            print("  All checks passing")
        print()


def analyze_network(must_gather_path: str):
    """Analyze network resources in a must-gather directory."""
    base_path = Path(must_gather_path)

    # Get network type
    network_type = get_network_type(base_path)

    # Get network operator status
    operator_status = analyze_network_operator(base_path)

    # Get OVN pods if applicable
    ovn_pods = []
    if network_type == 'OVNKubernetes':
        ovn_pods = analyze_ovn_pods(base_path)

    # Get connectivity checks
    connectivity = analyze_connectivity_checks(base_path)

    # Print summary
    print_network_summary(network_type, operator_status, ovn_pods, connectivity)

    return 0


def main():
    if len(sys.argv) < 2:
        print("Usage: analyze_network.py <must-gather-directory>", file=sys.stderr)
        print("\nExample:", file=sys.stderr)
        print("  analyze_network.py ./must-gather.local.123456789", file=sys.stderr)
        return 1

    must_gather_path = sys.argv[1]

    if not os.path.isdir(must_gather_path):
        print(f"Error: Directory not found: {must_gather_path}", file=sys.stderr)
        return 1

    return analyze_network(must_gather_path)


if __name__ == '__main__':
    sys.exit(main())
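`get_network_type` first walks `networks.yaml`, which holds a `NetworkList` whose single cluster `Network` object sits at `items[0]`. A self-contained sketch of that traversal (the inline YAML is an invented sample, not real must-gather output):

```
import yaml

doc = yaml.safe_load("""
kind: NetworkList
items:
- spec:
    networkType: OVNKubernetes
""")
items = doc.get('items', [])
print(items[0].get('spec', {}).get('networkType', 'Unknown') if items else 'Unknown')
```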
224
skills/must-gather-analyzer/scripts/analyze_nodes.py
Executable file
@@ -0,0 +1,224 @@
#!/usr/bin/env python3
"""
Analyze Node resources from must-gather data.
Displays output similar to 'oc get nodes' command.
"""

import sys
import os
import yaml
import argparse
from pathlib import Path
from datetime import datetime
from typing import List, Dict, Any, Optional


def parse_node(file_path: Path) -> Optional[Dict[str, Any]]:
    """Parse a single node YAML file."""
    try:
        with open(file_path, 'r') as f:
            doc = yaml.safe_load(f)
            if doc and doc.get('kind') == 'Node':
                return doc
    except Exception as e:
        print(f"Warning: Failed to parse {file_path}: {e}", file=sys.stderr)
    return None


def calculate_age(creation_timestamp: str) -> str:
    """Calculate age from creation timestamp."""
    try:
        ts = datetime.fromisoformat(creation_timestamp.replace('Z', '+00:00'))
        now = datetime.now(ts.tzinfo)
        delta = now - ts

        days = delta.days
        hours = delta.seconds // 3600

        if days > 0:
            return f"{days}d"
        elif hours > 0:
            return f"{hours}h"
        else:
            return "<1h"
    except Exception:
        return ""


def get_node_roles(labels: Dict[str, str]) -> str:
    """Extract node roles from labels."""
    roles = []
    for key in labels:
        if key.startswith('node-role.kubernetes.io/'):
            role = key.split('/')[-1]
            if role:
                roles.append(role)

    return ','.join(sorted(roles)) if roles else '<none>'


def get_node_status(node: Dict[str, Any]) -> Dict[str, Any]:
    """Extract node status information."""
    metadata = node.get('metadata', {})
    status = node.get('status', {})

    name = metadata.get('name', 'unknown')
    labels = metadata.get('labels', {})
    creation_time = metadata.get('creationTimestamp', '')

    # Get roles
    roles = get_node_roles(labels)

    # Get conditions
    conditions = status.get('conditions', [])
    ready_condition = 'Unknown'
    node_issues = []

    for condition in conditions:
        cond_type = condition.get('type', '')
        cond_status = condition.get('status', 'Unknown')

        if cond_type == 'Ready':
            ready_condition = cond_status
        elif cond_status == 'True' and cond_type in ['MemoryPressure', 'DiskPressure', 'PIDPressure', 'NetworkUnavailable']:
            node_issues.append(cond_type)

    # Determine overall status
    if ready_condition == 'True':
        node_status = 'Ready'
    elif ready_condition == 'False':
        node_status = 'NotReady'
    else:
        node_status = 'Unknown'

    # Add issues to status
    if node_issues:
        node_status = f"{node_status},{','.join(node_issues)}"

    # Get version
    node_info = status.get('nodeInfo', {})
    version = node_info.get('kubeletVersion', '')

    # Get age
    age = calculate_age(creation_time) if creation_time else ''

    # Internal IP
    addresses = status.get('addresses', [])
    internal_ip = ''
    for addr in addresses:
        if addr.get('type') == 'InternalIP':
            internal_ip = addr.get('address', '')
            break

    # OS Image
    os_image = node_info.get('osImage', '')

    return {
        'name': name,
        'status': node_status,
        'roles': roles,
        'age': age,
        'version': version,
        'internal_ip': internal_ip,
        'os_image': os_image,
        'is_problem': node_status != 'Ready' or len(node_issues) > 0
    }


def print_nodes_table(nodes: List[Dict[str, Any]]):
    """Print nodes in a formatted table like 'oc get nodes'."""
    if not nodes:
        print("No resources found.")
        return

    # Print header
    print(f"{'NAME':<50} {'STATUS':<30} {'ROLES':<20} {'AGE':<7} VERSION")

    # Print rows
    for node in nodes:
        name = node['name'][:50]
        status = node['status'][:30]
        roles = node['roles'][:20]
        age = node['age'][:7]
        version = node['version']

        print(f"{name:<50} {status:<30} {roles:<20} {age:<7} {version}")


def analyze_nodes(must_gather_path: str, problems_only: bool = False):
    """Analyze all nodes in a must-gather directory."""
    base_path = Path(must_gather_path)

    # Find all node YAML files
    possible_patterns = [
        "cluster-scoped-resources/core/nodes/*.yaml",
        "*/cluster-scoped-resources/core/nodes/*.yaml",
    ]

    nodes = []

    for pattern in possible_patterns:
        for node_file in base_path.glob(pattern):
            # Skip the nodes.yaml file that contains all nodes
            if node_file.name == 'nodes.yaml':
                continue

            node = parse_node(node_file)
            if node:
                node_status = get_node_status(node)
                nodes.append(node_status)

    if not nodes:
        print("No resources found.")
        return 1

    # Remove duplicates
    seen = set()
    unique_nodes = []
    for n in nodes:
        if n['name'] not in seen:
            seen.add(n['name'])
            unique_nodes.append(n)

    # Sort by name
    unique_nodes.sort(key=lambda x: x['name'])

    # Filter if problems only
    if problems_only:
        unique_nodes = [n for n in unique_nodes if n['is_problem']]
        if not unique_nodes:
            print("No resources found.")
            return 0

    # Print results
    print_nodes_table(unique_nodes)

    return 0


def main():
    parser = argparse.ArgumentParser(
        description='Analyze node resources from must-gather data',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  %(prog)s ./must-gather.local.123456789
  %(prog)s ./must-gather.local.123456789 --problems-only
"""
    )

    parser.add_argument('must_gather_path', help='Path to must-gather directory')
    parser.add_argument('-p', '--problems-only', action='store_true',
                        help='Show only nodes with issues')

    args = parser.parse_args()

    if not os.path.isdir(args.must_gather_path):
        print(f"Error: Directory not found: {args.must_gather_path}", file=sys.stderr)
        return 1

    return analyze_nodes(args.must_gather_path, args.problems_only)


if __name__ == '__main__':
    sys.exit(main())
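`get_node_roles` treats every `node-role.kubernetes.io/<role>` label *key* as a role and joins the sorted results, mirroring the ROLES column of `oc get nodes`. A sketch with invented labels:

```
# Roles come from label keys, not values.
labels = {
    'node-role.kubernetes.io/control-plane': '',
    'node-role.kubernetes.io/master': '',
    'kubernetes.io/hostname': 'node-a',
}
roles = sorted(k.split('/')[-1] for k in labels if k.startswith('node-role.kubernetes.io/'))
print(','.join(roles) if roles else '<none>')  # control-plane,master
```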
444
skills/must-gather-analyzer/scripts/analyze_ovn_dbs.py
Executable file
@@ -0,0 +1,444 @@
#!/usr/bin/env python3
"""
Analyze OVN Northbound and Southbound databases from must-gather.
Uses ovsdb-tool to read binary .db files collected per-node.

Must-gather structure:
  network_logs/
  └── ovnk_database_store.tar.gz
      └── ovnk_database_store/
          ├── ovnkube-node-{pod}_nbdb  (per-zone NBDB)
          ├── ovnkube-node-{pod}_sbdb  (per-zone SBDB)
          └── ...
"""

import subprocess
import json
import sys
import os
import tarfile
import yaml
import argparse
from pathlib import Path
from typing import List, Dict, Any, Optional


class OVNDatabase:
    """Wrapper for querying OVSDB files using ovsdb-tool"""

    def __init__(self, db_path: Path, db_type: str, node_name: str = None):
        self.db_path = db_path
        self.db_type = db_type  # 'nbdb' or 'sbdb'
        self.pod_name = db_path.stem.replace('_nbdb', '').replace('_sbdb', '')
        self.node_name = node_name or self.pod_name  # Use node name if available

    def query(self, table: str, columns: List[str] = None, where: List = None) -> List[Dict]:
        """Query OVSDB table using ovsdb-tool query command"""
        schema = "OVN_Northbound" if self.db_type == "nbdb" else "OVN_Southbound"

        # Build query
        query_op = {
            "op": "select",
            "table": table,
            "where": where or []
        }

        if columns:
            query_op["columns"] = columns

        query_json = json.dumps([schema, query_op])

        try:
            result = subprocess.run(
                ['ovsdb-tool', 'query', str(self.db_path), query_json],
                capture_output=True,
                text=True,
                timeout=30
            )

            if result.returncode != 0:
                print(f"Warning: Query failed for {self.db_path}: {result.stderr}", file=sys.stderr)
                return []

            data = json.loads(result.stdout)
            return data[0].get('rows', [])

        except Exception as e:
            print(f"Warning: Failed to query {table} from {self.db_path}: {e}", file=sys.stderr)
            return []


def build_pod_to_node_mapping(mg_path: Path) -> Dict[str, str]:
    """Build mapping of ovnkube pod names to node names"""
    pod_to_node = {}

    # Look for ovnkube-node pods in openshift-ovn-kubernetes namespace
    ovn_ns_path = mg_path / "namespaces" / "openshift-ovn-kubernetes" / "pods"

    if not ovn_ns_path.exists():
        print(f"Warning: OVN namespace pods not found at {ovn_ns_path}", file=sys.stderr)
        return pod_to_node

    # Find all ovnkube-node pod directories
    for pod_dir in ovn_ns_path.glob("ovnkube-node-*"):
        if not pod_dir.is_dir():
            continue

        pod_name = pod_dir.name
        pod_yaml = pod_dir / f"{pod_name}.yaml"

        if not pod_yaml.exists():
            continue

        try:
            with open(pod_yaml, 'r') as f:
                pod = yaml.safe_load(f)
                node_name = pod.get('spec', {}).get('nodeName')
                if node_name:
                    pod_to_node[pod_name] = node_name
        except Exception as e:
            print(f"Warning: Failed to parse {pod_yaml}: {e}", file=sys.stderr)

    return pod_to_node


def extract_db_tarball(mg_path: Path) -> Path:
    """Extract ovnk_database_store.tar.gz if not already extracted"""
    network_logs = mg_path / "network_logs"
    tarball = network_logs / "ovnk_database_store.tar.gz"
    extract_dir = network_logs / "ovnk_database_store"

    if not tarball.exists():
        print(f"Error: Database tarball not found: {tarball}", file=sys.stderr)
        return None

    # Extract if directory doesn't exist
    if not extract_dir.exists():
        print(f"Extracting {tarball}...")
        with tarfile.open(tarball, 'r:gz') as tar:
            tar.extractall(path=network_logs)

    return extract_dir


def get_nb_databases(db_dir: Path, pod_to_node: Dict[str, str]) -> List[OVNDatabase]:
    """Find all NB database files and map them to nodes"""
    databases = []
    for db in sorted(db_dir.glob("*_nbdb")):
        pod_name = db.stem.replace('_nbdb', '')
        node_name = pod_to_node.get(pod_name)
        databases.append(OVNDatabase(db, 'nbdb', node_name))
    return databases


def get_sb_databases(db_dir: Path, pod_to_node: Dict[str, str]) -> List[OVNDatabase]:
    """Find all SB database files and map them to nodes"""
    databases = []
    for db in sorted(db_dir.glob("*_sbdb")):
        pod_name = db.stem.replace('_sbdb', '')
        node_name = pod_to_node.get(pod_name)
        databases.append(OVNDatabase(db, 'sbdb', node_name))
    return databases


def analyze_logical_switches(db: OVNDatabase):
    """Analyze logical switches in the zone"""
    switches = db.query("Logical_Switch", columns=["name", "ports", "other_config"])

    if not switches:
        print("  No logical switches found.")
        return

    print(f"\n  LOGICAL SWITCHES ({len(switches)}):")
    print(f"  {'NAME':<60} PORTS")
    print(f"  {'-'*80}")

    for sw in switches:
        name = sw.get('name', 'unknown')
        # ports is a UUID set (encoded as ["set", [uuid, ...]]), just count them
        port_count = 0
        ports = sw.get('ports', [])
        if isinstance(ports, list) and len(ports) == 2 and ports[0] == "set":
            port_count = len(ports[1])

        print(f"  {name:<60} {port_count}")


def analyze_logical_switch_ports(db: OVNDatabase):
    """Analyze logical switch ports, focusing on pods"""
    lsps = db.query("Logical_Switch_Port", columns=["name", "external_ids", "addresses"])

    # Filter for pod ports (have pod=true in external_ids)
    pod_ports = []
    for lsp in lsps:
        ext_ids = lsp.get('external_ids', [])
        if isinstance(ext_ids, list) and len(ext_ids) == 2 and ext_ids[0] == "map":
            ext_map = dict(ext_ids[1])
            if ext_map.get('pod') == 'true':
                # Pod name is in the LSP name (format: namespace_podname)
                lsp_name = lsp.get('name', '')
                namespace = ext_map.get('namespace', '')

                # Extract pod name from LSP name
                pod_name = lsp_name
                if lsp_name.startswith(namespace + '_'):
                    pod_name = lsp_name[len(namespace) + 1:]

                # Extract IP from addresses (format can be string "MAC IP" or empty)
                ip = ""
                addrs = lsp.get('addresses', '')
                if isinstance(addrs, str) and addrs:
                    parts = addrs.split()
                    if len(parts) > 1:
                        ip = parts[1]

                pod_ports.append({
                    'name': lsp_name,
                    'namespace': namespace,
                    'pod_name': pod_name,
                    'ip': ip
                })

    if not pod_ports:
        print("  No pod logical switch ports found.")
        return

    print(f"\n  POD LOGICAL SWITCH PORTS ({len(pod_ports)}):")
    print(f"  {'NAMESPACE':<40} {'POD':<45} IP")
    print(f"  {'-'*120}")

    for port in sorted(pod_ports, key=lambda x: (x['namespace'], x['pod_name']))[:20]:  # Show first 20
        namespace = port['namespace'][:40]
        pod_name = port['pod_name'][:45]
        ip = port['ip']

        print(f"  {namespace:<40} {pod_name:<45} {ip}")

    if len(pod_ports) > 20:
        print(f"  ... and {len(pod_ports) - 20} more")


def analyze_acls(db: OVNDatabase):
    """Analyze ACLs in the zone"""
    acls = db.query("ACL", columns=["priority", "direction", "match", "action", "severity"])

    if not acls:
        print("  No ACLs found.")
        return

    print(f"\n  ACCESS CONTROL LISTS ({len(acls)}):")
    print(f"  {'PRIORITY':<10} {'DIRECTION':<15} {'ACTION':<15} MATCH")
    print(f"  {'-'*120}")

    # Show highest priority ACLs first
    sorted_acls = sorted(acls, key=lambda x: x.get('priority', 0), reverse=True)

    for acl in sorted_acls[:15]:  # Show top 15
        priority = acl.get('priority', 0)
        direction = acl.get('direction', '')
        action = acl.get('action', '')
        match = acl.get('match', '')[:70]  # Truncate long matches

        print(f"  {priority:<10} {direction:<15} {action:<15} {match}")

    if len(acls) > 15:
        print(f"  ... and {len(acls) - 15} more")


def analyze_logical_routers(db: OVNDatabase):
    """Analyze logical routers in the zone"""
    routers = db.query("Logical_Router", columns=["name", "ports", "static_routes"])

    if not routers:
        print("  No logical routers found.")
        return

    print(f"\n  LOGICAL ROUTERS ({len(routers)}):")
    print(f"  {'NAME':<60} PORTS")
    print(f"  {'-'*80}")

    for router in routers:
        name = router.get('name', 'unknown')

        # Count ports
        port_count = 0
        ports = router.get('ports', [])
        if isinstance(ports, list) and len(ports) == 2 and ports[0] == "set":
            port_count = len(ports[1])

        print(f"  {name:<60} {port_count}")


def analyze_zone_summary(db: OVNDatabase):
    """Print summary for a zone"""
    # Get counts - for ACLs we need multiple columns to get accurate count
    switches = db.query("Logical_Switch", columns=["name"])
    lsps = db.query("Logical_Switch_Port", columns=["name"])
    acls = db.query("ACL", columns=["priority", "direction", "match"])
    routers = db.query("Logical_Router", columns=["name"])

    print(f"\n{'='*80}")
    print(f"Node: {db.node_name}")
    if db.node_name != db.pod_name:
        print(f"Pod: {db.pod_name}")
    print(f"{'='*80}")
    print(f"  Logical Switches: {len(switches)}")
    print(f"  Logical Switch Ports: {len(lsps)}")
    print(f"  ACLs: {len(acls)}")
    print(f"  Logical Routers: {len(routers)}")


def run_raw_query(mg_path: str, node_filter: str, query_json: str):
    """Run a raw JSON query against OVN databases"""
    base_path = Path(mg_path)

    # Build pod-to-node mapping
    pod_to_node = build_pod_to_node_mapping(base_path)

    # Extract tarball
    db_dir = extract_db_tarball(base_path)
    if not db_dir:
        return 1

    # Get all NB databases
    nb_dbs = get_nb_databases(db_dir, pod_to_node)

    if not nb_dbs:
        print("No Northbound databases found in must-gather.", file=sys.stderr)
        return 1

    # Filter by node if specified
    if node_filter:
        filtered_dbs = [db for db in nb_dbs if node_filter in db.node_name]
        if not filtered_dbs:
            print(f"Error: No databases found for node matching '{node_filter}'", file=sys.stderr)
            print("\nAvailable nodes:", file=sys.stderr)
            for db in nb_dbs:
                print(f"  - {db.node_name}", file=sys.stderr)
            return 1
        nb_dbs = filtered_dbs

    # Run query on each database
    for db in nb_dbs:
        print(f"\n{'='*80}")
        print(f"Node: {db.node_name}")
        if db.node_name != db.pod_name:
            print(f"Pod: {db.pod_name}")
        print(f"{'='*80}\n")

        try:
            # Run the raw query using ovsdb-tool
            result = subprocess.run(
                ['ovsdb-tool', 'query', str(db.db_path), query_json],
                capture_output=True,
                text=True,
                timeout=30
            )

            if result.returncode != 0:
                print(f"Error: Query failed: {result.stderr}", file=sys.stderr)
                continue

            # Pretty print the JSON result
            try:
                data = json.loads(result.stdout)
                print(json.dumps(data, indent=2))
            except json.JSONDecodeError:
                # If not valid JSON, just print raw output
                print(result.stdout)

        except Exception as e:
            print(f"Error: Failed to execute query: {e}", file=sys.stderr)

    return 0


def analyze_northbound_databases(mg_path: str, node_filter: str = None):
    """Analyze all Northbound databases"""
    base_path = Path(mg_path)

    # Build pod-to-node mapping
    pod_to_node = build_pod_to_node_mapping(base_path)

    # Extract tarball
    db_dir = extract_db_tarball(base_path)
    if not db_dir:
        return 1

    # Get all NB databases
    nb_dbs = get_nb_databases(db_dir, pod_to_node)

    if not nb_dbs:
        print("No Northbound databases found in must-gather.", file=sys.stderr)
        return 1

    # Filter by node if specified
    if node_filter:
        filtered_dbs = [db for db in nb_dbs if node_filter in db.node_name]
        if not filtered_dbs:
            print(f"Error: No databases found for node matching '{node_filter}'", file=sys.stderr)
            print("\nAvailable nodes:", file=sys.stderr)
            for db in nb_dbs:
                print(f"  - {db.node_name}", file=sys.stderr)
            return 1
        nb_dbs = filtered_dbs

    print(f"\nFound {len(nb_dbs)} node(s)\n")

    # Analyze each zone
    for db in nb_dbs:
        analyze_zone_summary(db)
        analyze_logical_switches(db)
        analyze_logical_switch_ports(db)
        analyze_acls(db)
        analyze_logical_routers(db)
        print()

    return 0


def main():
    parser = argparse.ArgumentParser(
        description="Analyze OVN databases from must-gather",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  # Analyze all nodes
  analyze_ovn_dbs.py ./must-gather.local.123456789

  # Analyze specific node
  analyze_ovn_dbs.py ./must-gather.local.123456789 --node ip-10-0-26-145

  # Run raw OVSDB query (Claude can construct the JSON)
  analyze_ovn_dbs.py ./must-gather/ --query '["OVN_Northbound", {"op":"select", "table":"ACL", "where":[["priority", ">", 1000]], "columns":["priority","match","action"]}]'

  # Query specific node
  analyze_ovn_dbs.py ./must-gather/ --node master-0 --query '["OVN_Northbound", {"op":"select", "table":"Logical_Switch", "where":[], "columns":["name"]}]'
"""
    )
    parser.add_argument('must_gather_path', help='Path to must-gather directory')
    parser.add_argument('--node', '-n', help='Filter by node name (supports partial matches)')
    parser.add_argument('--query', '-q', help='Run raw OVSDB JSON query instead of standard analysis')

    args = parser.parse_args()

    if not os.path.isdir(args.must_gather_path):
        print(f"Error: Directory not found: {args.must_gather_path}", file=sys.stderr)
        return 1

    # Check if ovsdb-tool is available
    try:
        subprocess.run(['ovsdb-tool', '--version'], capture_output=True, check=True)
    except (subprocess.CalledProcessError, FileNotFoundError):
        print("Error: ovsdb-tool not found. Please install openvswitch package.", file=sys.stderr)
        return 1

    # Run query mode or standard analysis
    if args.query:
        return run_raw_query(args.must_gather_path, args.node, args.query)
    else:
        return analyze_northbound_databases(args.must_gather_path, args.node)


if __name__ == '__main__':
    sys.exit(main())
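Two formats matter when reading these databases: the query handed to `ovsdb-tool` is a JSON pair of schema name and select operation (the shape `OVNDatabase.query` builds), and result rows encode multi-valued columns as `["set", [...]]` and key/value columns as `["map", [[k, v], ...]]`. A sketch of both (the row below is an invented example):

```
import json

# The wire shape OVNDatabase.query() hands to ovsdb-tool:
query = json.dumps(["OVN_Northbound", {
    "op": "select",
    "table": "Logical_Switch",
    "where": [],
    "columns": ["name", "ports"],
}])
print(query)  # usable as: ovsdb-tool query <db-file> '<query>'

# Decoding OVSDB's set encoding, as the analyzers above do:
row = {"name": "join", "ports": ["set", [["uuid", "a1"], ["uuid", "b2"]]]}
ports = row["ports"]
count = len(ports[1]) if isinstance(ports, list) and len(ports) == 2 and ports[0] == "set" else 0
print(count)  # 2
```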
224
skills/must-gather-analyzer/scripts/analyze_pods.py
Executable file
@@ -0,0 +1,224 @@
#!/usr/bin/env python3
"""
Analyze Pod resources from must-gather data.
Displays output similar to 'oc get pods -A' command.
"""

import sys
import os
import yaml
import argparse
from pathlib import Path
from datetime import datetime
from typing import List, Dict, Any, Optional


def parse_pod(file_path: Path) -> Optional[Dict[str, Any]]:
    """Parse a single pod YAML file."""
    try:
        with open(file_path, 'r') as f:
            doc = yaml.safe_load(f)
            if doc and doc.get('kind') == 'Pod':
                return doc
    except Exception as e:
        print(f"Warning: Failed to parse {file_path}: {e}", file=sys.stderr)
    return None


def calculate_age(creation_timestamp: str) -> str:
    """Calculate age from creation timestamp."""
    try:
        ts = datetime.fromisoformat(creation_timestamp.replace('Z', '+00:00'))
        now = datetime.now(ts.tzinfo)
        delta = now - ts

        days = delta.days
        hours = delta.seconds // 3600
        minutes = (delta.seconds % 3600) // 60

        if days > 0:
            return f"{days}d"
        elif hours > 0:
            return f"{hours}h"
        elif minutes > 0:
            return f"{minutes}m"
        else:
            return "<1m"
    except Exception:
        return ""


def get_pod_status(pod: Dict[str, Any]) -> Dict[str, Any]:
    """Extract pod status information."""
    metadata = pod.get('metadata', {})
    status = pod.get('status', {})
    spec = pod.get('spec', {})

    name = metadata.get('name', 'unknown')
    namespace = metadata.get('namespace', 'unknown')
    creation_time = metadata.get('creationTimestamp', '')

    # Get container statuses
    container_statuses = status.get('containerStatuses', [])
    init_container_statuses = status.get('initContainerStatuses', [])

    # Calculate ready containers
    total_containers = len(spec.get('containers', []))
    ready_containers = sum(1 for cs in container_statuses if cs.get('ready', False))

    # Get overall phase
    phase = status.get('phase', 'Unknown')

    # Determine more specific status
    pod_status = phase
    reason = status.get('reason', '')

    # Check for specific container states
    for cs in container_statuses:
        state = cs.get('state', {})
        if 'waiting' in state:
            waiting = state['waiting']
            pod_status = waiting.get('reason', 'Waiting')
        elif 'terminated' in state:
            terminated = state['terminated']
            if terminated.get('exitCode', 0) != 0:
                pod_status = terminated.get('reason', 'Error')

    # Check init containers
    for ics in init_container_statuses:
        state = ics.get('state', {})
        if 'waiting' in state:
            waiting = state['waiting']
            if waiting.get('reason') in ['CrashLoopBackOff', 'ImagePullBackOff', 'ErrImagePull']:
                pod_status = f"Init:{waiting.get('reason', 'Waiting')}"

    # Calculate total restarts
    total_restarts = sum(cs.get('restartCount', 0) for cs in container_statuses)

    # Calculate age
    age = calculate_age(creation_time) if creation_time else ''

    return {
        'namespace': namespace,
        'name': name,
        'ready': f"{ready_containers}/{total_containers}",
        'status': pod_status,
        'restarts': str(total_restarts),
        'age': age,
        'node': spec.get('nodeName', ''),
        'is_problem': pod_status not in ['Running', 'Succeeded', 'Completed'] or total_restarts > 0
    }
def print_pods_table(pods: List[Dict[str, Any]], show_namespace: bool = True):
|
||||||
|
"""Print pods in a formatted table like 'oc get pods'."""
|
||||||
|
if not pods:
|
||||||
|
print("No resources found.")
|
||||||
|
return
|
||||||
|
|
||||||
|
# Print header
|
||||||
|
if show_namespace:
|
||||||
|
print(f"{'NAMESPACE':<42} {'NAME':<50} {'READY':<7} {'STATUS':<20} {'RESTARTS':<9} AGE")
|
||||||
|
else:
|
||||||
|
print(f"{'NAME':<50} {'READY':<7} {'STATUS':<20} {'RESTARTS':<9} AGE")
|
||||||
|
|
||||||
|
# Print rows
|
||||||
|
for pod in pods:
|
||||||
|
name = pod['name'][:50]
|
||||||
|
ready = pod['ready'][:7]
|
||||||
|
status = pod['status'][:20]
|
||||||
|
restarts = pod['restarts'][:9]
|
||||||
|
age = pod['age']
|
||||||
|
|
||||||
|
if show_namespace:
|
||||||
|
namespace = pod['namespace'][:42]
|
||||||
|
print(f"{namespace:<42} {name:<50} {ready:<7} {status:<20} {restarts:<9} {age}")
|
||||||
|
else:
|
||||||
|
print(f"{name:<50} {ready:<7} {status:<20} {restarts:<9} {age}")
|
||||||
|
|
||||||
|
|
||||||
|
def analyze_pods(must_gather_path: str, namespace: Optional[str] = None, problems_only: bool = False):
|
||||||
|
"""Analyze all pods in a must-gather directory."""
|
||||||
|
base_path = Path(must_gather_path)
|
||||||
|
|
||||||
|
pods = []
|
||||||
|
|
||||||
|
# Find all pod YAML files
|
||||||
|
# Structure: namespaces/<namespace>/pods/<pod-name>/<pod-name>.yaml
|
||||||
|
if namespace:
|
||||||
|
# Specific namespace
|
||||||
|
patterns = [
|
||||||
|
f"namespaces/{namespace}/pods/*/*.yaml",
|
||||||
|
f"*/namespaces/{namespace}/pods/*/*.yaml",
|
||||||
|
]
|
||||||
|
else:
|
||||||
|
# All namespaces
|
||||||
|
patterns = [
|
||||||
|
"namespaces/*/pods/*/*.yaml",
|
||||||
|
"*/namespaces/*/pods/*/*.yaml",
|
||||||
|
]
|
||||||
|
|
||||||
|
for pattern in patterns:
|
||||||
|
for pod_file in base_path.glob(pattern):
|
||||||
|
pod = parse_pod(pod_file)
|
||||||
|
if pod:
|
||||||
|
pod_status = get_pod_status(pod)
|
||||||
|
pods.append(pod_status)
|
||||||
|
|
||||||
|
if not pods:
|
||||||
|
print("No resources found.")
|
||||||
|
return 1
|
||||||
|
|
||||||
|
# Remove duplicates
|
||||||
|
seen = set()
|
||||||
|
unique_pods = []
|
||||||
|
for p in pods:
|
||||||
|
key = f"{p['namespace']}/{p['name']}"
|
||||||
|
if key not in seen:
|
||||||
|
seen.add(key)
|
||||||
|
unique_pods.append(p)
|
||||||
|
|
||||||
|
# Sort by namespace, then name
|
||||||
|
unique_pods.sort(key=lambda x: (x['namespace'], x['name']))
|
||||||
|
|
||||||
|
# Filter if problems only
|
||||||
|
if problems_only:
|
||||||
|
unique_pods = [p for p in unique_pods if p['is_problem']]
|
||||||
|
if not unique_pods:
|
||||||
|
print("No resources found.")
|
||||||
|
return 0
|
||||||
|
|
||||||
|
# Print results
|
||||||
|
print_pods_table(unique_pods, show_namespace=(namespace is None))
|
||||||
|
|
||||||
|
return 0
|
||||||
|
|
||||||
|
|
||||||
|
def main():
|
||||||
|
parser = argparse.ArgumentParser(
|
||||||
|
description='Analyze pod resources from must-gather data',
|
||||||
|
formatter_class=argparse.RawDescriptionHelpFormatter,
|
||||||
|
epilog="""
|
||||||
|
Examples:
|
||||||
|
%(prog)s ./must-gather.local.123456789
|
||||||
|
%(prog)s ./must-gather.local.123456789 --namespace openshift-etcd
|
||||||
|
%(prog)s ./must-gather.local.123456789 --problems-only
|
||||||
|
"""
|
||||||
|
)
|
||||||
|
|
||||||
|
parser.add_argument('must_gather_path', help='Path to must-gather directory')
|
||||||
|
parser.add_argument('-n', '--namespace', help='Filter by namespace')
|
||||||
|
parser.add_argument('-p', '--problems-only', action='store_true',
|
||||||
|
help='Show only pods with issues')
|
||||||
|
|
||||||
|
args = parser.parse_args()
|
||||||
|
|
||||||
|
if not os.path.isdir(args.must_gather_path):
|
||||||
|
print(f"Error: Directory not found: {args.must_gather_path}", file=sys.stderr)
|
||||||
|
return 1
|
||||||
|
|
||||||
|
return analyze_pods(args.must_gather_path, args.namespace, args.problems_only)
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == '__main__':
|
||||||
|
sys.exit(main())
|
||||||
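As a sanity check on the STATUS logic above, here is a synthetic pod object (illustrative values, not from a real must-gather, assuming `get_pod_status` is importable from the module): a Running pod whose container is waiting in CrashLoopBackOff should be reported with the waiting reason and flagged as a problem.

```python
# Synthetic pod exercising the CrashLoopBackOff branch of get_pod_status.
crashing_pod = {
    'kind': 'Pod',
    'metadata': {'name': 'demo', 'namespace': 'openshift-demo',
                 'creationTimestamp': '2024-01-01T00:00:00Z'},
    'spec': {'containers': [{'name': 'c'}], 'nodeName': 'worker-0'},
    'status': {
        'phase': 'Running',
        'containerStatuses': [{
            'name': 'c', 'ready': False, 'restartCount': 7,
            'state': {'waiting': {'reason': 'CrashLoopBackOff'}},
        }],
    },
}
row = get_pod_status(crashing_pod)
assert row['status'] == 'CrashLoopBackOff'
assert row['ready'] == '0/1' and row['restarts'] == '7'
assert row['is_problem']
```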
117
skills/must-gather-analyzer/scripts/analyze_prometheus.py
Executable file
@@ -0,0 +1,117 @@
#!/usr/bin/env python3
"""
Analyze Prometheus data from a must-gather archive.
Shows active Prometheus alerts.
"""

import sys
import os
import json
import argparse
from pathlib import Path
from typing import List, Dict, Any, Optional


def parse_json_file(file_path: Path) -> Optional[Dict[str, Any]]:
    """Parse a JSON file."""
    try:
        with open(file_path, 'r', encoding='utf-8') as f:
            doc = json.load(f)
            return doc
    except (FileNotFoundError, json.JSONDecodeError, OSError) as e:
        print(f"Error: Failed to parse {file_path}: {e}", file=sys.stderr)
    return None


def print_alerts_table(alerts):
    """Print alerts in a table format."""
    if not alerts:
        print("No alerts found.")
        return

    print("ALERTS")
    print(f"{'STATE':<10} {'NAMESPACE':<50} {'NAME':<50} {'SEVERITY':<10} {'SINCE':<20} LABELS")

    for alert in alerts:
        state = alert.get('state', '')
        since = alert.get('activeAt', '')[:19] + 'Z'  # timestamps are always UTC.
        labels = alert.get('labels', {})
        namespace = labels.pop('namespace', '')[:50]
        name = labels.pop('alertname', '')[:50]
        severity = labels.pop('severity', '')[:10]

        print(f"{state:<10} {namespace:<50} {name:<50} {severity:<10} {since:<20} {labels}")


def analyze_prometheus(must_gather_path: str, namespace: Optional[str] = None):
    """Analyze Prometheus data in a must-gather directory."""
    base_path = Path(must_gather_path)

    # Retrieve active alerts.
    rules_path = base_path / "monitoring" / "prometheus" / "rules.json"
    rules = parse_json_file(rules_path)

    if rules is None:
        return 1
    status = rules.get("status", "")
    if status != "success":
        print(f"{rules_path}: unexpected status {status}", file=sys.stderr)
        return 1

    if "data" not in rules or "groups" not in rules["data"]:
        print(f"Error: Unexpected JSON structure in {rules_path}", file=sys.stderr)
        return 1

    alerts = []
    for group in rules["data"]["groups"]:
        for rule in group["rules"]:
            if rule["type"] == 'alerting' and rule["state"] != 'inactive':
                for alert in rule["alerts"]:
                    if not namespace or alert.get('labels', {}).get('namespace', '') == namespace:
                        alerts.append(alert)

    # Sort alerts by namespace, alertname and severity.
    alerts.sort(key=lambda x: (x.get('labels', {}).get('namespace', ''),
                               x.get('labels', {}).get('alertname', ''),
                               x.get('labels', {}).get('severity', '')))

    # Print results
    print_alerts_table(alerts)

    # Summary
    total_alerts = len(alerts)
    pending = sum(1 for alert in alerts if alert.get('state') == 'pending')
    firing = sum(1 for alert in alerts if alert.get('state') == 'firing')

    print(f"\n{'='*80}")
    print("SUMMARY")
    print(f"Active alerts: {total_alerts} total ({pending} pending, {firing} firing)")
    print(f"{'='*80}")

    return 0


def main():
    parser = argparse.ArgumentParser(
        description='Analyze Prometheus data from a must-gather archive',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  %(prog)s ./must-gather
  %(prog)s ./must-gather --namespace openshift-monitoring
"""
    )

    parser.add_argument('must_gather_path', help='Path to must-gather directory')
    parser.add_argument('-n', '--namespace', help='Filter information by namespace')

    args = parser.parse_args()

    if not os.path.isdir(args.must_gather_path):
        print(f"Error: Directory not found: {args.must_gather_path}", file=sys.stderr)
        return 1

    return analyze_prometheus(args.must_gather_path, args.namespace)


if __name__ == '__main__':
    sys.exit(main())
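The parser above expects `monitoring/prometheus/rules.json` to mirror the Prometheus `/api/v1/rules` API response: a top-level `status`, then `data.groups`, with alerting rules carrying `type`, `state`, and an `alerts` list. An abridged sketch of that shape (group and alert values are illustrative):

```python
# Abridged shape of monitoring/prometheus/rules.json as consumed by
# analyze_prometheus (mirrors the Prometheus /api/v1/rules response).
rules = {
    "status": "success",
    "data": {"groups": [{
        "name": "general.rules",
        "rules": [{
            "type": "alerting",
            "state": "firing",
            "alerts": [{
                "state": "firing",
                "activeAt": "2024-01-01T00:00:00.000000000Z",
                "labels": {
                    "alertname": "Watchdog",
                    "namespace": "openshift-monitoring",
                    "severity": "none",
                },
            }],
        }],
    }]},
}
```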
235
skills/must-gather-analyzer/scripts/analyze_pvs.py
Executable file
@@ -0,0 +1,235 @@
#!/usr/bin/env python3
"""
Analyze PersistentVolumes and PersistentVolumeClaims from must-gather data.
Shows PV/PVC status, capacity, and binding information.
"""

import sys
import os
import yaml
import argparse
from pathlib import Path
from typing import List, Dict, Any, Optional


def parse_yaml_file(file_path: Path) -> Optional[Dict[str, Any]]:
    """Parse a YAML file."""
    try:
        with open(file_path, 'r') as f:
            doc = yaml.safe_load(f)
            return doc
    except Exception as e:
        print(f"Warning: Failed to parse {file_path}: {e}", file=sys.stderr)
    return None


def format_pv(pv: Dict[str, Any]) -> Dict[str, str]:
    """Format a PersistentVolume for display."""
    name = pv.get('metadata', {}).get('name', 'unknown')
    spec = pv.get('spec', {})
    status = pv.get('status', {})

    capacity = spec.get('capacity', {}).get('storage', '')
    access_modes = ','.join(spec.get('accessModes', []))[:20]
    reclaim_policy = spec.get('persistentVolumeReclaimPolicy', '')
    pv_status = status.get('phase', 'Unknown')

    claim_ref = spec.get('claimRef', {})
    claim = ''
    if claim_ref:
        claim_ns = claim_ref.get('namespace', '')
        claim_name = claim_ref.get('name', '')
        claim = f"{claim_ns}/{claim_name}" if claim_ns else claim_name

    storage_class = spec.get('storageClassName', '')

    return {
        'name': name,
        'capacity': capacity,
        'access_modes': access_modes,
        'reclaim_policy': reclaim_policy,
        'status': pv_status,
        'claim': claim,
        'storage_class': storage_class
    }


def format_pvc(pvc: Dict[str, Any]) -> Dict[str, str]:
    """Format a PersistentVolumeClaim for display."""
    metadata = pvc.get('metadata', {})
    name = metadata.get('name', 'unknown')
    namespace = metadata.get('namespace', 'unknown')
    spec = pvc.get('spec', {})
    status = pvc.get('status', {})

    pvc_status = status.get('phase', 'Unknown')
    volume = spec.get('volumeName', '')
    capacity = status.get('capacity', {}).get('storage', '')
    access_modes = ','.join(status.get('accessModes', []))[:20]
    storage_class = spec.get('storageClassName', '')

    return {
        'namespace': namespace,
        'name': name,
        'status': pvc_status,
        'volume': volume,
        'capacity': capacity,
        'access_modes': access_modes,
        'storage_class': storage_class
    }


def print_pvs_table(pvs: List[Dict[str, str]]):
    """Print PVs in a table format."""
    if not pvs:
        print("No PersistentVolumes found.")
        return

    print("PERSISTENT VOLUMES")
    print(f"{'NAME':<50} {'CAPACITY':<10} {'ACCESS MODES':<20} {'RECLAIM':<10} {'STATUS':<10} {'CLAIM':<40} STORAGECLASS")

    for pv in pvs:
        name = pv['name'][:50]
        capacity = pv['capacity'][:10]
        access = pv['access_modes'][:20]
        reclaim = pv['reclaim_policy'][:10]
        status = pv['status'][:10]
        claim = pv['claim'][:40]
        sc = pv['storage_class']

        print(f"{name:<50} {capacity:<10} {access:<20} {reclaim:<10} {status:<10} {claim:<40} {sc}")


def print_pvcs_table(pvcs: List[Dict[str, str]]):
    """Print PVCs in a table format."""
    if not pvcs:
        print("\nNo PersistentVolumeClaims found.")
        return

    print("\nPERSISTENT VOLUME CLAIMS")
    print(f"{'NAMESPACE':<30} {'NAME':<40} {'STATUS':<10} {'VOLUME':<50} {'CAPACITY':<10} {'ACCESS MODES':<20} STORAGECLASS")

    for pvc in pvcs:
        namespace = pvc['namespace'][:30]
        name = pvc['name'][:40]
        status = pvc['status'][:10]
        volume = pvc['volume'][:50]
        capacity = pvc['capacity'][:10]
        access = pvc['access_modes'][:20]
        sc = pvc['storage_class']

        print(f"{namespace:<30} {name:<40} {status:<10} {volume:<50} {capacity:<10} {access:<20} {sc}")


def analyze_storage(must_gather_path: str, namespace: Optional[str] = None):
    """Analyze PVs and PVCs in a must-gather directory."""
    base_path = Path(must_gather_path)

    # Find PVs (cluster-scoped)
    pv_patterns = [
        "cluster-scoped-resources/core/persistentvolumes/*.yaml",
        "*/cluster-scoped-resources/core/persistentvolumes/*.yaml",
    ]

    pvs = []
    for pattern in pv_patterns:
        for pv_file in base_path.glob(pattern):
            if pv_file.name == 'persistentvolumes.yaml':
                continue
            pv = parse_yaml_file(pv_file)
            if pv and pv.get('kind') == 'PersistentVolume':
                pvs.append(format_pv(pv))

    # Find PVCs (namespace-scoped)
    if namespace:
        pvc_patterns = [
            f"namespaces/{namespace}/core/persistentvolumeclaims.yaml",
            f"*/namespaces/{namespace}/core/persistentvolumeclaims.yaml",
        ]
    else:
        pvc_patterns = [
            "namespaces/*/core/persistentvolumeclaims.yaml",
            "*/namespaces/*/core/persistentvolumeclaims.yaml",
        ]

    pvcs = []
    for pattern in pvc_patterns:
        for pvc_file in base_path.glob(pattern):
            pvc_doc = parse_yaml_file(pvc_file)
            if pvc_doc:
                if pvc_doc.get('kind') == 'PersistentVolumeClaim':
                    pvcs.append(format_pvc(pvc_doc))
                elif pvc_doc.get('kind') == 'List':
                    for item in pvc_doc.get('items', []):
                        if item.get('kind') == 'PersistentVolumeClaim':
                            pvcs.append(format_pvc(item))

    # Remove duplicates
    seen_pvs = set()
    unique_pvs = []
    for pv in pvs:
        if pv['name'] not in seen_pvs:
            seen_pvs.add(pv['name'])
            unique_pvs.append(pv)

    seen_pvcs = set()
    unique_pvcs = []
    for pvc in pvcs:
        key = f"{pvc['namespace']}/{pvc['name']}"
        if key not in seen_pvcs:
            seen_pvcs.add(key)
            unique_pvcs.append(pvc)

    # Sort
    unique_pvs.sort(key=lambda x: x['name'])
    unique_pvcs.sort(key=lambda x: (x['namespace'], x['name']))

    # Print results
    print_pvs_table(unique_pvs)
    print_pvcs_table(unique_pvcs)

    # Summary
    total_pvs = len(unique_pvs)
    bound_pvs = sum(1 for pv in unique_pvs if pv['status'] == 'Bound')
    available_pvs = sum(1 for pv in unique_pvs if pv['status'] == 'Available')

    total_pvcs = len(unique_pvcs)
    bound_pvcs = sum(1 for pvc in unique_pvcs if pvc['status'] == 'Bound')
    pending_pvcs = sum(1 for pvc in unique_pvcs if pvc['status'] == 'Pending')

    print(f"\n{'='*80}")
    print("SUMMARY")
    print(f"PVs: {total_pvs} total ({bound_pvs} bound, {available_pvs} available)")
    print(f"PVCs: {total_pvcs} total ({bound_pvcs} bound, {pending_pvcs} pending)")
    if pending_pvcs > 0:
        print(f"  ⚠️ {pending_pvcs} PVC(s) pending - check storage provisioner")
    print(f"{'='*80}")

    return 0


def main():
    parser = argparse.ArgumentParser(
        description='Analyze PVs and PVCs from must-gather data',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  %(prog)s ./must-gather
  %(prog)s ./must-gather --namespace openshift-monitoring
"""
    )

    parser.add_argument('must_gather_path', help='Path to must-gather directory')
    parser.add_argument('-n', '--namespace', help='Filter PVCs by namespace')

    args = parser.parse_args()

    if not os.path.isdir(args.must_gather_path):
        print(f"Error: Directory not found: {args.must_gather_path}", file=sys.stderr)
        return 1

    return analyze_storage(args.must_gather_path, args.namespace)


if __name__ == '__main__':
    sys.exit(main())
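One detail worth noting in `analyze_storage`: the PVC loader accepts either a bare PersistentVolumeClaim document or a `v1/List` wrapper, since must-gather typically dumps all PVCs of a namespace into a single `persistentvolumeclaims.yaml` list. A small illustration (inline YAML with made-up values, assuming `format_pvc` is importable from the module):

```python
# Demonstrate the List-wrapper branch of the PVC loader.
import yaml

doc = yaml.safe_load("""
kind: List
items:
- kind: PersistentVolumeClaim
  metadata: {name: db-data, namespace: demo}
  spec: {volumeName: pv-001, storageClassName: gp3}
  status:
    phase: Bound
    capacity: {storage: 10Gi}
    accessModes: [ReadWriteOnce]
""")
rows = [format_pvc(item) for item in doc.get('items', [])
        if item.get('kind') == 'PersistentVolumeClaim']
assert rows[0]['status'] == 'Bound' and rows[0]['volume'] == 'pv-001'
```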