Initial commit
skills/must-gather-analyzer/SKILL.md (new file, 285 lines)
---
name: Must-Gather Analyzer
description: |
  Analyze OpenShift must-gather diagnostic data including cluster operators, pods, nodes,
  and network components. Use this skill when the user asks about cluster health, operator status,
  pod issues, node conditions, or wants diagnostic insights from must-gather data.

  Triggers: "analyze must-gather", "check cluster health", "operator status", "pod issues",
  "node status", "failing pods", "degraded operators", "cluster problems", "crashlooping",
  "network issues", "etcd health", "analyze clusteroperators", "analyze pods", "analyze nodes"
---

# Must-Gather Analyzer Skill

Comprehensive analysis of OpenShift must-gather diagnostic data with helper scripts that parse YAML and display output in `oc`-like format.

## Overview

This skill provides analysis for:
- **ClusterVersion**: Current version, update status, and capabilities
- **Cluster Operators**: Status, degradation, and availability
- **Pods**: Health, restarts, crashes, and failures across namespaces
- **Nodes**: Conditions, capacity, and readiness
- **Network**: OVN/SDN diagnostics and connectivity
- **Events**: Warning and error events across namespaces
- **etcd**: Cluster health, member status, and quorum
- **Storage**: PersistentVolume and PersistentVolumeClaim status

## Must-Gather Directory Structure

**Important**: Must-gather data is contained in a subdirectory with a long hash name:
```
must-gather/
└── registry-ci-openshift-org-origin-...-sha256-<hash>/
    ├── cluster-scoped-resources/
    │   ├── config.openshift.io/clusteroperators/
    │   └── core/nodes/
    ├── namespaces/
    │   └── <namespace>/
    │       └── pods/
    │           └── <pod-name>/
    │               └── <pod-name>.yaml
    └── network_logs/
```

The analysis scripts expect the path to the **subdirectory** (the one with the hash), not the root must-gather folder.

## Instructions

### 1. Get Must-Gather Path
Ask the user for the must-gather directory path if not already provided.
- If they provide the root directory, look for the subdirectory with the hash name
- The correct path contains `cluster-scoped-resources/` and `namespaces/` directories
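If only the root directory is provided, one way to locate the hashed subdirectory is to search for the folder that contains `cluster-scoped-resources/`. A minimal sketch, assuming GNU `find`/`xargs`; `./must-gather` is a placeholder for the user's path:
```bash
# Locate the must-gather content subdirectory (the one containing cluster-scoped-resources/)
MG_ROOT=./must-gather   # placeholder: the root directory provided by the user
MG_PATH=$(find "$MG_ROOT" -maxdepth 2 -type d -name cluster-scoped-resources -print -quit | xargs -r dirname)
echo "Using must-gather path: $MG_PATH"
```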
### 2. Choose Analysis Type

Based on the user's request, run the appropriate helper script:

#### ClusterVersion Analysis
```bash
./scripts/analyze_clusterversion.py <must-gather-path>
```

Shows cluster version information similar to `oc get clusterversion`:
- Current version and update status
- Progressing state
- Available updates
- Version conditions
- Enabled capabilities
- Update history

#### Cluster Operators Analysis
```bash
./scripts/analyze_clusteroperators.py <must-gather-path>
```

Shows cluster operator status similar to `oc get clusteroperators`:
- Available, Progressing, Degraded conditions
- Version information
- Time since condition change
- Detailed messages for operators with issues

#### Pods Analysis
```bash
# All namespaces
./scripts/analyze_pods.py <must-gather-path>

# Specific namespace
./scripts/analyze_pods.py <must-gather-path> --namespace <namespace>

# Show only problematic pods
./scripts/analyze_pods.py <must-gather-path> --problems-only
```

Shows pod status similar to `oc get pods -A`:
- Ready/Total containers
- Status (Running, Pending, CrashLoopBackOff, etc.)
- Restart counts
- Age
- Categorized issues (crashlooping, pending, failed)

#### Nodes Analysis
```bash
./scripts/analyze_nodes.py <must-gather-path>

# Show only nodes with issues
./scripts/analyze_nodes.py <must-gather-path> --problems-only
```

Shows node status similar to `oc get nodes`:
- Ready status
- Roles (master, worker)
- Age
- Kubernetes version
- Node conditions (DiskPressure, MemoryPressure, etc.)
- Capacity and allocatable resources

#### Network Analysis
```bash
./scripts/analyze_network.py <must-gather-path>
```

Shows network health:
- Network type (OVN-Kubernetes, OpenShift SDN)
- Network operator status
- OVN pod health
- PodNetworkConnectivityCheck results
- Network-related issues

#### Events Analysis
```bash
# Recent events (last 100)
./scripts/analyze_events.py <must-gather-path>

# Warning events only
./scripts/analyze_events.py <must-gather-path> --type Warning

# Events in a specific namespace
./scripts/analyze_events.py <must-gather-path> --namespace openshift-etcd

# Show last 50 events
./scripts/analyze_events.py <must-gather-path> --count 50
```

Shows cluster events:
- Event type (Warning, Normal)
- Last seen timestamp
- Reason and message
- Affected object
- Event count

#### etcd Analysis
```bash
./scripts/analyze_etcd.py <must-gather-path>
```

Shows etcd cluster health:
- Member health status
- Member list with IDs and URLs
- Endpoint status (leader, version, DB size)
- Quorum status
- Cluster summary

#### Storage Analysis
```bash
# All PVs and PVCs
./scripts/analyze_pvs.py <must-gather-path>

# PVCs in a specific namespace
./scripts/analyze_pvs.py <must-gather-path> --namespace openshift-monitoring
```

Shows storage resources:
- PersistentVolumes (capacity, status, claims)
- PersistentVolumeClaims (binding, capacity)
- Storage classes
- Pending/unbound volumes

#### Monitoring Analysis
```bash
# All alerts
./scripts/analyze_prometheus.py <must-gather-path>

# Alerts in a specific namespace
./scripts/analyze_prometheus.py <must-gather-path> --namespace openshift-monitoring
```

Shows monitoring information:
- Alerts (state, namespace, name, active since, labels)
- Counts of pending and firing alerts
### 3. Interpret and Report

After running the scripts:
1. Review the summary statistics
2. Focus on items flagged with issues
3. Provide actionable insights and next steps
4. Suggest log analysis for specific components if needed
5. Cross-reference issues (e.g., degraded operator → failing pods → node issues)

## Output Format

All scripts provide:
- **Summary Section**: High-level statistics with emoji indicators
- **Table View**: `oc`-like formatted output
- **Issues Section**: Detailed breakdown of problems

Example summary format:
```
================================================================================
SUMMARY: 25/28 operators healthy
⚠️  3 operators with issues
🔄 1 progressing
❌ 2 degraded
================================================================================
```

## Helper Scripts Reference

### scripts/analyze_clusterversion.py
Parses: `cluster-scoped-resources/config.openshift.io/clusterversions/version.yaml`
Output: ClusterVersion table with detailed version info, conditions, and capabilities

### scripts/analyze_clusteroperators.py
Parses: `cluster-scoped-resources/config.openshift.io/clusteroperators/`
Output: ClusterOperator status table with conditions

### scripts/analyze_pods.py
Parses: `namespaces/*/pods/*/*.yaml` (individual pod directories)
Output: Pod status table with issues categorized

### scripts/analyze_nodes.py
Parses: `cluster-scoped-resources/core/nodes/`
Output: Node status table with conditions and capacity

### scripts/analyze_network.py
Parses: `network_logs/`, network operator, OVN resources
Output: Network health summary and diagnostics

### scripts/analyze_events.py
Parses: `namespaces/*/core/events.yaml`
Output: Event table sorted by last occurrence

### scripts/analyze_etcd.py
Parses: `etcd_info/` (endpoint_health.json, member_list.json, endpoint_status.json)
Output: etcd cluster health and member status

### scripts/analyze_pvs.py
Parses: `cluster-scoped-resources/core/persistentvolumes/`, `namespaces/*/core/persistentvolumeclaims.yaml`
Output: PV and PVC status tables

## Tips for Analysis

1. **Start with Cluster Operators**: They often reveal system-wide issues
2. **Check Timing**: Look at the "SINCE" columns to understand when issues started
3. **Follow Dependencies**: Degraded operator → check its namespace pods → check the hosting nodes
4. **Look for Patterns**: Multiple pods failing on the same node suggest a node issue
5. **Cross-reference**: Use multiple scripts together for a complete picture

## Common Scenarios

### "Why is my cluster degraded?"
1. Run `analyze_clusteroperators.py` - identify degraded operators
2. Run `analyze_pods.py --namespace <operator-namespace>` - check operator pods
3. Run `analyze_nodes.py` - verify node health
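For example, this triage can be scripted end to end (the `$MG_PATH` variable and the `openshift-etcd` namespace below are placeholders):
```bash
# Degraded-cluster triage: operators first, then the affected operator's pods, then nodes
./scripts/analyze_clusteroperators.py "$MG_PATH"
./scripts/analyze_pods.py "$MG_PATH" --namespace openshift-etcd --problems-only
./scripts/analyze_nodes.py "$MG_PATH" --problems-only
```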
### "Pods keep crashing"
|
||||
1. Run `analyze_pods.py --problems-only` - find crashlooping pods
|
||||
2. Check which nodes they're on
|
||||
3. Run `analyze_nodes.py` - verify node conditions
|
||||
4. Suggest checking pod logs in must-gather data
|
||||
|
||||
### "Network connectivity issues"
|
||||
1. Run `analyze_network.py` - check network health
|
||||
2. Run `analyze_pods.py --namespace openshift-ovn-kubernetes`
|
||||
3. Check PodNetworkConnectivityCheck results
|
||||
|
||||
## Next Steps After Analysis
|
||||
|
||||
Based on findings, suggest:
|
||||
- Examining specific pod logs in `namespaces/<ns>/pods/<pod>/<container>/logs/`
|
||||
- Reviewing events in `namespaces/<ns>/core/events.yaml`
|
||||
- Checking audit logs in `audit_logs/`
|
||||
- Analyzing metrics data if available
|
||||
- Looking at host service logs in `host_service_logs/`
|
||||
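For instance, pod container logs can be read straight from the gathered data; a sketch (namespace, pod, and container names are placeholders, and log file names such as `current.log` depend on how the data was gathered):
```bash
# Browse the captured logs for a suspect pod
ls "$MG_PATH"/namespaces/<namespace>/pods/<pod-name>/<container>/logs/
# View one of the log files found there
less "$MG_PATH"/namespaces/<namespace>/pods/<pod-name>/<container>/logs/current.log
```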
skills/must-gather-analyzer/scripts/analyze_clusteroperators.py (new executable file, 199 lines)
#!/usr/bin/env python3
"""
Analyze ClusterOperator resources from must-gather data.
Displays output similar to 'oc get clusteroperators' command.
"""

import sys
import os
import yaml
from pathlib import Path
from datetime import datetime
from typing import List, Dict, Any, Optional


def parse_clusteroperator(file_path: Path) -> Optional[Dict[str, Any]]:
    """Parse a single clusteroperator YAML file."""
    try:
        with open(file_path, 'r') as f:
            doc = yaml.safe_load(f)
            if doc and doc.get('kind') == 'ClusterOperator':
                return doc
    except Exception as e:
        print(f"Warning: Failed to parse {file_path}: {e}", file=sys.stderr)
    return None


def get_condition_status(conditions: List[Dict], condition_type: str) -> tuple[str, str, str]:
    """
    Get status, reason, and message for a specific condition type.
    Returns (status, reason, message).
    """
    for condition in conditions:
        if condition.get('type') == condition_type:
            status = condition.get('status', 'Unknown')
            reason = condition.get('reason', '')
            message = condition.get('message', '')
            return status, reason, message
    return 'Unknown', '', ''


def calculate_duration(timestamp_str: str) -> str:
    """Calculate duration from timestamp to now."""
    try:
        # Parse Kubernetes timestamp format
        ts = datetime.fromisoformat(timestamp_str.replace('Z', '+00:00'))
        now = datetime.now(ts.tzinfo)
        delta = now - ts

        days = delta.days
        hours = delta.seconds // 3600
        minutes = (delta.seconds % 3600) // 60

        if days > 0:
            return f"{days}d"
        elif hours > 0:
            return f"{hours}h"
        elif minutes > 0:
            return f"{minutes}m"
        else:
            return "<1m"
    except Exception:
        return "unknown"


def get_condition_duration(conditions: List[Dict], condition_type: str) -> str:
    """Get the duration since a condition last transitioned."""
    for condition in conditions:
        if condition.get('type') == condition_type:
            last_transition = condition.get('lastTransitionTime')
            if last_transition:
                return calculate_duration(last_transition)
    return ""


def format_operator_row(operator: Dict[str, Any]) -> Dict[str, str]:
    """Format a ClusterOperator into a row for display."""
    name = operator.get('metadata', {}).get('name', 'unknown')
    conditions = operator.get('status', {}).get('conditions', [])
    versions = operator.get('status', {}).get('versions', [])

    # Get version (first version in the list, usually the operator version)
    version = versions[0].get('version', '') if versions else ''

    # Get condition statuses
    available_status, _, _ = get_condition_status(conditions, 'Available')
    progressing_status, _, _ = get_condition_status(conditions, 'Progressing')
    degraded_status, degraded_reason, degraded_msg = get_condition_status(conditions, 'Degraded')

    # Determine which condition to show duration and message for
    # Priority: Degraded > Progressing > Available (if false)
    if degraded_status == 'True':
        since = get_condition_duration(conditions, 'Degraded')
        message = degraded_msg if degraded_msg else degraded_reason
    elif progressing_status == 'True':
        since = get_condition_duration(conditions, 'Progressing')
        _, prog_reason, prog_msg = get_condition_status(conditions, 'Progressing')
        message = prog_msg if prog_msg else prog_reason
    elif available_status == 'False':
        since = get_condition_duration(conditions, 'Available')
        _, avail_reason, avail_msg = get_condition_status(conditions, 'Available')
        message = avail_msg if avail_msg else avail_reason
    else:
        # All good, show time since available
        since = get_condition_duration(conditions, 'Available')
        message = ''

    return {
        'name': name,
        'version': version,
        'available': available_status,
        'progressing': progressing_status,
        'degraded': degraded_status,
        'since': since,
        'message': message
    }


def print_operators_table(operators: List[Dict[str, str]]):
    """Print operators in a formatted table like 'oc get clusteroperators'."""
    if not operators:
        print("No resources found.")
        return

    # Print header - no width limit on VERSION to match oc output
    print(f"{'NAME':<42} {'VERSION':<50} {'AVAILABLE':<11} {'PROGRESSING':<13} {'DEGRADED':<10} {'SINCE':<7} MESSAGE")

    # Print rows
    for op in operators:
        name = op['name'][:42]
        version = op['version']  # Don't truncate version
        available = op['available'][:11]
        progressing = op['progressing'][:13]
        degraded = op['degraded'][:10]
        since = op['since'][:7]
        message = op['message']

        print(f"{name:<42} {version:<50} {available:<11} {progressing:<13} {degraded:<10} {since:<7} {message}")


def analyze_clusteroperators(must_gather_path: str):
    """Analyze all clusteroperators in a must-gather directory."""
    base_path = Path(must_gather_path)

    # Common paths where clusteroperators might be
    possible_patterns = [
        "cluster-scoped-resources/config.openshift.io/clusteroperators/*.yaml",
        "*/cluster-scoped-resources/config.openshift.io/clusteroperators/*.yaml",
    ]

    clusteroperators = []

    # Find and parse all clusteroperator files
    for pattern in possible_patterns:
        for co_file in base_path.glob(pattern):
            operator = parse_clusteroperator(co_file)
            if operator:
                clusteroperators.append(operator)

    if not clusteroperators:
        print("No resources found.", file=sys.stderr)
        return 1

    # Remove duplicates (same operator from different glob patterns)
    seen = set()
    unique_operators = []
    for op in clusteroperators:
        name = op.get('metadata', {}).get('name')
        if name and name not in seen:
            seen.add(name)
            unique_operators.append(op)

    # Format and sort operators by name
    formatted_ops = [format_operator_row(op) for op in unique_operators]
    formatted_ops.sort(key=lambda x: x['name'])

    # Print results
    print_operators_table(formatted_ops)

    return 0


def main():
    if len(sys.argv) < 2:
        print("Usage: analyze_clusteroperators.py <must-gather-directory>", file=sys.stderr)
        print("\nExample:", file=sys.stderr)
        print("  analyze_clusteroperators.py ./must-gather.local.123456789", file=sys.stderr)
        return 1

    must_gather_path = sys.argv[1]

    if not os.path.isdir(must_gather_path):
        print(f"Error: Directory not found: {must_gather_path}", file=sys.stderr)
        return 1

    return analyze_clusteroperators(must_gather_path)


if __name__ == '__main__':
    sys.exit(main())
skills/must-gather-analyzer/scripts/analyze_clusterversion.py (new executable file, 261 lines)
#!/usr/bin/env python3
"""
Analyze ClusterVersion from must-gather data.
Displays output similar to 'oc get clusterversion' command.
"""

import sys
import os
import yaml
from pathlib import Path
from datetime import datetime
from typing import Dict, Any, Optional


def parse_clusterversion(file_path: Path) -> Optional[Dict[str, Any]]:
    """Parse the clusterversion YAML file."""
    try:
        with open(file_path, 'r') as f:
            doc = yaml.safe_load(f)
            if doc and doc.get('kind') == 'ClusterVersion':
                return doc
    except Exception as e:
        print(f"Warning: Failed to parse {file_path}: {e}", file=sys.stderr)
    return None


def get_condition_status(conditions: list, condition_type: str) -> str:
    """Get status for a specific condition type."""
    for condition in conditions:
        if condition.get('type') == condition_type:
            return condition.get('status', 'Unknown')
    return 'Unknown'


def calculate_duration(timestamp_str: str) -> str:
    """Calculate duration from timestamp to now."""
    try:
        ts = datetime.fromisoformat(timestamp_str.replace('Z', '+00:00'))
        now = datetime.now(ts.tzinfo)
        delta = now - ts

        days = delta.days
        hours = delta.seconds // 3600
        minutes = (delta.seconds % 3600) // 60

        if days > 0:
            return f"{days}d"
        elif hours > 0:
            return f"{hours}h"
        elif minutes > 0:
            return f"{minutes}m"
        else:
            return "<1m"
    except Exception:
        return ""


def format_clusterversion(cv: Dict[str, Any]) -> Dict[str, str]:
    """Format ClusterVersion for display."""
    name = cv.get('metadata', {}).get('name', 'version')
    status = cv.get('status', {})

    # Get version from desired
    desired = status.get('desired', {})
    version = desired.get('version', '')

    # Get available updates count
    available_updates = status.get('availableUpdates')
    if available_updates and isinstance(available_updates, list):
        available = str(len(available_updates))
    elif available_updates is None:
        available = ''
    else:
        available = '0'

    # Get conditions
    conditions = status.get('conditions', [])
    progressing = get_condition_status(conditions, 'Progressing')
    since = ''

    # Get time since progressing started (if true) or since last update
    for condition in conditions:
        if condition.get('type') == 'Progressing':
            last_transition = condition.get('lastTransitionTime')
            if last_transition:
                since = calculate_duration(last_transition)
            break

    # Get status message
    status_msg = ''
    for condition in conditions:
        if condition.get('type') == 'Progressing' and condition.get('status') == 'True':
            status_msg = condition.get('message', '')[:80]
            break

    # If not progressing, check if failed
    if progressing != 'True':
        for condition in conditions:
            if condition.get('type') == 'Failing' and condition.get('status') == 'True':
                status_msg = condition.get('message', '')[:80]
                break

    return {
        'name': name,
        'version': version,
        'available': available,
        'progressing': progressing,
        'since': since,
        'status': status_msg
    }


def print_clusterversion_table(cv_info: Dict[str, str]):
    """Print ClusterVersion in a formatted table like 'oc get clusterversion'."""
    # Print header
    print(f"{'NAME':<10} {'VERSION':<50} {'AVAILABLE':<11} {'PROGRESSING':<13} {'SINCE':<7} STATUS")

    # Print row
    name = cv_info['name'][:10]
    version = cv_info['version'][:50]
    available = cv_info['available'][:11]
    progressing = cv_info['progressing'][:13]
    since = cv_info['since'][:7]
    status = cv_info['status']

    print(f"{name:<10} {version:<50} {available:<11} {progressing:<13} {since:<7} {status}")


def print_detailed_info(cv: Dict[str, Any]):
    """Print detailed cluster version information."""
    status = cv.get('status', {})
    spec = cv.get('spec', {})

    print(f"\n{'='*80}")
    print("CLUSTER VERSION DETAILS")
    print(f"{'='*80}")

    # Cluster ID
    cluster_id = spec.get('clusterID', 'unknown')
    print(f"Cluster ID: {cluster_id}")

    # Desired version
    desired = status.get('desired', {})
    print(f"Desired Version: {desired.get('version', 'unknown')}")
    print(f"Desired Image: {desired.get('image', 'unknown')}")

    # Version hash
    version_hash = status.get('versionHash', '')
    if version_hash:
        print(f"Version Hash: {version_hash}")

    # Upstream
    upstream = spec.get('upstream', '')
    if upstream:
        print(f"Update Server: {upstream}")

    # Conditions
    conditions = status.get('conditions', [])
    print(f"\nCONDITIONS:")
    for condition in conditions:
        cond_type = condition.get('type', 'Unknown')
        cond_status = condition.get('status', 'Unknown')
        last_transition = condition.get('lastTransitionTime', '')
        message = condition.get('message', '')

        # Calculate time since transition
        age = calculate_duration(last_transition) if last_transition else ''

        status_indicator = "✅" if cond_status == "True" else "❌" if cond_status == "False" else "❓"
        print(f"  {status_indicator} {cond_type}: {cond_status} (for {age})")
        if message and cond_status == 'True':
            print(f"     Message: {message[:100]}")

    # Update history
    history = status.get('history', [])
    if history:
        print(f"\nUPDATE HISTORY (last 5):")
        for i, entry in enumerate(history[:5]):
            state = entry.get('state', 'Unknown')
            version = entry.get('version', 'unknown')
            image = entry.get('image', '')
            completion_time = entry.get('completionTime', '')

            age = calculate_duration(completion_time) if completion_time else ''
            print(f"  {i+1}. {version} - {state} {f'({age} ago)' if age else ''}")

    # Available updates
    available_updates = status.get('availableUpdates')
    if available_updates and isinstance(available_updates, list) and len(available_updates) > 0:
        print(f"\nAVAILABLE UPDATES ({len(available_updates)}):")
        for i, update in enumerate(available_updates[:5]):
            version = update.get('version', 'unknown')
            image = update.get('image', '')
            print(f"  {i+1}. {version}")
    elif available_updates is None:
        print(f"\nAVAILABLE UPDATES: Unable to retrieve updates")

    # Capabilities
    capabilities = status.get('capabilities', {})
    enabled_caps = capabilities.get('enabledCapabilities', [])
    if enabled_caps:
        print(f"\nENABLED CAPABILITIES ({len(enabled_caps)}):")
        # Print in columns
        for i in range(0, len(enabled_caps), 3):
            caps = enabled_caps[i:i+3]
            print(f"  {', '.join(caps)}")

    print(f"{'='*80}\n")


def analyze_clusterversion(must_gather_path: str):
    """Analyze ClusterVersion in a must-gather directory."""
    base_path = Path(must_gather_path)

    # Find ClusterVersion file
    possible_patterns = [
        "cluster-scoped-resources/config.openshift.io/clusterversions/version.yaml",
        "*/cluster-scoped-resources/config.openshift.io/clusterversions/version.yaml",
    ]

    cv = None
    for pattern in possible_patterns:
        for cv_file in base_path.glob(pattern):
            cv = parse_clusterversion(cv_file)
            if cv:
                break
        if cv:
            break

    if not cv:
        print("No ClusterVersion found.")
        return 1

    # Format and print table
    cv_info = format_clusterversion(cv)
    print_clusterversion_table(cv_info)

    # Print detailed information
    print_detailed_info(cv)

    return 0


def main():
    if len(sys.argv) < 2:
        print("Usage: analyze_clusterversion.py <must-gather-directory>", file=sys.stderr)
        print("\nExample:", file=sys.stderr)
        print("  analyze_clusterversion.py ./must-gather.local.123456789", file=sys.stderr)
        return 1

    must_gather_path = sys.argv[1]

    if not os.path.isdir(must_gather_path):
        print(f"Error: Directory not found: {must_gather_path}", file=sys.stderr)
        return 1

    return analyze_clusterversion(must_gather_path)


if __name__ == '__main__':
    sys.exit(main())
skills/must-gather-analyzer/scripts/analyze_etcd.py (new executable file, 206 lines)
#!/usr/bin/env python3
"""
Analyze etcd information from must-gather data.
Shows etcd cluster health, member status, and diagnostics.
"""

import sys
import os
import json
from pathlib import Path
from typing import Dict, Any, List, Optional


def parse_etcd_info(must_gather_path: Path) -> Dict[str, Any]:
    """Parse etcd_info directory for cluster health information."""
    etcd_data = {
        'member_health': [],
        'member_list': [],
        'endpoint_health': [],
        'endpoint_status': []
    }

    # Find etcd_info directory
    etcd_dirs = list(must_gather_path.glob("etcd_info")) + \
                list(must_gather_path.glob("*/etcd_info"))

    if not etcd_dirs:
        return etcd_data

    etcd_info_dir = etcd_dirs[0]

    # Parse member health
    member_health_file = etcd_info_dir / "endpoint_health.json"
    if member_health_file.exists():
        try:
            with open(member_health_file, 'r') as f:
                data = json.load(f)
                etcd_data['member_health'] = data if isinstance(data, list) else [data]
        except Exception as e:
            print(f"Warning: Failed to parse endpoint_health.json: {e}", file=sys.stderr)

    # Parse member list
    member_list_file = etcd_info_dir / "member_list.json"
    if member_list_file.exists():
        try:
            with open(member_list_file, 'r') as f:
                data = json.load(f)
                if isinstance(data, dict) and 'members' in data:
                    etcd_data['member_list'] = data['members']
                elif isinstance(data, list):
                    etcd_data['member_list'] = data
        except Exception as e:
            print(f"Warning: Failed to parse member_list.json: {e}", file=sys.stderr)

    # Parse endpoint health
    endpoint_health_file = etcd_info_dir / "endpoint_health.json"
    if endpoint_health_file.exists():
        try:
            with open(endpoint_health_file, 'r') as f:
                data = json.load(f)
                etcd_data['endpoint_health'] = data if isinstance(data, list) else [data]
        except Exception as e:
            print(f"Warning: Failed to parse endpoint_health.json: {e}", file=sys.stderr)

    # Parse endpoint status
    endpoint_status_file = etcd_info_dir / "endpoint_status.json"
    if endpoint_status_file.exists():
        try:
            with open(endpoint_status_file, 'r') as f:
                data = json.load(f)
                etcd_data['endpoint_status'] = data if isinstance(data, list) else [data]
        except Exception as e:
            print(f"Warning: Failed to parse endpoint_status.json: {e}", file=sys.stderr)

    return etcd_data


def print_member_health(members: List[Dict[str, Any]]):
    """Print etcd member health status."""
    if not members:
        print("No member health data found.")
        return

    print("ETCD MEMBER HEALTH")
    print(f"{'ENDPOINT':<60} {'HEALTH':<10} {'TOOK':<10} ERROR")

    for member in members:
        endpoint = member.get('endpoint', 'unknown')[:60]
        health = 'true' if member.get('health') else 'false'
        took = member.get('took', '')
        error = member.get('error', '')

        print(f"{endpoint:<60} {health:<10} {took:<10} {error}")


def print_member_list(members: List[Dict[str, Any]]):
    """Print etcd member list."""
    if not members:
        print("\nNo member list data found.")
        return

    print("\nETCD MEMBER LIST")
    print(f"{'ID':<20} {'NAME':<40} {'PEER URLS':<60} {'CLIENT URLS':<60}")

    for member in members:
        member_id = str(member.get('ID', member.get('id', 'unknown')))[:20]
        name = member.get('name', 'unknown')[:40]
        peer_urls = ','.join(member.get('peerURLs', []))[:60]
        client_urls = ','.join(member.get('clientURLs', []))[:60]

        print(f"{member_id:<20} {name:<40} {peer_urls:<60} {client_urls:<60}")


def print_endpoint_status(endpoints: List[Dict[str, Any]]):
    """Print etcd endpoint status."""
    if not endpoints:
        print("\nNo endpoint status data found.")
        return

    print("\nETCD ENDPOINT STATUS")
    print(f"{'ENDPOINT':<60} {'LEADER':<20} {'VERSION':<10} {'DB SIZE':<10} {'IS LEARNER'}")

    for endpoint in endpoints:
        ep = endpoint.get('Endpoint', 'unknown')[:60]

        status = endpoint.get('Status', {})
        leader = str(status.get('leader', 'unknown'))[:20]
        version = status.get('version', 'unknown')[:10]

        db_size = status.get('dbSize', 0)
        db_size_mb = f"{db_size / (1024*1024):.1f}MB" if db_size else '0MB'

        is_learner = 'true' if status.get('isLearner') else 'false'

        print(f"{ep:<60} {leader:<20} {version:<10} {db_size_mb:<10} {is_learner}")


def print_summary(etcd_data: Dict[str, Any]):
    """Print summary of etcd cluster health."""
    member_health = etcd_data.get('member_health', [])
    member_list = etcd_data.get('member_list', [])

    total_members = len(member_list)
    healthy_members = sum(1 for m in member_health if m.get('health'))

    print(f"\n{'='*80}")
    print(f"ETCD CLUSTER SUMMARY")
    print(f"{'='*80}")
    print(f"Total Members: {total_members}")
    print(f"Healthy Members: {healthy_members}/{len(member_health) if member_health else total_members}")

    if healthy_members < total_members:
        print(f"  ⚠️  Warning: Not all members are healthy!")
    elif healthy_members == total_members and total_members > 0:
        print(f"  ✅ All members healthy")

    # Check for quorum
    if total_members >= 3:
        quorum = (total_members // 2) + 1
        if healthy_members >= quorum:
            print(f"  ✅ Quorum achieved ({healthy_members}/{quorum})")
        else:
            print(f"  ❌ Quorum lost! ({healthy_members}/{quorum})")
    print(f"{'='*80}\n")


def analyze_etcd(must_gather_path: str):
    """Analyze etcd information in a must-gather directory."""
    base_path = Path(must_gather_path)

    etcd_data = parse_etcd_info(base_path)

    if not any(etcd_data.values()):
        print("No etcd_info data found in must-gather.")
        print("Expected location: etcd_info/ directory")
        return 1

    # Print summary first
    print_summary(etcd_data)

    # Print detailed information
    print_member_health(etcd_data.get('member_health', []))
    print_member_list(etcd_data.get('member_list', []))
    print_endpoint_status(etcd_data.get('endpoint_status', []))

    return 0


def main():
    if len(sys.argv) < 2:
        print("Usage: analyze_etcd.py <must-gather-directory>", file=sys.stderr)
        print("\nExample:", file=sys.stderr)
        print("  analyze_etcd.py ./must-gather.local.123456789", file=sys.stderr)
        return 1

    must_gather_path = sys.argv[1]

    if not os.path.isdir(must_gather_path):
        print(f"Error: Directory not found: {must_gather_path}", file=sys.stderr)
        return 1

    return analyze_etcd(must_gather_path)


if __name__ == '__main__':
    sys.exit(main())
skills/must-gather-analyzer/scripts/analyze_events.py (new executable file, 201 lines)
#!/usr/bin/env python3
"""
Analyze Events from must-gather data.
Shows warning and error events sorted by last occurrence.
"""

import sys
import os
import yaml
import argparse
from pathlib import Path
from datetime import datetime
from typing import List, Dict, Any, Optional
from collections import defaultdict


def parse_events_file(file_path: Path) -> List[Dict[str, Any]]:
    """Parse events YAML file which may contain multiple events."""
    events = []
    try:
        with open(file_path, 'r') as f:
            docs = yaml.safe_load_all(f)
            for doc in docs:
                if doc and doc.get('kind') == 'Event':
                    events.append(doc)
                elif doc and doc.get('kind') == 'EventList':
                    # Handle EventList
                    events.extend(doc.get('items', []))
    except Exception as e:
        print(f"Warning: Failed to parse {file_path}: {e}", file=sys.stderr)
    return events


def calculate_age(timestamp_str: str) -> str:
    """Calculate age from timestamp."""
    try:
        ts = datetime.fromisoformat(timestamp_str.replace('Z', '+00:00'))
        now = datetime.now(ts.tzinfo)
        delta = now - ts

        days = delta.days
        hours = delta.seconds // 3600
        minutes = (delta.seconds % 3600) // 60

        if days > 0:
            return f"{days}d"
        elif hours > 0:
            return f"{hours}h"
        elif minutes > 0:
            return f"{minutes}m"
        else:
            return "<1m"
    except Exception:
        return ""


def format_event(event: Dict[str, Any]) -> Dict[str, Any]:
    """Format an event for display."""
    metadata = event.get('metadata', {})

    namespace = metadata.get('namespace', '')
    name = metadata.get('name', 'unknown')

    # Get last timestamp
    last_timestamp = event.get('lastTimestamp') or event.get('eventTime') or metadata.get('creationTimestamp', '')
    age = calculate_age(last_timestamp) if last_timestamp else ''

    # Event details
    event_type = event.get('type', 'Normal')
    reason = event.get('reason', '')
    message = event.get('message', '')
    count = event.get('count', 1)

    # Involved object
    involved = event.get('involvedObject', {})
    obj_kind = involved.get('kind', '')
    obj_name = involved.get('name', '')

    return {
        'namespace': namespace,
        'last_seen': age,
        'type': event_type,
        'reason': reason,
        'object_kind': obj_kind,
        'object_name': obj_name,
        'message': message,
        'count': count,
        'timestamp': last_timestamp
    }


def print_events_table(events: List[Dict[str, Any]]):
    """Print events in a table format."""
    if not events:
        print("No resources found.")
        return

    # Print header
    print(f"{'NAMESPACE':<30} {'LAST SEEN':<10} {'TYPE':<10} {'REASON':<30} {'OBJECT':<40} {'MESSAGE':<60}")

    # Print rows
    for event in events:
        namespace = event['namespace'][:30] if event['namespace'] else '<cluster>'
        last_seen = event['last_seen'][:10]
        event_type = event['type'][:10]
        reason = event['reason'][:30]
        obj = f"{event['object_kind']}/{event['object_name']}"[:40]
        message = event['message'][:60]

        print(f"{namespace:<30} {last_seen:<10} {event_type:<10} {reason:<30} {obj:<40} {message:<60}")


def analyze_events(must_gather_path: str, namespace: Optional[str] = None,
                   event_type: Optional[str] = None, show_count: int = 100):
    """Analyze events in a must-gather directory."""
    base_path = Path(must_gather_path)

    all_events = []

    # Find all events files
    if namespace:
        patterns = [
            f"namespaces/{namespace}/core/events.yaml",
            f"*/namespaces/{namespace}/core/events.yaml",
        ]
    else:
        patterns = [
            "namespaces/*/core/events.yaml",
            "*/namespaces/*/core/events.yaml",
        ]

    for pattern in patterns:
        for events_file in base_path.glob(pattern):
            events = parse_events_file(events_file)
            all_events.extend(events)

    if not all_events:
        print("No resources found.")
        return 1

    # Format events
    formatted_events = [format_event(e) for e in all_events]

    # Filter by type if specified
    if event_type:
        formatted_events = [e for e in formatted_events if e['type'].lower() == event_type.lower()]

    # Sort by timestamp (most recent first)
    formatted_events.sort(key=lambda x: x['timestamp'], reverse=True)

    # Limit count
    if show_count and show_count > 0:
        formatted_events = formatted_events[:show_count]

    # Print results
    print_events_table(formatted_events)

    # Summary
    total = len(formatted_events)
    warnings = sum(1 for e in formatted_events if e['type'] == 'Warning')
    normal = sum(1 for e in formatted_events if e['type'] == 'Normal')

    print(f"\nShowing {total} most recent events")
    if warnings > 0:
        print(f"  ⚠️  {warnings} Warning events")
    if normal > 0:
        print(f"  ℹ️  {normal} Normal events")

    return 0


def main():
    parser = argparse.ArgumentParser(
        description='Analyze events from must-gather data',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  %(prog)s ./must-gather
  %(prog)s ./must-gather --namespace openshift-etcd
  %(prog)s ./must-gather --type Warning
  %(prog)s ./must-gather --count 50
"""
    )

    parser.add_argument('must_gather_path', help='Path to must-gather directory')
    parser.add_argument('-n', '--namespace', help='Filter by namespace')
    parser.add_argument('-t', '--type', help='Filter by event type (Warning, Normal)')
    parser.add_argument('-c', '--count', type=int, default=100,
                        help='Number of events to show (default: 100)')

    args = parser.parse_args()

    if not os.path.isdir(args.must_gather_path):
        print(f"Error: Directory not found: {args.must_gather_path}", file=sys.stderr)
        return 1

    return analyze_events(args.must_gather_path, args.namespace, args.type, args.count)


if __name__ == '__main__':
    sys.exit(main())
skills/must-gather-analyzer/scripts/analyze_network.py (new executable file, 281 lines)
#!/usr/bin/env python3
"""
Analyze Network resources and diagnostics from must-gather data.
Shows network operator status, OVN pods, and connectivity checks.
"""

import sys
import os
import yaml
from pathlib import Path
from typing import List, Dict, Any, Optional


def parse_yaml_file(file_path: Path) -> Optional[Dict[str, Any]]:
    """Parse a YAML file."""
    try:
        with open(file_path, 'r') as f:
            doc = yaml.safe_load(f)
            return doc
    except Exception as e:
        print(f"Warning: Failed to parse {file_path}: {e}", file=sys.stderr)
    return None


def get_network_type(must_gather_path: Path) -> str:
    """Determine the network type from cluster network config."""
    # First try to find networks.yaml (List object)
    patterns = [
        "cluster-scoped-resources/config.openshift.io/networks.yaml",
        "*/cluster-scoped-resources/config.openshift.io/networks.yaml",
    ]

    for pattern in patterns:
        for network_file in must_gather_path.glob(pattern):
            network_list = parse_yaml_file(network_file)
            if network_list:
                # Handle NetworkList object
                items = network_list.get('items', [])
                if items:
                    # Get the first network item
                    network = items[0]
                    spec = network.get('spec', {})
                    network_type = spec.get('networkType', 'Unknown')
                    if network_type != 'Unknown':
                        return network_type

    # Fallback: try individual network config files
    patterns = [
        "cluster-scoped-resources/config.openshift.io/*.yaml",
    ]

    for pattern in patterns:
        for network_file in must_gather_path.glob(pattern):
            if network_file.name in ['networks.yaml']:
                continue

            network = parse_yaml_file(network_file)
            if network:
                spec = network.get('spec', {})
                network_type = spec.get('networkType', 'Unknown')
                if network_type != 'Unknown':
                    return network_type

    return 'Unknown'


def analyze_network_operator(must_gather_path: Path) -> Optional[Dict[str, Any]]:
    """Analyze network operator status."""
    patterns = [
        "cluster-scoped-resources/config.openshift.io/clusteroperators/network.yaml",
        "*/cluster-scoped-resources/config.openshift.io/clusteroperators/network.yaml",
    ]

    for pattern in patterns:
        for op_file in must_gather_path.glob(pattern):
            operator = parse_yaml_file(op_file)
            if operator:
                conditions = operator.get('status', {}).get('conditions', [])
                result = {}

                for cond in conditions:
                    cond_type = cond.get('type')
                    if cond_type in ['Available', 'Progressing', 'Degraded']:
                        result[cond_type] = cond.get('status', 'Unknown')
                        result[f'{cond_type}_message'] = cond.get('message', '')

                return result

    return None


def analyze_ovn_pods(must_gather_path: Path) -> List[Dict[str, str]]:
    """Analyze OVN-Kubernetes pods."""
    pods = []

    patterns = [
        "namespaces/openshift-ovn-kubernetes/pods/*/*.yaml",
        "*/namespaces/openshift-ovn-kubernetes/pods/*/*.yaml",
    ]

    for pattern in patterns:
        for pod_file in must_gather_path.glob(pattern):
            if pod_file.name == 'pods.yaml':
                continue

            pod = parse_yaml_file(pod_file)
            if pod:
                name = pod.get('metadata', {}).get('name', 'unknown')
                status = pod.get('status', {})
                phase = status.get('phase', 'Unknown')

                container_statuses = status.get('containerStatuses', [])
                total = len(pod.get('spec', {}).get('containers', []))
                ready = sum(1 for cs in container_statuses if cs.get('ready', False))

                pods.append({
                    'name': name,
                    'ready': f"{ready}/{total}",
                    'status': phase
                })

    # Remove duplicates
    seen = set()
    unique_pods = []
    for p in pods:
        if p['name'] not in seen:
            seen.add(p['name'])
            unique_pods.append(p)

    return sorted(unique_pods, key=lambda x: x['name'])


def analyze_connectivity_checks(must_gather_path: Path) -> Dict[str, Any]:
    """Analyze PodNetworkConnectivityCheck resources."""
    # First try to find podnetworkconnectivitychecks.yaml (List object)
    patterns = [
        "pod_network_connectivity_check/podnetworkconnectivitychecks.yaml",
        "*/pod_network_connectivity_check/podnetworkconnectivitychecks.yaml",
    ]

    total_checks = 0
    failed_checks = []

    for pattern in patterns:
        for check_file in must_gather_path.glob(pattern):
            check_list = parse_yaml_file(check_file)
            if check_list:
                items = check_list.get('items', [])
                for check in items:
                    total_checks += 1
                    name = check.get('metadata', {}).get('name', 'unknown')
                    status = check.get('status', {})

                    conditions = status.get('conditions', [])
                    for cond in conditions:
                        if cond.get('type') == 'Reachable' and cond.get('status') == 'False':
                            failed_checks.append({
                                'name': name,
                                'message': cond.get('message', 'Unknown')
                            })

    # If we found the list file, no need to continue
    if total_checks > 0:
        return {
            'total': total_checks,
            'failed': failed_checks
        }

    # Fallback: try individual check files
    patterns = [
        "*/pod_network_connectivity_check/*.yaml",
    ]

    for pattern in patterns:
        for check_file in must_gather_path.glob(pattern):
            if check_file.name == 'podnetworkconnectivitychecks.yaml':
                continue

            check = parse_yaml_file(check_file)
            if check:
                total_checks += 1
                name = check.get('metadata', {}).get('name', 'unknown')
                status = check.get('status', {})

                conditions = status.get('conditions', [])
                for cond in conditions:
                    if cond.get('type') == 'Reachable' and cond.get('status') == 'False':
                        failed_checks.append({
                            'name': name,
                            'message': cond.get('message', 'Unknown')
                        })

    return {
        'total': total_checks,
        'failed': failed_checks
    }


def print_network_summary(network_type: str, operator_status: Optional[Dict],
                          ovn_pods: List[Dict], connectivity: Dict):
    """Print network analysis summary."""
    print(f"{'NETWORK TYPE':<30} {network_type}")
    print()

    if operator_status:
        print("NETWORK OPERATOR STATUS")
        print(f"{'Available':<15} {operator_status.get('Available', 'Unknown')}")
        print(f"{'Progressing':<15} {operator_status.get('Progressing', 'Unknown')}")
        print(f"{'Degraded':<15} {operator_status.get('Degraded', 'Unknown')}")

        if operator_status.get('Degraded') == 'True':
            msg = operator_status.get('Degraded_message', '')
            if msg:
                print(f"  Message: {msg}")
        print()

    if ovn_pods and network_type == 'OVNKubernetes':
        print("OVN-KUBERNETES PODS")
        print(f"{'NAME':<60} {'READY':<10} STATUS")
        for pod in ovn_pods:
            name = pod['name'][:60]
            ready = pod['ready'][:10]
            status = pod['status']
            print(f"{name:<60} {ready:<10} {status}")
        print()

    if connectivity['total'] > 0:
        print(f"NETWORK CONNECTIVITY CHECKS: {connectivity['total']} total")
        if connectivity['failed']:
            print(f"  Failed: {len(connectivity['failed'])}")
            for failed in connectivity['failed'][:10]:  # Show first 10
                print(f"    - {failed['name']}")
                if failed['message']:
                    print(f"      {failed['message'][:100]}")
        else:
            print("  All checks passing")
        print()


def analyze_network(must_gather_path: str):
    """Analyze network resources in a must-gather directory."""
    base_path = Path(must_gather_path)

    # Get network type
    network_type = get_network_type(base_path)

    # Get network operator status
    operator_status = analyze_network_operator(base_path)

    # Get OVN pods if applicable
    ovn_pods = []
    if network_type == 'OVNKubernetes':
        ovn_pods = analyze_ovn_pods(base_path)

    # Get connectivity checks
    connectivity = analyze_connectivity_checks(base_path)

    # Print summary
    print_network_summary(network_type, operator_status, ovn_pods, connectivity)

    return 0


def main():
    if len(sys.argv) < 2:
        print("Usage: analyze_network.py <must-gather-directory>", file=sys.stderr)
        print("\nExample:", file=sys.stderr)
        print("  analyze_network.py ./must-gather.local.123456789", file=sys.stderr)
        return 1

    must_gather_path = sys.argv[1]

    if not os.path.isdir(must_gather_path):
        print(f"Error: Directory not found: {must_gather_path}", file=sys.stderr)
        return 1

    return analyze_network(must_gather_path)


if __name__ == '__main__':
    sys.exit(main())
skills/must-gather-analyzer/scripts/analyze_nodes.py (new executable file, 224 lines)
#!/usr/bin/env python3
"""
Analyze Node resources from must-gather data.
Displays output similar to 'oc get nodes' command.
"""

import sys
import os
import yaml
import argparse
from pathlib import Path
from datetime import datetime
from typing import List, Dict, Any, Optional


def parse_node(file_path: Path) -> Optional[Dict[str, Any]]:
    """Parse a single node YAML file."""
    try:
        with open(file_path, 'r') as f:
            doc = yaml.safe_load(f)
            if doc and doc.get('kind') == 'Node':
                return doc
    except Exception as e:
        print(f"Warning: Failed to parse {file_path}: {e}", file=sys.stderr)
    return None


def calculate_age(creation_timestamp: str) -> str:
    """Calculate age from creation timestamp."""
    try:
        ts = datetime.fromisoformat(creation_timestamp.replace('Z', '+00:00'))
        now = datetime.now(ts.tzinfo)
        delta = now - ts

        days = delta.days
        hours = delta.seconds // 3600

        if days > 0:
            return f"{days}d"
        elif hours > 0:
            return f"{hours}h"
        else:
            return "<1h"
    except Exception:
        return ""


def get_node_roles(labels: Dict[str, str]) -> str:
    """Extract node roles from labels."""
    roles = []
    for key in labels:
        if key.startswith('node-role.kubernetes.io/'):
            role = key.split('/')[-1]
            if role:
                roles.append(role)

    return ','.join(sorted(roles)) if roles else '<none>'


def get_node_status(node: Dict[str, Any]) -> Dict[str, Any]:
    """Extract node status information."""
    metadata = node.get('metadata', {})
    status = node.get('status', {})

    name = metadata.get('name', 'unknown')
    labels = metadata.get('labels', {})
    creation_time = metadata.get('creationTimestamp', '')

    # Get roles
    roles = get_node_roles(labels)

    # Get conditions
    conditions = status.get('conditions', [])
    ready_condition = 'Unknown'
    node_issues = []

    for condition in conditions:
        cond_type = condition.get('type', '')
        cond_status = condition.get('status', 'Unknown')

        if cond_type == 'Ready':
            ready_condition = cond_status
        elif cond_status == 'True' and cond_type in ['MemoryPressure', 'DiskPressure', 'PIDPressure', 'NetworkUnavailable']:
            node_issues.append(cond_type)

    # Determine overall status
    if ready_condition == 'True':
        node_status = 'Ready'
    elif ready_condition == 'False':
        node_status = 'NotReady'
    else:
        node_status = 'Unknown'

    # Add issues to status
    if node_issues:
        node_status = f"{node_status},{','.join(node_issues)}"

    # Get version
    node_info = status.get('nodeInfo', {})
    version = node_info.get('kubeletVersion', '')

    # Get age
    age = calculate_age(creation_time) if creation_time else ''

    # Internal IP
    addresses = status.get('addresses', [])
    internal_ip = ''
    for addr in addresses:
        if addr.get('type') == 'InternalIP':
            internal_ip = addr.get('address', '')
            break

    # OS Image
    os_image = node_info.get('osImage', '')

    return {
        'name': name,
        'status': node_status,
        'roles': roles,
        'age': age,
        'version': version,
        'internal_ip': internal_ip,
        'os_image': os_image,
        'is_problem': node_status != 'Ready' or len(node_issues) > 0
    }


def print_nodes_table(nodes: List[Dict[str, Any]]):
    """Print nodes in a formatted table like 'oc get nodes'."""
    if not nodes:
        print("No resources found.")
        return

    # Print header
    print(f"{'NAME':<50} {'STATUS':<30} {'ROLES':<20} {'AGE':<7} VERSION")

    # Print rows
    for node in nodes:
        name = node['name'][:50]
        status = node['status'][:30]
        roles = node['roles'][:20]
        age = node['age'][:7]
        version = node['version']

        print(f"{name:<50} {status:<30} {roles:<20} {age:<7} {version}")


def analyze_nodes(must_gather_path: str, problems_only: bool = False):
    """Analyze all nodes in a must-gather directory."""
    base_path = Path(must_gather_path)

    # Find all node YAML files
    possible_patterns = [
        "cluster-scoped-resources/core/nodes/*.yaml",
        "*/cluster-scoped-resources/core/nodes/*.yaml",
    ]

    nodes = []

    for pattern in possible_patterns:
        for node_file in base_path.glob(pattern):
            # Skip the nodes.yaml file that contains all nodes
            if node_file.name == 'nodes.yaml':
                continue

            node = parse_node(node_file)
            if node:
                node_status = get_node_status(node)
                nodes.append(node_status)

    if not nodes:
        print("No resources found.")
        return 1

    # Remove duplicates
    seen = set()
    unique_nodes = []
    for n in nodes:
        if n['name'] not in seen:
            seen.add(n['name'])
            unique_nodes.append(n)

    # Sort by name
    unique_nodes.sort(key=lambda x: x['name'])

    # Filter if problems only
    if problems_only:
        unique_nodes = [n for n in unique_nodes if n['is_problem']]
        if not unique_nodes:
            print("No resources found.")
            return 0

    # Print results
    print_nodes_table(unique_nodes)

    return 0


def main():
    parser = argparse.ArgumentParser(
        description='Analyze node resources from must-gather data',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  %(prog)s ./must-gather.local.123456789
  %(prog)s ./must-gather.local.123456789 --problems-only
"""
    )

    parser.add_argument('must_gather_path', help='Path to must-gather directory')
    parser.add_argument('-p', '--problems-only', action='store_true',
                        help='Show only nodes with issues')

    args = parser.parse_args()

    if not os.path.isdir(args.must_gather_path):
        print(f"Error: Directory not found: {args.must_gather_path}", file=sys.stderr)
        return 1

    return analyze_nodes(args.must_gather_path, args.problems_only)


if __name__ == '__main__':
    sys.exit(main())
444
skills/must-gather-analyzer/scripts/analyze_ovn_dbs.py
Executable file
@@ -0,0 +1,444 @@
#!/usr/bin/env python3
"""
Analyze OVN Northbound and Southbound databases from must-gather.
Uses ovsdb-tool to read binary .db files collected per-node.

Must-gather structure:
  network_logs/
  └── ovnk_database_store.tar.gz
      └── ovnk_database_store/
          ├── ovnkube-node-{pod}_nbdb  (per-zone NBDB)
          ├── ovnkube-node-{pod}_sbdb  (per-zone SBDB)
          └── ...
"""

import subprocess
import json
import sys
import os
import tarfile
import yaml
import argparse
from pathlib import Path
from typing import List, Dict, Any, Optional


class OVNDatabase:
    """Wrapper for querying OVSDB files using ovsdb-tool"""

    def __init__(self, db_path: Path, db_type: str, node_name: str = None):
        self.db_path = db_path
        self.db_type = db_type  # 'nbdb' or 'sbdb'
        self.pod_name = db_path.stem.replace('_nbdb', '').replace('_sbdb', '')
        self.node_name = node_name or self.pod_name  # Use node name if available

    def query(self, table: str, columns: List[str] = None, where: List = None) -> List[Dict]:
        """Query OVSDB table using ovsdb-tool query command"""
        schema = "OVN_Northbound" if self.db_type == "nbdb" else "OVN_Southbound"
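
        # ovsdb-tool runs the query offline against the on-disk .db file (no
        # ovsdb-server is needed); the transaction passed on the command line is
        # a JSON array of the schema name followed by one or more operation objects.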

        # Build query
        query_op = {
            "op": "select",
            "table": table,
            "where": where or []
        }

        if columns:
            query_op["columns"] = columns

        query_json = json.dumps([schema, query_op])

        try:
            result = subprocess.run(
                ['ovsdb-tool', 'query', str(self.db_path), query_json],
                capture_output=True,
                text=True,
                timeout=30
            )

            if result.returncode != 0:
                print(f"Warning: Query failed for {self.db_path}: {result.stderr}", file=sys.stderr)
                return []

            data = json.loads(result.stdout)
            return data[0].get('rows', [])

        except Exception as e:
            print(f"Warning: Failed to query {table} from {self.db_path}: {e}", file=sys.stderr)
            return []


def build_pod_to_node_mapping(mg_path: Path) -> Dict[str, str]:
    """Build mapping of ovnkube pod names to node names"""
    pod_to_node = {}
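
    # The extracted database files are named after the ovnkube-node pod that
    # produced them; map each pod to its spec.nodeName so results can be
    # reported per node.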

    # Look for ovnkube-node pods in openshift-ovn-kubernetes namespace
    ovn_ns_path = mg_path / "namespaces" / "openshift-ovn-kubernetes" / "pods"

    if not ovn_ns_path.exists():
        print(f"Warning: OVN namespace pods not found at {ovn_ns_path}", file=sys.stderr)
        return pod_to_node

    # Find all ovnkube-node pod directories
    for pod_dir in ovn_ns_path.glob("ovnkube-node-*"):
        if not pod_dir.is_dir():
            continue

        pod_name = pod_dir.name
        pod_yaml = pod_dir / f"{pod_name}.yaml"

        if not pod_yaml.exists():
            continue

        try:
            with open(pod_yaml, 'r') as f:
                pod = yaml.safe_load(f)
            node_name = pod.get('spec', {}).get('nodeName')
            if node_name:
                pod_to_node[pod_name] = node_name
        except Exception as e:
            print(f"Warning: Failed to parse {pod_yaml}: {e}", file=sys.stderr)

    return pod_to_node


def extract_db_tarball(mg_path: Path) -> Path:
    """Extract ovnk_database_store.tar.gz if not already extracted"""
    network_logs = mg_path / "network_logs"
    tarball = network_logs / "ovnk_database_store.tar.gz"
    extract_dir = network_logs / "ovnk_database_store"

    if not tarball.exists():
        print(f"Error: Database tarball not found: {tarball}", file=sys.stderr)
        return None

    # Extract if directory doesn't exist
    if not extract_dir.exists():
        print(f"Extracting {tarball}...")
        with tarfile.open(tarball, 'r:gz') as tar:
            tar.extractall(path=network_logs)

    return extract_dir


def get_nb_databases(db_dir: Path, pod_to_node: Dict[str, str]) -> List[OVNDatabase]:
    """Find all NB database files and map them to nodes"""
    databases = []
    for db in sorted(db_dir.glob("*_nbdb")):
        pod_name = db.stem.replace('_nbdb', '')
        node_name = pod_to_node.get(pod_name)
        databases.append(OVNDatabase(db, 'nbdb', node_name))
    return databases


def get_sb_databases(db_dir: Path, pod_to_node: Dict[str, str]) -> List[OVNDatabase]:
    """Find all SB database files and map them to nodes"""
    databases = []
    for db in sorted(db_dir.glob("*_sbdb")):
        pod_name = db.stem.replace('_sbdb', '')
        node_name = pod_to_node.get(pod_name)
        databases.append(OVNDatabase(db, 'sbdb', node_name))
    return databases


def analyze_logical_switches(db: OVNDatabase):
    """Analyze logical switches in the zone"""
    switches = db.query("Logical_Switch", columns=["name", "ports", "other_config"])

    if not switches:
        print(" No logical switches found.")
        return

    print(f"\n LOGICAL SWITCHES ({len(switches)}):")
    print(f" {'NAME':<60} PORTS")
    print(f" {'-'*80}")

    for sw in switches:
        name = sw.get('name', 'unknown')
        # ports is a UUID set, just count them
        port_count = 0
        ports = sw.get('ports', [])
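        # OVSDB's JSON encoding represents a multi-valued column as
        # ["set", [value, ...]], so the member list is the second element.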
        if isinstance(ports, list) and len(ports) == 2 and ports[0] == "set":
            port_count = len(ports[1])

        print(f" {name:<60} {port_count}")


def analyze_logical_switch_ports(db: OVNDatabase):
    """Analyze logical switch ports, focusing on pods"""
    lsps = db.query("Logical_Switch_Port", columns=["name", "external_ids", "addresses"])

    # Filter for pod ports (have pod=true in external_ids)
    pod_ports = []
    for lsp in lsps:
        ext_ids = lsp.get('external_ids', [])
        if isinstance(ext_ids, list) and len(ext_ids) == 2 and ext_ids[0] == "map":
            ext_map = dict(ext_ids[1])
            if ext_map.get('pod') == 'true':
                # Pod name is in the LSP name (format: namespace_podname)
                lsp_name = lsp.get('name', '')
                namespace = ext_map.get('namespace', '')

                # Extract pod name from LSP name
                pod_name = lsp_name
                if lsp_name.startswith(namespace + '_'):
                    pod_name = lsp_name[len(namespace) + 1:]

                # Extract IP from addresses (format can be string "MAC IP" or empty)
                ip = ""
                addrs = lsp.get('addresses', '')
                if isinstance(addrs, str) and addrs:
                    parts = addrs.split()
                    if len(parts) > 1:
                        ip = parts[1]

                pod_ports.append({
                    'name': lsp_name,
                    'namespace': namespace,
                    'pod_name': pod_name,
                    'ip': ip
                })

    if not pod_ports:
        print(" No pod logical switch ports found.")
        return

    print(f"\n POD LOGICAL SWITCH PORTS ({len(pod_ports)}):")
    print(f" {'NAMESPACE':<40} {'POD':<45} IP")
    print(f" {'-'*120}")

    for port in sorted(pod_ports, key=lambda x: (x['namespace'], x['pod_name']))[:20]:  # Show first 20
        namespace = port['namespace'][:40]
        pod_name = port['pod_name'][:45]
        ip = port['ip']

        print(f" {namespace:<40} {pod_name:<45} {ip}")

    if len(pod_ports) > 20:
        print(f" ... and {len(pod_ports) - 20} more")


def analyze_acls(db: OVNDatabase):
    """Analyze ACLs in the zone"""
    acls = db.query("ACL", columns=["priority", "direction", "match", "action", "severity"])

    if not acls:
        print(" No ACLs found.")
        return

    print(f"\n ACCESS CONTROL LISTS ({len(acls)}):")
    print(f" {'PRIORITY':<10} {'DIRECTION':<15} {'ACTION':<15} MATCH")
    print(f" {'-'*120}")

    # Show highest priority ACLs first
    sorted_acls = sorted(acls, key=lambda x: x.get('priority', 0), reverse=True)

    for acl in sorted_acls[:15]:  # Show top 15
        priority = acl.get('priority', 0)
        direction = acl.get('direction', '')
        action = acl.get('action', '')
        match = acl.get('match', '')[:70]  # Truncate long matches

        print(f" {priority:<10} {direction:<15} {action:<15} {match}")

    if len(acls) > 15:
        print(f" ... and {len(acls) - 15} more")


def analyze_logical_routers(db: OVNDatabase):
    """Analyze logical routers in the zone"""
    routers = db.query("Logical_Router", columns=["name", "ports", "static_routes"])

    if not routers:
        print(" No logical routers found.")
        return

    print(f"\n LOGICAL ROUTERS ({len(routers)}):")
    print(f" {'NAME':<60} PORTS")
    print(f" {'-'*80}")

    for router in routers:
        name = router.get('name', 'unknown')

        # Count ports
        port_count = 0
        ports = router.get('ports', [])
        if isinstance(ports, list) and len(ports) == 2 and ports[0] == "set":
            port_count = len(ports[1])

        print(f" {name:<60} {port_count}")


def analyze_zone_summary(db: OVNDatabase):
    """Print summary for a zone"""
    # Get counts - for ACLs we need multiple columns to get accurate count
    switches = db.query("Logical_Switch", columns=["name"])
    lsps = db.query("Logical_Switch_Port", columns=["name"])
    acls = db.query("ACL", columns=["priority", "direction", "match"])
    routers = db.query("Logical_Router", columns=["name"])

    print(f"\n{'='*80}")
    print(f"Node: {db.node_name}")
    if db.node_name != db.pod_name:
        print(f"Pod: {db.pod_name}")
    print(f"{'='*80}")
    print(f" Logical Switches: {len(switches)}")
    print(f" Logical Switch Ports: {len(lsps)}")
    print(f" ACLs: {len(acls)}")
    print(f" Logical Routers: {len(routers)}")


def run_raw_query(mg_path: str, node_filter: str, query_json: str):
    """Run a raw JSON query against OVN databases"""
    base_path = Path(mg_path)
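
    # Raw queries are run against the Northbound database of each selected node;
    # the query string itself must name the schema (e.g. "OVN_Northbound").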

    # Build pod-to-node mapping
    pod_to_node = build_pod_to_node_mapping(base_path)

    # Extract tarball
    db_dir = extract_db_tarball(base_path)
    if not db_dir:
        return 1

    # Get all NB databases
    nb_dbs = get_nb_databases(db_dir, pod_to_node)

    if not nb_dbs:
        print("No Northbound databases found in must-gather.", file=sys.stderr)
        return 1

    # Filter by node if specified
    if node_filter:
        filtered_dbs = [db for db in nb_dbs if node_filter in db.node_name]
        if not filtered_dbs:
            print(f"Error: No databases found for node matching '{node_filter}'", file=sys.stderr)
            print(f"\nAvailable nodes:", file=sys.stderr)
            for db in nb_dbs:
                print(f" - {db.node_name}", file=sys.stderr)
            return 1
        nb_dbs = filtered_dbs

    # Run query on each database
    for db in nb_dbs:
        print(f"\n{'='*80}")
        print(f"Node: {db.node_name}")
        if db.node_name != db.pod_name:
            print(f"Pod: {db.pod_name}")
        print(f"{'='*80}\n")

        try:
            # Run the raw query using ovsdb-tool
            result = subprocess.run(
                ['ovsdb-tool', 'query', str(db.db_path), query_json],
                capture_output=True,
                text=True,
                timeout=30
            )

            if result.returncode != 0:
                print(f"Error: Query failed: {result.stderr}", file=sys.stderr)
                continue

            # Pretty print the JSON result
            try:
                data = json.loads(result.stdout)
                print(json.dumps(data, indent=2))
            except json.JSONDecodeError:
                # If not valid JSON, just print raw output
                print(result.stdout)

        except Exception as e:
            print(f"Error: Failed to execute query: {e}", file=sys.stderr)

    return 0


def analyze_northbound_databases(mg_path: str, node_filter: str = None):
    """Analyze all Northbound databases"""
    base_path = Path(mg_path)

    # Build pod-to-node mapping
    pod_to_node = build_pod_to_node_mapping(base_path)

    # Extract tarball
    db_dir = extract_db_tarball(base_path)
    if not db_dir:
        return 1

    # Get all NB databases
    nb_dbs = get_nb_databases(db_dir, pod_to_node)

    if not nb_dbs:
        print("No Northbound databases found in must-gather.", file=sys.stderr)
        return 1

    # Filter by node if specified
    if node_filter:
        filtered_dbs = [db for db in nb_dbs if node_filter in db.node_name]
        if not filtered_dbs:
            print(f"Error: No databases found for node matching '{node_filter}'", file=sys.stderr)
            print(f"\nAvailable nodes:", file=sys.stderr)
            for db in nb_dbs:
                print(f" - {db.node_name}", file=sys.stderr)
            return 1
        nb_dbs = filtered_dbs

    print(f"\nFound {len(nb_dbs)} node(s)\n")

    # Analyze each zone
    for db in nb_dbs:
        analyze_zone_summary(db)
        analyze_logical_switches(db)
        analyze_logical_switch_ports(db)
        analyze_acls(db)
        analyze_logical_routers(db)
        print()

    return 0


def main():
    parser = argparse.ArgumentParser(
        description="Analyze OVN databases from must-gather",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  # Analyze all nodes
  analyze_ovn_dbs.py ./must-gather.local.123456789

  # Analyze specific node
  analyze_ovn_dbs.py ./must-gather.local.123456789 --node ip-10-0-26-145

  # Run raw OVSDB query (Claude can construct the JSON)
  analyze_ovn_dbs.py ./must-gather/ --query '["OVN_Northbound", {"op":"select", "table":"ACL", "where":[["priority", ">", 1000]], "columns":["priority","match","action"]}]'

  # Query specific node
  analyze_ovn_dbs.py ./must-gather/ --node master-0 --query '["OVN_Northbound", {"op":"select", "table":"Logical_Switch", "where":[], "columns":["name"]}]'
        """
    )
    parser.add_argument('must_gather_path', help='Path to must-gather directory')
    parser.add_argument('--node', '-n', help='Filter by node name (supports partial matches)')
    parser.add_argument('--query', '-q', help='Run raw OVSDB JSON query instead of standard analysis')

    args = parser.parse_args()

    if not os.path.isdir(args.must_gather_path):
        print(f"Error: Directory not found: {args.must_gather_path}", file=sys.stderr)
        return 1

    # Check if ovsdb-tool is available
    try:
        subprocess.run(['ovsdb-tool', '--version'], capture_output=True, check=True)
    except (subprocess.CalledProcessError, FileNotFoundError):
        print("Error: ovsdb-tool not found. Please install openvswitch package.", file=sys.stderr)
        return 1

    # Run query mode or standard analysis
    if args.query:
        return run_raw_query(args.must_gather_path, args.node, args.query)
    else:
        return analyze_northbound_databases(args.must_gather_path, args.node)


if __name__ == '__main__':
    sys.exit(main())
224
skills/must-gather-analyzer/scripts/analyze_pods.py
Executable file
@@ -0,0 +1,224 @@
#!/usr/bin/env python3
"""
Analyze Pod resources from must-gather data.
Displays output similar to 'oc get pods -A' command.
"""

import sys
import os
import yaml
import argparse
from pathlib import Path
from datetime import datetime
from typing import List, Dict, Any, Optional


def parse_pod(file_path: Path) -> Optional[Dict[str, Any]]:
    """Parse a single pod YAML file."""
    try:
        with open(file_path, 'r') as f:
            doc = yaml.safe_load(f)
            if doc and doc.get('kind') == 'Pod':
                return doc
    except Exception as e:
        print(f"Warning: Failed to parse {file_path}: {e}", file=sys.stderr)
    return None


def calculate_age(creation_timestamp: str) -> str:
    """Calculate age from creation timestamp."""
    try:
        ts = datetime.fromisoformat(creation_timestamp.replace('Z', '+00:00'))
        now = datetime.now(ts.tzinfo)
        delta = now - ts

        days = delta.days
        hours = delta.seconds // 3600
        minutes = (delta.seconds % 3600) // 60
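
        # Report only the largest whole unit (e.g. "3d", "7h", "12m"), roughly
        # matching the AGE column of oc get pods.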
        if days > 0:
            return f"{days}d"
        elif hours > 0:
            return f"{hours}h"
        elif minutes > 0:
            return f"{minutes}m"
        else:
            return "<1m"
    except Exception:
        return ""


def get_pod_status(pod: Dict[str, Any]) -> Dict[str, Any]:
    """Extract pod status information."""
    metadata = pod.get('metadata', {})
    status = pod.get('status', {})
    spec = pod.get('spec', {})

    name = metadata.get('name', 'unknown')
    namespace = metadata.get('namespace', 'unknown')
    creation_time = metadata.get('creationTimestamp', '')

    # Get container statuses
    container_statuses = status.get('containerStatuses', [])
    init_container_statuses = status.get('initContainerStatuses', [])

    # Calculate ready containers
    total_containers = len(spec.get('containers', []))
    ready_containers = sum(1 for cs in container_statuses if cs.get('ready', False))

    # Get overall phase
    phase = status.get('phase', 'Unknown')

    # Determine more specific status
    pod_status = phase
    reason = status.get('reason', '')

    # Check for specific container states
    for cs in container_statuses:
        state = cs.get('state', {})
        if 'waiting' in state:
            waiting = state['waiting']
            pod_status = waiting.get('reason', 'Waiting')
        elif 'terminated' in state:
            terminated = state['terminated']
            if terminated.get('exitCode', 0) != 0:
                pod_status = terminated.get('reason', 'Error')

    # Check init containers
    for ics in init_container_statuses:
        state = ics.get('state', {})
        if 'waiting' in state:
            waiting = state['waiting']
            if waiting.get('reason') in ['CrashLoopBackOff', 'ImagePullBackOff', 'ErrImagePull']:
                pod_status = f"Init:{waiting.get('reason', 'Waiting')}"

    # Calculate total restarts
    total_restarts = sum(cs.get('restartCount', 0) for cs in container_statuses)

    # Calculate age
    age = calculate_age(creation_time) if creation_time else ''
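
    # A pod is flagged as a problem if its status is not Running/Succeeded/Completed
    # or it has restarted at least once; --problems-only filters on this flag.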
    return {
        'namespace': namespace,
        'name': name,
        'ready': f"{ready_containers}/{total_containers}",
        'status': pod_status,
        'restarts': str(total_restarts),
        'age': age,
        'node': spec.get('nodeName', ''),
        'is_problem': pod_status not in ['Running', 'Succeeded', 'Completed'] or total_restarts > 0
    }


def print_pods_table(pods: List[Dict[str, Any]], show_namespace: bool = True):
    """Print pods in a formatted table like 'oc get pods'."""
    if not pods:
        print("No resources found.")
        return

    # Print header
    if show_namespace:
        print(f"{'NAMESPACE':<42} {'NAME':<50} {'READY':<7} {'STATUS':<20} {'RESTARTS':<9} AGE")
    else:
        print(f"{'NAME':<50} {'READY':<7} {'STATUS':<20} {'RESTARTS':<9} AGE")

    # Print rows
    for pod in pods:
        name = pod['name'][:50]
        ready = pod['ready'][:7]
        status = pod['status'][:20]
        restarts = pod['restarts'][:9]
        age = pod['age']

        if show_namespace:
            namespace = pod['namespace'][:42]
            print(f"{namespace:<42} {name:<50} {ready:<7} {status:<20} {restarts:<9} {age}")
        else:
            print(f"{name:<50} {ready:<7} {status:<20} {restarts:<9} {age}")


def analyze_pods(must_gather_path: str, namespace: Optional[str] = None, problems_only: bool = False):
    """Analyze all pods in a must-gather directory."""
    base_path = Path(must_gather_path)

    pods = []

    # Find all pod YAML files
    # Structure: namespaces/<namespace>/pods/<pod-name>/<pod-name>.yaml
    if namespace:
        # Specific namespace
        patterns = [
            f"namespaces/{namespace}/pods/*/*.yaml",
            f"*/namespaces/{namespace}/pods/*/*.yaml",
        ]
    else:
        # All namespaces
        patterns = [
            "namespaces/*/pods/*/*.yaml",
            "*/namespaces/*/pods/*/*.yaml",
        ]

    for pattern in patterns:
        for pod_file in base_path.glob(pattern):
            pod = parse_pod(pod_file)
            if pod:
                pod_status = get_pod_status(pod)
                pods.append(pod_status)

    if not pods:
        print("No resources found.")
        return 1

    # Remove duplicates
    seen = set()
    unique_pods = []
    for p in pods:
        key = f"{p['namespace']}/{p['name']}"
        if key not in seen:
            seen.add(key)
            unique_pods.append(p)

    # Sort by namespace, then name
    unique_pods.sort(key=lambda x: (x['namespace'], x['name']))

    # Filter if problems only
    if problems_only:
        unique_pods = [p for p in unique_pods if p['is_problem']]
        if not unique_pods:
            print("No resources found.")
            return 0

    # Print results
    print_pods_table(unique_pods, show_namespace=(namespace is None))

    return 0


def main():
    parser = argparse.ArgumentParser(
        description='Analyze pod resources from must-gather data',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  %(prog)s ./must-gather.local.123456789
  %(prog)s ./must-gather.local.123456789 --namespace openshift-etcd
  %(prog)s ./must-gather.local.123456789 --problems-only
        """
    )

    parser.add_argument('must_gather_path', help='Path to must-gather directory')
    parser.add_argument('-n', '--namespace', help='Filter by namespace')
    parser.add_argument('-p', '--problems-only', action='store_true',
                        help='Show only pods with issues')

    args = parser.parse_args()

    if not os.path.isdir(args.must_gather_path):
        print(f"Error: Directory not found: {args.must_gather_path}", file=sys.stderr)
        return 1

    return analyze_pods(args.must_gather_path, args.namespace, args.problems_only)


if __name__ == '__main__':
    sys.exit(main())
117
skills/must-gather-analyzer/scripts/analyze_prometheus.py
Executable file
@@ -0,0 +1,117 @@
#!/usr/bin/env python3
"""
Analyze Prometheus data from must-gather data.
Shows Prometheus status, targets, and active alerts.
"""

import sys
import os
import json
import argparse
from pathlib import Path
from typing import List, Dict, Any, Optional


def parse_json_file(file_path: Path) -> Optional[Dict[str, Any]]:
    """Parse a JSON file."""
    try:
        with open(file_path, 'r', encoding='utf-8') as f:
            doc = json.load(f)
            return doc
    except (FileNotFoundError, json.JSONDecodeError, OSError) as e:
        print(f"Error: Failed to parse {file_path}: {e}", file=sys.stderr)
        return None


def print_alerts_table(alerts):
    """Print alerts in a table format."""
    if not alerts:
        print("No alerts found.")
        return

    print("ALERTS")
    print(f"{'STATE':<10} {'NAMESPACE':<50} {'NAME':<50} {'SEVERITY':<10} {'SINCE':<20} LABELS")

    for alert in alerts:
        state = alert.get('state', '')
        since = alert.get('activeAt', '')[:19] + 'Z'  # timestamps are always UTC.
        labels = alert.get('labels', {})
        namespace = labels.pop('namespace', '')[:50]
        name = labels.pop('alertname', '')[:50]
        severity = labels.pop('severity', '')[:10]

        print(f"{state:<10} {namespace:<50} {name:<50} {severity:<10} {since:<20} {labels}")


def analyze_prometheus(must_gather_path: str, namespace: Optional[str] = None):
    """Analyze Prometheus data in a must-gather directory."""
    base_path = Path(must_gather_path)

    # Retrieve active alerts.
    rules_path = base_path / "monitoring" / "prometheus" / "rules.json"
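    # rules.json has the shape of the Prometheus rules API response:
    # {"status": ..., "data": {"groups": [{"rules": [...]}]}}, where alerting
    # rules carry an "alerts" list with one entry per active alert instance.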
    rules = parse_json_file(rules_path)

    if rules is None:
        return 1
    status = rules.get("status", "")
    if status != "success":
        print(f"{rules_path}: unexpected status {status}", file=sys.stderr)
        return 1

    if "data" not in rules or "groups" not in rules["data"]:
        print(f"Error: Unexpected JSON structure in {rules_path}", file=sys.stderr)
        return 1

    alerts = []
    for group in rules["data"]["groups"]:
        for rule in group["rules"]:
            if rule["type"] == 'alerting' and rule["state"] != 'inactive':
                for alert in rule["alerts"]:
                    if namespace is None or namespace == '':
                        alerts.append(alert)
                    elif alert.get('labels', {}).get('namespace', '') == namespace:
                        alerts.append(alert)

    # Sort alerts by namespace, alertname and severity.
    alerts.sort(key=lambda x: (x.get('labels', {}).get('namespace', ''), x.get('labels', {}).get('alertname', ''), x.get('labels', {}).get('severity', '')))

    # Print results
    print_alerts_table(alerts)

    # Summary
    total_alerts = len(alerts)
    pending = sum(1 for alert in alerts if alert.get('state') == 'pending')
    firing = sum(1 for alert in alerts if alert.get('state') == 'firing')

    print(f"\n{'='*80}")
    print(f"SUMMARY")
    print(f"Active alerts: {total_alerts} total ({pending} pending, {firing} firing)")
    print(f"{'='*80}")

    return 0


def main():
    parser = argparse.ArgumentParser(
        description='Analyze Prometheus data from must-gather data',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  %(prog)s ./must-gather
  %(prog)s ./must-gather --namespace openshift-monitoring
        """
    )

    parser.add_argument('must_gather_path', help='Path to must-gather directory')
    parser.add_argument('-n', '--namespace', help='Filter information by namespace')

    args = parser.parse_args()

    if not os.path.isdir(args.must_gather_path):
        print(f"Error: Directory not found: {args.must_gather_path}", file=sys.stderr)
        return 1

    return analyze_prometheus(args.must_gather_path, args.namespace)


if __name__ == '__main__':
    sys.exit(main())
235
skills/must-gather-analyzer/scripts/analyze_pvs.py
Executable file
@@ -0,0 +1,235 @@
#!/usr/bin/env python3
"""
Analyze PersistentVolumes and PersistentVolumeClaims from must-gather data.
Shows PV/PVC status, capacity, and binding information.
"""

import sys
import os
import yaml
import argparse
from pathlib import Path
from typing import List, Dict, Any, Optional


def parse_yaml_file(file_path: Path) -> Optional[Dict[str, Any]]:
    """Parse a YAML file."""
    try:
        with open(file_path, 'r') as f:
            doc = yaml.safe_load(f)
            return doc
    except Exception as e:
        print(f"Warning: Failed to parse {file_path}: {e}", file=sys.stderr)
        return None


def format_pv(pv: Dict[str, Any]) -> Dict[str, str]:
    """Format a PersistentVolume for display."""
    name = pv.get('metadata', {}).get('name', 'unknown')
    spec = pv.get('spec', {})
    status = pv.get('status', {})

    capacity = spec.get('capacity', {}).get('storage', '')
    access_modes = ','.join(spec.get('accessModes', []))[:20]
    reclaim_policy = spec.get('persistentVolumeReclaimPolicy', '')
    pv_status = status.get('phase', 'Unknown')

    claim_ref = spec.get('claimRef', {})
    claim = ''
    if claim_ref:
        claim_ns = claim_ref.get('namespace', '')
        claim_name = claim_ref.get('name', '')
        claim = f"{claim_ns}/{claim_name}" if claim_ns else claim_name

    storage_class = spec.get('storageClassName', '')

    return {
        'name': name,
        'capacity': capacity,
        'access_modes': access_modes,
        'reclaim_policy': reclaim_policy,
        'status': pv_status,
        'claim': claim,
        'storage_class': storage_class
    }


def format_pvc(pvc: Dict[str, Any]) -> Dict[str, str]:
    """Format a PersistentVolumeClaim for display."""
    metadata = pvc.get('metadata', {})
    name = metadata.get('name', 'unknown')
    namespace = metadata.get('namespace', 'unknown')
    spec = pvc.get('spec', {})
    status = pvc.get('status', {})

    pvc_status = status.get('phase', 'Unknown')
    volume = spec.get('volumeName', '')
    capacity = status.get('capacity', {}).get('storage', '')
    access_modes = ','.join(status.get('accessModes', []))[:20]
    storage_class = spec.get('storageClassName', '')

    return {
        'namespace': namespace,
        'name': name,
        'status': pvc_status,
        'volume': volume,
        'capacity': capacity,
        'access_modes': access_modes,
        'storage_class': storage_class
    }


def print_pvs_table(pvs: List[Dict[str, str]]):
    """Print PVs in a table format."""
    if not pvs:
        print("No PersistentVolumes found.")
        return

    print("PERSISTENT VOLUMES")
    print(f"{'NAME':<50} {'CAPACITY':<10} {'ACCESS MODES':<20} {'RECLAIM':<10} {'STATUS':<10} {'CLAIM':<40} STORAGECLASS")

    for pv in pvs:
        name = pv['name'][:50]
        capacity = pv['capacity'][:10]
        access = pv['access_modes'][:20]
        reclaim = pv['reclaim_policy'][:10]
        status = pv['status'][:10]
        claim = pv['claim'][:40]
        sc = pv['storage_class']

        print(f"{name:<50} {capacity:<10} {access:<20} {reclaim:<10} {status:<10} {claim:<40} {sc}")


def print_pvcs_table(pvcs: List[Dict[str, str]]):
    """Print PVCs in a table format."""
    if not pvcs:
        print("\nNo PersistentVolumeClaims found.")
        return

    print("\nPERSISTENT VOLUME CLAIMS")
    print(f"{'NAMESPACE':<30} {'NAME':<40} {'STATUS':<10} {'VOLUME':<50} {'CAPACITY':<10} {'ACCESS MODES':<20} STORAGECLASS")

    for pvc in pvcs:
        namespace = pvc['namespace'][:30]
        name = pvc['name'][:40]
        status = pvc['status'][:10]
        volume = pvc['volume'][:50]
        capacity = pvc['capacity'][:10]
        access = pvc['access_modes'][:20]
        sc = pvc['storage_class']

        print(f"{namespace:<30} {name:<40} {status:<10} {volume:<50} {capacity:<10} {access:<20} {sc}")


def analyze_storage(must_gather_path: str, namespace: Optional[str] = None):
    """Analyze PVs and PVCs in a must-gather directory."""
    base_path = Path(must_gather_path)

    # Find PVs (cluster-scoped)
    pv_patterns = [
        "cluster-scoped-resources/core/persistentvolumes/*.yaml",
        "*/cluster-scoped-resources/core/persistentvolumes/*.yaml",
    ]

    pvs = []
    for pattern in pv_patterns:
        for pv_file in base_path.glob(pattern):
            if pv_file.name == 'persistentvolumes.yaml':
                continue
            pv = parse_yaml_file(pv_file)
            if pv and pv.get('kind') == 'PersistentVolume':
                pvs.append(format_pv(pv))

    # Find PVCs (namespace-scoped)
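    # PVCs are collected per namespace in core/persistentvolumeclaims.yaml, which
    # may hold a single object or a v1 List; both shapes are handled below.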
    if namespace:
        pvc_patterns = [
            f"namespaces/{namespace}/core/persistentvolumeclaims.yaml",
            f"*/namespaces/{namespace}/core/persistentvolumeclaims.yaml",
        ]
    else:
        pvc_patterns = [
            "namespaces/*/core/persistentvolumeclaims.yaml",
            "*/namespaces/*/core/persistentvolumeclaims.yaml",
        ]

    pvcs = []
    for pattern in pvc_patterns:
        for pvc_file in base_path.glob(pattern):
            pvc_doc = parse_yaml_file(pvc_file)
            if pvc_doc:
                if pvc_doc.get('kind') == 'PersistentVolumeClaim':
                    pvcs.append(format_pvc(pvc_doc))
                elif pvc_doc.get('kind') == 'List':
                    for item in pvc_doc.get('items', []):
                        if item.get('kind') == 'PersistentVolumeClaim':
                            pvcs.append(format_pvc(item))

    # Remove duplicates
    seen_pvs = set()
    unique_pvs = []
    for pv in pvs:
        if pv['name'] not in seen_pvs:
            seen_pvs.add(pv['name'])
            unique_pvs.append(pv)

    seen_pvcs = set()
    unique_pvcs = []
    for pvc in pvcs:
        key = f"{pvc['namespace']}/{pvc['name']}"
        if key not in seen_pvcs:
            seen_pvcs.add(key)
            unique_pvcs.append(pvc)

    # Sort
    unique_pvs.sort(key=lambda x: x['name'])
    unique_pvcs.sort(key=lambda x: (x['namespace'], x['name']))

    # Print results
    print_pvs_table(unique_pvs)
    print_pvcs_table(unique_pvcs)

    # Summary
    total_pvs = len(unique_pvs)
    bound_pvs = sum(1 for pv in unique_pvs if pv['status'] == 'Bound')
    available_pvs = sum(1 for pv in unique_pvs if pv['status'] == 'Available')

    total_pvcs = len(unique_pvcs)
    bound_pvcs = sum(1 for pvc in unique_pvcs if pvc['status'] == 'Bound')
    pending_pvcs = sum(1 for pvc in unique_pvcs if pvc['status'] == 'Pending')

    print(f"\n{'='*80}")
    print(f"SUMMARY")
    print(f"PVs: {total_pvs} total ({bound_pvs} bound, {available_pvs} available)")
    print(f"PVCs: {total_pvcs} total ({bound_pvcs} bound, {pending_pvcs} pending)")
    if pending_pvcs > 0:
        print(f" ⚠️ {pending_pvcs} PVC(s) pending - check storage provisioner")
    print(f"{'='*80}")

    return 0


def main():
    parser = argparse.ArgumentParser(
        description='Analyze PVs and PVCs from must-gather data',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  %(prog)s ./must-gather
  %(prog)s ./must-gather --namespace openshift-monitoring
        """
    )

    parser.add_argument('must_gather_path', help='Path to must-gather directory')
    parser.add_argument('-n', '--namespace', help='Filter PVCs by namespace')

    args = parser.parse_args()

    if not os.path.isdir(args.must_gather_path):
        print(f"Error: Directory not found: {args.must_gather_path}", file=sys.stderr)
        return 1

    return analyze_storage(args.must_gather_path, args.namespace)


if __name__ == '__main__':
    sys.exit(main())