Initial commit

Zhongwei Li
2025-11-30 08:46:06 +08:00
commit cf9da06850
16 changed files with 3315 additions and 0 deletions

View File

@@ -0,0 +1,285 @@
---
name: Must-Gather Analyzer
description: |
Analyze OpenShift must-gather diagnostic data including cluster operators, pods, nodes,
and network components. Use this skill when the user asks about cluster health, operator status,
pod issues, node conditions, or wants diagnostic insights from must-gather data.
Triggers: "analyze must-gather", "check cluster health", "operator status", "pod issues",
"node status", "failing pods", "degraded operators", "cluster problems", "crashlooping",
"network issues", "etcd health", "analyze clusteroperators", "analyze pods", "analyze nodes"
---
# Must-Gather Analyzer Skill
Comprehensive analysis of OpenShift must-gather diagnostic data with helper scripts that parse YAML and display output in `oc`-like format.
## Overview
This skill provides analysis for:
- **ClusterVersion**: Current version, update status, and capabilities
- **Cluster Operators**: Status, degradation, and availability
- **Pods**: Health, restarts, crashes, and failures across namespaces
- **Nodes**: Conditions, capacity, and readiness
- **Network**: OVN/SDN diagnostics and connectivity
- **Events**: Warning and error events across namespaces
- **etcd**: Cluster health, member status, and quorum
- **Storage**: PersistentVolumes and PersistentVolumeClaims status
## Must-Gather Directory Structure
**Important**: Must-gather data is contained in a subdirectory with a long hash name:
```
must-gather/
└── registry-ci-openshift-org-origin-...-sha256-<hash>/
├── cluster-scoped-resources/
│ ├── config.openshift.io/clusteroperators/
│ └── core/nodes/
├── namespaces/
│ └── <namespace>/
│ └── pods/
│ └── <pod-name>/
│ └── <pod-name>.yaml
└── network_logs/
```
The analysis scripts expect the path to the **subdirectory** (the one with the hash), not the root must-gather folder.
## Instructions
### 1. Get Must-Gather Path
Ask the user for the must-gather directory path if not already provided.
- If they provide the root directory, look for the subdirectory with the hash name
- The correct path contains `cluster-scoped-resources/` and `namespaces/` directories; the sketch below resolves this automatically
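A minimal sketch, assuming only the layout described above (`resolve_must_gather_dir` is a hypothetical helper, not one of the shipped scripts):
```python
from pathlib import Path

def resolve_must_gather_dir(path: str) -> Path:
    """Return the directory that actually contains the gathered data."""
    root = Path(path)
    # Already the data directory if the marker directories are present.
    if (root / "cluster-scoped-resources").is_dir() and (root / "namespaces").is_dir():
        return root
    # Otherwise look one level down for the hash-named subdirectory.
    for child in sorted(root.iterdir()):
        if (child / "cluster-scoped-resources").is_dir() and (child / "namespaces").is_dir():
            return child
    raise FileNotFoundError(f"No must-gather data found under {root}")
```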
### 2. Choose Analysis Type
Based on the user's request, run the appropriate helper script:
#### ClusterVersion Analysis
```bash
./scripts/analyze_clusterversion.py <must-gather-path>
```
Shows cluster version information similar to `oc get clusterversion`:
- Current version and update status
- Progressing state
- Available updates
- Version conditions
- Enabled capabilities
- Update history
#### Cluster Operators Analysis
```bash
./scripts/analyze_clusteroperators.py <must-gather-path>
```
Shows cluster operator status similar to `oc get clusteroperators`:
- Available, Progressing, Degraded conditions
- Version information
- Time since condition change
- Detailed messages for operators with issues
#### Pods Analysis
```bash
# All namespaces
./scripts/analyze_pods.py <must-gather-path>
# Specific namespace
./scripts/analyze_pods.py <must-gather-path> --namespace <namespace>
# Show only problematic pods
./scripts/analyze_pods.py <must-gather-path> --problems-only
```
Shows pod status similar to `oc get pods -A`:
- Ready/Total containers
- Status (Running, Pending, CrashLoopBackOff, etc.)
- Restart counts
- Age
- Categorized issues (crashlooping, pending, failed)
#### Nodes Analysis
```bash
./scripts/analyze_nodes.py <must-gather-path>
# Show only nodes with issues
./scripts/analyze_nodes.py <must-gather-path> --problems-only
```
Shows node status similar to `oc get nodes`:
- Ready status
- Roles (master, worker)
- Age
- Kubernetes version
- Node conditions (DiskPressure, MemoryPressure, etc.)
- Capacity and allocatable resources
#### Network Analysis
```bash
./scripts/analyze_network.py <must-gather-path>
```
Shows network health:
- Network type (OVN-Kubernetes, OpenShift SDN)
- Network operator status
- OVN pod health
- PodNetworkConnectivityCheck results
- Network-related issues
#### Events Analysis
```bash
# Recent events (last 100)
./scripts/analyze_events.py <must-gather-path>
# Warning events only
./scripts/analyze_events.py <must-gather-path> --type Warning
# Events in specific namespace
./scripts/analyze_events.py <must-gather-path> --namespace openshift-etcd
# Show last 50 events
./scripts/analyze_events.py <must-gather-path> --count 50
```
Shows cluster events:
- Event type (Warning, Normal)
- Last seen timestamp
- Reason and message
- Affected object
- Event count
#### etcd Analysis
```bash
./scripts/analyze_etcd.py <must-gather-path>
```
Shows etcd cluster health:
- Member health status
- Member list with IDs and URLs
- Endpoint status (leader, version, DB size)
- Quorum status (see the sketch after this list)
- Cluster summary
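Quorum is a simple majority of the member list; a minimal standalone sketch of the same check `analyze_etcd.py` performs:
```python
def has_quorum(total_members: int, healthy_members: int) -> bool:
    """etcd keeps quorum while a majority of members is healthy."""
    # Mirrors the majority check used by analyze_etcd.py.
    return healthy_members >= (total_members // 2) + 1

# A 3-member cluster tolerates one unhealthy member, but not two.
assert has_quorum(3, 2) and not has_quorum(3, 1)
```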
#### Storage Analysis
```bash
# All PVs and PVCs
./scripts/analyze_pvs.py <must-gather-path>
# PVCs in specific namespace
./scripts/analyze_pvs.py <must-gather-path> --namespace openshift-monitoring
```
Shows storage resources:
- PersistentVolumes (capacity, status, claims)
- PersistentVolumeClaims (binding, capacity)
- Storage classes
- Pending/unbound volumes
#### Monitoring Analysis
```bash
# All alerts
./scripts/analyze_prometheus.py <must-gather-path>
# Alerts in specific namespace
./scripts/analyze_prometheus.py <must-gather-path> --namespace openshift-monitoring
```
Shows monitoring information:
- Alerts (state, namespace, name, active since, labels)
- Total counts of pending and firing alerts
### 3. Interpret and Report
After running the scripts:
1. Review the summary statistics
2. Focus on items flagged with issues
3. Provide actionable insights and next steps
4. Suggest log analysis for specific components if needed
5. Cross-reference issues (e.g., degraded operator → failing pods → node issues); a scripted sketch follows this list
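A minimal sketch of step 5, assuming the scripts live under `./scripts/` as shown above (the wrapper itself is hypothetical, not a shipped script):
```python
import subprocess
import sys

def run_helper(script: str, *args: str) -> str:
    """Run one helper script and return its stdout for cross-referencing."""
    result = subprocess.run(
        [sys.executable, f"./scripts/{script}", *args],
        capture_output=True, text=True,
    )
    return result.stdout

mg_path = sys.argv[1]  # path to the hash-named subdirectory
print(run_helper("analyze_clusteroperators.py", mg_path))
print(run_helper("analyze_pods.py", mg_path, "--problems-only"))
print(run_helper("analyze_nodes.py", mg_path, "--problems-only"))
```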
## Output Format
All scripts provide:
- **Summary Section**: High-level statistics with emoji indicators
- **Table View**: `oc`-like formatted output
- **Issues Section**: Detailed breakdown of problems
Example summary format:
```
================================================================================
SUMMARY: 25/28 operators healthy
⚠️ 3 operators with issues
🔄 1 progressing
❌ 2 degraded
================================================================================
```
## Helper Scripts Reference
### scripts/analyze_clusterversion.py
Parses: `cluster-scoped-resources/config.openshift.io/clusterversions/version.yaml`
Output: ClusterVersion table with detailed version info, conditions, and capabilities
### scripts/analyze_clusteroperators.py
Parses: `cluster-scoped-resources/config.openshift.io/clusteroperators/`
Output: ClusterOperator status table with conditions
### scripts/analyze_pods.py
Parses: `namespaces/*/pods/*/*.yaml` (individual pod directories)
Output: Pod status table with issues categorized
### scripts/analyze_nodes.py
Parses: `cluster-scoped-resources/core/nodes/`
Output: Node status table with conditions and capacity
### scripts/analyze_network.py
Parses: `network_logs/`, network operator, OVN resources
Output: Network health summary and diagnostics
### scripts/analyze_events.py
Parses: `namespaces/*/core/events.yaml`
Output: Event table sorted by last occurrence
### scripts/analyze_etcd.py
Parses: `etcd_info/` (endpoint_health.json, member_list.json, endpoint_status.json)
Output: etcd cluster health and member status
### scripts/analyze_pvs.py
Parses: `cluster-scoped-resources/core/persistentvolumes/`, `namespaces/*/core/persistentvolumeclaims.yaml`
Output: PV and PVC status tables
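### scripts/analyze_prometheus.py
Parses: Prometheus alert data captured in the must-gather
Output: Alert table (state, namespace, name, active since, labels) with pending/firing totals
### scripts/analyze_ovn_dbs.py
Parses: `network_logs/ovnk_database_store.tar.gz` (per-node OVN NB/SB databases, read with `ovsdb-tool`)
Output: Per-node summaries of logical switches, switch ports, ACLs, and logical routers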
## Tips for Analysis
1. **Start with Cluster Operators**: They often reveal system-wide issues
2. **Check Timing**: Look at "SINCE" columns to understand when issues started
3. **Follow Dependencies**: Degraded operator → check its namespace pods → check hosting nodes
4. **Look for Patterns**: Multiple pods failing on same node suggests node issue
5. **Cross-reference**: Use multiple scripts together for a complete picture
## Common Scenarios
### "Why is my cluster degraded?"
1. Run `analyze_clusteroperators.py` - identify degraded operators
2. Run `analyze_pods.py --namespace <operator-namespace>` - check operator pods
3. Run `analyze_nodes.py` - verify node health
### "Pods keep crashing"
1. Run `analyze_pods.py --problems-only` - find crashlooping pods
2. Check which nodes they're on
3. Run `analyze_nodes.py` - verify node conditions
4. Suggest checking pod logs in must-gather data
### "Network connectivity issues"
1. Run `analyze_network.py` - check network health
2. Run `analyze_pods.py --namespace openshift-ovn-kubernetes`
3. Check PodNetworkConnectivityCheck results
## Next Steps After Analysis
Based on findings, suggest:
- Examining specific pod logs in `namespaces/<ns>/pods/<pod>/<container>/logs/`
- Reviewing events in `namespaces/<ns>/core/events.yaml`
- Checking audit logs in `audit_logs/`
- Analyzing metrics data if available
- Looking at host service logs in `host_service_logs/`

View File

@@ -0,0 +1,199 @@
#!/usr/bin/env python3
"""
Analyze ClusterOperator resources from must-gather data.
Displays output similar to 'oc get clusteroperators' command.
"""
import sys
import os
import yaml
from pathlib import Path
from datetime import datetime
from typing import List, Dict, Any, Optional
def parse_clusteroperator(file_path: Path) -> Optional[Dict[str, Any]]:
"""Parse a single clusteroperator YAML file."""
try:
with open(file_path, 'r') as f:
doc = yaml.safe_load(f)
if doc and doc.get('kind') == 'ClusterOperator':
return doc
except Exception as e:
print(f"Warning: Failed to parse {file_path}: {e}", file=sys.stderr)
return None
def get_condition_status(conditions: List[Dict], condition_type: str) -> tuple[str, str, str]:
"""
Get status, reason, and message for a specific condition type.
Returns (status, reason, message).
"""
for condition in conditions:
if condition.get('type') == condition_type:
status = condition.get('status', 'Unknown')
reason = condition.get('reason', '')
message = condition.get('message', '')
return status, reason, message
return 'Unknown', '', ''
def calculate_duration(timestamp_str: str) -> str:
"""Calculate duration from timestamp to now."""
try:
# Parse Kubernetes timestamp format
ts = datetime.fromisoformat(timestamp_str.replace('Z', '+00:00'))
now = datetime.now(ts.tzinfo)
delta = now - ts
days = delta.days
hours = delta.seconds // 3600
minutes = (delta.seconds % 3600) // 60
if days > 0:
return f"{days}d"
elif hours > 0:
return f"{hours}h"
elif minutes > 0:
return f"{minutes}m"
else:
return "<1m"
except Exception:
return "unknown"
def get_condition_duration(conditions: List[Dict], condition_type: str) -> str:
"""Get the duration since a condition last transitioned."""
for condition in conditions:
if condition.get('type') == condition_type:
last_transition = condition.get('lastTransitionTime')
if last_transition:
return calculate_duration(last_transition)
return ""
def format_operator_row(operator: Dict[str, Any]) -> Dict[str, str]:
"""Format a ClusterOperator into a row for display."""
name = operator.get('metadata', {}).get('name', 'unknown')
conditions = operator.get('status', {}).get('conditions', [])
versions = operator.get('status', {}).get('versions', [])
# Get version (first version in the list, usually the operator version)
version = versions[0].get('version', '') if versions else ''
# Get condition statuses
available_status, _, _ = get_condition_status(conditions, 'Available')
progressing_status, _, _ = get_condition_status(conditions, 'Progressing')
degraded_status, degraded_reason, degraded_msg = get_condition_status(conditions, 'Degraded')
# Determine which condition to show duration and message for
# Priority: Degraded > Progressing > Available (if false)
if degraded_status == 'True':
since = get_condition_duration(conditions, 'Degraded')
message = degraded_msg if degraded_msg else degraded_reason
elif progressing_status == 'True':
since = get_condition_duration(conditions, 'Progressing')
_, prog_reason, prog_msg = get_condition_status(conditions, 'Progressing')
message = prog_msg if prog_msg else prog_reason
elif available_status == 'False':
since = get_condition_duration(conditions, 'Available')
_, avail_reason, avail_msg = get_condition_status(conditions, 'Available')
message = avail_msg if avail_msg else avail_reason
else:
# All good, show time since available
since = get_condition_duration(conditions, 'Available')
message = ''
return {
'name': name,
'version': version,
'available': available_status,
'progressing': progressing_status,
'degraded': degraded_status,
'since': since,
'message': message
}
def print_operators_table(operators: List[Dict[str, str]]):
"""Print operators in a formatted table like 'oc get clusteroperators'."""
if not operators:
print("No resources found.")
return
    # Print header - VERSION is padded but never truncated, to match oc output
print(f"{'NAME':<42} {'VERSION':<50} {'AVAILABLE':<11} {'PROGRESSING':<13} {'DEGRADED':<10} {'SINCE':<7} MESSAGE")
# Print rows
for op in operators:
name = op['name'][:42]
version = op['version'] # Don't truncate version
available = op['available'][:11]
progressing = op['progressing'][:13]
degraded = op['degraded'][:10]
since = op['since'][:7]
message = op['message']
print(f"{name:<42} {version:<50} {available:<11} {progressing:<13} {degraded:<10} {since:<7} {message}")
def analyze_clusteroperators(must_gather_path: str):
"""Analyze all clusteroperators in a must-gather directory."""
base_path = Path(must_gather_path)
# Common paths where clusteroperators might be
possible_patterns = [
"cluster-scoped-resources/config.openshift.io/clusteroperators/*.yaml",
"*/cluster-scoped-resources/config.openshift.io/clusteroperators/*.yaml",
]
clusteroperators = []
# Find and parse all clusteroperator files
for pattern in possible_patterns:
for co_file in base_path.glob(pattern):
operator = parse_clusteroperator(co_file)
if operator:
clusteroperators.append(operator)
if not clusteroperators:
print("No resources found.", file=sys.stderr)
return 1
# Remove duplicates (same operator from different glob patterns)
seen = set()
unique_operators = []
for op in clusteroperators:
name = op.get('metadata', {}).get('name')
if name and name not in seen:
seen.add(name)
unique_operators.append(op)
# Format and sort operators by name
formatted_ops = [format_operator_row(op) for op in unique_operators]
formatted_ops.sort(key=lambda x: x['name'])
# Print results
print_operators_table(formatted_ops)
return 0
def main():
if len(sys.argv) < 2:
print("Usage: analyze_clusteroperators.py <must-gather-directory>", file=sys.stderr)
print("\nExample:", file=sys.stderr)
print(" analyze_clusteroperators.py ./must-gather.local.123456789", file=sys.stderr)
return 1
must_gather_path = sys.argv[1]
if not os.path.isdir(must_gather_path):
print(f"Error: Directory not found: {must_gather_path}", file=sys.stderr)
return 1
return analyze_clusteroperators(must_gather_path)
if __name__ == '__main__':
sys.exit(main())

View File

@@ -0,0 +1,261 @@
#!/usr/bin/env python3
"""
Analyze ClusterVersion from must-gather data.
Displays output similar to 'oc get clusterversion' command.
"""
import sys
import os
import yaml
from pathlib import Path
from datetime import datetime
from typing import Dict, Any, Optional
def parse_clusterversion(file_path: Path) -> Optional[Dict[str, Any]]:
"""Parse the clusterversion YAML file."""
try:
with open(file_path, 'r') as f:
doc = yaml.safe_load(f)
if doc and doc.get('kind') == 'ClusterVersion':
return doc
except Exception as e:
print(f"Warning: Failed to parse {file_path}: {e}", file=sys.stderr)
return None
def get_condition_status(conditions: list, condition_type: str) -> str:
"""Get status for a specific condition type."""
for condition in conditions:
if condition.get('type') == condition_type:
return condition.get('status', 'Unknown')
return 'Unknown'
def calculate_duration(timestamp_str: str) -> str:
"""Calculate duration from timestamp to now."""
try:
ts = datetime.fromisoformat(timestamp_str.replace('Z', '+00:00'))
now = datetime.now(ts.tzinfo)
delta = now - ts
days = delta.days
hours = delta.seconds // 3600
minutes = (delta.seconds % 3600) // 60
if days > 0:
return f"{days}d"
elif hours > 0:
return f"{hours}h"
elif minutes > 0:
return f"{minutes}m"
else:
return "<1m"
except Exception:
return ""
def format_clusterversion(cv: Dict[str, Any]) -> Dict[str, str]:
"""Format ClusterVersion for display."""
name = cv.get('metadata', {}).get('name', 'version')
status = cv.get('status', {})
# Get version from desired
desired = status.get('desired', {})
version = desired.get('version', '')
# Get available updates count
available_updates = status.get('availableUpdates')
if available_updates and isinstance(available_updates, list):
available = str(len(available_updates))
elif available_updates is None:
available = ''
else:
available = '0'
# Get conditions
conditions = status.get('conditions', [])
progressing = get_condition_status(conditions, 'Progressing')
since = ''
# Get time since progressing started (if true) or since last update
for condition in conditions:
if condition.get('type') == 'Progressing':
last_transition = condition.get('lastTransitionTime')
if last_transition:
since = calculate_duration(last_transition)
break
# Get status message
status_msg = ''
for condition in conditions:
if condition.get('type') == 'Progressing' and condition.get('status') == 'True':
status_msg = condition.get('message', '')[:80]
break
# If not progressing, check if failed
if progressing != 'True':
for condition in conditions:
if condition.get('type') == 'Failing' and condition.get('status') == 'True':
status_msg = condition.get('message', '')[:80]
break
return {
'name': name,
'version': version,
'available': available,
'progressing': progressing,
'since': since,
'status': status_msg
}
def print_clusterversion_table(cv_info: Dict[str, str]):
"""Print ClusterVersion in a formatted table like 'oc get clusterversion'."""
# Print header
print(f"{'NAME':<10} {'VERSION':<50} {'AVAILABLE':<11} {'PROGRESSING':<13} {'SINCE':<7} STATUS")
# Print row
name = cv_info['name'][:10]
version = cv_info['version'][:50]
available = cv_info['available'][:11]
progressing = cv_info['progressing'][:13]
since = cv_info['since'][:7]
status = cv_info['status']
print(f"{name:<10} {version:<50} {available:<11} {progressing:<13} {since:<7} {status}")
def print_detailed_info(cv: Dict[str, Any]):
"""Print detailed cluster version information."""
status = cv.get('status', {})
spec = cv.get('spec', {})
print(f"\n{'='*80}")
print("CLUSTER VERSION DETAILS")
print(f"{'='*80}")
# Cluster ID
cluster_id = spec.get('clusterID', 'unknown')
print(f"Cluster ID: {cluster_id}")
# Desired version
desired = status.get('desired', {})
print(f"Desired Version: {desired.get('version', 'unknown')}")
print(f"Desired Image: {desired.get('image', 'unknown')}")
# Version hash
version_hash = status.get('versionHash', '')
if version_hash:
print(f"Version Hash: {version_hash}")
# Upstream
upstream = spec.get('upstream', '')
if upstream:
print(f"Update Server: {upstream}")
# Conditions
conditions = status.get('conditions', [])
print(f"\nCONDITIONS:")
for condition in conditions:
cond_type = condition.get('type', 'Unknown')
cond_status = condition.get('status', 'Unknown')
last_transition = condition.get('lastTransitionTime', '')
message = condition.get('message', '')
# Calculate time since transition
age = calculate_duration(last_transition) if last_transition else ''
        status_indicator = "✅" if cond_status == "True" else "❌" if cond_status == "False" else "⚠️"
print(f" {status_indicator} {cond_type}: {cond_status} (for {age})")
if message and cond_status == 'True':
print(f" Message: {message[:100]}")
# Update history
history = status.get('history', [])
if history:
print(f"\nUPDATE HISTORY (last 5):")
for i, entry in enumerate(history[:5]):
state = entry.get('state', 'Unknown')
version = entry.get('version', 'unknown')
image = entry.get('image', '')
completion_time = entry.get('completionTime', '')
age = calculate_duration(completion_time) if completion_time else ''
print(f" {i+1}. {version} - {state} {f'({age} ago)' if age else ''}")
# Available updates
available_updates = status.get('availableUpdates')
if available_updates and isinstance(available_updates, list) and len(available_updates) > 0:
print(f"\nAVAILABLE UPDATES ({len(available_updates)}):")
for i, update in enumerate(available_updates[:5]):
version = update.get('version', 'unknown')
image = update.get('image', '')
print(f" {i+1}. {version}")
elif available_updates is None:
print(f"\nAVAILABLE UPDATES: Unable to retrieve updates")
# Capabilities
capabilities = status.get('capabilities', {})
enabled_caps = capabilities.get('enabledCapabilities', [])
if enabled_caps:
print(f"\nENABLED CAPABILITIES ({len(enabled_caps)}):")
# Print in columns
for i in range(0, len(enabled_caps), 3):
caps = enabled_caps[i:i+3]
print(f" {', '.join(caps)}")
print(f"{'='*80}\n")
def analyze_clusterversion(must_gather_path: str):
"""Analyze ClusterVersion in a must-gather directory."""
base_path = Path(must_gather_path)
# Find ClusterVersion file
possible_patterns = [
"cluster-scoped-resources/config.openshift.io/clusterversions/version.yaml",
"*/cluster-scoped-resources/config.openshift.io/clusterversions/version.yaml",
]
cv = None
for pattern in possible_patterns:
for cv_file in base_path.glob(pattern):
cv = parse_clusterversion(cv_file)
if cv:
break
if cv:
break
if not cv:
print("No ClusterVersion found.")
return 1
# Format and print table
cv_info = format_clusterversion(cv)
print_clusterversion_table(cv_info)
# Print detailed information
print_detailed_info(cv)
return 0
def main():
if len(sys.argv) < 2:
print("Usage: analyze_clusterversion.py <must-gather-directory>", file=sys.stderr)
print("\nExample:", file=sys.stderr)
print(" analyze_clusterversion.py ./must-gather.local.123456789", file=sys.stderr)
return 1
must_gather_path = sys.argv[1]
if not os.path.isdir(must_gather_path):
print(f"Error: Directory not found: {must_gather_path}", file=sys.stderr)
return 1
return analyze_clusterversion(must_gather_path)
if __name__ == '__main__':
sys.exit(main())

View File

@@ -0,0 +1,206 @@
#!/usr/bin/env python3
"""
Analyze etcd information from must-gather data.
Shows etcd cluster health, member status, and diagnostics.
"""
import sys
import os
import json
from pathlib import Path
from typing import Dict, Any, List, Optional
def parse_etcd_info(must_gather_path: Path) -> Dict[str, Any]:
"""Parse etcd_info directory for cluster health information."""
etcd_data = {
'member_health': [],
'member_list': [],
'endpoint_health': [],
'endpoint_status': []
}
# Find etcd_info directory
etcd_dirs = list(must_gather_path.glob("etcd_info")) + \
list(must_gather_path.glob("*/etcd_info"))
if not etcd_dirs:
return etcd_data
etcd_info_dir = etcd_dirs[0]
    # Parse endpoint health (etcd reports member health via endpoint_health.json)
    endpoint_health_file = etcd_info_dir / "endpoint_health.json"
    if endpoint_health_file.exists():
        try:
            with open(endpoint_health_file, 'r') as f:
                data = json.load(f)
            health = data if isinstance(data, list) else [data]
            etcd_data['member_health'] = health
            etcd_data['endpoint_health'] = health
        except Exception as e:
            print(f"Warning: Failed to parse endpoint_health.json: {e}", file=sys.stderr)
    # Parse member list
    member_list_file = etcd_info_dir / "member_list.json"
    if member_list_file.exists():
        try:
            with open(member_list_file, 'r') as f:
                data = json.load(f)
            if isinstance(data, dict) and 'members' in data:
                etcd_data['member_list'] = data['members']
            elif isinstance(data, list):
                etcd_data['member_list'] = data
        except Exception as e:
            print(f"Warning: Failed to parse member_list.json: {e}", file=sys.stderr)
# Parse endpoint status
endpoint_status_file = etcd_info_dir / "endpoint_status.json"
if endpoint_status_file.exists():
try:
with open(endpoint_status_file, 'r') as f:
data = json.load(f)
etcd_data['endpoint_status'] = data if isinstance(data, list) else [data]
except Exception as e:
print(f"Warning: Failed to parse endpoint_status.json: {e}", file=sys.stderr)
return etcd_data
def print_member_health(members: List[Dict[str, Any]]):
"""Print etcd member health status."""
if not members:
print("No member health data found.")
return
print("ETCD MEMBER HEALTH")
print(f"{'ENDPOINT':<60} {'HEALTH':<10} {'TOOK':<10} ERROR")
for member in members:
endpoint = member.get('endpoint', 'unknown')[:60]
health = 'true' if member.get('health') else 'false'
took = member.get('took', '')
error = member.get('error', '')
print(f"{endpoint:<60} {health:<10} {took:<10} {error}")
def print_member_list(members: List[Dict[str, Any]]):
"""Print etcd member list."""
if not members:
print("\nNo member list data found.")
return
print("\nETCD MEMBER LIST")
print(f"{'ID':<20} {'NAME':<40} {'PEER URLS':<60} {'CLIENT URLS':<60}")
for member in members:
member_id = str(member.get('ID', member.get('id', 'unknown')))[:20]
name = member.get('name', 'unknown')[:40]
peer_urls = ','.join(member.get('peerURLs', []))[:60]
client_urls = ','.join(member.get('clientURLs', []))[:60]
print(f"{member_id:<20} {name:<40} {peer_urls:<60} {client_urls:<60}")
def print_endpoint_status(endpoints: List[Dict[str, Any]]):
"""Print etcd endpoint status."""
if not endpoints:
print("\nNo endpoint status data found.")
return
print("\nETCD ENDPOINT STATUS")
print(f"{'ENDPOINT':<60} {'LEADER':<20} {'VERSION':<10} {'DB SIZE':<10} {'IS LEARNER'}")
for endpoint in endpoints:
ep = endpoint.get('Endpoint', 'unknown')[:60]
status = endpoint.get('Status', {})
leader = str(status.get('leader', 'unknown'))[:20]
version = status.get('version', 'unknown')[:10]
db_size = status.get('dbSize', 0)
db_size_mb = f"{db_size / (1024*1024):.1f}MB" if db_size else '0MB'
is_learner = 'true' if status.get('isLearner') else 'false'
print(f"{ep:<60} {leader:<20} {version:<10} {db_size_mb:<10} {is_learner}")
def print_summary(etcd_data: Dict[str, Any]):
"""Print summary of etcd cluster health."""
member_health = etcd_data.get('member_health', [])
member_list = etcd_data.get('member_list', [])
total_members = len(member_list)
healthy_members = sum(1 for m in member_health if m.get('health'))
print(f"\n{'='*80}")
print(f"ETCD CLUSTER SUMMARY")
print(f"{'='*80}")
print(f"Total Members: {total_members}")
print(f"Healthy Members: {healthy_members}/{len(member_health) if member_health else total_members}")
if healthy_members < total_members:
print(f" ⚠️ Warning: Not all members are healthy!")
elif healthy_members == total_members and total_members > 0:
print(f" ✅ All members healthy")
# Check for quorum
if total_members >= 3:
quorum = (total_members // 2) + 1
        if healthy_members >= quorum:
            print(f"  ✅ Quorum achieved ({healthy_members} healthy, {quorum} required)")
        else:
            print(f"  ❌ Quorum lost! ({healthy_members} healthy, {quorum} required)")
print(f"{'='*80}\n")
def analyze_etcd(must_gather_path: str):
"""Analyze etcd information in a must-gather directory."""
base_path = Path(must_gather_path)
etcd_data = parse_etcd_info(base_path)
if not any(etcd_data.values()):
print("No etcd_info data found in must-gather.")
print("Expected location: etcd_info/ directory")
return 1
# Print summary first
print_summary(etcd_data)
# Print detailed information
print_member_health(etcd_data.get('member_health', []))
print_member_list(etcd_data.get('member_list', []))
print_endpoint_status(etcd_data.get('endpoint_status', []))
return 0
def main():
if len(sys.argv) < 2:
print("Usage: analyze_etcd.py <must-gather-directory>", file=sys.stderr)
print("\nExample:", file=sys.stderr)
print(" analyze_etcd.py ./must-gather.local.123456789", file=sys.stderr)
return 1
must_gather_path = sys.argv[1]
if not os.path.isdir(must_gather_path):
print(f"Error: Directory not found: {must_gather_path}", file=sys.stderr)
return 1
return analyze_etcd(must_gather_path)
if __name__ == '__main__':
sys.exit(main())

View File

@@ -0,0 +1,201 @@
#!/usr/bin/env python3
"""
Analyze Events from must-gather data.
Shows warning and error events sorted by last occurrence.
"""
import sys
import os
import yaml
import argparse
from pathlib import Path
from datetime import datetime
from typing import List, Dict, Any, Optional
def parse_events_file(file_path: Path) -> List[Dict[str, Any]]:
"""Parse events YAML file which may contain multiple events."""
events = []
try:
with open(file_path, 'r') as f:
docs = yaml.safe_load_all(f)
for doc in docs:
if doc and doc.get('kind') == 'Event':
events.append(doc)
elif doc and doc.get('kind') == 'EventList':
# Handle EventList
events.extend(doc.get('items', []))
except Exception as e:
print(f"Warning: Failed to parse {file_path}: {e}", file=sys.stderr)
return events
def calculate_age(timestamp_str: str) -> str:
"""Calculate age from timestamp."""
try:
ts = datetime.fromisoformat(timestamp_str.replace('Z', '+00:00'))
now = datetime.now(ts.tzinfo)
delta = now - ts
days = delta.days
hours = delta.seconds // 3600
minutes = (delta.seconds % 3600) // 60
if days > 0:
return f"{days}d"
elif hours > 0:
return f"{hours}h"
elif minutes > 0:
return f"{minutes}m"
else:
return "<1m"
except Exception:
return ""
def format_event(event: Dict[str, Any]) -> Dict[str, Any]:
"""Format an event for display."""
metadata = event.get('metadata', {})
namespace = metadata.get('namespace', '')
name = metadata.get('name', 'unknown')
# Get last timestamp
last_timestamp = event.get('lastTimestamp') or event.get('eventTime') or metadata.get('creationTimestamp', '')
age = calculate_age(last_timestamp) if last_timestamp else ''
# Event details
event_type = event.get('type', 'Normal')
reason = event.get('reason', '')
message = event.get('message', '')
count = event.get('count', 1)
# Involved object
involved = event.get('involvedObject', {})
obj_kind = involved.get('kind', '')
obj_name = involved.get('name', '')
return {
'namespace': namespace,
'last_seen': age,
'type': event_type,
'reason': reason,
'object_kind': obj_kind,
'object_name': obj_name,
'message': message,
'count': count,
'timestamp': last_timestamp
}
def print_events_table(events: List[Dict[str, Any]]):
"""Print events in a table format."""
if not events:
print("No resources found.")
return
# Print header
print(f"{'NAMESPACE':<30} {'LAST SEEN':<10} {'TYPE':<10} {'REASON':<30} {'OBJECT':<40} {'MESSAGE':<60}")
# Print rows
for event in events:
namespace = event['namespace'][:30] if event['namespace'] else '<cluster>'
last_seen = event['last_seen'][:10]
event_type = event['type'][:10]
reason = event['reason'][:30]
obj = f"{event['object_kind']}/{event['object_name']}"[:40]
message = event['message'][:60]
print(f"{namespace:<30} {last_seen:<10} {event_type:<10} {reason:<30} {obj:<40} {message:<60}")
def analyze_events(must_gather_path: str, namespace: Optional[str] = None,
event_type: Optional[str] = None, show_count: int = 100):
"""Analyze events in a must-gather directory."""
base_path = Path(must_gather_path)
all_events = []
# Find all events files
if namespace:
patterns = [
f"namespaces/{namespace}/core/events.yaml",
f"*/namespaces/{namespace}/core/events.yaml",
]
else:
patterns = [
"namespaces/*/core/events.yaml",
"*/namespaces/*/core/events.yaml",
]
for pattern in patterns:
for events_file in base_path.glob(pattern):
events = parse_events_file(events_file)
all_events.extend(events)
if not all_events:
print("No resources found.")
return 1
# Format events
formatted_events = [format_event(e) for e in all_events]
# Filter by type if specified
if event_type:
formatted_events = [e for e in formatted_events if e['type'].lower() == event_type.lower()]
# Sort by timestamp (most recent first)
formatted_events.sort(key=lambda x: x['timestamp'], reverse=True)
# Limit count
if show_count and show_count > 0:
formatted_events = formatted_events[:show_count]
# Print results
print_events_table(formatted_events)
# Summary
total = len(formatted_events)
warnings = sum(1 for e in formatted_events if e['type'] == 'Warning')
normal = sum(1 for e in formatted_events if e['type'] == 'Normal')
print(f"\nShowing {total} most recent events")
if warnings > 0:
print(f" ⚠️ {warnings} Warning events")
if normal > 0:
print(f" {normal} Normal events")
return 0
def main():
parser = argparse.ArgumentParser(
description='Analyze events from must-gather data',
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
%(prog)s ./must-gather
%(prog)s ./must-gather --namespace openshift-etcd
%(prog)s ./must-gather --type Warning
%(prog)s ./must-gather --count 50
"""
)
parser.add_argument('must_gather_path', help='Path to must-gather directory')
parser.add_argument('-n', '--namespace', help='Filter by namespace')
parser.add_argument('-t', '--type', help='Filter by event type (Warning, Normal)')
parser.add_argument('-c', '--count', type=int, default=100,
help='Number of events to show (default: 100)')
args = parser.parse_args()
if not os.path.isdir(args.must_gather_path):
print(f"Error: Directory not found: {args.must_gather_path}", file=sys.stderr)
return 1
return analyze_events(args.must_gather_path, args.namespace, args.type, args.count)
if __name__ == '__main__':
sys.exit(main())

View File

@@ -0,0 +1,281 @@
#!/usr/bin/env python3
"""
Analyze Network resources and diagnostics from must-gather data.
Shows network operator status, OVN pods, and connectivity checks.
"""
import sys
import os
import yaml
from pathlib import Path
from typing import List, Dict, Any, Optional
def parse_yaml_file(file_path: Path) -> Optional[Dict[str, Any]]:
"""Parse a YAML file."""
try:
with open(file_path, 'r') as f:
doc = yaml.safe_load(f)
return doc
except Exception as e:
print(f"Warning: Failed to parse {file_path}: {e}", file=sys.stderr)
return None
def get_network_type(must_gather_path: Path) -> str:
"""Determine the network type from cluster network config."""
# First try to find networks.yaml (List object)
patterns = [
"cluster-scoped-resources/config.openshift.io/networks.yaml",
"*/cluster-scoped-resources/config.openshift.io/networks.yaml",
]
for pattern in patterns:
for network_file in must_gather_path.glob(pattern):
network_list = parse_yaml_file(network_file)
if network_list:
# Handle NetworkList object
items = network_list.get('items', [])
if items:
# Get the first network item
network = items[0]
spec = network.get('spec', {})
network_type = spec.get('networkType', 'Unknown')
if network_type != 'Unknown':
return network_type
# Fallback: try individual network config files
    patterns = [
        "cluster-scoped-resources/config.openshift.io/*.yaml",
        "*/cluster-scoped-resources/config.openshift.io/*.yaml",
    ]
    for pattern in patterns:
        for network_file in must_gather_path.glob(pattern):
            if network_file.name == 'networks.yaml':
                continue
network = parse_yaml_file(network_file)
if network:
spec = network.get('spec', {})
network_type = spec.get('networkType', 'Unknown')
if network_type != 'Unknown':
return network_type
return 'Unknown'
def analyze_network_operator(must_gather_path: Path) -> Optional[Dict[str, Any]]:
"""Analyze network operator status."""
patterns = [
"cluster-scoped-resources/config.openshift.io/clusteroperators/network.yaml",
"*/cluster-scoped-resources/config.openshift.io/clusteroperators/network.yaml",
]
for pattern in patterns:
for op_file in must_gather_path.glob(pattern):
operator = parse_yaml_file(op_file)
if operator:
conditions = operator.get('status', {}).get('conditions', [])
result = {}
for cond in conditions:
cond_type = cond.get('type')
if cond_type in ['Available', 'Progressing', 'Degraded']:
result[cond_type] = cond.get('status', 'Unknown')
result[f'{cond_type}_message'] = cond.get('message', '')
return result
return None
def analyze_ovn_pods(must_gather_path: Path) -> List[Dict[str, str]]:
"""Analyze OVN-Kubernetes pods."""
pods = []
patterns = [
"namespaces/openshift-ovn-kubernetes/pods/*/*.yaml",
"*/namespaces/openshift-ovn-kubernetes/pods/*/*.yaml",
]
for pattern in patterns:
for pod_file in must_gather_path.glob(pattern):
if pod_file.name == 'pods.yaml':
continue
pod = parse_yaml_file(pod_file)
if pod:
name = pod.get('metadata', {}).get('name', 'unknown')
status = pod.get('status', {})
phase = status.get('phase', 'Unknown')
container_statuses = status.get('containerStatuses', [])
total = len(pod.get('spec', {}).get('containers', []))
ready = sum(1 for cs in container_statuses if cs.get('ready', False))
pods.append({
'name': name,
'ready': f"{ready}/{total}",
'status': phase
})
# Remove duplicates
seen = set()
unique_pods = []
for p in pods:
if p['name'] not in seen:
seen.add(p['name'])
unique_pods.append(p)
return sorted(unique_pods, key=lambda x: x['name'])
def analyze_connectivity_checks(must_gather_path: Path) -> Dict[str, Any]:
"""Analyze PodNetworkConnectivityCheck resources."""
# First try to find podnetworkconnectivitychecks.yaml (List object)
patterns = [
"pod_network_connectivity_check/podnetworkconnectivitychecks.yaml",
"*/pod_network_connectivity_check/podnetworkconnectivitychecks.yaml",
]
total_checks = 0
failed_checks = []
for pattern in patterns:
for check_file in must_gather_path.glob(pattern):
check_list = parse_yaml_file(check_file)
if check_list:
items = check_list.get('items', [])
for check in items:
total_checks += 1
name = check.get('metadata', {}).get('name', 'unknown')
status = check.get('status', {})
conditions = status.get('conditions', [])
for cond in conditions:
if cond.get('type') == 'Reachable' and cond.get('status') == 'False':
failed_checks.append({
'name': name,
'message': cond.get('message', 'Unknown')
})
# If we found the list file, no need to continue
if total_checks > 0:
return {
'total': total_checks,
'failed': failed_checks
}
# Fallback: try individual check files
    patterns = [
        "pod_network_connectivity_check/*.yaml",
        "*/pod_network_connectivity_check/*.yaml",
    ]
for pattern in patterns:
for check_file in must_gather_path.glob(pattern):
if check_file.name == 'podnetworkconnectivitychecks.yaml':
continue
check = parse_yaml_file(check_file)
if check:
total_checks += 1
name = check.get('metadata', {}).get('name', 'unknown')
status = check.get('status', {})
conditions = status.get('conditions', [])
for cond in conditions:
if cond.get('type') == 'Reachable' and cond.get('status') == 'False':
failed_checks.append({
'name': name,
'message': cond.get('message', 'Unknown')
})
return {
'total': total_checks,
'failed': failed_checks
}
def print_network_summary(network_type: str, operator_status: Optional[Dict],
ovn_pods: List[Dict], connectivity: Dict):
"""Print network analysis summary."""
print(f"{'NETWORK TYPE':<30} {network_type}")
print()
if operator_status:
print("NETWORK OPERATOR STATUS")
print(f"{'Available':<15} {operator_status.get('Available', 'Unknown')}")
print(f"{'Progressing':<15} {operator_status.get('Progressing', 'Unknown')}")
print(f"{'Degraded':<15} {operator_status.get('Degraded', 'Unknown')}")
if operator_status.get('Degraded') == 'True':
msg = operator_status.get('Degraded_message', '')
if msg:
print(f" Message: {msg}")
print()
if ovn_pods and network_type == 'OVNKubernetes':
print("OVN-KUBERNETES PODS")
print(f"{'NAME':<60} {'READY':<10} STATUS")
for pod in ovn_pods:
name = pod['name'][:60]
ready = pod['ready'][:10]
status = pod['status']
print(f"{name:<60} {ready:<10} {status}")
print()
if connectivity['total'] > 0:
print(f"NETWORK CONNECTIVITY CHECKS: {connectivity['total']} total")
if connectivity['failed']:
print(f" Failed: {len(connectivity['failed'])}")
for failed in connectivity['failed'][:10]: # Show first 10
print(f" - {failed['name']}")
if failed['message']:
print(f" {failed['message'][:100]}")
else:
print(" All checks passing")
print()
def analyze_network(must_gather_path: str):
"""Analyze network resources in a must-gather directory."""
base_path = Path(must_gather_path)
# Get network type
network_type = get_network_type(base_path)
# Get network operator status
operator_status = analyze_network_operator(base_path)
# Get OVN pods if applicable
ovn_pods = []
if network_type == 'OVNKubernetes':
ovn_pods = analyze_ovn_pods(base_path)
# Get connectivity checks
connectivity = analyze_connectivity_checks(base_path)
# Print summary
print_network_summary(network_type, operator_status, ovn_pods, connectivity)
return 0
def main():
if len(sys.argv) < 2:
print("Usage: analyze_network.py <must-gather-directory>", file=sys.stderr)
print("\nExample:", file=sys.stderr)
print(" analyze_network.py ./must-gather.local.123456789", file=sys.stderr)
return 1
must_gather_path = sys.argv[1]
if not os.path.isdir(must_gather_path):
print(f"Error: Directory not found: {must_gather_path}", file=sys.stderr)
return 1
return analyze_network(must_gather_path)
if __name__ == '__main__':
sys.exit(main())

View File

@@ -0,0 +1,224 @@
#!/usr/bin/env python3
"""
Analyze Node resources from must-gather data.
Displays output similar to 'oc get nodes' command.
"""
import sys
import os
import yaml
import argparse
from pathlib import Path
from datetime import datetime
from typing import List, Dict, Any, Optional
def parse_node(file_path: Path) -> Optional[Dict[str, Any]]:
"""Parse a single node YAML file."""
try:
with open(file_path, 'r') as f:
doc = yaml.safe_load(f)
if doc and doc.get('kind') == 'Node':
return doc
except Exception as e:
print(f"Warning: Failed to parse {file_path}: {e}", file=sys.stderr)
return None
def calculate_age(creation_timestamp: str) -> str:
"""Calculate age from creation timestamp."""
try:
ts = datetime.fromisoformat(creation_timestamp.replace('Z', '+00:00'))
now = datetime.now(ts.tzinfo)
delta = now - ts
days = delta.days
hours = delta.seconds // 3600
if days > 0:
return f"{days}d"
elif hours > 0:
return f"{hours}h"
else:
return "<1h"
except Exception:
return ""
def get_node_roles(labels: Dict[str, str]) -> str:
"""Extract node roles from labels."""
roles = []
for key in labels:
if key.startswith('node-role.kubernetes.io/'):
role = key.split('/')[-1]
if role:
roles.append(role)
return ','.join(sorted(roles)) if roles else '<none>'
def get_node_status(node: Dict[str, Any]) -> Dict[str, Any]:
"""Extract node status information."""
metadata = node.get('metadata', {})
status = node.get('status', {})
name = metadata.get('name', 'unknown')
labels = metadata.get('labels', {})
creation_time = metadata.get('creationTimestamp', '')
# Get roles
roles = get_node_roles(labels)
# Get conditions
conditions = status.get('conditions', [])
ready_condition = 'Unknown'
node_issues = []
for condition in conditions:
cond_type = condition.get('type', '')
cond_status = condition.get('status', 'Unknown')
if cond_type == 'Ready':
ready_condition = cond_status
elif cond_status == 'True' and cond_type in ['MemoryPressure', 'DiskPressure', 'PIDPressure', 'NetworkUnavailable']:
node_issues.append(cond_type)
# Determine overall status
if ready_condition == 'True':
node_status = 'Ready'
elif ready_condition == 'False':
node_status = 'NotReady'
else:
node_status = 'Unknown'
# Add issues to status
if node_issues:
node_status = f"{node_status},{','.join(node_issues)}"
# Get version
node_info = status.get('nodeInfo', {})
version = node_info.get('kubeletVersion', '')
# Get age
age = calculate_age(creation_time) if creation_time else ''
# Internal IP
addresses = status.get('addresses', [])
internal_ip = ''
for addr in addresses:
if addr.get('type') == 'InternalIP':
internal_ip = addr.get('address', '')
break
# OS Image
os_image = node_info.get('osImage', '')
return {
'name': name,
'status': node_status,
'roles': roles,
'age': age,
'version': version,
'internal_ip': internal_ip,
'os_image': os_image,
'is_problem': node_status != 'Ready' or len(node_issues) > 0
}
def print_nodes_table(nodes: List[Dict[str, Any]]):
"""Print nodes in a formatted table like 'oc get nodes'."""
if not nodes:
print("No resources found.")
return
# Print header
print(f"{'NAME':<50} {'STATUS':<30} {'ROLES':<20} {'AGE':<7} VERSION")
# Print rows
for node in nodes:
name = node['name'][:50]
status = node['status'][:30]
roles = node['roles'][:20]
age = node['age'][:7]
version = node['version']
print(f"{name:<50} {status:<30} {roles:<20} {age:<7} {version}")
def analyze_nodes(must_gather_path: str, problems_only: bool = False):
"""Analyze all nodes in a must-gather directory."""
base_path = Path(must_gather_path)
# Find all node YAML files
possible_patterns = [
"cluster-scoped-resources/core/nodes/*.yaml",
"*/cluster-scoped-resources/core/nodes/*.yaml",
]
nodes = []
for pattern in possible_patterns:
for node_file in base_path.glob(pattern):
# Skip the nodes.yaml file that contains all nodes
if node_file.name == 'nodes.yaml':
continue
node = parse_node(node_file)
if node:
node_status = get_node_status(node)
nodes.append(node_status)
if not nodes:
print("No resources found.")
return 1
# Remove duplicates
seen = set()
unique_nodes = []
for n in nodes:
if n['name'] not in seen:
seen.add(n['name'])
unique_nodes.append(n)
# Sort by name
unique_nodes.sort(key=lambda x: x['name'])
# Filter if problems only
if problems_only:
unique_nodes = [n for n in unique_nodes if n['is_problem']]
if not unique_nodes:
print("No resources found.")
return 0
# Print results
print_nodes_table(unique_nodes)
return 0
def main():
parser = argparse.ArgumentParser(
description='Analyze node resources from must-gather data',
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
%(prog)s ./must-gather.local.123456789
%(prog)s ./must-gather.local.123456789 --problems-only
"""
)
parser.add_argument('must_gather_path', help='Path to must-gather directory')
parser.add_argument('-p', '--problems-only', action='store_true',
help='Show only nodes with issues')
args = parser.parse_args()
if not os.path.isdir(args.must_gather_path):
print(f"Error: Directory not found: {args.must_gather_path}", file=sys.stderr)
return 1
return analyze_nodes(args.must_gather_path, args.problems_only)
if __name__ == '__main__':
sys.exit(main())

View File

@@ -0,0 +1,444 @@
#!/usr/bin/env python3
"""
Analyze OVN Northbound and Southbound databases from must-gather.
Uses ovsdb-tool to read binary .db files collected per-node.
Must-gather structure:
network_logs/
└── ovnk_database_store.tar.gz
└── ovnk_database_store/
├── ovnkube-node-{pod}_nbdb (per-zone NBDB)
├── ovnkube-node-{pod}_sbdb (per-zone SBDB)
└── ...
"""
import subprocess
import json
import sys
import os
import tarfile
import yaml
import argparse
from pathlib import Path
from typing import List, Dict, Any, Optional
class OVNDatabase:
"""Wrapper for querying OVSDB files using ovsdb-tool"""
    def __init__(self, db_path: Path, db_type: str, node_name: Optional[str] = None):
self.db_path = db_path
self.db_type = db_type # 'nbdb' or 'sbdb'
self.pod_name = db_path.stem.replace('_nbdb', '').replace('_sbdb', '')
self.node_name = node_name or self.pod_name # Use node name if available
    def query(self, table: str, columns: Optional[List[str]] = None, where: Optional[List] = None) -> List[Dict]:
"""Query OVSDB table using ovsdb-tool query command"""
schema = "OVN_Northbound" if self.db_type == "nbdb" else "OVN_Southbound"
# Build query
query_op = {
"op": "select",
"table": table,
"where": where or []
}
if columns:
query_op["columns"] = columns
query_json = json.dumps([schema, query_op])
try:
result = subprocess.run(
['ovsdb-tool', 'query', str(self.db_path), query_json],
capture_output=True,
text=True,
timeout=30
)
if result.returncode != 0:
print(f"Warning: Query failed for {self.db_path}: {result.stderr}", file=sys.stderr)
return []
data = json.loads(result.stdout)
return data[0].get('rows', [])
except Exception as e:
print(f"Warning: Failed to query {table} from {self.db_path}: {e}", file=sys.stderr)
return []
def build_pod_to_node_mapping(mg_path: Path) -> Dict[str, str]:
"""Build mapping of ovnkube pod names to node names"""
pod_to_node = {}
# Look for ovnkube-node pods in openshift-ovn-kubernetes namespace
ovn_ns_path = mg_path / "namespaces" / "openshift-ovn-kubernetes" / "pods"
if not ovn_ns_path.exists():
print(f"Warning: OVN namespace pods not found at {ovn_ns_path}", file=sys.stderr)
return pod_to_node
# Find all ovnkube-node pod directories
for pod_dir in ovn_ns_path.glob("ovnkube-node-*"):
if not pod_dir.is_dir():
continue
pod_name = pod_dir.name
pod_yaml = pod_dir / f"{pod_name}.yaml"
if not pod_yaml.exists():
continue
try:
with open(pod_yaml, 'r') as f:
pod = yaml.safe_load(f)
node_name = pod.get('spec', {}).get('nodeName')
if node_name:
pod_to_node[pod_name] = node_name
except Exception as e:
print(f"Warning: Failed to parse {pod_yaml}: {e}", file=sys.stderr)
return pod_to_node
def extract_db_tarball(mg_path: Path) -> Optional[Path]:
"""Extract ovnk_database_store.tar.gz if not already extracted"""
network_logs = mg_path / "network_logs"
tarball = network_logs / "ovnk_database_store.tar.gz"
extract_dir = network_logs / "ovnk_database_store"
if not tarball.exists():
print(f"Error: Database tarball not found: {tarball}", file=sys.stderr)
return None
# Extract if directory doesn't exist
if not extract_dir.exists():
print(f"Extracting {tarball}...")
with tarfile.open(tarball, 'r:gz') as tar:
tar.extractall(path=network_logs)
return extract_dir
def get_nb_databases(db_dir: Path, pod_to_node: Dict[str, str]) -> List[OVNDatabase]:
"""Find all NB database files and map them to nodes"""
databases = []
for db in sorted(db_dir.glob("*_nbdb")):
pod_name = db.stem.replace('_nbdb', '')
node_name = pod_to_node.get(pod_name)
databases.append(OVNDatabase(db, 'nbdb', node_name))
return databases
def get_sb_databases(db_dir: Path, pod_to_node: Dict[str, str]) -> List[OVNDatabase]:
"""Find all SB database files and map them to nodes"""
databases = []
for db in sorted(db_dir.glob("*_sbdb")):
pod_name = db.stem.replace('_sbdb', '')
node_name = pod_to_node.get(pod_name)
databases.append(OVNDatabase(db, 'sbdb', node_name))
return databases
def analyze_logical_switches(db: OVNDatabase):
"""Analyze logical switches in the zone"""
switches = db.query("Logical_Switch", columns=["name", "ports", "other_config"])
if not switches:
print(" No logical switches found.")
return
print(f"\n LOGICAL SWITCHES ({len(switches)}):")
print(f" {'NAME':<60} PORTS")
print(f" {'-'*80}")
for sw in switches:
name = sw.get('name', 'unknown')
# ports is a UUID set, just count them
port_count = 0
ports = sw.get('ports', [])
if isinstance(ports, list) and len(ports) == 2 and ports[0] == "set":
port_count = len(ports[1])
print(f" {name:<60} {port_count}")
def analyze_logical_switch_ports(db: OVNDatabase):
"""Analyze logical switch ports, focusing on pods"""
lsps = db.query("Logical_Switch_Port", columns=["name", "external_ids", "addresses"])
# Filter for pod ports (have pod=true in external_ids)
pod_ports = []
for lsp in lsps:
ext_ids = lsp.get('external_ids', [])
if isinstance(ext_ids, list) and len(ext_ids) == 2 and ext_ids[0] == "map":
ext_map = dict(ext_ids[1])
if ext_map.get('pod') == 'true':
# Pod name is in the LSP name (format: namespace_podname)
lsp_name = lsp.get('name', '')
namespace = ext_map.get('namespace', '')
# Extract pod name from LSP name
pod_name = lsp_name
if lsp_name.startswith(namespace + '_'):
pod_name = lsp_name[len(namespace) + 1:]
# Extract IP from addresses (format can be string "MAC IP" or empty)
ip = ""
addrs = lsp.get('addresses', '')
if isinstance(addrs, str) and addrs:
parts = addrs.split()
if len(parts) > 1:
ip = parts[1]
pod_ports.append({
'name': lsp_name,
'namespace': namespace,
'pod_name': pod_name,
'ip': ip
})
if not pod_ports:
print(" No pod logical switch ports found.")
return
print(f"\n POD LOGICAL SWITCH PORTS ({len(pod_ports)}):")
print(f" {'NAMESPACE':<40} {'POD':<45} IP")
print(f" {'-'*120}")
for port in sorted(pod_ports, key=lambda x: (x['namespace'], x['pod_name']))[:20]: # Show first 20
namespace = port['namespace'][:40]
pod_name = port['pod_name'][:45]
ip = port['ip']
print(f" {namespace:<40} {pod_name:<45} {ip}")
if len(pod_ports) > 20:
print(f" ... and {len(pod_ports) - 20} more")
def analyze_acls(db: OVNDatabase):
"""Analyze ACLs in the zone"""
acls = db.query("ACL", columns=["priority", "direction", "match", "action", "severity"])
if not acls:
print(" No ACLs found.")
return
print(f"\n ACCESS CONTROL LISTS ({len(acls)}):")
print(f" {'PRIORITY':<10} {'DIRECTION':<15} {'ACTION':<15} MATCH")
print(f" {'-'*120}")
# Show highest priority ACLs first
sorted_acls = sorted(acls, key=lambda x: x.get('priority', 0), reverse=True)
for acl in sorted_acls[:15]: # Show top 15
priority = acl.get('priority', 0)
direction = acl.get('direction', '')
action = acl.get('action', '')
match = acl.get('match', '')[:70] # Truncate long matches
print(f" {priority:<10} {direction:<15} {action:<15} {match}")
if len(acls) > 15:
print(f" ... and {len(acls) - 15} more")
def analyze_logical_routers(db: OVNDatabase):
"""Analyze logical routers in the zone"""
routers = db.query("Logical_Router", columns=["name", "ports", "static_routes"])
if not routers:
print(" No logical routers found.")
return
print(f"\n LOGICAL ROUTERS ({len(routers)}):")
print(f" {'NAME':<60} PORTS")
print(f" {'-'*80}")
for router in routers:
name = router.get('name', 'unknown')
# Count ports
port_count = 0
ports = router.get('ports', [])
if isinstance(ports, list) and len(ports) == 2 and ports[0] == "set":
port_count = len(ports[1])
print(f" {name:<60} {port_count}")
def analyze_zone_summary(db: OVNDatabase):
"""Print summary for a zone"""
# Get counts - for ACLs we need multiple columns to get accurate count
switches = db.query("Logical_Switch", columns=["name"])
lsps = db.query("Logical_Switch_Port", columns=["name"])
acls = db.query("ACL", columns=["priority", "direction", "match"])
routers = db.query("Logical_Router", columns=["name"])
print(f"\n{'='*80}")
print(f"Node: {db.node_name}")
if db.node_name != db.pod_name:
print(f"Pod: {db.pod_name}")
print(f"{'='*80}")
print(f" Logical Switches: {len(switches)}")
print(f" Logical Switch Ports: {len(lsps)}")
print(f" ACLs: {len(acls)}")
print(f" Logical Routers: {len(routers)}")
def run_raw_query(mg_path: str, node_filter: str, query_json: str):
"""Run a raw JSON query against OVN databases"""
base_path = Path(mg_path)
# Build pod-to-node mapping
pod_to_node = build_pod_to_node_mapping(base_path)
# Extract tarball
db_dir = extract_db_tarball(base_path)
if not db_dir:
return 1
# Get all NB databases
nb_dbs = get_nb_databases(db_dir, pod_to_node)
if not nb_dbs:
print("No Northbound databases found in must-gather.", file=sys.stderr)
return 1
# Filter by node if specified
if node_filter:
filtered_dbs = [db for db in nb_dbs if node_filter in db.node_name]
if not filtered_dbs:
print(f"Error: No databases found for node matching '{node_filter}'", file=sys.stderr)
print(f"\nAvailable nodes:", file=sys.stderr)
for db in nb_dbs:
print(f" - {db.node_name}", file=sys.stderr)
return 1
nb_dbs = filtered_dbs
# Run query on each database
for db in nb_dbs:
print(f"\n{'='*80}")
print(f"Node: {db.node_name}")
if db.node_name != db.pod_name:
print(f"Pod: {db.pod_name}")
print(f"{'='*80}\n")
try:
# Run the raw query using ovsdb-tool
result = subprocess.run(
['ovsdb-tool', 'query', str(db.db_path), query_json],
capture_output=True,
text=True,
timeout=30
)
if result.returncode != 0:
print(f"Error: Query failed: {result.stderr}", file=sys.stderr)
continue
# Pretty print the JSON result
try:
data = json.loads(result.stdout)
print(json.dumps(data, indent=2))
except json.JSONDecodeError:
# If not valid JSON, just print raw output
print(result.stdout)
except Exception as e:
print(f"Error: Failed to execute query: {e}", file=sys.stderr)
return 0
def analyze_northbound_databases(mg_path: str, node_filter: str = None):
"""Analyze all Northbound databases"""
base_path = Path(mg_path)
# Build pod-to-node mapping
pod_to_node = build_pod_to_node_mapping(base_path)
# Extract tarball
db_dir = extract_db_tarball(base_path)
if not db_dir:
return 1
# Get all NB databases
nb_dbs = get_nb_databases(db_dir, pod_to_node)
if not nb_dbs:
print("No Northbound databases found in must-gather.", file=sys.stderr)
return 1
# Filter by node if specified
if node_filter:
filtered_dbs = [db for db in nb_dbs if node_filter in db.node_name]
if not filtered_dbs:
print(f"Error: No databases found for node matching '{node_filter}'", file=sys.stderr)
print(f"\nAvailable nodes:", file=sys.stderr)
for db in nb_dbs:
print(f" - {db.node_name}", file=sys.stderr)
return 1
nb_dbs = filtered_dbs
print(f"\nFound {len(nb_dbs)} node(s)\n")
# Analyze each zone
for db in nb_dbs:
analyze_zone_summary(db)
analyze_logical_switches(db)
analyze_logical_switch_ports(db)
analyze_acls(db)
analyze_logical_routers(db)
print()
return 0
def main():
parser = argparse.ArgumentParser(
description="Analyze OVN databases from must-gather",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
# Analyze all nodes
analyze_ovn_dbs.py ./must-gather.local.123456789
# Analyze specific node
analyze_ovn_dbs.py ./must-gather.local.123456789 --node ip-10-0-26-145
# Run raw OVSDB query (Claude can construct the JSON)
analyze_ovn_dbs.py ./must-gather/ --query '["OVN_Northbound", {"op":"select", "table":"ACL", "where":[["priority", ">", 1000]], "columns":["priority","match","action"]}]'
# Query specific node
analyze_ovn_dbs.py ./must-gather/ --node master-0 --query '["OVN_Northbound", {"op":"select", "table":"Logical_Switch", "where":[], "columns":["name"]}]'
"""
)
parser.add_argument('must_gather_path', help='Path to must-gather directory')
parser.add_argument('--node', '-n', help='Filter by node name (supports partial matches)')
parser.add_argument('--query', '-q', help='Run raw OVSDB JSON query instead of standard analysis')
args = parser.parse_args()
if not os.path.isdir(args.must_gather_path):
print(f"Error: Directory not found: {args.must_gather_path}", file=sys.stderr)
return 1
# Check if ovsdb-tool is available
try:
subprocess.run(['ovsdb-tool', '--version'], capture_output=True, check=True)
except (subprocess.CalledProcessError, FileNotFoundError):
print("Error: ovsdb-tool not found. Please install openvswitch package.", file=sys.stderr)
return 1
# Run query mode or standard analysis
if args.query:
return run_raw_query(args.must_gather_path, args.node, args.query)
else:
return analyze_northbound_databases(args.must_gather_path, args.node)
if __name__ == '__main__':
sys.exit(main())

@@ -0,0 +1,224 @@
#!/usr/bin/env python3
"""
Analyze Pod resources from must-gather data.
Displays output similar to 'oc get pods -A' command.
"""
import sys
import os
import yaml
import argparse
from pathlib import Path
from datetime import datetime
from typing import List, Dict, Any, Optional
def parse_pod(file_path: Path) -> Optional[Dict[str, Any]]:
"""Parse a single pod YAML file."""
try:
with open(file_path, 'r') as f:
doc = yaml.safe_load(f)
if doc and doc.get('kind') == 'Pod':
return doc
except Exception as e:
print(f"Warning: Failed to parse {file_path}: {e}", file=sys.stderr)
return None
def calculate_age(creation_timestamp: str) -> str:
"""Calculate age from creation timestamp."""
try:
ts = datetime.fromisoformat(creation_timestamp.replace('Z', '+00:00'))
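        # Note: ages are computed against the current wall clock, not the time
        # the must-gather was collected, so they grow as the archive sits on disk.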
now = datetime.now(ts.tzinfo)
delta = now - ts
days = delta.days
hours = delta.seconds // 3600
minutes = (delta.seconds % 3600) // 60
if days > 0:
return f"{days}d"
elif hours > 0:
return f"{hours}h"
elif minutes > 0:
return f"{minutes}m"
else:
return "<1m"
except Exception:
return ""
def get_pod_status(pod: Dict[str, Any]) -> Dict[str, Any]:
"""Extract pod status information."""
metadata = pod.get('metadata', {})
status = pod.get('status', {})
spec = pod.get('spec', {})
name = metadata.get('name', 'unknown')
namespace = metadata.get('namespace', 'unknown')
creation_time = metadata.get('creationTimestamp', '')
# Get container statuses
container_statuses = status.get('containerStatuses', [])
init_container_statuses = status.get('initContainerStatuses', [])
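    # Each containerStatuses entry looks roughly like this (abridged, illustrative):
    #   {"name": "etcd", "ready": True, "restartCount": 3,
    #    "state": {"waiting": {"reason": "CrashLoopBackOff"}}}
    # At most one of "waiting", "running", or "terminated" is set per state.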
# Calculate ready containers
total_containers = len(spec.get('containers', []))
ready_containers = sum(1 for cs in container_statuses if cs.get('ready', False))
# Get overall phase
phase = status.get('phase', 'Unknown')
# Determine more specific status
pod_status = phase
    reason = status.get('reason', '')
    if reason:
        # A pod-level status.reason (e.g. "Evicted") is more specific than the phase
        pod_status = reason
# Check for specific container states
for cs in container_statuses:
state = cs.get('state', {})
if 'waiting' in state:
waiting = state['waiting']
pod_status = waiting.get('reason', 'Waiting')
elif 'terminated' in state:
terminated = state['terminated']
if terminated.get('exitCode', 0) != 0:
pod_status = terminated.get('reason', 'Error')
# Check init containers
for ics in init_container_statuses:
state = ics.get('state', {})
if 'waiting' in state:
waiting = state['waiting']
if waiting.get('reason') in ['CrashLoopBackOff', 'ImagePullBackOff', 'ErrImagePull']:
pod_status = f"Init:{waiting.get('reason', 'Waiting')}"
# Calculate total restarts
total_restarts = sum(cs.get('restartCount', 0) for cs in container_statuses)
# Calculate age
age = calculate_age(creation_time) if creation_time else ''
return {
'namespace': namespace,
'name': name,
'ready': f"{ready_containers}/{total_containers}",
'status': pod_status,
'restarts': str(total_restarts),
'age': age,
'node': spec.get('nodeName', ''),
'is_problem': pod_status not in ['Running', 'Succeeded', 'Completed'] or total_restarts > 0
}
def print_pods_table(pods: List[Dict[str, Any]], show_namespace: bool = True):
"""Print pods in a formatted table like 'oc get pods'."""
if not pods:
print("No resources found.")
return
# Print header
if show_namespace:
print(f"{'NAMESPACE':<42} {'NAME':<50} {'READY':<7} {'STATUS':<20} {'RESTARTS':<9} AGE")
else:
print(f"{'NAME':<50} {'READY':<7} {'STATUS':<20} {'RESTARTS':<9} AGE")
# Print rows
for pod in pods:
name = pod['name'][:50]
ready = pod['ready'][:7]
status = pod['status'][:20]
restarts = pod['restarts'][:9]
age = pod['age']
if show_namespace:
namespace = pod['namespace'][:42]
print(f"{namespace:<42} {name:<50} {ready:<7} {status:<20} {restarts:<9} {age}")
else:
print(f"{name:<50} {ready:<7} {status:<20} {restarts:<9} {age}")
def analyze_pods(must_gather_path: str, namespace: Optional[str] = None, problems_only: bool = False):
"""Analyze all pods in a must-gather directory."""
base_path = Path(must_gather_path)
pods = []
# Find all pod YAML files
# Structure: namespaces/<namespace>/pods/<pod-name>/<pod-name>.yaml
if namespace:
# Specific namespace
patterns = [
f"namespaces/{namespace}/pods/*/*.yaml",
f"*/namespaces/{namespace}/pods/*/*.yaml",
]
else:
# All namespaces
patterns = [
"namespaces/*/pods/*/*.yaml",
"*/namespaces/*/pods/*/*.yaml",
]
for pattern in patterns:
for pod_file in base_path.glob(pattern):
pod = parse_pod(pod_file)
if pod:
pod_status = get_pod_status(pod)
pods.append(pod_status)
if not pods:
print("No resources found.")
return 1
    # Remove duplicates (both glob patterns can match the same files when the archive root is passed)
seen = set()
unique_pods = []
for p in pods:
key = f"{p['namespace']}/{p['name']}"
if key not in seen:
seen.add(key)
unique_pods.append(p)
# Sort by namespace, then name
unique_pods.sort(key=lambda x: (x['namespace'], x['name']))
# Filter if problems only
if problems_only:
unique_pods = [p for p in unique_pods if p['is_problem']]
if not unique_pods:
print("No resources found.")
return 0
# Print results
print_pods_table(unique_pods, show_namespace=(namespace is None))
return 0
def main():
parser = argparse.ArgumentParser(
description='Analyze pod resources from must-gather data',
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
%(prog)s ./must-gather.local.123456789
%(prog)s ./must-gather.local.123456789 --namespace openshift-etcd
%(prog)s ./must-gather.local.123456789 --problems-only
"""
)
parser.add_argument('must_gather_path', help='Path to must-gather directory')
parser.add_argument('-n', '--namespace', help='Filter by namespace')
parser.add_argument('-p', '--problems-only', action='store_true',
help='Show only pods with issues')
args = parser.parse_args()
if not os.path.isdir(args.must_gather_path):
print(f"Error: Directory not found: {args.must_gather_path}", file=sys.stderr)
return 1
return analyze_pods(args.must_gather_path, args.namespace, args.problems_only)
if __name__ == '__main__':
sys.exit(main())

@@ -0,0 +1,117 @@
#!/usr/bin/env python3
"""
Analyze Prometheus data from must-gather data.
Shows active alerts from the captured Prometheus rules data.
"""
import sys
import os
import json
import argparse
from pathlib import Path
from typing import List, Dict, Any, Optional
def parse_json_file(file_path: Path) -> Optional[Dict[str, Any]]:
"""Parse a JSON file."""
try:
with open(file_path, 'r', encoding='utf-8') as f:
doc = json.load(f)
return doc
except (FileNotFoundError, json.JSONDecodeError, OSError) as e:
print(f"Error: Failed to parse {file_path}: {e}", file=sys.stderr)
return None
def print_alerts_table(alerts: List[Dict[str, Any]]):
"""Print alerts in a table format."""
if not alerts:
print("No alerts found.")
return
print("ALERTS")
print(f"{'STATE':<10} {'NAMESPACE':<50} {'NAME':<50} {'SEVERITY':<10} {'SINCE':<20} LABELS")
for alert in alerts:
state = alert.get('state', '')
        active_at = alert.get('activeAt', '')
        since = (active_at[:19] + 'Z') if active_at else ''  # activeAt is RFC 3339, always UTC
labels = alert.get('labels', {})
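        # Pop the columns printed explicitly so the trailing LABELS column
        # shows only the remaining, alert-specific labels.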
namespace = labels.pop('namespace', '')[:50]
name = labels.pop('alertname', '')[:50]
severity = labels.pop('severity', '')[:10]
print(f"{state:<10} {namespace:<50} {name:<50} {severity:<10} {since:<20} {labels}")
def analyze_prometheus(must_gather_path: str, namespace: Optional[str] = None):
"""Analyze Prometheus data in a must-gather directory."""
base_path = Path(must_gather_path)
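    # rules.json is expected to be a saved Prometheus /api/v1/rules response,
    # roughly (abridged):
    #   {"status": "success",
    #    "data": {"groups": [{"rules": [
    #      {"type": "alerting", "state": "firing",
    #       "alerts": [{"labels": {...}, "state": "firing", "activeAt": "..."}]}]}]}}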
    # Retrieve active alerts.
    rules_path = base_path / "monitoring" / "prometheus" / "rules.json"
    if not rules_path.is_file():
        # Fall back to the hashed image subdirectory layout, as the other scripts do
        candidates = sorted(base_path.glob("*/monitoring/prometheus/rules.json"))
        if candidates:
            rules_path = candidates[0]
rules = parse_json_file(rules_path)
if rules is None:
return 1
status = rules.get("status", "")
if status != "success":
print(f"{rules_path}: unexpected status {status}", file=sys.stderr)
return 1
if "data" not in rules or "groups" not in rules["data"]:
print(f"Error: Unexpected JSON structure in {rules_path}", file=sys.stderr)
return 1
alerts = []
for group in rules["data"]["groups"]:
for rule in group["rules"]:
            if rule.get("type") == 'alerting' and rule.get("state") != 'inactive':
                for alert in rule.get("alerts", []):
                    if not namespace or alert.get('labels', {}).get('namespace', '') == namespace:
                        alerts.append(alert)
# Sort alerts by namespace, alertname and severity.
    alerts.sort(key=lambda x: (
        x.get('labels', {}).get('namespace', ''),
        x.get('labels', {}).get('alertname', ''),
        x.get('labels', {}).get('severity', ''),
    ))
# Print results
print_alerts_table(alerts)
# Summary
total_alerts = len(alerts)
pending = sum(1 for alert in alerts if alert.get('state') == 'pending')
firing = sum(1 for alert in alerts if alert.get('state') == 'firing')
print(f"\n{'='*80}")
print(f"SUMMARY")
print(f"Active alerts: {total_alerts} total ({pending} pending, {firing} firing)")
print(f"{'='*80}")
return 0
def main():
parser = argparse.ArgumentParser(
description='Analyze Prometheus data from must-gather data',
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
%(prog)s ./must-gather
%(prog)s ./must-gather --namespace openshift-monitoring
"""
)
parser.add_argument('must_gather_path', help='Path to must-gather directory')
parser.add_argument('-n', '--namespace', help='Filter information by namespace')
args = parser.parse_args()
if not os.path.isdir(args.must_gather_path):
print(f"Error: Directory not found: {args.must_gather_path}", file=sys.stderr)
return 1
return analyze_prometheus(args.must_gather_path, args.namespace)
if __name__ == '__main__':
sys.exit(main())

@@ -0,0 +1,235 @@
#!/usr/bin/env python3
"""
Analyze PersistentVolumes and PersistentVolumeClaims from must-gather data.
Shows PV/PVC status, capacity, and binding information.
"""
import sys
import os
import yaml
import argparse
from pathlib import Path
from typing import List, Dict, Any, Optional
def parse_yaml_file(file_path: Path) -> Optional[Dict[str, Any]]:
"""Parse a YAML file."""
try:
with open(file_path, 'r') as f:
doc = yaml.safe_load(f)
return doc
except Exception as e:
print(f"Warning: Failed to parse {file_path}: {e}", file=sys.stderr)
return None
def format_pv(pv: Dict[str, Any]) -> Dict[str, str]:
"""Format a PersistentVolume for display."""
name = pv.get('metadata', {}).get('name', 'unknown')
spec = pv.get('spec', {})
status = pv.get('status', {})
capacity = spec.get('capacity', {}).get('storage', '')
access_modes = ','.join(spec.get('accessModes', []))[:20]
reclaim_policy = spec.get('persistentVolumeReclaimPolicy', '')
pv_status = status.get('phase', 'Unknown')
claim_ref = spec.get('claimRef', {})
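    # claimRef, when present, names the bound PVC, e.g. (illustrative values):
    #   {"namespace": "openshift-monitoring", "name": "prometheus-data-0"}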
claim = ''
if claim_ref:
claim_ns = claim_ref.get('namespace', '')
claim_name = claim_ref.get('name', '')
claim = f"{claim_ns}/{claim_name}" if claim_ns else claim_name
storage_class = spec.get('storageClassName', '')
return {
'name': name,
'capacity': capacity,
'access_modes': access_modes,
'reclaim_policy': reclaim_policy,
'status': pv_status,
'claim': claim,
'storage_class': storage_class
}
def format_pvc(pvc: Dict[str, Any]) -> Dict[str, str]:
"""Format a PersistentVolumeClaim for display."""
metadata = pvc.get('metadata', {})
name = metadata.get('name', 'unknown')
namespace = metadata.get('namespace', 'unknown')
spec = pvc.get('spec', {})
status = pvc.get('status', {})
pvc_status = status.get('phase', 'Unknown')
volume = spec.get('volumeName', '')
capacity = status.get('capacity', {}).get('storage', '')
access_modes = ','.join(status.get('accessModes', []))[:20]
storage_class = spec.get('storageClassName', '')
return {
'namespace': namespace,
'name': name,
'status': pvc_status,
'volume': volume,
'capacity': capacity,
'access_modes': access_modes,
'storage_class': storage_class
}
def print_pvs_table(pvs: List[Dict[str, str]]):
"""Print PVs in a table format."""
if not pvs:
print("No PersistentVolumes found.")
return
print("PERSISTENT VOLUMES")
print(f"{'NAME':<50} {'CAPACITY':<10} {'ACCESS MODES':<20} {'RECLAIM':<10} {'STATUS':<10} {'CLAIM':<40} STORAGECLASS")
for pv in pvs:
name = pv['name'][:50]
capacity = pv['capacity'][:10]
access = pv['access_modes'][:20]
reclaim = pv['reclaim_policy'][:10]
status = pv['status'][:10]
claim = pv['claim'][:40]
sc = pv['storage_class']
print(f"{name:<50} {capacity:<10} {access:<20} {reclaim:<10} {status:<10} {claim:<40} {sc}")
def print_pvcs_table(pvcs: List[Dict[str, str]]):
"""Print PVCs in a table format."""
if not pvcs:
print("\nNo PersistentVolumeClaims found.")
return
print("\nPERSISTENT VOLUME CLAIMS")
print(f"{'NAMESPACE':<30} {'NAME':<40} {'STATUS':<10} {'VOLUME':<50} {'CAPACITY':<10} {'ACCESS MODES':<20} STORAGECLASS")
for pvc in pvcs:
namespace = pvc['namespace'][:30]
name = pvc['name'][:40]
status = pvc['status'][:10]
volume = pvc['volume'][:50]
capacity = pvc['capacity'][:10]
access = pvc['access_modes'][:20]
sc = pvc['storage_class']
print(f"{namespace:<30} {name:<40} {status:<10} {volume:<50} {capacity:<10} {access:<20} {sc}")
def analyze_storage(must_gather_path: str, namespace: Optional[str] = None):
"""Analyze PVs and PVCs in a must-gather directory."""
base_path = Path(must_gather_path)
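    # PV phases are Available, Bound, Released, or Failed; PVC phases are
    # Pending, Bound, or Lost. The summary at the end counts the common ones.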
# Find PVs (cluster-scoped)
pv_patterns = [
"cluster-scoped-resources/core/persistentvolumes/*.yaml",
"*/cluster-scoped-resources/core/persistentvolumes/*.yaml",
]
pvs = []
for pattern in pv_patterns:
for pv_file in base_path.glob(pattern):
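            # Skip an aggregate List dump if one happens to sit in the directory;
            # per-PV files are parsed individually below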
if pv_file.name == 'persistentvolumes.yaml':
continue
pv = parse_yaml_file(pv_file)
if pv and pv.get('kind') == 'PersistentVolume':
pvs.append(format_pv(pv))
# Find PVCs (namespace-scoped)
if namespace:
pvc_patterns = [
f"namespaces/{namespace}/core/persistentvolumeclaims.yaml",
f"*/namespaces/{namespace}/core/persistentvolumeclaims.yaml",
]
else:
pvc_patterns = [
"namespaces/*/core/persistentvolumeclaims.yaml",
"*/namespaces/*/core/persistentvolumeclaims.yaml",
]
pvcs = []
for pattern in pvc_patterns:
for pvc_file in base_path.glob(pattern):
pvc_doc = parse_yaml_file(pvc_file)
if pvc_doc:
if pvc_doc.get('kind') == 'PersistentVolumeClaim':
pvcs.append(format_pvc(pvc_doc))
elif pvc_doc.get('kind') == 'List':
for item in pvc_doc.get('items', []):
if item.get('kind') == 'PersistentVolumeClaim':
pvcs.append(format_pvc(item))
    # Remove duplicates (both glob patterns can match the same files when the archive root is passed)
seen_pvs = set()
unique_pvs = []
for pv in pvs:
if pv['name'] not in seen_pvs:
seen_pvs.add(pv['name'])
unique_pvs.append(pv)
seen_pvcs = set()
unique_pvcs = []
for pvc in pvcs:
key = f"{pvc['namespace']}/{pvc['name']}"
if key not in seen_pvcs:
seen_pvcs.add(key)
unique_pvcs.append(pvc)
# Sort
unique_pvs.sort(key=lambda x: x['name'])
unique_pvcs.sort(key=lambda x: (x['namespace'], x['name']))
# Print results
print_pvs_table(unique_pvs)
print_pvcs_table(unique_pvcs)
# Summary
total_pvs = len(unique_pvs)
bound_pvs = sum(1 for pv in unique_pvs if pv['status'] == 'Bound')
available_pvs = sum(1 for pv in unique_pvs if pv['status'] == 'Available')
total_pvcs = len(unique_pvcs)
bound_pvcs = sum(1 for pvc in unique_pvcs if pvc['status'] == 'Bound')
pending_pvcs = sum(1 for pvc in unique_pvcs if pvc['status'] == 'Pending')
print(f"\n{'='*80}")
print(f"SUMMARY")
print(f"PVs: {total_pvs} total ({bound_pvs} bound, {available_pvs} available)")
print(f"PVCs: {total_pvcs} total ({bound_pvcs} bound, {pending_pvcs} pending)")
if pending_pvcs > 0:
print(f" ⚠️ {pending_pvcs} PVC(s) pending - check storage provisioner")
print(f"{'='*80}")
return 0
def main():
parser = argparse.ArgumentParser(
description='Analyze PVs and PVCs from must-gather data',
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
%(prog)s ./must-gather
%(prog)s ./must-gather --namespace openshift-monitoring
"""
)
parser.add_argument('must_gather_path', help='Path to must-gather directory')
parser.add_argument('-n', '--namespace', help='Filter PVCs by namespace')
args = parser.parse_args()
if not os.path.isdir(args.must_gather_path):
print(f"Error: Directory not found: {args.must_gather_path}", file=sys.stderr)
return 1
return analyze_storage(args.must_gather_path, args.namespace)
if __name__ == '__main__':
sys.exit(main())