---
description: Analyze sosreport archive for system diagnostics and issues
argument-hint: <path-to-sosreport> [--only <areas>] [--skip <areas>]
---
## Name
sosreport:analyze
## Synopsis
```
/sosreport:analyze <path-to-sosreport> [--only <areas>] [--skip <areas>]
```
**Analysis Areas:**
- **`logs`**: Analyze system and application logs (journald, syslog, dmesg, application logs)
  - Identifies errors, warnings, critical messages
  - Detects OOM killer events, kernel panics, segfaults
  - Counts and categorizes errors by severity
  - Provides timeline of critical events
- **`resources`**: Analyze system resource usage (memory, CPU, disk, processes)
  - Memory usage, swap, and pressure indicators
  - CPU information and load averages
  - Disk usage and filesystem capacity
  - Top resource consumers and zombie processes
- **`network`**: Analyze network configuration and connectivity
  - Network interface status and IP addresses
  - Routing table and default gateway
  - Active connections and listening services
  - Firewall rules (firewalld/iptables)
  - DNS configuration and hostname resolution
- **`system-config`**: Analyze system configuration (packages, services, security)
  - OS version and kernel information
  - Installed package versions
  - Systemd service status and failures
  - SELinux/AppArmor configuration and denials
  - Kernel parameters and resource limits
## Description
The `sosreport:analyze` command performs comprehensive analysis of a sosreport archive (from <https://github.com/sosreport/sos>) to identify system issues, configuration problems, and potential causes of failures. It examines system logs, resource usage, network configuration, installed packages, and other diagnostic data collected by sosreport.
By default, all analysis areas are executed. Use `--only` to run specific areas or `--skip` to exclude areas from analysis.
## Arguments
- `$1` (required): Path to the sosreport archive file (`.tar.gz` or `.tar.xz`) or extracted directory
- `--only <areas>` (optional): Comma-separated list of analysis areas to run. Valid areas: `logs`, `resources`, `network`, `system-config`. If not specified, all areas are analyzed.
- `--skip <areas>` (optional): Comma-separated list of analysis areas to skip. Valid areas: `logs`, `resources`, `network`, `system-config`. Cannot be used with `--only`.
## Implementation
The sosreport analysis is organized into several specialized phases, each with detailed implementation guidance in separate skill documents. The command supports selective analysis through optional arguments.
### 1. Parse Arguments and Determine Analysis Scope
1. **Parse command-line arguments**
   - Extract the sosreport path (required first argument)
   - Check for the `--only` flag and parse its comma-separated areas
   - Check for the `--skip` flag and parse its comma-separated areas
   - Validate that `--only` and `--skip` are not used together
2. **Validate analysis areas**
   - Valid areas: `logs`, `resources`, `network`, `system-config`
   - Normalize area names before validating (case-insensitive; accept variations such as `system` for `system-config`)
   - If an invalid area is specified, return an error listing the valid areas
3. **Determine which skills to run**
   - If no flags are specified: run all skills (default comprehensive analysis)
   - If `--only` is specified: run only the specified skills
   - If `--skip` is specified: run all skills except the specified ones
   - Store the list of skills to execute for later phases
4. **Example argument parsing**:
   ```bash
   # Parse: /sosreport:analyze /path/sos.tar.gz --only logs,network
   # Result: Run only logs-analysis and network-analysis skills
   # Parse: /sosreport:analyze /path/sos.tar.gz --skip resources
   # Result: Run logs, network, and system-config (skip resources)
   # Parse: /sosreport:analyze /path/sos.tar.gz
   # Result: Run all skills (comprehensive analysis)
   ```
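
The scope rules above can be sketched in POSIX shell. This is only an illustration, not the command's actual implementation: `resolve_areas` and `is_valid_area` are hypothetical helper names, alias handling (such as `system` for `system-config`) is left out, and the non-zero return for bad flags is an assumption, since the command defines exit codes only for path and structure errors.

```bash
# Sketch only: resolve_areas and is_valid_area are hypothetical helper names.
VALID_AREAS="logs resources network system-config"

is_valid_area() {
    for known in $VALID_AREAS; do
        [ "$known" = "$1" ] && return 0
    done
    return 1
}

# Prints the areas to run, one per line, in the order of VALID_AREAS.
resolve_areas() {
    only=""
    skip=""
    while [ $# -gt 0 ]; do
        case "$1" in
            --only|--skip)
                [ $# -ge 2 ] || { echo "error: $1 requires a value" >&2; return 1; }
                if [ "$1" = "--only" ]; then only=$2; else skip=$2; fi
                shift 2 ;;
            *)
                echo "error: unknown option: $1" >&2; return 1 ;;
        esac
    done
    if [ -n "$only" ] && [ -n "$skip" ]; then
        echo "error: --only and --skip cannot be used together" >&2
        return 1
    fi
    # Lowercase the list and turn commas into spaces for word splitting.
    requested=$(printf '%s' "$only$skip" | tr 'A-Z,' 'a-z ')
    for area in $requested; do
        is_valid_area "$area" || {
            echo "error: invalid area '$area' (valid: $VALID_AREAS)" >&2
            return 1
        }
    done
    for area in $VALID_AREAS; do
        case " $requested " in
            *" $area "*) selected=yes ;;
            *)           selected=no ;;
        esac
        # No flags: keep everything. --only: keep listed. --skip: keep unlisted.
        if [ -z "$only$skip" ] ||
           { [ -n "$only" ] && [ "$selected" = yes ]; } ||
           { [ -n "$skip" ] && [ "$selected" = no ]; }; then
            echo "$area"
        fi
    done
    return 0
}
```

Calling `resolve_areas --skip resources` prints `logs`, `network`, and `system-config`, one per line. The sosreport path is assumed to have been removed from the argument list before the call.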
### 2. Extract and Validate Sosreport
1. **Check if path exists**
   - Verify the provided path points to a valid file or directory
   - If the path doesn't exist, return an error with a helpful message
2. **Extract archive if needed**
   - If path is a `.tar.gz` or `.tar.xz` file:
     - Create extraction directory: `.work/sosreport-analyze/{timestamp}/`
     - Extract archive: `tar -xf <path> -C .work/sosreport-analyze/{timestamp}/`
     - Store extracted directory path for analysis
   - If path is already a directory:
     - Verify it's a valid sosreport directory (check for `sos_commands/`, `sos_logs/`, etc.)
     - Use the directory directly
3. **Identify sosreport structure**
   - Locate the root directory (usually has format `sosreport-{hostname}-{date}/`)
   - Verify expected directories exist: `sos_commands/`, `sos_logs/`, `sos_reports/`
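
A minimal sketch of this phase follows. It is not the command's actual implementation. `prepare_sosreport` is a hypothetical helper name, the timestamp format is an arbitrary choice, and two behaviours are assumptions rather than documented rules: an archive that `tar` cannot read is treated as an invalid path, and the archive is expected to unpack into a single top-level directory.

```bash
# Sketch only: prepare_sosreport is a hypothetical helper name.
# Prints the sosreport root directory; returns 1 for an invalid path and
# 2 for a malformed structure, mirroring the exit codes under Return Value.
prepare_sosreport() {
    src=$1
    if [ ! -e "$src" ]; then
        echo "error: path does not exist: $src" >&2
        return 1
    fi
    if [ -d "$src" ]; then
        root=${src%/}
    else
        workdir=".work/sosreport-analyze/$(date +%Y%m%d-%H%M%S)"
        mkdir -p "$workdir" || return 1
        # GNU tar and bsdtar detect gzip/xz compression on their own with -xf.
        # Assumption: an unreadable archive counts as an invalid path (1).
        tar -xf "$src" -C "$workdir" || return 1
        # Assumption: the archive unpacks into a single top-level directory.
        root=$(find "$workdir" -mindepth 1 -maxdepth 1 -type d | head -n 1)
    fi
    for dir in sos_commands sos_logs sos_reports; do
        if [ -z "$root" ] || [ ! -d "$root/$dir" ]; then
            echo "error: not a valid sosreport, missing $dir/" >&2
            return 2
        fi
    done
    echo "$root"
}
```

The printed root directory can then be handed to each selected analysis phase.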
### 3. Analyze System Logs
**Run condition**: Only if `logs` area is selected (or no filters specified)

**Detailed implementation**: See `plugins/sosreport/skills/logs-analysis/SKILL.md`

Perform comprehensive log analysis including:
- Journald logs (journalctl output)
- System logs (messages, dmesg, secure)
- Application-specific logs
- Error counting and categorization
- Timeline of critical events
- OOM killer events, kernel panics, segfaults

**Key outputs**:
- Error statistics by severity
- Top error messages by frequency
- Critical findings with timestamps
- Log file locations for investigation
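
As an illustration of the event detection listed above, the following sketch counts lines matching a few well-known kernel messages in one log file. `count_critical_events` is a hypothetical helper name and the three patterns are an assumed starting set; the skill document referenced above remains the authoritative guide.

```bash
# Sketch only: count_critical_events is a hypothetical helper name.
# Prints "<event>: <matching line count>" for each pattern in one log file.
count_critical_events() {
    log=$1
    [ -r "$log" ] || { echo "error: cannot read $log" >&2; return 1; }
    # Each entry is "<event name>:<fixed string to search for>".
    for entry in \
        "oom-kill:Out of memory: Kill" \
        "kernel-panic:Kernel panic" \
        "segfault:segfault at"
    do
        name=${entry%%:*}      # text before the first colon
        pattern=${entry#*:}    # text after the first colon
        # grep -c exits non-zero when the count is 0, so ignore its status.
        count=$(grep -c -F -e "$pattern" "$log" || true)
        echo "$name: $count"
    done
}
```

It could be run against, for example, `var/log/messages` inside the extracted report, if that file was collected.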
### 4. Analyze Resource Usage
**Run condition**: Only if `resources` area is selected (or no filters specified)

**Detailed implementation**: See `plugins/sosreport/skills/resource-analysis/SKILL.md`

Perform resource analysis including:
- Memory usage and pressure indicators
- CPU information and load averages
- Disk usage and I/O errors
- Process analysis (top consumers, zombies)
- Resource exhaustion patterns

**Key outputs**:
- Memory usage metrics and swap status
- CPU count and load per CPU
- Filesystems near capacity
- Top CPU and memory-consuming processes
- Resource-related issues and recommendations
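
Two of these outputs can be sketched with `awk`, assuming the report contains a copy of `/proc/meminfo` and saved `df` output in the default column layout. The helper names and the 90% default threshold are hypothetical. "Used" is computed as `MemTotal - MemAvailable`, which is one reasonable definition but not the only one, and `MemAvailable` is missing on old kernels.

```bash
# Sketch only: both helper names are hypothetical.
# $1 = meminfo-format file (values in kB); prints a one-line memory summary.
memory_summary() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        END {
            if (total <= 0) exit 1
            used = total - avail
            # 1048576 kB per GB (binary units).
            printf "Memory: %.1f GB used / %.1f GB total (%.0f%% used)\n",
                   used / 1048576, total / 1048576, used * 100 / total
        }
    ' "$1"
}

# $1 = saved df output, $2 = threshold percent (default 90).
# Prints "<mount point>: <use%>" for filesystems at or above the threshold.
full_filesystems() {
    awk -v limit="${2:-90}" '
        NR > 1 {
            pct = $5
            sub(/%/, "", pct)
            if (pct + 0 >= limit) print $6 ": " $5
        }
    ' "$1"
}
```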
### 5. Analyze Network Configuration
**Run condition**: Only if `network` area is selected (or no filters specified)

**Detailed implementation**: See `plugins/sosreport/skills/network-analysis/SKILL.md`

Perform network analysis including:
- Network interface configuration and status
- Routing table and default gateway
- Active connections and listening services
- Firewall rules (firewalld/iptables)
- DNS configuration and hostname resolution
- Network errors from logs

**Key outputs**:
- Interface status with IP addresses
- Routing configuration
- Connection statistics by state
- Firewall configuration summary
- DNS and hostname settings
- Network-related errors and issues
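
The "connection statistics by state" output might be produced along these lines. The sketch assumes input shaped like `ss -tan` output, with a header line and then one socket per line with the state in the first column. The `ss` or `netstat` files a sosreport actually collects may use extra columns or continuation lines and would need different parsing. `connections_by_state` is a hypothetical helper name.

```bash
# Sketch only: connections_by_state is a hypothetical helper name.
# $1 = file with `ss -tan`-style output (state in the first column).
# Prints "<count> <state>", most frequent state first.
connections_by_state() {
    awk 'NR > 1 && NF > 0 { count[$1]++ }
         END { for (state in count) print count[state], state }' "$1" |
        sort -k1,1nr -k2,2
}
```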
### 6. Analyze Installed Packages and System Configuration
**Run condition**: Only if `system-config` area is selected (or no filters specified)

**Detailed implementation**: See `plugins/sosreport/skills/system-config-analysis/SKILL.md`

Perform system configuration analysis including:
- OS version and kernel information
- Installed package versions
- Systemd service status
- Failed services with reasons
- SELinux/AppArmor configuration and denials
- Kernel parameters and resource limits

**Key outputs**:
- System information summary
- Key package versions
- Failed services with failure reasons
- SELinux status and denial count
- Configuration issues and recommendations
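
Two of these outputs, the failed services and the SELinux denial count, can be sketched as follows. Both helper names are hypothetical. The input files are assumed to be saved `systemctl list-units` output and an audit log such as `var/log/audit/audit.log`; where a given sosreport stores them depends on the sos version and enabled plugins.

```bash
# Sketch only: both helper names are hypothetical.
# $1 = saved `systemctl list-units` output; prints names of failed services.
failed_services() {
    awk '{
        # Columns are UNIT LOAD ACTIVE SUB, optionally preceded by a marker,
        # so the ACTIVE state sits two fields after the unit name.
        for (i = 1; i <= NF - 2; i++)
            if ($i ~ /\.service$/ && $(i + 2) == "failed") { print $i; break }
    }' "$1"
}

# $1 = audit log; prints the number of lines recording an AVC denial.
selinux_denial_count() {
    # grep -c exits non-zero when the count is 0, so ignore its status.
    grep -c -E 'avc: +denied' "$1" || true
}
```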
### 7. Generate Interactive Summary
1. **Create findings structure**
   - Organize findings by category (Critical, High, Medium, Low, Info)
   - Include only findings from the selected analysis areas
   - For each finding, include:
     - Severity level
     - Category (logs, resources, network, packages, config)
     - Description of the issue
     - Evidence (file paths, log snippets, metrics)
     - Recommended actions
2. **Display summary in terminal**
   - Show executive summary with key statistics
   - List critical and high-severity findings
   - Provide file paths for detailed investigation
   - Include timeline of significant events
   - Suggest next steps for troubleshooting
3. **Format output**
```
SOSREPORT ANALYSIS SUMMARY
==========================
System: {hostname}
Report Date: {date}
OS: {os_version}
Kernel: {kernel_version}
CRITICAL ISSUES (count)
-----------------------
- [Issue description with file reference]
HIGH PRIORITY (count)
---------------------
- [Issue description with file reference]
MEDIUM PRIORITY (count)
-----------------------
- [Issue description with file reference]
RESOURCE SUMMARY
----------------
- Memory: X GB used / Y GB total (Z% used)
- Disk: Most full filesystem at X%
- Load Average: X.XX, X.XX, X.XX
TOP ERRORS IN LOGS
------------------
1. [Error message] (count occurrences)
2. [Error message] (count occurrences)
FAILED SERVICES
---------------
- [service name]: [reason]
RECOMMENDATIONS
---------------
1. [Actionable recommendation]
2. [Actionable recommendation]
ANALYSIS LOCATION
-----------------
Extracted to: {extraction_path}
```
4. **Interactive drill-down**
   - Offer to explore specific areas in more detail
   - Allow user to ask follow-up questions about findings
   - Provide file paths for manual investigation
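
Grouping findings by severity for the summary could look like the sketch below. The record format `SEVERITY|category|description` and the name `print_findings` are assumptions made for illustration. The headings use the bare severity name, a simplification of the "CRITICAL ISSUES" and "HIGH PRIORITY" headings in the template above.

```bash
# Sketch only: print_findings and its input format are hypothetical.
# Reads findings on stdin, one per line: SEVERITY|category|description
# Prints them grouped by severity, most severe first, with a count per group.
print_findings() {
    awk -F'|' '
        BEGIN { n = split("CRITICAL HIGH MEDIUM LOW INFO", order, " ") }
        NF >= 3 {
            sev = toupper($1)
            count[sev]++
            items[sev] = items[sev] "- [" $2 "] " $3 "\n"
        }
        END {
            for (i = 1; i <= n; i++) {
                sev = order[i]
                if (count[sev] == 0) continue
                title = sev " (" count[sev] ")"
                rule = title
                gsub(/./, "-", rule)   # underline as wide as the title
                printf "%s\n%s\n%s", title, rule, items[sev]
            }
        }
    '
}
```

Each analysis phase can append its findings to one file in this format, and the summary step then pipes that file through `print_findings`.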
## Return Value
- **Format**: Interactive summary displayed in terminal with categorized findings
- **Exit code**:
  - 0 if analysis completes successfully
  - 1 if sosreport path is invalid
  - 2 if sosreport structure is malformed
## Examples
1. **Comprehensive analysis (default)**:
```bash
/sosreport:analyze /tmp/sosreport-server01-2024-01-15.tar.xz
```
Extracts archive to `.work/sosreport-analyze/{timestamp}/` and performs comprehensive analysis using all skills (logs, resources, network, system-config).
2. **Analyze only logs and network**:
```bash
/sosreport:analyze /tmp/sosreport-server01-2024-01-15.tar.xz --only logs,network
```
Performs only log analysis and network analysis. Useful when investigating connectivity or service issues without needing full resource analysis.
3. **Skip resource analysis**:
```bash
/sosreport:analyze /tmp/sosreport.tar.gz --skip resources
```
Performs all analysis except resource analysis. Useful when you already know resource metrics and want to focus on configuration and logs.
4. **Quick log-only analysis**:
```bash
/sosreport:analyze /tmp/sosreport.tar.xz --only logs
```
Performs only log analysis. Fastest option for quickly identifying errors and critical events without analyzing configuration or resources.
5. **Analyze extracted sosreport directory**:
```bash
/sosreport:analyze /tmp/sosreport-server01-2024-01-15/
```
Analyzes an already extracted sosreport directory with comprehensive analysis.
6. **Selective analysis on extracted directory**:
```bash
/sosreport:analyze /tmp/sosreport-server01-2024-01-15/ --only system-config,network
```
Analyzes only system configuration and network from an already extracted directory.
7. **Follow-up investigation**:
```
User: /sosreport:analyze /tmp/sosreport.tar.gz --only logs
Agent: [Shows log analysis summary]
User: Can you now analyze the resources as well?
Agent: /sosreport:analyze /tmp/sosreport.tar.gz --only resources
Agent: [Shows resource analysis]
```
## Notes
- Sosreport structure varies by OS version and sosreport version
- Command handles both compressed archives and extracted directories
- Analysis focuses on common issues but can be extended for specific use cases
- For OpenShift/Kubernetes sosreports, additional pod/container analysis may be relevant
- Large sosreports (>1GB) may take several minutes to analyze
- **Selective analysis**: Use `--only` or `--skip` to run specific analysis areas for faster results
- **Performance**: Running only needed analysis areas reduces analysis time significantly
- **Valid areas**: `logs`, `resources`, `network`, `system-config`
## Prerequisites
1. **tar utility**: Required for extracting compressed sosreports
   - Check: `which tar`
   - Usually pre-installed on Linux/macOS
2. **Sufficient disk space**: Extracted sosreports can be large
   - Check available space: `df -h .work/`
   - Recommend at least 2x the compressed archive size
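
Both prerequisites can be checked before extraction. The sketch below uses a hypothetical helper, `check_prerequisites`, and applies the 2x guideline above as a hard limit, which is an assumption; a real implementation might only warn.

```bash
# Sketch only: check_prerequisites is a hypothetical helper name.
# $1 = archive path, $2 = directory that will hold the extraction (default: .)
check_prerequisites() {
    archive=$1
    target=${2:-.}
    [ -f "$archive" ] || { echo "error: no such archive: $archive" >&2; return 1; }
    command -v tar >/dev/null 2>&1 || { echo "error: tar not found" >&2; return 1; }
    archive_kb=$(du -k "$archive" | cut -f1)
    need_kb=$((archive_kb * 2))
    # df -P prints one line per filesystem; column 4 is available space in KB.
    free_kb=$(df -Pk "$target" | awk 'NR == 2 { print $4 }')
    if [ "$free_kb" -lt "$need_kb" ]; then
        echo "error: need $need_kb KB free in $target, only $free_kb KB available" >&2
        return 1
    fi
    return 0
}
```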
## See Also
### Analysis Skills
- **Logs Analysis**: `plugins/sosreport/skills/logs-analysis/SKILL.md` - Detailed guidance for analyzing system and application logs
- **Resource Analysis**: `plugins/sosreport/skills/resource-analysis/SKILL.md` - Detailed guidance for analyzing memory, CPU, disk, and processes
- **Network Analysis**: `plugins/sosreport/skills/network-analysis/SKILL.md` - Detailed guidance for analyzing network configuration and connectivity
- **System Configuration Analysis**: `plugins/sosreport/skills/system-config-analysis/SKILL.md` - Detailed guidance for analyzing packages, services, and security settings
### External Resources
- Sosreport documentation: <https://github.com/sosreport/sos>
- Red Hat sosreport guide: <https://access.redhat.com/solutions/3592>