| description | argument-hint |
|---|---|
| Debug OLM issues using must-gather logs and source code analysis | <issue-description> <must-gather-path> [olm-version] |
Name
olm:debug
Synopsis
/olm:debug <issue-description> <must-gather-path> [olm-version]
Description
The olm:debug command analyzes OLM (Operator Lifecycle Manager) issues by correlating must-gather logs with the appropriate OLM source code. It automatically determines the OCP version from the must-gather logs, checks out the corresponding branch from the relevant OLM repositories, queries Jira for known bugs in the OCPBUGS project (OLM component), and provides detailed analysis and debugging insights.
Arguments
- $1 (required): Issue description - A brief description of the OLM issue being investigated
- $2 (required): Must-gather path - Absolute or relative path to the must-gather log directory
- $3 (optional): OLM version - Either olmv0 (default) or olmv1
  - olmv0: Uses operator-framework-olm repository
  - olmv1: Uses operator-framework-operator-controller and cluster-olm-operator repositories
Implementation
Phase 1: Environment Setup and Validation
- Validate arguments
  - Check that issue description is provided
  - Verify must-gather path exists and is accessible
  - Set OLM version to olmv0 if not specified
- Parse must-gather logs to determine OCP version
  - Look for version information in must-gather logs
  - Common locations:
    - cluster-scoped-resources/core/nodes/*.yaml - check node annotations
    - cluster-scoped-resources/config.openshift.io/clusterversions/*.yaml
  - Extract OCP version (e.g., 4.14, 4.15, 4.16)
  - Determine corresponding branch name (e.g., release-4.14)
- Create working directory
  - Use .work/olm-debug/<timestamp>/ for temporary files
  - Create subdirectories: repos/, analysis/, logs/
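The version-detection step above can be sketched as follows. This is a minimal, regex-based sketch rather than a full YAML parse: it assumes the ClusterVersion file contains a `version: 4.x.y` field and takes the first match, which may not be the running version if the cluster was captured mid-upgrade. The function names are hypothetical.

```python
import re
from pathlib import Path
from typing import Optional

# Matches lines such as "    version: 4.14.8" but not "apiVersion: ...".
VERSION_RE = re.compile(
    r'^\s*(?:-\s+)?version:\s*["\']?(\d+)\.(\d+)\.\d+', re.MULTILINE
)

# Location listed in Phase 1; rglob tolerates an extra image-named top directory.
CLUSTERVERSION_GLOB = (
    "cluster-scoped-resources/config.openshift.io/clusterversions/*.yaml"
)


def extract_ocp_version(yaml_text: str) -> Optional[str]:
    """Return the major.minor OCP version found in the text, or None."""
    match = VERSION_RE.search(yaml_text)
    return f"{match.group(1)}.{match.group(2)}" if match else None


def release_branch(ocp_version: str) -> str:
    """Map an OCP version such as 4.14 to its release branch name."""
    return f"release-{ocp_version}"


def detect_ocp_version(must_gather: Path) -> Optional[str]:
    """Scan ClusterVersion files under the must-gather directory."""
    for path in sorted(must_gather.rglob(CLUSTERVERSION_GLOB)):
        version = extract_ocp_version(path.read_text(errors="replace"))
        if version:
            return version
    return None  # caller should ask the user for the version


if __name__ == "__main__":
    sample = "status:\n  desired:\n    version: 4.14.8\n"
    print(release_branch(extract_ocp_version(sample)))
```

When no version is found, returning None lets the caller fall back to asking the user, as described under Error Handling.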
Phase 2: Repository Setup
- Clone appropriate repositories based on OLM version
  For olmv0:
  - Clone https://github.com/openshift/operator-framework-olm.git
  - Checkout branch release-<ocp-version> (e.g., release-4.14)
  - If branch doesn't exist, try main or master branch
  For olmv1:
  - Clone https://github.com/openshift/operator-framework-operator-controller.git
  - Clone https://github.com/openshift/cluster-olm-operator.git
  - For each repo, checkout branch release-<ocp-version>
  - If branch doesn't exist, try main or master branch
- Verify repository setup
  - Confirm branches are checked out successfully
  - List key directories to understand codebase structure
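The clone-with-fallback logic above could look roughly like this. It is a sketch that shells out to git with standard clone options; the shallow clone (`--depth 1`) and the helper names are assumptions made for illustration, not part of the command's specification.

```python
import subprocess
from pathlib import Path
from typing import List, Optional

# Repository URLs as listed in Phase 2.
REPOS = {
    "olmv0": ["https://github.com/openshift/operator-framework-olm.git"],
    "olmv1": [
        "https://github.com/openshift/operator-framework-operator-controller.git",
        "https://github.com/openshift/cluster-olm-operator.git",
    ],
}


def candidate_branches(ocp_version: str) -> List[str]:
    """Branches to try, in order: the release branch, then main, then master."""
    return [f"release-{ocp_version}", "main", "master"]


def clone_with_fallback(url: str, dest: Path, branches: List[str]) -> Optional[str]:
    """Clone url into dest, returning the branch that worked or None."""
    for branch in branches:
        result = subprocess.run(
            ["git", "clone", "--depth", "1", "--branch", branch, url, str(dest)],
            capture_output=True,
            text=True,
        )
        if result.returncode == 0:
            return branch
    return None  # every branch failed: likely a network or access problem


if __name__ == "__main__":
    for repo_url in REPOS["olmv1"]:
        print(repo_url, candidate_branches("4.14"))
```

If the returned branch is not the release branch, the report should warn about a possible version mismatch between the logs and the code.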
Phase 3: Log Analysis
- Extract relevant OLM logs from must-gather
  - For olmv0, look for:
    - namespaces/openshift-operator-lifecycle-manager/ logs
    - OLM operator logs: pods/catalog-operator-*/, pods/olm-operator-*/
    - CSV (ClusterServiceVersion) resources
    - Subscription resources
    - InstallPlan resources
  - For olmv1, look for:
    - namespaces/openshift-operator-controller/ logs
    - Operator controller logs
    - ClusterExtension resources
    - Catalog resources
- Identify error patterns and relevant logs
  - Search for ERROR, WARN, FATAL level logs
  - Extract stack traces
  - Identify failed reconciliations
  - Note timestamps of issues
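The error-pattern scan above can be illustrated with a small filter over log lines. This sketch assumes levels appear as the words error, warn/warning, or fatal (for example `level=error`) and that timestamps are ISO 8601; logs in other formats would need different patterns, and an informational line whose message merely contains the word "error" will also be flagged. The names are hypothetical.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

LEVEL_RE = re.compile(r"\b(error|warn(?:ing)?|fatal)\b", re.IGNORECASE)
TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")


class Finding(NamedTuple):
    line_no: int
    level: str
    timestamp: Optional[str]
    text: str


def find_error_lines(lines: Iterable[str]) -> List[Finding]:
    """Collect lines that mention an ERROR, WARN, or FATAL level."""
    findings = []
    for line_no, line in enumerate(lines, start=1):
        level_match = LEVEL_RE.search(line)
        if not level_match:
            continue
        level = level_match.group(1).upper()
        if level == "WARNING":
            level = "WARN"  # normalise the two spellings
        ts_match = TIMESTAMP_RE.search(line)
        findings.append(
            Finding(
                line_no,
                level,
                ts_match.group(0) if ts_match else None,
                line.rstrip("\n"),
            )
        )
    return findings


if __name__ == "__main__":
    demo = ['time="2024-01-02T03:04:06Z" level=error msg="failed to reconcile"']
    for finding in find_error_lines(demo):
        print(finding.timestamp, finding.level, finding.text)
```

Sorting the findings by timestamp gives a starting point for the timeline in the analysis report.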
Phase 4: Known Bug Search in Jira
- Query Jira for known OLM bugs
  - Search OCPBUGS project with component "olm"
  - Use Jira REST API or web scraping to fetch bugs
  - Query parameters:
    - Project: OCPBUGS
    - Component: olm
    - Affects Version: Matches the OCP version (e.g., 4.14.0, 4.15.0)
    - Status: Open, In Progress, or Recently Resolved
  - API endpoint example (URL-encode the JQL before sending; version fields such as affectedVersion do not accept the ~ operator, so use = or in):
    https://issues.redhat.com/rest/api/2/search?jql=project=OCPBUGS AND component=olm AND affectedVersion="4.14.0"
- Match errors with known bugs
  - Extract error messages and keywords from logs
  - Search for matching patterns in Jira bug summaries and descriptions
  - Look for similar symptoms in bug reports
  - Identify potential matches based on:
    - Error message similarity
    - Affected OCP version
    - Component affected (catalog-operator, olm-operator, etc.)
    - Symptom descriptions
- Categorize and prioritize matches
  - High priority: Exact error message match with same OCP version
  - Medium priority: Similar symptoms with same component
  - Low priority: Related issues in same version range
  - Note bugs that have patches or workarounds available
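As a rough illustration of this phase, the sketch below builds a URL-encoded search request and applies the High/Medium/Low rules. It uses `=` for affectedVersion because JQL version fields do not accept the `~` (contains) operator. The token-overlap measure and its 0.5 threshold are assumptions chosen for illustration; "similar symptoms" could be measured in many other ways. The function names are hypothetical.

```python
import re
from typing import Optional, Set
from urllib.parse import urlencode

JIRA_SEARCH = "https://issues.redhat.com/rest/api/2/search"


def build_search_url(affected_version: str, max_results: int = 50) -> str:
    """Return a search URL with the JQL query percent-encoded."""
    jql = (
        "project = OCPBUGS AND component = olm "
        f'AND affectedVersion = "{affected_version}"'
    )
    return f"{JIRA_SEARCH}?{urlencode({'jql': jql, 'maxResults': max_results})}"


def _tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def match_confidence(
    error_message: str,
    bug_text: str,
    same_version: bool,
    same_component: bool,
    min_overlap: float = 0.5,  # assumed threshold for "similar symptoms"
) -> Optional[str]:
    """Classify a candidate bug as High, Medium, Low, or None (no match)."""
    if error_message.lower() in bug_text.lower() and same_version:
        return "High"
    error_tokens = _tokens(error_message)
    overlap = (
        len(error_tokens & _tokens(bug_text)) / len(error_tokens)
        if error_tokens
        else 0.0
    )
    if overlap >= min_overlap and same_component:
        return "Medium"
    if same_version:
        return "Low"
    return None


if __name__ == "__main__":
    print(build_search_url("4.14.0"))
```

Because the matching is purely textual, every reported match still needs a manual read of the full bug description.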
Phase 5: Code Correlation
- Map errors to source code
  - Search cloned repositories for:
    - Error messages found in logs
    - Function names from stack traces
    - Related controllers and reconcilers
  - Use grep/ripgrep to find relevant code sections
- Analyze relevant code sections
  - Read the source code around identified errors
  - Understand the reconciliation logic
  - Identify potential root causes
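Where grep or ripgrep is not available, the source search above can be approximated in pure Python. This sketch does a case-insensitive substring search over Go files and skips vendor/ directories; both choices are assumptions, and log messages containing formatted values will only match if the static part of the message is used as the search string. The function names are hypothetical.

```python
from pathlib import Path
from typing import Iterator, List, Tuple


def matching_lines(text: str, needle: str) -> List[Tuple[int, str]]:
    """Return (line number, stripped line) for lines containing needle."""
    needle_lower = needle.lower()
    return [
        (line_no, line.strip())
        for line_no, line in enumerate(text.splitlines(), start=1)
        if needle_lower in line.lower()
    ]


def search_go_sources(repo_root: Path, needle: str) -> Iterator[Tuple[Path, int, str]]:
    """Yield (file, line number, line) for matches in non-vendored Go files."""
    for path in sorted(repo_root.rglob("*.go")):
        if "vendor" in path.parts:
            continue  # assumption: vendored copies are not of interest
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file: skip rather than abort the search
        for line_no, line in matching_lines(text, needle):
            yield path, line_no, line


if __name__ == "__main__":
    for hit in search_go_sources(Path("."), "failed to resolve"):
        print(*hit, sep=":")
```

The file and line number pairs map directly onto the entries expected in code-references.md.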
Phase 6: Analysis and Recommendations
- Generate detailed analysis report
  - Summary of the issue
  - OCP and OLM version information
  - Timeline of events from logs
  - Known bugs section with Jira links
  - Relevant code sections with explanations
  - Potential root causes
  - Recommended debugging steps
  - Suggested fixes or workarounds
- Create output files
  - analysis.md: Detailed analysis report
  - relevant-logs.txt: Extracted relevant log entries
  - code-references.md: Links to relevant source code sections with line numbers
  - known-bugs.md: List of potentially related Jira bugs with match confidence
Error Handling
- Must-gather path not found: Provide clear error message with expected path format
- Unable to determine OCP version: Ask user to provide OCP version manually
- Repository clone failures: Check network connectivity, provide manual clone instructions
- Branch not found: Fall back to main/master branch and warn user about version mismatch
- No relevant logs found: Provide guidance on what logs to look for manually
- Jira access failures: Continue with analysis if Jira is unavailable; note in report that known bug search was skipped
- Jira authentication required: Provide instructions for setting up Jira credentials if needed
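The two Jira rules above (continue without Jira, and explain authentication failures) can be sketched as a wrapper around whatever function performs the request. The `fetch` callable is a hypothetical stand-in for the real Jira query, and the sketch assumes that query is made with urllib so that failures surface as HTTPError or another OSError.

```python
import urllib.error
from typing import Callable, List, Optional, Tuple


def search_known_bugs(
    fetch: Callable[[], List[dict]],
) -> Tuple[List[dict], Optional[str]]:
    """Run fetch and return (bugs, note); note explains a skipped search."""
    try:
        return fetch(), None
    except urllib.error.HTTPError as exc:
        if exc.code in (401, 403):
            return [], (
                f"Known bug search skipped: Jira requires authentication "
                f"(HTTP {exc.code})."
            )
        return [], f"Known bug search skipped: Jira returned HTTP {exc.code}."
    except OSError as exc:
        # URLError, DNS failures, and timeouts are all OSError subclasses.
        return [], f"Known bug search skipped: Jira unavailable ({exc})."


if __name__ == "__main__":
    def offline() -> List[dict]:
        raise OSError("network unreachable")

    bugs, note = search_known_bugs(offline)
    print(bugs, note)
```

The returned note can be written into the known bugs section of the report so the reader knows the search did not run.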
Return Value
The command generates the following outputs in .work/olm-debug/<timestamp>/:
- analysis.md: Comprehensive analysis report including:
  - Issue summary
  - Version information (OCP, OLM)
  - Log analysis with timeline
  - Known bugs section with links to matching Jira issues
  - Code correlation and root cause analysis
  - Recommendations
- relevant-logs.txt: Extracted relevant log entries from must-gather
- code-references.md: Links to relevant source code files with line numbers
- known-bugs.md: List of potentially related Jira bugs including:
  - Bug ID and link (e.g., OCPBUGS-12345)
  - Bug summary and status
  - Match confidence (High/Medium/Low)
  - Affected versions
  - Available workarounds or patches
- repos/: Cloned repository directories for further manual investigation
Examples
- Basic usage with olmv0 (default):
  /olm:debug "CSV stuck in pending state" /path/to/must-gather
- Debug olmv1 issue:
  /olm:debug "ClusterExtension installation failing" /path/to/must-gather olmv1
- Debug with detailed issue description:
  /olm:debug "Operator upgrade from v1.0 to v2.0 fails with dependency resolution error" ~/Downloads/must-gather.local.123456 olmv0
Notes
- The command requires git to be installed for cloning repositories
- Network access is required to clone from GitHub and access Jira
- Large must-gather archives may take time to process
- The analysis is based on pattern matching and may require manual verification
- For private repositories, ensure GitHub credentials are configured
- Jira access to https://issues.redhat.com/ may require authentication for full access
- Known bug matching is based on text similarity and may produce false positives
- Always verify suggested bug matches by reading the full bug description
See Also
- OLM Documentation: https://olm.operatorframework.io/
- OpenShift OLM: https://docs.openshift.com/container-platform/latest/operators/understanding/olm/olm-understanding-olm.html
- Must-gather documentation: https://docs.openshift.com/container-platform/latest/support/gathering-cluster-data.html
- OCPBUGS Jira Project: https://issues.redhat.com/projects/OCPBUGS/
- Jira REST API: https://docs.atlassian.com/jira-software/REST/latest/