Initial commit

skills/prow-job-analyze-install-failure/SKILL.md
---
|
||||
name: Prow Job Analyze Install Failure
|
||||
description: Analyze OpenShift installation failures in Prow CI jobs by examining installer logs, log bundles, and sosreports. Use when a CI job fails "install should succeed" tests at bootstrap, cluster creation, or any other stage.
|
||||
---
|
||||
|
||||
# Prow Job Analyze Install Failure
|
||||
|
||||
This skill helps debug OpenShift installation failures in CI jobs by downloading and analyzing installer logs, log bundles, and sosreports from Google Cloud Storage.
|
||||
|
||||
## When to Use This Skill
|
||||
|
||||
Use this skill when:
|
||||
- A CI job fails with "install should succeed" test failure
|
||||
- You need to debug installation failures at specific stages (bootstrap, install-complete, etc.)
|
||||
- You need to analyze installer logs and log bundles from failed CI jobs
|
||||
|
||||
## Prerequisites
|
||||
|
||||
Before starting, verify these prerequisites:
|
||||
|
||||
1. **gcloud CLI Installation**
|
||||
- Check if installed: `which gcloud`
|
||||
- If not installed, provide instructions for the user's platform
|
||||
- Installation guide: https://cloud.google.com/sdk/docs/install
|
||||
|
||||
2. **gcloud Authentication (Optional)**
|
||||
- The `test-platform-results` bucket is publicly accessible
|
||||
- No authentication is required for read access
|
||||
- Skip authentication checks
|
||||
|
||||
## Input Format
|
||||
|
||||
The user will provide:
|
||||
1. **Prow job URL** - URL to the failed CI job
|
||||
- Example: `https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.21-e2e-aws-ovn-techpreview/1983307151598161920`
|
||||
- The URL should contain `test-platform-results/`
|
||||
|
||||
## Understanding Job Types from Names
|
||||
|
||||
Job names contain important clues about the test environment and what to look for:
|
||||
|
||||
1. **Upgrade Jobs** (names containing "upgrade")
|
||||
- These jobs perform a **fresh installation first**, then upgrade
|
||||
- **Minor upgrade jobs**: Contain "upgrade-from-stable-4.X" in the name - upgrade from previous minor version (e.g., 4.21 job installs 4.20, then upgrades to 4.21)
|
||||
- Example: `periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade`
|
||||
- **Micro upgrade jobs**: Have "upgrade" in the name but NO "upgrade-from-stable" - upgrade within the same minor version (e.g., earlier 4.21 to newer 4.21)
|
||||
- Example: `periodic-ci-openshift-release-master-nightly-4.21-e2e-aws-ovn-upgrade-fips`
|
||||
- Example: `periodic-ci-openshift-release-master-ci-4.21-e2e-azure-ovn-upgrade`
|
||||
- If installation fails, the upgrade never happens
|
||||
- **Key point**: Installation failures in upgrade jobs are still installation failures, not upgrade failures
|
||||
|
||||
2. **FIPS Jobs** (names containing "fips")
|
||||
- FIPS mode enabled for cryptographic operations
|
||||
- Pay special attention to errors related to:
|
||||
- Cryptography libraries
|
||||
- TLS/SSL handshakes
|
||||
- Certificate validation
|
||||
- Hash algorithms
|
||||
|
||||
3. **IPv6 and Dualstack Jobs** (names containing "ipv6" or "dualstack")
|
||||
- Using IPv6 or dual IPv4/IPv6 networking stack
|
||||
- Most IPv6 jobs are **disconnected** (no internet access)
|
||||
- Use a locally-hosted mirror registry for images
|
||||
- Pay attention to:
|
||||
- Network connectivity errors
|
||||
- DNS resolution issues
|
||||
- Mirror registry logs
|
||||
- IPv6 address configuration
|
||||
|
||||
4. **Metal Jobs with IPv6** (names containing "metal" and "ipv6")
|
||||
- Disconnected environment with additional complexity
|
||||
- The metal install failure skill will analyze squid proxy logs and disconnected environment configuration
|
||||
|
||||
5. **Single-Node Jobs** (names containing "single-node")
|
||||
- All control plane and compute workloads on one node
|
||||
- More prone to resource exhaustion
|
||||
- Pay attention to CPU, memory, and disk pressure
|
||||
|
||||
6. **Platform-Specific Indicators**
|
||||
- `aws`, `gcp`, `azure`: Cloud platform used
|
||||
- `metal`, `baremetal`: Bare metal environment (uses specialized metal install failure skill)
|
||||
- `ovn`: OVN-Kubernetes networking (standard)
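These name-based checks can be scripted. A minimal sketch, assuming the job name is passed as the first argument (the output strings are illustrative, not part of the skill):

```bash
#!/usr/bin/env bash
# Sketch: derive job characteristics from keywords in the Prow job name.
job_name="$1"

case "$job_name" in
  *upgrade-from-stable-*) echo "minor upgrade job: installs the previous minor, then upgrades" ;;
  *upgrade*)              echo "micro upgrade job: installs and upgrades within the same minor" ;;
esac

for hint in fips ipv6 dualstack metal single-node; do
  if [[ "$job_name" == *"$hint"* ]]; then
    echo "job characteristic: $hint"
  fi
done
```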
|
||||
|
||||
## Implementation Steps
|
||||
|
||||
### Step 1: Parse and Validate URL
|
||||
|
||||
1. **Extract bucket path**
|
||||
- Find `test-platform-results/` in URL
|
||||
- Extract everything after it as the GCS bucket relative path
|
||||
- If not found, error: "URL must contain 'test-platform-results/'"
|
||||
|
||||
2. **Extract build_id**
|
||||
- Search for pattern `/(\d{10,})/` in the bucket path
|
||||
- build_id must be at least 10 consecutive decimal digits
|
||||
- Handle URLs with or without trailing slash
|
||||
- If not found, error: "Could not find build ID (10+ digits) in URL"
|
||||
|
||||
3. **Determine job type**
|
||||
- Check if job name contains "metal" (case-insensitive)
|
||||
- Metal jobs: Set `is_metal_job = true`
|
||||
- Other jobs: Set `is_metal_job = false`
|
||||
|
||||
4. **Construct GCS paths**
|
||||
- Bucket: `test-platform-results`
|
||||
- Base GCS path: `gs://test-platform-results/{bucket-path}/`
|
||||
- Ensure path ends with `/`
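A minimal sketch of the parsing described above, assuming the Prow URL is passed as the first argument; the variables it sets (`bucket_path`, `build_id`, `is_metal_job`) are illustrative and are reused by the later sketches:

```bash
#!/usr/bin/env bash
# Sketch of Step 1: parse the Prow URL into bucket path, build ID, and job type.
url="$1"

# Everything after 'test-platform-results/' is the bucket-relative path.
bucket_path="${url#*test-platform-results/}"
if [[ "$bucket_path" == "$url" ]]; then
  echo "error: URL must contain 'test-platform-results/'" >&2
  exit 1
fi
bucket_path="${bucket_path%/}"   # tolerate a trailing slash

# build_id: at least 10 consecutive decimal digits between slashes (or at the end).
build_id="$(grep -oE '/[0-9]{10,}(/|$)' <<<"/${bucket_path}/" | head -n1 | tr -d '/' || true)"
if [[ -z "$build_id" ]]; then
  echo "error: could not find build ID (10+ digits) in URL" >&2
  exit 1
fi

# Metal jobs additionally get the specialized metal skill.
is_metal_job=false
[[ "${bucket_path,,}" == *metal* ]] && is_metal_job=true

gcs_base="gs://test-platform-results/${bucket_path}/"
echo "bucket_path=${bucket_path} build_id=${build_id} is_metal_job=${is_metal_job} gcs_base=${gcs_base}"
```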
|
||||
|
||||
### Step 2: Create Working Directory
|
||||
|
||||
1. **Create directory structure**
|
||||
```bash
|
||||
mkdir -p .work/prow-job-analyze-install-failure/{build_id}/logs
|
||||
mkdir -p .work/prow-job-analyze-install-failure/{build_id}/analysis
|
||||
```
|
||||
- Use `.work/prow-job-analyze-install-failure/` as the base directory (already in .gitignore)
|
||||
- Use build_id as subdirectory name
|
||||
- Create `logs/` subdirectory for all downloads
|
||||
- Create `analysis/` subdirectory for analysis files
|
||||
- Working directory: `.work/prow-job-analyze-install-failure/{build_id}/`
|
||||
|
||||
### Step 3: Download prowjob.json and Identify Target
|
||||
|
||||
1. **Download prowjob.json**
|
||||
```bash
|
||||
gcloud storage cp gs://test-platform-results/{bucket-path}/prowjob.json .work/prow-job-analyze-install-failure/{build_id}/logs/prowjob.json --no-user-output-enabled
|
||||
```
|
||||
|
||||
2. **Parse and validate**
|
||||
- Read `.work/prow-job-analyze-install-failure/{build_id}/logs/prowjob.json`
|
||||
- Search for pattern: `--target=([a-zA-Z0-9-]+)`
|
||||
- If not found:
|
||||
- Display: "This is not a ci-operator job. The prowjob cannot be analyzed by this skill."
|
||||
- Explain: ci-operator jobs have a --target argument specifying the test target
|
||||
- Exit skill
|
||||
|
||||
3. **Extract target name**
|
||||
- Capture the target value (e.g., `e2e-aws-ovn-techpreview`)
|
||||
- Store for constructing artifact paths
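A minimal sketch of the target extraction, assuming `prowjob.json` has already been downloaded and `$build_id` is set as in the Step 1 sketch:

```bash
#!/usr/bin/env bash
# Sketch of Step 3: pull the ci-operator --target out of the downloaded prowjob.json.
prowjob=".work/prow-job-analyze-install-failure/${build_id}/logs/prowjob.json"

target="$(grep -oE -- '--target=[a-zA-Z0-9-]+' "$prowjob" | head -n1 | cut -d= -f2 || true)"

if [[ -z "$target" ]]; then
  echo "This is not a ci-operator job. The prowjob cannot be analyzed by this skill."
  exit 0
fi
echo "target=${target}"   # e.g. e2e-aws-ovn-techpreview
```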
|
||||
|
||||
### Step 4: Download JUnit XML to Identify Failure Stage
|
||||
|
||||
**Note on install-status.txt**: You may see an `install-status.txt` file in the artifacts. This file contains only the installer's exit code (a single number). The junit_install.xml file translates this exit code into a human-readable failure mode, so always prefer junit_install.xml for determining the failure stage.
|
||||
|
||||
1. **Find junit_install.xml**
|
||||
- Use recursive listing to find the file anywhere in the job artifacts:
|
||||
```bash
|
||||
gcloud storage ls -r gs://test-platform-results/{bucket-path}/artifacts/ 2>&1 | grep "junit_install.xml"
|
||||
```
|
||||
- The file location varies by job configuration - don't assume any specific path
|
||||
|
||||
2. **Download junit_install.xml**
|
||||
- Download from the discovered location:
|
||||
```bash
|
||||
# Download from wherever it was found
|
||||
gcloud storage cp {full-gcs-path-to-junit_install.xml} .work/prow-job-analyze-install-failure/{build_id}/logs/junit_install.xml --no-user-output-enabled
|
||||
```
|
||||
- If file not found, continue anyway (older jobs or early failures may not have this file)
|
||||
|
||||
3. **Parse junit_install.xml to find failure stage**
|
||||
- Look for failed test cases with pattern `install should succeed: <stage>`
|
||||
- Installation failure modes:
|
||||
- **`cluster bootstrap`** - Early install failure where we failed to bootstrap the cluster. Bootstrap is typically an ephemeral VM that runs a temporary kube apiserver. Check bootkube logs in the bundle.
|
||||
- **`infrastructure`** - Early failure before we're able to create all cloud resources. Often but not always due to cloud quota, rate limiting, or outages. Check installer log for cloud API errors.
|
||||
- **`cluster creation`** - Usually means one or more operators were unable to stabilize. Check operator logs in gather-must-gather artifacts.
|
||||
- **`configuration`** - Extremely rare failure mode where we failed to create the install-config.yaml for one reason or another. Check installer log for validation errors.
|
||||
- **`cluster operator stability`** - Operators never reached a stable state (available=True, progressing=False, degraded=False). Check specific operator logs to determine why.
- **`other`** - Unknown install failure; could be one of the previously described reasons or an unknown one. Requires full log analysis.
|
||||
- Extract the failure stage for targeted log analysis
|
||||
- Use the failure mode to guide which logs to prioritize
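The stage can usually be pulled out of junit_install.xml with ordinary shell tools. A rough sketch (the exact XML layout varies between jobs, so treat this as a starting point rather than a guaranteed parser):

```bash
#!/usr/bin/env bash
# Sketch: list the "install should succeed: <stage>" test cases and flag the failed ones.
junit=".work/prow-job-analyze-install-failure/${build_id}/logs/junit_install.xml"

echo "Install test cases present:"
grep -oE 'install should succeed: [a-z ]+' "$junit" | sort -u

echo "Failed stage(s):"
# A failed test case is a <testcase ...> element immediately followed by a <failure> element.
tr '\n' ' ' < "$junit" \
  | grep -oE '<testcase[^>]*name="install should succeed: [^"]*"[^>]*>[[:space:]]*<failure' \
  | grep -oE 'install should succeed: [^"]*'
```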
|
||||
|
||||
### Step 5: Locate and Download Installer Logs
|
||||
|
||||
1. **List all artifacts to find installer logs**
|
||||
- Installer logs follow the pattern `.openshift_install*.log`
|
||||
- **IMPORTANT**: Exclude deprovision logs - they are from cluster teardown, not installation
|
||||
- Use recursive listing to find all installer logs:
|
||||
```bash
|
||||
gcloud storage ls -r gs://test-platform-results/{bucket-path}/artifacts/ 2>&1 | grep -E "\.openshift_install.*\.log$" | grep -v "deprovision"
|
||||
```
|
||||
- This will find installer logs regardless of which CI step created them
|
||||
|
||||
2. **Download all installer logs found**
|
||||
```bash
|
||||
# For each installer log found in the listing (excluding deprovision)
|
||||
gcloud storage cp {full-gcs-path-to-installer-log} .work/prow-job-analyze-install-failure/{build_id}/logs/ --no-user-output-enabled
|
||||
```
|
||||
- Download all installer logs found that are NOT from deprovision steps
|
||||
- If multiple installer logs exist, download all of them (they may be from different install phases)
|
||||
- Deprovision logs are from cluster cleanup and not relevant for installation failures
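Combining the listing and the copy commands above, a sketch of a loop that downloads every non-deprovision installer log it finds (assumes `$build_id` and `$bucket_path` from the Step 1 sketch):

```bash
#!/usr/bin/env bash
# Sketch: fetch every installer log except those from deprovision steps.
dest=".work/prow-job-analyze-install-failure/${build_id}/logs"

gcloud storage ls -r "gs://test-platform-results/${bucket_path}/artifacts/" 2>&1 \
  | grep -E '\.openshift_install.*\.log$' \
  | grep -v deprovision \
  | while read -r gcs_path; do
      gcloud storage cp "$gcs_path" "${dest}/" --no-user-output-enabled
    done
```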
|
||||
|
||||
### Step 6: Locate and Download Log Bundle
|
||||
|
||||
1. **List all artifacts to find log bundle**
|
||||
- Log bundles are `.tar` files (NOT `.tar.gz`) starting with `log-bundle-`
|
||||
- **IMPORTANT**: Prefer non-deprovision log bundles over deprovision ones
|
||||
- Use recursive listing to find all log bundles:
|
||||
```bash
|
||||
# Find all log bundles, preferring non-deprovision
|
||||
gcloud storage ls -r gs://test-platform-results/{bucket-path}/artifacts/ 2>&1 | grep -E "log-bundle.*\.tar$"
|
||||
```
|
||||
- This will find log bundles regardless of which CI step created them
|
||||
|
||||
2. **Download log bundle**
|
||||
```bash
|
||||
# If non-deprovision log bundles exist, download one of those (prefer most recent by timestamp)
|
||||
# Otherwise, download deprovision log bundle if that's the only one available
|
||||
gcloud storage cp {full-gcs-path-to-log-bundle} .work/prow-job-analyze-install-failure/{build_id}/logs/ --no-user-output-enabled
|
||||
```
|
||||
- Prefer log bundles NOT from deprovision steps (they capture the failure state during installation)
|
||||
- Deprovision log bundles may also contain useful info if no other bundle exists
|
||||
- If multiple log bundles exist, prefer the one from a non-deprovision step
|
||||
- If no log bundle found, continue with installer log analysis only (early failures may not produce log bundles)
|
||||
|
||||
3. **Extract log bundle**
|
||||
```bash
|
||||
tar -xf .work/prow-job-analyze-install-failure/{build_id}/logs/log-bundle-{timestamp}.tar -C .work/prow-job-analyze-install-failure/{build_id}/logs/
|
||||
```
|
||||
|
||||
### Step 7: Invoke Metal Install Failure Skill (Metal Jobs Only)
|
||||
|
||||
**IMPORTANT: Only perform this step if `is_metal_job = true`**
|
||||
|
||||
Metal IPI jobs use **dev-scripts** with **Metal3** and **Ironic** to install OpenShift on bare metal. These require specialized analysis.
|
||||
|
||||
1. **Invoke the metal install failure skill**
|
||||
- Use the Skill tool to invoke: `prow-job:prow-job-analyze-metal-install-failure`
|
||||
- Pass the following information:
|
||||
- Build ID: `{build_id}`
|
||||
- Bucket path: `{bucket-path}`
|
||||
- Target name: `{target}`
|
||||
- Working directory already created: `.work/prow-job-analyze-install-failure/{build_id}/`
|
||||
|
||||
2. **The metal skill will**:
|
||||
- Download and analyze dev-scripts logs (setup process before OpenShift installation)
|
||||
- Download and analyze libvirt console logs (VM/node boot sequence)
|
||||
- Download and analyze optional artifacts (sosreport, squid logs)
|
||||
- Determine if failure was in dev-scripts setup or cluster installation
|
||||
- Generate metal-specific analysis report
|
||||
|
||||
3. **Continue with standard analysis**:
|
||||
- After metal skill completes, continue with Step 8 (Analyze Installer Logs)
|
||||
- The metal skill provides additional context about dev-scripts and console logs
|
||||
- Standard installer log analysis is still relevant for understanding cluster creation failures
|
||||
|
||||
### Step 8: Analyze Installer Logs
|
||||
|
||||
**CRITICAL: Understanding OpenShift's Eventual Consistency**
|
||||
|
||||
OpenShift installations exhibit "eventual consistency" behavior, which means:
|
||||
- Components may report errors while waiting for dependencies to become ready
|
||||
- Example: Ingress operator may error waiting for networking, which errors waiting for other components
|
||||
- These intermediate errors are **expected and normal** during installation
|
||||
- Early errors in the log often resolve themselves and are NOT the root cause
|
||||
|
||||
**Error Analysis Strategy**:
|
||||
1. **Start with the NEWEST/FINAL errors** - Work backwards in time
|
||||
2. Focus on errors that persisted until installation timeout
|
||||
3. Track backwards from final errors to identify the dependency chain
|
||||
4. Early errors are only relevant if they directly relate to the final failure state
|
||||
5. Don't chase errors that occurred early and then disappeared - they likely resolved
|
||||
|
||||
**Example**: If installation fails at 40 minutes with "kube-apiserver not available", an error at 5 minutes saying "ingress operator degraded" is likely irrelevant because it probably resolved. Focus on what was still broken when the timeout occurred.
|
||||
|
||||
1. **Read installer log**
|
||||
- The installer log is a sequential log file with timestamp, log level, and message
|
||||
- Format: `time="YYYY-MM-DDTHH:MM:SSZ" level=<level> msg="<message>"`
|
||||
|
||||
2. **Identify key failure indicators (WORK BACKWARDS FROM END)**
|
||||
- **Start at the end of the log** - Look at final error/fatal messages
|
||||
- **Error messages**: Lines with `level=error` or `level=fatal` near the end of the log
|
||||
- **Last status messages**: The final "Still waiting for" or "Cluster operators X, Y, Z are not available" messages
|
||||
- **Warning messages**: Lines with `level=warning` near the failure time that may indicate problems
|
||||
- **Then work backwards** to find when the failing component first started having issues
|
||||
- Ignore errors from early in the log unless they persist to the end
|
||||
|
||||
3. **Extract relevant log sections (prioritize recent errors)**
|
||||
- For bootstrap failures:
|
||||
- Search for: "bootstrap", "bootkube", "kube-apiserver", "etcd" in the **last 20% of the log**
|
||||
- For install-complete failures:
|
||||
- Search for: "Cluster operators", "clusteroperator", "degraded", "available" in the **final messages**
|
||||
- For timeout failures:
|
||||
- Search for: "context deadline exceeded", "timeout", "timed out"
|
||||
- Look at what component was being waited for when timeout occurred
|
||||
|
||||
4. **Create installer log summary**
|
||||
- Extract **final/last** error or fatal message (most important)
|
||||
- Extract the last "Still waiting for..." message showing what didn't stabilize
|
||||
- Extract surrounding context (10-20 lines before and after final errors)
|
||||
- Optionally note early errors only if they relate to the final failure
|
||||
- Save to: `.work/prow-job-analyze-install-failure/{build_id}/analysis/installer-summary.txt`
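A sketch of the "work backwards from the end" pass over the newest installer log; the output goes into the summary file named above (paths and variables follow the earlier sketches):

```bash
#!/usr/bin/env bash
# Sketch: summarize the *final* errors and the last waiting message from the newest installer log.
logdir=".work/prow-job-analyze-install-failure/${build_id}/logs"
analysis=".work/prow-job-analyze-install-failure/${build_id}/analysis"
installer_log="$(ls -t "${logdir}"/.openshift_install*.log | head -n1)"

{
  echo "== Final error/fatal messages =="
  grep -nE 'level=(error|fatal)' "$installer_log" | tail -n 20
  echo
  echo "== Last waiting/status messages =="
  grep -nE 'Still waiting for|are not available|waiting for' "$installer_log" | tail -n 5
  echo
  echo "== Tail of the log (context around the final failure) =="
  tail -n 40 "$installer_log"
} > "${analysis}/installer-summary.txt"
```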
|
||||
|
||||
### Step 9: Analyze Log Bundle
|
||||
|
||||
**Skip this step if no log bundle was downloaded**
|
||||
|
||||
1. **Understand log bundle structure**
|
||||
- `log-bundle-{timestamp}/`
|
||||
- `bootstrap/journals/` - Journal logs from bootstrap node
|
||||
- `bootkube.log` - Bootkube service that starts initial control plane
|
||||
- `kubelet.log` - Kubelet service logs
|
||||
- `crio.log` - Container runtime logs
|
||||
- `journal.log.gz` - Complete system journal (gzipped)
|
||||
- `bootstrap/network/` - Network configuration
|
||||
- `ip-addr.txt` - IP addresses
|
||||
- `ip-route.txt` - Routing table
|
||||
- `hostname.txt` - Hostname
|
||||
- `serial/` - Serial console logs from all nodes
|
||||
- `{cluster-name}-bootstrap-serial.log` - Bootstrap node console
|
||||
- `{cluster-name}-master-N-serial.log` - Master node consoles
|
||||
- `clusterapi/` - Cluster API resources
|
||||
- `*.yaml` - Kubernetes resource definitions
|
||||
- `etcd.log` - etcd logs
|
||||
- `kube-apiserver.log` - API server logs
|
||||
- `failed-units.txt` - List of systemd units that failed
|
||||
- `gather.log` - Log bundle collection process log
|
||||
|
||||
2. **Analyze based on failure mode from junit_install.xml**
|
||||
|
||||
**For "cluster bootstrap" failures:**
|
||||
- Check `bootstrap/journals/bootkube.log` for bootkube errors
|
||||
- Check `bootstrap/journals/kubelet.log` for kubelet issues
|
||||
- Check `clusterapi/kube-apiserver.log` for API server startup issues
|
||||
- Check `clusterapi/etcd.log` for etcd cluster formation issues
|
||||
- Check `serial/{cluster-name}-bootstrap-serial.log` for bootstrap VM boot issues
|
||||
- Look for temporary control plane startup problems
|
||||
- This is an early failure - focus on bootstrap node and initial control plane
|
||||
|
||||
**For "infrastructure" failures:**
|
||||
- Primary focus on installer log, not log bundle (failure happens before bootstrap)
|
||||
- Search installer log for cloud provider API errors
|
||||
- Look for quota exceeded messages (e.g., "QuotaExceeded", "LimitExceeded")
|
||||
- Look for rate limiting errors (e.g., "RequestLimitExceeded", "Throttling")
|
||||
- Check for authentication/permission errors
|
||||
- **Infrastructure provisioning methods** (varies by OpenShift version):
|
||||
- **Newer versions**: Use **Cluster API (CAPI)** to provision infrastructure
|
||||
- Look for errors in ClusterAPI-related logs and resources
|
||||
- Check for Machine/MachineSet/MachineDeployment errors
|
||||
- Search for "clusterapi" or "machine-api" related errors
|
||||
- **Older versions**: Use **Terraform** to provision infrastructure
|
||||
- Look for "terraform" in log entries
|
||||
- Check for terraform state errors or apply failures
|
||||
- Search for terraform-related error messages
|
||||
- Log bundle may not exist or be incomplete for this failure mode
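A quick keyword sweep over the installer log usually surfaces the cloud-side cause; the keyword list below is illustrative, not exhaustive:

```bash
# Sketch: common cloud/provisioning error keywords for "infrastructure" failures.
grep -nEi 'QuotaExceeded|LimitExceeded|RequestLimitExceeded|Throttling|RateLimit|AccessDenied|Unauthorized|Forbidden|terraform|clusterapi|machine-api' \
  .work/prow-job-analyze-install-failure/"${build_id}"/logs/.openshift_install*.log \
  | tail -n 40
```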
|
||||
|
||||
**For "cluster creation" failures:**
|
||||
- Check if must-gather was successfully collected:
|
||||
- Look for `must-gather*.tar` files in the gather-must-gather step directory
|
||||
- If NO .tar file exists, must-gather collection failed (cluster was too unstable)
|
||||
- Do NOT suggest downloading must-gather if the .tar file doesn't exist
|
||||
- If must-gather exists, check for operator logs
|
||||
- Look for degraded cluster operators
|
||||
- Check operator-specific logs to see why they couldn't stabilize
|
||||
- Review cluster operator status conditions
|
||||
- This indicates cluster bootstrapped but operators failed to deploy
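Before recommending must-gather review, confirm the archive actually exists; a minimal check:

```bash
# Sketch: only suggest must-gather if a must-gather*.tar archive was actually produced.
if gcloud storage ls -r "gs://test-platform-results/${bucket_path}/artifacts/" 2>&1 \
     | grep -qE 'must-gather.*\.tar'; then
  echo "must-gather archive exists - operator logs can be reviewed"
else
  echo "no must-gather .tar found - cluster was too unstable to collect diagnostics"
fi
```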
|
||||
|
||||
**For "configuration" failures:**
|
||||
- Focus entirely on installer log
|
||||
- Look for install-config.yaml validation errors
|
||||
- Check for missing required fields or invalid values
|
||||
- This is a very early failure before any infrastructure is created
|
||||
- Log bundle will not exist for this failure mode
|
||||
|
||||
**For "cluster operator stability" failures:**
|
||||
- Similar to "cluster creation" but operators are stuck in unstable state
|
||||
- Check if must-gather was successfully collected (look for `must-gather*.tar` files)
|
||||
- If must-gather doesn't exist, rely on installer log and log bundle only
|
||||
- Check for operators with available=False, progressing=True, or degraded=True
|
||||
- Review operator logs in gather-must-gather (if it exists)
|
||||
- Check for resource conflicts or dependency issues
|
||||
- Look at time-series of operator status changes
|
||||
|
||||
**For "other" failures:**
|
||||
- Perform comprehensive analysis of all available logs
|
||||
- Check installer log for any errors or fatal messages
|
||||
- Review log bundle if available
|
||||
- Look for unusual patterns or timeout messages
|
||||
|
||||
3. **Extract key information**
|
||||
- If `failed-units.txt` exists, read it to find failed services
|
||||
- For each failed service, find corresponding journal log
|
||||
- Extract error messages from journal logs
|
||||
- Save findings to: `.work/prow-job-analyze-install-failure/{build_id}/analysis/log-bundle-summary.txt`
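A sketch of the failed-units pass over an extracted log bundle, following the layout described above:

```bash
#!/usr/bin/env bash
# Sketch: collect failed units and the last errors from each bootstrap journal.
bundle="$(ls -d .work/prow-job-analyze-install-failure/"${build_id}"/logs/log-bundle-*/ | head -n1)"
summary=".work/prow-job-analyze-install-failure/${build_id}/analysis/log-bundle-summary.txt"

if [[ -f "${bundle}/failed-units.txt" ]]; then
  { echo "== Failed systemd units =="; cat "${bundle}/failed-units.txt"; echo; } >> "$summary"
fi

for journal in "${bundle}"/bootstrap/journals/*.log; do
  [[ -f "$journal" ]] || continue
  { echo "== $(basename "$journal") (last errors) =="
    grep -iE 'error|failed|fatal' "$journal" | tail -n 20
    echo; } >> "$summary"
done
```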
|
||||
|
||||
### Step 10: Generate Analysis Report
|
||||
|
||||
1. **Create comprehensive analysis report**
|
||||
- Combine findings from all sources:
|
||||
- Installer log analysis
|
||||
- Log bundle analysis
|
||||
- sosreport analysis (if applicable)
|
||||
|
||||
2. **Report structure**
|
||||
```
|
||||
OpenShift Installation Failure Analysis
|
||||
========================================
|
||||
|
||||
Job: {job-name}
|
||||
Build ID: {build_id}
|
||||
Job Type: {metal/cloud}
|
||||
Prow URL: {original-url}
|
||||
|
||||
Failure Stage: {stage from junit_install.xml}
|
||||
|
||||
Summary
|
||||
-------
|
||||
{High-level summary of the failure}
|
||||
|
||||
Installer Log Analysis
|
||||
----------------------
|
||||
{Key findings from installer log}
|
||||
|
||||
Final Error:
{Final error or fatal message with timestamp}
|
||||
|
||||
Context:
|
||||
{Surrounding log lines}
|
||||
|
||||
Log Bundle Analysis
|
||||
-------------------
|
||||
{Findings from log bundle}
|
||||
|
||||
Failed Units:
|
||||
{List from failed-units.txt}
|
||||
|
||||
Key Journal Errors:
|
||||
{Important errors from journal logs}
|
||||
|
||||
Metal Installation Analysis (Metal Jobs Only)
|
||||
-----------------------------------------
|
||||
{Summary from metal install failure skill}
|
||||
- Dev-scripts setup status
|
||||
- Console log findings
|
||||
- Key metal-specific errors
|
||||
|
||||
See detailed metal analysis: .work/prow-job-analyze-install-failure/{build_id}/analysis/metal-analysis.txt
|
||||
|
||||
Recommended Next Steps
|
||||
----------------------
|
||||
{Actionable debugging steps based on failure mode:
|
||||
|
||||
For "configuration" failures:
|
||||
- Review install-config.yaml validation errors
|
||||
- Check for missing required fields
|
||||
- Verify credential format and availability
|
||||
|
||||
For "infrastructure" failures:
|
||||
- Check cloud provider quota and limits
|
||||
- Review cloud provider service status for outages
|
||||
- Verify API credentials and permissions
|
||||
- Check for rate limiting in cloud API calls
|
||||
|
||||
For "cluster bootstrap" failures:
|
||||
- Review bootkube logs for control plane startup issues
|
||||
- Check etcd cluster formation in etcd.log
|
||||
- Examine kube-apiserver startup in kube-apiserver.log
|
||||
- Review bootstrap VM serial console for boot issues
|
||||
|
||||
For "cluster creation" failures:
|
||||
- Identify which operators failed to deploy
|
||||
- Check if must-gather was collected (look for must-gather*.tar files)
|
||||
- If must-gather exists: Review specific operator logs in gather-must-gather
|
||||
- If must-gather doesn't exist: Cluster was too unstable to collect diagnostics; rely on installer log and log bundle
|
||||
- Check for resource conflicts or missing dependencies
|
||||
|
||||
For "cluster operator stability" failures:
|
||||
- Identify operators not reaching stable state
|
||||
- Check operator conditions (available, progressing, degraded)
|
||||
- Check if must-gather exists before suggesting to review it
|
||||
- Review operator logs for stuck operations (if must-gather available)
|
||||
- Look for time-series of operator status changes
|
||||
|
||||
For "other" failures:
|
||||
- Perform comprehensive log review
|
||||
- Look for timeout or unusual error patterns
|
||||
- Check all available artifacts systematically
|
||||
}
|
||||
|
||||
Artifacts Location
|
||||
------------------
|
||||
All artifacts downloaded to:
|
||||
.work/prow-job-analyze-install-failure/{build_id}/logs/
|
||||
|
||||
- Installer logs: .openshift_install*.log
|
||||
- Log bundle: log-bundle-*/
|
||||
- sosreport: sosreport-*/ (metal jobs only)
|
||||
```
|
||||
|
||||
3. **Save report**
|
||||
- Save to: `.work/prow-job-analyze-install-failure/{build_id}/analysis/report.txt`
|
||||
|
||||
### Step 11: Present Results to User
|
||||
|
||||
1. **Display summary**
|
||||
- Show the analysis report to the user
|
||||
- Highlight the most critical findings
|
||||
- Provide file paths for further investigation
|
||||
|
||||
2. **Offer next steps**
|
||||
- Based on the failure type, suggest specific debugging actions:
|
||||
- For bootstrap failures: Check API server and etcd logs
|
||||
- For install-complete failures: Check cluster operator status
|
||||
- For network issues: Review network configuration
|
||||
- For metal job failures: Examine VM console logs
|
||||
- **IMPORTANT**: Only suggest reviewing must-gather if you verified the .tar file exists
|
||||
- Don't suggest downloading must-gather if no .tar file was found
|
||||
- If must-gather doesn't exist, note that the cluster was too unstable to collect it
|
||||
|
||||
3. **Provide artifact locations**
|
||||
- List all downloaded files with their paths
|
||||
- Note whether must-gather was successfully collected or not
|
||||
- Explain how to explore the logs further
|
||||
- Mention that artifacts are cached for faster re-analysis
|
||||
|
||||
## Installation Stages Reference
|
||||
|
||||
Understanding the installation stages helps target analysis:
|
||||
|
||||
1. **Pre-installation** (Failure mode: "configuration")
|
||||
- Validation of install-config.yaml
|
||||
- Credential checks
|
||||
- Image resolution
|
||||
- **Common failures**: Invalid install-config.yaml, missing required fields, validation errors
|
||||
|
||||
2. **Infrastructure Creation** (Failure mode: "infrastructure")
|
||||
- Creating cloud resources (VMs, networks, storage)
|
||||
- For metal: VM provisioning on hypervisor
|
||||
- **Common failures**: Cloud quota exceeded, rate limiting, API outages, permission errors
|
||||
|
||||
3. **Bootstrap** (Failure mode: "cluster bootstrap")
|
||||
- Bootstrap node boots with temporary control plane
|
||||
- Bootstrap API server and etcd start
|
||||
- Bootstrap creates master nodes
|
||||
- **Common failures**: API server won't start, etcd formation issues, bootkube errors
|
||||
|
||||
4. **Master Node Bootstrap**
|
||||
- Master nodes boot and join bootstrap etcd
|
||||
- Masters form permanent control plane
|
||||
- Bootstrap control plane transfers to masters
|
||||
- **Common failures**: Masters can't reach bootstrap, network issues, ignition failures
|
||||
|
||||
5. **Bootstrap Complete**
|
||||
- Bootstrap node is no longer needed
|
||||
- Masters are running permanent control plane
|
||||
- Cluster operators begin initialization
|
||||
- **Common failures**: Control plane not transferring, master nodes not ready
|
||||
|
||||
6. **Cluster Operators Initialization** (Failure mode: "cluster creation")
|
||||
- Core cluster operators start
|
||||
- Operators begin deployment
|
||||
- Initial operator stabilization
|
||||
- **Common failures**: Operators can't deploy, resource conflicts, dependency issues
|
||||
|
||||
7. **Cluster Operators Stabilization** (Failure mode: "cluster operator stability")
|
||||
- Operators reach stable state (available=True, progressing=False, degraded=False)
|
||||
- Worker nodes can join
|
||||
- **Common failures**: Operators stuck progressing, degraded state, availability issues
|
||||
|
||||
8. **Install Complete**
|
||||
- All cluster operators are available and stable
|
||||
- Cluster is fully functional
|
||||
- Installation successful
|
||||
|
||||
## Failure Mode Mapping
|
||||
|
||||
| JUnit Failure Mode | Installation Stage | Where to Look | Artifacts Available |
|
||||
|-------------------|-------------------|---------------|-------------------|
|
||||
| `configuration` | Pre-installation | Installer log only | No log bundle |
|
||||
| `infrastructure` | Infrastructure Creation | Installer log, cloud API errors | Partial or no log bundle |
|
||||
| `cluster bootstrap` | Bootstrap | Log bundle (bootkube, etcd, kube-apiserver) | Full log bundle |
|
||||
| `cluster creation` | Operators Initialization | gather-must-gather, operator logs | Full artifacts |
|
||||
| `cluster operator stability` | Operators Stabilization | gather-must-gather, operator status | Full artifacts |
|
||||
| `other` | Unknown | All available logs | Varies |
|
||||
|
||||
## Key Files Reference
|
||||
|
||||
### Installer Logs
|
||||
- **Location**: `.openshift_install-{timestamp}.log`
|
||||
- **Format**: Structured log with timestamp, level, message
|
||||
- **Key patterns**:
|
||||
- `level=error` - Error messages
|
||||
- `level=fatal` - Fatal errors that stop installation
|
||||
- `waiting for` - Timeout/waiting messages
|
||||
|
||||
### Log Bundle Structure
|
||||
- **bootstrap/journals/bootkube.log**: Bootstrap control plane initialization
|
||||
- **bootstrap/journals/kubelet.log**: Bootstrap kubelet (container orchestration)
|
||||
- **clusterapi/kube-apiserver.log**: API server logs
|
||||
- **clusterapi/etcd.log**: etcd cluster logs
|
||||
- **serial/*.log**: Node console output
|
||||
- **failed-units.txt**: systemd services that failed
|
||||
|
||||
### Metal Job Artifacts
|
||||
|
||||
Metal installations are handled by the `prow-job-analyze-metal-install-failure` skill.
|
||||
See that skill's documentation for details on dev-scripts, libvirt logs, sosreport, and squid logs.
|
||||
|
||||
## Tips
|
||||
|
||||
- **CRITICAL**: Work backwards from the end of the installer log, not forwards from the beginning
|
||||
- Early errors often resolve themselves due to eventual consistency - focus on final errors
|
||||
- Always start by checking the installer log for the **LAST** error, not the first
|
||||
- The log bundle provides detailed node-level diagnostics
|
||||
- Serial console logs show the actual boot sequence and can reveal kernel panics
|
||||
- Failed systemd units in failed-units.txt are strong indicators of the problem
|
||||
- Bootstrap failures are often etcd or API server related
|
||||
- Install-complete failures are usually cluster operator issues
|
||||
- Metal jobs use a specialized skill for analysis (prow-job-analyze-metal-install-failure)
|
||||
- Cache artifacts in `.work/prow-job-analyze-install-failure/{build_id}/` for re-analysis
|
||||
- Use grep with relevant keywords to filter large log files
|
||||
- Timeline of events from installer log helps correlate issues across logs
|
||||
- Pay attention to job name clues: fips, ipv6, dualstack, metal, single-node, upgrade
|
||||
- IPv6 jobs are often disconnected and use mirror registries
|
||||
- Only suggest must-gather if the .tar file exists; if not, cluster was too unstable
|
||||
|
||||
## Important Notes
|
||||
|
||||
1. **Eventual Consistency Behavior**
|
||||
- OpenShift installations exhibit eventual consistency
|
||||
- Components report errors while waiting for dependencies
|
||||
- Early errors are EXPECTED and usually resolve automatically
|
||||
- **Always analyze backwards from the final timeout, not forwards from the start**
|
||||
- Only errors that persist until failure are relevant root causes
|
||||
|
||||
2. **Upgrade Jobs and Installation**
|
||||
- Jobs with "upgrade" in the name perform installation FIRST, then upgrade
|
||||
- If you're analyzing an installation failure in an upgrade job, it never got to the upgrade phase
|
||||
- "minor" upgrade: Installs 4.n-1 version (e.g., 4.20 for a 4.21 upgrade job)
|
||||
- "micro" upgrade: Installs earlier payload in same stream
|
||||
|
||||
3. **Log Bundle Availability**
|
||||
- Not all jobs produce log bundles
|
||||
- Older jobs may not have this feature
|
||||
- Installation must reach a certain point to generate log bundle
|
||||
|
||||
4. **Must-Gather Availability**
|
||||
- Must-gather only exists if a `must-gather*.tar` file is present
|
||||
- If no .tar file exists, the cluster was too unstable to collect diagnostics
|
||||
- **Never suggest downloading must-gather unless you verified the .tar file exists**
|
||||
|
||||
5. **Metal Job Specifics**
|
||||
- Metal jobs are analyzed using the specialized `prow-job-analyze-metal-install-failure` skill
|
||||
- That skill handles dev-scripts, libvirt console logs, sosreport, and squid logs
|
||||
- See the metal skill documentation for details
|
||||
|
||||
6. **Debugging Workflow**
|
||||
- Start with installer log to find **LAST** error (not first)
|
||||
- Use failure stage to guide which logs to examine
|
||||
- Log bundle provides node-level details
|
||||
- For metal jobs, invoke the metal-specific skill for additional analysis
|
||||
- Work backwards in time to trace dependency chains
|
||||
|
||||
7. **Common Failure Patterns**
|
||||
- **Bootstrap etcd not starting**: Check etcd.log and bootkube.log
|
||||
- **API server not responding**: Check kube-apiserver.log
|
||||
- **Masters not joining**: Check master serial logs
|
||||
- **Operators degraded**: Check specific operator logs in must-gather (if it exists)
|
||||
- **Network issues**: Check network configuration in bootstrap/network/
|
||||
|
||||
8. **File Formats**
|
||||
- Installer log: Plain text, structured format
|
||||
- Journal logs: systemd journal format (plain text export)
|
||||
- Serial logs: Raw console output
|
||||
- YAML files: Kubernetes resource definitions
|
||||
- Compressed files: .gz (gzip), .xz (xz)
|
||||
skills/prow-job-analyze-metal-install-failure/SKILL.md
---
|
||||
name: Prow Job Analyze Metal Install Failure
|
||||
description: Analyze OpenShift bare metal installation failures in Prow CI jobs using dev-scripts artifacts. Use for jobs with "metal" in name, for debugging Metal3/Ironic provisioning, installation, or dev-scripts setup failures. You may also use the prow-job-analyze-install-failure skill with this one.
|
||||
---
|
||||
|
||||
# Prow Job Analyze Metal Install Failure
|
||||
|
||||
This skill helps debug OpenShift bare metal installation failures in CI jobs by analyzing dev-scripts logs, libvirt console logs, sosreports, and other metal-specific artifacts.
|
||||
|
||||
## When to Use This Skill
|
||||
|
||||
Use this skill when:
|
||||
- A bare metal CI job fails with "install should succeed" test failure
|
||||
- The job name contains "metal" or "baremetal"
|
||||
- You need to debug Metal3/Ironic provisioning issues
|
||||
- You need to analyze dev-scripts setup failures
|
||||
|
||||
This skill is invoked by the main `prow-job-analyze-install-failure` skill when it detects a metal job.
|
||||
|
||||
## Metal Installation Overview
|
||||
|
||||
Metal IPI jobs use **dev-scripts** (https://github.com/openshift-metal3/dev-scripts) with **Metal3** and **Ironic** to install OpenShift:
|
||||
- **dev-scripts**: Framework for setting up and installing OpenShift on bare metal
|
||||
- **Metal3**: Kubernetes-native interface to Ironic
|
||||
- **Ironic**: Bare metal provisioning service
|
||||
|
||||
The installation process has multiple layers:
|
||||
1. **dev-scripts setup**: Configures hypervisor, sets up Ironic/Metal3, builds installer
|
||||
2. **Ironic provisioning**: Provisions bare metal nodes (or VMs acting as bare metal)
|
||||
3. **OpenShift installation**: Standard installer runs on provisioned nodes
|
||||
|
||||
Failures can occur at any layer, so analysis must check all of them.
|
||||
|
||||
## Network Architecture (CRITICAL for Understanding IPv6/Disconnected Jobs)
|
||||
|
||||
**IMPORTANT**: The term "disconnected" refers to the cluster nodes, NOT the hypervisor.
|
||||
|
||||
### Hypervisor (dev-scripts host)
|
||||
- **HAS** full internet access
|
||||
- Downloads packages, container images, and dependencies from the public internet
|
||||
- Runs dev-scripts Ansible playbooks that download tools (Go, installer, etc.)
|
||||
- Hosts a local mirror registry to serve the cluster
|
||||
|
||||
### Cluster VMs/Nodes
|
||||
- Run in a **private IPv6-only network** (when IP_STACK=v6)
|
||||
- **NO** direct internet access (truly disconnected)
|
||||
- Pull container images from the hypervisor's local mirror registry
|
||||
- Access to hypervisor services only (registry, DNS, etc.)
|
||||
|
||||
### Common Misconception
|
||||
When analyzing failures in "metal-ipi-ovn-ipv6" jobs:
|
||||
- ❌ WRONG: "The hypervisor cannot access the internet, so downloads fail"
|
||||
- ✅ CORRECT: "The hypervisor has internet access. If downloads fail, it's likely due to the remote service being unavailable, not network restrictions"
|
||||
|
||||
### Implications for Failure Analysis
|
||||
1. **Dev-scripts failures** (steps 01-05): If external downloads fail, check if the remote service/URL is down or has removed the resource
|
||||
2. **Installation failures** (step 06+): If cluster nodes cannot pull images, check the local mirror registry on the hypervisor
|
||||
3. **HTTP 403/404 errors during dev-scripts**: Usually means the resource was removed from the upstream source, not that the network is restricted
|
||||
|
||||
## Prerequisites
|
||||
|
||||
1. **gcloud CLI Installation**
|
||||
- Check if installed: `which gcloud`
|
||||
- If not installed, provide instructions for the user's platform
|
||||
- Installation guide: https://cloud.google.com/sdk/docs/install
|
||||
|
||||
2. **gcloud Authentication (Optional)**
|
||||
- The `test-platform-results` bucket is publicly accessible
|
||||
- No authentication is required for read access
|
||||
|
||||
## Input Format
|
||||
|
||||
The user will provide:
|
||||
1. **Build ID** - Extracted by the main skill
|
||||
2. **Bucket path** - Extracted by the main skill
|
||||
3. **Target name** - Extracted by the main skill
|
||||
4. **Working directory** - Already created by main skill
|
||||
|
||||
## Metal-Specific Artifacts
|
||||
|
||||
Metal jobs produce several diagnostic archives:
|
||||
|
||||
### OFCIR Acquisition Logs
|
||||
- **Location**: `{target}/ofcir-acquire/`
|
||||
- **Purpose**: Shows the OFCIR host acquisition process
|
||||
- **Contains**:
|
||||
- `build-log.txt`: Log showing pool, provider, and host details
|
||||
- `artifacts/junit_metal_setup.xml`: JUnit with test `[sig-metal] should get working host from infra provider`
|
||||
- **Critical for**: Determining if the job failed to acquire a host before installation started
|
||||
- **Key information**:
|
||||
- Pool name (e.g., "cipool-ironic-cluster-el9", "cipool-ibmcloud")
|
||||
- Provider (e.g., "ironic", "equinix", "aws", "ibmcloud")
|
||||
- Host name and details
|
||||
|
||||
### Dev-scripts Logs
|
||||
- **Location**: `{target}/baremetalds-devscripts-setup/artifacts/root/dev-scripts/logs/`
|
||||
- **Purpose**: Shows installation setup process and cluster installation
|
||||
- **Contains**: Numbered log files showing each setup step (requirements, host config, Ironic setup, installer build, cluster creation). **Note**: dev-scripts invokes the installer, so installer logs (`.openshift_install*.log`) will also be present in the devscripts folders.
|
||||
- **Critical for**: Early failures before cluster creation, Ironic/Metal3 setup issues, installation failures
|
||||
|
||||
### libvirt-logs.tar
|
||||
- **Location**: `{target}/baremetalds-devscripts-gather/artifacts/`
|
||||
- **Purpose**: VM/node console logs showing boot sequence
|
||||
- **Contains**: Console output from bootstrap and master VMs/nodes
|
||||
- **Critical for**: Boot failures, Ignition errors, kernel panics, network configuration issues
|
||||
|
||||
### sosreport
|
||||
- **Location**: `{target}/baremetalds-devscripts-gather/artifacts/`
|
||||
- **Purpose**: Hypervisor system diagnostics
|
||||
- **Contains**: Hypervisor logs, system configuration, diagnostic command output
|
||||
- **Useful for**: Hypervisor-level issues, not typically needed for VM boot problems
|
||||
|
||||
### squid-logs.tar
|
||||
- **Location**: `{target}/baremetalds-devscripts-gather/artifacts/`
|
||||
- **Purpose**: Squid proxy logs for inbound CI access to the cluster
|
||||
- **Contains**: Logs showing CI system's inbound connections to the cluster under test. **Note**: The squid proxy runs on the hypervisor for INBOUND access (CI → cluster), NOT for outbound access (cluster → registry).
|
||||
- **Critical for**: Debugging CI access issues to the cluster, particularly in IPv6/disconnected environments
|
||||
|
||||
## Implementation Steps
|
||||
|
||||
### Step 1: Check OFCIR Acquisition
|
||||
|
||||
1. **Download OFCIR logs**
|
||||
```bash
|
||||
gcloud storage cp gs://test-platform-results/{bucket-path}/artifacts/{target}/ofcir-acquire/build-log.txt .work/prow-job-analyze-install-failure/{build_id}/logs/ofcir-build-log.txt --no-user-output-enabled 2>&1 || echo "OFCIR build log not found"
|
||||
gcloud storage cp gs://test-platform-results/{bucket-path}/artifacts/{target}/ofcir-acquire/artifacts/junit_metal_setup.xml .work/prow-job-analyze-install-failure/{build_id}/logs/junit_metal_setup.xml --no-user-output-enabled 2>&1 || echo "OFCIR JUnit not found"
|
||||
```
|
||||
|
||||
2. **Check junit_metal_setup.xml for acquisition failure**
|
||||
- Read the JUnit file
|
||||
- Look for test case: `[sig-metal] should get working host from infra provider`
|
||||
- If the test failed, OFCIR failed to acquire a host
|
||||
- This means installation never started - the failure is in host acquisition
|
||||
|
||||
3. **Extract OFCIR details from build-log.txt**
|
||||
- Parse the JSON in the build log to extract:
|
||||
- `pool`: The OFCIR pool name
|
||||
- `provider`: The infrastructure provider
|
||||
- `name`: The host name allocated
|
||||
- Save these for the final report
|
||||
|
||||
4. **If OFCIR acquisition failed**
|
||||
- Stop analysis - installation never started
|
||||
- Report: "OFCIR host acquisition failed"
|
||||
- Include pool and provider information
|
||||
- Suggest: Check OFCIR pool availability and provider status
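A sketch of extracting the pool, provider, and host name from the acquisition build log; this assumes the JSON described above is embedded in the log and that `jq` is available, with a plain grep as a fallback:

```bash
# Sketch: extract OFCIR details (pool, provider, host) from the acquisition build log.
log=".work/prow-job-analyze-install-failure/${build_id}/logs/ofcir-build-log.txt"

jq -r '"pool=\(.pool) provider=\(.provider) host=\(.name)"' "$log" 2>/dev/null \
  || grep -E '"(pool|provider|name)"' "$log"
```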
|
||||
|
||||
### Step 2: Download Dev-Scripts Logs
|
||||
|
||||
1. **Download dev-scripts logs directory**
|
||||
```bash
|
||||
gcloud storage cp -r gs://test-platform-results/{bucket-path}/artifacts/{target}/baremetalds-devscripts-setup/artifacts/root/dev-scripts/logs/ .work/prow-job-analyze-install-failure/{build_id}/logs/devscripts/ --no-user-output-enabled
|
||||
```
|
||||
|
||||
2. **Handle missing dev-scripts logs gracefully**
|
||||
- Some metal jobs may not have dev-scripts artifacts
|
||||
- If missing, note this in the analysis and proceed with other artifacts
|
||||
|
||||
### Step 3: Download libvirt Console Logs
|
||||
|
||||
1. **Find and download libvirt-logs.tar**
|
||||
```bash
|
||||
gcloud storage ls -r gs://test-platform-results/{bucket-path}/artifacts/ 2>&1 | grep "libvirt-logs\.tar$"
|
||||
gcloud storage cp {full-gcs-path-to-libvirt-logs.tar} .work/prow-job-analyze-install-failure/{build_id}/logs/ --no-user-output-enabled
|
||||
```
|
||||
|
||||
2. **Extract libvirt logs**
|
||||
```bash
|
||||
tar -xf .work/prow-job-analyze-install-failure/{build_id}/logs/libvirt-logs.tar -C .work/prow-job-analyze-install-failure/{build_id}/logs/
|
||||
```
|
||||
|
||||
### Step 4: Download Optional Artifacts
|
||||
|
||||
1. **Download sosreport (optional)**
|
||||
```bash
|
||||
gcloud storage ls -r gs://test-platform-results/{bucket-path}/artifacts/ 2>&1 | grep "sosreport.*\.tar\.xz$"
|
||||
gcloud storage cp {full-gcs-path-to-sosreport} .work/prow-job-analyze-install-failure/{build_id}/logs/ --no-user-output-enabled
|
||||
tar -xf .work/prow-job-analyze-install-failure/{build_id}/logs/sosreport-{name}.tar.xz -C .work/prow-job-analyze-install-failure/{build_id}/logs/
|
||||
```
|
||||
|
||||
2. **Download squid-logs (optional, for IPv6/disconnected jobs)**
|
||||
```bash
|
||||
gcloud storage ls -r gs://test-platform-results/{bucket-path}/artifacts/ 2>&1 | grep "squid-logs.*\.tar$"
|
||||
gcloud storage cp {full-gcs-path-to-squid-logs} .work/prow-job-analyze-install-failure/{build_id}/logs/ --no-user-output-enabled
|
||||
tar -xf .work/prow-job-analyze-install-failure/{build_id}/logs/squid-logs-{name}.tar -C .work/prow-job-analyze-install-failure/{build_id}/logs/
|
||||
```
|
||||
|
||||
### Step 5: Analyze Dev-Scripts Logs
|
||||
|
||||
**Check dev-scripts logs FIRST** - they show what happened during setup and installation.
|
||||
|
||||
1. **Read dev-scripts logs in order**
|
||||
- Logs are numbered sequentially showing setup steps
|
||||
- **Note**: dev-scripts invokes the installer, so you'll find `.openshift_install*.log` files in the devscripts directories
|
||||
- Look for the first error or failure
|
||||
|
||||
2. **Key errors to look for**:
|
||||
- **Host configuration failures**: Networking, DNS, storage setup issues
|
||||
- **Ironic/Metal3 setup issues**: BMC connectivity, provisioning network, node registration failures
|
||||
- **Installer build failures**: Problems building the OpenShift installer binary
|
||||
- **Install-config validation errors**: Invalid configuration before cluster creation
|
||||
- **Installation failures**: Check installer logs (`.openshift_install*.log`) present in devscripts folders
|
||||
|
||||
3. **Important distinction**:
|
||||
- If the failure is in the dev-scripts setup logs (01-05), the problem is in the setup process
- If the failure is in the installer logs or 06_create_cluster, the problem is in the cluster installation (also analyzed by the main skill)
|
||||
|
||||
4. **Save dev-scripts analysis**:
|
||||
- Save findings to: `.work/prow-job-analyze-install-failure/{build_id}/analysis/devscripts-summary.txt`
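A sketch that walks the numbered dev-scripts logs in order and reports the first error in each (file names follow dev-scripts' own numbering; the error keywords are illustrative):

```bash
#!/usr/bin/env bash
# Sketch: surface the first error in each dev-scripts log, in setup order.
devscripts=".work/prow-job-analyze-install-failure/${build_id}/logs/devscripts"

for log in "${devscripts}"/*; do
  [[ -f "$log" ]] || continue
  first_error="$(grep -nEi 'error|fatal|failed' "$log" | head -n1 || true)"
  [[ -n "$first_error" ]] && echo "$(basename "$log"): ${first_error}"
done
```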
|
||||
|
||||
### Step 6: Analyze libvirt Console Logs
|
||||
|
||||
**Console logs are CRITICAL for metal failures during cluster creation.**
|
||||
|
||||
1. **Find console logs**
|
||||
```bash
|
||||
find .work/prow-job-analyze-install-failure/{build_id}/logs/ -name "*console*.log"
|
||||
```
|
||||
- Look for patterns like `{cluster-name}-bootstrap_console.log`, `{cluster-name}-master-{N}_console.log`
|
||||
|
||||
2. **Analyze console logs for boot/provisioning issues**:
|
||||
- **Kernel boot failures or panics**: Look for "panic", "kernel", "oops"
|
||||
- **Ignition failures**: Look for "ignition", "config fetch failed", "Ignition failed"
|
||||
- **Network configuration issues**: Look for "dhcp", "network unreachable", "DNS", "timeout"
|
||||
- **Disk mounting failures**: Look for "mount", "disk", "filesystem"
|
||||
- **Service startup failures**: Look for systemd errors, service failures
|
||||
|
||||
3. **Console logs show the complete boot sequence**:
|
||||
- As if you were watching a physical console
|
||||
- Shows kernel messages, Ignition provisioning, CoreOS startup
|
||||
- Critical for understanding what happened before the system was fully booted
|
||||
|
||||
4. **Save console log analysis**:
|
||||
- Save findings to: `.work/prow-job-analyze-install-failure/{build_id}/analysis/console-summary.txt`
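A sketch of the keyword sweep over the extracted console logs, using the patterns listed above:

```bash
#!/usr/bin/env bash
# Sketch: scan each console log for boot, Ignition, network, disk, and systemd problems.
logdir=".work/prow-job-analyze-install-failure/${build_id}/logs"
summary=".work/prow-job-analyze-install-failure/${build_id}/analysis/console-summary.txt"

find "$logdir" -name '*console*.log' | while read -r console; do
  { echo "== $(basename "$console") =="
    grep -niE 'panic|oops|ignition|config fetch failed|dhcp|network unreachable|dns|timeout|mount|filesystem|systemd.*(failed|error)' \
      "$console" | tail -n 30
    echo; } >> "$summary"
done
```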
|
||||
|
||||
### Step 7: Analyze sosreport (If Downloaded)
|
||||
|
||||
**Only needed for hypervisor-level issues.**
|
||||
|
||||
1. **Check sosreport for hypervisor diagnostics**:
|
||||
- `var/log/messages` - Hypervisor system log
|
||||
- `sos_commands/` - Output of diagnostic commands
|
||||
- `etc/libvirt/` - Libvirt configuration
|
||||
|
||||
2. **Look for hypervisor-level issues**:
|
||||
- Libvirt errors
|
||||
- Network configuration problems on hypervisor
|
||||
- Resource constraints (CPU, memory, disk)
|
||||
|
||||
### Step 8: Analyze squid-logs (If Downloaded)
|
||||
|
||||
**Important for debugging CI access to the cluster.**
|
||||
|
||||
1. **Check squid proxy logs**:
|
||||
- Look for failed connections from CI to the cluster
|
||||
- Look for HTTP errors or blocked requests
|
||||
- Check patterns of CI test framework access issues
|
||||
|
||||
2. **Common issues**:
|
||||
- CI unable to connect to cluster API
|
||||
- Proxy configuration errors blocking CI access
|
||||
- Network routing issues between CI and cluster
|
||||
- **Note**: These logs are for INBOUND access (CI → cluster), not for cluster's outbound access to registries
|
||||
|
||||
### Step 9: Generate Metal-Specific Analysis Report
|
||||
|
||||
1. **Create comprehensive metal analysis report**:
|
||||
```
|
||||
Metal Installation Failure Analysis
|
||||
====================================
|
||||
|
||||
Job: {job-name}
|
||||
Build ID: {build_id}
|
||||
Prow URL: {original-url}
|
||||
|
||||
Installation Method: dev-scripts + Metal3 + Ironic
|
||||
|
||||
OFCIR Host Acquisition
|
||||
----------------------
|
||||
Pool: {pool name from OFCIR build log}
|
||||
Provider: {provider from OFCIR build log}
|
||||
Host: {host name from OFCIR build log}
|
||||
Status: {Success or Failure}
|
||||
|
||||
{If OFCIR acquisition failed, note that installation never started}
|
||||
|
||||
Dev-Scripts Analysis
|
||||
--------------------
|
||||
{Summary of dev-scripts logs}
|
||||
|
||||
Key Findings:
|
||||
- {First error in dev-scripts setup}
|
||||
- {Related errors}
|
||||
|
||||
If dev-scripts failed: The problem is in the setup process (host config, Ironic, installer build)
|
||||
If dev-scripts succeeded: The problem is in cluster installation (see main analysis)
|
||||
|
||||
Console Logs Analysis
|
||||
---------------------
|
||||
{Summary of VM/node console logs}
|
||||
|
||||
Bootstrap Node:
|
||||
- {Boot sequence status}
|
||||
- {Ignition status}
|
||||
- {Network configuration}
|
||||
- {Key errors}
|
||||
|
||||
Master Nodes:
|
||||
- {Status for each master}
|
||||
- {Key errors}
|
||||
|
||||
Hypervisor Diagnostics (sosreport)
|
||||
-----------------------------------
|
||||
{Summary of sosreport findings, if applicable}
|
||||
|
||||
Proxy Logs (squid)
|
||||
------------------
|
||||
{Summary of proxy logs, if applicable}
|
||||
Note: Squid logs show CI access to the cluster, not cluster's registry access
|
||||
|
||||
Metal-Specific Recommended Steps
|
||||
---------------------------------
|
||||
Based on the failure:
|
||||
|
||||
For dev-scripts setup failures:
|
||||
- Review host configuration (networking, DNS, storage)
|
||||
- Check Ironic/Metal3 setup logs for BMC/provisioning issues
|
||||
- Verify installer build completed successfully
|
||||
- Check installer logs in devscripts folders
|
||||
|
||||
For console boot failures:
|
||||
- Check Ignition configuration and network connectivity
|
||||
- Review kernel boot messages for hardware issues
|
||||
- Verify network configuration (DHCP, DNS, routing)
|
||||
|
||||
For CI access issues:
|
||||
- Check squid proxy logs for failed CI connections to cluster
|
||||
- Verify network routing between CI and cluster
|
||||
- Check proxy configuration
|
||||
|
||||
Artifacts Location
|
||||
------------------
|
||||
Dev-scripts logs: .work/prow-job-analyze-install-failure/{build_id}/logs/devscripts/
|
||||
Console logs: .work/prow-job-analyze-install-failure/{build_id}/logs/
|
||||
sosreport: .work/prow-job-analyze-install-failure/{build_id}/logs/sosreport-*/
|
||||
squid logs: .work/prow-job-analyze-install-failure/{build_id}/logs/squid-logs-*/
|
||||
```
|
||||
|
||||
2. **Save report**:
|
||||
- Save to: `.work/prow-job-analyze-install-failure/{build_id}/analysis/metal-analysis.txt`
|
||||
|
||||
### Step 10: Return Metal Analysis to Main Skill
|
||||
|
||||
1. **Provide summary to main skill**:
|
||||
- Brief summary of metal-specific findings
|
||||
- Indication of whether failure was in dev-scripts setup or cluster installation
|
||||
- Key error messages and recommended actions
|
||||
|
||||
## Common Metal Failure Patterns
|
||||
|
||||
| Issue | Symptoms | Where to Look |
|
||||
|-------|----------|---------------|
|
||||
| **Dev-scripts host config** | Early failure before cluster creation | Dev-scripts logs (host configuration step) |
|
||||
| **Ironic/Metal3 setup** | Provisioning failures, BMC errors | Dev-scripts logs (Ironic setup), Ironic logs |
|
||||
| **Node boot failure** | VMs/nodes won't boot | Console logs (kernel, boot sequence) |
|
||||
| **Ignition failure** | Nodes boot but don't provision | Console logs (Ignition messages) |
|
||||
| **Network config** | DHCP failures, DNS issues | Console logs (network messages), dev-scripts host config |
|
||||
| **CI access issues** | Tests can't connect to cluster | squid logs (proxy logs for CI → cluster access) |
|
||||
| **Hypervisor issues** | Resource constraints, libvirt errors | sosreport (system logs, libvirt config) |
|
||||
|
||||
## Tips
|
||||
|
||||
- **Check dev-scripts logs FIRST**: They show setup and installation (dev-scripts invokes the installer)
|
||||
- **Installer logs in devscripts**: Look for `.openshift_install*.log` files in devscripts directories
|
||||
- **Console logs are critical**: They show the actual boot sequence like a physical console
|
||||
- **Ironic/Metal3 errors** often appear in dev-scripts setup logs
|
||||
- **Squid logs are for CI access**: They show inbound CI → cluster access, not outbound cluster → registry
|
||||
- **Boot vs. provisioning**: Boot failures appear in console logs, provisioning failures in Ironic logs
|
||||
- **Layer distinction**: Separate dev-scripts setup from Ironic provisioning from OpenShift installation
|
||||
skills/prow-job-analyze-resource/CHANGELOG.md
# Changelog
|
||||
|
||||
## 2025-10-16 - Regex Pattern Support, Resource List Display, and Glog Severity Detection
|
||||
|
||||
### Changes
|
||||
|
||||
1. **Regex Pattern Support** (parse_all_logs.py)
|
||||
2. **Show Searched Resources in HTML Report** (generate_html_report.py)
|
||||
3. **Glog Severity Level Detection** (parse_all_logs.py)
|
||||
|
||||
---
|
||||
|
||||
## 2025-10-16 - Glog Severity Level Detection
|
||||
|
||||
### Problem
|
||||
Pod logs were all marked as "info" level, even when they contained errors or warnings. Glog format logs (used by many Kubernetes components) have severity indicators at the start of each line:
|
||||
- `E` = Error
|
||||
- `W` = Warning
|
||||
- `I` = Info
|
||||
- `F` = Fatal
|
||||
|
||||
Example error line:
|
||||
```
|
||||
E0910 11:43:41.153414 1 service_account_controller.go:368] "Unhandled Error" err="e2e-test-..."
|
||||
```
|
||||
|
||||
This made it impossible to filter pod logs by severity level in the HTML report.
|
||||
|
||||
### Solution
|
||||
Updated `parse_pod_logs()` function to:
|
||||
1. Detect glog format at the start of each line
|
||||
2. Extract the severity character (E, W, I, F) and timestamp components
|
||||
3. Map severity to our level scheme:
|
||||
- E (Error) and F (Fatal) → `error`
|
||||
- W (Warning) → `warn`
|
||||
- I (Info) → `info`
|
||||
4. Parse glog timestamp (MMDD HH:MM:SS.microseconds) into ISO format
|
||||
5. Infer year (2025) since glog doesn't include it
|
||||
6. Default to `info` for non-glog formatted lines
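Purely as an illustration of the mapping (this is not the parse_all_logs.py implementation), the same classification can be reproduced on the command line against a saved pod log; `pod.log` here is a placeholder file name:

```bash
# Count pod log lines per glog severity: E/F -> error, W -> warn, I and non-glog lines -> info.
awk '
  /^[EF][0-9][0-9][0-9][0-9] / { errors++; next }
  /^W[0-9][0-9][0-9][0-9] /    { warns++;  next }
                               { infos++ }
  END { printf "error=%d warn=%d info=%d\n", errors, warns, infos }
' pod.log
```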
|
||||
|
||||
### Changes Made
|
||||
|
||||
#### Code Changes
|
||||
- **parse_all_logs.py**:
|
||||
- Updated glog pattern regex: `^([EIWF])(\d{2})(\d{2})\s+(\d{2}:\d{2}:\d{2}\.\d+)`
|
||||
- Capture severity, month, day, and time components
|
||||
- Construct ISO 8601 timestamp with inferred year
|
||||
- Extract severity character and map to level
|
||||
- Keep default "info" for non-glog lines
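
As a rough illustration of the logic above (a sketch, not the script's exact code), the detection and timestamp conversion look roughly like this; the year is assumed to be 2025, as noted:

```python
import re
from datetime import datetime

# Severity character, month, day, and time - the pattern listed above.
GLOG_RE = re.compile(r'^([EIWF])(\d{2})(\d{2})\s+(\d{2}:\d{2}:\d{2}\.\d+)')
LEVEL_MAP = {"E": "error", "F": "error", "W": "warn", "I": "info"}

def glog_level_and_timestamp(line, assumed_year=2025):
    """Return (level, iso_timestamp); non-glog lines default to ("info", None)."""
    m = GLOG_RE.match(line)
    if not m:
        return "info", None
    severity, month, day, clock = m.groups()
    ts = datetime.strptime(f"{assumed_year}-{month}-{day}T{clock}", "%Y-%m-%dT%H:%M:%S.%f")
    return LEVEL_MAP[severity], ts.isoformat() + "Z"

print(glog_level_and_timestamp('E0910 11:37:35.363241       1 controller.go:368] "Unhandled Error"'))
# ('error', '2025-09-10T11:37:35.363241Z')
```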
|
||||
|
||||
### Testing
|
||||
|
||||
Verified with real Prow job data:
|
||||
- Pattern: `e2e-test-project-api-pkjxf|e2e-test-project-api-7zdxx`
|
||||
- Pod log results:
|
||||
- 8 error-level entries (glog E and F lines)
|
||||
- 0 warning-level entries
|
||||
- 155 info-level entries
|
||||
- Sample error correctly detected: `E0910 11:43:41.153414 1 service_account_controller.go:368] "Unhandled Error" err="e2e-test-...`
|
||||
- **Timestamp parsing**: All 8 error entries now have timestamps (previously showed "No timestamp")
|
||||
- Example: `E0910 11:37:35.363241` → `2025-09-10T11:37:35.363241Z`
|
||||
|
||||
### Benefits
|
||||
- Users can now filter pod logs by severity in the HTML report
|
||||
- Error and warning pod logs are highlighted with red/yellow badges
|
||||
- Timeline shows error events in red for quick identification
|
||||
- More accurate representation of pod log severity
|
||||
|
||||
---
|
||||
|
||||
## 2025-10-16 - Regex Pattern Support
|
||||
|
||||
### Problem
|
||||
The original `parse_all_logs.py` script used simple substring matching, which meant searching for multiple resources required:
|
||||
1. Running the script multiple times (once per resource)
|
||||
2. Manually merging the JSON outputs
|
||||
3. More time and complexity
|
||||
|
||||
For example, searching for `e2e-test-project-api-pkjxf|e2e-test-project-api-7zdxx` would look for that literal string (including the pipe character), finding zero results.
|
||||
|
||||
### Solution
|
||||
Updated `parse_all_logs.py` to support **regex pattern matching**:
|
||||
|
||||
1. **Regex compilation**: Compile the resource pattern as a regex for efficient matching
|
||||
2. **Smart detection**: Use fast substring search for simple patterns, regex for complex patterns
|
||||
3. **Flexible matching**: Match pattern against both `namespace` and `name` fields in audit logs
|
||||
4. **Performance optimized**: Only use regex when needed (patterns containing `|`, `.*`, `[`, etc.)
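
A minimal sketch of the detection and matching described above (which metacharacters trigger the regex path is an assumption here; the script's exact check may differ):

```python
import re

# Characters that suggest the pattern is a regex rather than a plain substring (assumed set).
_REGEX_CHARS = set("|*[]()?+^$\\")

def make_matcher(resource_pattern):
    """Return a match function: fast substring search for simple patterns, compiled regex otherwise."""
    if not _REGEX_CHARS.intersection(resource_pattern):
        return lambda text: resource_pattern in text       # fast path for plain strings
    compiled = re.compile(resource_pattern)                 # compiled once, reused for every line
    return lambda text: compiled.search(text) is not None

matches = make_matcher("e2e-test-project-api-pkjxf|e2e-test-project-api-7zdxx")
print(matches("namespace e2e-test-project-api-7zdxx created"))  # True
print(matches("namespace something-else"))                      # False
```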
|
||||
|
||||
### Changes Made
|
||||
|
||||
#### Code Changes
|
||||
- **parse_all_logs.py**:
|
||||
- Added regex compilation for resource patterns
|
||||
- Smart detection of regex vs. simple string patterns
|
||||
- Updated both `parse_audit_logs()` and `parse_pod_logs()` functions
|
||||
- Added usage documentation for regex patterns
|
||||
|
||||
#### Documentation Changes
|
||||
- **SKILL.md**:
|
||||
- Updated "Input Format" section with regex pattern examples
|
||||
- Added "Resource Pattern Parameter" section in Step 6
|
||||
- Updated "Filter matches" explanation to reflect regex matching
|
||||
- Added Example 4 showing multi-resource search using regex
|
||||
- Updated Tips and Important Notes sections
|
||||
|
||||
### Usage Examples
|
||||
|
||||
**Before** (required multiple runs + manual merge):
|
||||
```bash
|
||||
# Run 1: First resource
|
||||
python3 parse_all_logs.py "e2e-test-project-api-pkjxf" ... > output1.json
|
||||
|
||||
# Run 2: Second resource
|
||||
python3 parse_all_logs.py "e2e-test-project-api-7zdxx" ... > output2.json
|
||||
|
||||
# Manually merge JSON files with Python
|
||||
```
|
||||
|
||||
**After** (single run):
|
||||
```bash
|
||||
# Single run for multiple resources
|
||||
python3 parse_all_logs.py "e2e-test-project-api-pkjxf|e2e-test-project-api-7zdxx" ... > output.json
|
||||
```
|
||||
|
||||
### Pattern Support
|
||||
|
||||
The script now supports all standard regex patterns:
|
||||
|
||||
- **Multiple resources**: `resource1|resource2|resource3`
|
||||
- **Wildcards**: `e2e-test-project-api-.*`
|
||||
- **Character classes**: `resource-[abc]-name`
|
||||
- **Optional characters**: `resource-name-?`
|
||||
- **Simple substrings**: `my-namespace` (backward compatible)
|
||||
|
||||
### Performance
|
||||
|
||||
- Simple patterns (no regex chars) use fast substring search
|
||||
- Regex patterns are compiled once and reused
|
||||
- No performance degradation for simple searches
|
||||
- Minimal overhead for regex searches
|
||||
|
||||
### Testing
|
||||
|
||||
Verified with real Prow job data:
|
||||
- Pattern: `e2e-test-project-api-pkjxf|e2e-test-project-api-7zdxx`
|
||||
- Result: 1,047 entries (884 audit + 163 pod logs)
|
||||
- Matches manual merge of individual searches: ✓
|
||||
|
||||
### Backward Compatibility
|
||||
|
||||
All existing simple substring patterns continue to work:
|
||||
- `my-namespace` → still uses fast substring search
|
||||
- `pod-name` → still uses fast substring search
|
||||
- No breaking changes to existing functionality
|
||||
|
||||
---
|
||||
|
||||
## 2025-10-16 - Show Searched Resources in HTML Report
|
||||
|
||||
### Problem
|
||||
The HTML report only displayed the single `resource_name` parameter in the "Resources:" section. When searching for multiple resources using a regex pattern like `e2e-test-project-api-pkjxf|e2e-test-project-api-7zdxx`, the header would only show:
|
||||
```
|
||||
Resources: e2e-test-project-api-pkjxf
|
||||
```
|
||||
|
||||
This was misleading because the report actually contained data for both resources.
|
||||
|
||||
### Solution
|
||||
Updated `generate_html_report.py` to:
|
||||
1. Accept a `resource_pattern` parameter (the same pattern used in parse script)
|
||||
2. Parse the pattern to extract the searched resources (split on `|` for regex patterns)
|
||||
3. Display the searched resources as a comma-separated list
|
||||
4. Use only the first resource name for the HTML filename (to avoid special chars like `|`)
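
A hedged sketch of that parsing and filename sanitization (function names here are illustrative, not necessarily the script's):

```python
import re

def displayed_resources(resource_pattern):
    """Split a pipe-separated pattern into the individual names shown in the header."""
    return sorted(p.strip() for p in resource_pattern.split("|") if p.strip())

def report_filename(resource_pattern, build_id):
    """Use only the first resource and strip regex special characters for a safe filename."""
    first = resource_pattern.split("|")[0]
    safe = re.sub(r"[^A-Za-z0-9._-]", "", first)
    return f".work/prow-job-analyze-resource/{build_id}/{safe}.html"

pattern = "e2e-test-project-api-pkjxf|e2e-test-project-api-7zdxx"
print(", ".join(displayed_resources(pattern)))
# e2e-test-project-api-7zdxx, e2e-test-project-api-pkjxf
print(report_filename(pattern, "1964725888612306944"))
# .work/prow-job-analyze-resource/1964725888612306944/e2e-test-project-api-pkjxf.html
```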
|
||||
|
||||
### Changes Made
|
||||
|
||||
#### Code Changes
|
||||
- **generate_html_report.py**:
|
||||
- Renamed parameter from `resource_name` to `resource_pattern`
|
||||
- Parse pattern by splitting on `|` to extract individual resources
|
||||
- Sort and display parsed resources in header
|
||||
- Sanitize filename by using only first resource and removing regex special chars
|
||||
|
||||
#### Skill Documentation
|
||||
- **SKILL.md**:
|
||||
- Updated Step 7 to specify passing `resource_pattern` instead of `resource_name`
|
||||
- Added note that the pattern should be the same as used in parse script
|
||||
- Updated Example 4 to show the expected output
|
||||
|
||||
#### Display Examples
|
||||
|
||||
**Before**:
|
||||
```
|
||||
Resources: e2e-test-project-api-pkjxf
|
||||
```
|
||||
|
||||
**After (searching with pattern "e2e-test-project-api-pkjxf|e2e-test-project-api-7zdxx")**:
|
||||
```
|
||||
Resources: e2e-test-project-api-7zdxx, e2e-test-project-api-pkjxf
|
||||
```
|
||||
|
||||
### Benefits
|
||||
- Users see **only** what they searched for, not all related resources
|
||||
- Clear indication of which resources were analyzed
|
||||
- More accurate and less cluttered
|
||||
- Filename remains safe (no special characters)
|
||||
294
skills/prow-job-analyze-resource/README.md
Normal file
@@ -0,0 +1,294 @@
|
||||
# Prow Job Analyze Resource Skill
|
||||
|
||||
This skill analyzes Kubernetes resource lifecycles in Prow CI job artifacts by downloading and parsing audit logs and pod logs from Google Cloud Storage, then generating interactive HTML reports with timelines.
|
||||
|
||||
## Overview
|
||||
|
||||
The skill provides both a Claude Code skill interface and standalone scripts for analyzing Prow CI job results. It helps debug test failures by tracking resource state changes throughout a test run.
|
||||
|
||||
## Components
|
||||
|
||||
### 1. SKILL.md
|
||||
Claude Code skill definition that provides detailed implementation instructions for the AI assistant.
|
||||
|
||||
### 2. Python Scripts
|
||||
|
||||
#### parse_url.py
|
||||
Parses and validates Prow job URLs from gcsweb.
|
||||
- Extracts build_id (10+ digit identifier)
|
||||
- Extracts prowjob name
|
||||
- Constructs GCS paths
|
||||
- Validates URL format
|
||||
|
||||
**Usage:**
|
||||
```bash
|
||||
./parse_url.py "https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/30393/pull-ci-openshift-origin-main-okd-scos-e2e-aws-ovn/1978913325970362368/"
|
||||
```
|
||||
|
||||
**Output:** JSON with build_id, prowjob_name, bucket_path, gcs_base_path
|
||||
|
||||
#### parse_audit_logs.py
|
||||
Parses Kubernetes audit logs in JSONL format.
|
||||
- Searches for specific resources by name, kind, and namespace
|
||||
- Supports prefix matching for kinds (e.g., "pod" matches "pods")
|
||||
- Extracts timestamps, HTTP codes, verbs, and user information
|
||||
- Generates contextual summaries
|
||||
|
||||
**Usage:**
|
||||
```bash
|
||||
./parse_audit_logs.py ./1978913325970362368/logs pod/etcd-0 configmap/cluster-config
|
||||
```
|
||||
|
||||
**Output:** JSON array of audit log entries
|
||||
|
||||
#### parse_pod_logs.py
|
||||
Parses unstructured pod logs.
|
||||
- Flexible pattern matching with forgiving regex (handles plural/singular)
|
||||
- Detects multiple timestamp formats (glog, RFC3339, common, syslog)
|
||||
- Detects log levels (info, warn, error)
|
||||
- Generates contextual summaries
|
||||
|
||||
**Usage:**
|
||||
```bash
|
||||
./parse_pod_logs.py ./1978913325970362368/logs pod/etcd-0
|
||||
```
|
||||
|
||||
**Output:** JSON array of pod log entries
|
||||
|
||||
#### generate_report.py
|
||||
Generates interactive HTML reports from parsed log data.
|
||||
- Combines audit and pod log entries
|
||||
- Sorts chronologically
|
||||
- Creates interactive timeline visualization
|
||||
- Adds filtering and search capabilities
|
||||
|
||||
**Usage:**
|
||||
```bash
|
||||
./generate_report.py \
|
||||
report_template.html \
|
||||
output.html \
|
||||
metadata.json \
|
||||
audit_entries.json \
|
||||
pod_entries.json
|
||||
```
|
||||
|
||||
### 3. Bash Script
|
||||
|
||||
#### prow_job_resource_grep.sh
|
||||
Main orchestration script that ties everything together.
|
||||
- Checks prerequisites (Python 3, gcloud)
|
||||
- Validates gcloud authentication
|
||||
- Downloads artifacts from GCS
|
||||
- Parses logs
|
||||
- Generates HTML report
|
||||
- Provides interactive prompts and progress indicators
|
||||
|
||||
**Usage:**
|
||||
```bash
|
||||
./prow_job_resource_grep.sh \
|
||||
"https://gcsweb-ci.../1978913325970362368/" \
|
||||
pod/etcd-0 \
|
||||
configmap/cluster-config
|
||||
```
|
||||
|
||||
### 4. HTML Template
|
||||
|
||||
#### report_template.html
|
||||
Modern, responsive HTML template for reports featuring:
|
||||
- Interactive SVG timeline with clickable events
|
||||
- Color-coded log levels (info=blue, warn=yellow, error=red)
|
||||
- Expandable log entry details
|
||||
- Filtering by log level
|
||||
- Search functionality
|
||||
- Statistics dashboard
|
||||
- Mobile-responsive design
|
||||
|
||||
## Resource Specification Format
|
||||
|
||||
Resources can be specified in the flexible format: `[namespace:][kind/]name`
|
||||
|
||||
**Examples:**
|
||||
- `pod/etcd-0` - pod named etcd-0 in any namespace
|
||||
- `openshift-etcd:pod/etcd-0` - pod in specific namespace
|
||||
- `deployment/cluster-version-operator` - deployment in any namespace
|
||||
- `etcd-0` - any resource named etcd-0 (no kind filter)
|
||||
- `openshift-etcd:etcd-0` - any resource in specific namespace
|
||||
|
||||
**Multiple resources:**
|
||||
```bash
|
||||
pod/etcd-0,configmap/cluster-config,openshift-etcd:secret/etcd-all-certs
|
||||
```
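
The spec format above breaks down as sketched below (illustrative only, not necessarily the scripts' exact implementation); a comma-delimited list is simply split on `,` before parsing each spec.

```python
def parse_resource_spec(spec):
    """Parse '[namespace:][kind/]name' into its optional parts."""
    namespace, _, rest = spec.rpartition(":")      # namespace is optional
    kind, _, name = rest.rpartition("/")           # kind is optional
    return {"namespace": namespace or None, "kind": kind or None, "name": name}

for spec in ["pod/etcd-0", "openshift-etcd:pod/etcd-0", "etcd-0"]:
    print(spec, "->", parse_resource_spec(spec))
# pod/etcd-0 -> {'namespace': None, 'kind': 'pod', 'name': 'etcd-0'}
# openshift-etcd:pod/etcd-0 -> {'namespace': 'openshift-etcd', 'kind': 'pod', 'name': 'etcd-0'}
# etcd-0 -> {'namespace': None, 'kind': None, 'name': 'etcd-0'}
```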
|
||||
|
||||
## Prerequisites
|
||||
|
||||
1. **Python 3** - For running parser and report generator scripts
|
||||
2. **gcloud CLI** - For downloading artifacts from GCS
|
||||
- Install: https://cloud.google.com/sdk/docs/install
|
||||
   - Authenticate (optional): `gcloud auth login` - the `test-platform-results` bucket is publicly readable, so this is usually not needed
|
||||
3. **jq** - For JSON processing (used in bash script)
|
||||
4. **Access to test-platform-results GCS bucket**
|
||||
|
||||
## Workflow
|
||||
|
||||
1. **URL Parsing**
|
||||
- Validate URL contains `test-platform-results/`
|
||||
- Extract build_id (10+ digits)
|
||||
- Extract prowjob name
|
||||
- Construct GCS paths
|
||||
|
||||
2. **Working Directory**
|
||||
- Create `{build_id}/logs/` directory
|
||||
- Check for existing artifacts (offers to skip re-download)
|
||||
|
||||
3. **prowjob.json Validation**
|
||||
- Download prowjob.json
|
||||
- Search for `--target=` pattern
|
||||
- Exit if not a ci-operator job
|
||||
|
||||
4. **Artifact Download**
|
||||
- Download audit logs: `artifacts/{target}/gather-extra/artifacts/audit_logs/**/*.log`
|
||||
- Download pod logs: `artifacts/{target}/gather-extra/artifacts/pods/**/*.log`
|
||||
|
||||
5. **Log Parsing**
|
||||
- Parse audit logs (structured JSONL)
|
||||
- Parse pod logs (unstructured text)
|
||||
- Filter by resource specifications
|
||||
- Extract timestamps and log levels
|
||||
|
||||
6. **Report Generation**
|
||||
- Sort entries chronologically
|
||||
- Calculate timeline bounds
|
||||
- Generate SVG timeline events
|
||||
- Render HTML with template
|
||||
- Output to `{build_id}/{resource-spec}.html`
|
||||
|
||||
## Output
|
||||
|
||||
### Console Output
|
||||
```
|
||||
Resource Lifecycle Analysis Complete
|
||||
|
||||
Prow Job: pull-ci-openshift-origin-main-okd-scos-e2e-aws-ovn
|
||||
Build ID: 1978913325970362368
|
||||
Target: e2e-aws-ovn
|
||||
|
||||
Resources Analyzed:
|
||||
- pod/etcd-0
|
||||
|
||||
Artifacts downloaded to: 1978913325970362368/logs/
|
||||
|
||||
Results:
|
||||
- Audit log entries: 47
|
||||
- Pod log entries: 23
|
||||
- Total entries: 70
|
||||
|
||||
Report generated: 1978913325970362368/pod_etcd-0.html
|
||||
```
|
||||
|
||||
### HTML Report
|
||||
- Header with metadata
|
||||
- Statistics dashboard
|
||||
- Interactive timeline
|
||||
- Filterable log entries
|
||||
- Expandable details
|
||||
- Search functionality
|
||||
|
||||
### Directory Structure
|
||||
```
|
||||
{build_id}/
|
||||
├── logs/
|
||||
│ ├── prowjob.json
|
||||
│ ├── metadata.json
|
||||
│ ├── audit_entries.json
|
||||
│ ├── pod_entries.json
|
||||
│ └── artifacts/
|
||||
│ └── {target}/
|
||||
│ └── gather-extra/
|
||||
│ └── artifacts/
|
||||
│ ├── audit_logs/
|
||||
│ │ └── **/*.log
|
||||
│ └── pods/
|
||||
│ └── **/*.log
|
||||
└── {resource-spec}.html
|
||||
```
|
||||
|
||||
## Performance Features
|
||||
|
||||
1. **Caching**
|
||||
- Downloaded artifacts are cached in `{build_id}/logs/`
|
||||
- Offers to skip re-download if artifacts exist
|
||||
|
||||
2. **Incremental Processing**
|
||||
- Logs processed line-by-line
|
||||
- Memory-efficient for large files
|
||||
|
||||
3. **Progress Indicators**
|
||||
- Colored output for different log levels
|
||||
- Status messages for long-running operations
|
||||
|
||||
4. **Error Handling**
|
||||
- Graceful handling of missing files
|
||||
- Helpful error messages with suggestions
|
||||
- Continues processing if some artifacts are missing
|
||||
|
||||
## Examples
|
||||
|
||||
### Single Resource
|
||||
```bash
|
||||
./prow_job_resource_grep.sh \
|
||||
"https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/30393/pull-ci-openshift-origin-main-okd-scos-e2e-aws-ovn/1978913325970362368/" \
|
||||
pod/etcd-0
|
||||
```
|
||||
|
||||
### Multiple Resources
|
||||
```bash
|
||||
./prow_job_resource_grep.sh \
|
||||
"https://gcsweb-ci.../1978913325970362368/" \
|
||||
pod/etcd-0 \
|
||||
configmap/cluster-config \
|
||||
openshift-etcd:secret/etcd-all-certs
|
||||
```
|
||||
|
||||
### Resource in Specific Namespace
|
||||
```bash
|
||||
./prow_job_resource_grep.sh \
|
||||
"https://gcsweb-ci.../1978913325970362368/" \
|
||||
openshift-cluster-version:deployment/cluster-version-operator
|
||||
```
|
||||
|
||||
## Using with Claude Code
|
||||
|
||||
When you ask Claude to analyze a Prow job, it will automatically use this skill. The skill provides detailed instructions that guide Claude through:
|
||||
- Validating prerequisites
|
||||
- Parsing URLs
|
||||
- Downloading artifacts
|
||||
- Parsing logs
|
||||
- Generating reports
|
||||
|
||||
You can simply ask:
|
||||
> "Analyze pod/etcd-0 in this Prow job: https://gcsweb-ci.../1978913325970362368/"
|
||||
|
||||
Claude will execute the workflow and generate the interactive HTML report.
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### gcloud authentication
|
||||
```bash
|
||||
gcloud auth login
|
||||
gcloud auth list # Verify active account
|
||||
```
|
||||
|
||||
### Missing artifacts
|
||||
- Verify job completed successfully
|
||||
- Check target name is correct
|
||||
- Confirm gather-extra ran in the job
|
||||
|
||||
### No matches found
|
||||
- Check resource name spelling
|
||||
- Try without kind filter
|
||||
- Verify resource existed during test run
|
||||
- Check namespace if specified
|
||||
|
||||
### Permission denied
|
||||
- Verify access to test-platform-results bucket
|
||||
- Check gcloud project configuration
|
||||
186
skills/prow-job-analyze-resource/SCRIPTS.md
Normal file
@@ -0,0 +1,186 @@
|
||||
# Prow Job Analyze Resource Scripts
|
||||
|
||||
This directory contains Python scripts to parse Prow job artifacts and generate interactive HTML reports.
|
||||
|
||||
## Scripts
|
||||
|
||||
### parse_all_logs.py
|
||||
|
||||
Parses audit logs from Prow job artifacts and outputs structured JSON.
|
||||
|
||||
**Usage:**
|
||||
```bash
|
||||
python3 parse_all_logs.py <resource_pattern> <audit_logs_directory>
|
||||
```
|
||||
|
||||
**Arguments:**
|
||||
- `resource_pattern`: Pattern to search for (e.g., "e2e-test-project-api-p28m")
|
||||
- `audit_logs_directory`: Path to audit logs directory
|
||||
|
||||
**Output:**
|
||||
- Writes JSON to stdout
|
||||
- Writes status messages to stderr (2 lines, which land at the top of the file if stderr is merged into the output with `2>&1`)
|
||||
- Use `tail -n +3` to clean the output
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
python3 plugins/prow-job/skills/prow-job-analyze-resource/parse_all_logs.py \
|
||||
e2e-test-project-api-p28m \
|
||||
.work/prow-job-analyze-resource/1964725888612306944/logs/artifacts/e2e-aws-ovn-techpreview/gather-extra/artifacts/audit_logs \
|
||||
> .work/prow-job-analyze-resource/1964725888612306944/tmp/audit_entries.json 2>&1
|
||||
|
||||
tail -n +3 .work/prow-job-analyze-resource/1964725888612306944/tmp/audit_entries.json \
|
||||
> .work/prow-job-analyze-resource/1964725888612306944/tmp/audit_entries_clean.json
|
||||
```
|
||||
|
||||
**What it does:**
|
||||
1. Recursively finds all .log files in the audit logs directory
|
||||
2. Parses each line as JSON (JSONL format)
|
||||
3. Filters entries where the resource name or namespace contains the pattern
|
||||
4. Extracts key fields: verb, user, response code, namespace, resource type, timestamp
|
||||
5. Generates human-readable summaries for each entry
|
||||
6. Outputs sorted by timestamp
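
In outline, the steps above amount to something like the following (a sketch only; the real script also generates summaries and richer fields, and `matches` stands for the regex/substring matcher described in the CHANGELOG):

```python
import json
from pathlib import Path

def find_log_files(audit_logs_dir):
    """Step 1: recursively collect every .log file under the audit logs directory."""
    return sorted(Path(audit_logs_dir).rglob("*.log"))

def iter_matching_entries(audit_logs_dir, matches):
    """Steps 2-3: parse each JSONL line and keep entries whose name or namespace matches."""
    for path in find_log_files(audit_logs_dir):
        with open(path, errors="replace") as f:
            for line in f:
                try:
                    e = json.loads(line)
                except json.JSONDecodeError:
                    continue
                ref = e.get("objectRef") or {}
                if matches(ref.get("name") or "") or matches(ref.get("namespace") or ""):
                    yield e

def sort_entries(entries):
    """Step 6: chronological order, with timestamp-less entries last."""
    return sorted(entries, key=lambda e: (not e.get("requestReceivedTimestamp"),
                                          e.get("requestReceivedTimestamp") or ""))
```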
|
||||
|
||||
### generate_html_report.py
|
||||
|
||||
Generates an interactive HTML report from parsed audit log entries.
|
||||
|
||||
**Usage:**
|
||||
```bash
|
||||
python3 generate_html_report.py <entries.json> <prowjob_name> <build_id> <target> <resource_name> <gcsweb_url>
|
||||
```
|
||||
|
||||
**Arguments:**
|
||||
- `entries.json`: Path to the cleaned JSON file from parse_all_logs.py
|
||||
- `prowjob_name`: Name of the Prow job
|
||||
- `build_id`: Build ID (numeric)
|
||||
- `target`: CI operator target name
|
||||
- `resource_name`: Primary resource name for the report
|
||||
- `gcsweb_url`: Full gcsweb URL to the Prow job
|
||||
|
||||
**Output:**
|
||||
- Creates `.work/prow-job-analyze-resource/{build_id}/{resource_name}.html`
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
python3 plugins/prow-job/skills/prow-job-analyze-resource/generate_html_report.py \
|
||||
.work/prow-job-analyze-resource/1964725888612306944/tmp/audit_entries_clean.json \
|
||||
"periodic-ci-openshift-release-master-okd-scos-4.20-e2e-aws-ovn-techpreview" \
|
||||
"1964725888612306944" \
|
||||
"e2e-aws-ovn-techpreview" \
|
||||
"e2e-test-project-api-p28mx" \
|
||||
"https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-okd-scos-4.20-e2e-aws-ovn-techpreview/1964725888612306944"
|
||||
```
|
||||
|
||||
**Features:**
|
||||
1. **Interactive Timeline**:
|
||||
- Visual timeline showing all events with color-coded severity (blue=info, yellow=warn, red=error)
|
||||
- Hover over timeline to see approximate time at cursor position
|
||||
- Click events to jump to detailed entry
|
||||
- Start/End times displayed in timeline header
|
||||
|
||||
2. **Multi-Select Filters**:
|
||||
- Filter by multiple log levels simultaneously (info/warn/error)
|
||||
- Filter by multiple verbs simultaneously (create/get/delete/etc.)
|
||||
- All levels selected by default, verbs show all when none selected
|
||||
|
||||
3. **Search**: Full-text search across summaries and content
|
||||
|
||||
4. **Expandable Details**: Click to view full JSON content for each entry
|
||||
|
||||
5. **Scroll to Top**: Floating button appears when scrolled down, smoothly returns to top
|
||||
|
||||
6. **Dark Theme**: Modern, readable dark theme optimized for long viewing sessions
|
||||
|
||||
7. **Statistics**: Summary stats showing total events, top verbs
|
||||
|
||||
**HTML Report Structure:**
|
||||
- Header with metadata (prowjob name, build ID, target, resource, GCS URL)
|
||||
- Statistics section with event counts
|
||||
- Interactive SVG timeline with:
|
||||
- Hover tooltip showing time at cursor
|
||||
- Start/End time display
|
||||
- Click events to jump to entries
|
||||
- Multi-select filter controls (level, verb, search)
|
||||
- Sorted list of entries with expandable JSON details
|
||||
- All CSS and JavaScript inline for portability
|
||||
|
||||
## Workflow
|
||||
|
||||
Complete workflow for analyzing a resource:
|
||||
|
||||
```bash
|
||||
# 1. Set variables
|
||||
BUILD_ID="1964725888612306944"
|
||||
RESOURCE_PATTERN="e2e-test-project-api-p28m"
|
||||
RESOURCE_NAME="e2e-test-project-api-p28mx"
|
||||
PROWJOB_NAME="periodic-ci-openshift-release-master-okd-scos-4.20-e2e-aws-ovn-techpreview"
|
||||
TARGET="e2e-aws-ovn-techpreview"
|
||||
GCSWEB_URL="https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/${PROWJOB_NAME}/${BUILD_ID}"
|
||||
|
||||
# 2. Create working directory
|
||||
mkdir -p .work/prow-job-analyze-resource/${BUILD_ID}/logs
|
||||
mkdir -p .work/prow-job-analyze-resource/${BUILD_ID}/tmp
|
||||
|
||||
# 3. Download prowjob.json
|
||||
gcloud storage cp \
|
||||
gs://test-platform-results/logs/${PROWJOB_NAME}/${BUILD_ID}/prowjob.json \
|
||||
.work/prow-job-analyze-resource/${BUILD_ID}/logs/prowjob.json \
|
||||
--no-user-output-enabled
|
||||
|
||||
# 4. Download audit logs
|
||||
mkdir -p .work/prow-job-analyze-resource/${BUILD_ID}/logs/artifacts/${TARGET}/gather-extra/artifacts/audit_logs
|
||||
gcloud storage cp -r \
|
||||
gs://test-platform-results/logs/${PROWJOB_NAME}/${BUILD_ID}/artifacts/${TARGET}/gather-extra/artifacts/audit_logs/ \
|
||||
.work/prow-job-analyze-resource/${BUILD_ID}/logs/artifacts/${TARGET}/gather-extra/artifacts/audit_logs/ \
|
||||
--no-user-output-enabled
|
||||
|
||||
# 5. Parse audit logs
|
||||
python3 plugins/prow-job/skills/prow-job-analyze-resource/parse_all_logs.py \
|
||||
${RESOURCE_PATTERN} \
|
||||
.work/prow-job-analyze-resource/${BUILD_ID}/logs/artifacts/${TARGET}/gather-extra/artifacts/audit_logs \
|
||||
> .work/prow-job-analyze-resource/${BUILD_ID}/tmp/audit_entries.json 2>&1
|
||||
|
||||
# 6. Clean JSON output
|
||||
tail -n +3 .work/prow-job-analyze-resource/${BUILD_ID}/tmp/audit_entries.json \
|
||||
> .work/prow-job-analyze-resource/${BUILD_ID}/tmp/audit_entries_clean.json
|
||||
|
||||
# 7. Generate HTML report
|
||||
python3 plugins/prow-job/skills/prow-job-analyze-resource/generate_html_report.py \
|
||||
.work/prow-job-analyze-resource/${BUILD_ID}/tmp/audit_entries_clean.json \
|
||||
"${PROWJOB_NAME}" \
|
||||
"${BUILD_ID}" \
|
||||
"${TARGET}" \
|
||||
"${RESOURCE_NAME}" \
|
||||
"${GCSWEB_URL}"
|
||||
|
||||
# 8. Open report in browser
|
||||
xdg-open .work/prow-job-analyze-resource/${BUILD_ID}/${RESOURCE_NAME}.html
|
||||
```
|
||||
|
||||
## Important Notes
|
||||
|
||||
1. **Pattern Matching**: The `resource_pattern` is used for substring matching. It will find resources with names containing the pattern.
|
||||
- Example: Pattern `e2e-test-project-api-p28m` matches `e2e-test-project-api-p28mx`
|
||||
|
||||
2. **Namespaces vs Projects**: In OpenShift, searching for a namespace will also find related project resources.
|
||||
|
||||
3. **JSON Cleaning**: The parse script writes its status messages to stderr. When stderr is merged into the output file with `2>&1` (as in the examples above), use `tail -n +3` to skip those first 2 lines.
|
||||
|
||||
4. **Working Directory**: All artifacts are stored in `.work/prow-job-analyze-resource/` which is in .gitignore.
|
||||
|
||||
5. **No Authentication Required**: The `test-platform-results` GCS bucket is publicly accessible.
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
**Issue**: "No log entries found matching the specified resources"
|
||||
- Check the resource name spelling
|
||||
- Try a shorter pattern (e.g., just "project-api" instead of full name)
|
||||
- Verify the resource actually exists in the job artifacts
|
||||
|
||||
**Issue**: "JSON decode error"
|
||||
- Make sure you used `tail -n +3` to clean the JSON output
|
||||
- Check that the parse script completed successfully
|
||||
|
||||
**Issue**: "Destination URL must name an existing directory"
|
||||
- Create the target directory with `mkdir -p` before running gcloud commands
|
||||
594
skills/prow-job-analyze-resource/SKILL.md
Normal file
@@ -0,0 +1,594 @@
|
||||
---
|
||||
name: Prow Job Analyze Resource
|
||||
description: Analyze Kubernetes resource lifecycle in Prow CI job artifacts by parsing audit logs and pod logs from GCS, generating interactive HTML reports with timelines
|
||||
---
|
||||
|
||||
# Prow Job Analyze Resource
|
||||
|
||||
This skill analyzes the lifecycle of Kubernetes resources during Prow CI job execution by downloading and parsing artifacts from Google Cloud Storage.
|
||||
|
||||
## When to Use This Skill
|
||||
|
||||
Use this skill when the user wants to:
|
||||
- Debug Prow CI test failures by tracking resource state changes
|
||||
- Understand when and how a Kubernetes resource was created, modified, or deleted during a test
|
||||
- Analyze resource lifecycle across audit logs and pod logs from ephemeral test clusters
|
||||
- Generate interactive HTML reports showing resource events over time
|
||||
- Search for specific resources (pods, deployments, configmaps, etc.) in Prow job artifacts
|
||||
|
||||
## Prerequisites
|
||||
|
||||
Before starting, verify these prerequisites:
|
||||
|
||||
1. **gcloud CLI Installation**
|
||||
- Check if installed: `which gcloud`
|
||||
- If not installed, provide instructions for the user's platform
|
||||
- Installation guide: https://cloud.google.com/sdk/docs/install
|
||||
|
||||
2. **gcloud Authentication (Optional)**
|
||||
- The `test-platform-results` bucket is publicly accessible
|
||||
- No authentication is required for read access
|
||||
- Skip authentication checks
|
||||
|
||||
## Input Format
|
||||
|
||||
The user will provide:
|
||||
1. **Prow job URL** - gcsweb URL containing `test-platform-results/`
|
||||
- Example: `https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/30393/pull-ci-openshift-origin-main-okd-scos-e2e-aws-ovn/1978913325970362368/`
|
||||
- URL may or may not have trailing slash
|
||||
|
||||
2. **Resource specifications** - Comma-delimited list in format `[namespace:][kind/]name`
|
||||
- Supports regex patterns for matching multiple resources
|
||||
- Examples:
|
||||
- `pod/etcd-0` - pod named etcd-0 in any namespace
|
||||
- `openshift-etcd:pod/etcd-0` - pod in specific namespace
|
||||
- `etcd-0` - any resource named etcd-0 (no kind filter)
|
||||
- `pod/etcd-0,configmap/cluster-config` - multiple resources
|
||||
- `resource-name-1|resource-name-2` - multiple resources using regex OR
|
||||
- `e2e-test-project-api-.*` - all resources matching the pattern
|
||||
|
||||
## Implementation Steps
|
||||
|
||||
### Step 1: Parse and Validate URL
|
||||
|
||||
1. **Extract bucket path**
|
||||
- Find `test-platform-results/` in URL
|
||||
- Extract everything after it as the GCS bucket relative path
|
||||
- If not found, error: "URL must contain 'test-platform-results/'"
|
||||
|
||||
2. **Extract build_id**
|
||||
- Search for pattern `/(\d{10,})/` in the bucket path
|
||||
- build_id must be at least 10 consecutive decimal digits
|
||||
- Handle URLs with or without trailing slash
|
||||
- If not found, error: "Could not find build ID (10+ digits) in URL"
|
||||
|
||||
3. **Extract prowjob name**
|
||||
- Find the path segment immediately preceding build_id
|
||||
- Example: In `.../pull-ci-openshift-origin-main-okd-scos-e2e-aws-ovn/1978913325970362368/`
|
||||
- Prowjob name: `pull-ci-openshift-origin-main-okd-scos-e2e-aws-ovn`
|
||||
|
||||
4. **Construct GCS paths**
|
||||
- Bucket: `test-platform-results`
|
||||
- Base GCS path: `gs://test-platform-results/{bucket-path}/`
|
||||
- Ensure path ends with `/`
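
A compact sketch of this parsing logic (illustrative only; it is not necessarily how the skill's scripts implement it):

```python
import re

def parse_prow_url(url):
    """Extract bucket path, build_id, and prowjob name per steps 1-4 above."""
    marker = "test-platform-results/"
    if marker not in url:
        raise ValueError("URL must contain 'test-platform-results/'")
    bucket_path = url.split(marker, 1)[1].strip("/")
    m = re.search(r"(?:^|/)(\d{10,})(?:/|$)", bucket_path)
    if not m:
        raise ValueError("Could not find build ID (10+ digits) in URL")
    build_id = m.group(1)
    segments = bucket_path.split("/")
    prowjob_name = segments[segments.index(build_id) - 1]
    return {
        "bucket_path": bucket_path,
        "build_id": build_id,
        "prowjob_name": prowjob_name,
        "gcs_base_path": f"gs://test-platform-results/{bucket_path}/",
    }

info = parse_prow_url("https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/"
                      "test-platform-results/pr-logs/pull/30393/"
                      "pull-ci-openshift-origin-main-okd-scos-e2e-aws-ovn/1978913325970362368/")
print(info["build_id"], info["prowjob_name"])
# 1978913325970362368 pull-ci-openshift-origin-main-okd-scos-e2e-aws-ovn
```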
|
||||
|
||||
### Step 2: Parse Resource Specifications
|
||||
|
||||
For each comma-delimited resource spec:
|
||||
|
||||
1. **Parse format** `[namespace:][kind/]name`
|
||||
- Split on `:` to get namespace (optional)
|
||||
- Split remaining on `/` to get kind (optional) and name (required)
|
||||
- Store as structured data: `{namespace, kind, name}`
|
||||
|
||||
2. **Validate**
|
||||
- name is required
|
||||
- namespace and kind are optional
|
||||
- Examples:
|
||||
- `pod/etcd-0` → `{kind: "pod", name: "etcd-0"}`
|
||||
- `openshift-etcd:pod/etcd-0` → `{namespace: "openshift-etcd", kind: "pod", name: "etcd-0"}`
|
||||
- `etcd-0` → `{name: "etcd-0"}`
|
||||
|
||||
### Step 3: Create Working Directory
|
||||
|
||||
1. **Check for existing artifacts first**
|
||||
- Check if `.work/prow-job-analyze-resource/{build_id}/logs/` directory exists and has content
|
||||
- If it exists with content:
|
||||
- Use AskUserQuestion tool to ask:
|
||||
- Question: "Artifacts already exist for build {build_id}. Would you like to use the existing download or re-download?"
|
||||
- Options:
|
||||
- "Use existing" - Skip to artifact parsing step (Step 6)
|
||||
- "Re-download" - Continue to clean and re-download
|
||||
- If user chooses "Re-download":
|
||||
- Remove all existing content: `rm -rf .work/prow-job-analyze-resource/{build_id}/logs/`
|
||||
- Also remove tmp directory: `rm -rf .work/prow-job-analyze-resource/{build_id}/tmp/`
|
||||
- This ensures clean state before downloading new content
|
||||
- If user chooses "Use existing":
|
||||
- Skip directly to Step 6 (Parse Audit Logs)
|
||||
- Still need to download prowjob.json if it doesn't exist
|
||||
|
||||
2. **Create directory structure**
|
||||
```bash
|
||||
mkdir -p .work/prow-job-analyze-resource/{build_id}/logs
|
||||
mkdir -p .work/prow-job-analyze-resource/{build_id}/tmp
|
||||
```
|
||||
- Use `.work/prow-job-analyze-resource/` as the base directory (already in .gitignore)
|
||||
- Use build_id as subdirectory name
|
||||
- Create `logs/` subdirectory for all downloads
|
||||
- Create `tmp/` subdirectory for temporary files (intermediate JSON, etc.)
|
||||
- Working directory: `.work/prow-job-analyze-resource/{build_id}/`
|
||||
|
||||
### Step 4: Download and Validate prowjob.json
|
||||
|
||||
1. **Download prowjob.json**
|
||||
```bash
|
||||
gcloud storage cp gs://test-platform-results/{bucket-path}/prowjob.json .work/prow-job-analyze-resource/{build_id}/logs/prowjob.json --no-user-output-enabled
|
||||
```
|
||||
|
||||
2. **Parse and validate**
|
||||
- Read `.work/prow-job-analyze-resource/{build_id}/logs/prowjob.json`
|
||||
- Search for pattern: `--target=([a-zA-Z0-9-]+)`
|
||||
- If not found:
|
||||
- Display: "This is not a ci-operator job. The prowjob cannot be analyzed by this skill."
|
||||
- Explain: ci-operator jobs have a --target argument specifying the test target
|
||||
- Exit skill
|
||||
|
||||
3. **Extract target name**
|
||||
- Capture the target value (e.g., `e2e-aws-ovn`)
|
||||
- Store for constructing gather-extra path
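
A minimal sketch of this validation, assuming the `--target=` value appears verbatim somewhere in prowjob.json as described above:

```python
import re

def extract_target(prowjob_json_path):
    """Return the ci-operator --target value, or None if this is not a ci-operator job."""
    with open(prowjob_json_path) as f:
        match = re.search(r"--target=([a-zA-Z0-9-]+)", f.read())
    return match.group(1) if match else None

target = extract_target(".work/prow-job-analyze-resource/1978913325970362368/logs/prowjob.json")
if target is None:
    print("This is not a ci-operator job. The prowjob cannot be analyzed by this skill.")
else:
    print(f"Target: {target}")  # e.g. Target: e2e-aws-ovn
```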
|
||||
|
||||
### Step 5: Download Audit Logs and Pod Logs
|
||||
|
||||
1. **Construct gather-extra paths**
|
||||
- GCS path: `gs://test-platform-results/{bucket-path}/artifacts/{target}/gather-extra/`
|
||||
- Local path: `.work/prow-job-analyze-resource/{build_id}/logs/artifacts/{target}/gather-extra/`
|
||||
|
||||
2. **Download audit logs**
|
||||
```bash
|
||||
mkdir -p .work/prow-job-analyze-resource/{build_id}/logs/artifacts/{target}/gather-extra/artifacts/audit_logs
|
||||
gcloud storage cp -r gs://test-platform-results/{bucket-path}/artifacts/{target}/gather-extra/artifacts/audit_logs/ .work/prow-job-analyze-resource/{build_id}/logs/artifacts/{target}/gather-extra/artifacts/audit_logs/ --no-user-output-enabled
|
||||
```
|
||||
- Create directory first to avoid gcloud errors
|
||||
- Use `--no-user-output-enabled` to suppress progress output
|
||||
- If directory not found, warn: "No audit logs found. Job may not have completed or audit logging may be disabled."
|
||||
|
||||
3. **Download pod logs**
|
||||
```bash
|
||||
mkdir -p .work/prow-job-analyze-resource/{build_id}/logs/artifacts/{target}/gather-extra/artifacts/pods
|
||||
gcloud storage cp -r gs://test-platform-results/{bucket-path}/artifacts/{target}/gather-extra/artifacts/pods/ .work/prow-job-analyze-resource/{build_id}/logs/artifacts/{target}/gather-extra/artifacts/pods/ --no-user-output-enabled
|
||||
```
|
||||
- Create directory first to avoid gcloud errors
|
||||
- Use `--no-user-output-enabled` to suppress progress output
|
||||
- If directory not found, warn: "No pod logs found."
|
||||
|
||||
### Step 6: Parse Audit Logs and Pod Logs
|
||||
|
||||
**IMPORTANT: Use the provided Python script `parse_all_logs.py` from the skill directory to parse both audit logs and pod logs efficiently.**
|
||||
|
||||
**Usage:**
|
||||
```bash
|
||||
python3 plugins/prow-job/skills/prow-job-analyze-resource/parse_all_logs.py <resource_pattern> \
|
||||
.work/prow-job-analyze-resource/{build_id}/logs/artifacts/{target}/gather-extra/artifacts/audit_logs \
|
||||
.work/prow-job-analyze-resource/{build_id}/logs/artifacts/{target}/gather-extra/artifacts/pods \
|
||||
> .work/prow-job-analyze-resource/{build_id}/tmp/all_entries.json
|
||||
```
|
||||
|
||||
**Resource Pattern Parameter:**
|
||||
- The `<resource_pattern>` parameter supports **regex patterns**
|
||||
- Use `|` (pipe) to search for multiple resources: `resource1|resource2|resource3`
|
||||
- Use `.*` for wildcards: `e2e-test-project-.*`
|
||||
- Simple substring matching still works: `my-namespace`
|
||||
- Examples:
|
||||
- Single resource: `e2e-test-project-api-pkjxf`
|
||||
- Multiple resources: `e2e-test-project-api-pkjxf|e2e-test-project-api-7zdxx`
|
||||
- Pattern matching: `e2e-test-project-api-.*`
|
||||
|
||||
**Note:** The script outputs status messages to stderr which will display as progress. The JSON output to stdout is clean and ready to use.
|
||||
|
||||
**What the script does:**
|
||||
|
||||
1. **Find all log files**
|
||||
- Audit logs: `.work/prow-job-analyze-resource/{build_id}/logs/artifacts/{target}/gather-extra/artifacts/audit_logs/**/*.log`
|
||||
- Pod logs: `.work/prow-job-analyze-resource/{build_id}/logs/artifacts/{target}/gather-extra/artifacts/pods/**/*.log`
|
||||
|
||||
2. **Parse audit log files (JSONL format)**
|
||||
- Read file line by line
|
||||
- Each line is a JSON object (JSONL format)
|
||||
- Parse JSON into object `e`
|
||||
|
||||
3. **Extract fields from each audit log entry**
|
||||
- `e.verb` - action (get, list, create, update, patch, delete, watch)
|
||||
- `e.user.username` - user making request
|
||||
- `e.responseStatus.code` - HTTP response code (integer)
|
||||
- `e.objectRef.namespace` - namespace (if namespaced)
|
||||
- `e.objectRef.resource` - lowercase plural kind (e.g., "pods", "configmaps")
|
||||
- `e.objectRef.name` - resource name
|
||||
- `e.requestReceivedTimestamp` - ISO 8601 timestamp
|
||||
|
||||
4. **Filter matches for each resource spec**
|
||||
- Uses **regex matching** on `e.objectRef.namespace` and `e.objectRef.name`
|
||||
- Pattern matches if found in either namespace or name field
|
||||
- Supports all regex features:
|
||||
- Pipe operator: `resource1|resource2` matches either resource
|
||||
- Wildcards: `e2e-test-.*` matches all resources starting with `e2e-test-`
|
||||
- Character classes: `[abc]` matches a, b, or c
|
||||
- Simple substring matching still works for patterns without regex special chars
|
||||
- Performance optimization: plain strings use fast substring search
|
||||
|
||||
5. **For each audit log match, capture**
|
||||
- **Source**: "audit"
|
||||
- **Filename**: Full path to .log file
|
||||
- **Line number**: Line number in file (1-indexed)
|
||||
- **Level**: Based on `e.responseStatus.code`
|
||||
- 200-299: "info"
|
||||
- 400-499: "warn"
|
||||
- 500-599: "error"
|
||||
- **Timestamp**: Parse `e.requestReceivedTimestamp` to datetime
|
||||
- **Content**: Full JSON line (for expandable details)
|
||||
- **Summary**: Generate formatted summary
|
||||
- Format: `{verb} {resource}/{name} in {namespace} by {username} → HTTP {code}`
|
||||
- Example: `create pod/etcd-0 in openshift-etcd by system:serviceaccount:kube-system:deployment-controller → HTTP 201`
|
||||
|
||||
6. **Parse pod log files (plain text format)**
|
||||
- Read file line by line
|
||||
- Each line is plain text (not JSON)
|
||||
- Search for resource pattern in line content
|
||||
|
||||
7. **For each pod log match, capture**
|
||||
- **Source**: "pod"
|
||||
- **Filename**: Full path to .log file
|
||||
- **Line number**: Line number in file (1-indexed)
|
||||
- **Level**: Detect from glog format or default to "info"
|
||||
- Glog format: `E0910 11:43:41.153414 ...` (E=error, W=warn, I=info, F=fatal→error)
|
||||
- Non-glog format: default to "info"
|
||||
- **Timestamp**: Extract from start of line if present (format: `YYYY-MM-DDTHH:MM:SS.mmmmmmZ`)
|
||||
- **Content**: Full log line
|
||||
- **Summary**: First 200 characters of line (after timestamp if present)
|
||||
|
||||
8. **Combine and sort all entries**
|
||||
- Merge audit log entries and pod log entries
|
||||
- Sort all entries chronologically by timestamp
|
||||
- Entries without timestamps are placed at the end
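
For reference, the per-entry extraction in steps 3-5 above amounts to roughly the following (a sketch; `parse_all_logs.py` is the script actually used, and its field names may differ):

```python
import json

def level_for_http_code(code):
    """Map responseStatus.code to the report's level scheme (step 5)."""
    if code is not None and 500 <= code <= 599:
        return "error"
    if code is not None and 400 <= code <= 499:
        return "warn"
    return "info"

def audit_entry(line, filename, line_number):
    """Build one report entry from a single JSONL audit line (fields from step 3)."""
    e = json.loads(line)
    ref = e.get("objectRef") or {}
    code = (e.get("responseStatus") or {}).get("code")
    summary = (f"{e.get('verb')} {ref.get('resource')}/{ref.get('name')} "
               f"in {ref.get('namespace')} by {(e.get('user') or {}).get('username')} → HTTP {code}")
    return {
        "source": "audit",
        "filename": filename,
        "line_number": line_number,
        "level": level_for_http_code(code),
        "timestamp": e.get("requestReceivedTimestamp"),
        "summary": summary,
        "content": line.rstrip("\n"),
    }
```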
|
||||
|
||||
### Step 7: Generate HTML Report
|
||||
|
||||
**IMPORTANT: Use the provided Python script `generate_html_report.py` from the skill directory.**
|
||||
|
||||
**Usage:**
|
||||
```bash
|
||||
python3 plugins/prow-job/skills/prow-job-analyze-resource/generate_html_report.py \
|
||||
.work/prow-job-analyze-resource/{build_id}/tmp/all_entries.json \
|
||||
"{prowjob_name}" \
|
||||
"{build_id}" \
|
||||
"{target}" \
|
||||
"{resource_pattern}" \
|
||||
"{gcsweb_url}"
|
||||
```
|
||||
|
||||
**Resource Pattern Parameter:**
|
||||
- The `{resource_pattern}` should be the **same pattern used in the parse script**
|
||||
- For single resources: `e2e-test-project-api-pkjxf`
|
||||
- For multiple resources: `e2e-test-project-api-pkjxf|e2e-test-project-api-7zdxx`
|
||||
- The script will parse the pattern to display the searched resources in the HTML header
|
||||
|
||||
**Output:** The script generates `.work/prow-job-analyze-resource/{build_id}/{first_resource_name}.html`
|
||||
|
||||
**What the script does:**
|
||||
|
||||
1. **Determine report filename**
|
||||
- Format: `.work/prow-job-analyze-resource/{build_id}/{resource_name}.html`
|
||||
- Uses the primary resource name for the filename
|
||||
|
||||
2. **Sort all entries by timestamp**
|
||||
- Loads audit log entries from JSON
|
||||
- Sort chronologically (ascending)
|
||||
- Entries without timestamps go at the end
|
||||
|
||||
3. **Calculate timeline bounds**
|
||||
- min_time: Earliest timestamp found
|
||||
- max_time: Latest timestamp found
|
||||
- Time range: max_time - min_time
|
||||
|
||||
4. **Generate HTML structure**
|
||||
|
||||
**Header Section:**
|
||||
```html
|
||||
<div class="header">
|
||||
<h1>Prow Job Resource Lifecycle Analysis</h1>
|
||||
<div class="metadata">
|
||||
<p><strong>Prow Job:</strong> {prowjob-name}</p>
|
||||
<p><strong>Build ID:</strong> {build_id}</p>
|
||||
<p><strong>gcsweb URL:</strong> <a href="{original-url}">{original-url}</a></p>
|
||||
<p><strong>Target:</strong> {target}</p>
|
||||
<p><strong>Resources:</strong> {resource-list}</p>
|
||||
<p><strong>Total Entries:</strong> {count}</p>
|
||||
<p><strong>Time Range:</strong> {min_time} to {max_time}</p>
|
||||
</div>
|
||||
</div>
|
||||
```
|
||||
|
||||
**Interactive Timeline:**
|
||||
```html
|
||||
<div class="timeline-container">
|
||||
<svg id="timeline" width="100%" height="100">
|
||||
<!-- For each entry, render colored vertical line -->
|
||||
<line x1="{position}%" y1="0" x2="{position}%" y2="100"
|
||||
stroke="{color}" stroke-width="2"
|
||||
class="timeline-event" data-entry-id="{entry-id}"
|
||||
title="{summary}">
|
||||
</line>
|
||||
</svg>
|
||||
</div>
|
||||
```
|
||||
- Position: Calculate percentage based on timestamp between min_time and max_time
|
||||
- Color: white/lightgray (info), yellow (warn), red (error)
|
||||
- Clickable: Jump to corresponding entry
|
||||
- Tooltip on hover: Show summary
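
The position calculation mentioned above reduces to a simple linear interpolation between the earliest and latest timestamps; a sketch:

```python
from datetime import datetime

def timeline_position_pct(ts, min_time, max_time):
    """Horizontal position (0-100%) of an event between min_time and max_time."""
    total = (max_time - min_time).total_seconds() or 1.0  # guard against a single-event report
    return 100.0 * (ts - min_time).total_seconds() / total

start = datetime.fromisoformat("2025-09-10T18:11:02")
end = datetime.fromisoformat("2025-09-10T18:17:32")
event = datetime.fromisoformat("2025-09-10T18:14:17")
print(round(timeline_position_pct(event, start, end), 1))  # 50.0
```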
|
||||
|
||||
**Log Entries Section:**
|
||||
```html
|
||||
<div class="entries">
|
||||
<div class="filters">
|
||||
<!-- Filter controls: by level, by resource, by time range -->
|
||||
</div>
|
||||
|
||||
<div class="entry" id="entry-{index}">
|
||||
<div class="entry-header">
|
||||
<span class="timestamp">{formatted-timestamp}</span>
|
||||
<span class="level badge-{level}">{level}</span>
|
||||
<span class="source">{filename}:{line-number}</span>
|
||||
</div>
|
||||
<div class="entry-summary">{summary}</div>
|
||||
<details class="entry-details">
|
||||
<summary>Show full content</summary>
|
||||
<pre><code>{content}</code></pre>
|
||||
</details>
|
||||
</div>
|
||||
</div>
|
||||
```
|
||||
|
||||
**CSS Styling:**
|
||||
- Modern, clean design with good contrast
|
||||
- Responsive layout
|
||||
- Badge colors: info=gray, warn=yellow, error=red
|
||||
- Monospace font for log content
|
||||
- Syntax highlighting for JSON (in audit logs)
|
||||
|
||||
**JavaScript Interactivity:**
|
||||
```javascript
|
||||
// Timeline click handler
|
||||
document.querySelectorAll('.timeline-event').forEach(el => {
|
||||
el.addEventListener('click', () => {
|
||||
const entryId = el.dataset.entryId;
|
||||
document.getElementById(entryId).scrollIntoView({behavior: 'smooth'});
|
||||
});
|
||||
});
|
||||
|
||||
// Filter controls
|
||||
// Expand/collapse details
|
||||
// Search within entries
|
||||
```
|
||||
|
||||
5. **Write HTML to file**
|
||||
- Script automatically writes to `.work/prow-job-analyze-resource/{build_id}/{resource_name}.html`
|
||||
- Includes proper HTML5 structure
|
||||
- All CSS and JavaScript are inline for portability
|
||||
|
||||
### Step 8: Present Results to User
|
||||
|
||||
1. **Display summary**
|
||||
```
|
||||
Resource Lifecycle Analysis Complete
|
||||
|
||||
Prow Job: {prowjob-name}
|
||||
Build ID: {build_id}
|
||||
Target: {target}
|
||||
|
||||
Resources Analyzed:
|
||||
- {resource-spec-1}
|
||||
- {resource-spec-2}
|
||||
...
|
||||
|
||||
Artifacts downloaded to: .work/prow-job-analyze-resource/{build_id}/logs/
|
||||
|
||||
Results:
|
||||
- Audit log entries: {audit-count}
|
||||
- Pod log entries: {pod-count}
|
||||
- Total entries: {total-count}
|
||||
- Time range: {min_time} to {max_time}
|
||||
|
||||
Report generated: .work/prow-job-analyze-resource/{build_id}/{resource_name}.html
|
||||
|
||||
Open in browser to view interactive timeline and detailed entries.
|
||||
```
|
||||
|
||||
2. **Open report in browser**
|
||||
- Detect platform and automatically open the HTML report in the default browser
|
||||
- Linux: `xdg-open .work/prow-job-analyze-resource/{build_id}/{resource_name}.html`
|
||||
- macOS: `open .work/prow-job-analyze-resource/{build_id}/{resource_name}.html`
|
||||
- Windows: `start .work/prow-job-analyze-resource/{build_id}/{resource_name}.html`
|
||||
- On Linux (most common for this environment), use `xdg-open`
|
||||
|
||||
3. **Offer next steps**
|
||||
- Ask if user wants to search for additional resources in the same job
|
||||
- Ask if user wants to analyze a different Prow job
|
||||
- Explain that artifacts are cached in `.work/prow-job-analyze-resource/{build_id}/` for faster subsequent searches
|
||||
|
||||
## Error Handling
|
||||
|
||||
Handle these error scenarios gracefully:
|
||||
|
||||
1. **Invalid URL format**
|
||||
- Error: "URL must contain 'test-platform-results/' substring"
|
||||
- Provide example of valid URL
|
||||
|
||||
2. **Build ID not found**
|
||||
- Error: "Could not find build ID (10+ decimal digits) in URL path"
|
||||
- Explain requirement and show URL parsing
|
||||
|
||||
3. **gcloud not installed**
|
||||
- Detect with: `which gcloud`
|
||||
- Provide installation instructions for user's platform
|
||||
- Link: https://cloud.google.com/sdk/docs/install
|
||||
|
||||
4. **gcloud not authenticated**
   - Normally not an issue - the `test-platform-results` bucket is publicly readable
   - If gcloud still refuses access, detect with: `gcloud auth list`
   - Instruct: "Please run: gcloud auth login"
|
||||
|
||||
5. **No access to bucket**
|
||||
- Error from gcloud storage commands
|
||||
- Explain: "You need read access to the test-platform-results GCS bucket"
|
||||
- Suggest checking project access
|
||||
|
||||
6. **prowjob.json not found**
|
||||
- Suggest verifying URL and checking if job completed
|
||||
- Provide gcsweb URL for manual verification
|
||||
|
||||
7. **Not a ci-operator job**
|
||||
- Error: "This is not a ci-operator job. No --target found in prowjob.json."
|
||||
- Explain: Only ci-operator jobs can be analyzed by this skill
|
||||
|
||||
8. **gather-extra not found**
|
||||
- Warn: "gather-extra directory not found for target {target}"
|
||||
- Suggest: Job may not have completed or target name is incorrect
|
||||
|
||||
9. **No matches found**
|
||||
- Display: "No log entries found matching the specified resources"
|
||||
- Suggest:
|
||||
- Check resource names for typos
|
||||
- Try searching without kind or namespace filters
|
||||
- Verify resources existed during this job execution
|
||||
|
||||
10. **Timestamp parsing failures**
|
||||
- Warn about unparseable timestamps
|
||||
- Fall back to line order for sorting
|
||||
- Still include entries in report
|
||||
|
||||
## Performance Considerations
|
||||
|
||||
1. **Avoid re-downloading**
|
||||
- Check if `.work/prow-job-analyze-resource/{build_id}/logs/` already has content
|
||||
- Ask user before re-downloading
|
||||
|
||||
2. **Efficient downloads**
|
||||
- Use `gcloud storage cp -r` for recursive downloads
|
||||
- Use `--no-user-output-enabled` to suppress verbose output
|
||||
- Create target directories with `mkdir -p` before downloading to avoid gcloud errors
|
||||
|
||||
3. **Memory efficiency**
|
||||
- The `parse_all_logs.py` script processes log files incrementally (line by line)
|
||||
- Don't load entire files into memory
|
||||
- Script outputs to JSON for efficient HTML generation
|
||||
|
||||
4. **Content length limits**
|
||||
- The HTML generator trims JSON content to ~2000 chars in display
|
||||
- Full content is available in expandable details sections
|
||||
|
||||
5. **Progress indicators**
|
||||
- Show "Downloading audit logs..." before gcloud commands
|
||||
- Show "Parsing audit logs..." before running parse script
|
||||
- Show "Generating HTML report..." before running report generator
|
||||
|
||||
## Examples
|
||||
|
||||
### Example 1: Search for a namespace/project
|
||||
```
|
||||
User: "Analyze e2e-test-project-api-p28m in this Prow job: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-okd-scos-4.20-e2e-aws-ovn-techpreview/1964725888612306944"
|
||||
|
||||
Output:
|
||||
- Downloads artifacts to: .work/prow-job-analyze-resource/1964725888612306944/logs/
|
||||
- Finds actual resource name: e2e-test-project-api-p28mx (namespace)
|
||||
- Parses 382 audit log entries
|
||||
- Finds 86 pod log mentions
|
||||
- Creates: .work/prow-job-analyze-resource/1964725888612306944/e2e-test-project-api-p28mx.html
|
||||
- Shows timeline from creation (18:11:02) to deletion (18:17:32)
|
||||
```
|
||||
|
||||
### Example 2: Search for a pod
|
||||
```
|
||||
User: "Analyze pod/etcd-0 in this Prow job: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/30393/pull-ci-openshift-origin-main-okd-scos-e2e-aws-ovn/1978913325970362368/"
|
||||
|
||||
Output:
|
||||
- Creates: .work/prow-job-analyze-resource/1978913325970362368/etcd-0.html
|
||||
- Shows timeline of all pod/etcd-0 events across namespaces
|
||||
```
|
||||
|
||||
### Example 3: Search by name only
|
||||
```
|
||||
User: "Find all resources named cluster-version-operator in job {url}"
|
||||
|
||||
Output:
|
||||
- Searches without kind filter
|
||||
- Finds deployments, pods, services, etc. all named cluster-version-operator
|
||||
- Creates: .work/prow-job-analyze-resource/{build_id}/cluster-version-operator.html
|
||||
```
|
||||
|
||||
### Example 4: Search for multiple resources using regex
|
||||
```
|
||||
User: "Analyze e2e-test-project-api-pkjxf and e2e-test-project-api-7zdxx in job {url}"
|
||||
|
||||
Output:
|
||||
- Uses regex pattern: `e2e-test-project-api-pkjxf|e2e-test-project-api-7zdxx`
|
||||
- Finds all events for both namespaces in a single pass
|
||||
- Parses 1,047 total entries (501 for first namespace, 546 for second)
|
||||
- Passes the same pattern to generate_html_report.py
|
||||
- HTML displays: "Resources: e2e-test-project-api-7zdxx, e2e-test-project-api-pkjxf"
|
||||
- Creates: .work/prow-job-analyze-resource/{build_id}/e2e-test-project-api-pkjxf.html
|
||||
- Timeline shows interleaved events from both namespaces chronologically
|
||||
```
|
||||
|
||||
## Tips
|
||||
|
||||
- Always verify gcloud prerequisites before starting (gcloud CLI must be installed)
|
||||
- Authentication is NOT required - the bucket is publicly accessible
|
||||
- Use `.work/prow-job-analyze-resource/{build_id}/` directory structure for organization
|
||||
- All work files are in `.work/` which is already in .gitignore
|
||||
- The Python scripts handle all parsing and HTML generation - use them!
|
||||
- Cache artifacts in `.work/prow-job-analyze-resource/{build_id}/` to speed up subsequent searches
|
||||
- The parse script supports **regex patterns** for flexible matching:
|
||||
- Use `resource1|resource2` to search for multiple resources in a single pass
|
||||
- Use `.*` wildcards to match resource name patterns
|
||||
- Simple substring matching still works for basic searches
|
||||
- The resource name provided by the user may not exactly match the actual resource name in logs
|
||||
- Example: User asks for `e2e-test-project-api-p28m` but actual resource is `e2e-test-project-api-p28mx`
|
||||
- Use regex patterns like `e2e-test-project-api-p28m.*` to find partial matches
|
||||
- For namespaces/projects, search for the resource name - it will match both `namespace` and `project` resources
|
||||
- Provide helpful error messages with actionable solutions
|
||||
|
||||
## Important Notes
|
||||
|
||||
1. **Resource Name Matching:**
|
||||
- The parse script uses **regex pattern matching** for maximum flexibility
|
||||
- Supports pipe operator (`|`) to search for multiple resources: `resource1|resource2`
|
||||
- Supports wildcards (`.*`) for pattern matching: `e2e-test-.*`
|
||||
- Simple substrings still work for basic searches
|
||||
- May match multiple related resources (e.g., namespace, project, rolebindings in that namespace)
|
||||
- Report all matches - this provides complete lifecycle context
|
||||
|
||||
2. **Namespace vs Project:**
|
||||
- In OpenShift, a `project` is essentially a `namespace` with additional metadata
|
||||
- Searching for a namespace will find both namespace and project resources
|
||||
- The audit logs contain events for both resource types
|
||||
|
||||
3. **Target Extraction:**
|
||||
- Must extract the `--target` argument from prowjob.json
|
||||
- This is critical for finding the correct gather-extra path
|
||||
- Non-ci-operator jobs cannot be analyzed (they don't have --target)
|
||||
|
||||
4. **Working with Scripts:**
|
||||
- All scripts are in `plugins/prow-job/skills/prow-job-analyze-resource/`
|
||||
- `parse_all_logs.py` - Parses audit logs and pod logs, outputs JSON
|
||||
- Detects glog severity levels (E=error, W=warn, I=info, F=fatal)
|
||||
- Supports regex patterns for resource matching
|
||||
- `generate_html_report.py` - Generates interactive HTML report from JSON
|
||||
- Scripts output status messages to stderr for progress display. JSON output to stdout is clean.
|
||||
|
||||
5. **Pod Log Glog Format Support:**
|
||||
- The parser automatically detects and parses glog format logs
|
||||
- Glog format: `E0910 11:43:41.153414 ...`
|
||||
- `E` = severity (E/F → error, W → warn, I → info)
|
||||
- `0910` = month/day (MMDD)
|
||||
- `11:43:41.153414` = time with microseconds
|
||||
- Timestamp parsing: Extracts timestamp and infers year (2025)
|
||||
- Severity mapping allows filtering by level in HTML report
|
||||
- Non-glog logs default to info level
|
||||
421
skills/prow-job-analyze-resource/create_context_html_files.py
Normal file
@@ -0,0 +1,421 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Create HTML files for log viewing with line numbers, regex filtering, and line selection.
|
||||
For files >1MB, creates context files with ±1000 lines around each referenced line.
|
||||
"""
|
||||
|
||||
import os
|
||||
import sys
|
||||
import json
|
||||
import hashlib
|
||||
from pathlib import Path
|
||||
from collections import defaultdict
|
||||
|
||||
# HTML template for viewing log files
|
||||
HTML_TEMPLATE = '''<!DOCTYPE html>
|
||||
<html>
|
||||
<head>
|
||||
<meta charset="UTF-8">
|
||||
<title>{title}</title>
|
||||
<style>
|
||||
body {{
|
||||
margin: 0;
|
||||
padding: 0;
|
||||
background: #161b22;
|
||||
color: #c9d1d9;
|
||||
font-family: ui-monospace, SFMono-Regular, 'SF Mono', Monaco, 'Cascadia Mono', 'Segoe UI Mono', monospace;
|
||||
font-size: 12px;
|
||||
line-height: 1.5;
|
||||
}}
|
||||
.filter-bar {{
|
||||
position: sticky;
|
||||
top: 0;
|
||||
background: #0d1117;
|
||||
border-bottom: 1px solid #30363d;
|
||||
padding: 8px 16px;
|
||||
z-index: 100;
|
||||
}}
|
||||
.filter-input-wrapper {{
|
||||
position: relative;
|
||||
display: flex;
|
||||
gap: 8px;
|
||||
}}
|
||||
.filter-input {{
|
||||
flex: 1;
|
||||
padding: 6px 10px;
|
||||
background: #0d1117;
|
||||
border: 1px solid #30363d;
|
||||
border-radius: 4px;
|
||||
color: #c9d1d9;
|
||||
font-size: 12px;
|
||||
font-family: ui-monospace, SFMono-Regular, monospace;
|
||||
}}
|
||||
.filter-input:focus {{
|
||||
outline: none;
|
||||
border-color: #58a6ff;
|
||||
}}
|
||||
.clear-btn {{
|
||||
padding: 6px 12px;
|
||||
background: #21262d;
|
||||
border: 1px solid #30363d;
|
||||
border-radius: 4px;
|
||||
color: #c9d1d9;
|
||||
cursor: pointer;
|
||||
font-size: 12px;
|
||||
white-space: nowrap;
|
||||
}}
|
||||
.clear-btn:hover {{
|
||||
background: #30363d;
|
||||
border-color: #58a6ff;
|
||||
}}
|
||||
.filter-error {{
|
||||
color: #f85149;
|
||||
font-size: 11px;
|
||||
margin-top: 4px;
|
||||
display: none;
|
||||
}}
|
||||
.filter-error.visible {{
|
||||
display: block;
|
||||
}}
|
||||
.context-notice {{
|
||||
background: #1c2128;
|
||||
border: 1px solid #30363d;
|
||||
border-radius: 4px;
|
||||
padding: 8px 12px;
|
||||
margin: 8px 16px;
|
||||
color: #8b949e;
|
||||
font-size: 11px;
|
||||
}}
|
||||
.context-notice strong {{
|
||||
color: #58a6ff;
|
||||
}}
|
||||
.content-wrapper {{
|
||||
padding: 16px;
|
||||
}}
|
||||
pre {{
|
||||
margin: 0;
|
||||
white-space: pre-wrap;
|
||||
word-wrap: break-word;
|
||||
}}
|
||||
.line-number {{
|
||||
color: #6e7681;
|
||||
user-select: none;
|
||||
margin-right: 16px;
|
||||
display: inline-block;
|
||||
}}
|
||||
.line {{
|
||||
display: block;
|
||||
cursor: pointer;
|
||||
}}
|
||||
.line:hover {{
|
||||
background: rgba(139, 148, 158, 0.1);
|
||||
}}
|
||||
.line.hidden {{
|
||||
display: none;
|
||||
}}
|
||||
.line.match {{
|
||||
background: rgba(88, 166, 255, 0.15);
|
||||
}}
|
||||
.line.selected {{
|
||||
background: rgba(187, 128, 9, 0.25);
|
||||
border-left: 3px solid #d29922;
|
||||
padding-left: 13px;
|
||||
}}
|
||||
.line.selected.match {{
|
||||
background: rgba(187, 128, 9, 0.25);
|
||||
}}
|
||||
</style>
|
||||
</head>
|
||||
<body>
|
||||
<div class="filter-bar">
|
||||
<div class="filter-input-wrapper">
|
||||
<input type="text" class="filter-input" id="filter" placeholder="Filter lines by regex (e.g., error|warning, ^INFO.*)">
|
||||
<button class="clear-btn" id="clear-btn" title="Clear filter (Ctrl+C)">Clear</button>
|
||||
</div>
|
||||
<div class="filter-error" id="filter-error">Invalid regex pattern</div>
|
||||
</div>
|
||||
{context_notice}
|
||||
<div class="content-wrapper">
|
||||
<pre id="content">{content}</pre>
|
||||
</div>
|
||||
<script>
|
||||
const filterInput = document.getElementById('filter');
|
||||
const filterError = document.getElementById('filter-error');
|
||||
const clearBtn = document.getElementById('clear-btn');
|
||||
const content = document.getElementById('content');
|
||||
let filterTimeout;
|
||||
let selectedLine = null;
|
||||
|
||||
// Wrap each line in a span for filtering
|
||||
const lines = content.innerHTML.split('\\n');
|
||||
const wrappedLines = lines.map(line => `<span class="line">${{line}}</span>`).join('');
|
||||
content.innerHTML = wrappedLines;
|
||||
|
||||
// Line selection handler
|
||||
content.addEventListener('click', function(e) {{
|
||||
const clickedLine = e.target.closest('.line');
|
||||
if (clickedLine) {{
|
||||
// Remove previous selection
|
||||
if (selectedLine) {{
|
||||
selectedLine.classList.remove('selected');
|
||||
}}
|
||||
// Select new line
|
||||
selectedLine = clickedLine;
|
||||
selectedLine.classList.add('selected');
|
||||
}}
|
||||
}});
|
||||
|
||||
// Clear filter function
|
||||
function clearFilter() {{
|
||||
filterInput.value = '';
|
||||
filterError.classList.remove('visible');
|
||||
|
||||
const lineElements = content.querySelectorAll('.line');
|
||||
lineElements.forEach(line => {{
|
||||
line.classList.remove('hidden', 'match');
|
||||
}});
|
||||
|
||||
// Scroll to selected line if exists
|
||||
if (selectedLine) {{
|
||||
selectedLine.scrollIntoView({{ behavior: 'smooth', block: 'center' }});
|
||||
}}
|
||||
}}
|
||||
|
||||
// Clear button click handler
|
||||
clearBtn.addEventListener('click', clearFilter);
|
||||
|
||||
// Ctrl+C hotkey to clear filter
|
||||
document.addEventListener('keydown', function(e) {{
|
||||
if (e.ctrlKey && e.key === 'c') {{
|
||||
// Only clear if filter input is not focused (to allow normal copy)
|
||||
if (document.activeElement !== filterInput) {{
|
||||
e.preventDefault();
|
||||
clearFilter();
|
||||
}}
|
||||
}}
|
||||
}});
|
||||
|
||||
// Filter input handler
|
||||
filterInput.addEventListener('input', function() {{
|
||||
clearTimeout(filterTimeout);
|
||||
filterTimeout = setTimeout(() => {{
|
||||
const pattern = this.value.trim();
|
||||
const lineElements = content.querySelectorAll('.line');
|
||||
|
||||
if (!pattern) {{
|
||||
// Show all lines
|
||||
lineElements.forEach(line => {{
|
||||
line.classList.remove('hidden', 'match');
|
||||
}});
|
||||
filterError.classList.remove('visible');
|
||||
|
||||
// Scroll to selected line if exists
|
||||
if (selectedLine) {{
|
||||
selectedLine.scrollIntoView({{ behavior: 'smooth', block: 'center' }});
|
||||
}}
|
||||
return;
|
||||
}}
|
||||
|
||||
try {{
|
||||
const regex = new RegExp(pattern);
|
||||
filterError.classList.remove('visible');
|
||||
|
||||
lineElements.forEach(line => {{
|
||||
// Get text content without line number span
|
||||
const textContent = line.textContent;
|
||||
const lineNumberMatch = textContent.match(/^\\s*\\d+\\s+/);
|
||||
const actualContent = lineNumberMatch ? textContent.substring(lineNumberMatch[0].length) : textContent;
|
||||
|
||||
if (regex.test(actualContent)) {{
|
||||
line.classList.remove('hidden');
|
||||
line.classList.add('match');
|
||||
}} else {{
|
||||
line.classList.add('hidden');
|
||||
line.classList.remove('match');
|
||||
}}
|
||||
}});
|
||||
}} catch (e) {{
|
||||
filterError.classList.add('visible');
|
||||
}}
|
||||
}}, 300);
|
||||
}});
|
||||
|
||||
// Select line from hash on load
|
||||
if (window.location.hash) {{
|
||||
const lineNum = window.location.hash.substring(1).replace('line-', '');
|
||||
const lineNumElement = document.getElementById('linenum-' + lineNum);
|
||||
if (lineNumElement) {{
|
||||
const lineElement = lineNumElement.closest('.line');
|
||||
if (lineElement) {{
|
||||
selectedLine = lineElement;
|
||||
selectedLine.classList.add('selected');
|
||||
setTimeout(() => {{
|
||||
selectedLine.scrollIntoView({{ behavior: 'smooth', block: 'center' }});
|
||||
}}, 100);
|
||||
}}
|
||||
}}
|
||||
}}
|
||||
</script>
|
||||
</body>
|
||||
</html>
|
||||
'''
|
||||
|
||||
def create_html_for_file(file_path, logs_dir, build_id, line_numbers=None, context_lines=1000):
|
||||
"""
|
||||
Create an HTML file for viewing a log file.
|
||||
|
||||
Args:
|
||||
file_path: Absolute path to the log file
|
||||
logs_dir: Base logs directory
|
||||
build_id: Build ID
|
||||
line_numbers: List of line numbers to include (for large files). If None, includes all lines.
|
||||
context_lines: Number of lines before/after each line_number to include (default 1000)
|
||||
|
||||
Returns:
|
||||
Tuple of (relative_path_key, html_file_path) or None if file should be skipped
|
||||
"""
|
||||
file_size = os.path.getsize(file_path)
|
||||
relative_path = os.path.relpath(file_path, logs_dir)
|
||||
|
||||
# For small files (<1MB), create full HTML
|
||||
if file_size < 1024 * 1024:
|
||||
line_numbers = None # Include all lines
|
||||
|
||||
# Read the file and extract lines
|
||||
try:
|
||||
with open(file_path, 'r', encoding='utf-8', errors='replace') as f:
|
||||
all_lines = f.readlines()
|
||||
except Exception as e:
|
||||
print(f"Error reading {file_path}: {e}", file=sys.stderr)
|
||||
return None
|
||||
|
||||
# Determine which lines to include
|
||||
if line_numbers is None:
|
||||
# Include all lines (small file)
|
||||
lines_to_include = set(range(1, len(all_lines) + 1))
|
||||
context_notice = ''
|
||||
else:
|
||||
# Include context around each line number (large file)
|
||||
lines_to_include = set()
|
||||
for line_num in line_numbers:
|
||||
start = max(1, line_num - context_lines)
|
||||
end = min(len(all_lines), line_num + context_lines)
|
||||
lines_to_include.update(range(start, end + 1))
|
||||
|
||||
# Create context notice
|
||||
line_ranges = []
|
||||
sorted_lines = sorted(lines_to_include)
|
||||
if sorted_lines:
|
||||
range_start = sorted_lines[0]
|
||||
range_end = sorted_lines[0]
|
||||
for line in sorted_lines[1:]:
|
||||
if line == range_end + 1:
|
||||
range_end = line
|
||||
else:
|
||||
line_ranges.append(f"{range_start}-{range_end}" if range_start != range_end else str(range_start))
|
||||
range_start = line
|
||||
range_end = line
|
||||
line_ranges.append(f"{range_start}-{range_end}" if range_start != range_end else str(range_start))
|
||||
|
||||
context_notice = f'''<div class="context-notice">
|
||||
<strong>Context View:</strong> Showing {len(lines_to_include):,} of {len(all_lines):,} lines
|
||||
(±{context_lines} lines around {len(line_numbers)} reference points).
|
||||
Full file is {file_size / (1024 * 1024):.1f}MB.
|
||||
</div>'''
|
||||
|
||||
# Build HTML content with line numbers
|
||||
html_lines = []
|
||||
for i, line in enumerate(all_lines, 1):
|
||||
if i in lines_to_include:
|
||||
# Escape HTML characters
|
||||
line_content = line.rstrip('\n').replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;')
|
||||
html_lines.append(f'<span class="line-number" id="linenum-{i}">{i:>5}</span> {line_content}')
|
||||
|
||||
content = '\n'.join(html_lines)
|
||||
|
||||
# Generate unique filename based on content and line selection
|
||||
if line_numbers is None:
|
||||
# For full files, use simple hash of path
|
||||
hash_str = hashlib.md5(relative_path.encode()).hexdigest()[:8]
|
||||
suffix = ''
|
||||
else:
|
||||
# For context files, include line numbers in hash
|
||||
hash_input = f"{relative_path}:{','.join(map(str, sorted(line_numbers)))}"
|
||||
hash_str = hashlib.md5(hash_input.encode()).hexdigest()[:8]
|
||||
suffix = f"-ctx{len(line_numbers)}"
|
||||
|
||||
filename = os.path.basename(file_path)
|
||||
html_filename = f"{filename}{suffix}.{hash_str}.html"
|
||||
|
||||
# Create _links directory
|
||||
links_dir = os.path.join(logs_dir, "_links")
|
||||
os.makedirs(links_dir, exist_ok=True)
|
||||
|
||||
html_path = os.path.join(links_dir, html_filename)
|
||||
relative_html_path = f"logs/_links/{html_filename}"
|
||||
|
||||
# Generate HTML
|
||||
title = filename
|
||||
html = HTML_TEMPLATE.format(
|
||||
title=title,
|
||||
context_notice=context_notice,
|
||||
content=content
|
||||
)
|
||||
|
||||
# Write HTML file
|
||||
with open(html_path, 'w', encoding='utf-8') as f:
|
||||
f.write(html)
|
||||
|
||||
return (relative_path, relative_html_path)
|
||||
|
||||
def main():
|
||||
if len(sys.argv) < 3:
|
||||
print("Usage: create_context_html_files.py <logs_dir> <build_id> [entries_json]", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
logs_dir = sys.argv[1]
|
||||
build_id = sys.argv[2]
|
||||
entries_json = sys.argv[3] if len(sys.argv) > 3 else None
|
||||
|
||||
# Load entries to get line numbers per file
|
||||
file_line_numbers = defaultdict(set)
|
||||
if entries_json:
|
||||
with open(entries_json, 'r') as f:
|
||||
entries = json.load(f)
|
||||
|
||||
for entry in entries:
|
||||
filename = entry.get('filename', '')
|
||||
line_num = entry.get('line_number', 0)
|
||||
if filename and line_num:
|
||||
file_line_numbers[filename].add(line_num)
|
||||
|
||||
# Collect all log files
|
||||
log_files = []
|
||||
for root, dirs, files in os.walk(logs_dir):
|
||||
# Skip _links directory
|
||||
if '_links' in root:
|
||||
continue
|
||||
for file in files:
|
||||
if file.endswith('.log') or file.endswith('.jsonl'):
|
||||
log_files.append(os.path.join(root, file))
|
||||
|
||||
# Create HTML files
|
||||
file_mapping = {}
|
||||
for log_file in log_files:
|
||||
# Get line numbers for this file (if any)
|
||||
line_nums = file_line_numbers.get(log_file)
|
||||
if line_nums:
|
||||
line_nums = sorted(list(line_nums))
|
||||
else:
|
||||
line_nums = None
|
||||
|
||||
result = create_html_for_file(log_file, logs_dir, build_id, line_nums)
|
||||
if result:
|
||||
relative_path, html_path = result
|
||||
file_mapping[relative_path] = html_path
|
||||
|
||||
# Output JSON mapping to stdout
|
||||
print(json.dumps(file_mapping, indent=2))
|
||||
|
||||
if __name__ == '__main__':
|
||||
main()
|
||||
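To make the large-file handling above concrete, here is a small standalone sketch of the ±N-line context selection performed by `create_html_for_file` (the line numbers are illustrative only):

```python
# Illustrative only: mirrors the context-window selection used for files over 1MB.
def select_context(line_numbers, total_lines, context=1000):
    keep = set()
    for n in line_numbers:
        keep.update(range(max(1, n - context), min(total_lines, n + context) + 1))
    return keep

# For a 10,000-line file with referenced lines 1200 and 1500, the two ±1000 windows
# overlap and merge into a single contiguous range 200-2500.
selected = select_context([1200, 1500], total_lines=10_000)
assert min(selected) == 200 and max(selected) == 2500 and len(selected) == 2301
```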
333
skills/prow-job-analyze-resource/create_inline_html_files.py
Normal file
@@ -0,0 +1,333 @@
|
||||
#!/usr/bin/env python3
|
||||
"""Create HTML files with line numbers for inline viewing."""
|
||||
|
||||
import os
|
||||
import sys
|
||||
import hashlib
|
||||
import html as html_module
|
||||
from pathlib import Path
|
||||
|
||||
|
||||
def create_html_files_for_logs(logs_dir, build_id):
|
||||
"""Create .html files with line numbers for log files under 1MB."""
|
||||
MAX_INLINE_SIZE = 1 * 1024 * 1024 # 1MB
|
||||
links_dir = os.path.join(logs_dir, '_links')
|
||||
|
||||
# Create _links directory if it doesn't exist
|
||||
os.makedirs(links_dir, exist_ok=True)
|
||||
|
||||
html_count = 0
|
||||
file_mapping = {} # Map from original path to HTML path
|
||||
|
||||
# Walk through all log files
|
||||
for root, dirs, filenames in os.walk(logs_dir):
|
||||
# Skip the _links directory itself
|
||||
if '_links' in root:
|
||||
continue
|
||||
|
||||
for filename in filenames:
|
||||
file_path = os.path.join(root, filename)
|
||||
|
||||
try:
|
||||
# Get file size
|
||||
size = os.path.getsize(file_path)
|
||||
|
||||
if size < MAX_INLINE_SIZE:
|
||||
# Get relative path from logs_dir
|
||||
rel_path = os.path.relpath(file_path, logs_dir)
|
||||
|
||||
# Generate unique HTML name by hashing the full path
|
||||
path_hash = hashlib.md5(rel_path.encode()).hexdigest()[:8]
|
||||
html_name = f"{filename}.{path_hash}.html"
|
||||
html_path = os.path.join(links_dir, html_name)
|
||||
|
||||
# Read original file content
|
||||
with open(file_path, 'r', encoding='utf-8', errors='replace') as f:
|
||||
content = f.read()
|
||||
|
||||
# Split into lines and add line numbers
|
||||
lines = content.split('\n')
|
||||
line_count = len(lines)
|
||||
line_number_width = len(str(line_count))
|
||||
|
||||
# Build content with line numbers
|
||||
numbered_lines = []
|
||||
for i, line in enumerate(lines, 1):
|
||||
escaped_line = html_module.escape(line)
|
||||
line_num = str(i).rjust(line_number_width)
|
||||
numbered_lines.append(f'<span class="line-number" id="linenum-{i}">{line_num}</span> {escaped_line}')
|
||||
|
||||
numbered_content = '\n'.join(numbered_lines)
|
||||
|
||||
# Wrap in HTML
|
||||
html_content = f'''<!DOCTYPE html>
|
||||
<html>
|
||||
<head>
|
||||
<meta charset="UTF-8">
|
||||
<title>{html_module.escape(filename)}</title>
|
||||
<style>
|
||||
body {{
|
||||
margin: 0;
|
||||
padding: 0;
|
||||
background: #161b22;
|
||||
color: #c9d1d9;
|
||||
font-family: ui-monospace, SFMono-Regular, 'SF Mono', Monaco, 'Cascadia Mono', 'Segoe UI Mono', monospace;
|
||||
font-size: 12px;
|
||||
line-height: 1.5;
|
||||
}}
|
||||
.filter-bar {{
|
||||
position: sticky;
|
||||
top: 0;
|
||||
background: #0d1117;
|
||||
border-bottom: 1px solid #30363d;
|
||||
padding: 8px 16px;
|
||||
z-index: 100;
|
||||
}}
|
||||
.filter-input-wrapper {{
|
||||
position: relative;
|
||||
display: flex;
|
||||
gap: 8px;
|
||||
}}
|
||||
.filter-input {{
|
||||
flex: 1;
|
||||
padding: 6px 10px;
|
||||
background: #0d1117;
|
||||
border: 1px solid #30363d;
|
||||
border-radius: 4px;
|
||||
color: #c9d1d9;
|
||||
font-size: 12px;
|
||||
font-family: ui-monospace, SFMono-Regular, monospace;
|
||||
}}
|
||||
.filter-input:focus {{
|
||||
outline: none;
|
||||
border-color: #58a6ff;
|
||||
}}
|
||||
.clear-btn {{
|
||||
padding: 6px 12px;
|
||||
background: #21262d;
|
||||
border: 1px solid #30363d;
|
||||
border-radius: 4px;
|
||||
color: #c9d1d9;
|
||||
cursor: pointer;
|
||||
font-size: 12px;
|
||||
white-space: nowrap;
|
||||
}}
|
||||
.clear-btn:hover {{
|
||||
background: #30363d;
|
||||
border-color: #58a6ff;
|
||||
}}
|
||||
.filter-error {{
|
||||
color: #f85149;
|
||||
font-size: 11px;
|
||||
margin-top: 4px;
|
||||
display: none;
|
||||
}}
|
||||
.filter-error.visible {{
|
||||
display: block;
|
||||
}}
|
||||
.content-wrapper {{
|
||||
padding: 16px;
|
||||
}}
|
||||
pre {{
|
||||
margin: 0;
|
||||
white-space: pre-wrap;
|
||||
word-wrap: break-word;
|
||||
}}
|
||||
.line-number {{
|
||||
color: #6e7681;
|
||||
user-select: none;
|
||||
margin-right: 16px;
|
||||
display: inline-block;
|
||||
}}
|
||||
.line {{
|
||||
display: block;
|
||||
cursor: pointer;
|
||||
}}
|
||||
.line:hover {{
|
||||
background: rgba(139, 148, 158, 0.1);
|
||||
}}
|
||||
.line.hidden {{
|
||||
display: none;
|
||||
}}
|
||||
.line.match {{
|
||||
background: rgba(88, 166, 255, 0.15);
|
||||
}}
|
||||
.line.selected {{
|
||||
background: rgba(187, 128, 9, 0.25);
|
||||
border-left: 3px solid #d29922;
|
||||
padding-left: 13px;
|
||||
}}
|
||||
.line.selected.match {{
|
||||
background: rgba(187, 128, 9, 0.25);
|
||||
}}
|
||||
</style>
|
||||
</head>
|
||||
<body>
|
||||
<div class="filter-bar">
|
||||
<div class="filter-input-wrapper">
|
||||
<input type="text" class="filter-input" id="filter" placeholder="Filter lines by regex (e.g., error|warning, ^INFO.*)">
|
||||
<button class="clear-btn" id="clear-btn" title="Clear filter (Ctrl+C)">Clear</button>
|
||||
</div>
|
||||
<div class="filter-error" id="filter-error">Invalid regex pattern</div>
|
||||
</div>
|
||||
<div class="content-wrapper">
|
||||
<pre id="content">{numbered_content}</pre>
|
||||
</div>
|
||||
<script>
|
||||
const filterInput = document.getElementById('filter');
|
||||
const filterError = document.getElementById('filter-error');
|
||||
const clearBtn = document.getElementById('clear-btn');
|
||||
const content = document.getElementById('content');
|
||||
let filterTimeout;
|
||||
let selectedLine = null;
|
||||
|
||||
// Wrap each line in a span for filtering
|
||||
const lines = content.innerHTML.split('\\n');
|
||||
const wrappedLines = lines.map(line => `<span class="line">${{line}}</span>`).join('');
|
||||
content.innerHTML = wrappedLines;
|
||||
|
||||
// Line selection handler
|
||||
content.addEventListener('click', function(e) {{
|
||||
const clickedLine = e.target.closest('.line');
|
||||
if (clickedLine) {{
|
||||
// Remove previous selection
|
||||
if (selectedLine) {{
|
||||
selectedLine.classList.remove('selected');
|
||||
}}
|
||||
// Select new line
|
||||
selectedLine = clickedLine;
|
||||
selectedLine.classList.add('selected');
|
||||
}}
|
||||
}});
|
||||
|
||||
// Clear filter function
|
||||
function clearFilter() {{
|
||||
filterInput.value = '';
|
||||
filterError.classList.remove('visible');
|
||||
|
||||
const lineElements = content.querySelectorAll('.line');
|
||||
lineElements.forEach(line => {{
|
||||
line.classList.remove('hidden', 'match');
|
||||
}});
|
||||
|
||||
// Scroll to selected line if exists
|
||||
if (selectedLine) {{
|
||||
selectedLine.scrollIntoView({{ behavior: 'smooth', block: 'center' }});
|
||||
}}
|
||||
}}
|
||||
|
||||
// Clear button click handler
|
||||
clearBtn.addEventListener('click', clearFilter);
|
||||
|
||||
// Ctrl+C hotkey to clear filter
|
||||
document.addEventListener('keydown', function(e) {{
|
||||
if (e.ctrlKey && e.key === 'c') {{
|
||||
// Only clear if filter input is not focused (to allow normal copy)
|
||||
if (document.activeElement !== filterInput) {{
|
||||
e.preventDefault();
|
||||
clearFilter();
|
||||
}}
|
||||
}}
|
||||
}});
|
||||
|
||||
// Filter input handler
|
||||
filterInput.addEventListener('input', function() {{
|
||||
clearTimeout(filterTimeout);
|
||||
filterTimeout = setTimeout(() => {{
|
||||
const pattern = this.value.trim();
|
||||
const lineElements = content.querySelectorAll('.line');
|
||||
|
||||
if (!pattern) {{
|
||||
// Show all lines
|
||||
lineElements.forEach(line => {{
|
||||
line.classList.remove('hidden', 'match');
|
||||
}});
|
||||
filterError.classList.remove('visible');
|
||||
|
||||
// Scroll to selected line if exists
|
||||
if (selectedLine) {{
|
||||
selectedLine.scrollIntoView({{ behavior: 'smooth', block: 'center' }});
|
||||
}}
|
||||
return;
|
||||
}}
|
||||
|
||||
try {{
|
||||
const regex = new RegExp(pattern);
|
||||
filterError.classList.remove('visible');
|
||||
|
||||
lineElements.forEach(line => {{
|
||||
// Get text content without line number span
|
||||
const textContent = line.textContent;
|
||||
const lineNumberMatch = textContent.match(/^\\s*\\d+\\s+/);
|
||||
const actualContent = lineNumberMatch ? textContent.substring(lineNumberMatch[0].length) : textContent;
|
||||
|
||||
if (regex.test(actualContent)) {{
|
||||
line.classList.remove('hidden');
|
||||
line.classList.add('match');
|
||||
}} else {{
|
||||
line.classList.add('hidden');
|
||||
line.classList.remove('match');
|
||||
}}
|
||||
}});
|
||||
}} catch (e) {{
|
||||
filterError.classList.add('visible');
|
||||
}}
|
||||
}}, 300);
|
||||
}});
|
||||
|
||||
// Select line from hash on load
|
||||
if (window.location.hash) {{
|
||||
const lineNum = window.location.hash.substring(1).replace('line-', '');
|
||||
const lineNumElement = document.getElementById('linenum-' + lineNum);
|
||||
if (lineNumElement) {{
|
||||
const lineElement = lineNumElement.closest('.line');
|
||||
if (lineElement) {{
|
||||
selectedLine = lineElement;
|
||||
selectedLine.classList.add('selected');
|
||||
setTimeout(() => {{
|
||||
selectedLine.scrollIntoView({{ behavior: 'smooth', block: 'center' }});
|
||||
}}, 100);
|
||||
}}
|
||||
}}
|
||||
}}
|
||||
</script>
|
||||
</body>
|
||||
</html>'''
|
||||
|
||||
# Write HTML file
|
||||
with open(html_path, 'w', encoding='utf-8') as f:
|
||||
f.write(html_content)
|
||||
|
||||
# Store mapping
|
||||
rel_html_path = f"logs/_links/{html_name}"
|
||||
file_mapping[rel_path] = rel_html_path
|
||||
html_count += 1
|
||||
|
||||
except Exception as e:
|
||||
print(f"WARNING: Could not create HTML for {file_path}: {e}", file=sys.stderr)
|
||||
|
||||
print(f"Created {html_count} .html files for inline viewing", file=sys.stderr)
|
||||
return file_mapping
|
||||
|
||||
|
||||
def main():
|
||||
if len(sys.argv) < 3:
|
||||
print("Usage: create_inline_html_files.py <logs_dir> <build_id>")
|
||||
sys.exit(1)
|
||||
|
||||
logs_dir = sys.argv[1]
|
||||
build_id = sys.argv[2]
|
||||
|
||||
if not os.path.exists(logs_dir):
|
||||
print(f"ERROR: Logs directory not found: {logs_dir}", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
file_mapping = create_html_files_for_logs(logs_dir, build_id)
|
||||
|
||||
# Output mapping as JSON for use by other scripts
|
||||
import json
|
||||
print(json.dumps(file_mapping))
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
main()
|
||||
1197
skills/prow-job-analyze-resource/generate_html_report.py
Normal file
File diff suppressed because it is too large
265
skills/prow-job-analyze-resource/generate_report.py
Executable file
@@ -0,0 +1,265 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Generate HTML report from parsed audit and pod log entries.
|
||||
"""
|
||||
|
||||
import json
|
||||
import sys
|
||||
from pathlib import Path
|
||||
from typing import List, Dict, Optional
|
||||
from datetime import datetime
|
||||
|
||||
|
||||
def parse_timestamp(ts_str: Optional[str]) -> Optional[datetime]:
|
||||
"""Parse timestamp string to datetime object."""
|
||||
if not ts_str:
|
||||
return None
|
||||
|
||||
# Try various formats
|
||||
formats = [
|
||||
'%Y-%m-%dT%H:%M:%S.%fZ', # RFC3339 with microseconds
|
||||
'%Y-%m-%dT%H:%M:%SZ', # RFC3339 without microseconds
|
||||
'%Y-%m-%d %H:%M:%S.%f', # Common with microseconds
|
||||
'%Y-%m-%d %H:%M:%S', # Common without microseconds
|
||||
]
|
||||
|
||||
for fmt in formats:
|
||||
try:
|
||||
return datetime.strptime(ts_str, fmt)
|
||||
except ValueError:
|
||||
continue
|
||||
|
||||
return None
|
||||
|
||||
|
||||
def calculate_timeline_position(timestamp: Optional[str], min_time: datetime, max_time: datetime) -> float:
|
||||
"""
|
||||
Calculate position on timeline (0-100%).
|
||||
|
||||
Args:
|
||||
timestamp: ISO timestamp string
|
||||
min_time: Earliest timestamp
|
||||
max_time: Latest timestamp
|
||||
|
||||
Returns:
|
||||
Position as percentage (0-100)
|
||||
"""
|
||||
if not timestamp:
|
||||
return 100.0 # Put entries without timestamps at the end
|
||||
|
||||
dt = parse_timestamp(timestamp)
|
||||
if not dt:
|
||||
return 100.0
|
||||
|
||||
if max_time == min_time:
|
||||
return 50.0
|
||||
|
||||
time_range = (max_time - min_time).total_seconds()
|
||||
position = (dt - min_time).total_seconds()
|
||||
|
||||
return (position / time_range) * 100.0
|
||||
|
||||
|
||||
def get_level_color(level: str) -> str:
|
||||
"""Get SVG color for log level."""
|
||||
colors = {
|
||||
'info': '#3498db',
|
||||
'warn': '#f39c12',
|
||||
'error': '#e74c3c',
|
||||
}
|
||||
return colors.get(level, '#95a5a6')
|
||||
|
||||
|
||||
def format_timestamp(ts_str: Optional[str]) -> str:
|
||||
"""Format timestamp for display."""
|
||||
if not ts_str:
|
||||
return 'N/A'
|
||||
|
||||
dt = parse_timestamp(ts_str)
|
||||
if not dt:
|
||||
return ts_str
|
||||
|
||||
return dt.strftime('%Y-%m-%d %H:%M:%S')
|
||||
|
||||
|
||||
def generate_timeline_events(entries: List[Dict], min_time: datetime, max_time: datetime) -> str:
|
||||
"""Generate SVG elements for timeline events."""
|
||||
svg_lines = []
|
||||
|
||||
for i, entry in enumerate(entries):
|
||||
timestamp = entry.get('timestamp')
|
||||
level = entry.get('level', 'info')
|
||||
summary = entry.get('summary', '')
|
||||
|
||||
position = calculate_timeline_position(timestamp, min_time, max_time)
|
||||
color = get_level_color(level)
|
||||
|
||||
# Create vertical line
|
||||
svg_line = (
|
||||
f'<line x1="{position}%" y1="0" x2="{position}%" y2="100" '
|
||||
f'stroke="{color}" stroke-width="2" '
|
||||
f'class="timeline-event" data-entry-id="entry-{i}" '
|
||||
f'opacity="0.7">'
|
||||
f'<title>{summary[:100]}</title>'
|
||||
f'</line>'
|
||||
)
|
||||
svg_lines.append(svg_line)
|
||||
|
||||
return '\n'.join(svg_lines)
|
||||
|
||||
|
||||
def generate_entries_html(entries: List[Dict]) -> str:
|
||||
"""Generate HTML for all log entries."""
|
||||
html_parts = []
|
||||
|
||||
for i, entry in enumerate(entries):
|
||||
timestamp = entry.get('timestamp')
|
||||
level = entry.get('level', 'info')
|
||||
filename = entry.get('filename', 'unknown')
|
||||
line_number = entry.get('line_number', 0)
|
||||
summary = entry.get('summary', '')
|
||||
content = entry.get('content', '')
|
||||
|
||||
# Escape HTML in content
|
||||
content = content.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;')
|
||||
|
||||
entry_html = f'''
|
||||
<div class="entry" id="entry-{i}" data-level="{level}">
|
||||
<div class="entry-header">
|
||||
<span class="timestamp">{format_timestamp(timestamp)}</span>
|
||||
<span class="level {level}">{level}</span>
|
||||
<span class="source">{filename}:{line_number}</span>
|
||||
</div>
|
||||
<div class="entry-summary">{summary}</div>
|
||||
<details class="entry-details">
|
||||
<summary>Show full content</summary>
|
||||
<pre><code>{content}</code></pre>
|
||||
</details>
|
||||
</div>
|
||||
'''
|
||||
html_parts.append(entry_html)
|
||||
|
||||
return '\n'.join(html_parts)
|
||||
|
||||
|
||||
def generate_report(
|
||||
template_path: Path,
|
||||
output_path: Path,
|
||||
metadata: Dict,
|
||||
entries: List[Dict]
|
||||
) -> None:
|
||||
"""
|
||||
Generate HTML report from template and data.
|
||||
|
||||
Args:
|
||||
template_path: Path to HTML template
|
||||
output_path: Path to write output HTML
|
||||
metadata: Metadata dict with prowjob info
|
||||
entries: List of log entry dicts (combined audit + pod logs)
|
||||
"""
|
||||
# Read template
|
||||
with open(template_path, 'r') as f:
|
||||
template = f.read()
|
||||
|
||||
# Sort entries by timestamp
|
||||
entries_with_time = []
|
||||
entries_without_time = []
|
||||
|
||||
for entry in entries:
|
||||
ts_str = entry.get('timestamp')
|
||||
dt = parse_timestamp(ts_str)
|
||||
if dt:
|
||||
entries_with_time.append((dt, entry))
|
||||
else:
|
||||
entries_without_time.append(entry)
|
||||
|
||||
entries_with_time.sort(key=lambda x: x[0])
|
||||
sorted_entries = [e for _, e in entries_with_time] + entries_without_time
|
||||
|
||||
# Calculate timeline bounds
|
||||
if entries_with_time:
|
||||
min_time = entries_with_time[0][0]
|
||||
max_time = entries_with_time[-1][0]
|
||||
time_range = f"{min_time.strftime('%Y-%m-%d %H:%M:%S')} to {max_time.strftime('%Y-%m-%d %H:%M:%S')}"
|
||||
else:
|
||||
min_time = datetime.now()
|
||||
max_time = datetime.now()
|
||||
time_range = "N/A"
|
||||
|
||||
# Count entries by type and level
|
||||
audit_count = sum(1 for e in entries if 'verb' in e or 'http_code' in e)
|
||||
pod_count = len(entries) - audit_count
|
||||
error_count = sum(1 for e in entries if e.get('level') == 'error')
|
||||
|
||||
# Generate timeline events
|
||||
timeline_events = generate_timeline_events(sorted_entries, min_time, max_time)
|
||||
|
||||
# Generate entries HTML
|
||||
entries_html = generate_entries_html(sorted_entries)
|
||||
|
||||
# Replace template variables
|
||||
replacements = {
|
||||
'{{prowjob_name}}': metadata.get('prowjob_name', 'Unknown'),
|
||||
'{{build_id}}': metadata.get('build_id', 'Unknown'),
|
||||
'{{original_url}}': metadata.get('original_url', '#'),
|
||||
'{{target}}': metadata.get('target', 'Unknown'),
|
||||
'{{resources}}': ', '.join(metadata.get('resources', [])),
|
||||
'{{time_range}}': time_range,
|
||||
'{{total_entries}}': str(len(entries)),
|
||||
'{{audit_entries}}': str(audit_count),
|
||||
'{{pod_entries}}': str(pod_count),
|
||||
'{{error_count}}': str(error_count),
|
||||
'{{min_time}}': min_time.strftime('%Y-%m-%d %H:%M:%S') if entries_with_time else 'N/A',
|
||||
'{{max_time}}': max_time.strftime('%Y-%m-%d %H:%M:%S') if entries_with_time else 'N/A',
|
||||
'{{timeline_events}}': timeline_events,
|
||||
'{{entries}}': entries_html,
|
||||
}
|
||||
|
||||
html = template
|
||||
for key, value in replacements.items():
|
||||
html = html.replace(key, value)
|
||||
|
||||
# Write output
|
||||
with open(output_path, 'w') as f:
|
||||
f.write(html)
|
||||
|
||||
print(f"Report generated: {output_path}")
|
||||
|
||||
|
||||
def main():
|
||||
"""
|
||||
Generate HTML report from JSON data.
|
||||
|
||||
Usage: generate_report.py <template> <output> <metadata.json> <audit_entries.json> <pod_entries.json>
|
||||
"""
|
||||
if len(sys.argv) != 6:
|
||||
print("Usage: generate_report.py <template> <output> <metadata.json> <audit_entries.json> <pod_entries.json>", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
template_path = Path(sys.argv[1])
|
||||
output_path = Path(sys.argv[2])
|
||||
metadata_path = Path(sys.argv[3])
|
||||
audit_entries_path = Path(sys.argv[4])
|
||||
pod_entries_path = Path(sys.argv[5])
|
||||
|
||||
# Load data
|
||||
with open(metadata_path, 'r') as f:
|
||||
metadata = json.load(f)
|
||||
|
||||
with open(audit_entries_path, 'r') as f:
|
||||
audit_entries = json.load(f)
|
||||
|
||||
with open(pod_entries_path, 'r') as f:
|
||||
pod_entries = json.load(f)
|
||||
|
||||
# Combine entries
|
||||
all_entries = audit_entries + pod_entries
|
||||
|
||||
# Generate report
|
||||
generate_report(template_path, output_path, metadata, all_entries)
|
||||
|
||||
return 0
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
sys.exit(main())
|
||||
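As a quick sanity check of the timeline maths in `calculate_timeline_position` above (timestamps chosen purely for illustration):

```python
from datetime import datetime

# position (%) = (t - min_time) / (max_time - min_time) * 100
min_time = datetime(2025, 9, 10, 11, 0, 0)
max_time = datetime(2025, 9, 10, 12, 0, 0)
t = datetime(2025, 9, 10, 11, 45, 0)

position = (t - min_time).total_seconds() / (max_time - min_time).total_seconds() * 100.0
assert position == 75.0  # drawn three-quarters of the way along the SVG timeline
```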
247
skills/prow-job-analyze-resource/parse_all_logs.py
Normal file
@@ -0,0 +1,247 @@
|
||||
#!/usr/bin/env python3
|
||||
"""Parse audit logs and pod logs for resource lifecycle analysis."""
|
||||
|
||||
import json
|
||||
import sys
|
||||
import re
|
||||
from pathlib import Path
|
||||
from datetime import datetime
|
||||
from typing import List, Dict, Any
|
||||
|
||||
def parse_timestamp(ts_str: str) -> datetime:
|
||||
"""Parse various timestamp formats."""
|
||||
if not ts_str:
|
||||
return None
|
||||
|
||||
try:
|
||||
# ISO 8601 with Z
|
||||
return datetime.fromisoformat(ts_str.replace('Z', '+00:00'))
|
||||
except:
|
||||
pass
|
||||
|
||||
try:
|
||||
# Standard datetime
|
||||
return datetime.strptime(ts_str, '%Y-%m-%d %H:%M:%S')
|
||||
except:
|
||||
pass
|
||||
|
||||
return None
|
||||
|
||||
def parse_audit_logs(log_files: List[str], resource_pattern: str) -> List[Dict[str, Any]]:
|
||||
"""Parse audit log files for matching resource entries."""
|
||||
entries = []
|
||||
|
||||
# Compile regex pattern for efficient matching
|
||||
pattern_regex = re.compile(resource_pattern)
|
||||
|
||||
for log_file in log_files:
|
||||
try:
|
||||
with open(log_file, 'r') as f:
|
||||
line_num = 0
|
||||
for line in f:
|
||||
line_num += 1
|
||||
# Quick substring check first for performance (only if pattern has no regex chars)
|
||||
if '|' not in resource_pattern and '.*' not in resource_pattern and '[' not in resource_pattern:
|
||||
if resource_pattern not in line:
|
||||
continue
|
||||
else:
|
||||
# For regex patterns, check if pattern matches the line
|
||||
if not pattern_regex.search(line):
|
||||
continue
|
||||
|
||||
try:
|
||||
entry = json.loads(line.strip())
|
||||
|
||||
# Extract relevant fields
|
||||
verb = entry.get('verb', '')
|
||||
user = entry.get('user', {}).get('username', 'unknown')
|
||||
response_code = entry.get('responseStatus', {}).get('code', 0)
|
||||
obj_ref = entry.get('objectRef', {})
|
||||
namespace = obj_ref.get('namespace', '')
|
||||
resource_type = obj_ref.get('resource', '')
|
||||
name = obj_ref.get('name', '')
|
||||
timestamp_str = entry.get('requestReceivedTimestamp', '')
|
||||
|
||||
# Skip if doesn't match the pattern (using regex)
|
||||
if not (pattern_regex.search(namespace) or pattern_regex.search(name)):
|
||||
continue
|
||||
|
||||
# Determine log level based on response code
|
||||
if 200 <= response_code < 300:
|
||||
level = 'info'
|
||||
elif 400 <= response_code < 500:
|
||||
level = 'warn'
|
||||
elif 500 <= response_code < 600:
|
||||
level = 'error'
|
||||
else:
|
||||
level = 'info'
|
||||
|
||||
# Parse timestamp
|
||||
timestamp = parse_timestamp(timestamp_str)
|
||||
|
||||
# Generate summary
|
||||
summary = f"{verb} {resource_type}"
|
||||
if name:
|
||||
summary += f"/{name}"
|
||||
if namespace and namespace != name:
|
||||
summary += f" in {namespace}"
|
||||
summary += f" by {user} → HTTP {response_code}"
|
||||
|
||||
entries.append({
|
||||
'source': 'audit',
|
||||
'filename': log_file,
|
||||
'line_number': line_num,
|
||||
'level': level,
|
||||
'timestamp': timestamp,
|
||||
'timestamp_str': timestamp_str,
|
||||
'content': line.strip(),
|
||||
'summary': summary,
|
||||
'verb': verb,
|
||||
'resource_type': resource_type,
|
||||
'namespace': namespace,
|
||||
'name': name,
|
||||
'user': user,
|
||||
'response_code': response_code
|
||||
})
|
||||
except json.JSONDecodeError:
|
||||
continue
|
||||
except Exception as e:
|
||||
print(f"Error processing {log_file}: {e}", file=sys.stderr)
|
||||
continue
|
||||
|
||||
return entries
|
||||
|
||||
def parse_pod_logs(log_files: List[str], resource_pattern: str) -> List[Dict[str, Any]]:
|
||||
"""Parse pod log files for matching resource mentions."""
|
||||
entries = []
|
||||
|
||||
# Common timestamp patterns in pod logs
|
||||
timestamp_pattern = re.compile(r'^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}[.\d]*Z?)')
|
||||
|
||||
# Glog format: E0910 11:43:41.153414 ... or W1234 12:34:56.123456 ...
|
||||
# Format: <severity><MMDD> <HH:MM:SS.microseconds>
|
||||
# Capture: severity, month, day, time
|
||||
glog_pattern = re.compile(r'^([EIWF])(\d{2})(\d{2})\s+(\d{2}:\d{2}:\d{2}\.\d+)')
|
||||
|
||||
# Compile resource pattern regex for efficient matching
|
||||
pattern_regex = re.compile(resource_pattern)
|
||||
|
||||
for log_file in log_files:
|
||||
try:
|
||||
with open(log_file, 'r', encoding='utf-8', errors='replace') as f:
|
||||
line_num = 0
|
||||
for line in f:
|
||||
line_num += 1
|
||||
# Quick substring check first for performance (only if pattern has no regex chars)
|
||||
if '|' not in resource_pattern and '.*' not in resource_pattern and '[' not in resource_pattern:
|
||||
if resource_pattern not in line:
|
||||
continue
|
||||
else:
|
||||
# For regex patterns, use regex search
|
||||
if not pattern_regex.search(line):
|
||||
continue
|
||||
|
||||
# Detect log level and timestamp from glog format
|
||||
level = 'info' # Default level
|
||||
timestamp_str = ''
|
||||
timestamp = None
|
||||
timestamp_end = 0 # Track where timestamp ends for summary extraction
|
||||
|
||||
glog_match = glog_pattern.match(line)
|
||||
if glog_match:
|
||||
severity = glog_match.group(1)
|
||||
month = glog_match.group(2)
|
||||
day = glog_match.group(3)
|
||||
time_part = glog_match.group(4)
|
||||
|
||||
# Map glog severity to our level scheme
|
||||
if severity == 'E' or severity == 'F': # Error or Fatal
|
||||
level = 'error'
|
||||
elif severity == 'W': # Warning
|
||||
level = 'warn'
|
||||
elif severity == 'I': # Info
|
||||
level = 'info'
|
||||
|
||||
# Parse glog timestamp - glog doesn't include year, so we infer it
|
||||
# Use 2025 as a reasonable default (current test year based on audit logs)
|
||||
# In production, you might want to get this from the prowjob metadata
|
||||
year = 2025
|
||||
timestamp_str = f"{year}-{month}-{day}T{time_part}Z"
|
||||
timestamp = parse_timestamp(timestamp_str)
|
||||
timestamp_end = glog_match.end()
|
||||
else:
|
||||
# Try ISO 8601 format for non-glog logs
|
||||
match = timestamp_pattern.match(line)
|
||||
if match:
|
||||
timestamp_str = match.group(1)
|
||||
timestamp = parse_timestamp(timestamp_str)
|
||||
timestamp_end = match.end()
|
||||
|
||||
# Generate summary - use first 200 chars of line (after timestamp)
|
||||
if timestamp_end > 0:
|
||||
summary = line[timestamp_end:].strip()[:200]
|
||||
else:
|
||||
summary = line.strip()[:200]
|
||||
|
||||
entries.append({
|
||||
'source': 'pod',
|
||||
'filename': log_file,
|
||||
'line_number': line_num,
|
||||
'level': level,
|
||||
'timestamp': timestamp,
|
||||
'timestamp_str': timestamp_str,
|
||||
'content': line.strip(),
|
||||
'summary': summary,
|
||||
'verb': '', # Pod logs don't have verbs
|
||||
'resource_type': '',
|
||||
'namespace': '',
|
||||
'name': '',
|
||||
'user': '',
|
||||
'response_code': 0
|
||||
})
|
||||
except Exception as e:
|
||||
print(f"Error processing pod log {log_file}: {e}", file=sys.stderr)
|
||||
continue
|
||||
|
||||
return entries
|
||||
|
||||
def main():
|
||||
if len(sys.argv) < 4:
|
||||
print("Usage: parse_all_logs.py <resource_pattern> <audit_logs_dir> <pods_dir>")
|
||||
print(" resource_pattern: Regex pattern to match resource names (e.g., 'resource1|resource2')")
|
||||
sys.exit(1)
|
||||
|
||||
resource_pattern = sys.argv[1]
|
||||
audit_logs_dir = Path(sys.argv[2])
|
||||
pods_dir = Path(sys.argv[3])
|
||||
|
||||
# Find all audit log files
|
||||
audit_log_files = list(audit_logs_dir.glob('**/*.log'))
|
||||
print(f"Found {len(audit_log_files)} audit log files", file=sys.stderr)
|
||||
|
||||
# Find all pod log files
|
||||
pod_log_files = list(pods_dir.glob('**/*.log'))
|
||||
print(f"Found {len(pod_log_files)} pod log files", file=sys.stderr)
|
||||
|
||||
# Parse audit logs
|
||||
audit_entries = parse_audit_logs([str(f) for f in audit_log_files], resource_pattern)
|
||||
print(f"Found {len(audit_entries)} matching audit log entries", file=sys.stderr)
|
||||
|
||||
# Parse pod logs
|
||||
pod_entries = parse_pod_logs([str(f) for f in pod_log_files], resource_pattern)
|
||||
print(f"Found {len(pod_entries)} matching pod log entries", file=sys.stderr)
|
||||
|
||||
# Combine and sort by timestamp
|
||||
all_entries = audit_entries + pod_entries
|
||||
# Use a large datetime with timezone for sorting entries without timestamps
|
||||
from datetime import timezone
|
||||
max_datetime = datetime(9999, 12, 31, tzinfo=timezone.utc)
|
||||
all_entries.sort(key=lambda x: x['timestamp'] if x['timestamp'] else max_datetime)
|
||||
|
||||
print(f"Total {len(all_entries)} entries", file=sys.stderr)
|
||||
|
||||
# Output as JSON
|
||||
print(json.dumps(all_entries, default=str, indent=2))
|
||||
|
||||
if __name__ == '__main__':
|
||||
main()
|
||||
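A hypothetical end-to-end invocation of the parser above. The build ID is the example used elsewhere in this commit, while the `e2e-aws-ovn` target name and resource pattern are placeholders; the directory layout matches what the gather-extra download step produces.

```python
import json
import subprocess

base = "1978913325970362368/logs/artifacts/e2e-aws-ovn/gather-extra/artifacts"
result = subprocess.run(
    ["python3", "parse_all_logs.py",
     "my-operator|my-operator-config",   # regex pattern: match either resource name
     f"{base}/audit_logs",
     f"{base}/pods"],
    capture_output=True, text=True, check=True)

entries = json.loads(result.stdout)                      # combined, timestamp-sorted entries
errors = [e for e in entries if e["level"] == "error"]   # audit 5xx responses and glog E/F lines
```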
97
skills/prow-job-analyze-resource/parse_audit_logs.py
Normal file
@@ -0,0 +1,97 @@
|
||||
#!/usr/bin/env python3
|
||||
"""Parse audit logs for resource lifecycle analysis."""
|
||||
|
||||
import json
|
||||
import sys
|
||||
from pathlib import Path
|
||||
from datetime import datetime, timezone
|
||||
from typing import List, Dict, Any
|
||||
|
||||
def parse_audit_logs(log_files: List[str], resource_name: str) -> List[Dict[str, Any]]:
|
||||
"""Parse audit log files for matching resource entries."""
|
||||
entries = []
|
||||
|
||||
for log_file in log_files:
|
||||
try:
|
||||
with open(log_file, 'r') as f:
|
||||
line_num = 0
|
||||
for line in f:
|
||||
line_num += 1
|
||||
try:
|
||||
entry = json.loads(line.strip())
|
||||
|
||||
# Check if this entry matches our resource
|
||||
obj_ref = entry.get('objectRef', {})
|
||||
if obj_ref.get('name') == resource_name:
|
||||
# Extract relevant fields
|
||||
verb = entry.get('verb', '')
|
||||
user = entry.get('user', {}).get('username', 'unknown')
|
||||
response_code = entry.get('responseStatus', {}).get('code', 0)
|
||||
namespace = obj_ref.get('namespace', '')
|
||||
resource_type = obj_ref.get('resource', '')
|
||||
timestamp_str = entry.get('requestReceivedTimestamp', '')
|
||||
|
||||
# Determine log level based on response code
|
||||
if 200 <= response_code < 300:
|
||||
level = 'info'
|
||||
elif 400 <= response_code < 500:
|
||||
level = 'warn'
|
||||
elif 500 <= response_code < 600:
|
||||
level = 'error'
|
||||
else:
|
||||
level = 'info'
|
||||
|
||||
# Parse timestamp
|
||||
timestamp = None
|
||||
if timestamp_str:
|
||||
try:
|
||||
timestamp = datetime.fromisoformat(timestamp_str.replace('Z', '+00:00'))
|
||||
except:
|
||||
pass
|
||||
|
||||
# Generate summary
|
||||
summary = f"{verb} {resource_type}/{resource_name}"
|
||||
if namespace:
|
||||
summary += f" in {namespace}"
|
||||
summary += f" by {user} → HTTP {response_code}"
|
||||
|
||||
entries.append({
|
||||
'filename': log_file,
|
||||
'line_number': line_num,
|
||||
'level': level,
|
||||
'timestamp': timestamp,
|
||||
'timestamp_str': timestamp_str,
|
||||
'content': line.strip(),
|
||||
'summary': summary,
|
||||
'verb': verb,
|
||||
'resource_type': resource_type,
|
||||
'namespace': namespace,
|
||||
'user': user,
|
||||
'response_code': response_code
|
||||
})
|
||||
except json.JSONDecodeError:
|
||||
continue
|
||||
except Exception as e:
|
||||
print(f"Error processing {log_file}: {e}", file=sys.stderr)
|
||||
continue
|
||||
|
||||
return entries
|
||||
|
||||
def main():
|
||||
if len(sys.argv) < 3:
|
||||
print("Usage: parse_audit_logs.py <resource_name> <log_file1> [log_file2 ...]")
|
||||
sys.exit(1)
|
||||
|
||||
resource_name = sys.argv[1]
|
||||
log_files = sys.argv[2:]
|
||||
|
||||
entries = parse_audit_logs(log_files, resource_name)
|
||||
|
||||
# Sort by timestamp
|
||||
entries.sort(key=lambda x: x['timestamp'] if x['timestamp'] else datetime.max.replace(tzinfo=timezone.utc))
|
||||
|
||||
# Output as JSON
|
||||
print(json.dumps(entries, default=str, indent=2))
|
||||
|
||||
if __name__ == '__main__':
|
||||
main()
|
||||
281
skills/prow-job-analyze-resource/parse_pod_logs.py
Executable file
@@ -0,0 +1,281 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Parse unstructured pod logs and search for resource references.
|
||||
"""
|
||||
|
||||
import re
|
||||
import sys
|
||||
import json
|
||||
from pathlib import Path
|
||||
from typing import List, Dict, Optional, Tuple
|
||||
from dataclasses import dataclass, asdict
|
||||
from datetime import datetime
|
||||
|
||||
|
||||
@dataclass
|
||||
class ResourceSpec:
|
||||
"""Specification for a resource to search for."""
|
||||
name: str
|
||||
kind: Optional[str] = None
|
||||
namespace: Optional[str] = None
|
||||
|
||||
@classmethod
|
||||
def from_string(cls, spec_str: str) -> 'ResourceSpec':
|
||||
"""Parse resource spec from string format: [namespace:][kind/]name"""
|
||||
namespace = None
|
||||
kind = None
|
||||
name = spec_str
|
||||
|
||||
if ':' in spec_str:
|
||||
namespace, rest = spec_str.split(':', 1)
|
||||
spec_str = rest
name = rest  # keep name in sync when only a namespace (no kind) is given
|
||||
|
||||
if '/' in spec_str:
|
||||
kind, name = spec_str.split('/', 1)
|
||||
|
||||
return cls(name=name, kind=kind, namespace=namespace)
|
||||
|
||||
|
||||
@dataclass
|
||||
class PodLogEntry:
|
||||
"""Parsed pod log entry with metadata."""
|
||||
filename: str
|
||||
line_number: int
|
||||
timestamp: Optional[str]
|
||||
level: str # info, warn, error
|
||||
content: str # Full line
|
||||
summary: str
|
||||
|
||||
|
||||
# Common timestamp patterns
|
||||
TIMESTAMP_PATTERNS = [
|
||||
# glog: I1016 21:35:33.920070
|
||||
(r'^([IWEF])(\d{4})\s+(\d{2}:\d{2}:\d{2}\.\d+)', 'glog'),
|
||||
# RFC3339: 2025-10-16T21:35:33.920070Z
|
||||
(r'(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?)', 'rfc3339'),
|
||||
# Common: 2025-10-16 21:35:33
|
||||
(r'(\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2})', 'common'),
|
||||
# Syslog: Oct 16 21:35:33
|
||||
(r'((?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})', 'syslog'),
|
||||
]
|
||||
|
||||
|
||||
# Log level patterns
|
||||
LEVEL_PATTERNS = [
|
||||
# glog levels
|
||||
(r'^[I]', 'info'),
|
||||
(r'^[W]', 'warn'),
|
||||
(r'^[EF]', 'error'),
|
||||
# Standard levels
|
||||
(r'\bINFO\b', 'info'),
|
||||
(r'\b(?:WARN|WARNING)\b', 'warn'),
|
||||
(r'\b(?:ERROR|ERR|FATAL)\b', 'error'),
|
||||
]
|
||||
|
||||
|
||||
def parse_timestamp(line: str) -> Tuple[Optional[str], str]:
|
||||
"""
|
||||
Parse timestamp from log line.
|
||||
|
||||
Returns:
|
||||
Tuple of (timestamp_str, timestamp_format) or (None, 'unknown')
|
||||
"""
|
||||
for pattern, fmt in TIMESTAMP_PATTERNS:
|
||||
match = re.search(pattern, line)
|
||||
if match:
|
||||
if fmt == 'glog':
|
||||
# glog format: LMMDD HH:MM:SS.microseconds
|
||||
# Extract date and time parts
|
||||
month_day = match.group(2)
|
||||
time_part = match.group(3)
|
||||
# Approximate year (use current year)
|
||||
year = datetime.now().year
|
||||
# Parse MMDD
|
||||
month = month_day[:2]
|
||||
day = month_day[2:]
|
||||
timestamp = f"{year}-{month}-{day} {time_part}"
|
||||
return timestamp, fmt
|
||||
else:
|
||||
return match.group(1), fmt
|
||||
|
||||
return None, 'unknown'
|
||||
|
||||
|
||||
def parse_level(line: str) -> str:
|
||||
"""Parse log level from line. Returns 'info' if not detected."""
|
||||
for pattern, level in LEVEL_PATTERNS:
|
||||
if re.search(pattern, line, re.IGNORECASE):
|
||||
return level
|
||||
return 'info'
|
||||
|
||||
|
||||
def build_search_pattern(spec: ResourceSpec) -> re.Pattern:
|
||||
"""
|
||||
Build regex pattern for searching pod logs.
|
||||
|
||||
Args:
|
||||
spec: ResourceSpec to build pattern for
|
||||
|
||||
Returns:
|
||||
Compiled regex pattern (case-insensitive)
|
||||
"""
|
||||
if spec.kind:
|
||||
# Pattern: {kind}i?e?s?/{name}
|
||||
# This matches: pod/etcd-0, pods/etcd-0
|
||||
kind_pattern = spec.kind + r'i?e?s?'
|
||||
pattern = rf'{kind_pattern}/{re.escape(spec.name)}'
|
||||
else:
|
||||
# Just search for name
|
||||
pattern = re.escape(spec.name)
|
||||
|
||||
return re.compile(pattern, re.IGNORECASE)
|
||||
|
||||
|
||||
def generate_summary(line: str, spec: ResourceSpec) -> str:
|
||||
"""
|
||||
Generate a contextual summary from the log line.
|
||||
|
||||
Args:
|
||||
line: Full log line
|
||||
spec: ResourceSpec that matched
|
||||
|
||||
Returns:
|
||||
Summary string
|
||||
"""
|
||||
# Remove common prefixes (timestamps, log levels)
|
||||
clean_line = line
|
||||
|
||||
# Remove timestamps
|
||||
for pattern, _ in TIMESTAMP_PATTERNS:
|
||||
clean_line = re.sub(pattern, '', clean_line)
|
||||
|
||||
# Remove log level markers
|
||||
clean_line = re.sub(r'^[IWEF]\s*', '', clean_line)
|
||||
clean_line = re.sub(r'\b(?:INFO|WARN|WARNING|ERROR|ERR|FATAL)\b:?\s*', '', clean_line, flags=re.IGNORECASE)
|
||||
|
||||
# Trim and limit length
|
||||
clean_line = clean_line.strip()
|
||||
if len(clean_line) > 200:
|
||||
clean_line = clean_line[:197] + '...'
|
||||
|
||||
return clean_line if clean_line else "Log entry mentioning resource"
|
||||
|
||||
|
||||
def parse_pod_log_file(filepath: Path, resource_specs: List[ResourceSpec]) -> List[PodLogEntry]:
|
||||
"""
|
||||
Parse a single pod log file and extract matching entries.
|
||||
|
||||
Args:
|
||||
filepath: Path to pod log file
|
||||
resource_specs: List of resource specifications to match
|
||||
|
||||
Returns:
|
||||
List of matching PodLogEntry objects
|
||||
"""
|
||||
entries = []
|
||||
|
||||
# Build search patterns for each resource spec
|
||||
patterns = [(spec, build_search_pattern(spec)) for spec in resource_specs]
|
||||
|
||||
try:
|
||||
with open(filepath, 'r', errors='ignore') as f:
|
||||
for line_num, line in enumerate(f, start=1):
|
||||
line = line.rstrip('\n')
|
||||
if not line:
|
||||
continue
|
||||
|
||||
# Check if line matches any pattern
|
||||
for spec, pattern in patterns:
|
||||
if pattern.search(line):
|
||||
# Parse timestamp
|
||||
timestamp, _ = parse_timestamp(line)
|
||||
|
||||
# Parse level
|
||||
level = parse_level(line)
|
||||
|
||||
# Generate summary
|
||||
summary = generate_summary(line, spec)
|
||||
|
||||
# Trim content if too long
|
||||
content = line
|
||||
if len(content) > 500:
|
||||
content = content[:497] + '...'
|
||||
|
||||
entry = PodLogEntry(
|
||||
filename=str(filepath),
|
||||
line_number=line_num,
|
||||
timestamp=timestamp,
|
||||
level=level,
|
||||
content=content,
|
||||
summary=summary
|
||||
)
|
||||
entries.append(entry)
|
||||
break # Only match once per line
|
||||
|
||||
except Exception as e:
|
||||
print(f"Warning: Error reading {filepath}: {e}", file=sys.stderr)
|
||||
|
||||
return entries
|
||||
|
||||
|
||||
def find_pod_log_files(base_path: Path) -> List[Path]:
|
||||
"""Find all .log files in pods directory recursively."""
|
||||
log_files = []
|
||||
|
||||
artifacts_path = base_path / "artifacts"
|
||||
if artifacts_path.exists():
|
||||
for target_dir in artifacts_path.iterdir():
|
||||
if target_dir.is_dir():
|
||||
pods_dir = target_dir / "gather-extra" / "artifacts" / "pods"
|
||||
if pods_dir.exists():
|
||||
log_files.extend(pods_dir.rglob("*.log"))
|
||||
|
||||
return sorted(log_files)
|
||||
|
||||
|
||||
def main():
|
||||
"""
|
||||
Parse pod logs from command line arguments.
|
||||
|
||||
Usage: parse_pod_logs.py <base_path> <resource_spec1> [<resource_spec2> ...]
|
||||
|
||||
Example: parse_pod_logs.py ./1978913325970362368/logs pod/etcd-0 configmap/cluster-config
|
||||
"""
|
||||
if len(sys.argv) < 3:
|
||||
print("Usage: parse_pod_logs.py <base_path> <resource_spec1> [<resource_spec2> ...]", file=sys.stderr)
|
||||
print("Example: parse_pod_logs.py ./1978913325970362368/logs pod/etcd-0", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
base_path = Path(sys.argv[1])
|
||||
resource_spec_strs = sys.argv[2:]
|
||||
|
||||
# Parse resource specs
|
||||
resource_specs = [ResourceSpec.from_string(spec) for spec in resource_spec_strs]
|
||||
|
||||
# Find pod log files
|
||||
log_files = find_pod_log_files(base_path)
|
||||
|
||||
if not log_files:
|
||||
print(f"Warning: No pod log files found in {base_path}", file=sys.stderr)
|
||||
print(json.dumps([]))
|
||||
return 0
|
||||
|
||||
print(f"Found {len(log_files)} pod log files", file=sys.stderr)
|
||||
|
||||
# Parse all log files
|
||||
all_entries = []
|
||||
for log_file in log_files:
|
||||
entries = parse_pod_log_file(log_file, resource_specs)
|
||||
all_entries.extend(entries)
|
||||
|
||||
print(f"Found {len(all_entries)} matching pod log entries", file=sys.stderr)
|
||||
|
||||
# Output as JSON
|
||||
entries_json = [asdict(entry) for entry in all_entries]
|
||||
print(json.dumps(entries_json, indent=2))
|
||||
|
||||
return 0
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
sys.exit(main())
|
||||
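For reference, the `[namespace:][kind/]name` spec format accepted above behaves as follows (a short sketch against the `ResourceSpec` dataclass defined in this file):

```python
# Namespace and kind are both optional; the first ':' and '/' act as delimiters.
spec = ResourceSpec.from_string("openshift-etcd:pod/etcd-0")
assert (spec.namespace, spec.kind, spec.name) == ("openshift-etcd", "pod", "etcd-0")

assert ResourceSpec.from_string("etcd-0") == ResourceSpec(name="etcd-0", kind=None, namespace=None)

# build_search_pattern then matches singular or plural kinds case-insensitively,
# so "pod/etcd-0" and "pods/etcd-0" both hit in the pod logs.
```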
99
skills/prow-job-analyze-resource/parse_url.py
Executable file
@@ -0,0 +1,99 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Parse and validate Prow job URLs from gcsweb.
|
||||
Extracts build_id, prowjob name, and GCS paths.
|
||||
"""
|
||||
|
||||
import re
|
||||
import sys
|
||||
import json
|
||||
from urllib.parse import urlparse
|
||||
|
||||
|
||||
def parse_prowjob_url(url):
|
||||
"""
|
||||
Parse a Prow job URL and extract relevant information.
|
||||
|
||||
Args:
|
||||
url: gcsweb URL containing test-platform-results
|
||||
|
||||
Returns:
|
||||
dict with keys: bucket_path, build_id, prowjob_name, gcs_base_path
|
||||
|
||||
Raises:
|
||||
ValueError: if URL format is invalid
|
||||
"""
|
||||
# Find test-platform-results in URL
|
||||
if 'test-platform-results/' not in url:
|
||||
raise ValueError(
|
||||
"URL must contain 'test-platform-results/' substring.\n"
|
||||
"Example: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/"
|
||||
"test-platform-results/pr-logs/pull/30393/pull-ci-openshift-origin-main-okd-scos-e2e-aws-ovn/1978913325970362368/"
|
||||
)
|
||||
|
||||
# Extract path after test-platform-results/
|
||||
bucket_path = url.split('test-platform-results/')[1]
|
||||
|
||||
# Remove trailing slash if present
|
||||
bucket_path = bucket_path.rstrip('/')
|
||||
|
||||
# Find build_id: at least 10 consecutive decimal digits delimited by /
|
||||
build_id_pattern = r'/(\d{10,})(?:/|$)'
|
||||
match = re.search(build_id_pattern, bucket_path)
|
||||
|
||||
if not match:
|
||||
raise ValueError(
|
||||
f"Could not find build ID (10+ decimal digits) in URL path.\n"
|
||||
f"Bucket path: {bucket_path}\n"
|
||||
f"Expected pattern: /NNNNNNNNNN/ where N is a digit"
|
||||
)
|
||||
|
||||
build_id = match.group(1)
|
||||
|
||||
# Extract prowjob name: path segment immediately before build_id
|
||||
# Split bucket_path by / and find segment before build_id
|
||||
path_segments = bucket_path.split('/')
|
||||
|
||||
try:
|
||||
build_id_index = path_segments.index(build_id)
|
||||
if build_id_index == 0:
|
||||
raise ValueError("Build ID cannot be the first path segment")
|
||||
prowjob_name = path_segments[build_id_index - 1]
|
||||
except (ValueError, IndexError):
|
||||
raise ValueError(
|
||||
f"Could not extract prowjob name from path.\n"
|
||||
f"Build ID: {build_id}\n"
|
||||
f"Path segments: {path_segments}"
|
||||
)
|
||||
|
||||
# Construct GCS base path
|
||||
gcs_base_path = f"gs://test-platform-results/{bucket_path}/"
|
||||
|
||||
return {
|
||||
'bucket_path': bucket_path,
|
||||
'build_id': build_id,
|
||||
'prowjob_name': prowjob_name,
|
||||
'gcs_base_path': gcs_base_path,
|
||||
'original_url': url
|
||||
}
|
||||
|
||||
|
||||
def main():
|
||||
"""Parse URL from command line argument and output JSON."""
|
||||
if len(sys.argv) != 2:
|
||||
print("Usage: parse_url.py <prowjob-url>", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
url = sys.argv[1]
|
||||
|
||||
try:
|
||||
result = parse_prowjob_url(url)
|
||||
print(json.dumps(result, indent=2))
|
||||
return 0
|
||||
except ValueError as e:
|
||||
print(f"Error: {e}", file=sys.stderr)
|
||||
return 1
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
sys.exit(main())
|
||||
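As a usage sketch, the gcsweb URL from the script's own error message parses like this (assumes `parse_url.py` is importable from the working directory):

```python
from parse_url import parse_prowjob_url

info = parse_prowjob_url(
    "https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/"
    "pr-logs/pull/30393/pull-ci-openshift-origin-main-okd-scos-e2e-aws-ovn/1978913325970362368/")

assert info["build_id"] == "1978913325970362368"
assert info["prowjob_name"] == "pull-ci-openshift-origin-main-okd-scos-e2e-aws-ovn"
assert info["gcs_base_path"].startswith("gs://test-platform-results/pr-logs/pull/30393/")
```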
323
skills/prow-job-analyze-resource/prow_job_resource_grep.sh
Executable file
@@ -0,0 +1,323 @@
|
||||
#!/bin/bash
|
||||
#
|
||||
# Main orchestration script for Prow Job Resource Grep
|
||||
#
|
||||
# Usage: prow_job_resource_grep.sh <prowjob-url> <resource-spec1> [<resource-spec2> ...]
|
||||
#
|
||||
# Example: prow_job_resource_grep.sh \
|
||||
# "https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/30393/pull-ci-openshift-origin-main-okd-scos-e2e-aws-ovn/1978913325970362368/" \
|
||||
# pod/etcd-0
|
||||
#
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||
|
||||
# Colors for output
|
||||
RED='\033[0;31m'
|
||||
YELLOW='\033[1;33m'
|
||||
GREEN='\033[0;32m'
|
||||
BLUE='\033[0;34m'
|
||||
NC='\033[0m' # No Color
|
||||
|
||||
# Helper functions
|
||||
log_info() {
|
||||
echo -e "${BLUE}INFO:${NC} $1"
|
||||
}
|
||||
|
||||
log_success() {
|
||||
echo -e "${GREEN}SUCCESS:${NC} $1"
|
||||
}
|
||||
|
||||
log_warn() {
|
||||
echo -e "${YELLOW}WARNING:${NC} $1"
|
||||
}
|
||||
|
||||
log_error() {
|
||||
echo -e "${RED}ERROR:${NC} $1" >&2
|
||||
}
|
||||
|
||||
# Check prerequisites
|
||||
check_prerequisites() {
|
||||
log_info "Checking prerequisites..."
|
||||
|
||||
# Check Python 3
|
||||
if ! command -v python3 &> /dev/null; then
|
||||
log_error "python3 is required but not installed"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Check gcloud
|
||||
if ! command -v gcloud &> /dev/null; then
|
||||
log_error "gcloud CLI is required but not installed"
|
||||
log_error "Install from: https://cloud.google.com/sdk/docs/install"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Check gcloud authentication
|
||||
if ! gcloud auth list --filter=status:ACTIVE --format="value(account)" &> /dev/null; then
|
||||
log_error "gcloud is not authenticated"
|
||||
log_error "Please run: gcloud auth login"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
log_success "Prerequisites check passed"
|
||||
}
|
||||
|
||||
# Parse and validate URL
|
||||
parse_url() {
|
||||
local url="$1"
|
||||
log_info "Parsing Prow job URL..."
|
||||
|
||||
local metadata_file="${WORK_DIR}/metadata.json"
|
||||
|
||||
if ! python3 "${SCRIPT_DIR}/parse_url.py" "$url" > "$metadata_file"; then
|
||||
log_error "Failed to parse URL"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Extract values from metadata
|
||||
BUILD_ID=$(jq -r '.build_id' "$metadata_file")
|
||||
PROWJOB_NAME=$(jq -r '.prowjob_name' "$metadata_file")
|
||||
GCS_BASE_PATH=$(jq -r '.gcs_base_path' "$metadata_file")
|
||||
BUCKET_PATH=$(jq -r '.bucket_path' "$metadata_file")
|
||||
|
||||
log_success "Build ID: $BUILD_ID"
|
||||
log_success "Prowjob: $PROWJOB_NAME"
|
||||
}
|
||||
|
||||
# Create working directory
|
||||
create_work_dir() {
|
||||
log_info "Creating working directory: ${BUILD_ID}/"
|
||||
|
||||
mkdir -p "${BUILD_ID}/logs"
|
||||
WORK_DIR="${BUILD_ID}/logs"

# Move the parsed metadata into the working directory now that it exists
mv "$METADATA_TMP" "${WORK_DIR}/metadata.json"
|
||||
|
||||
# Check if artifacts already exist
|
||||
if [ -d "${WORK_DIR}/artifacts" ] && [ -n "$(ls -A "${WORK_DIR}/artifacts")" ]; then
|
||||
read -p "Artifacts already exist for build ${BUILD_ID}. Re-download? (y/n) " -n 1 -r
|
||||
echo
|
||||
if [[ ! $REPLY =~ ^[Yy]$ ]]; then
|
||||
SKIP_DOWNLOAD=true
|
||||
log_info "Skipping download, using existing artifacts"
|
||||
else
|
||||
SKIP_DOWNLOAD=false
|
||||
fi
|
||||
else
|
||||
SKIP_DOWNLOAD=false
|
||||
fi
|
||||
}
|
||||
|
||||
# Download and validate prowjob.json
|
||||
download_prowjob_json() {
|
||||
if [ "$SKIP_DOWNLOAD" = true ]; then
|
||||
return
|
||||
fi
|
||||
|
||||
log_info "Downloading prowjob.json..."
|
||||
|
||||
local prowjob_json="${WORK_DIR}/prowjob.json"
|
||||
local gcs_prowjob="${GCS_BASE_PATH}prowjob.json"
|
||||
|
||||
if ! gcloud storage cp "$gcs_prowjob" "$prowjob_json" 2>/dev/null; then
|
||||
log_error "Failed to download prowjob.json from $gcs_prowjob"
|
||||
log_error "Verify the URL and check if the job completed"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
log_success "Downloaded prowjob.json"
|
||||
}
|
||||
|
||||
# Extract target from prowjob.json
|
||||
extract_target() {
|
||||
log_info "Extracting target from prowjob.json..."
|
||||
|
||||
local prowjob_json="${WORK_DIR}/prowjob.json"
|
||||
|
||||
# Search for --target=xxx pattern
|
||||
if ! TARGET=$(grep -oE -- '--target=[a-zA-Z0-9-]+' "$prowjob_json" | head -1 | cut -d= -f2); then
|
||||
log_error "This is not a ci-operator job (no --target found in prowjob.json)"
|
||||
log_error "Only ci-operator jobs can be analyzed by this tool"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
log_success "Target: $TARGET"
|
||||
}
|
||||
|
||||
# Download artifacts
|
||||
download_artifacts() {
|
||||
if [ "$SKIP_DOWNLOAD" = true ]; then
|
||||
return
|
||||
fi
|
||||
|
||||
log_info "Downloading audit logs and pod logs..."
|
||||
|
||||
local gcs_audit_logs="${GCS_BASE_PATH}artifacts/${TARGET}/gather-extra/artifacts/audit_logs/"
|
||||
local local_audit_logs="${WORK_DIR}/artifacts/${TARGET}/gather-extra/artifacts/audit_logs/"
|
||||
|
||||
local gcs_pod_logs="${GCS_BASE_PATH}artifacts/${TARGET}/gather-extra/artifacts/pods/"
|
||||
local local_pod_logs="${WORK_DIR}/artifacts/${TARGET}/gather-extra/artifacts/pods/"
|
||||
|
||||
# Download audit logs
|
||||
log_info "Downloading audit logs..."
|
||||
if gcloud storage cp -r "$gcs_audit_logs" "$local_audit_logs" 2>/dev/null; then
|
||||
log_success "Downloaded audit logs"
|
||||
else
|
||||
log_warn "No audit logs found (job may not have completed or audit logging disabled)"
|
||||
fi
|
||||
|
||||
# Download pod logs
|
||||
log_info "Downloading pod logs..."
|
||||
if gcloud storage cp -r "$gcs_pod_logs" "$local_pod_logs" 2>/dev/null; then
|
||||
log_success "Downloaded pod logs"
|
||||
else
|
||||
log_warn "No pod logs found"
|
||||
fi
|
||||
}
|
||||
|
||||
# Parse logs
|
||||
parse_logs() {
|
||||
log_info "Parsing audit logs..."
|
||||
|
||||
local audit_output="${WORK_DIR}/audit_entries.json"
|
||||
python3 "${SCRIPT_DIR}/parse_audit_logs.py" "${WORK_DIR}" "${RESOURCE_SPECS[@]}" > "$audit_output" 2>&1
|
||||
|
||||
AUDIT_COUNT=$(jq '. | length' "$audit_output")
|
||||
log_success "Found $AUDIT_COUNT audit log entries"
|
||||
|
||||
log_info "Parsing pod logs..."
|
||||
|
||||
local pod_output="${WORK_DIR}/pod_entries.json"
|
||||
python3 "${SCRIPT_DIR}/parse_pod_logs.py" "${WORK_DIR}" "${RESOURCE_SPECS[@]}" > "$pod_output" 2>&1
|
||||
|
||||
POD_COUNT=$(jq '. | length' "$pod_output")
|
||||
log_success "Found $POD_COUNT pod log entries"
|
||||
|
||||
TOTAL_COUNT=$((AUDIT_COUNT + POD_COUNT))
|
||||
|
||||
if [ "$TOTAL_COUNT" -eq 0 ]; then
|
||||
log_warn "No log entries found matching the specified resources"
|
||||
log_warn "Suggestions:"
|
||||
log_warn " - Check resource names for typos"
|
||||
log_warn " - Try searching without kind or namespace filters"
|
||||
log_warn " - Verify resources existed during this job execution"
|
||||
fi
|
||||
}
|
||||
|
||||
# Generate report
|
||||
generate_report() {
|
||||
log_info "Generating HTML report..."
|
||||
|
||||
# Build report filename
|
||||
local report_filename=""
|
||||
for spec in "${RESOURCE_SPECS[@]}"; do
|
||||
# Replace special characters
|
||||
local safe_spec="${spec//:/_}"
|
||||
safe_spec="${safe_spec//\//_}"
|
||||
|
||||
if [ -z "$report_filename" ]; then
|
||||
report_filename="${safe_spec}"
|
||||
else
|
||||
report_filename="${report_filename}__${safe_spec}"
|
||||
fi
|
||||
done
|
||||
|
||||
REPORT_PATH="${BUILD_ID}/${report_filename}.html"
|
||||
|
||||
# Update metadata with additional fields
|
||||
local metadata_file="${WORK_DIR}/metadata.json"
|
||||
jq --arg target "$TARGET" \
|
||||
--argjson resources "$(printf '%s\n' "${RESOURCE_SPECS[@]}" | jq -R . | jq -s .)" \
|
||||
'. + {target: $target, resources: $resources}' \
|
||||
"$metadata_file" > "${metadata_file}.tmp"
|
||||
mv "${metadata_file}.tmp" "$metadata_file"
|
||||
|
||||
# Generate report
|
||||
python3 "${SCRIPT_DIR}/generate_report.py" \
|
||||
"${SCRIPT_DIR}/report_template.html" \
|
||||
"$REPORT_PATH" \
|
||||
"$metadata_file" \
|
||||
"${WORK_DIR}/audit_entries.json" \
|
||||
"${WORK_DIR}/pod_entries.json"
|
||||
|
||||
log_success "Report generated: $REPORT_PATH"
|
||||
}
|
||||
|
||||
# Print summary
|
||||
print_summary() {
|
||||
echo
|
||||
echo "=========================================="
|
||||
echo "Resource Lifecycle Analysis Complete"
|
||||
echo "=========================================="
|
||||
echo
|
||||
echo "Prow Job: $PROWJOB_NAME"
|
||||
echo "Build ID: $BUILD_ID"
|
||||
echo "Target: $TARGET"
|
||||
echo
|
||||
echo "Resources Analyzed:"
|
||||
for spec in "${RESOURCE_SPECS[@]}"; do
|
||||
echo " - $spec"
|
||||
done
|
||||
echo
|
||||
echo "Artifacts downloaded to: ${WORK_DIR}/"
|
||||
echo
|
||||
echo "Results:"
|
||||
echo " - Audit log entries: $AUDIT_COUNT"
|
||||
echo " - Pod log entries: $POD_COUNT"
|
||||
echo " - Total entries: $TOTAL_COUNT"
|
||||
echo
|
||||
echo "Report generated: $REPORT_PATH"
|
||||
echo
|
||||
echo "To open report:"
|
||||
if [[ "$OSTYPE" == "darwin"* ]]; then
|
||||
echo " open $REPORT_PATH"
|
||||
elif [[ "$OSTYPE" == "linux-gnu"* ]]; then
|
||||
echo " xdg-open $REPORT_PATH"
|
||||
else
|
||||
echo " Open $REPORT_PATH in your browser"
|
||||
fi
|
||||
echo
|
||||
}
|
||||
|
||||
# Main function
|
||||
main() {
|
||||
if [ $# -lt 2 ]; then
|
||||
echo "Usage: $0 <prowjob-url> <resource-spec1> [<resource-spec2> ...]"
|
||||
echo
|
||||
echo "Examples:"
|
||||
echo " $0 'https://gcsweb-ci.../1978913325970362368/' pod/etcd-0"
|
||||
echo " $0 'https://gcsweb-ci.../1978913325970362368/' openshift-etcd:pod/etcd-0 configmap/cluster-config"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
local url="$1"
|
||||
shift
|
||||
RESOURCE_SPECS=("$@")
|
||||
|
||||
# Initialize variables
|
||||
SKIP_DOWNLOAD=false
|
||||
WORK_DIR=""
|
||||
BUILD_ID=""
|
||||
PROWJOB_NAME=""
|
||||
GCS_BASE_PATH=""
|
||||
BUCKET_PATH=""
|
||||
TARGET=""
|
||||
REPORT_PATH=""
|
||||
AUDIT_COUNT=0
|
||||
POD_COUNT=0
|
||||
TOTAL_COUNT=0
|
||||
|
||||
# Execute workflow
|
||||
check_prerequisites
|
||||
parse_url "$url"
|
||||
create_work_dir
|
||||
download_prowjob_json
|
||||
extract_target
|
||||
download_artifacts
|
||||
parse_logs
|
||||
generate_report
|
||||
print_summary
|
||||
}
|
||||
|
||||
# Run main function
|
||||
main "$@"
|
||||
434
skills/prow-job-analyze-resource/report_template.html
Normal file
@@ -0,0 +1,434 @@
|
||||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="UTF-8">
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
||||
<title>Prow Job Resource Lifecycle - {{build_id}}</title>
|
||||
<style>
|
||||
* {
|
||||
margin: 0;
|
||||
padding: 0;
|
||||
box-sizing: border-box;
|
||||
}
|
||||
|
||||
body {
|
||||
font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, sans-serif;
|
||||
line-height: 1.6;
|
||||
color: #333;
|
||||
background: #f5f5f5;
|
||||
padding: 20px;
|
||||
}
|
||||
|
||||
.container {
|
||||
max-width: 1400px;
|
||||
margin: 0 auto;
|
||||
background: white;
|
||||
padding: 30px;
|
||||
border-radius: 8px;
|
||||
box-shadow: 0 2px 4px rgba(0,0,0,0.1);
|
||||
}
|
||||
|
||||
h1 {
|
||||
color: #2c3e50;
|
||||
margin-bottom: 20px;
|
||||
font-size: 28px;
|
||||
}
|
||||
|
||||
.metadata {
|
||||
background: #f8f9fa;
|
||||
padding: 20px;
|
||||
border-radius: 6px;
|
||||
margin-bottom: 30px;
|
||||
border-left: 4px solid #3498db;
|
||||
}
|
||||
|
||||
.metadata p {
|
||||
margin: 8px 0;
|
||||
font-size: 14px;
|
||||
}
|
||||
|
||||
.metadata strong {
|
||||
color: #2c3e50;
|
||||
display: inline-block;
|
||||
min-width: 120px;
|
||||
}
|
||||
|
||||
.metadata a {
|
||||
color: #3498db;
|
||||
text-decoration: none;
|
||||
}
|
||||
|
||||
.metadata a:hover {
|
||||
text-decoration: underline;
|
||||
}
|
||||
|
||||
.timeline-section {
|
||||
margin: 30px 0;
|
||||
}
|
||||
|
||||
.timeline-section h2 {
|
||||
color: #2c3e50;
|
||||
margin-bottom: 15px;
|
||||
font-size: 20px;
|
||||
}
|
||||
|
||||
.timeline-container {
|
||||
background: #fff;
|
||||
border: 1px solid #ddd;
|
||||
border-radius: 6px;
|
||||
padding: 20px;
|
||||
margin-bottom: 30px;
|
||||
cursor: pointer;
|
||||
}
|
||||
|
||||
#timeline {
|
||||
width: 100%;
|
||||
height: 100px;
|
||||
display: block;
|
||||
}
|
||||
|
||||
.timeline-labels {
|
||||
display: flex;
|
||||
justify-content: space-between;
|
||||
margin-top: 10px;
|
||||
font-size: 12px;
|
||||
color: #666;
|
||||
}
|
||||
|
||||
.filters {
|
||||
margin: 20px 0;
|
||||
padding: 15px;
|
||||
background: #f8f9fa;
|
||||
border-radius: 6px;
|
||||
display: flex;
|
||||
gap: 15px;
|
||||
flex-wrap: wrap;
|
||||
align-items: center;
|
||||
}
|
||||
|
||||
.filter-group {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: 8px;
|
||||
}
|
||||
|
||||
.filter-group label {
|
||||
font-size: 14px;
|
||||
font-weight: 500;
|
||||
color: #555;
|
||||
}
|
||||
|
||||
.filter-group select,
|
||||
.filter-group input {
|
||||
padding: 6px 12px;
|
||||
border: 1px solid #ddd;
|
||||
border-radius: 4px;
|
||||
font-size: 14px;
|
||||
}
|
||||
|
||||
.entries-section h2 {
|
||||
color: #2c3e50;
|
||||
margin-bottom: 15px;
|
||||
font-size: 20px;
|
||||
}
|
||||
|
||||
.entry {
|
||||
background: white;
|
||||
border: 1px solid #e0e0e0;
|
||||
border-radius: 6px;
|
||||
padding: 15px;
|
||||
margin-bottom: 12px;
|
||||
transition: box-shadow 0.2s;
|
||||
}
|
||||
|
||||
.entry:hover {
|
||||
box-shadow: 0 2px 8px rgba(0,0,0,0.1);
|
||||
}
|
||||
|
||||
.entry-header {
|
||||
display: flex;
|
||||
gap: 12px;
|
||||
align-items: center;
|
||||
margin-bottom: 8px;
|
||||
flex-wrap: wrap;
|
||||
}
|
||||
|
||||
.timestamp {
|
||||
font-family: 'Monaco', 'Courier New', monospace;
|
||||
font-size: 13px;
|
||||
color: #555;
|
||||
font-weight: 500;
|
||||
}
|
||||
|
||||
.level {
|
||||
padding: 3px 10px;
|
||||
border-radius: 12px;
|
||||
font-size: 11px;
|
||||
font-weight: 600;
|
||||
text-transform: uppercase;
|
||||
}
|
||||
|
||||
.level.info {
|
||||
background: #e8f4f8;
|
||||
color: #2980b9;
|
||||
}
|
||||
|
||||
.level.warn {
|
||||
background: #fff3cd;
|
||||
color: #856404;
|
||||
}
|
||||
|
||||
.level.error {
|
||||
background: #f8d7da;
|
||||
color: #721c24;
|
||||
}
|
||||
|
||||
.source {
|
||||
font-size: 12px;
|
||||
color: #7f8c8d;
|
||||
font-family: 'Monaco', 'Courier New', monospace;
|
||||
}
|
||||
|
||||
.entry-summary {
|
||||
font-size: 14px;
|
||||
color: #2c3e50;
|
||||
margin: 8px 0;
|
||||
line-height: 1.5;
|
||||
}
|
||||
|
||||
.entry-details {
|
||||
margin-top: 10px;
|
||||
}
|
||||
|
||||
.entry-details summary {
|
||||
cursor: pointer;
|
||||
font-size: 13px;
|
||||
color: #3498db;
|
||||
user-select: none;
|
||||
padding: 5px 0;
|
||||
}
|
||||
|
||||
.entry-details summary:hover {
|
||||
text-decoration: underline;
|
||||
}
|
||||
|
||||
.entry-details pre {
|
||||
margin-top: 10px;
|
||||
padding: 12px;
|
||||
background: #f8f9fa;
|
||||
border-radius: 4px;
|
||||
overflow-x: auto;
|
||||
font-size: 12px;
|
||||
line-height: 1.4;
|
||||
}
|
||||
|
||||
.entry-details code {
|
||||
font-family: 'Monaco', 'Courier New', monospace;
|
||||
color: #333;
|
||||
}
|
||||
|
||||
.stats {
|
||||
display: flex;
|
||||
gap: 20px;
|
||||
margin: 20px 0;
|
||||
flex-wrap: wrap;
|
||||
}
|
||||
|
||||
.stat-box {
|
||||
flex: 1;
|
||||
min-width: 150px;
|
||||
padding: 15px;
|
||||
background: #f8f9fa;
|
||||
border-radius: 6px;
|
||||
text-align: center;
|
||||
}
|
||||
|
||||
.stat-box .number {
|
||||
font-size: 32px;
|
||||
font-weight: 700;
|
||||
color: #2c3e50;
|
||||
}
|
||||
|
||||
.stat-box .label {
|
||||
font-size: 13px;
|
||||
color: #7f8c8d;
|
||||
margin-top: 5px;
|
||||
}
|
||||
|
||||
.legend {
|
||||
display: flex;
|
||||
gap: 20px;
|
||||
margin-top: 10px;
|
||||
font-size: 12px;
|
||||
}
|
||||
|
||||
.legend-item {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: 6px;
|
||||
}
|
||||
|
||||
.legend-color {
|
||||
width: 16px;
|
||||
height: 16px;
|
||||
border-radius: 2px;
|
||||
}
|
||||
|
||||
.timeline-event {
|
||||
cursor: pointer;
|
||||
transition: opacity 0.2s;
|
||||
}
|
||||
|
||||
.timeline-event:hover {
|
||||
opacity: 0.7;
|
||||
}
|
||||
|
||||
@media (max-width: 768px) {
|
||||
.container {
|
||||
padding: 15px;
|
||||
}
|
||||
|
||||
.entry-header {
|
||||
flex-direction: column;
|
||||
align-items: flex-start;
|
||||
}
|
||||
|
||||
.filters {
|
||||
flex-direction: column;
|
||||
align-items: stretch;
|
||||
}
|
||||
}
|
||||
</style>
|
||||
</head>
|
||||
<body>
|
||||
<div class="container">
|
||||
<div class="header">
|
||||
<h1>Prow Job Resource Lifecycle Analysis</h1>
|
||||
<div class="metadata">
|
||||
<p><strong>Prow Job:</strong> {{prowjob_name}}</p>
|
||||
<p><strong>Build ID:</strong> {{build_id}}</p>
|
||||
<p><strong>gcsweb URL:</strong> <a href="{{original_url}}" target="_blank">{{original_url}}</a></p>
|
||||
<p><strong>Target:</strong> {{target}}</p>
|
||||
<p><strong>Resources:</strong> {{resources}}</p>
|
||||
<p><strong>Time Range:</strong> {{time_range}}</p>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div class="stats">
|
||||
<div class="stat-box">
|
||||
<div class="number">{{total_entries}}</div>
|
||||
<div class="label">Total Entries</div>
|
||||
</div>
|
||||
<div class="stat-box">
|
||||
<div class="number">{{audit_entries}}</div>
|
||||
<div class="label">Audit Logs</div>
|
||||
</div>
|
||||
<div class="stat-box">
|
||||
<div class="number">{{pod_entries}}</div>
|
||||
<div class="label">Pod Logs</div>
|
||||
</div>
|
||||
<div class="stat-box">
|
||||
<div class="number">{{error_count}}</div>
|
||||
<div class="label">Errors</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div class="timeline-section">
|
||||
<h2>Interactive Timeline</h2>
|
||||
<div class="timeline-container">
|
||||
<svg id="timeline">
|
||||
{{timeline_events}}
|
||||
</svg>
|
||||
<div class="timeline-labels">
|
||||
<span>{{min_time}}</span>
|
||||
<span>{{max_time}}</span>
|
||||
</div>
|
||||
</div>
|
||||
<div class="legend">
|
||||
<div class="legend-item">
|
||||
<div class="legend-color" style="background: #e8f4f8;"></div>
|
||||
<span>Info</span>
|
||||
</div>
|
||||
<div class="legend-item">
|
||||
<div class="legend-color" style="background: #fff3cd;"></div>
|
||||
<span>Warning</span>
|
||||
</div>
|
||||
<div class="legend-item">
|
||||
<div class="legend-color" style="background: #f8d7da;"></div>
|
||||
<span>Error</span>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div class="entries-section">
|
||||
<h2>Log Entries</h2>
|
||||
|
||||
<div class="filters">
|
||||
<div class="filter-group">
|
||||
<label>Filter by Level:</label>
|
||||
<select id="level-filter">
|
||||
<option value="all">All</option>
|
||||
<option value="info">Info</option>
|
||||
<option value="warn">Warning</option>
|
||||
<option value="error">Error</option>
|
||||
</select>
|
||||
</div>
|
||||
<div class="filter-group">
|
||||
<label>Search:</label>
|
||||
<input type="text" id="search-input" placeholder="Search in entries...">
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div id="entries-container">
|
||||
{{entries}}
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<script>
|
||||
// Timeline click handler
|
||||
document.querySelectorAll('.timeline-event').forEach(el => {
|
||||
el.addEventListener('click', (e) => {
|
||||
e.stopPropagation();
|
||||
const entryId = el.getAttribute('data-entry-id');
|
||||
const entryElement = document.getElementById(entryId);
|
||||
if (entryElement) {
|
||||
entryElement.scrollIntoView({ behavior: 'smooth', block: 'center' });
|
||||
entryElement.style.background = '#e3f2fd';
|
||||
setTimeout(() => {
|
||||
entryElement.style.background = 'white';
|
||||
}, 2000);
|
||||
}
|
||||
});
|
||||
});
|
||||
|
||||
// Level filter
|
||||
const levelFilter = document.getElementById('level-filter');
|
||||
levelFilter.addEventListener('change', () => {
|
||||
const selectedLevel = levelFilter.value;
|
||||
const entries = document.querySelectorAll('.entry');
|
||||
|
||||
entries.forEach(entry => {
|
||||
if (selectedLevel === 'all') {
|
||||
entry.style.display = 'block';
|
||||
} else {
|
||||
const level = entry.getAttribute('data-level');
|
||||
entry.style.display = level === selectedLevel ? 'block' : 'none';
|
||||
}
|
||||
});
|
||||
});
|
||||
|
||||
// Search filter
|
||||
const searchInput = document.getElementById('search-input');
|
||||
searchInput.addEventListener('input', () => {
|
||||
const searchTerm = searchInput.value.toLowerCase();
|
||||
const entries = document.querySelectorAll('.entry');
|
||||
|
||||
entries.forEach(entry => {
|
||||
const text = entry.textContent.toLowerCase();
|
||||
entry.style.display = text.includes(searchTerm) ? 'block' : 'none';
|
||||
});
|
||||
});
|
||||
</script>
|
||||
</body>
|
||||
</html>
|
||||
13
skills/prow-job-analyze-test-failure/README.md
Normal file
@@ -0,0 +1,13 @@
|
||||
## Using with Claude Code
|
||||
|
||||
When you ask Claude to analyze a test failure in a Prow job, it will automatically use this skill. The skill provides detailed instructions that guide Claude through:
|
||||
- Validating prerequisites
|
||||
- Parsing URLs
|
||||
- Downloading artifacts
|
||||
- Analyzing test failure
|
||||
- Generating reports
|
||||
|
||||
You can simply ask:
|
||||
> "Analyze test failure XYZ in this Prow job: https://gcsweb-ci.../1978913325970362368/"
|
||||
|
||||
Claude will execute the workflow and generate a text report.
|
||||
143
skills/prow-job-analyze-test-failure/SKILL.md
Normal file
@@ -0,0 +1,143 @@
|
||||
---
|
||||
name: Prow Job Analyze Test Failure
|
||||
description: Analyze a failed test by inspecting the code in the current project and artifacts in Prow CI job. Provide a detailed analysis of the test failure in a pre-defined format.
|
||||
---
|
||||
|
||||
# Prow Job Analyze Test Failure
|
||||
|
||||
This skill analyzes the given test failure by downloading artifacts using the "Prow Job Analyze Resource" skill, checking test logs, inspecting resources, logs, and events from the artifacts, and reviewing the test source code.
|
||||
|
||||
## When to Use This Skill
|
||||
|
||||
Use this skill when the user wants to do an initial analysis of a Prow CI test failure.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
Identical to the prerequisites of the "Prow Job Analyze Resource" skill.
|
||||
|
||||
## Input Format
|
||||
|
||||
The user will provide:
|
||||
|
||||
1. **Prow job URL** - gcsweb URL containing `test-platform-results/`
|
||||
|
||||
- Example: `https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_hypershift/6731/pull-ci-openshift-hypershift-main-e2e-aws/1962527613477982208`
|
||||
- URL may or may not have trailing slash
|
||||
|
||||
2. **Test name** - test name that failed
|
||||
- Examples:
|
||||
- `TestKarpenter/EnsureHostedCluster/ValidateMetricsAreExposed`
|
||||
- `TestCreateClusterCustomConfig`
|
||||
- `The openshift-console downloads pods [apigroup:console.openshift.io] should be scheduled on different nodes`
|
||||
|
||||
## Implementation Steps
|
||||
|
||||
### Step 1: Parse and Validate URL
|
||||
|
||||
Use the "Parse and Validate URL" steps from "Prow Job Analyze Resource" skill
|
||||
|
||||
### Step 2: Create Working Directory
|
||||
|
||||
1. **Check for existing artifacts first**
|
||||
|
||||
- Check if `.work/prow-job-analyze-test-failure/{build_id}/logs/` directory exists and has content
|
||||
- If it exists with content:
|
||||
- Use AskUserQuestion tool to ask:
|
||||
- Question: "Artifacts already exist for build {build_id}. Would you like to use the existing download or re-download?"
|
||||
- Options:
|
||||
- "Use existing" - Skip to step Analyze Test Failure
|
||||
- "Re-download" - Continue to clean and re-download
|
||||
- If user chooses "Re-download":
|
||||
- Remove all existing content: `rm -rf .work/prow-job-analyze-test-failure/{build_id}/logs/`
|
||||
- Also remove tmp directory: `rm -rf .work/prow-job-analyze-test-failure/{build_id}/tmp/`
|
||||
- This ensures clean state before downloading new content
|
||||
- If user chooses "Use existing":
|
||||
- Skip directly to Step 4 (Analyze Test Failure)
|
||||
- Still need to download prowjob.json if it doesn't exist
|
||||
|
||||
2. **Create directory structure**
|
||||
```bash
|
||||
mkdir -p .work/prow-job-analyze-test-failure/{build_id}/logs
|
||||
mkdir -p .work/prow-job-analyze-test-failure/{build_id}/tmp
|
||||
```
|
||||
- Use `.work/prow-job-analyze-test-failure/` as the base directory (already in .gitignore)
|
||||
- Use build_id as subdirectory name
|
||||
- Create `logs/` subdirectory for all downloads
|
||||
- Create `tmp/` subdirectory for temporary files (intermediate JSON, etc.)
|
||||
- Working directory: `.work/prow-job-analyze-test-failure/{build_id}/`
|
||||
|
||||
### Step 3: Download and Validate prowjob.json
|
||||
|
||||
Use the "Download and Validate prowjob.json" steps from "Prow Job Analyze Resource" skill.
|
||||
|
||||
### Step 4: Analyze Test Failure
|
||||
|
||||
1. **Download build-log.txt**
|
||||
|
||||
```bash
|
||||
gcloud storage cp gs://test-platform-results/{bucket-path}/build-log.txt .work/prow-job-analyze-test-failure/{build_id}/logs/build-log.txt --no-user-output-enabled
|
||||
```
|
||||
|
||||
2. **Parse and validate**
|
||||
|
||||
- Read `.work/prow-job-analyze-test-failure/{build_id}/logs/build-log.txt`
|
||||
- Search for the Test name
|
||||
- Gather stack trace related to the test
|
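A minimal Python sketch of this search, assuming the build log has already been downloaded to the path above (the helper name and line counts are illustrative, not part of the skill's scripts):

```python
from pathlib import Path

def test_failure_context(build_log_path, test_name, trailing_lines=40):
    """Return the lines around the first mention of test_name; the stack trace usually follows it."""
    lines = Path(build_log_path).read_text(errors="replace").splitlines()
    for i, line in enumerate(lines):
        if test_name in line:
            return "\n".join(lines[max(0, i - 5):i + trailing_lines])
    return ""

# Example, reusing the build_id and test name from the Input Format section:
# print(test_failure_context(
#     ".work/prow-job-analyze-test-failure/1962527613477982208/logs/build-log.txt",
#     "TestCreateClusterCustomConfig"))
```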
||||
|
||||
3. **Examine intervals files for cluster activity during E2E failures**
|
||||
|
||||
- Search recursively for E2E timeline artifacts (known as "interval files") within the bucket-path:
|
||||
```bash
|
||||
gcloud storage ls 'gs://test-platform-results/{bucket-path}/**/e2e-timelines_spyglass_*.json'
|
||||
```
|
||||
- The files can be nested at unpredictable levels below the bucket-path
|
||||
- There could be as many as two matching files
|
||||
- Download all matching interval files (use the full paths from the search results):
|
||||
```bash
|
||||
gcloud storage cp gs://test-platform-results/{bucket-path}/**/e2e-timelines_spyglass_*.json .work/prow-job-analyze-test-failure/{build_id}/logs/ --no-user-output-enabled
|
||||
```
|
||||
- If the wildcard copy doesn't work, copy each file individually using the full paths from the search results
|
||||
- **Scan interval files for test failure timing:**
|
||||
- Look for intervals where `source = "E2ETest"` and `message.annotations.status = "Failed"`
|
||||
- Note the `from` and `to` timestamps on this interval - this indicates when the test was running
|
||||
- **Scan interval files for related cluster events:**
|
||||
- Look for intervals that overlap the timeframe when the failed test was running
|
||||
- Filter for intervals with:
|
||||
- `level = "Error"` or `level = "Warning"`
|
||||
- `source = "OperatorState"`
|
||||
- These events may indicate cluster issues that caused or contributed to the test failure
|
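A minimal Python sketch of the interval filtering just described; the field layout (`items` list, `message.annotations.status`) and the conjunction of the level/source filters are assumptions to verify against the downloaded files:

```python
import json
from pathlib import Path

def load_intervals(path):
    """Load one e2e-timelines_spyglass_*.json file; intervals are usually under an 'items' key."""
    data = json.loads(Path(path).read_text())
    return data.get("items", []) if isinstance(data, dict) else data

def failed_test_window(intervals, test_name):
    """Return (from, to) of the first failed E2ETest interval mentioning test_name, else None."""
    for iv in intervals:
        msg = iv.get("message", {})
        annotations = msg.get("annotations", {}) if isinstance(msg, dict) else {}
        if (iv.get("source") == "E2ETest"
                and annotations.get("status") == "Failed"
                and test_name in json.dumps(msg)):
            return iv.get("from"), iv.get("to")
    return None

def overlapping_operator_problems(intervals, start, end):
    """Error/Warning OperatorState intervals overlapping [start, end]; ISO timestamps compare as strings."""
    return [iv for iv in intervals
            if iv.get("source") == "OperatorState"
            and iv.get("level") in ("Error", "Warning")
            and iv.get("from") and iv.get("to")
            and iv["from"] <= end and iv["to"] >= start]
```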
||||
|
||||
4. **Determine root cause**
|
||||
- Determine a possible root cause for the test failure
|
||||
- Analyze stack traces
|
||||
- Analyze related code in the code repository
|
||||
- Store artifacts from the Prow CI job (json/yaml files) related to the failure under `.work/prow-job-analyze-test-failure/{build_id}/tmp/`
|
||||
- Store logs under `.work/prow-job-analyze-test-failure/{build_id}/logs/`
|
||||
- Provide evidence for the failure
|
||||
- Try to find additional evidence, for example in logs, events, and other json/yaml files
|
||||
|
||||
### Step 5: Present Results to User
|
||||
|
||||
1. **Display summary**
|
||||
|
||||
```text
|
||||
Test Failure Analysis Complete
|
||||
|
||||
Prow Job: {prowjob-name}
|
||||
Build ID: {build_id}
|
||||
Error: {error message}
|
||||
|
||||
Summary: {failure analysis}
|
||||
Evidence: {evidence}
|
||||
Additional evidence: {additional evidence}
|
||||
|
||||
Artifacts downloaded to: .work/prow-job-analyze-test-failure/{build_id}/logs/
|
||||
```
|
||||
|
||||
## Error Handling
|
||||
|
||||
Handle errors in the same way as described under "Error Handling" in the "Prow Job Analyze Resource" skill.
|
||||
|
||||
## Performance Considerations
|
||||
|
||||
Follow the instructions under "Performance Considerations" in the "Prow Job Analyze Resource" skill.
|
||||
54
skills/prow-job-extract-must-gather/CHANGELOG.md
Normal file
@@ -0,0 +1,54 @@
|
||||
# Changelog
|
||||
|
||||
All notable changes to the Prow Job Extract Must-Gather skill will be documented in this file.
|
||||
|
||||
## [1.0.0] - 2025-01-17
|
||||
|
||||
### Added
|
||||
- Initial release of Prow Job Extract Must-Gather skill
|
||||
- Command file: `plugins/prow-job/commands/extract-must-gather.md`
|
||||
- Comprehensive SKILL.md with detailed implementation instructions
|
||||
- `extract_archives.py` script for recursive archive extraction
|
||||
- Extracts must-gather.tar to specified directory
|
||||
- Renames long "-ci-" containing subdirectory to "content/"
|
||||
- Recursively processes nested .tar.gz, .tgz, and .gz archives
|
||||
- Removes original compressed files after extraction
|
||||
- Handles up to 10 levels of nesting
|
||||
- Reports extraction statistics
|
||||
- `generate_html_report.py` script for HTML file browser generation
|
||||
- Scans directory tree and collects file metadata
|
||||
- Classifies files by type (log, yaml, json, xml, cert, archive, script, config, other)
|
||||
- Generates interactive HTML with dark theme matching analyze-resource skill
|
||||
- Multi-select file type filters
|
||||
- Regex pattern filter for powerful file searches
|
||||
- Text search for file names and paths
|
||||
- Direct links to files with relative paths
|
||||
- Statistics dashboard showing file counts and sizes
|
||||
- Scroll to top button
|
||||
- Comprehensive README.md documentation
|
||||
- Working directory structure: `.work/prow-job-extract-must-gather/{build_id}/`
|
||||
- Subdirectory organization: `logs/` for extracted content, `tmp/` for temporary files
|
||||
- Same URL parsing logic as analyze-resource skill
|
||||
- Support for caching extracted content (ask user before re-extracting)
|
||||
- Error handling for corrupted archives, missing files, and invalid URLs
|
||||
- Progress indicators for all long-running operations
|
||||
- Platform-aware browser opening (xdg-open, open, start)
|
||||
|
||||
### Features
|
||||
- **Automatic Archive Extraction**: Handles all nested archive formats automatically
|
||||
- **Directory Renaming**: Shortens long subdirectory names for better usability
|
||||
- **Interactive File Browser**: Modern HTML interface with powerful filtering
|
||||
- **Regex Pattern Matching**: Search files using full regex syntax
|
||||
- **File Type Classification**: Automatic detection and categorization of file types
|
||||
- **Relative File Links**: Click to open files directly from HTML browser
|
||||
- **Statistics Dashboard**: Visual overview of extracted content
|
||||
- **Extraction Caching**: Avoid re-extracting by reusing cached content
|
||||
- **Error Recovery**: Continue processing despite individual archive failures
|
||||
|
||||
### Technical Details
|
||||
- Python 3 scripts using standard library (tarfile, gzip, os, pathlib)
|
||||
- No external dependencies required
|
||||
- Memory-efficient incremental processing
|
||||
- Follows same patterns as analyze-resource skill
|
||||
- Integrated with Claude Code permissions system
|
||||
- Uses `.work/` directory (already in .gitignore)
|
||||
350
skills/prow-job-extract-must-gather/README.md
Normal file
@@ -0,0 +1,350 @@
|
||||
# Prow Job Extract Must-Gather Skill
|
||||
|
||||
This skill extracts and decompresses must-gather archives from Prow CI job artifacts, automatically handling nested tar and gzip archives, and generating an interactive HTML file browser.
|
||||
|
||||
## Overview
|
||||
|
||||
The skill provides both a Claude Code skill interface and standalone scripts for extracting must-gather data from Prow CI jobs. It eliminates the manual steps of downloading and recursively extracting nested archives.
|
||||
|
||||
## Components
|
||||
|
||||
### 1. SKILL.md
|
||||
Claude Code skill definition that provides detailed implementation instructions for the AI assistant.
|
||||
|
||||
### 2. Python Scripts
|
||||
|
||||
#### extract_archives.py
|
||||
Extracts and recursively processes must-gather archives.
|
||||
|
||||
**Features:**
|
||||
- Extracts must-gather.tar to specified directory
|
||||
- Renames long subdirectory (containing "-ci-") to "content/" for readability
|
||||
- Recursively processes nested archives:
|
||||
- `.tar.gz` and `.tgz`: Extract in place, remove original
|
||||
- `.gz` (plain gzip): Decompress in place, remove original
|
||||
- Handles up to 10 levels of nested archives
|
||||
- Reports extraction statistics
|
||||
|
||||
**Usage:**
|
||||
```bash
|
||||
python3 extract_archives.py <must-gather.tar> <output-directory>
|
||||
```
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
python3 plugins/prow-job/skills/prow-job-extract-must-gather/extract_archives.py \
|
||||
.work/prow-job-extract-must-gather/1965715986610917376/tmp/must-gather.tar \
|
||||
.work/prow-job-extract-must-gather/1965715986610917376/logs
|
||||
```
|
||||
|
||||
**Output:**
|
||||
```
|
||||
================================================================================
|
||||
Must-Gather Archive Extraction
|
||||
================================================================================
|
||||
|
||||
Step 1: Extracting must-gather.tar
|
||||
From: .work/.../tmp/must-gather.tar
|
||||
To: .work/.../logs
|
||||
Extracting: .work/.../tmp/must-gather.tar
|
||||
|
||||
Step 2: Renaming long directory to 'content/'
|
||||
From: registry-build09-ci-openshift-org-ci-op-...
|
||||
To: content/
|
||||
|
||||
Step 3: Processing nested archives
|
||||
Extracting: .../content/namespaces/openshift-etcd/pods/etcd-0.tar.gz
|
||||
Decompressing: .../content/cluster-scoped-resources/nodes/ip-10-0-1-234.log.gz
|
||||
... (continues for all archives)
|
||||
|
||||
================================================================================
|
||||
Extraction Complete
|
||||
================================================================================
|
||||
|
||||
Statistics:
|
||||
Total files: 3,421
|
||||
Total size: 234.5 MB
|
||||
Archives processed: 247
|
||||
|
||||
Extracted to: .work/prow-job-extract-must-gather/1965715986610917376/logs
|
||||
```
|
||||
|
||||
#### generate_html_report.py
|
||||
Generates an interactive HTML file browser with filters and search.
|
||||
|
||||
**Features:**
|
||||
- Scans directory tree and collects file metadata
|
||||
- Classifies files by type (log, yaml, json, xml, cert, archive, script, config, other)
|
||||
- Generates statistics (total files, total size, counts by type)
|
||||
- Creates interactive HTML with:
|
||||
- Multi-select file type filters
|
||||
- Regex pattern filter for powerful searches
|
||||
- Text search for file names/paths
|
||||
- Direct links to files (relative paths)
|
||||
- Same dark theme as analyze-resource skill
|
||||
|
||||
**Usage:**
|
||||
```bash
|
||||
python3 generate_html_report.py <logs-directory> <prowjob_name> <build_id> <target> <gcsweb_url>
|
||||
```
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
python3 plugins/prow-job/skills/prow-job-extract-must-gather/generate_html_report.py \
|
||||
.work/prow-job-extract-must-gather/1965715986610917376/logs \
|
||||
"periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview" \
|
||||
"1965715986610917376" \
|
||||
"e2e-aws-ovn-techpreview" \
|
||||
"https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview/1965715986610917376"
|
||||
```
|
||||
|
||||
**Output:**
|
||||
- Creates `.work/prow-job-extract-must-gather/{build_id}/must-gather-browser.html`
|
||||
|
||||
## Prerequisites
|
||||
|
||||
1. **Python 3** - For running extraction and report generator scripts
|
||||
2. **gcloud CLI** - For downloading artifacts from GCS
|
||||
- Install: https://cloud.google.com/sdk/docs/install
|
||||
- Authentication NOT required (bucket is publicly accessible)
|
||||
|
||||
## Workflow
|
||||
|
||||
1. **URL Parsing**
|
||||
- Validate URL contains `test-platform-results/`
|
||||
- Extract build_id (10+ digits)
|
||||
- Extract prowjob name
|
||||
- Construct GCS paths
|
||||
|
||||
2. **Working Directory**
|
||||
- Create `.work/prow-job-extract-must-gather/{build_id}/` directory
|
||||
- Create `logs/` subdirectory for extraction
|
||||
- Create `tmp/` subdirectory for temporary files
|
||||
- Check for existing extraction (offers to skip re-extraction)
|
||||
|
||||
3. **prowjob.json Validation**
|
||||
- Download prowjob.json
|
||||
- Search for `--target=` pattern
|
||||
- Exit if not a ci-operator job
|
||||
|
||||
4. **Must-Gather Download**
|
||||
- Download from: `artifacts/{target}/gather-must-gather/artifacts/must-gather.tar`
|
||||
- Save to: `{build_id}/tmp/must-gather.tar`
|
||||
|
||||
5. **Extraction and Processing**
|
||||
- Extract must-gather.tar to `{build_id}/logs/`
|
||||
- Rename long subdirectory to "content/"
|
||||
- Recursively extract nested archives (.tar.gz, .tgz, .gz)
|
||||
- Remove original compressed files after extraction
|
||||
|
||||
6. **HTML Report Generation**
|
||||
- Scan directory tree
|
||||
- Classify files by type
|
||||
- Calculate statistics
|
||||
- Generate interactive HTML browser
|
||||
- Output to `{build_id}/must-gather-browser.html`
|
||||
|
||||
## Output
|
||||
|
||||
### Console Output
|
||||
```
|
||||
Must-Gather Extraction Complete
|
||||
|
||||
Prow Job: periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview
|
||||
Build ID: 1965715986610917376
|
||||
Target: e2e-aws-ovn-techpreview
|
||||
|
||||
Extraction Statistics:
|
||||
- Total files: 3,421
|
||||
- Total size: 234.5 MB
|
||||
- Archives extracted: 247
|
||||
- Log files: 1,234
|
||||
- YAML files: 856
|
||||
- JSON files: 423
|
||||
|
||||
Extracted to: .work/prow-job-extract-must-gather/1965715986610917376/logs/
|
||||
|
||||
File browser generated: .work/prow-job-extract-must-gather/1965715986610917376/must-gather-browser.html
|
||||
|
||||
Open in browser to browse and search extracted files.
|
||||
```
|
||||
|
||||
### HTML File Browser
|
||||
|
||||
The generated HTML report includes:
|
||||
|
||||
1. **Header Section**
|
||||
- Prow job name
|
||||
- Build ID
|
||||
- Target name
|
||||
- GCS URL (link to gcsweb)
|
||||
- Local extraction path
|
||||
|
||||
2. **Statistics Dashboard**
|
||||
- Total files count
|
||||
- Total size (human-readable)
|
||||
- Counts by file type (log, yaml, json, xml, cert, archive, script, config, other)
|
||||
|
||||
3. **Filter Controls**
|
||||
- **File Type Filter**: Multi-select buttons to filter by type
|
||||
- **Regex Pattern Filter**: Input field for regex patterns (e.g., `.*etcd.*`, `.*\.log$`, `^content/namespaces/.*`)
|
||||
- **Name Search**: Text search for file names and paths
|
||||
|
||||
4. **File List**
|
||||
- Icon for each file type
|
||||
- File name (clickable link to open file)
|
||||
- Directory path
|
||||
- File size
|
||||
- File type badge (color-coded)
|
||||
- Sorted alphabetically by path
|
||||
|
||||
5. **Interactive Features**
|
||||
- All filters work together (AND logic)
|
||||
- Real-time filtering (300ms debounce)
|
||||
- Regex pattern validation
|
||||
- Scroll to top button
|
||||
- No results message when filters match nothing
|
||||
|
||||
### Directory Structure
|
||||
```
|
||||
.work/prow-job-extract-must-gather/{build_id}/
|
||||
├── tmp/
|
||||
│ ├── prowjob.json
|
||||
│ └── must-gather.tar (downloaded, not deleted)
|
||||
├── logs/
|
||||
│ └── content/ # Renamed from long directory
|
||||
│ ├── cluster-scoped-resources/
|
||||
│ │ ├── nodes/
|
||||
│ │ ├── clusterroles/
|
||||
│ │ └── ...
|
||||
│ ├── namespaces/
|
||||
│ │ ├── openshift-etcd/
|
||||
│ │ │ ├── pods/
|
||||
│ │ │ ├── services/
|
||||
│ │ │ └── ...
|
||||
│ │ └── ...
|
||||
│ └── ... (all extracted and decompressed)
|
||||
└── must-gather-browser.html
|
||||
```
|
||||
|
||||
## Performance Features
|
||||
|
||||
1. **Caching**
|
||||
- Extracted files are cached in `{build_id}/logs/`
|
||||
- Offers to skip re-extraction if content already exists
|
||||
|
||||
2. **Incremental Processing**
|
||||
- Archives processed iteratively (up to 10 passes)
|
||||
- Handles deeply nested archive structures
|
||||
|
||||
3. **Progress Indicators**
|
||||
- Colored output for different stages
|
||||
- Status messages for long-running operations
|
||||
- Final statistics summary
|
||||
|
||||
4. **Error Handling**
|
||||
- Graceful handling of corrupted archives
|
||||
- Continues processing after errors
|
||||
- Reports all errors in final summary
|
||||
|
||||
## Examples
|
||||
|
||||
### Basic Usage
|
||||
```bash
|
||||
# Via Claude Code
|
||||
User: "Extract must-gather from https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview/1965715986610917376"
|
||||
|
||||
# Standalone script
|
||||
python3 plugins/prow-job/skills/prow-job-extract-must-gather/extract_archives.py \
|
||||
.work/prow-job-extract-must-gather/1965715986610917376/tmp/must-gather.tar \
|
||||
.work/prow-job-extract-must-gather/1965715986610917376/logs
|
||||
|
||||
python3 plugins/prow-job/skills/prow-job-extract-must-gather/generate_html_report.py \
|
||||
.work/prow-job-extract-must-gather/1965715986610917376/logs \
|
||||
"periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview" \
|
||||
"1965715986610917376" \
|
||||
"e2e-aws-ovn-techpreview" \
|
||||
"https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview/1965715986610917376"
|
||||
```
|
||||
|
||||
### Using Regex Filters in HTML Browser
|
||||
|
||||
**Find all etcd-related files:**
|
||||
```regex
|
||||
.*etcd.*
|
||||
```
|
||||
|
||||
**Find all log files:**
|
||||
```regex
|
||||
.*\.log$
|
||||
```
|
||||
|
||||
**Find files in specific namespace:**
|
||||
```regex
|
||||
^content/namespaces/openshift-etcd/.*
|
||||
```
|
||||
|
||||
**Find YAML manifests for pods:**
|
||||
```regex
|
||||
.*pods/.*\.yaml$
|
||||
```
|
||||
|
||||
## Using with Claude Code
|
||||
|
||||
When you ask Claude to extract a must-gather, it will automatically use this skill. The skill provides detailed instructions that guide Claude through:
|
||||
- Validating prerequisites
|
||||
- Parsing URLs
|
||||
- Downloading archives
|
||||
- Extracting and decompressing
|
||||
- Generating HTML browser
|
||||
|
||||
You can simply ask:
|
||||
> "Extract must-gather from this Prow job: https://gcsweb-ci.../1965715986610917376/"
|
||||
|
||||
Claude will execute the workflow and generate the interactive HTML file browser.
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### gcloud not installed
|
||||
```bash
|
||||
# Check installation
|
||||
which gcloud
|
||||
|
||||
# Install (follow platform-specific instructions)
|
||||
# https://cloud.google.com/sdk/docs/install
|
||||
```
|
||||
|
||||
### must-gather.tar not found
|
||||
- Verify job completed successfully
|
||||
- Check target name is correct
|
||||
- Confirm gather-must-gather ran in the job
|
||||
- Manually check GCS path in gcsweb
|
||||
|
||||
### Corrupted archives
|
||||
- Check error messages in extraction output
|
||||
- Extraction continues despite individual failures
|
||||
- Final summary lists all errors
|
||||
|
||||
### No "-ci-" directory found
|
||||
- Extraction continues with original directory names
|
||||
- Check logs for warning message
|
||||
- Files will still be accessible
|
||||
|
||||
### HTML browser not opening files
|
||||
- Verify files were extracted to `logs/` directory
|
||||
- Check that relative paths are correct
|
||||
- The file links are relative, so keep the HTML report in its generated location next to the `logs/` directory
|
||||
|
||||
## File Type Classifications
|
||||
|
||||
| Extension | Type | Badge Color |
|
||||
|-----------|------|-------------|
|
||||
| .log, .txt | log | Blue |
|
||||
| .yaml, .yml | yaml | Purple |
|
||||
| .json | json | Green |
|
||||
| .xml | xml | Yellow |
|
||||
| .crt, .pem, .key | cert | Red |
|
||||
| .tar, .gz, .tgz, .zip | archive | Gray |
|
||||
| .sh, .py | script | Blue |
|
||||
| .conf, .cfg, .ini | config | Yellow |
|
||||
| others | other | Gray |
|
||||
493
skills/prow-job-extract-must-gather/SKILL.md
Normal file
@@ -0,0 +1,493 @@
|
||||
---
|
||||
name: Prow Job Extract Must-Gather
|
||||
description: Extract and decompress must-gather archives from Prow CI job artifacts, generating an interactive HTML file browser with filters
|
||||
---
|
||||
|
||||
# Prow Job Extract Must-Gather
|
||||
|
||||
This skill extracts and decompresses must-gather archives from Prow CI job artifacts, automatically handling nested tar and gzip archives, and generating an interactive HTML file browser.
|
||||
|
||||
## When to Use This Skill
|
||||
|
||||
Use this skill when the user wants to:
|
||||
- Extract must-gather archives from Prow CI job artifacts
|
||||
- Avoid manually downloading and extracting nested archives
|
||||
- Browse must-gather contents with an interactive HTML interface
|
||||
- Search for specific files or file types in must-gather data
|
||||
- Analyze OpenShift cluster state from CI test runs
|
||||
|
||||
## Prerequisites
|
||||
|
||||
Before starting, verify these prerequisites:
|
||||
|
||||
1. **gcloud CLI Installation**
|
||||
- Check if installed: `which gcloud`
|
||||
- If not installed, provide instructions for the user's platform
|
||||
- Installation guide: https://cloud.google.com/sdk/docs/install
|
||||
|
||||
2. **gcloud Authentication (Optional)**
|
||||
- The `test-platform-results` bucket is publicly accessible
|
||||
- No authentication is required for read access
|
||||
- Skip authentication checks
|
||||
|
||||
## Input Format
|
||||
|
||||
The user will provide:
|
||||
1. **Prow job URL** - gcsweb URL containing `test-platform-results/`
|
||||
- Example: `https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview/1965715986610917376/`
|
||||
- URL may or may not have trailing slash
|
||||
|
||||
## Implementation Steps
|
||||
|
||||
### Step 1: Parse and Validate URL
|
||||
|
||||
1. **Extract bucket path**
|
||||
- Find `test-platform-results/` in URL
|
||||
- Extract everything after it as the GCS bucket relative path
|
||||
- If not found, error: "URL must contain 'test-platform-results/'"
|
||||
|
||||
2. **Extract build_id**
|
||||
- Search for pattern `/(\\d{10,})/` in the bucket path
|
||||
- build_id must be at least 10 consecutive decimal digits
|
||||
- Handle URLs with or without trailing slash
|
||||
- If not found, error: "Could not find build ID (10+ digits) in URL"
|
||||
|
||||
3. **Extract prowjob name**
|
||||
- Find the path segment immediately preceding build_id
|
||||
- Example: In `.../periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview/1965715986610917376/`
|
||||
- Prowjob name: `periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview`
|
||||
|
||||
4. **Construct GCS paths**
|
||||
- Bucket: `test-platform-results`
|
||||
- Base GCS path: `gs://test-platform-results/{bucket-path}/`
|
||||
- Ensure path ends with `/`
|
||||
|
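A minimal Python sketch of the parsing rules above (the function and dictionary keys are illustrative; the analyze-resource skill ships its own `parse_url.py` for the same job):

```python
import re

def parse_gcsweb_url(url):
    """Apply the Step 1 rules: bucket path, build_id (10+ digits), prowjob name, GCS base path."""
    marker = "test-platform-results/"
    if marker not in url:
        raise ValueError("URL must contain 'test-platform-results/'")
    bucket_path = url.split(marker, 1)[1].strip("/")
    match = re.search(r"/(\d{10,})(?:/|$)", "/" + bucket_path)
    if not match:
        raise ValueError("Could not find build ID (10+ digits) in URL")
    build_id = match.group(1)
    segments = bucket_path.split("/")
    prowjob_name = segments[segments.index(build_id) - 1]
    return {
        "bucket_path": bucket_path,
        "build_id": build_id,
        "prowjob_name": prowjob_name,
        "gcs_base_path": f"gs://test-platform-results/{bucket_path}/",
    }
```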
||||
### Step 2: Create Working Directory
|
||||
|
||||
1. **Check for existing extraction first**
|
||||
- Check if `.work/prow-job-extract-must-gather/{build_id}/logs/` directory exists and has content
|
||||
- If it exists with content:
|
||||
- Use AskUserQuestion tool to ask:
|
||||
- Question: "Must-gather already extracted for build {build_id}. Would you like to use the existing extraction or re-extract?"
|
||||
- Options:
|
||||
- "Use existing" - Skip to HTML report generation (Step 6)
|
||||
- "Re-extract" - Continue to clean and re-download
|
||||
- If user chooses "Re-extract":
|
||||
- Remove all existing content: `rm -rf .work/prow-job-extract-must-gather/{build_id}/logs/`
|
||||
- Also remove tmp directory: `rm -rf .work/prow-job-extract-must-gather/{build_id}/tmp/`
|
||||
- This ensures clean state before downloading new content
|
||||
- If user chooses "Use existing":
|
||||
- Skip directly to Step 6 (Generate HTML Report)
|
||||
|
||||
2. **Create directory structure**
|
||||
```bash
|
||||
mkdir -p .work/prow-job-extract-must-gather/{build_id}/logs
|
||||
mkdir -p .work/prow-job-extract-must-gather/{build_id}/tmp
|
||||
```
|
||||
- Use `.work/prow-job-extract-must-gather/` as the base directory (already in .gitignore)
|
||||
- Use build_id as subdirectory name
|
||||
- Create `logs/` subdirectory for extraction
|
||||
- Create `tmp/` subdirectory for temporary files
|
||||
- Working directory: `.work/prow-job-extract-must-gather/{build_id}/`
|
||||
|
||||
### Step 3: Download and Validate prowjob.json
|
||||
|
||||
1. **Download prowjob.json**
|
||||
```bash
|
||||
gcloud storage cp gs://test-platform-results/{bucket-path}/prowjob.json .work/prow-job-extract-must-gather/{build_id}/tmp/prowjob.json --no-user-output-enabled
|
||||
```
|
||||
|
||||
2. **Parse and validate**
|
||||
- Read `.work/prow-job-extract-must-gather/{build_id}/tmp/prowjob.json`
|
||||
- Search for pattern: `--target=([a-zA-Z0-9-]+)`
|
||||
- If not found:
|
||||
- Display: "This is not a ci-operator job. The prowjob cannot be analyzed by this skill."
|
||||
- Explain: ci-operator jobs have a --target argument specifying the test target
|
||||
- Exit skill
|
||||
|
||||
3. **Extract target name**
|
||||
- Capture the target value (e.g., `e2e-aws-ovn-techpreview`)
|
||||
- Store for constructing must-gather path
|
||||
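A minimal Python sketch of the target extraction (a `grep -oE -- '--target=[a-zA-Z0-9-]+'` over prowjob.json works just as well; the example path reuses the build_id from the Input Format section):

```python
import re
from pathlib import Path

def extract_target(prowjob_json_path):
    """Return the ci-operator --target value, or None if this is not a ci-operator job."""
    match = re.search(r"--target=([a-zA-Z0-9-]+)", Path(prowjob_json_path).read_text())
    return match.group(1) if match else None

# Example:
# extract_target(".work/prow-job-extract-must-gather/1965715986610917376/tmp/prowjob.json")
```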
|
||||
### Step 4: Download Must-Gather Archive
|
||||
|
||||
1. **Construct must-gather path**
|
||||
- GCS path: `gs://test-platform-results/{bucket-path}/artifacts/{target}/gather-must-gather/artifacts/must-gather.tar`
|
||||
- Local path: `.work/prow-job-extract-must-gather/{build_id}/tmp/must-gather.tar`
|
||||
|
||||
2. **Download must-gather.tar**
|
||||
```bash
|
||||
gcloud storage cp gs://test-platform-results/{bucket-path}/artifacts/{target}/gather-must-gather/artifacts/must-gather.tar .work/prow-job-extract-must-gather/{build_id}/tmp/must-gather.tar --no-user-output-enabled
|
||||
```
|
||||
- Use `--no-user-output-enabled` to suppress progress output
|
||||
- If file not found, error: "No must-gather archive found. Job may not have completed or gather-must-gather may not have run."
|
||||
|
||||
### Step 5: Extract and Process Archives
|
||||
|
||||
**IMPORTANT: Use the provided Python script `extract_archives.py` from the skill directory.**
|
||||
|
||||
**Usage:**
|
||||
```bash
|
||||
python3 plugins/prow-job/skills/prow-job-extract-must-gather/extract_archives.py \
|
||||
.work/prow-job-extract-must-gather/{build_id}/tmp/must-gather.tar \
|
||||
.work/prow-job-extract-must-gather/{build_id}/logs
|
||||
```
|
||||
|
||||
**What the script does:**
|
||||
|
||||
1. **Extract must-gather.tar**
|
||||
- Extract to `{build_id}/logs/` directory
|
||||
- Uses Python's tarfile module for reliable extraction
|
||||
|
||||
2. **Rename long subdirectory to "content/"**
|
||||
- Find subdirectory containing "-ci-" in the name
|
||||
- Example: `registry-build09-ci-openshift-org-ci-op-m8t77165-stable-sha256-d1ae126eed86a47fdbc8db0ad176bf078a5edebdbb0df180d73f02e5f03779e0/`
|
||||
- Rename to: `content/`
|
||||
- Preserves all files and subdirectories
|
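A minimal sketch of this rename step, assuming a single "-ci-" directory at the top level of the extraction (extract_archives.py may implement it differently):

```python
from pathlib import Path

def rename_ci_dir(extract_root):
    """Rename the first top-level subdirectory whose name contains '-ci-' to 'content'."""
    root = Path(extract_root)
    for child in root.iterdir():
        if child.is_dir() and "-ci-" in child.name:
            child.rename(root / "content")
            return True
    return False  # caller should warn and keep the original directory names
```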
||||
|
||||
3. **Recursively process nested archives**
|
||||
- Walk entire directory tree
|
||||
- Find and process archives:
|
||||
|
||||
**For .tar.gz and .tgz files:**
|
||||
```python
|
||||
# Extract in place
|
||||
with tarfile.open(archive_path, 'r:gz') as tar:
|
||||
tar.extractall(path=parent_dir)
|
||||
# Remove original archive
|
||||
os.remove(archive_path)
|
||||
```
|
||||
|
||||
**For .gz files (no tar):**
|
||||
```python
|
||||
# Gunzip in place
|
||||
with gzip.open(gz_path, 'rb') as f_in:
|
||||
with open(output_path, 'wb') as f_out:
|
||||
shutil.copyfileobj(f_in, f_out)
|
||||
# Remove original archive
|
||||
os.remove(gz_path)
|
||||
```
|
||||
|
||||
4. **Progress reporting**
|
||||
- Print status for each extracted archive
|
||||
- Count total files and archives processed
|
||||
- Report final statistics
|
||||
|
||||
5. **Error handling**
|
||||
- Skip corrupted archives with warning
|
||||
- Continue processing other files
|
||||
- Report all errors at the end
|
||||
|
||||
### Step 6: Generate HTML File Browser
|
||||
|
||||
**IMPORTANT: Use the provided Python script `generate_html_report.py` from the skill directory.**
|
||||
|
||||
**Usage:**
|
||||
```bash
|
||||
python3 plugins/prow-job/skills/prow-job-extract-must-gather/generate_html_report.py \
|
||||
.work/prow-job-extract-must-gather/{build_id}/logs \
|
||||
"{prowjob_name}" \
|
||||
"{build_id}" \
|
||||
"{target}" \
|
||||
"{gcsweb_url}"
|
||||
```
|
||||
|
||||
**Output:** The script generates `.work/prow-job-extract-must-gather/{build_id}/must-gather-browser.html`
|
||||
|
||||
**What the script does:**
|
||||
|
||||
1. **Scan directory tree**
|
||||
- Recursively walk `{build_id}/logs/` directory
|
||||
- Collect all files with metadata:
|
||||
- Relative path from logs/
|
||||
- File size (human-readable: KB, MB, GB)
|
||||
- File extension
|
||||
- Directory depth
|
||||
- Last modified time
|
||||
|
||||
2. **Classify files**
|
||||
- Detect file types based on extension:
|
||||
- Logs: `.log`, `.txt`
|
||||
- YAML: `.yaml`, `.yml`
|
||||
- JSON: `.json`
|
||||
- XML: `.xml`
|
||||
- Certificates: `.crt`, `.pem`, `.key`
|
||||
- Archives: `.tar`, `.gz`, `.tgz`, `.tar.gz`
|
||||
- Other
|
||||
- Count files by type for statistics
|
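A minimal sketch of the extension-based classification (the mapping mirrors the list above; generate_html_report.py may use a richer one, e.g. including script and config types):

```python
from pathlib import Path

# Extension-to-type map mirroring the list above
EXTENSION_TYPES = {
    ".log": "log", ".txt": "log",
    ".yaml": "yaml", ".yml": "yaml",
    ".json": "json",
    ".xml": "xml",
    ".crt": "cert", ".pem": "cert", ".key": "cert",
    ".tar": "archive", ".gz": "archive", ".tgz": "archive",
}

def classify(path):
    """Bucket a file by its extension; anything unrecognized is 'other'."""
    return EXTENSION_TYPES.get(Path(path).suffix.lower(), "other")

def count_by_type(logs_dir):
    """Walk the extracted tree and count files per type bucket."""
    counts = {}
    for p in Path(logs_dir).rglob("*"):
        if p.is_file():
            file_type = classify(p)
            counts[file_type] = counts.get(file_type, 0) + 1
    return counts
```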
||||
|
||||
3. **Generate HTML structure**
|
||||
|
||||
**Header Section:**
|
||||
```html
|
||||
<div class="header">
|
||||
<h1>Must-Gather File Browser</h1>
|
||||
<div class="metadata">
|
||||
<p><strong>Prow Job:</strong> {prowjob-name}</p>
|
||||
<p><strong>Build ID:</strong> {build_id}</p>
|
||||
<p><strong>gcsweb URL:</strong> <a href="{original-url}">{original-url}</a></p>
|
||||
<p><strong>Target:</strong> {target}</p>
|
||||
<p><strong>Total Files:</strong> {count}</p>
|
||||
<p><strong>Total Size:</strong> {human-readable-size}</p>
|
||||
</div>
|
||||
</div>
|
||||
```
|
||||
|
||||
**Filter Controls:**
|
||||
```html
|
||||
<div class="filters">
|
||||
<div class="filter-group">
|
||||
<label class="filter-label">File Type (multi-select)</label>
|
||||
<div class="filter-buttons">
|
||||
<button class="filter-btn" data-filter="type" data-value="log">Logs ({count})</button>
|
||||
<button class="filter-btn" data-filter="type" data-value="yaml">YAML ({count})</button>
|
||||
<button class="filter-btn" data-filter="type" data-value="json">JSON ({count})</button>
|
||||
<!-- etc -->
|
||||
</div>
|
||||
</div>
|
||||
<div class="filter-group">
|
||||
<label class="filter-label">Filter by Regex Pattern</label>
|
||||
<input type="text" class="search-box" id="pattern" placeholder="Enter regex pattern (e.g., .*etcd.*, .*\\.log$)">
|
||||
</div>
|
||||
<div class="filter-group">
|
||||
<label class="filter-label">Search by Name</label>
|
||||
<input type="text" class="search-box" id="search" placeholder="Search file names...">
|
||||
</div>
|
||||
</div>
|
||||
```
|
||||
|
||||
**File List:**
|
||||
```html
|
||||
<div class="file-list">
|
||||
<div class="file-item" data-type="{type}" data-path="{path}">
|
||||
<div class="file-icon">{icon}</div>
|
||||
<div class="file-info">
|
||||
<div class="file-name">
|
||||
<a href="{relative-path}" target="_blank">{filename}</a>
|
||||
</div>
|
||||
<div class="file-meta">
|
||||
<span class="file-path">{directory-path}</span>
|
||||
<span class="file-size">{size}</span>
|
||||
<span class="file-type badge badge-{type}">{type}</span>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
```
|
||||
|
||||
**CSS Styling:**
|
||||
- Use same dark theme as analyze-resource skill
|
||||
- Modern, clean design with good contrast
|
||||
- Responsive layout
|
||||
- File type color coding
|
||||
- Monospace fonts for paths
|
||||
- Hover effects on file items
|
||||
|
||||
**JavaScript Interactivity:**
|
||||
```javascript
|
||||
// Multi-select file type filters
|
||||
document.querySelectorAll('.filter-btn').forEach(btn => {
|
||||
btn.addEventListener('click', function() {
|
||||
// Toggle active state
|
||||
// Apply filters
|
||||
});
|
||||
});
|
||||
|
||||
// Regex pattern filter
|
||||
document.getElementById('pattern').addEventListener('input', function() {
|
||||
const pattern = this.value;
|
||||
if (pattern) {
|
||||
const regex = new RegExp(pattern);
|
||||
// Filter files matching regex
|
||||
}
|
||||
});
|
||||
|
||||
// Name search filter
|
||||
document.getElementById('search').addEventListener('input', function() {
|
||||
const query = this.value.toLowerCase();
|
||||
// Filter files by name substring
|
||||
});
|
||||
|
||||
// Combine all active filters
|
||||
function applyFilters() {
|
||||
// Show/hide files based on all active filters
|
||||
}
|
||||
```
|
||||
|
||||
4. **Statistics Section:**
|
||||
```html
|
||||
<div class="stats">
|
||||
<div class="stat">
|
||||
<div class="stat-value">{total-files}</div>
|
||||
<div class="stat-label">Total Files</div>
|
||||
</div>
|
||||
<div class="stat">
|
||||
<div class="stat-value">{total-size}</div>
|
||||
<div class="stat-label">Total Size</div>
|
||||
</div>
|
||||
<div class="stat">
|
||||
<div class="stat-value">{log-count}</div>
|
||||
<div class="stat-label">Log Files</div>
|
||||
</div>
|
||||
<div class="stat">
|
||||
<div class="stat-value">{yaml-count}</div>
|
||||
<div class="stat-label">YAML Files</div>
|
||||
</div>
|
||||
<!-- etc -->
|
||||
</div>
|
||||
```
|
||||
|
||||
5. **Write HTML to file**
|
||||
- Script automatically writes to `.work/prow-job-extract-must-gather/{build_id}/must-gather-browser.html`
|
||||
- Includes proper HTML5 structure
|
||||
- All CSS and JavaScript are inline for portability
|
||||
|
||||
### Step 7: Present Results to User

1. **Display summary**

```
Must-Gather Extraction Complete

Prow Job: {prowjob-name}
Build ID: {build_id}
Target: {target}

Extraction Statistics:
- Total files: {file-count}
- Total size: {human-readable-size}
- Archives extracted: {archive-count}
- Log files: {log-count}
- YAML files: {yaml-count}
- JSON files: {json-count}

Extracted to: .work/prow-job-extract-must-gather/{build_id}/logs/

File browser generated: .work/prow-job-extract-must-gather/{build_id}/must-gather-browser.html

Open in browser to browse and search extracted files.
```

2. **Open report in browser** (see the sketch after this list)
   - Detect platform and automatically open the HTML report in the default browser
   - Linux: `xdg-open .work/prow-job-extract-must-gather/{build_id}/must-gather-browser.html`
   - macOS: `open .work/prow-job-extract-must-gather/{build_id}/must-gather-browser.html`
   - Windows: `start .work/prow-job-extract-must-gather/{build_id}/must-gather-browser.html`
   - On Linux (most common for this environment), use `xdg-open`

3. **Offer next steps**
   - Ask if user wants to search for specific files
   - Explain that extracted files are available in `.work/prow-job-extract-must-gather/{build_id}/logs/`
   - Mention that extraction is cached for faster subsequent browsing

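As a rough illustration of the platform detection in step 2 above, here is a minimal Python sketch. It simply shells out to the commands listed in the bullets, with the standard-library `webbrowser` module as a fallback; the report path is whatever was generated earlier.

```python
#!/usr/bin/env python3
"""Sketch: open the generated must-gather file browser in the default browser."""

import os
import platform
import subprocess
import sys
import webbrowser


def open_report(report_path):
    """Open the HTML report with a platform-appropriate command."""
    system = platform.system()
    if system == 'Linux':
        subprocess.run(['xdg-open', report_path], check=False)
    elif system == 'Darwin':  # macOS
        subprocess.run(['open', report_path], check=False)
    elif system == 'Windows':
        os.startfile(report_path)  # equivalent of the `start` command
    else:
        # Fallback for anything else
        webbrowser.open('file://' + os.path.abspath(report_path))


if __name__ == '__main__':
    # Example: python3 open_report.py .work/prow-job-extract-must-gather/<build_id>/must-gather-browser.html
    open_report(sys.argv[1])
```
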
## Error Handling

Handle these error scenarios gracefully:

1. **Invalid URL format**
   - Error: "URL must contain 'test-platform-results/' substring"
   - Provide example of valid URL

2. **Build ID not found** (see the sketch after this list)
   - Error: "Could not find build ID (10+ decimal digits) in URL path"
   - Explain requirement and show URL parsing

3. **gcloud not installed**
   - Detect with: `which gcloud`
   - Provide installation instructions for user's platform
   - Link: https://cloud.google.com/sdk/docs/install

4. **prowjob.json not found**
   - Suggest verifying URL and checking if job completed
   - Provide gcsweb URL for manual verification

5. **Not a ci-operator job**
   - Error: "This is not a ci-operator job. No --target found in prowjob.json."
   - Explain: Only ci-operator jobs can be analyzed by this skill

6. **must-gather.tar not found**
   - Warn: "Must-gather archive not found at expected path"
   - Suggest: Job may not have completed or gather-must-gather may not have run
   - Provide full GCS path that was checked

7. **Corrupted archive**
   - Warn: "Could not extract {archive-path}: {error}"
   - Continue processing other archives
   - Report all errors in final summary

8. **No "-ci-" subdirectory found**
   - Warn: "Could not find expected subdirectory to rename to 'content/'"
   - Continue with extraction anyway
   - Files will be in original directory structure

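To make items 1, 2, and 6 above concrete, here is a minimal sketch of the URL checks. The must-gather location (`artifacts/{target}/gather-must-gather/artifacts/must-gather.tar`) is an assumption about the usual artifact layout and should be verified via gcsweb if the file is not found.

```python
#!/usr/bin/env python3
"""Sketch: validate a Prow job URL and derive the expected must-gather GCS path."""

import re
import sys


def parse_prow_url(url):
    """Return (gcs_prefix, build_id) or raise ValueError with a user-facing message."""
    if 'test-platform-results/' not in url:
        raise ValueError("URL must contain 'test-platform-results/' substring")

    # Everything after the bucket name is the GCS object prefix for the job
    gcs_prefix = url.split('test-platform-results/', 1)[1].strip('/')

    # The build ID is a long run of decimal digits in the URL path
    match = re.search(r'/(\d{10,})(?:/|$)', url)
    if not match:
        raise ValueError('Could not find build ID (10+ decimal digits) in URL path')
    return gcs_prefix, match.group(1)


def expected_must_gather_path(gcs_prefix, target):
    """Assumed artifact layout; provide this path to the user when the archive is missing."""
    return (f'gs://test-platform-results/{gcs_prefix}/artifacts/{target}/'
            f'gather-must-gather/artifacts/must-gather.tar')


if __name__ == '__main__':
    job_url = sys.argv[1]
    target = sys.argv[2] if len(sys.argv) > 2 else 'e2e-aws-ovn'  # illustrative default
    prefix, build_id = parse_prow_url(job_url)
    print(f'Build ID: {build_id}')
    print(f'Expected must-gather: {expected_must_gather_path(prefix, target)}')
```
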
## Performance Considerations

1. **Avoid re-extracting** (see the sketch after this list)
   - Check if `.work/prow-job-extract-must-gather/{build_id}/logs/` already has content
   - Ask user before re-extracting

2. **Efficient downloads**
   - Use `gcloud storage cp` with `--no-user-output-enabled` to suppress verbose output

3. **Memory efficiency**
   - Process archives incrementally
   - Don't load entire files into memory
   - Use streaming extraction

4. **Progress indicators**
   - Show "Downloading must-gather archive..." before gcloud command
   - Show "Extracting must-gather.tar..." before extraction
   - Show "Processing nested archives..." during recursive extraction
   - Show "Generating HTML file browser..." before report generation

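A minimal sketch of the cache check and quiet download described in items 1 and 2; the directory layout mirrors the paths used throughout this document, but treat the helper names as illustrative rather than part of the skill's scripts.

```python
#!/usr/bin/env python3
"""Sketch: skip re-extraction when cached, and download must-gather.tar quietly."""

import os
import subprocess
import sys


def already_extracted(build_id):
    """Return True if the logs/ directory for this build already has content."""
    logs_dir = f'.work/prow-job-extract-must-gather/{build_id}/logs'
    return os.path.isdir(logs_dir) and any(os.scandir(logs_dir))


def download_must_gather(gcs_url, build_id):
    """Download must-gather.tar with gcloud, suppressing per-object progress output."""
    dest_dir = f'.work/prow-job-extract-must-gather/{build_id}/tmp'
    os.makedirs(dest_dir, exist_ok=True)
    print('Downloading must-gather archive...')
    subprocess.run(
        ['gcloud', 'storage', 'cp', '--no-user-output-enabled', gcs_url, dest_dir],
        check=True,
    )
    return os.path.join(dest_dir, 'must-gather.tar')


if __name__ == '__main__':
    build_id, gcs_url = sys.argv[1], sys.argv[2]
    if already_extracted(build_id):
        print(f'Cached extraction found for {build_id}; ask the user before re-extracting.')
        sys.exit(0)
    print(download_must_gather(gcs_url, build_id))
```
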
## Examples

### Example 1: Extract must-gather from periodic job

```
User: "Extract must-gather from this Prow job: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview/1965715986610917376"

Output:
- Downloads must-gather.tar to: .work/prow-job-extract-must-gather/1965715986610917376/tmp/
- Extracts to: .work/prow-job-extract-must-gather/1965715986610917376/logs/
- Renames long subdirectory to: content/
- Processes 247 nested archives (.tar.gz, .tgz, .gz)
- Creates: .work/prow-job-extract-must-gather/1965715986610917376/must-gather-browser.html
- Opens browser with interactive file list (3,421 files, 234 MB)
```

## Tips

- Always verify gcloud prerequisites before starting (gcloud CLI must be installed)
- Authentication is NOT required - the bucket is publicly accessible
- Use `.work/prow-job-extract-must-gather/{build_id}/` directory structure for organization
- All work files are in `.work/` which is already in .gitignore
- The Python scripts handle all extraction and HTML generation - use them!
- Cache extracted files in `.work/prow-job-extract-must-gather/{build_id}/` to avoid re-extraction
- The HTML file browser supports regex patterns for powerful file filtering
- Extracted files can be opened directly from the HTML browser (links are relative)

## Important Notes

1. **Archive Processing:**
   - The script automatically handles nested archives
   - Original compressed files are removed after successful extraction
   - Corrupted archives are skipped with warnings

2. **Directory Renaming:**
   - The long subdirectory name (containing "-ci-") is renamed to "content/" for brevity
   - Files within "content/" are NOT altered
   - This makes paths more readable in the HTML browser

3. **File Type Detection:**
   - File types are detected based on extension
   - Common types are color-coded in the HTML browser
   - All file types can be filtered

4. **Regex Pattern Filtering:**
   - Users can enter regex patterns in the filter input
   - Patterns match against full file paths
   - Invalid regex patterns are ignored gracefully

5. **Working with Scripts:** (see the sketch after this list for one way to chain them)
   - All scripts are in `plugins/prow-job/skills/prow-job-extract-must-gather/`
   - `extract_archives.py` - Extracts and processes archives
   - `generate_html_report.py` - Generates interactive HTML file browser

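For orientation, this is roughly how the two scripts can be chained from Python. The arguments to `extract_archives.py` match its usage string; the argument list for `generate_html_report.py` is assumed and should be checked against the script itself before relying on it.

```python
#!/usr/bin/env python3
"""Sketch: run the extraction and report-generation scripts for one build."""

import subprocess
import sys

SKILL_DIR = 'plugins/prow-job/skills/prow-job-extract-must-gather'


def extract_and_report(build_id):
    """Extract the downloaded must-gather.tar, then build the HTML file browser."""
    work_dir = f'.work/prow-job-extract-must-gather/{build_id}'
    tar_path = f'{work_dir}/tmp/must-gather.tar'
    logs_dir = f'{work_dir}/logs'
    report = f'{work_dir}/must-gather-browser.html'

    # Usage: extract_archives.py <must-gather.tar> <output-directory>
    subprocess.run(
        [sys.executable, f'{SKILL_DIR}/extract_archives.py', tar_path, logs_dir],
        check=True,
    )

    # Assumed interface: generate_html_report.py <extracted-dir> <output-html>
    subprocess.run(
        [sys.executable, f'{SKILL_DIR}/generate_html_report.py', logs_dir, report],
        check=True,
    )
    return report


if __name__ == '__main__':
    print(extract_and_report(sys.argv[1]))
```
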
202
skills/prow-job-extract-must-gather/extract_archives.py
Executable file
@@ -0,0 +1,202 @@
#!/usr/bin/env python3
"""Extract and recursively decompress must-gather archives."""

import os
import sys
import tarfile
import gzip
import shutil
from pathlib import Path


def human_readable_size(size_bytes):
    """Convert bytes to human-readable format."""
    for unit in ['B', 'KB', 'MB', 'GB', 'TB']:
        if size_bytes < 1024.0:
            return f"{size_bytes:.1f} {unit}"
        size_bytes /= 1024.0
    return f"{size_bytes:.1f} PB"


def extract_tar_archive(tar_path, extract_to):
    """Extract a tar archive (including .tar.gz and .tgz)."""
    try:
        print(f" Extracting: {tar_path}")
        with tarfile.open(tar_path, 'r:*') as tar:
            tar.extractall(path=extract_to)
        return True
    except Exception as e:
        print(f" ERROR: Failed to extract {tar_path}: {e}", file=sys.stderr)
        return False


def gunzip_file(gz_path):
    """Gunzip a .gz file (not a tar.gz)."""
    try:
        # Output file is the same name without .gz extension
        output_path = gz_path[:-3] if gz_path.endswith('.gz') else gz_path + '.decompressed'

        print(f" Decompressing: {gz_path}")
        with gzip.open(gz_path, 'rb') as f_in:
            with open(output_path, 'wb') as f_out:
                shutil.copyfileobj(f_in, f_out)
        return True, output_path
    except Exception as e:
        print(f" ERROR: Failed to decompress {gz_path}: {e}", file=sys.stderr)
        return False, None


def find_and_rename_ci_directory(base_path):
    """Find directory containing '-ci-' and rename it to 'content'."""
    try:
        for item in os.listdir(base_path):
            item_path = os.path.join(base_path, item)
            if os.path.isdir(item_path) and '-ci-' in item:
                content_path = os.path.join(base_path, 'content')
                print(f"\nRenaming directory:")
                print(f" From: {item}")
                print(f" To: content/")
                os.rename(item_path, content_path)
                return True
        print("\nWARNING: No directory containing '-ci-' found to rename", file=sys.stderr)
        return False
    except Exception as e:
        print(f"ERROR: Failed to rename directory: {e}", file=sys.stderr)
        return False


def process_nested_archives(base_path):
    """Recursively find and extract nested archives."""
    archives_processed = 0
    errors = []

    print("\nProcessing nested archives...")

    # Keep processing until no more archives are found
    # (since extracting one archive might create new archives)
    max_iterations = 10
    iteration = 0

    while iteration < max_iterations:
        iteration += 1
        found_archives = False

        # Walk directory tree
        for root, dirs, files in os.walk(base_path):
            for filename in files:
                file_path = os.path.join(root, filename)
                processed = False

                # Handle .tar.gz and .tgz files
                if filename.endswith('.tar.gz') or filename.endswith('.tgz'):
                    parent_dir = os.path.dirname(file_path)
                    if extract_tar_archive(file_path, parent_dir):
                        os.remove(file_path)
                        archives_processed += 1
                        processed = True
                        found_archives = True
                    else:
                        errors.append(f"Failed to extract: {file_path}")

                # Handle plain .gz files (not .tar.gz)
                elif filename.endswith('.gz') and not filename.endswith('.tar.gz'):
                    success, output_path = gunzip_file(file_path)
                    if success:
                        os.remove(file_path)
                        archives_processed += 1
                        processed = True
                        found_archives = True
                    else:
                        errors.append(f"Failed to decompress: {file_path}")

        # If no archives were found in this iteration, we're done
        if not found_archives:
            break

    if iteration >= max_iterations:
        print(f"\nWARNING: Stopped after {max_iterations} iterations. Some nested archives may remain.", file=sys.stderr)

    return archives_processed, errors


def count_files_and_size(base_path):
    """Count total files and calculate total size."""
    total_files = 0
    total_size = 0

    for root, dirs, files in os.walk(base_path):
        for filename in files:
            file_path = os.path.join(root, filename)
            try:
                total_files += 1
                total_size += os.path.getsize(file_path)
            except:
                pass

    return total_files, total_size


def main():
    if len(sys.argv) != 3:
        print("Usage: extract_archives.py <must-gather.tar> <output-directory>")
        print(" <must-gather.tar>: Path to the must-gather.tar file")
        print(" <output-directory>: Directory to extract to")
        sys.exit(1)

    tar_file = sys.argv[1]
    output_dir = sys.argv[2]

    # Validate inputs
    if not os.path.exists(tar_file):
        print(f"ERROR: Input file not found: {tar_file}", file=sys.stderr)
        sys.exit(1)

    # Create output directory
    os.makedirs(output_dir, exist_ok=True)

    print("=" * 80)
    print("Must-Gather Archive Extraction")
    print("=" * 80)

    # Step 1: Extract main tar file
    print(f"\nStep 1: Extracting must-gather.tar")
    print(f" From: {tar_file}")
    print(f" To: {output_dir}")

    if not extract_tar_archive(tar_file, output_dir):
        print("ERROR: Failed to extract must-gather.tar", file=sys.stderr)
        sys.exit(1)

    # Step 2: Rename directory containing '-ci-' to 'content'
    print(f"\nStep 2: Renaming long directory to 'content/'")
    find_and_rename_ci_directory(output_dir)

    # Step 3: Process nested archives
    print(f"\nStep 3: Processing nested archives")
    archives_processed, errors = process_nested_archives(output_dir)

    # Final statistics
    print("\n" + "=" * 80)
    print("Extraction Complete")
    print("=" * 80)

    total_files, total_size = count_files_and_size(output_dir)

    print(f"\nStatistics:")
    print(f" Total files: {total_files:,}")
    print(f" Total size: {human_readable_size(total_size)}")
    print(f" Archives processed: {archives_processed}")

    if errors:
        print(f"\nErrors encountered: {len(errors)}")
        for error in errors[:10]: # Show first 10 errors
            print(f" - {error}")
        if len(errors) > 10:
            print(f" ... and {len(errors) - 10} more errors")

    print(f"\nExtracted to: {output_dir}")
    print("")


if __name__ == '__main__':
    main()
1289
skills/prow-job-extract-must-gather/generate_html_report.py
Executable file
File diff suppressed because it is too large