Initial commit

skills/prow-job-extract-must-gather/CHANGELOG.md (new file, 54 lines)
@@ -0,0 +1,54 @@
# Changelog

All notable changes to the Prow Job Extract Must-Gather skill will be documented in this file.

## [1.0.0] - 2025-01-17

### Added
- Initial release of Prow Job Extract Must-Gather skill
- Command file: `plugins/prow-job/commands/extract-must-gather.md`
- Comprehensive SKILL.md with detailed implementation instructions
- `extract_archives.py` script for recursive archive extraction
  - Extracts must-gather.tar to specified directory
  - Renames the long subdirectory containing "-ci-" to "content/"
  - Recursively processes nested .tar.gz, .tgz, and .gz archives
  - Removes original compressed files after extraction
  - Handles up to 10 levels of nesting
  - Reports extraction statistics
- `generate_html_report.py` script for HTML file browser generation
  - Scans directory tree and collects file metadata
  - Classifies files by type (log, yaml, json, xml, cert, archive, script, config, other)
  - Generates interactive HTML with dark theme matching analyze-resource skill
  - Multi-select file type filters
  - Regex pattern filter for powerful file searches
  - Text search for file names and paths
  - Direct links to files with relative paths
  - Statistics dashboard showing file counts and sizes
  - Scroll to top button
- Comprehensive README.md documentation
- Working directory structure: `.work/prow-job-extract-must-gather/{build_id}/`
  - Subdirectory organization: `logs/` for extracted content, `tmp/` for temporary files
- Same URL parsing logic as analyze-resource skill
- Support for caching extracted content (ask user before re-extracting)
- Error handling for corrupted archives, missing files, and invalid URLs
- Progress indicators for all long-running operations
- Platform-aware browser opening (xdg-open, open, start)

### Features
- **Automatic Archive Extraction**: Handles all nested archive formats automatically
- **Directory Renaming**: Shortens long subdirectory names for better usability
- **Interactive File Browser**: Modern HTML interface with powerful filtering
- **Regex Pattern Matching**: Search files using full regex syntax
- **File Type Classification**: Automatic detection and categorization of file types
- **Relative File Links**: Click to open files directly from HTML browser
- **Statistics Dashboard**: Visual overview of extracted content
- **Extraction Caching**: Avoid re-extracting by reusing cached content
- **Error Recovery**: Continue processing despite individual archive failures

### Technical Details
- Python 3 scripts using the standard library (tarfile, gzip, os, pathlib)
- No external dependencies required
- Memory-efficient incremental processing
- Follows same patterns as analyze-resource skill
- Integrated with Claude Code permissions system
- Uses `.work/` directory (already in .gitignore)

skills/prow-job-extract-must-gather/README.md (new file, 350 lines)
@@ -0,0 +1,350 @@
# Prow Job Extract Must-Gather Skill

This skill extracts and decompresses must-gather archives from Prow CI job artifacts, automatically handling nested tar and gzip archives, and generating an interactive HTML file browser.

## Overview

The skill provides both a Claude Code skill interface and standalone scripts for extracting must-gather data from Prow CI jobs. It eliminates the manual steps of downloading and recursively extracting nested archives.

## Components

### 1. SKILL.md
Claude Code skill definition that provides detailed implementation instructions for the AI assistant.

### 2. Python Scripts

#### extract_archives.py
Extracts and recursively processes must-gather archives.

**Features:**
- Extracts must-gather.tar to specified directory
- Renames the long subdirectory (containing "-ci-") to "content/" for readability
- Recursively processes nested archives:
  - `.tar.gz` and `.tgz`: Extract in place, remove original
  - `.gz` (plain gzip): Decompress in place, remove original
- Handles up to 10 levels of nested archives
- Reports extraction statistics

**Usage:**
```bash
python3 extract_archives.py <must-gather.tar> <output-directory>
```

**Example:**
```bash
python3 plugins/prow-job/skills/prow-job-extract-must-gather/extract_archives.py \
  .work/prow-job-extract-must-gather/1965715986610917376/tmp/must-gather.tar \
  .work/prow-job-extract-must-gather/1965715986610917376/logs
```

**Output:**
```
================================================================================
Must-Gather Archive Extraction
================================================================================

Step 1: Extracting must-gather.tar
  From: .work/.../tmp/must-gather.tar
  To: .work/.../logs
  Extracting: .work/.../tmp/must-gather.tar

Step 2: Renaming long directory to 'content/'
  From: registry-build09-ci-openshift-org-ci-op-...
  To: content/

Step 3: Processing nested archives
  Extracting: .../content/namespaces/openshift-etcd/pods/etcd-0.tar.gz
  Decompressing: .../content/cluster-scoped-resources/nodes/ip-10-0-1-234.log.gz
  ... (continues for all archives)

================================================================================
Extraction Complete
================================================================================

Statistics:
  Total files: 3,421
  Total size: 234.5 MB
  Archives processed: 247

Extracted to: .work/prow-job-extract-must-gather/1965715986610917376/logs
```

#### generate_html_report.py
Generates an interactive HTML file browser with filters and search.

**Features:**
- Scans directory tree and collects file metadata
- Classifies files by type (log, yaml, json, xml, cert, archive, script, config, other)
- Generates statistics (total files, total size, counts by type)
- Creates interactive HTML with:
  - Multi-select file type filters
  - Regex pattern filter for powerful searches
  - Text search for file names/paths
  - Direct links to files (relative paths)
  - Same dark theme as analyze-resource skill

**Usage:**
```bash
python3 generate_html_report.py <logs-directory> <prowjob_name> <build_id> <target> <gcsweb_url>
```

**Example:**
```bash
python3 plugins/prow-job/skills/prow-job-extract-must-gather/generate_html_report.py \
  .work/prow-job-extract-must-gather/1965715986610917376/logs \
  "periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview" \
  "1965715986610917376" \
  "e2e-aws-ovn-techpreview" \
  "https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview/1965715986610917376"
```

**Output:**
- Creates `.work/prow-job-extract-must-gather/{build_id}/must-gather-browser.html`

## Prerequisites

1. **Python 3** - For running extraction and report generator scripts
2. **gcloud CLI** - For downloading artifacts from GCS
   - Install: https://cloud.google.com/sdk/docs/install
   - Authentication NOT required (bucket is publicly accessible)
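
A minimal way to check this prerequisite programmatically is sketched below; it only assumes that `gcloud` must be on `PATH`, and the `check_prerequisites` helper name is illustrative rather than part of the shipped scripts.

```python
# Hypothetical prerequisite check: verify the gcloud CLI is available.
import shutil
import sys

def check_prerequisites() -> None:
    if shutil.which("gcloud") is None:
        sys.exit("gcloud CLI not found. Install it from "
                 "https://cloud.google.com/sdk/docs/install")
```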

## Workflow

1. **URL Parsing**
   - Validate URL contains `test-platform-results/`
   - Extract build_id (10+ digits)
   - Extract prowjob name
   - Construct GCS paths

2. **Working Directory**
   - Create `.work/prow-job-extract-must-gather/{build_id}/` directory
   - Create `logs/` subdirectory for extraction
   - Create `tmp/` subdirectory for temporary files
   - Check for existing extraction (offers to skip re-extraction)

3. **prowjob.json Validation**
   - Download prowjob.json
   - Search for `--target=` pattern
   - Exit if not a ci-operator job

4. **Must-Gather Download**
   - Download from: `artifacts/{target}/gather-must-gather/artifacts/must-gather.tar`
   - Save to: `{build_id}/tmp/must-gather.tar`

5. **Extraction and Processing**
   - Extract must-gather.tar to `{build_id}/logs/`
   - Rename long subdirectory to "content/"
   - Recursively extract nested archives (.tar.gz, .tgz, .gz)
   - Remove original compressed files after extraction

6. **HTML Report Generation**
   - Scan directory tree
   - Classify files by type
   - Calculate statistics
   - Generate interactive HTML browser
   - Output to `{build_id}/must-gather-browser.html`
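
The same workflow can be driven end to end by a small script. The sketch below is illustrative only: the `run_workflow` helper is not part of the skill, URL parsing and prowjob.json validation are omitted for brevity, and it assumes `gcloud` is on `PATH`.

```python
# Hypothetical driver for the workflow above (illustrative sketch).
import subprocess
from pathlib import Path

SKILL_DIR = "plugins/prow-job/skills/prow-job-extract-must-gather"

def run_workflow(bucket_path, build_id, prowjob_name, target, gcsweb_url):
    work = Path(f".work/prow-job-extract-must-gather/{build_id}")
    logs, tmp = work / "logs", work / "tmp"
    logs.mkdir(parents=True, exist_ok=True)
    tmp.mkdir(parents=True, exist_ok=True)

    # Download the must-gather archive from the public GCS bucket.
    gcs_tar = (f"gs://test-platform-results/{bucket_path}/"
               f"artifacts/{target}/gather-must-gather/artifacts/must-gather.tar")
    subprocess.run(["gcloud", "storage", "cp", gcs_tar, str(tmp / "must-gather.tar"),
                    "--no-user-output-enabled"], check=True)

    # Extract and recursively decompress nested archives.
    subprocess.run(["python3", f"{SKILL_DIR}/extract_archives.py",
                    str(tmp / "must-gather.tar"), str(logs)], check=True)

    # Generate the interactive HTML file browser.
    subprocess.run(["python3", f"{SKILL_DIR}/generate_html_report.py",
                    str(logs), prowjob_name, build_id, target, gcsweb_url], check=True)
```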

## Output

### Console Output
```
Must-Gather Extraction Complete

Prow Job: periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview
Build ID: 1965715986610917376
Target: e2e-aws-ovn-techpreview

Extraction Statistics:
- Total files: 3,421
- Total size: 234.5 MB
- Archives extracted: 247
- Log files: 1,234
- YAML files: 856
- JSON files: 423

Extracted to: .work/prow-job-extract-must-gather/1965715986610917376/logs/

File browser generated: .work/prow-job-extract-must-gather/1965715986610917376/must-gather-browser.html

Open in browser to browse and search extracted files.
```

### HTML File Browser

The generated HTML report includes:

1. **Header Section**
   - Prow job name
   - Build ID
   - Target name
   - GCS URL (link to gcsweb)
   - Local extraction path

2. **Statistics Dashboard**
   - Total files count
   - Total size (human-readable)
   - Counts by file type (log, yaml, json, xml, cert, archive, script, config, other)

3. **Filter Controls**
   - **File Type Filter**: Multi-select buttons to filter by type
   - **Regex Pattern Filter**: Input field for regex patterns (e.g., `.*etcd.*`, `.*\.log$`, `^content/namespaces/.*`)
   - **Name Search**: Text search for file names and paths

4. **File List**
   - Icon for each file type
   - File name (clickable link to open file)
   - Directory path
   - File size
   - File type badge (color-coded)
   - Sorted alphabetically by path

5. **Interactive Features**
   - All filters work together (AND logic)
   - Real-time filtering (300ms debounce)
   - Regex pattern validation
   - Scroll to top button
   - No results message when filters match nothing

### Directory Structure
```
.work/prow-job-extract-must-gather/{build_id}/
├── tmp/
│   ├── prowjob.json
│   └── must-gather.tar (downloaded, not deleted)
├── logs/
│   └── content/                  # Renamed from long directory
│       ├── cluster-scoped-resources/
│       │   ├── nodes/
│       │   ├── clusterroles/
│       │   └── ...
│       ├── namespaces/
│       │   ├── openshift-etcd/
│       │   │   ├── pods/
│       │   │   ├── services/
│       │   │   └── ...
│       │   └── ...
│       └── ... (all extracted and decompressed)
└── must-gather-browser.html
```

## Performance Features

1. **Caching**
   - Extracted files are cached in `{build_id}/logs/`
   - Offers to skip re-extraction if content already exists

2. **Incremental Processing**
   - Archives processed iteratively (up to 10 passes)
   - Handles deeply nested archive structures

3. **Progress Indicators**
   - Colored output for different stages
   - Status messages for long-running operations
   - Final statistics summary

4. **Error Handling**
   - Graceful handling of corrupted archives
   - Continues processing after errors
   - Reports all errors in final summary

## Examples

### Basic Usage
```bash
# Via Claude Code
User: "Extract must-gather from https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview/1965715986610917376"

# Standalone script
python3 plugins/prow-job/skills/prow-job-extract-must-gather/extract_archives.py \
  .work/prow-job-extract-must-gather/1965715986610917376/tmp/must-gather.tar \
  .work/prow-job-extract-must-gather/1965715986610917376/logs

python3 plugins/prow-job/skills/prow-job-extract-must-gather/generate_html_report.py \
  .work/prow-job-extract-must-gather/1965715986610917376/logs \
  "periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview" \
  "1965715986610917376" \
  "e2e-aws-ovn-techpreview" \
  "https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview/1965715986610917376"
```

### Using Regex Filters in HTML Browser

**Find all etcd-related files:**
```regex
.*etcd.*
```

**Find all log files:**
```regex
.*\.log$
```

**Find files in specific namespace:**
```regex
^content/namespaces/openshift-etcd/.*
```

**Find YAML manifests for pods:**
```regex
.*pods/.*\.yaml$
```

## Using with Claude Code

When you ask Claude to extract a must-gather, it will automatically use this skill. The skill provides detailed instructions that guide Claude through:
- Validating prerequisites
- Parsing URLs
- Downloading archives
- Extracting and decompressing
- Generating HTML browser

You can simply ask:

> "Extract must-gather from this Prow job: https://gcsweb-ci.../1965715986610917376/"

Claude will execute the workflow and generate the interactive HTML file browser.

## Troubleshooting

### gcloud not installed
```bash
# Check installation
which gcloud

# Install (follow platform-specific instructions)
# https://cloud.google.com/sdk/docs/install
```

### must-gather.tar not found
- Verify the job completed successfully
- Check that the target name is correct
- Confirm gather-must-gather ran in the job
- Manually check the GCS path in gcsweb

### Corrupted archives
- Check error messages in the extraction output
- Extraction continues despite individual failures
- The final summary lists all errors

### No "-ci-" directory found
- Extraction continues with the original directory names
- Check the logs for the warning message
- Files will still be accessible

### HTML browser not opening files
- Verify files were extracted to the `logs/` directory
- Check that relative paths are correct
- Files must be opened from the same directory as the HTML file

## File Type Classifications

| Extension | Type | Badge Color |
|-----------|------|-------------|
| .log, .txt | log | Blue |
| .yaml, .yml | yaml | Purple |
| .json | json | Green |
| .xml | xml | Yellow |
| .crt, .pem, .key | cert | Red |
| .tar, .gz, .tgz, .zip | archive | Gray |
| .sh, .py | script | Blue |
| .conf, .cfg, .ini | config | Yellow |
| others | other | Gray |

skills/prow-job-extract-must-gather/SKILL.md (new file, 493 lines)
@@ -0,0 +1,493 @@
---
name: Prow Job Extract Must-Gather
description: Extract and decompress must-gather archives from Prow CI job artifacts, generating an interactive HTML file browser with filters
---

# Prow Job Extract Must-Gather

This skill extracts and decompresses must-gather archives from Prow CI job artifacts, automatically handling nested tar and gzip archives, and generating an interactive HTML file browser.

## When to Use This Skill

Use this skill when the user wants to:
- Extract must-gather archives from Prow CI job artifacts
- Avoid manually downloading and extracting nested archives
- Browse must-gather contents with an interactive HTML interface
- Search for specific files or file types in must-gather data
- Analyze OpenShift cluster state from CI test runs

## Prerequisites

Before starting, verify these prerequisites:

1. **gcloud CLI Installation**
   - Check if installed: `which gcloud`
   - If not installed, provide instructions for the user's platform
   - Installation guide: https://cloud.google.com/sdk/docs/install

2. **gcloud Authentication (Optional)**
   - The `test-platform-results` bucket is publicly accessible
   - No authentication is required for read access
   - Skip authentication checks

## Input Format

The user will provide:
1. **Prow job URL** - gcsweb URL containing `test-platform-results/`
   - Example: `https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview/1965715986610917376/`
   - URL may or may not have a trailing slash

## Implementation Steps

### Step 1: Parse and Validate URL

1. **Extract bucket path**
   - Find `test-platform-results/` in the URL
   - Extract everything after it as the GCS bucket relative path
   - If not found, error: "URL must contain 'test-platform-results/'"

2. **Extract build_id**
   - Search for pattern `/(\d{10,})/` in the bucket path
   - build_id must be at least 10 consecutive decimal digits
   - Handle URLs with or without a trailing slash
   - If not found, error: "Could not find build ID (10+ digits) in URL"

3. **Extract prowjob name**
   - Find the path segment immediately preceding the build_id
   - Example: In `.../periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview/1965715986610917376/`
     - Prowjob name: `periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview`

4. **Construct GCS paths**
   - Bucket: `test-platform-results`
   - Base GCS path: `gs://test-platform-results/{bucket-path}/`
   - Ensure the path ends with `/`
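
A minimal sketch of this parsing logic is shown below; the `parse_prow_url` helper is illustrative and not part of the skill's scripts.

```python
# Hypothetical URL parsing helper implementing Step 1 (illustrative only).
import re

def parse_prow_url(url: str):
    marker = "test-platform-results/"
    if marker not in url:
        raise ValueError("URL must contain 'test-platform-results/'")
    bucket_path = url.split(marker, 1)[1].strip("/")

    match = re.search(r"/(\d{10,})/", "/" + bucket_path + "/")
    if not match:
        raise ValueError("Could not find build ID (10+ digits) in URL")
    build_id = match.group(1)

    segments = bucket_path.split("/")
    prowjob_name = segments[segments.index(build_id) - 1]

    gcs_base = f"gs://test-platform-results/{bucket_path}/"
    return bucket_path, build_id, prowjob_name, gcs_base
```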

### Step 2: Create Working Directory

1. **Check for existing extraction first**
   - Check if the `.work/prow-job-extract-must-gather/{build_id}/logs/` directory exists and has content
   - If it exists with content:
     - Use the AskUserQuestion tool to ask:
       - Question: "Must-gather already extracted for build {build_id}. Would you like to use the existing extraction or re-extract?"
       - Options:
         - "Use existing" - Skip to HTML report generation (Step 6)
         - "Re-extract" - Continue to clean and re-download
   - If the user chooses "Re-extract":
     - Remove all existing content: `rm -rf .work/prow-job-extract-must-gather/{build_id}/logs/`
     - Also remove the tmp directory: `rm -rf .work/prow-job-extract-must-gather/{build_id}/tmp/`
     - This ensures a clean state before downloading new content
   - If the user chooses "Use existing":
     - Skip directly to Step 6 (Generate HTML Report)

2. **Create directory structure**
   ```bash
   mkdir -p .work/prow-job-extract-must-gather/{build_id}/logs
   mkdir -p .work/prow-job-extract-must-gather/{build_id}/tmp
   ```
   - Use `.work/prow-job-extract-must-gather/` as the base directory (already in .gitignore)
   - Use the build_id as the subdirectory name
   - Create `logs/` subdirectory for extraction
   - Create `tmp/` subdirectory for temporary files
   - Working directory: `.work/prow-job-extract-must-gather/{build_id}/`
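
A minimal sketch of the existence check in item 1, assuming "has content" simply means the `logs/` directory exists and is non-empty (the helper name is illustrative):

```python
# Hypothetical check for a previous extraction (illustrative only).
from pathlib import Path

def has_existing_extraction(build_id: str) -> bool:
    logs = Path(f".work/prow-job-extract-must-gather/{build_id}/logs")
    return logs.is_dir() and any(logs.iterdir())
```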

### Step 3: Download and Validate prowjob.json

1. **Download prowjob.json**
   ```bash
   gcloud storage cp gs://test-platform-results/{bucket-path}/prowjob.json .work/prow-job-extract-must-gather/{build_id}/tmp/prowjob.json --no-user-output-enabled
   ```

2. **Parse and validate**
   - Read `.work/prow-job-extract-must-gather/{build_id}/tmp/prowjob.json`
   - Search for pattern: `--target=([a-zA-Z0-9-]+)`
   - If not found:
     - Display: "This is not a ci-operator job. The prowjob cannot be analyzed by this skill."
     - Explain: ci-operator jobs have a `--target` argument specifying the test target
     - Exit the skill

3. **Extract target name**
   - Capture the target value (e.g., `e2e-aws-ovn-techpreview`)
   - Store it for constructing the must-gather path
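
A minimal sketch of the validation in item 2, assuming prowjob.json has already been downloaded to the `tmp/` directory (the path reuses the document's example build ID and is illustrative):

```python
# Hypothetical --target check for a downloaded prowjob.json (illustrative only).
import re
from pathlib import Path

prowjob = Path(".work/prow-job-extract-must-gather/1965715986610917376/tmp/prowjob.json")
match = re.search(r"--target=([a-zA-Z0-9-]+)", prowjob.read_text())
if match is None:
    raise SystemExit("This is not a ci-operator job. No --target found in prowjob.json.")
target = match.group(1)  # e.g. "e2e-aws-ovn-techpreview"
```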

### Step 4: Download Must-Gather Archive

1. **Construct must-gather path**
   - GCS path: `gs://test-platform-results/{bucket-path}/artifacts/{target}/gather-must-gather/artifacts/must-gather.tar`
   - Local path: `.work/prow-job-extract-must-gather/{build_id}/tmp/must-gather.tar`

2. **Download must-gather.tar**
   ```bash
   gcloud storage cp gs://test-platform-results/{bucket-path}/artifacts/{target}/gather-must-gather/artifacts/must-gather.tar .work/prow-job-extract-must-gather/{build_id}/tmp/must-gather.tar --no-user-output-enabled
   ```
   - Use `--no-user-output-enabled` to suppress progress output
   - If the file is not found, error: "No must-gather archive found. Job may not have completed or gather-must-gather may not have run."

### Step 5: Extract and Process Archives

**IMPORTANT: Use the provided Python script `extract_archives.py` from the skill directory.**

**Usage:**
```bash
python3 plugins/prow-job/skills/prow-job-extract-must-gather/extract_archives.py \
  .work/prow-job-extract-must-gather/{build_id}/tmp/must-gather.tar \
  .work/prow-job-extract-must-gather/{build_id}/logs
```

**What the script does:**

1. **Extract must-gather.tar**
   - Extract to the `{build_id}/logs/` directory
   - Uses Python's `tarfile` module for reliable extraction

2. **Rename long subdirectory to "content/"**
   - Find the subdirectory containing "-ci-" in its name
   - Example: `registry-build09-ci-openshift-org-ci-op-m8t77165-stable-sha256-d1ae126eed86a47fdbc8db0ad176bf078a5edebdbb0df180d73f02e5f03779e0/`
   - Rename it to: `content/`
   - Preserves all files and subdirectories

3. **Recursively process nested archives**
   - Walk the entire directory tree
   - Find and process archives:

   **For .tar.gz and .tgz files:**
   ```python
   # Extract in place
   with tarfile.open(archive_path, 'r:gz') as tar:
       tar.extractall(path=parent_dir)
   # Remove original archive
   os.remove(archive_path)
   ```

   **For .gz files (no tar):**
   ```python
   # Gunzip in place
   with gzip.open(gz_path, 'rb') as f_in:
       with open(output_path, 'wb') as f_out:
           shutil.copyfileobj(f_in, f_out)
   # Remove original archive
   os.remove(gz_path)
   ```

4. **Progress reporting**
   - Print status for each extracted archive
   - Count total files and archives processed
   - Report final statistics

5. **Error handling**
   - Skip corrupted archives with a warning
   - Continue processing other files
   - Report all errors at the end

### Step 6: Generate HTML File Browser

**IMPORTANT: Use the provided Python script `generate_html_report.py` from the skill directory.**

**Usage:**
```bash
python3 plugins/prow-job/skills/prow-job-extract-must-gather/generate_html_report.py \
  .work/prow-job-extract-must-gather/{build_id}/logs \
  "{prowjob_name}" \
  "{build_id}" \
  "{target}" \
  "{gcsweb_url}"
```

**Output:** The script generates `.work/prow-job-extract-must-gather/{build_id}/must-gather-browser.html`

**What the script does:**

1. **Scan directory tree**
   - Recursively walk the `{build_id}/logs/` directory
   - Collect all files with metadata:
     - Relative path from logs/
     - File size (human-readable: KB, MB, GB)
     - File extension
     - Directory depth
     - Last modified time

2. **Classify files**
   - Detect file types based on extension:
     - Logs: `.log`, `.txt`
     - YAML: `.yaml`, `.yml`
     - JSON: `.json`
     - XML: `.xml`
     - Certificates: `.crt`, `.pem`, `.key`
     - Archives: `.tar`, `.gz`, `.tgz`, `.tar.gz`
     - Other
   - Count files by type for statistics

3. **Generate HTML structure**

   **Header Section:**
   ```html
   <div class="header">
     <h1>Must-Gather File Browser</h1>
     <div class="metadata">
       <p><strong>Prow Job:</strong> {prowjob-name}</p>
       <p><strong>Build ID:</strong> {build_id}</p>
       <p><strong>gcsweb URL:</strong> <a href="{original-url}">{original-url}</a></p>
       <p><strong>Target:</strong> {target}</p>
       <p><strong>Total Files:</strong> {count}</p>
       <p><strong>Total Size:</strong> {human-readable-size}</p>
     </div>
   </div>
   ```

   **Filter Controls:**
   ```html
   <div class="filters">
     <div class="filter-group">
       <label class="filter-label">File Type (multi-select)</label>
       <div class="filter-buttons">
         <button class="filter-btn" data-filter="type" data-value="log">Logs ({count})</button>
         <button class="filter-btn" data-filter="type" data-value="yaml">YAML ({count})</button>
         <button class="filter-btn" data-filter="type" data-value="json">JSON ({count})</button>
         <!-- etc -->
       </div>
     </div>
     <div class="filter-group">
       <label class="filter-label">Filter by Regex Pattern</label>
       <input type="text" class="search-box" id="pattern" placeholder="Enter regex pattern (e.g., .*etcd.*, .*\.log$)">
     </div>
     <div class="filter-group">
       <label class="filter-label">Search by Name</label>
       <input type="text" class="search-box" id="search" placeholder="Search file names...">
     </div>
   </div>
   ```

   **File List:**
   ```html
   <div class="file-list">
     <div class="file-item" data-type="{type}" data-path="{path}">
       <div class="file-icon">{icon}</div>
       <div class="file-info">
         <div class="file-name">
           <a href="{relative-path}" target="_blank">{filename}</a>
         </div>
         <div class="file-meta">
           <span class="file-path">{directory-path}</span>
           <span class="file-size">{size}</span>
           <span class="file-type badge badge-{type}">{type}</span>
         </div>
       </div>
     </div>
   </div>
   ```

   **CSS Styling:**
   - Use same dark theme as analyze-resource skill
   - Modern, clean design with good contrast
   - Responsive layout
   - File type color coding
   - Monospace fonts for paths
   - Hover effects on file items

   **JavaScript Interactivity:**
   ```javascript
   // Multi-select file type filters
   document.querySelectorAll('.filter-btn').forEach(btn => {
     btn.addEventListener('click', function() {
       // Toggle active state
       // Apply filters
     });
   });

   // Regex pattern filter
   document.getElementById('pattern').addEventListener('input', function() {
     const pattern = this.value;
     if (pattern) {
       const regex = new RegExp(pattern);
       // Filter files matching regex
     }
   });

   // Name search filter
   document.getElementById('search').addEventListener('input', function() {
     const query = this.value.toLowerCase();
     // Filter files by name substring
   });

   // Combine all active filters
   function applyFilters() {
     // Show/hide files based on all active filters
   }
   ```

4. **Statistics Section:**
   ```html
   <div class="stats">
     <div class="stat">
       <div class="stat-value">{total-files}</div>
       <div class="stat-label">Total Files</div>
     </div>
     <div class="stat">
       <div class="stat-value">{total-size}</div>
       <div class="stat-label">Total Size</div>
     </div>
     <div class="stat">
       <div class="stat-value">{log-count}</div>
       <div class="stat-label">Log Files</div>
     </div>
     <div class="stat">
       <div class="stat-value">{yaml-count}</div>
       <div class="stat-label">YAML Files</div>
     </div>
     <!-- etc -->
   </div>
   ```

5. **Write HTML to file**
   - The script automatically writes to `.work/prow-job-extract-must-gather/{build_id}/must-gather-browser.html`
   - Includes proper HTML5 structure
   - All CSS and JavaScript are inline for portability
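
Items 1 and 2 above reduce to a directory walk plus an extension lookup. The following is a minimal sketch, assuming classification is purely extension-based; the helper names are illustrative, not the actual functions in `generate_html_report.py`.

```python
# Hypothetical scan/classify helpers mirroring items 1 and 2 (illustrative only).
import os

TYPE_MAP = {
    ".log": "log", ".txt": "log",
    ".yaml": "yaml", ".yml": "yaml",
    ".json": "json", ".xml": "xml",
    ".crt": "cert", ".pem": "cert", ".key": "cert",
    ".tar": "archive", ".gz": "archive", ".tgz": "archive", ".zip": "archive",
    ".sh": "script", ".py": "script",
    ".conf": "config", ".cfg": "config", ".ini": "config",
}

def classify_file(filename: str) -> str:
    """Return the type label used for badges and filters."""
    _, ext = os.path.splitext(filename.lower())
    return TYPE_MAP.get(ext, "other")

def scan_tree(logs_dir: str):
    """Yield (relative path, size in bytes, type) for every extracted file."""
    for root, _, files in os.walk(logs_dir):
        for name in files:
            path = os.path.join(root, name)
            yield (os.path.relpath(path, logs_dir),
                   os.path.getsize(path),
                   classify_file(name))
```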

### Step 7: Present Results to User

1. **Display summary**
   ```
   Must-Gather Extraction Complete

   Prow Job: {prowjob-name}
   Build ID: {build_id}
   Target: {target}

   Extraction Statistics:
   - Total files: {file-count}
   - Total size: {human-readable-size}
   - Archives extracted: {archive-count}
   - Log files: {log-count}
   - YAML files: {yaml-count}
   - JSON files: {json-count}

   Extracted to: .work/prow-job-extract-must-gather/{build_id}/logs/

   File browser generated: .work/prow-job-extract-must-gather/{build_id}/must-gather-browser.html

   Open in browser to browse and search extracted files.
   ```

2. **Open report in browser**
   - Detect the platform and automatically open the HTML report in the default browser:
     - Linux: `xdg-open .work/prow-job-extract-must-gather/{build_id}/must-gather-browser.html`
     - macOS: `open .work/prow-job-extract-must-gather/{build_id}/must-gather-browser.html`
     - Windows: `start .work/prow-job-extract-must-gather/{build_id}/must-gather-browser.html`
   - On Linux (most common for this environment), use `xdg-open`

3. **Offer next steps**
   - Ask if the user wants to search for specific files
   - Explain that extracted files are available in `.work/prow-job-extract-must-gather/{build_id}/logs/`
   - Mention that the extraction is cached for faster subsequent browsing
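
A minimal sketch of the platform-aware opening described in item 2, assuming the report path is already known (the `open_report` helper name is illustrative):

```python
# Hypothetical cross-platform helper to open the generated report (illustrative only).
import platform
import subprocess

def open_report(html_path: str) -> None:
    system = platform.system()
    if system == "Darwin":
        subprocess.run(["open", html_path], check=False)
    elif system == "Windows":
        # "start" is a cmd.exe built-in, so it needs a shell.
        subprocess.run(f'start "" "{html_path}"', shell=True, check=False)
    else:
        subprocess.run(["xdg-open", html_path], check=False)
```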

## Error Handling

Handle these error scenarios gracefully:

1. **Invalid URL format**
   - Error: "URL must contain 'test-platform-results/' substring"
   - Provide an example of a valid URL

2. **Build ID not found**
   - Error: "Could not find build ID (10+ decimal digits) in URL path"
   - Explain the requirement and show the URL parsing

3. **gcloud not installed**
   - Detect with: `which gcloud`
   - Provide installation instructions for the user's platform
   - Link: https://cloud.google.com/sdk/docs/install

4. **prowjob.json not found**
   - Suggest verifying the URL and checking whether the job completed
   - Provide the gcsweb URL for manual verification

5. **Not a ci-operator job**
   - Error: "This is not a ci-operator job. No --target found in prowjob.json."
   - Explain: Only ci-operator jobs can be analyzed by this skill

6. **must-gather.tar not found**
   - Warn: "Must-gather archive not found at expected path"
   - Suggest: The job may not have completed, or gather-must-gather may not have run
   - Provide the full GCS path that was checked

7. **Corrupted archive**
   - Warn: "Could not extract {archive-path}: {error}"
   - Continue processing other archives
   - Report all errors in the final summary

8. **No "-ci-" subdirectory found**
   - Warn: "Could not find expected subdirectory to rename to 'content/'"
   - Continue with the extraction anyway
   - Files will be in the original directory structure

## Performance Considerations

1. **Avoid re-extracting**
   - Check if `.work/prow-job-extract-must-gather/{build_id}/logs/` already has content
   - Ask the user before re-extracting

2. **Efficient downloads**
   - Use `gcloud storage cp` with `--no-user-output-enabled` to suppress verbose output

3. **Memory efficiency**
   - Process archives incrementally
   - Don't load entire files into memory
   - Use streaming extraction

4. **Progress indicators**
   - Show "Downloading must-gather archive..." before the gcloud command
   - Show "Extracting must-gather.tar..." before extraction
   - Show "Processing nested archives..." during recursive extraction
   - Show "Generating HTML file browser..." before report generation

## Examples

### Example 1: Extract must-gather from a periodic job
```
User: "Extract must-gather from this Prow job: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview/1965715986610917376"

Output:
- Downloads must-gather.tar to: .work/prow-job-extract-must-gather/1965715986610917376/tmp/
- Extracts to: .work/prow-job-extract-must-gather/1965715986610917376/logs/
- Renames long subdirectory to: content/
- Processes 247 nested archives (.tar.gz, .tgz, .gz)
- Creates: .work/prow-job-extract-must-gather/1965715986610917376/must-gather-browser.html
- Opens browser with interactive file list (3,421 files, 234 MB)
```

## Tips

- Always verify gcloud prerequisites before starting (the gcloud CLI must be installed)
- Authentication is NOT required - the bucket is publicly accessible
- Use the `.work/prow-job-extract-must-gather/{build_id}/` directory structure for organization
- All work files are in `.work/`, which is already in .gitignore
- The Python scripts handle all extraction and HTML generation - use them!
- Cache extracted files in `.work/prow-job-extract-must-gather/{build_id}/` to avoid re-extraction
- The HTML file browser supports regex patterns for powerful file filtering
- Extracted files can be opened directly from the HTML browser (links are relative)

## Important Notes

1. **Archive Processing:**
   - The script automatically handles nested archives
   - Original compressed files are removed after successful extraction
   - Corrupted archives are skipped with warnings

2. **Directory Renaming:**
   - The long subdirectory name (containing "-ci-") is renamed to "content/" for brevity
   - Files within "content/" are NOT altered
   - This makes paths more readable in the HTML browser

3. **File Type Detection:**
   - File types are detected based on extension
   - Common types are color-coded in the HTML browser
   - All file types can be filtered

4. **Regex Pattern Filtering:**
   - Users can enter regex patterns in the filter input
   - Patterns match against full file paths
   - Invalid regex patterns are ignored gracefully

5. **Working with Scripts:**
   - All scripts are in `plugins/prow-job/skills/prow-job-extract-must-gather/`
   - `extract_archives.py` - Extracts and processes archives
   - `generate_html_report.py` - Generates interactive HTML file browser

skills/prow-job-extract-must-gather/extract_archives.py (new executable file, 202 lines)
@@ -0,0 +1,202 @@
#!/usr/bin/env python3
"""Extract and recursively decompress must-gather archives."""

import os
import sys
import tarfile
import gzip
import shutil
from pathlib import Path


def human_readable_size(size_bytes):
    """Convert bytes to human-readable format."""
    for unit in ['B', 'KB', 'MB', 'GB', 'TB']:
        if size_bytes < 1024.0:
            return f"{size_bytes:.1f} {unit}"
        size_bytes /= 1024.0
    return f"{size_bytes:.1f} PB"


def extract_tar_archive(tar_path, extract_to):
    """Extract a tar archive (including .tar.gz and .tgz)."""
    try:
        print(f"  Extracting: {tar_path}")
        with tarfile.open(tar_path, 'r:*') as tar:
            tar.extractall(path=extract_to)
        return True
    except Exception as e:
        print(f"  ERROR: Failed to extract {tar_path}: {e}", file=sys.stderr)
        return False


def gunzip_file(gz_path):
    """Gunzip a .gz file (not a tar.gz)."""
    try:
        # Output file is the same name without .gz extension
        output_path = gz_path[:-3] if gz_path.endswith('.gz') else gz_path + '.decompressed'

        print(f"  Decompressing: {gz_path}")
        with gzip.open(gz_path, 'rb') as f_in:
            with open(output_path, 'wb') as f_out:
                shutil.copyfileobj(f_in, f_out)
        return True, output_path
    except Exception as e:
        print(f"  ERROR: Failed to decompress {gz_path}: {e}", file=sys.stderr)
        return False, None


def find_and_rename_ci_directory(base_path):
    """Find directory containing '-ci-' and rename it to 'content'."""
    try:
        for item in os.listdir(base_path):
            item_path = os.path.join(base_path, item)
            if os.path.isdir(item_path) and '-ci-' in item:
                content_path = os.path.join(base_path, 'content')
                print("\nRenaming directory:")
                print(f"  From: {item}")
                print("  To: content/")
                os.rename(item_path, content_path)
                return True
        print("\nWARNING: No directory containing '-ci-' found to rename", file=sys.stderr)
        return False
    except Exception as e:
        print(f"ERROR: Failed to rename directory: {e}", file=sys.stderr)
        return False


def process_nested_archives(base_path):
    """Recursively find and extract nested archives."""
    archives_processed = 0
    errors = []

    print("\nProcessing nested archives...")

    # Keep processing until no more archives are found
    # (since extracting one archive might create new archives)
    max_iterations = 10
    iteration = 0

    while iteration < max_iterations:
        iteration += 1
        found_archives = False

        # Walk directory tree
        for root, dirs, files in os.walk(base_path):
            for filename in files:
                file_path = os.path.join(root, filename)
                processed = False

                # Handle .tar.gz and .tgz files
                if filename.endswith('.tar.gz') or filename.endswith('.tgz'):
                    parent_dir = os.path.dirname(file_path)
                    if extract_tar_archive(file_path, parent_dir):
                        os.remove(file_path)
                        archives_processed += 1
                        processed = True
                        found_archives = True
                    else:
                        errors.append(f"Failed to extract: {file_path}")

                # Handle plain .gz files (not .tar.gz)
                elif filename.endswith('.gz') and not filename.endswith('.tar.gz'):
                    success, output_path = gunzip_file(file_path)
                    if success:
                        os.remove(file_path)
                        archives_processed += 1
                        processed = True
                        found_archives = True
                    else:
                        errors.append(f"Failed to decompress: {file_path}")

        # If no archives were found in this iteration, we're done
        if not found_archives:
            break

    if iteration >= max_iterations:
        print(f"\nWARNING: Stopped after {max_iterations} iterations. Some nested archives may remain.", file=sys.stderr)

    return archives_processed, errors


def count_files_and_size(base_path):
    """Count total files and calculate total size."""
    total_files = 0
    total_size = 0

    for root, dirs, files in os.walk(base_path):
        for filename in files:
            file_path = os.path.join(root, filename)
            try:
                total_files += 1
                total_size += os.path.getsize(file_path)
            except OSError:
                pass

    return total_files, total_size


def main():
    if len(sys.argv) != 3:
        print("Usage: extract_archives.py <must-gather.tar> <output-directory>")
        print("  <must-gather.tar>: Path to the must-gather.tar file")
        print("  <output-directory>: Directory to extract to")
        sys.exit(1)

    tar_file = sys.argv[1]
    output_dir = sys.argv[2]

    # Validate inputs
    if not os.path.exists(tar_file):
        print(f"ERROR: Input file not found: {tar_file}", file=sys.stderr)
        sys.exit(1)

    # Create output directory
    os.makedirs(output_dir, exist_ok=True)

    print("=" * 80)
    print("Must-Gather Archive Extraction")
    print("=" * 80)

    # Step 1: Extract main tar file
    print("\nStep 1: Extracting must-gather.tar")
    print(f"  From: {tar_file}")
    print(f"  To: {output_dir}")

    if not extract_tar_archive(tar_file, output_dir):
        print("ERROR: Failed to extract must-gather.tar", file=sys.stderr)
        sys.exit(1)

    # Step 2: Rename directory containing '-ci-' to 'content'
    print("\nStep 2: Renaming long directory to 'content/'")
    find_and_rename_ci_directory(output_dir)

    # Step 3: Process nested archives
    print("\nStep 3: Processing nested archives")
    archives_processed, errors = process_nested_archives(output_dir)

    # Final statistics
    print("\n" + "=" * 80)
    print("Extraction Complete")
    print("=" * 80)

    total_files, total_size = count_files_and_size(output_dir)

    print("\nStatistics:")
    print(f"  Total files: {total_files:,}")
    print(f"  Total size: {human_readable_size(total_size)}")
    print(f"  Archives processed: {archives_processed}")

    if errors:
        print(f"\nErrors encountered: {len(errors)}")
        for error in errors[:10]:  # Show first 10 errors
            print(f"  - {error}")
        if len(errors) > 10:
            print(f"  ... and {len(errors) - 10} more errors")

    print(f"\nExtracted to: {output_dir}")
    print("")


if __name__ == '__main__':
    main()

skills/prow-job-extract-must-gather/generate_html_report.py (new executable file, 1289 lines)
File diff suppressed because it is too large