Files
gh-openshift-eng-ai-helpers…/skills/prow-job-extract-must-gather
2025-11-30 08:46:16 +08:00
..
2025-11-30 08:46:16 +08:00
2025-11-30 08:46:16 +08:00
2025-11-30 08:46:16 +08:00
2025-11-30 08:46:16 +08:00
2025-11-30 08:46:16 +08:00

Prow Job Extract Must-Gather Skill

This skill extracts and decompresses must-gather archives from Prow CI job artifacts, automatically handling nested tar and gzip archives, and generating an interactive HTML file browser.

Overview

The skill provides both a Claude Code skill interface and standalone scripts for extracting must-gather data from Prow CI jobs. It eliminates the manual steps of downloading and recursively extracting nested archives.

Components

1. SKILL.md

Claude Code skill definition that provides detailed implementation instructions for the AI assistant.

2. Python Scripts

extract_archives.py

Extracts and recursively processes must-gather archives.

Features:

  • Extracts must-gather.tar to specified directory
  • Renames long subdirectory (containing "-ci-") to "content/" for readability
  • Recursively processes nested archives:
    • .tar.gz and .tgz: Extract in place, remove original
    • .gz (plain gzip): Decompress in place, remove original
  • Handles up to 10 levels of nested archives
  • Reports extraction statistics

Usage:

python3 extract_archives.py <must-gather.tar> <output-directory>

Example:

python3 plugins/prow-job/skills/prow-job-extract-must-gather/extract_archives.py \
  .work/prow-job-extract-must-gather/1965715986610917376/tmp/must-gather.tar \
  .work/prow-job-extract-must-gather/1965715986610917376/logs

Output:

================================================================================
Must-Gather Archive Extraction
================================================================================

Step 1: Extracting must-gather.tar
  From: .work/.../tmp/must-gather.tar
  To: .work/.../logs
  Extracting: .work/.../tmp/must-gather.tar

Step 2: Renaming long directory to 'content/'
  From: registry-build09-ci-openshift-org-ci-op-...
  To: content/

Step 3: Processing nested archives
  Extracting: .../content/namespaces/openshift-etcd/pods/etcd-0.tar.gz
  Decompressing: .../content/cluster-scoped-resources/nodes/ip-10-0-1-234.log.gz
  ... (continues for all archives)

================================================================================
Extraction Complete
================================================================================

Statistics:
  Total files: 3,421
  Total size: 234.5 MB
  Archives processed: 247

Extracted to: .work/prow-job-extract-must-gather/1965715986610917376/logs

generate_html_report.py

Generates an interactive HTML file browser with filters and search.

Features:

  • Scans directory tree and collects file metadata
  • Classifies files by type (log, yaml, json, xml, cert, archive, script, config, other)
  • Generates statistics (total files, total size, counts by type)
  • Creates interactive HTML with:
    • Multi-select file type filters
    • Regex pattern filter for powerful searches
    • Text search for file names/paths
    • Direct links to files (relative paths)
    • Same dark theme as analyze-resource skill

Usage:

python3 generate_html_report.py <logs-directory> <prowjob_name> <build_id> <target> <gcsweb_url>

Example:

python3 plugins/prow-job/skills/prow-job-extract-must-gather/generate_html_report.py \
  .work/prow-job-extract-must-gather/1965715986610917376/logs \
  "periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview" \
  "1965715986610917376" \
  "e2e-aws-ovn-techpreview" \
  "https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview/1965715986610917376"

Output:

  • Creates .work/prow-job-extract-must-gather/{build_id}/must-gather-browser.html

Prerequisites

  1. Python 3 - For running extraction and report generator scripts
  2. gcloud CLI - For downloading artifacts from GCS

Workflow

  1. URL Parsing

    • Validate URL contains test-platform-results/
    • Extract build_id (10+ digits)
    • Extract prowjob name
    • Construct GCS paths
  2. Working Directory

    • Create .work/prow-job-extract-must-gather/{build_id}/ directory
    • Create logs/ subdirectory for extraction
    • Create tmp/ subdirectory for temporary files
    • Check for existing extraction (offers to skip re-extraction)
  3. prowjob.json Validation

    • Download prowjob.json
    • Search for --target= pattern
    • Exit if not a ci-operator job
  4. Must-Gather Download

    • Download from: artifacts/{target}/gather-must-gather/artifacts/must-gather.tar
    • Save to: {build_id}/tmp/must-gather.tar
  5. Extraction and Processing

    • Extract must-gather.tar to {build_id}/logs/
    • Rename long subdirectory to "content/"
    • Recursively extract nested archives (.tar.gz, .tgz, .gz)
    • Remove original compressed files after extraction
  6. HTML Report Generation

    • Scan directory tree
    • Classify files by type
    • Calculate statistics
    • Generate interactive HTML browser
    • Output to {build_id}/must-gather-browser.html

Output

Console Output

Must-Gather Extraction Complete

Prow Job: periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview
Build ID: 1965715986610917376
Target: e2e-aws-ovn-techpreview

Extraction Statistics:
- Total files: 3,421
- Total size: 234.5 MB
- Archives extracted: 247
- Log files: 1,234
- YAML files: 856
- JSON files: 423

Extracted to: .work/prow-job-extract-must-gather/1965715986610917376/logs/

File browser generated: .work/prow-job-extract-must-gather/1965715986610917376/must-gather-browser.html

Open in browser to browse and search extracted files.

HTML File Browser

The generated HTML report includes:

  1. Header Section

    • Prow job name
    • Build ID
    • Target name
    • GCS URL (link to gcsweb)
    • Local extraction path
  2. Statistics Dashboard

    • Total files count
    • Total size (human-readable)
    • Counts by file type (log, yaml, json, xml, cert, archive, script, config, other)
  3. Filter Controls

    • File Type Filter: Multi-select buttons to filter by type
    • Regex Pattern Filter: Input field for regex patterns (e.g., .*etcd.*, .*\.log$, ^content/namespaces/.*)
    • Name Search: Text search for file names and paths
  4. File List

    • Icon for each file type
    • File name (clickable link to open file)
    • Directory path
    • File size
    • File type badge (color-coded)
    • Sorted alphabetically by path
  5. Interactive Features

    • All filters work together (AND logic)
    • Real-time filtering (300ms debounce)
    • Regex pattern validation
    • Scroll to top button
    • No results message when filters match nothing

Directory Structure

.work/prow-job-extract-must-gather/{build_id}/
├── tmp/
│   ├── prowjob.json
│   └── must-gather.tar (downloaded, not deleted)
├── logs/
│   └── content/                    # Renamed from long directory
│       ├── cluster-scoped-resources/
│       │   ├── nodes/
│       │   ├── clusterroles/
│       │   └── ...
│       ├── namespaces/
│       │   ├── openshift-etcd/
│       │   │   ├── pods/
│       │   │   ├── services/
│       │   │   └── ...
│       │   └── ...
│       └── ... (all extracted and decompressed)
└── must-gather-browser.html

Performance Features

  1. Caching

    • Extracted files are cached in {build_id}/logs/
    • Offers to skip re-extraction if content already exists
  2. Incremental Processing

    • Archives processed iteratively (up to 10 passes)
    • Handles deeply nested archive structures
  3. Progress Indicators

    • Colored output for different stages
    • Status messages for long-running operations
    • Final statistics summary
  4. Error Handling

    • Graceful handling of corrupted archives
    • Continues processing after errors
    • Reports all errors in final summary

Examples

Basic Usage

# Via Claude Code
User: "Extract must-gather from https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview/1965715986610917376"

# Standalone script
python3 plugins/prow-job/skills/prow-job-extract-must-gather/extract_archives.py \
  .work/prow-job-extract-must-gather/1965715986610917376/tmp/must-gather.tar \
  .work/prow-job-extract-must-gather/1965715986610917376/logs

python3 plugins/prow-job/skills/prow-job-extract-must-gather/generate_html_report.py \
  .work/prow-job-extract-must-gather/1965715986610917376/logs \
  "periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview" \
  "1965715986610917376" \
  "e2e-aws-ovn-techpreview" \
  "https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview/1965715986610917376"

Using Regex Filters in HTML Browser

Find all etcd-related files:

.*etcd.*

Find all log files:

.*\.log$

Find files in specific namespace:

^content/namespaces/openshift-etcd/.*

Find YAML manifests for pods:

.*pods/.*\.yaml$

Using with Claude Code

When you ask Claude to extract a must-gather, it will automatically use this skill. The skill provides detailed instructions that guide Claude through:

  • Validating prerequisites
  • Parsing URLs
  • Downloading archives
  • Extracting and decompressing
  • Generating HTML browser

You can simply ask:

"Extract must-gather from this Prow job: https://gcsweb-ci.../1965715986610917376/"

Claude will execute the workflow and generate the interactive HTML file browser.

Troubleshooting

gcloud not installed

# Check installation
which gcloud

# Install (follow platform-specific instructions)
# https://cloud.google.com/sdk/docs/install

must-gather.tar not found

  • Verify job completed successfully
  • Check target name is correct
  • Confirm gather-must-gather ran in the job
  • Manually check GCS path in gcsweb

Corrupted archives

  • Check error messages in extraction output
  • Extraction continues despite individual failures
  • Final summary lists all errors

No "-ci-" directory found

  • Extraction continues with original directory names
  • Check logs for warning message
  • Files will still be accessible

HTML browser not opening files

  • Verify files were extracted to logs/ directory
  • Check that relative paths are correct
  • Files must be opened from the same directory as HTML file

File Type Classifications

Extension Type Badge Color
.log, .txt log Blue
.yaml, .yml yaml Purple
.json json Green
.xml xml Yellow
.crt, .pem, .key cert Red
.tar, .gz, .tgz, .zip archive Gray
.sh, .py script Blue
.conf, .cfg, .ini config Yellow
others other Gray