Initial commit

Zhongwei Li
2025-11-30 08:46:16 +08:00
commit c556a2eace
30 changed files with 8957 additions and 0 deletions


@@ -0,0 +1,54 @@
# Changelog
All notable changes to the Prow Job Extract Must-Gather skill will be documented in this file.
## [1.0.0] - 2025-01-17
### Added
- Initial release of Prow Job Extract Must-Gather skill
- Command file: `plugins/prow-job/commands/extract-must-gather.md`
- Comprehensive SKILL.md with detailed implementation instructions
- `extract_archives.py` script for recursive archive extraction
  - Extracts must-gather.tar to specified directory
  - Renames the long subdirectory (containing "-ci-") to "content/"
  - Recursively processes nested .tar.gz, .tgz, and .gz archives
  - Removes original compressed files after extraction
  - Handles up to 10 levels of nesting
  - Reports extraction statistics
- `generate_html_report.py` script for HTML file browser generation
  - Scans directory tree and collects file metadata
  - Classifies files by type (log, yaml, json, xml, cert, archive, script, config, other)
  - Generates interactive HTML with dark theme matching analyze-resource skill
  - Multi-select file type filters
  - Regex pattern filter for powerful file searches
  - Text search for file names and paths
  - Direct links to files with relative paths
  - Statistics dashboard showing file counts and sizes
  - Scroll to top button
- Comprehensive README.md documentation
- Working directory structure: `.work/prow-job-extract-must-gather/{build_id}/`
- Subdirectory organization: `logs/` for extracted content, `tmp/` for temporary files
- Same URL parsing logic as analyze-resource skill
- Support for caching extracted content (ask user before re-extracting)
- Error handling for corrupted archives, missing files, and invalid URLs
- Progress indicators for all long-running operations
- Platform-aware browser opening (xdg-open, open, start)
### Features
- **Automatic Archive Extraction**: Handles all nested archive formats automatically
- **Directory Renaming**: Shortens long subdirectory names for better usability
- **Interactive File Browser**: Modern HTML interface with powerful filtering
- **Regex Pattern Matching**: Search files using full regex syntax
- **File Type Classification**: Automatic detection and categorization of file types
- **Relative File Links**: Click to open files directly from HTML browser
- **Statistics Dashboard**: Visual overview of extracted content
- **Extraction Caching**: Avoid re-extracting by reusing cached content
- **Error Recovery**: Continue processing despite individual archive failures
### Technical Details
- Python 3 scripts using standard library (tarfile, gzip, os, pathlib)
- No external dependencies required
- Memory-efficient incremental processing
- Follows same patterns as analyze-resource skill
- Integrated with Claude Code permissions system
- Uses `.work/` directory (already in .gitignore)


@@ -0,0 +1,350 @@
# Prow Job Extract Must-Gather Skill
This skill extracts and decompresses must-gather archives from Prow CI job artifacts, automatically handles nested tar and gzip archives, and generates an interactive HTML file browser.
## Overview
The skill provides both a Claude Code skill interface and standalone scripts for extracting must-gather data from Prow CI jobs. It eliminates the manual steps of downloading and recursively extracting nested archives.
## Components
### 1. SKILL.md
Claude Code skill definition that provides detailed implementation instructions for the AI assistant.
### 2. Python Scripts
#### extract_archives.py
Extracts and recursively processes must-gather archives.
**Features:**
- Extracts must-gather.tar to specified directory
- Renames long subdirectory (containing "-ci-") to "content/" for readability
- Recursively processes nested archives:
  - `.tar.gz` and `.tgz`: Extract in place, remove original
  - `.gz` (plain gzip): Decompress in place, remove original
- Handles up to 10 levels of nested archives
- Reports extraction statistics
**Usage:**
```bash
python3 extract_archives.py <must-gather.tar> <output-directory>
```
**Example:**
```bash
python3 plugins/prow-job/skills/prow-job-extract-must-gather/extract_archives.py \
.work/prow-job-extract-must-gather/1965715986610917376/tmp/must-gather.tar \
.work/prow-job-extract-must-gather/1965715986610917376/logs
```
**Output:**
```
================================================================================
Must-Gather Archive Extraction
================================================================================
Step 1: Extracting must-gather.tar
From: .work/.../tmp/must-gather.tar
To: .work/.../logs
Extracting: .work/.../tmp/must-gather.tar
Step 2: Renaming long directory to 'content/'
From: registry-build09-ci-openshift-org-ci-op-...
To: content/
Step 3: Processing nested archives
Extracting: .../content/namespaces/openshift-etcd/pods/etcd-0.tar.gz
Decompressing: .../content/cluster-scoped-resources/nodes/ip-10-0-1-234.log.gz
... (continues for all archives)
================================================================================
Extraction Complete
================================================================================
Statistics:
Total files: 3,421
Total size: 234.5 MB
Archives processed: 247
Extracted to: .work/prow-job-extract-must-gather/1965715986610917376/logs
```
#### generate_html_report.py
Generates an interactive HTML file browser with filters and search.
**Features:**
- Scans directory tree and collects file metadata
- Classifies files by type (log, yaml, json, xml, cert, archive, script, config, other)
- Generates statistics (total files, total size, counts by type)
- Creates interactive HTML with:
  - Multi-select file type filters
  - Regex pattern filter for powerful searches
  - Text search for file names/paths
  - Direct links to files (relative paths)
  - Same dark theme as analyze-resource skill
**Usage:**
```bash
python3 generate_html_report.py <logs-directory> <prowjob_name> <build_id> <target> <gcsweb_url>
```
**Example:**
```bash
python3 plugins/prow-job/skills/prow-job-extract-must-gather/generate_html_report.py \
.work/prow-job-extract-must-gather/1965715986610917376/logs \
"periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview" \
"1965715986610917376" \
"e2e-aws-ovn-techpreview" \
"https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview/1965715986610917376"
```
**Output:**
- Creates `.work/prow-job-extract-must-gather/{build_id}/must-gather-browser.html`
## Prerequisites
1. **Python 3** - For running extraction and report generator scripts
2. **gcloud CLI** - For downloading artifacts from GCS
- Install: https://cloud.google.com/sdk/docs/install
- Authentication NOT required (bucket is publicly accessible)
## Workflow
1. **URL Parsing**
- Validate URL contains `test-platform-results/`
- Extract build_id (10+ digits)
- Extract prowjob name
- Construct GCS paths
2. **Working Directory**
- Create `.work/prow-job-extract-must-gather/{build_id}/` directory
- Create `logs/` subdirectory for extraction
- Create `tmp/` subdirectory for temporary files
- Check for existing extraction (offers to skip re-extraction)
3. **prowjob.json Validation**
- Download prowjob.json
- Search for `--target=` pattern
- Exit if not a ci-operator job
4. **Must-Gather Download**
- Download from: `artifacts/{target}/gather-must-gather/artifacts/must-gather.tar`
- Save to: `{build_id}/tmp/must-gather.tar`
5. **Extraction and Processing**
- Extract must-gather.tar to `{build_id}/logs/`
- Rename long subdirectory to "content/"
- Recursively extract nested archives (.tar.gz, .tgz, .gz)
- Remove original compressed files after extraction
6. **HTML Report Generation**
- Scan directory tree
- Classify files by type
- Calculate statistics
- Generate interactive HTML browser
- Output to `{build_id}/must-gather-browser.html`
## Output
### Console Output
```
Must-Gather Extraction Complete
Prow Job: periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview
Build ID: 1965715986610917376
Target: e2e-aws-ovn-techpreview
Extraction Statistics:
- Total files: 3,421
- Total size: 234.5 MB
- Archives extracted: 247
- Log files: 1,234
- YAML files: 856
- JSON files: 423
Extracted to: .work/prow-job-extract-must-gather/1965715986610917376/logs/
File browser generated: .work/prow-job-extract-must-gather/1965715986610917376/must-gather-browser.html
Open in browser to browse and search extracted files.
```
### HTML File Browser
The generated HTML report includes:
1. **Header Section**
- Prow job name
- Build ID
- Target name
- GCS URL (link to gcsweb)
- Local extraction path
2. **Statistics Dashboard**
- Total files count
- Total size (human-readable)
- Counts by file type (log, yaml, json, xml, cert, archive, script, config, other)
3. **Filter Controls**
- **File Type Filter**: Multi-select buttons to filter by type
- **Regex Pattern Filter**: Input field for regex patterns (e.g., `.*etcd.*`, `.*\.log$`, `^content/namespaces/.*`)
- **Name Search**: Text search for file names and paths
4. **File List**
- Icon for each file type
- File name (clickable link to open file)
- Directory path
- File size
- File type badge (color-coded)
- Sorted alphabetically by path
5. **Interactive Features**
- All filters work together (AND logic)
- Real-time filtering (300ms debounce)
- Regex pattern validation
- Scroll to top button
- No results message when filters match nothing
### Directory Structure
```
.work/prow-job-extract-must-gather/{build_id}/
├── tmp/
│   ├── prowjob.json
│   └── must-gather.tar (downloaded, not deleted)
├── logs/
│   └── content/                 # Renamed from long directory
│       ├── cluster-scoped-resources/
│       │   ├── nodes/
│       │   ├── clusterroles/
│       │   └── ...
│       ├── namespaces/
│       │   ├── openshift-etcd/
│       │   │   ├── pods/
│       │   │   ├── services/
│       │   │   └── ...
│       │   └── ...
│       └── ... (all extracted and decompressed)
└── must-gather-browser.html
```
## Performance Features
1. **Caching**
- Extracted files are cached in `{build_id}/logs/`
- Offers to skip re-extraction if content already exists
2. **Incremental Processing**
- Archives processed iteratively (up to 10 passes)
- Handles deeply nested archive structures
3. **Progress Indicators**
- Colored output for different stages
- Status messages for long-running operations
- Final statistics summary
4. **Error Handling**
- Graceful handling of corrupted archives
- Continues processing after errors
- Reports all errors in final summary
## Examples
### Basic Usage
```bash
# Via Claude Code
User: "Extract must-gather from https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview/1965715986610917376"
# Standalone script
python3 plugins/prow-job/skills/prow-job-extract-must-gather/extract_archives.py \
.work/prow-job-extract-must-gather/1965715986610917376/tmp/must-gather.tar \
.work/prow-job-extract-must-gather/1965715986610917376/logs
python3 plugins/prow-job/skills/prow-job-extract-must-gather/generate_html_report.py \
.work/prow-job-extract-must-gather/1965715986610917376/logs \
"periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview" \
"1965715986610917376" \
"e2e-aws-ovn-techpreview" \
"https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview/1965715986610917376"
```
### Using Regex Filters in HTML Browser
**Find all etcd-related files:**
```regex
.*etcd.*
```
**Find all log files:**
```regex
.*\.log$
```
**Find files in specific namespace:**
```regex
^content/namespaces/openshift-etcd/.*
```
**Find YAML manifests for pods:**
```regex
.*pods/.*\.yaml$
```
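These patterns can be sanity-checked outside the browser with Python's `re` module (the browser applies them as JavaScript regular expressions, but these simple patterns behave the same in both engines). The sample paths below are hypothetical, shaped like the extracted `content/` tree:
```python
import re

# Hypothetical sample paths resembling extracted must-gather content
paths = [
    "content/namespaces/openshift-etcd/pods/etcd-0/etcd/etcd/logs/current.log",
    "content/namespaces/openshift-etcd/pods/etcd-0.yaml",
    "content/cluster-scoped-resources/nodes/ip-10-0-1-234.log",
]

for pattern in [r".*etcd.*", r".*\.log$", r"^content/namespaces/openshift-etcd/.*", r".*pods/.*\.yaml$"]:
    matches = [p for p in paths if re.search(pattern, p)]
    print(f"{pattern}: {len(matches)} match(es)")
```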
## Using with Claude Code
When you ask Claude to extract a must-gather, it will automatically use this skill. The skill provides detailed instructions that guide Claude through:
- Validating prerequisites
- Parsing URLs
- Downloading archives
- Extracting and decompressing
- Generating HTML browser
You can simply ask:
> "Extract must-gather from this Prow job: https://gcsweb-ci.../1965715986610917376/"
Claude will execute the workflow and generate the interactive HTML file browser.
## Troubleshooting
### gcloud not installed
```bash
# Check installation
which gcloud
# Install (follow platform-specific instructions)
# https://cloud.google.com/sdk/docs/install
```
### must-gather.tar not found
- Verify job completed successfully
- Check target name is correct
- Confirm gather-must-gather ran in the job
- Manually check GCS path in gcsweb
### Corrupted archives
- Check error messages in extraction output
- Extraction continues despite individual failures
- Final summary lists all errors
### No "-ci-" directory found
- Extraction continues with original directory names
- Check logs for warning message
- Files will still be accessible
### HTML browser not opening files
- Verify files were extracted to `logs/` directory
- Check that relative paths are correct
- The HTML file must stay in `{build_id}/` alongside the `logs/` directory so relative links resolve
## File Type Classifications
| Extension | Type | Badge Color |
|-----------|------|-------------|
| .log, .txt | log | Blue |
| .yaml, .yml | yaml | Purple |
| .json | json | Green |
| .xml | xml | Yellow |
| .crt, .pem, .key | cert | Red |
| .tar, .gz, .tgz, .zip | archive | Gray |
| .sh, .py | script | Blue |
| .conf, .cfg, .ini | config | Yellow |
| others | other | Gray |
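A minimal sketch of this extension-to-type mapping (the helper name and exact structure inside `generate_html_report.py` may differ):
```python
from pathlib import Path

# Extension-to-type mapping mirroring the table above (illustrative only)
TYPE_BY_EXTENSION = {
    ".log": "log", ".txt": "log",
    ".yaml": "yaml", ".yml": "yaml",
    ".json": "json",
    ".xml": "xml",
    ".crt": "cert", ".pem": "cert", ".key": "cert",
    ".tar": "archive", ".gz": "archive", ".tgz": "archive", ".zip": "archive",
    ".sh": "script", ".py": "script",
    ".conf": "config", ".cfg": "config", ".ini": "config",
}

def classify(path: str) -> str:
    """Return the file type badge for a path, defaulting to 'other'."""
    return TYPE_BY_EXTENSION.get(Path(path).suffix.lower(), "other")

print(classify("content/namespaces/openshift-etcd/pods/etcd-0.yaml"))  # yaml
```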


@@ -0,0 +1,493 @@
---
name: Prow Job Extract Must-Gather
description: Extract and decompress must-gather archives from Prow CI job artifacts, generating an interactive HTML file browser with filters
---
# Prow Job Extract Must-Gather
This skill extracts and decompresses must-gather archives from Prow CI job artifacts, automatically handles nested tar and gzip archives, and generates an interactive HTML file browser.
## When to Use This Skill
Use this skill when the user wants to:
- Extract must-gather archives from Prow CI job artifacts
- Avoid manually downloading and extracting nested archives
- Browse must-gather contents with an interactive HTML interface
- Search for specific files or file types in must-gather data
- Analyze OpenShift cluster state from CI test runs
## Prerequisites
Before starting, verify these prerequisites:
1. **gcloud CLI Installation**
- Check if installed: `which gcloud`
- If not installed, provide instructions for the user's platform
- Installation guide: https://cloud.google.com/sdk/docs/install
2. **gcloud Authentication (Optional)**
- The `test-platform-results` bucket is publicly accessible
- No authentication is required for read access
- Skip authentication checks
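If a scripted pre-flight check is preferred, a minimal sketch using Python's `shutil.which` (equivalent to the `which gcloud` check above) might look like this:
```python
import shutil
import sys

# Mirror `which gcloud`; no auth check is needed because the
# test-platform-results bucket is publicly readable.
if shutil.which("gcloud") is None:
    print("gcloud CLI not found. Install it from "
          "https://cloud.google.com/sdk/docs/install", file=sys.stderr)
    sys.exit(1)
```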
## Input Format
The user will provide:
1. **Prow job URL** - gcsweb URL containing `test-platform-results/`
- Example: `https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview/1965715986610917376/`
- URL may or may not have trailing slash
## Implementation Steps
### Step 1: Parse and Validate URL
1. **Extract bucket path**
- Find `test-platform-results/` in URL
- Extract everything after it as the GCS bucket relative path
- If not found, error: "URL must contain 'test-platform-results/'"
2. **Extract build_id**
- Search for the pattern `/(\d{10,})/` in the bucket path
- build_id must be at least 10 consecutive decimal digits
- Handle URLs with or without trailing slash
- If not found, error: "Could not find build ID (10+ digits) in URL"
3. **Extract prowjob name**
- Find the path segment immediately preceding build_id
- Example: In `.../periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview/1965715986610917376/`
- Prowjob name: `periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview`
4. **Construct GCS paths**
- Bucket: `test-platform-results`
- Base GCS path: `gs://test-platform-results/{bucket-path}/`
- Ensure path ends with `/`
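A minimal sketch of this parsing logic (function and variable names are illustrative, not part of the skill's scripts):
```python
import re

def parse_prow_url(url: str):
    """Parse a gcsweb URL into (bucket_path, prowjob_name, build_id, gcs_base)."""
    marker = "test-platform-results/"
    if marker not in url:
        raise ValueError("URL must contain 'test-platform-results/'")
    bucket_path = url.split(marker, 1)[1].strip("/")
    # build_id: a path segment of 10+ consecutive decimal digits
    match = re.search(r"/(\d{10,})(?:/|$)", "/" + bucket_path)
    if not match:
        raise ValueError("Could not find build ID (10+ digits) in URL")
    build_id = match.group(1)
    # Prowjob name is the path segment immediately preceding the build ID
    segments = bucket_path.split("/")
    prowjob_name = segments[segments.index(build_id) - 1]
    gcs_base = f"gs://test-platform-results/{bucket_path}/"
    return bucket_path, prowjob_name, build_id, gcs_base
```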
### Step 2: Create Working Directory
1. **Check for existing extraction first**
- Check if `.work/prow-job-extract-must-gather/{build_id}/logs/` directory exists and has content
- If it exists with content:
- Use AskUserQuestion tool to ask:
- Question: "Must-gather already extracted for build {build_id}. Would you like to use the existing extraction or re-extract?"
- Options:
- "Use existing" - Skip to HTML report generation (Step 6)
- "Re-extract" - Continue to clean and re-download
- If user chooses "Re-extract":
- Remove all existing content: `rm -rf .work/prow-job-extract-must-gather/{build_id}/logs/`
- Also remove tmp directory: `rm -rf .work/prow-job-extract-must-gather/{build_id}/tmp/`
- This ensures clean state before downloading new content
- If user chooses "Use existing":
- Skip directly to Step 6 (Generate HTML Report)
2. **Create directory structure**
```bash
mkdir -p .work/prow-job-extract-must-gather/{build_id}/logs
mkdir -p .work/prow-job-extract-must-gather/{build_id}/tmp
```
- Use `.work/prow-job-extract-must-gather/` as the base directory (already in .gitignore)
- Use build_id as subdirectory name
- Create `logs/` subdirectory for extraction
- Create `tmp/` subdirectory for temporary files
- Working directory: `.work/prow-job-extract-must-gather/{build_id}/`
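A minimal sketch of the cache check and clean-up described above (the `re_extract` flag stands in for the user's answer to the AskUserQuestion prompt):
```python
import os
import shutil

def prepare_workdir(build_id: str, re_extract: bool) -> str:
    """Create the working directory layout; optionally wipe a previous extraction."""
    base = f".work/prow-job-extract-must-gather/{build_id}"
    logs_dir = os.path.join(base, "logs")
    tmp_dir = os.path.join(base, "tmp")
    has_existing = os.path.isdir(logs_dir) and bool(os.listdir(logs_dir))
    if has_existing and re_extract:
        # Ensure a clean state before downloading new content
        shutil.rmtree(logs_dir, ignore_errors=True)
        shutil.rmtree(tmp_dir, ignore_errors=True)
    os.makedirs(logs_dir, exist_ok=True)
    os.makedirs(tmp_dir, exist_ok=True)
    return base
```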
### Step 3: Download and Validate prowjob.json
1. **Download prowjob.json**
```bash
gcloud storage cp gs://test-platform-results/{bucket-path}/prowjob.json .work/prow-job-extract-must-gather/{build_id}/tmp/prowjob.json --no-user-output-enabled
```
2. **Parse and validate**
- Read `.work/prow-job-extract-must-gather/{build_id}/tmp/prowjob.json`
- Search for pattern: `--target=([a-zA-Z0-9-]+)`
- If not found:
- Display: "This is not a ci-operator job. The prowjob cannot be analyzed by this skill."
- Explain: ci-operator jobs have a --target argument specifying the test target
- Exit skill
3. **Extract target name**
- Capture the target value (e.g., `e2e-aws-ovn-techpreview`)
- Store for constructing must-gather path
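A minimal sketch of the `--target=` lookup, treating prowjob.json as plain text as described above:
```python
import re

def find_ci_operator_target(prowjob_json_path):
    """Return the ci-operator --target value, or None if this is not a ci-operator job."""
    with open(prowjob_json_path, encoding="utf-8") as f:
        content = f.read()
    match = re.search(r"--target=([a-zA-Z0-9-]+)", content)
    return match.group(1) if match else None
```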
### Step 4: Download Must-Gather Archive
1. **Construct must-gather path**
- GCS path: `gs://test-platform-results/{bucket-path}/artifacts/{target}/gather-must-gather/artifacts/must-gather.tar`
- Local path: `.work/prow-job-extract-must-gather/{build_id}/tmp/must-gather.tar`
2. **Download must-gather.tar**
```bash
gcloud storage cp gs://test-platform-results/{bucket-path}/artifacts/{target}/gather-must-gather/artifacts/must-gather.tar .work/prow-job-extract-must-gather/{build_id}/tmp/must-gather.tar --no-user-output-enabled
```
- Use `--no-user-output-enabled` to suppress progress output
- If file not found, error: "No must-gather archive found. Job may not have completed or gather-must-gather may not have run."
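A minimal sketch of constructing the path and running the same `gcloud storage cp` command via `subprocess` (an illustrative wrapper, not one of the skill's scripts):
```python
import subprocess

def download_must_gather(bucket_path: str, target: str, build_id: str) -> str:
    """Download must-gather.tar with gcloud; returns the local path."""
    gcs_path = (f"gs://test-platform-results/{bucket_path}/"
                f"artifacts/{target}/gather-must-gather/artifacts/must-gather.tar")
    # tmp/ was created in Step 2
    local_path = f".work/prow-job-extract-must-gather/{build_id}/tmp/must-gather.tar"
    result = subprocess.run(
        ["gcloud", "storage", "cp", gcs_path, local_path, "--no-user-output-enabled"],
        capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError("No must-gather archive found. Job may not have completed "
                           "or gather-must-gather may not have run.")
    return local_path
```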
### Step 5: Extract and Process Archives
**IMPORTANT: Use the provided Python script `extract_archives.py` from the skill directory.**
**Usage:**
```bash
python3 plugins/prow-job/skills/prow-job-extract-must-gather/extract_archives.py \
.work/prow-job-extract-must-gather/{build_id}/tmp/must-gather.tar \
.work/prow-job-extract-must-gather/{build_id}/logs
```
**What the script does:**
1. **Extract must-gather.tar**
- Extract to `{build_id}/logs/` directory
- Uses Python's tarfile module for reliable extraction
2. **Rename long subdirectory to "content/"**
- Find subdirectory containing "-ci-" in the name
- Example: `registry-build09-ci-openshift-org-ci-op-m8t77165-stable-sha256-d1ae126eed86a47fdbc8db0ad176bf078a5edebdbb0df180d73f02e5f03779e0/`
- Rename to: `content/`
- Preserves all files and subdirectories
3. **Recursively process nested archives**
- Walk entire directory tree
- Find and process archives:
**For .tar.gz and .tgz files:**
```python
# Extract in place
with tarfile.open(archive_path, 'r:gz') as tar:
    tar.extractall(path=parent_dir)
# Remove original archive
os.remove(archive_path)
```
**For .gz files (no tar):**
```python
# Gunzip in place
with gzip.open(gz_path, 'rb') as f_in:
    with open(output_path, 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)
# Remove original archive
os.remove(gz_path)
```
4. **Progress reporting**
- Print status for each extracted archive
- Count total files and archives processed
- Report final statistics
5. **Error handling**
- Skip corrupted archives with warning
- Continue processing other files
- Report all errors at the end
### Step 6: Generate HTML File Browser
**IMPORTANT: Use the provided Python script `generate_html_report.py` from the skill directory.**
**Usage:**
```bash
python3 plugins/prow-job/skills/prow-job-extract-must-gather/generate_html_report.py \
.work/prow-job-extract-must-gather/{build_id}/logs \
"{prowjob_name}" \
"{build_id}" \
"{target}" \
"{gcsweb_url}"
```
**Output:** The script generates `.work/prow-job-extract-must-gather/{build_id}/must-gather-browser.html`
**What the script does:**
1. **Scan directory tree** (a minimal sketch appears after this step's breakdown)
- Recursively walk `{build_id}/logs/` directory
- Collect all files with metadata:
- Relative path from logs/
- File size (human-readable: KB, MB, GB)
- File extension
- Directory depth
- Last modified time
2. **Classify files**
- Detect file types based on extension:
  - Logs: `.log`, `.txt`
  - YAML: `.yaml`, `.yml`
  - JSON: `.json`
  - XML: `.xml`
  - Certificates: `.crt`, `.pem`, `.key`
  - Archives: `.tar`, `.gz`, `.tgz`, `.zip`
  - Scripts: `.sh`, `.py`
  - Config: `.conf`, `.cfg`, `.ini`
  - Other
- Count files by type for statistics
3. **Generate HTML structure**
**Header Section:**
```html
<div class="header">
<h1>Must-Gather File Browser</h1>
<div class="metadata">
<p><strong>Prow Job:</strong> {prowjob-name}</p>
<p><strong>Build ID:</strong> {build_id}</p>
<p><strong>gcsweb URL:</strong> <a href="{original-url}">{original-url}</a></p>
<p><strong>Target:</strong> {target}</p>
<p><strong>Total Files:</strong> {count}</p>
<p><strong>Total Size:</strong> {human-readable-size}</p>
</div>
</div>
```
**Filter Controls:**
```html
<div class="filters">
<div class="filter-group">
<label class="filter-label">File Type (multi-select)</label>
<div class="filter-buttons">
<button class="filter-btn" data-filter="type" data-value="log">Logs ({count})</button>
<button class="filter-btn" data-filter="type" data-value="yaml">YAML ({count})</button>
<button class="filter-btn" data-filter="type" data-value="json">JSON ({count})</button>
<!-- etc -->
</div>
</div>
<div class="filter-group">
<label class="filter-label">Filter by Regex Pattern</label>
<input type="text" class="search-box" id="pattern" placeholder="Enter regex pattern (e.g., .*etcd.*, .*\\.log$)">
</div>
<div class="filter-group">
<label class="filter-label">Search by Name</label>
<input type="text" class="search-box" id="search" placeholder="Search file names...">
</div>
</div>
```
**File List:**
```html
<div class="file-list">
<div class="file-item" data-type="{type}" data-path="{path}">
<div class="file-icon">{icon}</div>
<div class="file-info">
<div class="file-name">
<a href="{relative-path}" target="_blank">{filename}</a>
</div>
<div class="file-meta">
<span class="file-path">{directory-path}</span>
<span class="file-size">{size}</span>
<span class="file-type badge badge-{type}">{type}</span>
</div>
</div>
</div>
</div>
```
**CSS Styling:**
- Use same dark theme as analyze-resource skill
- Modern, clean design with good contrast
- Responsive layout
- File type color coding
- Monospace fonts for paths
- Hover effects on file items
**JavaScript Interactivity:**
```javascript
// Multi-select file type filters
document.querySelectorAll('.filter-btn').forEach(btn => {
btn.addEventListener('click', function() {
// Toggle active state
// Apply filters
});
});
// Regex pattern filter
document.getElementById('pattern').addEventListener('input', function() {
const pattern = this.value;
if (pattern) {
const regex = new RegExp(pattern);
// Filter files matching regex
}
});
// Name search filter
document.getElementById('search').addEventListener('input', function() {
const query = this.value.toLowerCase();
// Filter files by name substring
});
// Combine all active filters
function applyFilters() {
// Show/hide files based on all active filters
}
```
4. **Statistics Section:**
```html
<div class="stats">
<div class="stat">
<div class="stat-value">{total-files}</div>
<div class="stat-label">Total Files</div>
</div>
<div class="stat">
<div class="stat-value">{total-size}</div>
<div class="stat-label">Total Size</div>
</div>
<div class="stat">
<div class="stat-value">{log-count}</div>
<div class="stat-label">Log Files</div>
</div>
<div class="stat">
<div class="stat-value">{yaml-count}</div>
<div class="stat-label">YAML Files</div>
</div>
<!-- etc -->
</div>
```
5. **Write HTML to file**
- Script automatically writes to `.work/prow-job-extract-must-gather/{build_id}/must-gather-browser.html`
- Includes proper HTML5 structure
- All CSS and JavaScript are inline for portability
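A minimal sketch of the directory scan and metadata collection from sub-step 1 above (field names are illustrative and may not match `generate_html_report.py` exactly):
```python
import os

def scan_files(logs_dir: str):
    """Collect per-file metadata: relative path, size, extension, depth, mtime."""
    files = []
    for root, _dirs, names in os.walk(logs_dir):
        for name in names:
            full = os.path.join(root, name)
            rel = os.path.relpath(full, logs_dir)
            try:
                stat = os.stat(full)
            except OSError:
                continue  # skip unreadable entries
            files.append({
                "path": rel,
                "size": stat.st_size,
                "ext": os.path.splitext(name)[1].lower(),
                "depth": rel.count(os.sep),
                "mtime": stat.st_mtime,
            })
    # Sorted alphabetically by path, as in the generated file list
    return sorted(files, key=lambda f: f["path"])
```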
### Step 7: Present Results to User
1. **Display summary**
```
Must-Gather Extraction Complete
Prow Job: {prowjob-name}
Build ID: {build_id}
Target: {target}
Extraction Statistics:
- Total files: {file-count}
- Total size: {human-readable-size}
- Archives extracted: {archive-count}
- Log files: {log-count}
- YAML files: {yaml-count}
- JSON files: {json-count}
Extracted to: .work/prow-job-extract-must-gather/{build_id}/logs/
File browser generated: .work/prow-job-extract-must-gather/{build_id}/must-gather-browser.html
Open in browser to browse and search extracted files.
```
2. **Open report in browser**
- Detect platform and automatically open the HTML report in the default browser
- Linux: `xdg-open .work/prow-job-extract-must-gather/{build_id}/must-gather-browser.html`
- macOS: `open .work/prow-job-extract-must-gather/{build_id}/must-gather-browser.html`
- Windows: `start .work/prow-job-extract-must-gather/{build_id}/must-gather-browser.html`
- On Linux (most common for this environment), use `xdg-open`; a platform-detection sketch follows this list
3. **Offer next steps**
- Ask if user wants to search for specific files
- Explain that extracted files are available in `.work/prow-job-extract-must-gather/{build_id}/logs/`
- Mention that extraction is cached for faster subsequent browsing
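A minimal sketch of the platform-aware opening from item 2 above, shelling out to the same commands (illustrative only):
```python
import subprocess
import sys

def open_report(report_path: str) -> None:
    """Open the generated HTML report with the platform's default opener."""
    if sys.platform.startswith("linux"):
        subprocess.run(["xdg-open", report_path], check=False)
    elif sys.platform == "darwin":
        subprocess.run(["open", report_path], check=False)
    elif sys.platform.startswith("win"):
        # 'start' is a cmd.exe builtin, so it needs a shell
        subprocess.run(f'start "" "{report_path}"', shell=True, check=False)
    else:
        print(f"Open manually: {report_path}")
```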
## Error Handling
Handle these error scenarios gracefully:
1. **Invalid URL format**
- Error: "URL must contain 'test-platform-results/' substring"
- Provide example of valid URL
2. **Build ID not found**
- Error: "Could not find build ID (10+ decimal digits) in URL path"
- Explain requirement and show URL parsing
3. **gcloud not installed**
- Detect with: `which gcloud`
- Provide installation instructions for user's platform
- Link: https://cloud.google.com/sdk/docs/install
4. **prowjob.json not found**
- Suggest verifying URL and checking if job completed
- Provide gcsweb URL for manual verification
5. **Not a ci-operator job**
- Error: "This is not a ci-operator job. No --target found in prowjob.json."
- Explain: Only ci-operator jobs can be analyzed by this skill
6. **must-gather.tar not found**
- Warn: "Must-gather archive not found at expected path"
- Suggest: Job may not have completed or gather-must-gather may not have run
- Provide full GCS path that was checked
7. **Corrupted archive**
- Warn: "Could not extract {archive-path}: {error}"
- Continue processing other archives
- Report all errors in final summary
8. **No "-ci-" subdirectory found**
- Warn: "Could not find expected subdirectory to rename to 'content/'"
- Continue with extraction anyway
- Files will be in original directory structure
## Performance Considerations
1. **Avoid re-extracting**
- Check if `.work/prow-job-extract-must-gather/{build_id}/logs/` already has content
- Ask user before re-extracting
2. **Efficient downloads**
- Use `gcloud storage cp` with `--no-user-output-enabled` to suppress verbose output
3. **Memory efficiency**
- Process archives incrementally
- Don't load entire files into memory
- Use streaming extraction
4. **Progress indicators**
- Show "Downloading must-gather archive..." before gcloud command
- Show "Extracting must-gather.tar..." before extraction
- Show "Processing nested archives..." during recursive extraction
- Show "Generating HTML file browser..." before report generation
## Examples
### Example 1: Extract must-gather from periodic job
```
User: "Extract must-gather from this Prow job: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview/1965715986610917376"
Output:
- Downloads must-gather.tar to: .work/prow-job-extract-must-gather/1965715986610917376/tmp/
- Extracts to: .work/prow-job-extract-must-gather/1965715986610917376/logs/
- Renames long subdirectory to: content/
- Processes 247 nested archives (.tar.gz, .tgz, .gz)
- Creates: .work/prow-job-extract-must-gather/1965715986610917376/must-gather-browser.html
- Opens browser with interactive file list (3,421 files, 234 MB)
```
## Tips
- Always verify gcloud prerequisites before starting (gcloud CLI must be installed)
- Authentication is NOT required - the bucket is publicly accessible
- Use `.work/prow-job-extract-must-gather/{build_id}/` directory structure for organization
- All work files are in `.work/` which is already in .gitignore
- The Python scripts handle all extraction and HTML generation - use them!
- Cache extracted files in `.work/prow-job-extract-must-gather/{build_id}/` to avoid re-extraction
- The HTML file browser supports regex patterns for powerful file filtering
- Extracted files can be opened directly from the HTML browser (links are relative)
## Important Notes
1. **Archive Processing:**
- The script automatically handles nested archives
- Original compressed files are removed after successful extraction
- Corrupted archives are skipped with warnings
2. **Directory Renaming:**
- The long subdirectory name (containing "-ci-") is renamed to "content/" for brevity
- Files within "content/" are NOT altered
- This makes paths more readable in the HTML browser
3. **File Type Detection:**
- File types are detected based on extension
- Common types are color-coded in the HTML browser
- All file types can be filtered
4. **Regex Pattern Filtering:**
- Users can enter regex patterns in the filter input
- Patterns match against full file paths
- Invalid regex patterns are ignored gracefully
5. **Working with Scripts:**
- All scripts are in `plugins/prow-job/skills/prow-job-extract-must-gather/`
- `extract_archives.py` - Extracts and processes archives
- `generate_html_report.py` - Generates interactive HTML file browser


@@ -0,0 +1,202 @@
#!/usr/bin/env python3
"""Extract and recursively decompress must-gather archives."""

import os
import sys
import tarfile
import gzip
import shutil
from pathlib import Path


def human_readable_size(size_bytes):
    """Convert bytes to human-readable format."""
    for unit in ['B', 'KB', 'MB', 'GB', 'TB']:
        if size_bytes < 1024.0:
            return f"{size_bytes:.1f} {unit}"
        size_bytes /= 1024.0
    return f"{size_bytes:.1f} PB"


def extract_tar_archive(tar_path, extract_to):
    """Extract a tar archive (including .tar.gz and .tgz)."""
    try:
        print(f" Extracting: {tar_path}")
        with tarfile.open(tar_path, 'r:*') as tar:
            tar.extractall(path=extract_to)
        return True
    except Exception as e:
        print(f" ERROR: Failed to extract {tar_path}: {e}", file=sys.stderr)
        return False


def gunzip_file(gz_path):
    """Gunzip a .gz file (not a tar.gz)."""
    try:
        # Output file is the same name without .gz extension
        output_path = gz_path[:-3] if gz_path.endswith('.gz') else gz_path + '.decompressed'
        print(f" Decompressing: {gz_path}")
        with gzip.open(gz_path, 'rb') as f_in:
            with open(output_path, 'wb') as f_out:
                shutil.copyfileobj(f_in, f_out)
        return True, output_path
    except Exception as e:
        print(f" ERROR: Failed to decompress {gz_path}: {e}", file=sys.stderr)
        return False, None


def find_and_rename_ci_directory(base_path):
    """Find directory containing '-ci-' and rename it to 'content'."""
    try:
        for item in os.listdir(base_path):
            item_path = os.path.join(base_path, item)
            if os.path.isdir(item_path) and '-ci-' in item:
                content_path = os.path.join(base_path, 'content')
                print("\nRenaming directory:")
                print(f" From: {item}")
                print(" To: content/")
                os.rename(item_path, content_path)
                return True
        print("\nWARNING: No directory containing '-ci-' found to rename", file=sys.stderr)
        return False
    except Exception as e:
        print(f"ERROR: Failed to rename directory: {e}", file=sys.stderr)
        return False


def process_nested_archives(base_path):
    """Recursively find and extract nested archives."""
    archives_processed = 0
    errors = []
    print("\nProcessing nested archives...")
    # Keep processing until no more archives are found
    # (since extracting one archive might create new archives)
    max_iterations = 10
    iteration = 0
    found_archives = False
    while iteration < max_iterations:
        iteration += 1
        found_archives = False
        # Walk directory tree
        for root, dirs, files in os.walk(base_path):
            for filename in files:
                file_path = os.path.join(root, filename)
                # Handle .tar.gz and .tgz files
                if filename.endswith('.tar.gz') or filename.endswith('.tgz'):
                    parent_dir = os.path.dirname(file_path)
                    if extract_tar_archive(file_path, parent_dir):
                        os.remove(file_path)
                        archives_processed += 1
                        found_archives = True
                    else:
                        errors.append(f"Failed to extract: {file_path}")
                # Handle plain .gz files (not .tar.gz)
                elif filename.endswith('.gz') and not filename.endswith('.tar.gz'):
                    success, output_path = gunzip_file(file_path)
                    if success:
                        os.remove(file_path)
                        archives_processed += 1
                        found_archives = True
                    else:
                        errors.append(f"Failed to decompress: {file_path}")
        # If no archives were found in this iteration, we're done
        if not found_archives:
            break
    if iteration >= max_iterations and found_archives:
        print(f"\nWARNING: Stopped after {max_iterations} iterations. Some nested archives may remain.", file=sys.stderr)
    return archives_processed, errors


def count_files_and_size(base_path):
    """Count total files and calculate total size."""
    total_files = 0
    total_size = 0
    for root, dirs, files in os.walk(base_path):
        for filename in files:
            file_path = os.path.join(root, filename)
            try:
                total_files += 1
                total_size += os.path.getsize(file_path)
            except OSError:
                pass
    return total_files, total_size


def main():
    if len(sys.argv) != 3:
        print("Usage: extract_archives.py <must-gather.tar> <output-directory>")
        print(" <must-gather.tar>: Path to the must-gather.tar file")
        print(" <output-directory>: Directory to extract to")
        sys.exit(1)
    tar_file = sys.argv[1]
    output_dir = sys.argv[2]
    # Validate inputs
    if not os.path.exists(tar_file):
        print(f"ERROR: Input file not found: {tar_file}", file=sys.stderr)
        sys.exit(1)
    # Create output directory
    os.makedirs(output_dir, exist_ok=True)
    print("=" * 80)
    print("Must-Gather Archive Extraction")
    print("=" * 80)
    # Step 1: Extract main tar file
    print("\nStep 1: Extracting must-gather.tar")
    print(f" From: {tar_file}")
    print(f" To: {output_dir}")
    if not extract_tar_archive(tar_file, output_dir):
        print("ERROR: Failed to extract must-gather.tar", file=sys.stderr)
        sys.exit(1)
    # Step 2: Rename directory containing '-ci-' to 'content'
    print("\nStep 2: Renaming long directory to 'content/'")
    find_and_rename_ci_directory(output_dir)
    # Step 3: Process nested archives
    print("\nStep 3: Processing nested archives")
    archives_processed, errors = process_nested_archives(output_dir)
    # Final statistics
    print("\n" + "=" * 80)
    print("Extraction Complete")
    print("=" * 80)
    total_files, total_size = count_files_and_size(output_dir)
    print("\nStatistics:")
    print(f" Total files: {total_files:,}")
    print(f" Total size: {human_readable_size(total_size)}")
    print(f" Archives processed: {archives_processed}")
    if errors:
        print(f"\nErrors encountered: {len(errors)}")
        for error in errors[:10]:  # Show first 10 errors
            print(f" - {error}")
        if len(errors) > 10:
            print(f" ... and {len(errors) - 10} more errors")
    print(f"\nExtracted to: {output_dir}")
    print("")


if __name__ == '__main__':
    main()

File diff suppressed because it is too large