---
name: Prow Job Analyze Resource
description: Analyze Kubernetes resource lifecycle in Prow CI job artifacts by parsing audit logs and pod logs from GCS, generating interactive HTML reports with timelines
---

# Prow Job Analyze Resource

This skill analyzes the lifecycle of Kubernetes resources during Prow CI job execution by downloading and parsing artifacts from Google Cloud Storage.

## When to Use This Skill

Use this skill when the user wants to:
- Debug Prow CI test failures by tracking resource state changes
- Understand when and how a Kubernetes resource was created, modified, or deleted during a test
- Analyze resource lifecycle across audit logs and pod logs from ephemeral test clusters
- Generate interactive HTML reports showing resource events over time
- Search for specific resources (pods, deployments, configmaps, etc.) in Prow job artifacts

## Prerequisites

Before starting, verify these prerequisites:

1. **gcloud CLI Installation**
   - Check if installed: `which gcloud`
   - If not installed, provide instructions for the user's platform
   - Installation guide: https://cloud.google.com/sdk/docs/install

2. **gcloud Authentication (Optional)**
   - The `test-platform-results` bucket is publicly accessible
   - No authentication is required for read access
   - Skip authentication checks

## Input Format

The user will provide:

1. **Prow job URL**
   - gcsweb URL containing `test-platform-results/`
   - Example: `https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/30393/pull-ci-openshift-origin-main-okd-scos-e2e-aws-ovn/1978913325970362368/`
   - URL may or may not have a trailing slash

2. **Resource specifications**
   - Comma-delimited list in the format `[namespace:][kind/]name`
   - Supports regex patterns for matching multiple resources
   - Examples:
     - `pod/etcd-0` - pod named etcd-0 in any namespace
     - `openshift-etcd:pod/etcd-0` - pod in a specific namespace
     - `etcd-0` - any resource named etcd-0 (no kind filter)
     - `pod/etcd-0,configmap/cluster-config` - multiple resources
     - `resource-name-1|resource-name-2` - multiple resources using regex OR
     - `e2e-test-project-api-.*` - all resources matching the pattern

## Implementation Steps

### Step 1: Parse and Validate URL

1. **Extract bucket path**
   - Find `test-platform-results/` in the URL
   - Extract everything after it as the GCS bucket relative path
   - If not found, error: "URL must contain 'test-platform-results/'"

2. **Extract build_id**
   - Search for the pattern `/(\d{10,})/` in the bucket path
   - build_id must be at least 10 consecutive decimal digits
   - Handle URLs with or without a trailing slash
   - If not found, error: "Could not find build ID (10+ digits) in URL"

3. **Extract prowjob name**
   - Find the path segment immediately preceding build_id
   - Example: in `.../pull-ci-openshift-origin-main-okd-scos-e2e-aws-ovn/1978913325970362368/`
   - Prowjob name: `pull-ci-openshift-origin-main-okd-scos-e2e-aws-ovn`

4. **Construct GCS paths**
   - Bucket: `test-platform-results`
   - Base GCS path: `gs://test-platform-results/{bucket-path}/`
   - Ensure the path ends with `/`

### Step 2: Parse Resource Specifications

For each comma-delimited resource spec:

1. **Parse format** `[namespace:][kind/]name`
   - Split on `:` to get the namespace (optional)
   - Split the remainder on `/` to get the kind (optional) and name (required)
   - Store as structured data: `{namespace, kind, name}`

2. **Validate**
   - name is required
   - namespace and kind are optional
   - Examples:
     - `pod/etcd-0` → `{kind: "pod", name: "etcd-0"}`
     - `openshift-etcd:pod/etcd-0` → `{namespace: "openshift-etcd", kind: "pod", name: "etcd-0"}`
     - `etcd-0` → `{name: "etcd-0"}`
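The following is a minimal sketch of Steps 1 and 2, not the skill's actual code; `parse_prow_url` and `parse_resource_spec` are hypothetical helper names used only for illustration of the build-ID regex and the `[namespace:][kind/]name` format.

```python
import re


def parse_prow_url(url: str) -> dict:
    """Illustrative: split a gcsweb URL into bucket path, build_id, and prowjob name."""
    marker = "test-platform-results/"
    idx = url.find(marker)
    if idx == -1:
        raise ValueError("URL must contain 'test-platform-results/'")
    bucket_path = url[idx + len(marker):].strip("/")

    # build_id: at least 10 consecutive decimal digits forming its own path segment
    m = re.search(r"/(\d{10,})(?:/|$)", "/" + bucket_path + "/")
    if not m:
        raise ValueError("Could not find build ID (10+ digits) in URL")
    build_id = m.group(1)

    # prowjob name: the path segment immediately preceding the build_id
    segments = bucket_path.split("/")
    prowjob_name = segments[segments.index(build_id) - 1]

    return {
        "bucket_path": bucket_path,
        "build_id": build_id,
        "prowjob_name": prowjob_name,
        "gcs_base": f"gs://test-platform-results/{bucket_path}/",
    }


def parse_resource_spec(spec: str) -> dict:
    """Parse '[namespace:][kind/]name' into its optional parts."""
    namespace, _, rest = spec.rpartition(":")  # namespace is optional
    kind, _, name = rest.rpartition("/")       # kind is optional, name is required
    return {"namespace": namespace or None, "kind": kind or None, "name": name}


# Example:
# parse_resource_spec("openshift-etcd:pod/etcd-0")
# -> {'namespace': 'openshift-etcd', 'kind': 'pod', 'name': 'etcd-0'}
```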
### Step 3: Create Working Directory

1. **Check for existing artifacts first**
   - Check if the `.work/prow-job-analyze-resource/{build_id}/logs/` directory exists and has content
   - If it exists with content:
     - Use the AskUserQuestion tool to ask:
       - Question: "Artifacts already exist for build {build_id}. Would you like to use the existing download or re-download?"
       - Options:
         - "Use existing" - Skip to the artifact parsing step (Step 6)
         - "Re-download" - Continue to clean and re-download
   - If the user chooses "Re-download":
     - Remove all existing content: `rm -rf .work/prow-job-analyze-resource/{build_id}/logs/`
     - Also remove the tmp directory: `rm -rf .work/prow-job-analyze-resource/{build_id}/tmp/`
     - This ensures a clean state before downloading new content
   - If the user chooses "Use existing":
     - Skip directly to Step 6 (Parse Audit Logs)
     - Still download prowjob.json if it doesn't exist

2. **Create directory structure**
   ```bash
   mkdir -p .work/prow-job-analyze-resource/{build_id}/logs
   mkdir -p .work/prow-job-analyze-resource/{build_id}/tmp
   ```
   - Use `.work/prow-job-analyze-resource/` as the base directory (already in .gitignore)
   - Use build_id as the subdirectory name
   - Create the `logs/` subdirectory for all downloads
   - Create the `tmp/` subdirectory for temporary files (intermediate JSON, etc.)
   - Working directory: `.work/prow-job-analyze-resource/{build_id}/`

### Step 4: Download and Validate prowjob.json

1. **Download prowjob.json**
   ```bash
   gcloud storage cp gs://test-platform-results/{bucket-path}/prowjob.json .work/prow-job-analyze-resource/{build_id}/logs/prowjob.json --no-user-output-enabled
   ```

2. **Parse and validate**
   - Read `.work/prow-job-analyze-resource/{build_id}/logs/prowjob.json`
   - Search for the pattern: `--target=([a-zA-Z0-9-]+)`
   - If not found:
     - Display: "This is not a ci-operator job. The prowjob cannot be analyzed by this skill."
     - Explain: ci-operator jobs have a --target argument specifying the test target
     - Exit the skill

3. **Extract target name**
   - Capture the target value (e.g., `e2e-aws-ovn`)
   - Store it for constructing the gather-extra path
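A hedged sketch of the `--target` extraction in Step 4; `extract_target` is a hypothetical helper name, not part of the skill's scripts.

```python
import re
import sys
from pathlib import Path


def extract_target(prowjob_json_path: str) -> str:
    """Illustrative: pull the ci-operator --target value out of prowjob.json."""
    text = Path(prowjob_json_path).read_text()
    m = re.search(r"--target=([a-zA-Z0-9-]+)", text)
    if not m:
        # Not a ci-operator job; the skill stops here.
        sys.exit("This is not a ci-operator job. The prowjob cannot be analyzed by this skill.")
    return m.group(1)  # e.g. "e2e-aws-ovn"


# target = extract_target(".work/prow-job-analyze-resource/{build_id}/logs/prowjob.json")
```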
### Step 5: Download Audit Logs and Pod Logs

1. **Construct gather-extra paths**
   - GCS path: `gs://test-platform-results/{bucket-path}/artifacts/{target}/gather-extra/`
   - Local path: `.work/prow-job-analyze-resource/{build_id}/logs/artifacts/{target}/gather-extra/`

2. **Download audit logs**
   ```bash
   mkdir -p .work/prow-job-analyze-resource/{build_id}/logs/artifacts/{target}/gather-extra/artifacts/audit_logs
   gcloud storage cp -r gs://test-platform-results/{bucket-path}/artifacts/{target}/gather-extra/artifacts/audit_logs/ .work/prow-job-analyze-resource/{build_id}/logs/artifacts/{target}/gather-extra/artifacts/audit_logs/ --no-user-output-enabled
   ```
   - Create the directory first to avoid gcloud errors
   - Use `--no-user-output-enabled` to suppress progress output
   - If the directory is not found, warn: "No audit logs found. Job may not have completed or audit logging may be disabled."

3. **Download pod logs**
   ```bash
   mkdir -p .work/prow-job-analyze-resource/{build_id}/logs/artifacts/{target}/gather-extra/artifacts/pods
   gcloud storage cp -r gs://test-platform-results/{bucket-path}/artifacts/{target}/gather-extra/artifacts/pods/ .work/prow-job-analyze-resource/{build_id}/logs/artifacts/{target}/gather-extra/artifacts/pods/ --no-user-output-enabled
   ```
   - Create the directory first to avoid gcloud errors
   - Use `--no-user-output-enabled` to suppress progress output
   - If the directory is not found, warn: "No pod logs found."

### Step 6: Parse Audit Logs and Pod Logs

**IMPORTANT: Use the provided Python script `parse_all_logs.py` from the skill directory to parse both audit logs and pod logs efficiently.**

**Usage:**
```bash
python3 plugins/prow-job/skills/prow-job-analyze-resource/parse_all_logs.py \
  .work/prow-job-analyze-resource/{build_id}/logs/artifacts/{target}/gather-extra/artifacts/audit_logs \
  .work/prow-job-analyze-resource/{build_id}/logs/artifacts/{target}/gather-extra/artifacts/pods \
  "{resource_pattern}" \
  > .work/prow-job-analyze-resource/{build_id}/tmp/all_entries.json
```

**Resource Pattern Parameter:**
- The `{resource_pattern}` parameter supports **regex patterns**
- Use `|` (pipe) to search for multiple resources: `resource1|resource2|resource3`
- Use `.*` for wildcards: `e2e-test-project-.*`
- Simple substring matching still works: `my-namespace`
- Examples:
  - Single resource: `e2e-test-project-api-pkjxf`
  - Multiple resources: `e2e-test-project-api-pkjxf|e2e-test-project-api-7zdxx`
  - Pattern matching: `e2e-test-project-api-.*`

**Note:** The script outputs status messages to stderr, which display as progress. The JSON output to stdout is clean and ready to use.
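For reference, a simplified re-implementation of the matching behavior described above (regex match against either the namespace or the name, with a substring fast path for plain strings). This is an illustrative sketch, not the logic from `parse_all_logs.py` itself.

```python
import re


def matches_resource(pattern, namespace, name):
    """A pattern matches if it is found in either the namespace or the name field."""
    namespace = namespace or ""
    name = name or ""
    if re.escape(pattern) == pattern:
        # No regex metacharacters: plain strings use fast substring search
        return pattern in namespace or pattern in name
    regex = re.compile(pattern)
    return bool(regex.search(namespace) or regex.search(name))


# Examples:
# matches_resource("etcd-0", "openshift-etcd", "etcd-0")                         -> True
# matches_resource("e2e-test-project-api-.*", "e2e-test-project-api-pkjxf", "")  -> True
# matches_resource("foo|bar", "openshift-etcd", "bar-deployment")                -> True
```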
**What the script does:**

1. **Find all log files**
   - Audit logs: `.work/prow-job-analyze-resource/{build_id}/logs/artifacts/{target}/gather-extra/artifacts/audit_logs/**/*.log`
   - Pod logs: `.work/prow-job-analyze-resource/{build_id}/logs/artifacts/{target}/gather-extra/artifacts/pods/**/*.log`

2. **Parse audit log files (JSONL format)**
   - Read each file line by line
   - Each line is a JSON object (JSONL format)
   - Parse the JSON into an object `e`

3. **Extract fields from each audit log entry**
   - `e.verb` - action (get, list, create, update, patch, delete, watch)
   - `e.user.username` - user making the request
   - `e.responseStatus.code` - HTTP response code (integer)
   - `e.objectRef.namespace` - namespace (if namespaced)
   - `e.objectRef.resource` - lowercase plural kind (e.g., "pods", "configmaps")
   - `e.objectRef.name` - resource name
   - `e.requestReceivedTimestamp` - ISO 8601 timestamp

4. **Filter matches for each resource spec**
   - Uses **regex matching** on `e.objectRef.namespace` and `e.objectRef.name`
   - A pattern matches if it is found in either the namespace or the name field
   - Supports all regex features:
     - Pipe operator: `resource1|resource2` matches either resource
     - Wildcards: `e2e-test-.*` matches all resources starting with `e2e-test-`
     - Character classes: `[abc]` matches a, b, or c
   - Simple substring matching still works for patterns without regex special chars
   - Performance optimization: plain strings use fast substring search

5. **For each audit log match, capture**
   - **Source**: "audit"
   - **Filename**: Full path to the .log file
   - **Line number**: Line number in the file (1-indexed)
   - **Level**: Based on `e.responseStatus.code`
     - 200-299: "info"
     - 400-499: "warn"
     - 500-599: "error"
   - **Timestamp**: Parse `e.requestReceivedTimestamp` to a datetime
   - **Content**: Full JSON line (for expandable details)
   - **Summary**: Generated formatted summary
     - Format: `{verb} {resource}/{name} in {namespace} by {username} → HTTP {code}`
     - Example: `create pod/etcd-0 in openshift-etcd by system:serviceaccount:kube-system:deployment-controller → HTTP 201`

6. **Parse pod log files (plain text format)**
   - Read each file line by line
   - Each line is plain text (not JSON)
   - Search for the resource pattern in the line content

7. **For each pod log match, capture**
   - **Source**: "pod"
   - **Filename**: Full path to the .log file
   - **Line number**: Line number in the file (1-indexed)
   - **Level**: Detect from glog format or default to "info"
     - Glog format: `E0910 11:43:41.153414 ...` (E=error, W=warn, I=info, F=fatal→error)
     - Non-glog format: default to "info"
   - **Timestamp**: Extract from the start of the line if present (format: `YYYY-MM-DDTHH:MM:SS.mmmmmmZ`)
   - **Content**: Full log line
   - **Summary**: First 200 characters of the line (after the timestamp if present)

8. **Combine and sort all entries**
   - Merge audit log entries and pod log entries
   - Sort all entries chronologically by timestamp
   - Entries without timestamps are placed at the end
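The sketch below illustrates the level mapping, summary format, and chronological sort described in items 5 and 8 above, assuming the audit-log field names listed earlier; it is not the script's actual code.

```python
from datetime import datetime


def level_from_status(code: int) -> str:
    """HTTP response code -> report level, as described in item 5."""
    if 200 <= code <= 299:
        return "info"
    if 400 <= code <= 499:
        return "warn"
    if 500 <= code <= 599:
        return "error"
    return "info"


def audit_summary(e: dict) -> str:
    """Format: '{verb} {resource}/{name} in {namespace} by {username} → HTTP {code}'."""
    ref = e.get("objectRef", {})
    return (f"{e.get('verb')} {ref.get('resource')}/{ref.get('name')} "
            f"in {ref.get('namespace')} by {e.get('user', {}).get('username')} "
            f"→ HTTP {e.get('responseStatus', {}).get('code')}")


def sort_entries(entries: list) -> list:
    """Chronological sort; entries without a timestamp go to the end."""
    return sorted(entries,
                  key=lambda x: (x["timestamp"] is None,
                                 x["timestamp"] or datetime.min))
```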

### Step 7: Generate HTML Report

**IMPORTANT: Use the provided Python script `generate_html_report.py` from the skill directory.**

**Usage:**
```bash
python3 plugins/prow-job/skills/prow-job-analyze-resource/generate_html_report.py \
  .work/prow-job-analyze-resource/{build_id}/tmp/all_entries.json \
  "{prowjob_name}" \
  "{build_id}" \
  "{target}" \
  "{resource_pattern}" \
  "{gcsweb_url}"
```

**Resource Pattern Parameter:**
- The `{resource_pattern}` should be the **same pattern used in the parse script**
- For single resources: `e2e-test-project-api-pkjxf`
- For multiple resources: `e2e-test-project-api-pkjxf|e2e-test-project-api-7zdxx`
- The script parses the pattern to display the searched resources in the HTML header

**Output:** The script generates `.work/prow-job-analyze-resource/{build_id}/{first_resource_name}.html`

**What the script does:**

1. **Determine report filename**
   - Format: `.work/prow-job-analyze-resource/{build_id}/{resource_name}.html`
   - Uses the primary resource name for the filename

2. **Sort all entries by timestamp**
   - Loads the log entries from JSON
   - Sorts chronologically (ascending)
   - Entries without timestamps go at the end

3. **Calculate timeline bounds**
   - min_time: Earliest timestamp found
   - max_time: Latest timestamp found
   - Time range: max_time - min_time

4. **Generate HTML structure**

   **Header Section:** The report title "Prow Job Resource Lifecycle Analysis", the job metadata passed on the command line (prowjob name, build ID, target), and the searched resources.

   **Interactive Timeline:** A horizontal bar of event markers:
   - Position: Calculated as a percentage based on the timestamp between min_time and max_time
   - Color: white/lightgray (info), yellow (warn), red (error)
   - Clickable: Jumps to the corresponding entry
   - Tooltip on hover: Shows the summary
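A minimal sketch of the position calculation just described (the marker's horizontal offset as a percentage of the overall time range); an assumed formula for illustration, not the generator's exact code.

```python
from datetime import datetime


def timeline_position(ts: datetime, min_time: datetime, max_time: datetime) -> float:
    """Left offset of a timeline marker, as a percentage of the time range."""
    total = (max_time - min_time).total_seconds()
    if total == 0:
        return 0.0  # all entries share one timestamp
    return (ts - min_time).total_seconds() / total * 100.0
```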
   **Log Entries Section:** One block per entry, showing the `{formatted-timestamp}`, the `{level}` badge, and `{filename}:{line-number}`, followed by the `{summary}` and an expandable "Show full content" section containing the full `{content}`.
   **CSS Styling:**
   - Modern, clean design with good contrast
   - Responsive layout
   - Badge colors: info=gray, warn=yellow, error=red
   - Monospace font for log content
   - Syntax highlighting for JSON (in audit logs)

   **JavaScript Interactivity:**
   ```javascript
   // Timeline click handler
   document.querySelectorAll('.timeline-event').forEach(el => {
     el.addEventListener('click', () => {
       const entryId = el.dataset.entryId;
       document.getElementById(entryId).scrollIntoView({behavior: 'smooth'});
     });
   });
   // Filter controls
   // Expand/collapse details
   // Search within entries
   ```

5. **Write HTML to file**
   - The script automatically writes to `.work/prow-job-analyze-resource/{build_id}/{resource_name}.html`
   - Includes proper HTML5 structure
   - All CSS and JavaScript are inline for portability

### Step 8: Present Results to User

1. **Display summary**
   ```
   Resource Lifecycle Analysis Complete

   Prow Job: {prowjob-name}
   Build ID: {build_id}
   Target: {target}

   Resources Analyzed:
   - {resource-spec-1}
   - {resource-spec-2}
   ...

   Artifacts downloaded to: .work/prow-job-analyze-resource/{build_id}/logs/

   Results:
   - Audit log entries: {audit-count}
   - Pod log entries: {pod-count}
   - Total entries: {total-count}
   - Time range: {min_time} to {max_time}

   Report generated: .work/prow-job-analyze-resource/{build_id}/{resource_name}.html

   Open in browser to view interactive timeline and detailed entries.
   ```

2. **Open report in browser**
   - Detect the platform and automatically open the HTML report in the default browser (see the sketch after this list)
   - Linux: `xdg-open .work/prow-job-analyze-resource/{build_id}/{resource_name}.html`
   - macOS: `open .work/prow-job-analyze-resource/{build_id}/{resource_name}.html`
   - Windows: `start .work/prow-job-analyze-resource/{build_id}/{resource_name}.html`
   - On Linux (most common for this environment), use `xdg-open`

3. **Offer next steps**
   - Ask if the user wants to search for additional resources in the same job
   - Ask if the user wants to analyze a different Prow job
   - Explain that artifacts are cached in `.work/prow-job-analyze-resource/{build_id}/` for faster subsequent searches
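A hedged sketch of the platform detection for the "open report" step, using Python's standard library; the skill may just run the listed shell command directly instead.

```python
import subprocess
import sys


def open_report(path: str) -> None:
    """Open the generated HTML report in the platform's default browser."""
    if sys.platform.startswith("linux"):
        subprocess.run(["xdg-open", path], check=False)
    elif sys.platform == "darwin":
        subprocess.run(["open", path], check=False)
    elif sys.platform.startswith("win"):
        # 'start' is a cmd.exe builtin, so it needs a shell; the first quoted
        # argument is the window title
        subprocess.run(f'start "" "{path}"', shell=True, check=False)


# open_report(".work/prow-job-analyze-resource/{build_id}/{resource_name}.html")
```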
## Error Handling

Handle these error scenarios gracefully:

1. **Invalid URL format**
   - Error: "URL must contain 'test-platform-results/' substring"
   - Provide an example of a valid URL

2. **Build ID not found**
   - Error: "Could not find build ID (10+ decimal digits) in URL path"
   - Explain the requirement and show the URL parsing

3. **gcloud not installed**
   - Detect with: `which gcloud`
   - Provide installation instructions for the user's platform
   - Link: https://cloud.google.com/sdk/docs/install

4. **gcloud authentication errors**
   - Authentication is normally unnecessary because the bucket is public
   - If gcloud still reports authentication errors, check with: `gcloud auth list`
   - Instruct: "Please run: gcloud auth login"

5. **No access to bucket**
   - Error from gcloud storage commands
   - Explain: "You need read access to the test-platform-results GCS bucket"
   - Suggest checking project access

6. **prowjob.json not found**
   - Suggest verifying the URL and checking whether the job completed
   - Provide the gcsweb URL for manual verification

7. **Not a ci-operator job**
   - Error: "This is not a ci-operator job. No --target found in prowjob.json."
   - Explain: Only ci-operator jobs can be analyzed by this skill

8. **gather-extra not found**
   - Warn: "gather-extra directory not found for target {target}"
   - Suggest: The job may not have completed or the target name is incorrect

9. **No matches found**
   - Display: "No log entries found matching the specified resources"
   - Suggest:
     - Check resource names for typos
     - Try searching without kind or namespace filters
     - Verify the resources existed during this job execution

10. **Timestamp parsing failures**
    - Warn about unparseable timestamps
    - Fall back to line order for sorting
    - Still include the entries in the report

## Performance Considerations

1. **Avoid re-downloading**
   - Check if `.work/prow-job-analyze-resource/{build_id}/logs/` already has content (see the sketch after this list)
   - Ask the user before re-downloading

2. **Efficient downloads**
   - Use `gcloud storage cp -r` for recursive downloads
   - Use `--no-user-output-enabled` to suppress verbose output
   - Create target directories with `mkdir -p` before downloading to avoid gcloud errors

3. **Memory efficiency**
   - The `parse_all_logs.py` script processes log files incrementally (line by line)
   - Don't load entire files into memory
   - The script outputs JSON for efficient HTML generation

4. **Content length limits**
   - The HTML generator trims JSON content to ~2000 chars in the display
   - Full content is available in expandable details sections

5. **Progress indicators**
   - Show "Downloading audit logs..." before gcloud commands
   - Show "Parsing audit logs..." before running the parse script
   - Show "Generating HTML report..." before running the report generator
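A minimal sketch of the re-download check in item 1 above (also described in Step 3); `has_cached_artifacts` is a hypothetical helper name.

```python
import os


def has_cached_artifacts(build_id: str) -> bool:
    """True if the logs directory for this build already exists and is non-empty."""
    logs_dir = f".work/prow-job-analyze-resource/{build_id}/logs"
    return os.path.isdir(logs_dir) and bool(os.listdir(logs_dir))


# If True, ask the user whether to reuse the existing download or re-download.
```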
## Examples

### Example 1: Search for a namespace/project

```
User: "Analyze e2e-test-project-api-p28m in this Prow job: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-okd-scos-4.20-e2e-aws-ovn-techpreview/1964725888612306944"

Output:
- Downloads artifacts to: .work/prow-job-analyze-resource/1964725888612306944/logs/
- Finds actual resource name: e2e-test-project-api-p28mx (namespace)
- Parses 382 audit log entries
- Finds 86 pod log mentions
- Creates: .work/prow-job-analyze-resource/1964725888612306944/e2e-test-project-api-p28mx.html
- Shows timeline from creation (18:11:02) to deletion (18:17:32)
```

### Example 2: Search for a pod

```
User: "Analyze pod/etcd-0 in this Prow job: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/30393/pull-ci-openshift-origin-main-okd-scos-e2e-aws-ovn/1978913325970362368/"

Output:
- Creates: .work/prow-job-analyze-resource/1978913325970362368/etcd-0.html
- Shows timeline of all pod/etcd-0 events across namespaces
```

### Example 3: Search by name only

```
User: "Find all resources named cluster-version-operator in job {url}"

Output:
- Searches without a kind filter
- Finds deployments, pods, services, etc. all named cluster-version-operator
- Creates: .work/prow-job-analyze-resource/{build_id}/cluster-version-operator.html
```

### Example 4: Search for multiple resources using regex

```
User: "Analyze e2e-test-project-api-pkjxf and e2e-test-project-api-7zdxx in job {url}"

Output:
- Uses regex pattern: `e2e-test-project-api-pkjxf|e2e-test-project-api-7zdxx`
- Finds all events for both namespaces in a single pass
- Parses 1,047 total entries (501 for first namespace, 546 for second)
- Passes the same pattern to generate_html_report.py
- HTML displays: "Resources: e2e-test-project-api-7zdxx, e2e-test-project-api-pkjxf"
- Creates: .work/prow-job-analyze-resource/{build_id}/e2e-test-project-api-pkjxf.html
- Timeline shows interleaved events from both namespaces chronologically
```

## Tips

- Always verify gcloud prerequisites before starting (the gcloud CLI must be installed)
- Authentication is NOT required - the bucket is publicly accessible
- Use the `.work/prow-job-analyze-resource/{build_id}/` directory structure for organization
- All work files are in `.work/`, which is already in .gitignore
- The Python scripts handle all parsing and HTML generation - use them!
- Cache artifacts in `.work/prow-job-analyze-resource/{build_id}/` to speed up subsequent searches
- The parse script supports **regex patterns** for flexible matching:
  - Use `resource1|resource2` to search for multiple resources in a single pass
  - Use `.*` wildcards to match resource name patterns
  - Simple substring matching still works for basic searches
- The resource name provided by the user may not exactly match the actual resource name in the logs
  - Example: The user asks for `e2e-test-project-api-p28m` but the actual resource is `e2e-test-project-api-p28mx`
  - Use regex patterns like `e2e-test-project-api-p28m.*` to find partial matches
- For namespaces/projects, search for the resource name - it will match both `namespace` and `project` resources
- Provide helpful error messages with actionable solutions

## Important Notes

1. **Resource Name Matching:**
   - The parse script uses **regex pattern matching** for maximum flexibility
   - Supports the pipe operator (`|`) to search for multiple resources: `resource1|resource2`
   - Supports wildcards (`.*`) for pattern matching: `e2e-test-.*`
   - Simple substrings still work for basic searches
   - May match multiple related resources (e.g., the namespace, project, and rolebindings in that namespace)
   - Report all matches - this provides complete lifecycle context

2. **Namespace vs Project:**
   - In OpenShift, a `project` is essentially a `namespace` with additional metadata
   - Searching for a namespace will find both namespace and project resources
   - The audit logs contain events for both resource types

3. **Target Extraction:**
   - Must extract the `--target` argument from prowjob.json
   - This is critical for finding the correct gather-extra path
   - Non-ci-operator jobs cannot be analyzed (they don't have --target)

4. **Working with Scripts:**
   - All scripts are in `plugins/prow-job/skills/prow-job-analyze-resource/`
   - `parse_all_logs.py` - Parses audit logs and pod logs, outputs JSON
     - Detects glog severity levels (E=error, W=warn, I=info, F=fatal)
     - Supports regex patterns for resource matching
   - `generate_html_report.py` - Generates the interactive HTML report from JSON
   - Scripts output status messages to stderr for progress display; the JSON output to stdout is clean

5. **Pod Log Glog Format Support:**
   - The parser automatically detects and parses glog format logs
   - Glog format: `E0910 11:43:41.153414 ...`
     - `E` = severity (E/F → error, W → warn, I → info)
     - `0910` = month/day (MMDD)
     - `11:43:41.153414` = time with microseconds
   - Timestamp parsing: Extracts the timestamp and infers the year (2025)
   - Severity mapping allows filtering by level in the HTML report
   - Non-glog logs default to info level
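A sketch of the glog parsing described in note 5 above, assuming the year is inferred as stated; the actual parser lives in `parse_all_logs.py` and may differ in detail.

```python
import re
from datetime import datetime

GLOG_RE = re.compile(r"^([EWIF])(\d{2})(\d{2}) (\d{2}:\d{2}:\d{2}\.\d+)")
SEVERITY = {"E": "error", "F": "error", "W": "warn", "I": "info"}


def parse_glog_line(line: str, year: int = 2025):
    """Return (level, timestamp) for a glog-format line, or ('info', None) otherwise."""
    m = GLOG_RE.match(line)
    if not m:
        return "info", None  # non-glog lines default to info
    sev, month, day, clock = m.groups()
    ts = datetime.strptime(f"{year}-{month}-{day} {clock}", "%Y-%m-%d %H:%M:%S.%f")
    return SEVERITY[sev], ts


# parse_glog_line("E0910 11:43:41.153414       1 controller.go:113] ...")
# -> ('error', datetime(2025, 9, 10, 11, 43, 41, 153414))
```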