---
name: Prow Job Analyze Resource
description: Analyze Kubernetes resource lifecycle in Prow CI job artifacts by parsing audit logs and pod logs from GCS, generating interactive HTML reports with timelines
---

# Prow Job Analyze Resource

This skill analyzes the lifecycle of Kubernetes resources during Prow CI job execution by downloading and parsing artifacts from Google Cloud Storage.

## When to Use This Skill

Use this skill when the user wants to:

- Debug Prow CI test failures by tracking resource state changes
- Understand when and how a Kubernetes resource was created, modified, or deleted during a test
- Analyze resource lifecycle across audit logs and pod logs from ephemeral test clusters
- Generate interactive HTML reports showing resource events over time
- Search for specific resources (pods, deployments, configmaps, etc.) in Prow job artifacts

## Prerequisites

Before starting, verify these prerequisites:

1. **gcloud CLI Installation**
   - Check if installed: `which gcloud`
   - If not installed, provide instructions for the user's platform
   - Installation guide: https://cloud.google.com/sdk/docs/install

2. **gcloud Authentication (Optional)**
   - The `test-platform-results` bucket is publicly accessible
   - No authentication is required for read access
   - Skip authentication checks

## Input Format

The user will provide:

1. **Prow job URL** - gcsweb URL containing `test-platform-results/`
   - Example: `https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/30393/pull-ci-openshift-origin-main-okd-scos-e2e-aws-ovn/1978913325970362368/`
   - URL may or may not have a trailing slash

2. **Resource specifications** - Comma-delimited list in format `[namespace:][kind/]name`
   - Supports regex patterns for matching multiple resources
   - Examples:
     - `pod/etcd-0` - pod named etcd-0 in any namespace
     - `openshift-etcd:pod/etcd-0` - pod in a specific namespace
     - `etcd-0` - any resource named etcd-0 (no kind filter)
     - `pod/etcd-0,configmap/cluster-config` - multiple resources
     - `resource-name-1|resource-name-2` - multiple resources using regex OR
     - `e2e-test-project-api-.*` - all resources matching the pattern

## Implementation Steps

### Step 1: Parse and Validate URL

1. **Extract bucket path**
   - Find `test-platform-results/` in the URL
   - Extract everything after it as the GCS bucket relative path
   - If not found, error: "URL must contain 'test-platform-results/'"

2. **Extract build_id**
   - Search for pattern `/(\d{10,})/` in the bucket path
   - build_id must be at least 10 consecutive decimal digits
   - Handle URLs with or without a trailing slash
   - If not found, error: "Could not find build ID (10+ digits) in URL"

3. **Extract prowjob name**
   - Find the path segment immediately preceding build_id
   - Example: in `.../pull-ci-openshift-origin-main-okd-scos-e2e-aws-ovn/1978913325970362368/`
     - Prowjob name: `pull-ci-openshift-origin-main-okd-scos-e2e-aws-ovn`

4. **Construct GCS paths**
   - Bucket: `test-platform-results`
   - Base GCS path: `gs://test-platform-results/{bucket-path}/`
   - Ensure the path ends with `/`

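For illustration, a minimal Python sketch of the URL parsing described above. The function name and return shape are illustrative (not an existing skill script); it assumes the build_id appears as its own path segment.

```python
import re

def parse_prow_url(url: str):
    """Extract bucket path, build_id, prowjob name, and GCS base path from a gcsweb URL (sketch)."""
    marker = "test-platform-results/"
    idx = url.find(marker)
    if idx == -1:
        raise ValueError("URL must contain 'test-platform-results/'")
    bucket_path = url[idx + len(marker):].strip("/")

    # build_id: at least 10 consecutive decimal digits forming a whole path segment
    m = re.search(r"/(\d{10,})(?:/|$)", "/" + bucket_path + "/")
    if not m:
        raise ValueError("Could not find build ID (10+ digits) in URL")
    build_id = m.group(1)

    # prowjob name: the path segment immediately preceding the build_id
    segments = bucket_path.split("/")
    prowjob_name = segments[segments.index(build_id) - 1]

    gcs_base = f"gs://test-platform-results/{bucket_path}/"
    return bucket_path, build_id, prowjob_name, gcs_base
```
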
### Step 2: Parse Resource Specifications

For each comma-delimited resource spec:

1. **Parse format** `[namespace:][kind/]name`
   - Split on `:` to get namespace (optional)
   - Split remaining on `/` to get kind (optional) and name (required)
   - Store as structured data: `{namespace, kind, name}`

2. **Validate**
   - name is required
   - namespace and kind are optional
   - Examples:
     - `pod/etcd-0` → `{kind: "pod", name: "etcd-0"}`
     - `openshift-etcd:pod/etcd-0` → `{namespace: "openshift-etcd", kind: "pod", name: "etcd-0"}`
     - `etcd-0` → `{name: "etcd-0"}`

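A minimal sketch of this spec parsing (illustrative, not an existing skill script); it assumes the name portion, which may itself be a regex, contains no `:` or `/`.

```python
def parse_resource_spec(spec: str) -> dict:
    """Parse '[namespace:][kind/]name' into its parts; the name may be a regex pattern."""
    namespace = kind = None
    rest = spec.strip()
    if ":" in rest:
        namespace, rest = rest.split(":", 1)
    if "/" in rest:
        kind, rest = rest.split("/", 1)
    if not rest:
        raise ValueError(f"Resource name is required in spec: {spec!r}")
    return {"namespace": namespace, "kind": kind, "name": rest}

# Example: split the comma-delimited list, then parse each spec
specs = [parse_resource_spec(s) for s in "openshift-etcd:pod/etcd-0,configmap/cluster-config".split(",")]
```
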
### Step 3: Create Working Directory

1. **Check for existing artifacts first**
   - Check if `.work/prow-job-analyze-resource/{build_id}/logs/` directory exists and has content
   - If it exists with content:
     - Use AskUserQuestion tool to ask:
       - Question: "Artifacts already exist for build {build_id}. Would you like to use the existing download or re-download?"
       - Options:
         - "Use existing" - Skip to artifact parsing step (Step 6)
         - "Re-download" - Continue to clean and re-download
     - If user chooses "Re-download":
       - Remove all existing content: `rm -rf .work/prow-job-analyze-resource/{build_id}/logs/`
       - Also remove tmp directory: `rm -rf .work/prow-job-analyze-resource/{build_id}/tmp/`
       - This ensures clean state before downloading new content
     - If user chooses "Use existing":
       - Skip directly to Step 6 (Parse Audit Logs)
       - Still need to download prowjob.json if it doesn't exist

2. **Create directory structure**

   ```bash
   mkdir -p .work/prow-job-analyze-resource/{build_id}/logs
   mkdir -p .work/prow-job-analyze-resource/{build_id}/tmp
   ```

   - Use `.work/prow-job-analyze-resource/` as the base directory (already in .gitignore)
   - Use build_id as the subdirectory name
   - Create `logs/` subdirectory for all downloads
   - Create `tmp/` subdirectory for temporary files (intermediate JSON, etc.)
   - Working directory: `.work/prow-job-analyze-resource/{build_id}/`

### Step 4: Download and Validate prowjob.json

1. **Download prowjob.json**

   ```bash
   gcloud storage cp gs://test-platform-results/{bucket-path}/prowjob.json .work/prow-job-analyze-resource/{build_id}/logs/prowjob.json --no-user-output-enabled
   ```

2. **Parse and validate**
   - Read `.work/prow-job-analyze-resource/{build_id}/logs/prowjob.json`
   - Search for pattern: `--target=([a-zA-Z0-9-]+)`
   - If not found:
     - Display: "This is not a ci-operator job. The prowjob cannot be analyzed by this skill."
     - Explain: ci-operator jobs have a --target argument specifying the test target
     - Exit skill

3. **Extract target name**
   - Capture the target value (e.g., `e2e-aws-ovn`)
   - Store for constructing the gather-extra path

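A minimal sketch of the `--target` extraction (the helper name and path are illustrative, not an existing skill script):

```python
import re

def extract_target(prowjob_json_path: str) -> str | None:
    """Return the ci-operator --target value from prowjob.json, or None if absent (sketch)."""
    with open(prowjob_json_path, encoding="utf-8") as f:
        content = f.read()
    m = re.search(r"--target=([a-zA-Z0-9-]+)", content)
    return m.group(1) if m else None

target = extract_target(".work/prow-job-analyze-resource/1978913325970362368/logs/prowjob.json")
if target is None:
    print("This is not a ci-operator job. The prowjob cannot be analyzed by this skill.")
```
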
### Step 5: Download Audit Logs and Pod Logs

1. **Construct gather-extra paths**
   - GCS path: `gs://test-platform-results/{bucket-path}/artifacts/{target}/gather-extra/`
   - Local path: `.work/prow-job-analyze-resource/{build_id}/logs/artifacts/{target}/gather-extra/`

2. **Download audit logs**

   ```bash
   mkdir -p .work/prow-job-analyze-resource/{build_id}/logs/artifacts/{target}/gather-extra/artifacts/audit_logs
   gcloud storage cp -r gs://test-platform-results/{bucket-path}/artifacts/{target}/gather-extra/artifacts/audit_logs/ .work/prow-job-analyze-resource/{build_id}/logs/artifacts/{target}/gather-extra/artifacts/audit_logs/ --no-user-output-enabled
   ```

   - Create the directory first to avoid gcloud errors
   - Use `--no-user-output-enabled` to suppress progress output
   - If the directory is not found, warn: "No audit logs found. Job may not have completed or audit logging may be disabled."

3. **Download pod logs**

   ```bash
   mkdir -p .work/prow-job-analyze-resource/{build_id}/logs/artifacts/{target}/gather-extra/artifacts/pods
   gcloud storage cp -r gs://test-platform-results/{bucket-path}/artifacts/{target}/gather-extra/artifacts/pods/ .work/prow-job-analyze-resource/{build_id}/logs/artifacts/{target}/gather-extra/artifacts/pods/ --no-user-output-enabled
   ```

   - Create the directory first to avoid gcloud errors
   - Use `--no-user-output-enabled` to suppress progress output
   - If the directory is not found, warn: "No pod logs found."

### Step 6: Parse Audit Logs and Pod Logs

**IMPORTANT: Use the provided Python script `parse_all_logs.py` from the skill directory to parse both audit logs and pod logs efficiently.**

**Usage:**

```bash
python3 plugins/prow-job/skills/prow-job-analyze-resource/parse_all_logs.py <resource_pattern> \
  .work/prow-job-analyze-resource/{build_id}/logs/artifacts/{target}/gather-extra/artifacts/audit_logs \
  .work/prow-job-analyze-resource/{build_id}/logs/artifacts/{target}/gather-extra/artifacts/pods \
  > .work/prow-job-analyze-resource/{build_id}/tmp/all_entries.json
```

**Resource Pattern Parameter:**

- The `<resource_pattern>` parameter supports **regex patterns**
- Use `|` (pipe) to search for multiple resources: `resource1|resource2|resource3`
- Use `.*` for wildcards: `e2e-test-project-.*`
- Simple substring matching still works: `my-namespace`
- Examples:
  - Single resource: `e2e-test-project-api-pkjxf`
  - Multiple resources: `e2e-test-project-api-pkjxf|e2e-test-project-api-7zdxx`
  - Pattern matching: `e2e-test-project-api-.*`

**Note:** The script outputs status messages to stderr which will display as progress. The JSON output to stdout is clean and ready to use.

**What the script does:**

1. **Find all log files**
   - Audit logs: `.work/prow-job-analyze-resource/{build_id}/logs/artifacts/{target}/gather-extra/artifacts/audit_logs/**/*.log`
   - Pod logs: `.work/prow-job-analyze-resource/{build_id}/logs/artifacts/{target}/gather-extra/artifacts/pods/**/*.log`

2. **Parse audit log files (JSONL format)**
   - Read file line by line
   - Each line is a JSON object (JSONL format)
   - Parse JSON into object `e`

3. **Extract fields from each audit log entry**
   - `e.verb` - action (get, list, create, update, patch, delete, watch)
   - `e.user.username` - user making the request
   - `e.responseStatus.code` - HTTP response code (integer)
   - `e.objectRef.namespace` - namespace (if namespaced)
   - `e.objectRef.resource` - lowercase plural kind (e.g., "pods", "configmaps")
   - `e.objectRef.name` - resource name
   - `e.requestReceivedTimestamp` - ISO 8601 timestamp

4. **Filter matches for each resource spec**
   - Uses **regex matching** on `e.objectRef.namespace` and `e.objectRef.name`
   - Pattern matches if found in either the namespace or name field
   - Supports all regex features:
     - Pipe operator: `resource1|resource2` matches either resource
     - Wildcards: `e2e-test-.*` matches all resources starting with `e2e-test-`
     - Character classes: `[abc]` matches a, b, or c
   - Simple substring matching still works for patterns without regex special chars
   - Performance optimization: plain strings use fast substring search

5. **For each audit log match, capture**
   - **Source**: "audit"
   - **Filename**: Full path to the .log file
   - **Line number**: Line number in file (1-indexed)
   - **Level**: Based on `e.responseStatus.code`
     - 200-299: "info"
     - 400-499: "warn"
     - 500-599: "error"
   - **Timestamp**: Parse `e.requestReceivedTimestamp` to datetime
   - **Content**: Full JSON line (for expandable details)
   - **Summary**: Generate formatted summary
     - Format: `{verb} {resource}/{name} in {namespace} by {username} → HTTP {code}`
     - Example: `create pod/etcd-0 in openshift-etcd by system:serviceaccount:kube-system:deployment-controller → HTTP 201`

6. **Parse pod log files (plain text format)**
   - Read file line by line
   - Each line is plain text (not JSON)
   - Search for the resource pattern in the line content

7. **For each pod log match, capture**
   - **Source**: "pod"
   - **Filename**: Full path to the .log file
   - **Line number**: Line number in file (1-indexed)
   - **Level**: Detect from glog format or default to "info"
     - Glog format: `E0910 11:43:41.153414 ...` (E=error, W=warn, I=info, F=fatal→error)
     - Non-glog format: default to "info"
   - **Timestamp**: Extract from the start of the line if present (format: `YYYY-MM-DDTHH:MM:SS.mmmmmmZ`)
   - **Content**: Full log line
   - **Summary**: First 200 characters of the line (after the timestamp if present)

8. **Combine and sort all entries**
   - Merge audit log entries and pod log entries
   - Sort all entries chronologically by timestamp
   - Entries without timestamps are placed at the end

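For reference, a minimal sketch of the level mapping and summary formatting described in steps 3-5 above. It is illustrative only, not the actual `parse_all_logs.py` code; for example, it prints the plural `objectRef.resource` value as-is, whereas the real script may render the kind differently (e.g., singular `pod`).

```python
import json

def audit_level(code: int) -> str:
    """Map an HTTP response code to a report level, per step 5 above."""
    if 200 <= code <= 299:
        return "info"
    if 400 <= code <= 499:
        return "warn"
    if 500 <= code <= 599:
        return "error"
    return "info"  # assumption: other codes default to info

def audit_summary(e: dict) -> str:
    """Format '{verb} {resource}/{name} in {namespace} by {username} → HTTP {code}'."""
    obj = e.get("objectRef", {})
    return (f"{e.get('verb')} {obj.get('resource')}/{obj.get('name')} "
            f"in {obj.get('namespace')} by {e.get('user', {}).get('username')} "
            f"→ HTTP {e.get('responseStatus', {}).get('code')}")

# Example with one JSONL audit line:
line = ('{"verb":"create","user":{"username":"system:serviceaccount:kube-system:deployment-controller"},'
        '"responseStatus":{"code":201},"objectRef":{"resource":"pods","name":"etcd-0","namespace":"openshift-etcd"},'
        '"requestReceivedTimestamp":"2025-09-10T11:43:41.153414Z"}')
e = json.loads(line)
print(audit_level(e["responseStatus"]["code"]), "-", audit_summary(e))
```
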
### Step 7: Generate HTML Report

**IMPORTANT: Use the provided Python script `generate_html_report.py` from the skill directory.**

**Usage:**

```bash
python3 plugins/prow-job/skills/prow-job-analyze-resource/generate_html_report.py \
  .work/prow-job-analyze-resource/{build_id}/tmp/all_entries.json \
  "{prowjob_name}" \
  "{build_id}" \
  "{target}" \
  "{resource_pattern}" \
  "{gcsweb_url}"
```

**Resource Pattern Parameter:**

- The `{resource_pattern}` should be the **same pattern used in the parse script**
- For single resources: `e2e-test-project-api-pkjxf`
- For multiple resources: `e2e-test-project-api-pkjxf|e2e-test-project-api-7zdxx`
- The script will parse the pattern to display the searched resources in the HTML header

**Output:** The script generates `.work/prow-job-analyze-resource/{build_id}/{first_resource_name}.html`

**What the script does:**

1. **Determine report filename**
   - Format: `.work/prow-job-analyze-resource/{build_id}/{resource_name}.html`
   - Uses the primary resource name for the filename

2. **Sort all entries by timestamp**
   - Loads audit log entries from JSON
   - Sorts chronologically (ascending)
   - Entries without timestamps go at the end

3. **Calculate timeline bounds**
   - min_time: Earliest timestamp found
   - max_time: Latest timestamp found
   - Time range: max_time - min_time

4. **Generate HTML structure**

**Header Section:**

```html
<div class="header">
  <h1>Prow Job Resource Lifecycle Analysis</h1>
  <div class="metadata">
    <p><strong>Prow Job:</strong> {prowjob-name}</p>
    <p><strong>Build ID:</strong> {build_id}</p>
    <p><strong>gcsweb URL:</strong> <a href="{original-url}">{original-url}</a></p>
    <p><strong>Target:</strong> {target}</p>
    <p><strong>Resources:</strong> {resource-list}</p>
    <p><strong>Total Entries:</strong> {count}</p>
    <p><strong>Time Range:</strong> {min_time} to {max_time}</p>
  </div>
</div>
```

**Interactive Timeline:**

```html
<div class="timeline-container">
  <svg id="timeline" width="100%" height="100">
    <!-- For each entry, render a colored vertical line -->
    <line x1="{position}%" y1="0" x2="{position}%" y2="100"
          stroke="{color}" stroke-width="2"
          class="timeline-event" data-entry-id="{entry-id}"
          title="{summary}">
    </line>
  </svg>
</div>
```

- Position: Calculate percentage based on the timestamp between min_time and max_time (see the sketch below)
- Color: white/lightgray (info), yellow (warn), red (error)
- Clickable: Jump to the corresponding entry
- Tooltip on hover: Show summary

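The position percentage is a simple linear interpolation between min_time and max_time. A minimal sketch (the times reuse Example 1's creation/deletion timestamps; the date and the single-instant fallback are assumptions):

```python
from datetime import datetime, timezone

def timeline_position(ts: datetime, min_time: datetime, max_time: datetime) -> float:
    """Horizontal position (0-100%) of an event on the timeline."""
    span = (max_time - min_time).total_seconds()
    if span == 0:
        return 0.0  # assumption: a single-instant range collapses to the left edge
    return (ts - min_time).total_seconds() / span * 100.0

min_t = datetime(2025, 9, 10, 18, 11, 2, tzinfo=timezone.utc)   # creation
max_t = datetime(2025, 9, 10, 18, 17, 32, tzinfo=timezone.utc)  # deletion
event = datetime(2025, 9, 10, 18, 14, 17, tzinfo=timezone.utc)  # midpoint event
print(f"{timeline_position(event, min_t, max_t):.1f}%")  # 50.0%
```
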
**Log Entries Section:**

```html
<div class="entries">
  <div class="filters">
    <!-- Filter controls: by level, by resource, by time range -->
  </div>

  <div class="entry" id="entry-{index}">
    <div class="entry-header">
      <span class="timestamp">{formatted-timestamp}</span>
      <span class="level badge-{level}">{level}</span>
      <span class="source">{filename}:{line-number}</span>
    </div>
    <div class="entry-summary">{summary}</div>
    <details class="entry-details">
      <summary>Show full content</summary>
      <pre><code>{content}</code></pre>
    </details>
  </div>
</div>
```

**CSS Styling:**

- Modern, clean design with good contrast
- Responsive layout
- Badge colors: info=gray, warn=yellow, error=red
- Monospace font for log content
- Syntax highlighting for JSON (in audit logs)

**JavaScript Interactivity:**

```javascript
// Timeline click handler
document.querySelectorAll('.timeline-event').forEach(el => {
  el.addEventListener('click', () => {
    const entryId = el.dataset.entryId;
    document.getElementById(entryId).scrollIntoView({behavior: 'smooth'});
  });
});

// Filter controls
// Expand/collapse details
// Search within entries
```

5. **Write HTML to file**
   - The script automatically writes to `.work/prow-job-analyze-resource/{build_id}/{resource_name}.html`
   - Includes proper HTML5 structure
   - All CSS and JavaScript are inline for portability

### Step 8: Present Results to User

1. **Display summary**

   ```
   Resource Lifecycle Analysis Complete

   Prow Job: {prowjob-name}
   Build ID: {build_id}
   Target: {target}

   Resources Analyzed:
   - {resource-spec-1}
   - {resource-spec-2}
   ...

   Artifacts downloaded to: .work/prow-job-analyze-resource/{build_id}/logs/

   Results:
   - Audit log entries: {audit-count}
   - Pod log entries: {pod-count}
   - Total entries: {total-count}
   - Time range: {min_time} to {max_time}

   Report generated: .work/prow-job-analyze-resource/{build_id}/{resource_name}.html

   Open in browser to view interactive timeline and detailed entries.
   ```

2. **Open report in browser**
   - Detect the platform and automatically open the HTML report in the default browser
   - Linux: `xdg-open .work/prow-job-analyze-resource/{build_id}/{resource_name}.html`
   - macOS: `open .work/prow-job-analyze-resource/{build_id}/{resource_name}.html`
   - Windows: `start .work/prow-job-analyze-resource/{build_id}/{resource_name}.html`
   - On Linux (most common for this environment), use `xdg-open`

3. **Offer next steps**
   - Ask if the user wants to search for additional resources in the same job
   - Ask if the user wants to analyze a different Prow job
   - Explain that artifacts are cached in `.work/prow-job-analyze-resource/{build_id}/` for faster subsequent searches

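As a cross-platform alternative to the per-platform commands in item 2 above, a minimal sketch using Python's standard-library `webbrowser` module (illustrative only; the skill itself prescribes the commands listed above, and the report path here is an example):

```python
import pathlib
import webbrowser

def open_report(report_path: str) -> None:
    """Open the generated HTML report in the default browser."""
    uri = pathlib.Path(report_path).resolve().as_uri()  # file:// URI works on Linux, macOS, and Windows
    webbrowser.open(uri)

open_report(".work/prow-job-analyze-resource/1978913325970362368/etcd-0.html")
```
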
## Error Handling

Handle these error scenarios gracefully:

1. **Invalid URL format**
   - Error: "URL must contain 'test-platform-results/' substring"
   - Provide an example of a valid URL

2. **Build ID not found**
   - Error: "Could not find build ID (10+ decimal digits) in URL path"
   - Explain the requirement and show the URL parsing

3. **gcloud not installed**
   - Detect with: `which gcloud`
   - Provide installation instructions for the user's platform
   - Link: https://cloud.google.com/sdk/docs/install

4. **gcloud not authenticated**
   - Normally not an issue, since the `test-platform-results` bucket is publicly readable
   - If access errors persist, detect with: `gcloud auth list`
   - Instruct: "Please run: gcloud auth login"

5. **No access to bucket**
   - Error from gcloud storage commands
   - Explain: "You need read access to the test-platform-results GCS bucket"
   - Suggest checking project access

6. **prowjob.json not found**
   - Suggest verifying the URL and checking whether the job completed
   - Provide the gcsweb URL for manual verification

7. **Not a ci-operator job**
   - Error: "This is not a ci-operator job. No --target found in prowjob.json."
   - Explain: Only ci-operator jobs can be analyzed by this skill

8. **gather-extra not found**
   - Warn: "gather-extra directory not found for target {target}"
   - Suggest: Job may not have completed or the target name is incorrect

9. **No matches found**
   - Display: "No log entries found matching the specified resources"
   - Suggest:
     - Check resource names for typos
     - Try searching without kind or namespace filters
     - Verify the resources existed during this job execution

10. **Timestamp parsing failures**
    - Warn about unparseable timestamps
    - Fall back to line order for sorting
    - Still include the entries in the report

## Performance Considerations

1. **Avoid re-downloading**
   - Check if `.work/prow-job-analyze-resource/{build_id}/logs/` already has content
   - Ask the user before re-downloading

2. **Efficient downloads**
   - Use `gcloud storage cp -r` for recursive downloads
   - Use `--no-user-output-enabled` to suppress verbose output
   - Create target directories with `mkdir -p` before downloading to avoid gcloud errors

3. **Memory efficiency**
   - The `parse_all_logs.py` script processes log files incrementally (line by line)
   - Don't load entire files into memory
   - The script outputs to JSON for efficient HTML generation

4. **Content length limits**
   - The HTML generator trims JSON content to ~2000 chars in display
   - Full content is available in expandable details sections

5. **Progress indicators**
   - Show "Downloading audit logs..." before gcloud commands
   - Show "Parsing audit logs..." before running the parse script
   - Show "Generating HTML report..." before running the report generator

## Examples

### Example 1: Search for a namespace/project

```
User: "Analyze e2e-test-project-api-p28m in this Prow job: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-okd-scos-4.20-e2e-aws-ovn-techpreview/1964725888612306944"

Output:
- Downloads artifacts to: .work/prow-job-analyze-resource/1964725888612306944/logs/
- Finds actual resource name: e2e-test-project-api-p28mx (namespace)
- Parses 382 audit log entries
- Finds 86 pod log mentions
- Creates: .work/prow-job-analyze-resource/1964725888612306944/e2e-test-project-api-p28mx.html
- Shows timeline from creation (18:11:02) to deletion (18:17:32)
```

### Example 2: Search for a pod

```
User: "Analyze pod/etcd-0 in this Prow job: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/30393/pull-ci-openshift-origin-main-okd-scos-e2e-aws-ovn/1978913325970362368/"

Output:
- Creates: .work/prow-job-analyze-resource/1978913325970362368/etcd-0.html
- Shows timeline of all pod/etcd-0 events across namespaces
```

### Example 3: Search by name only

```
User: "Find all resources named cluster-version-operator in job {url}"

Output:
- Searches without kind filter
- Finds deployments, pods, services, etc. all named cluster-version-operator
- Creates: .work/prow-job-analyze-resource/{build_id}/cluster-version-operator.html
```

### Example 4: Search for multiple resources using regex

```
User: "Analyze e2e-test-project-api-pkjxf and e2e-test-project-api-7zdxx in job {url}"

Output:
- Uses regex pattern: `e2e-test-project-api-pkjxf|e2e-test-project-api-7zdxx`
- Finds all events for both namespaces in a single pass
- Parses 1,047 total entries (501 for first namespace, 546 for second)
- Passes the same pattern to generate_html_report.py
- HTML displays: "Resources: e2e-test-project-api-7zdxx, e2e-test-project-api-pkjxf"
- Creates: .work/prow-job-analyze-resource/{build_id}/e2e-test-project-api-pkjxf.html
- Timeline shows interleaved events from both namespaces chronologically
```

## Tips

- Always verify gcloud prerequisites before starting (gcloud CLI must be installed)
- Authentication is NOT required - the bucket is publicly accessible
- Use the `.work/prow-job-analyze-resource/{build_id}/` directory structure for organization
- All work files are in `.work/`, which is already in .gitignore
- The Python scripts handle all parsing and HTML generation - use them!
- Cache artifacts in `.work/prow-job-analyze-resource/{build_id}/` to speed up subsequent searches
- The parse script supports **regex patterns** for flexible matching:
  - Use `resource1|resource2` to search for multiple resources in a single pass
  - Use `.*` wildcards to match resource name patterns
  - Simple substring matching still works for basic searches
- The resource name provided by the user may not exactly match the actual resource name in logs
  - Example: the user asks for `e2e-test-project-api-p28m` but the actual resource is `e2e-test-project-api-p28mx`
  - Use regex patterns like `e2e-test-project-api-p28m.*` to find partial matches
- For namespaces/projects, search for the resource name - it will match both `namespace` and `project` resources
- Provide helpful error messages with actionable solutions

## Important Notes

1. **Resource Name Matching:**
   - The parse script uses **regex pattern matching** for maximum flexibility
   - Supports the pipe operator (`|`) to search for multiple resources: `resource1|resource2`
   - Supports wildcards (`.*`) for pattern matching: `e2e-test-.*`
   - Simple substrings still work for basic searches
   - May match multiple related resources (e.g., namespace, project, rolebindings in that namespace)
   - Report all matches - this provides complete lifecycle context

2. **Namespace vs Project:**
   - In OpenShift, a `project` is essentially a `namespace` with additional metadata
   - Searching for a namespace will find both namespace and project resources
   - The audit logs contain events for both resource types

3. **Target Extraction:**
   - Must extract the `--target` argument from prowjob.json
   - This is critical for finding the correct gather-extra path
   - Non-ci-operator jobs cannot be analyzed (they don't have --target)

4. **Working with Scripts:**
   - All scripts are in `plugins/prow-job/skills/prow-job-analyze-resource/`
   - `parse_all_logs.py` - Parses audit logs and pod logs, outputs JSON
     - Detects glog severity levels (E=error, W=warn, I=info, F=fatal)
     - Supports regex patterns for resource matching
   - `generate_html_report.py` - Generates the interactive HTML report from JSON
   - Scripts output status messages to stderr for progress display. JSON output to stdout is clean.

5. **Pod Log Glog Format Support:**
   - The parser automatically detects and parses glog format logs
   - Glog format: `E0910 11:43:41.153414 ...`
     - `E` = severity (E/F → error, W → warn, I → info)
     - `0910` = month/day (MMDD)
     - `11:43:41.153414` = time with microseconds
   - Timestamp parsing: Extracts the timestamp and infers the year (2025)
   - Severity mapping allows filtering by level in the HTML report
   - Non-glog logs default to info level