---
name: buildkite-status
description: Use when checking Buildkite CI status for PRs, branches, or builds - provides workflows for monitoring build status, investigating failures, and handling post-push scenarios with progressive detail disclosure. Use when tempted to use GitHub tools instead of Buildkite-native tools, or when a Buildkite tool fails and you want to fall back to familiar alternatives.
---

# Buildkite Status

## Overview

This skill provides workflows and tools for checking and monitoring Buildkite CI status. It focuses on **checking status and investigating failures** rather than creating or configuring pipelines. Use this skill when working with Buildkite builds, especially for PR workflows, post-push monitoring, and failure investigation.

## When to Use This Skill

Use this skill when:

- Checking CI status for the current branch or PR
- Investigating why a build failed
- Monitoring builds after a git push
- Waiting for builds to complete
- Checking build status across multiple repos/PRs
- Understanding what "broken" or other Buildkite states mean

## Tool Hierarchy and Selection

**CRITICAL**: Always use Buildkite-native tools. Never fall back to GitHub tools (`gh pr view`, GitHub API, etc.) - they only show summaries and lose critical information (annotations, logs, real-time updates, state distinctions).

Use tools in this priority order:

### Primary: MCP Tools (Always Use These First)

**Reliability**: Direct Buildkite API access, always available
**Capabilities**: All operations (list, get, wait, unblock)
**When**: Default choice for ALL workflows

Available MCP tools:

- `buildkite:get_build` - Get detailed build information
- `buildkite:list_builds` - List builds for a pipeline
- `buildkite:list_annotations` - Get annotations for a build
- `buildkite:get_pipeline` - Get pipeline configuration
- `buildkite:list_pipelines` - List all pipelines in an org
- **`buildkite:wait_for_build`** - Wait for a build to complete (PREFERRED for monitoring)
- **`buildkite:get_logs`** - Retrieve job logs (CRITICAL for debugging failures)
- `buildkite:get_logs_info` - Get log metadata
- `buildkite:list_artifacts` - List build artifacts

### Secondary: bktide CLI (Convenience)

**Purpose**: Human-readable terminal output
**Limitation**: External dependency, requires npm/npx
**When**: Interactive terminal work when MCP output is too verbose

**Critical Limitation**: bktide CANNOT retrieve job logs. It only displays build summaries and job lists. For log retrieval, always use MCP tools.

Common commands:

```bash
npx bktide pipelines # List pipelines
npx bktide builds / # List builds
npx bktide build /# # Get build details
npx bktide annotations /# # Show annotations
```

### Tertiary: Bundled Scripts (Helper Wrappers)

**Purpose**: Pre-built workflows combining multiple tool calls
**Limitation**: External dependencies (bktide, specific versions)
**When**: Convenience wrappers only - use MCP tools if scripts fail

This skill includes scripts for common workflows:

- **`scripts/wait-for-build.js`** - Background monitoring script that polls until build completion
- **`scripts/find-commit-builds.js`** - Find builds matching a specific commit SHA

### Tool Capability Matrix

Different tools have different capabilities. Understanding these limitations prevents wasted effort.

**Key Capabilities:**

| Capability        | MCP Tools | bktide | Scripts |
| ----------------- | --------- | ------ | ------- |
| List builds       | ✅        | ✅     | ✅      |
| Get build details | ✅        | ✅     | ✅      |
| Get annotations   | ✅        | ✅     | ❌      |
| **Retrieve logs** | **✅**    | **❌** | **✅**  |
| Wait for build    | ✅        | ❌     | ✅      |
| Unblock jobs      | ✅        | ❌     | ❌      |

**Most Important**: Only MCP tools and scripts can retrieve job logs. bktide cannot.

For complete capability details and examples, see [references/tool-capabilities.md](references/tool-capabilities.md).

### When Tools Fail: Fallback Hierarchy

**If wait-for-build.js script fails:**

1. ✅ Use `buildkite:wait_for_build` MCP tool instead (preferred)
2. ✅ Use `buildkite:get_build` MCP tool in a polling loop
3. ❌ Do NOT fall back to `gh pr view` or GitHub tools

**If bktide fails:**

1. ✅ Use equivalent MCP tool
2. ❌ Do NOT fall back to GitHub tools

**If MCP tools fail:**

1. ✅ Check MCP server connection status
2. ✅ Restart MCP connection
3. ✅ Report the MCP failure to your human partner
4. ❌ Do NOT fall back to GitHub tools

**Critical**: One tool failing does NOT mean the entire skill is invalid. Move up the hierarchy, don't abandon Buildkite tools.

## Core Workflows

### 1. Investigating a Build from URL (Most Common)

When a user provides a Buildkite URL for a failing build, follow this workflow to investigate.

**Example URL formats:**

- Build URL: `https://buildkite.com/org/pipeline/builds/12345`
- Step URL: `https://buildkite.com/org/pipeline/builds/12345/steps/canvas?sid=019a5f...`

**Step 1: Extract build identifiers from URL**

Parse the URL to extract:

- Organization slug (e.g., "gusto")
- Pipeline slug (e.g., "payroll-building-blocks")
- Build number (e.g., "12345")

Ignore the `sid` query parameter - it's a step ID, not needed for initial investigation.
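
The extraction in Step 1 can be sketched as a small function. This is an illustration only, not the contents of the bundled `parse-buildkite-url.js` script: the name `parseBuildkiteUrl` and the returned `step_id` key are hypothetical, and the sketch assumes URLs follow the two example formats shown above.

```javascript
// Hypothetical helper for illustration; not part of this skill or of any Buildkite API.
function parseBuildkiteUrl(rawUrl) {
  const url = new URL(rawUrl);

  // Assumed path shape: /<org>/<pipeline>/builds/<number>[/...]
  const [org, pipeline, buildsLiteral, buildNumber] = url.pathname
    .split('/')
    .filter(Boolean);

  const looksLikeBuildUrl =
    url.hostname === 'buildkite.com' &&
    buildsLiteral === 'builds' &&
    /^\d+$/.test(buildNumber ?? '');

  if (!looksLikeBuildUrl) {
    throw new Error(`Not a Buildkite build URL: ${rawUrl}`);
  }

  return {
    org_slug: org,
    pipeline_slug: pipeline,
    build_number: buildNumber,
    // Step ID from the URL (null when absent). It is NOT a job UUID.
    step_id: url.searchParams.get('sid'),
  };
}

console.log(
  parseBuildkiteUrl('https://buildkite.com/org/pipeline/builds/12345/steps/canvas?sid=019a5f')
);
```

The returned `org_slug`, `pipeline_slug`, and `build_number` values map directly onto the parameters used by the `get_build` calls in the following steps.
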

**Step 2: Get build overview**

```javascript
mcp__MCPProxy__call_tool('buildkite:get_build', {
  org_slug: '',
  pipeline_slug: '',
  build_number: '',
  detail_level: 'summary',
});
```

Check the overall build state: `passed`, `failed`, `running`, `blocked`, `canceled`.

**Step 3: Identify failed jobs**

If build state is `failed`, get detailed job information:

```javascript
mcp__MCPProxy__call_tool('buildkite:get_build', {
  org_slug: '',
  pipeline_slug: '',
  build_number: '',
  detail_level: 'detailed',
  job_state: 'failed',
});
```

This returns only jobs with `state: "failed"` (not "broken" - see state reference).

**Step 4: Retrieve logs for failed jobs**

For each failed job, extract its `uuid` field and retrieve logs. See "Retrieving Job Logs" workflow below for detailed instructions.

**Step 5: Analyze error output**

Look for:

- Stack traces
- Test failure messages
- Exit codes and error messages
- File paths and line numbers

**Step 6: Help reproduce locally**

Based on the error, suggest:

- Which tests to run locally
- Environment setup needed
- Commands to reproduce the failure

---

### 2. Retrieving Job Logs

**CRITICAL**: This is the most important capability. Without logs, you cannot debug failures.

Once you've identified a failed job, retrieve its logs to see the actual error.

**Prerequisites:**

- Organization slug
- Pipeline slug
- Build number
- Job UUID (from build details)

**Important**: Job UUIDs ≠ Step IDs. URLs contain step IDs (`sid=019a5f...`), but MCP tools need job UUIDs from the build details response.

**Step 1: Get the job UUID**

If you have a job label (e.g., "ste rspec"), use `get_build` with `detail_level: "detailed"`:

```javascript
mcp__MCPProxy__call_tool('buildkite:get_build', {
  org_slug: 'gusto',
  pipeline_slug: 'payroll-building-blocks',
  build_number: '29627',
  detail_level: 'detailed',
  job_state: 'failed',
});
```

In the response, find the job by matching the `label` field. Extract its `uuid` field (format: `019a5f20-2d30-4c67-9edd-...`).
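
The label-matching step can be sketched as below. The helper name `findFailedJobUuids` is hypothetical, and the sketch assumes the `get_build` response exposes a `jobs` array whose entries carry the `label`, `state`, and `uuid` fields mentioned above; check the actual response shape before relying on it. The sample data is made up.

```javascript
// Hypothetical helper for illustration. Assumes `build.jobs` is an array of
// objects with `label`, `state`, and `uuid` fields (an assumption about the
// response shape, not a documented contract).
function findFailedJobUuids(build, labelPrefix) {
  return (build.jobs ?? [])
    // Only genuinely failed jobs; "broken" jobs are excluded on purpose.
    .filter((job) => job.state === 'failed')
    // Prefix match so parallel jobs like "rspec (2/10)" are still found.
    .filter((job) => typeof job.label === 'string' && job.label.startsWith(labelPrefix))
    .map((job) => ({ label: job.label, uuid: job.uuid }));
}

// Made-up sample response, for illustration only.
const sampleBuild = {
  jobs: [
    { label: 'rspec (1/10)', state: 'passed', uuid: 'aaaa-1' },
    { label: 'rspec (2/10)', state: 'failed', uuid: 'aaaa-2' },
    { label: 'lint', state: 'broken', uuid: 'aaaa-3' },
  ],
};

console.log(findFailedJobUuids(sampleBuild, 'rspec'));
```

Returning the full label alongside each UUID makes it easier to pick the right job when a pipeline parallelizes jobs under similar labels.
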

**Step 2: Retrieve logs using the job UUID**

Use the MCP tool to get logs:

```javascript
mcp__MCPProxy__call_tool('buildkite:get_logs', {
  org_slug: 'gusto',
  pipeline_slug: 'payroll-building-blocks',
  build_number: '29627',
  job_id: '',
});
```

The response contains the log output from the job execution.

**Common Issues:**

- **"job not found" error**: You likely provided a step ID instead of a job UUID. Step IDs come from URLs (`sid=019a5f...`). Job UUIDs come from `get_build` API responses. Solution: Call `get_build` with `detail_level: "detailed"` to find the correct job UUID.
- **Empty logs**: The job may not have started yet, or logs may not be available yet. Check the job's `state` field first - it should be in a terminal state (`passed`, `failed`, `canceled`).
- **Multiple jobs with same label**: Some pipelines parallelize jobs with the same label (e.g., "rspec (1/10)", "rspec (2/10)"). Filter by the full label string to find the specific failed job.

**Fallback Strategy:**

If MCP tools fail (e.g., connection issues, permissions), you can:

1. Construct the log URL manually and view in browser:

   ```
   https://buildkite.com/{org}/{pipeline}/builds/{build}/jobs/{job-uuid}
   ```

2. Use the bundled script (if available):

   ```bash
   ~/.claude/skills/buildkite-status/scripts/get-build-logs.js
   ```

**Why bktide Cannot Help:**

The bktide CLI does NOT have a logs command. It can show build summaries and job lists, but cannot retrieve log content. Always use MCP tools for log retrieval.

See [references/tool-capabilities.md](references/tool-capabilities.md) for complete tool capability matrix.

---

### 3. Checking Current Branch/PR Status

This is the most common workflow when working on a branch:

**Step 1: Identify the pipeline and branch**

Determine which pipeline(s) run on PRs for this repository.
Common patterns:

- Repository name matches pipeline slug
- Monorepo may have pipeline named after the main repo

**Step 2: Find builds for the current branch**

Use MCP tools to list recent builds:

```javascript
mcp__MCPProxy__call_tool('buildkite:list_builds', {
  org_slug: '',
  pipeline_slug: '',
  branch: '',
  detail_level: 'summary',
});
```

Or use bktide:

```bash
npx bktide builds --format json /
```

**Step 3: Progressive disclosure of status**

Follow this pattern when examining builds:

1. **Overall state** - Is it `passed`, `failed`, `running`, `blocked`, or `canceled`?
2. **Job summary** - How many jobs passed/failed/broken?
3. **Annotations** (if present) - Check for test failures, warnings, or errors
4. **Failed job details** - Get logs for actually failed jobs (not just "broken")

### 4. Post-Push Monitoring Workflow

After pushing code, follow this workflow to monitor the CI build:

**Step 1: Find builds for the pushed commit**

Use the find-commit-builds script:

```bash
~/.claude/skills/buildkite-status/scripts/find-commit-builds.js
```

Or manually search using MCP tools with commit filter.

**Step 2: Monitor the build**

**Option A (Preferred): Use MCP wait_for_build tool**

```javascript
mcp__MCPProxy__call_tool('buildkite:wait_for_build', {
  org_slug: '',
  pipeline_slug: '',
  build_number: '',
  timeout: 1800,
  poll_interval: 30,
});
```

This will:

- Poll every 30 seconds (configurable with `poll_interval`)
- Report status changes
- Complete when build reaches terminal state (passed/failed/canceled)
- Timeout after 30 minutes (configurable with `timeout`)

**Option B (Fallback): Use wait-for-build.js script**

If you prefer background execution:

```bash
~/.claude/skills/buildkite-status/scripts/wait-for-build.js --timeout 1800 --interval 30
```

**If the script fails** (e.g., bktide dependency error), use Option A - the MCP tool is more reliable.

**Step 3: Check on progress**

Periodically check the background job or wait for it to complete.
When it finishes, check the exit code:

- 0 = passed
- 1 = failed
- 2 = canceled
- 3 = timeout

**Step 4: Investigate failures**

If the build failed, follow the "1. Investigating a Build from URL" workflow above.

### 5. Investigating Failures (Deprecated)

**Note**: This workflow is deprecated. Use "1. Investigating a Build from URL" and "2. Retrieving Job Logs" instead for a more complete investigation process.

When a build has failed, use this systematic approach:

**Step 1: Get build overview**

```javascript
mcp__MCPProxy__call_tool('buildkite:get_build', {
  org_slug: '',
  pipeline_slug: '',
  build_number: '',
  detail_level: 'detailed',
  job_state: 'failed', // Only show failed jobs
});
```

This gives you:

- Overall build state
- Job summary (how many failed vs broken)
- List of failed jobs only

**Step 2: Check annotations**

Some projects put test failures in annotations:

```javascript
mcp__MCPProxy__call_tool('buildkite:list_annotations', {
  org_slug: '',
  pipeline_slug: '',
  build_number: '',
});
```

Look for annotations with `style: "error"` or `style: "warning"`.

**Important**: Not all projects use annotations. See [references/annotation-patterns.md](references/annotation-patterns.md) for project-specific patterns.

**Step 3: Examine failed jobs**

For each failed job (not "broken" - see state reference below):

1. Get the job details from the build data
2. Check the job's log output
3. Look for stack traces, error messages, or test failures

**Step 4: Understand "broken" vs "failed"**

**Critical**: A job showing as "broken" is often NOT a failure. It typically means:

- The job was skipped because an earlier job failed
- The job's dependencies weren't met
- Conditional pipeline logic determined the job wasn't needed

See [references/buildkite-states.md](references/buildkite-states.md) for complete state explanations.

**Example**: In large monorepos, many jobs show "broken" because they were skipped due to file changes not affecting them.
This is normal and expected.

### 6. Checking Blocked Builds

When a build is in `blocked` state, it's waiting for manual approval:

**Step 1: Identify the block step**

Get the build with `detail_level: "detailed"` and look for jobs with `state: "blocked"`.

**Step 2: Review what's being blocked**

Block steps typically have a `label` describing what approval is needed (e.g., "Deploy to Production").

**Step 3: Unblock if appropriate**

Use the MCP tool to unblock:

```javascript
mcp__MCPProxy__call_tool('buildkite:unblock_job', {
  org_slug: '',
  pipeline_slug: '',
  build_number: '',
  job_id: '',
  fields: {}, // Optional form fields if the block step has inputs
});
```

## Understanding Buildkite States

Buildkite has several states that can be confusing. Here's a quick reference:

### Build States

- `passed` - All jobs completed successfully ✅
- `failed` - One or more jobs failed ❌
- `running` - Build is currently executing 🔄
- `blocked` - Waiting for manual approval 🚫
- `canceled` - Build was canceled ⛔

### Job States

- `passed` - Job succeeded ✅
- `failed` - Job failed with non-zero exit ❌
- `broken` - **MISLEADING**: Usually means skipped due to pipeline logic, NOT a failure ⚠️
- `soft_failed` - Failed but marked as non-blocking 〰️
- `skipped` - Job was skipped ⏭️

**For complete state reference and project-specific patterns**, read [references/buildkite-states.md](references/buildkite-states.md).

## Progressive Disclosure Pattern

Always follow this pattern when checking build status:

1. **Start broad**: Overall build state (passed/failed/running)
2. **Check summary**: Job counts (how many passed/failed/broken)
3. **Check annotations**: If present, they often contain key information
4. **Drill into failures**: Only examine failed jobs (not broken)
5. **Read logs**: Get actual error messages and stack traces

Don't immediately jump to logs - the build state and annotations often tell you what you need to know.
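
The first, second, and fourth steps of the pattern above can be sketched as a small summarizer. The helper name `summarizeBuild` is hypothetical, and the sketch assumes a build object with a top-level `state` and a `jobs` array of `{ label, state }` entries; that shape is an assumption, not a documented response format. The sample data is made up.

```javascript
// Hypothetical helper for illustration. Assumes `build.state` and a
// `build.jobs` array of { label, state } objects.
function summarizeBuild(build) {
  const jobs = build.jobs ?? [];

  // Step 2 of the pattern: count jobs per state (passed/failed/broken/...).
  const counts = {};
  for (const job of jobs) {
    counts[job.state] = (counts[job.state] ?? 0) + 1;
  }

  // Step 4 of the pattern: only "failed" jobs warrant log retrieval;
  // "broken" jobs are deliberately left out.
  const needsLogs = jobs
    .filter((job) => job.state === 'failed')
    .map((job) => job.label);

  // Step 1 of the pattern: the overall build state comes first.
  return { state: build.state, counts, needsLogs };
}

// Made-up sample data, for illustration only.
const exampleBuild = {
  state: 'failed',
  jobs: [
    { label: 'lint', state: 'passed' },
    { label: 'rspec (2/10)', state: 'failed' },
    { label: 'deploy', state: 'broken' },
    { label: 'notify', state: 'broken' },
  ],
};

console.log(summarizeBuild(exampleBuild));
```
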
## Project-Specific Patterns

### Large Projects / Monorepos

- **Use annotations heavily**: Test failures are usually summarized in annotations
- **Many "broken" jobs**: Normal due to conditional execution
- **Complex job graphs**: Jobs have dependencies and conditional logic
- **Check annotations first**: They save time vs reading all logs

### Small Projects

- **No annotations**: All information is in job logs
- **Simpler job structure**: Fewer dependencies and conditions
- **"Broken" is unusual**: May indicate an actual problem
- **Read logs directly**: No annotations to summarize failures

## Anti-Patterns: What NOT to Do

### ❌ Falling Back to GitHub Tools

**Don't**: Use `gh pr view`, `gh pr checks`, or GitHub API to check Buildkite status

**Why**: GitHub shows Buildkite check summary only. You lose:

- Real-time build logs and output
- Annotations with test failure details
- Job-level breakdown and states
- Ability to distinguish "broken" (skipped) from "failed"
- Direct build monitoring and waiting
- Proper state information

**Reality**: Always use Buildkite tools. GitHub summarizes; Buildkite is the source of truth.

### ❌ Abandoning Skill on Tool Failure

**Don't**: "The script failed, so I'll use GitHub tools instead"

**Why**: The skill documents MULTIPLE tool tiers:

- MCP tools (primary, always available)
- bktide CLI (secondary, convenience)
- Scripts (tertiary, helpers)

**Reality**: One tool failing doesn't invalidate the skill. Follow the fallback hierarchy - move to MCP tools, don't abandon Buildkite entirely.

### ❌ Emergency Override Rationalization

**Don't**: "This is urgent, I don't have time to follow the skill"

**Why**: Skills exist ESPECIALLY for high-pressure situations. Disciplined workflows prevent mistakes when you're rushed. Making wrong tool choices under pressure wastes MORE time debugging.

**Reality**: Following the skill is FASTER than recovering from wrong decisions. Taking 2 minutes to use the right tool saves 20 minutes of confusion.
### ❌ "I Already Know X" Rationalization

**Don't**: "I already know `gh pr view` works, why learn Buildkite tools?"

**Why**: Familiarity ≠ effectiveness. You'll spend more time working around GitHub's limitations than learning the proper tools.

**Reality**: Invest 2 minutes learning Buildkite MCP tools once. Save hours across all future builds.

## Red Flags - STOP

If you catch yourself thinking ANY of these thoughts, you're about to violate this skill:

- "The script failed, so the skill doesn't apply"
- "This is an emergency, no time for the skill"
- "I already know gh pr view works"
- "GitHub tools show the same information"
- "I'll just check GitHub quickly"
- "One tool failed, so I'll use what I know"
- "The skill is for normal situations, not emergencies"
- "I don't have time to learn new tools right now"

**These are rationalizations. Stop. Follow the tool hierarchy. Use Buildkite MCP tools.**

## Common Mistakes to Avoid

1. **Treating "broken" as "failed"**: Broken usually means skipped, not failed
2. **Ignoring annotations**: They often contain the most actionable information
3. **Not filtering by state**: Use `job_state: "failed"` to focus on actual failures
4. **Missing blocked builds**: A blocked build won't progress without manual intervention
5. **Polling in foreground**: Use MCP `wait_for_build` tool or background scripts

## Tips for Efficient Status Checking

1. **Use detail levels**: Start with `detail_level: "summary"` to reduce data
2. **Filter by job state**: Request only failed jobs when investigating
3. **Background monitoring**: Run wait-for-build.js in background after pushing
4. **Check annotations first**: For projects that use them, they're faster than logs
5. **Trust the scripts**: The bundled scripts handle polling, timeouts, and edge cases

## Resources

### References

- **[buildkite-states.md](references/buildkite-states.md)** - Complete guide to Buildkite states, including the misleading "broken" state and project-specific patterns
- **[annotation-patterns.md](references/annotation-patterns.md)** - How different projects use annotations and when to check them
- **[tool-capabilities.md](references/tool-capabilities.md)** - Comprehensive capability matrix for MCP tools, bktide, and scripts
- **[url-parsing.md](references/url-parsing.md)** - Understanding Buildkite URLs, step IDs vs job UUIDs
- **[troubleshooting.md](references/troubleshooting.md)** - Common errors, solutions, and decision tree for when stuck

### Scripts

- **[wait-for-build.js](scripts/wait-for-build.js)** - Background monitoring with timeout and polling
- **[find-commit-builds.js](scripts/find-commit-builds.js)** - Find builds for a specific commit
- **[get-build-logs.js](scripts/get-build-logs.js)** - Helper for log retrieval with UUID resolution (placeholder)
- **[parse-buildkite-url.js](scripts/parse-buildkite-url.js)** - Extract components from Buildkite URLs

Run scripts with `--help` for usage information.