Files
2025-11-30 09:00:26 +08:00

8.5 KiB

Buildkite Status Troubleshooting

Common errors when working with Buildkite and how to resolve them.

MCP Tool Errors

Error: "job not found"

When: Calling buildkite:get_logs

Cause: Using step ID from URL instead of job UUID from API

Solution:

  1. Call buildkite:get_build with detail_level: "detailed"
  2. Find job by label field
  3. Extract uuid field (NOT the id field)
  4. Use that UUID in get_logs

Example:

// ❌ Wrong - using step ID from URL
mcp__MCPProxy__call_tool('buildkite:get_logs', {
  job_id: '019a5f23-8109-4656-a033-bd62a82ca239', // This is a step ID
});

// ✅ Correct - get job UUID from API first
const build = await mcp__MCPProxy__call_tool('buildkite:get_build', {
  org_slug: 'gusto',
  pipeline_slug: 'payroll-building-blocks',
  build_number: '29627',
  detail_level: 'detailed',
});

const job = build.jobs.find(
  (j) => j.label === 'ste rspec' && j.state === 'failed'
);

await mcp__MCPProxy__call_tool('buildkite:get_logs', {
  org_slug: 'gusto',
  pipeline_slug: 'payroll-building-blocks',
  build_number: '29627',
  job_id: job.uuid, // This is the correct job UUID
});

See Also: url-parsing.md for step ID vs job UUID explanation


Error: "build not found" or "pipeline not found"

When: Calling any MCP tool

Cause: Incorrect org slug or pipeline slug format

Common Mistakes:

  • Using repository name instead of pipeline slug
  • Including org name in pipeline slug
  • Using display name instead of URL slug

Solution: Extract slugs from URL correctly:

https://buildkite.com/gusto/payroll-building-blocks/builds/123
                        ^^^^^ ^^^^^^^^^^^^^^^^^^^^^^
                        org   pipeline slug

Slug Format Rules:

  • All lowercase
  • Hyphens instead of underscores
  • No spaces
  • No special characters

Example:

// ❌ Wrong
{ org_slug: "Gusto", pipeline_slug: "Payroll Building Blocks" }

// ✅ Correct
{ org_slug: "gusto", pipeline_slug: "payroll-building-blocks" }

Error: Empty logs returned

When: Calling buildkite:get_logs

Causes:

  1. Job hasn't started yet
  2. Job is still running
  3. Job failed before producing output
  4. Logs not available yet (eventual consistency)

Diagnosis: Check job state first:

const build = await mcp__MCPProxy__call_tool('buildkite:get_build', {
  detail_level: 'detailed',
});

const job = build.jobs.find((j) => j.uuid === jobUuid);
console.log(job.state); // Should be terminal: passed/failed/canceled
console.log(job.started_at); // Should not be null
console.log(job.finished_at); // Should not be null for terminal state

Solution:

  • If state is waiting or running: Wait for job to complete
  • If state is terminal but logs empty: Wait a few seconds for eventual consistency
  • If still empty: Job may have failed immediately (check exit_status)

Error: "Unauthorized" or "Forbidden"

When: Any MCP tool call

Cause: Authentication or permission issue

Diagnosis Steps:

  1. Check MCP server configuration:

    # MCP server should have BUILDKITE_API_TOKEN configured
    
  2. Verify token has correct scope:

    • read_builds - Required for reading build info
    • read_build_logs - Required for log retrieval
    • read_pipelines - Required for pipeline listing
  3. Check organization access:

    • Token must have access to the specific organization
    • Some orgs require SSO

Solution:


bktide CLI Errors

Error: "bktide: command not found"

Cause: bktide not installed or not in PATH

Solution: Use MCP tools instead (preferred):

// Instead of: npx bktide build gusto/payroll-building-blocks/123
mcp__MCPProxy__call_tool('buildkite:get_build', {
  org_slug: 'gusto',
  pipeline_slug: 'payroll-building-blocks',
  build_number: '123',
});

Or install bktide:

npm install -g @anthropic/bktide

Error: "Cannot read logs with bktide"

Cause: bktide does not have log retrieval capability

Solution: Use MCP tools for logs:

mcp__MCPProxy__call_tool('buildkite:get_logs', {
  org_slug: 'gusto',
  pipeline_slug: 'payroll-building-blocks',
  build_number: '123',
  job_id: '<job-uuid>',
});

See Also: tool-capabilities.md for complete capability matrix


Script Errors

Error: Script fails with "bktide error"

Cause: Scripts depend on bktide internally

Solution:

  1. Use equivalent MCP tool instead (preferred)
  2. Or ensure bktide is installed and configured
  3. Or check BK_TOKEN environment variable is set

Example:

# Script failing
~/.claude/skills/buildkite-status/scripts/wait-for-build.js gusto payroll-building-blocks 123

# Use MCP tool instead
mcp__MCPProxy__call_tool("buildkite:wait_for_build", {
  org_slug: "gusto",
  pipeline_slug: "payroll-building-blocks",
  build_number: "123",
  timeout: 1800,
  poll_interval: 30
})

Build State Confusion

Issue: Many jobs show "broken" but build looks healthy

Cause: "broken" doesn't mean failed - it usually means skipped

Explanation: Buildkite uses "broken" state for:

  • Jobs skipped because dependency failed
  • Jobs skipped due to conditional logic
  • Jobs skipped because file changes didn't affect them

Solution: Filter for actual failures:

mcp__MCPProxy__call_tool('buildkite:get_build', {
  detail_level: 'detailed',
  job_state: 'failed', // Only show actually failed jobs
});

See Also: buildkite-states.md for complete state explanations


Issue: Build shows "failed" but all jobs passed

Cause: A "soft_failed" job counts as passed in job list but failed for build state

Solution: Check for soft failures:

const build = await mcp__MCPProxy__call_tool('buildkite:get_build', {
  detail_level: 'detailed',
});

const softFails = build.jobs.filter((j) => j.soft_failed === true);
console.log(softFails); // These caused build to fail but are marked non-blocking

Common Workflow Issues

Issue: Cannot find recent build for branch

Cause: Build may be filtered or pipeline has many builds

Solution: Use branch filter and increase limit:

mcp__MCPProxy__call_tool('buildkite:list_builds', {
  org_slug: 'gusto',
  pipeline_slug: 'payroll-building-blocks',
  branch: 'my-feature-branch',
  per_page: 20, // Default may be smaller
});

Or find by commit:

~/.claude/skills/buildkite-status/scripts/find-commit-builds.js gusto <commit-sha>

Issue: Multiple jobs have same label, can't tell which failed

Cause: Parallelized jobs have same base label

Solution: Jobs with same label are numbered:

  • "rspec (1/10)"
  • "rspec (2/10)"

Match on full label including partition:

const failedJob = build.jobs.find(
  (j) => j.label === 'rspec (2/10)' && j.state === 'failed'
);

Or find all failed jobs with that label:

const failedRspecJobs = build.jobs.filter(
  (j) => j.label.startsWith('rspec (') && j.state === 'failed'
);

Decision Tree: What to Do When Stuck

Unable to investigate build failure?
│
├─ Can't get build details
│  ├─ Check URL format → [url-parsing.md]
│  ├─ Check org/pipeline slugs → lowercase, hyphenated
│  └─ Check auth → BUILDKITE_API_TOKEN configured
│
├─ Can't get job logs
│  ├─ Using bktide? → Use MCP tools instead [tool-capabilities.md]
│  ├─ Getting "job not found"? → Using step ID instead of job UUID [url-parsing.md]
│  ├─ Empty logs? → Check job state (started_at, finished_at)
│  └─ Still failing? → Report to human partner (may be auth/permission)
│
├─ Confused about job states
│  ├─ Many "broken" jobs? → Normal, means skipped [buildkite-states.md]
│  ├─ "soft_failed"? → Failed but non-blocking
│  └─ Can't find failed job? → Filter with job_state: "failed"
│
└─ Tool not working
   ├─ MCP tool error? → Check auth, verify slugs
   ├─ bktide error? → Use MCP tools instead
   └─ Script error? → Use MCP tools directly

See Also