# Buildkite Status Troubleshooting

Common errors when working with Buildkite and how to resolve them.

## MCP Tool Errors

### Error: "job not found"

**When**: Calling `buildkite:get_logs`

**Cause**: Passing the step ID from the build URL instead of the job UUID returned by the API

**Solution**:

1. Call `buildkite:get_build` with `detail_level: "detailed"`.
2. Find the job by its `label` field.
3. Extract the `uuid` field (NOT the `id` field).
4. Pass that UUID as `job_id` to `get_logs`.

**Example**:

```javascript
// ❌ Wrong - using step ID from URL
mcp__MCPProxy__call_tool('buildkite:get_logs', {
  job_id: '019a5f23-8109-4656-a033-bd62a82ca239', // This is a step ID, not a job UUID
});

// ✅ Correct - get job UUID from API first
const build = await mcp__MCPProxy__call_tool('buildkite:get_build', {
  org_slug: 'gusto',
  pipeline_slug: 'payroll-building-blocks',
  build_number: '29627',
  detail_level: 'detailed',
});

// Match on the job label, and require the failed state
const job = build.jobs.find(
  (j) => j.label === 'ste rspec' && j.state === 'failed'
);
if (!job) throw new Error('No failed job with that label in this build');

await mcp__MCPProxy__call_tool('buildkite:get_logs', {
  org_slug: 'gusto',
  pipeline_slug: 'payroll-building-blocks',
  build_number: '29627',
  job_id: job.uuid, // This is the correct job UUID
});
```

**See Also**: [url-parsing.md](url-parsing.md) for an explanation of step IDs vs job UUIDs

---

### Error: "build not found" or "pipeline not found"

**When**: Calling any MCP tool

**Cause**: Incorrect org slug or pipeline slug format

**Common Mistakes**:

- Using the repository name instead of the pipeline slug
- Including the org name in the pipeline slug
- Using the display name instead of the URL slug

**Solution**:

Extract the slugs from the build URL:

```
https://buildkite.com/gusto/payroll-building-blocks/builds/123
                      ^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^        ^^^
                      org   pipeline slug                  build number
```
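
Reading the slugs off the URL can be done mechanically instead of by eye. A minimal sketch, assuming build URLs have the `https://buildkite.com/<org>/<pipeline>/builds/<number>` shape shown above; `parseBuildUrl` is a hypothetical helper, not part of the MCP tools:

```javascript
// Hypothetical helper: split a Buildkite build URL into the three parameters
// the MCP tools take. Assumes the URL shape shown above; any path segments or
// query string after the build number are ignored.
function parseBuildUrl(url) {
  const { hostname, pathname } = new URL(url);
  if (hostname !== 'buildkite.com') {
    throw new Error(`Not a buildkite.com URL: ${url}`);
  }
  // e.g. ['gusto', 'payroll-building-blocks', 'builds', '123']
  const parts = pathname.split('/').filter(Boolean);
  if (parts.length < 4 || parts[2] !== 'builds' || !/^\d+$/.test(parts[3])) {
    throw new Error(`Unexpected build URL path: ${pathname}`);
  }
  return {
    org_slug: parts[0],
    pipeline_slug: parts[1],
    build_number: parts[3], // kept as a string, like the examples in this doc
  };
}

const params = parseBuildUrl('https://buildkite.com/gusto/payroll-building-blocks/builds/123');
// params.org_slug === 'gusto'
// params.pipeline_slug === 'payroll-building-blocks'
// params.build_number === '123'
```

The returned object can be spread straight into a `get_build` call, which avoids retyping slugs by hand.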

**Slug Format Rules**:

- All lowercase
- Hyphens instead of underscores
- No spaces
- No special characters

**Example**:

```javascript
// ❌ Wrong
{ org_slug: "Gusto", pipeline_slug: "Payroll Building Blocks" }

// ✅ Correct
{ org_slug: "gusto", pipeline_slug: "payroll-building-blocks" }
```
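
The rules above can be checked before making a call. A minimal sketch; `checkSlugs` and `suggestSlug` are hypothetical helpers, and the regular expression only approximates Buildkite's slug rules, so the build URL stays the source of truth:

```javascript
// Approximates the slug rules above: lowercase letters, digits and hyphens;
// no spaces, underscores, slashes or other special characters.
const SLUG_PATTERN = /^[a-z0-9][a-z0-9-]*$/;

// Hypothetical pre-flight check: returns a list of problems (empty if OK).
function checkSlugs({ org_slug, pipeline_slug }) {
  const problems = [];
  for (const [name, value] of [['org_slug', org_slug], ['pipeline_slug', pipeline_slug]]) {
    if (!SLUG_PATTERN.test(value)) {
      problems.push(`${name} "${value}" does not look like a URL slug`);
    }
  }
  return problems;
}

// Hypothetical helper: guess a slug from a display name. Only a hint to
// compare against the URL, not a substitute for it.
function suggestSlug(displayName) {
  return displayName
    .trim()
    .toLowerCase()
    .replace(/[\s_]+/g, '-') // spaces and underscores become hyphens
    .replace(/[^a-z0-9-]/g, '') // drop any other special characters
    .replace(/-+/g, '-') // collapse repeated hyphens
    .replace(/^-|-$/g, ''); // trim leading/trailing hyphens
}

console.log(checkSlugs({ org_slug: 'Gusto', pipeline_slug: 'Payroll Building Blocks' }).length); // 2
console.log(suggestSlug('Payroll Building Blocks')); // payroll-building-blocks
```

An org name pasted into the pipeline slug (`gusto/payroll-building-blocks`) is also rejected, because `/` is outside the allowed characters.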

---

### Error: Empty logs returned

**When**: Calling `buildkite:get_logs`

**Causes**:

1. The job hasn't started yet
2. The job is still running
3. The job failed before producing output
4. Logs are not available yet (eventual consistency)

**Diagnosis**:

Check the job state first:

```javascript
const build = await mcp__MCPProxy__call_tool('buildkite:get_build', {
  org_slug: 'gusto',
  pipeline_slug: 'payroll-building-blocks',
  build_number: '123',
  detail_level: 'detailed',
});

// jobUuid: the job UUID you passed to get_logs
const job = build.jobs.find((j) => j.uuid === jobUuid);
console.log(job.state); // Should be terminal: passed/failed/canceled
console.log(job.started_at); // Should not be null
console.log(job.finished_at); // Should not be null for a terminal state
```

**Solution**:

- If the state is `waiting` or `running`: wait for the job to complete.
- If the state is terminal but the logs are empty: wait a few seconds for eventual consistency, then retry.
- If the logs are still empty: the job may have failed immediately (check `exit_status`).
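
The Solution list is a small decision procedure, sketched below so the order of checks is explicit. `nextStepForEmptyLogs` and its return codes are hypothetical; the job fields match the diagnosis snippet above, and the `check_state` branch is an assumption for states the list doesn't cover:

```javascript
// Job states the diagnosis snippet above treats as terminal.
const TERMINAL_JOB_STATES = new Set(['passed', 'failed', 'canceled']);

// Hypothetical helper: given a job whose logs came back empty, return the
// next step from the Solution list.
function nextStepForEmptyLogs(job, { alreadyRetried = false } = {}) {
  if (job.state === 'waiting' || job.state === 'running') {
    return 'wait_for_job'; // not finished yet
  }
  if (!TERMINAL_JOB_STATES.has(job.state)) {
    return 'check_state'; // some other state: look it up in buildkite-states.md
  }
  if (!alreadyRetried) {
    return 'retry_shortly'; // terminal: allow a few seconds for eventual consistency
  }
  return 'check_exit_status'; // still empty after a retry: likely failed immediately
}

console.log(nextStepForEmptyLogs({ state: 'running' })); // wait_for_job
console.log(nextStepForEmptyLogs({ state: 'failed' }, { alreadyRetried: true })); // check_exit_status
```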

---

### Error: "Unauthorized" or "Forbidden"

**When**: Any MCP tool call

**Cause**: Authentication or permission issue

**Diagnosis Steps**:

1. Check the MCP server configuration:

   ```bash
   # The MCP server should have BUILDKITE_API_TOKEN configured
   ```

2. Verify the token has the required scopes:

   - `read_builds` - Required for reading build info
   - `read_build_logs` - Required for log retrieval
   - `read_pipelines` - Required for pipeline listing

3. Check organization access:

   - The token must have access to the specific organization
   - Some orgs require SSO

**Solution**:

- Verify `BUILDKITE_API_TOKEN` in the MCP config
- Generate a new token at https://buildkite.com/user/api-access-tokens
- Ensure the token has the required scopes
- Report to your human partner if calls still fail (an org admin may need to help)
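
Diagnosis step 2 amounts to a set difference between the scopes the token was granted (as listed wherever you manage the token) and the three scopes above. A minimal sketch; `missingScopes` is a hypothetical helper. An empty result means scopes are not the problem, so move on to organization access and SSO:

```javascript
// Scopes listed in Diagnosis step 2 above.
const REQUIRED_SCOPES = ['read_builds', 'read_build_logs', 'read_pipelines'];

// Hypothetical helper: which required scopes is the token missing?
function missingScopes(grantedScopes, required = REQUIRED_SCOPES) {
  const granted = new Set(grantedScopes);
  return required.filter((scope) => !granted.has(scope));
}

console.log(missingScopes(['read_builds', 'read_pipelines'])); // [ 'read_build_logs' ]
```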

---

## bktide CLI Errors

### Error: "bktide: command not found"

**Cause**: bktide is not installed or not on `PATH`

**Solution**:

Use MCP tools instead (preferred):

```javascript
// Instead of: npx bktide build gusto/payroll-building-blocks/123
mcp__MCPProxy__call_tool('buildkite:get_build', {
  org_slug: 'gusto',
  pipeline_slug: 'payroll-building-blocks',
  build_number: '123',
});
```

Or install bktide:

```bash
npm install -g @anthropic/bktide
```

---

### Error: "Cannot read logs with bktide"

**Cause**: bktide has no log retrieval capability

**Solution**:

Use MCP tools for logs:

```javascript
mcp__MCPProxy__call_tool('buildkite:get_logs', {
  org_slug: 'gusto',
  pipeline_slug: 'payroll-building-blocks',
  build_number: '123',
  job_id: '<job-uuid>',
});
```

**See Also**: [tool-capabilities.md](tool-capabilities.md) for the complete capability matrix

---

## Script Errors

### Error: Script fails with "bktide error"

**Cause**: The scripts call bktide internally

**Solution**:

1. Use the equivalent MCP tool instead (preferred)
2. Or ensure bktide is installed and configured
3. Or check that the `BK_TOKEN` environment variable is set

**Example**:

```bash
# Script failing
~/.claude/skills/buildkite-status/scripts/wait-for-build.js gusto payroll-building-blocks 123
```

Use the MCP tool instead:

```javascript
mcp__MCPProxy__call_tool('buildkite:wait_for_build', {
  org_slug: 'gusto',
  pipeline_slug: 'payroll-building-blocks',
  build_number: '123',
  timeout: 1800,
  poll_interval: 30,
});
```
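
If `wait_for_build` is unavailable, the same behavior can be approximated by polling `get_build`. A minimal sketch, assuming an injected `fetchBuild()` that returns an object with a `state` field, and an assumed set of terminal build states (see [buildkite-states.md](buildkite-states.md) for the authoritative list). `pollUntilFinished` is hypothetical, and its defaults mirror the 1800 / 30 values above read as seconds, which is also an assumption:

```javascript
// Build states treated as finished here; an assumption, see buildkite-states.md.
const TERMINAL_BUILD_STATES = new Set(['passed', 'failed', 'canceled', 'skipped', 'not_run']);

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical fallback for buildkite:wait_for_build. fetchBuild is any async
// function returning a build object with a `state` field, for example a thin
// wrapper around buildkite:get_build.
async function pollUntilFinished(
  fetchBuild,
  { timeoutMs = 30 * 60 * 1000, pollIntervalMs = 30 * 1000 } = {}
) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const build = await fetchBuild();
    if (TERMINAL_BUILD_STATES.has(build.state)) return build;
    // Give up rather than sleep past the deadline
    if (Date.now() + pollIntervalMs > deadline) {
      throw new Error(`Timed out waiting for build; last state was "${build.state}"`);
    }
    await sleep(pollIntervalMs);
  }
}

// Illustration with a fake fetcher: "running" twice, then "passed".
const fakeStates = ['running', 'running', 'passed'];
pollUntilFinished(async () => ({ state: fakeStates.shift() }), { pollIntervalMs: 10 })
  .then((build) => console.log(build.state)); // passed
```

Injecting `fetchBuild` keeps the polling loop independent of how the MCP call is made, and lets it be exercised with fake data as shown.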

---

## Build State Confusion

### Issue: Many jobs show "broken" but the build looks healthy

**Cause**: "broken" doesn't mean failed; it usually means skipped

**Explanation**:

Buildkite uses the "broken" state for:

- Jobs skipped because a dependency failed
- Jobs skipped due to conditional logic
- Jobs skipped because file changes didn't affect them

**Solution**:

Filter for actual failures:

```javascript
mcp__MCPProxy__call_tool('buildkite:get_build', {
  org_slug: 'gusto',
  pipeline_slug: 'payroll-building-blocks',
  build_number: '123',
  detail_level: 'detailed',
  job_state: 'failed', // Only show jobs that actually failed
});
```
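
When you already have the full job list from a detailed `get_build` call, the same separation can be done client-side without a second request. A minimal sketch; `summarizeJobStates` is a hypothetical helper and the sample jobs are made up:

```javascript
// Hypothetical helper: count jobs per state and pull out the real failures.
function summarizeJobStates(jobs) {
  const counts = {};
  for (const job of jobs) {
    counts[job.state] = (counts[job.state] ?? 0) + 1;
  }
  const failed = jobs.filter((job) => job.state === 'failed');
  return { counts, failed };
}

const summary = summarizeJobStates([
  { label: 'lint', state: 'passed' },
  { label: 'rspec (1/2)', state: 'failed' },
  { label: 'deploy', state: 'broken' },
  { label: 'notify', state: 'broken' },
]);
console.log(summary.counts); // { passed: 1, failed: 1, broken: 2 }
console.log(summary.failed.map((j) => j.label)); // [ 'rspec (1/2)' ]
```

Here two "broken" jobs sit next to one real failure; only the `failed` list needs log investigation.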

**See Also**: [buildkite-states.md](buildkite-states.md) for complete state explanations

---

### Issue: Build shows "failed" but all jobs passed

**Cause**: A "soft_failed" job counts as passed in the job list but as failed for the build state

**Solution**:

Check for soft failures:

```javascript
const build = await mcp__MCPProxy__call_tool('buildkite:get_build', {
  org_slug: 'gusto',
  pipeline_slug: 'payroll-building-blocks',
  build_number: '123',
  detail_level: 'detailed',
});

const softFails = build.jobs.filter((j) => j.soft_failed === true);
console.log(softFails); // These caused the build to fail but are marked non-blocking
```
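
If `softFails` comes back empty, the next thing to rule out is a hard failure missed while scanning the job list. A minimal sketch that separates the two; `classifyFailures` is a hypothetical helper, the field names follow the snippet above, and the sample jobs are made up:

```javascript
// Hypothetical helper: split a detailed job list into soft failures
// (soft_failed === true) and hard failures (state "failed" without the flag).
function classifyFailures(jobs) {
  const soft = jobs.filter((j) => j.soft_failed === true);
  const hard = jobs.filter((j) => j.state === 'failed' && j.soft_failed !== true);
  return { soft, hard };
}

const failures = classifyFailures([
  { label: 'lint', state: 'passed', soft_failed: false },
  { label: 'flaky e2e', state: 'failed', soft_failed: true },
  { label: 'rspec (3/10)', state: 'failed', soft_failed: false },
]);
console.log(failures.soft.map((j) => j.label)); // [ 'flaky e2e' ]
console.log(failures.hard.map((j) => j.label)); // [ 'rspec (3/10)' ]
```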

---

## Common Workflow Issues

### Issue: Cannot find a recent build for a branch

**Cause**: The build may be filtered out, or the pipeline has so many builds that it falls outside the first page of results

**Solution**:

Use a branch filter and increase the page size:

```javascript
mcp__MCPProxy__call_tool('buildkite:list_builds', {
  org_slug: 'gusto',
  pipeline_slug: 'payroll-building-blocks',
  branch: 'my-feature-branch',
  per_page: 20, // The default page size may be smaller
});
```

Or find builds by commit:

```bash
~/.claude/skills/buildkite-status/scripts/find-commit-builds.js gusto <commit-sha>
```
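
When `list_builds` returns a page of builds but you only know the commit, a client-side filter can narrow it down. A minimal sketch, assuming each build object carries `commit` (full SHA), `number`, and `created_at` (ISO 8601) fields; check the actual response shape. `buildsForCommit` is hypothetical and the sample data is made up:

```javascript
// Hypothetical client-side filter over a list_builds result: builds whose
// commit starts with the given SHA prefix, newest first.
function buildsForCommit(builds, shaPrefix) {
  const prefix = shaPrefix.toLowerCase();
  return builds
    .filter((b) => typeof b.commit === 'string' && b.commit.toLowerCase().startsWith(prefix))
    .sort((a, b) => Date.parse(b.created_at) - Date.parse(a.created_at));
}

const matches = buildsForCommit(
  [
    { number: 121, commit: 'a1b2c3d4e5', created_at: '2024-05-01T10:00:00Z' },
    { number: 123, commit: 'a1b2c3d4e5', created_at: '2024-05-01T12:00:00Z' },
    { number: 122, commit: 'ffff000011', created_at: '2024-05-01T11:00:00Z' },
  ],
  'A1B2C3'
);
console.log(matches.map((b) => b.number)); // [ 123, 121 ]
```

This only sees builds on the page you fetched, so raise `per_page` (or use the script above) if the commit is older.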

---

### Issue: Multiple jobs have the same label, so it's unclear which one failed

**Cause**: Parallelized jobs share the same base label

**Solution**:

Jobs with the same label are numbered by partition:

- "rspec (1/10)"
- "rspec (2/10)"

Match on the full label, including the partition:

```javascript
const failedJob = build.jobs.find(
  (j) => j.label === 'rspec (2/10)' && j.state === 'failed'
);
```

Or find all failed jobs with that label:

```javascript
const failedRspecJobs = build.jobs.filter(
  (j) => j.label.startsWith('rspec (') && j.state === 'failed'
);
```
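
To report exactly which partitions failed, the "(n/total)" suffix can be parsed instead of prefix-matched. A minimal sketch, assuming the label format shown above; `parsePartition` and `failedPartitions` are hypothetical helpers and the sample jobs are made up:

```javascript
// Hypothetical helper: split "rspec (2/10)" into its base label and partition
// numbers. Labels without a "(n/total)" suffix return null.
function parsePartition(label) {
  const match = /^(.*) \((\d+)\/(\d+)\)$/.exec(label);
  if (!match) return null;
  return { base: match[1], index: Number(match[2]), total: Number(match[3]) };
}

// Sorted indices of the failed partitions of one parallel step.
function failedPartitions(jobs, baseLabel) {
  return jobs
    .filter((j) => j.state === 'failed')
    .map((j) => parsePartition(j.label))
    .filter((p) => p !== null && p.base === baseLabel)
    .map((p) => p.index)
    .sort((a, b) => a - b);
}

const sampleJobs = [
  { label: 'rspec (1/10)', state: 'passed' },
  { label: 'rspec (7/10)', state: 'failed' },
  { label: 'rspec (2/10)', state: 'failed' },
  { label: 'lint', state: 'failed' },
];
console.log(failedPartitions(sampleJobs, 'rspec')); // [ 2, 7 ]
```

Matching on the parsed base label also avoids catching an unrelated step whose label happens to start with the same text.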

---

## Decision Tree: What to Do When Stuck

```
Unable to investigate build failure?
│
├─ Can't get build details
│  ├─ Check URL format → [url-parsing.md]
│  ├─ Check org/pipeline slugs → lowercase, hyphenated
│  └─ Check auth → BUILDKITE_API_TOKEN configured
│
├─ Can't get job logs
│  ├─ Using bktide? → Use MCP tools instead [tool-capabilities.md]
│  ├─ Getting "job not found"? → Using step ID instead of job UUID [url-parsing.md]
│  ├─ Empty logs? → Check job state (started_at, finished_at)
│  └─ Still failing? → Report to human partner (may be auth/permission)
│
├─ Confused about job states
│  ├─ Many "broken" jobs? → Normal, means skipped [buildkite-states.md]
│  ├─ "soft_failed"? → Failed but non-blocking
│  └─ Can't find failed job? → Filter with job_state: "failed"
│
└─ Tool not working
   ├─ MCP tool error? → Check auth, verify slugs
   ├─ bktide error? → Use MCP tools instead
   └─ Script error? → Use MCP tools directly
```

## See Also

- [SKILL.md](../SKILL.md) - Main skill documentation
- [tool-capabilities.md](tool-capabilities.md) - What each tool can do
- [url-parsing.md](url-parsing.md) - Understanding URLs and IDs
- [buildkite-states.md](buildkite-states.md) - Build and job states