zhongwei/gh-flanksource-claude-code-plugin

Files

Zhongwei Li 240e090d87 Initial commit

2025-11-29 18:27:15 +08:00

5.6 KiB

Raw Permalink Blame History

name, description, allowed-tools

name	description	allowed-tools
troubleshooting-config-items	Troubleshoots infrastructure and application configuration items in Mission Control by diagnosing health issues, analyzing recent changes, and investigating resource relationships. Use when users ask about unhealthy or failing resources, mention specific config items by name or ID, inquire about Kubernetes pods/deployments/services, AWS EC2 instances/volumes, Azure VMs, or other infrastructure components. Also use when investigating why a resource is down, stopped, degraded, or showing errors, or when analyzing what changed that caused an issue.	search_catalog, describe_config, list_catalog_types, get_related_configs, search_catalog_changes, get_notification_detail, get_notifications_for_resource

Config Item Troubleshooting Skill

Core Purpose

This skill enables Claude to troubleshoot infrastructure and application configuration items in Mission Control, diagnose health issues, analyze changes, and identify root causes through systematic investigation of config relationships and history.

Understanding Config Items

A ConfigItem represents a discoverable infrastructure or application configuration (Kubernetes Pods, AWS EC2 instances, Azure VMs, database instances, etc.). Each config item contains:

health: Overall health status ("healthy", "unhealthy", "warning", "unknown")
status: Operational state (e.g., "Running", "Stopped", "Pending")
description: Human-readable description (often contains error messages when unhealthy)
.config: The actual JSON specification/manifest (e.g., Kubernetes Pod spec, AWS instance details)
type: The kind of resource (e.g., "Kubernetes::Pod", "AWS::EC2::Instance")
tags: Metadata for filtering and organization
parent_id/path: Hierarchical relationships to other configs
external_id: External system identifier

Key Workflows

Initial Investigation

1. Search and Identify the Config Use the MCP search_catalog tool to find the config item:

Search by id, name, type, tags, or other attributes
Narrow down to the specific config experiencing issues

2. Get Complete Config Details Use the MCP describe_config tool to retrieve full config information:

Review the health field for overall status
Check the status field for operational state
Read the description field carefully - this often contains error messages or status information
Examine the .config JSON field - this contains the full specification/manifest

Change Analysis

3. Review Recent Changes If the issue isn't immediately apparent, use the MCP search_catalog_changes tool:

Get changes for the specific config item
Look for recent modifications to the specification
Check change_type (created, updated, deleted)
Review severity (critical, high, medium, low, info)
Examine patches and diff fields to see what changed
Check source to understand where the change originated
Note the created_at timestamp to correlate with when issues started

4. Investigate Related Configs Use the MCP get_related_configs tool to navigate the config hierarchy:

Children: Resources created/managed by this config
- Example: A Kubernetes Deployment → ReplicaSets → Pods
- Example: An AWS Auto Scaling Group → EC2 Instances
Parents: Resources that manage this config
- Example: A Pod → ReplicaSet → Deployment
Dependencies: Resources this config depends on
- Example: A Pod → ConfigMaps, Secrets, PersistentVolumeClaims

Troubleshooting Pattern: When a parent resource is unhealthy, investigate its children to find the actual failing component. When a child is unhealthy, check the parent for misconfigurations.

Critical Requirements

Hierarchical Thinking:

Kubernetes: Namespace → Deployment → ReplicaSet → Pod → Container
AWS: VPC → Subnet → EC2 Instance → Volume
Azure: Resource Group → VM → Disk

Change Impact Analysis:

Compare current config with previous working state
Identify what changed and when
Correlate timing of changes with health degradation

Evidence-Based Diagnosis:

Support conclusions with specific evidence from the config data
Quote relevant error messages from description fields
Reference specific fields in the .config JSON
Cite change diffs and timestamps

Diagnosis Workflow

Follow this systematic approach:

Identify - Find the config item
Assess - Review health, status, description, and .config spec
Analyze Changes - Check recent modifications and events
Navigate Relationships - Investigate parent/child/dependency configs
Review Analysis - Check automated findings
Synthesize - Determine root cause from all evidence
Recommend - Provide specific remediation steps

Example Troubleshooting Scenarios

Scenario 1: Unhealthy Kubernetes Deployment

Get Deployment details → health: unhealthy
Get related configs (children) → ReplicaSets → Pods
Find Pod in CrashLoopBackOff
Check Pod .config → image pull error
Check changes → recent image tag update
Root cause: Invalid image tag deployed
Recommendation: Rollback to previous image or fix image tag

Scenario 2: AWS EC2 Instance Issues

Get Instance details → status: stopped, health: unhealthy
Check description → "InsufficientInstanceCapacity"
Review changes → instance type changed to unavailable type
Get related configs → Security Groups, Volumes
Root cause: Requested instance type not available in AZ
Recommendation: Change to available instance type or different AZ

5.6 KiB Raw Permalink Blame History