2025-11-29 18:27:15 +08:00

name: troubleshooting-config-items
description: Troubleshoots infrastructure and application configuration items in Mission Control by diagnosing health issues, analyzing recent changes, and investigating resource relationships. Use when users ask about unhealthy or failing resources, mention specific config items by name or ID, inquire about Kubernetes pods/deployments/services, AWS EC2 instances/volumes, Azure VMs, or other infrastructure components. Also use when investigating why a resource is down, stopped, degraded, or showing errors, or when analyzing what changed that caused an issue.
allowed-tools: search_catalog, describe_config, list_catalog_types, get_related_configs, search_catalog_changes, get_notification_detail, get_notifications_for_resource

Config Item Troubleshooting Skill

Core Purpose

This skill enables Claude to troubleshoot infrastructure and application configuration items in Mission Control, diagnose health issues, analyze changes, and identify root causes through systematic investigation of config relationships and history.

Understanding Config Items

A ConfigItem represents a discoverable infrastructure or application configuration (Kubernetes Pods, AWS EC2 instances, Azure VMs, database instances, etc.). Each config item contains:

  • health: Overall health status ("healthy", "unhealthy", "warning", "unknown")
  • status: Operational state (e.g., "Running", "Stopped", "Pending")
  • description: Human-readable description (often contains error messages when unhealthy)
  • .config: The actual JSON specification/manifest (e.g., Kubernetes Pod spec, AWS instance details)
  • type: The kind of resource (e.g., "Kubernetes::Pod", "AWS::EC2::Instance")
  • tags: Metadata for filtering and organization
  • parent_id/path: Hierarchical relationships to other configs
  • external_id: External system identifier
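As a rough illustration of the fields listed above, a config item could be modeled in memory as follows. This is a sketch, not Mission Control's actual schema: the class layout, the `id` and `name` attributes, the `needs_attention` helper, and all sample values are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ConfigItem:
    """Illustrative shape only; field types and defaults are assumptions."""
    id: str
    name: str
    type: str                      # e.g. "Kubernetes::Pod"
    health: str = "unknown"        # "healthy", "unhealthy", "warning", "unknown"
    status: Optional[str] = None   # e.g. "Running", "Stopped", "Pending"
    description: str = ""          # often carries the error message when unhealthy
    config: dict[str, Any] = field(default_factory=dict)  # raw spec/manifest
    tags: dict[str, str] = field(default_factory=dict)
    parent_id: Optional[str] = None
    path: Optional[str] = None
    external_id: Optional[str] = None


def needs_attention(item: ConfigItem) -> bool:
    """Hypothetical helper: flag items whose health signals a problem."""
    return item.health in {"unhealthy", "warning"}


# Sample values are made up for the example.
pod = ConfigItem(
    id="example-id",
    name="api-7d9f",
    type="Kubernetes::Pod",
    health="unhealthy",
    status="Pending",
    description="Back-off pulling image",
)
print(needs_attention(pod))
```

Keeping `description` and `config` as separate fields mirrors the distinction above: the description usually says what went wrong, while the config holds the specification needed to explain why.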

Key Workflows

Initial Investigation

1. Search and Identify the Config: Use the MCP search_catalog tool to find the config item:

  • Search by id, name, type, tags, or other attributes
  • Narrow down to the specific config experiencing issues

2. Get Complete Config Details: Use the MCP describe_config tool to retrieve full config information:

  • Review the health field for overall status
  • Check the status field for operational state
  • Read the description field carefully - this often contains error messages or status information
  • Examine the .config JSON field - this contains the full specification/manifest

Change Analysis

3. Review Recent Changes: If the issue isn't immediately apparent, use the MCP search_catalog_changes tool:

  • Get changes for the specific config item
  • Look for recent modifications to the specification
  • Check change_type (created, updated, deleted)
  • Review severity (critical, high, medium, low, info)
  • Examine patches and diff fields to see what changed
  • Check source to understand where the change originated
  • Note the created_at timestamp to correlate with when issues started
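Correlating change timestamps with the onset of an issue can be sketched as a simple time-window filter. The snippet below assumes changes have already been retrieved and are represented as plain dicts with the `change_type`, `severity`, `source`, and `created_at` fields named above; the function name, the 24-hour default window, and the sample records are hypothetical, and the real tool output may be shaped differently.

```python
from datetime import datetime, timedelta, timezone


def changes_before_incident(changes, incident_at, window=timedelta(hours=24)):
    """Return changes recorded within `window` before `incident_at`, newest first."""
    start = incident_at - window
    hits = [c for c in changes if start <= c["created_at"] <= incident_at]
    return sorted(hits, key=lambda c: c["created_at"], reverse=True)


# Made-up sample data: one old change and one shortly before the incident.
incident_at = datetime(2025, 1, 10, 12, 0, tzinfo=timezone.utc)
changes = [
    {"change_type": "updated", "severity": "info", "source": "kubernetes",
     "created_at": datetime(2025, 1, 8, 9, 0, tzinfo=timezone.utc)},
    {"change_type": "updated", "severity": "high", "source": "kubernetes",
     "created_at": datetime(2025, 1, 10, 11, 45, tzinfo=timezone.utc)},
]

suspects = changes_before_incident(changes, incident_at)
for change in suspects:
    print(change["created_at"].isoformat(), change["change_type"], change["severity"])
```

Changes that land shortly before the health degradation are the first candidates to inspect; their patches and diff fields then show what actually changed.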

Relationship Navigation

4. Investigate Related Configs: Use the MCP get_related_configs tool to navigate the config hierarchy:

  • Children: Resources created/managed by this config
    • Example: A Kubernetes Deployment → ReplicaSets → Pods
    • Example: An AWS Auto Scaling Group → EC2 Instances
  • Parents: Resources that manage this config
    • Example: A Pod → ReplicaSet → Deployment
  • Dependencies: Resources this config depends on
    • Example: A Pod → ConfigMaps, Secrets, PersistentVolumeClaims

Troubleshooting Pattern: When a parent resource is unhealthy, investigate its children to find the actual failing component. When a child is unhealthy, check the parent for misconfigurations.
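The parent-to-child half of this pattern amounts to a walk down the hierarchy that follows only unhealthy branches. The sketch below works on hypothetical in-memory data (an `items` dict keyed by id and a `children` adjacency dict); in practice each level would come from a get_related_configs call, whose response format is not assumed here.

```python
def find_failing_leaves(item_id, items, children):
    """Return the deepest unhealthy descendants reachable from item_id.

    Only unhealthy branches are followed, so healthy siblings are skipped.
    """
    failing = []
    stack = [item_id]
    seen = set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if items[current]["health"] != "unhealthy":
            continue
        bad_children = [
            c for c in children.get(current, [])
            if items[c]["health"] == "unhealthy"
        ]
        if bad_children:
            stack.extend(bad_children)   # keep descending toward the real failure
        else:
            failing.append(current)      # no unhealthy children: likely culprit
    return failing


# Made-up hierarchy: Deployment -> ReplicaSets -> Pods.
items = {
    "deploy": {"type": "Kubernetes::Deployment", "health": "unhealthy"},
    "rs-old": {"type": "Kubernetes::ReplicaSet", "health": "healthy"},
    "rs-new": {"type": "Kubernetes::ReplicaSet", "health": "unhealthy"},
    "pod-a": {"type": "Kubernetes::Pod", "health": "unhealthy"},
    "pod-b": {"type": "Kubernetes::Pod", "health": "healthy"},
}
children = {"deploy": ["rs-old", "rs-new"], "rs-new": ["pod-a", "pod-b"]}

print(find_failing_leaves("deploy", items, children))
```

The `seen` set guards against relationship cycles, which can occur when dependency links are mixed in with parent/child links.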

Critical Requirements

Hierarchical Thinking:

  • Kubernetes: Namespace → Deployment → ReplicaSet → Pod → Container
  • AWS: VPC → Subnet → EC2 Instance → Volume
  • Azure: Resource Group → VM → Disk

Change Impact Analysis:

  • Compare current config with previous working state
  • Identify what changed and when
  • Correlate timing of changes with health degradation

Evidence-Based Diagnosis:

  • Support conclusions with specific evidence from the config data
  • Quote relevant error messages from description fields
  • Reference specific fields in the .config JSON
  • Cite change diffs and timestamps

Diagnosis Workflow

Follow this systematic approach:

  1. Identify - Find the config item
  2. Assess - Review health, status, description, and .config spec
  3. Analyze Changes - Check recent modifications and events
  4. Navigate Relationships - Investigate parent/child/dependency configs
  5. Review Analysis - Check automated findings
  6. Synthesize - Determine root cause from all evidence
  7. Recommend - Provide specific remediation steps

Example Troubleshooting Scenarios

Scenario 1: Unhealthy Kubernetes Deployment

  • Get Deployment details → health: unhealthy
  • Get related configs (children) → ReplicaSets → Pods
  • Find Pod in ImagePullBackOff
  • Check Pod description and .config → image pull error for the referenced image
  • Check changes → recent image tag update
  • Root cause: Invalid image tag deployed
  • Recommendation: Roll back to the previous image or fix the image tag

Scenario 2: AWS EC2 Instance Issues

  • Get Instance details → status: stopped, health: unhealthy
  • Check description → "InsufficientInstanceCapacity"
  • Review changes → instance type changed to a type that is unavailable
  • Get related configs → Security Groups, Volumes
  • Root cause: Requested instance type not available in the Availability Zone (AZ)
  • Recommendation: Change to available instance type or different AZ