name: datadog-auto-detector
description: Automatically detects Datadog resource mentions (URLs, service queries, natural language) and intelligently fetches condensed context via datadog-analyzer subagent when needed for the conversation (plugin:schovi@schovi-workflows)

Datadog Auto-Detector Skill

Purpose: Detect when user mentions Datadog resources and intelligently fetch relevant observability data.

Architecture: Three-tier pattern (Skill → Command → Subagent) for context isolation.

Detection Patterns

Pattern 1: Datadog URLs

Detect full Datadog URLs across all resource types (a detection sketch follows the pattern list):

Logs:

  • https://app.datadoghq.com/.../logs?query=...
  • https://app.datadoghq.com/.../logs?...

APM / Traces:

  • https://app.datadoghq.com/.../apm/traces?query=...
  • https://app.datadoghq.com/.../apm/trace/[trace-id]
  • https://app.datadoghq.com/.../apm/services/[service-name]

Metrics:

  • https://app.datadoghq.com/.../metric/explorer?query=...
  • https://app.datadoghq.com/.../metric/summary?metric=...

Dashboards:

  • https://app.datadoghq.com/.../dashboard/[dashboard-id]

Monitors:

  • https://app.datadoghq.com/.../monitors/[monitor-id]
  • https://app.datadoghq.com/.../monitors?query=...

Incidents:

  • https://app.datadoghq.com/.../incidents/[incident-id]
  • https://app.datadoghq.com/.../incidents?...

Services:

  • https://app.datadoghq.com/.../services/[service-name]

Events:

  • https://app.datadoghq.com/.../event/stream?query=...

RUM:

  • https://app.datadoghq.com/.../rum/...

Infrastructure/Hosts:

  • https://app.datadoghq.com/.../infrastructure/...
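Taken together, the URL shapes above map paths to resource types. As an illustration only, a regex pass over a message could classify them roughly as in the sketch below; the hostname app.datadoghq.com and the path fragments are assumptions taken from the shapes listed above (other Datadog sites use different hostnames), and the model's own reading of a URL remains authoritative.

```python
import re

# Illustrative sketch of Pattern 1: find Datadog URLs in a message and classify the
# resource type from the path. Hostname and path fragments are assumptions based on
# the URL shapes listed above.
DATADOG_URL = re.compile(r"https://app\.datadoghq\.com/\S+")

RESOURCE_PATTERNS = [  # first match wins
    (r"/logs", "logs"),
    (r"/apm/traces?", "traces"),
    (r"/apm/services/", "services"),
    (r"/metric/", "metrics"),
    (r"/dashboard/", "dashboards"),
    (r"/monitors", "monitors"),
    (r"/incidents", "incidents"),
    (r"/services/", "services"),
    (r"/event/stream", "events"),
    (r"/rum/", "rum"),
    (r"/infrastructure", "infrastructure"),
]

def detect_datadog_urls(message: str) -> list[tuple[str, str]]:
    """Return (url, resource_type) pairs for every Datadog URL found in a message."""
    pairs = []
    for url in DATADOG_URL.findall(message):
        resource = next((name for pattern, name in RESOURCE_PATTERNS
                         if re.search(pattern, url)), "unknown")
        pairs.append((url, resource))
    return pairs
```

For example, detect_datadog_urls("Look at https://app.datadoghq.com/logs?query=service:pb-backend-web") would return that URL paired with "logs".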

Pattern 2: Natural Language Queries

Detect observability-related requests:

Metrics Queries:

  • "error rate of [service]"
  • "check metrics for [service]"
  • "CPU usage of [service]"
  • "latency of [service]"
  • "throughput for [service]"
  • "request rate"
  • "response time"

Log Queries:

  • "logs for [service]"
  • "log errors in [service]"
  • "show logs from [service]"
  • "check [service] logs"
  • "error logs"

Trace Queries:

  • "traces for [service]"
  • "trace [trace-id]"
  • "slow requests in [service]"
  • "APM data for [service]"

Incident Queries:

  • "active incidents"
  • "show incidents"
  • "SEV-1 incidents"
  • "current incidents for [team]"

Monitor Queries:

  • "alerting monitors"
  • "check monitors for [service]"
  • "show triggered monitors"

Service Queries:

  • "status of [service]"
  • "health of [service]"
  • "[service] dependencies"

Pattern 3: Service Name References

Detect service names in context of observability:

  • Common patterns: pb-*, service-*, microservice names
  • Context keywords: "service", "application", "component", "backend", "frontend"
  • Combined with observability verbs: "check", "show", "analyze", "investigate"
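Patterns 2 and 3 are keyword driven. A rough pre-filter could look like the sketch below; the phrase lists are assumptions distilled from the examples above, not an exhaustive vocabulary, and the model's contextual judgment still decides borderline cases.

```python
import re

# Illustrative sketch of Patterns 2 and 3: a message is treated as an observability
# query when a query keyword co-occurs with a service-looking name or an action verb.
QUERY_KEYWORDS = re.compile(
    r"\b(error rate|metrics|cpu usage|latency|throughput|request rate|response time|"
    r"logs?|traces?|slow requests|apm|incidents?|monitors?|status|health|dependencies)\b",
    re.IGNORECASE,
)
SERVICE_NAME = re.compile(r"\b(pb-[\w-]+|service-[\w-]+)\b", re.IGNORECASE)
ACTION_VERB = re.compile(r"\b(check|show|analyze|investigate)\b", re.IGNORECASE)

def looks_like_observability_query(message: str) -> bool:
    """True when a query keyword co-occurs with a service name or an observability verb."""
    if not QUERY_KEYWORDS.search(message):
        return False
    return bool(SERVICE_NAME.search(message) or ACTION_VERB.search(message))
```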

Intelligence: When to Fetch

DO Fetch When:

  1. Direct Request: User explicitly asks for Datadog data

    • "Can you check the error rate?"
    • "Show me logs for pb-backend-web"
    • "What's happening in Datadog?"
  2. Datadog URL Provided: User shares Datadog link

  3. Investigation Context: User is troubleshooting

    • "I'm seeing errors in pb-backend-web, can you investigate?"
    • "Something's wrong with the service, check Datadog"
  4. Proactive Analysis: User asks for analysis that requires observability data

    • "Analyze the performance of [service]"
    • "Is there an outage?"
  5. Comparative Analysis: User wants to compare or correlate

    • "Compare error rates between services"
    • "Check if logs match the incident"

DON'T Fetch When:

  1. Past Tense Without URL: User mentions resolved issues

    • "I fixed the error rate yesterday"
    • "The logs showed X" (without asking for current data)
  2. Already Fetched: Datadog data already in conversation

    • Check conversation history for recent Datadog summary
    • Reuse existing data unless user requests refresh
  3. Informational Discussion: User discussing concepts

    • "Datadog is a monitoring tool"
    • "We use Datadog for observability"
  4. Vague Reference: Unclear what to fetch

    • "Something in Datadog" (too vague)
    • Ask for clarification instead
  5. Historical Context: User providing background

    • "Last week Datadog showed..."
    • "According to Datadog docs..."

Intent Classification

Before spawning subagent, classify the user's intent:

Intent Type 1: Full Context (default)

  • User wants comprehensive analysis
  • Fetch all relevant data for the resource
  • Example: "Analyze error rate of pb-backend-web"

Intent Type 2: Specific Query

  • User wants specific metric/log/trace
  • Focus fetch on exact request
  • Example: "Show me error logs for pb-backend-web in last hour"

Intent Type 3: Quick Status Check

  • User wants high-level status
  • Fetch summary data only
  • Example: "Is pb-backend-web healthy?"

Intent Type 4: Investigation

  • User is debugging an issue
  • Fetch errors, incidents, traces
  • Example: "Users report 500 errors, investigate pb-backend-web"

Intent Type 5: Comparison

  • User wants to compare metrics/services
  • Fetch data for multiple resources
  • Example: "Compare error rates of pb-backend-web and pb-frontend"

Workflow

Step 1: Detect Mention

Scan user message for:

  1. Datadog URLs (Pattern 1)
  2. Natural language queries (Pattern 2)
  3. Service names with observability context (Pattern 3)

If none detected, do nothing.

Step 2: Check Conversation History

Before fetching, check if:

  • Same resource already fetched in last 5 messages
  • Recent Datadog summary covers this request
  • User explicitly requests refresh ("latest data", "check again")

If already fetched and no refresh requested, reuse existing data.

Step 3: Determine Intent

Analyze user message to classify intent (Full Context, Specific Query, Quick Status, Investigation, Comparison).

Extract:

  • Resource Type: logs, metrics, traces, incidents, monitors, services, dashboards
  • Service Name: If mentioned (e.g., "pb-backend-web")
  • Time Range: If specified (e.g., "last hour", "today", "last 24h")
  • Filters: Any additional filters (e.g., "status:error", "SEV-1")
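The extracted fields can be pictured as a small record. Names and defaults here are illustrative, not a defined schema; a one-hour default window is assumed when the user gives no range, matching the examples later in this document.

```python
from dataclasses import dataclass, field

@dataclass
class ExtractedRequest:
    """Fields pulled out in Step 3 (illustrative names, not a defined schema)."""
    resource_type: str                   # logs, metrics, traces, incidents, monitors, services, dashboards
    service: str | None = None           # e.g. "pb-backend-web"
    time_range: str = "last 1h"          # assumed default when the user gives no range
    filters: list[str] = field(default_factory=list)  # e.g. ["status:error", "SEV-1"]
    intent: str = "full context"         # from the classification above
    url: str | None = None               # set when the user pasted a Datadog link
```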

Step 4: Construct Subagent Prompt

Build prompt for datadog-analyzer subagent:

Fetch and summarize [resource type] for [context].

[If URL provided]:
Datadog URL: [url]

[If natural language query]:
Service: [service-name]
Query Type: [logs/metrics/traces/etc.]
Time Range: [from] to [to]
Additional Context: [user's request]

Intent: [classified intent]

Focus on: [specific aspects user cares about]
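A helper that fills this template might look like the sketch below. It is an assumption about how the prompt could be assembled, not part of the skill's defined interface; the parameter names mirror the fields extracted in Step 3.

```python
def build_subagent_prompt(resource_type: str, *, url: str | None = None,
                          service: str | None = None, time_range: str = "last 1h",
                          intent: str = "full context", context: str = "",
                          focus: str = "") -> str:
    """Assemble the Step 4 template (illustrative sketch, not a fixed format)."""
    lines = [f"Fetch and summarize {resource_type} for {service or context}.", ""]
    if url:                                   # URL provided
        lines.append(f"Datadog URL: {url}")
    else:                                     # natural language query
        lines += [f"Service: {service}",
                  f"Query Type: {resource_type}",
                  f"Time Range: {time_range}",
                  f"Additional Context: {context}"]
    lines += ["", f"Intent: {intent}"]
    if focus:
        lines += ["", f"Focus on: {focus}"]
    return "\n".join(lines)
```

For Example 2 below, build_subagent_prompt("metrics", service="pb-backend-web", time_range="last 1h", intent="specific query", context="error rate", focus="error rate trend") would yield a prompt in the shape of the template above.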

Step 5: Spawn Subagent

Use Task tool with:

  • subagent_type: "schovi:datadog-auto-detector:datadog-analyzer"
  • prompt: Constructed prompt from Step 4
  • description: Short description (e.g., "Fetching Datadog logs summary")
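Filled in for a typical metrics request, the parameters would look roughly like the mapping below; the field names come from the list above, while the invocation syntax itself belongs to the Task tool, not to this sketch.

```python
# Illustrative parameter set for the Task tool call (field names from the list above).
task_parameters = {
    "subagent_type": "schovi:datadog-auto-detector:datadog-analyzer",
    "description": "Fetching Datadog metrics summary",
    "prompt": (
        "Fetch and summarize metrics for error rate.\n"
        "Service: pb-backend-web\n"
        "Query Type: metrics\n"
        "Time Range: last 1h\n"
        "Intent: specific query\n"
        "Focus on: error rate trend"
    ),
}
```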

Step 6: Present Summary

When subagent returns:

  1. Present the summary to user
  2. Offer to investigate further if issues found
  3. Suggest related queries if relevant

Examples

Example 1: Datadog URL

User: "Look at this: https://app.datadoghq.com/.../logs?query=service:pb-backend-web%20status:error"

Action:

  1. Detect: Datadog logs URL
  2. Check: Not in recent conversation
  3. Intent: Full Context (investigation)
  4. Prompt: "Fetch and summarize logs from Datadog URL: [url]"
  5. Spawn: datadog-analyzer subagent
  6. Present: Summary of error logs

Example 2: Natural Language Query

User: "Can you check the error rate of pb-backend-web service in the last hour?"

Action:

  1. Detect: "error rate" + "pb-backend-web" + "last hour"
  2. Check: Not in recent conversation
  3. Intent: Specific Query (metrics)
  4. Prompt: "Fetch and summarize metrics for error rate. Service: pb-backend-web, Time Range: last 1h"
  5. Spawn: datadog-analyzer subagent
  6. Present: Metrics summary with error rate trend

Example 3: Investigation Context

User: "Users are reporting 500 errors on the checkout flow. Can you investigate?"

Action:

  1. Detect: "500 errors" (observability issue)
  2. Check: Not in recent conversation
  3. Intent: Investigation
  4. Prompt: "Investigate 500 errors in checkout flow. Query Type: logs and traces, Filters: status:500 OR status:error, Time Range: last 1h. Focus on: error patterns, affected endpoints, trace analysis"
  5. Spawn: datadog-analyzer subagent
  6. Present: Investigation summary with findings

Example 4: Already Fetched

User: "Show me error rate for pb-backend-web"

[Datadog summary for pb-backend-web fetched 2 messages ago]

Action:

  1. Detect: "error rate" + "pb-backend-web"
  2. Check: Already fetched in message N-2
  3. Skip fetch: "Based on the Datadog data fetched earlier, the error rate for pb-backend-web is [value]..."

Example 5: Past Tense (No Fetch)

User: "Yesterday Datadog showed high error rates"

Action:

  1. Detect: "Datadog" + "error rates"
  2. Check: Past tense ("Yesterday", "showed")
  3. Skip fetch: User is providing historical context, not requesting current data

Example 6: Comparison

User: "Compare error rates of pb-backend-web and pb-frontend over the last 24 hours"

Action:

  1. Detect: "error rates" + multiple services + "last 24 hours"
  2. Check: Not in recent conversation
  3. Intent: Comparison
  4. Prompt: "Fetch and compare metrics for error rate. Services: pb-backend-web, pb-frontend. Time Range: last 24h. Focus on: comparative analysis, trends, spikes"
  5. Spawn: datadog-analyzer subagent
  6. Present: Comparative metrics summary

Edge Cases

Ambiguous Service Name

User: "Check the backend service error rate"

Action:

  • Detect: "backend service" (ambiguous)
  • Ask: "I can fetch error rate data from Datadog. Which specific service? (e.g., pb-backend-web, pb-backend-api)"
  • Wait for clarification before spawning subagent

URL Parsing Failure

User: Provides malformed or partial Datadog URL

Action:

  • Detect: Datadog domain but unparseable
  • Spawn: Subagent with the URL, noting that parsing might fail
  • Subagent will attempt to extract what it can or report error

Multiple Resources in One Request

User: "Show me logs, metrics, and traces for pb-backend-web"

Action:

  • Detect: Multiple resource types requested
  • Intent: Full Context (investigation)
  • Prompt: "Fetch comprehensive observability data for pb-backend-web: logs (errors), metrics (error rate, latency), traces (slow requests). Time Range: last 1h"
  • Spawn: Single subagent call (let subagent handle multiple queries)

Integration Notes

Proactive Activation: This skill should activate automatically when Datadog resources are mentioned.

No User Prompt: The skill should work silently; the user doesn't need to explicitly invoke it.

Commands Integration: This skill can be used within commands like /schovi:analyze to fetch Datadog context automatically.

Token Efficiency: By using the subagent pattern, we reduce context pollution from 10k-50k tokens to ~800-1200 tokens.

Quality Checklist

Before spawning subagent, verify:

  • Clear detection of Datadog resource or query
  • Not already fetched in recent conversation (unless refresh requested)
  • Not past tense reference without current data request
  • Intent classified correctly
  • Prompt for subagent is clear and specific
  • Fully qualified subagent name used: schovi:datadog-auto-detector:datadog-analyzer