name: datadog-auto-detector
description: Automatically detects Datadog resource mentions (URLs, service queries, natural language) and intelligently fetches condensed context via datadog-analyzer subagent when needed for the conversation (plugin:schovi@schovi-workflows)

Datadog Auto-Detector Skill

Purpose: Detect when user mentions Datadog resources and intelligently fetch relevant observability data.

Architecture: Three-tier pattern (Skill → Command → Subagent) for context isolation.

Detection Patterns

Pattern 1: Datadog URLs

Detect full Datadog URLs across all resource types (a detection sketch follows the pattern list):

Logs:

  • https://app.datadoghq.com/.../logs?query=...
  • https://app.datadoghq.com/.../logs?...

APM / Traces:

  • https://app.datadoghq.com/.../apm/traces?query=...
  • https://app.datadoghq.com/.../apm/trace/[trace-id]
  • https://app.datadoghq.com/.../apm/services/[service-name]

Metrics:

  • https://app.datadoghq.com/.../metric/explorer?query=...
  • https://app.datadoghq.com/.../metric/summary?metric=...

Dashboards:

  • https://app.datadoghq.com/.../dashboard/[dashboard-id]

Monitors:

  • https://app.datadoghq.com/.../monitors/[monitor-id]
  • https://app.datadoghq.com/.../monitors?query=...

Incidents:

  • https://app.datadoghq.com/.../incidents/[incident-id]
  • https://app.datadoghq.com/.../incidents?...

Services:

  • https://app.datadoghq.com/.../services/[service-name]

Events:

  • https://app.datadoghq.com/.../event/stream?query=...

RUM:

  • https://app.datadoghq.com/.../rum/...

Infrastructure/Hosts:

  • https://app.datadoghq.com/.../infrastructure/...
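Taken together, the URL shapes above map paths to resource types. As an illustration only, a regex pass over a message could classify them roughly as in the sketch below; the hostname app.datadoghq.com and the path fragments are assumptions taken from the shapes listed above (other Datadog sites use different hostnames), and the model's own reading of a URL remains authoritative.

```python
import re

# Illustrative sketch of Pattern 1: find Datadog URLs in a message and classify the
# resource type from the path. Hostname and path fragments are assumptions based on
# the URL shapes listed above.
DATADOG_URL = re.compile(r"https://app\.datadoghq\.com/\S+")

RESOURCE_PATTERNS = [  # first match wins
    (r"/logs", "logs"),
    (r"/apm/traces?", "traces"),
    (r"/apm/services/", "services"),
    (r"/metric/", "metrics"),
    (r"/dashboard/", "dashboards"),
    (r"/monitors", "monitors"),
    (r"/incidents", "incidents"),
    (r"/services/", "services"),
    (r"/event/stream", "events"),
    (r"/rum/", "rum"),
    (r"/infrastructure", "infrastructure"),
]

def detect_datadog_urls(message: str) -> list[tuple[str, str]]:
    """Return (url, resource_type) pairs for every Datadog URL found in a message."""
    pairs = []
    for url in DATADOG_URL.findall(message):
        resource = next((name for pattern, name in RESOURCE_PATTERNS
                         if re.search(pattern, url)), "unknown")
        pairs.append((url, resource))
    return pairs
```

For example, detect_datadog_urls("Look at https://app.datadoghq.com/logs?query=service:pb-backend-web") would return that URL paired with "logs".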

Pattern 2: Natural Language Queries

Detect observability-related requests:

Metrics Queries:

  • "error rate of [service]"
  • "check metrics for [service]"
  • "CPU usage of [service]"
  • "latency of [service]"
  • "throughput for [service]"
  • "request rate"
  • "response time"

Log Queries:

  • "logs for [service]"
  • "log errors in [service]"
  • "show logs from [service]"
  • "check [service] logs"
  • "error logs"

Trace Queries:

  • "traces for [service]"
  • "trace [trace-id]"
  • "slow requests in [service]"
  • "APM data for [service]"

Incident Queries:

  • "active incidents"
  • "show incidents"
  • "SEV-1 incidents"
  • "current incidents for [team]"

Monitor Queries:

  • "alerting monitors"
  • "check monitors for [service]"
  • "show triggered monitors"

Service Queries:

  • "status of [service]"
  • "health of [service]"
  • "[service] dependencies"

Pattern 3: Service Name References

Detect service names in context of observability:

  • Common patterns: pb-*, service-*, microservice names
  • Context keywords: "service", "application", "component", "backend", "frontend"
  • Combined with observability verbs: "check", "show", "analyze", "investigate"
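Patterns 2 and 3 are keyword driven. A rough pre-filter could look like the sketch below; the phrase lists are assumptions distilled from the examples above, not an exhaustive vocabulary, and the model's contextual judgment still decides borderline cases.

```python
import re

# Illustrative sketch of Patterns 2 and 3: a message is treated as an observability
# query when a query keyword co-occurs with a service-looking name or an action verb.
QUERY_KEYWORDS = re.compile(
    r"\b(error rate|metrics|cpu usage|latency|throughput|request rate|response time|"
    r"logs?|traces?|slow requests|apm|incidents?|monitors?|status|health|dependencies)\b",
    re.IGNORECASE,
)
SERVICE_NAME = re.compile(r"\b(pb-[\w-]+|service-[\w-]+)\b", re.IGNORECASE)
ACTION_VERB = re.compile(r"\b(check|show|analyze|investigate)\b", re.IGNORECASE)

def looks_like_observability_query(message: str) -> bool:
    """True when a query keyword co-occurs with a service name or an observability verb."""
    if not QUERY_KEYWORDS.search(message):
        return False
    return bool(SERVICE_NAME.search(message) or ACTION_VERB.search(message))
```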

Intelligence: When to Fetch

DO Fetch When:

  1. Direct Request: User explicitly asks for Datadog data

    • "Can you check the error rate?"
    • "Show me logs for pb-backend-web"
    • "What's happening in Datadog?"
  2. Datadog URL Provided: User shares Datadog link

  3. Investigation Context: User is troubleshooting

    • "I'm seeing errors in pb-backend-web, can you investigate?"
    • "Something's wrong with the service, check Datadog"
  4. Proactive Analysis: User asks for analysis that requires observability data

    • "Analyze the performance of [service]"
    • "Is there an outage?"
  5. Comparative Analysis: User wants to compare or correlate

    • "Compare error rates between services"
    • "Check if logs match the incident"

DON'T Fetch When:

  1. Past Tense Without URL: User mentions resolved issues

    • "I fixed the error rate yesterday"
    • "The logs showed X" (without asking for current data)
  2. Already Fetched: Datadog data already in conversation

    • Check conversation history for recent Datadog summary
    • Reuse existing data unless user requests refresh
  3. Informational Discussion: User discussing concepts

    • "Datadog is a monitoring tool"
    • "We use Datadog for observability"
  4. Vague Reference: Unclear what to fetch

    • "Something in Datadog" (too vague)
    • Ask for clarification instead
  5. Historical Context: User providing background

    • "Last week Datadog showed..."
    • "According to Datadog docs..."

Intent Classification

Before spawning subagent, classify the user's intent:

Intent Type 1: Full Context (default)

  • User wants comprehensive analysis
  • Fetch all relevant data for the resource
  • Example: "Analyze error rate of pb-backend-web"

Intent Type 2: Specific Query

  • User wants specific metric/log/trace
  • Focus fetch on exact request
  • Example: "Show me error logs for pb-backend-web in last hour"

Intent Type 3: Quick Status Check

  • User wants high-level status
  • Fetch summary data only
  • Example: "Is pb-backend-web healthy?"

Intent Type 4: Investigation

  • User is debugging an issue
  • Fetch errors, incidents, traces
  • Example: "Users report 500 errors, investigate pb-backend-web"

Intent Type 5: Comparison

  • User wants to compare metrics/services
  • Fetch data for multiple resources
  • Example: "Compare error rates of pb-backend-web and pb-frontend"

Workflow

Step 1: Detect Mention

Scan user message for:

  1. Datadog URLs (Pattern 1)
  2. Natural language queries (Pattern 2)
  3. Service names with observability context (Pattern 3)

If none detected, do nothing.

Step 2: Check Conversation History

Before fetching, check if:

  • Same resource already fetched in last 5 messages
  • Recent Datadog summary covers this request
  • User explicitly requests refresh ("latest data", "check again")

If already fetched and no refresh requested, reuse existing data.

Step 3: Determine Intent

Analyze user message to classify intent (Full Context, Specific Query, Quick Status, Investigation, Comparison).

Extract:

  • Resource Type: logs, metrics, traces, incidents, monitors, services, dashboards
  • Service Name: If mentioned (e.g., "pb-backend-web")
  • Time Range: If specified (e.g., "last hour", "today", "last 24h")
  • Filters: Any additional filters (e.g., "status:error", "SEV-1")
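The extracted fields can be pictured as a small record. Names and defaults here are illustrative, not a defined schema; a one-hour default window is assumed when the user gives no range, matching the examples later in this document.

```python
from dataclasses import dataclass, field

@dataclass
class ExtractedRequest:
    """Fields pulled out in Step 3 (illustrative names, not a defined schema)."""
    resource_type: str                   # logs, metrics, traces, incidents, monitors, services, dashboards
    service: str | None = None           # e.g. "pb-backend-web"
    time_range: str = "last 1h"          # assumed default when the user gives no range
    filters: list[str] = field(default_factory=list)  # e.g. ["status:error", "SEV-1"]
    intent: str = "full context"         # from the classification above
    url: str | None = None               # set when the user pasted a Datadog link
```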

Step 4: Construct Subagent Prompt

Build prompt for datadog-analyzer subagent:

Fetch and summarize [resource type] for [context].

[If URL provided]:
Datadog URL: [url]

[If natural language query]:
Service: [service-name]
Query Type: [logs/metrics/traces/etc.]
Time Range: [from] to [to]
Additional Context: [user's request]

Intent: [classified intent]

Focus on: [specific aspects user cares about]
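A helper that fills this template might look like the sketch below. It is an assumption about how the prompt could be assembled, not part of the skill's defined interface; the parameter names mirror the fields extracted in Step 3.

```python
def build_subagent_prompt(resource_type: str, *, url: str | None = None,
                          service: str | None = None, time_range: str = "last 1h",
                          intent: str = "full context", context: str = "",
                          focus: str = "") -> str:
    """Assemble the Step 4 template (illustrative sketch, not a fixed format)."""
    lines = [f"Fetch and summarize {resource_type} for {service or context}.", ""]
    if url:                                   # URL provided
        lines.append(f"Datadog URL: {url}")
    else:                                     # natural language query
        lines += [f"Service: {service}",
                  f"Query Type: {resource_type}",
                  f"Time Range: {time_range}",
                  f"Additional Context: {context}"]
    lines += ["", f"Intent: {intent}"]
    if focus:
        lines += ["", f"Focus on: {focus}"]
    return "\n".join(lines)
```

For Example 2 below, build_subagent_prompt("metrics", service="pb-backend-web", time_range="last 1h", intent="specific query", context="error rate", focus="error rate trend") would yield a prompt in the shape of the template above.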

Step 5: Spawn Subagent

Use Task tool with:

  • subagent_type: "schovi:datadog-auto-detector:datadog-analyzer"
  • prompt: Constructed prompt from Step 4
  • description: Short description (e.g., "Fetching Datadog logs summary")
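Filled in for a typical metrics request, the parameters would look roughly like the mapping below; the field names come from the list above, while the invocation syntax itself belongs to the Task tool, not to this sketch.

```python
# Illustrative parameter set for the Task tool call (field names from the list above).
task_parameters = {
    "subagent_type": "schovi:datadog-auto-detector:datadog-analyzer",
    "description": "Fetching Datadog metrics summary",
    "prompt": (
        "Fetch and summarize metrics for error rate.\n"
        "Service: pb-backend-web\n"
        "Query Type: metrics\n"
        "Time Range: last 1h\n"
        "Intent: specific query\n"
        "Focus on: error rate trend"
    ),
}
```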

Step 6: Present Summary

When subagent returns:

  1. Present the summary to user
  2. Offer to investigate further if issues found
  3. Suggest related queries if relevant

Examples

Example 1: Datadog URL

User: "Look at this: https://app.datadoghq.com/.../logs?query=service:pb-backend-web%20status:error"

Action:

  1. Detect: Datadog logs URL
  2. Check: Not in recent conversation
  3. Intent: Full Context (investigation)
  4. Prompt: "Fetch and summarize logs from Datadog URL: [url]"
  5. Spawn: datadog-analyzer subagent
  6. Present: Summary of error logs

Example 2: Natural Language Query

User: "Can you check the error rate of pb-backend-web service in the last hour?"

Action:

  1. Detect: "error rate" + "pb-backend-web" + "last hour"
  2. Check: Not in recent conversation
  3. Intent: Specific Query (metrics)
  4. Prompt: "Fetch and summarize metrics for error rate. Service: pb-backend-web, Time Range: last 1h"
  5. Spawn: datadog-analyzer subagent
  6. Present: Metrics summary with error rate trend

Example 3: Investigation Context

User: "Users are reporting 500 errors on the checkout flow. Can you investigate?"

Action:

  1. Detect: "500 errors" (observability issue)
  2. Check: Not in recent conversation
  3. Intent: Investigation
  4. Prompt: "Investigate 500 errors in checkout flow. Query Type: logs and traces, Filters: status:500 OR status:error, Time Range: last 1h. Focus on: error patterns, affected endpoints, trace analysis"
  5. Spawn: datadog-analyzer subagent
  6. Present: Investigation summary with findings

Example 4: Already Fetched

User: "Show me error rate for pb-backend-web"

[Datadog summary for pb-backend-web fetched 2 messages ago]

Action:

  1. Detect: "error rate" + "pb-backend-web"
  2. Check: Already fetched in message N-2
  3. Skip fetch: "Based on the Datadog data fetched earlier, the error rate for pb-backend-web is [value]..."

Example 5: Past Tense (No Fetch)

User: "Yesterday Datadog showed high error rates"

Action:

  1. Detect: "Datadog" + "error rates"
  2. Check: Past tense ("Yesterday", "showed")
  3. Skip fetch: User is providing historical context, not requesting current data

Example 6: Comparison

User: "Compare error rates of pb-backend-web and pb-frontend over the last 24 hours"

Action:

  1. Detect: "error rates" + multiple services + "last 24 hours"
  2. Check: Not in recent conversation
  3. Intent: Comparison
  4. Prompt: "Fetch and compare metrics for error rate. Services: pb-backend-web, pb-frontend. Time Range: last 24h. Focus on: comparative analysis, trends, spikes"
  5. Spawn: datadog-analyzer subagent
  6. Present: Comparative metrics summary

Edge Cases

Ambiguous Service Name

User: "Check the backend service error rate"

Action:

  • Detect: "backend service" (ambiguous)
  • Ask: "I can fetch error rate data from Datadog. Which specific service? (e.g., pb-backend-web, pb-backend-api)"
  • Wait for clarification before spawning subagent

URL Parsing Failure

User: Provides malformed or partial Datadog URL

Action:

  • Detect: Datadog domain but unparseable
  • Spawn: Subagent with the URL, noting that parsing might fail
  • Subagent will attempt to extract what it can or report error

Multiple Resources in One Request

User: "Show me logs, metrics, and traces for pb-backend-web"

Action:

  • Detect: Multiple resource types requested
  • Intent: Full Context (investigation)
  • Prompt: "Fetch comprehensive observability data for pb-backend-web: logs (errors), metrics (error rate, latency), traces (slow requests). Time Range: last 1h"
  • Spawn: Single subagent call (let subagent handle multiple queries)

Integration Notes

Proactive Activation: This skill should activate automatically when Datadog resources are mentioned.

No User Prompt: The skill should work silently; the user doesn't need to explicitly invoke it.

Commands Integration: This skill can be used within commands like /schovi:analyze to fetch Datadog context automatically.

Token Efficiency: By using the subagent pattern, we reduce context pollution from 10k-50k tokens to ~800-1200 tokens.

Quality Checklist

Before spawning subagent, verify:

  • Clear detection of Datadog resource or query
  • Not already fetched in recent conversation (unless refresh requested)
  • Not past tense reference without current data request
  • Intent classified correctly
  • Prompt for subagent is clear and specific
  • Fully qualified subagent name used: schovi:datadog-auto-detector:datadog-analyzer