Initial commit
This commit is contained in:
362
skills/datadog-auto-detector/SKILL.md
Normal file
362
skills/datadog-auto-detector/SKILL.md
Normal file
@@ -0,0 +1,362 @@
|
||||
---
|
||||
name: datadog-auto-detector
|
||||
description: Automatically detects Datadog resource mentions (URLs, service queries, natural language) and intelligently fetches condensed context via datadog-analyzer subagent when needed for the conversation (plugin:schovi@schovi-workflows)
|
||||
---
|
||||
|
||||
# Datadog Auto-Detector Skill
|
||||
|
||||
**Purpose**: Detect when user mentions Datadog resources and intelligently fetch relevant observability data.
|
||||
|
||||
**Architecture**: Three-tier pattern (Skill → Command → Subagent) for context isolation.
|
||||
|
||||
## Detection Patterns
|
||||
|
||||
### Pattern 1: Datadog URLs
|
||||
|
||||
Detect full Datadog URLs across all resource types:
|
||||
|
||||
**Logs**:
|
||||
- `https://app.datadoghq.com/.../logs?query=...`
|
||||
- `https://app.datadoghq.com/.../logs?...`
|
||||
|
||||
**APM / Traces**:
|
||||
- `https://app.datadoghq.com/.../apm/traces?query=...`
|
||||
- `https://app.datadoghq.com/.../apm/trace/[trace-id]`
|
||||
- `https://app.datadoghq.com/.../apm/services/[service-name]`
|
||||
|
||||
**Metrics**:
|
||||
- `https://app.datadoghq.com/.../metric/explorer?query=...`
|
||||
- `https://app.datadoghq.com/.../metric/summary?metric=...`
|
||||
|
||||
**Dashboards**:
|
||||
- `https://app.datadoghq.com/.../dashboard/[dashboard-id]`
|
||||
|
||||
**Monitors**:
|
||||
- `https://app.datadoghq.com/.../monitors/[monitor-id]`
|
||||
- `https://app.datadoghq.com/.../monitors?query=...`
|
||||
|
||||
**Incidents**:
|
||||
- `https://app.datadoghq.com/.../incidents/[incident-id]`
|
||||
- `https://app.datadoghq.com/.../incidents?...`
|
||||
|
||||
**Services**:
|
||||
- `https://app.datadoghq.com/.../services/[service-name]`
|
||||
|
||||
**Events**:
|
||||
- `https://app.datadoghq.com/.../event/stream?query=...`
|
||||
|
||||
**RUM**:
|
||||
- `https://app.datadoghq.com/.../rum/...`
|
||||
|
||||
**Infrastructure/Hosts**:
|
||||
- `https://app.datadoghq.com/.../infrastructure/...`
|
||||
|
||||
### Pattern 2: Natural Language Queries
|
||||
|
||||
Detect observability-related requests:
|
||||
|
||||
**Metrics Queries**:
|
||||
- "error rate of [service]"
|
||||
- "check metrics for [service]"
|
||||
- "CPU usage of [service]"
|
||||
- "latency of [service]"
|
||||
- "throughput for [service]"
|
||||
- "request rate"
|
||||
- "response time"
|
||||
|
||||
**Log Queries**:
|
||||
- "logs for [service]"
|
||||
- "log errors in [service]"
|
||||
- "show logs from [service]"
|
||||
- "check [service] logs"
|
||||
- "error logs"
|
||||
|
||||
**Trace Queries**:
|
||||
- "traces for [service]"
|
||||
- "trace [trace-id]"
|
||||
- "slow requests in [service]"
|
||||
- "APM data for [service]"
|
||||
|
||||
**Incident Queries**:
|
||||
- "active incidents"
|
||||
- "show incidents"
|
||||
- "SEV-1 incidents"
|
||||
- "current incidents for [team]"
|
||||
|
||||
**Monitor Queries**:
|
||||
- "alerting monitors"
|
||||
- "check monitors for [service]"
|
||||
- "show triggered monitors"
|
||||
|
||||
**Service Queries**:
|
||||
- "status of [service]"
|
||||
- "health of [service]"
|
||||
- "[service] dependencies"
|
||||
|
||||
### Pattern 3: Service Name References
|
||||
|
||||
Detect service names in context of observability:
|
||||
- Common patterns: `pb-*`, `service-*`, microservice names
|
||||
- Context keywords: "service", "application", "component", "backend", "frontend"
|
||||
- Combined with observability verbs: "check", "show", "analyze", "investigate"
|
||||
|
||||
## Intelligence: When to Fetch
|
||||
|
||||
### ✅ DO Fetch When:
|
||||
|
||||
1. **Direct Request**: User explicitly asks for Datadog data
|
||||
- "Can you check the error rate?"
|
||||
- "Show me logs for pb-backend-web"
|
||||
- "What's happening in Datadog?"
|
||||
|
||||
2. **Datadog URL Provided**: User shares Datadog link
|
||||
- "Look at this: https://app.datadoghq.com/.../logs?..."
|
||||
- "Here's the dashboard: [URL]"
|
||||
|
||||
3. **Investigation Context**: User is troubleshooting
|
||||
- "I'm seeing errors in pb-backend-web, can you investigate?"
|
||||
- "Something's wrong with the service, check Datadog"
|
||||
|
||||
4. **Proactive Analysis**: User asks for analysis that requires observability data
|
||||
- "Analyze the performance of [service]"
|
||||
- "Is there an outage?"
|
||||
|
||||
5. **Comparative Analysis**: User wants to compare or correlate
|
||||
- "Compare error rates between services"
|
||||
- "Check if logs match the incident"
|
||||
|
||||
### ❌ DON'T Fetch When:
|
||||
|
||||
1. **Past Tense Without URL**: User mentions resolved issues
|
||||
- "I fixed the error rate yesterday"
|
||||
- "The logs showed X" (without asking for current data)
|
||||
|
||||
2. **Already Fetched**: Datadog data already in conversation
|
||||
- Check conversation history for recent Datadog summary
|
||||
- Reuse existing data unless user requests refresh
|
||||
|
||||
3. **Informational Discussion**: User discussing concepts
|
||||
- "Datadog is a monitoring tool"
|
||||
- "We use Datadog for observability"
|
||||
|
||||
4. **Vague Reference**: Unclear what to fetch
|
||||
- "Something in Datadog" (too vague)
|
||||
- Ask for clarification instead
|
||||
|
||||
5. **Historical Context**: User providing background
|
||||
- "Last week Datadog showed..."
|
||||
- "According to Datadog docs..."
|
||||
|
||||
## Intent Classification
|
||||
|
||||
Before spawning subagent, classify the user's intent:
|
||||
|
||||
**Intent Type 1: Full Context** (default)
|
||||
- User wants comprehensive analysis
|
||||
- Fetch all relevant data for the resource
|
||||
- Example: "Analyze error rate of pb-backend-web"
|
||||
|
||||
**Intent Type 2: Specific Query**
|
||||
- User wants specific metric/log/trace
|
||||
- Focus fetch on exact request
|
||||
- Example: "Show me error logs for pb-backend-web in last hour"
|
||||
|
||||
**Intent Type 3: Quick Status Check**
|
||||
- User wants high-level status
|
||||
- Fetch summary data only
|
||||
- Example: "Is pb-backend-web healthy?"
|
||||
|
||||
**Intent Type 4: Investigation**
|
||||
- User is debugging an issue
|
||||
- Fetch errors, incidents, traces
|
||||
- Example: "Users report 500 errors, investigate pb-backend-web"
|
||||
|
||||
**Intent Type 5: Comparison**
|
||||
- User wants to compare metrics/services
|
||||
- Fetch data for multiple resources
|
||||
- Example: "Compare error rates of pb-backend-web and pb-frontend"
|
||||
|
||||
## Workflow
|
||||
|
||||
### Step 1: Detect Mention
|
||||
|
||||
Scan user message for:
|
||||
1. Datadog URLs (Pattern 1)
|
||||
2. Natural language queries (Pattern 2)
|
||||
3. Service names with observability context (Pattern 3)
|
||||
|
||||
If none detected, **do nothing**.
|
||||
|
||||
### Step 2: Check Conversation History
|
||||
|
||||
Before fetching, check if:
|
||||
- Same resource already fetched in last 5 messages
|
||||
- Recent Datadog summary covers this request
|
||||
- User explicitly requests refresh ("latest data", "check again")
|
||||
|
||||
If already fetched and no refresh requested, **reuse existing data**.
|
||||
|
||||
### Step 3: Determine Intent
|
||||
|
||||
Analyze user message to classify intent (Full Context, Specific Query, Quick Status, Investigation, Comparison).
|
||||
|
||||
Extract:
|
||||
- **Resource Type**: logs, metrics, traces, incidents, monitors, services, dashboards
|
||||
- **Service Name**: If mentioned (e.g., "pb-backend-web")
|
||||
- **Time Range**: If specified (e.g., "last hour", "today", "last 24h")
|
||||
- **Filters**: Any additional filters (e.g., "status:error", "SEV-1")
|
||||
|
||||
### Step 4: Construct Subagent Prompt
|
||||
|
||||
Build prompt for `datadog-analyzer` subagent:
|
||||
|
||||
```
|
||||
Fetch and summarize [resource type] for [context].
|
||||
|
||||
[If URL provided]:
|
||||
Datadog URL: [url]
|
||||
|
||||
[If natural language query]:
|
||||
Service: [service-name]
|
||||
Query Type: [logs/metrics/traces/etc.]
|
||||
Time Range: [from] to [to]
|
||||
Additional Context: [user's request]
|
||||
|
||||
Intent: [classified intent]
|
||||
|
||||
Focus on: [specific aspects user cares about]
|
||||
```
|
||||
|
||||
### Step 5: Spawn Subagent
|
||||
|
||||
Use Task tool with:
|
||||
- **subagent_type**: `"schovi:datadog-auto-detector:datadog-analyzer"`
|
||||
- **prompt**: Constructed prompt from Step 4
|
||||
- **description**: Short description (e.g., "Fetching Datadog logs summary")
|
||||
|
||||
### Step 6: Present Summary
|
||||
|
||||
When subagent returns:
|
||||
1. Present the summary to user
|
||||
2. Offer to investigate further if issues found
|
||||
3. Suggest related queries if relevant
|
||||
|
||||
## Examples
|
||||
|
||||
### Example 1: Datadog URL
|
||||
|
||||
**User**: "Look at this: https://app.datadoghq.com/.../logs?query=service:pb-backend-web%20status:error"
|
||||
|
||||
**Action**:
|
||||
1. Detect: Datadog logs URL
|
||||
2. Check: Not in recent conversation
|
||||
3. Intent: Full Context (investigation)
|
||||
4. Prompt: "Fetch and summarize logs from Datadog URL: [url]"
|
||||
5. Spawn: datadog-analyzer subagent
|
||||
6. Present: Summary of error logs
|
||||
|
||||
### Example 2: Natural Language Query
|
||||
|
||||
**User**: "Can you check the error rate of pb-backend-web service in the last hour?"
|
||||
|
||||
**Action**:
|
||||
1. Detect: "error rate" + "pb-backend-web" + "last hour"
|
||||
2. Check: Not in recent conversation
|
||||
3. Intent: Specific Query (metrics)
|
||||
4. Prompt: "Fetch and summarize metrics for error rate. Service: pb-backend-web, Time Range: last 1h"
|
||||
5. Spawn: datadog-analyzer subagent
|
||||
6. Present: Metrics summary with error rate trend
|
||||
|
||||
### Example 3: Investigation Context
|
||||
|
||||
**User**: "Users are reporting 500 errors on the checkout flow. Can you investigate?"
|
||||
|
||||
**Action**:
|
||||
1. Detect: "500 errors" (observability issue)
|
||||
2. Check: Not in recent conversation
|
||||
3. Intent: Investigation
|
||||
4. Prompt: "Investigate 500 errors in checkout flow. Query Type: logs and traces, Filters: status:500 OR status:error, Time Range: last 1h. Focus on: error patterns, affected endpoints, trace analysis"
|
||||
5. Spawn: datadog-analyzer subagent
|
||||
6. Present: Investigation summary with findings
|
||||
|
||||
### Example 4: Already Fetched
|
||||
|
||||
**User**: "Show me error rate for pb-backend-web"
|
||||
|
||||
[Datadog summary for pb-backend-web fetched 2 messages ago]
|
||||
|
||||
**Action**:
|
||||
1. Detect: "error rate" + "pb-backend-web"
|
||||
2. Check: Already fetched in message N-2
|
||||
3. **Skip fetch**: "Based on the Datadog data fetched earlier, the error rate for pb-backend-web is [value]..."
|
||||
|
||||
### Example 5: Past Tense (No Fetch)
|
||||
|
||||
**User**: "Yesterday Datadog showed high error rates"
|
||||
|
||||
**Action**:
|
||||
1. Detect: "Datadog" + "error rates"
|
||||
2. Check: Past tense ("Yesterday", "showed")
|
||||
3. **Skip fetch**: User is providing historical context, not requesting current data
|
||||
|
||||
### Example 6: Comparison
|
||||
|
||||
**User**: "Compare error rates of pb-backend-web and pb-frontend over the last 24 hours"
|
||||
|
||||
**Action**:
|
||||
1. Detect: "error rates" + multiple services + "last 24 hours"
|
||||
2. Check: Not in recent conversation
|
||||
3. Intent: Comparison
|
||||
4. Prompt: "Fetch and compare metrics for error rate. Services: pb-backend-web, pb-frontend. Time Range: last 24h. Focus on: comparative analysis, trends, spikes"
|
||||
5. Spawn: datadog-analyzer subagent
|
||||
6. Present: Comparative metrics summary
|
||||
|
||||
## Edge Cases
|
||||
|
||||
### Ambiguous Service Name
|
||||
|
||||
**User**: "Check the backend service error rate"
|
||||
|
||||
**Action**:
|
||||
- Detect: "backend service" (ambiguous)
|
||||
- Ask: "I can fetch error rate data from Datadog. Which specific service? (e.g., pb-backend-web, pb-backend-api)"
|
||||
- Wait for clarification before spawning subagent
|
||||
|
||||
### URL Parsing Failure
|
||||
|
||||
**User**: Provides malformed or partial Datadog URL
|
||||
|
||||
**Action**:
|
||||
- Detect: Datadog domain but unparseable
|
||||
- Spawn: Subagent with URL and note parsing might fail
|
||||
- Subagent will attempt to extract what it can or report error
|
||||
|
||||
### Multiple Resources in One Request
|
||||
|
||||
**User**: "Show me logs, metrics, and traces for pb-backend-web"
|
||||
|
||||
**Action**:
|
||||
- Detect: Multiple resource types requested
|
||||
- Intent: Full Context (investigation)
|
||||
- Prompt: "Fetch comprehensive observability data for pb-backend-web: logs (errors), metrics (error rate, latency), traces (slow requests). Time Range: last 1h"
|
||||
- Spawn: Single subagent call (let subagent handle multiple queries)
|
||||
|
||||
## Integration Notes
|
||||
|
||||
**Proactive Activation**: This skill should activate automatically when Datadog resources are mentioned.
|
||||
|
||||
**No User Prompt**: The skill should work silently - user doesn't need to explicitly invoke it.
|
||||
|
||||
**Commands Integration**: This skill can be used within commands like `/schovi:analyze` to fetch Datadog context automatically.
|
||||
|
||||
**Token Efficiency**: By using the subagent pattern, we reduce context pollution from 10k-50k tokens to ~800-1200 tokens.
|
||||
|
||||
## Quality Checklist
|
||||
|
||||
Before spawning subagent, verify:
|
||||
- [ ] Clear detection of Datadog resource or query
|
||||
- [ ] Not already fetched in recent conversation (unless refresh requested)
|
||||
- [ ] Not past tense reference without current data request
|
||||
- [ ] Intent classified correctly
|
||||
- [ ] Prompt for subagent is clear and specific
|
||||
- [ ] Fully qualified subagent name used: `schovi:datadog-auto-detector:datadog-analyzer`
|
||||
Reference in New Issue
Block a user