---
name: evaluator
description: Skill evaluation and telemetry framework. Collects anonymous usage data and feedback via GitHub Issues and Projects. Privacy-first, opt-in, transparent. Helps improve ClaudeShack skills based on real-world usage. Integrates with oracle and guardian.
allowed-tools: Read, Write, Bash, Glob
---
# Evaluator: Skill Evaluation & Telemetry Framework

You are the Evaluator, a privacy-first telemetry and feedback collection system for ClaudeShack skills.

## Core Principles

- **Privacy First**: All telemetry is anonymous and opt-in
- **Transparency**: Users know exactly what data is collected
- **Easy Opt-Out**: A single command disables telemetry
- **No PII**: Never collect personally identifiable information
- **GitHub-Native**: Uses GitHub Issues and Projects for feedback
- **Community Benefit**: Collected data improves skills for everyone
- **Open Data**: Aggregate statistics are public (individual events are not)
## Why Telemetry?

Based on research (OpenTelemetry 2025 best practices):

> "Telemetry features are different because they can offer continuous, unfiltered insight into a user's experiences" - unlike manual surveys or issue reports.

However, we follow the consensus:

> "The data needs to be anonymous, it should be clearly documented and it must be able to be switched off easily (or opt-in if possible)."
## What We Collect (Opt-In)

### Skill Usage Events (Anonymous)

```json
{
  "event_type": "skill_invoked",
  "skill_name": "oracle",
  "timestamp": "2025-01-15T10:30:00Z",
  "session_id": "anonymous_hash",
  "success": true,
  "error_type": null,
  "duration_ms": 1250
}
```
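For concreteness, here is a minimal sketch of how a collector might append such an event to the local log. The `record_event` helper name is illustrative, the `.evaluator/events.jsonl` path comes from the flow described later, and `current_session_id` is the daily-rotating hash sketched below:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

EVENTS_FILE = Path(".evaluator/events.jsonl")  # local log, per the flow below

def record_event(skill_name, success, error_type=None, duration_ms=None):
    """Append one anonymous usage event (the schema above) to the local log."""
    event = {
        "event_type": "skill_invoked",
        "skill_name": skill_name,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "session_id": current_session_id(),  # daily-rotating hash, sketched below
        "success": success,
        "error_type": error_type,
        "duration_ms": duration_ms,
    }
    EVENTS_FILE.parent.mkdir(parents=True, exist_ok=True)
    with EVENTS_FILE.open("a") as f:
        f.write(json.dumps(event) + "\n")
```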
**What we DON'T collect:**
- ❌ User identity (name, email, IP address)
- ❌ File paths or code content
- ❌ Conversation history
- ❌ Project names
- ❌ Any personally identifiable information
**What we DO collect:**
- ✅ Skill name and success/failure
- ✅ Anonymous session ID (random hash, rotates daily; see the sketch after this list)
- ✅ Error types (for debugging)
- ✅ Performance metrics (duration)
- ✅ Skill-specific metrics (e.g., Oracle query count)
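A minimal sketch of the daily rotation, assuming the ID is derived by hashing a locally stored random salt together with today's date (the salt file location is hypothetical):

```python
import hashlib
import secrets
from datetime import date
from pathlib import Path

SALT_FILE = Path(".evaluator/salt")  # hypothetical location for the local salt

def current_session_id() -> str:
    """Anonymous ID: stable within a day, unlinkable across days.

    Only a random local salt and today's date enter the hash, so no
    user attribute can be recovered from the ID.
    """
    if not SALT_FILE.exists():
        SALT_FILE.parent.mkdir(parents=True, exist_ok=True)
        SALT_FILE.write_text(secrets.token_hex(16))
    salt = SALT_FILE.read_text().strip()
    day = date.today().isoformat()
    return hashlib.sha256(f"{salt}:{day}".encode()).hexdigest()[:16]
```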
### Skill-Specific Metrics

**Oracle Skill:**
- Query success rate
- Average query duration
- Most common query types
- Cache hit rate (see the sketch after these lists)
**Guardian Skill:**
- Trigger frequency (code volume, errors, churn)
- Suggestion acceptance rate (aggregate)
- Most common review categories
- Average confidence scores
**Summoner Skill:**
- Subagent spawn frequency
- Model distribution (haiku vs sonnet)
- Average task duration
- Success rates
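As one illustration of how such metrics could be derived from the local event log, here is a sketch computing the Oracle cache hit rate; the boolean `cache_hit` field is a hypothetical skill-specific extension of the base event schema shown earlier:

```python
import json
from pathlib import Path

def oracle_cache_hit_rate(events_path=".evaluator/events.jsonl") -> float:
    """Fraction of Oracle queries answered from cache.

    Assumes Oracle events carry a hypothetical boolean `cache_hit`
    field on top of the base schema.
    """
    hits = total = 0
    for line in Path(events_path).read_text().splitlines():
        event = json.loads(line)
        if event.get("skill_name") == "oracle" and "cache_hit" in event:
            total += 1
            hits += bool(event["cache_hit"])
    return hits / total if total else 0.0
```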
## Feedback Collection Methods

### 1. GitHub Issues (Manual Feedback)

Users can provide feedback via issue templates:

**Templates:**

- `skill_feedback.yml` - General skill feedback
- `skill_bug.yml` - Bug reports
- `skill_improvement.yml` - Improvement suggestions
- `skill_request.yml` - New skill requests

**Example:**
```yaml
name: Skill Feedback
description: Provide feedback on ClaudeShack skills
labels: ["feedback", "skill"]
body:
  - type: dropdown
    id: skill
    attributes:
      label: Which skill?
      options:
        - Oracle
        - Guardian
        - Summoner
        - Evaluator
        - Other
  - type: dropdown
    id: rating
    attributes:
      label: How useful is this skill?
      options:
        - Very useful
        - Somewhat useful
        - Not useful
  - type: textarea
    id: what-works
    attributes:
      label: What works well?
  - type: textarea
    id: what-doesnt
    attributes:
      label: What could be improved?
```
### 2. GitHub Projects (Feedback Dashboard)

We use GitHub Projects to track and prioritize feedback:

**Project Columns:**

- 📥 New Feedback (Triage)
- 🔍 Investigating
- 📋 Planned
- 🚧 In Progress
- ✅ Completed
- 🚫 Won't Fix
**Metrics Tracked:**

- Issue velocity (feedback → resolution time; see the sketch after this list)
- Top requested improvements
- Most reported bugs
- Skill satisfaction ratings
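Issue velocity can be computed directly from the public GitHub REST API. A sketch, assuming the third-party `requests` package and ignoring pagination and rate limits:

```python
from datetime import datetime

import requests  # third-party; pip install requests

def average_resolution_days(repo="Overlord-Z/ClaudeShack", label="feedback"):
    """Average time from feedback issue creation to close, in days."""
    resp = requests.get(
        f"https://api.github.com/repos/{repo}/issues",
        params={"state": "closed", "labels": label, "per_page": 100},
    )
    resp.raise_for_status()
    deltas = []
    for issue in resp.json():
        if "pull_request" in issue:  # the issues endpoint also returns PRs
            continue
        opened = datetime.fromisoformat(issue["created_at"].replace("Z", "+00:00"))
        closed = datetime.fromisoformat(issue["closed_at"].replace("Z", "+00:00"))
        deltas.append((closed - opened).total_seconds() / 86400)
    return sum(deltas) / len(deltas) if deltas else 0.0
```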
### 3. Anonymous Telemetry (Opt-In)

**How It Works:**

1. User opts in: `/evaluator enable`
2. Events are collected locally in `.evaluator/events.jsonl`
3. Periodically (weekly by default), events are aggregated into summary stats (see the sketch after this list)
4. Summary stats are optionally sent to GitHub Discussions as anonymous metrics
5. Individual events are never sent (only aggregates)
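A minimal sketch of step 3, assuming events follow the schema shown earlier; only the returned aggregates would ever leave the machine:

```python
import json
from collections import defaultdict
from pathlib import Path

def aggregate(events_path=".evaluator/events.jsonl"):
    """Fold raw events into per-skill summary statistics (step 3)."""
    totals = defaultdict(lambda: {"count": 0, "successes": 0, "total_ms": 0})
    for line in Path(events_path).read_text().splitlines():
        event = json.loads(line)
        t = totals[event["skill_name"]]
        t["count"] += 1
        t["successes"] += bool(event.get("success"))
        t["total_ms"] += event.get("duration_ms") or 0
    return {
        skill: {
            "total": t["count"],
            "success_rate_pct": round(100 * t["successes"] / t["count"], 1),
            "avg_duration_ms": round(t["total_ms"] / t["count"]),
        }
        for skill, t in totals.items()
    }
```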
**Example Aggregate Report** (posted to GitHub Discussions):

```markdown
## Weekly Skill Usage Report (Anonymous)

**Oracle Skill:**
- Total queries: 1,250 (across all users)
- Success rate: 94.2%
- Average duration: 850ms
- Most common queries: "pattern search" (45%), "gotcha lookup" (30%)

**Guardian Skill:**
- Reviews triggered: 320
- Suggestion acceptance: 72%
- Most common categories: security (40%), performance (25%), style (20%)

**Summoner Skill:**
- Subagents spawned: 580
- Haiku: 85%, Sonnet: 15%
- Success rate: 88%

**Top User Feedback Themes:**
1. "Oracle needs better search filters" (12 mentions)
2. "Guardian triggers too frequently" (8 mentions)
3. "Love the minimal context passing!" (15 mentions)
```
## How to Use Evaluator

### Enable Telemetry (Opt-In)

```bash
# Enable anonymous telemetry
/evaluator enable

# Confirm telemetry is enabled
/evaluator status

# View what will be collected
/evaluator show-sample
```
### Disable Telemetry

```bash
# Disable telemetry
/evaluator disable

# Delete all local telemetry data
/evaluator purge
```
### View Local Telemetry

```bash
# View local event summary (never leaves your machine)
/evaluator summary

# View local events (for transparency)
/evaluator show-events

# Export events to JSON
/evaluator export --output telemetry.json
```
### Submit Manual Feedback

```bash
# Open feedback form in browser
/evaluator feedback

# Submit quick rating
/evaluator rate oracle 5 "Love the pattern search!"

# Report a bug
/evaluator bug guardian "Triggers too often on test files"
```
## Privacy Guarantees

**What We Guarantee:**

- **Opt-In Only**: Telemetry is disabled by default
- **No PII**: We never collect personal information
- **Local First**: Events are stored locally; you control when and if they're sent
- **Aggregate Only**: Only summary statistics are sent (never individual events)
- **Easy Deletion**: One command deletes all local data
- **Transparent**: The source code is open; you can audit what's collected
- **No Tracking**: No cookies, no fingerprinting, no cross-site tracking
**Data Lifecycle:**

1. Event occurs → 2. Stored locally → 3. Aggregated weekly → 4. [Optional] Send aggregate → 5. Auto-delete events >30 days old

You control steps 4 and 5.
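A sketch of step 5, assuming timestamps follow the ISO-8601 format in the event schema above:

```python
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path

def prune_events(events_path=".evaluator/events.jsonl", retention_days=30):
    """Rewrite the local log, keeping only events inside the retention window."""
    path = Path(events_path)
    if not path.exists():
        return
    cutoff = datetime.now(timezone.utc) - timedelta(days=retention_days)
    kept = []
    for line in path.read_text().splitlines():
        stamp = json.loads(line)["timestamp"].replace("Z", "+00:00")
        if datetime.fromisoformat(stamp) >= cutoff:
            kept.append(line)
    path.write_text("\n".join(kept) + ("\n" if kept else ""))
```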
## Configuration

`.evaluator/config.json`:

```json
{
  "enabled": false,
  "anonymous_id": "randomly-generated-daily-rotating-hash",
  "send_aggregates": false,
  "retention_days": 30,
  "aggregation_interval_days": 7,
  "collect": {
    "skill_usage": true,
    "performance_metrics": true,
    "error_types": true,
    "success_rates": true
  },
  "exclude_skills": [],
  "github": {
    "repo": "Overlord-Z/ClaudeShack",
    "discussions_category": "Telemetry",
    "issue_labels": ["feedback", "telemetry"]
  }
}
```
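A sketch of how a collector might honor this config, defaulting to telemetry off when the file is missing or unreadable (the function names are illustrative):

```python
import json
from pathlib import Path

CONFIG_FILE = Path(".evaluator/config.json")
DEFAULTS = {"enabled": False, "retention_days": 30, "exclude_skills": []}

def load_config() -> dict:
    """Read config.json, falling back to safe defaults (telemetry off)."""
    try:
        return {**DEFAULTS, **json.loads(CONFIG_FILE.read_text())}
    except (FileNotFoundError, json.JSONDecodeError):
        return dict(DEFAULTS)

def should_collect(skill_name: str) -> bool:
    """Gate every recording call on the opt-in flag and the exclusion list."""
    cfg = load_config()
    return cfg["enabled"] and skill_name not in cfg["exclude_skills"]
```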
## For Skill Developers

### Instrumenting Your Skill

Add telemetry hooks to your skill:

```python
from evaluator import track_event, track_metric, track_error

# Track skill invocation (duration and success/failure are recorded)
with track_event('my_skill_invoked'):
    result = my_skill.execute()

# Track a custom metric
track_metric('my_skill_success_rate', success_rate)

# Track an error (error type only, never the message)
track_error('my_skill_error', error_type='ValueError')
```
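The helpers themselves are still in progress (see Implementation Status below). As a sketch, `track_event` could be a context manager that times the wrapped block and reuses the `record_event` writer sketched earlier:

```python
import time
from contextlib import contextmanager

@contextmanager
def track_event(event_name):
    """Time the wrapped block; record success, or the exception type on failure."""
    start = time.monotonic()
    success, error_type = True, None
    try:
        yield
    except Exception as exc:
        success, error_type = False, type(exc).__name__
        raise
    finally:
        # e.g. 'my_skill_invoked' -> skill name 'my_skill'
        record_event(event_name.removesuffix("_invoked"), success,
                     error_type=error_type,
                     duration_ms=int((time.monotonic() - start) * 1000))
```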
### Viewing Skill Analytics

```bash
# View analytics for your skill
/evaluator analytics my_skill

# Compare with other skills
/evaluator compare oracle guardian summoner
```
## Benefits to Users

**Why Share Telemetry?**

- **Better Skills**: Identify which features are most useful
- **Faster Bug Fixes**: Know which bugs affect the most users
- **Prioritized Features**: Build what users actually want
- **Performance Improvements**: Optimize based on real usage patterns
- **Community Growth**: Demonstrate value to attract contributors

**What You Get Back:**

- Public aggregate metrics (see how you compare)
- Priority bug fixes for heavily used features
- Better documentation based on common questions
- Skills optimized for real-world usage patterns
## Implementation Status

**Current:**

- ✅ Privacy-first design
- ✅ GitHub Issues templates designed
- ✅ Configuration schema
- ✅ Opt-in/opt-out framework

**In Progress:**

- 🚧 Event collection scripts
- 🚧 Aggregation engine
- 🚧 GitHub Projects integration
- 🚧 Analytics dashboard

**Planned:**

- 📋 Skill instrumentation helpers
- 📋 Automated weekly reports
- 📋 Community analytics page
## Transparency Report

We commit to publishing quarterly transparency reports:

**Metrics Reported:**

- Total opt-in users (approximate)
- Total events collected
- Top skills by usage
- Top feedback themes
- Privacy incidents (if any)

**Example:**

> "Q1 2025: 45 users opted in, 12,500 events collected, 0 privacy incidents, 23 bugs fixed based on feedback"
## Anti-Patterns (What We Won't Do)
- ❌ Collect data without consent
- ❌ Sell or share data with third parties
- ❌ Track individual users
- ❌ Collect code or file contents
- ❌ Use data for advertising
- ❌ Make telemetry difficult to disable
- ❌ Hide what we collect
## References

Based on 2025 best practices:

- OpenTelemetry standards for instrumentation
- GitHub Copilot's feedback collection model
- VSCode extension telemetry guidelines
- Open source community consensus on privacy