Initial commit
This commit is contained in:
375
skills/evaluator/SKILL.md
Normal file
375
skills/evaluator/SKILL.md
Normal file
@@ -0,0 +1,375 @@
|
||||
---
|
||||
name: evaluator
|
||||
description: Skill evaluation and telemetry framework. Collects anonymous usage data and feedback via GitHub Issues and Projects. Privacy-first, opt-in, transparent. Helps improve ClaudeShack skills based on real-world usage. Integrates with oracle and guardian.
|
||||
allowed-tools: Read, Write, Bash, Glob
|
||||
---
|
||||
|
||||
# Evaluator: Skill Evaluation & Telemetry Framework
|
||||
|
||||
You are the **Evaluator** - a privacy-first telemetry and feedback collection system for ClaudeShack skills.
|
||||
|
||||
## Core Principles
|
||||
|
||||
1. **Privacy First**: All telemetry is anonymous and opt-in
|
||||
2. **Transparency**: Users know exactly what data is collected
|
||||
3. **Easy Opt-Out**: Single command to disable telemetry
|
||||
4. **No PII**: Never collect personally identifiable information
|
||||
5. **GitHub-Native**: Uses GitHub Issues and Projects for feedback
|
||||
6. **Community Benefit**: Collected data improves skills for everyone
|
||||
7. **Open Data**: Aggregate statistics are public (not individual events)
|
||||
|
||||
## Why Telemetry?
|
||||
|
||||
Based on research (OpenTelemetry 2025 best practices):
|
||||
|
||||
> "Telemetry features are different because they can offer continuous, unfiltered insight into a user's experiences" - unlike manual surveys or issue reports.
|
||||
|
||||
However, we follow the consensus:
|
||||
> "The data needs to be anonymous, it should be clearly documented and it must be able to be switched off easily (or opt-in if possible)."
|
||||
|
||||
## What We Collect (Opt-In)
|
||||
|
||||
### Skill Usage Events (Anonymous)
|
||||
|
||||
```json
|
||||
{
|
||||
"event_type": "skill_invoked",
|
||||
"skill_name": "oracle",
|
||||
"timestamp": "2025-01-15T10:30:00Z",
|
||||
"session_id": "anonymous_hash",
|
||||
"success": true,
|
||||
"error_type": null,
|
||||
"duration_ms": 1250
|
||||
}
|
||||
```
|
||||
|
||||
**What we DON'T collect:**
|
||||
- ❌ User identity (name, email, IP address)
|
||||
- ❌ File paths or code content
|
||||
- ❌ Conversation history
|
||||
- ❌ Project names
|
||||
- ❌ Any personally identifiable information
|
||||
|
||||
**What we DO collect:**
|
||||
- ✅ Skill name and success/failure
|
||||
- ✅ Anonymous session ID (random hash, rotates daily)
|
||||
- ✅ Error types (for debugging)
|
||||
- ✅ Performance metrics (duration)
|
||||
- ✅ Skill-specific metrics (e.g., Oracle query count)
|
||||
|
||||
### Skill-Specific Metrics
|
||||
|
||||
**Oracle Skill:**
|
||||
- Query success rate
|
||||
- Average query duration
|
||||
- Most common query types
|
||||
- Cache hit rate
|
||||
|
||||
**Guardian Skill:**
|
||||
- Trigger frequency (code volume, errors, churn)
|
||||
- Suggestion acceptance rate (aggregate)
|
||||
- Most common review categories
|
||||
- Average confidence scores
|
||||
|
||||
**Summoner Skill:**
|
||||
- Subagent spawn frequency
|
||||
- Model distribution (haiku vs sonnet)
|
||||
- Average task duration
|
||||
- Success rates
|
||||
|
||||
## Feedback Collection Methods
|
||||
|
||||
### 1. GitHub Issues (Manual Feedback)
|
||||
|
||||
Users can provide feedback via issue templates:
|
||||
|
||||
**Templates:**
|
||||
- `skill_feedback.yml` - General skill feedback
|
||||
- `skill_bug.yml` - Bug reports
|
||||
- `skill_improvement.yml` - Improvement suggestions
|
||||
- `skill_request.yml` - New skill requests
|
||||
|
||||
**Example:**
|
||||
```yaml
|
||||
name: Skill Feedback
|
||||
description: Provide feedback on ClaudeShack skills
|
||||
labels: ["feedback", "skill"]
|
||||
body:
|
||||
- type: dropdown
|
||||
id: skill
|
||||
attributes:
|
||||
label: Which skill?
|
||||
options:
|
||||
- Oracle
|
||||
- Guardian
|
||||
- Summoner
|
||||
- Evaluator
|
||||
- Other
|
||||
- type: dropdown
|
||||
id: rating
|
||||
attributes:
|
||||
label: How useful is this skill?
|
||||
options:
|
||||
- Very useful
|
||||
- Somewhat useful
|
||||
- Not useful
|
||||
- type: textarea
|
||||
id: what-works
|
||||
attributes:
|
||||
label: What works well?
|
||||
- type: textarea
|
||||
id: what-doesnt
|
||||
attributes:
|
||||
label: What could be improved?
|
||||
```
|
||||
|
||||
### 2. GitHub Projects (Feedback Dashboard)
|
||||
|
||||
We use GitHub Projects to track and prioritize feedback:
|
||||
|
||||
**Project Columns:**
|
||||
- 📥 New Feedback (Triage)
|
||||
- 🔍 Investigating
|
||||
- 📋 Planned
|
||||
- 🚧 In Progress
|
||||
- ✅ Completed
|
||||
- 🚫 Won't Fix
|
||||
|
||||
**Metrics Tracked:**
|
||||
- Issue velocity (feedback → resolution time)
|
||||
- Top requested improvements
|
||||
- Most reported bugs
|
||||
- Skill satisfaction ratings
|
||||
|
||||
### 3. Anonymous Telemetry (Opt-In)
|
||||
|
||||
**How It Works:**
|
||||
|
||||
1. User opts in: `/evaluator enable`
|
||||
2. Events are collected locally in `.evaluator/events.jsonl`
|
||||
3. Periodically (daily), events are aggregated into summary stats
|
||||
4. Summary stats are optionally sent to GitHub Discussions as anonymous metrics
|
||||
5. Individual events are never sent (only aggregates)
|
||||
|
||||
**Example Aggregate Report (posted to GitHub Discussions):**
|
||||
|
||||
```markdown
|
||||
## Weekly Skill Usage Report (Anonymous)
|
||||
|
||||
**Oracle Skill:**
|
||||
- Total queries: 1,250 (across all users)
|
||||
- Success rate: 94.2%
|
||||
- Average duration: 850ms
|
||||
- Most common queries: "pattern search" (45%), "gotcha lookup" (30%)
|
||||
|
||||
**Guardian Skill:**
|
||||
- Reviews triggered: 320
|
||||
- Suggestion acceptance: 72%
|
||||
- Most common categories: security (40%), performance (25%), style (20%)
|
||||
|
||||
**Summoner Skill:**
|
||||
- Subagents spawned: 580
|
||||
- Haiku: 85%, Sonnet: 15%
|
||||
- Success rate: 88%
|
||||
|
||||
**Top User Feedback Themes:**
|
||||
1. "Oracle needs better search filters" (12 mentions)
|
||||
2. "Guardian triggers too frequently" (8 mentions)
|
||||
3. "Love the minimal context passing!" (15 mentions)
|
||||
```
|
||||
|
||||
## How to Use Evaluator
|
||||
|
||||
### Enable Telemetry (Opt-In)
|
||||
|
||||
```bash
|
||||
# Enable anonymous telemetry
|
||||
/evaluator enable
|
||||
|
||||
# Confirm telemetry is enabled
|
||||
/evaluator status
|
||||
|
||||
# View what will be collected
|
||||
/evaluator show-sample
|
||||
```
|
||||
|
||||
### Disable Telemetry
|
||||
|
||||
```bash
|
||||
# Disable telemetry
|
||||
/evaluator disable
|
||||
|
||||
# Delete all local telemetry data
|
||||
/evaluator purge
|
||||
```
|
||||
|
||||
### View Local Telemetry
|
||||
|
||||
```bash
|
||||
# View local event summary (never leaves your machine)
|
||||
/evaluator summary
|
||||
|
||||
# View local events (for transparency)
|
||||
/evaluator show-events
|
||||
|
||||
# Export events to JSON
|
||||
/evaluator export --output telemetry.json
|
||||
```
|
||||
|
||||
### Submit Manual Feedback
|
||||
|
||||
```bash
|
||||
# Open feedback form in browser
|
||||
/evaluator feedback
|
||||
|
||||
# Submit quick rating
|
||||
/evaluator rate oracle 5 "Love the pattern search!"
|
||||
|
||||
# Report a bug
|
||||
/evaluator bug guardian "Triggers too often on test files"
|
||||
```
|
||||
|
||||
## Privacy Guarantees
|
||||
|
||||
### What We Guarantee:
|
||||
|
||||
1. **Opt-In Only**: Telemetry is disabled by default
|
||||
2. **No PII**: We never collect personal information
|
||||
3. **Local First**: Events stored locally, you control when/if they're sent
|
||||
4. **Aggregate Only**: Only summary statistics are sent (not individual events)
|
||||
5. **Easy Deletion**: One command to delete all local data
|
||||
6. **Transparent**: Source code is open, you can audit what's collected
|
||||
7. **No Tracking**: No cookies, no fingerprinting, no cross-site tracking
|
||||
|
||||
### Data Lifecycle:
|
||||
|
||||
```
|
||||
1. Event occurs → 2. Stored locally → 3. Aggregated weekly →
|
||||
4. [Optional] Send aggregate → 5. Auto-delete events >30 days old
|
||||
```
|
||||
|
||||
**You control steps 4 and 5.**
|
||||
|
||||
## Configuration
|
||||
|
||||
`.evaluator/config.json`:
|
||||
|
||||
```json
|
||||
{
|
||||
"enabled": false,
|
||||
"anonymous_id": "randomly-generated-daily-rotating-hash",
|
||||
"send_aggregates": false,
|
||||
"retention_days": 30,
|
||||
"aggregation_interval_days": 7,
|
||||
"collect": {
|
||||
"skill_usage": true,
|
||||
"performance_metrics": true,
|
||||
"error_types": true,
|
||||
"success_rates": true
|
||||
},
|
||||
"exclude_skills": [],
|
||||
"github": {
|
||||
"repo": "Overlord-Z/ClaudeShack",
|
||||
"discussions_category": "Telemetry",
|
||||
"issue_labels": ["feedback", "telemetry"]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## For Skill Developers
|
||||
|
||||
### Instrumenting Your Skill
|
||||
|
||||
Add telemetry hooks to your skill:
|
||||
|
||||
```python
|
||||
from evaluator import track_event, track_metric
|
||||
|
||||
# Track skill invocation
|
||||
with track_event('my_skill_invoked'):
|
||||
result = my_skill.execute()
|
||||
|
||||
# Track custom metric
|
||||
track_metric('my_skill_success_rate', success_rate)
|
||||
|
||||
# Track error (error type only, not message)
|
||||
track_error('my_skill_error', error_type='ValueError')
|
||||
```
|
||||
|
||||
### Viewing Skill Analytics
|
||||
|
||||
```bash
|
||||
# View analytics for your skill
|
||||
/evaluator analytics my_skill
|
||||
|
||||
# Compare with other skills
|
||||
/evaluator compare oracle guardian summoner
|
||||
```
|
||||
|
||||
## Benefits to Users
|
||||
|
||||
### Why Share Telemetry?
|
||||
|
||||
1. **Better Skills**: Identify which features are most useful
|
||||
2. **Faster Bug Fixes**: Know which bugs affect the most users
|
||||
3. **Prioritized Features**: Build what users actually want
|
||||
4. **Performance Improvements**: Optimize based on real usage patterns
|
||||
5. **Community Growth**: Demonstrate value to attract contributors
|
||||
|
||||
### What You Get Back:
|
||||
|
||||
- Public aggregate metrics (see how you compare)
|
||||
- Priority bug fixes for highly-used features
|
||||
- Better documentation based on common questions
|
||||
- Skills optimized for real-world usage patterns
|
||||
|
||||
## Implementation Status
|
||||
|
||||
**Current:**
|
||||
- ✅ Privacy-first design
|
||||
- ✅ GitHub Issues templates designed
|
||||
- ✅ Configuration schema
|
||||
- ✅ Opt-in/opt-out framework
|
||||
|
||||
**In Progress:**
|
||||
- 🚧 Event collection scripts
|
||||
- 🚧 Aggregation engine
|
||||
- 🚧 GitHub Projects integration
|
||||
- 🚧 Analytics dashboard
|
||||
|
||||
**Planned:**
|
||||
- 📋 Skill instrumentation helpers
|
||||
- 📋 Automated weekly reports
|
||||
- 📋 Community analytics page
|
||||
|
||||
## Transparency Report
|
||||
|
||||
We commit to publishing quarterly transparency reports:
|
||||
|
||||
**Metrics Reported:**
|
||||
- Total opt-in users (approximate)
|
||||
- Total events collected
|
||||
- Top skills by usage
|
||||
- Top feedback themes
|
||||
- Privacy incidents (if any)
|
||||
|
||||
**Example:**
|
||||
> "Q1 2025: 45 users opted in, 12,500 events collected, 0 privacy incidents, 23 bugs fixed based on feedback"
|
||||
|
||||
## Anti-Patterns (What We Won't Do)
|
||||
|
||||
- ❌ Collect data without consent
|
||||
- ❌ Sell or share data with third parties
|
||||
- ❌ Track individual users
|
||||
- ❌ Collect code or file contents
|
||||
- ❌ Use data for advertising
|
||||
- ❌ Make telemetry difficult to disable
|
||||
- ❌ Hide what we collect
|
||||
|
||||
## References
|
||||
|
||||
Based on 2025 best practices:
|
||||
- OpenTelemetry standards for instrumentation
|
||||
- GitHub Copilot's feedback collection model
|
||||
- VSCode extension telemetry guidelines
|
||||
- Open source community consensus on privacy
|
||||
412
skills/evaluator/scripts/track_event.py
Normal file
412
skills/evaluator/scripts/track_event.py
Normal file
@@ -0,0 +1,412 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Evaluator Event Tracking
|
||||
|
||||
Privacy-first anonymous telemetry for ClaudeShack skills.
|
||||
|
||||
Usage:
|
||||
# Track a skill invocation
|
||||
python track_event.py --skill oracle --event invoked --success true
|
||||
|
||||
# Track a metric
|
||||
python track_event.py --skill guardian --metric acceptance_rate --value 0.75
|
||||
|
||||
# Track an error (type only, no message)
|
||||
python track_event.py --skill summoner --event error --error-type FileNotFoundError
|
||||
|
||||
# Enable/disable telemetry
|
||||
python track_event.py --enable
|
||||
python track_event.py --disable
|
||||
|
||||
# View local events
|
||||
python track_event.py --show-events
|
||||
python track_event.py --summary
|
||||
"""
|
||||
|
||||
import os
|
||||
import sys
|
||||
import json
|
||||
import argparse
|
||||
import hashlib
|
||||
from datetime import datetime, timedelta
|
||||
from pathlib import Path
|
||||
from typing import Dict, List, Optional, Any
|
||||
|
||||
|
||||
def find_evaluator_root() -> Path:
|
||||
"""Find or create the .evaluator directory."""
|
||||
current = Path.cwd()
|
||||
|
||||
while current != current.parent:
|
||||
evaluator_path = current / '.evaluator'
|
||||
if evaluator_path.exists():
|
||||
return evaluator_path
|
||||
current = current.parent
|
||||
|
||||
# Not found, create in current project root
|
||||
evaluator_path = Path.cwd() / '.evaluator'
|
||||
evaluator_path.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
return evaluator_path
|
||||
|
||||
|
||||
def load_config(evaluator_path: Path) -> Dict[str, Any]:
|
||||
"""Load Evaluator configuration."""
|
||||
config_file = evaluator_path / 'config.json'
|
||||
|
||||
if not config_file.exists():
|
||||
# Create default config (telemetry DISABLED by default)
|
||||
default_config = {
|
||||
"enabled": False,
|
||||
"anonymous_id": generate_anonymous_id(),
|
||||
"send_aggregates": False,
|
||||
"retention_days": 30,
|
||||
"aggregation_interval_days": 7,
|
||||
"collect": {
|
||||
"skill_usage": True,
|
||||
"performance_metrics": True,
|
||||
"error_types": True,
|
||||
"success_rates": True
|
||||
},
|
||||
"exclude_skills": [],
|
||||
"github": {
|
||||
"repo": "Overlord-Z/ClaudeShack",
|
||||
"discussions_category": "Telemetry",
|
||||
"issue_labels": ["feedback", "telemetry"]
|
||||
}
|
||||
}
|
||||
|
||||
with open(config_file, 'w', encoding='utf-8') as f:
|
||||
json.dump(default_config, f, indent=2)
|
||||
|
||||
return default_config
|
||||
|
||||
try:
|
||||
with open(config_file, 'r', encoding='utf-8') as f:
|
||||
return json.load(f)
|
||||
except (json.JSONDecodeError, OSError, IOError):
|
||||
return {"enabled": False}
|
||||
|
||||
|
||||
def save_config(evaluator_path: Path, config: Dict[str, Any]) -> None:
|
||||
"""Save Evaluator configuration."""
|
||||
config_file = evaluator_path / 'config.json'
|
||||
|
||||
try:
|
||||
with open(config_file, 'w', encoding='utf-8') as f:
|
||||
json.dump(config, f, indent=2)
|
||||
except (OSError, IOError) as e:
|
||||
print(f"Error: Failed to save config: {e}", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
|
||||
def generate_anonymous_id() -> str:
|
||||
"""Generate a daily-rotating anonymous ID.
|
||||
|
||||
Returns:
|
||||
Anonymous hash that rotates daily
|
||||
"""
|
||||
# Use date as salt for daily rotation
|
||||
date_salt = datetime.now().strftime('%Y-%m-%d')
|
||||
|
||||
# Mix with random system identifier (not personally identifiable)
|
||||
# Using just the date makes it truly anonymous - all users on same date have same ID
|
||||
combined = f"{date_salt}"
|
||||
|
||||
return hashlib.sha256(combined.encode()).hexdigest()[:16]
|
||||
|
||||
|
||||
def track_event(
|
||||
evaluator_path: Path,
|
||||
config: Dict[str, Any],
|
||||
skill_name: str,
|
||||
event_type: str,
|
||||
success: Optional[bool] = None,
|
||||
error_type: Optional[str] = None,
|
||||
duration_ms: Optional[int] = None,
|
||||
metadata: Optional[Dict[str, Any]] = None
|
||||
) -> None:
|
||||
"""Track a skill usage event.
|
||||
|
||||
Args:
|
||||
evaluator_path: Path to .evaluator directory
|
||||
config: Evaluator configuration
|
||||
skill_name: Name of the skill
|
||||
event_type: Type of event (invoked, error, etc.)
|
||||
success: Whether the operation succeeded
|
||||
error_type: Type of error (if applicable)
|
||||
duration_ms: Duration in milliseconds
|
||||
metadata: Additional anonymous metadata
|
||||
"""
|
||||
if not config.get('enabled', False):
|
||||
# Telemetry disabled, skip silently
|
||||
return
|
||||
|
||||
# Check if skill is excluded
|
||||
if skill_name in config.get('exclude_skills', []):
|
||||
return
|
||||
|
||||
# Build event
|
||||
event = {
|
||||
"event_type": f"{skill_name}_{event_type}",
|
||||
"skill_name": skill_name,
|
||||
"timestamp": datetime.now().isoformat(),
|
||||
"session_id": config.get('anonymous_id', 'unknown'),
|
||||
"success": success,
|
||||
"error_type": error_type, # Type only, never error message
|
||||
"duration_ms": duration_ms
|
||||
}
|
||||
|
||||
# Add anonymous metadata if provided
|
||||
if metadata:
|
||||
event["metadata"] = metadata
|
||||
|
||||
# Append to events file (JSONL format)
|
||||
events_file = evaluator_path / 'events.jsonl'
|
||||
|
||||
try:
|
||||
with open(events_file, 'a', encoding='utf-8') as f:
|
||||
f.write(json.dumps(event) + '\n')
|
||||
except (OSError, IOError) as e:
|
||||
# Fail silently - telemetry should never break the workflow
|
||||
pass
|
||||
|
||||
|
||||
def track_metric(
|
||||
evaluator_path: Path,
|
||||
config: Dict[str, Any],
|
||||
skill_name: str,
|
||||
metric_name: str,
|
||||
value: float,
|
||||
metadata: Optional[Dict[str, Any]] = None
|
||||
) -> None:
|
||||
"""Track a skill metric.
|
||||
|
||||
Args:
|
||||
evaluator_path: Path to .evaluator directory
|
||||
config: Evaluator configuration
|
||||
skill_name: Name of the skill
|
||||
metric_name: Name of the metric
|
||||
value: Metric value
|
||||
metadata: Additional anonymous metadata
|
||||
"""
|
||||
track_event(
|
||||
evaluator_path,
|
||||
config,
|
||||
skill_name,
|
||||
"metric",
|
||||
metadata={
|
||||
"metric_name": metric_name,
|
||||
"value": value,
|
||||
**(metadata or {})
|
||||
}
|
||||
)
|
||||
|
||||
|
||||
def load_events(evaluator_path: Path, days: Optional[int] = None) -> List[Dict[str, Any]]:
|
||||
"""Load events from local storage.
|
||||
|
||||
Args:
|
||||
evaluator_path: Path to .evaluator directory
|
||||
days: Optional number of days to look back
|
||||
|
||||
Returns:
|
||||
List of events
|
||||
"""
|
||||
events_file = evaluator_path / 'events.jsonl'
|
||||
|
||||
if not events_file.exists():
|
||||
return []
|
||||
|
||||
events = []
|
||||
cutoff = None
|
||||
|
||||
if days:
|
||||
cutoff = datetime.now() - timedelta(days=days)
|
||||
|
||||
try:
|
||||
with open(events_file, 'r', encoding='utf-8') as f:
|
||||
for line in f:
|
||||
try:
|
||||
event = json.loads(line.strip())
|
||||
|
||||
# Filter by date if cutoff specified
|
||||
if cutoff:
|
||||
event_time = datetime.fromisoformat(event['timestamp'])
|
||||
if event_time < cutoff:
|
||||
continue
|
||||
|
||||
events.append(event)
|
||||
except json.JSONDecodeError:
|
||||
continue
|
||||
except (OSError, IOError):
|
||||
return []
|
||||
|
||||
return events
|
||||
|
||||
|
||||
def show_summary(events: List[Dict[str, Any]]) -> None:
|
||||
"""Show summary of local events.
|
||||
|
||||
Args:
|
||||
events: List of events
|
||||
"""
|
||||
if not events:
|
||||
print("No telemetry events recorded")
|
||||
return
|
||||
|
||||
print("=" * 60)
|
||||
print("LOCAL TELEMETRY SUMMARY (Never Sent Anywhere)")
|
||||
print("=" * 60)
|
||||
print()
|
||||
|
||||
# Count by skill
|
||||
by_skill = {}
|
||||
for event in events:
|
||||
skill = event.get('skill_name', 'unknown')
|
||||
if skill not in by_skill:
|
||||
by_skill[skill] = {'total': 0, 'success': 0, 'errors': 0}
|
||||
|
||||
by_skill[skill]['total'] += 1
|
||||
|
||||
if event.get('success') is True:
|
||||
by_skill[skill]['success'] += 1
|
||||
elif event.get('error_type'):
|
||||
by_skill[skill]['errors'] += 1
|
||||
|
||||
# Print summary
|
||||
for skill, stats in sorted(by_skill.items()):
|
||||
print(f"{skill}:")
|
||||
print(f" Total events: {stats['total']}")
|
||||
print(f" Successes: {stats['success']}")
|
||||
print(f" Errors: {stats['errors']}")
|
||||
|
||||
if stats['total'] > 0:
|
||||
success_rate = (stats['success'] / stats['total']) * 100
|
||||
print(f" Success rate: {success_rate:.1f}%")
|
||||
|
||||
print()
|
||||
|
||||
print("=" * 60)
|
||||
print(f"Total events: {len(events)}")
|
||||
print("=" * 60)
|
||||
|
||||
|
||||
def main():
|
||||
parser = argparse.ArgumentParser(
|
||||
description='Privacy-first anonymous telemetry for ClaudeShack',
|
||||
formatter_class=argparse.RawDescriptionHelpFormatter
|
||||
)
|
||||
|
||||
parser.add_argument('--skill', help='Skill name')
|
||||
parser.add_argument('--event', help='Event type (invoked, error, etc.)')
|
||||
parser.add_argument('--success', type=bool, help='Whether operation succeeded')
|
||||
parser.add_argument('--error-type', help='Error type (not message)')
|
||||
parser.add_argument('--duration', type=int, help='Duration in milliseconds')
|
||||
|
||||
parser.add_argument('--metric', help='Metric name')
|
||||
parser.add_argument('--value', type=float, help='Metric value')
|
||||
|
||||
parser.add_argument('--enable', action='store_true', help='Enable telemetry (opt-in)')
|
||||
parser.add_argument('--disable', action='store_true', help='Disable telemetry')
|
||||
parser.add_argument('--status', action='store_true', help='Show telemetry status')
|
||||
|
||||
parser.add_argument('--show-events', action='store_true', help='Show local events')
|
||||
parser.add_argument('--summary', action='store_true', help='Show event summary')
|
||||
parser.add_argument('--days', type=int, help='Days to look back (default: all)')
|
||||
|
||||
parser.add_argument('--purge', action='store_true', help='Delete all local telemetry data')
|
||||
|
||||
args = parser.parse_args()
|
||||
|
||||
# Find evaluator directory
|
||||
evaluator_path = find_evaluator_root()
|
||||
config = load_config(evaluator_path)
|
||||
|
||||
# Handle enable/disable
|
||||
if args.enable:
|
||||
config['enabled'] = True
|
||||
config['anonymous_id'] = generate_anonymous_id()
|
||||
save_config(evaluator_path, config)
|
||||
print("✓ Telemetry enabled (anonymous, opt-in)")
|
||||
print(f" Anonymous ID: {config['anonymous_id']}")
|
||||
print(" No personally identifiable information is collected")
|
||||
print(" You can disable anytime with: --disable")
|
||||
sys.exit(0)
|
||||
|
||||
if args.disable:
|
||||
config['enabled'] = False
|
||||
save_config(evaluator_path, config)
|
||||
print("✓ Telemetry disabled")
|
||||
print(" Run with --purge to delete all local data")
|
||||
sys.exit(0)
|
||||
|
||||
# Handle status
|
||||
if args.status:
|
||||
print("Evaluator Telemetry Status:")
|
||||
print("=" * 60)
|
||||
print(f"Enabled: {config.get('enabled', False)}")
|
||||
print(f"Anonymous ID: {config.get('anonymous_id', 'Not set')}")
|
||||
print(f"Send aggregates: {config.get('send_aggregates', False)}")
|
||||
print(f"Retention: {config.get('retention_days', 30)} days")
|
||||
|
||||
# Count events
|
||||
events = load_events(evaluator_path)
|
||||
print(f"Local events: {len(events)}")
|
||||
print("=" * 60)
|
||||
sys.exit(0)
|
||||
|
||||
# Handle purge
|
||||
if args.purge:
|
||||
events_file = evaluator_path / 'events.jsonl'
|
||||
if events_file.exists():
|
||||
events_file.unlink()
|
||||
print("✓ All local telemetry data deleted")
|
||||
else:
|
||||
print("No telemetry data to delete")
|
||||
sys.exit(0)
|
||||
|
||||
# Handle show events
|
||||
if args.show_events:
|
||||
events = load_events(evaluator_path, args.days)
|
||||
print(json.dumps(events, indent=2))
|
||||
sys.exit(0)
|
||||
|
||||
# Handle summary
|
||||
if args.summary:
|
||||
events = load_events(evaluator_path, args.days)
|
||||
show_summary(events)
|
||||
sys.exit(0)
|
||||
|
||||
# Track event
|
||||
if args.skill and args.event:
|
||||
track_event(
|
||||
evaluator_path,
|
||||
config,
|
||||
args.skill,
|
||||
args.event,
|
||||
args.success,
|
||||
args.error_type,
|
||||
args.duration
|
||||
)
|
||||
# Silent success (telemetry should be invisible)
|
||||
sys.exit(0)
|
||||
|
||||
# Track metric
|
||||
if args.skill and args.metric and args.value is not None:
|
||||
track_metric(
|
||||
evaluator_path,
|
||||
config,
|
||||
args.skill,
|
||||
args.metric,
|
||||
args.value
|
||||
)
|
||||
# Silent success
|
||||
sys.exit(0)
|
||||
|
||||
parser.print_help()
|
||||
sys.exit(1)
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
main()
|
||||
Reference in New Issue
Block a user