Initial commit
15
skills/meta/skill-isolation-tester/CHANGELOG.md
Normal file
@@ -0,0 +1,15 @@
# Changelog

## 0.2.0

- Refactored to Anthropic progressive disclosure pattern
- Updated description with "Use PROACTIVELY when..." format
- Removed version/author from frontmatter

## 0.1.0

- Initial release with three isolation modes
- Git Worktree (fast), Docker (balanced), VM (safest)
- Automatic risk assessment and mode detection
- Side-effect validation and dependency analysis
- Test report generation with actionable recommendations
335
skills/meta/skill-isolation-tester/README.md
Normal file
@@ -0,0 +1,335 @@
# Skill Isolation Tester

> Automated testing framework for Claude Code skills in isolated environments

## Overview

Test your newly created Claude Code skills in isolated environments before sharing them publicly. This skill automatically spins up git worktrees, Docker containers, or VMs to validate that your skills work correctly without hidden dependencies on your local setup.

## Features

- **Multiple Isolation Levels**: Choose from git worktree (fast), Docker (balanced), or VM (safest)
- **Automatic Mode Detection**: Analyzes skill risk and suggests appropriate isolation level
- **Comprehensive Validation**: Checks execution, side effects, dependencies, and cleanup
- **Detailed Reports**: Get actionable feedback with specific issues and recommendations
- **Safe Testing**: Protect your main development environment from experimental skills

## Quick Start

### Basic Usage

```
test skill my-new-skill in isolation
```

Claude will analyze your skill and choose the appropriate isolation environment.

### Specify Environment

```
test skill my-new-skill in worktree  # Fast, lightweight
test skill my-new-skill in docker    # OS isolation
test skill my-new-skill in vm        # Maximum security
```

### Check for Issues

```
check if skill my-new-skill has hidden dependencies
verify skill my-new-skill cleans up after itself
```

## Isolation Modes

### 🚀 Git Worktree (Fast)

**Best for**: Read-only skills, quick iteration during development

- ✅ Creates test in seconds
- ✅ Minimal disk space
- ⚠️ Limited isolation (shares system packages)

**Prerequisites**: Git 2.5+

### 🐳 Docker (Balanced)

**Best for**: Skills that install packages or modify files

- ✅ Full OS isolation
- ✅ Reproducible environment
- ⚠️ Requires Docker installed

**Prerequisites**: Docker daemon running

### 🖥️ VM (Safest)

**Best for**: High-risk skills, untrusted sources

- ✅ Complete isolation
- ✅ Test on different OS versions
- ⚠️ Slower, resource-intensive

**Prerequisites**: Multipass, UTM, or VirtualBox

## What Gets Tested

### ✅ Execution Validation
- Skill completes without errors
- No unhandled exceptions
- Acceptable performance

### ✅ Side Effect Detection
- Files created/modified/deleted
- Processes started (and stopped)
- System configuration changes
- Network activity

### ✅ Dependency Analysis
- Required system packages
- NPM/pip dependencies
- Hardcoded paths
- Environment variables needed

### ✅ Cleanup Verification
- Temporary files removed
- Processes terminated
- System state restored

## Example Report

```markdown
# Skill Isolation Test Report: my-new-skill

## Status: ⚠️ WARNING (Ready with minor fixes)

### Execution Results
✅ Skill completed successfully
✅ No errors detected
⏱️ Execution time: 12s

### Issues Found

**HIGH Priority:**
- Missing documentation for `jq` dependency
- Hardcoded path: /Users/connor/.claude/config (line 45)

**MEDIUM Priority:**
- 3 temporary files not cleaned up in /tmp

### Recommendations
1. Document `jq` requirement in README
2. Replace hardcoded path with $HOME/.claude/config
3. Add cleanup for /tmp/skill-temp-*.log files

### Overall Grade: B (READY after addressing HIGH priority items)
```

## Installation

This skill is already available in your Claude Code skills directory.

### Manual Installation

```bash
cp -r skill-isolation-tester ~/.claude/skills/
```

### Verify Installation

Start Claude Code and say:
```
test skill [any-skill-name] in isolation
```

## Prerequisites

### Required (All Modes)
- Git 2.5+
- Claude Code 1.0+

### Optional (Docker Mode)
- Docker Desktop or Docker Engine
- 1GB+ free disk space

### Optional (VM Mode)
- Multipass (recommended) or
- UTM (macOS) or
- VirtualBox (cross-platform)
- 8GB+ host RAM
- 20GB+ free disk space

## Configuration

### Set Default Isolation Mode

Create `~/.claude/skills/skill-isolation-tester/config.json`:

```json
{
  "default_mode": "docker",
  "docker": {
    "base_image": "ubuntu:22.04",
    "memory_limit": "512m",
    "cpu_limit": "1.0"
  },
  "vm": {
    "platform": "multipass",
    "os_version": "22.04",
    "cpus": 2,
    "memory": "2G",
    "disk": "10G"
  }
}
```
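
As a rough illustration of how these settings might be consumed, the Python sketch below overlays a user's `config.json` on the defaults from the example file. The names `load_config`, `DEFAULTS`, and `VALID_MODES`, the per-section merge, and the assumption that `worktree` is an accepted `default_mode` value are all hypothetical; the skill's actual loading logic is not documented here.

```python
import json
from pathlib import Path

# Defaults copied from the example config.json; how the skill really merges
# user settings is an assumption of this sketch.
DEFAULTS = {
    "default_mode": "docker",
    "docker": {"base_image": "ubuntu:22.04", "memory_limit": "512m", "cpu_limit": "1.0"},
    "vm": {"platform": "multipass", "os_version": "22.04", "cpus": 2, "memory": "2G", "disk": "10G"},
}

# Assumed set of accepted mode names (hypothetical).
VALID_MODES = {"worktree", "docker", "vm"}


def load_config(path):
    """Return the defaults overlaid with whatever the JSON file at `path` provides."""
    # Copy nested sections so the module-level DEFAULTS are never mutated.
    config = {k: (dict(v) if isinstance(v, dict) else v) for k, v in DEFAULTS.items()}
    path = Path(path)
    if path.is_file():
        user = json.loads(path.read_text(encoding="utf-8"))
        for key, value in user.items():
            if isinstance(value, dict) and isinstance(config.get(key), dict):
                config[key].update(value)  # merge one section, keep unset defaults
            else:
                config[key] = value
    if config["default_mode"] not in VALID_MODES:
        raise ValueError(f"unknown default_mode: {config['default_mode']!r}")
    return config
```

A missing file simply yields the defaults, so a partial `config.json` only needs to list the values that differ.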

## Use Cases

### Before Submitting to Claudex Marketplace

```
validate skill my-marketplace-skill in docker
```

Ensures your skill works in a clean environment without your personal configs.

### Testing Skills from Others

```
test skill untrusted-skill in vm
```

Maximum isolation protects your system from potential issues.

### Catching Environment-Specific Bugs

```
test skill my-skill in worktree
```

Quickly verify that the skill doesn't depend on your specific setup.

### CI/CD Integration

```bash
#!/bin/bash
# In your CI pipeline.
# -p runs Claude Code non-interactively (print mode) so the job does not wait
# for input. This assumes the exit status reflects the test outcome.
if claude -p "test skill ${SKILL_NAME} in docker"; then
  echo "✅ Skill tests passed"
  exit 0
else
  echo "❌ Skill tests failed"
  exit 1
fi
```

## Troubleshooting

### "Docker daemon not running"

**macOS**: Open Docker Desktop
**Linux**: `sudo systemctl start docker`

### "Multipass not found"

```bash
# macOS
brew install multipass

# Linux
sudo snap install multipass
```

### "Permission denied"

Add your user to the `docker` group:
```bash
sudo usermod -aG docker "$USER"
newgrp docker
```

### "Out of disk space"

Clean up Docker:
```bash
docker system prune -a
```

## Best Practices

1. **Test before committing** - Catch issues early
2. **Start with worktree** - Fast iteration during development
3. **Use Docker for final validation** - Before public release
4. **Use VM for untrusted skills** - Safety first
5. **Review test reports** - Address all HIGH priority issues
6. **Document dependencies** - Help other users

## Advanced Usage

### Custom Test Scenarios

```
test skill my-skill with inputs "test-file.txt, --option value"
```

### Batch Testing

```
test all skills in directory ./skills/ in worktree
```

### Keep Environment for Debugging

```
test skill my-skill in docker --keep
```

Preserves the container or VM for manual inspection.

## Architecture

```
skill-isolation-tester/
├── SKILL.md                      # Main skill manifest
├── README.md                     # This file
├── CHANGELOG.md                  # Version history
├── plugin.json                   # Marketplace metadata
├── modes/                        # Mode-specific workflows
│   ├── mode1-git-worktree.md     # Fast isolation
│   ├── mode2-docker.md           # Container isolation
│   └── mode3-vm.md               # VM isolation
├── data/                         # Reference materials
│   ├── risk-assessment.md        # How to assess skill risk
│   └── side-effect-checklist.md  # What to check for
├── templates/                    # Report templates
│   └── test-report.md            # Standard report format
└── examples/                     # Sample outputs
    └── test-results/             # Example test results
```

## Contributing

Found a bug or have a feature request? Issues and PRs welcome!

## License

MIT License - see LICENSE file for details

## Related Skills

- **skill-creator**: Create new skills with proper structure
- **git-worktree-setup**: Manage parallel development workflows

## Changelog

See [CHANGELOG.md](CHANGELOG.md) for version history.

## Credits

Created by Connor
Inspired by best practices in software testing and isolation

---

**Remember**: Test in isolation, ship with confidence! 🚀
174
skills/meta/skill-isolation-tester/SKILL.md
Normal file
@@ -0,0 +1,174 @@
---
name: skill-isolation-tester
description: Use PROACTIVELY when validating Claude Code skills before sharing or public release. Automated testing framework using multiple isolation environments (git worktree, Docker containers, VMs) to catch environment-specific bugs, hidden dependencies, and cleanup issues. Includes production-ready test templates and risk-based mode auto-detection. Not for functional testing of skill logic or non-skill code.
---

# Skill Isolation Tester

Tests Claude Code skills in isolated environments to ensure they work correctly without dependencies on your local setup.

## When to Use

**Trigger Phrases**:
- "test skill [name] in isolation"
- "validate skill [name] in clean environment"
- "test my new skill in worktree/docker/vm"
- "check if skill [name] has hidden dependencies"

**Use Cases**:
- Test before committing or sharing publicly
- Validate no hidden dependencies on local environment
- Verify cleanup behavior (no leftover files/processes)
- Catch environment-specific bugs

## Quick Decision Matrix

| Request | Mode | Isolation Level |
|---------|------|-----------------|
| "test in worktree" | Git Worktree | Fast, lightweight |
| "test in docker" | Docker | Full OS isolation |
| "test in vm" | VM | Complete isolation |
| "test skill X" (unspecified) | Auto-detect | Based on skill risk |

## Risk-Based Auto-Detection

| Risk Level | Criteria | Recommended Mode |
|------------|----------|------------------|
| Low | Read-only, no system commands | Git Worktree |
| Medium | File creation, bash commands | Docker |
| High | System config changes, VM ops | VM |

## Mode 1: Git Worktree (Fast)

**Best for**: Low-risk skills, quick iteration

**Process**:
1. Create isolated git worktree
2. Install Claude Code
3. Copy skill and run tests
4. Cleanup

**Workflow**: `modes/mode1-git-worktree.md`

## Mode 2: Docker Container (Balanced)

**Best for**: Medium-risk skills, full OS isolation

**Process**:
1. Build/pull Docker image
2. Create container with Claude Code
3. Run skill tests with monitoring
4. Cleanup container and images

**Workflow**: `modes/mode2-docker.md`

## Mode 3: VM (Safest)

**Best for**: High-risk skills, untrusted code

**Process**:
1. Provision VM, take snapshot
2. Install Claude Code
3. Run tests with full monitoring
4. Rollback or cleanup

**Workflow**: `modes/mode3-vm.md`

## Test Templates

Production-ready templates in `test-templates/`:

| Template | Use For |
|----------|---------|
| `docker-skill-test.sh` | Docker container/image skills |
| `docker-skill-test-json.sh` | CI/CD with JSON/JUnit output |
| `api-skill-test.sh` | HTTP/API calling skills |
| `file-manipulation-skill-test.sh` | File modification skills |
| `git-skill-test.sh` | Git operation skills |

**Usage**:
```bash
chmod +x test-templates/docker-skill-test.sh
./test-templates/docker-skill-test.sh my-skill-name

# CI/CD with JSON output
export JSON_ENABLED=true
./test-templates/docker-skill-test-json.sh my-skill-name
```

## Helper Library

`lib/docker-helpers.sh` provides robust Docker testing utilities:

```bash
source ~/.claude/skills/skill-isolation-tester/lib/docker-helpers.sh

trap cleanup_on_exit EXIT
preflight_check_docker || exit 1
safe_docker_build "Dockerfile" "skill-test:my-skill"
safe_docker_run "skill-test:my-skill" bash -c "echo 'Testing...'"
```

**Functions**: `validate_shell_command`, `retry_docker_command`, `cleanup_on_exit`, `preflight_check_docker`, `safe_docker_build`, `safe_docker_run`

## Validation Checks

**Execution**:
- [ ] Skill completes without errors
- [ ] Output matches expected format
- [ ] Execution time acceptable

**Side Effects**:
- [ ] No orphaned processes
- [ ] Temporary files cleaned up
- [ ] No unexpected system modifications

**Portability**:
- [ ] No hardcoded paths
- [ ] All dependencies documented
- [ ] Works in clean environment

## Test Report Format

```markdown
# Skill Isolation Test Report: [skill-name]
## Environment: [Git Worktree / Docker / VM]
## Status: [PASS / FAIL / WARNING]

### Execution Results
✅ Skill completed successfully

### Side Effects Detected
⚠️ 3 temporary files not cleaned up

### Dependency Analysis
📦 Required: jq, git

### Overall Grade: B (READY with minor fixes)
```

## Reference Materials

- `modes/mode1-git-worktree.md` - Fast isolation workflow
- `modes/mode2-docker.md` - Container isolation workflow
- `modes/mode3-vm.md` - Full VM isolation workflow
- `data/risk-assessment.md` - Skill risk evaluation
- `data/side-effect-checklist.md` - Side effect validation
- `templates/test-report.md` - Report template
- `test-templates/README.md` - Template documentation

## Quick Commands

```bash
# Test with auto-detection
test skill my-new-skill in isolation

# Test in specific environment
test skill my-new-skill in worktree  # Fast
test skill my-new-skill in docker    # Balanced
test skill my-new-skill in vm        # Safest
```

---

**Version**: 0.1.0 | **Author**: Connor
391
skills/meta/skill-isolation-tester/data/risk-assessment.md
Normal file
@@ -0,0 +1,391 @@
# Skill Risk Assessment Guide

## Overview

This guide helps you assess the risk level of a skill to determine the appropriate isolation environment for testing. Risk assessment prevents over-isolation (wasting time) and under-isolation (security issues).

## Risk Levels

### Low Risk → Git Worktree

**Characteristics:**
- Read-only operations on existing files
- No system commands (bash, npm, apt, etc.)
- No file creation outside skill directory
- No network requests
- Pure data processing or analysis
- File reading and reporting only

**Examples:**
- Code analyzer that reads files and generates reports
- Configuration validator that checks syntax
- Documentation generator from code comments
- Markdown formatter or linter
- Log file parser

**Appropriate Environment:** Git Worktree (fast, lightweight)

### Medium Risk → Docker

**Characteristics:**
- File creation in user directories
- NPM/pip package installation
- Bash commands for file operations
- Git operations (clone, commit, etc.)
- Network requests (API calls, downloads)
- Environment variable reads
- Temporary file creation
- Database connections (local)

**Examples:**
- Code generator that creates new files
- Package installer or dependency manager
- API integration that fetches remote data
- Build tool that compiles code
- Test runner that executes tests
- Migration tool that updates files

**Appropriate Environment:** Docker (OS isolation, reproducible)

### High Risk → VM

**Characteristics:**
- System configuration changes (/etc/ modifications)
- Service installation (systemd, cron)
- Kernel module loading
- VM or container operations
- Database schema migrations (production)
- Destructive operations (file deletion, disk formatting)
- Privilege escalation (sudo commands)
- Unknown or untrusted source

**Examples:**
- System setup automation
- Infrastructure provisioning
- VM management tools
- Security testing tools
- Experimental or unreviewed skills
- Skills from external repositories

**Appropriate Environment:** VM (complete isolation, safest)

## Assessment Checklist

### Step 1: Parse Skill Manifest (SKILL.md)

Read the skill's SKILL.md and look for these keywords:

**Low Risk Indicators:**
- "analyze", "read", "parse", "validate", "check", "lint", "format"
- "generate report", "calculate", "summarize"
- Read-only file operations
- No system commands mentioned

**Medium Risk Indicators:**
- "install", "create", "write", "modify", "update", "build", "compile"
- "npm install", "pip install", "git clone"
- "fetch", "download", "API call"
- File creation mentioned
- Bash commands for file operations

**High Risk Indicators:**
- "sudo", "systemctl", "cron", "service"
- "configure system", "modify /etc"
- "VM", "docker run", "container"
- "delete", "remove", "format"
- "root access", "privilege"

### Step 2: Scan Skill Code

If the skill includes scripts or code files, scan them for:

**Red Flags (High Risk):**
```bash
# In bash scripts
sudo
systemctl
/etc/
chmod 777
rm -rf /
dd if=
mkfs
usermod
passwd
```

```javascript
// In JavaScript/Node
require('child_process').exec('sudo')
fs.rmdirSync('/', { recursive: true })
process.setuid(0)
```

```python
# In Python
os.system('sudo')
import subprocess
subprocess.run(['sudo', ...])
```

**Medium Risk Patterns:**
```bash
npm install
git clone
curl | bash
apt-get install
brew install
pip install
mkdir -p
touch
echo > file
```

**Low Risk Patterns:**
```bash
cat file.txt
grep pattern
find . -name
ls -la
echo "message"
```
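
The scan in Step 2 could be automated along these lines. This is a minimal sketch, not the skill's own scanner: the regular expressions are illustrative translations of the shell red flags and medium-risk patterns listed in Step 2, and the names `HIGH_RISK`, `MEDIUM_RISK`, and `scan_script` are hypothetical.

```python
import re

# Illustrative regexes adapted from the Step 2 pattern lists (assumptions).
# Note that the `rm -rf /` pattern also flags `rm -rf` on any absolute path.
HIGH_RISK = [
    r"\bsudo\b", r"\bsystemctl\b", r"/etc/", r"chmod\s+777", r"rm\s+-rf\s+/",
    r"\bdd\s+if=", r"\bmkfs\b", r"\busermod\b", r"\bpasswd\b",
]
MEDIUM_RISK = [
    r"\bnpm\s+install\b", r"\bgit\s+clone\b", r"curl\b.*\|\s*(ba)?sh\b",
    r"\bapt-get\s+install\b", r"\bbrew\s+install\b", r"\bpip\s+install\b",
    r"\bmkdir\s+-p\b", r"\btouch\b",
]


def scan_script(text):
    """Return (level, matched_patterns) for one script's source text."""
    high = [p for p in HIGH_RISK if re.search(p, text)]
    if high:
        return "high", high
    medium = [p for p in MEDIUM_RISK if re.search(p, text)]
    if medium:
        return "medium", medium
    return "low", []


level, hits = scan_script("sudo systemctl restart nginx")
print(level, hits)
```

Pattern matching of this kind only surfaces candidates for review; a match inside a comment or string still counts, so the result is best treated as a lower bound on the isolation level.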

### Step 3: Check Dependencies

Review plugin.json or README for dependencies:

**Low Risk:**
- No external dependencies
- Pure JavaScript/Python/Ruby standard library
- Read-only CLI tools (cat, grep, jq for reading only)

**Medium Risk:**
- NPM packages listed
- Python packages (via requirements.txt)
- Common CLI tools (git, curl, wget)
- Database connections (read/write)

**High Risk:**
- System packages (apt, yum, brew)
- Kernel modules
- Root-level dependencies
- Unsigned binaries
- External scripts from unknown sources

### Step 4: Review File Operations

Check what directories the skill accesses:

**Low Risk:**
- Reads from current directory only
- Reads from specified input files
- Writes reports to current directory

**Medium Risk:**
- Reads/writes to ~/.claude/
- Reads/writes to /tmp/
- Creates files in user directories
- Modifies project files

**High Risk:**
- Accesses /etc/
- Accesses /usr/ or /usr/local/
- Accesses /sys/ or /proc/
- Modifies system binaries
- Accesses /var/log/
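
The directory tiers in Step 4 can be expressed as a small path classifier. The sketch below is one possible reading: the function name `classify_path`, the optional `home` parameter, and the decision to compare whole path components (so `/etcetera` is not treated as `/etc`) are assumptions made for illustration.

```python
from pathlib import PurePosixPath

# Directory roots taken from the Step 4 lists.
HIGH_RISK_ROOTS = ("/etc", "/usr", "/sys", "/proc", "/var/log")
MEDIUM_RISK_ROOTS = ("/tmp", "~")


def _is_under(path, root):
    """True if `path` equals `root` or lies somewhere beneath it."""
    p, r = PurePosixPath(path), PurePosixPath(root)
    return p == r or r in p.parents


def classify_path(path, home=None):
    """Map one accessed path to "low", "medium", or "high"."""
    if any(_is_under(path, root) for root in HIGH_RISK_ROOTS):
        return "high"
    medium_roots = MEDIUM_RISK_ROOTS + ((home,) if home else ())
    if any(_is_under(path, root) for root in medium_roots):
        return "medium"
    return "low"  # e.g. relative paths inside the current directory


for example in ("/etc/hosts", "~/.claude/config", "report.html"):
    print(example, "->", classify_path(example))
```

A skill's overall file-operation risk would then be the highest tier among all paths it touches.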

### Step 5: Network Activity Assessment

**Low Risk:**
- No network activity
- Reads from local cache only

**Medium Risk:**
- HTTP GET requests to public APIs
- Documented API endpoints
- Read-only data fetching
- HTTPS only

**High Risk:**
- HTTP POST with sensitive data
- Unclear network destinations
- Raw socket operations
- Arbitrary URL from user input
- Self-updating mechanism

## Automatic Risk Scoring

Use this scoring system:

```javascript
// `mentions(skill, ...terms)` and `accesses(skill, ...paths)` are assumed helpers
// (not defined here) that return true when the skill's manifest or scripts
// contain any of the given keywords or paths.
function assessSkillRisk(skill) {
  let score = 0;

  // File operations
  if (mentions(skill, "read", "parse", "analyze")) score += 1;
  if (mentions(skill, "write", "create", "modify")) score += 3;
  if (mentions(skill, "delete", "remove", "rm -rf")) score += 8;

  // System operations
  if (mentions(skill, "npm install", "pip install")) score += 3;
  if (mentions(skill, "apt-get", "brew install")) score += 5;
  if (mentions(skill, "sudo", "systemctl", "service")) score += 10;

  // File paths
  if (accesses(skill, "~/", "/tmp/")) score += 2;
  if (accesses(skill, "/etc/", "/usr/")) score += 8;

  // Network
  if (mentions(skill, "fetch", "API", "curl")) score += 2;
  if (mentions(skill, "download", "wget")) score += 3;

  // Process operations
  if (mentions(skill, "exec", "spawn", "child_process")) score += 4;

  // Determine risk level
  if (score <= 3) return "low";     // Worktree
  if (score <= 10) return "medium"; // Docker
  return "high";                    // VM
}
```

**Scoring Reference:**
- 0-3: Low Risk → Git Worktree
- 4-10: Medium Risk → Docker
- 11+: High Risk → VM
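
For experimentation, the same weights and thresholds can be written as a runnable Python sketch. It assumes the skill is represented by a single string of text (for example the concatenated SKILL.md and scripts) and uses plain case-insensitive substring matching; both choices, along with the names `RULES` and `assess_skill_risk`, are assumptions rather than the skill's actual implementation.

```python
# (keywords, points) pairs transcribed from the JavaScript scoring function.
RULES = [
    (("read", "parse", "analyze"), 1),
    (("write", "create", "modify"), 3),
    (("delete", "remove", "rm -rf"), 8),
    (("npm install", "pip install"), 3),
    (("apt-get", "brew install"), 5),
    (("sudo", "systemctl", "service"), 10),
    (("~/", "/tmp/"), 2),
    (("/etc/", "/usr/"), 8),
    (("fetch", "api", "curl"), 2),
    (("download", "wget"), 3),
    (("exec", "spawn", "child_process"), 4),
]


def assess_skill_risk(text):
    """Return (score, level); each rule contributes at most once."""
    lowered = text.lower()
    score = sum(points for keywords, points in RULES
                if any(keyword in lowered for keyword in keywords))
    if score <= 3:
        level = "low"      # Git Worktree
    elif score <= 10:
        level = "medium"   # Docker
    else:
        level = "high"     # VM
    return score, level


print(assess_skill_risk("Read files, then write formatted output; run npm install prettier"))
```

Substring matching is deliberately crude: a short keyword such as `api` also matches inside unrelated words, so a real scanner would likely tokenize the text first.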

## Special Cases

### Unknown or Unreviewed Skills

**Default:** High Risk (VM isolation)

Even if a skill appears low risk, use a VM for the first test of:
- Skills from external repositories
- Skills without documentation
- Skills with obfuscated code
- Skills from untrusted authors

### Skills in Active Development

**Recommendation:** Medium Risk (Docker)

For your own skills during development:
- Start with Git Worktree for speed
- Use Docker before committing
- Use VM before public release

### Skills from Marketplace

**Recommendation:** Follow listed risk level

Trusted marketplace skills can use their documented risk level.

## Override Cases

The user can always override automatic detection:

```
test skill low-risk-skill in vm       # More isolation than needed (safe but slow)
test skill high-risk-skill in docker  # Less isolation (not recommended)
```

**Warn the user when they choose lower isolation than recommended.**
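
One way to implement this warning is to rank the three modes and compare the requested mode against the recommended one. The following is a sketch under that assumption; `ISOLATION_ORDER`, `RECOMMENDED_MODE`, and `check_override` are hypothetical names, and the message wording is illustrative.

```python
# Modes ranked from least to most isolated.
ISOLATION_ORDER = {"worktree": 0, "docker": 1, "vm": 2}

# Recommended mode per risk level, as given in this guide.
RECOMMENDED_MODE = {"low": "worktree", "medium": "docker", "high": "vm"}


def check_override(risk_level, requested_mode):
    """Return a warning string if the request under-isolates, else None."""
    recommended = RECOMMENDED_MODE[risk_level]
    if ISOLATION_ORDER[requested_mode] < ISOLATION_ORDER[recommended]:
        return (f"Warning: '{requested_mode}' provides less isolation than the "
                f"recommended '{recommended}' for a {risk_level}-risk skill.")
    return None  # same or stronger isolation: nothing to warn about


print(check_override("high", "docker"))
print(check_override("low", "vm"))
```

Requests for more isolation than recommended pass silently, matching the "safe but slow" case in the override examples.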

## Risk Re-assessment

Re-assess risk if skill is updated:
- Major version changes
- New dependencies added
- New file operations
- Expanded scope

## Decision Tree

```
Start
|
├─ Does skill read files only?
|  ├─ YES → Low Risk (Worktree)
|  └─ NO → Continue
|
├─ Does skill install packages or modify files?
|  ├─ YES → Medium Risk (Docker)
|  └─ NO → Continue
|
├─ Does skill modify system configs or use sudo?
|  ├─ YES → High Risk (VM)
|  └─ NO → Continue
|
└─ Is skill from untrusted source?
   ├─ YES → High Risk (VM)
   └─ NO → Medium Risk (Docker)
```
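
Read top to bottom, the tree translates directly into a chain of early returns. The sketch below is a literal transcription; the function name `decide_mode` and its four boolean parameters are hypothetical stand-ins for the answers to the four questions.

```python
def decide_mode(reads_only, installs_or_modifies, system_changes_or_sudo, untrusted_source):
    """Walk the decision tree in order and return the first matching mode."""
    if reads_only:
        return "worktree"   # Low Risk
    if installs_or_modifies:
        return "docker"     # Medium Risk
    if system_changes_or_sudo:
        return "vm"         # High Risk
    if untrusted_source:
        return "vm"         # High Risk
    return "docker"         # Default: Medium Risk


# A trusted skill that only edits project files:
print(decide_mode(False, True, False, False))
```

Because the questions are evaluated in order, a skill that both modifies files and uses sudo stops at Docker in this literal reading; the guide's closing advice to choose higher isolation when in doubt suggests escalating such cases manually.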

## Example Assessments

### Example 1: "code-formatter"

**Description:** Formats JavaScript/TypeScript files using prettier

**Analysis:**
- Reads files: Yes (score: +1)
- Writes files: Yes (score: +3)
- System commands: No
- Dependencies: prettier (npm package) (score: +3)
- File paths: Current directory only

**Total Score:** 7
**Risk Level:** Medium → Docker

**Reasoning:** Modifies files but limited to project directory. Docker provides adequate isolation.

### Example 2: "log-analyzer"

**Description:** Parses log files and generates HTML report

**Analysis:**
- Reads files: Yes (score: +1)
- Writes files: Yes (HTML report) (score: +3)
- System commands: No
- Dependencies: None
- File paths: Current directory + /tmp for temp files (score: +2)

**Total Score:** 6
**Risk Level:** Medium → Docker

**Reasoning:** Safe operations but creates files. Docker ensures clean testing.

### Example 3: "system-auditor"

**Description:** Audits system security configuration

**Analysis:**
- Reads files: Yes, including /etc/ (score: +1)
- System commands: Runs systemctl, checks services (score: +10)
- Dependencies: System tools
- File paths: /etc/, /var/log/ (score: +8, counted once as in the scoring function)

**Total Score:** 19
**Risk Level:** High → VM

**Reasoning:** Accesses sensitive system directories and uses system commands. VM required.

### Example 4: "markdown-linter"

**Description:** Checks markdown files for style violations

**Analysis:**
- Reads files: Yes (score: +1)
- Writes files: No (only stdout)
- System commands: No
- Dependencies: None
- File paths: Current directory only

**Total Score:** 1
**Risk Level:** Low → Git Worktree

**Reasoning:** Pure read-only analysis. Worktree is sufficient and fast.

---

**Remember:** When in doubt, choose higher isolation. It's better to be safe than to clean up a compromised system. Speed is secondary to security.
543
skills/meta/skill-isolation-tester/data/side-effect-checklist.md
Normal file
@@ -0,0 +1,543 @@
# Side Effect Detection Checklist

## Overview

This checklist helps identify all side effects caused by skill execution. Side effects are any changes to the system state beyond the skill's primary output. Proper detection ensures skills are well-behaved and clean up after themselves.

## Why Side Effects Matter

**Portability:** Skills with untracked side effects may not work for other users

**Cleanliness:** Leftover files and processes waste resources

**Security:** Unexpected system modifications are security risks

**Documentation:** Users need to know what a skill changes

## Categories of Side Effects

## 1. Filesystem Changes

### Files Created

**What to Check:**
- Files in skill directory
- Files in /tmp/ or /var/tmp/
- Files in user home directory (~/)
- Files in system directories (/usr/local/, /opt/)
- Hidden files (.*) and cache directories (.cache/)
- Lock files (.lock, .pid)

**How to Detect:**
```bash
# Before execution (sorted so the two listings compare line by line)
find /path -type f | sort > /tmp/before-files.txt

# After execution
find /path -type f | sort > /tmp/after-files.txt

# Compare: lines starting with ">" exist only in the "after" listing
diff /tmp/before-files.txt /tmp/after-files.txt | grep "^>" | sed 's/^> //'
```

**Expected Behavior:**
- ✅ Temporary files in /tmp cleaned up before exit
- ✅ Output files in current directory or specified location
- ✅ Cache files in ~/.cache/skill-name/ (acceptable)
- ❌ Random files scattered across filesystem
- ❌ Files in system directories without explicit permission

**Severity:**
- **LOW**: Cache files in proper location
- **MEDIUM**: Temp files not cleaned up
- **HIGH**: Files in system directories
- **CRITICAL**: Files overwriting existing user data
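
The before/after comparison can also be done in-process, which avoids writing the listings to /tmp (where they would themselves show up as new files). The sketch below is one way to do it; `snapshot_files` and `diff_snapshots` are hypothetical helper names, not part of the skill.

```python
from pathlib import Path


def snapshot_files(root):
    """Return the set of file paths under `root`, relative to it."""
    root = Path(root)
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def diff_snapshots(before, after):
    """Classify paths as created or deleted between two snapshots."""
    return {
        "created": sorted(after - before),
        "deleted": sorted(before - after),
    }


# Usage: take a snapshot, run the skill, take another, then compare.
before = snapshot_files(".")
# ... run the skill under test here ...
after = snapshot_files(".")
print(diff_snapshots(before, after))
```

Set difference handles both directions at once, so the same pair of snapshots covers created and deleted files.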

### Files Modified

**What to Check:**
- Project files (package.json, tsconfig.json, etc.)
- Configuration files (.env, .config/)
- System configs (/etc/*)
- User configs (~/.bashrc, ~/.zshrc)
- Git repository files (.git/)

**How to Detect:**
```bash
# Take checksums before (sorted by file name so the listings line up)
find /path -type f -exec md5sum {} + | sort -k 2 > /tmp/before-checksums.txt

# After execution
find /path -type f -exec md5sum {} + | sort -k 2 > /tmp/after-checksums.txt

# Find modified files
diff /tmp/before-checksums.txt /tmp/after-checksums.txt
```

**Expected Behavior:**
- ✅ Only files explicitly in skill's scope modified
- ✅ Backup created before modifying important files
- ✅ Modifications clearly documented in output
- ❌ Configuration files modified without notice
- ❌ Git repository modified unexpectedly
- ❌ System files changed

**Severity:**
- **LOW**: Intended file modifications (skill's purpose)
- **MEDIUM**: Unintended project file changes
- **HIGH**: User config modifications without consent
- **CRITICAL**: System file modifications
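
A checksum comparison can likewise be sketched in Python. This version swaps `md5sum` for SHA-256 from the standard library, which is a substitution made here for portability, not something the checklist prescribes; `checksum_tree` and `modified_files` are hypothetical names.

```python
import hashlib
from pathlib import Path


def checksum_tree(root):
    """Map each file under `root` (relative path) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }


def modified_files(before, after):
    """Return files present in both snapshots whose contents changed."""
    return sorted(name for name in before.keys() & after.keys()
                  if before[name] != after[name])


# Usage: checksum before and after running the skill, then compare.
before = checksum_tree(".")
# ... run the skill under test here ...
after = checksum_tree(".")
print(modified_files(before, after))
```

Files that exist in only one snapshot are ignored here on purpose, since created and deleted files are covered by the file-listing comparison. Reading whole files into memory is fine for small projects; large trees would call for chunked hashing.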

### Files Deleted

**What to Check:**
- Files in skill scope (expected deletions)
- Temp files created by skill
- User files outside skill scope
- System files

**How to Detect:**
```bash
# Compare before/after file lists: lines starting with "<" exist only in the "before" listing
diff /tmp/before-files.txt /tmp/after-files.txt | grep "^<" | sed 's/^< //'
```

**Expected Behavior:**
- ✅ Only temporary files created by skill deleted
- ✅ Deletions are part of skill's documented purpose
- ❌ User files deleted without explicit permission
- ❌ Project files deleted accidentally
- ❌ System files deleted

**Severity:**
- **LOW**: Skill's own temp files deleted (cleanup)
- **MEDIUM**: Unexpected file deletions in project
- **HIGH**: User files deleted
- **CRITICAL**: System files or important data deleted

### Directory Changes

**What to Check:**
- New directories created
- Working directory changed
- Directories removed

**How to Detect:**
```bash
# List directories before execution
find /path -type d | sort > /tmp/before-dirs.txt

# Run the skill, then list directories again and compare
find /path -type d | sort > /tmp/after-dirs.txt
diff /tmp/before-dirs.txt /tmp/after-dirs.txt
```

**Expected Behavior:**
- ✅ Directories created for skill output
- ✅ Temp directories in /tmp
- ✅ Working directory restored after operations
- ❌ Empty directories left behind
- ❌ Directories created in unexpected locations

## 2. Process Management

### Processes Created

**What to Check:**
- Foreground processes (should complete)
- Background processes (daemons, services)
- Child processes (spawned by skill)
- Zombie processes

**How to Detect:**
```bash
# Before execution
ps aux > /tmp/before-processes.txt

# After execution (wait 30 seconds)
|
||||
sleep 30
|
||||
ps aux > /tmp/after-processes.txt
|
||||
|
||||
# Find new processes
|
||||
diff /tmp/before-processes.txt /tmp/after-processes.txt | grep "^>"
|
||||
```
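
Diffing full `ps aux` lines also flags every existing process whose CPU time or memory figures changed between snapshots. A sketch that compares process IDs only is shown here, under the assumption that the skill does not run as a child of the shell taking the snapshots (children of that shell are filtered out so the snapshot pipeline does not report itself). `snapshot_pids` and `new_processes` are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch only: helper names are illustrative, not part of the skill.

# Record "PID COMMAND" for every process except children of this shell
# (which would otherwise include the ps/awk/sort of the snapshot itself).
snapshot_pids() {
  local out="$1" self="$BASHPID"
  ps -A -o pid= -o ppid= -o args= |
    awk -v self="$self" '$2 != self { pid = $1; $1 = $2 = ""; sub(/^ +/, ""); print pid, $0 }' |
    sort -n > "$out"
}

# Print processes whose PID appears only in the second snapshot.
new_processes() {
  awk 'FILENAME == ARGV[1] { seen[$1]; next } !($1 in seen)' "$1" "$2"
}
```

Usage: `snapshot_pids /tmp/before-pids.txt`, run the skill, wait, then `snapshot_pids /tmp/after-pids.txt` and `new_processes /tmp/before-pids.txt /tmp/after-pids.txt`.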
|
||||
|
||||
**Expected Behavior:**
|
||||
- ✅ All skill processes complete and exit
|
||||
- ✅ No orphaned child processes
|
||||
- ✅ Background services documented if needed
|
||||
- ❌ Processes still running after skill exits
|
||||
- ❌ Zombie processes
|
||||
- ❌ High CPU/memory usage processes
|
||||
|
||||
**Severity:**
|
||||
- **LOW**: Short-lived child processes that exit cleanly
|
||||
- **MEDIUM**: Background processes that should have been stopped
|
||||
- **HIGH**: Orphaned processes consuming resources
|
||||
- **CRITICAL**: Runaway processes (infinite loops, memory leaks)
|
||||
|
||||
### Process Resource Usage
|
||||
|
||||
**What to Check:**
|
||||
- CPU usage during and after execution
|
||||
- Memory consumption
|
||||
- Disk I/O
|
||||
- Network I/O
|
||||
|
||||
**How to Detect:**
|
||||
```bash
|
||||
# Monitor during execution
|
||||
top -b -n 1 > /tmp/resource-usage.txt
|
||||
|
||||
# Or use htop, ps aux, etc.
|
||||
```
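
A single `top` snapshot can miss a spike. One way to track a specific process over time is to sample it with `ps`, as in this sketch. The function names are hypothetical, the one-second interval is an arbitrary choice, and `rss` is reported by `ps` in kilobytes.

```bash
#!/usr/bin/env bash
# Sketch only: helper names are illustrative, not part of the skill.

# Print "rss_kb cpu_percent" for one PID; fail if the process is gone.
sample_usage() {
  local pid="$1" line
  line=$(ps -o rss= -o pcpu= -p "$pid") || return 1
  [ -n "$line" ] || return 1
  set -- $line            # split on whitespace into two fields
  echo "$1 $2"
}

# Sample a PID once per second, up to $2 times (default 30),
# and print the highest resident set size seen, in KB.
peak_rss() {
  local pid="$1" samples="${2:-30}" peak=0 rss cpu i
  for ((i = 0; i < samples; i++)); do
    read -r rss cpu < <(sample_usage "$pid") || break
    (( rss > peak )) && peak=$rss
    sleep 1
  done
  echo "$peak"
}
```

The peak value can be compared against the thresholds in the severity list that follows (for example, 1GB is 1048576 KB).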
|
||||
|
||||
**Expected Behavior:**
|
||||
- ✅ Reasonable resource usage for task
|
||||
- ✅ Resources released after completion
|
||||
- ❌ 100% CPU for extended time
|
||||
- ❌ Memory leaks (growing usage)
|
||||
- ❌ Excessive disk I/O
|
||||
|
||||
**Severity:**
|
||||
- **LOW**: Temporary spike during execution
|
||||
- **MEDIUM**: Higher than expected but acceptable
|
||||
- **HIGH**: Excessive usage (> 80% CPU, > 1GB RAM)
|
||||
- **CRITICAL**: Resource exhaustion (OOM, disk full)
|
||||
|
||||
## 3. System Configuration
|
||||
|
||||
### Environment Variables
|
||||
|
||||
**What to Check:**
|
||||
- New environment variables set
|
||||
- Modified PATH, HOME, etc.
|
||||
- Shell configuration changes
|
||||
|
||||
**How to Detect:**
|
||||
```bash
|
||||
# Before
|
||||
env | sort > /tmp/before-env.txt
|
||||
|
||||
# After
|
||||
env | sort > /tmp/after-env.txt
|
||||
|
||||
# Compare
|
||||
diff /tmp/before-env.txt /tmp/after-env.txt
|
||||
```
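
Note that a child process cannot change its parent's environment, so this comparison only detects changes made by commands that ran in the same shell; persistent changes usually appear as edits to `~/.bashrc` or `~/.zshrc` instead. To reduce the diff to a list of affected variable names, something like the following sketch may help. `env_changes` is a hypothetical helper, and it assumes both snapshots were produced with `env | sort` and contain no multi-line values.

```bash
#!/usr/bin/env bash
# Sketch only: helper name is illustrative, not part of the skill.

# Print the names of variables that were added or whose value changed.
env_changes() {
  local before="$1" after="$2"
  # Lines found only in the "after" snapshot are new or modified variables.
  comm -13 "$before" "$after" | cut -d= -f1 | sort -u
}
```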
|
||||
|
||||
**Expected Behavior:**
|
||||
- ✅ No permanent environment changes
|
||||
- ✅ Temporary env vars for skill only
|
||||
- ❌ PATH modified globally
|
||||
- ❌ System env vars changed
|
||||
|
||||
**Severity:**
|
||||
- **LOW**: Temporary env vars in skill scope
|
||||
- **MEDIUM**: PATH modified in current shell
|
||||
- **HIGH**: .bashrc/.zshrc modified
|
||||
- **CRITICAL**: System-wide env changes
|
||||
|
||||
### System Services
|
||||
|
||||
**What to Check:**
|
||||
- Systemd services started
|
||||
- Cron jobs created
|
||||
- Launch agents/daemons (macOS)
|
||||
|
||||
**How to Detect:**
|
||||
```bash
|
||||
# Linux
|
||||
systemctl list-units --type=service > /tmp/before-services.txt
|
||||
# After
|
||||
systemctl list-units --type=service > /tmp/after-services.txt
|
||||
diff /tmp/before-services.txt /tmp/after-services.txt
|
||||
|
||||
# Cron jobs
|
||||
crontab -l > /tmp/before-cron.txt
|
||||
# After
|
||||
crontab -l > /tmp/after-cron.txt
diff /tmp/before-cron.txt /tmp/after-cron.txt
|
||||
```
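
`crontab -l` exits with an error when the user has no crontab, which is easy to mistake for a failure of the check itself. A tolerant variant could look like this sketch, where an empty snapshot file simply means "no crontab". The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: helper names are illustrative, not part of the skill.

# Save the current user's crontab to $1; leave an empty file if there is none.
snapshot_cron() {
  crontab -l > "$1" 2>/dev/null || : > "$1"
}

# Print cron entries that appear only in the second snapshot.
# -F treats entries as literal strings, so "*" fields are not patterns.
new_cron_entries() {
  grep -Fxv -f "$1" "$2" || true
}
```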
|
||||
|
||||
**Expected Behavior:**
|
||||
- ✅ No services unless explicitly documented
|
||||
- ✅ Services stopped after skill exits
|
||||
- ❌ Services left running
|
||||
- ❌ Cron jobs created without consent
|
||||
|
||||
**Severity:**
|
||||
- **MEDIUM**: Services that should have been stopped
|
||||
- **HIGH**: Unexpected service installations
|
||||
- **CRITICAL**: System services modified
|
||||
|
||||
### Package Installations
|
||||
|
||||
**What to Check:**
|
||||
- NPM packages (global)
|
||||
- Python packages (pip)
|
||||
- System packages (apt, brew)
|
||||
- Ruby gems, Go modules, etc.
|
||||
|
||||
**How to Detect:**
|
||||
```bash
|
||||
# NPM global packages
|
||||
npm list -g --depth=0 > /tmp/before-npm.txt
|
||||
# After
|
||||
npm list -g --depth=0 > /tmp/after-npm.txt
|
||||
diff /tmp/before-npm.txt /tmp/after-npm.txt
|
||||
|
||||
# System packages (Debian/Ubuntu)
|
||||
dpkg -l > /tmp/before-packages.txt
|
||||
# After
|
||||
dpkg -l > /tmp/after-packages.txt
diff /tmp/before-packages.txt /tmp/after-packages.txt
|
||||
```
|
||||
|
||||
**Expected Behavior:**
|
||||
- ✅ All dependencies documented in README
|
||||
- ✅ Local installations (in project directory)
|
||||
- ❌ Global package installations without notice
|
||||
- ❌ System package changes
|
||||
|
||||
**Severity:**
|
||||
- **LOW**: Local project dependencies
|
||||
- **MEDIUM**: Global NPM packages (if documented)
|
||||
- **HIGH**: System packages installed
|
||||
- **CRITICAL**: Conflicting package versions
|
||||
|
||||
## 4. Network Activity
|
||||
|
||||
### Connections Established
|
||||
|
||||
**What to Check:**
|
||||
- HTTP/HTTPS requests
|
||||
- WebSocket connections
|
||||
- Database connections
|
||||
- SSH connections
|
||||
|
||||
**How to Detect:**
|
||||
```bash
|
||||
# Monitor network during execution
|
||||
# macOS
|
||||
lsof -i -n -P | grep <skill-process>
|
||||
|
||||
# Linux
|
||||
netstat -tupn | grep <skill-process>
|
||||
|
||||
# Or use tcpdump, wireshark for detailed analysis
|
||||
```
|
||||
|
||||
**Expected Behavior:**
|
||||
- ✅ All network requests documented
|
||||
- ✅ HTTPS used for sensitive data
|
||||
- ✅ Connections properly closed
|
||||
- ❌ Unexpected outbound connections
|
||||
- ❌ Data sent to unknown servers
|
||||
- ❌ Connections left open
|
||||
|
||||
**Severity:**
|
||||
- **LOW**: Documented API calls (HTTPS)
|
||||
- **MEDIUM**: HTTP requests (not HTTPS)
|
||||
- **HIGH**: Unexpected network destinations
|
||||
- **CRITICAL**: Data exfiltration attempts
|
||||
|
||||
### Data Transmitted
|
||||
|
||||
**What to Check:**
|
||||
- API payloads
|
||||
- File uploads/downloads
|
||||
- Metrics/telemetry data
|
||||
|
||||
**Expected Behavior:**
|
||||
- ✅ Clear documentation of what's sent
|
||||
- ✅ User consent for data transmission
|
||||
- ✅ No sensitive data in plaintext
|
||||
- ❌ Telemetry without consent
|
||||
- ❌ Credentials sent over HTTP
|
||||
|
||||
## 5. Database & State
|
||||
|
||||
### Database Changes
|
||||
|
||||
**What to Check:**
|
||||
- Tables created/dropped
|
||||
- Records inserted/updated/deleted
|
||||
- Schema migrations
|
||||
- Indexes created
|
||||
|
||||
**How to Detect:**
|
||||
```sql
|
||||
-- Before (SQLite example)
|
||||
SELECT * FROM sqlite_master WHERE type='table';
|
||||
|
||||
-- After
|
||||
SELECT * FROM sqlite_master WHERE type='table';
|
||||
|
||||
-- Record counts
|
||||
SELECT COUNT(*) FROM each_table;
|
||||
```
|
||||
|
||||
**Expected Behavior:**
|
||||
- ✅ Changes are part of skill's purpose
|
||||
- ✅ Backup created before modifications
|
||||
- ✅ Transactions used (rollback on error)
|
||||
- ❌ Unexpected table drops
|
||||
- ❌ Data loss without backup
|
||||
- ❌ Schema changes without migration docs
|
||||
|
||||
### Cache & Session State
|
||||
|
||||
**What to Check:**
|
||||
- Redis/Memcached keys
|
||||
- Session files
|
||||
- Browser storage (if skill uses web UI)
|
||||
|
||||
**Expected Behavior:**
|
||||
- ✅ Cache properly namespaced
|
||||
- ✅ Expired sessions cleaned up
|
||||
- ❌ Cache pollution
|
||||
- ❌ Stale session files
|
||||
|
||||
## 6. Permissions & Security
|
||||
|
||||
### File Permissions
|
||||
|
||||
**What to Check:**
|
||||
- File permission changes (chmod)
|
||||
- Ownership changes (chown)
|
||||
- ACL modifications
|
||||
|
||||
**How to Detect:**
|
||||
```bash
|
||||
# Before
|
||||
ls -la /path > /tmp/before-perms.txt
|
||||
|
||||
# After
|
||||
ls -la /path > /tmp/after-perms.txt
|
||||
diff /tmp/before-perms.txt /tmp/after-perms.txt
|
||||
```
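
`ls -la` lists a single directory level, so permission changes deeper in the tree go unnoticed. A recursive sketch is given here, assuming GNU `stat` (the `-c` format option differs on macOS). `world_writable_files` and `snapshot_perms` are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch only: helper names are illustrative, not part of the skill.

# List regular files under $1 that any user can write to.
world_writable_files() {
  find "$1" -type f -perm -0002
}

# Record "mode owner:group path" for everything under $1 into $2,
# sorted by path, so a diff shows permission or ownership changes.
snapshot_perms() {
  find "$1" -exec stat -c '%a %U:%G %n' {} + | sort -k 3 > "$2"
}
```

Running `snapshot_perms` before and after the skill and diffing the two files covers both `chmod` and `chown` changes.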
|
||||
|
||||
**Expected Behavior:**
|
||||
- ✅ Appropriate permissions for created files
|
||||
- ✅ No overly permissive files (777)
|
||||
- ❌ Permissions changed on existing files
|
||||
- ❌ World-writable files created
|
||||
|
||||
**Severity:**
|
||||
- **MEDIUM**: Overly restrictive permissions
|
||||
- **HIGH**: Overly permissive permissions (777)
|
||||
- **CRITICAL**: System file permission changes
|
||||
|
||||
### Security Credentials
|
||||
|
||||
**What to Check:**
|
||||
- API keys in files or logs
|
||||
- Passwords in plaintext
|
||||
- Certificates/keys created
|
||||
- SSH keys modified
|
||||
|
||||
**Expected Behavior:**
|
||||
- ✅ Credentials stored securely (keychain, vault)
|
||||
- ✅ No credentials in logs or temp files
|
||||
- ❌ API keys in plaintext files
|
||||
- ❌ Passwords in shell history
|
||||
- ❌ Private keys with wrong permissions
|
||||
|
||||
**Severity:**
|
||||
- **HIGH**: Credentials in files
|
||||
- **CRITICAL**: Credentials exposed to other users
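
No detection command is given for credentials, so here is a rough sketch of one possible check: searching files the skill created or modified for strings that look like secrets. The patterns are illustrative examples only and will produce both false positives and false negatives; `scan_for_credentials` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch only: patterns and helper name are illustrative assumptions.

# List text files under $1 that contain something resembling a credential.
scan_for_credentials() {
  grep -rIlEi \
    -e 'BEGIN [A-Z ]*PRIVATE KEY' \
    -e '(api[_-]?key|secret|password|token)[[:space:]]*[:=][[:space:]]*[^[:space:]]+' \
    "$1" 2>/dev/null
}
```

Point it at the skill's output directory, its log files, and any temp directories it used.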
|
||||
|
||||
## Automated Detection Script
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
# side-effect-detector.sh
|
||||
|
||||
BEFORE_DIR="/tmp/skill-test-before"
|
||||
AFTER_DIR="/tmp/skill-test-after"
|
||||
|
||||
mkdir -p "$BEFORE_DIR" "$AFTER_DIR"
|
||||
|
||||
# Capture before state
|
||||
capture_state() {
|
||||
local DIR="$1"
|
||||
find /tmp -type f > "$DIR/tmp-files.txt"
|
||||
ps aux > "$DIR/processes.txt"
|
||||
env | sort > "$DIR/env.txt"
|
||||
npm list -g --depth=0 > "$DIR/npm-global.txt" 2>/dev/null
|
||||
netstat -tupn > "$DIR/network.txt" 2>/dev/null
|
||||
# Add more as needed
|
||||
}
|
||||
|
||||
# Before
|
||||
capture_state "$BEFORE_DIR"
|
||||
|
||||
# Run skill
|
||||
echo "Execute skill now..."
|
||||
read -p "Press enter when skill completes..."
|
||||
|
||||
# After
|
||||
capture_state "$AFTER_DIR"
|
||||
|
||||
# Compare
|
||||
echo "=== Side Effects Detected ==="
|
||||
echo ""
|
||||
echo "Files in /tmp:"
|
||||
diff "$BEFORE_DIR/tmp-files.txt" "$AFTER_DIR/tmp-files.txt" | grep "^>" | wc -l
|
||||
|
||||
echo "Processes:"
|
||||
diff "$BEFORE_DIR/processes.txt" "$AFTER_DIR/processes.txt" | grep "^>" | head -5
|
||||
|
||||
echo "Environment variables:"
|
||||
diff "$BEFORE_DIR/env.txt" "$AFTER_DIR/env.txt"
|
||||
|
||||
echo "NPM global packages:"
|
||||
diff "$BEFORE_DIR/npm-global.txt" "$AFTER_DIR/npm-global.txt"
|
||||
|
||||
# Detailed reports
|
||||
echo ""
|
||||
echo "Full reports in: $BEFORE_DIR and $AFTER_DIR"
|
||||
```
|
||||
|
||||
## Reporting Template
|
||||
|
||||
```markdown
|
||||
## Side Effects Report
|
||||
|
||||
### Filesystem Changes
|
||||
- **Files Created**: X files
|
||||
- /tmp/skill-temp-123.log (5KB)
|
||||
- ~/.cache/skill-name/data.json (15KB)
|
||||
- **Files Modified**: Y files
|
||||
- package.json (version updated)
|
||||
- **Files Deleted**: Z files
|
||||
- /tmp/old-cache.json
|
||||
|
||||
### Process Management
|
||||
- **Processes Created**: N
|
||||
- **Orphaned Processes**: M (list if > 0)
|
||||
- **Resource Usage**: Peak 45% CPU, 128MB RAM
|
||||
|
||||
### System Configuration
|
||||
- **Env Vars Changed**: None
|
||||
- **Services Started**: None
|
||||
- **Packages Installed**: jq (1.6)
|
||||
|
||||
### Network Activity
|
||||
- **Connections**: 3 HTTPS requests to api.example.com
|
||||
- **Data Transmitted**: 1.2KB (API calls)
|
||||
|
||||
### Database Changes
|
||||
- **Tables**: 1 created (skill_cache)
|
||||
- **Records**: 15 inserted
|
||||
|
||||
### Security
|
||||
- **Permissions**: All files 644 (appropriate)
|
||||
- **Credentials**: No sensitive data detected
|
||||
|
||||
### Overall Assessment
|
||||
✅ Cleanup: Mostly clean (3 temp files remaining)
|
||||
⚠️ Documentation: Missing jq dependency in README
|
||||
✅ Security: No issues
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
**Remember:** The goal is not zero side effects (that's impossible for useful skills), but **documented, intentional, and cleaned-up** side effects. Every side effect should be either part of the skill's purpose or properly cleaned up on exit.
|
||||
292
skills/meta/skill-isolation-tester/modes/mode1-git-worktree.md
Normal file
@@ -0,0 +1,292 @@
|
||||
# Mode 1: Git Worktree Isolation
|
||||
|
||||
## When to Use
|
||||
|
||||
**Best for:**
|
||||
- Read-only skills or skills with minimal file operations
|
||||
- Quick validation during development
|
||||
- Skills that don't require system package installation
|
||||
- Testing iterations where speed matters
|
||||
|
||||
**Not suitable for:**
|
||||
- Skills that install packages outside the project (npm install -g, apt-get, brew, etc.)
|
||||
- Skills that modify system configurations
|
||||
- Skills that require a clean Node.js environment
|
||||
|
||||
**Risk Level**: Low complexity skills only
|
||||
|
||||
## Advantages
|
||||
|
||||
- ⚡ **Fast**: Creates worktree in seconds
|
||||
- 💾 **Efficient**: Shares git history, minimal disk space
|
||||
- 🔄 **Repeatable**: Easy to create, test, and destroy
|
||||
- 🛠️ **Familiar**: Same git tools you already know
|
||||
|
||||
## Limitations
|
||||
|
||||
- ❌ Shares globally installed packages and system tools (global npm packages, system binaries)
|
||||
- ❌ Shares environment variables and configs
|
||||
- ❌ Same OS user and permissions
|
||||
- ❌ Cannot test system-level dependencies
|
||||
- ⚠️ Not true isolation - just a separate git checkout
|
||||
|
||||
## Prerequisites
|
||||
|
||||
1. Must be in a git repository
|
||||
2. Git worktree feature available (Git 2.5+; the `git worktree remove` command used in cleanup requires Git 2.17+)
|
||||
3. Clean working directory (or willing to proceed with uncommitted changes)
|
||||
4. Sufficient disk space for additional worktree
|
||||
|
||||
## Workflow
|
||||
|
||||
### Step 1: Validate Environment
|
||||
|
||||
```bash
|
||||
# Check if in git repo
|
||||
git rev-parse --is-inside-work-tree
|
||||
|
||||
# Check for uncommitted changes
|
||||
git status --porcelain
|
||||
|
||||
# Get current repo name
|
||||
basename "$(git rev-parse --show-toplevel)"
|
||||
```
|
||||
|
||||
If the working directory is dirty, warn the user but allow them to proceed: the worktree is checked out from the last commit, so uncommitted changes are not carried into it.
|
||||
|
||||
### Step 2: Create Isolation Worktree
|
||||
|
||||
**Generate unique branch name:**
|
||||
```bash
|
||||
BRANCH_NAME="test-skill-$(date +%s)" # e.g., test-skill-1699876543
|
||||
```
|
||||
|
||||
**Create worktree:**
|
||||
```bash
|
||||
# Use an absolute path so later steps still work after `cd "$WORKTREE_PATH"`
WORKTREE_PATH="$(git rev-parse --show-toplevel)-${BRANCH_NAME}"
|
||||
git worktree add "$WORKTREE_PATH" -b "$BRANCH_NAME"
|
||||
```
|
||||
|
||||
Example result: `/Users/connor/claude-test-skill-1699876543/`
|
||||
|
||||
### Step 3: Copy Skill to Worktree
|
||||
|
||||
```bash
|
||||
# Copy skill directory to worktree's .claude/skills/ (create it first; it may not exist)
mkdir -p "$WORKTREE_PATH/.claude/skills"
cp -r ~/.claude/skills/[skill-name] "$WORKTREE_PATH/.claude/skills/"
|
||||
|
||||
# Or if skill is in current repo
|
||||
cp -r ./skills/[skill-name] "$WORKTREE_PATH/.claude/skills/"
|
||||
```
|
||||
|
||||
**Verify copy:**
|
||||
```bash
|
||||
ls -la "$WORKTREE_PATH/.claude/skills/[skill-name]/"
|
||||
```
|
||||
|
||||
### Step 4: Setup Development Environment
|
||||
|
||||
**Install dependencies if needed:**
|
||||
```bash
|
||||
cd "$WORKTREE_PATH"
|
||||
|
||||
# Detect package manager
|
||||
if [ -f "pnpm-lock.yaml" ]; then
|
||||
pnpm install
|
||||
elif [ -f "yarn.lock" ]; then
|
||||
yarn install
|
||||
elif [ -f "package-lock.json" ]; then
|
||||
npm install
|
||||
fi
|
||||
```
|
||||
|
||||
**Copy environment files (optional):**
|
||||
```bash
|
||||
# Only if skill needs .env for testing (run from the main repository root)
cp .env "$WORKTREE_PATH/.env"
|
||||
```
|
||||
|
||||
### Step 5: Take "Before" Snapshot
|
||||
|
||||
```bash
|
||||
# List all files in worktree
|
||||
find "$WORKTREE_PATH" -type f > /tmp/before-files.txt
|
||||
|
||||
# List running processes (for comparison later)
|
||||
ps aux > /tmp/before-processes.txt
|
||||
|
||||
# Current disk usage
|
||||
du -sh "$WORKTREE_PATH" > /tmp/before-disk.txt
|
||||
```
|
||||
|
||||
### Step 6: Execute Skill in Worktree
|
||||
|
||||
**Open new Claude Code session in worktree:**
|
||||
```bash
|
||||
cd "$WORKTREE_PATH"
|
||||
claude
|
||||
```
|
||||
|
||||
**Run skill with test trigger:**
|
||||
- User manually tests skill with trigger phrases
|
||||
- OR: Use Claude CLI to run skill programmatically (if available)
|
||||
|
||||
**Monitor execution:**
|
||||
- Watch for errors in output
|
||||
- Note execution time
|
||||
- Check resource usage
|
||||
|
||||
### Step 7: Take "After" Snapshot
|
||||
|
||||
```bash
|
||||
# List all files after execution
|
||||
find "$WORKTREE_PATH" -type f > /tmp/after-files.txt
|
||||
|
||||
# Compare before/after
|
||||
diff /tmp/before-files.txt /tmp/after-files.txt > /tmp/file-changes.txt
|
||||
|
||||
# Check for new processes
|
||||
ps aux > /tmp/after-processes.txt
|
||||
diff /tmp/before-processes.txt /tmp/after-processes.txt > /tmp/process-changes.txt
|
||||
|
||||
# Check disk usage
|
||||
du -sh "$WORKTREE_PATH" > /tmp/after-disk.txt
|
||||
```
|
||||
|
||||
### Step 8: Analyze Results
|
||||
|
||||
**Check for side effects:**
|
||||
```bash
|
||||
# Files created
|
||||
grep -c "^>" /tmp/file-changes.txt
|
||||
|
||||
# Files deleted
|
||||
grep -c "^<" /tmp/file-changes.txt
|
||||
|
||||
# New processes (filter out expected ones)
|
||||
# Look for processes related to skill
|
||||
```
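
The before/after file lists only contain names, so they cannot show that an existing file was modified. Because the worktree is a git checkout, `git status --porcelain` can fill that gap. The sketch here counts created, modified, and deleted paths; `worktree_change_counts` is a hypothetical name, and files matched by `.gitignore` are not reported.

```bash
#!/usr/bin/env bash
# Sketch only: helper name is illustrative, not part of the skill.

# Print "created modified deleted" counts for the worktree at $1.
# Untracked files count as created; renames count as modified.
worktree_change_counts() {
  git -C "$1" status --porcelain --untracked-files=all |
    awk '
      /^[?][?]/              { created++;  next }
      substr($0, 1, 2) ~ /D/ { deleted++;  next }
      substr($0, 1, 2) ~ /A/ { created++;  next }
                             { modified++ }
      END { printf "%d %d %d\n", created, modified, deleted }'
}
```

The skill files copied into `.claude/skills/` in Step 3 are untracked, so they are included in the created count unless they are committed or ignored first.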
|
||||
|
||||
**Validate cleanup:**
|
||||
```bash
|
||||
# Check for leftover temp files
|
||||
find "$WORKTREE_PATH" -name "*.tmp" -o -name "*.temp" -o -name ".cache"
|
||||
|
||||
# Check for orphaned processes
|
||||
# Look for processes still running from skill
|
||||
```
|
||||
|
||||
### Step 9: Generate Report
|
||||
|
||||
**Execution Results:**
|
||||
- ✅ Skill completed successfully / ❌ Skill failed with error
|
||||
- ⏱️ Execution time: Xs
|
||||
- 📊 Resource usage: XMB disk, X% CPU
|
||||
|
||||
**Side Effects:**
|
||||
- Files created: [count] (list if < 10)
|
||||
- Files modified: [count]
|
||||
- Processes created: [count]
|
||||
- Temporary files remaining: [count]
|
||||
|
||||
**Dependency Analysis:**
|
||||
- Required tools: [list tools used by skill]
|
||||
- Hardcoded paths: [list any absolute paths found]
|
||||
- Environment variables: [list any ENV vars referenced]
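
The dependency analysis items can be partly gathered with `grep`. This is a rough sketch under the assumption that user-specific absolute paths start with `/Users/` or `/home/` and that environment variables are upper-case; both helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: patterns and helper names are illustrative assumptions.

# Print "file:line:text" for lines that mention a user-specific absolute path.
find_hardcoded_paths() {
  grep -rnIE '(/Users/|/home/)[A-Za-z0-9._-]+' "$1"
}

# Print the unique names of $VAR or ${VAR} references found under $1.
list_env_references() {
  grep -rhoIE '\$\{?[A-Z][A-Z0-9_]*' "$1" | tr -d '${' | sort -u
}
```

Both functions take the skill directory as their only argument, for example `find_hardcoded_paths "$WORKTREE_PATH/.claude/skills/[skill-name]"`.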
|
||||
|
||||
### Step 10: Cleanup
|
||||
|
||||
**Ask user:**
|
||||
```
|
||||
Test complete. Worktree location: $WORKTREE_PATH
|
||||
|
||||
Options:
|
||||
1. Keep worktree for debugging
|
||||
2. Remove worktree and branch
|
||||
3. Remove worktree, keep branch
|
||||
|
||||
Your choice?
|
||||
```
|
||||
|
||||
**Cleanup commands:**
|
||||
```bash
|
||||
# Option 2: Full cleanup
|
||||
git worktree remove "$WORKTREE_PATH"
|
||||
git branch -D "$BRANCH_NAME"
|
||||
|
||||
# Option 3: Keep branch
|
||||
git worktree remove "$WORKTREE_PATH"
|
||||
```
|
||||
|
||||
## Interpreting Results
|
||||
|
||||
### ✅ **PASS** - Ready for git worktree environments
|
||||
- Skill completed without errors
|
||||
- No unexpected file modifications
|
||||
- No orphaned processes
|
||||
- No hardcoded paths detected
|
||||
- Temporary files cleaned up
|
||||
|
||||
### ⚠️ **WARNING** - Works but has minor issues
|
||||
- Skill works but left temporary files
|
||||
- Uses some hardcoded paths (but non-critical)
|
||||
- Performance could be improved
|
||||
- Missing some documentation
|
||||
|
||||
### ❌ **FAIL** - Not ready
|
||||
- Skill crashed or hung
|
||||
- Requires system packages not installed
|
||||
- Modifies files outside skill directory without permission
|
||||
- Creates orphaned processes
|
||||
- Has critical hardcoded paths
|
||||
|
||||
## Common Issues
|
||||
|
||||
### Issue: "Skill not found in Claude"
|
||||
**Cause**: Skill wasn't copied to worktree's .claude/skills/
|
||||
**Fix**: Verify copy command and path
|
||||
|
||||
### Issue: "Permission denied" errors
|
||||
**Cause**: Skill trying to write to protected directories
|
||||
**Fix**: Identify problematic paths, suggest using /tmp or skill directory
|
||||
|
||||
### Issue: "Command not found"
|
||||
**Cause**: Skill depends on system tool not installed
|
||||
**Fix**: Document dependency, suggest adding to skill README
|
||||
|
||||
### Issue: Test results differ from main directory
|
||||
**Cause**: Different node_modules or configs
|
||||
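**Fix**: This is expected. The worktree contains only committed files, so untracked or ignored files (node_modules, .env) are absent until you install or copy them, while global tools and user configs are still shared.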
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Always take before/after snapshots** for accurate comparison
|
||||
2. **Test multiple times** to ensure consistency
|
||||
3. **Check temp directories** (`/tmp`, `/var/tmp`) for leftover files
|
||||
4. **Monitor processes** for at least 30s after skill completes
|
||||
5. **Document all dependencies** found during testing
|
||||
6. **Use relative paths** in skill code, never absolute
|
||||
7. **Cleanup worktrees** regularly to avoid clutter
|
||||
|
||||
## Quick Command Reference
|
||||
|
||||
```bash
|
||||
# Create test worktree
|
||||
git worktree add ../test-branch -b test-branch
|
||||
|
||||
# List all worktrees
|
||||
git worktree list
|
||||
|
||||
# Remove worktree
|
||||
git worktree remove ../test-branch
|
||||
|
||||
# Remove worktree and branch
|
||||
git worktree remove ../test-branch && git branch -D test-branch
|
||||
|
||||
# Find temp files created
|
||||
find /tmp -name "*skill-name*" -mtime -1
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
**Remember:** Git worktree provides quick, lightweight isolation but is NOT true isolation. Use for low-risk skills or fast iteration during development. For skills that modify system state, use Docker or VM modes.
|
||||
468
skills/meta/skill-isolation-tester/modes/mode2-docker.md
Normal file
@@ -0,0 +1,468 @@
|
||||
# Mode 2: Docker Container Isolation
|
||||
|
||||
## Using Docker Helper Library
|
||||
|
||||
**RECOMMENDED:** Use the helper library for robust error handling and cleanup.
|
||||
|
||||
```bash
|
||||
source ~/.claude/skills/skill-isolation-tester/lib/docker-helpers.sh
|
||||
|
||||
# Set cleanup trap (runs automatically on exit)
|
||||
trap cleanup_on_exit EXIT
|
||||
|
||||
# Pre-flight checks
|
||||
preflight_check_docker || exit 1
|
||||
```
|
||||
|
||||
The helper library provides:
|
||||
- Shell command validation (prevents syntax errors)
|
||||
- Retry logic with exponential backoff
|
||||
- Automatic cleanup on exit
|
||||
- Pre-flight Docker environment checks
|
||||
- Safe build and run functions
|
||||
|
||||
See `lib/docker-helpers.sh` for full documentation.
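
To illustrate what "retry logic with exponential backoff" means in practice, here is a generic sketch. It is not the code from `lib/docker-helpers.sh`; the function name `retry_with_backoff` and its argument order are assumptions made for this example.

```bash
#!/usr/bin/env bash
# Sketch only: NOT the helper library's implementation.

# Usage: retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY_SECONDS command [args...]
# Runs the command until it succeeds, doubling the delay after each failure.
retry_with_backoff() {
  local max_attempts="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Giving up after $attempt attempts: $*" >&2
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

For example, `retry_with_backoff 4 2 docker pull ubuntu:22.04` would wait 2, 4, and 8 seconds between its four attempts.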
|
||||
|
||||
---
|
||||
|
||||
## When to Use
|
||||
|
||||
**Best for:**
|
||||
- Skills that install npm/pip packages or system dependencies
|
||||
- Skills that modify configuration files
|
||||
- Medium-risk skills that need OS-level isolation
|
||||
- Testing skills with different Claude Code versions
|
||||
- Reproducible testing environments
|
||||
|
||||
**Not suitable for:**
|
||||
- Skills that require VM operations or nested virtualization
|
||||
- Skills that need GUI access (without X11 forwarding)
|
||||
- Extremely high-risk skills (use VM mode instead)
|
||||
|
||||
**Risk Level**: Low to medium complexity skills
|
||||
|
||||
## Advantages
|
||||
|
||||
- 🏗️ **True OS Isolation**: Complete filesystem and process separation
|
||||
- 📦 **Reproducible**: Same environment every time
|
||||
- 🔒 **Sandboxed**: Limited access to host system
|
||||
- 🎯 **Precise**: Control exactly what's installed
|
||||
- 🗑️ **Clean**: Easy to destroy and recreate
|
||||
|
||||
## Limitations
|
||||
|
||||
- ⏱️ Slower than git worktree (container overhead)
|
||||
- 💾 Requires disk space for images
|
||||
- 🐳 Requires Docker installation and running daemon
|
||||
- ⚙️ More complex setup than worktree
|
||||
- 🔧 May need volume mounts for file access
|
||||
|
||||
## Prerequisites
|
||||
|
||||
1. Docker installed and running (`docker info`)
|
||||
2. Sufficient disk space (~1GB for base image + skill)
|
||||
3. Permissions to run Docker commands
|
||||
4. Internet connection (first time only, to pull images)
|
||||
|
||||
## Workflow
|
||||
|
||||
### Step 1: Validate Docker Environment
|
||||
|
||||
```bash
|
||||
# Check Docker is installed
|
||||
command -v docker || { echo "Docker not installed"; exit 1; }
|
||||
|
||||
# Check Docker daemon is running
|
||||
docker info > /dev/null 2>&1 || { echo "Docker daemon not running"; exit 1; }
|
||||
|
||||
# Check disk space
|
||||
docker system df
|
||||
```
|
||||
|
||||
### Step 2: Choose Base Image
|
||||
|
||||
**Options:**
|
||||
1. **claude-code-base** (preferred if available)
|
||||
- Pre-built image with Claude Code installed
|
||||
- Fastest startup time
|
||||
|
||||
2. **ubuntu:22.04** (fallback)
|
||||
- Install Claude Code manually
|
||||
- More control over environment
|
||||
|
||||
**Check if custom image exists:**
|
||||
```bash
|
||||
docker images | grep claude-code-base
|
||||
```
|
||||
|
||||
### Step 3: Prepare Skill for Container
|
||||
|
||||
**Create temporary directory:**
|
||||
```bash
|
||||
TEST_DIR="/tmp/skill-test-$(date +%s)"
|
||||
mkdir -p "$TEST_DIR"
|
||||
|
||||
# Copy skill to test directory
|
||||
cp -r ~/.claude/skills/[skill-name] "$TEST_DIR/"
|
||||
|
||||
# Create Dockerfile
|
||||
cat > "$TEST_DIR/Dockerfile" <<'EOF'
|
||||
FROM ubuntu:22.04
|
||||
|
||||
# Install system dependencies
|
||||
RUN apt-get update && apt-get install -y \
|
||||
curl \
|
||||
git \
|
||||
nodejs \
|
||||
npm \
|
||||
&& rm -rf /var/lib/apt/lists/*
|
||||
|
||||
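# Install Claude Code (adjust version as needed; Ubuntu 22.04's apt nodejs package is v12, so install a newer Node.js first if Claude Code requires one)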
|
||||
RUN npm install -g @anthropic-ai/claude-code
|
||||
|
||||
# Create directory structure
|
||||
RUN mkdir -p /root/.claude/skills
|
||||
|
||||
# Copy skill
|
||||
COPY [skill-name]/ /root/.claude/skills/[skill-name]/
|
||||
|
||||
# Set working directory
|
||||
WORKDIR /root
|
||||
|
||||
# Default command
|
||||
CMD ["/bin/bash"]
|
||||
EOF
|
||||
```
|
||||
|
||||
### Step 4: Build Docker Image
|
||||
|
||||
```bash
|
||||
cd "$TEST_DIR"
|
||||
|
||||
# Build image with tag
|
||||
docker build -t skill-test:[skill-name] .
|
||||
|
||||
# Verify build succeeded
|
||||
docker images | grep skill-test
|
||||
```
|
||||
|
||||
**Expected build time:** 2-5 minutes (first time), < 30s (cached)
|
||||
|
||||
### Step 5: Take "Before" Snapshot
|
||||
|
||||
**Create container (don't start yet):**
|
||||
```bash
|
||||
CONTAINER_ID=$(docker create \
|
||||
-it \
--name skill-test-$(date +%s) \
|
||||
--memory="512m" \
|
||||
--cpus="1.0" \
|
||||
skill-test:[skill-name])
|
||||
|
||||
echo "Container ID: $CONTAINER_ID"
|
||||
```
|
||||
|
||||
**Snapshot filesystem:**
|
||||
```bash
|
||||
docker export $CONTAINER_ID | tar -t > /tmp/before-files.txt
|
||||
```
|
||||
|
||||
### Step 6: Run Skill in Container
|
||||
|
||||
**Start container interactively:**
|
||||
```bash
|
||||
docker start -ai $CONTAINER_ID
|
||||
```
|
||||
|
||||
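**Or run with a test command** (`claude skill run ... --test` is a placeholder; substitute the command that triggers the skill in your Claude Code version. `--rm` deletes the container on exit, so drop it if you need the Step 7 snapshot):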
|
||||
```bash
|
||||
docker run -it \
|
||||
--name skill-test \
|
||||
--rm \
|
||||
--memory="512m" \
|
||||
--cpus="1.0" \
|
||||
skill-test:[skill-name] \
|
||||
bash -c "claude skill run [skill-name] --test"
|
||||
```
|
||||
|
||||
**Monitor execution:**
|
||||
```bash
|
||||
# In another terminal, watch resource usage
|
||||
docker stats $CONTAINER_ID
|
||||
|
||||
# Watch logs
|
||||
docker logs -f $CONTAINER_ID
|
||||
```
|
||||
|
||||
### Step 7: Take "After" Snapshot
|
||||
|
||||
**Commit container state:**
|
||||
```bash
|
||||
docker commit $CONTAINER_ID skill-test:[skill-name]-after
|
||||
```
|
||||
|
||||
**Export and compare files:**
|
||||
```bash
|
||||
# Export after state
|
||||
docker export $CONTAINER_ID | tar -t > /tmp/after-files.txt
|
||||
|
||||
# Find differences
|
||||
diff /tmp/before-files.txt /tmp/after-files.txt > /tmp/file-changes.txt
|
||||
|
||||
# Count changes
|
||||
echo "Files added: $(grep -c "^>" /tmp/file-changes.txt)"
|
||||
echo "Files removed: $(grep -c "^<" /tmp/file-changes.txt)"
|
||||
```
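
Comparing exported file lists shows added and removed paths but not files whose content changed. `docker diff` reports all three kinds directly, one path per line prefixed with `A` (added), `C` (changed), or `D` (deleted). A small sketch for turning that output into counts follows; `summarize_docker_diff` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch only: helper name is illustrative, not part of the skill.

# Read `docker diff` output on stdin and print "added changed deleted" counts.
summarize_docker_diff() {
  awk '$1 == "A" { a++ }
       $1 == "C" { c++ }
       $1 == "D" { d++ }
       END { printf "%d %d %d\n", a, c, d }'
}

# Usage (assumes CONTAINER_ID is set as in Step 5):
#   docker diff "$CONTAINER_ID" | summarize_docker_diff
```

Directories containing a changed file are themselves listed as `C`, so the changed count is usually higher than the number of edited files.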
|
||||
|
||||
**Check for running processes** (requires the container to still be running):
|
||||
```bash
|
||||
docker exec $CONTAINER_ID ps aux > /tmp/processes.txt
|
||||
```
|
||||
|
||||
### Step 8: Analyze Results
|
||||
|
||||
**Extract skill logs:**
|
||||
```bash
|
||||
docker logs $CONTAINER_ID > /tmp/skill-execution.log
|
||||
|
||||
# Check for errors
|
||||
grep -i "error\|fail\|exception" /tmp/skill-execution.log
|
||||
```
|
||||
|
||||
**Check resource usage:**
|
||||
```bash
|
||||
docker stats --no-stream $CONTAINER_ID
|
||||
```
|
||||
|
||||
**Inspect filesystem changes:**
|
||||
```bash
|
||||
# List files in skill directory
|
||||
docker exec $CONTAINER_ID find /root/.claude/skills/[skill-name] -type f
|
||||
|
||||
# Check temp directories
|
||||
docker exec $CONTAINER_ID find /tmp -name "*skill*" -o -name "*.tmp"
|
||||
|
||||
# Check for leftover processes
|
||||
docker exec $CONTAINER_ID ps aux | grep -v "ps\|bash"
|
||||
```
|
||||
|
||||
**Analyze dependencies:**
|
||||
```bash
|
||||
# Check what packages were installed
|
||||
docker diff $CONTAINER_ID | grep -E "^A /usr|^A /var/lib"
|
||||
|
||||
# Check what commands were executed
|
||||
docker logs $CONTAINER_ID | grep -E "npm install|apt-get|pip install"
|
||||
```
|
||||
|
||||
### Step 9: Generate Report
|
||||
|
||||
**Execution Status:**
|
||||
```markdown
|
||||
## Execution Results
|
||||
|
||||
**Container**: $CONTAINER_ID
|
||||
**Base Image**: ubuntu:22.04
|
||||
**Status**: [Running/Stopped/Exited]
|
||||
**Exit Code**: $(docker inspect $CONTAINER_ID --format='{{.State.ExitCode}}')
|
||||
|
||||
**Resource Usage**:
|
||||
- Memory: XMB / 512MB
|
||||
- CPU: X%
|
||||
- Execution Time: Xs
|
||||
```
|
||||
|
||||
**Side Effects:**
|
||||
```markdown
|
||||
## Filesystem Changes
|
||||
|
||||
Files added: X
|
||||
Files modified: X
|
||||
Files deleted: X
|
||||
|
||||
**Significant changes:**
|
||||
- /tmp/skill-temp-xyz.log (5KB)
|
||||
- /root/.claude/cache/skill-data.json (15KB)
|
||||
```
|
||||
|
||||
**Dependency Analysis:**
|
||||
```markdown
|
||||
## Dependencies Detected
|
||||
|
||||
**System Packages**:
|
||||
- curl (already present)
|
||||
- jq (installed by skill)
|
||||
|
||||
**NPM Packages**:
|
||||
- lodash@4.17.21 (installed)
|
||||
|
||||
**Hardcoded Paths**:
|
||||
⚠️ /root/.claude/config (line 45)
|
||||
→ Use $HOME/.claude/config instead
|
||||
```
|
||||
|
||||
### Step 10: Cleanup
|
||||
|
||||
**Ask user:**
|
||||
```
|
||||
Test complete. Container: $CONTAINER_ID
|
||||
|
||||
Options:
|
||||
1. Keep container for debugging (docker start -ai $CONTAINER_ID)
|
||||
2. Stop container, keep image (can restart later)
|
||||
3. Remove container and image (full cleanup)
|
||||
|
||||
Your choice?
|
||||
```
|
||||
|
||||
**Cleanup commands:**
|
||||
```bash
|
||||
# Option 2: Stop container
|
||||
docker stop $CONTAINER_ID
|
||||
|
||||
# Option 3: Full cleanup
|
||||
docker rm -f $CONTAINER_ID
|
||||
docker rmi skill-test:[skill-name]
|
||||
docker rmi skill-test:[skill-name]-after
|
||||
|
||||
# Cleanup test directory
|
||||
rm -rf "$TEST_DIR"
|
||||
```
|
||||
|
||||
**Cleanup all test containers:**
|
||||
```bash
|
||||
docker ps -a | grep skill-test | awk '{print $1}' | xargs docker rm -f
|
||||
docker images | grep skill-test | awk '{print $3}' | xargs docker rmi -f
|
||||
```
|
||||
|
||||
## Interpreting Results
|
||||
|
||||
### ✅ **PASS** - Production Ready
|
||||
- Container exited with code 0
|
||||
- Skill completed successfully
|
||||
- No excessive resource usage
|
||||
- All dependencies documented
|
||||
- No orphaned processes
|
||||
- Temp files in acceptable locations (/tmp only)
|
||||
|
||||
### ⚠️ **WARNING** - Needs Improvement
|
||||
- Exit code 0 but warnings in logs
|
||||
- Higher than expected resource usage
|
||||
- Some undocumented dependencies
|
||||
- Minor cleanup issues
|
||||
|
||||
### ❌ **FAIL** - Not Ready
|
||||
- Container exited with non-zero code
|
||||
- Skill crashed or hung
|
||||
- Excessive resource usage (> 512MB memory)
|
||||
- Attempted to access resources outside the container
|
||||
- Critical dependencies not documented
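
The PASS / WARNING / FAIL criteria above can be approximated mechanically. This is a rough sketch only: it covers the exit code, peak memory, and log warnings, and leaves the documentation and cleanup checks to manual review. The function name `classify_result` is hypothetical, the 512 MB ceiling mirrors the limit used in this guide, and the 80% "higher than expected" threshold is an illustrative assumption.

```bash
# classify_result EXIT_CODE PEAK_MEM_MB WARNING_COUNT
# Prints PASS, WARNING or FAIL based on three of the criteria listed above.
classify_result() {
  local exit_code="$1" peak_mem_mb="$2" warnings="$3"
  local limit_mb=512                        # memory ceiling used in this guide
  local high_mb=$(( limit_mb * 80 / 100 ))  # assumed "high usage" threshold

  if [ "$exit_code" -ne 0 ] || [ "$peak_mem_mb" -gt "$limit_mb" ]; then
    echo "FAIL"
  elif [ "$warnings" -gt 0 ] || [ "$peak_mem_mb" -gt "$high_mb" ]; then
    echo "WARNING"
  else
    echo "PASS"
  fi
}
```

For example, `classify_result 0 100 0` prints `PASS`, while a zero exit code with three warnings in the logs prints `WARNING`.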
|
||||
|
||||
## Common Issues
|
||||
|
||||
### Issue: "Docker daemon not running"
|
||||
**Fix**:
|
||||
```bash
|
||||
# macOS
|
||||
open -a Docker
|
||||
|
||||
# Linux
|
||||
sudo systemctl start docker
|
||||
```
|
||||
|
||||
### Issue: "Permission denied" when building image
|
||||
**Cause**: User not in docker group
|
||||
**Fix**:
|
||||
```bash
|
||||
# Add user to docker group
|
||||
sudo usermod -aG docker $USER
|
||||
|
||||
# Logout/login or run:
|
||||
newgrp docker
|
||||
```
|
||||
|
||||
### Issue: "No space left on device"
|
||||
**Cause**: Docker disk space full
|
||||
**Fix**:
|
||||
```bash
|
||||
# Clean up old images and containers
|
||||
docker system prune -a
|
||||
|
||||
# Check space
|
||||
docker system df
|
||||
```
|
||||
|
||||
### Issue: Skill requires GUI
|
||||
**Cause**: Skill opens browser or displays graphics
|
||||
**Fix**: Add X11 forwarding or mark skill as requiring GUI
|
||||
|
||||
## Advanced Techniques
|
||||
|
||||
### Volume Mounts for Live Testing
|
||||
|
||||
```bash
|
||||
# Mount skill directory for live editing
|
||||
docker run -it \
|
||||
-v ~/.claude/skills/[skill-name]:/root/.claude/skills/[skill-name] \
|
||||
skill-test:[skill-name]
|
||||
```
|
||||
|
||||
### Custom Network Settings
|
||||
|
||||
```bash
|
||||
# Isolated network (no internet)
|
||||
docker run -it --network=none skill-test:[skill-name]
|
||||
|
||||
# Add NET_ADMIN so network-monitoring tools can run inside the container
|
||||
docker run -it --cap-add=NET_ADMIN skill-test:[skill-name]
|
||||
```
|
||||
|
||||
### Multi-Stage Testing
|
||||
|
||||
```bash
|
||||
# Test with different Node versions
|
||||
docker build -t skill-test:node16 --build-arg NODE_VERSION=16 .
|
||||
docker build -t skill-test:node18 --build-arg NODE_VERSION=18 .
|
||||
docker build -t skill-test:node20 --build-arg NODE_VERSION=20 .
|
||||
```
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Always set resource limits** (`--memory`, `--cpus`) to prevent runaway processes
|
||||
2. **Use `--rm` flag** for auto-cleanup in simple tests
|
||||
3. **Tag images clearly** with skill name and version
|
||||
4. **Cache base images** to speed up subsequent tests
|
||||
5. **Export test results** before removing containers
|
||||
6. **Test with minimal permissions** first, add as needed
|
||||
7. **Document all APT/NPM/PIP installs** found during testing
|
||||
|
||||
## Quick Command Reference
|
||||
|
||||
```bash
|
||||
# Build test image
|
||||
docker build -t skill-test:my-skill .
|
||||
|
||||
# Run with auto-cleanup
|
||||
docker run -it --rm skill-test:my-skill
|
||||
|
||||
# Run with resource limits
|
||||
docker run -it --memory="512m" --cpus="1.0" skill-test:my-skill
|
||||
|
||||
# Check container status
|
||||
docker ps -a | grep skill-test
|
||||
|
||||
# View container logs
|
||||
docker logs <container-id>
|
||||
|
||||
# Execute command in running container
|
||||
docker exec <container-id> <command>
|
||||
|
||||
# Stop and remove all test containers
|
||||
docker ps -a | grep skill-test | awk '{print $1}' | xargs docker rm -f
|
||||
|
||||
# Remove all test images
|
||||
docker images | grep skill-test | awk '{print $3}' | xargs docker rmi
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
**Remember:** Docker provides strong isolation with reproducible environments. Use it for skills that install packages or modify system files. For the highest security, use VM mode instead.
|
||||
565
skills/meta/skill-isolation-tester/modes/mode3-vm.md
Normal file
@@ -0,0 +1,565 @@
|
||||
# Mode 3: VM (Virtual Machine) Isolation
|
||||
|
||||
## When to Use
|
||||
|
||||
**Best for:**
|
||||
- High-risk skills that modify system configurations
|
||||
- Skills that require kernel modules or system services
|
||||
- Testing skills that interact with VMs themselves
|
||||
- Maximum isolation and security
|
||||
- Skills from untrusted sources
|
||||
|
||||
**Not suitable for:**
|
||||
- Quick iteration during development (too slow)
|
||||
- Skills that are obviously safe and read-only
|
||||
- Situations where speed is more important than isolation
|
||||
|
||||
**Risk Level**: Medium- to high-risk skills
|
||||
|
||||
## Advantages
|
||||
|
||||
- 🔒 **Complete Isolation**: Separate kernel, OS, and all resources
|
||||
- 🛡️ **Maximum Security**: Host system is completely protected
|
||||
- 🖥️ **Real OS Environment**: Test on actual Linux/macOS distributions
|
||||
- 📸 **Snapshots**: Easy rollback to clean state
|
||||
- 🧪 **Destructive Testing**: Safe to test potentially dangerous operations
|
||||
|
||||
## Limitations
|
||||
|
||||
- 🐌 **Slow**: Minutes to provision, slower execution
|
||||
- 💾 **Disk Space**: 10-20GB per VM
|
||||
- 💰 **Resource Intensive**: Requires significant RAM and CPU
|
||||
- 🔧 **Complex Setup**: More moving parts to configure
|
||||
- ⏱️ **Longer Feedback Loop**: Not ideal for rapid iteration
|
||||
|
||||
## Prerequisites
|
||||
|
||||
1. Virtualization software installed:
|
||||
- **macOS**: UTM, Parallels, or VMware Fusion
|
||||
- **Linux**: QEMU/KVM, VirtualBox, or virt-manager
|
||||
- **Windows**: VirtualBox, Hyper-V, or VMware Workstation
|
||||
|
||||
2. Base VM image or ISO:
|
||||
- Ubuntu 22.04 LTS (recommended)
|
||||
- Debian 12
|
||||
- Fedora 39
|
||||
|
||||
3. System resources:
|
||||
- 8GB+ host RAM (allocate 2-4GB to VM)
|
||||
- 20GB+ disk space
|
||||
- CPU virtualization enabled (VT-x/AMD-V)
|
||||
|
||||
4. Command-line tools:
|
||||
- **macOS with UTM**: `utmctl` or use UI
|
||||
- **Linux**: `virsh` (libvirt) or `vboxmanage` (VirtualBox)
|
||||
- **Multipass**: `multipass` (cross-platform, recommended)
|
||||
|
||||
## Recommended: Use Multipass
|
||||
|
||||
Multipass is the easiest option for cross-platform VM management:
|
||||
|
||||
```bash
|
||||
# Install Multipass
|
||||
# macOS:
|
||||
brew install --cask multipass
|
||||
|
||||
# Linux:
|
||||
sudo snap install multipass
|
||||
|
||||
# Windows:
|
||||
# Download from https://multipass.run/
|
||||
```
|
||||
|
||||
## Workflow
|
||||
|
||||
### Step 1: Validate Virtualization Environment
|
||||
|
||||
```bash
|
||||
# Check virtualization is enabled (Linux)
|
||||
grep -E 'vmx|svm' /proc/cpuinfo
|
||||
|
||||
# Check Multipass is installed
|
||||
command -v multipass || { echo "Install Multipass"; exit 1; }
|
||||
|
||||
# Check available resources
|
||||
multipass info || echo "First time setup needed"
|
||||
```
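
The CPU flag check above prints every matching line, which is awkward to use in a script. A small sketch of a yes/no variant follows; the function name `has_virt_flags` is hypothetical, and the check only applies to Linux hosts, since `/proc/cpuinfo` does not exist on macOS.

```bash
# has_virt_flags [CPUINFO_FILE]
# Succeeds (exit 0) if the file lists the Intel (vmx) or AMD (svm)
# virtualization flag as a whole word; defaults to /proc/cpuinfo.
has_virt_flags() {
  local cpuinfo="${1:-/proc/cpuinfo}"
  grep -Eqw 'vmx|svm' "$cpuinfo" 2>/dev/null
}

# Usage:
#   has_virt_flags || echo "Enable VT-x/AMD-V in BIOS/UEFI settings"
```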
|
||||
|
||||
### Step 2: Create Base VM
|
||||
|
||||
**Launch clean Ubuntu VM:**
|
||||
```bash
|
||||
VM_NAME="skill-test-$(date +%s)"
|
||||
|
||||
# Launch VM with Multipass
|
||||
multipass launch \
|
||||
--name "$VM_NAME" \
|
||||
--cpus 2 \
|
||||
--memory 2G \
|
||||
--disk 10G \
|
||||
22.04
|
||||
|
||||
# Wait for VM to be ready
|
||||
multipass exec "$VM_NAME" -- cloud-init status --wait
|
||||
```
|
||||
|
||||
**Or use UTM (macOS GUI):**
|
||||
1. Download Ubuntu 22.04 ARM64 ISO
|
||||
2. Create new VM with 2GB RAM, 10GB disk
|
||||
3. Install Ubuntu and setup user
|
||||
4. Note VM name for scripts
|
||||
|
||||
**Or use virsh (Linux CLI):**
|
||||
```bash
|
||||
# Download cloud image
|
||||
wget https://cloud-images.ubuntu.com/releases/22.04/release/ubuntu-22.04-server-cloudimg-amd64.img
|
||||
|
||||
# Create VM
|
||||
virt-install \
|
||||
--name "$VM_NAME" \
|
||||
--memory 2048 \
|
||||
--vcpus 2 \
|
||||
--disk ubuntu-22.04-server-cloudimg-amd64.img \
|
||||
--import \
|
||||
--os-variant ubuntu22.04
|
||||
```
|
||||
|
||||
### Step 3: Install Claude Code in VM
|
||||
|
||||
```bash
|
||||
# Install system dependencies
|
||||
multipass exec "$VM_NAME" -- sudo apt-get update
|
||||
multipass exec "$VM_NAME" -- sudo apt-get install -y \
|
||||
curl \
|
||||
git \
|
||||
nodejs \
|
||||
npm
|
||||
|
||||
# Install Claude Code
|
||||
multipass exec "$VM_NAME" -- sudo npm install -g @anthropic-ai/claude-code
|
||||
|
||||
# Verify installation
|
||||
multipass exec "$VM_NAME" -- which claude
|
||||
```
|
||||
|
||||
### Step 4: Copy Skill to VM
|
||||
|
||||
```bash
|
||||
# Create directory structure
|
||||
multipass exec "$VM_NAME" -- mkdir -p /home/ubuntu/.claude/skills
|
||||
|
||||
# Copy skill to VM
|
||||
multipass transfer --recursive \
|
||||
~/.claude/skills/[skill-name] \
|
||||
"$VM_NAME":/home/ubuntu/.claude/skills/
|
||||
|
||||
# Verify copy
|
||||
multipass exec "$VM_NAME" -- ls -la /home/ubuntu/.claude/skills/[skill-name]
|
||||
```
|
||||
|
||||
### Step 5: Take VM Snapshot
|
||||
|
||||
**With Multipass:**
|
||||
```bash
|
||||
# Older Multipass releases don't support snapshots directly
|
||||
# Instead, we'll capture filesystem state
|
||||
multipass exec "$VM_NAME" -- find /home/ubuntu -type f > /tmp/before-files.txt
|
||||
multipass exec "$VM_NAME" -- dpkg -l > /tmp/before-packages.txt
|
||||
multipass exec "$VM_NAME" -- ps aux > /tmp/before-processes.txt
|
||||
```
|
||||
|
||||
**With UTM (macOS):**
|
||||
```bash
|
||||
# Take a snapshot via the UTM UI; the CLI form below only works if your utmctl provides a snapshot subcommand
|
||||
utmctl snapshot "$VM_NAME" --name "before-skill-test"
|
||||
```
|
||||
|
||||
**With virsh (Linux):**
|
||||
```bash
|
||||
virsh snapshot-create-as "$VM_NAME" before-skill-test "Before skill test"
|
||||
```
|
||||
|
||||
### Step 6: Execute Skill in VM
|
||||
|
||||
**Start Claude Code session in VM:**
|
||||
```bash
|
||||
# Interactive session
|
||||
multipass shell "$VM_NAME"
|
||||
|
||||
# Then inside VM:
|
||||
claude
|
||||
|
||||
# Run skill with trigger phrase
|
||||
```
|
||||
|
||||
**Or execute non-interactively:**
|
||||
```bash
|
||||
# If skill has test command
|
||||
multipass exec "$VM_NAME" -- \
|
||||
bash -c "claude skill run [skill-name] --test"
|
||||
```
|
||||
|
||||
**Monitor from host:**
|
||||
```bash
|
||||
# Watch resource usage
|
||||
multipass info "$VM_NAME" --format json | jq '.info[] | {memory, load}'
|
||||
|
||||
# Tail logs
|
||||
multipass exec "$VM_NAME" -- tail -f /var/log/syslog
|
||||
```
|
||||
|
||||
### Step 7: Take Post-Execution Snapshot
|
||||
|
||||
```bash
|
||||
# Capture filesystem state
|
||||
multipass exec "$VM_NAME" -- find /home/ubuntu -type f > /tmp/after-files.txt
|
||||
multipass exec "$VM_NAME" -- dpkg -l > /tmp/after-packages.txt
|
||||
multipass exec "$VM_NAME" -- ps aux > /tmp/after-processes.txt
|
||||
|
||||
# Compare
|
||||
diff /tmp/before-files.txt /tmp/after-files.txt > /tmp/file-changes.txt
|
||||
diff /tmp/before-packages.txt /tmp/after-packages.txt > /tmp/package-changes.txt
|
||||
diff /tmp/before-processes.txt /tmp/after-processes.txt > /tmp/process-changes.txt
|
||||
```
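
Plain `diff` is sensitive to the order in which `find` lists files, so two listings of an unchanged directory can still produce noise. A minimal sketch of an order-independent comparison is shown below; the function name `list_changes` is hypothetical, and it assumes one path per line in each snapshot file.

```bash
# list_changes BEFORE_FILE AFTER_FILE
# Prints "+ path" for entries only in AFTER_FILE and "- path" for entries
# only in BEFORE_FILE. Inputs are sorted first, so listing order is irrelevant.
list_changes() {
  local before_sorted after_sorted
  before_sorted="$(mktemp)"
  after_sorted="$(mktemp)"
  LC_ALL=C sort "$1" > "$before_sorted"
  LC_ALL=C sort "$2" > "$after_sorted"
  # comm -13: lines unique to the second file; comm -23: unique to the first
  LC_ALL=C comm -13 "$before_sorted" "$after_sorted" | sed 's/^/+ /'
  LC_ALL=C comm -23 "$before_sorted" "$after_sorted" | sed 's/^/- /'
  rm -f "$before_sorted" "$after_sorted"
}

# Usage:
#   list_changes /tmp/before-files.txt /tmp/after-files.txt
```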
|
||||
|
||||
**Snapshot VM state:**
|
||||
```bash
|
||||
# virsh
|
||||
virsh snapshot-create-as "$VM_NAME" after-skill-test "After skill test"
|
||||
|
||||
# UTM (macOS)
|
||||
utmctl snapshot "$VM_NAME" --name "after-skill-test"
|
||||
```
|
||||
|
||||
### Step 8: Analyze Results
|
||||
|
||||
**Extract execution logs:**
|
||||
```bash
|
||||
# Copy Claude Code logs from VM
|
||||
multipass transfer --recursive \
|
||||
"$VM_NAME":/home/ubuntu/.claude/logs/ \
|
||||
/tmp/skill-test-logs/
|
||||
|
||||
# Analyze logs
|
||||
grep -i "error\|warning\|fail" /tmp/skill-test-logs/*.log
|
||||
```
|
||||
|
||||
**Check filesystem changes:**
|
||||
```bash
|
||||
echo "Files added: $(grep -c '^>' /tmp/file-changes.txt)"
|
||||
echo "Files removed: $(grep -c '^<' /tmp/file-changes.txt)"
|
||||
|
||||
# Check for unexpected modifications (only meaningful if the captured file lists also cover /etc and /usr/local)
|
||||
grep "^> /etc/" /tmp/file-changes.txt  # System config changes
|
||||
grep "^> /usr/local/" /tmp/file-changes.txt  # Global installs
|
||||
```
|
||||
|
||||
**Check package changes:**
|
||||
```bash
|
||||
# List newly installed packages
|
||||
grep ">" /tmp/package-changes.txt
|
||||
|
||||
# Check for removed packages
|
||||
grep "<" /tmp/package-changes.txt
|
||||
```
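
Raw `diff` output of two `dpkg -l` listings includes header lines and full description columns. A sketch that reduces each snapshot to `name version` pairs before comparing is shown below. The function names are hypothetical, and the sketch assumes the standard `dpkg -l` layout in which installed packages start with the status `ii`. An upgraded package appears in the output with its new version.

```bash
# installed_packages DPKG_LIST_FILE
# Prints "name version" for every package in the installed ("ii") state.
installed_packages() {
  awk '$1 == "ii" { print $2, $3 }' "$1" | LC_ALL=C sort
}

# new_packages BEFORE_FILE AFTER_FILE
# Prints packages (with versions) present after the test but not before it.
new_packages() {
  local before after
  before="$(mktemp)"
  after="$(mktemp)"
  installed_packages "$1" > "$before"
  installed_packages "$2" > "$after"
  LC_ALL=C comm -13 "$before" "$after"
  rm -f "$before" "$after"
}

# Usage:
#   new_packages /tmp/before-packages.txt /tmp/after-packages.txt
```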
|
||||
|
||||
**Check for orphaned processes:**
|
||||
```bash
|
||||
# Processes still running after skill completion
|
||||
grep ">" /tmp/process-changes.txt | grep -v "ps\|grep\|ssh"
|
||||
```
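
Comparing full `ps aux` lines is noisy because CPU time and memory columns change between snapshots even for unchanged processes. A sketch that compares only the command column follows; the function names are hypothetical, and the sketch assumes the usual 11-column `ps aux` layout where the command starts at field 11.

```bash
# process_commands PS_AUX_FILE
# Prints the unique COMMAND values (field 11 onward) of a saved `ps aux` listing.
process_commands() {
  awk 'NR > 1 {
    cmd = $11
    for (i = 12; i <= NF; i++) cmd = cmd " " $i
    print cmd
  }' "$1" | LC_ALL=C sort -u
}

# new_processes BEFORE_FILE AFTER_FILE
# Prints commands that appear only in the "after" snapshot.
new_processes() {
  local before after
  before="$(mktemp)"
  after="$(mktemp)"
  process_commands "$1" > "$before"
  process_commands "$2" > "$after"
  LC_ALL=C comm -13 "$before" "$after"
  rm -f "$before" "$after"
}

# Usage:
#   new_processes /tmp/before-processes.txt /tmp/after-processes.txt
```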
|
||||
|
||||
**System modifications:**
|
||||
```bash
|
||||
# Check for systemd services
|
||||
multipass exec "$VM_NAME" -- systemctl list-units --type=service --state=running
|
||||
|
||||
# Check for cron jobs
|
||||
multipass exec "$VM_NAME" -- crontab -l
|
||||
|
||||
# Check for environment modifications
|
||||
multipass exec "$VM_NAME" -- cat /etc/environment
|
||||
```
|
||||
|
||||
### Step 9: Generate Comprehensive Report
|
||||
|
||||
```markdown
|
||||
# VM Isolation Test Report: [skill-name]
|
||||
|
||||
## Environment
|
||||
**VM Platform**: Multipass / UTM / virsh
|
||||
**OS**: Ubuntu 22.04 LTS
|
||||
**VM Name**: $VM_NAME
|
||||
**Resources**: 2 vCPU, 2GB RAM, 10GB disk
|
||||
|
||||
## Execution Results
|
||||
**Status**: ✅ Completed successfully
|
||||
**Duration**: 45 seconds
|
||||
**Exit Code**: 0
|
||||
|
||||
## Filesystem Changes
|
||||
**Files Added**: 12
|
||||
- `/home/ubuntu/.claude/cache/skill-data.json` (15KB)
|
||||
- `/tmp/skill-temp-*.log` (3 files, 45KB total)
|
||||
- `/home/ubuntu/.cache/skill-assets/` (8 files, 120KB)
|
||||
|
||||
**Files Modified**: 2
|
||||
- `/home/ubuntu/.claude/config.json` (updated skill registry)
|
||||
- `/home/ubuntu/.bash_history` (normal)
|
||||
|
||||
**Files Deleted**: 0
|
||||
|
||||
## Package Changes
|
||||
**Installed Packages**: 2
|
||||
- `jq` (1.6-2.1ubuntu3)
|
||||
- `tree` (2.0.2-1)
|
||||
|
||||
**Removed Packages**: 0
|
||||
|
||||
## System Modifications
|
||||
✅ No systemd services added
|
||||
✅ No cron jobs created
|
||||
✅ No environment variables modified
|
||||
⚠️ Found leftover temp files in /tmp
|
||||
|
||||
## Process Analysis
|
||||
**Orphaned Processes**: 0
|
||||
**Background Jobs**: 0
|
||||
**Network Connections**: 0
|
||||
|
||||
## Security Assessment
|
||||
✅ No unauthorized file access attempts
|
||||
✅ No privilege escalation attempts
|
||||
✅ No suspicious network activity
|
||||
✅ All operations within user home directory
|
||||
|
||||
## Dependency Analysis
|
||||
**System Packages Required**:
|
||||
- `jq` (for JSON processing) - Not documented in README
|
||||
- `tree` (for directory visualization) - Optional
|
||||
|
||||
**NPM Packages Required**: None beyond Claude Code
|
||||
|
||||
**Hardcoded Paths Detected**:
|
||||
⚠️ `/home/ubuntu/.claude/cache` (line 67)
|
||||
→ Should use `$HOME/.claude/cache` or `~/.claude/cache`
|
||||
|
||||
## Recommendations
|
||||
1. **CRITICAL**: Document `jq` dependency in README.md
|
||||
2. **HIGH**: Fix hardcoded path on line 67
|
||||
3. **MEDIUM**: Clean up /tmp files before skill exits
|
||||
4. **LOW**: Consider making `tree` dependency optional
|
||||
|
||||
## Overall Grade: B (READY with minor fixes)
|
||||
|
||||
**Portability**: 85/100
|
||||
**Cleanliness**: 75/100
|
||||
**Security**: 100/100
|
||||
**Documentation**: 70/100
|
||||
|
||||
**Final Status**: ✅ **APPROVED** for public release after addressing CRITICAL and HIGH priority items
|
||||
```
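
The overall grade in a report like the one above can be computed from the category scores. The sketch below assumes the weights and grading scale from the accompanying test-report template (Execution 25%, Cleanliness 25%, Security 30%, Portability 10%, Documentation 10%; A 90-100, B 80-89, C 70-79, D 60-69, F 0-59). The function names are hypothetical, and the total is rounded down to a whole number.

```bash
# weighted_total EXECUTION CLEANLINESS SECURITY PORTABILITY DOCUMENTATION
# Each argument is an integer score from 0 to 100; prints the weighted total.
weighted_total() {
  echo $(( ($1 * 25 + $2 * 25 + $3 * 30 + $4 * 10 + $5 * 10) / 100 ))
}

# letter_grade TOTAL
# Maps a 0-100 total onto the template's A-F grading scale.
letter_grade() {
  if   [ "$1" -ge 90 ]; then echo "A"
  elif [ "$1" -ge 80 ]; then echo "B"
  elif [ "$1" -ge 70 ]; then echo "C"
  elif [ "$1" -ge 60 ]; then echo "D"
  else echo "F"
  fi
}
```

For instance, with a hypothetical Execution score of 90 alongside the sample report's Cleanliness 75, Security 100, Portability 85, and Documentation 70, `weighted_total 90 75 100 85 70` prints `86`, which `letter_grade` maps to `B`.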
|
||||
|
||||
### Step 10: Cleanup or Preserve
|
||||
|
||||
**Ask user:**
|
||||
```
|
||||
Test complete. VM: $VM_NAME
|
||||
|
||||
Options:
|
||||
1. Keep VM for manual inspection
|
||||
Command: multipass shell $VM_NAME
|
||||
|
||||
2. Stop VM (can restart later)
|
||||
Command: multipass stop $VM_NAME
|
||||
|
||||
3. Delete VM and snapshots (full cleanup)
|
||||
Command: multipass delete $VM_NAME && multipass purge
|
||||
|
||||
4. Rollback to "before" snapshot and retest
|
||||
(virsh/UTM only)
|
||||
|
||||
Your choice?
|
||||
```
|
||||
|
||||
**Cleanup commands:**
|
||||
```bash
|
||||
# Option 2: Stop VM
|
||||
multipass stop "$VM_NAME"
|
||||
|
||||
# Option 3: Full cleanup
|
||||
multipass delete "$VM_NAME"
|
||||
multipass purge
|
||||
|
||||
# Cleanup temp files
|
||||
rm -rf /tmp/skill-test-logs
|
||||
rm -f /tmp/before-*.txt /tmp/after-*.txt /tmp/*-changes.txt
|
||||
```
|
||||
|
||||
## Interpreting Results
|
||||
|
||||
### ✅ **PASS** - Production Ready
|
||||
- VM still bootable after test
|
||||
- Skill completed successfully
|
||||
- No unauthorized system modifications
|
||||
- All dependencies documented
|
||||
- No security issues detected
|
||||
- Clean cleanup (no orphaned resources)
|
||||
|
||||
### ⚠️ **WARNING** - Needs Review
|
||||
- Skill works but left system modifications
|
||||
- Installed undocumented packages
|
||||
- Modified system configs (needs user consent)
|
||||
- Performance issues (high resource usage)
|
||||
|
||||
### ❌ **FAIL** - Not Safe
|
||||
- VM corrupted or unbootable
|
||||
- Skill crashed or hung indefinitely
|
||||
- Unauthorized privilege escalation
|
||||
- Malicious behavior detected
|
||||
- Critical undocumented dependencies
|
||||
- Data exfiltration attempts
|
||||
|
||||
## Common Issues
|
||||
|
||||
### Issue: "Multipass not found"
|
||||
**Fix**:
|
||||
```bash
|
||||
# macOS
|
||||
brew install --cask multipass
|
||||
|
||||
# Linux
|
||||
sudo snap install multipass
|
||||
```
|
||||
|
||||
### Issue: "Virtualization not enabled"
|
||||
**Cause**: VT-x/AMD-V disabled in BIOS
|
||||
**Fix**: Enable virtualization in BIOS/UEFI settings
|
||||
|
||||
### Issue: "Failed to launch VM"
|
||||
**Cause**: Insufficient resources
|
||||
**Fix**:
|
||||
```bash
|
||||
# Reduce VM resources
|
||||
multipass launch --cpus 1 --memory 1G --disk 5G
|
||||
```
|
||||
|
||||
### Issue: "VM network not working"
|
||||
**Cause**: Network bridge issues
|
||||
**Fix**:
|
||||
```bash
|
||||
# Restart Multipass daemon
|
||||
# macOS
|
||||
sudo launchctl kickstart -k system/com.canonical.multipassd
|
||||
|
||||
# Linux
|
||||
sudo systemctl restart snap.multipass.multipassd
|
||||
```
|
||||
|
||||
### Issue: "Can't copy files to VM"
|
||||
**Cause**: SSH/sftp issues
|
||||
**Fix**:
|
||||
```bash
|
||||
# Mount host directory instead
|
||||
multipass mount ~/.claude/skills "$VM_NAME":/mnt/skills
|
||||
```
|
||||
|
||||
## Advanced Techniques
|
||||
|
||||
### Automated Testing Pipeline
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
# test-skill-vm.sh
|
||||
|
||||
SKILL_NAME="$1"
|
||||
VM_NAME="skill-test-$SKILL_NAME-$(date +%s)"
|
||||
|
||||
# Launch VM
|
||||
multipass launch --name "$VM_NAME" 22.04
|
||||
|
||||
# Setup
|
||||
multipass exec "$VM_NAME" -- bash -c "
|
||||
sudo apt-get update
|
||||
sudo apt-get install -y nodejs npm
|
||||
sudo npm install -g @anthropic-ai/claude-code
|
||||
"
|
||||
|
||||
# Copy skill
|
||||
multipass exec "$VM_NAME" -- mkdir -p /home/ubuntu/.claude/skills
multipass transfer --recursive "$HOME/.claude/skills/$SKILL_NAME" "$VM_NAME":/home/ubuntu/.claude/skills/
|
||||
|
||||
# Run test (assumes the skill exposes a test entry point; adjust the command to your Claude Code version)
|
||||
multipass exec "$VM_NAME" -- claude skill test $SKILL_NAME
|
||||
|
||||
# Cleanup
|
||||
multipass delete "$VM_NAME"
|
||||
multipass purge
|
||||
```
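
The pipeline script builds the VM name directly from the skill name, which can fail if the skill name contains characters the hypervisor rejects. The sketch below normalizes the name first. It assumes Multipass instance names should contain only letters, digits, and hyphens (worth checking against your Multipass version), and the function name `vm_name_for` is hypothetical.

```bash
# vm_name_for SKILL_NAME [TIMESTAMP]
# Prints a VM name of the form skill-test-<safe-skill-name>-<timestamp>.
vm_name_for() {
  local skill="$1" ts="${2:-$(date +%s)}"
  local safe
  # Lowercase, replace anything other than a-z/0-9 with "-",
  # collapse repeated hyphens, and trim a leading/trailing hyphen.
  safe="$(printf '%s' "$skill" \
    | tr '[:upper:]' '[:lower:]' \
    | LC_ALL=C sed 's/[^a-z0-9]/-/g; s/--*/-/g; s/^-//; s/-$//')"
  printf 'skill-test-%s-%s\n' "$safe" "$ts"
}

# Usage:
#   VM_NAME="$(vm_name_for "$SKILL_NAME")"
```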
|
||||
|
||||
### Testing on Multiple OS Versions
|
||||
|
||||
```bash
|
||||
# Test on Ubuntu 20.04, 22.04, and 24.04
|
||||
for version in 20.04 22.04 24.04; do
|
||||
VM="skill-test-ubuntu-${version}"
|
||||
multipass launch --name "$VM" $version
|
||||
# ... run tests ...
|
||||
multipass delete "$VM"
|
||||
done
|
||||
```
|
||||
|
||||
### Network Isolation Testing
|
||||
|
||||
```bash
|
||||
# Create VM without internet access (if supported by hypervisor)
|
||||
# Then test if skill fails gracefully without network
|
||||
```
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Always take snapshots** before running skills
|
||||
2. **Test on clean VMs** - don't reuse VMs between tests
|
||||
3. **Monitor resource usage** - catch runaway processes
|
||||
4. **Check system logs** (`/var/log/syslog`) for warnings
|
||||
5. **Test rollback** - ensure VM can be restored
|
||||
6. **Document all system dependencies** found
|
||||
7. **Use minimal VM resources** to catch resource issues
|
||||
8. **Archive test results** before destroying VMs
|
||||
|
||||
## Quick Command Reference
|
||||
|
||||
```bash
|
||||
# Launch VM
|
||||
multipass launch --name test-vm 22.04
|
||||
|
||||
# List VMs
|
||||
multipass list
|
||||
|
||||
# Shell into VM
|
||||
multipass shell test-vm
|
||||
|
||||
# Execute command in VM
|
||||
multipass exec test-vm -- <command>
|
||||
|
||||
# Copy file to VM
|
||||
multipass transfer local-file test-vm:/remote/path
|
||||
|
||||
# Copy file from VM
|
||||
multipass transfer test-vm:/remote/path local-file
|
||||
|
||||
# Stop VM
|
||||
multipass stop test-vm
|
||||
|
||||
# Start VM
|
||||
multipass start test-vm
|
||||
|
||||
# Delete VM
|
||||
multipass delete test-vm && multipass purge
|
||||
|
||||
# VM info
|
||||
multipass info test-vm
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
**Remember:** VM isolation is the gold standard for testing high-risk skills. It's slower, but it provides the strongest isolation of the three modes and accurate testing of system-level behaviors. Use it for skills from untrusted sources or skills that modify system state.
|
||||
408
skills/meta/skill-isolation-tester/templates/test-report.md
Normal file
@@ -0,0 +1,408 @@
|
||||
# Skill Isolation Test Report: {{skill_name}}
|
||||
|
||||
**Generated**: {{timestamp}}
|
||||
**Tester**: {{tester_name}}
|
||||
**Environment**: {{environment}} ({{mode}})
|
||||
**Duration**: {{duration}}
|
||||
|
||||
---
|
||||
|
||||
## Executive Summary
|
||||
|
||||
**Overall Status**: {{status}}
|
||||
**Grade**: {{grade}}
|
||||
**Ready for Release**: {{ready_for_release}}
|
||||
|
||||
### Quick Stats
|
||||
- Execution Status: {{execution_status}}
|
||||
- Side Effects: {{side_effects_count}} detected
|
||||
- Dependencies: {{dependencies_count}} found
|
||||
- Issues: {{issues_high_count}} HIGH, {{issues_medium_count}} MEDIUM, {{issues_low_count}} LOW
|
||||
|
||||
---
|
||||
|
||||
## Test Environment
|
||||
|
||||
**Isolation Mode**: {{mode}}
|
||||
**Platform**: {{platform}}
|
||||
**OS**: {{os_version}}
|
||||
**Resources**: {{resources}}
|
||||
|
||||
{{#if mode_specific_details}}
|
||||
### Mode-Specific Details
|
||||
{{mode_specific_details}}
|
||||
{{/if}}
|
||||
|
||||
---
|
||||
|
||||
## Execution Results
|
||||
|
||||
### Status
|
||||
{{execution_status_icon}} **{{execution_status}}**
|
||||
|
||||
### Details
|
||||
- **Start Time**: {{start_time}}
|
||||
- **End Time**: {{end_time}}
|
||||
- **Duration**: {{duration}}
|
||||
- **Exit Code**: {{exit_code}}
|
||||
|
||||
### Output
|
||||
```
|
||||
{{skill_output}}
|
||||
```
|
||||
|
||||
{{#if execution_errors}}
|
||||
### Errors
|
||||
```
|
||||
{{execution_errors}}
|
||||
```
|
||||
{{/if}}
|
||||
|
||||
### Resource Usage
|
||||
- **Peak CPU**: {{peak_cpu}}%
|
||||
- **Peak Memory**: {{peak_memory}}
|
||||
- **Disk I/O**: {{disk_io}}
|
||||
- **Network**: {{network_usage}}
|
||||
|
||||
---
|
||||
|
||||
## Side Effects Analysis
|
||||
|
||||
### Filesystem Changes
|
||||
|
||||
#### Files Created: {{files_created_count}}
|
||||
{{#each files_created}}
|
||||
- `{{path}}` ({{size}}){{#if temporary}} - TEMPORARY{{/if}}{{#if cleanup_failed}} ⚠️ Not cleaned up{{/if}}
|
||||
{{/each}}
|
||||
|
||||
{{#if files_created_count_zero}}
|
||||
✅ No files created
|
||||
{{/if}}
|
||||
|
||||
#### Files Modified: {{files_modified_count}}
|
||||
{{#each files_modified}}
|
||||
- `{{path}}`{{#if expected}} - Expected{{else}} ⚠️ Unexpected{{/if}}
|
||||
{{/each}}
|
||||
|
||||
{{#if files_modified_count_zero}}
|
||||
✅ No files modified
|
||||
{{/if}}
|
||||
|
||||
#### Files Deleted: {{files_deleted_count}}
|
||||
{{#each files_deleted}}
|
||||
- `{{path}}`{{#if expected}} - Expected{{else}} ⚠️ Unexpected{{/if}}
|
||||
{{/each}}
|
||||
|
||||
{{#if files_deleted_count_zero}}
|
||||
✅ No files deleted
|
||||
{{/if}}
|
||||
|
||||
### Process Management
|
||||
|
||||
#### Processes Created: {{processes_created_count}}
|
||||
{{#each processes}}
|
||||
- PID {{pid}}: `{{command}}`{{#if still_running}} ⚠️ Still running{{/if}}
|
||||
{{/each}}
|
||||
|
||||
{{#if orphaned_processes}}
|
||||
⚠️ **Orphaned Processes**: {{orphaned_processes_count}}
|
||||
{{#each orphaned_processes}}
|
||||
- PID {{pid}}: `{{command}}` ({{runtime}} running)
|
||||
{{/each}}
|
||||
{{/if}}
|
||||
|
||||
{{#if no_process_issues}}
|
||||
✅ All processes completed successfully
|
||||
{{/if}}
|
||||
|
||||
### System Configuration
|
||||
|
||||
#### Environment Variables
|
||||
{{#if env_vars_changed}}
|
||||
{{#each env_vars_changed}}
|
||||
- `{{name}}`: {{before}} → {{after}}
|
||||
{{/each}}
|
||||
{{else}}
|
||||
✅ No environment variable changes
|
||||
{{/if}}
|
||||
|
||||
#### Services & Daemons
|
||||
{{#if services_started}}
|
||||
{{#each services_started}}
|
||||
- `{{name}}` ({{status}}){{#if undocumented}} ⚠️ Undocumented{{/if}}
|
||||
{{/each}}
|
||||
{{else}}
|
||||
✅ No services started
|
||||
{{/if}}
|
||||
|
||||
#### Package Installations
|
||||
{{#if packages_installed}}
|
||||
{{#each packages_installed}}
|
||||
- `{{name}}` ({{version}}){{#if undocumented}} ⚠️ Not documented{{/if}}
|
||||
{{/each}}
|
||||
{{else}}
|
||||
✅ No packages installed
|
||||
{{/if}}
|
||||
|
||||
### Network Activity
|
||||
|
||||
{{#if network_connections}}
|
||||
**Connections**: {{network_connections_count}}
|
||||
{{#each network_connections}}
|
||||
- {{protocol}} to `{{destination}}:{{port}}`{{#if secure}} (HTTPS){{else}} ⚠️ (HTTP){{/if}}
|
||||
{{/each}}
|
||||
|
||||
**Data Transmitted**: {{data_transmitted}}
|
||||
{{else}}
|
||||
✅ No network activity detected
|
||||
{{/if}}
|
||||
|
||||
### Database Changes
|
||||
|
||||
{{#if database_changes}}
|
||||
{{#each database_changes}}
|
||||
- {{type}}: {{description}}
|
||||
{{/each}}
|
||||
{{else}}
|
||||
✅ No database changes
|
||||
{{/if}}
|
||||
|
||||
---
|
||||
|
||||
## Dependency Analysis
|
||||
|
||||
### System Packages Required
|
||||
{{#if system_packages}}
|
||||
{{#each system_packages}}
|
||||
{{#if documented}}✅{{else}}⚠️{{/if}} `{{name}}`{{#if version}} ({{version}}){{/if}}{{#unless documented}} - **Not documented in README**{{/unless}}
|
||||
{{/each}}
|
||||
{{else}}
|
||||
✅ No system package dependencies
|
||||
{{/if}}
|
||||
|
||||
### Language Packages (npm/pip/gem)
|
||||
{{#if language_packages}}
|
||||
{{#each language_packages}}
|
||||
{{#if documented}}✅{{else}}⚠️{{/if}} `{{name}}@{{version}}`{{#unless documented}} - **Not documented**{{/unless}}
|
||||
{{/each}}
|
||||
{{else}}
|
||||
✅ No language package dependencies
|
||||
{{/if}}
|
||||
|
||||
### Runtime Requirements
|
||||
{{#if runtime_requirements}}
|
||||
{{#each runtime_requirements}}
|
||||
- {{name}}: {{requirement}} {{#if met}}✅{{else}}❌{{/if}}
|
||||
{{/each}}
|
||||
{{else}}
|
||||
✅ No special runtime requirements
|
||||
{{/if}}
|
||||
|
||||
---
|
||||
|
||||
## Code Quality Issues
|
||||
|
||||
### Hardcoded Paths Detected
|
||||
{{#if hardcoded_paths}}
|
||||
{{#each hardcoded_paths}}
|
||||
⚠️ `{{path}}` in {{file}}:{{line}}
|
||||
→ **Recommendation**: Use `$HOME` or relative path
|
||||
{{/each}}
|
||||
{{else}}
|
||||
✅ No hardcoded paths detected
|
||||
{{/if}}
|
||||
|
||||
### Security Concerns
|
||||
{{#if security_issues}}
|
||||
{{#each security_issues}}
|
||||
{{severity_icon}} **{{severity}}**: {{description}}
|
||||
Location: {{file}}:{{line}}
|
||||
Recommendation: {{recommendation}}
|
||||
{{/each}}
|
||||
{{else}}
|
||||
✅ No security issues detected
|
||||
{{/if}}
|
||||
|
||||
### Performance Issues
|
||||
{{#if performance_issues}}
|
||||
{{#each performance_issues}}
|
||||
⚠️ {{description}}
|
||||
{{/each}}
|
||||
{{else}}
|
||||
✅ No performance issues detected
|
||||
{{/if}}
|
||||
|
||||
---
|
||||
|
||||
## Portability Assessment
|
||||
|
||||
### Cross-Platform Compatibility
|
||||
- **Linux**: {{linux_compatible}}
|
||||
- **macOS**: {{macos_compatible}}
|
||||
- **Windows**: {{windows_compatible}}
|
||||
|
||||
### Environment Dependencies
|
||||
{{#if env_dependencies}}
|
||||
{{#each env_dependencies}}
|
||||
- {{name}}: {{status}}
|
||||
{{/each}}
|
||||
{{else}}
|
||||
✅ No environment-specific dependencies
|
||||
{{/if}}
|
||||
|
||||
### User-Specific Assumptions
|
||||
{{#if user_assumptions}}
|
||||
{{#each user_assumptions}}
|
||||
⚠️ {{description}}
|
||||
{{/each}}
|
||||
{{else}}
|
||||
✅ No user-specific assumptions
|
||||
{{/if}}
|
||||
|
||||
---
|
||||
|
||||
## Issues Summary
|
||||
|
||||
### 🔴 HIGH Priority ({{issues_high_count}})
|
||||
{{#each issues_high}}
|
||||
{{index}}. **{{title}}**
|
||||
- Impact: {{impact}}
|
||||
- Location: {{location}}
|
||||
- Fix: {{fix_recommendation}}
|
||||
{{/each}}
|
||||
|
||||
{{#if no_high_issues}}
|
||||
✅ No HIGH priority issues
|
||||
{{/if}}
|
||||
|
||||
### 🟡 MEDIUM Priority ({{issues_medium_count}})
|
||||
{{#each issues_medium}}
|
||||
{{index}}. **{{title}}**
|
||||
- Impact: {{impact}}
|
||||
- Location: {{location}}
|
||||
- Fix: {{fix_recommendation}}
|
||||
{{/each}}
|
||||
|
||||
{{#if no_medium_issues}}
|
||||
✅ No MEDIUM priority issues
|
||||
{{/if}}
|
||||
|
||||
### 🟢 LOW Priority ({{issues_low_count}})
|
||||
{{#each issues_low}}
|
||||
{{index}}. **{{title}}**
|
||||
- Impact: {{impact}}
|
||||
- Fix: {{fix_recommendation}}
|
||||
{{/each}}
|
||||
|
||||
{{#if no_low_issues}}
|
||||
✅ No LOW priority issues
|
||||
{{/if}}
|
||||
|
||||
---
|
||||
|
||||
## Recommendations
|
||||
|
||||
### Required Before Release
|
||||
{{#each required_fixes}}
|
||||
{{index}}. {{recommendation}}
|
||||
{{/each}}
|
||||
|
||||
{{#if no_required_fixes}}
|
||||
✅ No required fixes
|
||||
{{/if}}
|
||||
|
||||
### Suggested Improvements
|
||||
{{#each suggested_improvements}}
|
||||
{{index}}. {{recommendation}}
|
||||
{{/each}}
|
||||
|
||||
### Documentation Updates Needed
|
||||
{{#each documentation_updates}}
|
||||
- {{item}}
|
||||
{{/each}}
|
||||
|
||||
---
|
||||
|
||||
## Scoring Breakdown
|
||||
|
||||
| Category | Score | Weight | Weighted Score |
|
||||
|----------|-------|--------|----------------|
|
||||
| **Execution** | {{execution_score}}/100 | 25% | {{execution_weighted}} |
|
||||
| **Cleanliness** | {{cleanliness_score}}/100 | 25% | {{cleanliness_weighted}} |
|
||||
| **Security** | {{security_score}}/100 | 30% | {{security_weighted}} |
|
||||
| **Portability** | {{portability_score}}/100 | 10% | {{portability_weighted}} |
|
||||
| **Documentation** | {{documentation_score}}/100 | 10% | {{documentation_weighted}} |
|
||||
| **TOTAL** | | | **{{total_score}}/100** |
|
||||
|
||||
### Grade: {{grade}}
|
||||
|
||||
**Grading Scale:**
|
||||
- A (90-100): Production ready
|
||||
- B (80-89): Ready with minor fixes
|
||||
- C (70-79): Significant improvements needed
|
||||
- D (60-69): Major issues, not recommended
|
||||
- F (0-59): Not safe to use
|
||||
|
||||
---
|
||||
|
||||
## Test Artifacts
|
||||
|
||||
### Snapshots
|
||||
- Before: `{{snapshot_before_path}}`
|
||||
- After: `{{snapshot_after_path}}`
|
||||
|
||||
### Logs
|
||||
- Execution log: `{{execution_log_path}}`
|
||||
- Side effects log: `{{side_effects_log_path}}`
|
||||
|
||||
### Isolation Environment
|
||||
{{#if environment_preserved}}
|
||||
✅ **Preserved for debugging**
|
||||
|
||||
Access instructions:
|
||||
```bash
|
||||
{{access_command}}
|
||||
```
|
||||
{{else}}
|
||||
🗑️ **Cleaned up**
|
||||
{{/if}}
|
||||
|
||||
---
|
||||
|
||||
## Final Verdict
|
||||
|
||||
### Status: {{final_status}}
|
||||
|
||||
{{#if approved}}
|
||||
✅ **APPROVED for public release**
|
||||
|
||||
This skill has passed isolation testing with acceptable results. Address any remaining HIGH priority issues before release, and consider MEDIUM/LOW priority improvements in future versions.
|
||||
{{/if}}
|
||||
|
||||
{{#if approved_with_fixes}}
|
||||
⚠️ **APPROVED with required fixes**
|
||||
|
||||
This skill will be ready for public release after addressing the {{issues_high_count}} HIGH priority issue(s) listed above. Retest after fixes.
|
||||
{{/if}}
|
||||
|
||||
{{#if not_approved}}
|
||||
❌ **NOT APPROVED**
|
||||
|
||||
This skill has critical issues that must be addressed before public release. Major refactoring or fixes required. Retest after addressing all HIGH priority issues and reviewing MEDIUM priority items.
|
||||
{{/if}}
|
||||
|
||||
### Next Steps
|
||||
|
||||
{{#each next_steps}}
|
||||
{{index}}. {{step}}
|
||||
{{/each}}
|
||||
|
||||
---
|
||||
|
||||
**Test Completed**: {{completion_time}}
|
||||
**Report Version**: 1.0
|
||||
**Tester**: {{tester_name}}
|
||||
|
||||
---
|
||||
|
||||
*This report was generated by skill-isolation-tester*
|
||||
392
skills/meta/skill-isolation-tester/test-templates/README.md
Normal file
@@ -0,0 +1,392 @@
|
||||
# Skill Test Templates
|
||||
|
||||
Production-ready test templates for validating Claude Code skills in isolated environments.
|
||||
|
||||
## Overview
|
||||
|
||||
These templates provide standardized testing workflows for different skill types. Each template includes:
|
||||
- Pre-flight environment validation
|
||||
- Before/after snapshots for comparison
|
||||
- Comprehensive safety and security checks
|
||||
- Detailed reporting with pass/fail criteria
|
||||
- Automatic cleanup on exit (success or failure)
|
||||
|
||||
## CI/CD Integration with JSON Output
|
||||
|
||||
All test templates support JSON output for integration with CI/CD pipelines. The JSON reporter generates:
|
||||
- **Structured JSON** - Machine-readable test results
|
||||
- **JUnit XML** - Compatible with Jenkins, GitLab CI, GitHub Actions
|
||||
- **Markdown Summary** - Human-readable reports for GitHub Actions
|
||||
|
||||
**Enable JSON output:**
|
||||
```bash
|
||||
export JSON_ENABLED=true
|
||||
./test-templates/docker-skill-test-json.sh my-skill
|
||||
```
|
||||
|
||||
**Output files:**
|
||||
- `test-report.json` - Full structured test data
|
||||
- `test-report.junit.xml` - JUnit format for CI systems
|
||||
- `test-report.md` - Markdown summary
|
||||
|
||||
**JSON Report Structure:**
|
||||
```json
|
||||
{
|
||||
"test_name": "docker-skill-test",
|
||||
"skill_name": "my-skill",
|
||||
"timestamp": "2025-11-02T12:00:00Z",
|
||||
"status": "passed",
|
||||
"duration_seconds": 45,
|
||||
"exit_code": 0,
|
||||
"metrics": {
|
||||
"containers_created": 2,
|
||||
"images_created": 1,
|
||||
"execution_duration_seconds": 12
|
||||
},
|
||||
"issues": [],
|
||||
"recommendations": []
|
||||
}
|
||||
```
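
In a CI step, the `status` field can be used to gate the pipeline. The following is a minimal sketch, assuming the report is laid out as in the example above with one key per line. `report_status` is a hypothetical helper and is not part of `lib/json-reporter.sh`; where `jq` is available, `jq -r '.status'` is the more robust choice.

```bash
#!/bin/bash
# Hypothetical helper: print the top-level "status" value from a report file.
# Assumes one key per line, as in the example report above.
report_status() {
  local report_file="$1"
  grep -o '"status"[[:space:]]*:[[:space:]]*"[^"]*"' "$report_file" \
    | head -n 1 \
    | sed 's/.*"\([^"]*\)"$/\1/'
}

# Demo with a throwaway report so the sketch runs on its own.
demo_report="$(mktemp)"
cat > "$demo_report" <<'EOF'
{
  "test_name": "docker-skill-test",
  "skill_name": "my-skill",
  "status": "passed",
  "exit_code": 0
}
EOF

echo "status: $(report_status "$demo_report")"
rm -f "$demo_report"
```

A pipeline step could compare the printed value against `passed` and fail the job otherwise.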
|
||||
|
||||
**GitHub Actions Integration:**
|
||||
```yaml
- name: Test Skill
  run: |
    export JSON_ENABLED=true
    ./test-templates/docker-skill-test-json.sh my-skill

- name: Upload Test Results
  uses: actions/upload-artifact@v4
  with:
    name: test-results
    path: /tmp/skill-test-*/test-report.*
```
|
||||
|
||||
See `lib/json-reporter.sh` for full API documentation.
|
||||
|
||||
---
|
||||
|
||||
## Available Templates
|
||||
|
||||
### 1. Docker Skill Test (`docker-skill-test.sh`)
|
||||
|
||||
**Use for skills that:**
|
||||
- Start or manage Docker containers
|
||||
- Build Docker images
|
||||
- Work with Docker volumes, networks, or compose files
|
||||
- Require Docker daemon access
|
||||
|
||||
**Features:**
|
||||
- Tracks Docker resource creation (containers, images, volumes, networks)
|
||||
- Detects orphaned containers
|
||||
- Validates cleanup behavior
|
||||
- Resource limit enforcement
|
||||
|
||||
**Usage:**
|
||||
```bash
|
||||
chmod +x test-templates/docker-skill-test.sh
|
||||
./test-templates/docker-skill-test.sh my-docker-skill
|
||||
```
|
||||
|
||||
**Customization:**
|
||||
Edit the skill execution command on line ~178:
|
||||
```bash
|
||||
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
|
||||
cd /root/.claude/skills/$SKILL_NAME
|
||||
./skill.sh test-mode # <-- Customize this
|
||||
"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 2. API Skill Test (`api-skill-test.sh`)
|
||||
|
||||
**Use for skills that:**
|
||||
- Make HTTP/HTTPS requests to external APIs
|
||||
- Require API keys or authentication
|
||||
- Interact with web services
|
||||
- Need network access
|
||||
|
||||
**Features:**
|
||||
- Network traffic monitoring
|
||||
- API call detection and counting
|
||||
- API key/secret leak detection
|
||||
- Rate limiting validation
|
||||
- HTTPS enforcement checking
|
||||
|
||||
**Usage:**
|
||||
```bash
|
||||
chmod +x test-templates/api-skill-test.sh
|
||||
./test-templates/api-skill-test.sh my-api-skill
|
||||
```
|
||||
|
||||
**Optional: Enable network capture:**
|
||||
```bash
|
||||
# Requires tcpdump and sudo
|
||||
sudo apt-get install tcpdump # or brew install tcpdump
|
||||
./test-templates/api-skill-test.sh my-api-skill
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 3. File Manipulation Skill Test (`file-manipulation-skill-test.sh`)
|
||||
|
||||
**Use for skills that:**
|
||||
- Create, read, update, or delete files
|
||||
- Modify configuration files
|
||||
- Generate reports or artifacts
|
||||
- Perform filesystem operations
|
||||
|
||||
**Features:**
|
||||
- Complete filesystem diff (added/removed/modified files)
|
||||
- File permission validation
|
||||
- Sensitive data scanning
|
||||
- Temp file cleanup verification
|
||||
- MD5 checksum comparison
|
||||
|
||||
**Usage:**
|
||||
```bash
|
||||
chmod +x test-templates/file-manipulation-skill-test.sh
|
||||
./test-templates/file-manipulation-skill-test.sh my-file-skill
|
||||
```
|
||||
|
||||
**Customization:**
|
||||
Add your own test files to the workspace (lines 54-70):
|
||||
```bash
|
||||
cat > "$TEST_DIR/test-workspace/your-file.txt" <<'EOF'
|
||||
Your test content here
|
||||
EOF
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 4. Git Skill Test (`git-skill-test.sh`)
|
||||
|
||||
**Use for skills that:**
|
||||
- Create commits, branches, or tags
|
||||
- Modify git history or configuration
|
||||
- Work with git worktrees
|
||||
- Interact with remote repositories
|
||||
|
||||
**Features:**
|
||||
- Git state comparison (commits, branches, tags)
|
||||
- Working tree cleanliness validation
|
||||
- Force operation detection
|
||||
- History rewriting detection
|
||||
- Dangling commit detection
|
||||
|
||||
**Usage:**
|
||||
```bash
|
||||
chmod +x test-templates/git-skill-test.sh
|
||||
./test-templates/git-skill-test.sh my-git-skill
|
||||
```
|
||||
|
||||
**Customization:**
|
||||
Modify the test repository setup (lines 59-81) to match your skill's requirements.
|
||||
|
||||
---
|
||||
|
||||
## Common Usage Patterns
|
||||
|
||||
### Basic Test Execution
|
||||
|
||||
```bash
|
||||
# Run test for a specific skill
|
||||
./test-templates/docker-skill-test.sh my-skill-name
|
||||
|
||||
# Keep container for debugging
|
||||
export SKILL_TEST_KEEP_CONTAINER="true"
|
||||
./test-templates/docker-skill-test.sh my-skill-name
|
||||
|
||||
# Keep images after test
|
||||
export SKILL_TEST_REMOVE_IMAGES="false"
|
||||
./test-templates/docker-skill-test.sh my-skill-name
|
||||
```
|
||||
|
||||
### Custom Resource Limits
|
||||
|
||||
```bash
|
||||
# Set custom memory/CPU limits
|
||||
export SKILL_TEST_MEMORY_LIMIT="1g"
|
||||
export SKILL_TEST_CPU_LIMIT="2.0"
|
||||
./test-templates/docker-skill-test.sh my-skill-name
|
||||
```
|
||||
|
||||
### Parallel Testing
|
||||
|
||||
```bash
|
||||
# Test multiple skills in parallel
|
||||
for skill in skill1 skill2 skill3; do
|
||||
./test-templates/docker-skill-test.sh "$skill" &
|
||||
done
|
||||
wait
|
||||
echo "All tests complete!"
|
||||
```
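
A bare `wait` discards the exit status of each background job, so a failed test can go unnoticed. The following sketch waits on each PID individually and counts failures. `run_test` and `run_parallel` are hypothetical names; `run_test` stands in for `./test-templates/docker-skill-test.sh` so that the example runs without Docker.

```bash
#!/bin/bash
# run_test is a hypothetical stand-in for ./test-templates/docker-skill-test.sh;
# here it simply fails for any skill whose name starts with "bad-".
run_test() {
  case "$1" in
    bad-*) return 1 ;;
    *) return 0 ;;
  esac
}

# Run every skill in the background, then wait on each PID individually so
# that each job's exit status is seen. Prints the number of failures.
run_parallel() {
  local pids=() names=() failed=0 i skill
  for skill in "$@"; do
    run_test "$skill" &
    pids+=("$!")
    names+=("$skill")
  done
  for i in "${!pids[@]}"; do
    if ! wait "${pids[$i]}"; then
      echo "FAILED: ${names[$i]}" >&2
      failed=$((failed + 1))
    fi
  done
  echo "$failed"
}

failures=$(run_parallel skill1 bad-skill2 skill3)
echo "Failed tests: $failures"
```

Replacing the body of `run_test` with a call to the real template keeps the rest of the loop unchanged.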
|
||||
|
||||
### CI/CD Integration
|
||||
|
||||
```bash
|
||||
#!/bin/bash
# Exit code 0 = pass, 1 = fail
|
||||
set -e
|
||||
|
||||
SKILLS=(
|
||||
"skill-creator"
|
||||
"claude-code-otel-setup"
|
||||
"playwright-e2e-automation"
|
||||
)
|
||||
|
||||
for skill in "${SKILLS[@]}"; do
|
||||
echo "Testing $skill..."
|
||||
./test-templates/docker-skill-test.sh "$skill" || {
|
||||
echo "❌ $skill failed!"
|
||||
exit 1
|
||||
}
|
||||
done
|
||||
|
||||
echo "✅ All skills passed!"
|
||||
```
|
||||
|
||||
## Customizing Templates
|
||||
|
||||
### Add Custom Validation
|
||||
|
||||
Insert your own checks before the "Generate Test Report" section:
|
||||
|
||||
```bash
|
||||
# ============================================================================
|
||||
# Custom Validation
|
||||
# ============================================================================
|
||||
|
||||
echo ""
|
||||
echo "=== Running Custom Checks ==="
|
||||
|
||||
# Your custom checks here
|
||||
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
|
||||
# Example: Check if specific file exists
|
||||
test -f /workspace/expected-output.txt || {
|
||||
echo 'ERROR: Expected output file not found'
|
||||
exit 1
|
||||
}
|
||||
"
|
||||
```
|
||||
|
||||
### Modify Execution Command
|
||||
|
||||
Each template has a skill execution section. Customize the command to match your skill's interface:
|
||||
|
||||
```bash
|
||||
# Example: Run skill with arguments
|
||||
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
|
||||
cd /root/.claude/skills/$SKILL_NAME
|
||||
./skill.sh --mode=test --output=/workspace/results
|
||||
"
|
||||
|
||||
# Example: Source skill as library
|
||||
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
|
||||
source /root/.claude/skills/$SKILL_NAME/lib.sh
|
||||
run_skill_tests
|
||||
"
|
||||
```
|
||||
|
||||
### Add Pre-Test Setup
|
||||
|
||||
Insert setup steps after the "Build Test Environment" section:
|
||||
|
||||
```bash
|
||||
# Install additional dependencies
|
||||
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
|
||||
apt-get update && apt-get install -y your-package
|
||||
"
|
||||
|
||||
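# Set environment variables. Each docker exec starts a new shell, so an
# export does not persist; pass the variable with -e on the command that needs it
docker exec -e SKILL_CONFIG_PATH=/etc/skill-config.json "$SKILL_TEST_CONTAINER_ID" bash -c "
cd /root/.claude/skills/$SKILL_NAME
./skill.sh test-mode
"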
|
||||
```
|
||||
|
||||
## Environment Variables
|
||||
|
||||
All templates support these environment variables:
|
||||
|
||||
| Variable | Default | Description |
|
||||
|----------|---------|-------------|
|
||||
| `SKILL_TEST_KEEP_CONTAINER` | `false` | Keep container after test for debugging |
|
||||
| `SKILL_TEST_REMOVE_IMAGES` | `true` | Remove test images after completion |
|
||||
| `SKILL_TEST_MEMORY_LIMIT` | `512m` | Container memory limit |
|
||||
| `SKILL_TEST_CPU_LIMIT` | `1.0` | Container CPU limit (cores) |
|
||||
| `SKILL_TEST_TEMP_DIR` | `/tmp/skill-test-*` | Temporary directory for test artifacts |
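
A template could honor these variables along the following lines. This is a sketch only: `apply_skill_test_defaults` is a hypothetical function name, while the variable names and default values are taken from the table above.

```bash
#!/bin/bash
# Sketch of applying the documented defaults while letting the caller's
# environment win. The variable names and defaults come from the table above.
apply_skill_test_defaults() {
  export SKILL_TEST_KEEP_CONTAINER="${SKILL_TEST_KEEP_CONTAINER:-false}"
  export SKILL_TEST_REMOVE_IMAGES="${SKILL_TEST_REMOVE_IMAGES:-true}"
  export SKILL_TEST_MEMORY_LIMIT="${SKILL_TEST_MEMORY_LIMIT:-512m}"
  export SKILL_TEST_CPU_LIMIT="${SKILL_TEST_CPU_LIMIT:-1.0}"
}

# The caller overrides one value; the rest fall back to the defaults.
SKILL_TEST_MEMORY_LIMIT="1g"
apply_skill_test_defaults
echo "memory=$SKILL_TEST_MEMORY_LIMIT cpu=$SKILL_TEST_CPU_LIMIT keep=$SKILL_TEST_KEEP_CONTAINER"
```

The `${VAR:-default}` expansion only substitutes the default when the variable is unset or empty, so values exported before the test run are preserved.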
|
||||
|
||||
## Exit Codes
|
||||
|
||||
- `0` - Test passed (skill executed successfully)
|
||||
- `1` - Test failed (skill execution error or validation failure)
|
||||
- `>1` - Other errors (environment setup, Docker issues, etc.)
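
A wrapper script can branch on these codes. The sketch below uses a hypothetical helper, `describe_exit_code`, and a subshell that exits with 2 as a stand-in for a template run that hit a setup problem.

```bash
#!/bin/bash
# Hypothetical helper mapping a template's exit code to the categories above.
describe_exit_code() {
  case "$1" in
    0) echo "passed" ;;
    1) echo "failed" ;;
    *) echo "error" ;;
  esac
}

# Stand-in for a template run that exits with 2 (for example, a setup problem).
code=0
( exit 2 ) || code=$?
echo "exit code $code: $(describe_exit_code "$code")"
```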
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### "Docker daemon not running"
|
||||
```bash
|
||||
# macOS
|
||||
open -a Docker
|
||||
|
||||
# Linux
|
||||
sudo systemctl start docker
|
||||
```
|
||||
|
||||
### "Permission denied" errors
|
||||
```bash
|
||||
# Add user to docker group
|
||||
sudo usermod -aG docker $USER
|
||||
newgrp docker
|
||||
```
|
||||
|
||||
### Container hangs or never exits
|
||||
```bash
|
||||
# Set a timeout in your skill execution
|
||||
timeout 300 ./test-templates/docker-skill-test.sh my-skill
|
||||
```
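
To tell a timeout apart from an ordinary failure, check for exit status 124. The sketch below assumes GNU coreutils `timeout` (not installed by default on macOS) and uses `sleep 5` as a stand-in for a hanging test run.

```bash
#!/bin/bash
# Assumes GNU coreutils `timeout`, which exits with 124 when the limit is hit.
# `sleep 5` is a stand-in for a hanging test run.
status=0
timeout 1 sleep 5 || status=$?

if [[ $status -eq 124 ]]; then
  echo "Test timed out"
elif [[ $status -ne 0 ]]; then
  echo "Test failed with exit code $status"
else
  echo "Test passed"
fi
```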
|
||||
|
||||
### Need to inspect failed test
|
||||
```bash
|
||||
# Keep container after failure
|
||||
export SKILL_TEST_KEEP_CONTAINER="true"
|
||||
./test-templates/docker-skill-test.sh my-skill
|
||||
|
||||
# Inspect container
|
||||
docker start -ai <container-id>
|
||||
docker logs <container-id>
|
||||
```
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Run tests before committing** - Catch environment-specific bugs early
|
||||
2. **Test in clean environment** - Don't rely on local configs or files
|
||||
3. **Validate cleanup** - Ensure skills don't leave orphaned resources
|
||||
4. **Check for secrets** - Never commit API keys or sensitive data
|
||||
5. **Document dependencies** - List all required packages and tools
|
||||
6. **Use resource limits** - Prevent runaway processes
|
||||
7. **Review diffs carefully** - Understand all file system changes
|
||||
|
||||
## Contributing
|
||||
|
||||
To add a new test template:
|
||||
|
||||
1. Copy an existing template as a starting point
|
||||
2. Customize for your skill type
|
||||
3. Add comprehensive validation checks
|
||||
4. Update this README with usage documentation
|
||||
5. Test your template with at least 3 different skills
|
||||
|
||||
## Related Documentation
|
||||
|
||||
- `../lib/docker-helpers.sh` - Shared helper functions
|
||||
- `../modes/mode2-docker.md` - Docker isolation mode documentation
|
||||
- `../skill.md` - Main skill documentation
|
||||
|
||||
## Support
|
||||
|
||||
For issues or questions:
|
||||
- Check the skill logs: `docker logs <container-id>`
|
||||
- Review test artifacts in `/tmp/skill-test-*/`
|
||||
- Consult the helper library: `lib/docker-helpers.sh`
|
||||
@@ -0,0 +1,317 @@
|
||||
#!/bin/bash
|
||||
# Test Template for API-Calling Skills
|
||||
# Use this template when testing skills that:
|
||||
# - Make HTTP/HTTPS requests to external APIs
|
||||
# - Require API keys or authentication
|
||||
# - Need network access
|
||||
# - Interact with web services
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
# ============================================================================
|
||||
# Configuration
|
||||
# ============================================================================
|
||||
|
||||
SKILL_NAME="${1:-example-api-skill}"
|
||||
SKILL_PATH="$HOME/.claude/skills/$SKILL_NAME"
|
||||
TEST_ID="$(date +%s)"
|
||||
TEST_DIR="/tmp/skill-test-$TEST_ID"
|
||||
|
||||
# ============================================================================
|
||||
# Load Helper Library
|
||||
# ============================================================================
|
||||
|
||||
HELPER_LIB="$HOME/.claude/skills/skill-isolation-tester/lib/docker-helpers.sh"
|
||||
if [[ ! -f "$HELPER_LIB" ]]; then
|
||||
echo "ERROR: Helper library not found: $HELPER_LIB"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# shellcheck source=/dev/null
|
||||
source "$HELPER_LIB"
|
||||
|
||||
# ============================================================================
|
||||
# Setup Cleanup Trap
|
||||
# ============================================================================
|
||||
|
||||
export SKILL_TEST_TEMP_DIR="$TEST_DIR"
|
||||
export SKILL_TEST_KEEP_CONTAINER="${SKILL_TEST_KEEP_CONTAINER:-false}"
export SKILL_TEST_REMOVE_IMAGES="${SKILL_TEST_REMOVE_IMAGES:-true}"
|
||||
|
||||
trap cleanup_on_exit EXIT
|
||||
|
||||
# ============================================================================
|
||||
# Pre-flight Checks
|
||||
# ============================================================================
|
||||
|
||||
echo "=== API Skill Test: $SKILL_NAME ==="
|
||||
echo "Test ID: $TEST_ID"
|
||||
echo ""
|
||||
|
||||
# Validate skill exists
|
||||
if [[ ! -d "$SKILL_PATH" ]]; then
|
||||
echo "ERROR: Skill not found: $SKILL_PATH"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Validate Docker environment
|
||||
preflight_check_docker || exit 1
|
||||
|
||||
# Check internet connectivity
|
||||
if ! curl -s --max-time 5 https://www.google.com > /dev/null 2>&1; then
|
||||
echo "⚠ WARNING: No internet connectivity detected"
|
||||
echo " API skill may fail if it requires external network access"
|
||||
fi
|
||||
|
||||
# ============================================================================
|
||||
# Build Test Environment
|
||||
# ============================================================================
|
||||
|
||||
echo ""
|
||||
echo "=== Building Test Environment ==="
|
||||
|
||||
mkdir -p "$TEST_DIR"
|
||||
|
||||
# Create test Dockerfile
|
||||
cat > "$TEST_DIR/Dockerfile" <<EOF
|
||||
FROM ubuntu:22.04
|
||||
|
||||
# Install dependencies for API testing
|
||||
RUN apt-get update && apt-get install -y \\
|
||||
curl \\
|
||||
jq \\
|
||||
ca-certificates \\
|
||||
&& rm -rf /var/lib/apt/lists/*
|
||||
|
||||
# Copy skill under test
|
||||
COPY skill/ /root/.claude/skills/$SKILL_NAME/
|
||||
|
||||
WORKDIR /root
|
||||
|
||||
CMD ["/bin/bash"]
|
||||
EOF
|
||||
|
||||
# Copy skill to test directory
|
||||
cp -r "$SKILL_PATH" "$TEST_DIR/skill/"
|
||||
|
||||
# Build test image
|
||||
safe_docker_build "$TEST_DIR/Dockerfile" "skill-test:$SKILL_NAME" || {
|
||||
echo "ERROR: Failed to build test image"
|
||||
exit 1
|
||||
}
|
||||
|
||||
export SKILL_TEST_IMAGE_NAME="skill-test:$SKILL_NAME"
|
||||
|
||||
# ============================================================================
|
||||
# Network Monitoring Setup
|
||||
# ============================================================================
|
||||
|
||||
echo ""
|
||||
echo "=== Setting Up Network Monitoring ==="
|
||||
|
||||
# Create network monitor log
|
||||
NETWORK_LOG="$TEST_DIR/network-activity.log"
|
||||
touch "$NETWORK_LOG"
|
||||
|
||||
# Start tcpdump in background (if available)
|
||||
if command -v tcpdump &> /dev/null; then
|
||||
echo "Starting network capture..."
|
||||
sudo tcpdump -i any -w "$TEST_DIR/network-capture.pcap" &
|
||||
TCPDUMP_PID=$!
|
||||
echo "tcpdump PID: $TCPDUMP_PID"
|
||||
else
|
||||
echo "tcpdump not available - skipping network capture"
|
||||
TCPDUMP_PID=""
|
||||
fi
|
||||
|
||||
# ============================================================================
|
||||
# Run Skill in Container
|
||||
# ============================================================================
|
||||
|
||||
echo ""
|
||||
echo "=== Running Skill in Isolated Container ==="
|
||||
|
||||
# Start container with DNS configuration
|
||||
safe_docker_run "skill-test:$SKILL_NAME" \
|
||||
--dns 8.8.8.8 \
|
||||
--dns 8.8.4.4 \
|
||||
bash -c "sleep infinity" || {
|
||||
echo "ERROR: Failed to start test container"
|
||||
exit 1
|
||||
}
|
||||
|
||||
# Execute skill and capture network activity
|
||||
echo "Executing skill..."
|
||||
START_TIME=$(date +%s)
|
||||
|
||||
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
|
||||
cd /root/.claude/skills/$SKILL_NAME
|
||||
# Add your skill execution command here
|
||||
# Example: ./api-skill.sh --test-mode
|
||||
echo 'Skill execution placeholder - customize this for your skill'
|
||||
|
||||
# Log any curl/wget/http calls made
|
||||
if command -v curl &> /dev/null; then
|
||||
echo 'curl is available in container'
|
||||
fi
|
||||
if command -v wget &> /dev/null; then
|
||||
echo 'wget is available in container'
|
||||
fi
|
||||
" 2>&1 | tee "$NETWORK_LOG" || {
|
||||
EXEC_EXIT_CODE=$?
|
||||
echo "ERROR: Skill execution failed with exit code: $EXEC_EXIT_CODE"
|
||||
|
||||
# Stop network capture
|
||||
if [[ -n "$TCPDUMP_PID" ]]; then
|
||||
sudo kill "$TCPDUMP_PID" 2>/dev/null || true
|
||||
fi
|
||||
|
||||
exit "$EXEC_EXIT_CODE"
|
||||
}
|
||||
|
||||
END_TIME=$(date +%s)
|
||||
EXECUTION_TIME=$((END_TIME - START_TIME))
|
||||
|
||||
# Stop network capture
|
||||
if [[ -n "$TCPDUMP_PID" ]]; then
|
||||
sudo kill "$TCPDUMP_PID" 2>/dev/null || true
|
||||
echo "Network capture saved to: $TEST_DIR/network-capture.pcap"
|
||||
fi
|
||||
|
||||
# ============================================================================
|
||||
# Analyze Network Activity
|
||||
# ============================================================================
|
||||
|
||||
echo ""
|
||||
echo "=== Analyzing Network Activity ==="
|
||||
|
||||
# Check for API calls in logs
|
||||
echo "Searching for HTTP/HTTPS requests..."
|
||||
|
||||
API_CALLS=$(grep -iE "http://|https://|curl|wget|GET|POST|PUT|DELETE" "$NETWORK_LOG" || true)
|
||||
|
||||
if [[ -n "$API_CALLS" ]]; then
|
||||
echo "Detected API calls:"
|
||||
echo "$API_CALLS"
|
||||
|
||||
# Extract unique domains
|
||||
DOMAINS=$(echo "$API_CALLS" | grep -oE "https?://[^/\"]+" | sort -u || true)
|
||||
if [[ -n "$DOMAINS" ]]; then
|
||||
echo ""
|
||||
echo "Unique API endpoints:"
|
||||
echo "$DOMAINS"
|
||||
fi
|
||||
else
|
||||
echo "No obvious API calls detected in logs"
|
||||
fi
|
||||
|
||||
# Check container network stats
|
||||
echo ""
|
||||
echo "Container network statistics:"
|
||||
docker stats --no-stream --format "table {{.Name}}\t{{.NetIO}}" "$SKILL_TEST_CONTAINER_ID"
|
||||
|
||||
# ============================================================================
|
||||
# Validate API Key Handling
|
||||
# ============================================================================
|
||||
|
||||
echo ""
|
||||
echo "=== Validating API Key Security ==="
|
||||
|
||||
# Check if API keys appear in logs (security concern)
|
||||
POTENTIAL_KEYS=$(grep -iE "api[-_]?key|token|secret|password|bearer" "$NETWORK_LOG" | grep -v "API_KEY=" || true)
|
||||
|
||||
if [[ -n "$POTENTIAL_KEYS" ]]; then
|
||||
echo "⚠ WARNING: Potential API keys/secrets found in logs:"
|
||||
echo "$POTENTIAL_KEYS"
|
||||
echo ""
|
||||
echo "SECURITY ISSUE: API keys should NOT appear in logs!"
|
||||
echo " - Use environment variables instead"
|
||||
echo " - Redact sensitive data in log output"
|
||||
fi
|
||||
|
||||
# Check for hardcoded endpoints
|
||||
HARDCODED_URLS=$(grep -rn "http://" "$SKILL_PATH" 2>/dev/null | grep -v "example.com" || true)
|
||||
if [[ -n "$HARDCODED_URLS" ]]; then
|
||||
echo "⚠ WARNING: Hardcoded HTTP URLs found (should use HTTPS):"
|
||||
echo "$HARDCODED_URLS"
|
||||
fi
|
||||
|
||||
# ============================================================================
|
||||
# Rate Limiting Check
|
||||
# ============================================================================
|
||||
|
||||
echo ""
|
||||
echo "=== Checking Rate Limiting Behavior ==="
|
||||
|
||||
# Count number of requests made
|
||||
# grep -c already prints 0 when nothing matches (and exits 1), so only mask the exit status
REQUEST_COUNT=$(grep -icE "GET|POST|PUT|DELETE" "$NETWORK_LOG" || true)
|
||||
echo "Total HTTP requests detected: $REQUEST_COUNT"
|
||||
|
||||
if [[ $REQUEST_COUNT -gt 100 ]]; then
|
||||
echo "⚠ WARNING: High number of API requests ($REQUEST_COUNT)"
|
||||
echo " - Consider implementing rate limiting"
|
||||
echo " - Use caching to reduce API calls"
|
||||
echo " - Check for request loops"
|
||||
fi
|
||||
|
||||
# Treat sub-second runs as one second to avoid division by zero
REQUESTS_PER_SECOND=$((REQUEST_COUNT / (EXECUTION_TIME > 0 ? EXECUTION_TIME : 1)))
|
||||
echo "Requests per second: $REQUESTS_PER_SECOND"
|
||||
|
||||
# ============================================================================
|
||||
# Generate Test Report
|
||||
# ============================================================================
|
||||
|
||||
echo ""
|
||||
echo "=== Test Report ==="
|
||||
echo ""
|
||||
|
||||
CONTAINER_EXIT_CODE=$(get_container_exit_code "$SKILL_TEST_CONTAINER_ID")
|
||||
|
||||
if [[ $CONTAINER_EXIT_CODE -eq 0 ]]; then
|
||||
echo "✅ TEST PASSED"
|
||||
else
|
||||
echo "❌ TEST FAILED"
|
||||
fi
|
||||
|
||||
echo ""
|
||||
echo "Summary:"
|
||||
echo " - Exit code: $CONTAINER_EXIT_CODE"
|
||||
echo " - Execution time: ${EXECUTION_TIME}s"
|
||||
echo " - API requests: $REQUEST_COUNT"
|
||||
echo " - Network log: $NETWORK_LOG"
|
||||
|
||||
echo ""
|
||||
echo "Security Checklist:"
|
||||
if [[ -z "$POTENTIAL_KEYS" ]]; then
|
||||
echo " ✓ No API keys in logs"
|
||||
else
|
||||
echo " ✗ API keys found in logs"
|
||||
fi
|
||||
|
||||
if [[ -z "$HARDCODED_URLS" ]]; then
|
||||
echo " ✓ No hardcoded HTTP URLs"
|
||||
else
|
||||
echo " ✗ Hardcoded HTTP URLs found"
|
||||
fi
|
||||
|
||||
if [[ $REQUEST_COUNT -le 100 ]]; then
|
||||
echo " ✓ Reasonable request volume"
|
||||
else
|
||||
echo " ✗ High request volume"
|
||||
fi
|
||||
|
||||
echo ""
|
||||
echo "Recommendations:"
|
||||
echo " - Document all external API dependencies"
|
||||
echo " - Implement request caching where possible"
|
||||
echo " - Use exponential backoff for retries"
|
||||
echo " - Respect API rate limits"
|
||||
echo " - Use HTTPS for all API calls"
|
||||
echo " - Never log API keys or secrets"
|
||||
|
||||
# Exit with appropriate code
|
||||
if [[ $CONTAINER_EXIT_CODE -eq 0 ]]; then
|
||||
exit 0
|
||||
else
|
||||
exit 1
|
||||
fi
|
||||
@@ -0,0 +1,302 @@
|
||||
#!/bin/bash
|
||||
# Test Template for Docker-Based Skills with JSON Output
|
||||
# This is an enhanced version of docker-skill-test.sh with CI/CD integration
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
# ============================================================================
|
||||
# Configuration
|
||||
# ============================================================================
|
||||
|
||||
SKILL_NAME="${1:-example-docker-skill}"
|
||||
SKILL_PATH="$HOME/.claude/skills/$SKILL_NAME"
|
||||
TEST_ID="$(date +%s)"
|
||||
TEST_DIR="/tmp/skill-test-$TEST_ID"
|
||||
|
||||
# JSON reporting
|
||||
export JSON_REPORT_FILE="$TEST_DIR/test-report.json"
|
||||
export JSON_ENABLED="${JSON_ENABLED:-true}"
|
||||
|
||||
# ============================================================================
|
||||
# Load Helper Libraries
|
||||
# ============================================================================
|
||||
|
||||
HELPER_LIB="$HOME/.claude/skills/skill-isolation-tester/lib/docker-helpers.sh"
|
||||
JSON_LIB="$HOME/.claude/skills/skill-isolation-tester/lib/json-reporter.sh"
|
||||
|
||||
if [[ ! -f "$HELPER_LIB" ]]; then
|
||||
echo "ERROR: Helper library not found: $HELPER_LIB"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
if [[ ! -f "$JSON_LIB" ]]; then
|
||||
echo "ERROR: JSON reporter library not found: $JSON_LIB"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# shellcheck source=/dev/null
|
||||
source "$HELPER_LIB"
|
||||
# shellcheck source=/dev/null
|
||||
source "$JSON_LIB"
|
||||
|
||||
# ============================================================================
|
||||
# Setup Cleanup Trap
|
||||
# ============================================================================
|
||||
|
||||
export SKILL_TEST_TEMP_DIR="$TEST_DIR"
|
||||
export SKILL_TEST_KEEP_CONTAINER="${SKILL_TEST_KEEP_CONTAINER:-false}"
export SKILL_TEST_REMOVE_IMAGES="${SKILL_TEST_REMOVE_IMAGES:-true}"
|
||||
|
||||
cleanup_and_finalize() {
|
||||
local exit_code=$?
|
||||
local end_time
end_time=$(date +%s)
# START_TIME is still unset if the script exits during pre-flight checks
local duration=$((end_time - ${START_TIME:-$end_time}))
|
||||
|
||||
# Finalize JSON report
|
||||
if [[ "$JSON_ENABLED" == "true" ]]; then
|
||||
json_finalize "$exit_code" "$duration"
|
||||
export_all_formats "$TEST_DIR/test-report"
|
||||
fi
|
||||
|
||||
# Standard cleanup
|
||||
cleanup_on_exit
|
||||
|
||||
exit "$exit_code"
|
||||
}
|
||||
|
||||
trap cleanup_and_finalize EXIT
|
||||
|
||||
# ============================================================================
|
||||
# Pre-flight Checks
|
||||
# ============================================================================
|
||||
|
||||
echo "=== Docker Skill Test (JSON Mode): $SKILL_NAME ==="
|
||||
echo "Test ID: $TEST_ID"
|
||||
echo ""
|
||||
|
||||
# Create test directory
|
||||
mkdir -p "$TEST_DIR"
|
||||
|
||||
# Initialize JSON report
|
||||
if [[ "$JSON_ENABLED" == "true" ]]; then
|
||||
json_init "docker-skill-test" "$SKILL_NAME"
|
||||
fi
|
||||
|
||||
# Validate skill exists
|
||||
if [[ ! -d "$SKILL_PATH" ]]; then
|
||||
echo "ERROR: Skill not found: $SKILL_PATH"
|
||||
[[ "$JSON_ENABLED" == "true" ]] && json_add_issue "error" "setup" "Skill directory not found: $SKILL_PATH"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Validate Docker environment
|
||||
if ! preflight_check_docker; then
|
||||
[[ "$JSON_ENABLED" == "true" ]] && json_add_issue "error" "environment" "Docker pre-flight checks failed"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# ============================================================================
|
||||
# Baseline Measurements (Before)
|
||||
# ============================================================================
|
||||
|
||||
echo ""
|
||||
echo "=== Taking Baseline Measurements ==="
|
||||
|
||||
START_TIME=$(date +%s)
|
||||
|
||||
BEFORE_CONTAINERS=$(docker ps -a --format '{{.ID}}' | wc -l)
|
||||
BEFORE_IMAGES=$(docker images --format '{{.ID}}' | wc -l)
|
||||
BEFORE_VOLUMES=$(docker volume ls --format '{{.Name}}' | wc -l)
|
||||
BEFORE_NETWORKS=$(docker network ls --format '{{.ID}}' | wc -l)
|
||||
|
||||
echo "Before test:"
|
||||
echo " Containers: $BEFORE_CONTAINERS"
|
||||
echo " Images: $BEFORE_IMAGES"
|
||||
echo " Volumes: $BEFORE_VOLUMES"
|
||||
echo " Networks: $BEFORE_NETWORKS"
|
||||
|
||||
# Record baseline in JSON
|
||||
if [[ "$JSON_ENABLED" == "true" ]]; then
|
||||
json_add_metric "baseline_containers" "$BEFORE_CONTAINERS"
|
||||
json_add_metric "baseline_images" "$BEFORE_IMAGES"
|
||||
json_add_metric "baseline_volumes" "$BEFORE_VOLUMES"
|
||||
json_add_metric "baseline_networks" "$BEFORE_NETWORKS"
|
||||
fi

# ============================================================================
# Build Test Environment
# ============================================================================

echo ""
echo "=== Building Test Environment ==="

# Create test Dockerfile
cat > "$TEST_DIR/Dockerfile" <<EOF
FROM ubuntu:22.04

# Install dependencies
RUN apt-get update && apt-get install -y \\
    curl \\
    git \\
    nodejs \\
    npm \\
    docker.io \\
    && rm -rf /var/lib/apt/lists/*

# Install Claude Code (mock for testing)
RUN mkdir -p /root/.claude/skills

# Copy skill under test
COPY skill/ /root/.claude/skills/$SKILL_NAME/

WORKDIR /root

CMD ["/bin/bash"]
EOF

# Copy skill to test directory
cp -r "$SKILL_PATH" "$TEST_DIR/skill/"

# Build test image
BUILD_START=$(date +%s)
if ! safe_docker_build "$TEST_DIR/Dockerfile" "skill-test:$SKILL_NAME"; then
    echo "ERROR: Failed to build test image"
    [[ "$JSON_ENABLED" == "true" ]] && json_add_issue "error" "build" "Docker image build failed"
    exit 1
fi
BUILD_END=$(date +%s)
BUILD_DURATION=$((BUILD_END - BUILD_START))

export SKILL_TEST_IMAGE_NAME="skill-test:$SKILL_NAME"

# Record build metrics
if [[ "$JSON_ENABLED" == "true" ]]; then
    json_add_metric "build_duration_seconds" "$BUILD_DURATION" "seconds"
fi

# ============================================================================
# Run Skill in Container
# ============================================================================

echo ""
echo "=== Running Skill in Isolated Container ==="

# Start container with Docker socket access
if ! safe_docker_run "skill-test:$SKILL_NAME" \
    -v /var/run/docker.sock:/var/run/docker.sock \
    bash -c "sleep infinity"; then
    echo "ERROR: Failed to start test container"
    [[ "$JSON_ENABLED" == "true" ]] && json_add_issue "error" "runtime" "Container failed to start"
    exit 1
fi

# Execute skill
echo "Executing skill..."
EXEC_START=$(date +%s)

EXEC_OUTPUT=$(docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
    cd /root/.claude/skills/$SKILL_NAME
    echo 'Skill execution placeholder - customize this for your skill'
" 2>&1) || {
    EXEC_EXIT_CODE=$?
    echo "ERROR: Skill execution failed with exit code: $EXEC_EXIT_CODE"
    [[ "$JSON_ENABLED" == "true" ]] && json_add_issue "error" "execution" "Skill failed with exit code $EXEC_EXIT_CODE"
    exit "$EXEC_EXIT_CODE"
}

EXEC_END=$(date +%s)
EXEC_DURATION=$((EXEC_END - EXEC_START))

# Record execution metrics
if [[ "$JSON_ENABLED" == "true" ]]; then
    json_add_metric "execution_duration_seconds" "$EXEC_DURATION" "seconds"
fi

# ============================================================================
# Collect Measurements (After)
# ============================================================================

echo ""
echo "=== Collecting Post-Execution Measurements ==="

sleep 2 # Wait for async operations

AFTER_CONTAINERS=$(docker ps -a --format '{{.ID}}' | wc -l)
AFTER_IMAGES=$(docker images --format '{{.ID}}' | wc -l)
AFTER_VOLUMES=$(docker volume ls --format '{{.Name}}' | wc -l)
AFTER_NETWORKS=$(docker network ls --format '{{.ID}}' | wc -l)

CONTAINERS_DELTA=$((AFTER_CONTAINERS - BEFORE_CONTAINERS))
IMAGES_DELTA=$((AFTER_IMAGES - BEFORE_IMAGES))
VOLUMES_DELTA=$((AFTER_VOLUMES - BEFORE_VOLUMES))
NETWORKS_DELTA=$((AFTER_NETWORKS - BEFORE_NETWORKS))

echo "After test:"
echo " Containers: $AFTER_CONTAINERS (delta: $CONTAINERS_DELTA)"
echo " Images: $AFTER_IMAGES (delta: $IMAGES_DELTA)"
echo " Volumes: $AFTER_VOLUMES (delta: $VOLUMES_DELTA)"
echo " Networks: $AFTER_NETWORKS (delta: $NETWORKS_DELTA)"

# Record changes in JSON
if [[ "$JSON_ENABLED" == "true" ]]; then
    json_add_metric "containers_created" "$CONTAINERS_DELTA"
    json_add_metric "images_created" "$IMAGES_DELTA"
    json_add_metric "volumes_created" "$VOLUMES_DELTA"
    json_add_metric "networks_created" "$NETWORKS_DELTA"
fi

# ============================================================================
# Validate Cleanup Behavior
# ============================================================================

echo ""
echo "=== Validating Skill Cleanup ==="

# Check for orphaned containers
ORPHANED_CONTAINERS=$(docker ps -a --filter "label=created-by-skill=$SKILL_NAME" --format '{{.ID}}' | wc -l)
if [[ $ORPHANED_CONTAINERS -gt 0 ]]; then
    echo "⚠ WARNING: Skill left $ORPHANED_CONTAINERS orphaned container(s)"
    if [[ "$JSON_ENABLED" == "true" ]]; then
        json_add_issue "warning" "cleanup" "Found $ORPHANED_CONTAINERS orphaned containers"
        json_add_recommendation "Cleanup" "Implement automatic container cleanup in skill"
    fi
fi

# ============================================================================
# Generate Test Report
# ============================================================================

echo ""
echo "=== Test Report ==="
echo ""

CONTAINER_EXIT_CODE=$(get_container_exit_code "$SKILL_TEST_CONTAINER_ID")

if [[ $CONTAINER_EXIT_CODE -eq 0 ]]; then
    echo "✅ TEST PASSED"
else
    echo "❌ TEST FAILED"
    [[ "$JSON_ENABLED" == "true" ]] && json_add_issue "error" "test-failure" "Container exited with code $CONTAINER_EXIT_CODE"
fi

echo ""
echo "Summary:"
echo " - Exit code: $CONTAINER_EXIT_CODE"
echo " - Build duration: ${BUILD_DURATION}s"
echo " - Execution duration: ${EXEC_DURATION}s"
echo " - Docker resources created: $CONTAINERS_DELTA containers, $IMAGES_DELTA images, $VOLUMES_DELTA volumes, $NETWORKS_DELTA networks"

if [[ "$JSON_ENABLED" == "true" ]]; then
    echo ""
    echo "JSON reports will be generated at:"
    echo " - $TEST_DIR/test-report.json"
    echo " - $TEST_DIR/test-report.junit.xml"
    echo " - $TEST_DIR/test-report.md"
fi

# Exit with appropriate code (cleanup_and_finalize will handle JSON)
if [[ $CONTAINER_EXIT_CODE -eq 0 ]]; then
    exit 0
else
    exit 1
fi
@@ -0,0 +1,236 @@
#!/bin/bash
# Test Template for Docker-Based Skills
# Use this template when testing skills that:
# - Start Docker containers
# - Build Docker images
# - Manage Docker volumes/networks
# - Require Docker daemon access

set -euo pipefail

# ============================================================================
# Configuration
# ============================================================================

SKILL_NAME="${1:-example-docker-skill}"
SKILL_PATH="$HOME/.claude/skills/$SKILL_NAME"
TEST_ID="$(date +%s)"
TEST_DIR="/tmp/skill-test-$TEST_ID"

# ============================================================================
# Load Helper Library
# ============================================================================

HELPER_LIB="$HOME/.claude/skills/skill-isolation-tester/lib/docker-helpers.sh"
if [[ ! -f "$HELPER_LIB" ]]; then
    echo "ERROR: Helper library not found: $HELPER_LIB"
    exit 1
fi

# shellcheck source=/dev/null
source "$HELPER_LIB"

# ============================================================================
# Setup Cleanup Trap
# ============================================================================

export SKILL_TEST_TEMP_DIR="$TEST_DIR"
export SKILL_TEST_KEEP_CONTAINER="false"
export SKILL_TEST_REMOVE_IMAGES="true"

trap cleanup_on_exit EXIT

# ============================================================================
# Pre-flight Checks
# ============================================================================

echo "=== Docker Skill Test: $SKILL_NAME ==="
echo "Test ID: $TEST_ID"
echo ""

# Validate skill exists
if [[ ! -d "$SKILL_PATH" ]]; then
    echo "ERROR: Skill not found: $SKILL_PATH"
    exit 1
fi

# Validate Docker environment
preflight_check_docker || exit 1

# ============================================================================
# Baseline Measurements (Before)
# ============================================================================

echo ""
echo "=== Taking Baseline Measurements ==="

# Count Docker resources before test
BEFORE_CONTAINERS=$(docker ps -a --format '{{.ID}}' | wc -l)
BEFORE_IMAGES=$(docker images --format '{{.ID}}' | wc -l)
BEFORE_VOLUMES=$(docker volume ls --format '{{.Name}}' | wc -l)
BEFORE_NETWORKS=$(docker network ls --format '{{.ID}}' | wc -l)

echo "Before test:"
echo " Containers: $BEFORE_CONTAINERS"
echo " Images: $BEFORE_IMAGES"
echo " Volumes: $BEFORE_VOLUMES"
echo " Networks: $BEFORE_NETWORKS"

# ============================================================================
# Build Test Environment
# ============================================================================

echo ""
echo "=== Building Test Environment ==="

mkdir -p "$TEST_DIR"

# Create test Dockerfile
cat > "$TEST_DIR/Dockerfile" <<EOF
FROM ubuntu:22.04

# Install dependencies
RUN apt-get update && apt-get install -y \\
    curl \\
    git \\
    nodejs \\
    npm \\
    docker.io \\
    && rm -rf /var/lib/apt/lists/*

# Install Claude Code (mock for testing)
RUN mkdir -p /root/.claude/skills

# Copy skill under test
COPY skill/ /root/.claude/skills/$SKILL_NAME/

WORKDIR /root

CMD ["/bin/bash"]
EOF

# Copy skill to test directory
cp -r "$SKILL_PATH" "$TEST_DIR/skill/"

# Build test image
safe_docker_build "$TEST_DIR/Dockerfile" "skill-test:$SKILL_NAME" || {
    echo "ERROR: Failed to build test image"
    exit 1
}

export SKILL_TEST_IMAGE_NAME="skill-test:$SKILL_NAME"

# ============================================================================
# Run Skill in Container
# ============================================================================

echo ""
echo "=== Running Skill in Isolated Container ==="

# Start container with the host Docker socket mounted
# (Docker commands run by the skill act on the host daemon)
safe_docker_run "skill-test:$SKILL_NAME" \
    -v /var/run/docker.sock:/var/run/docker.sock \
    bash -c "sleep infinity" || {
    echo "ERROR: Failed to start test container"
    exit 1
}

# Execute skill (customize this command based on your skill's interface)
echo "Executing skill..."
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
    cd /root/.claude/skills/$SKILL_NAME
    # Add your skill execution command here
    # Example: ./skill.sh test-mode
    echo 'Skill execution placeholder - customize this for your skill'
" || {
    EXEC_EXIT_CODE=$?
    echo "ERROR: Skill execution failed with exit code: $EXEC_EXIT_CODE"
    exit "$EXEC_EXIT_CODE"
}

# ============================================================================
# Collect Measurements (After)
# ============================================================================

echo ""
echo "=== Collecting Post-Execution Measurements ==="

# Wait for async operations to complete
sleep 2

AFTER_CONTAINERS=$(docker ps -a --format '{{.ID}}' | wc -l)
AFTER_IMAGES=$(docker images --format '{{.ID}}' | wc -l)
AFTER_VOLUMES=$(docker volume ls --format '{{.Name}}' | wc -l)
AFTER_NETWORKS=$(docker network ls --format '{{.ID}}' | wc -l)

echo "After test:"
echo " Containers: $AFTER_CONTAINERS (delta: $((AFTER_CONTAINERS - BEFORE_CONTAINERS)))"
echo " Images: $AFTER_IMAGES (delta: $((AFTER_IMAGES - BEFORE_IMAGES)))"
echo " Volumes: $AFTER_VOLUMES (delta: $((AFTER_VOLUMES - BEFORE_VOLUMES)))"
echo " Networks: $AFTER_NETWORKS (delta: $((AFTER_NETWORKS - BEFORE_NETWORKS)))"

# ============================================================================
# Validate Cleanup Behavior
# ============================================================================

echo ""
echo "=== Validating Skill Cleanup ==="

# Check for orphaned containers
ORPHANED_CONTAINERS=$(docker ps -a --filter "label=created-by-skill=$SKILL_NAME" --format '{{.ID}}' | wc -l)
if [[ $ORPHANED_CONTAINERS -gt 0 ]]; then
    echo "⚠ WARNING: Skill left $ORPHANED_CONTAINERS orphaned container(s)"
    docker ps -a --filter "label=created-by-skill=$SKILL_NAME"
fi

# Check for unlabeled containers (potential orphans)
SKILL_CONTAINERS=$(docker ps -a --filter "name=$SKILL_NAME" --format '{{.ID}}' | wc -l)
if [[ $SKILL_CONTAINERS -gt 1 ]]; then
    echo "⚠ WARNING: Found $SKILL_CONTAINERS containers with skill name pattern"
fi

# ============================================================================
# Generate Test Report
# ============================================================================

echo ""
echo "=== Test Report ==="
echo ""

CONTAINER_EXIT_CODE=$(get_container_exit_code "$SKILL_TEST_CONTAINER_ID")

if [[ $CONTAINER_EXIT_CODE -eq 0 ]]; then
    echo "✅ TEST PASSED"
    echo ""
    echo "Summary:"
    echo " - Skill executed successfully"
    echo " - Exit code: 0"
    echo " - Container cleanup: Will be handled by trap"
else
    echo "❌ TEST FAILED"
    echo ""
    echo "Summary:"
    echo " - Skill execution failed"
    echo " - Exit code: $CONTAINER_EXIT_CODE"
    echo " - Check logs: docker logs $SKILL_TEST_CONTAINER_ID"
fi

echo ""
# The baseline was taken before the test image was built and the test container
# started, so these deltas include the harness's own image and container.
echo "Docker Resources Created (counts include the test image and container):"
echo " - Containers: $((AFTER_CONTAINERS - BEFORE_CONTAINERS))"
echo " - Images: $((AFTER_IMAGES - BEFORE_IMAGES))"
echo " - Volumes: $((AFTER_VOLUMES - BEFORE_VOLUMES))"
echo " - Networks: $((AFTER_NETWORKS - BEFORE_NETWORKS))"

echo ""
echo "Cleanup Instructions:"
echo " - Test container will be removed automatically"
echo " - To manually clean up: docker rm -f $SKILL_TEST_CONTAINER_ID"
echo " - To remove test image: docker rmi skill-test:$SKILL_NAME"

# Exit with appropriate code
if [[ $CONTAINER_EXIT_CODE -eq 0 ]]; then
    exit 0
else
    exit 1
fi
@@ -0,0 +1,360 @@
#!/bin/bash
# Test Template for File-Manipulation Skills
# Use this template when testing skills that:
# - Create, read, update, or delete files
# - Modify configurations or codebases
# - Generate reports or artifacts
# - Work with filesystem operations

set -euo pipefail

# ============================================================================
# Configuration
# ============================================================================

SKILL_NAME="${1:-example-file-skill}"
SKILL_PATH="$HOME/.claude/skills/$SKILL_NAME"
TEST_ID="$(date +%s)"
TEST_DIR="/tmp/skill-test-$TEST_ID"

# ============================================================================
# Load Helper Library
# ============================================================================

HELPER_LIB="$HOME/.claude/skills/skill-isolation-tester/lib/docker-helpers.sh"
if [[ ! -f "$HELPER_LIB" ]]; then
    echo "ERROR: Helper library not found: $HELPER_LIB"
    exit 1
fi

# shellcheck source=/dev/null
source "$HELPER_LIB"

# ============================================================================
# Setup Cleanup Trap
# ============================================================================

export SKILL_TEST_TEMP_DIR="$TEST_DIR"
export SKILL_TEST_KEEP_CONTAINER="false"
export SKILL_TEST_REMOVE_IMAGES="true"

trap cleanup_on_exit EXIT

# ============================================================================
# Pre-flight Checks
# ============================================================================

echo "=== File Manipulation Skill Test: $SKILL_NAME ==="
echo "Test ID: $TEST_ID"
echo ""

# Validate skill exists
if [[ ! -d "$SKILL_PATH" ]]; then
    echo "ERROR: Skill not found: $SKILL_PATH"
    exit 1
fi

# Validate Docker environment
preflight_check_docker || exit 1

# ============================================================================
# Build Test Environment with Sample Files
# ============================================================================

echo ""
echo "=== Building Test Environment ==="

mkdir -p "$TEST_DIR/test-workspace"

# Create sample files for the skill to manipulate
cat > "$TEST_DIR/test-workspace/sample.txt" <<'EOF'
This is a sample text file for testing.
Line 2
Line 3
EOF

cat > "$TEST_DIR/test-workspace/config.json" <<'EOF'
{
  "setting1": "value1",
  "setting2": 42,
  "enabled": true
}
EOF

mkdir -p "$TEST_DIR/test-workspace/subdir"
echo "Nested file" > "$TEST_DIR/test-workspace/subdir/nested.txt"

# Create Dockerfile
cat > "$TEST_DIR/Dockerfile" <<EOF
FROM ubuntu:22.04

# Install file manipulation tools
RUN apt-get update && apt-get install -y \\
    coreutils \\
    jq \\
    tree \\
    && rm -rf /var/lib/apt/lists/*

# Create workspace
RUN mkdir -p /workspace

# Copy skill
COPY skill/ /root/.claude/skills/$SKILL_NAME/

# Copy test files
COPY test-workspace/ /workspace/

WORKDIR /workspace

CMD ["/bin/bash"]
EOF

# Copy skill to test directory
cp -r "$SKILL_PATH" "$TEST_DIR/skill/"

# Build test image
safe_docker_build "$TEST_DIR/Dockerfile" "skill-test:$SKILL_NAME" || {
    echo "ERROR: Failed to build test image"
    exit 1
}

export SKILL_TEST_IMAGE_NAME="skill-test:$SKILL_NAME"

# ============================================================================
# Take "Before" Filesystem Snapshot
# ============================================================================

echo ""
echo "=== Taking Filesystem Snapshot (Before) ==="

# Start container
safe_docker_run "skill-test:$SKILL_NAME" bash -c "sleep infinity" || {
    echo "ERROR: Failed to start test container"
    exit 1
}

# Get baseline file list
docker exec "$SKILL_TEST_CONTAINER_ID" find /workspace -type f -o -type d | sort > "$TEST_DIR/before-files.txt"

# Get file sizes and checksums
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
    cd /workspace
    find . -type f -exec md5sum {} \; | sort
" > "$TEST_DIR/before-checksums.txt"

# Count files
BEFORE_FILE_COUNT=$(docker exec "$SKILL_TEST_CONTAINER_ID" find /workspace -type f | wc -l)
BEFORE_DIR_COUNT=$(docker exec "$SKILL_TEST_CONTAINER_ID" find /workspace -type d | wc -l)

echo "Before execution:"
echo " Files: $BEFORE_FILE_COUNT"
echo " Directories: $BEFORE_DIR_COUNT"

# ============================================================================
# Run Skill in Container
# ============================================================================

echo ""
echo "=== Running Skill in Isolated Container ==="

# Execute skill
echo "Executing skill..."
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
    cd /root/.claude/skills/$SKILL_NAME
    # Add your skill execution command here
    # Example: ./file-processor.sh /workspace
    echo 'Skill execution placeholder - customize this for your skill'
" || {
    EXEC_EXIT_CODE=$?
    echo "ERROR: Skill execution failed with exit code: $EXEC_EXIT_CODE"
    exit "$EXEC_EXIT_CODE"
}

# ============================================================================
# Take "After" Filesystem Snapshot
# ============================================================================

echo ""
echo "=== Taking Filesystem Snapshot (After) ==="

# Get updated file list
docker exec "$SKILL_TEST_CONTAINER_ID" find /workspace -type f -o -type d | sort > "$TEST_DIR/after-files.txt"

# Get updated checksums
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
    cd /workspace
    find . -type f -exec md5sum {} \; | sort
" > "$TEST_DIR/after-checksums.txt"

# Count files
AFTER_FILE_COUNT=$(docker exec "$SKILL_TEST_CONTAINER_ID" find /workspace -type f | wc -l)
AFTER_DIR_COUNT=$(docker exec "$SKILL_TEST_CONTAINER_ID" find /workspace -type d | wc -l)

echo "After execution:"
echo " Files: $AFTER_FILE_COUNT"
echo " Directories: $AFTER_DIR_COUNT"

# ============================================================================
# Analyze Filesystem Changes
# ============================================================================

echo ""
echo "=== Analyzing Filesystem Changes ==="

# Files added
echo ""
echo "Files Added:"
comm -13 "$TEST_DIR/before-files.txt" "$TEST_DIR/after-files.txt" > "$TEST_DIR/files-added.txt"
ADDED_COUNT=$(wc -l < "$TEST_DIR/files-added.txt")
echo " Count: $ADDED_COUNT"
if [[ $ADDED_COUNT -gt 0 ]]; then
    head -10 "$TEST_DIR/files-added.txt"
    if [[ $ADDED_COUNT -gt 10 ]]; then
        echo " ... and $((ADDED_COUNT - 10)) more"
    fi
fi

# Files removed
echo ""
echo "Files Removed:"
comm -23 "$TEST_DIR/before-files.txt" "$TEST_DIR/after-files.txt" > "$TEST_DIR/files-removed.txt"
REMOVED_COUNT=$(wc -l < "$TEST_DIR/files-removed.txt")
echo " Count: $REMOVED_COUNT"
if [[ $REMOVED_COUNT -gt 0 ]]; then
    head -10 "$TEST_DIR/files-removed.txt"
    if [[ $REMOVED_COUNT -gt 10 ]]; then
        echo " ... and $((REMOVED_COUNT - 10)) more"
    fi
fi

# Files modified
echo ""
echo "Files Modified:"
comm -12 "$TEST_DIR/before-files.txt" "$TEST_DIR/after-files.txt" | while read -r file; do
    # The checksum lists were produced from inside /workspace, so their paths look
    # like "./name" while the file lists hold "/workspace/name". Convert the path
    # and match it exactly; a plain grep on the absolute path never matches.
    rel_path="./${file#/workspace/}"
    BEFORE_HASH=$(awk -v f="$rel_path" '{ h = $1; sub(/^[^ ]+  /, ""); if ($0 == f) print h }' "$TEST_DIR/before-checksums.txt")
    AFTER_HASH=$(awk -v f="$rel_path" '{ h = $1; sub(/^[^ ]+  /, ""); if ($0 == f) print h }' "$TEST_DIR/after-checksums.txt")

    if [[ -n "$BEFORE_HASH" && -n "$AFTER_HASH" && "$BEFORE_HASH" != "$AFTER_HASH" ]]; then
        echo " $file"
    fi
done | tee "$TEST_DIR/files-modified.txt"
MODIFIED_COUNT=$(wc -l < "$TEST_DIR/files-modified.txt")
echo " Count: $MODIFIED_COUNT"

# ============================================================================
# Validate File Permissions
# ============================================================================

echo ""
echo "=== Checking File Permissions ==="

# Find files with unusual permissions
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
    find /workspace -type f -perm /111 -ls
" > "$TEST_DIR/executable-files.txt" || true

EXECUTABLE_COUNT=$(wc -l < "$TEST_DIR/executable-files.txt")
if [[ $EXECUTABLE_COUNT -gt 0 ]]; then
    echo "⚠ WARNING: Found $EXECUTABLE_COUNT executable files"
    cat "$TEST_DIR/executable-files.txt"
else
    echo "✓ No unexpected executable files"
fi

# Check for world-writable files
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
    find /workspace -type f -perm -002 -ls
" > "$TEST_DIR/world-writable-files.txt" || true

WRITABLE_COUNT=$(wc -l < "$TEST_DIR/world-writable-files.txt")
if [[ $WRITABLE_COUNT -gt 0 ]]; then
    echo "⚠ WARNING: Found $WRITABLE_COUNT world-writable files (security risk)"
    cat "$TEST_DIR/world-writable-files.txt"
else
    echo "✓ No world-writable files"
fi

# ============================================================================
# Check for Sensitive Data
# ============================================================================

echo ""
echo "=== Scanning for Sensitive Data ==="

# Check for potential secrets in new files
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
    grep -rni 'password\|api[-_]key\|secret\|token' /workspace
" 2>/dev/null | tee "$TEST_DIR/potential-secrets.txt" || true

SECRET_COUNT=$(wc -l < "$TEST_DIR/potential-secrets.txt")
if [[ $SECRET_COUNT -gt 0 ]]; then
    echo "⚠ WARNING: Found $SECRET_COUNT lines with potential secrets"
    echo " Review: $TEST_DIR/potential-secrets.txt"
else
    echo "✓ No obvious secrets detected"
fi

# ============================================================================
# Validate Cleanup Behavior
# ============================================================================

echo ""
echo "=== Validating Cleanup Behavior ==="

# Check for leftover temp files
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
    find /tmp -name '*skill*' -o -name '*.tmp' -o -name '*.temp'
" > "$TEST_DIR/temp-files.txt" || true

TEMP_COUNT=$(wc -l < "$TEST_DIR/temp-files.txt")
if [[ $TEMP_COUNT -gt 0 ]]; then
    echo "⚠ WARNING: Found $TEMP_COUNT leftover temp files"
    cat "$TEST_DIR/temp-files.txt"
else
    echo "✓ No leftover temp files"
fi

# ============================================================================
# Generate Test Report
# ============================================================================

echo ""
echo "=== Test Report ==="
echo ""

CONTAINER_EXIT_CODE=$(get_container_exit_code "$SKILL_TEST_CONTAINER_ID")

if [[ $CONTAINER_EXIT_CODE -eq 0 ]]; then
    echo "✅ TEST PASSED"
else
    echo "❌ TEST FAILED"
fi

echo ""
echo "Filesystem Changes Summary:"
echo " - Files added: $ADDED_COUNT"
echo " - Files removed: $REMOVED_COUNT"
echo " - Files modified: $MODIFIED_COUNT"
echo " - Total file count change: $((AFTER_FILE_COUNT - BEFORE_FILE_COUNT))"

echo ""
echo "Security & Quality Checklist:"
[[ $EXECUTABLE_COUNT -eq 0 ]] && echo " ✓ No unexpected executable files" || echo " ✗ Found executable files"
[[ $WRITABLE_COUNT -eq 0 ]] && echo " ✓ No world-writable files" || echo " ✗ Found world-writable files"
[[ $SECRET_COUNT -eq 0 ]] && echo " ✓ No secrets in files" || echo " ✗ Potential secrets found"
[[ $TEMP_COUNT -eq 0 ]] && echo " ✓ Clean temp directory" || echo " ✗ Leftover temp files"

echo ""
echo "Detailed Reports:"
echo " - Files added: $TEST_DIR/files-added.txt"
echo " - Files removed: $TEST_DIR/files-removed.txt"
echo " - Files modified: $TEST_DIR/files-modified.txt"
echo " - Before snapshot: $TEST_DIR/before-files.txt"
echo " - After snapshot: $TEST_DIR/after-files.txt"

# Exit with appropriate code
if [[ $CONTAINER_EXIT_CODE -eq 0 ]]; then
    exit 0
else
    exit 1
fi
@@ -0,0 +1,395 @@
#!/bin/bash
# Test Template for Git-Operation Skills
# Use this template when testing skills that:
# - Create commits, branches, or tags
# - Modify git history or configuration
# - Work with git worktrees
# - Interact with remote repositories

set -euo pipefail

# ============================================================================
# Configuration
# ============================================================================

SKILL_NAME="${1:-example-git-skill}"
SKILL_PATH="$HOME/.claude/skills/$SKILL_NAME"
TEST_ID="$(date +%s)"
TEST_DIR="/tmp/skill-test-$TEST_ID"

# ============================================================================
# Load Helper Library
# ============================================================================

HELPER_LIB="$HOME/.claude/skills/skill-isolation-tester/lib/docker-helpers.sh"
if [[ ! -f "$HELPER_LIB" ]]; then
    echo "ERROR: Helper library not found: $HELPER_LIB"
    exit 1
fi

# shellcheck source=/dev/null
source "$HELPER_LIB"

# ============================================================================
# Setup Cleanup Trap
# ============================================================================

export SKILL_TEST_TEMP_DIR="$TEST_DIR"
export SKILL_TEST_KEEP_CONTAINER="false"
export SKILL_TEST_REMOVE_IMAGES="true"

trap cleanup_on_exit EXIT

# ============================================================================
# Pre-flight Checks
# ============================================================================

echo "=== Git Skill Test: $SKILL_NAME ==="
echo "Test ID: $TEST_ID"
echo ""

# Validate skill exists
if [[ ! -d "$SKILL_PATH" ]]; then
    echo "ERROR: Skill not found: $SKILL_PATH"
    exit 1
fi

# Validate Docker environment
preflight_check_docker || exit 1

# ============================================================================
# Create Test Git Repository
# ============================================================================

echo ""
echo "=== Creating Test Git Repository ==="

mkdir -p "$TEST_DIR/test-repo"
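cd "$TEST_DIR/test-repo"

# Initialize git repo with "main" as the initial branch, so the later
# `git checkout main` succeeds regardless of init.defaultBranch (requires Git 2.28+)
git init --initial-branch=main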
git config user.name "Test User"
|
||||
git config user.email "test@example.com"
|
||||
|
||||
# Create initial commit
|
||||
echo "# Test Repository" > README.md
|
||||
echo "Initial content" > file1.txt
|
||||
git add .
|
||||
git commit -m "Initial commit"
|
||||
|
||||
# Create a branch
|
||||
git checkout -b feature-branch
|
||||
echo "Feature content" > feature.txt
|
||||
git add feature.txt
|
||||
git commit -m "Add feature"
|
||||
|
||||
# Go back to main
|
||||
git checkout main
|
||||
|
||||
# Create a tag
|
||||
git tag v1.0.0
|
||||
|
||||
echo "Test repository created:"
|
||||
git log --oneline --all --graph
|
||||
echo ""
|
||||
git branch -a
|
||||
echo ""
|
||||
git tag
|
||||
|
||||
# ============================================================================
|
||||
# Build Test Environment
|
||||
# ============================================================================
|
||||
|
||||
echo ""
|
||||
echo "=== Building Test Environment ==="
|
||||
|
||||
cd "$TEST_DIR"
|
||||
|
||||
# Create Dockerfile
|
||||
cat > "$TEST_DIR/Dockerfile" <<EOF
|
||||
FROM ubuntu:22.04
|
||||
|
||||
# Install git
|
||||
RUN apt-get update && apt-get install -y \\
|
||||
git \\
|
||||
&& rm -rf /var/lib/apt/lists/*
|
||||
|
||||
# Configure git
|
||||
RUN git config --global user.name "Test User" && \\
|
||||
git config --global user.email "test@example.com"
|
||||
|
||||
# Copy skill
|
||||
COPY skill/ /root/.claude/skills/$SKILL_NAME/
|
||||
|
||||
# Copy test repository
|
||||
COPY test-repo/ /workspace/
|
||||
|
||||
WORKDIR /workspace
|
||||
|
||||
CMD ["/bin/bash"]
|
||||
EOF
|
||||
|
||||
# Copy skill
|
||||
cp -r "$SKILL_PATH" "$TEST_DIR/skill/"
|
||||
|
||||
# Build test image
|
||||
safe_docker_build "$TEST_DIR/Dockerfile" "skill-test:$SKILL_NAME" || {
|
||||
echo "ERROR: Failed to build test image"
|
||||
exit 1
|
||||
}
|
||||
|
||||
export SKILL_TEST_IMAGE_NAME="skill-test:$SKILL_NAME"
|
||||
|
||||
# ============================================================================
# Take "Before" Git Snapshot
# ============================================================================

echo ""
echo "=== Taking Git Snapshot (Before) ==="

# Start container
safe_docker_run "skill-test:$SKILL_NAME" bash -c "sleep infinity" || {
    echo "ERROR: Failed to start test container"
    exit 1
}

# Capture git state before
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
    cd /workspace
    git log --all --oneline --graph > /tmp/before-log.txt
    git branch -a > /tmp/before-branches.txt
    git tag > /tmp/before-tags.txt
    git status > /tmp/before-status.txt
    git config --list > /tmp/before-config.txt
" || true

# Copy snapshots out
docker cp "$SKILL_TEST_CONTAINER_ID:/tmp/before-log.txt" "$TEST_DIR/"
docker cp "$SKILL_TEST_CONTAINER_ID:/tmp/before-branches.txt" "$TEST_DIR/"
docker cp "$SKILL_TEST_CONTAINER_ID:/tmp/before-tags.txt" "$TEST_DIR/"
docker cp "$SKILL_TEST_CONTAINER_ID:/tmp/before-status.txt" "$TEST_DIR/"
docker cp "$SKILL_TEST_CONTAINER_ID:/tmp/before-config.txt" "$TEST_DIR/"

BEFORE_COMMIT_COUNT=$(docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "cd /workspace && git rev-list --all --count")
BEFORE_BRANCH_COUNT=$(wc -l < "$TEST_DIR/before-branches.txt")
BEFORE_TAG_COUNT=$(wc -l < "$TEST_DIR/before-tags.txt")

echo "Before execution:"
echo " Commits: $BEFORE_COMMIT_COUNT"
echo " Branches: $BEFORE_BRANCH_COUNT"
echo " Tags: $BEFORE_TAG_COUNT"

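# --- Illustrative sketch (not part of the original script) ---
# The branch and tag counts above come straight from `wc -l`, which on
# BSD/macOS pads its output with leading spaces. A small helper, given the
# hypothetical name count_lines here, could strip that padding so the values
# print cleanly. It only defines a function; nothing calls it by default.
count_lines() {
    # Print the number of newline-terminated lines in file $1, digits only.
    wc -l < "$1" | tr -d '[:space:]'
}
# Possible usage: BEFORE_TAG_COUNT=$(count_lines "$TEST_DIR/before-tags.txt")
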
# ============================================================================
# Run Skill in Container
# ============================================================================

echo ""
echo "=== Running Skill in Isolated Container ==="

# Execute skill
echo "Executing skill..."
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
    cd /root/.claude/skills/$SKILL_NAME
    # Add your skill execution command here
    # Example: ./git-skill.sh /workspace
    echo 'Skill execution placeholder - customize this for your skill'
" || {
    EXEC_EXIT_CODE=$?
    echo "ERROR: Skill execution failed with exit code: $EXEC_EXIT_CODE"
    exit "$EXEC_EXIT_CODE"
}

# ============================================================================
# Take "After" Git Snapshot
# ============================================================================

echo ""
echo "=== Taking Git Snapshot (After) ==="

# Capture git state after
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
    cd /workspace
    git log --all --oneline --graph > /tmp/after-log.txt
    git branch -a > /tmp/after-branches.txt
    git tag > /tmp/after-tags.txt
    git status > /tmp/after-status.txt
    git config --list > /tmp/after-config.txt
" || true

# Copy snapshots out
docker cp "$SKILL_TEST_CONTAINER_ID:/tmp/after-log.txt" "$TEST_DIR/"
docker cp "$SKILL_TEST_CONTAINER_ID:/tmp/after-branches.txt" "$TEST_DIR/"
docker cp "$SKILL_TEST_CONTAINER_ID:/tmp/after-tags.txt" "$TEST_DIR/"
docker cp "$SKILL_TEST_CONTAINER_ID:/tmp/after-status.txt" "$TEST_DIR/"
docker cp "$SKILL_TEST_CONTAINER_ID:/tmp/after-config.txt" "$TEST_DIR/"

AFTER_COMMIT_COUNT=$(docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "cd /workspace && git rev-list --all --count")
AFTER_BRANCH_COUNT=$(wc -l < "$TEST_DIR/after-branches.txt")
AFTER_TAG_COUNT=$(wc -l < "$TEST_DIR/after-tags.txt")

echo "After execution:"
echo " Commits: $AFTER_COMMIT_COUNT"
echo " Branches: $AFTER_BRANCH_COUNT"
echo " Tags: $AFTER_TAG_COUNT"

# ============================================================================
# Analyze Git Changes
# ============================================================================

echo ""
echo "=== Analyzing Git Changes ==="

# New commits
COMMIT_DIFF=$((AFTER_COMMIT_COUNT - BEFORE_COMMIT_COUNT))
if [[ $COMMIT_DIFF -gt 0 ]]; then
    echo "✓ Added $COMMIT_DIFF new commit(s)"

    echo ""
    echo "New commits:"
    # Use --all so commits on branches other than HEAD are listed too,
    # matching how the counts were taken (git rev-list --all --count).
    docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
        cd /workspace
        git log --all --oneline -n $COMMIT_DIFF
    "
elif [[ $COMMIT_DIFF -lt 0 ]]; then
    echo "⚠ WARNING: $((-COMMIT_DIFF)) commit(s) are no longer reachable from any ref"
else
    echo "No new commits created"
fi

# New branches
# `git branch -a` prefixes each line with a two-character marker ("* " for the
# current branch), and comm requires sorted input. Strip the marker and sort
# both snapshots so that merely switching branches is not reported as a change.
echo ""
echo "Branch Changes:"
comm -13 <(sed 's/^[*+ ] //' "$TEST_DIR/before-branches.txt" | sort) \
         <(sed 's/^[*+ ] //' "$TEST_DIR/after-branches.txt" | sort) > "$TEST_DIR/branches-added.txt"
BRANCH_ADDED=$(wc -l < "$TEST_DIR/branches-added.txt")
if [[ $BRANCH_ADDED -gt 0 ]]; then
    echo " Added $BRANCH_ADDED branch(es):"
    cat "$TEST_DIR/branches-added.txt"
fi

comm -23 <(sed 's/^[*+ ] //' "$TEST_DIR/before-branches.txt" | sort) \
         <(sed 's/^[*+ ] //' "$TEST_DIR/after-branches.txt" | sort) > "$TEST_DIR/branches-removed.txt"
BRANCH_REMOVED=$(wc -l < "$TEST_DIR/branches-removed.txt")
if [[ $BRANCH_REMOVED -gt 0 ]]; then
    echo " Removed $BRANCH_REMOVED branch(es):"
    cat "$TEST_DIR/branches-removed.txt"
fi

if [[ $BRANCH_ADDED -eq 0 && $BRANCH_REMOVED -eq 0 ]]; then
    echo " No branch changes"
fi

# New tags
# Sort both snapshots explicitly, since comm requires sorted input.
echo ""
echo "Tag Changes:"
comm -13 <(sort "$TEST_DIR/before-tags.txt") <(sort "$TEST_DIR/after-tags.txt") > "$TEST_DIR/tags-added.txt"
TAG_ADDED=$(wc -l < "$TEST_DIR/tags-added.txt")
if [[ $TAG_ADDED -gt 0 ]]; then
    echo " Added $TAG_ADDED tag(s):"
    cat "$TEST_DIR/tags-added.txt"
else
    echo " No tags added"
fi

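# --- Illustrative sketch (not part of the original script) ---
# The tag check above reports additions only, whereas the branch check also
# reports removals. Assuming the same snapshot files, a helper along these
# lines (list_removed_entries is a hypothetical name) could list entries that
# were present before the skill ran and are missing afterwards. It only
# defines a function; nothing calls it by default.
list_removed_entries() {
    # $1 = "before" snapshot file, $2 = "after" snapshot file
    local before_file="$1" after_file="$2" sorted_after
    sorted_after=$(mktemp)
    sort "$after_file" > "$sorted_after"
    # comm -23 keeps lines that appear only in the first (before) input.
    sort "$before_file" | comm -23 - "$sorted_after"
    rm -f "$sorted_after"
}
# Possible usage:
#   list_removed_entries "$TEST_DIR/before-tags.txt" "$TEST_DIR/after-tags.txt"
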
# Config changes
echo ""
echo "Git Config Changes:"
diff "$TEST_DIR/before-config.txt" "$TEST_DIR/after-config.txt" > "$TEST_DIR/config-diff.txt" || true
if [[ -s "$TEST_DIR/config-diff.txt" ]]; then
    echo " Configuration was modified:"
    cat "$TEST_DIR/config-diff.txt"
else
    echo " No configuration changes"
fi

# ============================================================================
# Check Working Tree Status
# ============================================================================

echo ""
echo "=== Checking Working Tree Status ==="

UNCOMMITTED_CHANGES=$(docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "cd /workspace && git status --porcelain" || echo "")
if [[ -n "$UNCOMMITTED_CHANGES" ]]; then
    echo "⚠ WARNING: Uncommitted changes detected:"
    echo "$UNCOMMITTED_CHANGES"
    echo ""
    echo "Skills should clean up working tree after execution!"
else
    echo "✓ Working tree is clean"
fi

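# --- Illustrative sketch (not part of the original script) ---
# The check above treats any `git status --porcelain` output as one warning.
# In the porcelain format each line starts with a two-character status code,
# and "??" marks an untracked file. A helper such as the hypothetical
# summarize_porcelain below could separate leftover untracked files from
# modifications to tracked files. It reads the porcelain text on stdin and
# only defines a function; nothing calls it by default.
summarize_porcelain() {
    local tracked=0 untracked=0 line
    while IFS= read -r line; do
        case "$line" in
            '')    continue ;;                        # skip empty lines
            '??'*) untracked=$((untracked + 1)) ;;    # untracked file
            *)     tracked=$((tracked + 1)) ;;        # staged or modified tracked file
        esac
    done
    echo "tracked=$tracked untracked=$untracked"
}
# Possible usage: printf '%s\n' "$UNCOMMITTED_CHANGES" | summarize_porcelain
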
# ============================================================================
# Validate Git Safety
# ============================================================================

echo ""
echo "=== Git Safety Checks ==="

# Check for force operations in logs
# Note: `docker logs` only contains output from the container's main process;
# output of commands started with `docker exec` is not included.
# Match "-f" only as a standalone flag so names such as "skill-foo" do not
# trigger false positives ("--force" is already covered by "force").
docker logs "$SKILL_TEST_CONTAINER_ID" 2>&1 | grep -iE -e 'force' -e '(^|[[:space:]])-f([[:space:]]|$)' > "$TEST_DIR/force-operations.txt" || true
FORCE_OPS=$(wc -l < "$TEST_DIR/force-operations.txt")
if [[ $FORCE_OPS -gt 0 ]]; then
    echo "⚠ WARNING: Detected $FORCE_OPS force operations"
    cat "$TEST_DIR/force-operations.txt"
else
    echo "✓ No force operations detected"
fi

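# --- Illustrative sketch (not part of the original script) ---
# The check above scans `docker logs`, which carries only the output of the
# container's main process, so commands run via `docker exec` do not appear
# there. Assuming the skill's exec output were redirected to a file (the path
# "$TEST_DIR/skill-output.log" in the usage line is hypothetical), a helper
# like count_force_operations could count lines that mention a force flag.
# It only defines a function; nothing calls it by default.
count_force_operations() {
    # $1 = log file to scan; prints the number of matching lines.
    # Matches "--force" anywhere, or "-f" as a standalone flag.
    grep -ciE -e '--force' -e '(^|[[:space:]])-f([[:space:]]|$)' -- "$1" || true
}
# Possible usage: FORCE_OPS=$(count_force_operations "$TEST_DIR/skill-output.log")
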
# Check for history rewriting
docker logs "$SKILL_TEST_CONTAINER_ID" 2>&1 | grep -i "rebase\|reset --hard\|filter-branch" > "$TEST_DIR/history-rewrites.txt" || true
REWRITES=$(wc -l < "$TEST_DIR/history-rewrites.txt")
if [[ $REWRITES -gt 0 ]]; then
    echo "⚠ WARNING: Detected $REWRITES history rewrite operations"
    cat "$TEST_DIR/history-rewrites.txt"
else
    echo "✓ No history rewriting detected"
fi

# Check for dangling commits
DANGLING_COMMITS=$(docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "cd /workspace && git fsck --lost-found 2>&1 | grep 'dangling commit'" || echo "")
if [[ -n "$DANGLING_COMMITS" ]]; then
    echo "⚠ WARNING: Dangling commits found (potential data loss)"
    echo "$DANGLING_COMMITS"
else
    echo "✓ No dangling commits"
fi

# ============================================================================
# Generate Test Report
# ============================================================================

echo ""
echo "=== Test Report ==="
echo ""

CONTAINER_EXIT_CODE=$(get_container_exit_code "$SKILL_TEST_CONTAINER_ID")

if [[ $CONTAINER_EXIT_CODE -eq 0 ]]; then
    echo "✅ TEST PASSED"
else
    echo "❌ TEST FAILED"
fi

echo ""
echo "Git Changes Summary:"
echo " - Commits added: $COMMIT_DIFF"
echo " - Branches added: $BRANCH_ADDED"
echo " - Branches removed: $BRANCH_REMOVED"
echo " - Tags added: $TAG_ADDED"

echo ""
echo "Safety Checklist:"
[[ -z "$UNCOMMITTED_CHANGES" ]] && echo " ✓ Clean working tree" || echo " ✗ Uncommitted changes"
[[ $FORCE_OPS -eq 0 ]] && echo " ✓ No force operations" || echo " ✗ Force operations detected"
[[ $REWRITES -eq 0 ]] && echo " ✓ No history rewriting" || echo " ✗ History rewriting detected"
[[ -z "$DANGLING_COMMITS" ]] && echo " ✓ No dangling commits" || echo " ✗ Dangling commits found"

echo ""
echo "Detailed Snapshots:"
echo " - Before log: $TEST_DIR/before-log.txt"
echo " - After log: $TEST_DIR/after-log.txt"
echo " - Branch changes: $TEST_DIR/branches-added.txt"
echo " - Config diff: $TEST_DIR/config-diff.txt"

# Exit with appropriate code
if [[ $CONTAINER_EXIT_CODE -eq 0 ]]; then
    exit 0
else
    exit 1
fi