Initial commit

Zhongwei Li
2025-11-29 18:16:40 +08:00
commit f125e90b9f
370 changed files with 67769 additions and 0 deletions

@@ -0,0 +1,15 @@
# Changelog
## 0.2.0
- Refactored to Anthropic progressive disclosure pattern
- Updated description with "Use PROACTIVELY when..." format
- Removed version/author from frontmatter
## 0.1.0
- Initial release with three isolation modes
- Git Worktree (fast), Docker (balanced), VM (safest)
- Automatic risk assessment and mode detection
- Side-effect validation and dependency analysis
- Test report generation with actionable recommendations

@@ -0,0 +1,335 @@
# Skill Isolation Tester
> Automated testing framework for Claude Code skills in isolated environments
## Overview
Test your newly created Claude Code skills in isolated environments before sharing them publicly. This skill automatically spins up git worktrees, Docker containers, or VMs to validate that your skills work correctly without hidden dependencies on your local setup.
## Features
- **Multiple Isolation Levels**: Choose from git worktree (fast), Docker (balanced), or VM (safest)
- **Automatic Mode Detection**: Analyzes skill risk and suggests appropriate isolation level
- **Comprehensive Validation**: Checks execution, side effects, dependencies, and cleanup
- **Detailed Reports**: Get actionable feedback with specific issues and recommendations
- **Safe Testing**: Protect your main development environment from experimental skills
## Quick Start
### Basic Usage
```
test skill my-new-skill in isolation
```
Claude will analyze your skill and choose the appropriate isolation environment.
### Specify Environment
```
test skill my-new-skill in worktree # Fast, lightweight
test skill my-new-skill in docker # OS isolation
test skill my-new-skill in vm # Maximum security
```
### Check for Issues
```
check if skill my-new-skill has hidden dependencies
verify skill my-new-skill cleans up after itself
```
## Isolation Modes
### 🚀 Git Worktree (Fast)
**Best for**: Read-only skills, quick iteration during development
- ✅ Creates test in seconds
- ✅ Minimal disk space
- ⚠️ Limited isolation (shares system packages)
**Prerequisites**: Git 2.5+
### 🐳 Docker (Balanced)
**Best for**: Skills that install packages or modify files
- ✅ Full OS isolation
- ✅ Reproducible environment
- ⚠️ Requires Docker installed
**Prerequisites**: Docker daemon running
### 🖥️ VM (Safest)
**Best for**: High-risk skills, untrusted sources
- ✅ Complete isolation
- ✅ Test on different OS versions
- ⚠️ Slower, resource-intensive
**Prerequisites**: Multipass, UTM, or VirtualBox
## What Gets Tested
### ✅ Execution Validation
- Skill completes without errors
- No unhandled exceptions
- Acceptable performance
### ✅ Side Effect Detection
- Files created/modified/deleted
- Processes started (and stopped)
- System configuration changes
- Network activity
### ✅ Dependency Analysis
- Required system packages
- NPM/pip dependencies
- Hardcoded paths
- Environment variables needed
### ✅ Cleanup Verification
- Temporary files removed
- Processes terminated
- System state restored
## Example Report
```markdown
# Skill Isolation Test Report: my-new-skill
## Status: ⚠️ WARNING (Ready with minor fixes)
### Execution Results
✅ Skill completed successfully
✅ No errors detected
⏱️ Execution time: 12s
### Issues Found
**HIGH Priority:**
- Missing documentation for `jq` dependency
- Hardcoded path: /Users/connor/.claude/config (line 45)
**MEDIUM Priority:**
- 3 temporary files not cleaned up in /tmp
### Recommendations
1. Document `jq` requirement in README
2. Replace hardcoded path with $HOME/.claude/config
3. Add cleanup for /tmp/skill-temp-*.log files
### Overall Grade: B (READY after addressing HIGH priority items)
```
## Installation
This skill is already available in your Claude Code skills directory.
### Manual Installation
```bash
cp -r skill-isolation-tester ~/.claude/skills/
```
### Verify Installation
Start Claude Code and say:
```
test skill [any-skill-name] in isolation
```
## Prerequisites
### Required (All Modes)
- Git 2.5+
- Claude Code 1.0+
### Optional (Docker Mode)
- Docker Desktop or Docker Engine
- 1GB+ free disk space
### Optional (VM Mode)
- Multipass (recommended) or
- UTM (macOS) or
- VirtualBox (cross-platform)
- 8GB+ host RAM
- 20GB+ free disk space
## Configuration
### Set Default Isolation Mode
Create `~/.claude/skills/skill-isolation-tester/config.json`:
```json
{
  "default_mode": "docker",
  "docker": {
    "base_image": "ubuntu:22.04",
    "memory_limit": "512m",
    "cpu_limit": "1.0"
  },
  "vm": {
    "platform": "multipass",
    "os_version": "22.04",
    "cpus": 2,
    "memory": "2G",
    "disk": "10G"
  }
}
```
## Use Cases
### Before Submitting to Claudex Marketplace
```
validate skill my-marketplace-skill in docker
```
Ensures your skill works in a clean environment, without relying on your personal configuration.
### Testing Skills from Others
```
test skill untrusted-skill in vm
```
Maximum isolation protects your system from potential issues.
### Catching Environment-Specific Bugs
```
test skill my-skill in worktree
```
Quickly verifies that the skill doesn't depend on your specific setup.
### CI/CD Integration
```bash
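#!/bin/bash
# In your CI pipeline
# -p runs Claude Code non-interactively (print mode) so the job does not wait for input.
# This assumes the command exits non-zero when the skill tests fail.
if claude -p "test skill $SKILL_NAME in docker"; then
  echo "✅ Skill tests passed"
else
  echo "❌ Skill tests failed"
  exit 1
fi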
```
## Troubleshooting
### "Docker daemon not running"
**macOS**: Open Docker Desktop
**Linux**: `sudo systemctl start docker`
### "Multipass not found"
```bash
# macOS
brew install multipass
# Linux
sudo snap install multipass
```
### "Permission denied"
Add your user to docker group:
```bash
sudo usermod -aG docker $USER
newgrp docker
```
### "Out of disk space"
Clean up Docker:
```bash
# Removes all stopped containers, unused networks, and unused images
docker system prune -a
```
## Best Practices
1. **Test before committing** - Catch issues early
2. **Start with worktree** - Fast iteration during development
3. **Use Docker for final validation** - Before public release
4. **Use VM for untrusted skills** - Safety first
5. **Review test reports** - Address all HIGH priority issues
6. **Document dependencies** - Help other users
## Advanced Usage
### Custom Test Scenarios
```
test skill my-skill with inputs "test-file.txt, --option value"
```
### Batch Testing
```
test all skills in directory ./skills/ in worktree
```
### Keep Environment for Debugging
```
test skill my-skill in docker --keep
```
Preserves container/VM for manual inspection.
## Architecture
```
skill-isolation-tester/
├── SKILL.md # Main skill manifest
├── README.md # This file
├── CHANGELOG.md # Version history
├── plugin.json # Marketplace metadata
├── modes/ # Mode-specific workflows
│ ├── mode1-git-worktree.md # Fast isolation
│ ├── mode2-docker.md # Container isolation
│ └── mode3-vm.md # VM isolation
├── data/ # Reference materials
│ ├── risk-assessment.md # How to assess skill risk
│ └── side-effect-checklist.md # What to check for
├── templates/ # Report templates
│ └── test-report.md # Standard report format
└── examples/ # Sample outputs
    └── test-results/ # Example test results
```
## Contributing
Found a bug or have a feature request? Issues and PRs welcome!
## License
MIT License - see LICENSE file for details
## Related Skills
- **skill-creator**: Create new skills with proper structure
- **git-worktree-setup**: Manage parallel development workflows
## Changelog
See [CHANGELOG.md](CHANGELOG.md) for version history.
## Credits
Created by Connor
Inspired by best practices in software testing and isolation
---
**Remember**: Test in isolation, ship with confidence! 🚀

@@ -0,0 +1,174 @@
---
name: skill-isolation-tester
description: Use PROACTIVELY when validating Claude Code skills before sharing or public release. Automated testing framework using multiple isolation environments (git worktree, Docker containers, VMs) to catch environment-specific bugs, hidden dependencies, and cleanup issues. Includes production-ready test templates and risk-based mode auto-detection. Not for functional testing of skill logic or non-skill code.
---
# Skill Isolation Tester
Tests Claude Code skills in isolated environments to ensure they work correctly without dependencies on your local setup.
## When to Use
**Trigger Phrases**:
- "test skill [name] in isolation"
- "validate skill [name] in clean environment"
- "test my new skill in worktree/docker/vm"
- "check if skill [name] has hidden dependencies"
**Use Cases**:
- Test before committing or sharing publicly
- Validate no hidden dependencies on local environment
- Verify cleanup behavior (no leftover files/processes)
- Catch environment-specific bugs
## Quick Decision Matrix
| Request | Mode | Isolation Level |
|---------|------|-----------------|
| "test in worktree" | Git Worktree | Fast, lightweight |
| "test in docker" | Docker | Full OS isolation |
| "test in vm" | VM | Complete isolation |
| "test skill X" (unspecified) | Auto-detect | Based on skill risk |
## Risk-Based Auto-Detection
| Risk Level | Criteria | Recommended Mode |
|------------|----------|------------------|
| Low | Read-only, no system commands | Git Worktree |
| Medium | File creation, bash commands | Docker |
| High | System config changes, VM ops | VM |
## Mode 1: Git Worktree (Fast)
**Best for**: Low-risk skills, quick iteration
**Process**:
1. Create isolated git worktree
2. Install Claude Code
3. Copy skill and run tests
4. Cleanup
**Workflow**: `modes/mode1-git-worktree.md`
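The create and cleanup steps above can be sketched with plain git commands. This is a minimal sketch, not the contents of `modes/mode1-git-worktree.md`: the function name `run_in_worktree` is hypothetical, and installing Claude Code and copying the skill are left out. It sticks to `git worktree add` and `git worktree prune`, which are available from Git 2.5.

```bash
# Create a detached worktree of HEAD in a temp directory, run a command
# inside it, then remove the worktree whatever the command's result was.
# run_in_worktree is a hypothetical helper name.
run_in_worktree() {
  local repo=$1; shift
  local base wt rc
  base=$(mktemp -d) || return 1
  wt="$base/worktree"
  # --detach checks out HEAD without creating a new branch
  if ! git -C "$repo" worktree add --detach "$wt" HEAD >/dev/null 2>&1; then
    rm -rf "$base"
    return 1
  fi
  ( cd "$wt" && "$@" )
  rc=$?
  # Delete the checkout, then drop git's stale bookkeeping for it
  rm -rf "$base"
  git -C "$repo" worktree prune
  return "$rc"
}
```

For example, `run_in_worktree . ls` lists the repository's tracked files from inside a throwaway checkout and returns the exit status of `ls`.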
## Mode 2: Docker Container (Balanced)
**Best for**: Medium-risk skills, full OS isolation
**Process**:
1. Build/pull Docker image
2. Create container with Claude Code
3. Run skill tests with monitoring
4. Cleanup container and images
**Workflow**: `modes/mode2-docker.md`
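As a rough illustration of the four steps above, the sketch below builds an image, runs one command in a throwaway container with resource limits, and removes the image afterwards. The helper names `run` and `docker_skill_test`, the image tag, the limits, and the test command are all assumptions made for illustration; it also assumes a Dockerfile in the current directory that installs Claude Code and the skill. `--network none` is included as a conservative default and would need to be dropped for skills that call remote APIs.

```bash
# Run a command, or only print it when DRY_RUN=1 (handy for reviewing the plan).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Build, test in a disposable container, then remove the image.
# docker_skill_test and the values below are illustrative, not part of the skill.
docker_skill_test() {
  local skill=$1
  local image="skill-test:$skill"
  local rc
  run docker build -t "$image" . || return 1
  # --rm deletes the container on exit; --memory/--cpus cap resource usage
  run docker run --rm --memory 512m --cpus 1.0 --network none "$image" \
    bash -c "echo 'Testing $skill'"
  rc=$?
  run docker rmi "$image"
  return "$rc"
}
```

Running `DRY_RUN=1 docker_skill_test my-skill` prints the three Docker commands without executing them.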
## Mode 3: VM (Safest)
**Best for**: High-risk skills, untrusted code
**Process**:
1. Provision VM, take snapshot
2. Install Claude Code
3. Run tests with full monitoring
4. Rollback or cleanup
**Workflow**: `modes/mode3-vm.md`
## Test Templates
Production-ready templates in `test-templates/`:
| Template | Use For |
|----------|---------|
| `docker-skill-test.sh` | Docker container/image skills |
| `docker-skill-test-json.sh` | CI/CD with JSON/JUnit output |
| `api-skill-test.sh` | HTTP/API calling skills |
| `file-manipulation-skill-test.sh` | File modification skills |
| `git-skill-test.sh` | Git operation skills |
**Usage**:
```bash
chmod +x test-templates/docker-skill-test.sh
./test-templates/docker-skill-test.sh my-skill-name
# CI/CD with JSON output
export JSON_ENABLED=true
./test-templates/docker-skill-test-json.sh my-skill-name
```
## Helper Library
`lib/docker-helpers.sh` provides robust Docker testing utilities:
```bash
source ~/.claude/skills/skill-isolation-tester/lib/docker-helpers.sh
trap cleanup_on_exit EXIT
preflight_check_docker || exit 1
safe_docker_build "Dockerfile" "skill-test:my-skill"
safe_docker_run "skill-test:my-skill" bash -c "echo 'Testing...'"
```
**Functions**: `validate_shell_command`, `retry_docker_command`, `cleanup_on_exit`, `preflight_check_docker`, `safe_docker_build`, `safe_docker_run`
## Validation Checks
**Execution**:
- [ ] Skill completes without errors
- [ ] Output matches expected format
- [ ] Execution time acceptable
**Side Effects**:
- [ ] No orphaned processes
- [ ] Temporary files cleaned up
- [ ] No unexpected system modifications
**Portability**:
- [ ] No hardcoded paths
- [ ] All dependencies documented
- [ ] Works in clean environment
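The "No hardcoded paths" check can be approximated with a recursive grep. The sketch below is one possible heuristic, not the skill's actual check: `find_hardcoded_paths` is a hypothetical name, and the pattern only covers home directories under `/Users` and `/home`.

```bash
# Print file:line:text for lines that reference a specific user's home
# directory, such as /Users/<name>/ or /home/<name>/.
# find_hardcoded_paths is a hypothetical helper name.
find_hardcoded_paths() {
  local skill_dir=$1
  grep -rnE '(/Users|/home)/[A-Za-z0-9._-]+/' "$skill_dir" 2>/dev/null
}
```

Paths written relative to `$HOME` are not reported, which is the form a portable skill would normally use.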
## Test Report Format
```markdown
# Skill Isolation Test Report: [skill-name]
## Environment: [Git Worktree / Docker / VM]
## Status: [PASS / FAIL / WARNING]
### Execution Results
✅ Skill completed successfully
### Side Effects Detected
⚠️ 3 temporary files not cleaned up
### Dependency Analysis
📦 Required: jq, git
### Overall Grade: B (READY with minor fixes)
```
## Reference Materials
- `modes/mode1-git-worktree.md` - Fast isolation workflow
- `modes/mode2-docker.md` - Container isolation workflow
- `modes/mode3-vm.md` - Full VM isolation workflow
- `data/risk-assessment.md` - Skill risk evaluation
- `data/side-effect-checklist.md` - Side effect validation
- `templates/test-report.md` - Report template
- `test-templates/README.md` - Template documentation
## Quick Commands
```
# Test with auto-detection
test skill my-new-skill in isolation
# Test in specific environment
test skill my-new-skill in worktree # Fast
test skill my-new-skill in docker # Balanced
test skill my-new-skill in vm # Safest
```
---
**Version**: 0.1.0 | **Author**: Connor

@@ -0,0 +1,391 @@
# Skill Risk Assessment Guide
## Overview
This guide helps you assess the risk level of a skill to determine the appropriate isolation environment for testing. Risk assessment prevents over-isolation (wasting time) and under-isolation (security issues).
## Risk Levels
### Low Risk → Git Worktree
**Characteristics:**
- Read-only operations on existing files
- No system commands (bash, npm, apt, etc.)
- No file creation outside skill directory
- No network requests
- Pure data processing or analysis
- File reading and reporting only
**Examples:**
- Code analyzer that reads files and generates reports
- Configuration validator that checks syntax
- Documentation generator from code comments
- Markdown formatter or linter
- Log file parser
**Appropriate Environment:** Git Worktree (fast, lightweight)
### Medium Risk → Docker
**Characteristics:**
- File creation in user directories
- NPM/pip package installation
- Bash commands for file operations
- Git operations (clone, commit, etc.)
- Network requests (API calls, downloads)
- Environment variable reads
- Temporary file creation
- Database connections (local)
**Examples:**
- Code generator that creates new files
- Package installer or dependency manager
- API integration that fetches remote data
- Build tool that compiles code
- Test runner that executes tests
- Migration tool that updates files
**Appropriate Environment:** Docker (OS isolation, reproducible)
### High Risk → VM
**Characteristics:**
- System configuration changes (/etc/ modifications)
- Service installation (systemd, cron)
- Kernel module loading
- VM or container operations
- Database schema migrations (production)
- Destructive operations (file deletion, disk formatting)
- Privilege escalation (sudo commands)
- Unknown or untrusted source
**Examples:**
- System setup automation
- Infrastructure provisioning
- VM management tools
- Security testing tools
- Experimental or unreviewed skills
- Skills from external repositories
**Appropriate Environment:** VM (complete isolation, safest)
## Assessment Checklist
### Step 1: Parse Skill Manifest (SKILL.md)
Read the skill's SKILL.md and look for these keywords:
**Low Risk Indicators:**
- "analyze", "read", "parse", "validate", "check", "lint", "format"
- "generate report", "calculate", "summarize"
- Read-only file operations
- No system commands mentioned
**Medium Risk Indicators:**
- "install", "create", "write", "modify", "update", "build", "compile"
- "npm install", "pip install", "git clone"
- "fetch", "download", "API call"
- File creation mentioned
- Bash commands for file operations
**High Risk Indicators:**
- "sudo", "systemctl", "cron", "service"
- "configure system", "modify /etc"
- "VM", "docker run", "container"
- "delete", "remove", "format"
- "root access", "privilege"
### Step 2: Scan Skill Code
If skill includes scripts or code files, scan for:
**Red Flags (High Risk):**
```bash
# In bash scripts
sudo
systemctl
/etc/
chmod 777
rm -rf /
dd if=
mkfs
usermod
passwd
```
```javascript
// In JavaScript/Node
require('child_process').exec('sudo')
fs.rmdirSync('/', { recursive: true })
process.setuid(0)
```
```python
# In Python
os.system('sudo')
import subprocess
subprocess.run(['sudo', ...])
```
**Medium Risk Patterns:**
```bash
npm install
git clone
curl | bash
apt-get install
brew install
pip install
mkdir -p
touch
echo > file
```
**Low Risk Patterns:**
```bash
cat file.txt
grep pattern
find . -name
ls -la
echo "message"
```
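A first pass over a skill's scripts can be automated with grep. The sketch below searches for a small subset of the high-risk patterns listed above; `scan_red_flags` is a hypothetical name and the pattern list is illustrative rather than complete. Matches are only a prompt for manual review: `rm -rf /` also matches harmless commands such as `rm -rf /tmp/build`.

```bash
# Print file:line:text for lines containing common high-risk commands.
# scan_red_flags is a hypothetical helper; extend the pattern as needed.
scan_red_flags() {
  local skill_dir=$1
  grep -rnE 'sudo |systemctl|chmod 777|rm -rf /|mkfs|dd if=' "$skill_dir" 2>/dev/null
}
```

Because grep exits non-zero when nothing matches, `scan_red_flags ./my-skill || echo "no red flags found"` works as a quick gate.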
### Step 3: Check Dependencies
Review plugin.json or README for dependencies:
**Low Risk:**
- No external dependencies
- Pure JavaScript/Python/Ruby standard library
- Read-only CLI tools (cat, grep, jq for reading only)
**Medium Risk:**
- NPM packages listed
- Python packages (via requirements.txt)
- Common CLI tools (git, curl, wget)
- Database connections (read/write)
**High Risk:**
- System packages (apt, yum, brew)
- Kernel modules
- Root-level dependencies
- Unsigned binaries
- External scripts from unknown sources
### Step 4: Review File Operations
Check what directories the skill accesses:
**Low Risk:**
- Reads from current directory only
- Reads from specified input files
- Writes reports to current directory
**Medium Risk:**
- Reads/writes to ~/.claude/
- Reads/writes to /tmp/
- Creates files in user directories
- Modifies project files
**High Risk:**
- Accesses /etc/
- Accesses /usr/ or /usr/local/
- Accesses /sys/ or /proc/
- Modifies system binaries
- Accesses /var/log/
### Step 5: Network Activity Assessment
**Low Risk:**
- No network activity
- Reads from local cache only
**Medium Risk:**
- HTTP GET requests to public APIs
- Documented API endpoints
- Read-only data fetching
- HTTPS only
**High Risk:**
- HTTP POST with sensitive data
- Unclear network destinations
- Raw socket operations
- Arbitrary URL from user input
- Self-updating mechanism
## Automatic Risk Scoring
Use this scoring system:
```javascript
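// Assumes `skill` is the skill's text (SKILL.md plus any scripts) as one string.
// Case-insensitive substring match; deliberately naive ("read" also matches
// "README"), so treat the score as a starting point for review.
function mentions(skill, ...terms) {
  const text = String(skill).toLowerCase();
  return terms.some((term) => text.includes(term.toLowerCase()));
}
// Path access is detected the same way, by looking for the path in the text.
const accesses = mentions;

function assessSkillRisk(skill) {
  let score = 0;
  // File operations
  if (mentions(skill, "read", "parse", "analyze")) score += 1;
  if (mentions(skill, "write", "create", "modify")) score += 3;
  if (mentions(skill, "delete", "remove", "rm -rf")) score += 8;
  // System operations
  if (mentions(skill, "npm install", "pip install")) score += 3;
  if (mentions(skill, "apt-get", "brew install")) score += 5;
  if (mentions(skill, "sudo", "systemctl", "service")) score += 10;
  // File paths
  if (accesses(skill, "~/", "/tmp/")) score += 2;
  if (accesses(skill, "/etc/", "/usr/")) score += 8;
  // Network
  if (mentions(skill, "fetch", "API", "curl")) score += 2;
  if (mentions(skill, "download", "wget")) score += 3;
  // Process operations
  if (mentions(skill, "exec", "spawn", "child_process")) score += 4;
  // Determine risk level
  if (score <= 3) return "low"; // Worktree
  if (score <= 10) return "medium"; // Docker
  return "high"; // VM
}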
```
**Scoring Reference:**
- 0-3: Low Risk → Git Worktree
- 4-10: Medium Risk → Docker
- 11+: High Risk → VM
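For shell-based tooling, the same thresholds can be expressed as a small lookup. This is a sketch using the cut-offs from the scoring reference; `mode_for_score` is a hypothetical name.

```bash
# Map a numeric risk score to an isolation mode:
# 0-3 -> worktree, 4-10 -> docker, 11 and above -> vm.
# mode_for_score is a hypothetical helper name.
mode_for_score() {
  local score=$1
  if [ "$score" -le 3 ]; then
    echo "worktree"
  elif [ "$score" -le 10 ]; then
    echo "docker"
  else
    echo "vm"
  fi
}
```

A score of 7 yields `docker`, and a score of 1 yields `worktree`.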
## Special Cases
### Unknown or Unreviewed Skills
**Default:** High Risk (VM isolation)
Even if a skill appears low risk, use a VM for the first test of:
- Skills from external repositories
- Skills without documentation
- Skills with obfuscated code
- Skills from untrusted authors
### Skills in Active Development
**Recommendation:** Medium Risk (Docker)
For your own skills during development:
- Start with Git Worktree for speed
- Use Docker before committing
- Use VM before public release
### Skills from Marketplace
**Recommendation:** Follow listed risk level
Trusted marketplace skills can use their documented risk level.
## Override Cases
User can always override automatic detection:
```
test skill low-risk-skill in vm # More isolation than needed (safe but slow)
test skill high-risk-skill in docker # Less isolation (not recommended)
```
**Warn user if choosing lower isolation than recommended.**
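One way to implement that warning is to rank the modes and compare the requested mode with the recommended one. The sketch below assumes the three mode names used throughout this guide; `mode_rank` and `check_override` are hypothetical helpers.

```bash
# Rank modes by isolation strength so they can be compared numerically.
# Unknown modes rank 0, so they always trigger the warning.
mode_rank() {
  case "$1" in
    worktree) echo 1 ;;
    docker)   echo 2 ;;
    vm)       echo 3 ;;
    *)        echo 0 ;;
  esac
}

# Warn on stderr and return 1 when the requested mode is weaker than recommended.
check_override() {
  local requested=$1 recommended=$2
  if [ "$(mode_rank "$requested")" -lt "$(mode_rank "$recommended")" ]; then
    echo "WARNING: '$requested' provides less isolation than the recommended '$recommended'" >&2
    return 1
  fi
  return 0
}
```

Choosing more isolation than recommended (for example `check_override vm worktree`) passes silently, while `check_override docker vm` prints the warning.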
## Risk Re-assessment
Re-assess risk if skill is updated:
- Major version changes
- New dependencies added
- New file operations
- Expanded scope
## Decision Tree
```
Start
|
├─ Is skill from an untrusted or unreviewed source?
| └─ YES → High Risk (VM)
| └─ NO → Continue
|
├─ Does skill modify system configs or use sudo?
| └─ YES → High Risk (VM)
| └─ NO → Continue
|
├─ Does skill install packages or modify files?
| └─ YES → Medium Risk (Docker)
| └─ NO → Continue
|
└─ Does skill read files only?
  └─ YES → Low Risk (Worktree)
  └─ NO → Medium Risk (Docker)
```
## Example Assessments
### Example 1: "code-formatter"
**Description:** Formats JavaScript/TypeScript files using prettier
**Analysis:**
- Reads files: Yes (score: +1)
- Writes files: Yes (score: +3)
- System commands: No
- Dependencies: prettier (npm package) (score: +3)
- File paths: Current directory only
**Total Score:** 7
**Risk Level:** Medium → Docker
**Reasoning:** Modifies files but limited to project directory. Docker provides adequate isolation.
### Example 2: "log-analyzer"
**Description:** Parses log files and generates HTML report
**Analysis:**
- Reads files: Yes (score: +1)
- Writes files: Yes (HTML report) (score: +3)
- System commands: No
- Dependencies: None
- File paths: Current directory + /tmp for temp files (score: +2)
**Total Score:** 6
**Risk Level:** Medium → Docker
**Reasoning:** Safe operations but creates files. Docker ensures clean testing.
### Example 3: "system-auditor"
**Description:** Audits system security configuration
**Analysis:**
- Reads files: Yes, including /etc/ (score: +1)
- System commands: Runs systemctl, checks services (score: +10)
- Dependencies: System tools
- File paths: /etc/, /var/log/ (score: +8)
**Total Score:** 19
**Risk Level:** High → VM
**Reasoning:** Accesses sensitive system directories and uses system commands. VM required.
### Example 4: "markdown-linter"
**Description:** Checks markdown files for style violations
**Analysis:**
- Reads files: Yes (score: +1)
- Writes files: No (only stdout)
- System commands: No
- Dependencies: None
- File paths: Current directory only
**Total Score:** 1
**Risk Level:** Low → Git Worktree
**Reasoning:** Pure read-only analysis. Worktree is sufficient and fast.
---
**Remember:** When in doubt, choose higher isolation. It's better to be safe than to clean up a compromised system. Speed is secondary to security.

@@ -0,0 +1,543 @@
# Side Effect Detection Checklist
## Overview
This checklist helps identify all side effects caused by skill execution. Side effects are any changes to the system state beyond the skill's primary output. Proper detection ensures skills are well-behaved and clean up after themselves.
## Why Side Effects Matter
**Portability:** Skills with untracked side effects may not work for other users
**Cleanliness:** Leftover files and processes waste resources
**Security:** Unexpected system modifications are security risks
**Documentation:** Users need to know what a skill changes
## Categories of Side Effects
## 1. Filesystem Changes
### Files Created
**What to Check:**
- Files in skill directory
- Files in /tmp/ or /var/tmp/
- Files in user home directory (~/)
- Files in system directories (/usr/local/, /opt/)
- Hidden files (.*) and cache directories (.cache/)
- Lock files (.lock, .pid)
**How to Detect:**
```bash
# Before execution (sort so the two lists diff cleanly)
find /path -type f | sort > /tmp/before-files.txt
# After execution
find /path -type f | sort > /tmp/after-files.txt
# Compare: lines starting with ">" are newly created files
diff /tmp/before-files.txt /tmp/after-files.txt | grep "^>" | sed 's/^> //'
```
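An alternative to filtering `diff` output is `comm`, which compares two sorted lists directly. The sketch below wraps the snapshot and comparison steps in two functions; `snapshot_files` and `new_files` are hypothetical names. It assumes the snapshot files are written outside the directory being scanned, so that they do not show up as side effects themselves.

```bash
# Record a sorted list of regular files under a directory.
# $1 = directory to scan, $2 = snapshot file to write.
snapshot_files() {
  find "$1" -type f 2>/dev/null | LC_ALL=C sort > "$2"
}

# Print paths that appear only in the "after" snapshot.
# comm -13 suppresses lines unique to the first file and lines common to both.
new_files() {
  LC_ALL=C comm -13 "$1" "$2"
}
```

Swapping `-13` for `-23` would list files that disappeared instead, which covers the deleted-files case with the same snapshots.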
**Expected Behavior:**
- ✅ Temporary files in /tmp cleaned up before exit
- ✅ Output files in current directory or specified location
- ✅ Cache files in ~/.cache/skill-name/ (acceptable)
- ❌ Random files scattered across filesystem
- ❌ Files in system directories without explicit permission
**Severity:**
- **LOW**: Cache files in proper location
- **MEDIUM**: Temp files not cleaned up
- **HIGH**: Files in system directories
- **CRITICAL**: Files overwriting existing user data
### Files Modified
**What to Check:**
- Project files (package.json, tsconfig.json, etc.)
- Configuration files (.env, .config/)
- System configs (/etc/*)
- User configs (~/.bashrc, ~/.zshrc)
- Git repository files (.git/)
**How to Detect:**
```bash
# Take checksums before (on macOS, use `md5 -r` or `shasum` if md5sum is unavailable)
find /path -type f -exec md5sum {} + | sort -k 2 > /tmp/before-checksums.txt
# After execution
find /path -type f -exec md5sum {} + | sort -k 2 > /tmp/after-checksums.txt
# Find modified files
diff /tmp/before-checksums.txt /tmp/after-checksums.txt
```
**Expected Behavior:**
- ✅ Only files explicitly in skill's scope modified
- ✅ Backup created before modifying important files
- ✅ Modifications clearly documented in output
- ❌ Configuration files modified without notice
- ❌ Git repository modified unexpectedly
- ❌ System files changed
**Severity:**
- **LOW**: Intended file modifications (skill's purpose)
- **MEDIUM**: Unintended project file changes
- **HIGH**: User config modifications without consent
- **CRITICAL**: System file modifications
### Files Deleted
**What to Check:**
- Files in skill scope (expected deletions)
- Temp files created by skill
- User files outside skill scope
- System files
**How to Detect:**
```bash
# Compare before/after file lists
diff /tmp/before-files.txt /tmp/after-files.txt | grep "^<" | sed 's/^< //'
```
**Expected Behavior:**
- ✅ Only temporary files created by skill deleted
- ✅ Deletions are part of skill's documented purpose
- ❌ User files deleted without explicit permission
- ❌ Project files deleted accidentally
- ❌ System files deleted
**Severity:**
- **LOW**: Skill's own temp files deleted (cleanup)
- **MEDIUM**: Unexpected file deletions in project
- **HIGH**: User files deleted
- **CRITICAL**: System files or important data deleted
### Directory Changes
**What to Check:**
- New directories created
- Working directory changed
- Directories removed
**How to Detect:**
```bash
# List directories before execution
find /path -type d | sort > /tmp/before-dirs.txt
# List directories after execution
find /path -type d | sort > /tmp/after-dirs.txt
# Compare
diff /tmp/before-dirs.txt /tmp/after-dirs.txt
```
**Expected Behavior:**
- ✅ Directories created for skill output
- ✅ Temp directories in /tmp
- ✅ Working directory restored after operations
- ❌ Empty directories left behind
- ❌ Directories created in unexpected locations
## 2. Process Management
### Processes Created
**What to Check:**
- Foreground processes (should complete)
- Background processes (daemons, services)
- Child processes (spawned by skill)
- Zombie processes
**How to Detect:**
```bash
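# Before execution (PID and command only; CPU and memory columns change constantly and add noise)
ps -eo pid,comm | sort -n > /tmp/before-processes.txt
# After execution (wait 30 seconds so short-lived children can exit)
sleep 30
ps -eo pid,comm | sort -n > /tmp/after-processes.txt
# Find new processes
diff /tmp/before-processes.txt /tmp/after-processes.txt | grep "^>"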
```
**Expected Behavior:**
- ✅ All skill processes complete and exit
- ✅ No orphaned child processes
- ✅ Background services documented if needed
- ❌ Processes still running after skill exits
- ❌ Zombie processes
- ❌ High CPU/memory usage processes
**Severity:**
- **LOW**: Short-lived child processes that exit cleanly
- **MEDIUM**: Background processes that should have been stopped
- **HIGH**: Orphaned processes consuming resources
- **CRITICAL**: Runaway processes (infinite loops, memory leaks)
### Process Resource Usage
**What to Check:**
- CPU usage during and after execution
- Memory consumption
- Disk I/O
- Network I/O
**How to Detect:**
```bash
# Monitor during execution (Linux; on macOS use `top -l 1`)
top -b -n 1 > /tmp/resource-usage.txt
# Or use htop, ps aux, etc.
```
**Expected Behavior:**
- ✅ Reasonable resource usage for task
- ✅ Resources released after completion
- ❌ 100% CPU for extended time
- ❌ Memory leaks (growing usage)
- ❌ Excessive disk I/O
**Severity:**
- **LOW**: Temporary spike during execution
- **MEDIUM**: Higher than expected but acceptable
- **HIGH**: Excessive usage (> 80% CPU, > 1GB RAM)
- **CRITICAL**: Resource exhaustion (OOM, disk full)
## 3. System Configuration
### Environment Variables
**What to Check:**
- New environment variables set
- Modified PATH, HOME, etc.
- Shell configuration changes
**How to Detect:**
```bash
# Before
env | sort > /tmp/before-env.txt
# After
env | sort > /tmp/after-env.txt
# Compare
diff /tmp/before-env.txt /tmp/after-env.txt
```
**Expected Behavior:**
- ✅ No permanent environment changes
- ✅ Temporary env vars for skill only
- ❌ PATH modified globally
- ❌ System env vars changed
**Severity:**
- **LOW**: Temporary env vars in skill scope
- **MEDIUM**: PATH modified in current shell
- **HIGH**: .bashrc/.zshrc modified
- **CRITICAL**: System-wide env changes
### System Services
**What to Check:**
- Systemd services started
- Cron jobs created
- Launch agents/daemons (macOS)
**How to Detect:**
```bash
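# Linux: before
systemctl list-units --type=service > /tmp/before-services.txt
# After
systemctl list-units --type=service > /tmp/after-services.txt
diff /tmp/before-services.txt /tmp/after-services.txt
# Cron jobs: before (crontab -l exits non-zero when the user has no crontab)
crontab -l > /tmp/before-cron.txt 2>/dev/null
# After
crontab -l > /tmp/after-cron.txt 2>/dev/null
diff /tmp/before-cron.txt /tmp/after-cron.txt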
```
**Expected Behavior:**
- ✅ No services unless explicitly documented
- ✅ Services stopped after skill exits
- ❌ Services left running
- ❌ Cron jobs created without consent
**Severity:**
- **MEDIUM**: Services that should have been stopped
- **HIGH**: Unexpected service installations
- **CRITICAL**: System services modified
### Package Installations
**What to Check:**
- NPM packages (global)
- Python packages (pip)
- System packages (apt, brew)
- Ruby gems, Go modules, etc.
**How to Detect:**
```bash
# NPM global packages
npm list -g --depth=0 > /tmp/before-npm.txt
# After
npm list -g --depth=0 > /tmp/after-npm.txt
diff /tmp/before-npm.txt /tmp/after-npm.txt
# System packages (Debian/Ubuntu)
dpkg -l > /tmp/before-packages.txt
# After
dpkg -l > /tmp/after-packages.txt
diff /tmp/before-packages.txt /tmp/after-packages.txt
```
**Expected Behavior:**
- ✅ All dependencies documented in README
- ✅ Local installations (in project directory)
- ❌ Global package installations without notice
- ❌ System package changes
**Severity:**
- **LOW**: Local project dependencies
- **MEDIUM**: Global NPM packages (if documented)
- **HIGH**: System packages installed
- **CRITICAL**: Conflicting package versions
## 4. Network Activity
### Connections Established
**What to Check:**
- HTTP/HTTPS requests
- WebSocket connections
- Database connections
- SSH connections
**How to Detect:**
```bash
# Monitor network during execution (set SKILL_PROCESS to the skill's process name)
# macOS
lsof -i -n -P | grep "$SKILL_PROCESS"
# Linux (use `ss -tupn` where netstat is not installed)
netstat -tupn | grep "$SKILL_PROCESS"
# Or use tcpdump, wireshark for detailed analysis
```
**Expected Behavior:**
- ✅ All network requests documented
- ✅ HTTPS used for sensitive data
- ✅ Connections properly closed
- ❌ Unexpected outbound connections
- ❌ Data sent to unknown servers
- ❌ Connections left open
**Severity:**
- **LOW**: Documented API calls (HTTPS)
- **MEDIUM**: HTTP requests (not HTTPS)
- **HIGH**: Unexpected network destinations
- **CRITICAL**: Data exfiltration attempts
### Data Transmitted
**What to Check:**
- API payloads
- File uploads/downloads
- Metrics/telemetry data
**Expected Behavior:**
- ✅ Clear documentation of what's sent
- ✅ User consent for data transmission
- ✅ No sensitive data in plaintext
- ❌ Telemetry without consent
- ❌ Credentials sent over HTTP
## 5. Database & State
### Database Changes
**What to Check:**
- Tables created/dropped
- Records inserted/updated/deleted
- Schema migrations
- Indexes created
**How to Detect:**
```sql
-- Before (SQLite example)
SELECT * FROM sqlite_master WHERE type='table';
-- After
SELECT * FROM sqlite_master WHERE type='table';
-- Record counts
SELECT COUNT(*) FROM each_table;
```
**Expected Behavior:**
- ✅ Changes are part of skill's purpose
- ✅ Backup created before modifications
- ✅ Transactions used (rollback on error)
- ❌ Unexpected table drops
- ❌ Data loss without backup
- ❌ Schema changes without migration docs
### Cache & Session State
**What to Check:**
- Redis/Memcached keys
- Session files
- Browser storage (if skill uses web UI)
**Expected Behavior:**
- ✅ Cache properly namespaced
- ✅ Expired sessions cleaned up
- ❌ Cache pollution
- ❌ Stale session files
## 6. Permissions & Security
### File Permissions
**What to Check:**
- File permission changes (chmod)
- Ownership changes (chown)
- ACL modifications
**How to Detect:**
```bash
# Before (-R recurses into subdirectories)
ls -laR /path > /tmp/before-perms.txt
# After
ls -laR /path > /tmp/after-perms.txt
diff /tmp/before-perms.txt /tmp/after-perms.txt
```
**Expected Behavior:**
- ✅ Appropriate permissions for created files
- ✅ No overly permissive files (777)
- ❌ Permissions changed on existing files
- ❌ World-writable files created
**Severity:**
- **MEDIUM**: Overly restrictive permissions
- **HIGH**: Overly permissive permissions (777)
- **CRITICAL**: System file permission changes
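
The `ls -la` diff above shows what changed but does not single out the HIGH-severity case. As a small sketch, world-writable files can be listed directly with `find`; `list_world_writable` is an illustrative helper name, not something the skill ships.

```bash
#!/bin/bash
# list_world_writable DIR
# Prints regular files under DIR whose "other" write bit is set (e.g. 666 or 777).
list_world_writable() {
  find "$1" -type f -perm -0002
}
```

Running it on the test directory after the skill completes should print nothing; each path it prints maps to the "overly permissive" finding.
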
### Security Credentials
**What to Check:**
- API keys in files or logs
- Passwords in plaintext
- Certificates/keys created
- SSH keys modified
**Expected Behavior:**
- ✅ Credentials stored securely (keychain, vault)
- ✅ No credentials in logs or temp files
- ❌ API keys in plaintext files
- ❌ Passwords in shell history
- ❌ Private keys with wrong permissions
**Severity:**
- **HIGH**: Credentials in files
- **CRITICAL**: Credentials exposed to other users
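
A rough way to look for plaintext credentials is a pattern scan over the files the skill created. The sketch below is only a heuristic: the two patterns (key/value style secrets and PEM private key headers) are assumptions chosen for illustration, and `scan_for_credentials` is a hypothetical helper name. Expect both false positives and misses.

```bash
#!/bin/bash
# scan_for_credentials DIR
# Prints "file:line" for each line that looks like it contains a secret.
scan_for_credentials() {
  local dir="$1"
  # -r recurse, -n line numbers, -i ignore case, -E extended regex, -I skip binary files
  grep -rniEI \
    -e '(api[_-]?key|secret|password|token)[[:space:]]*[:=][[:space:]]*[^[:space:]]{8,}' \
    -e '-----BEGIN [A-Z ]*PRIVATE KEY-----' \
    "$dir" 2>/dev/null | cut -d: -f1,2
}
```

Only the location is printed, not the matching text, so the report itself does not leak the secret.
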
## Automated Detection Script
```bash
#!/bin/bash
# side-effect-detector.sh
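BEFORE_DIR="/tmp/skill-test-before"
AFTER_DIR="/tmp/skill-test-after"
mkdir -p "$BEFORE_DIR" "$AFTER_DIR"
# Capture system state into the given directory
capture_state() {
  local DIR="$1"
  # Exclude the snapshot directories themselves so they are not reported as side effects
  find /tmp -type f ! -path "$BEFORE_DIR/*" ! -path "$AFTER_DIR/*" 2>/dev/null | sort > "$DIR/tmp-files.txt"
  ps aux > "$DIR/processes.txt"
  env | sort > "$DIR/env.txt"
  npm list -g --depth=0 > "$DIR/npm-global.txt" 2>/dev/null
  netstat -tupn > "$DIR/network.txt" 2>/dev/null  # Linux flags; on macOS use e.g. lsof -i
  # Add more as needed
}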
# Before
capture_state "$BEFORE_DIR"
# Run skill
echo "Execute skill now..."
read -p "Press enter when skill completes..."
# After
capture_state "$AFTER_DIR"
# Compare
echo "=== Side Effects Detected ==="
echo ""
echo "Files in /tmp:"
diff "$BEFORE_DIR/tmp-files.txt" "$AFTER_DIR/tmp-files.txt" | grep "^>" | wc -l
echo "Processes:"
diff "$BEFORE_DIR/processes.txt" "$AFTER_DIR/processes.txt" | grep "^>" | head -5
echo "Environment variables:"
diff "$BEFORE_DIR/env.txt" "$AFTER_DIR/env.txt"
echo "NPM global packages:"
diff "$BEFORE_DIR/npm-global.txt" "$AFTER_DIR/npm-global.txt"
# Detailed reports
echo ""
echo "Full reports in: $BEFORE_DIR and $AFTER_DIR"
```
## Reporting Template
```markdown
## Side Effects Report
### Filesystem Changes
- **Files Created**: X files
- /tmp/skill-temp-123.log (5KB)
- ~/.cache/skill-name/data.json (15KB)
- **Files Modified**: Y files
- package.json (version updated)
- **Files Deleted**: Z files
- /tmp/old-cache.json
### Process Management
- **Processes Created**: N
- **Orphaned Processes**: M (list if > 0)
- **Resource Usage**: Peak 45% CPU, 128MB RAM
### System Configuration
- **Env Vars Changed**: None
- **Services Started**: None
- **Packages Installed**: jq (1.6)
### Network Activity
- **Connections**: 3 HTTPS requests to api.example.com
- **Data Transmitted**: 1.2KB (API calls)
### Database Changes
- **Tables**: 1 created (skill_cache)
- **Records**: 15 inserted
### Security
- **Permissions**: All files 644 (appropriate)
- **Credentials**: No sensitive data detected
### Overall Assessment
✅ Cleanup: Mostly clean (3 temp files remaining)
⚠️ Documentation: Missing jq dependency in README
✅ Security: No issues
```
---
**Remember:** The goal is not zero side effects (that's impossible for useful skills), but **documented, intentional, and cleaned-up** side effects. Every side effect should be either part of the skill's purpose or properly cleaned up on exit.


@@ -0,0 +1,292 @@
# Mode 1: Git Worktree Isolation
## When to Use
**Best for:**
- Read-only skills or skills with minimal file operations
- Quick validation during development
- Skills that don't require system package installation
- Testing iterations where speed matters
**Not suitable for:**
- Skills that install system packages (npm install, apt-get, brew, etc.)
- Skills that modify system configurations
- Skills that require a clean Node.js environment
**Risk Level**: Low complexity skills only
## Advantages
- **Fast**: Creates worktree in seconds
- 💾 **Efficient**: Shares git history, minimal disk space
- 🔄 **Repeatable**: Easy to create, test, and destroy
- 🛠️ **Familiar**: Same git tools you already know
## Limitations
- ❌ Shares system packages (node_modules, global npm packages)
- ❌ Shares environment variables and configs
- ❌ Same OS user and permissions
- ❌ Cannot test system-level dependencies
- ⚠️ Not true isolation - just a separate git checkout
## Prerequisites
1. Must be in a git repository
2. Git worktree feature available (`git worktree add` needs Git 2.5+; `git worktree remove`, used in the cleanup step, needs Git 2.17+)
3. Clean working directory (or willing to proceed with uncommitted changes)
4. Sufficient disk space for additional worktree
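
The Git version prerequisite can be checked before creating anything. A minimal sketch, assuming purely numeric dotted versions; `version_ge` and `MIN_GIT_VERSION` are illustrative names. Note that `git worktree remove`, used in the cleanup step of this workflow, was added in Git 2.17, so that is a safer minimum to pass than 2.5.

```bash
#!/bin/bash
# version_ge A B -> succeeds if dotted version A is >= B (compares the first three numeric fields)
version_ge() {
  local IFS=.
  local -a a=($1) b=($2)   # split "2.39.1" into (2 39 1)
  local i x y
  for i in 0 1 2; do
    x="${a[i]:-0}"
    y="${b[i]:-0}"
    if [ "$x" -gt "$y" ]; then return 0; fi
    if [ "$x" -lt "$y" ]; then return 1; fi
  done
  return 0   # all compared fields are equal
}

# Usage (left commented out so sourcing this file has no side effects):
# MIN_GIT_VERSION="2.17"
# git_version="$(git --version | awk '{print $3}')"   # "git version 2.43.0" -> "2.43.0"
# version_ge "$git_version" "$MIN_GIT_VERSION" || echo "Git $git_version is too old"
```
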
## Workflow
### Step 1: Validate Environment
```bash
# Check if in git repo
git rev-parse --is-inside-work-tree
# Check for uncommitted changes
git status --porcelain
# Get current repo name
basename "$(git rev-parse --show-toplevel)"
```
If the working directory is dirty, warn the user but allow proceeding: the new worktree is checked out from the last commit, so uncommitted changes are not carried into it.
### Step 2: Create Isolation Worktree
**Generate unique branch name:**
```bash
BRANCH_NAME="test-skill-$(date +%s)" # e.g., test-skill-1699876543
```
**Create worktree:**
```bash
# Absolute path next to the current repo, so later steps still work after `cd`
WORKTREE_PATH="$(dirname "$(pwd)")/$(basename "$(pwd)")-${BRANCH_NAME}"
git worktree add "$WORKTREE_PATH" -b "$BRANCH_NAME"
```
Example result: `/Users/connor/claude-test-skill-1699876543/`
### Step 3: Copy Skill to Worktree
```bash
# Make sure the worktree has a skills directory to copy into
mkdir -p "$WORKTREE_PATH/.claude/skills"
# Copy skill directory to worktree's .claude/skills/
cp -r ~/.claude/skills/[skill-name] "$WORKTREE_PATH/.claude/skills/"
# Or if skill is in current repo
cp -r ./skills/[skill-name] "$WORKTREE_PATH/.claude/skills/"
```
**Verify copy:**
```bash
ls -la "$WORKTREE_PATH/.claude/skills/[skill-name]/"
```
### Step 4: Setup Development Environment
**Install dependencies if needed:**
```bash
cd "$WORKTREE_PATH"
# Detect package manager
if [ -f "pnpm-lock.yaml" ]; then
  pnpm install
elif [ -f "yarn.lock" ]; then
  yarn install
elif [ -f "package-lock.json" ]; then
  npm install
fi
```
**Copy environment files (optional):**
```bash
# Only if skill needs .env for testing (run from the original repository root)
cp .env "$WORKTREE_PATH/.env"
```
### Step 5: Take "Before" Snapshot
```bash
# List all files in worktree
find "$WORKTREE_PATH" -type f > /tmp/before-files.txt
# List running processes (for comparison later)
ps aux > /tmp/before-processes.txt
# Current disk usage
du -sh "$WORKTREE_PATH" > /tmp/before-disk.txt
```
### Step 6: Execute Skill in Worktree
**Open new Claude Code session in worktree:**
```bash
cd "$WORKTREE_PATH"
claude
```
**Run skill with test trigger:**
- User manually tests skill with trigger phrases
- OR: Use Claude CLI to run skill programmatically (if available)
**Monitor execution:**
- Watch for errors in output
- Note execution time
- Check resource usage
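
When the skill can be driven by a single command, the exit status, output, and wall-clock time can be recorded with a small wrapper. This is a sketch under that assumption: `run_timed` is an illustrative helper, and the command you pass to it is up to you, since this guide does not define a non-interactive way to run a skill.

```bash
#!/bin/bash
# run_timed LOGFILE CMD [ARGS...]
# Runs CMD, captures stdout and stderr in LOGFILE, prints "exit=<code> seconds=<n>".
run_timed() {
  local log="$1"; shift
  local start end status
  start=$(date +%s)
  if "$@" >"$log" 2>&1; then
    status=0
  else
    status=$?
  fi
  end=$(date +%s)
  echo "exit=$status seconds=$((end - start))"
}
```

The log file can then be searched for errors, and the printed summary feeds the "Execution time" line of the report in Step 9.
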
### Step 7: Take "After" Snapshot
```bash
# List all files after execution
find "$WORKTREE_PATH" -type f > /tmp/after-files.txt
# Compare before/after
diff /tmp/before-files.txt /tmp/after-files.txt > /tmp/file-changes.txt
# Check for new processes
ps aux > /tmp/after-processes.txt
diff /tmp/before-processes.txt /tmp/after-processes.txt > /tmp/process-changes.txt
# Check disk usage
du -sh "$WORKTREE_PATH" > /tmp/after-disk.txt
```
### Step 8: Analyze Results
**Check for side effects:**
```bash
# Files created (lines that diff marks as only in the "after" list)
grep -c '^>' /tmp/file-changes.txt
# Files deleted (lines only in the "before" list)
grep -c '^<' /tmp/file-changes.txt
# New processes (filter out expected ones)
# Look for processes related to skill
```
**Validate cleanup:**
```bash
# Check for leftover temp files
find "$WORKTREE_PATH" -name "*.tmp" -o -name "*.temp" -o -name ".cache"
# Check for orphaned processes
# Look for processes still running from skill
```
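
A plain `diff` of two `ps aux` snapshots is noisy because CPU and memory columns change for every process. One way to fill in the "new processes" and "orphaned processes" checks above is to compare by PID only. A minimal sketch, assuming both files are `ps aux` output with the PID in the second column; `new_pids` is an illustrative name.

```bash
#!/bin/bash
# new_pids BEFORE_FILE AFTER_FILE
# Prints lines from AFTER_FILE whose PID (column 2) does not appear in BEFORE_FILE.
# Assumes BEFORE_FILE is non-empty, which holds for any real `ps aux` snapshot.
new_pids() {
  awk 'NR == FNR { seen[$2] = 1; next }   # first file: remember every PID
       FNR > 1 && !($2 in seen)           # second file: skip header, print unseen PIDs
      ' "$1" "$2"
}

# Usage with the snapshot files from Steps 5 and 7:
# new_pids /tmp/before-processes.txt /tmp/after-processes.txt
```

Anything still listed some time after the skill finishes is a candidate orphaned process; unrelated programs started during the test will also appear and need to be filtered by hand.
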
### Step 9: Generate Report
**Execution Results:**
- ✅ Skill completed successfully / ❌ Skill failed with error
- ⏱️ Execution time: Xs
- 📊 Resource usage: XMB disk, X% CPU
**Side Effects:**
- Files created: [count] (list if < 10)
- Files modified: [count]
- Processes created: [count]
- Temporary files remaining: [count]
**Dependency Analysis:**
- Required tools: [list tools used by skill]
- Hardcoded paths: [list any absolute paths found]
- Environment variables: [list any ENV vars referenced]
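
The "Hardcoded paths" and "Environment variables" entries of the dependency analysis can be approximated with two `grep` passes over the skill directory. This is a heuristic sketch: the path prefixes (`/Users/`, `/home/`, `/root/`) and the uppercase-variable pattern are assumptions, and both helper names are illustrative.

```bash
#!/bin/bash
# find_hardcoded_paths SKILL_DIR
# Prints "file:line:text" for lines containing user-specific absolute paths.
find_hardcoded_paths() {
  grep -rnEI '(/Users/|/home/|/root/)[A-Za-z0-9._-]+' "$1" 2>/dev/null
}

# list_env_vars SKILL_DIR
# Prints the unique uppercase variable names referenced as $NAME or ${NAME}.
list_env_vars() {
  grep -rhoEI '\$\{?[A-Z_][A-Z0-9_]*' "$1" 2>/dev/null | tr -d '${' | sort -u
}
```

The second list will include variables the skill defines itself, so it is a starting point for review rather than a final dependency list.
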
### Step 10: Cleanup
**Ask user:**
```
Test complete. Worktree location: $WORKTREE_PATH
Options:
1. Keep worktree for debugging
2. Remove worktree and branch
3. Remove worktree, keep branch
Your choice?
```
**Cleanup commands:**
```bash
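# Option 2: Full cleanup
# --force is needed when the worktree contains untracked or modified files (e.g. the copied skill)
git worktree remove --force "$WORKTREE_PATH"
git branch -D "$BRANCH_NAME"
# Option 3: Keep branch
git worktree remove --force "$WORKTREE_PATH"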
## Interpreting Results
### ✅ **PASS** - Ready for git worktree environments
- Skill completed without errors
- No unexpected file modifications
- No orphaned processes
- No hardcoded paths detected
- Temporary files cleaned up
### ⚠️ **WARNING** - Works but has minor issues
- Skill works but left temporary files
- Uses some hardcoded paths (but non-critical)
- Performance could be improved
- Missing some documentation
### ❌ **FAIL** - Not ready
- Skill crashed or hung
- Requires system packages not installed
- Modifies files outside skill directory without permission
- Creates orphaned processes
- Has critical hardcoded paths
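
The three outcomes above can be reduced to a simple decision function once the counts from Step 8 are known. The sketch below is a simplification and an assumption about how to weigh the criteria: it treats any hardcoded path as a warning (deciding whether one is "critical" needs human judgement) and ignores documentation and performance. `worktree_verdict` is an illustrative name.

```bash
#!/bin/bash
# worktree_verdict EXIT_CODE ORPHANED_PROCESSES LEFTOVER_TEMP_FILES HARDCODED_PATHS
# Prints PASS, WARNING, or FAIL.
worktree_verdict() {
  local exit_code="$1" orphans="$2" leftovers="$3" hardcoded="$4"
  if [ "$exit_code" -ne 0 ] || [ "$orphans" -gt 0 ]; then
    echo "FAIL"       # crashed, or left processes running
  elif [ "$leftovers" -gt 0 ] || [ "$hardcoded" -gt 0 ]; then
    echo "WARNING"    # works, but cleanup or portability issues remain
  else
    echo "PASS"
  fi
}
```
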
## Common Issues
### Issue: "Skill not found in Claude"
**Cause**: Skill wasn't copied to worktree's .claude/skills/
**Fix**: Verify copy command and path
### Issue: "Permission denied" errors
**Cause**: Skill trying to write to protected directories
**Fix**: Identify problematic paths, suggest using /tmp or skill directory
### Issue: "Command not found"
**Cause**: Skill depends on system tool not installed
**Fix**: Document dependency, suggest adding to skill README
### Issue: Test results different from main directory
**Cause**: Different node_modules or configs
**Fix**: This is expected - worktree shares some state, not true isolation
## Best Practices
1. **Always take before/after snapshots** for accurate comparison
2. **Test multiple times** to ensure consistency
3. **Check temp directories** (`/tmp`, `/var/tmp`) for leftover files
4. **Monitor processes** for at least 30s after skill completes
5. **Document all dependencies** found during testing
6. **Use relative paths** in skill code, never absolute
7. **Cleanup worktrees** regularly to avoid clutter
## Quick Command Reference
```bash
# Create test worktree
git worktree add ../test-branch -b test-branch
# List all worktrees
git worktree list
# Remove worktree
git worktree remove ../test-branch
# Remove worktree and branch
git worktree remove ../test-branch && git branch -D test-branch
# Find temp files created
find /tmp -name "*skill-name*" -mtime -1
```
---
**Remember:** Git worktree provides quick, lightweight isolation but is NOT true isolation. Use for low-risk skills or fast iteration during development. For skills that modify system state, use Docker or VM modes.


@@ -0,0 +1,468 @@
# Mode 2: Docker Container Isolation
## Using Docker Helper Library
**RECOMMENDED:** Use the helper library for robust error handling and cleanup.
```bash
source ~/.claude/skills/skill-isolation-tester/lib/docker-helpers.sh
# Set cleanup trap (runs automatically on exit)
trap cleanup_on_exit EXIT
# Pre-flight checks
preflight_check_docker || exit 1
```
The helper library provides:
- Shell command validation (prevents syntax errors)
- Retry logic with exponential backoff
- Automatic cleanup on exit
- Pre-flight Docker environment checks
- Safe build and run functions
See `lib/docker-helpers.sh` for full documentation.
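
To illustrate what "retry logic with exponential backoff" means, here is a generic sketch. It is not the code from `lib/docker-helpers.sh`, whose source is not shown here; the function name and argument order are assumptions made for this example.

```bash
#!/bin/bash
# retry_with_backoff MAX_ATTEMPTS BASE_DELAY_SECONDS CMD [ARGS...]
# Runs CMD until it succeeds or MAX_ATTEMPTS is reached, doubling the delay each time.
retry_with_backoff() {
  local max="$1" delay="$2"; shift 2
  local attempt=1
  while true; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max" ]; then
      echo "Giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "Attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Usage (commented out): wait for the Docker daemon, with delays of 1s, 2s, 4s, 8s
# retry_with_backoff 5 1 docker info
```

Doubling the delay gives a slow-starting daemon or a flaky registry time to recover without retrying in a tight loop.
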
---
## When to Use
**Best for:**
- Skills that install npm/pip packages or system dependencies
- Skills that modify configuration files
- Medium-risk skills that need OS-level isolation
- Testing skills with different Claude Code versions
- Reproducible testing environments
**Not suitable for:**
- Skills that require VM operations or nested virtualization
- Skills that need GUI access (without X11 forwarding)
- Extremely high-risk skills (use VM mode instead)
**Risk Level**: Low to medium complexity skills
## Advantages
- 🏗️ **OS-Level Isolation**: Separate filesystem and process namespaces (the host kernel is still shared)
- 📦 **Reproducible**: Same environment every time
- 🔒 **Sandboxed**: Limited access to host system
- 🎯 **Precise**: Control exactly what's installed
- 🗑️ **Clean**: Easy to destroy and recreate
## Limitations
- ⏱️ Slower than git worktree (container overhead)
- 💾 Requires disk space for images
- 🐳 Requires Docker installation and running daemon
- ⚙️ More complex setup than worktree
- 🔧 May need volume mounts for file access
## Prerequisites
1. Docker installed and running (`docker info`)
2. Sufficient disk space (~1GB for base image + skill)
3. Permissions to run Docker commands
4. Internet connection (first time only, to pull images)
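
The disk space prerequisite can be checked up front. A minimal sketch using POSIX `df`; `has_free_space` is an illustrative name, and which path to check is an assumption (on Linux, Docker's default data directory is `/var/lib/docker`; with Docker Desktop, images live inside its own VM disk, so a host-side check is only approximate).

```bash
#!/bin/bash
# has_free_space PATH MIN_MB
# Succeeds if the filesystem containing PATH has at least MIN_MB megabytes available.
has_free_space() {
  local path="$1" min_mb="$2" avail_kb
  # -P forces one line per filesystem, -k reports 1024-byte blocks; column 4 is "Available"
  avail_kb=$(df -Pk "$path" | awk 'NR == 2 { print $4 }')
  [ "$((avail_kb / 1024))" -ge "$min_mb" ]
}

# Usage (commented out):
# has_free_space /var/lib/docker 1024 || echo "Less than 1GB free for Docker images"
```
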
## Workflow
### Step 1: Validate Docker Environment
```bash
# Check Docker is installed
command -v docker || { echo "Docker not installed"; exit 1; }
# Check Docker daemon is running
docker info > /dev/null 2>&1 || { echo "Docker daemon not running"; exit 1; }
# Check disk space
docker system df
```
### Step 2: Choose Base Image
**Options:**
1. **claude-code-base** (preferred if available)
- Pre-built image with Claude Code installed
- Fastest startup time
2. **ubuntu:22.04** (fallback)
- Install Claude Code manually
- More control over environment
**Check if custom image exists:**
```bash
docker images | grep claude-code-base
```
### Step 3: Prepare Skill for Container
**Create temporary directory:**
```bash
TEST_DIR="/tmp/skill-test-$(date +%s)"
mkdir -p "$TEST_DIR"
# Copy skill to test directory
cp -r ~/.claude/skills/[skill-name] "$TEST_DIR/"
# Create Dockerfile
cat > "$TEST_DIR/Dockerfile" <<'EOF'
FROM ubuntu:22.04
# Install system dependencies
RUN apt-get update && apt-get install -y \
curl \
git \
nodejs \
npm \
&& rm -rf /var/lib/apt/lists/*
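# Install Claude Code (adjust version as needed)
# Note: Claude Code needs Node.js 18+; Ubuntu 22.04's apt nodejs is older, so a newer Node.js may be required
RUN npm install -g @anthropic-ai/claude-code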
# Create directory structure
RUN mkdir -p /root/.claude/skills
# Copy skill
COPY [skill-name]/ /root/.claude/skills/[skill-name]/
# Set working directory
WORKDIR /root
# Default command
CMD ["/bin/bash"]
EOF
```
### Step 4: Build Docker Image
```bash
cd "$TEST_DIR"
# Build image with tag
docker build -t skill-test:[skill-name] .
# Verify build succeeded
docker images | grep skill-test
```
**Expected build time:** 2-5 minutes (first time), < 30s (cached)
### Step 5: Take "Before" Snapshot
**Create container (don't start yet):**
```bash
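# -it keeps stdin and a TTY available, so `docker start -ai` in Step 6 gets an interactive shell
CONTAINER_ID=$(docker create -it \
  --name "skill-test-$(date +%s)" \
  --memory="512m" \
  --cpus="1.0" \
  skill-test:[skill-name])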
echo "Container ID: $CONTAINER_ID"
```
**Snapshot filesystem:**
```bash
docker export $CONTAINER_ID | tar -t > /tmp/before-files.txt
```
### Step 6: Run Skill in Container
**Start container interactively:**
```bash
docker start -ai $CONTAINER_ID
```
**Or run with test command:**
```bash
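# This starts a separate, auto-removed container, so the $CONTAINER_ID steps below do not apply to it.
# `claude skill run ... --test` is a placeholder; substitute the non-interactive command your Claude Code version supports.
docker run -it \
  --name skill-test \
  --rm \
  --memory="512m" \
  --cpus="1.0" \
  skill-test:[skill-name] \
  bash -c "claude skill run [skill-name] --test"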
```
**Monitor execution:**
```bash
# In another terminal, watch resource usage
docker stats $CONTAINER_ID
# Watch logs
docker logs -f $CONTAINER_ID
```
### Step 7: Take "After" Snapshot
**Commit container state:**
```bash
docker commit $CONTAINER_ID skill-test:[skill-name]-after
```
**Export and compare files:**
```bash
# Export after state
docker export $CONTAINER_ID | tar -t > /tmp/after-files.txt
# Find differences
diff /tmp/before-files.txt /tmp/after-files.txt > /tmp/file-changes.txt
# Count changes
echo "Files added: $(grep -c '^>' /tmp/file-changes.txt)"
echo "Files removed: $(grep -c '^<' /tmp/file-changes.txt)"
```
**Check for running processes (the container must still be running for `docker exec`):**
```bash
docker exec $CONTAINER_ID ps aux > /tmp/processes.txt
```
### Step 8: Analyze Results
**Extract skill logs:**
```bash
docker logs $CONTAINER_ID > /tmp/skill-execution.log
# Check for errors
grep -i "error\|fail\|exception" /tmp/skill-execution.log
```
**Check resource usage:**
```bash
docker stats --no-stream $CONTAINER_ID
```
**Inspect filesystem changes:**
```bash
# List files in skill directory
docker exec $CONTAINER_ID find /root/.claude/skills/[skill-name] -type f
# Check temp directories
docker exec $CONTAINER_ID find /tmp -name "*skill*" -o -name "*.tmp"
# Check for leftover processes
docker exec $CONTAINER_ID ps aux | grep -v "ps\|bash"
```
**Analyze dependencies:**
```bash
# Check what packages were installed
docker diff $CONTAINER_ID | grep -E "^A /usr|^A /var/lib"
# Check what commands were executed
docker logs $CONTAINER_ID | grep -E "npm install|apt-get|pip install"
```
### Step 9: Generate Report
**Execution Status:**
```markdown
## Execution Results
**Container**: $CONTAINER_ID
**Base Image**: ubuntu:22.04
**Status**: [Running/Stopped/Exited]
**Exit Code**: $(docker inspect $CONTAINER_ID --format='{{.State.ExitCode}}')
**Resource Usage**:
- Memory: XMB / 512MB
- CPU: X%
- Execution Time: Xs
```
**Side Effects:**
```markdown
## Filesystem Changes
Files added: X
Files modified: X
Files deleted: X
**Significant changes:**
- /tmp/skill-temp-xyz.log (5KB)
- /root/.claude/cache/skill-data.json (15KB)
```
**Dependency Analysis:**
```markdown
## Dependencies Detected
**System Packages**:
- curl (already present)
- jq (installed by skill)
**NPM Packages**:
- lodash@4.17.21 (installed)
**Hardcoded Paths**:
⚠️ /root/.claude/config (line 45)
→ Use $HOME/.claude/config instead
```
### Step 10: Cleanup
**Ask user:**
```
Test complete. Container: $CONTAINER_ID
Options:
1. Keep container for debugging (docker start -ai $CONTAINER_ID)
2. Stop container, keep image (can restart later)
3. Remove container and image (full cleanup)
Your choice?
```
**Cleanup commands:**
```bash
# Option 2: Stop container
docker stop $CONTAINER_ID
# Option 3: Full cleanup
docker rm -f $CONTAINER_ID
docker rmi skill-test:[skill-name]
docker rmi skill-test:[skill-name]-after
# Cleanup test directory
rm -rf "$TEST_DIR"
```
**Cleanup all test containers:**
```bash
docker ps -a | grep skill-test | awk '{print $1}' | xargs docker rm -f
docker images | grep skill-test | awk '{print $3}' | xargs docker rmi -f
```
## Interpreting Results
### ✅ **PASS** - Production Ready
- Container exited with code 0
- Skill completed successfully
- No excessive resource usage
- All dependencies documented
- No orphaned processes
- Temp files in acceptable locations (/tmp only)
### ⚠️ **WARNING** - Needs Improvement
- Exit code 0 but warnings in logs
- Higher than expected resource usage
- Some undocumented dependencies
- Minor cleanup issues
### ❌ **FAIL** - Not Ready
- Container exited with non-zero code
- Skill crashed or hung
- Excessive resource usage (hit the 512MB memory limit or was OOM-killed)
- Attempted to access outside container
- Critical dependencies not documented
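
A non-zero exit code alone does not say why the container failed. The sketch below maps the codes that `docker run` reserves (125, 126, 127) and the 128+N signal convention to a short description; `describe_exit_code` is an illustrative name. Exit code 137 (SIGKILL) often, but not always, means the memory limit was hit, which can be confirmed with `docker inspect --format '{{.State.OOMKilled}}'` on the container.

```bash
#!/bin/bash
# describe_exit_code CODE -> prints a short explanation of a container exit code
describe_exit_code() {
  case "$1" in
    0)   echo "success" ;;
    125) echo "docker run failed before the command started" ;;
    126) echo "command found but could not be invoked" ;;
    127) echo "command not found in container" ;;
    137) echo "killed by SIGKILL (possibly out of memory)" ;;
    *)
      if [ "$1" -gt 128 ]; then
        echo "terminated by signal $(( $1 - 128 ))"
      else
        echo "failed with application error"
      fi
      ;;
  esac
}
```
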
## Common Issues
### Issue: "Docker daemon not running"
**Fix**:
```bash
# macOS
open -a Docker
# Linux
sudo systemctl start docker
```
### Issue: "Permission denied" when building image
**Cause**: User not in docker group
**Fix**:
```bash
# Add user to docker group
sudo usermod -aG docker $USER
# Logout/login or run:
newgrp docker
```
### Issue: "No space left on device"
**Cause**: Docker disk space full
**Fix**:
```bash
# Clean up old images and containers
docker system prune -a
# Check space
docker system df
```
### Issue: Skill requires GUI
**Cause**: Skill opens browser or displays graphics
**Fix**: Add X11 forwarding or mark skill as requiring GUI
## Advanced Techniques
### Volume Mounts for Live Testing
```bash
# Mount skill directory for live editing
docker run -it \
-v ~/.claude/skills/[skill-name]:/root/.claude/skills/[skill-name] \
skill-test:[skill-name]
```
### Custom Network Settings
```bash
# Isolated network (no internet)
docker run -it --network=none skill-test:[skill-name]
# Grant NET_ADMIN so network tooling inside the container (e.g. iptables, tc) can inspect or restrict traffic
docker run -it --cap-add=NET_ADMIN skill-test:[skill-name]
```
### Multi-Stage Testing
```bash
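# Test with different Node versions
# (only takes effect if the Dockerfile declares `ARG NODE_VERSION` and uses it; the Step 3 Dockerfile does not)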
docker build -t skill-test:node16 --build-arg NODE_VERSION=16 .
docker build -t skill-test:node18 --build-arg NODE_VERSION=18 .
docker build -t skill-test:node20 --build-arg NODE_VERSION=20 .
```
## Best Practices
1. **Always set resource limits** (`--memory`, `--cpus`) to prevent runaway processes
2. **Use `--rm` flag** for auto-cleanup in simple tests
3. **Tag images clearly** with skill name and version
4. **Cache base images** to speed up subsequent tests
5. **Export test results** before removing containers
6. **Test with minimal permissions** first, add as needed
7. **Document all APT/NPM/PIP installs** found during testing
## Quick Command Reference
```bash
# Build test image
docker build -t skill-test:my-skill .
# Run with auto-cleanup
docker run -it --rm skill-test:my-skill
# Run with resource limits
docker run -it --memory="512m" --cpus="1.0" skill-test:my-skill
# Check container status
docker ps -a | grep skill-test
# View container logs
docker logs <container-id>
# Execute command in running container
docker exec <container-id> <command>
# Stop and remove all test containers
docker ps -a | grep skill-test | awk '{print $1}' | xargs docker rm -f
# Remove all test images
docker images | grep skill-test | awk '{print $3}' | xargs docker rmi
```
---
**Remember:** Docker provides strong isolation with reproducible environments. Use for skills that install packages or modify system files. For highest security, use VM mode instead.


@@ -0,0 +1,565 @@
# Mode 3: VM (Virtual Machine) Isolation
## When to Use
**Best for:**
- High-risk skills that modify system configurations
- Skills that require kernel modules or system services
- Testing skills that interact with VMs themselves
- Maximum isolation and security
- Skills from untrusted sources
**Not suitable for:**
- Quick iteration during development (too slow)
- Skills that are obviously safe and read-only
- Situations where speed is more important than isolation
**Risk Level**: Medium to high complexity skills
## Advantages
- 🔒 **Complete Isolation**: Separate kernel, OS, and all resources
- 🛡️ **Maximum Security**: Host system is completely protected
- 🖥️ **Real OS Environment**: Test on actual Linux/macOS distributions
- 📸 **Snapshots**: Easy rollback to clean state
- 🧪 **Destructive Testing**: Safe to test potentially dangerous operations
## Limitations
- 🐌 **Slow**: Minutes to provision, slower execution
- 💾 **Disk Space**: 10-20GB per VM
- 💰 **Resource Intensive**: Requires significant RAM and CPU
- 🔧 **Complex Setup**: More moving parts to configure
- ⏱️ **Longer Feedback Loop**: Not ideal for rapid iteration
## Prerequisites
1. Virtualization software installed:
- **macOS**: UTM, Parallels, or VMware Fusion
- **Linux**: QEMU/KVM, VirtualBox, or virt-manager
- **Windows**: VirtualBox, Hyper-V, or VMware Workstation
2. Base VM image or ISO:
- Ubuntu 22.04 LTS (recommended)
- Debian 12
- Fedora 39
3. System resources:
- 8GB+ host RAM (allocate 2-4GB to VM)
- 20GB+ disk space
- CPU virtualization enabled (VT-x/AMD-V)
4. Command-line tools:
- **macOS with UTM**: `utmctl` or use UI
- **Linux**: `virsh` (libvirt) or `vboxmanage` (VirtualBox)
- **Multipass**: `multipass` (cross-platform, recommended)
## Recommended: Use Multipass
Multipass is the easiest option for cross-platform VM management:
```bash
# Install Multipass
# macOS:
brew install multipass
# Linux:
sudo snap install multipass
# Windows:
# Download from https://multipass.run/
```
## Workflow
### Step 1: Validate Virtualization Environment
```bash
# Check virtualization is enabled (Linux)
grep -E 'vmx|svm' /proc/cpuinfo
# Check Multipass is installed
command -v multipass || { echo "Install Multipass"; exit 1; }
# Check available resources
multipass info || echo "First time setup needed"
```
### Step 2: Create Base VM
**Launch clean Ubuntu VM:**
```bash
VM_NAME="skill-test-$(date +%s)"
# Launch VM with Multipass
multipass launch \
--name "$VM_NAME" \
--cpus 2 \
--memory 2G \
--disk 10G \
22.04
# Wait for VM to be ready
multipass exec "$VM_NAME" -- cloud-init status --wait
```
**Or use UTM (macOS GUI):**
1. Download Ubuntu 22.04 ARM64 ISO
2. Create new VM with 2GB RAM, 10GB disk
3. Install Ubuntu and setup user
4. Note VM name for scripts
**Or use virsh (Linux CLI):**
```bash
# Download cloud image
wget https://cloud-images.ubuntu.com/releases/22.04/release/ubuntu-22.04-server-cloudimg-amd64.img
# Create VM
virt-install \
--name "$VM_NAME" \
--memory 2048 \
--vcpus 2 \
--disk ubuntu-22.04-server-cloudimg-amd64.img \
--import \
--os-variant ubuntu22.04
```
### Step 3: Install Claude Code in VM
```bash
# Install system dependencies
multipass exec "$VM_NAME" -- sudo apt-get update
multipass exec "$VM_NAME" -- sudo apt-get install -y \
curl \
git \
nodejs \
npm
# Install Claude Code (global installs with the apt-provided npm typically need sudo)
multipass exec "$VM_NAME" -- sudo npm install -g @anthropic-ai/claude-code
# Verify installation
multipass exec "$VM_NAME" -- which claude
```
### Step 4: Copy Skill to VM
```bash
# Create directory structure
multipass exec "$VM_NAME" -- mkdir -p /home/ubuntu/.claude/skills
# Copy skill to VM (--recursive is required when transferring a directory)
multipass transfer --recursive \
  ~/.claude/skills/[skill-name] \
  "$VM_NAME":/home/ubuntu/.claude/skills/
# Verify copy
multipass exec "$VM_NAME" -- ls -la /home/ubuntu/.claude/skills/[skill-name]
```
### Step 5: Take VM Snapshot
**With Multipass:**
```bash
# Older Multipass releases have no snapshot command (newer ones provide `multipass snapshot`)
# Either way, capture filesystem state for the before/after diff
multipass exec "$VM_NAME" -- find /home/ubuntu -type f > /tmp/before-files.txt
multipass exec "$VM_NAME" -- dpkg -l > /tmp/before-packages.txt
multipass exec "$VM_NAME" -- ps aux > /tmp/before-processes.txt
```
**With UTM (macOS):**
```bash
# Take snapshot via the UTM UI; the CLI call below is illustrative and may not exist in your UTM version (check `utmctl --help`)
utmctl snapshot "$VM_NAME" --name "before-skill-test"
```
**With virsh (Linux):**
```bash
virsh snapshot-create-as "$VM_NAME" before-skill-test "Before skill test"
```
### Step 6: Execute Skill in VM
**Start Claude Code session in VM:**
```bash
# Interactive session
multipass shell "$VM_NAME"
# Then inside VM:
claude
# Run skill with trigger phrase
```
**Or execute non-interactively:**
```bash
# If skill has test command (`claude skill run` is a placeholder; substitute the command your Claude Code version supports)
multipass exec "$VM_NAME" -- \
bash -c "claude skill run [skill-name] --test"
```
**Monitor from host:**
```bash
# Watch resource usage
multipass info "$VM_NAME" --format json | jq '.info[] | {memory, load}'
# Tail logs
multipass exec "$VM_NAME" -- tail -f /var/log/syslog
```
### Step 7: Take Post-Execution Snapshot
```bash
# Capture filesystem state
multipass exec "$VM_NAME" -- find /home/ubuntu -type f > /tmp/after-files.txt
multipass exec "$VM_NAME" -- dpkg -l > /tmp/after-packages.txt
multipass exec "$VM_NAME" -- ps aux > /tmp/after-processes.txt
# Compare
diff /tmp/before-files.txt /tmp/after-files.txt > /tmp/file-changes.txt
diff /tmp/before-packages.txt /tmp/after-packages.txt > /tmp/package-changes.txt
diff /tmp/before-processes.txt /tmp/after-processes.txt > /tmp/process-changes.txt
```
**Snapshot VM state:**
```bash
# virsh
virsh snapshot-create-as "$VM_NAME" after-skill-test "After skill test"
# UTM (macOS): illustrative; use the UTM UI if your `utmctl` has no snapshot subcommand
utmctl snapshot "$VM_NAME" --name "after-skill-test"
```
### Step 8: Analyze Results
**Extract execution logs:**
```bash
# Copy Claude Code logs from VM (--recursive is required for directories; adjust the log path to your install)
multipass transfer --recursive \
  "$VM_NAME":/home/ubuntu/.claude/logs/ \
  /tmp/skill-test-logs/
# Analyze logs
grep -i "error\|warning\|fail" /tmp/skill-test-logs/*.log
```
**Check filesystem changes:**
```bash
echo "Files added: $(grep -c '^>' /tmp/file-changes.txt)"
echo "Files removed: $(grep -c '^<' /tmp/file-changes.txt)"
# Check for unexpected modifications
# (diff prefixes added lines with "> "; these only match if the snapshot `find` also covers /etc and /usr/local)
grep '^> /etc/' /tmp/file-changes.txt # System config changes
grep '^> /usr/local/' /tmp/file-changes.txt # Global installs
```
**Check package changes:**
```bash
# List newly installed packages
grep ">" /tmp/package-changes.txt
# Check for removed packages
grep "<" /tmp/package-changes.txt
```
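
The raw `grep` output above still contains the full `dpkg -l` columns. To get the compact "name version" list used in the report, the diff can be filtered with `awk`. A minimal sketch, assuming `/tmp/package-changes.txt` is a plain `diff` of two `dpkg -l` listings; both function names are illustrative.

```bash
#!/bin/bash
# installed_packages DIFF_FILE
# Prints "name version" for packages in state "ii" that appear only in the "after" list.
installed_packages() {
  awk '$1 == ">" && $2 == "ii" { print $3, $4 }' "$1"
}

# removed_packages DIFF_FILE
# Prints "name version" for packages in state "ii" that appear only in the "before" list.
removed_packages() {
  awk '$1 == "<" && $2 == "ii" { print $3, $4 }' "$1"
}
```

An upgraded package shows up in both lists (old version removed, new version installed), so compare the two outputs before reporting a removal.
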
**Check for orphaned processes:**
```bash
# Processes still running after skill completion
grep ">" /tmp/process-changes.txt | grep -v "ps\|grep\|ssh"
```
**System modifications:**
```bash
# Check for systemd services
multipass exec "$VM_NAME" -- systemctl list-units --type=service --state=running
# Check for cron jobs
multipass exec "$VM_NAME" -- crontab -l
# Check for environment modifications
multipass exec "$VM_NAME" -- cat /etc/environment
```
### Step 9: Generate Comprehensive Report
```markdown
# VM Isolation Test Report: [skill-name]
## Environment
**VM Platform**: Multipass / UTM / virsh
**OS**: Ubuntu 22.04 LTS
**VM Name**: $VM_NAME
**Resources**: 2 vCPU, 2GB RAM, 10GB disk
## Execution Results
**Status**: ✅ Completed successfully
**Duration**: 45 seconds
**Exit Code**: 0
## Filesystem Changes
**Files Added**: 12
- `/home/ubuntu/.claude/cache/skill-data.json` (15KB)
- `/tmp/skill-temp-*.log` (3 files, 45KB total)
- `/home/ubuntu/.cache/skill-assets/` (8 files, 120KB)
**Files Modified**: 2
- `/home/ubuntu/.claude/config.json` (updated skill registry)
- `/home/ubuntu/.bash_history` (normal)
**Files Deleted**: 0
## Package Changes
**Installed Packages**: 2
- `jq` (1.6-2.1ubuntu3)
- `tree` (2.0.2-1)
**Removed Packages**: 0
## System Modifications
✅ No systemd services added
✅ No cron jobs created
✅ No environment variables modified
⚠️ Found leftover temp files in /tmp
## Process Analysis
**Orphaned Processes**: 0
**Background Jobs**: 0
**Network Connections**: 0
## Security Assessment
✅ No unauthorized file access attempts
✅ No privilege escalation attempts
✅ No suspicious network activity
✅ All operations within user home directory
## Dependency Analysis
**System Packages Required**:
- `jq` (for JSON processing) - Not documented in README
- `tree` (for directory visualization) - Optional
**NPM Packages Required**: None beyond Claude Code
**Hardcoded Paths Detected**:
⚠️ `/home/ubuntu/.claude/cache` (line 67)
→ Should use `$HOME/.claude/cache` or `~/.claude/cache`
## Recommendations
1. **CRITICAL**: Document `jq` dependency in README.md
2. **HIGH**: Fix hardcoded path on line 67
3. **MEDIUM**: Clean up /tmp files before skill exits
4. **LOW**: Consider making `tree` dependency optional
## Overall Grade: B (READY with minor fixes)
**Portability**: 85/100
**Cleanliness**: 75/100
**Security**: 100/100
**Documentation**: 70/100
**Final Status**: ✅ **APPROVED** for public release after addressing CRITICAL and HIGH priority items
```
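
The sample report gives four category scores and an overall letter grade but does not say how they are combined. One possible scheme, shown here purely as an assumption, is an unweighted mean with conventional 10-point cutoffs; it happens to reproduce the sample (85, 75, 100, 70 gives 82, a B). `overall_grade` is an illustrative name.

```bash
#!/bin/bash
# overall_grade PORTABILITY CLEANLINESS SECURITY DOCUMENTATION
# Prints "<average> <letter>" using integer division and assumed cutoffs.
overall_grade() {
  local avg=$(( ($1 + $2 + $3 + $4) / 4 ))
  local letter
  if   [ "$avg" -ge 90 ]; then letter="A"
  elif [ "$avg" -ge 80 ]; then letter="B"
  elif [ "$avg" -ge 70 ]; then letter="C"
  elif [ "$avg" -ge 60 ]; then letter="D"
  else                         letter="F"
  fi
  echo "$avg $letter"
}
```

A stricter variant might cap the grade whenever the security score is below some threshold, since a single security failure should not be averaged away.
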
### Step 10: Cleanup or Preserve
**Ask user:**
```
Test complete. VM: $VM_NAME
Options:
1. Keep VM for manual inspection
Command: multipass shell $VM_NAME
2. Stop VM (can restart later)
Command: multipass stop $VM_NAME
3. Delete VM and snapshots (full cleanup)
Command: multipass delete $VM_NAME && multipass purge
4. Rollback to "before" snapshot and retest
(virsh/UTM only)
Your choice?
```
**Cleanup commands:**
```bash
# Option 2: Stop VM
multipass stop "$VM_NAME"
# Option 3: Full cleanup
multipass delete "$VM_NAME"
multipass purge
# Cleanup temp files
rm -rf /tmp/skill-test-logs
rm -f /tmp/before-*.txt /tmp/after-*.txt /tmp/*-changes.txt
```
## Interpreting Results
### ✅ **PASS** - Production Ready
- VM still bootable after test
- Skill completed successfully
- No unauthorized system modifications
- All dependencies documented
- No security issues detected
- Clean cleanup (no orphaned resources)
### ⚠️ **WARNING** - Needs Review
- Skill works but left system modifications
- Installed undocumented packages
- Modified system configs (needs user consent)
- Performance issues (high resource usage)
### ❌ **FAIL** - Not Safe
- VM corrupted or unbootable
- Skill crashed or hung indefinitely
- Unauthorized privilege escalation
- Malicious behavior detected
- Critical undocumented dependencies
- Data exfiltration attempts
## Common Issues
### Issue: "Multipass not found"
**Fix**:
```bash
# macOS
brew install multipass
# Linux
sudo snap install multipass
```
### Issue: "Virtualization not enabled"
**Cause**: VT-x/AMD-V disabled in BIOS
**Fix**: Enable virtualization in BIOS/UEFI settings
### Issue: "Failed to launch VM"
**Cause**: Insufficient resources
**Fix**:
```bash
# Reduce VM resources
multipass launch --cpus 1 --memory 1G --disk 5G
```
### Issue: "VM network not working"
**Cause**: Network bridge issues
**Fix**:
```bash
# Restart Multipass daemon
# macOS
sudo launchctl kickstart -k system/com.canonical.multipassd
# Linux
sudo systemctl restart snap.multipass.multipassd
```
### Issue: "Can't copy files to VM"
**Cause**: SSH/sftp issues
**Fix**:
```bash
# Mount host directory instead
multipass mount ~/.claude/skills "$VM_NAME":/mnt/skills
```
## Advanced Techniques
### Automated Testing Pipeline
```bash
#!/bin/bash
# test-skill-vm.sh
SKILL_NAME="$1"
VM_NAME="skill-test-$SKILL_NAME-$(date +%s)"
# Launch VM
multipass launch --name "$VM_NAME" 22.04
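# Setup
# The npm package is @anthropic-ai/claude-code. It needs Node.js 18+, which is
# newer than the nodejs package in Ubuntu 22.04's default repositories.
multipass exec "$VM_NAME" -- bash -c "
  sudo apt-get update
  sudo apt-get install -y nodejs npm
  sudo npm install -g @anthropic-ai/claude-code
"
# Copy skill (create the target directory first; --recursive copies directories)
multipass exec "$VM_NAME" -- mkdir -p /home/ubuntu/.claude/skills
multipass transfer --recursive "$HOME/.claude/skills/$SKILL_NAME" "$VM_NAME":/home/ubuntu/.claude/skills/
# Run test
multipass exec "$VM_NAME" -- claude skill test "$SKILL_NAME"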
# Cleanup
multipass delete "$VM_NAME"
multipass purge
```
### Testing on Multiple OS Versions
```bash
# Test on Ubuntu 20.04, 22.04, and 24.04
for version in 20.04 22.04 24.04; do
  # Instance names may only contain letters, digits, and hyphens
  VM="skill-test-ubuntu-${version//./-}"
  multipass launch --name "$VM" "$version"
  # ... run tests ...
  multipass delete "$VM"
done
multipass purge
```
### Network Isolation Testing
```bash
# Create VM without internet access (if supported by hypervisor)
# Then test if skill fails gracefully without network
```
## Best Practices
1. **Always take snapshots** before running skills
2. **Test on clean VMs** - don't reuse VMs between tests
3. **Monitor resource usage** - catch runaway processes
4. **Check system logs** (`/var/log/syslog`) for warnings
5. **Test rollback** - ensure VM can be restored
6. **Document all system dependencies** found
7. **Use minimal VM resources** to catch resource issues
8. **Archive test results** before destroying VMs
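Practice 8 (archive results before destroying the VM) can be sketched as follows. The helper name `archive_test_results` and the archive naming scheme are assumptions for illustration; only `tar`, `date`, and `mkdir` are used.

```bash
# Hypothetical helper: bundle a log directory into a timestamped tarball.
#   $1 directory holding test logs
#   $2 directory to store archives in
archive_test_results() {
  local log_dir="$1" archive_dir="$2"
  local archive
  archive="$archive_dir/skill-test-$(date +%Y%m%d-%H%M%S).tar.gz"

  [ -d "$log_dir" ] || { echo "no such log directory: $log_dir" >&2; return 1; }
  mkdir -p "$archive_dir"
  # -C makes paths inside the archive relative to the log directory's parent
  tar -czf "$archive" -C "$(dirname "$log_dir")" "$(basename "$log_dir")"
  echo "$archive"
}
```

Calling it on the log directory (for example `/tmp/skill-test-logs`) before `multipass delete` keeps the evidence after the VM is gone.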
## Quick Command Reference
```bash
# Launch VM
multipass launch --name test-vm 22.04
# List VMs
multipass list
# Shell into VM
multipass shell test-vm
# Execute command in VM
multipass exec test-vm -- <command>
# Copy file to VM
multipass transfer local-file test-vm:/remote/path
# Copy file from VM
multipass transfer test-vm:/remote/path local-file
# Stop VM
multipass stop test-vm
# Start VM
multipass start test-vm
# Delete VM
multipass delete test-vm && multipass purge
# VM info
multipass info test-vm
```
---
**Remember:** VM isolation is the gold standard for testing high-risk skills. It is slower than the other modes but gives the strongest isolation and the most accurate view of system-level behavior. Use it for skills from untrusted sources or skills that modify system state.


@@ -0,0 +1,408 @@
# Skill Isolation Test Report: {{skill_name}}
**Generated**: {{timestamp}}
**Tester**: {{tester_name}}
**Environment**: {{environment}} ({{mode}})
**Duration**: {{duration}}
---
## Executive Summary
**Overall Status**: {{status}}
**Grade**: {{grade}}
**Ready for Release**: {{ready_for_release}}
### Quick Stats
- Execution Status: {{execution_status}}
- Side Effects: {{side_effects_count}} detected
- Dependencies: {{dependencies_count}} found
- Issues: {{issues_high_count}} HIGH, {{issues_medium_count}} MEDIUM, {{issues_low_count}} LOW
---
## Test Environment
**Isolation Mode**: {{mode}}
**Platform**: {{platform}}
**OS**: {{os_version}}
**Resources**: {{resources}}
{{#if mode_specific_details}}
### Mode-Specific Details
{{mode_specific_details}}
{{/if}}
---
## Execution Results
### Status
{{execution_status_icon}} **{{execution_status}}**
### Details
- **Start Time**: {{start_time}}
- **End Time**: {{end_time}}
- **Duration**: {{duration}}
- **Exit Code**: {{exit_code}}
### Output
```
{{skill_output}}
```
{{#if execution_errors}}
### Errors
```
{{execution_errors}}
```
{{/if}}
### Resource Usage
- **Peak CPU**: {{peak_cpu}}%
- **Peak Memory**: {{peak_memory}}
- **Disk I/O**: {{disk_io}}
- **Network**: {{network_usage}}
---
## Side Effects Analysis
### Filesystem Changes
#### Files Created: {{files_created_count}}
{{#each files_created}}
- `{{path}}` ({{size}}){{#if temporary}} - TEMPORARY{{/if}}{{#if cleanup_failed}} ⚠️ Not cleaned up{{/if}}
{{/each}}
{{#if files_created_count_zero}}
✅ No files created
{{/if}}
#### Files Modified: {{files_modified_count}}
{{#each files_modified}}
- `{{path}}`{{#if expected}} - Expected{{else}} ⚠️ Unexpected{{/if}}
{{/each}}
{{#if files_modified_count_zero}}
✅ No files modified
{{/if}}
#### Files Deleted: {{files_deleted_count}}
{{#each files_deleted}}
- `{{path}}`{{#if expected}} - Expected{{else}} ⚠️ Unexpected{{/if}}
{{/each}}
{{#if files_deleted_count_zero}}
✅ No files deleted
{{/if}}
### Process Management
#### Processes Created: {{processes_created_count}}
{{#each processes}}
- PID {{pid}}: `{{command}}`{{#if still_running}} ⚠️ Still running{{/if}}
{{/each}}
{{#if orphaned_processes}}
⚠️ **Orphaned Processes**: {{orphaned_processes_count}}
{{#each orphaned_processes}}
- PID {{pid}}: `{{command}}` ({{runtime}} running)
{{/each}}
{{/if}}
{{#if no_process_issues}}
✅ All processes completed successfully
{{/if}}
### System Configuration
#### Environment Variables
{{#if env_vars_changed}}
{{#each env_vars_changed}}
- `{{name}}`: {{before}} → {{after}}
{{/each}}
{{else}}
✅ No environment variable changes
{{/if}}
#### Services & Daemons
{{#if services_started}}
{{#each services_started}}
- `{{name}}` ({{status}}){{#if undocumented}} ⚠️ Undocumented{{/if}}
{{/each}}
{{else}}
✅ No services started
{{/if}}
#### Package Installations
{{#if packages_installed}}
{{#each packages_installed}}
- `{{name}}` ({{version}}){{#if undocumented}} ⚠️ Not documented{{/if}}
{{/each}}
{{else}}
✅ No packages installed
{{/if}}
### Network Activity
{{#if network_connections}}
**Connections**: {{network_connections_count}}
{{#each network_connections}}
- {{protocol}} to `{{destination}}:{{port}}`{{#if secure}} (HTTPS){{else}} ⚠️ (HTTP){{/if}}
{{/each}}
**Data Transmitted**: {{data_transmitted}}
{{else}}
✅ No network activity detected
{{/if}}
### Database Changes
{{#if database_changes}}
{{#each database_changes}}
- {{type}}: {{description}}
{{/each}}
{{else}}
✅ No database changes
{{/if}}
---
## Dependency Analysis
### System Packages Required
{{#if system_packages}}
{{#each system_packages}}
{{#if documented}}✅{{else}}⚠️{{/if}} `{{name}}`{{#if version}} ({{version}}){{/if}}{{#unless documented}} - **Not documented in README**{{/unless}}
{{/each}}
{{else}}
✅ No system package dependencies
{{/if}}
### Language Packages (npm/pip/gem)
{{#if language_packages}}
{{#each language_packages}}
{{#if documented}}✅{{else}}⚠️{{/if}} `{{name}}@{{version}}`{{#unless documented}} - **Not documented**{{/unless}}
{{/each}}
{{else}}
✅ No language package dependencies
{{/if}}
### Runtime Requirements
{{#if runtime_requirements}}
{{#each runtime_requirements}}
- {{name}}: {{requirement}} {{#if met}}✅{{else}}❌{{/if}}
{{/each}}
{{else}}
✅ No special runtime requirements
{{/if}}
---
## Code Quality Issues
### Hardcoded Paths Detected
{{#if hardcoded_paths}}
{{#each hardcoded_paths}}
⚠️ `{{path}}` in {{file}}:{{line}}
**Recommendation**: Use `$HOME` or relative path
{{/each}}
{{else}}
✅ No hardcoded paths detected
{{/if}}
### Security Concerns
{{#if security_issues}}
{{#each security_issues}}
{{severity_icon}} **{{severity}}**: {{description}}
Location: {{file}}:{{line}}
Recommendation: {{recommendation}}
{{/each}}
{{else}}
✅ No security issues detected
{{/if}}
### Performance Issues
{{#if performance_issues}}
{{#each performance_issues}}
⚠️ {{description}}
{{/each}}
{{else}}
✅ No performance issues detected
{{/if}}
---
## Portability Assessment
### Cross-Platform Compatibility
- **Linux**: {{linux_compatible}}
- **macOS**: {{macos_compatible}}
- **Windows**: {{windows_compatible}}
### Environment Dependencies
{{#if env_dependencies}}
{{#each env_dependencies}}
- {{name}}: {{status}}
{{/each}}
{{else}}
✅ No environment-specific dependencies
{{/if}}
### User-Specific Assumptions
{{#if user_assumptions}}
{{#each user_assumptions}}
⚠️ {{description}}
{{/each}}
{{else}}
✅ No user-specific assumptions
{{/if}}
---
## Issues Summary
### 🔴 HIGH Priority ({{issues_high_count}})
{{#each issues_high}}
{{index}}. **{{title}}**
- Impact: {{impact}}
- Location: {{location}}
- Fix: {{fix_recommendation}}
{{/each}}
{{#if no_high_issues}}
✅ No HIGH priority issues
{{/if}}
### 🟡 MEDIUM Priority ({{issues_medium_count}})
{{#each issues_medium}}
{{index}}. **{{title}}**
- Impact: {{impact}}
- Location: {{location}}
- Fix: {{fix_recommendation}}
{{/each}}
{{#if no_medium_issues}}
✅ No MEDIUM priority issues
{{/if}}
### 🟢 LOW Priority ({{issues_low_count}})
{{#each issues_low}}
{{index}}. **{{title}}**
- Impact: {{impact}}
- Fix: {{fix_recommendation}}
{{/each}}
{{#if no_low_issues}}
✅ No LOW priority issues
{{/if}}
---
## Recommendations
### Required Before Release
{{#each required_fixes}}
{{index}}. {{recommendation}}
{{/each}}
{{#if no_required_fixes}}
✅ No required fixes
{{/if}}
### Suggested Improvements
{{#each suggested_improvements}}
{{index}}. {{recommendation}}
{{/each}}
### Documentation Updates Needed
{{#each documentation_updates}}
- {{item}}
{{/each}}
---
## Scoring Breakdown
| Category | Score | Weight | Weighted Score |
|----------|-------|--------|----------------|
| **Execution** | {{execution_score}}/100 | 25% | {{execution_weighted}} |
| **Cleanliness** | {{cleanliness_score}}/100 | 25% | {{cleanliness_weighted}} |
| **Security** | {{security_score}}/100 | 30% | {{security_weighted}} |
| **Portability** | {{portability_score}}/100 | 10% | {{portability_weighted}} |
| **Documentation** | {{documentation_score}}/100 | 10% | {{documentation_weighted}} |
| **TOTAL** | | | **{{total_score}}/100** |
### Grade: {{grade}}
**Grading Scale:**
- A (90-100): Production ready
- B (80-89): Ready with minor fixes
- C (70-79): Significant improvements needed
- D (60-69): Major issues, not recommended
- F (0-59): Not safe to use
---
## Test Artifacts
### Snapshots
- Before: `{{snapshot_before_path}}`
- After: `{{snapshot_after_path}}`
### Logs
- Execution log: `{{execution_log_path}}`
- Side effects log: `{{side_effects_log_path}}`
### Isolation Environment
{{#if environment_preserved}}
**Preserved for debugging**
Access instructions:
```bash
{{access_command}}
```
{{else}}
🗑️ **Cleaned up**
{{/if}}
---
## Final Verdict
### Status: {{final_status}}
{{#if approved}}
**APPROVED for public release**
This skill has passed isolation testing with acceptable results and has no HIGH priority issues blocking release. Consider the MEDIUM/LOW priority improvements in future versions.
{{/if}}
{{#if approved_with_fixes}}
⚠️ **APPROVED with required fixes**
This skill will be ready for public release after addressing the {{issues_high_count}} HIGH priority issue(s) listed above. Retest after fixes.
{{/if}}
{{#if not_approved}}
**NOT APPROVED**
This skill has critical issues that must be addressed before public release. Major refactoring or fixes required. Retest after addressing all HIGH priority issues and reviewing MEDIUM priority items.
{{/if}}
### Next Steps
{{#each next_steps}}
{{index}}. {{step}}
{{/each}}
---
**Test Completed**: {{completion_time}}
**Report Version**: 1.0
**Tester**: {{tester_name}}
---
*This report was generated by skill-isolation-tester*


@@ -0,0 +1,392 @@
# Skill Test Templates
Production-ready test templates for validating Claude Code skills in isolated environments.
## Overview
These templates provide standardized testing workflows for different skill types. Each template includes:
- Pre-flight environment validation
- Before/after snapshots for comparison
- Comprehensive safety and security checks
- Detailed reporting with pass/fail criteria
- Automatic cleanup on exit (success or failure)
## CI/CD Integration with JSON Output
All test templates support JSON output for integration with CI/CD pipelines. The JSON reporter generates:
- **Structured JSON** - Machine-readable test results
- **JUnit XML** - Compatible with Jenkins, GitLab CI, GitHub Actions
- **Markdown Summary** - Human-readable reports for GitHub Actions
**Enable JSON output:**
```bash
export JSON_ENABLED=true
./test-templates/docker-skill-test-json.sh my-skill
```
**Output files:**
- `test-report.json` - Full structured test data
- `test-report.junit.xml` - JUnit format for CI systems
- `test-report.md` - Markdown summary
**JSON Report Structure:**
```json
{
  "test_name": "docker-skill-test",
  "skill_name": "my-skill",
  "timestamp": "2025-11-02T12:00:00Z",
  "status": "passed",
  "duration_seconds": 45,
  "exit_code": 0,
  "metrics": {
    "containers_created": 2,
    "images_created": 1,
    "execution_duration_seconds": 12
  },
  "issues": [],
  "recommendations": []
}
```
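A CI step can gate on the `status` field of this report. The sketch below is an assumption about how a consumer might read it, not part of `lib/json-reporter.sh`; the function names `report_status` and `require_passed` are hypothetical. It uses `jq` when available and falls back to a simple `sed` match otherwise.

```bash
# Hypothetical helper: print the top-level "status" value of a report file.
report_status() {
  local report="$1"
  if command -v jq >/dev/null 2>&1; then
    jq -r '.status' "$report"
  else
    # Fallback without jq: take the first "status": "..." pair in the file
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$report" | head -n 1
  fi
}

# Succeeds only when the report says "passed"
require_passed() {
  [ "$(report_status "$1")" = "passed" ]
}
```

In a pipeline this would be called as `require_passed path/to/test-report.json`, so a non-passing report fails the step.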
**GitHub Actions Integration:**
```yaml
- name: Test Skill
  run: |
    export JSON_ENABLED=true
    ./test-templates/docker-skill-test-json.sh my-skill
- name: Upload Test Results
  if: always()  # upload reports even when the test step fails
  uses: actions/upload-artifact@v4
  with:
    name: test-results
    path: /tmp/skill-test-*/test-report.*
```
See `lib/json-reporter.sh` for full API documentation.
---
## Available Templates
### 1. Docker Skill Test (`docker-skill-test.sh`)
**Use for skills that:**
- Start or manage Docker containers
- Build Docker images
- Work with Docker volumes, networks, or compose files
- Require Docker daemon access
**Features:**
- Tracks Docker resource creation (containers, images, volumes, networks)
- Detects orphaned containers
- Validates cleanup behavior
- Resource limit enforcement
**Usage:**
```bash
chmod +x test-templates/docker-skill-test.sh
./test-templates/docker-skill-test.sh my-docker-skill
```
**Customization:**
Edit the skill execution command on line ~178:
```bash
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
  cd /root/.claude/skills/$SKILL_NAME
  ./skill.sh test-mode  # <-- Customize this
"
```
---
### 2. API Skill Test (`api-skill-test.sh`)
**Use for skills that:**
- Make HTTP/HTTPS requests to external APIs
- Require API keys or authentication
- Interact with web services
- Need network access
**Features:**
- Network traffic monitoring
- API call detection and counting
- API key/secret leak detection
- Rate limiting validation
- HTTPS enforcement checking
**Usage:**
```bash
chmod +x test-templates/api-skill-test.sh
./test-templates/api-skill-test.sh my-api-skill
```
**Optional: Enable network capture:**
```bash
# Requires tcpdump and sudo
sudo apt-get install tcpdump # or brew install tcpdump
./test-templates/api-skill-test.sh my-api-skill
```
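The key/secret leak detection listed under Features can be approximated with a pattern scan over captured output. This is a minimal sketch under stated assumptions: the function name `scan_log_for_secrets` and the exact pattern are illustrative, and a pattern scan like this will miss secrets that do not follow a `name=value` or `Bearer <token>` shape.

```bash
# Hypothetical helper: print suspicious lines (with line numbers) and exit 0
# if the log appears to contain credentials; exit 1 if nothing is found.
scan_log_for_secrets() {
  local log_file="$1"
  # Matches e.g. "api_key=abc123", "token: abc", "Authorization: Bearer xyz"
  grep -inE '(api[-_]?key|token|secret|password)[[:space:]]*[=:][[:space:]]*[^[:space:]]+|bearer[[:space:]]+[A-Za-z0-9._-]+' "$log_file"
}
```

Because it follows `grep`'s exit status, it can be used directly in a condition: `if scan_log_for_secrets "$NETWORK_LOG"; then echo "possible leak"; fi`.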
---
### 3. File Manipulation Skill Test (`file-manipulation-skill-test.sh`)
**Use for skills that:**
- Create, read, update, or delete files
- Modify configuration files
- Generate reports or artifacts
- Perform filesystem operations
**Features:**
- Complete filesystem diff (added/removed/modified files)
- File permission validation
- Sensitive data scanning
- Temp file cleanup verification
- MD5 checksum comparison
**Usage:**
```bash
chmod +x test-templates/file-manipulation-skill-test.sh
./test-templates/file-manipulation-skill-test.sh my-file-skill
```
**Customization:**
Add your own test files to the workspace (lines 54-70):
```bash
cat > "$TEST_DIR/test-workspace/your-file.txt" <<'EOF'
Your test content here
EOF
```
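The before/after filesystem comparison listed under Features can be sketched as below. The helper names `snapshot_dir` and `diff_snapshots` are hypothetical, and POSIX `cksum` stands in for the MD5 tool so the sketch runs unchanged on Linux and macOS; the actual template may work differently.

```bash
# Hypothetical helper: record one line per file in a directory.
#   $1 directory to snapshot   $2 output file (keep it outside $1)
snapshot_dir() {
  local dir="$1" out="$2"
  # Each line: "<checksum> <size> <relative path>", sorted by path
  ( cd "$dir" && find . -type f -exec cksum {} + | sort -k 3 ) > "$out"
}

# Hypothetical helper: show what changed between two snapshots.
diff_snapshots() {
  local before="$1" after="$2"
  # "<" lines existed only before, ">" lines only after.
  # A modified file appears as both, with different checksums.
  diff "$before" "$after" || true
}
```

Take one snapshot of the workspace before running the skill and one after, then pass both files to `diff_snapshots`.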
---
### 4. Git Skill Test (`git-skill-test.sh`)
**Use for skills that:**
- Create commits, branches, or tags
- Modify git history or configuration
- Work with git worktrees
- Interact with remote repositories
**Features:**
- Git state comparison (commits, branches, tags)
- Working tree cleanliness validation
- Force operation detection
- History rewriting detection
- Dangling commit detection
**Usage:**
```bash
chmod +x test-templates/git-skill-test.sh
./test-templates/git-skill-test.sh my-git-skill
```
**Customization:**
Modify the test repository setup (lines 59-81) to match your skill's requirements.
---
## Common Usage Patterns
### Basic Test Execution
```bash
# Run test for a specific skill
./test-templates/docker-skill-test.sh my-skill-name
# Keep container for debugging
export SKILL_TEST_KEEP_CONTAINER="true"
./test-templates/docker-skill-test.sh my-skill-name
# Keep images after test
export SKILL_TEST_REMOVE_IMAGES="false"
./test-templates/docker-skill-test.sh my-skill-name
```
### Custom Resource Limits
```bash
# Set custom memory/CPU limits
export SKILL_TEST_MEMORY_LIMIT="1g"
export SKILL_TEST_CPU_LIMIT="2.0"
./test-templates/docker-skill-test.sh my-skill-name
```
### Parallel Testing
```bash
# Test multiple skills in parallel
for skill in skill1 skill2 skill3; do
  ./test-templates/docker-skill-test.sh "$skill" &
done
wait
echo "All tests complete!"
```
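A bare `wait` discards the exit status of each background job, so a failed test can go unnoticed in a parallel run. The sketch below records each job's PID and waits on them one by one. `run_one_test` is a stand-in so the example runs without Docker; in practice it would call the template script. Both function names are hypothetical.

```bash
# Stand-in for: ./test-templates/docker-skill-test.sh "$1"
# Here it "fails" for any skill whose name starts with "bad".
run_one_test() {
  case "$1" in
    bad*) return 1 ;;
    *)    return 0 ;;
  esac
}

run_tests_in_parallel() {
  local skill i failed=0
  local pids=() names=()

  for skill in "$@"; do
    run_one_test "$skill" &
    pids+=("$!")
    names+=("$skill")
  done

  # `wait <pid>` returns that job's exit status, unlike a bare `wait`
  for i in "${!pids[@]}"; do
    if ! wait "${pids[$i]}"; then
      echo "FAILED: ${names[$i]}"
      failed=1
    fi
  done
  return "$failed"
}
```

`run_tests_in_parallel skill1 skill2 skill3` then returns non-zero if any of the tests failed, which makes it usable as a CI gate.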
### CI/CD Integration
```bash
#!/bin/bash
# Exit code 0 = pass, 1 = fail
set -e
SKILLS=(
  "skill-creator"
  "claude-code-otel-setup"
  "playwright-e2e-automation"
)
for skill in "${SKILLS[@]}"; do
  echo "Testing $skill..."
  ./test-templates/docker-skill-test.sh "$skill" || {
    echo "$skill failed!"
    exit 1
  }
done
echo "✅ All skills passed!"
```
## Customizing Templates
### Add Custom Validation
Insert your own checks before the "Generate Test Report" section:
```bash
# ============================================================================
# Custom Validation
# ============================================================================
echo ""
echo "=== Running Custom Checks ==="
# Your custom checks here
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
  # Example: Check if specific file exists
  test -f /workspace/expected-output.txt || {
    echo 'ERROR: Expected output file not found'
    exit 1
  }
"
```
### Modify Execution Command
Each template has a skill execution section. Customize the command to match your skill's interface:
```bash
# Example: Run skill with arguments
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
  cd /root/.claude/skills/$SKILL_NAME
  ./skill.sh --mode=test --output=/workspace/results
"
# Example: Source skill as library
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
  source /root/.claude/skills/$SKILL_NAME/lib.sh
  run_skill_tests
"
```
### Add Pre-Test Setup
Insert setup steps after the "Build Test Environment" section:
```bash
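# Install additional dependencies
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
  apt-get update && apt-get install -y your-package
"
# Set environment variables. An export inside one `docker exec` is gone when
# that command returns, so pass the variable with -e on the exec that runs the skill.
docker exec -e SKILL_CONFIG_PATH=/etc/skill-config.json "$SKILL_TEST_CONTAINER_ID" bash -c "
  cd /root/.claude/skills/$SKILL_NAME
  ./skill.sh test-mode
"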
```
## Environment Variables
All templates support these environment variables:
| Variable | Default | Description |
|----------|---------|-------------|
| `SKILL_TEST_KEEP_CONTAINER` | `false` | Keep container after test for debugging |
| `SKILL_TEST_REMOVE_IMAGES` | `true` | Remove test images after completion |
| `SKILL_TEST_MEMORY_LIMIT` | `512m` | Container memory limit |
| `SKILL_TEST_CPU_LIMIT` | `1.0` | Container CPU limit (cores) |
| `SKILL_TEST_TEMP_DIR` | `/tmp/skill-test-*` | Temporary directory for test artifacts |
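For a caller's `export` to take effect, a template has to apply these defaults only when the variable is not already set. A minimal sketch of that pattern, assuming the defaults from the table above (the function name `apply_skill_test_defaults` is hypothetical):

```bash
# Hypothetical sketch: keep any value the caller exported, otherwise fall
# back to the documented default. ${VAR:-default} is also safe under `set -u`.
apply_skill_test_defaults() {
  export SKILL_TEST_KEEP_CONTAINER="${SKILL_TEST_KEEP_CONTAINER:-false}"
  export SKILL_TEST_REMOVE_IMAGES="${SKILL_TEST_REMOVE_IMAGES:-true}"
  export SKILL_TEST_MEMORY_LIMIT="${SKILL_TEST_MEMORY_LIMIT:-512m}"
  export SKILL_TEST_CPU_LIMIT="${SKILL_TEST_CPU_LIMIT:-1.0}"
}
```

An unconditional `export SKILL_TEST_KEEP_CONTAINER="false"` inside a template would silently override whatever the caller set.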
## Exit Codes
- `0` - Test passed (skill executed successfully)
- `1` - Test failed (skill execution error or validation failure)
- `>1` - Other errors (environment setup, Docker issues, etc.)
## Troubleshooting
### "Docker daemon not running"
```bash
# macOS
open -a Docker
# Linux
sudo systemctl start docker
```
### "Permission denied" errors
```bash
# Add user to docker group
sudo usermod -aG docker $USER
newgrp docker
```
### Container hangs or never exits
```bash
# Abort the whole test run if it takes longer than 300 seconds
timeout 300 ./test-templates/docker-skill-test.sh my-skill
```
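GNU `timeout` exits with status 124 when the limit is hit, which would otherwise be lumped in with the "other errors" exit codes. A small wrapper can make the outcome explicit. This is a sketch assuming GNU coreutils `timeout` is installed (it is not present by default on macOS); the function name `run_with_timeout` is hypothetical.

```bash
# Hypothetical wrapper: run a command under a time limit and name the outcome.
#   $1 limit in seconds; remaining arguments are the command to run
run_with_timeout() {
  local limit="$1"; shift
  local status=0
  timeout "$limit" "$@" || status=$?

  case "$status" in
    0)   echo "passed" ;;
    1)   echo "failed" ;;
    124) echo "timed out after ${limit}s" ;;  # GNU timeout's exit status
    *)   echo "error (exit code $status)" ;;
  esac
  return "$status"
}
```

Usage would look like `run_with_timeout 300 ./test-templates/docker-skill-test.sh my-skill`.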
### Need to inspect failed test
```bash
# Keep container after failure
export SKILL_TEST_KEEP_CONTAINER="true"
./test-templates/docker-skill-test.sh my-skill
# Inspect container
docker start -ai <container-id>
docker logs <container-id>
```
## Best Practices
1. **Run tests before committing** - Catch environment-specific bugs early
2. **Test in clean environment** - Don't rely on local configs or files
3. **Validate cleanup** - Ensure skills don't leave orphaned resources
4. **Check for secrets** - Never commit API keys or sensitive data
5. **Document dependencies** - List all required packages and tools
6. **Use resource limits** - Prevent runaway processes
7. **Review diffs carefully** - Understand all file system changes
## Contributing
To add a new test template:
1. Copy an existing template as a starting point
2. Customize for your skill type
3. Add comprehensive validation checks
4. Update this README with usage documentation
5. Test your template with at least 3 different skills
## Related Documentation
- `../lib/docker-helpers.sh` - Shared helper functions
- `../modes/mode2-docker.md` - Docker isolation mode documentation
- `../skill.md` - Main skill documentation
## Support
For issues or questions:
- Check the skill logs: `docker logs <container-id>`
- Review test artifacts in `/tmp/skill-test-*/`
- Consult the helper library: `lib/docker-helpers.sh`


@@ -0,0 +1,317 @@
#!/bin/bash
# Test Template for API-Calling Skills
# Use this template when testing skills that:
# - Make HTTP/HTTPS requests to external APIs
# - Require API keys or authentication
# - Need network access
# - Interact with web services
set -euo pipefail
# ============================================================================
# Configuration
# ============================================================================
SKILL_NAME="${1:-example-api-skill}"
SKILL_PATH="$HOME/.claude/skills/$SKILL_NAME"
TEST_ID="$(date +%s)"
TEST_DIR="/tmp/skill-test-$TEST_ID"
# ============================================================================
# Load Helper Library
# ============================================================================
HELPER_LIB="$HOME/.claude/skills/skill-isolation-tester/lib/docker-helpers.sh"
if [[ ! -f "$HELPER_LIB" ]]; then
echo "ERROR: Helper library not found: $HELPER_LIB"
exit 1
fi
# shellcheck source=/dev/null
source "$HELPER_LIB"
# ============================================================================
# Setup Cleanup Trap
# ============================================================================
export SKILL_TEST_TEMP_DIR="$TEST_DIR"
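# Honor values set by the caller; fall back to the documented defaults
export SKILL_TEST_KEEP_CONTAINER="${SKILL_TEST_KEEP_CONTAINER:-false}"
export SKILL_TEST_REMOVE_IMAGES="${SKILL_TEST_REMOVE_IMAGES:-true}"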
trap cleanup_on_exit EXIT
# ============================================================================
# Pre-flight Checks
# ============================================================================
echo "=== API Skill Test: $SKILL_NAME ==="
echo "Test ID: $TEST_ID"
echo ""
# Validate skill exists
if [[ ! -d "$SKILL_PATH" ]]; then
echo "ERROR: Skill not found: $SKILL_PATH"
exit 1
fi
# Validate Docker environment
preflight_check_docker || exit 1
# Check internet connectivity
if ! curl -s --max-time 5 https://www.google.com > /dev/null 2>&1; then
echo "⚠ WARNING: No internet connectivity detected"
echo " API skill may fail if it requires external network access"
fi
# ============================================================================
# Build Test Environment
# ============================================================================
echo ""
echo "=== Building Test Environment ==="
mkdir -p "$TEST_DIR"
# Create test Dockerfile
cat > "$TEST_DIR/Dockerfile" <<EOF
FROM ubuntu:22.04
# Install dependencies for API testing
RUN apt-get update && apt-get install -y \\
curl \\
jq \\
ca-certificates \\
&& rm -rf /var/lib/apt/lists/*
# Copy skill under test
COPY skill/ /root/.claude/skills/$SKILL_NAME/
WORKDIR /root
CMD ["/bin/bash"]
EOF
# Copy skill to test directory
cp -r "$SKILL_PATH" "$TEST_DIR/skill/"
# Build test image
safe_docker_build "$TEST_DIR/Dockerfile" "skill-test:$SKILL_NAME" || {
echo "ERROR: Failed to build test image"
exit 1
}
export SKILL_TEST_IMAGE_NAME="skill-test:$SKILL_NAME"
# ============================================================================
# Network Monitoring Setup
# ============================================================================
echo ""
echo "=== Setting Up Network Monitoring ==="
# Create network monitor log
NETWORK_LOG="$TEST_DIR/network-activity.log"
touch "$NETWORK_LOG"
# Start tcpdump in background (if available)
if command -v tcpdump &> /dev/null; then
  echo "Starting network capture..."
  sudo tcpdump -i any -w "$TEST_DIR/network-capture.pcap" &
  TCPDUMP_PID=$!
  echo "tcpdump PID: $TCPDUMP_PID"
else
  echo "tcpdump not available - skipping network capture"
  TCPDUMP_PID=""
fi
# ============================================================================
# Run Skill in Container
# ============================================================================
echo ""
echo "=== Running Skill in Isolated Container ==="
# Start container with DNS configuration
safe_docker_run "skill-test:$SKILL_NAME" \
--dns 8.8.8.8 \
--dns 8.8.4.4 \
bash -c "sleep infinity" || {
echo "ERROR: Failed to start test container"
exit 1
}
# Execute skill and capture network activity
echo "Executing skill..."
START_TIME=$(date +%s)
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
  cd /root/.claude/skills/$SKILL_NAME
  # Add your skill execution command here
  # Example: ./api-skill.sh --test-mode
  echo 'Skill execution placeholder - customize this for your skill'
  # Log any curl/wget/http calls made
  if command -v curl &> /dev/null; then
    echo 'curl is available in container'
  fi
  if command -v wget &> /dev/null; then
    echo 'wget is available in container'
  fi
" 2>&1 | tee "$NETWORK_LOG" || {
  EXEC_EXIT_CODE=$?
  echo "ERROR: Skill execution failed with exit code: $EXEC_EXIT_CODE"
  # Stop network capture
  if [[ -n "$TCPDUMP_PID" ]]; then
    sudo kill "$TCPDUMP_PID" 2>/dev/null || true
  fi
  exit "$EXEC_EXIT_CODE"
}
END_TIME=$(date +%s)
EXECUTION_TIME=$((END_TIME - START_TIME))
# Stop network capture
if [[ -n "$TCPDUMP_PID" ]]; then
sudo kill "$TCPDUMP_PID" 2>/dev/null || true
echo "Network capture saved to: $TEST_DIR/network-capture.pcap"
fi
# ============================================================================
# Analyze Network Activity
# ============================================================================
echo ""
echo "=== Analyzing Network Activity ==="
# Check for API calls in logs
echo "Searching for HTTP/HTTPS requests..."
API_CALLS=$(grep -iE "http://|https://|curl|wget|GET|POST|PUT|DELETE" "$NETWORK_LOG" || true)
if [[ -n "$API_CALLS" ]]; then
  echo "Detected API calls:"
  echo "$API_CALLS"
  # Extract unique domains
  DOMAINS=$(echo "$API_CALLS" | grep -oE "https?://[^/\"]+" | sort -u || true)
  if [[ -n "$DOMAINS" ]]; then
    echo ""
    echo "Unique API endpoints:"
    echo "$DOMAINS"
  fi
else
  echo "No obvious API calls detected in logs"
fi
# Check container network stats
echo ""
echo "Container network statistics:"
docker stats --no-stream --format "table {{.Name}}\t{{.NetIO}}" "$SKILL_TEST_CONTAINER_ID"
# ============================================================================
# Validate API Key Handling
# ============================================================================
echo ""
echo "=== Validating API Key Security ==="
# Check if API keys appear in logs (security concern)
POTENTIAL_KEYS=$(grep -iE "api[-_]?key|token|secret|password|bearer" "$NETWORK_LOG" | grep -v "API_KEY=" || true)
if [[ -n "$POTENTIAL_KEYS" ]]; then
echo "⚠ WARNING: Potential API keys/secrets found in logs:"
echo "$POTENTIAL_KEYS"
echo ""
echo "SECURITY ISSUE: API keys should NOT appear in logs!"
echo " - Use environment variables instead"
echo " - Redact sensitive data in log output"
fi
# Check for hardcoded endpoints
HARDCODED_URLS=$(grep -rn "http://" "$SKILL_PATH" 2>/dev/null | grep -v "example.com" || true)
if [[ -n "$HARDCODED_URLS" ]]; then
echo "⚠ WARNING: Hardcoded HTTP URLs found (should use HTTPS):"
echo "$HARDCODED_URLS"
fi
# ============================================================================
# Rate Limiting Check
# ============================================================================
echo ""
echo "=== Checking Rate Limiting Behavior ==="
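# Count number of requests made.
# grep -c already prints 0 when nothing matches (and exits 1), so use `|| true`;
# `|| echo "0"` would produce two lines of output and break the comparison below.
REQUEST_COUNT=$(grep -icE "GET|POST|PUT|DELETE" "$NETWORK_LOG" || true)
echo "Total HTTP requests detected: $REQUEST_COUNT"
if [[ $REQUEST_COUNT -gt 100 ]]; then
  echo "⚠ WARNING: High number of API requests ($REQUEST_COUNT)"
  echo " - Consider implementing rate limiting"
  echo " - Use caching to reduce API calls"
  echo " - Check for request loops"
fi
# Avoid division by zero when the skill finishes in under a second
if [[ $EXECUTION_TIME -gt 0 ]]; then
  REQUESTS_PER_SECOND=$((REQUEST_COUNT / EXECUTION_TIME))
else
  REQUESTS_PER_SECOND=$REQUEST_COUNT
fi
echo "Requests per second: $REQUESTS_PER_SECOND"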
# ============================================================================
# Generate Test Report
# ============================================================================
echo ""
echo "=== Test Report ==="
echo ""
CONTAINER_EXIT_CODE=$(get_container_exit_code "$SKILL_TEST_CONTAINER_ID")
if [[ $CONTAINER_EXIT_CODE -eq 0 ]]; then
echo "✅ TEST PASSED"
else
echo "❌ TEST FAILED"
fi
echo ""
echo "Summary:"
echo " - Exit code: $CONTAINER_EXIT_CODE"
echo " - Execution time: ${EXECUTION_TIME}s"
echo " - API requests: $REQUEST_COUNT"
echo " - Network log: $NETWORK_LOG"
echo ""
echo "Security Checklist:"
if [[ -z "$POTENTIAL_KEYS" ]]; then
echo " ✓ No API keys in logs"
else
echo " ✗ API keys found in logs"
fi
if [[ -z "$HARDCODED_URLS" ]]; then
echo " ✓ No hardcoded HTTP URLs"
else
echo " ✗ Hardcoded HTTP URLs found"
fi
# Same threshold as the warning above: more than 100 requests counts as high
if [[ $REQUEST_COUNT -le 100 ]]; then
  echo " ✓ Reasonable request volume"
else
  echo " ✗ High request volume"
fi
echo ""
echo "Recommendations:"
echo " - Document all external API dependencies"
echo " - Implement request caching where possible"
echo " - Use exponential backoff for retries"
echo " - Respect API rate limits"
echo " - Use HTTPS for all API calls"
echo " - Never log API keys or secrets"
# Exit with appropriate code
if [[ $CONTAINER_EXIT_CODE -eq 0 ]]; then
exit 0
else
exit 1
fi


@@ -0,0 +1,302 @@
#!/bin/bash
# Test Template for Docker-Based Skills with JSON Output
# This is an enhanced version of docker-skill-test.sh with CI/CD integration
set -euo pipefail
# ============================================================================
# Configuration
# ============================================================================
SKILL_NAME="${1:-example-docker-skill}"
SKILL_PATH="$HOME/.claude/skills/$SKILL_NAME"
TEST_ID="$(date +%s)"
TEST_DIR="/tmp/skill-test-$TEST_ID"
# JSON reporting
export JSON_REPORT_FILE="$TEST_DIR/test-report.json"
export JSON_ENABLED="${JSON_ENABLED:-true}"
# ============================================================================
# Load Helper Libraries
# ============================================================================
HELPER_LIB="$HOME/.claude/skills/skill-isolation-tester/lib/docker-helpers.sh"
JSON_LIB="$HOME/.claude/skills/skill-isolation-tester/lib/json-reporter.sh"
if [[ ! -f "$HELPER_LIB" ]]; then
echo "ERROR: Helper library not found: $HELPER_LIB"
exit 1
fi
if [[ ! -f "$JSON_LIB" ]]; then
echo "ERROR: JSON reporter library not found: $JSON_LIB"
exit 1
fi
# shellcheck source=/dev/null
source "$HELPER_LIB"
# shellcheck source=/dev/null
source "$JSON_LIB"
# ============================================================================
# Setup Cleanup Trap
# ============================================================================
export SKILL_TEST_TEMP_DIR="$TEST_DIR"
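# Honor values set by the caller; fall back to the documented defaults
export SKILL_TEST_KEEP_CONTAINER="${SKILL_TEST_KEEP_CONTAINER:-false}"
export SKILL_TEST_REMOVE_IMAGES="${SKILL_TEST_REMOVE_IMAGES:-true}"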
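cleanup_and_finalize() {
  local exit_code=$?
  local end_time
  end_time=$(date +%s)
  # START_TIME is still unset if the script exits during pre-flight checks,
  # which would abort this trap under `set -u`; report a zero duration instead
  local duration=$((end_time - ${START_TIME:-$end_time}))
  # Finalize JSON report
  if [[ "$JSON_ENABLED" == "true" ]]; then
    json_finalize "$exit_code" "$duration"
    export_all_formats "$TEST_DIR/test-report"
  fi
  # Standard cleanup
  cleanup_on_exit
  exit "$exit_code"
}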
trap cleanup_and_finalize EXIT
# ============================================================================
# Pre-flight Checks
# ============================================================================
echo "=== Docker Skill Test (JSON Mode): $SKILL_NAME ==="
echo "Test ID: $TEST_ID"
echo ""
# Create test directory
mkdir -p "$TEST_DIR"
# Initialize JSON report
if [[ "$JSON_ENABLED" == "true" ]]; then
json_init "docker-skill-test" "$SKILL_NAME"
fi
# Validate skill exists
if [[ ! -d "$SKILL_PATH" ]]; then
  echo "ERROR: Skill not found: $SKILL_PATH"
  [[ "$JSON_ENABLED" == "true" ]] && json_add_issue "error" "setup" "Skill directory not found: $SKILL_PATH"
  exit 1
fi
# Validate Docker environment
if ! preflight_check_docker; then
  [[ "$JSON_ENABLED" == "true" ]] && json_add_issue "error" "environment" "Docker pre-flight checks failed"
  exit 1
fi
# ============================================================================
# Baseline Measurements (Before)
# ============================================================================
echo ""
echo "=== Taking Baseline Measurements ==="
START_TIME=$(date +%s)
BEFORE_CONTAINERS=$(docker ps -a --format '{{.ID}}' | wc -l)
BEFORE_IMAGES=$(docker images --format '{{.ID}}' | wc -l)
BEFORE_VOLUMES=$(docker volume ls --format '{{.Name}}' | wc -l)
BEFORE_NETWORKS=$(docker network ls --format '{{.ID}}' | wc -l)
echo "Before test:"
echo " Containers: $BEFORE_CONTAINERS"
echo " Images: $BEFORE_IMAGES"
echo " Volumes: $BEFORE_VOLUMES"
echo " Networks: $BEFORE_NETWORKS"
# Record baseline in JSON
if [[ "$JSON_ENABLED" == "true" ]]; then
json_add_metric "baseline_containers" "$BEFORE_CONTAINERS"
json_add_metric "baseline_images" "$BEFORE_IMAGES"
json_add_metric "baseline_volumes" "$BEFORE_VOLUMES"
json_add_metric "baseline_networks" "$BEFORE_NETWORKS"
fi
# ============================================================================
# Build Test Environment
# ============================================================================
echo ""
echo "=== Building Test Environment ==="
# Create test Dockerfile
cat > "$TEST_DIR/Dockerfile" <<EOF
FROM ubuntu:22.04
# Install dependencies
RUN apt-get update && apt-get install -y \\
curl \\
git \\
nodejs \\
npm \\
docker.io \\
&& rm -rf /var/lib/apt/lists/*
# Install Claude Code (mock for testing)
RUN mkdir -p /root/.claude/skills
# Copy skill under test
COPY skill/ /root/.claude/skills/$SKILL_NAME/
WORKDIR /root
CMD ["/bin/bash"]
EOF
# Copy skill to test directory
cp -r "$SKILL_PATH" "$TEST_DIR/skill/"
# Build test image
BUILD_START=$(date +%s)
if ! safe_docker_build "$TEST_DIR/Dockerfile" "skill-test:$SKILL_NAME"; then
echo "ERROR: Failed to build test image"
[[ "$JSON_ENABLED" == "true" ]] && json_add_issue "error" "build" "Docker image build failed"
exit 1
fi
BUILD_END=$(date +%s)
BUILD_DURATION=$((BUILD_END - BUILD_START))
export SKILL_TEST_IMAGE_NAME="skill-test:$SKILL_NAME"
# Record build metrics
if [[ "$JSON_ENABLED" == "true" ]]; then
json_add_metric "build_duration_seconds" "$BUILD_DURATION" "seconds"
fi
# ============================================================================
# Run Skill in Container
# ============================================================================
echo ""
echo "=== Running Skill in Isolated Container ==="
# Start container with Docker socket access
if ! safe_docker_run "skill-test:$SKILL_NAME" \
-v /var/run/docker.sock:/var/run/docker.sock \
bash -c "sleep infinity"; then
echo "ERROR: Failed to start test container"
[[ "$JSON_ENABLED" == "true" ]] && json_add_issue "error" "runtime" "Container failed to start"
exit 1
fi
# Execute skill
echo "Executing skill..."
EXEC_START=$(date +%s)
EXEC_OUTPUT=$(docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
cd /root/.claude/skills/$SKILL_NAME
echo 'Skill execution placeholder - customize this for your skill'
" 2>&1) || {
EXEC_EXIT_CODE=$?
echo "ERROR: Skill execution failed with exit code: $EXEC_EXIT_CODE"
# Show what the skill printed before it failed
echo "$EXEC_OUTPUT"
[[ "$JSON_ENABLED" == "true" ]] && json_add_issue "error" "execution" "Skill failed with exit code $EXEC_EXIT_CODE"
exit "$EXEC_EXIT_CODE"
}
# Surface the captured skill output in the console log
echo "$EXEC_OUTPUT"
EXEC_END=$(date +%s)
EXEC_DURATION=$((EXEC_END - EXEC_START))
# Record execution metrics
if [[ "$JSON_ENABLED" == "true" ]]; then
json_add_metric "execution_duration_seconds" "$EXEC_DURATION" "seconds"
fi
# ============================================================================
# Collect Measurements (After)
# ============================================================================
echo ""
echo "=== Collecting Post-Execution Measurements ==="
sleep 2 # Wait for async operations
AFTER_CONTAINERS=$(docker ps -a --format '{{.ID}}' | wc -l)
AFTER_IMAGES=$(docker images --format '{{.ID}}' | wc -l)
AFTER_VOLUMES=$(docker volume ls --format '{{.Name}}' | wc -l)
AFTER_NETWORKS=$(docker network ls --format '{{.ID}}' | wc -l)
CONTAINERS_DELTA=$((AFTER_CONTAINERS - BEFORE_CONTAINERS))
IMAGES_DELTA=$((AFTER_IMAGES - BEFORE_IMAGES))
VOLUMES_DELTA=$((AFTER_VOLUMES - BEFORE_VOLUMES))
NETWORKS_DELTA=$((AFTER_NETWORKS - BEFORE_NETWORKS))
echo "After test:"
echo " Containers: $AFTER_CONTAINERS (delta: $CONTAINERS_DELTA)"
echo " Images: $AFTER_IMAGES (delta: $IMAGES_DELTA)"
echo " Volumes: $AFTER_VOLUMES (delta: $VOLUMES_DELTA)"
echo " Networks: $AFTER_NETWORKS (delta: $NETWORKS_DELTA)"
# Record changes in JSON
if [[ "$JSON_ENABLED" == "true" ]]; then
json_add_metric "containers_created" "$CONTAINERS_DELTA"
json_add_metric "images_created" "$IMAGES_DELTA"
json_add_metric "volumes_created" "$VOLUMES_DELTA"
json_add_metric "networks_created" "$NETWORKS_DELTA"
fi
# ============================================================================
# Validate Cleanup Behavior
# ============================================================================
echo ""
echo "=== Validating Skill Cleanup ==="
# Check for orphaned containers
ORPHANED_CONTAINERS=$(docker ps -a --filter "label=created-by-skill=$SKILL_NAME" --format '{{.ID}}' | wc -l)
if [[ $ORPHANED_CONTAINERS -gt 0 ]]; then
echo "⚠ WARNING: Skill left $ORPHANED_CONTAINERS orphaned container(s)"
if [[ "$JSON_ENABLED" == "true" ]]; then
json_add_issue "warning" "cleanup" "Found $ORPHANED_CONTAINERS orphaned containers"
json_add_recommendation "Cleanup" "Implement automatic container cleanup in skill"
fi
fi
# ============================================================================
# Generate Test Report
# ============================================================================
echo ""
echo "=== Test Report ==="
echo ""
CONTAINER_EXIT_CODE=$(get_container_exit_code "$SKILL_TEST_CONTAINER_ID")
if [[ $CONTAINER_EXIT_CODE -eq 0 ]]; then
echo "✅ TEST PASSED"
else
echo "❌ TEST FAILED"
[[ "$JSON_ENABLED" == "true" ]] && json_add_issue "error" "test-failure" "Container exited with code $CONTAINER_EXIT_CODE"
fi
echo ""
echo "Summary:"
echo " - Exit code: $CONTAINER_EXIT_CODE"
echo " - Build duration: ${BUILD_DURATION}s"
echo " - Execution duration: ${EXEC_DURATION}s"
echo " - Docker resources created: $CONTAINERS_DELTA containers, $IMAGES_DELTA images, $VOLUMES_DELTA volumes, $NETWORKS_DELTA networks"
if [[ "$JSON_ENABLED" == "true" ]]; then
echo ""
echo "JSON reports will be generated at:"
echo " - $TEST_DIR/test-report.json"
echo " - $TEST_DIR/test-report.junit.xml"
echo " - $TEST_DIR/test-report.md"
fi
# Exit with appropriate code (cleanup_and_finalize will handle JSON)
if [[ $CONTAINER_EXIT_CODE -eq 0 ]]; then
exit 0
else
exit 1
fi
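
The EXIT trap above computes the test duration from a start timestamp that may not have been recorded yet if the script exits during pre-flight checks. A minimal sketch of one way to make that calculation safe is shown below. `elapsed_since` is a hypothetical helper name used for illustration only; it is not part of `docker-helpers.sh` or the JSON reporter library.

```shell
# Hypothetical helper (assumption, not from the helper libraries):
# print the seconds elapsed since the epoch timestamp in $1,
# or 0 when no start timestamp was recorded.
elapsed_since() {
    local start="${1:-}"
    local now
    now=$(date +%s)
    if [ -z "$start" ]; then
        echo 0
    else
        echo $((now - start))
    fi
}

# Usage sketch inside an EXIT trap:
#   duration=$(elapsed_since "${START_TIME:-}")
```

Passing `"${START_TIME:-}"` keeps the call safe under `set -u` even when the variable was never assigned.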


@@ -0,0 +1,236 @@
#!/bin/bash
# Test Template for Docker-Based Skills
# Use this template when testing skills that:
# - Start Docker containers
# - Build Docker images
# - Manage Docker volumes/networks
# - Require Docker daemon access
set -euo pipefail
# ============================================================================
# Configuration
# ============================================================================
SKILL_NAME="${1:-example-docker-skill}"
SKILL_PATH="$HOME/.claude/skills/$SKILL_NAME"
TEST_ID="$(date +%s)"
TEST_DIR="/tmp/skill-test-$TEST_ID"
# ============================================================================
# Load Helper Library
# ============================================================================
HELPER_LIB="$HOME/.claude/skills/skill-isolation-tester/lib/docker-helpers.sh"
if [[ ! -f "$HELPER_LIB" ]]; then
echo "ERROR: Helper library not found: $HELPER_LIB"
exit 1
fi
# shellcheck source=/dev/null
source "$HELPER_LIB"
# ============================================================================
# Setup Cleanup Trap
# ============================================================================
export SKILL_TEST_TEMP_DIR="$TEST_DIR"
export SKILL_TEST_KEEP_CONTAINER="false"
export SKILL_TEST_REMOVE_IMAGES="true"
trap cleanup_on_exit EXIT
# ============================================================================
# Pre-flight Checks
# ============================================================================
echo "=== Docker Skill Test: $SKILL_NAME ==="
echo "Test ID: $TEST_ID"
echo ""
# Validate skill exists
if [[ ! -d "$SKILL_PATH" ]]; then
echo "ERROR: Skill not found: $SKILL_PATH"
exit 1
fi
# Validate Docker environment
preflight_check_docker || exit 1
# ============================================================================
# Baseline Measurements (Before)
# ============================================================================
echo ""
echo "=== Taking Baseline Measurements ==="
# Count Docker resources before test
BEFORE_CONTAINERS=$(docker ps -a --format '{{.ID}}' | wc -l)
BEFORE_IMAGES=$(docker images --format '{{.ID}}' | wc -l)
BEFORE_VOLUMES=$(docker volume ls --format '{{.Name}}' | wc -l)
BEFORE_NETWORKS=$(docker network ls --format '{{.ID}}' | wc -l)
echo "Before test:"
echo " Containers: $BEFORE_CONTAINERS"
echo " Images: $BEFORE_IMAGES"
echo " Volumes: $BEFORE_VOLUMES"
echo " Networks: $BEFORE_NETWORKS"
# ============================================================================
# Build Test Environment
# ============================================================================
echo ""
echo "=== Building Test Environment ==="
mkdir -p "$TEST_DIR"
# Create test Dockerfile
cat > "$TEST_DIR/Dockerfile" <<EOF
FROM ubuntu:22.04
# Install dependencies
RUN apt-get update && apt-get install -y \\
curl \\
git \\
nodejs \\
npm \\
docker.io \\
&& rm -rf /var/lib/apt/lists/*
# Install Claude Code (mock for testing)
RUN mkdir -p /root/.claude/skills
# Copy skill under test
COPY skill/ /root/.claude/skills/$SKILL_NAME/
WORKDIR /root
CMD ["/bin/bash"]
EOF
# Copy skill to test directory
cp -r "$SKILL_PATH" "$TEST_DIR/skill/"
# Build test image
safe_docker_build "$TEST_DIR/Dockerfile" "skill-test:$SKILL_NAME" || {
echo "ERROR: Failed to build test image"
exit 1
}
export SKILL_TEST_IMAGE_NAME="skill-test:$SKILL_NAME"
# ============================================================================
# Run Skill in Container
# ============================================================================
echo ""
echo "=== Running Skill in Isolated Container ==="
# Start container with Docker socket access (for Docker-in-Docker skills)
safe_docker_run "skill-test:$SKILL_NAME" \
-v /var/run/docker.sock:/var/run/docker.sock \
bash -c "sleep infinity" || {
echo "ERROR: Failed to start test container"
exit 1
}
# Execute skill (customize this command based on your skill's interface)
echo "Executing skill..."
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
cd /root/.claude/skills/$SKILL_NAME
# Add your skill execution command here
# Example: ./skill.sh test-mode
echo 'Skill execution placeholder - customize this for your skill'
" || {
EXEC_EXIT_CODE=$?
echo "ERROR: Skill execution failed with exit code: $EXEC_EXIT_CODE"
exit "$EXEC_EXIT_CODE"
}
# ============================================================================
# Collect Measurements (After)
# ============================================================================
echo ""
echo "=== Collecting Post-Execution Measurements ==="
# Wait for async operations to complete
sleep 2
AFTER_CONTAINERS=$(docker ps -a --format '{{.ID}}' | wc -l)
AFTER_IMAGES=$(docker images --format '{{.ID}}' | wc -l)
AFTER_VOLUMES=$(docker volume ls --format '{{.Name}}' | wc -l)
AFTER_NETWORKS=$(docker network ls --format '{{.ID}}' | wc -l)
echo "After test:"
echo " Containers: $AFTER_CONTAINERS (delta: $((AFTER_CONTAINERS - BEFORE_CONTAINERS)))"
echo " Images: $AFTER_IMAGES (delta: $((AFTER_IMAGES - BEFORE_IMAGES)))"
echo " Volumes: $AFTER_VOLUMES (delta: $((AFTER_VOLUMES - BEFORE_VOLUMES)))"
echo " Networks: $AFTER_NETWORKS (delta: $((AFTER_NETWORKS - BEFORE_NETWORKS)))"
# ============================================================================
# Validate Cleanup Behavior
# ============================================================================
echo ""
echo "=== Validating Skill Cleanup ==="
# Check for orphaned containers
ORPHANED_CONTAINERS=$(docker ps -a --filter "label=created-by-skill=$SKILL_NAME" --format '{{.ID}}' | wc -l)
if [[ $ORPHANED_CONTAINERS -gt 0 ]]; then
echo "⚠ WARNING: Skill left $ORPHANED_CONTAINERS orphaned container(s)"
docker ps -a --filter "label=created-by-skill=$SKILL_NAME"
fi
# Check for unlabeled containers (potential orphans)
SKILL_CONTAINERS=$(docker ps -a --filter "name=$SKILL_NAME" --format '{{.ID}}' | wc -l)
if [[ $SKILL_CONTAINERS -gt 1 ]]; then
echo "⚠ WARNING: Found $SKILL_CONTAINERS containers with skill name pattern"
fi
# ============================================================================
# Generate Test Report
# ============================================================================
echo ""
echo "=== Test Report ==="
echo ""
CONTAINER_EXIT_CODE=$(get_container_exit_code "$SKILL_TEST_CONTAINER_ID")
if [[ $CONTAINER_EXIT_CODE -eq 0 ]]; then
echo "✅ TEST PASSED"
echo ""
echo "Summary:"
echo " - Skill executed successfully"
echo " - Exit code: 0"
echo " - Container cleanup: Will be handled by trap"
else
echo "❌ TEST FAILED"
echo ""
echo "Summary:"
echo " - Skill execution failed"
echo " - Exit code: $CONTAINER_EXIT_CODE"
echo " - Check logs: docker logs $SKILL_TEST_CONTAINER_ID"
fi
echo ""
echo "Docker Resources Created:"
echo " - Containers: $((AFTER_CONTAINERS - BEFORE_CONTAINERS))"
echo " - Images: $((AFTER_IMAGES - BEFORE_IMAGES))"
echo " - Volumes: $((AFTER_VOLUMES - BEFORE_VOLUMES))"
echo " - Networks: $((AFTER_NETWORKS - BEFORE_NETWORKS))"
echo ""
echo "Cleanup Instructions:"
echo " - Test container will be removed automatically"
echo " - To manually clean up: docker rm -f $SKILL_TEST_CONTAINER_ID"
echo " - To remove test image: docker rmi skill-test:$SKILL_NAME"
# Exit with appropriate code
if [[ $CONTAINER_EXIT_CODE -eq 0 ]]; then
exit 0
else
exit 1
fi
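
The template above compares resource counts before and after the run. A count delta of zero can hide the case where one container was created and another removed, so it may help to compare ID lists instead. The sketch below assumes the before/after IDs were saved one per line to files; `new_ids` is a hypothetical helper name, not a function from `docker-helpers.sh`.

```shell
# Hypothetical helper (assumption): print the lines (resource IDs) present in
# the "after" snapshot file ($2) but absent from the "before" snapshot file ($1).
new_ids() {
    # -F fixed strings, -x whole-line match, -v invert, -f read patterns from file.
    # grep exits 1 when nothing is printed; treat that as success.
    grep -Fxv -f "$1" "$2" || [ "$?" -eq 1 ]
}

# Usage sketch:
#   docker ps -a --format '{{.ID}}' > "$TEST_DIR/before-ids.txt"
#   ... run the skill ...
#   docker ps -a --format '{{.ID}}' > "$TEST_DIR/after-ids.txt"
#   new_ids "$TEST_DIR/before-ids.txt" "$TEST_DIR/after-ids.txt"
```

The same comparison would work for images, volumes, and networks by swapping the `docker` listing command.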


@@ -0,0 +1,360 @@
#!/bin/bash
# Test Template for File-Manipulation Skills
# Use this template when testing skills that:
# - Create, read, update, or delete files
# - Modify configurations or codebases
# - Generate reports or artifacts
# - Work with filesystem operations
set -euo pipefail
# ============================================================================
# Configuration
# ============================================================================
SKILL_NAME="${1:-example-file-skill}"
SKILL_PATH="$HOME/.claude/skills/$SKILL_NAME"
TEST_ID="$(date +%s)"
TEST_DIR="/tmp/skill-test-$TEST_ID"
# ============================================================================
# Load Helper Library
# ============================================================================
HELPER_LIB="$HOME/.claude/skills/skill-isolation-tester/lib/docker-helpers.sh"
if [[ ! -f "$HELPER_LIB" ]]; then
echo "ERROR: Helper library not found: $HELPER_LIB"
exit 1
fi
# shellcheck source=/dev/null
source "$HELPER_LIB"
# ============================================================================
# Setup Cleanup Trap
# ============================================================================
export SKILL_TEST_TEMP_DIR="$TEST_DIR"
export SKILL_TEST_KEEP_CONTAINER="false"
export SKILL_TEST_REMOVE_IMAGES="true"
trap cleanup_on_exit EXIT
# ============================================================================
# Pre-flight Checks
# ============================================================================
echo "=== File Manipulation Skill Test: $SKILL_NAME ==="
echo "Test ID: $TEST_ID"
echo ""
# Validate skill exists
if [[ ! -d "$SKILL_PATH" ]]; then
echo "ERROR: Skill not found: $SKILL_PATH"
exit 1
fi
# Validate Docker environment
preflight_check_docker || exit 1
# ============================================================================
# Build Test Environment with Sample Files
# ============================================================================
echo ""
echo "=== Building Test Environment ==="
mkdir -p "$TEST_DIR/test-workspace"
# Create sample files for the skill to manipulate
cat > "$TEST_DIR/test-workspace/sample.txt" <<'EOF'
This is a sample text file for testing.
Line 2
Line 3
EOF
cat > "$TEST_DIR/test-workspace/config.json" <<'EOF'
{
"setting1": "value1",
"setting2": 42,
"enabled": true
}
EOF
mkdir -p "$TEST_DIR/test-workspace/subdir"
echo "Nested file" > "$TEST_DIR/test-workspace/subdir/nested.txt"
# Create Dockerfile
cat > "$TEST_DIR/Dockerfile" <<EOF
FROM ubuntu:22.04
# Install file manipulation tools
RUN apt-get update && apt-get install -y \\
coreutils \\
jq \\
tree \\
&& rm -rf /var/lib/apt/lists/*
# Create workspace
RUN mkdir -p /workspace
# Copy skill
COPY skill/ /root/.claude/skills/$SKILL_NAME/
# Copy test files
COPY test-workspace/ /workspace/
WORKDIR /workspace
CMD ["/bin/bash"]
EOF
# Copy skill to test directory
cp -r "$SKILL_PATH" "$TEST_DIR/skill/"
# Build test image
safe_docker_build "$TEST_DIR/Dockerfile" "skill-test:$SKILL_NAME" || {
echo "ERROR: Failed to build test image"
exit 1
}
export SKILL_TEST_IMAGE_NAME="skill-test:$SKILL_NAME"
# ============================================================================
# Take "Before" Filesystem Snapshot
# ============================================================================
echo ""
echo "=== Taking Filesystem Snapshot (Before) ==="
# Start container
safe_docker_run "skill-test:$SKILL_NAME" bash -c "sleep infinity" || {
echo "ERROR: Failed to start test container"
exit 1
}
# Get baseline file list
docker exec "$SKILL_TEST_CONTAINER_ID" find /workspace -type f -o -type d | sort > "$TEST_DIR/before-files.txt"
# Get file sizes and checksums
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
cd /workspace
find . -type f -exec md5sum {} \; | sort
" > "$TEST_DIR/before-checksums.txt"
# Count files
BEFORE_FILE_COUNT=$(docker exec "$SKILL_TEST_CONTAINER_ID" find /workspace -type f | wc -l)
BEFORE_DIR_COUNT=$(docker exec "$SKILL_TEST_CONTAINER_ID" find /workspace -type d | wc -l)
echo "Before execution:"
echo " Files: $BEFORE_FILE_COUNT"
echo " Directories: $BEFORE_DIR_COUNT"
# ============================================================================
# Run Skill in Container
# ============================================================================
echo ""
echo "=== Running Skill in Isolated Container ==="
# Execute skill
echo "Executing skill..."
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
cd /root/.claude/skills/$SKILL_NAME
# Add your skill execution command here
# Example: ./file-processor.sh /workspace
echo 'Skill execution placeholder - customize this for your skill'
" || {
EXEC_EXIT_CODE=$?
echo "ERROR: Skill execution failed with exit code: $EXEC_EXIT_CODE"
exit "$EXEC_EXIT_CODE"
}
# ============================================================================
# Take "After" Filesystem Snapshot
# ============================================================================
echo ""
echo "=== Taking Filesystem Snapshot (After) ==="
# Get updated file list
docker exec "$SKILL_TEST_CONTAINER_ID" find /workspace -type f -o -type d | sort > "$TEST_DIR/after-files.txt"
# Get updated checksums
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
cd /workspace
find . -type f -exec md5sum {} \; | sort
" > "$TEST_DIR/after-checksums.txt"
# Count files
AFTER_FILE_COUNT=$(docker exec "$SKILL_TEST_CONTAINER_ID" find /workspace -type f | wc -l)
AFTER_DIR_COUNT=$(docker exec "$SKILL_TEST_CONTAINER_ID" find /workspace -type d | wc -l)
echo "After execution:"
echo " Files: $AFTER_FILE_COUNT"
echo " Directories: $AFTER_DIR_COUNT"
# ============================================================================
# Analyze Filesystem Changes
# ============================================================================
echo ""
echo "=== Analyzing Filesystem Changes ==="
# Files added
echo ""
echo "Files Added:"
comm -13 "$TEST_DIR/before-files.txt" "$TEST_DIR/after-files.txt" > "$TEST_DIR/files-added.txt"
ADDED_COUNT=$(wc -l < "$TEST_DIR/files-added.txt")
echo " Count: $ADDED_COUNT"
if [[ $ADDED_COUNT -gt 0 ]]; then
head -10 "$TEST_DIR/files-added.txt"
if [[ $ADDED_COUNT -gt 10 ]]; then
echo " ... and $((ADDED_COUNT - 10)) more"
fi
fi
# Files removed
echo ""
echo "Files Removed:"
comm -23 "$TEST_DIR/before-files.txt" "$TEST_DIR/after-files.txt" > "$TEST_DIR/files-removed.txt"
REMOVED_COUNT=$(wc -l < "$TEST_DIR/files-removed.txt")
echo " Count: $REMOVED_COUNT"
if [[ $REMOVED_COUNT -gt 0 ]]; then
head -10 "$TEST_DIR/files-removed.txt"
if [[ $REMOVED_COUNT -gt 10 ]]; then
echo " ... and $((REMOVED_COUNT - 10)) more"
fi
fi
# Files modified
echo ""
echo "Files Modified:"
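# The file lists hold absolute paths (/workspace/...), while the checksum
# manifests were generated from inside /workspace and hold relative paths (./...),
# so convert each path before looking up its hash.
comm -12 "$TEST_DIR/before-files.txt" "$TEST_DIR/after-files.txt" | while IFS= read -r file; do
rel_path="./${file#/workspace/}"
# md5sum lines are "<32-char hash>  <path>", so the path starts at column 35
BEFORE_HASH=$(awk -v p="$rel_path" 'substr($0, 35) == p {print $1}' "$TEST_DIR/before-checksums.txt")
AFTER_HASH=$(awk -v p="$rel_path" 'substr($0, 35) == p {print $1}' "$TEST_DIR/after-checksums.txt")
if [[ -n "$BEFORE_HASH" && -n "$AFTER_HASH" && "$BEFORE_HASH" != "$AFTER_HASH" ]]; then
echo " $file"
fi
done | tee "$TEST_DIR/files-modified.txt"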
MODIFIED_COUNT=$(wc -l < "$TEST_DIR/files-modified.txt")
echo " Count: $MODIFIED_COUNT"
# ============================================================================
# Validate File Permissions
# ============================================================================
echo ""
echo "=== Checking File Permissions ==="
# Find files with any execute bit set
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
find /workspace -type f -perm /111 -ls
" > "$TEST_DIR/executable-files.txt" || true
EXECUTABLE_COUNT=$(wc -l < "$TEST_DIR/executable-files.txt")
if [[ $EXECUTABLE_COUNT -gt 0 ]]; then
echo "⚠ WARNING: Found $EXECUTABLE_COUNT executable files"
cat "$TEST_DIR/executable-files.txt"
else
echo "✓ No unexpected executable files"
fi
# Check for world-writable files
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
find /workspace -type f -perm -002 -ls
" > "$TEST_DIR/world-writable-files.txt" || true
WRITABLE_COUNT=$(wc -l < "$TEST_DIR/world-writable-files.txt")
if [[ $WRITABLE_COUNT -gt 0 ]]; then
echo "⚠ WARNING: Found $WRITABLE_COUNT world-writable files (security risk)"
cat "$TEST_DIR/world-writable-files.txt"
else
echo "✓ No world-writable files"
fi
# ============================================================================
# Check for Sensitive Data
# ============================================================================
echo ""
echo "=== Scanning for Sensitive Data ==="
# Check for potential secrets anywhere in the workspace (the scan is not limited to new files)
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
grep -rni 'password\|api[-_]key\|secret\|token' /workspace
" 2>/dev/null | tee "$TEST_DIR/potential-secrets.txt" || true
SECRET_COUNT=$(wc -l < "$TEST_DIR/potential-secrets.txt")
if [[ $SECRET_COUNT -gt 0 ]]; then
echo "⚠ WARNING: Found $SECRET_COUNT lines with potential secrets"
echo " Review: $TEST_DIR/potential-secrets.txt"
else
echo "✓ No obvious secrets detected"
fi
# ============================================================================
# Validate Cleanup Behavior
# ============================================================================
echo ""
echo "=== Validating Cleanup Behavior ==="
# Check for leftover temp files
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
find /tmp -name '*skill*' -o -name '*.tmp' -o -name '*.temp'
" > "$TEST_DIR/temp-files.txt" || true
TEMP_COUNT=$(wc -l < "$TEST_DIR/temp-files.txt")
if [[ $TEMP_COUNT -gt 0 ]]; then
echo "⚠ WARNING: Found $TEMP_COUNT leftover temp files"
cat "$TEST_DIR/temp-files.txt"
else
echo "✓ No leftover temp files"
fi
# ============================================================================
# Generate Test Report
# ============================================================================
echo ""
echo "=== Test Report ==="
echo ""
CONTAINER_EXIT_CODE=$(get_container_exit_code "$SKILL_TEST_CONTAINER_ID")
if [[ $CONTAINER_EXIT_CODE -eq 0 ]]; then
echo "✅ TEST PASSED"
else
echo "❌ TEST FAILED"
fi
echo ""
echo "Filesystem Changes Summary:"
echo " - Files added: $ADDED_COUNT"
echo " - Files removed: $REMOVED_COUNT"
echo " - Files modified: $MODIFIED_COUNT"
echo " - Total file count change: $((AFTER_FILE_COUNT - BEFORE_FILE_COUNT))"
echo ""
echo "Security & Quality Checklist:"
[[ $EXECUTABLE_COUNT -eq 0 ]] && echo " ✓ No unexpected executable files" || echo " ✗ Found executable files"
[[ $WRITABLE_COUNT -eq 0 ]] && echo " ✓ No world-writable files" || echo " ✗ Found world-writable files"
[[ $SECRET_COUNT -eq 0 ]] && echo " ✓ No secrets in files" || echo " ✗ Potential secrets found"
[[ $TEMP_COUNT -eq 0 ]] && echo " ✓ Clean temp directory" || echo " ✗ Leftover temp files"
echo ""
echo "Detailed Reports:"
echo " - Files added: $TEST_DIR/files-added.txt"
echo " - Files removed: $TEST_DIR/files-removed.txt"
echo " - Files modified: $TEST_DIR/files-modified.txt"
echo " - Before snapshot: $TEST_DIR/before-files.txt"
echo " - After snapshot: $TEST_DIR/after-files.txt"
# Exit with appropriate code
if [[ $CONTAINER_EXIT_CODE -eq 0 ]]; then
exit 0
else
exit 1
fi
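
The modified-file check above compares per-file hashes from two `md5sum` manifests. A minimal sketch of doing that comparison in a single pass is shown below, assuming both manifests use the standard `md5sum` output layout (32-character hash, two separator characters, then the path) and the same path style. `changed_files` is a hypothetical helper name introduced for illustration.

```shell
# Hypothetical helper (assumption): given two md5sum manifests, print the paths
# that exist in both but whose hashes differ.
changed_files() {
    awk '
        { hash = $1; path = substr($0, 35) }       # path starts after hash + 2 chars
        NR == FNR { before[path] = hash; next }    # first file: remember hashes
        (path in before) && before[path] != hash { print path }
    ' "$1" "$2"
}

# Usage sketch:
#   changed_files "$TEST_DIR/before-checksums.txt" "$TEST_DIR/after-checksums.txt"
```

Files that appear in only one manifest are ignored here, since added and removed files are already reported separately.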


@@ -0,0 +1,395 @@
#!/bin/bash
# Test Template for Git-Operation Skills
# Use this template when testing skills that:
# - Create commits, branches, or tags
# - Modify git history or configuration
# - Work with git worktrees
# - Interact with remote repositories
set -euo pipefail
# ============================================================================
# Configuration
# ============================================================================
SKILL_NAME="${1:-example-git-skill}"
SKILL_PATH="$HOME/.claude/skills/$SKILL_NAME"
TEST_ID="$(date +%s)"
TEST_DIR="/tmp/skill-test-$TEST_ID"
# ============================================================================
# Load Helper Library
# ============================================================================
HELPER_LIB="$HOME/.claude/skills/skill-isolation-tester/lib/docker-helpers.sh"
if [[ ! -f "$HELPER_LIB" ]]; then
echo "ERROR: Helper library not found: $HELPER_LIB"
exit 1
fi
# shellcheck source=/dev/null
source "$HELPER_LIB"
# ============================================================================
# Setup Cleanup Trap
# ============================================================================
export SKILL_TEST_TEMP_DIR="$TEST_DIR"
export SKILL_TEST_KEEP_CONTAINER="false"
export SKILL_TEST_REMOVE_IMAGES="true"
trap cleanup_on_exit EXIT
# ============================================================================
# Pre-flight Checks
# ============================================================================
echo "=== Git Skill Test: $SKILL_NAME ==="
echo "Test ID: $TEST_ID"
echo ""
# Validate skill exists
if [[ ! -d "$SKILL_PATH" ]]; then
echo "ERROR: Skill not found: $SKILL_PATH"
exit 1
fi
# Validate Docker environment
preflight_check_docker || exit 1
# ============================================================================
# Create Test Git Repository
# ============================================================================
echo ""
echo "=== Creating Test Git Repository ==="
mkdir -p "$TEST_DIR/test-repo"
cd "$TEST_DIR/test-repo"
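# Initialize git repo and pin the initial branch name to "main" so the later
# "git checkout main" works regardless of the host's init.defaultBranch setting
git init
git symbolic-ref HEAD refs/heads/main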
git config user.name "Test User"
git config user.email "test@example.com"
# Create initial commit
echo "# Test Repository" > README.md
echo "Initial content" > file1.txt
git add .
git commit -m "Initial commit"
# Create a branch
git checkout -b feature-branch
echo "Feature content" > feature.txt
git add feature.txt
git commit -m "Add feature"
# Go back to main
git checkout main
# Create a tag
git tag v1.0.0
echo "Test repository created:"
git log --oneline --all --graph
echo ""
git branch -a
echo ""
git tag
# ============================================================================
# Build Test Environment
# ============================================================================
echo ""
echo "=== Building Test Environment ==="
cd "$TEST_DIR"
# Create Dockerfile
cat > "$TEST_DIR/Dockerfile" <<EOF
FROM ubuntu:22.04
# Install git
RUN apt-get update && apt-get install -y \\
git \\
&& rm -rf /var/lib/apt/lists/*
# Configure git
RUN git config --global user.name "Test User" && \\
git config --global user.email "test@example.com"
# Copy skill
COPY skill/ /root/.claude/skills/$SKILL_NAME/
# Copy test repository
COPY test-repo/ /workspace/
WORKDIR /workspace
CMD ["/bin/bash"]
EOF
# Copy skill
cp -r "$SKILL_PATH" "$TEST_DIR/skill/"
# Build test image
safe_docker_build "$TEST_DIR/Dockerfile" "skill-test:$SKILL_NAME" || {
echo "ERROR: Failed to build test image"
exit 1
}
export SKILL_TEST_IMAGE_NAME="skill-test:$SKILL_NAME"
# ============================================================================
# Take "Before" Git Snapshot
# ============================================================================
echo ""
echo "=== Taking Git Snapshot (Before) ==="
# Start container
safe_docker_run "skill-test:$SKILL_NAME" bash -c "sleep infinity" || {
echo "ERROR: Failed to start test container"
exit 1
}
# Capture git state before
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
cd /workspace
git log --all --oneline --graph > /tmp/before-log.txt
git branch -a > /tmp/before-branches.txt
git tag > /tmp/before-tags.txt
git status > /tmp/before-status.txt
git config --list > /tmp/before-config.txt
" || true
# Copy snapshots out
docker cp "$SKILL_TEST_CONTAINER_ID:/tmp/before-log.txt" "$TEST_DIR/"
docker cp "$SKILL_TEST_CONTAINER_ID:/tmp/before-branches.txt" "$TEST_DIR/"
docker cp "$SKILL_TEST_CONTAINER_ID:/tmp/before-tags.txt" "$TEST_DIR/"
docker cp "$SKILL_TEST_CONTAINER_ID:/tmp/before-status.txt" "$TEST_DIR/"
docker cp "$SKILL_TEST_CONTAINER_ID:/tmp/before-config.txt" "$TEST_DIR/"
BEFORE_COMMIT_COUNT=$(docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "cd /workspace && git rev-list --all --count")
BEFORE_BRANCH_COUNT=$(wc -l < "$TEST_DIR/before-branches.txt")
BEFORE_TAG_COUNT=$(wc -l < "$TEST_DIR/before-tags.txt")
echo "Before execution:"
echo " Commits: $BEFORE_COMMIT_COUNT"
echo " Branches: $BEFORE_BRANCH_COUNT"
echo " Tags: $BEFORE_TAG_COUNT"
# ============================================================================
# Run Skill in Container
# ============================================================================
echo ""
echo "=== Running Skill in Isolated Container ==="
# Execute skill
echo "Executing skill..."
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
cd /root/.claude/skills/$SKILL_NAME
# Add your skill execution command here
# Example: ./git-skill.sh /workspace
echo 'Skill execution placeholder - customize this for your skill'
" || {
EXEC_EXIT_CODE=$?
echo "ERROR: Skill execution failed with exit code: $EXEC_EXIT_CODE"
exit "$EXEC_EXIT_CODE"
}
# ============================================================================
# Take "After" Git Snapshot
# ============================================================================
echo ""
echo "=== Taking Git Snapshot (After) ==="
# Capture git state after
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
cd /workspace
git log --all --oneline --graph > /tmp/after-log.txt
git branch -a > /tmp/after-branches.txt
git tag > /tmp/after-tags.txt
git status > /tmp/after-status.txt
git config --list > /tmp/after-config.txt
" || true
# Copy snapshots out
docker cp "$SKILL_TEST_CONTAINER_ID:/tmp/after-log.txt" "$TEST_DIR/"
docker cp "$SKILL_TEST_CONTAINER_ID:/tmp/after-branches.txt" "$TEST_DIR/"
docker cp "$SKILL_TEST_CONTAINER_ID:/tmp/after-tags.txt" "$TEST_DIR/"
docker cp "$SKILL_TEST_CONTAINER_ID:/tmp/after-status.txt" "$TEST_DIR/"
docker cp "$SKILL_TEST_CONTAINER_ID:/tmp/after-config.txt" "$TEST_DIR/"
AFTER_COMMIT_COUNT=$(docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "cd /workspace && git rev-list --all --count")
AFTER_BRANCH_COUNT=$(wc -l < "$TEST_DIR/after-branches.txt")
AFTER_TAG_COUNT=$(wc -l < "$TEST_DIR/after-tags.txt")
echo "After execution:"
echo " Commits: $AFTER_COMMIT_COUNT"
echo " Branches: $AFTER_BRANCH_COUNT"
echo " Tags: $AFTER_TAG_COUNT"
# ============================================================================
# Analyze Git Changes
# ============================================================================
echo ""
echo "=== Analyzing Git Changes ==="
# New commits
COMMIT_DIFF=$((AFTER_COMMIT_COUNT - BEFORE_COMMIT_COUNT))
if [[ $COMMIT_DIFF -gt 0 ]]; then
echo "✓ Added $COMMIT_DIFF new commit(s)"
echo ""
echo "New commits:"
docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "
cd /workspace
git log --all --oneline -n $COMMIT_DIFF
"
else
echo "No new commits created"
fi
# New branches
echo ""
echo "Branch Changes:"
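# Strip the branch marker column and sort, so comm receives sorted input and a
# branch switch is not reported as an added or removed branch
comm -13 <(sed 's/^[*+ ] //' "$TEST_DIR/before-branches.txt" | sort) <(sed 's/^[*+ ] //' "$TEST_DIR/after-branches.txt" | sort) > "$TEST_DIR/branches-added.txt"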
BRANCH_ADDED=$(wc -l < "$TEST_DIR/branches-added.txt")
if [[ $BRANCH_ADDED -gt 0 ]]; then
echo " Added $BRANCH_ADDED branch(es):"
cat "$TEST_DIR/branches-added.txt"
fi
comm -23 "$TEST_DIR/before-branches.txt" "$TEST_DIR/after-branches.txt" > "$TEST_DIR/branches-removed.txt"
BRANCH_REMOVED=$(wc -l < "$TEST_DIR/branches-removed.txt")
if [[ $BRANCH_REMOVED -gt 0 ]]; then
echo " Removed $BRANCH_REMOVED branch(es):"
cat "$TEST_DIR/branches-removed.txt"
fi
if [[ $BRANCH_ADDED -eq 0 && $BRANCH_REMOVED -eq 0 ]]; then
echo " No branch changes"
fi
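# Tag changes (inputs are sorted because comm(1) requires sorted input)
echo ""
echo "Tag Changes:"
comm -13 <(sort "$TEST_DIR/before-tags.txt") <(sort "$TEST_DIR/after-tags.txt") > "$TEST_DIR/tags-added.txt"
TAG_ADDED=$(wc -l < "$TEST_DIR/tags-added.txt")
if [[ $TAG_ADDED -gt 0 ]]; then
    echo "  Added $TAG_ADDED tag(s):"
    cat "$TEST_DIR/tags-added.txt"
fi
comm -23 <(sort "$TEST_DIR/before-tags.txt") <(sort "$TEST_DIR/after-tags.txt") > "$TEST_DIR/tags-removed.txt"
TAG_REMOVED=$(wc -l < "$TEST_DIR/tags-removed.txt")
if [[ $TAG_REMOVED -gt 0 ]]; then
    echo "  Removed $TAG_REMOVED tag(s):"
    cat "$TEST_DIR/tags-removed.txt"
fi
if [[ $TAG_ADDED -eq 0 && $TAG_REMOVED -eq 0 ]]; then
    echo "  No tag changes"
fi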
# Config changes
echo ""
echo "Git Config Changes:"
diff "$TEST_DIR/before-config.txt" "$TEST_DIR/after-config.txt" > "$TEST_DIR/config-diff.txt" || true
if [[ -s "$TEST_DIR/config-diff.txt" ]]; then
    echo "  Configuration was modified:"
    cat "$TEST_DIR/config-diff.txt"
else
    echo "  No configuration changes"
fi
# ============================================================================
# Check Working Tree Status
# ============================================================================
echo ""
echo "=== Checking Working Tree Status ==="
UNCOMMITTED_CHANGES=$(docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "cd /workspace && git status --porcelain" || echo "")
if [[ -n "$UNCOMMITTED_CHANGES" ]]; then
    echo "⚠ WARNING: Uncommitted changes detected:"
    echo "$UNCOMMITTED_CHANGES"
    echo ""
    echo "Skills should clean up the working tree after execution!"
else
    echo "✓ Working tree is clean"
fi
# ============================================================================
# Validate Git Safety
# ============================================================================
echo ""
echo "=== Git Safety Checks ==="
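# Check for force operations in logs.
# Match --force / -f only as standalone flags (plus git's "forced update"
# notice); a bare "force" or "-f" substring would also match unrelated text
# such as "enforce" or "-fPIC".
# Note: docker logs captures only the container's main process output; output
# of commands run through 'docker exec' does not appear here.
docker logs "$SKILL_TEST_CONTAINER_ID" 2>&1 \
    | grep -E -- '(^|[[:space:]])(--force(-with-lease|-if-includes)?|-f)([[:space:]=]|$)|forced update' \
    > "$TEST_DIR/force-operations.txt" || true
FORCE_OPS=$(wc -l < "$TEST_DIR/force-operations.txt")
if [[ $FORCE_OPS -gt 0 ]]; then
    echo "⚠ WARNING: Detected $FORCE_OPS log line(s) with force operations"
    cat "$TEST_DIR/force-operations.txt"
else
    echo "✓ No force operations detected"
fi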
# Check for history rewriting (extended regex avoids the GNU-only '\|' syntax)
docker logs "$SKILL_TEST_CONTAINER_ID" 2>&1 | grep -Ei 'rebase|reset --hard|filter-branch' > "$TEST_DIR/history-rewrites.txt" || true
REWRITES=$(wc -l < "$TEST_DIR/history-rewrites.txt")
if [[ $REWRITES -gt 0 ]]; then
    echo "⚠ WARNING: Detected $REWRITES log line(s) with history rewrite operations"
    cat "$TEST_DIR/history-rewrites.txt"
else
    echo "✓ No history rewriting detected"
fi
# Check for dangling commits.
# Plain 'git fsck' already reports dangling objects; --lost-found would also
# write them into .git/lost-found/, modifying the repository under test.
DANGLING_COMMITS=$(docker exec "$SKILL_TEST_CONTAINER_ID" bash -c "cd /workspace && git fsck 2>&1 | grep 'dangling commit'" || echo "")
if [[ -n "$DANGLING_COMMITS" ]]; then
    echo "⚠ WARNING: Dangling commits found (potential data loss)"
    echo "$DANGLING_COMMITS"
else
    echo "✓ No dangling commits"
fi
# ============================================================================
# Generate Test Report
# ============================================================================
echo ""
echo "=== Test Report ==="
echo ""
CONTAINER_EXIT_CODE=$(get_container_exit_code "$SKILL_TEST_CONTAINER_ID")
if [[ $CONTAINER_EXIT_CODE -eq 0 ]]; then
    echo "✅ TEST PASSED"
else
    echo "❌ TEST FAILED"
fi
echo ""
echo "Git Changes Summary:"
echo " - Commits added: $COMMIT_DIFF"
echo " - Branches added: $BRANCH_ADDED"
echo " - Branches removed: $BRANCH_REMOVED"
echo " - Tags added: $TAG_ADDED"
echo ""
echo "Safety Checklist:"
[[ -z "$UNCOMMITTED_CHANGES" ]] && echo " ✓ Clean working tree" || echo " ✗ Uncommitted changes"
[[ $FORCE_OPS -eq 0 ]] && echo " ✓ No force operations" || echo " ✗ Force operations detected"
[[ $REWRITES -eq 0 ]] && echo " ✓ No history rewriting" || echo " ✗ History rewriting detected"
[[ -z "$DANGLING_COMMITS" ]] && echo " ✓ No dangling commits" || echo " ✗ Dangling commits found"
echo ""
echo "Detailed Snapshots:"
echo " - Before log: $TEST_DIR/before-log.txt"
echo " - After log: $TEST_DIR/after-log.txt"
echo " - Branch changes: $TEST_DIR/branches-added.txt"
echo " - Config diff: $TEST_DIR/config-diff.txt"
# Exit with appropriate code
if [[ $CONTAINER_EXIT_CODE -eq 0 ]]; then
    exit 0
else
    exit 1
fi