Initial commit

This commit is contained in:
Zhongwei Li
2025-11-29 18:47:40 +08:00
commit 14c678ceac
22 changed files with 7501 additions and 0 deletions


@@ -0,0 +1,12 @@
{
"name": "tailscale-sshsync-agent",
"description": "Manages distributed workloads and file sharing across Tailscale SSH-connected machines. Automates remote command execution, intelligent load balancing, file synchronization workflows, host health monitoring, and multi-machine orchestration using sshsync.",
"version": "0.0.0-2025.11.28",
"author": {
"name": "William VanSickle III",
"email": "noreply@humanfrontierlabs.com"
},
"skills": [
"./"
]
}

163
CHANGELOG.md Normal file

@@ -0,0 +1,163 @@
# Changelog
All notable changes to Tailscale SSH Sync Agent will be documented here.
Format based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
Versioning follows [Semantic Versioning](https://semver.org/).
## [1.0.0] - 2025-10-19
### Added
**Core Functionality:**
- `sshsync_wrapper.py`: Python interface to sshsync CLI operations
- `get_host_status()`: Check online/offline status of hosts
- `execute_on_all()`: Run commands on all configured hosts
- `execute_on_group()`: Run commands on specific groups
- `execute_on_host()`: Run commands on single host
- `push_to_hosts()`: Push files to multiple hosts (with group support)
- `pull_from_host()`: Pull files from hosts
- `list_hosts()`: List all configured hosts
- `get_groups()`: Get group configuration
- `tailscale_manager.py`: Tailscale-specific operations
- `get_tailscale_status()`: Get complete network status
- `check_connectivity()`: Ping hosts via Tailscale
- `get_peer_info()`: Get detailed peer information
- `list_online_machines()`: List all online Tailscale machines
- `validate_tailscale_ssh()`: Check if Tailscale SSH works for a host
- `get_network_summary()`: Human-readable network summary
- `load_balancer.py`: Intelligent task distribution
- `get_machine_load()`: Get CPU, memory, disk metrics for a machine
- `select_optimal_host()`: Pick best host based on current load
- `get_group_capacity()`: Get aggregate capacity of a group
- `distribute_tasks()`: Distribute multiple tasks optimally across hosts
- `format_load_report()`: Format load metrics as human-readable report
- `workflow_executor.py`: Common multi-machine workflows
- `deploy_workflow()`: Full deployment pipeline (staging → test → production)
- `backup_workflow()`: Backup files from multiple hosts
- `sync_workflow()`: Sync files from one host to many
- `rolling_restart()`: Zero-downtime service restart across group
- `health_check_workflow()`: Check health endpoints across group
**Utilities:**
- `utils/helpers.py`: Common formatting and parsing functions
- Byte formatting (`format_bytes`)
- Duration formatting (`format_duration`)
- Percentage formatting (`format_percentage`)
- SSH config parsing (`parse_ssh_config`)
- sshsync config parsing (`parse_sshsync_config`)
- System metrics parsing (`parse_disk_usage`, `parse_memory_usage`, `parse_cpu_load`)
- Load score calculation (`calculate_load_score`)
- Status classification (`classify_load_status`, `classify_latency`)
- Safe command execution (`run_command`, `safe_execute`)
- `utils/validators/`: Comprehensive validation system
- `parameter_validator.py`: Input validation (hosts, groups, paths, timeouts, commands)
- `host_validator.py`: Host configuration and availability validation
- `connection_validator.py`: SSH and Tailscale connection validation
**Testing:**
- `tests/test_integration.py`: 11 end-to-end integration tests
- `tests/test_helpers.py`: 11 helper function tests
- `tests/test_validation.py`: 7 validation tests
- **Total: 29 tests** covering all major functionality
**Documentation:**
- `SKILL.md`: Complete skill documentation (6,000+ words)
- When to use this skill
- How it works
- Data sources (sshsync CLI, Tailscale)
- Detailed workflows for each operation type
- Available scripts and functions
- Error handling and validations
- Performance and caching strategies
- Usage examples
- `references/sshsync-guide.md`: Complete sshsync CLI reference
- `references/tailscale-integration.md`: Tailscale integration guide
- `README.md`: Installation and quick start guide
- `INSTALLATION.md`: Detailed setup tutorial
- `DECISIONS.md`: Architecture decisions and rationale
### Data Sources
**sshsync CLI:**
- Installation: `pip install sshsync`
- Configuration: `~/.config/sshsync/config.yaml`
- SSH config integration: `~/.ssh/config`
- Group-based host management
- Remote command execution with timeouts
- File push/pull operations (single or recursive)
- Status checking and connectivity validation
**Tailscale:**
- Zero-config VPN with WireGuard encryption
- MagicDNS for easy host addressing
- Built-in SSH capabilities
- Seamless integration with standard SSH
- Peer-to-peer connections
- Works across NATs and firewalls
### Coverage
**Operations:**
- Host status monitoring and availability checks
- Intelligent load-based task distribution
- Multi-host command execution (all hosts, groups, individual)
- File synchronization workflows (push/pull)
- Deployment pipelines (staging → production)
- Backup and sync workflows
- Rolling restarts with zero downtime
- Health checking across services
**Geographic Coverage:** All hosts in Tailscale network (global)
**Temporal Coverage:** Real-time status and operations
### Known Limitations
**v1.0.0:**
- sshsync must be installed separately (`pip install sshsync`)
- Tailscale must be configured separately
- SSH keys must be set up manually on each host
- Load balancing uses simple metrics (CPU, memory, disk)
- No built-in monitoring dashboards (terminal output only)
- No persistence of operation history (logs only)
- Requires SSH config and sshsync config to be manually maintained
### Planned for v2.0
**Enhanced Features:**
- Automated SSH key distribution across hosts
- Built-in operation history and logging database
- Web dashboard for monitoring and operations
- Advanced load balancing with custom metrics
- Scheduled operations and cron integration
- Operation rollback capabilities
- Integration with configuration management tools (Ansible, Terraform)
- Cost tracking for cloud resources
- Performance metrics collection and visualization
- Alert system for failed operations
- Multi-tenancy support for team environments
**Integrations:**
- Prometheus metrics export
- Grafana dashboard templates
- Slack/Discord notifications
- CI/CD pipeline integration
- Container orchestration support (Docker, Kubernetes)
## [Unreleased]
### Planned
- Add support for Windows hosts (PowerShell remoting)
- Improve performance for large host groups (100+)
- Add SSH connection pooling for faster operations
- Implement operation queueing for long-running tasks
- Add support for custom validation plugins
- Expand coverage to Docker containers via SSH
- Add retry strategies with exponential backoff
- Implement circuit breaker pattern for failing hosts

458
DECISIONS.md Normal file

@@ -0,0 +1,458 @@
# Architecture Decisions
Documentation of all technical decisions made for Tailscale SSH Sync Agent.
## Tool Selection
### Selected Tool: sshsync
**Justification:**
**Advantages:**
- **Ready-to-use**: Available via `pip install sshsync`
- **Group management**: Built-in support for organizing hosts into groups
- **Integration**: Works with existing SSH config (`~/.ssh/config`)
- **Simple API**: Easy-to-wrap CLI interface
- **Parallel execution**: Commands run concurrently across hosts
- **File operations**: Push/pull with recursive support
- **Timeout handling**: Per-command timeouts for reliability
- **Active maintenance**: Regular updates and bug fixes
- **Python-based**: Easy to extend and integrate
**Coverage:**
- All SSH-accessible hosts
- Works with any SSH server (Linux, macOS, BSD, etc.)
- Platform-agnostic (runs on any OS with Python)
**Cost:**
- Free and open-source
- No API keys or subscriptions required
- No rate limits
**Documentation:**
- Clear command-line interface
- PyPI documentation available
- GitHub repository with examples
**Alternatives Considered:**
**Fabric (Python library)**
- Pros: Pure Python, very flexible
- Cons: Requires writing more code, no built-in group management
- **Rejected because**: sshsync provides ready-made functionality
**Ansible**
- Pros: Industry standard, very powerful
- Cons: Requires learning YAML playbooks, overkill for simple operations
- **Rejected because**: Too heavyweight for ad-hoc commands and file transfers
**pssh (parallel-ssh)**
- Pros: Simple parallel SSH
- Cons: No group management, no file transfer built-in, less actively maintained
- **Rejected because**: sshsync has better group management and file operations
**Custom SSH wrapper**
- Pros: Full control
- Cons: Reinventing the wheel, maintaining parallel execution logic
- **Rejected because**: sshsync already provides what we need
**Conclusion:**
sshsync is the best tool for this use case because it:
1. Provides group-based host management out of the box
2. Handles parallel execution automatically
3. Integrates with existing SSH configuration
4. Supports both command execution and file transfers
5. Requires minimal wrapper code
## Integration: Tailscale
**Decision**: Integrate with Tailscale for network connectivity
**Justification:**
**Why Tailscale:**
- **Zero-config VPN**: No manual firewall/NAT configuration
- **Secure by default**: WireGuard encryption
- **Works everywhere**: Coffee shop, home, office, cloud
- **MagicDNS**: Easy addressing (machine-name.tailnet.ts.net)
- **Standard SSH**: Works with all SSH tools including sshsync
- **No overhead**: Uses regular SSH protocol over Tailscale network
**Integration approach:**
- Tailscale provides the network layer
- Standard SSH works over Tailscale
- sshsync operates normally using Tailscale hostnames/IPs
- No Tailscale-specific code needed in core operations
- Tailscale status checking for diagnostics
**Alternatives:**
**Direct public internet + port forwarding**
- Cons: Complex firewall setup, security risks, doesn't work on mobile/restricted networks
- **Rejected because**: Requires too much configuration and has security concerns
**Other VPNs (WireGuard, OpenVPN, ZeroTier)**
- Cons: More manual configuration, less zero-config
- **Rejected because**: Tailscale is easier to set up and use
**Conclusion:**
Tailscale + standard SSH is the optimal combination:
- Secure connectivity without configuration
- Works with existing SSH tools
- No vendor lock-in (can use other VPNs if needed)
## Architecture
### Structure: Modular Scripts + Utilities
**Decision**: Separate concerns into focused modules
```
scripts/
├── sshsync_wrapper.py # sshsync CLI interface
├── tailscale_manager.py # Tailscale operations
├── load_balancer.py # Task distribution logic
├── workflow_executor.py # Common workflows
└── utils/
├── helpers.py # Formatting, parsing
└── validators/ # Input validation
```
**Justification:**
**Modularity:**
- Each script has single responsibility
- Easy to test independently
- Easy to extend without breaking others
**Reusability:**
- Helpers used across all scripts
- Validators prevent duplicate validation logic
- Workflows compose lower-level operations
**Maintainability:**
- Clear file organization
- Easy to locate specific functionality
- Separation of concerns
**Alternatives:**
**Monolithic single script**
- Cons: Hard to test, hard to maintain, becomes too large
- **Rejected because**: Doesn't scale well
**Over-engineered class hierarchy**
- Cons: Unnecessary complexity for this use case
- **Rejected because**: Simple functions are sufficient
**Conclusion:**
Modular functional approach provides good balance of simplicity and maintainability.
### Validation Strategy: Multi-Layer
**Decision**: Validate at multiple layers
**Layers:**
1. **Parameter validation** (`parameter_validator.py`)
- Validates user inputs before any operations
- Prevents invalid hosts, groups, paths, etc.
2. **Host validation** (`host_validator.py`)
- Validates SSH configuration exists
- Checks host reachability
- Validates group membership
3. **Connection validation** (`connection_validator.py`)
- Tests actual SSH connectivity
- Verifies Tailscale status
- Checks SSH key authentication
**Justification:**
**Early failure:**
- Catch errors before expensive operations
- Clear error messages at each layer
**Comprehensive:**
- Multiple validation points catch different issues
- Reduces runtime failures
**User-friendly:**
- Helpful error messages with suggestions
- Clear indication of what went wrong
**Conclusion:**
Multi-layer validation provides robust error handling and great user experience.
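A minimal sketch of how the three layers chain, assuming illustrative function names and signatures (the real implementations live in `utils/validators/` and may differ):
```python
# Sketch of the layered pattern; names and signatures are assumptions,
# not the actual utils/validators API.
class ValidationError(Exception):
    """Raised when any validation layer rejects the input."""

def validate_parameters(host: str, command: str) -> None:
    # Layer 1: cheap input checks before touching the network
    if not host or " " in host:
        raise ValidationError(f"Invalid host alias: {host!r}")
    if not command.strip():
        raise ValidationError("Command must not be empty")

def validate_host_config(host: str, known_hosts: set) -> None:
    # Layer 2: host must exist in the parsed SSH config
    if host not in known_hosts:
        raise ValidationError(f"Host {host!r} not found in ~/.ssh/config")

def validate_connection(host: str) -> None:
    # Layer 3: would test real SSH/Tailscale connectivity; stubbed here
    pass

def run_validated(host: str, command: str, known_hosts: set) -> str:
    validate_parameters(host, command)       # fail fast on bad input
    validate_host_config(host, known_hosts)  # then config-level checks
    validate_connection(host)                # then the expensive check
    return f"would execute {command!r} on {host}"
```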
## Load Balancing Strategy
### Decision: Simple Composite Score
**Formula:**
```python
score = (cpu_pct * 0.4) + (mem_pct * 0.3) + (disk_pct * 0.3)
```
**Weights:**
- CPU: 40% (most important for compute tasks)
- Memory: 30% (important for data processing)
- Disk: 30% (important for I/O operations)
**Justification:**
**Simple and effective:**
- Easy to understand
- Fast to calculate
- Works well for most workloads
**Balanced:**
- Considers multiple resource types
- No single metric dominates
**Alternatives:**
**CPU only**
- Cons: Ignores memory-bound and I/O-bound tasks
- **Rejected because**: Too narrow
**Complex ML-based prediction**
- Cons: Overkill, slow, requires training data
- **Rejected because**: Unnecessary complexity
**Fixed round-robin**
- Cons: Doesn't consider actual load
- **Rejected because**: Can overload already-busy hosts
**Conclusion:**
Simple weighted score provides good balance without complexity.
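As a concrete check: a host at 80% CPU, 50% memory, and 30% disk scores 0.8×0.4 + 0.5×0.3 + 0.3×0.3 = 0.56. A minimal sketch consistent with the formula (the real `calculate_load_score` lives in `utils/helpers.py` and may differ in detail; the threshold labels mirror the capacity bands used elsewhere in this document):
```python
def calculate_load_score(cpu_pct: float, mem_pct: float, disk_pct: float) -> float:
    """Composite load score in [0, 1]; lower means more spare capacity."""
    return (cpu_pct * 0.4 + mem_pct * 0.3 + disk_pct * 0.3) / 100.0

def classify_load_status(score: float) -> str:
    # Same cutoffs as the capacity descriptions in get_group_capacity
    if score < 0.4:
        return "low"
    if score < 0.7:
        return "moderate"
    return "high"

score = calculate_load_score(cpu_pct=80, mem_pct=50, disk_pct=30)
print(round(score, 2), classify_load_status(score))  # 0.56 moderate
```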
## Error Handling Philosophy
### Decision: Graceful Degradation + Clear Messages
**Principles:**
1. **Fail early with validation**: Catch errors before operations
2. **Isolate failures**: One host failure doesn't stop others
3. **Clear messages**: Tell user exactly what went wrong and how to fix
4. **Automatic retry**: Retry transient errors (network, timeout)
5. **Dry-run support**: Preview operations before execution
**Implementation:**
```python
# Example error handling pattern
try:
validate_host(host)
validate_ssh_connection(host)
result = execute_command(host, command)
except ValidationError as e:
return {'error': str(e), 'suggestion': 'Fix: ...'}
except ConnectionError as e:
return {'error': str(e), 'diagnostics': get_diagnostics(host)}
```
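Principle 4 (automatic retry) layers naturally on top of this pattern; a minimal sketch with illustrative parameters, retrying only errors that are plausibly transient:
```python
import time

def with_retry(fn, attempts: int = 3, base_delay: float = 1.0):
    """Retry fn on ConnectionError with exponential backoff (1s, 2s, 4s...)."""
    for attempt in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of retries; surface the error to the caller
            time.sleep(base_delay * (2 ** attempt))
```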
**Justification:**
**Better UX:**
- Users know exactly what's wrong
- Suggestions help fix issues quickly
**Reliability:**
- Automatic retry handles transient issues
- Dry-run prevents mistakes
**Debugging:**
- Clear error messages speed up troubleshooting
- Diagnostics provide actionable information
**Conclusion:**
Graceful degradation with helpful messages creates better user experience.
## Caching Strategy
**Decision**: Minimal caching for real-time accuracy
**What we cache:**
- Nothing (v1.0.0)
**Why no caching:**
- Host status changes frequently
- Load metrics change constantly
- Operations need real-time data
- Cache invalidation is complex
**Future consideration (v2.0):**
- Cache Tailscale status (60s TTL)
- Cache group configuration (5min TTL)
- Cache SSH config parsing (5min TTL)
**Justification:**
**Simplicity:**
- No cache invalidation logic needed
- No stale data issues
**Accuracy:**
- Always get current state
- No surprises from cached data
**Trade-off:**
- Slightly slower repeated operations
- More network calls
**Conclusion:**
For v1.0.0, simplicity and accuracy outweigh performance concerns. Real-time data is more valuable than speed.
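If the v2.0 caching is added, a small per-key TTL wrapper would be enough for the items listed above; a sketch with illustrative names:
```python
import time
from typing import Any, Callable

class TTLCache:
    """Tiny per-key TTL cache; enough for status and config lookups."""
    def __init__(self) -> None:
        self._store: dict = {}

    def get_or_fetch(self, key: str, fetch: Callable[[], Any], ttl: float) -> Any:
        now = time.monotonic()
        hit = self._store.get(key)
        if hit and now - hit[0] < ttl:
            return hit[1]               # fresh enough: serve cached value
        value = fetch()                 # stale or missing: refetch
        self._store[key] = (now, value)
        return value

cache = TTLCache()
# Tailscale status cached for 60s, group config for 5 minutes:
# status = cache.get_or_fetch("ts-status", get_tailscale_status, ttl=60)
# groups = cache.get_or_fetch("groups", get_groups, ttl=300)
```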
## Testing Strategy
### Decision: Comprehensive Unit + Integration Tests
**Coverage:**
- **29 tests total:**
- 11 integration tests (end-to-end workflows)
- 11 helper tests (formatting, parsing, calculations)
- 7 validation tests (input validation, safety checks)
**Test Philosophy:**
1. **Test real functionality**: Integration tests use actual functions
2. **Test edge cases**: Validation tests cover error conditions
3. **Test helpers**: Ensure formatting/parsing works correctly
4. **Fast execution**: All tests run in < 10 seconds
5. **No external dependencies**: Tests don't require Tailscale or sshsync to be running
**Justification:**
**Confidence:**
- Tests verify code works as expected
- Catches regressions when modifying code
**Documentation:**
- Tests show how to use functions
- Examples of expected behavior
**Reliability:**
- Production-ready code from v1.0.0
**Conclusion:**
Comprehensive testing ensures reliable code from the start.
## Performance Considerations
### Parallel Execution
**Decision**: Leverage sshsync's built-in parallelization
- sshsync runs commands concurrently across hosts automatically
- No need to implement custom threading/multiprocessing
- Timeout applies per-host independently
**Trade-offs:**
**Pros:**
- Simple to use
- Fast for large host groups
- No concurrency bugs
**Cons:**
- Less control over parallelism level
- Can overwhelm network with too many concurrent connections
**Conclusion:**
Built-in parallelization is sufficient for most use cases. Custom control can be added in v2.0 if needed.
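For intuition, the behavior described above (concurrent fan-out with independent per-host timeouts) is roughly what a thread pool over plain `ssh` gives you. A minimal sketch, not sshsync's actual internals:
```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_on_host(host: str, command: str, timeout: int = 10) -> dict:
    """Run a command over ssh; the timeout applies to this host only."""
    try:
        proc = subprocess.run(
            ["ssh", host, command],
            capture_output=True, text=True, timeout=timeout,
        )
        return {"host": host, "ok": proc.returncode == 0, "out": proc.stdout}
    except subprocess.TimeoutExpired:
        return {"host": host, "ok": False, "out": "timed out"}

def run_on_all(hosts: list, command: str) -> list:
    # One slow or dead host does not block the others
    with ThreadPoolExecutor(max_workers=len(hosts) or 1) as pool:
        return list(pool.map(lambda h: run_on_host(h, command), hosts))
```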
## Security Considerations
### SSH Key Authentication
**Decision**: Require SSH keys (no password auth)
**Justification:**
**Security:**
- Keys are more secure than passwords
- Can't be brute-forced
- Can be revoked per-host
**Automation:**
- Non-interactive (no password prompts)
- Works in scripts and CI/CD
**Implementation:**
- Validators check SSH key auth works
- Clear error messages guide users to set up keys
- Documentation explains SSH key setup
### Command Safety
**Decision**: Validate dangerous commands
**Dangerous patterns blocked:**
- `rm -rf /` (root deletion)
- `mkfs.*` (filesystem formatting)
- `dd.*of=/dev/` (direct disk writes)
- Fork bombs
**Override**: Use `allow_dangerous=True` to bypass
**Justification:**
**Safety:**
- Prevents accidental destructive operations
- Dry-run provides preview
**Flexibility:**
- Can still run dangerous commands if explicitly allowed
**Conclusion:**
Safety by default with escape hatch for advanced users.
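A minimal sketch of the safety check with its escape hatch; the patterns mirror the list above, and the function name is an assumption:
```python
import re

DANGEROUS_PATTERNS = [
    r"rm\s+-rf\s+/(\s|$)",   # root deletion
    r"\bmkfs\.",             # filesystem formatting
    r"\bdd\b.*\bof=/dev/",   # direct disk writes
    r":\(\)\s*\{.*\};\s*:",  # classic fork bomb
]

def check_command_safety(command: str, allow_dangerous: bool = False) -> None:
    if allow_dangerous:
        return  # explicit override for advanced users
    for pattern in DANGEROUS_PATTERNS:
        if re.search(pattern, command):
            raise ValueError(f"Blocked dangerous command (matched {pattern!r})")

check_command_safety("df -h")        # passes silently
# check_command_safety("rm -rf /")   # raises ValueError
```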
## Decisions Summary
| Decision | Choice | Rationale |
|----------|--------|-----------|
| **CLI Tool** | sshsync | Best balance of features, ease of use, and maintenance |
| **Network** | Tailscale | Zero-config secure VPN, works everywhere |
| **Architecture** | Modular scripts | Clear separation of concerns, maintainable |
| **Validation** | Multi-layer | Catch errors early with helpful messages |
| **Load Balancing** | Composite score | Simple, effective, considers multiple resources |
| **Caching** | None (v1.0) | Simplicity and real-time accuracy |
| **Testing** | 29 tests | Comprehensive coverage for reliability |
| **Security** | SSH keys + validation | Secure and automation-friendly |
## Trade-offs Accepted
1. **No caching** → Slightly slower, but always accurate
2. **sshsync dependency** → External tool, but saves development time
3. **SSH key requirement** → Setup needed, but more secure
4. **Simple load balancing** → Less sophisticated, but fast and easy to understand
5. **Terminal UI only** → No web dashboard, but simpler to develop and maintain
## Future Improvements
### v2.0 Considerations
1. **Add caching** for frequently-accessed data (Tailscale status, groups)
2. **Web dashboard** for visualization and monitoring
3. **Operation history** database for audit trail
4. **Advanced load balancing** with custom metrics
5. **Automated SSH key distribution** across hosts
6. **Integration with config management** tools (Ansible, Terraform)
7. **Container support** via SSH to Docker containers
8. **Custom validation plugins** for domain-specific checks
All decisions prioritize **simplicity**, **security**, and **maintainability** for v1.0.0.

707
INSTALLATION.md Normal file

@@ -0,0 +1,707 @@
# Installation Guide
Complete step-by-step tutorial for setting up Tailscale SSH Sync Agent.
## Table of Contents
1. [Prerequisites](#prerequisites)
2. [Step 1: Install Tailscale](#step-1-install-tailscale)
3. [Step 2: Install sshsync](#step-2-install-sshsync)
4. [Step 3: Configure SSH](#step-3-configure-ssh)
5. [Step 4: Configure sshsync Groups](#step-4-configure-sshsync-groups)
6. [Step 5: Install Agent](#step-5-install-agent)
7. [Step 6: Test Installation](#step-6-test-installation)
8. [Troubleshooting](#troubleshooting)
## Prerequisites
Before you begin, ensure you have:
- **Operating System**: macOS, Linux, or BSD
- **Python**: Version 3.10 or higher
- **pip**: Python package installer
- **Claude Code**: Installed and running
- **Remote machines**: At least one machine you want to manage
- **SSH access**: Ability to SSH to remote machines
**Check Python version**:
```bash
python3 --version
# Should show: Python 3.10.x or higher
```
**Check pip**:
```bash
pip3 --version
# Should show: pip xx.x.x from ...
```
## Step 1: Install Tailscale
Tailscale provides secure networking between your machines.
### macOS
```bash
# Install via Homebrew
brew install tailscale
# Start Tailscale
sudo tailscale up
# Follow authentication link in terminal
# This will open browser to log in
```
### Linux (Ubuntu/Debian)
```bash
# Install Tailscale
curl -fsSL https://tailscale.com/install.sh | sh
# Start and authenticate
sudo tailscale up
# Follow authentication link
```
### Linux (Fedora/RHEL)
```bash
# Add repository
sudo dnf config-manager --add-repo https://pkgs.tailscale.com/stable/fedora/tailscale.repo
# Install
sudo dnf install tailscale
# Enable and start
sudo systemctl enable --now tailscaled
sudo tailscale up
```
### Verify Installation
```bash
# Check Tailscale status
tailscale status
# Should show list of machines in your tailnet
# Example output:
# 100.64.1.10 homelab-1 user@ linux -
# 100.64.1.11 laptop user@ macOS -
```
**Important**: Install and authenticate Tailscale on **all machines** you want to manage.
## Step 2: Install sshsync
sshsync is the CLI tool for managing SSH operations across multiple hosts.
```bash
# Install via pip
pip3 install sshsync
# Or use pipx for isolated installation
pipx install sshsync
```
### Verify Installation
```bash
# Check version
sshsync --version
# Should show: sshsync, version x.x.x
```
### Common Installation Issues
**Issue**: `pip3: command not found`
**Solution**:
```bash
# macOS
brew install python3
# Linux (Ubuntu/Debian)
sudo apt install python3-pip
# Linux (Fedora/RHEL)
sudo dnf install python3-pip
```
**Issue**: Permission denied during install
**Solution**:
```bash
# Install for current user only
pip3 install --user sshsync
# Or use pipx
pip3 install --user pipx
pipx install sshsync
```
## Step 3: Configure SSH
SSH configuration defines how to connect to each machine.
### Step 3.1: Generate SSH Keys (if you don't have them)
```bash
# Generate ed25519 key (recommended)
ssh-keygen -t ed25519 -C "your_email@example.com"
# Press Enter to use default location (~/.ssh/id_ed25519)
# Enter passphrase (or leave empty for no passphrase)
```
**Output**:
```
Your identification has been saved in /Users/you/.ssh/id_ed25519
Your public key has been saved in /Users/you/.ssh/id_ed25519.pub
```
### Step 3.2: Copy Public Key to Remote Machines
For each remote machine:
```bash
# Copy SSH key to remote
ssh-copy-id user@machine-hostname
# Example:
ssh-copy-id admin@100.64.1.10
```
**Manual method** (if ssh-copy-id doesn't work):
```bash
# Display public key
cat ~/.ssh/id_ed25519.pub
# SSH to remote machine
ssh user@remote-host
# On remote machine:
mkdir -p ~/.ssh
chmod 700 ~/.ssh
echo "your-public-key-here" >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
exit
```
### Step 3.3: Test SSH Connection
```bash
# Test connection (should not ask for password)
ssh user@remote-host "hostname"
# If successful, should print remote hostname
```
### Step 3.4: Create SSH Config File
Edit `~/.ssh/config`:
```bash
vim ~/.ssh/config
```
**Add host entries**:
```
# Production servers
Host prod-web-01
HostName prod-web-01.tailnet.ts.net
User deploy
IdentityFile ~/.ssh/id_ed25519
Port 22
Host prod-web-02
HostName 100.64.1.21
User deploy
IdentityFile ~/.ssh/id_ed25519
Host prod-db-01
HostName 100.64.1.30
User deploy
IdentityFile ~/.ssh/id_ed25519
# Development
Host dev-laptop
HostName dev-laptop.tailnet.ts.net
User developer
IdentityFile ~/.ssh/id_ed25519
Host dev-desktop
HostName 100.64.1.40
User developer
IdentityFile ~/.ssh/id_ed25519
# Homelab
Host homelab-1
HostName 100.64.1.10
User admin
IdentityFile ~/.ssh/id_ed25519
Host homelab-2
HostName 100.64.1.11
User admin
IdentityFile ~/.ssh/id_ed25519
```
**Important fields**:
- **Host**: Alias you'll use (e.g., "homelab-1")
- **HostName**: Actual hostname or IP (Tailscale hostname or IP)
- **User**: SSH username on remote machine
- **IdentityFile**: Path to SSH private key
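For scripting, these fields are easy to read back out of the file. A minimal parser sketch covering just the common keys (the skill's `parse_ssh_config` in `utils/helpers.py` is the real implementation):
```python
from pathlib import Path

def parse_ssh_config(path: str = "~/.ssh/config") -> dict:
    """Tiny ~/.ssh/config reader: alias -> {hostname, user, identityfile, ...}.
    Ignores Match blocks, wildcard patterns, and Include directives."""
    hosts, current = {}, None
    for line in Path(path).expanduser().read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        key, _, value = line.partition(" ")
        if key.lower() == "host":
            current = value.strip()
            hosts[current] = {}
        elif current is not None:
            hosts[current][key.lower()] = value.strip()
    return hosts

# parse_ssh_config()["homelab-1"] -> {'hostname': '100.64.1.10', 'user': 'admin', ...}
```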
### Step 3.5: Set Correct Permissions
```bash
# SSH config should be readable only by you
chmod 600 ~/.ssh/config
# SSH directory permissions
chmod 700 ~/.ssh
# Private key permissions
chmod 600 ~/.ssh/id_ed25519
# Public key permissions
chmod 644 ~/.ssh/id_ed25519.pub
```
### Step 3.6: Verify All Hosts
Test each host in your config:
```bash
# Test each host
ssh homelab-1 "echo 'Connection successful'"
ssh prod-web-01 "echo 'Connection successful'"
ssh dev-laptop "echo 'Connection successful'"
# Should connect without asking for password
```
## Step 4: Configure sshsync Groups
Groups organize your hosts for easy management.
### Step 4.1: Initialize sshsync Configuration
```bash
# Sync hosts and create groups
sshsync sync
```
**What this does**:
1. Reads all hosts from `~/.ssh/config`
2. Prompts you to assign hosts to groups
3. Creates `~/.config/sshsync/config.yaml`
### Step 4.2: Follow Interactive Prompts
```
Found 7 ungrouped hosts:
1. homelab-1
2. homelab-2
3. prod-web-01
4. prod-web-02
5. prod-db-01
6. dev-laptop
7. dev-desktop
Assign groups now? [Y/n]: Y
Enter group name for homelab-1 (or skip): homelab
Enter group name for homelab-2 (or skip): homelab
Enter group name for prod-web-01 (or skip): production,web
Enter group name for prod-web-02 (or skip): production,web
Enter group name for prod-db-01 (or skip): production,database
Enter group name for dev-laptop (or skip): development
Enter group name for dev-desktop (or skip): development
```
**Tips**:
- Hosts can belong to multiple groups (separate with commas)
- Use meaningful group names (production, development, web, database, homelab)
- Skip hosts you don't want to group yet
### Step 4.3: Verify Configuration
```bash
# View generated config
cat ~/.config/sshsync/config.yaml
```
**Expected output**:
```yaml
groups:
production:
- prod-web-01
- prod-web-02
- prod-db-01
web:
- prod-web-01
- prod-web-02
database:
- prod-db-01
development:
- dev-laptop
- dev-desktop
homelab:
- homelab-1
- homelab-2
```
### Step 4.4: Test sshsync
```bash
# List hosts
sshsync ls
# List with status
sshsync ls --with-status
# Test command execution
sshsync all "hostname"
# Test group execution
sshsync group homelab "uptime"
```
## Step 5: Install Agent
### Step 5.1: Navigate to Agent Directory
```bash
cd /path/to/tailscale-sshsync-agent
```
### Step 5.2: Verify Agent Structure
```bash
# List files
ls -la
# Should see:
# .claude-plugin/
# scripts/
# tests/
# references/
# SKILL.md
# README.md
# VERSION
# CHANGELOG.md
# etc.
```
### Step 5.3: Validate marketplace.json
```bash
# Check JSON is valid
python3 -c "import json; json.load(open('.claude-plugin/marketplace.json')); print('✅ Valid JSON')"
# Should output: ✅ Valid JSON
```
### Step 5.4: Install via Claude Code
In Claude Code:
```
/plugin marketplace add /absolute/path/to/tailscale-sshsync-agent
```
**Example**:
```
/plugin marketplace add /Users/you/tailscale-sshsync-agent
```
**Expected output**:
```
✓ Plugin installed successfully
✓ Skill: tailscale-sshsync-agent
✓ Description: Manages distributed workloads and file sharing...
```
### Step 5.5: Verify Installation
In Claude Code:
```
"Which of my machines are online?"
```
**Expected response**: Agent should activate and check your Tailscale network.
## Step 6: Test Installation
### Test 1: Host Status
**Query**:
```
"Which of my machines are online?"
```
**Expected**: List of hosts with online/offline status
### Test 2: List Groups
**Query**:
```
"What groups do I have configured?"
```
**Expected**: List of your sshsync groups
### Test 3: Execute Command
**Query**:
```
"Check disk space on homelab machines"
```
**Expected**: Disk usage for hosts in homelab group
### Test 4: Dry-Run
**Query**:
```
"Show me what would happen if I ran 'uptime' on all machines (dry-run)"
```
**Expected**: Preview without execution
### Test 5: Run Test Suite
```bash
cd /path/to/tailscale-sshsync-agent
# Run all tests
python3 tests/test_integration.py
# Should show:
# Results: 11/11 passed
# 🎉 All tests passed!
```
## Troubleshooting
### Agent Not Activating
**Symptoms**: Agent doesn't respond to queries about machines/hosts
**Solutions**:
1. **Check installation**:
```
/plugin list
```
Should show `tailscale-sshsync-agent` in list.
2. **Reinstall**:
```
/plugin remove tailscale-sshsync-agent
/plugin marketplace add /path/to/tailscale-sshsync-agent
```
3. **Check marketplace.json**:
```bash
cat .claude-plugin/marketplace.json
# Verify "description" field matches SKILL.md frontmatter
```
### SSH Connection Fails
**Symptoms**: "Permission denied" or "Connection refused"
**Solutions**:
1. **Check SSH key**:
```bash
ssh-add -l
# Should list your SSH key
```
If not listed:
```bash
ssh-add ~/.ssh/id_ed25519
```
2. **Test SSH directly**:
```bash
ssh -v hostname
# -v shows verbose debug info
```
3. **Verify authorized_keys on remote**:
```bash
ssh hostname "cat ~/.ssh/authorized_keys"
# Should contain your public key
```
### Tailscale Connection Issues
**Symptoms**: Hosts show as offline in Tailscale
**Solutions**:
1. **Check Tailscale status**:
```bash
tailscale status
```
2. **Restart Tailscale**:
```bash
# macOS
brew services restart tailscale
# Linux
sudo systemctl restart tailscaled
```
3. **Re-authenticate**:
```bash
sudo tailscale up
```
### sshsync Errors
**Symptoms**: "sshsync: command not found"
**Solutions**:
1. **Reinstall sshsync**:
```bash
pip3 install --upgrade sshsync
```
2. **Check PATH**:
```bash
which sshsync
# Should show path to sshsync
```
If not found, add to PATH:
```bash
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
```
### Config File Issues
**Symptoms**: "Group not found" or "Host not found"
**Solutions**:
1. **Verify SSH config**:
```bash
cat ~/.ssh/config
# Check host aliases are correct
```
2. **Verify sshsync config**:
```bash
cat ~/.config/sshsync/config.yaml
# Check groups are defined
```
3. **Re-sync**:
```bash
sshsync sync
```
### Test Failures
**Symptoms**: Tests fail with errors
**Solutions**:
1. **Check dependencies**:
```bash
pip3 list | grep -E "sshsync|pyyaml"
```
2. **Check Python version**:
```bash
python3 --version
# Must be 3.10+
```
3. **Run tests individually**:
```bash
python3 tests/test_helpers.py
python3 tests/test_validation.py
python3 tests/test_integration.py
```
## Post-Installation
### Recommended Next Steps
1. **Create more groups** for better organization:
```bash
sshsync gadd staging
sshsync gadd backup-servers
```
2. **Test file operations**:
```
"Push test file to homelab machines (dry-run)"
```
3. **Set up automation**:
- Create scripts for common tasks
- Schedule backups
- Automate deployments
4. **Review documentation**:
- Read `references/sshsync-guide.md` for advanced sshsync usage
- Read `references/tailscale-integration.md` for Tailscale tips
### Security Checklist
- ✅ SSH keys are password-protected
- ✅ SSH config has correct permissions (600)
- ✅ Private keys have correct permissions (600)
- ✅ Tailscale ACLs configured (if using teams)
- ✅ Only necessary hosts have SSH access
- ✅ Regularly review connected devices in Tailscale
## Summary
You now have:
1. ✅ Tailscale installed and connected
2. ✅ sshsync installed and configured
3. ✅ SSH keys set up on all machines
4. ✅ SSH config with all hosts
5. ✅ sshsync groups organized
6. ✅ Agent installed in Claude Code
7. ✅ Tests passing
**Start using**:
```
"Which machines are online?"
"Run this on the least loaded machine"
"Push files to production servers"
"Deploy to staging then production"
```
For more examples, see README.md and SKILL.md.
## Support
If you encounter issues:
1. Check this troubleshooting section
2. Review references/ for detailed guides
3. Check DECISIONS.md for architecture rationale
4. Run tests to verify installation
Happy automating! 🚀

3
README.md Normal file

@@ -0,0 +1,3 @@
# tailscale-sshsync-agent
Manages distributed workloads and file sharing across Tailscale SSH-connected machines. Automates remote command execution, intelligent load balancing, file synchronization workflows, host health monitoring, and multi-machine orchestration using sshsync.

1204
SKILL.md Normal file

File diff suppressed because it is too large

1
VERSION Normal file

@@ -0,0 +1 @@
1.0.0

117
plugin.lock.json Normal file

@@ -0,0 +1,117 @@
{
"$schema": "internal://schemas/plugin.lock.v1.json",
"pluginId": "gh:Human-Frontier-Labs-Inc/human-frontier-labs-marketplace:plugins/tailscale-sshsync-agent",
"normalized": {
"repo": null,
"ref": "refs/tags/v20251128.0",
"commit": "3a7cbe9632f245c6b9a4c4bf2731da65c857a7f4",
"treeHash": "832bc62ce02c782663e60a2eb97932166fef39c681a9ca01b9d5dc170860b805",
"generatedAt": "2025-11-28T10:11:41.356928Z",
"toolVersion": "publish_plugins.py@0.2.0"
},
"origin": {
"remote": "git@github.com:zhongweili/42plugin-data.git",
"branch": "master",
"commit": "aa1497ed0949fd50e99e70d6324a29c5b34f9390",
"repoRoot": "/Users/zhongweili/projects/openmind/42plugin-data"
},
"manifest": {
"name": "tailscale-sshsync-agent",
"description": "Manages distributed workloads and file sharing across Tailscale SSH-connected machines. Automates remote command execution, intelligent load balancing, file synchronization workflows, host health monitoring, and multi-machine orchestration using sshsync.",
"version": null
},
"content": {
"files": [
{
"path": "CHANGELOG.md",
"sha256": "74dbda933868b7cab410144a831b43e4f1ae6161f2402edcb068a8232c50bfe4"
},
{
"path": "README.md",
"sha256": "470f165d8ac61a8942e6fb3568c49febb7f803bfa0f4010d14e09f807c34c88e"
},
{
"path": "VERSION",
"sha256": "59854984853104df5c353e2f681a15fc7924742f9a2e468c29af248dce45ce03"
},
{
"path": "SKILL.md",
"sha256": "31c8f237f9b3617c32c6ff381ae83d427b50eb0877d3763d9826e00ece6618f1"
},
{
"path": "INSTALLATION.md",
"sha256": "9313ea1bbb0a03e4c078c41b207f3febe800cd38eb57b7205c7b5188238ca46a"
},
{
"path": "DECISIONS.md",
"sha256": "59549e84aaa8e32d4bdf64d46855714f5cde7f061906e1c74976658883472c82"
},
{
"path": "references/tailscale-integration.md",
"sha256": "6553b3ceeaca5118a7b005368223ea4b3ab70eb2492ccaf5c2b7f7758b65dd42"
},
{
"path": "references/sshsync-guide.md",
"sha256": "697ce0b56eda258732a0b924f821e9e24eb6b977934153bdd2045be961e58de2"
},
{
"path": "tests/test_validation.py",
"sha256": "716ae0d2e86f0e6657903aef6bb714fbd3b5b72d3b109fab4da3f75f90cc2c0a"
},
{
"path": "tests/test_helpers.py",
"sha256": "3be88e30825414eb3ade048b766c84995dc98a01cb7236ce75201716179279a8"
},
{
"path": "tests/test_integration.py",
"sha256": "12f7cb857fda23531a9c74caf072cf73b739672b1e99c55f42a2ef8e11238523"
},
{
"path": "scripts/load_balancer.py",
"sha256": "9d87476562ac848a026e42116e381f733d520e9330da33de3d905585af14398d"
},
{
"path": "scripts/tailscale_manager.py",
"sha256": "4b75ebb9423d221b9788eb9352b274e0256c101185de11064a7b4cb00684016e"
},
{
"path": "scripts/workflow_executor.py",
"sha256": "9f23f3bb421e940766e65949e6efa485a313115e297d4c5f1088589155a7bac1"
},
{
"path": "scripts/sshsync_wrapper.py",
"sha256": "fc2062ebbc72e3ddc6c6bfb5f22019b23050f5c2ed9ac35c315018a96871fb19"
},
{
"path": "scripts/utils/helpers.py",
"sha256": "b01979ee56ab92037b8f8054a883124d600b8337cf461855092b866091aed24a"
},
{
"path": "scripts/utils/validators/connection_validator.py",
"sha256": "9ac82108e69690b74d9aa89ca51f7d06fe860e880aaa1983d08242d7199d1601"
},
{
"path": "scripts/utils/validators/parameter_validator.py",
"sha256": "157dfcb7f1937df88344647a37a124d52e1de1b992b72c9b9e69d3b717ca0195"
},
{
"path": "scripts/utils/validators/__init__.py",
"sha256": "2d109ad1b5d253578a095c8354159fdf9318154b4f62d9b16eaa1a88a422382d"
},
{
"path": "scripts/utils/validators/host_validator.py",
"sha256": "79cab42587435a799349ba8a562c4ec0f3d54f3f2790562c894c6289beade6d6"
},
{
"path": ".claude-plugin/plugin.json",
"sha256": "0ec7466bbf2e8dc2fe1607feff0cc0ef0ebebf44ff54f17dcce96255e2c21215"
}
],
"dirSha256": "832bc62ce02c782663e60a2eb97932166fef39c681a9ca01b9d5dc170860b805"
},
"security": {
"scannedAt": null,
"scannerVersion": null,
"flags": []
}
}

466
references/sshsync-guide.md Normal file

@@ -0,0 +1,466 @@
# sshsync CLI Tool Guide
Complete reference for using sshsync with Tailscale SSH Sync Agent.
## Table of Contents
1. [Installation](#installation)
2. [Configuration](#configuration)
3. [Core Commands](#core-commands)
4. [Advanced Usage](#advanced-usage)
5. [Troubleshooting](#troubleshooting)
## Installation
### Via pip
```bash
pip install sshsync
```
### Verify Installation
```bash
sshsync --version
```
## Configuration
### 1. SSH Config Setup
sshsync uses your existing SSH configuration. Edit `~/.ssh/config`:
```
# Example host entries
Host homelab-1
HostName 100.64.1.10
User admin
IdentityFile ~/.ssh/id_ed25519
Port 22
Host prod-web-01
HostName 100.64.1.20
User deploy
IdentityFile ~/.ssh/id_rsa
Port 22
Host dev-laptop
HostName 100.64.1.30
User developer
```
**Important Notes**:
- sshsync uses the **Host alias** (e.g., "homelab-1"), not the actual hostname
- Ensure SSH key authentication is configured
- Test each host with `ssh host-alias` before using with sshsync
### 2. Initialize sshsync Configuration
First run:
```bash
sshsync sync
```
This will:
1. Read all hosts from your SSH config
2. Prompt you to assign hosts to groups
3. Create `~/.config/sshsync/config.yaml`
### 3. sshsync Config File
Location: `~/.config/sshsync/config.yaml`
Structure:
```yaml
groups:
production:
- prod-web-01
- prod-web-02
- prod-db-01
development:
- dev-laptop
- dev-desktop
homelab:
- homelab-1
- homelab-2
```
**Manual Editing**:
- Groups are arbitrary labels (use what makes sense for you)
- Hosts can belong to multiple groups
- Use consistent host aliases from SSH config
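Because the file is plain YAML, scripts can read it directly. A minimal sketch using PyYAML (an assumed dependency, `pip install pyyaml`) that answers "which groups is this host in?":
```python
from pathlib import Path
import yaml  # PyYAML, assumed installed

def groups_for_host(host: str,
                    path: str = "~/.config/sshsync/config.yaml") -> list:
    """Return every group that lists the given host alias."""
    config = yaml.safe_load(Path(path).expanduser().read_text()) or {}
    groups = config.get("groups", {}) or {}
    return [name for name, members in groups.items() if host in (members or [])]

print(groups_for_host("prod-web-01"))  # e.g. ['production', 'web']
```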
## Core Commands
### List Hosts
```bash
# List all configured hosts
sshsync ls
# List with online/offline status
sshsync ls --with-status
```
**Output Example**:
```
Host Status
homelab-1 online
homelab-2 offline
prod-web-01 online
dev-laptop online
```
### Execute Commands
#### On All Hosts
```bash
# Execute on all configured hosts
sshsync all "df -h"
# With custom timeout (default: 10s)
sshsync all --timeout 20 "systemctl status nginx"
# Dry-run (preview without executing)
sshsync all --dry-run "reboot"
```
#### On Specific Group
```bash
# Execute on group
sshsync group production "uptime"
# With timeout
sshsync group web-servers --timeout 30 "npm run build"
# Filter with regex
sshsync group production --regex "web-.*" "df -h"
```
**Regex Filtering**:
- Filters group members by alias matching pattern
- Uses Python regex syntax
- Example: `--regex "web-0[1-3]"` matches web-01, web-02, web-03
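The same filtering is easy to reproduce in a script; a quick Python illustration of the matching (not sshsync's own code):
```python
import re

members = ["web-01", "web-02", "web-03", "db-01"]
pattern = re.compile(r"web-0[1-3]")
matched = [h for h in members if pattern.search(h)]
print(matched)  # ['web-01', 'web-02', 'web-03']
```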
### File Transfer
#### Push Files
```bash
# Push to specific host
sshsync push --host web-01 ./app /var/www/app
# Push to group
sshsync push --group production ./dist /var/www/app
# Push to all hosts
sshsync push --all ./config.yml /etc/app/config.yml
# Recursive push (directory with contents)
sshsync push --group web --recurse ./app /var/www/app
# Dry-run
sshsync push --group production --dry-run ./dist /var/www/app
```
**Important**:
- Local path comes first, remote path second
- Use `--recurse` for directories
- Dry-run shows what would be transferred without executing
#### Pull Files
```bash
# Pull from specific host
sshsync pull --host db-01 /var/log/mysql/error.log ./logs/
# Pull from group (creates separate directories per host)
sshsync pull --group databases /var/backups ./backups/
# Recursive pull
sshsync pull --host web-01 --recurse /var/www/app ./backup/
```
**Pull Behavior**:
- When pulling from a group, a subdirectory is created per host
- Use `--recurse` to pull entire directory trees
- The destination directory is created if it doesn't exist
### Group Management
#### Add Hosts to Group
```bash
# Interactive: prompts to select hosts
sshsync gadd production
# Follow prompts to select which hosts to add
```
#### Add Host to SSH Config
```bash
# Interactive host addition
sshsync hadd
# Follow prompts for:
# - Host alias
# - Hostname/IP
# - Username
# - Port (optional)
# - Identity file (optional)
```
#### Sync Ungrouped Hosts
```bash
# Assign groups to hosts not yet in any group
sshsync sync
```
## Advanced Usage
### Parallel Execution
sshsync automatically executes commands in parallel across hosts:
```bash
# This runs simultaneously on all hosts in group
sshsync group web-servers "npm run build"
```
**Performance**:
- Commands execute concurrently
- Results collected as they complete
- Timeout applies per-host independently
### Timeout Strategies
Different operations need different timeouts:
```bash
# Quick checks (5-10s)
sshsync all --timeout 5 "hostname"
# Moderate operations (30-60s)
sshsync group web --timeout 60 "npm install"
# Long-running tasks (300s+)
sshsync group build --timeout 300 "docker build ."
```
**Timeout Best Practices**:
- Set timeout 20-30% longer than expected duration
- Use dry-run first to estimate timing
- Increase timeout for network-intensive operations
### Combining with Other Tools
#### With xargs
```bash
# Get list of online hosts
sshsync ls --with-status | grep online | awk '{print $1}' | xargs -I {} echo "Host {} is online"
```
#### With jq (if using JSON output)
```bash
# Parse structured output (if sshsync supports --json flag)
sshsync ls --json | jq '.hosts[] | select(.status=="online") | .name'
```
#### In Shell Scripts
```bash
#!/bin/bash
# Deploy script using sshsync
echo "Deploying to staging..."
sshsync push --group staging --recurse ./dist /var/www/app
if [ $? -eq 0 ]; then
echo "Staging deployment successful"
echo "Running tests..."
sshsync group staging "cd /var/www/app && npm test"
if [ $? -eq 0 ]; then
echo "Tests passed, deploying to production..."
sshsync push --group production --recurse ./dist /var/www/app
fi
fi
```
## Troubleshooting
### Common Issues
#### 1. "Permission denied (publickey)"
**Cause**: SSH key not configured or not added to ssh-agent
**Solution**:
```bash
# Add SSH key to agent
ssh-add ~/.ssh/id_ed25519
# Verify it's added
ssh-add -l
# Copy public key to remote
ssh-copy-id user@host
```
#### 2. "Connection timed out"
**Cause**: Host is offline or network issue
**Solution**:
```bash
# Test connectivity
ping hostname
# Test Tailscale specifically
tailscale ping hostname
# Check Tailscale status
tailscale status
```
#### 3. "Host not found in SSH config"
**Cause**: Host alias not in `~/.ssh/config`
**Solution**:
```bash
# Add host to SSH config
sshsync hadd
# Or manually edit ~/.ssh/config
vim ~/.ssh/config
```
#### 4. "Group not found"
**Cause**: Group doesn't exist in sshsync config
**Solution**:
```bash
# Add hosts to new group
sshsync gadd mygroup
# Or manually edit config
vim ~/.config/sshsync/config.yaml
```
#### 5. File Transfer Fails
**Cause**: Insufficient permissions, disk space, or path doesn't exist
**Solution**:
```bash
# Check remote disk space
sshsync group production "df -h"
# Check remote path exists
sshsync group production "ls -ld /target/path"
# Check permissions
sshsync group production "ls -la /target/path"
```
### Debug Mode
While sshsync doesn't have a built-in verbose mode, you can debug the underlying SSH connection directly:
```bash
# Test SSH to a single host with verbose output
ssh -v host-alias "uptime"
# Or use dry-run to see what would execute
sshsync all --dry-run "command"
```
### Performance Issues
If operations are slow:
1. **Reduce parallelism** (run on fewer hosts at once)
2. **Increase timeout** for network-bound operations
3. **Check network latency**:
```bash
sshsync all --timeout 5 'echo $HOSTNAME'
```
### Configuration Validation
```bash
# Verify SSH config is readable
cat ~/.ssh/config
# Verify sshsync config
cat ~/.config/sshsync/config.yaml
# Test hosts individually
for host in $(sshsync ls | awk '{print $1}'); do
echo "Testing $host..."
ssh $host "echo OK" || echo "FAILED: $host"
done
```
## Best Practices
1. **Use meaningful host aliases** in SSH config
2. **Organize groups logically** (by function, environment, location)
3. **Always dry-run first** for destructive operations
4. **Set appropriate timeouts** based on operation type
5. **Test SSH keys** before using sshsync
6. **Keep groups updated** as infrastructure changes
7. **Use --with-status** to check availability before operations
## Integration with Tailscale
sshsync works seamlessly with Tailscale SSH:
```bash
# SSH config using Tailscale hostname
Host homelab-1
HostName homelab-1.tailnet.ts.net
User admin
# Or using Tailscale IP directly
Host homelab-1
HostName 100.64.1.10
User admin
```
**Tailscale Advantages**:
- No need for port forwarding
- Encrypted connections
- MagicDNS for easy hostnames
- Works across NATs
**Verify Tailscale**:
```bash
# Check Tailscale network
tailscale status
# Ping host via Tailscale
tailscale ping homelab-1
```
## Summary
sshsync simplifies multi-host SSH operations:
- ✅ Execute commands across host groups
- ✅ Transfer files to/from multiple hosts
- ✅ Organize hosts into logical groups
- ✅ Parallel execution for speed
- ✅ Dry-run mode for safety
- ✅ Works great with Tailscale
For more help: `sshsync --help`

468
references/tailscale-integration.md Normal file

@@ -0,0 +1,468 @@
# Tailscale Integration Guide
How to use Tailscale SSH with sshsync for secure, zero-config remote access.
## What is Tailscale?
Tailscale is a zero-config VPN that creates a secure network between your devices using WireGuard. It provides:
- **Peer-to-peer encrypted connections**
- **No port forwarding required**
- **Works across NATs and firewalls**
- **MagicDNS for easy device addressing**
- **Built-in SSH functionality**
- **Access control lists (ACLs)**
## Why Tailscale + sshsync?
Combining Tailscale with sshsync gives you:
1. **Secure connections** everywhere (Tailscale encryption)
2. **Simple addressing** (MagicDNS hostnames)
3. **Multi-host operations** (sshsync groups and execution)
4. **No firewall configuration** needed
5. **Works from anywhere** (coffee shop, home, office)
## Setup
### 1. Install Tailscale
**macOS**:
```bash
brew install tailscale
```
**Linux**:
```bash
curl -fsSL https://tailscale.com/install.sh | sh
```
**Verify Installation**:
```bash
tailscale version
```
### 2. Connect to Tailscale
```bash
# Start Tailscale
sudo tailscale up
# Follow the authentication link
# This opens browser to authenticate
# Verify connection
tailscale status
```
### 3. Configure SSH via Tailscale
Tailscale provides two SSH options:
#### Option A: Tailscale SSH (Built-in)
**Enable on each machine**:
```bash
sudo tailscale up --ssh
```
**Use**:
```bash
tailscale ssh user@machine-name
```
**Advantages**:
- No SSH server configuration needed
- Uses Tailscale authentication
- Automatic key management
#### Option B: Standard SSH over Tailscale (Recommended for sshsync)
**Configure SSH config** to use Tailscale hostnames:
```bash
# ~/.ssh/config
Host homelab-1
HostName homelab-1.tailnet-name.ts.net
User admin
IdentityFile ~/.ssh/id_ed25519
# Or use Tailscale IP directly
Host homelab-2
HostName 100.64.1.10
User admin
IdentityFile ~/.ssh/id_ed25519
```
**Advantages**:
- Works with all SSH tools (including sshsync)
- Standard SSH key authentication
- More flexibility
## Getting Tailscale Hostnames and IPs
### View All Machines
```bash
tailscale status
```
**Output**:
```
100.64.1.10 homelab-1 user@ linux -
100.64.1.11 homelab-2 user@ linux -
100.64.1.20 laptop user@ macOS -
100.64.1.30 phone user@ iOS offline
```
### Get MagicDNS Hostname
**Format**: `machine-name.tailnet-name.ts.net`
**Find your tailnet name**:
```bash
tailscale status --json | grep -i tailnet
```
Or check in Tailscale admin console: https://login.tailscale.com/admin/machines
### Get Tailscale IP
```bash
# Your own IP
tailscale ip -4
# Another machine's IP (from status output)
tailscale status | grep machine-name
```
## Testing Connectivity
### Ping via Tailscale
```bash
# Ping by hostname
tailscale ping homelab-1
# Ping by IP
tailscale ping 100.64.1.10
```
**Successful output**:
```
pong from homelab-1 (100.64.1.10) via DERP(nyc) in 45ms
pong from homelab-1 (100.64.1.10) via DERP(nyc) in 43ms
```
**Failed output**:
```
timeout waiting for pong
```
### SSH Test
```bash
# Test SSH connection
ssh user@homelab-1.tailnet.ts.net
# Or with IP
ssh user@100.64.1.10
```
## Configuring sshsync with Tailscale
### Step 1: Add Tailscale Hosts to SSH Config
```bash
vim ~/.ssh/config
```
**Example configuration**:
```
# Production servers
Host prod-web-01
HostName prod-web-01.tailnet.ts.net
User deploy
IdentityFile ~/.ssh/id_ed25519
Host prod-web-02
HostName prod-web-02.tailnet.ts.net
User deploy
IdentityFile ~/.ssh/id_ed25519
Host prod-db-01
HostName prod-db-01.tailnet.ts.net
User deploy
IdentityFile ~/.ssh/id_ed25519
# Homelab
Host homelab-1
HostName 100.64.1.10
User admin
IdentityFile ~/.ssh/id_ed25519
Host homelab-2
HostName 100.64.1.11
User admin
IdentityFile ~/.ssh/id_ed25519
# Development
Host dev-laptop
HostName dev-laptop.tailnet.ts.net
User developer
IdentityFile ~/.ssh/id_ed25519
```
### Step 2: Test Each Host
```bash
# Test connectivity to each host
ssh prod-web-01 "hostname"
ssh homelab-1 "hostname"
ssh dev-laptop "hostname"
```
### Step 3: Initialize sshsync
```bash
# Sync hosts and create groups
sshsync sync
# Add hosts to groups
sshsync gadd production
# Select: prod-web-01, prod-web-02, prod-db-01
sshsync gadd homelab
# Select: homelab-1, homelab-2
sshsync gadd development
# Select: dev-laptop
```
### Step 4: Verify Configuration
```bash
# List all hosts with status
sshsync ls --with-status
# Test command execution
sshsync all "uptime"
# Test group execution
sshsync group production "df -h"
```
## Advanced Tailscale Features
### Tailnet Lock
Prevents unauthorized device additions:
```bash
tailscale lock status
```
### Exit Nodes
Route all traffic through a specific machine:
```bash
# Enable exit node on a machine
sudo tailscale up --advertise-exit-node
# Use exit node from another machine
sudo tailscale set --exit-node=exit-node-name
```
### Subnet Routing
Access networks behind Tailscale machines:
```bash
# Advertise subnet routes
sudo tailscale up --advertise-routes=192.168.1.0/24
```
### ACLs (Access Control Lists)
Control who can access what: https://login.tailscale.com/admin/acls
**Example ACL**:
```json
{
"acls": [
{
"action": "accept",
"src": ["group:admins"],
"dst": ["*:22", "*:80", "*:443"]
},
{
"action": "accept",
"src": ["group:developers"],
"dst": ["tag:development:*"]
}
]
}
```
## Troubleshooting
### Machine Shows Offline
**Check Tailscale status**:
```bash
tailscale status
```
**Restart Tailscale**:
```bash
# macOS
brew services restart tailscale
# Linux
sudo systemctl restart tailscaled
```
**Re-authenticate**:
```bash
sudo tailscale up
```
### Cannot Connect via SSH
1. **Verify Tailscale connectivity**:
```bash
tailscale ping machine-name
```
2. **Check SSH is running** on remote:
```bash
tailscale ssh machine-name "systemctl status sshd"
```
3. **Verify SSH keys**:
```bash
ssh-add -l
```
4. **Test SSH directly**:
```bash
ssh -v user@machine-name.tailnet.ts.net
```
### High Latency
**Check connection method**:
```bash
tailscale status
```
Look for "direct" vs "DERP relay":
- **Direct**: Low latency (< 50ms)
- **DERP relay**: Higher latency (100-200ms)
**Force direct connection**:
```bash
# Direct connections need UDP reachability between both peers;
# check firewalls/NAT, then re-test with: tailscale ping machine-name
```
### MagicDNS Not Working
**Enable MagicDNS**:
1. Go to https://login.tailscale.com/admin/dns
2. Enable MagicDNS
**Verify**:
```bash
nslookup machine-name.tailnet.ts.net
```
## Security Best Practices
1. **Use SSH keys**, not passwords
2. **Enable Tailnet Lock** to prevent unauthorized devices
3. **Use ACLs** to restrict access
4. **Regularly review** connected devices
5. **Set up key expiry** for team members who leave
6. **Use tags** for machine roles
7. **Enable two-factor auth** for Tailscale account
## Monitoring
### Check Network Status
```bash
# All machines
tailscale status
# Self status
tailscale status --self
# JSON format for parsing
tailscale status --json
```
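The JSON form is the easiest to script against. A minimal sketch listing online peers, assuming the `Peer`, `HostName`, and `Online` fields present in current `tailscale status --json` output:
```python
import json
import subprocess

def list_online_machines() -> list:
    """Return hostnames of online peers from `tailscale status --json`."""
    raw = subprocess.run(
        ["tailscale", "status", "--json"],
        capture_output=True, text=True, check=True,
    ).stdout
    status = json.loads(raw)
    peers = status.get("Peer", {}) or {}
    return [p["HostName"] for p in peers.values() if p.get("Online")]

if __name__ == "__main__":
    print("\n".join(list_online_machines()))
```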
### View Logs
```bash
# macOS
tail -f /var/log/tailscaled.log
# Linux
journalctl -u tailscaled -f
```
## Use Cases with sshsync
### 1. Deploy to All Production Servers
```bash
sshsync push --group production --recurse ./dist /var/www/app
sshsync group production "cd /var/www/app && pm2 restart all"
```
### 2. Collect Logs from All Servers
```bash
sshsync pull --group production /var/log/app/error.log ./logs/
```
### 3. Update All Homelab Machines
```bash
sshsync group homelab "sudo apt update && sudo apt upgrade -y"
```
### 4. Check Disk Space Everywhere
```bash
sshsync all "df -h /"
```
### 5. Sync Configuration Across Machines
```bash
sshsync push --all ~/dotfiles/.bashrc ~/.bashrc
sshsync push --all ~/dotfiles/.vimrc ~/.vimrc
```
## Summary
Tailscale + sshsync = **Powerful Remote Management**
- ✅ Secure connections everywhere (WireGuard encryption)
- ✅ No firewall configuration needed
- ✅ Easy addressing (MagicDNS)
- ✅ Multi-host operations (sshsync groups)
- ✅ Works from anywhere
**Quick Start**:
1. Install Tailscale: `brew install tailscale`
2. Connect: `sudo tailscale up`
3. Configure SSH config with Tailscale hostnames
4. Initialize sshsync: `sshsync sync`
5. Start managing: `sshsync all "uptime"`
For more: https://tailscale.com/kb/

378
scripts/load_balancer.py Normal file

@@ -0,0 +1,378 @@
#!/usr/bin/env python3
"""
Load balancer for Tailscale SSH Sync Agent.
Intelligent task distribution based on machine resources.
"""
import sys
from pathlib import Path
from typing import Dict, List, Optional, Tuple
from dataclasses import dataclass
import logging
# Add utils to path
sys.path.insert(0, str(Path(__file__).parent))
from utils.helpers import parse_cpu_load, parse_memory_usage, parse_disk_usage, calculate_load_score, classify_load_status
from sshsync_wrapper import execute_on_host
logger = logging.getLogger(__name__)
@dataclass
class MachineMetrics:
"""Resource metrics for a machine."""
host: str
cpu_pct: float
mem_pct: float
disk_pct: float
load_score: float
status: str
def get_machine_load(host: str, timeout: int = 10) -> Optional[MachineMetrics]:
"""
Get CPU, memory, disk metrics for a machine.
Args:
host: Host to check
timeout: Command timeout
Returns:
MachineMetrics object or None on failure
Example:
>>> metrics = get_machine_load("web-01")
>>> metrics.cpu_pct
45.2
>>> metrics.load_score
0.49
"""
try:
# Get CPU load
cpu_result = execute_on_host(host, "uptime", timeout=timeout)
cpu_data = {}
if cpu_result.get('success'):
cpu_data = parse_cpu_load(cpu_result['stdout'])
# Get memory usage
mem_result = execute_on_host(host, "free -m 2>/dev/null || vm_stat", timeout=timeout)
mem_data = {}
if mem_result.get('success'):
mem_data = parse_memory_usage(mem_result['stdout'])
# Get disk usage
disk_result = execute_on_host(host, "df -h / | tail -1", timeout=timeout)
disk_data = {}
if disk_result.get('success'):
disk_data = parse_disk_usage(disk_result['stdout'])
# Calculate metrics
# CPU: Use 1-min load average, normalize by assuming 4 cores (adjust as needed)
cpu_pct = (cpu_data.get('load_1min', 0) / 4.0) * 100 if cpu_data else 50.0
# Memory: Direct percentage
mem_pct = mem_data.get('use_pct', 50.0)
# Disk: Direct percentage
disk_pct = disk_data.get('use_pct', 50.0)
# Calculate load score
score = calculate_load_score(cpu_pct, mem_pct, disk_pct)
status = classify_load_status(score)
return MachineMetrics(
host=host,
cpu_pct=cpu_pct,
mem_pct=mem_pct,
disk_pct=disk_pct,
load_score=score,
status=status
)
except Exception as e:
logger.error(f"Error getting load for {host}: {e}")
return None
def select_optimal_host(candidates: List[str],
prefer_group: Optional[str] = None,
timeout: int = 10) -> Tuple[Optional[str], Optional[MachineMetrics]]:
"""
Pick best host from candidates based on load.
Args:
candidates: List of candidate hosts
prefer_group: Prefer hosts from this group if available
timeout: Timeout for metric gathering
Returns:
Tuple of (selected_host, metrics)
Example:
>>> host, metrics = select_optimal_host(["web-01", "web-02", "web-03"])
>>> host
"web-03"
>>> metrics.load_score
0.28
"""
if not candidates:
return None, None
# Get metrics for all candidates
metrics_list: List[MachineMetrics] = []
for host in candidates:
metrics = get_machine_load(host, timeout=timeout)
if metrics:
metrics_list.append(metrics)
if not metrics_list:
logger.warning("No valid metrics collected from candidates")
return None, None
# Sort by load score (lower is better)
metrics_list.sort(key=lambda m: m.load_score)
# If prefer_group specified, prioritize those hosts if load is similar
if prefer_group:
from utils.helpers import parse_sshsync_config, get_groups_for_host
groups_config = parse_sshsync_config()
# Find hosts in preferred group
preferred_metrics = [
m for m in metrics_list
if prefer_group in get_groups_for_host(m.host, groups_config)
]
# Use preferred if load score within 20% of absolute best
if preferred_metrics:
best_score = metrics_list[0].load_score
for m in preferred_metrics:
if m.load_score <= best_score * 1.2:
return m.host, m
# Return absolute best
best = metrics_list[0]
return best.host, best
def get_group_capacity(group: str, timeout: int = 10) -> Dict:
"""
Get aggregate capacity of a group.
Args:
group: Group name
timeout: Timeout for metric gathering
Returns:
Dict with aggregate metrics:
{
'hosts': List[MachineMetrics],
'total_hosts': int,
'avg_cpu': float,
'avg_mem': float,
'avg_disk': float,
'avg_load_score': float,
'total_capacity': str # descriptive
}
Example:
>>> capacity = get_group_capacity("production")
>>> capacity['avg_load_score']
0.45
"""
from utils.helpers import parse_sshsync_config
groups_config = parse_sshsync_config()
group_hosts = groups_config.get(group, [])
if not group_hosts:
return {
'error': f'Group {group} not found or has no members',
'hosts': []
}
# Get metrics for all hosts in group
metrics_list: List[MachineMetrics] = []
for host in group_hosts:
metrics = get_machine_load(host, timeout=timeout)
if metrics:
metrics_list.append(metrics)
if not metrics_list:
return {
'error': f'Could not get metrics for any hosts in {group}',
'hosts': []
}
# Calculate aggregates
avg_cpu = sum(m.cpu_pct for m in metrics_list) / len(metrics_list)
avg_mem = sum(m.mem_pct for m in metrics_list) / len(metrics_list)
avg_disk = sum(m.disk_pct for m in metrics_list) / len(metrics_list)
avg_score = sum(m.load_score for m in metrics_list) / len(metrics_list)
# Determine overall capacity description
if avg_score < 0.4:
capacity_desc = "High capacity available"
elif avg_score < 0.7:
capacity_desc = "Moderate capacity"
else:
capacity_desc = "Limited capacity"
return {
'group': group,
'hosts': metrics_list,
'total_hosts': len(metrics_list),
'available_hosts': len(group_hosts),
'avg_cpu': avg_cpu,
'avg_mem': avg_mem,
'avg_disk': avg_disk,
'avg_load_score': avg_score,
'total_capacity': capacity_desc
}
def distribute_tasks(tasks: List[Dict], hosts: List[str],
timeout: int = 10) -> Dict[str, List[Dict]]:
"""
Distribute multiple tasks optimally across hosts.
Args:
tasks: List of task dicts (each with 'command', 'priority', etc)
hosts: Available hosts
timeout: Timeout for metric gathering
Returns:
Dict mapping hosts to assigned tasks
Algorithm:
- Get current load for all hosts
- Assign tasks to least loaded hosts
- Balance by estimated task weight
Example:
>>> tasks = [
... {'command': 'npm run build', 'weight': 3},
... {'command': 'npm test', 'weight': 2}
... ]
>>> distribution = distribute_tasks(tasks, ["web-01", "web-02"])
>>> distribution["web-01"]
[{'command': 'npm run build', 'weight': 3}]
"""
if not tasks or not hosts:
return {}
# Get current load for all hosts
host_metrics = {}
for host in hosts:
metrics = get_machine_load(host, timeout=timeout)
if metrics:
host_metrics[host] = metrics
if not host_metrics:
logger.error("No valid host metrics available")
return {}
# Initialize assignment
assignment: Dict[str, List[Dict]] = {host: [] for host in host_metrics.keys()}
host_loads = {host: m.load_score for host, m in host_metrics.items()}
# Sort tasks by weight (descending) to assign heavy tasks first
sorted_tasks = sorted(
tasks,
key=lambda t: t.get('weight', 1),
reverse=True
)
# Assign each task to least loaded host
for task in sorted_tasks:
# Find host with minimum current load
min_host = min(host_loads.keys(), key=lambda h: host_loads[h])
# Assign task
assignment[min_host].append(task)
# Update simulated load (add task weight normalized)
task_weight = task.get('weight', 1)
host_loads[min_host] += (task_weight * 0.1) # 0.1 = scaling factor
return assignment
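# Worked example of the greedy assignment above (illustrative numbers):
#   hosts: web-01 (score 0.30), web-02 (score 0.50)
#   tasks: build (weight 3), test (weight 2), lint (weight 1)
#   build -> web-01 (0.30 + 0.3 = 0.60); test -> web-02 (0.50 + 0.2 = 0.70);
#   lint  -> web-01 (0.60 + 0.1 = 0.70), leaving the simulated loads balanced.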
def format_load_report(metrics: MachineMetrics, compare_to_avg: Optional[Dict] = None) -> str:
"""
Format load metrics as human-readable report.
Args:
metrics: Machine metrics
compare_to_avg: Optional dict with avg_cpu, avg_mem, avg_disk for comparison
Returns:
Formatted report string
Example:
        >>> metrics = MachineMetrics('web-01', 45, 60, 40, 0.48, 'moderate')
        >>> print(format_load_report(metrics))
        web-01: Load Score: 0.48 (moderate)
          CPU: 45.0% | Memory: 60.0% | Disk: 40.0%
"""
lines = [
f"{metrics.host}: Load Score: {metrics.load_score:.2f} ({metrics.status})",
f" CPU: {metrics.cpu_pct:.1f}% | Memory: {metrics.mem_pct:.1f}% | Disk: {metrics.disk_pct:.1f}%"
]
if compare_to_avg:
cpu_vs = metrics.cpu_pct - compare_to_avg.get('avg_cpu', 0)
mem_vs = metrics.mem_pct - compare_to_avg.get('avg_mem', 0)
disk_vs = metrics.disk_pct - compare_to_avg.get('avg_disk', 0)
comparisons = []
if abs(cpu_vs) > 10:
comparisons.append(f"CPU {'+' if cpu_vs > 0 else ''}{cpu_vs:.0f}% vs avg")
if abs(mem_vs) > 10:
comparisons.append(f"Mem {'+' if mem_vs > 0 else ''}{mem_vs:.0f}% vs avg")
if abs(disk_vs) > 10:
comparisons.append(f"Disk {'+' if disk_vs > 0 else ''}{disk_vs:.0f}% vs avg")
if comparisons:
lines.append(f" vs Average: {' | '.join(comparisons)}")
return "\n".join(lines)
def main():
"""Test load balancer functions."""
print("Testing load balancer...\n")
print("1. Testing select_optimal_host:")
print(" (Requires configured hosts - using dry-run simulation)")
# Simulate metrics
test_metrics = [
        MachineMetrics('web-01', 45, 60, 40, 0.48, 'moderate'),
MachineMetrics('web-02', 85, 70, 65, 0.75, 'high'),
MachineMetrics('web-03', 20, 35, 30, 0.28, 'low'),
]
# Sort by score
test_metrics.sort(key=lambda m: m.load_score)
best = test_metrics[0]
print(f" ✓ Best host: {best.host} (score: {best.load_score:.2f})")
print(f" Reason: {best.status} load")
print("\n2. Format load report:")
report = format_load_report(test_metrics[0], {
'avg_cpu': 50,
'avg_mem': 55,
'avg_disk': 45
})
print(report)
print("\n✅ Load balancer tested")
if __name__ == "__main__":
main()

409
scripts/sshsync_wrapper.py Normal file
View File

@@ -0,0 +1,409 @@
#!/usr/bin/env python3
"""
SSH Sync wrapper for Tailscale SSH Sync Agent.
Python interface to sshsync CLI operations.
"""
import subprocess
import sys
from pathlib import Path
from typing import Dict, List, Optional, Tuple
import json
import logging
# Add utils to path
sys.path.insert(0, str(Path(__file__).parent))
from utils.helpers import parse_ssh_config, parse_sshsync_config, format_bytes, format_duration
from utils.validators import validate_host, validate_group, validate_path_exists, validate_timeout, validate_command
logger = logging.getLogger(__name__)
def get_host_status(group: Optional[str] = None) -> Dict:
"""
Get online/offline status of hosts.
Args:
group: Optional group to filter (None = all hosts)
Returns:
Dict with status info
Example:
>>> status = get_host_status()
>>> status['online_count']
8
"""
try:
# Run sshsync ls --with-status
cmd = ["sshsync", "ls", "--with-status"]
result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
if result.returncode != 0:
return {'error': result.stderr, 'hosts': []}
# Parse output
hosts = []
for line in result.stdout.strip().split('\n'):
if not line or line.startswith('Host') or line.startswith('---'):
continue
parts = line.split()
if len(parts) >= 2:
host_name = parts[0]
status = parts[1] if len(parts) > 1 else 'unknown'
hosts.append({
'host': host_name,
                    'online': status.lower() in ('online', 'reachable'),
'status': status
})
# Filter by group if specified
if group:
groups_config = parse_sshsync_config()
group_hosts = groups_config.get(group, [])
hosts = [h for h in hosts if h['host'] in group_hosts]
online_count = sum(1 for h in hosts if h['online'])
return {
'hosts': hosts,
'total_count': len(hosts),
'online_count': online_count,
'offline_count': len(hosts) - online_count,
'availability_pct': (online_count / len(hosts) * 100) if hosts else 0
}
except Exception as e:
logger.error(f"Error getting host status: {e}")
return {'error': str(e), 'hosts': []}
def execute_on_all(command: str, timeout: int = 10, dry_run: bool = False) -> Dict:
"""
Execute command on all hosts.
Args:
command: Command to execute
timeout: Timeout in seconds
dry_run: If True, don't actually execute
Returns:
Dict with results per host
Example:
>>> result = execute_on_all("uptime", timeout=15)
>>> len(result['results'])
10
"""
validate_command(command)
validate_timeout(timeout)
if dry_run:
return {
'dry_run': True,
'command': command,
'message': 'Would execute on all hosts'
}
try:
cmd = ["sshsync", "all", f"--timeout={timeout}", command]
result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout + 30)
# Parse results (format varies, simplified here)
return {
'success': result.returncode == 0,
'stdout': result.stdout,
'stderr': result.stderr,
'command': command
}
except subprocess.TimeoutExpired:
return {'error': f'Command timed out after {timeout}s'}
except Exception as e:
return {'error': str(e)}
def execute_on_group(group: str, command: str, timeout: int = 10, dry_run: bool = False) -> Dict:
"""
Execute command on specific group.
Args:
group: Group name
command: Command to execute
timeout: Timeout in seconds
dry_run: Preview without executing
Returns:
Dict with execution results
Example:
>>> result = execute_on_group("web-servers", "df -h /var/www")
>>> result['success']
True
"""
groups_config = parse_sshsync_config()
validate_group(group, list(groups_config.keys()))
validate_command(command)
validate_timeout(timeout)
if dry_run:
group_hosts = groups_config.get(group, [])
return {
'dry_run': True,
'group': group,
'hosts': group_hosts,
'command': command,
'message': f'Would execute on {len(group_hosts)} hosts in group {group}'
}
try:
cmd = ["sshsync", "group", f"--timeout={timeout}", group, command]
result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout + 30)
return {
'success': result.returncode == 0,
'group': group,
'stdout': result.stdout,
'stderr': result.stderr,
'command': command
}
except subprocess.TimeoutExpired:
return {'error': f'Command timed out after {timeout}s'}
except Exception as e:
return {'error': str(e)}
def execute_on_host(host: str, command: str, timeout: int = 10) -> Dict:
"""
Execute command on single host.
Args:
host: Host name
command: Command to execute
timeout: Timeout in seconds
Returns:
Dict with result
Example:
>>> result = execute_on_host("web-01", "hostname")
>>> result['stdout']
"web-01"
"""
ssh_hosts = parse_ssh_config()
validate_host(host, list(ssh_hosts.keys()))
validate_command(command)
validate_timeout(timeout)
try:
cmd = ["ssh", "-o", f"ConnectTimeout={timeout}", host, command]
result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout + 5)
return {
'success': result.returncode == 0,
'host': host,
'stdout': result.stdout,
'stderr': result.stderr,
'command': command
}
except subprocess.TimeoutExpired:
return {'error': f'Command timed out after {timeout}s'}
except Exception as e:
return {'error': str(e)}
def push_to_hosts(local_path: str, remote_path: str,
hosts: Optional[List[str]] = None,
group: Optional[str] = None,
recurse: bool = False,
dry_run: bool = False) -> Dict:
"""
Push files to hosts.
Args:
local_path: Local file/directory path
remote_path: Remote destination path
hosts: Specific hosts (None = all if group also None)
group: Group name
recurse: Recursive copy
dry_run: Preview without executing
Returns:
Dict with push results
Example:
>>> result = push_to_hosts("./dist", "/var/www/app", group="production", recurse=True)
>>> result['success']
True
"""
validate_path_exists(local_path)
if dry_run:
return {
'dry_run': True,
'local_path': local_path,
'remote_path': remote_path,
'hosts': hosts,
'group': group,
'recurse': recurse,
'message': 'Would push files'
}
try:
cmd = ["sshsync", "push"]
if hosts:
for host in hosts:
cmd.extend(["--host", host])
elif group:
cmd.extend(["--group", group])
else:
cmd.append("--all")
if recurse:
cmd.append("--recurse")
cmd.extend([local_path, remote_path])
result = subprocess.run(cmd, capture_output=True, text=True, timeout=300)
return {
'success': result.returncode == 0,
'stdout': result.stdout,
'stderr': result.stderr,
'local_path': local_path,
'remote_path': remote_path
}
except subprocess.TimeoutExpired:
return {'error': 'Push operation timed out'}
except Exception as e:
return {'error': str(e)}
def pull_from_host(host: str, remote_path: str, local_path: str,
recurse: bool = False, dry_run: bool = False) -> Dict:
"""
Pull files from host.
Args:
host: Host to pull from
remote_path: Remote file/directory path
local_path: Local destination path
recurse: Recursive copy
dry_run: Preview without executing
Returns:
Dict with pull results
Example:
>>> result = pull_from_host("web-01", "/var/log/nginx", "./logs", recurse=True)
>>> result['success']
True
"""
ssh_hosts = parse_ssh_config()
validate_host(host, list(ssh_hosts.keys()))
if dry_run:
return {
'dry_run': True,
'host': host,
'remote_path': remote_path,
'local_path': local_path,
'recurse': recurse,
'message': f'Would pull from {host}'
}
try:
cmd = ["sshsync", "pull", "--host", host]
if recurse:
cmd.append("--recurse")
cmd.extend([remote_path, local_path])
result = subprocess.run(cmd, capture_output=True, text=True, timeout=300)
return {
'success': result.returncode == 0,
'host': host,
'stdout': result.stdout,
'stderr': result.stderr,
'remote_path': remote_path,
'local_path': local_path
}
except subprocess.TimeoutExpired:
return {'error': 'Pull operation timed out'}
except Exception as e:
return {'error': str(e)}
def list_hosts(with_status: bool = True) -> Dict:
"""
List all configured hosts.
Args:
with_status: Include online/offline status
Returns:
Dict with hosts info
Example:
>>> result = list_hosts(with_status=True)
>>> len(result['hosts'])
10
"""
if with_status:
return get_host_status()
else:
ssh_hosts = parse_ssh_config()
return {
'hosts': [{'host': name} for name in ssh_hosts.keys()],
'count': len(ssh_hosts)
}
def get_groups() -> Dict[str, List[str]]:
"""
Get all defined groups and their members.
Returns:
Dict mapping group names to host lists
Example:
>>> groups = get_groups()
>>> groups['production']
['prod-web-01', 'prod-db-01']
"""
return parse_sshsync_config()
def main():
"""Test sshsync wrapper functions."""
print("Testing sshsync wrapper...\n")
print("1. List hosts:")
result = list_hosts(with_status=False)
print(f" Found {result.get('count', 0)} hosts")
print("\n2. Get groups:")
groups = get_groups()
print(f" Found {len(groups)} groups")
for group, hosts in groups.items():
print(f" - {group}: {len(hosts)} hosts")
print("\n3. Test dry-run:")
result = execute_on_all("uptime", dry_run=True)
print(f" Dry-run: {result.get('message', 'OK')}")
print("\n✅ sshsync wrapper tested")
if __name__ == "__main__":
main()

426
scripts/tailscale_manager.py Normal file
View File

@@ -0,0 +1,426 @@
#!/usr/bin/env python3
"""
Tailscale manager for Tailscale SSH Sync Agent.
Tailscale-specific operations and status management.
"""
import subprocess
import re
import json
from typing import Dict, List, Optional
from dataclasses import dataclass
import logging
logger = logging.getLogger(__name__)
@dataclass
class TailscalePeer:
"""Represents a Tailscale peer."""
hostname: str
ip: str
online: bool
last_seen: Optional[str] = None
os: Optional[str] = None
relay: Optional[str] = None
def get_tailscale_status() -> Dict:
"""
Get Tailscale network status (all peers).
Returns:
Dict with network status:
{
'connected': bool,
'peers': List[TailscalePeer],
'online_count': int,
'total_count': int,
'self_ip': str
}
Example:
>>> status = get_tailscale_status()
>>> status['online_count']
8
>>> status['peers'][0].hostname
'homelab-1'
"""
try:
# Get status in JSON format
result = subprocess.run(
["tailscale", "status", "--json"],
capture_output=True,
text=True,
timeout=10
)
if result.returncode != 0:
# Try text format if JSON fails
result = subprocess.run(
["tailscale", "status"],
capture_output=True,
text=True,
timeout=10
)
if result.returncode != 0:
return {
'connected': False,
'error': 'Tailscale not running or accessible',
'peers': []
}
# Parse text format
return _parse_text_status(result.stdout)
# Parse JSON format
data = json.loads(result.stdout)
return _parse_json_status(data)
except FileNotFoundError:
return {
'connected': False,
'error': 'Tailscale not installed',
'peers': []
}
except subprocess.TimeoutExpired:
return {
'connected': False,
'error': 'Timeout getting Tailscale status',
'peers': []
}
except Exception as e:
logger.error(f"Error getting Tailscale status: {e}")
return {
'connected': False,
'error': str(e),
'peers': []
}
def _parse_json_status(data: Dict) -> Dict:
"""Parse Tailscale JSON status."""
peers = []
self_data = data.get('Self', {})
    ips = self_data.get('TailscaleIPs') or ['unknown']
    self_ip = ips[0]
for peer_id, peer_data in data.get('Peer', {}).items():
hostname = peer_data.get('HostName', 'unknown')
ips = peer_data.get('TailscaleIPs', [])
ip = ips[0] if ips else 'unknown'
online = peer_data.get('Online', False)
os = peer_data.get('OS', 'unknown')
peers.append(TailscalePeer(
hostname=hostname,
ip=ip,
online=online,
os=os
))
online_count = sum(1 for p in peers if p.online)
return {
'connected': True,
'peers': peers,
'online_count': online_count,
'total_count': len(peers),
'self_ip': self_ip
}
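# For reference, the JSON parsed above is assumed to look roughly like this
# (abridged output of `tailscale status --json`):
#   {"Self": {"TailscaleIPs": ["100.64.1.5"]},
#    "Peer": {"nodekey:...": {"HostName": "homelab-1",
#                             "TailscaleIPs": ["100.64.1.10"],
#                             "Online": true, "OS": "linux"}}}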
def _parse_text_status(output: str) -> Dict:
"""Parse Tailscale text status output."""
peers = []
self_ip = None
for line in output.strip().split('\n'):
line = line.strip()
if not line:
continue
        # Parse format: <tailscale-ip> <hostname> <user> <os> <status> ...
        parts = line.split()
        if len(parts) >= 2:
            ip = parts[0]
            hostname = parts[1]
            # The first entry is this machine itself: record its IP and skip
            if self_ip is None:
                self_ip = ip
                continue
            # Determine online status from the remaining fields
            online = 'offline' not in line.lower()
peers.append(TailscalePeer(
hostname=hostname,
ip=ip,
online=online
))
online_count = sum(1 for p in peers if p.online)
return {
'connected': True,
'peers': peers,
'online_count': online_count,
'total_count': len(peers),
'self_ip': self_ip or 'unknown'
}
def check_connectivity(host: str, timeout: int = 5) -> bool:
"""
Ping host via Tailscale.
Args:
host: Hostname to ping
timeout: Timeout in seconds
Returns:
True if host responds to ping
Example:
>>> check_connectivity("homelab-1")
True
"""
try:
result = subprocess.run(
["tailscale", "ping", "--timeout", f"{timeout}s", "--c", "1", host],
capture_output=True,
text=True,
timeout=timeout + 2
)
# Check if ping succeeded
return result.returncode == 0 or 'pong' in result.stdout.lower()
except (FileNotFoundError, subprocess.TimeoutExpired):
return False
except Exception as e:
logger.error(f"Error pinging {host}: {e}")
return False
def get_peer_info(hostname: str) -> Optional[TailscalePeer]:
"""
Get detailed info about a specific peer.
Args:
hostname: Peer hostname
Returns:
TailscalePeer object or None if not found
Example:
>>> peer = get_peer_info("homelab-1")
>>> peer.ip
'100.64.1.10'
"""
status = get_tailscale_status()
if not status.get('connected'):
return None
for peer in status.get('peers', []):
if peer.hostname == hostname or hostname in peer.hostname:
return peer
return None
def list_online_machines() -> List[str]:
"""
List all online Tailscale machines.
Returns:
List of online machine hostnames
Example:
>>> machines = list_online_machines()
>>> len(machines)
8
"""
status = get_tailscale_status()
if not status.get('connected'):
return []
return [
peer.hostname
for peer in status.get('peers', [])
if peer.online
]
def get_machine_ip(hostname: str) -> Optional[str]:
"""
Get Tailscale IP for a machine.
Args:
hostname: Machine hostname
Returns:
IP address or None if not found
Example:
>>> ip = get_machine_ip("homelab-1")
>>> ip
'100.64.1.10'
"""
peer = get_peer_info(hostname)
return peer.ip if peer else None
def validate_tailscale_ssh(host: str, timeout: int = 10) -> Dict:
"""
Check if Tailscale SSH is working for a host.
Args:
host: Host to check
timeout: Connection timeout
Returns:
Dict with validation results:
{
'working': bool,
'message': str,
'details': Dict
}
Example:
>>> result = validate_tailscale_ssh("homelab-1")
>>> result['working']
True
"""
# First check if host is in Tailscale network
peer = get_peer_info(host)
if not peer:
return {
'working': False,
'message': f'Host {host} not found in Tailscale network',
'details': {'peer_found': False}
}
if not peer.online:
return {
'working': False,
'message': f'Host {host} is offline in Tailscale',
'details': {'peer_found': True, 'online': False}
}
# Check connectivity
if not check_connectivity(host, timeout=timeout):
return {
'working': False,
'message': f'Cannot ping {host} via Tailscale',
'details': {'peer_found': True, 'online': True, 'ping': False}
}
# Try SSH connection
try:
result = subprocess.run(
["tailscale", "ssh", host, "echo", "test"],
capture_output=True,
text=True,
timeout=timeout
)
if result.returncode == 0:
return {
'working': True,
'message': f'Tailscale SSH to {host} is working',
'details': {
'peer_found': True,
'online': True,
'ping': True,
'ssh': True,
'ip': peer.ip
}
}
else:
return {
'working': False,
'message': f'Tailscale SSH failed: {result.stderr}',
'details': {
'peer_found': True,
'online': True,
'ping': True,
'ssh': False,
'error': result.stderr
}
}
except subprocess.TimeoutExpired:
return {
'working': False,
'message': f'Tailscale SSH timed out after {timeout}s',
'details': {'timeout': True}
}
except Exception as e:
return {
'working': False,
'message': f'Error testing Tailscale SSH: {e}',
'details': {'error': str(e)}
}
def get_network_summary() -> str:
"""
Get human-readable network summary.
Returns:
Formatted summary string
Example:
>>> print(get_network_summary())
Tailscale Network: Connected
Online: 8/10 machines (80%)
Self IP: 100.64.1.5
"""
status = get_tailscale_status()
if not status.get('connected'):
return "Tailscale Network: Not connected\nError: {}".format(
status.get('error', 'Unknown error')
)
    total = status['total_count']
    pct = (status['online_count'] / total * 100) if total else 0.0
    lines = [
        "Tailscale Network: Connected",
        f"Online: {status['online_count']}/{total} machines ({pct:.0f}%)",
        f"Self IP: {status.get('self_ip', 'unknown')}"
    ]
return "\n".join(lines)
def main():
"""Test Tailscale manager functions."""
print("Testing Tailscale manager...\n")
print("1. Get Tailscale status:")
status = get_tailscale_status()
if status.get('connected'):
print(f" ✓ Connected")
print(f" Peers: {status['total_count']} total, {status['online_count']} online")
else:
print(f" ✗ Not connected: {status.get('error', 'Unknown error')}")
print("\n2. List online machines:")
machines = list_online_machines()
print(f" Found {len(machines)} online machines")
for machine in machines[:5]: # Show first 5
print(f" - {machine}")
print("\n3. Network summary:")
print(get_network_summary())
print("\n✅ Tailscale manager tested")
if __name__ == "__main__":
main()

628
scripts/utils/helpers.py Normal file
View File

@@ -0,0 +1,628 @@
#!/usr/bin/env python3
"""
Helper utilities for Tailscale SSH Sync Agent.
Provides common formatting, parsing, and utility functions.
"""
import os
import re
import subprocess
from datetime import datetime, timedelta
from pathlib import Path
from typing import Dict, List, Optional, Tuple, Any
import yaml
import logging
logger = logging.getLogger(__name__)
def format_bytes(bytes_value: int) -> str:
"""
Format bytes as human-readable string.
Args:
bytes_value: Number of bytes
Returns:
Formatted string (e.g., "12.3 MB", "1.5 GB")
Example:
>>> format_bytes(12582912)
"12.0 MB"
>>> format_bytes(1610612736)
"1.5 GB"
"""
for unit in ['B', 'KB', 'MB', 'GB', 'TB']:
if bytes_value < 1024.0:
return f"{bytes_value:.1f} {unit}"
bytes_value /= 1024.0
return f"{bytes_value:.1f} PB"
def format_duration(seconds: float) -> str:
"""
Format duration as human-readable string.
Args:
seconds: Duration in seconds
Returns:
Formatted string (e.g., "2m 15s", "1h 30m")
Example:
>>> format_duration(135)
"2m 15s"
>>> format_duration(5430)
"1h 30m 30s"
"""
if seconds < 60:
return f"{int(seconds)}s"
minutes = int(seconds // 60)
secs = int(seconds % 60)
if minutes < 60:
return f"{minutes}m {secs}s" if secs > 0 else f"{minutes}m"
hours = minutes // 60
minutes = minutes % 60
parts = [f"{hours}h"]
if minutes > 0:
parts.append(f"{minutes}m")
if secs > 0 and hours == 0: # Only show seconds if < 1 hour
parts.append(f"{secs}s")
return " ".join(parts)
def format_percentage(value: float, decimals: int = 1) -> str:
"""
Format percentage with specified decimals.
Args:
value: Percentage value (0-100)
decimals: Number of decimal places
Returns:
Formatted string (e.g., "45.5%")
Example:
>>> format_percentage(45.567)
"45.6%"
"""
return f"{value:.{decimals}f}%"
def parse_ssh_config(config_path: Optional[Path] = None) -> Dict[str, Dict[str, str]]:
"""
Parse SSH config file for host definitions.
Args:
config_path: Path to SSH config (default: ~/.ssh/config)
Returns:
Dict mapping host aliases to their configuration:
{
'host-alias': {
'hostname': '100.64.1.10',
'user': 'admin',
'port': '22',
'identityfile': '~/.ssh/id_ed25519'
}
}
Example:
>>> hosts = parse_ssh_config()
>>> hosts['homelab-1']['hostname']
'100.64.1.10'
"""
if config_path is None:
config_path = Path.home() / '.ssh' / 'config'
if not config_path.exists():
logger.warning(f"SSH config not found: {config_path}")
return {}
hosts = {}
current_host = None
try:
with open(config_path, 'r') as f:
for line in f:
line = line.strip()
# Skip comments and empty lines
if not line or line.startswith('#'):
continue
                # Host directive (first concrete alias wins; wildcard
                # patterns reset state so their settings don't attach to
                # the previous host)
                if line.lower().startswith('host '):
                    aliases = line.split()[1:]
                    if aliases and '*' not in aliases[0] and '?' not in aliases[0]:
                        current_host = aliases[0]
                        hosts[current_host] = {}
                    else:
                        current_host = None
# Configuration directives
elif current_host:
parts = line.split(maxsplit=1)
if len(parts) == 2:
key, value = parts
hosts[current_host][key.lower()] = value
return hosts
except Exception as e:
logger.error(f"Error parsing SSH config: {e}")
return {}
def parse_sshsync_config(config_path: Optional[Path] = None) -> Dict[str, List[str]]:
"""
Parse sshsync config file for group definitions.
Args:
config_path: Path to sshsync config (default: ~/.config/sshsync/config.yaml)
Returns:
Dict mapping group names to list of hosts:
{
'production': ['prod-web-01', 'prod-db-01'],
'development': ['dev-laptop', 'dev-desktop']
}
Example:
>>> groups = parse_sshsync_config()
>>> groups['production']
['prod-web-01', 'prod-db-01']
"""
if config_path is None:
config_path = Path.home() / '.config' / 'sshsync' / 'config.yaml'
if not config_path.exists():
logger.warning(f"sshsync config not found: {config_path}")
return {}
try:
with open(config_path, 'r') as f:
config = yaml.safe_load(f)
return config.get('groups', {})
except Exception as e:
logger.error(f"Error parsing sshsync config: {e}")
return {}
def get_timestamp(iso: bool = True) -> str:
"""
Get current timestamp.
Args:
iso: If True, return ISO format; otherwise human-readable
Returns:
Timestamp string
Example:
>>> get_timestamp(iso=True)
"2025-10-19T19:43:41Z"
>>> get_timestamp(iso=False)
"2025-10-19 19:43:41"
"""
    if iso:
        # The trailing 'Z' denotes UTC, so use UTC time for the ISO form
        return datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%SZ")
    else:
        return datetime.now().strftime("%Y-%m-%d %H:%M:%S")
def safe_execute(func, *args, default=None, **kwargs) -> Any:
"""
Execute function with error handling.
Args:
func: Function to execute
*args: Positional arguments
default: Value to return on error
**kwargs: Keyword arguments
Returns:
Function result or default on error
Example:
>>> safe_execute(int, "not_a_number", default=0)
0
>>> safe_execute(int, "42")
42
"""
try:
return func(*args, **kwargs)
except Exception as e:
logger.error(f"Error executing {func.__name__}: {e}")
return default
def validate_path(path: str, must_exist: bool = True) -> bool:
"""
Check if path is valid and accessible.
Args:
path: Path to validate
must_exist: If True, path must exist
Returns:
True if valid, False otherwise
Example:
>>> validate_path("/tmp")
True
>>> validate_path("/nonexistent", must_exist=True)
False
"""
p = Path(path).expanduser()
if must_exist:
return p.exists()
else:
# Check if parent directory exists (for paths that will be created)
return p.parent.exists()
def parse_disk_usage(df_output: str) -> Dict[str, Any]:
"""
Parse 'df' command output.
Args:
df_output: Output from 'df -h' command
Returns:
Dict with disk usage info:
{
'filesystem': '/dev/sda1',
'size': '100G',
'used': '45G',
'available': '50G',
'use_pct': 45,
'mount': '/'
}
Example:
>>> output = "Filesystem Size Used Avail Use% Mounted on\\n/dev/sda1 100G 45G 50G 45% /"
>>> parse_disk_usage(output)
{'filesystem': '/dev/sda1', 'size': '100G', ...}
"""
lines = df_output.strip().split('\n')
if len(lines) < 2:
return {}
# Parse last line (actual data, not header)
data_line = lines[-1]
parts = data_line.split()
if len(parts) < 6:
return {}
try:
return {
'filesystem': parts[0],
'size': parts[1],
'used': parts[2],
'available': parts[3],
'use_pct': int(parts[4].rstrip('%')),
'mount': parts[5]
}
except (ValueError, IndexError) as e:
logger.error(f"Error parsing disk usage: {e}")
return {}
def parse_memory_usage(free_output: str) -> Dict[str, Any]:
"""
Parse 'free' command output (Linux).
Args:
free_output: Output from 'free -m' command
Returns:
Dict with memory info:
{
'total': 16384, # MB
'used': 8192,
'free': 8192,
'use_pct': 50.0
}
Example:
>>> output = "Mem: 16384 8192 8192 0 0 0"
>>> parse_memory_usage(output)
{'total': 16384, 'used': 8192, ...}
"""
lines = free_output.strip().split('\n')
for line in lines:
if line.startswith('Mem:'):
parts = line.split()
if len(parts) >= 3:
try:
total = int(parts[1])
used = int(parts[2])
free = int(parts[3]) if len(parts) > 3 else (total - used)
return {
'total': total,
'used': used,
'free': free,
'use_pct': (used / total * 100) if total > 0 else 0
}
except (ValueError, IndexError) as e:
logger.error(f"Error parsing memory usage: {e}")
return {}
def parse_cpu_load(uptime_output: str) -> Dict[str, float]:
"""
Parse 'uptime' command output for load averages.
Args:
uptime_output: Output from 'uptime' command
Returns:
Dict with load averages:
{
'load_1min': 0.45,
'load_5min': 0.38,
'load_15min': 0.32
}
Example:
>>> output = "19:43:41 up 5 days, 2:15, 3 users, load average: 0.45, 0.38, 0.32"
>>> parse_cpu_load(output)
{'load_1min': 0.45, 'load_5min': 0.38, 'load_15min': 0.32}
"""
# Find "load average:" part
match = re.search(r'load average:\s+([\d.]+),\s+([\d.]+),\s+([\d.]+)', uptime_output)
if match:
try:
return {
'load_1min': float(match.group(1)),
'load_5min': float(match.group(2)),
'load_15min': float(match.group(3))
}
except ValueError as e:
logger.error(f"Error parsing CPU load: {e}")
return {}
def format_host_status(host: str, online: bool, groups: List[str],
latency: Optional[int] = None,
tailscale_connected: bool = False) -> str:
"""
Format host status as display string.
Args:
host: Host name
online: Whether host is online
groups: List of groups host belongs to
latency: Latency in ms (optional)
tailscale_connected: Tailscale connection status
Returns:
Formatted status string
Example:
>>> format_host_status("web-01", True, ["production", "web"], 25, True)
"🟢 web-01 (production, web) - Online - Tailscale: Connected | Latency: 25ms"
"""
icon = "🟢" if online else "🔴"
status = "Online" if online else "Offline"
group_str = ", ".join(groups) if groups else "no group"
parts = [f"{icon} {host} ({group_str}) - {status}"]
if tailscale_connected:
parts.append("Tailscale: Connected")
if latency is not None and online:
parts.append(f"Latency: {latency}ms")
return " - ".join(parts)
def calculate_load_score(cpu_pct: float, mem_pct: float, disk_pct: float) -> float:
"""
Calculate composite load score for a machine.
Args:
cpu_pct: CPU usage percentage (0-100)
mem_pct: Memory usage percentage (0-100)
disk_pct: Disk usage percentage (0-100)
Returns:
Load score (0-1, lower is better)
Formula:
score = (cpu * 0.4) + (mem * 0.3) + (disk * 0.3)
Example:
>>> calculate_load_score(45, 60, 40)
0.48 # (0.45*0.4 + 0.60*0.3 + 0.40*0.3)
"""
return (cpu_pct * 0.4 + mem_pct * 0.3 + disk_pct * 0.3) / 100
def classify_load_status(score: float) -> str:
"""
Classify load score into status category.
Args:
score: Load score (0-1)
Returns:
Status string: "low", "moderate", or "high"
Example:
>>> classify_load_status(0.28)
"low"
>>> classify_load_status(0.55)
"moderate"
>>> classify_load_status(0.82)
"high"
"""
if score < 0.4:
return "low"
elif score < 0.7:
return "moderate"
else:
return "high"
def classify_latency(latency_ms: int) -> Tuple[str, str]:
"""
Classify network latency.
Args:
latency_ms: Latency in milliseconds
Returns:
Tuple of (status, description)
Example:
>>> classify_latency(25)
("excellent", "Ideal for interactive tasks")
>>> classify_latency(150)
("fair", "May impact interactive workflows")
"""
if latency_ms < 50:
return ("excellent", "Ideal for interactive tasks")
elif latency_ms < 100:
return ("good", "Suitable for most operations")
elif latency_ms < 200:
return ("fair", "May impact interactive workflows")
else:
return ("poor", "Investigate network issues")
def get_hosts_from_groups(group: str, groups_config: Dict[str, List[str]]) -> List[str]:
"""
Get list of hosts in a group.
Args:
group: Group name
groups_config: Groups configuration dict
Returns:
List of host names in group
Example:
>>> groups = {'production': ['web-01', 'db-01']}
>>> get_hosts_from_groups('production', groups)
['web-01', 'db-01']
"""
return groups_config.get(group, [])
def get_groups_for_host(host: str, groups_config: Dict[str, List[str]]) -> List[str]:
"""
Get list of groups a host belongs to.
Args:
host: Host name
groups_config: Groups configuration dict
Returns:
List of group names
Example:
>>> groups = {'production': ['web-01'], 'web': ['web-01', 'web-02']}
>>> get_groups_for_host('web-01', groups)
['production', 'web']
"""
return [group for group, hosts in groups_config.items() if host in hosts]
def run_command(command: str, timeout: int = 10) -> Tuple[bool, str, str]:
"""
Run shell command with timeout.
Args:
command: Command to execute
timeout: Timeout in seconds
Returns:
Tuple of (success, stdout, stderr)
Example:
>>> success, stdout, stderr = run_command("echo hello")
>>> success
True
>>> stdout.strip()
"hello"
"""
try:
result = subprocess.run(
command,
shell=True,
capture_output=True,
text=True,
timeout=timeout
)
return (
result.returncode == 0,
result.stdout,
result.stderr
)
except subprocess.TimeoutExpired:
return (False, "", f"Command timed out after {timeout}s")
except Exception as e:
return (False, "", str(e))
def main():
"""Test helper functions."""
print("Testing helper functions...\n")
# Test formatting
print("1. Format bytes:")
print(f" 12582912 bytes = {format_bytes(12582912)}")
print(f" 1610612736 bytes = {format_bytes(1610612736)}")
print("\n2. Format duration:")
print(f" 135 seconds = {format_duration(135)}")
print(f" 5430 seconds = {format_duration(5430)}")
print("\n3. Format percentage:")
print(f" 45.567 = {format_percentage(45.567)}")
print("\n4. Calculate load score:")
score = calculate_load_score(45, 60, 40)
print(f" CPU 45%, Mem 60%, Disk 40% = {score:.2f}")
print(f" Status: {classify_load_status(score)}")
print("\n5. Classify latency:")
latencies = [25, 75, 150, 250]
for lat in latencies:
status, desc = classify_latency(lat)
print(f" {lat}ms: {status} - {desc}")
print("\n6. Parse SSH config:")
ssh_hosts = parse_ssh_config()
print(f" Found {len(ssh_hosts)} hosts")
print("\n7. Parse sshsync config:")
groups = parse_sshsync_config()
print(f" Found {len(groups)} groups")
for group, hosts in groups.items():
print(f" - {group}: {len(hosts)} hosts")
print("\n✅ All helpers tested successfully")
if __name__ == "__main__":
main()

43
scripts/utils/validators/__init__.py Normal file
View File

@@ -0,0 +1,43 @@
"""
Validators package for Tailscale SSH Sync Agent.
"""
from .parameter_validator import (
ValidationError,
validate_host,
validate_group,
validate_path_exists,
validate_timeout,
validate_command
)
from .host_validator import (
validate_ssh_config,
validate_host_reachable,
validate_group_members,
get_invalid_hosts
)
from .connection_validator import (
validate_ssh_connection,
validate_tailscale_connection,
validate_ssh_key,
get_connection_diagnostics
)
__all__ = [
'ValidationError',
'validate_host',
'validate_group',
'validate_path_exists',
'validate_timeout',
'validate_command',
'validate_ssh_config',
'validate_host_reachable',
'validate_group_members',
'get_invalid_hosts',
'validate_ssh_connection',
'validate_tailscale_connection',
'validate_ssh_key',
'get_connection_diagnostics',
]
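# Typical usage (sketch):
#   from utils.validators import validate_host, ValidationError
#   try:
#       validate_host("web-03", ["web-01", "web-02"])
#   except ValidationError as e:
#       print(e)  # message includes valid options and close-match suggestions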

275
scripts/utils/validators/connection_validator.py Normal file
View File

@@ -0,0 +1,275 @@
#!/usr/bin/env python3
"""
Connection validators for Tailscale SSH Sync Agent.
Validates SSH and Tailscale connections.
"""
import subprocess
from typing import Any, Dict, Optional
import logging
from .parameter_validator import ValidationError
logger = logging.getLogger(__name__)
def validate_ssh_connection(host: str, timeout: int = 10) -> bool:
"""
Test SSH connection works.
Args:
host: Host to connect to
timeout: Connection timeout in seconds
Returns:
True if SSH connection successful
Raises:
ValidationError: If connection fails
Example:
>>> validate_ssh_connection("web-01")
True
"""
try:
# Try to execute a simple command via SSH
result = subprocess.run(
["ssh", "-o", "ConnectTimeout={}".format(timeout),
"-o", "BatchMode=yes",
"-o", "StrictHostKeyChecking=no",
host, "echo", "test"],
capture_output=True,
text=True,
timeout=timeout + 5
)
if result.returncode == 0:
return True
else:
# Parse error message
error_msg = result.stderr.strip()
if "Permission denied" in error_msg:
raise ValidationError(
f"SSH authentication failed for '{host}'\n"
"Check:\n"
"1. SSH key is added: ssh-add -l\n"
"2. Public key is on remote: cat ~/.ssh/authorized_keys\n"
"3. User/key in SSH config is correct"
)
elif "Connection refused" in error_msg:
raise ValidationError(
f"SSH connection refused for '{host}'\n"
"Check:\n"
"1. SSH server is running on remote\n"
"2. Port 22 is not blocked by firewall"
)
elif "Connection timed out" in error_msg or "timeout" in error_msg.lower():
raise ValidationError(
f"SSH connection timed out for '{host}'\n"
"Check:\n"
"1. Host is reachable (ping test)\n"
"2. Tailscale is connected\n"
"3. Network connectivity"
)
else:
raise ValidationError(
f"SSH connection failed for '{host}': {error_msg}"
)
except subprocess.TimeoutExpired:
raise ValidationError(
f"SSH connection timed out for '{host}' (>{timeout}s)"
)
except Exception as e:
raise ValidationError(f"Error testing SSH connection to '{host}': {e}")
def validate_tailscale_connection(host: str) -> bool:
"""
Test Tailscale connectivity to host.
Args:
host: Host to check
Returns:
True if Tailscale connection active
Raises:
ValidationError: If Tailscale not connected
Example:
>>> validate_tailscale_connection("web-01")
True
"""
try:
# Check if tailscale is running
result = subprocess.run(
["tailscale", "status"],
capture_output=True,
text=True,
timeout=5
)
if result.returncode != 0:
raise ValidationError(
"Tailscale is not running\n"
"Start Tailscale: sudo tailscale up"
)
# Check if specific host is in the network
if host in result.stdout or host.replace('-', '.') in result.stdout:
return True
else:
raise ValidationError(
f"Host '{host}' not found in Tailscale network\n"
"Ensure host is:\n"
"1. Connected to Tailscale\n"
"2. In the same tailnet\n"
"3. Not expired/offline"
)
except FileNotFoundError:
raise ValidationError(
"Tailscale not installed\n"
"Install: https://tailscale.com/download"
)
except subprocess.TimeoutExpired:
raise ValidationError("Timeout checking Tailscale status")
except Exception as e:
raise ValidationError(f"Error checking Tailscale connection: {e}")
def validate_ssh_key(host: str) -> bool:
"""
Check SSH key authentication is working.
Args:
host: Host to check
Returns:
True if SSH key auth works
Raises:
ValidationError: If key auth fails
Example:
>>> validate_ssh_key("web-01")
True
"""
try:
# Test connection with explicit key-only auth
result = subprocess.run(
["ssh", "-o", "BatchMode=yes",
"-o", "PasswordAuthentication=no",
"-o", "ConnectTimeout=5",
host, "echo", "test"],
capture_output=True,
text=True,
timeout=10
)
if result.returncode == 0:
return True
else:
error_msg = result.stderr.strip()
if "Permission denied" in error_msg:
raise ValidationError(
f"SSH key authentication failed for '{host}'\n"
"Fix:\n"
"1. Add your SSH key: ssh-add ~/.ssh/id_ed25519\n"
"2. Copy public key to remote: ssh-copy-id {}\n"
"3. Verify: ssh -v {} 2>&1 | grep -i 'offering public key'".format(host, host)
)
else:
raise ValidationError(
f"SSH key validation failed for '{host}': {error_msg}"
)
except subprocess.TimeoutExpired:
raise ValidationError(f"Timeout validating SSH key for '{host}'")
except Exception as e:
raise ValidationError(f"Error validating SSH key for '{host}': {e}")
def get_connection_diagnostics(host: str) -> Dict[str, Any]:
"""
Comprehensive connection testing.
Args:
host: Host to diagnose
Returns:
Dict with diagnostic results:
{
'ping': {'success': bool, 'message': str},
'ssh': {'success': bool, 'message': str},
'tailscale': {'success': bool, 'message': str},
'ssh_key': {'success': bool, 'message': str}
}
Example:
>>> diag = get_connection_diagnostics("web-01")
>>> diag['ssh']['success']
True
"""
diagnostics = {}
# Test 1: Ping
try:
result = subprocess.run(
["ping", "-c", "1", "-W", "2", host],
capture_output=True,
timeout=3
)
diagnostics['ping'] = {
'success': result.returncode == 0,
'message': 'Host is reachable' if result.returncode == 0 else 'Host not reachable'
}
except Exception as e:
diagnostics['ping'] = {'success': False, 'message': str(e)}
# Test 2: SSH connection
try:
validate_ssh_connection(host, timeout=5)
diagnostics['ssh'] = {'success': True, 'message': 'SSH connection works'}
except ValidationError as e:
diagnostics['ssh'] = {'success': False, 'message': str(e).split('\n')[0]}
# Test 3: Tailscale
try:
validate_tailscale_connection(host)
diagnostics['tailscale'] = {'success': True, 'message': 'Tailscale connected'}
except ValidationError as e:
diagnostics['tailscale'] = {'success': False, 'message': str(e).split('\n')[0]}
# Test 4: SSH key
try:
validate_ssh_key(host)
diagnostics['ssh_key'] = {'success': True, 'message': 'SSH key authentication works'}
except ValidationError as e:
diagnostics['ssh_key'] = {'success': False, 'message': str(e).split('\n')[0]}
return diagnostics
def main():
"""Test connection validators."""
print("Testing connection validators...\n")
print("1. Testing connection diagnostics:")
try:
diag = get_connection_diagnostics("localhost")
print(" Results:")
for test, result in diag.items():
status = "" if result['success'] else ""
print(f" {status} {test}: {result['message']}")
except Exception as e:
print(f" Error: {e}")
print("\n✅ Connection validators tested")
if __name__ == "__main__":
main()

232
scripts/utils/validators/host_validator.py Normal file
View File

@@ -0,0 +1,232 @@
#!/usr/bin/env python3
"""
Host validators for Tailscale SSH Sync Agent.
Validates host configuration and availability.
"""
import subprocess
from typing import List, Dict, Optional
from pathlib import Path
import logging
from .parameter_validator import ValidationError
logger = logging.getLogger(__name__)
def validate_ssh_config(host: str, config_path: Optional[Path] = None) -> bool:
"""
Check if host has SSH config entry.
Args:
host: Host name to check
config_path: Path to SSH config (default: ~/.ssh/config)
Returns:
True if host is in SSH config
Raises:
ValidationError: If host not found in config
Example:
>>> validate_ssh_config("web-01")
True
"""
if config_path is None:
config_path = Path.home() / '.ssh' / 'config'
if not config_path.exists():
raise ValidationError(
f"SSH config file not found: {config_path}\n"
"Create ~/.ssh/config with your host definitions"
)
# Parse SSH config for this host
host_found = False
try:
with open(config_path, 'r') as f:
for line in f:
line = line.strip()
                # Match the alias exactly (substring matching would let
                # 'web' match 'web-01')
                if line.lower().startswith('host ') and host in line.split()[1:]:
                    host_found = True
                    break
if not host_found:
raise ValidationError(
f"Host '{host}' not found in SSH config: {config_path}\n"
"Add host to SSH config:\n"
f"Host {host}\n"
f" HostName <IP_ADDRESS>\n"
f" User <USERNAME>"
)
return True
except IOError as e:
raise ValidationError(f"Error reading SSH config: {e}")
def validate_host_reachable(host: str, timeout: int = 5) -> bool:
"""
Check if host is reachable via ping.
Args:
host: Host name to check
timeout: Timeout in seconds
Returns:
True if host is reachable
Raises:
ValidationError: If host is not reachable
Example:
>>> validate_host_reachable("web-01", timeout=5)
True
"""
try:
# Try to resolve via SSH config first
result = subprocess.run(
["ssh", "-G", host],
capture_output=True,
text=True,
timeout=2
)
if result.returncode == 0:
# Extract hostname from SSH config
for line in result.stdout.split('\n'):
if line.startswith('hostname '):
actual_host = line.split()[1]
break
else:
actual_host = host
else:
actual_host = host
# Ping the host
ping_result = subprocess.run(
["ping", "-c", "1", "-W", str(timeout), actual_host],
capture_output=True,
text=True,
timeout=timeout + 1
)
if ping_result.returncode == 0:
return True
else:
raise ValidationError(
f"Host '{host}' ({actual_host}) is not reachable\n"
"Check:\n"
"1. Host is powered on\n"
"2. Tailscale is connected\n"
"3. Network connectivity"
)
except subprocess.TimeoutExpired:
raise ValidationError(f"Timeout checking host '{host}' (>{timeout}s)")
except Exception as e:
raise ValidationError(f"Error checking host '{host}': {e}")
def validate_group_members(group: str, groups_config: Dict[str, List[str]]) -> List[str]:
"""
Ensure group has valid members.
Args:
group: Group name
groups_config: Groups configuration dict
Returns:
List of valid hosts in group
Raises:
ValidationError: If group is empty or has no valid members
Example:
>>> groups = {'production': ['web-01', 'db-01']}
>>> validate_group_members('production', groups)
['web-01', 'db-01']
"""
if group not in groups_config:
raise ValidationError(
f"Group '{group}' not found in configuration\n"
f"Available groups: {', '.join(groups_config.keys())}"
)
members = groups_config[group]
if not members:
raise ValidationError(
f"Group '{group}' has no members\n"
f"Add hosts to group with: sshsync gadd {group}"
)
if not isinstance(members, list):
raise ValidationError(
f"Invalid group configuration for '{group}': members must be a list"
)
return members
def get_invalid_hosts(hosts: List[str], config_path: Optional[Path] = None) -> List[str]:
"""
Find hosts without valid SSH config.
Args:
hosts: List of host names
config_path: Path to SSH config
Returns:
List of hosts without valid config
Example:
>>> get_invalid_hosts(["web-01", "nonexistent"])
["nonexistent"]
"""
if config_path is None:
config_path = Path.home() / '.ssh' / 'config'
if not config_path.exists():
return hosts # All invalid if no config
# Parse SSH config
valid_hosts = set()
try:
with open(config_path, 'r') as f:
for line in f:
line = line.strip()
if line.lower().startswith('host '):
host_alias = line.split(maxsplit=1)[1]
if '*' not in host_alias and '?' not in host_alias:
valid_hosts.add(host_alias)
except IOError:
return hosts
# Find invalid hosts
return [h for h in hosts if h not in valid_hosts]
def main():
"""Test host validators."""
print("Testing host validators...\n")
print("1. Testing validate_ssh_config():")
try:
validate_ssh_config("localhost")
print(" ✓ localhost has SSH config")
except ValidationError as e:
print(f" Note: {e.args[0].split(chr(10))[0]}")
print("\n2. Testing get_invalid_hosts():")
test_hosts = ["localhost", "nonexistent-host-12345"]
invalid = get_invalid_hosts(test_hosts)
print(f" Invalid hosts: {invalid}")
print("\n✅ Host validators tested")
if __name__ == "__main__":
main()

363
scripts/utils/validators/parameter_validator.py Normal file
View File

@@ -0,0 +1,363 @@
#!/usr/bin/env python3
"""
Parameter validators for Tailscale SSH Sync Agent.
Validates user inputs before making operations.
"""
from typing import List, Optional
from pathlib import Path
import re
import logging
logger = logging.getLogger(__name__)
class ValidationError(Exception):
"""Raised when validation fails."""
pass
def validate_host(host: str, valid_hosts: Optional[List[str]] = None) -> str:
"""
Validate host parameter.
Args:
host: Host name or alias
valid_hosts: List of valid hosts (None to skip check)
Returns:
str: Validated and normalized host name
Raises:
ValidationError: If host is invalid
Example:
>>> validate_host("web-01")
"web-01"
>>> validate_host("web-01", ["web-01", "web-02"])
"web-01"
"""
if not host:
raise ValidationError("Host cannot be empty")
if not isinstance(host, str):
raise ValidationError(f"Host must be string, got {type(host)}")
# Normalize (strip whitespace, lowercase for comparison)
host = host.strip()
# Basic validation: alphanumeric, dash, underscore, dot
if not re.match(r'^[a-zA-Z0-9._-]+$', host):
raise ValidationError(
f"Invalid host name format: {host}\n"
"Host names must contain only letters, numbers, dots, dashes, and underscores"
)
# Check if valid (if list provided)
if valid_hosts:
# Try exact match first
if host in valid_hosts:
return host
# Try case-insensitive match
for valid_host in valid_hosts:
if host.lower() == valid_host.lower():
return valid_host
# Not found - provide suggestions
suggestions = [h for h in valid_hosts if host[:3].lower() in h.lower()]
raise ValidationError(
f"Invalid host: {host}\n"
f"Valid options: {', '.join(valid_hosts[:10])}\n"
+ (f"Did you mean: {', '.join(suggestions[:3])}?" if suggestions else "")
)
return host
def validate_group(group: str, valid_groups: Optional[List[str]] = None) -> str:
"""
Validate group parameter.
Args:
group: Group name
valid_groups: List of valid groups (None to skip check)
Returns:
str: Validated group name
Raises:
ValidationError: If group is invalid
Example:
>>> validate_group("production")
"production"
>>> validate_group("prod", ["production", "development"])
ValidationError: Invalid group: prod
"""
if not group:
raise ValidationError("Group cannot be empty")
if not isinstance(group, str):
raise ValidationError(f"Group must be string, got {type(group)}")
# Normalize
group = group.strip().lower()
# Basic validation
if not re.match(r'^[a-z0-9_-]+$', group):
raise ValidationError(
f"Invalid group name format: {group}\n"
"Group names must contain only lowercase letters, numbers, dashes, and underscores"
)
# Check if valid (if list provided)
if valid_groups:
if group not in valid_groups:
suggestions = [g for g in valid_groups if group[:3] in g]
raise ValidationError(
f"Invalid group: {group}\n"
f"Valid groups: {', '.join(valid_groups)}\n"
+ (f"Did you mean: {', '.join(suggestions[:3])}?" if suggestions else "")
)
return group
def validate_path_exists(path: str, must_be_file: bool = False,
must_be_dir: bool = False) -> Path:
"""
Validate path exists and is accessible.
Args:
path: Path to validate
must_be_file: If True, path must be a file
must_be_dir: If True, path must be a directory
Returns:
Path: Validated Path object
Raises:
ValidationError: If path is invalid
Example:
>>> validate_path_exists("/tmp", must_be_dir=True)
Path('/tmp')
>>> validate_path_exists("/nonexistent")
ValidationError: Path does not exist: /nonexistent
"""
if not path:
raise ValidationError("Path cannot be empty")
p = Path(path).expanduser().resolve()
if not p.exists():
raise ValidationError(
f"Path does not exist: {path}\n"
f"Resolved to: {p}"
)
if must_be_file and not p.is_file():
raise ValidationError(f"Path must be a file: {path}")
if must_be_dir and not p.is_dir():
raise ValidationError(f"Path must be a directory: {path}")
return p
def validate_timeout(timeout: int, min_timeout: int = 1,
max_timeout: int = 600) -> int:
"""
Validate timeout parameter.
Args:
timeout: Timeout in seconds
min_timeout: Minimum allowed timeout
max_timeout: Maximum allowed timeout
Returns:
int: Validated timeout
Raises:
ValidationError: If timeout is invalid
Example:
>>> validate_timeout(10)
10
>>> validate_timeout(0)
ValidationError: Timeout must be between 1 and 600 seconds
"""
if not isinstance(timeout, int):
raise ValidationError(f"Timeout must be integer, got {type(timeout)}")
if timeout < min_timeout:
raise ValidationError(
f"Timeout too low: {timeout}s (minimum: {min_timeout}s)"
)
if timeout > max_timeout:
raise ValidationError(
f"Timeout too high: {timeout}s (maximum: {max_timeout}s)"
)
return timeout
def validate_command(command: str, allow_dangerous: bool = False) -> str:
"""
Basic command safety validation.
Args:
command: Command to validate
allow_dangerous: If False, block potentially dangerous commands
Returns:
str: Validated command
Raises:
ValidationError: If command is invalid or dangerous
Example:
>>> validate_command("ls -la")
"ls -la"
>>> validate_command("rm -rf /", allow_dangerous=False)
ValidationError: Potentially dangerous command blocked: rm -rf
"""
if not command:
raise ValidationError("Command cannot be empty")
if not isinstance(command, str):
raise ValidationError(f"Command must be string, got {type(command)}")
command = command.strip()
if not allow_dangerous:
# Check for dangerous patterns
dangerous_patterns = [
(r'\brm\s+-rf\s+/', "rm -rf on root directory"),
(r'\bmkfs\.', "filesystem formatting"),
(r'\bdd\s+.*of=/dev/', "disk writing with dd"),
(r':(){:|:&};:', "fork bomb"),
(r'>\s*/dev/sd[a-z]', "direct disk writing"),
]
for pattern, description in dangerous_patterns:
if re.search(pattern, command, re.IGNORECASE):
raise ValidationError(
f"Potentially dangerous command blocked: {description}\n"
f"Command: {command}\n"
"Use allow_dangerous=True if you really want to execute this"
)
return command
def validate_hosts_list(hosts: List[str], valid_hosts: Optional[List[str]] = None) -> List[str]:
"""
Validate a list of hosts.
Args:
hosts: List of host names
valid_hosts: List of valid hosts (None to skip check)
Returns:
List[str]: Validated host names
Raises:
ValidationError: If any host is invalid
Example:
>>> validate_hosts_list(["web-01", "web-02"])
["web-01", "web-02"]
"""
if not hosts:
raise ValidationError("Hosts list cannot be empty")
if not isinstance(hosts, list):
raise ValidationError(f"Hosts must be list, got {type(hosts)}")
validated = []
errors = []
for host in hosts:
try:
validated.append(validate_host(host, valid_hosts))
except ValidationError as e:
errors.append(str(e))
if errors:
raise ValidationError(
f"Invalid hosts in list:\n" + "\n".join(errors)
)
return validated
def main():
"""Test validators."""
print("Testing parameter validators...\n")
# Test host validation
print("1. Testing validate_host():")
try:
host = validate_host("web-01", ["web-01", "web-02", "db-01"])
print(f" ✓ Valid host: {host}")
except ValidationError as e:
print(f" ✗ Error: {e}")
try:
host = validate_host("invalid-host", ["web-01", "web-02"])
print(f" ✗ Should have failed!")
except ValidationError as e:
print(f" ✓ Correctly rejected: {e.args[0].split(chr(10))[0]}")
# Test group validation
print("\n2. Testing validate_group():")
try:
group = validate_group("production", ["production", "development"])
print(f" ✓ Valid group: {group}")
except ValidationError as e:
print(f" ✗ Error: {e}")
# Test path validation
print("\n3. Testing validate_path_exists():")
try:
path = validate_path_exists("/tmp", must_be_dir=True)
print(f" ✓ Valid path: {path}")
except ValidationError as e:
print(f" ✗ Error: {e}")
# Test timeout validation
print("\n4. Testing validate_timeout():")
try:
timeout = validate_timeout(10)
print(f" ✓ Valid timeout: {timeout}s")
except ValidationError as e:
print(f" ✗ Error: {e}")
try:
timeout = validate_timeout(0)
print(f" ✗ Should have failed!")
except ValidationError as e:
print(f" ✓ Correctly rejected: {e.args[0].split(chr(10))[0]}")
# Test command validation
print("\n5. Testing validate_command():")
try:
cmd = validate_command("ls -la")
print(f" ✓ Safe command: {cmd}")
except ValidationError as e:
print(f" ✗ Error: {e}")
try:
cmd = validate_command("rm -rf /", allow_dangerous=False)
print(f" ✗ Should have failed!")
except ValidationError as e:
print(f" ✓ Correctly blocked: {e.args[0].split(chr(10))[0]}")
print("\n✅ All parameter validators tested")
if __name__ == "__main__":
main()

445
scripts/workflow_executor.py Normal file
View File

@@ -0,0 +1,445 @@
#!/usr/bin/env python3
"""
Workflow executor for Tailscale SSH Sync Agent.
Common multi-machine workflow automation.
"""
import sys
from pathlib import Path
from typing import Dict, List, Optional
import time
import logging
# Add utils to path
sys.path.insert(0, str(Path(__file__).parent))
from utils.helpers import format_duration, get_timestamp
from sshsync_wrapper import execute_on_group, execute_on_host, push_to_hosts
from load_balancer import get_group_capacity
logger = logging.getLogger(__name__)
def deploy_workflow(code_path: str,
staging_group: str,
prod_group: str,
run_tests: bool = True) -> Dict:
"""
Full deployment pipeline: staging → test → production.
Args:
code_path: Path to code to deploy
staging_group: Staging server group
prod_group: Production server group
run_tests: Whether to run tests on staging
Returns:
Dict with deployment results
Example:
>>> result = deploy_workflow("./dist", "staging", "production")
>>> result['success']
True
>>> result['duration']
"12m 45s"
"""
start_time = time.time()
results = {
'stages': {},
'success': False,
'start_time': get_timestamp()
}
try:
# Stage 1: Deploy to staging
logger.info("Stage 1: Deploying to staging...")
stage1 = push_to_hosts(
local_path=code_path,
remote_path="/var/www/app",
group=staging_group,
recurse=True
)
results['stages']['staging_deploy'] = stage1
if not stage1.get('success'):
results['error'] = 'Staging deployment failed'
return results
# Build on staging
logger.info("Building on staging...")
build_result = execute_on_group(
staging_group,
"cd /var/www/app && npm run build",
timeout=300
)
results['stages']['staging_build'] = build_result
if not build_result.get('success'):
results['error'] = 'Staging build failed'
return results
# Stage 2: Run tests (if enabled)
if run_tests:
logger.info("Stage 2: Running tests...")
test_result = execute_on_group(
staging_group,
"cd /var/www/app && npm test",
timeout=600
)
results['stages']['tests'] = test_result
if not test_result.get('success'):
results['error'] = 'Tests failed on staging'
return results
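        # Note: the "|| echo" below keeps this command from ever failing, so
        # staging validation is advisory: it is recorded in the results, but a
        # bad health check does not abort the deploy.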
# Stage 3: Validation
logger.info("Stage 3: Validating staging...")
health_result = execute_on_group(
staging_group,
"curl -f http://localhost:3000/health || echo 'Health check failed'",
timeout=10
)
results['stages']['staging_validation'] = health_result
# Stage 4: Deploy to production
logger.info("Stage 4: Deploying to production...")
prod_deploy = push_to_hosts(
local_path=code_path,
remote_path="/var/www/app",
group=prod_group,
recurse=True
)
results['stages']['production_deploy'] = prod_deploy
if not prod_deploy.get('success'):
results['error'] = 'Production deployment failed'
return results
# Build and restart on production
logger.info("Building and restarting production...")
prod_build = execute_on_group(
prod_group,
"cd /var/www/app && npm run build && pm2 restart app",
timeout=300
)
results['stages']['production_build'] = prod_build
# Stage 5: Production verification
logger.info("Stage 5: Verifying production...")
prod_health = execute_on_group(
prod_group,
"curl -f http://localhost:3000/health",
timeout=15
)
results['stages']['production_verification'] = prod_health
# Success!
results['success'] = True
results['duration'] = format_duration(time.time() - start_time)
return results
except Exception as e:
logger.error(f"Deployment workflow error: {e}")
results['error'] = str(e)
results['duration'] = format_duration(time.time() - start_time)
return results
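# Usage sketch (group names are illustrative and must exist in the sshsync
# config). Early failures return with 'error' set and may omit the
# 'duration' key, so inspect the result defensively:
#
#   result = deploy_workflow("./dist", "staging", "production")
#   if result['success']:
#       print(f"Deployed in {result['duration']}")
#   else:
#       print(f"Failed: {result.get('error')}")
#       for stage, outcome in result['stages'].items():
#           print(f"  {stage}: {'ok' if outcome.get('success') else 'FAILED'}")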
def backup_workflow(hosts: List[str],
backup_paths: List[str],
destination: str) -> Dict:
"""
Backup files from multiple hosts.
Args:
hosts: List of hosts to backup from
backup_paths: Paths to backup on each host
destination: Local destination directory
Returns:
Dict with backup results
Example:
>>> result = backup_workflow(
... ["db-01", "db-02"],
... ["/var/lib/mysql"],
... "./backups"
... )
>>> result['backed_up_hosts']
2
"""
from sshsync_wrapper import pull_from_host
start_time = time.time()
results = {
'hosts': {},
'success': True,
'backed_up_hosts': 0
}
for host in hosts:
host_results = []
for backup_path in backup_paths:
# Create timestamped backup directory
timestamp = time.strftime("%Y%m%d_%H%M%S")
host_dest = f"{destination}/{host}_{timestamp}"
result = pull_from_host(
host=host,
remote_path=backup_path,
local_path=host_dest,
recurse=True
)
host_results.append(result)
if not result.get('success'):
results['success'] = False
results['hosts'][host] = host_results
if all(r.get('success') for r in host_results):
results['backed_up_hosts'] += 1
results['duration'] = format_duration(time.time() - start_time)
return results
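# Resulting layout (illustrative): each pull lands in its own timestamped
# directory, e.g. ./backups/db-01_20251019_143000/. Because the timestamp is
# taken per pull, multiple paths from the same host may land in directories
# with slightly different timestamps.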
def sync_workflow(source_host: str,
target_group: str,
paths: List[str]) -> Dict:
"""
Sync files from one host to many.
Args:
source_host: Host to pull from
target_group: Group to push to
paths: Paths to sync
Returns:
Dict with sync results
Example:
>>> result = sync_workflow(
... "master-db",
... "replica-dbs",
... ["/var/lib/mysql/config"]
... )
>>> result['success']
True
"""
from sshsync_wrapper import pull_from_host, push_to_hosts
import tempfile
import shutil
start_time = time.time()
results = {'paths': {}, 'success': True}
# Create temp directory
with tempfile.TemporaryDirectory() as temp_dir:
for path in paths:
# Pull from source
pull_result = pull_from_host(
host=source_host,
remote_path=path,
local_path=f"{temp_dir}/{Path(path).name}",
recurse=True
)
if not pull_result.get('success'):
results['paths'][path] = {
'success': False,
'error': 'Pull from source failed'
}
results['success'] = False
continue
# Push to targets
push_result = push_to_hosts(
local_path=f"{temp_dir}/{Path(path).name}",
remote_path=path,
group=target_group,
recurse=True
)
results['paths'][path] = {
'pull': pull_result,
'push': push_result,
'success': push_result.get('success', False)
}
if not push_result.get('success'):
results['success'] = False
results['duration'] = format_duration(time.time() - start_time)
return results
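# Design note: files are staged through a local TemporaryDirectory, so the
# machine running this workflow needs enough free disk for the largest path
# being synced; the staging copy is removed automatically when the context
# manager exits.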
def rolling_restart(group: str,
service_name: str,
wait_between: int = 30) -> Dict:
"""
Zero-downtime rolling restart of a service across group.
Args:
group: Group to restart
service_name: Service name (e.g., "nginx", "app")
wait_between: Seconds to wait between restarts
Returns:
Dict with restart results
Example:
>>> result = rolling_restart("web-servers", "nginx")
>>> result['restarted_count']
3
"""
from utils.helpers import parse_sshsync_config
start_time = time.time()
groups_config = parse_sshsync_config()
hosts = groups_config.get(group, [])
if not hosts:
return {
'success': False,
'error': f'Group {group} not found or empty'
}
results = {
'hosts': {},
'restarted_count': 0,
'failed_count': 0,
'success': True
}
for host in hosts:
logger.info(f"Restarting {service_name} on {host}...")
# Restart service
restart_result = execute_on_host(
host,
f"sudo systemctl restart {service_name} || sudo service {service_name} restart",
timeout=30
)
# Health check
time.sleep(5) # Wait for service to start
health_result = execute_on_host(
host,
f"sudo systemctl is-active {service_name} || sudo service {service_name} status",
timeout=10
)
success = restart_result.get('success') and health_result.get('success')
results['hosts'][host] = {
'restart': restart_result,
'health': health_result,
'success': success
}
if success:
results['restarted_count'] += 1
logger.info(f"{host} restarted successfully")
else:
results['failed_count'] += 1
results['success'] = False
logger.error(f"{host} restart failed")
# Wait before next restart (except last)
if host != hosts[-1]:
time.sleep(wait_between)
results['duration'] = format_duration(time.time() - start_time)
return results
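# Design note: hosts restart strictly one at a time, and a failed restart
# does not stop the roll; it is counted in results['failed_count'] and the
# loop continues. Tune wait_between to however long the service needs to
# become healthy before the next node goes down.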
def health_check_workflow(group: str,
endpoint: str = "/health",
timeout: int = 10) -> Dict:
"""
Check health endpoint across group.
Args:
group: Group to check
endpoint: Health endpoint path
timeout: Request timeout
Returns:
Dict with health check results
Example:
>>> result = health_check_workflow("production", "/health")
>>> result['healthy_count']
3
"""
from utils.helpers import parse_sshsync_config
groups_config = parse_sshsync_config()
hosts = groups_config.get(group, [])
if not hosts:
return {
'success': False,
'error': f'Group {group} not found or empty'
}
results = {
'hosts': {},
'healthy_count': 0,
'unhealthy_count': 0
}
for host in hosts:
health_result = execute_on_host(
host,
f"curl -f -s -o /dev/null -w '%{{http_code}}' http://localhost:3000{endpoint}",
timeout=timeout
)
is_healthy = (
health_result.get('success') and
'200' in health_result.get('stdout', '')
)
results['hosts'][host] = {
'healthy': is_healthy,
'response': health_result.get('stdout', '').strip()
}
if is_healthy:
results['healthy_count'] += 1
else:
results['unhealthy_count'] += 1
results['success'] = results['unhealthy_count'] == 0
return results
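# Note: the probe assumes every host serves the endpoint on
# http://localhost:3000; adjust the curl target if your services listen
# elsewhere. A host counts as healthy only when curl exits successfully and
# the captured status code contains 200.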
def main():
    """Smoke-test workflow executor wiring via a dry run."""
    print("Testing workflow executor...\n")
    print("Note: Workflow executor requires configured hosts and groups.")
    print("Real runs would execute live operations, so only a dry run is shown.\n")
    from sshsync_wrapper import execute_on_all
    result = execute_on_all("uptime", dry_run=True)
    print(f"Dry-run: {result.get('message', result)}\n")
    print("✅ Workflow executor ready")
if __name__ == "__main__":
main()

180
tests/test_helpers.py Normal file
View File

@@ -0,0 +1,180 @@
#!/usr/bin/env python3
"""
Tests for helper utilities.
"""
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent / 'scripts'))
from utils.helpers import *
def test_format_bytes():
"""Test byte formatting."""
assert format_bytes(0) == "0.0 B"
assert format_bytes(512) == "512.0 B"
assert format_bytes(1024) == "1.0 KB"
assert format_bytes(1048576) == "1.0 MB"
assert format_bytes(1073741824) == "1.0 GB"
print("✓ format_bytes() passed")
return True
def test_format_duration():
"""Test duration formatting."""
assert format_duration(30) == "30s"
assert format_duration(65) == "1m 5s"
assert format_duration(3600) == "1h"
assert format_duration(3665) == "1h 1m"
assert format_duration(7265) == "2h 1m"
print("✓ format_duration() passed")
return True
def test_format_percentage():
"""Test percentage formatting."""
assert format_percentage(45.567) == "45.6%"
assert format_percentage(100) == "100.0%"
assert format_percentage(0.123, decimals=2) == "0.12%"
print("✓ format_percentage() passed")
return True
def test_calculate_load_score():
"""Test load score calculation."""
score = calculate_load_score(50, 50, 50)
assert 0 <= score <= 1
assert abs(score - 0.5) < 0.01
score_low = calculate_load_score(20, 30, 25)
score_high = calculate_load_score(80, 85, 90)
assert score_low < score_high
print("✓ calculate_load_score() passed")
return True
def test_classify_load_status():
"""Test load status classification."""
assert classify_load_status(0.2) == "low"
assert classify_load_status(0.5) == "moderate"
assert classify_load_status(0.8) == "high"
print("✓ classify_load_status() passed")
return True
def test_classify_latency():
"""Test latency classification."""
status, desc = classify_latency(25)
assert status == "excellent"
assert "interactive" in desc.lower()
status, desc = classify_latency(150)
assert status == "fair"
print("✓ classify_latency() passed")
return True
def test_parse_disk_usage():
"""Test disk usage parsing."""
sample_output = """Filesystem Size Used Avail Use% Mounted on
/dev/sda1 100G 45G 50G 45% /"""
result = parse_disk_usage(sample_output)
assert result['filesystem'] == '/dev/sda1'
assert result['size'] == '100G'
assert result['used'] == '45G'
assert result['use_pct'] == 45
print("✓ parse_disk_usage() passed")
return True
def test_parse_cpu_load():
"""Test CPU load parsing."""
sample_output = "19:43:41 up 5 days, 2:15, 3 users, load average: 0.45, 0.38, 0.32"
result = parse_cpu_load(sample_output)
assert result['load_1min'] == 0.45
assert result['load_5min'] == 0.38
assert result['load_15min'] == 0.32
print("✓ parse_cpu_load() passed")
return True
def test_get_timestamp():
"""Test timestamp generation."""
ts_iso = get_timestamp(iso=True)
assert 'T' in ts_iso
assert 'Z' in ts_iso
ts_human = get_timestamp(iso=False)
assert ' ' in ts_human
assert len(ts_human) == 19 # YYYY-MM-DD HH:MM:SS
print("✓ get_timestamp() passed")
return True
def test_validate_path():
"""Test path validation."""
assert validate_path("/tmp", must_exist=True) == True
assert validate_path("/nonexistent_path_12345", must_exist=False) == False
print("✓ validate_path() passed")
return True
def test_safe_execute():
"""Test safe execution wrapper."""
# Should return result on success
result = safe_execute(int, "42")
assert result == 42
# Should return default on failure
result = safe_execute(int, "not_a_number", default=0)
assert result == 0
print("✓ safe_execute() passed")
return True
def main():
"""Run all helper tests."""
print("=" * 70)
print("HELPER TESTS")
print("=" * 70)
tests = [
test_format_bytes,
test_format_duration,
test_format_percentage,
test_calculate_load_score,
test_classify_load_status,
test_classify_latency,
test_parse_disk_usage,
test_parse_cpu_load,
test_get_timestamp,
test_validate_path,
test_safe_execute,
]
passed = 0
for test in tests:
try:
if test():
passed += 1
except Exception as e:
print(f"{test.__name__} failed: {e}")
print(f"\nResults: {passed}/{len(tests)} passed")
return passed == len(tests)
if __name__ == "__main__":
success = main()
sys.exit(0 if success else 1)

346
tests/test_integration.py Normal file
View File

@@ -0,0 +1,346 @@
#!/usr/bin/env python3
"""
Integration tests for Tailscale SSH Sync Agent.
Tests complete workflows from query to result.
"""
import sys
from pathlib import Path
# Add scripts to path
sys.path.insert(0, str(Path(__file__).parent.parent / 'scripts'))
from sshsync_wrapper import get_host_status, list_hosts, get_groups
from tailscale_manager import get_tailscale_status, get_network_summary
from load_balancer import format_load_report, MachineMetrics
from utils.helpers import (
format_bytes, format_duration, format_percentage,
calculate_load_score, classify_load_status, classify_latency
)
def test_host_status_basic():
"""Test get_host_status() without errors."""
print("\n✓ Testing get_host_status()...")
try:
result = get_host_status()
# Validations
assert 'hosts' in result, "Missing 'hosts' in result"
assert isinstance(result.get('hosts', []), list), "'hosts' must be list"
# Should have basic counts even if no hosts configured
assert 'total_count' in result, "Missing 'total_count'"
assert 'online_count' in result, "Missing 'online_count'"
assert 'offline_count' in result, "Missing 'offline_count'"
print(f" ✓ Found {result.get('total_count', 0)} hosts")
print(f" ✓ Online: {result.get('online_count', 0)}")
print(f" ✓ Offline: {result.get('offline_count', 0)}")
return True
except Exception as e:
print(f" ✗ FAILED: {e}")
import traceback
traceback.print_exc()
return False
def test_list_hosts():
"""Test list_hosts() function."""
print("\n✓ Testing list_hosts()...")
try:
result = list_hosts(with_status=False)
assert 'hosts' in result, "Missing 'hosts' in result"
assert 'count' in result, "Missing 'count' in result"
assert isinstance(result['hosts'], list), "'hosts' must be list"
print(f" ✓ List hosts working")
print(f" ✓ Found {result['count']} configured hosts")
return True
except Exception as e:
print(f" ✗ FAILED: {e}")
return False
def test_get_groups():
"""Test get_groups() function."""
print("\n✓ Testing get_groups()...")
try:
groups = get_groups()
assert isinstance(groups, dict), "Groups must be dict"
print(f" ✓ Groups config loaded")
print(f" ✓ Found {len(groups)} groups")
for group, hosts in list(groups.items())[:3]: # Show first 3
print(f" - {group}: {len(hosts)} hosts")
return True
except Exception as e:
print(f" ✗ FAILED: {e}")
return False
def test_tailscale_status():
"""Test Tailscale status check."""
print("\n✓ Testing get_tailscale_status()...")
try:
status = get_tailscale_status()
assert isinstance(status, dict), "Status must be dict"
assert 'connected' in status, "Missing 'connected' field"
if status.get('connected'):
print(f" ✓ Tailscale connected")
print(f" ✓ Peers: {status.get('total_count', 0)} total, {status.get('online_count', 0)} online")
else:
print(f" Tailscale not connected: {status.get('error', 'Unknown')}")
print(f" (This is OK if Tailscale is not installed/configured)")
return True
except Exception as e:
print(f" ✗ FAILED: {e}")
return False
def test_network_summary():
"""Test network summary generation."""
print("\n✓ Testing get_network_summary()...")
try:
summary = get_network_summary()
assert isinstance(summary, str), "Summary must be string"
assert len(summary) > 0, "Summary cannot be empty"
print(f" ✓ Network summary generated:")
for line in summary.split('\n'):
print(f" {line}")
return True
except Exception as e:
print(f" ✗ FAILED: {e}")
return False
def test_format_helpers():
"""Test formatting helper functions."""
print("\n✓ Testing format helpers...")
try:
# Test format_bytes
assert format_bytes(1024) == "1.0 KB", "format_bytes failed for 1024"
assert format_bytes(12582912) == "12.0 MB", "format_bytes failed for 12MB"
# Test format_duration
assert format_duration(65) == "1m 5s", "format_duration failed for 65s"
assert format_duration(3665) == "1h 1m", "format_duration failed for 1h+"
# Test format_percentage
assert format_percentage(45.567) == "45.6%", "format_percentage failed"
print(f" ✓ format_bytes(12582912) = {format_bytes(12582912)}")
print(f" ✓ format_duration(3665) = {format_duration(3665)}")
print(f" ✓ format_percentage(45.567) = {format_percentage(45.567)}")
return True
except Exception as e:
print(f" ✗ FAILED: {e}")
return False
def test_load_score_calculation():
"""Test load score calculation."""
print("\n✓ Testing calculate_load_score()...")
try:
# Test various scenarios
score1 = calculate_load_score(45, 60, 40)
assert 0 <= score1 <= 1, "Score must be 0-1"
assert abs(score1 - 0.49) < 0.01, f"Expected ~0.49, got {score1}"
score2 = calculate_load_score(20, 35, 30)
assert score2 < score1, "Lower usage should have lower score"
score3 = calculate_load_score(85, 70, 65)
assert score3 > score1, "Higher usage should have higher score"
print(f" ✓ Low load (20%, 35%, 30%): {score2:.2f}")
print(f" ✓ Med load (45%, 60%, 40%): {score1:.2f}")
print(f" ✓ High load (85%, 70%, 65%): {score3:.2f}")
return True
except Exception as e:
print(f" ✗ FAILED: {e}")
return False
def test_load_classification():
"""Test load status classification."""
print("\n✓ Testing classify_load_status()...")
try:
assert classify_load_status(0.28) == "low", "0.28 should be 'low'"
assert classify_load_status(0.55) == "moderate", "0.55 should be 'moderate'"
assert classify_load_status(0.82) == "high", "0.82 should be 'high'"
print(f" ✓ Score 0.28 = {classify_load_status(0.28)}")
print(f" ✓ Score 0.55 = {classify_load_status(0.55)}")
print(f" ✓ Score 0.82 = {classify_load_status(0.82)}")
return True
except Exception as e:
print(f" ✗ FAILED: {e}")
return False
def test_latency_classification():
"""Test network latency classification."""
print("\n✓ Testing classify_latency()...")
try:
status1, desc1 = classify_latency(25)
assert status1 == "excellent", "25ms should be 'excellent'"
status2, desc2 = classify_latency(75)
assert status2 == "good", "75ms should be 'good'"
status3, desc3 = classify_latency(150)
assert status3 == "fair", "150ms should be 'fair'"
status4, desc4 = classify_latency(250)
assert status4 == "poor", "250ms should be 'poor'"
print(f" ✓ 25ms: {status1} - {desc1}")
print(f" ✓ 75ms: {status2} - {desc2}")
print(f" ✓ 150ms: {status3} - {desc3}")
print(f" ✓ 250ms: {status4} - {desc4}")
return True
except Exception as e:
print(f" ✗ FAILED: {e}")
return False
def test_load_report_formatting():
"""Test load report formatting."""
print("\n✓ Testing format_load_report()...")
try:
metrics = MachineMetrics(
host='web-01',
cpu_pct=45.0,
mem_pct=60.0,
disk_pct=40.0,
load_score=0.49,
status='moderate'
)
report = format_load_report(metrics)
assert 'web-01' in report, "Report must include hostname"
assert '0.49' in report, "Report must include load score"
assert 'moderate' in report, "Report must include status"
print(f" ✓ Report generated:")
for line in report.split('\n'):
print(f" {line}")
return True
except Exception as e:
print(f" ✗ FAILED: {e}")
return False
def test_dry_run_execution():
"""Test dry-run mode for operations."""
print("\n✓ Testing dry-run execution...")
try:
from sshsync_wrapper import execute_on_all
result = execute_on_all("uptime", dry_run=True)
assert result.get('dry_run') == True, "Must indicate dry-run mode"
assert 'command' in result, "Must include command"
assert 'message' in result, "Must include message"
print(f" ✓ Dry-run mode working")
print(f" ✓ Command: {result.get('command')}")
print(f" ✓ Message: {result.get('message')}")
return True
except Exception as e:
print(f" ✗ FAILED: {e}")
return False
def main():
"""Run all integration tests."""
print("=" * 70)
print("INTEGRATION TESTS - Tailscale SSH Sync Agent")
print("=" * 70)
tests = [
("Host status check", test_host_status_basic),
("List hosts", test_list_hosts),
("Get groups", test_get_groups),
("Tailscale status", test_tailscale_status),
("Network summary", test_network_summary),
("Format helpers", test_format_helpers),
("Load score calculation", test_load_score_calculation),
("Load classification", test_load_classification),
("Latency classification", test_latency_classification),
("Load report formatting", test_load_report_formatting),
("Dry-run execution", test_dry_run_execution),
]
results = []
for test_name, test_func in tests:
passed = test_func()
results.append((test_name, passed))
# Summary
print("\n" + "=" * 70)
print("SUMMARY")
print("=" * 70)
for test_name, passed in results:
status = "✅ PASS" if passed else "❌ FAIL"
print(f"{status}: {test_name}")
passed_count = sum(1 for _, p in results if p)
total_count = len(results)
print(f"\nResults: {passed_count}/{total_count} passed")
if passed_count == total_count:
print("\n🎉 All tests passed!")
else:
print(f"\n⚠️ {total_count - passed_count} test(s) failed")
return passed_count == total_count
if __name__ == "__main__":
success = main()
sys.exit(0 if success else 1)

177
tests/test_validation.py Normal file
View File

@@ -0,0 +1,177 @@
#!/usr/bin/env python3
"""
Tests for validators.
"""
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent / 'scripts'))
from utils.validators import *
def test_validate_host():
"""Test host validation."""
# Valid host
assert validate_host("web-01") == "web-01"
assert validate_host(" web-01 ") == "web-01" # Strips whitespace
# With valid list
assert validate_host("web-01", ["web-01", "web-02"]) == "web-01"
# Invalid format
try:
validate_host("web@01") # Invalid character
assert False, "Should have raised ValidationError"
except ValidationError:
pass
print("✓ validate_host() passed")
return True
def test_validate_group():
"""Test group validation."""
# Valid group
assert validate_group("production") == "production"
assert validate_group("PRODUCTION") == "production" # Lowercase normalization
# With valid list
assert validate_group("production", ["production", "staging"]) == "production"
# Invalid
try:
validate_group("invalid!", ["production"])
assert False, "Should have raised ValidationError"
except ValidationError:
pass
print("✓ validate_group() passed")
return True
def test_validate_path_exists():
"""Test path existence validation."""
# Valid path
path = validate_path_exists("/tmp", must_be_dir=True)
assert isinstance(path, Path)
# Invalid path
try:
validate_path_exists("/nonexistent_12345")
assert False, "Should have raised ValidationError"
except ValidationError:
pass
print("✓ validate_path_exists() passed")
return True
def test_validate_timeout():
"""Test timeout validation."""
# Valid timeouts
assert validate_timeout(10) == 10
assert validate_timeout(1) == 1
assert validate_timeout(600) == 600
# Too low
try:
validate_timeout(0)
assert False, "Should have raised ValidationError"
except ValidationError:
pass
# Too high
try:
validate_timeout(1000)
assert False, "Should have raised ValidationError"
except ValidationError:
pass
print("✓ validate_timeout() passed")
return True
def test_validate_command():
"""Test command validation."""
# Safe commands
assert validate_command("ls -la") == "ls -la"
assert validate_command("uptime") == "uptime"
# Dangerous commands (should fail without allow_dangerous)
try:
validate_command("rm -rf /")
assert False, "Should have blocked dangerous command"
except ValidationError:
pass
# But should work with allow_dangerous
assert validate_command("rm -rf /tmp/test", allow_dangerous=True)
print("✓ validate_command() passed")
return True
def test_validate_hosts_list():
"""Test list validation."""
# Valid list
hosts = validate_hosts_list(["web-01", "web-02"])
assert len(hosts) == 2
assert "web-01" in hosts
# Empty list
try:
validate_hosts_list([])
assert False, "Should have raised ValidationError for empty list"
except ValidationError:
pass
print("✓ validate_hosts_list() passed")
return True
def test_get_invalid_hosts():
"""Test finding invalid hosts."""
# Test with mix of valid and invalid
# (This would require actual SSH config, so we test the function exists)
result = get_invalid_hosts(["web-01", "nonexistent-host-12345"])
assert isinstance(result, list)
print("✓ get_invalid_hosts() passed")
return True
def main():
"""Run all validation tests."""
print("=" * 70)
print("VALIDATION TESTS")
print("=" * 70)
tests = [
test_validate_host,
test_validate_group,
test_validate_path_exists,
test_validate_timeout,
test_validate_command,
test_validate_hosts_list,
test_get_invalid_hosts,
]
passed = 0
for test in tests:
try:
if test():
passed += 1
except Exception as e:
print(f"{test.__name__} failed: {e}")
import traceback
traceback.print_exc()
print(f"\nResults: {passed}/{len(tests)} passed")
return passed == len(tests)
if __name__ == "__main__":
success = main()
sys.exit(0 if success else 1)