Initial commit
This commit is contained in:
12
.claude-plugin/plugin.json
Normal file
@@ -0,0 +1,12 @@
{
  "name": "tailscale-sshsync-agent",
  "description": "Manages distributed workloads and file sharing across Tailscale SSH-connected machines. Automates remote command execution, intelligent load balancing, file synchronization workflows, host health monitoring, and multi-machine orchestration using sshsync.",
  "version": "0.0.0-2025.11.28",
  "author": {
    "name": "William VanSickle III",
    "email": "noreply@humanfrontierlabs.com"
  },
  "skills": [
    "./"
  ]
}
163
CHANGELOG.md
Normal file
@@ -0,0 +1,163 @@

# Changelog

All notable changes to Tailscale SSH Sync Agent will be documented here.

Format based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
Versioning follows [Semantic Versioning](https://semver.org/).

## [1.0.0] - 2025-10-19

### Added

**Core Functionality:**

- `sshsync_wrapper.py`: Python interface to sshsync CLI operations
  - `get_host_status()`: Check online/offline status of hosts
  - `execute_on_all()`: Run commands on all configured hosts
  - `execute_on_group()`: Run commands on specific groups
  - `execute_on_host()`: Run commands on a single host
  - `push_to_hosts()`: Push files to multiple hosts (with group support)
  - `pull_from_host()`: Pull files from hosts
  - `list_hosts()`: List all configured hosts
  - `get_groups()`: Get group configuration

- `tailscale_manager.py`: Tailscale-specific operations
  - `get_tailscale_status()`: Get complete network status
  - `check_connectivity()`: Ping hosts via Tailscale
  - `get_peer_info()`: Get detailed peer information
  - `list_online_machines()`: List all online Tailscale machines
  - `validate_tailscale_ssh()`: Check whether Tailscale SSH works for a host
  - `get_network_summary()`: Human-readable network summary

- `load_balancer.py`: Intelligent task distribution
  - `get_machine_load()`: Get CPU, memory, and disk metrics for a machine
  - `select_optimal_host()`: Pick the best host based on current load
  - `get_group_capacity()`: Get aggregate capacity of a group
  - `distribute_tasks()`: Distribute multiple tasks optimally across hosts
  - `format_load_report()`: Format load metrics as a human-readable report

- `workflow_executor.py`: Common multi-machine workflows
  - `deploy_workflow()`: Full deployment pipeline (staging → test → production)
  - `backup_workflow()`: Back up files from multiple hosts
  - `sync_workflow()`: Sync files from one host to many
  - `rolling_restart()`: Zero-downtime service restart across a group
  - `health_check_workflow()`: Check health endpoints across a group

**Utilities:**

- `utils/helpers.py`: Common formatting and parsing functions
  - Byte formatting (`format_bytes`)
  - Duration formatting (`format_duration`)
  - Percentage formatting (`format_percentage`)
  - SSH config parsing (`parse_ssh_config`)
  - sshsync config parsing (`parse_sshsync_config`)
  - System metrics parsing (`parse_disk_usage`, `parse_memory_usage`, `parse_cpu_load`)
  - Load score calculation (`calculate_load_score`)
  - Status classification (`classify_load_status`, `classify_latency`)
  - Safe command execution (`run_command`, `safe_execute`)
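
Illustrative sketches of two of these helpers (assumptions for this changelog, not the shipped `utils/helpers.py` implementations):

```python
def format_bytes(n: int) -> str:
    """Render a byte count with a binary-unit suffix, e.g. 1536 -> '1.5 KB'."""
    units = ["B", "KB", "MB", "GB", "TB"]
    value = float(n)
    for unit in units:
        if value < 1024 or unit == units[-1]:
            return f"{int(value)} B" if unit == "B" else f"{value:.1f} {unit}"
        value /= 1024

def format_duration(seconds: float) -> str:
    """Render seconds as minutes and seconds, e.g. 125 -> '2m 5s'."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}m {secs}s" if minutes else f"{secs}s"

print(format_bytes(1536))    # 1.5 KB
print(format_duration(125))  # 2m 5s
```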

- `utils/validators/`: Comprehensive validation system
  - `parameter_validator.py`: Input validation (hosts, groups, paths, timeouts, commands)
  - `host_validator.py`: Host configuration and availability validation
  - `connection_validator.py`: SSH and Tailscale connection validation

**Testing:**

- `tests/test_integration.py`: 11 end-to-end integration tests
- `tests/test_helpers.py`: 11 helper function tests
- `tests/test_validation.py`: 7 validation tests
- **Total: 29 tests** covering all major functionality

**Documentation:**

- `SKILL.md`: Complete skill documentation (6,000+ words)
  - When to use this skill
  - How it works
  - Data sources (sshsync CLI, Tailscale)
  - Detailed workflows for each operation type
  - Available scripts and functions
  - Error handling and validations
  - Performance and caching strategies
  - Usage examples
- `references/sshsync-guide.md`: Complete sshsync CLI reference
- `references/tailscale-integration.md`: Tailscale integration guide
- `README.md`: Installation and quick start guide
- `INSTALLATION.md`: Detailed setup tutorial
- `DECISIONS.md`: Architecture decisions and rationale

### Data Sources

**sshsync CLI:**
- Installation: `pip install sshsync`
- Configuration: `~/.config/sshsync/config.yaml`
- SSH config integration: `~/.ssh/config`
- Group-based host management
- Remote command execution with timeouts
- File push/pull operations (single or recursive)
- Status checking and connectivity validation

**Tailscale:**
- Zero-config VPN with WireGuard encryption
- MagicDNS for easy host addressing
- Built-in SSH capabilities
- Seamless integration with standard SSH
- Peer-to-peer connections
- Works across NATs and firewalls

### Coverage

**Operations:**
- Host status monitoring and availability checks
- Intelligent load-based task distribution
- Multi-host command execution (all hosts, groups, individual)
- File synchronization workflows (push/pull)
- Deployment pipelines (staging → production)
- Backup and sync workflows
- Rolling restarts with zero downtime
- Health checking across services

**Geographic Coverage:** All hosts in the Tailscale network (global)

**Temporal Coverage:** Real-time status and operations

### Known Limitations

**v1.0.0:**
- sshsync must be installed separately (`pip install sshsync`)
- Tailscale must be configured separately
- SSH keys must be set up manually on each host
- Load balancing uses simple metrics (CPU, memory, disk)
- No built-in monitoring dashboards (terminal output only)
- No persistence of operation history (logs only)
- Requires SSH config and sshsync config to be manually maintained

### Planned for v2.0

**Enhanced Features:**
- Automated SSH key distribution across hosts
- Built-in operation history and logging database
- Web dashboard for monitoring and operations
- Advanced load balancing with custom metrics
- Scheduled operations and cron integration
- Operation rollback capabilities
- Integration with configuration management tools (Ansible, Terraform)
- Cost tracking for cloud resources
- Performance metrics collection and visualization
- Alert system for failed operations
- Multi-tenancy support for team environments

**Integrations:**
- Prometheus metrics export
- Grafana dashboard templates
- Slack/Discord notifications
- CI/CD pipeline integration
- Container orchestration support (Docker, Kubernetes)

## [Unreleased]

### Planned

- Add support for Windows hosts (PowerShell remoting)
- Improve performance for large host groups (100+)
- Add SSH connection pooling for faster operations
- Implement operation queueing for long-running tasks
- Add support for custom validation plugins
- Expand coverage to Docker containers via SSH
- Add retry strategies with exponential backoff
- Implement circuit breaker pattern for failing hosts
458
DECISIONS.md
Normal file
@@ -0,0 +1,458 @@

# Architecture Decisions

Documentation of all technical decisions made for Tailscale SSH Sync Agent.

## Tool Selection

### Selected Tool: sshsync

**Justification:**

✅ **Advantages:**
- **Ready to use**: Available via `pip install sshsync`
- **Group management**: Built-in support for organizing hosts into groups
- **Integration**: Works with existing SSH config (`~/.ssh/config`)
- **Simple API**: Easy-to-wrap CLI interface
- **Parallel execution**: Commands run concurrently across hosts
- **File operations**: Push/pull with recursive support
- **Timeout handling**: Per-command timeouts for reliability
- **Active maintenance**: Regular updates and bug fixes
- **Python-based**: Easy to extend and integrate

✅ **Coverage:**
- All SSH-accessible hosts
- Works with any SSH server (Linux, macOS, BSD, etc.)
- Platform-agnostic (runs on any OS with Python)

✅ **Cost:**
- Free and open source
- No API keys or subscriptions required
- No rate limits

✅ **Documentation:**
- Clear command-line interface
- PyPI documentation available
- GitHub repository with examples

**Alternatives Considered:**

❌ **Fabric (Python library)**
- Pros: Pure Python, very flexible
- Cons: Requires writing more code, no built-in group management
- **Rejected because**: sshsync provides ready-made functionality

❌ **Ansible**
- Pros: Industry standard, very powerful
- Cons: Requires learning YAML playbooks, overkill for simple operations
- **Rejected because**: Too heavyweight for ad-hoc commands and file transfers

❌ **pssh (parallel-ssh)**
- Pros: Simple parallel SSH
- Cons: No group management, no built-in file transfer, less actively maintained
- **Rejected because**: sshsync has better group management and file operations

❌ **Custom SSH wrapper**
- Pros: Full control
- Cons: Reinventing the wheel, maintaining parallel execution logic
- **Rejected because**: sshsync already provides what we need

**Conclusion:**

sshsync is the best tool for this use case because it:
1. Provides group-based host management out of the box
2. Handles parallel execution automatically
3. Integrates with existing SSH configuration
4. Supports both command execution and file transfers
5. Requires minimal wrapper code

## Integration: Tailscale

**Decision**: Integrate with Tailscale for network connectivity

**Justification:**

✅ **Why Tailscale:**
- **Zero-config VPN**: No manual firewall/NAT configuration
- **Secure by default**: WireGuard encryption
- **Works everywhere**: Coffee shop, home, office, cloud
- **MagicDNS**: Easy addressing (machine-name.tailnet.ts.net)
- **Standard SSH**: Works with all SSH tools, including sshsync
- **No overhead**: Uses the regular SSH protocol over the Tailscale network

✅ **Integration approach:**
- Tailscale provides the network layer
- Standard SSH works over Tailscale
- sshsync operates normally using Tailscale hostnames/IPs
- No Tailscale-specific code needed in core operations
- Tailscale status checking for diagnostics

**Alternatives:**

❌ **Direct public internet + port forwarding**
- Cons: Complex firewall setup, security risks, doesn't work on mobile or restricted networks
- **Rejected because**: Requires too much configuration and has security concerns

❌ **Other VPNs (plain WireGuard, OpenVPN, ZeroTier)**
- Cons: More manual configuration, less zero-config
- **Rejected because**: Tailscale is easier to set up and use

**Conclusion:**

Tailscale + standard SSH is the optimal combination:
- Secure connectivity without configuration
- Works with existing SSH tools
- No vendor lock-in (other VPNs can be substituted if needed)

## Architecture

### Structure: Modular Scripts + Utilities

**Decision**: Separate concerns into focused modules

```
scripts/
├── sshsync_wrapper.py      # sshsync CLI interface
├── tailscale_manager.py    # Tailscale operations
├── load_balancer.py        # Task distribution logic
├── workflow_executor.py    # Common workflows
└── utils/
    ├── helpers.py          # Formatting, parsing
    └── validators/         # Input validation
```

**Justification:**

✅ **Modularity:**
- Each script has a single responsibility
- Easy to test independently
- Easy to extend without breaking others

✅ **Reusability:**
- Helpers used across all scripts
- Validators prevent duplicate validation logic
- Workflows compose lower-level operations

✅ **Maintainability:**
- Clear file organization
- Easy to locate specific functionality
- Separation of concerns

**Alternatives:**

❌ **Monolithic single script**
- Cons: Hard to test, hard to maintain, grows too large
- **Rejected because**: Doesn't scale well

❌ **Over-engineered class hierarchy**
- Cons: Unnecessary complexity for this use case
- **Rejected because**: Simple functions are sufficient

**Conclusion:**

A modular, functional approach provides a good balance of simplicity and maintainability.

### Validation Strategy: Multi-Layer

**Decision**: Validate at multiple layers

**Layers:**

1. **Parameter validation** (`parameter_validator.py`)
   - Validates user inputs before any operations
   - Prevents invalid hosts, groups, paths, etc.

2. **Host validation** (`host_validator.py`)
   - Validates that SSH configuration exists
   - Checks host reachability
   - Validates group membership

3. **Connection validation** (`connection_validator.py`)
   - Tests actual SSH connectivity
   - Verifies Tailscale status
   - Checks SSH key authentication
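
A minimal sketch of the first layer (illustrative; the names and rules are assumptions, not the actual `parameter_validator.py`):

```python
import re

class ValidationError(ValueError):
    """Raised when a user-supplied parameter fails validation."""

# Hostnames as accepted for ~/.ssh/config aliases: letters, digits, dots, dashes.
_HOST_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9.-]{0,252}$")

def validate_host(host: str, known_hosts: set[str]) -> str:
    """Reject malformed or unknown host names before any SSH work happens."""
    if not _HOST_RE.match(host):
        raise ValidationError(f"invalid host name: {host!r}")
    if host not in known_hosts:
        raise ValidationError(f"host {host!r} not found in SSH config")
    return host

def validate_timeout(timeout: float) -> float:
    """Keep timeouts in a sane range (0 s exclusive to 1 h inclusive)."""
    if not 0 < timeout <= 3600:
        raise ValidationError(f"timeout must be in (0, 3600], got {timeout}")
    return timeout
```

Called first, these checks fail fast with a specific message instead of letting a typo surface as an opaque SSH error later.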

**Justification:**

✅ **Early failure:**
- Catch errors before expensive operations
- Clear error messages at each layer

✅ **Comprehensive:**
- Multiple validation points catch different issues
- Reduces runtime failures

✅ **User-friendly:**
- Helpful error messages with suggestions
- Clear indication of what went wrong

**Conclusion:**

Multi-layer validation provides robust error handling and a good user experience.

## Load Balancing Strategy

### Decision: Simple Composite Score

**Formula:**
```python
score = (cpu_pct * 0.4) + (mem_pct * 0.3) + (disk_pct * 0.3)
```

**Weights:**
- CPU: 40% (most important for compute tasks)
- Memory: 30% (important for data processing)
- Disk: 30% (important for I/O operations)
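
The score can drive host selection as follows — a sketch consistent with the formula above (lower score = more spare capacity); the shipped `select_optimal_host()` may differ in detail:

```python
WEIGHTS = {"cpu_pct": 0.4, "mem_pct": 0.3, "disk_pct": 0.3}

def calculate_load_score(metrics: dict[str, float]) -> float:
    """Weighted composite load score; lower means less loaded."""
    return sum(metrics[k] * w for k, w in WEIGHTS.items())

def select_optimal_host(loads: dict[str, dict[str, float]]) -> str:
    """Pick the host whose composite score is currently lowest."""
    return min(loads, key=lambda h: calculate_load_score(loads[h]))

loads = {
    "prod-web-01": {"cpu_pct": 80.0, "mem_pct": 60.0, "disk_pct": 40.0},  # ≈ 62
    "prod-web-02": {"cpu_pct": 20.0, "mem_pct": 50.0, "disk_pct": 70.0},  # ≈ 44
}
print(select_optimal_host(loads))  # prod-web-02
```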

**Justification:**

✅ **Simple and effective:**
- Easy to understand
- Fast to calculate
- Works well for most workloads

✅ **Balanced:**
- Considers multiple resource types
- No single metric dominates

**Alternatives:**

❌ **CPU only**
- Cons: Ignores memory-bound and I/O-bound tasks
- **Rejected because**: Too narrow

❌ **Complex ML-based prediction**
- Cons: Overkill, slow, requires training data
- **Rejected because**: Unnecessary complexity

❌ **Fixed round-robin**
- Cons: Doesn't consider actual load
- **Rejected because**: Can overload already-busy hosts

**Conclusion:**

A simple weighted score provides good balance without complexity.

## Error Handling Philosophy

### Decision: Graceful Degradation + Clear Messages

**Principles:**

1. **Fail early with validation**: Catch errors before operations
2. **Isolate failures**: One host's failure doesn't stop the others
3. **Clear messages**: Tell the user exactly what went wrong and how to fix it
4. **Automatic retry**: Retry transient errors (network, timeout)
5. **Dry-run support**: Preview operations before execution
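
Principle 4 can be sketched as a small retry helper with exponential backoff (illustrative, not the shipped retry logic):

```python
import time

def retry(func, attempts: int = 3, base_delay: float = 0.5,
          transient=(ConnectionError, TimeoutError)):
    """Call func(), retrying transient errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            return func()
        except transient:
            if attempt == attempts - 1:
                raise  # out of attempts: propagate to the caller
            time.sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...

# A flaky operation that succeeds on the third call:
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("transient network hiccup")
    return "ok"

print(retry(flaky, base_delay=0.01))  # ok
```

Permanent errors (e.g. a `ValidationError`) are not in the `transient` tuple, so they surface immediately instead of being retried.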

**Implementation:**

```python
# Example error handling pattern
try:
    validate_host(host)
    validate_ssh_connection(host)
    result = execute_command(host, command)
except ValidationError as e:
    return {'error': str(e), 'suggestion': 'Fix: ...'}
except ConnectionError as e:
    return {'error': str(e), 'diagnostics': get_diagnostics(host)}
```

**Justification:**

✅ **Better UX:**
- Users know exactly what's wrong
- Suggestions help fix issues quickly

✅ **Reliability:**
- Automatic retry handles transient issues
- Dry-run prevents mistakes

✅ **Debugging:**
- Clear error messages speed up troubleshooting
- Diagnostics provide actionable information

**Conclusion:**

Graceful degradation with helpful messages creates a better user experience.

## Caching Strategy

**Decision**: Minimal caching for real-time accuracy

**What we cache:**
- Nothing (v1.0.0)

**Why no caching:**
- Host status changes frequently
- Load metrics change constantly
- Operations need real-time data
- Cache invalidation is complex

**Future consideration (v2.0):**
- Cache Tailscale status (60s TTL)
- Cache group configuration (5min TTL)
- Cache SSH config parsing (5min TTL)
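
If v2.0 adds these caches, a TTL cache can stay very small. A sketch (hypothetical; names and the injectable clock are assumptions for testability):

```python
import time

class TTLCache:
    """Tiny time-based cache: an entry expires ttl seconds after being set."""
    def __init__(self, ttl: float, clock=time.monotonic):
        self.ttl = ttl
        self._clock = clock
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if self._clock() - stored_at >= self.ttl:
            del self._data[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key, value):
        self._data[key] = (value, self._clock())

# e.g. a 60-second cache in front of `tailscale status` output
status_cache = TTLCache(ttl=60.0)
status_cache.set("status", {"peers": 5})
print(status_cache.get("status"))  # {'peers': 5}
```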

**Justification:**

✅ **Simplicity:**
- No cache invalidation logic needed
- No stale data issues

✅ **Accuracy:**
- Always get the current state
- No surprises from cached data

**Trade-off:**
- Slightly slower repeated operations
- More network calls

**Conclusion:**

For v1.0.0, simplicity and accuracy outweigh performance concerns. Real-time data is more valuable than speed.

## Testing Strategy

### Decision: Comprehensive Unit + Integration Tests

**Coverage:**

- **29 tests total:**
  - 11 integration tests (end-to-end workflows)
  - 11 helper tests (formatting, parsing, calculations)
  - 7 validation tests (input validation, safety checks)

**Test Philosophy:**

1. **Test real functionality**: Integration tests use the actual functions
2. **Test edge cases**: Validation tests cover error conditions
3. **Test helpers**: Ensure formatting/parsing works correctly
4. **Fast execution**: All tests run in < 10 seconds
5. **No external dependencies**: Tests don't require Tailscale or sshsync to be running

**Justification:**

✅ **Confidence:**
- Tests verify the code works as expected
- Catches regressions when modifying code

✅ **Documentation:**
- Tests show how to use the functions
- Examples of expected behavior

✅ **Reliability:**
- Production-ready code from v1.0.0

**Conclusion:**

Comprehensive testing ensures reliable code from the start.

## Performance Considerations

### Parallel Execution

**Decision**: Leverage sshsync's built-in parallelization

- sshsync runs commands concurrently across hosts automatically
- No need to implement custom threading/multiprocessing
- Timeouts apply per host, independently
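
What that buys — one task per host, failures isolated, a timeout per result — can be sketched conceptually in plain Python (purely illustrative; sshsync implements this internally and this agent does not ship such code):

```python
from concurrent.futures import ThreadPoolExecutor

def run_on_host(host: str) -> str:
    """Stand-in for a per-host SSH command (sshsync does the real work)."""
    return f"{host}: ok"

def run_parallel(hosts: list[str], per_host_timeout: float = 10.0) -> dict[str, str]:
    """Run one task per host concurrently; each result waits at most its own timeout."""
    with ThreadPoolExecutor(max_workers=max(len(hosts), 1)) as pool:
        futures = {host: pool.submit(run_on_host, host) for host in hosts}
        results = {}
        for host, future in futures.items():
            try:
                results[host] = future.result(timeout=per_host_timeout)
            except Exception as exc:  # one host failing doesn't stop the rest
                results[host] = f"error: {exc}"
        return results

print(run_parallel(["prod-web-01", "prod-web-02"]))
```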

**Trade-offs:**

✅ **Pros:**
- Simple to use
- Fast for large host groups
- No concurrency bugs to maintain

⚠️ **Cons:**
- Less control over the parallelism level
- Can overwhelm the network with too many concurrent connections

**Conclusion:**

Built-in parallelization is sufficient for most use cases. Custom control can be added in v2.0 if needed.

## Security Considerations

### SSH Key Authentication

**Decision**: Require SSH keys (no password auth)

**Justification:**

✅ **Security:**
- Keys are more secure than passwords
- Can't be brute-forced
- Can be revoked per host

✅ **Automation:**
- Non-interactive (no password prompts)
- Works in scripts and CI/CD

**Implementation:**
- Validators check that SSH key auth works
- Clear error messages guide users to set up keys
- Documentation explains SSH key setup

### Command Safety

**Decision**: Validate dangerous commands

**Dangerous patterns blocked:**
- `rm -rf /` (root deletion)
- `mkfs.*` (filesystem formatting)
- `dd.*of=/dev/` (direct disk writes)
- Fork bombs

**Override**: Use `allow_dangerous=True` to bypass
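
A minimal sketch of such a check, expressing the patterns above as regexes (illustrative; the shipped validator may use different patterns and error types):

```python
import re

DANGEROUS_PATTERNS = [
    r"\brm\s+-rf\s+/(\s|$)",   # rm -rf / (root deletion)
    r"\bmkfs(\.\w+)?\b",       # filesystem formatting
    r"\bdd\b.*\bof=/dev/",     # direct disk writes
    r":\(\)\s*\{.*\};\s*:",    # classic bash fork bomb
]

def check_command(command: str, allow_dangerous: bool = False) -> str:
    """Reject commands matching a known-dangerous pattern unless overridden."""
    if not allow_dangerous:
        for pattern in DANGEROUS_PATTERNS:
            if re.search(pattern, command):
                raise ValueError(f"blocked dangerous command: {command!r}")
    return command

print(check_command("df -h"))  # df -h
# check_command("rm -rf /") would raise ValueError;
# check_command("rm -rf /", allow_dangerous=True) passes it through.
```

A blocklist like this only catches known-bad shapes; it is a guard against accidents, not a sandbox.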

**Justification:**

✅ **Safety:**
- Prevents accidental destructive operations
- Dry-run provides a preview

✅ **Flexibility:**
- Dangerous commands can still run if explicitly allowed

**Conclusion:**

Safety by default, with an escape hatch for advanced users.

## Decisions Summary

| Decision | Choice | Rationale |
|----------|--------|-----------|
| **CLI Tool** | sshsync | Best balance of features, ease of use, and maintenance |
| **Network** | Tailscale | Zero-config secure VPN, works everywhere |
| **Architecture** | Modular scripts | Clear separation of concerns, maintainable |
| **Validation** | Multi-layer | Catch errors early with helpful messages |
| **Load Balancing** | Composite score | Simple, effective, considers multiple resources |
| **Caching** | None (v1.0) | Simplicity and real-time accuracy |
| **Testing** | 29 tests | Comprehensive coverage for reliability |
| **Security** | SSH keys + validation | Secure and automation-friendly |

## Trade-offs Accepted

1. **No caching** → Slightly slower, but always accurate
2. **sshsync dependency** → External tool, but saves development time
3. **SSH key requirement** → Setup needed, but more secure
4. **Simple load balancing** → Less sophisticated, but fast and easy to understand
5. **Terminal UI only** → No web dashboard, but simpler to develop and maintain

## Future Improvements

### v2.0 Considerations

1. **Add caching** for frequently accessed data (Tailscale status, groups)
2. **Web dashboard** for visualization and monitoring
3. **Operation history** database for an audit trail
4. **Advanced load balancing** with custom metrics
5. **Automated SSH key distribution** across hosts
6. **Integration with config management** tools (Ansible, Terraform)
7. **Container support** via SSH to Docker containers
8. **Custom validation plugins** for domain-specific checks

All decisions prioritize **simplicity**, **security**, and **maintainability** for v1.0.0.
707
INSTALLATION.md
Normal file
@@ -0,0 +1,707 @@
|
|||||||
|
# Installation Guide
|
||||||
|
|
||||||
|
Complete step-by-step tutorial for setting up Tailscale SSH Sync Agent.
|
||||||
|
|
||||||
|
## Table of Contents
|
||||||
|
|
||||||
|
1. [Prerequisites](#prerequisites)
|
||||||
|
2. [Step 1: Install Tailscale](#step-1-install-tailscale)
|
||||||
|
3. [Step 2: Install sshsync](#step-2-install-sshsync)
|
||||||
|
4. [Step 3: Configure SSH](#step-3-configure-ssh)
|
||||||
|
5. [Step 4: Configure sshsync Groups](#step-4-configure-sshsync-groups)
|
||||||
|
6. [Step 5: Install Agent](#step-5-install-agent)
|
||||||
|
7. [Step 6: Test Installation](#step-6-test-installation)
|
||||||
|
8. [Troubleshooting](#troubleshooting)
|
||||||
|
|
||||||
|
## Prerequisites
|
||||||
|
|
||||||
|
Before you begin, ensure you have:
|
||||||
|
|
||||||
|
- **Operating System**: macOS, Linux, or BSD
|
||||||
|
- **Python**: Version 3.10 or higher
|
||||||
|
- **pip**: Python package installer
|
||||||
|
- **Claude Code**: Installed and running
|
||||||
|
- **Remote machines**: At least one machine you want to manage
|
||||||
|
- **SSH access**: Ability to SSH to remote machines
|
||||||
|
|
||||||
|
**Check Python version**:
|
||||||
|
```bash
|
||||||
|
python3 --version
|
||||||
|
# Should show: Python 3.10.x or higher
|
||||||
|
```
|
||||||
|
|
||||||
|
**Check pip**:
|
||||||
|
```bash
|
||||||
|
pip3 --version
|
||||||
|
# Should show: pip xx.x.x from ...
|
||||||
|
```
|
||||||
|
|
||||||
|
## Step 1: Install Tailscale
|
||||||
|
|
||||||
|
Tailscale provides secure networking between your machines.
|
||||||
|
|
||||||
|
### macOS
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Install via Homebrew
|
||||||
|
brew install tailscale
|
||||||
|
|
||||||
|
# Start Tailscale
|
||||||
|
sudo tailscale up
|
||||||
|
|
||||||
|
# Follow authentication link in terminal
|
||||||
|
# This will open browser to log in
|
||||||
|
```
|
||||||
|
|
||||||
|
### Linux (Ubuntu/Debian)
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Install Tailscale
|
||||||
|
curl -fsSL https://tailscale.com/install.sh | sh
|
||||||
|
|
||||||
|
# Start and authenticate
|
||||||
|
sudo tailscale up
|
||||||
|
|
||||||
|
# Follow authentication link
|
||||||
|
```
|
||||||
|
|
||||||
|
### Linux (Fedora/RHEL)
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Add repository
|
||||||
|
sudo dnf config-manager --add-repo https://pkgs.tailscale.com/stable/fedora/tailscale.repo
|
||||||
|
|
||||||
|
# Install
|
||||||
|
sudo dnf install tailscale
|
||||||
|
|
||||||
|
# Enable and start
|
||||||
|
sudo systemctl enable --now tailscaled
|
||||||
|
sudo tailscale up
|
||||||
|
```
|
||||||
|
|
||||||
|
### Verify Installation
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Check Tailscale status
|
||||||
|
tailscale status
|
||||||
|
|
||||||
|
# Should show list of machines in your tailnet
|
||||||
|
# Example output:
|
||||||
|
# 100.64.1.10 homelab-1 user@ linux -
|
||||||
|
# 100.64.1.11 laptop user@ macOS -
|
||||||
|
```
|
||||||
|
|
||||||
|
**Important**: Install and authenticate Tailscale on **all machines** you want to manage.
|
||||||
|
|
||||||
|
## Step 2: Install sshsync
|
||||||
|
|
||||||
|
sshsync is the CLI tool for managing SSH operations across multiple hosts.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Install via pip
|
||||||
|
pip3 install sshsync
|
||||||
|
|
||||||
|
# Or use pipx for isolated installation
|
||||||
|
pipx install sshsync
|
||||||
|
```
|
||||||
|
|
||||||
|
### Verify Installation
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Check version
|
||||||
|
sshsync --version
|
||||||
|
|
||||||
|
# Should show: sshsync, version x.x.x
|
||||||
|
```
|
||||||
|
|
||||||
|
### Common Installation Issues
|
||||||
|
|
||||||
|
**Issue**: `pip3: command not found`
|
||||||
|
|
||||||
|
**Solution**:
|
||||||
|
```bash
|
||||||
|
# macOS
|
||||||
|
brew install python3
|
||||||
|
|
||||||
|
# Linux (Ubuntu/Debian)
|
||||||
|
sudo apt install python3-pip
|
||||||
|
|
||||||
|
# Linux (Fedora/RHEL)
|
||||||
|
sudo dnf install python3-pip
|
||||||
|
```
|
||||||
|
|
||||||
|
**Issue**: Permission denied during install
|
||||||
|
|
||||||
|
**Solution**:
|
||||||
|
```bash
|
||||||
|
# Install for current user only
|
||||||
|
pip3 install --user sshsync
|
||||||
|
|
||||||
|
# Or use pipx
|
||||||
|
pip3 install --user pipx
|
||||||
|
pipx install sshsync
|
||||||
|
```
|
||||||
|
|
||||||
|
## Step 3: Configure SSH

SSH configuration defines how to connect to each machine.

### Step 3.1: Generate SSH Keys (if you don't have them)

```bash
# Generate ed25519 key (recommended)
ssh-keygen -t ed25519 -C "your_email@example.com"

# Press Enter to use default location (~/.ssh/id_ed25519)
# Enter passphrase (or leave empty for no passphrase)
```

**Output**:
```
Your identification has been saved in /Users/you/.ssh/id_ed25519
Your public key has been saved in /Users/you/.ssh/id_ed25519.pub
```

### Step 3.2: Copy Public Key to Remote Machines

For each remote machine:

```bash
# Copy SSH key to remote
ssh-copy-id user@machine-hostname

# Example:
ssh-copy-id admin@100.64.1.10
```

**Manual method** (if ssh-copy-id doesn't work):

```bash
# Display public key
cat ~/.ssh/id_ed25519.pub

# SSH to remote machine
ssh user@remote-host

# On remote machine:
mkdir -p ~/.ssh
chmod 700 ~/.ssh
echo "your-public-key-here" >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
exit
```

### Step 3.3: Test SSH Connection

```bash
# Test connection (should not ask for password)
ssh user@remote-host "hostname"

# If successful, should print remote hostname
```

### Step 3.4: Create SSH Config File

Edit `~/.ssh/config`:

```bash
vim ~/.ssh/config
```

**Add host entries**:

```
# Production servers
Host prod-web-01
    HostName prod-web-01.tailnet.ts.net
    User deploy
    IdentityFile ~/.ssh/id_ed25519
    Port 22

Host prod-web-02
    HostName 100.64.1.21
    User deploy
    IdentityFile ~/.ssh/id_ed25519

Host prod-db-01
    HostName 100.64.1.30
    User deploy
    IdentityFile ~/.ssh/id_ed25519

# Development
Host dev-laptop
    HostName dev-laptop.tailnet.ts.net
    User developer
    IdentityFile ~/.ssh/id_ed25519

Host dev-desktop
    HostName 100.64.1.40
    User developer
    IdentityFile ~/.ssh/id_ed25519

# Homelab
Host homelab-1
    HostName 100.64.1.10
    User admin
    IdentityFile ~/.ssh/id_ed25519

Host homelab-2
    HostName 100.64.1.11
    User admin
    IdentityFile ~/.ssh/id_ed25519
```

**Important fields**:
- **Host**: Alias you'll use (e.g., "homelab-1")
- **HostName**: Actual hostname or IP (Tailscale hostname or IP)
- **User**: SSH username on the remote machine
- **IdentityFile**: Path to SSH private key

### Step 3.5: Set Correct Permissions

```bash
# SSH config should be readable only by you
chmod 600 ~/.ssh/config

# SSH directory permissions
chmod 700 ~/.ssh

# Private key permissions
chmod 600 ~/.ssh/id_ed25519

# Public key permissions
chmod 644 ~/.ssh/id_ed25519.pub
```

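These permission checks can also be scripted. A minimal Python sketch (standard library only; the file list mirrors the layout assumed by this guide) that flags anything in `~/.ssh` looser than the recommended mode:

```python
import os
import stat

# Recommended maximum permission bits per path (layout assumed by this guide)
EXPECTED = {
    os.path.expanduser("~/.ssh"): 0o700,
    os.path.expanduser("~/.ssh/config"): 0o600,
    os.path.expanduser("~/.ssh/id_ed25519"): 0o600,
    os.path.expanduser("~/.ssh/id_ed25519.pub"): 0o644,
}


def check_permissions(expected=EXPECTED):
    """Return (path, actual_mode) pairs for paths looser than recommended."""
    problems = []
    for path, max_mode in expected.items():
        if not os.path.exists(path):
            continue  # skip files you haven't created yet
        actual = stat.S_IMODE(os.stat(path).st_mode)
        # Flag any permission bit set beyond the recommended mode
        if actual & ~max_mode:
            problems.append((path, oct(actual)))
    return problems


if __name__ == "__main__":
    for path, mode in check_permissions():
        print(f"too permissive: {path} ({mode})")
```

Silent output means every existing file is at or below the recommended mode.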
### Step 3.6: Verify All Hosts

Test each host in your config:

```bash
# Test each host
ssh homelab-1 "echo 'Connection successful'"
ssh prod-web-01 "echo 'Connection successful'"
ssh dev-laptop "echo 'Connection successful'"

# Should connect without asking for password
```

## Step 4: Configure sshsync Groups

Groups organize your hosts for easy management.

### Step 4.1: Initialize sshsync Configuration

```bash
# Sync hosts and create groups
sshsync sync
```

**What this does**:
1. Reads all hosts from `~/.ssh/config`
2. Prompts you to assign hosts to groups
3. Creates `~/.config/sshsync/config.yaml`

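Since `sshsync sync` starts from the `Host` entries in `~/.ssh/config`, it can help to preview what it will find. A stdlib-only sketch that lists concrete host aliases, skipping wildcard patterns like `Host *` (the parsing here is naive and illustrative, not sshsync's actual parser):

```python
def list_host_aliases(config_text):
    """Return concrete Host aliases from SSH config text, skipping wildcards."""
    aliases = []
    for line in config_text.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith("host "):
            # A Host line may name several aliases; skip patterns like "*"
            for alias in stripped.split()[1:]:
                if "*" not in alias and "?" not in alias:
                    aliases.append(alias)
    return aliases


if __name__ == "__main__":
    import os

    path = os.path.expanduser("~/.ssh/config")
    if os.path.exists(path):
        with open(path) as f:
            print(list_host_aliases(f.read()))
```

Every alias this prints is a candidate for group assignment in the next step.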
### Step 4.2: Follow Interactive Prompts

```
Found 7 ungrouped hosts:
1. homelab-1
2. homelab-2
3. prod-web-01
4. prod-web-02
5. prod-db-01
6. dev-laptop
7. dev-desktop

Assign groups now? [Y/n]: Y

Enter group name for homelab-1 (or skip): homelab
Enter group name for homelab-2 (or skip): homelab
Enter group name for prod-web-01 (or skip): production,web
Enter group name for prod-web-02 (or skip): production,web
Enter group name for prod-db-01 (or skip): production,database
Enter group name for dev-laptop (or skip): development
Enter group name for dev-desktop (or skip): development
```

**Tips**:
- Hosts can belong to multiple groups (separate with commas)
- Use meaningful group names (production, development, web, database, homelab)
- Skip hosts you don't want to group yet

### Step 4.3: Verify Configuration

```bash
# View generated config
cat ~/.config/sshsync/config.yaml
```

**Expected output**:
```yaml
groups:
  production:
    - prod-web-01
    - prod-web-02
    - prod-db-01
  web:
    - prod-web-01
    - prod-web-02
  database:
    - prod-db-01
  development:
    - dev-laptop
    - dev-desktop
  homelab:
    - homelab-1
    - homelab-2
```

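One useful cross-check is that every group member in the generated YAML also exists as a `Host` alias in `~/.ssh/config`. A naive stdlib-only sketch of that check, assuming the simple file layouts shown above (a real check could use a proper YAML parser instead):

```python
def ssh_aliases(ssh_config_text):
    """Collect concrete Host aliases from SSH config text."""
    aliases = set()
    for line in ssh_config_text.splitlines():
        parts = line.strip().split()
        if parts and parts[0].lower() == "host":
            aliases.update(a for a in parts[1:] if "*" not in a)
    return aliases


def group_members(sshsync_yaml_text):
    """Collect hostnames listed under any group ("- name" lines)."""
    members = set()
    for line in sshsync_yaml_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("- "):
            members.add(stripped[2:].strip())
    return members


def unknown_members(ssh_config_text, sshsync_yaml_text):
    """Group members with no matching SSH config alias."""
    return group_members(sshsync_yaml_text) - ssh_aliases(ssh_config_text)
```

An empty result means every group member resolves to a configured host; anything returned would fail at connect time.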
### Step 4.4: Test sshsync

```bash
# List hosts
sshsync ls

# List with status
sshsync ls --with-status

# Test command execution
sshsync all "hostname"

# Test group execution
sshsync group homelab "uptime"
```

## Step 5: Install Agent

### Step 5.1: Navigate to Agent Directory

```bash
cd /path/to/tailscale-sshsync-agent
```

### Step 5.2: Verify Agent Structure

```bash
# List files
ls -la

# Should see:
# .claude-plugin/
# scripts/
# tests/
# references/
# SKILL.md
# README.md
# VERSION
# CHANGELOG.md
# etc.
```

### Step 5.3: Validate marketplace.json

```bash
# Check JSON is valid
python3 -c "import json; json.load(open('.claude-plugin/marketplace.json')); print('✅ Valid JSON')"

# Should output: ✅ Valid JSON
```

### Step 5.4: Install via Claude Code

In Claude Code:

```
/plugin marketplace add /absolute/path/to/tailscale-sshsync-agent
```

**Example**:
```
/plugin marketplace add /Users/you/tailscale-sshsync-agent
```

**Expected output**:
```
✓ Plugin installed successfully
✓ Skill: tailscale-sshsync-agent
✓ Description: Manages distributed workloads and file sharing...
```

### Step 5.5: Verify Installation

In Claude Code:

```
"Which of my machines are online?"
```

**Expected response**: Agent should activate and check your Tailscale network.

## Step 6: Test Installation

### Test 1: Host Status

**Query**:
```
"Which of my machines are online?"
```

**Expected**: List of hosts with online/offline status

### Test 2: List Groups

**Query**:
```
"What groups do I have configured?"
```

**Expected**: List of your sshsync groups

### Test 3: Execute Command

**Query**:
```
"Check disk space on homelab machines"
```

**Expected**: Disk usage for hosts in the homelab group

### Test 4: Dry-Run

**Query**:
```
"Show me what would happen if I ran 'uptime' on all machines (dry-run)"
```

**Expected**: Preview without execution

### Test 5: Run Test Suite

```bash
cd /path/to/tailscale-sshsync-agent

# Run all tests
python3 tests/test_integration.py

# Should show:
# Results: 11/11 passed
# 🎉 All tests passed!
```

## Troubleshooting

### Agent Not Activating

**Symptoms**: Agent doesn't respond to queries about machines/hosts

**Solutions**:

1. **Check installation**:
   ```
   /plugin list
   ```
   Should show `tailscale-sshsync-agent` in the list.

2. **Reinstall**:
   ```
   /plugin remove tailscale-sshsync-agent
   /plugin marketplace add /path/to/tailscale-sshsync-agent
   ```

3. **Check marketplace.json**:
   ```bash
   cat .claude-plugin/marketplace.json
   # Verify "description" field matches SKILL.md frontmatter
   ```

### SSH Connection Fails

**Symptoms**: "Permission denied" or "Connection refused"

**Solutions**:

1. **Check SSH key**:
   ```bash
   ssh-add -l
   # Should list your SSH key
   ```

   If not listed:
   ```bash
   ssh-add ~/.ssh/id_ed25519
   ```

2. **Test SSH directly**:
   ```bash
   ssh -v hostname
   # -v shows verbose debug info
   ```

3. **Verify authorized_keys on remote**:
   ```bash
   ssh hostname "cat ~/.ssh/authorized_keys"
   # Should contain your public key
   ```

### Tailscale Connection Issues

**Symptoms**: Hosts show as offline in Tailscale

**Solutions**:

1. **Check Tailscale status**:
   ```bash
   tailscale status
   ```

2. **Restart Tailscale**:
   ```bash
   # macOS
   brew services restart tailscale

   # Linux
   sudo systemctl restart tailscaled
   ```

3. **Re-authenticate**:
   ```bash
   sudo tailscale up
   ```

### sshsync Errors

**Symptoms**: "sshsync: command not found"

**Solutions**:

1. **Reinstall sshsync**:
   ```bash
   pip3 install --upgrade sshsync
   ```

2. **Check PATH**:
   ```bash
   which sshsync
   # Should show path to sshsync
   ```

   If not found, add to PATH:
   ```bash
   echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
   source ~/.bashrc
   ```

### Config File Issues

**Symptoms**: "Group not found" or "Host not found"

**Solutions**:

1. **Verify SSH config**:
   ```bash
   cat ~/.ssh/config
   # Check host aliases are correct
   ```

2. **Verify sshsync config**:
   ```bash
   cat ~/.config/sshsync/config.yaml
   # Check groups are defined
   ```

3. **Re-sync**:
   ```bash
   sshsync sync
   ```

### Test Failures

**Symptoms**: Tests fail with errors

**Solutions**:

1. **Check dependencies**:
   ```bash
   pip3 list | grep -E "sshsync|pyyaml"
   ```

2. **Check Python version**:
   ```bash
   python3 --version
   # Must be 3.10+
   ```

3. **Run tests individually**:
   ```bash
   python3 tests/test_helpers.py
   python3 tests/test_validation.py
   python3 tests/test_integration.py
   ```

## Post-Installation

### Recommended Next Steps

1. **Create more groups** for better organization:
   ```bash
   sshsync gadd staging
   sshsync gadd backup-servers
   ```

2. **Test file operations**:
   ```
   "Push test file to homelab machines (dry-run)"
   ```

3. **Set up automation**:
   - Create scripts for common tasks
   - Schedule backups
   - Automate deployments

4. **Review documentation**:
   - Read `references/sshsync-guide.md` for advanced sshsync usage
   - Read `references/tailscale-integration.md` for Tailscale tips

### Security Checklist

- ✅ SSH keys are password-protected
- ✅ SSH config has correct permissions (600)
- ✅ Private keys have correct permissions (600)
- ✅ Tailscale ACLs configured (if using teams)
- ✅ Only necessary hosts have SSH access
- ✅ Regularly review connected devices in Tailscale

## Summary

You now have:

1. ✅ Tailscale installed and connected
2. ✅ sshsync installed and configured
3. ✅ SSH keys set up on all machines
4. ✅ SSH config with all hosts
5. ✅ sshsync groups organized
6. ✅ Agent installed in Claude Code
7. ✅ Tests passing

**Start using**:

```
"Which machines are online?"
"Run this on the least loaded machine"
"Push files to production servers"
"Deploy to staging then production"
```

For more examples, see README.md and SKILL.md.

## Support

If you encounter issues:

1. Check this troubleshooting section
2. Review references/ for detailed guides
3. Check DECISIONS.md for architecture rationale
4. Run tests to verify installation

Happy automating! 🚀
3
README.md
Normal file
@@ -0,0 +1,3 @@
# tailscale-sshsync-agent

Manages distributed workloads and file sharing across Tailscale SSH-connected machines. Automates remote command execution, intelligent load balancing, file synchronization workflows, host health monitoring, and multi-machine orchestration using sshsync.
117
plugin.lock.json
Normal file
@@ -0,0 +1,117 @@
{
  "$schema": "internal://schemas/plugin.lock.v1.json",
  "pluginId": "gh:Human-Frontier-Labs-Inc/human-frontier-labs-marketplace:plugins/tailscale-sshsync-agent",
  "normalized": {
    "repo": null,
    "ref": "refs/tags/v20251128.0",
    "commit": "3a7cbe9632f245c6b9a4c4bf2731da65c857a7f4",
    "treeHash": "832bc62ce02c782663e60a2eb97932166fef39c681a9ca01b9d5dc170860b805",
    "generatedAt": "2025-11-28T10:11:41.356928Z",
    "toolVersion": "publish_plugins.py@0.2.0"
  },
  "origin": {
    "remote": "git@github.com:zhongweili/42plugin-data.git",
    "branch": "master",
    "commit": "aa1497ed0949fd50e99e70d6324a29c5b34f9390",
    "repoRoot": "/Users/zhongweili/projects/openmind/42plugin-data"
  },
  "manifest": {
    "name": "tailscale-sshsync-agent",
    "description": "Manages distributed workloads and file sharing across Tailscale SSH-connected machines. Automates remote command execution, intelligent load balancing, file synchronization workflows, host health monitoring, and multi-machine orchestration using sshsync.",
    "version": null
  },
  "content": {
    "files": [
      {
        "path": "CHANGELOG.md",
        "sha256": "74dbda933868b7cab410144a831b43e4f1ae6161f2402edcb068a8232c50bfe4"
      },
      {
        "path": "README.md",
        "sha256": "470f165d8ac61a8942e6fb3568c49febb7f803bfa0f4010d14e09f807c34c88e"
      },
      {
        "path": "VERSION",
        "sha256": "59854984853104df5c353e2f681a15fc7924742f9a2e468c29af248dce45ce03"
      },
      {
        "path": "SKILL.md",
        "sha256": "31c8f237f9b3617c32c6ff381ae83d427b50eb0877d3763d9826e00ece6618f1"
      },
      {
        "path": "INSTALLATION.md",
        "sha256": "9313ea1bbb0a03e4c078c41b207f3febe800cd38eb57b7205c7b5188238ca46a"
      },
      {
        "path": "DECISIONS.md",
        "sha256": "59549e84aaa8e32d4bdf64d46855714f5cde7f061906e1c74976658883472c82"
      },
      {
        "path": "references/tailscale-integration.md",
        "sha256": "6553b3ceeaca5118a7b005368223ea4b3ab70eb2492ccaf5c2b7f7758b65dd42"
      },
      {
        "path": "references/sshsync-guide.md",
        "sha256": "697ce0b56eda258732a0b924f821e9e24eb6b977934153bdd2045be961e58de2"
      },
      {
        "path": "tests/test_validation.py",
        "sha256": "716ae0d2e86f0e6657903aef6bb714fbd3b5b72d3b109fab4da3f75f90cc2c0a"
      },
      {
        "path": "tests/test_helpers.py",
        "sha256": "3be88e30825414eb3ade048b766c84995dc98a01cb7236ce75201716179279a8"
      },
      {
        "path": "tests/test_integration.py",
        "sha256": "12f7cb857fda23531a9c74caf072cf73b739672b1e99c55f42a2ef8e11238523"
      },
      {
        "path": "scripts/load_balancer.py",
        "sha256": "9d87476562ac848a026e42116e381f733d520e9330da33de3d905585af14398d"
      },
      {
        "path": "scripts/tailscale_manager.py",
        "sha256": "4b75ebb9423d221b9788eb9352b274e0256c101185de11064a7b4cb00684016e"
      },
      {
        "path": "scripts/workflow_executor.py",
        "sha256": "9f23f3bb421e940766e65949e6efa485a313115e297d4c5f1088589155a7bac1"
      },
      {
        "path": "scripts/sshsync_wrapper.py",
        "sha256": "fc2062ebbc72e3ddc6c6bfb5f22019b23050f5c2ed9ac35c315018a96871fb19"
      },
      {
        "path": "scripts/utils/helpers.py",
        "sha256": "b01979ee56ab92037b8f8054a883124d600b8337cf461855092b866091aed24a"
      },
      {
        "path": "scripts/utils/validators/connection_validator.py",
        "sha256": "9ac82108e69690b74d9aa89ca51f7d06fe860e880aaa1983d08242d7199d1601"
      },
      {
        "path": "scripts/utils/validators/parameter_validator.py",
        "sha256": "157dfcb7f1937df88344647a37a124d52e1de1b992b72c9b9e69d3b717ca0195"
      },
      {
        "path": "scripts/utils/validators/__init__.py",
        "sha256": "2d109ad1b5d253578a095c8354159fdf9318154b4f62d9b16eaa1a88a422382d"
      },
      {
        "path": "scripts/utils/validators/host_validator.py",
        "sha256": "79cab42587435a799349ba8a562c4ec0f3d54f3f2790562c894c6289beade6d6"
      },
      {
        "path": ".claude-plugin/plugin.json",
        "sha256": "0ec7466bbf2e8dc2fe1607feff0cc0ef0ebebf44ff54f17dcce96255e2c21215"
      }
    ],
    "dirSha256": "832bc62ce02c782663e60a2eb97932166fef39c681a9ca01b9d5dc170860b805"
  },
  "security": {
    "scannedAt": null,
    "scannerVersion": null,
    "flags": []
  }
}
466
references/sshsync-guide.md
Normal file
@@ -0,0 +1,466 @@
# sshsync CLI Tool Guide

Complete reference for using sshsync with Tailscale SSH Sync Agent.

## Table of Contents

1. [Installation](#installation)
2. [Configuration](#configuration)
3. [Core Commands](#core-commands)
4. [Advanced Usage](#advanced-usage)
5. [Troubleshooting](#troubleshooting)

## Installation

### Via pip

```bash
pip install sshsync
```

### Verify Installation

```bash
sshsync --version
```

## Configuration

### 1. SSH Config Setup

sshsync uses your existing SSH configuration. Edit `~/.ssh/config`:

```
# Example host entries
Host homelab-1
    HostName 100.64.1.10
    User admin
    IdentityFile ~/.ssh/id_ed25519
    Port 22

Host prod-web-01
    HostName 100.64.1.20
    User deploy
    IdentityFile ~/.ssh/id_rsa
    Port 22

Host dev-laptop
    HostName 100.64.1.30
    User developer
```

**Important Notes**:
- sshsync uses the **Host alias** (e.g., "homelab-1"), not the actual hostname
- Ensure SSH key authentication is configured
- Test each host with `ssh host-alias` before using it with sshsync

### 2. Initialize sshsync Configuration

First run:

```bash
sshsync sync
```

This will:
1. Read all hosts from your SSH config
2. Prompt you to assign hosts to groups
3. Create `~/.config/sshsync/config.yaml`

### 3. sshsync Config File

Location: `~/.config/sshsync/config.yaml`

Structure:
```yaml
groups:
  production:
    - prod-web-01
    - prod-web-02
    - prod-db-01
  development:
    - dev-laptop
    - dev-desktop
  homelab:
    - homelab-1
    - homelab-2
```

**Manual Editing**:
- Groups are arbitrary labels (use what makes sense for you)
- Hosts can belong to multiple groups
- Use consistent host aliases from SSH config

## Core Commands

### List Hosts

```bash
# List all configured hosts
sshsync ls

# List with online/offline status
sshsync ls --with-status
```

**Output Example**:
```
Host          Status
homelab-1     online
homelab-2     offline
prod-web-01   online
dev-laptop    online
```

### Execute Commands

#### On All Hosts

```bash
# Execute on all configured hosts
sshsync all "df -h"

# With custom timeout (default: 10s)
sshsync all --timeout 20 "systemctl status nginx"

# Dry-run (preview without executing)
sshsync all --dry-run "reboot"
```

#### On Specific Group

```bash
# Execute on group
sshsync group production "uptime"

# With timeout
sshsync group web-servers --timeout 30 "npm run build"

# Filter with regex
sshsync group production --regex "web-.*" "df -h"
```

**Regex Filtering**:
- Filters group members by alias matching the pattern
- Uses Python regex syntax
- Example: `--regex "web-0[1-3]"` matches web-01, web-02, web-03

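Because the filter takes Python regex syntax, you can preview which aliases a pattern will select before executing anything. An illustrative sketch with a hypothetical host list (whether sshsync anchors the match is an implementation detail; this sketch uses `re.fullmatch`):

```python
import re


def filter_hosts(hosts, pattern):
    """Return the host aliases whose full alias matches the regex pattern."""
    return [h for h in hosts if re.fullmatch(pattern, h)]


# Hypothetical group members
hosts = ["web-01", "web-02", "web-03", "web-10", "db-01"]
print(filter_hosts(hosts, r"web-0[1-3]"))  # → ['web-01', 'web-02', 'web-03']
```

Note that `web-10` and `db-01` are excluded, matching the example above.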
### File Transfer

#### Push Files

```bash
# Push to specific host
sshsync push --host web-01 ./app /var/www/app

# Push to group
sshsync push --group production ./dist /var/www/app

# Push to all hosts
sshsync push --all ./config.yml /etc/app/config.yml

# Recursive push (directory with contents)
sshsync push --group web --recurse ./app /var/www/app

# Dry-run
sshsync push --group production --dry-run ./dist /var/www/app
```

**Important**:
- Local path comes first, remote path second
- Use `--recurse` for directories
- Dry-run shows what would be transferred without executing

#### Pull Files

```bash
# Pull from specific host
sshsync pull --host db-01 /var/log/mysql/error.log ./logs/

# Pull from group (creates separate directories per host)
sshsync pull --group databases /var/backups ./backups/

# Recursive pull
sshsync pull --host web-01 --recurse /var/www/app ./backup/
```

**Pull Behavior**:
- When pulling from groups, a subdirectory is created per host
- Use `--recurse` to pull entire directory trees
- Destination directory is created if it doesn't exist

### Group Management

#### Add Hosts to Group

```bash
# Interactive: prompts to select hosts
sshsync gadd production

# Follow prompts to select which hosts to add
```

#### Add Host to SSH Config

```bash
# Interactive host addition
sshsync hadd

# Follow prompts for:
# - Host alias
# - Hostname/IP
# - Username
# - Port (optional)
# - Identity file (optional)
```

#### Sync Ungrouped Hosts

```bash
# Assign groups to hosts not yet in any group
sshsync sync
```

## Advanced Usage

### Parallel Execution

sshsync automatically executes commands in parallel across hosts:

```bash
# This runs simultaneously on all hosts in the group
sshsync group web-servers "npm run build"
```

**Performance**:
- Commands execute concurrently
- Results are collected as they complete
- Timeout applies per-host independently

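This fan-out pattern is easy to sketch. A minimal Python illustration of concurrent per-host execution where one slow or failed host doesn't block the others (`run_on_host` is a stand-in, not sshsync's internals):

```python
from concurrent.futures import ThreadPoolExecutor


def run_on_host(host, command):
    """Stand-in for an SSH call; replace with a real remote execution."""
    return f"{host}: ran {command!r}"


def run_everywhere(hosts, command, timeout=10):
    """Run a command concurrently on every host; each result waits with its own timeout."""
    results = {}
    with ThreadPoolExecutor(max_workers=max(len(hosts), 1)) as pool:
        # Submit all hosts up front so they run concurrently
        futures = {h: pool.submit(run_on_host, h, command) for h in hosts}
        for host, future in futures.items():
            try:
                results[host] = future.result(timeout=timeout)
            except Exception as exc:  # a failed or timed-out host doesn't stop the rest
                results[host] = f"error: {exc}"
    return results
```

sshsync's real behavior may differ in detail; the point is only that each host's result (or error) is collected independently.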
### Timeout Strategies

Different operations need different timeouts:

```bash
# Quick checks (5-10s)
sshsync all --timeout 5 "hostname"

# Moderate operations (30-60s)
sshsync group web --timeout 60 "npm install"

# Long-running tasks (300s+)
sshsync group build --timeout 300 "docker build ."
```

**Timeout Best Practices**:
- Set the timeout 20-30% longer than the expected duration
- Use dry-run first to estimate timing
- Increase the timeout for network-intensive operations

### Combining with Other Tools

#### With xargs

```bash
# Get list of online hosts
sshsync ls --with-status | grep online | awk '{print $1}' | xargs -I {} echo "Host {} is online"
```

#### With jq (if using JSON output)

```bash
# Parse structured output (if sshsync supports a --json flag)
sshsync ls --json | jq '.hosts[] | select(.status=="online") | .name'
```

#### In Shell Scripts

```bash
#!/bin/bash

# Deploy script using sshsync
echo "Deploying to staging..."
sshsync push --group staging --recurse ./dist /var/www/app

if [ $? -eq 0 ]; then
    echo "Staging deployment successful"

    echo "Running tests..."
    sshsync group staging "cd /var/www/app && npm test"

    if [ $? -eq 0 ]; then
        echo "Tests passed, deploying to production..."
        sshsync push --group production --recurse ./dist /var/www/app
    fi
fi
```

## Troubleshooting

### Common Issues

#### 1. "Permission denied (publickey)"

**Cause**: SSH key not configured or not added to ssh-agent

**Solution**:
```bash
# Add SSH key to agent
ssh-add ~/.ssh/id_ed25519

# Verify it's added
ssh-add -l

# Copy public key to remote
ssh-copy-id user@host
```

#### 2. "Connection timed out"

**Cause**: Host is offline or there is a network issue

**Solution**:
```bash
# Test connectivity
ping hostname

# Test Tailscale specifically
tailscale ping hostname

# Check Tailscale status
tailscale status
```

#### 3. "Host not found in SSH config"

**Cause**: Host alias not in `~/.ssh/config`

**Solution**:
```bash
# Add host to SSH config
sshsync hadd

# Or manually edit ~/.ssh/config
vim ~/.ssh/config
```

#### 4. "Group not found"

**Cause**: Group doesn't exist in sshsync config

**Solution**:
```bash
# Add hosts to a new group
sshsync gadd mygroup

# Or manually edit the config
vim ~/.config/sshsync/config.yaml
```

#### 5. File Transfer Fails

**Cause**: Insufficient permissions, disk space, or the path doesn't exist

**Solution**:
```bash
# Check remote disk space
sshsync group production "df -h"

# Check remote path exists
sshsync group production "ls -ld /target/path"

# Check permissions
sshsync group production "ls -la /target/path"
```

### Debug Mode

sshsync doesn't have a built-in verbose mode, so debug the underlying SSH connection directly:

```bash
# Increase SSH verbosity for a single host
ssh -v host-alias "uptime"

# Or use dry-run to see what would execute
sshsync all --dry-run "command"
```

### Performance Issues

If operations are slow:

1. **Reduce parallelism** (run on fewer hosts at once)
2. **Increase timeout** for network-bound operations
3. **Check network latency**:
   ```bash
   sshsync all --timeout 5 'echo $HOSTNAME'
   ```

### Configuration Validation

```bash
# Verify SSH config is readable
cat ~/.ssh/config

# Verify sshsync config
cat ~/.config/sshsync/config.yaml

# Test hosts individually
for host in $(sshsync ls | awk '{print $1}'); do
    echo "Testing $host..."
    ssh "$host" "echo OK" || echo "FAILED: $host"
done
```

## Best Practices

1. **Use meaningful host aliases** in SSH config
2. **Organize groups logically** (by function, environment, location)
3. **Always dry-run first** for destructive operations
4. **Set appropriate timeouts** based on operation type
5. **Test SSH keys** before using sshsync
6. **Keep groups updated** as infrastructure changes
7. **Use `--with-status`** to check availability before operations

## Integration with Tailscale

sshsync works seamlessly with Tailscale SSH:

```
# SSH config using a Tailscale hostname
Host homelab-1
    HostName homelab-1.tailnet.ts.net
    User admin

# Or using a Tailscale IP directly
Host homelab-1
    HostName 100.64.1.10
    User admin
```
|
||||||
|
```
|
||||||
|
|
||||||
|
**Tailscale Advantages**:
|
||||||
|
- No need for port forwarding
|
||||||
|
- Encrypted connections
|
||||||
|
- MagicDNS for easy hostnames
|
||||||
|
- Works across NATs
|
||||||
|
|
||||||
|
**Verify Tailscale**:
|
||||||
|
```bash
|
||||||
|
# Check Tailscale network
|
||||||
|
tailscale status
|
||||||
|
|
||||||
|
# Ping host via Tailscale
|
||||||
|
tailscale ping homelab-1
|
||||||
|
```
|
||||||
|
|
||||||
|
## Summary
|
||||||
|
|
||||||
|
sshsync simplifies multi-host SSH operations:
|
||||||
|
- ✅ Execute commands across host groups
|
||||||
|
- ✅ Transfer files to/from multiple hosts
|
||||||
|
- ✅ Organize hosts into logical groups
|
||||||
|
- ✅ Parallel execution for speed
|
||||||
|
- ✅ Dry-run mode for safety
|
||||||
|
- ✅ Works great with Tailscale
|
||||||
|
|
||||||
|
For more help: `sshsync --help`
|
||||||
468
references/tailscale-integration.md
Normal file
@@ -0,0 +1,468 @@
# Tailscale Integration Guide

How to use Tailscale SSH with sshsync for secure, zero-config remote access.

## What is Tailscale?

Tailscale is a zero-config VPN that creates a secure network between your devices using WireGuard. It provides:

- **Peer-to-peer encrypted connections**
- **No port forwarding required**
- **Works across NATs and firewalls**
- **MagicDNS for easy device addressing**
- **Built-in SSH functionality**
- **Access control lists (ACLs)**

## Why Tailscale + sshsync?

Combining Tailscale with sshsync gives you:

1. **Secure connections** everywhere (Tailscale encryption)
2. **Simple addressing** (MagicDNS hostnames)
3. **Multi-host operations** (sshsync groups and execution)
4. **No firewall configuration** needed
5. **Works from anywhere** (coffee shop, home, office)

## Setup

### 1. Install Tailscale

**macOS**:
```bash
brew install tailscale
```

**Linux**:
```bash
curl -fsSL https://tailscale.com/install.sh | sh
```

**Verify the installation**:
```bash
tailscale version
```

### 2. Connect to Tailscale

```bash
# Start Tailscale
sudo tailscale up

# Follow the authentication link
# (this opens a browser to authenticate)

# Verify the connection
tailscale status
```

### 3. Configure SSH via Tailscale

Tailscale provides two SSH options:

#### Option A: Tailscale SSH (Built-in)

**Enable on each machine**:
```bash
sudo tailscale up --ssh
```

**Use**:
```bash
tailscale ssh user@machine-name
```

**Advantages**:
- No SSH server configuration needed
- Uses Tailscale authentication
- Automatic key management

#### Option B: Standard SSH over Tailscale (Recommended for sshsync)

**Configure your SSH config** to use Tailscale hostnames:

```bash
# ~/.ssh/config

Host homelab-1
    HostName homelab-1.tailnet-name.ts.net
    User admin
    IdentityFile ~/.ssh/id_ed25519

# Or use the Tailscale IP directly
Host homelab-2
    HostName 100.64.1.10
    User admin
    IdentityFile ~/.ssh/id_ed25519
```

**Advantages**:
- Works with all SSH tools (including sshsync)
- Standard SSH key authentication
- More flexibility

## Getting Tailscale Hostnames and IPs

### View All Machines

```bash
tailscale status
```

**Output**:
```
100.64.1.10   homelab-1   user@   linux   -
100.64.1.11   homelab-2   user@   linux   -
100.64.1.20   laptop      user@   macOS   -
100.64.1.30   phone       user@   iOS     offline
```

### Get the MagicDNS Hostname

**Format**: `machine-name.tailnet-name.ts.net`

**Find your tailnet name**:
```bash
tailscale status --json | grep -i tailnet
```

Or check the Tailscale admin console: https://login.tailscale.com/admin/machines

### Get a Tailscale IP

```bash
# Your own IP
tailscale ip -4

# Another machine's IP (from the status output)
tailscale status | grep machine-name
```
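
The status table above can also be consumed programmatically. A minimal sketch, assuming `tailscale status --json` exposes a top-level `Peer` map whose entries carry `HostName`, `TailscaleIPs`, and `Online` fields (verify against your Tailscale version's output):

```python
import json

def parse_peers(status_json: str) -> list:
    """Extract (hostname, first Tailscale IP, online?) for each peer."""
    status = json.loads(status_json)
    peers = []
    for peer in status.get("Peer", {}).values():
        ips = peer.get("TailscaleIPs") or []
        peers.append((peer.get("HostName"),
                      ips[0] if ips else None,
                      bool(peer.get("Online"))))
    return peers

# Usage on a live tailnet:
#   raw = subprocess.run(["tailscale", "status", "--json"],
#                        capture_output=True, text=True, check=True).stdout
#   parse_peers(raw)
```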

## Testing Connectivity

### Ping via Tailscale

```bash
# Ping by hostname
tailscale ping homelab-1

# Ping by IP
tailscale ping 100.64.1.10
```

**Successful output**:
```
pong from homelab-1 (100.64.1.10) via DERP(nyc) in 45ms
pong from homelab-1 (100.64.1.10) via DERP(nyc) in 43ms
```

**Failed output**:
```
timeout waiting for pong
```

### SSH Test

```bash
# Test the SSH connection
ssh user@homelab-1.tailnet.ts.net

# Or with the IP
ssh user@100.64.1.10
```

## Configuring sshsync with Tailscale

### Step 1: Add Tailscale Hosts to Your SSH Config

```bash
vim ~/.ssh/config
```

**Example configuration**:
```
# Production servers
Host prod-web-01
    HostName prod-web-01.tailnet.ts.net
    User deploy
    IdentityFile ~/.ssh/id_ed25519

Host prod-web-02
    HostName prod-web-02.tailnet.ts.net
    User deploy
    IdentityFile ~/.ssh/id_ed25519

Host prod-db-01
    HostName prod-db-01.tailnet.ts.net
    User deploy
    IdentityFile ~/.ssh/id_ed25519

# Homelab
Host homelab-1
    HostName 100.64.1.10
    User admin
    IdentityFile ~/.ssh/id_ed25519

Host homelab-2
    HostName 100.64.1.11
    User admin
    IdentityFile ~/.ssh/id_ed25519

# Development
Host dev-laptop
    HostName dev-laptop.tailnet.ts.net
    User developer
    IdentityFile ~/.ssh/id_ed25519
```

### Step 2: Test Each Host

```bash
# Test connectivity to each host
ssh prod-web-01 "hostname"
ssh homelab-1 "hostname"
ssh dev-laptop "hostname"
```

### Step 3: Initialize sshsync

```bash
# Sync hosts and create groups
sshsync sync

# Add hosts to groups
sshsync gadd production
# Select: prod-web-01, prod-web-02, prod-db-01

sshsync gadd homelab
# Select: homelab-1, homelab-2

sshsync gadd development
# Select: dev-laptop
```

### Step 4: Verify the Configuration

```bash
# List all hosts with status
sshsync ls --with-status

# Test command execution
sshsync all "uptime"

# Test group execution
sshsync group production "df -h"
```

## Advanced Tailscale Features

### Tailnet Lock

Prevents unauthorized device additions:

```bash
tailscale lock status
```

### Exit Nodes

Route all traffic through a specific machine:

```bash
# Enable exit node on a machine
sudo tailscale up --advertise-exit-node

# Use the exit node from another machine
sudo tailscale set --exit-node=exit-node-name
```

### Subnet Routing

Access networks behind Tailscale machines:

```bash
# Advertise subnet routes
sudo tailscale up --advertise-routes=192.168.1.0/24
```

### ACLs (Access Control Lists)

Control who can access what: https://login.tailscale.com/admin/acls

**Example ACL**:
```json
{
  "acls": [
    {
      "action": "accept",
      "src": ["group:admins"],
      "dst": ["*:22", "*:80", "*:443"]
    },
    {
      "action": "accept",
      "src": ["group:developers"],
      "dst": ["tag:development:*"]
    }
  ]
}
```

## Troubleshooting

### Machine Shows Offline

**Check Tailscale status**:
```bash
tailscale status
```

**Restart Tailscale**:
```bash
# macOS
brew services restart tailscale

# Linux
sudo systemctl restart tailscaled
```

**Re-authenticate**:
```bash
sudo tailscale up
```

### Cannot Connect via SSH

1. **Verify Tailscale connectivity**:
```bash
tailscale ping machine-name
```

2. **Check SSH is running** on the remote:
```bash
tailscale ssh machine-name "systemctl status sshd"
```

3. **Verify your SSH keys**:
```bash
ssh-add -l
```

4. **Test SSH directly**:
```bash
ssh -v user@machine-name.tailnet.ts.net
```

### High Latency

**Check the connection method**:
```bash
tailscale status
```

Look for "direct" vs "DERP relay":
- **Direct**: low latency (< 50ms)
- **DERP relay**: higher latency (100-200ms)

**Force a direct connection**:
```bash
# Ensure both machines can establish a P2P path
# (may require NAT traversal)
```
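
Whether a peer is direct or relayed can also be read from the JSON status. A hedged sketch: it assumes each entry in the `Peer` map carries a `CurAddr` field (set to `ip:port` when the path is peer-to-peer, empty when relayed) and a `Relay` field naming the DERP region; check these fields against your Tailscale version before relying on them:

```python
import json

def connection_type(status_json: str, host: str) -> str:
    """Classify the path to `host` as 'direct', 'relay (<region>)', or 'unknown'."""
    status = json.loads(status_json)
    for peer in status.get("Peer", {}).values():
        if peer.get("HostName") == host:
            if peer.get("CurAddr"):    # ip:port of the P2P endpoint
                return "direct"
            if peer.get("Relay"):      # DERP region code, e.g. "nyc"
                return f"relay ({peer['Relay']})"
            return "unknown"
    raise KeyError(f"no peer named {host!r}")
```

Feed it the output of `tailscale status --json`; a `relay (...)` result explains latencies in the 100-200 ms range.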

### MagicDNS Not Working

**Enable MagicDNS**:
1. Go to https://login.tailscale.com/admin/dns
2. Enable MagicDNS

**Verify**:
```bash
nslookup machine-name.tailnet.ts.net
```

## Security Best Practices

1. **Use SSH keys**, not passwords
2. **Enable Tailnet Lock** to prevent unauthorized devices
3. **Use ACLs** to restrict access
4. **Regularly review** connected devices
5. **Set up key expiry** for team members who leave
6. **Use tags** for machine roles
7. **Enable two-factor auth** for your Tailscale account

## Monitoring

### Check Network Status

```bash
# All machines
tailscale status

# Self status
tailscale status --self

# JSON format for parsing
tailscale status --json
```

### View Logs

```bash
# macOS
tail -f /var/log/tailscaled.log

# Linux
journalctl -u tailscaled -f
```

## Use Cases with sshsync

### 1. Deploy to All Production Servers

```bash
sshsync push --group production --recurse ./dist /var/www/app
sshsync group production "cd /var/www/app && pm2 restart all"
```

### 2. Collect Logs from All Servers

```bash
sshsync pull --group production /var/log/app/error.log ./logs/
```

### 3. Update All Homelab Machines

```bash
sshsync group homelab "sudo apt update && sudo apt upgrade -y"
```

### 4. Check Disk Space Everywhere

```bash
sshsync all "df -h /"
```

### 5. Sync Configuration Across Machines

```bash
sshsync push --all ~/dotfiles/.bashrc ~/.bashrc
sshsync push --all ~/dotfiles/.vimrc ~/.vimrc
```

## Summary

Tailscale + sshsync = **powerful remote management**:

- ✅ Secure connections everywhere (WireGuard encryption)
- ✅ No firewall configuration needed
- ✅ Easy addressing (MagicDNS)
- ✅ Multi-host operations (sshsync groups)
- ✅ Works from anywhere

**Quick Start**:
1. Install Tailscale: `brew install tailscale`
2. Connect: `sudo tailscale up`
3. Configure your SSH config with Tailscale hostnames
4. Initialize sshsync: `sshsync sync`
5. Start managing: `sshsync all "uptime"`

For more: https://tailscale.com/kb/
378
scripts/load_balancer.py
Normal file
@@ -0,0 +1,378 @@
#!/usr/bin/env python3
"""
Load balancer for Tailscale SSH Sync Agent.

Intelligent task distribution based on machine resources.
"""

import sys
from pathlib import Path
from typing import Dict, List, Optional, Tuple
from dataclasses import dataclass
import logging

# Add utils to path
sys.path.insert(0, str(Path(__file__).parent))

from utils.helpers import (
    parse_cpu_load,
    parse_memory_usage,
    parse_disk_usage,
    calculate_load_score,
    classify_load_status,
)
from sshsync_wrapper import execute_on_host

logger = logging.getLogger(__name__)


@dataclass
class MachineMetrics:
    """Resource metrics for a machine."""
    host: str
    cpu_pct: float
    mem_pct: float
    disk_pct: float
    load_score: float
    status: str


def get_machine_load(host: str, timeout: int = 10) -> Optional[MachineMetrics]:
    """
    Get CPU, memory, and disk metrics for a machine.

    Args:
        host: Host to check
        timeout: Command timeout in seconds

    Returns:
        MachineMetrics object, or None on failure

    Example:
        >>> metrics = get_machine_load("web-01")
        >>> metrics.cpu_pct
        45.2
        >>> metrics.load_score
        0.49
    """
    try:
        # Get CPU load
        cpu_result = execute_on_host(host, "uptime", timeout=timeout)
        cpu_data = {}
        if cpu_result.get('success'):
            cpu_data = parse_cpu_load(cpu_result['stdout'])

        # Get memory usage
        mem_result = execute_on_host(host, "free -m 2>/dev/null || vm_stat", timeout=timeout)
        mem_data = {}
        if mem_result.get('success'):
            mem_data = parse_memory_usage(mem_result['stdout'])

        # Get disk usage
        disk_result = execute_on_host(host, "df -h / | tail -1", timeout=timeout)
        disk_data = {}
        if disk_result.get('success'):
            disk_data = parse_disk_usage(disk_result['stdout'])

        # Calculate metrics.
        # CPU: use the 1-minute load average, normalized by an assumed 4 cores
        # (adjust for your hardware); fall back to 50% when parsing failed.
        cpu_pct = (cpu_data.get('load_1min', 0) / 4.0) * 100 if cpu_data else 50.0

        # Memory: direct percentage
        mem_pct = mem_data.get('use_pct', 50.0)

        # Disk: direct percentage
        disk_pct = disk_data.get('use_pct', 50.0)

        # Combine into a single load score and classify it
        score = calculate_load_score(cpu_pct, mem_pct, disk_pct)
        status = classify_load_status(score)

        return MachineMetrics(
            host=host,
            cpu_pct=cpu_pct,
            mem_pct=mem_pct,
            disk_pct=disk_pct,
            load_score=score,
            status=status
        )

    except Exception as e:
        logger.error(f"Error getting load for {host}: {e}")
        return None
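

# utils/helpers.py is not part of this listing; the two scoring helpers it
# provides behave roughly like the reference sketches below. The 50/30/20
# weights are an illustrative assumption, not the actual implementation; the
# 0.4 / 0.7 thresholds match the labels used by get_group_capacity() below.

def _reference_load_score(cpu_pct: float, mem_pct: float, disk_pct: float) -> float:
    """Sketch of calculate_load_score: weighted average normalized to 0..1."""
    return round((0.5 * cpu_pct + 0.3 * mem_pct + 0.2 * disk_pct) / 100.0, 2)


def _reference_load_status(score: float) -> str:
    """Sketch of classify_load_status: map a 0..1 score onto status labels."""
    if score < 0.4:
        return 'low'
    if score < 0.7:
        return 'moderate'
    return 'high'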


def select_optimal_host(candidates: List[str],
                        prefer_group: Optional[str] = None,
                        timeout: int = 10) -> Tuple[Optional[str], Optional[MachineMetrics]]:
    """
    Pick the best host from candidates based on current load.

    Args:
        candidates: List of candidate hosts
        prefer_group: Prefer hosts from this group when their load is comparable
        timeout: Timeout for metric gathering

    Returns:
        Tuple of (selected_host, metrics)

    Example:
        >>> host, metrics = select_optimal_host(["web-01", "web-02", "web-03"])
        >>> host
        "web-03"
        >>> metrics.load_score
        0.28
    """
    if not candidates:
        return None, None

    # Get metrics for all candidates
    metrics_list: List[MachineMetrics] = []

    for host in candidates:
        metrics = get_machine_load(host, timeout=timeout)
        if metrics:
            metrics_list.append(metrics)

    if not metrics_list:
        logger.warning("No valid metrics collected from candidates")
        return None, None

    # Sort by load score (lower is better)
    metrics_list.sort(key=lambda m: m.load_score)

    # If prefer_group is specified, prioritize those hosts when load is similar
    if prefer_group:
        from utils.helpers import parse_sshsync_config, get_groups_for_host
        groups_config = parse_sshsync_config()

        # Find hosts in the preferred group
        preferred_metrics = [
            m for m in metrics_list
            if prefer_group in get_groups_for_host(m.host, groups_config)
        ]

        # Use a preferred host if its load score is within 20% of the absolute best
        if preferred_metrics:
            best_score = metrics_list[0].load_score
            for m in preferred_metrics:
                if m.load_score <= best_score * 1.2:
                    return m.host, m

    # Return the absolute best
    best = metrics_list[0]
    return best.host, best


def get_group_capacity(group: str, timeout: int = 10) -> Dict:
    """
    Get the aggregate capacity of a group.

    Args:
        group: Group name
        timeout: Timeout for metric gathering

    Returns:
        Dict with aggregate metrics:
        {
            'hosts': List[MachineMetrics],
            'total_hosts': int,
            'avg_cpu': float,
            'avg_mem': float,
            'avg_disk': float,
            'avg_load_score': float,
            'total_capacity': str  # descriptive
        }

    Example:
        >>> capacity = get_group_capacity("production")
        >>> capacity['avg_load_score']
        0.45
    """
    from utils.helpers import parse_sshsync_config

    groups_config = parse_sshsync_config()
    group_hosts = groups_config.get(group, [])

    if not group_hosts:
        return {
            'error': f'Group {group} not found or has no members',
            'hosts': []
        }

    # Get metrics for all hosts in the group
    metrics_list: List[MachineMetrics] = []

    for host in group_hosts:
        metrics = get_machine_load(host, timeout=timeout)
        if metrics:
            metrics_list.append(metrics)

    if not metrics_list:
        return {
            'error': f'Could not get metrics for any hosts in {group}',
            'hosts': []
        }

    # Calculate aggregates
    avg_cpu = sum(m.cpu_pct for m in metrics_list) / len(metrics_list)
    avg_mem = sum(m.mem_pct for m in metrics_list) / len(metrics_list)
    avg_disk = sum(m.disk_pct for m in metrics_list) / len(metrics_list)
    avg_score = sum(m.load_score for m in metrics_list) / len(metrics_list)

    # Determine an overall capacity description
    if avg_score < 0.4:
        capacity_desc = "High capacity available"
    elif avg_score < 0.7:
        capacity_desc = "Moderate capacity"
    else:
        capacity_desc = "Limited capacity"

    return {
        'group': group,
        'hosts': metrics_list,
        'total_hosts': len(metrics_list),
        'available_hosts': len(group_hosts),
        'avg_cpu': avg_cpu,
        'avg_mem': avg_mem,
        'avg_disk': avg_disk,
        'avg_load_score': avg_score,
        'total_capacity': capacity_desc
    }


def distribute_tasks(tasks: List[Dict], hosts: List[str],
                     timeout: int = 10) -> Dict[str, List[Dict]]:
    """
    Distribute multiple tasks optimally across hosts.

    Args:
        tasks: List of task dicts (each with 'command', 'priority', etc.)
        hosts: Available hosts
        timeout: Timeout for metric gathering

    Returns:
        Dict mapping hosts to their assigned tasks

    Algorithm:
        - Get the current load for all hosts
        - Assign tasks to the least loaded hosts, heaviest first
        - Balance by estimated task weight

    Example:
        >>> tasks = [
        ...     {'command': 'npm run build', 'weight': 3},
        ...     {'command': 'npm test', 'weight': 2}
        ... ]
        >>> distribution = distribute_tasks(tasks, ["web-01", "web-02"])
        >>> distribution["web-01"]
        [{'command': 'npm run build', 'weight': 3}]
    """
    if not tasks or not hosts:
        return {}

    # Get the current load for all hosts
    host_metrics = {}
    for host in hosts:
        metrics = get_machine_load(host, timeout=timeout)
        if metrics:
            host_metrics[host] = metrics

    if not host_metrics:
        logger.error("No valid host metrics available")
        return {}

    # Initialize the assignment
    assignment: Dict[str, List[Dict]] = {host: [] for host in host_metrics.keys()}
    host_loads = {host: m.load_score for host, m in host_metrics.items()}

    # Sort tasks by weight (descending) so heavy tasks are placed first
    sorted_tasks = sorted(
        tasks,
        key=lambda t: t.get('weight', 1),
        reverse=True
    )

    # Assign each task to the least loaded host
    for task in sorted_tasks:
        # Find the host with the minimum current load
        min_host = min(host_loads.keys(), key=lambda h: host_loads[h])

        # Assign the task
        assignment[min_host].append(task)

        # Update the simulated load (task weight times a scaling factor)
        task_weight = task.get('weight', 1)
        host_loads[min_host] += (task_weight * 0.1)  # 0.1 = scaling factor

    return assignment
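

# The greedy loop above can be exercised without SSH access by simulating the
# starting load scores; the hosts and weights below are hypothetical.

def _demo_distribute() -> dict:
    """Standalone re-run of the assignment loop with simulated loads."""
    tasks = [
        {'command': 'npm run build', 'weight': 3},
        {'command': 'npm test', 'weight': 2},
        {'command': 'npm run lint', 'weight': 1},
    ]
    loads = {'web-01': 0.49, 'web-02': 0.28}
    assignment = {host: [] for host in loads}
    for task in sorted(tasks, key=lambda t: t['weight'], reverse=True):
        target = min(loads, key=loads.get)
        assignment[target].append(task)
        loads[target] += task['weight'] * 0.1  # same scaling factor as above
    # web-02 starts least loaded, so it receives the build and then the lint job
    return assignment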


def format_load_report(metrics: MachineMetrics, compare_to_avg: Optional[Dict] = None) -> str:
    """
    Format load metrics as a human-readable report.

    Args:
        metrics: Machine metrics
        compare_to_avg: Optional dict with avg_cpu, avg_mem, avg_disk for comparison

    Returns:
        Formatted report string

    Example:
        >>> metrics = MachineMetrics('web-01', 45, 60, 40, 0.49, 'moderate')
        >>> print(format_load_report(metrics))
        web-01: Load Score: 0.49 (moderate)
          CPU: 45.0% | Memory: 60.0% | Disk: 40.0%
    """
    lines = [
        f"{metrics.host}: Load Score: {metrics.load_score:.2f} ({metrics.status})",
        f"  CPU: {metrics.cpu_pct:.1f}% | Memory: {metrics.mem_pct:.1f}% | Disk: {metrics.disk_pct:.1f}%"
    ]

    if compare_to_avg:
        cpu_vs = metrics.cpu_pct - compare_to_avg.get('avg_cpu', 0)
        mem_vs = metrics.mem_pct - compare_to_avg.get('avg_mem', 0)
        disk_vs = metrics.disk_pct - compare_to_avg.get('avg_disk', 0)

        # Only report deviations larger than 10 percentage points
        comparisons = []
        if abs(cpu_vs) > 10:
            comparisons.append(f"CPU {'+' if cpu_vs > 0 else ''}{cpu_vs:.0f}% vs avg")
        if abs(mem_vs) > 10:
            comparisons.append(f"Mem {'+' if mem_vs > 0 else ''}{mem_vs:.0f}% vs avg")
        if abs(disk_vs) > 10:
            comparisons.append(f"Disk {'+' if disk_vs > 0 else ''}{disk_vs:.0f}% vs avg")

        if comparisons:
            lines.append(f"  vs Average: {' | '.join(comparisons)}")

    return "\n".join(lines)


def main():
    """Test load balancer functions."""
    print("Testing load balancer...\n")

    print("1. Testing select_optimal_host:")
    print("   (Requires configured hosts - using dry-run simulation)")

    # Simulate metrics
    test_metrics = [
        MachineMetrics('web-01', 45, 60, 40, 0.49, 'moderate'),
        MachineMetrics('web-02', 85, 70, 65, 0.75, 'high'),
        MachineMetrics('web-03', 20, 35, 30, 0.28, 'low'),
    ]

    # Sort by score
    test_metrics.sort(key=lambda m: m.load_score)
    best = test_metrics[0]

    print(f"   ✓ Best host: {best.host} (score: {best.load_score:.2f})")
    print(f"     Reason: {best.status} load")

    print("\n2. Format load report:")
    report = format_load_report(test_metrics[0], {
        'avg_cpu': 50,
        'avg_mem': 55,
        'avg_disk': 45
    })
    print(report)

    print("\n✅ Load balancer tested")


if __name__ == "__main__":
    main()
409
scripts/sshsync_wrapper.py
Normal file
@@ -0,0 +1,409 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
SSH Sync wrapper for Tailscale SSH Sync Agent.
|
||||||
|
Python interface to sshsync CLI operations.
|
||||||
|
"""
|
||||||
|
|
||||||
|
import subprocess
|
||||||
|
import sys
|
||||||
|
from pathlib import Path
|
||||||
|
from typing import Dict, List, Optional, Tuple
|
||||||
|
import json
|
||||||
|
import logging
|
||||||
|
|
||||||
|
# Add utils to path
|
||||||
|
sys.path.insert(0, str(Path(__file__).parent))
|
||||||
|
|
||||||
|
from utils.helpers import parse_ssh_config, parse_sshsync_config, format_bytes, format_duration
|
||||||
|
from utils.validators import validate_host, validate_group, validate_path_exists, validate_timeout, validate_command
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
|
||||||
|
def get_host_status(group: Optional[str] = None) -> Dict:
    """
    Get online/offline status of hosts.

    Args:
        group: Optional group to filter (None = all hosts)

    Returns:
        Dict with status info

    Example:
        >>> status = get_host_status()
        >>> status['online_count']
        8
    """
    try:
        # Run sshsync ls --with-status
        cmd = ["sshsync", "ls", "--with-status"]
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)

        if result.returncode != 0:
            return {'error': result.stderr, 'hosts': []}

        # Parse output
        hosts = []
        for line in result.stdout.strip().split('\n'):
            if not line or line.startswith('Host') or line.startswith('---'):
                continue

            parts = line.split()
            if len(parts) >= 2:
                host_name = parts[0]
                status = parts[1] if len(parts) > 1 else 'unknown'

                hosts.append({
                    'host': host_name,
                    'online': status.lower() in ['online', 'reachable', '✓'],
                    'status': status
                })

        # Filter by group if specified
        if group:
            groups_config = parse_sshsync_config()
            group_hosts = groups_config.get(group, [])
            hosts = [h for h in hosts if h['host'] in group_hosts]

        online_count = sum(1 for h in hosts if h['online'])

        return {
            'hosts': hosts,
            'total_count': len(hosts),
            'online_count': online_count,
            'offline_count': len(hosts) - online_count,
            'availability_pct': (online_count / len(hosts) * 100) if hosts else 0
        }

    except Exception as e:
        logger.error(f"Error getting host status: {e}")
        return {'error': str(e), 'hosts': []}

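The table parsing in `get_host_status` can be exercised standalone. The sample text below is a hypothetical `sshsync ls --with-status` layout (the real CLI's columns may differ):

```python
# Standalone sketch of the status-line parsing above, run against a
# hand-written sample (hypothetical output format).
SAMPLE = """\
Host        Status
---         ---
web-01      online
db-01       offline
cache-01    reachable
"""

def parse_status_lines(text):
    hosts = []
    for line in text.strip().split('\n'):
        # Skip blanks, the header row, and separator rows
        if not line or line.startswith('Host') or line.startswith('---'):
            continue
        parts = line.split()
        if len(parts) >= 2:
            hosts.append({
                'host': parts[0],
                'online': parts[1].lower() in ['online', 'reachable', '✓'],
                'status': parts[1],
            })
    return hosts

hosts = parse_status_lines(SAMPLE)
print(len(hosts), sum(1 for h in hosts if h['online']))  # prints "3 2"
```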
def execute_on_all(command: str, timeout: int = 10, dry_run: bool = False) -> Dict:
    """
    Execute command on all hosts.

    Args:
        command: Command to execute
        timeout: Timeout in seconds
        dry_run: If True, don't actually execute

    Returns:
        Dict with results per host

    Example:
        >>> result = execute_on_all("uptime", timeout=15)
        >>> result['success']
        True
    """
    validate_command(command)
    validate_timeout(timeout)

    if dry_run:
        return {
            'dry_run': True,
            'command': command,
            'message': 'Would execute on all hosts'
        }

    try:
        cmd = ["sshsync", "all", f"--timeout={timeout}", command]
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout + 30)

        # Parse results (format varies, simplified here)
        return {
            'success': result.returncode == 0,
            'stdout': result.stdout,
            'stderr': result.stderr,
            'command': command
        }

    except subprocess.TimeoutExpired:
        return {'error': f'Command timed out after {timeout}s'}
    except Exception as e:
        return {'error': str(e)}


def execute_on_group(group: str, command: str, timeout: int = 10, dry_run: bool = False) -> Dict:
    """
    Execute command on specific group.

    Args:
        group: Group name
        command: Command to execute
        timeout: Timeout in seconds
        dry_run: Preview without executing

    Returns:
        Dict with execution results

    Example:
        >>> result = execute_on_group("web-servers", "df -h /var/www")
        >>> result['success']
        True
    """
    groups_config = parse_sshsync_config()
    validate_group(group, list(groups_config.keys()))
    validate_command(command)
    validate_timeout(timeout)

    if dry_run:
        group_hosts = groups_config.get(group, [])
        return {
            'dry_run': True,
            'group': group,
            'hosts': group_hosts,
            'command': command,
            'message': f'Would execute on {len(group_hosts)} hosts in group {group}'
        }

    try:
        cmd = ["sshsync", "group", f"--timeout={timeout}", group, command]
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout + 30)

        return {
            'success': result.returncode == 0,
            'group': group,
            'stdout': result.stdout,
            'stderr': result.stderr,
            'command': command
        }

    except subprocess.TimeoutExpired:
        return {'error': f'Command timed out after {timeout}s'}
    except Exception as e:
        return {'error': str(e)}

def execute_on_host(host: str, command: str, timeout: int = 10) -> Dict:
    """
    Execute command on single host.

    Args:
        host: Host name
        command: Command to execute
        timeout: Timeout in seconds

    Returns:
        Dict with result

    Example:
        >>> result = execute_on_host("web-01", "hostname")
        >>> result['stdout']
        "web-01"
    """
    ssh_hosts = parse_ssh_config()
    validate_host(host, list(ssh_hosts.keys()))
    validate_command(command)
    validate_timeout(timeout)

    try:
        cmd = ["ssh", "-o", f"ConnectTimeout={timeout}", host, command]
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout + 5)

        return {
            'success': result.returncode == 0,
            'host': host,
            'stdout': result.stdout,
            'stderr': result.stderr,
            'command': command
        }

    except subprocess.TimeoutExpired:
        return {'error': f'Command timed out after {timeout}s'}
    except Exception as e:
        return {'error': str(e)}

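The `try`/`except` structure above (and in the other wrappers) follows one pattern: run a subprocess with a timeout and map failures to an error dict instead of raising. A minimal standalone sketch, using the current Python interpreter as a portable stand-in for `ssh`:

```python
# Sketch of the run-with-timeout pattern the wrappers share: success,
# timeout, and unexpected failure all come back as plain dicts.
import subprocess
import sys

def run_with_timeout(cmd, timeout=10):
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        return {
            'success': result.returncode == 0,
            'stdout': result.stdout,
            'stderr': result.stderr,
        }
    except subprocess.TimeoutExpired:
        return {'error': f'Command timed out after {timeout}s'}
    except Exception as e:
        return {'error': str(e)}

# A fast command succeeds; a slow one is mapped to an error dict.
ok = run_with_timeout([sys.executable, '-c', 'print("hello")'], timeout=5)
slow = run_with_timeout([sys.executable, '-c', 'import time; time.sleep(10)'], timeout=1)
```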
def push_to_hosts(local_path: str, remote_path: str,
                  hosts: Optional[List[str]] = None,
                  group: Optional[str] = None,
                  recurse: bool = False,
                  dry_run: bool = False) -> Dict:
    """
    Push files to hosts.

    Args:
        local_path: Local file/directory path
        remote_path: Remote destination path
        hosts: Specific hosts (None = all if group also None)
        group: Group name
        recurse: Recursive copy
        dry_run: Preview without executing

    Returns:
        Dict with push results

    Example:
        >>> result = push_to_hosts("./dist", "/var/www/app", group="production", recurse=True)
        >>> result['success']
        True
    """
    validate_path_exists(local_path)

    if dry_run:
        return {
            'dry_run': True,
            'local_path': local_path,
            'remote_path': remote_path,
            'hosts': hosts,
            'group': group,
            'recurse': recurse,
            'message': 'Would push files'
        }

    try:
        cmd = ["sshsync", "push"]

        if hosts:
            for host in hosts:
                cmd.extend(["--host", host])
        elif group:
            cmd.extend(["--group", group])
        else:
            cmd.append("--all")

        if recurse:
            cmd.append("--recurse")

        cmd.extend([local_path, remote_path])

        result = subprocess.run(cmd, capture_output=True, text=True, timeout=300)

        return {
            'success': result.returncode == 0,
            'stdout': result.stdout,
            'stderr': result.stderr,
            'local_path': local_path,
            'remote_path': remote_path
        }

    except subprocess.TimeoutExpired:
        return {'error': 'Push operation timed out'}
    except Exception as e:
        return {'error': str(e)}


def pull_from_host(host: str, remote_path: str, local_path: str,
                   recurse: bool = False, dry_run: bool = False) -> Dict:
    """
    Pull files from host.

    Args:
        host: Host to pull from
        remote_path: Remote file/directory path
        local_path: Local destination path
        recurse: Recursive copy
        dry_run: Preview without executing

    Returns:
        Dict with pull results

    Example:
        >>> result = pull_from_host("web-01", "/var/log/nginx", "./logs", recurse=True)
        >>> result['success']
        True
    """
    ssh_hosts = parse_ssh_config()
    validate_host(host, list(ssh_hosts.keys()))

    if dry_run:
        return {
            'dry_run': True,
            'host': host,
            'remote_path': remote_path,
            'local_path': local_path,
            'recurse': recurse,
            'message': f'Would pull from {host}'
        }

    try:
        cmd = ["sshsync", "pull", "--host", host]

        if recurse:
            cmd.append("--recurse")

        cmd.extend([remote_path, local_path])

        result = subprocess.run(cmd, capture_output=True, text=True, timeout=300)

        return {
            'success': result.returncode == 0,
            'host': host,
            'stdout': result.stdout,
            'stderr': result.stderr,
            'remote_path': remote_path,
            'local_path': local_path
        }

    except subprocess.TimeoutExpired:
        return {'error': 'Pull operation timed out'}
    except Exception as e:
        return {'error': str(e)}


def list_hosts(with_status: bool = True) -> Dict:
    """
    List all configured hosts.

    Args:
        with_status: Include online/offline status

    Returns:
        Dict with hosts info

    Example:
        >>> result = list_hosts(with_status=True)
        >>> len(result['hosts'])
        10
    """
    if with_status:
        return get_host_status()
    else:
        ssh_hosts = parse_ssh_config()
        return {
            'hosts': [{'host': name} for name in ssh_hosts.keys()],
            'count': len(ssh_hosts)
        }


def get_groups() -> Dict[str, List[str]]:
    """
    Get all defined groups and their members.

    Returns:
        Dict mapping group names to host lists

    Example:
        >>> groups = get_groups()
        >>> groups['production']
        ['prod-web-01', 'prod-db-01']
    """
    return parse_sshsync_config()


def main():
    """Test sshsync wrapper functions."""
    print("Testing sshsync wrapper...\n")

    print("1. List hosts:")
    result = list_hosts(with_status=False)
    print(f"   Found {result.get('count', 0)} hosts")

    print("\n2. Get groups:")
    groups = get_groups()
    print(f"   Found {len(groups)} groups")
    for group, hosts in groups.items():
        print(f"   - {group}: {len(hosts)} hosts")

    print("\n3. Test dry-run:")
    result = execute_on_all("uptime", dry_run=True)
    print(f"   Dry-run: {result.get('message', 'OK')}")

    print("\n✅ sshsync wrapper tested")


if __name__ == "__main__":
    main()
426
scripts/tailscale_manager.py
Normal file
@@ -0,0 +1,426 @@
#!/usr/bin/env python3
"""
Tailscale manager for Tailscale SSH Sync Agent.
Tailscale-specific operations and status management.
"""

import subprocess
import re
import json
from typing import Dict, List, Optional
from dataclasses import dataclass
import logging

logger = logging.getLogger(__name__)


@dataclass
class TailscalePeer:
    """Represents a Tailscale peer."""
    hostname: str
    ip: str
    online: bool
    last_seen: Optional[str] = None
    os: Optional[str] = None
    relay: Optional[str] = None

def get_tailscale_status() -> Dict:
    """
    Get Tailscale network status (all peers).

    Returns:
        Dict with network status:
        {
            'connected': bool,
            'peers': List[TailscalePeer],
            'online_count': int,
            'total_count': int,
            'self_ip': str
        }

    Example:
        >>> status = get_tailscale_status()
        >>> status['online_count']
        8
        >>> status['peers'][0].hostname
        'homelab-1'
    """
    try:
        # Get status in JSON format
        result = subprocess.run(
            ["tailscale", "status", "--json"],
            capture_output=True,
            text=True,
            timeout=10
        )

        if result.returncode != 0:
            # Try text format if JSON fails
            result = subprocess.run(
                ["tailscale", "status"],
                capture_output=True,
                text=True,
                timeout=10
            )

            if result.returncode != 0:
                return {
                    'connected': False,
                    'error': 'Tailscale not running or accessible',
                    'peers': []
                }

            # Parse text format
            return _parse_text_status(result.stdout)

        # Parse JSON format
        data = json.loads(result.stdout)
        return _parse_json_status(data)

    except FileNotFoundError:
        return {
            'connected': False,
            'error': 'Tailscale not installed',
            'peers': []
        }
    except subprocess.TimeoutExpired:
        return {
            'connected': False,
            'error': 'Timeout getting Tailscale status',
            'peers': []
        }
    except Exception as e:
        logger.error(f"Error getting Tailscale status: {e}")
        return {
            'connected': False,
            'error': str(e),
            'peers': []
        }

def _parse_json_status(data: Dict) -> Dict:
    """Parse Tailscale JSON status."""
    peers = []

    self_data = data.get('Self', {})
    self_ip = self_data.get('TailscaleIPs', [''])[0]

    for peer_id, peer_data in data.get('Peer', {}).items():
        hostname = peer_data.get('HostName', 'unknown')
        ips = peer_data.get('TailscaleIPs', [])
        ip = ips[0] if ips else 'unknown'
        online = peer_data.get('Online', False)
        os = peer_data.get('OS', 'unknown')

        peers.append(TailscalePeer(
            hostname=hostname,
            ip=ip,
            online=online,
            os=os
        ))

    online_count = sum(1 for p in peers if p.online)

    return {
        'connected': True,
        'peers': peers,
        'online_count': online_count,
        'total_count': len(peers),
        'self_ip': self_ip
    }

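A self-contained sketch of the peer extraction above, fed a hand-written dict shaped like `tailscale status --json` output (only the fields used here; the real payload carries many more):

```python
# Sample payload mirroring the Self/Peer structure parsed above.
sample = {
    'Self': {'TailscaleIPs': ['100.64.1.5']},
    'Peer': {
        'n1': {'HostName': 'homelab-1', 'TailscaleIPs': ['100.64.1.10'],
               'Online': True, 'OS': 'linux'},
        'n2': {'HostName': 'laptop', 'TailscaleIPs': ['100.64.1.11'],
               'Online': False, 'OS': 'macOS'},
    },
}

peers = []
for peer_data in sample.get('Peer', {}).values():
    ips = peer_data.get('TailscaleIPs', [])
    peers.append({
        'hostname': peer_data.get('HostName', 'unknown'),
        'ip': ips[0] if ips else 'unknown',
        'online': peer_data.get('Online', False),
    })

online = sum(1 for p in peers if p['online'])
print(f"{online}/{len(peers)} online, self {sample['Self']['TailscaleIPs'][0]}")
# prints "1/2 online, self 100.64.1.5"
```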
def _parse_text_status(output: str) -> Dict:
    """Parse Tailscale text status output."""
    peers = []
    self_ip = None

    for line in output.strip().split('\n'):
        line = line.strip()
        if not line:
            continue

        # Text format lists the Tailscale IP first, then the hostname:
        #   100.64.1.10  homelab-1  user@  linux  active; ...
        parts = line.split()
        if len(parts) >= 2:
            ip = parts[0]
            hostname = parts[1]

            # The first line of the output is this machine itself
            if self_ip is None:
                self_ip = ip
                continue

            # Determine online status from additional fields
            online = 'offline' not in line.lower()

            peers.append(TailscalePeer(
                hostname=hostname,
                ip=ip,
                online=online
            ))

    online_count = sum(1 for p in peers if p.online)

    return {
        'connected': True,
        'peers': peers,
        'online_count': online_count,
        'total_count': len(peers),
        'self_ip': self_ip or 'unknown'
    }

def check_connectivity(host: str, timeout: int = 5) -> bool:
    """
    Ping host via Tailscale.

    Args:
        host: Hostname to ping
        timeout: Timeout in seconds

    Returns:
        True if host responds to ping

    Example:
        >>> check_connectivity("homelab-1")
        True
    """
    try:
        result = subprocess.run(
            ["tailscale", "ping", "--timeout", f"{timeout}s", "--c", "1", host],
            capture_output=True,
            text=True,
            timeout=timeout + 2
        )

        # Check if ping succeeded
        return result.returncode == 0 or 'pong' in result.stdout.lower()

    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False
    except Exception as e:
        logger.error(f"Error pinging {host}: {e}")
        return False


def get_peer_info(hostname: str) -> Optional[TailscalePeer]:
    """
    Get detailed info about a specific peer.

    Args:
        hostname: Peer hostname

    Returns:
        TailscalePeer object or None if not found

    Example:
        >>> peer = get_peer_info("homelab-1")
        >>> peer.ip
        '100.64.1.10'
    """
    status = get_tailscale_status()

    if not status.get('connected'):
        return None

    for peer in status.get('peers', []):
        if peer.hostname == hostname or hostname in peer.hostname:
            return peer

    return None


def list_online_machines() -> List[str]:
    """
    List all online Tailscale machines.

    Returns:
        List of online machine hostnames

    Example:
        >>> machines = list_online_machines()
        >>> len(machines)
        8
    """
    status = get_tailscale_status()

    if not status.get('connected'):
        return []

    return [
        peer.hostname
        for peer in status.get('peers', [])
        if peer.online
    ]


def get_machine_ip(hostname: str) -> Optional[str]:
    """
    Get Tailscale IP for a machine.

    Args:
        hostname: Machine hostname

    Returns:
        IP address or None if not found

    Example:
        >>> ip = get_machine_ip("homelab-1")
        >>> ip
        '100.64.1.10'
    """
    peer = get_peer_info(hostname)
    return peer.ip if peer else None


def validate_tailscale_ssh(host: str, timeout: int = 10) -> Dict:
    """
    Check if Tailscale SSH is working for a host.

    Args:
        host: Host to check
        timeout: Connection timeout

    Returns:
        Dict with validation results:
        {
            'working': bool,
            'message': str,
            'details': Dict
        }

    Example:
        >>> result = validate_tailscale_ssh("homelab-1")
        >>> result['working']
        True
    """
    # First check if host is in Tailscale network
    peer = get_peer_info(host)

    if not peer:
        return {
            'working': False,
            'message': f'Host {host} not found in Tailscale network',
            'details': {'peer_found': False}
        }

    if not peer.online:
        return {
            'working': False,
            'message': f'Host {host} is offline in Tailscale',
            'details': {'peer_found': True, 'online': False}
        }

    # Check connectivity
    if not check_connectivity(host, timeout=timeout):
        return {
            'working': False,
            'message': f'Cannot ping {host} via Tailscale',
            'details': {'peer_found': True, 'online': True, 'ping': False}
        }

    # Try SSH connection
    try:
        result = subprocess.run(
            ["tailscale", "ssh", host, "echo", "test"],
            capture_output=True,
            text=True,
            timeout=timeout
        )

        if result.returncode == 0:
            return {
                'working': True,
                'message': f'Tailscale SSH to {host} is working',
                'details': {
                    'peer_found': True,
                    'online': True,
                    'ping': True,
                    'ssh': True,
                    'ip': peer.ip
                }
            }
        else:
            return {
                'working': False,
                'message': f'Tailscale SSH failed: {result.stderr}',
                'details': {
                    'peer_found': True,
                    'online': True,
                    'ping': True,
                    'ssh': False,
                    'error': result.stderr
                }
            }

    except subprocess.TimeoutExpired:
        return {
            'working': False,
            'message': f'Tailscale SSH timed out after {timeout}s',
            'details': {'timeout': True}
        }
    except Exception as e:
        return {
            'working': False,
            'message': f'Error testing Tailscale SSH: {e}',
            'details': {'error': str(e)}
        }


def get_network_summary() -> str:
    """
    Get human-readable network summary.

    Returns:
        Formatted summary string

    Example:
        >>> print(get_network_summary())
        Tailscale Network: Connected
        Online: 8/10 machines (80%)
        Self IP: 100.64.1.5
    """
    status = get_tailscale_status()

    if not status.get('connected'):
        return "Tailscale Network: Not connected\nError: {}".format(
            status.get('error', 'Unknown error')
        )

    total = status['total_count']
    online = status['online_count']
    pct = (online / total * 100) if total else 0.0  # guard against an empty peer list

    lines = [
        "Tailscale Network: Connected",
        f"Online: {online}/{total} machines ({pct:.0f}%)",
        f"Self IP: {status.get('self_ip', 'unknown')}"
    ]

    return "\n".join(lines)


def main():
    """Test Tailscale manager functions."""
    print("Testing Tailscale manager...\n")

    print("1. Get Tailscale status:")
    status = get_tailscale_status()
    if status.get('connected'):
        print(f"   ✓ Connected")
        print(f"   Peers: {status['total_count']} total, {status['online_count']} online")
    else:
        print(f"   ✗ Not connected: {status.get('error', 'Unknown error')}")

    print("\n2. List online machines:")
    machines = list_online_machines()
    print(f"   Found {len(machines)} online machines")
    for machine in machines[:5]:  # Show first 5
        print(f"   - {machine}")

    print("\n3. Network summary:")
    print(get_network_summary())

    print("\n✅ Tailscale manager tested")


if __name__ == "__main__":
    main()
628
scripts/utils/helpers.py
Normal file
@@ -0,0 +1,628 @@
#!/usr/bin/env python3
"""
Helper utilities for Tailscale SSH Sync Agent.
Provides common formatting, parsing, and utility functions.
"""

import os
import re
import subprocess
from datetime import datetime, timedelta
from pathlib import Path
from typing import Dict, List, Optional, Tuple, Any
import yaml
import logging

logger = logging.getLogger(__name__)

def format_bytes(bytes_value: int) -> str:
    """
    Format bytes as human-readable string.

    Args:
        bytes_value: Number of bytes

    Returns:
        Formatted string (e.g., "12.3 MB", "1.5 GB")

    Example:
        >>> format_bytes(12582912)
        "12.0 MB"
        >>> format_bytes(1610612736)
        "1.5 GB"
    """
    for unit in ['B', 'KB', 'MB', 'GB', 'TB']:
        if bytes_value < 1024.0:
            return f"{bytes_value:.1f} {unit}"
        bytes_value /= 1024.0
    return f"{bytes_value:.1f} PB"

def format_duration(seconds: float) -> str:
    """
    Format duration as human-readable string.

    Args:
        seconds: Duration in seconds

    Returns:
        Formatted string (e.g., "2m 15s", "1h 30m")

    Example:
        >>> format_duration(135)
        "2m 15s"
        >>> format_duration(5430)
        "1h 30m 30s"
    """
    if seconds < 60:
        return f"{int(seconds)}s"

    minutes = int(seconds // 60)
    secs = int(seconds % 60)

    if minutes < 60:
        return f"{minutes}m {secs}s" if secs > 0 else f"{minutes}m"

    hours = minutes // 60
    minutes = minutes % 60

    parts = [f"{hours}h"]
    if minutes > 0:
        parts.append(f"{minutes}m")
    if secs > 0 and hours == 0:  # Only show seconds if < 1 hour
        parts.append(f"{secs}s")

    return " ".join(parts)

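The unit-scaling loop above can be checked quickly with a standalone restatement (same algorithm, renamed so it runs without the module):

```python
# Minimal restatement of the byte-formatting loop: divide by 1024 until
# the value fits the current unit.
def fmt_bytes(n):
    for unit in ['B', 'KB', 'MB', 'GB', 'TB']:
        if n < 1024.0:
            return f"{n:.1f} {unit}"
        n /= 1024.0
    return f"{n:.1f} PB"

print(fmt_bytes(12582912))    # prints "12.0 MB"
print(fmt_bytes(1610612736))  # prints "1.5 GB"
```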
def format_percentage(value: float, decimals: int = 1) -> str:
    """
    Format percentage with specified decimals.

    Args:
        value: Percentage value (0-100)
        decimals: Number of decimal places

    Returns:
        Formatted string (e.g., "45.5%")

    Example:
        >>> format_percentage(45.567)
        "45.6%"
    """
    return f"{value:.{decimals}f}%"

def parse_ssh_config(config_path: Optional[Path] = None) -> Dict[str, Dict[str, str]]:
    """
    Parse SSH config file for host definitions.

    Args:
        config_path: Path to SSH config (default: ~/.ssh/config)

    Returns:
        Dict mapping host aliases to their configuration:
        {
            'host-alias': {
                'hostname': '100.64.1.10',
                'user': 'admin',
                'port': '22',
                'identityfile': '~/.ssh/id_ed25519'
            }
        }

    Example:
        >>> hosts = parse_ssh_config()
        >>> hosts['homelab-1']['hostname']
        '100.64.1.10'
    """
    if config_path is None:
        config_path = Path.home() / '.ssh' / 'config'

    if not config_path.exists():
        logger.warning(f"SSH config not found: {config_path}")
        return {}

    hosts = {}
    current_host = None

    try:
        with open(config_path, 'r') as f:
            for line in f:
                line = line.strip()

                # Skip comments and empty lines
                if not line or line.startswith('#'):
                    continue

                # Host directive
                if line.lower().startswith('host '):
                    host_alias = line.split(maxsplit=1)[1]
                    # Skip wildcard blocks (and stop attributing their
                    # directives to the previous host)
                    if '*' not in host_alias and '?' not in host_alias:
                        current_host = host_alias
                        hosts[current_host] = {}
                    else:
                        current_host = None

                # Configuration directives
                elif current_host:
                    parts = line.split(maxsplit=1)
                    if len(parts) == 2:
                        key, value = parts
                        hosts[current_host][key.lower()] = value

        return hosts

    except Exception as e:
        logger.error(f"Error parsing SSH config: {e}")
        return {}

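The same parsing logic, run over a small inline config instead of `~/.ssh/config`:

```python
# Inline sample config: one concrete host, one wildcard block that must
# not leak its directives into the preceding host.
SAMPLE = """\
# comment
Host homelab-1
    HostName 100.64.1.10
    User admin

Host *
    ServerAliveInterval 60
"""

hosts = {}
current = None
for line in SAMPLE.splitlines():
    line = line.strip()
    if not line or line.startswith('#'):
        continue
    if line.lower().startswith('host '):
        alias = line.split(maxsplit=1)[1]
        if '*' not in alias and '?' not in alias:
            current = alias
            hosts[current] = {}
        else:
            current = None  # wildcard blocks are skipped entirely
    elif current:
        parts = line.split(maxsplit=1)
        if len(parts) == 2:
            hosts[current][parts[0].lower()] = parts[1]

print(hosts)  # prints "{'homelab-1': {'hostname': '100.64.1.10', 'user': 'admin'}}"
```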
def parse_sshsync_config(config_path: Optional[Path] = None) -> Dict[str, List[str]]:
    """
    Parse sshsync config file for group definitions.

    Args:
        config_path: Path to sshsync config (default: ~/.config/sshsync/config.yaml)

    Returns:
        Dict mapping group names to list of hosts:
        {
            'production': ['prod-web-01', 'prod-db-01'],
            'development': ['dev-laptop', 'dev-desktop']
        }

    Example:
        >>> groups = parse_sshsync_config()
        >>> groups['production']
        ['prod-web-01', 'prod-db-01']
    """
    if config_path is None:
        config_path = Path.home() / '.config' / 'sshsync' / 'config.yaml'

    if not config_path.exists():
        logger.warning(f"sshsync config not found: {config_path}")
        return {}

    try:
        with open(config_path, 'r') as f:
            config = yaml.safe_load(f)

        # safe_load returns None for an empty file
        return (config or {}).get('groups', {})

    except Exception as e:
        logger.error(f"Error parsing sshsync config: {e}")
        return {}


def get_timestamp(iso: bool = True) -> str:
    """
    Get current timestamp.

    Args:
        iso: If True, return ISO format; otherwise human-readable

    Returns:
        Timestamp string

    Example:
        >>> get_timestamp(iso=True)
        "2025-10-19T19:43:41Z"
        >>> get_timestamp(iso=False)
        "2025-10-19 19:43:41"
    """
    if iso:
        # Use UTC so the trailing "Z" is accurate
        return datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%SZ")
    else:
        return datetime.now().strftime("%Y-%m-%d %H:%M:%S")


def safe_execute(func, *args, default=None, **kwargs) -> Any:
    """
    Execute function with error handling.

    Args:
        func: Function to execute
        *args: Positional arguments
        default: Value to return on error
        **kwargs: Keyword arguments

    Returns:
        Function result or default on error

    Example:
        >>> safe_execute(int, "not_a_number", default=0)
        0
        >>> safe_execute(int, "42")
        42
    """
    try:
        return func(*args, **kwargs)
    except Exception as e:
        logger.error(f"Error executing {func.__name__}: {e}")
        return default


def validate_path(path: str, must_exist: bool = True) -> bool:
    """
    Check if path is valid and accessible.

    Args:
        path: Path to validate
        must_exist: If True, path must exist

    Returns:
        True if valid, False otherwise

    Example:
        >>> validate_path("/tmp")
        True
        >>> validate_path("/nonexistent", must_exist=True)
        False
    """
    p = Path(path).expanduser()

    if must_exist:
        return p.exists()
    else:
        # Check if parent directory exists (for paths that will be created)
        return p.parent.exists()

def parse_disk_usage(df_output: str) -> Dict[str, Any]:
|
||||||
|
"""
|
||||||
|
Parse 'df' command output.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
df_output: Output from 'df -h' command
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Dict with disk usage info:
|
||||||
|
{
|
||||||
|
'filesystem': '/dev/sda1',
|
||||||
|
'size': '100G',
|
||||||
|
'used': '45G',
|
||||||
|
'available': '50G',
|
||||||
|
'use_pct': 45,
|
||||||
|
'mount': '/'
|
||||||
|
}
|
||||||
|
|
||||||
|
Example:
|
||||||
|
>>> output = "Filesystem Size Used Avail Use% Mounted on\\n/dev/sda1 100G 45G 50G 45% /"
|
||||||
|
>>> parse_disk_usage(output)
|
||||||
|
{'filesystem': '/dev/sda1', 'size': '100G', ...}
|
||||||
|
"""
|
||||||
|
lines = df_output.strip().split('\n')
|
||||||
|
if len(lines) < 2:
|
||||||
|
return {}
|
||||||
|
|
||||||
|
# Parse last line (actual data, not header)
|
||||||
|
data_line = lines[-1]
|
||||||
|
parts = data_line.split()
|
||||||
|
|
||||||
|
if len(parts) < 6:
|
||||||
|
return {}
|
||||||
|
|
||||||
|
try:
|
||||||
|
return {
|
||||||
|
'filesystem': parts[0],
|
||||||
|
'size': parts[1],
|
||||||
|
'used': parts[2],
|
||||||
|
'available': parts[3],
|
||||||
|
'use_pct': int(parts[4].rstrip('%')),
|
||||||
|
'mount': parts[5]
|
||||||
|
}
|
||||||
|
except (ValueError, IndexError) as e:
|
||||||
|
logger.error(f"Error parsing disk usage: {e}")
|
||||||
|
return {}
|
||||||
|
|
||||||
|
|
||||||
|
def parse_memory_usage(free_output: str) -> Dict[str, Any]:
|
||||||
|
"""
|
||||||
|
Parse 'free' command output (Linux).
|
||||||
|
|
||||||
|
Args:
|
||||||
|
free_output: Output from 'free -m' command
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Dict with memory info:
|
||||||
|
{
|
||||||
|
'total': 16384, # MB
|
||||||
|
'used': 8192,
|
||||||
|
'free': 8192,
|
||||||
|
'use_pct': 50.0
|
||||||
|
}
|
||||||
|
|
||||||
|
Example:
|
||||||
|
>>> output = "Mem: 16384 8192 8192 0 0 0"
|
||||||
|
>>> parse_memory_usage(output)
|
||||||
|
{'total': 16384, 'used': 8192, ...}
|
||||||
|
"""
|
||||||
|
lines = free_output.strip().split('\n')
|
||||||
|
|
||||||
|
for line in lines:
|
||||||
|
if line.startswith('Mem:'):
|
||||||
|
parts = line.split()
|
||||||
|
if len(parts) >= 3:
|
||||||
|
try:
|
||||||
|
total = int(parts[1])
|
||||||
|
used = int(parts[2])
|
||||||
|
free = int(parts[3]) if len(parts) > 3 else (total - used)
|
||||||
|
|
||||||
|
return {
|
||||||
|
'total': total,
|
||||||
|
'used': used,
|
||||||
|
'free': free,
|
||||||
|
'use_pct': (used / total * 100) if total > 0 else 0
|
||||||
|
}
|
||||||
|
except (ValueError, IndexError) as e:
|
||||||
|
logger.error(f"Error parsing memory usage: {e}")
|
||||||
|
|
||||||
|
return {}
|
||||||
|
|
||||||
|
|
||||||
|
def parse_cpu_load(uptime_output: str) -> Dict[str, float]:
|
||||||
|
"""
|
||||||
|
Parse 'uptime' command output for load averages.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
uptime_output: Output from 'uptime' command
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Dict with load averages:
|
||||||
|
{
|
||||||
|
'load_1min': 0.45,
|
||||||
|
'load_5min': 0.38,
|
||||||
|
'load_15min': 0.32
|
||||||
|
}
|
||||||
|
|
||||||
|
Example:
|
||||||
|
>>> output = "19:43:41 up 5 days, 2:15, 3 users, load average: 0.45, 0.38, 0.32"
|
||||||
|
>>> parse_cpu_load(output)
|
||||||
|
{'load_1min': 0.45, 'load_5min': 0.38, 'load_15min': 0.32}
|
||||||
|
"""
|
||||||
|
# Find "load average:" part
|
||||||
|
match = re.search(r'load average:\s+([\d.]+),\s+([\d.]+),\s+([\d.]+)', uptime_output)
|
||||||
|
|
||||||
|
if match:
|
||||||
|
try:
|
||||||
|
return {
|
||||||
|
'load_1min': float(match.group(1)),
|
||||||
|
'load_5min': float(match.group(2)),
|
||||||
|
'load_15min': float(match.group(3))
|
||||||
|
}
|
||||||
|
except ValueError as e:
|
||||||
|
logger.error(f"Error parsing CPU load: {e}")
|
||||||
|
|
||||||
|
return {}
|
||||||
|
|
||||||
|
|
||||||
|
def format_host_status(host: str, online: bool, groups: List[str],
|
||||||
|
latency: Optional[int] = None,
|
||||||
|
tailscale_connected: bool = False) -> str:
|
||||||
|
"""
|
||||||
|
Format host status as display string.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
host: Host name
|
||||||
|
online: Whether host is online
|
||||||
|
groups: List of groups host belongs to
|
||||||
|
latency: Latency in ms (optional)
|
||||||
|
tailscale_connected: Tailscale connection status
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Formatted status string
|
||||||
|
|
||||||
|
Example:
|
||||||
|
>>> format_host_status("web-01", True, ["production", "web"], 25, True)
|
||||||
|
"🟢 web-01 (production, web) - Online - Tailscale: Connected | Latency: 25ms"
|
||||||
|
"""
|
||||||
|
icon = "🟢" if online else "🔴"
|
||||||
|
status = "Online" if online else "Offline"
|
||||||
|
group_str = ", ".join(groups) if groups else "no group"
|
||||||
|
|
||||||
|
parts = [f"{icon} {host} ({group_str}) - {status}"]
|
||||||
|
|
||||||
|
if tailscale_connected:
|
||||||
|
parts.append("Tailscale: Connected")
|
||||||
|
|
||||||
|
if latency is not None and online:
|
||||||
|
parts.append(f"Latency: {latency}ms")
|
||||||
|
|
||||||
|
return " - ".join(parts)
|
||||||
|
|
||||||
|
|
||||||
|
def calculate_load_score(cpu_pct: float, mem_pct: float, disk_pct: float) -> float:
|
||||||
|
"""
|
||||||
|
Calculate composite load score for a machine.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
cpu_pct: CPU usage percentage (0-100)
|
||||||
|
mem_pct: Memory usage percentage (0-100)
|
||||||
|
disk_pct: Disk usage percentage (0-100)
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Load score (0-1, lower is better)
|
||||||
|
|
||||||
|
Formula:
|
||||||
|
score = (cpu * 0.4) + (mem * 0.3) + (disk * 0.3)
|
||||||
|
|
||||||
|
Example:
|
||||||
|
>>> calculate_load_score(45, 60, 40)
|
||||||
|
0.48 # (0.45*0.4 + 0.60*0.3 + 0.40*0.3)
|
||||||
|
"""
|
||||||
|
return (cpu_pct * 0.4 + mem_pct * 0.3 + disk_pct * 0.3) / 100
|
||||||
|
|
||||||
|
|
||||||
|
def classify_load_status(score: float) -> str:
|
||||||
|
"""
|
||||||
|
Classify load score into status category.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
score: Load score (0-1)
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Status string: "low", "moderate", or "high"
|
||||||
|
|
||||||
|
Example:
|
||||||
|
>>> classify_load_status(0.28)
|
||||||
|
"low"
|
||||||
|
>>> classify_load_status(0.55)
|
||||||
|
"moderate"
|
||||||
|
>>> classify_load_status(0.82)
|
||||||
|
"high"
|
||||||
|
"""
|
||||||
|
if score < 0.4:
|
||||||
|
return "low"
|
||||||
|
elif score < 0.7:
|
||||||
|
return "moderate"
|
||||||
|
else:
|
||||||
|
return "high"
|
||||||
|
|
||||||
|
|
||||||
|
def classify_latency(latency_ms: int) -> Tuple[str, str]:
|
||||||
|
"""
|
||||||
|
Classify network latency.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
latency_ms: Latency in milliseconds
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Tuple of (status, description)
|
||||||
|
|
||||||
|
Example:
|
||||||
|
>>> classify_latency(25)
|
||||||
|
("excellent", "Ideal for interactive tasks")
|
||||||
|
>>> classify_latency(150)
|
||||||
|
("fair", "May impact interactive workflows")
|
||||||
|
"""
|
||||||
|
if latency_ms < 50:
|
||||||
|
return ("excellent", "Ideal for interactive tasks")
|
||||||
|
elif latency_ms < 100:
|
||||||
|
return ("good", "Suitable for most operations")
|
||||||
|
elif latency_ms < 200:
|
||||||
|
return ("fair", "May impact interactive workflows")
|
||||||
|
else:
|
||||||
|
return ("poor", "Investigate network issues")
|
||||||
|
|
||||||
|
|
||||||
|
def get_hosts_from_groups(group: str, groups_config: Dict[str, List[str]]) -> List[str]:
|
||||||
|
"""
|
||||||
|
Get list of hosts in a group.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
group: Group name
|
||||||
|
groups_config: Groups configuration dict
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
List of host names in group
|
||||||
|
|
||||||
|
Example:
|
||||||
|
>>> groups = {'production': ['web-01', 'db-01']}
|
||||||
|
>>> get_hosts_from_groups('production', groups)
|
||||||
|
['web-01', 'db-01']
|
||||||
|
"""
|
||||||
|
return groups_config.get(group, [])
|
||||||
|
|
||||||
|
|
||||||
|
def get_groups_for_host(host: str, groups_config: Dict[str, List[str]]) -> List[str]:
|
||||||
|
"""
|
||||||
|
Get list of groups a host belongs to.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
host: Host name
|
||||||
|
groups_config: Groups configuration dict
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
List of group names
|
||||||
|
|
||||||
|
Example:
|
||||||
|
>>> groups = {'production': ['web-01'], 'web': ['web-01', 'web-02']}
|
||||||
|
>>> get_groups_for_host('web-01', groups)
|
||||||
|
['production', 'web']
|
||||||
|
"""
|
||||||
|
return [group for group, hosts in groups_config.items() if host in hosts]
|
||||||
|
|
||||||
|
|
||||||
|
def run_command(command: str, timeout: int = 10) -> Tuple[bool, str, str]:
|
||||||
|
"""
|
||||||
|
Run shell command with timeout.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
command: Command to execute
|
||||||
|
timeout: Timeout in seconds
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Tuple of (success, stdout, stderr)
|
||||||
|
|
||||||
|
Example:
|
||||||
|
>>> success, stdout, stderr = run_command("echo hello")
|
||||||
|
>>> success
|
||||||
|
True
|
||||||
|
>>> stdout.strip()
|
||||||
|
"hello"
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
result = subprocess.run(
|
||||||
|
command,
|
||||||
|
shell=True,
|
||||||
|
capture_output=True,
|
||||||
|
text=True,
|
||||||
|
timeout=timeout
|
||||||
|
)
|
||||||
|
|
||||||
|
return (
|
||||||
|
result.returncode == 0,
|
||||||
|
result.stdout,
|
||||||
|
result.stderr
|
||||||
|
)
|
||||||
|
|
||||||
|
except subprocess.TimeoutExpired:
|
||||||
|
return (False, "", f"Command timed out after {timeout}s")
|
||||||
|
except Exception as e:
|
||||||
|
return (False, "", str(e))
|
||||||
|
|
||||||
|
|
||||||
|
def main():
|
||||||
|
"""Test helper functions."""
|
||||||
|
print("Testing helper functions...\n")
|
||||||
|
|
||||||
|
# Test formatting
|
||||||
|
print("1. Format bytes:")
|
||||||
|
print(f" 12582912 bytes = {format_bytes(12582912)}")
|
||||||
|
print(f" 1610612736 bytes = {format_bytes(1610612736)}")
|
||||||
|
|
||||||
|
print("\n2. Format duration:")
|
||||||
|
print(f" 135 seconds = {format_duration(135)}")
|
||||||
|
print(f" 5430 seconds = {format_duration(5430)}")
|
||||||
|
|
||||||
|
print("\n3. Format percentage:")
|
||||||
|
print(f" 45.567 = {format_percentage(45.567)}")
|
||||||
|
|
||||||
|
print("\n4. Calculate load score:")
|
||||||
|
score = calculate_load_score(45, 60, 40)
|
||||||
|
print(f" CPU 45%, Mem 60%, Disk 40% = {score:.2f}")
|
||||||
|
print(f" Status: {classify_load_status(score)}")
|
||||||
|
|
||||||
|
print("\n5. Classify latency:")
|
||||||
|
latencies = [25, 75, 150, 250]
|
||||||
|
for lat in latencies:
|
||||||
|
status, desc = classify_latency(lat)
|
||||||
|
print(f" {lat}ms: {status} - {desc}")
|
||||||
|
|
||||||
|
print("\n6. Parse SSH config:")
|
||||||
|
ssh_hosts = parse_ssh_config()
|
||||||
|
print(f" Found {len(ssh_hosts)} hosts")
|
||||||
|
|
||||||
|
print("\n7. Parse sshsync config:")
|
||||||
|
groups = parse_sshsync_config()
|
||||||
|
print(f" Found {len(groups)} groups")
|
||||||
|
for group, hosts in groups.items():
|
||||||
|
print(f" - {group}: {len(hosts)} hosts")
|
||||||
|
|
||||||
|
print("\n✅ All helpers tested successfully")
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
||||||
43
scripts/utils/validators/__init__.py
Normal file
@@ -0,0 +1,43 @@
"""
Validators package for Tailscale SSH Sync Agent.
"""

from .parameter_validator import (
    ValidationError,
    validate_host,
    validate_group,
    validate_path_exists,
    validate_timeout,
    validate_command
)

from .host_validator import (
    validate_ssh_config,
    validate_host_reachable,
    validate_group_members,
    get_invalid_hosts
)

from .connection_validator import (
    validate_ssh_connection,
    validate_tailscale_connection,
    validate_ssh_key,
    get_connection_diagnostics
)

__all__ = [
    'ValidationError',
    'validate_host',
    'validate_group',
    'validate_path_exists',
    'validate_timeout',
    'validate_command',
    'validate_ssh_config',
    'validate_host_reachable',
    'validate_group_members',
    'get_invalid_hosts',
    'validate_ssh_connection',
    'validate_tailscale_connection',
    'validate_ssh_key',
    'get_connection_diagnostics',
]
275
scripts/utils/validators/connection_validator.py
Normal file
@@ -0,0 +1,275 @@
#!/usr/bin/env python3
"""
Connection validators for Tailscale SSH Sync Agent.
Validates SSH and Tailscale connections.
"""

import subprocess
from typing import Any, Dict
import logging

from .parameter_validator import ValidationError

logger = logging.getLogger(__name__)


def validate_ssh_connection(host: str, timeout: int = 10) -> bool:
    """
    Test that an SSH connection works.

    Args:
        host: Host to connect to
        timeout: Connection timeout in seconds

    Returns:
        True if the SSH connection succeeds

    Raises:
        ValidationError: If the connection fails

    Example:
        >>> validate_ssh_connection("web-01")
        True
    """
    try:
        # Try to execute a simple command via SSH
        result = subprocess.run(
            ["ssh", "-o", f"ConnectTimeout={timeout}",
             "-o", "BatchMode=yes",
             "-o", "StrictHostKeyChecking=no",
             host, "echo", "test"],
            capture_output=True,
            text=True,
            timeout=timeout + 5
        )

        if result.returncode == 0:
            return True

        # Parse the error message
        error_msg = result.stderr.strip()

        if "Permission denied" in error_msg:
            raise ValidationError(
                f"SSH authentication failed for '{host}'\n"
                "Check:\n"
                "1. SSH key is added: ssh-add -l\n"
                "2. Public key is on remote: cat ~/.ssh/authorized_keys\n"
                "3. User/key in SSH config is correct"
            )
        elif "Connection refused" in error_msg:
            raise ValidationError(
                f"SSH connection refused for '{host}'\n"
                "Check:\n"
                "1. SSH server is running on remote\n"
                "2. Port 22 is not blocked by firewall"
            )
        elif "Connection timed out" in error_msg or "timeout" in error_msg.lower():
            raise ValidationError(
                f"SSH connection timed out for '{host}'\n"
                "Check:\n"
                "1. Host is reachable (ping test)\n"
                "2. Tailscale is connected\n"
                "3. Network connectivity"
            )
        else:
            raise ValidationError(
                f"SSH connection failed for '{host}': {error_msg}"
            )

    except subprocess.TimeoutExpired:
        raise ValidationError(
            f"SSH connection timed out for '{host}' (>{timeout}s)"
        )
    except ValidationError:
        # Re-raise as-is so the errors above aren't re-wrapped below
        raise
    except Exception as e:
        raise ValidationError(f"Error testing SSH connection to '{host}': {e}")


def validate_tailscale_connection(host: str) -> bool:
    """
    Test Tailscale connectivity to a host.

    Args:
        host: Host to check

    Returns:
        True if Tailscale connection is active

    Raises:
        ValidationError: If Tailscale is not connected

    Example:
        >>> validate_tailscale_connection("web-01")
        True
    """
    try:
        # Check if tailscale is running
        result = subprocess.run(
            ["tailscale", "status"],
            capture_output=True,
            text=True,
            timeout=5
        )

        if result.returncode != 0:
            raise ValidationError(
                "Tailscale is not running\n"
                "Start Tailscale: sudo tailscale up"
            )

        # Check if the specific host is in the network
        if host in result.stdout or host.replace('-', '.') in result.stdout:
            return True
        raise ValidationError(
            f"Host '{host}' not found in Tailscale network\n"
            "Ensure host is:\n"
            "1. Connected to Tailscale\n"
            "2. In the same tailnet\n"
            "3. Not expired/offline"
        )

    except FileNotFoundError:
        raise ValidationError(
            "Tailscale not installed\n"
            "Install: https://tailscale.com/download"
        )
    except subprocess.TimeoutExpired:
        raise ValidationError("Timeout checking Tailscale status")
    except ValidationError:
        # Re-raise as-is so the errors above aren't re-wrapped below
        raise
    except Exception as e:
        raise ValidationError(f"Error checking Tailscale connection: {e}")


def validate_ssh_key(host: str) -> bool:
    """
    Check that SSH key authentication is working.

    Args:
        host: Host to check

    Returns:
        True if SSH key auth works

    Raises:
        ValidationError: If key auth fails

    Example:
        >>> validate_ssh_key("web-01")
        True
    """
    try:
        # Test connection with explicit key-only auth
        result = subprocess.run(
            ["ssh", "-o", "BatchMode=yes",
             "-o", "PasswordAuthentication=no",
             "-o", "ConnectTimeout=5",
             host, "echo", "test"],
            capture_output=True,
            text=True,
            timeout=10
        )

        if result.returncode == 0:
            return True

        error_msg = result.stderr.strip()

        if "Permission denied" in error_msg:
            raise ValidationError(
                f"SSH key authentication failed for '{host}'\n"
                "Fix:\n"
                "1. Add your SSH key: ssh-add ~/.ssh/id_ed25519\n"
                f"2. Copy public key to remote: ssh-copy-id {host}\n"
                f"3. Verify: ssh -v {host} 2>&1 | grep -i 'offering public key'"
            )
        else:
            raise ValidationError(
                f"SSH key validation failed for '{host}': {error_msg}"
            )

    except subprocess.TimeoutExpired:
        raise ValidationError(f"Timeout validating SSH key for '{host}'")
    except ValidationError:
        # Re-raise as-is so the errors above aren't re-wrapped below
        raise
    except Exception as e:
        raise ValidationError(f"Error validating SSH key for '{host}': {e}")


def get_connection_diagnostics(host: str) -> Dict[str, Any]:
    """
    Run comprehensive connection tests.

    Args:
        host: Host to diagnose

    Returns:
        Dict with diagnostic results:
        {
            'ping': {'success': bool, 'message': str},
            'ssh': {'success': bool, 'message': str},
            'tailscale': {'success': bool, 'message': str},
            'ssh_key': {'success': bool, 'message': str}
        }

    Example:
        >>> diag = get_connection_diagnostics("web-01")
        >>> diag['ssh']['success']
        True
    """
    diagnostics = {}

    # Test 1: Ping
    try:
        result = subprocess.run(
            ["ping", "-c", "1", "-W", "2", host],
            capture_output=True,
            timeout=3
        )
        diagnostics['ping'] = {
            'success': result.returncode == 0,
            'message': 'Host is reachable' if result.returncode == 0 else 'Host not reachable'
        }
    except Exception as e:
        diagnostics['ping'] = {'success': False, 'message': str(e)}

    # Test 2: SSH connection
    try:
        validate_ssh_connection(host, timeout=5)
        diagnostics['ssh'] = {'success': True, 'message': 'SSH connection works'}
    except ValidationError as e:
        diagnostics['ssh'] = {'success': False, 'message': str(e).split('\n')[0]}

    # Test 3: Tailscale
    try:
        validate_tailscale_connection(host)
        diagnostics['tailscale'] = {'success': True, 'message': 'Tailscale connected'}
    except ValidationError as e:
        diagnostics['tailscale'] = {'success': False, 'message': str(e).split('\n')[0]}

    # Test 4: SSH key
    try:
        validate_ssh_key(host)
        diagnostics['ssh_key'] = {'success': True, 'message': 'SSH key authentication works'}
    except ValidationError as e:
        diagnostics['ssh_key'] = {'success': False, 'message': str(e).split('\n')[0]}

    return diagnostics


def main():
    """Test connection validators."""
    print("Testing connection validators...\n")

    print("1. Testing connection diagnostics:")
    try:
        diag = get_connection_diagnostics("localhost")
        print("   Results:")
        for test, result in diag.items():
            status = "✓" if result['success'] else "✗"
            print(f"   {status} {test}: {result['message']}")
    except Exception as e:
        print(f"   Error: {e}")

    print("\n✅ Connection validators tested")


if __name__ == "__main__":
    main()
232
scripts/utils/validators/host_validator.py
Normal file
@@ -0,0 +1,232 @@
#!/usr/bin/env python3
"""
Host validators for Tailscale SSH Sync Agent.
Validates host configuration and availability.
"""

import subprocess
from typing import List, Dict, Optional
from pathlib import Path
import logging

from .parameter_validator import ValidationError

logger = logging.getLogger(__name__)


def validate_ssh_config(host: str, config_path: Optional[Path] = None) -> bool:
    """
    Check if a host has an SSH config entry.

    Args:
        host: Host name to check
        config_path: Path to SSH config (default: ~/.ssh/config)

    Returns:
        True if host is in SSH config

    Raises:
        ValidationError: If host not found in config

    Example:
        >>> validate_ssh_config("web-01")
        True
    """
    if config_path is None:
        config_path = Path.home() / '.ssh' / 'config'

    if not config_path.exists():
        raise ValidationError(
            f"SSH config file not found: {config_path}\n"
            "Create ~/.ssh/config with your host definitions"
        )

    # Parse SSH config for this host
    host_found = False

    try:
        with open(config_path, 'r') as f:
            for line in f:
                line = line.strip()
                # Match the alias as a whole token, not a substring
                # (so 'web-01' does not match 'Host web-012')
                if line.lower().startswith('host ') and host in line.split()[1:]:
                    host_found = True
                    break

        if not host_found:
            raise ValidationError(
                f"Host '{host}' not found in SSH config: {config_path}\n"
                "Add host to SSH config:\n"
                f"Host {host}\n"
                "    HostName <IP_ADDRESS>\n"
                "    User <USERNAME>"
            )

        return True

    except IOError as e:
        raise ValidationError(f"Error reading SSH config: {e}")


def validate_host_reachable(host: str, timeout: int = 5) -> bool:
    """
    Check if a host is reachable via ping.

    Args:
        host: Host name to check
        timeout: Timeout in seconds

    Returns:
        True if host is reachable

    Raises:
        ValidationError: If host is not reachable

    Example:
        >>> validate_host_reachable("web-01", timeout=5)
        True
    """
    try:
        # Try to resolve via SSH config first
        result = subprocess.run(
            ["ssh", "-G", host],
            capture_output=True,
            text=True,
            timeout=2
        )

        if result.returncode == 0:
            # Extract the real hostname from the resolved SSH config
            for line in result.stdout.split('\n'):
                if line.startswith('hostname '):
                    actual_host = line.split()[1]
                    break
            else:
                actual_host = host
        else:
            actual_host = host

        # Ping the host
        ping_result = subprocess.run(
            ["ping", "-c", "1", "-W", str(timeout), actual_host],
            capture_output=True,
            text=True,
            timeout=timeout + 1
        )

        if ping_result.returncode == 0:
            return True
        raise ValidationError(
            f"Host '{host}' ({actual_host}) is not reachable\n"
            "Check:\n"
            "1. Host is powered on\n"
            "2. Tailscale is connected\n"
            "3. Network connectivity"
        )

    except subprocess.TimeoutExpired:
        raise ValidationError(f"Timeout checking host '{host}' (>{timeout}s)")
    except ValidationError:
        # Re-raise as-is so the error above isn't re-wrapped below
        raise
    except Exception as e:
        raise ValidationError(f"Error checking host '{host}': {e}")


def validate_group_members(group: str, groups_config: Dict[str, List[str]]) -> List[str]:
    """
    Ensure a group has valid members.

    Args:
        group: Group name
        groups_config: Groups configuration dict

    Returns:
        List of valid hosts in the group

    Raises:
        ValidationError: If group is empty or has no valid members

    Example:
        >>> groups = {'production': ['web-01', 'db-01']}
        >>> validate_group_members('production', groups)
        ['web-01', 'db-01']
    """
    if group not in groups_config:
        raise ValidationError(
            f"Group '{group}' not found in configuration\n"
            f"Available groups: {', '.join(groups_config.keys())}"
        )

    members = groups_config[group]

    if not members:
        raise ValidationError(
            f"Group '{group}' has no members\n"
            f"Add hosts to group with: sshsync gadd {group}"
        )

    if not isinstance(members, list):
        raise ValidationError(
            f"Invalid group configuration for '{group}': members must be a list"
        )

    return members


def get_invalid_hosts(hosts: List[str], config_path: Optional[Path] = None) -> List[str]:
    """
    Find hosts without valid SSH config.

    Args:
        hosts: List of host names
        config_path: Path to SSH config

    Returns:
        List of hosts without valid config

    Example:
        >>> get_invalid_hosts(["web-01", "nonexistent"])
        ["nonexistent"]
    """
    if config_path is None:
        config_path = Path.home() / '.ssh' / 'config'

    if not config_path.exists():
        return hosts  # All invalid if no config

    # Parse SSH config
    valid_hosts = set()
    try:
        with open(config_path, 'r') as f:
            for line in f:
                line = line.strip()
                if line.lower().startswith('host '):
                    host_alias = line.split(maxsplit=1)[1]
                    if '*' not in host_alias and '?' not in host_alias:
                        valid_hosts.add(host_alias)
    except IOError:
        return hosts

    # Find invalid hosts
    return [h for h in hosts if h not in valid_hosts]


def main():
    """Test host validators."""
    print("Testing host validators...\n")

    print("1. Testing validate_ssh_config():")
    try:
        validate_ssh_config("localhost")
        print("   ✓ localhost has SSH config")
    except ValidationError as e:
        print(f"   Note: {e.args[0].split(chr(10))[0]}")

    print("\n2. Testing get_invalid_hosts():")
    test_hosts = ["localhost", "nonexistent-host-12345"]
    invalid = get_invalid_hosts(test_hosts)
    print(f"   Invalid hosts: {invalid}")

    print("\n✅ Host validators tested")


if __name__ == "__main__":
    main()
363
scripts/utils/validators/parameter_validator.py
Normal file
@@ -0,0 +1,363 @@
#!/usr/bin/env python3
"""
Parameter validators for Tailscale SSH Sync Agent.
Validates user inputs before executing operations.
"""

from typing import List, Optional
from pathlib import Path
import re
import logging

logger = logging.getLogger(__name__)


class ValidationError(Exception):
    """Raised when validation fails."""
    pass


def validate_host(host: str, valid_hosts: Optional[List[str]] = None) -> str:
    """
    Validate host parameter.

    Args:
        host: Host name or alias
        valid_hosts: List of valid hosts (None to skip check)

    Returns:
        str: Validated and normalized host name

    Raises:
        ValidationError: If host is invalid

    Example:
        >>> validate_host("web-01")
        "web-01"
        >>> validate_host("web-01", ["web-01", "web-02"])
        "web-01"
    """
    if not host:
        raise ValidationError("Host cannot be empty")

    if not isinstance(host, str):
        raise ValidationError(f"Host must be string, got {type(host)}")

    # Normalize (strip whitespace; lowercase only for comparison)
    host = host.strip()

    # Basic validation: alphanumeric, dash, underscore, dot
    if not re.match(r'^[a-zA-Z0-9._-]+$', host):
        raise ValidationError(
            f"Invalid host name format: {host}\n"
            "Host names must contain only letters, numbers, dots, dashes, and underscores"
        )

    # Check against the allowed list (if provided)
    if valid_hosts:
        # Try exact match first
        if host in valid_hosts:
            return host

        # Try case-insensitive match
        for valid_host in valid_hosts:
            if host.lower() == valid_host.lower():
                return valid_host

        # Not found - provide suggestions
        suggestions = [h for h in valid_hosts if host[:3].lower() in h.lower()]
        raise ValidationError(
            f"Invalid host: {host}\n"
            f"Valid options: {', '.join(valid_hosts[:10])}\n"
            + (f"Did you mean: {', '.join(suggestions[:3])}?" if suggestions else "")
        )

    return host


def validate_group(group: str, valid_groups: Optional[List[str]] = None) -> str:
    """
    Validate group parameter.

    Args:
        group: Group name
        valid_groups: List of valid groups (None to skip check)

    Returns:
        str: Validated group name

    Raises:
        ValidationError: If group is invalid

    Example:
        >>> validate_group("production")
        "production"
        >>> validate_group("prod", ["production", "development"])
        ValidationError: Invalid group: prod
    """
    if not group:
        raise ValidationError("Group cannot be empty")

    if not isinstance(group, str):
        raise ValidationError(f"Group must be string, got {type(group)}")

    # Normalize
    group = group.strip().lower()

    # Basic validation
    if not re.match(r'^[a-z0-9_-]+$', group):
        raise ValidationError(
            f"Invalid group name format: {group}\n"
            "Group names must contain only lowercase letters, numbers, dashes, and underscores"
        )

    # Check against the allowed list (if provided)
    if valid_groups:
        if group not in valid_groups:
            suggestions = [g for g in valid_groups if group[:3] in g]
            raise ValidationError(
                f"Invalid group: {group}\n"
                f"Valid groups: {', '.join(valid_groups)}\n"
                + (f"Did you mean: {', '.join(suggestions[:3])}?" if suggestions else "")
            )

    return group


def validate_path_exists(path: str, must_be_file: bool = False,
                         must_be_dir: bool = False) -> Path:
    """
    Validate that a path exists and is accessible.

    Args:
        path: Path to validate
        must_be_file: If True, path must be a file
        must_be_dir: If True, path must be a directory

    Returns:
        Path: Validated Path object

    Raises:
        ValidationError: If path is invalid

    Example:
        >>> validate_path_exists("/tmp", must_be_dir=True)
        Path('/tmp')
        >>> validate_path_exists("/nonexistent")
        ValidationError: Path does not exist: /nonexistent
    """
    if not path:
        raise ValidationError("Path cannot be empty")

    p = Path(path).expanduser().resolve()

    if not p.exists():
        raise ValidationError(
            f"Path does not exist: {path}\n"
            f"Resolved to: {p}"
        )

    if must_be_file and not p.is_file():
        raise ValidationError(f"Path must be a file: {path}")

    if must_be_dir and not p.is_dir():
        raise ValidationError(f"Path must be a directory: {path}")

    return p


def validate_timeout(timeout: int, min_timeout: int = 1,
                     max_timeout: int = 600) -> int:
    """
    Validate timeout parameter.

    Args:
        timeout: Timeout in seconds
        min_timeout: Minimum allowed timeout
        max_timeout: Maximum allowed timeout

    Returns:
        int: Validated timeout

    Raises:
        ValidationError: If timeout is invalid

    Example:
        >>> validate_timeout(10)
        10
        >>> validate_timeout(0)
        ValidationError: Timeout too low: 0s (minimum: 1s)
    """
    if not isinstance(timeout, int):
        raise ValidationError(f"Timeout must be integer, got {type(timeout)}")

    if timeout < min_timeout:
        raise ValidationError(
            f"Timeout too low: {timeout}s (minimum: {min_timeout}s)"
        )

    if timeout > max_timeout:
        raise ValidationError(
            f"Timeout too high: {timeout}s (maximum: {max_timeout}s)"
        )

    return timeout


def validate_command(command: str, allow_dangerous: bool = False) -> str:
    """
    Basic command safety validation.

    Args:
        command: Command to validate
        allow_dangerous: If False, block potentially dangerous commands

    Returns:
        str: Validated command

    Raises:
        ValidationError: If command is invalid or dangerous

    Example:
        >>> validate_command("ls -la")
        "ls -la"
        >>> validate_command("rm -rf /", allow_dangerous=False)
        ValidationError: Potentially dangerous command blocked: rm -rf on root directory
    """
    if not command:
        raise ValidationError("Command cannot be empty")

    if not isinstance(command, str):
        raise ValidationError(f"Command must be string, got {type(command)}")

    command = command.strip()

    if not allow_dangerous:
        # Check for dangerous patterns
        dangerous_patterns = [
            (r'\brm\s+-rf\s+/', "rm -rf on root directory"),
            (r'\bmkfs\.', "filesystem formatting"),
            (r'\bdd\s+.*of=/dev/', "disk writing with dd"),
            (re.escape(':(){:|:&};:'), "fork bomb"),
            (r'>\s*/dev/sd[a-z]', "direct disk writing"),
        ]

        for pattern, description in dangerous_patterns:
            if re.search(pattern, command, re.IGNORECASE):
                raise ValidationError(
                    f"Potentially dangerous command blocked: {description}\n"
                    f"Command: {command}\n"
                    "Use allow_dangerous=True if you really want to execute this"
                )

    return command


def validate_hosts_list(hosts: List[str], valid_hosts: Optional[List[str]] = None) -> List[str]:
    """
    Validate a list of hosts.

    Args:
        hosts: List of host names
        valid_hosts: List of valid hosts (None to skip check)

    Returns:
        List[str]: Validated host names

    Raises:
        ValidationError: If any host is invalid

    Example:
        >>> validate_hosts_list(["web-01", "web-02"])
        ["web-01", "web-02"]
    """
    if not hosts:
        raise ValidationError("Hosts list cannot be empty")

    if not isinstance(hosts, list):
        raise ValidationError(f"Hosts must be list, got {type(hosts)}")

    validated = []
    errors = []

    for host in hosts:
        try:
            validated.append(validate_host(host, valid_hosts))
        except ValidationError as e:
            errors.append(str(e))

    if errors:
        raise ValidationError(
            "Invalid hosts in list:\n" + "\n".join(errors)
        )

    return validated


def main():
    """Test validators."""
    print("Testing parameter validators...\n")

    # Test host validation
    print("1. Testing validate_host():")
    try:
        host = validate_host("web-01", ["web-01", "web-02", "db-01"])
        print(f"  ✓ Valid host: {host}")
    except ValidationError as e:
        print(f"  ✗ Error: {e}")

    try:
        host = validate_host("invalid-host", ["web-01", "web-02"])
        print("  ✗ Should have failed!")
    except ValidationError as e:
        print(f"  ✓ Correctly rejected: {e.args[0].split(chr(10))[0]}")

    # Test group validation
    print("\n2. Testing validate_group():")
    try:
        group = validate_group("production", ["production", "development"])
        print(f"  ✓ Valid group: {group}")
    except ValidationError as e:
        print(f"  ✗ Error: {e}")

    # Test path validation
    print("\n3. Testing validate_path_exists():")
    try:
        path = validate_path_exists("/tmp", must_be_dir=True)
        print(f"  ✓ Valid path: {path}")
    except ValidationError as e:
        print(f"  ✗ Error: {e}")

    # Test timeout validation
    print("\n4. Testing validate_timeout():")
    try:
        timeout = validate_timeout(10)
        print(f"  ✓ Valid timeout: {timeout}s")
    except ValidationError as e:
        print(f"  ✗ Error: {e}")

    try:
        timeout = validate_timeout(0)
        print("  ✗ Should have failed!")
    except ValidationError as e:
        print(f"  ✓ Correctly rejected: {e.args[0].split(chr(10))[0]}")

    # Test command validation
    print("\n5. Testing validate_command():")
    try:
        cmd = validate_command("ls -la")
        print(f"  ✓ Safe command: {cmd}")
    except ValidationError as e:
        print(f"  ✗ Error: {e}")

    try:
        cmd = validate_command("rm -rf /", allow_dangerous=False)
        print("  ✗ Should have failed!")
    except ValidationError as e:
        print(f"  ✓ Correctly blocked: {e.args[0].split(chr(10))[0]}")

    print("\n✅ All parameter validators tested")


if __name__ == "__main__":
    main()
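The dangerous-pattern screen in `validate_command` can be exercised on its own. This is a minimal sketch of the matching logic, using a subset of the patterns from the file above; `first_dangerous_match` is a hypothetical helper introduced for illustration:

```python
import re

# Subset of the dangerous patterns screened by validate_command above
DANGEROUS_PATTERNS = [
    (r'\brm\s+-rf\s+/', "rm -rf on root directory"),
    (r'\bmkfs\.', "filesystem formatting"),
    (r'\bdd\s+.*of=/dev/', "disk writing with dd"),
]


def first_dangerous_match(command: str):
    """Return the description of the first dangerous pattern found, else None."""
    for pattern, description in DANGEROUS_PATTERNS:
        if re.search(pattern, command, re.IGNORECASE):
            return description
    return None


print(first_dangerous_match("ls -la"))    # None
print(first_dangerous_match("rm -rf /"))  # rm -rf on root directory
```

Word boundaries (`\b`) keep the patterns from firing inside longer tokens, so e.g. `firm -rfx` passes while `rm -rf /` is caught.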
445
scripts/workflow_executor.py
Normal file
@@ -0,0 +1,445 @@
#!/usr/bin/env python3
"""
Workflow executor for Tailscale SSH Sync Agent.
Automates common multi-machine workflows.
"""

import sys
from pathlib import Path
from typing import Dict, List, Optional
import time
import logging

# Add utils to path
sys.path.insert(0, str(Path(__file__).parent))

from utils.helpers import format_duration, get_timestamp
from sshsync_wrapper import execute_on_group, execute_on_host, push_to_hosts
from load_balancer import get_group_capacity

logger = logging.getLogger(__name__)


def deploy_workflow(code_path: str,
                    staging_group: str,
                    prod_group: str,
                    run_tests: bool = True) -> Dict:
    """
    Full deployment pipeline: staging → test → production.

    Args:
        code_path: Path to code to deploy
        staging_group: Staging server group
        prod_group: Production server group
        run_tests: Whether to run tests on staging

    Returns:
        Dict with deployment results

    Example:
        >>> result = deploy_workflow("./dist", "staging", "production")
        >>> result['success']
        True
        >>> result['duration']
        "12m 45s"
    """
    start_time = time.time()
    results = {
        'stages': {},
        'success': False,
        'start_time': get_timestamp()
    }

    try:
        # Stage 1: Deploy to staging
        logger.info("Stage 1: Deploying to staging...")
        stage1 = push_to_hosts(
            local_path=code_path,
            remote_path="/var/www/app",
            group=staging_group,
            recurse=True
        )

        results['stages']['staging_deploy'] = stage1

        if not stage1.get('success'):
            results['error'] = 'Staging deployment failed'
            return results

        # Build on staging
        logger.info("Building on staging...")
        build_result = execute_on_group(
            staging_group,
            "cd /var/www/app && npm run build",
            timeout=300
        )

        results['stages']['staging_build'] = build_result

        if not build_result.get('success'):
            results['error'] = 'Staging build failed'
            return results

        # Stage 2: Run tests (if enabled)
        if run_tests:
            logger.info("Stage 2: Running tests...")
            test_result = execute_on_group(
                staging_group,
                "cd /var/www/app && npm test",
                timeout=600
            )

            results['stages']['tests'] = test_result

            if not test_result.get('success'):
                results['error'] = 'Tests failed on staging'
                return results

        # Stage 3: Validation
        logger.info("Stage 3: Validating staging...")
        health_result = execute_on_group(
            staging_group,
            "curl -f http://localhost:3000/health || echo 'Health check failed'",
            timeout=10
        )

        results['stages']['staging_validation'] = health_result

        # Stage 4: Deploy to production
        logger.info("Stage 4: Deploying to production...")
        prod_deploy = push_to_hosts(
            local_path=code_path,
            remote_path="/var/www/app",
            group=prod_group,
            recurse=True
        )

        results['stages']['production_deploy'] = prod_deploy

        if not prod_deploy.get('success'):
            results['error'] = 'Production deployment failed'
            return results

        # Build and restart on production
        logger.info("Building and restarting production...")
        prod_build = execute_on_group(
            prod_group,
            "cd /var/www/app && npm run build && pm2 restart app",
            timeout=300
        )

        results['stages']['production_build'] = prod_build

        # Stage 5: Production verification
        logger.info("Stage 5: Verifying production...")
        prod_health = execute_on_group(
            prod_group,
            "curl -f http://localhost:3000/health",
            timeout=15
        )

        results['stages']['production_verification'] = prod_health

        # Success!
        results['success'] = True
        results['duration'] = format_duration(time.time() - start_time)

        return results

    except Exception as e:
        logger.error(f"Deployment workflow error: {e}")
        results['error'] = str(e)
        results['duration'] = format_duration(time.time() - start_time)
        return results


def backup_workflow(hosts: List[str],
                    backup_paths: List[str],
                    destination: str) -> Dict:
    """
    Back up files from multiple hosts.

    Args:
        hosts: List of hosts to back up from
        backup_paths: Paths to back up on each host
        destination: Local destination directory

    Returns:
        Dict with backup results

    Example:
        >>> result = backup_workflow(
        ...     ["db-01", "db-02"],
        ...     ["/var/lib/mysql"],
        ...     "./backups"
        ... )
        >>> result['backed_up_hosts']
        2
    """
    from sshsync_wrapper import pull_from_host

    start_time = time.time()
    results = {
        'hosts': {},
        'success': True,
        'backed_up_hosts': 0
    }

    for host in hosts:
        host_results = []

        for backup_path in backup_paths:
            # Create timestamped backup directory
            timestamp = time.strftime("%Y%m%d_%H%M%S")
            host_dest = f"{destination}/{host}_{timestamp}"

            result = pull_from_host(
                host=host,
                remote_path=backup_path,
                local_path=host_dest,
                recurse=True
            )

            host_results.append(result)

            if not result.get('success'):
                results['success'] = False

        results['hosts'][host] = host_results

        if all(r.get('success') for r in host_results):
            results['backed_up_hosts'] += 1

    results['duration'] = format_duration(time.time() - start_time)

    return results


def sync_workflow(source_host: str,
                  target_group: str,
                  paths: List[str]) -> Dict:
    """
    Sync files from one host to many.

    Args:
        source_host: Host to pull from
        target_group: Group to push to
        paths: Paths to sync

    Returns:
        Dict with sync results

    Example:
        >>> result = sync_workflow(
        ...     "master-db",
        ...     "replica-dbs",
        ...     ["/var/lib/mysql/config"]
        ... )
        >>> result['success']
        True
    """
    from sshsync_wrapper import pull_from_host, push_to_hosts
    import tempfile

    start_time = time.time()
    results = {'paths': {}, 'success': True}

    # Create temp directory
    with tempfile.TemporaryDirectory() as temp_dir:
        for path in paths:
            # Pull from source
            pull_result = pull_from_host(
                host=source_host,
                remote_path=path,
                local_path=f"{temp_dir}/{Path(path).name}",
                recurse=True
            )

            if not pull_result.get('success'):
                results['paths'][path] = {
                    'success': False,
                    'error': 'Pull from source failed'
                }
                results['success'] = False
                continue

            # Push to targets
            push_result = push_to_hosts(
                local_path=f"{temp_dir}/{Path(path).name}",
                remote_path=path,
                group=target_group,
                recurse=True
            )

            results['paths'][path] = {
                'pull': pull_result,
                'push': push_result,
                'success': push_result.get('success', False)
            }

            if not push_result.get('success'):
                results['success'] = False

    results['duration'] = format_duration(time.time() - start_time)

    return results


def rolling_restart(group: str,
                    service_name: str,
                    wait_between: int = 30) -> Dict:
    """
    Zero-downtime rolling restart of a service across a group.

    Args:
        group: Group to restart
        service_name: Service name (e.g., "nginx", "app")
        wait_between: Seconds to wait between restarts

    Returns:
        Dict with restart results

    Example:
        >>> result = rolling_restart("web-servers", "nginx")
        >>> result['restarted_count']
        3
    """
    from utils.helpers import parse_sshsync_config

    start_time = time.time()
    groups_config = parse_sshsync_config()
    hosts = groups_config.get(group, [])

    if not hosts:
        return {
            'success': False,
            'error': f'Group {group} not found or empty'
        }

    results = {
        'hosts': {},
        'restarted_count': 0,
        'failed_count': 0,
        'success': True
    }

    for host in hosts:
        logger.info(f"Restarting {service_name} on {host}...")

        # Restart service
        restart_result = execute_on_host(
            host,
            f"sudo systemctl restart {service_name} || sudo service {service_name} restart",
            timeout=30
        )

        # Health check
        time.sleep(5)  # Wait for service to start

        health_result = execute_on_host(
            host,
            f"sudo systemctl is-active {service_name} || sudo service {service_name} status",
            timeout=10
        )

        success = restart_result.get('success') and health_result.get('success')

        results['hosts'][host] = {
            'restart': restart_result,
            'health': health_result,
            'success': success
        }

        if success:
            results['restarted_count'] += 1
            logger.info(f"✓ {host} restarted successfully")
        else:
            results['failed_count'] += 1
            results['success'] = False
            logger.error(f"✗ {host} restart failed")

        # Wait before next restart (except after the last host)
        if host != hosts[-1]:
            time.sleep(wait_between)

    results['duration'] = format_duration(time.time() - start_time)

    return results


def health_check_workflow(group: str,
                          endpoint: str = "/health",
                          timeout: int = 10) -> Dict:
    """
    Check a health endpoint across a group.

    Args:
        group: Group to check
        endpoint: Health endpoint path
        timeout: Request timeout

    Returns:
        Dict with health check results

    Example:
        >>> result = health_check_workflow("production", "/health")
        >>> result['healthy_count']
        3
    """
    from utils.helpers import parse_sshsync_config

    groups_config = parse_sshsync_config()
    hosts = groups_config.get(group, [])

    if not hosts:
        return {
            'success': False,
            'error': f'Group {group} not found or empty'
        }

    results = {
        'hosts': {},
        'healthy_count': 0,
        'unhealthy_count': 0
    }

    for host in hosts:
        health_result = execute_on_host(
            host,
            f"curl -f -s -o /dev/null -w '%{{http_code}}' http://localhost:3000{endpoint}",
            timeout=timeout
        )

        is_healthy = (
            health_result.get('success') and
            '200' in health_result.get('stdout', '')
        )

        results['hosts'][host] = {
            'healthy': is_healthy,
            'response': health_result.get('stdout', '').strip()
        }

        if is_healthy:
            results['healthy_count'] += 1
        else:
            results['unhealthy_count'] += 1

    results['success'] = results['unhealthy_count'] == 0

    return results


def main():
    """Test workflow executor functions."""
    print("Testing workflow executor...\n")

    print("Note: Workflow executor requires configured hosts and groups.")
    print("Tests would execute real operations, so showing dry-run simulations.\n")

    print("✅ Workflow executor ready")


if __name__ == "__main__":
    main()
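The per-host loop in `rolling_restart` (restart, brief settle, health probe, pause before the next host) can be sketched against a stubbed executor; `rolling_restart_sketch` and `execute` are illustrative stand-ins for the real function and for `execute_on_host`, with the sleeps omitted:

```python
from typing import Callable, Dict, List


def rolling_restart_sketch(hosts: List[str],
                           execute: Callable[[str, str], bool],
                           service: str = "nginx") -> Dict[str, bool]:
    """Restart `service` host by host, recording per-host success.

    `execute(host, cmd)` stands in for execute_on_host(); a real run would
    also sleep after each restart and between hosts.
    """
    results = {}
    for host in hosts:
        restarted = execute(host, f"sudo systemctl restart {service}")
        healthy = execute(host, f"sudo systemctl is-active {service}")
        results[host] = restarted and healthy
    return results


# Stub executor: pretend web-02's health check fails
def fake_execute(host: str, cmd: str) -> bool:
    return not (host == "web-02" and "is-active" in cmd)


print(rolling_restart_sketch(["web-01", "web-02"], fake_execute))
# {'web-01': True, 'web-02': False}
```

Restarting one host at a time and gating on the health probe is what keeps the restart zero-downtime: the rest of the group keeps serving while each host cycles.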
180
tests/test_helpers.py
Normal file
@@ -0,0 +1,180 @@
#!/usr/bin/env python3
"""
Tests for helper utilities.
"""

import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).parent.parent / 'scripts'))

from utils.helpers import *


def test_format_bytes():
    """Test byte formatting."""
    assert format_bytes(0) == "0.0 B"
    assert format_bytes(512) == "512.0 B"
    assert format_bytes(1024) == "1.0 KB"
    assert format_bytes(1048576) == "1.0 MB"
    assert format_bytes(1073741824) == "1.0 GB"
    print("✓ format_bytes() passed")
    return True


def test_format_duration():
    """Test duration formatting."""
    assert format_duration(30) == "30s"
    assert format_duration(65) == "1m 5s"
    assert format_duration(3600) == "1h"
    assert format_duration(3665) == "1h 1m"
    assert format_duration(7265) == "2h 1m"
    print("✓ format_duration() passed")
    return True


def test_format_percentage():
    """Test percentage formatting."""
    assert format_percentage(45.567) == "45.6%"
    assert format_percentage(100) == "100.0%"
    assert format_percentage(0.123, decimals=2) == "0.12%"
    print("✓ format_percentage() passed")
    return True


def test_calculate_load_score():
    """Test load score calculation."""
    score = calculate_load_score(50, 50, 50)
    assert 0 <= score <= 1
    assert abs(score - 0.5) < 0.01

    score_low = calculate_load_score(20, 30, 25)
    score_high = calculate_load_score(80, 85, 90)
    assert score_low < score_high

    print("✓ calculate_load_score() passed")
    return True


def test_classify_load_status():
    """Test load status classification."""
    assert classify_load_status(0.2) == "low"
    assert classify_load_status(0.5) == "moderate"
    assert classify_load_status(0.8) == "high"
    print("✓ classify_load_status() passed")
    return True


def test_classify_latency():
    """Test latency classification."""
    status, desc = classify_latency(25)
    assert status == "excellent"
    assert "interactive" in desc.lower()

    status, desc = classify_latency(150)
    assert status == "fair"

    print("✓ classify_latency() passed")
    return True


def test_parse_disk_usage():
    """Test disk usage parsing."""
    sample_output = """Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       100G   45G   50G  45% /"""

    result = parse_disk_usage(sample_output)
    assert result['filesystem'] == '/dev/sda1'
    assert result['size'] == '100G'
    assert result['used'] == '45G'
    assert result['use_pct'] == 45

    print("✓ parse_disk_usage() passed")
    return True


def test_parse_cpu_load():
    """Test CPU load parsing."""
    sample_output = "19:43:41 up 5 days, 2:15, 3 users, load average: 0.45, 0.38, 0.32"

    result = parse_cpu_load(sample_output)
    assert result['load_1min'] == 0.45
    assert result['load_5min'] == 0.38
    assert result['load_15min'] == 0.32

    print("✓ parse_cpu_load() passed")
    return True


def test_get_timestamp():
    """Test timestamp generation."""
    ts_iso = get_timestamp(iso=True)
    assert 'T' in ts_iso
    assert 'Z' in ts_iso

    ts_human = get_timestamp(iso=False)
    assert ' ' in ts_human
    assert len(ts_human) == 19  # YYYY-MM-DD HH:MM:SS

    print("✓ get_timestamp() passed")
    return True


def test_validate_path():
    """Test path validation."""
    assert validate_path("/tmp", must_exist=True) == True
    assert validate_path("/nonexistent_path_12345", must_exist=False) == False

    print("✓ validate_path() passed")
    return True


def test_safe_execute():
    """Test safe execution wrapper."""
    # Should return result on success
    result = safe_execute(int, "42")
    assert result == 42

    # Should return default on failure
    result = safe_execute(int, "not_a_number", default=0)
    assert result == 0

    print("✓ safe_execute() passed")
    return True


def main():
    """Run all helper tests."""
    print("=" * 70)
    print("HELPER TESTS")
    print("=" * 70)

    tests = [
        test_format_bytes,
        test_format_duration,
        test_format_percentage,
        test_calculate_load_score,
        test_classify_load_status,
        test_classify_latency,
        test_parse_disk_usage,
        test_parse_cpu_load,
        test_get_timestamp,
        test_validate_path,
        test_safe_execute,
    ]

    passed = 0
    for test in tests:
        try:
            if test():
                passed += 1
        except Exception as e:
            print(f"✗ {test.__name__} failed: {e}")

    print(f"\nResults: {passed}/{len(tests)} passed")
    return passed == len(tests)


if __name__ == "__main__":
    success = main()
    sys.exit(0 if success else 1)
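The formatting tests above pin down exact output strings. A minimal implementation consistent with those assertions (the real `utils.helpers` may differ internally) could look like:

```python
def format_bytes(n: float) -> str:
    """Format a byte count with one decimal place, scaling by 1024."""
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if n < 1024 or unit == "TB":
            return f"{n:.1f} {unit}"
        n /= 1024


def format_duration(seconds: float) -> str:
    """Format seconds as 30s / 1m 5s / 1h 1m, dropping zero-valued parts."""
    seconds = int(seconds)
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    if h:
        return f"{h}h {m}m" if m else f"{h}h"
    if m:
        return f"{m}m {s}s" if s else f"{m}m"
    return f"{s}s"


print(format_bytes(1048576))  # 1.0 MB
print(format_duration(3665))  # 1h 1m
```

Note how `format_duration(3600)` must yield `"1h"` rather than `"1h 0m"`: the tests require zero-valued trailing parts to be dropped, which is why each branch checks the smaller unit before including it.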
346
tests/test_integration.py
Normal file
@@ -0,0 +1,346 @@
#!/usr/bin/env python3
"""
Integration tests for Tailscale SSH Sync Agent.

Tests complete workflows from query to result.
"""

import sys
from pathlib import Path

# Add scripts to path
sys.path.insert(0, str(Path(__file__).parent.parent / 'scripts'))

from sshsync_wrapper import get_host_status, list_hosts, get_groups
from tailscale_manager import get_tailscale_status, get_network_summary
from load_balancer import format_load_report, MachineMetrics
from utils.helpers import (
    format_bytes, format_duration, format_percentage,
    calculate_load_score, classify_load_status, classify_latency
)


def test_host_status_basic():
    """Test get_host_status() without errors."""
    print("\n✓ Testing get_host_status()...")

    try:
        result = get_host_status()

        # Validations
        assert 'hosts' in result, "Missing 'hosts' in result"
        assert isinstance(result.get('hosts', []), list), "'hosts' must be list"

        # Should have basic counts even if no hosts configured
        assert 'total_count' in result, "Missing 'total_count'"
        assert 'online_count' in result, "Missing 'online_count'"
        assert 'offline_count' in result, "Missing 'offline_count'"

        print(f"  ✓ Found {result.get('total_count', 0)} hosts")
        print(f"  ✓ Online: {result.get('online_count', 0)}")
        print(f"  ✓ Offline: {result.get('offline_count', 0)}")

        return True

    except Exception as e:
        print(f"  ✗ FAILED: {e}")
        import traceback
        traceback.print_exc()
        return False


def test_list_hosts():
    """Test list_hosts() function."""
    print("\n✓ Testing list_hosts()...")

    try:
        result = list_hosts(with_status=False)

        assert 'hosts' in result, "Missing 'hosts' in result"
        assert 'count' in result, "Missing 'count' in result"
        assert isinstance(result['hosts'], list), "'hosts' must be list"

        print("  ✓ List hosts working")
        print(f"  ✓ Found {result['count']} configured hosts")

        return True

    except Exception as e:
        print(f"  ✗ FAILED: {e}")
        return False


def test_get_groups():
    """Test get_groups() function."""
    print("\n✓ Testing get_groups()...")

    try:
        groups = get_groups()

        assert isinstance(groups, dict), "Groups must be dict"

        print("  ✓ Groups config loaded")
        print(f"  ✓ Found {len(groups)} groups")

        for group, hosts in list(groups.items())[:3]:  # Show first 3
            print(f"    - {group}: {len(hosts)} hosts")

        return True

    except Exception as e:
        print(f"  ✗ FAILED: {e}")
        return False


def test_tailscale_status():
    """Test Tailscale status check."""
    print("\n✓ Testing get_tailscale_status()...")

    try:
        status = get_tailscale_status()

        assert isinstance(status, dict), "Status must be dict"
        assert 'connected' in status, "Missing 'connected' field"

        if status.get('connected'):
            print("  ✓ Tailscale connected")
            print(f"  ✓ Peers: {status.get('total_count', 0)} total, {status.get('online_count', 0)} online")
        else:
            print(f"  ℹ Tailscale not connected: {status.get('error', 'Unknown')}")
            print("    (This is OK if Tailscale is not installed/configured)")

        return True

    except Exception as e:
        print(f"  ✗ FAILED: {e}")
        return False


def test_network_summary():
    """Test network summary generation."""
    print("\n✓ Testing get_network_summary()...")

    try:
        summary = get_network_summary()

        assert isinstance(summary, str), "Summary must be string"
        assert len(summary) > 0, "Summary cannot be empty"

        print("  ✓ Network summary generated:")
        for line in summary.split('\n'):
            print(f"    {line}")

        return True

    except Exception as e:
        print(f"  ✗ FAILED: {e}")
        return False


def test_format_helpers():
    """Test formatting helper functions."""
    print("\n✓ Testing format helpers...")

    try:
        # Test format_bytes
        assert format_bytes(1024) == "1.0 KB", "format_bytes failed for 1024"
        assert format_bytes(12582912) == "12.0 MB", "format_bytes failed for 12MB"

        # Test format_duration
        assert format_duration(65) == "1m 5s", "format_duration failed for 65s"
        assert format_duration(3665) == "1h 1m", "format_duration failed for 1h+"

        # Test format_percentage
        assert format_percentage(45.567) == "45.6%", "format_percentage failed"

        print(f"  ✓ format_bytes(12582912) = {format_bytes(12582912)}")
        print(f"  ✓ format_duration(3665) = {format_duration(3665)}")
        print(f"  ✓ format_percentage(45.567) = {format_percentage(45.567)}")

        return True

    except Exception as e:
        print(f"  ✗ FAILED: {e}")
        return False


def test_load_score_calculation():
    """Test load score calculation."""
    print("\n✓ Testing calculate_load_score()...")

    try:
        # Test various scenarios
        score1 = calculate_load_score(45, 60, 40)
        assert 0 <= score1 <= 1, "Score must be 0-1"
        assert abs(score1 - 0.49) < 0.01, f"Expected ~0.49, got {score1}"

        score2 = calculate_load_score(20, 35, 30)
        assert score2 < score1, "Lower usage should have lower score"

        score3 = calculate_load_score(85, 70, 65)
        assert score3 > score1, "Higher usage should have higher score"

        print(f"  ✓ Low load (20%, 35%, 30%): {score2:.2f}")
        print(f"  ✓ Med load (45%, 60%, 40%): {score1:.2f}")
        print(f"  ✓ High load (85%, 70%, 65%): {score3:.2f}")

        return True

    except Exception as e:
        print(f"  ✗ FAILED: {e}")
        return False


def test_load_classification():
    """Test load status classification."""
    print("\n✓ Testing classify_load_status()...")

    try:
        assert classify_load_status(0.28) == "low", "0.28 should be 'low'"
        assert classify_load_status(0.55) == "moderate", "0.55 should be 'moderate'"
        assert classify_load_status(0.82) == "high", "0.82 should be 'high'"

        print(f"  ✓ Score 0.28 = {classify_load_status(0.28)}")
        print(f"  ✓ Score 0.55 = {classify_load_status(0.55)}")
        print(f"  ✓ Score 0.82 = {classify_load_status(0.82)}")

        return True

    except Exception as e:
        print(f"  ✗ FAILED: {e}")
        return False


def test_latency_classification():
    """Test network latency classification."""
    print("\n✓ Testing classify_latency()...")

    try:
        status1, desc1 = classify_latency(25)
        assert status1 == "excellent", "25ms should be 'excellent'"

        status2, desc2 = classify_latency(75)
        assert status2 == "good", "75ms should be 'good'"

        status3, desc3 = classify_latency(150)
        assert status3 == "fair", "150ms should be 'fair'"

        status4, desc4 = classify_latency(250)
        assert status4 == "poor", "250ms should be 'poor'"

        print(f"  ✓ 25ms: {status1} - {desc1}")
        print(f"  ✓ 75ms: {status2} - {desc2}")
        print(f"  ✓ 150ms: {status3} - {desc3}")
        print(f"  ✓ 250ms: {status4} - {desc4}")

        return True

    except Exception as e:
        print(f"  ✗ FAILED: {e}")
        return False


def test_load_report_formatting():
    """Test load report formatting."""
    print("\n✓ Testing format_load_report()...")

    try:
        metrics = MachineMetrics(
            host='web-01',
            cpu_pct=45.0,
            mem_pct=60.0,
            disk_pct=40.0,
            load_score=0.49,
            status='moderate'
        )

        report = format_load_report(metrics)

        assert 'web-01' in report, "Report must include hostname"
        assert '0.49' in report, "Report must include load score"
        assert 'moderate' in report, "Report must include status"

        print("  ✓ Report generated:")
        for line in report.split('\n'):
            print(f"    {line}")

        return True

    except Exception as e:
        print(f"  ✗ FAILED: {e}")
        return False


def test_dry_run_execution():
    """Test dry-run mode for operations."""
    print("\n✓ Testing dry-run execution...")

    try:
        from sshsync_wrapper import execute_on_all

        result = execute_on_all("uptime", dry_run=True)

        assert result.get('dry_run') == True, "Must indicate dry-run mode"
        assert 'command' in result, "Must include command"
        assert 'message' in result, "Must include message"

        print("  ✓ Dry-run mode working")
        print(f"  ✓ Command: {result.get('command')}")
        print(f"  ✓ Message: {result.get('message')}")

        return True

    except Exception as e:
        print(f"  ✗ FAILED: {e}")
        return False


def main():
    """Run all integration tests."""
    print("=" * 70)
    print("INTEGRATION TESTS - Tailscale SSH Sync Agent")
    print("=" * 70)

    tests = [
        ("Host status check", test_host_status_basic),
        ("List hosts", test_list_hosts),
        ("Get groups", test_get_groups),
        ("Tailscale status", test_tailscale_status),
        ("Network summary", test_network_summary),
        ("Format helpers", test_format_helpers),
        ("Load score calculation", test_load_score_calculation),
        ("Load classification", test_load_classification),
        ("Latency classification", test_latency_classification),
        ("Load report formatting", test_load_report_formatting),
        ("Dry-run execution", test_dry_run_execution),
    ]

    results = []
    for test_name, test_func in tests:
        passed = test_func()
        results.append((test_name, passed))

    # Summary
    print("\n" + "=" * 70)
    print("SUMMARY")
    print("=" * 70)

    for test_name, passed in results:
        status = "✅ PASS" if passed else "❌ FAIL"
        print(f"{status}: {test_name}")

    passed_count = sum(1 for _, p in results if p)
    total_count = len(results)

    print(f"\nResults: {passed_count}/{total_count} passed")

    if passed_count == total_count:
        print("\n🎉 All tests passed!")
    else:
        print(f"\n⚠️  {total_count - passed_count} test(s) failed")

    return passed_count == total_count


if __name__ == "__main__":
    success = main()
    sys.exit(0 if success else 1)
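The load-score assertions in test_integration.py pin the scoring function down fairly tightly. A minimal sketch consistent with those asserted values — the 0.5/0.3/0.2 CPU/memory/disk weights and the 0.4/0.7 classification cut-offs are assumptions inferred from the tests, not implementation details shown in this diff:

```python
# Hypothetical sketch of load_balancer's scoring helpers; weights and
# thresholds below are inferred from the test assertions, not confirmed.
def calculate_load_score(cpu_pct: float, mem_pct: float, disk_pct: float) -> float:
    """Combine resource percentages into a 0-1 load score (CPU weighted highest)."""
    return (0.5 * cpu_pct + 0.3 * mem_pct + 0.2 * disk_pct) / 100.0

def classify_load_status(score: float) -> str:
    """Bucket a 0-1 load score into 'low' / 'moderate' / 'high'."""
    if score < 0.4:
        return "low"
    if score < 0.7:
        return "moderate"
    return "high"

print(calculate_load_score(45, 60, 40))  # 0.485, within the test's ~0.49 tolerance
print(classify_load_status(0.55))        # moderate
```

With these assumed weights, (45, 60, 40) scores 0.485 and (85, 70, 65) scores 0.765, matching the ordering the tests require.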
177 tests/test_validation.py Normal file
@@ -0,0 +1,177 @@
#!/usr/bin/env python3
"""
Tests for validators.
"""

import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).parent.parent / 'scripts'))

from utils.validators import *


def test_validate_host():
    """Test host validation."""
    # Valid host
    assert validate_host("web-01") == "web-01"
    assert validate_host(" web-01 ") == "web-01"  # Strips whitespace

    # With valid list
    assert validate_host("web-01", ["web-01", "web-02"]) == "web-01"

    # Invalid format
    try:
        validate_host("web@01")  # Invalid character
        assert False, "Should have raised ValidationError"
    except ValidationError:
        pass

    print("✓ validate_host() passed")
    return True


def test_validate_group():
    """Test group validation."""
    # Valid group
    assert validate_group("production") == "production"
    assert validate_group("PRODUCTION") == "production"  # Lowercase normalization

    # With valid list
    assert validate_group("production", ["production", "staging"]) == "production"

    # Invalid
    try:
        validate_group("invalid!", ["production"])
        assert False, "Should have raised ValidationError"
    except ValidationError:
        pass

    print("✓ validate_group() passed")
    return True


def test_validate_path_exists():
    """Test path existence validation."""
    # Valid path
    path = validate_path_exists("/tmp", must_be_dir=True)
    assert isinstance(path, Path)

    # Invalid path
    try:
        validate_path_exists("/nonexistent_12345")
        assert False, "Should have raised ValidationError"
    except ValidationError:
        pass

    print("✓ validate_path_exists() passed")
    return True


def test_validate_timeout():
    """Test timeout validation."""
    # Valid timeouts
    assert validate_timeout(10) == 10
    assert validate_timeout(1) == 1
    assert validate_timeout(600) == 600

    # Too low
    try:
        validate_timeout(0)
        assert False, "Should have raised ValidationError"
    except ValidationError:
        pass

    # Too high
    try:
        validate_timeout(1000)
        assert False, "Should have raised ValidationError"
    except ValidationError:
        pass

    print("✓ validate_timeout() passed")
    return True


def test_validate_command():
    """Test command validation."""
    # Safe commands
    assert validate_command("ls -la") == "ls -la"
    assert validate_command("uptime") == "uptime"

    # Dangerous commands (should fail without allow_dangerous)
    try:
        validate_command("rm -rf /")
        assert False, "Should have blocked dangerous command"
    except ValidationError:
        pass

    # But should work with allow_dangerous
    assert validate_command("rm -rf /tmp/test", allow_dangerous=True)

    print("✓ validate_command() passed")
    return True


def test_validate_hosts_list():
    """Test list validation."""
    # Valid list
    hosts = validate_hosts_list(["web-01", "web-02"])
    assert len(hosts) == 2
    assert "web-01" in hosts

    # Empty list
    try:
        validate_hosts_list([])
        assert False, "Should have raised ValidationError for empty list"
    except ValidationError:
        pass

    print("✓ validate_hosts_list() passed")
    return True


def test_get_invalid_hosts():
    """Test finding invalid hosts."""
    # Test with mix of valid and invalid
    # (This would require actual SSH config, so we test the function exists)
    result = get_invalid_hosts(["web-01", "nonexistent-host-12345"])
    assert isinstance(result, list)

    print("✓ get_invalid_hosts() passed")
    return True


def main():
    """Run all validation tests."""
    print("=" * 70)
    print("VALIDATION TESTS")
    print("=" * 70)

    tests = [
        test_validate_host,
        test_validate_group,
        test_validate_path_exists,
        test_validate_timeout,
        test_validate_command,
        test_validate_hosts_list,
        test_get_invalid_hosts,
    ]

    passed = 0
    for test in tests:
        try:
            if test():
                passed += 1
        except Exception as e:
            print(f"✗ {test.__name__} failed: {e}")
            import traceback
            traceback.print_exc()

    print(f"\nResults: {passed}/{len(tests)} passed")
    return passed == len(tests)


if __name__ == "__main__":
    success = main()
    sys.exit(0 if success else 1)
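The timeout tests above accept 1–600 seconds and reject 0 and 1000, which implies a simple bounded-range check. A minimal sketch of such a validator, assuming those bounds — the actual limits inside `utils.validators` are not shown in this diff:

```python
class ValidationError(ValueError):
    """Raised when user-supplied input fails validation."""

# Bounds assumed from the tests: 1s and 600s pass, 0 and 1000 are rejected.
MIN_TIMEOUT_S, MAX_TIMEOUT_S = 1, 600

def validate_timeout(seconds: int) -> int:
    """Return the timeout unchanged if it falls inside the allowed window."""
    if not isinstance(seconds, int) or isinstance(seconds, bool):
        raise ValidationError(f"timeout must be an int, got {type(seconds).__name__}")
    if not MIN_TIMEOUT_S <= seconds <= MAX_TIMEOUT_S:
        raise ValidationError(
            f"timeout must be {MIN_TIMEOUT_S}-{MAX_TIMEOUT_S}s, got {seconds}"
        )
    return seconds

print(validate_timeout(10))  # 10
```

Validating and returning the input (rather than returning a bool) lets callers write `timeout = validate_timeout(user_value)` in one step, which is the pattern the tests exercise.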