Initial commit

This commit is contained in:
Zhongwei Li
2025-11-29 18:47:40 +08:00
commit 14c678ceac
22 changed files with 7501 additions and 0 deletions


@@ -0,0 +1,12 @@
{
"name": "tailscale-sshsync-agent",
"description": "Manages distributed workloads and file sharing across Tailscale SSH-connected machines. Automates remote command execution, intelligent load balancing, file synchronization workflows, host health monitoring, and multi-machine orchestration using sshsync.",
"version": "0.0.0-2025.11.28",
"author": {
"name": "William VanSickle III",
"email": "noreply@humanfrontierlabs.com"
},
"skills": [
"./"
]
}

163
CHANGELOG.md Normal file

@@ -0,0 +1,163 @@
# Changelog
All notable changes to Tailscale SSH Sync Agent will be documented here.
Format based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
Versioning follows [Semantic Versioning](https://semver.org/).
## [1.0.0] - 2025-10-19
### Added
**Core Functionality:**
- `sshsync_wrapper.py`: Python interface to sshsync CLI operations
- `get_host_status()`: Check online/offline status of hosts
- `execute_on_all()`: Run commands on all configured hosts
- `execute_on_group()`: Run commands on specific groups
- `execute_on_host()`: Run commands on single host
- `push_to_hosts()`: Push files to multiple hosts (with group support)
- `pull_from_host()`: Pull files from hosts
- `list_hosts()`: List all configured hosts
- `get_groups()`: Get group configuration
- `tailscale_manager.py`: Tailscale-specific operations
- `get_tailscale_status()`: Get complete network status
- `check_connectivity()`: Ping hosts via Tailscale
- `get_peer_info()`: Get detailed peer information
- `list_online_machines()`: List all online Tailscale machines
- `validate_tailscale_ssh()`: Check if Tailscale SSH works for a host
- `get_network_summary()`: Human-readable network summary
- `load_balancer.py`: Intelligent task distribution
- `get_machine_load()`: Get CPU, memory, disk metrics for a machine
- `select_optimal_host()`: Pick best host based on current load
- `get_group_capacity()`: Get aggregate capacity of a group
- `distribute_tasks()`: Distribute multiple tasks optimally across hosts
- `format_load_report()`: Format load metrics as human-readable report
- `workflow_executor.py`: Common multi-machine workflows
- `deploy_workflow()`: Full deployment pipeline (staging → test → production)
- `backup_workflow()`: Backup files from multiple hosts
- `sync_workflow()`: Sync files from one host to many
- `rolling_restart()`: Zero-downtime service restart across group
- `health_check_workflow()`: Check health endpoints across group
**Utilities:**
- `utils/helpers.py`: Common formatting and parsing functions
- Byte formatting (`format_bytes`)
- Duration formatting (`format_duration`)
- Percentage formatting (`format_percentage`)
- SSH config parsing (`parse_ssh_config`)
- sshsync config parsing (`parse_sshsync_config`)
- System metrics parsing (`parse_disk_usage`, `parse_memory_usage`, `parse_cpu_load`)
- Load score calculation (`calculate_load_score`)
- Status classification (`classify_load_status`, `classify_latency`)
- Safe command execution (`run_command`, `safe_execute`)
- `utils/validators/`: Comprehensive validation system
- `parameter_validator.py`: Input validation (hosts, groups, paths, timeouts, commands)
- `host_validator.py`: Host configuration and availability validation
- `connection_validator.py`: SSH and Tailscale connection validation
**Testing:**
- `tests/test_integration.py`: 11 end-to-end integration tests
- `tests/test_helpers.py`: 11 helper function tests
- `tests/test_validation.py`: 7 validation tests
- **Total: 29 tests** covering all major functionality
**Documentation:**
- `SKILL.md`: Complete skill documentation (6,000+ words)
- When to use this skill
- How it works
- Data sources (sshsync CLI, Tailscale)
- Detailed workflows for each operation type
- Available scripts and functions
- Error handling and validations
- Performance and caching strategies
- Usage examples
- `references/sshsync-guide.md`: Complete sshsync CLI reference
- `references/tailscale-integration.md`: Tailscale integration guide
- `README.md`: Installation and quick start guide
- `INSTALLATION.md`: Detailed setup tutorial
- `DECISIONS.md`: Architecture decisions and rationale
### Data Sources
**sshsync CLI:**
- Installation: `pip install sshsync`
- Configuration: `~/.config/sshsync/config.yaml`
- SSH config integration: `~/.ssh/config`
- Group-based host management
- Remote command execution with timeouts
- File push/pull operations (single or recursive)
- Status checking and connectivity validation
**Tailscale:**
- Zero-config VPN with WireGuard encryption
- MagicDNS for easy host addressing
- Built-in SSH capabilities
- Seamless integration with standard SSH
- Peer-to-peer connections
- Works across NATs and firewalls
### Coverage
**Operations:**
- Host status monitoring and availability checks
- Intelligent load-based task distribution
- Multi-host command execution (all hosts, groups, individual)
- File synchronization workflows (push/pull)
- Deployment pipelines (staging → production)
- Backup and sync workflows
- Rolling restarts with zero downtime
- Health checking across services
**Geographic Coverage:** All hosts in Tailscale network (global)
**Temporal Coverage:** Real-time status and operations
### Known Limitations
**v1.0.0:**
- sshsync must be installed separately (`pip install sshsync`)
- Tailscale must be configured separately
- SSH keys must be set up manually on each host
- Load balancing uses simple metrics (CPU, memory, disk)
- No built-in monitoring dashboards (terminal output only)
- No persistence of operation history (logs only)
- Requires SSH config and sshsync config to be manually maintained
### Planned for v2.0
**Enhanced Features:**
- Automated SSH key distribution across hosts
- Built-in operation history and logging database
- Web dashboard for monitoring and operations
- Advanced load balancing with custom metrics
- Scheduled operations and cron integration
- Operation rollback capabilities
- Integration with configuration management tools (Ansible, Terraform)
- Cost tracking for cloud resources
- Performance metrics collection and visualization
- Alert system for failed operations
- Multi-tenancy support for team environments
**Integrations:**
- Prometheus metrics export
- Grafana dashboard templates
- Slack/Discord notifications
- CI/CD pipeline integration
- Container orchestration support (Docker, Kubernetes)
## [Unreleased]
### Planned
- Add support for Windows hosts (PowerShell remoting)
- Improve performance for large host groups (100+)
- Add SSH connection pooling for faster operations
- Implement operation queueing for long-running tasks
- Add support for custom validation plugins
- Expand coverage to Docker containers via SSH
- Add retry strategies with exponential backoff
- Implement circuit breaker pattern for failing hosts

458
DECISIONS.md Normal file

@@ -0,0 +1,458 @@
# Architecture Decisions
Documentation of all technical decisions made for Tailscale SSH Sync Agent.
## Tool Selection
### Selected Tool: sshsync
**Justification:**
**Advantages:**
- **Ready-to-use**: Available via `pip install sshsync`
- **Group management**: Built-in support for organizing hosts into groups
- **Integration**: Works with existing SSH config (`~/.ssh/config`)
- **Simple API**: Easy-to-wrap CLI interface
- **Parallel execution**: Commands run concurrently across hosts
- **File operations**: Push/pull with recursive support
- **Timeout handling**: Per-command timeouts for reliability
- **Active maintenance**: Regular updates and bug fixes
- **Python-based**: Easy to extend and integrate
**Coverage:**
- All SSH-accessible hosts
- Works with any SSH server (Linux, macOS, BSD, etc.)
- Platform-agnostic (runs on any OS with Python)
**Cost:**
- Free and open-source
- No API keys or subscriptions required
- No rate limits
**Documentation:**
- Clear command-line interface
- PyPI documentation available
- GitHub repository with examples
**Alternatives Considered:**
**Fabric (Python library)**
- Pros: Pure Python, very flexible
- Cons: Requires writing more code, no built-in group management
- **Rejected because**: sshsync provides ready-made functionality
**Ansible**
- Pros: Industry standard, very powerful
- Cons: Requires learning YAML playbooks, overkill for simple operations
- **Rejected because**: Too heavyweight for ad-hoc commands and file transfers
**pssh (parallel-ssh)**
- Pros: Simple parallel SSH
- Cons: No group management, no file transfer built-in, less actively maintained
- **Rejected because**: sshsync has better group management and file operations
**Custom SSH wrapper**
- Pros: Full control
- Cons: Reinventing the wheel, maintaining parallel execution logic
- **Rejected because**: sshsync already provides what we need
**Conclusion:**
sshsync is the best tool for this use case because it:
1. Provides group-based host management out of the box
2. Handles parallel execution automatically
3. Integrates with existing SSH configuration
4. Supports both command execution and file transfers
5. Requires minimal wrapper code
## Integration: Tailscale
**Decision**: Integrate with Tailscale for network connectivity
**Justification:**
**Why Tailscale:**
- **Zero-config VPN**: No manual firewall/NAT configuration
- **Secure by default**: WireGuard encryption
- **Works everywhere**: Coffee shop, home, office, cloud
- **MagicDNS**: Easy addressing (machine-name.tailnet.ts.net)
- **Standard SSH**: Works with all SSH tools including sshsync
- **No overhead**: Uses regular SSH protocol over Tailscale network
**Integration approach:**
- Tailscale provides the network layer
- Standard SSH works over Tailscale
- sshsync operates normally using Tailscale hostnames/IPs
- No Tailscale-specific code needed in core operations
- Tailscale status checking for diagnostics
**Alternatives:**
**Direct public internet + port forwarding**
- Cons: Complex firewall setup, security risks, doesn't work on mobile/restricted networks
- **Rejected because**: Requires too much configuration and has security concerns
**Other VPNs (WireGuard, OpenVPN, ZeroTier)**
- Cons: More manual configuration, less zero-config
- **Rejected because**: Tailscale is easier to set up and use
**Conclusion:**
Tailscale + standard SSH is the optimal combination:
- Secure connectivity without configuration
- Works with existing SSH tools
- No vendor lock-in (can use other VPNs if needed)
## Architecture
### Structure: Modular Scripts + Utilities
**Decision**: Separate concerns into focused modules
```
scripts/
├── sshsync_wrapper.py # sshsync CLI interface
├── tailscale_manager.py # Tailscale operations
├── load_balancer.py # Task distribution logic
├── workflow_executor.py # Common workflows
└── utils/
├── helpers.py # Formatting, parsing
└── validators/ # Input validation
```
**Justification:**
**Modularity:**
- Each script has single responsibility
- Easy to test independently
- Easy to extend without breaking others
**Reusability:**
- Helpers used across all scripts
- Validators prevent duplicate validation logic
- Workflows compose lower-level operations
**Maintainability:**
- Clear file organization
- Easy to locate specific functionality
- Separation of concerns
**Alternatives:**
**Monolithic single script**
- Cons: Hard to test, hard to maintain, becomes too large
- **Rejected because**: Doesn't scale well
**Over-engineered class hierarchy**
- Cons: Unnecessary complexity for this use case
- **Rejected because**: Simple functions are sufficient
**Conclusion:**
Modular functional approach provides good balance of simplicity and maintainability.
### Validation Strategy: Multi-Layer
**Decision**: Validate at multiple layers
**Layers:**
1. **Parameter validation** (`parameter_validator.py`)
- Validates user inputs before any operations
- Prevents invalid hosts, groups, paths, etc.
2. **Host validation** (`host_validator.py`)
- Validates SSH configuration exists
- Checks host reachability
- Validates group membership
3. **Connection validation** (`connection_validator.py`)
- Tests actual SSH connectivity
- Verifies Tailscale status
- Checks SSH key authentication
**Justification:**
**Early failure:**
- Catch errors before expensive operations
- Clear error messages at each layer
**Comprehensive:**
- Multiple validation points catch different issues
- Reduces runtime failures
**User-friendly:**
- Helpful error messages with suggestions
- Clear indication of what went wrong
**Conclusion:**
Multi-layer validation provides robust error handling and great user experience.
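A minimal sketch of how the three layers chain, assuming illustrative function names and signatures (the real implementations live in `utils/validators/` and may differ):
```python
# Sketch of the layered pattern; names and signatures are assumptions,
# not the actual utils/validators API.
class ValidationError(Exception):
    """Raised when any validation layer rejects the input."""

def validate_parameters(host: str, command: str) -> None:
    # Layer 1: cheap input checks before touching the network
    if not host or " " in host:
        raise ValidationError(f"Invalid host alias: {host!r}")
    if not command.strip():
        raise ValidationError("Command must not be empty")

def validate_host_config(host: str, known_hosts: set) -> None:
    # Layer 2: host must exist in the parsed SSH config
    if host not in known_hosts:
        raise ValidationError(f"Host {host!r} not found in ~/.ssh/config")

def validate_connection(host: str) -> None:
    # Layer 3: would test real SSH/Tailscale connectivity; stubbed here
    pass

def run_validated(host: str, command: str, known_hosts: set) -> str:
    validate_parameters(host, command)       # fail fast on bad input
    validate_host_config(host, known_hosts)  # then config-level checks
    validate_connection(host)                # then the expensive check
    return f"would execute {command!r} on {host}"
```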
## Load Balancing Strategy
### Decision: Simple Composite Score
**Formula:**
```python
score = (cpu_pct * 0.4) + (mem_pct * 0.3) + (disk_pct * 0.3)
```
**Weights:**
- CPU: 40% (most important for compute tasks)
- Memory: 30% (important for data processing)
- Disk: 30% (important for I/O operations)
**Justification:**
**Simple and effective:**
- Easy to understand
- Fast to calculate
- Works well for most workloads
**Balanced:**
- Considers multiple resource types
- No single metric dominates
**Alternatives:**
**CPU only**
- Cons: Ignores memory-bound and I/O-bound tasks
- **Rejected because**: Too narrow
**Complex ML-based prediction**
- Cons: Overkill, slow, requires training data
- **Rejected because**: Unnecessary complexity
**Fixed round-robin**
- Cons: Doesn't consider actual load
- **Rejected because**: Can overload already-busy hosts
**Conclusion:**
Simple weighted score provides good balance without complexity.
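As a concrete check: a host at 80% CPU, 50% memory, and 30% disk scores 0.8×0.4 + 0.5×0.3 + 0.3×0.3 = 0.56. A minimal sketch consistent with the formula (the real `calculate_load_score` lives in `utils/helpers.py` and may differ in detail; the threshold labels mirror the capacity bands used elsewhere in this document):
```python
def calculate_load_score(cpu_pct: float, mem_pct: float, disk_pct: float) -> float:
    """Composite load score in [0, 1]; lower means more spare capacity."""
    return (cpu_pct * 0.4 + mem_pct * 0.3 + disk_pct * 0.3) / 100.0

def classify_load_status(score: float) -> str:
    # Same cutoffs as the capacity descriptions in get_group_capacity
    if score < 0.4:
        return "low"
    if score < 0.7:
        return "moderate"
    return "high"

score = calculate_load_score(cpu_pct=80, mem_pct=50, disk_pct=30)
print(round(score, 2), classify_load_status(score))  # 0.56 moderate
```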
## Error Handling Philosophy
### Decision: Graceful Degradation + Clear Messages
**Principles:**
1. **Fail early with validation**: Catch errors before operations
2. **Isolate failures**: One host failure doesn't stop others
3. **Clear messages**: Tell user exactly what went wrong and how to fix
4. **Automatic retry**: Retry transient errors (network, timeout)
5. **Dry-run support**: Preview operations before execution
**Implementation:**
```python
# Example error handling pattern
try:
validate_host(host)
validate_ssh_connection(host)
result = execute_command(host, command)
except ValidationError as e:
return {'error': str(e), 'suggestion': 'Fix: ...'}
except ConnectionError as e:
return {'error': str(e), 'diagnostics': get_diagnostics(host)}
```
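Principle 4 (automatic retry) layers naturally on top of this pattern; a minimal sketch with illustrative parameters, retrying only errors that are plausibly transient:
```python
import time

def with_retry(fn, attempts: int = 3, base_delay: float = 1.0):
    """Retry fn on ConnectionError with exponential backoff (1s, 2s, 4s...)."""
    for attempt in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of retries; surface the error to the caller
            time.sleep(base_delay * (2 ** attempt))
```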
**Justification:**
**Better UX:**
- Users know exactly what's wrong
- Suggestions help fix issues quickly
**Reliability:**
- Automatic retry handles transient issues
- Dry-run prevents mistakes
**Debugging:**
- Clear error messages speed up troubleshooting
- Diagnostics provide actionable information
**Conclusion:**
Graceful degradation with helpful messages creates better user experience.
## Caching Strategy
**Decision**: Minimal caching for real-time accuracy
**What we cache:**
- Nothing (v1.0.0)
**Why no caching:**
- Host status changes frequently
- Load metrics change constantly
- Operations need real-time data
- Cache invalidation is complex
**Future consideration (v2.0):**
- Cache Tailscale status (60s TTL)
- Cache group configuration (5min TTL)
- Cache SSH config parsing (5min TTL)
**Justification:**
**Simplicity:**
- No cache invalidation logic needed
- No stale data issues
**Accuracy:**
- Always get current state
- No surprises from cached data
**Trade-off:**
- Slightly slower repeated operations
- More network calls
**Conclusion:**
For v1.0.0, simplicity and accuracy outweigh performance concerns. Real-time data is more valuable than speed.
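If the v2.0 caching is added, a small per-key TTL wrapper would be enough for the items listed above; a sketch with illustrative names:
```python
import time
from typing import Any, Callable

class TTLCache:
    """Tiny per-key TTL cache; enough for status and config lookups."""
    def __init__(self) -> None:
        self._store: dict = {}

    def get_or_fetch(self, key: str, fetch: Callable[[], Any], ttl: float) -> Any:
        now = time.monotonic()
        hit = self._store.get(key)
        if hit and now - hit[0] < ttl:
            return hit[1]               # fresh enough: serve cached value
        value = fetch()                 # stale or missing: refetch
        self._store[key] = (now, value)
        return value

cache = TTLCache()
# Tailscale status cached for 60s, group config for 5 minutes:
# status = cache.get_or_fetch("ts-status", get_tailscale_status, ttl=60)
# groups = cache.get_or_fetch("groups", get_groups, ttl=300)
```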
## Testing Strategy
### Decision: Comprehensive Unit + Integration Tests
**Coverage:**
- **29 tests total:**
- 11 integration tests (end-to-end workflows)
- 11 helper tests (formatting, parsing, calculations)
- 7 validation tests (input validation, safety checks)
**Test Philosophy:**
1. **Test real functionality**: Integration tests use actual functions
2. **Test edge cases**: Validation tests cover error conditions
3. **Test helpers**: Ensure formatting/parsing works correctly
4. **Fast execution**: All tests run in < 10 seconds
5. **No external dependencies**: Tests don't require Tailscale or sshsync to be running
**Justification:**
**Confidence:**
- Tests verify code works as expected
- Catches regressions when modifying code
**Documentation:**
- Tests show how to use functions
- Examples of expected behavior
**Reliability:**
- Production-ready code from v1.0.0
**Conclusion:**
Comprehensive testing ensures reliable code from the start.
## Performance Considerations
### Parallel Execution
**Decision**: Leverage sshsync's built-in parallelization
- sshsync runs commands concurrently across hosts automatically
- No need to implement custom threading/multiprocessing
- Timeout applies per-host independently
**Trade-offs:**
**Pros:**
- Simple to use
- Fast for large host groups
- No concurrency bugs
**Cons:**
- Less control over parallelism level
- Can overwhelm network with too many concurrent connections
**Conclusion:**
Built-in parallelization is sufficient for most use cases. Custom control can be added in v2.0 if needed.
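For intuition, the behavior described above (concurrent fan-out with independent per-host timeouts) is roughly what a thread pool over plain `ssh` gives you. A minimal sketch, not sshsync's actual internals:
```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_on_host(host: str, command: str, timeout: int = 10) -> dict:
    """Run a command over ssh; the timeout applies to this host only."""
    try:
        proc = subprocess.run(
            ["ssh", host, command],
            capture_output=True, text=True, timeout=timeout,
        )
        return {"host": host, "ok": proc.returncode == 0, "out": proc.stdout}
    except subprocess.TimeoutExpired:
        return {"host": host, "ok": False, "out": "timed out"}

def run_on_all(hosts: list, command: str) -> list:
    # One slow or dead host does not block the others
    with ThreadPoolExecutor(max_workers=len(hosts) or 1) as pool:
        return list(pool.map(lambda h: run_on_host(h, command), hosts))
```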
## Security Considerations
### SSH Key Authentication
**Decision**: Require SSH keys (no password auth)
**Justification:**
**Security:**
- Keys are more secure than passwords
- Can't be brute-forced
- Can be revoked per-host
**Automation:**
- Non-interactive (no password prompts)
- Works in scripts and CI/CD
**Implementation:**
- Validators check SSH key auth works
- Clear error messages guide users to set up keys
- Documentation explains SSH key setup
### Command Safety
**Decision**: Validate dangerous commands
**Dangerous patterns blocked:**
- `rm -rf /` (root deletion)
- `mkfs.*` (filesystem formatting)
- `dd.*of=/dev/` (direct disk writes)
- Fork bombs
**Override**: Use `allow_dangerous=True` to bypass
**Justification:**
**Safety:**
- Prevents accidental destructive operations
- Dry-run provides preview
**Flexibility:**
- Can still run dangerous commands if explicitly allowed
**Conclusion:**
Safety by default with escape hatch for advanced users.
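A minimal sketch of the safety check with its escape hatch; the patterns mirror the list above, and the function name is an assumption:
```python
import re

DANGEROUS_PATTERNS = [
    r"rm\s+-rf\s+/(\s|$)",   # root deletion
    r"\bmkfs\.",             # filesystem formatting
    r"\bdd\b.*\bof=/dev/",   # direct disk writes
    r":\(\)\s*\{.*\};\s*:",  # classic fork bomb
]

def check_command_safety(command: str, allow_dangerous: bool = False) -> None:
    if allow_dangerous:
        return  # explicit override for advanced users
    for pattern in DANGEROUS_PATTERNS:
        if re.search(pattern, command):
            raise ValueError(f"Blocked dangerous command (matched {pattern!r})")

check_command_safety("df -h")        # passes silently
# check_command_safety("rm -rf /")   # raises ValueError
```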
## Decisions Summary
| Decision | Choice | Rationale |
|----------|--------|-----------|
| **CLI Tool** | sshsync | Best balance of features, ease of use, and maintenance |
| **Network** | Tailscale | Zero-config secure VPN, works everywhere |
| **Architecture** | Modular scripts | Clear separation of concerns, maintainable |
| **Validation** | Multi-layer | Catch errors early with helpful messages |
| **Load Balancing** | Composite score | Simple, effective, considers multiple resources |
| **Caching** | None (v1.0) | Simplicity and real-time accuracy |
| **Testing** | 29 tests | Comprehensive coverage for reliability |
| **Security** | SSH keys + validation | Secure and automation-friendly |
## Trade-offs Accepted
1. **No caching** → Slightly slower, but always accurate
2. **sshsync dependency** → External tool, but saves development time
3. **SSH key requirement** → Setup needed, but more secure
4. **Simple load balancing** → Less sophisticated, but fast and easy to understand
5. **Terminal UI only** → No web dashboard, but simpler to develop and maintain
## Future Improvements
### v2.0 Considerations
1. **Add caching** for frequently-accessed data (Tailscale status, groups)
2. **Web dashboard** for visualization and monitoring
3. **Operation history** database for audit trail
4. **Advanced load balancing** with custom metrics
5. **Automated SSH key distribution** across hosts
6. **Integration with config management** tools (Ansible, Terraform)
7. **Container support** via SSH to Docker containers
8. **Custom validation plugins** for domain-specific checks
All decisions prioritize **simplicity**, **security**, and **maintainability** for v1.0.0.

707
INSTALLATION.md Normal file

@@ -0,0 +1,707 @@
# Installation Guide
Complete step-by-step tutorial for setting up Tailscale SSH Sync Agent.
## Table of Contents
1. [Prerequisites](#prerequisites)
2. [Step 1: Install Tailscale](#step-1-install-tailscale)
3. [Step 2: Install sshsync](#step-2-install-sshsync)
4. [Step 3: Configure SSH](#step-3-configure-ssh)
5. [Step 4: Configure sshsync Groups](#step-4-configure-sshsync-groups)
6. [Step 5: Install Agent](#step-5-install-agent)
7. [Step 6: Test Installation](#step-6-test-installation)
8. [Troubleshooting](#troubleshooting)
## Prerequisites
Before you begin, ensure you have:
- **Operating System**: macOS, Linux, or BSD
- **Python**: Version 3.10 or higher
- **pip**: Python package installer
- **Claude Code**: Installed and running
- **Remote machines**: At least one machine you want to manage
- **SSH access**: Ability to SSH to remote machines
**Check Python version**:
```bash
python3 --version
# Should show: Python 3.10.x or higher
```
**Check pip**:
```bash
pip3 --version
# Should show: pip xx.x.x from ...
```
## Step 1: Install Tailscale
Tailscale provides secure networking between your machines.
### macOS
```bash
# Install via Homebrew
brew install tailscale
# Start Tailscale
sudo tailscale up
# Follow authentication link in terminal
# This will open browser to log in
```
### Linux (Ubuntu/Debian)
```bash
# Install Tailscale
curl -fsSL https://tailscale.com/install.sh | sh
# Start and authenticate
sudo tailscale up
# Follow authentication link
```
### Linux (Fedora/RHEL)
```bash
# Add repository
sudo dnf config-manager --add-repo https://pkgs.tailscale.com/stable/fedora/tailscale.repo
# Install
sudo dnf install tailscale
# Enable and start
sudo systemctl enable --now tailscaled
sudo tailscale up
```
### Verify Installation
```bash
# Check Tailscale status
tailscale status
# Should show list of machines in your tailnet
# Example output:
# 100.64.1.10 homelab-1 user@ linux -
# 100.64.1.11 laptop user@ macOS -
```
**Important**: Install and authenticate Tailscale on **all machines** you want to manage.
## Step 2: Install sshsync
sshsync is the CLI tool for managing SSH operations across multiple hosts.
```bash
# Install via pip
pip3 install sshsync
# Or use pipx for isolated installation
pipx install sshsync
```
### Verify Installation
```bash
# Check version
sshsync --version
# Should show: sshsync, version x.x.x
```
### Common Installation Issues
**Issue**: `pip3: command not found`
**Solution**:
```bash
# macOS
brew install python3
# Linux (Ubuntu/Debian)
sudo apt install python3-pip
# Linux (Fedora/RHEL)
sudo dnf install python3-pip
```
**Issue**: Permission denied during install
**Solution**:
```bash
# Install for current user only
pip3 install --user sshsync
# Or use pipx
pip3 install --user pipx
pipx install sshsync
```
## Step 3: Configure SSH
SSH configuration defines how to connect to each machine.
### Step 3.1: Generate SSH Keys (if you don't have them)
```bash
# Generate ed25519 key (recommended)
ssh-keygen -t ed25519 -C "your_email@example.com"
# Press Enter to use default location (~/.ssh/id_ed25519)
# Enter passphrase (or leave empty for no passphrase)
```
**Output**:
```
Your identification has been saved in /Users/you/.ssh/id_ed25519
Your public key has been saved in /Users/you/.ssh/id_ed25519.pub
```
### Step 3.2: Copy Public Key to Remote Machines
For each remote machine:
```bash
# Copy SSH key to remote
ssh-copy-id user@machine-hostname
# Example:
ssh-copy-id admin@100.64.1.10
```
**Manual method** (if ssh-copy-id doesn't work):
```bash
# Display public key
cat ~/.ssh/id_ed25519.pub
# SSH to remote machine
ssh user@remote-host
# On remote machine:
mkdir -p ~/.ssh
chmod 700 ~/.ssh
echo "your-public-key-here" >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
exit
```
### Step 3.3: Test SSH Connection
```bash
# Test connection (should not ask for password)
ssh user@remote-host "hostname"
# If successful, should print remote hostname
```
### Step 3.4: Create SSH Config File
Edit `~/.ssh/config`:
```bash
vim ~/.ssh/config
```
**Add host entries**:
```
# Production servers
Host prod-web-01
HostName prod-web-01.tailnet.ts.net
User deploy
IdentityFile ~/.ssh/id_ed25519
Port 22
Host prod-web-02
HostName 100.64.1.21
User deploy
IdentityFile ~/.ssh/id_ed25519
Host prod-db-01
HostName 100.64.1.30
User deploy
IdentityFile ~/.ssh/id_ed25519
# Development
Host dev-laptop
HostName dev-laptop.tailnet.ts.net
User developer
IdentityFile ~/.ssh/id_ed25519
Host dev-desktop
HostName 100.64.1.40
User developer
IdentityFile ~/.ssh/id_ed25519
# Homelab
Host homelab-1
HostName 100.64.1.10
User admin
IdentityFile ~/.ssh/id_ed25519
Host homelab-2
HostName 100.64.1.11
User admin
IdentityFile ~/.ssh/id_ed25519
```
**Important fields**:
- **Host**: Alias you'll use (e.g., "homelab-1")
- **HostName**: Actual hostname or IP (Tailscale hostname or IP)
- **User**: SSH username on remote machine
- **IdentityFile**: Path to SSH private key
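For scripting, these fields are easy to read back out of the file. A minimal parser sketch covering just the common keys (the skill's `parse_ssh_config` in `utils/helpers.py` is the real implementation):
```python
from pathlib import Path

def parse_ssh_config(path: str = "~/.ssh/config") -> dict:
    """Tiny ~/.ssh/config reader: alias -> {hostname, user, identityfile, ...}.
    Ignores Match blocks, wildcard patterns, and Include directives."""
    hosts, current = {}, None
    for line in Path(path).expanduser().read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        key, _, value = line.partition(" ")
        if key.lower() == "host":
            current = value.strip()
            hosts[current] = {}
        elif current is not None:
            hosts[current][key.lower()] = value.strip()
    return hosts

# parse_ssh_config()["homelab-1"] -> {'hostname': '100.64.1.10', 'user': 'admin', ...}
```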
### Step 3.5: Set Correct Permissions
```bash
# SSH config should be readable only by you
chmod 600 ~/.ssh/config
# SSH directory permissions
chmod 700 ~/.ssh
# Private key permissions
chmod 600 ~/.ssh/id_ed25519
# Public key permissions
chmod 644 ~/.ssh/id_ed25519.pub
```
### Step 3.6: Verify All Hosts
Test each host in your config:
```bash
# Test each host
ssh homelab-1 "echo 'Connection successful'"
ssh prod-web-01 "echo 'Connection successful'"
ssh dev-laptop "echo 'Connection successful'"
# Should connect without asking for password
```
## Step 4: Configure sshsync Groups
Groups organize your hosts for easy management.
### Step 4.1: Initialize sshsync Configuration
```bash
# Sync hosts and create groups
sshsync sync
```
**What this does**:
1. Reads all hosts from `~/.ssh/config`
2. Prompts you to assign hosts to groups
3. Creates `~/.config/sshsync/config.yaml`
### Step 4.2: Follow Interactive Prompts
```
Found 7 ungrouped hosts:
1. homelab-1
2. homelab-2
3. prod-web-01
4. prod-web-02
5. prod-db-01
6. dev-laptop
7. dev-desktop
Assign groups now? [Y/n]: Y
Enter group name for homelab-1 (or skip): homelab
Enter group name for homelab-2 (or skip): homelab
Enter group name for prod-web-01 (or skip): production,web
Enter group name for prod-web-02 (or skip): production,web
Enter group name for prod-db-01 (or skip): production,database
Enter group name for dev-laptop (or skip): development
Enter group name for dev-desktop (or skip): development
```
**Tips**:
- Hosts can belong to multiple groups (separate with commas)
- Use meaningful group names (production, development, web, database, homelab)
- Skip hosts you don't want to group yet
### Step 4.3: Verify Configuration
```bash
# View generated config
cat ~/.config/sshsync/config.yaml
```
**Expected output**:
```yaml
groups:
production:
- prod-web-01
- prod-web-02
- prod-db-01
web:
- prod-web-01
- prod-web-02
database:
- prod-db-01
development:
- dev-laptop
- dev-desktop
homelab:
- homelab-1
- homelab-2
```
### Step 4.4: Test sshsync
```bash
# List hosts
sshsync ls
# List with status
sshsync ls --with-status
# Test command execution
sshsync all "hostname"
# Test group execution
sshsync group homelab "uptime"
```
## Step 5: Install Agent
### Step 5.1: Navigate to Agent Directory
```bash
cd /path/to/tailscale-sshsync-agent
```
### Step 5.2: Verify Agent Structure
```bash
# List files
ls -la
# Should see:
# .claude-plugin/
# scripts/
# tests/
# references/
# SKILL.md
# README.md
# VERSION
# CHANGELOG.md
# etc.
```
### Step 5.3: Validate marketplace.json
```bash
# Check JSON is valid
python3 -c "import json; json.load(open('.claude-plugin/marketplace.json')); print('✅ Valid JSON')"
# Should output: ✅ Valid JSON
```
### Step 5.4: Install via Claude Code
In Claude Code:
```
/plugin marketplace add /absolute/path/to/tailscale-sshsync-agent
```
**Example**:
```
/plugin marketplace add /Users/you/tailscale-sshsync-agent
```
**Expected output**:
```
✓ Plugin installed successfully
✓ Skill: tailscale-sshsync-agent
✓ Description: Manages distributed workloads and file sharing...
```
### Step 5.5: Verify Installation
In Claude Code:
```
"Which of my machines are online?"
```
**Expected response**: Agent should activate and check your Tailscale network.
## Step 6: Test Installation
### Test 1: Host Status
**Query**:
```
"Which of my machines are online?"
```
**Expected**: List of hosts with online/offline status
### Test 2: List Groups
**Query**:
```
"What groups do I have configured?"
```
**Expected**: List of your sshsync groups
### Test 3: Execute Command
**Query**:
```
"Check disk space on homelab machines"
```
**Expected**: Disk usage for hosts in homelab group
### Test 4: Dry-Run
**Query**:
```
"Show me what would happen if I ran 'uptime' on all machines (dry-run)"
```
**Expected**: Preview without execution
### Test 5: Run Test Suite
```bash
cd /path/to/tailscale-sshsync-agent
# Run all tests
python3 tests/test_integration.py
# Should show:
# Results: 11/11 passed
# 🎉 All tests passed!
```
## Troubleshooting
### Agent Not Activating
**Symptoms**: Agent doesn't respond to queries about machines/hosts
**Solutions**:
1. **Check installation**:
```
/plugin list
```
Should show `tailscale-sshsync-agent` in list.
2. **Reinstall**:
```
/plugin remove tailscale-sshsync-agent
/plugin marketplace add /path/to/tailscale-sshsync-agent
```
3. **Check marketplace.json**:
```bash
cat .claude-plugin/marketplace.json
# Verify "description" field matches SKILL.md frontmatter
```
### SSH Connection Fails
**Symptoms**: "Permission denied" or "Connection refused"
**Solutions**:
1. **Check SSH key**:
```bash
ssh-add -l
# Should list your SSH key
```
If not listed:
```bash
ssh-add ~/.ssh/id_ed25519
```
2. **Test SSH directly**:
```bash
ssh -v hostname
# -v shows verbose debug info
```
3. **Verify authorized_keys on remote**:
```bash
ssh hostname "cat ~/.ssh/authorized_keys"
# Should contain your public key
```
### Tailscale Connection Issues
**Symptoms**: Hosts show as offline in Tailscale
**Solutions**:
1. **Check Tailscale status**:
```bash
tailscale status
```
2. **Restart Tailscale**:
```bash
# macOS
brew services restart tailscale
# Linux
sudo systemctl restart tailscaled
```
3. **Re-authenticate**:
```bash
sudo tailscale up
```
### sshsync Errors
**Symptoms**: "sshsync: command not found"
**Solutions**:
1. **Reinstall sshsync**:
```bash
pip3 install --upgrade sshsync
```
2. **Check PATH**:
```bash
which sshsync
# Should show path to sshsync
```
If not found, add to PATH:
```bash
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
```
### Config File Issues
**Symptoms**: "Group not found" or "Host not found"
**Solutions**:
1. **Verify SSH config**:
```bash
cat ~/.ssh/config
# Check host aliases are correct
```
2. **Verify sshsync config**:
```bash
cat ~/.config/sshsync/config.yaml
# Check groups are defined
```
3. **Re-sync**:
```bash
sshsync sync
```
### Test Failures
**Symptoms**: Tests fail with errors
**Solutions**:
1. **Check dependencies**:
```bash
pip3 list | grep -E "sshsync|pyyaml"
```
2. **Check Python version**:
```bash
python3 --version
# Must be 3.10+
```
3. **Run tests individually**:
```bash
python3 tests/test_helpers.py
python3 tests/test_validation.py
python3 tests/test_integration.py
```
## Post-Installation
### Recommended Next Steps
1. **Create more groups** for better organization:
```bash
sshsync gadd staging
sshsync gadd backup-servers
```
2. **Test file operations**:
```
"Push test file to homelab machines (dry-run)"
```
3. **Set up automation**:
- Create scripts for common tasks
- Schedule backups
- Automate deployments
4. **Review documentation**:
- Read `references/sshsync-guide.md` for advanced sshsync usage
- Read `references/tailscale-integration.md` for Tailscale tips
### Security Checklist
- ✅ SSH keys are password-protected
- ✅ SSH config has correct permissions (600)
- ✅ Private keys have correct permissions (600)
- ✅ Tailscale ACLs configured (if using teams)
- ✅ Only necessary hosts have SSH access
- ✅ Regularly review connected devices in Tailscale
## Summary
You now have:
1. ✅ Tailscale installed and connected
2. ✅ sshsync installed and configured
3. ✅ SSH keys set up on all machines
4. ✅ SSH config with all hosts
5. ✅ sshsync groups organized
6. ✅ Agent installed in Claude Code
7. ✅ Tests passing
**Start using**:
```
"Which machines are online?"
"Run this on the least loaded machine"
"Push files to production servers"
"Deploy to staging then production"
```
For more examples, see README.md and SKILL.md.
## Support
If you encounter issues:
1. Check this troubleshooting section
2. Review references/ for detailed guides
3. Check DECISIONS.md for architecture rationale
4. Run tests to verify installation
Happy automating! 🚀

3
README.md Normal file

@@ -0,0 +1,3 @@
# tailscale-sshsync-agent
Manages distributed workloads and file sharing across Tailscale SSH-connected machines. Automates remote command execution, intelligent load balancing, file synchronization workflows, host health monitoring, and multi-machine orchestration using sshsync.

1204
SKILL.md Normal file

File diff suppressed because it is too large

1
VERSION Normal file

@@ -0,0 +1 @@
1.0.0

117
plugin.lock.json Normal file

@@ -0,0 +1,117 @@
{
"$schema": "internal://schemas/plugin.lock.v1.json",
"pluginId": "gh:Human-Frontier-Labs-Inc/human-frontier-labs-marketplace:plugins/tailscale-sshsync-agent",
"normalized": {
"repo": null,
"ref": "refs/tags/v20251128.0",
"commit": "3a7cbe9632f245c6b9a4c4bf2731da65c857a7f4",
"treeHash": "832bc62ce02c782663e60a2eb97932166fef39c681a9ca01b9d5dc170860b805",
"generatedAt": "2025-11-28T10:11:41.356928Z",
"toolVersion": "publish_plugins.py@0.2.0"
},
"origin": {
"remote": "git@github.com:zhongweili/42plugin-data.git",
"branch": "master",
"commit": "aa1497ed0949fd50e99e70d6324a29c5b34f9390",
"repoRoot": "/Users/zhongweili/projects/openmind/42plugin-data"
},
"manifest": {
"name": "tailscale-sshsync-agent",
"description": "Manages distributed workloads and file sharing across Tailscale SSH-connected machines. Automates remote command execution, intelligent load balancing, file synchronization workflows, host health monitoring, and multi-machine orchestration using sshsync.",
"version": null
},
"content": {
"files": [
{
"path": "CHANGELOG.md",
"sha256": "74dbda933868b7cab410144a831b43e4f1ae6161f2402edcb068a8232c50bfe4"
},
{
"path": "README.md",
"sha256": "470f165d8ac61a8942e6fb3568c49febb7f803bfa0f4010d14e09f807c34c88e"
},
{
"path": "VERSION",
"sha256": "59854984853104df5c353e2f681a15fc7924742f9a2e468c29af248dce45ce03"
},
{
"path": "SKILL.md",
"sha256": "31c8f237f9b3617c32c6ff381ae83d427b50eb0877d3763d9826e00ece6618f1"
},
{
"path": "INSTALLATION.md",
"sha256": "9313ea1bbb0a03e4c078c41b207f3febe800cd38eb57b7205c7b5188238ca46a"
},
{
"path": "DECISIONS.md",
"sha256": "59549e84aaa8e32d4bdf64d46855714f5cde7f061906e1c74976658883472c82"
},
{
"path": "references/tailscale-integration.md",
"sha256": "6553b3ceeaca5118a7b005368223ea4b3ab70eb2492ccaf5c2b7f7758b65dd42"
},
{
"path": "references/sshsync-guide.md",
"sha256": "697ce0b56eda258732a0b924f821e9e24eb6b977934153bdd2045be961e58de2"
},
{
"path": "tests/test_validation.py",
"sha256": "716ae0d2e86f0e6657903aef6bb714fbd3b5b72d3b109fab4da3f75f90cc2c0a"
},
{
"path": "tests/test_helpers.py",
"sha256": "3be88e30825414eb3ade048b766c84995dc98a01cb7236ce75201716179279a8"
},
{
"path": "tests/test_integration.py",
"sha256": "12f7cb857fda23531a9c74caf072cf73b739672b1e99c55f42a2ef8e11238523"
},
{
"path": "scripts/load_balancer.py",
"sha256": "9d87476562ac848a026e42116e381f733d520e9330da33de3d905585af14398d"
},
{
"path": "scripts/tailscale_manager.py",
"sha256": "4b75ebb9423d221b9788eb9352b274e0256c101185de11064a7b4cb00684016e"
},
{
"path": "scripts/workflow_executor.py",
"sha256": "9f23f3bb421e940766e65949e6efa485a313115e297d4c5f1088589155a7bac1"
},
{
"path": "scripts/sshsync_wrapper.py",
"sha256": "fc2062ebbc72e3ddc6c6bfb5f22019b23050f5c2ed9ac35c315018a96871fb19"
},
{
"path": "scripts/utils/helpers.py",
"sha256": "b01979ee56ab92037b8f8054a883124d600b8337cf461855092b866091aed24a"
},
{
"path": "scripts/utils/validators/connection_validator.py",
"sha256": "9ac82108e69690b74d9aa89ca51f7d06fe860e880aaa1983d08242d7199d1601"
},
{
"path": "scripts/utils/validators/parameter_validator.py",
"sha256": "157dfcb7f1937df88344647a37a124d52e1de1b992b72c9b9e69d3b717ca0195"
},
{
"path": "scripts/utils/validators/__init__.py",
"sha256": "2d109ad1b5d253578a095c8354159fdf9318154b4f62d9b16eaa1a88a422382d"
},
{
"path": "scripts/utils/validators/host_validator.py",
"sha256": "79cab42587435a799349ba8a562c4ec0f3d54f3f2790562c894c6289beade6d6"
},
{
"path": ".claude-plugin/plugin.json",
"sha256": "0ec7466bbf2e8dc2fe1607feff0cc0ef0ebebf44ff54f17dcce96255e2c21215"
}
],
"dirSha256": "832bc62ce02c782663e60a2eb97932166fef39c681a9ca01b9d5dc170860b805"
},
"security": {
"scannedAt": null,
"scannerVersion": null,
"flags": []
}
}

466
references/sshsync-guide.md Normal file

@@ -0,0 +1,466 @@
# sshsync CLI Tool Guide
Complete reference for using sshsync with Tailscale SSH Sync Agent.
## Table of Contents
1. [Installation](#installation)
2. [Configuration](#configuration)
3. [Core Commands](#core-commands)
4. [Advanced Usage](#advanced-usage)
5. [Troubleshooting](#troubleshooting)
## Installation
### Via pip
```bash
pip install sshsync
```
### Verify Installation
```bash
sshsync --version
```
## Configuration
### 1. SSH Config Setup
sshsync uses your existing SSH configuration. Edit `~/.ssh/config`:
```
# Example host entries
Host homelab-1
HostName 100.64.1.10
User admin
IdentityFile ~/.ssh/id_ed25519
Port 22
Host prod-web-01
HostName 100.64.1.20
User deploy
IdentityFile ~/.ssh/id_rsa
Port 22
Host dev-laptop
HostName 100.64.1.30
User developer
```
**Important Notes**:
- sshsync uses the **Host alias** (e.g., "homelab-1"), not the actual hostname
- Ensure SSH key authentication is configured
- Test each host with `ssh host-alias` before using with sshsync
### 2. Initialize sshsync Configuration
First run:
```bash
sshsync sync
```
This will:
1. Read all hosts from your SSH config
2. Prompt you to assign hosts to groups
3. Create `~/.config/sshsync/config.yaml`
### 3. sshsync Config File
Location: `~/.config/sshsync/config.yaml`
Structure:
```yaml
groups:
production:
- prod-web-01
- prod-web-02
- prod-db-01
development:
- dev-laptop
- dev-desktop
homelab:
- homelab-1
- homelab-2
```
**Manual Editing**:
- Groups are arbitrary labels (use what makes sense for you)
- Hosts can belong to multiple groups
- Use consistent host aliases from SSH config
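Because the file is plain YAML, scripts can read it directly. A minimal sketch using PyYAML (an assumed dependency, `pip install pyyaml`) that answers "which groups is this host in?":
```python
from pathlib import Path
import yaml  # PyYAML, assumed installed

def groups_for_host(host: str,
                    path: str = "~/.config/sshsync/config.yaml") -> list:
    """Return every group that lists the given host alias."""
    config = yaml.safe_load(Path(path).expanduser().read_text()) or {}
    groups = config.get("groups", {}) or {}
    return [name for name, members in groups.items() if host in (members or [])]

print(groups_for_host("prod-web-01"))  # e.g. ['production', 'web']
```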
## Core Commands
### List Hosts
```bash
# List all configured hosts
sshsync ls
# List with online/offline status
sshsync ls --with-status
```
**Output Example**:
```
Host Status
homelab-1 online
homelab-2 offline
prod-web-01 online
dev-laptop online
```
### Execute Commands
#### On All Hosts
```bash
# Execute on all configured hosts
sshsync all "df -h"
# With custom timeout (default: 10s)
sshsync all --timeout 20 "systemctl status nginx"
# Dry-run (preview without executing)
sshsync all --dry-run "reboot"
```
#### On Specific Group
```bash
# Execute on group
sshsync group production "uptime"
# With timeout
sshsync group web-servers --timeout 30 "npm run build"
# Filter with regex
sshsync group production --regex "web-.*" "df -h"
```
**Regex Filtering**:
- Filters group members by alias matching pattern
- Uses Python regex syntax
- Example: `--regex "web-0[1-3]"` matches web-01, web-02, web-03
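The same filtering is easy to reproduce in a script; a quick Python illustration of the matching (not sshsync's own code):
```python
import re

members = ["web-01", "web-02", "web-03", "db-01"]
pattern = re.compile(r"web-0[1-3]")
matched = [h for h in members if pattern.search(h)]
print(matched)  # ['web-01', 'web-02', 'web-03']
```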
### File Transfer
#### Push Files
```bash
# Push to specific host
sshsync push --host web-01 ./app /var/www/app
# Push to group
sshsync push --group production ./dist /var/www/app
# Push to all hosts
sshsync push --all ./config.yml /etc/app/config.yml
# Recursive push (directory with contents)
sshsync push --group web --recurse ./app /var/www/app
# Dry-run
sshsync push --group production --dry-run ./dist /var/www/app
```
**Important**:
- Local path comes first, remote path second
- Use `--recurse` for directories
- Dry-run shows what would be transferred without executing
#### Pull Files
```bash
# Pull from specific host
sshsync pull --host db-01 /var/log/mysql/error.log ./logs/
# Pull from group (creates separate directories per host)
sshsync pull --group databases /var/backups ./backups/
# Recursive pull
sshsync pull --host web-01 --recurse /var/www/app ./backup/
```
**Pull Behavior**:
- When pulling from a group, a subdirectory is created per host
- Use `--recurse` to pull entire directory trees
- The destination directory is created if it doesn't exist
### Group Management
#### Add Hosts to Group
```bash
# Interactive: prompts to select hosts
sshsync gadd production
# Follow prompts to select which hosts to add
```
#### Add Host to SSH Config
```bash
# Interactive host addition
sshsync hadd
# Follow prompts for:
# - Host alias
# - Hostname/IP
# - Username
# - Port (optional)
# - Identity file (optional)
```
#### Sync Ungrouped Hosts
```bash
# Assign groups to hosts not yet in any group
sshsync sync
```
## Advanced Usage
### Parallel Execution
sshsync automatically executes commands in parallel across hosts:
```bash
# This runs simultaneously on all hosts in group
sshsync group web-servers "npm run build"
```
**Performance**:
- Commands execute concurrently
- Results collected as they complete
- Timeout applies per-host independently
### Timeout Strategies
Different operations need different timeouts:
```bash
# Quick checks (5-10s)
sshsync all --timeout 5 "hostname"
# Moderate operations (30-60s)
sshsync group web --timeout 60 "npm install"
# Long-running tasks (300s+)
sshsync group build --timeout 300 "docker build ."
```
**Timeout Best Practices**:
- Set timeout 20-30% longer than expected duration
- Use dry-run first to estimate timing
- Increase timeout for network-intensive operations
### Combining with Other Tools
#### With xargs
```bash
# Get list of online hosts
sshsync ls --with-status | grep online | awk '{print $1}' | xargs -I {} echo "Host {} is online"
```
#### With jq (if using JSON output)
```bash
# Parse structured output (if sshsync supports --json flag)
sshsync ls --json | jq '.hosts[] | select(.status=="online") | .name'
```
#### In Shell Scripts
```bash
#!/bin/bash
# Deploy script using sshsync
echo "Deploying to staging..."
sshsync push --group staging --recurse ./dist /var/www/app
if [ $? -eq 0 ]; then
echo "Staging deployment successful"
echo "Running tests..."
sshsync group staging "cd /var/www/app && npm test"
if [ $? -eq 0 ]; then
echo "Tests passed, deploying to production..."
sshsync push --group production --recurse ./dist /var/www/app
fi
fi
```
## Troubleshooting
### Common Issues
#### 1. "Permission denied (publickey)"
**Cause**: SSH key not configured or not added to ssh-agent
**Solution**:
```bash
# Add SSH key to agent
ssh-add ~/.ssh/id_ed25519
# Verify it's added
ssh-add -l
# Copy public key to remote
ssh-copy-id user@host
```
#### 2. "Connection timed out"
**Cause**: Host is offline or network issue
**Solution**:
```bash
# Test connectivity
ping hostname
# Test Tailscale specifically
tailscale ping hostname
# Check Tailscale status
tailscale status
```
#### 3. "Host not found in SSH config"
**Cause**: Host alias not in `~/.ssh/config`
**Solution**:
```bash
# Add host to SSH config
sshsync hadd
# Or manually edit ~/.ssh/config
vim ~/.ssh/config
```
#### 4. "Group not found"
**Cause**: Group doesn't exist in sshsync config
**Solution**:
```bash
# Add hosts to new group
sshsync gadd mygroup
# Or manually edit config
vim ~/.config/sshsync/config.yaml
```
#### 5. File Transfer Fails
**Cause**: Insufficient permissions, disk space, or path doesn't exist
**Solution**:
```bash
# Check remote disk space
sshsync group production "df -h"
# Check remote path exists
sshsync group production "ls -ld /target/path"
# Check permissions
sshsync group production "ls -la /target/path"
```
### Debug Mode
While sshsync doesn't have a built-in verbose mode, you can debug the underlying SSH connection directly:
```bash
# Test SSH to a single host with verbose output
ssh -v host-alias "uptime"
# Or use dry-run to see what would execute
sshsync all --dry-run "command"
```
### Performance Issues
If operations are slow:
1. **Reduce parallelism** (run on fewer hosts at once)
2. **Increase timeout** for network-bound operations
3. **Check network latency**:
```bash
sshsync all --timeout 5 'echo $HOSTNAME'
```
### Configuration Validation
```bash
# Verify SSH config is readable
cat ~/.ssh/config
# Verify sshsync config
cat ~/.config/sshsync/config.yaml
# Test hosts individually
for host in $(sshsync ls | awk '{print $1}'); do
echo "Testing $host..."
ssh $host "echo OK" || echo "FAILED: $host"
done
```
## Best Practices
1. **Use meaningful host aliases** in SSH config
2. **Organize groups logically** (by function, environment, location)
3. **Always dry-run first** for destructive operations
4. **Set appropriate timeouts** based on operation type
5. **Test SSH keys** before using sshsync
6. **Keep groups updated** as infrastructure changes
7. **Use --with-status** to check availability before operations
## Integration with Tailscale
sshsync works seamlessly with Tailscale SSH:
```bash
# SSH config using Tailscale hostname
Host homelab-1
HostName homelab-1.tailnet.ts.net
User admin
# Or using Tailscale IP directly
Host homelab-1
HostName 100.64.1.10
User admin
```
**Tailscale Advantages**:
- No need for port forwarding
- Encrypted connections
- MagicDNS for easy hostnames
- Works across NATs
**Verify Tailscale**:
```bash
# Check Tailscale network
tailscale status
# Ping host via Tailscale
tailscale ping homelab-1
```
## Summary
sshsync simplifies multi-host SSH operations:
- ✅ Execute commands across host groups
- ✅ Transfer files to/from multiple hosts
- ✅ Organize hosts into logical groups
- ✅ Parallel execution for speed
- ✅ Dry-run mode for safety
- ✅ Works great with Tailscale
For more help: `sshsync --help`

468
references/tailscale-integration.md Normal file

@@ -0,0 +1,468 @@
# Tailscale Integration Guide
How to use Tailscale SSH with sshsync for secure, zero-config remote access.
## What is Tailscale?
Tailscale is a zero-config VPN that creates a secure network between your devices using WireGuard. It provides:
- **Peer-to-peer encrypted connections**
- **No port forwarding required**
- **Works across NATs and firewalls**
- **MagicDNS for easy device addressing**
- **Built-in SSH functionality**
- **Access control lists (ACLs)**
## Why Tailscale + sshsync?
Combining Tailscale with sshsync gives you:
1. **Secure connections** everywhere (Tailscale encryption)
2. **Simple addressing** (MagicDNS hostnames)
3. **Multi-host operations** (sshsync groups and execution)
4. **No firewall configuration** needed
5. **Works from anywhere** (coffee shop, home, office)
## Setup
### 1. Install Tailscale
**macOS**:
```bash
brew install tailscale
```
**Linux**:
```bash
curl -fsSL https://tailscale.com/install.sh | sh
```
**Verify Installation**:
```bash
tailscale version
```
### 2. Connect to Tailscale
```bash
# Start Tailscale
sudo tailscale up
# Follow the authentication link
# This opens browser to authenticate
# Verify connection
tailscale status
```
### 3. Configure SSH via Tailscale
Tailscale provides two SSH options:
#### Option A: Tailscale SSH (Built-in)
**Enable on each machine**:
```bash
sudo tailscale up --ssh
```
**Use**:
```bash
tailscale ssh user@machine-name
```
**Advantages**:
- No SSH server configuration needed
- Uses Tailscale authentication
- Automatic key management
#### Option B: Standard SSH over Tailscale (Recommended for sshsync)
**Configure SSH config** to use Tailscale hostnames:
```bash
# ~/.ssh/config
Host homelab-1
HostName homelab-1.tailnet-name.ts.net
User admin
IdentityFile ~/.ssh/id_ed25519
# Or use Tailscale IP directly
Host homelab-2
HostName 100.64.1.10
User admin
IdentityFile ~/.ssh/id_ed25519
```
**Advantages**:
- Works with all SSH tools (including sshsync)
- Standard SSH key authentication
- More flexibility
## Getting Tailscale Hostnames and IPs
### View All Machines
```bash
tailscale status
```
**Output**:
```
100.64.1.10 homelab-1 user@ linux -
100.64.1.11 homelab-2 user@ linux -
100.64.1.20 laptop user@ macOS -
100.64.1.30 phone user@ iOS offline
```
### Get MagicDNS Hostname
**Format**: `machine-name.tailnet-name.ts.net`
**Find your tailnet name**:
```bash
tailscale status --json | grep -i tailnet
```
Or check in Tailscale admin console: https://login.tailscale.com/admin/machines
### Get Tailscale IP
```bash
# Your own IP
tailscale ip -4
# Another machine's IP (from status output)
tailscale status | grep machine-name
```
## Testing Connectivity
### Ping via Tailscale
```bash
# Ping by hostname
tailscale ping homelab-1
# Ping by IP
tailscale ping 100.64.1.10
```
**Successful output**:
```
pong from homelab-1 (100.64.1.10) via DERP(nyc) in 45ms
pong from homelab-1 (100.64.1.10) via DERP(nyc) in 43ms
```
**Failed output**:
```
timeout waiting for pong
```
### SSH Test
```bash
# Test SSH connection
ssh user@homelab-1.tailnet.ts.net
# Or with IP
ssh user@100.64.1.10
```
## Configuring sshsync with Tailscale
### Step 1: Add Tailscale Hosts to SSH Config
```bash
vim ~/.ssh/config
```
**Example configuration**:
```
# Production servers
Host prod-web-01
HostName prod-web-01.tailnet.ts.net
User deploy
IdentityFile ~/.ssh/id_ed25519
Host prod-web-02
HostName prod-web-02.tailnet.ts.net
User deploy
IdentityFile ~/.ssh/id_ed25519
Host prod-db-01
HostName prod-db-01.tailnet.ts.net
User deploy
IdentityFile ~/.ssh/id_ed25519
# Homelab
Host homelab-1
HostName 100.64.1.10
User admin
IdentityFile ~/.ssh/id_ed25519
Host homelab-2
HostName 100.64.1.11
User admin
IdentityFile ~/.ssh/id_ed25519
# Development
Host dev-laptop
HostName dev-laptop.tailnet.ts.net
User developer
IdentityFile ~/.ssh/id_ed25519
```
### Step 2: Test Each Host
```bash
# Test connectivity to each host
ssh prod-web-01 "hostname"
ssh homelab-1 "hostname"
ssh dev-laptop "hostname"
```
### Step 3: Initialize sshsync
```bash
# Sync hosts and create groups
sshsync sync
# Add hosts to groups
sshsync gadd production
# Select: prod-web-01, prod-web-02, prod-db-01
sshsync gadd homelab
# Select: homelab-1, homelab-2
sshsync gadd development
# Select: dev-laptop
```
### Step 4: Verify Configuration
```bash
# List all hosts with status
sshsync ls --with-status
# Test command execution
sshsync all "uptime"
# Test group execution
sshsync group production "df -h"
```
## Advanced Tailscale Features
### Tailnet Lock
Prevents unauthorized device additions:
```bash
tailscale lock status
```
### Exit Nodes
Route all traffic through a specific machine:
```bash
# Enable exit node on a machine
sudo tailscale up --advertise-exit-node
# Use exit node from another machine
sudo tailscale set --exit-node=exit-node-name
```
### Subnet Routing
Access networks behind Tailscale machines:
```bash
# Advertise subnet routes
sudo tailscale up --advertise-routes=192.168.1.0/24
```
### ACLs (Access Control Lists)
Control who can access what: https://login.tailscale.com/admin/acls
**Example ACL**:
```json
{
"acls": [
{
"action": "accept",
"src": ["group:admins"],
"dst": ["*:22", "*:80", "*:443"]
},
{
"action": "accept",
"src": ["group:developers"],
"dst": ["tag:development:*"]
}
]
}
```
## Troubleshooting
### Machine Shows Offline
**Check Tailscale status**:
```bash
tailscale status
```
**Restart Tailscale**:
```bash
# macOS
brew services restart tailscale
# Linux
sudo systemctl restart tailscaled
```
**Re-authenticate**:
```bash
sudo tailscale up
```
### Cannot Connect via SSH
1. **Verify Tailscale connectivity**:
```bash
tailscale ping machine-name
```
2. **Check SSH is running** on remote:
```bash
tailscale ssh machine-name "systemctl status sshd"
```
3. **Verify SSH keys**:
```bash
ssh-add -l
```
4. **Test SSH directly**:
```bash
ssh -v user@machine-name.tailnet.ts.net
```
### High Latency
**Check connection method**:
```bash
tailscale status
```
Look for "direct" vs "DERP relay":
- **Direct**: Low latency (< 50ms)
- **DERP relay**: Higher latency (100-200ms)
**Force direct connection**:
```bash
# Direct connections need UDP reachability between both peers;
# check firewalls/NAT, then re-test with: tailscale ping machine-name
```
### MagicDNS Not Working
**Enable MagicDNS**:
1. Go to https://login.tailscale.com/admin/dns
2. Enable MagicDNS
**Verify**:
```bash
nslookup machine-name.tailnet.ts.net
```
## Security Best Practices
1. **Use SSH keys**, not passwords
2. **Enable Tailnet Lock** to prevent unauthorized devices
3. **Use ACLs** to restrict access
4. **Regularly review** connected devices
5. **Set up key expiry** for team members who leave
6. **Use tags** for machine roles
7. **Enable two-factor auth** for Tailscale account
## Monitoring
### Check Network Status
```bash
# All machines
tailscale status
# Self status
tailscale status --self
# JSON format for parsing
tailscale status --json
```
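The JSON form is the easiest to script against. A minimal sketch listing online peers, assuming the `Peer`, `HostName`, and `Online` fields present in current `tailscale status --json` output:
```python
import json
import subprocess

def list_online_machines() -> list:
    """Return hostnames of online peers from `tailscale status --json`."""
    raw = subprocess.run(
        ["tailscale", "status", "--json"],
        capture_output=True, text=True, check=True,
    ).stdout
    status = json.loads(raw)
    peers = status.get("Peer", {}) or {}
    return [p["HostName"] for p in peers.values() if p.get("Online")]

if __name__ == "__main__":
    print("\n".join(list_online_machines()))
```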
### View Logs
```bash
# macOS
tail -f /var/log/tailscaled.log
# Linux
journalctl -u tailscaled -f
```
## Use Cases with sshsync
### 1. Deploy to All Production Servers
```bash
sshsync push --group production --recurse ./dist /var/www/app
sshsync group production "cd /var/www/app && pm2 restart all"
```
### 2. Collect Logs from All Servers
```bash
sshsync pull --group production /var/log/app/error.log ./logs/
```
### 3. Update All Homelab Machines
```bash
sshsync group homelab "sudo apt update && sudo apt upgrade -y"
```
### 4. Check Disk Space Everywhere
```bash
sshsync all "df -h /"
```
### 5. Sync Configuration Across Machines
```bash
sshsync push --all ~/dotfiles/.bashrc ~/.bashrc
sshsync push --all ~/dotfiles/.vimrc ~/.vimrc
```
## Summary
Tailscale + sshsync = **Powerful Remote Management**
- ✅ Secure connections everywhere (WireGuard encryption)
- ✅ No firewall configuration needed
- ✅ Easy addressing (MagicDNS)
- ✅ Multi-host operations (sshsync groups)
- ✅ Works from anywhere
**Quick Start**:
1. Install Tailscale: `brew install tailscale`
2. Connect: `sudo tailscale up`
3. Configure SSH config with Tailscale hostnames
4. Initialize sshsync: `sshsync sync`
5. Start managing: `sshsync all "uptime"`
For more: https://tailscale.com/kb/

378
scripts/load_balancer.py Normal file

@@ -0,0 +1,378 @@
#!/usr/bin/env python3
"""
Load balancer for Tailscale SSH Sync Agent.
Intelligent task distribution based on machine resources.
"""
import sys
from pathlib import Path
from typing import Dict, List, Optional, Tuple
from dataclasses import dataclass
import logging
# Add utils to path
sys.path.insert(0, str(Path(__file__).parent))
from utils.helpers import parse_cpu_load, parse_memory_usage, parse_disk_usage, calculate_load_score, classify_load_status
from sshsync_wrapper import execute_on_host
logger = logging.getLogger(__name__)
@dataclass
class MachineMetrics:
"""Resource metrics for a machine."""
host: str
cpu_pct: float
mem_pct: float
disk_pct: float
load_score: float
status: str
def get_machine_load(host: str, timeout: int = 10) -> Optional[MachineMetrics]:
"""
Get CPU, memory, disk metrics for a machine.
Args:
host: Host to check
timeout: Command timeout
Returns:
MachineMetrics object or None on failure
Example:
>>> metrics = get_machine_load("web-01")
>>> metrics.cpu_pct
45.2
>>> metrics.load_score
0.49
"""
try:
# Get CPU load
cpu_result = execute_on_host(host, "uptime", timeout=timeout)
cpu_data = {}
if cpu_result.get('success'):
cpu_data = parse_cpu_load(cpu_result['stdout'])
# Get memory usage
mem_result = execute_on_host(host, "free -m 2>/dev/null || vm_stat", timeout=timeout)
mem_data = {}
if mem_result.get('success'):
mem_data = parse_memory_usage(mem_result['stdout'])
# Get disk usage
disk_result = execute_on_host(host, "df -h / | tail -1", timeout=timeout)
disk_data = {}
if disk_result.get('success'):
disk_data = parse_disk_usage(disk_result['stdout'])
# Calculate metrics
# CPU: Use 1-min load average, normalize by assuming 4 cores (adjust as needed)
cpu_pct = (cpu_data.get('load_1min', 0) / 4.0) * 100 if cpu_data else 50.0
# Memory: Direct percentage
mem_pct = mem_data.get('use_pct', 50.0)
# Disk: Direct percentage
disk_pct = disk_data.get('use_pct', 50.0)
# Calculate load score
score = calculate_load_score(cpu_pct, mem_pct, disk_pct)
status = classify_load_status(score)
return MachineMetrics(
host=host,
cpu_pct=cpu_pct,
mem_pct=mem_pct,
disk_pct=disk_pct,
load_score=score,
status=status
)
except Exception as e:
logger.error(f"Error getting load for {host}: {e}")
return None
def select_optimal_host(candidates: List[str],
prefer_group: Optional[str] = None,
timeout: int = 10) -> Tuple[Optional[str], Optional[MachineMetrics]]:
"""
Pick best host from candidates based on load.
Args:
candidates: List of candidate hosts
prefer_group: Prefer hosts from this group if available
timeout: Timeout for metric gathering
Returns:
Tuple of (selected_host, metrics)
Example:
>>> host, metrics = select_optimal_host(["web-01", "web-02", "web-03"])
>>> host
"web-03"
>>> metrics.load_score
0.28
"""
if not candidates:
return None, None
# Get metrics for all candidates
metrics_list: List[MachineMetrics] = []
for host in candidates:
metrics = get_machine_load(host, timeout=timeout)
if metrics:
metrics_list.append(metrics)
if not metrics_list:
logger.warning("No valid metrics collected from candidates")
return None, None
# Sort by load score (lower is better)
metrics_list.sort(key=lambda m: m.load_score)
# If prefer_group specified, prioritize those hosts if load is similar
if prefer_group:
from utils.helpers import parse_sshsync_config, get_groups_for_host
groups_config = parse_sshsync_config()
# Find hosts in preferred group
preferred_metrics = [
m for m in metrics_list
if prefer_group in get_groups_for_host(m.host, groups_config)
]
# Use preferred if load score within 20% of absolute best
if preferred_metrics:
best_score = metrics_list[0].load_score
for m in preferred_metrics:
if m.load_score <= best_score * 1.2:
return m.host, m
# Return absolute best
best = metrics_list[0]
return best.host, best
def get_group_capacity(group: str, timeout: int = 10) -> Dict:
"""
Get aggregate capacity of a group.
Args:
group: Group name
timeout: Timeout for metric gathering
Returns:
Dict with aggregate metrics:
{
'hosts': List[MachineMetrics],
'total_hosts': int,
'avg_cpu': float,
'avg_mem': float,
'avg_disk': float,
'avg_load_score': float,
'total_capacity': str # descriptive
}
Example:
>>> capacity = get_group_capacity("production")
>>> capacity['avg_load_score']
0.45
"""
from utils.helpers import parse_sshsync_config
groups_config = parse_sshsync_config()
group_hosts = groups_config.get(group, [])
if not group_hosts:
return {
'error': f'Group {group} not found or has no members',
'hosts': []
}
# Get metrics for all hosts in group
metrics_list: List[MachineMetrics] = []
for host in group_hosts:
metrics = get_machine_load(host, timeout=timeout)
if metrics:
metrics_list.append(metrics)
if not metrics_list:
return {
'error': f'Could not get metrics for any hosts in {group}',
'hosts': []
}
# Calculate aggregates
avg_cpu = sum(m.cpu_pct for m in metrics_list) / len(metrics_list)
avg_mem = sum(m.mem_pct for m in metrics_list) / len(metrics_list)
avg_disk = sum(m.disk_pct for m in metrics_list) / len(metrics_list)
avg_score = sum(m.load_score for m in metrics_list) / len(metrics_list)
# Determine overall capacity description
if avg_score < 0.4:
capacity_desc = "High capacity available"
elif avg_score < 0.7:
capacity_desc = "Moderate capacity"
else:
capacity_desc = "Limited capacity"
return {
'group': group,
'hosts': metrics_list,
'total_hosts': len(metrics_list),
'available_hosts': len(group_hosts),
'avg_cpu': avg_cpu,
'avg_mem': avg_mem,
'avg_disk': avg_disk,
'avg_load_score': avg_score,
'total_capacity': capacity_desc
}
def distribute_tasks(tasks: List[Dict], hosts: List[str],
timeout: int = 10) -> Dict[str, List[Dict]]:
"""
Distribute multiple tasks optimally across hosts.
Args:
tasks: List of task dicts (each with 'command', 'priority', etc)
hosts: Available hosts
timeout: Timeout for metric gathering
Returns:
Dict mapping hosts to assigned tasks
Algorithm:
- Get current load for all hosts
- Assign tasks to least loaded hosts
- Balance by estimated task weight
Example:
>>> tasks = [
... {'command': 'npm run build', 'weight': 3},
... {'command': 'npm test', 'weight': 2}
... ]
>>> distribution = distribute_tasks(tasks, ["web-01", "web-02"])
>>> distribution["web-01"]
[{'command': 'npm run build', 'weight': 3}]
"""
if not tasks or not hosts:
return {}
# Get current load for all hosts
host_metrics = {}
for host in hosts:
metrics = get_machine_load(host, timeout=timeout)
if metrics:
host_metrics[host] = metrics
if not host_metrics:
logger.error("No valid host metrics available")
return {}
# Initialize assignment
assignment: Dict[str, List[Dict]] = {host: [] for host in host_metrics.keys()}
host_loads = {host: m.load_score for host, m in host_metrics.items()}
# Sort tasks by weight (descending) to assign heavy tasks first
sorted_tasks = sorted(
tasks,
key=lambda t: t.get('weight', 1),
reverse=True
)
# Assign each task to least loaded host
for task in sorted_tasks:
# Find host with minimum current load
min_host = min(host_loads.keys(), key=lambda h: host_loads[h])
# Assign task
assignment[min_host].append(task)
# Update simulated load (add task weight normalized)
task_weight = task.get('weight', 1)
host_loads[min_host] += (task_weight * 0.1) # 0.1 = scaling factor
return assignment
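# Worked example of the greedy assignment above (illustrative numbers):
#   hosts: web-01 (score 0.30), web-02 (score 0.50)
#   tasks: build (weight 3), test (weight 2), lint (weight 1)
#   build -> web-01 (0.30 + 0.3 = 0.60); test -> web-02 (0.50 + 0.2 = 0.70);
#   lint  -> web-01 (0.60 + 0.1 = 0.70), leaving the simulated loads balanced.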
def format_load_report(metrics: MachineMetrics, compare_to_avg: Optional[Dict] = None) -> str:
"""
Format load metrics as human-readable report.
Args:
metrics: Machine metrics
compare_to_avg: Optional dict with avg_cpu, avg_mem, avg_disk for comparison
Returns:
Formatted report string
Example:
        >>> metrics = MachineMetrics('web-01', 45, 60, 40, 0.48, 'moderate')
        >>> print(format_load_report(metrics))
        web-01: Load Score: 0.48 (moderate)
          CPU: 45.0% | Memory: 60.0% | Disk: 40.0%
"""
lines = [
f"{metrics.host}: Load Score: {metrics.load_score:.2f} ({metrics.status})",
f" CPU: {metrics.cpu_pct:.1f}% | Memory: {metrics.mem_pct:.1f}% | Disk: {metrics.disk_pct:.1f}%"
]
if compare_to_avg:
cpu_vs = metrics.cpu_pct - compare_to_avg.get('avg_cpu', 0)
mem_vs = metrics.mem_pct - compare_to_avg.get('avg_mem', 0)
disk_vs = metrics.disk_pct - compare_to_avg.get('avg_disk', 0)
comparisons = []
if abs(cpu_vs) > 10:
comparisons.append(f"CPU {'+' if cpu_vs > 0 else ''}{cpu_vs:.0f}% vs avg")
if abs(mem_vs) > 10:
comparisons.append(f"Mem {'+' if mem_vs > 0 else ''}{mem_vs:.0f}% vs avg")
if abs(disk_vs) > 10:
comparisons.append(f"Disk {'+' if disk_vs > 0 else ''}{disk_vs:.0f}% vs avg")
if comparisons:
lines.append(f" vs Average: {' | '.join(comparisons)}")
return "\n".join(lines)
def main():
"""Test load balancer functions."""
print("Testing load balancer...\n")
print("1. Testing select_optimal_host:")
print(" (Requires configured hosts - using dry-run simulation)")
# Simulate metrics
test_metrics = [
        MachineMetrics('web-01', 45, 60, 40, 0.48, 'moderate'),
MachineMetrics('web-02', 85, 70, 65, 0.75, 'high'),
MachineMetrics('web-03', 20, 35, 30, 0.28, 'low'),
]
# Sort by score
test_metrics.sort(key=lambda m: m.load_score)
best = test_metrics[0]
print(f" ✓ Best host: {best.host} (score: {best.load_score:.2f})")
print(f" Reason: {best.status} load")
print("\n2. Format load report:")
report = format_load_report(test_metrics[0], {
'avg_cpu': 50,
'avg_mem': 55,
'avg_disk': 45
})
print(report)
print("\n✅ Load balancer tested")
if __name__ == "__main__":
main()

409
scripts/sshsync_wrapper.py Normal file
View File

@@ -0,0 +1,409 @@
#!/usr/bin/env python3
"""
SSH Sync wrapper for Tailscale SSH Sync Agent.
Python interface to sshsync CLI operations.
"""
import subprocess
import sys
from pathlib import Path
from typing import Dict, List, Optional, Tuple
import json
import logging
# Add utils to path
sys.path.insert(0, str(Path(__file__).parent))
from utils.helpers import parse_ssh_config, parse_sshsync_config, format_bytes, format_duration
from utils.validators import validate_host, validate_group, validate_path_exists, validate_timeout, validate_command
logger = logging.getLogger(__name__)
def get_host_status(group: Optional[str] = None) -> Dict:
"""
Get online/offline status of hosts.
Args:
group: Optional group to filter (None = all hosts)
Returns:
Dict with status info
Example:
>>> status = get_host_status()
>>> status['online_count']
8
"""
try:
# Run sshsync ls --with-status
cmd = ["sshsync", "ls", "--with-status"]
result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
if result.returncode != 0:
return {'error': result.stderr, 'hosts': []}
# Parse output
hosts = []
for line in result.stdout.strip().split('\n'):
if not line or line.startswith('Host') or line.startswith('---'):
continue
parts = line.split()
if len(parts) >= 2:
host_name = parts[0]
status = parts[1] if len(parts) > 1 else 'unknown'
hosts.append({
'host': host_name,
                    'online': status.lower() in ('online', 'reachable'),
'status': status
})
# Filter by group if specified
if group:
groups_config = parse_sshsync_config()
group_hosts = groups_config.get(group, [])
hosts = [h for h in hosts if h['host'] in group_hosts]
online_count = sum(1 for h in hosts if h['online'])
return {
'hosts': hosts,
'total_count': len(hosts),
'online_count': online_count,
'offline_count': len(hosts) - online_count,
'availability_pct': (online_count / len(hosts) * 100) if hosts else 0
}
except Exception as e:
logger.error(f"Error getting host status: {e}")
return {'error': str(e), 'hosts': []}
def execute_on_all(command: str, timeout: int = 10, dry_run: bool = False) -> Dict:
"""
Execute command on all hosts.
Args:
command: Command to execute
timeout: Timeout in seconds
dry_run: If True, don't actually execute
Returns:
Dict with results per host
Example:
>>> result = execute_on_all("uptime", timeout=15)
>>> len(result['results'])
10
"""
validate_command(command)
validate_timeout(timeout)
if dry_run:
return {
'dry_run': True,
'command': command,
'message': 'Would execute on all hosts'
}
try:
cmd = ["sshsync", "all", f"--timeout={timeout}", command]
result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout + 30)
# Parse results (format varies, simplified here)
return {
'success': result.returncode == 0,
'stdout': result.stdout,
'stderr': result.stderr,
'command': command
}
except subprocess.TimeoutExpired:
return {'error': f'Command timed out after {timeout}s'}
except Exception as e:
return {'error': str(e)}
def execute_on_group(group: str, command: str, timeout: int = 10, dry_run: bool = False) -> Dict:
"""
Execute command on specific group.
Args:
group: Group name
command: Command to execute
timeout: Timeout in seconds
dry_run: Preview without executing
Returns:
Dict with execution results
Example:
>>> result = execute_on_group("web-servers", "df -h /var/www")
>>> result['success']
True
"""
groups_config = parse_sshsync_config()
validate_group(group, list(groups_config.keys()))
validate_command(command)
validate_timeout(timeout)
if dry_run:
group_hosts = groups_config.get(group, [])
return {
'dry_run': True,
'group': group,
'hosts': group_hosts,
'command': command,
'message': f'Would execute on {len(group_hosts)} hosts in group {group}'
}
try:
cmd = ["sshsync", "group", f"--timeout={timeout}", group, command]
result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout + 30)
return {
'success': result.returncode == 0,
'group': group,
'stdout': result.stdout,
'stderr': result.stderr,
'command': command
}
except subprocess.TimeoutExpired:
return {'error': f'Command timed out after {timeout}s'}
except Exception as e:
return {'error': str(e)}
def execute_on_host(host: str, command: str, timeout: int = 10) -> Dict:
"""
Execute command on single host.
Args:
host: Host name
command: Command to execute
timeout: Timeout in seconds
Returns:
Dict with result
Example:
>>> result = execute_on_host("web-01", "hostname")
>>> result['stdout']
"web-01"
"""
ssh_hosts = parse_ssh_config()
validate_host(host, list(ssh_hosts.keys()))
validate_command(command)
validate_timeout(timeout)
try:
cmd = ["ssh", "-o", f"ConnectTimeout={timeout}", host, command]
result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout + 5)
return {
'success': result.returncode == 0,
'host': host,
'stdout': result.stdout,
'stderr': result.stderr,
'command': command
}
except subprocess.TimeoutExpired:
return {'error': f'Command timed out after {timeout}s'}
except Exception as e:
return {'error': str(e)}
def push_to_hosts(local_path: str, remote_path: str,
hosts: Optional[List[str]] = None,
group: Optional[str] = None,
recurse: bool = False,
dry_run: bool = False) -> Dict:
"""
Push files to hosts.
Args:
local_path: Local file/directory path
remote_path: Remote destination path
hosts: Specific hosts (None = all if group also None)
group: Group name
recurse: Recursive copy
dry_run: Preview without executing
Returns:
Dict with push results
Example:
>>> result = push_to_hosts("./dist", "/var/www/app", group="production", recurse=True)
>>> result['success']
True
"""
validate_path_exists(local_path)
if dry_run:
return {
'dry_run': True,
'local_path': local_path,
'remote_path': remote_path,
'hosts': hosts,
'group': group,
'recurse': recurse,
'message': 'Would push files'
}
try:
cmd = ["sshsync", "push"]
if hosts:
for host in hosts:
cmd.extend(["--host", host])
elif group:
cmd.extend(["--group", group])
else:
cmd.append("--all")
if recurse:
cmd.append("--recurse")
cmd.extend([local_path, remote_path])
result = subprocess.run(cmd, capture_output=True, text=True, timeout=300)
return {
'success': result.returncode == 0,
'stdout': result.stdout,
'stderr': result.stderr,
'local_path': local_path,
'remote_path': remote_path
}
except subprocess.TimeoutExpired:
return {'error': 'Push operation timed out'}
except Exception as e:
return {'error': str(e)}
def pull_from_host(host: str, remote_path: str, local_path: str,
recurse: bool = False, dry_run: bool = False) -> Dict:
"""
Pull files from host.
Args:
host: Host to pull from
remote_path: Remote file/directory path
local_path: Local destination path
recurse: Recursive copy
dry_run: Preview without executing
Returns:
Dict with pull results
Example:
>>> result = pull_from_host("web-01", "/var/log/nginx", "./logs", recurse=True)
>>> result['success']
True
"""
ssh_hosts = parse_ssh_config()
validate_host(host, list(ssh_hosts.keys()))
if dry_run:
return {
'dry_run': True,
'host': host,
'remote_path': remote_path,
'local_path': local_path,
'recurse': recurse,
'message': f'Would pull from {host}'
}
try:
cmd = ["sshsync", "pull", "--host", host]
if recurse:
cmd.append("--recurse")
cmd.extend([remote_path, local_path])
result = subprocess.run(cmd, capture_output=True, text=True, timeout=300)
return {
'success': result.returncode == 0,
'host': host,
'stdout': result.stdout,
'stderr': result.stderr,
'remote_path': remote_path,
'local_path': local_path
}
except subprocess.TimeoutExpired:
return {'error': 'Pull operation timed out'}
except Exception as e:
return {'error': str(e)}
def list_hosts(with_status: bool = True) -> Dict:
"""
List all configured hosts.
Args:
with_status: Include online/offline status
Returns:
Dict with hosts info
Example:
>>> result = list_hosts(with_status=True)
>>> len(result['hosts'])
10
"""
if with_status:
return get_host_status()
else:
ssh_hosts = parse_ssh_config()
return {
'hosts': [{'host': name} for name in ssh_hosts.keys()],
'count': len(ssh_hosts)
}
def get_groups() -> Dict[str, List[str]]:
"""
Get all defined groups and their members.
Returns:
Dict mapping group names to host lists
Example:
>>> groups = get_groups()
>>> groups['production']
['prod-web-01', 'prod-db-01']
"""
return parse_sshsync_config()
def main():
"""Test sshsync wrapper functions."""
print("Testing sshsync wrapper...\n")
print("1. List hosts:")
result = list_hosts(with_status=False)
print(f" Found {result.get('count', 0)} hosts")
print("\n2. Get groups:")
groups = get_groups()
print(f" Found {len(groups)} groups")
for group, hosts in groups.items():
print(f" - {group}: {len(hosts)} hosts")
print("\n3. Test dry-run:")
result = execute_on_all("uptime", dry_run=True)
print(f" Dry-run: {result.get('message', 'OK')}")
print("\n✅ sshsync wrapper tested")
if __name__ == "__main__":
main()

426
scripts/tailscale_manager.py Normal file
View File

@@ -0,0 +1,426 @@
#!/usr/bin/env python3
"""
Tailscale manager for Tailscale SSH Sync Agent.
Tailscale-specific operations and status management.
"""
import subprocess
import re
import json
from typing import Dict, List, Optional
from dataclasses import dataclass
import logging
logger = logging.getLogger(__name__)
@dataclass
class TailscalePeer:
"""Represents a Tailscale peer."""
hostname: str
ip: str
online: bool
last_seen: Optional[str] = None
os: Optional[str] = None
relay: Optional[str] = None
def get_tailscale_status() -> Dict:
"""
Get Tailscale network status (all peers).
Returns:
Dict with network status:
{
'connected': bool,
'peers': List[TailscalePeer],
'online_count': int,
'total_count': int,
'self_ip': str
}
Example:
>>> status = get_tailscale_status()
>>> status['online_count']
8
>>> status['peers'][0].hostname
'homelab-1'
"""
try:
# Get status in JSON format
result = subprocess.run(
["tailscale", "status", "--json"],
capture_output=True,
text=True,
timeout=10
)
if result.returncode != 0:
# Try text format if JSON fails
result = subprocess.run(
["tailscale", "status"],
capture_output=True,
text=True,
timeout=10
)
if result.returncode != 0:
return {
'connected': False,
'error': 'Tailscale not running or accessible',
'peers': []
}
# Parse text format
return _parse_text_status(result.stdout)
# Parse JSON format
data = json.loads(result.stdout)
return _parse_json_status(data)
except FileNotFoundError:
return {
'connected': False,
'error': 'Tailscale not installed',
'peers': []
}
except subprocess.TimeoutExpired:
return {
'connected': False,
'error': 'Timeout getting Tailscale status',
'peers': []
}
except Exception as e:
logger.error(f"Error getting Tailscale status: {e}")
return {
'connected': False,
'error': str(e),
'peers': []
}
def _parse_json_status(data: Dict) -> Dict:
"""Parse Tailscale JSON status."""
peers = []
self_data = data.get('Self', {})
    ips = self_data.get('TailscaleIPs') or ['unknown']
    self_ip = ips[0]
for peer_id, peer_data in data.get('Peer', {}).items():
hostname = peer_data.get('HostName', 'unknown')
ips = peer_data.get('TailscaleIPs', [])
ip = ips[0] if ips else 'unknown'
online = peer_data.get('Online', False)
os = peer_data.get('OS', 'unknown')
peers.append(TailscalePeer(
hostname=hostname,
ip=ip,
online=online,
os=os
))
online_count = sum(1 for p in peers if p.online)
return {
'connected': True,
'peers': peers,
'online_count': online_count,
'total_count': len(peers),
'self_ip': self_ip
}
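# For reference, the JSON parsed above is assumed to look roughly like this
# (abridged output of `tailscale status --json`):
#   {"Self": {"TailscaleIPs": ["100.64.1.5"]},
#    "Peer": {"nodekey:...": {"HostName": "homelab-1",
#                             "TailscaleIPs": ["100.64.1.10"],
#                             "Online": true, "OS": "linux"}}}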
def _parse_text_status(output: str) -> Dict:
"""Parse Tailscale text status output."""
peers = []
self_ip = None
for line in output.strip().split('\n'):
line = line.strip()
if not line:
continue
        # Parse format: <tailscale-ip> <hostname> <user> <os> <status> ...
        parts = line.split()
        if len(parts) >= 2:
            ip = parts[0]
            hostname = parts[1]
            # The first entry is this machine itself: record its IP and skip
            if self_ip is None:
                self_ip = ip
                continue
            # Determine online status from the remaining fields
            online = 'offline' not in line.lower()
peers.append(TailscalePeer(
hostname=hostname,
ip=ip,
online=online
))
online_count = sum(1 for p in peers if p.online)
return {
'connected': True,
'peers': peers,
'online_count': online_count,
'total_count': len(peers),
'self_ip': self_ip or 'unknown'
}
def check_connectivity(host: str, timeout: int = 5) -> bool:
"""
Ping host via Tailscale.
Args:
host: Hostname to ping
timeout: Timeout in seconds
Returns:
True if host responds to ping
Example:
>>> check_connectivity("homelab-1")
True
"""
try:
result = subprocess.run(
["tailscale", "ping", "--timeout", f"{timeout}s", "--c", "1", host],
capture_output=True,
text=True,
timeout=timeout + 2
)
# Check if ping succeeded
return result.returncode == 0 or 'pong' in result.stdout.lower()
except (FileNotFoundError, subprocess.TimeoutExpired):
return False
except Exception as e:
logger.error(f"Error pinging {host}: {e}")
return False
def get_peer_info(hostname: str) -> Optional[TailscalePeer]:
"""
Get detailed info about a specific peer.
Args:
hostname: Peer hostname
Returns:
TailscalePeer object or None if not found
Example:
>>> peer = get_peer_info("homelab-1")
>>> peer.ip
'100.64.1.10'
"""
status = get_tailscale_status()
if not status.get('connected'):
return None
for peer in status.get('peers', []):
if peer.hostname == hostname or hostname in peer.hostname:
return peer
return None
def list_online_machines() -> List[str]:
"""
List all online Tailscale machines.
Returns:
List of online machine hostnames
Example:
>>> machines = list_online_machines()
>>> len(machines)
8
"""
status = get_tailscale_status()
if not status.get('connected'):
return []
return [
peer.hostname
for peer in status.get('peers', [])
if peer.online
]
def get_machine_ip(hostname: str) -> Optional[str]:
"""
Get Tailscale IP for a machine.
Args:
hostname: Machine hostname
Returns:
IP address or None if not found
Example:
>>> ip = get_machine_ip("homelab-1")
>>> ip
'100.64.1.10'
"""
peer = get_peer_info(hostname)
return peer.ip if peer else None
def validate_tailscale_ssh(host: str, timeout: int = 10) -> Dict:
"""
Check if Tailscale SSH is working for a host.
Args:
host: Host to check
timeout: Connection timeout
Returns:
Dict with validation results:
{
'working': bool,
'message': str,
'details': Dict
}
Example:
>>> result = validate_tailscale_ssh("homelab-1")
>>> result['working']
True
"""
# First check if host is in Tailscale network
peer = get_peer_info(host)
if not peer:
return {
'working': False,
'message': f'Host {host} not found in Tailscale network',
'details': {'peer_found': False}
}
if not peer.online:
return {
'working': False,
'message': f'Host {host} is offline in Tailscale',
'details': {'peer_found': True, 'online': False}
}
# Check connectivity
if not check_connectivity(host, timeout=timeout):
return {
'working': False,
'message': f'Cannot ping {host} via Tailscale',
'details': {'peer_found': True, 'online': True, 'ping': False}
}
# Try SSH connection
try:
result = subprocess.run(
["tailscale", "ssh", host, "echo", "test"],
capture_output=True,
text=True,
timeout=timeout
)
if result.returncode == 0:
return {
'working': True,
'message': f'Tailscale SSH to {host} is working',
'details': {
'peer_found': True,
'online': True,
'ping': True,
'ssh': True,
'ip': peer.ip
}
}
else:
return {
'working': False,
'message': f'Tailscale SSH failed: {result.stderr}',
'details': {
'peer_found': True,
'online': True,
'ping': True,
'ssh': False,
'error': result.stderr
}
}
except subprocess.TimeoutExpired:
return {
'working': False,
'message': f'Tailscale SSH timed out after {timeout}s',
'details': {'timeout': True}
}
except Exception as e:
return {
'working': False,
'message': f'Error testing Tailscale SSH: {e}',
'details': {'error': str(e)}
}
def get_network_summary() -> str:
"""
Get human-readable network summary.
Returns:
Formatted summary string
Example:
>>> print(get_network_summary())
Tailscale Network: Connected
Online: 8/10 machines (80%)
Self IP: 100.64.1.5
"""
status = get_tailscale_status()
if not status.get('connected'):
return "Tailscale Network: Not connected\nError: {}".format(
status.get('error', 'Unknown error')
)
    total = status['total_count']
    pct = (status['online_count'] / total * 100) if total else 0.0
    lines = [
        "Tailscale Network: Connected",
        f"Online: {status['online_count']}/{total} machines ({pct:.0f}%)",
        f"Self IP: {status.get('self_ip', 'unknown')}"
    ]
return "\n".join(lines)
def main():
"""Test Tailscale manager functions."""
print("Testing Tailscale manager...\n")
print("1. Get Tailscale status:")
status = get_tailscale_status()
if status.get('connected'):
print(f" ✓ Connected")
print(f" Peers: {status['total_count']} total, {status['online_count']} online")
else:
print(f" ✗ Not connected: {status.get('error', 'Unknown error')}")
print("\n2. List online machines:")
machines = list_online_machines()
print(f" Found {len(machines)} online machines")
for machine in machines[:5]: # Show first 5
print(f" - {machine}")
print("\n3. Network summary:")
print(get_network_summary())
print("\n✅ Tailscale manager tested")
if __name__ == "__main__":
main()

628
scripts/utils/helpers.py Normal file
View File

@@ -0,0 +1,628 @@
#!/usr/bin/env python3
"""
Helper utilities for Tailscale SSH Sync Agent.
Provides common formatting, parsing, and utility functions.
"""
import os
import re
import subprocess
from datetime import datetime, timedelta
from pathlib import Path
from typing import Dict, List, Optional, Tuple, Any
import yaml
import logging
logger = logging.getLogger(__name__)
def format_bytes(bytes_value: int) -> str:
"""
Format bytes as human-readable string.
Args:
bytes_value: Number of bytes
Returns:
Formatted string (e.g., "12.3 MB", "1.5 GB")
Example:
>>> format_bytes(12582912)
"12.0 MB"
>>> format_bytes(1610612736)
"1.5 GB"
"""
for unit in ['B', 'KB', 'MB', 'GB', 'TB']:
if bytes_value < 1024.0:
return f"{bytes_value:.1f} {unit}"
bytes_value /= 1024.0
return f"{bytes_value:.1f} PB"
def format_duration(seconds: float) -> str:
"""
Format duration as human-readable string.
Args:
seconds: Duration in seconds
Returns:
Formatted string (e.g., "2m 15s", "1h 30m")
Example:
>>> format_duration(135)
"2m 15s"
>>> format_duration(5430)
"1h 30m 30s"
"""
if seconds < 60:
return f"{int(seconds)}s"
minutes = int(seconds // 60)
secs = int(seconds % 60)
if minutes < 60:
return f"{minutes}m {secs}s" if secs > 0 else f"{minutes}m"
hours = minutes // 60
minutes = minutes % 60
parts = [f"{hours}h"]
if minutes > 0:
parts.append(f"{minutes}m")
if secs > 0 and hours == 0: # Only show seconds if < 1 hour
parts.append(f"{secs}s")
return " ".join(parts)
def format_percentage(value: float, decimals: int = 1) -> str:
"""
Format percentage with specified decimals.
Args:
value: Percentage value (0-100)
decimals: Number of decimal places
Returns:
Formatted string (e.g., "45.5%")
Example:
>>> format_percentage(45.567)
"45.6%"
"""
return f"{value:.{decimals}f}%"
def parse_ssh_config(config_path: Optional[Path] = None) -> Dict[str, Dict[str, str]]:
"""
Parse SSH config file for host definitions.
Args:
config_path: Path to SSH config (default: ~/.ssh/config)
Returns:
Dict mapping host aliases to their configuration:
{
'host-alias': {
'hostname': '100.64.1.10',
'user': 'admin',
'port': '22',
'identityfile': '~/.ssh/id_ed25519'
}
}
Example:
>>> hosts = parse_ssh_config()
>>> hosts['homelab-1']['hostname']
'100.64.1.10'
"""
if config_path is None:
config_path = Path.home() / '.ssh' / 'config'
if not config_path.exists():
logger.warning(f"SSH config not found: {config_path}")
return {}
hosts = {}
current_host = None
try:
with open(config_path, 'r') as f:
for line in f:
line = line.strip()
# Skip comments and empty lines
if not line or line.startswith('#'):
continue
                # Host directive (first concrete alias wins; wildcard
                # patterns reset state so their settings don't attach to
                # the previous host)
                if line.lower().startswith('host '):
                    aliases = line.split()[1:]
                    if aliases and '*' not in aliases[0] and '?' not in aliases[0]:
                        current_host = aliases[0]
                        hosts[current_host] = {}
                    else:
                        current_host = None
# Configuration directives
elif current_host:
parts = line.split(maxsplit=1)
if len(parts) == 2:
key, value = parts
hosts[current_host][key.lower()] = value
return hosts
except Exception as e:
logger.error(f"Error parsing SSH config: {e}")
return {}
def parse_sshsync_config(config_path: Optional[Path] = None) -> Dict[str, List[str]]:
"""
Parse sshsync config file for group definitions.
Args:
config_path: Path to sshsync config (default: ~/.config/sshsync/config.yaml)
Returns:
Dict mapping group names to list of hosts:
{
'production': ['prod-web-01', 'prod-db-01'],
'development': ['dev-laptop', 'dev-desktop']
}
Example:
>>> groups = parse_sshsync_config()
>>> groups['production']
['prod-web-01', 'prod-db-01']
"""
if config_path is None:
config_path = Path.home() / '.config' / 'sshsync' / 'config.yaml'
if not config_path.exists():
logger.warning(f"sshsync config not found: {config_path}")
return {}
try:
with open(config_path, 'r') as f:
config = yaml.safe_load(f)
return config.get('groups', {})
except Exception as e:
logger.error(f"Error parsing sshsync config: {e}")
return {}
def get_timestamp(iso: bool = True) -> str:
"""
Get current timestamp.
Args:
iso: If True, return ISO format; otherwise human-readable
Returns:
Timestamp string
Example:
>>> get_timestamp(iso=True)
"2025-10-19T19:43:41Z"
>>> get_timestamp(iso=False)
"2025-10-19 19:43:41"
"""
    if iso:
        # The trailing 'Z' denotes UTC, so use UTC time for the ISO form
        return datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%SZ")
    else:
        return datetime.now().strftime("%Y-%m-%d %H:%M:%S")
def safe_execute(func, *args, default=None, **kwargs) -> Any:
"""
Execute function with error handling.
Args:
func: Function to execute
*args: Positional arguments
default: Value to return on error
**kwargs: Keyword arguments
Returns:
Function result or default on error
Example:
>>> safe_execute(int, "not_a_number", default=0)
0
>>> safe_execute(int, "42")
42
"""
try:
return func(*args, **kwargs)
except Exception as e:
logger.error(f"Error executing {func.__name__}: {e}")
return default
def validate_path(path: str, must_exist: bool = True) -> bool:
"""
Check if path is valid and accessible.
Args:
path: Path to validate
must_exist: If True, path must exist
Returns:
True if valid, False otherwise
Example:
>>> validate_path("/tmp")
True
>>> validate_path("/nonexistent", must_exist=True)
False
"""
p = Path(path).expanduser()
if must_exist:
return p.exists()
else:
# Check if parent directory exists (for paths that will be created)
return p.parent.exists()
def parse_disk_usage(df_output: str) -> Dict[str, Any]:
"""
Parse 'df' command output.
Args:
df_output: Output from 'df -h' command
Returns:
Dict with disk usage info:
{
'filesystem': '/dev/sda1',
'size': '100G',
'used': '45G',
'available': '50G',
'use_pct': 45,
'mount': '/'
}
Example:
>>> output = "Filesystem Size Used Avail Use% Mounted on\\n/dev/sda1 100G 45G 50G 45% /"
>>> parse_disk_usage(output)
{'filesystem': '/dev/sda1', 'size': '100G', ...}
"""
lines = df_output.strip().split('\n')
if len(lines) < 2:
return {}
# Parse last line (actual data, not header)
data_line = lines[-1]
parts = data_line.split()
if len(parts) < 6:
return {}
try:
return {
'filesystem': parts[0],
'size': parts[1],
'used': parts[2],
'available': parts[3],
'use_pct': int(parts[4].rstrip('%')),
'mount': parts[5]
}
except (ValueError, IndexError) as e:
logger.error(f"Error parsing disk usage: {e}")
return {}
def parse_memory_usage(free_output: str) -> Dict[str, Any]:
"""
Parse 'free' command output (Linux).
Args:
free_output: Output from 'free -m' command
Returns:
Dict with memory info:
{
'total': 16384, # MB
'used': 8192,
'free': 8192,
'use_pct': 50.0
}
Example:
>>> output = "Mem: 16384 8192 8192 0 0 0"
>>> parse_memory_usage(output)
{'total': 16384, 'used': 8192, ...}
"""
lines = free_output.strip().split('\n')
for line in lines:
if line.startswith('Mem:'):
parts = line.split()
if len(parts) >= 3:
try:
total = int(parts[1])
used = int(parts[2])
free = int(parts[3]) if len(parts) > 3 else (total - used)
return {
'total': total,
'used': used,
'free': free,
'use_pct': (used / total * 100) if total > 0 else 0
}
except (ValueError, IndexError) as e:
logger.error(f"Error parsing memory usage: {e}")
return {}
def parse_cpu_load(uptime_output: str) -> Dict[str, float]:
"""
Parse 'uptime' command output for load averages.
Args:
uptime_output: Output from 'uptime' command
Returns:
Dict with load averages:
{
'load_1min': 0.45,
'load_5min': 0.38,
'load_15min': 0.32
}
Example:
>>> output = "19:43:41 up 5 days, 2:15, 3 users, load average: 0.45, 0.38, 0.32"
>>> parse_cpu_load(output)
{'load_1min': 0.45, 'load_5min': 0.38, 'load_15min': 0.32}
"""
# Find "load average:" part
match = re.search(r'load average:\s+([\d.]+),\s+([\d.]+),\s+([\d.]+)', uptime_output)
if match:
try:
return {
'load_1min': float(match.group(1)),
'load_5min': float(match.group(2)),
'load_15min': float(match.group(3))
}
except ValueError as e:
logger.error(f"Error parsing CPU load: {e}")
return {}
def format_host_status(host: str, online: bool, groups: List[str],
latency: Optional[int] = None,
tailscale_connected: bool = False) -> str:
"""
Format host status as display string.
Args:
host: Host name
online: Whether host is online
groups: List of groups host belongs to
latency: Latency in ms (optional)
tailscale_connected: Tailscale connection status
Returns:
Formatted status string
Example:
>>> format_host_status("web-01", True, ["production", "web"], 25, True)
"🟢 web-01 (production, web) - Online - Tailscale: Connected | Latency: 25ms"
"""
icon = "🟢" if online else "🔴"
status = "Online" if online else "Offline"
group_str = ", ".join(groups) if groups else "no group"
parts = [f"{icon} {host} ({group_str}) - {status}"]
if tailscale_connected:
parts.append("Tailscale: Connected")
if latency is not None and online:
parts.append(f"Latency: {latency}ms")
return " - ".join(parts)
def calculate_load_score(cpu_pct: float, mem_pct: float, disk_pct: float) -> float:
"""
Calculate composite load score for a machine.
Args:
cpu_pct: CPU usage percentage (0-100)
mem_pct: Memory usage percentage (0-100)
disk_pct: Disk usage percentage (0-100)
Returns:
Load score (0-1, lower is better)
Formula:
score = (cpu * 0.4) + (mem * 0.3) + (disk * 0.3)
Example:
>>> calculate_load_score(45, 60, 40)
0.48 # (0.45*0.4 + 0.60*0.3 + 0.40*0.3)
"""
return (cpu_pct * 0.4 + mem_pct * 0.3 + disk_pct * 0.3) / 100
def classify_load_status(score: float) -> str:
"""
Classify load score into status category.
Args:
score: Load score (0-1)
Returns:
Status string: "low", "moderate", or "high"
Example:
>>> classify_load_status(0.28)
"low"
>>> classify_load_status(0.55)
"moderate"
>>> classify_load_status(0.82)
"high"
"""
if score < 0.4:
return "low"
elif score < 0.7:
return "moderate"
else:
return "high"
def classify_latency(latency_ms: int) -> Tuple[str, str]:
"""
Classify network latency.
Args:
latency_ms: Latency in milliseconds
Returns:
Tuple of (status, description)
Example:
>>> classify_latency(25)
("excellent", "Ideal for interactive tasks")
>>> classify_latency(150)
("fair", "May impact interactive workflows")
"""
if latency_ms < 50:
return ("excellent", "Ideal for interactive tasks")
elif latency_ms < 100:
return ("good", "Suitable for most operations")
elif latency_ms < 200:
return ("fair", "May impact interactive workflows")
else:
return ("poor", "Investigate network issues")
def get_hosts_from_groups(group: str, groups_config: Dict[str, List[str]]) -> List[str]:
"""
Get list of hosts in a group.
Args:
group: Group name
groups_config: Groups configuration dict
Returns:
List of host names in group
Example:
>>> groups = {'production': ['web-01', 'db-01']}
>>> get_hosts_from_groups('production', groups)
['web-01', 'db-01']
"""
return groups_config.get(group, [])
def get_groups_for_host(host: str, groups_config: Dict[str, List[str]]) -> List[str]:
"""
Get list of groups a host belongs to.
Args:
host: Host name
groups_config: Groups configuration dict
Returns:
List of group names
Example:
>>> groups = {'production': ['web-01'], 'web': ['web-01', 'web-02']}
>>> get_groups_for_host('web-01', groups)
['production', 'web']
"""
return [group for group, hosts in groups_config.items() if host in hosts]
def run_command(command: str, timeout: int = 10) -> Tuple[bool, str, str]:
"""
Run shell command with timeout.
Args:
command: Command to execute
timeout: Timeout in seconds
Returns:
Tuple of (success, stdout, stderr)
Example:
>>> success, stdout, stderr = run_command("echo hello")
>>> success
True
>>> stdout.strip()
"hello"
"""
try:
result = subprocess.run(
command,
shell=True,
capture_output=True,
text=True,
timeout=timeout
)
return (
result.returncode == 0,
result.stdout,
result.stderr
)
except subprocess.TimeoutExpired:
return (False, "", f"Command timed out after {timeout}s")
except Exception as e:
return (False, "", str(e))
def main():
"""Test helper functions."""
print("Testing helper functions...\n")
# Test formatting
print("1. Format bytes:")
print(f" 12582912 bytes = {format_bytes(12582912)}")
print(f" 1610612736 bytes = {format_bytes(1610612736)}")
print("\n2. Format duration:")
print(f" 135 seconds = {format_duration(135)}")
print(f" 5430 seconds = {format_duration(5430)}")
print("\n3. Format percentage:")
print(f" 45.567 = {format_percentage(45.567)}")
print("\n4. Calculate load score:")
score = calculate_load_score(45, 60, 40)
print(f" CPU 45%, Mem 60%, Disk 40% = {score:.2f}")
print(f" Status: {classify_load_status(score)}")
print("\n5. Classify latency:")
latencies = [25, 75, 150, 250]
for lat in latencies:
status, desc = classify_latency(lat)
print(f" {lat}ms: {status} - {desc}")
print("\n6. Parse SSH config:")
ssh_hosts = parse_ssh_config()
print(f" Found {len(ssh_hosts)} hosts")
print("\n7. Parse sshsync config:")
groups = parse_sshsync_config()
print(f" Found {len(groups)} groups")
for group, hosts in groups.items():
print(f" - {group}: {len(hosts)} hosts")
print("\n✅ All helpers tested successfully")
if __name__ == "__main__":
main()

43
scripts/utils/validators/__init__.py Normal file
View File

@@ -0,0 +1,43 @@
"""
Validators package for Tailscale SSH Sync Agent.
"""
from .parameter_validator import (
ValidationError,
validate_host,
validate_group,
validate_path_exists,
validate_timeout,
validate_command
)
from .host_validator import (
validate_ssh_config,
validate_host_reachable,
validate_group_members,
get_invalid_hosts
)
from .connection_validator import (
validate_ssh_connection,
validate_tailscale_connection,
validate_ssh_key,
get_connection_diagnostics
)
__all__ = [
'ValidationError',
'validate_host',
'validate_group',
'validate_path_exists',
'validate_timeout',
'validate_command',
'validate_ssh_config',
'validate_host_reachable',
'validate_group_members',
'get_invalid_hosts',
'validate_ssh_connection',
'validate_tailscale_connection',
'validate_ssh_key',
'get_connection_diagnostics',
]
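# Typical usage (sketch):
#   from utils.validators import validate_host, ValidationError
#   try:
#       validate_host("web-03", ["web-01", "web-02"])
#   except ValidationError as e:
#       print(e)  # message includes valid options and close-match suggestions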

275
scripts/utils/validators/connection_validator.py Normal file
View File

@@ -0,0 +1,275 @@
#!/usr/bin/env python3
"""
Connection validators for Tailscale SSH Sync Agent.
Validates SSH and Tailscale connections.
"""
import subprocess
from typing import Any, Dict, Optional
import logging
from .parameter_validator import ValidationError
logger = logging.getLogger(__name__)
def validate_ssh_connection(host: str, timeout: int = 10) -> bool:
"""
Test SSH connection works.
Args:
host: Host to connect to
timeout: Connection timeout in seconds
Returns:
True if SSH connection successful
Raises:
ValidationError: If connection fails
Example:
>>> validate_ssh_connection("web-01")
True
"""
try:
# Try to execute a simple command via SSH
result = subprocess.run(
["ssh", "-o", "ConnectTimeout={}".format(timeout),
"-o", "BatchMode=yes",
"-o", "StrictHostKeyChecking=no",
host, "echo", "test"],
capture_output=True,
text=True,
timeout=timeout + 5
)
if result.returncode == 0:
return True
else:
# Parse error message
error_msg = result.stderr.strip()
if "Permission denied" in error_msg:
raise ValidationError(
f"SSH authentication failed for '{host}'\n"
"Check:\n"
"1. SSH key is added: ssh-add -l\n"
"2. Public key is on remote: cat ~/.ssh/authorized_keys\n"
"3. User/key in SSH config is correct"
)
elif "Connection refused" in error_msg:
raise ValidationError(
f"SSH connection refused for '{host}'\n"
"Check:\n"
"1. SSH server is running on remote\n"
"2. Port 22 is not blocked by firewall"
)
elif "Connection timed out" in error_msg or "timeout" in error_msg.lower():
raise ValidationError(
f"SSH connection timed out for '{host}'\n"
"Check:\n"
"1. Host is reachable (ping test)\n"
"2. Tailscale is connected\n"
"3. Network connectivity"
)
else:
raise ValidationError(
f"SSH connection failed for '{host}': {error_msg}"
)
except subprocess.TimeoutExpired:
raise ValidationError(
f"SSH connection timed out for '{host}' (>{timeout}s)"
)
except Exception as e:
raise ValidationError(f"Error testing SSH connection to '{host}': {e}")
def validate_tailscale_connection(host: str) -> bool:
"""
Test Tailscale connectivity to host.
Args:
host: Host to check
Returns:
True if Tailscale connection active
Raises:
ValidationError: If Tailscale not connected
Example:
>>> validate_tailscale_connection("web-01")
True
"""
try:
# Check if tailscale is running
result = subprocess.run(
["tailscale", "status"],
capture_output=True,
text=True,
timeout=5
)
if result.returncode != 0:
raise ValidationError(
"Tailscale is not running\n"
"Start Tailscale: sudo tailscale up"
)
# Check if specific host is in the network
if host in result.stdout or host.replace('-', '.') in result.stdout:
return True
else:
raise ValidationError(
f"Host '{host}' not found in Tailscale network\n"
"Ensure host is:\n"
"1. Connected to Tailscale\n"
"2. In the same tailnet\n"
"3. Not expired/offline"
)
except FileNotFoundError:
raise ValidationError(
"Tailscale not installed\n"
"Install: https://tailscale.com/download"
)
except subprocess.TimeoutExpired:
raise ValidationError("Timeout checking Tailscale status")
except Exception as e:
raise ValidationError(f"Error checking Tailscale connection: {e}")
def validate_ssh_key(host: str) -> bool:
"""
Check SSH key authentication is working.
Args:
host: Host to check
Returns:
True if SSH key auth works
Raises:
ValidationError: If key auth fails
Example:
>>> validate_ssh_key("web-01")
True
"""
try:
# Test connection with explicit key-only auth
result = subprocess.run(
["ssh", "-o", "BatchMode=yes",
"-o", "PasswordAuthentication=no",
"-o", "ConnectTimeout=5",
host, "echo", "test"],
capture_output=True,
text=True,
timeout=10
)
if result.returncode == 0:
return True
else:
error_msg = result.stderr.strip()
if "Permission denied" in error_msg:
raise ValidationError(
f"SSH key authentication failed for '{host}'\n"
"Fix:\n"
"1. Add your SSH key: ssh-add ~/.ssh/id_ed25519\n"
"2. Copy public key to remote: ssh-copy-id {}\n"
"3. Verify: ssh -v {} 2>&1 | grep -i 'offering public key'".format(host, host)
)
else:
raise ValidationError(
f"SSH key validation failed for '{host}': {error_msg}"
)
except subprocess.TimeoutExpired:
raise ValidationError(f"Timeout validating SSH key for '{host}'")
except Exception as e:
raise ValidationError(f"Error validating SSH key for '{host}': {e}")
def get_connection_diagnostics(host: str) -> Dict[str, Any]:
"""
Comprehensive connection testing.
Args:
host: Host to diagnose
Returns:
Dict with diagnostic results:
{
'ping': {'success': bool, 'message': str},
'ssh': {'success': bool, 'message': str},
'tailscale': {'success': bool, 'message': str},
'ssh_key': {'success': bool, 'message': str}
}
Example:
>>> diag = get_connection_diagnostics("web-01")
>>> diag['ssh']['success']
True
"""
diagnostics = {}
# Test 1: Ping
try:
result = subprocess.run(
["ping", "-c", "1", "-W", "2", host],
capture_output=True,
timeout=3
)
diagnostics['ping'] = {
'success': result.returncode == 0,
'message': 'Host is reachable' if result.returncode == 0 else 'Host not reachable'
}
except Exception as e:
diagnostics['ping'] = {'success': False, 'message': str(e)}
# Test 2: SSH connection
try:
validate_ssh_connection(host, timeout=5)
diagnostics['ssh'] = {'success': True, 'message': 'SSH connection works'}
except ValidationError as e:
diagnostics['ssh'] = {'success': False, 'message': str(e).split('\n')[0]}
# Test 3: Tailscale
try:
validate_tailscale_connection(host)
diagnostics['tailscale'] = {'success': True, 'message': 'Tailscale connected'}
except ValidationError as e:
diagnostics['tailscale'] = {'success': False, 'message': str(e).split('\n')[0]}
# Test 4: SSH key
try:
validate_ssh_key(host)
diagnostics['ssh_key'] = {'success': True, 'message': 'SSH key authentication works'}
except ValidationError as e:
diagnostics['ssh_key'] = {'success': False, 'message': str(e).split('\n')[0]}
return diagnostics
def main():
"""Test connection validators."""
print("Testing connection validators...\n")
print("1. Testing connection diagnostics:")
try:
diag = get_connection_diagnostics("localhost")
print(" Results:")
for test, result in diag.items():
status = "" if result['success'] else ""
print(f" {status} {test}: {result['message']}")
except Exception as e:
print(f" Error: {e}")
print("\n✅ Connection validators tested")
if __name__ == "__main__":
main()

232
scripts/utils/validators/host_validator.py Normal file
View File

@@ -0,0 +1,232 @@
#!/usr/bin/env python3
"""
Host validators for Tailscale SSH Sync Agent.
Validates host configuration and availability.
"""
import subprocess
from typing import List, Dict, Optional
from pathlib import Path
import logging
from .parameter_validator import ValidationError
logger = logging.getLogger(__name__)
def validate_ssh_config(host: str, config_path: Optional[Path] = None) -> bool:
"""
Check if host has SSH config entry.
Args:
host: Host name to check
config_path: Path to SSH config (default: ~/.ssh/config)
Returns:
True if host is in SSH config
Raises:
ValidationError: If host not found in config
Example:
>>> validate_ssh_config("web-01")
True
"""
if config_path is None:
config_path = Path.home() / '.ssh' / 'config'
if not config_path.exists():
raise ValidationError(
f"SSH config file not found: {config_path}\n"
"Create ~/.ssh/config with your host definitions"
)
# Parse SSH config for this host
host_found = False
try:
with open(config_path, 'r') as f:
for line in f:
line = line.strip()
                # Match the alias exactly (substring matching would let
                # 'web' match 'web-01')
                if line.lower().startswith('host ') and host in line.split()[1:]:
                    host_found = True
                    break
if not host_found:
raise ValidationError(
f"Host '{host}' not found in SSH config: {config_path}\n"
"Add host to SSH config:\n"
f"Host {host}\n"
f" HostName <IP_ADDRESS>\n"
f" User <USERNAME>"
)
return True
except IOError as e:
raise ValidationError(f"Error reading SSH config: {e}")
def validate_host_reachable(host: str, timeout: int = 5) -> bool:
"""
Check if host is reachable via ping.
Args:
host: Host name to check
timeout: Timeout in seconds
Returns:
True if host is reachable
Raises:
ValidationError: If host is not reachable
Example:
>>> validate_host_reachable("web-01", timeout=5)
True
"""
try:
# Try to resolve via SSH config first
result = subprocess.run(
["ssh", "-G", host],
capture_output=True,
text=True,
timeout=2
)
if result.returncode == 0:
# Extract hostname from SSH config
for line in result.stdout.split('\n'):
if line.startswith('hostname '):
actual_host = line.split()[1]
break
else:
actual_host = host
else:
actual_host = host
# Ping the host
ping_result = subprocess.run(
["ping", "-c", "1", "-W", str(timeout), actual_host],
capture_output=True,
text=True,
timeout=timeout + 1
)
if ping_result.returncode == 0:
return True
else:
raise ValidationError(
f"Host '{host}' ({actual_host}) is not reachable\n"
"Check:\n"
"1. Host is powered on\n"
"2. Tailscale is connected\n"
"3. Network connectivity"
)
except subprocess.TimeoutExpired:
raise ValidationError(f"Timeout checking host '{host}' (>{timeout}s)")
except Exception as e:
raise ValidationError(f"Error checking host '{host}': {e}")
def validate_group_members(group: str, groups_config: Dict[str, List[str]]) -> List[str]:
"""
Ensure group has valid members.
Args:
group: Group name
groups_config: Groups configuration dict
Returns:
List of valid hosts in group
Raises:
ValidationError: If group is empty or has no valid members
Example:
>>> groups = {'production': ['web-01', 'db-01']}
>>> validate_group_members('production', groups)
['web-01', 'db-01']
"""
if group not in groups_config:
raise ValidationError(
f"Group '{group}' not found in configuration\n"
f"Available groups: {', '.join(groups_config.keys())}"
)
members = groups_config[group]
if not members:
raise ValidationError(
f"Group '{group}' has no members\n"
f"Add hosts to group with: sshsync gadd {group}"
)
if not isinstance(members, list):
raise ValidationError(
f"Invalid group configuration for '{group}': members must be a list"
)
return members
def get_invalid_hosts(hosts: List[str], config_path: Optional[Path] = None) -> List[str]:
"""
Find hosts without valid SSH config.
Args:
hosts: List of host names
config_path: Path to SSH config
Returns:
List of hosts without valid config
Example:
>>> get_invalid_hosts(["web-01", "nonexistent"])
["nonexistent"]
"""
if config_path is None:
config_path = Path.home() / '.ssh' / 'config'
if not config_path.exists():
return hosts # All invalid if no config
# Parse SSH config
valid_hosts = set()
try:
with open(config_path, 'r') as f:
for line in f:
line = line.strip()
if line.lower().startswith('host '):
host_alias = line.split(maxsplit=1)[1]
if '*' not in host_alias and '?' not in host_alias:
valid_hosts.add(host_alias)
except IOError:
return hosts
# Find invalid hosts
return [h for h in hosts if h not in valid_hosts]
def main():
"""Test host validators."""
print("Testing host validators...\n")
print("1. Testing validate_ssh_config():")
try:
validate_ssh_config("localhost")
print(" ✓ localhost has SSH config")
except ValidationError as e:
print(f" Note: {e.args[0].split(chr(10))[0]}")
print("\n2. Testing get_invalid_hosts():")
test_hosts = ["localhost", "nonexistent-host-12345"]
invalid = get_invalid_hosts(test_hosts)
print(f" Invalid hosts: {invalid}")
print("\n✅ Host validators tested")
if __name__ == "__main__":
main()

363
scripts/utils/validators/parameter_validator.py Normal file
View File

@@ -0,0 +1,363 @@
#!/usr/bin/env python3
"""
Parameter validators for Tailscale SSH Sync Agent.
Validates user inputs before making operations.
"""
from typing import List, Optional
from pathlib import Path
import re
import logging
logger = logging.getLogger(__name__)
class ValidationError(Exception):
"""Raised when validation fails."""
pass
def validate_host(host: str, valid_hosts: Optional[List[str]] = None) -> str:
"""
Validate host parameter.
Args:
host: Host name or alias
valid_hosts: List of valid hosts (None to skip check)
Returns:
str: Validated and normalized host name
Raises:
ValidationError: If host is invalid
Example:
>>> validate_host("web-01")
"web-01"
>>> validate_host("web-01", ["web-01", "web-02"])
"web-01"
"""
if not host:
raise ValidationError("Host cannot be empty")
if not isinstance(host, str):
raise ValidationError(f"Host must be string, got {type(host)}")
# Normalize (strip whitespace, lowercase for comparison)
host = host.strip()
# Basic validation: alphanumeric, dash, underscore, dot
if not re.match(r'^[a-zA-Z0-9._-]+$', host):
raise ValidationError(
f"Invalid host name format: {host}\n"
"Host names must contain only letters, numbers, dots, dashes, and underscores"
)
# Check if valid (if list provided)
if valid_hosts:
# Try exact match first
if host in valid_hosts:
return host
# Try case-insensitive match
for valid_host in valid_hosts:
if host.lower() == valid_host.lower():
return valid_host
# Not found - provide suggestions
suggestions = [h for h in valid_hosts if host[:3].lower() in h.lower()]
raise ValidationError(
f"Invalid host: {host}\n"
f"Valid options: {', '.join(valid_hosts[:10])}\n"
+ (f"Did you mean: {', '.join(suggestions[:3])}?" if suggestions else "")
)
return host
def validate_group(group: str, valid_groups: Optional[List[str]] = None) -> str:
"""
Validate group parameter.
Args:
group: Group name
valid_groups: List of valid groups (None to skip check)
Returns:
str: Validated group name
Raises:
ValidationError: If group is invalid
Example:
>>> validate_group("production")
"production"
>>> validate_group("prod", ["production", "development"])
ValidationError: Invalid group: prod
"""
if not group:
raise ValidationError("Group cannot be empty")
if not isinstance(group, str):
raise ValidationError(f"Group must be string, got {type(group)}")
# Normalize
group = group.strip().lower()
# Basic validation
if not re.match(r'^[a-z0-9_-]+$', group):
raise ValidationError(
f"Invalid group name format: {group}\n"
"Group names must contain only lowercase letters, numbers, dashes, and underscores"
)
# Check if valid (if list provided)
if valid_groups:
if group not in valid_groups:
suggestions = [g for g in valid_groups if group[:3] in g]
raise ValidationError(
f"Invalid group: {group}\n"
f"Valid groups: {', '.join(valid_groups)}\n"
+ (f"Did you mean: {', '.join(suggestions[:3])}?" if suggestions else "")
)
return group
def validate_path_exists(path: str, must_be_file: bool = False,
must_be_dir: bool = False) -> Path:
"""
Validate path exists and is accessible.
Args:
path: Path to validate
must_be_file: If True, path must be a file
must_be_dir: If True, path must be a directory
Returns:
Path: Validated Path object
Raises:
ValidationError: If path is invalid
Example:
>>> validate_path_exists("/tmp", must_be_dir=True)
Path('/tmp')
>>> validate_path_exists("/nonexistent")
ValidationError: Path does not exist: /nonexistent
"""
if not path:
raise ValidationError("Path cannot be empty")
p = Path(path).expanduser().resolve()
if not p.exists():
raise ValidationError(
f"Path does not exist: {path}\n"
f"Resolved to: {p}"
)
if must_be_file and not p.is_file():
raise ValidationError(f"Path must be a file: {path}")
if must_be_dir and not p.is_dir():
raise ValidationError(f"Path must be a directory: {path}")
return p
def validate_timeout(timeout: int, min_timeout: int = 1,
max_timeout: int = 600) -> int:
"""
Validate timeout parameter.
Args:
timeout: Timeout in seconds
min_timeout: Minimum allowed timeout
max_timeout: Maximum allowed timeout
Returns:
int: Validated timeout
Raises:
ValidationError: If timeout is invalid
Example:
>>> validate_timeout(10)
10
>>> validate_timeout(0)
ValidationError: Timeout must be between 1 and 600 seconds
"""
if not isinstance(timeout, int):
raise ValidationError(f"Timeout must be integer, got {type(timeout)}")
if timeout < min_timeout:
raise ValidationError(
f"Timeout too low: {timeout}s (minimum: {min_timeout}s)"
)
if timeout > max_timeout:
raise ValidationError(
f"Timeout too high: {timeout}s (maximum: {max_timeout}s)"
)
return timeout
def validate_command(command: str, allow_dangerous: bool = False) -> str:
"""
Basic command safety validation.
Args:
command: Command to validate
allow_dangerous: If False, block potentially dangerous commands
Returns:
str: Validated command
Raises:
ValidationError: If command is invalid or dangerous
Example:
>>> validate_command("ls -la")
"ls -la"
>>> validate_command("rm -rf /", allow_dangerous=False)
ValidationError: Potentially dangerous command blocked: rm -rf
"""
if not command:
raise ValidationError("Command cannot be empty")
if not isinstance(command, str):
raise ValidationError(f"Command must be string, got {type(command)}")
command = command.strip()
if not allow_dangerous:
# Check for dangerous patterns
dangerous_patterns = [
(r'\brm\s+-rf\s+/', "rm -rf on root directory"),
(r'\bmkfs\.', "filesystem formatting"),
(r'\bdd\s+.*of=/dev/', "disk writing with dd"),
(r':(){:|:&};:', "fork bomb"),
(r'>\s*/dev/sd[a-z]', "direct disk writing"),
]
for pattern, description in dangerous_patterns:
if re.search(pattern, command, re.IGNORECASE):
raise ValidationError(
f"Potentially dangerous command blocked: {description}\n"
f"Command: {command}\n"
"Use allow_dangerous=True if you really want to execute this"
)
return command
def validate_hosts_list(hosts: List[str], valid_hosts: Optional[List[str]] = None) -> List[str]:
"""
Validate a list of hosts.
Args:
hosts: List of host names
valid_hosts: List of valid hosts (None to skip check)
Returns:
List[str]: Validated host names
Raises:
ValidationError: If any host is invalid
Example:
>>> validate_hosts_list(["web-01", "web-02"])
["web-01", "web-02"]
"""
if not hosts:
raise ValidationError("Hosts list cannot be empty")
if not isinstance(hosts, list):
raise ValidationError(f"Hosts must be list, got {type(hosts)}")
validated = []
errors = []
for host in hosts:
try:
validated.append(validate_host(host, valid_hosts))
except ValidationError as e:
errors.append(str(e))
if errors:
raise ValidationError(
f"Invalid hosts in list:\n" + "\n".join(errors)
)
return validated
def main():
"""Test validators."""
print("Testing parameter validators...\n")
# Test host validation
print("1. Testing validate_host():")
try:
host = validate_host("web-01", ["web-01", "web-02", "db-01"])
print(f" ✓ Valid host: {host}")
except ValidationError as e:
print(f" ✗ Error: {e}")
try:
host = validate_host("invalid-host", ["web-01", "web-02"])
print(f" ✗ Should have failed!")
except ValidationError as e:
print(f" ✓ Correctly rejected: {e.args[0].split(chr(10))[0]}")
# Test group validation
print("\n2. Testing validate_group():")
try:
group = validate_group("production", ["production", "development"])
print(f" ✓ Valid group: {group}")
except ValidationError as e:
print(f" ✗ Error: {e}")
# Test path validation
print("\n3. Testing validate_path_exists():")
try:
path = validate_path_exists("/tmp", must_be_dir=True)
print(f" ✓ Valid path: {path}")
except ValidationError as e:
print(f" ✗ Error: {e}")
# Test timeout validation
print("\n4. Testing validate_timeout():")
try:
timeout = validate_timeout(10)
print(f" ✓ Valid timeout: {timeout}s")
except ValidationError as e:
print(f" ✗ Error: {e}")
try:
timeout = validate_timeout(0)
print(f" ✗ Should have failed!")
except ValidationError as e:
print(f" ✓ Correctly rejected: {e.args[0].split(chr(10))[0]}")
# Test command validation
print("\n5. Testing validate_command():")
try:
cmd = validate_command("ls -la")
print(f" ✓ Safe command: {cmd}")
except ValidationError as e:
print(f" ✗ Error: {e}")
try:
cmd = validate_command("rm -rf /", allow_dangerous=False)
print(f" ✗ Should have failed!")
except ValidationError as e:
print(f" ✓ Correctly blocked: {e.args[0].split(chr(10))[0]}")
print("\n✅ All parameter validators tested")
if __name__ == "__main__":
main()

445
scripts/workflow_executor.py Normal file
View File

@@ -0,0 +1,445 @@
#!/usr/bin/env python3
"""
Workflow executor for Tailscale SSH Sync Agent.
Common multi-machine workflow automation.
"""
import sys
from pathlib import Path
from typing import Dict, List, Optional
import time
import logging
# Add utils to path
sys.path.insert(0, str(Path(__file__).parent))
from utils.helpers import format_duration, get_timestamp
from sshsync_wrapper import execute_on_group, execute_on_host, push_to_hosts
from load_balancer import get_group_capacity
logger = logging.getLogger(__name__)
def deploy_workflow(code_path: str,
staging_group: str,
prod_group: str,
run_tests: bool = True) -> Dict:
"""
Full deployment pipeline: staging → test → production.
Args:
code_path: Path to code to deploy
staging_group: Staging server group
prod_group: Production server group
run_tests: Whether to run tests on staging
Returns:
Dict with deployment results
Example:
>>> result = deploy_workflow("./dist", "staging", "production")
>>> result['success']
True
>>> result['duration']
"12m 45s"
"""
start_time = time.time()
results = {
'stages': {},
'success': False,
'start_time': get_timestamp()
}
try:
# Stage 1: Deploy to staging
logger.info("Stage 1: Deploying to staging...")
stage1 = push_to_hosts(
local_path=code_path,
remote_path="/var/www/app",
group=staging_group,
recurse=True
)
results['stages']['staging_deploy'] = stage1
if not stage1.get('success'):
results['error'] = 'Staging deployment failed'
return results
# Build on staging
logger.info("Building on staging...")
build_result = execute_on_group(
staging_group,
"cd /var/www/app && npm run build",
timeout=300
)
results['stages']['staging_build'] = build_result
if not build_result.get('success'):
results['error'] = 'Staging build failed'
return results
# Stage 2: Run tests (if enabled)
if run_tests:
logger.info("Stage 2: Running tests...")
test_result = execute_on_group(
staging_group,
"cd /var/www/app && npm test",
timeout=600
)
results['stages']['tests'] = test_result
if not test_result.get('success'):
results['error'] = 'Tests failed on staging'
return results
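        # Note: the "|| echo" below keeps this command from ever failing, so
        # staging validation is advisory: it is recorded in the results, but a
        # bad health check does not abort the deploy.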
# Stage 3: Validation
logger.info("Stage 3: Validating staging...")
health_result = execute_on_group(
staging_group,
"curl -f http://localhost:3000/health || echo 'Health check failed'",
timeout=10
)
results['stages']['staging_validation'] = health_result
# Stage 4: Deploy to production
logger.info("Stage 4: Deploying to production...")
prod_deploy = push_to_hosts(
local_path=code_path,
remote_path="/var/www/app",
group=prod_group,
recurse=True
)
results['stages']['production_deploy'] = prod_deploy
if not prod_deploy.get('success'):
results['error'] = 'Production deployment failed'
return results
# Build and restart on production
logger.info("Building and restarting production...")
prod_build = execute_on_group(
prod_group,
"cd /var/www/app && npm run build && pm2 restart app",
timeout=300
)
results['stages']['production_build'] = prod_build
# Stage 5: Production verification
logger.info("Stage 5: Verifying production...")
prod_health = execute_on_group(
prod_group,
"curl -f http://localhost:3000/health",
timeout=15
)
results['stages']['production_verification'] = prod_health
# Success!
results['success'] = True
results['duration'] = format_duration(time.time() - start_time)
return results
except Exception as e:
logger.error(f"Deployment workflow error: {e}")
results['error'] = str(e)
results['duration'] = format_duration(time.time() - start_time)
return results
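# Usage sketch (group names are illustrative and must exist in the sshsync
# config). Early failures return with 'error' set and may omit the
# 'duration' key, so inspect the result defensively:
#
#   result = deploy_workflow("./dist", "staging", "production")
#   if result['success']:
#       print(f"Deployed in {result['duration']}")
#   else:
#       print(f"Failed: {result.get('error')}")
#       for stage, outcome in result['stages'].items():
#           print(f"  {stage}: {'ok' if outcome.get('success') else 'FAILED'}")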
def backup_workflow(hosts: List[str],
backup_paths: List[str],
destination: str) -> Dict:
"""
Backup files from multiple hosts.
Args:
hosts: List of hosts to backup from
backup_paths: Paths to backup on each host
destination: Local destination directory
Returns:
Dict with backup results
Example:
>>> result = backup_workflow(
... ["db-01", "db-02"],
... ["/var/lib/mysql"],
... "./backups"
... )
>>> result['backed_up_hosts']
2
"""
from sshsync_wrapper import pull_from_host
start_time = time.time()
results = {
'hosts': {},
'success': True,
'backed_up_hosts': 0
}
for host in hosts:
host_results = []
for backup_path in backup_paths:
# Create timestamped backup directory
timestamp = time.strftime("%Y%m%d_%H%M%S")
host_dest = f"{destination}/{host}_{timestamp}"
result = pull_from_host(
host=host,
remote_path=backup_path,
local_path=host_dest,
recurse=True
)
host_results.append(result)
if not result.get('success'):
results['success'] = False
results['hosts'][host] = host_results
if all(r.get('success') for r in host_results):
results['backed_up_hosts'] += 1
results['duration'] = format_duration(time.time() - start_time)
return results
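# Resulting layout (illustrative): each pull lands in its own timestamped
# directory, e.g. ./backups/db-01_20251019_143000/. Because the timestamp is
# taken per pull, multiple paths from the same host may land in directories
# with slightly different timestamps.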
def sync_workflow(source_host: str,
target_group: str,
paths: List[str]) -> Dict:
"""
Sync files from one host to many.
Args:
source_host: Host to pull from
target_group: Group to push to
paths: Paths to sync
Returns:
Dict with sync results
Example:
>>> result = sync_workflow(
... "master-db",
... "replica-dbs",
... ["/var/lib/mysql/config"]
... )
>>> result['success']
True
"""
from sshsync_wrapper import pull_from_host, push_to_hosts
import tempfile
import shutil
start_time = time.time()
results = {'paths': {}, 'success': True}
# Create temp directory
with tempfile.TemporaryDirectory() as temp_dir:
for path in paths:
# Pull from source
pull_result = pull_from_host(
host=source_host,
remote_path=path,
local_path=f"{temp_dir}/{Path(path).name}",
recurse=True
)
if not pull_result.get('success'):
results['paths'][path] = {
'success': False,
'error': 'Pull from source failed'
}
results['success'] = False
continue
# Push to targets
push_result = push_to_hosts(
local_path=f"{temp_dir}/{Path(path).name}",
remote_path=path,
group=target_group,
recurse=True
)
results['paths'][path] = {
'pull': pull_result,
'push': push_result,
'success': push_result.get('success', False)
}
if not push_result.get('success'):
results['success'] = False
results['duration'] = format_duration(time.time() - start_time)
return results
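# Design note: files are staged through a local TemporaryDirectory, so the
# machine running this workflow needs enough free disk for the largest path
# being synced; the staging copy is removed automatically when the context
# manager exits.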
def rolling_restart(group: str,
service_name: str,
wait_between: int = 30) -> Dict:
"""
Zero-downtime rolling restart of a service across group.
Args:
group: Group to restart
service_name: Service name (e.g., "nginx", "app")
wait_between: Seconds to wait between restarts
Returns:
Dict with restart results
Example:
>>> result = rolling_restart("web-servers", "nginx")
>>> result['restarted_count']
3
"""
from utils.helpers import parse_sshsync_config
start_time = time.time()
groups_config = parse_sshsync_config()
hosts = groups_config.get(group, [])
if not hosts:
return {
'success': False,
'error': f'Group {group} not found or empty'
}
results = {
'hosts': {},
'restarted_count': 0,
'failed_count': 0,
'success': True
}
for host in hosts:
logger.info(f"Restarting {service_name} on {host}...")
# Restart service
restart_result = execute_on_host(
host,
f"sudo systemctl restart {service_name} || sudo service {service_name} restart",
timeout=30
)
# Health check
time.sleep(5) # Wait for service to start
health_result = execute_on_host(
host,
f"sudo systemctl is-active {service_name} || sudo service {service_name} status",
timeout=10
)
success = restart_result.get('success') and health_result.get('success')
results['hosts'][host] = {
'restart': restart_result,
'health': health_result,
'success': success
}
if success:
results['restarted_count'] += 1
logger.info(f"{host} restarted successfully")
else:
results['failed_count'] += 1
results['success'] = False
logger.error(f"{host} restart failed")
# Wait before next restart (except last)
if host != hosts[-1]:
time.sleep(wait_between)
results['duration'] = format_duration(time.time() - start_time)
return results
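# Design note: hosts restart strictly one at a time, and a failed restart
# does not stop the roll; it is counted in results['failed_count'] and the
# loop continues. Tune wait_between to however long the service needs to
# become healthy before the next node goes down.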
def health_check_workflow(group: str,
endpoint: str = "/health",
timeout: int = 10) -> Dict:
"""
Check health endpoint across group.
Args:
group: Group to check
endpoint: Health endpoint path
timeout: Request timeout
Returns:
Dict with health check results
Example:
>>> result = health_check_workflow("production", "/health")
>>> result['healthy_count']
3
"""
from utils.helpers import parse_sshsync_config
groups_config = parse_sshsync_config()
hosts = groups_config.get(group, [])
if not hosts:
return {
'success': False,
'error': f'Group {group} not found or empty'
}
results = {
'hosts': {},
'healthy_count': 0,
'unhealthy_count': 0
}
for host in hosts:
health_result = execute_on_host(
host,
f"curl -f -s -o /dev/null -w '%{{http_code}}' http://localhost:3000{endpoint}",
timeout=timeout
)
is_healthy = (
health_result.get('success') and
'200' in health_result.get('stdout', '')
)
results['hosts'][host] = {
'healthy': is_healthy,
'response': health_result.get('stdout', '').strip()
}
if is_healthy:
results['healthy_count'] += 1
else:
results['unhealthy_count'] += 1
results['success'] = results['unhealthy_count'] == 0
return results
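# Note: the probe assumes every host serves the endpoint on
# http://localhost:3000; adjust the curl target if your services listen
# elsewhere. A host counts as healthy only when curl exits successfully and
# the captured status code contains 200.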
def main():
    """Smoke-test workflow executor wiring via a dry run."""
    print("Testing workflow executor...\n")
    print("Note: Workflow executor requires configured hosts and groups.")
    print("Real runs would execute live operations, so only a dry run is shown.\n")
    from sshsync_wrapper import execute_on_all
    result = execute_on_all("uptime", dry_run=True)
    print(f"Dry-run: {result.get('message', result)}\n")
    print("✅ Workflow executor ready")
if __name__ == "__main__":
main()

180
tests/test_helpers.py Normal file
View File

@@ -0,0 +1,180 @@
#!/usr/bin/env python3
"""
Tests for helper utilities.
"""
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent / 'scripts'))
from utils.helpers import *
def test_format_bytes():
"""Test byte formatting."""
assert format_bytes(0) == "0.0 B"
assert format_bytes(512) == "512.0 B"
assert format_bytes(1024) == "1.0 KB"
assert format_bytes(1048576) == "1.0 MB"
assert format_bytes(1073741824) == "1.0 GB"
print("✓ format_bytes() passed")
return True
def test_format_duration():
"""Test duration formatting."""
assert format_duration(30) == "30s"
assert format_duration(65) == "1m 5s"
assert format_duration(3600) == "1h"
assert format_duration(3665) == "1h 1m"
assert format_duration(7265) == "2h 1m"
print("✓ format_duration() passed")
return True
def test_format_percentage():
"""Test percentage formatting."""
assert format_percentage(45.567) == "45.6%"
assert format_percentage(100) == "100.0%"
assert format_percentage(0.123, decimals=2) == "0.12%"
print("✓ format_percentage() passed")
return True
def test_calculate_load_score():
"""Test load score calculation."""
score = calculate_load_score(50, 50, 50)
assert 0 <= score <= 1
assert abs(score - 0.5) < 0.01
score_low = calculate_load_score(20, 30, 25)
score_high = calculate_load_score(80, 85, 90)
assert score_low < score_high
print("✓ calculate_load_score() passed")
return True
def test_classify_load_status():
"""Test load status classification."""
assert classify_load_status(0.2) == "low"
assert classify_load_status(0.5) == "moderate"
assert classify_load_status(0.8) == "high"
print("✓ classify_load_status() passed")
return True
def test_classify_latency():
"""Test latency classification."""
status, desc = classify_latency(25)
assert status == "excellent"
assert "interactive" in desc.lower()
status, desc = classify_latency(150)
assert status == "fair"
print("✓ classify_latency() passed")
return True
def test_parse_disk_usage():
"""Test disk usage parsing."""
sample_output = """Filesystem Size Used Avail Use% Mounted on
/dev/sda1 100G 45G 50G 45% /"""
result = parse_disk_usage(sample_output)
assert result['filesystem'] == '/dev/sda1'
assert result['size'] == '100G'
assert result['used'] == '45G'
assert result['use_pct'] == 45
print("✓ parse_disk_usage() passed")
return True
def test_parse_cpu_load():
"""Test CPU load parsing."""
sample_output = "19:43:41 up 5 days, 2:15, 3 users, load average: 0.45, 0.38, 0.32"
result = parse_cpu_load(sample_output)
assert result['load_1min'] == 0.45
assert result['load_5min'] == 0.38
assert result['load_15min'] == 0.32
print("✓ parse_cpu_load() passed")
return True
def test_get_timestamp():
"""Test timestamp generation."""
ts_iso = get_timestamp(iso=True)
assert 'T' in ts_iso
assert 'Z' in ts_iso
ts_human = get_timestamp(iso=False)
assert ' ' in ts_human
assert len(ts_human) == 19 # YYYY-MM-DD HH:MM:SS
print("✓ get_timestamp() passed")
return True
def test_validate_path():
"""Test path validation."""
assert validate_path("/tmp", must_exist=True) == True
assert validate_path("/nonexistent_path_12345", must_exist=False) == False
print("✓ validate_path() passed")
return True
def test_safe_execute():
"""Test safe execution wrapper."""
# Should return result on success
result = safe_execute(int, "42")
assert result == 42
# Should return default on failure
result = safe_execute(int, "not_a_number", default=0)
assert result == 0
print("✓ safe_execute() passed")
return True
def main():
"""Run all helper tests."""
print("=" * 70)
print("HELPER TESTS")
print("=" * 70)
tests = [
test_format_bytes,
test_format_duration,
test_format_percentage,
test_calculate_load_score,
test_classify_load_status,
test_classify_latency,
test_parse_disk_usage,
test_parse_cpu_load,
test_get_timestamp,
test_validate_path,
test_safe_execute,
]
passed = 0
for test in tests:
try:
if test():
passed += 1
except Exception as e:
print(f"{test.__name__} failed: {e}")
print(f"\nResults: {passed}/{len(tests)} passed")
return passed == len(tests)
if __name__ == "__main__":
success = main()
sys.exit(0 if success else 1)

346
tests/test_integration.py Normal file
View File

@@ -0,0 +1,346 @@
#!/usr/bin/env python3
"""
Integration tests for Tailscale SSH Sync Agent.
Tests complete workflows from query to result.
"""
import sys
from pathlib import Path
# Add scripts to path
sys.path.insert(0, str(Path(__file__).parent.parent / 'scripts'))
from sshsync_wrapper import get_host_status, list_hosts, get_groups
from tailscale_manager import get_tailscale_status, get_network_summary
from load_balancer import format_load_report, MachineMetrics
from utils.helpers import (
format_bytes, format_duration, format_percentage,
calculate_load_score, classify_load_status, classify_latency
)
def test_host_status_basic():
"""Test get_host_status() without errors."""
print("\n✓ Testing get_host_status()...")
try:
result = get_host_status()
# Validations
assert 'hosts' in result, "Missing 'hosts' in result"
assert isinstance(result.get('hosts', []), list), "'hosts' must be list"
# Should have basic counts even if no hosts configured
assert 'total_count' in result, "Missing 'total_count'"
assert 'online_count' in result, "Missing 'online_count'"
assert 'offline_count' in result, "Missing 'offline_count'"
print(f" ✓ Found {result.get('total_count', 0)} hosts")
print(f" ✓ Online: {result.get('online_count', 0)}")
print(f" ✓ Offline: {result.get('offline_count', 0)}")
return True
except Exception as e:
print(f" ✗ FAILED: {e}")
import traceback
traceback.print_exc()
return False
def test_list_hosts():
"""Test list_hosts() function."""
print("\n✓ Testing list_hosts()...")
try:
result = list_hosts(with_status=False)
assert 'hosts' in result, "Missing 'hosts' in result"
assert 'count' in result, "Missing 'count' in result"
assert isinstance(result['hosts'], list), "'hosts' must be list"
print(f" ✓ List hosts working")
print(f" ✓ Found {result['count']} configured hosts")
return True
except Exception as e:
print(f" ✗ FAILED: {e}")
return False
def test_get_groups():
"""Test get_groups() function."""
print("\n✓ Testing get_groups()...")
try:
groups = get_groups()
assert isinstance(groups, dict), "Groups must be dict"
print(f" ✓ Groups config loaded")
print(f" ✓ Found {len(groups)} groups")
for group, hosts in list(groups.items())[:3]: # Show first 3
print(f" - {group}: {len(hosts)} hosts")
return True
except Exception as e:
print(f" ✗ FAILED: {e}")
return False
def test_tailscale_status():
"""Test Tailscale status check."""
print("\n✓ Testing get_tailscale_status()...")
try:
status = get_tailscale_status()
assert isinstance(status, dict), "Status must be dict"
assert 'connected' in status, "Missing 'connected' field"
if status.get('connected'):
print(f" ✓ Tailscale connected")
print(f" ✓ Peers: {status.get('total_count', 0)} total, {status.get('online_count', 0)} online")
else:
print(f" Tailscale not connected: {status.get('error', 'Unknown')}")
print(f" (This is OK if Tailscale is not installed/configured)")
return True
except Exception as e:
print(f" ✗ FAILED: {e}")
return False
def test_network_summary():
"""Test network summary generation."""
print("\n✓ Testing get_network_summary()...")
try:
summary = get_network_summary()
assert isinstance(summary, str), "Summary must be string"
assert len(summary) > 0, "Summary cannot be empty"
print(f" ✓ Network summary generated:")
for line in summary.split('\n'):
print(f" {line}")
return True
except Exception as e:
print(f" ✗ FAILED: {e}")
return False
def test_format_helpers():
"""Test formatting helper functions."""
print("\n✓ Testing format helpers...")
try:
# Test format_bytes
assert format_bytes(1024) == "1.0 KB", "format_bytes failed for 1024"
assert format_bytes(12582912) == "12.0 MB", "format_bytes failed for 12MB"
# Test format_duration
assert format_duration(65) == "1m 5s", "format_duration failed for 65s"
assert format_duration(3665) == "1h 1m", "format_duration failed for 1h+"
# Test format_percentage
assert format_percentage(45.567) == "45.6%", "format_percentage failed"
print(f" ✓ format_bytes(12582912) = {format_bytes(12582912)}")
print(f" ✓ format_duration(3665) = {format_duration(3665)}")
print(f" ✓ format_percentage(45.567) = {format_percentage(45.567)}")
return True
except Exception as e:
print(f" ✗ FAILED: {e}")
return False
def test_load_score_calculation():
"""Test load score calculation."""
print("\n✓ Testing calculate_load_score()...")
try:
# Test various scenarios
score1 = calculate_load_score(45, 60, 40)
assert 0 <= score1 <= 1, "Score must be 0-1"
assert abs(score1 - 0.49) < 0.01, f"Expected ~0.49, got {score1}"
score2 = calculate_load_score(20, 35, 30)
assert score2 < score1, "Lower usage should have lower score"
score3 = calculate_load_score(85, 70, 65)
assert score3 > score1, "Higher usage should have higher score"
print(f" ✓ Low load (20%, 35%, 30%): {score2:.2f}")
print(f" ✓ Med load (45%, 60%, 40%): {score1:.2f}")
print(f" ✓ High load (85%, 70%, 65%): {score3:.2f}")
return True
except Exception as e:
print(f" ✗ FAILED: {e}")
return False
def test_load_classification():
"""Test load status classification."""
print("\n✓ Testing classify_load_status()...")
try:
assert classify_load_status(0.28) == "low", "0.28 should be 'low'"
assert classify_load_status(0.55) == "moderate", "0.55 should be 'moderate'"
assert classify_load_status(0.82) == "high", "0.82 should be 'high'"
print(f" ✓ Score 0.28 = {classify_load_status(0.28)}")
print(f" ✓ Score 0.55 = {classify_load_status(0.55)}")
print(f" ✓ Score 0.82 = {classify_load_status(0.82)}")
return True
except Exception as e:
print(f" ✗ FAILED: {e}")
return False
def test_latency_classification():
"""Test network latency classification."""
print("\n✓ Testing classify_latency()...")
try:
status1, desc1 = classify_latency(25)
assert status1 == "excellent", "25ms should be 'excellent'"
status2, desc2 = classify_latency(75)
assert status2 == "good", "75ms should be 'good'"
status3, desc3 = classify_latency(150)
assert status3 == "fair", "150ms should be 'fair'"
status4, desc4 = classify_latency(250)
assert status4 == "poor", "250ms should be 'poor'"
print(f" ✓ 25ms: {status1} - {desc1}")
print(f" ✓ 75ms: {status2} - {desc2}")
print(f" ✓ 150ms: {status3} - {desc3}")
print(f" ✓ 250ms: {status4} - {desc4}")
return True
except Exception as e:
print(f" ✗ FAILED: {e}")
return False
def test_load_report_formatting():
"""Test load report formatting."""
print("\n✓ Testing format_load_report()...")
try:
metrics = MachineMetrics(
host='web-01',
cpu_pct=45.0,
mem_pct=60.0,
disk_pct=40.0,
load_score=0.49,
status='moderate'
)
report = format_load_report(metrics)
assert 'web-01' in report, "Report must include hostname"
assert '0.49' in report, "Report must include load score"
assert 'moderate' in report, "Report must include status"
print(f" ✓ Report generated:")
for line in report.split('\n'):
print(f" {line}")
return True
except Exception as e:
print(f" ✗ FAILED: {e}")
return False
def test_dry_run_execution():
"""Test dry-run mode for operations."""
print("\n✓ Testing dry-run execution...")
try:
from sshsync_wrapper import execute_on_all
result = execute_on_all("uptime", dry_run=True)
assert result.get('dry_run') == True, "Must indicate dry-run mode"
assert 'command' in result, "Must include command"
assert 'message' in result, "Must include message"
print(f" ✓ Dry-run mode working")
print(f" ✓ Command: {result.get('command')}")
print(f" ✓ Message: {result.get('message')}")
return True
except Exception as e:
print(f" ✗ FAILED: {e}")
return False
def main():
"""Run all integration tests."""
print("=" * 70)
print("INTEGRATION TESTS - Tailscale SSH Sync Agent")
print("=" * 70)
tests = [
("Host status check", test_host_status_basic),
("List hosts", test_list_hosts),
("Get groups", test_get_groups),
("Tailscale status", test_tailscale_status),
("Network summary", test_network_summary),
("Format helpers", test_format_helpers),
("Load score calculation", test_load_score_calculation),
("Load classification", test_load_classification),
("Latency classification", test_latency_classification),
("Load report formatting", test_load_report_formatting),
("Dry-run execution", test_dry_run_execution),
]
results = []
for test_name, test_func in tests:
passed = test_func()
results.append((test_name, passed))
# Summary
print("\n" + "=" * 70)
print("SUMMARY")
print("=" * 70)
for test_name, passed in results:
status = "✅ PASS" if passed else "❌ FAIL"
print(f"{status}: {test_name}")
passed_count = sum(1 for _, p in results if p)
total_count = len(results)
print(f"\nResults: {passed_count}/{total_count} passed")
if passed_count == total_count:
print("\n🎉 All tests passed!")
else:
print(f"\n⚠️ {total_count - passed_count} test(s) failed")
return passed_count == total_count
if __name__ == "__main__":
success = main()
sys.exit(0 if success else 1)

177
tests/test_validation.py Normal file
View File

@@ -0,0 +1,177 @@
#!/usr/bin/env python3
"""
Tests for validators.
"""
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent / 'scripts'))
from utils.validators import *
def test_validate_host():
"""Test host validation."""
# Valid host
assert validate_host("web-01") == "web-01"
assert validate_host(" web-01 ") == "web-01" # Strips whitespace
# With valid list
assert validate_host("web-01", ["web-01", "web-02"]) == "web-01"
# Invalid format
try:
validate_host("web@01") # Invalid character
assert False, "Should have raised ValidationError"
except ValidationError:
pass
print("✓ validate_host() passed")
return True
def test_validate_group():
"""Test group validation."""
# Valid group
assert validate_group("production") == "production"
assert validate_group("PRODUCTION") == "production" # Lowercase normalization
# With valid list
assert validate_group("production", ["production", "staging"]) == "production"
# Invalid
try:
validate_group("invalid!", ["production"])
assert False, "Should have raised ValidationError"
except ValidationError:
pass
print("✓ validate_group() passed")
return True
def test_validate_path_exists():
"""Test path existence validation."""
# Valid path
path = validate_path_exists("/tmp", must_be_dir=True)
assert isinstance(path, Path)
# Invalid path
try:
validate_path_exists("/nonexistent_12345")
assert False, "Should have raised ValidationError"
except ValidationError:
pass
print("✓ validate_path_exists() passed")
return True
def test_validate_timeout():
"""Test timeout validation."""
# Valid timeouts
assert validate_timeout(10) == 10
assert validate_timeout(1) == 1
assert validate_timeout(600) == 600
# Too low
try:
validate_timeout(0)
assert False, "Should have raised ValidationError"
except ValidationError:
pass
# Too high
try:
validate_timeout(1000)
assert False, "Should have raised ValidationError"
except ValidationError:
pass
print("✓ validate_timeout() passed")
return True
def test_validate_command():
"""Test command validation."""
# Safe commands
assert validate_command("ls -la") == "ls -la"
assert validate_command("uptime") == "uptime"
# Dangerous commands (should fail without allow_dangerous)
try:
validate_command("rm -rf /")
assert False, "Should have blocked dangerous command"
except ValidationError:
pass
# But should work with allow_dangerous
assert validate_command("rm -rf /tmp/test", allow_dangerous=True)
print("✓ validate_command() passed")
return True
def test_validate_hosts_list():
"""Test list validation."""
# Valid list
hosts = validate_hosts_list(["web-01", "web-02"])
assert len(hosts) == 2
assert "web-01" in hosts
# Empty list
try:
validate_hosts_list([])
assert False, "Should have raised ValidationError for empty list"
except ValidationError:
pass
print("✓ validate_hosts_list() passed")
return True
def test_get_invalid_hosts():
"""Test finding invalid hosts."""
# Test with mix of valid and invalid
# (This would require actual SSH config, so we test the function exists)
result = get_invalid_hosts(["web-01", "nonexistent-host-12345"])
assert isinstance(result, list)
print("✓ get_invalid_hosts() passed")
return True
def main():
"""Run all validation tests."""
print("=" * 70)
print("VALIDATION TESTS")
print("=" * 70)
tests = [
test_validate_host,
test_validate_group,
test_validate_path_exists,
test_validate_timeout,
test_validate_command,
test_validate_hosts_list,
test_get_invalid_hosts,
]
passed = 0
for test in tests:
try:
if test():
passed += 1
except Exception as e:
print(f"{test.__name__} failed: {e}")
import traceback
traceback.print_exc()
print(f"\nResults: {passed}/{len(tests)} passed")
return passed == len(tests)
if __name__ == "__main__":
success = main()
sys.exit(0 if success else 1)