---
name: tailscale-sshsync-agent
description: Manages distributed workloads and file sharing across Tailscale SSH-connected machines. Automates remote command execution, intelligent load balancing, file synchronization workflows, host health monitoring, and multi-machine orchestration using sshsync. Activates when discussing remote machines, Tailscale SSH, workload distribution, file sharing, or multi-host operations.
---
# Tailscale SSH Sync Agent
## When to Use This Skill
This skill automatically activates when you need to:
**Distribute workloads** across multiple machines
- "Run this on my least loaded machine"
- "Execute this task on the machine with most resources"
- "Balance work across my Tailscale network"
**Share files** between Tailscale-connected hosts
- "Push this directory to all my development machines"
- "Sync code across my homelab servers"
- "Deploy configuration to production group"
**Execute commands** remotely across host groups
- "Run system updates on all servers"
- "Check disk space across web-servers group"
- "Restart services on database hosts"
**Monitor machine availability** and health
- "Which machines are online?"
- "Show status of my Tailscale network"
- "Check connectivity to remote hosts"
**Automate multi-machine workflows**
- "Deploy to staging, test, then production"
- "Backup files from all machines"
- "Synchronize development environment across laptops"
## How It Works
This agent provides intelligent workload distribution and file sharing management across Tailscale SSH-connected machines using the `sshsync` CLI tool.
**Core Architecture**:
1. **SSH Sync Wrapper**: Python interface to sshsync CLI operations
2. **Tailscale Manager**: Tailscale-specific connectivity and status management
3. **Load Balancer**: Intelligent task distribution based on machine resources
4. **Workflow Executor**: Common multi-machine workflow automation
5. **Validators**: Parameter, host, and connection validation
6. **Helpers**: Temporal context, formatting, and utilities
**Key Features**:
- **Automatic host discovery** via Tailscale and SSH config
- **Intelligent load balancing** based on CPU, memory, and current load
- **Group-based operations** (execute on all web servers, databases, etc.)
- **Dry-run mode** for preview before execution
- **Parallel execution** across multiple hosts
- **Comprehensive error handling** and retry logic
- **Connection validation** before operations
- **Progress tracking** for long-running operations
## Data Sources
### sshsync CLI Tool
**What is sshsync?**
sshsync is a Python CLI tool for managing SSH connections and executing operations across multiple hosts. It provides:
- Group-based host management
- Remote command execution with timeouts
- File push/pull operations (single or recursive)
- Integration with existing SSH config (~/.ssh/config)
- Status checking and connectivity validation
**Installation**:
```bash
pip install sshsync
```
**Configuration**:
sshsync uses two configuration sources:
1. **SSH Config** (`~/.ssh/config`): Host connection details
2. **sshsync Config** (`~/.config/sshsync/config.yaml`): Group assignments
**Example SSH Config**:
```
Host homelab-1
    HostName 100.64.1.10
    User admin
    IdentityFile ~/.ssh/id_ed25519

Host prod-web-01
    HostName 100.64.1.20
    User deploy
    Port 22
```
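The `parse_ssh_config()` helper listed under `scripts/utils/helpers.py` is not shown in this document. Below is a minimal sketch of what such a parser might do, assuming plain `Host` blocks with `Keyword value` (or `Keyword=value`) lines. `Match` blocks, `Include` directives, and wildcard `Host` patterns are deliberately skipped, so configs that rely on them need a fuller parser.

```python
import re


def parse_ssh_config(text):
    """Map each Host alias to its options, with keywords lower-cased.

    Simplified sketch: Match blocks, Include directives and wildcard
    Host patterns are skipped.
    """
    hosts = {}
    current = []  # aliases that the following option lines apply to
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Keyword and value are separated by whitespace or an optional '='
        parts = re.split(r"\s*=\s*|\s+", line, maxsplit=1)
        key = parts[0].lower()
        value = parts[1].strip() if len(parts) > 1 else ""
        if key == "host":
            # A Host line may list several aliases; drop wildcard patterns
            current = [a for a in value.split() if "*" not in a and "?" not in a]
            for alias in current:
                hosts.setdefault(alias, {})
        elif key == "match":
            current = []  # Match blocks are out of scope for this sketch
        elif key == "include":
            continue  # included files are not followed in this sketch
        else:
            for alias in current:
                # ssh_config uses the first value obtained for each keyword
                hosts[alias].setdefault(key, value)
    return hosts
```

The function takes the config text rather than a path to keep it testable. A caller would pass `Path("~/.ssh/config").expanduser().read_text()`.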
**Example sshsync Config**:
```yaml
groups:
  homelab:
    - homelab-1
    - homelab-2
  production:
    - prod-web-01
    - prod-web-02
    - prod-db-01
  development:
    - dev-laptop
    - dev-desktop
```
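Host-status output lists every group a host belongs to (for example `prod-web-01 (production, web-servers)`), so the `groups` mapping has to be inverted. Here is a sketch that assumes the YAML has already been loaded into a dict (for example with PyYAML's `yaml.safe_load`). `index_groups` is a hypothetical helper name.

```python
from collections import defaultdict


def index_groups(config):
    """Invert {'groups': {group: [hosts]}} into {host: [groups]}.

    `config` is assumed to be the already-parsed sshsync config.yaml.
    """
    index = defaultdict(list)
    for group, members in (config.get("groups") or {}).items():
        for host in members or []:
            index[host].append(group)  # keeps the group order from the file
    return dict(index)
```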
**sshsync Commands Used**:
| Command | Purpose | Example |
|---------|---------|---------|
| `sshsync all` | Execute on all hosts | `sshsync all "df -h"` |
| `sshsync group` | Execute on group | `sshsync group web "systemctl status nginx"` |
| `sshsync push` | Push files to hosts | `sshsync push --group prod ./app /var/www/` |
| `sshsync pull` | Pull files from hosts | `sshsync pull --host db /var/log/mysql ./logs/` |
| `sshsync ls` | List hosts | `sshsync ls --with-status` |
| `sshsync sync` | Sync ungrouped hosts | `sshsync sync` |
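A wrapper such as `scripts/sshsync_wrapper.py` ultimately shells out to this CLI. The sketch below only builds argument lists for the subcommands and flags shown in the table. The helper names, and returning the argv instead of running it in dry-run mode, are assumptions rather than sshsync's own API.

```python
import subprocess


def sshsync_argv(subcommand, *args, **flags):
    """Build an argv list such as ['sshsync', 'group', 'web', 'df -h'].

    Keyword flags become '--name value'; True becomes a bare '--name'.
    Only flags that appear in the table above are known to exist.
    """
    argv = ["sshsync", subcommand]
    for name, value in flags.items():
        opt = "--" + name.replace("_", "-")
        if value is True:
            argv.append(opt)
        elif value is not None and value is not False:
            argv += [opt, str(value)]
    argv += [str(a) for a in args]
    return argv


def run_sshsync(subcommand, *args, dry_run=False, timeout=60, **flags):
    """Run sshsync, or return the argv unchanged when dry_run is set."""
    argv = sshsync_argv(subcommand, *args, **flags)
    if dry_run:
        return argv  # preview only; nothing is executed
    # `timeout` is the local subprocess limit, not an sshsync option
    return subprocess.run(argv, capture_output=True, text=True,
                          timeout=timeout, check=False)
```

Passing a list to `subprocess.run` (rather than using `shell=True`) keeps a remote command such as `df -h` as a single argument, so the local shell never re-interprets it.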
### Tailscale Integration
**What is Tailscale?**
Tailscale is a zero-config VPN that creates a secure network between your devices. It provides:
- **Automatic peer-to-peer connections** via WireGuard
- **MagicDNS** for addressing hosts by name (e.g., `machine-name.tailnet-name.ts.net`)
- **SSH capabilities** built into Tailscale (the `tailscale ssh` command, plus an SSH server enabled with `tailscale up --ssh`)
- **ACLs** for access control
**Tailscale SSH**:
Tailscale SSH accepts connections from standard SSH clients, so both of these reach the same machine:
```bash
# Standard SSH via Tailscale
ssh user@machine-name
# Tailscale-specific SSH command
tailscale ssh machine-name
```
**Integration with sshsync**:
Because Tailscale SSH speaks the standard SSH protocol, sshsync can use it without changes. Point your SSH config entries at Tailscale hostnames (MagicDNS names) or Tailscale IPs:
```
Host homelab-1
    HostName homelab-1.tailnet.ts.net
    User admin
```
**Tailscale Commands Used**:
| Command | Purpose | Example |
|---------|---------|---------|
| `tailscale status` | Show network status | Lists all connected machines |
| `tailscale ping` | Check connectivity | `tailscale ping machine-name` |
| `tailscale ssh` | SSH to machine | `tailscale ssh user@machine` |
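For scripting, `tailscale status --json` is easier to consume than the human-readable table. The sketch below assumes the JSON layout of Tailscale's `ipnstate.Status`: a `Peer` object keyed by node key, with `HostName`, `DNSName`, `Online`, and `TailscaleIPs` on each peer. Field names can change between releases, so every lookup uses `.get()` with a default. The function names are hypothetical.

```python
import json
import subprocess


def parse_tailscale_status(raw_json):
    """Return one dict per peer from `tailscale status --json` output."""
    status = json.loads(raw_json)
    peers = []
    for peer in (status.get("Peer") or {}).values():
        peers.append({
            "host": peer.get("HostName", ""),
            # MagicDNS names are usually fully qualified with a trailing dot
            "dns": peer.get("DNSName", "").rstrip("."),
            "online": bool(peer.get("Online")),
            "ips": peer.get("TailscaleIPs") or [],
        })
    return peers


def get_tailscale_peers():
    """Query the local Tailscale daemon and parse its peer list."""
    out = subprocess.run(["tailscale", "status", "--json"],
                         capture_output=True, text=True, check=True)
    return parse_tailscale_status(out.stdout)
```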
## Workflows
### 1. Host Health Monitoring
**User Query**: "Which of my machines are online?"
**Workflow**:
1. Load SSH config and sshsync groups
2. Execute `sshsync ls --with-status`
3. Parse connectivity results
4. Query Tailscale status for additional context
5. Return formatted health report with:
   - Online/offline status per host
   - Group memberships
   - Tailscale connection state
   - Last seen timestamp
**Implementation**: `scripts/sshsync_wrapper.py` → `get_host_status()`
**Output Format**:
```
🟢 homelab-1 (homelab) - Online - Tailscale: Connected
🟢 prod-web-01 (production, web-servers) - Online - Tailscale: Connected
🔴 dev-laptop (development) - Offline - Last seen: 2h ago
🟢 prod-db-01 (production, databases) - Online - Tailscale: Connected
Summary: 3/4 hosts online (75%)
```
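Once status data has been collected, rendering the report is straightforward. The sketch below assumes each host record is a dict with hypothetical keys `name`, `groups`, `online`, and optionally `last_seen`. For simplicity it treats `online` as also meaning "Tailscale: Connected". A real implementation would query those two signals separately.

```python
def format_health_report(hosts):
    """Render one status line per host plus a summary line."""
    lines = []
    for h in hosts:
        groups = ", ".join(h.get("groups", []))
        if h["online"]:
            lines.append(f"🟢 {h['name']} ({groups}) - Online - Tailscale: Connected")
        else:
            seen = h.get("last_seen", "unknown")
            lines.append(f"🔴 {h['name']} ({groups}) - Offline - Last seen: {seen}")
    online = sum(1 for h in hosts if h["online"])
    pct = 100 * online / len(hosts) if hosts else 0.0
    lines.append(f"Summary: {online}/{len(hosts)} hosts online ({pct:.0f}%)")
    return "\n".join(lines)
```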
### 2. Intelligent Load Balancing
**User Query**: "Run this task on the least loaded machine"
**Workflow**:
1. Get list of candidate hosts (from group or all)
2. For each online host, check:
   - CPU load (via `uptime` or `top`)
   - Memory usage (via `free` or `vm_stat`)
   - Disk space (via `df`)
3. Calculate composite load score
4. Select host with lowest score
5. Execute task on selected host
6. Return result with performance metrics
**Implementation**: `scripts/load_balancer.py` → `select_optimal_host()`
**Load Score Calculation**:
```
score = 0.4 * (cpu_pct / 100) + 0.3 * (mem_pct / 100) + 0.3 * (disk_pct / 100)
```
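Lower score = better candidate for task execution. Because each percentage is divided by 100, scores fall between 0 and 1. For example, CPU 20%, memory 35%, and disk 30% give 0.08 + 0.105 + 0.09 = 0.275.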
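The score and the selection step fit in a few lines. This sketch expects metrics as percentages (0-100) and applies the same 0.4/0.3/0.3 weights, along with the Low/Moderate/High bands listed under Load Distribution Analysis. `pick_least_loaded` is a hypothetical name. Unlike the documented `select_optimal_host()`, it takes pre-collected metrics instead of querying hosts.

```python
WEIGHTS = {"cpu_pct": 0.4, "mem_pct": 0.3, "disk_pct": 0.3}


def calculate_load_score(metrics):
    """Weighted load score in [0, 1]; metric values are percentages (0-100)."""
    return sum(weight * metrics[key] / 100 for key, weight in WEIGHTS.items())


def load_status(score):
    # Bands: < 0.4 low, 0.4-0.7 moderate, > 0.7 high
    if score < 0.4:
        return "low"
    return "moderate" if score <= 0.7 else "high"


def pick_least_loaded(metrics_by_host):
    """Return (host, score) for the host with the lowest composite score."""
    scored = {host: calculate_load_score(m) for host, m in metrics_by_host.items()}
    best = min(scored, key=scored.get)
    return best, scored[best]
```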
**Output Format**:
```
✓ Selected host: prod-web-02
Reason: Lowest load score (0.27)
- CPU: 15% (vs avg 45%)
- Memory: 30% (vs avg 60%)
- Disk: 40% (vs avg 55%)
Executing: npm run build
[Task output...]
✓ Completed in 2m 15s
```
### 3. File Synchronization Workflows
**User Query**: "Sync my code to all development machines"
**Workflow**:
1. Validate source path exists locally
2. Identify target group ("development")
3. Check connectivity to all group members
4. Show dry-run preview (files to be synced, sizes)
5. Execute parallel push to all hosts
6. Validate successful transfer on each host
7. Return summary with per-host status
**Implementation**: `scripts/sshsync_wrapper.py` → `push_to_hosts()` (with `group=`)
**Supported Operations**:
- **Push to all**: Sync files to every configured host
- **Push to group**: Sync to specific group (dev, prod, etc.)
- **Pull from host**: Retrieve files from single host
- **Pull from group**: Collect files from multiple hosts
- **Recursive sync**: Entire directory trees with `--recurse`
**Output Format**:
```
📤 Syncing: ~/projects/myapp → /var/www/myapp
Group: development (3 hosts)
Preview (dry-run):
- dev-laptop: 145 files, 12.3 MB
- dev-desktop: 145 files, 12.3 MB
- dev-server: 145 files, 12.3 MB
Execute? [Proceeding...]
✓ dev-laptop: Synced 145 files in 8s
✓ dev-desktop: Synced 145 files in 6s
✓ dev-server: Synced 145 files in 10s
Summary: 3/3 successful (435 files, 36.9 MB total)
```
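The dry-run preview needs only local information: how many files the push will send and how large they are. The following sketch uses `os.walk`. The helper names are hypothetical, and sizes are reported in MiB (1024 * 1024 bytes), which may differ from how the real tool reports them.

```python
import os


def summarize_source(path):
    """Count regular files and total bytes under `path`."""
    if os.path.isfile(path):
        return 1, os.path.getsize(path)
    files = total = 0
    for root, _dirs, names in os.walk(path):
        for name in names:
            full = os.path.join(root, name)
            if os.path.isfile(full):  # skips broken symlinks, sockets, etc.
                files += 1
                total += os.path.getsize(full)
    return files, total


def preview(path, hosts):
    """Per-host dry-run lines plus the total transferred if every push succeeds."""
    files, size = summarize_source(path)
    mib = size / (1024 * 1024)
    lines = [f"- {host}: {files} files, {mib:.1f} MB" for host in hosts]
    lines.append(f"Total if all succeed: {files * len(hosts)} files, "
                 f"{mib * len(hosts):.1f} MB")
    return "\n".join(lines)
```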
### 4. Remote Command Orchestration
**User Query**: "Check disk space on all web servers"
**Workflow**:
1. Identify target group ("web-servers")
2. Validate group exists and has members
3. Check connectivity to group members
4. Execute command in parallel across group
5. Collect and parse outputs
6. Format results with per-host breakdown
**Implementation**: `scripts/sshsync_wrapper.py` → `execute_on_group()`
**Features**:
- **Parallel execution**: Commands run simultaneously on all hosts
- **Timeout handling**: Configurable per-command timeout (default 10s)
- **Error isolation**: Failure on one host doesn't stop others
- **Output aggregation**: Collect and correlate all outputs
- **Dry-run mode**: Preview what would execute without running
**Output Format**:
```
🔧 Executing on group 'web-servers': df -h /var/www
web-01:
Filesystem: /dev/sda1
Size: 100G, Used: 45G, Available: 50G (45% used)
web-02:
Filesystem: /dev/sda1
Size: 100G, Used: 67G, Available: 28G (67% used) ⚠️
web-03:
Filesystem: /dev/sda1
Size: 100G, Used: 52G, Available: 43G (52% used)
⚠️ Alert: web-02 is above 60% disk usage
```
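The per-host alert boils down to parsing `df` output and comparing the use percentage with a threshold. The sketch below assumes the remote command is `df -hP`, because the POSIX `-P` flag keeps each filesystem on a single line. It uses the 60% threshold from the example. `parse_df` and `disk_alerts` are hypothetical names.

```python
def parse_df(output):
    """Parse `df -hP` output into one dict per filesystem."""
    rows = []
    for line in output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6 or not fields[4].rstrip("%").isdigit():
            continue  # malformed line or a filesystem with no usage figure
        fs, size, used, avail, use_pct = fields[:5]
        rows.append({
            "filesystem": fs, "size": size, "used": used, "avail": avail,
            "use_pct": int(use_pct.rstrip("%")),
            "mount": " ".join(fields[5:]),  # mount points may contain spaces
        })
    return rows


def disk_alerts(outputs_by_host, threshold=60):
    """Return alert strings for filesystems above `threshold` percent used."""
    alerts = []
    for host, output in outputs_by_host.items():
        for row in parse_df(output):
            if row["use_pct"] > threshold:
                alerts.append(f"⚠️ Alert: {host} is above {threshold}% disk usage "
                              f"({row['mount']}: {row['use_pct']}%)")
    return alerts
```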
### 5. Multi-Stage Deployment Workflow
**User Query**: "Deploy to staging, test, then production"
**Workflow**:
1. **Stage 1 - Staging Deploy**:
   - Push code to staging group
   - Run build process
   - Execute automated tests
   - If tests fail: STOP and report error
2. **Stage 2 - Validation**:
   - Check staging health endpoints
   - Validate database migrations
   - Run smoke tests
3. **Stage 3 - Production Deploy**:
   - Push to production group one host at a time (for zero downtime)
   - Restart services gracefully
   - Verify each host before proceeding to the next
4. **Stage 4 - Verification**:
   - Check production health
   - Monitor for errors
   - Roll back if issues are detected
**Implementation**: `scripts/workflow_executor.py` → `deploy_workflow()`
**Output Format**:
```
🚀 Multi-Stage Deployment Workflow
Stage 1: Staging Deployment
✓ Pushed code to staging-01
✓ Build completed (2m 15s)
✓ Tests passed (145/145)
Stage 2: Validation
✓ Health check passed
✓ Database migration OK
✓ Smoke tests passed (12/12)
Stage 3: Production Deployment
✓ prod-web-01: Deployed & verified
✓ prod-web-02: Deployed & verified
✓ prod-web-03: Deployed & verified
Stage 4: Verification
✓ All health checks passed
✓ No errors in logs (5min window)
✅ Deployment completed successfully in 12m 45s
```
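The control flow of this workflow (stop at the first failed stage, then deploy production hosts one at a time) can be sketched independently of any transport. The stage and host callables here are placeholders that you would back with sshsync calls. The function names are hypothetical. This sketch rolls back only the host that failed verification, whereas a real pipeline might also revert hosts that were already updated.

```python
def run_stages(stages, log=print):
    """Run (name, callable) stages in order and stop at the first failure.

    Each callable returns True on success. Returns (ok, completed_names).
    """
    done = []
    for name, step in stages:
        log(f"Stage: {name}")
        try:
            ok = step()
        except Exception as exc:  # a crashing stage counts as a failed stage
            log(f"  ✗ {name} raised {exc!r}")
            ok = False
        if not ok:
            log(f"  ✗ {name} failed; later stages are skipped")
            return False, done
        done.append(name)
    return True, done


def rolling_deploy(hosts, deploy, verify, rollback):
    """Deploy one host at a time; on failed verification roll it back and stop."""
    updated = []
    for host in hosts:
        deploy(host)
        if not verify(host):
            rollback(host)
            return False, updated
        updated.append(host)
    return True, updated
```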
## Available Scripts
### scripts/sshsync_wrapper.py
**Purpose**: Python wrapper around sshsync CLI for programmatic access
**Functions**:
- `get_host_status(group=None)`: Get online/offline status of hosts
- `execute_on_all(command, timeout=10, dry_run=False)`: Run command on all hosts
- `execute_on_group(group, command, timeout=10, dry_run=False)`: Run on specific group
- `execute_on_host(host, command, timeout=10)`: Run on single host
- `push_to_hosts(local_path, remote_path, hosts=None, group=None, recurse=False, dry_run=False)`: Push files
- `pull_from_host(host, remote_path, local_path, recurse=False, dry_run=False)`: Pull files
- `list_hosts(with_status=True)`: List all configured hosts
- `get_groups()`: Get all defined groups and their members
- `add_hosts_to_group(group, hosts)`: Add hosts to a group
**Usage Example**:
```python
from sshsync_wrapper import execute_on_group, push_to_hosts

# Execute command
result = execute_on_group(
    group="web-servers",
    command="systemctl status nginx",
    timeout=15,
)

# Push files
push_to_hosts(
    local_path="./dist",
    remote_path="/var/www/app",
    group="production",
    recurse=True,
)
```
### scripts/tailscale_manager.py
**Purpose**: Tailscale-specific operations and status management
**Functions**:
- `get_tailscale_status()`: Get Tailscale network status (all peers)
- `check_connectivity(host)`: Ping host via Tailscale
- `get_peer_info(hostname)`: Get detailed info about peer
- `list_online_machines()`: List all online Tailscale machines
- `get_machine_ip(hostname)`: Get Tailscale IP for machine
- `validate_tailscale_ssh(host)`: Check if Tailscale SSH is working
**Usage Example**:
```python
from tailscale_manager import get_tailscale_status, check_connectivity
# Get network status
status = get_tailscale_status()
print(f"Online machines: {status['online_count']}")
# Check specific host
is_online = check_connectivity("homelab-1")
```
### scripts/load_balancer.py
**Purpose**: Intelligent task distribution based on machine resources
**Functions**:
- `get_machine_load(host)`: Get CPU, memory, disk metrics
- `calculate_load_score(metrics)`: Calculate composite load score
- `select_optimal_host(candidates, prefer_group=None)`: Pick best host
- `get_group_capacity()`: Get aggregate capacity of group
- `distribute_tasks(tasks, hosts)`: Distribute multiple tasks optimally
**Usage Example**:
```python
from load_balancer import select_optimal_host
from sshsync_wrapper import execute_on_host

# Find best machine for task
best_host = select_optimal_host(
    candidates=["web-01", "web-02", "web-03"],
    prefer_group="production",
)

# Execute on selected host
execute_on_host(best_host, "npm run build")
```
### scripts/workflow_executor.py
**Purpose**: Common multi-machine workflow automation
**Functions**:
- `deploy_workflow(code_path, staging_group, prod_group)`: Full deployment pipeline
- `backup_workflow(hosts, backup_paths, destination)`: Backup from multiple hosts
- `sync_workflow(source_host, target_group, paths)`: Sync from one to many
- `rolling_restart(group, service_name)`: Zero-downtime service restart
- `health_check_workflow(group, endpoint)`: Check health across group
**Usage Example**:
```python
from workflow_executor import deploy_workflow, backup_workflow

# Deploy with testing
deploy_workflow(
    code_path="./dist",
    staging_group="staging",
    prod_group="production",
)

# Backup from all databases
backup_workflow(
    hosts=["db-01", "db-02"],
    backup_paths=["/var/lib/mysql"],
    destination="./backups",
)
```
### scripts/utils/helpers.py
**Purpose**: Common utilities and formatting functions
**Functions**:
- `format_bytes(bytes)`: Human-readable byte formatting (1.2 GB)
- `format_duration(seconds)`: Human-readable duration (2m 15s)
- `parse_ssh_config()`: Parse ~/.ssh/config for host details
- `parse_sshsync_config()`: Parse sshsync group configuration
- `get_timestamp()`: Get ISO timestamp for logging
- `safe_execute(func, *args, **kwargs)`: Execute with error handling
- `validate_path(path)`: Check if path exists and is accessible
### scripts/utils/validators/parameter_validator.py
**Purpose**: Validate user inputs and parameters
**Functions**:
- `validate_host(host, valid_hosts=None)`: Validate host exists
- `validate_group(group, valid_groups=None)`: Validate group exists
- `validate_path_exists(path)`: Check local path exists
- `validate_timeout(timeout)`: Ensure timeout is reasonable
- `validate_command(command)`: Basic command safety validation
### scripts/utils/validators/host_validator.py
**Purpose**: Validate host configuration and availability
**Functions**:
- `validate_ssh_config(host)`: Check host has SSH config entry
- `validate_host_reachable(host, timeout=5)`: Check host is reachable
- `validate_group_members(group)`: Ensure group has valid members
- `get_invalid_hosts(hosts)`: Find hosts without valid config
### scripts/utils/validators/connection_validator.py
**Purpose**: Validate SSH and Tailscale connections
**Functions**:
- `validate_ssh_connection(host)`: Test SSH connection works
- `validate_tailscale_connection(host)`: Test Tailscale connectivity
- `validate_ssh_key(host)`: Check SSH key authentication
- `get_connection_diagnostics(host)`: Comprehensive connection testing
## Available Analyses
### 1. Host Availability Analysis
**Function**: `analyze_host_availability(group=None)`
**Objective**: Determine which machines are online and accessible
**Inputs**:
- `group` (optional): Specific group to check (None = all hosts)
**Outputs**:
```python
{
    'total_hosts': 10,
    'online_hosts': 8,
    'offline_hosts': 2,
    'availability_pct': 80.0,
    'by_group': {
        'production': {'online': 3, 'total': 3, 'pct': 100.0},
        'development': {'online': 2, 'total': 3, 'pct': 66.7},
        'homelab': {'online': 3, 'total': 4, 'pct': 75.0}
    },
    'offline_hosts_details': [
        {'host': 'dev-laptop', 'last_seen': '2h ago', 'groups': ['development']},
        {'host': 'homelab-4', 'last_seen': '1d ago', 'groups': ['homelab']}
    ]
}
```
**Interpretation**:
- **> 90%**: Excellent availability
- **70-90%**: Good availability, monitor offline hosts
- **< 70%**: Poor availability, investigate issues
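Below is a sketch of how the counts and per-group percentages above could be derived from per-host status records. `analyze_availability` and `availability_rating` are hypothetical names, and the input shape (`host`, `online`, `groups`) is an assumption. `last_seen` is omitted because it would come from Tailscale.

```python
from collections import defaultdict


def analyze_availability(hosts):
    """hosts: list of {'host', 'online', 'groups'} dicts (assumed input shape)."""
    by_group = defaultdict(lambda: {"online": 0, "total": 0})
    online = 0
    for h in hosts:
        up = int(bool(h["online"]))
        online += up
        for group in h.get("groups", []):
            by_group[group]["total"] += 1
            by_group[group]["online"] += up
    for stats in by_group.values():
        stats["pct"] = round(100 * stats["online"] / stats["total"], 1)
    total = len(hosts)
    return {
        "total_hosts": total,
        "online_hosts": online,
        "offline_hosts": total - online,
        "availability_pct": round(100 * online / total, 1) if total else 0.0,
        "by_group": dict(by_group),
        "offline_hosts_details": [
            {"host": h["host"], "groups": h.get("groups", [])}
            for h in hosts if not h["online"]
        ],
    }


def availability_rating(pct):
    # Bands from the Interpretation list; exactly 90% counts as "good"
    if pct > 90:
        return "excellent"
    if pct >= 70:
        return "good"
    return "poor"
```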
### 2. Load Distribution Analysis
**Function**: `analyze_load_distribution(group=None)`
**Objective**: Understand resource usage across machines
**Inputs**:
- `group` (optional): Specific group to analyze
**Outputs**:
```python
{
    'hosts': [
        {
            'host': 'web-01',
            'cpu_pct': 45,
            'mem_pct': 60,
            'disk_pct': 40,
            'load_score': 0.48,
            'status': 'moderate'
        },
        # ... more hosts
    ],
    'aggregate': {
        'avg_cpu': 35,
        'avg_mem': 55,
        'avg_disk': 45,
        'total_capacity': 1200  # GB
    },
    'recommendations': [
        {
            'host': 'web-02',
            'issue': 'High CPU usage (85%)',
            'action': 'Consider migrating workloads'
        }
    ]
}
```
**Load Status**:
- **Low** (score < 0.4): Good capacity for more work
- **Moderate** (0.4-0.7): Normal operation
- **High** (> 0.7): May need to offload work
### 3. File Sync Status Analysis
**Function**: `analyze_sync_status(local_path, remote_path, group)`
**Objective**: Compare local files with remote versions
**Inputs**:
- `local_path`: Local directory to compare
- `remote_path`: Remote directory path
- `group`: Group to check
**Outputs**:
```python
{
    'local_files': 145,
    'local_size': 12582912,  # bytes
    'hosts': [
        {
            'host': 'web-01',
            'status': 'in_sync',
            'files_match': 145,
            'files_different': 0,
            'missing_files': 0
        },
        {
            'host': 'web-02',
            'status': 'out_of_sync',
            'files_match': 140,
            'files_different': 3,
            'missing_files': 2,
            'details': ['config.json modified', 'index.html modified', ...]
        }
    ],
    'sync_percentage': 96.7,
    'recommended_action': 'Push to web-02'
}
```
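Comparing file contents across hosts is easiest with content digests. This sketch assumes you already have a `{relative_path: digest}` manifest for the local tree and for each host (remotely, for example, from `sha256sum` output; that collection step is an assumption and is not shown). It counts matching, different, and missing files in the same way as the example, and ignores files that exist only on the remote. `compare_manifests` is a hypothetical name.

```python
def compare_manifests(local, remote):
    """Compare {relative_path: digest} manifests for a single host."""
    match = [p for p, d in local.items() if remote.get(p) == d]
    different = [p for p, d in local.items() if p in remote and remote[p] != d]
    missing = [p for p in local if p not in remote]
    return {
        "status": "in_sync" if not different and not missing else "out_of_sync",
        "files_match": len(match),
        "files_different": len(different),
        "missing_files": len(missing),
        "details": [f"{p} modified" for p in different]
                   + [f"{p} missing" for p in missing],
    }
```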
### 4. Network Latency Analysis
**Function**: `analyze_network_latency(hosts=None)`
**Objective**: Measure connection latency to hosts
**Inputs**:
- `hosts` (optional): Specific hosts to test (None = all)
**Outputs**:
```python
{
    'hosts': [
        {'host': 'web-01', 'latency_ms': 15, 'status': 'excellent'},
        {'host': 'web-02', 'latency_ms': 45, 'status': 'excellent'},
        {'host': 'db-01', 'latency_ms': 150, 'status': 'fair'}
    ],
    'avg_latency': 70,
    'min_latency': 15,
    'max_latency': 150,
    'recommendations': [
        {'host': 'db-01', 'issue': 'High latency', 'action': 'Check network path'}
    ]
}
```
**Latency Classification**:
- **Excellent** (< 50ms): Ideal for interactive tasks
- **Good** (50-100ms): Suitable for most operations
- **Fair** (100-200ms): May impact interactive workflows
- **Poor** (> 200ms): Investigate network issues
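Below is a sketch of the classification and the aggregates. It assumes latencies come from `tailscale ping` output lines that end in `in <N>ms`; the parsing regex reflects an assumption about that output format. Values that fall exactly on the 100 ms and 200 ms boundaries are assigned to the better band. The function names are hypothetical.

```python
import re

_PING_MS = re.compile(r"\bin (\d+(?:\.\d+)?)ms\b")


def parse_ping_ms(line):
    """Extract the round-trip time from a line like '... in 15ms', else None."""
    match = _PING_MS.search(line)
    return float(match.group(1)) if match else None


def classify_latency(ms):
    if ms < 50:
        return "excellent"
    if ms <= 100:
        return "good"
    if ms <= 200:
        return "fair"
    return "poor"


def summarize_latency(latencies):
    """latencies: {host: round-trip ms}; per-host status plus aggregates."""
    values = list(latencies.values())
    return {
        "hosts": [{"host": h, "latency_ms": ms, "status": classify_latency(ms)}
                  for h, ms in latencies.items()],
        "avg_latency": round(sum(values) / len(values)) if values else None,
        "min_latency": min(values, default=None),
        "max_latency": max(values, default=None),
    }
```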
### 5. Comprehensive Infrastructure Report
**Function**: `comprehensive_infrastructure_report(group=None)`
**Objective**: One-stop function for complete infrastructure overview
**Inputs**:
- `group` (optional): Limit to specific group (None = all)
**Outputs**:
```python
{
    'report_timestamp': '2025-10-19T19:43:41Z',
    'group': 'production',  # or 'all'
    'metrics': {
        'availability': {...},  # from analyze_host_availability
        'load_distribution': {...},  # from analyze_load_distribution
        'network_latency': {...},  # from analyze_network_latency
        'tailscale_status': {...}  # from Tailscale integration
    },
    'summary': "Production infrastructure: 3/3 hosts online, avg load 45%, network latency 35ms",
    'alerts': [
        "⚠ web-02: High CPU usage (85%)",
        "⚠ db-01: Elevated latency (150ms)"
    ],
    'recommendations': [
        "Consider rebalancing workload from web-02",
        "Investigate network path to db-01"
    ],
    'overall_health': 'good'  # excellent | good | fair | poor
}
```
**Overall Health Classification**:
- **Excellent**: All metrics green, no alerts
- **Good**: Most metrics healthy, minor alerts
- **Fair**: Some concerning metrics, action recommended
- **Poor**: Critical issues, immediate action required
## Error Handling
### Connection Errors
**Error**: Cannot connect to host
**Causes**:
- Host is offline
- Tailscale not connected
- SSH key missing/invalid
- Firewall blocking connection
**Handling**:
```python
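from sshsync_wrapper import execute_on_host
from tailscale_manager import check_connectivity
from utils.validators.connection_validator import (
    get_connection_diagnostics,
    validate_ssh_connection,
)


def run_with_diagnostics(host, command="ls"):
    """Hypothetical wrapper: run a command and explain connection failures."""
    try:
        return execute_on_host(host, command)
    except ConnectionError:
        # Try Tailscale ping first: is the peer reachable at all?
        if not check_connectivity(host):
            return {
                'error': 'Host unreachable',
                'suggestion': 'Check Tailscale connection',
                'diagnostics': get_connection_diagnostics(host),
            }
        # The network path is fine, so check SSH authentication next
        if not validate_ssh_connection(host):
            return {
                'error': 'SSH authentication failed',
                'suggestion': 'Check SSH keys: ssh-add -l',
            }
        # Neither check explains the failure; re-raise the original error
        raise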
```
### Timeout Errors
**Error**: Operation timed out
**Causes**:
- Command taking too long
- Network latency
- Host overloaded
**Handling**:
- Automatic retry with exponential backoff (3 attempts)
- Increase timeout for known slow operations
- Fall back to alternative host if available
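Below is a minimal retry helper with exponential backoff: 3 attempts, with delays of 1 s and then 2 s by default. The name and defaults are assumptions. Note that `subprocess.TimeoutExpired` is not a subclass of `TimeoutError`, so pass it in `retry_on` when wrapping `subprocess.run`.

```python
import time


def retry_with_backoff(func, attempts=3, base_delay=1.0,
                       retry_on=(TimeoutError,), sleep=time.sleep):
    """Call func(); after a retryable error, wait base_delay * 2**n seconds and retry.

    Hypothetical helper; `sleep` is injectable so tests need not actually wait.
    """
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

For example: `retry_with_backoff(lambda: subprocess.run(argv, timeout=30, check=True), retry_on=(subprocess.TimeoutExpired,))`.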
### File Transfer Errors
**Error**: File sync failed
**Causes**:
- Insufficient disk space
- Permission denied
- Path doesn't exist
**Handling**:
- Pre-check disk space on target
- Validate permissions before transfer
- Create directories if needed
- Partial transfer recovery
### Validation Errors
**Error**: Invalid parameter
**Examples**:
- Unknown host
- Non-existent group
- Invalid path
**Handling**:
- Validate all inputs before execution
- Provide suggestions for similar valid options
- Clear error messages with corrective actions
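For the "suggest similar valid options" step, the standard library's `difflib.get_close_matches` is enough. Below is a sketch with a hypothetical `validate_choice` helper that works for both hosts and groups.

```python
import difflib


def validate_choice(value, valid, kind="host"):
    """Return `value` if valid; otherwise raise ValueError naming close matches."""
    if value in valid:
        return value
    suggestions = difflib.get_close_matches(value, list(valid), n=3, cutoff=0.6)
    hint = f" Did you mean: {', '.join(suggestions)}?" if suggestions else ""
    raise ValueError(f"Unknown {kind} '{value}'.{hint}")
```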
## Mandatory Validations
### Before Any Operation
1. **Parameter Validation**:
```python
host = validate_host(host, valid_hosts=get_all_hosts())
group = validate_group(group, valid_groups=get_groups())
timeout = validate_timeout(timeout)
```
2. **Connection Validation**:
```python
if not validate_host_reachable(host, timeout=5):
    raise ConnectionError(f"Host {host} is not reachable")
```
3. **Path Validation** (for file operations):
```python
if not validate_path_exists(local_path):
    raise ValueError(f"Path does not exist: {local_path}")
```
### During Operation
1. **Timeout Monitoring**: Every operation has configurable timeout
2. **Progress Tracking**: Long operations show progress
3. **Error Isolation**: Failure on one host doesn't stop others
### After Operation
1. **Result Validation**:
```python
report = validate_operation_result(result)
if report.has_critical_issues():
    raise OperationError(report.get_summary())
```
2. **State Verification**: Confirm operation succeeded
3. **Logging**: Record all operations for audit trail
## Performance and Caching
### Caching Strategy
**Host Status Cache**:
- **TTL**: 60 seconds
- **Why**: Host status doesn't change rapidly
- **Invalidation**: Manual invalidate when connectivity changes
**Load Metrics Cache**:
- **TTL**: 30 seconds
- **Why**: Load changes frequently
- **Invalidation**: Automatic on timeout
**Group Configuration Cache**:
- **TTL**: 5 minutes
- **Why**: Group membership rarely changes
- **Invalidation**: Manual when groups modified
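All three caches follow the same pattern: store a value with its timestamp, and reload it once the TTL has passed. Below is a minimal sketch. The `TTLCache` class is hypothetical, and the injectable clock exists only to make it testable.

```python
import time


class TTLCache:
    """Time-based cache: entries expire `ttl` seconds after they are stored."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._data = {}  # key -> (value, stored_at)

    def get(self, key, loader):
        """Return the cached value, calling loader() if it is missing or expired."""
        entry = self._data.get(key)
        now = self.clock()
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]
        value = loader()
        self._data[key] = (value, now)
        return value

    def invalidate(self, key=None):
        """Manual invalidation: drop one key, or everything when key is None."""
        if key is None:
            self._data.clear()
        else:
            self._data.pop(key, None)
```

Following the strategy above, you would use `TTLCache(60)` for host status, `TTLCache(30)` for load metrics, and `TTLCache(300)` for group configuration.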
### Performance Optimizations
1. **Parallel Execution**:
   - Commands execute concurrently across hosts
   - ThreadPoolExecutor with max 10 workers
   - Prevents sequential bottleneck
2. **Connection Pooling**:
   - Reuse SSH connections when possible
   - Enable `ControlMaster` (with `ControlPath` and `ControlPersist`) in SSH config
3. **Lazy Loading**:
   - Only fetch data when needed
   - Don't load all host status unless required
4. **Progressive Results**:
   - Stream results as they complete
   - Don't wait for slowest host
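Parallel execution, error isolation, and progressive results combine naturally with `concurrent.futures`. The sketch below assumes `task` is any callable that runs one host's work, such as a hypothetical wrapper around `sshsync`. Results are yielded in completion order, so the slowest host never holds back output from the others.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_parallel(hosts, task, max_workers=10):
    """Run task(host) concurrently; yield (host, ok, result_or_error) as each finishes.

    A failure on one host is captured and reported; it does not cancel the others.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(task, host): host for host in hosts}
        for future in as_completed(futures):
            host = futures[future]
            try:
                result = future.result()
            except Exception as exc:  # isolate per-host failures
                yield host, False, exc
            else:
                yield host, True, result
```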
## Automatic Detection Keywords
This skill automatically activates when you mention:
**Hosts & Machines**:
- remote machine, remote host, remote server
- Tailscale machine, Tailscale host
- SSH host, SSH server
- homelab, servers, cluster
**Tools**:
- sshsync, ssh sync
- Tailscale, tailnet
- SSH, secure shell
**Operations**:
- distribute workload, load balance
- file sharing, file sync, file transfer
- push files, pull files, sync files
- remote execution, execute remotely
- deploy to, deploy across
- backup from, sync to
**Groups**:
- host group, machine group, server group
- production servers, development machines
- web servers, database servers
- staging environment, production environment
**Actions**:
- check status, check connectivity
- run on all, execute on group
- sync across, distribute to
- deploy to, push to, pull from
**Monitoring**:
- which machines are online
- host availability, server status
- machine health, connectivity check
- network status, latency check
**Workflows**:
- multi-stage deployment
- rolling restart, zero-downtime
- backup workflow, sync workflow
- distributed task, parallel execution
## Usage Examples
### Example 1: Check Host Status
**User**: "Which of my Tailscale machines are online?"
**Agent Response**:
```
Checking Tailscale network status...
🟢 homelab-1 (homelab) - Online - Tailscale: Connected
IP: 100.64.1.10 | Latency: 15ms
🟢 homelab-2 (homelab) - Online - Tailscale: Connected
IP: 100.64.1.11 | Latency: 18ms
🔴 dev-laptop (development) - Offline
Last seen: 2 hours ago | Tailscale: Not connected
🟢 prod-web-01 (production, web-servers) - Online - Tailscale: Connected
IP: 100.64.1.20 | Latency: 25ms
🟢 prod-db-01 (production, databases) - Online - Tailscale: Connected
IP: 100.64.1.30 | Latency: 30ms
Summary: 4/5 hosts online (80%)
Average latency: 22ms
```
### Example 2: Execute on Group
**User**: "Check disk space on all web servers"
**Agent Response**:
```
Executing on group 'web-servers': df -h /var/www
🔧 web-01:
Filesystem: /dev/sda1
Size: 100G | Used: 45G | Available: 50G (45% used)
Status: ✓ Healthy
🔧 web-02:
Filesystem: /dev/sda1
Size: 100G | Used: 67G | Available: 28G (67% used)
Status: ⚠ Warning - Above 60%
🔧 web-03:
Filesystem: /dev/sda1
Size: 100G | Used: 52G | Available: 43G (52% used)
Status: ✓ Healthy
⚠️ Alert: web-02 is above 60% disk usage
Recommendation: Clear old logs or increase disk space
```
### Example 3: Load-Balanced Execution
**User**: "Run this npm build on the machine with most available resources"
**Agent Response**:
```
Analyzing resource availability across 5 machines...
Load Analysis:
web-01: CPU 45%, Mem 60%, Disk 40% → Score: 0.48 (moderate)
web-02: CPU 85%, Mem 70%, Disk 65% → Score: 0.75 (high load)
web-03: CPU 20%, Mem 35%, Disk 30% → Score: 0.28 (low load) ✓
✓ Selected: web-03
Reason: Lowest load score (0.28)
Available: CPU 80%, Memory 65%, Disk 70%
Executing on web-03: npm run build
[Build output...]
✓ Build completed successfully
Duration: 2m 15s
Output: ./dist (145 files, 12.3 MB)
Final load: CPU 65%, Mem 50% (still below average)
```
### Example 4: File Sync
**User**: "Push my app directory to all production servers"
**Agent Response**:
```
📤 Syncing: ~/projects/myapp → /var/www/myapp
Target group: production (3 hosts)
Analyzing source...
145 files, 12.3 MB total
Preview (dry-run):
prod-web-01: 145 files → /var/www/myapp
prod-web-02: 145 files → /var/www/myapp
prod-db-01: Skipped (not in web-servers group)
Checking target capacity...
✓ prod-web-01: 50 GB available
✓ prod-web-02: 28 GB available
Executing push...
✓ prod-web-01: Synced 145 files in 8.2s
Modified: 12 files
Added: 3 files
✓ prod-web-02: Synced 145 files in 6.5s
Modified: 12 files
Added: 3 files
Summary: 2/2 successful (290 files, 24.6 MB total)
```
### Example 5: Multi-Stage Deployment
**User**: "Deploy this to staging, run tests, then push to production"
**Agent Response**:
```
🚀 Multi-Stage Deployment Workflow
Stage 1: Staging Deployment
📤 Pushing to staging-01...
✓ Code synced (145 files, 12.3 MB) - 8s
🔨 Running build...
$ npm run build
✓ Build completed - 2m 15s
🧪 Running tests...
$ npm test
✓ All tests passed (145/145) - 45s
Stage 2: Staging Validation
✓ Health endpoint: 200 OK
✓ Database migrations: Up to date
✓ Smoke tests: 12/12 passed
Stage 3: Production Deployment
Rolling deployment to 3 hosts...
⏳ prod-web-01:
✓ Code synced
✓ Build completed
✓ Service restarted gracefully
✓ Health check passed
Duration: 3m 10s
⏳ prod-web-02:
✓ Code synced
✓ Build completed
✓ Service restarted gracefully
✓ Health check passed
Duration: 3m 5s
⏳ prod-web-03:
✓ Code synced
✓ Build completed
✓ Service restarted gracefully
✓ Health check passed
Duration: 3m 12s
Stage 4: Production Verification
✓ All health checks passed (3/3)
✓ No errors in logs (5min window)
✓ Response times normal (avg 45ms)
✅ Deployment completed successfully!
Total duration: 12m 45s
Hosts updated: 3
Zero downtime: ✓
```
### Example 6: Comprehensive Infrastructure Report
**User**: "Give me a complete status report of my infrastructure"
**Agent Response**:
```
📊 Infrastructure Report
Generated: 2025-10-19 19:43:41
=== AVAILABILITY ===
Total Hosts: 10
Online: 8 (80%)
Offline: 2
By Group:
production: 3/3 (100%) ✓
development: 2/3 (66.7%) ⚠
homelab: 3/4 (75%)
Offline Hosts:
• dev-laptop (development) - Last seen: 2h ago
• homelab-4 (homelab) - Last seen: 1d ago
=== LOAD DISTRIBUTION ===
Average Load: 45% CPU, 55% Memory, 45% Disk
Top 3 Loaded Hosts:
1. web-02: 85% CPU, 70% Mem, 65% Disk (Score: 0.75) ⚠
2. db-01: 60% CPU, 75% Mem, 55% Disk (Score: 0.63)
3. web-01: 45% CPU, 60% Mem, 40% Disk (Score: 0.48)
Top 3 Available Hosts:
1. web-03: 20% CPU, 35% Mem, 30% Disk (Score: 0.28) ✓
2. homelab-1: 25% CPU, 40% Mem, 35% Disk (Score: 0.33)
3. homelab-2: 30% CPU, 45% Mem, 40% Disk (Score: 0.38)
=== NETWORK LATENCY ===
Average: 35ms
Range: 15ms - 150ms
Excellent (< 50ms): 6 hosts
Good (50-100ms): 1 host
Fair (100-200ms): 1 host (db-01: 150ms) ⚠
=== TAILSCALE STATUS ===
Network: Connected
Peers Online: 8/10
Exit Node: None
MagicDNS: Enabled
=== ALERTS ===
⚠ web-02: High CPU usage (85%) - Consider load balancing
⚠ db-01: Elevated latency (150ms) - Check network path
⚠ dev-laptop: Offline for 2 hours - May need attention
=== RECOMMENDATIONS ===
1. Rebalance workload from web-02 to web-03
2. Investigate network latency to db-01
3. Check status of dev-laptop and homelab-4
4. Consider scheduling maintenance for web-02
Overall Health: GOOD ✓
```
## Installation
See INSTALLATION.md for detailed setup instructions.
Quick start:
```bash
# 1. Install sshsync
pip install sshsync
# 2. Configure SSH hosts
vim ~/.ssh/config
# 3. Sync host groups
sshsync sync
# 4. Install agent
/plugin marketplace add ./tailscale-sshsync-agent
# 5. Test by asking the agent:
#    "Which of my machines are online?"
```
## Version
Current version: 1.0.0
See CHANGELOG.md for release history.
## Architecture Decisions
See DECISIONS.md for detailed rationale behind tool selection, architecture choices, and trade-offs considered.