Initial commit
This commit is contained in:
12
.claude-plugin/plugin.json
Normal file
@@ -0,0 +1,12 @@
{
  "name": "tailscale-sshsync-agent",
  "description": "Manages distributed workloads and file sharing across Tailscale SSH-connected machines. Automates remote command execution, intelligent load balancing, file synchronization workflows, host health monitoring, and multi-machine orchestration using sshsync.",
  "version": "0.0.0-2025.11.28",
  "author": {
    "name": "William VanSickle III",
    "email": "noreply@humanfrontierlabs.com"
  },
  "skills": [
    "./"
  ]
}
163
CHANGELOG.md
Normal file
@@ -0,0 +1,163 @@

# Changelog

All notable changes to Tailscale SSH Sync Agent will be documented here.

Format based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
Versioning follows [Semantic Versioning](https://semver.org/).

## [1.0.0] - 2025-10-19

### Added

**Core Functionality:**

- `sshsync_wrapper.py`: Python interface to sshsync CLI operations
  - `get_host_status()`: Check online/offline status of hosts
  - `execute_on_all()`: Run commands on all configured hosts
  - `execute_on_group()`: Run commands on specific groups
  - `execute_on_host()`: Run commands on a single host
  - `push_to_hosts()`: Push files to multiple hosts (with group support)
  - `pull_from_host()`: Pull files from hosts
  - `list_hosts()`: List all configured hosts
  - `get_groups()`: Get group configuration

- `tailscale_manager.py`: Tailscale-specific operations
  - `get_tailscale_status()`: Get complete network status
  - `check_connectivity()`: Ping hosts via Tailscale
  - `get_peer_info()`: Get detailed peer information
  - `list_online_machines()`: List all online Tailscale machines
  - `validate_tailscale_ssh()`: Check whether Tailscale SSH works for a host
  - `get_network_summary()`: Human-readable network summary

- `load_balancer.py`: Intelligent task distribution
  - `get_machine_load()`: Get CPU, memory, and disk metrics for a machine
  - `select_optimal_host()`: Pick the best host based on current load
  - `get_group_capacity()`: Get aggregate capacity of a group
  - `distribute_tasks()`: Distribute multiple tasks optimally across hosts
  - `format_load_report()`: Format load metrics as a human-readable report

- `workflow_executor.py`: Common multi-machine workflows
  - `deploy_workflow()`: Full deployment pipeline (staging → test → production)
  - `backup_workflow()`: Back up files from multiple hosts
  - `sync_workflow()`: Sync files from one host to many
  - `rolling_restart()`: Zero-downtime service restart across a group
  - `health_check_workflow()`: Check health endpoints across a group

**Utilities:**

- `utils/helpers.py`: Common formatting and parsing functions
  - Byte formatting (`format_bytes`)
  - Duration formatting (`format_duration`)
  - Percentage formatting (`format_percentage`)
  - SSH config parsing (`parse_ssh_config`)
  - sshsync config parsing (`parse_sshsync_config`)
  - System metrics parsing (`parse_disk_usage`, `parse_memory_usage`, `parse_cpu_load`)
  - Load score calculation (`calculate_load_score`)
  - Status classification (`classify_load_status`, `classify_latency`)
  - Safe command execution (`run_command`, `safe_execute`)
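
Illustrative sketches of two of these helpers (assumptions for this changelog, not the shipped `utils/helpers.py` implementations):

```python
def format_bytes(n: int) -> str:
    """Render a byte count with a binary-unit suffix, e.g. 1536 -> '1.5 KB'."""
    units = ["B", "KB", "MB", "GB", "TB"]
    value = float(n)
    for unit in units:
        if value < 1024 or unit == units[-1]:
            return f"{int(value)} B" if unit == "B" else f"{value:.1f} {unit}"
        value /= 1024

def format_duration(seconds: float) -> str:
    """Render seconds as minutes and seconds, e.g. 125 -> '2m 5s'."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}m {secs}s" if minutes else f"{secs}s"

print(format_bytes(1536))    # 1.5 KB
print(format_duration(125))  # 2m 5s
```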

- `utils/validators/`: Comprehensive validation system
  - `parameter_validator.py`: Input validation (hosts, groups, paths, timeouts, commands)
  - `host_validator.py`: Host configuration and availability validation
  - `connection_validator.py`: SSH and Tailscale connection validation

**Testing:**

- `tests/test_integration.py`: 11 end-to-end integration tests
- `tests/test_helpers.py`: 11 helper function tests
- `tests/test_validation.py`: 7 validation tests
- **Total: 29 tests** covering all major functionality

**Documentation:**

- `SKILL.md`: Complete skill documentation (6,000+ words)
  - When to use this skill
  - How it works
  - Data sources (sshsync CLI, Tailscale)
  - Detailed workflows for each operation type
  - Available scripts and functions
  - Error handling and validations
  - Performance and caching strategies
  - Usage examples
- `references/sshsync-guide.md`: Complete sshsync CLI reference
- `references/tailscale-integration.md`: Tailscale integration guide
- `README.md`: Installation and quick start guide
- `INSTALLATION.md`: Detailed setup tutorial
- `DECISIONS.md`: Architecture decisions and rationale

### Data Sources

**sshsync CLI:**
- Installation: `pip install sshsync`
- Configuration: `~/.config/sshsync/config.yaml`
- SSH config integration: `~/.ssh/config`
- Group-based host management
- Remote command execution with timeouts
- File push/pull operations (single or recursive)
- Status checking and connectivity validation

**Tailscale:**
- Zero-config VPN with WireGuard encryption
- MagicDNS for easy host addressing
- Built-in SSH capabilities
- Seamless integration with standard SSH
- Peer-to-peer connections
- Works across NATs and firewalls

### Coverage

**Operations:**
- Host status monitoring and availability checks
- Intelligent load-based task distribution
- Multi-host command execution (all hosts, groups, individual)
- File synchronization workflows (push/pull)
- Deployment pipelines (staging → production)
- Backup and sync workflows
- Rolling restarts with zero downtime
- Health checking across services

**Geographic Coverage:** All hosts in the Tailscale network (global)

**Temporal Coverage:** Real-time status and operations

### Known Limitations

**v1.0.0:**
- sshsync must be installed separately (`pip install sshsync`)
- Tailscale must be configured separately
- SSH keys must be set up manually on each host
- Load balancing uses simple metrics (CPU, memory, disk)
- No built-in monitoring dashboards (terminal output only)
- No persistence of operation history (logs only)
- Requires SSH config and sshsync config to be manually maintained

### Planned for v2.0

**Enhanced Features:**
- Automated SSH key distribution across hosts
- Built-in operation history and logging database
- Web dashboard for monitoring and operations
- Advanced load balancing with custom metrics
- Scheduled operations and cron integration
- Operation rollback capabilities
- Integration with configuration management tools (Ansible, Terraform)
- Cost tracking for cloud resources
- Performance metrics collection and visualization
- Alert system for failed operations
- Multi-tenancy support for team environments

**Integrations:**
- Prometheus metrics export
- Grafana dashboard templates
- Slack/Discord notifications
- CI/CD pipeline integration
- Container orchestration support (Docker, Kubernetes)

## [Unreleased]

### Planned

- Add support for Windows hosts (PowerShell remoting)
- Improve performance for large host groups (100+)
- Add SSH connection pooling for faster operations
- Implement operation queueing for long-running tasks
- Add support for custom validation plugins
- Expand coverage to Docker containers via SSH
- Add retry strategies with exponential backoff
- Implement circuit breaker pattern for failing hosts
458
DECISIONS.md
Normal file
@@ -0,0 +1,458 @@

# Architecture Decisions

Documentation of all technical decisions made for Tailscale SSH Sync Agent.

## Tool Selection

### Selected Tool: sshsync

**Justification:**

✅ **Advantages:**
- **Ready to use**: Available via `pip install sshsync`
- **Group management**: Built-in support for organizing hosts into groups
- **Integration**: Works with existing SSH config (`~/.ssh/config`)
- **Simple API**: Easy-to-wrap CLI interface
- **Parallel execution**: Commands run concurrently across hosts
- **File operations**: Push/pull with recursive support
- **Timeout handling**: Per-command timeouts for reliability
- **Active maintenance**: Regular updates and bug fixes
- **Python-based**: Easy to extend and integrate

✅ **Coverage:**
- All SSH-accessible hosts
- Works with any SSH server (Linux, macOS, BSD, etc.)
- Platform-agnostic (runs on any OS with Python)

✅ **Cost:**
- Free and open source
- No API keys or subscriptions required
- No rate limits

✅ **Documentation:**
- Clear command-line interface
- PyPI documentation available
- GitHub repository with examples

**Alternatives Considered:**

❌ **Fabric (Python library)**
- Pros: Pure Python, very flexible
- Cons: Requires writing more code, no built-in group management
- **Rejected because**: sshsync provides ready-made functionality

❌ **Ansible**
- Pros: Industry standard, very powerful
- Cons: Requires learning YAML playbooks, overkill for simple operations
- **Rejected because**: Too heavyweight for ad-hoc commands and file transfers

❌ **pssh (parallel-ssh)**
- Pros: Simple parallel SSH
- Cons: No group management, no built-in file transfer, less actively maintained
- **Rejected because**: sshsync has better group management and file operations

❌ **Custom SSH wrapper**
- Pros: Full control
- Cons: Reinventing the wheel, maintaining parallel execution logic
- **Rejected because**: sshsync already provides what we need

**Conclusion:**

sshsync is the best tool for this use case because it:
1. Provides group-based host management out of the box
2. Handles parallel execution automatically
3. Integrates with existing SSH configuration
4. Supports both command execution and file transfers
5. Requires minimal wrapper code

## Integration: Tailscale

**Decision**: Integrate with Tailscale for network connectivity

**Justification:**

✅ **Why Tailscale:**
- **Zero-config VPN**: No manual firewall/NAT configuration
- **Secure by default**: WireGuard encryption
- **Works everywhere**: Coffee shop, home, office, cloud
- **MagicDNS**: Easy addressing (machine-name.tailnet.ts.net)
- **Standard SSH**: Works with all SSH tools, including sshsync
- **No overhead**: Uses the regular SSH protocol over the Tailscale network

✅ **Integration approach:**
- Tailscale provides the network layer
- Standard SSH works over Tailscale
- sshsync operates normally using Tailscale hostnames/IPs
- No Tailscale-specific code needed in core operations
- Tailscale status checking for diagnostics

**Alternatives:**

❌ **Direct public internet + port forwarding**
- Cons: Complex firewall setup, security risks, doesn't work on mobile or restricted networks
- **Rejected because**: Requires too much configuration and has security concerns

❌ **Other VPNs (plain WireGuard, OpenVPN, ZeroTier)**
- Cons: More manual configuration, less zero-config
- **Rejected because**: Tailscale is easier to set up and use

**Conclusion:**

Tailscale + standard SSH is the optimal combination:
- Secure connectivity without configuration
- Works with existing SSH tools
- No vendor lock-in (other VPNs can be substituted if needed)

## Architecture

### Structure: Modular Scripts + Utilities

**Decision**: Separate concerns into focused modules

```
scripts/
├── sshsync_wrapper.py      # sshsync CLI interface
├── tailscale_manager.py    # Tailscale operations
├── load_balancer.py        # Task distribution logic
├── workflow_executor.py    # Common workflows
└── utils/
    ├── helpers.py          # Formatting, parsing
    └── validators/         # Input validation
```

**Justification:**

✅ **Modularity:**
- Each script has a single responsibility
- Easy to test independently
- Easy to extend without breaking others

✅ **Reusability:**
- Helpers used across all scripts
- Validators prevent duplicate validation logic
- Workflows compose lower-level operations

✅ **Maintainability:**
- Clear file organization
- Easy to locate specific functionality
- Separation of concerns

**Alternatives:**

❌ **Monolithic single script**
- Cons: Hard to test, hard to maintain, grows too large
- **Rejected because**: Doesn't scale well

❌ **Over-engineered class hierarchy**
- Cons: Unnecessary complexity for this use case
- **Rejected because**: Simple functions are sufficient

**Conclusion:**

A modular, functional approach provides a good balance of simplicity and maintainability.

### Validation Strategy: Multi-Layer

**Decision**: Validate at multiple layers

**Layers:**

1. **Parameter validation** (`parameter_validator.py`)
   - Validates user inputs before any operations
   - Prevents invalid hosts, groups, paths, etc.

2. **Host validation** (`host_validator.py`)
   - Validates that SSH configuration exists
   - Checks host reachability
   - Validates group membership

3. **Connection validation** (`connection_validator.py`)
   - Tests actual SSH connectivity
   - Verifies Tailscale status
   - Checks SSH key authentication
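
A minimal sketch of the first layer (illustrative; the names and rules are assumptions, not the actual `parameter_validator.py`):

```python
import re

class ValidationError(ValueError):
    """Raised when a user-supplied parameter fails validation."""

# Hostnames as accepted for ~/.ssh/config aliases: letters, digits, dots, dashes.
_HOST_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9.-]{0,252}$")

def validate_host(host: str, known_hosts: set[str]) -> str:
    """Reject malformed or unknown host names before any SSH work happens."""
    if not _HOST_RE.match(host):
        raise ValidationError(f"invalid host name: {host!r}")
    if host not in known_hosts:
        raise ValidationError(f"host {host!r} not found in SSH config")
    return host

def validate_timeout(timeout: float) -> float:
    """Keep timeouts in a sane range (0 s exclusive to 1 h inclusive)."""
    if not 0 < timeout <= 3600:
        raise ValidationError(f"timeout must be in (0, 3600], got {timeout}")
    return timeout
```

Called first, these checks fail fast with a specific message instead of letting a typo surface as an opaque SSH error later.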

**Justification:**

✅ **Early failure:**
- Catch errors before expensive operations
- Clear error messages at each layer

✅ **Comprehensive:**
- Multiple validation points catch different issues
- Reduces runtime failures

✅ **User-friendly:**
- Helpful error messages with suggestions
- Clear indication of what went wrong

**Conclusion:**

Multi-layer validation provides robust error handling and a good user experience.

## Load Balancing Strategy

### Decision: Simple Composite Score

**Formula:**
```python
score = (cpu_pct * 0.4) + (mem_pct * 0.3) + (disk_pct * 0.3)
```

**Weights:**
- CPU: 40% (most important for compute tasks)
- Memory: 30% (important for data processing)
- Disk: 30% (important for I/O operations)
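
The score can drive host selection as follows — a sketch consistent with the formula above (lower score = more spare capacity); the shipped `select_optimal_host()` may differ in detail:

```python
WEIGHTS = {"cpu_pct": 0.4, "mem_pct": 0.3, "disk_pct": 0.3}

def calculate_load_score(metrics: dict[str, float]) -> float:
    """Weighted composite load score; lower means less loaded."""
    return sum(metrics[k] * w for k, w in WEIGHTS.items())

def select_optimal_host(loads: dict[str, dict[str, float]]) -> str:
    """Pick the host whose composite score is currently lowest."""
    return min(loads, key=lambda h: calculate_load_score(loads[h]))

loads = {
    "prod-web-01": {"cpu_pct": 80.0, "mem_pct": 60.0, "disk_pct": 40.0},  # ≈ 62
    "prod-web-02": {"cpu_pct": 20.0, "mem_pct": 50.0, "disk_pct": 70.0},  # ≈ 44
}
print(select_optimal_host(loads))  # prod-web-02
```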

**Justification:**

✅ **Simple and effective:**
- Easy to understand
- Fast to calculate
- Works well for most workloads

✅ **Balanced:**
- Considers multiple resource types
- No single metric dominates

**Alternatives:**

❌ **CPU only**
- Cons: Ignores memory-bound and I/O-bound tasks
- **Rejected because**: Too narrow

❌ **Complex ML-based prediction**
- Cons: Overkill, slow, requires training data
- **Rejected because**: Unnecessary complexity

❌ **Fixed round-robin**
- Cons: Doesn't consider actual load
- **Rejected because**: Can overload already-busy hosts

**Conclusion:**

A simple weighted score provides good balance without complexity.

## Error Handling Philosophy

### Decision: Graceful Degradation + Clear Messages

**Principles:**

1. **Fail early with validation**: Catch errors before operations
2. **Isolate failures**: One host's failure doesn't stop the others
3. **Clear messages**: Tell the user exactly what went wrong and how to fix it
4. **Automatic retry**: Retry transient errors (network, timeout)
5. **Dry-run support**: Preview operations before execution
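
Principle 4 can be sketched as a small retry helper with exponential backoff (illustrative, not the shipped retry logic):

```python
import time

def retry(func, attempts: int = 3, base_delay: float = 0.5,
          transient=(ConnectionError, TimeoutError)):
    """Call func(), retrying transient errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            return func()
        except transient:
            if attempt == attempts - 1:
                raise  # out of attempts: propagate to the caller
            time.sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...

# A flaky operation that succeeds on the third call:
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("transient network hiccup")
    return "ok"

print(retry(flaky, base_delay=0.01))  # ok
```

Permanent errors (e.g. a `ValidationError`) are not in the `transient` tuple, so they surface immediately instead of being retried.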

**Implementation:**

```python
# Example error handling pattern
try:
    validate_host(host)
    validate_ssh_connection(host)
    result = execute_command(host, command)
except ValidationError as e:
    return {'error': str(e), 'suggestion': 'Fix: ...'}
except ConnectionError as e:
    return {'error': str(e), 'diagnostics': get_diagnostics(host)}
```

**Justification:**

✅ **Better UX:**
- Users know exactly what's wrong
- Suggestions help fix issues quickly

✅ **Reliability:**
- Automatic retry handles transient issues
- Dry-run prevents mistakes

✅ **Debugging:**
- Clear error messages speed up troubleshooting
- Diagnostics provide actionable information

**Conclusion:**

Graceful degradation with helpful messages creates a better user experience.

## Caching Strategy

**Decision**: Minimal caching for real-time accuracy

**What we cache:**
- Nothing (v1.0.0)

**Why no caching:**
- Host status changes frequently
- Load metrics change constantly
- Operations need real-time data
- Cache invalidation is complex

**Future consideration (v2.0):**
- Cache Tailscale status (60s TTL)
- Cache group configuration (5min TTL)
- Cache SSH config parsing (5min TTL)
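
If v2.0 adds these caches, a TTL cache can stay very small. A sketch (hypothetical; names and the injectable clock are assumptions for testability):

```python
import time

class TTLCache:
    """Tiny time-based cache: an entry expires ttl seconds after being set."""
    def __init__(self, ttl: float, clock=time.monotonic):
        self.ttl = ttl
        self._clock = clock
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if self._clock() - stored_at >= self.ttl:
            del self._data[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key, value):
        self._data[key] = (value, self._clock())

# e.g. a 60-second cache in front of `tailscale status` output
status_cache = TTLCache(ttl=60.0)
status_cache.set("status", {"peers": 5})
print(status_cache.get("status"))  # {'peers': 5}
```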

**Justification:**

✅ **Simplicity:**
- No cache invalidation logic needed
- No stale data issues

✅ **Accuracy:**
- Always get the current state
- No surprises from cached data

**Trade-off:**
- Slightly slower repeated operations
- More network calls

**Conclusion:**

For v1.0.0, simplicity and accuracy outweigh performance concerns. Real-time data is more valuable than speed.

## Testing Strategy

### Decision: Comprehensive Unit + Integration Tests

**Coverage:**

- **29 tests total:**
  - 11 integration tests (end-to-end workflows)
  - 11 helper tests (formatting, parsing, calculations)
  - 7 validation tests (input validation, safety checks)

**Test Philosophy:**

1. **Test real functionality**: Integration tests use the actual functions
2. **Test edge cases**: Validation tests cover error conditions
3. **Test helpers**: Ensure formatting/parsing works correctly
4. **Fast execution**: All tests run in < 10 seconds
5. **No external dependencies**: Tests don't require Tailscale or sshsync to be running

**Justification:**

✅ **Confidence:**
- Tests verify the code works as expected
- Catches regressions when modifying code

✅ **Documentation:**
- Tests show how to use the functions
- Examples of expected behavior

✅ **Reliability:**
- Production-ready code from v1.0.0

**Conclusion:**

Comprehensive testing ensures reliable code from the start.

## Performance Considerations

### Parallel Execution

**Decision**: Leverage sshsync's built-in parallelization

- sshsync runs commands concurrently across hosts automatically
- No need to implement custom threading/multiprocessing
- Timeouts apply per host, independently
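
What that buys — one task per host, failures isolated, a timeout per result — can be sketched conceptually in plain Python (purely illustrative; sshsync implements this internally and this agent does not ship such code):

```python
from concurrent.futures import ThreadPoolExecutor

def run_on_host(host: str) -> str:
    """Stand-in for a per-host SSH command (sshsync does the real work)."""
    return f"{host}: ok"

def run_parallel(hosts: list[str], per_host_timeout: float = 10.0) -> dict[str, str]:
    """Run one task per host concurrently; each result waits at most its own timeout."""
    with ThreadPoolExecutor(max_workers=max(len(hosts), 1)) as pool:
        futures = {host: pool.submit(run_on_host, host) for host in hosts}
        results = {}
        for host, future in futures.items():
            try:
                results[host] = future.result(timeout=per_host_timeout)
            except Exception as exc:  # one host failing doesn't stop the rest
                results[host] = f"error: {exc}"
        return results

print(run_parallel(["prod-web-01", "prod-web-02"]))
```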

**Trade-offs:**

✅ **Pros:**
- Simple to use
- Fast for large host groups
- No concurrency bugs to maintain

⚠️ **Cons:**
- Less control over the parallelism level
- Can overwhelm the network with too many concurrent connections

**Conclusion:**

Built-in parallelization is sufficient for most use cases. Custom control can be added in v2.0 if needed.

## Security Considerations

### SSH Key Authentication

**Decision**: Require SSH keys (no password auth)

**Justification:**

✅ **Security:**
- Keys are more secure than passwords
- Can't be brute-forced
- Can be revoked per host

✅ **Automation:**
- Non-interactive (no password prompts)
- Works in scripts and CI/CD

**Implementation:**
- Validators check that SSH key auth works
- Clear error messages guide users to set up keys
- Documentation explains SSH key setup

### Command Safety

**Decision**: Validate dangerous commands

**Dangerous patterns blocked:**
- `rm -rf /` (root deletion)
- `mkfs.*` (filesystem formatting)
- `dd.*of=/dev/` (direct disk writes)
- Fork bombs

**Override**: Use `allow_dangerous=True` to bypass
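
A minimal sketch of such a check, expressing the patterns above as regexes (illustrative; the shipped validator may use different patterns and error types):

```python
import re

DANGEROUS_PATTERNS = [
    r"\brm\s+-rf\s+/(\s|$)",   # rm -rf / (root deletion)
    r"\bmkfs(\.\w+)?\b",       # filesystem formatting
    r"\bdd\b.*\bof=/dev/",     # direct disk writes
    r":\(\)\s*\{.*\};\s*:",    # classic bash fork bomb
]

def check_command(command: str, allow_dangerous: bool = False) -> str:
    """Reject commands matching a known-dangerous pattern unless overridden."""
    if not allow_dangerous:
        for pattern in DANGEROUS_PATTERNS:
            if re.search(pattern, command):
                raise ValueError(f"blocked dangerous command: {command!r}")
    return command

print(check_command("df -h"))  # df -h
# check_command("rm -rf /") would raise ValueError;
# check_command("rm -rf /", allow_dangerous=True) passes it through.
```

A blocklist like this only catches known-bad shapes; it is a guard against accidents, not a sandbox.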

**Justification:**

✅ **Safety:**
- Prevents accidental destructive operations
- Dry-run provides a preview

✅ **Flexibility:**
- Dangerous commands can still run if explicitly allowed

**Conclusion:**

Safety by default, with an escape hatch for advanced users.

## Decisions Summary

| Decision | Choice | Rationale |
|----------|--------|-----------|
| **CLI Tool** | sshsync | Best balance of features, ease of use, and maintenance |
| **Network** | Tailscale | Zero-config secure VPN, works everywhere |
| **Architecture** | Modular scripts | Clear separation of concerns, maintainable |
| **Validation** | Multi-layer | Catch errors early with helpful messages |
| **Load Balancing** | Composite score | Simple, effective, considers multiple resources |
| **Caching** | None (v1.0) | Simplicity and real-time accuracy |
| **Testing** | 29 tests | Comprehensive coverage for reliability |
| **Security** | SSH keys + validation | Secure and automation-friendly |

## Trade-offs Accepted

1. **No caching** → Slightly slower, but always accurate
2. **sshsync dependency** → External tool, but saves development time
3. **SSH key requirement** → Setup needed, but more secure
4. **Simple load balancing** → Less sophisticated, but fast and easy to understand
5. **Terminal UI only** → No web dashboard, but simpler to develop and maintain

## Future Improvements

### v2.0 Considerations

1. **Add caching** for frequently accessed data (Tailscale status, groups)
2. **Web dashboard** for visualization and monitoring
3. **Operation history** database for an audit trail
4. **Advanced load balancing** with custom metrics
5. **Automated SSH key distribution** across hosts
6. **Integration with config management** tools (Ansible, Terraform)
7. **Container support** via SSH to Docker containers
8. **Custom validation plugins** for domain-specific checks

All decisions prioritize **simplicity**, **security**, and **maintainability** for v1.0.0.
707
INSTALLATION.md
Normal file
@@ -0,0 +1,707 @@
|
|||||||
|
# Installation Guide
|
||||||
|
|
||||||
|
Complete step-by-step tutorial for setting up Tailscale SSH Sync Agent.
|
||||||
|
|
||||||
|
## Table of Contents
|
||||||
|
|
||||||
|
1. [Prerequisites](#prerequisites)
|
||||||
|
2. [Step 1: Install Tailscale](#step-1-install-tailscale)
|
||||||
|
3. [Step 2: Install sshsync](#step-2-install-sshsync)
|
||||||
|
4. [Step 3: Configure SSH](#step-3-configure-ssh)
|
||||||
|
5. [Step 4: Configure sshsync Groups](#step-4-configure-sshsync-groups)
|
||||||
|
6. [Step 5: Install Agent](#step-5-install-agent)
|
||||||
|
7. [Step 6: Test Installation](#step-6-test-installation)
|
||||||
|
8. [Troubleshooting](#troubleshooting)
|
||||||
|
|
||||||
|
## Prerequisites
|
||||||
|
|
||||||
|
Before you begin, ensure you have:
|
||||||
|
|
||||||
|
- **Operating System**: macOS, Linux, or BSD
|
||||||
|
- **Python**: Version 3.10 or higher
|
||||||
|
- **pip**: Python package installer
|
||||||
|
- **Claude Code**: Installed and running
|
||||||
|
- **Remote machines**: At least one machine you want to manage
|
||||||
|
- **SSH access**: Ability to SSH to remote machines
|
||||||
|
|
||||||
|
**Check Python version**:
|
||||||
|
```bash
|
||||||
|
python3 --version
|
||||||
|
# Should show: Python 3.10.x or higher
|
||||||
|
```
|
||||||
|
|
||||||
|
**Check pip**:
|
||||||
|
```bash
|
||||||
|
pip3 --version
|
||||||
|
# Should show: pip xx.x.x from ...
|
||||||
|
```
|
||||||
|
|
||||||
|
## Step 1: Install Tailscale
|
||||||
|
|
||||||
|
Tailscale provides secure networking between your machines.
|
||||||
|
|
||||||
|
### macOS
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Install via Homebrew
|
||||||
|
brew install tailscale
|
||||||
|
|
||||||
|
# Start Tailscale
|
||||||
|
sudo tailscale up
|
||||||
|
|
||||||
|
# Follow authentication link in terminal
|
||||||
|
# This will open browser to log in
|
||||||
|
```
|
||||||
|
|
||||||
|
### Linux (Ubuntu/Debian)
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Install Tailscale
|
||||||
|
curl -fsSL https://tailscale.com/install.sh | sh
|
||||||
|
|
||||||
|
# Start and authenticate
|
||||||
|
sudo tailscale up
|
||||||
|
|
||||||
|
# Follow authentication link
|
||||||
|
```
|
||||||
|
|
||||||
|
### Linux (Fedora/RHEL)
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Add repository
|
||||||
|
sudo dnf config-manager --add-repo https://pkgs.tailscale.com/stable/fedora/tailscale.repo
|
||||||
|
|
||||||
|
# Install
|
||||||
|
sudo dnf install tailscale
|
||||||
|
|
||||||
|
# Enable and start
|
||||||
|
sudo systemctl enable --now tailscaled
|
||||||
|
sudo tailscale up
|
||||||
|
```
|
||||||
|
|
||||||
|
### Verify Installation
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Check Tailscale status
|
||||||
|
tailscale status
|
||||||
|
|
||||||
|
# Should show list of machines in your tailnet
|
||||||
|
# Example output:
|
||||||
|
# 100.64.1.10 homelab-1 user@ linux -
|
||||||
|
# 100.64.1.11 laptop user@ macOS -
|
||||||
|
```
|
||||||
|
|
||||||
|
**Important**: Install and authenticate Tailscale on **all machines** you want to manage.
|
||||||
|
|
||||||
|
## Step 2: Install sshsync
|
||||||
|
|
||||||
|
sshsync is the CLI tool for managing SSH operations across multiple hosts.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Install via pip
|
||||||
|
pip3 install sshsync
|
||||||
|
|
||||||
|
# Or use pipx for isolated installation
|
||||||
|
pipx install sshsync
|
||||||
|
```
|
||||||
|
|
||||||
|
### Verify Installation
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Check version
|
||||||
|
sshsync --version
|
||||||
|
|
||||||
|
# Should show: sshsync, version x.x.x
|
||||||
|
```
|
||||||
|
|
||||||
|
### Common Installation Issues
|
||||||
|
|
||||||
|
**Issue**: `pip3: command not found`
|
||||||
|
|
||||||
|
**Solution**:
|
||||||
|
```bash
|
||||||
|
# macOS
|
||||||
|
brew install python3
|
||||||
|
|
||||||
|
# Linux (Ubuntu/Debian)
|
||||||
|
sudo apt install python3-pip
|
||||||
|
|
||||||
|
# Linux (Fedora/RHEL)
|
||||||
|
sudo dnf install python3-pip
|
||||||
|
```
|
||||||
|
|
||||||
|
**Issue**: Permission denied during install
|
||||||
|
|
||||||
|
**Solution**:
|
||||||
|
```bash
|
||||||
|
# Install for current user only
|
||||||
|
pip3 install --user sshsync
|
||||||
|
|
||||||
|
# Or use pipx
|
||||||
|
pip3 install --user pipx
|
||||||
|
pipx install sshsync
|
||||||
|
```
|
||||||
|
|
||||||
|
## Step 3: Configure SSH

SSH configuration defines how to connect to each machine.

### Step 3.1: Generate SSH Keys (if you don't have them)

```bash
# Generate ed25519 key (recommended)
ssh-keygen -t ed25519 -C "your_email@example.com"

# Press Enter to use default location (~/.ssh/id_ed25519)
# Enter passphrase (or leave empty for no passphrase)
```

**Output**:
```
Your identification has been saved in /Users/you/.ssh/id_ed25519
Your public key has been saved in /Users/you/.ssh/id_ed25519.pub
```

### Step 3.2: Copy Public Key to Remote Machines

For each remote machine:

```bash
# Copy SSH key to remote
ssh-copy-id user@machine-hostname

# Example:
ssh-copy-id admin@100.64.1.10
```

**Manual method** (if ssh-copy-id doesn't work):

```bash
# Display public key
cat ~/.ssh/id_ed25519.pub

# SSH to remote machine
ssh user@remote-host

# On remote machine:
mkdir -p ~/.ssh
chmod 700 ~/.ssh
echo "your-public-key-here" >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
exit
```

### Step 3.3: Test SSH Connection

```bash
# Test connection (should not ask for password)
ssh user@remote-host "hostname"

# If successful, should print remote hostname
```

### Step 3.4: Create SSH Config File

Edit `~/.ssh/config`:

```bash
vim ~/.ssh/config
```

**Add host entries**:

```
# Production servers
Host prod-web-01
    HostName prod-web-01.tailnet.ts.net
    User deploy
    IdentityFile ~/.ssh/id_ed25519
    Port 22

Host prod-web-02
    HostName 100.64.1.21
    User deploy
    IdentityFile ~/.ssh/id_ed25519

Host prod-db-01
    HostName 100.64.1.30
    User deploy
    IdentityFile ~/.ssh/id_ed25519

# Development
Host dev-laptop
    HostName dev-laptop.tailnet.ts.net
    User developer
    IdentityFile ~/.ssh/id_ed25519

Host dev-desktop
    HostName 100.64.1.40
    User developer
    IdentityFile ~/.ssh/id_ed25519

# Homelab
Host homelab-1
    HostName 100.64.1.10
    User admin
    IdentityFile ~/.ssh/id_ed25519

Host homelab-2
    HostName 100.64.1.11
    User admin
    IdentityFile ~/.ssh/id_ed25519
```

**Important fields**:
- **Host**: Alias you'll use (e.g., "homelab-1")
- **HostName**: Actual hostname or IP (Tailscale hostname or IP)
- **User**: SSH username on the remote machine
- **IdentityFile**: Path to SSH private key

### Step 3.5: Set Correct Permissions

```bash
# SSH config should be readable only by you
chmod 600 ~/.ssh/config

# SSH directory permissions
chmod 700 ~/.ssh

# Private key permissions
chmod 600 ~/.ssh/id_ed25519

# Public key permissions
chmod 644 ~/.ssh/id_ed25519.pub
```

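These permission checks can also be scripted. A minimal Python sketch (standard library only; the file list mirrors the layout assumed by this guide) that flags anything in `~/.ssh` looser than the recommended mode:

```python
import os
import stat

# Recommended maximum permission bits per path (layout assumed by this guide)
EXPECTED = {
    os.path.expanduser("~/.ssh"): 0o700,
    os.path.expanduser("~/.ssh/config"): 0o600,
    os.path.expanduser("~/.ssh/id_ed25519"): 0o600,
    os.path.expanduser("~/.ssh/id_ed25519.pub"): 0o644,
}


def check_permissions(expected=EXPECTED):
    """Return (path, actual_mode) pairs for paths looser than recommended."""
    problems = []
    for path, max_mode in expected.items():
        if not os.path.exists(path):
            continue  # skip files you haven't created yet
        actual = stat.S_IMODE(os.stat(path).st_mode)
        # Flag any permission bit set beyond the recommended mode
        if actual & ~max_mode:
            problems.append((path, oct(actual)))
    return problems


if __name__ == "__main__":
    for path, mode in check_permissions():
        print(f"too permissive: {path} ({mode})")
```

Silent output means every existing file is at or below the recommended mode.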
### Step 3.6: Verify All Hosts

Test each host in your config:

```bash
# Test each host
ssh homelab-1 "echo 'Connection successful'"
ssh prod-web-01 "echo 'Connection successful'"
ssh dev-laptop "echo 'Connection successful'"

# Should connect without asking for password
```

## Step 4: Configure sshsync Groups

Groups organize your hosts for easy management.

### Step 4.1: Initialize sshsync Configuration

```bash
# Sync hosts and create groups
sshsync sync
```

**What this does**:
1. Reads all hosts from `~/.ssh/config`
2. Prompts you to assign hosts to groups
3. Creates `~/.config/sshsync/config.yaml`

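Since `sshsync sync` starts from the `Host` entries in `~/.ssh/config`, it can help to preview what it will find. A stdlib-only sketch that lists concrete host aliases, skipping wildcard patterns like `Host *` (the parsing here is naive and illustrative, not sshsync's actual parser):

```python
def list_host_aliases(config_text):
    """Return concrete Host aliases from SSH config text, skipping wildcards."""
    aliases = []
    for line in config_text.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith("host "):
            # A Host line may name several aliases; skip patterns like "*"
            for alias in stripped.split()[1:]:
                if "*" not in alias and "?" not in alias:
                    aliases.append(alias)
    return aliases


if __name__ == "__main__":
    import os

    path = os.path.expanduser("~/.ssh/config")
    if os.path.exists(path):
        with open(path) as f:
            print(list_host_aliases(f.read()))
```

Every alias this prints is a candidate for group assignment in the next step.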
### Step 4.2: Follow Interactive Prompts

```
Found 7 ungrouped hosts:
1. homelab-1
2. homelab-2
3. prod-web-01
4. prod-web-02
5. prod-db-01
6. dev-laptop
7. dev-desktop

Assign groups now? [Y/n]: Y

Enter group name for homelab-1 (or skip): homelab
Enter group name for homelab-2 (or skip): homelab
Enter group name for prod-web-01 (or skip): production,web
Enter group name for prod-web-02 (or skip): production,web
Enter group name for prod-db-01 (or skip): production,database
Enter group name for dev-laptop (or skip): development
Enter group name for dev-desktop (or skip): development
```

**Tips**:
- Hosts can belong to multiple groups (separate with commas)
- Use meaningful group names (production, development, web, database, homelab)
- Skip hosts you don't want to group yet

### Step 4.3: Verify Configuration

```bash
# View generated config
cat ~/.config/sshsync/config.yaml
```

**Expected output**:
```yaml
groups:
  production:
    - prod-web-01
    - prod-web-02
    - prod-db-01
  web:
    - prod-web-01
    - prod-web-02
  database:
    - prod-db-01
  development:
    - dev-laptop
    - dev-desktop
  homelab:
    - homelab-1
    - homelab-2
```

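One useful cross-check is that every group member in the generated YAML also exists as a `Host` alias in `~/.ssh/config`. A naive stdlib-only sketch of that check, assuming the simple file layouts shown above (a real check could use a proper YAML parser instead):

```python
def ssh_aliases(ssh_config_text):
    """Collect concrete Host aliases from SSH config text."""
    aliases = set()
    for line in ssh_config_text.splitlines():
        parts = line.strip().split()
        if parts and parts[0].lower() == "host":
            aliases.update(a for a in parts[1:] if "*" not in a)
    return aliases


def group_members(sshsync_yaml_text):
    """Collect hostnames listed under any group ("- name" lines)."""
    members = set()
    for line in sshsync_yaml_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("- "):
            members.add(stripped[2:].strip())
    return members


def unknown_members(ssh_config_text, sshsync_yaml_text):
    """Group members with no matching SSH config alias."""
    return group_members(sshsync_yaml_text) - ssh_aliases(ssh_config_text)
```

An empty result means every group member resolves to a configured host; anything returned would fail at connect time.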
### Step 4.4: Test sshsync

```bash
# List hosts
sshsync ls

# List with status
sshsync ls --with-status

# Test command execution
sshsync all "hostname"

# Test group execution
sshsync group homelab "uptime"
```

## Step 5: Install Agent

### Step 5.1: Navigate to Agent Directory

```bash
cd /path/to/tailscale-sshsync-agent
```

### Step 5.2: Verify Agent Structure

```bash
# List files
ls -la

# Should see:
# .claude-plugin/
# scripts/
# tests/
# references/
# SKILL.md
# README.md
# VERSION
# CHANGELOG.md
# etc.
```

### Step 5.3: Validate marketplace.json

```bash
# Check JSON is valid
python3 -c "import json; json.load(open('.claude-plugin/marketplace.json')); print('✅ Valid JSON')"

# Should output: ✅ Valid JSON
```

### Step 5.4: Install via Claude Code

In Claude Code:

```
/plugin marketplace add /absolute/path/to/tailscale-sshsync-agent
```

**Example**:
```
/plugin marketplace add /Users/you/tailscale-sshsync-agent
```

**Expected output**:
```
✓ Plugin installed successfully
✓ Skill: tailscale-sshsync-agent
✓ Description: Manages distributed workloads and file sharing...
```

### Step 5.5: Verify Installation

In Claude Code:

```
"Which of my machines are online?"
```

**Expected response**: Agent should activate and check your Tailscale network.

## Step 6: Test Installation

### Test 1: Host Status

**Query**:
```
"Which of my machines are online?"
```

**Expected**: List of hosts with online/offline status

### Test 2: List Groups

**Query**:
```
"What groups do I have configured?"
```

**Expected**: List of your sshsync groups

### Test 3: Execute Command

**Query**:
```
"Check disk space on homelab machines"
```

**Expected**: Disk usage for hosts in the homelab group

### Test 4: Dry-Run

**Query**:
```
"Show me what would happen if I ran 'uptime' on all machines (dry-run)"
```

**Expected**: Preview without execution

### Test 5: Run Test Suite

```bash
cd /path/to/tailscale-sshsync-agent

# Run all tests
python3 tests/test_integration.py

# Should show:
# Results: 11/11 passed
# 🎉 All tests passed!
```

## Troubleshooting

### Agent Not Activating

**Symptoms**: Agent doesn't respond to queries about machines/hosts

**Solutions**:

1. **Check installation**:
   ```
   /plugin list
   ```
   Should show `tailscale-sshsync-agent` in the list.

2. **Reinstall**:
   ```
   /plugin remove tailscale-sshsync-agent
   /plugin marketplace add /path/to/tailscale-sshsync-agent
   ```

3. **Check marketplace.json**:
   ```bash
   cat .claude-plugin/marketplace.json
   # Verify "description" field matches SKILL.md frontmatter
   ```

### SSH Connection Fails

**Symptoms**: "Permission denied" or "Connection refused"

**Solutions**:

1. **Check SSH key**:
   ```bash
   ssh-add -l
   # Should list your SSH key
   ```

   If not listed:
   ```bash
   ssh-add ~/.ssh/id_ed25519
   ```

2. **Test SSH directly**:
   ```bash
   ssh -v hostname
   # -v shows verbose debug info
   ```

3. **Verify authorized_keys on remote**:
   ```bash
   ssh hostname "cat ~/.ssh/authorized_keys"
   # Should contain your public key
   ```

### Tailscale Connection Issues

**Symptoms**: Hosts show as offline in Tailscale

**Solutions**:

1. **Check Tailscale status**:
   ```bash
   tailscale status
   ```

2. **Restart Tailscale**:
   ```bash
   # macOS
   brew services restart tailscale

   # Linux
   sudo systemctl restart tailscaled
   ```

3. **Re-authenticate**:
   ```bash
   sudo tailscale up
   ```

### sshsync Errors

**Symptoms**: "sshsync: command not found"

**Solutions**:

1. **Reinstall sshsync**:
   ```bash
   pip3 install --upgrade sshsync
   ```

2. **Check PATH**:
   ```bash
   which sshsync
   # Should show path to sshsync
   ```

   If not found, add to PATH:
   ```bash
   echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
   source ~/.bashrc
   ```

### Config File Issues

**Symptoms**: "Group not found" or "Host not found"

**Solutions**:

1. **Verify SSH config**:
   ```bash
   cat ~/.ssh/config
   # Check host aliases are correct
   ```

2. **Verify sshsync config**:
   ```bash
   cat ~/.config/sshsync/config.yaml
   # Check groups are defined
   ```

3. **Re-sync**:
   ```bash
   sshsync sync
   ```

### Test Failures

**Symptoms**: Tests fail with errors

**Solutions**:

1. **Check dependencies**:
   ```bash
   pip3 list | grep -E "sshsync|pyyaml"
   ```

2. **Check Python version**:
   ```bash
   python3 --version
   # Must be 3.10+
   ```

3. **Run tests individually**:
   ```bash
   python3 tests/test_helpers.py
   python3 tests/test_validation.py
   python3 tests/test_integration.py
   ```

## Post-Installation

### Recommended Next Steps

1. **Create more groups** for better organization:
   ```bash
   sshsync gadd staging
   sshsync gadd backup-servers
   ```

2. **Test file operations**:
   ```
   "Push test file to homelab machines (dry-run)"
   ```

3. **Set up automation**:
   - Create scripts for common tasks
   - Schedule backups
   - Automate deployments

4. **Review documentation**:
   - Read `references/sshsync-guide.md` for advanced sshsync usage
   - Read `references/tailscale-integration.md` for Tailscale tips

### Security Checklist

- ✅ SSH keys are password-protected
- ✅ SSH config has correct permissions (600)
- ✅ Private keys have correct permissions (600)
- ✅ Tailscale ACLs configured (if using teams)
- ✅ Only necessary hosts have SSH access
- ✅ Regularly review connected devices in Tailscale

## Summary

You now have:

1. ✅ Tailscale installed and connected
2. ✅ sshsync installed and configured
3. ✅ SSH keys set up on all machines
4. ✅ SSH config with all hosts
5. ✅ sshsync groups organized
6. ✅ Agent installed in Claude Code
7. ✅ Tests passing

**Start using**:

```
"Which machines are online?"
"Run this on the least loaded machine"
"Push files to production servers"
"Deploy to staging then production"
```

For more examples, see README.md and SKILL.md.

## Support

If you encounter issues:

1. Check this troubleshooting section
2. Review references/ for detailed guides
3. Check DECISIONS.md for architecture rationale
4. Run tests to verify installation

Happy automating! 🚀
3
README.md
Normal file
@@ -0,0 +1,3 @@
# tailscale-sshsync-agent

Manages distributed workloads and file sharing across Tailscale SSH-connected machines. Automates remote command execution, intelligent load balancing, file synchronization workflows, host health monitoring, and multi-machine orchestration using sshsync.
117
plugin.lock.json
Normal file
@@ -0,0 +1,117 @@
{
  "$schema": "internal://schemas/plugin.lock.v1.json",
  "pluginId": "gh:Human-Frontier-Labs-Inc/human-frontier-labs-marketplace:plugins/tailscale-sshsync-agent",
  "normalized": {
    "repo": null,
    "ref": "refs/tags/v20251128.0",
    "commit": "3a7cbe9632f245c6b9a4c4bf2731da65c857a7f4",
    "treeHash": "832bc62ce02c782663e60a2eb97932166fef39c681a9ca01b9d5dc170860b805",
    "generatedAt": "2025-11-28T10:11:41.356928Z",
    "toolVersion": "publish_plugins.py@0.2.0"
  },
  "origin": {
    "remote": "git@github.com:zhongweili/42plugin-data.git",
    "branch": "master",
    "commit": "aa1497ed0949fd50e99e70d6324a29c5b34f9390",
    "repoRoot": "/Users/zhongweili/projects/openmind/42plugin-data"
  },
  "manifest": {
    "name": "tailscale-sshsync-agent",
    "description": "Manages distributed workloads and file sharing across Tailscale SSH-connected machines. Automates remote command execution, intelligent load balancing, file synchronization workflows, host health monitoring, and multi-machine orchestration using sshsync.",
    "version": null
  },
  "content": {
    "files": [
      {
        "path": "CHANGELOG.md",
        "sha256": "74dbda933868b7cab410144a831b43e4f1ae6161f2402edcb068a8232c50bfe4"
      },
      {
        "path": "README.md",
        "sha256": "470f165d8ac61a8942e6fb3568c49febb7f803bfa0f4010d14e09f807c34c88e"
      },
      {
        "path": "VERSION",
        "sha256": "59854984853104df5c353e2f681a15fc7924742f9a2e468c29af248dce45ce03"
      },
      {
        "path": "SKILL.md",
        "sha256": "31c8f237f9b3617c32c6ff381ae83d427b50eb0877d3763d9826e00ece6618f1"
      },
      {
        "path": "INSTALLATION.md",
        "sha256": "9313ea1bbb0a03e4c078c41b207f3febe800cd38eb57b7205c7b5188238ca46a"
      },
      {
        "path": "DECISIONS.md",
        "sha256": "59549e84aaa8e32d4bdf64d46855714f5cde7f061906e1c74976658883472c82"
      },
      {
        "path": "references/tailscale-integration.md",
        "sha256": "6553b3ceeaca5118a7b005368223ea4b3ab70eb2492ccaf5c2b7f7758b65dd42"
      },
      {
        "path": "references/sshsync-guide.md",
        "sha256": "697ce0b56eda258732a0b924f821e9e24eb6b977934153bdd2045be961e58de2"
      },
      {
        "path": "tests/test_validation.py",
        "sha256": "716ae0d2e86f0e6657903aef6bb714fbd3b5b72d3b109fab4da3f75f90cc2c0a"
      },
      {
        "path": "tests/test_helpers.py",
        "sha256": "3be88e30825414eb3ade048b766c84995dc98a01cb7236ce75201716179279a8"
      },
      {
        "path": "tests/test_integration.py",
        "sha256": "12f7cb857fda23531a9c74caf072cf73b739672b1e99c55f42a2ef8e11238523"
      },
      {
        "path": "scripts/load_balancer.py",
        "sha256": "9d87476562ac848a026e42116e381f733d520e9330da33de3d905585af14398d"
      },
      {
        "path": "scripts/tailscale_manager.py",
        "sha256": "4b75ebb9423d221b9788eb9352b274e0256c101185de11064a7b4cb00684016e"
      },
      {
        "path": "scripts/workflow_executor.py",
        "sha256": "9f23f3bb421e940766e65949e6efa485a313115e297d4c5f1088589155a7bac1"
      },
      {
        "path": "scripts/sshsync_wrapper.py",
        "sha256": "fc2062ebbc72e3ddc6c6bfb5f22019b23050f5c2ed9ac35c315018a96871fb19"
      },
      {
        "path": "scripts/utils/helpers.py",
        "sha256": "b01979ee56ab92037b8f8054a883124d600b8337cf461855092b866091aed24a"
      },
      {
        "path": "scripts/utils/validators/connection_validator.py",
        "sha256": "9ac82108e69690b74d9aa89ca51f7d06fe860e880aaa1983d08242d7199d1601"
      },
      {
        "path": "scripts/utils/validators/parameter_validator.py",
        "sha256": "157dfcb7f1937df88344647a37a124d52e1de1b992b72c9b9e69d3b717ca0195"
      },
      {
        "path": "scripts/utils/validators/__init__.py",
        "sha256": "2d109ad1b5d253578a095c8354159fdf9318154b4f62d9b16eaa1a88a422382d"
      },
      {
        "path": "scripts/utils/validators/host_validator.py",
        "sha256": "79cab42587435a799349ba8a562c4ec0f3d54f3f2790562c894c6289beade6d6"
      },
      {
        "path": ".claude-plugin/plugin.json",
        "sha256": "0ec7466bbf2e8dc2fe1607feff0cc0ef0ebebf44ff54f17dcce96255e2c21215"
      }
    ],
    "dirSha256": "832bc62ce02c782663e60a2eb97932166fef39c681a9ca01b9d5dc170860b805"
  },
  "security": {
    "scannedAt": null,
    "scannerVersion": null,
    "flags": []
  }
}
466
references/sshsync-guide.md
Normal file
@@ -0,0 +1,466 @@
# sshsync CLI Tool Guide

Complete reference for using sshsync with Tailscale SSH Sync Agent.

## Table of Contents

1. [Installation](#installation)
2. [Configuration](#configuration)
3. [Core Commands](#core-commands)
4. [Advanced Usage](#advanced-usage)
5. [Troubleshooting](#troubleshooting)

## Installation

### Via pip

```bash
pip install sshsync
```

### Verify Installation

```bash
sshsync --version
```

## Configuration

### 1. SSH Config Setup

sshsync uses your existing SSH configuration. Edit `~/.ssh/config`:

```
# Example host entries
Host homelab-1
    HostName 100.64.1.10
    User admin
    IdentityFile ~/.ssh/id_ed25519
    Port 22

Host prod-web-01
    HostName 100.64.1.20
    User deploy
    IdentityFile ~/.ssh/id_rsa
    Port 22

Host dev-laptop
    HostName 100.64.1.30
    User developer
```

**Important Notes**:
- sshsync uses the **Host alias** (e.g., "homelab-1"), not the actual hostname
- Ensure SSH key authentication is configured
- Test each host with `ssh host-alias` before using it with sshsync

### 2. Initialize sshsync Configuration

First run:

```bash
sshsync sync
```

This will:
1. Read all hosts from your SSH config
2. Prompt you to assign hosts to groups
3. Create `~/.config/sshsync/config.yaml`

### 3. sshsync Config File

Location: `~/.config/sshsync/config.yaml`

Structure:
```yaml
groups:
  production:
    - prod-web-01
    - prod-web-02
    - prod-db-01
  development:
    - dev-laptop
    - dev-desktop
  homelab:
    - homelab-1
    - homelab-2
```

**Manual Editing**:
- Groups are arbitrary labels (use what makes sense for you)
- Hosts can belong to multiple groups
- Use consistent host aliases from SSH config

## Core Commands

### List Hosts

```bash
# List all configured hosts
sshsync ls

# List with online/offline status
sshsync ls --with-status
```

**Output Example**:
```
Host          Status
homelab-1     online
homelab-2     offline
prod-web-01   online
dev-laptop    online
```

### Execute Commands

#### On All Hosts

```bash
# Execute on all configured hosts
sshsync all "df -h"

# With custom timeout (default: 10s)
sshsync all --timeout 20 "systemctl status nginx"

# Dry-run (preview without executing)
sshsync all --dry-run "reboot"
```

#### On Specific Group

```bash
# Execute on group
sshsync group production "uptime"

# With timeout
sshsync group web-servers --timeout 30 "npm run build"

# Filter with regex
sshsync group production --regex "web-.*" "df -h"
```

**Regex Filtering**:
- Filters group members by alias matching the pattern
- Uses Python regex syntax
- Example: `--regex "web-0[1-3]"` matches web-01, web-02, web-03

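Because the filter takes Python regex syntax, you can preview which aliases a pattern will select before executing anything. An illustrative sketch with a hypothetical host list (whether sshsync anchors the match is an implementation detail; this sketch uses `re.fullmatch`):

```python
import re


def filter_hosts(hosts, pattern):
    """Return the host aliases whose full alias matches the regex pattern."""
    return [h for h in hosts if re.fullmatch(pattern, h)]


# Hypothetical group members
hosts = ["web-01", "web-02", "web-03", "web-10", "db-01"]
print(filter_hosts(hosts, r"web-0[1-3]"))  # → ['web-01', 'web-02', 'web-03']
```

Note that `web-10` and `db-01` are excluded, matching the example above.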
### File Transfer

#### Push Files

```bash
# Push to specific host
sshsync push --host web-01 ./app /var/www/app

# Push to group
sshsync push --group production ./dist /var/www/app

# Push to all hosts
sshsync push --all ./config.yml /etc/app/config.yml

# Recursive push (directory with contents)
sshsync push --group web --recurse ./app /var/www/app

# Dry-run
sshsync push --group production --dry-run ./dist /var/www/app
```

**Important**:
- Local path comes first, remote path second
- Use `--recurse` for directories
- Dry-run shows what would be transferred without executing

#### Pull Files

```bash
# Pull from specific host
sshsync pull --host db-01 /var/log/mysql/error.log ./logs/

# Pull from group (creates separate directories per host)
sshsync pull --group databases /var/backups ./backups/

# Recursive pull
sshsync pull --host web-01 --recurse /var/www/app ./backup/
```

**Pull Behavior**:
- When pulling from groups, a subdirectory is created per host
- Use `--recurse` to pull entire directory trees
- Destination directory is created if it doesn't exist

### Group Management

#### Add Hosts to Group

```bash
# Interactive: prompts to select hosts
sshsync gadd production

# Follow prompts to select which hosts to add
```

#### Add Host to SSH Config

```bash
# Interactive host addition
sshsync hadd

# Follow prompts for:
# - Host alias
# - Hostname/IP
# - Username
# - Port (optional)
# - Identity file (optional)
```

#### Sync Ungrouped Hosts

```bash
# Assign groups to hosts not yet in any group
sshsync sync
```

## Advanced Usage

### Parallel Execution

sshsync automatically executes commands in parallel across hosts:

```bash
# This runs simultaneously on all hosts in the group
sshsync group web-servers "npm run build"
```

**Performance**:
- Commands execute concurrently
- Results are collected as they complete
- Timeout applies per-host independently

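This fan-out pattern is easy to sketch. A minimal Python illustration of concurrent per-host execution where one slow or failed host doesn't block the others (`run_on_host` is a stand-in, not sshsync's internals):

```python
from concurrent.futures import ThreadPoolExecutor


def run_on_host(host, command):
    """Stand-in for an SSH call; replace with a real remote execution."""
    return f"{host}: ran {command!r}"


def run_everywhere(hosts, command, timeout=10):
    """Run a command concurrently on every host; each result waits with its own timeout."""
    results = {}
    with ThreadPoolExecutor(max_workers=max(len(hosts), 1)) as pool:
        # Submit all hosts up front so they run concurrently
        futures = {h: pool.submit(run_on_host, h, command) for h in hosts}
        for host, future in futures.items():
            try:
                results[host] = future.result(timeout=timeout)
            except Exception as exc:  # a failed or timed-out host doesn't stop the rest
                results[host] = f"error: {exc}"
    return results
```

sshsync's real behavior may differ in detail; the point is only that each host's result (or error) is collected independently.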
### Timeout Strategies

Different operations need different timeouts:

```bash
# Quick checks (5-10s)
sshsync all --timeout 5 "hostname"

# Moderate operations (30-60s)
sshsync group web --timeout 60 "npm install"

# Long-running tasks (300s+)
sshsync group build --timeout 300 "docker build ."
```

**Timeout Best Practices**:
- Set the timeout 20-30% longer than the expected duration
- Use dry-run first to estimate timing
- Increase the timeout for network-intensive operations

### Combining with Other Tools

#### With xargs

```bash
# Get list of online hosts
sshsync ls --with-status | grep online | awk '{print $1}' | xargs -I {} echo "Host {} is online"
```

#### With jq (if using JSON output)

```bash
# Parse structured output (if sshsync supports a --json flag)
sshsync ls --json | jq '.hosts[] | select(.status=="online") | .name'
```

#### In Shell Scripts

```bash
#!/bin/bash

# Deploy script using sshsync
echo "Deploying to staging..."
sshsync push --group staging --recurse ./dist /var/www/app

if [ $? -eq 0 ]; then
    echo "Staging deployment successful"

    echo "Running tests..."
    sshsync group staging "cd /var/www/app && npm test"

    if [ $? -eq 0 ]; then
        echo "Tests passed, deploying to production..."
        sshsync push --group production --recurse ./dist /var/www/app
    fi
fi
```

## Troubleshooting

### Common Issues

#### 1. "Permission denied (publickey)"

**Cause**: SSH key not configured or not added to ssh-agent

**Solution**:
```bash
# Add SSH key to agent
ssh-add ~/.ssh/id_ed25519

# Verify it's added
ssh-add -l

# Copy public key to remote
ssh-copy-id user@host
```

#### 2. "Connection timed out"

**Cause**: Host is offline or there is a network issue

**Solution**:
```bash
# Test connectivity
ping hostname

# Test Tailscale specifically
tailscale ping hostname

# Check Tailscale status
tailscale status
```

#### 3. "Host not found in SSH config"

**Cause**: Host alias not in `~/.ssh/config`

**Solution**:
```bash
# Add host to SSH config
sshsync hadd

# Or manually edit ~/.ssh/config
vim ~/.ssh/config
```

#### 4. "Group not found"

**Cause**: Group doesn't exist in sshsync config

**Solution**:
```bash
# Add hosts to a new group
sshsync gadd mygroup

# Or manually edit the config
vim ~/.config/sshsync/config.yaml
```

#### 5. File Transfer Fails

**Cause**: Insufficient permissions, disk space, or the path doesn't exist

**Solution**:
```bash
# Check remote disk space
sshsync group production "df -h"

# Check remote path exists
sshsync group production "ls -ld /target/path"

# Check permissions
sshsync group production "ls -la /target/path"
```

### Debug Mode

sshsync doesn't have a built-in verbose mode, so debug the underlying SSH connection directly:

```bash
# Increase SSH verbosity for a single host
ssh -v host-alias "uptime"

# Or use dry-run to see what would execute
sshsync all --dry-run "command"
```

### Performance Issues

If operations are slow:

1. **Reduce parallelism** (run on fewer hosts at once)
2. **Increase timeout** for network-bound operations
3. **Check network latency**:
   ```bash
   sshsync all --timeout 5 'echo $HOSTNAME'
   ```

### Configuration Validation

```bash
# Verify SSH config is readable
cat ~/.ssh/config

# Verify sshsync config
cat ~/.config/sshsync/config.yaml

# Test hosts individually
for host in $(sshsync ls | awk '{print $1}'); do
    echo "Testing $host..."
    ssh "$host" "echo OK" || echo "FAILED: $host"
done
```

## Best Practices

1. **Use meaningful host aliases** in SSH config
2. **Organize groups logically** (by function, environment, location)
3. **Always dry-run first** for destructive operations
4. **Set appropriate timeouts** based on operation type
5. **Test SSH keys** before using sshsync
6. **Keep groups updated** as infrastructure changes
7. **Use `--with-status`** to check availability before operations

## Integration with Tailscale

sshsync works seamlessly with Tailscale SSH:

```
# SSH config using a Tailscale hostname
Host homelab-1
    HostName homelab-1.tailnet.ts.net
    User admin

# Or using a Tailscale IP directly
Host homelab-1
    HostName 100.64.1.10
    User admin
```
|
||||||
|
```
|
||||||
|
|
||||||
|
**Tailscale Advantages**:
|
||||||
|
- No need for port forwarding
|
||||||
|
- Encrypted connections
|
||||||
|
- MagicDNS for easy hostnames
|
||||||
|
- Works across NATs
|
||||||
|
|
||||||
|
**Verify Tailscale**:
|
||||||
|
```bash
|
||||||
|
# Check Tailscale network
|
||||||
|
tailscale status
|
||||||
|
|
||||||
|
# Ping host via Tailscale
|
||||||
|
tailscale ping homelab-1
|
||||||
|
```
|
||||||
|
|
||||||
|
## Summary
|
||||||
|
|
||||||
|
sshsync simplifies multi-host SSH operations:
|
||||||
|
- ✅ Execute commands across host groups
|
||||||
|
- ✅ Transfer files to/from multiple hosts
|
||||||
|
- ✅ Organize hosts into logical groups
|
||||||
|
- ✅ Parallel execution for speed
|
||||||
|
- ✅ Dry-run mode for safety
|
||||||
|
- ✅ Works great with Tailscale
|
||||||
|
|
||||||
|
For more help: `sshsync --help`
|
||||||
468
references/tailscale-integration.md
Normal file
@@ -0,0 +1,468 @@
# Tailscale Integration Guide

How to use Tailscale SSH with sshsync for secure, zero-config remote access.

## What is Tailscale?

Tailscale is a zero-config VPN that creates a secure network between your devices using WireGuard. It provides:

- **Peer-to-peer encrypted connections**
- **No port forwarding required**
- **Works across NATs and firewalls**
- **MagicDNS for easy device addressing**
- **Built-in SSH functionality**
- **Access control lists (ACLs)**

## Why Tailscale + sshsync?

Combining Tailscale with sshsync gives you:

1. **Secure connections** everywhere (Tailscale encryption)
2. **Simple addressing** (MagicDNS hostnames)
3. **Multi-host operations** (sshsync groups and execution)
4. **No firewall configuration** needed
5. **Works from anywhere** (coffee shop, home, office)

## Setup

### 1. Install Tailscale

**macOS**:
```bash
brew install tailscale
```

**Linux**:
```bash
curl -fsSL https://tailscale.com/install.sh | sh
```

**Verify the installation**:
```bash
tailscale version
```

### 2. Connect to Tailscale

```bash
# Start Tailscale
sudo tailscale up

# Follow the authentication link
# (this opens a browser to authenticate)

# Verify the connection
tailscale status
```

### 3. Configure SSH via Tailscale

Tailscale provides two SSH options:

#### Option A: Tailscale SSH (Built-in)

**Enable on each machine**:
```bash
sudo tailscale up --ssh
```

**Use**:
```bash
tailscale ssh user@machine-name
```

**Advantages**:
- No SSH server configuration needed
- Uses Tailscale authentication
- Automatic key management

#### Option B: Standard SSH over Tailscale (Recommended for sshsync)

**Configure your SSH config** to use Tailscale hostnames:

```bash
# ~/.ssh/config

Host homelab-1
    HostName homelab-1.tailnet-name.ts.net
    User admin
    IdentityFile ~/.ssh/id_ed25519

# Or use the Tailscale IP directly
Host homelab-2
    HostName 100.64.1.10
    User admin
    IdentityFile ~/.ssh/id_ed25519
```

**Advantages**:
- Works with all SSH tools (including sshsync)
- Standard SSH key authentication
- More flexibility

## Getting Tailscale Hostnames and IPs

### View All Machines

```bash
tailscale status
```

**Output**:
```
100.64.1.10   homelab-1   user@   linux   -
100.64.1.11   homelab-2   user@   linux   -
100.64.1.20   laptop      user@   macOS   -
100.64.1.30   phone       user@   iOS     offline
```

### Get the MagicDNS Hostname

**Format**: `machine-name.tailnet-name.ts.net`

**Find your tailnet name**:
```bash
tailscale status --json | grep -i tailnet
```

Or check the Tailscale admin console: https://login.tailscale.com/admin/machines

### Get a Tailscale IP

```bash
# Your own IP
tailscale ip -4

# Another machine's IP (from the status output)
tailscale status | grep machine-name
```
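
The status table above can also be consumed programmatically. A minimal sketch, assuming `tailscale status --json` exposes a top-level `Peer` map whose entries carry `HostName`, `TailscaleIPs`, and `Online` fields (verify against your Tailscale version's output):

```python
import json

def parse_peers(status_json: str) -> list:
    """Extract (hostname, first Tailscale IP, online?) for each peer."""
    status = json.loads(status_json)
    peers = []
    for peer in status.get("Peer", {}).values():
        ips = peer.get("TailscaleIPs") or []
        peers.append((peer.get("HostName"),
                      ips[0] if ips else None,
                      bool(peer.get("Online"))))
    return peers

# Usage on a live tailnet:
#   raw = subprocess.run(["tailscale", "status", "--json"],
#                        capture_output=True, text=True, check=True).stdout
#   parse_peers(raw)
```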

## Testing Connectivity

### Ping via Tailscale

```bash
# Ping by hostname
tailscale ping homelab-1

# Ping by IP
tailscale ping 100.64.1.10
```

**Successful output**:
```
pong from homelab-1 (100.64.1.10) via DERP(nyc) in 45ms
pong from homelab-1 (100.64.1.10) via DERP(nyc) in 43ms
```

**Failed output**:
```
timeout waiting for pong
```

### SSH Test

```bash
# Test the SSH connection
ssh user@homelab-1.tailnet.ts.net

# Or with the IP
ssh user@100.64.1.10
```

## Configuring sshsync with Tailscale

### Step 1: Add Tailscale Hosts to Your SSH Config

```bash
vim ~/.ssh/config
```

**Example configuration**:
```
# Production servers
Host prod-web-01
    HostName prod-web-01.tailnet.ts.net
    User deploy
    IdentityFile ~/.ssh/id_ed25519

Host prod-web-02
    HostName prod-web-02.tailnet.ts.net
    User deploy
    IdentityFile ~/.ssh/id_ed25519

Host prod-db-01
    HostName prod-db-01.tailnet.ts.net
    User deploy
    IdentityFile ~/.ssh/id_ed25519

# Homelab
Host homelab-1
    HostName 100.64.1.10
    User admin
    IdentityFile ~/.ssh/id_ed25519

Host homelab-2
    HostName 100.64.1.11
    User admin
    IdentityFile ~/.ssh/id_ed25519

# Development
Host dev-laptop
    HostName dev-laptop.tailnet.ts.net
    User developer
    IdentityFile ~/.ssh/id_ed25519
```

### Step 2: Test Each Host

```bash
# Test connectivity to each host
ssh prod-web-01 "hostname"
ssh homelab-1 "hostname"
ssh dev-laptop "hostname"
```

### Step 3: Initialize sshsync

```bash
# Sync hosts and create groups
sshsync sync

# Add hosts to groups
sshsync gadd production
# Select: prod-web-01, prod-web-02, prod-db-01

sshsync gadd homelab
# Select: homelab-1, homelab-2

sshsync gadd development
# Select: dev-laptop
```

### Step 4: Verify the Configuration

```bash
# List all hosts with status
sshsync ls --with-status

# Test command execution
sshsync all "uptime"

# Test group execution
sshsync group production "df -h"
```

## Advanced Tailscale Features

### Tailnet Lock

Prevents unauthorized device additions:

```bash
tailscale lock status
```

### Exit Nodes

Route all traffic through a specific machine:

```bash
# Enable exit node on a machine
sudo tailscale up --advertise-exit-node

# Use the exit node from another machine
sudo tailscale set --exit-node=exit-node-name
```

### Subnet Routing

Access networks behind Tailscale machines:

```bash
# Advertise subnet routes
sudo tailscale up --advertise-routes=192.168.1.0/24
```

### ACLs (Access Control Lists)

Control who can access what: https://login.tailscale.com/admin/acls

**Example ACL**:
```json
{
  "acls": [
    {
      "action": "accept",
      "src": ["group:admins"],
      "dst": ["*:22", "*:80", "*:443"]
    },
    {
      "action": "accept",
      "src": ["group:developers"],
      "dst": ["tag:development:*"]
    }
  ]
}
```

## Troubleshooting

### Machine Shows Offline

**Check Tailscale status**:
```bash
tailscale status
```

**Restart Tailscale**:
```bash
# macOS
brew services restart tailscale

# Linux
sudo systemctl restart tailscaled
```

**Re-authenticate**:
```bash
sudo tailscale up
```

### Cannot Connect via SSH

1. **Verify Tailscale connectivity**:
```bash
tailscale ping machine-name
```

2. **Check SSH is running** on the remote:
```bash
tailscale ssh machine-name "systemctl status sshd"
```

3. **Verify your SSH keys**:
```bash
ssh-add -l
```

4. **Test SSH directly**:
```bash
ssh -v user@machine-name.tailnet.ts.net
```

### High Latency

**Check the connection method**:
```bash
tailscale status
```

Look for "direct" vs "DERP relay":
- **Direct**: low latency (< 50ms)
- **DERP relay**: higher latency (100-200ms)

**Force a direct connection**:
```bash
# Ensure both machines can establish a P2P path
# (may require NAT traversal)
```
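
Whether a peer is direct or relayed can also be read from the JSON status. A hedged sketch: it assumes each entry in the `Peer` map carries a `CurAddr` field (set to `ip:port` when the path is peer-to-peer, empty when relayed) and a `Relay` field naming the DERP region; check these fields against your Tailscale version before relying on them:

```python
import json

def connection_type(status_json: str, host: str) -> str:
    """Classify the path to `host` as 'direct', 'relay (<region>)', or 'unknown'."""
    status = json.loads(status_json)
    for peer in status.get("Peer", {}).values():
        if peer.get("HostName") == host:
            if peer.get("CurAddr"):    # ip:port of the P2P endpoint
                return "direct"
            if peer.get("Relay"):      # DERP region code, e.g. "nyc"
                return f"relay ({peer['Relay']})"
            return "unknown"
    raise KeyError(f"no peer named {host!r}")
```

Feed it the output of `tailscale status --json`; a `relay (...)` result explains latencies in the 100-200 ms range.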

### MagicDNS Not Working

**Enable MagicDNS**:
1. Go to https://login.tailscale.com/admin/dns
2. Enable MagicDNS

**Verify**:
```bash
nslookup machine-name.tailnet.ts.net
```

## Security Best Practices

1. **Use SSH keys**, not passwords
2. **Enable Tailnet Lock** to prevent unauthorized devices
3. **Use ACLs** to restrict access
4. **Regularly review** connected devices
5. **Set up key expiry** for team members who leave
6. **Use tags** for machine roles
7. **Enable two-factor auth** for your Tailscale account

## Monitoring

### Check Network Status

```bash
# All machines
tailscale status

# Self status
tailscale status --self

# JSON format for parsing
tailscale status --json
```

### View Logs

```bash
# macOS
tail -f /var/log/tailscaled.log

# Linux
journalctl -u tailscaled -f
```

## Use Cases with sshsync

### 1. Deploy to All Production Servers

```bash
sshsync push --group production --recurse ./dist /var/www/app
sshsync group production "cd /var/www/app && pm2 restart all"
```

### 2. Collect Logs from All Servers

```bash
sshsync pull --group production /var/log/app/error.log ./logs/
```

### 3. Update All Homelab Machines

```bash
sshsync group homelab "sudo apt update && sudo apt upgrade -y"
```

### 4. Check Disk Space Everywhere

```bash
sshsync all "df -h /"
```

### 5. Sync Configuration Across Machines

```bash
sshsync push --all ~/dotfiles/.bashrc ~/.bashrc
sshsync push --all ~/dotfiles/.vimrc ~/.vimrc
```

## Summary

Tailscale + sshsync = **powerful remote management**:

- ✅ Secure connections everywhere (WireGuard encryption)
- ✅ No firewall configuration needed
- ✅ Easy addressing (MagicDNS)
- ✅ Multi-host operations (sshsync groups)
- ✅ Works from anywhere

**Quick Start**:
1. Install Tailscale: `brew install tailscale`
2. Connect: `sudo tailscale up`
3. Configure your SSH config with Tailscale hostnames
4. Initialize sshsync: `sshsync sync`
5. Start managing: `sshsync all "uptime"`

For more: https://tailscale.com/kb/
378
scripts/load_balancer.py
Normal file
@@ -0,0 +1,378 @@
#!/usr/bin/env python3
"""
Load balancer for Tailscale SSH Sync Agent.

Intelligent task distribution based on machine resources.
"""

import sys
from pathlib import Path
from typing import Dict, List, Optional, Tuple
from dataclasses import dataclass
import logging

# Add utils to path
sys.path.insert(0, str(Path(__file__).parent))

from utils.helpers import (
    parse_cpu_load,
    parse_memory_usage,
    parse_disk_usage,
    calculate_load_score,
    classify_load_status,
)
from sshsync_wrapper import execute_on_host

logger = logging.getLogger(__name__)


@dataclass
class MachineMetrics:
    """Resource metrics for a machine."""
    host: str
    cpu_pct: float
    mem_pct: float
    disk_pct: float
    load_score: float
    status: str


def get_machine_load(host: str, timeout: int = 10) -> Optional[MachineMetrics]:
    """
    Get CPU, memory, and disk metrics for a machine.

    Args:
        host: Host to check
        timeout: Command timeout in seconds

    Returns:
        MachineMetrics object, or None on failure

    Example:
        >>> metrics = get_machine_load("web-01")
        >>> metrics.cpu_pct
        45.2
        >>> metrics.load_score
        0.49
    """
    try:
        # Get CPU load
        cpu_result = execute_on_host(host, "uptime", timeout=timeout)
        cpu_data = {}
        if cpu_result.get('success'):
            cpu_data = parse_cpu_load(cpu_result['stdout'])

        # Get memory usage
        mem_result = execute_on_host(host, "free -m 2>/dev/null || vm_stat", timeout=timeout)
        mem_data = {}
        if mem_result.get('success'):
            mem_data = parse_memory_usage(mem_result['stdout'])

        # Get disk usage
        disk_result = execute_on_host(host, "df -h / | tail -1", timeout=timeout)
        disk_data = {}
        if disk_result.get('success'):
            disk_data = parse_disk_usage(disk_result['stdout'])

        # Calculate metrics.
        # CPU: use the 1-minute load average, normalized by an assumed 4 cores
        # (adjust for your hardware); fall back to 50% when parsing failed.
        cpu_pct = (cpu_data.get('load_1min', 0) / 4.0) * 100 if cpu_data else 50.0

        # Memory: direct percentage
        mem_pct = mem_data.get('use_pct', 50.0)

        # Disk: direct percentage
        disk_pct = disk_data.get('use_pct', 50.0)

        # Combine into a single load score and classify it
        score = calculate_load_score(cpu_pct, mem_pct, disk_pct)
        status = classify_load_status(score)

        return MachineMetrics(
            host=host,
            cpu_pct=cpu_pct,
            mem_pct=mem_pct,
            disk_pct=disk_pct,
            load_score=score,
            status=status
        )

    except Exception as e:
        logger.error(f"Error getting load for {host}: {e}")
        return None
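

# utils/helpers.py is not part of this listing; the two scoring helpers it
# provides behave roughly like the reference sketches below. The 50/30/20
# weights are an illustrative assumption, not the actual implementation; the
# 0.4 / 0.7 thresholds match the labels used by get_group_capacity() below.

def _reference_load_score(cpu_pct: float, mem_pct: float, disk_pct: float) -> float:
    """Sketch of calculate_load_score: weighted average normalized to 0..1."""
    return round((0.5 * cpu_pct + 0.3 * mem_pct + 0.2 * disk_pct) / 100.0, 2)


def _reference_load_status(score: float) -> str:
    """Sketch of classify_load_status: map a 0..1 score onto status labels."""
    if score < 0.4:
        return 'low'
    if score < 0.7:
        return 'moderate'
    return 'high'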


def select_optimal_host(candidates: List[str],
                        prefer_group: Optional[str] = None,
                        timeout: int = 10) -> Tuple[Optional[str], Optional[MachineMetrics]]:
    """
    Pick the best host from candidates based on current load.

    Args:
        candidates: List of candidate hosts
        prefer_group: Prefer hosts from this group when their load is comparable
        timeout: Timeout for metric gathering

    Returns:
        Tuple of (selected_host, metrics)

    Example:
        >>> host, metrics = select_optimal_host(["web-01", "web-02", "web-03"])
        >>> host
        "web-03"
        >>> metrics.load_score
        0.28
    """
    if not candidates:
        return None, None

    # Get metrics for all candidates
    metrics_list: List[MachineMetrics] = []

    for host in candidates:
        metrics = get_machine_load(host, timeout=timeout)
        if metrics:
            metrics_list.append(metrics)

    if not metrics_list:
        logger.warning("No valid metrics collected from candidates")
        return None, None

    # Sort by load score (lower is better)
    metrics_list.sort(key=lambda m: m.load_score)

    # If prefer_group is specified, prioritize those hosts when load is similar
    if prefer_group:
        from utils.helpers import parse_sshsync_config, get_groups_for_host
        groups_config = parse_sshsync_config()

        # Find hosts in the preferred group
        preferred_metrics = [
            m for m in metrics_list
            if prefer_group in get_groups_for_host(m.host, groups_config)
        ]

        # Use a preferred host if its load score is within 20% of the absolute best
        if preferred_metrics:
            best_score = metrics_list[0].load_score
            for m in preferred_metrics:
                if m.load_score <= best_score * 1.2:
                    return m.host, m

    # Return the absolute best
    best = metrics_list[0]
    return best.host, best


def get_group_capacity(group: str, timeout: int = 10) -> Dict:
    """
    Get the aggregate capacity of a group.

    Args:
        group: Group name
        timeout: Timeout for metric gathering

    Returns:
        Dict with aggregate metrics:
        {
            'hosts': List[MachineMetrics],
            'total_hosts': int,
            'avg_cpu': float,
            'avg_mem': float,
            'avg_disk': float,
            'avg_load_score': float,
            'total_capacity': str  # descriptive
        }

    Example:
        >>> capacity = get_group_capacity("production")
        >>> capacity['avg_load_score']
        0.45
    """
    from utils.helpers import parse_sshsync_config

    groups_config = parse_sshsync_config()
    group_hosts = groups_config.get(group, [])

    if not group_hosts:
        return {
            'error': f'Group {group} not found or has no members',
            'hosts': []
        }

    # Get metrics for all hosts in the group
    metrics_list: List[MachineMetrics] = []

    for host in group_hosts:
        metrics = get_machine_load(host, timeout=timeout)
        if metrics:
            metrics_list.append(metrics)

    if not metrics_list:
        return {
            'error': f'Could not get metrics for any hosts in {group}',
            'hosts': []
        }

    # Calculate aggregates
    avg_cpu = sum(m.cpu_pct for m in metrics_list) / len(metrics_list)
    avg_mem = sum(m.mem_pct for m in metrics_list) / len(metrics_list)
    avg_disk = sum(m.disk_pct for m in metrics_list) / len(metrics_list)
    avg_score = sum(m.load_score for m in metrics_list) / len(metrics_list)

    # Determine an overall capacity description
    if avg_score < 0.4:
        capacity_desc = "High capacity available"
    elif avg_score < 0.7:
        capacity_desc = "Moderate capacity"
    else:
        capacity_desc = "Limited capacity"

    return {
        'group': group,
        'hosts': metrics_list,
        'total_hosts': len(metrics_list),
        'available_hosts': len(group_hosts),
        'avg_cpu': avg_cpu,
        'avg_mem': avg_mem,
        'avg_disk': avg_disk,
        'avg_load_score': avg_score,
        'total_capacity': capacity_desc
    }


def distribute_tasks(tasks: List[Dict], hosts: List[str],
                     timeout: int = 10) -> Dict[str, List[Dict]]:
    """
    Distribute multiple tasks optimally across hosts.

    Args:
        tasks: List of task dicts (each with 'command', 'priority', etc.)
        hosts: Available hosts
        timeout: Timeout for metric gathering

    Returns:
        Dict mapping hosts to their assigned tasks

    Algorithm:
        - Get the current load for all hosts
        - Assign tasks to the least loaded hosts, heaviest first
        - Balance by estimated task weight

    Example:
        >>> tasks = [
        ...     {'command': 'npm run build', 'weight': 3},
        ...     {'command': 'npm test', 'weight': 2}
        ... ]
        >>> distribution = distribute_tasks(tasks, ["web-01", "web-02"])
        >>> distribution["web-01"]
        [{'command': 'npm run build', 'weight': 3}]
    """
    if not tasks or not hosts:
        return {}

    # Get the current load for all hosts
    host_metrics = {}
    for host in hosts:
        metrics = get_machine_load(host, timeout=timeout)
        if metrics:
            host_metrics[host] = metrics

    if not host_metrics:
        logger.error("No valid host metrics available")
        return {}

    # Initialize the assignment
    assignment: Dict[str, List[Dict]] = {host: [] for host in host_metrics.keys()}
    host_loads = {host: m.load_score for host, m in host_metrics.items()}

    # Sort tasks by weight (descending) so heavy tasks are placed first
    sorted_tasks = sorted(
        tasks,
        key=lambda t: t.get('weight', 1),
        reverse=True
    )

    # Assign each task to the least loaded host
    for task in sorted_tasks:
        # Find the host with the minimum current load
        min_host = min(host_loads.keys(), key=lambda h: host_loads[h])

        # Assign the task
        assignment[min_host].append(task)

        # Update the simulated load (task weight times a scaling factor)
        task_weight = task.get('weight', 1)
        host_loads[min_host] += (task_weight * 0.1)  # 0.1 = scaling factor

    return assignment
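

# The greedy loop above can be exercised without SSH access by simulating the
# starting load scores; the hosts and weights below are hypothetical.

def _demo_distribute() -> dict:
    """Standalone re-run of the assignment loop with simulated loads."""
    tasks = [
        {'command': 'npm run build', 'weight': 3},
        {'command': 'npm test', 'weight': 2},
        {'command': 'npm run lint', 'weight': 1},
    ]
    loads = {'web-01': 0.49, 'web-02': 0.28}
    assignment = {host: [] for host in loads}
    for task in sorted(tasks, key=lambda t: t['weight'], reverse=True):
        target = min(loads, key=loads.get)
        assignment[target].append(task)
        loads[target] += task['weight'] * 0.1  # same scaling factor as above
    # web-02 starts least loaded, so it receives the build and then the lint job
    return assignment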


def format_load_report(metrics: MachineMetrics, compare_to_avg: Optional[Dict] = None) -> str:
    """
    Format load metrics as a human-readable report.

    Args:
        metrics: Machine metrics
        compare_to_avg: Optional dict with avg_cpu, avg_mem, avg_disk for comparison

    Returns:
        Formatted report string

    Example:
        >>> metrics = MachineMetrics('web-01', 45, 60, 40, 0.49, 'moderate')
        >>> print(format_load_report(metrics))
        web-01: Load Score: 0.49 (moderate)
          CPU: 45.0% | Memory: 60.0% | Disk: 40.0%
    """
    lines = [
        f"{metrics.host}: Load Score: {metrics.load_score:.2f} ({metrics.status})",
        f"  CPU: {metrics.cpu_pct:.1f}% | Memory: {metrics.mem_pct:.1f}% | Disk: {metrics.disk_pct:.1f}%"
    ]

    if compare_to_avg:
        cpu_vs = metrics.cpu_pct - compare_to_avg.get('avg_cpu', 0)
        mem_vs = metrics.mem_pct - compare_to_avg.get('avg_mem', 0)
        disk_vs = metrics.disk_pct - compare_to_avg.get('avg_disk', 0)

        # Only report deviations larger than 10 percentage points
        comparisons = []
        if abs(cpu_vs) > 10:
            comparisons.append(f"CPU {'+' if cpu_vs > 0 else ''}{cpu_vs:.0f}% vs avg")
        if abs(mem_vs) > 10:
            comparisons.append(f"Mem {'+' if mem_vs > 0 else ''}{mem_vs:.0f}% vs avg")
        if abs(disk_vs) > 10:
            comparisons.append(f"Disk {'+' if disk_vs > 0 else ''}{disk_vs:.0f}% vs avg")

        if comparisons:
            lines.append(f"  vs Average: {' | '.join(comparisons)}")

    return "\n".join(lines)


def main():
    """Test load balancer functions."""
    print("Testing load balancer...\n")

    print("1. Testing select_optimal_host:")
    print("   (Requires configured hosts - using dry-run simulation)")

    # Simulate metrics
    test_metrics = [
        MachineMetrics('web-01', 45, 60, 40, 0.49, 'moderate'),
        MachineMetrics('web-02', 85, 70, 65, 0.75, 'high'),
        MachineMetrics('web-03', 20, 35, 30, 0.28, 'low'),
    ]

    # Sort by score
    test_metrics.sort(key=lambda m: m.load_score)
    best = test_metrics[0]

    print(f"   ✓ Best host: {best.host} (score: {best.load_score:.2f})")
    print(f"     Reason: {best.status} load")

    print("\n2. Format load report:")
    report = format_load_report(test_metrics[0], {
        'avg_cpu': 50,
        'avg_mem': 55,
        'avg_disk': 45
    })
    print(report)

    print("\n✅ Load balancer tested")


if __name__ == "__main__":
    main()
409
scripts/sshsync_wrapper.py
Normal file
@@ -0,0 +1,409 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
SSH Sync wrapper for Tailscale SSH Sync Agent.
|
||||||
|
Python interface to sshsync CLI operations.
|
||||||
|
"""
|
||||||
|
|
||||||
|
import subprocess
|
||||||
|
import sys
|
||||||
|
from pathlib import Path
|
||||||
|
from typing import Dict, List, Optional, Tuple
|
||||||
|
import json
|
||||||
|
import logging
|
||||||
|
|
||||||
|
# Add utils to path
|
||||||
|
sys.path.insert(0, str(Path(__file__).parent))
|
||||||
|
|
||||||
|
from utils.helpers import parse_ssh_config, parse_sshsync_config, format_bytes, format_duration
|
||||||
|
from utils.validators import validate_host, validate_group, validate_path_exists, validate_timeout, validate_command
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
|
||||||
|
def get_host_status(group: Optional[str] = None) -> Dict:
    """
    Get online/offline status of hosts.

    Args:
        group: Optional group to filter (None = all hosts)

    Returns:
        Dict with status info

    Example:
        >>> status = get_host_status()
        >>> status['online_count']
        8
    """
    try:
        # Run sshsync ls --with-status
        cmd = ["sshsync", "ls", "--with-status"]
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)

        if result.returncode != 0:
            return {'error': result.stderr, 'hosts': []}

        # Parse output
        hosts = []
        for line in result.stdout.strip().split('\n'):
            if not line or line.startswith('Host') or line.startswith('---'):
                continue

            parts = line.split()
            if len(parts) >= 2:
                host_name = parts[0]
                status = parts[1] if len(parts) > 1 else 'unknown'

                hosts.append({
                    'host': host_name,
                    'online': status.lower() in ['online', 'reachable', '✓'],
                    'status': status
                })

        # Filter by group if specified
        if group:
            groups_config = parse_sshsync_config()
            group_hosts = groups_config.get(group, [])
            hosts = [h for h in hosts if h['host'] in group_hosts]

        online_count = sum(1 for h in hosts if h['online'])

        return {
            'hosts': hosts,
            'total_count': len(hosts),
            'online_count': online_count,
            'offline_count': len(hosts) - online_count,
            'availability_pct': (online_count / len(hosts) * 100) if hosts else 0
        }

    except Exception as e:
        logger.error(f"Error getting host status: {e}")
        return {'error': str(e), 'hosts': []}

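The table parsing in `get_host_status` can be exercised standalone. The sample text below is a hypothetical `sshsync ls --with-status` layout (the real CLI's columns may differ):

```python
# Standalone sketch of the status-line parsing above, run against a
# hand-written sample (hypothetical output format).
SAMPLE = """\
Host        Status
---         ---
web-01      online
db-01       offline
cache-01    reachable
"""

def parse_status_lines(text):
    hosts = []
    for line in text.strip().split('\n'):
        # Skip blanks, the header row, and separator rows
        if not line or line.startswith('Host') or line.startswith('---'):
            continue
        parts = line.split()
        if len(parts) >= 2:
            hosts.append({
                'host': parts[0],
                'online': parts[1].lower() in ['online', 'reachable', '✓'],
                'status': parts[1],
            })
    return hosts

hosts = parse_status_lines(SAMPLE)
print(len(hosts), sum(1 for h in hosts if h['online']))  # prints "3 2"
```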
def execute_on_all(command: str, timeout: int = 10, dry_run: bool = False) -> Dict:
    """
    Execute command on all hosts.

    Args:
        command: Command to execute
        timeout: Timeout in seconds
        dry_run: If True, don't actually execute

    Returns:
        Dict with results per host

    Example:
        >>> result = execute_on_all("uptime", timeout=15)
        >>> result['success']
        True
    """
    validate_command(command)
    validate_timeout(timeout)

    if dry_run:
        return {
            'dry_run': True,
            'command': command,
            'message': 'Would execute on all hosts'
        }

    try:
        cmd = ["sshsync", "all", f"--timeout={timeout}", command]
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout + 30)

        # Parse results (format varies, simplified here)
        return {
            'success': result.returncode == 0,
            'stdout': result.stdout,
            'stderr': result.stderr,
            'command': command
        }

    except subprocess.TimeoutExpired:
        return {'error': f'Command timed out after {timeout}s'}
    except Exception as e:
        return {'error': str(e)}


def execute_on_group(group: str, command: str, timeout: int = 10, dry_run: bool = False) -> Dict:
    """
    Execute command on specific group.

    Args:
        group: Group name
        command: Command to execute
        timeout: Timeout in seconds
        dry_run: Preview without executing

    Returns:
        Dict with execution results

    Example:
        >>> result = execute_on_group("web-servers", "df -h /var/www")
        >>> result['success']
        True
    """
    groups_config = parse_sshsync_config()
    validate_group(group, list(groups_config.keys()))
    validate_command(command)
    validate_timeout(timeout)

    if dry_run:
        group_hosts = groups_config.get(group, [])
        return {
            'dry_run': True,
            'group': group,
            'hosts': group_hosts,
            'command': command,
            'message': f'Would execute on {len(group_hosts)} hosts in group {group}'
        }

    try:
        cmd = ["sshsync", "group", f"--timeout={timeout}", group, command]
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout + 30)

        return {
            'success': result.returncode == 0,
            'group': group,
            'stdout': result.stdout,
            'stderr': result.stderr,
            'command': command
        }

    except subprocess.TimeoutExpired:
        return {'error': f'Command timed out after {timeout}s'}
    except Exception as e:
        return {'error': str(e)}

def execute_on_host(host: str, command: str, timeout: int = 10) -> Dict:
    """
    Execute command on single host.

    Args:
        host: Host name
        command: Command to execute
        timeout: Timeout in seconds

    Returns:
        Dict with result

    Example:
        >>> result = execute_on_host("web-01", "hostname")
        >>> result['stdout']
        "web-01"
    """
    ssh_hosts = parse_ssh_config()
    validate_host(host, list(ssh_hosts.keys()))
    validate_command(command)
    validate_timeout(timeout)

    try:
        cmd = ["ssh", "-o", f"ConnectTimeout={timeout}", host, command]
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout + 5)

        return {
            'success': result.returncode == 0,
            'host': host,
            'stdout': result.stdout,
            'stderr': result.stderr,
            'command': command
        }

    except subprocess.TimeoutExpired:
        return {'error': f'Command timed out after {timeout}s'}
    except Exception as e:
        return {'error': str(e)}

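The `try`/`except` structure above (and in the other wrappers) follows one pattern: run a subprocess with a timeout and map failures to an error dict instead of raising. A minimal standalone sketch, using the current Python interpreter as a portable stand-in for `ssh`:

```python
# Sketch of the run-with-timeout pattern the wrappers share: success,
# timeout, and unexpected failure all come back as plain dicts.
import subprocess
import sys

def run_with_timeout(cmd, timeout=10):
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        return {
            'success': result.returncode == 0,
            'stdout': result.stdout,
            'stderr': result.stderr,
        }
    except subprocess.TimeoutExpired:
        return {'error': f'Command timed out after {timeout}s'}
    except Exception as e:
        return {'error': str(e)}

# A fast command succeeds; a slow one is mapped to an error dict.
ok = run_with_timeout([sys.executable, '-c', 'print("hello")'], timeout=5)
slow = run_with_timeout([sys.executable, '-c', 'import time; time.sleep(10)'], timeout=1)
```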
def push_to_hosts(local_path: str, remote_path: str,
                  hosts: Optional[List[str]] = None,
                  group: Optional[str] = None,
                  recurse: bool = False,
                  dry_run: bool = False) -> Dict:
    """
    Push files to hosts.

    Args:
        local_path: Local file/directory path
        remote_path: Remote destination path
        hosts: Specific hosts (None = all if group also None)
        group: Group name
        recurse: Recursive copy
        dry_run: Preview without executing

    Returns:
        Dict with push results

    Example:
        >>> result = push_to_hosts("./dist", "/var/www/app", group="production", recurse=True)
        >>> result['success']
        True
    """
    validate_path_exists(local_path)

    if dry_run:
        return {
            'dry_run': True,
            'local_path': local_path,
            'remote_path': remote_path,
            'hosts': hosts,
            'group': group,
            'recurse': recurse,
            'message': 'Would push files'
        }

    try:
        cmd = ["sshsync", "push"]

        if hosts:
            for host in hosts:
                cmd.extend(["--host", host])
        elif group:
            cmd.extend(["--group", group])
        else:
            cmd.append("--all")

        if recurse:
            cmd.append("--recurse")

        cmd.extend([local_path, remote_path])

        result = subprocess.run(cmd, capture_output=True, text=True, timeout=300)

        return {
            'success': result.returncode == 0,
            'stdout': result.stdout,
            'stderr': result.stderr,
            'local_path': local_path,
            'remote_path': remote_path
        }

    except subprocess.TimeoutExpired:
        return {'error': 'Push operation timed out'}
    except Exception as e:
        return {'error': str(e)}


def pull_from_host(host: str, remote_path: str, local_path: str,
                   recurse: bool = False, dry_run: bool = False) -> Dict:
    """
    Pull files from host.

    Args:
        host: Host to pull from
        remote_path: Remote file/directory path
        local_path: Local destination path
        recurse: Recursive copy
        dry_run: Preview without executing

    Returns:
        Dict with pull results

    Example:
        >>> result = pull_from_host("web-01", "/var/log/nginx", "./logs", recurse=True)
        >>> result['success']
        True
    """
    ssh_hosts = parse_ssh_config()
    validate_host(host, list(ssh_hosts.keys()))

    if dry_run:
        return {
            'dry_run': True,
            'host': host,
            'remote_path': remote_path,
            'local_path': local_path,
            'recurse': recurse,
            'message': f'Would pull from {host}'
        }

    try:
        cmd = ["sshsync", "pull", "--host", host]

        if recurse:
            cmd.append("--recurse")

        cmd.extend([remote_path, local_path])

        result = subprocess.run(cmd, capture_output=True, text=True, timeout=300)

        return {
            'success': result.returncode == 0,
            'host': host,
            'stdout': result.stdout,
            'stderr': result.stderr,
            'remote_path': remote_path,
            'local_path': local_path
        }

    except subprocess.TimeoutExpired:
        return {'error': 'Pull operation timed out'}
    except Exception as e:
        return {'error': str(e)}


def list_hosts(with_status: bool = True) -> Dict:
    """
    List all configured hosts.

    Args:
        with_status: Include online/offline status

    Returns:
        Dict with hosts info

    Example:
        >>> result = list_hosts(with_status=True)
        >>> len(result['hosts'])
        10
    """
    if with_status:
        return get_host_status()
    else:
        ssh_hosts = parse_ssh_config()
        return {
            'hosts': [{'host': name} for name in ssh_hosts.keys()],
            'count': len(ssh_hosts)
        }


def get_groups() -> Dict[str, List[str]]:
    """
    Get all defined groups and their members.

    Returns:
        Dict mapping group names to host lists

    Example:
        >>> groups = get_groups()
        >>> groups['production']
        ['prod-web-01', 'prod-db-01']
    """
    return parse_sshsync_config()


def main():
    """Test sshsync wrapper functions."""
    print("Testing sshsync wrapper...\n")

    print("1. List hosts:")
    result = list_hosts(with_status=False)
    print(f"   Found {result.get('count', 0)} hosts")

    print("\n2. Get groups:")
    groups = get_groups()
    print(f"   Found {len(groups)} groups")
    for group, hosts in groups.items():
        print(f"   - {group}: {len(hosts)} hosts")

    print("\n3. Test dry-run:")
    result = execute_on_all("uptime", dry_run=True)
    print(f"   Dry-run: {result.get('message', 'OK')}")

    print("\n✅ sshsync wrapper tested")


if __name__ == "__main__":
    main()
426
scripts/tailscale_manager.py
Normal file
@@ -0,0 +1,426 @@
#!/usr/bin/env python3
"""
Tailscale manager for Tailscale SSH Sync Agent.
Tailscale-specific operations and status management.
"""

import subprocess
import re
import json
from typing import Dict, List, Optional
from dataclasses import dataclass
import logging

logger = logging.getLogger(__name__)


@dataclass
class TailscalePeer:
    """Represents a Tailscale peer."""
    hostname: str
    ip: str
    online: bool
    last_seen: Optional[str] = None
    os: Optional[str] = None
    relay: Optional[str] = None

def get_tailscale_status() -> Dict:
    """
    Get Tailscale network status (all peers).

    Returns:
        Dict with network status:
        {
            'connected': bool,
            'peers': List[TailscalePeer],
            'online_count': int,
            'total_count': int,
            'self_ip': str
        }

    Example:
        >>> status = get_tailscale_status()
        >>> status['online_count']
        8
        >>> status['peers'][0].hostname
        'homelab-1'
    """
    try:
        # Get status in JSON format
        result = subprocess.run(
            ["tailscale", "status", "--json"],
            capture_output=True,
            text=True,
            timeout=10
        )

        if result.returncode != 0:
            # Try text format if JSON fails
            result = subprocess.run(
                ["tailscale", "status"],
                capture_output=True,
                text=True,
                timeout=10
            )

            if result.returncode != 0:
                return {
                    'connected': False,
                    'error': 'Tailscale not running or accessible',
                    'peers': []
                }

            # Parse text format
            return _parse_text_status(result.stdout)

        # Parse JSON format
        data = json.loads(result.stdout)
        return _parse_json_status(data)

    except FileNotFoundError:
        return {
            'connected': False,
            'error': 'Tailscale not installed',
            'peers': []
        }
    except subprocess.TimeoutExpired:
        return {
            'connected': False,
            'error': 'Timeout getting Tailscale status',
            'peers': []
        }
    except Exception as e:
        logger.error(f"Error getting Tailscale status: {e}")
        return {
            'connected': False,
            'error': str(e),
            'peers': []
        }

def _parse_json_status(data: Dict) -> Dict:
    """Parse Tailscale JSON status."""
    peers = []

    self_data = data.get('Self', {})
    self_ip = self_data.get('TailscaleIPs', [''])[0]

    for peer_id, peer_data in data.get('Peer', {}).items():
        hostname = peer_data.get('HostName', 'unknown')
        ips = peer_data.get('TailscaleIPs', [])
        ip = ips[0] if ips else 'unknown'
        online = peer_data.get('Online', False)
        os = peer_data.get('OS', 'unknown')

        peers.append(TailscalePeer(
            hostname=hostname,
            ip=ip,
            online=online,
            os=os
        ))

    online_count = sum(1 for p in peers if p.online)

    return {
        'connected': True,
        'peers': peers,
        'online_count': online_count,
        'total_count': len(peers),
        'self_ip': self_ip
    }

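A self-contained sketch of the peer extraction above, fed a hand-written dict shaped like `tailscale status --json` output (only the fields used here; the real payload carries many more):

```python
# Sample payload mirroring the Self/Peer structure parsed above.
sample = {
    'Self': {'TailscaleIPs': ['100.64.1.5']},
    'Peer': {
        'n1': {'HostName': 'homelab-1', 'TailscaleIPs': ['100.64.1.10'],
               'Online': True, 'OS': 'linux'},
        'n2': {'HostName': 'laptop', 'TailscaleIPs': ['100.64.1.11'],
               'Online': False, 'OS': 'macOS'},
    },
}

peers = []
for peer_data in sample.get('Peer', {}).values():
    ips = peer_data.get('TailscaleIPs', [])
    peers.append({
        'hostname': peer_data.get('HostName', 'unknown'),
        'ip': ips[0] if ips else 'unknown',
        'online': peer_data.get('Online', False),
    })

online = sum(1 for p in peers if p['online'])
print(f"{online}/{len(peers)} online, self {sample['Self']['TailscaleIPs'][0]}")
# prints "1/2 online, self 100.64.1.5"
```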
def _parse_text_status(output: str) -> Dict:
    """Parse Tailscale text status output."""
    peers = []
    self_ip = None

    for line in output.strip().split('\n'):
        line = line.strip()
        if not line:
            continue

        # Text format lists the Tailscale IP first, then the hostname:
        #   100.64.1.10  homelab-1  user@  linux  active; ...
        parts = line.split()
        if len(parts) >= 2:
            ip = parts[0]
            hostname = parts[1]

            # The first line of the output is this machine itself
            if self_ip is None:
                self_ip = ip
                continue

            # Determine online status from additional fields
            online = 'offline' not in line.lower()

            peers.append(TailscalePeer(
                hostname=hostname,
                ip=ip,
                online=online
            ))

    online_count = sum(1 for p in peers if p.online)

    return {
        'connected': True,
        'peers': peers,
        'online_count': online_count,
        'total_count': len(peers),
        'self_ip': self_ip or 'unknown'
    }

def check_connectivity(host: str, timeout: int = 5) -> bool:
    """
    Ping host via Tailscale.

    Args:
        host: Hostname to ping
        timeout: Timeout in seconds

    Returns:
        True if host responds to ping

    Example:
        >>> check_connectivity("homelab-1")
        True
    """
    try:
        result = subprocess.run(
            ["tailscale", "ping", "--timeout", f"{timeout}s", "--c", "1", host],
            capture_output=True,
            text=True,
            timeout=timeout + 2
        )

        # Check if ping succeeded
        return result.returncode == 0 or 'pong' in result.stdout.lower()

    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False
    except Exception as e:
        logger.error(f"Error pinging {host}: {e}")
        return False


def get_peer_info(hostname: str) -> Optional[TailscalePeer]:
    """
    Get detailed info about a specific peer.

    Args:
        hostname: Peer hostname

    Returns:
        TailscalePeer object or None if not found

    Example:
        >>> peer = get_peer_info("homelab-1")
        >>> peer.ip
        '100.64.1.10'
    """
    status = get_tailscale_status()

    if not status.get('connected'):
        return None

    for peer in status.get('peers', []):
        if peer.hostname == hostname or hostname in peer.hostname:
            return peer

    return None


def list_online_machines() -> List[str]:
    """
    List all online Tailscale machines.

    Returns:
        List of online machine hostnames

    Example:
        >>> machines = list_online_machines()
        >>> len(machines)
        8
    """
    status = get_tailscale_status()

    if not status.get('connected'):
        return []

    return [
        peer.hostname
        for peer in status.get('peers', [])
        if peer.online
    ]


def get_machine_ip(hostname: str) -> Optional[str]:
    """
    Get Tailscale IP for a machine.

    Args:
        hostname: Machine hostname

    Returns:
        IP address or None if not found

    Example:
        >>> ip = get_machine_ip("homelab-1")
        >>> ip
        '100.64.1.10'
    """
    peer = get_peer_info(hostname)
    return peer.ip if peer else None


def validate_tailscale_ssh(host: str, timeout: int = 10) -> Dict:
    """
    Check if Tailscale SSH is working for a host.

    Args:
        host: Host to check
        timeout: Connection timeout

    Returns:
        Dict with validation results:
        {
            'working': bool,
            'message': str,
            'details': Dict
        }

    Example:
        >>> result = validate_tailscale_ssh("homelab-1")
        >>> result['working']
        True
    """
    # First check if host is in Tailscale network
    peer = get_peer_info(host)

    if not peer:
        return {
            'working': False,
            'message': f'Host {host} not found in Tailscale network',
            'details': {'peer_found': False}
        }

    if not peer.online:
        return {
            'working': False,
            'message': f'Host {host} is offline in Tailscale',
            'details': {'peer_found': True, 'online': False}
        }

    # Check connectivity
    if not check_connectivity(host, timeout=timeout):
        return {
            'working': False,
            'message': f'Cannot ping {host} via Tailscale',
            'details': {'peer_found': True, 'online': True, 'ping': False}
        }

    # Try SSH connection
    try:
        result = subprocess.run(
            ["tailscale", "ssh", host, "echo", "test"],
            capture_output=True,
            text=True,
            timeout=timeout
        )

        if result.returncode == 0:
            return {
                'working': True,
                'message': f'Tailscale SSH to {host} is working',
                'details': {
                    'peer_found': True,
                    'online': True,
                    'ping': True,
                    'ssh': True,
                    'ip': peer.ip
                }
            }
        else:
            return {
                'working': False,
                'message': f'Tailscale SSH failed: {result.stderr}',
                'details': {
                    'peer_found': True,
                    'online': True,
                    'ping': True,
                    'ssh': False,
                    'error': result.stderr
                }
            }

    except subprocess.TimeoutExpired:
        return {
            'working': False,
            'message': f'Tailscale SSH timed out after {timeout}s',
            'details': {'timeout': True}
        }
    except Exception as e:
        return {
            'working': False,
            'message': f'Error testing Tailscale SSH: {e}',
            'details': {'error': str(e)}
        }


def get_network_summary() -> str:
    """
    Get human-readable network summary.

    Returns:
        Formatted summary string

    Example:
        >>> print(get_network_summary())
        Tailscale Network: Connected
        Online: 8/10 machines (80%)
        Self IP: 100.64.1.5
    """
    status = get_tailscale_status()

    if not status.get('connected'):
        return "Tailscale Network: Not connected\nError: {}".format(
            status.get('error', 'Unknown error')
        )

    total = status['total_count']
    online = status['online_count']
    pct = (online / total * 100) if total else 0.0  # guard against an empty peer list

    lines = [
        "Tailscale Network: Connected",
        f"Online: {online}/{total} machines ({pct:.0f}%)",
        f"Self IP: {status.get('self_ip', 'unknown')}"
    ]

    return "\n".join(lines)


def main():
    """Test Tailscale manager functions."""
    print("Testing Tailscale manager...\n")

    print("1. Get Tailscale status:")
    status = get_tailscale_status()
    if status.get('connected'):
        print(f"   ✓ Connected")
        print(f"   Peers: {status['total_count']} total, {status['online_count']} online")
    else:
        print(f"   ✗ Not connected: {status.get('error', 'Unknown error')}")

    print("\n2. List online machines:")
    machines = list_online_machines()
    print(f"   Found {len(machines)} online machines")
    for machine in machines[:5]:  # Show first 5
        print(f"   - {machine}")

    print("\n3. Network summary:")
    print(get_network_summary())

    print("\n✅ Tailscale manager tested")


if __name__ == "__main__":
    main()
628
scripts/utils/helpers.py
Normal file
@@ -0,0 +1,628 @@
#!/usr/bin/env python3
"""
Helper utilities for Tailscale SSH Sync Agent.
Provides common formatting, parsing, and utility functions.
"""

import os
import re
import subprocess
from datetime import datetime, timedelta
from pathlib import Path
from typing import Dict, List, Optional, Tuple, Any
import yaml
import logging

logger = logging.getLogger(__name__)

def format_bytes(bytes_value: int) -> str:
    """
    Format bytes as human-readable string.

    Args:
        bytes_value: Number of bytes

    Returns:
        Formatted string (e.g., "12.3 MB", "1.5 GB")

    Example:
        >>> format_bytes(12582912)
        "12.0 MB"
        >>> format_bytes(1610612736)
        "1.5 GB"
    """
    for unit in ['B', 'KB', 'MB', 'GB', 'TB']:
        if bytes_value < 1024.0:
            return f"{bytes_value:.1f} {unit}"
        bytes_value /= 1024.0
    return f"{bytes_value:.1f} PB"

def format_duration(seconds: float) -> str:
    """
    Format duration as human-readable string.

    Args:
        seconds: Duration in seconds

    Returns:
        Formatted string (e.g., "2m 15s", "1h 30m")

    Example:
        >>> format_duration(135)
        "2m 15s"
        >>> format_duration(5430)
        "1h 30m 30s"
    """
    if seconds < 60:
        return f"{int(seconds)}s"

    minutes = int(seconds // 60)
    secs = int(seconds % 60)

    if minutes < 60:
        return f"{minutes}m {secs}s" if secs > 0 else f"{minutes}m"

    hours = minutes // 60
    minutes = minutes % 60

    parts = [f"{hours}h"]
    if minutes > 0:
        parts.append(f"{minutes}m")
    if secs > 0 and hours == 0:  # Only show seconds if < 1 hour
        parts.append(f"{secs}s")

    return " ".join(parts)

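The unit-scaling loop above can be checked quickly with a standalone restatement (same algorithm, renamed so it runs without the module):

```python
# Minimal restatement of the byte-formatting loop: divide by 1024 until
# the value fits the current unit.
def fmt_bytes(n):
    for unit in ['B', 'KB', 'MB', 'GB', 'TB']:
        if n < 1024.0:
            return f"{n:.1f} {unit}"
        n /= 1024.0
    return f"{n:.1f} PB"

print(fmt_bytes(12582912))    # prints "12.0 MB"
print(fmt_bytes(1610612736))  # prints "1.5 GB"
```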
def format_percentage(value: float, decimals: int = 1) -> str:
    """
    Format percentage with specified decimals.

    Args:
        value: Percentage value (0-100)
        decimals: Number of decimal places

    Returns:
        Formatted string (e.g., "45.5%")

    Example:
        >>> format_percentage(45.567)
        "45.6%"
    """
    return f"{value:.{decimals}f}%"

def parse_ssh_config(config_path: Optional[Path] = None) -> Dict[str, Dict[str, str]]:
    """
    Parse SSH config file for host definitions.

    Args:
        config_path: Path to SSH config (default: ~/.ssh/config)

    Returns:
        Dict mapping host aliases to their configuration:
        {
            'host-alias': {
                'hostname': '100.64.1.10',
                'user': 'admin',
                'port': '22',
                'identityfile': '~/.ssh/id_ed25519'
            }
        }

    Example:
        >>> hosts = parse_ssh_config()
        >>> hosts['homelab-1']['hostname']
        '100.64.1.10'
    """
    if config_path is None:
        config_path = Path.home() / '.ssh' / 'config'

    if not config_path.exists():
        logger.warning(f"SSH config not found: {config_path}")
        return {}

    hosts = {}
    current_host = None

    try:
        with open(config_path, 'r') as f:
            for line in f:
                line = line.strip()

                # Skip comments and empty lines
                if not line or line.startswith('#'):
                    continue

                # Host directive
                if line.lower().startswith('host '):
                    host_alias = line.split(maxsplit=1)[1]
                    # Skip wildcard blocks (and stop attributing their
                    # directives to the previous host)
                    if '*' not in host_alias and '?' not in host_alias:
                        current_host = host_alias
                        hosts[current_host] = {}
                    else:
                        current_host = None

                # Configuration directives
                elif current_host:
                    parts = line.split(maxsplit=1)
                    if len(parts) == 2:
                        key, value = parts
                        hosts[current_host][key.lower()] = value

        return hosts

    except Exception as e:
        logger.error(f"Error parsing SSH config: {e}")
        return {}

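The same parsing logic, run over a small inline config instead of `~/.ssh/config`:

```python
# Inline sample config: one concrete host, one wildcard block that must
# not leak its directives into the preceding host.
SAMPLE = """\
# comment
Host homelab-1
    HostName 100.64.1.10
    User admin

Host *
    ServerAliveInterval 60
"""

hosts = {}
current = None
for line in SAMPLE.splitlines():
    line = line.strip()
    if not line or line.startswith('#'):
        continue
    if line.lower().startswith('host '):
        alias = line.split(maxsplit=1)[1]
        if '*' not in alias and '?' not in alias:
            current = alias
            hosts[current] = {}
        else:
            current = None  # wildcard blocks are skipped entirely
    elif current:
        parts = line.split(maxsplit=1)
        if len(parts) == 2:
            hosts[current][parts[0].lower()] = parts[1]

print(hosts)  # prints "{'homelab-1': {'hostname': '100.64.1.10', 'user': 'admin'}}"
```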
def parse_sshsync_config(config_path: Optional[Path] = None) -> Dict[str, List[str]]:
    """
    Parse sshsync config file for group definitions.

    Args:
        config_path: Path to sshsync config (default: ~/.config/sshsync/config.yaml)

    Returns:
        Dict mapping group names to list of hosts:
        {
            'production': ['prod-web-01', 'prod-db-01'],
            'development': ['dev-laptop', 'dev-desktop']
        }

    Example:
        >>> groups = parse_sshsync_config()
        >>> groups['production']
        ['prod-web-01', 'prod-db-01']
    """
    if config_path is None:
        config_path = Path.home() / '.config' / 'sshsync' / 'config.yaml'

    if not config_path.exists():
        logger.warning(f"sshsync config not found: {config_path}")
        return {}

    try:
        with open(config_path, 'r') as f:
            config = yaml.safe_load(f)

        # safe_load returns None for an empty file
        return (config or {}).get('groups', {})

    except Exception as e:
        logger.error(f"Error parsing sshsync config: {e}")
        return {}


def get_timestamp(iso: bool = True) -> str:
    """
    Get current timestamp.

    Args:
        iso: If True, return ISO format; otherwise human-readable

    Returns:
        Timestamp string

    Example:
        >>> get_timestamp(iso=True)
        "2025-10-19T19:43:41Z"
        >>> get_timestamp(iso=False)
        "2025-10-19 19:43:41"
    """
    if iso:
        # Use UTC so the trailing "Z" is accurate
        return datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%SZ")
    else:
        return datetime.now().strftime("%Y-%m-%d %H:%M:%S")


def safe_execute(func, *args, default=None, **kwargs) -> Any:
    """
    Execute function with error handling.

    Args:
        func: Function to execute
        *args: Positional arguments
        default: Value to return on error
        **kwargs: Keyword arguments

    Returns:
        Function result or default on error

    Example:
        >>> safe_execute(int, "not_a_number", default=0)
        0
        >>> safe_execute(int, "42")
        42
    """
    try:
        return func(*args, **kwargs)
    except Exception as e:
        logger.error(f"Error executing {func.__name__}: {e}")
        return default


def validate_path(path: str, must_exist: bool = True) -> bool:
    """
    Check if path is valid and accessible.

    Args:
        path: Path to validate
        must_exist: If True, path must exist

    Returns:
        True if valid, False otherwise

    Example:
        >>> validate_path("/tmp")
        True
        >>> validate_path("/nonexistent", must_exist=True)
        False
    """
    p = Path(path).expanduser()

    if must_exist:
        return p.exists()
    else:
        # Check if parent directory exists (for paths that will be created)
        return p.parent.exists()

def parse_disk_usage(df_output: str) -> Dict[str, Any]:
|
||||||
|
"""
|
||||||
|
Parse 'df' command output.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
df_output: Output from 'df -h' command
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Dict with disk usage info:
|
||||||
|
{
|
||||||
|
'filesystem': '/dev/sda1',
|
||||||
|
'size': '100G',
|
||||||
|
'used': '45G',
|
||||||
|
'available': '50G',
|
||||||
|
'use_pct': 45,
|
||||||
|
'mount': '/'
|
||||||
|
}
|
||||||
|
|
||||||
|
Example:
|
||||||
|
>>> output = "Filesystem Size Used Avail Use% Mounted on\\n/dev/sda1 100G 45G 50G 45% /"
|
||||||
|
>>> parse_disk_usage(output)
|
||||||
|
{'filesystem': '/dev/sda1', 'size': '100G', ...}
|
||||||
|
"""
|
||||||
|
lines = df_output.strip().split('\n')
|
||||||
|
if len(lines) < 2:
|
||||||
|
return {}
|
||||||
|
|
||||||
|
# Parse last line (actual data, not header)
|
||||||
|
data_line = lines[-1]
|
||||||
|
parts = data_line.split()
|
||||||
|
|
||||||
|
if len(parts) < 6:
|
||||||
|
return {}
|
||||||
|
|
||||||
|
try:
|
||||||
|
return {
|
||||||
|
'filesystem': parts[0],
|
||||||
|
'size': parts[1],
|
||||||
|
'used': parts[2],
|
||||||
|
'available': parts[3],
|
||||||
|
'use_pct': int(parts[4].rstrip('%')),
|
||||||
|
'mount': parts[5]
|
||||||
|
}
|
||||||
|
except (ValueError, IndexError) as e:
|
||||||
|
logger.error(f"Error parsing disk usage: {e}")
|
||||||
|
return {}
|
||||||
|
|
||||||
|
|
||||||
|
def parse_memory_usage(free_output: str) -> Dict[str, Any]:
|
||||||
|
"""
|
||||||
|
Parse 'free' command output (Linux).
|
||||||
|
|
||||||
|
Args:
|
||||||
|
free_output: Output from 'free -m' command
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Dict with memory info:
|
||||||
|
{
|
||||||
|
'total': 16384, # MB
|
||||||
|
'used': 8192,
|
||||||
|
'free': 8192,
|
||||||
|
'use_pct': 50.0
|
||||||
|
}
|
||||||
|
|
||||||
|
Example:
|
||||||
|
>>> output = "Mem: 16384 8192 8192 0 0 0"
|
||||||
|
>>> parse_memory_usage(output)
|
||||||
|
{'total': 16384, 'used': 8192, ...}
|
||||||
|
"""
|
||||||
|
lines = free_output.strip().split('\n')
|
||||||
|
|
||||||
|
for line in lines:
|
||||||
|
if line.startswith('Mem:'):
|
||||||
|
parts = line.split()
|
||||||
|
if len(parts) >= 3:
|
||||||
|
try:
|
||||||
|
total = int(parts[1])
|
||||||
|
used = int(parts[2])
|
||||||
|
free = int(parts[3]) if len(parts) > 3 else (total - used)
|
||||||
|
|
||||||
|
return {
|
||||||
|
'total': total,
|
||||||
|
'used': used,
|
||||||
|
'free': free,
|
||||||
|
'use_pct': (used / total * 100) if total > 0 else 0
|
||||||
|
}
|
||||||
|
except (ValueError, IndexError) as e:
|
||||||
|
logger.error(f"Error parsing memory usage: {e}")
|
||||||
|
|
||||||
|
return {}
|
||||||
|
|
||||||
|
|
||||||
|
def parse_cpu_load(uptime_output: str) -> Dict[str, float]:
|
||||||
|
"""
|
||||||
|
Parse 'uptime' command output for load averages.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
uptime_output: Output from 'uptime' command
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Dict with load averages:
|
||||||
|
{
|
||||||
|
'load_1min': 0.45,
|
||||||
|
'load_5min': 0.38,
|
||||||
|
'load_15min': 0.32
|
||||||
|
}
|
||||||
|
|
||||||
|
Example:
|
||||||
|
>>> output = "19:43:41 up 5 days, 2:15, 3 users, load average: 0.45, 0.38, 0.32"
|
||||||
|
>>> parse_cpu_load(output)
|
||||||
|
{'load_1min': 0.45, 'load_5min': 0.38, 'load_15min': 0.32}
|
||||||
|
"""
|
||||||
|
# Find "load average:" part
|
||||||
|
match = re.search(r'load average:\s+([\d.]+),\s+([\d.]+),\s+([\d.]+)', uptime_output)
|
||||||
|
|
||||||
|
if match:
|
||||||
|
try:
|
||||||
|
return {
|
||||||
|
'load_1min': float(match.group(1)),
|
||||||
|
'load_5min': float(match.group(2)),
|
||||||
|
'load_15min': float(match.group(3))
|
||||||
|
}
|
||||||
|
except ValueError as e:
|
||||||
|
logger.error(f"Error parsing CPU load: {e}")
|
||||||
|
|
||||||
|
return {}
|
||||||
|
|
||||||
|
|
||||||
|
def format_host_status(host: str, online: bool, groups: List[str],
|
||||||
|
latency: Optional[int] = None,
|
||||||
|
tailscale_connected: bool = False) -> str:
|
||||||
|
"""
|
||||||
|
Format host status as display string.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
host: Host name
|
||||||
|
online: Whether host is online
|
||||||
|
groups: List of groups host belongs to
|
||||||
|
latency: Latency in ms (optional)
|
||||||
|
tailscale_connected: Tailscale connection status
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Formatted status string
|
||||||
|
|
||||||
|
Example:
|
||||||
|
>>> format_host_status("web-01", True, ["production", "web"], 25, True)
|
||||||
|
"🟢 web-01 (production, web) - Online - Tailscale: Connected | Latency: 25ms"
|
||||||
|
"""
|
||||||
|
icon = "🟢" if online else "🔴"
|
||||||
|
status = "Online" if online else "Offline"
|
||||||
|
group_str = ", ".join(groups) if groups else "no group"
|
||||||
|
|
||||||
|
parts = [f"{icon} {host} ({group_str}) - {status}"]
|
||||||
|
|
||||||
|
if tailscale_connected:
|
||||||
|
parts.append("Tailscale: Connected")
|
||||||
|
|
||||||
|
if latency is not None and online:
|
||||||
|
parts.append(f"Latency: {latency}ms")
|
||||||
|
|
||||||
|
return " - ".join(parts)
|
||||||
|
|
||||||
|
|
||||||
|
def calculate_load_score(cpu_pct: float, mem_pct: float, disk_pct: float) -> float:
|
||||||
|
"""
|
||||||
|
Calculate composite load score for a machine.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
cpu_pct: CPU usage percentage (0-100)
|
||||||
|
mem_pct: Memory usage percentage (0-100)
|
||||||
|
disk_pct: Disk usage percentage (0-100)
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Load score (0-1, lower is better)
|
||||||
|
|
||||||
|
Formula:
|
||||||
|
score = (cpu * 0.4) + (mem * 0.3) + (disk * 0.3)
|
||||||
|
|
||||||
|
Example:
|
||||||
|
>>> calculate_load_score(45, 60, 40)
|
||||||
|
0.48 # (0.45*0.4 + 0.60*0.3 + 0.40*0.3)
|
||||||
|
"""
|
||||||
|
return (cpu_pct * 0.4 + mem_pct * 0.3 + disk_pct * 0.3) / 100
|
||||||
|
|
||||||
|
|
||||||
|
def classify_load_status(score: float) -> str:
|
||||||
|
"""
|
||||||
|
Classify load score into status category.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
score: Load score (0-1)
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Status string: "low", "moderate", or "high"
|
||||||
|
|
||||||
|
Example:
|
||||||
|
>>> classify_load_status(0.28)
|
||||||
|
"low"
|
||||||
|
>>> classify_load_status(0.55)
|
||||||
|
"moderate"
|
||||||
|
>>> classify_load_status(0.82)
|
||||||
|
"high"
|
||||||
|
"""
|
||||||
|
if score < 0.4:
|
||||||
|
return "low"
|
||||||
|
elif score < 0.7:
|
||||||
|
return "moderate"
|
||||||
|
else:
|
||||||
|
return "high"
|
||||||
|
|
||||||
|
|
||||||
|
def classify_latency(latency_ms: int) -> Tuple[str, str]:
|
||||||
|
"""
|
||||||
|
Classify network latency.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
latency_ms: Latency in milliseconds
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Tuple of (status, description)
|
||||||
|
|
||||||
|
Example:
|
||||||
|
>>> classify_latency(25)
|
||||||
|
("excellent", "Ideal for interactive tasks")
|
||||||
|
>>> classify_latency(150)
|
||||||
|
("fair", "May impact interactive workflows")
|
||||||
|
"""
|
||||||
|
if latency_ms < 50:
|
||||||
|
return ("excellent", "Ideal for interactive tasks")
|
||||||
|
elif latency_ms < 100:
|
||||||
|
return ("good", "Suitable for most operations")
|
||||||
|
elif latency_ms < 200:
|
||||||
|
return ("fair", "May impact interactive workflows")
|
||||||
|
else:
|
||||||
|
return ("poor", "Investigate network issues")
|
||||||
|
|
||||||
|
|
||||||
|
def get_hosts_from_groups(group: str, groups_config: Dict[str, List[str]]) -> List[str]:
|
||||||
|
"""
|
||||||
|
Get list of hosts in a group.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
group: Group name
|
||||||
|
groups_config: Groups configuration dict
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
List of host names in group
|
||||||
|
|
||||||
|
Example:
|
||||||
|
>>> groups = {'production': ['web-01', 'db-01']}
|
||||||
|
>>> get_hosts_from_groups('production', groups)
|
||||||
|
['web-01', 'db-01']
|
||||||
|
"""
|
||||||
|
return groups_config.get(group, [])
|
||||||
|
|
||||||
|
|
||||||
|
def get_groups_for_host(host: str, groups_config: Dict[str, List[str]]) -> List[str]:
|
||||||
|
"""
|
||||||
|
Get list of groups a host belongs to.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
host: Host name
|
||||||
|
groups_config: Groups configuration dict
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
List of group names
|
||||||
|
|
||||||
|
Example:
|
||||||
|
>>> groups = {'production': ['web-01'], 'web': ['web-01', 'web-02']}
|
||||||
|
>>> get_groups_for_host('web-01', groups)
|
||||||
|
['production', 'web']
|
||||||
|
"""
|
||||||
|
return [group for group, hosts in groups_config.items() if host in hosts]
|
||||||
|
|
||||||
|
|
||||||
|
def run_command(command: str, timeout: int = 10) -> Tuple[bool, str, str]:
|
||||||
|
"""
|
||||||
|
Run shell command with timeout.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
command: Command to execute
|
||||||
|
timeout: Timeout in seconds
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Tuple of (success, stdout, stderr)
|
||||||
|
|
||||||
|
Example:
|
||||||
|
>>> success, stdout, stderr = run_command("echo hello")
|
||||||
|
>>> success
|
||||||
|
True
|
||||||
|
>>> stdout.strip()
|
||||||
|
"hello"
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
result = subprocess.run(
|
||||||
|
command,
|
||||||
|
shell=True,
|
||||||
|
capture_output=True,
|
||||||
|
text=True,
|
||||||
|
timeout=timeout
|
||||||
|
)
|
||||||
|
|
||||||
|
return (
|
||||||
|
result.returncode == 0,
|
||||||
|
result.stdout,
|
||||||
|
result.stderr
|
||||||
|
)
|
||||||
|
|
||||||
|
except subprocess.TimeoutExpired:
|
||||||
|
return (False, "", f"Command timed out after {timeout}s")
|
||||||
|
except Exception as e:
|
||||||
|
return (False, "", str(e))
|
||||||
|
|
||||||
|
|
||||||
|
def main():
|
||||||
|
"""Test helper functions."""
|
||||||
|
print("Testing helper functions...\n")
|
||||||
|
|
||||||
|
# Test formatting
|
||||||
|
print("1. Format bytes:")
|
||||||
|
print(f" 12582912 bytes = {format_bytes(12582912)}")
|
||||||
|
print(f" 1610612736 bytes = {format_bytes(1610612736)}")
|
||||||
|
|
||||||
|
print("\n2. Format duration:")
|
||||||
|
print(f" 135 seconds = {format_duration(135)}")
|
||||||
|
print(f" 5430 seconds = {format_duration(5430)}")
|
||||||
|
|
||||||
|
print("\n3. Format percentage:")
|
||||||
|
print(f" 45.567 = {format_percentage(45.567)}")
|
||||||
|
|
||||||
|
print("\n4. Calculate load score:")
|
||||||
|
score = calculate_load_score(45, 60, 40)
|
||||||
|
print(f" CPU 45%, Mem 60%, Disk 40% = {score:.2f}")
|
||||||
|
print(f" Status: {classify_load_status(score)}")
|
||||||
|
|
||||||
|
print("\n5. Classify latency:")
|
||||||
|
latencies = [25, 75, 150, 250]
|
||||||
|
for lat in latencies:
|
||||||
|
status, desc = classify_latency(lat)
|
||||||
|
print(f" {lat}ms: {status} - {desc}")
|
||||||
|
|
||||||
|
print("\n6. Parse SSH config:")
|
||||||
|
ssh_hosts = parse_ssh_config()
|
||||||
|
print(f" Found {len(ssh_hosts)} hosts")
|
||||||
|
|
||||||
|
print("\n7. Parse sshsync config:")
|
||||||
|
groups = parse_sshsync_config()
|
||||||
|
print(f" Found {len(groups)} groups")
|
||||||
|
for group, hosts in groups.items():
|
||||||
|
print(f" - {group}: {len(hosts)} hosts")
|
||||||
|
|
||||||
|
print("\n✅ All helpers tested successfully")
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
||||||
43
scripts/utils/validators/__init__.py
Normal file
@@ -0,0 +1,43 @@
"""
Validators package for Tailscale SSH Sync Agent.
"""

from .parameter_validator import (
    ValidationError,
    validate_host,
    validate_group,
    validate_path_exists,
    validate_timeout,
    validate_command
)

from .host_validator import (
    validate_ssh_config,
    validate_host_reachable,
    validate_group_members,
    get_invalid_hosts
)

from .connection_validator import (
    validate_ssh_connection,
    validate_tailscale_connection,
    validate_ssh_key,
    get_connection_diagnostics
)

__all__ = [
    'ValidationError',
    'validate_host',
    'validate_group',
    'validate_path_exists',
    'validate_timeout',
    'validate_command',
    'validate_ssh_config',
    'validate_host_reachable',
    'validate_group_members',
    'get_invalid_hosts',
    'validate_ssh_connection',
    'validate_tailscale_connection',
    'validate_ssh_key',
    'get_connection_diagnostics',
]
275
scripts/utils/validators/connection_validator.py
Normal file
@@ -0,0 +1,275 @@
#!/usr/bin/env python3
"""
Connection validators for Tailscale SSH Sync Agent.
Validates SSH and Tailscale connections.
"""

import subprocess
from typing import Any, Dict
import logging

from .parameter_validator import ValidationError

logger = logging.getLogger(__name__)


def validate_ssh_connection(host: str, timeout: int = 10) -> bool:
    """
    Test that an SSH connection works.

    Args:
        host: Host to connect to
        timeout: Connection timeout in seconds

    Returns:
        True if the SSH connection succeeds

    Raises:
        ValidationError: If the connection fails

    Example:
        >>> validate_ssh_connection("web-01")
        True
    """
    try:
        # Try to execute a simple command via SSH
        result = subprocess.run(
            ["ssh", "-o", f"ConnectTimeout={timeout}",
             "-o", "BatchMode=yes",
             "-o", "StrictHostKeyChecking=no",
             host, "echo", "test"],
            capture_output=True,
            text=True,
            timeout=timeout + 5
        )

        if result.returncode == 0:
            return True

        # Parse the error message
        error_msg = result.stderr.strip()

        if "Permission denied" in error_msg:
            raise ValidationError(
                f"SSH authentication failed for '{host}'\n"
                "Check:\n"
                "1. SSH key is added: ssh-add -l\n"
                "2. Public key is on remote: cat ~/.ssh/authorized_keys\n"
                "3. User/key in SSH config is correct"
            )
        elif "Connection refused" in error_msg:
            raise ValidationError(
                f"SSH connection refused for '{host}'\n"
                "Check:\n"
                "1. SSH server is running on remote\n"
                "2. Port 22 is not blocked by firewall"
            )
        elif "Connection timed out" in error_msg or "timeout" in error_msg.lower():
            raise ValidationError(
                f"SSH connection timed out for '{host}'\n"
                "Check:\n"
                "1. Host is reachable (ping test)\n"
                "2. Tailscale is connected\n"
                "3. Network connectivity"
            )
        else:
            raise ValidationError(
                f"SSH connection failed for '{host}': {error_msg}"
            )

    except subprocess.TimeoutExpired:
        raise ValidationError(
            f"SSH connection timed out for '{host}' (>{timeout}s)"
        )
    except ValidationError:
        # Re-raise as-is so the errors above aren't re-wrapped below
        raise
    except Exception as e:
        raise ValidationError(f"Error testing SSH connection to '{host}': {e}")


def validate_tailscale_connection(host: str) -> bool:
    """
    Test Tailscale connectivity to a host.

    Args:
        host: Host to check

    Returns:
        True if Tailscale connection is active

    Raises:
        ValidationError: If Tailscale is not connected

    Example:
        >>> validate_tailscale_connection("web-01")
        True
    """
    try:
        # Check if tailscale is running
        result = subprocess.run(
            ["tailscale", "status"],
            capture_output=True,
            text=True,
            timeout=5
        )

        if result.returncode != 0:
            raise ValidationError(
                "Tailscale is not running\n"
                "Start Tailscale: sudo tailscale up"
            )

        # Check if the specific host is in the network
        if host in result.stdout or host.replace('-', '.') in result.stdout:
            return True
        raise ValidationError(
            f"Host '{host}' not found in Tailscale network\n"
            "Ensure host is:\n"
            "1. Connected to Tailscale\n"
            "2. In the same tailnet\n"
            "3. Not expired/offline"
        )

    except FileNotFoundError:
        raise ValidationError(
            "Tailscale not installed\n"
            "Install: https://tailscale.com/download"
        )
    except subprocess.TimeoutExpired:
        raise ValidationError("Timeout checking Tailscale status")
    except ValidationError:
        # Re-raise as-is so the errors above aren't re-wrapped below
        raise
    except Exception as e:
        raise ValidationError(f"Error checking Tailscale connection: {e}")


def validate_ssh_key(host: str) -> bool:
    """
    Check that SSH key authentication is working.

    Args:
        host: Host to check

    Returns:
        True if SSH key auth works

    Raises:
        ValidationError: If key auth fails

    Example:
        >>> validate_ssh_key("web-01")
        True
    """
    try:
        # Test connection with explicit key-only auth
        result = subprocess.run(
            ["ssh", "-o", "BatchMode=yes",
             "-o", "PasswordAuthentication=no",
             "-o", "ConnectTimeout=5",
             host, "echo", "test"],
            capture_output=True,
            text=True,
            timeout=10
        )

        if result.returncode == 0:
            return True

        error_msg = result.stderr.strip()

        if "Permission denied" in error_msg:
            raise ValidationError(
                f"SSH key authentication failed for '{host}'\n"
                "Fix:\n"
                "1. Add your SSH key: ssh-add ~/.ssh/id_ed25519\n"
                f"2. Copy public key to remote: ssh-copy-id {host}\n"
                f"3. Verify: ssh -v {host} 2>&1 | grep -i 'offering public key'"
            )
        else:
            raise ValidationError(
                f"SSH key validation failed for '{host}': {error_msg}"
            )

    except subprocess.TimeoutExpired:
        raise ValidationError(f"Timeout validating SSH key for '{host}'")
    except ValidationError:
        # Re-raise as-is so the errors above aren't re-wrapped below
        raise
    except Exception as e:
        raise ValidationError(f"Error validating SSH key for '{host}': {e}")


def get_connection_diagnostics(host: str) -> Dict[str, Any]:
    """
    Run comprehensive connection tests.

    Args:
        host: Host to diagnose

    Returns:
        Dict with diagnostic results:
        {
            'ping': {'success': bool, 'message': str},
            'ssh': {'success': bool, 'message': str},
            'tailscale': {'success': bool, 'message': str},
            'ssh_key': {'success': bool, 'message': str}
        }

    Example:
        >>> diag = get_connection_diagnostics("web-01")
        >>> diag['ssh']['success']
        True
    """
    diagnostics = {}

    # Test 1: Ping
    try:
        result = subprocess.run(
            ["ping", "-c", "1", "-W", "2", host],
            capture_output=True,
            timeout=3
        )
        diagnostics['ping'] = {
            'success': result.returncode == 0,
            'message': 'Host is reachable' if result.returncode == 0 else 'Host not reachable'
        }
    except Exception as e:
        diagnostics['ping'] = {'success': False, 'message': str(e)}

    # Test 2: SSH connection
    try:
        validate_ssh_connection(host, timeout=5)
        diagnostics['ssh'] = {'success': True, 'message': 'SSH connection works'}
    except ValidationError as e:
        diagnostics['ssh'] = {'success': False, 'message': str(e).split('\n')[0]}

    # Test 3: Tailscale
    try:
        validate_tailscale_connection(host)
        diagnostics['tailscale'] = {'success': True, 'message': 'Tailscale connected'}
    except ValidationError as e:
        diagnostics['tailscale'] = {'success': False, 'message': str(e).split('\n')[0]}

    # Test 4: SSH key
    try:
        validate_ssh_key(host)
        diagnostics['ssh_key'] = {'success': True, 'message': 'SSH key authentication works'}
    except ValidationError as e:
        diagnostics['ssh_key'] = {'success': False, 'message': str(e).split('\n')[0]}

    return diagnostics


def main():
    """Test connection validators."""
    print("Testing connection validators...\n")

    print("1. Testing connection diagnostics:")
    try:
        diag = get_connection_diagnostics("localhost")
        print("   Results:")
        for test, result in diag.items():
            status = "✓" if result['success'] else "✗"
            print(f"   {status} {test}: {result['message']}")
    except Exception as e:
        print(f"   Error: {e}")

    print("\n✅ Connection validators tested")


if __name__ == "__main__":
    main()
232
scripts/utils/validators/host_validator.py
Normal file
@@ -0,0 +1,232 @@
#!/usr/bin/env python3
"""
Host validators for Tailscale SSH Sync Agent.
Validates host configuration and availability.
"""

import subprocess
from typing import List, Dict, Optional
from pathlib import Path
import logging

from .parameter_validator import ValidationError

logger = logging.getLogger(__name__)


def validate_ssh_config(host: str, config_path: Optional[Path] = None) -> bool:
    """
    Check if a host has an SSH config entry.

    Args:
        host: Host name to check
        config_path: Path to SSH config (default: ~/.ssh/config)

    Returns:
        True if host is in SSH config

    Raises:
        ValidationError: If host not found in config

    Example:
        >>> validate_ssh_config("web-01")
        True
    """
    if config_path is None:
        config_path = Path.home() / '.ssh' / 'config'

    if not config_path.exists():
        raise ValidationError(
            f"SSH config file not found: {config_path}\n"
            "Create ~/.ssh/config with your host definitions"
        )

    # Parse SSH config for this host
    host_found = False

    try:
        with open(config_path, 'r') as f:
            for line in f:
                line = line.strip()
                # Match the alias as a whole token, not a substring
                # (so 'web-01' does not match 'Host web-012')
                if line.lower().startswith('host ') and host in line.split()[1:]:
                    host_found = True
                    break

        if not host_found:
            raise ValidationError(
                f"Host '{host}' not found in SSH config: {config_path}\n"
                "Add host to SSH config:\n"
                f"Host {host}\n"
                "    HostName <IP_ADDRESS>\n"
                "    User <USERNAME>"
            )

        return True

    except IOError as e:
        raise ValidationError(f"Error reading SSH config: {e}")


def validate_host_reachable(host: str, timeout: int = 5) -> bool:
    """
    Check if a host is reachable via ping.

    Args:
        host: Host name to check
        timeout: Timeout in seconds

    Returns:
        True if host is reachable

    Raises:
        ValidationError: If host is not reachable

    Example:
        >>> validate_host_reachable("web-01", timeout=5)
        True
    """
    try:
        # Try to resolve via SSH config first
        result = subprocess.run(
            ["ssh", "-G", host],
            capture_output=True,
            text=True,
            timeout=2
        )

        if result.returncode == 0:
            # Extract the real hostname from the resolved SSH config
            for line in result.stdout.split('\n'):
                if line.startswith('hostname '):
                    actual_host = line.split()[1]
                    break
            else:
                actual_host = host
        else:
            actual_host = host

        # Ping the host
        ping_result = subprocess.run(
            ["ping", "-c", "1", "-W", str(timeout), actual_host],
            capture_output=True,
            text=True,
            timeout=timeout + 1
        )

        if ping_result.returncode == 0:
            return True
        raise ValidationError(
            f"Host '{host}' ({actual_host}) is not reachable\n"
            "Check:\n"
            "1. Host is powered on\n"
            "2. Tailscale is connected\n"
            "3. Network connectivity"
        )

    except subprocess.TimeoutExpired:
        raise ValidationError(f"Timeout checking host '{host}' (>{timeout}s)")
    except ValidationError:
        # Re-raise as-is so the error above isn't re-wrapped below
        raise
    except Exception as e:
        raise ValidationError(f"Error checking host '{host}': {e}")


def validate_group_members(group: str, groups_config: Dict[str, List[str]]) -> List[str]:
    """
    Ensure a group has valid members.

    Args:
        group: Group name
        groups_config: Groups configuration dict

    Returns:
        List of valid hosts in the group

    Raises:
        ValidationError: If group is empty or has no valid members

    Example:
        >>> groups = {'production': ['web-01', 'db-01']}
        >>> validate_group_members('production', groups)
        ['web-01', 'db-01']
    """
    if group not in groups_config:
        raise ValidationError(
            f"Group '{group}' not found in configuration\n"
            f"Available groups: {', '.join(groups_config.keys())}"
        )

    members = groups_config[group]

    if not members:
        raise ValidationError(
            f"Group '{group}' has no members\n"
            f"Add hosts to group with: sshsync gadd {group}"
        )

    if not isinstance(members, list):
        raise ValidationError(
            f"Invalid group configuration for '{group}': members must be a list"
        )

    return members


def get_invalid_hosts(hosts: List[str], config_path: Optional[Path] = None) -> List[str]:
    """
    Find hosts without valid SSH config.

    Args:
        hosts: List of host names
        config_path: Path to SSH config

    Returns:
        List of hosts without valid config

    Example:
        >>> get_invalid_hosts(["web-01", "nonexistent"])
        ["nonexistent"]
    """
    if config_path is None:
        config_path = Path.home() / '.ssh' / 'config'

    if not config_path.exists():
        return hosts  # All invalid if no config

    # Parse SSH config
    valid_hosts = set()
    try:
        with open(config_path, 'r') as f:
            for line in f:
                line = line.strip()
                if line.lower().startswith('host '):
                    host_alias = line.split(maxsplit=1)[1]
                    if '*' not in host_alias and '?' not in host_alias:
                        valid_hosts.add(host_alias)
    except IOError:
        return hosts

    # Find invalid hosts
    return [h for h in hosts if h not in valid_hosts]


def main():
    """Test host validators."""
    print("Testing host validators...\n")

    print("1. Testing validate_ssh_config():")
    try:
        validate_ssh_config("localhost")
        print("   ✓ localhost has SSH config")
    except ValidationError as e:
        print(f"   Note: {e.args[0].split(chr(10))[0]}")

    print("\n2. Testing get_invalid_hosts():")
    test_hosts = ["localhost", "nonexistent-host-12345"]
    invalid = get_invalid_hosts(test_hosts)
    print(f"   Invalid hosts: {invalid}")

    print("\n✅ Host validators tested")


if __name__ == "__main__":
    main()
363
scripts/utils/validators/parameter_validator.py
Normal file
@@ -0,0 +1,363 @@
#!/usr/bin/env python3
"""
Parameter validators for Tailscale SSH Sync Agent.
Validates user inputs before executing operations.
"""

from typing import List, Optional
from pathlib import Path
import re
import logging

logger = logging.getLogger(__name__)


class ValidationError(Exception):
    """Raised when validation fails."""
    pass


def validate_host(host: str, valid_hosts: Optional[List[str]] = None) -> str:
    """
    Validate host parameter.

    Args:
        host: Host name or alias
        valid_hosts: List of valid hosts (None to skip check)

    Returns:
        str: Validated and normalized host name

    Raises:
        ValidationError: If host is invalid

    Example:
        >>> validate_host("web-01")
        "web-01"
        >>> validate_host("web-01", ["web-01", "web-02"])
        "web-01"
    """
    if not host:
        raise ValidationError("Host cannot be empty")

    if not isinstance(host, str):
        raise ValidationError(f"Host must be string, got {type(host)}")

    # Normalize (strip whitespace; lowercase only for comparison)
    host = host.strip()

    # Basic validation: alphanumeric, dash, underscore, dot
    if not re.match(r'^[a-zA-Z0-9._-]+$', host):
        raise ValidationError(
            f"Invalid host name format: {host}\n"
            "Host names must contain only letters, numbers, dots, dashes, and underscores"
        )

    # Check against the allowed list (if provided)
    if valid_hosts:
        # Try exact match first
        if host in valid_hosts:
            return host

        # Try case-insensitive match
        for valid_host in valid_hosts:
            if host.lower() == valid_host.lower():
                return valid_host

        # Not found - provide suggestions
        suggestions = [h for h in valid_hosts if host[:3].lower() in h.lower()]
        raise ValidationError(
            f"Invalid host: {host}\n"
            f"Valid options: {', '.join(valid_hosts[:10])}\n"
            + (f"Did you mean: {', '.join(suggestions[:3])}?" if suggestions else "")
        )

    return host


def validate_group(group: str, valid_groups: Optional[List[str]] = None) -> str:
    """
    Validate group parameter.

    Args:
        group: Group name
        valid_groups: List of valid groups (None to skip check)

    Returns:
        str: Validated group name

    Raises:
        ValidationError: If group is invalid

    Example:
        >>> validate_group("production")
        "production"
        >>> validate_group("prod", ["production", "development"])
        ValidationError: Invalid group: prod
    """
    if not group:
        raise ValidationError("Group cannot be empty")

    if not isinstance(group, str):
        raise ValidationError(f"Group must be string, got {type(group)}")

    # Normalize
    group = group.strip().lower()

    # Basic validation
    if not re.match(r'^[a-z0-9_-]+$', group):
        raise ValidationError(
            f"Invalid group name format: {group}\n"
            "Group names must contain only lowercase letters, numbers, dashes, and underscores"
        )

    # Check against the allowed list (if provided)
    if valid_groups:
        if group not in valid_groups:
            suggestions = [g for g in valid_groups if group[:3] in g]
            raise ValidationError(
                f"Invalid group: {group}\n"
                f"Valid groups: {', '.join(valid_groups)}\n"
                + (f"Did you mean: {', '.join(suggestions[:3])}?" if suggestions else "")
            )

    return group


def validate_path_exists(path: str, must_be_file: bool = False,
                         must_be_dir: bool = False) -> Path:
    """
    Validate that a path exists and is accessible.

    Args:
        path: Path to validate
        must_be_file: If True, path must be a file
        must_be_dir: If True, path must be a directory

    Returns:
        Path: Validated Path object

    Raises:
        ValidationError: If path is invalid

    Example:
        >>> validate_path_exists("/tmp", must_be_dir=True)
        Path('/tmp')
        >>> validate_path_exists("/nonexistent")
        ValidationError: Path does not exist: /nonexistent
    """
    if not path:
        raise ValidationError("Path cannot be empty")

    p = Path(path).expanduser().resolve()

    if not p.exists():
        raise ValidationError(
            f"Path does not exist: {path}\n"
            f"Resolved to: {p}"
        )

    if must_be_file and not p.is_file():
        raise ValidationError(f"Path must be a file: {path}")

    if must_be_dir and not p.is_dir():
        raise ValidationError(f"Path must be a directory: {path}")

    return p


def validate_timeout(timeout: int, min_timeout: int = 1,
                     max_timeout: int = 600) -> int:
    """
    Validate timeout parameter.

    Args:
        timeout: Timeout in seconds
        min_timeout: Minimum allowed timeout
        max_timeout: Maximum allowed timeout

    Returns:
        int: Validated timeout

    Raises:
        ValidationError: If timeout is invalid

    Example:
        >>> validate_timeout(10)
        10
        >>> validate_timeout(0)
        ValidationError: Timeout too low: 0s (minimum: 1s)
    """
    if not isinstance(timeout, int):
        raise ValidationError(f"Timeout must be integer, got {type(timeout)}")

    if timeout < min_timeout:
        raise ValidationError(
            f"Timeout too low: {timeout}s (minimum: {min_timeout}s)"
        )

    if timeout > max_timeout:
        raise ValidationError(
            f"Timeout too high: {timeout}s (maximum: {max_timeout}s)"
        )

    return timeout


def validate_command(command: str, allow_dangerous: bool = False) -> str:
    """
    Basic command safety validation.

    Args:
        command: Command to validate
        allow_dangerous: If False, block potentially dangerous commands

    Returns:
        str: Validated command

    Raises:
        ValidationError: If command is invalid or dangerous

    Example:
        >>> validate_command("ls -la")
        "ls -la"
        >>> validate_command("rm -rf /", allow_dangerous=False)
        ValidationError: Potentially dangerous command blocked: rm -rf on root directory
    """
    if not command:
        raise ValidationError("Command cannot be empty")

    if not isinstance(command, str):
        raise ValidationError(f"Command must be string, got {type(command)}")

    command = command.strip()

    if not allow_dangerous:
        # Check for dangerous patterns
        dangerous_patterns = [
            (r'\brm\s+-rf\s+/', "rm -rf on root directory"),
            (r'\bmkfs\.', "filesystem formatting"),
            (r'\bdd\s+.*of=/dev/', "disk writing with dd"),
            (re.escape(':(){:|:&};:'), "fork bomb"),
            (r'>\s*/dev/sd[a-z]', "direct disk writing"),
        ]

        for pattern, description in dangerous_patterns:
            if re.search(pattern, command, re.IGNORECASE):
                raise ValidationError(
                    f"Potentially dangerous command blocked: {description}\n"
                    f"Command: {command}\n"
                    "Use allow_dangerous=True if you really want to execute this"
                )

    return command


def validate_hosts_list(hosts: List[str], valid_hosts: Optional[List[str]] = None) -> List[str]:
    """
    Validate a list of hosts.

    Args:
        hosts: List of host names
        valid_hosts: List of valid hosts (None to skip check)

    Returns:
        List[str]: Validated host names

    Raises:
        ValidationError: If any host is invalid

    Example:
        >>> validate_hosts_list(["web-01", "web-02"])
        ["web-01", "web-02"]
    """
    if not hosts:
        raise ValidationError("Hosts list cannot be empty")

    if not isinstance(hosts, list):
        raise ValidationError(f"Hosts must be list, got {type(hosts)}")

    validated = []
    errors = []

    for host in hosts:
        try:
            validated.append(validate_host(host, valid_hosts))
        except ValidationError as e:
            errors.append(str(e))

    if errors:
        raise ValidationError(
            "Invalid hosts in list:\n" + "\n".join(errors)
        )

    return validated


def main():
    """Test validators."""
    print("Testing parameter validators...\n")

    # Test host validation
    print("1. Testing validate_host():")
    try:
        host = validate_host("web-01", ["web-01", "web-02", "db-01"])
        print(f"  ✓ Valid host: {host}")
    except ValidationError as e:
        print(f"  ✗ Error: {e}")

    try:
        host = validate_host("invalid-host", ["web-01", "web-02"])
        print("  ✗ Should have failed!")
    except ValidationError as e:
        print(f"  ✓ Correctly rejected: {e.args[0].split(chr(10))[0]}")

    # Test group validation
    print("\n2. Testing validate_group():")
    try:
        group = validate_group("production", ["production", "development"])
        print(f"  ✓ Valid group: {group}")
    except ValidationError as e:
        print(f"  ✗ Error: {e}")

    # Test path validation
    print("\n3. Testing validate_path_exists():")
    try:
        path = validate_path_exists("/tmp", must_be_dir=True)
        print(f"  ✓ Valid path: {path}")
    except ValidationError as e:
        print(f"  ✗ Error: {e}")

    # Test timeout validation
    print("\n4. Testing validate_timeout():")
    try:
        timeout = validate_timeout(10)
        print(f"  ✓ Valid timeout: {timeout}s")
    except ValidationError as e:
        print(f"  ✗ Error: {e}")

    try:
        timeout = validate_timeout(0)
        print("  ✗ Should have failed!")
    except ValidationError as e:
        print(f"  ✓ Correctly rejected: {e.args[0].split(chr(10))[0]}")

    # Test command validation
    print("\n5. Testing validate_command():")
    try:
        cmd = validate_command("ls -la")
        print(f"  ✓ Safe command: {cmd}")
    except ValidationError as e:
        print(f"  ✗ Error: {e}")

    try:
        cmd = validate_command("rm -rf /", allow_dangerous=False)
        print("  ✗ Should have failed!")
    except ValidationError as e:
        print(f"  ✓ Correctly blocked: {e.args[0].split(chr(10))[0]}")

    print("\n✅ All parameter validators tested")


if __name__ == "__main__":
    main()
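The dangerous-pattern screen in `validate_command` can be exercised on its own. This is a minimal sketch of the matching logic, using a subset of the patterns from the file above; `first_dangerous_match` is a hypothetical helper introduced for illustration:

```python
import re

# Subset of the dangerous patterns screened by validate_command above
DANGEROUS_PATTERNS = [
    (r'\brm\s+-rf\s+/', "rm -rf on root directory"),
    (r'\bmkfs\.', "filesystem formatting"),
    (r'\bdd\s+.*of=/dev/', "disk writing with dd"),
]


def first_dangerous_match(command: str):
    """Return the description of the first dangerous pattern found, else None."""
    for pattern, description in DANGEROUS_PATTERNS:
        if re.search(pattern, command, re.IGNORECASE):
            return description
    return None


print(first_dangerous_match("ls -la"))    # None
print(first_dangerous_match("rm -rf /"))  # rm -rf on root directory
```

Word boundaries (`\b`) keep the patterns from firing inside longer tokens, so e.g. `firm -rfx` passes while `rm -rf /` is caught.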
445
scripts/workflow_executor.py
Normal file
@@ -0,0 +1,445 @@
#!/usr/bin/env python3
"""
Workflow executor for Tailscale SSH Sync Agent.
Automates common multi-machine workflows.
"""

import sys
from pathlib import Path
from typing import Dict, List, Optional
import time
import logging

# Add utils to path
sys.path.insert(0, str(Path(__file__).parent))

from utils.helpers import format_duration, get_timestamp
from sshsync_wrapper import execute_on_group, execute_on_host, push_to_hosts
from load_balancer import get_group_capacity

logger = logging.getLogger(__name__)


def deploy_workflow(code_path: str,
                    staging_group: str,
                    prod_group: str,
                    run_tests: bool = True) -> Dict:
    """
    Full deployment pipeline: staging → test → production.

    Args:
        code_path: Path to code to deploy
        staging_group: Staging server group
        prod_group: Production server group
        run_tests: Whether to run tests on staging

    Returns:
        Dict with deployment results

    Example:
        >>> result = deploy_workflow("./dist", "staging", "production")
        >>> result['success']
        True
        >>> result['duration']
        "12m 45s"
    """
    start_time = time.time()
    results = {
        'stages': {},
        'success': False,
        'start_time': get_timestamp()
    }

    try:
        # Stage 1: Deploy to staging
        logger.info("Stage 1: Deploying to staging...")
        stage1 = push_to_hosts(
            local_path=code_path,
            remote_path="/var/www/app",
            group=staging_group,
            recurse=True
        )

        results['stages']['staging_deploy'] = stage1

        if not stage1.get('success'):
            results['error'] = 'Staging deployment failed'
            return results

        # Build on staging
        logger.info("Building on staging...")
        build_result = execute_on_group(
            staging_group,
            "cd /var/www/app && npm run build",
            timeout=300
        )

        results['stages']['staging_build'] = build_result

        if not build_result.get('success'):
            results['error'] = 'Staging build failed'
            return results

        # Stage 2: Run tests (if enabled)
        if run_tests:
            logger.info("Stage 2: Running tests...")
            test_result = execute_on_group(
                staging_group,
                "cd /var/www/app && npm test",
                timeout=600
            )

            results['stages']['tests'] = test_result

            if not test_result.get('success'):
                results['error'] = 'Tests failed on staging'
                return results

        # Stage 3: Validation
        logger.info("Stage 3: Validating staging...")
        health_result = execute_on_group(
            staging_group,
            "curl -f http://localhost:3000/health || echo 'Health check failed'",
            timeout=10
        )

        results['stages']['staging_validation'] = health_result

        # Stage 4: Deploy to production
        logger.info("Stage 4: Deploying to production...")
        prod_deploy = push_to_hosts(
            local_path=code_path,
            remote_path="/var/www/app",
            group=prod_group,
            recurse=True
        )

        results['stages']['production_deploy'] = prod_deploy

        if not prod_deploy.get('success'):
            results['error'] = 'Production deployment failed'
            return results

        # Build and restart on production
        logger.info("Building and restarting production...")
        prod_build = execute_on_group(
            prod_group,
            "cd /var/www/app && npm run build && pm2 restart app",
            timeout=300
        )

        results['stages']['production_build'] = prod_build

        # Stage 5: Production verification
        logger.info("Stage 5: Verifying production...")
        prod_health = execute_on_group(
            prod_group,
            "curl -f http://localhost:3000/health",
            timeout=15
        )

        results['stages']['production_verification'] = prod_health

        # Success!
        results['success'] = True
        results['duration'] = format_duration(time.time() - start_time)

        return results

    except Exception as e:
        logger.error(f"Deployment workflow error: {e}")
        results['error'] = str(e)
        results['duration'] = format_duration(time.time() - start_time)
        return results


def backup_workflow(hosts: List[str],
                    backup_paths: List[str],
                    destination: str) -> Dict:
    """
    Back up files from multiple hosts.

    Args:
        hosts: List of hosts to back up from
        backup_paths: Paths to back up on each host
        destination: Local destination directory

    Returns:
        Dict with backup results

    Example:
        >>> result = backup_workflow(
        ...     ["db-01", "db-02"],
        ...     ["/var/lib/mysql"],
        ...     "./backups"
        ... )
        >>> result['backed_up_hosts']
        2
    """
    from sshsync_wrapper import pull_from_host

    start_time = time.time()
    results = {
        'hosts': {},
        'success': True,
        'backed_up_hosts': 0
    }

    for host in hosts:
        host_results = []

        for backup_path in backup_paths:
            # Create timestamped backup directory
            timestamp = time.strftime("%Y%m%d_%H%M%S")
            host_dest = f"{destination}/{host}_{timestamp}"

            result = pull_from_host(
                host=host,
                remote_path=backup_path,
                local_path=host_dest,
                recurse=True
            )

            host_results.append(result)

            if not result.get('success'):
                results['success'] = False

        results['hosts'][host] = host_results

        if all(r.get('success') for r in host_results):
            results['backed_up_hosts'] += 1

    results['duration'] = format_duration(time.time() - start_time)

    return results


def sync_workflow(source_host: str,
                  target_group: str,
                  paths: List[str]) -> Dict:
    """
    Sync files from one host to many.

    Args:
        source_host: Host to pull from
        target_group: Group to push to
        paths: Paths to sync

    Returns:
        Dict with sync results

    Example:
        >>> result = sync_workflow(
        ...     "master-db",
        ...     "replica-dbs",
        ...     ["/var/lib/mysql/config"]
        ... )
        >>> result['success']
        True
    """
    from sshsync_wrapper import pull_from_host, push_to_hosts
    import tempfile

    start_time = time.time()
    results = {'paths': {}, 'success': True}

    # Create temp directory
    with tempfile.TemporaryDirectory() as temp_dir:
        for path in paths:
            # Pull from source
            pull_result = pull_from_host(
                host=source_host,
                remote_path=path,
                local_path=f"{temp_dir}/{Path(path).name}",
                recurse=True
            )

            if not pull_result.get('success'):
                results['paths'][path] = {
                    'success': False,
                    'error': 'Pull from source failed'
                }
                results['success'] = False
                continue

            # Push to targets
            push_result = push_to_hosts(
                local_path=f"{temp_dir}/{Path(path).name}",
                remote_path=path,
                group=target_group,
                recurse=True
            )

            results['paths'][path] = {
                'pull': pull_result,
                'push': push_result,
                'success': push_result.get('success', False)
            }

            if not push_result.get('success'):
                results['success'] = False

    results['duration'] = format_duration(time.time() - start_time)

    return results


def rolling_restart(group: str,
                    service_name: str,
                    wait_between: int = 30) -> Dict:
    """
    Zero-downtime rolling restart of a service across a group.

    Args:
        group: Group to restart
        service_name: Service name (e.g., "nginx", "app")
        wait_between: Seconds to wait between restarts

    Returns:
        Dict with restart results

    Example:
        >>> result = rolling_restart("web-servers", "nginx")
        >>> result['restarted_count']
        3
    """
    from utils.helpers import parse_sshsync_config

    start_time = time.time()
    groups_config = parse_sshsync_config()
    hosts = groups_config.get(group, [])

    if not hosts:
        return {
            'success': False,
            'error': f'Group {group} not found or empty'
        }

    results = {
        'hosts': {},
        'restarted_count': 0,
        'failed_count': 0,
        'success': True
    }

    for host in hosts:
        logger.info(f"Restarting {service_name} on {host}...")

        # Restart service
        restart_result = execute_on_host(
            host,
            f"sudo systemctl restart {service_name} || sudo service {service_name} restart",
            timeout=30
        )

        # Health check
        time.sleep(5)  # Wait for service to start

        health_result = execute_on_host(
            host,
            f"sudo systemctl is-active {service_name} || sudo service {service_name} status",
            timeout=10
        )

        success = restart_result.get('success') and health_result.get('success')

        results['hosts'][host] = {
            'restart': restart_result,
            'health': health_result,
            'success': success
        }

        if success:
            results['restarted_count'] += 1
            logger.info(f"✓ {host} restarted successfully")
        else:
            results['failed_count'] += 1
            results['success'] = False
            logger.error(f"✗ {host} restart failed")

        # Wait before next restart (except after the last host)
        if host != hosts[-1]:
            time.sleep(wait_between)

    results['duration'] = format_duration(time.time() - start_time)

    return results


def health_check_workflow(group: str,
                          endpoint: str = "/health",
                          timeout: int = 10) -> Dict:
    """
    Check a health endpoint across a group.

    Args:
        group: Group to check
        endpoint: Health endpoint path
        timeout: Request timeout

    Returns:
        Dict with health check results

    Example:
        >>> result = health_check_workflow("production", "/health")
        >>> result['healthy_count']
        3
    """
    from utils.helpers import parse_sshsync_config

    groups_config = parse_sshsync_config()
    hosts = groups_config.get(group, [])

    if not hosts:
        return {
            'success': False,
            'error': f'Group {group} not found or empty'
        }

    results = {
        'hosts': {},
        'healthy_count': 0,
        'unhealthy_count': 0
    }

    for host in hosts:
        health_result = execute_on_host(
            host,
            f"curl -f -s -o /dev/null -w '%{{http_code}}' http://localhost:3000{endpoint}",
            timeout=timeout
        )

        is_healthy = (
            health_result.get('success') and
            '200' in health_result.get('stdout', '')
        )

        results['hosts'][host] = {
            'healthy': is_healthy,
            'response': health_result.get('stdout', '').strip()
        }

        if is_healthy:
            results['healthy_count'] += 1
        else:
            results['unhealthy_count'] += 1

    results['success'] = results['unhealthy_count'] == 0

    return results


def main():
    """Test workflow executor functions."""
    print("Testing workflow executor...\n")

    print("Note: Workflow executor requires configured hosts and groups.")
    print("Tests would execute real operations, so showing dry-run simulations.\n")

    print("✅ Workflow executor ready")


if __name__ == "__main__":
    main()
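The per-host loop in `rolling_restart` (restart, brief settle, health probe, pause before the next host) can be sketched against a stubbed executor; `rolling_restart_sketch` and `execute` are illustrative stand-ins for the real function and for `execute_on_host`, with the sleeps omitted:

```python
from typing import Callable, Dict, List


def rolling_restart_sketch(hosts: List[str],
                           execute: Callable[[str, str], bool],
                           service: str = "nginx") -> Dict[str, bool]:
    """Restart `service` host by host, recording per-host success.

    `execute(host, cmd)` stands in for execute_on_host(); a real run would
    also sleep after each restart and between hosts.
    """
    results = {}
    for host in hosts:
        restarted = execute(host, f"sudo systemctl restart {service}")
        healthy = execute(host, f"sudo systemctl is-active {service}")
        results[host] = restarted and healthy
    return results


# Stub executor: pretend web-02's health check fails
def fake_execute(host: str, cmd: str) -> bool:
    return not (host == "web-02" and "is-active" in cmd)


print(rolling_restart_sketch(["web-01", "web-02"], fake_execute))
# {'web-01': True, 'web-02': False}
```

Restarting one host at a time and gating on the health probe is what keeps the restart zero-downtime: the rest of the group keeps serving while each host cycles.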
180
tests/test_helpers.py
Normal file
@@ -0,0 +1,180 @@
#!/usr/bin/env python3
"""
Tests for helper utilities.
"""

import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).parent.parent / 'scripts'))

from utils.helpers import *


def test_format_bytes():
    """Test byte formatting."""
    assert format_bytes(0) == "0.0 B"
    assert format_bytes(512) == "512.0 B"
    assert format_bytes(1024) == "1.0 KB"
    assert format_bytes(1048576) == "1.0 MB"
    assert format_bytes(1073741824) == "1.0 GB"
    print("✓ format_bytes() passed")
    return True


def test_format_duration():
    """Test duration formatting."""
    assert format_duration(30) == "30s"
    assert format_duration(65) == "1m 5s"
    assert format_duration(3600) == "1h"
    assert format_duration(3665) == "1h 1m"
    assert format_duration(7265) == "2h 1m"
    print("✓ format_duration() passed")
    return True


def test_format_percentage():
    """Test percentage formatting."""
    assert format_percentage(45.567) == "45.6%"
    assert format_percentage(100) == "100.0%"
    assert format_percentage(0.123, decimals=2) == "0.12%"
    print("✓ format_percentage() passed")
    return True


def test_calculate_load_score():
    """Test load score calculation."""
    score = calculate_load_score(50, 50, 50)
    assert 0 <= score <= 1
    assert abs(score - 0.5) < 0.01

    score_low = calculate_load_score(20, 30, 25)
    score_high = calculate_load_score(80, 85, 90)
    assert score_low < score_high

    print("✓ calculate_load_score() passed")
    return True


def test_classify_load_status():
    """Test load status classification."""
    assert classify_load_status(0.2) == "low"
    assert classify_load_status(0.5) == "moderate"
    assert classify_load_status(0.8) == "high"
    print("✓ classify_load_status() passed")
    return True


def test_classify_latency():
    """Test latency classification."""
    status, desc = classify_latency(25)
    assert status == "excellent"
    assert "interactive" in desc.lower()

    status, desc = classify_latency(150)
    assert status == "fair"

    print("✓ classify_latency() passed")
    return True


def test_parse_disk_usage():
    """Test disk usage parsing."""
    sample_output = """Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       100G   45G   50G  45% /"""

    result = parse_disk_usage(sample_output)
    assert result['filesystem'] == '/dev/sda1'
    assert result['size'] == '100G'
    assert result['used'] == '45G'
    assert result['use_pct'] == 45

    print("✓ parse_disk_usage() passed")
    return True


def test_parse_cpu_load():
    """Test CPU load parsing."""
    sample_output = "19:43:41 up 5 days, 2:15, 3 users, load average: 0.45, 0.38, 0.32"

    result = parse_cpu_load(sample_output)
    assert result['load_1min'] == 0.45
    assert result['load_5min'] == 0.38
    assert result['load_15min'] == 0.32

    print("✓ parse_cpu_load() passed")
    return True


def test_get_timestamp():
    """Test timestamp generation."""
    ts_iso = get_timestamp(iso=True)
    assert 'T' in ts_iso
    assert 'Z' in ts_iso

    ts_human = get_timestamp(iso=False)
    assert ' ' in ts_human
    assert len(ts_human) == 19  # YYYY-MM-DD HH:MM:SS

    print("✓ get_timestamp() passed")
    return True


def test_validate_path():
    """Test path validation."""
    assert validate_path("/tmp", must_exist=True) == True
    assert validate_path("/nonexistent_path_12345", must_exist=False) == False

    print("✓ validate_path() passed")
    return True


def test_safe_execute():
    """Test safe execution wrapper."""
    # Should return result on success
    result = safe_execute(int, "42")
    assert result == 42

    # Should return default on failure
    result = safe_execute(int, "not_a_number", default=0)
    assert result == 0

    print("✓ safe_execute() passed")
    return True


def main():
    """Run all helper tests."""
    print("=" * 70)
    print("HELPER TESTS")
    print("=" * 70)

    tests = [
        test_format_bytes,
        test_format_duration,
        test_format_percentage,
        test_calculate_load_score,
        test_classify_load_status,
        test_classify_latency,
        test_parse_disk_usage,
        test_parse_cpu_load,
        test_get_timestamp,
        test_validate_path,
        test_safe_execute,
    ]

    passed = 0
    for test in tests:
        try:
            if test():
                passed += 1
        except Exception as e:
            print(f"✗ {test.__name__} failed: {e}")

    print(f"\nResults: {passed}/{len(tests)} passed")
    return passed == len(tests)


if __name__ == "__main__":
    success = main()
    sys.exit(0 if success else 1)
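The formatting tests above pin down exact output strings. A minimal implementation consistent with those assertions (the real `utils.helpers` may differ internally) could look like:

```python
def format_bytes(n: float) -> str:
    """Format a byte count with one decimal place, scaling by 1024."""
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if n < 1024 or unit == "TB":
            return f"{n:.1f} {unit}"
        n /= 1024


def format_duration(seconds: float) -> str:
    """Format seconds as 30s / 1m 5s / 1h 1m, dropping zero-valued parts."""
    seconds = int(seconds)
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    if h:
        return f"{h}h {m}m" if m else f"{h}h"
    if m:
        return f"{m}m {s}s" if s else f"{m}m"
    return f"{s}s"


print(format_bytes(1048576))  # 1.0 MB
print(format_duration(3665))  # 1h 1m
```

Note how `format_duration(3600)` must yield `"1h"` rather than `"1h 0m"`: the tests require zero-valued trailing parts to be dropped, which is why each branch checks the smaller unit before including it.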
346
tests/test_integration.py
Normal file
@@ -0,0 +1,346 @@
#!/usr/bin/env python3
"""
Integration tests for Tailscale SSH Sync Agent.

Tests complete workflows from query to result.
"""

import sys
from pathlib import Path

# Add scripts to path
sys.path.insert(0, str(Path(__file__).parent.parent / 'scripts'))

from sshsync_wrapper import get_host_status, list_hosts, get_groups
from tailscale_manager import get_tailscale_status, get_network_summary
from load_balancer import format_load_report, MachineMetrics
from utils.helpers import (
    format_bytes, format_duration, format_percentage,
    calculate_load_score, classify_load_status, classify_latency
)


def test_host_status_basic():
    """Test get_host_status() without errors."""
    print("\n✓ Testing get_host_status()...")

    try:
        result = get_host_status()

        # Validations
        assert 'hosts' in result, "Missing 'hosts' in result"
        assert isinstance(result.get('hosts', []), list), "'hosts' must be list"

        # Should have basic counts even if no hosts configured
        assert 'total_count' in result, "Missing 'total_count'"
        assert 'online_count' in result, "Missing 'online_count'"
        assert 'offline_count' in result, "Missing 'offline_count'"

        print(f"  ✓ Found {result.get('total_count', 0)} hosts")
        print(f"  ✓ Online: {result.get('online_count', 0)}")
        print(f"  ✓ Offline: {result.get('offline_count', 0)}")

        return True

    except Exception as e:
        print(f"  ✗ FAILED: {e}")
        import traceback
        traceback.print_exc()
        return False


def test_list_hosts():
    """Test list_hosts() function."""
    print("\n✓ Testing list_hosts()...")

    try:
        result = list_hosts(with_status=False)

        assert 'hosts' in result, "Missing 'hosts' in result"
        assert 'count' in result, "Missing 'count' in result"
        assert isinstance(result['hosts'], list), "'hosts' must be list"

        print("  ✓ List hosts working")
        print(f"  ✓ Found {result['count']} configured hosts")

        return True

    except Exception as e:
        print(f"  ✗ FAILED: {e}")
        return False


def test_get_groups():
    """Test get_groups() function."""
    print("\n✓ Testing get_groups()...")

    try:
        groups = get_groups()

        assert isinstance(groups, dict), "Groups must be dict"

        print("  ✓ Groups config loaded")
        print(f"  ✓ Found {len(groups)} groups")

        for group, hosts in list(groups.items())[:3]:  # Show first 3
            print(f"    - {group}: {len(hosts)} hosts")

        return True

    except Exception as e:
        print(f"  ✗ FAILED: {e}")
        return False


def test_tailscale_status():
    """Test Tailscale status check."""
    print("\n✓ Testing get_tailscale_status()...")

    try:
        status = get_tailscale_status()

        assert isinstance(status, dict), "Status must be dict"
        assert 'connected' in status, "Missing 'connected' field"

        if status.get('connected'):
            print("  ✓ Tailscale connected")
            print(f"  ✓ Peers: {status.get('total_count', 0)} total, {status.get('online_count', 0)} online")
        else:
            print(f"  ℹ Tailscale not connected: {status.get('error', 'Unknown')}")
            print("    (This is OK if Tailscale is not installed/configured)")

        return True

    except Exception as e:
        print(f"  ✗ FAILED: {e}")
        return False


def test_network_summary():
    """Test network summary generation."""
    print("\n✓ Testing get_network_summary()...")

    try:
        summary = get_network_summary()

        assert isinstance(summary, str), "Summary must be string"
        assert len(summary) > 0, "Summary cannot be empty"

        print("  ✓ Network summary generated:")
        for line in summary.split('\n'):
            print(f"    {line}")

        return True

    except Exception as e:
        print(f"  ✗ FAILED: {e}")
        return False


def test_format_helpers():
    """Test formatting helper functions."""
    print("\n✓ Testing format helpers...")

    try:
        # Test format_bytes
        assert format_bytes(1024) == "1.0 KB", "format_bytes failed for 1024"
        assert format_bytes(12582912) == "12.0 MB", "format_bytes failed for 12MB"

        # Test format_duration
        assert format_duration(65) == "1m 5s", "format_duration failed for 65s"
        assert format_duration(3665) == "1h 1m", "format_duration failed for 1h+"

        # Test format_percentage
        assert format_percentage(45.567) == "45.6%", "format_percentage failed"

        print(f"  ✓ format_bytes(12582912) = {format_bytes(12582912)}")
        print(f"  ✓ format_duration(3665) = {format_duration(3665)}")
        print(f"  ✓ format_percentage(45.567) = {format_percentage(45.567)}")

        return True

    except Exception as e:
        print(f"  ✗ FAILED: {e}")
        return False


def test_load_score_calculation():
    """Test load score calculation."""
    print("\n✓ Testing calculate_load_score()...")

    try:
        # Test various scenarios
        score1 = calculate_load_score(45, 60, 40)
        assert 0 <= score1 <= 1, "Score must be 0-1"
        assert abs(score1 - 0.49) < 0.01, f"Expected ~0.49, got {score1}"

        score2 = calculate_load_score(20, 35, 30)
        assert score2 < score1, "Lower usage should have lower score"

        score3 = calculate_load_score(85, 70, 65)
        assert score3 > score1, "Higher usage should have higher score"

        print(f"  ✓ Low load (20%, 35%, 30%): {score2:.2f}")
        print(f"  ✓ Med load (45%, 60%, 40%): {score1:.2f}")
        print(f"  ✓ High load (85%, 70%, 65%): {score3:.2f}")

        return True

    except Exception as e:
        print(f"  ✗ FAILED: {e}")
        return False


def test_load_classification():
    """Test load status classification."""
    print("\n✓ Testing classify_load_status()...")

    try:
        assert classify_load_status(0.28) == "low", "0.28 should be 'low'"
        assert classify_load_status(0.55) == "moderate", "0.55 should be 'moderate'"
        assert classify_load_status(0.82) == "high", "0.82 should be 'high'"

        print(f"  ✓ Score 0.28 = {classify_load_status(0.28)}")
        print(f"  ✓ Score 0.55 = {classify_load_status(0.55)}")
        print(f"  ✓ Score 0.82 = {classify_load_status(0.82)}")

        return True

    except Exception as e:
        print(f"  ✗ FAILED: {e}")
        return False


def test_latency_classification():
    """Test network latency classification."""
    print("\n✓ Testing classify_latency()...")

    try:
        status1, desc1 = classify_latency(25)
        assert status1 == "excellent", "25ms should be 'excellent'"

        status2, desc2 = classify_latency(75)
        assert status2 == "good", "75ms should be 'good'"

        status3, desc3 = classify_latency(150)
        assert status3 == "fair", "150ms should be 'fair'"

        status4, desc4 = classify_latency(250)
        assert status4 == "poor", "250ms should be 'poor'"

        print(f"  ✓ 25ms: {status1} - {desc1}")
        print(f"  ✓ 75ms: {status2} - {desc2}")
        print(f"  ✓ 150ms: {status3} - {desc3}")
        print(f"  ✓ 250ms: {status4} - {desc4}")

        return True

    except Exception as e:
        print(f"  ✗ FAILED: {e}")
        return False


def test_load_report_formatting():
    """Test load report formatting."""
    print("\n✓ Testing format_load_report()...")

    try:
        metrics = MachineMetrics(
            host='web-01',
            cpu_pct=45.0,
            mem_pct=60.0,
            disk_pct=40.0,
            load_score=0.49,
            status='moderate'
        )

        report = format_load_report(metrics)

        assert 'web-01' in report, "Report must include hostname"
        assert '0.49' in report, "Report must include load score"
        assert 'moderate' in report, "Report must include status"

        print("  ✓ Report generated:")
        for line in report.split('\n'):
            print(f"    {line}")

        return True

    except Exception as e:
        print(f"  ✗ FAILED: {e}")
        return False


def test_dry_run_execution():
    """Test dry-run mode for operations."""
    print("\n✓ Testing dry-run execution...")

    try:
        from sshsync_wrapper import execute_on_all

        result = execute_on_all("uptime", dry_run=True)

        assert result.get('dry_run') == True, "Must indicate dry-run mode"
        assert 'command' in result, "Must include command"
        assert 'message' in result, "Must include message"

        print("  ✓ Dry-run mode working")
        print(f"  ✓ Command: {result.get('command')}")
        print(f"  ✓ Message: {result.get('message')}")

        return True

    except Exception as e:
        print(f"  ✗ FAILED: {e}")
        return False


def main():
    """Run all integration tests."""
    print("=" * 70)
    print("INTEGRATION TESTS - Tailscale SSH Sync Agent")
    print("=" * 70)

    tests = [
        ("Host status check", test_host_status_basic),
        ("List hosts", test_list_hosts),
        ("Get groups", test_get_groups),
        ("Tailscale status", test_tailscale_status),
        ("Network summary", test_network_summary),
        ("Format helpers", test_format_helpers),
        ("Load score calculation", test_load_score_calculation),
        ("Load classification", test_load_classification),
        ("Latency classification", test_latency_classification),
        ("Load report formatting", test_load_report_formatting),
        ("Dry-run execution", test_dry_run_execution),
    ]

    results = []
    for test_name, test_func in tests:
        passed = test_func()
        results.append((test_name, passed))

    # Summary
    print("\n" + "=" * 70)
    print("SUMMARY")
    print("=" * 70)

    for test_name, passed in results:
        status = "✅ PASS" if passed else "❌ FAIL"
        print(f"{status}: {test_name}")

    passed_count = sum(1 for _, p in results if p)
    total_count = len(results)

    print(f"\nResults: {passed_count}/{total_count} passed")

    if passed_count == total_count:
        print("\n🎉 All tests passed!")
    else:
        print(f"\n⚠️  {total_count - passed_count} test(s) failed")

    return passed_count == total_count


if __name__ == "__main__":
    success = main()
    sys.exit(0 if success else 1)
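The load-score assertions in test_integration.py pin the scoring function down fairly tightly. A minimal sketch consistent with those asserted values — the 0.5/0.3/0.2 CPU/memory/disk weights and the 0.4/0.7 classification cut-offs are assumptions inferred from the tests, not implementation details shown in this diff:

```python
# Hypothetical sketch of load_balancer's scoring helpers; weights and
# thresholds below are inferred from the test assertions, not confirmed.
def calculate_load_score(cpu_pct: float, mem_pct: float, disk_pct: float) -> float:
    """Combine resource percentages into a 0-1 load score (CPU weighted highest)."""
    return (0.5 * cpu_pct + 0.3 * mem_pct + 0.2 * disk_pct) / 100.0

def classify_load_status(score: float) -> str:
    """Bucket a 0-1 load score into 'low' / 'moderate' / 'high'."""
    if score < 0.4:
        return "low"
    if score < 0.7:
        return "moderate"
    return "high"

print(calculate_load_score(45, 60, 40))  # 0.485, within the test's ~0.49 tolerance
print(classify_load_status(0.55))        # moderate
```

With these assumed weights, (45, 60, 40) scores 0.485 and (85, 70, 65) scores 0.765, matching the ordering the tests require.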
177 tests/test_validation.py Normal file
@@ -0,0 +1,177 @@
#!/usr/bin/env python3
"""
Tests for validators.
"""

import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).parent.parent / 'scripts'))

from utils.validators import *


def test_validate_host():
    """Test host validation."""
    # Valid host
    assert validate_host("web-01") == "web-01"
    assert validate_host(" web-01 ") == "web-01"  # Strips whitespace

    # With valid list
    assert validate_host("web-01", ["web-01", "web-02"]) == "web-01"

    # Invalid format
    try:
        validate_host("web@01")  # Invalid character
        assert False, "Should have raised ValidationError"
    except ValidationError:
        pass

    print("✓ validate_host() passed")
    return True


def test_validate_group():
    """Test group validation."""
    # Valid group
    assert validate_group("production") == "production"
    assert validate_group("PRODUCTION") == "production"  # Lowercase normalization

    # With valid list
    assert validate_group("production", ["production", "staging"]) == "production"

    # Invalid
    try:
        validate_group("invalid!", ["production"])
        assert False, "Should have raised ValidationError"
    except ValidationError:
        pass

    print("✓ validate_group() passed")
    return True


def test_validate_path_exists():
    """Test path existence validation."""
    # Valid path
    path = validate_path_exists("/tmp", must_be_dir=True)
    assert isinstance(path, Path)

    # Invalid path
    try:
        validate_path_exists("/nonexistent_12345")
        assert False, "Should have raised ValidationError"
    except ValidationError:
        pass

    print("✓ validate_path_exists() passed")
    return True


def test_validate_timeout():
    """Test timeout validation."""
    # Valid timeouts
    assert validate_timeout(10) == 10
    assert validate_timeout(1) == 1
    assert validate_timeout(600) == 600

    # Too low
    try:
        validate_timeout(0)
        assert False, "Should have raised ValidationError"
    except ValidationError:
        pass

    # Too high
    try:
        validate_timeout(1000)
        assert False, "Should have raised ValidationError"
    except ValidationError:
        pass

    print("✓ validate_timeout() passed")
    return True


def test_validate_command():
    """Test command validation."""
    # Safe commands
    assert validate_command("ls -la") == "ls -la"
    assert validate_command("uptime") == "uptime"

    # Dangerous commands (should fail without allow_dangerous)
    try:
        validate_command("rm -rf /")
        assert False, "Should have blocked dangerous command"
    except ValidationError:
        pass

    # But should work with allow_dangerous
    assert validate_command("rm -rf /tmp/test", allow_dangerous=True)

    print("✓ validate_command() passed")
    return True


def test_validate_hosts_list():
    """Test list validation."""
    # Valid list
    hosts = validate_hosts_list(["web-01", "web-02"])
    assert len(hosts) == 2
    assert "web-01" in hosts

    # Empty list
    try:
        validate_hosts_list([])
        assert False, "Should have raised ValidationError for empty list"
    except ValidationError:
        pass

    print("✓ validate_hosts_list() passed")
    return True


def test_get_invalid_hosts():
    """Test finding invalid hosts."""
    # Test with mix of valid and invalid
    # (This would require actual SSH config, so we test the function exists)
    result = get_invalid_hosts(["web-01", "nonexistent-host-12345"])
    assert isinstance(result, list)

    print("✓ get_invalid_hosts() passed")
    return True


def main():
    """Run all validation tests."""
    print("=" * 70)
    print("VALIDATION TESTS")
    print("=" * 70)

    tests = [
        test_validate_host,
        test_validate_group,
        test_validate_path_exists,
        test_validate_timeout,
        test_validate_command,
        test_validate_hosts_list,
        test_get_invalid_hosts,
    ]

    passed = 0
    for test in tests:
        try:
            if test():
                passed += 1
        except Exception as e:
            print(f"✗ {test.__name__} failed: {e}")
            import traceback
            traceback.print_exc()

    print(f"\nResults: {passed}/{len(tests)} passed")
    return passed == len(tests)


if __name__ == "__main__":
    success = main()
    sys.exit(0 if success else 1)
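The timeout tests above accept 1–600 seconds and reject 0 and 1000, which implies a simple bounded-range check. A minimal sketch of such a validator, assuming those bounds — the actual limits inside `utils.validators` are not shown in this diff:

```python
class ValidationError(ValueError):
    """Raised when user-supplied input fails validation."""

# Bounds assumed from the tests: 1s and 600s pass, 0 and 1000 are rejected.
MIN_TIMEOUT_S, MAX_TIMEOUT_S = 1, 600

def validate_timeout(seconds: int) -> int:
    """Return the timeout unchanged if it falls inside the allowed window."""
    if not isinstance(seconds, int) or isinstance(seconds, bool):
        raise ValidationError(f"timeout must be an int, got {type(seconds).__name__}")
    if not MIN_TIMEOUT_S <= seconds <= MAX_TIMEOUT_S:
        raise ValidationError(
            f"timeout must be {MIN_TIMEOUT_S}-{MAX_TIMEOUT_S}s, got {seconds}"
        )
    return seconds

print(validate_timeout(10))  # 10
```

Validating and returning the input (rather than returning a bool) lets callers write `timeout = validate_timeout(user_value)` in one step, which is the pattern the tests exercise.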