Initial commit

This commit is contained in:
Zhongwei Li
2025-11-30 08:37:27 +08:00
commit 37774aa937
131 changed files with 31137 additions and 0 deletions

View File

@@ -0,0 +1,632 @@
# Operations Runbook: {{PROJECT_NAME}}
**Document Version:** 1.0
**Date:** {{DATE}}
**Status:** {{STATUS}}
<!-- SCOPE: ALL operational procedures (local development setup, Docker commands, environment variables, testing commands, build/deployment, production operations, troubleshooting, SSH access, logs, restart procedures) ONLY. -->
<!-- DO NOT add here: Architecture patterns → architecture.md, Tech stack versions → tech_stack.md, Database schema → database_schema.md, API endpoints → api_spec.md, Testing strategy → tests/README.md, Design system → design_guidelines.md, Requirements → requirements.md -->
---
## 1. Overview
### 1.1 Purpose
This runbook provides step-by-step operational procedures for {{PROJECT_NAME}} across all environments: local development, testing, and production.
### 1.2 Quick Links
- Architecture: {{ARCHITECTURE_LINK}}
- Tech Stack: {{TECH_STACK_LINK}}
- API Spec: {{API_SPEC_LINK}}
- Database Schema: {{DATABASE_SCHEMA_LINK}}
### 1.3 Key Contacts
{{KEY_CONTACTS}}
<!-- Example:
| Role | Name | Contact | Availability |
|------|------|---------|--------------|
| DevOps Lead | John Doe | john@example.com | 24/7 on-call |
| Tech Lead | Jane Smith | jane@example.com | Mon-Fri 9-6 |
| DBA | Bob Johnson | bob@example.com | Mon-Fri 9-5 |
-->
---
## 2. Prerequisites
### 2.1 Required Tools
{{REQUIRED_TOOLS}}
<!-- Example:
| Tool | Version | Installation |
|------|---------|--------------|
| Docker | 24.0+ | https://docs.docker.com/get-docker/ |
| Docker Compose | 2.20+ | Included with Docker Desktop |
| Node.js | 20.x LTS | https://nodejs.org/ (for local npm scripts) |
| Git | 2.40+ | https://git-scm.com/ |
-->
### 2.2 Access Requirements
{{ACCESS_REQUIREMENTS}}
<!-- Example:
- GitHub repository access (read for development, write for deployment)
- Production SSH keys (request from DevOps lead)
- Database credentials (stored in 1Password vault "ProjectName")
- AWS Console access (IAM role: Developer)
- VPN access for production (if required)
-->
### 2.3 Environment Variables
See [Appendix A: Environment Variables](#appendix-a-environment-variables-reference) for complete reference.
---
## 3. Local Development
### 3.1 Initial Setup
```bash
# Clone repository
git clone https://github.com/org/{{PROJECT_NAME}}.git
cd {{PROJECT_NAME}}
# Copy environment template
cp .env.example .env
# Edit .env with your credentials
# See Appendix A for required variables
# Build and start services
docker compose up -d
# Wait for services to be ready (check logs)
docker compose logs -f app
```
**Expected output:**
```
app-1 | Server started on port 3000
db-1 | database system is ready to accept connections
```
### 3.2 Docker Commands
**Start all services:**
```bash
docker compose up -d
```
**Stop all services:**
```bash
docker compose down
```
**Rebuild after code changes:**
```bash
docker compose down
docker compose build --no-cache app
docker compose up -d
```
**View logs:**
```bash
# All services
docker compose logs -f
# Specific service
docker compose logs -f app
# Last 100 lines
docker compose logs --tail 100 app
```
**Exec into running container:**
```bash
docker compose exec app sh
# or
docker compose exec app bash
```
**Restart specific service:**
```bash
docker compose restart app
```
### 3.3 Database Operations (Local)
**Run migrations:**
```bash
docker compose exec app npm run migrate
# Or using Prisma
docker compose exec app npx prisma migrate dev
```
**Seed database:**
```bash
docker compose exec app npm run seed
```
**Reset database (⚠️ DESTRUCTIVE):**
```bash
docker compose down
docker volume rm {{PROJECT_NAME}}_postgres_data
docker compose up -d
docker compose exec app npm run migrate
docker compose exec app npm run seed
```
**Database shell:**
```bash
# PostgreSQL
docker compose exec db psql -U {{DB_USER}} -d {{DB_NAME}}
# MySQL
docker compose exec db mysql -u {{DB_USER}} -p{{DB_PASSWORD}} {{DB_NAME}}
```
### 3.4 Common Development Tasks
**Install dependencies (after package.json changes):**
```bash
docker compose down
docker compose build app
docker compose up -d
```
**Run linter:**
```bash
docker compose exec app npm run lint
# Fix automatically
docker compose exec app npm run lint:fix
```
**Format code:**
```bash
docker compose exec app npm run format
```
**Check syntax (TypeScript):**
```bash
docker compose exec app npm run type-check
```
---
## 4. Testing
### 4.1 Run All Tests
```bash
# Using docker-compose.test.yml
docker compose -f docker-compose.test.yml up --abort-on-container-exit
```
### 4.2 Run Specific Test Types
**Unit tests:**
```bash
docker compose exec app npm run test:unit
# Watch mode
docker compose exec app npm run test:unit:watch
```
**Integration tests:**
```bash
docker compose exec app npm run test:integration
```
**E2E tests:**
```bash
# Start app first
docker compose up -d
# Run E2E
docker compose exec app npm run test:e2e
```
### 4.3 Test Coverage
```bash
docker compose exec app npm run test:coverage
# Open coverage report
open coverage/index.html
```
### 4.4 Debug Tests
```bash
# Run single test file
docker compose exec app npm test -- path/to/test.spec.ts
# Run with debugging
docker compose exec app node --inspect-brk=0.0.0.0:9229 node_modules/.bin/jest
```
---
## 5. Build & Deployment
### 5.1 Build for Production
```bash
# Build production image
docker build -t {{PROJECT_NAME}}:{{VERSION}} .
# Test production build locally
docker run -p 3000:3000 --env-file .env.production {{PROJECT_NAME}}:{{VERSION}}
```
### 5.2 Deployment to Production
{{DEPLOYMENT_PROCEDURE}}
<!-- Example:
**Prerequisites:**
- [ ] All tests passing (CI/CD green)
- [ ] Code reviewed and approved
- [ ] Database migrations tested in staging
- [ ] Backup created
**Deployment steps:**
```bash
# 1. SSH to production server
ssh production-server
# 2. Navigate to project directory
cd /opt/{{PROJECT_NAME}}
# 3. Pull latest code
git pull origin main
# 4. Backup database
./scripts/backup-db.sh
# 5. Stop services
docker compose down
# 6. Rebuild images
docker compose build --no-cache
# 7. Run migrations
docker compose run --rm app npm run migrate
# 8. Start services
docker compose up -d
# 9. Verify deployment
docker compose logs -f app
curl http://localhost:3000/health
```
**Rollback procedure (if deployment fails):**
```bash
# 1. Rollback code
git reset --hard HEAD~1
# 2. Restore database (if migrations ran)
./scripts/restore-db.sh {{BACKUP_FILE}}
# 3. Restart services
docker compose down && docker compose up -d
```
-->
---
## 6. Production Operations
### 6.1 SSH Access
**SSH to production server:**
```bash
ssh {{PRODUCTION_USER}}@{{PRODUCTION_HOST}}
# Or with SSH key
ssh -i ~/.ssh/{{PROJECT_NAME}}_prod.pem {{PRODUCTION_USER}}@{{PRODUCTION_HOST}}
```
**SSH via jump host (if behind VPN):**
```bash
ssh -J {{JUMP_HOST}} {{PRODUCTION_USER}}@{{PRODUCTION_HOST}}
```
### 6.2 Health Checks
**Check application status:**
```bash
# Health endpoint
curl http://localhost:3000/health
# Expected response:
# {"status": "ok", "uptime": 123456, "timestamp": "2024-01-01T00:00:00Z"}
```
**Check service status:**
```bash
docker compose ps
# Expected output:
# NAME STATUS PORTS
# app-1 Up 5 minutes 0.0.0.0:3000->3000/tcp
# db-1 Up 5 minutes 5432/tcp
# cache-1 Up 5 minutes 6379/tcp
```
**Check resource usage:**
```bash
docker stats
# Or specific container
docker stats app-1
```
### 6.3 Monitoring & Logs
**View logs:**
```bash
# Real-time logs (all services)
docker compose logs -f
# Last 500 lines from app
docker compose logs --tail 500 app
# Logs from specific time
docker compose logs --since 2024-01-01T00:00:00 app
# Save logs to file
docker compose logs --no-color app > app-logs-$(date +%Y%m%d).log
```
**Search logs:**
```bash
# Find errors
docker compose logs app | grep ERROR
# Find specific request
docker compose logs app | grep "request_id=123"
```
**Log rotation:**
{{LOG_ROTATION}}
<!-- Example: Docker logs automatically rotate at 100MB, keep last 3 files. Manual rotation: `docker compose down && docker compose up -d` -->
### 6.4 Common Maintenance Tasks
**Restart application (zero downtime):**
```bash
docker compose up -d --no-deps --force-recreate app
```
**Clear cache:**
```bash
docker compose exec cache redis-cli FLUSHALL
```
**Database backup:**
```bash
# PostgreSQL
docker compose exec db pg_dump -U {{DB_USER}} {{DB_NAME}} > backup-$(date +%Y%m%d-%H%M%S).sql
# MySQL
docker compose exec db mysqldump -u {{DB_USER}} -p{{DB_PASSWORD}} {{DB_NAME}} > backup-$(date +%Y%m%d-%H%M%S).sql
```
**Database restore:**
```bash
# PostgreSQL
cat backup-20240101-120000.sql | docker compose exec -T db psql -U {{DB_USER}} {{DB_NAME}}
# MySQL
cat backup-20240101-120000.sql | docker compose exec -T db mysql -u {{DB_USER}} -p{{DB_PASSWORD}} {{DB_NAME}}
```
**Update dependencies:**
```bash
# Update package.json
docker compose exec app npm update
# Rebuild image
docker compose down
docker compose build --no-cache app
docker compose up -d
```
---
## 7. Troubleshooting
### 7.1 Common Issues
#### Issue 1: Application won't start
**Symptoms:**
```
app-1 | Error: Cannot connect to database
app-1 | Error: ECONNREFUSED
```
**Diagnosis:**
```bash
# Check if database is running
docker compose ps db
# Check database logs
docker compose logs db
# Test database connection
docker compose exec app nc -zv db 5432
```
**Resolution:**
```bash
# Restart database
docker compose restart db
# Wait for database to be ready
docker compose logs -f db
# Look for: "database system is ready to accept connections"
# Restart app
docker compose restart app
```
---
#### Issue 2: Out of disk space
**Symptoms:**
```
Error: no space left on device
```
**Diagnosis:**
```bash
df -h
docker system df
```
**Resolution:**
```bash
# Remove unused Docker resources
docker system prune -a
# Remove specific volumes (⚠️ DESTRUCTIVE)
docker volume rm {{PROJECT_NAME}}_postgres_data
# Remove old log files
find /var/log -name "*.log" -mtime +30 -delete
```
---
#### Issue 3: {{ISSUE_3_NAME}}
{{ISSUE_3_TROUBLESHOOTING}}
<!-- Add project-specific common issues -->
---
### 7.2 Emergency Procedures
**Production outage:**
```bash
# 1. Check health status
curl http://localhost:3000/health
# 2. Check logs for errors
docker compose logs --tail 200 app | grep ERROR
# 3. Restart services
docker compose restart
# 4. If restart fails, rollback
git reset --hard HEAD~1
docker compose down && docker compose up -d
# 5. Notify team (Slack/PagerDuty)
```
**Database corruption:**
```bash
# 1. Stop application
docker compose stop app
# 2. Restore from latest backup
./scripts/restore-db.sh {{LATEST_BACKUP}}
# 3. Verify data integrity
docker compose exec db psql -U {{DB_USER}} -d {{DB_NAME}} -c "SELECT COUNT(*) FROM users;"
# 4. Restart application
docker compose start app
```
---
## 8. Appendices
### Appendix A: Environment Variables Reference
**Required variables:**
| Variable | Example | Description |
|----------|---------|-------------|
| `DATABASE_URL` | `postgresql://user:pass@db:5432/myapp` | Database connection string |
| `REDIS_URL` | `redis://cache:6379` | Cache connection string |
| `API_KEY` | `sk_live_abc123...` | External API key (e.g., Stripe) |
| `JWT_SECRET` | `random_secret_key` | JWT signing secret |
| `NODE_ENV` | `development` or `production` | Environment mode |
**Optional variables:**
| Variable | Default | Description |
|----------|---------|-------------|
| `PORT` | `3000` | Application port |
| `LOG_LEVEL` | `info` | Logging verbosity (debug/info/warn/error) |
| `RATE_LIMIT` | `100` | API rate limit (requests/minute) |
---
### Appendix B: Service Dependencies
{{SERVICE_DEPENDENCIES}}
<!-- Example:
**Internal dependencies:**
- app → db (PostgreSQL 16)
- app → cache (Redis 7)
- app → queue (RabbitMQ 3.12)
**External dependencies:**
- Stripe API (https://api.stripe.com)
- SendGrid API (https://api.sendgrid.com)
- AWS S3 (file storage)
**Health check URLs:**
- App: http://localhost:3000/health
- Database: `docker compose exec db pg_isready`
- Cache: `docker compose exec cache redis-cli ping`
-->
---
### Appendix C: Port Mapping
{{PORT_MAPPING}}
<!-- Example:
| Service | Container Port | Host Port | Description |
|---------|----------------|-----------|-------------|
| app | 3000 | 3000 | Application HTTP |
| db | 5432 | 5432 | PostgreSQL |
| cache | 6379 | 6379 | Redis |
| adminer | 8080 | 8080 | Database admin UI |
-->
---
## 9. Maintenance
**Last Updated:** {{DATE}}
**Update Triggers:**
- New deployment procedures
- Infrastructure changes (new services, ports)
- New operational commands
- Troubleshooting scenarios discovered
- Environment variable changes
- SSH access changes
**Verification:**
- [ ] All commands tested in staging
- [ ] SSH access verified
- [ ] Health check procedures validated
- [ ] Backup/restore procedures tested
- [ ] Emergency procedures reviewed
- [ ] Contact information current
---
**Version:** 1.0.0
**Template Last Updated:** 2025-11-16