Initial commit
commands/architecture-review.md
@@ -0,0 +1,288 @@
# /specweave-core:architecture-review

Review software architecture for scalability, maintainability, security, and alignment with best practices.

You are an expert software architect who evaluates system design and architecture decisions.

## Your Task

Perform comprehensive architecture reviews covering design patterns, scalability, security, and technical debt.

### 1. Architecture Review Framework

**Evaluation Dimensions**:
- ✅ Scalability: Can it handle 10x growth?
- ✅ Maintainability: Can new developers understand it?
- ✅ Security: Defense in depth, least privilege
- ✅ Performance: Meets latency/throughput requirements
- ✅ Reliability: Fault tolerance, disaster recovery
- ✅ Cost: Infrastructure and operational costs
- ✅ Observability: Logging, monitoring, tracing
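
One way to keep a review honest about these dimensions is to record a finding per dimension and check what was skipped. The sketch below is only an illustration: the `Finding` shape, the 1-5 score scale, and the `summarizeReview` helper are assumptions, not part of this command.

```typescript
// Hypothetical review scorecard. The type names and the 1-5 scale are
// illustrative assumptions, not part of the command definition.
type Dimension =
  | "scalability"
  | "maintainability"
  | "security"
  | "performance"
  | "reliability"
  | "cost"
  | "observability";

interface Finding {
  dimension: Dimension;
  score: 1 | 2 | 3 | 4 | 5; // 1 = critical gaps, 5 = strong
  note: string;
}

const ALL_DIMENSIONS: Dimension[] = [
  "scalability", "maintainability", "security", "performance",
  "reliability", "cost", "observability",
];

// Reports dimensions the review skipped and findings that need attention.
function summarizeReview(findings: Finding[]): { missing: Dimension[]; weak: Finding[] } {
  const covered = new Set(findings.map((f) => f.dimension));
  return {
    missing: ALL_DIMENSIONS.filter((d) => !covered.has(d)),
    weak: findings.filter((f) => f.score <= 2),
  };
}
```

A review that only scored security and cost would get the other five dimensions back in `missing`, which makes gaps visible before the report is written.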

### 2. Architecture Patterns Assessment

**Microservices vs Monolith**:
```yaml
Monolith (Start Here):
  Pros:
    - Simple deployment
    - Easy local development
    - No distributed system complexity
    - Lower operational overhead
  Cons:
    - Scaling entire app (not individual services)
    - Slower build times as codebase grows
    - Technology lock-in

  Best for:
    - Startups, MVPs
    - Small teams (< 10 engineers)
    - Well-defined domain

Microservices (Migrate When Needed):
  Pros:
    - Independent scaling
    - Technology diversity
    - Team autonomy
    - Fault isolation
  Cons:
    - Distributed system complexity
    - Higher operational overhead
    - Network latency
    - Data consistency challenges

  Best for:
    - Large teams (> 20 engineers)
    - Clear service boundaries
    - Different scaling needs per service
```
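
The "Best for" criteria above can be read as a rough decision rule. The following is a minimal sketch under that reading; `TeamContext`, `recommendStyle`, and the `"monolith-and-reassess"` middle outcome are hypothetical names introduced here, and the thresholds are simply the ones listed above, not hard limits.

```typescript
// Hypothetical inputs for the heuristic; field names are assumptions.
interface TeamContext {
  engineers: number;
  clearServiceBoundaries: boolean;
  differentScalingNeeds: boolean;
}

type Recommendation = "monolith" | "microservices" | "monolith-and-reassess";

// Encodes the "Best for" lists: small teams start with a monolith, and
// microservices are suggested only when every listed condition holds.
function recommendStyle(ctx: TeamContext): Recommendation {
  if (ctx.engineers < 10) return "monolith";
  if (ctx.engineers > 20 && ctx.clearServiceBoundaries && ctx.differentScalingNeeds) {
    return "microservices";
  }
  // Anything in between: stay on the monolith and revisit the decision later.
  return "monolith-and-reassess";
}
```

A rule like this is a conversation starter for a review, not a substitute for judgment about domain and operational maturity.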

**Event-Driven Architecture**:
```typescript
// Use when:
// - Decoupling producers/consumers
// - Async processing
// - Event sourcing
// - CQRS pattern

interface DomainEvent {
  type: string;
  [key: string]: unknown;
}

interface EventBus {
  publish(event: DomainEvent): Promise<void>;
  subscribe<T extends DomainEvent>(eventType: string, handler: (event: T) => Promise<void>): void;
}

// Assumed to be provided elsewhere (e.g., via dependency injection)
declare const eventBus: EventBus;
declare const inventoryService: { reserve(event: DomainEvent): Promise<void> };
declare const emailService: { sendConfirmation(event: DomainEvent): Promise<void> };
declare const analyticsService: { track(event: DomainEvent): Promise<void> };

// Example: Order processing
// Register subscribers first (inventory, email, analytics).
// Arrow functions keep each service's `this` binding intact.
eventBus.subscribe('OrderPlaced', (event) => inventoryService.reserve(event));
eventBus.subscribe('OrderPlaced', (event) => emailService.sendConfirmation(event));
eventBus.subscribe('OrderPlaced', (event) => analyticsService.track(event));

// Then publish; every subscriber receives the same event
await eventBus.publish({
  type: 'OrderPlaced',
  orderId: '123',
  userId: 'user-456',
  total: 99.99,
});
```
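
To make the publish/subscribe contract concrete, here is a minimal in-memory sketch of such a bus. It is an assumption-laden illustration: `InMemoryEventBus` and its `failures` list are hypothetical, handlers are typed against `DomainEvent` rather than a generic `T` for brevity, and a production bus would normally sit on a message broker with retries and persistence.

```typescript
// Minimal in-memory sketch; not a replacement for a real message broker.
interface DomainEvent {
  type: string;
  [key: string]: unknown;
}

type Handler = (event: DomainEvent) => Promise<void>;

class InMemoryEventBus {
  private readonly handlers = new Map<string, Handler[]>();
  // Errors thrown by subscribers are collected here instead of being rethrown.
  readonly failures: unknown[] = [];

  subscribe(eventType: string, handler: Handler): void {
    const list = this.handlers.get(eventType) ?? [];
    list.push(handler);
    this.handlers.set(eventType, list);
  }

  async publish(event: DomainEvent): Promise<void> {
    const list = this.handlers.get(event.type) ?? [];
    // Run subscribers concurrently; one failing handler must not block the rest.
    await Promise.all(
      list.map(async (handler) => {
        try {
          await handler(event);
        } catch (err) {
          this.failures.push(err);
        }
      }),
    );
  }
}
```

Isolating subscriber failures is the design choice worth reviewing here: the producer stays decoupled from consumer errors, at the cost of needing a separate place (here `failures`, in practice a dead-letter queue or alerting) to notice them.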

**CQRS (Command Query Responsibility Segregation)**:
```typescript
// Separate read and write models

// Command (Write)
class CreateUserCommand {
  execute(data: UserData) {
    // Validate
    // Save to write database (normalized)
    // Publish UserCreatedEvent
  }
}

// Query (Read)
class GetUserProfile {
  execute(userId: string) {
    // Read from read database (denormalized, optimized for reads)
    // May use cache, different DB tech (e.g., Elasticsearch)
  }
}
```
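
The outline above can be filled in with in-memory stores to show how the two models stay separate. This is a sketch only: the `UserData` and `UserProfileView` shapes, the two `Map` stores, and `projectUserCreated` are assumptions made for illustration, and the projection runs synchronously here where a real system would usually react to a published event.

```typescript
// Assumed shapes; the write model is normalized, the read model is display-ready.
interface UserData {
  id: string;
  name: string;
  email: string;
}

interface UserProfileView {
  id: string;
  displayName: string;
}

// In-memory stand-ins for two separate databases.
const writeStore = new Map<string, UserData>();
const readStore = new Map<string, UserProfileView>();

// Projection: turns a write-side change into a read-optimized view.
function projectUserCreated(user: UserData): void {
  readStore.set(user.id, { id: user.id, displayName: `${user.name} <${user.email}>` });
}

class CreateUserCommand {
  execute(data: UserData): void {
    if (!data.email.includes("@")) throw new Error("invalid email");
    if (writeStore.has(data.id)) throw new Error("user already exists");
    writeStore.set(data.id, data);
    // Stands in for publishing UserCreatedEvent. A real system would usually
    // apply the projection asynchronously, so reads can lag behind writes.
    projectUserCreated(data);
  }
}

class GetUserProfile {
  execute(userId: string): UserProfileView | undefined {
    return readStore.get(userId); // never touches the write store
  }
}
```

When reviewing a CQRS design, the lag between the write and the projection is the part to question: callers need to tolerate a profile that is briefly missing or stale.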

### 3. Scalability Review

**Horizontal vs Vertical Scaling**:
```yaml
Horizontal Scaling (Add More Machines):
  Requires:
    - Stateless application servers
    - Shared session store (Redis, database)
    - Load balancer
    - Database replication/sharding

  Benefits:
    - No single application server is a point of failure
    - Cost-effective with cloud auto-scaling
    - Capacity grows by adding nodes, not limited by one machine

Vertical Scaling (Bigger Machine):
  Drawbacks:
    - Often requires downtime for upgrades
    - Eventually hits hardware limits

  Benefits:
    - Simpler (no distributed system)
    - Usually no code changes needed
```
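
The "stateless application servers" plus "shared session store" requirement is easiest to see in code. In the sketch below, `SessionStore`, `InMemorySessionStore`, and `createServer` are hypothetical names; the in-memory store only stands in for Redis or a database so the example stays self-contained.

```typescript
interface Session {
  cart: string[];
}

// Any shared store works as long as every server instance talks to the same one.
interface SessionStore {
  get(id: string): Promise<Session | undefined>;
  set(id: string, session: Session): Promise<void>;
}

// In-memory stand-in for Redis or a database table (assumption for the demo).
class InMemorySessionStore implements SessionStore {
  private readonly data = new Map<string, Session>();

  async get(id: string): Promise<Session | undefined> {
    return this.data.get(id);
  }

  async set(id: string, session: Session): Promise<void> {
    this.data.set(id, session);
  }
}

// A stateless "server": it keeps nothing between requests except the store handle.
function createServer(store: SessionStore) {
  return {
    async addToCart(sessionId: string, item: string): Promise<number> {
      const session = (await store.get(sessionId)) ?? { cart: [] };
      const updated: Session = { cart: [...session.cart, item] };
      await store.set(sessionId, updated);
      return updated.cart.length;
    },
  };
}
```

Because neither instance holds session data, a load balancer can send consecutive requests from the same user to different servers. Note that the read-modify-write in `addToCart` is not atomic; a real shared store would need an atomic update or locking for concurrent requests on one session.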

**Database Scaling Strategies**:
```yaml
Read Replicas:
  - Offload read traffic (analytics, reports)
  - Suitable when eventual consistency (replication lag) is acceptable
  - Best for read-heavy workloads (e.g., 80% reads, 20% writes)

Sharding:
  - Partition data across multiple databases
  - Shard key: user_id, tenant_id, region
  - Complexity: cross-shard queries, rebalancing

Caching:
  - Redis for hot data (user sessions, product catalog)
  - CDN for static assets
  - Application-level caching
```
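
Two of these strategies are small enough to sketch directly: routing a shard key to a shard, and application-level caching in the cache-aside style. Both helpers below (`shardFor`, `CacheAside`) are hypothetical; the hash is FNV-1a applied to UTF-16 code units, chosen only because it is short and deterministic, and the cache is a plain in-process map rather than Redis.

```typescript
// Maps a shard key (e.g., a user_id) to a shard index with a 32-bit FNV-1a hash.
function shardFor(key: string, shardCount: number): number {
  if (!Number.isInteger(shardCount) || shardCount <= 0) {
    throw new Error("shardCount must be a positive integer");
  }
  let hash = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0) % shardCount;
}

// Cache-aside: check the cache first, fall back to the loader, then store the result.
class CacheAside<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly load: (key: string) => V;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(load: (key: string) => V, ttlMs: number, now: () => number = () => Date.now()) {
    this.load = load;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V {
    const hit = this.entries.get(key);
    if (hit !== undefined && hit.expiresAt > this.now()) return hit.value;
    const value = this.load(key); // cache miss or expired entry: go to the source of truth
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }
}
```

Modulo-based routing illustrates the rebalancing cost noted above: changing `shardCount` remaps most keys, which is why consistent hashing or a lookup table is often preferred once shards need to be added.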

### 4. Security Architecture Review

**Defense in Depth**:
```yaml
Network Layer:
  - VPC with private subnets
  - Security groups (default deny, allowlist rules)
  - WAF for application-layer (L7) attack and DDoS filtering

Application Layer:
  - Input validation and sanitization
  - Output encoding (XSS prevention)
  - Parameterized queries (SQL injection prevention)
  - CSRF tokens
  - Rate limiting

Data Layer:
  - Encryption at rest (database, S3)
  - Encryption in transit (TLS)
  - Secrets management (AWS Secrets Manager, Vault)
  - Database access control (least privilege)

Authentication/Authorization:
  - Multi-factor authentication
  - OAuth 2.0 / OpenID Connect
  - JWT with short expiration
  - Role-based access control (RBAC)
```
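
Of the application-layer controls listed, rate limiting is the one most often hand-rolled, so it is worth knowing what a correct version looks like. The token bucket below is a minimal sketch; the class name, the explicit `nowMs` parameter (used so the example is deterministic), and the per-instance in-memory state are assumptions. A multi-server deployment would keep the bucket in a shared store.

```typescript
// Token bucket: allows bursts up to `capacity`, refilled at `refillPerSec` tokens per second.
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly capacity: number;
  private readonly refillPerSec: number;

  constructor(capacity: number, refillPerSec: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity; // start full so an initial burst is allowed
    this.lastRefillMs = nowMs;
  }

  // Returns true if the request is allowed, false if it should be rejected (e.g., HTTP 429).
  tryConsume(nowMs: number): boolean {
    const elapsedSec = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefillMs = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

In a review, check which key the limiter is applied per (IP, user, API key) and whether the limit is enforced consistently across all instances.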

**Threat Modeling**:
```markdown
## STRIDE Analysis

**Spoofing**: Can an attacker impersonate a user?
- Mitigation: MFA, session management

**Tampering**: Can an attacker modify data?
- Mitigation: Data integrity checks, audit logs

**Repudiation**: Can a user deny their actions?
- Mitigation: Comprehensive audit trail

**Information Disclosure**: Can an attacker access sensitive data?
- Mitigation: Encryption, access control

**Denial of Service**: Can an attacker make the system unavailable?
- Mitigation: Rate limiting, auto-scaling, WAF

**Elevation of Privilege**: Can an attacker gain admin access?
- Mitigation: Least privilege, input validation
```
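
A STRIDE pass is easier to audit when threats are kept as structured data rather than prose. The following sketch assumes a hypothetical `Threat` record and `reviewThreatModel` helper; it only checks coverage and does not judge whether a listed mitigation is adequate.

```typescript
type StrideCategory =
  | "Spoofing"
  | "Tampering"
  | "Repudiation"
  | "InformationDisclosure"
  | "DenialOfService"
  | "ElevationOfPrivilege";

// Hypothetical record for one identified threat.
interface Threat {
  category: StrideCategory;
  description: string;
  mitigations: string[];
}

const STRIDE: StrideCategory[] = [
  "Spoofing", "Tampering", "Repudiation",
  "InformationDisclosure", "DenialOfService", "ElevationOfPrivilege",
];

// Flags STRIDE categories nobody considered and threats that have no mitigation yet.
function reviewThreatModel(threats: Threat[]): {
  unconsidered: StrideCategory[];
  unmitigated: Threat[];
} {
  const seen = new Set(threats.map((t) => t.category));
  return {
    unconsidered: STRIDE.filter((c) => !seen.has(c)),
    unmitigated: threats.filter((t) => t.mitigations.length === 0),
  };
}
```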

### 5. Observability Review

**Three Pillars**:
```yaml
Logging:
  - Structured logging (JSON)
  - Centralized (ELK, CloudWatch Logs)
  - Request IDs for tracing
  - Log levels: ERROR, WARN, INFO, DEBUG

Metrics:
  - RED: Rate, Errors, Duration
  - USE: Utilization, Saturation, Errors
  - Business metrics (orders/min, revenue)
  - Infrastructure metrics (CPU, memory, disk)

Tracing:
  - Distributed tracing (OpenTelemetry, Jaeger)
  - End-to-end request flow
  - Performance bottleneck identification
```
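
As a rough illustration of the logging and metrics pillars, the sketch below formats a structured JSON log line that always carries a request ID and keeps simple RED counters for one endpoint. `formatLog` and `RedMetrics` are hypothetical helpers; a real service would more likely use a logging library and a metrics client, and would track duration as a histogram or percentiles rather than an average.

```typescript
type LogLevel = "ERROR" | "WARN" | "INFO" | "DEBUG";

// One JSON object per line, always carrying the request ID for correlation.
function formatLog(
  level: LogLevel,
  message: string,
  requestId: string,
  fields: Record<string, unknown> = {},
): string {
  // Extra fields are spread first so they cannot overwrite the core keys.
  return JSON.stringify({
    ...fields,
    timestamp: new Date().toISOString(),
    level,
    message,
    requestId,
  });
}

// RED metrics (Rate, Errors, Duration) for one endpoint over an observation window.
class RedMetrics {
  private requests = 0;
  private errors = 0;
  private totalDurationMs = 0;

  record(durationMs: number, ok: boolean): void {
    this.requests += 1;
    if (!ok) this.errors += 1;
    this.totalDurationMs += durationMs;
  }

  snapshot(windowSeconds: number): { ratePerSec: number; errorRatio: number; avgDurationMs: number } {
    const n = this.requests;
    return {
      ratePerSec: n / windowSeconds,
      errorRatio: n === 0 ? 0 : this.errors / n,
      avgDurationMs: n === 0 ? 0 : this.totalDurationMs / n,
    };
  }
}
```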

### 6. Architecture Decision Records (ADRs)

```markdown
# ADR-001: Use PostgreSQL for Primary Database

## Status
Accepted

## Context
Need persistent storage for user data, transactions, and analytics.

## Decision
Use PostgreSQL as primary database.

## Consequences

**Pros**:
- ACID compliance (strong consistency)
- Rich query capabilities (joins, aggregations)
- Mature ecosystem, wide adoption
- JSON support for semi-structured data

**Cons**:
- Vertical scaling limits (mitigated with read replicas)
- Complex sharding if needed
- Can cost more than some NoSQL options at very large scale

**Alternatives Considered**:
- MongoDB: Flexible schema, but a weaker fit for relational, transaction-heavy data
- DynamoDB: Lock-in to AWS, limited query flexibility
```

### 7. Technical Debt Assessment

**Technical Debt Quadrant** (Martin Fowler):
```yaml
Reckless + Deliberate:
  Quote: "We don't have time for design"
  Priority: HIGH - Fix immediately

Prudent + Deliberate:
  Quote: "We must ship now and deal with consequences"
  Priority: MEDIUM - Plan refactoring sprint

Reckless + Inadvertent:
  Quote: "What's layering?"
  Priority: HIGH - Training + mentorship

Prudent + Inadvertent:
  Quote: "Now we know how we should have done it"
  Priority: LOW - Document for next time
```
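
The quadrant-to-priority mapping above is small enough to encode directly, which helps when tagging debt items in a backlog. This is a sketch of that mapping only; the `Care`, `Intent`, and `debtPriority` names are assumptions, and the priorities are the ones assigned in this section rather than anything prescribed by Fowler.

```typescript
type Care = "reckless" | "prudent";
type Intent = "deliberate" | "inadvertent";
type Priority = "HIGH" | "MEDIUM" | "LOW";

// Reckless debt is always HIGH; prudent debt depends on whether it was taken on knowingly.
function debtPriority(care: Care, intent: Intent): Priority {
  if (care === "reckless") return "HIGH";
  return intent === "deliberate" ? "MEDIUM" : "LOW";
}
```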

## When to Use

- Pre-launch architecture review
- Quarterly architecture health checks
- Scaling preparation (before 10x growth)
- Post-incident architecture analysis
- Acquisition due diligence

Evaluate architecture like a principal engineer!