Initial commit

Zhongwei Li
2025-11-29 18:24:07 +08:00
commit 330645cc39
19 changed files with 4991 additions and 0 deletions

commands/system-design.md Normal file

@@ -0,0 +1,301 @@
---
model: claude-opus-4-1
allowed-tools: Task, Read, Bash, Grep, Glob, Write
argument-hint: <system-name> [--depth=standard|deep] [--focus=architecture|scalability|trade-offs] [--generate-diagram=true|false]
description: Design complete systems using the WHY, WHAT, HOW, CONSIDERATIONS, and DEEP-DIVE framework
---
# System Design Interview Coach
A complete framework for taking a system design from problem statement to implementation. It covers the WHY/WHAT/HOW structure, trade-off analysis, Mermaid diagrams, and deep-dive optimizations.
## Interview Flow (about 60 minutes)
### Phase 1: Requirements & Context (5 minutes)
**Your goal**: Understand the problem deeply before designing
Ask clarifying questions:
- Scale: users, requests per second, data volume?
- Availability: SLA requirements (99.9%, 99.99%)?
- Latency: response time targets?
- Consistency: strong or eventual?
- Features: read-heavy, write-heavy, or balanced?
- Growth: expected growth rate?
**Interviewer's watching**:
- Do you ask the right questions?
- Do you understand the constraints?
- Can you estimate numbers?
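
The estimation step above can be rehearsed with a small back-of-envelope calculator. This is a minimal sketch: the `Workload` fields, the default peak factor of 3, and the sample inputs are illustrative assumptions, not figures from any real system.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class Workload:
    """Hypothetical inputs gathered from the clarifying questions."""
    daily_active_users: int
    reads_per_user_per_day: float
    writes_per_user_per_day: float
    bytes_per_write: int
    peak_to_average: float = 3.0  # assumed peak factor, not a measured value


def estimate(w: Workload) -> dict[str, float]:
    """Return average and peak request rates plus daily storage growth."""
    avg_read_qps = w.daily_active_users * w.reads_per_user_per_day / SECONDS_PER_DAY
    avg_write_qps = w.daily_active_users * w.writes_per_user_per_day / SECONDS_PER_DAY
    writes_per_day = w.daily_active_users * w.writes_per_user_per_day
    return {
        "avg_read_qps": avg_read_qps,
        "avg_write_qps": avg_write_qps,
        "peak_read_qps": avg_read_qps * w.peak_to_average,
        "peak_write_qps": avg_write_qps * w.peak_to_average,
        "storage_gb_per_day": writes_per_day * w.bytes_per_write / 1e9,
    }


if __name__ == "__main__":
    # Made-up sample: 10M daily users, 50 reads and 0.5 writes each, 1 KB per write.
    for name, value in estimate(Workload(10_000_000, 50, 0.5, 1_000)).items():
        print(f"{name}: {value:,.1f}")
```

In an interview the arithmetic is done aloud; the point is to state each assumption before multiplying.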
### Phase 2: High-Level Architecture (10 minutes)
**Your goal**: Outline the system at 30,000 feet
Cover:
- Major components (load balancer, services, databases, caches)
- Communication patterns (sync/async, protocols)
- Data flow from user request to response
- Rough scalability approach
Draw a simple diagram showing component interactions.
**Interviewer's watching**:
- Do you think in systems?
- Can you structure complexity?
- Do you know when to keep it simple?
### Phase 3: Detailed Component Design (20 minutes)
**Your goal**: Explain key components with confidence
Pick 2-3 components to discuss:
- How does this component work?
- Why this technology choice?
- What are the constraints it handles?
- How does it scale?
**Interviewer's watching**:
- Do you have technical depth?
- Can you justify decisions?
- Do you know trade-offs?
### Phase 4: Scalability & Trade-Offs (15 minutes)
**Your goal**: Show senior-level thinking
Discuss:
- Bottlenecks: What breaks first at 10x growth?
- Consistency: Strong vs eventual? Why?
- Reliability: Failure modes and recovery?
- Cost: What drives operational expense?
- Complexity: Is this operationally feasible?
**Interviewer's watching**:
- Do you think like a Staff engineer?
- Can you make principled trade-offs?
- Do you understand operational reality?
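
One way to answer "what breaks first at 10x growth?" is to compare each component's current load with its capacity and rank components by headroom. The sketch below assumes you already have rough per-component numbers; the component names and figures in the usage example are hypothetical.

```python
def headroom(load: dict[str, float], capacity: dict[str, float]) -> list[tuple[str, float]]:
    """Growth multiple each component tolerates, smallest (first to break) first."""
    ratios = ((name, capacity[name] / load[name]) for name in load)
    return sorted(ratios, key=lambda pair: pair[1])


def breaks_at(load: dict[str, float], capacity: dict[str, float], growth: float = 10.0) -> list[str]:
    """Components whose capacity is exceeded once load grows by `growth`."""
    return [name for name, multiple in headroom(load, capacity) if multiple < growth]


if __name__ == "__main__":
    # Hypothetical requests-per-second figures, for illustration only.
    load = {"web": 1_000, "db": 500, "cache": 2_000}
    capacity = {"web": 20_000, "db": 2_000, "cache": 50_000}
    print(headroom(load, capacity))
    print(breaks_at(load, capacity, growth=10.0))
```

The component with the smallest headroom is the one to name as the current bottleneck.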
### Phase 5: Extensions & Deep-Dives (8 minutes)
**Your goal**: Demonstrate mastery
Address follow-up questions:
- "How would you handle [new requirement]?"
- "What's the hardest part of operating this?"
- "What would you optimize for [metric]?"
- "How would you debug this in production?"
**Interviewer's watching**:
- Are you thinking ahead?
- Can you handle surprises?
- Do you know what you don't know?
## System Design Framework
### WHY: Problem & Context
**What to cover**:
- Problem statement (1-2 sentences)
- Primary use cases (top 3-5)
- User base and growth expectations
- Non-functional requirements (scale, latency, availability)
- Business context (why does this matter?)
**For interviewer's benefit**:
- Shows you understand the problem before solving it
- Demonstrates customer empathy
- Proves you can estimate and scope
### WHAT: Components & Data Model
**What to cover**:
- Core entities (Users, Posts, Comments, etc.)
- Entity relationships
- Storage requirements (how much data?)
- Major services (Authentication, Feed, Search, etc.)
- API contracts (what endpoints do we need?)
**For interviewer's benefit**:
- Shows you think about data structure
- Demonstrates you can decompose systems
- Proves you understand component boundaries
### HOW: Architecture & Patterns
**What to cover**:
- Request flow (from user → response)
- Service architecture (monolith vs microservices decision)
- Communication patterns (synchronous, asynchronous, pub-sub)
- Storage topology (where does data live?)
- Caching strategy (where, what, how long?)
- Replication and failover
**For interviewer's benefit**:
- Shows you know architectural patterns
- Demonstrates systems thinking
- Proves you can make principled decisions
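
The caching questions above (where, what, how long) can be made concrete with a cache-aside sketch. This is an in-memory illustration, assuming a single process and a hypothetical `load` callback standing in for the database; a real deployment would typically use an external cache.

```python
import time
from typing import Callable, Generic, TypeVar

K = TypeVar("K")
V = TypeVar("V")


class CacheAside(Generic[K, V]):
    """Read-through-on-miss cache with a fixed time-to-live per entry."""

    def __init__(self, load: Callable[[K], V], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load            # hypothetical source-of-truth lookup
        self._ttl = ttl_seconds      # "how long" an entry may be served
        self._clock = clock
        self._entries: dict[K, tuple[float, V]] = {}

    def get(self, key: K) -> V:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]          # hit: entry has not expired yet
        value = self._load(key)      # miss or expired: read the source of truth
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: K) -> None:
        """Drop an entry, for example right after a write to the database."""
        self._entries.pop(key, None)
```

The TTL bounds how stale a read can be, which ties the caching decision directly to the consistency requirement.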
### CONSIDERATIONS: Trade-Offs & Reality
**What to analyze**:
**Consistency**
- Strong: Reads always return the latest write (higher latency, lower availability during failures or partitions)
- Eventual: Reads may return stale data for a while (lower latency, higher availability)
- Your choice: "For [reason], we accept [consistency model]"
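
If the store uses quorum replication (an assumption; the text above does not prescribe a replication scheme), the strong-versus-eventual choice can be reasoned about with the overlap rule `r + w > n`. A minimal sketch:

```python
def quorum_overlaps(n: int, r: int, w: int) -> bool:
    """True when every read quorum shares at least one replica with every write quorum."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


def tolerated_failures(n: int, r: int, w: int) -> dict[str, int]:
    """Replicas that can be down while reads or writes still reach their quorum."""
    return {"reads": n - r, "writes": n - w}


if __name__ == "__main__":
    print(quorum_overlaps(3, 2, 2), tolerated_failures(3, 2, 2))
    print(quorum_overlaps(3, 1, 1), tolerated_failures(3, 1, 1))
```

Overlapping quorums let a read see the latest acknowledged write, at the cost of waiting on more replicas. Overlap alone does not guarantee linearizability, so treat this as a reasoning aid rather than a proof.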
**Scalability**
- Vertical: Bigger machines (simpler, but capped by the largest machine available)
- Horizontal: More machines (more complex, but a much higher ceiling)
- Your choice: "We scale [direction] because [reason]"
**Reliability**
- Single point of failure? (bad)
- Replication strategy? (multiple copies)
- Disaster recovery? (backup and restore procedure)
- Your choice: "We replicate [this way] to handle [failure]"
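
The reliability questions above become easier to argue with a little availability arithmetic. The sketch below assumes component failures are independent, which real systems often violate, so read the results as upper bounds.

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(availability: float) -> float:
    """Downtime budget implied by an availability target such as 0.999."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def in_series(*availabilities: float) -> float:
    """Availability of a request path that needs every component to be up."""
    return math.prod(availabilities)


def in_parallel(availability: float, copies: int) -> float:
    """Availability when any one of `copies` independent replicas is enough."""
    return 1.0 - (1.0 - availability) ** copies


if __name__ == "__main__":
    for target in (0.999, 0.9999):
        print(f"{target:.2%}: {downtime_minutes_per_year(target):.1f} min/year")
```

For reference, 99.9% allows about 526 minutes of downtime per year and 99.99% about 53 minutes, which is why each extra nine usually requires removing another single point of failure.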
**Cost**
- Storage: What's the cost per GB?
- Compute: What's the cost of this many servers?
- Bandwidth: What's the egress cost?
- Your choice: "This costs [X] but solves [Y]"
**Operational Complexity**
- How many different technologies?
- How hard is debugging?
- What's the on-call pain?
- Your choice: "We keep it simple: [reason]"
### DEEP-DIVE: Component Optimization
**For each major component**, be prepared to discuss:
1. **Bottleneck Analysis**
- What's the scaling limit?
- Where would we hit the wall first?
- How do we know?
2. **Optimization Opportunities**
- What could we do to handle more load?
- What are the trade-offs?
- When is this optimization worth doing?
3. **Failure Modes**
- What if [component] fails?
- How do we detect it?
- How do we recover?
4. **Operational Concerns**
- How do we monitor this?
- What metrics matter?
- How do we debug issues?
5. **Alternative Approaches**
- What's another way to design this?
- When would you choose it?
- What problems does it have?
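
For the failure-mode question "How do we detect it?", one common answer is a timeout-based heartbeat check. The sketch below is a single-process illustration; the class name, node names, and timeout are hypothetical, and production detectors usually add retries or suspicion levels to limit false positives.

```python
import time
from typing import Callable


class HeartbeatMonitor:
    """Flags nodes whose last heartbeat is older than a fixed timeout."""

    def __init__(self, timeout_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout = timeout_seconds
        self._clock = clock
        self._last_seen: dict[str, float] = {}

    def beat(self, node: str) -> None:
        """Record that `node` reported in just now."""
        self._last_seen[node] = self._clock()

    def suspected_down(self) -> list[str]:
        """Nodes that have been silent for longer than the timeout, sorted by name."""
        now = self._clock()
        return sorted(node for node, seen in self._last_seen.items()
                      if now - seen > self._timeout)
```

A short timeout detects failures faster but raises false alarms during network hiccups, which is itself a trade-off worth stating.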
## Mermaid Diagram Strategy
Create diagrams that show:
1. **Architecture Diagram**: Components and communication
2. **Data Flow**: Request path through the system
3. **Database Schema**: Key entities and relationships
**Tips**:
- Keep diagrams simple initially
- Add detail when asked
- Label important decisions
- Annotate bottlenecks
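
An architecture diagram can be generated from a plain edge list, which keeps it easy to start simple and add detail later. The helper below is a sketch that emits Mermaid `flowchart` text; the function names are hypothetical, and the component names in the usage example are borrowed from the feed example in this document.

```python
def node_id(name: str) -> str:
    """Turn a display name into an identifier that is safe to use as a Mermaid node id."""
    return "".join(ch if ch.isalnum() else "_" for ch in name)


def to_mermaid(edges: list[tuple[str, str, str]], direction: str = "LR") -> str:
    """Render (source, target, label) edges as a Mermaid flowchart definition."""
    lines = [f"flowchart {direction}"]
    for source, target, label in edges:
        arrow = f"-->|{label}|" if label else "-->"
        lines.append(f'    {node_id(source)}["{source}"] {arrow} {node_id(target)}["{target}"]')
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_mermaid([
        ("Client", "Load Balancer", "HTTPS"),
        ("Load Balancer", "Web Server", ""),
        ("Web Server", "Feed Service", "GET /feed"),
        ("Feed Service", "Redis", "cache first"),
        ("Feed Service", "Database", "on miss"),
    ]))
```

Edge labels are a convenient place to annotate decisions and bottlenecks without cluttering the boxes.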
## Example: Design Facebook Feed
### WHY
- **Problem**: Show users their friends' posts in a personalized, real-time feed
- **Use Cases**:
1. User opens app → see recent posts from friends
2. Friend posts → appears in followers' feeds quickly
3. Massive scale: billions of posts; minutes of propagation delay are acceptable
- **Requirements**:
- Read-heavy (100:1 read to write ratio)
- Latency: Feed load < 200ms
- Availability: 99.99%
- Consistency: Eventual OK (a few minutes of lag is acceptable)
### WHAT
- **Entities**: User, Post, Friendship, Like, Comment
- **Relationships**: User → Post (1:many), User ↔ User via Friendship (many:many)
- **Storage**: Posts: hundreds of billions; user records: billions
- **Services**: Auth, Post Creation, Feed Service, Search
- **APIs**:
- POST /posts (create)
- GET /feed (get user's feed)
- POST /posts/{id}/like (like post)
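
The entities and endpoints above can be sketched as plain data classes plus an in-memory store. Everything here is illustrative: the field names, the `PostStore` class, and the idempotent-like behavior are assumptions made for the example, not a description of any real service.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class User:
    user_id: int
    name: str


@dataclass(frozen=True)
class Friendship:
    """Join entity that makes the User-to-User relationship many-to-many."""
    user_id: int
    friend_id: int


@dataclass
class Post:
    post_id: int
    author_id: int
    body: str
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    liked_by: set[int] = field(default_factory=set)


class PostStore:
    """In-memory stand-in for the logic behind POST /posts and POST /posts/{id}/like."""

    def __init__(self) -> None:
        self._posts: dict[int, Post] = {}
        self._next_id = 1

    def create_post(self, author_id: int, body: str) -> Post:
        post = Post(self._next_id, author_id, body)
        self._posts[post.post_id] = post
        self._next_id += 1
        return post

    def like(self, post_id: int, user_id: int) -> int:
        """Record a like (repeat likes are no-ops) and return the like count."""
        post = self._posts[post_id]
        post.liked_by.add(user_id)
        return len(post.liked_by)
```

Storing likes as a set makes the like endpoint safe to retry, which is a useful property to mention when discussing API contracts.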
### HOW
- Load balancers distribute requests
- Stateless web servers handle auth and routing
- Post service writes posts to database
- Feed service reads from cache first, database second
- Cache layer (Redis) stores hot posts
- Fanout on write: When user posts, push to all followers' feeds
- Asynchronous: Queue for fanout, workers process
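
The fanout-on-write path above can be sketched with an in-process queue and a worker loop. This is a simplified, single-process model: the `FeedFanout` class, the follower map, and the feed length cap are hypothetical, and a real system would use a durable message queue and a feed store such as the Redis cache mentioned above.

```python
from collections import defaultdict, deque


class FeedFanout:
    """Queue posts on write, then push them to followers' feeds asynchronously."""

    def __init__(self, followers: dict[int, set[int]], max_feed_len: int = 500) -> None:
        self._followers = followers
        self._queue: deque = deque()  # (author_id, post_id) pairs awaiting fanout
        # Each feed keeps only the newest `max_feed_len` post ids.
        self._feeds: dict[int, deque] = defaultdict(lambda: deque(maxlen=max_feed_len))

    def publish(self, author_id: int, post_id: int) -> None:
        """Write path: enqueue and return immediately; fanout happens later."""
        self._queue.append((author_id, post_id))

    def run_worker(self) -> int:
        """Drain the queue, pushing each post to every follower. Returns pushes made."""
        pushes = 0
        while self._queue:
            author_id, post_id = self._queue.popleft()
            for follower_id in self._followers.get(author_id, ()):
                self._feeds[follower_id].appendleft(post_id)  # newest first
                pushes += 1
        return pushes

    def feed(self, user_id: int) -> list[int]:
        """Read path: the feed is already materialized, so this is a cheap lookup."""
        return list(self._feeds.get(user_id, ()))
```

The number of pushes per post equals the author's follower count, which is exactly why high-follower accounts need separate treatment.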
### CONSIDERATIONS
- **Consistency**: Eventual consistency (a few minutes of lag is acceptable)
- **Scalability**: Horizontal—more servers as needed
- **Reliability**: Multi-region replication for availability
- **Cost**: Balance storage vs computation
- **Complexity**: Fanout-on-write is complex but enables fast reads
### DEEP-DIVE
1. **Fanout Bottleneck**: Celebrity posts with 100M followers?
- Solution: Hybrid fanout. Fan out on write for normal users; for celebrities, skip the push and pull their recent posts (served from cache) into the feed at read time
2. **Feed Personalization**: How do we rank posts?
- Solution: ML model, but start with recency + engagement
3. **Real-time Updates**: How do we push new posts?
- Solution: Long-polling, WebSockets, or event stream
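
The hybrid fanout idea from the first deep-dive can be sketched as a write-time decision plus a read-time merge. The follower threshold and the `(timestamp, post_id)` tuple shape are assumptions chosen for illustration; the right cutoff depends on measured fanout cost.

```python
CELEBRITY_THRESHOLD = 10_000  # hypothetical cutoff, tune from real fanout costs


def should_fan_out(follower_count: int, threshold: int = CELEBRITY_THRESHOLD) -> bool:
    """Write path: push to followers' feeds only for accounts below the threshold."""
    return follower_count < threshold


def read_feed(precomputed: list[tuple[int, str]],
              celebrity_posts: list[tuple[int, str]],
              limit: int = 20) -> list[str]:
    """Read path: merge the pushed feed with pulled celebrity posts, newest first.

    Both inputs are lists of (timestamp, post_id) tuples.
    """
    merged = sorted(set(precomputed) | set(celebrity_posts), reverse=True)
    return [post_id for _, post_id in merged[:limit]]


if __name__ == "__main__":
    pushed = [(5, "friend-post-a"), (1, "friend-post-b")]
    pulled = [(3, "celebrity-post-c")]
    print(read_feed(pushed, pulled, limit=2))
```

Reads get slightly more expensive for users who follow celebrities, in exchange for avoiding millions of writes per celebrity post.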
## Talking Points During Interview
**When introducing your design**:
- "Let me outline the system at a high level..."
- "The key insight here is [insight]"
- "This design makes [requirement] easy"
**When defending a choice**:
- "We chose [option] because, given [constraint], it is the better fit"
- "The trade-off is [cost] for [benefit]"
- "This would change if [different constraint]"
**When asked about scaling**:
- "Currently [component] is the bottleneck"
- "We'd scale [direction] because [reason]"
- "This approach works until [limit], then we'd [next evolution]"
**When asked about failure**:
- "If [component] fails, [other component] takes over"
- "We'd detect it via [monitoring], then [recovery action]"
- "This is why we replicate [data/component]"
## Red Flags to Avoid
❌ Diving into implementation details too early
❌ Not asking clarifying questions
❌ Designing for scale you don't need
❌ Making technology choices without justification
❌ Ignoring operational reality
❌ Treating consistency/availability as separate concerns
❌ Not discussing trade-offs
**Do instead**:
✓ Start broad, add detail on request
✓ Ask clarifying questions upfront
✓ Design for the specified scale
✓ Justify technology choices
✓ Consider how humans operate it
✓ Explicitly discuss trade-offs
✓ Show you understand what you don't know
## Success Criteria
You're ready when you can:
- ✓ Clarify ambiguous requirements with good questions
- ✓ Outline architecture clearly on a whiteboard
- ✓ Explain each component's role
- ✓ Justify your technology choices
- ✓ Discuss trade-offs explicitly
- ✓ Handle "what if" questions with confidence
- ✓ Show understanding of operational reality
- ✓ Demonstrate Staff+ systems thinking