| model | allowed-tools | argument-hint | description |
|---|---|---|---|
| claude-opus-4-1 | Task, Read, Bash, Grep, Glob, Write | `<system-name> [--depth=standard\|deep] [--focus=architecture\|scalability\|trade-offs] [--generate-diagram=true\|false]` | Design complete systems using the WHY, WHAT, HOW, CONSIDERATIONS, and DEEP-DIVE framework |
System Design Interview Coach
Complete framework for designing systems, from problem statement to implementation. Includes the WHY/WHAT/HOW structure, trade-off analysis, Mermaid diagrams, and deep-dive optimizations.
Interview Flow (60 minutes: the phases below total 58, leaving a 2-minute buffer)
Phase 1: Requirements & Context (5 minutes)
Your goal: Understand the problem deeply before designing
Ask clarifying questions:
- Scale: users, requests per second, data volume?
- Availability: SLA requirements (99.9%, 99.99%)?
- Latency: response time targets?
- Consistency: strong or eventual?
- Workload: read-heavy, write-heavy, or balanced?
- Growth: expected growth rate?
Interviewer's watching:
- Do you ask the right questions?
- Do you understand the constraints?
- Can you estimate numbers?
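Numbers like these are usually estimated on the spot. The sketch below turns a few assumed inputs into average QPS, peak QPS, and daily storage growth. The inputs are daily active users, requests per user, a peak-to-average factor, the read:write ratio, and bytes per write. The function name and every input value are illustrative assumptions, not figures from a real system.

```python
SECONDS_PER_DAY = 86_400


def estimate(daily_active_users: int,
             requests_per_user_per_day: float,
             peak_to_average: float,
             read_write_ratio: float,
             bytes_per_write: int) -> dict:
    """Back-of-envelope capacity numbers from a handful of assumed inputs."""
    total_per_day = daily_active_users * requests_per_user_per_day
    average_qps = total_per_day / SECONDS_PER_DAY
    peak_qps = average_qps * peak_to_average
    # A read:write ratio of 100 means 100 reads per write, so writes are 1/101 of traffic.
    write_fraction = 1 / (read_write_ratio + 1)
    writes_per_day = total_per_day * write_fraction
    return {
        "average_qps": average_qps,
        "peak_qps": peak_qps,
        "peak_write_qps": peak_qps * write_fraction,
        "storage_gb_per_day": writes_per_day * bytes_per_write / 1e9,
    }


# Hypothetical inputs: 10M users, 20 requests each, 3x peak, 100:1 reads, 1 KB per write.
print(estimate(10_000_000, 20, 3, 100, 1_000))
```

With these made-up inputs, the numbers come out to roughly 2,300 average QPS, roughly 6,900 peak QPS, and about 2 GB of new data per day. That is enough to decide whether a single database could even be on the table.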
Phase 2: High-Level Architecture (10 minutes)
Your goal: Outline the system at 30,000 feet
Cover:
- Major components (load balancer, services, databases, caches)
- Communication patterns (sync/async, protocols)
- Data flow from user request to response
- Rough scalability approach
Draw a simple diagram showing how the components interact.
Interviewer's watching:
- Do you think in systems?
- Can you structure complexity?
- Do you know when to keep it simple?
Phase 3: Detailed Component Design (20 minutes)
Your goal: Explain key components with confidence
Pick 2-3 components to discuss:
- How does this component work?
- Why this technology choice?
- What are the constraints it handles?
- How does it scale?
Interviewer's watching:
- Do you have technical depth?
- Can you justify decisions?
- Do you know trade-offs?
Phase 4: Scalability & Trade-Offs (15 minutes)
Your goal: Show senior-level thinking
Discuss:
- Bottlenecks: What breaks first at 10x growth?
- Consistency: Strong vs eventual? Why?
- Reliability: Failure modes and recovery?
- Cost: What drives operational expense?
- Complexity: Is this operationally feasible?
Interviewer's watching:
- Do you think like a Staff engineer?
- Can you make principled trade-offs?
- Do you understand operational reality?
Phase 5: Extensions & Deep-Dives (8 minutes)
Your goal: Demonstrate mastery
Address follow-up questions:
- "How would you handle [new requirement]?"
- "What's the hardest part of operating this?"
- "What would you optimize for [metric]?"
- "How would you debug this in production?"
Interviewer's watching:
- Are you thinking ahead?
- Can you handle surprises?
- Do you know what you don't know?
System Design Framework
WHY: Problem & Context
What to cover:
- Problem statement (1-2 sentences)
- Primary use cases (top 3-5)
- User base and growth expectations
- Non-functional requirements (scale, latency, availability)
- Business context (why does this matter?)
For the interviewer's benefit:
- Shows you understand the problem before solving it
- Demonstrates customer empathy
- Proves you can estimate and scope
WHAT: Components & Data Model
What to cover:
- Core entities (Users, Posts, Comments, etc.)
- Entity relationships
- Storage requirements (how much data?)
- Major services (Authentication, Feed, Search, etc.)
- API contracts (what endpoints do we need?)
For the interviewer's benefit:
- Shows you think about data structure
- Demonstrates you can decompose systems
- Proves you understand component boundaries
HOW: Architecture & Patterns
What to cover:
- Request flow (from user → response)
- Service architecture (monolith vs microservices decision)
- Communication patterns (synchronous, asynchronous, pub-sub)
- Storage topology (where does data live?)
- Caching strategy (where, what, how long?)
- Replication and failover
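For read-heavy paths, a common caching strategy is cache-aside (lazy loading). You read from the cache, load from the database on a miss, and store the result with a TTL. The in-memory sketch below takes a `loader` callable that stands in for the database query. In production the dictionary would be a shared cache such as Redis. The class name and the TTL handling are illustrative assumptions.

```python
import time
from typing import Callable, Dict, Tuple


class CacheAside:
    """Cache-aside (lazy loading): serve from the cache, load from the source on a miss."""

    def __init__(self, loader: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader          # stands in for the database query (hypothetical)
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # key -> (value, expires_at)

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]                        # hit: entry is still fresh
        value = self._loader(key)                  # miss or expired: read the source of truth
        self._entries[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key: str) -> None:
        """Drop the entry after a write so the next read reloads fresh data."""
        self._entries.pop(key, None)


db = {"user:1": "Ada"}                       # hypothetical backing store
cache = CacheAside(db.__getitem__, ttl_seconds=60)
print(cache.get("user:1"))                   # Ada (loaded once, then served from cache)
```

The TTL bounds how stale a cached value can get. Invalidating on writes shortens that window further.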
For the interviewer's benefit:
- Shows you know architectural patterns
- Demonstrates systems thinking
- Proves you can make principled decisions
CONSIDERATIONS: Trade-Offs & Reality
What to analyze:
Consistency
- Strong: Reads always return the latest write (higher latency; less available during network partitions)
- Eventual: Reads may return stale data for a while (lower latency, higher availability)
- Your choice: "For [reason], we accept [consistency model]"
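Many Dynamo-style replicated stores let you tune this choice with read and write quorums.

- With N replicas, a write waits for W acknowledgements and a read queries R replicas.
- When R + W > N, every read quorum overlaps every write quorum, so a read sees the latest acknowledged write.
- This ignores edge cases such as sloppy quorums and concurrent writes.

The helper names below are hypothetical.

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True if every read quorum shares a replica with every write quorum (R + W > N)."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must each be between 1 and N")
    return r + w > n


def tolerated_failures(n: int, quorum_size: int) -> int:
    """How many replicas can be unreachable while a quorum of this size still forms."""
    return n - quorum_size


# N=3, R=2, W=2: overlapping quorums; reads and writes each survive one replica down.
print(quorums_overlap(3, 2, 2), tolerated_failures(3, 2))  # True 1
# N=3, R=1, W=1: fastest reads and writes, but a read can miss the latest write.
print(quorums_overlap(3, 1, 1))  # False
```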
Scalability
- Vertical: Bigger machines (simpler, but capped by the largest available hardware)
- Horizontal: More machines (more complex, but scales much further)
- Your choice: "We scale [direction] because [reason]"
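Horizontal scaling becomes simple arithmetic once you have a per-server throughput number, either measured by load testing or assumed in an interview. The sketch below assumes interchangeable servers behind a load balancer. The 60% utilization target and the single spare machine are illustrative defaults, not recommendations from this guide.

```python
import math


def servers_needed(peak_qps: float, qps_per_server: float,
                   target_utilization: float = 0.6, spare: int = 1) -> int:
    """Servers needed so each stays at or below target_utilization at peak,
    plus `spare` extra machines to absorb that many failures."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    busy = math.ceil(peak_qps / (qps_per_server * target_utilization))
    return busy + spare


print(servers_needed(6_944, 1_000))    # 13
print(servers_needed(69_440, 1_000))   # 117 after 10x growth
```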
Reliability
- Single points of failure? (find and eliminate them)
- Replication strategy? (multiple copies)
- Disaster recovery? (backup and restore procedure)
- Your choice: "We replicate [this way] to handle [failure]"
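The reliability questions, and the SLA targets from Phase 1, reduce to a little probability. The sketch below computes three things:

- yearly downtime for a given availability;
- the availability of N redundant replicas;
- the availability of components chained in series.

It assumes independent failures. Correlated failures, such as a shared region or a bad deploy, make real numbers worse. The function names are hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def parallel(availability: float, replicas: int) -> float:
    """Redundant replicas: down only if every replica is down (assumes independence)."""
    return 1 - (1 - availability) ** replicas


def serial(*availabilities: float) -> float:
    """A request that passes through every component needs all of them up."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def downtime_minutes_per_year(availability: float) -> float:
    return (1 - availability) * MINUTES_PER_YEAR


print(round(downtime_minutes_per_year(0.999), 1))   # 525.6 (about 8.8 hours)
print(round(downtime_minutes_per_year(0.9999), 1))  # 52.6
print(round(parallel(0.99, 2), 4))                  # 0.9999
print(round(serial(0.9999, 0.9999, 0.999), 4))      # 0.9988
```

Two 99% replicas give roughly "four nines", but chaining three components in series pulls the total below the weakest link.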
Cost
- Storage: What's the cost per GB?
- Compute: What's the cost of this many servers?
- Bandwidth: What's the egress cost?
- Your choice: "This costs [X] but solves [Y]"
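A rough monthly cost estimate is the three drivers above multiplied by unit prices. The sketch below assumes flat per-unit pricing, whereas real providers use tiers, reserved pricing, and discounts. The prices in the example are made up, so substitute your provider's published rates.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # 8,760 hours per year / 12, rounded


@dataclass
class UnitPrices:
    """Unit prices to plug in; the example values below are made up."""
    storage_per_gb_month: float
    server_per_hour: float
    egress_per_gb: float


def monthly_cost(prices: UnitPrices, stored_gb: float,
                 servers: int, egress_gb: float) -> dict:
    storage = stored_gb * prices.storage_per_gb_month
    compute = servers * HOURS_PER_MONTH * prices.server_per_hour
    bandwidth = egress_gb * prices.egress_per_gb
    return {"storage": storage, "compute": compute,
            "bandwidth": bandwidth, "total": storage + compute + bandwidth}


# Made-up prices, for illustration only.
costs = monthly_cost(UnitPrices(0.02, 0.10, 0.05),
                     stored_gb=10_000, servers=13, egress_gb=50_000)
print({k: round(v) for k, v in costs.items()})
# {'storage': 200, 'compute': 949, 'bandwidth': 2500, 'total': 3649}
```

In this made-up example egress dominates. That kind of observation is worth saying out loud.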
Operational Complexity
- How many different technologies?
- How hard is debugging?
- What's the on-call pain?
- Your choice: "We keep it simple: [reason]"
DEEP-DIVE: Component Optimization
For each major component, be prepared to discuss:
Bottleneck Analysis
- What's the scaling limit?
- Where would we hit the wall first?
- How do we know?
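One way to answer "where would we hit the wall first?" is to divide projected load by measured capacity for each component and sort the results. The sketch below assumes every component's load scales linearly with user growth. Real systems rarely behave exactly that way; for example, cache hit rates can shift as the working set grows. The component names and numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def first_to_break(load: Dict[str, float], capacity: Dict[str, float],
                   growth: float) -> List[Tuple[str, float]]:
    """Rank components by projected utilization after scaling today's load by `growth`.
    Assumes linear growth; utilization >= 1.0 means over capacity."""
    projected = {name: load[name] * growth / capacity[name] for name in load}
    return sorted(projected.items(), key=lambda item: item[1], reverse=True)


# Hypothetical requests per second for a read-heavy service, projected at 10x.
ranking = first_to_break(
    load={"web": 2_000, "cache": 20_000, "db_writes": 500},
    capacity={"web": 50_000, "cache": 400_000, "db_writes": 3_000},
    growth=10,
)
for name, utilization in ranking:
    print(f"{name}: {utilization:.0%}")
# db_writes: 167%
# cache: 50%
# web: 40%
```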
Optimization Opportunities
- What could we do to handle more load?
- What are the trade-offs?
- When is this optimization worth doing?
Failure Modes
- What if [component] fails?
- How do we detect it?
- How do we recover?
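A common detection mechanism is the heartbeat. Each node reports in periodically, and a node that stays silent longer than a timeout is suspected to have failed. The sketch below is a minimal single-process version with an injectable clock so it can be tested. The class, the `pick_healthy` failover helper, and the node names are hypothetical. The timeout is itself a trade-off: too short and a slow node is declared dead, too long and recovery is delayed.

```python
import time
from typing import Callable, Dict, List, Optional


class HeartbeatMonitor:
    """Suspects a node has failed if no heartbeat arrived within `timeout` seconds."""

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout = timeout
        self._clock = clock
        self._last_seen: Dict[str, float] = {}

    def heartbeat(self, node: str) -> None:
        self._last_seen[node] = self._clock()

    def suspected(self) -> List[str]:
        now = self._clock()
        return sorted(node for node, seen in self._last_seen.items()
                      if now - seen > self._timeout)

    def pick_healthy(self, candidates: List[str]) -> Optional[str]:
        """Recovery step: route to the first candidate that is not suspected."""
        down = set(self.suspected())
        return next((c for c in candidates if c not in down), None)


now = [0.0]                                   # fake clock for the example
monitor = HeartbeatMonitor(timeout=5, clock=lambda: now[0])
monitor.heartbeat("db-primary")
monitor.heartbeat("db-replica")
now[0] = 4
monitor.heartbeat("db-replica")
now[0] = 8
print(monitor.suspected())                                 # ['db-primary']
print(monitor.pick_healthy(["db-primary", "db-replica"]))  # db-replica
```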
Operational Concerns
- How do we monitor this?
- What metrics matter?
- How do we debug issues?
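For latency, tail percentiles usually matter more than the mean, because a few slow requests disappear inside an average. The sketch below uses the nearest-rank definition of a percentile on raw samples. Production monitoring systems typically approximate percentiles with histograms instead. The sample latencies are made up.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


latencies_ms = [12, 15, 11, 14, 13, 250, 16, 12, 14, 13]  # made-up samples
print(sum(latencies_ms) / len(latencies_ms))  # 37.0
print(percentile(latencies_ms, 50))           # 13
print(percentile(latencies_ms, 99))           # 250
```

Here the mean (37 ms) describes no real request. The p50 of 13 ms and the p99 of 250 ms together tell the actual story.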
Alternative Approaches
- What's another way to design this?
- When would you choose it?
- What problems does it have?
Mermaid Diagram Strategy
Create diagrams that show:
- Architecture Diagram: Components and communication
- Data Flow: Request path through the system
- Database Schema: Key entities and relationships
Tips:
- Keep diagrams simple initially
- Add detail when asked
- Label important decisions
- Annotate bottlenecks
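When `--generate-diagram` is on, building diagrams from data rather than by hand keeps them consistent as components are added. The hypothetical helper below renders (source, label, target) edges as Mermaid `flowchart` text, using Mermaid's `A -->|label| B` edge syntax. Node IDs must not contain spaces. Display names use Mermaid's `ID[Display Name]` form, and `ID[(Name)]` draws a database cylinder.

```python
from typing import Iterable, Tuple


def mermaid_flowchart(edges: Iterable[Tuple[str, str, str]],
                      direction: str = "LR") -> str:
    """Render (source, label, target) edges as a Mermaid flowchart.
    An empty label produces a plain arrow."""
    lines = [f"flowchart {direction}"]
    for source, label, target in edges:
        arrow = f"-->|{label}|" if label else "-->"
        lines.append(f"    {source} {arrow} {target}")
    return "\n".join(lines)


print(mermaid_flowchart([
    ("Client", "HTTPS", "LB[Load Balancer]"),
    ("LB", "", "Web[Web Servers]"),
    ("Web", "read", "Cache[(Redis)]"),
    ("Web", "on miss", "DB[(Database)]"),
]))
```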
Example: Design Facebook Feed
WHY
- Problem: Show users their friends' posts in a personalized, real-time feed
- Use Cases:
  - User opens app → sees recent posts from friends
  - A friend posts → the post appears in followers' feeds quickly
  - Massive scale: billions of posts; a short propagation delay is acceptable
- Requirements:
  - Read-heavy (100:1 read-to-write ratio)
  - Latency: feed load < 200 ms
  - Availability: 99.99%
  - Consistency: eventual is OK (new posts may take seconds, up to a few minutes, to appear)
WHAT
- Entities: User, Post, Friendship, Like, Comment
- Relationships: User → Post (1:many), User ↔ User via Friendship (many:many)
- Storage: hundreds of billions of posts; billions of user records
- Services: Auth, Post Creation, Feed Service, Search
- APIs:
  - POST /posts (create a post)
  - GET /feed (get the user's feed)
  - POST /posts/{id}/like (like a post)
HOW
- Load balancers distribute requests
- Stateless web servers handle auth and routing
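- Post service writes posts to the database
- Feed service reads from the cache first and falls back to the database on a miss
- Cache layer (Redis) stores hot posts and the per-user feeds built by fanout
- Fanout on write: when a user posts, push the new post (typically its ID) into each follower's feed
- Fanout is asynchronous: the post service enqueues a fanout job and workers process it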
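The write and read paths above can be sketched in memory. In the version below, `create_post` only stores the post, enqueues a fanout job, and returns. A worker later pushes the post ID onto each follower's feed, which makes `GET /feed` a cheap list read. The class, the 500-entry feed cap, and the single-process queue are assumptions for illustration. A real deployment would use a durable queue and a shared store such as Redis for the feeds.

```python
from collections import defaultdict, deque
from typing import DefaultDict, Deque, Dict, List, Set, Tuple

FEED_LENGTH = 500  # assumed cap on entries kept per user's feed


class FeedSystem:
    """In-memory sketch of fanout-on-write with an asynchronous fanout queue."""

    def __init__(self) -> None:
        self.posts: Dict[int, Tuple[str, str]] = {}             # post_id -> (author, text)
        self.followers: DefaultDict[str, Set[str]] = defaultdict(set)
        self.feeds: DefaultDict[str, Deque[int]] = defaultdict(
            lambda: deque(maxlen=FEED_LENGTH))                   # user -> post IDs, newest first
        self.fanout_queue: Deque[Tuple[str, int]] = deque()
        self._next_id = 0

    def follow(self, follower: str, author: str) -> None:
        self.followers[author].add(follower)

    def create_post(self, author: str, text: str) -> int:
        """POST /posts: store the post, enqueue a fanout job, return at once."""
        self._next_id += 1
        self.posts[self._next_id] = (author, text)
        self.fanout_queue.append((author, self._next_id))
        return self._next_id

    def run_worker(self) -> None:
        """Fanout worker: push each queued post ID onto every follower's feed."""
        while self.fanout_queue:
            author, post_id = self.fanout_queue.popleft()
            for follower in self.followers.get(author, ()):
                self.feeds[follower].appendleft(post_id)  # oldest entries fall off the end

    def get_feed(self, user: str, limit: int = 20) -> List[int]:
        """GET /feed: a cheap read of the precomputed feed."""
        return list(self.feeds.get(user, ()))[:limit]


feeds = FeedSystem()
feeds.follow("bob", "alice")
feeds.create_post("alice", "hello")
print(feeds.get_feed("bob"))   # [] (fanout has not run yet)
feeds.run_worker()
print(feeds.get_feed("bob"))   # [1]
```

The window between `create_post` and `run_worker` is the eventual-consistency lag that the requirements accept.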
CONSIDERATIONS
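- Consistency: Eventual consistency (a propagation lag of seconds, up to a few minutes, is OK)
- Scalability: Horizontal; add servers as load grows
- Reliability: Multi-region replication for availability
- Cost: Fanout-on-write spends extra storage and write work (one feed entry per follower) to make reads cheap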
- Complexity: Fanout-on-write is complex but enables fast reads
DEEP-DIVE
- Fanout Bottleneck: What about a celebrity post with 100M followers?
  - Solution: Hybrid fanout. Fan out on write for normal users; for celebrities, skip the fanout, cache their recent posts, and merge them into followers' feeds at read time
- Feed Personalization: How do we rank posts?
  - Solution: Eventually an ML model, but start with recency + engagement
- Real-time Updates: How do we push new posts to clients?
  - Solution: Long-polling, WebSockets, or an event stream
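The hybrid fanout and the "start with recency + engagement" ranking can be sketched together. Posts from normal authors arrive through fanout-on-write. Posts from followed celebrities are pulled from a per-celebrity cache at read time, then merged, deduplicated, and ranked. The 1M-follower threshold, the 6-hour half-life, and the log-scaled scoring formula are assumptions chosen for illustration, not a known production ranking function.

```python
import math
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set

CELEBRITY_THRESHOLD = 1_000_000  # assumed follower count at which fanout is skipped


@dataclass(frozen=True)
class Post:
    post_id: int
    author: str
    created_at: float  # seconds since the epoch
    engagement: int    # e.g. likes + comments


def should_fanout(follower_count: int) -> bool:
    """Write path: fan out on write only for non-celebrity authors."""
    return follower_count < CELEBRITY_THRESHOLD


def score(post: Post, now: float, half_life_hours: float = 6.0) -> float:
    """Recency + engagement: a log-scaled engagement boost that halves every half_life_hours."""
    age_hours = max(0.0, (now - post.created_at) / 3600)
    return (1 + math.log1p(post.engagement)) * 0.5 ** (age_hours / half_life_hours)


def build_feed(pushed: Iterable[Post], followed_celebrities: Set[str],
               celebrity_posts: Dict[str, List[Post]], now: float,
               limit: int = 20) -> List[Post]:
    """Read path: merge the fanned-out feed with celebrity posts pulled at read time."""
    pulled = [p for c in followed_celebrities for p in celebrity_posts.get(c, [])]
    unique = {p.post_id: p for p in [*pushed, *pulled]}  # dedupe by post ID
    ranked = sorted(unique.values(), key=lambda p: score(p, now), reverse=True)
    return ranked[:limit]


now = 1_000_000.0
pushed = [Post(1, "alice", now - 3_600, 5), Post(2, "bob", now - 7_200, 50)]
celebrity_posts = {"star": [Post(3, "star", now - 600, 10_000)]}
print([p.post_id for p in build_feed(pushed, {"star"}, celebrity_posts, now)])  # [3, 2, 1]
```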
Talking Points During Interview
When introducing your design:
- "Let me outline the system at a high level..."
- "The key insight here is [insight]"
- "This design makes [requirement] easy"
When defending a choice:
- "We chose [option] because, given [constraint], it is the better fit"
- "The trade-off is [cost] for [benefit]"
- "This would change if [different constraint]"
When asked about scaling:
- "Currently [component] is the bottleneck"
- "We'd scale [direction] because [reason]"
- "This approach works until [limit], then we'd [next evolution]"
When asked about failure:
- "If [component] fails, [other component] takes over"
- "We'd detect it via [monitoring], then [recovery action]"
- "This is why we replicate [data/component]"
Red Flags to Avoid
- ❌ Diving into implementation details too early
- ❌ Not asking clarifying questions
- ❌ Designing for scale you don't need
- ❌ Making technology choices without justification
- ❌ Ignoring operational reality
- ❌ Treating consistency and availability as separate concerns
- ❌ Not discussing trade-offs
Do this instead:
- ✓ Start broad, add detail on request
- ✓ Ask clarifying questions upfront
- ✓ Design for the specified scale
- ✓ Justify technology choices
- ✓ Consider how humans will operate it
- ✓ Explicitly discuss trade-offs
- ✓ Show you understand what you don't know
Success Criteria
You're ready when you can:
- ✓ Clarify ambiguous requirements with good questions
- ✓ Outline architecture clearly on a whiteboard
- ✓ Explain each component's role
- ✓ Justify your technology choices
- ✓ Discuss trade-offs explicitly
- ✓ Handle "what if" questions with confidence
- ✓ Show understanding of operational reality
- ✓ Demonstrate Staff+ systems thinking