gh-lyndonkl-claude/skills/adr-architecture/resources/examples/database-selection.md

ADR-042: Use PostgreSQL for Primary Application Database

Status: Accepted
Date: 2024-01-15
Deciders: Backend team (Sarah, James, Alex), CTO (Michael), DevOps lead (Christine)
Related ADRs: ADR-015 (Data Model Design), ADR-051 (Read Replica Strategy - pending)

Context

Background

Our new SaaS platform for project management is scheduled to launch Q2 2024. We need to select a primary database that will store user data, projects, tasks, and collaboration information for the next 3-5 years.

Current situation:

  • Prototype uses SQLite (clearly insufficient for production)
  • Expected launch: 500 organizations, ~5,000 users
  • Growth projection: 10,000 organizations, ~100,000 users within 18 months
  • Data model is relational with complex queries (projects → tasks → subtasks → comments → attachments)

Requirements

Functional:

  • Support for complex relational queries with JOINs across 4-6 tables
  • ACID transactions (critical for billing and permissions)
  • Full-text search across project content
  • JSON support for flexible metadata fields
  • Row-level security for multi-tenant isolation

Non-Functional:

  • Handle 10,000 QPS at launch (mostly reads)
  • < 100ms p95 latency for queries
  • 99.9% uptime SLA
  • Support for read replicas (anticipated need as load approaches ~40k QPS)
  • Point-in-time recovery for disaster recovery

Constraints

  • Budget: $5,000/month maximum for database infrastructure
  • Team expertise: Strong SQL experience, limited NoSQL experience
  • Timeline: Must finalize in 2 weeks to stay on schedule
  • Compliance: SOC 2 Type II required (data encryption at rest/transit)
  • Existing stack: Node.js backend, React frontend, deploying on AWS

Decision

We will use PostgreSQL 15+ as our primary application database, hosted on AWS RDS with the following configuration:

Infrastructure:

  • AWS RDS PostgreSQL 15.x
  • Initially: db.r6g.xlarge instance (4 vCPU, 32GB RAM)
  • Multi-AZ deployment for high availability
  • Automated daily backups with 7-day retention
  • Point-in-time recovery enabled

Architecture:

  • Single primary database initially
  • Prepared for read replicas when QPS exceeds 40k (anticipated 12-18 months)
  • Connection pooling via PgBouncer (deployed on application servers)
  • Row-Level Security (RLS) policies for multi-tenancy
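
The PgBouncer deployment above will need tuning against real load, but a minimal pgbouncer.ini sketch (hostname and pool sizes are placeholder assumptions) might look like:

```ini
[databases]
; forward app connections to the RDS primary (placeholder endpoint)
appdb = host=appdb.xxxxx.us-east-1.rds.amazonaws.com port=5432 dbname=appdb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
; transaction pooling maximizes connection reuse for short queries,
; but session-level state (plain SET, prepared statements) won't survive
; across transactions
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
```

Note that transaction pooling interacts with the RLS approach: any per-tenant session setting must be transaction-scoped (e.g. `set_config(..., true)` inside a transaction) rather than set once per connection.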

Scope:

  • All application data (users, organizations, projects, tasks)
  • Session storage (using pgSession)
  • Background job queue (using pg-boss)
  • Excludes: Analytics data (separate data warehouse), file metadata (DynamoDB)
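
The RLS-based tenant isolation could be wired up roughly as follows; the table, column, and `app.current_org` setting names are illustrative assumptions, not a finalized schema:

```javascript
// Illustrative RLS policy DDL (as a string) plus a helper that binds the
// tenant id for the duration of one transaction. Names (tasks, org_id,
// app.current_org) are assumptions for this sketch.
const POLICY_DDL = `
  ALTER TABLE tasks ENABLE ROW LEVEL SECURITY;
  CREATE POLICY tenant_isolation ON tasks
    USING (org_id = current_setting('app.current_org', true)::bigint);
`;

// Runs fn inside a transaction with the tenant id bound transaction-locally
// (is_local = true), so the pattern stays correct behind PgBouncer in
// transaction pooling mode.
async function withTenant(client, orgId, fn) {
  await client.query('BEGIN');
  try {
    await client.query(
      "SELECT set_config('app.current_org', $1, true)",
      [String(orgId)]
    );
    const result = await fn(client);
    await client.query('COMMIT');
    return result;
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  }
}
```

A caller would wrap each request handler's queries in `withTenant(client, req.orgId, ...)` so every statement sees only that organization's rows.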

Alternatives Considered

MySQL 8.0

Description: Popular open-source relational database, strong AWS RDS support

Pros:

  • Team has some MySQL experience
  • Excellent AWS RDS integration
  • Strong replication support
  • Lower cost than commercial databases

Cons:

  • Weaker JSON support compared to PostgreSQL (JSON functions less mature)
  • Less robust constraint enforcement (e.g., CHECK constraints)
  • Full-text search less powerful than PostgreSQL's
  • InnoDB gap locking can cause unexpected contention under high concurrency

Why not chosen: PostgreSQL's superior JSON support is critical for our flexible metadata requirements. Our data model has complex constraints that PostgreSQL handles more elegantly.

MongoDB Atlas

Description: Managed NoSQL document database with flexible schema

Pros:

  • Excellent horizontal scalability
  • Flexible schema for evolving data model
  • Strong JSON/document support
  • Good full-text search

Cons:

  • Multi-document ACID transactions (added in 4.0) carry significant performance overhead and limitations (a concern for our billing logic)
  • Team has limited NoSQL experience (learning curve risk)
  • Reads from secondaries are only eventually consistent, complicating our consistency requirements
  • JOIN-like operations ($lookup) are slow and cumbersome
  • More expensive at our scale (~$7k/month vs $3k for PostgreSQL)

Why not chosen: Weak multi-document transaction support is a dealbreaker for billing and permission changes. Our relational data model doesn't fit the document paradigm well.

Amazon Aurora PostgreSQL

Description: AWS's PostgreSQL-compatible database with performance enhancements

Pros:

  • PostgreSQL compatibility with AWS optimizations
  • Better read scaling (15 read replicas vs 5)
  • Faster failover (< 30s vs 60-120s)
  • Continuous backup to S3

Cons:

  • 20-30% more expensive than RDS PostgreSQL
  • Some PostgreSQL extensions not supported
  • Vendor lock-in to AWS (harder to migrate to other clouds)
  • Adds complexity we don't need yet

Why not chosen: Premium cost not justified at our current scale. Standard RDS PostgreSQL meets our needs. We can migrate to Aurora later if needed (minimal code changes).

CockroachDB

Description: Distributed SQL database with PostgreSQL compatibility

Pros:

  • Horizontal scalability built-in
  • Multi-region support for global deployment
  • PostgreSQL wire protocol compatibility
  • Strong consistency guarantees

Cons:

  • Significantly more complex to operate (distributed systems expertise needed)
  • Higher latency for single-region workloads (consensus overhead)
  • Limited ecosystem compared to PostgreSQL
  • Team has zero distributed database experience
  • More expensive (~2-3x cost of RDS PostgreSQL)

Why not chosen: Operational complexity far exceeds our current needs. We're a single-region deployment for the foreseeable future. Can revisit if we expand globally.

Consequences

Benefits

Strong Data Integrity:

  • ACID transactions ensure billing accuracy and permission consistency
  • Robust constraint enforcement catches data errors at write-time
  • Foreign keys prevent orphaned records

Excellent Query Capabilities:

  • Complex JOINs perform well with proper indexing
  • Window functions enable sophisticated analytics
  • CTEs (Common Table Expressions) simplify complex query logic
  • Full-text search with GIN indexes for project content search

JSON Flexibility:

  • JSONB type allows flexible metadata without schema migrations
  • JSON operators enable querying nested structures efficiently
  • Balances schema enforcement (relations) with flexibility (JSON)
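
As an illustration of the JSONB querying above, a small helper (table and column names are assumptions) that builds a parameterized containment query for the `pg` driver:

```javascript
// Builds a parameterized query that finds tasks whose JSONB metadata
// contains the given fragment, using the @> containment operator, which
// can be served by a GIN index on the metadata column.
// Table/column names are assumptions for this sketch.
function metadataQuery(fragment) {
  return {
    text: 'SELECT id, title FROM tasks WHERE metadata @> $1::jsonb',
    values: [JSON.stringify(fragment)],
  };
}
```

For example, `metadataQuery({ priority: 'high' })` matches any task whose metadata includes that key/value pair, with no schema migration needed to introduce the field.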

Team Productivity:

  • Team's SQL expertise means fast development velocity
  • Mature ORM support (Sequelize, TypeORM) accelerates development
  • Extensive community resources and documentation
  • Familiar debugging and optimization tools

Operational Maturity:

  • AWS RDS handles backups, patching, monitoring automatically
  • Point-in-time recovery provides disaster recovery
  • Multi-AZ deployment ensures high availability
  • Well-understood scaling path (read replicas, connection pooling)

Cost Efficiency:

  • Estimated $3,000/month at launch scale (db.r6g.xlarge + storage), well within the $5k/month budget
  • Scales to ~$8,000/month with read replicas (at 100k users), which will require revisiting the $5k/month budget cap

Drawbacks

Vertical Scaling Limits:

  • Single primary database limits write throughput to one instance
  • At ~40k QPS, will need read replicas (adds operational complexity)
  • Ultimate write limit around 100k QPS even with largest instance
  • Mitigation: Implement caching (Redis) for read-heavy workloads
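
The caching mitigation would likely follow a standard cache-aside pattern; a minimal sketch, with an in-memory Map standing in for Redis (a real deployment would use a Redis client's GET/SET with an expiry):

```javascript
// Cache-aside: check the cache first, fall back to the loader (e.g. a
// SELECT against the database), then populate the cache with a TTL.
// A Map stands in for Redis in this sketch.
function makeCache(ttlMs = 30000) {
  const store = new Map(); // key -> { value, expiresAt }
  return {
    async getOrLoad(key, loader) {
      const hit = store.get(key);
      if (hit && hit.expiresAt > Date.now()) return hit.value; // cache hit
      const value = await loader(key); // cache miss: hit the database
      store.set(key, { value, expiresAt: Date.now() + ttlMs });
      return value;
    },
  };
}
```

The TTL trades staleness for load shed from the primary; hot read paths (project lists, permission lookups) are the natural first candidates.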

Sharding Complexity:

  • Horizontal partitioning (sharding) is manual and complex
  • If we exceed single-instance limits, migration to sharded setup is expensive
  • Not as straightforward as DynamoDB or Cassandra for horizontal scaling
  • Mitigation: Monitor growth carefully; consider Aurora or CockroachDB if needed

Replication Lag:

  • Read replicas have eventual consistency (typically 10-100ms lag)
  • Application must handle stale reads if using replicas
  • Some queries must route to primary for consistency
  • Mitigation: Use replicas only for analytics and non-critical reads
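
Handling replica lag typically means pinning a session to the primary briefly after it writes (read-your-writes); a minimal routing sketch, where the pin window is an assumed tuning knob, not a measured value:

```javascript
// Routes reads to a replica unless the session wrote recently, in which
// case reads go to the primary to avoid observing a stale replica.
// The 500ms default pin window is an assumption to be tuned against
// the replication lag actually observed.
function makeRouter({ primary, replica, pinMs = 500 }) {
  const lastWrite = new Map(); // sessionId -> timestamp of last write
  return {
    write(sessionId) {
      lastWrite.set(sessionId, Date.now());
      return primary; // writes always go to the primary
    },
    read(sessionId) {
      const t = lastWrite.get(sessionId);
      const pinned = t !== undefined && Date.now() - t < pinMs;
      return pinned ? primary : replica;
    },
  };
}
```

Queries that must never see stale data (billing, permission checks) would bypass this router and always target the primary.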

Backup Window:

  • Automated backups cause brief I/O pause (usually < 5s)
  • Scheduled during low-traffic window (3-4 AM PST)
  • Multi-AZ deployment minimizes impact
  • Mitigation: Accept brief latency spike during backup window

Risks

Performance Bottleneck:

  • Risk: Single database becomes bottleneck before we implement read replicas
  • Likelihood: Medium (depends on growth rate)
  • Mitigation: Implement aggressive caching (Redis) for frequently accessed data; monitor QPS weekly; prepare read replica configuration in advance

Data Migration Challenges:

  • Risk: If we need to migrate to different database, data size makes migration slow
  • Likelihood: Low (PostgreSQL should serve us for 3-5 years)
  • Mitigation: Regularly test backup/restore procedures; maintain clear data export processes

Team Scaling:

  • Risk: As team grows, need to train new hires on PostgreSQL specifics (RLS, JSONB)
  • Likelihood: High (we plan to grow team)
  • Mitigation: Document database patterns; create onboarding materials; conduct code reviews

Trade-offs Accepted

Trading horizontal scalability for operational simplicity: We're choosing a database that's simple to operate now but harder to scale horizontally later, accepting that we may need to re-architect in 3-5 years if we grow beyond single-instance limits.

Trading NoSQL flexibility for data integrity: We're prioritizing ACID guarantees and relational integrity over schema flexibility, accepting that schema migrations will be required for data model changes.

Trading vendor portability for convenience: AWS RDS lock-in is acceptable given the operational benefits. We could migrate to other managed PostgreSQL services (Google Cloud SQL, Azure) if needed, though with effort.

Implementation

Rollout Plan

Phase 1: Setup (Week 1-2)

  • Provision AWS RDS PostgreSQL instance
  • Configure VPC security groups and IAM roles
  • Set up automated backups and monitoring
  • Configure PgBouncer connection pooling

Phase 2: Migration (Week 3-4)

  • Migrate schema from SQLite prototype
  • Load seed data and test data
  • Performance test with simulated load
  • Configure monitoring alerts (CloudWatch, Datadog)

Phase 3: Launch (Q2 2024)

  • Deploy to production
  • Monitor query performance and optimize slow queries
  • Weekly capacity review for first 3 months

Success Criteria

Technical:

  • p95 query latency < 100ms (measured via APM)
  • Zero data integrity issues in first 6 months
  • 99.9% uptime achieved

Operational:

  • Team can confidently make schema changes
  • Backup/restore tested and verified monthly
  • On-call incidents < 2 per month related to database

Business:

  • Database costs remain under $5k/month through 10k users
  • Support 100k users without re-architecture

Future Considerations

Short-term (3-6 months):

  • Implement Redis caching for hot data paths
  • Tune connection pool settings based on actual load
  • Create read-only database user for analytics

Medium-term (6-18 months):

  • Add read replicas when QPS exceeds 40k
  • Implement query result caching
  • Consider Aurora migration if cost-benefit justifies

Long-term (18+ months):

  • Evaluate sharding strategy if approaching single-instance limits
  • Consider multi-region deployment for global users
  • Explore specialized databases for specific workloads (e.g., time-series data)

References