Module 1 · Foundations & Method · Day 001 · 25 min

What System Design Actually Is

Why interviews and real engineering both demand the same skill.


Memory hook

What System Design Actually Is: why interviews and real engineering both demand the same skill

Mental model

Frame the problem before drawing the system.

Recall anchors
Concerns · Inputs · Outputs

Why it matters

System design is the discipline of choosing the right shapes for a problem before you build it: which boxes exist, what the lines between them mean, where state lives, and how each part fails. It is the bridge between product requirements and code, and the place where most expensive mistakes get baked in.

Deep dive

Most engineering bugs are local: a wrong loop, a wrong type, a missing null check. System design bugs are global: a database that cannot shard, a queue with no backpressure, a write path that cannot survive a region failure. They are cheap to prevent on a whiteboard and expensive to fix in production, which is why interviews — and real launch reviews — keep returning to the same exercise.

A system design always answers four questions at once. How does it scale when traffic grows by 10x? How does it stay available when a node, zone, or region disappears? How much does it cost to run, and how does that change with growth? How easy is it to change — to ship a new feature, fix a bug, or migrate a datastore?

These four pull against each other. A design that scales infinitely is rarely the cheapest. A design that is most reliable is rarely the easiest to change. Good engineers learn to recognize the tradeoff curves and pick a point that matches the product, not the other way around.

Demo / scenario

A teammate says 'just use Postgres' for a feature that will see 50k writes per second of small events.

  1. Sketch the write path: client → API → Postgres single primary.
  2. Estimate: 50k writes/sec × ~500 bytes ≈ 25 MB/s sustained, ~2 TB/day.
  3. Recognize the bottleneck: a single primary will hit IOPS and WAL limits long before storage.
  4. Reframe the choice: append-mostly events fit a log (Kafka) or a wide-column store (Cassandra) better than a row store.
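The estimate in step 2 is worth doing explicitly. A minimal back-of-envelope sketch, using the same assumed numbers as the walkthrough (50k events/sec at ~500 bytes each):

```python
# Back-of-envelope estimate for the "just use Postgres" scenario.
# These numbers are the walkthrough's assumptions, not measurements.

WRITES_PER_SEC = 50_000   # small events per second
EVENT_BYTES = 500         # assumed ~500 bytes per event
SECONDS_PER_DAY = 86_400

# Sustained write bandwidth the primary must absorb.
sustained_mb_s = WRITES_PER_SEC * EVENT_BYTES / 1_000_000

# Raw event volume accumulated per day.
daily_tb = WRITES_PER_SEC * EVENT_BYTES * SECONDS_PER_DAY / 1_000_000_000_000

print(f"sustained: {sustained_mb_s:.0f} MB/s")  # 25 MB/s
print(f"per day:   {daily_tb:.2f} TB/day")      # 2.16 TB/day
```

The point of the exercise is not the exact figure but the order of magnitude: a single Postgres primary fsyncing 25 MB/s of small-row WAL traffic is stressed long before its disks fill up.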

Tradeoffs

  • Postgres is operationally simpler and the team knows it well.
  • A log/append store scales writes but adds query and join complexity.
  • Hybrid is common: events to a log, aggregates rolled into Postgres.
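The hybrid pattern in the last bullet can be sketched as a consumer that rolls raw events into coarse aggregates before they ever touch Postgres. Everything here is illustrative: the event shape, the per-minute bucket key, and the upsert SQL in the comment are assumptions, and the "log" is a plain list so the sketch stays self-contained.

```python
from collections import Counter

# Stand-in for a log consumer: in production these records would be read
# from Kafka; here they are a plain list so the sketch runs as-is.
raw_events = [
    {"user_id": 1, "ts": 1_700_000_000},
    {"user_id": 2, "ts": 1_700_000_010},
    {"user_id": 1, "ts": 1_700_000_065},
]

def minute_bucket(ts: int) -> int:
    """Truncate a unix timestamp to the start of its minute."""
    return ts - ts % 60

# Roll events up: one counter per minute instead of one row per event.
aggregates: Counter = Counter()
for event in raw_events:
    aggregates[minute_bucket(event["ts"])] += 1

# In production each entry would become a single Postgres upsert, e.g.:
#   INSERT INTO event_counts (minute, n) VALUES (%s, %s)
#   ON CONFLICT (minute) DO UPDATE SET n = event_counts.n + EXCLUDED.n;
print(dict(aggregates))
```

Postgres now sees one small upsert per minute per bucket rather than 50k inserts per second, while the full-fidelity events stay in the log for replay or batch analysis.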

Diagram

The four perennial concerns of any large system:

  • Scale: 10x traffic
  • Reliability: failure is normal
  • Cost: $ per request
  • Changeability: ship safely
