Back to roadmap
Module 6 · Relational Data at ScaleDay 05730 min

Sharding Strategies

When one DB isn't enough — split by key.

Day 057

Sharding Strategies

App
service
Shard 1
datastore
Shard 2
datastore
Shard 3
datastore
Signal path
Hash sharding by user_id
App
service
flow
Shard 1
datastore
App
service
flow
Shard 2
datastore
App
service
flow
Shard 3
datastore
Memory hook

Sharding Strategies: when one db isn't enough

Mental model

shape data so reads and writes stay honest

Design lens

Bucket scheme makes resharding much cheaper.

Recall anchors
StrategiesConcerns

Why it matters

Sharding partitions a dataset across multiple DBs by a key. Hash sharding spreads load uniformly; range sharding helps locality; directory sharding adds a lookup layer.

Deep dive

Hash: SHA(user_id) % N. Even distribution; range queries painful.

Range: by user_id ranges. Good for sequential access; risk hot ranges.

Directory: shard map service. Flexible, but the map is a SPOF.

Cross-shard queries are the hardest part — design to avoid.

Demo / scenario

Shard a 5 TB user table.

  1. Pick shard key = user_id (hash).
  2. Rebalance: bucket-based (1024 buckets across N shards).
  3. Add shards by reassigning buckets, not rehashing all rows.
  4. Cross-shard queries via fan-out + merge.

Tradeoffs

  • Bucket scheme makes resharding much cheaper.
  • Hot user can still saturate one shard.
  • Joins across shards: avoid or denormalize.

Diagram

App
Shard 1
Shard 2
Shard 3
Hash sharding by user_id.

Mind map

Check yourself

Loading quiz…

Sources & further reading