Module 6 · Relational Data at Scale · Day 057 · 30 min

Sharding Strategies

When one DB isn't enough, split the data by key.


Diagram: hash sharding by user_id. The App service routes each request to one of three datastores (Shard 1, Shard 2, Shard 3).

Memory hook

Sharding Strategies: when one DB isn't enough

Mental model

Shape data so reads and writes stay honest.

Design lens

Bucket scheme makes resharding much cheaper.

Recall anchors

Strategies · Concerns

Why it matters

Sharding partitions a dataset across multiple DBs by a key. Hash sharding spreads keys roughly uniformly across shards; range sharding keeps neighboring keys together, which helps locality; directory sharding adds a lookup layer that maps each key to its shard.

1. Compare hash, range, and directory sharding.
2. Pick a shard key that distributes load.
3. Plan for rebalancing.

Deep dive

Hash: shard = SHA(user_id) % N. Distribution is even, but range queries are painful because adjacent user_ids land on different shards.
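As a rough illustration of the hash rule, here is a minimal sketch. The function name `shard_for` and the choice of SHA-256 are assumptions for the example; the lesson only says "SHA".

```python
import hashlib
from collections import Counter


def shard_for(user_id: int, num_shards: int) -> int:
    """Map a user_id to a shard index in [0, num_shards)."""
    # A stable hash (not Python's built-in hash()) so every process agrees.
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce modulo N.
    return int.from_bytes(digest[:8], "big") % num_shards


if __name__ == "__main__":
    # Sequential user_ids scatter across shards, which is why range scans hurt.
    print(Counter(shard_for(uid, 3) for uid in range(1000)))
```

Note that with plain `% N`, changing N sends most keys to a different shard, which is the problem the bucket scheme in the demo addresses.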

Range: each shard owns a contiguous range of user_ids. Good for sequential access and range scans, but a range that receives most of the traffic becomes a hot shard.
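A range lookup can be sketched with a sorted list of boundaries and a binary search. The boundary values and function names below are made up for illustration.

```python
import bisect
from typing import List

# Hypothetical exclusive upper bounds: shard 0 holds [0, 1M), shard 1 holds [1M, 2M), etc.
RANGE_UPPER_BOUNDS = [1_000_000, 2_000_000, 3_000_000]


def range_shard_for(user_id: int) -> int:
    """Return the index of the shard whose range contains user_id."""
    if user_id < 0:
        raise ValueError("user_id must be non-negative")
    idx = bisect.bisect_right(RANGE_UPPER_BOUNDS, user_id)
    if idx >= len(RANGE_UPPER_BOUNDS):
        raise ValueError(f"user_id {user_id} is beyond the last range")
    return idx


def shards_for_range(lo: int, hi: int) -> List[int]:
    """Shards touched by a scan over user_ids lo..hi (inclusive)."""
    return list(range(range_shard_for(lo), range_shard_for(hi) + 1))
```

A scan over a narrow ID range touches only one or two shards here, whereas the same scan under hash sharding would likely touch all of them.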

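Directory: a shard map service records which shard holds each key. Flexible, because any key can be moved independently, but the map is a single point of failure (SPOF) unless it is replicated.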
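The directory idea can be sketched as an explicit key-to-shard map. The class below is a hypothetical in-memory stand-in; a real shard map would presumably live in a replicated store, since every request depends on it.

```python
class ShardDirectory:
    """In-memory stand-in for a shard map service (illustrative only)."""

    def __init__(self) -> None:
        self._map = {}  # user_id -> shard name

    def assign(self, user_id: int, shard: str) -> None:
        """Record (or change) which shard owns this user_id."""
        self._map[user_id] = shard

    def lookup(self, user_id: int) -> str:
        """Return the owning shard; fail loudly if the key is unmapped."""
        try:
            return self._map[user_id]
        except KeyError:
            raise KeyError(f"no shard assigned for user_id={user_id}") from None


if __name__ == "__main__":
    directory = ShardDirectory()
    directory.assign(42, "shard-1")
    directory.assign(42, "shard-3")  # moving one hot user is a single map update
    print(directory.lookup(42))
```

The flexibility shows up in `assign`: relocating a single key requires no change to any hashing or range rule, only a data copy and a map update.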

Cross-shard queries are the hardest part. Design the shard key and schema so that most queries touch a single shard.
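When a cross-shard query cannot be avoided, one common fallback is fan-out plus merge: send the same query to every shard, then combine the partial results. The sketch below uses in-memory lists as hypothetical shards and an invented "earliest signups" query; `query_shard` stands in for a real per-shard database call.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

# Hypothetical shards holding (signup_ts, user_id) rows.
SHARDS = [
    [(1, "u7"), (4, "u2"), (9, "u5")],
    [(2, "u1"), (3, "u8")],
    [(5, "u3"), (6, "u4"), (8, "u6")],
]


def query_shard(rows, limit):
    """Stand-in for 'ORDER BY signup_ts LIMIT limit' run on one shard."""
    return sorted(rows)[:limit]


def earliest_signups(limit):
    """Fan out to every shard in parallel, then merge the sorted partials."""
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partials = list(pool.map(lambda rows: query_shard(rows, limit), SHARDS))
    # Each partial is sorted, so a k-way merge yields the global order.
    return list(islice(heapq.merge(*partials), limit))
```

Every shard does work and the slowest one sets the latency, which is why such queries are worth designing around.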

Demo / scenario

Shard a 5 TB user table.

Pick shard key = user_id (hash).
Rebalance: bucket-based (1024 buckets across N shards).
Add shards by reassigning buckets, not rehashing all rows.
Cross-shard queries via fan-out + merge.
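The bucket scheme from the steps above can be sketched as two levels of mapping: user_id to bucket (fixed forever), and bucket to shard (changeable). Function names and the greedy rebalancing policy here are assumptions for illustration, not a prescribed algorithm.

```python
import hashlib
from collections import Counter
from typing import List

NUM_BUCKETS = 1024  # fixed for the lifetime of the dataset


def bucket_for(user_id: int) -> int:
    """user_id -> bucket; this mapping never changes."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


def initial_assignment(num_shards: int) -> List[int]:
    """bucket -> shard, dealt round-robin across the starting shards."""
    return [bucket % num_shards for bucket in range(NUM_BUCKETS)]


def add_shard(assignment: List[int]) -> List[int]:
    """Add one shard, moving only enough buckets to even out the counts."""
    new_shard = max(assignment) + 1
    target = NUM_BUCKETS // (new_shard + 1)
    result = list(assignment)
    counts = Counter(result)
    for bucket in range(NUM_BUCKETS):
        if counts[new_shard] >= target:
            break
        donor = result[bucket]
        if counts[donor] > target:  # take only from over-full shards
            result[bucket] = new_shard
            counts[donor] -= 1
            counts[new_shard] += 1
    return result


def shard_for(user_id: int, assignment: List[int]) -> int:
    return assignment[bucket_for(user_id)]
```

Going from 3 to 4 shards with this sketch reassigns 256 of the 1024 buckets, so roughly a quarter of the rows move and every other row stays where it is.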

Tradeoffs

Bucket scheme makes resharding much cheaper.
Hot user can still saturate one shard.
Joins across shards: avoid or denormalize.

Diagram

Hash sharding by user_id.

Sources & further reading

Vitess sharding
