Module 2 · Scale, Latency, Capacity · Day 014 · 25 min

Identifying Bottlenecks

There is exactly one bottleneck at any moment — find it.

Memory hook

Identifying Bottlenecks: there is exactly one bottleneck at any moment

Mental model

Make the invisible limits visible.

Design lens

Bigger pool risks DB saturation.

Recall anchors
Compute · Memory · I/O

Why it matters

Performance work is a hunt. A system has many resources, but at any throughput one of them is the binding constraint. The USE method — measure utilization, saturation, errors per resource — is the fastest way to find it.
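
A minimal sketch of a USE pass over a set of resources, to make the triage concrete. The `ResourceReading` type, the `find_suspects` helper, the 0.8 utilization threshold, and the sample numbers are all assumptions for illustration; real readings would come from your own monitoring.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ResourceReading:
    name: str
    utilization: float  # fraction of time the resource was busy, 0.0 to 1.0
    saturation: float   # queued work, e.g. run-queue length or pool waiters
    errors: int         # error events seen in the sample window


def find_suspects(readings: List[ResourceReading],
                  util_threshold: float = 0.8) -> List[ResourceReading]:
    """Return resources worth investigating, most suspicious first.

    Assumed ranking: anything with queued work comes first (demand already
    exceeds capacity), then higher utilization, then more errors.
    """
    suspects = [
        r for r in readings
        if r.errors > 0 or r.saturation > 0 or r.utilization >= util_threshold
    ]
    return sorted(
        suspects,
        key=lambda r: (r.saturation > 0, r.utilization, r.errors),
        reverse=True,
    )


# Hypothetical readings for one sample window.
readings = [
    ResourceReading("cpu", utilization=0.60, saturation=0, errors=0),
    ResourceReading("memory", utilization=0.55, saturation=0, errors=0),
    ResourceReading("disk", utilization=0.85, saturation=0, errors=0),
    ResourceReading("db_pool", utilization=1.00, saturation=12, errors=3),
]
suspect_names = [r.name for r in find_suspects(readings)]
```

With these sample numbers the pool is ranked ahead of the disk because it has work queued, not just high utilization. The ranking rule is a judgment call; saturation units differ per resource, so the sketch only compares whether any queue exists.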

Deep dive

CPU saturated → profile with flame graphs; the usual culprit is an inefficient algorithm or a hot loop.

Memory under pressure → look for long GC pauses, swapping, and OOM kills.

Disk I/O saturated → iostat -x shows %util near 100%; investigate WAL writes, missing indexes, and page cache misses.

Network → check TCP retransmits, NIC saturation, and unexpected cross-AZ traffic.

Locks/contention → high tail latency with no saturated resource often points here.

Demo / scenario

API p99 spikes at noon, p50 unchanged.

  1. Check CPU: average 60%, no spike.
  2. Check DB: connection pool exhausted at noon.
  3. Cause: a slow report query holds connections.
  4. Fix: separate pool for analytics or move report to read replica.
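
The scenario above can be checked with back-of-envelope arithmetic using Little's law: the average number of connections in use equals the request rate times the time each request holds a connection. The rates, hold times, and pool size below are made-up numbers chosen for illustration, not measurements from the scenario.

```python
def connections_in_use(requests_per_second: float, hold_seconds: float) -> float:
    """Little's law: average concurrent connections = arrival rate * hold time."""
    return requests_per_second * hold_seconds


POOL_SIZE = 20  # hypothetical shared pool size

# Hypothetical steady API traffic: 40 req/s, each holding a connection 0.25 s.
api_demand = connections_in_use(40, 0.25)

# Hypothetical noon report: one query every 2 s, each holding a connection 30 s.
report_demand = connections_in_use(0.5, 30)

total_demand = api_demand + report_demand
pool_exhausted = total_demand > POOL_SIZE
```

Under these assumptions the API alone needs about 10 connections, while a trickle of slow report queries needs about 15, which pushes the total past the pool size. A low-rate workload with a long hold time can dominate a pool even though it barely registers on CPU graphs.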

Tradeoffs

  • A bigger pool relieves the application but risks saturating the database.
  • A read replica offloads the report but adds replication lag.
  • An async report avoids the blocking call entirely.
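
A rough sketch of the "separate pool for analytics" fix, using semaphores as stand-ins for real connection pools. `BoundedPool`, `PoolExhausted`, the pool sizes, and the placeholder connection strings are hypothetical; a real service would use its database driver's own pooling.

```python
import threading
from contextlib import contextmanager


class PoolExhausted(Exception):
    """Raised when no connection slot frees up within the timeout."""


class BoundedPool:
    def __init__(self, name: str, size: int):
        self.name = name
        self._slots = threading.BoundedSemaphore(size)

    @contextmanager
    def connection(self, timeout: float = 0.1):
        # Wait up to `timeout` seconds for a free slot, then give up.
        if not self._slots.acquire(timeout=timeout):
            raise PoolExhausted(f"{self.name} pool exhausted")
        try:
            yield f"{self.name}-conn"  # placeholder for a real DB connection
        finally:
            self._slots.release()


api_pool = BoundedPool("api", size=2)
analytics_pool = BoundedPool("analytics", size=1)


def demo():
    # A slow report holds the only analytics slot for the whole block.
    with analytics_pool.connection():
        try:
            with analytics_pool.connection(timeout=0.01):
                second_report = "ran"
        except PoolExhausted:
            second_report = "rejected"
        # API traffic draws from its own pool, so it is unaffected.
        with api_pool.connection():
            api_request = "served"
    return second_report, api_request
```

Calling `demo()` shows the isolation: a second report is turned away while the first is running, but the API request still gets a connection. The cost is that the combined pool sizes still have to fit within what the database can handle.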

Diagram

[Diagram: CPU, Memory, Disk, Network, and Locks, each checked with USE (Util/Sat/Err). Caption: USE method per resource.]
