Module 2 · Scale, Latency, Capacity · Day 014 · 25 min

Identifying Bottlenecks

There is exactly one bottleneck at any moment — find it.

Memory hook

Identifying Bottlenecks: there is exactly one bottleneck at any moment

Mental model

Make the invisible limits visible.

Design lens

Bigger pool risks DB saturation.

Recall anchors

Compute · Memory · I/O

Why it matters

Performance work is a hunt. A system has many resources, but at any given load one of them is the binding constraint: the resource that caps throughput first. The USE method (measure utilization, saturation, and errors for each resource) is the fastest way to find it.

1. Apply the USE method (Utilization, Saturation, Errors).
2. Know the canonical bottlenecks: CPU, memory, disk, network, locks.
3. Recognize symptoms vs. causes.
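A minimal sketch of a USE pass over the canonical resources, to make the checklist concrete. The `UseReading` type, the 0.8 utilization threshold, and the sample numbers are all illustrative assumptions; in practice the readings would come from your own monitoring.

```python
from dataclasses import dataclass


@dataclass
class UseReading:
    resource: str
    utilization: float  # fraction of capacity in use, 0.0 to 1.0
    saturation: float   # queued or waiting work (0 means nothing is waiting)
    errors: int         # error events seen in the sampling window


def rank_suspects(readings, util_threshold=0.8):
    """Return flagged resources, most suspicious first.

    A resource is flagged if it reports errors, has queued work,
    or is at or above the (assumed) utilization threshold.
    """
    def score(r):
        # Errors outrank saturation, which outranks high utilization.
        return (r.errors > 0, r.saturation > 0,
                r.utilization >= util_threshold, r.utilization)

    flagged = [
        r for r in readings
        if r.errors > 0 or r.saturation > 0 or r.utilization >= util_threshold
    ]
    return sorted(flagged, key=score, reverse=True)


# Hypothetical sample: every host resource looks healthy, the pool does not.
readings = [
    UseReading("cpu", 0.60, 0.0, 0),
    UseReading("memory", 0.70, 0.0, 0),
    UseReading("disk", 0.35, 0.0, 0),
    UseReading("network", 0.20, 0.0, 0),
    UseReading("db_connection_pool", 1.00, 42.0, 3),
]
suspects = rank_suspects(readings)
print([r.resource for r in suspects])
```

Treating a connection pool as one more "resource" is a deliberate choice here: USE works on anything with capacity and a queue, not only hardware.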

Deep dive

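CPU saturated → profile with flame graphs; the usual culprit is an inefficient algorithm or a hot loop.

Memory pressure → look for long GC pauses, swapping, and OOM kills.

Disk I/O saturated → iostat shows utilization near 100%, typically alongside growing queue depth and wait times; investigate WAL writes, indexes, and page cache behavior.

Network → check TCP retransmits, NIC saturation, and unexpected cross-AZ traffic.

Locks/contention → high tail latency with no resource saturated often points here.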
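The contention signature can be checked mechanically: compare the median and the tail against a baseline. The sketch below uses a nearest-rank percentile; the function names, the 2x tail ratio, and the 1.2x median tolerance are assumptions chosen for illustration, not standard values.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile; p is an integer in 1..100."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def tail_only_regression(baseline, current, tail_ratio=2.0, median_tolerance=1.2):
    """True when p99 got much worse while p50 stayed roughly flat.

    That pattern suggests a few requests are waiting on something
    (a lock, a pool, a queue) rather than everything running slower.
    """
    p50_before, p99_before = percentile(baseline, 50), percentile(baseline, 99)
    p50_now, p99_now = percentile(current, 50), percentile(current, 99)
    return (p99_now >= tail_ratio * p99_before
            and p50_now <= median_tolerance * p50_before)


# Hypothetical latencies in milliseconds, 100 samples each.
baseline = [20] * 98 + [40, 45]
contended = [20] * 95 + [900] * 5   # most requests fine, a few blocked
uniformly_slow = [60] * 100         # everything slower: a different problem

print(tail_only_regression(baseline, contended))
print(tail_only_regression(baseline, uniformly_slow))
```

If the check returns true and USE shows no saturated CPU, memory, disk, or network, locks and pools are the next place to look.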

Demo / scenario

API p99 latency spikes at noon while p50 stays unchanged.

Check CPU: average 60%, no spike.
Check DB: connection pool exhausted at noon.
Cause: a slow report query holds connections for a long time, so some API requests queue waiting for a free one.
Fix: give analytics a separate pool, or move the report to a read replica.
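A back-of-the-envelope check of the scenario using Little's law (average connections in use = arrival rate × time each connection is held). Every number below, including the pool size of 20, is a made-up assumption to show the arithmetic; the scenario itself gives no figures.

```python
def avg_connections_held(requests_per_sec, hold_ms):
    """Little's law: average in-flight = arrival rate x hold time."""
    return requests_per_sec * hold_ms / 1000


POOL_SIZE = 20  # hypothetical shared pool

# Hypothetical workloads: many fast API queries, a few slow report queries.
api_load = avg_connections_held(requests_per_sec=200, hold_ms=20)
report_load = avg_connections_held(requests_per_sec=2, hold_ms=9000)

total = api_load + report_load
pool_exhausted = total > POOL_SIZE

print(f"API alone needs about {api_load:.0f} connections")
print(f"Reports alone need about {report_load:.0f} connections")
print(f"Together: {total:.0f} of {POOL_SIZE}, exhausted={pool_exhausted}")
```

Under these assumed numbers the API on its own barely touches the pool; two slow reports per second are what push demand past capacity, which is why isolating them fixes the API tail.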

Tradeoffs

Bigger pool risks DB saturation.
Read replica adds replication lag.
Async report avoids the blocking call entirely.
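One way to picture the separate-pool option without growing the total connection count: give analytics its own small, capped pool so reports queue behind each other instead of behind API traffic. This is a toy sketch only; `Pool` is a hypothetical stand-in built on `asyncio.Semaphore`, `asyncio.sleep` stands in for query time, and the sizes are arbitrary.

```python
import asyncio


class Pool:
    """Toy connection pool: a semaphore plus a high-water mark."""

    def __init__(self, size):
        self._sem = asyncio.Semaphore(size)
        self.in_use = 0
        self.peak = 0

    async def run(self, seconds):
        async with self._sem:  # wait here if the pool is exhausted
            self.in_use += 1
            self.peak = max(self.peak, self.in_use)
            try:
                await asyncio.sleep(seconds)  # simulated query time
            finally:
                self.in_use -= 1


async def main():
    api_pool = Pool(size=8)        # hypothetical sizes
    analytics_pool = Pool(size=2)

    reports = [analytics_pool.run(0.05) for _ in range(6)]   # slow queries
    api_calls = [api_pool.run(0.001) for _ in range(50)]     # fast queries
    await asyncio.gather(*reports, *api_calls)
    return api_pool.peak, analytics_pool.peak


api_peak, analytics_peak = asyncio.run(main())
print(f"api peak={api_peak}, analytics peak={analytics_peak}")
```

The database never sees more than 8 + 2 connections from this service, and slow reports can only ever occupy the two analytics slots.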

Diagram

USE method per resource.

Sources & further reading

USE method — Brendan Gregg
