Module 9 · Protocols, Security, Observability · Day 089 · 25 min

Alerting: Symptom vs Cause

Page on user pain; chart on suspect causes.


Memory hook

Alerting: Symptom vs Cause: page on user pain

Mental model

Design for the day something breaks.

Design lens

Burn-rate alerts catch slow-degradation events too.

Recall anchors
Symptom (page) · Cause (chart) · Burn rate

Why it matters

Pages are expensive: they wake people up. Page only on user-visible symptoms tied to SLOs. Cause-based signals belong on dashboards, where investigators consult them after a symptom alert fires.
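That routing rule can be sketched as a tiny policy function. Everything here is hypothetical (the `Signal` record, the `Kind` labels, the signal names); in practice the symptom/cause distinction usually lives in alert-rule labels and notification-routing configuration rather than in application code.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    SYMPTOM = "symptom"  # user-visible and tied to an SLO (e.g. error-budget burn)
    CAUSE = "cause"      # internal suspect (CPU, GC pauses, DB latency)


@dataclass(frozen=True)
class Signal:
    name: str
    kind: Kind
    firing: bool


def route(signal: Signal) -> str:
    """Decide where a signal goes: only firing symptoms wake a human."""
    if not signal.firing:
        return "none"
    return "page" if signal.kind is Kind.SYMPTOM else "dashboard"


signals = [
    Signal("checkout-slo-burn", Kind.SYMPTOM, firing=True),
    Signal("db-cpu-high", Kind.CAUSE, firing=True),
    Signal("gc-pause-long", Kind.CAUSE, firing=False),
]
for s in signals:
    print(f"{s.name}: {route(s)}")
```

Note that a firing cause still has somewhere to go; it just does not interrupt anyone.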

Deep dive

Burn-rate alerts come in two speeds: a fast burn (1h window) for major incidents and a slow burn (24h window) for chronic budget consumption.
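Burn rate is the observed error ratio divided by the error budget (1 − SLO target): at a burn rate of 1 the budget lasts exactly the SLO period. A minimal sketch, assuming a 30-day SLO period; the helper names are hypothetical, the 2%-of-budget-in-1h fast-burn choice follows the example in Google's SRE Workbook, and the 10%-in-24h slow-burn choice is an illustrative assumption.

```python
def burn_rate(bad_events: int, total_events: int, slo_target: float) -> float:
    """Observed error ratio divided by the allowed error ratio (1 - SLO)."""
    if total_events == 0:
        return 0.0  # no traffic in the window: nothing burned
    return (bad_events / total_events) / (1.0 - slo_target)


def hours_until_exhausted(rate: float, slo_period_hours: float) -> float:
    """At a constant burn rate, how long the whole budget lasts."""
    return float("inf") if rate == 0 else slo_period_hours / rate


def threshold_for(budget_fraction: float, window_hours: float,
                  slo_period_hours: float) -> float:
    """Burn rate that spends `budget_fraction` of the budget within `window_hours`."""
    return budget_fraction * slo_period_hours / window_hours


PERIOD_HOURS = 30 * 24  # assumed 30-day SLO period
fast = threshold_for(0.02, window_hours=1, slo_period_hours=PERIOD_HOURS)
slow = threshold_for(0.10, window_hours=24, slo_period_hours=PERIOD_HOURS)

print(f"fast-burn threshold: {fast:.1f}")  # 14.4
print(f"slow-burn threshold: {slow:.1f}")  # 3.0
print(f"at {fast:.1f}x the budget lasts {hours_until_exhausted(fast, PERIOD_HOURS):.0f}h")  # 50h
print(f"current burn: {burn_rate(30, 10_000, slo_target=0.999):.1f}")  # 3.0
```

The fast threshold is high because a 1h window must catch a budget-threatening incident quickly; the slow threshold is low because 24h of modest overspend adds up just as surely.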

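A two-window alert reduces flapping: it fires only when both the 5m and the 1h burn rates exceed the threshold. The 1h window confirms the burn is sustained; the 5m window lets the alert clear soon after the problem stops.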
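A minimal sketch of the two-window condition over per-minute (bad, total) counts. The class name, the 99.9% SLO, the threshold of 14.4, and the synthetic traffic are assumptions for illustration; production systems typically evaluate this as a query in the monitoring system rather than in-process.

```python
from collections import deque


class TwoWindowBurnAlert:
    """Fires only while both the short and the long window burn above the threshold."""

    def __init__(self, slo_target: float, threshold: float,
                 short_minutes: int = 5, long_minutes: int = 60):
        self.budget = 1.0 - slo_target  # allowed error ratio
        self.threshold = threshold
        self.short_minutes = short_minutes
        self.long_minutes = long_minutes
        # One (bad, total) sample per minute; older samples fall off automatically.
        self.samples = deque(maxlen=long_minutes)

    def _burn(self, minutes: int) -> float:
        # Until enough samples exist, this covers whatever history is available.
        recent = list(self.samples)[-minutes:]
        bad = sum(b for b, _ in recent)
        total = sum(t for _, t in recent)
        return 0.0 if total == 0 else (bad / total) / self.budget

    def observe(self, bad: int, total: int) -> bool:
        """Record one minute of traffic; return True if the alert is firing."""
        self.samples.append((bad, total))
        return (self._burn(self.short_minutes) > self.threshold
                and self._burn(self.long_minutes) > self.threshold)


def run(minutes: list[tuple[int, int]]) -> list[bool]:
    alert = TwoWindowBurnAlert(slo_target=0.999, threshold=14.4)
    return [alert.observe(bad, total) for bad, total in minutes]


HEALTHY, FAILING = (1, 10_000), (500, 10_000)  # synthetic per-minute traffic
blip = run([HEALTHY] * 60 + [FAILING] * 5 + [HEALTHY] * 10)
outage = run([HEALTHY] * 60 + [FAILING] * 30 + [HEALTHY] * 10)

print("blip paged:", any(blip))                                      # False
print("outage paged on failing minute", outage.index(True) - 59)     # 18
print("alert cleared on healthy minute", outage[90:].index(False) + 1)  # 4
```

With these numbers the 1h window keeps the 5-minute blip from paging (its 1h burn peaks near 4.3), while the 5m window clears the outage alert four minutes after recovery even though the 1h burn is still above the threshold.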

Alert fatigue is a leading reason real outages get missed: when most pages are noise, responders learn to ignore them. Tune ruthlessly.

Demo / scenario

A p99 latency alert keeps flapping.

  1. Replace it with an SLO burn-rate alert.
  2. Tune the two-window condition: 5m and 1h.
  3. Pages drop by 80%; real incidents still page.
  4. Cause-based dashboards remain for investigation.
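Step 1 needs the latency signal expressed as good and bad events so it can feed a burn rate, and step 3 needs a way to count pages. A minimal sketch of both, assuming a hypothetical 300 ms latency target and synthetic p99 samples; the helper names, target, and data are illustrative, not taken from the scenario.

```python
def latency_bad_events(latencies_ms: list[float], target_ms: float) -> tuple[int, int]:
    """Latency SLI: a request counts as bad if it is slower than the target."""
    bad = sum(1 for ms in latencies_ms if ms > target_ms)
    return bad, len(latencies_ms)


def pages_sent(firing: list[bool]) -> int:
    """Count not-firing -> firing transitions; each one notifies the on-call once."""
    pages, previous = 0, False
    for now in firing:
        if now and not previous:
            pages += 1
        previous = now
    return pages


# Synthetic per-minute p99 values hovering around a hypothetical 300 ms static threshold.
p99_ms = [290, 310, 295, 305, 298, 302]
static_rule = [p > 300 for p in p99_ms]
print("static p99 rule pages:", pages_sent(static_rule))  # 3

# The same latency target as an SLI: (bad, total) can feed a burn-rate calculation.
print(latency_bad_events([120, 180, 450, 90, 310], target_ms=300))  # (2, 5)
```

A static threshold on a noisy percentile re-pages every time it crosses back over the line; counting bad requests against a budget smooths that noise into a rate.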

Tradeoffs

  • Burn-rate alerts catch slow-degradation events too.
  • Multi-window adds complexity.
  • Tuning needs historical data.
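One way to use historical data for tuning is to replay recorded burn rates against candidate thresholds and count false alarms and missed incident hours. A minimal sketch; the `backtest` helper, the hourly burn-rate series, and the incident labels are all hypothetical, and real tuning would use your own recorded SLI data and postmortem records.

```python
def backtest(hourly_burn: list[float], incident_hours: set[int],
             threshold: float) -> dict[str, int]:
    """Replay recorded burn rates against one candidate threshold."""
    fired = {hour for hour, burn in enumerate(hourly_burn) if burn > threshold}
    return {
        "firing_hours": len(fired),
        "false_alarm_hours": len(fired - incident_hours),      # fired outside a real incident
        "missed_incident_hours": len(incident_hours - fired),  # incident hours that stayed quiet
    }


# Hypothetical history: ten hours of burn rates, with a real incident in hours 4 and 5.
history = [0.2, 0.5, 3.1, 0.4, 16.0, 22.5, 0.3, 6.8, 0.1, 0.2]
incidents = {4, 5}

for candidate in (2.0, 14.4, 20.0):
    print(candidate, backtest(history, incidents, candidate))
```

On this toy history, 2.0 fires on two harmless hours, 20.0 misses hour 4 of the incident, and 14.4 fires only during the incident.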

Diagram

Symptom: SLO burn → Page
Causes (CPU, GC, DB) → Dashboard
Symptom alerts on top, causes underneath.
