Module 1 · Foundations & Method · Day 004 · 25 min

SLA, SLO, SLI, and Error Budgets

How you express 'reliable enough' in numbers — and govern by them.

Memory hook

SLA, SLO, SLI, and Error Budgets: how you express 'reliable enough' in numbers

Mental model

Frame the problem before drawing the system.

Design lens

Stricter SLO = less velocity, more reliability work.

Recall anchors
SLI · SLO · SLA

Why it matters

An SLI is what you measure (success rate, latency). An SLO is the target you commit to internally (99.9% success). An SLA is the contract you sign externally with a customer, usually with refunds attached. The error budget is 1 − SLO: the fraction of events allowed to fail within the SLO window. Spend it on velocity until it runs out.

Deep dive

Pick SLIs that match user experience, not server health. 'Successful checkout in under 1s' beats 'CPU < 80%'. SLIs should be expressible as 'good events / valid events'.
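As a minimal sketch of the 'good events / valid events' form, the snippet below computes a checkout SLI from a list of events. The `CheckoutEvent` fields, the `synthetic` flag, and the 1-second threshold are illustrative assumptions, not part of any particular monitoring system.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class CheckoutEvent:
    # Hypothetical event shape, for illustration only.
    ok: bool                 # did the checkout complete successfully?
    latency_s: float         # end-to-end latency seen by the user, in seconds
    synthetic: bool = False  # probes and load tests are not valid user events


def checkout_sli(events: Iterable[CheckoutEvent],
                 max_latency_s: float = 1.0) -> Optional[float]:
    """Return good events / valid events, or None when there are no valid events."""
    valid = [e for e in events if not e.synthetic]
    if not valid:
        return None
    good = sum(1 for e in valid if e.ok and e.latency_s < max_latency_s)
    return good / len(valid)


events = [
    CheckoutEvent(ok=True, latency_s=0.4),
    CheckoutEvent(ok=True, latency_s=1.7),   # succeeded, but too slow to be "good"
    CheckoutEvent(ok=False, latency_s=0.2),  # failed
    CheckoutEvent(ok=True, latency_s=0.3),
    CheckoutEvent(ok=True, latency_s=0.1, synthetic=True),  # excluded from both counts
]
sli = checkout_sli(events)
print(sli)  # 0.5 (2 good out of 4 valid)
```

Deciding what counts as a valid event is part of the SLI definition: here synthetic traffic is excluded so that probes cannot inflate the ratio.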

Set SLO targets at the level users actually need, not above it. Aiming at 100% is a sign of inexperience: it's expensive, blocks experimentation, and trains users to expect what you cannot sustainably deliver.

Error budgets translate reliability into product velocity. If the SLO is 99.9% over 30 days, the budget is 0.1% of the window: about 43 minutes of full downtime per month. While budget remains, ship faster and take risks. Once it is exhausted, freeze risky changes and put reliability work first. This is the SRE control loop.
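The downtime figure follows directly from (1 − SLO) × window. A small sketch, assuming a time-based availability SLO and treating any outage as total; `downtime_budget` is a hypothetical helper name:

```python
from datetime import timedelta


def downtime_budget(slo: float, window: timedelta) -> timedelta:
    """Allowed full-outage time for an availability SLO over a window."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    # Error budget = 1 - SLO, applied to the length of the window.
    return window * (1.0 - slo)


window = timedelta(days=30)
for slo in (0.99, 0.999, 0.9995, 0.9999):
    minutes = downtime_budget(slo, window).total_seconds() / 60
    print(f"{slo:.2%} -> {minutes:.1f} minutes per 30 days")
```

Each extra nine cuts the budget by a factor of ten, which is why tightening a target is rarely a free decision.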

Demo / scenario

A service has a 99.95% checkout SLO. Over the last 30 days it achieved 99.91% success.

  1. SLI: successful_checkouts / valid_checkouts.
  2. SLO: 99.95% over 30 days → budget = 0.05% × ~1M reqs = 500 errors.
  3. Actual: 0.09% errors = 900 errors.
  4. Budget overspent by 400 → freeze risky deploys, prioritize root-cause fixes.
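The four steps above can be reproduced with a few lines of arithmetic. This is a sketch under the scenario's numbers (about 1M valid requests, 900 failures); `budget_report` and its return keys are hypothetical names, and `Fraction` is used only to keep the percentages exact.

```python
from fractions import Fraction
from math import floor


def budget_report(slo: str, valid_events: int, bad_events: int) -> dict:
    """Compare observed failures with the error budget implied by the SLO."""
    target = Fraction(slo)  # "0.9995" parses exactly, with no float rounding
    allowed_bad = floor((1 - target) * valid_events)  # budget in events
    remaining = allowed_bad - bad_events              # negative means overspent
    return {
        "sli": float(Fraction(valid_events - bad_events, valid_events)),
        "allowed_bad": allowed_bad,
        "remaining": remaining,
        "action": "freeze risky deploys" if remaining < 0 else "keep shipping",
    }


report = budget_report("0.9995", valid_events=1_000_000, bad_events=900)
print(report["allowed_bad"])  # 500
print(report["remaining"])    # -400
print(report["action"])       # freeze risky deploys
```

Counting the budget in events rather than percentages makes the overspend concrete: 400 failed checkouts beyond what the SLO allows.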

Tradeoffs

  • Stricter SLO = less velocity, more reliability work.
  • Looser SLO = more shipping velocity, but more user-visible failures and a risk of losing customers.
  • Per-region SLOs reveal failures hidden by global averages.
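To illustrate the last point, here is a sketch in which the global SLI meets a 99.9% target while one region clearly misses it. The region names and counts are made up for the example.

```python
from fractions import Fraction

SLO = Fraction("0.999")  # 99.9% target, kept exact

# region -> (valid_events, bad_events); illustrative numbers only
counts = {
    "us-east": (600_000, 300),
    "eu-west": (350_000, 175),
    "ap-south": (50_000, 400),
}


def sli(valid: int, bad: int) -> Fraction:
    """Good events / valid events as an exact fraction."""
    return Fraction(valid - bad, valid)


total_valid = sum(v for v, _ in counts.values())
total_bad = sum(b for _, b in counts.values())

global_ok = sli(total_valid, total_bad) >= SLO
breaching = [region for region, (v, b) in counts.items() if sli(v, b) < SLO]

print(global_ok)   # True: 99.9125% overall
print(breaching)   # ['ap-south']: 99.2% in the smallest region
```

The small region contributes too little traffic to move the global ratio, so only a per-region view shows that its users are having a bad month.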

Diagram

Users → SLI (measure) → SLO (target) → Error Budget (1 − SLO)
From SLI measurements to SLO target to error budget governance.
