bluecloud.dev

SRE: Running production.

SLOs and error budgets, observability that gets read during incidents, on-call practice, post-incident review, and the reduction of toil to recover engineering time.

Articles

  1. A service-level objective is not a number on a dashboard. It answers the question 'how reliable is reliable enough, and for whom?' Without that framing, the math becomes reliability theater.