One of the core tenets of Site Reliability Engineering (SRE) is that blameless postmortems (or retrospectives) should be held for on-call incidents. It’s part of the continuous improvement process, where we learn from what went wrong and try to create processes to ensure it doesn’t happen again. Very explicitly, it is not about blaming anyone…
Category: Reliability
Please don’t
A fresh cup mentions the Ruby on Rails exception notifier plugin. The idea is that every time an exception is raised in your code, you get an email. This is such a horrible idea that I need to take the time to comment. As someone who spends all his time dealing with large deployments of…