On April 23 at 22:37 UTC, an operations error unintentionally cleared the cached routing information that our routers need to reach customers' application containers. The routers required time to repopulate their caches. Error rates for uncached customer page requests improved linearly over the course of the event. Cached customer page requests were unaffected.
We have since 1) corrected operations documentation, 2) informed all operators of the changes, 3) identified a number of ways we can protect against a similar incident in the future, and 4) committed to evaluating those options and making further improvements in our upcoming sprints.