Infrastructure Issue Affecting Customer Sites
Incident Report for Pantheon Operations
Postmortem

On Sept 24th from approximately 8:30 AM to 9:20 AM PT, an internal test led to an incident in our edge routing service that affected roughly 25% of sites. Affected sites experienced increased latency for uncached requests, an increased rate of HTTP/5xx errors, and timeouts on some requests. We are scheduling the implementation of several improvements to help mitigate impact from a similar event in the future. Additionally we are working to incorporate the test that led to the discovery of these issues into our edge routing service test suite.

Posted Sep 28, 2020 - 18:00 PDT

Resolved
This incident has been resolved.
Posted Sep 24, 2020 - 10:36 PDT
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Sep 24, 2020 - 09:54 PDT
Identified
The issue has been identified and a fix is being implemented.
Posted Sep 24, 2020 - 09:31 PDT
Investigating
We are addressing an infrastructure failure that is affecting a small portion of customer sites.
Posted Sep 24, 2020 - 08:54 PDT
This incident affected: Customer Sites.