Infrastructure Issue Affecting Customer Sites
Incident Report for Pantheon Operations
Postmortem

Between 10:28 a.m. and 12:30 p.m. UTC on May 6, 2022, we received alerts that multiple customer sites and services were down. We immediately began investigating and determined the root cause to be a single database server that was unreachable because it had lost its persistent storage.

Our engineering team worked continuously to resolve the issue. As a result, the number of affected sites began to decrease, and all sites and services were restored by 5:00 p.m. UTC.

We have identified some improvements that will help us detect similar issues in the future.

Posted May 12, 2022 - 13:59 PDT

Resolved
This incident has been resolved.
Posted May 06, 2022 - 10:00 PDT
Update
We are continuing to monitor for any further issues.
Posted May 06, 2022 - 08:32 PDT
Update
We are continuing to monitor for any further issues.
Posted May 06, 2022 - 07:44 PDT
Update
We are continuing to monitor for any further issues.
Posted May 06, 2022 - 06:57 PDT
Update
We are continuing to monitor for any further issues.
Posted May 06, 2022 - 06:09 PDT
Monitoring
A fix has been implemented and we are monitoring the results.
Posted May 06, 2022 - 05:37 PDT
Update
We are continuing to work on a fix for this issue.
Posted May 06, 2022 - 04:50 PDT
Identified
The issue has been identified and a fix is being implemented.
Posted May 06, 2022 - 04:12 PDT
Investigating
We are addressing an infrastructure failure that is affecting customer sites.
Posted May 06, 2022 - 03:47 PDT
This incident affected: Customer Sites.