Infrastructure Issue Affecting Customer Sites

Incident Report for Pantheon Operations

Postmortem

Between 10:28 a.m. and 12:30 p.m. UTC on May 6, 2022, we received alerts that multiple customer sites and services were down. We began investigating immediately and traced the outage to a single database server that had become unreachable after losing its persistent storage.

Our engineering team worked continuously to restore service. As a result, the number of affected sites began to decrease, and all sites and services were restored by 5:00 p.m. UTC.

We have identified some improvements that will help us detect similar issues in the future.

Posted May 12, 2022 - 13:59 PDT

Resolved

This incident has been resolved.

Posted May 06, 2022 - 10:00 PDT

Update

We are continuing to monitor for any further issues.

Posted May 06, 2022 - 08:32 PDT

Update

We are continuing to monitor for any further issues.

Posted May 06, 2022 - 07:44 PDT

Update

We are continuing to monitor for any further issues.

Posted May 06, 2022 - 06:57 PDT

Update

We are continuing to monitor for any further issues.

Posted May 06, 2022 - 06:09 PDT

Monitoring

A fix has been implemented and we are monitoring the results.

Posted May 06, 2022 - 05:37 PDT

Update

We are continuing to work on a fix for this issue.

Posted May 06, 2022 - 04:50 PDT

Identified

The issue has been identified and a fix is being implemented.

Posted May 06, 2022 - 04:12 PDT

Investigating

We are addressing an infrastructure failure that is affecting customer sites.

Posted May 06, 2022 - 03:47 PDT

This incident affected: Customer Sites.