Degraded Performance
Incident Report for Pantheon Operations
Postmortem

On June 18, around 3:35 PT, a small set of customer database servers became heavily degraded for about 50 minutes. The underlying cause for the issue (persistent disk latency) is extremely uncommon and that rarity added to the time to recover. About 0.1% of sites on the platform were affected by this incident.

We will be updating our playbook to cover this rare occurrence in more depth, and will be adding additional monitoring for this specific case, which should result in faster remediation should this occur again.

Posted Jun 19, 2020 - 17:10 PDT

Resolved
This incident has been resolved.
Posted Jun 18, 2020 - 16:57 PDT
Monitoring
We are continuing to monitor for any further issues.
Posted Jun 18, 2020 - 16:42 PDT
Identified
The issue has been identified and a fix is being implemented.
Posted Jun 18, 2020 - 16:22 PDT
Investigating
We are investigating a failed database endpoint.
Posted Jun 18, 2020 - 16:13 PDT
This incident affected: Customer Sites.