Degraded Performance

Incident Report for Pantheon Operations

Postmortem

On June 18, around 3:35 PT, a small set of customer database servers became heavily degraded for about 50 minutes. The underlying cause for the issue (persistent disk latency) is extremely uncommon and that rarity added to the time to recover. About 0.1% of sites on the platform were affected by this incident.

We will be updating our playbook to cover this rare occurrence in more depth, and will be adding additional monitoring for this specific case, which should result in faster remediation should this occur again.

Posted Jun 19, 2020 - 17:10 PDT

Resolved

This incident has been resolved.
Posted Jun 18, 2020 - 16:57 PDT

Monitoring

We are continuing to monitor for any further issues.
Posted Jun 18, 2020 - 16:42 PDT

Identified

The issue has been identified and a fix is being implemented.
Posted Jun 18, 2020 - 16:22 PDT

Investigating

We are investigating a failed database endpoint.
Posted Jun 18, 2020 - 16:13 PDT
This incident affected: Customer Sites.