Degraded Dashboard and Terminus Performance
Incident Report for Pantheon Operations
Postmortem

At 14:00 UTC on 20 May, we observed a large spike in DNS traffic. Our on-call engineers were alerted to slow Dashboard load times and slow Terminus command completion. Engineers identified the sites causing the increase in DNS traffic and mitigated the behavior, which returned traffic to normal levels. Some sites were impacted for over 30 minutes, and service was restored at 17:07 UTC.

We have identified some improvements that will help us detect and prevent similar issues in the future.

Posted Jun 03, 2022 - 09:19 PDT

Resolved
This incident has been resolved.
Posted May 20, 2022 - 10:07 PDT
Update
We are continuing to monitor for any further issues.
Posted May 20, 2022 - 09:36 PDT
Monitoring
A fix has been implemented and we are monitoring the results.
Posted May 20, 2022 - 09:02 PDT
Identified
The issue has been identified and a fix is being implemented.
Posted May 20, 2022 - 08:24 PDT
Update
We are continuing to investigate this issue.
Posted May 20, 2022 - 08:05 PDT
Investigating
Our monitoring has detected elevated error rates for the Dashboard and Terminus commands. These may manifest as slow page loads, failed logins, or failures with Terminus commands.

For urgent issues, please contact support via helpdesk@pantheon.io.
Posted May 20, 2022 - 07:33 PDT
This incident affected: Dashboard, Workflow Operations, and Terminus Operations.