Degraded Dashboard Performance
Incident Report for Pantheon Operations
Postmortem

Our Dashboard experienced a period of degraded performance early Sunday morning due to a misconfiguration of automatic scaling in technology that is new to Pantheon. Customer web sites were not affected during this outage, but Pantheon customers briefly lost the ability to make changes to their sites at the beginning of the incident.

We have conducted an internal review of the events that led to this outage and developed a plan to prevent similar incidents in the future, detect degraded performance of our new technology, and recover more gracefully in the event of system failure. We are confident that our plan will effectively mitigate these issues.

We recognize that being able to change a site is as important as the site being available for use, and we apologize for any inconvenience caused by this outage.

Posted Oct 19, 2016 - 17:16 PDT

Resolved
The incident has been cleared and no further issues have been reported. Thank you for your patience!
Posted Oct 16, 2016 - 20:37 PDT
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Oct 16, 2016 - 08:33 PDT
Update
We continue to investigate as some sections of the dashboard are still unavailable.
Posted Oct 16, 2016 - 05:47 PDT
Identified
The issue has been identified and a fix is being implemented.
Posted Oct 16, 2016 - 04:12 PDT
Investigating
Our monitoring has detected elevated dashboard error rates, which may manifest as slow page loads or failed logins.
Posted Oct 16, 2016 - 03:40 PDT