Degraded Filesystem Performance
Incident Report for Pantheon Operations
Postmortem

On Friday, July 14th, we became aware of an infrastructure issue affecting file access and file backups for some sites on Pantheon. We developed and deployed a fix to permanently resolve the issue.

Over the past several weeks, we have been making improvements to our filesystem architecture. These improvements ultimately offer better performance and increased reliability. Specifically, the new architecture allows us to better respond to increases in demand, better isolate failure scenarios, and better manage system updates.

Unfortunately, an element of the new configuration did not perform as expected under specific use cases, despite our previous load testing. Specifically, the TLS-based secure session negotiation degraded under a widely varied production workload. Once we narrowed down the specific use cases and identified the root cause, we were able to reconfigure that component and return the platform to stable operation. We are confident this new configuration is stable and the improved filesystem architecture will scale to meet the future demands of the platform.

We take reliability and performance seriously. We understand file access is core to all sites on the Pantheon platform. We apologize for any inconvenience and are committed to continually improving the reliability and performance of the platform to meet and exceed your needs.

Posted Jul 17, 2017 - 16:08 PDT

Resolved
Service restored; filesystem access has stabilized and we are continuing to monitor performance. We are planning a maintenance window to proactively apply a patch to help ensure future stability.
Posted Jul 15, 2017 - 11:21 PDT
Monitoring
We have addressed the issue and are continuing to monitor closely. If you have continued issues, please contact support. We are investigating next steps to ensure future stability.
Posted Jul 15, 2017 - 10:03 PDT
Identified
We have identified the issue and are working to stabilize the filesystem for affected sites.
Posted Jul 15, 2017 - 09:21 PDT
Update
We are still investigating the root cause of the issue within the filesystem. Please be advised that the Pantheon documentation pages are also impacted and may not be loading as expected.
Posted Jul 15, 2017 - 08:45 PDT
Investigating
We are investigating an issue affecting a portion of sites where file assets are not being delivered as expected.
Posted Jul 15, 2017 - 07:51 PDT