On 2/28/2017, AWS S3 experienced widespread issues that affected Pantheon and our customers. According to Amazon, the issues were caused by high error rates with S3 in US-EAST-1. We first noticed issues around 9:30am PST, and most functions returned to normal around 3pm PST.
Pantheon’s infrastructure is hosted on Rackspace; however, our file storage system (Valhalla) leverages components from both Rackspace and S3. As a result, customers experienced issues with their dashboards, sites, and support services. The file storage system was designed to withstand interruptions in the S3 service, but the S3 issues lasted longer than the system was designed to withstand. We put the system into a read-from-cache mode, which allowed many sites to continue to provide some service instead of being completely down.
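To illustrate the general idea of a read-from-cache mode, here is a minimal sketch in Python. It is not Valhalla's actual implementation: the class `CachedFileStore`, the exception `BackendUnavailable`, and the `read_from_cache_only` switch are all hypothetical names chosen for this example, and the cache is a plain in-memory dictionary.

```python
class BackendUnavailable(Exception):
    """Hypothetical error raised when the remote object store cannot be reached."""


class CachedFileStore:
    """Illustrative read-through cache with a degraded read-from-cache mode."""

    def __init__(self, backend_fetch):
        # backend_fetch is an assumed callable: path -> bytes.
        # It may raise BackendUnavailable when the remote store is failing.
        self._fetch = backend_fetch
        self._cache = {}
        # When True, reads never contact the backend (degraded mode).
        self.read_from_cache_only = False

    def read(self, path):
        if self.read_from_cache_only:
            # Degraded mode: serve only what is already cached.
            if path in self._cache:
                return self._cache[path]
            raise FileNotFoundError(path)

        try:
            data = self._fetch(path)
        except BackendUnavailable:
            # Short interruption: fall back to a cached copy if one exists.
            if path in self._cache:
                return self._cache[path]
            raise

        # Normal path: refresh the cache with the latest copy.
        self._cache[path] = data
        return data
```

In this sketch, files that were read before the interruption stay available, while files that were never cached fail. That matches the behavior described above, where many sites could provide some service rather than none.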
We have identified several improvements to limit the impact of similar incidents. We are looking into cross-provider storage services to decrease our dependency on a single provider. We apologize for the inconvenience caused by this interruption and remain committed to keeping your trust in Pantheon.