We are experiencing a general site outage.
Incident Report for Pantheon Operations
Postmortem

At approximately 1 PM PST yesterday ( Sept 5th ) the platform suffered an outage that impacted a number of sites. We detected the issue within 3 minutes, and the total degradation was resolved within 30 minutes. Many sites continued to serve content via our Global CDN caching layer during the outage. The main cause was a deployment to our edge layer that purged one of our main (route) caches too aggressively. As part of improving our platform we deploy code multiple times a day with many layers of testing and protections to avoid such outages. We rarely modify our edge layer. We performed a post mortem of yesterday's incident and have added multiple layers of protection to prevent this from happening again. Such changes in the future will invalidate the caching more slowly, and will provision extra capacity in order for the cache to warm up faster.

Posted Sep 06, 2018 - 16:33 PDT

Resolved
Our apologies for this interruption in service. The reason for this outage has been identified and is well understood. Our team will be meeting tomorrow to identify process improvements to prevent similar incidents in the future. The result of that effort will be posted here by the end of the day.
Posted Sep 05, 2018 - 14:10 PDT
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Sep 05, 2018 - 13:32 PDT
Identified
The issue has been identified and a fix is being implemented.
Posted Sep 05, 2018 - 13:23 PDT
Investigating
We are currently investigating this issue.
Posted Sep 05, 2018 - 13:13 PDT
This incident affected: Customer Sites.