The issue stemmed from a disruption in communication between our edge network provider, Fastly, and Pantheon. Pantheon uses Fastly’s edge network to improve performance by caching pages, and as pages change, Pantheon automatically refreshes them in Fastly. During this incident, any page that had to be served directly by Pantheon, rather than from Fastly’s cache, was unavailable.
Pantheon’s Experience Protection feature can serve previously cached pages when a page cannot be refreshed on the edge network. During this incident, it allowed us to keep serving previously cached pages, except where a customer had turned off this capability for specific pages. However, some pages that are built dynamically on Pantheon cannot be cached by the edge network. Because no cached copy of these pages existed to fall back on, they could not be served while communication between the edge network and Pantheon was disrupted.
We’re taking steps to help protect against these types of issues in the future. We are enhancing our monitoring systems to detect and report such interruptions more proactively, with the goal of preventing similar incidents. We are also conducting a comprehensive assessment of our monitoring and alerting framework to ensure it covers all relevant scenarios.