Infrastructure Issue Affecting Customer Sites
Incident Report for Pantheon Operations
Postmortem

On 2/23/2017 we experienced a brief spike in CPU load on one of our edge routing clusters. The spike caused downtime of less than 10 minutes for a large number of sites. Our monitoring service alerted us to the problem immediately and we were able to quickly trace the cause. A large amount of traffic was observed to be going towards a specific site. The traffic reduced after a few minutes and the incident was resolved.

Pantheon’s edge routing layer is a key component in keeping customer sites available, and we are continually working to improve it. We apologize for the outage and appreciate the trust our customers place in us.

Posted Feb 24, 2017 - 16:35 PST

Resolved
This incident has now been resolved.
Posted Feb 23, 2017 - 05:14 PST
Monitoring
We have mitigated this incident and are monitoring the situation.
Posted Feb 23, 2017 - 04:26 PST
Identified
We have identified abnormally high traffic to specific sites which placed additional load on our routing layer. We have identified the issue and implemented a fix, which we are monitoring.
Posted Feb 23, 2017 - 04:22 PST
Investigating
We are addressing an infrastructure failure that is affecting a small portion of customer sites.
Posted Feb 23, 2017 - 04:20 PST