Application endpoint down

Incident Report for Pantheon Operations

Postmortem

On Monday, 20 July 2020, at around 12:50pm UTC, the Pantheon platform experienced a shortage of compute capacity in the European datacenter, causing some servers in the region to become unresponsive. The incident affected 0.5% of customer sites within that region.

We addressed the capacity issue and worked with our upstream provider for the EU to increase our quota in order to provision additional spare capacity.

The incident was resolved at 5:01pm UTC, after affected customer sites were moved to newly provisioned servers.

Our Engineering team is taking action to prevent situations in which additional capacity is unavailable without intervention from an upstream provider. We are also reviewing the process by which we evaluate our capacity in non-US regions, to prevent such an incident from happening again.

Posted Jul 29, 2020 - 16:56 PDT

Resolved

This incident has been resolved.
Posted Jul 20, 2020 - 10:01 PDT

Update

We are continuing to monitor for any further issues.
Posted Jul 20, 2020 - 09:46 PDT

Monitoring

A fix has been implemented and we are monitoring the results.
Posted Jul 20, 2020 - 09:15 PDT

Update

Our engineering team has confirmed that this incident affected sites in the EU region only; other regions were not affected.
Posted Jul 20, 2020 - 08:32 PDT

Update

We are continuing to work on a fix for this issue.
Posted Jul 20, 2020 - 08:12 PDT

Identified

The issue has been identified and a fix is being implemented.
Posted Jul 20, 2020 - 07:38 PDT

Update

We are continuing to investigate this issue.
Posted Jul 20, 2020 - 07:20 PDT

Update

We are continuing to investigate this issue.
Posted Jul 20, 2020 - 06:30 PDT

Investigating

We have been alerted to an issue affecting an individual endpoint and are investigating.
Posted Jul 20, 2020 - 05:50 PDT
This incident affected: Customer Sites.