Database endpoint down
Incident Report for Pantheon Operations
Postmortem

On August 4th at 7:30 PM PDT, a database server replacement was performed that resulted in the deletion of a number of development environment databases. No live environment databases were affected. On August 5th at 1:20 AM PDT, engineers were alerted to the missing databases and began remediation. By August 5th at 2:31 PM PDT, 70% of the deleted database containers were online. By August 5th at 4:00 PM PDT, all database containers were online. The affected database containers were restored to an empty state, which allowed customers to reinitialize their development database, restore from backup, or restore from another environment. The primary contributor to the incident was a bug in the server replacement routine, which has now been fixed. We are also evaluating improvements that would guard against a similar incident in the future and lead to faster recovery should one occur.

Posted 9 days ago. Aug 13, 2019 - 09:15 PDT

Resolved
This incident has been resolved.
Posted 17 days ago. Aug 05, 2019 - 17:03 PDT
Monitoring
A fix has been deployed for this issue and we are monitoring the situation.
Posted 17 days ago. Aug 05, 2019 - 16:03 PDT
Update
We are continuing to work on a fix for this issue.
Posted 17 days ago. Aug 05, 2019 - 15:45 PDT
Update
We are continuing to work on a fix for this issue.
Posted 17 days ago. Aug 05, 2019 - 14:36 PDT
Update
We are continuing to work on a fix for this issue.
Posted 17 days ago. Aug 05, 2019 - 13:27 PDT
Update
We are continuing to work on a fix for this issue.
Posted 17 days ago. Aug 05, 2019 - 12:22 PDT
Update
We are continuing to work on a fix for this issue.
Posted 17 days ago. Aug 05, 2019 - 11:16 PDT
Identified
The issue has been identified and a fix is being implemented.
Posted 17 days ago. Aug 05, 2019 - 09:49 PDT
Investigating
We are investigating a failed database endpoint.
Posted 18 days ago. Aug 05, 2019 - 06:54 PDT
This incident affected: Customer Sites and Workflow Operations.