Pantheon Operations
All Systems Operational
Customer Sites Operational
Dashboard Operational
Global CDN Operational
Spinup Operations Operational
Workflow Operations Operational
Support Tickets Operational
Support Chat Operational
Terminus Operations Operational
Site Certificate Provisioning Operational
Past Incidents
Jul 10, 2020

No incidents reported today.

Jul 9, 2020

No incidents reported.

Jul 8, 2020

No incidents reported.

Jul 7, 2020

No incidents reported.

Jul 6, 2020

No incidents reported.

Jul 5, 2020

No incidents reported.

Jul 4, 2020

No incidents reported.

Jul 3, 2020

No incidents reported.

Jul 2, 2020
Postmortem - Read details
Jul 7, 17:12 PDT
Resolved - This incident has been resolved. We have fixed the regression and removed corrupted files.

Additional update:

The incident corrupted some files under 1MB that were written either by the CMS or by a manual upload (e.g. SFTP). Those files have been purged. Their content is gone. The incident window began on the evening of June 29th Pacific Time (0300 UTC June 30th) and lasted until approximately midnight at the end of July 1st Pacific Time (0700 UTC July 2nd). For now, affected files still show up in directory listings but do not contain any content.
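As an illustration of the window stated above, the following minimal Python sketch encodes the two UTC boundaries and checks whether a timestamp falls between them. The names `WINDOW_START`, `WINDOW_END`, `PDT` and `in_incident_window` are hypothetical and chosen for this example only, and a fixed UTC-7 offset is assumed for Pacific Daylight Time.

```python
from datetime import datetime, timedelta, timezone

# Incident window as stated in the update, in UTC.
WINDOW_START = datetime(2020, 6, 30, 3, 0, tzinfo=timezone.utc)
WINDOW_END = datetime(2020, 7, 2, 7, 0, tzinfo=timezone.utc)

# Assumption: Pacific Daylight Time as a fixed UTC-7 offset.
PDT = timezone(timedelta(hours=-7), "PDT")


def in_incident_window(moment):
    """Return True if a timezone-aware datetime falls inside the window."""
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    return WINDOW_START <= moment <= WINDOW_END


if __name__ == "__main__":
    # Show both boundaries in Pacific time for comparison with the text.
    print(WINDOW_START.astimezone(PDT).isoformat())
    print(WINDOW_END.astimezone(PDT).isoformat())
```

A file's creation or modification time, converted to a timezone-aware datetime, could be passed to `in_incident_window` to decide whether the file is worth checking.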

Two widely impacted classes of files were aggregated CSS or JS assets and image thumbnails. If you are experiencing issues related to these assets, you should immediately flush all caches or otherwise trigger the regeneration of those files. Now that the file persistence layer is stable, they should perform as usual after being regenerated.

We will be conducting and communicating a full audit of corrupted files for all affected customers, as well as removing all errant references from directory listings. This will take some time, but work is already underway.

If you need to find out if a file uploaded or written by the CMS was lost, you can review recently added files via your WordPress or Drupal admin interface, and see if they are still available. If not, you should re-upload them if possible. This is the only path to restoring lost content.

Backups will not contain the missing files, but as a last resort, restoring from backup is a way to get a site back to a previous stable state. For safety, you should use a backup from before 0300 UTC June 30th (8pm PT June 29th). Restoring from backup disrupts a site, causing some downtime. For sites with a small content footprint, a restore to the live environment can complete in a few minutes.
To minimize downtime, you can individually import the elements of a backup to another environment (e.g. test) by copying the URL of each backup element and pasting it into the other environment's “import” field. Once that workflow completes, clear your edge cache, test your changes, then use the content sync workflows to sync the database and files back over to live. Import via URL only works where the backup elements are under 500MB in size, but it will minimize the disruption in the live environment.
You can open a support ticket or engage chat for further consultation if needed.
Jul 2, 14:52 PDT
Update - Starting at approximately 0300 UTC June 30, writes from the persistent cache to long-term storage became heavily delayed for a portion of our customers. This had no effect on the filesystem, but it caused affected files not to appear in backups. At approximately 1900 UTC July 1, we deployed a change to address the delayed writes to long-term storage; that change ultimately corrupted a portion of those files in long-term storage. We addressed the issue by auditing files in long-term storage against their validation hashes and deleting files that did not match.
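For readers unfamiliar with this kind of audit, the sketch below shows the general idea of checking files against recorded hashes. It is not the platform's internal tooling: the hash algorithm (SHA-256 here), the `expected_hashes` mapping and the function names are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(root, expected_hashes):
    """Return relative paths that are missing or whose hash differs.

    expected_hashes is a hypothetical {relative_path: hex_digest} mapping.
    """
    root = Path(root)
    mismatched = []
    for rel_path, expected in expected_hashes.items():
        path = root / rel_path
        if not path.is_file() or sha256_of(path) != expected:
            mismatched.append(rel_path)
    return mismatched
```

Any path returned by `find_mismatches` would be a candidate for removal or re-upload, since its stored content no longer matches what was originally written.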

We have fixed the regression and removed corrupted files.

Recommended customer remediation: Follow their normal restore-from-backup procedure, using a backup started before 0300 UTC June 30.

Remediation for customers that can't follow the recommendation: Find all references (e.g. DB/HTML) to files in their ./files/ path. Read every referenced file. If a file disappears, either remove the reference or re-upload the file. Do not rely on directory listings. The audit period should at least cover files created between 0300 UTC June 30 and 0700 UTC July 2.
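The manual audit described above could be scripted roughly as follows. This is a minimal Python sketch, not an official tool: it assumes the list of referenced paths has already been extracted from the database or HTML, and the function name `audit_referenced_files` is hypothetical. It opens and reads each file instead of trusting a directory listing, and it treats a file with no content as lost, which would also flag files that are legitimately empty.

```python
from pathlib import Path


def audit_referenced_files(files_root, referenced_paths):
    """Try to read each referenced file; return (missing, empty) path lists.

    files_root: local path to the site's files directory.
    referenced_paths: relative paths collected from DB/HTML references.
    """
    root = Path(files_root)
    missing, empty = [], []
    for rel_path in referenced_paths:
        try:
            # Read a byte instead of checking a listing or file size.
            with open(root / rel_path, "rb") as fh:
                first_byte = fh.read(1)
        except OSError:
            missing.append(rel_path)
            continue
        if not first_byte:
            empty.append(rel_path)
    return missing, empty
```

Each path reported as missing or empty then needs either its reference removed or the file re-uploaded.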

We will carry out a full post mortem and update this page within 3 business days (EOD Wednesday).
Jul 2, 12:53 PDT
Update - Our engineering team has finished cache clearing and continues to investigate the incident.
Jul 2, 10:56 PDT
Update - Our engineering team is still working on clearing the cache for affected sites.
Jul 2, 09:53 PDT
Update - Our engineering team is still working on clearing the cache for affected sites.
Jul 2, 08:48 PDT
Update - Our engineering team is still working on clearing the cache for affected sites.
Jul 2, 07:47 PDT
Update - Our engineering team is still working on clearing the cache for affected sites.
Jul 2, 06:47 PDT
Update - Our engineering team is still working on clearing the cache for affected sites.
Jul 2, 05:47 PDT
Monitoring - A fix has been deployed and is being monitored. Our engineering team is currently working on clearing caches of affected customers.
Jul 2, 04:46 PDT
Update - No updates at this time.
Jul 2, 04:12 PDT
Update - No updates at this time.
Jul 2, 03:13 PDT
Update - No updates at this time.
Jul 2, 02:11 PDT
Update - No updates at this time.
Jul 2, 01:11 PDT
Update - No updates at this time.
Jul 2, 00:09 PDT
Update - No updates at this time.
Jul 1, 22:58 PDT
Update - No new information at this time.
Jul 1, 21:16 PDT
Identified - We believe we have stopped the processes that were corrupting the file data and we’re working to clean up the corrupted files.
Jul 1, 19:38 PDT
Investigating - We are currently investigating this issue.
Jul 1, 19:02 PDT
Jul 1, 2020
Jun 30, 2020

No incidents reported.

Jun 29, 2020

No incidents reported.

Jun 28, 2020

No incidents reported.

Jun 27, 2020

No incidents reported.

Jun 26, 2020

No incidents reported.