Degraded Workflows
Incident Report for Pantheon Operations
Postmortem

On November 14, 2023, starting at 20:19 UTC, a surge of workflow failures alerted Pantheon engineering to an issue pulling container images from an upstream vendor registry. The upstream issue was quickly confirmed via the vendor's status page, which detailed an external service outage impacting container image push and pull operations. After the vendor remediated the issue, the workflow failure rate returned below threshold on November 14, 2023 at 23:25 UTC.

We have identified areas to improve our infrastructure to prevent issues like this from occurring in the future.

Posted Nov 28, 2023 - 11:15 PST

Resolved
This incident has been resolved.
Posted Nov 15, 2023 - 06:48 PST
Update
We're pleased to report that all systems are currently functioning correctly. Our team is actively monitoring performance to ensure optimal service. If you have any questions or remaining issues with workflows, please open a support chat with us, or email helpdesk@pantheon.io.

We'll provide another status update within the next 6 hours or sooner if the situation is resolved. Thank you for your patience and understanding.
Posted Nov 15, 2023 - 03:36 PST
Update
All systems are functioning correctly. We are continuing to monitor closely to ensure optimal performance. Please feel free to reach out if you have any further questions. We will provide another update within 6 hours or sooner if the problem is resolved.
Posted Nov 14, 2023 - 21:11 PST
Update
We are still in the monitoring phase to confirm the solution's effectiveness. The fix for the recent workflow issues on Pantheon has been in place for more than 2 hours, and we haven't received any additional reports. Further updates will be provided as the situation progresses. Your patience and understanding are appreciated. If you have any questions or encounter workflow issues, please don't hesitate to contact support.
Posted Nov 14, 2023 - 18:36 PST
Monitoring
Our upstream provider has implemented a fix that resolves the recent workflow issues on Pantheon. We are now in the monitoring phase to ensure that the solution is effective. We will provide further updates as the situation progresses. Thank you for your patience and understanding, and please reach out to support if you have any questions or remaining issues with workflows.
Posted Nov 14, 2023 - 16:09 PST
Update
The issue continues to affect various workflows on Pantheon. Our team is working to address it, exploring all possible solutions while coordinating with our service provider. We will share the next update in 3 hours, or sooner if more information becomes available. Your patience and understanding during this time are greatly appreciated.
Posted Nov 14, 2023 - 15:12 PST
Update
The issue is still impacting workflows on Pantheon. Our team is actively exploring all available options to alleviate the current issue. We will provide an update in 1 hour or as more information becomes available. Thank you for your ongoing patience.
Posted Nov 14, 2023 - 14:03 PST
Identified
We are currently experiencing an infrastructure issue that is affecting several workflows on Pantheon, including code synchronization, cache clearing, enabling Solr, and creating new sites or environments.

The current investigation points to an issue with one of our Docker image service providers. We are working with the provider to resolve the issue as quickly as possible.

We apologize for any inconvenience caused and appreciate your patience as we work towards a fix.
Posted Nov 14, 2023 - 13:14 PST
This incident affected: Workflow Operations.