DNS Routing Issues
Incident Report for Pantheon Operations
Resolved
This incident has been resolved.
Posted Sep 16, 2013 - 12:22 PDT
Update
Continuing to see correct DNS resolution; no new reports of failure.
Posted Sep 16, 2013 - 08:06 PDT
Update
All of Comcast's DNS caches are now showing correct routing data for Pantheon services. We believe this is the last major ISP that was having issues. If you still have reports of DNS problems reaching any Pantheon services, please contact support.
Posted Sep 16, 2013 - 06:17 PDT
Monitoring
10 out of 12 Comcast regional DNS caches are routing properly. We are continuing to monitor the situation.
Posted Sep 16, 2013 - 05:26 PDT
Update
We're in direct contact with Comcast's NOC and DNS engineering teams. We're working with the on-call staff to flush records. Comcast is the final major ISP whose customers may be experiencing issues.
Posted Sep 16, 2013 - 02:22 PDT
Update
This evening, at around 19:15 Pacific Time, Pantheon engineers made a DNS change to the getpantheon.com domain in the process of implementing security extensions (DNSSEC). This change introduced invalid DS records, which were propagated to downstream DNS caches such as Google (8.8.8.8) and Comcast (75.75.75.75). The erroneous DNS entries were replaced with correct entries, but not before downstream DNS providers had cached the incorrect records, causing failed DNS lookups.

A number of services rely on the getpantheon.com domain, including all customer sites that have CNAME DNS records pointing to edge.live.getpantheon.com. Customers that are pointing at an edge load-balanced IP address or using SSL are not affected. The SERVFAIL DNS response is effectively the same as no response, causing browsers to return “This webpage is not available” or a similar message.

Currently, some DNS resolvers are returning valid entries, and others are continuing to serve failure responses. Although we are optimistic about more providers refreshing their caches, our current worst-case estimate is that the getpantheon.com DNS records will be cached for up to 24 hours (~19:00 Pacific tomorrow). At this time, updating your DNS or waiting 24 hours are the fastest solutions.

To update your DNS to restore service:

We are providing an alternate domain for customers who are able to update their DNS. Change any CNAME records that point to edge.live.getpantheon.com so that they point to edge.live.gotpantheon.com (replacing “gEtpantheon.com” with “gOtpantheon.com”). Your site will regain availability as soon as your new DNS entries propagate (based on the TTL of your entries).
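
For customers who manage many records in zone files, the swap described above can be scripted. The following is a minimal sketch only, assuming BIND-style zone-file lines; the function name repoint_cname and the sample records are hypothetical and not part of any Pantheon tooling. It changes only CNAME records whose target is the affected hostname and leaves every other line untouched.

```python
import re

OLD_TARGET = "edge.live.getpantheon.com"
NEW_TARGET = "edge.live.gotpantheon.com"

# Matches a CNAME record whose target is the old edge hostname,
# with or without a trailing dot, optionally followed by a ';' comment.
_CNAME_RE = re.compile(
    r"^(?P<head>.*\bCNAME\s+)"
    + re.escape(OLD_TARGET)
    + r"(?P<dot>\.?)(?P<tail>\s*(;.*)?)$",
    re.IGNORECASE,
)


def repoint_cname(line: str) -> str:
    """Return the zone-file line with the old edge target swapped for the new one."""
    match = _CNAME_RE.match(line)
    if match is None:
        return line  # not a CNAME to the affected target; leave untouched
    return f"{match['head']}{NEW_TARGET}{match['dot']}{match['tail']}"


# Hypothetical sample records, for illustration only.
sample = [
    "www 300 IN CNAME edge.live.getpantheon.com.",
    "api 300 IN A 203.0.113.10",
]
updated = [repoint_cname(line) for line in sample]
```

Most hosted DNS providers expose records through a web form or an API rather than a zone file, in which case the same one-letter change is made there by hand.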

Please email your questions, with your site name and any relevant details, to helpdesk@getpantheon.com and we will respond as soon as possible. We understand this is a severe interruption, and are working hard to resolve it as soon as possible.

Here is our assessment of the current propagation of the fixed DNS:

Comcast: Erroring (75.75.75.75 and 75.75.74.74)
Google Public DNS: OK (8.8.8.8, 8.8.4.4)
Sonic.net (primary recursor): OK (208.201.224.11, 208.201.224.33)
Sonic.net (non-DNSSEC recursor): OK (64.142.73.180, 64.142.73.181)
Cox (Secondary name servers): OK (68.105.29.16, 68.105.29.17)
AT&T: OK (68.94.156.1, 68.94.157.1)
OpenDNS: OK
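
Readers who want to check a resolver themselves could do something like the following. This is a rough sketch using only the Python standard library, not the method Pantheon used to produce the list above; the function names build_query, rcode_name, and check_resolver are hypothetical. It sends a single A-record query over UDP and reports the response code from the DNS header, where SERVFAIL corresponds to the failure described in this update.

```python
import socket
import struct

# Common DNS response codes (low 4 bits of the header flags field).
RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record with recursion desired."""
    # Header: ID, flags (RD=1), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    )
    # Root label terminator, then QTYPE=A (1) and QCLASS=IN (1)
    return header + labels + b"\x00" + struct.pack("!HH", 1, 1)


def rcode_name(response: bytes) -> str:
    """Extract the response code name from a raw DNS message."""
    if len(response) < 12:
        raise ValueError("DNS message shorter than the 12-byte header")
    (flags,) = struct.unpack("!H", response[2:4])
    code = flags & 0x000F
    return RCODES.get(code, f"RCODE{code}")


def check_resolver(resolver_ip: str, name: str, timeout: float = 3.0) -> str:
    """Send one UDP query to a resolver and return the response code name."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (resolver_ip, 53))
        response, _ = sock.recvfrom(512)
    return rcode_name(response)
```

For example, check_resolver("75.75.75.75", "edge.live.getpantheon.com") queries Comcast's resolver. A timeout raises an exception rather than returning a code, and a single query reflects only the cache that happened to answer, so repeated checks may disagree while caches are refreshing.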
Posted Sep 16, 2013 - 00:48 PDT
Update
We are seeing service restored via some downstream DNS providers, for example Google's resolvers at 8.8.8.8. We are continuing to monitor other providers, such as Comcast's resolvers at 75.75.75.75, that continue to serve failure responses.
Posted Sep 16, 2013 - 00:17 PDT
Update
We are continuing to work on the problem, and will have more information shortly.
Posted Sep 15, 2013 - 23:56 PDT
Update
We are continuing to investigate the issue with DNS, including a timeline for resolving and possible short-term solutions for certain DNS configurations. We will have more information within 30 minutes.
Posted Sep 15, 2013 - 23:14 PDT
Identified
The issue has been identified and a fix is being implemented.
Posted Sep 15, 2013 - 21:22 PDT
Investigating
We are investigating an issue with our upstream DNS provider that is affecting the dashboard and customer sites using the edge.live.getpantheon.com CNAME.

Sites using direct IPs (because of SSL or needing to use A or AAAA records) should not be affected.
Posted Sep 15, 2013 - 20:57 PDT