I wish I’d written a post-mortem of my last disastrous hike. Not because it’s an opportunity to humble-brag about a time I hiked 43 kilometres, nor because these stories lead to compelling narratives, but because it’s invaluable for figuring out both what went wrong and how to fix it. As a bonus, it’s an opportunity to educate someone about the finer details of hiking.
Hence when it was suggested I do a post about FreethoughtBlogs’ latest outage, I jumped on it relatively quickly. Unlike my hiking disasters, though, a lot of this is coming second-hand via PZ and some detective work on my side, so keep a bit of skepticism handy.
In the early days of the internet, things were pretty simple. Computers were given a fixed four “digit” address on the ‘net, with each digit numbering between 0 and 255 instead of the usual 0 and 9. If you wanted one, you called up Jon and asked for it. Elizabeth maintained a text file known as “HOSTS.TXT”, which recorded all the known addresses and added simple human-friendly annotations. She put the file on one of those computers sitting on the ‘net, so anyone could periodically download it and store a copy on their local computers. The internet started off as a research project by academics, after all, so it inherited their laid-back and informal approach. Could your administrator manually edit the local copy of HOSTS.TXT to alter the addresses of a few computers, then set up a computer pretending to be those other computers at those addresses? Easily! But why would they?
A more robust and scalable system was needed, and in the 1980s some researchers hit on the two basic systems we still use today. Getting an address on the internet and associating a human-friendly name with it were still considered two separate problems, but now much of the work had been outsourced. Groupings of those four “digit” addresses were handed out to organizations like universities, governments, and corporations to do with as they see fit. For instance:
$ whois 142.250.217.78 | grep -4 Parent:
NetRange: 142.250.0.0 - 142.251.255.255
CIDR: 142.250.0.0/15
NetName: GOOGLE
NetHandle: NET-142-250-0-0-1
Parent: NET142 (NET-142-0-0-0-0)
NetType: Direct Allocation
OriginAS: AS15169
Organization: Google LLC (GOGL)
RegDate: 2012-05-24
Google controls the range of addresses from 142.250.0.0 to 142.251.255.255, inclusive, and can assign those 131,072 locations however they like: sell them off to the highest bidder, rent them out for a specific length of time, hand them out as birthday presents to employees, the sky’s the limit. There’s nothing special about Google; anyone can request control over an address space, for a price. All of the traditional four “digit” addresses have been claimed for years, but you can be put on a waiting list in case someone else’s range is shrunk or relinquished.
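The arithmetic behind that 131,072 figure is easy to check. What follows is a minimal sketch in plain shell, assuming traditional 32-bit addresses; the function names `cidr_size`, `ip_to_int`, and `range_size` are made up for illustration.

```shell
# Number of addresses in a block with the given prefix length (the "/15").
cidr_size() {
  echo $(( 1 << (32 - $1) ))
}

# Convert a dotted four-"digit" address into a single integer.
# The body runs in a subshell so the IFS change does not leak out.
ip_to_int() (
  IFS=.
  set -- $1                      # split on the dots into $1..$4
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Count the addresses between two endpoints, inclusive.
range_size() {
  echo $(( $(ip_to_int "$2") - $(ip_to_int "$1") + 1 ))
}

cidr_size 15                               # prints 131072
range_size 142.250.0.0 142.251.255.255     # prints 131072
```

Both routes agree: a /15 leaves 17 bits free, and 2 to the power of 17 is 131,072.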
This itself has obvious problems, not least of which is how to locate a specific computer when its address might change. This is where the solution to the second problem steps in: the Domain Name System, or DNS for short. There are thirteen “root” servers, which contain information about who to query about a “top-level domain.” Top-level domains currently number 1,591, and have names like “com” (controlled by Verisign), “org” (Public Interest Registry), “ca” (Canadian Internet Registration Authority), “bestbuy” (BestBuy), “meme” (Google), and “امارات” (Telecommunications and Digital Government Regulatory Authority, United Arab Emirates). The “registries” controlling these top-level domains then give/sell/rent subdomains like “google” or “canada” or “فكرة اسم” out to other people or organizations, either directly or by outsourcing the task to “registrars” like GoDaddy. So if I wanted to resolve a name like “mail.google.com”, I would contact a root DNS server and ask which servers are in charge of “com”, and in this case be pointed to Verisign’s name servers. I’d then contact those name servers and ask who’s in control of “google”, and be pointed to Google’s name servers. I’d contact them, and at long last be given the address 142.251.33.101.
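That walk down the hierarchy can be traced without touching the network. The sketch below only manipulates the name as text; `delegation_chain` is a hypothetical helper, and it assumes every label except the leftmost is its own delegated zone, which is a simplification of how real zones are cut.

```shell
# Print, in order, the zones whose name servers get asked when resolving a
# name from scratch: the root first, then each zone further down.
delegation_chain() (
  name="${1%.}"            # tolerate a trailing dot
  zone=""
  echo "."                 # start at the root
  while [ "$name" != "${name#*.}" ]; do   # loop while a dot remains
    label="${name##*.}"    # rightmost label
    name="${name%.*}"      # everything to its left
    zone="${label}.${zone}"
    echo "$zone"
  done
)

delegation_chain mail.google.com
# prints:
# .
# com.
# google.com.
```

The three lines correspond to the root servers, Verisign's name servers, and Google's name servers, with the last of these supplying the address.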
You might think this system is terrible, as it implies every time I want to read my email I’ve got to manually hit up four separate servers, one of which is hammered by every single computer on the internet. But DNS has a caching system: rather than directly bother the root server, you bother a resolving server instead. That server might bother the root server initially, but the response it gets back also includes an expiry date. Subsequent requests before that date recycle the last response instead of dispatching a new one. This pattern repeats down the hierarchy, by itself saving tremendous bandwidth. Further savings come from pooling requests: your internet service provider runs their own resolving server that handles all the requests from all their customers, so if one of them queries “com” then all subsequent customers get the cached response. Quite often, the router that provides your internet also runs its own resolving server to pool together requests from all the computers connected to it, and sometimes your operating system also runs a resolving server shared by all your applications. Those root servers still get a workout (there are roughly 1,733 actual servers; some technical wizardry makes it look like there are only thirteen), but there’s no question the system scales well.
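To make the caching idea concrete, here is a toy cache in shell. It is a sketch under loud assumptions: the file layout, `CACHE_FILE`, `cache_put`, and `cache_get` are all invented for illustration, timestamps are passed in by hand as plain seconds, and real DNS answers carry a lifetime in seconds (a "time to live") rather than an absolute date.

```shell
# Toy resolver cache: each line of the file is "name address expiry".
CACHE_FILE="${TMPDIR:-/tmp}/toy-dns-cache.$$"
: > "$CACHE_FILE"

# cache_put NAME ADDRESS LIFETIME NOW: remember an answer until NOW + LIFETIME.
cache_put() {
  printf '%s %s %s\n' "$1" "$2" "$(( $4 + $3 ))" >> "$CACHE_FILE"
}

# cache_get NAME NOW: print the newest unexpired answer, or fail if there is
# none, in which case a real resolver would ask the next server up the chain.
cache_get() {
  awk -v name="$1" -v now="$2" '
    $1 == name && $3 > now { addr = $2 }
    END { if (addr == "") exit 1; print addr }
  ' "$CACHE_FILE"
}

cache_put mail.google.com 142.251.33.101 300 1000
cache_get mail.google.com 1200    # prints 142.251.33.101 (still fresh)
cache_get mail.google.com 1300 || echo "expired, ask upstream"
```

Every layer described above (operating system, router, ISP) behaves roughly like this, which is why a change at the top of the hierarchy takes a while to be noticed everywhere.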
That’s enough background to get started; we’ll pick up the rest along the way.
My best guess is that everything started on February 20th, 2024, sometime around 11 PM. An email from “Deathlord Al-Zawahiri” lands in the complaints file for the registrar, alleging some sort of scam associated with FtB but providing no details. Presumably, the registrar sent a message to the email address associated with FtB’s account, asking that they, or whoever hosts the website, remove the scam and get back to them.
Slight problem: Ed Brayton handled a lot of the financial side of FtB, back when it was founded in 2011, but he died in August 2020. The other founder, PZ Myers, dutifully filled out all the paperwork necessary to take over control from Ed. In the case of the domain name, this transfer happened on or before May of 2020, months before Brayton died. Rewinding time is tricky, but you can at least see the current state of things:
$ whois freethoughtblogs.com | sed -ne '/^Registrar:/p;/^\w* Name:/p'
Domain Name: FREETHOUGHTBLOGS.COM
Registrar: FastDomain Inc.
Registrant Name: -PLEASE SELECT-
Admin Name: MYERS, PAUL
Tech Name: INC, BLUEHOST
Bluehost? At some point in the past, they acquired or merged with FastDomain. Whatever the case, it’s possible PZ forgot to change the email address associated with the account on FastDomain, FastDomain somehow mucked up the account hand-over and left Brayton’s old email in place, or the administration servers at FastDomain/Bluehost are busted and returned an old email address. It’s not a good sign when your servers give different information for the same domain query, repeated seconds apart:
$ whois -h whois.fastdomain.com freethoughtblogs.com | grep Email:
Registrant Email: STCYNIC@GMAIL.COM
Admin Email: pzmyers@gmail.com
Tech Email: whois@bluehost.com
Registrar Abuse Contact Email: domain.operations@web.com
$ whois -h whois.bluehost.com freethoughtblogs.com | grep Email:
Registrar Abuse Contact Email: tos@fastdomain.com
Registrant Email: PZMYERS@GMAIL.COM
Admin Email: PZMYERS@GMAIL.COM
Tech Email: WHOIS@BLUEHOST.COM
$ whois -h whois.fastdomain.com freethoughtblogs.com | grep Email:
Registrant Email: pzmyers@gmail.com
Admin Email: pzmyers@gmail.com
Tech Email: whois@bluehost.com
Registrar Abuse Contact Email: domain.operations@web.com
Web.com?! By bouncing around Wikipedia, I figured out that Bluehost was purchased by Endurance International Group in 2010; EIG had spent a good decade buying up registrars and related companies, at one point collecting about one hundred of them; meanwhile, Web.com had also been buying up other registrar-adjacent companies, until they in turn were acquired by Siris Capital Group in 2018; Clearlake Capital Group purchased EIG in 2020, and spun off Endurance Web Presence to house all those registrars; finally, Endurance Web Presence and Web.com merged in 2021 to form Newfold Digital, which appears to be a weird corporate Frankenstein’s monster of some sort? I suppose we could just call it a “cash cow,” as EIG alone was worth $3 billion in cold, hard cash when it was purchased.
Bluehost’s information, at least, seems to be stable. But if we now query Web.com’s servers:
$ whois -h whois.web.com freethoughtblogs.com | grep Email:
Registrant Email: STCYNIC@GMAIL.COM
Admin Email: pzmyers@gmail.com
Tech Email: whois@bluehost.com
Registrar Abuse Contact Email: domain.operations@web.com
Lovely. All these mergers and acquisitions have triggered some behind-the-scenes consolidation of back-end services, which apparently hasn’t gone smoothly. Are these the same servers queried when an issue arises with an account? Who knows!
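Eyeballing those responses gets tedious, so here is a rough sketch of how the comparison could be scripted. `registrant_email` and `compare_servers` are hypothetical helpers of my own; only the `whois -h SERVER DOMAIN` invocation comes from the sessions above, and the second function needs network access, so it is defined here but not run.

```shell
# Pull the registrant email out of whois output on stdin, lower-cased so that
# STCYNIC@GMAIL.COM and stcynic@gmail.com compare as equal.
registrant_email() {
  awk -F': *' '/^ *Registrant Email:/ { print tolower($2); exit }'
}

# Ask each server in turn and print what it claims (needs the whois client).
compare_servers() {
  domain="$1"; shift
  for server in "$@"; do
    printf '%s: %s\n' "$server" "$(whois -h "$server" "$domain" | registrant_email)"
  done
}

printf 'Registrant Email: STCYNIC@GMAIL.COM\nAdmin Email: pzmyers@gmail.com\n' | registrant_email
# prints stcynic@gmail.com
```

Running `compare_servers freethoughtblogs.com whois.fastdomain.com whois.bluehost.com whois.web.com` a few times in a row would show whether the servers still disagree with each other, or with themselves.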
Whoever was in control of Brayton’s email address didn’t delete any message but also didn’t respond, so on February 25th, 2024, at 4:18 PM UTC, the domain name was flagged with “clientHold”. Apologies for immediately heading off into another tangent, but: over the years, DNS has added quite a few extra features. DNS pre-dates the world wide web by almost a decade, but email grew up alongside it and was wildly popular, so email is privileged with an exclusive record type.
$ host -t MX freethoughtblogs.com
freethoughtblogs.com mail is handled by 10 mail.freethoughtblogs.com
The usual name-to-traditional-address mappings were assigned record type “A”. You can add free-form text with record type “TXT”, map one domain name to another with “CNAME”, and so on.
There’s also the separate problem of who controls a domain name. DNS wasn’t intended to handle the administrative side of things, so the whois service was put in charge of that. Look carefully, and you’ll note that nearly all of the previous examples were actually done via this secondary service, with that “host” command being the sole exception. It’s much simpler than DNS: you submit a query in something like plain text, and get a plain text response back. That protocol has also been adding new features, like domain name status codes. These are flags that can affect how DNS servers handle queries for specific domains. Since FtB’s current status flags are a bit boring, let’s look at another domain’s status.
$ whois freethoughtblogs.online | egrep "^Domain Status:"
Domain Status: serverTransferProhibited https://icann.org/epp#serverTransferProhibited
Domain Status: clientTransferProhibited https://icann.org/epp#clientTransferProhibited
Domain Status: addPeriod https://icann.org/epp#addPeriod
In order, these state that ownership of that domain name cannot be transferred to another entity, according to both the registry that controls the “.online” top-level domain (Radix Technologies) and the registrar who sold and currently controls that specific domain name (eNom, though a careful look suggests it was really sold by Dreamhost, acting as a reseller under contract with eNom). Finally, this domain was added only recently, so if Radix decides to delete it then eNom and possibly Dreamhost will get a refund of the fees.
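If you only care whether one particular flag is set, the whois output can be boiled down further. This is a small sketch: `domain_statuses` and `has_status` are names I made up, and the sample input is copied from the response above instead of being fetched live.

```shell
# List the bare status flags found in whois output on stdin, one per line.
domain_statuses() {
  awk '/^ *Domain Status:/ { print $3 }'
}

# has_status FLAG: succeed if the whois output on stdin carries that exact flag.
has_status() {
  domain_statuses | grep -qxF "$1"
}

domain_statuses <<'EOF'
Domain Status: serverTransferProhibited https://icann.org/epp#serverTransferProhibited
Domain Status: clientTransferProhibited https://icann.org/epp#clientTransferProhibited
Domain Status: addPeriod https://icann.org/epp#addPeriod
EOF
# prints:
# serverTransferProhibited
# clientTransferProhibited
# addPeriod
```

Against a live domain, something like `whois freethoughtblogs.com | has_status clientHold && echo "on hold"` would flag the situation this post is about.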
This finally gets us back to that “clientHold” domain status flag, which means:
This status code tells your domain’s registry to not activate your domain in the DNS and as a consequence, it will not resolve. It is an uncommon status that is usually enacted during legal disputes, non-payment, or when your domain is subject to deletion.
In other words, FastDomain/Bluehost/Web.com/Newfold Digital sent an automated message to Verisign on February 25th, 2024, telling the “com” top-level domain to temporarily act like the “freethoughtblogs” domain name doesn’t exist. After that point, Verisign’s name servers would shrug their shoulders if asked about “freethoughtblogs.com”, and as the caches in DNS resolvers expired, that domain name gradually disappeared across the internet. Anyone who plugged “freethoughtblogs.com” into their browser would have it handed a “not found: 3(NXDOMAIN)” error, over which the browser would put a more human-friendly message like “Hmm. We’re having trouble finding that site.”
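That error string is easy to test for. A minimal sketch, assuming the lookup was done with the `host` command used earlier and that its output is piped in; `is_nxdomain` is a made-up name, and the sample line mimics the usual shape of a failed `host` lookup.

```shell
# Succeed if the `host` output on stdin reports that the name does not exist.
is_nxdomain() {
  grep -q 'not found: 3(NXDOMAIN)'
}

printf 'Host freethoughtblogs.com not found: 3(NXDOMAIN)\n' | is_nxdomain \
  && echo "domain has vanished from DNS"
# prints domain has vanished from DNS
```

In practice you would feed it a real lookup, along the lines of `host freethoughtblogs.com | is_nxdomain`.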
To most people, FtB had gone “offline.” In reality, it had not: the servers that host our website were working perfectly fine, in fact better than ever thanks to the sudden loss of traffic. While our top-level domain suddenly forgot about “freethoughtblogs”, our website servers are hidden or “proxied” behind Cloudflare’s servers, and they have their own low-level DNS servers. Another commenter queried them directly, and sure enough they pointed to the correct IP address. Had anyone temporarily changed their resolver to that specific Cloudflare server, the site would have continued to operate like normal.
That trick only worked because those servers are “canonical” and skip the entire hierarchy, plus they let any old rando query them. There’s no guarantee they’ll return anything for “mail.google.com” or anything other than a small pool of domains they’re responsible for. They also have nothing to do with Cloudflare’s public resolver servers, which would have agreed with the rest of the internet that “freethoughtblogs.com” wasn’t associated with any address.
There are more technical details to discuss, but that can wait for part two, where I talk about what we can do about [waves hands upwards].