… I haven’t written part two of this, leaving you hanging for almost a year?! Unacceptable!
Since it’s been a while, a quick recap of the story so far: a Deathlord said FtB was a scam, Frankenstein’s monster asked the dead if that was true, and when there was no reply told everyone to pretend “freethoughtblogs.com” didn’t exist. Along the way I also introduced you to Elizabeth, four-digit numbers, pools, corporate mergers, and resolvers.
All clear? Good, now we can discuss ways to prevent the February outage from happening again.
Solution One: Multiple Domain Names
Last time, I mentioned a mysterious “freethoughtblogs.online” domain. Let’s take a closer look at it, starting by checking its current status:
$ whois freethoughtblogs.online | egrep "^Domain Status:"
Domain Status: clientTransferProhibited https://icann.org/epp#clientTransferProhibited
The registrar has flagged that the domain can’t be transferred to another registrar, presumably to make domain hijacking more difficult (and, as a convenient side benefit, making it slightly tougher to switch to a better registrar). Who was the domain registrar, again?
$ whois freethoughtblogs.online | sed -ne '/^Registrar:/p;/^\w* Name:/p;/^Name Server:/p'
Domain Name: FREETHOUGHTBLOGS.ONLINE
Registrar: eNom, Inc.
Name Server: NS1.DREAMHOST.COM
Name Server: NS2.DREAMHOST.COM
Name Server: NS3.DREAMHOST.COM
Oh right, eNom, but Dreamhost probably did the actual work. The latter is a privately owned company with no ties to anything like Newfold Digital or private equity firms. The former is currently owned by Tucows, a Canadian company and the third-largest domain registrar. The upshot is that there are no corporate ties between Tucows/eNom/Dreamhost and the companies behind “freethoughtblogs.com”, and Tucows/etc. are large or private enough to keep it that way for a while.
As I mentioned last time, the top-level domain “online” is controlled by Radix Technologies, who no doubt moved to the Cayman Islands to enjoy the wonderful crystal caves, and asked British company CentralNic to handle the technical details before hopping on the plane. FtB’s TLD is controlled by Verisign, who got it by purchasing Network Solutions, the very first company to control a TLD. Confusingly, Verisign then sold off Network Solutions to Pivotal Equity Group but kept the TLD side of the business, and that private equity firm sold it to another private equity firm called General Atlantic, who then sold it to… Web.com! At any rate, there are also no ties between the companies managing “online” and “com”.
Where is this “freethoughtblogs.online” pointing?
$ host freethoughtblogs.online
freethoughtblogs.online has address 104.21.234.51
freethoughtblogs.online has address 104.21.234.50
freethoughtblogs.online mail is handled by 10 mail.freethoughtblogs.online.
$ host mail.freethoughtblogs.online
mail.freethoughtblogs.online has address 192.241.139.193
… waaaaaiitaminute, I’ve seen those addresses before …
$ host freethoughtblogs.com
freethoughtblogs.com has address 104.21.234.50
freethoughtblogs.com has address 104.21.234.51
freethoughtblogs.com mail is handled by 10 mail.freethoughtblogs.com.
$ host mail.freethoughtblogs.com
mail.freethoughtblogs.com has address 192.241.139.193
“freethoughtblogs.online” points to the same IP addresses as “freethoughtblogs.com”, down to the email DNS entries! This is perfect, because now a future “Deathlord Al-Zawahiri” needs to submit two complaints to two distinct entities in order to wipe out our DNS. Though now we have a new problem: who’s in charge of the “online” domain?
$ whois freethoughtblogs.online | egrep "^Registrant"
Registrant Organization: Data Protected
Registrant State/Province: Alberta
Registrant Country: CA
Registrant Email: Please query the RDDS service of the Registrar of Record identified in this output for information on how to contact the Registrant, Admin, or Tech contact of the queried domain name.
Dang, the registrant is private. Hopefully this mysterious person living in the province of Alberta is friendly, should FtB’s DNS issues return. If they’re smart, they’ve created some sort of automated process to track those DNS details and update “online” accordingly.
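I have no idea whether they did, but the checking half of such a process is simple enough to sketch. Something like the following, run from cron, would at least flag when the two domains drift apart; “notify-admin” is a stand-in for whatever alerting you happen to have, and actually pushing an update to the registrar is left as an exercise:

#!/bin/sh
# Compare the A records of the two domains, as seen by public DNS.
com=$(dig +short freethoughtblogs.com A | sort)
online=$(dig +short freethoughtblogs.online A | sort)

# If they no longer match, someone (or something) needs to update "online".
if [ "$com" != "$online" ]; then
    printf 'DNS drift detected:\ncom:\n%s\nonline:\n%s\n' "$com" "$online" | notify-admin
fi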
Well, what are we waiting for? Let’s give this domain name a spin!
In hindsight, “perfect” wasn’t quite the right word.
Problem One: Transport Layer Security
Rather impressively, there are three different obstacles blocking our access.
Have you ever thought about why nearly every piece of mail is sent in an envelope or plain box? If we never used envelopes/boxes at all, then anyone who wished could read or see what we send through the mail. So why not enclose only the private items in an envelope/box? Because then, if someone spotted an envelope/box in the mail system, they would know it contains information the sender wants protected, and thus it automatically becomes a target. The best policy is thus to place everything in an envelope or box, be it spam or a private letter, because then the use of an envelope/box tells you nothing about the contents.
The same thinking applies to websites. All traffic to them was completely unencrypted until Netscape published the SSL 2.0 standard in 1995, which claimed to offer iron-clad encrypted envelopes between your browser and websites. A year later they had to release a complete revamp because security researchers had no difficulty opening those envelopes. The protocol was further revised and renamed Transport Layer Security when security researchers broke it open again. That cycle repeated until the release of TLS 1.3 in 2018, which at long last seems to have figured out how to properly glue those envelopes shut.
TLS is not without its shortcomings. It encrypts your data, but does nothing to hide the fact that you’re trying to connect to “bigbooty.com” (you only read the articles, of course). It also relies heavily on domain names for authentication, to the point that a user’s web browser asks for proof that the server parked at the “bigbooty.com” DNS entry is supposed to be there. If said server cannot produce a digital certificate testifying to its ownership of that name, one digitally signed by a source the browser trusts, most web browsers refuse to connect at all.
You can see the problem here: your browser connects to “freethoughtblogs.online”, which sends you to the IP address hosting FtB; the web server behind that address sends back a certificate saying it is “freethoughtblogs.com”; your browser concludes this server isn’t authorized to live at the “online” domain name, and immediately bails. Using the domain name isn’t enough: FtB’s web server also needs an updated certificate and configuration stating that “freethoughtblogs.online” is a valid secondary way to access the site.
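If you want to check what a server can prove about itself, openssl will happily show you the names listed on the certificate it hands back. This is purely diagnostic; the -servername flag controls which name your client claims to be asking for, and what comes back depends entirely on whether the server has been configured for that name:

$ echo | openssl s_client -connect freethoughtblogs.com:443 -servername freethoughtblogs.online 2>/dev/null \
    | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'

Repeat with “-servername freethoughtblogs.com” to see which names FtB’s certificate currently covers.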
Problem Two: Reverse Proxies
Notice, however, that the error message says nothing about invalid certificates.
A proxy server is a bit of software or hardware that sits between your internet connection and the outside world, intercepting all the data you send and receive. It sounds like a terrible invasion of privacy, but it has some solid advantages: the biggest one is that a proxy server can cache accesses on your behalf, much like resolver servers do for DNS. If multiple people share one proxy server, any overlap between their requests is handled locally, which is much faster than traversing the internet multiple times and reduces the traffic on the remote server. Proxy servers can also act as filters, preventing your kids from accessing “bigbooty.com” at school, and theoretically they can detect and halt malicious traffic before it reaches the internet. You can also use them to redirect traffic over a faster or more secure connection, with the Tor network being a famous example.
The “reverse” in “reverse proxy” refers to where the proxy is located: between the server and the internet, instead of between you and the internet. All the advantages of a normal proxy carry forward, but this time the “clients” are the entire internet. As you can imagine, caching is a lot more effective in reverse! They can also handle TLS connections on behalf of the server, greatly simplifying administration and freeing up CPU cycles for other computation.
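If you’ve never seen one, a reverse proxy doesn’t take much to describe. Here’s a heavily trimmed sketch of what the nginx configuration might look like; every name and path in it is a placeholder, and a real deployment needs cache tuning, logging, and plenty more:

# Responses get cached on local disk, so repeat requests never reach the backend.
proxy_cache_path /var/cache/nginx keys_zone=frontcache:10m max_size=1g;

server {
    listen 443 ssl;
    server_name www.example.org;                      # the public-facing name

    # TLS is terminated here, on behalf of the real server.
    ssl_certificate     /etc/ssl/www.example.org.pem;
    ssl_certificate_key /etc/ssl/www.example.org.key;

    location / {
        proxy_cache frontcache;
        proxy_pass  http://10.0.0.5:8080;             # the actual web server, hidden behind the proxy
    }
}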
Rather than set up our own reverse proxy, FtB has outsourced that task to Cloudflare. They offer much better caching and protection from network-based attacks than we could ever hope to build. Alas, Cloudflare is also a commercial business. Their income statement for 2023 states they took in over a billion dollars in revenue that year, and their operating loss was a mere 14.3% of that revenue. They do hand out a few freebies, but you don’t get that rich via charity. “freethoughtblogs.com” may be under their protective care, but “freethoughtblogs.online” is not, and money will need to change hands to make that happen.
Even if Cloudflare were outright saints, note that I never did a whois lookup of FtB’s IP address.
$ whois 104.21.234.51 | grep -4 Parent:
NetRange: 104.16.0.0 - 104.31.255.255
CIDR: 104.16.0.0/12
NetName: CLOUDFLARENET
NetHandle: NET-104-16-0-0-1
Parent: NET104 (NET-104-0-0-0-0)
NetType: Direct Allocation
OriginAS: AS13335
Organization: Cloudflare, Inc. (CLOUD14)
RegDate: 2014-03-28
A reverse proxy is useless if you allow connections to go around it, so we cannot list our server’s actual IP address at “freethoughtblogs.online”. Nor does it make sense for Cloudflare to map one reverse proxy to one website; instead, it juggles multiple sites and routes traffic based on the domain name your browser sends when establishing a TLS connection. We can fiddle with our DNS entries, we can toss around multiple TLS certificates, but if Cloudflare hasn’t also told their servers about “freethoughtblogs.online” then you’ve got no hope of connecting to FtB.
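You can poke at this name-based routing yourself with curl, whose --resolve flag skips DNS and pins a hostname to whatever IP address you give it; what comes back depends entirely on whether Cloudflare recognizes the name your browser (or curl) sends during the TLS handshake:

$ curl -sv --resolve freethoughtblogs.online:443:104.21.234.51 https://freethoughtblogs.online/ -o /dev/null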
Problem Three: WordPress
Let’s say that you manage to overcome all of that and connect to FtB. It won’t look anything like normal, unfortunately. The images aren’t loading, and none of the links work. There’s a reason for that:
$ curl https://freethoughtblogs.com/ 2>/dev/null | perl -ne 'print"$1\n" for(/href="([^"]+)"/g)' | head
https://freethoughtblogs.com/feed/
https://freethoughtblogs.com/comments/feed/
https://freethoughtblogs.com/
https://freethoughtblogs.com/xmlrpc.php?rsd
/favicon.ico
WordPress not only uses absolute URLs when generating HTML pages, it hard-wires those URLs into its database. You can read the rationale behind that here; for my part, I’m unimpressed. It is not hard to generate absolute URLs for an RSS feed and relative ones for a home page, nor is it hard to abstract linking behind a function that can work out how to navigate the directory tree. This sort of indirection should be a basic feature of any content management system, and WordPress is the most popular CMS out there.
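If you have shell access to a WordPress install plus the WP-CLI tool, you can see the hard-wiring for yourself, and the usual fix is to rewrite every stored URL in the database. A sketch, not something I’ve run against FtB:

$ wp option get siteurl          # the absolute URL baked into the options table
$ wp option get home
$ wp search-replace 'https://freethoughtblogs.com' 'https://freethoughtblogs.online' --dry-run

Note that search-replace swaps one name for another; serving the same site under two names at once is precisely the case WordPress makes awkward.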
This was recognized as a design flaw thirteen years ago, and yet nothing has been done about it. Maybe this inaction comes from an obsession with backwards compatibility; after all, we are talking about a twenty-year-old software package that still issues updates for decade-old versions. It might also be because WordPress is in the domain name business, in a way. The core software is open-source, but its development is driven by a for-profit business that hosts websites on your behalf. Making it easy to migrate away would harm profits, and updating their code would make it easier to migrate away.
Whatever the case, setting up a second domain name is much more of a pain than it first appears.
Solution Two: HOSTS.TXT
The fundamental problem last February was that DNS resolvers stopped handing over an IP address when queried about “freethoughtblogs.com”. One clever commenter got around that problem, via a trick so old it makes WordPress seem young and spry.
Remember HOSTS.TXT, from the very start of the first part? Despite being a relic from fifty years ago, it still works! Both Mac OS X and Linux follow the Unix tradition of naming that file “hosts” and placing it under “/etc/”. Windows insists on something more convoluted, but nonetheless all three operating systems will look to that file first before issuing a DNS query. Just add these two lines…
104.21.234.50 freethoughtblogs.com
104.21.234.51 freethoughtblogs.com
… and all three problems are immediately solved.
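On Linux you can confirm the override took hold without opening a browser, since getent walks the same lookup order as everything else on the system; if it prints the addresses you just added rather than whatever DNS would return, the hosts file is being honoured:

$ getent hosts freethoughtblogs.com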
Slight problem: most web browsing is done via cell phones nowadays. Ironically, both iOS and Android seem to follow Unix conventions, but both force you to go to great (and sometimes questionable) lengths to actually modify that file. Even if you’re using a conventional desktop, it might be controlled by an IT department that takes a dim view of fiddling with key operating system files.
Setting aside all of that, what happens if FtB’s DNS entry starts resolving again but now points to a different IP address? Or worse, it points to the same address, you forget to delete the modifications you made to your hosts file, and then FtB migrates to a new IP address months or years after you’ve forgotten about those two lines?
As elegant as this solution is, asking every reader to muck with their operating system just doesn’t scale.
Solution Three: Reverse Proxies
What does scale is editing the HOSTS.TXT of a web server, then turning it into a reverse proxy of FtB. This neatly handles the TLS and Cloudflare issues. WordPress’ oddball handling of URLs is fixed by rewriting the data this server sends back on-the-fly, so all links point back to itself instead of “freethoughtblogs.com”. Differing domain names are no longer a problem, and the user doesn’t have to edit a text file either. Sounds like the perfect solution!
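For the curious, nginx can do both halves of that job: proxy_pass forwards the traffic, and the sub_filter module rewrites the HTML on its way back out. A heavily trimmed sketch with a placeholder name for the secondary server, assuming nginx was built with that module, and omitting all the TLS and caching plumbing a real deployment needs:

server {
    server_name ftb-mirror.example.org;               # placeholder for the secondary server

    location / {
        # Resolved at startup via the system resolver, i.e. the edited hosts file.
        proxy_pass https://freethoughtblogs.com;
        proxy_ssl_server_name on;                     # send the proper name during the TLS handshake
        proxy_set_header Accept-Encoding "";          # sub_filter can't edit compressed responses

        # Rewrite absolute links so they point back at this server.
        sub_filter 'https://freethoughtblogs.com' 'https://ftb-mirror.example.org';
        sub_filter_once off;
        sub_filter_types text/html text/css application/rss+xml;
    }
}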
Problem Four: Privacy
HAHAHAHAHA, no.
Some of our smartest experts spent over two decades perfecting TLS, so that no third party could read your data… but this last solution demands the ability to do exactly that! If this secondary server can re-write URLs, it must be reading every last bit of data you’re getting from FtB, unencrypted. Is it silently re-writing other people’s comments to make it look like they’re criticizing you? Is it obscuring blog posts to prevent you from reading them? Worse, it must also read every last bit of data you send to FtB. That means it sees every page you visit, reads every comment you submit, and intercepts any credentials you send when logging into your account.
The reverse proxy can impersonate you perfectly, except for one critical detail: it cannot pretend to live at your IP address. Instead, it must substitute its own. This limitation screws up the moderation page of WordPress, and renders geo-location services useless. Cloudflare’s automation would find it odd that one IP address is suddenly generating a tonne of traffic, traffic that looks like it comes from multiple people. This could trip an alarm, causing Cloudflare’s reverse proxy to halt your browsing session and ask for proof you’re human, or simply block access to this secondary server entirely.
You must have total confidence in any person who has administrative access to this secondary server. That includes the person running it, but it also includes anyone who finds a way to break in because the person running it didn’t patch a security hole. Hence why I haven’t yet shared the link to this secondary server: I had to grind into your head just how terrible this solution was, first. Only after doing all that could I link to it with a clear conscience.
Problem Five: Bots
Leaving that secondary server up has created an interesting experiment. I only launched it a day or two before FtB came back online. Almost nobody took advantage of it, and since my blog is somewhat obscure, few humans are aware it exists.
Web crawlers, conversely, are always on the prowl for new websites. To them, this secondary server looks like an entirely new website has popped up. FtB does implement the most common way to hold these crawlers at bay, buuuut…
$ curl -i https://freethoughtblogs.com/robots.txt 2>/dev/null | grep last-modified:
last-modified: Sun, 11 Jun 2017 04:09:41 GMT
… a lot has changed on the ‘Web in the last seven years. That secondary server doesn’t store the data being sent to or from FtB for more than a few moments, but it does keep a longer-lived log covering the last two weeks: the URLs being accessed, the broad details of who was doing the accessing in the form of a “user agent” string, and (with some exceptions) how much data was returned. That log also summarizes all traffic to our Mastodon instance, alas, but at least instance-to-instance traffic is easy to filter out.
With a bit of elbow grease, I can generate a summary of who’s been using the reverse proxy and/or our Mastodon instance over the last two weeks.
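The elbow grease is nothing fancy. Assuming the access log is in nginx’s stock “combined” format and lives in a file called access.log (both assumptions here, the real setup may differ), something along these lines totals up the mebibytes returned per user agent:

$ awk -F'"' '{ split($3, a, " "); bytes[$6] += a[2] }
    END { for (ua in bytes) printf "%.3f\t%s\n", bytes[ua] / 1048576, ua }' access.log | sort -rn | head -15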
User Agent | MiB returned |
---|---|
Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html) | 1829.955 |
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com) | 1688.533 |
Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/) | 1624.483 |
meta-externalagent/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler) | 357.727 |
Googlebot-Image/1.0 | 345.766 |
Mozilla/5.0 (compatible; Bytespider; spider-feedback@bytedance.com) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.0.0 Safari/537.36 | 142.670 |
[my cell phone connecting to the Mastodon instance] | 142.094 |
[a bot claiming to be Chrome 109 on OS X Sequoia?] | 132.541 |
[a Fediverse app I use] | 98.084 |
[a bot claiming to be Chrome 107 on Windows 10?] | 86.083 |
[a bot claiming to be Chrome 110 on Windows 10?] | 80.360 |
[a bot claiming to be Chrome 104 on Windows 10?] | 73.425 |
[a bot claiming to be Edge 101 on Windows 10?] | 72.405 |
[a bot claiming to be Chrome 101 on Windows 10?] | 70.933 |
Mozilla/5.0 (compatible; DotBot/1.2; +https://opensiteexplorer.org/dotbot; help@moz.com) | 70.249 |
It’s kind of astonishing how much of that is bot traffic. There are only two entries that really look human to me, and both are exclusively accessing the Mastodon instance; all those Windows 10 and OS X agents look like web browsers human beings would use, but when you look more closely at the URLs, they’re almost always pointing at a five-to-ten-year-old blog post. Semrush is a quarter-billion dollar company that specializes in search-engine optimization and marketing, based on the data collected by their aggressive web crawler. As the reference to Anthropic suggests, ClaudeBot is vacuuming up the web to create AI training data. MJ12bot? SEO again. “meta-externalagent”? AI training data. You get the picture: some quick math suggests at least 10 GB of bot traffic is flowing through the reverse proxy I set up each month.
Which in turn means I’ve accidentally doubled the bot traffic that FtB gets. There’s no reason to think these same bots are any less aggressive when it comes to FtB’s actual web server, after all. We’re probably fine, since bandwidth is incredibly cheap nowadays and Cloudflare’s cache is absorbing some fraction of it, but I will be keeping an eye out for an angry email from PZ.
Conclusions
I hope you’ve realized there is no magic spell here. There are ways to prevent a future Deathlord from taking down FtB’s DNS entries, but they require some combination of fiddling with the server’s configuration, shelling out more cash to an obscenely rich company, asking you to modify key operating system files, or creating privacy nightmares.
Even if there were, to some extent this all amounts to fighting the previous war. FtB has gone down several times since last February, and none of those outages involved DNS. We really need to be proactive about the site, trying to think ahead about what sorts of problems our infrastructure could run into and making moves to mitigate them. While I’m happy to help with that, I’m also an unpaid volunteer with an unfortunate tendency to get distracted and wander off. Alas, sometimes you do get what you paid for.
Still, this blog post did get written, even if it took 37 drafts and two months to do so. Most of that time wasn’t spent on research, either; while it did take a bit of time to track down all the links, the only bits I didn’t know before starting a draft were the corporate intrigue and the sheer amount of bot traffic to the FtB reverse proxy. I do have a playground I can use to teach myself WordPress administration, in my spare moments. I’m the sort of person who manages several servers just for the fun of it. All the pieces are there; it’s just a matter of time and patience, and no Deathlord can prevent that.