Before I move away from the topic of Rationalism and EA, I want to talk about Roko’s Basilisk, because WTF else am I supposed to do with this useless knowledge that I have.
From sci-fi, a “basilisk” is an idea or image that exploits flaws in the human mind to cause a fatal reaction. Roko’s Basilisk was proposed by Roko to the LessWrong (LW) community in 2010. The idea is that a benevolent AI from the future could coerce you into doing the right thing (build a benevolent AI, obv) by threatening to clone you and torture your clone. It’s a sort of transhumanist Pascal’s Wager.
Roko’s Basilisk is absurd to the typical person, and at this point is basically a meme used to mock LW, or tech geeks more broadly. But it’s not clear how seriously this was really taken in LW. One thing we do know is that Eliezer Yudkowsky, then leader of LW, banned all discussion of the subject.
What makes Roko’s Basilisk sound so strange is that it’s based on at least four premises that are nearly unique to the LW community, and unfamiliar to almost anyone else. Just explaining Roko’s Basilisk properly requires an amusing tour of multiple ideas the LW community hath wrought.
For a proper historical account, I recommend RationalWiki, the definitive source on the subject. Most of what I have to say is already on there, and perhaps the only edge I have over RationalWiki is that I know a lot about
Decision theory

Decision theory, according to the Decision Theory FAQ on LessWrong, “concerns the study of preferences, uncertainties, and other issues related to making ‘optimal’ or ‘rational’ choices.” Decision theory is the subject of interdisciplinary scholarship, but for the LW community, it was a foundation of ethics and rationality.
Normative decision theory concerns the question of how “ideal” rational agents behave, independent of the question of whether humans actually behave that way in practice. The basic principle is that an ideal agent will look at all possible outcomes of each option, and make the choice that maximizes expected utility. In other words, the ideal agent obeys something resembling utilitarian ethics.
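To make that concrete, here’s a minimal sketch of expected utility maximization in Python. The options, probabilities, and utilities are made-up numbers for illustration, not anything from the decision theory literature.

```python
# Minimal sketch of expected utility maximization (made-up numbers).
# Each option leads to possible outcomes, each with a probability and a utility.
options = {
    "take umbrella": [(0.3, 5), (0.7, 8)],     # (probability, utility) pairs
    "leave umbrella": [(0.3, -10), (0.7, 10)], # rain is bad without an umbrella
}

def expected_utility(outcomes):
    return sum(p * u for p, u in outcomes)

# The ideal agent simply picks whichever option has the highest expected utility.
best = max(options, key=lambda name: expected_utility(options[name]))
print({name: expected_utility(outcomes) for name, outcomes in options.items()})
print("choose:", best)
```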
This is simple enough, but there are many open-ended questions in normative decision theory. For example, it’s possible to mathematically construct a lottery that has infinite expected utility: flip a fair coin until it lands tails, with a pot that starts at $2 and doubles on every heads, and the expected payout is $1 + $1 + $1 + ⋯, which diverges. That’s called the St. Petersburg paradox. But Yudkowsky and others were obsessed with another paradox in decision theory, known as
Newcomb’s paradox

You are confronted by Omega, a superintelligent machine capable of predicting the near future. Omega presents you with two boxes. Box A has $1k for sure. Box B has $1M if Omega predicts you will ignore Box A, but is empty if Omega predicts you will open Box A. Your two options are to open both boxes (“two-boxing”) or to open just Box B (“one-boxing”). Different people have different intuitions about whether you should one-box or two-box.
On the one hand, one-boxers get the better outcome by far. On the other hand, one-boxing is not what causes you to get the better outcome: by the time you make your choice, there’s already $1M in Box B or there isn’t, so why let that affect your decision?
Here’s another variant, created by Yudkowsky (based on Solomon’s problem). Suppose that scientists have demonstrated that people who chew gum are more likely to die of throat abscesses. However, chewing gum reduces the risk of throat abscesses; what’s going on here is that people with high risk of throat abscesses are more likely to like chewing gum. So, do you choose not to chew gum, because gum-chewers have a worse outcome? Or do you chew gum because chewing gum causes a better outcome? Most people agree that chewing gum is the better choice.
What the gum-chewing problem illustrates is that there are two different decision theories: either make your decisions based on the expected outcomes associated with those decisions (Evidential Decision Theory), or base them on the expected outcomes caused by those decisions (Causal Decision Theory). Yudkowsky felt very strongly that the correct answers were to open one box and to chew gum. Unfortunately, that combination is not consistent with either of the two decision theories. So he came up with a third theory, which he called Timeless Decision Theory.
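Here’s a toy sketch of how the two theories diverge on Newcomb’s paradox, using the standard $1k/$1M payoffs and an assumed 99% predictor accuracy (my number, purely for illustration).

```python
# Toy Newcomb's paradox: why Evidential and Causal Decision Theory disagree.
# Assumes Omega predicts your choice with 99% accuracy (illustrative number).
ACCURACY = 0.99
SMALL, BIG = 1_000, 1_000_000  # Box A, Box B

def edt_value(action):
    # Evidential: treat your own action as evidence about what Omega predicted.
    p_big = ACCURACY if action == "one-box" else 1 - ACCURACY
    base = SMALL if action == "two-box" else 0
    return base + p_big * BIG

def cdt_value(action, p_big):
    # Causal: the contents of Box B are already fixed; your action can't change
    # them, so two-boxing always adds $1k regardless of the probability p_big
    # you assign to the money being there.
    base = SMALL if action == "two-box" else 0
    return base + p_big * BIG

for action in ("one-box", "two-box"):
    print(action, "EDT:", edt_value(action), "CDT:", cdt_value(action, p_big=0.5))
# EDT recommends one-boxing; CDT recommends two-boxing.
```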
Before I get to Timeless Decision Theory, I have to ask: why does he care so much? Humans don’t behave like ideal agents anyway, and it seems our time is better spent learning how to approximate ideal agents, rather than trying to resolve a paradox that will probably never become relevant. I believe it’s so important to Yudkowsky because he isn’t just thinking about humans, he’s thinking about
Superintelligent AI

Yudkowsky believes that we will eventually develop AI so powerful that it will basically take over. Hopefully the AI will transform the world for the better, but there’s a risk it will destroy us all. In order to prevent that, he is extremely interested in giving it the correct ethical system. Many people on LW think this is the most important problem in the world, and donate money to the Machine Intelligence Research Institute (MIRI) in order to solve it (that’s how Effective Altruism got its start).
There’s definitely an internal logic to it. Newcomb’s paradox is important because an AI’s decision theory is something you explicitly need to program, whereas humans can just wing it. Omega’s ability to predict an agent’s choices is also much more plausible when we have AI to do the predicting, and when the agent is also an AI.
There’s also a sense in which the prisoner’s dilemma is a sort of Newcomb’s problem, insofar as it has each player trying to predict the other player’s decision, and basing their own decision on that prediction. So, if we want an AI that will cooperate in prisoner’s dilemmas, Yudkowsky thinks we need to solve Newcomb’s paradox.
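Here’s a small sketch of that point, with the usual prisoner’s dilemma payoffs (my illustrative numbers). Reasoning causally, defecting dominates; reasoning the Newcomb-like way, where the other player’s choice is expected to mirror yours, mutual cooperation wins.

```python
# Prisoner's dilemma payoffs for "me" (illustrative numbers, standard ordering).
PAYOFF = {
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

# Causal-style reasoning: the other player's move is fixed independently of mine,
# so defecting is better against either move.
for theirs in ("C", "D"):
    print("if they play", theirs, {mine: PAYOFF[(mine, theirs)] for mine in ("C", "D")})

# Newcomb-like reasoning: if the other player runs the same decision procedure
# (or can predict mine), our moves are correlated, so the real comparison is
# mutual cooperation vs mutual defection.
print("both cooperate:", PAYOFF[("C", "C")], "vs both defect:", PAYOFF[("D", "D")])
```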
Timeless Decision Theory
To my knowledge, Timeless Decision Theory doesn’t exist; it’s more of an aspiration. Yudkowsky wrote a 120-page paper describing Timeless Decision Theory, but the theory itself is never constructed, so we don’t have a way to determine how it would make any particular decision, nor do we know whether a consistent Timeless Decision Theory is even possible.
Yudkowsky believes the distinguishing factor between Newcomb’s problem and the chewing gum problem is precommitment. If you have a way to make a commitment, and hold yourself to that commitment, then the solution to Newcomb’s paradox is simple: before Omega arrives, commit to opening just one box. On the other hand, precommitment does not affect the chewing gum problem, so you just chew gum and that’s the solution.
Here’s the important property of Timeless Decision Theory: you don’t need to make your commitment before Omega arrives; you can make the commitment afterwards. It’s called “precommitment,” but that’s a misnomer, since it doesn’t need to be “pre-” anything.
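A toy way to see why committing to one-box pays off: suppose Omega reads off your policy (the decision procedure you’re committed to) rather than your moment-of-choice whim, and predicts perfectly. This is my own illustrative sketch, not the formalism from Yudkowsky’s paper.

```python
# Toy precommitment in Newcomb's problem: Omega fills the boxes based on the
# policy you are committed to, and (in this sketch) predicts perfectly.
SMALL, BIG = 1_000, 1_000_000

def payoff(policy):
    box_b = BIG if policy == "one-box" else 0    # filled only for committed one-boxers
    box_a = SMALL if policy == "two-box" else 0  # two-boxers also open Box A
    return box_a + box_b

for policy in ("one-box", "two-box"):
    print(policy, payoff(policy))
# The committed one-boxer walks away richer. The controversial part of Timeless
# Decision Theory is the claim that this "commitment" still works even if you
# only adopt it after Omega has already made its prediction.
```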
Humans don’t really have an airtight way to make commitments, certainly not after-the-fact commitments. But an AI might have that ability! This is definitely the sort of AI that can make a rock so heavy that the AI itself cannot lift it.
You are your clones
There’s one last premise needed before the basilisk falls into place. Basically, LW believed that a superintelligent AI might have the power to clone (or maybe just simulate) anyone throughout history, including their memories. They also believed that your clones would effectively be you. Therefore, if you’re trying to make decisions with the best outcome for yourself, you must also take into account the outcomes for any future clones of yourself.
Aside from the implausibility of it all, I philosophically disagree with the idea that I am my clone. I go in the opposite direction, believing I am not even the same person as my future self. There’s an old Existential Comic about this one. I think that selfishness is on some level irrational but psychologically necessary. On the other hand, “selfishness” in favor of my fantastical future clone is not psychologically necessary at all, and is instead psychologically baffling.
But mostly, I shrug off the question because I have a life outside of philosophy.
The basilisk appears
So here’s how the basilisk works. We assume that we have a chance of building a friendly superintelligent AI. But a friendly AI is not like an omnibenevolent God; this AI behaves according to utilitarian principles, and Timeless Decision Theory in particular. So the AI is willing to make tradeoffs if it thinks the overall utility is positive.
One possible tradeoff it can make is to threaten people in order to force them to do good. Once the AI exists, it doesn’t really matter whether people do good or not, since the AI will just take care of everything. However, it does matter what people do before the AI is built. And the best thing for people to do (obv) is to donate all their money to MIRI so that it can build friendly AI as soon as possible.
You’d think that an AI would not be able to threaten people before it exists, but it can within Timeless Decision Theory! The AI just “precommits” to cloning you and torturing your clone if you didn’t donate all your money to MIRI. By the time the AI exists, you will already have donated your money to MIRI or not, so the AI’s precommitment doesn’t make any sense, but with the magic of Timeless Decision Theory it makes sense.
Of course, this is a friendly AI, so it will only do this if it thinks the threat will actually affect your behavior. So the prerequisite for the AI torturing you is that you know about Roko’s Basilisk, and believe in it. This is a departure from Pascal’s Wager, since Pascal’s Wager ostensibly applies to everyone, not just people who believe in Pascal’s Wager. Roko’s Basilisk only applies to people who accept about five extremely specific ideas, one of which Yudkowsky personally invented.
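To spell out the structure, here’s a toy sketch in the same spirit as the Newcomb examples above, with made-up utilities and with the “you are your clones” premise baked in. It just restates the argument: if the “precommitted” threat moves you, donating comes out ahead; if you refuse to be moved by it, there’s nothing for the AI to gain by carrying it out.

```python
# Toy sketch of the basilisk's threat (made-up utilities).
DONATE_COST = 10        # disutility of handing your money to MIRI
CLONE_TORTURE = 1_000   # disutility of your future clone's torture,
                        # counted as yours per the "you are your clones" premise

def utility(you_donate, threat_moves_you):
    u = -DONATE_COST if you_donate else 0
    # Per the argument, the AI only carries out the threat against people whose
    # behavior the threat could actually have influenced.
    if threat_moves_you and not you_donate:
        u -= CLONE_TORTURE
    return u

for threat_moves_you in (True, False):
    print("threat moves you:", threat_moves_you,
          {("donate" if d else "keep money"): utility(d, threat_moves_you)
           for d in (True, False)})
# If the threat moves you, donating maximizes your utility; if you refuse to let
# it move you, nothing happens either way.
```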
Supposedly this has caused some people actual distress, to the extent that content warnings may be appropriate. I don’t feel it is right to mock people who likely have an underlying mental health condition. So, in all seriousness, check out the RationalWiki article for refutations if none immediately come to mind. One refutation, which does not deny any of the premises, is that you can just precommit to not letting the basilisk affect your choices.
Roko’s Basilisk was not necessarily believed by LW, although they did believe all of its premises. Mostly it’s hard to tell what people thought of it, since the subject was banned. It was not banned for being correct; it was banned because, even if incorrect, it might cause someone harm. And Yudkowsky feared that if people thought about it too much, they might come up with a basilisk that works. Of course, the ban backfired because of the Streisand effect. It was eventually unbanned five years later.
There’s a moral somewhere in this story, but I don’t have the galaxy brain needed to figure it out.