# The information theory of Mysterium

A question that keeps me up at night is “What is the theoretical best you can do in Mysterium?” I’m exaggerating a bit, but it is a pointless question about a silly board game that I nonetheless spent too long thinking about. I went so far as to watch a series of lectures about information theory, listening to it in the background while in dance class, as one does. I never solved the problem, but let me at least explain what the problem is.

## Mysterium

Mysterium is a cooperative board game where the players are trying to solve a murder mystery via psychic communication with the victim. One player takes the role of the ghost, and the rest take the role of psychic mediums. The ghost is not allowed to speak, and may only communicate through cryptic visions. The visions are represented by cards with surreal artwork. For example, one card has two people climbing into a giant fish mouth, another has a tarantula-like thing over a chandelier. After the mediums receive their visions, they discuss what they mean and make their guesses; and the whole time the ghost giggles about how wrong they are.

*Examples of vision cards. Source: Mysterium rulebook.*

Mechanically, the ghost has a hand of seven cards. Each turn, the ghost gives one or more cards to a medium, and then draws back up to seven. There are other rules but I’m going to ignore them, because the problem is already too difficult as it is.

Suppose that the medium is deciding between seven different suspects. One strategy that the ghost could use is to totally ignore the card illustrations, and give the medium a number of cards equal to the index of the correct suspect. So if the medium receives one card, it’s the first suspect; if they receive two cards, it’s the second suspect; and so on.

This is obviously not the way the game is intended to be played. It’s a boring and abusive strategy that would totally break the game. Let us not pretend for a moment that mathematically solving this problem has any relevance to how the game is actually played. We’re not playing board games here, we’re just doing math for math’s sake.

So, as a lower bound, the ghost could help the medium choose between seven options. But the ghost could do even better by considering the content of each vision. The challenge is that it’s not deterministic what cards the ghost even has available. The ghost has a hand of seven cards, out of 84 cards in the deck. If there’s one card that would perfectly communicate what the ghost wants to say, but that card isn’t in the ghost’s hand, then it’s no use. Furthermore, the medium does not know which cards the ghost has available.

## Noisy Channel Coding

This problem may be situated within information theory, and more specifically the theory of noisy channel coding. Suppose that we want to communicate some amount of data through a wire, but the wire introduces some noise. We can think of the data as a series of bits, ones and zeros, but the wire occasionally flips a bit. One way to correct for error is to send three copies of the data. So if one bit is flipped, you can still extract the correct data from the other two copies. Unless, of course, two bits are flipped, then there’s still an error. So maybe send five copies of the data. But you’re still not completely eliminating the error, and you’re blowing up the size of the data you need to send.
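The tradeoff is easy to see in a quick simulation. This is a toy sketch of my own (none of the names come from any standard library for this):

```python
import random

def transmit(bits, flip_prob, rng):
    """Binary symmetric channel: flip each bit independently with probability flip_prob."""
    return [b ^ (rng.random() < flip_prob) for b in bits]

def encode_repetition(bits, n=3):
    """Send n copies of every bit."""
    return [b for b in bits for _ in range(n)]

def decode_majority(received, n=3):
    """Recover each bit by majority vote over its n copies."""
    return [int(sum(received[i:i + n]) > n // 2) for i in range(0, len(received), n)]

rng = random.Random(0)
message = [rng.randint(0, 1) for _ in range(1000)]
received = transmit(encode_repetition(message), flip_prob=0.1, rng=rng)
decoded = decode_majority(received)
errors = sum(m != d for m, d in zip(message, decoded))
# With 3 copies and a 10% flip rate, the per-bit error rate drops from
# 10% to about 3*(0.1**2)*0.9 + 0.1**3 = 2.8%, but we send 3x the data.
```

Adding more copies squashes the error rate further, but the effective rate keeps shrinking toward zero, which is exactly the tradeoff at issue.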

As we try to solve this problem, we’re making tradeoffs between two objectives. First, we’d like to reduce the error rate. Second, we’d like to communicate the data quickly. The wire can communicate a certain number of bits per second, and we’d like to communicate some number of megabytes of data as fast as possible, with error rate under some threshold. How do we do it?

It turns out that we don’t need to worry about the tradeoff, because it’s possible to bring the error rate arbitrarily close to zero. According to Shannon’s noisy-channel coding theorem, there exist coding strategies that push the error rate as close to zero as we like, while still communicating at any rate below a certain limit. That limiting rate is called the channel capacity.
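For the bit-flipping wire above (a binary symmetric channel with flip probability f), the capacity has a well-known closed form, C = 1 − H(f), where H is the binary entropy:

```python
from math import log2

def bsc_capacity(f):
    """Capacity of a binary symmetric channel with flip probability f, in bits per use."""
    if f in (0.0, 1.0):
        return 1.0  # a deterministic flip carries full information
    return 1 + f * log2(f) + (1 - f) * log2(1 - f)

# A wire that flips 10% of bits can still carry about 0.53 error-free
# bits per raw bit sent, far better than naive repetition manages.
print(round(bsc_capacity(0.1), 3))   # → 0.531
```

So over a 10%-noisy wire, good codes need only about 1.9 raw bits per data bit, with error rate driven arbitrarily low.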

In Mysterium, the ghost is essentially communicating over a noisy channel. Each turn, the ghost can send basically 7 bits, since they have seven cards, and for each card they can choose whether or not to send it. However, the bits get garbled, and transformed into a series of surreal images.

To illustrate the process, imagine that we’ve numbered each card (and there are 84 of them, so they’re numbered 1 to 84). Suppose the ghost has cards numbered 1, 6, 26, 51, 57, 63, and 64. The ghost wants to point the finger at a particular suspect, and this message gets encoded into a series of 7 bits: “0101011”. So the ghost passes the cards numbered 6, 51, 63, and 64. The psychic medium only sees this series of numbers, and does not know what cards the ghost had to choose from. How many bits of information can the ghost communicate without error per turn?
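Using that numbering, the encoding step can be sketched as follows (a toy helper of my own, not anything from the game):

```python
def ghost_encode(hand, bits):
    """Pass the cards whose position in the sorted hand carries a '1' bit."""
    return [card for card, bit in zip(sorted(hand), bits) if bit == "1"]

# The worked example from the text: message "0101011" selects the
# 2nd, 4th, 6th, and 7th lowest cards in the hand.
print(ghost_encode([1, 6, 26, 51, 57, 63, 64], "0101011"))
# → [6, 51, 63, 64]
```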

## The Channel Capacity of Mysterium

This question is too hard, so I’d like to simplify even further by considering the case where the ghost only has two cards in hand. The ghost is required to pass at least one card, leaving three options: pass both cards, pass the lower card, or pass the higher card. We can denote each of these options with bits: “11”, “10”, and “01”. When there are three options of equal probability, the number of bits is equal to log2(3), or about 1.58 bits. However, when the psychic medium sees a single card, they will not know whether it was the higher card or lower card in the ghost’s hand. The best that the psychic can do is guess that if the card is 42 or below, then it was the lower card; and if the card is 43 or above, then it was the higher card.

As a result, when the ghost passes a single card, 25% of the time the medium will be mistaken about whether it was the higher or lower card. 25% of the time, a “01” will be misinterpreted as a “10”, and 25% of the time, a “10” will be misinterpreted as a “01”.
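We can check that 25% figure by enumerating every possible two-card hand. It comes out just under 25%, because the two cards can never be equal:

```python
from itertools import combinations

pairs = list(combinations(range(1, 85), 2))   # every (lower, higher) two-card hand

# Ghost passes the lower card; the medium errs when that card is 43 or above.
low_errors = sum(lo > 42 for lo, hi in pairs)
# Ghost passes the higher card; the medium errs when that card is 42 or below.
high_errors = sum(hi <= 42 for lo, hi in pairs)

print(low_errors / len(pairs))   # → 0.24699..., just under 25%
```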

Shannon’s noisy-channel coding theorem is non-constructive, meaning it won’t tell you how to attain the channel capacity. However, we can at least calculate the channel capacity, which is equal to the maximum possible mutual information between the input and the output. I’d rather not go through all the math (lest I reveal how shaky my own understanding is), but I’ll share the answer I got from making estimates in Excel. The channel capacity is about 1.097 bits per turn. In order to attain this capacity, the ghost should use the message “11” about 47% of the time. About 91% of the information is conveyed through the number of cards, and only 9% through the choice of which card.
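Here is roughly the same calculation in Python instead of Excel, assuming (as above) a flat 25% misread probability whenever a single card is passed:

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(q * log2(q) for q in probs if q > 0)

def mutual_information(p):
    """I(X;Y) for the two-card channel.

    Input X is '11' with probability 1-2p, and '01' or '10' with
    probability p each. '11' arrives intact; each single-card message
    is misread as the other one 25% of the time.
    """
    h_y = entropy([1 - 2 * p, p, p])             # the two 25% flips cancel by symmetry
    h_y_given_x = 2 * p * entropy([0.75, 0.25])  # noise only affects single cards
    return h_y - h_y_given_x

# Grid search for the capacity-achieving input distribution.
best_p = max((i / 10000 for i in range(1, 5000)), key=mutual_information)
print(round(best_p, 3), round(mutual_information(best_p), 3))
# → 0.266 1.097  (send "11" about 47% of the time; ~1.097 bits per turn)
```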

(In answering this question, there was one factor that I did not know how to account for. In the typical noisy channel problem, we assume that the noise is random and unpredictable. However, in this case, the ghost actually knows exactly what the noise is before choosing what message to send. How does this affect the calculation? I don’t even know.)

What about larger hand sizes? Gosh, solving it with two cards was hard enough. To do more cards, I’d need to calculate an exponentially large confusion matrix, and solve multiple logarithmic equations simultaneously. I think a good math scholar could solve it and write a paper about it, but it’s above my pay grade. There’s a phrase that math folks use in a situation like this: “Left as an exercise for the reader.”

1. Perfect Number says

So in this model, the ghost wants to send a 7-bit string, where the first bit is the lowest card they have available in their hand, etc, and the “error” comes from the fact that the medium doesn’t know which bit is which. What if they did it like this instead: to make the first bit a 1, send a card from the range 1-12, for the second bit, a card from 13-24, and so on, because there are 84 cards. And then the error comes from the ghost not having the right cards in their hand. When the medium receives some combination of cards and there isn’t one from the range 1-12, the medium knows that either the first bit of the message is 0, *or* the ghost wanted to send a 1 but didn’t have the right card. And there’s a certain probability of not having the right card.

I’d have to put more thought into the exact strategy: the idea is to have enough redundancy in what means what that there’s a low probability that you just don’t have the cards to communicate anything meaningful at all. Not sure if this is an improvement or not compared to your approach.

I’m thinking about the case where the ghost wants to send something like 0000100 but they know that if they send their 5th card, it will get interpreted as being bit 4 instead because it happens to be in the range 37-48. And if the medium is going to guess which bit is indicated by which range it falls into, the ghost should base the strategy on that, rather than the order of the cards they happen to have.

But also maybe that specific problem wouldn’t happen very much? The medium would have to guess based on what range the card is in, and get it wrong, only a small fraction of the time (how small though?) and the greatest risk of getting it wrong is when only 1 card is sent (I think? I could be wrong).

2. says

@Perfect Number,
I think it’s an improvement, if the ghost has at least 3 cards. (It doesn’t seem to come into play for 2 cards as far as I could tell.) But if you have three cards, you know that the psychic will interpret 1-28 as 100, 29-56 as 010, and 57-84 as 001. So if the ghost wants to communicate 010, but the only card in the range 29-56 is the highest card not the middle card, the ghost should just choose that card. So I think the ghost gains an advantage by knowing what the noise is ahead of time.

There may also be wholly different ways to think of the problem. For example, instead of ranking the cards and passing the middle card, the ghost could pass the card that has the lowest value when you calculate N-29 mod 84. And it gets much more complicated when passing more than one card.

3. JM says

That only certain codes are subject to noise means that the error correction can be shortcut slightly, so you should be able to beat the normal limit by a small amount. I have no idea how to calculate how much. As a minimal example: if you error-check every 8 bits, and 11 is encoded as 11, then when the message is 11111111 you can skip error checking and move on to the next byte. A more complex algorithm would adjust the amount of error checking based on the number of parts that can be wrong.

4. says

@JM,
If you’re thinking of error correcting codes, you’re thinking of it on a different level from me. I was calculating the channel capacity, which is the theoretical best that any possible error code could do in this case. It’s a non-constructive proof, so I never construct the error code itself. The channel capacity calculation already accounts for what you describe.

The channel capacity is equal to the maximum possible mutual information between the input and output. The ghost has a couple free parameters in their encoding strategy: p is the probability that they send 01, and q is the probability that they send 10 (and 1-p-q is the probability that they send 11). We can use symmetry to argue that p=q. I used Excel to calculate mutual information as a function of p, and found that it hit its maximum at about 26.5%.

Qualitatively, it makes sense that in the optimal strategy, the ghost uses “11” more than either of the other two messages. Because, as you say, it doesn’t require error checking.

5. JM says

I understand that the limit here includes error coding. I was not aware there is a version of the Shannon-Hartley limit that allows for different levels of noise on different signal parts. Since you already accounted for that, it changes what I said a bit. Knowing, after sending a turn’s information, whether it will be received correctly or not still allows for saving some space. You can send an encoded byte over some number of turns and then send 1 card if the byte was wrong and 2 cards if correct. If the byte was wrong, resend; if not, move on to the next byte.
If the number of resends allowed is capped then the potential for error still exists but both sides would be aware that it was wrong.
I find trying to think of an efficient way to send a byte annoying. I can’t think of an efficient way to do the packing that doesn’t run into problems because you can’t send 00. Which I’m guessing is why the calculated bandwidth is just slightly over 1 bit per turn. You can easily get perfect transmission at slightly under 1 bit per turn by using 1 card for 0 and 2 cards for 1, with a loss of bandwidth only for control signals.

6. says

@JM,
If you’re willing to spend some time, the 8th lecture in the series discusses a proof of Shannon’s noisy-channel coding theorem, and in the process also describes a framework to produce an encoding.

My attempt to explain: He describes an NxM parity check matrix of 1s and 0s. The idea is that the sender will always try to send strings of messages v, such that Mv = 0. And so if the receiver gets a message v such that Mv != 0, then they try to infer what the original message was. The proof doesn’t construct the matrix M, but rather performs a sum over all possible matrices, and proves that there exists a matrix that suffices.
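To make the Mv = 0 idea concrete, here’s a toy check using the classic (7,4) Hamming code’s parity-check matrix (a specific small matrix chosen by hand, unlike the random matrices the proof sums over):

```python
# Parity-check matrix of the (7,4) Hamming code: column j is the binary
# expansion of j+1, so a single flipped bit's syndrome spells out the
# position of the error.
H = [[(j >> k) & 1 for j in range(1, 8)] for k in (2, 1, 0)]

def syndrome(v):
    """Compute H·v mod 2; all zeros means v satisfies every parity check."""
    return [sum(h * x for h, x in zip(row, v)) % 2 for row in H]

codeword = [1, 1, 1, 0, 0, 0, 0]            # satisfies Hv = 0
assert syndrome(codeword) == [0, 0, 0]

noisy = codeword[:]
noisy[4] ^= 1                               # the channel flips bit 5
s = syndrome(noisy)                         # → [1, 0, 1], binary for 5
error_pos = int("".join(map(str, s)), 2) - 1
assert error_pos == 4                       # receiver locates and fixes the flip
```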

I do think that knowing the error ahead of time gives an opportunity for optimization. When I did my calculation for 2 cards, I assumed that whenever it was impossible to send “01”, then “10” would be sent instead. But the ghost can strategize around this, for instance by sending “11” when they foresee an error. I haven’t checked, but it’s possible you could increase the mutual information with strategies like these.