Aside: Let’s Bisect an LLM!

I previously took a lot of words to describe the guts of Markov chains and LLMs, and ended by pointing out that all LLMs can be split into two systems: one that takes in a list of tokens and outputs, for every possible token, the probability that it comes next; and a second that resolves those probabilities into a canonical next token. These two systems are independent, so in theory you could muck with any LLM by declaring an unlikely token to be the next one.
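That split can be sketched in a few lines. This is a toy with hard-coded numbers, not a real model; the function names and the three-token "vocabulary" are mine, invented for illustration.

```python
import math
import random

# System 1: map a token prefix to a probability for every candidate next token.
def next_token_probs(prefix):
    # A real LLM computes logits from the prefix; here we ignore it and
    # hard-code a toy distribution over a three-token vocabulary.
    logits = {"cat": 2.0, "dog": 1.0, "pelican": -1.0}
    total = sum(math.exp(v) for v in logits.values())
    return {tok: math.exp(v) / total for tok, v in logits.items()}

# System 2: resolve that distribution into one canonical next token.
def resolve(probs, rng):
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens])[0]

probs = next_token_probs(["the"])
print(resolve(probs, random.Random(0)))
```

Because the two systems only communicate through that probability table, you can swap `resolve` for anything you like, say, always picking the *least* likely token, without touching system 1 at all.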

Few users are granted that fine level of control, but it’s common to be given two coarse dials to twiddle. The “temperature” controls the relative likelihood of tokens, while the “seed” changes the sequence of random values relied on by the second system. The former is almost always a non-negative real number, the latter an arbitrary integer.
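Here is one common way those two dials are wired up, as a sketch: temperature divides the logits before the softmax (with zero conventionally meaning greedy decoding), and the seed initializes the random generator used to pick a token. The specific scheme is an assumption on my part; implementations vary.

```python
import math
import random

def sample(logits, temperature, seed):
    rng = random.Random(seed)         # the seed fixes system 2's randomness
    if temperature == 0:              # T = 0 is conventionally greedy decoding
        return max(logits, key=logits.get)
    scaled = {t: v / temperature for t, v in logits.items()}
    total = sum(math.exp(v) for v in scaled.values())
    probs = {t: math.exp(v) / total for t, v in scaled.items()}
    return rng.choices(list(probs), weights=list(probs.values()))[0]

logits = {"cat": 2.0, "dog": 1.0, "pelican": -1.0}
print(sample(logits, temperature=0, seed=1))    # always the most likely token
print(sample(logits, temperature=5.0, seed=1))  # high T flattens the distribution
```

Low temperatures sharpen the distribution toward the most likely token; high ones flatten it toward uniform. Holding both dials fixed makes the output reproducible, which is what makes the bisection in the title possible.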

Let’s take them for a spin.

[Read more…]

LLMs and Markov Chains

Pattern matching is a dangerous business, but this is now the second or third time I’ve seen LLMs compared to Markov chains in the span of a few weeks.

I think people want to characterize that as merely the output of a big semantic forest being used to generate markov chain-style output. It’s not that simple. Or, perhaps flip the problem on its head: if what this thing is doing is rolling dice and doing a random tree walk through a huge database of billions of word-sequences, we need to start talking about what humans do that’s substantially different or better. …

One thought I had one night, which stopped me dead in my tracks, for a while: if humans are so freakin’ predictable that you can put a measly couple billion nodes in a markov chain (<- that is not what is happening here) and predict what I’m going to say next, I don’t think I should play poker against the AI, either.

This seems to be an idea that’s floating out there, and while Ranum is not saying the two are equivalent, it’s now in the scientific record. Meanwhile, I’ve been using Markov chains for, oh, at least seven years, so I can claim to have some knowledge of them. Alas, I didn’t really define what a Markov chain was back then (and I capitalized “Chain”). Let’s fix half of that.
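For the impatient, a word-level Markov chain of the text-generating kind fits in a dozen lines. This is my own minimal sketch of the general technique, not code from Ranum or from my earlier posts: record which word follows which, then walk those transitions at random.

```python
import random
from collections import defaultdict

def build_chain(text):
    """Order-1 chain: each word maps to the list of words seen after it."""
    chain = defaultdict(list)
    words = text.split()
    for a, b in zip(words, words[1:]):
        chain[a].append(b)  # duplicates encode transition frequency
    return chain

def generate(chain, start, length, seed=0):
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        followers = chain.get(out[-1])
        if not followers:  # dead end: no word ever followed this one
            break
        out.append(rng.choice(followers))
    return " ".join(out)

chain = build_chain("the cat sat on the mat and the cat ran")
print(generate(chain, "the", 5))
```

The defining property is that the next word depends only on the current state (here, the single previous word), not on anything earlier, which is exactly where the comparison to LLMs starts to strain.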

[Read more…]