Large language models don’t really work with language, at least not as we usually think of it.
At their heart, LLMs are a sophisticated version of “guess the next number in the sequence.” Their input is a long list of integers, and their output is a list of fractional values, one for each integer they could have been fed. The likelihood of any given number coming next is proportional to the value the LLM outputs for it.

We can collapse these probabilities down into a single “canonical” output by randomly picking one of those integers, weighted by its likelihood. If the LLM is being trained, that output integer is compared against what actually came next, and the LLM is adjusted to (hopefully!) be more likely to output the correct integer.

Want more than one integer? Shift all the input numbers up one space, discarding the first and appending the output integer to the end, then re-run the LLM. Repeat until no integer is all that likely, or the most likely integer is one you’ve interpreted to mean “stop running the LLM,” or you just get bored of all this.
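Here’s a minimal sketch of that generation loop (training is left out). Everything here is made up for illustration: `score_next` is a hypothetical stand-in for the LLM itself, and `VOCAB_SIZE` and `STOP_TOKEN` are assumed toy values, not anything from a real model.

```python
import math
import random

VOCAB_SIZE = 256   # assumed toy vocabulary: how many integers the model knows
STOP_TOKEN = 0     # assumed integer we interpret as "stop running the LLM"

def score_next(tokens: list[int]) -> list[float]:
    """Stand-in for the LLM: one fractional score per possible integer.

    A real LLM computes these with a neural network; here we just derive
    deterministic toy scores from the input so the function depends on it.
    """
    random.seed(sum(tokens))
    return [random.uniform(-1.0, 1.0) for _ in range(VOCAB_SIZE)]

def sample(scores: list[float]) -> int:
    """Collapse the scores into one integer, picked in proportion to likelihood."""
    weights = [math.exp(s) for s in scores]  # make all weights positive
    return random.choices(range(VOCAB_SIZE), weights=weights)[0]

def generate(tokens: list[int], context_size: int, max_steps: int) -> list[int]:
    for _ in range(max_steps):               # "...or you just get bored of all this"
        nxt = sample(score_next(tokens[-context_size:]))  # shifted window, re-run
        if nxt == STOP_TOKEN:                 # the "stop" interpretation
            break
        tokens.append(nxt)                    # append the output, repeat
    return tokens

print(generate([5, 12, 7], context_size=8, max_steps=20))
```

The only moving parts are the window slide and the weighted pick; swap `score_next` for an actual network and this is the whole loop.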