Two kinds of LLM hallucinations

After writing about LLM error rates, I wanted to talk about a specific kind of error: the hallucination. I am aware that there is a lot of research into this subject, so I decided to read a scholarly review:

“Survey of Hallucination in Natural Language Generation” by Ziwei Ji et al (2023), publicly accessible on arxiv.

I’m not aiming to summarize the entire subject, but rather to answer a specific question: Are hallucinations are an effectively solvable problem, or are they here to stay?

What is a hallucination?

“Hallucination” is a term used in the technical literature on AI, but it’s also entered popular usage. I’ve noticed some differences, and I’d like to put the two definitions in dialogue with each other.

[Read more…]

Let’s Read: Transformer Models, Part 3

This is the final part of my series reading “Attention is all you need”, the foundational paper that invented the Transformer model, used in large language models (LLMs). In the first part, we covered some background, and in the second part we reviewed the architecture of the Transformer model. In this part, we’ll discuss the authors’ arguments in favor of Transformer models.

Why Transformer models?

The authors argue in favor of Transformers in section 4 by comparing them to previously extant options, namely recurrent neural networks (RNNs) and convolutional neural networks (CNNs).

[Read more…]

Let’s Read: Transformer Models, Part 2

This article is a continuation of my series reading “Attention is all you need”, the foundational paper that invented the Transformer model, which is used in large language models (LLMs).

In the first part, I covered general background. This part will discuss Transformer model architecture, basically section 3 of the paper. I aim to make this understandable to non-technical audiences, but this is easily the most difficult section. Feel free to ask for clarifications, and see the TL;DRs for the essential facts.

The encoder and decoder architecture

The first figure of the paper shows the architecture of their Transformer model:

Figure 1 from “Attention is all you need”

[Read more…]

Let’s Read: Transformer Models, Part 1

Large Language Models (LLMs) are a hot topic today, but few people know even the basics of how they work. I work in data science, but I also didn’t really know how they work. In this series, I’d like to go through the foundational paper that defined the Transformer model on which LLMs are based.

“Attention is all you need” by Ashish Vaswani et al. from the Proceedings of the 31st International Conference on Neural Information Processing Systems, December 2017. https://dl.acm.org/doi/10.5555/3295222.3295349 (publicly accessible)

This series aims to be understandable to a non-technical audience, but will discuss at least some of the technical details. If the technical parts are too difficult, please ask for clarification in the comments. You’re also welcome to just read the TL;DR parts, which should contain the essential points.

[Read more…]

Origami: Aperiodic Chevron Tessellation

Aperiodic Chevron Tessellation, designed by me

Did you hear? Someone discovered an aperiodic monotile! Obviously, these are origami life goals. And, I’m making it out like a joke, but I’m pretty sure I’m not the only origamist who was thinking that.

Oh, but this origami isn’t the aperiodic monotile. Instead, I read their paper, and was inspired to create a different aperiodic tiling. And in the mean time, I learned how an aperiodic tile ticks.

[Read more…]

Paper: The statistical mechanics of music

Today I will discuss:

“The structure of musical harmony as an ordered phase of sound: A statistical mechanics approach to music theory” by Jesse Berezovsky in Science Advances (2019). Publicly accessible

I don’t remember where I found this paper, but at some point I wrote it on the back of my hand, so to speak, and it sounds intriguing. This paper uses statistical physics methods to try to explain music. In particular, it’s interested in explaining tuning systems, especially 12 equal divisions of the octave (12edo), as a way of minimizing dissonance while maximizing musical possibility.

Initially I’m quite skeptical, and you should be too. If I were more familiar with world music traditions, I’m sure could point out several traditions that violate this paper’s assumptions, including traditions that don’t use 12edo, and traditions that aren’t clearly trying to minimize dissonance. Even in western musical systems, there’s quite a lot of emphasis on the dissonant major 7th, which calls into question how much minimizing dissonance is really the goal. Nonetheless, it seems an interesting exercise to see how much we can predict from these assumptions, and if the predictions don’t match reality we can later back up and consider where it went wrong.

[Read more…]

A Trivial Knot

Everything is simple except when not

No bread for you, only circuses

The Greater Gardening of 2026 - Part 18 - Potato Potential

The Democratic Party establishment has to be overthrown

New on OnlySky: Don't look now, but Ukraine is winning

Concepting the Rock Epic

Review: I Want to Be a Wall

Will This Be the Coldest Summer of the Rest of Your Life?

How About Something Cool Now That It's Too Late for Us?

"Wheel in the sky"

Two kinds of LLM hallucinations

Let’s Read: Transformer Models, Part 3

Let’s Read: Transformer Models, Part 2

Let’s Read: Transformer Models, Part 1

Origami: Aperiodic Chevron Tessellation

Paper: The statistical mechanics of music