Let’s Read: Transformer Models, Part 3

This is the final part of my series reading “Attention is all you need”, the foundational paper that invented the Transformer model, used in large language models (LLMs). In the first part, we covered some background, and in the second part we reviewed the architecture of the Transformer model. In this part, we’ll discuss the authors’ arguments in favor of Transformer models.

Why Transformer models?

The authors argue in favor of Transformers in section 4 by comparing them to previously extant options, namely recurrent neural networks (RNNs) and convolutional neural networks (CNNs).
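
To put rough numbers on that comparison, here is a small Python sketch based on the per-layer complexity figures in Table 1 of the paper: O(n²·d) operations for a self-attention layer, O(n·d²) for a recurrent layer, and O(k·n·d²) for a convolutional layer with kernel width k. Here n is the sequence length and d is the representation dimension. Constant factors are dropped, and the example sizes (n = 50, d = 512, k = 3) are my own choices for illustration, so treat the output as orders of magnitude only.

```python
# Rough per-layer cost model following the big-O entries in Table 1 of
# "Attention is all you need". Constant factors are ignored, so only the
# relative sizes of the numbers mean anything.

def per_layer_cost(n: int, d: int, k: int = 3) -> dict[str, int]:
    """Approximate operations per layer.

    n: sequence length, d: representation dimension,
    k: convolution kernel width (k = 3 is an illustrative assumption).
    """
    return {
        "self-attention": n * n * d,     # O(n^2 * d): every position attends to every other
        "recurrent": n * d * d,          # O(n * d^2): a d x d matrix-vector product per position
        "convolutional": k * n * d * d,  # O(k * n * d^2)
    }


def sequential_steps(n: int) -> dict[str, int]:
    """Operations that must run one after another, i.e. cannot be parallelized."""
    return {
        "self-attention": 1,  # O(1): all positions are processed at once
        "recurrent": n,       # O(n): each step needs the previous hidden state
        "convolutional": 1,   # O(1)
    }


if __name__ == "__main__":
    # Illustrative sizes: a 50-token sentence and d = 512.
    n, d = 50, 512
    costs, steps = per_layer_cost(n, d), sequential_steps(n)
    for name in costs:
        print(f"{name:15s} ~{costs[name]:>12,d} ops, {steps[name]:>3d} sequential steps")
```

With these sizes, self-attention needs about a tenth of the operations of a recurrent layer (1,280,000 versus 13,107,200), and none of its work has to wait on a previous step. The advantage flips once n grows larger than d: at n = 1000 the n² term makes self-attention the more expensive of the two, which is why the paper notes that self-attention is faster when the sequence length is smaller than the representation dimension.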

Let’s Read: Transformer Models, Part 2

This article is a continuation of my series reading “Attention is all you need”, the foundational paper that invented the Transformer model, which is used in large language models (LLMs).

In the first part, I covered general background. This part will discuss Transformer model architecture, basically section 3 of the paper. I aim to make this understandable to non-technical audiences, but this is easily the most difficult section. Feel free to ask for clarifications, and see the TL;DRs for the essential facts.

The encoder and decoder architecture

The first figure of the paper shows the architecture of their Transformer model:

Figure 1 from “Attention is all you need”: diagram of the Transformer architecture

Let’s Read: Transformer Models, Part 1

Large Language Models (LLMs) are a hot topic today, but few people know even the basics of how they work. I work in data science, but until recently I didn’t really know either. In this series, I’d like to go through the foundational paper that defined the Transformer model on which LLMs are based.

“Attention is all you need” by Ashish Vaswani et al. from the Proceedings of the 31st International Conference on Neural Information Processing Systems, December 2017. https://dl.acm.org/doi/10.5555/3295222.3295349 (publicly accessible)

This series aims to be understandable to a non-technical audience, but will discuss at least some of the technical details. If the technical parts are too difficult, please ask for clarification in the comments. You’re also welcome to just read the TL;DR parts, which should contain the essential points.

Origami: Aperiodic Chevron Tessellation

Aperiodic Chevron Tessellation, designed by me

Did you hear? Someone discovered an aperiodic monotile! Obviously, these are origami life goals. I’m making it sound like a joke, but I’m pretty sure I’m not the only origamist who was thinking that.

Oh, but this origami isn’t the aperiodic monotile. Instead, I read their paper and was inspired to create a different aperiodic tiling. In the meantime, I learned what makes an aperiodic tile tick.

Paper: The statistical mechanics of music

Today I will discuss:

“The structure of musical harmony as an ordered phase of sound: A statistical mechanics approach to music theory” by Jesse Berezovsky in Science Advances (2019). Publicly accessible

I don’t remember where I found this paper, but at some point I wrote it on the back of my hand, so to speak, and it sounded intriguing. This paper uses statistical physics methods to try to explain music. In particular, it aims to explain tuning systems, especially 12 equal divisions of the octave (12edo), as a way of minimizing dissonance while maximizing musical possibility.

Initially I’m quite skeptical, and you should be too. If I were more familiar with world music traditions, I’m sure I could point out several that violate this paper’s assumptions, including traditions that don’t use 12edo and traditions that aren’t clearly trying to minimize dissonance. Even in Western music, there’s quite a lot of emphasis on the dissonant major 7th, which calls into question how much minimizing dissonance is really the goal. Nonetheless, it seems an interesting exercise to see how much we can predict from these assumptions; if the predictions don’t match reality, we can later back up and consider where things went wrong.
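
For readers unfamiliar with 12edo, the arithmetic is simple: each of the 12 steps multiplies the frequency by 2^(1/12), so that 12 steps make exactly one octave (a frequency ratio of 2). The Python sketch below measures, in cents (1200 per octave), how far a few 12edo intervals land from the simple whole-number ratios they are usually said to approximate. It only illustrates the tuning system itself; it is not the paper’s statistical mechanics model, and the choice of target ratios is my own.

```python
import math

EDO = 12  # 12edo: the octave (frequency ratio 2) split into 12 equal steps


def edo_ratio(steps: int, edo: int = EDO) -> float:
    """Frequency ratio of an interval that is `steps` equal steps wide."""
    return 2 ** (steps / edo)


def cents(ratio: float) -> float:
    """Size of an interval in cents; one octave is 1200 cents."""
    return 1200 * math.log2(ratio)


# Interval name -> (number of 12edo steps, simple ratio it approximates).
TARGETS = {
    "perfect fifth": (7, 3 / 2),
    "perfect fourth": (5, 4 / 3),
    "major third": (4, 5 / 4),
    "minor third": (3, 6 / 5),
}


def tuning_errors() -> dict[str, float]:
    """Cents by which each 12edo interval misses its simple-ratio target."""
    return {name: cents(edo_ratio(steps)) - cents(ratio)
            for name, (steps, ratio) in TARGETS.items()}


if __name__ == "__main__":
    for name, err in tuning_errors().items():
        print(f"{name:15s} {err:+6.2f} cents")
```

The fifth and fourth come out within about 2 cents of 3/2 and 4/3, while the thirds miss 5/4 and 6/5 by roughly 14 and 16 cents.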

Computational complexity of jigsaw puzzles

During the pandemic, I started doing more jigsaw puzzles. Not real puzzles, mind you: I found a jigsaw simulator on Steam that was fairly authentic to the real experience. And since I was doing jigsaw puzzles through the medium of video games, I couldn’t help but think about them in the context of puzzle video games. I realized that jigsaw puzzles are kind of weird! In a typical puzzle video game, the ideal is to have a set of levels, each of which requires some crucial insight. In contrast, a jigsaw puzzle is more like a large task that you chip away at.

One way of thinking about this is through the lens of computational complexity. Take Sokoban, the classic block-pushing puzzle upon which many puzzle video games are founded. In general, solving a Sokoban puzzle of size N requires exp(N) time in the worst case. However, a typical Sokoban game does not present worst-case puzzles; it presents a curated selection of puzzles that can be solved more quickly. This gives the solver an opportunity to feel clever, rather than just performing a computation.

Jigsaw puzzles, on the other hand, are about performing a computation. And, if you wish to do a large jigsaw puzzle in a reasonable amount of time, you look for ways to perform that computation efficiently. This raises the question: what is the computational complexity of a jigsaw puzzle?

According to the open access paper “No easy puzzles: Hardness results for jigsaw puzzles” by Michael Brand, realistic jigsaw puzzles require Θ(N²) steps both in the worst case and on average. On the other hand, this is not borne out by my own statistics, which seem to fit a straight line.
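
One intuition for why the step count could be quadratic: suppose a solver places pieces in a fixed order and, for each one, searches the loose pile piece by piece until the right one turns up. The Python sketch below counts the checks for that strategy. It is a hypothetical toy model of my own, not the model analyzed in Brand’s paper, and “one check” is an assumed unit of work. On average it needs n(n+3)/4 checks for n pieces, so doubling the puzzle size roughly quadruples the work.

```python
import random


def naive_solve_checks(n: int, rng: random.Random) -> int:
    """Count fit checks for a hypothetical naive strategy.

    Pieces are placed in a fixed order (0, 1, ..., n-1). To place each one,
    the solver digs through the shuffled pile of loose pieces, checking them
    one at a time until the right piece turns up.
    """
    pile = list(range(n))
    rng.shuffle(pile)
    checks = 0
    for target in range(n):
        for idx, piece in enumerate(pile):
            checks += 1          # one "does this piece go here?" check
            if piece == target:
                pile.pop(idx)    # placed pieces leave the pile
                break
    return checks


def average_checks(n: int, trials: int = 20, seed: int = 0) -> float:
    """Average number of checks over several random shuffles."""
    rng = random.Random(seed)
    return sum(naive_solve_checks(n, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    for n in (100, 200, 400):
        # Expected value is n(n+3)/4, so doubling n roughly quadruples the work.
        print(n, round(average_checks(n)), n * (n + 3) / 4)
```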

I read popular physics: A planet is born

This is part of my series where I read physics articles in Scientific American, and offer commentary as a former physicist.

I’ve had the June issue around for over a month, but I procrastinated too much and here we are in July. I’ll try to keep it short this time so I can move on to the next one.

The June issue is when COVID-19 really hits Scientific American. The cover says in big letters: “The Coronavirus Pandemic”. I already got the July issue, and the coronavirus is on the cover of that too. But cover notwithstanding, there are still physics articles, and that’s my area of expertise. This month’s article is “A Planet is Born”, by Meredith A. MacGregor. This one isn’t paywalled, so you’re free to read it yourself.
