LLM error rates

I worked on LLMs, and now I got opinions. Today, let’s talk about when LLMs make mistakes.

On AI Slop

You’ve already heard of LLM mistakes, because you’ve seen them in the news. For instance, some lawyers submitted bogus legal briefs–no, I mean those other lawyers–no the other ones. Scholarly articles have been spotted with clear chatGPT conversation markers. And Google recommended putting glue on Pizza. People have started calling this “AI Slop”, although maybe the term refers more to image generation rather than text? This blog post is focused exclusively on text generation, and mostly for non-creative uses.

[Read more…]

Environmental impact of LLMs

A reader asked: what is the environmental impact of large language models (LLMs)? So I read some articles on the subject, and made comparisons to other technologies, such as video games and video streaming. My conclusion is that the environmental footprint is large enough that we shouldn’t ignore it, but I think people are overreacting.

Pricing

I’m not an expert in assessing environmental impact, but I’ve had a bit of experience assessing computational prices for LLMs. Pricing might be a good proxy for carbon footprint, because it doesn’t just represent energy costs, but also the costs of building and operating a data center. My guess is that across many different kinds of computation tasks, the carbon footprint per dollar spent is roughly similar. And in my experience, LLMs are far from the most significant computational cost in a typical tech company.

[Read more…]

I worked on LLMs

Generative AI sure is the talk of the town these days. Controversial, is it not? It’s with some trepidation that I disclose that I spent the last 6 months becoming an expert in large language models (LLMs). Earlier this year when I moseyed through the foundational LLM paper, that was where it began.

I’d like to start talking about this more, because I’ve been frustrated with the public conversation. Among both anti-AI folks as well as AI enthusiasts, people have weird, impossible expectations from LLMs, while being ignorant of other capabilities. I’d like to provide a reality check, so that readers can be more informed as they argue about it.

[Read more…]

Let’s Read: Transformer Models, Part 3

This is the final part of my series reading “Attention is all you need”, the foundational paper that invented the Transformer model, used in large language models (LLMs). In the first part, we covered some background, and in the second part we reviewed the architecture of the Transformer model. In this part, we’ll discuss the authors’ arguments in favor of Transformer models.

Why Transformer models?

The authors argue in favor of Transformers in section 4 by comparing them to previously extant options, namely recurrent neural networks (RNNs) and convolutional neural networks (CNNs).

[Read more…]

Let’s Read: Transformer Models, Part 2

This article is a continuation of my series reading “Attention is all you need”, the foundational paper that invented the Transformer model, which is used in large language models (LLMs).

In the first part, I covered general background. This part will discuss Transformer model architecture, basically section 3 of the paper. I aim to make this understandable to non-technical audiences, but this is easily the most difficult section. Feel free to ask for clarifications, and see the TL;DRs for the essential facts.

The encoder and decoder architecture

The first figure of the paper shows the architecture of their Transformer model:

Figure 1 from “Attention is all you need”

[Read more…]

Let’s Read: Transformer Models, Part 1

Large Language Models (LLMs) are a hot topic today, but few people know even the basics of how they work. I work in data science, but I also didn’t really know how they work. In this series, I’d like to go through the foundational paper that defined the Transformer model on which LLMs are based.

“Attention is all you need” by Ashish Vaswani et al. from the Proceedings of the 31st International Conference on Neural Information Processing Systems, December 2017. https://dl.acm.org/doi/10.5555/3295222.3295349 (publicly accessible)

This series aims to be understandable to a non-technical audience, but will discuss at least some of the technical details. If the technical parts are too difficult, please ask for clarification in the comments. You’re also welcome to just read the TL;DR parts, which should contain the essential points.

[Read more…]

Response to Dr. Collier on AI

I obviously watch a lot of youtube video essays, so I frequently get recommended thinkpieces about the problems with AI. And I don’t watch them because I am literally a professional in the field and I don’t need some vlogger to ramble at me about something I generally understand better than they do. But I watched one of these videos, and I disagree on some points, and now you’re going to hear about it.

The video is “AI does not exist but it will ruin everything anyway” by Dr. Angela Collier. It is not necessarily the best example to highlight (it’s an hour of rambling, I respect that most readers will not want to watch that), but it’s the one I watched, okay? I’m going to structure this as a list of items, starting with the most fact-based items, moving towards my more contentious opinions.

[Read more…]

A Trivial Knot

Everything is simple except when not

Goldman v. Landers in NY-10.

Scampering Animals

I told you I might go see Disclosure Day

What may be in the rumored US-Iran peace deal?

The wee gardening of 2026

We missed our chance. Now what?

The Probability Broach: Robert's rules of order

Happy Facial Recognition Day!

Link Roundup: June 2026

LLM's Shouldn't Code

LLM error rates

Environmental impact of LLMs

I worked on LLMs

Let’s Read: Transformer Models, Part 3

Let’s Read: Transformer Models, Part 2

Let’s Read: Transformer Models, Part 1

Response to Dr. Collier on AI