Content note: I don’t have any links about the Trump administration today.
AIs Regurgitate Training Data | Reprobate Spreadsheet – Last month I wrote about the claim that AI “regurgitates training data”. Some people insist this virtually never happens; others insist it’s the only thing that ever happens. And I keep saying: you don’t know either way! It’s a question that can only be answered through empirical research. What the empirical research says is that models do it sometimes, and that’s bad enough. HJ discusses some of the research here.
But I have a bit of a critique. HJ describes a study that asked an LLM to predict number sequences, such as currency exchange rates. The predictions had lower root mean square error on sequences that appeared in the training data than on sequences that didn’t. The researchers call this “memorization”, and HJ calls it “regurgitation”, but I call it a textbook description of “overfitting”. Clearly the models are retaining excessive unwanted information from their training sets, but calling it “memorization” creates a false impression that it’s verbatim quoting, which it’s not.
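To make the distinction concrete, here’s a minimal sketch of the kind of measurement the study is doing: feed the model a prefix of each sequence, ask it to continue, and compare the RMSE on sequences that were in the training data against sequences that weren’t. This is not the study’s actual protocol; the `predict_fn` stand-in and the toy random-walk data are invented for illustration.

```python
import numpy as np

def rmse(predicted, actual):
    """Root mean square error between two equal-length numeric sequences."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))

def rmse_gap(predict_fn, seen_sequences, unseen_sequences, prefix_len=20):
    """Average RMSE on sequences the model trained on ("seen") versus
    sequences it never saw ("unseen"). A markedly lower error on the seen
    set is the textbook signature of overfitting."""
    def avg_error(sequences):
        errors = []
        for seq in sequences:
            prefix, target = seq[:prefix_len], seq[prefix_len:]
            predicted = predict_fn(prefix, len(target))
            errors.append(rmse(predicted, target))
        return float(np.mean(errors))
    seen_err = avg_error(seen_sequences)
    unseen_err = avg_error(unseen_sequences)
    return {"seen_rmse": seen_err, "unseen_rmse": unseen_err,
            "gap": unseen_err - seen_err}

if __name__ == "__main__":
    rng = np.random.default_rng(0)

    def make_seq():
        # Toy "exchange rate" sequence: a short random walk around 1.0.
        return np.cumsum(rng.normal(0, 0.01, 30)) + 1.0

    seen = [make_seq() for _ in range(10)]
    unseen = [make_seq() for _ in range(10)]

    # Hypothetical stand-in for the LLM: it has effectively memorized the
    # seen sequences (returns their true continuation plus tiny noise) and
    # otherwise just repeats the last prefix value -- purely illustrative.
    memorized = {tuple(np.round(s[:20], 6)): s for s in seen}
    def predict_fn(prefix, horizon):
        key = tuple(np.round(prefix, 6))
        if key in memorized:
            return memorized[key][-horizon:] + rng.normal(0, 0.001, horizon)
        return np.full(horizon, prefix[-1])

    print(rmse_gap(predict_fn, seen, unseen))
```

The point of the gap metric is exactly the overfitting framing: nothing about a lower seen-set RMSE tells you the model is quoting anything verbatim, only that it performs suspiciously better on data it has already absorbed.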
This is Arousal | No Pun Included (video, 20 min) – A board game critic traces a popular claim: that the most fun part of a board game is opening the box, and that reading the rulebook is where fun goes to die. It’s based on a small study of families playing Hasbro games, which measured physiological arousal rather than fun. It’s not a strong study, but you know, it’s just a grad student’s proof of concept, it’s fine, been there. It’s just wildly inappropriate to generalize it into a nugget of conventional wisdom. This video is a great example of science popularization done well in an unusual domain.