From the drafts: Data science as function fitting

Let’s dig out another article from my drafts bin, something I don’t think I was ever going to finish. This draft was titled “Data Science as function fitting”. It begins:

In the buzzword-ridden age of AI hype, perhaps there’s value in having more unmagical ways of talking about data science. My most unmagical description is that data science is basically function fitting. It’s like what Excel does when you ask it to draw a trendline through a set of data points.

So that’s the thesis statement. But is it a thesis worth arguing? It has a kind of “hot take” quality to me. When you have a large enough audience, you start to be aware that at least a few readers are experts, and will call you out on your bullshit. And so immediately after stating the thesis, I felt it was necessary to add a ton of qualifying statements. You know, like I didn’t really mean it.
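To make the trendline analogy concrete, here’s a minimal sketch in Python. This is my own illustration, not something from the draft: it assumes numpy is available, and the data points are made up. Excel’s linear trendline is just a least-squares fit of a straight line, and “prediction” is just evaluating the fitted function.

```python
import numpy as np

# A handful of (x, y) data points, like two columns of cells in a spreadsheet.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# A linear trendline is a least-squares fit of y = m*x + b.
m, b = np.polyfit(x, y, deg=1)
print(f"trendline: y = {m:.2f}x + {b:.2f}")

# Prediction is nothing more than evaluating the fitted function at a new x.
print("predicted y at x = 6:", m * 6 + b)
```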

[Read more…]

“Brain art” sparks controversy

content note: fiction

Recently, brain-interfacing technologies have been leveraged to make images and videos straight out of people’s heads. Some people are calling it art, but detractors say it isn’t art at all.

“It lacks any intentionality,” says prompt artist JustAlice. “All they do is lounge around, and the images are just handed to them. They don’t even need to verbalize what they want, or sort through results to choose the best one. And the results look like shit!” She shows me examples of what she considers bad brain art, furiously highlighting all the five-fingered hands. “Is it so hard to just pick up a keyboard?”

[Read more…]

Refreshed opinions on AI art

A couple years ago, I wrote several posts about AI art. But AI is a moving target, and there’s no sense in committing to one single view about it. So let’s reconsider.

1. AI art as theft

The argument against AI images that has had the most staying power is the idea that training an image generator on art is stealing from the artists. I’ve become somewhat more sympathetic to this argument over time.
[Read more…]

What does it mean that AI is “remixing existing work”?

Marcus reminded me of the common claim: “AI is just remixing existing works”. Or the more colorful version, “AI just regurgitates existing art”. This is in reference to creative uses of AI image generators or LLMs.

While there may be a grain of truth to the claim, I have difficulty making sense of what it’s even saying. It’s basically an unverifiable statement. I think both pro- and anti-AI folks would be better served by a more technical understanding. So, instead of being stuck at an impasse, we might actually be able to find answers.

[Read more…]

Two kinds of LLM hallucinations

After writing about LLM error rates, I wanted to talk about a specific kind of error: the hallucination. I am aware that there is a lot of research into this subject, so I decided to read a scholarly review:

“Survey of Hallucination in Natural Language Generation” by Ziwei Ji et al. (2023), publicly accessible on arXiv.

I’m not aiming to summarize the entire subject, but rather to answer a specific question: Are hallucinations an effectively solvable problem, or are they here to stay?

What is a hallucination?

“Hallucination” is a term used in the technical literature on AI, but it’s also entered popular usage. I’ve noticed some differences, and I’d like to put the two definitions in dialogue with each other.

[Read more…]

Targeted Advertising: Good or evil?

I have had some professional experience in marketing. It’s a job, you know? Targeted advertising is a very common data science application. Specifically, I’ve built models that use credit data to decide who to send snail mail to. Was this a positive contribution to society? Eh, probably not.
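For flavor, here’s roughly what that kind of model looks like in code. This is a hypothetical sketch, not the actual pipeline I built: the features, numbers, and 0.5 cutoff are all made up for illustration, and it assumes scikit-learn is installed. The idea is just a “propensity to respond” score, with mail going only to the highest scorers.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical credit-style features per person:
# [credit score, number of open accounts, utilization].
X = np.array([
    [720, 3, 0.20],
    [640, 7, 0.85],
    [780, 2, 0.10],
    [600, 9, 0.95],
    [700, 4, 0.40],
])
# 1 = responded to a past mailer, 0 = did not.
y = np.array([1, 0, 1, 0, 1])

# Fit a simple response model on past campaign data.
model = LogisticRegression(max_iter=1000).fit(X, y)

# Score a fresh list of prospects and mail only the top scorers
# (the 0.5 threshold is arbitrary here).
prospects = np.array([[710, 3, 0.25], [610, 8, 0.90]])
scores = model.predict_proba(prospects)[:, 1]
mail_list = prospects[scores > 0.5]
print("response scores:", scores)
print("send mail to", len(mail_list), "of", len(prospects), "prospects")
```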

In the title I ask, “good or evil?”, but obviously most people think the answer is “evil”. I’m not here to convince you that targeted advertising is good actually. But I have a bunch of questions, ultimately trying to figure out: why do we put up with targeted ads?

For the sake of scope, I’m thinking mainly about targeted ads as they appear on social media platforms. And I’m just thinking of ads that try to sell you a commercial product, as opposed to political ads or public service announcements. These ads may be accused of the following problems:

  1. Using personal data that we’d rather keep private.
  2. Psychic pollution: wasting our time and attention, or making us unsatisfied with what we have.
  3. Misleading people into purchasing low quality or overpriced goods.

[Read more…]

LLM error rates

I worked on LLMs, and now I got opinions. Today, let’s talk about when LLMs make mistakes.

On AI Slop

You’ve already heard of LLM mistakes, because you’ve seen them in the news. For instance, some lawyers submitted bogus legal briefs (no, I mean those other lawyers… no, the other ones). Scholarly articles have been spotted with clear ChatGPT conversation markers. And Google recommended putting glue on pizza. People have started calling this “AI slop”, although maybe the term refers more to image generation than to text? This blog post is focused exclusively on text generation, and mostly on non-creative uses.

[Read more…]