New on OnlySky: AI deep research


I have a new column this week on OnlySky. It’s about the new AI mode called “deep research”, and whether it solves the problems that have plagued AI in the past.

AI chatbots like ChatGPT were introduced with the promise that they’d act as superintelligent robot librarians. Their creators promised that they could rapidly research and synthesize an answer to any question, putting the entirety of human knowledge at everyone’s fingertips.

As we know, the reality was very different. Chatbots are giant association machines, building up statistical models of which words are more or less likely to follow which other words, like a more complex version of the autocomplete on your phone. They don’t know the difference between right and wrong answers, only what a right answer “sounds like”. This means they have a tendency to invent facts, figures and references, which makes their output inherently unreliable.
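
To make that autocomplete analogy concrete, here is a toy sketch in Python (my own illustration, not anything from the column): a bigram model that merely counts which word follows which in a tiny training text, then samples continuations in proportion to those counts. Real chatbots do next-token prediction with enormous neural networks rather than raw counts, but the basic idea is the same: the model knows what a continuation statistically "sounds like", not whether it is true.

    import random
    from collections import Counter, defaultdict

    # Count, for each word in a tiny corpus, which words follow it and how often.
    corpus = "the cat sat on the mat and the dog sat on the rug".split()
    follows = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        follows[prev][nxt] += 1

    def next_word(word):
        """Sample a next word in proportion to how often it followed `word`."""
        candidates = follows[word]
        if not candidates:
            return None  # this word was never followed by anything in the corpus
        return random.choices(list(candidates), weights=list(candidates.values()))[0]

    # Generate text by repeatedly predicting a statistically likely next word.
    word = "the"
    output = [word]
    for _ in range(8):
        word = next_word(word)
        if word is None:
            break
        output.append(word)
    print(" ".join(output))  # e.g. "the dog sat on the mat and the cat"

The output is fluent-sounding because every pair of words was seen in training, yet the model has no notion of whether the sentence it produced is accurate, which is the root of the hallucination problem described above.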

However, the companies that created them keep working on improving the technology. Now they claim they have a solution to this problem: “deep research” mode, which forces the AI to cite real footnotes and references for each of its assertions.

In this column, I tested ChatGPT’s deep research mode for myself. How does it stack up?

Read the excerpt below, then click through to see the full piece. This column is free to read, but paid members of OnlySky get some extra perks, like a subscriber-only newsletter:

OpenAI doesn’t hesitate to claim that this is a step toward an artificial general intelligence, or AGI, which is a hypothetical AI that can do anything a human can do—including original research and the discovery of new knowledge.

These claims present a formidable problem for people, like me, reporting on this technology. I don’t want to be an uncritical booster or a salesman. On the other hand, if this is genuinely a breakthrough, people should know about it.

The case for—or against—AI depends very much on its capabilities. If AI is only good for creating unreliable, low-quality slop, all the energy and resources that went into creating it were a waste.

On the other hand, if AI can accelerate the pace of research and discovery, then there’s a real benefit to weigh against the admittedly large amounts of energy it consumes.

Continue reading on OnlySky…

Comments

  1. Dunc says

    I’m going to quote a bit from a LinkedIn post a former colleague of mine wrote on this. (I’m confident he wouldn’t mind.) The whole post is here.

    Run Deep Research on a topic you don’t know well, but which your audience knows intimately, and it’ll leave you looking like you don’t fully understand their world.

    I tried Gemini’s version on a very niche topic I’ve been researching for the last two years: the life of a forgotten Edwardian actress. Gemini produced a very confident-sounding and complete-looking report. In fact, it drew many of the conclusions I’d reached … about three months into my own research.

    The subsequent year-and-a-half of *deeper* research I’d undertaken had painted a much more nuanced and complete picture. Had I tried Deep Research early on in my journey, I’d have been very impressed. With two years’ knowledge, not so much.

    The trap is this: these tools give us information that feels comprehensive when we’re not experts. But present their surface-level analysis to someone who knows the subject, and you’ll look like you’ve Googled the answers.

    This is particularly problematic when you’re researching something where the expertise is not well-represented in the training data – knowledge that’s behind paywalls, or which was never digitised.

    Now, I don’t know how deep your understanding of in vivo gene editing is, but I’m not sure I’d describe the level of research needed to write a popsci article like the one you reference as “deep research”. “[N]o obvious hallucinations or egregious mistakes” is a pretty low bar.

    • says

      I agree that deep research is only as good as the quality of freely available data, and won’t fare well when most information is in books or behind paywalls. That’s a point I made in my article.

      However, if the criticism is that it’s only on the level of someone who’s done a couple of months of research… I mean, that’s not too bad!

      • another stewart says

        I’ve been collaborating on a paper (now in press). I can’t reasonably expect ChatGPT to reproduce the whole paper, as it includes previously unpublished observations, but I asked it to produce a paper covering the rest of the content. ChatGPT’s output is under 1,000 words. The paper is over 6,000 words, of which about half are on the new observations, but the relevance of some of the remainder is only discernible in the light of the new observations.

        ChatGPT included one apparent hallucination. (It didn’t provide a citation, so I can’t check whether it made it up or repeated someone else’s mistake. It was also an inappropriate statement.) It omitted mention of a key morphological character. (I haven’t checked whether the morphological descriptions are otherwise correct, though I have my doubts about at least one statement.) It mostly referenced secondary sources, not the original literature, and I’m wondering whether it incorporated the original literature at all. Among the secondary (tertiary?) sources cited is Wikipedia. There are two instances of names being used in an incorrect context.

        Watching it run, it appeared to be looking at specific sites – as if it has a list of sites to consult for particular topics (one way of improving results is a better-curated data set, and this is a crudish way of achieving that) – but one of the sites was AI-generated (I expect that in the end it didn’t use anything from that site in this instance).

        I did tell it that I wasn’t asking for a particular journal style, but I was still disappointed with its citation format. Nor did it produce an abstract. It would seem that more detailed instructions than I gave it are necessary.

        On the other hand, while it didn’t cite all the literature I found (I did the initial literature search for the paper, and later rewrote the relevant section) it found the most important papers, and found them a lot faster than I did. (On the gripping hand part of the time I spent searching was involved in finding the keyword that I gave it.)

        In hindsight it wasn’t a particularly fair test.

  2. John Morales says

    Heh. Everyone has an opinion.

    BTW, it was released in February, so that’s some months ago.

    (Is it really still an open question? 😉)
