Video: Pirated books used to teach LLM AIs (Non-Fiction)

This video by Alyssa Matesic gives a good overview of AI companies using pirated books to teach their programs:

I played around with an AI program while working on my next book. While the rephrase functions gave me some ideas, the purely generated text was laughable at best. I stopped using the program when the company added a function to generate entire books in minutes. I didn’t want to support flooding retailers with poor-quality books.

I’m not opposed to LLM AIs in general, but there are many quality, legal, and ethical issues to be sorted out.

What are your thoughts?


  1. Bruce says

    Did an AI offer to write “port quality books”?
    Was it a tawny port wine?
    Probably put there by aliens. It explains everything.

  2. says

    Ethics issues aside, I don’t think ChatGPT could hold an idea of continuity or characters for more than a few pages. Any novel would be a fever dream of superficially sensible information that had zero connection from one chapter to the next. It’s easy to imagine, however, how a technically skilled person could fix that through the use of novel templates and programming that lets the AI remember what it generated throughout. Maybe these AIs used for the cheap novels are doing that, or maybe they are just settling for the gibberish.

    I easily have a thousand more words I could say about this but don’t have the time right this minute. Maybe I’ll return…

  3. jenorafeuer says

    As Marcus noted in John Ringo in the Crosshairs, when you get right down to it, the writers in the most danger from AI-generated books are the ones already producing what’s basically ‘generic extruded literary product’ to start with, with a lot of the fast-written stuff on the Amazon eBook-only marketplace counting. There are people who write entirely to a ‘write as fast as possible to milk Amazon payments’ scheme, and AI generation will hit those; while AI-generated books won’t replace actually decent writers, they will throw up enough chaff to make it harder to find such people.

    Of course, there have also been cases in the past of books that were already blatantly just copies of other things being sold on Amazon for ridiculous prices, possibly as some form of money-laundering scheme. Putting AI-generated text into the ‘book’ would make it harder to find such things since they wouldn’t show up as obviously copies of something else.

  4. sonofrojblake says

    there have also been cases in the past of books that were already blatantly just copies of other things being sold on Amazon for ridiculous prices, possibly as some form of money-laundering scheme

    I had an odd experience of this personally. I wrote a book and published it via Print-On-Demand. Hence, the only copies that ever got printed were ones that were already paid for by interested customers, and they were available direct from the printers/publishers. Also, the publishers were able to make it available for purchase from Amazon, by the same model.

    You could buy it from the publisher for £9.99 plus delivery. You could also buy it from Amazon, directly, for £16.99 (the differential was there to make up for the cut that Amazon took, so I made more or less the same amount regardless of sales channel).

    However: as with other books, when you were shown the book it would also offer you other options to buy, from people selling MY book through Amazon. How this was supposed to work baffled me. Did they take an order through Amazon, then contact the publisher, get it printed, wait for it to be delivered, repackage it and forward it to their customer? That seemed oddly complicated and time consuming, and very labour intensive. It also “explained” why buying it that way was MORE EXPENSIVE than simply clicking on the buy-direct-from-Amazon button.

    And not a little bit more expensive, either. Prices started at forty to fifty pounds, and there were several with the book priced at over a hundred. The most I saw it for sale for was a little over £250 – it was an oddly precise number, like £253.34. I contacted a number of these resellers to ask them what on earth was going on – partly to commend them on their enterprise (if they can convince someone to part with over £200 for something I’ve written, well, good luck to them, I know I’ll never be able to) and partly to ask if it ever, even once, worked. I never got an answer, from any of them. I assumed it was some sort of money-laundering scheme, demonstrating that they had a “legitimate business” or something.

  5. says

    2: Story Engine, which is powered by ChatGPT, can keep track of characters, themes, and plot points. The output is pretty generic, in my opinion.

    However, what’s more common is click farms using ChatGPT to compose gibberish books. They’ll upload the books to Amazon and put them in Amazon’s Kindle Unlimited subscription service. Then they manipulate Amazon’s book rankings. If enough people read a few pages, the click farmers can make a profit. Vice posted an article about it. Amazon is cracking down, because they don’t want a best seller titled, “Apricot bar code architecture.”

  6. says

    3 and 4) The write-quick, quality-be-damned writers were already on the decline when Amazon changed their payment rules for Kindle Unlimited. Readers of self-published and indie works, while more forgiving, want an attempt at quality writing.

    Right now, KU payment rates to authors are the lowest ever. Many authors are leaving KU as a result. It’s not clear to me if it’s due to a flood of AI books, or Amazon deciding to pay authors less per page read.

    Resellers: They make their money either through higher prices or by adding an excessive “shipping fee” to the sales. Some might be laundering. Some of it might be that they can ship books to regions where Amazon doesn’t have a presence. In most cases, the resellers buy copies of the book, so the author gets paid for one sale.
