[WARNING: Long for today’s attention-spans. Readers over 50 should be OK.]
The Zombie of Bimmler
You’ve probably heard that before. Perhaps you’ve heard the same regarding large language models. One thing this does is casually gloss over the fact that the two approaches work very differently. Or, more precisely, the two approaches are categories of approaches, each with their own independent implementation details.
If you are familiar with old-school Markov Chain analysis and semantic forests, circa the late 1980s, the illustration above gives a simplified picture of one form of natural language generation. Perhaps you remember Rob Pike’s old experiments on USENET, with Mark V. Shaney and Bimmler. [wik] Even for that form of model, the claim that it is “merely regurgitating existing words” is flat-out wrong because, at best, such systems capture word probabilities over a total lexicon, then roll dice to walk through the Markov Chains and output some text. The probability that the input text comes back out drops below 100% as soon as there is a single branch in the graph. Look at the illustration above: “generate” makes up .072 of the words, and “coherent” is only .085 of its successors. The odds that you’ll get a 10-word sentence that is exactly the same as your input very quickly turn vanishingly small. When I say “vanishingly small” what do I mean? I mean small.
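To make that concrete, here is a minimal sketch (in Python, with a made-up three-sentence corpus) of the kind of word-bigram Markov generator that Mark V. Shaney was built from. Every branch point is a dice-roll, which is exactly why verbatim regurgitation of the training text is so improbable.

```python
import random
from collections import defaultdict

# Minimal sketch of a word-bigram Markov generator, in the spirit of
# Mark V. Shaney. The corpus here is made up for illustration; real
# systems were trained on far more text.
corpus = ("markov chains can generate coherent text . "
          "markov chains can generate surprising text . "
          "neural networks can generate coherent images .").split()

# successor table: word -> list of words observed to follow it
successors = defaultdict(list)
for a, b in zip(corpus, corpus[1:]):
    successors[a].append(b)

def generate(start: str, length: int = 8) -> str:
    word, out = start, [start]
    for _ in range(length):
        if word not in successors:
            break
        word = random.choice(successors[word])  # roll dice at each branch
        out.append(word)
    return " ".join(out)

print(generate("markov"))
# Every branch point halves (or worse) the odds of reproducing any one
# input sentence verbatim; with thousands of branches, the probability
# of exact "regurgitation" collapses toward zero.
```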
GPT3 encodes 175 billion “parameters” which is basically the codons in the DNA of all the language that humans could scrape together and throw at it. Once you get to a dataset of that size, it becomes impossible to exert any influence over it by tainting its inputs in the large, since you’re usually working with likelihoods on the order of 15 or 16 zeroes after the decimal point. The likelihood that your blog comment is going to pop out the other side of GPT3 is greater than zero, but only if you’re using scientific notation.
Have I now dispatched that point?
Still, let me hammer on it more. The newer LLMs like GPT3 are supplemented with other neural networks that add classification, clustering, and concept analysis. One of the things neural networks are good for is – matching stuff – they have been matching stuff since the early days of character recognition (which is also probability-based). Now, if you are talking to GPT3 about your favorite choices in music, you are not simply getting each word followed by its most probable successor. You’re talking to a neural network that “knows” how to match clusters of concepts, such as “country music” or “operatic goth.” Here, watch:
Yeah, now use your cognitive and associative faculties and think how a mere Markov-Chain generator would have any likelihood of spitting that out. Let’s look at the last sentence: GPT3’s analysis (call it matching if you like) included some proximity analysis of concepts surrounding the concept of “operatic goth.” In fact, I’m not even sure if “operatic goth” is more than a term in my head, but GPT3 had no problem correlating it with some cluster of matching probabilities in a concept table. If you still want to think of this as just a probability engine, you are a) wrong and b) forced to concede that there are layers of matching networks that activate different clusters of related words (or “concepts”) and then marshal those probabilities into a language output faculty that, I suspect, has a bunch of grammar and completeness checks applied. What I am getting at is that an LLM is not just a cringy “roll some dice as we stumble through a semantic forest” engine – it is a network of networks and a workflow of workflows. Kind of like the 3lb or so of goo-soaked stuff in your head that is reading this. There was a time when all of this analysis was going on in pseudo-parallel, i.e.: a uniprocessor doing steps in series, but now that’s all broken across massive racks of graphics coprocessors that can be programmed to match and search as well as run Command and Conquer. You know how your brain has a part that decodes sound waves into abstract signals, which wind up in a recognizer that turns them into phonemes that resemble parts of a language you speak? Well, while those phonemes are being matched against a words database (“your vocabulary”) a fascinating thing is happening in parallel – those matches are increasing the activation in your cortical memory as your brain goes, “huh, ‘operatic goth,’ what’s that?” and starts retrieving concept-matching maps for what you think “operatic goth” means. Neural networks are all based on successive activations, just like your organic brain is. Watch:
Because I was just talking to GPT3 about operatic goth music, including Bauhaus, GPT already had activation potentials on the “nerves” which encode the band Bauhaus (Peter Murphy, David J, Daniel Ash, Kevin Haskins) so it came up first, rather than the architectural movement which, ironically, the band is named after. If we were dealing with a bunch of flat probabilities, it would be hard to talk about Bauhaus with GPT3, without having to tell it to shut up about architecture. GPT gives the game away, in fact, when it uses the word “Associations” – one of the networks somewhere inside GPT is an associations network, which encodes the relationships between “Bauhaus” and other concepts. Just like your brain does.
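Here is a toy illustration of that priming effect, with invented three-dimensional “concept vectors” standing in for the enormous embeddings a real model uses. It is not GPT3’s actual machinery, just the shape of the idea: the conversation so far leaves the context pointing at the music cluster, so the band wins the match.

```python
import numpy as np

# Toy illustration of context priming with made-up "concept vectors".
# These three-dimensional vectors are invented for the example; real
# models use embeddings with hundreds or thousands of dimensions.
concepts = {
    "Bauhaus (band)":         np.array([0.9, 0.1, 0.8]),  # goth, music, 1980s
    "Bauhaus (architecture)": np.array([0.1, 0.9, 0.2]),  # design, buildings
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The conversation so far ("operatic goth music, Peter Murphy...") leaves
# a context vector with high activation on the music-related dimensions.
context = np.array([0.85, 0.15, 0.75])

for name, vec in concepts.items():
    print(f"{name}: {cosine(context, vec):.2f}")
# The band scores far higher, so it "comes up first": not flat word
# probabilities, just whichever cluster the context has already activated.
```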
The term marketing folks were using back in 2016 or so was “data lake.” The idea was to give up on structuring an organization’s data (to be honest, everyone was already flying the flag of surrender on that battle) and put everything in one place where (per above) text summarization, commonsense reasoning, and information extraction could be applied. There’s more to it than that: there’s a whole lot of extraction that goes into clustering, and then concepts co-activate. I am not “up” on the current techniques for machine translation, but I believe that they do a whole lot of clustering in addition to some sequence probabilities. The heavy lifting is not done with Markov Chains and has not been for quite a while. Check this out:
Unless GPT3 is programmed to lie to me, I’ll take it at its word that there is no copy of Marbot encoded in there. Just like a human: I could not quote you that passage word-for-word, but I know how far it is into the book, what illustration the Thomason version has, and that, yes, his derring-do landed him a coveted slot as a “staff galloper” – someone lower than an aide-de-camp but who might be expected to hang around a command tent in case some general needed a cavalry officer to run for a Big Mac and Fries, or carry a message to the artillery contingent, or see what the sound over the hill is. Anyhow, did you notice the other interesting thing? GPT3 ripped through that in about the same amount of time it took me to retrieve what I remember of the incident, summarized it, but also danced around which river it was! I want to say Elbe because Marbot was in the German campaign at the time, but I didn’t want to show my ignorance and hoped that GPT3’s associative memory would kick that out.
Don’t you think that if all GPT3 was doing was spitting out semantic networks, there would have been a high probability it would have mentioned the Elbe? Nice of it to confirm my hazy memory that it was the German campaign. I just nearly got sucked into Marbot’s memoirs, trying to find the incident under discussion.
The important point I am trying to get across to you is that GPT3 is not just an LLM; it is a complete system of AI workflows of various types: clustering, recognition, concept extraction, language analysis, etc. It is modeled to work similarly to how human memory works. In fact, one of the most important aspects of building a modern AI system is choosing the correct learning strategy: do you favor new information, or old? Is there a discount weighting to apply? Do you have any extra information that allows you to adjust the weighting of one source or another? This is not all academic flubbery, it’s really important, and we see how the topic manifests in humans: do you retain your childhood belief in Santa Claus, or do you favor new information when someone finally tells you Santa Claus is not real? Then, there’s all the reinforcement learning that periodically tells you Santa Claus is not real. But AI systems have the same problem that children raised in religious households might have, regarding what data to prioritize, when, and how.
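If you want to see how small that “favor new or old?” knob can be, here is a minimal sketch of a recency-discounted update. The trust value is arbitrary and invented for the example; real systems tune this, and a great deal more, very carefully.

```python
# A minimal sketch of the "favor new or old information?" knob: an
# exponentially-discounted update. The 'trust' value is arbitrary.

def update(belief: float, new_evidence: float, trust: float) -> float:
    """Blend old belief with new evidence; trust=0 ignores the new
    information entirely, trust=1 throws the old belief away."""
    return (1 - trust) * belief + trust * new_evidence

santa_is_real = 1.0                       # childhood prior
for year in range(10):
    santa_is_real = update(santa_is_real, 0.0, trust=0.4)
    print(year, round(santa_is_real, 3))  # repeated reinforcement erodes it
```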
I’m trying not to hand-wave a whole lot, here, but the internal workflows of GPT3 are not something I understand. I know that since GPT3 is fed papers about itself, it could probably explain it, but I don’t think it’s worth giving a shit about, because the innards of the AI are changing constantly and new information and new extractions are being added or changed all over the place. My main purpose in going into this is to stamp down on the willfully ignorant chorus here at FTB that says things like “AI just regurgitate stuff.” I’m sorry; I’m programmed to try to respect you but you’ve got to stop hoisting the clueless roger like that because sooner or later I’ll run out a full broadside. This is a fascinating topic and I have here just scratched the surface of it.
Image Generation
Originally I didn’t even want to go into this topic, but I realized that in order to teach you something useful, I have to. The point being that LLM systems, or LLM AIs like GPT3, operate completely differently from image generators. That’s one reason why you might have a conversational AI like GPT3 that is pretty great, but if you ask it to draw a picture, it writes a prompt and hands it off to DALL-E3, which is 3 years old. In AI terms, it may as well be a mummified corpse from ancient Egypt. The state of the art changes constantly and very quickly. In fact it changes so fast that it’s impossible to both 1) use the systems and have fun with them and 2) study them well enough to understand them as they are today. I use Stable Diffusion on my desktop gaming machine, which has a honking big GPU that cost twice what everything else in the computer cost. When I bought it, it could crush out an image in 15 seconds. Now, the software is getting more complex, and the training checkpoints, too, so the image production gets slower and slower while getting better and better. But I don’t care – it can run while I’m eating dinner. Alright, so – one of the big perception problems, like with GPT3, etc., is this idea that the original image is latent in the checkpoints and might come popping out at any moment. Let’s talk about that, because it’s uglier than a three-week-old pizza, and those of you who have absorbed that “the AI spits out the original image” idea have been played.
So maybe you heard that there was a lawsuit where some artists sued Stability AI, Deviantart, etc. because their images had been hoovered up into the first-generation diffusion models. Like with LLMs, a large part of the power of the model comes from having a huge amount of input – so, basically, there were image-grazers that just ingested pretty much everything. That’s what the lawsuit is or was about. [artnet] Reading the whole story of the suit is interesting, but I’ll give you the short form: the plaintiffs whiffed. The judge gave the plaintiffs an opportunity to amend their suit (i.e.: “fix it, or it’s dead”) and I believe the matter stands there. The actual complaint is not a great piece of lawyering. I wish that the defendants had hired me as an expert [I have served as a non-testifying expert on some two dozen great big tech cases, and consistently helped my clients win to the tune of over $300mn, which I think is an accomplishment of sorts. I was never on a losing side, and I never dealt any bullshit. I was almost always on the defense for matters of patent infringement, since I invented or know the inventors of most core internet security systems]. Anyhow, here’s what’s going on: remember how I explained that GPT3 is a system that does learn and infer bunches of stuff from its input? So is Stable Diffusion. So what the plaintiffs did was train a Stable Diffusion model with the plaintiffs’ images only, then ask it to do some art, and Ta Da! it looked a lot like the inputs. From the Artnet article [artnet]:
The upper left is an image by some internet artist I have never heard of, and the other three are created with Stable Diffusion using its built-in idea of “styles.” Notice that Stable Diffusion does better with the hands. /snark Anyhow, there is an obvious similarity. If you took a human artist and trained them to paint in a particular style, guess what? They’d also be able to create images in a particular style. Then the question becomes simply a matter of copyright.
Some people deride AI art generators as automated machines for copyright violation, but – be careful if you do that – because human brains are also automated machines for copyright violation and, like the AIs, they have to be trained not to do it. In fact, I’m going to claim that there is a whole lot of impressionist influence in all of those images – a bit of Van Gogh, a bit of Seurat – and a dash of Valentin Serov, Gerard, David, and John Singer Sargent. In fact, copyright law gets really tough when you start talking about derivative works rather than copies of the original. If you’re one of the people who heard about this case, and heard that AI art generators can sometimes spit out images that are remarkably close to their originals, you have absorbed some bullshit that you need to shake out of your system. I am pretty sure that if I trained GPT3 solely with a copy of Marbot’s memoirs, it would just love to chat about Marbot, in the style of Marbot. When you get to derivative works, the legal hurdle is much higher – you need to show that there are unique creative elements expressed by the creator, for which there is no prior art. (Prior art in tech patents means published examples, papers, or another patent; prior art in painting would probably be satisfied if there were examples on exhibit at The Met.) I cannot take some Cray-Pas and do some cubism, then sue the estate of Picasso. For one thing, the estate of Picasso would say, “well, Pablo was strongly influenced by Cézanne” and the case would get thrown out as soon as someone pointed out the artistic debt Cézanne owed Manet and Pissarro. There is another problem, which is that copyright law respects the influence of influential artists; in other words, “nobody should be surprised that if Thomas Kinkade paintings are selling for $250,000, some artists start painting ‘in the style of Thomas Kinkade.'” Of course it would be tasteless to say exactly that, but here is the problem: if I look at a beautiful portrait done by John Singer Sargent and ask someone to do a portrait of my friend Gustavus Adolphus Burnaby, it may come out looking a bit Singer Sargent-ish, and a bit Thomas Gibson Bowles-ish if I said, “I love Sargent’s use of light.” [By the way, I have no idea if Bowles was influenced by Sargent or not; I am merely looking at the image and speculating that it looks quite ‘in the style of,’ which means I like it.]
Now, what’s going on inside the AI? First off, there are big neural networks that match keywords to image elements. There are big neural networks that match image elements to other image elements and procedurals – so, if you have a pastel sketch, it will match for pastel image elements in how it creates regions of color. I know it’s crazy, but there is no data element anywhere in the checkpoint that encodes “an arm in a white shirt” – it’s probabilities of blobs near other colored blobs. The more blobs you have and the finer the resolution you’re working at, the smaller and more precise the blobs are. Like with LLMs, they work because they have ingested all of human art. Like with LLMs, there are all sorts of matchings and clusterings going on – it’s not a semantic forest that eventually resolves down to an arm in a white shirt, because then it might resolve to an arm in a blue shirt on the other side. None of that is in there. Like with LLMs, there are workflows and matching routines, and there is a huge variety of them. CLIP is a popular knowledge-base and process for going from a text input “arm” to “these blobs look like an arm,” but there are others. Sometimes the underlying unit is “splats” and sometimes it’s “blobs,” but the basic process is pretty simple when you understand it. As Michelangelo didn’t say, “the way you sculpt David is take a big block of marble and knock off all of it that does not look like David.” That is actually a deceptive mental model for diffusion-generated art. But when people think that there’s an image of “left arm in a white shirt” or “Picasso’s Guernica” hidden down inside the checkpoint, they are making that mistake – they imagine Michelangelo has a sketch of what he wants David to look like, and knocks off everything that doesn’t match. In other words, he is doing image-to-image transformation. Diffusion models don’t work that way. Imagine Michelangelo takes a text description of David and runs it through some transformers that increase the activation on arms, hands, feet, p*nis, etc. Now you have a neural network that is primed to match images that the transformers have crunched the prompt down into. Then, you create a block of marble and randomly whack chunks and holes out of it. No, wait – analogies break down and that’s why we get the process wrong. Let me try another way.
The user’s prompt gets broken into words and is matched against the CLIP encoder, which is a model trained with vocabulary-to-node mappings. A “node” is a thing (a blob or a splat) in another neural network that is used to classify an image. So the first process primes a neural network to match images that more or less match the text in the prompt. Then you fill an image with noise and iteratively perform noise reduction by altering the pixels that least-well match the primed neural network. No pass specifically tries to make an image come out – each pass just tries to make the image match better. The inputs into this process are the matching training set, the CLIP transformation set, the number of reduction steps you do, and the amount you allow to change each time. See? There is no “left arm in white shirt” – it’s that if the CLIP transformation primes the match to want to match parts of images that appear like a left arm in a white shirt, those pixels will be less likely to change. After a few passes (depending on the reduction algorithm, let’s say 40) set to eventually change 100% of the pixels, you’ll wind up with something that looks as much like a picture as it can. It’s way more complicated than that, of course. Now, there are a bunch of generators, which you should think of as the workflows and surrounding processes that make up an image generation AI. Automatic1111 is a self-contained one that you can swap knowledge bases or CLIP transformers on, and it’s pretty simple to use. One feature I used to enjoy was that you could ask it to show you each iteration. So it would start with a block of white noise, then a ghostly shape, a refined shape, a shape with arm-blobs and leg-blobs, and would slowly refine into the image you asked for. That’s pretty cool. So if your prompt was “portrait of a cavalry officer in the style of John Singer Sargent” you might get something shockingly good if the AI has been trained exhaustively. It’s also a bit weird because, for example, if my prompt is “portrait of a cavalry officer lounging on a chaise longue smoking a cigarette in the style of Sargent” the CLIP network may not understand “chaise longue,” so when the diffusion engine starts matching away, it’s basically aiming to create “portrait of a cavalry officer lounging on a wossname smoking a cigarette in the style of Sargent.” So there is no model of what a chaise longue looks like, because the system literally does not know what one looks like – but it knows it when it sees one. If there are a bunch of keywords in the prompt that push the image in one direction, the wossname in there becomes impactless. I find all this delightful and trippy. So, if your checkpoints have been trained with a bunch of images labeled as Sargent, you will get images that the CLIP transformation primes as Sargent-esque. If it doesn’t know Sargent, you could try asking for Bowles or Caravaggio. But there are no images stored in the system that can possibly get pulled back out. The only way to get images that look remarkably like the inputs is to set them up that way.
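For the shape of the loop, here is a deliberately toy sketch in Python. It is not the real latent-diffusion math (no U-Net, no noise schedule, and fake_clip() is an invented stand-in for the prompt-encoding step), but it shows “start from noise, each pass nudges the worst-matching pixels” along with the two knobs I mentioned: step count and per-pass strength.

```python
import numpy as np

# A deliberately toy sketch of "start with noise, nudge the worst-matching
# pixels each pass." It is NOT the real latent-diffusion math; fake_clip()
# is an invented stand-in for the prompt-encoding step.

rng = np.random.default_rng(0)

def fake_clip(prompt: str, size: int = 32) -> np.ndarray:
    """Stand-in for the prompt-to-'what the primed network wants' step."""
    seed = abs(hash(prompt)) % (2**32)
    return np.random.default_rng(seed).random((size, size))

def generate(prompt: str, steps: int = 40, strength: float = 0.15) -> np.ndarray:
    target = fake_clip(prompt)        # what the primed matcher is looking for
    img = rng.random(target.shape)    # begin with pure noise
    for _ in range(steps):
        mismatch = target - img       # the pixels that match least well...
        img += strength * mismatch    # ...get changed the most on this pass
    return img

out = generate("portrait of a cavalry officer in the style of Sargent")
err = np.abs(out - fake_clip("portrait of a cavalry officer in the style of Sargent")).mean()
print("how far from the 'match' we ended up:", round(float(err), 4))
```

Notice that no pass ever says “draw an arm”; each pass only makes the whole thing match a little better, which is the point the Michelangelo analogy keeps failing to capture.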
One of the plaintiffs is an artist named Greg Rutkowski. I’m not going to liken him to John Ringo in terms of his talent, but he does a lot of semi-realistic fantasy paintings and illustrations. For all intents and purposes, he’s slightly less talented than Michael Whelan or the Hildebrandts. He’s fine, though, if you like that. When the original builders of the Stable Diffusion knowledge-bases trained them, they tagged the images by Rutkowski as Rutkowski. So, CLIP learned how to tell the neural nets to match for things that somewhat matched stylistically. I specifically chose those artists because Michael Whelan’s stuff on the internet is a lot smaller in terms of sheer pixels – he was concerned with his imagery getting turned into T-shirts or other actual copyright violations, so he kept the pixel sizes small. That’s OK: if I am using an engine like Midjourney, which has style correlation and clustering, any artist who works in the style of another artist is going to crunch down to something that comes out looking like whatever it looks like. One of the reasons the plaintiffs are going to lose their case is that the knowledge-base trainers responded by removing Rutkowski’s name from their labels. I’m sure there’s some art by him still in there, so if you ask for “a knight in gothic armor in a detailed oil style” you may get a hint of, well, everyone who works in that style, whatever their name is. That lawsuit is going to flop hard, but the lawyers will still collect their fee, so the notion of ‘success’ is relative to your frame of reference.

If you’re one of those people whose beliefs about AI are based on outmoded systems from the 80s, or on carefully primed [for legal reasons I won’t say “fake”] images from art generators, you need to either stop talking about AI for the next 10 years, or educate yourself. I’m not trying to be mean, it’s simply that a strategy of claiming that AI can’t be creative and merely regurgitates is not going to work. For one thing, my prediction is that the next version of GPT will appear to be sentient. It may or may not be, but if you can’t tell whether it is or isn’t, your beliefs are your problem. I’ll suggest you spend a while thinking about “what is sentience?” instead of “what is intelligence?” As a proper linguistic nihilist I’m just going to have to say I don’t know what any of that is, and I probably won’t even know it if I see it – but I’m pretty sure you don’t know, either. The successful strategy, I predict, will be liberating for humanity and our non-human companions, since we will have to focus on a vague concept, namely sentience. I should probably do a post about vagueness and categorizing AI, but it’s a topic I’ve danced around here before and I hope I don’t have to hammer on it. The approach I will recommend is to give up making Turing Tests and IQ Tests and whatnot, and start categorizing things as “more or less sentient” or “more or less creative.” I think that the whole argument about “what is creativity” is doomed to linguistic nihilism (especially if I have anything to do with it!) but we can have a pretty productive discussion of “who is more creative: John Ringo or Sam Delany?” There, that was easy, wasn’t it? Of course it gets hard and knives come out if we want to sort out vague questions like “which is a better pizza topping, mushrooms or pineapple?” because there is going to be honest disagreement. In my opinion, we passed the Turing Test a long time ago.
Which, if you want to hold up 1980s quibbles about AI, means they are intelligent, full stop. I don’t think such tests are good. Is it intelligent when it can beat Magnus Carlsen at chess? Etc. I think our discussion should be: is it more or less creative than John Ringo? Is it as good an artist as Scott Adams? Does it write better, and more consistently, than every single college undergrad I have ever met, including me?
AI’s not that hard. Think of it as an attempt, using increasing fidelity, to model some of the behaviors that brains do. Some of it, like learning, it does pretty well. It’s complicated, but it’s not as complicated as brains. I know a fair bit about AIs and relatively little about my brain [which appears to have been accidentally customized] – just think, DALL-E was where I started playing with AIs, which I had not studied since I read Rumelhart and McClelland back in ’82. It’s been 3 years to go from DALL-E to increasingly impressive machine translation, interaction, and creativity. 3 years ago I couldn’t spell “AI” and now I are one!
I was thinking that if AI researchers wanted to give the AI the ability to retrieve exact text, they would have to use something like a Patricia Tree as an index structure. There is a basic constraint on full-text retrieval systems: it seems to be a law that to give exact retrieval you have to keep a copy of the input, although you can compress it as much as you can figure out how to, so long as it’s not lossy compression. Since GPT3 is trained on basically all the language they could find, it would have to contain the internet. Another way of thinking of this problem is that it’s a matter of digital signatures and compression algorithms – unless you keep the input, you have to use some kind of lossy encoding, and therefore exact copies cannot come back out and copyrights will not be violated.
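Here is a minimal sketch of the point, using a plain prefix trie (a Patricia tree is just the path-compressed version of this). Notice that the exact originals have to live inside the structure somewhere, or exact retrieval is impossible.

```python
# Minimal sketch of why exact retrieval implies keeping the text: a plain
# prefix trie (a Patricia tree is the path-compressed version of this).
# The leaves still hold the original strings; there is no way to hand back
# the exact input without storing it (losslessly) somewhere.

class TrieNode:
    def __init__(self):
        self.children = {}
        self.documents = []   # the verbatim originals live here

class TrieIndex:
    def __init__(self):
        self.root = TrieNode()

    def add(self, text: str):
        node = self.root
        for ch in text:
            node = node.children.setdefault(ch, TrieNode())
        node.documents.append(text)   # must keep the exact text

    def lookup_prefix(self, prefix: str):
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        out, stack = [], [node]       # collect everything under this node
        while stack:
            n = stack.pop()
            out.extend(n.documents)
            stack.extend(n.children.values())
        return out

idx = TrieIndex()
idx.add("the quick brown fox")
idx.add("the quick brown bear")
print(idx.lookup_prefix("the quick"))   # exact originals come back verbatim
```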
A fun part of the state of the art is using ChatGPT to write prompts for Stable Diffusion or Midjourney. It works great. There are some ComfyUI plugins that query ChatGPT to rewrite and tune the prompt with its knowledge (I do not know the extent) of CLIP models.
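If you want to roll your own instead of using a plugin, the trick is roughly this. The sketch below assumes the OpenAI Python client and an API key in your environment; the model name is my assumption, not a recommendation, and real ComfyUI plugins wire this up differently.

```python
# Rough sketch of the "have an LLM punch up your image prompt" trick.
# Assumes the OpenAI Python client and an API key in the environment;
# the model name is an assumption.
from openai import OpenAI

client = OpenAI()

rough_prompt = "portrait of a cavalry officer lounging on a chaise longue, style of Sargent"

response = client.chat.completions.create(
    model="gpt-4o-mini",   # assumption: use whatever model you have access to
    messages=[
        {"role": "system",
         "content": "Rewrite the user's idea as a detailed Stable Diffusion "
                    "prompt: comma-separated descriptors, lighting, medium, "
                    "composition. Output only the prompt."},
        {"role": "user", "content": rough_prompt},
    ],
)

print(response.choices[0].message.content)   # paste this into your generator
```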
Another interesting aspect of having the word->image matching be a separate process (as it has to be) is that you can create AI checkpoints that are simply aphasic about some things. There is a really, really good, amazing model called Flux, which creates really good images, but all of the tags that would allow people to create “art in the style of…” or erotic art aren’t in the training. So you could write a prompt like “Donald Trump and Elon Musk are fucking in a hot tub” and you’ll get something, but Flux doesn’t know how to create images including “fucking” because it doesn’t know how to match that word to anything.
One other really cool set of techniques I didn’t go into above, because they’re not relevant to the argument I was making, are LoRAs and ControlNets. A ControlNet or a LoRA is a smaller training set that is used to pull the matching probabilities in a desired direction. It doesn’t take much! The image above was created using the Flux checkpoint, with a LoRA someone trained to produce Warhammer 40k space marines. To create a LoRA, what you do is train this smaller set of probabilities with, let’s say, 150 space marines from Warhammer, and then when you apply the LoRA it shifts the mappings from CLIP toward matching things that match space marines. All of this is done without understanding a damn thing. But what is going on inside is that the prompt, through CLIP, is pulling the matches toward “Santa Claus” and the LoRA is pulling them toward “Warhammer” – both of which are perfectly valid nodes in the checkpoint – so you get an image that matches “Santa Claus” and “Warhammer” better in the noise than images that match “the pope” and “pogo stick.” ControlNets do sort of the same thing as a LoRA, except they tweak the noisy image on a couple of the passes through it, making some regions match toward an input image. So if you have a prompt like “portrait of Marilyn Monroe on a skateboard” you feed it an image of someone on a skateboard and it pulls all the underlying pixels a little bit toward the input image, which tilts the diffusion toward producing something more like the input image. Since the ControlNet does not include details of the face, the AI engine creates a Marilyn Monroe face. This can be shockingly effective – to the point of being able to edit an existing image.
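For the curious, the arithmetic a LoRA applies really is small: a low-rank update added onto a big frozen weight matrix. The shapes and scale below are invented for illustration; the hard part, training A and B on those ~150 space-marine images, is not shown.

```python
import numpy as np

# Just the arithmetic of applying a LoRA: a small low-rank update nudges a
# big frozen weight matrix in a learned direction. Shapes and scale are
# invented; the real work is training A and B on the small dataset.

rng = np.random.default_rng(1)

d_out, d_in, rank = 768, 768, 8            # rank is tiny next to the layer size
W = rng.normal(size=(d_out, d_in))         # frozen base checkpoint weights
A = rng.normal(size=(rank, d_in)) * 0.01   # learned "down" projection
B = rng.normal(size=(d_out, rank)) * 0.01  # learned "up" projection
scale = 0.8                                # how hard the LoRA pulls

W_adapted = W + scale * (B @ A)            # base behavior plus the "Warhammer" nudge

x = rng.normal(size=(d_in,))
print("base output norm:   ", round(float(np.linalg.norm(W @ x)), 2))
print("adapted output norm:", round(float(np.linalg.norm(W_adapted @ x)), 2))
# It doesn't take much: B @ A has rank 8, a sliver of the full matrix,
# but it's enough to shift what the network matches toward.
```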
For the Santa Claus image, since I wanted it to look like a book-cover style image done in detailed oils and pencils, I said “in the style of Michael Whelan” instead of “in the style of Greg Rutkowski” – the point is not the name of the style (though the artists think it’s important), it’s the region of the model that the name, and many other labels, map onto. There aren’t discrete models – there is no “Greg Rutkowski” model, it’s just that a whole bunch of probability vectors can be attached to Greg Rutkowski that match Michael Whelan pretty well, too, except on a few points. If you think about it, that’s the only plausible way you’d be able to request a combination of images like a Warhammer-style Santa Claus, because none of those existed before I asked for it.
As I have mentioned elsewhere, it is my opinion that this is a variation of the process that produces what we call “creativity.” In fact I’ll go way out on a limb and say that creativity must be capable of being implemented in a machine, unless someone can prove that there is some kind of immaterial undetectable magical creativity engine that we simply have not yet discovered.
Bimmler was one of Rob Pike’s trained Markov Chains, which was filled with fascistic and racist drivel. When someone on USENET started making fashy noises, Pike would set Bimmler up to follow-comment all of the person’s postings, commenting approvingly and verbally goose-stepping around. USENET was a kinder, gentler, place than today’s social media.
I decided to ask someone who really knows how diffusion image generation works:
snarkhuntr says
I’ll repeat what I said earlier in the discussion about your ai generated ‘book’. If that’s your definition of creativity, it’s a pretty sad one coming from a creative person. I’ve seen some of your own art, both photographic and physical (mainly the bladesmithing), and I would say that it certainly appeared creative.
To me, the thing that’s always been lacking here is volition/intent/understanding, and these don’t appear to be things that the current generation of AI systems are capable of producing. They can attempt to simulate it, often with conversation tics that have to have been programmed in by the owners, such as when it says things like “when I hear Bauhaus” in a completely text-based format. This is essentially a cognitive trick: getting the system to ape human behaviour causes people to attribute humanlike characteristics to it; it’s been working since Eliza.
To go back to your book example: the model doesn’t ‘know’, nor does it care, that it described the character as a brunette in one chapter and as a redhead in another. Why would it? The AI system spends even less time/effort on characterization than a hack author like Ringo or Hubbard does. Actual authors usually work from a broad story outline, maybe some key characters they want to explore, some sets or settings they like, and flesh out from there. Your LLM’s creativity appears to consist of taking your prompts and applying MILSF-based language to them. Character descriptions are common in books, so it adds plenty of those – even in places where the hackiest, paid-by-the-word author might stop and think “Is this necessary?” Sometimes the need to add descriptions appears to override the model’s memory that it has already described that character, so it goes on and gives that character a new description. It does not ‘know’, nor does it care, that it contradicts itself. After all, dice don’t mind giving you a second six in a row….
How do you define creativity anyhow?
I think the veracity of this point really depends quite strongly on what you consider learning to be. If it’s rote memorization, you have a point. If it’s understanding the underlying concepts, I think you’re way off the mark. Try posing a relatively common logic problem to an AI, and it’ll probably get it right. Pose the same problem to the AI but change the terms around in a way that contradicts the various versions of that problem that the AI ingested from Quora, Reddit, wherever, and suddenly it’s getting it wrong. It can probably count the R’s in ‘strawberry’, since that embarrassment caused the owners to have someone hard-code in the answer. But start asking it simple, easy, but uncommon questions like “Which months have [letter] in their name?” and you’ll rapidly see that any illusion of ‘understanding’ in the model is simply caused by it having previously ingested examples of those questions being asked and (mostly correctly) answered by humans. There is no reasoning capacity.
Some people thought Eliza was sentient too… I fail to see your argument. But even so, I doubt your proposition here. GPT’s capabilities have not been increasing rapidly. I’ll quote David Gerard here:
I’m not saying that these technologies aren’t interesting or useful. I just object to the endless and uncritical hype that these systems are being pumped up with. They’re cool, but I see no signs whatsoever that they’re going to be world-changing; they will not solve our problems, they will not fix our society, and they are burning ever-larger amounts of energy and money while their boosters continually talk about what the systems will do, while ignoring the limitations of what they presently can do after billions of dollars of investment.
There is an s-curve, but I suspect that you and I see this technology as being in different places along it. I suppose this disagreement will settle itself in another few years.
When the only major commercial use of these systems ends up being phone/internet chatbots to get customers to give up, and automated systems for denying insurance claims and social benefits, I’ll be sad to be right. But hey, maybe I’m wrong. Maybe Sam Altman will really give birth to the god-in-a-box, and if it isn’t a paperclip maximizer, we can all enjoy the new AI golden age.