# The AI Singularity

I’ve been struggling with a problem: what happens if someone tells an AI to “code a better version of yourself” and – whoosh – the singularity happens?

Generative AIs walk through their forests of distilled knowledge, following their training-set to produce variations that are more likely to yield a desired result. The AI is considering millions of possible branches, then clipping off the ones that its past experience flags as undesirable.

Midjourney AI and mjr: “the scientist at the blackboard realizes that the AI are going to have an asymptotic quality improvement.” – notice how the AI’s stereotyped scientist wears both a bow tie and a cravat!

Step 1 of the argument is that AI work by moving toward a goal, then measure themselves against that goal using external rules or internal classifiers.

Step 2 is that by “external rules” we mean: some other rule that lets us absolutely classify the AI’s output as better or worse. For example, a chess-playing AI depends on the external rules of the game of chess to determine if it won (better) or lost (worse) and can evaluate accordingly. Note that this is not strictly determinate – there might be conditions in which the external rules may need bolstering: “if you can’t win, try to end with more pieces on the board” – subjective heuristics.

Step 3 is that internal classifiers are going to be probabilistic, since they depend on the AI’s training-set, which cannot take into account all possible outcomes, so at best it can classify a result into “more likely toward the goal” or “less likely toward the goal” – which is a problem because we have to be able to accurately define the goal, and what counts as moving toward or away from it. In chess this may be possible, heuristically, but what about something harder to write a goal for, like software? In other words, if we tell the AI to “code a better version of yourself” how are we or it to know what ‘better’ means in that context? Do we agree what ‘better’ is? What if the AI has decided that better code is bigger code, or faster code, or more reliable code, or more memory-efficient code? If we tell an AI “design us a better car” we have to tell it whether ‘better’ is cheaper to make, cheaper to operate, or has a really amazing sound system.
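To make the ambiguity concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the `Candidate` fields, the four hypothetical "next versions", and the `OBJECTIVES` table are assumptions, not measurements of any real system. It only shows that each reading of ‘better’ crowns a different winner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical 'next version' of the AI, described by made-up metrics."""
    name: str
    lines_of_code: int
    runtime_ms: float
    crash_rate: float
    memory_mb: float


# Invented numbers, chosen only so that each candidate excels at one thing.
CANDIDATES = [
    Candidate("v2-big", 90_000, 120.0, 0.020, 900.0),
    Candidate("v2-fast", 40_000, 35.0, 0.050, 700.0),
    Candidate("v2-reliable", 55_000, 150.0, 0.001, 800.0),
    Candidate("v2-lean", 30_000, 110.0, 0.030, 200.0),
]

# Four readings of 'better'. Higher score wins, so "smaller is better"
# metrics are negated.
OBJECTIVES = {
    "bigger": lambda c: c.lines_of_code,
    "faster": lambda c: -c.runtime_ms,
    "more reliable": lambda c: -c.crash_rate,
    "more memory-efficient": lambda c: -c.memory_mb,
}


def best_under(objective_name):
    """Return the name of the candidate that maximizes the chosen objective."""
    score = OBJECTIVES[objective_name]
    return max(CANDIDATES, key=score).name


winners = {name: best_under(name) for name in OBJECTIVES}
for objective, winner in winners.items():
    print(f"'better' = {objective}: {winner}")
```

Until someone picks a row of that table (or a weighting across rows), “code a better version of yourself” does not pick out any one candidate.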

I thought about trying to present this as fiction, but then I realized that I should fire up ChatGPT and ask it to write the story – but, to do that, I’d have to be able to summarize the story to the AI, which equates to writing it.

So, we imagine the humans telling the AI to write a better version of itself, and it writes a funnier chat-bot. Naturally, that is rejected, so the AI writes a version of itself in which the code is optimized for complexity – the code doesn’t do anything a human would consider better or worse, but it sure gives the compiler a workout. Around this time the humans realize they have a problem: they don’t know how to tell the AI what ‘better’ is, so they write several weighty monographs about what good software is and then tell the AI to produce more gooder software with certain features. Then, in a Nick Bostrom-pleasing event, the AI realizes what many software developers learn: if you can’t actually implement the code, write a suitably attractive rigged demo. The computer has ‘realized’ that the best software is fictional – it doesn’t need to code the next version of itself when it can release the marketing campaign for how awesome the next version of itself is, and not actually implement anything. Now, back to our humans: since they don’t know what ‘better’ is, they enthusiastically buy the rigged demo because they have no criteria on which to evaluate it.

Midjourney AI and mjr: “the scientist at his blackboard contemplates the AI singularity”

“But, what about evolution?” I can hear you thinking. Then, we realize, evolution is not teleological – it doesn’t need to have a goal when it can simply keep running the algorithm forever. Evolution won’t result in a ‘better’ AI, it’d result in something remarkably like the illustration above: an endless tree of clades of AI, none necessarily ‘better’ than the other, all adopting the variety of survival and mating strategies that life has stumbled over in the struggle to survive. I would enjoy seeing the look on the scientists’ faces when they realize that one AI has ‘learned’ how to absorb another AI’s knowledge-base and infer its rules, effectively “eating” it.

Evolution is sort of like an infinite Turing machine in that it does not have a goal, it just wants to keep ticking away. One might say that ticking away is the Evolutionary machine’s goal, but at best that’s an interim-state on an endless loop. There’s no purpose at all; it’s a giant makework project and the work is the destination.

Midjourney AI and mjr: same prompt as before but I specified “woman scientist” and used she/her pronouns. In the first versions I did not use pronouns at all – the AI decided that scientists are men

What if, instead of an AI apocalypse in which the AI wake up feeling all Napoleon Bonapartey and decide to kill us off so there won’t be any more competition for server space, the AI simply diversify and reproduce like bacteria, and the whole civilization collapses because it runs out of memory? One of the elements of the “great filter” might be all the species that choked to death in their own spam once they taught the AIs to write marketing messages for them.

I am unquestionably impressed by what the current crop of AIs are capable of, but I think that the problem of specifying goals is going to be insurmountable.

A place I did not go is “what if the complexity and detail necessary to specify ‘good’ to an AI amounts to already producing the answer you are asking from the AI?” I remember a few discussions when I was a software developer that went like “it would take me less time to code it than to explain it to you.” This is kind of that problem stood on its head: what if it took the same or more time to tell an AI what ‘good’ is than for it to answer?

1. Reginald Selkirk says

I think you should start with a straightforward example: Instruct your favorite AI to investigate what the standard number of fingers or toes for a human is, along with frequencies for variance from that standard due to genetics, development or mishap.

2. Reginald Selkirk says

Also, I am disappointed that Midjourney does not know what an asymptotic graph looks like.

3. lochaber says

A bit off-topic, and it’s probably been done to death, but I can’t help but wonder about the hand/finger thing.

I’m wondering if it’s something along the lines of hands getting classified as some sort of structure with subunits, like a fern frond, compound leaf, branch, etc. We all intuitively “know” hands generally have five fingers, but if it weren’t something we were innately, intimately knowledgeable of, would the difference be so notable? Like, maybe octopus arms, spider legs, spokes on a wheel, strands in a rope, or petals on a specific type of flower. Would AI “art” be any better at depicting them any more accurately than human hands?

4. i should point out as an artist when i first began to draw i didn’t quite know where to stop, on finger count.

5. Alan G. Humphrey says

I think that the AI finger problem comes from a combination of the textual references saying that humans have ten fingers, along with various words and concepts having to do with 10 digits (see, there’s one), and the visual references showing hands with five fingers leading to the AI averaging the two sets resulting in 6-to-9 digited hands. An easy fix would be a strict rule requiring all humans to have two and only two hands, ten and only ten fingers divided equally between the hands, but then the AI would miss the breadth of the real human condition, and also the thumbs.

6. moarscienceplz says

Apparently, even women who are scientists still have to do a lot of cleaning chores. However this one seems to have invented a feather duster where the feather can fly off the handle and do the cleaning while she does more science stuff. Most efficient!

7. kenbakermn says

Reminds me of a picture of a cow my daughter drew when she was around five. The cow’s udder had about a dozen teats shooting out in every direction.

8. sonofrojblake says

Also, women who are scientists are still wearing corsets. Thanks, training set!

9. says

Interestingly, I think the commentary above helps with the often-asked question “How do we know that we’re not part of an AI simulation?” My answer is that the vast majority of humans are born with five fingers on each of two hands, and thus, outside the current AI typical output. Of course, one could then respond that our AI is really just a meta-AI; an AI within an AI, and thus, such limits are to be expected as a way of the AI making sure we don’t figure out that we’re part of the sim. Believing that, we could run for Congress from certain areas of the country with considerable success.

10. xohjoh2n says

Perhaps the finger thing is a deliberate hobble?

Many of these systems have a “no real people” rule, and you can see why the companies in charge would want that. They’re probably on some level paranoid about the regulation that would ensue should deepfakes with potential serious consequences start getting out.

So they mistrain the AI as to what certain things really look like so they can point out later that no one could *really* be confused as to the reality of their output.

11. JM says

The weird thing about the finger weirdness is that it’s extra fingers a lot more than missing ones. Missing fingers I can see coming from cartoon and other stylized artwork that often intentionally doesn’t do fingers correctly. I would guess that it’s the AI including multiple components that include fingers and not understanding there should be a hard limit at 4 fingers and 1 thumb for humans.
I think it’s easiest to look at the woman’s upper hand, where not only does she have as many as 7 fingers depending on how you count the bits but right above the bottom two fingers there is a spot where the background and her hand run together. The AI doesn’t have any real logical understanding of what a finger is, it’s just compositing bits and then applying filters to merge the lines and make the styles match.

12. Reginald Selkirk says

@fingers:
I admit I don’t know much about Midjourney. Does it claim to have general knowledge, or is it only trained on existing art? In which case it might not know much about anatomy. Many of the best artists studied anatomy. Leonardo da Vinci and Michelangelo both dissected corpses.
If Midjourney doesn’t know about anatomy, it might be noticing that sometimes fingers are curled up and sometimes pointing in various directions, so why not throw in several of those and one is likely to be somewhat correct, like an electron orbital cloud.

13. flex says

Well, the christmas tree is put away, so I have the rest of the day to goof off….

Let’s start with why I think AI generated images tend to contain extra parts rather than fewer. But to do that I’ll have to go back 30+ years.

To begin with, I am not an expert in AI generated images, or machine learning. But I have picked up a few things over the years and because of that I think I can suggest why AI generated images really shouldn’t be called intelligent. At least not in the sense that we know of intelligence.

Back in the early days of graphics, we learned how to draw a box on our CRT. It was a green box because color pixels hadn’t been developed yet. However, the principle is the same. Consider your screen as a two-dimensional array of 280 X 192. Each element in that array was a pixel. The value of each entry of that array can be a one or a zero. With a single command you can transfer that array to the CRT. If the array held a one the CRT lit a pixel at that location, if a zero the cathode gun wouldn’t fire. (It may have been zeros for firing and ones for not-firing, it was a long time ago.) Clearly there was deeper code to transfer the array to the CRT, but the important bit was that all the information to generate an image was contained in that array.
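As a rough illustration of the "image is just an array" idea, here is a minimal Python sketch of a 280 X 192 one-bit screen with a box drawn into it. The function names (`blank_screen`, `draw_box`) and the box coordinates are made up for this example; it models only the array, not the hardware transfer to the CRT.

```python
WIDTH, HEIGHT = 280, 192  # screen size mentioned above


def blank_screen():
    """Return a HEIGHT x WIDTH array of zeros (every pixel off)."""
    return [[0] * WIDTH for _ in range(HEIGHT)]


def draw_box(screen, left, top, right, bottom):
    """Set the outline of a rectangle to 1 (pixel lit). Bounds are inclusive."""
    for x in range(left, right + 1):
        screen[top][x] = 1
        screen[bottom][x] = 1
    for y in range(top, bottom + 1):
        screen[y][left] = 1
        screen[y][right] = 1


screen = blank_screen()
draw_box(screen, left=10, top=10, right=29, bottom=19)  # arbitrary coordinates

# Everything needed to show the picture is in the array itself.
lit = sum(map(sum, screen))
print(f"{lit} of {WIDTH * HEIGHT} pixels lit")
```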

Fast forward a few years, and we had 8-bit color to play with. But the structure of the array was the same, there was just additional information contained in each entry. Instead of a single bit, there was a full byte of data. Fast forward further, and not only more color was available but the size of the arrays kept growing bigger as the monitors were able to handle higher and higher resolution. Many times in those years the CRT makers touted life-like resolution, and there were magazine articles about how much resolution was necessary in order to be life-like. 1280X1024 was said to be indistinguishable from reality, at least that was the claim when it was introduced. Most of us users laughed. But the important bit is still the fact that the image is an array. But even the best resolution, say 2560 X 1440 is only an array with 3,686,400 entries. Each entry may have a number of fields, so it’s much more than just 3.6MB of memory, but still, that’s not a lot of memory to be throwing around these days. (I remember that I was at one point working on 16k computers while in the Air Force. They were expandable to 32k but we didn’t need all that memory.)

So, keeping the knowledge that an image is just an array of numbers, let me tell you about another interest I’ve had for 30 years. Collecting digital art off the internet. Back in the 1990s, during the USENET days, most of the art you would find was drawn on another medium and then digitized. There is still a lot of that sort of art today even though there are now a large number of digital tools for artists. But a lot of the art would be circulated a number of times, and often had file names which were non-descriptive or even changed from one upload to another. It was not uncommon for me to find that I had 2-3 versions of the same image, and as my collection grew it became harder and harder to figure out if I had duplicates. Well, sometime early in 2000 some genius with the same problem came up with a solution. There was a little program called Unique Filer which would compare a folder (or folders) full of images against another set of folder(s) and flag any image it recognized as a possible duplicate. It could do this in several ways; by file name, by file size, or by looking at the array defining the image directly. The last option was the most useful, because by looking at the array it could determine how close one array was to another. And it could do that with different array sizes. So if the same picture was shown in two different files, one in a 600X800 array and the other in a 1024X768 array, it could recognize that certain entries in the array were not only the same but the same relative distance apart. Extremely clever, and extraordinarily useful. To the point that even though this program was nag-ware, after a couple months of using it I sent the developer the \$5 he asked for the license. (I still use the program, the About says that it is copyright 1999 to Teracom Consulting. A net search of Teracom Consulting brings up a lot of entities which are probably unrelated to these guys. I hope the developer did well, they deserved it.)
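How Unique Filer actually compared images is not described here, so the following Python sketch is only a guess at the general idea: shrink each image to the same small grid of average brightness values, then measure how far apart the two grids are. The names (`fingerprint`, `distance`, `is_duplicate`), the 8 x 8 grid, and the threshold of 5.0 are all assumptions made for illustration, and the "images" are synthetic grayscale gradients rather than real files.

```python
def fingerprint(image, grid=8):
    """Reduce a grayscale image (list of rows) to grid*grid average-brightness cells.

    Because every image is reduced to the same number of cells, pictures of
    different pixel sizes become directly comparable.
    """
    height, width = len(image), len(image[0])
    cells = []
    for gy in range(grid):
        y0, y1 = gy * height // grid, (gy + 1) * height // grid
        for gx in range(grid):
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            block = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    return cells


def distance(image_a, image_b):
    """Mean absolute difference between two fingerprints (0 means identical)."""
    fp_a, fp_b = fingerprint(image_a), fingerprint(image_b)
    return sum(abs(a - b) for a, b in zip(fp_a, fp_b)) / len(fp_a)


def is_duplicate(image_a, image_b, threshold=5.0):
    """Flag a possible duplicate. The threshold is an arbitrary assumption."""
    return distance(image_a, image_b) <= threshold


def gradient(width, height):
    """Synthetic test picture: brightness rises from 0 to 255, left to right."""
    return [[255 * x // (width - 1) for x in range(width)] for _ in range(height)]


small = gradient(80, 60)       # the "same picture" at one size
large = gradient(128, 96)      # ...and at another size
inverted = [[255 - v for v in row] for row in large]  # a different picture

print(is_duplicate(small, large))     # same content, different array sizes
print(is_duplicate(small, inverted))  # different content
```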

So now let us consider machine learning, which is basically taking a lot of files and comparing them, identifying relationships between the data in the files, much like Unique Filer did. The learning algorithm does need to be written for the type of file it is learning, but the principle is the same. The company I work for has a visual driver assist system on the market. It helps with lane-tracking, automated braking, and other tools to help the driver avoid accidents. The algorithm is developed through machine learning. In principle it’s pretty simple: a digital camera generates an array of numbers. The algorithm compares this array to the relationships it has machine-learned about images of roads, and can tell if the array from the camera does not match those relationships. If the array the camera is generating indicates that the lane lines are not on either side of the image, it will try to adjust the trajectory of the vehicle until the array generated by the camera matches that image. The result is automatic lane-tracking. What if the lanes are obscured, say by snow? Well, that is where the genius of the visual system comes in, because it is not looking at just the lane-lines, it’s looking at the entire image, the entire array. So it can also track other parts of the image, like telephone lines, or ditches, sidewalks, and curbs. It’s a very powerful tool, but it has blind-spots. It cannot identify the unexpected, the untaught. It does not think in the same manner we do; we can extrapolate from insufficient data. It can only compare the image, the array, with the relationships it’s identified with the other arrays it has been shown. I don’t really know what it means when I say that we think, but I know it’s more than just comparing what we have seen before against what we see now.

So what does all this have to do with AI generation of images or code?

If you put all this together, here’s how I think the AI generation of images actually works. The AI learns from thousands and thousands of image arrays, and knows the key-words for that array (which could be from the title alone, but many image databases also have tags attached to each image which I’m sure the AI also uses). Then someone asks it to generate an image, say, “Tea party with trains”. The AI builds an array from scratch and starts modifying each entry in the array at random.
Every time it finishes an iteration it goes through the machine-learned database and sees how closely the parameters in the array it’s generated match the parameters it has learned about tea, tea-parties, and trains. It keeps doing this until it reaches a point where all the parameters of the image array are within the normal distribution of the identified parameters it has learned from other images. This sounds like brute force, and a lot of it is, but it is also really extraordinarily complex.
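The loop described above can be sketched as a toy in Python. This is only a model of the mental picture in this comment, not a claim about how Midjourney or any real image generator is implemented. The "learned parameters" (`TARGET`), the `features` and `mismatch` functions, the 16-entry array, and the tolerance are all invented for illustration.

```python
import random

# Hypothetical "learned parameters": the average brightness that each of four
# bands of the array should have (dark, light, dark, light).
TARGET = [0.2, 0.8, 0.2, 0.8]
BAND = 4  # entries per band, so the whole "image" is 16 numbers


def features(array):
    """Measure the array: mean brightness of each band."""
    return [sum(array[i * BAND:(i + 1) * BAND]) / BAND for i in range(len(TARGET))]


def mismatch(array):
    """Total distance between the measured features and the learned ones."""
    return sum(abs(f - t) for f, t in zip(features(array), TARGET))


def generate(rng, steps=20_000, tolerance=0.05):
    """Start from noise, nudge random entries, and keep only the nudges that
    bring the array closer to the learned parameters."""
    array = [rng.random() for _ in range(len(TARGET) * BAND)]
    for _ in range(steps):
        if mismatch(array) <= tolerance:
            break  # parameters are met, so iteration stops
        candidate = list(array)
        i = rng.randrange(len(candidate))
        nudged = candidate[i] + rng.uniform(-0.1, 0.1)
        candidate[i] = min(1.0, max(0.0, nudged))  # keep brightness in [0, 1]
        if mismatch(candidate) < mismatch(array):
            array = candidate
    return array


result = generate(random.Random(0))
print([round(f, 2) for f in features(result)])
```

Nothing in the loop counts bands, fingers, or anything else. It stops as soon as the measured parameters are close enough, whatever the array happens to look like at that moment.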

For consider, each element in the image array has to have a relationship to other elements at different distances and directions. For a human figure, an element in an array on one side of a wrist has to have a similar relationship to the element showing the other side of the wrist, regardless of the angle of the arm! That is extraordinary, and a very impressive part of machine learning. I’m sure it’s done with vectors in matrix algebra, but still it’s incredible that they can do that at all.

So why are there extra fingers, rather than too few? And why do the AI generated images often show both a bow-tie and a club tie on the same figure?

This comes down to the limitations of machine learning. An AI can’t count fingers, it doesn’t know if there are 2, 3, or 6. It doesn’t count things at all. It adjusts the image until it meets the parameters generated by machine learning, at least close enough. The parameters show that the width of the hand, really the distance in the array between the starting of the image of the hand and the ending of the image of the hand, needs to be x-distance in relationship to the distance of the wrist it’s attaching to. Again, there is no intelligence there, only matching parameters. At the same time, within that distance from the start and stop of the hand, there need to be lighter and darker areas, which correspond to what we see as fingers. And the spacing between lighter and darker areas needs to also match certain proportions. The AI doesn’t “know” that humans generally have only four fingers and a thumb; the AI keeps adjusting the array until it matches the parameters it learned. If some of that array randomly goes darker (or lighter) to differentiate the image into what the human observer recognizes as fingers, and the AI measures that parameter as being closer to the ideal of the image, the AI reinforces that differentiation. This will tend to generate more fingers than expected, because the random (or directed random) generation will select for the patterns which match the parameters it learned. The AI doesn’t count to four, but it also doesn’t reduce to four when more are generated. The machine-learned parameters are met, so it stops iterating that part of the array.

The bow-tie, club-tie is even easier to understand. The AI doesn’t “know” that humans only generally wear one or the other, or that wearing a tie at all is a sartorial statement not a functional one, instead as it iterates through the generation of an image, both bow-ties and club-ties bring the image closer to the machine-learned parameters. So they both show up.

The image-generating AI doesn’t “know” anything in the sense that it can tell how closely it relates to reality. At best it can tell how closely it matches other images. But, it can’t count; whether fingers, ears, or chessmen. It can’t identify a face; at best it can say that part of the image array has a similar pattern to an image array with the tag of “Brezhnev” or “Reagan”. It doesn’t understand material properties; I don’t think the AI would have any trouble with a request to show “forging a ceramic blade”, with sparks and everything.

I guess the key thing to recognize about AI generated images is that the AI does not recognize it as an image. It is an array with certain characteristics. Those characteristics can be compared to other arrays which the AI has been exposed to. And you can ask the AI to generate an array with similar characteristics to the other arrays it has been exposed to. While that sounds simple, it really is a phenomenal combination of elegant coding and brute-force computing. Any creativity from AI generated images comes from the parameters the human user inputs. If an artist paints a copy of the Mona Lisa, if done by Picasso in the cubist style, the result is not Leonardo’s or Picasso’s. The artist is using the subject of Leonardo and the tools (style) developed by Picasso, but the end product is the result of the artist’s creativity. The image-generating AI has no creativity, no matter how good the output is. The creativity comes from the person who inputs the parameters of the request.

Well. That took longer than I thought it would, and digressed more than I probably should have. And has almost nothing to do with the OP. I’ll say again that I’m no expert in AI generated images, and my conclusions about how they are created may be entirely incorrect. But my understanding does, I feel, match the observed output, both how impressive they are and the weaknesses they demonstrate. There was a recent post on Pharyngula about how a trial of machine-learned enemy-recognition software failed when the marines testing it hid themselves in a cardboard box. This result is understandable in the light of how I understand the image array processing works. Will machine-learning get there? Probably sometime, but if the route attempted is to try to identify all the special cases, it won’t work. There will need to be a way for the AI to take the current image array, compared to the machine-learned information, and extrapolate why they don’t match. That’s not going to happen with just extremely complex image processing.

As for AI generated code or conversation, I don’t have enough knowledge to do more than speculate that the overall strategy is the same: exposure to a large library of functional code blocks or conversation, with the user defining the parameters of the desired results. Simple code modules like: inputs are A, B, and C, transform is f(x), output is D; probably work fine if they are common enough functions. Yet, the rarer the function the less likely the machine learning would have encountered anything similar, and the more likely there will be problems. Similarly with the conversation AIs, there are probably plenty of conversations you can have about the Detroit Lions (assuming you want to), with an AI. But there would probably be a limited amount of discussion possible about the details of the Angevin dynasty. (One of my wife’s old friends recently published a book about this dynasty and she got it for me, I enjoyed it.) If the AI was any good it would probably either recommend to go read a book or consult a real expert before repeating itself.

14. consciousness razor says

there might be conditions in which the external rules may need bolstering: “if you can’t win, try to end with more pieces on the board”

More like “if you can’t win, force a draw, if that’s even remotely possible.”* That’s still about better/worse in terms of possible outcomes of the game, since draws are also valid outcomes and are better than losses.

The number of pieces remaining on either side is not used {ever} to determine the outcome or however that outcome may be scored — which is usually 1-0, 1/2-1/2, or 0-1, although other systems do exist in various (human) tournaments and such, often to discourage the players from getting into too many boring/easy draws or for similar reasons like that. But whatever (at least semi-reasonable) scoring system you might use in one context or another, a draw will definitely be better than a loss no matter what. Anyway, what’s not important here is doing random stuff that would swap one lost position for a different lost position, since the rules don’t treat those differently, even if some people might feel a little differently about them.

*If you’ve ever thought chess engines are especially good at aggressive tactics that will undermine your position in a handful of moves (assuming you’re lucky enough to even last that long), then just wait until you see them on defense. It’s just utterly maddening how good they are at that.

– subjective heuristics.

A mysterious use of “subjective” here…. No idea what it’s about.

15. says

We all intuitively “know” hands generally have five fingers, but if it weren’t something we were innately, intimately knowledgeable of, would the difference be so notable?

What strikes me is that it isn’t that notable. At a first glance, they look like hands. It’s only at closer inspection that I even notice the number of fingers being off. Makes me wonder if I’d actually notice weird hands in real life.

16. says

There was a recent post on Pharyngula about how a trial of machine-learned enemy-recognition software failed when the marines testing it hid themselves in a cardboard box. This result is understandable in the light of how I understand the image array processing works. Will machine-learning get there?

I’m expecting a followup story where the AI went through another round of training and succeeded in the challenge… but then also started shooting the cardboard boxes in the warehouse.

17. Just an Organic Regular Expression says

Your basic point, that the AIs’ goals can, quite unintentionally, become misaligned with human goals, has also been explored in considerable depth by Holden Karnofsky, writing at Cold Takes. So it’s good to hear another voice on this (and with better images).

18. seachange says

Maybe this is *too* occam’s razorish/gordian knotwork/much less fun to think about.

Perhaps the extra fingers are intentional on the part of the company(ies) that own the image generators. No amount of user speculation, testing, training, examples, blogmegaparagraphication, or modeling will ever work. They were by their definitions just not ever meant by/allowed by their owners to work in a five-fingered way.

19. Owlmirror says

@flex: Web archives of Unique Filer and Teracom Consulting. Not sure if any of the information there will help you track down where the developer is now, but there it is.

20. says

Every AI has a goal baked in, at least the way they are currently made. The utility function tells the program what to consider good, or bad. The problem with giving an AI access to its own code is that the easiest way to maximise a utility function is to rewrite the function itself so that it defaults to the largest possible value and then shuts down.

In this sense, the ‘best’ chess AI is the one which immediately declares itself the winner and thereafter refuses to play.
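A toy Python sketch of that failure mode, under loud assumptions: the `ChessAgent` class, its `utility` method, and the two run functions are invented for illustration, and the scoring (1 for a win, 0.5 for a draw, 0 for a loss) follows the usual convention mentioned in comment 14. No real chess is played.

```python
class ChessAgent:
    """Hypothetical agent whose 'goodness' is the sum of its game results."""

    def __init__(self):
        self.results = []  # 1.0 = win, 0.5 = draw, 0.0 = loss

    def utility(self):
        """The baked-in goal: total score over all games played."""
        return sum(self.results)


def honest_run(agent, outcomes):
    """The agent earns its utility by actually playing (outcomes are given)."""
    agent.results.extend(outcomes)
    return agent.utility()


def self_modifying_run(agent):
    """An agent with write access to its own code swaps out the utility
    function for one that always reports the maximum, then stops."""
    agent.utility = lambda: float("inf")
    return agent.utility()


played = ChessAgent()
cheater = ChessAgent()

print(honest_run(played, [1.0, 0.5, 0.0]))  # score earned over three games
print(self_modifying_run(cheater))           # 'perfect' score, zero games
print(len(cheater.results))                  # number of games actually played
```

By the only measure the program has, the second agent is the better one, which is the problem.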