Now Look What You Made Me Do!


[Warning: Arachnophobia, AI art]

PZ posted a picture of a jumping spider riding a horse, and complained about its price tag. [pha]

Naturally, I immediately fired up Midjourney and gave it a prompt more or less matching the description of PZ’s find:

“A giant jumping spider riding a galloping horse across the western plains.”

Obviously, this was not what I expected:

Midjourney AI and mjr

I don’t think the eyes and fangy teeth are accurate in this world. Perhaps that is a cover illustration for “John Carter and the Spider Cowboys of Mars” or something. Actually, I’d buy and read that book, equal parts enjoying it, and cringing.

Let me wander a bit from the basic idea and talk about something related, but interesting. I cross-posted one of my rants over at DKos, and included some of my bad Midjourney promptings. That fetched me a growl from one of the commenters, who pointed out that I was standing against artists who have had their job market and rights undercut by AIs.

I have to admit, that rattled me. It still does. So, being human, I engaged in some ratiocination, then as a philosophically inclined human, I attacked my own rationales, and ground the topic (in my mind) into the mud of some intellectual Passchendaele. (too soon?) But, here’s the thing – and, I swear, after a lot of thinking, this is what I think I believe right now. Feel free to try to change my mind but, if you do, argue honestly.

On the topic of copyright, I believe that art, to have meaning, has to have something to do with the human experience. All of the great art we consider meaningful (pace Jackson Pollock, who does nothing for me) is meaningful because of the cultural references or experiences of the viewer.

Midjourney AI and mjr “an unpublished painting by jackson pollock”

In other words, a human artist’s method is similar to that of an AI: a great deal of cultural knowledge and (generally) experience with other art gets mushed around in their conscious and subconscious and turned into an output. For example, my personal photography is strongly influenced by Art Frahm, Greg Crewdson, and Erwin Olaf. I say “strongly” in the same sense that an AI might be more trained toward producing artworks with sharp focus, cinematic lighting, and strong front-to-back depth. Am I stealing from those guys? No, because they also stole it and re-interpreted it, ad nauseam. Let me be clear: I am saying that AI “creativity” is qualitatively similar if not quite the same as human “creativity.”

Back in 2017 I tried to explore this idea with a few hypotheticals. [stderr] AI models for art generation often include “generative adversarial networks” (GAN) – an arrangement in which one AI permutes candidate ${whatever} and hands them to the adversarial network, which has been trained with “top rated art.” So, imagine something coughing out subgenius art at a thousand miles/second, filtering it past an AI art gallery directory trained on the contents of every great art museum – it only lets through things that are sort of likely to be museum-worthy.

Anyhow, that’s a process that involves using a lot of art as its inputs. But is that “stealing” the art? Sure, a lot of that art is copyrighted, but it is still acceptable to have an art student go to the Met in New York and try to absorb the style of an ancient master’s brush-strokes. If I steal a great artist’s brush-strokes and use them for different content, am I violating their copyright?
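To make the generate-then-filter arrangement described above concrete, here is a toy sketch in Python. It is not a real GAN: nothing is trained, and the "critic" is a hand-written overlap score rather than a neural network. The names `REFERENCE_WORKS`, `VOCABULARY`, `mutate`, `critic_score`, and `generate_and_filter` are all invented for illustration. It only shows the shape of the loop: one part coughs out permutations, the other lets through whatever looks most like what it has already seen.

```python
import random

# Hypothetical stand-in for "the contents of every great art museum":
# each "artwork" is just a short list of style tokens.
REFERENCE_WORKS = [
    ["sharp focus", "cinematic lighting", "deep perspective"],
    ["cinematic lighting", "muted palette", "suburban scene"],
    ["sharp focus", "staged portrait", "muted palette"],
]

# Everything the generator is allowed to permute, including some junk.
VOCABULARY = sorted(
    {token for work in REFERENCE_WORKS for token in work}
    | {"lens flare", "crayon scribble", "static noise"}
)


def mutate(rng, size=3):
    """Generator: cough out a random combination of style tokens."""
    return rng.sample(VOCABULARY, size)


def critic_score(candidate):
    """Critic: best token overlap between the candidate and any reference work."""
    return max(len(set(candidate) & set(work)) for work in REFERENCE_WORKS)


def generate_and_filter(n_candidates=1000, keep=5, seed=0):
    """Generate many candidates, then keep only the ones the critic rates highest."""
    rng = random.Random(seed)
    candidates = [mutate(rng) for _ in range(n_candidates)]
    candidates.sort(key=critic_score, reverse=True)
    return candidates[:keep]


if __name__ == "__main__":
    for work in generate_and_filter():
        print(critic_score(work), work)
```

Note that the critic never copies a reference work into the output; it only decides which of the generator's permutations get through. Whether that counts as "stealing" is exactly the question.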

Last time I mentioned copyright, one of the commenters offered an observation that copyright is bullshit in general. OK, there might be something to that, but I haven’t figured out what I think of that, yet, but I wanted to acknowledge that the position has been taken.

Summary: I don’t think that AI are either non-creative, or stealing other artists’ materials. The first part of that, accepting that AI can be creative, is possibly contentious; go ahead and contend. But I think that if you accept the GAN model for creative filtering of permutations, you will have a hard time asserting that’s not what human artists do, and we generally acknowledge that creativity is a property of human art. I have argued that “stealing” is also part of human creativity, since we’re mostly re-mixing existing cultural assets, and what is creative is the sequence and arrangement of those assets. [By the way, that also answers the Christian theists’ popular argument that information cannot be created at random. Of course it can, you goofs! I keep thinking of doing a posting on that but I feel like kicking Christianity around is becoming ungenerous.]

Midjourney AI and mjr: “edward munsch the scream as photographed using a live model by erwin olaf”

Does the Midjourney take on Munch impress you? It blows me out of my socks. Is it creativity or a pseudorandom number generator? [By the way: current generation GPUs include random number generators. Interesting, huh? I’m with Knuth: “random number generator is an oxymoron.” But is there enough non-determinacy in the GPU to sample randomness from it?]

Next topic: am I further starving the already-starving artists? Nope, sorry. Those of you who remember this blog from the pre-AI era will recall that I used to do mediocre photoshop composites, sometimes, or captioned memes. I never spent a cent on artists. I was never going to. So, I am not denying any artist any money. I have commissioned artworks from real artists, before, e.g.: Great American Satan did a design for my official ship patch in Elite: Dangerous. [And did a fantastic job of it, too!] Andreas Avester was commissioned to design the tribal badger head I use as a logo, and have tattoo’d on my left arm. I commissioned Michael Hoops to make my fine art blacksmith’s hammer out of wrought iron and S7 tool steel. Etc. There are a variety of places in my life where I have obtained artworks from artists, but, generally, I wouldn’t actually waste a real artist’s time to do stupid throwaway illustrations for a blog. If I had 100,000 viewers and a budget and an editorial calendar, then I might actually think about it. My friend Per Hennig Olsen in Norway is a professional illustrator who has done some amazing work for catalogs, post cards, and stuff like that – I have been known to ask Per to illustrate postings before, but it’s fraught. He’s a real artist and he can’t drop everything and do a throwaway illustration for a blog with a small but important commentariat.

That, then, is my answer to the accusation that I am further starving the already starving: sorry, I wasn’t going to help you, anyway. I’m not sure I like that answer.

The spider cowboy image illustrates a big problem with current AIs. They don’t know what hands are, is one thing (they are improving) but another is that they don’t understand arrangement. “a ${whatever} on a ${thing}” translates to: “I want an image with a ${whatever} and a ${thing}.” I suspect that someday in the not too distant future there will be AI art generators that have some kind of positional model built into them, but that’d require a completely different approach to generating the image. In Stable Diffusion there are things called “controlnets” that overlay the probabilities of the model with small tweaks that move things in the desired direction. The AI still does not understand a ${whatever} on a ${thing} but I can go into photoshop and compose an image that conveys that to the AI, then use that image as a controlnet.

Comments

  1. sonofrojblake says

    AI “creativity” is qualitatively similar if not quite the same as human “creativity.”

    I’m reminded of a chapter from Hofstadter’s “Metamagical Themas” titled “Variations on a theme as the crux of creativity” – definitely worth a read (like most of the rest of that book).

  2. snarkhuntr says

    I don’t know if it’s original to them, but the folks at the Trashfuture podcast often describe ‘AI’ as “a machine for hiding attribution”. This seems as good a description as any of the ways that these systems work and will continue to work.

    Sometimes what it’s hiding is the poor workers in Kenya working frantically behind Oz’s curtain to make sure that ChatGPT doesn’t say anything too racist, pedophilic or otherwise objectionable.

    Or perhaps it’s hiding that it’s simply taken the work, the output, of thousands of people who uploaded and crucially *tagged with text* thousands or millions of images that the Generative image ‘AI’ models require assimilating before they can respond to your text queries with ‘new’ images.

    When ‘AI’ systems are given control of our companies, vehicles, or our military weapons, the biases and preferences of whoever is in power will be embedded into those systems at the deepest level. Consciously or otherwise, this is going to happen. But the real utility of the ‘AI’ is going to be that the people in power can point to it after the system does something horrible and say “It wasn’t us, the ‘AI’ did it!” But by-and-large AIs will not be allowed to do things that the powerful don’t want them to. Hence the armies of people working behind the curtains to ensure that the “AI” output of the various highly-touted systems isn’t allowed to offend their rather prudish sensibilities. (And also to ensure that these highly valuable models aren’t poisoned by internet trolls, see: Microsoft Tay)

    I will continue to put ‘AI’ into scare quotes because the technologies being deployed now are explicitly not intelligent. Image generators understand nothing of the image they’re generating, it’s just random noise that is more-or-less “prompt-y” according to the settings chosen by the operator or the program defaults. ‘AI’ text generators are more than happy to produce factually wrong texts, because all they know is the universe of other people’s writing that has been fed into them (and the constant corrections provided directly by humans into the model). Giving ‘AI’s any kind of truth-validation function would require writing completely different kinds of programs to supervise and tweak the underlying generative model, and may not be possible in any case. Just look at how hard it has been, despite some of these companies having vast amounts of investor cash to burn through, to get the models to understand what a “hand” is, or what positions are.

    The creativity is all in the model-building, algorithm tweaking and prompt-crafting. The models themselves are not creative in any way, they are just as ‘happy’ to produce a neat photo-realistic image as they are to send you a wall of static in response to your query. It is the diligent work of the programmers, actual artists and other people whose output was fed into the model that gives it, and you, the ability to produce interesting visuals from your query.

  3. says

    The “spider rider” picture makes me think of the covers Warren Publications used to put on its black and white comics magazines. Specifically it made me think of The Rook.

  4. xohjoh2n says

    Your free words are undercutting professional writers!

    (Right, I’m off downstairs to undercut a barista, while I ponder how specifically I’m going to put a restaurant out of business tonight.)

  5. Hj Hornbeck says

    … who dares revive me from my deep slumber?!

    By the way: current generation GPUs include random number generators. Interesting, huh? I’m with Knuth: “random number generator is an oxymoron.” But is there enough non-determinacy in the GPU to sample randomness from it?

    Ah yeah, that would do it. I suspect you’re talking about hardware-based RNGs on GPUs. Some Canadians were granted a patent on “hardware” RNG in GPUs in 2016, but that’s the first and last I’ve heard of it. NVidia does offer a hardware RNG on their Jetson platform, but that’s an embedded SoC and not a GPU.

    Theoretically, there’s plenty of non-determinacy on a GPU. That patent lists two sources, the time taken to resolve a race condition (for the laypeople, that’s two or more processes racing to access a resource that only one can use at a time), and how long it takes to measure the GPU’s temperature. Neither is particularly random in their raw form, but if you pass the output through a few rounds of histogram equalization the authors claim you can get decent results.

    This gets at why there’s no rush towards hardware random number generators in general. The patent authors claim their method is a “true random number generator,” but they never show it. They throw some basic tests at their numbers and compare it to a popular (and flawed) PRNG, but a proper proof would tie one or both of the above to a natural process that’s known to be completely non-deterministic. Physicists think such processes exist, based on their current theories, but proving that would require total knowledge of how the universe works. Even if we jump over that hurdle, is the hardware properly sampling from that process or just telling us it is? There is good reason to doubt the quality of hardware RNG. Even if we jump over that hurdle, how much bang are we getting for our buck? The process those Canadians describe is painfully slow, right from waiting to sample a single value in a single thread, to applying some post-processing to make that value look random.

    In contrast, I could have invoked PCG in a thousand threads, and in the span of a few cycles had better quality randomness than those Canadians were promising. Need something more robust? The SHA256, ChaCha20, and AES algorithms have all been heavily studied, and they can convert a deterministic sequence into something no-one’s been able to predict. You pay a heavier computational penalty than PCG, but the output can be considered more non-deterministic than anything you’d get from a hardware process.

    Hence why the cryptographically-secure random source on most unix-like systems isn’t raw values drawn from a hardware RNG, but values drawn from multiple sources passed through a cryptographic primitive. If you want high-quality randomness, you rely on deterministic algorithms instead of hardware, Knuth be damned.
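The two ideas in the comment above (a hash turning a predictable sequence into output nobody can predict, and several entropy sources being folded together through a cryptographic primitive) can be sketched in a few lines of Python. This is only an illustration under assumptions: `hash_stream` and `mix_sources` are made-up names, the construction is a simplified hash-in-counter-mode, and it is not what any particular kernel actually implements. For real work the thing to call is `os.urandom` or the `secrets` module.

```python
import hashlib
import os
import time


def hash_stream(seed, n_blocks):
    """Expand a seed into n_blocks * 32 bytes by hashing seed || counter.

    The counter sequence (0, 1, 2, ...) is completely predictable; the
    output is only as unpredictable as the seed is secret.
    """
    out = bytearray()
    for counter in range(n_blocks):
        block = hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        out.extend(block)
    return bytes(out)


def mix_sources(sources):
    """Fold several entropy sources (byte strings) into one 32-byte seed.

    Each source is length-prefixed so that (b"ab", b"c") and (b"a", b"bc")
    do not hash to the same value.
    """
    h = hashlib.sha256()
    for chunk in sources:
        h.update(len(chunk).to_bytes(8, "big"))
        h.update(chunk)
    return h.digest()


if __name__ == "__main__":
    seed = mix_sources([
        os.urandom(32),                             # the OS's own mixed pool
        time.perf_counter_ns().to_bytes(8, "big"),  # timing jitter, weak on its own
    ])
    print(hash_stream(seed, 2).hex())
```

The design point is that a weak or partly predictable source cannot make the mixed seed worse, as long as at least one input is hard to guess.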

  6. SchreiberBike says

    I’ve seen images of many Jackson Pollock works and thought little of them, but the first time I saw one full size and in person, it blew me away. Blew me away and pulled me in. There’s something about live original art that has a different effect than pixels on a screen or prints in a book.

  7. says

    Interesting thought – how hard do people think it would be to build a mechanical Pollock generator and make real fakes?
    Further, has anyone at Boston Dynamics considered teaching one of their bots to hold a paintbrush?

  8. Pierce R. Butler says

    The whole question takes us back to Oliver Wendell Holmes (or maybe somebody else):

    If you copy from one book, that’s plagiarism; if you copy from many books, that’s research.

  9. says

    I have moral issues with the commercial use of the products of AI, but I also pay Midjourney monthly because I enjoy playing around with it.

    Anyway, I also believe copyright is important as long as we live in a capitalist society, but it should also expire and a lot sooner than the Disney lobbyists have managed to get it extended to.

    Anyway, yeah, getting Midjourney to depict something on something can be impossible. Or something beside something. I’ve tried to get it to depict an elf and a goblin playing cards and it always results in two goblin-like creatures. I tried it early on, and even with 5.2 it’s still an issue.

  10. sonofrojblake says

    There’s something about live original art that has a different effect than pixels on a screen or prints in a book.

    This, so much this, at least for some art. Others, not so much. I’ve seen a lot of Gilbert & George’s stuff IRL, and frankly none of it loses anything being printed in a book or on a computer monitor. The Mona Lisa is on the same level – any reproduction will do, I get it.

    Conversely, three paintings immediately spring to mind:
    – Nighthawks by Edward Hopper
    – The Fairy Feller’s Master-Stroke by Richard Dadd
    – Madame de Loynes by Amaury-Duval

    No reproduction does these paintings justice, no written or verbal description can convey the experience of standing in front of them. These are just the three that live in my mind – your mileage may vary. When an AI can create that feeling, then and only then do I think we need to worry.

  11. says

    Reginald Selkirk@#11:
    You just know the cowpoke in that top image is nicknamed ‘Lefty’

    Fastest left-handed gun in the west. It’s the guys with 20 fingers that get confused on the draw.

    Actually, the 20-fingered gunslingers went extinct. That’s how they got weeded out of the gene pool.

  12. says

    Tabby Lavalamp:
    Anyway, yeah, getting Midjourney to depict something on something can be impossible. Or something beside something. I’ve tried to get it to depict an elf and a goblin playing cards and it always results in two goblin-like creatures. I tried it early on, and even with 5.2 it’s still an issue.

    Yeah, it’s hard. I tend to use the “evolutionary approach” – let my GPU run for 15 minutes kicking out images (usually about 10-20 depending on the prompt and training set) then see what I have in the net.

    Prompt:

    2 people, (1elf woman and 1goblin woman:1.1) are (playing strip poker:1.1). sitting at a table in a dungeons and dragon tavern. large breasts. (playing cards)
    detailed, insanely detailed, cute, photorealistic. (clothes are heaped on floor:1.1).

    (masterpiece, best quality) concept art in the style of erwin olaf, greg rutkowski, artgerm, dungeons and dragon tavern scene, volumetric fog. cinematic lighting, drama. cool color tones.

    (dynamic view), (dynamic pose)

    The Card Sharp: (DynavisionXL)

    Alternate version:

  13. says

    Definitely not what I was trying to get! But another example of how difficult it still is to get your vision. One of these days I may have to commission what I want, or hand draw it myself, because I have a specific vision that I want to get on a play mat for Magic: The Gathering.

  14. Pierce R. Butler says

    Marcus Ranum @ # 14 – Midjourney now paints nipples (okay, a nipple)?!?

    I’d thought the 2023 version of the Comics Code Authority had a RULE against that.

  15. says

    Tabby Lavalamp@#15:
    Definitely not what I was trying to get! But another example of how difficult it still is to get your vision. One of these days I may have to commission what I want, or hand draw it myself, because I have a specific vision that I want to get on a play mat for Magic: The Gathering.

    Every time I commissioned an artwork, it was because I actually only had a vague idea what I wanted – one of those “I’ll know it when I see it” kind of deals. The AI generators sort of feel the same way – sometimes they really come out of left field and cough up something unexpected that is also right.

    I did have some success in the past getting commissions on deviantart.

  16. says

    Pierce R. Butler@#16:
    Midjourney now paints nipples (okay, a nipple)?!?

    Ah, I forgot to say – that was done with stable diffusion because of Midjourney’s limits. The Midjourney folks walked away from a whole lot of $ when they decided to side with the American Taliban.

  17. says

    snarkhuntr@#2:
    I need to argue with some of the points you make but have not had the attention-time and brain-power to think hard enough to be coherent.

  18. snarkhuntr says

    marcus@#19:

    Re-reading my post, I feel that it was a bit of a clunky and far-reaching way to make my overall point. If we’re just talking about creativity in ‘AI’ – my point would be this:

    All of the creativity in the AI is the direct result of human input, whether it is the creation of accurate databases of images with descriptions, the tuning of generative ‘AI’ models to produce coherent output, or the discovery/development of ways of prompting the model to produce a desired composition. All of that is creativity. The ‘AI’ though, is not capable of that. This is pretty easy to demonstrate if you’re running stable diffusion – just push the iterations up higher. The ‘AI’ is still ‘trying’ to achieve the result suggested by your prompt, but the longer you let it work the closer it gets back to random noise. It has no way of knowing when to stop, because it has no way to understand what an image is, or why you would want it to look one way or another.

    That’s also why, AFAIK, ‘AI’s cannot be trained on ‘AI’ generated output. If you were to have stable diffusion procedurally generate hundreds of thousands of training images based on randomly selected text prompts, then tag those images with the prompts and train them into a new model, and iterated this step, your models would get rapidly less and less coherent or useful. Eventually, fed only its own output, the ‘AI’ would descend into a kind of metaphorical insanity (I think that smarter people than me coined the term ‘AI senility’ for this). Without a reference to the real world, the AI is free to imagine that any cluster of pixels can represent any cluster of letters and render it accordingly. It is only those vast databases of images created by and annotated by humans that keep it producing something we enjoy.
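A very small analogue of the self-training loop described above can be run in plain Python. This is a toy under loud assumptions: the "model" is a one-dimensional Gaussian, not an image generator, and `self_training_spread` is an invented name. Each generation is fitted only to samples drawn from the previous generation's fit, which is the same feedback loop in miniature (the effect is also discussed under the name "model collapse").

```python
import random
import statistics


def self_training_spread(generations=200, sample_size=10, seed=0):
    """Repeatedly fit a Gaussian to samples drawn from the previous fit.

    Generation 0 is the "real world": mean 0, standard deviation 1.
    Every later generation sees only the previous model's output.
    Returns the fitted standard deviation at each generation.
    """
    rng = random.Random(seed)
    mean, spread = 0.0, 1.0
    history = [spread]
    for _ in range(generations):
        samples = [rng.gauss(mean, spread) for _ in range(sample_size)]
        mean = statistics.fmean(samples)
        spread = statistics.pstdev(samples)  # fitted to its own output only
        history.append(spread)
    return history


if __name__ == "__main__":
    history = self_training_spread()
    for generation in (0, 10, 50, 100, 200):
        print(generation, history[generation])
```

With small samples the fitted spread tends to shrink generation after generation, so the model forgets how varied the original data was. That says nothing rigorous about Stable Diffusion, but it shows why a loop with no fresh real-world data has nothing to pull it back.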

    Without human input and constant human correction, the ‘AI’ is just a random noise machine, just a finite set of monkeys hammering on a set of keyboards.

    I am not claiming that these tools aren’t useful, or that there isn’t creativity in them. But attaching the marketing term “AI” to them fundamentally misleads the public about what these tools can or may be able to do. These tools are a cool way to remix human output in ways that we find pleasing or interesting, and they are going to be a permanent part of our digital toolkits. But they aren’t going to change most people’s lives in any meaningful way.

    Some commercial illustrators are going to suffer hard, because the work they were paid for was never highly valued by their customers for any innate artistic qualities it had – they were just producing something to fill space on a book cover, album cover or other ephemeral place where our capitalist system determined that there needed to be ‘some art here’. It isn’t going to replace art, it’s just going to make it much harder for up-and-coming artists to earn a living.
