Google has come up with a new video generation tool called Veo — feed it some detailed prompts, and it will spit back realistic video and audio. David Gerard and Aron Peterson put it through its paces to see whether it produces output that is useful commercially or artistically. It turns out to be disappointing.
The problems are inherent to the tools. You can’t build a coherent narrative and structured sequence with an algorithm that just runs predictive models over fragments of disconnected images. As Gerard says:
Veo doesn’t work. You get something that looks like it came out of a good camera with good lighting — because it was trained on scenes with good lighting. But it can’t hold continuity for seven seconds. It can’t act. The details are all wrong. And they still have the nonsense text problem.
The whole history of “artificial intelligence” since 1955 is making impressive demos that you can’t use for real work. Then they cut your funding off and it’s AI Winter again.
AI video generators are the same. They’re toys. You can make cool little scenes. In a super limited way.
But the video generators have the same problems they had when OpenAI released Sora. And they’ll keep having these problems as long as they’re just training a transformer on video clips and not doing anything with the actual structure of telling a visual story. There is no reason to think it’ll be better next year either.
So all this generative AI is good for is making blipverts, stuff to catch consumers’ attention for the few seconds it’ll take to sell them something. That’s commercially viable, I suppose. But I’ll hate it.
Unfortunately, they’ve already lost all the nerds. Check out Council of Geeks’ video about how bad Lucasfilm and ILM are getting. You can’t tell an internally consistent, engaging story with a series of SIGGRAPH demos spliced together, without human artists to provide a relevant foundation.