There is a new documentary, Roadrunner: A Film About Anthony Bourdain, about the food and travel writer who died by suicide in 2018. In the documentary, at one point they have him reading an email he sent to a friend. Why would he read an email aloud? Well, he didn’t. What the filmmakers did was use AI to synthesize a voice that closely resembled his, a technology that could be used to make any text seem to emanate from him. (I first learned about this technology when Marcus Ranum had a post on it back in 2016.)
I posted at that time that this new audio technology, coupled with the ability to make visual deepfakes of people, would open the floodgates to all manner of abuse, since people with ill intentions could produce ‘evidence’ that made it appear as if someone was saying something that they did not.
That kind of abuse has not occurred as yet (as far as I know), but the revelation that this AI technology was used by the documentary filmmaker Morgan Neville to make it appear as if Bourdain was actually reading the email has generated some controversy.
There were a total of three lines of dialogue that Neville wanted Bourdain to narrate, the film-maker explained in his interview. However, because he was unable to find previous audio, he contacted a software company instead and provided about a dozen hours of recordings, in turn creating an AI model of Bourdain’s voice.
…Despite Neville describing his use of AI technology as a “modern storytelling technique”, critics voiced concerns on social media over the unannounced use of a “deepfake” voice to say sentences that Bourdain never spoke.
…Sean Burns, a film critic for Boston’s WBUR, denounced the film-makers, writing: “When I wrote my review I was not aware that the film-makers had used an AI to deepfake Bourdain’s voice … I feel like this tells you all you need to know about the ethics of the people behind this project.”
I am not sure why what Neville did is being seen as so objectionable. We have long had actors read the words of dead people in films, TV, and radio. Why would having a computer read the words be any worse? Is it that the striking accuracy of the voice reproduction might make people think that Bourdain had actually spoken the words, and hence feel that they had been deceived? If Neville had revealed the use of AI earlier, perhaps critics would have been mollified.
This article looks more closely at the ethical issues involved.
“We have pretty strong policies around what can be done on our platform,” said Zohaib Ahmed, founder and CEO of Resemble AI, a Toronto company that sells a custom AI voice generator service. “When you’re creating a voice clone, it requires consent from whoever’s voice it is.”
Ahmed said the rare occasions where he’s allowed some posthumous voice cloning were for academic research, including a project working with the voice of Winston Churchill, who died in 1965.
Ahmed said a more common commercial use is to edit a TV ad recorded by real voice actors and then customize it to a region by adding a local reference. It’s also used to dub anime movies and other videos, by taking a voice in one language and making it speak a different language, he said.
He compared it to past innovations in the entertainment industry, from stunt actors to greenscreen technology.

Just seconds or minutes of recorded human speech can help teach an AI system to generate its own synthetic speech, though getting it to capture the clarity and rhythm of Anthony Bourdain’s voice probably took a lot more training, said Rupal Patel, a professor at Northeastern University who runs another voice-generating company, VocaliD, that focuses on customer service chatbots.
“If you wanted it to speak really like him, you’d need a lot, maybe 90 minutes of good, clean data,” she said. “You’re building an algorithm that learns to speak like Bourdain spoke.”
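To give a sense of how accessible the underlying technique has become, here is a minimal sketch of voice cloning using the open-source Coqui TTS library and its XTTS model. This is purely an illustration of the general approach, not the software Neville’s team actually used, and the file names are placeholders:

# pip install TTS   (Coqui TTS, an open-source text-to-speech library)
from TTS.api import TTS

# Load a pretrained multilingual voice-cloning model (XTTS v2).
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

# Synthesize new text in a voice imitating the speaker heard in the reference
# clip. "reference_clip.wav" stands in for whatever recordings of the target
# speaker are available.
tts.tts_to_file(
    text="Words the speaker wrote but never read aloud.",
    speaker_wav="reference_clip.wav",
    language="en",
    file_path="cloned_output.wav",
)

A model like this can produce something recognizable from a short clip; the roughly 90 minutes of clean audio that Patel mentions is what it takes to get a convincing match to a particular person’s clarity and rhythm.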
There is one problem that technology cannot solve. The same set of words can be made to convey quite different meanings by changing the pitch, cadence, emphasis, inflection, and pauses. I recall once seeing a video of a famous actor (Ian McKellen? Orson Welles?) say the same lines from Shakespeare (?) in different ways to suggest quite different meanings. When we read the written words, we can interpret them in different ways but the AI bot fixates on just one, presumably randomly.
For this technology to work, the AI software needs access to a fair amount of recorded audio of the speaker in order to reproduce the voice accurately, so we would not be able to hear the voice of Abraham Lincoln, for example. But I think that this technology is going to be widely used in documentaries about dead people who leave behind a cache of voice recordings.
The special effects in films nowadays show actors doing things that they did not actually do and audiences know that and seem to accept it. Soon people will be heard saying things that they did not actually say. As long as we are aware that it is not real, I think we can eventually expect the same level of acceptance.
moarscienceplz says
“I recall once seeing a video of a famous actor (Ian McKellen? Orson Welles?) say the same lines from Shakespeare (?) in different ways to suggest quite different meanings.”
I believe it was in the Macbeth episode of Shakespeare Uncovered on PBS that I saw an actor (I also cannot remember who) demonstrate the “Tomorrow and tomorrow and tomorrow” soliloquy with the usual emphasis on “tomorrow”, and then again with the emphasis on “and”. It was astounding how much difference this made to the feel of the piece.
jrkrideau says
@ 1 moarscienceplz
HE said that?
He said THAT?
HE said THAT?
etc.
It may be a while before AI gets it right, or we may learn to love the AI interpretation of Shakespeare.
robert79 says
“since people with ill intentions could produce ‘evidence’ that made it appear as if someone was saying something that they did not.
That kind of abuse has not occurred as yet (as far as I know) ”
It has happened.
https://www.theguardian.com/world/2021/apr/22/european-mps-targeted-by-deepfake-video-calls-imitating-russian-opposition
Not sure if they used voice imitation, but an image deepfake is probably already enough to cause damage; you generally look up a face, not a voice.
Holms says
This is only a technological version of what voice impersonators already do, without much worry about ethics. If they make clear that they are an impersonator rather than the original person, perhaps with a disclaimer somewhere, no problem exists; if they don’t, then the original person (or their estate) has grounds for legal action.
Michael Caine on the topic, sort of.
consciousness razor says
I think it’s pretty obvious that that isn’t the big objection for most people.
There you have it: misleading people is objectionable. Having a voice actor do it, without making that apparent in the documentary, would be just as problematic.
From the Guardian link:
Yeah, that’s the stuff right there — the dude is outright refusing to let people know which lines were faked. What kind of asshole documentarian does that, particularly when the film’s about someone who’s recently deceased?
garnetstar says
I agree about the (lack of) ethics in this, and the potential for abuse, and the deceiving of the audience into believing that it is Bourdain himself speaking the words. Voice actors, impersonators or not, are *always* credited, as all actors should be.
But really, it’s what @5 said: won’t the audience, watching this faking and attempt to *pretend* that a dead person is speaking, be thinking “This is really creepy!”? And the more recently deceased, the more offensive. I wouldn’t think it’d be a very popular trick. The old-school “Voice of Anthony Bourdain” credit must work a *lot* better than trying to pretend that he actually spoke those lines from beyond the grave!
sonofrojblake says
Faking someone’s voice using AI for a documentary to get them to say out loud things they themselves wrote: very, VERY cool.
Faking someone’s voice using AI for a documentary to get them to say things they never said or meant: not cool, but that’s not what happened.
Faking someone’s voice using AI for a documentary but concealing that pertinent fact: not cool…. but… we’re talking about it. TNSTABP.
Ridana says
That reminds me of a Benny Hill clip I saw where a director was trying to film with an actress who tended to interpret her lines off-kilter.
Actress: What’s on the road? A head?
Director: No, no, cut! It’s “What’s on the road ahead?”!!
Actress (intimately pressing her body against costar): Ooh, what’s this thing called, love?
Director: Gah!! Cut! It’s “Oh what’s this thing called “love“?”!!
Marcus Ranum says
Abuse is only an issue as long as people believe in media. I wonder if we should cut to the chase and start teaching that media is unreliable.
Marcus Ranum says
“I recall once seeing a video of a famous actor (Ian McKellen? Orson Welles?) say the same lines from Shakespeare (?) in different ways to suggest quite different meanings.”
McKellen did that in one episode of the Teaching Shakespeare series, which is highly recommended. It’s the Royal Shakespeare Company doing drama practice. Fascinating stuff. McKellen was, of course, the bomb.
sonofrojblake says
https://youtu.be/d2A07ToxkTI