Researchers have come up with a method for creating realistic-looking (but fake) videos of anyone by using just a single image of them and a trained artificial intelligence system. It’s a potentially worrisome capability in the run-up to the 2020 United States presidential election, as falsified videos of candidates are expected to spread.
Researchers at the Samsung AI Center in Moscow and the Moscow-based Skolkovo Institute of Science and Technology explained the feat in a paper published this week to the arXiv, an online academic pre-print service. They said they were able to animate one or several photos of people by first training an AI system on a dataset of videos including many celebrities, so it could learn about key points on the face. After that, the AI system was able to combine that familiarity with one or more images of a person to come up with a convincing “talking head”-style video of them.
A video the researchers posted to YouTube this week showed multiple examples of how convincing it can look, as well as how much work is yet to be done. Impressively animated versions of physicist Albert Einstein, actress Marilyn Monroe and surrealist painter Salvador Dali were generated from iconic images of them.
But each was missing something: Einstein’s voluminous hairdo didn’t quite move with his head, Dali’s matchstick-thin mustache was cut short, and Monroe’s famous mole was absent from her cheek.
In this still from a YouTube video, researchers illustrate how they trained an AI system to create videos of people (in this case, actor Joe Manganiello) from just one or a handful of still images.
The work is quite similar to deepfakes (a combination of the terms “deep learning” and “fake”), which are convincing fake videos and audio made using cutting-edge and relatively accessible AI technology. The research uses the same AI technique behind deepfakes, which is a machine-learning method known as GANs, or generative adversarial networks. But it’s different, as deepfakes are generated by using video of a target person along with video of someone else acting the way the target will in the video, such as one featuring actor and comedian Jordan Peele putting words in former President Barack Obama’s mouth.
The spread of doctored videos is raising concerns for everyone from political leaders to the US intelligence community, which worries they may be used to mislead voters. These videos don’t need to be altered with the latest technology to be effective: A manipulated video of House Speaker Nancy Pelosi that went viral this week was simply slowed down to make it appear she was slurring her words following a meeting with President Donald Trump.
The researchers’ work is still in the early stages: The AI system was only trained to create a person’s head, neck and some of the shoulders. And while a clip created with a single reference photo of a woman looked plausible (though somewhat low-resolution), other clips that were made with eight and 32 images of her looked increasingly realistic.
Siwei Lyu, who studies deepfakes and is director of the computer vision and machine learning lab at University at Albany, SUNY, told CNN Business that the research could make it easier to create deepfakes with less data than they currently require. These days, that tends to be more than 30 seconds’ worth of video of both the person you want to manipulate and another person, who must also be filmed doing the desired motions.
“The downside is, without sufficient data, the quality of the synthesis is limited,” he said. Which is to say that he, too, noticed Monroe’s missing mole.