Artificial intelligence can identify people in pictures, find the next TV series you should binge-watch on Netflix, and even drive a car. But on Friday, when a suspected terrorist in New Zealand streamed live video of a mass murder to Facebook, the technology was of no help. The gruesome broadcast went on for at least 17 minutes until New Zealand police reported it to the social network. Recordings of the video and related posts rocketed across social media while companies tried to keep up.
Why can’t AI, which major social networks already use to help moderate the status updates, photos, and videos users upload, simply be deployed more widely to remove such violence as swiftly as it appears?
A big reason is that, whether it’s hateful written posts, pornography, or violent images or videos, artificial intelligence still isn’t great at spotting objectionable content online. That’s largely because, while humans are great at figuring out the context surrounding a status update or YouTube video, context is a tricky thing for AI to grasp.
AI has improved dramatically in recent years, and Facebook, Twitter, YouTube, Tumblr and others increasingly rely on a combination of artificial intelligence and human moderators to police content posted by users.
But with a huge volume of posts popping up on these sites each day, it’s difficult for even this combination of people and machines to keep up. AI still has a long way to go before it can reliably detect hate speech or violence online.
Machine learning, the AI technique tech companies depend on to find unsavory content, figures out how to spot patterns in reams of data; it can identify offensive language, videos, or pictures in specific contexts. That’s because these kinds of posts follow patterns on which AI can be trained. For example, if you give a machine-learning algorithm plenty of images of guns or written religious slurs, it can learn to spot those things in other images and text.
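To make the idea of learning patterns from examples concrete, here is a minimal sketch of a word-counting classifier (a tiny naive Bayes model). It is an illustration only, not how any social network actually moderates content. The training sentences, the labels "flag" and "ok", and the function names are all made up for this example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words on whitespace."""
    return text.lower().split()


def train(examples):
    """Count how often each word appears under each label.

    `examples` is a list of (text, label) pairs; the labels used here,
    "flag" and "ok", are hypothetical.
    """
    counts = {"flag": Counter(), "ok": Counter()}
    docs = Counter()
    for text, label in examples:
        counts[label].update(tokenize(text))
        docs[label] += 1
    vocab = set(counts["flag"]) | set(counts["ok"])
    return counts, docs, vocab


def classify(model, text):
    """Return the label whose word statistics best match the text."""
    counts, docs, vocab = model
    total_docs = sum(docs.values())
    scores = {}
    for label in counts:
        total_words = sum(counts[label].values())
        score = math.log(docs[label] / total_docs)
        for word in tokenize(text):
            if word not in vocab:
                continue  # words never seen in training carry no signal
            # Laplace smoothing keeps unseen word/label pairs from scoring zero.
            score += math.log(
                (counts[label][word] + 1) / (total_words + len(vocab))
            )
        scores[label] = score
    return max(scores, key=scores.get)


# Made-up training data, far smaller than anything used in practice.
model = train([
    ("i will hurt you", "flag"),
    ("they deserve to be attacked", "flag"),
    ("lovely weather today", "ok"),
    ("see you at the game tonight", "ok"),
])

print(classify(model, "i will attack you"))
print(classify(model, "lovely weather at the game"))
print(classify(model, "nobody deserves to be attacked"))
```

The last sentence argues against violence, yet this sketch scores it the same way as a threat, because it only counts words it has seen before and knows nothing about who is speaking or why.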
However, AI is not good at understanding things such as who’s writing or uploading an image, or what might be important in the surrounding social or cultural environment. Especially when it comes to speech that incites violence, context is “very important,” said Daniel Lowd, an associate professor at the University of Oregon who studies artificial intelligence and machine learning.
Comments may superficially sound very violent but actually be satire in protest of violence. Or they may sound benign but be identifiable as dangerous to someone with knowledge about recent news or the local culture in which they were created.
“So much of the impact of a few words depends on the cultural context,” Lowd said, pointing out that even human moderators still struggle to analyze this on social networks. Even if violence appears to be shown in a video, it isn’t always so straightforward that a human — let alone a trained machine — can spot it or decide what best to do with it. A weapon might not be visible in a video or photo, or what appears to be violence could actually be a simulation.
Furthermore, factors like lighting or background images can throw off a computer. It’s computationally difficult to use AI to find violence in video, in particular, said Sarah T. Roberts, an assistant professor at UCLA who researches content moderation and social media.
“The complexity of that medium, the specificities around things like, not only however many frames per second, but then adding in things like making meaning out of what has been recorded, is very difficult,” she said.
It’s not simply that using AI to glean meaning out of one video is hard, she said. It’s doing so with the high volume of videos social networks see day after day. On YouTube, for instance, users upload more than 400 hours of video per minute — or more than 576,000 hours per day. “Hundreds of thousands of hours of video is what these companies trade in,” Roberts said. “That’s actually what they solicit, and what they want.”
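The daily figure follows directly from the per-minute one, as this short calculation shows. The eight-hour reviewing shift at the end is a hypothetical number added only to give a sense of scale; it does not come from YouTube or from Roberts.

```python
HOURS_UPLOADED_PER_MINUTE = 400   # figure cited above
MINUTES_PER_DAY = 60 * 24         # 1,440 minutes in a day

hours_uploaded_per_day = HOURS_UPLOADED_PER_MINUTE * MINUTES_PER_DAY
print(hours_uploaded_per_day)     # 576000

# Hypothetical assumption: one person watches video for 8 hours a day.
REVIEW_HOURS_PER_PERSON_PER_DAY = 8
people_needed = hours_uploaded_per_day // REVIEW_HOURS_PER_PERSON_PER_DAY
print(people_needed)              # 72000
```

Under that assumption, watching every upload once in real time would take roughly 72,000 people a day, which helps explain why companies look to automation in the first place.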