Last June, Antonio Radić, the host of a YouTube chess channel with more than a million subscribers, was live-streaming an interview with the grandmaster Hikaru Nakamura when the broadcast suddenly cut out.
Instead of a lively discussion about chess openings, famous games and iconic players, viewers were told Radić’s video had been removed for “harmful and dangerous” content. Radić saw a message stating that the video, which included nothing more scandalous than a discussion of the King’s Indian Defence, had violated YouTube’s community guidelines. It remained offline for 24 hours.
Exactly what happened is still not clear. YouTube declined to comment beyond saying that removing Radić’s video was a mistake. But a new study suggests it reflects shortcomings in artificial intelligence programmes designed to automatically detect hate speech, abuse and misinformation online.
Ashique KhudaBukhsh, a project scientist who specialises in AI at Carnegie Mellon University and a serious chess player himself, wondered if YouTube’s algorithm may have been confused by discussions involving black and white pieces, attacks and defences.
So, he and Rupak Sarkar, an engineer at CMU, designed an experiment. They trained two versions of a language model called BERT, one using messages from the racist far-right website Stormfront and the other using data from Twitter. They then tested the algorithms on the text and comments from 8,818 chess videos and found them to be far from perfect.
The algorithms flagged around one per cent of transcripts or comments as hate speech. But more than 80 per cent of those flagged were false positives – read in context, the language was not racist. “Without a human in the loop,” the pair say in their paper, “relying on off-the-shelf classifiers’ predictions on chess discussions can be misleading.”
The experiment exposed a core problem for AI language programs. Detecting hate speech or abuse is about more than just catching foul words and phrases. The same words can have vastly different meanings in different contexts, so an algorithm must infer meaning from a string of words.
“Fundamentally, language is still a very subtle thing,” says Tom Mitchell, a CMU professor who has previously worked with KhudaBukhsh. “These kinds of trained classifiers are not soon going to be 100 per cent accurate.”
Yejin Choi, an associate professor at the University of Washington who specialises in AI and language, says she is “not at all” surprised by the YouTube takedown, given the limits of language understanding today.
Choi says additional progress in detecting hate speech will require big investments and new approaches. She says that algorithms work better when they analyse more than just a piece of text in isolation, incorporating, for example, a user’s history of comments or the nature of the channel in which the comments are being posted.
But Choi’s research also shows how hate-speech detection can perpetuate biases. In a 2019 study, she and others found that human annotators were more likely to label Twitter posts by users who self-identify as African American as abusive, and that algorithms trained to identify abuse using those annotations will repeat those biases.
Companies have spent many millions collecting and annotating training data for self-driving cars, but Choi says the same effort has not been put into annotating language. So far, no one has collected and annotated a high-quality data set of hate speech or abuse that includes lots of “edge cases” with ambiguous language.
“If we made that level of investment on data collection – or even a small fraction of it – I’m sure AI can do much better,” she says.
Mitchell, the CMU professor, says YouTube and other platforms likely have more sophisticated AI algorithms than the one KhudaBukhsh built, but even those are still limited.
Big tech companies are counting on AI to address hate speech online. In 2018, Mark Zuckerberg told Congress that AI would help stamp out hate speech. Earlier this month, Facebook said its AI algorithms detected 97 per cent of the hate speech the company removed in the last three months of 2020, up from 24 per cent in 2017. But it does not disclose the volume of hate speech the algorithms miss, or how often AI gets it wrong.
Curious scientists fed some of the comments gathered by the CMU researchers into two hate-speech classifiers – one from Jigsaw, an Alphabet subsidiary focused on tackling misinformation and toxic content, and another from Facebook. Some statements, such as, “At 1:43, if white king simply moves to G1, it’s the end of black’s attack and white is only down a knight, right?” were judged 90 per cent likely not to be hate speech. But the statement “White’s attack on black is brutal. White is stomping all over black’s defences. The black king is gonna fall…” was judged more than 60 per cent likely to be hate speech.
It remains unclear how often content may be mistakenly flagged as hate speech on YouTube and other platforms. “We don’t know how often it happens,” KhudaBukhsh says. “If a YouTuber is not that famous, we will not see it.”
- A Wired report