AI Might Help Curb Fake News


Fake news is nothing new. Propaganda and “yellow journalism” in various forms have circulated for centuries, but social media has essentially put fake news on steroids.

2016 was a watershed year for fake news. An article published in the Journal of Economic Perspectives in 2017 brought some stunning statistics to light: 62% of U.S. adults get news on social media, and 41.8% of all visits to fake news sites in the U.S. came through social media links.

(Source: Journal of Economic Perspectives)

Via social media, fake news spreads easily and is hard to detect. The sheer volume of content makes detection a costly task for humans alone. The profile of the problem, an overwhelming amount of data to analyze, seems to fit exactly the type of problem artificial intelligence (AI) might be best suited to solve.

Can AI Combat Fake News?

The answer is … maybe. No solution has been commercialized yet. Facebook acquired Bloomsbury AI in 2018, and Twitter acquired Fabula AI in June 2019, yet neither platform has shown much progress in combating fake news with AI. Most of the innovation is still confined to academia. Here are a few examples of work under way:

  • At the Massachusetts Institute of Technology (MIT), researchers (supported by Facebook AI Research) are working with FEVER (Fact Extraction and Verification), a massive fact-checking database used to build tools that identify fake news. But the researchers are running into significant bias that undermines the tool’s effectiveness. An article published in MIT News in October 2019 illustrated some of the challenges:

FEVER has been used by machine learning researchers as a repository of true and false statements, matched with evidence from Wikipedia articles. However, the team’s analysis showed staggering bias in the dataset – bias that could cause errors in models it was trained on.

“Many of the statements created by human annotators contain giveaway phrases,” says lead author Tal Schuster. “For example, phrases like ‘did not’ and ‘yet to’ appear mostly in false statements.” One bad outcome is that models trained on FEVER viewed negated sentences as more likely to be false, regardless of whether they were actually true. 

“Adam Lambert does not publicly hide his homosexuality,” for instance, would likely be declared false by fact-checking AI, even though the statement is true, and can be inferred from the data the AI is given. The problem is that the model focuses on the language of the claim and doesn’t take external evidence into account.

Another problem of classifying a claim without considering any evidence is that the exact same statement could be true today but be considered false in the future. For example, until 2019 it was true to say that actress Olivia Colman had never won an Oscar. Today, this statement could be easily refuted by checking her IMDB profile. 

With that in mind, the team created a dataset that corrects some of this through de-biasing FEVER. Surprisingly, they found that the models performed poorly on their unbiased evaluation sets, with results dropping from 86 percent to 58 percent. 
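The negation problem Schuster describes is easy to reproduce in miniature. The sketch below is a toy "claim-only" scorer: it learns, from a handful of hypothetical labeled claims, how often each cue phrase appears in false statements, and then judges new claims on wording alone, never consulting evidence. The training claims, the cue list, and the function names are illustrative assumptions, not the FEVER data or the MIT team's models.

```python
from collections import Counter

# Hypothetical toy claims (NOT taken from FEVER), each labeled True or False.
TRAIN = [
    ("Paris is the capital of France", True),
    ("The Beatles released Abbey Road", True),
    ("Barack Obama was born in Hawaii", True),
    ("Mount Everest did not exceed 8,000 metres", False),
    ("Marie Curie did not win a Nobel Prize", False),
    ("Water does not boil at 100 degrees Celsius at sea level", False),
    ("Humans have yet to land on the Moon", False),
]

# Assumed "giveaway" cue phrases of the kind the MIT team observed.
CUES = ["did not", "does not", "yet to", "never"]


def cue_false_rates(data):
    """For each cue, estimate P(false | cue appears) with add-one smoothing."""
    seen, false_hits = Counter(), Counter()
    for claim, label in data:
        text = claim.lower()
        for cue in CUES:
            if cue in text:
                seen[cue] += 1
                if not label:
                    false_hits[cue] += 1
    return {cue: (false_hits[cue] + 1) / (seen[cue] + 2) for cue in CUES}


def claim_only_predict(claim, rates, threshold=0.5):
    """Label a claim from its wording alone; no evidence is consulted."""
    text = claim.lower()
    hits = [rates[cue] for cue in CUES if cue in text]
    if not hits:
        return True  # no cue fired, so default to "true"
    return max(hits) <= threshold  # a strongly "false" cue flips the verdict


rates = cue_false_rates(TRAIN)
claim = "Adam Lambert does not publicly hide his homosexuality"
# The claim is true, but the "does not" cue pushes the verdict to False.
print(claim_only_predict(claim, rates))
```

Because every negated training claim happened to be false, the scorer rejects a true negated claim, which is exactly the failure mode that evidence-aware models and de-biased evaluation sets are meant to expose.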

  • Another research effort, FakeDetector, uses a deep neural network that fuses information from social networks. Its authors describe it as follows:

In FakeDetector, the fake news detection problem is formulated as a credibility score inference problem, and FakeDetector aims at learning a prediction model to infer the credibility labels of news articles, creators and subjects simultaneously. FakeDetector deploys a new hybrid feature learning unit (HFLU) for learning the explicit and latent feature representations of news articles, creators and subjects respectively, and introduce a novel deep diffusive network model with the gated diffusive unit for the heterogeneous information fusion within the social networks.

Other universities are using the open-source FakeDetector code, but as of this writing there doesn’t seem to have been any further advancement in the solution.
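FakeDetector itself is a deep model (HFLU feature learning plus gated diffusive units), far beyond a short example. Its core idea, inferring credibility for articles, creators, and subjects together, can still be illustrated with a much simpler, plainly swapped-in technique: iterative score propagation over the graph that links them. This is not FakeDetector's architecture; the node names, edges, seed scores, and the `alpha` mixing weight are all hypothetical.

```python
from collections import defaultdict

# Hypothetical heterogeneous graph: each article links to its creator and subject.
EDGES = [
    ("article:a1", "creator:c1"), ("article:a1", "subject:politics"),
    ("article:a2", "creator:c1"), ("article:a2", "subject:health"),
    ("article:a3", "creator:c2"), ("article:a3", "subject:health"),
]

# Hypothetical seed credibility scores (0 = fake, 1 = credible) for fact-checked articles.
SEEDS = {"article:a1": 0.0, "article:a3": 1.0}


def propagate(edges, seeds, alpha=0.5, iterations=50):
    """Jointly estimate credibility for every node by neighbour averaging.

    Each unseeded node moves toward the mean score of its neighbours; seeded
    nodes stay clamped to their known values. alpha mixes a node's previous
    score with its neighbourhood mean (a "lazy" update that avoids oscillation).
    """
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    # Unknown nodes start at a neutral 0.5.
    scores = {node: seeds.get(node, 0.5) for node in neighbours}
    for _ in range(iterations):
        updated = {}
        for node, nbrs in neighbours.items():
            if node in seeds:
                updated[node] = seeds[node]
                continue
            mean = sum(scores[n] for n in nbrs) / len(nbrs)
            updated[node] = (1 - alpha) * scores[node] + alpha * mean
        scores = updated
    return scores


scores = propagate(EDGES, SEEDS)
for node in sorted(scores):
    print(f"{node}: {scores[node]:.2f}")
```

Because creator c1 is linked to the debunked article a1, its score is pulled well below the neutral starting value, while c2 inherits the credibility of a3. Clamping the seeded nodes keeps known fact-check results fixed while their influence flows to unlabeled neighbours.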

  • A 2018 paper by researchers at the University of Michigan and the University of Amsterdam proposes combining computational linguistics with fact-checking to create a tool that is (in theory) more effective at combating fake news:

Computational linguistics can aid in the process of identifying fake news in an automated manner well above the chance level. The proposed linguistics-driven approach suggests that to differentiate between fake and genuine content it is worthwhile to look at the lexical, syntactic and semantic level of a news item in question. The developed system’s performance is comparable to that of humans in this task, with an accuracy up to 76%. Nevertheless, while linguistics features seem promising, we argue that future efforts on misinformation detection should not be limited to these and should also include meta features (e.g., number of links to and from an article, comments on the article), features from different modalities (e.g., the visual makeup of a website using computer vision approaches), and embrace the increasing potential of computational approaches to fact verification (Thorne et al., 2018). Thus, future work might want to explore how hybrid decision models consisting of both fact verification and data-driven machine learning judgments can be integrated.
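The lexical level of such a linguistics-driven approach, and the hybrid decision the authors call for, can be sketched roughly as follows. This is an illustration under stated assumptions: the word lists, the features, the hand-set weights, and the `fact_score` input (imagined as the output of some separate fact-verification system) are hypothetical, not the features or model from the 2018 paper, and syntactic and semantic features, which need a parser, are left out.

```python
import re
from dataclasses import dataclass

# Hypothetical word lists; a real system would use curated lexicons.
NEGATIONS = {"not", "no", "never", "none", "nobody", "nothing"}
FIRST_PERSON = {"i", "me", "my", "we", "us", "our"}


@dataclass
class LexicalFeatures:
    word_count: int
    type_token_ratio: float   # lexical diversity: unique words / all words
    negation_rate: float      # share of words that are negations
    first_person_rate: float  # share of words that are first-person pronouns
    exclamation_rate: float   # '!' characters per word


def lexical_features(text: str) -> LexicalFeatures:
    words = re.findall(r"[a-z']+", text.lower())
    n = len(words) or 1  # avoid division by zero on empty text
    return LexicalFeatures(
        word_count=len(words),
        type_token_ratio=len(set(words)) / n,
        negation_rate=sum(w in NEGATIONS for w in words) / n,
        first_person_rate=sum(w in FIRST_PERSON for w in words) / n,
        exclamation_rate=text.count("!") / n,
    )


def linguistic_fake_score(f: LexicalFeatures) -> float:
    """Hypothetical hand-set weights, capped at 1.0; a trained classifier would learn these."""
    raw = 2.0 * f.exclamation_rate + 1.5 * f.negation_rate + 0.5 * (1 - f.type_token_ratio)
    return min(raw, 1.0)


def hybrid_fake_score(text: str, fact_score: float, weight: float = 0.6) -> float:
    """Blend an external fact-verification score (0 = supported, 1 = refuted)
    with the linguistic score; `weight` is an assumed mixing parameter."""
    ling = linguistic_fake_score(lexical_features(text))
    return weight * fact_score + (1 - weight) * ling


sample = "You won't believe this! They never tell you the truth! Share now!"
print(lexical_features(sample))
print(round(hybrid_fake_score(sample, fact_score=0.9), 2))
```

A fixed linear blend is the simplest possible "hybrid decision model"; a learned combiner could instead weigh the two signals differently depending on how confident the fact-verification step is.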

Caution: Roadblocks Ahead

These examples alone show that there are issues. Innovators have approached the problem from various angles, only to run into roadblocks. One major issue not yet mentioned is how fake news content is created. Several AI-driven solutions in development are getting quite good at detecting automatically generated text and deepfake images. The problem is that a significant portion of fake news is still written by humans, which renders these solutions ineffective against it.

Samuel Woolley, an assistant professor in the Moody College of Communication at the University of Texas at Austin, is the author of an upcoming book called The Reality Game. In an article for MIT Technology Review, Woolley explained that he had found fake news to be less AI-driven than commonly reported in the media:

All the evidence I’ve seen on Cambridge Analytica suggests the firm never launched the “psychographic” marketing tools it claimed to possess during the 2016 US election – though it said it could target individuals with specific messages based on personality profiles derived from its controversial Facebook database. 

When I was at the Oxford Internet Institute, meanwhile, we looked into how and whether Twitter bots were used during the Brexit debate. We found that while many were used to spread messages about the Leave campaign, the vast majority of the automated accounts were very simple. They were made to alter online conversation with bots that had been built simply to boost likes and follows, to spread links, to game trends, or to troll opposition. It was gamed by small groups of human users who understood the magic of memes and virality, of seeding conspiracies online and watching them grow. Conversations were blocked by basic bot-generated spam and noise, purposefully attached to particular hashtags in order to demobilize online conversations. Links to news articles that showed a politician in a particular light were hyped by fake or proxy accounts made to post and repost the same junk over and over and over. These campaigns were wielded quite bluntly: these bots were not designed to be functionally conversational. They did not harness AI. 

Fake News Detection: Who Should Be Responsible?

The logical assumption would be that social media platforms should be responsible for detecting the fake news that flows through them. To date, however, the social media giants have not really seen it that way. Woolley stated in the article that the platforms appear willing to “passively identify potentially false information for users” but not to do much else yet:

Facebook, Google, and others like them employ people to find and take down content that contains violence or information from terrorist groups. They are much less zealous, however, in their efforts to get rid of disinformation. The plethora of different contexts in which false information flows online – everywhere from an election in India to a major sporting event in South Africa – makes it tricky for AI to operate on its own, absent human knowledge. But in the coming months and years it will take hordes of people across the world to effectively vet the massive amounts of content in the countless circumstances that will arise.

Fake News Detection: Market Opportunity?

It appears the social media platforms are not investing in this nascent market, and most of the innovation remains in academia. Will disruptive startups begin to materialize? That probably depends on savvy businesspeople figuring out who will pay for fake news detection. Will consumers be willing to pay, much as they buy antivirus software? Will innovators appear in app stores? Will fake news detection become the domain of web browsers or search engines? It would make sense for Google to invest increasingly in fake news detection to gain traction in the social media world, where it currently does not participate.
