Fake news can’t fool new algorithm

A University of California, Riverside, computer scientist has received reinforcements in his battle against fake news.

Snap Research, the research division of Snap, Inc., has made a $7,000 donation for Evangelos Papalexakis, an assistant professor of computer science and engineering in the Bourns College of Engineering, to continue improving an algorithm that can already detect fake news stories with 75 percent accuracy.

The gift formalizes an ongoing project between Papalexakis’ Multi-Aspect Data Lab and Snap Research scientist Neil Shah to create an automated algorithm that sorts news stories into categories based on clusters of words and contextual information and flags them as potentially fake news. The algorithm could be used by social media platforms to help users make more informed decisions about the news they click on and share.

Most attempts to automate fake news detection rely on locating particular words, identifying URLs, or consulting fact-checking websites like Snopes. All require human input and evaluation.

Most research to date has focused on carefully handcrafted features that predict an article’s legitimacy. The methods require specialists to extract those features and depend on a large library of examples already labeled as fake news.

Papalexakis and Shah start with the hypothesis that news articles appearing frequently near each other across a wide variety of contexts are more likely to belong to the same category.

Papalexakis’ group developed a two-tiered algorithm using a method called “tensor decomposition” that exploits articles’ structure to avoid reliance on human expertise. Tensors are multi-dimensional cubes. They excel at modeling and analyzing data with many different components, called multi-aspect data.

For instance, in online social networks, people interact with one another in a variety of ways: they message each other, they post on one another’s pages, and so on. All these interactions are parts of the same social network, and can be modeled as a tensor “data cube” composed of person, person, and means of interaction.
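As a rough illustration of that "data cube," the minimal Python sketch below counts interactions in a three-way array indexed by sender, receiver, and means of interaction. The names, channels, and the `build_interaction_tensor` helper are made up for this example and do not come from the researchers' work.

```python
import numpy as np

# Hypothetical toy data: (sender, receiver, means of interaction).
people = ["ana", "ben", "cy"]
channels = ["message", "wall_post"]
interactions = [
    ("ana", "ben", "message"),
    ("ana", "ben", "message"),
    ("ben", "cy", "wall_post"),
    ("cy", "ana", "message"),
]

def build_interaction_tensor(interactions, people, channels):
    """Count interactions in a person x person x channel array."""
    person_index = {name: i for i, name in enumerate(people)}
    channel_index = {name: i for i, name in enumerate(channels)}
    tensor = np.zeros((len(people), len(people), len(channels)), dtype=int)
    for sender, receiver, channel in interactions:
        tensor[person_index[sender], person_index[receiver], channel_index[channel]] += 1
    return tensor

tensor = build_interaction_tensor(interactions, people, channels)
print(tensor.shape)     # one axis per "aspect" of the data
print(tensor[:, :, 0])  # the slice holding only direct messages
```

Each slice along the third axis is an ordinary person-by-person table for one kind of interaction; stacking the slices is what turns the tables into a tensor.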

The researchers use a tensor to model the content of the article and map words spatially within the article. For each article, they count how many times two particular words occur within a window of five to 10 words of each other.
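The counting step can be sketched in a few lines of Python. This is only an illustration under simple assumptions: words are split on whitespace, the window is measured in tokens, and the result is stored as a word-by-word-by-article array. The `cooccurrence_tensor` function and the two sample sentences are hypothetical.

```python
import numpy as np

def cooccurrence_tensor(articles, window=5):
    """Build a word x word x article array of windowed co-occurrence counts."""
    vocab = sorted({word for text in articles for word in text.lower().split()})
    index = {word: i for i, word in enumerate(vocab)}
    tensor = np.zeros((len(vocab), len(vocab), len(articles)))
    for a, text in enumerate(articles):
        tokens = text.lower().split()
        for i, word in enumerate(tokens):
            # Pair each token with the tokens that follow it inside the window.
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                u, v = index[word], index[tokens[j]]
                tensor[u, v, a] += 1
                if u != v:
                    tensor[v, u, a] += 1  # keep each article's slice symmetric
    return tensor, vocab

articles = [
    "aliens built the pyramids",
    "the council approved the budget",
]
tensor, vocab = cooccurrence_tensor(articles, window=2)
print(vocab)
print(tensor.shape)
```

Every article contributes one word-by-word slice, so articles that use words in similar combinations end up with similar slices.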

Tensor decomposition uncovers patterns by breaking the tensor into elementary pieces of data, each one representing a pattern, or topic. The group’s previous work showed that these topics successfully cluster misinforming articles.
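One common form of tensor decomposition is the CP (CANDECOMP/PARAFAC) model, which writes a three-way array as a sum of a few rank-one pieces. The article does not say which variant the group uses, so the sketch below is only a generic illustration: a plain alternating-least-squares fit in NumPy, with hypothetical function names and a toy tensor built from two planted patterns.

```python
import numpy as np

def khatri_rao(a, b):
    """Column-wise Kronecker product of two factor matrices."""
    rank = a.shape[1]
    return (a[:, None, :] * b[None, :, :]).reshape(-1, rank)

def unfold(tensor, mode):
    """Flatten a three-way array into a matrix whose rows follow `mode`."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def cp_als(tensor, rank, n_iter=200, seed=0):
    """Fit a rank-`rank` CP model to a three-way array by alternating least squares."""
    rng = np.random.default_rng(seed)
    factors = [rng.random((dim, rank)) for dim in tensor.shape]
    for _ in range(n_iter):
        for mode in range(3):
            first, second = [factors[m] for m in range(3) if m != mode]
            gram = (first.T @ first) * (second.T @ second)
            # Least-squares update of one factor with the other two held fixed.
            factors[mode] = unfold(tensor, mode) @ khatri_rao(first, second) @ np.linalg.pinv(gram)
    return factors

# Toy tensor with two planted patterns ("topics") of different strength.
topic_a = np.einsum("i,j,k->ijk", np.array([1.0, 1, 0, 0]), np.array([1.0, 1, 0, 0]), np.array([1.0, 1, 0]))
topic_b = np.einsum("i,j,k->ijk", np.array([0.0, 0, 1, 1]), np.array([0.0, 0, 1, 1]), np.array([0.0, 0, 1]))
tensor = 3 * topic_a + topic_b

factors = cp_als(tensor, rank=2)
approx = np.einsum("ir,jr,kr->ijk", *factors)
print(np.linalg.norm(tensor - approx) / np.linalg.norm(tensor))  # relative fit error
print(factors[2])  # one row per "article": its weight on each pattern
```

In this picture, each column of the factor matrices is one pattern, and the rows of the third factor give every article a short list of weights that can serve as its coordinates in a compact space.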

In the first tier of the algorithm, tensor decomposition represents the data compactly in a space that brings possibly fake articles close together. The second tier builds a graph that connects two articles if they are close to each other in that space.
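The article does not spell out how "close" is decided. One simple possibility, assumed here purely for illustration, is to link each article to its k nearest neighbors by Euclidean distance. The `knn_graph` function and the toy coordinates below are hypothetical.

```python
import numpy as np

def knn_graph(embeddings, k=2):
    """Return a symmetric 0/1 adjacency matrix linking each point to its k nearest neighbors."""
    diffs = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.linalg.norm(diffs, axis=2)
    np.fill_diagonal(dist, np.inf)  # an article is never its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]
    n = len(embeddings)
    adjacency = np.zeros((n, n), dtype=int)
    adjacency[np.repeat(np.arange(n), k), neighbors.ravel()] = 1
    return np.maximum(adjacency, adjacency.T)  # make every link two-way

# Toy coordinates: two tight groups of three articles each.
embeddings = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
])
adjacency = knn_graph(embeddings, k=2)
print(adjacency)
```

With well-separated groups like these, links form only inside each group, which is the property the later labeling step relies on.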

Next, “semi-supervised” machine learning is applied to the resulting graph of connected articles. The method requires a small base of articles labeled by people, from which it learns and sorts the other articles, but it needs far fewer human-annotated articles than current methods.
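To show the general idea of learning from a handful of labels on a graph, here is a minimal label-propagation sketch. It is a generic stand-in and not necessarily the method the team uses: labeled articles keep their labels, and every unlabeled article repeatedly takes the average score of its neighbors. The `propagate_labels` function, the +1/-1/0 encoding, and the toy graph are assumptions made for this example.

```python
import numpy as np

def propagate_labels(adjacency, labels, n_iter=50):
    """Spread known labels over a graph.

    labels: +1 for known fake, -1 for known real, 0 for unlabeled.
    Returns one score per article; the sign is the predicted class.
    """
    labels = np.asarray(labels, dtype=float)
    known = labels != 0
    degree = adjacency.sum(axis=1, keepdims=True)
    transition = adjacency / np.maximum(degree, 1)  # average over neighbors
    scores = labels.copy()
    for _ in range(n_iter):
        scores = transition @ scores
        scores[known] = labels[known]  # human-provided labels stay fixed
    return scores

# Toy graph: two separate triangles, with one labeled article in each.
adjacency = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
])
labels = [1, 0, 0, -1, 0, 0]
scores = propagate_labels(adjacency, labels)
print(np.sign(scores))
```

Only two of the six articles carry a human label here, yet every article ends up with a score, which is the sense in which the approach needs few annotated examples. An article with no path to any labeled one keeps a score of zero and would need separate handling.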

The team members put three sets of articles (two public datasets and their own collection of 63,000 news articles) through their algorithm and found that it accurately sorted articles into fake news categories 75 percent of the time. The result compares favorably to approaches that require a large number of human-labeled articles.

The gift from Snap Research enhances the team’s efforts to develop more robust and eventually fully automated techniques for identifying misinformation.

Social media companies could use the finished algorithm to filter misinformation out of user newsfeeds. Papalexakis would prefer that social media platforms flag articles rather than omit them, so that users can make more informed decisions about which articles to read and share.

Tensor decomposition is a major research thrust at the Multi-Aspect Data Lab, which has received funding from Naval Sea Systems Command, Naval Engineering Education Consortium, the National Science Foundation, and Adobe.

In addition to Papalexakis and Shah, UC Riverside computer science master’s student Gisel Bastidas Guacho and doctoral student Sara Abdali are working on the project. The group’s latest paper, “Semi-supervised Content-based Detection of Misinformation via Tensor Embeddings,” is available on the arXiv.org pre-print server and will appear at the 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining in August this year, in Barcelona, Spain.