In February, the Justice Department charged 13 Russians with stealing U.S. citizens’ identities and spreading “fake news” with intent to subvert the last U.S. presidential election. The case is still unfolding, and may do so for years. In the meantime, UCR researchers have built a tech-based solution to the dissemination of malicious misinformation.
UCR’s Multi-Aspect Data Lab, led by Evangelos E. Papalexakis, assistant professor in the Department of Computer Science and Engineering, is developing novel data science techniques to address a variety of problems in social network analysis, with funding from the Naval Sea Systems Command, the Naval Engineering Education Consortium, the National Science Foundation, and Adobe.
The researchers are building algorithms to discern patterns that indicate “fake news.” Through extrapolation, and commands inserted into publishers’ content management systems, these items can then be removed before they go live and cause havoc. Crucially, the UCR computation can record the “footprint” of such posts to support prosecutions.
Papalexakis’ latest academic paper on this work, “Unsupervised Content-Based Identification of Fake News Articles with Tensor Decomposition Ensembles,” co-written with graduate research assistant Seyed Mehdi Hosseini Motlagh, was presented at the recent MIS2: Misinformation and Misbehavior Mining on the Web workshop, part of WSDM 2018 (11th ACM International Conference on Web Search and Data Mining), where it won the best paper award.
“Previous studies have provided useful insights on the propagation of an article in a social network. However, detection based solely on this poses the risk of a fake news article ‘infecting’ a number of social media users before it is detected,” Papalexakis said. “Instead, our work aims at the early detection of such articles, especially in cases where we have no external knowledge regarding the validity and veracity of any article.”
Human network monitoring relies on a combination of common sense and experience to know whether something is legitimate. For example, moderators check whether the headline is in ALL CAPS (digi-culture code for “shouting”), watch for the use of well-known hate crime language keywords, and look for a lack of verified sources for spurious claims.
But how do you teach a computer that these triangulated attributes often indicate “fake news”?
Machine-based comprehension relies purely on mathematical concepts, so Papalexakis and his researchers use what is called “Multi-Aspect Data.” Simply put, picture a social grouping in which everyone inside the interaction has many ways to connect (e.g., phone, text, video, instant message, social media posts). The Multi-Aspect Data Lab then records, examines, categorizes and models all these inputs, using what are known as “tensor decompositions.” A “tensor” in data science means a multidimensional structure, like a cube. All the multi-aspects are digitally captured as multidimensional cubes so the system can investigate and “comprehend” what’s really going on, and whether the news is fake or not.
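To make the “cube” idea concrete, here is a minimal sketch in Python with NumPy. It is an illustration only, not the lab’s published algorithm: the toy data (six people, three channels), the function names and the choice of a plain CP decomposition fitted by alternating least squares are all assumptions made for this example.

```python
import numpy as np


def khatri_rao(a, b):
    """Column-wise Kronecker product of a (I x R) and b (J x R), giving (I*J x R)."""
    rank = a.shape[1]
    return np.einsum("ir,jr->ijr", a, b).reshape(-1, rank)


def unfold(tensor, mode):
    """Flatten a tensor into a matrix whose rows index the chosen mode."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def cp_als(tensor, rank, n_iter=200, seed=0):
    """Approximate a 3-way tensor as a sum of `rank` rank-one cubes (CP model),
    updating one factor matrix at a time by least squares."""
    rng = np.random.default_rng(seed)
    factors = [rng.random((dim, rank)) for dim in tensor.shape]
    for _ in range(n_iter):
        for mode in range(3):
            others = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(others[0], others[1])
            gram = (others[0].T @ others[0]) * (others[1].T @ others[1])
            factors[mode] = unfold(tensor, mode) @ kr @ np.linalg.pinv(gram)
    return factors


# Hypothetical toy data: entry [i, j, c] is 1 if person i contacts person j
# over channel c (0 = phone, 1 = text, 2 = video).
people, channels = 6, 3
tensor = np.zeros((people, people, channels))
tensor[:3, :3, :2] = 1.0  # people 0-2 reach each other by phone and text
tensor[3:, 3:, 2] = 1.0   # people 3-5 reach each other by video

factors = cp_als(tensor, rank=2)
approx = np.einsum("ir,jr,kr->ijk", *factors)
rel_error = np.linalg.norm(tensor - approx) / np.linalg.norm(tensor)
print(f"relative reconstruction error: {rel_error:.4f}")
```

In a sketch like this, each of the two components is meant to pick out one group of people together with the channels that group uses. That is the kind of hidden pattern a tensor decomposition can surface without being told in advance which groups exist.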
“The tensor decomposition techniques we develop are able to capture nuanced patterns that successfully identify different categories of fake news, without using any external knowledge about the validity of any particular article,” Papalexakis said.
By leveraging the diversity of all data aspects, the UCR system provides a more accurate result than earlier published research in this field. In their paper, the authors describe how they construct their algorithm and report the results of multiple experiments, which show that the proposed algorithm identified up to 80 percent of fake news.
Industry has taken note. Papalexakis said he’s actively pursuing collaborations with tech giants.