Demystifying black-box learning models of rumor detection from social media posts

Tafannum, Faiza; Shopnil, Mir Nafis Sharear; Salsabil, Anika; Ahmed, Navid

Date

2021-09

Publisher

BRAC University

Abstract

Social media users are vulnerable to the spread of rumors, so protecting them from that spread is extremely important. This research proposes a novel approach to rumor detection in social media that uses multiple robust models: Support Vector Machine, XGBoost Classifier, Random Forest Classifier, Extra Tree Classifier, and Decision Tree Classifier. For further evaluation, we combine these five machine learning models to build our own hybrid model. We then apply two deep learning models, Long Short-Term Memory (LSTM) and Bidirectional Encoder Representations from Transformers (BERT), both of which show promising results with high accuracy. For evaluation, we use two datasets: the COVID19 Fake News Dataset, and Twitter15 and Twitter16, two publicly available datasets that we concatenate into one. The datasets contain posts from both Facebook and Twitter. We extract the textual part of the source posts as vector representations, fit them to the models to obtain predictions, and evaluate the results. These artificial intelligence algorithms are often referred to as “black boxes”: data goes in and predictions come out, but what happens inside frequently remains unclear. Although there has been much work on fake news detection, work on rumor detection lags behind, and the models used in existing work do not explain their decision-making process. With explainable AI, the opaque process inside the black box can be explained. We use LIME to explain our models’ predictions. We take the models with higher accuracy and illustrate which features of the data contribute most to a post being predicted as a rumor or a non-rumor, thus demystifying the black-box learning models. Our hybrid model achieves accuracies of 93.22% and 82.49%, LSTM achieves 99.81% and 98.41%, and BERT achieves 99.62% and 94.80%, on the COVID-19 dataset and the combined Twitter15 and Twitter16 dataset, respectively.
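
The idea of asking which words push a post toward "rumor" or "non-rumor" can be illustrated with a minimal sketch. It is not the thesis's method and it is not LIME itself: LIME fits a local surrogate model on many randomly perturbed copies of the input, whereas this sketch uses a simpler leave-one-word-out comparison. The classifier `rumor_probability`, its `TOY_WEIGHTS`, and the example post are hypothetical stand-ins invented for illustration.

```python
import math

# Hypothetical stand-in for a trained black-box classifier.
# These weights are made up for illustration; they are NOT from the thesis.
TOY_WEIGHTS = {
    "miracle": 2.0,
    "cure": 1.5,
    "shocking": 1.0,
    "officials": -1.0,
    "confirmed": -1.5,
}


def rumor_probability(text):
    """Toy probability that `text` is a rumor: logistic of summed word weights."""
    score = sum(TOY_WEIGHTS.get(word, 0.0) for word in text.lower().split())
    return 1.0 / (1.0 + math.exp(-score))


def word_contributions(text, predict):
    """Leave-one-word-out explanation (a simplification, not LIME).

    A word's contribution is the prediction on the full text minus the
    prediction on the text with that single word removed. Positive values
    mean the word pushes the prediction toward "rumor".
    """
    words = text.split()
    base = predict(text)
    contributions = []
    for i, word in enumerate(words):
        reduced = " ".join(words[:i] + words[i + 1:])
        contributions.append((word, base - predict(reduced)))
    # Most influential words first, regardless of direction.
    return sorted(contributions, key=lambda item: abs(item[1]), reverse=True)


post = "Shocking miracle cure confirmed"  # hypothetical example post
for word, delta in word_contributions(post, rumor_probability):
    print(f"{word:>10s} {delta:+.3f}")
```

Because `word_contributions` only calls `predict`, the same function could in principle wrap any model that maps text to a probability; how well such a simple occlusion score agrees with LIME's output is not something the abstract addresses.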

Keywords

Social media; Rumor; Detection; Black box; Machine learning; Deep learning; Explainable; LIME; COVID-19; Classifier

Description

Cataloged from PDF version of thesis.

Includes bibliographical references (pages 36-38).

This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2021.

Department

Department of Computer Science and Engineering, Brac University

Type

Thesis

Collections

Thesis & Report, BSc (Computer Science and Engineering) [1586]