Interpretable Bangla fake news classification using BERT and traditional machine learning approaches
Abstract
Fake news is inaccurate or misleading content, usually published with the
intention of damaging the reputation of a person or an organization. It has
recently grown significantly in online forums and on social media platforms
such as Facebook, Reddit, and Twitter. People are often persuaded by its
falsified statements, which has serious consequences in the real world.
As a result, interest in fake news identification is growing. However, most
fake news identification studies address the English language, and only a few
address Bangla. In this study, we propose
a BERT-based system that uses stratified K-fold cross-validation and achieves
98.45% test accuracy, whereas Random Forest, the strongest of the traditional
machine learning models, reaches only 86.83% accuracy. Furthermore, we use Local
Interpretable Model-Agnostic Explanations (LIME) to make the system's predictions explainable.
We use the existing BanFakeNews dataset to identify Bangla fake news. The
primary goal of this paper is to develop a natural language processing model
that recognizes fake news, reducing the time individuals need to identify
fake news on social media.
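
The stratified K-fold splitting mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the helper name `stratified_kfold`, the number of folds, and the toy labels (1 = fake, 0 = authentic) are assumptions made for illustration only. In practice a library routine would normally be used instead.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs whose class ratios mirror the full data.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Deal each class's shuffled samples round-robin across the k folds,
    # so every fold receives its proportional share of each class.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    # Each fold serves once as the test set; the rest form the training set.
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train_idx, test_idx


# Toy, made-up labels: 10 fake (1) and 40 authentic (0) articles.
labels = [1] * 10 + [0] * 40
splits = list(stratified_kfold(labels, k=5))
for train_idx, test_idx in splits:
    fake_in_test = sum(labels[i] for i in test_idx)
    print(len(train_idx), len(test_idx), fake_in_test)
```

With these toy labels, every test fold holds 10 articles, 2 of them fake, so each fold keeps the 20% fake ratio of the full set. This matters for an imbalanced dataset, where plain random splitting could leave a fold with very few fake articles.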