Cyberbullying and toxic language detection on social media for Bangla language
Abstract
As more people use social media, toxic language and cyberbullying are becoming more
common, particularly within the Bengali-speaking community. The complexity of
Bangla text data makes it difficult for traditional natural language processing (NLP)
algorithms to identify harmful content. This study proposes a machine learning-based
solution that recognizes and categorizes harmful language and cyberbullying
in Bangla text on social media, leveraging BanglaBERT’s advanced features.
The proposed machine
learning-based solution achieved 94% testing accuracy in detecting and categorizing
cyberbullying and offensive language on digital platforms that support Bengali.