Identifying hate speech of Bangla language text using natural language processing
Date
2024-01
Publisher
Brac University
Author
Rahman, Mushfiqur
Jui, Razia Sultana
Sakib, Chowdhury Nazmuz
Ridoy, Fahim Alavi
Ananya, Taskiea Tabassum
Abstract
In this era of the internet, sharing information through social media has provided
significant benefits to people. People can easily observe others' lifestyles
and work, and can comment on them or share their thoughts about them. However, this
practice also brings challenges, such as the spread of hate comments, abusive online
criticism, and toxic behavior in general. The internet's flexibility
and anonymity have created a culture in which users find it easy to express themselves
aggressively. As the amount of hate speech increases, there
is a need for methods that detect it automatically. To address this concern,
recent research has used diverse feature engineering methods and machine
learning algorithms to identify hate speech messages automatically across various
datasets. Since the task falls within Natural Language Processing (NLP), our goal is to use
NLP to detect hate speech and to demonstrate how machine learning (ML) and deep learning can
be applied to it. Of the more than 7,100 languages spoken throughout
the world, we have chosen Bengali as the language of our dataset. Using
machine learning and deep learning, we train models
to detect hate speech automatically. We use Multinomial Naive Bayes,
RNN, Random Forest, Logistic Regression, Decision Tree Classifier, a CNN-LSTM
hybrid, and Multilingual Bidirectional Encoder Representations from Transformers (mBERT),
and compare their results to find the most accurate approach. Among all the
above algorithms, mBERT achieved the highest accuracy for binary
classification, at 90.00%. For multiclass classification, the CNN-LSTM
hybrid achieved the highest accuracy, at 64%,
followed by mBERT at 62%. We are committed to further improving
these results.
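The abstract lists Multinomial Naive Bayes among the classical baselines. The following is a minimal from-scratch sketch of that classifier, assuming whitespace tokenization, raw word counts as features, and add-alpha (Laplace) smoothing; the thesis's actual preprocessing, features, and dataset are not described here. The class name `MultinomialNB` (hand-written, not scikit-learn's), the helper `accuracy`, and the English toy sentences are hypothetical stand-ins for Bangla training data. Because each label gets its own prior and word distribution, the same code covers both the binary and the multiclass setting.

```python
import math
from collections import Counter


class MultinomialNB:
    """Hypothetical hand-written Multinomial Naive Bayes with add-alpha smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # smoothing constant (assumed value, not from the thesis)

    def fit(self, docs, labels):
        # Works for any number of labels: binary or multiclass.
        self.classes = sorted(set(labels))
        self.vocab = set()
        self.word_counts = {c: Counter() for c in self.classes}
        class_docs = Counter(labels)
        for doc, label in zip(docs, labels):
            tokens = doc.split()  # assumption: simple whitespace tokenization
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        n = len(docs)
        # log P(class) estimated from class frequencies
        self.log_prior = {c: math.log(class_docs[c] / n) for c in self.classes}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def _log_likelihood(self, token, c):
        # log P(token | class) with add-alpha smoothing over the vocabulary
        v = len(self.vocab)
        count = self.word_counts[c][token]
        return math.log((count + self.alpha) / (self.total_words[c] + self.alpha * v))

    def predict_one(self, doc):
        tokens = [t for t in doc.split() if t in self.vocab]  # ignore unseen tokens
        scores = {
            c: self.log_prior[c] + sum(self._log_likelihood(t, c) for t in tokens)
            for c in self.classes
        }
        return max(scores, key=scores.get)

    def predict(self, docs):
        return [self.predict_one(d) for d in docs]


def accuracy(y_true, y_pred):
    # Fraction of exact label matches; the same metric for binary and multiclass.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy, hypothetical training data (English stand-ins for Bangla comments).
train_docs = ["you are stupid and useless", "go away idiot",
              "have a nice day", "thank you for sharing"]
train_labels = ["hate", "hate", "not_hate", "not_hate"]

model = MultinomialNB().fit(train_docs, train_labels)
preds = model.predict(["stupid idiot", "nice day"])
print(preds, accuracy(["hate", "not_hate"], preds))  # ['hate', 'not_hate'] 1.0
```

In practice, a library implementation such as scikit-learn's `MultinomialNB` combined with a text vectorizer would typically replace this hand-written version; the sketch only shows what the model computes.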
Keywords
Bangla language; Natural language processing; Machine learning; Deep learning; Offensive language
LC Subject Headings
Natural language processing (Computer science); Automatic speech recognition; Deep learning (Machine learning)
Description
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2024.
Department
Department of Computer Science and Engineering, Brac University
Type
Thesis
Related items
Showing items related by title, author, creator and subject.
- Survey of Afghan (Dari) Language NLP for Building Afghan NLIDB System
  Karimi, Sadullah (Brac University, 2022-09). Technology adoption is extremely limited in Afghanistan, especially since people have limited access to the Internet, smartphone, and computer due to power limitations and the high cost of the Internet. The people in ...
- Indigenous languages in Bangladesh & bilingualism: a qualitative study
  Sumayra, Nusrat (Brac University, 2023-06). Bangladesh is a country of various cultures. People from different cultures reside in Bangladesh. Language is one of the integral parts of culture. Without language a person would be incomplete. There are around forty-two ...
- Reasons behind using L1 at primary level in English classes of Bangladeshi English medium schools
  Turin, Tanjia Afrin (BRAC University, 2014-12). One of the on-going debates among language teachers especially primary level is whether or not to use students’ First language (L1) in foreign language classrooms. In Bangladesh most of the teachers have been using Bangla ...