Hate speech detection from social networking posts using CNN and XGBoost
Abstract
The rapid growth of social networks and microblogging websites has enabled
people from different backgrounds and with diverse moral codes to communicate with
each other easily. While social media promotes communication and the sharing of
information, it is also used to initiate heinous and negative campaigns. Although
social networks discourage such acts, people often use these platforms
to propagate offensive and hateful content targeting individuals or specific groups.
Therefore, detecting hate speech has become a serious issue that needs considerable
attention. The goal of this research is to detect such campaigns of hate. In this paper, two
different approaches are proposed for detecting hateful and offensive language
on social platforms. The paper proposes natural language processing with a CNN
architecture and an XGBoost classifier, which are effective for capturing
the context and the semantics of hate speech. The proposed classifiers distinguish
hate speech from neutral text and can achieve a higher quality of classification than
current state-of-the-art algorithms. Using the CNN, the accuracy obtained
in detecting whether a tweet is offensive or neutral is 89.18%, and on another dataset
containing hateful, offensive, and neutral comments, the accuracy is 84.74%. The
latter approach, using the XGBoost classifier, achieves accuracies of 93.10% and
80.51%, respectively, on the same two datasets. In addition, 2333 tweets were collected
from Twitter and labelled by annotators. On that dataset, the CNN model achieves an
accuracy of 76.70% and XGBoost achieves 78.14%.