Comparative study of toxic comments classification using machine learning algorithms

Razzak, Razia; Sadril, Md.; Shakil, Mahmudul Hasan; Rahman, Mahfuzur; Taki, Sabiha Tul Omman

dc.contributor.advisor	Chakrabarty, Amitabha
dc.contributor.author	Razzak, Razia
dc.contributor.author	Sadril, Md.
dc.contributor.author	Shakil, Mahmudul Hasan
dc.contributor.author	Rahman, Mahfuzur
dc.contributor.author	Taki, Sabiha Tul Omman
dc.date.accessioned	2021-07-15T06:18:46Z
dc.date.available	2021-07-15T06:18:46Z
dc.date.copyright	2021
dc.date.issued	2021-01
dc.identifier.other	ID: 16101291
dc.identifier.other	ID: 16301032
dc.identifier.other	ID: 16301026
dc.identifier.other	ID: 16101206
dc.identifier.other	ID: 17101519
dc.identifier.uri	http://hdl.handle.net/10361/14810
dc.description	This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2021.	en_US
dc.description	Cataloged from PDF version of thesis.
dc.description	Includes bibliographical references (pages 54-56).
dc.description.abstract	Recent years have seen rapid growth in information technology and a disruptive transformation of social media. Websites such as Facebook, Twitter, and Instagram, where people can express their thoughts or feelings by posting text, photos, or videos, have become incredibly popular. Unfortunately, they have also become places for hateful activity, abusive language, cyberbullying, and anonymous threats. Many existing works address this problem, but they do not yet achieve a satisfactory level of accuracy. In this work, we employ natural language processing (NLP) with convolutional neural networks (CNN), extreme gradient boosting (XGBoost), and support vector machines (SVM), first to segment toxic comments from a large pool of documents provided by Kaggle, drawn from Wikipedia’s talk page edits, and then to classify them into six types. On this dataset, the Hamming score is 89% for the CNN model, 87% for the XGBoost model, and 84% for the SVM model.	en_US
dc.description.statementofresponsibility	Razia Razzak
dc.description.statementofresponsibility	Md. Sadril
dc.description.statementofresponsibility	Mahmudul Hasan Shakil
dc.description.statementofresponsibility	Mahfuzur Rahman
dc.description.statementofresponsibility	Sabiha Tul Omman Taki
dc.format.extent	56 Pages
dc.language.iso	en_US	en_US
dc.publisher	Brac University	en_US
dc.rights	Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission.
dc.subject	Cyberbullying	en_US
dc.subject	Natural Language Processing	en_US
dc.subject	Word Embedding	en_US
dc.subject	Convolutional Neural Networks	en_US
dc.subject	XGBoost	en_US
dc.subject	Support Vector Machine	en_US
dc.title	Comparative study of toxic comments classification using machine learning algorithms	en_US
dc.type	Thesis	en_US
dc.contributor.department	Department of Computer Science and Engineering, Brac University
dc.description.degree	B. Computer Science
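
The abstract reports a Hamming score for each model but this record does not define the metric. The sketch below assumes the common definition for multi-label classification, namely the fraction of individual label slots predicted correctly (equivalently, 1 minus the Hamming loss). The thesis may use a different variant. The function name `hamming_score` and the toy label matrices are hypothetical and are not taken from the thesis or its dataset.

```python
from typing import Sequence


def hamming_score(y_true: Sequence[Sequence[int]],
                  y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of label slots predicted correctly (assumed: 1 - Hamming loss)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same number of rows")

    total = 0    # number of (comment, label) slots seen
    correct = 0  # number of slots where prediction equals ground truth
    for true_row, pred_row in zip(y_true, y_pred):
        if len(true_row) != len(pred_row):
            raise ValueError("each row must have the same number of labels")
        total += len(true_row)
        correct += sum(1 for t, p in zip(true_row, pred_row) if t == p)

    if total == 0:
        raise ValueError("no labels to score")
    return correct / total


# Toy data (made up): three comments, six binary labels each,
# mirroring the six-type setup described in the abstract.
y_true = [
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 0],
]
y_pred = [
    [1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],  # one false positive
    [1, 0, 0, 0, 1, 0],  # one missed label
]

score = hamming_score(y_true, y_pred)
print(f"Hamming score: {score:.3f}")
```

Under this definition a partially correct prediction still earns credit for every label it gets right, which is why the metric suits a six-label task better than exact-match accuracy.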

Files in this item

Name: 16101291, 16301032, 16301026, ...
Size: 1.951Mb
Format: PDF

This item appears in the following Collection(s)

Thesis & Report, BSc (Computer Science and Engineering) [1480]
