An AI and NLP approach for detecting grooming behavior

Shanto, Hasibul Hossain; Farooqui, Farhan; Rafi, Abdullah Al; Feona, Maisha Maliha; Phul, Progya Talukder

dc.contributor.advisor	Alam, Md. Golam Rabiul
dc.contributor.advisor	Rahman, Rafeed
dc.contributor.author	Shanto, Hasibul Hossain
dc.contributor.author	Farooqui, Farhan
dc.contributor.author	Rafi, Abdullah Al
dc.contributor.author	Feona, Maisha Maliha
dc.contributor.author	Phul, Progya Talukder
dc.date.accessioned	2025-02-23T05:02:56Z
dc.date.available	2025-02-23T05:02:56Z
dc.date.copyright	2024
dc.date.issued	2024
dc.identifier.other	ID 19301217
dc.identifier.other	ID 19301230
dc.identifier.other	ID 19301213
dc.identifier.other	ID 20101339
dc.identifier.other	ID 19301170
dc.identifier.uri	http://hdl.handle.net/10361/25531
dc.description	This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2024.	en_US
dc.description	Cataloged from PDF version of thesis.
dc.description	Includes bibliographical references (pages 42-44).
dc.description.abstract	"Grooming children on social media is a dangerous side effect of the modern internet era. AI models, especially NLP models, have the potential to play a critical role in detecting grooming behavior. Although there have been past studies on building grooming detection systems, there is limited research on building such systems using modern NLP techniques. In this paper, we propose a modern sexual grooming detection system using state-of-the-art NLP models and techniques that can detect and alert users to potentially dangerous online interactions between groomers and their targets. Our detection system is a ConversationClassifier that classifies conversations as grooming or not. With over 19,000 grooming sentences collected from PervertedJustice grooming conversations, we created an annotated dataset exhibiting the grooming characteristics. Conversational data was also collected from both the PervertedJustice and PAN12 datasets. With the sentence-level annotated dataset, we trained a SentenceClassifier model based on RoBERTa & DeBERTa to accurately predict whether a sentence has grooming characteristics. The ConversationClassifier was built on top of the SentenceClassifier with LSTM & GRU to capture the sequential features in the conversation. Furthermore, a self-attention mechanism was added so that the model can focus on relevant sentences. Our models achieved promising results. The SentenceClassifier achieved an accuracy of 93% with RoBERTa and 94% with DeBERTa. We paired the RoBERTa-based SentenceClassifier with LSTM, which yielded an accuracy of 97%, and the DeBERTa-based SentenceClassifier with GRU, which yielded an accuracy of 95%."	en_US
dc.description.statementofresponsibility	Hasibul Hossain Shanto
dc.description.statementofresponsibility	Farhan Farooqui
dc.description.statementofresponsibility	Abdullah Al Rafi
dc.description.statementofresponsibility	Maisha Maliha Feona
dc.description.statementofresponsibility	Progya Talukder Phul
dc.format.extent	51 pages
dc.language.iso	en	en_US
dc.publisher	BRAC University	en_US
dc.rights	BRAC University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission.
dc.subject	Grooming	en_US
dc.subject	Kids	en_US
dc.subject	Pedophiles	en_US
dc.subject	RoBERTa	en_US
dc.subject	Predators	en_US
dc.subject	Online	en_US
dc.subject	AI	en_US
dc.subject	NLP	en_US
dc.subject	Classification	en_US
dc.subject	GRU	en_US
dc.subject	LSTM	en_US
dc.subject	DeBERTa	en_US
dc.subject.lcsh	Artificial intelligence
dc.subject.lcsh	Natural language processing (Computer science)
dc.title	An AI and NLP approach for detecting grooming behavior	en_US
dc.type	Thesis	en_US
dc.contributor.department	Department of Computer Science and Engineering, BRAC University
dc.description.degree	B.Sc. in Computer Science and Engineering

Files in this item

Name: 19301217_24141330_19301213_201 ...
Size: 840.3Kb
Format: PDF

This item appears in the following Collection(s)

Thesis & Report, BSc (Computer Science and Engineering) [1583]
