Performance analysis of machine learning algorithms for Malware classification

Bushra, Raisa Hasan; Alam, Md Taukir; Saha, Aniruddho; Fahim, Nazmus Sakib; Binty, Nabila Mourium

dc.contributor.advisor	Chakrabarty, Amitabha
dc.contributor.advisor	Rodoshi, Ahanaf Hassan
dc.contributor.author	Bushra, Raisa Hasan
dc.contributor.author	Alam, Md Taukir
dc.contributor.author	Saha, Aniruddho
dc.contributor.author	Fahim, Nazmus Sakib
dc.contributor.author	Binty, Nabila Mourium
dc.date.accessioned	2023-10-15T10:39:29Z
dc.date.available	2023-10-15T10:39:29Z
dc.date.copyright	©2022
dc.date.issued	2022-09-29
dc.identifier.other	ID 18301064
dc.identifier.other	ID 18301277
dc.identifier.other	ID 18201117
dc.identifier.other	ID 18201166
dc.identifier.other	ID 19101082
dc.identifier.uri	http://hdl.handle.net/10361/21825
dc.description	This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2022.	en_US
dc.description	Cataloged from PDF version of thesis.
dc.description	Includes bibliographical references (pages 32-36).
dc.description.abstract	Malware detection research has remained popular over the years as malware attacks grow daily in variety and complexity. Supervised and unsupervised machine learning algorithms have proven effective in recent years for detecting, identifying, and classifying malware attacks. Common malware attacks of wide concern include Trojan, Adware, Ransomware, and Zero-day. In this paper, we used ten ML algorithms, namely AdaBoost, Stochastic Gradient Descent (SGD), Naïve Bayes (NB), Decision Tree (DT), Random Forest (RF), XGBoost, Logistic Regression (LR), Multi-Layer Perceptron (MLP), K-Nearest Neighbour (KNN), and Support Vector Machine (SVM), to classify software-based Trojan, Ransomware, Adware, and Zero-day attacks. The research was conducted on a dataset of 12863 malware samples covering the categories mentioned above, from which features were extracted and patterns learned. We also compared and analyzed how these ML methods classify these popular malware types after testing each classifier on the selected dataset. RF achieved the highest accuracy, 86.97%, and Gaussian NB the lowest, 47.84%. MLP, XGBoost, KNN, DT, AdaBoost, SVM, LR, and SGD achieved 83.60%, 82.59%, 80.68%, 79.63%, 73.30%, 73.22%, 67.08%, and 64.40% accuracy, respectively. Beyond overall accuracy, our analysis covered per-class accuracy, precision, F1-score, TPR, TNR, FPR, and FNR for each ML classifier.	en_US
dc.description.statementofresponsibility	Raisa Hasan Bushra
dc.description.statementofresponsibility	Md Taukir Alam
dc.description.statementofresponsibility	Aniruddho Saha
dc.description.statementofresponsibility	Nazmus Sakib Fahim
dc.description.statementofresponsibility	Nabila Mourium Binty
dc.format.extent	47 pages
dc.language.iso	en	en_US
dc.publisher	Brac University	en_US
dc.rights	Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission.
dc.subject	Machine learning	en_US
dc.subject	Trojan	en_US
dc.subject	Adware	en_US
dc.subject	Ransomware	en_US
dc.subject	Classification	en_US
dc.subject	Malware	en_US
dc.subject	Zero-day	en_US
dc.subject	Naïve Bayes	en_US
dc.subject	Stochastic gradient descent	en_US
dc.subject	Random forest	en_US
dc.subject	Decision tree	en_US
dc.subject	AdaBoost	en_US
dc.subject	XGBoost	en_US
dc.subject	Logistic regression	en_US
dc.subject	Multi-layer perceptron	en_US
dc.subject	K-nearest neighbour	en_US
dc.subject	Support vector machine	en_US
dc.subject.lcsh	Regression analysis
dc.subject.lcsh	Computer algorithms
dc.title	Performance analysis of machine learning algorithms for Malware classification	en_US
dc.type	Thesis	en_US
dc.contributor.department	Department of Computer Science and Engineering, Brac University
dc.description.degree	B.Sc. in Computer Science
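
The abstract reports per-class TPR, TNR, FPR, and FNR for each classifier. As a minimal sketch of how such one-vs-rest rates can be derived from true and predicted labels, the following uses only the Python standard library. The function name `per_class_rates` and the toy labels are illustrative assumptions and are not taken from the thesis.

```python
from collections import Counter


def per_class_rates(y_true, y_pred):
    """Compute one-vs-rest TPR, TNR, FPR and FNR for every class label.

    Hypothetical helper for illustration; not the thesis authors' code.
    """
    labels = sorted(set(y_true) | set(y_pred))
    pairs = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    total = len(y_true)
    rates = {}
    for c in labels:
        tp = pairs[(c, c)]
        fn = sum(n for (t, p), n in pairs.items() if t == c and p != c)
        fp = sum(n for (t, p), n in pairs.items() if t != c and p == c)
        tn = total - tp - fn - fp
        rates[c] = {
            "TPR": tp / (tp + fn) if tp + fn else 0.0,
            "TNR": tn / (tn + fp) if tn + fp else 0.0,
            "FPR": fp / (fp + tn) if fp + tn else 0.0,
            "FNR": fn / (fn + tp) if fn + tp else 0.0,
        }
    return rates


# Toy labels (assumed data, for demonstration only).
y_true = ["Trojan", "Trojan", "Adware", "Adware", "Ransomware", "Zero-day"]
y_pred = ["Trojan", "Adware", "Adware", "Adware", "Ransomware", "Trojan"]

rates = per_class_rates(y_true, y_pred)
for label, r in rates.items():
    print(label, r)
```

For each class, TPR and FNR sum to 1, as do TNR and FPR, so reporting all four is a convenience rather than four independent measurements.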

Files in this item

Name: 18301064, 18301277, 18201117, ...
Size: 11.61Mb
Format: PDF

This item appears in the following Collection(s)

Thesis & Report, BSc (Computer Science and Engineering) [1589]
