Performance analysis of machine learning algorithms for Malware classification
Date
2022-09-29
Publisher
Brac University
Author
Bushra, Raisa Hasan
Alam, Md Taukir
Saha, Aniruddho
Fahim, Nazmus Sakib
Binty, Nabila Mourium
Abstract
Malware detection research has been popular over the years as the variations and
complexity of malware attacks are increasing daily. Using various supervised and
unsupervised machine learning algorithms to detect, identify, or classify malware
attacks has proven to be a very effective technique in recent years. Some common
and widely concerning malware attacks are Trojan, Adware, Ransomware, and
Zero-day. In this paper, we used ten ML algorithms, namely AdaBoost, Stochastic
Gradient Descent (SGD), Naïve Bayes (NB), Decision Tree (DT), Random Forest
(RF), XGBoost, Logistic Regression (LR), Multi-Layer Perceptron (MLP),
K-Nearest Neighbour (KNN), and Support Vector Machine (SVM), to classify
software-based Trojan, Ransomware, Adware, and Zero-day attacks. This research
was conducted on a dataset of 12,863 malware samples covering the malware
categories mentioned above, from which features were extracted and patterns learned.
We also compare these ML methods and analyze how well each classifies these
popular malware categories after testing every classifier on the selected
dataset. After implementation, RF achieved the highest accuracy of
86.97%, and Gaussian NB achieved the lowest accuracy of 47.84%. MLP, XGBoost,
KNN, DT, AdaBoost, SVM, LR, and SGD achieved 83.60%, 82.59%, 80.68%, 79.63%,
73.30%, 73.22%, 67.08%, and 64.40% accuracy, respectively. Beyond overall
accuracy, our analysis considered the per-class accuracy, precision, F1-score,
TPR, TNR, FPR, and FNR of each malware class for each ML classifier.
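
The per-class metrics named above (accuracy, precision, F1-score, TPR, TNR, FPR, FNR) can be illustrated with a minimal sketch. It assumes a one-vs-rest reading of a multi-class confusion matrix, which the abstract does not confirm as the authors' procedure. The function name `per_class_rates` is hypothetical, and the counts in `toy_confusion` are invented for illustration only; they are not results from the paper.

```python
from typing import Dict, List


def per_class_rates(confusion: List[List[int]],
                    labels: List[str]) -> Dict[str, Dict[str, float]]:
    """Compute one-vs-rest metrics for each class.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j (an assumed convention).
    """
    def ratio(numerator: float, denominator: float) -> float:
        # Guard against empty classes so the sketch never divides by zero.
        return numerator / denominator if denominator else 0.0

    total = sum(sum(row) for row in confusion)
    rates: Dict[str, Dict[str, float]] = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]                          # class k predicted as k
        fn = sum(confusion[k]) - tp                   # class k predicted as other
        fp = sum(row[k] for row in confusion) - tp    # other predicted as k
        tn = total - tp - fn - fp                     # everything else
        precision = ratio(tp, tp + fp)
        tpr = ratio(tp, tp + fn)
        rates[label] = {
            "accuracy": ratio(tp + tn, total),
            "precision": precision,
            "f1": ratio(2 * precision * tpr, precision + tpr),
            "tpr": tpr,
            "tnr": ratio(tn, tn + fp),
            "fpr": ratio(fp, fp + tn),
            "fnr": ratio(fn, fn + tp),
        }
    return rates


# Illustrative counts only (NOT from the paper): rows are true classes,
# columns are predicted classes, in the order given by `labels`.
labels = ["Trojan", "Adware", "Ransomware", "Zero-day"]
toy_confusion = [
    [8, 1, 1, 0],
    [2, 6, 1, 1],
    [0, 1, 9, 0],
    [1, 1, 0, 8],
]
rates = per_class_rates(toy_confusion, labels)

for name, metrics in rates.items():
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

Under this reading, TPR and FNR of a class always sum to 1, as do TNR and FPR, so reporting all four is mainly a convenience for comparing classifiers class by class.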