A study of malware classification using deep learning
Date
2023-09
Author
Rahman, Mohammad Muhibur
Ahmed, Anushua
Khan, Mutasim Husain
Jamshed, Abrar
Rahman, Md Hafijur
Abstract
Malware is an intrusive computer program engineered by cybercriminals
to destroy computer systems or to steal and manipulate sensitive data. Malware
classification is crucial to malware detection as it helps to assign malware
to a specific category according to its characteristics. Characterizing and labeling
variants of spyware is also useful because it sheds light on how they gain
access to our systems in the first place, the dangers they pose, and the
preventive measures to take against them. To tackle this serious security
issue, we developed an image-processing system intended to detect malware
faster and, where possible, to stay one step ahead of cybercriminals.
To describe and categorize the sourced malware datasets, we built
the system using several deep learning approaches and also propose
a simple CNN-based method of our own. The aim of our work is to present a
comparative study of malware types with experimental results, making it easier to
identify and keep track of existing malware while helping to detect new
samples. Specifically, we worked with four pre-trained CNN models
to diversify our methods: ResNet-50, Inception-V3,
VGG-16, and DenseNet-201. After running and testing all of the models on the
Malimg dataset, our proposed model achieved 97.64% accuracy in
classifying greyscale malware images. This test accuracy slightly
exceeded that of some of the other cutting-edge models used in our comparison study
on the dataset: Involution, Vision Transformer (ViT), Compact Convolutional Transformer
(CCT), and External Attention Network (EANet). Finally, we used
an explainable artificial intelligence (AI) technique known as LIME to
clarify the rationale behind our model’s classification
of individual samples into their respective classes.
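
The abstract refers to greyscale malware images but does not say how they are produced. The sketch below illustrates one common convention for Malimg-style images, in which each byte of a binary becomes one pixel intensity (0 to 255) and the byte stream is wrapped into rows of a fixed width. The function name `bytes_to_greyscale`, the default width of 256, and the zero-padding of the last row are assumptions made for illustration, not details taken from this work.

```python
from typing import List


def bytes_to_greyscale(data: bytes, width: int = 256) -> List[List[int]]:
    """Map each byte (0-255) to one pixel intensity and wrap rows at `width`.

    The final row is zero-padded so that every row has the same length.
    Both the width and the padding scheme are illustrative assumptions.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])  # iterating bytes yields ints
        row.extend([0] * (width - len(row)))   # pad the last, shorter row
        rows.append(row)
    return rows


# Hypothetical 10-byte "binary", wrapped at 4 pixels per row.
image = bytes_to_greyscale(bytes(range(10)), width=4)
for row in image:
    print(row)
```

The resulting 2D grid of intensities is the kind of input that would typically be resized and passed to a CNN classifier. The actual preprocessing used for the reported results may differ.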