A study of malware classification using deep learning

Rahman, Mohammad Muhibur; Ahmed, Anushua; Khan, Mutasim Husain; Jamshed, Abrar; Rahman, Md Hafijur

Date

2023-09

Abstract

Malware is intrusive software engineered by cybercriminals to damage computer systems or to steal and manipulate sensitive data. Malware classification is crucial to malware detection because it assigns each sample to a specific category according to its characteristics. Characterizing and labeling variants of spyware is also useful because it sheds light on how they gain access to systems in the first place, the dangers they pose, and the preventive measures to take against them. To tackle this security problem, we developed an image-processing system intended to detect malware faster and, possibly, to stay one step ahead of cybercriminals. To describe and categorize sourced malware datasets, we built the system using several deep learning approaches and also propose a simple CNN-based methodology of our own. The aim of our work is to present a comparative study of malware types with experimental results, making it easier to identify and keep track of existing malware while helping to detect new samples. Specifically, we worked with four pre-trained CNN models to diversify our methods: ResNet-50, Inception-V3, VGG-16, and DenseNet-201. After training and testing all of the models on the Malimg dataset, our proposed model achieved 97.64% accuracy on greyscale malware images. This test accuracy also slightly exceeded that of some of the other state-of-the-art models in our comparison study on the same dataset: Involution, Vision Transformer (ViT), Compact Convolutional Transformer (CCT), and External Attention Network (EANet).
Finally, we used LIME, an explainable artificial intelligence (AI) technique, to clarify the rationale behind our model’s classification of individual samples into their respective classes.
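
The abstract refers to greyscale malware images but does not say how they are produced. In datasets such as Malimg, this is commonly done by reading each byte of a binary as one 8-bit pixel and wrapping the byte stream into rows of a fixed width. The sketch below illustrates that idea only. The function name `bytes_to_grayscale`, the default width of 256, and the zero-padding of the last row are assumptions made for this example and are not taken from the thesis.

```python
from typing import List


def bytes_to_grayscale(data: bytes, width: int = 256) -> List[List[int]]:
    """Reshape raw bytes into rows of 8-bit pixel intensities (0-255).

    Hypothetical helper: the width and the zero-padding policy are
    illustrative assumptions, not the thesis authors' settings.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    # Pad the tail with zero bytes so the final row is complete.
    padding = (-len(data)) % width
    padded = data + bytes(padding)
    # Each slice of `width` bytes becomes one image row.
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


# Toy stand-in for the contents of an executable file.
sample = bytes(range(10))
image = bytes_to_grayscale(sample, width=4)
```

With the toy input, `image` holds three rows of four pixels, the last row padded with two zeros. A real pipeline would read the bytes from a file and typically resize the resulting image to the input size expected by the chosen CNN.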

Keywords

Malware; Deep learning; Neural network; Transformer

LC Subject Headings

Data mining; Malware (Computer software)--Prevention; Electronic transformers--Design and construction

Description

This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2023.

Cataloged from PDF version of thesis.

Includes bibliographical references (pages 46-50).

Department

Department of Computer Science and Engineering, Brac University

Type

Thesis

Collections

Thesis & Report, BSc (Computer Science and Engineering)