PDFGuardian: An innovative approach to interpretable PDF malware detection using XAI with the SHAP framework
Abstract
As the world moves further into the digital era, a large share of data is
exchanged through the widely used Portable Document Format (PDF). One of the
format's biggest obstacles remains the age-old problem of malware. Although
numerous anti-malware and anti-virus tools exist, many of them fail to detect
PDF malware. Emails carrying harmful attachments have recently been used in
targeted cyber attacks against businesses, and because most email servers do
not allow executable files to be attached to emails, attackers prefer
non-executable files such as PDFs. Across various sectors, machine learning
algorithms and neural networks have proven capable of detecting both known and
previously unseen malware. However, it can be difficult to understand how
these models reach their decisions. This lack of transparency is a problem,
because understanding how an AI system makes its decisions is essential to
ensuring that it acts ethically and responsibly. In some cases, machine and
deep learning models may make biased or discriminatory decisions or have
unintended consequences. This is where Explainable AI comes into play. To
address this issue, this paper proposes classifying PDF files as malicious or
clean using the machine learning algorithms Stochastic Gradient Descent (SGD)
and the XGBoost classifier, together with the deep learning algorithms
Single-Layer Perceptron and Artificial Neural Network (ANN), and examining
their interpretability with the SHAP framework from Explainable AI (XAI) to
obtain both a global and a local understanding of the models.
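
To illustrate the kind of analysis the abstract describes, the following is a
minimal sketch of explaining an XGBoost PDF-malware classifier with SHAP. It
is not the paper's actual pipeline: the feature names (e.g., js_count,
openaction_count) and the synthetic data are hypothetical placeholders
standing in for structural PDF features.

    # Minimal sketch: XGBoost + SHAP for global and local explanations.
    # The feature names and synthetic data are illustrative assumptions,
    # not the paper's dataset or feature set.
    import pandas as pd
    import shap
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from xgboost import XGBClassifier

    # Hypothetical structural PDF features (object/keyword counts).
    feature_names = ["obj_count", "stream_count", "js_count",
                     "openaction_count", "page_count", "encrypt_flag"]

    # Synthetic stand-in for a labeled PDF feature dataset.
    X, y = make_classification(n_samples=1000,
                               n_features=len(feature_names),
                               n_informative=4, random_state=0)
    X = pd.DataFrame(X, columns=feature_names)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, random_state=0)

    model = XGBClassifier(n_estimators=100, eval_metric="logloss")
    model.fit(X_train, y_train)

    # TreeExplainer computes exact SHAP values for tree ensembles.
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X_test)

    # Global view: which features drive malicious/clean decisions overall.
    shap.summary_plot(shap_values, X_test)

    # Local view: why one individual PDF sample was classified as it was.
    shap.force_plot(explainer.expected_value, shap_values[0],
                    X_test.iloc[0], matplotlib=True)

The summary plot corresponds to the "global understanding" mentioned above
(feature importance across all samples), while the force plot corresponds to
the "local understanding" (the contribution of each feature to a single
prediction).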