Loan approval prediction using machine learning algorithms

Roy, Reak; Alam, Tahsin; Kabir, Syed Hafiz; Awsaf, Mirza Abyaz; Haque, Shadik Ul

View/Open

22301776, 19301171, 23241063, 20101146, 23141087_CSE.pdf (800.0Kb)

Date

2024-10

Publisher

BRAC University

Abstract

This research examines the potential of several classical machine learning classifiers and deep neural network architectures for predicting the status of a loan application. A dataset of 613 observations and 13 features, containing information about applicants and their credit profiles, was expanded using techniques such as bootstrapping to improve data quality, ultimately yielding 9824 observations. Imputation strategies were applied to handle missing values, and features were carefully prepared using ANOVA, Mutual Information, and tree-based approaches, among other statistical methods. To validate model performance, the dataset was split into two parts: training (70%) and testing (30%). Several classical machine learning algorithms were applied, including Logistic Regression, Support Vector Classifiers (SVC), Decision Trees, Random Forests, Multi-Layer Perceptron, Gradient Boosting Machines, and K-Nearest Neighbors. Among these models, the Random Forest Classifier achieved the highest accuracy (86.84%) and F1-score (0.9043), making it the best performer. Resampling techniques such as SMOTE (accuracy of 88.16%) and ADASYN (accuracy of 87.07%) were also used to address class imbalance; K-Nearest Neighbors performed impressively, reaching an accuracy of 88.16% after resampling. In a separate but related analysis, five neural network architectures, Simple Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN), Fully Convolutional Neural Network (FCNN), and Fully Connected Neural Network (FCN), were built using TensorFlow, Scikit-learn, and NumPy in Google Colaboratory notebooks. The results showed that the Fully Convolutional Network (FCN) achieved the best validation accuracy (89.75%) and validation loss (0.2255) among the models built.
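The data-preparation and evaluation steps named in the abstract (bootstrapping from 613 to 9824 observations, a 70/30 split, and accuracy and F1 scoring) can be sketched roughly as follows. This is only an illustrative sketch using the Python standard library, not the authors' code: the function names, the placeholder rows, the random seeds, and the order of operations (resampling before splitting) are all assumptions, since the abstract does not specify them.

```python
import random


def bootstrap(rows, target_size, seed=0):
    """Draw target_size rows uniformly at random, with replacement."""
    rng = random.Random(seed)
    return [rng.choice(rows) for _ in range(target_size)]


def split_train_test(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and cut it into train and test parts."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels, F1 taken on the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Hypothetical placeholder data: (row_id, label) pairs standing in for
# the 613 original loan applications; the real features are not shown.
original = [(i, i % 2) for i in range(613)]

resampled = bootstrap(original, target_size=9824)
train, test = split_train_test(resampled, train_fraction=0.7)

print(len(resampled), len(train), len(test))
```

With these assumed settings, the 9824 resampled rows divide into 6876 training rows and 2948 test rows. A classifier's predictions on the test rows would then be passed to `accuracy_and_f1` together with the true labels.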

Keywords

Loan approval prediction; Machine learning; Random forest regressor; K-nearest neighbors; Financial analytics; RNN; CNN

LC Subject Headings

Neural networks (Computer science).; Data Mining--Finance.; Computational Intelligence.; Decision support systems.; Credit--Risks--Forecasting.

Description

This project report is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2024.

Cataloged from PDF version of project report.

Includes bibliographical references (pages 56-60).

Department

Department of Computer Science and Engineering, BRAC University

Type

Project report

Collections

Thesis & Report, BSc (Computer Science and Engineering) [1586]