Application of machine learning in credit risk assessment: a prelude to smart banking
A precise credit risk assessment system is vital to the sound functioning of a financial institution. Accurate estimates of credit risk allow such institutions to continue operating profitably and transparently. As the rate of loan defaults gradually increases, bank authorities are finding it more and more difficult to assess loan requests correctly. Credit risk has therefore become a widely discussed and studied topic throughout the world. Numerous solutions have been proposed, some more effective than others, and studies aimed at solving this difficult problem are still under way. With these implications in mind, this paper proposes a machine learning model that can assess credit risk and predict likely loan defaulters for any credit lending institution. Taking a borrower’s financial and social history into account, the paper proposes a way to decide whether a customer’s loan request should be accepted, which in turn can help protect the creditor from incurring further losses. Using data from previous successful borrowers and loan defaulters, a comparative analysis has been carried out with our supervised learning models, and the results obtained can be used to predict the behavior of future borrowers. Such a model can assist a financial institution in deciding whether to accept a loan request. Different combinations of feature selection algorithms and classifiers have been evaluated, and the best model has been selected on the basis of metrics such as accuracy, AUC score, and F1 score. Recursive feature elimination with cross-validation (RFECV) and Principal Component Analysis (PCA) have been used to find the optimum number of features needed to make an accurate prediction. This allows more efficient use of the limited resources available.
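The feature-reduction step described above can be illustrated with a minimal sketch, assuming scikit-learn. The synthetic dataset, the Logistic Regression base estimator for RFECV, the AUC scoring choice, and the 95% variance threshold for PCA are all assumptions made for illustration; none of them is taken from the study.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a borrower dataset (assumption: not the study's data).
X, y = make_classification(
    n_samples=300, n_features=12, n_informative=4, n_redundant=2, random_state=0
)
X_scaled = StandardScaler().fit_transform(X)

# RFECV: repeatedly drop the weakest feature and keep the feature count
# with the best cross-validated score.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
selector = RFECV(
    estimator=LogisticRegression(max_iter=1000),  # assumed base estimator
    step=1,
    cv=cv,
    scoring="roc_auc",
)
selector.fit(X_scaled, y)
n_selected = selector.n_features_
X_rfe = selector.transform(X_scaled)

# PCA: keep the smallest number of components that explains at least
# 95% of the variance (threshold is an illustrative assumption).
pca = PCA(n_components=0.95, svd_solver="full")
X_pca = pca.fit_transform(X_scaled)
n_components = pca.n_components_

print(f"RFECV kept {n_selected} of {X.shape[1]} features")
print(f"PCA kept {n_components} components")
```

RFECV keeps a subset of the original features, so the result remains interpretable in terms of borrower attributes, whereas PCA produces new combined components.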
Because the assessment is performed in a supervised setting, Support Vector Machines (SVM), Random Forest, Extreme Gradient Boosting, and Logistic Regression have been used as the classifiers. To ensure that every combination is evaluated properly, k-fold cross-validation has been used to obtain a more balanced result. Furthermore, GridSearchCV has been used to tune the selected hyperparameters of each model in order to obtain the best possible result. On this basis, a comparison in tabular form is presented, showing the most and the least accurate models for assessing loan requests.
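As a rough sketch of how such a comparison might be assembled, the following tunes each classifier with GridSearchCV under stratified 5-fold cross-validation and prints a small table of accuracy, AUC, and F1. The data is synthetic, the hyperparameter grids are arbitrary illustrative choices, and scikit-learn's `GradientBoostingClassifier` is swapped in as a stand-in for Extreme Gradient Boosting so that the example needs no additional library. None of these settings comes from the study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a loan dataset (assumption: not the study's data).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=4, random_state=0
)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Each entry: (estimator, hyperparameter grid). Grids are illustrative only.
candidates = {
    "Logistic Regression": (
        make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        {"logisticregression__C": [0.1, 1.0, 10.0]},
    ),
    "SVM": (
        make_pipeline(StandardScaler(), SVC()),
        {"svc__C": [0.1, 1.0, 10.0], "svc__kernel": ["linear", "rbf"]},
    ),
    "Random Forest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [50, 100], "max_depth": [3, None]},
    ),
    # Stand-in for Extreme Gradient Boosting (assumption, see lead-in).
    "Gradient Boosting": (
        GradientBoostingClassifier(random_state=0),
        {"n_estimators": [50, 100], "learning_rate": [0.05, 0.1]},
    ),
}

results = []
for name, (model, grid) in candidates.items():
    search = GridSearchCV(
        model, grid, cv=cv, scoring=["accuracy", "roc_auc", "f1"], refit="roc_auc"
    )
    search.fit(X, y)
    best = search.best_index_  # row of cv_results_ with the best mean AUC
    results.append((
        name,
        search.cv_results_["mean_test_accuracy"][best],
        search.cv_results_["mean_test_roc_auc"][best],
        search.cv_results_["mean_test_f1"][best],
    ))

# Sort from highest to lowest mean cross-validated AUC and print a table.
results.sort(key=lambda row: row[2], reverse=True)
print(f"{'Model':<22}{'Accuracy':>10}{'AUC':>10}{'F1':>10}")
for name, acc, auc, f1 in results:
    print(f"{name:<22}{acc:>10.3f}{auc:>10.3f}{f1:>10.3f}")
```

Selecting the best configuration by AUC rather than accuracy is a design choice of this sketch; with imbalanced default data, accuracy alone can look high for a model that rarely flags defaulters.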