Comparative analysis between machine learning algorithms in efficiency of Coronary Heart Disease (CHD) prediction
Abstract
Machine Learning is finding ever wider application in modern healthcare. Researchers
have proposed many ways to implement Machine Learning algorithms and have delved into
ways to make them work as efficiently as possible. Because healthcare will always be
needed, we believe there will always be a need to compare Machine Learning algorithms
in terms of their performance and relevance, so that Machine Learning can make
healthcare more reliable. For this study, we selected four commonly used Machine
Learning algorithms (Logistic Regression, Support Vector Machine, Decision Tree and
Random Forest) and compared them on the Framingham Heart Study dataset, which is used
to predict the risk of Coronary Heart Disease (CHD). We combined a Data Preprocessing
method, Row Elimination, with a Feature Selection method, Recursive Feature
Elimination. To understand
the impact of each feature in the dataset on the target feature, we applied the
Chi-Squared test, a technique commonly recommended for classification problems. To
compare and analyze the performance of the algorithms, we used the Confusion Matrix,
Precision, Recall and F1 Score, and we plotted ROC curves from the Sensitivity and
Specificity scores to characterize the algorithms’ behavior. We found that the highest
average accuracy in our
study was achieved by the Logistic Regression algorithm (83.9%), while the other
algorithms came fairly close.
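
The evaluation measures named above can all be derived from the four counts of a binary Confusion Matrix. The following is a minimal sketch of that derivation, not the code used in this study: the helper names (`confusion_counts`, `safe_div`, `binary_metrics`), the convention that label 1 denotes CHD risk, and the toy labels are assumptions introduced purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float       # identical to sensitivity (true positive rate)
    specificity: float  # true negative rate
    f1: float


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels; 1 is assumed to mean CHD risk."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def safe_div(num, den):
    """Divide, returning 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, specificity and F1 from the counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return BinaryMetrics(
        accuracy=safe_div(tp + tn, tp + fp + tn + fn),
        precision=precision,
        recall=recall,
        specificity=safe_div(tn, tn + fp),
        f1=safe_div(2 * precision * recall, precision + recall),
    )


# Toy labels, invented for illustration only (not taken from the study).
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0]
m = binary_metrics(y_true, y_pred)
print(m)
```

An ROC curve plots Sensitivity (Recall) against 1 - Specificity as the decision threshold of a classifier is varied, so each threshold contributes one point computed in this way.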