Comparison of machine learning techniques to predict cardiovascular disease
Abstract
The purpose of this thesis is to examine and compare the accuracy of different
data mining classification systems, built with different machine learning
techniques, for predicting cardiovascular disease. The comparison shows the accuracy
rates of the techniques and the reasons behind their variation. The Cleveland
dataset for heart disease, which contains 303 instances, has been used in this
study. The data has been divided into training and testing sets, and 10-fold
cross-validation has been used to make fuller use of the available
data. The k-Nearest Neighbors, Support Vector Machine,
Decision Tree, Random Forest, Gaussian Naive Bayes, Logistic Regression and
Deep Belief Network machine learning techniques have been investigated in this
research. In addition, a voting classifier, an ensemble learning method, has been
applied to the dataset. By the end of the implementation part, we have found that Gaussian
Naive Bayes gives the highest accuracy on our dataset, while the deep belief
network performs very poorly. The study conducted for this thesis explains the
reasons for these variations by analyzing the characteristics of each technique
and its behavior with respect to the dataset.
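
The evaluation protocol described above can be illustrated with a minimal sketch. It is not the implementation used in this thesis: the helper names k_fold_indices and hard_vote are hypothetical, the folds are contiguous and unshuffled, and the vote is a simple majority (hard voting), all of which are assumptions made for illustration. It shows how 303 instances split into 10 folds and how a voting classifier combines the labels predicted by several models for one instance.

```python
from collections import Counter


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs for k contiguous, unshuffled folds.

    Hypothetical helper for illustration; the first n_samples % k folds
    receive one extra sample so that every sample is tested exactly once.
    """
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def hard_vote(predictions):
    """Return the majority label among the classifiers' predictions.

    Ties resolve to the label that appears first in the input (assumption).
    """
    return Counter(predictions).most_common(1)[0][0]


# 303 instances, as in the Cleveland dataset used in this study.
folds = list(k_fold_indices(303, k=10))
fold_sizes = [len(test_idx) for _, test_idx in folds]

# Example: three classifiers predict labels for one instance.
combined_label = hard_vote([1, 0, 1])
```

With 303 instances and 10 folds, three folds hold 31 test instances and seven hold 30, so each model is trained on roughly 90% of the data in every round and the reported accuracy is the average over the 10 rounds.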