Prediction of diabetes-induced complications using different machine learning algorithms
Abstract
Machine learning is an ever-expanding field of artificial intelligence that uses large amounts of data to develop algorithms capable of detecting patterns. One such application is building predictive models for disease. Medical science has advanced greatly, and the discovery of complex diseases has made people more health-conscious. Even so, medicine still has no reliable way to predict whether a disease will occur. Given relevant data, however, machine learning methods can predict the onset of many diseases. This paper presents a comparative analysis of different machine learning algorithms and their results in predicting health complications related to diabetes mellitus. Diabetes mellitus is a condition in which the body's ability to produce or respond to insulin, a hormone secreted by the pancreas, diminishes. As a result, over time it damages other organs in the body, primarily the kidneys, liver, eyes, heart and brain. In most cases the threats posed by diabetes are not recognized until it is too late, so preventing the onset of related diseases requires considerable vigilance. Because diabetes depends largely on a person's genetics, it cannot always be prevented. However, if a patient is monitored closely, it is possible to anticipate diabetes-related complications. The proposed model uses one year of time-series data containing 164 features, including the results of various pathological tests. Logistic Regression, Support Vector Machine (SVM), Naïve Bayes, Decision Tree and Random Forest classifiers were trained in a supervised setting to predict the probability of diabetes-induced nephropathy and cardiovascular disease. Principal Component Analysis (PCA) was applied beforehand to reduce the dimensionality of the dataset, and each model was also evaluated without it. The Decision Tree without PCA produced the best results for nephropathy, with an area under the ROC curve (AUC) of 0.87.
By contrast, Naïve Bayes without PCA produced the best results for cardiovascular disease, with an AUC of 0.74. In summary, the proposed model predicts the risk of nephropathy better than the risk of cardiovascular disease.
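The following minimal sketch makes the reported AUC values concrete and mirrors how the classifiers are compared. It computes ROC AUC with the Mann-Whitney formulation and then picks the model with the highest score. Several assumptions apply. Each model is assumed to have already produced a predicted probability per patient. The model names, labels and scores are hypothetical toy values, not the paper's dataset. The paper does not describe its actual evaluation code.

```python
from typing import Dict, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic.

    AUC is the probability that a randomly chosen positive case
    (label 1) receives a higher score than a randomly chosen negative
    case (label 0); tied scores count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def best_model(labels: Sequence[int],
               model_scores: Dict[str, Sequence[float]]) -> Tuple[str, float]:
    """Return (name, AUC) of the model whose scores give the highest AUC."""
    aucs = {name: roc_auc(labels, s) for name, s in model_scores.items()}
    name = max(aucs, key=aucs.get)
    return name, aucs[name]


# Hypothetical toy data (NOT the paper's data): 1 = complication present.
y_true = [0, 0, 1, 1, 0, 1]
predicted = {
    "decision_tree": [0.10, 0.40, 0.35, 0.80, 0.20, 0.90],
    "naive_bayes":   [0.30, 0.60, 0.50, 0.70, 0.40, 0.55],
}
print(best_model(y_true, predicted))  # decision_tree, AUC = 8/9 (about 0.889)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. On that scale, the reported 0.87 for nephropathy indicates considerably better discrimination than the 0.74 for cardiovascular disease.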