Predicting diabetes using machine learning: a comparative study of supervised classification models
Abstract
Diabetes is a major global health concern that can develop at any age and has serious consequences. It results from an imbalance of glucose levels in the body. Besides being a chronic disease, it carries associated risks ranging from life-threatening complications to financial loss. It is therefore essential to detect the condition correctly and as early as possible to prevent further complications. Advances in medical technology have made many diagnostic tools available, and machine learning (ML) algorithms are one such tool, used to speed up prediction and diagnosis. ML is a branch of artificial intelligence (AI) that trains a system by imitating the human learning process. This study uses supervised classification ML algorithms to predict diabetes: Logistic Regression, K-Nearest Neighbor, Naïve Bayes, Decision Tree, and Random Forest. The data are primary data collected from Bangladeshi adults of different age groups and comprise the demographic data, medical history, and family information needed for the study. The collected dataset is cleaned to remove duplicate records and errors. Diabetes status is taken as the dependent variable, and the associated risk factors are the independent variables. The models are then implemented using the RapidMiner tool. A confusion matrix is produced for each model, and a comparative analysis is carried out. In the performance evaluation, the highest accuracies achieved were 94.62% and 94.23%. From these findings, the best model can be determined. Selecting the best model is useful because it supports accurate and timely identification of diabetic patients in the healthcare sector, so that treatment can begin early enough to control the disease.
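The comparison step described above, deriving accuracy from each model's confusion matrix and ranking the models, can be illustrated with a minimal sketch. The study itself used RapidMiner; the Python code below, the ConfusionMatrix class, the labels "Model A" and "Model B", and all counts are hypothetical and are not the study's results.

```python
from typing import NamedTuple


class ConfusionMatrix(NamedTuple):
    """Binary confusion matrix; 'positive' means diabetic (assumed convention)."""
    tp: int  # diabetic, predicted diabetic
    fn: int  # diabetic, predicted non-diabetic
    fp: int  # non-diabetic, predicted diabetic
    tn: int  # non-diabetic, predicted non-diabetic

    def accuracy(self) -> float:
        # Share of all cases that were classified correctly.
        return (self.tp + self.tn) / (self.tp + self.fn + self.fp + self.tn)

    def recall(self) -> float:
        # Share of actual diabetic cases that the model detected.
        return self.tp / (self.tp + self.fn)

    def precision(self) -> float:
        # Share of predicted diabetic cases that were truly diabetic.
        return self.tp / (self.tp + self.fp)


# Hypothetical counts for illustration only; NOT taken from the study.
matrices = {
    "Model A": ConfusionMatrix(tp=40, fn=10, fp=5, tn=45),
    "Model B": ConfusionMatrix(tp=44, fn=6, fp=4, tn=46),
}

# Rank the models by accuracy, as in the comparative analysis.
best = max(matrices, key=lambda name: matrices[name].accuracy())

for name, cm in matrices.items():
    print(f"{name}: accuracy={cm.accuracy():.2%}, "
          f"recall={cm.recall():.2%}, precision={cm.precision():.2%}")
print("Best by accuracy:", best)
```

Accuracy alone can mislead when diabetic and non-diabetic cases are unevenly represented, which is why the sketch also reports recall and precision from the same matrix.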