Analysis of transformer- and CNN-based approaches for classifying renal abnormality from image data
Abstract
There is a pressing need to revise the current diagnostic framework for renal
abnormality, given the projected increase in its global prevalence: about 10% of
people worldwide already suffer from renal diseases. Given this escalating trend,
proactive measures are warranted to meet the coming challenges in accurate
diagnosis and management. Renal abnormalities, often symptomless and hard to
diagnose, can be dangerous but are curable if detected early. Machine learning
and deep learning techniques, if implemented correctly, can therefore be
instrumental in detecting these anomalies early. Our approach for
renal abnormality detection from image data incorporates Convolutional Neural
Network (CNN) and transformer-based image classification topologies, as well as
data augmentation methods and precise hyperparameter tuning (learning rate,
batch size, dropout rate, regularization strength, etc.); additionally, we
propose CNN-based and transformer-based architectures for renal abnormality
detection. Transformer-based deep learning methods are the latest trend in
classifying diseases from medical images; for this reason, we compare the
performance of CNN-based and transformer-based architectures. We build a hybrid
binary-class dataset of Computed Tomography (CT) scan renal images using primary
data collected from Kidney Foundation Hospital & Research Institute, Dhaka,
Bangladesh and secondary data from a publicly available online source. Our
approach is a sequence of steps that allows for abnormality detection, without
manual intervention, using the state-of-the-art classifiers ResNet50,
InceptionResNetV2, InceptionV3 and VGG16 along with our proposed ResNet152-based
custom model and ViT architecture-based custom model. Our experimental results
showed that our proposed transformer-based model achieved the highest accuracy
of 99.99%, while our proposed CNN model achieved an accuracy of 99.97%. Among
the four pre-trained
CNN models, ResNet50 scored the highest accuracy of 99.95%, followed by VGG16 at
99.92% and InceptionResNetV2 at 98.87%, while InceptionV3 showed the lowest
performance at 96.87%. All four pre-trained models demonstrated acceptable
performance, and our proposed models outperformed the state-of-the-art
pre-trained models.
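
The reported accuracies can be compared side by side with a few lines of Python. The following is only an illustrative sketch: the accuracy values are those stated above, while the dictionary name `reported_accuracy`, the helper `rank_models`, and the derived error rate (100% minus accuracy) are assumptions introduced here for illustration and are not part of the original experiments.

```python
# Accuracies (in percent) as reported in the abstract.
# The model labels are shorthand chosen for this sketch.
reported_accuracy = {
    "Proposed ViT-based model": 99.99,
    "Proposed ResNet152-based model": 99.97,
    "ResNet50": 99.95,
    "VGG16": 99.92,
    "InceptionResNetV2": 98.87,
    "InceptionV3": 96.87,
}


def rank_models(acc):
    """Return (name, accuracy, error_rate) tuples sorted from best to worst.

    The error rate is simply 100 - accuracy, rounded to two decimals to
    avoid floating-point noise in the printed values.
    """
    rows = ((name, a, round(100.0 - a, 2)) for name, a in acc.items())
    return sorted(rows, key=lambda row: row[1], reverse=True)


ranking = rank_models(reported_accuracy)
for name, accuracy, error in ranking:
    print(f"{name:32s} accuracy={accuracy:6.2f}%  error={error:5.2f}%")
```

Viewed as error rates, the gap between the models is easier to read: differences of a few hundredths of a percentage point separate the top four models, whereas InceptionV3 is more than three percentage points behind.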