Privacy-focused classification of prostate cancer using federated learning
Abstract
The prostate is a small gland located in the lower abdomen of men. Prostate
cancer occurs when a tumor, an abnormal, malignant growth of cells, forms in the
prostate. Prostate cancer is a slow-growing cancer that often goes undetected until
it has progressed to an advanced stage. The majority of men with prostate cancer
are unaware that they have it, and many die of other causes before they
are ever diagnosed. However, prostate cancer becomes hazardous when it
grows rapidly or spreads outside of the prostate. With early detection and personalized
care, the survival rate for prostate cancer increases significantly. Deep
learning can play a significant role here: work in medical imaging
has shown that computer-aided diagnosis helps radiologists
make more precise diagnoses while reducing diagnostic time and cost. However,
prostate cancer data are difficult to collect and can be used only in a restricted
manner, because patients are unwilling to share their data and hospitals keep their
patients' records confidential. Our research aims to address these challenges with a
system that classifies prostate cancer while keeping the data confidential, using a
decentralized method called federated learning, in contrast to current approaches.
We classified prostate cancer using a simple CNN, Xception, and VGG19 under both
the traditional and the federated learning approach for comparative analysis.
VGG19 outperformed the other two
models in both approaches, with a centralized classification accuracy of 95.51%
and a decentralized classification accuracy of 83.76%. Most importantly, in
our system an instance of the server-side model is distributed to the clients so
that each client can independently train the model on its local dataset in its
own environment. The updated weights of the trained models are then returned
to the server, where they are aggregated across all participating clients to update
the server-side model without accessing any confidential medical data, which
ensures privacy-focused classification.
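
The training round described above can be illustrated with a minimal sketch. It assumes the server aggregates by averaging client weights in proportion to local dataset size (FedAvg-style); the abstract does not state which aggregation rule was used. A toy linear model stands in for the CNN, Xception, and VGG19 networks, and the names local_train, aggregate, and federated_round are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

Weights = List[float]
Dataset = List[Tuple[List[float], float]]  # (features, label) pairs


def local_train(weights: Weights, data: Dataset,
                lr: float = 0.1, epochs: int = 5) -> Weights:
    """Client side: start from the server weights, train on local data only."""
    w = list(weights)  # copy, so the server's instance is not mutated
    for _ in range(epochs):
        for x, y in data:
            pred = sum(wi * xi for wi, xi in zip(w, x))
            err = pred - y
            # One gradient step on squared error for a linear model
            # (assumption: stands in for the real network's optimizer).
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def aggregate(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Server side: average client weights, weighted by local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def federated_round(server_weights: Weights,
                    client_datasets: List[Dataset]) -> Weights:
    """One round: distribute weights, train locally, return and aggregate."""
    # Only weights leave each client; the raw records stay local.
    updates = [local_train(server_weights, d) for d in client_datasets]
    sizes = [len(d) for d in client_datasets]
    return aggregate(updates, sizes)


if __name__ == "__main__":
    # Two hypothetical clients, each holding its own private toy dataset.
    clients = [
        [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0)],
        [([1.0, 1.0], 1.0)],
    ]
    weights = [0.0, 0.0]
    for _ in range(50):
        weights = federated_round(weights, clients)
    print(weights)
```

In this sketch the server only ever sees the weight lists returned by local_train, which mirrors the privacy property claimed above. A real deployment would replace the toy model with the chosen network and would also need secure transport between clients and server.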