Prediction of genetic mutation from clinical data of sickle cell disease using few-shot siamese bidirectional LSTM and federated learning
Abstract
Sickle Cell Disease is a monogenic disorder that often causes complications
affecting multiple vital organs simultaneously. Treatment for Sickle Cell is
diverse and often varies from patient to patient, but several background studies
have shown that the progression and symptoms of the disease can be predicted to a
great extent from a patient’s mutation type in the HBB gene.
Moreover, such research regarding genetic mutation prediction can be seen in
other fields of medicine such as cancer, but it is scarce for Sickle Cell.
Further limitations include the complexity and unavailability of genetic testing,
the limited clinical data available, and privacy concerns regarding patients'
medical information. Hence, our study aimed to build a Federated Siamese
Bidirectional LSTM to predict the Sickle Cell genotype from clinical data when the
data is sparse and decentralized. A Sickle Cell clinical dataset with 216 instances
and 4 genotype class labels was pre-processed to train and evaluate the models.
The dataset was then used to create pairs of instances with corresponding
similarity scores, and the Siamese Bi-LSTM was trained for several epochs to
compute the similarity between two instances. In the federated setting, the data
was divided among client devices, and the Siamese Bi-LSTM was trained locally on
each client to update the global model. The test data was then used to assess the
performance of both models. Based on the performance analysis, the Siamese Bi-LSTM achieved
an accuracy of 90.45% with an F1 score of 90.66%, and the Federated Siamese Bi-LSTM
model (FFSB-LSTM) achieved an accuracy of 88.25% and an F1 score of 88.57%,
showing significant improvement over the baseline KNN and Logistic Regression
models.
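
The two data-handling steps named in the abstract, building instance pairs with similarity scores and aggregating locally trained models into a global one, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a binary similarity score (1.0 when two instances share a genotype label, 0.0 otherwise) and a sample-size-weighted average of client parameters in the style of federated averaging; neither detail is specified in the abstract. The function names make_pairs and federated_average are hypothetical.

```python
import itertools
import numpy as np


def make_pairs(features, labels):
    """Build all unordered pairs of instances with a similarity score.

    Assumption: the score is 1.0 when both instances share a genotype
    label and 0.0 otherwise; the abstract does not define the score.
    """
    idx_pairs = list(itertools.combinations(range(len(labels)), 2))
    left = np.array([features[i] for i, _ in idx_pairs])
    right = np.array([features[j] for _, j in idx_pairs])
    sim = np.array([float(labels[i] == labels[j]) for i, j in idx_pairs])
    return left, right, sim


def federated_average(client_weights, client_sizes):
    """Average per-client parameter lists, weighted by local sample count.

    Assumption: the global update is a weighted mean of client parameters;
    the abstract does not name the aggregation rule.
    """
    total = float(sum(client_sizes))
    n_layers = len(client_weights[0])
    return [
        sum(w[k] * (n / total) for w, n in zip(client_weights, client_sizes))
        for k in range(n_layers)
    ]
```

With 4 instances, make_pairs returns 6 pairs. With two clients holding 1 and 3 samples, federated_average weights their parameters by 0.25 and 0.75 respectively, so clients with more local data contribute more to the global model.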