Voice impersonation detection using LSTM-based RNN and explainable AI
Abstract
The advancing field of artificial synthetic media has introduced deepfakes, which make
it easier to mechanically synthesize a voice nearly identical to a person's original voice
and to use it for malicious purposes. People's voices are publicly exposed because voice is a
proficient and convenient way of exchanging information across various media, including
entertainment, speech delivery, and news reading, which makes it easy to collect
voice samples and create fake yet almost identical voice samples to trick people. Consequently,
it has become vital to prevent this crime, which motivated this research on
protecting the victims of voice impersonation attacks. We used an LSTM-based RNN
model to distinguish between real and synthesized voices. Furthermore, to
compare against the results of this model, we built an SVM classifier,
and finally we explained the predicted outputs (fake or real) of both the LSTM and
SVM models using an Explainable AI method named LIME. Our proposed model achieved
an accuracy of 98.33% and a very low error rate
in detecting fake/synthesized voices.