From impersonation to authentication: techniques for identifying deepfake voices
Abstract
The proliferation of fake voices has become a pressing issue, making it increasingly
difficult to distinguish authentic audio recordings from fabricated ones.
Notable examples include AI-generated replications of the voices of prominent
U.S. presidents.
These manipulated audio clips have been disseminated across various YouTube channels,
serving both benign and malicious purposes. While fake audio can be entertaining
in content creation, it also carries darker potential, including threats to
political leaders and to diplomatic relations between nations that could escalate
into conflict. In Bangladesh, the surge in fake audio content has left the populace
skeptical, casting doubt on the authenticity of online videos and news reports.
At the same time, citizens have become susceptible to accepting counterfeit
content as real, amplifying the need for a solution to this issue.
Addressing this problem requires machine learning, AI, and algorithmic
techniques to mitigate the spread of fake audio within Bangladesh. To
tackle it, we propose combining diverse audio datasets with trained models
to improve accuracy in discerning genuine from manipulated audio.
The main goal of our thesis is to implement three features
that help detect the deepfake audio spreading across social media
and affecting people's lives, and to prevent scams in online banking
transactions. To complete the project, we worked with several
machine-learning models, namely the Recurrent Neural Network (RNN), Long Short-Term
Memory (LSTM), Bi-LSTM, and an LSTM-based RNN, to detect fake audio
across social media. Combining these models with feature extraction techniques such as
Mel-Frequency Cepstral Coefficients (MFCC) and the Short-Time Fourier Transform
(STFT), our study aims to build a robust system for differentiating real from
deepfake audio. Despite facing multiple challenges, we successfully reached our
desired goal, achieving 99% accuracy with the Bi-LSTM model for deepfake audio detection.
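
A minimal sketch of such a pipeline in Python, assuming librosa for MFCC extraction and TensorFlow/Keras for the Bi-LSTM classifier. The sampling rate, frame count, and layer sizes here are illustrative placeholders, not the thesis's exact configuration:

```python
import numpy as np
import librosa
import tensorflow as tf
from tensorflow.keras import layers, models

def extract_mfcc(path, n_mfcc=40, max_frames=200):
    """Load an audio clip and return a fixed-size MFCC feature matrix."""
    signal, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, frames)
    mfcc = mfcc.T  # (frames, n_mfcc): time becomes the sequence axis
    # Pad or truncate to a fixed number of frames so clips can be batched
    if mfcc.shape[0] < max_frames:
        mfcc = np.pad(mfcc, ((0, max_frames - mfcc.shape[0]), (0, 0)))
    return mfcc[:max_frames]

def build_bilstm(max_frames=200, n_mfcc=40):
    """Binary real-vs-fake classifier over MFCC sequences."""
    model = models.Sequential([
        layers.Input(shape=(max_frames, n_mfcc)),
        layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
        layers.Bidirectional(layers.LSTM(32)),
        layers.Dense(32, activation="relu"),
        layers.Dense(1, activation="sigmoid"),  # 1 = deepfake, 0 = genuine
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```

Given labeled clips, the features from extract_mfcc would be stacked into an array of shape (num_clips, max_frames, n_mfcc) and passed to model.fit with 0/1 labels; an STFT-based variant would substitute a spectrogram for the MFCC matrix as the sequence input.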