From impersonation to authentication: techniques for identifying deepfake voices
Abstract
The proliferation of fake voices has become a pressing issue, making it increasingly
difficult to distinguish authentic audio recordings from fabricated ones.
Notable examples include AI-generated replications of the voices of prominent
U.S. presidents.
These manipulated audio clips have been disseminated across various YouTube channels,
serving both benign and malicious purposes. While fake audio can be entertaining
in content creation, it also carries darker potential, including threats to
political leaders and to diplomatic relations between nations that could escalate
into conflict. In Bangladesh, the surge in fake audio content has left the populace
skeptical, casting doubt on the authenticity of online videos and news reports.
At the same time, citizens have become susceptible to accepting counterfeit
content as real, amplifying the need for a solution to this issue.
Addressing this problem requires machine learning, AI, and algorithmic
techniques to mitigate the spread of fake audio within Bangladesh. To
tackle it, we propose combining diverse audio datasets with trained models
to improve accuracy in discerning genuine from manipulated audio.
The main goal of our thesis is to implement three features
that help detect the deepfake audio spreading across social media
and affecting people's lives, and to prevent scams in online banking
transactions. To complete the project, we worked with several
machine-learning models, namely the Recurrent Neural Network (RNN), Long Short-Term
Memory (LSTM), Bi-LSTM, and an LSTM-based RNN, to detect fake audio
across social media. Combining these models with feature extraction techniques such as
Mel-Frequency Cepstral Coefficients (MFCC) and the Short-Time Fourier Transform
(STFT), our study aims to build a robust system for differentiating real from
deepfake audio. Despite facing multiple challenges, we successfully reached our
desired goal, achieving 99% accuracy with the Bi-LSTM model for deepfake audio detection.
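
A minimal sketch of such a pipeline in Python, assuming librosa for MFCC extraction and TensorFlow/Keras for the Bi-LSTM classifier. The sampling rate, frame count, and layer sizes here are illustrative placeholders, not the thesis's exact configuration:

```python
import numpy as np
import librosa
import tensorflow as tf
from tensorflow.keras import layers, models

def extract_mfcc(path, n_mfcc=40, max_frames=200):
    """Load an audio clip and return a fixed-size MFCC feature matrix."""
    signal, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, frames)
    mfcc = mfcc.T  # (frames, n_mfcc): time becomes the sequence axis
    # Pad or truncate to a fixed number of frames so clips can be batched
    if mfcc.shape[0] < max_frames:
        mfcc = np.pad(mfcc, ((0, max_frames - mfcc.shape[0]), (0, 0)))
    return mfcc[:max_frames]

def build_bilstm(max_frames=200, n_mfcc=40):
    """Binary real-vs-fake classifier over MFCC sequences."""
    model = models.Sequential([
        layers.Input(shape=(max_frames, n_mfcc)),
        layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
        layers.Bidirectional(layers.LSTM(32)),
        layers.Dense(32, activation="relu"),
        layers.Dense(1, activation="sigmoid"),  # 1 = deepfake, 0 = genuine
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```

Given labeled clips, the features from extract_mfcc would be stacked into an array of shape (num_clips, max_frames, n_mfcc) and passed to model.fit with 0/1 labels; an STFT-based variant would substitute a spectrogram for the MFCC matrix as the sequence input.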