One voice is all you need: a one-shot approach to recognize you

Dipto, Shahriar Rumi; Nowshin, Priata; Ahmed, Intesur; Chowdhury, Deboraj; Noor, Galib Abdun

dc.contributor.advisor	Chakrabarty, Amitabha
dc.contributor.author	Dipto, Shahriar Rumi
dc.contributor.author	Nowshin, Priata
dc.contributor.author	Ahmed, Intesur
dc.contributor.author	Chowdhury, Deboraj
dc.contributor.author	Noor, Galib Abdun
dc.date.accessioned	2022-01-12T06:16:19Z
dc.date.available	2022-01-12T06:16:19Z
dc.date.copyright	2021
dc.date.issued	2021-09
dc.identifier.other	ID 20141036
dc.identifier.other	ID 20141035
dc.identifier.other	ID 18101685
dc.identifier.other	ID 18101242
dc.identifier.other	ID 20141037
dc.identifier.uri	http://hdl.handle.net/10361/15871
dc.description	This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2021.	en_US
dc.description	Cataloged from PDF version of thesis.
dc.description	Includes bibliographical references (pages 43-49).
dc.description.abstract	Humans can quickly learn unfamiliar concepts by building on what they have previously learned. With this in mind, researchers have explored training machine learning classification models with limited training data. One-shot learning has proven effective in computer vision research, as it can work accurately with a single labeled training example per class and a small training set. By using a single input example from each class, one-shot learning can work more efficiently and quickly. To learn to predict the similarity between two inputs, one-shot learning employs a Siamese network as its neural network architecture. This architecture has been used successfully for various audio-related problems, but its use for one-shot learning in speaker recognition has received less attention. The goal of this thesis is to apply one-shot learning to speaker classification: specific features are extracted from the audio, a Siamese network is trained with triplet loss, and at test time the similarity between a query set and a support set is calculated to recognize the speaker accurately and quickly. The proposed system is trained on the LibriSpeech dataset, which contains audio recordings of different speakers. The final one-shot classification is performed on a few previously unseen classes, using only a single sample of each class and the similarity computed by the model trained as a Siamese network. Accuracy varied with the number of classes tested: 100% for two classes, 95% for three classes, 84% for four classes, and 74% for five classes, which is significantly better than the other algorithms we tested for our solution. The results suggest that Siamese networks are a viable solution to the challenging problem of one-shot audio classification.	en_US
dc.description.statementofresponsibility	Shahriar Rumi Dipto
dc.description.statementofresponsibility	Priata Nowshin
dc.description.statementofresponsibility	Intesur Ahmed
dc.description.statementofresponsibility	Deboraj Chowdhury
dc.description.statementofresponsibility	Galib Abdun Noor
dc.format.extent	49 pages
dc.language.iso	en	en_US
dc.publisher	Brac University	en_US
dc.rights	Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission.
dc.subject	Audio classification	en_US
dc.subject	Siamese neural network	en_US
dc.subject	Speaker recognition	en_US
dc.subject	One-shot learning	en_US
dc.subject	Triplet loss	en_US
dc.subject.lcsh	Multimedia systems
dc.subject.lcsh	Neural networks (Computer science)
dc.title	One voice is all you need: a one-shot approach to recognize you	en_US
dc.type	Thesis	en_US
dc.contributor.department	Department of Computer Science and Engineering, Brac University
dc.description.degree	B. Computer Science
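
The abstract describes two mechanisms: training with triplet loss, and one-shot classification by comparing a query against a support set that holds one sample per speaker. The following is a minimal sketch of both ideas, not the thesis's implementation. The record does not state the embedding network, the distance measure, or the margin, so the squared Euclidean distance, the margin of 0.2, the function names, and the toy two-dimensional embeddings are all assumptions made for illustration.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two embedding vectors (assumed metric)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss: pull the same-speaker pair together and push the
    different-speaker pair apart by at least `margin` (0.2 is an assumed value)."""
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)


def one_shot_predict(query, support):
    """Return the label whose single support embedding is closest to the query.

    `support` maps each speaker label to exactly one embedding (one shot).
    """
    return min(support, key=lambda label: squared_distance(query, support[label]))


# Toy embeddings standing in for the output of a trained Siamese branch.
support_set = {
    "speaker_a": [1.0, 0.0],
    "speaker_b": [0.0, 1.0],
    "speaker_c": [-1.0, 0.0],
}
query_embedding = [0.1, 0.9]

predicted = one_shot_predict(query_embedding, support_set)
easy = triplet_loss([0.0, 0.0], [0.0, 0.1], [1.0, 0.0])  # already separated
hard = triplet_loss([0.0, 0.0], [1.0, 0.0], [0.0, 0.1])  # negative is closer
print(predicted, easy, hard)
```

In this sketch, adding a class means adding one more entry to `support_set`, so every extra class is one more candidate the query can be confused with. That fits the trend in the reported accuracy as the number of classes grows from two to five.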

Files in this item

Name:: 20141036, 20141035, 18101685, ...
Size:: 2.228Mb
Format:: PDF

This item appears in the following Collection(s)

Thesis & Report, BSc (Computer Science and Engineering) [1486]
