Show simple item record

dc.contributor.advisor: Hossain, Muhammad Iqbal
dc.contributor.advisor: Rahman, Rafeed
dc.contributor.author: Shaheen, Munia
dc.contributor.author: Ifti, Akib Zabed
dc.contributor.author: Hassan, Ariful
dc.contributor.author: Hossain, Junaed
dc.date.accessioned: 2024-05-07T09:37:13Z
dc.date.available: 2024-05-07T09:37:13Z
dc.date.copyright: ©2024
dc.date.issued: 2024-01
dc.identifier.other: ID: 23241102
dc.identifier.other: ID: 23341129
dc.identifier.other: ID: 20301259
dc.identifier.other: ID: 23241107
dc.identifier.uri: http://hdl.handle.net/10361/22768
dc.description: This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2024.
dc.description: Cataloged from PDF version of thesis.
dc.description: Includes bibliographical references (pages 47-49).
dc.description.abstract: Understanding speech solely through lip movement is known as lipreading, and it is a crucial component of interpersonal interaction. Most previous initiatives have addressed lipreading in English; our goal is to build a deep neural network for the Bangla language that can produce comprehensible speech from silent videos by capturing only the speaker's lip movements. Although this topic has been studied in several languages, Bangla currently lacks both prior studies and a suitable corpus for such research. We therefore created a dataset of 4,000 videos in which 20 selected Bangla words were pronounced by 65 different speakers. We then implemented models based on a CNN-RNN architecture: two models used in previous research, LipNet and an autoencoder-decoder, and two custom models developed in our own experiments. LipNet exhibits a reasonable level of performance with an accuracy of 62%, while the autoencoder-decoder performs poorly at 49.65%. Custom Model-1 shows a substantial rise in accuracy to 70.86%, and the custom Conv-LSTM exhibits the best overall performance with a maximum accuracy of 76.24%.
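The abstract's best-performing model is a Conv-LSTM, which replaces the matrix multiplications in a standard LSTM cell with convolutions, so the hidden and cell states stay spatial feature maps rather than flat vectors. That property suits sequences of lip-region frames. Since the thesis itself is not reproduced in this record, the following is only a minimal single-channel NumPy sketch of one ConvLSTM time step with assumed 3x3 kernels, not the authors' actual architecture:

```python
import numpy as np

def conv2d_same(x, k):
    """'Same'-padded 2-D convolution (single channel in/out), written with explicit loops."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def convlstm_step(x, h, c, kernels):
    """One ConvLSTM step: every gate is a convolution over the input frame and hidden map.
    kernels holds eight assumed 3x3 filters: Wxi, Whi, Wxf, Whf, Wxo, Who, Wxg, Whg."""
    i = sigmoid(conv2d_same(x, kernels['Wxi']) + conv2d_same(h, kernels['Whi']))  # input gate
    f = sigmoid(conv2d_same(x, kernels['Wxf']) + conv2d_same(h, kernels['Whf']))  # forget gate
    o = sigmoid(conv2d_same(x, kernels['Wxo']) + conv2d_same(h, kernels['Who']))  # output gate
    g = np.tanh(conv2d_same(x, kernels['Wxg']) + conv2d_same(h, kernels['Whg']))  # candidate
    c_next = f * c + i * g          # cell state keeps its spatial layout
    h_next = o * np.tanh(c_next)    # hidden state is a feature map, not a vector
    return h_next, c_next

# Toy 4-frame "lip clip" of 8x8 frames run through the cell.
rng = np.random.default_rng(0)
kernels = {k: rng.standard_normal((3, 3)) * 0.1
           for k in ['Wxi', 'Whi', 'Wxf', 'Whf', 'Wxo', 'Who', 'Wxg', 'Whg']}
h = np.zeros((8, 8))
c = np.zeros((8, 8))
for t in range(4):
    frame = rng.standard_normal((8, 8))
    h, c = convlstm_step(frame, h, c, kernels)
print(h.shape)  # (8, 8): the hidden state is still an 8x8 map after 4 frames
```

In a full word classifier, the final hidden map would typically be pooled and fed to a softmax over the 20-word vocabulary; a framework layer such as Keras's `ConvLSTM2D` implements the same gating with multiple channels and learned kernels.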
dc.description.statementofresponsibility: Munia Shaheen
dc.description.statementofresponsibility: Akib Zabed Ifti
dc.description.statementofresponsibility: Ariful Hassan
dc.description.statementofresponsibility: Junaed Hossain
dc.format.extent: 58 pages
dc.language.iso: en
dc.publisher: Brac University
dc.rights: Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission.
dc.subject: Convolutional neural network (CNN)
dc.subject: Recurrent neural network (RNN)
dc.subject: Lip feature extraction
dc.subject: Lip-reading
dc.subject: Deep learning
dc.subject.lcsh: Machine learning
dc.subject.lcsh: Neural networks (Computer science)
dc.title: Silent voice: harnessing deep learning for lip-reading in Bangla
dc.type: Thesis
dc.contributor.department: Department of Computer Science and Engineering, Brac University
dc.description.degree: B.Sc. in Computer Science and Engineering

