dc.contributor.advisor | Hossain, Muhammad Iqbal | |
dc.contributor.advisor | Rahman, Rafeed | |
dc.contributor.author | Shaheen, Munia | |
dc.contributor.author | Ifti, Akib Zabed | |
dc.contributor.author | Hassan, Ariful | |
dc.contributor.author | Hossain, Junaed | |
dc.date.accessioned | 2024-05-07T09:37:13Z | |
dc.date.available | 2024-05-07T09:37:13Z | |
dc.date.copyright | ©2024 | |
dc.date.issued | 2024-01 | |
dc.identifier.other | ID: 23241102 | |
dc.identifier.other | ID: 23341129 | |
dc.identifier.other | ID: 20301259 | |
dc.identifier.other | ID: 23241107 | |
dc.identifier.uri | http://hdl.handle.net/10361/22768 | |
dc.description | This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2024. | en_US |
dc.description | Cataloged from PDF version of thesis. | |
dc.description | Includes bibliographical references (pages 47-49). | |
dc.description.abstract | Lipreading is the task of understanding speech solely from lip movements, and it is a crucial component of interpersonal communication. Most previous work has addressed lipreading in English; our goal is to build a deep neural network for the Bangla language that can produce comprehensible speech from silent videos by capturing only the speaker's lip movements. Although lipreading has been studied in several languages, Bangla currently lacks both a dedicated study and a suitable corpus for this research. We therefore created a dataset of 4,000 videos in which 20 selected Bangla words were pronounced by 65 different speakers. We then implemented models based on a CNN-RNN architecture: two models used in previous research, LipNet and an autoencoder-decoder, and two custom models developed as part of our own experiments. LipNet exhibits a reasonable level of performance with an accuracy of 62%, while the autoencoder-decoder performs poorly with an accuracy of 49.65%. Custom Model-1 shows a substantial rise in accuracy with 70.86%, and the custom Conv-LSTM exhibits the best overall performance with a maximum accuracy of 76.24%. | en_US |
dc.description.statementofresponsibility | Munia Shaheen | |
dc.description.statementofresponsibility | Akib Zabed Ifti | |
dc.description.statementofresponsibility | Ariful Hassan | |
dc.description.statementofresponsibility | Junaed Hossain | |
dc.format.extent | 58 pages | |
dc.language.iso | en | en_US |
dc.publisher | Brac University | en_US |
dc.rights | Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. | |
dc.subject | Convolutional neural network (CNN) | en_US |
dc.subject | Recurrent neural network (RNN) | en_US |
dc.subject | Lip feature extraction | en_US |
dc.subject | Lip-reading | en_US |
dc.subject | Deep learning | en_US |
dc.subject.lcsh | Machine learning | |
dc.subject.lcsh | Neural networks (Computer science) | |
dc.title | Silent voice: harnessing deep learning for lip-reading in Bangla | en_US |
dc.type | Thesis | en_US |
dc.contributor.department | Department of Computer Science and Engineering, Brac University | |
dc.description.degree | B.Sc. in Computer Science and Engineering | |