
dc.contributor.advisorAlam, Md. Golam Rabiul
dc.contributor.advisorNayla, Nishat
dc.contributor.authorAhmed, Syed Istiaque
dc.contributor.authorHossain, Md. Jubayer
dc.contributor.authorHoque, Kayes Mohammad Bin
dc.contributor.authorTusher, Mahmadur Rahman
dc.contributor.authorIslam, Sajedur
dc.date.accessioned2024-09-09T05:00:39Z
dc.date.available2024-09-09T05:00:39Z
dc.date.copyright©2024
dc.date.issued2024-05
dc.identifier.otherID 20101273
dc.identifier.otherID 20101470
dc.identifier.otherID 20101471
dc.identifier.otherID 20101005
dc.identifier.otherID 23141093
dc.identifier.urihttp://hdl.handle.net/10361/24029
dc.descriptionThis thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2024.en_US
dc.descriptionCataloged from PDF version of thesis.
dc.descriptionIncludes bibliographical references (pages 82-84).
dc.description.abstractIn the rapidly developing field of linguistic technology, phoneme recognition plays a key role in language understanding and language learning. This research develops a recognition system for Bangla vowels, consonants, and numbers spoken by children aged three to six years. Combining modern technological methods with classical phonetic education, we classify spectrogram images of Bengali children’s speech. Image recognition and large language models (LLMs) are among the most pervasive techniques in modern machine learning (ML), and they have been extended to the less explored domain of Bangla phoneme spectrogram image recognition. From a group of 21 participants, we created from scratch a new, balanced dataset of 31,147 spectrogram images. The dataset was prepared meticulously to serve as a complete resource for researchers working on phoneme recognition for Bangla-speaking children. Using this dataset, we trained ten pre-existing deep learning models and optimized their performance on Bangla phoneme recognition. The SENet model stood out among them, reaching 96.89% accuracy on our test set. ResNet50 and VGG19 ranked second and third, with accuracies of 88.8% and 87% respectively. Based on these findings, we propose a novel architecture, the Spectrogram SE-Transformer Block Network (Spectro SETNet), a hybrid that adds SE and Transformer blocks to the ResNet50 model in order to handle more complicated data while limiting computational cost.
The original hypothesis is that the model not only improves the accuracy of Bengali speech recognition for children but also offers a new standard for processing more complex data with less computational power.en_US
dc.format.extent84 pages
dc.language.isoenen_US
dc.publisherBrac University
dc.rightsBrac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission.
dc.subjectAutomatic speech recognitionen_US
dc.subjectCharacter’s recognitionen_US
dc.subjectDeep learningen_US
dc.subjectMel-frequency spectrogramen_US
dc.subjectSpectro-SETNeten_US
dc.subject.lcshAutomatic speech recognition--Data processing.
dc.subject.lcshDeep learning (Machine learning).
dc.subject.lcshSpectrometer--Data processing.
dc.titleComprehensive analysis and development of deep learning models for Bengali character’s spectrogram image classification in child speech: introduction of spectro SETNeten_US
dc.typeThesisen_US
dc.contributor.departmentDepartment of Computer Science and Engineering, Brac University
dc.description.degreeB.Sc in Computer Science

