dc.contributor.advisor | Uddin, Dr. Jia | |
dc.contributor.author | Kabir, Humayun | |
dc.contributor.author | Ahmed, Ruhan | |
dc.contributor.author | Nasib, Abdullah Umar | |
dc.date.accessioned | 2018-02-25T08:00:00Z | |
dc.date.available | 2018-02-25T08:00:00Z | |
dc.date.copyright | 2017 | |
dc.date.issued | 2017-12 | |
dc.identifier.other | ID 14141004 | |
dc.identifier.other | ID 14101042 | |
dc.identifier.other | ID 17341001 | |
dc.identifier.uri | http://hdl.handle.net/10361/9546 | |
dc.description | This thesis report is submitted in partial fulfilment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2017. | en_US |
dc.description | Cataloged from PDF version of thesis report. | |
dc.description | Includes bibliographical references (pages 35-37). | |
dc.description.abstract | This paper demonstrates the use of speech-to-text technology to convert Bangla, spoken naturally and continuously, into Bengali Unicode text with good accuracy. To achieve this we used Sphinx 4, the open-source framework created by Carnegie Mellon University (CMU), which is written in Java and provides the tools required to develop an acoustic model for a custom language such as Bangla. It relies on algorithms such as Baum-Welch to create an acoustic model from training data, which we gathered ourselves. Our main objective was to ensure that the system was adequately trained on a word-by-word basis from various speakers so that it could recognize new speakers fluently. We used a free digital audio workstation (DAW) called Audacity to process the collected recordings with techniques such as continuous frequency profiling to improve the signal-to-noise ratio (SNR), vocal levelling, normalization, and syllable splitting and merging, to ensure an error-free 1:1 word mapping of each utterance to the text of its matching transcription file. The result is a speech-to-text recognition system with an acceptable accuracy of around 75%, trained on recorded speech data from 10 individual speakers, both male and female, using custom transcript files that we wrote. | en_US |
dc.description.statementofresponsibility | Humayun Kabir | |
dc.description.statementofresponsibility | Ruhan Ahmed | |
dc.description.statementofresponsibility | Abdullah Umar Nasib | |
dc.format.extent | 37 pages | |
dc.language.iso | en | en_US |
dc.publisher | BRAC University | en_US |
dc.rights | BRAC University thesis reports are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. | |
dc.subject | Speech-to-text technology | en_US |
dc.subject | Bengali UNICODE | en_US |
dc.subject | Sphinx 4 | en_US |
dc.subject | Carnegie Mellon University | en_US |
dc.subject | Baum-Welch | en_US |
dc.subject | Signal-to-noise-ratio | en_US |
dc.title | Real time Bengali speech to text conversion using CMU Sphinx | en_US |
dc.type | Thesis | en_US |
dc.contributor.department | Department of Computer Science and Engineering, BRAC University | |
dc.description.degree | B. Computer Science and Engineering | |