Bangla Speech-to-Text Conversion Using CMU Sphinx
Abstract
Speech is the most natural form of communication and interaction between people, while
text and images are the most basic forms of exchange in computer systems.
Interest in conversion between speech and text is therefore growing
steadily as a way to improve human-computer interaction. Understanding speech is easy
for a human but difficult for a machine, because a machine does not grasp
expression or human nature. To convert speech into text, the proposed model
uses the open-source framework Sphinx 4, which is written in Java. The
proposed system requires three steps: training an acoustic model, creating
a language model, and building a dictionary with CMUSphinx. For training, the audio
files were recorded by 8 speakers, both male and female, to improve accuracy. Among them, 6
speakers recorded each word 3 times. To test the accuracy, we took audio recordings from
2 speakers, one of whom was unknown to the system. In testing, we obtained an
accuracy of around 59.01%; for known speakers, the accuracy was 78.57%. Audio files were given
as input only to check accuracy, since our main purpose was to build a system that works in
real time. In our system, a user can speak in real time and the system converts the speech into text.
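
As an illustration of how accuracy figures such as those above could be computed, the following minimal Python sketch assumes that accuracy means the percentage of test recordings whose recognized text exactly matches the expected word. This definition, the function name word_accuracy, and the placeholder words are assumptions made for illustration; they are not taken from the system described here.

```python
from typing import Sequence


def word_accuracy(expected: Sequence[str], recognized: Sequence[str]) -> float:
    """Return the percentage of recordings recognized exactly as expected.

    Assumption: one expected word per test recording, compared by exact
    string match after trimming surrounding whitespace.
    """
    if len(expected) != len(recognized):
        raise ValueError("expected and recognized must have the same length")
    if not expected:
        raise ValueError("at least one test recording is required")

    # Count the recordings whose recognizer output equals the reference word.
    correct = sum(
        1 for ref, hyp in zip(expected, recognized) if ref.strip() == hyp.strip()
    )
    return 100.0 * correct / len(expected)


if __name__ == "__main__":
    # Hypothetical placeholder words, not data from the experiments.
    reference = ["word1", "word2", "word3", "word4"]
    hypothesis = ["word1", "word2", "other", "word4"]
    print(f"{word_accuracy(reference, hypothesis):.2f}%")
```

Under this assumed definition, separate figures for known and unknown speakers would be obtained by calling the function once per group of test recordings.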