Conversion of Bengali speech to text using long short-term memory (LSTM)
Abstract
Speech-to-text conversion is an important topic in the field of Artificial Intelligence,
since speech is undoubtedly a significant medium for expressing human feelings and
thoughts. However, compared with text-to-speech, less work has been done on
speech-to-text conversion. Among those works, many languages have received priority,
but the amount of work on the Bengali language is small. A previous similar work
in that language reported 82.35% accuracy using LSTM [15].
Our aim was to achieve higher accuracy in speech-to-text conversion using neural
network models. We built a novel dataset for research purposes. We tried both
GRU and LSTM and later focused on LSTM. The reason is that GRU showed unstable,
fluctuating behavior, whereas LSTM was much more stable and minimized the error
in terms of the loss function; the accuracy of GRU was also lower than that of
LSTM. Increasing the amount of data gave better accuracy, and on the
whole dataset the accuracy on testing data is around 90%. In terms of the loss function,
the testing loss is less than 40%. We also tested the model manually, checking its
output against the expected result, and obtained a 90% accuracy rate on data that the
model had never been fed before. In the future, we would like to work on automatic
sentence recognition, on preparing a response based on the recognized statement, and
on adapting the sentiment of that response accordingly.