RED-LSTM: Real-Time Emotion Detection Using LSTM
Abstract
The growth of the Internet of Things and voice-based multimedia applications has
made it possible to capture and correlate many aspects of human behavior through
big data, in the form of trends and patterns. Human speech carries a latent
representation of emotion in which numerous aspects of the speaker's state are
expressed. Mining audio data to extract sentiment from human speech has therefore
become a priority. The capacity to recognize and categorize human emotion will be
crucial to the next generation of AI, enabling machines to respond to human intent.
However, emotion recognition from audio, such as voice emotion recognition, has not
yet matched the accuracy of text-based emotion recognition. For acoustic data, this
study presents a combined strategy of feature extraction and data encoding with
one-hot vector embeddings. An LSTM-based recurrent neural network (RNN) model is
then used to predict, from real-time audio, the emotion conveyed by the tone of the
human voice. In predicting categorical emotion, the model has been evaluated and
shown to outperform comparable models by about 10%. The model has been tested on
two benchmark datasets, RAVDESS and TESS, which contain voice actors' renditions of
eight different emotions. It surpasses other state-of-the-art models, achieving
approximately 80% weighted accuracy and approximately 85% unweighted accuracy.
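
As a rough illustration of the pipeline summarized above, the following is a minimal sketch, assuming MFCC acoustic features (a common choice not specified in the abstract), a fixed sequence length, and a two-layer Keras LSTM with a softmax output over the eight RAVDESS/TESS emotion classes. All hyperparameters, file paths, and function names here are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' released code): acoustic feature extraction
# followed by an LSTM classifier over 8 categorical emotions.
# MFCC features, sequence length, and layer sizes are illustrative assumptions.
import numpy as np
import librosa
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

NUM_EMOTIONS = 8   # RAVDESS/TESS cover eight categorical emotions
N_MFCC = 40        # assumed number of MFCC coefficients per frame
MAX_FRAMES = 200   # assumed fixed sequence length (pad or truncate)

def extract_features(wav_path: str) -> np.ndarray:
    """Load an audio clip and return a (MAX_FRAMES, N_MFCC) MFCC sequence."""
    signal, sr = librosa.load(wav_path, sr=None)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=N_MFCC).T  # (frames, N_MFCC)
    if mfcc.shape[0] < MAX_FRAMES:  # pad short clips with zero frames
        pad = np.zeros((MAX_FRAMES - mfcc.shape[0], N_MFCC))
        mfcc = np.vstack([mfcc, pad])
    return mfcc[:MAX_FRAMES]

def build_model() -> Sequential:
    """Stacked LSTM over the MFCC sequence, softmax over emotion classes."""
    model = Sequential([
        LSTM(128, return_sequences=True, input_shape=(MAX_FRAMES, N_MFCC)),
        Dropout(0.3),
        LSTM(64),
        Dense(NUM_EMOTIONS, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",  # targets are one-hot vectors
                  metrics=["accuracy"])
    return model
```

The categorical cross-entropy loss here pairs with one-hot encoded emotion labels, mirroring the one-hot vector encoding strategy mentioned in the abstract.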