TinyML for emotion detection in voice signals: evaluating and proposing algorithms for IoT wearable devices
Abstract
In today’s digital world, voice emotion recognition is essential for applications
such as intelligent tutoring, audio mining, security, telecommunication,
human-computer interaction (HCI), lie detection, and human-machine interaction
in various settings. Voice, which people use to express their perspectives and
communicate interpersonally, is one of the characteristics that distinguish
humans. The rise of IoT and wearable technology offers new opportunities
for real-time, remote emotion detection through voice. This thesis
investigates the potential of tiny machine learning (TinyML) for
voice-based emotion recognition, particularly on Internet of Things (IoT)
wearables. To accomplish this goal, we evaluated
Bidirectional LSTM (BiLSTM) and CNN models on both vector-quantized and raw
data, which yielded accuracies of 88%, 80%, 85%, and 81%, respectively, and
LSTM, Random Forest, Logistic Regression, KNN, and GRU models on raw data
only, which yielded accuracies of 86%, 89%, 89%, 86%, and 82%, respectively.
All experiments used a composite dataset that combines the well-known RAVDESS,
CREMA-D, TESS, and SAVEE datasets. Furthermore, the models with the best
accuracy were selected for implementation in the TinyML framework
TensorFlow Lite. Our benchmarks showed that most of the best-performing
models were Recurrent Neural Network (RNN) based, notably
BiLSTM, LSTM, and GRU, alongside the CNN model. Finally, after validating
the findings through a hardware implementation on a Raspberry Pi
4, the study concludes that the BiLSTM model is the most suitable for
speech emotion recognition (SER) tasks in the TinyML domain. The
model's hardware performance illustrates how confidently it predicts
emotions from raw voice input under significant resource and power
constraints. These findings contribute to the ongoing
discourse on the intersection of voice emotion recognition, TinyML, and
IoT, showcasing the potential for enhanced human-machine interactions
in a wide variety of practical domains.