A multimodal approach to sentiment analysis for predicting customer emotions towards a product
Abstract
Sentiment analysis, the science of understanding human emotions, took shape as a field roughly a decade ago, though its roots trace back to the middle of the 19th century. The aim of sentiment analysis is to extract and predict human emotions from facial expressions, speech, or, in some cases, text. Inspired by existing work, we propose a multimodal model that uses both facial cues and speech to forecast customers' sentiment and satisfaction towards a given product. Our model helps companies gain key insights into specific market regions and their customers, and thereby a competitive advantage over their rivals. In this study, we estimate a demographic's perception of a product based on emotions extracted from customers' facial expressions and speech. Although much research has been done in the field, few studies present multifaceted, integrated systems in which the different components rely on each other to produce a single result. We extract people's emotions by recording their facial cues and speech patterns as they interact with a specific product, a mobile phone in our case. We analyze their facial expressions using AWS Rekognition. For the textual part, we analyze sentiment with a recurrent neural network (RNN) built in TensorFlow using the Keras Sequential API. Finally, we merge the emotions obtained from the video with the textual sentiment to form the features for our predictive model, which we train using XGBoost. Using 3-fold cross-validation repeated over 5 iterations, we achieve an average accuracy of approximately 81 percent with a standard deviation of 0.065.
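
To make the facial-analysis step concrete, the following is a minimal sketch (not the authors' published code) of extracting per-face emotion confidences from a sampled video frame with AWS Rekognition via boto3; the region name and the single-face assumption are illustrative choices.

import boto3

rekognition = boto3.client("rekognition", region_name="us-east-1")  # region is an assumption

def frame_emotions(frame_bytes):
    """Return {emotion_type: confidence} for the first detected face in a frame."""
    response = rekognition.detect_faces(
        Image={"Bytes": frame_bytes},  # JPEG/PNG bytes of one sampled video frame
        Attributes=["ALL"],            # "ALL" is needed to receive emotion attributes
    )
    if not response["FaceDetails"]:
        return {}                      # no face detected in this frame
    face = response["FaceDetails"][0]
    return {e["Type"]: e["Confidence"] for e in face["Emotions"]}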
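Similarly, the kind of TensorFlow/Keras Sequential RNN described for textual sentiment might be sketched as below; the vocabulary size, embedding width, and layer sizes are assumptions rather than the paper's reported configuration.

import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB_SIZE = 10000  # assumed vocabulary size for the tokenized transcripts

model = models.Sequential([
    layers.Embedding(VOCAB_SIZE, 64),       # token embeddings
    layers.SimpleRNN(64),                   # recurrent layer summarizing the sequence
    layers.Dense(1, activation="sigmoid"),  # positive/negative sentiment probability
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(padded_sequences, labels, ...) would then train on padded token sequences.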
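Finally, the feature fusion and the evaluation protocol (3-fold cross-validation repeated over 5 iterations) could be sketched as follows; the feature shapes and random data are placeholders, and the XGBoost hyperparameters are left at their defaults as an assumption.

import numpy as np
from sklearn.model_selection import RepeatedKFold, cross_val_score
from xgboost import XGBClassifier

# Placeholder inputs: per-customer averaged Rekognition emotion confidences,
# the RNN's textual sentiment score, and satisfaction labels (assumed shapes).
X_face = np.random.rand(200, 8)
X_text = np.random.rand(200, 1)
y = np.random.randint(0, 2, 200)

X = np.hstack([X_face, X_text])  # simple feature-level fusion of the two modalities

cv = RepeatedKFold(n_splits=3, n_repeats=5, random_state=42)  # 3 folds, 5 repeats
scores = cross_val_score(XGBClassifier(eval_metric="logloss"), X, y,
                         scoring="accuracy", cv=cv)
print(f"mean accuracy: {scores.mean():.3f}, std: {scores.std():.3f}")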