Classification of hotel reviews using sentiment analysis and machine learning
Abstract
Social media has become essential for people all over the world. It gives people a
platform to share thoughts, emotions, opinions, and ideas, which has led to a sharp
increase in the volume of data. Data at this scale can be analyzed through sentiment
analysis and text classification by constructing an effective machine learning
model. Such analysis yields insight that is nearly impossible to obtain manually
because of the sheer volume of the data. This research
focuses on users' comments and reviews about different hotels to predict their
sentiment. For the dataset, comments and reviews of hotels from online sites
have been utilized. Moreover, text pre-processing techniques like tokenization, case
folding, stopword removal, lemmatization, and duplicate data removal have been
applied. TF-IDF and Bag of Words have been applied to represent the text as numerical features. Furthermore,
the effectiveness of supervised machine learning algorithms such as Support
Vector Machine, Naïve Bayes, Random Forest, and Logistic Regression was evaluated.
The comparative analysis showed that Logistic Regression achieved the highest
accuracy, ranging from 86 to 89 percent.
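
The pre-processing and feature-extraction steps named above can be illustrated with a minimal sketch. It is not the authors' implementation: the stopword list, the sample reviews, and the function names (preprocess, bag_of_words, tf_idf) are hypothetical, lemmatization is omitted because it needs an external lexical resource, and the TF-IDF weighting assumes the plain textbook form tf * ln(N / df), which may differ from the variant used in the study or in common libraries.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; the paper's list is not given).
STOPWORDS = {"the", "a", "an", "and", "was", "is", "were", "it", "to", "of", "in"}

def preprocess(reviews):
    """Case-fold, tokenize, drop stopwords, and remove duplicate reviews."""
    seen, cleaned = set(), []
    for text in reviews:
        tokens = [t for t in re.findall(r"[a-z']+", text.lower())
                  if t not in STOPWORDS]
        key = tuple(tokens)
        if tokens and key not in seen:  # duplicate removal after normalization
            seen.add(key)
            cleaned.append(tokens)
    return cleaned

def bag_of_words(docs):
    """Return the sorted vocabulary and one raw term-count vector per document."""
    vocab = sorted({t for doc in docs for t in doc})
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[t] for t in vocab])
    return vocab, vectors

def tf_idf(docs):
    """Return the vocabulary and TF-IDF vectors, with idf = ln(N / df)."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents containing each vocabulary term.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    vectors = []
    for row, doc in zip(counts, docs):
        vectors.append([(row[j] / len(doc)) * math.log(n_docs / df[j])
                        for j in range(len(vocab))])
    return vocab, vectors

# Hypothetical sample reviews, for illustration only.
reviews = [
    "The room was clean and the staff were friendly",
    "The room was dirty and the staff were rude",
    "The room was clean and the staff were friendly",  # exact duplicate
]
docs = preprocess(reviews)
vocab, bow = bag_of_words(docs)
_, weights = tf_idf(docs)
```

In this sketch, a term that occurs in every review (such as "room") receives a TF-IDF weight of zero, while terms that distinguish one review from another (such as "clean" or "dirty") keep a positive weight. Vectors of this kind are what a supervised classifier such as Logistic Regression would take as input.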