A framework for sentiment analysis: a data-driven approach
Abstract
The Internet offers free and easy access to an immense amount of raw text that can be mined for sentiment analysis. Sentiment analysis has long been used for market research, user opinion mining, recommendation systems, and analysis of people's views on a topic. Many different techniques have been developed, yet many complications remain. Selecting and understanding attribute patterns in a text dataset is important both for building a good model and for knowing where that model can be used. Different text datasets have different relations between their attributes and classes. For example, consider a dataset of entirely random English texts labelled as positive or negative. We would expect the attributes extracted for each class to be dominated by general words that are considered positive or negative in everyday English. However, if the dataset covers a niche topic, such as economics or a pandemic, we would probably see that the positive and negative classes are dominated by words specific to that topic, or that the classifier does not consider such words important at all. In the latter case, we might want to give importance to those niche-specific attributes explicitly. In this paper, we use five datasets with different instance lengths. Using Weka, we apply several attribute selection techniques, perform sentence-level sentiment analysis, and finally extract patterns from the datasets for analysis. There are few related works on these datasets, and our technique performed better than the existing ones. In particular, it outperformed the Fuzzy method in terms of accuracy and extracted polarity from texts more effectively.
Our approach has been shown to work better with these datasets than many earlier methods. In this paper, we aim to present a method that can be applied readily to any text mining dataset and achieve decent accuracy.
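The contrast drawn in the abstract between general and niche datasets can be illustrated with a minimal sketch. It ranks word-presence attributes by information gain with respect to the class label. Information gain is used here only as one common attribute-scoring choice; the abstract does not state which selection techniques the paper applies, and this pure-Python code is not the Weka workflow. The function names and the two tiny datasets are hypothetical and invented purely for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(docs, labels, word):
    """Drop in class entropy from knowing whether `word` occurs in a document.

    `docs` is a list of sets of lowercase tokens, one set per document.
    """
    with_word = [y for d, y in zip(docs, labels) if word in d]
    without_word = [y for d, y in zip(docs, labels) if word not in d]
    total = len(labels)
    remainder = (len(with_word) / total) * entropy(with_word) + (
        len(without_word) / total
    ) * entropy(without_word)
    return entropy(labels) - remainder


def rank_attributes(texts, labels):
    """Return (word, information gain) pairs, highest gain first."""
    docs = [set(t.lower().split()) for t in texts]
    vocab = sorted(set().union(*docs))
    scored = [(w, information_gain(docs, labels, w)) for w in vocab]
    # Sort by descending gain, breaking ties alphabetically.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical toy data: a "general" dataset and a "niche" (pandemic) dataset.
general_texts = ["good movie", "good food", "bad service", "bad weather"]
general_labels = ["pos", "pos", "neg", "neg"]

niche_texts = [
    "vaccine approved today",
    "vaccine rollout begins",
    "lockdown extended today",
    "lockdown rules tighten",
]
niche_labels = ["pos", "pos", "neg", "neg"]

if __name__ == "__main__":
    for name, texts, labels in [
        ("general", general_texts, general_labels),
        ("niche", niche_texts, niche_labels),
    ]:
        print(name, rank_attributes(texts, labels)[:3])
```

In the toy general dataset the top-ranked attributes are everyday polarity words, while in the toy niche dataset they are topic-specific words that carry no polarity in ordinary English. A word that appears equally in both classes ("today" here) receives zero gain.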