Quality assessment of extracted information from newspaper comment sections using natural language processing
Abstract
Newspaper comment sections, where readers can leave their opinions, can be an
excellent source of supplementary information if used properly. Although there is
a risk of fake news and misinformation being spread through the comment section,
quality information can also be extracted from these comments to supplement
the original news. Recent research has shown that comments range from
irrelevant to informative, and in this thesis we aim to identify informative
news comments that can be used to supplement the original news
article. We also assess the level of informativeness of a newspaper comment
to determine whether the task of assigning the Editor's Pick flag (currently
done by hand at major news outlets) can be automated with the help of state-of-the-art natural
language processing and information extraction techniques. We evaluated the similarity
between comments and their respective news articles using transformer models
such as Sentence-BERT (SBERT). Furthermore, we checked whether a comment logically entails
its news article using different models, ranging from a simple RNN and an LSTM to
advanced transformer models such as RoBERTa and ELECTRA. The final model for the
textual entailment task (RoBERTa) outperformed all other models, achieving an
accuracy of 88.60%, and the final model for the textual similarity task (SBERT)
outperformed all other similarity models with an accuracy of 68.49%.