Quality assessment of extracted information from newspaper comment sections using natural language processing
Abstract
Newspaper comment sections, where readers can leave their opinions, can be an
excellent source of supplementary information if used properly. Although there is
a risk of fake news and misinformation being spread through the comment section,
quality information can also be extracted from these comments to supplement
the original news. Recent research has shown that comments range from
irrelevant to informative, and in this thesis we aim to identify informative
news comments that can be used to supplement the original news
article. We also assess the level of informativeness of a newspaper comment
to determine whether the task of assigning the Editor's Pick flag (currently
done by hand at major news outlets) can be automated with the help of state-of-the-art natural
language processing and information extraction techniques. We evaluated the similarity
between comments and their respective news articles using transformer models
such as Sentence-BERT (SBERT). Furthermore, we checked whether a comment logically entails
its news article using different models, ranging from a simple RNN and an LSTM to
advanced transformer models such as RoBERTa and ELECTRA. The final model for the
textual entailment task (RoBERTa) outperformed all other models, achieving an
accuracy of 88.60%, and the final model for the textual similarity task (SBERT)
outperformed all other similarity models with an accuracy of 68.49%.