Automated reference validation for scholarly publications using NLP
Abstract
Accurate references in scholarly publications are a crucial aspect of scientific writing.
The manual validation of references can be a time-consuming and error-prone
process. This research introduces an updated version of an automated reference
validation model that makes the peer review process more efficient. The proposed
model uses Natural Language Processing to generate sentence embeddings with an
efficient algorithm. Our model first breaks down the
scholarly article into sections and uses topic modeling to group the sections
by context. It then generates sentence embeddings for each section. These
embeddings are collected into sets and used to calculate the semantic
similarity between the query and the referenced article. Additionally, this methodology
accounts for references that are valid in non-contextual scenarios, such as those
sharing common named entities. Lastly, feature engineering is used to improve
performance. We have created a dataset of scholarly papers with manually verified
references to evaluate the efficiency and accuracy of our model. This improved version
of the reference validation model aims to outperform traditional models such
as Document-BERT, BERT, and SBERT in both efficiency and accuracy. The
model can be used in interactive real-time systems, providing quick and reliable
feedback to peer reviewers. This study aims to contribute to the field
of automated reference validation in scholarly publications. The model offers a
solution to the limitations of manual validation, which makes it a valuable tool for
peer reviewers and researchers.
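
As a rough illustration of the validation step described above, the sketch below scores a query against a set of sentence embeddings from a referenced article and falls back to a shared-entity check when the semantic score is low. It is a minimal sketch under stated assumptions, not the model itself: the bag-of-words `embed` function is a toy stand-in for a real sentence-embedding model, the capitalized-token check is a crude stand-in for named-entity recognition, and all function names and the 0.5 threshold are hypothetical. Topic modeling and feature engineering are omitted.

```python
import math
import re
from collections import Counter

# Hypothetical stoplist so common sentence-initial words are not treated as entities.
_COMMON = {"The", "A", "An", "This", "We", "Our", "It", "In"}


def embed(sentence):
    """Toy stand-in for a sentence-embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def max_similarity(query, reference_sentences):
    """Best match between the query and a set of reference sentence embeddings."""
    q = embed(query)
    return max((cosine(q, embed(s)) for s in reference_sentences), default=0.0)


def shared_entities(query, reference_text):
    """Crude named-entity proxy: capitalized tokens that appear in both texts."""
    def entities(text):
        return set(re.findall(r"\b[A-Z][A-Za-z0-9]*\b", text)) - _COMMON
    return entities(query) & entities(reference_text)


def validate_reference(query, reference_sentences, threshold=0.5):
    """Return (is_valid, reason), where reason is 'semantic', 'entity', or 'none'."""
    if max_similarity(query, reference_sentences) >= threshold:
        return True, "semantic"
    if shared_entities(query, " ".join(reference_sentences)):
        return True, "entity"  # non-contextual case: only a named entity is shared
    return False, "none"
```

For example, `validate_reference("We fine-tune BERT for this task.", ["Devlin introduced BERT, a bidirectional transformer."])` has low lexical overlap but shares the entity `BERT`, so this sketch accepts it through the entity branch. Swapping `embed` for a trained sentence encoder would leave the rest of the control flow unchanged.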