Plagiarism detection using semantic analysis
Abstract
Plagiarism, in the sense of “theft of intellectual property”, has been around for as long as humans have produced works of art and research. However, easy access to the Web, large databases, and telecommunication in general has turned plagiarism into a serious problem for publishers, researchers and educational institutions. Plagiarism detection aims to identify the theft of scientific papers, literary works, source code, etc.
An existing method for finding similar documents is to use Self-Organizing Maps (SOMs)1. However, creating these maps raises efficiency challenges, such as long processing time. To facilitate the recognition of plagiarism, researchers2,3 at MIT used a set of low-level syntactic structures to evaluate content and expression in a document. However, we think that syntactic structures alone may not give optimal results in detecting plagiarism, because they may not always capture the underlying meaning.
To detect plagiarism, we propose a synonym- and antonym-based framework that evaluates text similarity with respect to the similarity of content between the original and the plagiarized document. Rather than using low-level syntactic structures, i.e. context-free grammar (CFG)4, we use synonymic features of sentences, which we think will improve the overall combat against plagiarism.
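As a rough illustration of the synonym-based idea, the following minimal sketch maps each word to a representative of its synonym group and then measures the overlap between two sentences. It is not the framework itself: the synonym table `SYNONYM_GROUPS`, the function names, and the choice of Jaccard overlap as the score are all assumptions made for this example, and antonym handling is left out because its role is not specified here. A real system would presumably draw synonyms from a lexical resource rather than a hand-written table.

```python
# Hypothetical toy synonym table; a real system would use a lexical resource.
SYNONYM_GROUPS = [
    {"big", "large", "huge"},
    {"quick", "fast", "rapid"},
    {"car", "automobile"},
    {"buy", "purchase"},
]

# Map every word to one canonical representative of its synonym group
# (here simply the alphabetically smallest member).
CANONICAL = {word: min(group) for group in SYNONYM_GROUPS for word in group}


def normalize(sentence):
    """Lowercase, strip punctuation, and replace words by their canonical synonym."""
    words = (w.strip(".,;:!?\"'()") for w in sentence.lower().split())
    return {CANONICAL.get(w, w) for w in words if w}


def synonym_similarity(a, b):
    """Jaccard overlap of the synonym-normalized word sets (an assumed measure)."""
    sa, sb = normalize(a), normalize(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


original = "He will buy a fast car."
suspect = "He will purchase a rapid automobile."
print(synonym_similarity(original, suspect))
```

In this toy case the two sentences share only three of nine distinct surface words, yet their normalized word sets are identical, so a reworded copy that swaps in synonyms is still flagged as highly similar.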