Plagiarism detection using semantic analysis

Shams, Khalid

View/Open

Thesis Report.pdf (549.7Kb)

Date

2010-04

Publisher

BRAC University

Abstract

Plagiarism, in the sense of “theft of intellectual property”, has been around for as long as humans have produced works of art and research. However, easy access to the Web, large databases, and telecommunication in general has turned plagiarism into a serious problem for publishers, researchers, and educational institutions. Plagiarism detection is a technique for identifying the theft of scientific papers, literary works, source code, and similar material. One existing method for finding similar documents uses Self-Organizing Maps (SOMs) [1], but creating these maps raises efficiency challenges such as processing time. To facilitate recognition of plagiarism, researchers at MIT [2,3] used a set of low-level syntactic structures to evaluate content and expression in a document. However, we think that syntactic structures alone may not give optimal results in detecting plagiarism, because they may not always capture the underlying meaning of the text. To detect plagiarism, we propose a synonym- and antonym-based framework that evaluates text similarity with respect to the similarity of content between the original and the plagiarized document. Rather than using low-level syntactic structures, i.e. Context-Free Grammar (CFG) [4], the framework uses synonymic features of sentences, which we think will improve the overall combat against plagiarism.
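The idea of comparing texts by synonymic features rather than surface wording can be illustrated with a minimal sketch. It is not the thesis's implementation: the tiny thesaurus `SYNONYM_GROUPS`, the function names, and the choice of Jaccard overlap as the similarity measure are all assumptions made for illustration, and antonym handling is left out.

```python
import re

# Hypothetical toy thesaurus (an assumption for illustration only).
# A real system would draw synonym sets from a lexical resource.
SYNONYM_GROUPS = [
    {"big", "large", "huge"},
    {"quick", "fast", "rapid"},
    {"car", "automobile"},
]


def build_canonical_map(groups):
    """Map every word in a synonym group to one representative word."""
    canon = {}
    for group in groups:
        representative = min(group)  # alphabetically first member
        for word in group:
            canon[word] = representative
    return canon


CANON = build_canonical_map(SYNONYM_GROUPS)


def normalize(text, canon):
    """Lowercase, tokenize, and replace each word by its representative."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {canon.get(token, token) for token in tokens}


def synonym_jaccard(a, b, canon=CANON):
    """Jaccard overlap of two texts after synonym normalization."""
    set_a, set_b = normalize(a, canon), normalize(b, canon)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def plain_jaccard(a, b):
    """Baseline: the same overlap measure without synonym normalization."""
    return synonym_jaccard(a, b, canon={})


original = "The quick car is big"
suspect = "The fast automobile is huge"
print(plain_jaccard(original, suspect))    # only "the" and "is" match
print(synonym_jaccard(original, suspect))  # every word matches a synonym
```

In this sketch, a sentence reworded purely by synonym substitution scores low on plain word overlap but high once synonyms are collapsed, which is the kind of disguised copying the proposed framework targets.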

Keywords

Computer science and engineering

Description

This thesis report is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2010.

Cataloged from PDF version of thesis report.

Includes bibliographical references (page 24).

Department

Department of Computer Science and Engineering, BRAC University

Type

Thesis

Collections

Thesis & Report, BSc (Computer Science and Engineering) [1583]