SimPaD: A word-similarity sentence-based plagiarism detection tool on Web documents

Maria Soledad Pera, Yiu Kai Ng*

*Corresponding author for this work

Research output: Contribution to journal › Article › Scientific › peer-review

12 Citations (Scopus)

Abstract

Plagiarism is a serious problem: it infringes on copyrighted documents and materials, is an unethical practice, and reduces the economic incentive received by their legal owners. The problem is getting worse because the growing number of online publications, and easy access to them on the Web, make it simple to locate and paraphrase information. To address it, we propose a novel plagiarism-detection method, called SimPaD, which (i) establishes the degree of resemblance between any two documents D1 and D2 based on their sentence-to-sentence similarity, computed using pre-defined word-correlation factors, and (ii) generates a graphical view of the sentences that are similar (or identical) in D1 and D2. Experimental results verify that SimPaD is highly accurate in detecting (non-)plagiarized documents and outperforms existing plagiarism-detection approaches.
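The abstract describes SimPaD's pipeline only at a high level. The Python sketch below illustrates one plausible way to turn word-correlation factors into a sentence-to-sentence similarity and then into a document-level degree of resemblance. It is not the published SimPaD formula: the toy correlation table `WCF`, the fuzzy 1 − ∏(1 − wcf) aggregation, the 0.8 `threshold`, the regex tokenizer, and every function name are hypothetical assumptions chosen for illustration.

```python
import re
from typing import Dict, FrozenSet, List, Tuple

# HYPOTHETICAL word-correlation factors in [0, 1] for unordered word pairs.
# SimPaD relies on pre-defined factors; these toy values are invented.
WCF: Dict[FrozenSet[str], float] = {
    frozenset({"car", "automobile"}): 0.9,
    frozenset({"fast", "quick"}): 0.8,
}


def tokenize(sentence: str) -> List[str]:
    """Lower-case the sentence and keep alphanumeric runs as words."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def wcf(a: str, b: str) -> float:
    """Correlation of two words: 1.0 if identical, else the table value (0.0 if absent)."""
    if a == b:
        return 1.0
    return WCF.get(frozenset({a, b}), 0.0)


def word_to_sentence(word: str, sentence: List[str]) -> float:
    """Assumed fuzzy degree to which `word` is covered by `sentence`: 1 - prod(1 - wcf)."""
    uncovered = 1.0
    for other in sentence:
        uncovered *= 1.0 - wcf(word, other)
    return 1.0 - uncovered


def sentence_similarity(s1: List[str], s2: List[str]) -> float:
    """Mean coverage of the words of s1 by s2 (asymmetric, in [0, 1])."""
    if not s1:
        return 0.0
    return sum(word_to_sentence(w, s2) for w in s1) / len(s1)


def resemblance(
    doc1: List[str], doc2: List[str], threshold: float = 0.8
) -> Tuple[float, List[Tuple[int, int, float]]]:
    """Fraction of doc1 sentences whose best match in doc2 reaches `threshold`,
    plus the matched (i, j, score) sentence pairs."""
    tokens2 = [tokenize(s) for s in doc2]
    pairs: List[Tuple[int, int, float]] = []
    for i, sentence in enumerate(doc1):
        tokens1 = tokenize(sentence)
        best_j, best = -1, 0.0
        for j, t2 in enumerate(tokens2):
            score = sentence_similarity(tokens1, t2)
            if score > best:
                best_j, best = j, score
        # Keep only sentences with a real best match at or above the threshold.
        if best_j >= 0 and best >= threshold:
            pairs.append((i, best_j, best))
    degree = len(pairs) / len(doc1) if doc1 else 0.0
    return degree, pairs


d1 = ["The car is fast.", "It rained all day."]
d2 = ["The automobile is quick.", "Stocks fell sharply."]
degree, pairs = resemblance(d1, d2)
print(degree, pairs)
```

In this toy run, the first sentence of `d1` is a paraphrase of the first sentence of `d2` and scores about 0.925, while the second sentence has no counterpart. The resulting degree of resemblance is therefore 0.5. The list of matched `(i, j, score)` pairs is the kind of data from which a graphical view of similar sentences could be drawn. Because the score is computed from `doc1`'s side, swapping the two documents can give a different value.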

Original language: English
Pages (from-to): 27-41
Number of pages: 15
Journal: Web Intelligence and Agent Systems
Volume: 9
Issue number: 1
DOIs
Publication status: Published - 2011
Externally published: Yes

Keywords

  • graphical view
  • Plagiarism
  • sentence similarity
  • word manipulation
  • word-correlation factor
