SimPaD: A word-similarity sentence-based plagiarism detection tool on Web documents

Maria Soledad Pera; Yiu Kai Ng

doi:10.3233/WIA-2011-0203

Maria Soledad Pera, Yiu Kai Ng*

*Corresponding author for this work

Research output: Contribution to journal › Article › Scientific › peer-review

12 Citations (Scopus)

Abstract

Plagiarism is a serious problem that infringes copyrighted documents/materials, which is an unethical practice and decreases the economic incentive received by their legal owners. Unfortunately, plagiarism is getting worse due to the increasing number of on-line publications and easy access on the Web, which facilitates locating and paraphrasing information. In solving this problem, we propose a novel plagiarism-detection method, called SimPaD, which (i) establishes the degree of resemblance between any two documents D₁ and D₂ based on their sentence-to-sentence similarity computed by using pre-defined word-correlation factors, and (ii) generates a graphical view of sentences that are similar (or the same) in D₁ and D₂. Experimental results verify that SimPaD is highly accurate in detecting (non-)plagiarized documents and outperforms existing plagiarism-detection approaches.
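The abstract does not give SimPaD's formulas, so the following is only a minimal sketch of the general idea in step (i): scoring sentence pairs with a table of word-correlation factors and then aggregating to a document-level resemblance score. Everything in it is an assumption made for illustration: the `CORRELATION` table and its values, the averaging rule in `sentence_similarity`, the `threshold` parameter, and all function names. None of it is taken from the paper.

```python
from typing import Dict, FrozenSet, List

# Hypothetical word-correlation factors in [0, 1]; the values are invented
# for illustration and are not the factors used by SimPaD.
CORRELATION: Dict[FrozenSet[str], float] = {
    frozenset({"car", "automobile"}): 0.9,
    frozenset({"fast", "quick"}): 0.8,
}


def tokenize(sentence: str) -> List[str]:
    """Lowercase a sentence and split it on whitespace (a deliberately naive tokenizer)."""
    return sentence.lower().split()


def word_correlation(w1: str, w2: str) -> float:
    """Return 1.0 for identical words, the table value for known pairs, else 0.0."""
    if w1 == w2:
        return 1.0
    return CORRELATION.get(frozenset({w1, w2}), 0.0)


def sentence_similarity(s1: List[str], s2: List[str]) -> float:
    """Average, over the words of s1, of the best correlation with any word of s2.

    This averaging rule is an assumption; note that it is not symmetric.
    """
    if not s1 or not s2:
        return 0.0
    return sum(max(word_correlation(a, b) for b in s2) for a in s1) / len(s1)


def resemblance(d1: List[List[str]], d2: List[List[str]], threshold: float = 0.7) -> float:
    """Fraction of sentences in d1 that have a counterpart in d2 scoring at least `threshold`."""
    if not d1:
        return 0.0
    matched = sum(
        1 for s in d1 if any(sentence_similarity(s, t) >= threshold for t in d2)
    )
    return matched / len(d1)


if __name__ == "__main__":
    doc1 = [tokenize("The car is fast")]
    doc2 = [tokenize("The automobile is quick")]
    print(sentence_similarity(doc1[0], doc2[0]))
    print(resemblance(doc1, doc2))
```

In this sketch a paraphrased sentence can still score highly because substituted words contribute their correlation factor instead of zero, which is the intuition behind using word similarity rather than exact matching. The sentence pairs that pass the threshold are also the kind of data a graphical view such as the one in step (ii) could display.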

Original language	English
Pages (from-to)	27-41
Number of pages	15
Journal	Web Intelligence and Agent Systems
Volume	9
Issue number	1
DOIs	https://doi.org/10.3233/WIA-2011-0203
Publication status	Published - 2011
Externally published	Yes

Keywords

graphical view
Plagiarism
sentence similarity
word manipulation
word-correlation factor

Access to Document

10.3233/WIA-2011-0203

Cite this

@article{4ad6f4d5fcfd4ad09bea0051e7427a21,
  title = "SimPaD: A word-similarity sentence-based plagiarism detection tool on Web documents",
  abstract = "Plagiarism is a serious problem that infringes copyrighted documents/materials, which is an unethical practice and decreases the economic incentive received by their legal owners. Unfortunately, plagiarism is getting worse due to the increasing number of on-line publications and easy access on the Web, which facilitates locating and paraphrasing information. In solving this problem, we propose a novel plagiarism-detection method, called SimPaD, which (i) establishes the degree of resemblance between any two documents D1 and D2 based on their sentence-to-sentence similarity computed by using pre-defined word-correlation factors, and (ii) generates a graphical view of sentences that are similar (or the same) in D1 and D2. Experimental results verify that SimPaD is highly accurate in detecting (non-)plagiarized documents and outperforms existing plagiarism-detection approaches.",
  keywords = "graphical view, Plagiarism, sentence similarity, word manipulation, word-correlation factor",
  author = "Pera, {Maria Soledad} and Ng, {Yiu Kai}",
  year = "2011",
  doi = "10.3233/WIA-2011-0203",
  language = "English",
  volume = "9",
  pages = "27--41",
  journal = "Web Intelligence and Agent Systems",
  issn = "1570-1263",
  publisher = "IOS Press",
  number = "1",
}

TY  - JOUR
T1  - SimPaD
T2  - A word-similarity sentence-based plagiarism detection tool on Web documents
AU  - Pera, Maria Soledad
AU  - Ng, Yiu Kai
PY  - 2011
Y1  - 2011
N2  - Plagiarism is a serious problem that infringes copyrighted documents/materials, which is an unethical practice and decreases the economic incentive received by their legal owners. Unfortunately, plagiarism is getting worse due to the increasing number of on-line publications and easy access on the Web, which facilitates locating and paraphrasing information. In solving this problem, we propose a novel plagiarism-detection method, called SimPaD, which (i) establishes the degree of resemblance between any two documents D1 and D2 based on their sentence-to-sentence similarity computed by using pre-defined word-correlation factors, and (ii) generates a graphical view of sentences that are similar (or the same) in D1 and D2. Experimental results verify that SimPaD is highly accurate in detecting (non-)plagiarized documents and outperforms existing plagiarism-detection approaches.
AB  - Plagiarism is a serious problem that infringes copyrighted documents/materials, which is an unethical practice and decreases the economic incentive received by their legal owners. Unfortunately, plagiarism is getting worse due to the increasing number of on-line publications and easy access on the Web, which facilitates locating and paraphrasing information. In solving this problem, we propose a novel plagiarism-detection method, called SimPaD, which (i) establishes the degree of resemblance between any two documents D1 and D2 based on their sentence-to-sentence similarity computed by using pre-defined word-correlation factors, and (ii) generates a graphical view of sentences that are similar (or the same) in D1 and D2. Experimental results verify that SimPaD is highly accurate in detecting (non-)plagiarized documents and outperforms existing plagiarism-detection approaches.
KW  - graphical view
KW  - Plagiarism
KW  - sentence similarity
KW  - word manipulation
KW  - word-correlation factor
UR  - http://www.scopus.com/inward/record.url?scp=79551697630&partnerID=8YFLogxK
U2  - 10.3233/WIA-2011-0203
DO  - 10.3233/WIA-2011-0203
M3  - Article
AN  - SCOPUS:79551697630
SN  - 1570-1263
VL  - 9
SP  - 27
EP  - 41
JO  - Web Intelligence and Agent Systems
JF  - Web Intelligence and Agent Systems
IS  - 1
ER  -
