On retrieving intelligently plagiarized documents using semantic similarity

SF Hussain, A Suryani - Engineering Applications of Artificial Intelligence, 2015 - Elsevier
Abstract
Plagiarism in text documents can be done in many ways. The most common form of plagiarizing a text document is to copy a chunk of text and alter it intelligently, thereby making it look original. Such cases are hard to detect because they require semantic analysis of the document. External sources of knowledge such as WordNet have been employed to help detect such cases. However, such an approach may often miss the contextual significance of the words used, and it also suffers from the problems of synonymy and polysemy. We propose an architecture built on a semantic similarity measure that exploits the similarity of words as mined from within the data corpus, thereby using localized contextual information. In this work, we propose an approach for detecting plagiarism in text documents that uses this semantic similarity measure with a Nearest Neighbor (NN) search, and as a kernel in a multiclass support vector machine. We test our approach on a plagiarism dataset specially developed to test the efficacy of the solution with varying levels of plagiarism. The results are compared with those of the well-known commercial software Turnitin®, which has access to a large database. Our experiments suggest that using semantic kernels can help detect plagiarism, which can outsmart available techniques.
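The abstract does not spell out the exact similarity measure, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that word similarity is the cosine between co-occurrence context vectors mined from the corpus itself (a window of two words on each side), that documents are compared with a normalized kernel K(a, b) = Σ tf_a(i) · sim(i, j) · tf_b(j), and that retrieval is a 1-nearest-neighbor search over the source documents. All function names (`context_vectors`, `word_similarity`, `semantic_kernel`, `normalized_kernel`, `nearest_source`) are hypothetical, and the multiclass SVM stage is omitted.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and strip simple punctuation (a deliberately naive tokenizer)."""
    tokens = []
    for raw in text.split():
        word = raw.strip(".,;:!?\"'()").lower()
        if word:
            tokens.append(word)
    return tokens


def context_vectors(docs, window=2):
    """Assumption: a word's meaning is the Counter of words seen within
    `window` positions of it, collected over the whole corpus."""
    vectors = defaultdict(Counter)
    for tokens in docs:
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b.get(key, 0) for key, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def word_similarity(w1, w2, vectors):
    """Corpus-derived word similarity; identical words score 1."""
    if w1 == w2:
        return 1.0
    # dict.get avoids inserting unseen words into the defaultdict
    return cosine(vectors.get(w1, Counter()), vectors.get(w2, Counter()))


def semantic_kernel(doc_a, doc_b, vectors):
    """Unnormalized K(a, b) = sum_ij tf_a(i) * sim(i, j) * tf_b(j)."""
    tf_a, tf_b = Counter(doc_a), Counter(doc_b)
    return sum(ca * cb * word_similarity(wa, wb, vectors)
               for wa, ca in tf_a.items() for wb, cb in tf_b.items())


def normalized_kernel(doc_a, doc_b, vectors):
    """Cosine-style normalization so that K(a, a) == 1."""
    denom = math.sqrt(semantic_kernel(doc_a, doc_a, vectors)
                      * semantic_kernel(doc_b, doc_b, vectors))
    return semantic_kernel(doc_a, doc_b, vectors) / denom if denom else 0.0


def nearest_source(suspicious, sources, vectors):
    """1-NN search: index and score of the most similar source document."""
    scores = [normalized_kernel(suspicious, src, vectors) for src in sources]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


if __name__ == "__main__":
    sources = [tokenize("The cat sat on the mat."),
               tokenize("Stock prices fell sharply today.")]
    suspicious = tokenize("The kitten sat on the rug.")
    vectors = context_vectors(sources + [suspicious])
    print(word_similarity("cat", "kitten", vectors))
    print(nearest_source(suspicious, sources, vectors))
```

In this toy corpus "cat" and "kitten" occur in the same contexts, so the kernel treats them as near-synonyms even though no external resource such as WordNet is consulted. Because the word-similarity matrix is a Gram matrix of normalized context vectors, the kernel is positive semidefinite, so a matrix of these values could in principle be passed to a kernel SVM (for example scikit-learn's `SVC(kernel="precomputed")`) for the classification stage the abstract mentions.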