Abstract
With the vast amount of written data available online and offline, plagiarism detection has become a pressing need in many fields of science and knowledge. Various context-based plagiarism detection methods have been published in the literature. This paper develops a new plagiarism detection method based on text similarity for Arabic-language text, evaluated on 150 documents and 330 paragraphs (159 from the source document and 171 from the Al-Khaleej corpus). The findings show that similarity measurement based on Lempel-Ziv comparison algorithms is effective at detecting the plagiarized parts of Arabic text documents, with a success rate of 71.42%. Future studies could improve the efficiency of the algorithms by combining more sophisticated computational, statistical, and linguistic hybrid detection methods.
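The abstract does not spell out the similarity formula, so the following is only a minimal sketch of one way a Lempel-Ziv based comparison could work, not the authors' implementation. It assumes an LZ78-style parse over characters and a hypothetical score equal to the number of shared dictionary phrases divided by the size of the smaller dictionary. The function names lz78_phrases and lz_similarity are illustrative, and any preprocessing such as stemming or stop-word removal is left out.

```python
def lz78_phrases(text):
    """Parse text LZ78-style and return the set of dictionary phrases.

    Each new phrase is the shortest prefix of the remaining input
    that is not yet in the dictionary.
    """
    phrases = set()
    current = ""
    for ch in text:
        current += ch
        if current not in phrases:
            phrases.add(current)
            current = ""
    # Any trailing partial phrase is already in the dictionary,
    # so there is nothing left to add.
    return phrases


def lz_similarity(a, b):
    """Assumed score: shared phrases / size of the smaller dictionary.

    Returns a value in [0.0, 1.0]; 0.0 if either text is empty.
    """
    pa, pb = lz78_phrases(a), lz78_phrases(b)
    if not pa or not pb:
        return 0.0
    return len(pa & pb) / min(len(pa), len(pb))


if __name__ == "__main__":
    source = "النص الأصلي الذي نريد مقارنته"
    suspect = "النص الأصلي الذي نريد فحصه"
    print(lz_similarity(source, suspect))
```

Under these assumptions a text compared with itself scores 1.0, and texts with no phrases in common score 0.0. A threshold on the score would then decide whether a paragraph is flagged as plagiarized; the choice of threshold is not given in the abstract.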
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Soori, H., Prilepok, M., Platos, J., Berhan, E., Snasel, V. (2014). Text Similarity Based on Data Compression in Arabic. In: Zelinka, I., Duy, V., Cha, J. (eds) AETA 2013: Recent Advances in Electrical Engineering and Related Sciences. Lecture Notes in Electrical Engineering, vol 282. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41968-3_22
DOI: https://doi.org/10.1007/978-3-642-41968-3_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41967-6
Online ISBN: 978-3-642-41968-3
eBook Packages: Engineering, Engineering (R0)