SUST Repository

A Model and Framework for Plagiarism Detection in Arabic Documents in Arabic Language

dc.contributor.author Ali, Yahya Ali Abdelrahman
dc.contributor.author Supervisor, - Izzeldin Mohamed Osman
dc.date.accessioned 2018-09-19T08:22:14Z
dc.date.available 2018-09-19T08:22:14Z
dc.date.issued 2018-08-01
dc.identifier.citation Ali, Yahya Ali Abdelrahman.A Model and Framework for Plagiarism Detection in Arabic Documents in Arabic Language\Yahya Ali Abdelrahman Ali;Izzeldin Mohamed Osman.Khartoum:Sudan University of Science & Technology,College of Computer Science and Information Technology,2018.-155p.:ill.;28cm.-Ph.D. en_US
dc.identifier.uri http://repository.sustech.edu/handle/123456789/21466
dc.description Thesis en_US
dc.description.abstract Plagiarism has become a notorious problem in the global academic community. Detecting plagiarism in Arabic documents is particularly challenging because of the complexity of the language's structure. This dissertation provides a model and framework for detecting plagiarism in Arabic documents, based on a logical representation of a document as paragraphs, sentences, and words. The main purpose of this research is to develop and implement the Arabic Documents Plagiarism Detection Model (ADPDM), which combines a model capable of detecting plagiarism in Arabic documents with a search mechanism for similar candidate documents within the corpus collection, supported by a pre-processing method that includes stop word removal, stemming, and rooting. The implementation is built around a content-based method that consists mainly of fingerprinting the texts according to the specifics of the Arabic language and comparing their logical representations using heuristic algorithms. We introduce a plagiarism detection tool for the Arabic language that uses the Brian Kernighan and Dennis Ritchie (BKDR) hash function for chunk (3-gram) hashing. A second goal of the logical document representation is to save computation time by avoiding unnecessary comparisons. For that reason, we define a heuristic algorithm for each level in the tree: document level, paragraph level, and sentence level. Similarity is measured using the Longest Common Substring (LCS) metric. The ADPDM system for detecting plagiarism in electronic Arabic documents was tested and evaluated on a corpus of 100 documents. Of these, 90% were collected from the AraPlagDet (Arabic Plagiarism Detection) web site and divided into three categories, dataset1 (small), dataset2 (medium), and dataset3 (large), and 10% were collected from the Decision Support System (DSS) document.
The original documents were altered with random replacements to construct cases with different degrees of plagiarism, named dataset4. Preliminary experiments were conducted using our tool ADPDM and WCopyFind. On dataset1, ADPDM detected 14% plagiarism in 501 seconds, whereas WCopyFind detected 0% in 135 seconds. On dataset2, ADPDM detected 8% in 1374 seconds, whereas WCopyFind detected 0% in 475 seconds. On dataset3, ADPDM detected 18% in 1430 seconds, whereas WCopyFind detected 6.33% in 271 seconds. On dataset4, ADPDM detected 94% in 1682.79 seconds, whereas WCopyFind detected 81.44% in 357 seconds. The main conclusion is that ADPDM gives the best plagiarism detection results but is weak in time taken, while WCopyFind is weak in plagiarism detection but best in time taken. Finally, the experimental results show strong performance for ADPDM, which achieved a Recall of 0.780351, a Precision of 0.994264, and an F-measure of 0.865688. en_US
dc.description.sponsorship Sudan University of Science and Technology en_US
dc.language.iso en en_US
dc.publisher Sudan University of Science & Technology en_US
dc.subject Plagiarism en_US
dc.subject Arabic Documents en_US
dc.subject A Model and Framework en_US
dc.title A Model and Framework for Plagiarism Detection in Arabic Documents in Arabic Language en_US
dc.title.alternative نموذج وإطار للكشف عن الانتحال في الوثائق باللغة العربية en_US
dc.type Thesis en_US
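
The abstract describes fingerprinting texts by hashing 3-gram chunks with the BKDR hash function and measuring similarity with the Longest Common Substring. The sketch below illustrates those two ideas only; it is not the thesis implementation. Everything here is an assumption made for illustration: the function names, the seed value 131, the 31-bit mask, hashing over UTF-8 bytes, treating a 3-gram as three consecutive word tokens, and the overlap score. The tokens are assumed to have already gone through stop word removal and stemming.

```python
def bkdr_hash(text: str, seed: int = 131) -> int:
    """BKDR-style multiplicative hash over the UTF-8 bytes of text.

    The seed (131) and the 31-bit mask are common choices, assumed here.
    """
    h = 0
    for byte in text.encode("utf-8"):
        h = (h * seed + byte) & 0x7FFFFFFF
    return h


def fingerprint(tokens: list[str], n: int = 3) -> set[int]:
    """Hash every run of n consecutive tokens (word n-grams, an assumption)."""
    return {
        bkdr_hash(" ".join(tokens[i:i + n]))
        for i in range(len(tokens) - n + 1)
    }


def overlap(fp_a: set[int], fp_b: set[int]) -> float:
    """Share of hashed chunks in common, relative to the smaller fingerprint.

    A hypothetical score for pre-filtering candidates before a costlier check.
    """
    if not fp_a or not fp_b:
        return 0.0
    return len(fp_a & fp_b) / min(len(fp_a), len(fp_b))


def longest_common_substring(a: list[str], b: list[str]) -> int:
    """Length, in tokens, of the longest contiguous run shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the run that ended at the previous token of each side.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best
```

In a layered scheme like the one the abstract outlines, a cheap fingerprint overlap could be used to skip document or paragraph pairs that share no chunks, so that the quadratic-time `longest_common_substring` comparison runs only on the remaining sentence pairs.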

