SUST Repository

Study of Arabic and English Documents Using Classification Methods

dc.contributor.author Sultan, Mohammed Mohammed Abdullah
dc.contributor.author Supervisor, -Talaat Mohielddin Wahby
dc.date.accessioned 2018-04-12T07:26:17Z
dc.date.available 2018-04-12T07:26:17Z
dc.date.issued 2018-01-02
dc.identifier.citation Sultan, Mohammed Mohammed Abdullah .Study of Arabic and English Documents Using Classification Methods /Mohammed Mohammed Abdullah Sultan ;Talaat Mohielddin Wahby .-Khartoum: Sudan University of Science and Technology, college of Computer science and information technology, 2018 .- 60p. :ill. ;28cm .- M.Sc. en_US
dc.identifier.uri http://repository.sustech.edu/handle/123456789/20681
dc.description Thesis en_US
dc.description.abstract Text classification is the process of assigning documents to a predefined set of categories based on their content. Many classification algorithms have been applied to Arabic and English text documents. The main objective of this research is to compare several classification algorithms on Arabic and English text documents that have the same content, called parallel documents, to determine which algorithm performs better on them. In this research, we use four classification algorithms: Naïve Bayes, k-nearest neighbor, sequential minimal optimization, and J48. Naïve Bayes showed equal efficiency in classifying Arabic and English documents, but took close to twice the time on the Arabic documents. k-nearest neighbor showed high accuracy on English documents but lower accuracy on Arabic documents. Sequential minimal optimization showed high accuracy on both English and Arabic documents, and it was the best algorithm, providing highly efficient classification with very close accuracy and classification time on the two languages. J48 showed equal efficiency in classifying Arabic and English documents, but took almost twice as long to classify the English documents as the Arabic documents. The experiments were carried out with the WEKA data mining tool on the United Nations parallel documents, using a platform with a 2.13 GHz Intel Core i3 CPU and 4 GB of RAM. These results indicate that some of the classification algorithms achieve higher accuracy on English documents than on Arabic documents. en_US
dc.description.sponsorship Sudan University of Science and Technology en_US
dc.language.iso en en_US
dc.publisher Sudan University of Science & Technology en_US
dc.subject Classification Methods en_US
dc.subject Documents Using en_US
dc.subject Information Technology en_US
dc.title Study of Arabic and English Documents Using Classification Methods en_US
dc.title.alternative دراسة الوثائق العربية والإنجليزية باستخدام طرق التصنيف en_US
dc.type Thesis en_US

