Study of Arabic and English Documents Using Classification Methods

Sultan, Mohammed Mohammed Abdullah; Supervisor, -Talaat Mohielddin Wahby

dc.contributor.author	Sultan, Mohammed Mohammed Abdullah
dc.contributor.author	Supervisor, -Talaat Mohielddin Wahby
dc.date.accessioned	2018-04-12T07:26:17Z
dc.date.available	2018-04-12T07:26:17Z
dc.date.issued	2018-01-02
dc.identifier.citation	Sultan, Mohammed Mohammed Abdullah .Study of Arabic and English Documents Using Classification Methods /Mohammed Mohammed Abdullah Sultan ;Talaat Mohielddin Wahby .-Khartoum: Sudan University of Science and Technology, college of Computer science and information technology, 2018 .- 60p. :ill. ;28cm .- M.Sc.	en_US
dc.identifier.uri	http://repository.sustech.edu/handle/123456789/20681
dc.description	Thesis	en_US
dc.description.abstract	Text classification is the process of assigning documents to a predefined set of categories based on their content. Many classification algorithms have been applied to Arabic and English text documents. The main objective of this research is to compare several classification algorithms on Arabic and English text documents that have the same content, known as parallel documents, to determine which algorithm performs better on them. In this research, we use four classification algorithms: Naïve Bayes, k-nearest neighbor, sequential minimal optimization, and J48. Naïve Bayes showed equal efficiency in classifying Arabic and English documents, but took close to twice as long on the Arabic documents. K-nearest neighbor showed high accuracy on English documents but lower accuracy on Arabic documents. Sequential minimal optimization showed high accuracy on both English and Arabic documents; it was the best of the four algorithms, with high efficiency and very close classification accuracy and classification time for the two languages. J48 showed equal efficiency in classifying Arabic and English documents, but took almost twice as long on the English documents as on the Arabic documents. The experiments were carried out with the WEKA data mining tool on the United Nations Parallel Documents, using a platform with a 2.13 GHz Intel Core i3 CPU and 4 GB of RAM. Based on these results, some of the classification algorithms achieve higher accuracy on English documents than on Arabic documents.	en_US
dc.description.sponsorship	Sudan University of Science and Technology	en_US
dc.language.iso	en	en_US
dc.publisher	Sudan University of Science & Technology	en_US
dc.subject	Classification Methods	en_US
dc.subject	Documents Using	en_US
dc.subject	Information Technology	en_US
dc.title	Study of Arabic and English Documents Using Classification Methods	en_US
dc.title.alternative	دراسة الوثائق العربية والإنجليزية باستخدام طرق التصنيف	en_US
dc.type	Thesis	en_US