Parallel Support Vector Machine for big data Classification

Abdelkarim, Iatimad Mohammed Sati; Supervisor: Johnson Agbinya

dc.contributor.author	Abdelkarim, Iatimad Mohammed Sati
dc.contributor.author	Supervisor: Johnson Agbinya
dc.date.accessioned	2021-10-17T11:54:39Z
dc.date.available	2021-10-17T11:54:39Z
dc.date.issued	2020-11-12
dc.identifier.citation	Abdelkarim, Iatimad Mohammed Sati. Parallel Support Vector Machine for big data Classification / Iatimad Mohammed Sati Abdelkarim; Johnson Agbinya. - Khartoum: Sudan University of Science & Technology, College of Computer Science and Information Technology, 2020. - 124 p.: ill.; 28 cm. - PhD.	en_US
dc.identifier.uri	http://repository.sustech.edu/handle/123456789/26739
dc.description	Thesis	en_US
dc.description.abstract	With the rapid growth of data in many fields, big data analysis has become a major challenge for traditional data management systems and for scientists. This research addresses big data analysis using parallel computing applied to machine learning algorithms. A framework of parallel SVMs based on MapReduce is implemented on several datasets to perform supervised classification. Support Vector Machines (SVMs) are among the most commonly used methods for classification problems, and they suit this task because of their generalization ability and their capacity to classify data accurately. However, the traditional SVM is not appropriate for huge datasets because of its high computational complexity. This research studies the SVM algorithm and the Parallel Support Vector Machine (PSVM) and their applications in different big data fields. The PSVM is implemented on a Hadoop cluster running in the HPC center in Sudan. Three models are implemented on four datasets for classification. The PSVM is applied to real data, and k-means clustering is also combined with the support vector machine. The real water quality dataset from the Ministry of Health and different water stations in Sudan (2006-2017) is used to classify whether water is suitable for drinking. The Adult dataset is used to classify a person's income. The diabetes dataset is used to classify whether a patient has diabetes. The cover type dataset is used to classify seven forest cover types in wilderness areas located in the Roosevelt National Forest of northern Colorado. The numerical experiments applying the PSVM are compared with k-means clustering combined with SVM and with the standard SVM. The results show that the parallel support vector machine gives the highest accuracy and reduces computation time. Performance is compared in terms of computation time and accuracy.	en_US
dc.description.sponsorship	Sudan University Of Science & Technology	en_US
dc.language.iso	en	en_US
dc.publisher	Sudan University of Science and Technology	en_US
dc.subject	Computer Science and Information Technology	en_US
dc.subject	Parallel Support Vector Machine	en_US
dc.subject	big data Classification	en_US
dc.title	Parallel Support Vector Machine for big data Classification	en_US
dc.title.alternative	تصنيف البيانات الضخمة باستخدام نظام الدعم الآلي المتوازي	en_US
dc.type	Thesis	en_US
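
The abstract describes a MapReduce-based parallel SVM but this record does not say how the partial models are combined. As a rough illustration of the general split, train, and merge pattern only, the following sketch trains a small linear SVM on each data partition (the map step) and averages the resulting parameters (the reduce step). Everything here is an assumption made for illustration: the function names, the parameter-averaging merge rule, the hyperparameters, and the synthetic data are hypothetical and are not taken from the thesis, which runs on a Hadoop cluster rather than in a single Python process.

```python
import random
from functools import reduce


def train_linear_svm(partition, epochs=20, eta=0.05, lam=0.01):
    """Map step: fit a linear SVM on one partition by sub-gradient
    descent on the L2-regularised hinge loss. Labels must be +1 or -1."""
    w = [0.0] * len(partition[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in partition:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Shrink the weights (regularisation term of the sub-gradient).
            w = [wi * (1.0 - eta * lam) for wi in w]
            if margin < 1.0:
                # Hinge loss is active: move towards classifying x correctly.
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
                b += eta * y
    return w, b, 1  # the trailing 1 counts how many models were summed


def merge_models(m1, m2):
    """Reduce step: add two partial sums of model parameters."""
    (w1, b1, n1), (w2, b2, n2) = m1, m2
    return [p + q for p, q in zip(w1, w2)], b1 + b2, n1 + n2


def parallel_svm(data, n_partitions=4):
    """Split the data, train one model per split, and average the models."""
    parts = [data[i::n_partitions] for i in range(n_partitions)]
    # On a cluster each call to train_linear_svm would run on its own node;
    # the built-in map() stands in for that here (assumption).
    w_sum, b_sum, n = reduce(merge_models, map(train_linear_svm, parts))
    return [wi / n for wi in w_sum], b_sum / n


def predict(model, x):
    """Return +1 or -1 according to the side of the separating hyperplane."""
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


def make_toy_data(n=200, seed=0):
    """Two well-separated Gaussian blobs, purely synthetic."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.choice([1, -1])
        data.append(([rng.gauss(2.0 * y, 0.5), rng.gauss(2.0 * y, 0.5)], y))
    return data


toy = make_toy_data()
model = parallel_svm(toy)
accuracy = sum(predict(model, x) == y for x, y in toy) / len(toy)
```

Averaging only makes sense for linear models with a shared feature space; kernel SVMs are usually parallelised differently, for example by passing support vectors between stages, so this sketch should not be read as the method evaluated in the thesis.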