SUST Repository

Parallel Support Vector Machine for big data Classification


dc.contributor.author Abdelkarim, Iatimad Mohammed Sati
dc.contributor.author Agbinya, Johnson (Supervisor)
dc.date.accessioned 2021-10-17T11:54:39Z
dc.date.available 2021-10-17T11:54:39Z
dc.date.issued 2020-11-12
dc.identifier.citation Abdelkarim, Iatimad Mohammed Sati. Parallel Support Vector Machine for big data Classification / Iatimad Mohammed Sati Abdelkarim; Johnson Agbinya.- Khartoum: Sudan University of Science & Technology, College of Computer Science and Information Technology, 2020.- 124 p.: ill.; 28 cm.- PhD. en_US
dc.identifier.uri http://repository.sustech.edu/handle/123456789/26739
dc.description Thesis en_US
dc.description.abstract With the rapid growth of data in various fields, big data analysis has become a great challenge for traditional data management systems and for scientists. This research deals with big data analysis using parallel computing applied to several machine learning algorithms. A framework of parallel SVMs based on MapReduce is implemented on different datasets to perform supervised classification. Support Vector Machines (SVMs) are among the most commonly used methods for solving classification problems; the SVM is a suitable machine learning classifier because of its generalization ability and its capacity to classify large data accurately. However, the traditional SVM is not appropriate for huge datasets because of its high computational complexity. This research studies the SVM algorithm and the Parallel Support Vector Machine (PSVM) and their applications in different big data fields. The PSVM is implemented on a Hadoop cluster running at the HPC center in Sudan. Three models are applied to four datasets for classification: the PSVM is applied to real data, and then k-means clustering is combined with the SVM. The real water quality dataset from the Ministry of Health and different water stations in Sudan (2006-2017) is used to classify whether water is suitable for drinking. The Adult dataset is used to classify a person's income. The diabetes dataset is used to classify whether a patient has diabetes. The Covertype dataset is used to classify seven forest cover types in wilderness areas of the Roosevelt National Forest in northern Colorado. The PSVM is compared experimentally with k-means clustering combined with SVM and with the standard SVM framework, in terms of accuracy and computation time. The results show that the parallel support vector machine gives the highest accuracy and reduces computation time. en_US
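The abstract does not spell out how the MapReduce-based PSVM divides the work. The following is a minimal sketch, assuming one common pattern: each mapper trains a linear SVM on its own data partition, and a single reducer averages the local weight vectors (parameter averaging). This is a swapped-in illustration, not necessarily the method used in the thesis; many MapReduce PSVMs instead forward each mapper's support vectors to a reducer that retrains on them. The local solver is Pegasos (stochastic sub-gradient descent on the L2-regularised hinge loss). The names `pegasos_train`, `map_phase`, `reduce_phase`, `parallel_svm`, and `make_toy_data` are hypothetical, and the map tasks run sequentially here rather than on a Hadoop cluster.

```python
# Hypothetical sketch of a parameter-averaging parallel SVM (not the thesis code).
import random


def pegasos_train(data, lam=0.1, epochs=20, seed=0):
    """Train a linear SVM on one partition with Pegasos.

    `data` is a list of (features, label) pairs with labels in {-1, +1}.
    A constant 1.0 is appended to each feature vector, so the last weight
    acts as a (regularised) bias term.
    """
    rng = random.Random(seed)
    w = [0.0] * (len(data[0][0]) + 1)
    t = 0
    for _ in range(epochs):
        for _ in range(len(data)):
            t += 1
            features, y = data[rng.randrange(len(data))]
            x = list(features) + [1.0]
            eta = 1.0 / (lam * t)                      # Pegasos step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]   # L2 (shrinkage) step
            if margin < 1.0:                           # hinge-loss sub-gradient
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w, features):
    score = sum(wi * xi for wi, xi in zip(w, list(features) + [1.0]))
    return 1 if score >= 0 else -1


def map_phase(partition):
    # Mapper: each partition (e.g. one input split) trains its own local SVM.
    return pegasos_train(partition)


def reduce_phase(local_models):
    # Reducer: average the local weight vectors into one global model.
    n = len(local_models)
    return [sum(ws) / n for ws in zip(*local_models)]


def parallel_svm(data, n_partitions=4):
    # Split the training set round-robin into n_partitions parts.
    partitions = [data[i::n_partitions] for i in range(n_partitions)]
    # Sequential stand-in for the map tasks; on a cluster these run in parallel.
    local_models = [map_phase(p) for p in partitions]
    return reduce_phase(local_models)


def make_toy_data(n, seed=42):
    # Hypothetical, linearly separable 2-D data with a gap around x1 + x2 = 0.
    rng = random.Random(seed)
    data = []
    while len(data) < n:
        x1, x2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if abs(x1 + x2) >= 0.3:
            data.append(((x1, x2), 1 if x1 + x2 > 0 else -1))
    return data


if __name__ == "__main__":
    data = make_toy_data(400)
    w = parallel_svm(data)
    acc = sum(predict(w, x) == y for x, y in data) / len(data)
    print(f"weights={w}, training accuracy={acc:.3f}")
```

Averaging is the simplest reducer to write, but it only approximates the SVM trained on the full dataset; a cascade that passes support vectors to the reducer can track the full-data solution more closely, at the cost of an extra training pass.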
dc.description.sponsorship Sudan University Of Science & Technology en_US
dc.language.iso en en_US
dc.publisher Sudan University of Science and Technology en_US
dc.subject Computer Science and Information Technology en_US
dc.subject Parallel Support Vector Machine en_US
dc.subject big data Classification en_US
dc.title Parallel Support Vector Machine for big data Classification en_US
dc.title.alternative تصنيف البيانات الضخمة باستخدام نظام الدعم الآلي المتوازي (Arabic title; English: Big Data Classification Using a Parallel Support Vector Machine) en_US
dc.type Thesis en_US

