Please use this identifier to cite or link to this item: https://repository.sustech.edu/handle/123456789/26739
Title: Parallel Support Vector Machine for big data Classification
Other Titles: تصنيف البيانات الضخمة باستخدام نظام الدعم الآلي المتوازي (Arabic; in English: Big Data Classification Using the Parallel Support Vector Machine)
Authors: Abdelkarim, Iatimad Mohammed Sati
Supervisor: Johnson Agbinya
Keywords: Computer Science and Information Technology
Parallel Support Vector Machine
Big Data Classification
Issue Date: 12-Nov-2020
Publisher: Sudan University of Science and Technology
Citation: Abdelkarim, Iatimad Mohammed Sati. Parallel Support Vector Machine for big data Classification / Iatimad Mohammed Sati Abdelkarim ; Johnson Agbinya. - Khartoum: Sudan University of Science & Technology, College of Computer Science and Information Technology, 2020. - 124 p.: ill.; 28 cm. - PhD.
Abstract: With the rapid growth of data in many fields, big data analysis poses a major challenge for traditional data management systems and for scientists. This research addresses big data analysis using parallel computing applied to machine learning algorithms. A framework of parallel SVMs based on MapReduce is implemented and applied to different datasets to perform supervised classification. Support Vector Machines (SVMs) are among the most commonly used methods for solving classification problems. The SVM is a suitable machine learning classifier because of its generalization ability and its capacity to classify big data accurately. However, the traditional SVM is not appropriate for huge datasets because of its high computational complexity. This research studies the SVM algorithm and the Parallel Support Vector Machine (PSVM) and their applications in different big data fields. The PSVM is implemented on a Hadoop cluster running in the HPC center in Sudan. Three models are implemented on four datasets for classification. The PSVM is applied to real data, and k-means clustering is then combined with the support vector machine. The real water quality dataset from the Ministry of Health and different water stations in Sudan (2006-2017) is used to classify whether water is suitable for drinking. The Adult dataset is used to classify a person's income. The diabetes dataset is used to classify whether a patient has diabetes. The cover type dataset is used to classify the seven forest cover types found in wilderness areas located in the Roosevelt National Forest of northern Colorado. In the numerical experiments, the PSVM is compared with k-means clustering combined with the SVM and with the standard SVM. Performance is compared in terms of computation time and accuracy. The results show that the parallel support vector machine gives the highest accuracy and reduces computation time.
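The MapReduce-style training pattern mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the thesis partitions the data or combines the partial models, so everything below is an assumption made for illustration: a linear SVM trained with Pegasos-style stochastic subgradient descent on each partition (the map step), a size-weighted average of the resulting weight vectors (the reduce step), and synthetic two-class data. The in-process `map` and `reduce` calls stand in for a Hadoop job, and all function names are hypothetical.

```python
import random
from functools import reduce


def train_linear_svm(rows, epochs=20, lam=0.01, seed=0):
    """Map step (assumed): Pegasos-style SGD for a linear SVM on one partition.

    rows is a list of (features, label) pairs with label in {-1, +1}.
    Returns the weight vector (bias last) and the partition size.
    """
    rng = random.Random(seed)
    dim = len(rows[0][0])
    w = [0.0] * (dim + 1)  # last entry acts as the bias weight
    t = 0
    for _ in range(epochs):
        for x, y in rng.sample(rows, len(rows)):  # shuffled pass over the partition
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            xb = list(x) + [1.0]
            margin = y * sum(wi * xi for wi, xi in zip(w, xb))
            # Shrink the weights (regularisation term of the objective).
            w = [(1.0 - eta * lam) * wi for wi in w]
            # Add the hinge-loss subgradient only for margin violations.
            if margin < 1.0:
                w = [wi + eta * y * xi for wi, xi in zip(w, xb)]
    return w, len(rows)


def combine(a, b):
    """Reduce step (assumed): size-weighted average of two partial models."""
    (wa, na), (wb, nb) = a, b
    n = na + nb
    return [(na * u + nb * v) / n for u, v in zip(wa, wb)], n


def predict(w, x):
    """Classify one sample with the combined linear model."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


def make_toy_data(n, seed=1):
    """Synthetic two-class data; not one of the datasets used in the thesis."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y = rng.choice([-1, 1])
        rows.append(([rng.gauss(2.0 * y, 1.0), rng.gauss(2.0 * y, 1.0)], y))
    return rows


data = make_toy_data(400)
partitions = [data[i::4] for i in range(4)]        # four equal-sized splits
partials = list(map(train_linear_svm, partitions))  # map: one model per split
model, n_seen = reduce(combine, partials)           # reduce: merge the models
accuracy = sum(predict(model, x) == y for x, y in data) / len(data)
print(f"combined model trained on {n_seen} rows, training accuracy {accuracy:.3f}")
```

Weight averaging only applies to linear models. A kernel SVM would need a different combination rule, for example passing the support vectors of each partition on to a further training round, and the abstract does not indicate which approach the thesis takes.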
Description: Thesis
URI: http://repository.sustech.edu/handle/123456789/26739
Appears in Collections:PhD theses : Computer Science and Information Technology

Files in This Item:
File: Parallel Support ....pdf
Description: Research
Size: 8.08 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.