Effect of Clustering as a Preprocessing Step for Solving Unbalanced Data set Problem

Mohmmed, Elham Mosa Abd Aljalil; SUPERVISOR - Murtada Khalafallah Elbashir

Please use this identifier to cite or link to this item: https://repository.sustech.edu/handle/123456789/11167

Full metadata record

DC Field	Value	Language
dc.contributor.author	Mohmmed, Elham Mosa Abd Aljalil
dc.contributor.author	SUPERVISOR - Murtada Khalafallah Elbashir
dc.date.accessioned	2015-06-24T07:54:54Z
dc.date.available	2015-06-24T07:54:54Z
dc.date.issued	2014-10-28
dc.identifier.citation	Mohmmed, Elham Mosa Abd Aljalil. Effect of Clustering as a Preprocessing Step for Solving Unbalanced Data set Problem: Protein Secondary Structure Prediction / Elham Mosa Abd Aljalil Mohmmed; Murtada Khalafallah Elbashir. - Khartoum: Sudan University of Science and Technology, Computer Science, 2014. - 45 p.: ill.; 28 cm. - M.Sc.	en_US
dc.identifier.uri	http://repository.sustech.edu/handle/123456789/11167
dc.description	Thesis	en_US
dc.description.abstract	Predicting protein secondary structure from the amino acid sequence remains an important problem. Determining the secondary structure of a protein in the laboratory is costly and time-consuming, so developing precise and efficient prediction methods is very important. In this research we propose an approach that uses a clustering algorithm as a preprocessing step for machine learning methods, in order to address the unbalanced dataset problem in protein secondary structure prediction, and we compare the results obtained with and without clustering. We use position-specific scoring matrices (PSSMs) as features. The data are preprocessed with K-means clustering to prepare clusters that serve as input to support vector machine (SVM) and kernel logistic regression (KLR) models. Compared with a previous study, we achieved high prediction accuracy: Qtotal of 86.5% and 77.6% on α-helix and coil secondary structure, respectively, with the SVM method, and Qtotal of 82.18%, 75.3%, and 82.9% on α-helix, coil, and extended beta-sheet secondary structure, respectively, with the KLR method. The approach achieves satisfactory performance in predicting secondary structure as measured by the Matthews correlation coefficient (MCC), Qpredicted, and Qobserved on the RS126 dataset.	en_US
dc.description.sponsorship	Sudan University of Science and Technology	en_US
dc.language.iso	en	en_US
dc.publisher	Sudan University of Science and Technology	en_US
dc.subject	Computer Science	en_US
dc.subject	Unbalanced Data Set Problem	en_US
dc.subject	Effect of Clustering as	en_US
dc.title	Effect of Clustering as a Preprocessing Step for Solving Unbalanced Data set Problem	en_US
dc.title.alternative	تاثير التجميع بمثابة خطوة تجهيزية لحل مشكلة مجموعة البيانات غير المتوازنة	en_US
dc.type	Thesis	en_US
Appears in Collections:	Masters Dissertations : Computer Science and Information Technology
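
The abstract does not say how the K-means clusters are passed to the classifiers, so the following is only a minimal sketch under one common assumption: the majority class is replaced by as many cluster centroids as there are minority samples, which balances the two classes before training. The function names (`kmeans`, `balance_by_clustering`, `binary_scores`) are hypothetical, the SVM/KLR training step is omitted, and the per-class measures use their conventional definitions (Qobserved as TP/(TP+FN), Qpredicted as TP/(TP+FP)), which may differ in detail from those in the thesis.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns k centroids as lists of floats."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            groups[nearest].append(p)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        updated = [
            [sum(col) / len(g) for col in zip(*g)] if g else centroids[j]
            for j, g in enumerate(groups)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def balance_by_clustering(majority, minority, seed=0):
    """Assumed scheme: shrink the majority class to len(minority) centroids."""
    k = min(len(minority), len(majority))
    return kmeans(list(majority), k, seed=seed), [list(p) for p in minority]


def binary_scores(tp, tn, fp, fn):
    """Conventional per-class scores computed from confusion-matrix counts."""
    total = tp + tn + fp + fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "q_total": (tp + tn) / total if total else 0.0,
        "q_observed": tp / (tp + fn) if tp + fn else 0.0,
        "q_predicted": tp / (tp + fp) if tp + fp else 0.0,
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }
```

In a real pipeline each point would be a PSSM-derived feature vector for one residue window, and the balanced output would be handed to the chosen classifier; the sketch only illustrates the balancing and scoring steps.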

Files in This Item:

File	Description	Size	Format
Effect of Clustering ....pdf	Title	243.77 kB	Adobe PDF
Researsh.pdf	Researsh	901.38 kB	Adobe PDF
