Abstract:
This thesis compares three feature selection methods: Correlation-based Feature Selection (CFS), Relief, and Wrapper. Three machine learning algorithms were used: J48 (a decision tree learner), naive Bayes (Bayesian network), and Multilayer Perceptron (MLP) (artificial neural network). The purpose of the comparison is to identify the set of features that best enhances classifier performance. Because the method is case-study based, SEER data was selected for this purpose.
The study showed that classification accuracy using the reduced feature set is equal to, and in some cases better than, the accuracy obtained with the complete data set.
Moreover, as expected, the performance of J48 decreased with the reduced data set.
CFS selected five features, Wrapper returned eight features, and Relief returned a ranked list of features.
Among the classifiers compared, naive Bayes showed the best results in this study. Its accuracy increased significantly with the CFS, Relief, and Wrapper methods.