Abstract:
Data mining is the process of analyzing large quantities of data and summarizing them into useful information. The role of data mining in medical diagnosis is increasing rapidly. Classification algorithms in particular are helpful for classifying patient data, which supports the decision-making process of medical practitioners. In this study, a Support Vector Machine (SVM) based classifier was trained and tested to predict the 5-year survivability of lung cancer patients. The dataset used in this study consists of information about lung cancer patients collected by SEER. Preprocessing techniques were applied to prepare the raw dataset and to identify the attributes relevant for classification. The dataset is pre-classified into two classes: survived (11.3%) and not-survived (88.7%). The purpose of this research is to verify the predictive effectiveness of the SVM algorithm on real, historical data.
We used the Weka tool to train and test the classifier with the two SVM implementations available in Weka: Sequential Minimal Optimization (SMO) and Library for Support Vector Machines (LIBSVM). The results show only slight differences in accuracy between these two training algorithms, but a difference in execution time. The accuracy of the proposed system (SMO and LIBSVM) is better than what is reported in the literature for classifiers trained on the same dataset. The results also indicate that SMO and LIBSVM are not robust to the imbalanced dataset.
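
As a rough illustration of why class imbalance matters when reading accuracy figures, the sketch below computes the metrics of a trivial classifier that always predicts the majority class, using the 11.3% / 88.7% split stated above. It is not part of the study's Weka workflow. The function name `majority_baseline` and the dictionary keys are hypothetical, chosen only for this example.

```python
def majority_baseline(minority_rate):
    """Metrics for a classifier that always predicts the majority class.

    minority_rate: fraction of samples in the minority class (assumed < 0.5).
    """
    majority_rate = 1.0 - minority_rate
    # Every majority sample is classified correctly, every minority sample is missed.
    accuracy = majority_rate
    majority_recall = 1.0
    minority_recall = 0.0
    # Balanced accuracy is the mean of the per-class recalls.
    balanced_accuracy = (majority_recall + minority_recall) / 2.0
    return {
        "accuracy": accuracy,
        "minority_recall": minority_recall,
        "balanced_accuracy": balanced_accuracy,
    }


# Class split taken from the abstract: 11.3% survived, 88.7% not-survived.
baseline = majority_baseline(0.113)
for name, value in baseline.items():
    print(f"{name}: {value:.3f}")
```

Under this split, always predicting not-survived already yields 88.7% accuracy while identifying no survivors, so accuracy alone may overstate a classifier's usefulness on such data; per-class recall or balanced accuracy gives a fuller picture.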