Abstract:
Data mining is the automated search of large data sets to discover patterns and trends that
go beyond simple analysis; it is also known as Knowledge Discovery in Data (KDD). This
study investigates whether the survival rate, or survivability, of a given disease can be
discovered by extracting knowledge from data related to that disease. Such an
investigation requires a large data set. One such source is SEER [1] (Surveillance,
Epidemiology, and End Results), a unique, reliable and essential resource for
investigating different aspects of cancer. We investigated three data mining techniques,
the Multilayer Perceptron (MLP), K-nearest neighbor and the C4.5 decision tree, with the
goal of finding the technique that most accurately predicts 5-year survivability of
breast cancer.
The SEER database (1973-2009, 657,712 records) was used. Starting from previous work, we
identified the commonly used variables; after preprocessing, 18 variables and 180,302
records remained.
Weka was used to train and test the three techniques. The results show that C4.5
performed best, with an accuracy of 95.6%, followed by K-nearest neighbor at 95.4%; MLP
performed worst, at 95.3%.
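
To make the accuracy figures concrete, the following is a minimal sketch of a K-nearest neighbor classifier and the accuracy measure used to compare techniques. It is not the Weka workflow used in the study: the function names `knn_predict` and `accuracy`, the two features, and the toy records are all hypothetical and are not taken from SEER.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    # Rank training records by Euclidean distance to x and keep the k closest.
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of records whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up toy records (NOT SEER data): two numeric features per record,
# label 1 = survived 5 years, label 0 = did not.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (3.0, 3.2), (3.1, 2.9), (2.8, 3.0)]
train_y = [1, 1, 1, 0, 0, 0]
test_X = [(1.1, 0.9), (3.0, 3.0), (1.0, 1.3), (2.9, 3.1)]
test_y = [1, 0, 1, 0]

predictions = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
test_accuracy = accuracy(test_y, predictions)
print(f"accuracy = {test_accuracy:.1%}")
```

The same `accuracy` function could be applied to the predictions of any classifier, which is what allows the three techniques to be ranked on a common scale.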