Abstract:
According to the World Health Organization (WHO), breast cancer is the most common cancer in women in both the developed and the developing world. Its incidence is rising in the developing world because of increased life expectancy, increased urbanization, and the adoption of Western lifestyles. About one in eight women is diagnosed with breast cancer during her lifetime. The chance of recovery is good if the disease is detected at an early stage.
This research aims to find a feature subset with the minimum number of features that still provides accurate classification. A sequential forward selection algorithm is used to find the subset of features that ensures highly accurate classification of breast cancer as either benign or malignant, and to measure the goodness of the selected feature sets.
A comparative study of several cancer classification approaches, namely Naïve Bayes, K-nearest neighbors, Gradient Boosting, and AdaBoost, is then carried out with and without feature selection. Under sequential forward selection, the different algorithms in most cases select different feature sets.
Gradient Boosting is found to be the best classifier on both the mammography dataset and the Wisconsin dataset, with and without feature selection.
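
The sequential forward selection step described above can be illustrated with a minimal sketch. It is not the implementation used in this study: the scoring function (leave-one-out accuracy of a 1-nearest-neighbour classifier), the stopping rule (stop when no added feature improves the score), the function names, and the six-sample toy data are all assumptions made for illustration only.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Row = Sequence[float]
ScoreFn = Callable[[List[Row], List[int], List[int]], float]


def loo_1nn_accuracy(X: List[Row], y: List[int], features: List[int]) -> float:
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    that only looks at the given feature indices (hypothetical scorer)."""
    correct = 0
    for i, xi in enumerate(X):
        best_dist, best_label = float("inf"), None
        for j, xj in enumerate(X):
            if i == j:
                continue  # hold out the sample being classified
            dist = sum((xi[f] - xj[f]) ** 2 for f in features)
            if dist < best_dist:
                best_dist, best_label = dist, y[j]
        correct += best_label == y[i]
    return correct / len(X)


def sequential_forward_selection(
    X: List[Row],
    y: List[int],
    score: ScoreFn,
    max_features: Optional[int] = None,
) -> Tuple[List[int], float]:
    """Greedy forward selection: start from the empty set and repeatedly
    add the single feature that raises the score the most. Stops when no
    candidate improves the score or max_features is reached."""
    n_features = len(X[0])
    limit = n_features if max_features is None else max_features
    selected: List[int] = []
    best_score = 0.0
    while len(selected) < limit:
        candidates = [f for f in range(n_features) if f not in selected]
        scored = [(score(X, y, selected + [f]), f) for f in candidates]
        # Highest score wins; ties go to the lowest feature index.
        cand_score, cand_feature = max(scored, key=lambda t: (t[0], -t[1]))
        if cand_score <= best_score:
            break  # no remaining feature helps
        selected.append(cand_feature)
        best_score = cand_score
    return selected, best_score


# Toy data (invented): feature 0 separates the classes,
# features 1 and 2 are unrelated to the label.
X = [
    (1.0, 5.0, 2.0),
    (1.2, 1.0, 4.0),
    (0.9, 3.0, 6.0),
    (3.0, 5.1, 2.1),
    (3.1, 1.1, 4.1),
    (2.8, 3.1, 6.1),
]
y = [0, 0, 0, 1, 1, 1]  # 0 = benign, 1 = malignant (labels assumed)

selected, best = sequential_forward_selection(X, y, loo_1nn_accuracy)
print(selected, best)  # prints: [0] 1.0
```

Because the scoring function is passed in as a parameter, the same loop could wrap any of the classifiers compared here, which is one way different classifiers can end up selecting different feature subsets.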