Abstract:
Breast cancer (BC) is the second-largest killer of women around the world. Manual diagnosis is less effective due to physician uncertainty, while early detection and diagnosis of BC increase the chance of treatment and survival. For this reason, the purpose of this study was to use machine learning techniques for the classification of malignant and benign breast tumors based on cytological features.
In this study, a random forest (RF) algorithm was used to classify BC tumors, with and without feature selection, using the Wisconsin Breast Cancer Dataset (WBCD) from the University of California Irvine (UCI) machine learning repository, which consists of 699 instances, 11 real-world attributes, and two classes. First, RF was used to classify the WBCD without eliminating any features. Second, the features were reduced from nine to eight, six, and four attributes using RF, and the reduced sets were then also classified using RF.
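The feature-reduction procedure described above can be sketched roughly as follows. This is a minimal illustration, not the study's implementation: it assumes scikit-learn, ranks features by impurity-based RF importance, and uses a 70/30 stratified split with 100 trees, none of which is stated in the abstract. The function name `rank_and_reclassify` is hypothetical, and the data is a synthetic stand-in with the WBCD shape (699 samples, 9 features), not the real dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def rank_and_reclassify(X, y, keep_counts=(9, 8, 6, 4), seed=0):
    """Rank features with one RF, then retrain an RF on each top-k subset."""
    # Assumed split; the abstract does not describe the validation scheme.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )

    # Fit on all features to obtain impurity-based importances.
    ranker = RandomForestClassifier(n_estimators=100, random_state=seed)
    ranker.fit(X_train, y_train)
    order = np.argsort(ranker.feature_importances_)[::-1]  # most important first

    results = {}
    for k in keep_counts:
        cols = order[:k]  # indices of the k highest-ranked features
        model = RandomForestClassifier(n_estimators=100, random_state=seed)
        model.fit(X_train[:, cols], y_train)
        results[k] = model.score(X_test[:, cols], y_test)  # test accuracy
    return results


# Synthetic stand-in data (assumption): same shape as WBCD, not the real records.
X, y = make_classification(
    n_samples=699, n_features=9, n_informative=6, n_redundant=2, random_state=0
)
scores = rank_and_reclassify(X, y)
for k, acc in scores.items():
    print(f"top {k} features: test accuracy = {acc:.3f}")
```

Ranking on the training split only, as done here, keeps the held-out data out of the feature-selection step; whether the study did the same is not stated.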
The RF model with six attributes obtained acceptable performance, achieving an accuracy of 98.52%, a sensitivity of 100%, a specificity of 98.04%, and an AUC of 0.99.
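For reference, accuracy, sensitivity, and specificity follow directly from the confusion-matrix counts. The sketch below uses plain Python and made-up toy labels (not the study's predictions), treats the malignant class as the positive label `1` by assumption, and uses a hypothetical helper name `binary_metrics`. AUC is omitted because it requires predicted scores rather than hard labels.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, sensitivity, and specificity for binary labels.

    Assumes both classes occur in y_true; otherwise a division by zero occurs.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Toy labels for illustration only: 1 = malignant (assumed positive), 0 = benign.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

A sensitivity of 100%, as reported for the six-attribute model, corresponds to `fn == 0`: no malignant case in the evaluated set was classified as benign.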
This research demonstrated that RF can be used both to reduce the dimension of the feature space and to diagnose BC based on the selected features. The proposed RF model could be used to build fast automatic diagnostic systems for other diseases in the future.