Abstract:
Breast cancer is the second leading cause of cancer death in women after lung cancer.
Software available today, however, has low accuracy because its predictors are poorly
selected. The main objective of this research is to design and implement a diagnostic
system for breast cancer using a machine learning technique called logistic regression, in
order to reduce the number of false positives in the prediction by using more features and
to identify breast cancer automatically.
The Wisconsin Diagnostic Breast Cancer (WDBC) database was used. It consists of nine
features and one decision attribute, which denotes whether the cell is malignant (1) or
benign (0). The proposed algorithm consists of two major stages: data visualization and a
logistic regression hypothesis for future predictions (the classifier). Data visualization
is further divided into two minor steps: feature normalization and principal component
analysis (PCA).
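
The two visualization steps can be illustrated with a minimal sketch. It is not the
implementation used in this work: the function names are hypothetical, z-score scaling is
assumed for the normalization, and power iteration is swapped in as a simple way to find
the first principal component using only the Python standard library.

```python
import math

def normalize(rows):
    """Z-score each feature column: subtract its mean, divide by its standard deviation."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    # A constant column has zero deviation; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
            for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]

def covariance(rows):
    """Covariance matrix of rows that are already centered."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / n for j in range(d)]
            for i in range(d)]

def top_component(cov, iters=500):
    """Leading eigenvector of a symmetric matrix, found by power iteration."""
    d = len(cov)
    # Non-uniform start vector; it must not be orthogonal to the leading eigenvector.
    v = [1.0 / (i + 1) for i in range(d)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v

def project(rows, component):
    """Coordinate of each row along the given component."""
    return [sum(a * b for a, b in zip(r, component)) for r in rows]

# Toy data invented for illustration only; these are not WDBC records.
toy = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
scaled = normalize(toy)
first_pc = top_component(covariance(scaled))
coords = project(scaled, first_pc)
```

Projecting onto the first two or three components in the same way gives coordinates that
can be plotted to inspect how well the two classes separate.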
The logistic regression hypothesis is obtained in three minor steps: computing the sigmoid
function to obtain the hypothesis, then computing the cost and the gradient of the
hypothesis to reach the optimal theta parameters. The obtained hypothesis is used as the
diagnosis model.
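
The three steps above can be sketched as follows. This is an illustrative sketch under
stated assumptions, not the code used in this work: the cost is assumed to be the usual
cross-entropy, the optimizer is assumed to be plain batch gradient descent, and all
function names, the learning rate, and the iteration count are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def hypothesis(theta, x):
    """h_theta(x) = sigmoid(theta . x); x[0] is assumed to be the bias term 1."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))

def cost_and_gradient(theta, X, y):
    """Mean cross-entropy cost and its gradient with respect to theta."""
    m = len(X)
    eps = 1e-12  # keeps log() finite when h is numerically 0 or 1
    cost = 0.0
    grad = [0.0] * len(theta)
    for xi, yi in zip(X, y):
        h = hypothesis(theta, xi)
        cost -= yi * math.log(h + eps) + (1 - yi) * math.log(1 - h + eps)
        for j, xij in enumerate(xi):
            grad[j] += (h - yi) * xij
    return cost / m, [g / m for g in grad]

def fit(X, y, alpha=0.1, iters=2000):
    """Batch gradient descent from theta = 0 (assumed optimizer)."""
    theta = [0.0] * len(X[0])
    for _ in range(iters):
        _, grad = cost_and_gradient(theta, X, y)
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta

def predict(theta, x, threshold=0.5):
    """Return 1 (malignant) if the hypothesis reaches the threshold, else 0 (benign)."""
    return 1 if hypothesis(theta, x) >= threshold else 0

# Toy one-feature data invented for illustration; the first column is the bias term.
X_toy = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
y_toy = [0, 0, 1, 1]
theta_toy = fit(X_toy, y_toy)
labels = [predict(theta_toy, x) for x in X_toy]
```

Raising the threshold above 0.5 is one simple way to trade recall for fewer false
positives with the same fitted parameters.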
An efficient method for breast cancer classification has been developed. The proposed
system was evaluated on WDBC and achieved an accuracy of 98.550725% and an F score of
0.972222, where F is the balanced F-score. The F score is the harmonic mean of precision
and recall; it reaches its best value at 1 and its worst at 0.
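
For reference, the reported metrics can be computed from predicted and true labels as in
the sketch below. The function name and the toy labels are hypothetical and serve only to
illustrate the definitions; they do not reproduce the evaluation reported here.

```python
def evaluate(y_true, y_pred):
    """Accuracy, precision, recall, and balanced F-score for binary labels (1 = malignant)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Balanced F-score: harmonic mean of precision and recall.
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_score": f_score}

# Toy labels invented for illustration only.
metrics = evaluate([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1])
```

Because the F score depends only on true positives, false positives, and false negatives,
it is less affected than accuracy by a large number of correctly classified benign cases.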