Abstract:
An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented; real-world datasets are often composed predominantly of "normal" examples with only a small percentage of "abnormal" or "interesting" examples. Moreover, the cost of misclassifying an abnormal (interesting) example as a normal example is often much higher than the cost of the reverse error. Under-sampling of the majority (normal) class has been proposed as a good means of increasing the sensitivity of a classifier to the minority class. This research shows that combining over-sampling of the minority (abnormal) class with under-sampling of the majority (normal) class can achieve better classifier performance. The methodology involves acquiring a dataset from the UCI repository, applying SVM and Random Forest classifiers, applying the SMOTE method, and evaluating classification accuracy before and after balancing.
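A minimal sketch of the evaluation pipeline described above, assuming scikit-learn and imbalanced-learn; the synthetic data generated here is only a stand-in for the UCI dataset used in the study, and the specific sampling ratios and classifier settings are illustrative assumptions, not the authors' exact configuration.

# Sketch: compare SVM and Random Forest before and after SMOTE + under-sampling.
# Assumptions: synthetic imbalanced data stands in for the UCI dataset; the
# 0.5 / 1.0 sampling ratios are illustrative choices, not the paper's settings.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler

# Imbalanced data: ~95% "normal" (majority) vs. ~5% "abnormal" (minority).
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.95, 0.05], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42)

for name, clf in [("SVM", SVC()),
                  ("Random Forest", RandomForestClassifier(random_state=42))]:
    # Baseline: train on the original imbalanced training set.
    clf.fit(X_train, y_train)
    print(f"--- {name}, before balancing ---")
    print(classification_report(y_test, clf.predict(X_test)))

    # Balancing: SMOTE over-samples the minority class (here to half the
    # majority count), then random under-sampling shrinks the majority class
    # until both classes are equal in size.
    X_res, y_res = SMOTE(sampling_strategy=0.5,
                         random_state=42).fit_resample(X_train, y_train)
    X_res, y_res = RandomUnderSampler(sampling_strategy=1.0,
                                      random_state=42).fit_resample(X_res, y_res)
    clf.fit(X_res, y_res)
    print(f"--- {name}, after SMOTE + under-sampling ---")
    print(classification_report(y_test, clf.predict(X_test)))

Per-class precision and recall (rather than overall accuracy alone) make the effect on minority-class sensitivity visible when comparing the two runs.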