Abstract:
Imbalanced multi-class learning is one of the challenging problems in supervised machine learning. The data are imbalanced (samples are distributed unevenly across the classes) and multi-class (each instance belongs to one of more than two classes), and this combination causes serious problems in both the learning and the performance evaluation processes.
The research problem is to obtain more accurate classification results for this kind of data. The methodology is a new hierarchical classification method based on the Multi-Class Support Vector Machine (Multi-Class SVM). The model rebalances the data by grouping small classes into larger artificial classes. At a later stage, it classifies each compound class into its constituent classes. Experiments were conducted on nine multi-class imbalanced datasets from the UCI repository.
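The rebalancing step can be illustrated with a minimal sketch. The abstract does not state how small classes are chosen or combined, so the grouping rule below (first-fit decreasing packing, with the largest class size as capacity) and the names `group_small_classes`, `relabel`, and `min_ratio` are assumptions made for illustration only, not the method of the paper.

```python
from collections import Counter


def group_small_classes(labels, min_ratio=0.5):
    """Pack small classes into artificial classes of roughly balanced size.

    Assumed rule (not from the paper): a class is "small" if it has fewer
    than min_ratio * (size of the largest class) samples. Small classes are
    packed first-fit decreasing into groups whose total size does not
    exceed the size of the largest class.
    """
    counts = Counter(labels)
    target = max(counts.values())

    groups = []  # each group is a list of original class labels
    small = []
    for cls, n in counts.items():
        if n >= min_ratio * target:
            groups.append([cls])  # large enough to stand alone
        else:
            small.append(cls)

    # Largest small classes first; ties broken by name for determinism.
    small.sort(key=lambda c: (-counts[c], str(c)))

    bins = []  # each bin is [total_samples, [member classes]]
    for cls in small:
        for b in bins:
            if b[0] + counts[cls] <= target:
                b[0] += counts[cls]
                b[1].append(cls)
                break
        else:
            bins.append([counts[cls], [cls]])

    groups.extend(members for _, members in bins)
    return groups


def relabel(labels, groups):
    """Map each original label to the index of its (artificial) group."""
    mapping = {cls: i for i, members in enumerate(groups) for cls in members}
    return [mapping[y] for y in labels]


# Hypothetical class sizes, chosen only to show the effect of grouping.
labels = ["a"] * 100 + ["b"] * 60 + ["c"] * 30 + ["d"] * 25 + ["e"] * 20 + ["f"] * 10
groups = group_small_classes(labels)
coarse = relabel(labels, groups)
print(groups)
print(Counter(coarse))
```

In a two-stage scheme of this kind, a first classifier would be trained on the coarse labels, and a second classifier would be trained for each group with more than one member to separate its constituent classes.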
The experiments show that, across four performance metrics, the new hierarchical model improves classification results compared with those of some state-of-the-art solutions, even when the latter are empowered with weights for minority instances. They also show that the model treats the imbalance problem simply, without additional computational effort or algorithmic modification, and that it requires no data pre-processing step, unlike many other solutions. No additional adaptation is therefore needed at either the data level or the algorithmic level. Moreover, the model performs well even when the ratio of majority to minority samples is high. The experiments also demonstrate that the model works better on datasets with a large number of classes and performs poorly on datasets with few classes, which cannot be combined into artificial classes with nearly balanced numbers of examples.