Abstract:
Binary logistic regression (BLR) and linear discriminant analysis (LDA) are often used to classify populations or groups using a set of predictor variables. The goal of this study is to compare these two classification methods in order to make the choice between them easier, and to understand how the two models behave under different epidemiological data and group characteristics. LDA assumes multivariate normality and equal variance-covariance matrices across groups, whereas BLR requires neither assumption. In this study, four real epidemiological datasets on ruminant animals (sheep, cattle, goats and camels) are used to assess the performance of both methods. The data were collected by a veterinary researcher in Al Qadarif State, eastern Sudan, as an epidemiological case study of bluetongue virus and its association with various risk factors (predictors). The two techniques were compared on overall classification accuracy and on the quality of prediction in terms of sensitivity and specificity. The area under the receiver operating characteristic curve (AUC) was also examined. The first finding of this study is that both methods selected the same predictors as significant discriminators when applied to non-normally distributed data.
The second major finding is that sample size has the same impact on LDA and BLR in terms of the percentage of animals correctly classified. The area under the ROC curve (AUC) showed BLR to be slightly superior to LDA, and classification accuracy at higher cutoff points also showed only a small difference between the two models.
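
For readers unfamiliar with the comparison measures named above, the following is a minimal sketch of how accuracy, sensitivity, specificity and AUC can be computed from predicted probabilities. It is an illustration only, not the analysis code used in this study: the function names, the example labels and the example scores are all hypothetical, and the 0.5 cutoff is an assumed default.

```python
def classification_metrics(y_true, scores, cutoff=0.5):
    """Accuracy, sensitivity and specificity at a single probability cutoff.

    y_true: 0/1 labels (1 = positive, e.g. seropositive animal).
    scores: predicted probabilities of the positive class from any model.
    """
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        pred = 1 if s >= cutoff else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1 and pred == 0:
            fn += 1
        elif y == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


def auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive is scored
    above a randomly chosen negative (ties count as one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]

metrics = classification_metrics(y_true, scores, cutoff=0.5)
area = auc(y_true, scores)
print(metrics, area)
```

Applying the same two functions to the predicted probabilities from a BLR model and from an LDA model, and repeating `classification_metrics` over several cutoff values, would give a like-for-like comparison of the kind summarised above.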