Do unbalanced data have a negative effect on LDA?

For two-class discrimination, Ref. [1] claimed that, when the covariance matrices of the two classes are unequal, a (class) unbalanced dataset has a negative effect on the performance of linear discriminant analysis (LDA). By re-balancing 10 real-world datasets, Ref. [1] provided empirical evidence to support the claim, using AUC (Area Under the receiver operating characteristic Curve) as the performance metric. We suggest that such a claim is vague if not misleading, that no solid theoretical analysis is presented in [1], and that AUC can lead to a conclusion quite different from that of the misclassification error rate (ER) regarding the discrimination performance of LDA on unbalanced datasets. Our empirical and simulation studies suggest that, for LDA, the increase in the median AUC (and thus the improvement in the performance of LDA) from re-balancing is relatively small, while, in contrast, the increase in the median ER (and thus the decline in the performance of LDA) from re-balancing is...
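The kind of comparison the abstract describes can be sketched with a small simulation. The code below is an illustrative sketch only, not the authors' experiment: the class means, standard deviations, sample sizes, the choice of random undersampling as the re-balancing method, and all function names are assumptions made for the example. It fits two-class LDA once on an unbalanced training set and once on a re-balanced one, then reports AUC and ER for both on a test set that keeps the original class imbalance. It performs a single run, whereas the abstract refers to medians over repeated studies.

```python
import math
import random


def sample_class(rng, n, mean, std):
    """Draw n points from a 2-D Gaussian with independent coordinates."""
    return [(rng.gauss(mean[0], std[0]), rng.gauss(mean[1], std[1]))
            for _ in range(n)]


def mean_and_scatter(points):
    """Return the mean and the (xx, xy, yy) sums of squared deviations."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    return (mx, my), (sxx, sxy, syy)


def fit_lda(x0, x1):
    """Fit two-class LDA; predict class 1 when w . x + bias > 0."""
    m0, s0 = mean_and_scatter(x0)
    m1, s1 = mean_and_scatter(x1)
    dof = len(x0) + len(x1) - 2
    # Pooled covariance [[a, b], [b, d]]; it depends on the class proportions
    # whenever the two class covariances differ.
    a, b, d = ((u + v) / dof for u, v in zip(s0, s1))
    det = a * d - b * b
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    w = ((d * dx - b * dy) / det, (a * dy - b * dx) / det)
    mid = ((m0[0] + m1[0]) / 2, (m0[1] + m1[1]) / 2)
    # The log prior ratio is estimated from the training class sizes.
    bias = -(w[0] * mid[0] + w[1] * mid[1]) + math.log(len(x1) / len(x0))
    return w, bias


def score(model, p):
    """Linear discriminant score of point p under the fitted model."""
    w, bias = model
    return w[0] * p[0] + w[1] * p[1] + bias


def auc(pos_scores, neg_scores):
    """Mann-Whitney estimate of AUC; ties count one half."""
    wins = 0.0
    for sp in pos_scores:
        for sn in neg_scores:
            wins += 1.0 if sp > sn else 0.5 if sp == sn else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


def error_rate(pos_scores, neg_scores):
    """Fraction misclassified when class 1 is predicted for score > 0."""
    wrong = sum(s <= 0 for s in pos_scores) + sum(s > 0 for s in neg_scores)
    return wrong / (len(pos_scores) + len(neg_scores))


def run_experiment(seed=0, n_major=900, n_minor=100):
    """Compare LDA trained on unbalanced vs. undersampled data (one run)."""
    rng = random.Random(seed)
    params0 = ((0.0, 0.0), (1.0, 1.0))  # majority class: mean, std (assumed)
    params1 = ((2.0, 1.0), (2.0, 0.5))  # minority class: unequal covariance
    train0 = sample_class(rng, n_major, *params0)
    train1 = sample_class(rng, n_minor, *params1)
    test0 = sample_class(rng, n_major, *params0)  # test set stays unbalanced
    test1 = sample_class(rng, n_minor, *params1)

    models = {
        "unbalanced": fit_lda(train0, train1),
        # Assumed re-balancing: randomly undersample the majority class.
        "re-balanced": fit_lda(rng.sample(train0, n_minor), train1),
    }
    results = {}
    for name, model in models.items():
        pos = [score(model, p) for p in test1]
        neg = [score(model, p) for p in test0]
        results[name] = {"AUC": auc(pos, neg), "ER": error_rate(pos, neg)}
    return results


if __name__ == "__main__":
    for name, metrics in run_experiment().items():
        print(name, metrics)
```

Re-balancing changes the fitted rule in two ways here: the pooled covariance (and so the direction w) shifts because the class covariances differ, and the log prior ratio in the bias moves to zero. AUC is sensitive only to the first change, since it ignores the threshold, while ER is sensitive to both, which is one reason the two metrics need not agree.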

Jing-Hao Xue, D. Mike Titterington

Covariance Matrices | Performance | PR 2008 | Unbalanced Datasets |

Post Info

Added	14 Dec 2010
Updated	14 Dec 2010
Type	Journal
Year	2008
Where	PR
Authors	Jing-Hao Xue, D. Mike Titterington
