Linear Discriminant Text Classification in High Dimension



Abstract. Linear Discriminant (LD) techniques are typically used in pattern recognition tasks when there are many (n >> 10^4) datapoints in low-dimensional (d < 10^2) space. In this paper we argue on theoretical grounds that LD is in fact more appropriate when training data is sparse, and the dimension of the space is extremely high. To support this conclusion we present experimental results on a medical text classification problem of great practical importance, autocoding of adverse event reports. We trained and tested LD-based systems for a variety of classification schemes widely used in the clinical drug trial process (COSTART, WHOART, HARTS, and MedDRA) and obtained significant reduction in the rate of misclassification compared both to generic Bayesian machine-learning techniques and to the current generation of domain-specific autocoders based on string matching.

András Kornai, J. Michael Richards


Bayesian Machine-learning Techniques | HIS 2001 | HIS 2007 | Pattern Recognition Tasks | Text Classification Problem


Post Info

Added	31 Oct 2010
Updated	31 Oct 2010
Type	Conference
Year	2001
Where	HIS
Authors	András Kornai, J. Michael Richards
