IRI
2007
IEEE

Enhancing Text Analysis via Dimensionality Reduction

Many applications require analyzing vast amounts of textual data, but the size and inherent noise of such data can make processing very challenging. One approach to these issues is to mathematically reduce the data so that each document is represented using only a few dimensions. Techniques for performing such “dimensionality reduction” (DR) have been well studied for geometric and numerical data, but have more rarely been applied to text. In this paper, we examine the impact of five DR techniques on the accuracy of two supervised classifiers on three textual sources. This task mirrors important real-world problems, such as classifying web pages or scientific articles. In addition, the accuracy serves as a proxy measure for how well each DR technique preserves the inter-document relationships while vastly reducing the size of the data, which facilitates more sophisticated analysis. We show that, for a fixed number of dimensions, DR can be very successful at improving accuracy compared to using t...
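The pipeline described in the abstract (represent each document with a few dimensions, then check how well a supervised classifier does on the reduced data) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the excerpt does not name the five DR techniques or the two classifiers, so Gaussian random projection and a nearest-centroid classifier are used as simple stand-ins, and the function names `bag_of_words`, `random_projection`, and `nearest_centroid` are hypothetical.

```python
import math
import random


def bag_of_words(docs):
    """Turn each document into a term-count vector over a shared vocabulary."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for d in docs:
        v = [0.0] * len(vocab)
        for w in d.lower().split():
            v[index[w]] += 1.0
        vectors.append(v)
    return vectors, vocab


def random_projection(vectors, k, seed=0):
    """Stand-in DR step (assumption, not the paper's method):
    project each vector onto k random Gaussian directions."""
    rng = random.Random(seed)
    n_terms = len(vectors[0])
    matrix = [
        [rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(n_terms)]
        for _ in range(k)
    ]
    return [[sum(r * x for r, x in zip(row, v)) for row in matrix] for v in vectors]


def nearest_centroid(train, labels, query):
    """Stand-in classifier (assumption): return the label whose mean
    training vector is closest to the query in squared Euclidean distance."""
    sums, counts = {}, {}
    for v, y in zip(train, labels):
        acc = sums.setdefault(y, [0.0] * len(v))
        for i, x in enumerate(v):
            acc[i] += x
        counts[y] = counts.get(y, 0) + 1

    def sq_dist(y):
        return sum((s / counts[y] - q) ** 2 for s, q in zip(sums[y], query))

    return min(sums, key=sq_dist)


if __name__ == "__main__":
    docs = ["web page about sports", "sports news page", "protein folding article"]
    labels = ["web", "web", "science"]
    vectors, vocab = bag_of_words(docs)
    reduced = random_projection(vectors, k=2, seed=1)
    print(len(vocab), "terms reduced to", len(reduced[0]), "dimensions")
    print(nearest_centroid(reduced, labels, reduced[2]))
```

In a real evaluation the classifier would be scored on held-out documents, once on the original term vectors and once on the reduced ones, so that the two accuracies can be compared for the same number of dimensions.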
Added 03 Jun 2010
Updated 03 Jun 2010
Type Conference
Year 2007
Where IRI
Authors David G. Underhill, Luke McDowell, David J. Marchette, Jeffrey L. Solka