Enhancing Text Analysis via Dimensionality Reduction



Many applications require analyzing vast amounts of textual data, but the size and inherent noise of such data can make processing very challenging. One approach to these issues is to mathematically reduce the data so as to represent each document using only a few dimensions. Techniques for performing such “dimensionality reduction” (DR) have been well studied for geometric and numerical data, but more rarely applied to text. In this paper, we examine the impact of five DR techniques on the accuracy of two supervised classifiers on three textual sources. This task mirrors important real-world problems, such as classifying web pages or scientific articles. In addition, the accuracy serves as a proxy measure for how well each DR technique preserves the inter-document relationships while vastly reducing the size of the data, facilitating more sophisticated analysis. We show that, for a fixed number of dimensions, DR can be very successful at improving accuracy compared to using t...
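The idea of representing each document with only a few dimensions and then classifying in the reduced space can be sketched in a few lines. The visible part of the abstract does not name the five DR techniques or the two classifiers, so the sketch below uses stand-ins chosen purely for illustration: Gaussian random projection as the DR step and a nearest-centroid rule as the classifier. The toy documents, labels, and all function names are assumptions, not taken from the paper.

```python
import math
import random
from collections import Counter


def bag_of_words(docs):
    """Map each document to a term-count vector over a shared vocabulary."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for d in docs:
        vec = [0.0] * len(vocab)
        for w, c in Counter(d.lower().split()).items():
            vec[index[w]] = float(c)
        vectors.append(vec)
    return vocab, vectors


def random_projection(vectors, k, seed=0):
    """Project n-dimensional vectors down to k dimensions.

    Illustrative stand-in for a DR technique: multiplies each vector by a
    k x n matrix of Gaussian entries scaled by 1/sqrt(k).
    """
    rng = random.Random(seed)
    n = len(vectors[0])
    scale = 1.0 / math.sqrt(k)
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(k)]
    return [[sum(r * x for r, x in zip(row, v)) for row in matrix] for v in vectors]


def nearest_centroid(train, labels, query):
    """Return the label whose mean training vector is closest to the query."""
    sums, counts = {}, Counter(labels)
    for vec, lab in zip(train, labels):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
    best, best_dist = None, float("inf")
    for lab, acc in sums.items():
        centroid = [x / counts[lab] for x in acc]
        dist = math.sqrt(sum((c - q) ** 2 for c, q in zip(centroid, query)))
        if dist < best_dist:
            best, best_dist = lab, dist
    return best


# Hypothetical toy corpus, for illustration only.
docs = [
    "stocks fell as markets closed",
    "markets rallied as stocks rose",
    "the team won the final match",
    "the coach praised the team",
]
labels = ["finance", "finance", "sports", "sports"]

vocab, full = bag_of_words(docs)         # one dimension per distinct term
reduced = random_projection(full, k=3)   # every document now has 3 dimensions

print(len(vocab), "->", len(reduced[0]), "dimensions per document")
print(nearest_centroid(full, labels, full[0]))
```

Comparing a classifier's accuracy on `full` against its accuracy on `reduced` for held-out documents is the kind of measurement the abstract describes; a corpus this small is far too tiny to say anything about which representation performs better.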

David G. Underhill, Luke McDowell, David J. Marchette, Jeffrey L. Solka


DR Techniques | Information Retrieval | IRI 2007 | Textual Data | Five DR Techniques


» Dimensionality Reduction via Genetic Value Clustering

» A unified framework for generalized Linear Discriminant Analysis

» Supervised Exponential Family Principal Component Analysis via Convex Optimization

» Transfer Learning via Dimensionality Reduction

» Random projection in dimensionality reduction applications to image and text data

» On the Effects of Dimensionality Reduction on High Dimensional Similarity Search

» SetOriented Dimension Reduction Localizing Principal Component Analysis Via Hidden Markov ...

» Speckle Reduction and Contrast Enhancement of Echocardiograms via Multiscale Nonlinear Pro...

Post Info

Added	03 Jun 2010
Updated	03 Jun 2010
Type	Conference
Year	2007
Where	IRI
Authors	David G. Underhill, Luke McDowell, David J. Marchette, Jeffrey L. Solka
