
EWCBR
2006
Springer

Unsupervised Feature Selection for Text Data

Feature selection for unsupervised tasks is particularly challenging, especially when dealing with text data. The growth in online documents and email communication creates a need for tools that can operate without supervision from the user. In this paper, we investigate novel feature selection techniques that address this need. A distributional similarity measure from information theory is applied to measure feature utility. This utility informs the search for features that are both representative and diverse, in two complementary ways: CLUSTER partitions the entire feature space into clusters and then selects one feature to represent each cluster, while GREEDY grows the feature subset by adding one greedily selected feature at a time. In particular, we found that GREEDY's local search is suited to learning smaller feature subsets, while CLUSTER is able to improve the global quality of larger feature sets. Experiments with four email data sets show significant improvement in retrieval accuracy with nearest ne...
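The abstract does not name the similarity measure or spell out either search procedure, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes Jensen-Shannon divergence between term co-occurrence distributions as the distributional similarity measure, k-medoids as the partitioning step for CLUSTER, and a hypothetical "representativeness times diversity" score for GREEDY. All function names (`cooccurrence_distributions`, `js_divergence`, `greedy_select`, `cluster_select`) and the toy email documents are illustrative.

```python
import math
from collections import Counter


def cooccurrence_distributions(docs, vocab):
    """For each term t, estimate P(w | t) over terms w that share a document with t."""
    vocab_set = set(vocab)
    dists = {}
    for t in vocab:
        counts = Counter()
        for doc in docs:
            terms = set(doc)
            if t in terms:
                counts.update(w for w in terms if w != t and w in vocab_set)
        total = sum(counts.values())
        dists[t] = {w: c / total for w, c in counts.items()} if total else {}
    return dists


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (0 = identical, at most 1).
    Assumed choice of distributional similarity measure."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_m(a):
        return sum(v * math.log2(v / m[k]) for k, v in a.items() if v > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


def greedy_select(feats, dist, k):
    """GREEDY sketch: add one feature per step.
    Hypothetical score: representativeness x divergence from the closest selected feature."""
    if not feats or k <= 0:
        return []
    n = len(feats)
    # Representativeness: mean similarity (1 - JS) to every other feature.
    rep = {f: sum(1.0 - dist[f][g] for g in feats if g != f) / max(n - 1, 1) for f in feats}
    selected = [max(feats, key=lambda f: rep[f])]
    while len(selected) < min(k, n):
        remaining = [f for f in feats if f not in selected]
        best = max(remaining, key=lambda f: rep[f] * min(dist[f][s] for s in selected))
        selected.append(best)
    return selected


def cluster_select(feats, dist, k, iters=20):
    """CLUSTER sketch: partition all features with k-medoids (an assumed choice),
    then keep each cluster's medoid as its representative feature."""
    if not feats or k <= 0:
        return []
    medoids = list(feats[:k])  # naive deterministic initialization
    for _ in range(iters):
        clusters = {m: [] for m in medoids}
        for f in feats:
            clusters[min(medoids, key=lambda m: dist[f][m])].append(f)
        # New medoid: the member with the smallest total divergence to its cluster.
        new = [min(members, key=lambda c: sum(dist[c][o] for o in members))
               for members in clusters.values() if members]
        if set(new) == set(medoids):
            break
        medoids = new
    return sorted(medoids)


# Hypothetical toy "emails", each reduced to a list of terms.
docs = [
    ["spam", "offer", "free", "win"],
    ["spam", "free", "prize", "win"],
    ["meeting", "agenda", "project", "deadline"],
    ["project", "deadline", "report", "meeting"],
    ["offer", "free", "prize"],
]
vocab = sorted({w for d in docs for w in d})
P = cooccurrence_distributions(docs, vocab)
dist = {a: {b: js_divergence(P[a], P[b]) for b in vocab} for a in vocab}
print("GREEDY :", greedy_select(vocab, dist, 3))
print("CLUSTER:", cluster_select(vocab, dist, 3))
```

In this sketch GREEDY commits to each choice and never revisits it, which keeps it cheap when only a few features are wanted; the k-medoids step reassigns every feature on each pass, which is one plausible way to read the abstract's point that CLUSTER works on the global structure of the feature space.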
Nirmalie Wiratunga, Robert Lothian, Stewart Massie
Added 22 Aug 2010
Updated 22 Aug 2010
Type Conference
Year 2006
Where EWCBR
Authors Nirmalie Wiratunga, Robert Lothian, Stewart Massie