Unsupervised Feature Selection for Text Data

Feature selection for unsupervised tasks is particularly challenging, especially when dealing with text data. The growth in online documents and email communication creates a need for tools that can operate without user supervision. In this paper we look at novel feature selection techniques that address this need. A distributional similarity measure from information theory is applied to measure feature utility. This utility informs the search for features that are both representative and diverse in two complementary ways: CLUSTER partitions the entire feature space and then selects one feature to represent each cluster, while GREEDY grows the feature subset one greedily selected feature at a time. In particular, we found that GREEDY's local search is suited to learning smaller feature subsets, while CLUSTER is able to improve the global quality of larger feature sets. Experiments with four email data sets show significant improvement in retrieval accuracy with nearest ne...
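
The two strategies named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which distributional similarity measure is used, so Jensen-Shannon divergence is assumed here, and the scoring rule in `greedy_select`, the medoid-style partitioning in `cluster_select`, and all function names are hypothetical choices made only for illustration. Each feature is assumed to be described by its counts over documents and to occur at least once.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2, range 0..1) between two distributions."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def similarity_matrix(features):
    """features maps a term to its counts over documents (assumed non-zero total).

    Returns the sorted term names and a nested dict of similarities,
    where similarity = 1 - JS divergence (an assumed choice of measure).
    """
    names = sorted(features)
    dists = {n: [c / sum(features[n]) for c in features[n]] for n in names}
    sim = {a: {b: 1.0 - js_divergence(dists[a], dists[b]) for b in names}
           for a in names}
    return names, sim

def greedy_select(names, sim, k):
    """Grow the subset one feature at a time (illustrative scoring rule).

    A candidate is rewarded for being similar to the feature space as a
    whole (representative) and penalised for being similar to a feature
    that is already selected (not diverse).
    """
    def representativeness(f):
        return sum(sim[f][g] for g in names) / len(names)

    selected = [max(names, key=representativeness)]
    while len(selected) < k:
        candidates = [f for f in names if f not in selected]
        selected.append(max(
            candidates,
            key=lambda f: representativeness(f) - max(sim[f][s] for s in selected),
        ))
    return selected

def cluster_select(names, sim, k, iterations=10):
    """Partition all features into k clusters, then keep one feature per cluster.

    Uses a simple medoid-style loop seeded by greedy_select (an assumption;
    any clustering of the feature space could be substituted).
    """
    medoids = greedy_select(names, sim, k)
    for _ in range(iterations):
        clusters = {m: [m] for m in medoids}  # each medoid stays in its own cluster
        for f in names:
            if f not in clusters:
                clusters[max(medoids, key=lambda m: sim[f][m])].append(f)
        # The representative of a cluster is its most central member.
        new_medoids = [max(members, key=lambda c: sum(sim[c][g] for g in members))
                       for members in clusters.values()]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return medoids
```

Under these assumptions, calling `similarity_matrix` on a term-by-document count table and passing the result to either selector returns `k` term names. The greedy variant only ever compares a candidate against the features chosen so far, whereas the cluster variant looks at the whole feature space before committing to any representative, which mirrors the local versus global contrast drawn in the abstract.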

Nirmalie Wiratunga, Robert Lothian, Stewart Massie

Automated Reasoning | EWCBR 2006 | Feature Selection | Feature Selection Techniques | Feature Subset Sizes

Post Info

Added	22 Aug 2010
Updated	22 Aug 2010
Type	Conference
Year	2006
Where	EWCBR
Authors	Nirmalie Wiratunga, Robert Lothian, Stewart Massie
