A methodology for automatically identifying and clustering semantic features or topics in a heterogeneous text collection is presented. Textual data is encoded using a low rank no...
Farial Shahnaz, Michael W. Berry, V. Paul Pauca, R...
Automatically generated content is ubiquitous in the web: dynamic sites built using the three-tier paradigm are good examples (e.g. commercial sites, blogs and other sites powered...
In order to support the navigation in huge document collections efficiently, tagged hierarchical structures can be used. Often, multiple tags are used to describe resources. For u...
Text classification poses some specific challenges. One such challenge is its high dimensionality where each document (data point) contains only a small subset of them. In this pap...
The task of identifying redundant information in documents that are generated from multiple sources provides a significant challenge for summarization and QA systems. Traditional ...