Sciweavers

» Evaluation of Text Clustering Algorithms with N-Gram-Based D...
IPM
2006
13 years 7 months ago
Document clustering using nonnegative matrix factorization
A methodology for automatically identifying and clustering semantic features or topics in a heterogeneous text collection is presented. Textual data is encoded using a low rank no...
Farial Shahnaz, Michael W. Berry, V. Paul Pauca, R...
AIRWEB
2006
Springer
13 years 11 months ago
Tracking Web Spam with Hidden Style Similarity
Automatically generated content is ubiquitous in the web: dynamic sites built using the three-tier paradigm are good examples (e.g. commercial sites, blogs and other sites powered...
Tanguy Urvoy, Thomas Lavergne, Pascal Filoche
LWA
2008
13 years 9 months ago
Labeling Clusters - Tagging Resources
In order to support the navigation in huge document collections efficiently, tagged hierarchical structures can be used. Often, multiple tags are used to describe resources. For u...
Korinna Bade, Andreas Nürnberger
ICDM
2009
IEEE
13 years 5 months ago
SISC: A Text Classification Approach Using Semi Supervised Subspace Clustering
Text classification poses some specific challenges. One such challenge is its high dimensionality where each document (data point) contains only a small subset of them. In this pap...
Mohammad Salim Ahmed, Latifur Khan
COLING
2008
13 years 9 months ago
A Framework for Identifying Textual Redundancy
The task of identifying redundant information in documents that are generated from multiple sources provides a significant challenge for summarization and QA systems. Traditional ...
Kapil Thadani, Kathleen McKeown