Document classification presents difficult challenges due to the sparsity and the high dimensionality of text data, and to the complex semantics of the natural language. The tradi...
Many contemporary database applications require similarity-based retrieval of complex objects where the only usable knowledge of its domain is determined by a metric distance func...
Weijia Xu, Daniel P. Miranker, Rui Mao, Smriti R. ...
Abstract. This text is an informal review of several randomized algorithms that have appeared over the past two decades and have proved instrumental in extracting efficiently quant...
This paper investigates the use of supervised clustering in order to create sets of categories for classi cation of documents. We use information from a pre-existing taxonomy in o...