We introduce a robust and efficient framework called CLUMP (CLustering Using Multiple Prototypes) for unsupervised discovery of structure in data. CLUMP relies on finding multip...
We begin with pervasive ultrametricity due to high dimensionality and/or spatial sparsity. How the extent or degree of ultrametricity can be quantified leads us to the discussion of va...
Latent semantic indexing (LSI) is an application of a numerical method called singular value decomposition (SVD), which discovers latent semantics in documents by creating c...
We present a method for automated topic suggestion. Given a plain-text input document, our algorithm produces a ranking of novel topics that could enrich the input document in a m...
Class membership probability estimates are important for many applications of data mining in which classification outputs are combined with other sources of information for decisi...