We present a simple and scalable algorithm for clustering tens of millions of phrases and use the resulting clusters as features in discriminative classifiers. To demonstrate the ...
Many existing spectral clustering algorithms share a conventional graph partitioning criterion: normalized cuts (NC). However, one problem with NC is that it poorly captures the g...
The ultimate goal of data mining is to extract knowledge from massive data. Knowledge is ideally represented as human-comprehensible patterns from which end-users can gain intuiti...
Ambiguous person names are a problem in many forms of written text, including text found on the Web. In this paper, we explore the use of unsupervised clustering techniques...
Clustering using the Hilbert-Schmidt independence criterion (CLUHSIC) is a recent clustering algorithm that maximizes the dependence between cluster labels and data observations ac...
Wenliang Zhong, Weike Pan, James T. Kwok, Ivor W. ...