This paper proposes an algorithm called Imprecise Spectrum Analysis (ISA) to carry out fast dimension reduction for document classification. ISA is designed based on the one-sided...
Hu Guan, Bin Xiao, Jingyu Zhou, Minyi Guo, Tao Yan...
Data mining techniques frequently find a large number of patterns or rules, which make it very difficult for a human analyst to interpret the results and to find the truly interes...
Kaidi Zhao, Bing Liu, Thomas M. Tirpak, Weimin Xia...
We consider the problem of finding a few representatives for a dataset, i.e., a subset of data points that efficiently describes the entire dataset. We assume that each data poi...
A framework is presented for discovering partial duplicates in large collections of scanned books with optical character recognition (OCR) errors. Each book in the collection is r...
We present a class of richly structured, undirected hidden variable models suitable for simultaneously modeling text along with other attributes encoded in different modalities. O...