Entity matching (a.k.a. record linkage) plays a crucial role in integrating multiple data sources, and numerous matching solutions have been developed. However, the solutions have...
Warren Shen, Pedro DeRose, Long Vu, AnHai Doan, Ra...
Background: Agglomerative hierarchical clustering (AHC) is a common unsupervised data analysis technique used in several biological applications. Standard AHC methods require that...
This paper addresses a fundamental and challenging problem with broad applications: efficient processing of region-based promotion queries, i.e., to discover the top-k most inter...
Large scale learning is often realistic only in a semi-supervised setting where a small set of labeled examples is available together with a large collection of unlabeled data. In...
In many Web applications, such as blog classification and newsgroup classification, labeled data are in short supply. It often happens that obtaining labeled data in a new domain ...