In earlier work we have introduced and explored a variety of different probabilistic models for the problem of answering selectivity queries posed to large sparse binary data set...
Record linkage is the problem of identifying similar records across different data sources. The similarity between two records is defined based on domain-specific similarity functi...
The cortical folding patterns are very different from one individual to another. Here we try to find folding patterns automatically using large-scale datasets by non-supervised cl...
Organizing Web search results into a hierarchy of topics and subtopics facilitates browsing the collection and locating results of interest. In this paper, we propose a new hierar...
Mean-Shift (MS) is a powerful non-parametric clustering method. Although good accuracy can be achieved, its computational cost is particularly expensive even on moderate data sets...