— The genetic causes of many monogenic diseases have already been discovered. However, most common diseases are actually the result of complex nonlinear interactions between mult...
Classification of large datasets is an important data mining problem. Many classification algorithms have been proposed in the literature, but studies have shown that so far no al...
Johannes Gehrke, Raghu Ramakrishnan, Venkatesh Gan...
— We propose a randomized data mining method that finds clusters of spatially overlapping images. The core of the method relies on the min-Hash algorithm for fast detection of p...
Similarity search has been proved suitable for searching in very large collections of unstructured data objects. We are interested in efficient parallel query processing under si...
Large scale learning is often realistic only in a semi-supervised setting where a small set of labeled examples is available together with a large collection of unlabeled data. In...