This paper offers a novel look at using a dimensionalityreduction technique called simhash [8] to detect similar document pairs in large-scale collections. We show that this algo...
Protecting data privacy is an important problem in microdata distribution. Anonymization algorithms typically aim to protect individual privacy, with minimal impact on the quality...
Kristen LeFevre, David J. DeWitt, Raghu Ramakrishn...
Next to prediction accuracy, the interpretability of models is one of the fundamental criteria for machine learning algorithms. While high accuracy learners have intensively been e...
—We introduce a simple image descriptor referred to as the image signature. We show, within the theoretical framework of sparse signal mixing, that this quantity spatially approx...
Most real-world data is heterogeneous and richly interconnected. Examples include the Web, hypertext, bibliometric data and social networks. In contrast, most statistical learning...
Lise Getoor, Nir Friedman, Daphne Koller, Benjamin...