Several pieces of information can be used to improve the efficiency of similarity searches on high-dimensional data. The most commonly used information...
We consider the problem of efficiently sampling Web search engine query results. In turn, using a small random sample instead of the full set of results leads to efficient approxi...
Aris Anagnostopoulos, Andrei Z. Broder, David Carm...
Abstract. We analyze the expected cost of a greedy active learning algorithm. Our analysis extends previous work to a more general setting in which different queries have differe...
Computing shortest paths between two given nodes is a fundamental operation over graphs, but it is known to be nontrivial over large disk-resident instances of graph data. While a numbe...
Andrey Gubichev, Srikanta J. Bedathur, Stephan Seu...
Abstract. We study tolerant linearity testing under general distributions. Given groups G and H, a distribution µ on G, and oracle access to a function f : G → H, we consider th...