A Bloom filter is a simple, space-efficient, randomized data structure for concisely representing a static data set, in order to support approximate membership querie...
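To make "approximate membership query" concrete, here is a minimal Python sketch of a textbook Bloom filter. It is not taken from the paper summarized above: the class name `BloomFilter`, the defaults `num_bits=1024` and `num_hashes=3`, and the use of salted SHA-256 as the hash family are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter; names and parameters are hypothetical."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch readable; a real filter would pack bits.
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        # Derive num_hashes positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
for word in ("alpha", "beta", "gamma"):
    bf.add(word)
print(bf.might_contain("alpha"))
```

An inserted item is always reported as possibly present, so there are no false negatives. An item that was never inserted can still be reported as present if its positions collide with bits already set, which is the sense in which the query is approximate.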
Data cleaning is an important process that has been at the center of research interest in recent years. An important end goal of effective data cleaning is to identify the relatio...
Sudipto Guha, Nick Koudas, Amit Marathe, Divesh Sr...
The quality of an information retrieval system heavily depends on its retrieval function, which returns a similarity measurement between the query and each document in the collect...
Indexes for large collections are often divided into shards that are distributed across multiple computers and searched in parallel to provide rapid interactive search. Typically,...
We study the problem of testing isomorphism (equivalence up to relabelling of the variables) of two Boolean functions f, g : {0, 1}^n → {0, 1}. Our main focus is on the most stud...
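The notion of isomorphism used here can be illustrated with a brute-force check. The Python sketch below is not the testing algorithm studied in the paper: it tries every relabelling of the variables and evaluates both functions on all 2^n inputs, so it is exact but exponential and only practical for very small n. The function name `are_isomorphic` and the convention that a function takes a tuple of bits are assumptions made for illustration.

```python
from itertools import permutations, product


def are_isomorphic(f, g, n):
    """Return True if f(x) == g(x relabelled by some permutation) for all x.

    f and g take a tuple of n bits and return 0 or 1 (hypothetical convention).
    """
    inputs = list(product((0, 1), repeat=n))
    for perm in permutations(range(n)):
        # Relabel variables: position i of g's input reads variable perm[i] of x.
        if all(f(x) == g(tuple(x[perm[i]] for i in range(n))) for x in inputs):
            return True
    return False


# f is "x0 AND NOT x1"; g is the same function with the two variables swapped.
f = lambda x: x[0] & (1 - x[1])
g = lambda x: x[1] & (1 - x[0])
print(are_isomorphic(f, g, 2))
```

A property tester, by contrast, may query f and g on only a few inputs and only needs to distinguish isomorphic pairs from pairs that are far from isomorphic.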