We consider the problem of indexing high-dimensional data for answering (approximate) similarity-search queries. Similarity indexes prove to be important in a wide variety of sett...
We present a technique that masks failures in a cluster to provide high availability and fault-tolerance for long-running, parallelized dataflows. We can use these dataflows to im...
Mehul A. Shah, Joseph M. Hellerstein, Eric A. Brew...
Current methods for selectivity estimation fall into two broad categories, synopsis-based and sampling-based. Synopsis-based methods, such as histograms, incur minimal overhead at ...
Document retrieval systems conventionally use words as the basic unit of representation, a natural choice since words are primary carriers of semantic information. In this paper w...
—Hash tables are extensively used in networking to implement data-structures that associate a set of keys to a set of values, as they provide O(1), query, insert and delete opera...
Sailesh Kumar, Jonathan S. Turner, Patrick Crowley