A frozen 18.5 million page snapshot of part of the Web has been created to enable and encourage meaningful and reproducible evaluation of Web search systems and techniques. This c...
David Hawking, Nick Craswell, Paul B. Thistlewaite...
A Bloom filter is a simple space-efficient randomized data structure for representing a set in order to support membership queries. Although Bloom filters allow false positives, f...
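A minimal sketch of the data structure described in this abstract, not the paper's own construction: a bit array plus several hash functions, where membership queries may return false positives but never false negatives. The class name `BloomFilter`, the parameters `num_bits` and `num_hashes`, and the use of seeded SHA-256 as the hash family are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter over strings (hypothetical helper class)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # num_hashes must stay below 256 because the seed is encoded as one byte.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by prefixing the item with a seed byte.
        data = item.encode("utf-8")
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        # Set every bit position selected for this item.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True only if all selected bits are set; this may be a false positive.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


bf = BloomFilter()
bf.add("alpha")
bf.add("beta")
print("alpha" in bf)  # an added item is always reported as present
```

Items that were added are always reported as present. An item that was never added is usually reported as absent, but it can collide with bits set by other items, which is the false-positive case the abstract mentions.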
This paper introduces Clustera, an integrated computation and data management system. In contrast to traditional cluster management systems that target specific types of workloads,...
David J. DeWitt, Erik Paulson, Eric Robinson, Jeff...
Developing high-quality information extraction (IE) rules, or extractors, is an iterative and primarily manual process, extremely time consuming, and error prone. In each iteratio...
Bin Liu 0002, Laura Chiticariu, Vivian Chu, H. V. ...
A key problem in using the output of an auditory model as the input to a machine-learning system in a machine-hearing application is to find a good feature-extraction layer. For ...