Companies providing cloud-scale services have an increasing need to store and analyze massive data sets such as search logs and click streams. For cost and performance reasons, pr...
Summaries of massive data sets support approximate query processing over the original data. A basic aggregate over a set of records is the weight of subpopulations specified as a ...
In the near future, commodity hardware is expected to incorporate both flash and magnetic disks. In this paper we study how the storage layer of a database system can benefit from...
Past work has suggested that query execution feedback can be useful in improving the quality of plans by correcting cardinality estimation errors in the query optimizer. The state...
Surajit Chaudhuri, Vivek R. Narasayya, Ravishankar...
The measure of similarity between objects is a very useful tool in many areas of computer science, including information retrieval. SimRank is a simple and intuitive measure of th...
Dmitry Lizorkin, Pavel Velikhov, Maxim N. Grinev, ...
Uncertainty pervades many domains in our lives. Current real-life applications, e.g., location tracking using GPS devices or cell phones, multimedia feature extraction, and sensor...
George Beskales, Mohamed A. Soliman, Ihab F. Ilyas
Beyond already existing huge data volumes, e-science communities face major challenges in managing the anticipated data deluge of forthcoming projects. Community-driven data grids...
The World-Wide Web consists of a huge number of unstructured documents, but it also contains structured data in the form of HTML tables. We extracted 14.1 billion HTML tables from...
Michael J. Cafarella, Alon Y. Halevy, Daisy Zhe Wa...