Sciweavers

VLDB 2002 (ACM)
Generic Database Cost Models for Hierarchical Memory Systems
Accurate prediction of operator execution time is a prerequisite for database query optimization. Although extensively studied for conventional disk-based DBMSs, cost modeling in ...
Stefan Manegold, Peter A. Boncz, Martin L. Kersten
WWW 2007 (ACM)
Detecting near-duplicates for web crawling
Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrele...
Gurmeet Singh Manku, Arvind Jain, Anish Das Sarma
WWW 2005 (ACM)
LSH forest: self-tuning indexes for similarity search
We consider the problem of indexing high-dimensional data for answering (approximate) similarity-search queries. Similarity indexes prove to be important in a wide variety of sett...
Mayank Bawa, Tyson Condie, Prasanna Ganesan
KDD 2012 (ACM)
Query-driven discovery of semantically similar substructures in heterogeneous networks
Heterogeneous information networks that contain multiple types of objects and links are ubiquitous in the real world, such as bibliographic networks, cyber-physical networks, and ...
Xiao Yu, Yizhou Sun, Peixiang Zhao, Jiawei Han
NSDI 2010
MapReduce Online
MapReduce is a popular framework for data-intensive distributed computing of batch jobs. To simplify fault tolerance, many implementations of MapReduce materialize the entire outp...
Tyson Condie, Neil Conway, Peter Alvaro, Joseph M....