Sciweavers

529 search results - page 20 / 106
» Optimizing the distribution of large data sets in theory and...
COMPGEOM
2004
ACM
14 years 1 month ago
Locality-sensitive hashing scheme based on p-stable distributions
We present a novel Locality-Sensitive Hashing scheme for the Approximate Nearest Neighbor Problem under l_p norm, based on p-stable distributions. Our scheme improves the running...
Mayur Datar, Nicole Immorlica, Piotr Indyk, Vahab ...
SC
2003
ACM
14 years 28 days ago
Optimizing Reduction Computations In a Distributed Environment
We investigate runtime strategies for data-intensive applications that involve generalized reductions on large, distributed datasets. Our set of strategies includes replicated fi...
Tahsin M. Kurç, Feng Lee, Gagan Agrawal, ...
EDBT
2011
ACM
12 years 11 months ago
Data integration with dependent sources
Data integration systems offer users a uniform interface to a set of data sources. Previous work has typically assumed that the data sources are independent of each other; however...
Anish Das Sarma, Xin Luna Dong, Alon Y. Halevy
BMCBI
2007
13 years 7 months ago
Detecting differential expression in microarray data: comparison of optimal procedures
Background: Many procedures for finding differentially expressed genes in microarray data are based on classical or modified t-statistics. Due to multiple testing considerations, ...
Elena Perelman, Alexander Ploner, Stefano Calza, Y...
KDD
2001
ACM
14 years 8 months ago
The "DGX" distribution for mining massive, skewed data
Skewed distributions appear very often in practice. Unfortunately, the traditional Zipf distribution often fails to model them well. In this paper, we propose a new probability di...
Zhiqiang Bi, Christos Faloutsos, Flip Korn