We consider how to support, over a wide-area network, a large number of users whose interests are characterized by range top-k continuous queries. Given an object update, we need...
We present a new Bi-level LSH algorithm to perform approximate k-nearest neighbor search in high-dimensional spaces. Our formulation is based on a two-level scheme. In the first ...
Previous work has introduced probability distributions as first-class components in uncertain stream database systems. A missing element is how accurate the...
With the exponential growth in the amount of data generated in recent years, there is a pressing need for applying machine learning algorithms to large data sets. ...
We consider the approximate string membership checking (ASMC) problem of extracting all the strings or substrings in a document that approximately match some string in a given ...
Accurate query performance prediction (QPP) is central to effective resource management, query optimization and query scheduling. Analytical cost models, used in current genera...
Advances in object tracking technologies have led to huge volumes of spatio-temporal data collected in the form of trajectory data streams. In this study, we investigate the pro...
Lu An Tang, Yu Zheng, Jing Yuan, Jiawei Han, Alice...
Dimensionality reduction is essential in text mining since the dimensionality of text documents could easily reach several tens of thousands. Most recent efforts on dimensionali...
Uncertainties in data arise for a number of reasons: when the data set is incomplete, contains conflicting information or has been deliberately perturbed or coarsened to remov...
Graham Cormode, Divesh Srivastava, Entong Shen, Ti...
“Big Data” in map-reduce (M-R) clusters is often fundamentally temporal in nature, as are many analytics tasks over such data. For instance, display advertising uses Behavio...
Badrish Chandramouli, Jonathan Goldstein, Songyun ...