We give the first optimal algorithm for estimating the number of distinct elements in a data stream, closing a long line of theoretical research on this problem begun by Flajolet...
A fundamental problem in data management is to draw a sample of a large data set, for approximate query answering, selectivity estimation, and query planning. With large, streamin...
Graham Cormode, S. Muthukrishnan, Ke Yi, Qin Zhang
The k-nearest neighbor (kNN) search problem has been investigated extensively in the past due to its broad range of applications. In this paper we study this problem in the contex...
We propose PASTE, the first differentially private aggregation algorithms for distributed time-series data that offer good practical utility without any trusted server. PASTE add...
Edit-distance-based string similarity join is a fundamental operator in string databases. Increasingly, many applications in data cleaning, data integration, and scientific compu...