Abstract. In this paper we present a coarse-grained parallel algorithm, CONQUEST, for constructing bounded-error summaries of high-dimensional binary attributed data in a distribute...
It is increasingly common for users to interact with the web using a number of different aliases. This trend is a double-edged sword. On one hand, it is a fundamental building bloc...
We give the first optimal algorithm for estimating the number of distinct elements in a data stream, closing a long line of theoretical research on this problem begun by Flajolet...
Over the last 10–15 years, our industry has developed and deployed many large-scale Internet services, from e-commerce to social networking sites, all facing common challenges i...
Emre Kiciman, V. Benjamin Livshits, Madanlal Musuv...
In this paper, we propose a methodology to predict the popularity of online content. More precisely, rather than trying to infer the popularity of a piece of content itself, we infer the l...