—MapReduce has emerged as a popular tool for distributed and scalable processing of massive data sets and is increasingly being used in e-science applications. Unfortunately, the...
Benjamin Gufler, Nikolaus Augsten, Angelika Reiser...
Much of the success of the Internet services model can be attributed to the popularity of a class of workloads that we call Online Data-Intensive (OLDI) services. These workloads ...
David Meisner, Christopher M. Sadler, Luiz Andr&ea...
In this paper, we first present a new implementation of the 3-D fast curvelet transform, which is nearly 2.5 less redundant than the Curvelab (wrapping-based) implementation as o...
A report is provided for the ACM SIGKDD community about the 2010 Workshop on Algorithms for Modern Massive Data Sets (MMDS 2010), its origin in MMDS 2006 and MMDS 2008, and future...
Abstract. The Parallel Disks Model (PDM) has been proposed to alleviate the I/O bottleneck that arises in the processing of massive data sets. Sorting has been extensively studied ...
Abstract. Processing compressed strings without decompression is often essential when dealing with massive data sets. We consider local subsequence recognition problems on strings ...
Advances in interactive systems and the ability to manage increasing amounts of high-dimensional data provide new opportunities in numerous domains. Information visualization tech...
We consider the problem of extracting informative exemplars from a data stream. Examples of this problem include exemplarbased clustering and nonparametric inference such as Gauss...
The wavelet decomposition is a proven tool for constructing concise synopses of massive data sets and rapid changing data streams, which can be used to obtain fast approximate, wit...
Collaborative filtering (CF) shares information between users to provide each with recommendations. Previous work suggests using sketching techniques to handle massive data sets i...