Cluster Storage Systems, in which storage devices are distributed across a large number of nodes, are able to reduce the I/O bottleneck problems present in most centralized storage sys...
The most basic assumption used in statistical learning theory is that training data and test data are drawn from the same underlying distribution. Unfortunately, in many applicati...
The problem of efficiently and accurately locating patterns of interest in massive time series data sets is an important and non-trivial problem in a wide variety of applications, incl...
This work investigates the accuracy and efficiency tradeoffs between centralized and collective (distributed) algorithms for (i) sampling, and (ii) n-way data analysis techniques i...
Over the last 2-3 years, the importance of data-intensive computing has increasingly been recognized, closely coupled with the emergence and popularity of map-reduce for developin...