This work addresses the need for stateful dataflow programs that can rapidly sift through huge, evolving data sets. These data-intensive applications perform complex multi-step c...
Dionysios Logothetis, Christopher Olston, Benjamin...
DryadLINQ is a system and a set of language extensions that enable a new programming model for large scale distributed computing. It generalizes previous execution environments su...
Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Bud...
Clustering methods for data-mining problems must be extremely scalable. In addition, several data mining applications demand that the clusters obtained be balanced, i.e., be of ap...
Scientific data sets continue to increase in both size and complexity. In the past, dedicated graphics systems at supercomputing centers were required to visualize large data sets,...
Metric and kernel learning arise in several machine learning applications. However, most existing metric learning algorithms are limited to learning metrics over low-dimensional d...
Prateek Jain, Brian Kulis, Jason V. Davis, Inderji...