MapReduce is emerging as an important programming model for large-scale data-parallel applications such as web indexing, data mining, and scientific simulation. Hadoop is an open-...
Matei Zaharia, Andy Konwinski, Anthony D. Joseph, ...
We focus on large graphs where nodes have attributes, such as a social network where the nodes are labelled with each person's job title. In such a setting, we want to find s...
Hanghang Tong, Christos Faloutsos, Brian Gallagher...
Many real-world clustering tasks are closely related, e.g. clustering the web pages of different universities. However, existing clustering approaches neglec...
We present the development and use of a novel distributed geohazard modeling environment for the analysis and interpretation of large-scale earthquake data sets. Our work demonstr...
We study the problem of maintaining sketches of recent elements of a data stream. Motivated by applications involving network data, we consider streams that are asynchronous, in w...