Abstract—MapReduce is emerging as a generic parallel programming paradigm for large clusters of machines. This trend combined with the growing need to run machine learning (ML) a...
Amol Ghoting, Rajasekar Krishnamurthy, Edwin P. D....
The problem of finding locally dense components of a graph is an important primitive in data analysis, with wide-ranging applications from community mining to spam detection and ...
-- The MapReduce programming model, introduced by Google, has become popular over the past few years as a mechanism for processing large amounts of data, using sharednothing parall...
Sriram Krishnan, Chaitanya K. Baru, Christopher J....
This paper presents a MapReduce algorithm for computing pairwise document similarity in large document collections. MapReduce is an attractive framework because it allows us to de...
Joins are essential for many data analysis tasks, but are not supported directly by the MapReduce paradigm. While there has been progress on equi-joins, implementation of join alg...