Measuring the causal effects of online advertising (adfx) on user behavior is important to the health of the WWW publishing industry. In this paper, using three controlled experi...
MapReduce is a computing paradigm that has gained a lot of popularity as it allows non-expert users to easily run complex analytical tasks at very large-scale. At such scale, task...
We present an automatic skew mitigation approach for userdefined MapReduce programs and present SkewTune, a system that implements this approach as a drop-in replacement for an e...
YongChul Kwon, Magdalena Balazinska, Bill Howe, Je...
Sampling is crucial for controlling resource consumption by internet traffic flow measurements. Routers use Packet Sampled NetFlow [9], and completed flow records are sampled in...
Processing large data streams is now a major topic in data management. The data involved can be truly massive, and the required analyses complex. In a stream of sequential events ...