The problem of assessing the significance of data mining results on high-dimensional 0?1 data sets has been studied extensively in the literature. For problems such as mining freq...
Aristides Gionis, Heikki Mannila, Panayiotis Tsapa...
Machine learning and data mining can be effectively used to model, classify and discover interesting information for a wide variety of data including email. The Email Mining Toolk...
Currently, large-scale projects are underway to perform whole genome disease association studies. Such studies involve the genotyping of hundreds of thousands of SNP markers. One o...
A collaborative filtering system at an e-commerce site or similar service uses data about aggregate user behavior to make recommendations tailored to specific user interests. We d...
There is currently considerable enthusiasm around the MapReduce (MR) paradigm for large-scale data analysis [17]. Although the basic control flow of this framework has existed in ...
Andrew Pavlo, Erik Paulson, Alexander Rasin, Danie...