Sciweavers

KDD
1997
ACM

Anytime Exploratory Data Analysis for Massive Data Sets

Exploratory data analysis is inherently an iterative, interactive endeavor. In the context of massive data sets, however, many current data analysis algorithms do not scale well enough to permit interaction on a human time-scale. In this paper, “anytime data analysis” is proposed as a general framework to enable exploratory data analysis of massive data sets. Anytime data analysis takes into account not only the quality of the model being fit but also the resources (time and memory) used to achieve that fit. The framework is discussed in some detail for interactive multivariate density estimation. Out-of-sample log-likelihood and model combination techniques (such as stacking) are used to greedily explore the data landscape. The method is applied to two significant scientific data sets, where it is shown that it can be better to combine multiple “cheap-to-construct” models than to spend the same time optimizing the parameters of a single more complex model.
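To make the idea of combining cheap density models concrete, here is a minimal sketch of stacked density estimation scored by out-of-sample log-likelihood. It is not the authors' implementation: the one-dimensional synthetic data, the choice of Gaussian kernel density estimates as the cheap models, the bandwidth values, the fold count, and every function name (kde_density, held_out_densities, stacking_weights) are assumptions made for illustration only.

```python
import numpy as np

def kde_density(train, x, bandwidth):
    """Gaussian kernel density estimate fit on `train`, evaluated at `x`."""
    z = (x[:, None] - train[None, :]) / bandwidth
    norm = len(train) * bandwidth * np.sqrt(2.0 * np.pi)
    return np.exp(-0.5 * z ** 2).sum(axis=1) / norm

def held_out_densities(data, bandwidths, n_folds=5):
    """Column m holds each point's density under model m, fit without that point's fold."""
    folds = np.arange(len(data)) % n_folds
    out = np.empty((len(data), len(bandwidths)))
    for k in range(n_folds):
        test = folds == k
        for m, h in enumerate(bandwidths):
            out[test, m] = kde_density(data[~test], data[test], h)
    return out

def stacking_weights(dens, n_iter=200):
    """EM updates for simplex weights w that maximize sum(log(dens @ w))."""
    w = np.full(dens.shape[1], 1.0 / dens.shape[1])
    for _ in range(n_iter):
        resp = dens * w                           # unnormalized responsibilities
        resp /= resp.sum(axis=1, keepdims=True)   # normalize per data point
        w = resp.mean(axis=0)                     # updated combination weights
    return w

# Assumed synthetic sample: one narrow mode and one wide mode.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2.0, 0.2, 300), rng.normal(2.0, 1.5, 300)])
rng.shuffle(data)  # shuffle so the modulo fold assignment is effectively random

bandwidths = [0.1, 0.4, 1.5]                  # three cheap candidate models (assumed)
dens = held_out_densities(data, bandwidths)
w = stacking_weights(dens)

single_ll = np.log(dens).mean(axis=0)         # held-out log-likelihood per model
stacked_ll = np.log(dens @ w).mean()          # held-out log-likelihood of the combination
print("weights:", w)
print("single models:", single_ll, "stacked:", stacked_ll)
```

Because the weights are fitted on the same held-out densities used to score the combination, stacked_ll is slightly optimistic; a separate validation split would give a cleaner comparison. In an anytime setting one might add candidate models one at a time and stop when the time or memory budget runs out, keeping the best combination found so far.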
Padhraic Smyth, David Wolpert
Added 08 Aug 2010
Updated 08 Aug 2010
Type Conference
Year 1997
Where KDD
Authors Padhraic Smyth, David Wolpert