As the size and dimensionality of data sets increase, feature selection becomes increasingly important. In this paper we demonstrate how association rules can be us...
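The abstract is truncated, but the core idea of scoring features via association rules can be illustrated. The sketch below is a minimal, assumed reading: it ranks binary features by the confidence of the rule (feature present) -> (class), with an illustrative support threshold; the thresholds and scoring heuristic are not the paper's method.

```python
# Minimal sketch: rank binary features by the confidence of the rule
# (feature = 1) -> (class = c). Thresholds and the scoring heuristic
# are illustrative assumptions, not the paper's method.
from collections import Counter

def rule_scores(X, y, min_support=0.05):
    """X: list of binary feature tuples, y: list of class labels."""
    n = len(X)
    scores = {}
    for j in range(len(X[0])):
        # rows where feature j is present (the rule's antecedent)
        antecedent = [i for i in range(n) if X[i][j] == 1]
        if len(antecedent) / n < min_support:
            continue  # rule too rare to trust
        # confidence of the best rule (feature j = 1) -> (class = c)
        class_counts = Counter(y[i] for i in antecedent)
        scores[j] = max(class_counts.values()) / len(antecedent)
    return scores

X = [(1, 0, 1), (1, 1, 0), (0, 1, 1), (1, 0, 0)]
y = ["a", "a", "b", "a"]
print(sorted(rule_scores(X, y).items(), key=lambda kv: -kv[1]))
```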
Kernel techniques have long been used in SVMs to handle linearly inseparable problems by transforming the data to a high-dimensional space, but training and testing large data sets is ...
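As a quick illustration of the kernel trick this abstract refers to, the sketch below contrasts a linear SVM with an RBF-kernel SVM on a linearly inseparable toy problem. scikit-learn's SVC is an illustrative choice, not the system described in the paper.

```python
# Sketch: a kernelized SVM separating concentric circles that no
# linear boundary can. scikit-learn is an illustrative choice only.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

linear = SVC(kernel="linear").fit(X, y)       # struggles: classes are concentric
rbf = SVC(kernel="rbf", gamma=2.0).fit(X, y)  # implicit high-dimensional mapping

print("linear accuracy:", linear.score(X, y))
print("rbf accuracy:   ", rbf.score(X, y))
```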
Identifying a subset of features that preserves classification accuracy is a problem of growing importance, because of the increasing size and dimensionality of real-world data se...
The amount of data produced by ubiquitous computing applications is quickly growing, due to the pervasive presence of small devices endowed with sensing, computing and communicatio...
Annalisa Appice, Michelangelo Ceci, Antonio Turi, ...
We present a novel approach to classification using a discretised function representation that is independent of the data locations. We construct the classifier as a su...
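The abstract cuts off, but a classifier built as a weighted sum of radial basis functions centred on a fixed grid (hence independent of the data locations) is one plausible reading. The sketch below works under that assumption; the grid, bandwidth, and least-squares fit are all illustrative choices, not the paper's construction.

```python
# Sketch: decision function as a weighted sum of Gaussian RBFs on a
# fixed grid, so the representation does not depend on the data points.
# Grid, bandwidth, and least-squares fit are illustrative assumptions.
import numpy as np

def rbf_design(X, centers, h=0.5):
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * h**2))

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = np.sign(X[:, 0] * X[:, 1])            # XOR-like labels in {-1, +1}

g = np.linspace(-1, 1, 5)                 # fixed 5x5 grid of centres
centers = np.array([(a, b) for a in g for b in g])

Phi = rbf_design(X, centers)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)  # fit the RBF weights

pred = np.sign(rbf_design(X, centers) @ w)
print("training accuracy:", (pred == y).mean())
```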
The Hadoop filesystem is a large-scale distributed filesystem used to manage and quickly process extremely large data sets. We want to utilize Hadoop to assist with data-intensive ...
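For readers unfamiliar with HDFS, the typical first step in such a workflow is staging data into the distributed filesystem. The sketch below drives the standard `hdfs dfs` CLI from Python; the file and directory paths are hypothetical placeholders, and this is not the paper's pipeline.

```python
# Sketch: staging a local file into HDFS and listing it via the
# standard `hdfs dfs` CLI. Paths are hypothetical placeholders.
import subprocess

def hdfs(*args):
    subprocess.run(["hdfs", "dfs", *args], check=True)

hdfs("-mkdir", "-p", "/user/demo/input")
hdfs("-put", "-f", "local_data.csv", "/user/demo/input/")
hdfs("-ls", "/user/demo/input")
```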
In the near future, it will be possible to continuously record and store the entire audio-visual lifetime of a person together with all digital information that the pers...
Wavelet-based methods have proven efficient for visualization at different levels of detail, progressive transmission, and compression of large data sets. The required...
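The multi-resolution idea behind such methods can be shown with the simplest wavelet, the Haar transform: each analysis step splits a signal into coarse averages (a lower level of detail) and detail coefficients that reconstruct the finer level. The pure-numpy sketch below is illustrative only, not the paper's code.

```python
# Sketch: one-dimensional Haar wavelet decomposition. Keeping only the
# coarse averages at each level gives progressively coarser views;
# the detail coefficients allow progressive refinement/transmission.
import numpy as np

def haar_step(signal):
    """One Haar analysis step: (averages, details)."""
    pairs = signal.reshape(-1, 2)
    avg = pairs.mean(axis=1)
    detail = (pairs[:, 0] - pairs[:, 1]) / 2.0
    return avg, detail

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 3.0])
levels = []
while len(x) > 1:
    x, d = haar_step(x)
    levels.append((x.copy(), d))   # coarser view + its detail band

for avg, d in levels:
    print("avg:", avg, " detail:", d)
```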
The goal of the CONTROL project at Berkeley is to develop systems for interactive analysis of large data sets. We focus on systems that provide users with iteratively refining answ...
Joseph M. Hellerstein, Ron Avnur, Vijayshankar Ram...
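The iteratively refining answers that CONTROL-style systems return can be sketched with online aggregation: scan tuples in random order and maintain a running estimate whose confidence interval tightens as more data is seen. The snippet below is an illustration of that idea only, not the Berkeley implementation.

```python
# Sketch of online aggregation: a running mean over randomly ordered
# tuples, with a rough normal-approximation 95% confidence interval
# that narrows as the scan progresses. Illustrative only.
import math
import random

random.seed(0)
data = [random.gauss(100.0, 15.0) for _ in range(100_000)]
random.shuffle(data)                     # random access order

n, mean, m2 = 0, 0.0, 0.0                # Welford's running moments
for i, x in enumerate(data, 1):
    n += 1
    delta = x - mean
    mean += delta / n
    m2 += delta * (x - mean)
    if i in (100, 1_000, 10_000, 100_000):
        half = 1.96 * math.sqrt(m2 / (n - 1) / n)   # CI half-width
        print(f"after {i:>6} tuples: mean = {mean:.2f} +/- {half:.2f}")
```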
Sequences of data-dependent tasks, each traversing large data sets, exist in many applications (such as video, image, and signal processing). Those tasks usually ...
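One common way to structure such task sequences is as chained streaming stages, so each task consumes items as they are produced rather than materializing the whole data set between stages. The generator pipeline below is an illustrative sketch of that pattern, not the paper's scheduling scheme; the stage bodies are stand-ins.

```python
# Sketch: a sequence of data-dependent tasks as chained streaming
# stages; items flow through one at a time. Stage bodies are stand-ins
# for real decode/filter/analyze work. Illustrative only.
def decode(paths):
    for p in paths:
        yield f"frame<{p}>"             # stand-in for real decoding

def filter_frames(frames):
    for f in frames:
        if "frame" in f:                # stand-in for a data-dependent test
            yield f

def analyze(frames):
    for f in frames:
        yield len(f)                    # stand-in for real analysis

paths = (f"clip_{i}.raw" for i in range(3))
for result in analyze(filter_frames(decode(paths))):
    print(result)
```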