GECCO
2008
Springer

Informative sampling for large unbalanced data sets

Selective sampling is a form of active learning that can reduce the cost of training by drawing only informative data points into the training set. The selected training set is expected to contain more information for modeling than a randomly sampled one, making modeling faster and more accurate. We introduce a novel approach to selective sampling, derived from the Estimation-Exploration Algorithm (EEA). The EEA is a coevolutionary algorithm that uses model disagreement to determine the significance of a training datum and evolves a set of models only on the selected data. The algorithm in this paper trains a population of artificial neural networks (ANNs) on the training set and uses their disagreement to seek new data for the training set. The algorithm is tested on a medical data set, the National Trauma Data Bank (NTDB). Experiments show that the algorithm outperforms the equivalent algorithm using randomly-selected data and sampling evenly from each cl...
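The disagreement-driven selection loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' EEA implementation: it omits the coevolutionary search entirely, replaces the ANN population with small logistic models trained from different random starts, and measures disagreement as the variance of the committee's outputs. All function names and parameters (`train_model`, `disagreement`, `select_informative`, `n_models`, `n_seed`, `n_rounds`) are hypothetical and chosen only for illustration.

```python
import math
import random


def predict(w, x):
    """Logistic output for input x; the last entry of w is the bias."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def train_model(data, rng, epochs=50, lr=0.5):
    """Fit a tiny logistic model by SGD from a random start.

    Assumption: this stands in for one ANN of the population.
    """
    dim = len(data[0][0])
    w = [rng.uniform(-1, 1) for _ in range(dim + 1)]
    for _ in range(epochs):
        for x, y in rng.sample(data, len(data)):  # shuffled pass
            err = y - predict(w, x)
            for i in range(dim):
                w[i] += lr * err * x[i]
            w[dim] += lr * err
    return w


def disagreement(models, x):
    """Variance of the committee's outputs on x (assumed measure)."""
    preds = [predict(w, x) for w in models]
    mean = sum(preds) / len(preds)
    return sum((p - mean) ** 2 for p in preds) / len(preds)


def select_informative(pool, labels, n_models=5, n_seed=4, n_rounds=10, seed=0):
    """Grow a training set by repeatedly adding the pool point on which
    the current committee disagrees most. Returns the chosen indices."""
    rng = random.Random(seed)
    indices = list(range(len(pool)))
    rng.shuffle(indices)
    chosen = indices[:n_seed]          # small random starting set
    remaining = set(indices[n_seed:])
    for _ in range(n_rounds):
        if not remaining:
            break
        train = [(pool[i], labels[i]) for i in chosen]
        models = [train_model(train, rng) for _ in range(n_models)]
        best = max(sorted(remaining), key=lambda i: disagreement(models, pool[i]))
        chosen.append(best)
        remaining.remove(best)
    return chosen


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic unbalanced data: the positive class is a small corner region.
    pool = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]
    labels = [1 if x[0] + x[1] > 1.0 else 0 for x in pool]
    picked = select_informative(pool, labels)
    print("selected indices:", picked)
    print("positives among selected:", sum(labels[i] for i in picked))
```

A random-sampling baseline, as in the comparison the abstract mentions, would simply replace the `max(...)` step with a random draw from `remaining`.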
Added 09 Nov 2010
Updated 09 Nov 2010
Type Conference
Year 2008
Where GECCO
Authors Zhenyu Lu, Anand I. Rughani, Bruce I. Tranmer, Josh Bongard