Informative sampling for large unbalanced data sets

Selective sampling is a form of active learning that can reduce the cost of training by drawing only informative data points into the training set. A training set selected this way is expected to contain more information for modeling than one obtained by random sampling, which makes modeling faster and more accurate. We introduce a novel approach to selective sampling derived from the Estimation-Exploration Algorithm (EEA). The EEA is a coevolutionary algorithm that uses model disagreement to determine the significance of a training datum and evolves a set of models only on the selected data. The algorithm in this paper trains a population of Artificial Neural Networks (ANNs) on the training set and uses their disagreement to seek new data for the training set. A medical data set, the National Trauma Data Bank (NTDB), is used to test the algorithm. Experiments show that the algorithm outperforms the equivalent algorithm using randomly selected data and sampling evenly from each cl...
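A minimal sketch of disagreement-driven selective sampling, in the spirit of query-by-committee, may help make the loop concrete. It is not the authors' implementation: the paper coevolves a population of ANNs under the EEA, whereas here a committee of small logistic-regression models, each fitted on a bootstrap resample, is swapped in as a stand-in. Disagreement is assumed to be the variance of the members' predicted probabilities. All names (`selective_sampling`, `train_committee`, `disagreement`) and parameters (`n_seed`, `n_rounds`, `per_round`) are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical types: a data point is a tuple of floats, a labelled item is
# (point, class label 0/1), and a model is a logistic-regression weight list.
Point = Tuple[float, ...]
Item = Tuple[Point, int]
Model = List[float]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: Model, x: Point) -> float:
    # Weights for each feature come first; the bias is the last entry.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])


def train_logreg(data: Sequence[Item], seed: int, epochs: int = 200,
                 lr: float = 0.1) -> Model:
    # Stand-in for one evolved ANN: logistic regression fitted by SGD.
    rng = random.Random(seed)
    dim = len(data[0][0])
    w = [rng.uniform(-0.5, 0.5) for _ in range(dim + 1)]
    for _ in range(epochs):
        for x, y in data:
            err = predict(w, x) - y  # gradient of log-loss w.r.t. the logit
            for j in range(dim):
                w[j] -= lr * err * x[j]
            w[dim] -= lr * err
    return w


def train_committee(train: Sequence[Item], size: int = 5,
                    seed: int = 0) -> List[Model]:
    # Each member is fitted on a bootstrap resample so that members can disagree.
    rng = random.Random(seed)
    committee = []
    for m in range(size):
        boot = [rng.choice(train) for _ in train]
        committee.append(train_logreg(boot, seed=seed + m))
    return committee


def disagreement(committee: Sequence[Model], x: Point) -> float:
    # Variance of the members' predicted probabilities at x.
    preds = [predict(w, x) for w in committee]
    mean = sum(preds) / len(preds)
    return sum((p - mean) ** 2 for p in preds) / len(preds)


def selective_sampling(pool: Sequence[Item], n_seed: int = 4,
                       n_rounds: int = 5, per_round: int = 2,
                       seed: int = 0) -> Tuple[List[Item], List[Model]]:
    # Start from a small random training set, then repeatedly move the pool
    # items with the highest committee disagreement into it.
    rng = random.Random(seed)
    remaining = list(pool)
    rng.shuffle(remaining)
    train, remaining = remaining[:n_seed], remaining[n_seed:]
    for r in range(n_rounds):
        committee = train_committee(train, seed=seed + 100 * r)
        remaining.sort(key=lambda item: disagreement(committee, item[0]),
                       reverse=True)
        train, remaining = train + remaining[:per_round], remaining[per_round:]
    # Final committee trained only on the selected data.
    return train, train_committee(train, seed=seed)


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic unbalanced data: 95 majority-class points, 5 minority-class points.
    data = [((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(95)]
    data += [((rng.gauss(3, 0.5), rng.gauss(3, 0.5)), 1) for _ in range(5)]
    train, committee = selective_sampling(data)
    print(len(train))  # 4 seed points + 5 rounds * 2 = 14
```

To mirror the two baselines named in the abstract, the `sort` step could be replaced by a random shuffle (random sampling) or by drawing equal numbers of items from each class. The synthetic data above is illustrative only and says nothing about the NTDB results.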

Zhenyu Lu, Anand I. Rughani, Bruce I. Tranmer, Josh Bongard

Algorithms | GECCO 2008 | Optimization | Selective Sampling | Training Set |

» The impact of sample imbalance on identifying differentially expressed genes

» Learning to match and cluster large high-dimensional data sets for data integration

» Rules of Thumb for Information Acquisition from Large and Redundant Data

» Acoustic modeling using an extended phone set considering cross-lingual pronunciation varia...

» Predicting protein-protein interactions in unbalanced data using the primary structure of p...

» A summarization approach for Affymetrix GeneChip data using a reference training set from ...

» A parallel distributed algorithm for relational frequent pattern discovery from very large...

» Semi-supervised Learning from Unbalanced Labeled Data: An Improvement

Post Info

Added	09 Nov 2010
Updated	09 Nov 2010
Type	Conference
Year	2008
Where	GECCO
Authors	Zhenyu Lu, Anand I. Rughani, Bruce I. Tranmer, Josh Bongard
