Dataset complexity can help to generate accurate ensembles of k-nearest neighbors



This work focuses on gene expression based cancer classification using classifier ensembles. A new ensemble method is proposed that combines the predictions of a small number of k-nearest neighbor (k-NN) classifiers by majority vote. Diversity of predictions is guaranteed by assigning each classifier a separate feature subset, randomly sampled from the original set of features. Accuracy of the k-NNs is ensured by the statistically confirmed dependence between classification error and dataset complexity, which determines how difficult a dataset is to classify. Experiments on three gene expression datasets covering different types of cancer show that our ensemble method is superior to 1) the single best classifier in the ensemble, 2) the nearest shrunken centroids method originally proposed for gene expression data, and 3) the traditional ensemble construction scheme that does not take dataset complexity into account.
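The scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the complexity measure or the rule for using it, so the sketch assumes a stand-in (leave-one-out 1-NN error on each candidate feature subset) and simply keeps the least complex subsets. All function names, parameters, and the toy data are hypothetical.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, features, k=3):
    """Classify x with k-NN, using only the given feature indices."""
    dists = sorted(
        (sum((row[f] - x[f]) ** 2 for f in features), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def loo_complexity(train_X, train_y, features):
    """Assumed stand-in for dataset complexity: leave-one-out 1-NN error
    on the data restricted to `features` (lower means easier)."""
    errors = 0
    for i, (x, y) in enumerate(zip(train_X, train_y)):
        rest_X = train_X[:i] + train_X[i + 1:]
        rest_y = train_y[:i] + train_y[i + 1:]
        errors += knn_predict(rest_X, rest_y, x, features, k=1) != y
    return errors / len(train_X)


def build_ensemble(train_X, train_y, n_members=3, subset_size=2,
                   n_candidates=10, seed=0):
    """Sample random feature subsets and keep the least complex ones."""
    rng = random.Random(seed)
    n_features = len(train_X[0])
    candidates = [
        tuple(rng.sample(range(n_features), subset_size))
        for _ in range(n_candidates)
    ]
    candidates.sort(key=lambda fs: loo_complexity(train_X, train_y, fs))
    return candidates[:n_members]


def ensemble_predict(train_X, train_y, subsets, x, k=3):
    """Majority vote over one k-NN classifier per feature subset."""
    votes = Counter(knn_predict(train_X, train_y, x, fs, k) for fs in subsets)
    return votes.most_common(1)[0][0]


# Toy data (made up): features 0 and 1 separate the classes, 2 and 3 are noise.
train_X = [
    [0.0, 0.1, 5.0, 1.0],
    [0.2, 0.0, 1.0, 4.0],
    [0.1, 0.3, 3.0, 2.0],
    [1.0, 0.9, 4.0, 1.5],
    [0.9, 1.1, 2.0, 3.5],
    [1.1, 1.0, 0.5, 2.5],
]
train_y = [0, 0, 0, 1, 1, 1]

subsets = build_ensemble(train_X, train_y)
print(subsets, ensemble_predict(train_X, train_y, subsets, [0.05, 0.1, 2.0, 2.0]))
```

Sampling a different feature subset per member is what supplies diversity, while ranking candidate subsets by the complexity score is one plausible way to favour accurate members. A real gene expression dataset would have thousands of features and would need a proper train/test split.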

Oleg Okun, Giorgio Valentini


Artificial Intelligence | Dataset Complexity | Ensemble Method | Gene Expression | IJCNN 2008 |


Post Info

Added	31 May 2010
Updated	31 May 2010
Type	Conference
Year	2008
Where	IJCNN
Authors	Oleg Okun, Giorgio Valentini
