ECAI
2004
Springer

Avoiding Data Overfitting in Scientific Discovery: Experiments in Functional Genomics

Functional genomics is a typical scientific discovery domain, characterized by a very large number of attributes (genes) relative to the number of examples (observations). The danger of overfitting the data is acute in such domains. This work presents an approach that helps avoid overfitting in supervised inductive learning of short rules suitable for human interpretation. The approach is based on the subgroup discovery rule learning framework, enhanced by methods that restrict the hypothesis search space by exploiting the relevancy of both the features that enter the rule construction process and the feature combinations that form the rules. A multi-class functional genomics problem, classifying fourteen cancer types from more than 16,000 gene expression values, illustrates the methodology.
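As a rough illustration of restricting the search space by feature relevancy, the sketch below drops any binary feature that is dominated by another one. It assumes one possible notion of relevancy (a feature is irrelevant if some other feature covers at least the same positive examples and no additional negative examples); the abstract does not spell out the criteria the authors actually use, so this is an assumption. The function name, feature names, and example indices are all hypothetical.

```python
from typing import Dict, List, Set


def filter_dominated_features(
    covers: Dict[str, Set[int]],
    positives: Set[int],
) -> List[str]:
    """Return the features that are not dominated by any other feature.

    covers maps a feature name to the set of example indices it covers.
    positives is the set of indices of target-class examples; every other
    index is treated as a negative example.

    ASSUMPTION (not taken from the abstract): feature g dominates f when
    g covers every positive that f covers and covers no negative that f
    does not cover. Among features with identical coverage, only the
    first one listed is kept.
    """
    names = list(covers)
    kept: List[str] = []
    for i, f in enumerate(names):
        f_pos = covers[f] & positives
        f_neg = covers[f] - positives
        dominated = False
        for j, g in enumerate(names):
            if i == j:
                continue
            g_pos = covers[g] & positives
            g_neg = covers[g] - positives
            if f_pos <= g_pos and g_neg <= f_neg:
                identical = f_pos == g_pos and f_neg == g_neg
                # Break ties between identical features by list position.
                if not identical or j < i:
                    dominated = True
                    break
        if not dominated:
            kept.append(f)
    return kept


# Toy data: examples 0-2 are the target class, 3-4 are not.
toy_covers = {
    "geneA>high": {0, 1, 3},
    "geneB>high": {0, 1, 2},
    "geneC<low": {0, 3, 4},
    "geneD>high": {0, 1, 2},
}
relevant = filter_dominated_features(toy_covers, positives={0, 1, 2})
print(relevant)
```

With many thousands of gene-derived features and few examples, a pruning step of this kind shrinks the set of candidate rule conditions before any rule search begins, which is one way to limit the opportunity for overfitting.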
Dragan Gamberger, Nada Lavrac
Added 20 Aug 2010
Updated 20 Aug 2010
Type Conference
Year 2004
Where ECAI