Avoiding Data Overfitting in Scientific Discovery: Experiments in Functional Genomics

Functional genomics is a typical scientific discovery domain, characterized by a very large number of attributes (genes) relative to the number of examples (observations). The danger of overfitting the data is especially acute in such domains. This work presents an approach that helps avoid overfitting in supervised inductive learning of short rules suitable for human interpretation. The approach is based on the subgroup discovery rule learning framework, enhanced by methods that restrict the hypothesis search space by exploiting the relevancy of the features that enter the rule construction process, as well as the relevancy of the feature combinations that form the rules. The methodology is illustrated on a multi-class functional genomics problem: classifying fourteen cancer types based on more than 16,000 gene expression values.
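The abstract does not spell out the relevancy criterion or the rule quality measure, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes binary features, treats a feature as irrelevant when another feature covers at least the same positive examples and no additional negative examples, and scores short conjunctive rules with TP / (FP + g). The function names, the parameter `g`, and the toy feature names are hypothetical and are not taken from the paper.

```python
from itertools import combinations

def covered(feature, examples):
    """Indices of the examples for which the binary feature is true."""
    return {i for i, ex in enumerate(examples) if ex[feature]}

def relevant_features(features, pos, neg):
    """Drop features dominated by another feature (assumed criterion).

    f is dominated by another feature if that feature covers every positive
    example f covers and covers no negative example that f does not cover.
    """
    cov = {f: (covered(f, pos), covered(f, neg)) for f in features}
    kept = []
    for i, f in enumerate(features):
        tp_f, fp_f = cov[f]
        if not tp_f:
            continue  # covers no positive example, useless for the target class
        dominated = False
        for j, other in enumerate(features):
            if i == j:
                continue
            tp_o, fp_o = cov[other]
            if tp_f <= tp_o and fp_o <= fp_f:
                identical = tp_f == tp_o and fp_f == fp_o
                if identical and i < j:
                    continue  # among identical features keep the first one
                dominated = True
                break
        if not dominated:
            kept.append(f)
    return kept

def best_short_rules(features, pos, neg, max_len=2, g=1.0, top=3):
    """Score conjunctions of at most max_len features by TP / (FP + g)."""
    scored = []
    for k in range(1, max_len + 1):
        for combo in combinations(features, k):
            tp = sum(all(ex[f] for f in combo) for ex in pos)
            fp = sum(all(ex[f] for f in combo) for ex in neg)
            if tp:
                scored.append((tp / (fp + g), combo))
    # Higher score first; on ties prefer shorter rules, then alphabetical order.
    scored.sort(key=lambda s: (-s[0], len(s[1]), s[1]))
    return scored[:top]

# Toy data: each example maps a hypothetical binarized gene feature to True/False.
pos = [
    {"a": True, "b": True, "c": False, "d": True},
    {"a": True, "b": False, "c": True, "d": False},
    {"a": True, "b": True, "c": True, "d": False},
]
neg = [
    {"a": True, "b": False, "c": False, "d": True},
    {"a": False, "b": False, "c": True, "d": False},
    {"a": False, "b": False, "c": False, "d": False},
]

kept = relevant_features(["a", "b", "c", "d"], pos, neg)
rules = best_short_rules(kept, pos, neg)
```

In this toy run, feature `d` is filtered out before any rule is built, because `a` covers a superset of its positive examples with the same negative coverage. Pruning of this kind, combined with a cap on rule length, shrinks the hypothesis space, which is the mechanism the abstract credits with reducing overfitting when genes far outnumber observations.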

Dragan Gamberger, Nada Lavrac

Artificial Intelligence | Data Overfitting | ECAI 2004 | Functional Genomics | Typical Scientific Discovery |

Added	20 Aug 2010
Updated	20 Aug 2010
Type	Conference
Year	2004
Where	ECAI
Authors	Dragan Gamberger, Nada Lavrac
