The study of common, complex multifactorial diseases in genetic epidemiology is complicated by nonlinearity in the genotype-to-phenotype mapping that is due, in part, to epistasis, or gene-gene interactions. Symbolic discriminant analysis (SDA) is a flexible modeling approach that uses genetic programming (GP) to evolve an optimal predictive model from a predefined collection of mathematical functions, constants, and attributes. This approach has been shown to be an effective strategy for modeling epistasis. In the present study, we introduce the genetic “mask” as a novel building block that exploits expert knowledge in the form of a pre-constructed relationship between two attributes. The goal of this study was to determine whether the availability of “mask” building blocks improves SDA performance. The results of this study support the idea that pre-processing data improves GP performance.

Categories and Subject Descriptors J.3 [Computer Applications]: Life and Medic...
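To make the idea of a genetic “mask” concrete, here is a minimal sketch of one way a pre-constructed relationship between two attributes could be represented: a lookup table that collapses a pair of genotypes into a single value that a GP tree can then use as an input. The 0/1/2 genotype encoding, the 3x3 table shape, the table values, and the names `XOR_LIKE_MASK`, `apply_mask`, and `mask_feature` are all assumptions made for illustration; they are not taken from the study.

```python
from typing import List, Sequence, Tuple

Mask = Tuple[Tuple[int, ...], ...]

# Hypothetical mask: rows index the genotype of attribute A (0, 1, 2),
# columns index the genotype of attribute B (0, 1, 2).
# The values are illustrative only and do not come from the study.
XOR_LIKE_MASK: Mask = (
    (0, 1, 0),
    (1, 0, 1),
    (0, 1, 0),
)


def apply_mask(mask: Mask, a: int, b: int) -> int:
    """Look up the pre-constructed value for one pair of genotypes."""
    return mask[a][b]


def mask_feature(mask: Mask, col_a: Sequence[int], col_b: Sequence[int]) -> List[int]:
    """Collapse two attribute columns into one derived column via the mask."""
    if len(col_a) != len(col_b):
        raise ValueError("attribute columns must have the same length")
    return [apply_mask(mask, a, b) for a, b in zip(col_a, col_b)]


# Toy genotype columns for four individuals (assumed 0/1/2 encoding).
snp1 = [0, 1, 2, 1]
snp2 = [1, 1, 0, 2]
feature = mask_feature(XOR_LIKE_MASK, snp1, snp2)
```

Under these assumptions, the derived column `feature` could be offered to the GP alongside the raw attributes, so that an interaction pattern is available as a single building block rather than having to be evolved from arithmetic operators.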
Ryan J. Urbanowicz, Nate Barney, Bill C. White, Ja