We introduce a novel data mining technique for the analysis of gene expression. Gene expression is the production of the protein that a gene encodes. We focus on characterizing the expression patterns of genes based on their promoter regions. The promoter region of a gene contains short sequences called motifs, to which gene regulatory proteins may bind, thereby controlling when and in which cell types the gene is expressed. Our approach addresses two important aspects of gene expression analysis: (1) Binding of proteins at more than one motif is usually required, and several different types of proteins may need to bind several different types of motifs to confer transcriptional specificity. (2) Because proteins controlling transcription may need to interact physically, the order and spacing of motifs can also affect expression. We use association rules to address the combinatorial aspect. The association rules we employ have the ability to...
Aleksandar Icev, Carolina Ruiz, Elizabeth F. Ryder
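To make the combinatorial aspect concrete, here is a minimal sketch of plain association rules over promoter motifs. It treats each gene as a "transaction" containing the motifs in its promoter plus an expression label. A rule such as {M1, M2} => expr:muscle is then scored by support and confidence. All gene names, motif names, labels, and thresholds are hypothetical. The search is a brute-force enumeration rather than Apriori, and it captures only motif co-occurrence, not the order and spacing effects described in aspect (2). This is an illustration under those assumptions, not the authors' method.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical data: each gene maps to the motifs found in its promoter
# plus one "expr:" label naming the cell type where it is expressed.
genes = {
    "gene1": {"M1", "M2", "expr:muscle"},
    "gene2": {"M1", "M2", "M3", "expr:muscle"},
    "gene3": {"M1", "M3", "expr:neuron"},
    "gene4": {"M2", "M3", "expr:neuron"},
    "gene5": {"M1", "M2", "expr:neuron"},
}


def support(itemset, transactions):
    """Fraction of genes whose item set contains every item in itemset."""
    hits = sum(1 for items in transactions.values() if set(itemset) <= items)
    return Fraction(hits, len(transactions))


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) for the rule antecedent => consequent."""
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / support(antecedent, transactions)


def mine_rules(transactions, min_support, min_confidence, max_size=2):
    """Brute-force search for rules {motifs} => expression label.

    Enumerates every motif combination up to max_size (no Apriori pruning)
    and keeps rules that meet both thresholds.
    """
    all_items = {i for items in transactions.values() for i in items}
    motifs = sorted(i for i in all_items if not i.startswith("expr:"))
    labels = sorted(i for i in all_items if i.startswith("expr:"))
    rules = []
    for size in range(1, max_size + 1):
        for combo in combinations(motifs, size):
            antecedent = frozenset(combo)
            if support(antecedent, transactions) == 0:
                continue  # motif set never occurs together; no rule possible
            for label in labels:
                sup = support(antecedent | {label}, transactions)
                if sup < min_support:
                    continue
                conf = confidence(antecedent, {label}, transactions)
                if conf >= min_confidence:
                    rules.append((antecedent, label, sup, conf))
    return rules


for ante, label, sup, conf in mine_rules(genes, Fraction(2, 5), Fraction(2, 3)):
    print(sorted(ante), "=>", label, f"support={sup}", f"confidence={conf}")
```

With these toy data, neither M1 nor M2 alone predicts muscle expression with confidence above 1/2. The pair {M1, M2} reaches 2/3. This is the kind of "several motifs together" pattern that aspect (1) describes. Exact `Fraction` arithmetic keeps the scores free of floating-point rounding.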