The identification of cis-regulatory elements and modules is an important step in understanding the regulation of genes. We have developed a pipeline capable of running multiple motif prediction methods on a whole-genome scale. Using gene expression datasets to identify co-expressed genes and the Ensembl Compara database to identify orthologues, we assemble input sequence sets comprising the upstream regions of a target gene, its orthologues, and its co-expressed genes, on the premise that such genes will share promoters through evolutionary conservation (orthologues) or share regulatory control mechanisms (co-expressed genes). Co-expressed genes are identified by an approach that combines Pearson distances from multiple gene expression datasets, derived from multiple experimental approaches, and that is calibrated against the Gene Ontology (GO) database. Our pipeline runs a number of established motif detection algorithms with a range of parameter settings on each input sequence set. We integrate the diverse result sets by scoring motifs with a method-...
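The co-expression step described above can be illustrated with a minimal sketch. It assumes that each expression dataset is a mapping from gene identifier to an expression profile, that the Pearson distance is taken as 1 minus the Pearson correlation, and that per-dataset distances are combined as a weighted mean. The function names, the data layout, and the uniform default weights are hypothetical; in particular, the weights only stand in for the GO-based calibration, which is not reproduced here.

```python
from math import sqrt


def pearson_distance(x, y):
    """Return 1 - Pearson correlation for two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption: a flat profile carries no signal, so treat it as uncorrelated.
        return 1.0
    return 1.0 - cov / sqrt(var_x * var_y)


def combined_distance(gene_a, gene_b, datasets, weights=None):
    """Weighted mean of per-dataset Pearson distances.

    Datasets that lack either gene are skipped. Returns None when no
    dataset contains both genes.
    """
    if weights is None:
        # Assumption: uniform weights; a calibrated scheme would replace these.
        weights = [1.0] * len(datasets)
    total = 0.0
    weight_sum = 0.0
    for data, weight in zip(datasets, weights):
        if gene_a in data and gene_b in data:
            total += weight * pearson_distance(data[gene_a], data[gene_b])
            weight_sum += weight
    if weight_sum == 0:
        return None
    return total / weight_sum


def rank_coexpressed(target, datasets, weights=None, top_n=10):
    """Return up to top_n (gene, distance) pairs, closest to the target first."""
    candidates = set()
    for data in datasets:
        candidates.update(data)
    candidates.discard(target)
    scored = []
    for gene in candidates:
        distance = combined_distance(target, gene, datasets, weights)
        if distance is not None:
            scored.append((distance, gene))
    scored.sort()
    return [(gene, distance) for distance, gene in scored[:top_n]]
```

Calling `rank_coexpressed("A", datasets)` on a list of such mappings returns the genes whose profiles track gene `A` most closely across the datasets that contain both genes. Averaging only over datasets where both genes are present is one simple way to handle uneven gene coverage across experiments; other choices are possible.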
Asim S. Siddiqui, Gordon Robertson, Misha Bilenky,