The immense volume of data resulting from DNA microarray experiments, accompanied by an increase in the number of publications discussing gene-related discoveries, presents a major data analysis challenge. Current methods for genome-wide analysis of expression data typically rely on cluster analysis of gene expression patterns. Clustering indeed reveals potentially meaningful relationships among genes, but cannot explain the underlying biological mechanisms. In an attempt to address this problem, we have developed a new approach for utilizing the literature in order to establish functional relationships among genes on a genome-wide scale. Our method is based on revealing coherent themes within the literature, using a similarity-based search in document space. Content-based relationships among abstracts are then translated into functional connections among genes. We describe preliminary experiments applying our algorithm to a database of documents discussing yeast genes. A comparison of the produced result...
Hagit Shatkay, Stephen Edwards, W. John Wilbur, Ma
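The two steps named in the abstract (a similarity-based search in document space, then translating relationships among abstracts into connections among genes) can be illustrated with a minimal sketch. This is not the authors' algorithm: TF-IDF weighting with cosine similarity stands in for the similarity search, and shared-neighbour counting stands in for the translation step. All function names, the parameters `k` and `min_overlap`, the gene labels `geneA`, `geneB`, `geneC`, and the toy abstracts are hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter

# Toy corpus (hypothetical text, not real abstracts).
TOY_DOCS = [
    "geneA regulates glucose transport membrane",
    "geneB glucose transport membrane protein",
    "geneC ribosome assembly rna processing",
    "glucose transport across the membrane",
    "ribosome rna processing assembly factors",
]
# Assumption: each gene is represented by one "kernel" abstract (index into TOY_DOCS).
TOY_KERNELS = {"geneA": 0, "geneB": 1, "geneC": 2}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    df = Counter()
    for c in counts:
        df.update(c.keys())  # document frequency: one count per document
    return [
        {t: tf * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, tf in c.items()}
        for c in counts
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def neighbourhood(kernel_idx, vectors, k):
    """Indices of the (at most) k documents most similar to the kernel.

    The kernel itself is included; documents with zero similarity are dropped.
    """
    scored = [(cosine(vectors[kernel_idx], v), i) for i, v in enumerate(vectors)]
    scored = [s for s in scored if s[0] > 0.0]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return {i for _, i in scored[:k]}


def related_genes(gene_to_kernel, docs, k=3, min_overlap=2):
    """Link two genes when their document neighbourhoods share enough abstracts."""
    vectors = tfidf_vectors(docs)
    hoods = {g: neighbourhood(i, vectors, k) for g, i in gene_to_kernel.items()}
    genes = sorted(hoods)
    pairs = []
    for pos, a in enumerate(genes):
        for b in genes[pos + 1:]:
            shared = len(hoods[a] & hoods[b])
            if shared >= min_overlap:
                pairs.append((a, b, shared))
    return pairs


if __name__ == "__main__":
    print(related_genes(TOY_KERNELS, TOY_DOCS))
```

On this toy corpus, the only pair linked is `geneA` and `geneB`, whose neighbourhoods both consist of the three glucose-transport abstracts. The point of the design is that a link is justified by the shared documents themselves, which is what lets a literature-based connection carry an explanation that expression clustering alone does not.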