Background: The joint analysis of several categorical variables is a common task in many areas of biology, and is becoming central to systems biology investigations whose goal is to identify potentially complex interaction among variables belonging to a network. Interactions of arbitrary complexity are traditionally modeled in statistics by log-linear models. It is challenging to extend these to the high dimensional and potentially sparse data arising in computational biology. An important example, which provides the motivation for this article, is the analysis of so-called fulllength cDNA libraries of alternatively spliced genes, where we investigate relationships among the presence of various exons in transcript species. Results: We develop methods to perform model selection and parameter estimation in log-linear models for the analysis of sparse contingency tables, to study the interaction of two or more factors. Maximum Likelihood estimation of log-linear model coefficients might ...
Corinne Dahinden, Giovanni Parmigiani, Mark C. Eme