In bioinformatics, biochemical signal pathways can be modeled by many differential equations. It is still an open problem how to fit the huge amount of parameters of the equations to the available data. Here, the approach of systematically obtaining the most appropriate model and learning its parameters is extremely interesting. One of the most often used approaches for model selection is to choose the least complex model which “fits the needs”. For noisy measurements, the model which has the smallest mean squared error of the observed data results in a model which fits too accurately to the data – it is overfitting. Such a model will perform good on the training data, but worse on unknown data. This paper propose as model selection criterion the least complex description of the observed data by the model, the minimum description length. For the small, but important example of inflammation modeling the performance of the approach is evaluated. Keywords biochemical pathways, diff...
Rüdiger W. Brause