Sciweavers

EUROGP
2009
Springer

A Statistical Learning Perspective of Genetic Programming

Code bloat, the excessive increase of code size, is an important issue in Genetic Programming (GP). This paper proposes a theoretical analysis of code bloat in GP from the perspective of statistical learning theory, a well-grounded mathematical toolbox for machine learning. By computing the Vapnik-Chervonenkis dimension of the family of programs that can be inferred by a specific setting of GP, it is proved that a parsimonious fitness ensures universal consistency. This means that empirical error minimization converges to the best possible error when the number of test cases goes to infinity. However, it is also proved that the standard method of putting a hard limit on the program size still results in programs of infinitely increasing size as a function of their accuracy. It is also shown that using cross-validation or hold-out to choose the complexity level that optimizes the generalization error rate also leads to bloat. So a more complicated modification of...
Nur Merve Amil, Nicolas Bredeche, Christian Gagné
Added 26 May 2010
Updated 26 May 2010
Type Conference
Year 2009
Where EUROGP
Authors Nur Merve Amil, Nicolas Bredeche, Christian Gagné, Sylvain Gelly, Marc Schoenauer, Olivier Teytaud