Abstract. This research examines the cause of code growth (bloat) in genetic programming (GP). Currently there are three hypothesized causes of code growth in GP: protection, drift, and removal bias. We show that single node mutations increase code growth in evolving programs. This is strong evidence that the protective hypothesis is correct. We also show a negative correlation between the size of the branch removed during crossover and the resulting change in fitness, but a much weaker correlation for added branches. These results support the removal bias hypothesis, but seem to refute the drift hypothesis. Our results also suggest that there are serious disadvantages to the tree structured programs commonly evolved with GP, because the nodes near the root are effectively fixed in the very early generations.
Terence Soule, Robert B. Heckendorn