A genetic programming method is investigated for optimizing both the architecture and the connection weights of multilayer feedforward neural networks. The genotype of each network is represented as a tree whose depth and width are dynamically adapted to the particular application by specifically defined genetic operators. The weights are trained by a next-ascent hillclimbing search. A new fitness function is proposed that quantifies the principle of Occam's razor. It makes an optimal trade-off between the error-fitting ability and the parsimony of the network. We discuss the results for two problems of differing complexity and study the convergence and scaling properties of the algorithm.
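As a rough illustration of the weight-training step, the following Python sketch shows next-ascent hillclimbing over a flat weight vector: weights are perturbed one at a time, and the first perturbation that improves fitness is accepted immediately, rather than examining all neighbors and taking the best as steepest-ascent would. The perturbation scale, fitness signature, and stopping rule here are illustrative assumptions, not the paper's exact procedure.

```python
def next_ascent_hillclimb(weights, fitness, step=0.1, max_sweeps=100):
    """Next-ascent hillclimbing: accept the FIRST improving neighbor.

    weights    -- list of floats (the network's connection weights)
    fitness    -- callable scoring a weight list (higher is better);
                  its exact form is an assumption here
    step       -- perturbation magnitude (illustrative choice)
    max_sweeps -- give up after this many passes over the weights
    """
    best = fitness(weights)
    for _ in range(max_sweeps):
        improved = False
        for i in range(len(weights)):
            for delta in (+step, -step):
                weights[i] += delta
                score = fitness(weights)
                if score > best:
                    best = score
                    improved = True
                    break            # next-ascent: keep the first improvement
                weights[i] -= delta  # revert the failed move
        if not improved:             # full sweep with no gain: local optimum
            break
    return weights, best
```

A steepest-ascent variant would evaluate all 2·len(weights) neighbors before every move and pick the best; next-ascent trades per-move quality for far fewer fitness evaluations per accepted step, which matters when the fitness requires a full pass over the training data.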