We consider the problem of learning in multilayer feed-forward networks of linear threshold units. We show that the Vapnik-Chervonenkis dimension of the class of functions that can be computed by a two-layer threshold network with real inputs is at least proportional to the number of weights in the network. This result also holds for a large class of two-layer networks with binary inputs, and a large class of three-layer networks with real inputs. In Valiant's probably approximately correct (PAC) learning framework, this implies that the number of examples necessary for learning in these networks is at least linear in the number of weights. This lower bound is within a log factor of the known upper bound.
Peter L. Bartlett
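
In schematic form (the notation here, with $W$ the number of weights, $\epsilon$ the accuracy parameter, and $m$ the sample size, is assumed for illustration and not taken from the abstract), the chain of results can be summarized as follows. The main theorem gives
\[
  \mathrm{VCdim}(F) \;\ge\; c\,W \quad \text{for some constant } c > 0,
\]
and combining this with the standard PAC sample-complexity lower bound in terms of VC dimension (due to Ehrenfeucht, Haussler, Kearns, and Valiant) yields
\[
  m \;=\; \Omega\!\left(\frac{\mathrm{VCdim}(F)}{\epsilon}\right) \;=\; \Omega\!\left(\frac{W}{\epsilon}\right),
\]
while the known upper bound for threshold networks, $\mathrm{VCdim}(F) = O(W \log W)$ (Baum and Haussler), accounts for the log-factor gap mentioned in the final sentence of the abstract.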