We show that if the closure of a function class $F$ under the metric induced by some probability distribution is not convex, then the sample complexity for agnostically learning $F$ with squared loss (using only hypotheses in $F$) is $\Omega(\ln(1/\delta)/\epsilon^2)$, where $1-\delta$ is the probability of success and $\epsilon$ is the required accuracy. In comparison, if the class $F$ is convex and has finite pseudo-dimension, then the sample complexity is $O\left(\frac{1}{\epsilon}\left(\ln\frac{1}{\epsilon} + \ln\frac{1}{\delta}\right)\right)$. If a non-convex class $F$ has finite pseudo-dimension, then the sample complexity for agnostically learning the closure of the convex hull of $F$ is $O\left(\frac{1}{\epsilon}\left(\frac{1}{\epsilon}\ln\frac{1}{\epsilon} + \ln\frac{1}{\delta}\right)\right)$. Hence, for agnostic learning, learning the convex hull provides better approximation capabilities with little sample complexity penalty.
Wee Sun Lee, Peter L. Bartlett, Robert C. Williamson
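To give a rough feel for how the three rates in the abstract compare, the following sketch evaluates each expression with every hidden constant set to 1 and the dependence on the pseudo-dimension dropped. Both simplifications are assumptions made purely for illustration, and the function names are hypothetical; the numbers indicate growth rates only, not actual sample sizes.

```python
import math


def lower_bound_nonconvex(eps: float, delta: float) -> float:
    """Rate ln(1/delta) / eps^2 for a class whose closure is not convex.

    Assumption: the constant hidden by the Omega notation is taken to be 1.
    """
    return math.log(1.0 / delta) / eps**2


def upper_bound_convex(eps: float, delta: float) -> float:
    """Rate (1/eps) * (ln(1/eps) + ln(1/delta)) for a convex class.

    Assumption: constants and pseudo-dimension factors are taken to be 1.
    """
    return (1.0 / eps) * (math.log(1.0 / eps) + math.log(1.0 / delta))


def upper_bound_convex_hull(eps: float, delta: float) -> float:
    """Rate (1/eps) * ((1/eps) * ln(1/eps) + ln(1/delta)) for the closure
    of the convex hull of a non-convex class.

    Assumption: constants and pseudo-dimension factors are taken to be 1.
    """
    return (1.0 / eps) * ((1.0 / eps) * math.log(1.0 / eps) + math.log(1.0 / delta))


if __name__ == "__main__":
    delta = 0.05
    for eps in (0.1, 0.01, 0.001):
        print(
            f"eps={eps:g}: "
            f"non-convex lower ~ {lower_bound_nonconvex(eps, delta):.3g}, "
            f"convex upper ~ {upper_bound_convex(eps, delta):.3g}, "
            f"hull upper ~ {upper_bound_convex_hull(eps, delta):.3g}"
        )
```

Under these simplifications, the ratio of the convex hull rate to the non-convex lower bound works out to $\ln(1/\epsilon)/\ln(1/\delta) + \epsilon$, so the two differ by only a logarithmic factor, which is one way to read the "little sample complexity penalty" in the last sentence of the abstract.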