The PAC-learning model is distribution-independent in the sense that the learner must reach the learning goal from a limited number of labeled random examples without any prior knowledge of the underlying domain distribution. To achieve this, one needs generalization error bounds that hold uniformly over all domain distributions. These bounds are (almost) tight in the sense that there exists a domain distribution under which the generalization error cannot be made significantly smaller than the general bound. Note, however, that this leaves open the possibility of achieving the learning goal faster when the underlying distribution is “simple”. Informally speaking, we say a PAC-learner L is “smart” if, for a “vast majority” of domain distributions D, L does not require significantly more examples to reach the “learning goal” than the best learner whose strategy is specialized to D. In this paper, focusing on sample complexity and ignoring computational issues, we show...
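For concreteness regarding the uniform bounds and their tightness mentioned above, a minimal illustration in standard notation (the VC dimension $d$ and the bound itself are classical facts from the PAC literature, recalled here for orientation rather than taken from this paper) is the following: for a concept class of VC dimension $d$,
\[
m(\varepsilon,\delta) \;=\; O\!\left(\frac{d\,\log(1/\varepsilon) + \log(1/\delta)}{\varepsilon}\right)
\]
labeled examples suffice for a consistent learner to output, with probability at least $1-\delta$ and for every domain distribution, a hypothesis with generalization error at most $\varepsilon$; conversely, some worst-case domain distribution forces $\Omega\!\left((d + \log(1/\delta))/\varepsilon\right)$ examples, which is the sense in which the distribution-free bound is (almost) tight.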