This paper explores unexpected results that lie at the intersection of two common themes in the KDD community: large datasets and the goal of building compact models. Experiments with many different datasets and several model construction algorithms (including tree learning algorithms such as c4.5 with three different pruning methods, and rule learning algorithms such as c4.5rules and ripper) show that increasing the amount of data used to build a model often results in a linear increase in model size, even when that additional complexity results in no significant increase in model accuracy. Despite the promise of better parameter estimation held out by large datasets, as a practical matter, models built with large amounts of data are often needlessly complex and cumbersome. In the case of decision trees, the cause of this pathology is identified as a bias inherent in several common pruning techniques. Pruning errors made low in the tree, where there is insufficient data to make accurate pa...
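The following is a minimal sketch, not the paper's C4.5/c4.5rules/ripper setup, of the kind of measurement described above: grow a pruned decision tree on increasing amounts of training data and record both model size (node count) and held-out accuracy. It uses scikit-learn's CART-style DecisionTreeClassifier on synthetic data as a stand-in; the dataset, training-set sizes, and pruning parameter are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative synthetic data; the paper's experiments used many real datasets.
X, y = make_classification(n_samples=60_000, n_features=20, n_informative=8,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=10_000,
                                                    random_state=0)

for n in (1_000, 2_000, 5_000, 10_000, 20_000, 50_000):
    # ccp_alpha (cost-complexity pruning) stands in for a pruning method;
    # C4.5's error-based pruning is not available in scikit-learn.
    tree = DecisionTreeClassifier(ccp_alpha=1e-4, random_state=0)
    tree.fit(X_train[:n], y_train[:n])
    print(f"n={n:6d}  nodes={tree.tree_.node_count:5d}  "
          f"test accuracy={tree.score(X_test, y_test):.3f}")
```

If the behavior described in the abstract holds, the node count keeps growing roughly linearly with the training set size while held-out accuracy plateaus well before the largest training sets are reached.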