We empirically evaluate several state-of-the-art methods for constructing ensembles of classifiers with stacking and show that they perform, at best, comparably to selecting the best classifier from the ensemble by cross-validation. We then propose a new stacking method that uses multi-response model trees at the meta-level, and show that it outperforms both existing stacking approaches and selecting the best classifier from the ensemble by cross-validation.
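
To make the baseline concrete, the following is a minimal sketch of selecting the best classifier from an ensemble by cross-validation. It is an illustration under stated assumptions, not the experimental setup of this work: the two base learners (`majority_learner`, `nearest_neighbor_learner`), the helper names `cv_accuracy` and `select_best`, and the toy data are all hypothetical stand-ins chosen to keep the example self-contained.

```python
import random
from collections import Counter

# Hypothetical toy base learners: each takes training data and returns a predict function.
def majority_learner(X, y):
    label = Counter(y).most_common(1)[0][0]
    return lambda x: label

def nearest_neighbor_learner(X, y):
    def predict(x):
        # Squared Euclidean distance to every training point; return the closest label.
        dists = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in X]
        return y[dists.index(min(dists))]
    return predict

def cv_accuracy(learner, X, y, k=5, seed=0):
    """Estimate accuracy of `learner` by k-fold cross-validation."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = learner([X[i] for i in train], [y[i] for i in train])
        correct += sum(model(X[i]) == y[i] for i in fold)
    return correct / len(X)

def select_best(learners, X, y, k=5):
    """Pick the learner with the highest cross-validated accuracy, then retrain it on all data."""
    scores = {name: cv_accuracy(fn, X, y, k) for name, fn in learners.items()}
    best = max(scores, key=scores.get)
    return best, learners[best](X, y), scores

# Assumed toy data: two well-separated clusters, one per class.
X = [(i * 0.1, 0.0) for i in range(10)] + [(5.0 + i * 0.1, 5.0) for i in range(10)]
y = [0] * 10 + [1] * 10

learners = {"majority": majority_learner, "nearest_neighbor": nearest_neighbor_learner}
best_name, best_model, scores = select_best(learners, X, y)
```

Stacking differs from this baseline in that it does not discard the other classifiers: their predictions become the inputs to a meta-level learner instead.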