We empirically evaluate several state-of-theart methods for constructing ensembles of heterogeneous classifiers with stacking and show that they perform (at best) comparably to selecting the best classifier from the ensemble by cross validation. We then propose a new method for stacking, that uses multi-response model trees at the meta-level, and show that it clearly outperforms existing stacking approaches and selecting the best classifier by cross validation.