Although Bayesian model averaging is theoretically the optimal method for combining learned models, it has seen very little use in machine learning. In this paper we study its application to combining rule sets, and compare it with bagging and partitioning, two popular but more ad hoc alternatives. Our experiments show that, surprisingly, Bayesian model averaging's error rates are consistently higher than the other methods'. Further investigation shows that this is due to Bayesian model averaging's marked tendency to overfit, contradicting previous beliefs that it solves (or avoids) the overfitting problem.
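For reference, the following is the standard Bayesian model averaging formulation rather than notation taken from this paper: the combined prediction for class $c$ on example $x$ weights each candidate model $m$ in the model space $\mathcal{M}$ by its posterior probability given the training data $D$,

\[
P(c \mid x, D) \;=\; \sum_{m \in \mathcal{M}} P(c \mid x, m)\, P(m \mid D),
\]

whereas bagging instead averages the predictions of models learned from bootstrap replicates of $D$ with uniform weights.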