Bayesian Model Averaging (BMA) is well known for improving predictive accuracy by averaging inferences over all models in the model space. However, Markov chain Monte Carlo (MCMC) sampling, the standard implementation of BMA, encounters difficulties even in relatively simple model spaces. We introduce a methodology that couples minimum message length (MML) with MCMC, which not only addresses these difficulties but also brings additional benefits. The MML principle mathematically embodies Occam’s razor by assigning penalized prior probabilities to more complex models. We illustrate the methodology with a mixture component model example (clustering) and show that our approach produces more interpretable results than Green’s popular reversible jump technique for sampling across model sub-spaces. We also find that BMA prediction based on sampling across multiple sub-spaces of different complexity yields substantially better predictions than the single best (shortest message length) model.
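
To make the contrast between model averaging and single-model prediction concrete, the following is a minimal sketch, not the paper's sampler. It assumes that each candidate model's total message length (in nats) is the negative log of its joint probability with the data, so that posterior model weights are proportional to exp(-length). The function names, message lengths, and per-model predictions are hypothetical values chosen for illustration, and the models are enumerated directly rather than sampled by MCMC.

```python
import math


def bma_weights(message_lengths):
    """Turn message lengths (nats) into normalised model weights.

    Assumption: weight is proportional to exp(-message length).
    """
    shortest = min(message_lengths)
    # Subtract the shortest length before exponentiating for numerical stability.
    unnormalised = [math.exp(-(length - shortest)) for length in message_lengths]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


def bma_predict(message_lengths, predictions):
    """Average per-model predictions using the message-length weights."""
    weights = bma_weights(message_lengths)
    return sum(w * p for w, p in zip(weights, predictions))


# Hypothetical message lengths for mixtures with 1, 2 and 3 components,
# and each model's (hypothetical) predictive probability for one new point.
lengths = [105.0, 100.0, 101.0]
preds = [0.20, 0.60, 0.70]

weights = bma_weights(lengths)
averaged = bma_predict(lengths, preds)

# Prediction from the single shortest-message-length model, for comparison.
best_single = preds[lengths.index(min(lengths))]
```

In this toy setting the three-component model is only one nat longer than the shortest model, so it keeps a non-negligible weight and pulls the averaged prediction away from `best_single`. Selecting only the shortest model discards that contribution.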