A new Bayesian model is proposed, integrating dictionary learning and topic modeling into a unified framework. The model is applied to cluster multiple images, and a subset of the images may be annotated. Example results are presented on the MNIST digit data and on the Microsoft MSRC multi-scene image data. These results reveal the working mechanisms of the model and demonstrate state-of-theart performance.