This paper presents a new probabilistic model for image annotation. Our model, which we call sLDA-bin, extends the supervised Latent Dirichlet Allocation (sLDA) model to handle the multivariate binary response variable of annotation data. Unlike correspondence LDA (cLDA), which ties each caption word to a single image region, the association model in sLDA allows each caption word to be associated with more than one image region. It is therefore better suited to annotation words that describe the scene as a whole. Modeling the response variable as a multivariate Bernoulli, we introduce a tight convex variational bound on the logistic function and derive an efficient variational inference algorithm based on the mean-field approximation. On an image annotation task, our model compares favorably with cLDA, achieving a higher caption prediction probability.
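The abstract does not name the bound it uses. A standard choice for Bernoulli responses with a logistic link is the Jaakkola-Jordan quadratic bound, log σ(x) ≥ log σ(ξ) + (x − ξ)/2 − λ(ξ)(x² − ξ²) with λ(ξ) = tanh(ξ/2)/(4ξ), which is tight at x = ±ξ. The Python sketch below assumes that bound, which is not necessarily the authors' exact construction. Here `x` stands for the linear predictor of one caption word and `xi` is the variational parameter; all function names are hypothetical.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def jj_lambda(xi: float) -> float:
    """lambda(xi) = tanh(xi / 2) / (4 xi), with its limit 1/8 at xi = 0."""
    if abs(xi) < 1e-8:
        return 0.125
    return math.tanh(xi / 2.0) / (4.0 * xi)


def log_sigmoid_lower_bound(x: float, xi: float) -> float:
    """Assumed Jaakkola-Jordan lower bound on log sigma(x); quadratic in x."""
    lam = jj_lambda(xi)
    return log_sigmoid(xi) + (x - xi) / 2.0 - lam * (x * x - xi * xi)


def bernoulli_loglik_lower_bound(y: int, x: float, xi: float) -> float:
    """Lower bound on log p(y | x) for y in {0, 1} under a logistic link.

    log p(y | x) = log sigma(s * x) with s = 2y - 1, and the bound depends
    on x only through s * x and x**2, so one xi serves both outcomes.
    """
    s = 2 * y - 1
    return log_sigmoid_lower_bound(s * x, xi)


if __name__ == "__main__":
    for xi in (0.5, 2.0):
        for x in (-3.0, -xi, 0.0, xi, 3.0):
            print(xi, x, log_sigmoid(x), log_sigmoid_lower_bound(x, xi))
```

Because the bound is quadratic in `x`, its expectation under a mean-field posterior needs only the first two moments of `x`, which is what keeps the corresponding variational updates in closed form.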