Exponentially growing photo collections motivate the need for automatic image annotation to enable effective manipulation (e.g., search, browsing). Most prior work relies on supervised learning approaches, which are impractical due to poor performance, the out-of-vocabulary problem, and the time-consuming acquisition of training data and learning. In this work, we propose automatic image annotation by search over user-contributed photo sites (e.g., Flickr), which have accumulated rich human knowledge and billions of photos. The intuition is to leverage the surrounding tags of visually similar Flickr photos to annotate the unlabeled image. However, such tags are generally sparse and noisy. To tackle these challenges, we propose a novel three-fold solution: (1) a tag expansion method to mitigate the sparsity of user-contributed tags; (2) improved tag relevance estimation based on visual consistency between candidate annotations and the unlabeled image; and (3) semantic consistency among candidate ...
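The search-based annotation idea above can be illustrated with a minimal sketch: score candidate tags by similarity-weighted votes from an image's visual neighbors. This is a generic neighbor-voting baseline, not the paper's actual method; the function name, the `neighbors` data layout, and the toy data are all hypothetical.

```python
from collections import defaultdict

def tag_relevance_by_neighbor_voting(neighbors):
    """Score candidate tags by similarity-weighted votes from visual neighbors.

    neighbors: list of (similarity, tags) pairs, where similarity is a float
    in [0, 1] and tags is an iterable of tag strings.
    Returns a dict mapping each candidate tag to a normalized relevance score.
    """
    scores = defaultdict(float)
    total = sum(sim for sim, _ in neighbors) or 1.0
    for sim, tags in neighbors:
        for tag in set(tags):  # count each tag at most once per neighbor
            scores[tag] += sim
    return {tag: s / total for tag, s in scores.items()}

# Toy example: three visually similar photos and their user-contributed tags.
neighbors = [
    (0.9, ["beach", "sunset", "sea"]),
    (0.8, ["beach", "sand"]),
    (0.5, ["party"]),
]
relevance = tag_relevance_by_neighbor_voting(neighbors)
print(max(relevance, key=relevance.get))  # prints "beach"
```

Because user tags are sparse and noisy, such raw votes would in practice be refined by tag expansion and semantic-consistency filtering, as the abstract outlines.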
Liang-Chi Hsieh, Winston H. Hsu