This paper presents a max margin framework for image annotation and multimodal image retrieval, cast as a structured prediction model. Following the max margin approach, the image retrieval problem is formulated as a quadratic programming problem. By properly selecting a joint feature representation across modalities, our framework captures the dependency information between the modalities and avoids retraining the model from scratch when the database undergoes dynamic updates. While this framework is a general approach that can be applied to multimodal information retrieval in any domain, we evaluate it on the Berkeley Drosophila embryo image database. Experimental results show significant performance improvements over a state-of-the-art method.
Zhen Guo, Zhongfei Zhang, Eric P. Xing, Christos F