MM 2006, ACM

Multimodal fusion using learned text concepts for image categorization

Conventional image categorization techniques rely primarily on low-level visual cues. In this paper, we describe a multimodal fusion scheme that improves image classification accuracy by incorporating information derived from the embedded texts detected in the image under classification. For each image category, a text concept is first learned from a set of labeled texts in images of the target category using Multiple Instance Learning [1]. For an image under classification that contains multiple detected text lines, we calculate a weighted Euclidean distance between each text line and the learned text concept of the target category. The minimum distance, together with the low-level visual cues, is then used as the feature vector for SVM-based classification. Experiments on a challenging image database demonstrate that the proposed fusion framework achieves higher accuracy than state-of-the-art methods for image classification.
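A minimal sketch of the fusion step described in the abstract, assuming the text concept (a point and per-dimension weights) has already been learned with Multiple Instance Learning; all variable names, dimensions, and the SVM settings are illustrative assumptions, not the authors' implementation.

```python
# Sketch of the text-visual fusion described in the abstract (assumptions noted above).
import numpy as np
from sklearn.svm import SVC

def min_weighted_distance(text_lines, concept, weights):
    """Weighted Euclidean distance from each detected text-line feature vector
    to the learned text concept; the minimum over all lines is the text feature."""
    diffs = text_lines - concept                       # shape: (n_lines, d_text)
    dists = np.sqrt((weights * diffs ** 2).sum(axis=1))
    return dists.min()

def fuse_features(visual_feat, text_lines, concept, weights):
    """Append the minimum weighted distance to the low-level visual cues."""
    d_min = min_weighted_distance(text_lines, concept, weights)
    return np.concatenate([visual_feat, [d_min]])

# Hypothetical usage: X_vis[i] holds the visual features of image i,
# detected_lines[i] holds its detected text-line feature vectors.
# X = np.stack([fuse_features(X_vis[i], detected_lines[i], concept, weights)
#               for i in range(len(X_vis))])
# clf = SVC(kernel="rbf").fit(X, y)
```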
Qiang Zhu, Mei-Chen Yeh, Kwang-Ting Cheng
Added 14 Jun 2010
Updated 14 Jun 2010
Type Conference
Year 2006
Where MM
Authors Qiang Zhu, Mei-Chen Yeh, Kwang-Ting Cheng