—Recently, many object localization models have shown that incorporating contextual cues can greatly improve accuracy over using appearance features alone. Therefore, many of these models have explored different types of contextual sources, but only considering one level of contextual interaction at the time. Thus, what context could truly contribute to object localization, through integrating cues from all levels, simultaneously, remains an open question. Moreover, the relative importance of the different contextual levels and appearance features across different object classes remains to be explored. Here we introduce a novel framework for multiple class object localization that incorporates different levels of contextual interactions. We study contextual interactions at the pixel, region and object level based upon three different sources of context: semantic, boundary support, and contextual neighborhoods. Our framework learns a single similarity metric from multiple kernels, com...
Brian McFee, Carolina Galleguillos, Gert R. G. Lan