Attributes are visual concepts that can be detected by machines, understood by humans, and shared across categories. They are particularly useful for fine-grained domains where categories are closely related to one other (e.g. bird species recognition). In such scenarios, relevant attributes are often local (e.g. “white belly”), but the question of how to choose these local attributes remains largely unexplored. In this paper, we propose an interactive approach that discovers local attributes that are both discriminative and semantically meaningful from image datasets annotated only with fine-grained category labels and object bounding boxes. Our approach uses a latent conditional random field model to discover candidate attributes that are detectable and discriminative, and then employs a recommender system that selects attributes likely to be semantically meaningful. Human interaction is used to provide semantic names for the discovered attributes. We demonstrate our method o...
Kun Duan, Devi Parikh, David J. Crandall, Kristen