In this work we present a computational algorithm that combines perceptual and cognitive information during visual search for object features. The algorithm is initially driven purely by bottom-up information, but during the recognition process it becomes increasingly constrained by top-down information. Furthermore, we propose a concrete model for integrating information from successive saccades and demonstrate the necessity of using two coordinate systems for measuring feature locations. During the search process, across saccades, the network uses an object-based coordinate system, while during a fixation it uses a retinal coordinate system tied to the location of the fixation point. The only information the network stores during saccadic exploration is the identity of the features on which it has fixated and their locations with respect to the object-centered system.
Predrag Neskovic, Leon N. Cooper
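The abstract describes two coordinate systems, retinal within a fixation and object-centered across saccades, with only feature identities and object-centered locations retained between fixations. The following is a minimal sketch, not the authors' implementation, of that bookkeeping; the class name, feature labels, and the simple additive frame conversion are assumptions for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class FixationMemory:
    """Hypothetical sketch of across-saccade memory: it keeps only the identity
    of fixated features and their locations in object-centered coordinates."""
    object_origin: tuple                       # assumed anchor of the object-centered frame (scene coords)
    features: dict = field(default_factory=dict)  # feature id -> (x, y) in object-centered coords

    def record_fixation(self, feature_id, retinal_xy, fixation_xy):
        # Within a fixation, a feature's position is measured retinally,
        # i.e. relative to the current fixation point.
        scene_x = fixation_xy[0] + retinal_xy[0]
        scene_y = fixation_xy[1] + retinal_xy[1]
        # Before the next saccade, re-express the location in the
        # object-centered frame and store it with the feature identity.
        self.features[feature_id] = (scene_x - self.object_origin[0],
                                     scene_y - self.object_origin[1])


# Usage: two fixations on different features of the same object.
memory = FixationMemory(object_origin=(100.0, 80.0))
memory.record_fixation("eye_left", retinal_xy=(5.0, -2.0), fixation_xy=(110.0, 95.0))
memory.record_fixation("mouth", retinal_xy=(-3.0, 4.0), fixation_xy=(118.0, 70.0))
print(memory.features)
```

After each fixation the retinal measurement is discarded; only the object-centered entry survives, consistent with the abstract's claim about what is stored during saccadic exploration.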