Visual search is a common daily human activity and a prerequisite for interacting with objects encountered in cluttered environments. Humanoid robots that are intended to take part in human daily life should possess similar capabilities for representing, attending to, and recalling objects of interest in order to ensure robust perception in human-centered environments. In this paper, we present the processes, memories, and representations needed to identify and store the locations of objects, encountered from different viewing angles, in a visual search task. In particular, we introduce the so-called Feature Ego-Sphere (FES) as the scene memory for a humanoid robot. Experiments comprising different visual search tasks have been carried out on an active humanoid head equipped with perspective and foveal stereo camera systems. The scene is analyzed actively using both camera systems in order to find instances of searched objects in a consistent and persistent manner.