Abstract— This work presents a multimodal bottom-up attention system for the humanoid robot iCub, in which the robot's decisions to move its eyes and neck are driven by visual and acoustic saliency maps. We introduce a modular, distributed software architecture that fuses visual and acoustic saliency maps into a single egocentric frame of reference. This system endows the iCub with an emergent exploratory behavior that reacts to combined visual and auditory saliency. The developed software modules provide a flexible foundation for the open iCub platform and for further experiments and developments, including higher levels of attention and a representation of the peripersonal space.