A major bottleneck in the realization of autonomous robotic agents performing complex manipulation tasks is the set of requirements that these tasks impose on perception mechanisms. There is a strong need to scale robot perception capabilities along two dimensions: first, the variations in appearance and perceptual properties that real-world objects exhibit; second, the variety of perceptual tasks, such as categorizing and localizing objects, decomposing them into their functional parts, and perceiving the affordances they provide. This paper addresses this need by organizing perception into a two-stage process. First, a pervasive and ‘calm’ perceptual component runs continually and interprets the incoming image stream to form a general-purpose hybrid (symbolic/sub-symbolic) belief state. This belief state is then used by the second component, the task-directed perception subsystem, to perform the respective perception tasks in a more informed way. We describe and discuss the first component and explain how it...