We have implemented a neurobiologically plausible system on a robot that integrates visual attention, object recognition, language, and action processing within a coherent cortex-like architecture based on neural associative memories. The system enables the robot to respond to spoken commands such as "bot show plum" or "bot put apple to yellow cup". The scenario is a robot positioned near one or two tables on which various kinds of fruit and other simple objects are placed. Tasks such as finding and pointing to particular fruits in a complex visual scene according to spoken or typed commands can be demonstrated. This involves parsing and understanding simple sentences, relating the nouns to concrete objects sensed by the camera, and coordinating motor output with planning and sensory processing.
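
Although implementation details are not given here, the core building block named above, a neural associative memory, can be illustrated with a minimal sketch. The Python fragment below shows a Willshaw-style binary associative memory hetero-associating sparse word codes with visual object codes; all names, pattern sizes, and sparsity values are illustrative assumptions, not the system's actual parameters.

```python
# Minimal sketch of a binary (Willshaw/Palm-style) associative memory
# associating sparse codes for spoken words with codes for visual objects.
# Sizes and sparsity below are illustrative assumptions only.
import numpy as np

class BinaryAssociativeMemory:
    def __init__(self, n_in, n_out):
        # Binary weight matrix learned by clipped Hebbian (outer-product) updates.
        self.W = np.zeros((n_out, n_in), dtype=np.uint8)

    def store(self, x, y):
        # One-shot Hebbian storage of an input/output pattern pair.
        self.W |= np.outer(y, x).astype(np.uint8)

    def recall(self, x):
        # Dendritic sums followed by a global threshold equal to the
        # number of active input units (Willshaw threshold).
        sums = self.W @ x
        theta = x.sum()
        return (sums >= theta).astype(np.uint8)

def sparse_code(size, active, rng):
    # Random sparse binary code with a fixed number of active units.
    v = np.zeros(size, dtype=np.uint8)
    v[rng.choice(size, active, replace=False)] = 1
    return v

rng = np.random.default_rng(0)
words = ["plum", "apple", "cup"]
word_codes = {w: sparse_code(256, 8, rng) for w in words}
object_codes = {w: sparse_code(256, 8, rng) for w in words}

# Hetero-association: word representation -> visual object representation.
mem = BinaryAssociativeMemory(256, 256)
for w in words:
    mem.store(word_codes[w], object_codes[w])

# Recall the object code addressed by the word "plum".
recalled = mem.recall(word_codes["plum"])
print(np.array_equal(recalled, object_codes["plum"]))  # True at this low memory load
```

In such a sketch, recognizing the noun "plum" in a parsed command would activate its word code, and the associative recall would provide the corresponding object representation to be matched against camera input; how the actual system binds these modalities is described in the sections that follow.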