The authors present TWIG, a visually grounded word-learning system that uses its existing knowledge of vocabulary, grammar, and action schemas to learn the meanings of new words from its environment. Most systems built to learn word meanings from sensory data focus on the “base case” of learning words when the robot knows nothing, and do not incorporate grammatical knowledge to aid the inference of meaning. The present study shows how existing language knowledge can aid the word-learning process in three ways. First, partial parses of sentences can focus the robot’s attention on the correct item or relation in the environment. Second, grammatical inference can suggest whether a new word refers to a unary or binary relation. Third, the robot’s existing predicate schemas can suggest candidate meanings for a new predicate. The authors demonstrate that TWIG can use its understanding of the phrase “got the ball” while watching a game of catch to learn that “I...
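
To make the second idea concrete, below is a minimal sketch of how a grammatical frame around an unknown word might suggest whether it names a unary or binary predicate. The toy lexicon, sentences, and the `guess_arity` helper are hypothetical illustrations, not TWIG's actual parser or grammar.

```python
# A minimal sketch (not the authors' implementation) of arity inference:
# count the noun-phrase arguments flanking an unknown word in a partial
# parse. One argument suggests a unary predicate (a property); two
# suggest a binary relation (e.g. a spatial term like "near").

# Hypothetical toy lexicon mapping known words to coarse categories.
KNOWN_WORDS = {"i": "NP", "you": "NP", "the": "DET", "ball": "N",
               "got": "V", "is": "V", "am": "V"}

def guess_arity(tokens, unknown):
    """Guess the arity of the predicate named by `unknown` from its
    syntactic frame in the (partially parsed) sentence `tokens`."""
    i = tokens.index(unknown)
    # Does a nominal argument appear on each side of the unknown word?
    left = any(KNOWN_WORDS.get(t) in ("NP", "N") for t in tokens[:i])
    right = any(KNOWN_WORDS.get(t) in ("NP", "N") for t in tokens[i + 1:])
    return 2 if (left and right) else 1

# "near" sits between two nominals, so it is guessed to be binary;
# "blicket" has only a subject argument, so it is guessed to be unary.
print(guess_arity(["the", "ball", "is", "near", "you"], "near"))   # -> 2
print(guess_arity(["i", "am", "blicket"], "blicket"))              # -> 1
```

In the paper's framing, such an arity guess would then constrain which predicate schemas the robot considers when grounding the new word in what it observes.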