Abstract. Robots need to ground their external vocabulary and internal symbols in observations of the world. In recent works, this problem has been approached through combinations of open-ended category learning and interaction with other agents acting as teachers. In this paper, a complementary path is explored, in which robots also resort to semantic searches in digital collections of text and images, or more generally in the Internet, to ground vocabulary about objects. Drawing on a distinction between broad and narrow (or general and specific) categories, different methods are applied, namely global shape contexts to represent broad categories, and SIFT local features to represent narrow categories. An unsupervised image clustering and ranking method is proposed that, starting from a set of images automatically fetched on the web for a given category name, selects a subset of images suitable for building a model of the category. In the case of broad categories, image segmentation a...