We present a biologically motivated architecture for object recognition that is capable of online learning of several objects based on interaction with a human teacher. The system combines biological principles such as appearance-based representation in topographical feature detection hierarchies and context-driven transfer between different levels of object memory. Training can be performed in an unconstrained environment by presenting objects in front of a stereo camera system and labeling them by speech input. The learning is fully online and thus avoids an artificial separation of the interaction into training and test phases. We demonstrate the performance on a challenging ensemble of 50 objects.