Human visual capability has remained largely beyond the reach of engineered systems despite intensive study and considerable progress in problem understanding, algorithms, and computing power. We posit that significant progress can be made by combining existing technologies from computer vision, ideas from theoretical neuroscience, and large-scale computing power for experimentation. From a theoretical standpoint, our primary point of departure from current practice is our reliance on exploiting time to turn an otherwise intractable unsupervised problem into a locally semi-supervised, and plausibly tractable, learning problem. From a pragmatic perspective, our system architecture follows what we know of cortical neuroanatomy and provides a solid foundation for scalable hierarchical inference. This combination of features promises to provide a range of robust object-recognition capabilities.

In July of 2005, one of us (Dean) presented a paper at AAAI entitle...