In this contribution, we explore the possibilities of learning in large-scale, multimodal processing systems operating under real-world conditions. Using an instance of a large-scale object detection system for complex traffic scenes, we demonstrate that there are many robust correlations between high-level processing results, and that such correlations can be autonomously detected and exploited to improve performance. We formulate requirements for performing system-level learning (online operation, scalability to high-dimensional inputs, data mining ability, generality and simplicity) and present a suitable neural learning strategy. We apply this method to infer the identity of objects from multimodal object properties (“context”) computed within the correlated system and demonstrate strong performance improvements as well as significant generalization. Finally, we compare our approach to state-of-the-art learning methods, Locally Weighted Projection Regr...