Learning scene structure and tracking large numbers of targets have both been active topics in computer vision in recent years, and both play a crucial role in surveillance, activity analysis, object classification, etc. In this paper, we propose a novel system that simultaneously performs Learning-Semantic-Scene and Tracking and makes the two supplement each other within one framework. The trajectories obtained by tracking are used to continually learn and update the scene knowledge via online unsupervised learning. The learned scene knowledge is in turn used to supervise and improve the tracking results. This “adaptive learning-tracking loop” therefore not only performs robust tracking in high-density crowd scenes, dynamically updates the knowledge of scene structure, and outputs semantic words, but also ensures that the entire process is completely automatic and online. We successfully applied the proposed system to the JR ...