We have constructed an inexpensive, video-based, motorized tracking system that learns to track a head. It uses real-time graphical user inputs or an auxiliary infrared detector as supervisory signals to train a convolutional neural network. The inputs to the neural network consist of normalized luminance and chrominance images and motion information from frame differences. Subsampled images are also used to provide scale invariance. During the online training phase, the neural network rapidly adjusts the input weights depending upon the reliability of the different channels in the surrounding environment. This quick adaptation allows the system to robustly track a head even when other objects are moving within a cluttered background.
Daniel D. Lee, H. Sebastian Seung
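The abstract describes three kinds of input channels: normalized luminance and chrominance images, a motion channel from frame differences, and subsampled copies for scale invariance. A minimal sketch of such a preprocessing stage is below; the BT.601-style color conversion, the normalization scheme, and the single 2x subsampling factor are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def luminance(frame):
    """Luminance from an RGB frame (BT.601 weights; an assumption)."""
    r, g, b = frame[..., 0], frame[..., 1], frame[..., 2]
    return 0.299 * r + 0.587 * g + 0.114 * b

def preprocess(frame, prev_frame):
    """Build hypothetical network input channels: normalized luminance,
    two chrominance channels, and a frame-difference motion channel,
    each at full resolution and one 2x-subsampled scale."""
    y = luminance(frame)
    # Simple chrominance channels as color-minus-luminance differences
    u = frame[..., 2] - y
    v = frame[..., 0] - y
    # Motion information from luminance frame differences
    motion = np.abs(y - luminance(prev_frame))
    # Normalize each channel to zero mean and unit variance
    norm = lambda c: (c - c.mean()) / (c.std() + 1e-8)
    channels = [norm(c) for c in (y, u, v, motion)]
    # Crude scale invariance: a 2x-subsampled copy of every channel
    coarse = [c[::2, ::2] for c in channels]
    return np.stack(channels, axis=-1), np.stack(coarse, axis=-1)

rng = np.random.default_rng(0)
full, half = preprocess(rng.random((64, 64, 3)), rng.random((64, 64, 3)))
```

In an online setting like the one described, these channel images would be fed to the convolutional network each frame, with the per-channel weights adapted according to how reliable each channel proves in the current environment.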