In many cases, visual tracking is based on detecting, describing, and then matching local features. A variety of algorithms for these steps have been proposed and used in tracking systems, leading to an increased need for independent comparisons. However, existing evaluations are geared towards object recognition and image retrieval, and their results have limited validity for real-time visual tracking. We present a setup for evaluation of detectors and descriptors which is geared towards visual tracking in terms of testbed, candidate algorithms and performance criteria. Most notably, our testbed consists of video streams with several thousand frames naturally affected by noise and motion blur.