Tracking-by-detection is an increasingly popular approach to the visual tracking problem. Existing adaptive methods suffer from the drifting problem, since they rely on self-updates of an on-line learning method. In contrast to previous work that tackled this problem by employing semi-supervised or multiple-instance learning, we show that augmenting an on-line learning method with complementary tracking approaches can lead to more stable results. In particular, we use a simple template model as a non-adaptive and thus stable component, a novel optical-flow-based mean-shift tracker as a highly adaptive element, and an on-line random forest as a moderately adaptive appearance-based learner. We combine these three trackers in a cascade. All of our components run on GPUs or similar multi-core systems, which allows for real-time performance. We show the superiority of our system over current state-of-the-art tracking methods in several experiments on publicly available data.
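The cascade described above, in which a stable, non-adaptive component validates the output of the more adaptive ones before any self-update is allowed, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: all class names, the toy confidence measure, and the confidence-threshold hand-off policy are assumptions made for exposition.

```python
class TemplateTracker:
    """Non-adaptive, stable component: scores a patch against a fixed
    template (toy similarity: fraction of matching entries)."""
    def __init__(self, template):
        self.template = template

    def confidence(self, patch):
        hits = sum(a == b for a, b in zip(self.template, patch))
        return hits / len(self.template)


class FlowTracker:
    """Highly adaptive component: follows frame-to-frame motion
    (stand-in for the optical-flow-based mean-shift tracker)."""
    def predict(self, motion, prev_box):
        (x, y), (dx, dy) = prev_box, motion
        return (x + dx, y + dy)


class OnlineForest:
    """Moderately adaptive component: an appearance-based learner that
    is only self-updated on validated estimates (stubbed here)."""
    def __init__(self):
        self.last_good = None

    def update(self, box):
        self.last_good = box

    def predict(self):
        return self.last_good


class Cascade:
    """Hypothetical cascade: the stable template validates the adaptive
    estimate; only validated boxes self-update the on-line learner,
    which limits drift from erroneous self-updates."""
    def __init__(self, flow, forest, template, tau=0.5):
        self.flow, self.forest, self.template, self.tau = flow, forest, template, tau

    def track(self, motion, patch_at, prev_box):
        box = self.flow.predict(motion, prev_box)
        if self.template.confidence(patch_at(box)) >= self.tau:
            self.forest.update(box)   # safe to self-update
            return box
        # Adaptive estimate rejected: fall back to the moderately
        # adaptive learner's last validated position.
        return self.forest.predict() or prev_box
```

The design point this sketch illustrates is the division of labour: the adaptive tracker proposes, the stable component disposes, and the on-line learner only ever trains on estimates the stable component has accepted.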