This paper presents a solution to the problem of tracking people within crowded scenes. The aim is to maintain individual object identity through a crowded scene that contains complex interactions and heavy occlusions of people. Our approach combines the strengths of two separate methods: a global object detector and a localised frame-by-frame tracker. A temporal relationship model of torso detections, built during periods of low activity, is used to further disambiguate identities during periods of high activity. A single uncalibrated camera is used, with no environmental information. Results are compared against a standard tracking method and ground truth. Two video sequences containing interactions, overlaps and occlusions between people are used to demonstrate our approach. The results show that our technique performs better than a standard tracking method and can cope with challenging occlusions and crowd interactions.