We address the problem of multi-person tracking in complex scenes from a single camera. Although tracklet-association methods have shown impressive results on several challenging datasets, the limited discriminability of their appearance models remains a weakness. Inspired by work on person identity recognition, we propose a novel framework that incorporates the merits of person identity recognition to obtain discriminative appearance-based affinity models, which improve multi-person tracking performance. During off-line learning, a small set of local image descriptors is selected so that the on-line learned appearance-based affinity models can be built effectively and efficiently. Given short but reliable tracklets generated by frame-to-frame association of detection responses, we designate them as query tracklets and gallery tracklets. For each gallery tracklet, a target-specific appearance model is learned from on-line training samples collected under spatio-temporal constraints. Both gallery tracklets and que...
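To make the query/gallery idea concrete, here is a minimal sketch of a target-specific appearance model and query-to-gallery matching. It is not the authors' method. Several pieces are assumptions introduced only for illustration. The `Tracklet` class and the helpers `overlaps`, `learn_model`, `affinity`, and `match` are hypothetical. The linear model w = mean(positives) − mean(negatives) stands in for whatever on-line learner the paper uses. The spatio-temporal constraint is assumed to work like this: a gallery tracklet's own detections are positives, and detections from tracklets that coexist with it in time (and therefore cannot be the same person) are negatives.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Tracklet:
    # Hypothetical tracklet record: frame span plus one descriptor per detection.
    tid: int
    start: int  # first frame index
    end: int    # last frame index (inclusive)
    descriptors: List[List[float]]

def overlaps(a: Tracklet, b: Tracklet) -> bool:
    # Assumed spatio-temporal constraint: tracklets that coexist in time
    # cannot belong to the same person.
    return a.start <= b.end and b.start <= a.end

def mean(vectors: List[List[float]]) -> List[float]:
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def learn_model(gallery: Tracklet, all_tracklets: List[Tracklet]) -> List[float]:
    # Hypothetical target-specific linear model: w = mean(pos) - mean(neg).
    pos = gallery.descriptors
    neg = [d for t in all_tracklets
           if t.tid != gallery.tid and overlaps(t, gallery)
           for d in t.descriptors]
    mu_pos = mean(pos)
    if not neg:
        return mu_pos  # no negatives available: fall back to the positive mean
    mu_neg = mean(neg)
    return [p - q for p, q in zip(mu_pos, mu_neg)]

def affinity(model: List[float], query: Tracklet) -> float:
    # Average linear score of the query's descriptors under a gallery model.
    scores = [sum(w * x for w, x in zip(model, d)) for d in query.descriptors]
    return sum(scores) / len(scores)

def match(queries: List[Tracklet], galleries: List[Tracklet],
          all_tracklets: List[Tracklet]) -> Dict[int, Optional[int]]:
    models = {g.tid: learn_model(g, all_tracklets) for g in galleries}
    result: Dict[int, Optional[int]] = {}
    for q in queries:
        # Assumption: a gallery tracklet must end before the query starts.
        candidates = [g for g in galleries if g.end < q.start]
        if not candidates:
            result[q.tid] = None
            continue
        best = max(candidates, key=lambda g: affinity(models[g.tid], q))
        result[q.tid] = best.tid
    return result
```

For example, consider two gallery tracklets with descriptors near `[1, 0]` and `[0, 1]` that coexist in frames 0 to 10. Each serves as the other's negatives. A later query with descriptor `[0.9, 0.1]` then scores 0.8 against the first model and −0.8 against the second, so it is linked to the first. A real system would also threshold the affinity and use richer descriptors and learners.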