Object recognition problems are usually addressed with a two-stage system composed of a fast, simple detector followed by a more complex classifier. This paper studies the design of the second-stage classifier based on the recently proposed trainable similarity measure, which is specifically designed for supervised classification of images. Common global measures such as correlation suffer from uninformative pixels and occlusions. The proposed measure is instead based on local matches in a set of regions within an image, which increases its robustness. The configuration of local regions is derived specifically for each prototype by a training procedure. The paper compares classifiers built using the trainable similarity with the state-of-the-art AdaBoost image classifier based on locally extracted image features. The comparison is carried out on a real-world pedestrian recognition problem. The paper illustrates that for a given range of sample sizes the trainable similarity represents a bet...