Detecting different categories of objects in image and video content is one of the fundamental tasks in computer vision research. The success of many applications such as visual surveillance, image retrieval, robotics, autonomous vehicles, and smart cameras are conditioned on the accuracy of the detection process. Two main processing steps can be distinguished in a typical object detection algorithm. The first task is feature extraction, in which the most informative object descriptors regarding the detection process are obtained from the visual content. The second task is detection, in which the obtained object descriptors are utilized in a classification framework to detect the objects of interest. The feature extraction methods can be further categorized into two groups based on the representation. The first group of methods is the sparse representations, where a set of representative local regions is obtained as the result of an interest point detection algorithm. Reliable interes...