This work proposes a framework for constructing object detection classifiers from weakly annotated social data. Social information is combined with computer vision techniques to automatically obtain a set of images annotated at the region level. All assumptions made to automate the framework rest on the reasonable expectation that, owing to the collaborative nature of social data, linguistic descriptions and visual representations will converge on common concepts as the scale of the analyzed dataset increases. Comparative tests against manually trained object detectors showed that comparable performance can be achieved.