Ground-truth labeling of an image dataset often requires a large amount of human time and labor. We present an infrastructure for distributed human labeling that can exploit the modularity of common vision problems involving segmentation and recognition. We describe the elements of this infrastructure in detail, in particular the vision Human Computational Tasks (HCTs) and Machine Computable Tasks (MCTs). We also discuss the impact of such a system on internet security relative to the current state of the art. Finally, we demonstrate our prototype implementation of such a system, named SOYLENT GRID, on typical problems.