Although object recognition has attracted decades of research effort, it remains a challenging problem. Traditional methods based on machine learning or computer vision are still limited to tackling a few hundred object categories. In recent years, non-parametric approaches have demonstrated great success: they understand the content of an image by propagating the labels of similar images in a large-scale dataset. However, because of limited dataset sizes and imperfect image-crawling strategies, previous work has addressed only a small, biased subset of image concepts. Here we introduce the Arista project, which aims to build a practical image annotation engine targeting popular concepts in the real world. In this project, we are particularly interested in how many image concepts the data-driven annotation approach can address (coverage) and how well it performs on them (precision). This paper reports the first stage of the work. Two billion web images were ind...