Does there exist a compact set of visual topics, in the form of keyword clusters, capable of representing the visual content of all images within an acceptable error? In this paper, we answer this question by analyzing the distribution laws of keywords in image descriptions and comparing them with traditional techniques in NLP. On this basis we propose three assumptions: the Sparse Distribution Attribute, the Local Convergent Assumption, and the Global Convergent Conjecture. They are essential for keyword selection and image content understanding, and help to overcome the semantic gap. Experiments are performed on 60,000 web-crawled images, and the observed performance validates the correctness of these assumptions.