Because semantics can hardly be derived from the low-level features of an image, images on the Web are much more difficult to analyze than textual information. Traditionally, the text surrounding an image is used to represent its high-level features. We argue that such a "flat" representation cannot describe images well. In this paper, we propose Hierarchical Representation (HR) and the HR-Tree for image description. Salient phrases in an HR-Tree further distinguish an image from others that share the same ancestor concepts. First, we design a method to extract salient phrases for the images in data records. Then, HR-Trees are built from these phrases. Finally, we propose a new hierarchical clustering algorithm based on the HR-Tree to make browsing more convenient for users. We present several HR-Trees and clustering results in the experimental section. These results illustrate the advantages of our methods.
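
To make the idea of a hierarchical description concrete, the following is a minimal sketch and not the method of this paper. It assumes each image is already described by a general-to-specific list of phrases. The names `HRNode`, `insert`, `images_under`, and `clusters_at_depth` are hypothetical, the phrase paths are invented for illustration, and cutting the tree at a fixed depth is a simplified stand-in for the hierarchical clustering algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class HRNode:
    """One concept in a hypothetical HR-Tree; children are more specific concepts."""
    concept: str
    children: dict = field(default_factory=dict)
    images: list = field(default_factory=list)  # images whose path ends at this node


def insert(root, image_id, path):
    """Insert an image under a general-to-specific list of phrases."""
    node = root
    for phrase in path:
        node = node.children.setdefault(phrase, HRNode(phrase))
    node.images.append(image_id)


def images_under(node):
    """Collect every image stored at this node or anywhere below it."""
    found = list(node.images)
    for child in node.children.values():
        found.extend(images_under(child))
    return found


def clusters_at_depth(root, depth):
    """Cut the tree at a given depth; each node at that depth forms one cluster.

    Simplification: images whose path is shorter than `depth` are not reported.
    """
    level = [((), root)]
    for _ in range(depth):
        level = [(path + (child.concept,), child)
                 for path, node in level
                 for child in node.children.values()]
    return {path: images_under(node) for path, node in level}


# Invented example data: the deepest phrase plays the role of a salient phrase
# that separates images sharing the same ancestor concepts.
root = HRNode("ROOT")
insert(root, "img1", ["camera", "digital camera", "10x optical zoom"])
insert(root, "img2", ["camera", "digital camera", "waterproof"])
insert(root, "img3", ["camera", "film camera"])
insert(root, "img4", ["laptop", "13-inch"])

coarse = clusters_at_depth(root, 1)  # browse by top-level concept
fine = clusters_at_depth(root, 2)    # browse one level deeper
```

In this sketch, `img1` and `img2` fall into the same cluster at both depths because they share the ancestors "camera" and "digital camera", and only their deepest phrases tell them apart. A flat bag of surrounding words would not expose that structure.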