When we describe a Web page informally, we often use phrases like "it looks like a newspaper site", "there are several unordered lists", or "it's just a collection of links". Unfortunately, no Web search or classification tool provides the capability to retrieve information using such informal descriptions, which are based on the appearance, i.e., the structure, of the Web page. In this paper, we examine the concept of structurally similar Web pages. We note that some structural properties can be identified with semantic properties of the data, and we provide measures for comparison between HTML documents.
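The abstract does not spell out the comparison measures. The sketch below shows one simple structural measure and is offered purely as an illustration. It is our assumption, not the authors' method. Each document is reduced to a vector of HTML start-tag counts, with the text ignored, and two documents are compared by the cosine similarity of those vectors. The names `TagCounter`, `tag_profile`, and `structural_similarity` are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser
import math


class TagCounter(HTMLParser):
    """Hypothetical helper: counts start tags, ignoring text content."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags (e.g. <br/>) also reach here via the default
        # handle_startendtag implementation.
        self.counts[tag] += 1


def tag_profile(html: str) -> Counter:
    """Return a tag -> count mapping describing the document's structure."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


def structural_similarity(a: str, b: str) -> float:
    """Illustrative measure (an assumption): cosine similarity of tag counts."""
    pa, pb = tag_profile(a), tag_profile(b)
    dot = sum(pa[t] * pb[t] for t in pa.keys() & pb.keys())
    norm_a = math.sqrt(sum(v * v for v in pa.values()))
    norm_b = math.sqrt(sum(v * v for v in pb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


list_page = "<ul><li>a</li><li>b</li></ul>"
other_list_page = "<ul><li>x</li><li>y</li><li>z</li></ul>"
link_page = "<p><a href='#'>link</a></p>"
print(structural_similarity(list_page, other_list_page))  # high: both are lists
print(structural_similarity(list_page, link_page))        # 0.0: no shared tags
```

A flat tag histogram ignores nesting, so two pages with the same tags arranged differently would score as identical. A measure that respects document structure more closely would need to compare the tag trees themselves.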
Isabel F. Cruz, Slava Borisov, Michael A. Marks, T