When we describe a Web page informally, we often use phrases like "it looks like a newspaper site", "there are several unordered lists" or "it's just a collection of li...
Isabel F. Cruz, Slava Borisov, Michael A. Marks, T...
The problem of finding pages similar to a given web page arises in many web applications, such as search engines. In this paper, we focus on the link-based similarity measures whic...
In this paper, we propose a new similarity measure to compute the pairwise similarity of text-based documents based on the suffix tree document model. By applying the new suffix tree ...
In this study we propose sketching algorithms for computing similarities between hierarchical data. Specifically, we look at data objects that are represented using leaf-labeled t...
This paper proposes a clustering approach that explores both the content and the structure of XML documents for determining similarity among them. Assuming that the content and th...