The problem of finding web pages similar to a given page arises in many web applications, such as search engines. In this paper, we focus on link-based similarity measures, which compute web page similarity solely from the hyperlinks of the Web. We first propose a simple model called the Extended Neighborhood Structure (ENS), which defines a bi-directional (in-link and out-link) and multi-hop neighborhood structure. Based on the ENS model, we extend several existing similarity measures. Preliminary experimental results show that the accuracy of the extended algorithms is significantly improved.
Zhenjiang Lin, Michael R. Lyu, Irwin King
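The core ENS idea is that a page's neighborhood should include both the pages it links to and the pages that link to it, followed over more than one hop. The sketch below illustrates that idea on a toy link graph. It is not the authors' method. The function names, the hop limit `k`, the toy graph, and the use of Jaccard overlap between neighborhoods as the similarity score are all assumptions made for illustration. The paper itself builds on ENS by extending existing similarity measures rather than by defining this particular score.

```python
from collections import deque


def k_hop_neighborhood(adj, start, k):
    """Pages reachable from `start` within k hops along `adj` edges (start excluded)."""
    seen = {start}
    frontier = deque([(start, 0)])
    result = set()
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # hop limit reached; do not expand further
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                result.add(nxt)
                frontier.append((nxt, depth + 1))
    return result


def reverse_graph(out_links):
    """Build an in-link map (page -> pages linking to it) from an out-link map."""
    in_links = {}
    for src, dsts in out_links.items():
        for dst in dsts:
            in_links.setdefault(dst, set()).add(src)
    return in_links


def extended_neighborhood(out_links, in_links, page, k):
    """Hypothetical ENS-style neighborhood: k-hop out-link plus k-hop in-link neighbors."""
    return k_hop_neighborhood(out_links, page, k) | k_hop_neighborhood(in_links, page, k)


def jaccard_similarity(out_links, page_a, page_b, k=2):
    """Illustrative score (an assumption): Jaccard overlap of the two neighborhoods."""
    in_links = reverse_graph(out_links)
    na = extended_neighborhood(out_links, in_links, page_a, k)
    nb = extended_neighborhood(out_links, in_links, page_b, k)
    union = na | nb
    if not union:
        return 0.0
    return len(na & nb) / len(union)


# Toy link graph (hypothetical): page -> set of pages it links to.
web = {
    "a": {"c"},
    "b": {"c"},
    "c": {"d"},
    "e": {"a", "b"},
}

if __name__ == "__main__":
    print(jaccard_similarity(web, "a", "b", k=1))  # 1.0
    print(jaccard_similarity(web, "a", "d", k=1))  # 0.5
```

In this toy graph, pages `a` and `d` share no out-link neighbors (`a` links only to `c`, and `d` links nowhere), so an out-link-only comparison would score them 0. Adding in-links lets them share `c` in their neighborhoods. This hints at why a bi-directional neighborhood can expose similarity that a one-directional view misses.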