Recently, there has been interest in using associations between Web pages to provide users with pages relevant to what they are currently viewing. We believe that making such decisions intelligently requires answering the question "given a set of pages, why are they associated?" We present a framework for reasoning about Web document associations. We start from the observation that the reasons for Web page associations are implicitly embedded in the content of the pages as well as in the links connecting them. The association reasoning scheme we propose is based on a random walk algorithm that takes both link structure and page content into consideration and allows users to specify a focus. We then show how the proposed algorithm, combined with a logical domain identification technique, can be used for Web site summarization and Web site map construction to help users navigate complex corporate sites. We see that, to achieve this goal, it is essent...
K. Selçuk Candan, Wen-Syan Li