Searching and navigating a Web site can be tedious, so hierarchical models such as site maps are frequently used to organize a site's content. In this work, we propose to model a Web site's content structure with a topic hierarchy: a directed tree rooted at the site's homepage whose vertices and edges correspond to Web pages and hyperlinks. Our algorithm for mining a Web site's topic hierarchy uses three types of information associated with the site: its link structure, its directory structure, and the content of its Web pages.

Categories and Subject Descriptors: H.3.3 [Information Systems]: Information Search and Retrieval - search process, retrieval models

General Terms: Algorithms, Experimentation
Nan Liu, C. Yang
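
To make the topic hierarchy concrete, the following minimal sketch represents it as a child-to-parent map over a site's pages. The abstract does not spell out the mining algorithm, so the sketch substitutes a plain breadth-first search over the link structure as a placeholder heuristic; it ignores directory structure and page content entirely. The function names `bfs_topic_tree` and `children_of` and the toy site are hypothetical and serve only as illustration.

```python
from collections import deque


def bfs_topic_tree(links, homepage):
    """Return {page: parent} for a directed tree rooted at `homepage`.

    Placeholder heuristic, NOT the paper's algorithm: a page's parent is
    the page from which breadth-first search first reaches it.
    """
    parent = {homepage: None}  # the root has no parent
    queue = deque([homepage])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in parent:  # keep only the first hyperlink found
                parent[target] = page
                queue.append(target)
    return parent


def children_of(parent):
    """Invert a {page: parent} map into {page: [children]}."""
    kids = {page: [] for page in parent}
    for page, par in parent.items():
        if par is not None:
            kids[par].append(page)
    return kids


# Hypothetical toy site: vertices are pages, edges are hyperlinks.
links = {
    "/": ["/research/", "/teaching/"],
    "/research/": ["/research/web-mining.html", "/"],
    "/teaching/": ["/teaching/cs101.html", "/research/"],
    "/research/web-mining.html": ["/"],
}

tree = bfs_topic_tree(links, "/")
hierarchy = children_of(tree)
```

The result is a tree because every reachable page other than the homepage receives exactly one parent, so back links and cross links (such as `/teaching/` to `/research/`) are dropped. A fuller approach would presumably use the directory structure and page content to choose among candidate parents rather than taking the first one encountered.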