HyPursuit is a new hierarchical network search engine that clusters hypertext documents to structure a given information space for browsing and search activities. Our content-link clustering algorithm is based on the semantic information embedded in hyperlink structures and document contents. HyPursuit admits multiple, coexisting cluster hierarchies based on different principles for grouping documents, such as the Library of Congress catalog scheme and automatically created hypertext clusters. HyPursuit's abstraction functions summarize cluster contents to support scalable query processing. The abstraction functions satisfy system resource limitations with controlled information loss. The result of query processing operations on a cluster summary approximates the result of performing the operations on the entire information space. We constructed a prototype system comprising 100 leaf World Wide Web sites and a hierarchy of 42 servers that route queries to the leaf sites. ce with ou...
Ron Weiss, Bienvenido Vélez, Mark A. Sheldo
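
The idea of answering queries from a bounded cluster summary rather than from the full contents can be illustrated with a minimal sketch. This is not HyPursuit's actual abstraction function or routing protocol, neither of which is specified in the abstract. It assumes a summary is simply the most frequent terms of a cluster, truncated to a fixed budget, and that a query is routed to every cluster whose summary contains all query terms. The names `summarize`, `route`, `max_terms`, `site_a`, and `site_b` are hypothetical.

```python
from collections import Counter


def summarize(docs, max_terms):
    """Hypothetical abstraction function: keep the max_terms most frequent terms."""
    counts = Counter()
    for text in docs:
        counts.update(text.lower().split())
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:max_terms])


def route(query, summaries):
    """Return the clusters whose summary contains every query term."""
    terms = query.lower().split()
    return [
        name
        for name, summary in summaries.items()
        if all(term in summary for term in terms)
    ]


# Two made-up leaf sites, each holding a couple of tiny documents.
clusters = {
    "site_a": ["hypertext clustering search", "hypertext browsing"],
    "site_b": ["library catalog scheme", "catalog search"],
}

# A tight budget of three terms per summary drops the rarer term "search".
small = {name: summarize(docs, max_terms=3) for name, docs in clusters.items()}
# A looser budget of four terms keeps it.
large = {name: summarize(docs, max_terms=4) for name, docs in clusters.items()}

print(route("hypertext", small))  # ['site_a']
print(route("search", small))     # []
print(route("search", large))     # ['site_a', 'site_b']
```

In this sketch the term budget plays the role of the resource limit: shrinking it reduces the summary size at the cost of missing matches for less frequent terms, so a query answered from the summary only approximates the result over the full contents.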