We propose an agent for exploring and categorizing documents on the World Wide Web based on a user profile. The heart of the agent is an automatic categorization of a set of documents, combined with a process that generates new queries to search for related documents and filters the results to extract the documents most closely related to the starting set. The document categories are not given a priori. The resulting document set can also be used to update the initial set of documents. We present the overall architecture and describe two novel algorithms which provide significant improvement over traditional clustering algorithms and form the basis for the query generation and search component of the agent.
Eui-Hong Han, Daniel Boley, Maria L. Gini, Robert