In this paper we propose a unified clustering algorithm for both homogeneous and heterogeneous XML documents. Depending on the type of the XML documents, the proposed algorithm mo...
Documents in HTML format have many features to analyze, from the terms in special sections to the phrases that appear in the whole document. However, it is important to decide whi...
Document clustering is a very hard task in Automatic Text Processing since it requires to extract regular patterns from a document collection without a priori knowledge on the cat...
Blogs are a new form of internet phenomenon and a vast everincreasing information resource. Mining blog files for information is a very new research direction in data mining. We p...
Organizing Web search results into a hierarchy of topics and subtopics facilitates browsing the collection and locating results of interest. In this paper, we propose a new hierar...