The rapid growth of XML adoption has urged for the need of a proper representation for semi-structured documents, where the document structural information has to be taken into ac...
Common document clustering algorithms utilize models that either divide a corpus into smaller clusters or gather individual documents into clusters. Hierarchical Agglomerative Clus...
In this paper we present our technique for finding semantically similar clusters within web documents obtained from a set of queries retrieved from the Google search engine. This ...
Table of contents (TOC) recognition has attracted a great deal of attention in recent years. After reviewing the merits and drawbacks of the existing TOC recognition methods, we h...
Increasingly large text datasets and the high dimensionality associated with natural language create a great challenge in text mining. In this research, a systematic study is cond...
M. Mahdi Shafiei, Singer Wang, Roger Zhang, Evange...