Large search engines process thousands of queries per second over billions of documents, making query processing a major performance bottleneck. An important class of optimization...
This paper considers the problem of identifying on the Web compound documents (cDocs) ? groups of web pages that in aggregate constitute semantically coherent information entities...
XML has become a popular method of data representation both on the web and in databases in recent years. One of the reasons for the popularity of XML has been its ability to encod...
Charu C. Aggarwal, Na Ta, Jianyong Wang, Jianhua F...
Recognition and retrieval of historical handwritten material is an unsolved problem. We propose a novel approach to recognizing and retrieving handwritten manuscripts, based upon ...
Automatically categorizing documents into pre-defined topic hierarchies or taxonomies is a crucial step in knowledge and content management. Standard machine learning techniques ...