We demonstrate the merits of using inter-document similarities for federated search. Specifically, we study a resultsmerging method that utilizes information induced from cluster...
Text classification is the process of classifying documents into predefined categories based on their content. Existing supervised learning algorithms to automatically classify te...
In recent years, many algorithms for the Web have been developed that work with information units distinct from individual web pages. These include segments of web pages or aggreg...
In this paper a complete OCR methodology for recognizing historical documents, either printed or handwritten without any knowledge of the font, is presented. This methodology cons...
In this paper, we present the results of our work that seek to negotiate the gap between low-level features and high-level concepts in the domain of web document retrieval. This wo...