A method for locating mathematical expressions in document images without the use of optical character recognition is presented. An index of document regions is produced from re...
This work aims to provide a page segmentation algorithm which uses both visual and content information to extract the semantic structure of a web page. The visual information is u...
In this paper we tackle the problem of document image retrieval by combining a similarity measure between documents and the probability that a given document belongs to a certain ...
Albert Gordo, Jaume Gibert, Ernest Valveny, Mar...
In standard text retrieval systems, the documents are gathered and indexed on a single server. In distributed information retrieval (DIR), the documents are held in multiple colle...
Term-weighting functions derived from various models of retrieval aim to model human notions of relevance more accurately. However, there is a lack of analysis of the sources of e...