Modern web search engines are expected to return top-k results efficiently given a query. Although many dynamic index pruning strategies have been proposed for efficient top-k com...
Structure analysis of table form documents is an important issue because a printed document and even an electronic document do not provide logical structural information but merely...
We propose a weakly-supervised approach for extracting class attributes from structured text available within Web documents. The overall precision of the extracted attributes is a...
In this paper we tackle the problem of document image retrieval by combining a similarity measure between documents and the probability that a given document belongs to a certain ...
Albert Gordo, Jaume Gibert, Ernest Valveny, Mar&cc...
Large search engines process thousands of queries per second over billions of documents, making query processing a major performance bottleneck. An important class of optimization...