In this paper, we propose an alternative method for accessing the content of Greek historical documents printed during the 17th and 18th centuries by searching words directly in d...
Anastasios L. Kesidis, Eleni Galiotou, Basilios Ga...
Developing better systems for document image analysis requires understanding errors, their sources, and their effects. The interactions between various processing steps are comple...
In this paper, we propose a new comprehensive methodology in order to evaluate the performance of noisy historical document recognition techniques. We aim to evaluate not only the...
Searching very large collections can be costly in both computation and storage. To reduce this cost, recent research has focused on reducing the size (pruning) of the inverted ind...
User generated content is characterized by short, noisy documents, with many spelling errors and unexpected language usage. To bridge the vocabulary gap between the user's in...
Wouter Weerkamp, Krisztian Balog, Maarten de Rijke
Abstract. We consider documents as words and trees on some alphabet and study how to compare them with some regular schemas on an alphabet . Given an input document I, we decide ...
One essential issue of document clustering is to estimate the appropriate number of clusters for a document collection to which documents should be partitioned. In this paper, we ...
Abstract. Information Retrieval (IR) systems combine a variety of techniques stemming from logical, vector-space and probabilistic models. This variety of combinations has produced...
Image anchor templates are used in document image analysis for document classification, data localization, and other tasks. Current tools allow human operators to mark out small s...
Most learning to rank research has assumed that the utility of different documents is independent, which results in learned ranking functions that return redundant results. The fe...
Aleksandrs Slivkins, Filip Radlinski, Sreenivas Go...