We revisit a problem introduced by Bharat and Broder almost a decade ago: how to sample random pages from the corpus of documents indexed by a search engine, using only the search...
This paper describes a research effort to improve the use of the cosine similarity information retrieval technique to detect unknown, known or variances of known rogue software by...
In this paper, we propose a named-entity recognition (NER) system that addresses two major limitations frequently discussed in the field. First, the system requires no human interv...
In this paper, we study the problem of using an annotated corpus in English for the same natural language processing task in another language. While various machine translation sy...
In document image understanding, public datasets with ground-truth are an important part of scientific work. They are not only helpful for developing new methods, but also provid...
Thomas Strecker, Joost van Beusekom, Sahin Albayra...