There is a notable interest in extending probabilistic generative modeling principles to accommodate for more complex structured data types. In this paper we develop a generative ...
Stemming can improve retrieval accuracy, but stemmers are language-specific. Character n-gram tokenization achieves many of the benefits of stemming in a language independent way,...
In the context of information retrieval, traditional collection selection algorithms have been widely studied. These algorithms utilize language models, a representation of the co...
Gary A. Monroe, James C. French, Allison L. Powell
This paper proposes an OCR post-processing approach based on multi-knowledge, which integrates language knowledge and candidate distance information given by the OCR engine. In thi...
Document fields, such as the title or the headings of a document, offer a way to consider the structure of documents for retrieval. Most of the proposed approaches in the literatu...