This paper reports on the underlying IR problems encountered when indexing and searching with the Bulgarian language. For this language we propose a general light stemmer and demon...
It is common practice in audiovisual archives to disclose documents using metadata from a structured vocabulary or thesaurus. Many of these thesauri have limited or no structure. T...
: This paper presents a novel way of examining the accuracy of the evaluation measures commonly used in information retrieval experiments. It validates several of the rules-of-thum...
Keyphrases are short phrases that reflect the main topic of a document. Because manually annotating documents with keyphrases is a time-consuming process, several automatic appro...
Katja Hofmann, Manos Tsagkias, Edgar Meij, Maarten...
ABSTRACT: OCR is an error-prone process. It is time-consuming and expensive to manually proofread OCR results. The errors remaining in OCRed texts can cause serious problems in rea...