Scanning two book pages at the same time helps to accelerate the scanning process but on the other hand introduces several difficulties if the user needs to have one page per imag...
Traditional models of information retrieval assume documents are independently relevant. But when the goal is retrieving diverse or novel information about a topic, retrieval mode...
A novel method for the segmentation of double-sided ancient document images suffering from bleed-through effect is presented. It takes advantage of the level set framework to prov...
Automatic Document Classification (ADC) is still one of the major information retrieval problems. It usually employs a supervised learning strategy, where we first build a classif...
Thiago Salles, Leonardo C. da Rocha, Gisele L. Pap...
This paper presents Multilingual Document Clustering (MDC) on comparable corpora. Wikipedia, a structured multilingual knowledge base, has been highly exploited in many monolingual...