Abstract. We propose an approach for efficient word retrieval from printed documents belonging to Digital Libraries. The approach combines word image clustering (based on Self Orga...
Given the continuous growth of databases and the abundance of diverse files in modern IT environments, there is a pressing need to integrate keyword search on heterogeneous inform...
—Libraries in South Asia hold huge collections of valuable printed documents in Urdu and it is of interest to digitize these collections to make them more accessible. The unavail...
Motivated by our work with political scientists who need to manually analyze large Web archives of news sites, we present SpotSigs, a new algorithm for extracting and matching sig...
Martin Theobald, Jonathan Siddharth, Andreas Paepc...
As large quantity of document images is getting archived by the digital libraries, there is a need for an efficient search strategies to make them available as per users informatio...