Abstract. Automatic identification of a script in a given document image facilitates many important applications such as automatic archiving of multilingual documents, searching on...
Gopal Datt Joshi, Saurabh Garg, Jayanthi Sivaswamy
In a corpus of jokes, a human might judge two documents to be the "same joke" even if characters, locations, and other details are varied. A given joke could be retold w...
We describe experimental results for unsupervised recognition of the textual contents of book-images using fully automatic mutual-entropy-based model adaptation. Each experiment s...
Compilation of a 100 million words balanced corpus called the Balanced Corpus of Contemporary Written Japanese (or BCCWJ) is underway at the National Institute for Japanese Langua...
Our cultural heritage, as preserved in libraries, archives and museums, is made up of documents written many centuries ago. Largescale digitization initiatives make these documents...
Jaap Kamps, Marijn Koolen, Frans Adriaans, Maarten...