A framework is presented for discovering partial duplicates in large collections of scanned books with optical character recognition (OCR) errors. Each book in the collection is r...
Manual generation of a book inventory is time-consuming and tedious, while deployment of barcode and radio-frequency identification (RFID) management systems is costly and afforda...
David M. Chen, Sam S. Tsai, Bernd Girod, Cheng-Hsi...
Named entity recognition is important for semantically oriented retrieval tasks, such as question answering, entity retrieval, biomedical retrieval, trend detection, and event and...
Valentin Jijkoun, Mahboob Alam Khalid, Maarten Mar...
Over the last few years, social network systems have greatly increased users’ involvement in online content creation and annotation. Since such systems usually need to deal with...
Ivan Ivanov, Peter Vajda, Lutz Goldmann, Jong-Seok...
The selection of indexing terms for representing documents is a key decision that limits how effective subsequent retrieval can be. Often, stemming algorithms are used to normaliz...