Sciweavers

ICDAR
2011
IEEE

BLSTM Neural Network Based Word Retrieval for Hindi Documents

12 years 11 months ago
BLSTM Neural Network Based Word Retrieval for Hindi Documents
—Retrieval from Hindi document image collections is a challenging task. This is partly due to the complexity of the script, which has more than 800 unique ligatures. In addition, segmentation and recognition of individual characters often becomes difficult due to the writing style as well as degradations in the print. For these reasons, robust OCRs are non existent for Hindi. Therefore, Hindi document repositories are not amenable to indexing and retrieval. In this paper, we propose a scheme for retrieving relevant Hindi documents in response to a query word. This approach uses BLSTM neural networks. Designed to take contextual information into account, these networks can handle word images that can not be robustly segmented into individual characters. By zoning the Hindi words, we simplify the problem and obtain high retrieval rates. Our simplification suits the retrieval problem, while it does not apply to recognition. Our scalable retrieval scheme avoids explicit recognition of ...
Raman Jain, Volkmar Frinken, C. V. Jawahar, Raghav
Added 24 Dec 2011
Updated 24 Dec 2011
Type Journal
Year 2011
Where ICDAR
Authors Raman Jain, Volkmar Frinken, C. V. Jawahar, Raghavan Manmatha
Comments (0)