As the number of available Web pages grows, users experience increasing difficulty finding documents relevant to their interests. One of the underlying reasons for this is that mo...
This paper presents a document image thresholding technique that binarizes badly illuminated document images by photometric correction. Based on the observation that illuminat...
Anticipating the availability of large question-answer datasets, we propose a principled, data-driven Instance-Based approach to Question Answering. Most question answering systems ...
A new technique to locate content-representing words for a given document image using a representation of character shapes is described. A character shape code representation define...
We describe efficient techniques for construction of large term co-occurrence graphs, and investigate an application to the discovery of numerous fine-grained (specific) topics. A...