Commercial OCR packages work best with high-quality scanned images. They often produce poor results when the image is degraded, either because the original itself was of poor quality,...
Latent Semantic Indexing (LSI) has proven effective at capturing the semantic structure of document collections. It is widely used in content-based text retrieval...
Legal information certification and secure storage, combined with electronic document signatures, are of great interest when the security and conservation of digital documents are in conce...
Maxime Wack, Ahmed Nait-Sidi-Moh, Sid Lamrous, Nat...
Image data is as common as textual data in this digital world. There is an urgent demand for image management tools as efficient as text search engines. Decades of research on...
We investigate the problem of evaluating the performance of text processing algorithms on inputs that contain errors as a result of optical character recognition. A new hierarchic...