A system is presented that uses texture to retrieve and browse images stored in a large document image database. A method of graphically generating a candidate search image is use...
Performance evaluation for document image analysis and understanding is a recurring problem. Many groundtruthed document image databases are now used to evaluate general algorithm...
This paper presents a MapReduce algorithm for computing pairwise document similarity in large document collections. MapReduce is an attractive framework because it allows us to de...
Automatic document classification is an important step in organizing and mining documents. Information in documents is often conveyed using both text and images that complement ea...
: Information retrieval tries to identify relevant documents for an information need. The problems that an IR system should deal with include document indexing (which tries to extr...