We present an efficient algorithm called the Quadtree Heuristic for identifying a list of similar terms for each unique term in a large document collection. Term similarity is de...
Querying data from presentation formats like HTML, for purposes such as information extraction, requires the consideration of tree structures as well as the consideration of spati...
In this paper, we have considered a real world information synthesis task, generation of a fixed length multi document summary which satisfies a specific information need. This...
Large search engines process thousands of queries per second over billions of documents, making query processing a major performance bottleneck. An important class of optimization...
A system is presented that uses texture to retrieve and browse images stored in a large document image database. A method of graphically generating a candidate search image is use...