Documents often display a structure, e.g., several sections, each with several subsections and so on. Taking into account the structure of a document allows the retrieval process ...
Encouraged by a significant improvement over LSI (latent semantic indexing) approach in textual information retrieval of the DLSI (differential latent semantic indexing) approach ...
This paper describes a tool for recombining the logical structure from an XML document with the typeset appearance of the corresponding PDF document. The tool uses the XML represe...
Matthew R. B. Hardy, David F. Brailsford, Peter L....
Images are increasingly being embedded in HTML documents on the WWW. Such documents over the WWW essentially provides a rich source of image collection from which users can query....
XML is fast becoming the standard format to store, exchange and publish over the web, and is getting embedded in applications. Two challenges in handling XML are its size (the XML...
Paolo Ferragina, Fabrizio Luccio, Giovanni Manzini...