The first steps towards bridging the paper-digital divide have been achieved with the development of a range of technologies that allow printed documents to be linked to digital c...
According to the logical model of Information Retrieval (IR), the task of IR can be described as the extraction, from a given document base, of those documents d that, given a que...
Carlo Meghini, Fabrizio Sebastiani, Umberto Stracc...
There are a number of established products on the market for wrapping—semi-automatic navigation and extraction of data—from web pages. These solutions make use of the inherent...
XML is rapidly emerging as the new standard for data representation and exchange on the Web. An XML document can be accompanied by a Document Type Descriptor (DTD) which plays the...
Minos N. Garofalakis, Aristides Gionis, Rajeev Ras...
Developing effective content recognition methods for diverse imagery continues to challenge computer vision researchers. We present a new approach for document image content catego...
Guangyu Zhu, Xiaodong Yu, Yi Li, David S. Doermann