Recognizing polarity requires a list of polar words and phrases. For the purpose of building such lexicon automatically, a lot of studies have investigated (semi-) unsupervised me...
In this paper, we propose a method of text retrieval from document images using a similarity measure based on an N-Gram algorithm. We directly extract image features instead of us...
The Web is mainly processed by humans. The role of the machines is just to transmit and display the contents of the documents, barely being able to do something else. Nowadays ther...
We consider the problem of automatically extracting general lists from the web. Existing approaches are mostly dependent upon either the underlying HTML markup or the visual struc...
Fabio Fumarola, Tim Weninger, Rick Barber, Donato ...
Abstract. In this paper we present a system, DoLSuD, for the automatic discovery of relevant substructures in a document layout. DoLSuD, Document Layout Substructure Discovery, ext...