Recognizing polarity requires a list of polar words and phrases. For the purpose of building such lexicon automatically, a lot of studies have investigated (semi-) unsupervised me...
Automatically generated HTML, as produced by WYSIWYG programs, typically contains much repetitive and unnecessary markup. This paper identifies aspects of such HTML that may be al...
Abstract. In this paper we present a system, DoLSuD, for the automatic discovery of relevant substructures in a document layout. DoLSuD, Document Layout Substructure Discovery, ext...
Information extraction deals with extracting entities (such as people,organizations or locations) and named relations between entities (such as "People born-in Country")...
Traditional digital document navigation found in Acrobat and HTML document readers performs poorly when compared to paper documents for this task. We investigate and compare two me...
Tom Owen, George Buchanan, Parisa Eslambolchilar, ...