Web pages are usually highly structured documents. In some documents, content with different functionality is laid out in blocks, some merely supporting the main discourse. In ot...
Large-scale digitization projects aimed at periodicals often have as input streams of completely unlabeled document images. In such situations, the results produced by the automat...
Iuliu Vasile Konya, Christoph Seibert, Sebastian G...
In this paper we present an algorithm for automatic extraction of textual elements, namely titles and full text, associated with news stories in news web pages. We propose a super...
Summarizing web pages have recently gained much attention from researchers. Until now two main types of approaches have been proposed for this task: content- and context-based met...
Firms spend high sums trying to make their “home” page as memorable as possible to attract repeat visits. For this purpose, fancy pictures and words are used to catch the atte...