Software and Information Systems (IS) documents are a common product of large IS development efforts. These documents are produced and consumed through a variety of documentation p...
As online document collections continue to expand, both on the Web and in proprietary environments, the need for duplicate detection becomes more critical. The goal of this work i...
Abstract. This paper presents the ‘Media Watch on Climate Change’, an interactive Web portal that combines a portfolio of semantic services with a visual interface based on tig...
Arno Scharl, Albert Weichselbraun, Alexander Hubma...
Abstract. Modern document collections often contain groups of documents with overlapping or shared content. However, most information retrieval systems process each document separa...
Andrei Z. Broder, Nadav Eiron, Marcus Fontoura, Mi...
We present a document analysis system able to assign logical labels and extract the reading order in a broad set of documents. All information sources, from geometric features and ...