We describe an HTML web page segmentation algorithm, which is applied to segment online medical journal articles (regular HTML and PDF-Converted-HTML files). The web page content ...
Document-centric XML collections contain text-rich documents, marked up with XML tags that add lightweight semantics to the text. Querying such collections calls for a hybrid quer...
Abstract. Active mathematical documents are distinguished from traditional paper-oriented ones by their ability to interactively adapt to a reader’s inputs. This includes changes...
Bibliographic attributes extraction is an important research topic for digital libraries. In this paper we propose a rule-based method for bibliographic attributes extraction with...
The proliferation of digital libraries and the large amount of existing documents raise important issues in efficient handling of documents. Printed texts in documents need to be...