Documents formatted in eXtensible Markup Language (XML) are becoming increasingly available in collections of various document types. In this paper, we present an approach for the...
Massih-Reza Amini, Anastasios Tombros, Nicolas Usu...
Biomedical images and captions are one of the major sources of information in online biomedical publications. They often contain the most important results to be reported, and pro...
Xin Chen, Caimei Lu, Yuan An, Palakorn Achananupar...
This paper is concerned with automatic extraction of titles from the bodies of HTML documents. Titles of HTML documents should be correctly defined in the title fields; however, i...
We consider the problem of modeling the content structure of texts within a specific domain, in terms of the topics the texts address and the order in which these topics appear. W...
In this paper, we investigate how modeling content structure can benefit text analysis applications such as extractive summarization and sentiment analysis. This follows the lingu...