This paper is concerned with automatic extraction of titles from the bodies of HTML documents. Titles of HTML documents should be correctly defined in the title fields; however, i...
Several information organization, access, and filtering systems can benefit from different kind of document representations than those used in traditional Information Retrieval (I...
Abstract. Modern document collections often contain groups of documents with overlapping or shared content. However, most information retrieval systems process each document separa...
Andrei Z. Broder, Nadav Eiron, Marcus Fontoura, Mi...
—We are now living in a world where information is growing and updating quickly. Knowledge can be acquired more efficiently with the help of automatic document summarization and...
With the rapid emergence and proliferation of Internet and the trend of globalization, a tremendous amount of textual documents written in different languages are electronically ac...