This paper is concerned with automatic extraction of titles from the bodies of HTML documents. Titles of HTML documents should be correctly defined in the title fields; however, i...
This paper describes an automatic method for acquiring hyponymy relations from HTML documents on the WWW. Hyponymy relations can play a crucial role in various natural language pr...
We propose a novel approach that identifies web page templates and extracts the unstructured data. Extracting only the body of the page and eliminating the template increases the ...
This paper describes the development of algorithms for extracting the title and the names of the authors from documents available on the World Wide Web. In this paper we describe ...
Eric G. Berkowitz, Mohamed Reda Elkhadiri, Tim Sah...