Assistance in retrieving of documents on the World Wide Web is provided either by search engines, through keyword based queries, or by catalogues, which organise documents into hi...
Due to many unique characteristics of forum data, forum post retrieval is different from traditional document retrieval and web search, raising interesting research questions abou...
This paper presents an approach for applying inductive logic programming to information extraction from HTML documents structured as unranked ordered trees. We consider information...
Automatically generated HTML, as produced by WYSIWYG programs, typically contains much repetitive and unnecessary markup. This paper identifies aspects of such HTML that may be al...
HTML documents composed of frames can be difficult to write correctly. We demonstrate a technique that can be used by authors manually creating HTML documents (or by document edit...