This paper presents how natural language processing and machine learning techniques can help internet users get a better overview of the pages they are reading. The ...
For this year's web track, we concentrated on the entry page finding task. For the content-only runs, in both the ad-hoc task and the entry page finding task, we used an infor...
This paper reports our research on the Web page filtering process in specialized search engine development. We propose a machine-learning-based approach that combines Web content a...
Web pages are usually highly structured documents. In some documents, content with different functionality is laid out in blocks, some merely supporting the main discourse. In ot...
This paper is concerned with automatic extraction of titles from the bodies of HTML documents. Titles of HTML documents should be correctly defined in the title fields; however, i...