When automatically extracting information from the World Wide Web, most established methods focus on spotting single HTML documents. However, the problem of spotting complete web s...
Martin Ester, Hans-Peter Kriegel, Matthias Schuber...
It is difficult for users of mobile devices such as cellular phones equipped with a small screen and a poor input interface to browse Web pages designed for desktop PCs with large...
Machine learning typically involves discovering regularities in a training set, then applying these learned regularities to classify objects in a test set. In this paper we presen...
We demonstrate the usefulness of the uniform resource locator (URL) alone in performing web page classification. This approach is orders of magnitude faster than typical web page classific...
Web page segmentation is a crucial step for many applications in Information Retrieval, such as text classification, de-duplication and full-text search. In this paper we describe...