Machine learning typically involves discovering regularities in a training set, then applying these learned regularities to classify objects in a test set. In this paper we presen...
We present a highly accurate method for classifying web pages based on link percentage, which is the percentage of text characters that are parts of links normalized by the number...
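As a rough illustration of the raw quantity this abstract names, the sketch below computes the share of a page's text characters that fall inside `<a>` elements. The normalization step mentioned in the abstract is cut off in this listing, so it is not reproduced here. The class name `LinkTextCounter`, the function name `link_text_percentage`, and the choice to ignore whitespace are assumptions made for this example, not details taken from the paper.

```python
from html.parser import HTMLParser


class LinkTextCounter(HTMLParser):
    """Counts non-whitespace text characters, inside and outside <a> tags.

    Hypothetical helper for illustration only; it counts every text node,
    including script and style content, which a real system would likely skip.
    """

    def __init__(self):
        super().__init__()
        self.anchor_depth = 0   # > 0 while the parser is inside an <a> element
        self.link_chars = 0     # text characters that are part of a link
        self.total_chars = 0    # all text characters on the page

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchor_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.anchor_depth > 0:
            self.anchor_depth -= 1

    def handle_data(self, data):
        # Assumption: whitespace is not counted as a text character.
        n = len("".join(data.split()))
        self.total_chars += n
        if self.anchor_depth > 0:
            self.link_chars += n


def link_text_percentage(html):
    """Return the percentage of text characters that sit inside links."""
    parser = LinkTextCounter()
    parser.feed(html)
    parser.close()
    if parser.total_chars == 0:
        return 0.0  # a page with no text has no link text either
    return 100.0 * parser.link_chars / parser.total_chars


if __name__ == "__main__":
    page = '<p>abcd <a href="#">efgh</a></p>'
    print(link_text_percentage(page))
```

A classifier along the lines the abstract describes would presumably threshold or otherwise learn from this value after the paper's own normalization; that step is left out because the listing does not state it.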
Web page segmentation is a crucial step for many applications in information retrieval, such as text classification, de-duplication, and full-text search. In this paper we describe...
The Web has become available even on mobile phones, but the current methods to view large pages on small screens have not been highly usable. Current mobile phone browsers reforma...
Virpi Roto, Andrei Popescu, Antti Koivisto, Elina ...
A new dictionary-based text categorization approach is proposed to classify chemical web pages efficiently. Using a chemistry dictionary, the approach can extract chemistry-re...
Chunyan Liang, Li Guo, Zhaojie Xia, Feng-Guang Nie...