Web pages are more than text: they contain rich contextual and structural information, e.g., the title, the metadata, the anchor text, etc., each of which can be seen as a dat...
In large-scale distributed environments such as the Internet, users are interested in monitoring changes to a particular web page (XML or HTML). There are many instanc...
In this paper we propose a new multi-view semi-supervised learning algorithm called Local Co-Training (LCT). The proposed algorithm employs a set of local models with vect...
The research reported in this paper is the first phase of a larger project on the automatic classification of web pages by their genres, using n-gram representations of the web pag...
In this paper, we present a semi-supervised learning method for web page classification, leveraging click logs to augment training data by propagating class labels to unlabeled si...
Soo-Min Kim, Patrick Pantel, Lei Duan, Scott Gaffn...