Extracting information from web pages is an important problem; it has several applications such as providing improved search results and construction of databases to serve user qu...
Paramveer S. Dhillon, Sundararajan Sellamanickam, ...
Maintenance of large Web sites is a complex task, similar in some sense to software maintenance. Content should be separated from the formatting rules, allowing independent develo...
Rodrigo Giacomini Moro, Renata de Matos Galante, C...
Abstract--In this paper we propose a new multi-view semisupervised learning algorithm called Local Co-Training (LCT). The proposed algorithm employs a set of local models with vect...
In order to artificially boost the rank of commercial pages in search engine results, search engine optimizers pay for links to these pages on other websites. Identifying paid lin...
In this paper, we propose an implementable characterization of genre suitable for automatic genre identification of web pages. This characterization is implemented as an inferenti...