Many text databases on the web are "hidden" behind search interfaces, and their documents are only accessible through querying. Search engines typically ignore the conte...
Panagiotis G. Ipeirotis, Luis Gravano, Mehran Saha...
Most classification algorithms are best at categorizing the Web documents into a few categories, such as the top two levels in the Open Directory Project. Such a classification me...
In recent years, many algorithms for the Web have been developed that work with information units distinct from individual web pages. These include segments of web pages or aggreg...
This paper introduces a method for automatically partitioning richly-formatted electronic documents. An automatic partitioning system has many potential uses, but we focus here on ...
Accurate web page classification often depends crucially on information gained from neighboring pages in the local web graph. Prior work has exploited the class labels of nearby p...