We present HAMLET, a suite of principles, scoring models and algorithms to automatically propagate metadata along edges in a document neighborhood. As a showcase scenario we consi...
Adriana Budura, Sebastian Michel, Philippe Cudr&ea...
This paper presents a new method for building domain-specific web search engines. Previous methods eliminate irrelevant documents from the pages accessed using heuristics based on...
A wealth of information is available only in web pages, patents, publications etc. Extracting information from such sources is challenging, both due to the typically complex langu...
Web-page classification is much more difficult than pure-text classification due to a large variety of noisy information embedded in Web pages. In this paper, we propose a new Web...
We propose two methods for constructing automated programs for extraction of information from a class of web pages that are very common and of high practical significance - varia...