Information extraction from HTML pages has been conventionally treated as plain text documents extended with HTML tags. However, the growing maturity and correct usage of HTML/XHT...
Work on evaluating and improving the relevance of web search engines typically use human relevance judgments or clickthrough data. Both these methods look at the problem of learni...
Hao Ma, Raman Chandrasekar, Chris Quirk, Abhishek ...
In this paper, we propose an innovative approach to extracting semi-structured data from Web sources. The idea is to collect a couple of example objects from the user and to use t...
Berthier A. Ribeiro-Neto, Alberto H. F. Laender, A...
We propose a new way of browsing contextualized-news articles. Our prototype browser system is called a Time-based ContextualizedNews Browser (T-CNB). The T-CNB concurrently and a...
A large number of web pages contain data structured in the form of “lists”. Many such lists can be further split into multi-column tables, which can then be used in more seman...