The Word Wide Web has becoming one of the most important information repositories. However, information in web pages is free of standards in presentation, without being organized i...
Today the major web search engines answer queries by showing ten result snippets, which need to be inspected by users for identifying relevant results. In this paper we investigat...
We describe Thresher, a system that lets non-technical users teach their browsers how to extract semantic web content from HTML documents on the World Wide Web. Users specify exam...
This paper presents a novel method for extracting information from collections of Web pages across different sites. Our method uses a standard wrapper induction algorithm and explo...
Wrappers play an important role in extracting specified information from various sources. Wrapper rules by which information is extracted are often created from the domain-specifi...