We describe Thresher, a system that lets non-technical users teach their browsers how to extract semantic web content from HTML documents on the World Wide Web. Users specify exam...
Web pages contain a combination of unique content and template material, which is present across multiple pages and used primarily for formatting, navigation, and branding. We stu...
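The distinction between unique content and template material can be made concrete with a simple frequency heuristic. A block that recurs across many pages of a site is likely to be template material. The sketch below makes several assumptions. Each page is already split into text blocks, and extracting those blocks from HTML is not shown. A block counts as template if it occurs in at least a hypothetical fraction `min_fraction` of the pages. The heuristic is chosen for illustration only and is not the method the abstract goes on to describe.

```python
from collections import Counter


def find_template_blocks(pages, min_fraction=0.5):
    """Return the set of blocks that occur in at least `min_fraction` of pages.

    `pages` is a list of pages, each already split into text blocks (a list of
    strings). The block segmentation and the `min_fraction` threshold are
    hypothetical choices made for this sketch.
    """
    doc_freq = Counter()
    for blocks in pages:
        # Count each distinct block once per page so that repetition
        # within a single page does not inflate its score.
        doc_freq.update(set(blocks))
    threshold = min_fraction * len(pages)
    return {block for block, count in doc_freq.items() if count >= threshold}


def unique_content(page_blocks, template):
    """Keep only the blocks of one page that are not template material."""
    return [block for block in page_blocks if block not in template]
```

On three toy pages that share a navigation bar and a footer, the shared blocks are returned as template. Applying `unique_content` to any one page then leaves only that page's own text.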
A multidimensional database can be seen as a collection of multidimensional cubes, from which information is usually extracted by aggregation; aggregated data can be calculated ei...
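Extracting information from a cube by aggregation amounts to summing the measure over the dimensions that are dropped. The minimal sketch below computes such an aggregate directly from the base cells. It assumes a hypothetical three-dimensional cube with dimensions `product`, `region`, and `month` and a single numeric measure, stored as a dictionary from coordinate tuples to values. It does not model any particular database system, and it makes no claim about how the abstract's own approach computes aggregated data.

```python
from collections import defaultdict

# Hypothetical dimension names for this sketch; a cell key is a tuple of
# coordinates in this order, e.g. ("pen", "north", "jan").
DIMENSIONS = ("product", "region", "month")


def aggregate(cube, keep):
    """Roll the cube up onto the dimensions named in `keep`.

    The measure is summed over every dimension not listed in `keep`;
    passing an empty `keep` yields the grand total under the key ().
    """
    positions = [DIMENSIONS.index(dim) for dim in keep]
    result = defaultdict(float)
    for coords, value in cube.items():
        # Project the cell onto the kept dimensions and accumulate.
        key = tuple(coords[i] for i in positions)
        result[key] += value
    return dict(result)
```

For example, aggregating the toy cube onto `["product"]` sums each product's sales over all regions and months. Aggregating onto `[]` returns the total over the whole cube.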
In this paper, we assess the impact of using thesaurus-based query expansion methods at the Information Retrieval (IR) stage of a Question Answering (QA) system. We focus on expan...
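Thesaurus-based query expansion, in its simplest form, appends related terms from a thesaurus to the original query terms before retrieval. The sketch below shows only that basic step. The toy thesaurus, the cap `max_synonyms`, and the whitespace tokenization are all hypothetical assumptions, and the sketch is not the specific expansion strategy evaluated in the paper.

```python
def expand_query(query, thesaurus, max_synonyms=2):
    """Expand a query with up to `max_synonyms` thesaurus entries per term.

    `thesaurus` is a hypothetical mapping from a lowercase term to an ordered
    list of related terms. Original terms come first, duplicates are skipped,
    and the order in which terms are added is preserved.
    """
    terms = query.lower().split()  # naive whitespace tokenization
    expanded = list(terms)
    seen = set(terms)
    for term in terms:
        for synonym in thesaurus.get(term, [])[:max_synonyms]:
            if synonym not in seen:
                expanded.append(synonym)
                seen.add(synonym)
    return expanded
```

With a toy thesaurus `{"car": ["automobile", "motorcar", "auto"], "buy": ["purchase"]}`, the query `"Buy car"` expands to `["buy", "car", "purchase", "automobile", "motorcar"]`. Capping the number of added terms per word is one simple way to limit the topic drift that unrestricted expansion can cause.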
We have used a general-purpose data mining tool to determine whether we can find any ‘golden nuggets’ in the web access logs of a large academic web site. Our goal was to use...
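Web access logs usually need some preprocessing before a mining tool can use them. The sketch below shows one plausible first step under stated assumptions. It assumes the logs are in the Common Log Format (`host ident authuser [date] "request" status bytes`). It keeps only successful `GET` requests and counts them per path. This is a hypothetical preprocessing example and does not stand in for the data mining tool the abstract refers to.

```python
import re
from collections import Counter

# Common Log Format (assumed): host ident authuser [date] "METHOD path PROTOCOL" status bytes
CLF_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\S+)')


def count_successful_pages(lines):
    """Count successful (2xx) GET requests per requested path.

    Lines that do not match the assumed Common Log Format are skipped.
    """
    counts = Counter()
    for line in lines:
        match = CLF_LINE.match(line)
        if match is None:
            continue  # malformed or differently formatted line
        method, path, status = match.group(3), match.group(4), int(match.group(5))
        if method == "GET" and 200 <= status < 300:
            counts[path] += 1
    return counts
```

The resulting per-path counts, or richer per-request records built the same way, could then be passed to a general-purpose mining tool.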