The PDF format is commonly used for the exchange of documents on the Web and there is a growing need to understand and extract or repurpose data held in PDF documents. Many system...
We describe Thresher, a system that lets non-technical users teach their browsers how to extract semantic web content from HTML documents on the World Wide Web. Users specify exam...
In this paper, we propose a new approach to discover informative contents from a set of tabular documents (or Web pages) of a Web site. Our system, InfoDiscoverer, first partition...
- Researchers are faced with a wide range of tasks when interacting with the literature of a scientific field. These tasks range from determining the field’s seminal documents, f...
Richard H. Fowler, Kyle Picou, Wendy Fowler, Yavuz...
The identification and analysis of an enterprise's knowledge available in a documented form is a key element of knowledge management. Visual methods which allow easy access t...