Abstract. Biologists usually focus on only a small, individualized, subdomain of the huge domain of biology. With respect to their sub-domain, they often need data collected from v...
Abstract. A new web content structure based on visual representation is proposed in this paper. Many web applications such as information retrieval, information extraction and auto...
We present a new approach to extracting information from unstructured documents based on an application ontology that describes a domain of interest. Starting with such an ontolog...
David W. Embley, Douglas M. Campbell, Randy D. Smi...
The GDA (Global Document Annotation) project proposes a tag set which allows machines to automatically infer the underlying semantic/pragmatic structure of documents. Its objectiv...
We present a system to automatically generate RSS feeds from HTML documents that consist of time-series items with date expressions, e.g., archives of weblogs, BBSs, chats, mailin...