To evaluate the performance of information retrieval and extraction algorithms, we need test collections. A test collection consists of a set of documents, a clearly form...
We present a browser-extending Semantic Web extraction system that maps HTML documents to tables and, where possible, to rules. First, the basic data extractor ViPER distills and ...
Storage techniques and queries over XML databases are widely studied. Most approaches store XML documents in traditional DBMSs in order to take advantage of a well-established te...
XML has become the de facto standard for specifying and exchanging data on the Web. However, XML is by nature verbose, and thus XML documents are usually large, a fa...
Wilfred Ng, Wai Yeung Lam, Peter T. Wood, Mark Lev...
We propose a system that registers and retrieves text documents so that they can be annotated on-line. The user registers a text document captured from a nearly top-down view and adds virtual annot...