In this paper we discuss the management of semi-structured data, i.e., data that has irregular or dynamically changing structure. We describe components of the Stanford Tsimmis Pr...
Joachim Hammer, Jason McHugh, Hector Garcia-Molina
The proliferation of online information sources has accentuated the need for tools that automatically validate and recognize data. We present an efficient algorithm that learns st...
We describe the WebCLEF 2008 task. Similarly to the 2007 edition of WebCLEF, the 2008 edition implements a multilingual "information synthesis" task, where, for a given t...
A corpus called DutchParl is created which aims to contain all digitally available parliamentary documents written in the Dutch language. The first version of DutchParl contains d...
The World Wide Web is a vast source of information accessible to computers, but understandable only to humans. The goal of the research described here is to automatically create a...
Mark Craven, Dan DiPasquo, Dayne Freitag, Andrew M...