Text documents often embed data that is structured in nature. This structured data is increasingly exposed using information extraction systems, which generate structured relation...
In this paper, we introduce the webpage understanding problem, which consists of three subtasks: webpage segmentation, webpage structure labeling, and webpage text segmentation and ...
Domain adaptation refers to the process of adapting an extraction model trained in one domain to another related domain with only unlabeled data. We present a brief survey of exis...
A long-standing goal of Web research has been to construct a unified Web knowledge base. Information extraction techniques have shown good results on Web inputs, but even most dom...
Michael J. Cafarella, Jayant Madhavan, Alon Y. Hal...
This article proposes a core query algebra for probabilistic databases. In essence, this core is part of the query languages of most probabilistic database systems proposed so far...
There are several useful guides available for how to review a paper in Computer Science [10, 6, 12, 7, 2]. These are soberly presented, carefully reasoned and sensibly argued. As ...
Sharing structured data today requires standardizing upon a single schema, then mapping and cleaning all of the data. This results in a single queryable mediated data instance. Ho...
Zachary G. Ives, Todd J. Green, Grigoris Karvounar...
In late May 2008, a group of database researchers, architects, users, and pundits met at the Claremont Resort in Berkeley, California, to discuss the state of the research field an...
Rakesh Agrawal, Anastasia Ailamaki, Philip A. Bern...