Humans are very good at judging the strength of relationships between two terms, a task which, if it can be automated, would be useful in a range of applications. Systems attempti...
The Web has established itself as the largest public data repository ever available. Even though the vast majority of information on the Web is formatted to be easily readable by ...
Hasan Davulcu, Srinivas Vadrevu, Saravanakumar Nag...
In this paper we propose a methodology to learn to extract domain-specific information from large repositories (e.g. the Web) with minimum user intervention. Learning is seeded b...
Fabio Ciravegna, Alexiei Dingli, David Guthrie, Yo...
Heterogeneous and dirty data is abundant. It is stored under different, often opaque schemata, it represents identical real-world objects multiple times, causing duplicates, and ...
Alexander Bilke, Jens Bleiholder, Christoph Bö...
Traditionally, Information Extraction (IE) has focused on satisfying precise, narrow, pre-specified requests from small homogeneous corpora (e.g., extract the location and time o...
Michele Banko, Michael J. Cafarella, Stephen Soder...