Large corpora are essential to modern methods of computational linguistics and natural language processing. In this paper, we describe an ongoing project whose aim is to build a l...
Research on information extraction from Web pages (wrapping) has seen much activity in recent times (particularly systems implementations), but little work has been done on formal...
Automatic metadata generation may provide a solution to the problem of inconsistent, unreliable metadata describing resources on the Web. The Resource Description Framework (RDF [...
Charlotte Jenkins, Mike Jackson, Peter Burden, Jon...
On a high level of abstraction a Web Information System (WIS) can be described by a storyboard, which in an abstract way specifies who will be using the system, in which way and for which...
This paper explores the potential for annotating and enriching data for low-density languages via the alignment and projection of syntactic structure from parsed data for resource...