Large corpora are essential to modern methods of computational linguistics and natural language processing. In this paper, we describe an ongoing project whose aim is to build a l...
Of the ten million words of contemporary standard Dutch in the Spoken Dutch Corpus (Corpus Gesproken Nederlands, CGN), a selection of one million words of natural spoken language ...
Heleen Hoekstra, Michael Moortgat, Ineke Schuurman...
In part of speech tagging by Hidden Markov Model, a statistical model is used to assign grammatical categories to words in a text. Early work in the field relied on a corpus which...
SPKI/SDSI is a language for expressing distributed access control policy, derived from SPKI and SDSI. We provide a first-order logic (FOL) semantics for SDSI, and show that it ha...
Document-centric XML collections contain text-rich documents, marked up with XML tags. The tags add lightweight semantics to the text. Querying such collections calls for a hybrid...