

Syntactic Annotation for the Spoken Dutch Corpus Project (CGN)

Of the ten million words of contemporary standard Dutch in the Spoken Dutch Corpus (Corpus Gesproken Nederlands, CGN), a selection of one million words of natural spoken language will be annotated syntactically. In the present paper we discuss the tag sets and the annotation procedures that are currently being developed and tested. The annotation tags provide information about syntactic constituents and about the semantic relations (dependencies) between these constituents. The annotation graphs allow crossing branches, which makes it possible to represent dependencies independently of surface word order. Moreover, constituents can carry multiple dependency roles, a feature that is exploited in the annotation of non-local dependencies and ellipsis. The annotation process is carried out semi-automatically, using an interactive annotation environment developed within the NEGRA project, which produced a syntactically annotated corpus of German newspaper texts. We illustrate the approach with some real ...
Heleen Hoekstra, Michael Moortgat, Ineke Schuurman, Ton van der Wouden
Added 01 Nov 2010
Updated 01 Nov 2010
Type Conference
Year 2000
Where CLIN
Authors Heleen Hoekstra, Michael Moortgat, Ineke Schuurman, Ton van der Wouden