Syntactic Annotation for the Spoken Dutch Corpus Project (CGN)

Of the ten million words of contemporary standard Dutch in the Spoken Dutch Corpus (Corpus Gesproken Nederlands, CGN), a selection of one million words of natural spoken language will be annotated syntactically. In the present paper we discuss the tag sets and the annotation procedures that are currently being developed and tested. The annotation tags provide information about syntactic constituents and about the semantic relations (dependencies) between these constituents. The annotation graphs allow crossing branches, which makes it possible to represent dependencies independently of surface word order. Moreover, constituents can carry multiple dependency roles, a feature that is exploited in the annotation of non-local dependencies and ellipsis. The annotation process is carried out semi-automatically, using an interactive annotation environment developed within the NEGRA project, which produced a syntactically annotated corpus of German newspaper texts. We illustrate the approach with some real ...
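
The graph structure described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual data format or tooling: the class names, the role and category labels, and the example question "Wat heeft Jan gekocht" are all hypothetical and are not taken from the paper or from the CGN tag set. The sketch shows a constituent whose words are not adjacent on the surface (the source of crossing branches) and a word that carries a second dependency role through a secondary edge.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass(eq=False)
class Edge:
    role: str                 # dependency label (illustrative, not the CGN tag set)
    child: Node
    secondary: bool = False   # True when the child already has a primary parent


@dataclass(eq=False)
class Node:
    category: str                 # syntactic category (illustrative)
    position: int | None = None   # surface index for words, None for constituents
    edges: list[Edge] = field(default_factory=list)

    def attach(self, role: str, child: Node, secondary: bool = False) -> None:
        self.edges.append(Edge(role, child, secondary))


def yield_positions(node: Node) -> set[int]:
    """Surface positions of the words dominated via primary edges."""
    if node.position is not None:
        return {node.position}
    positions: set[int] = set()
    for edge in node.edges:
        if not edge.secondary:
            positions |= yield_positions(edge.child)
    return positions


def is_discontinuous(node: Node) -> bool:
    """True when the dominated words do not form one contiguous span."""
    pos = yield_positions(node)
    return bool(pos) and max(pos) - min(pos) + 1 != len(pos)


def roles_of(root: Node, target: Node) -> list[tuple[str, str]]:
    """All (parent category, role) pairs under which `target` is attached."""
    roles = []
    for edge in root.edges:
        if edge.child is target:
            roles.append((root.category, edge.role))
        if not edge.secondary:  # descend along primary edges only
            roles.extend(roles_of(edge.child, target))
    return roles


# Hypothetical example: "Wat heeft Jan gekocht" (word positions 0..3).
wat, heeft, jan, gekocht = (
    Node("word", i) for i in range(4)
)

participle = Node("participle_phrase")
participle.attach("object", wat)
participle.attach("head", gekocht)
participle.attach("subject", jan, secondary=True)  # second role for "Jan"

clause = Node("clause")
clause.attach("head", heeft)
clause.attach("subject", jan)
clause.attach("verbal_complement", participle)

print(is_discontinuous(participle))  # True: it covers positions 0 and 3 only
print(is_discontinuous(clause))      # False: it covers positions 0 to 3
print(roles_of(clause, jan))
```

In this sketch the participle phrase groups "wat" and "gekocht" even though "heeft" and "Jan" stand between them, so the dependency is recorded independently of word order. Keeping secondary edges out of `yield_positions` is a design choice of the sketch: it lets a node take extra roles without changing which words its parents are considered to span.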

Heleen Hoekstra, Michael Moortgat, Ineke Schuurman, Ton van der Wouden

Annotation | CLIN 2000 | CLIN 2004 | Spoken Dutch Corpus | Spoken Language

Post Info

Added	01 Nov 2010
Updated	01 Nov 2010
Type	Conference
Year	2000
Where	CLIN
Authors	Heleen Hoekstra, Michael Moortgat, Ineke Schuurman, Ton van der Wouden
