Tagging the Dutch PAROLE Corpus



We discuss the annotation of the Dutch PAROLE Internet Corpus with part-of-speech and lemma information. The PAROLE PoS tagger is a combination of statistical taggers: it includes the Markov tagger TnT and three taggers developed at the INL, which were designed to use other information besides the training data. Lemmas are assigned by a deterministic procedure based on an extensive lexicon. In some respects the output is not entirely satisfactory; we discuss what can be done about this without manually correcting the complete corpus.
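The abstract does not say how the individual taggers' outputs are combined or how the lexicon is keyed, so the following is only a minimal sketch under stated assumptions. It assumes per-token majority voting over the taggers' tags, with ties resolved in favour of one designated tagger (for example TnT), and a lemma lexicon keyed on (word form, tag) pairs. The function names `combine_tags` and `assign_lemma`, the tie-break rule, the lowercase fallback, and the tags `LID`, `N` and `WW` are all hypothetical illustrations, not the PAROLE tagset or the authors' implementation.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def combine_tags(predictions: Sequence[Sequence[str]], tiebreak: int = 0) -> List[str]:
    """Hypothetical combiner: majority vote per token over several taggers.

    predictions[k][i] is the tag that tagger k assigns to token i.
    Ties go to tagger `tiebreak` if its tag is among the winners,
    otherwise to the alphabetically first winning tag (deterministic).
    """
    n_tokens = len(predictions[0])
    if any(len(p) != n_tokens for p in predictions):
        raise ValueError("all taggers must tag the same token sequence")
    combined = []
    for i in range(n_tokens):
        votes = Counter(p[i] for p in predictions)
        best = max(votes.values())
        winners = {tag for tag, count in votes.items() if count == best}
        preferred = predictions[tiebreak][i]
        combined.append(preferred if preferred in winners else sorted(winners)[0])
    return combined


def assign_lemma(word: str, tag: str, lexicon: Dict[Tuple[str, str], str]) -> str:
    """Hypothetical deterministic lemmatizer: look up (word, tag) in the lexicon,
    retry with the lowercased form, and fall back to the lowercased word itself."""
    lemma = lexicon.get((word, tag))
    if lemma is None:
        lemma = lexicon.get((word.lower(), tag))
    return lemma if lemma is not None else word.lower()


if __name__ == "__main__":
    tokens = ["De", "katten", "slapen"]
    # Four hypothetical tagger outputs for the same three tokens.
    preds = [
        ["LID", "N", "WW"],
        ["LID", "N", "N"],
        ["LID", "ADJ", "WW"],
        ["LID", "N", "WW"],
    ]
    tags = combine_tags(preds)
    lexicon = {("de", "LID"): "de", ("katten", "N"): "kat", ("slapen", "WW"): "slapen"}
    print(list(zip(tokens, tags, (assign_lemma(w, t, lexicon) for w, t in zip(tokens, tags)))))
```

Keying the lexicon on the tag as well as the word form is what keeps the lemma step deterministic once the tag is fixed; it also means that a tagging error can propagate into a wrong or missing lemma.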

Jesse de Does, John van der Voort van der Kleij


CLIN 2001 | CLIN 2004 | Markov Tagger Tnt | PAROLE Internet Corpus | PAROLE PoS Tagger


» Interacting Semantic Layers of Annotation in SoNaR a Reference Corpus of Contemporary Writ...

» From DCoi to SoNaR a reference corpus for Dutch

» Methods for the Extraction of Hungarian MultiWord Lexemes

Post Info

Added	31 Oct 2010
Updated	31 Oct 2010
Type	Conference
Year	2001
Where	CLIN
Authors	Jesse de Does, John van der Voort van der Kleij


Sciweavers
