DutchParl. The Parliamentary Documents in Dutch


Download ilps.science.uva.nl

DutchParl is a corpus that aims to contain all digitally available parliamentary documents written in Dutch. The first version of DutchParl contains documents from the parliaments of The Netherlands, Flanders and Belgium. The corpus is divided along three dimensions: by parliament, by scanned versus digital documents, and by written records of spoken text versus other documents. The digital collection contains more than 800 million tokens; the scanned collection contains more than 1 billion. All documents are available as UTF-8 encoded XML files with extensive metadata in the Dublin Core standard. The text itself is divided into pages, which are in turn divided into paragraphs. Every document, page and paragraph has a unique URN which resolves to a web page. Every page element in the XML files is linked to a facsimile image of that page in PDF or JPEG format. We created a viewer in which both versions can be inspected simultaneously. The corpus is available for download in several formats. The co...
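The document layout described in the abstract (Dublin Core metadata, pages containing paragraphs, a URN on every level, and a facsimile link per page) could be read roughly as in the sketch below. Only the Dublin Core namespace is real; the element names `document`, `meta`, `page`, `p`, the attributes `urn` and `facsimile`, the URN values, and the sample text are all hypothetical and are not taken from the actual DutchParl schema.

```python
import xml.etree.ElementTree as ET

# The Dublin Core element namespace is real. Every other element name,
# attribute name, and URN below is a hypothetical stand-in, not the
# actual DutchParl schema.
DC = "http://purl.org/dc/elements/1.1/"

SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<document urn="urn:example:doc-1" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <meta>
    <dc:title>Voorbeeldverslag</dc:title>
    <dc:language>nl</dc:language>
  </meta>
  <page urn="urn:example:doc-1:page-1" facsimile="page-1.pdf">
    <p urn="urn:example:doc-1:page-1:p-1">Eerste alinea.</p>
    <p urn="urn:example:doc-1:page-1:p-2">Tweede alinea.</p>
  </page>
  <page urn="urn:example:doc-1:page-2" facsimile="page-2.jpg">
    <p urn="urn:example:doc-1:page-2:p-1">Derde alinea.</p>
  </page>
</document>
"""


def parse_document(xml_text):
    """Return the document URN, its Dublin Core metadata, and its pages."""
    # Parse from bytes so the UTF-8 declaration in the prolog is honoured.
    root = ET.fromstring(xml_text.encode("utf-8"))

    # Collect Dublin Core fields, dropping the "{namespace}" prefix from tags.
    metadata = {}
    meta = root.find("meta")
    if meta is not None:
        for element in meta:
            if element.tag.startswith("{" + DC + "}"):
                metadata[element.tag.split("}", 1)[1]] = element.text

    # Each page keeps its own URN, its facsimile link, and its paragraphs.
    pages = []
    for page in root.findall("page"):
        paragraphs = [(p.get("urn"), p.text) for p in page.findall("p")]
        pages.append({
            "urn": page.get("urn"),
            "facsimile": page.get("facsimile"),
            "paragraphs": paragraphs,
        })

    return {"urn": root.get("urn"), "metadata": metadata, "pages": pages}


if __name__ == "__main__":
    doc = parse_document(SAMPLE)
    print(doc["urn"], doc["metadata"])
    for page in doc["pages"]:
        print(page["urn"], page["facsimile"], len(page["paragraphs"]))
```

Keeping a URN at the document, page and paragraph level means any unit of text can be cited or linked on its own, and the per-page facsimile reference is what would let a viewer show the text and the page image side by side.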

Maarten Marx, Anne Schuth


Available Parliamentary Documents | Corpus Called Dutchparl | Education | LREC 2010 | XML Files |


Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2010
Where	LREC
Authors	Maarten Marx, Anne Schuth
