Sciweavers

LREC
2010
13 years 7 months ago
The Web Library of Babel: evaluating genre collections
We present experiments in automatic genre classification on web corpora, comparing a wide variety of features on several different genre-annotated datasets (HGC, I-EN, KI-04, KRYS...
Serge Sharoff, Zhili Wu, Katja Markert
COLING
2008
13 years 10 months ago
Source Language Markers in EUROPARL Translations
This paper shows that it is very often possible to identify the source language of medium-length speeches in the EUROPARL corpus on the basis of frequency counts of word n-grams (...
Hans van Halteren