Over the past decade or so, a lot of work in computational linguistics has been directed at finding ways to exploit the ever increasing volume of electronic bilingual corpora. The...
This paper describes an experiment on extracting Hungarian multi-word lexemes from a corpus, using statistical methods. Corpus preparation—the addition of POS tags and stems—w...
This paper reports on the latest rejuvenation of AMAZON, a structuralist parser for Dutch written sentences. Unlike older versions, the new AMAZON parser has been developed in a m...
A statistical estimator attempts to guess an unknown probability distribution by analyzing a sample from this distribution. One desirable property of an estimator is that its gues...
This paper presents two methods for automatic detection of plagiarism in student essays, using Dutch text corpora to show their effectiveness. The first method is based on measur...
We describe the development of a Dutch memory-based shallow parser. The availability of large treebanks for Dutch, such as the one provided by the Spoken Dutch Corpus, allows memo...
A long-standing discussion in Dutch syntax concerns the question whether pp dependents of a noun may be fronted. Although examples which apparently illustrate this pattern can be ...
We compare machine learning approaches for sentence length reduction for automatic generation of subtitles for deaf and hearing-impaired people with a method which relies on hand-...
Erik F. Tjong Kim Sang, Walter Daelemans, Anja H&o...
This paper presents an efficient algorithm for the incremental construction of a minimal acyclic sequential transducer (ST) from a list of input and output strings. The algorithm...
Many Natural Language Processing (NLP) techniques have been used in Information Retrieval. The results are not encouraging. Simple methods (stopwording, porter-style stemming, etc...