In this project report we describe work in statistical parsing using the maximum entropy technique and the Alpino language analysis system for Dutch. A major difficulty in this d...
This paper reports the current results of research on unsupervised Persian morpheme discovery. We present a method for discovering the morphemes of Persian languag...
We extend previous work on fully unsupervised part-of-speech tagging. Using a non-parametric version of the HMM, called the infinite HMM (iHMM), we address the problem of choosing...
Jurgen Van Gael, Andreas Vlachos, Zoubin Ghahraman...
In this paper we present a novel instance pruning technique for Information Extraction (IE). In particular, our technique filters out uninformative words from texts on the basis o...
We report in this paper the observation of one tokenization per source. That is, the same critical fragment in different sentences from the same source almost always realizes one a...