Abstract: In this paper we describe a flexible, portable and languageindependent infrastructure for setting up large monolingual language corpora. The approach is based on collecti...
Christian Biemann, Stefan Bordag, Gerhard Heyer, U...
Almost all Chinese language processing tasks involve word segmentation of the language input as their first steps, thus robust and reliable segmentation techniques are always requ...
This paper proposes a fast and simple unsupervised word segmentation algorithm that utilizes the local predictability of adjacent character sequences, while searching for a leaste...
We study the problem of topic segmentation of manually transcribed speech in order to facilitate information extraction from dialogs. Our approach is based on a combination of mul...
In order to build a simulated robot that accepts instructions in unconstrained natural language, a corpus of 427 route instructions was collected from human subjects in the office...