This paper discusses the problem of utilising multiply annotated data in training biomedical information extraction systems. Two corpora, annotated with entities and relations, an...
In this paper, we compare two corpora acquired with two different techniques. The first corpus was acquired with the Wizard of Oz technique. A d...
We propose two new online methods for estimating the size of a backtracking search tree. The first method is based on a weighted sample of the branches visited by chronological ba...
We proposed a subword-based tagging approach for Chinese word segmentation to improve on the existing character-based tagging. The subword-based tagging was implemented using the maximum entr...
Active learning (AL) promises to reduce the cost of annotating labeled datasets for trainable human language technologies. Contrary to expectations, when creating labeled training...