Abstract. One of issues in the bootstrapping for named entity recognition is how to control annotation errors introduced at every iteration. In this paper, we present several heuri...
We present a method of chunking in Korean texts using conditional random fields (CRFs), a recently introduced probabilistic model for labeling and segmenting sequence of data. In a...
This paper argues for the necessity of zero pronoun annotations in Korean treebanks and provides an annotation scheme that can be used to develop a gold standard for testing differ...
We present a preliminary study of several parser adaptation es evaluated on the GENIA corpus of MEDLINE abstracts [1,2]. We begin by observing that the Penn Treebank (PTB) is lexic...
This paper reports on a study of semantic role tagging in Chinese, in the absence of a parser. We investigated the effect of using only lexical information in statistical training;...
Document clustering has many uses in natural language tools and applications. For instance, summarizing sets of documents that all describe the same event requires first identifyi...
Abstract. This paper presents our recent work on period disambiguation, the kernel problem in sentence boundary identification, with the maximum entropy (Maxent) model. A number o...
Biomedical named entity recognition (NER) is a difficult problem in biomedical information processing due to the widespread ambiguity of terms out of context and extensive lexical ...
Seonho Kim, Juntae Yoon, Kyung-Mi Park, Hae-Chang ...
Abstract. This paper proposes an approach to improve statistical word alignment with ensemble methods. Two ensemble methods are investigated: bagging and cross-validation committee...