This paper shows that it is very often possible to identify the source language of medium-length speeches in the EUROPARL corpus on the basis of frequency counts of word n-grams (...
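The abstract says that identification relies on frequency counts of word n-grams, but the excerpt does not name the classifier. The sketch below uses nearest-profile matching with cosine similarity over word-bigram counts. That classifier, the whitespace tokenization, and the helper names (`ngram_counts`, `cosine`, `guess_source`) are all hypothetical illustrations, not the paper's method.

```python
from collections import Counter
import math


def ngram_counts(text, n=2):
    """Count word n-grams in a lowercased, whitespace-tokenized text (assumed tokenization)."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def guess_source(speech, profiles, n=2):
    """Hypothetical classifier: return the label whose n-gram profile is closest to the speech."""
    counts = ngram_counts(speech, n)
    return max(profiles, key=lambda label: cosine(counts, profiles[label]))
```

In practice each profile would be built from many speeches per source language. A toy pair of one-sentence profiles is enough to show the mechanics.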
In this paper we present a taxonomy of dialogue moves which describe the actions that students and tutors perform in tutorial dialogue. We are motivated by the need for a categori...
This paper presents a probabilistic model for resolution of non-pronominal anaphora in biomedical texts. The model seeks to find the antecedents of anaphoric expressions, both cor...
Psycholinguistic studies suggest a model of human language processing that 1) performs incremental interpretation of spoken utterances or written text, 2) preserves ambiguity by m...
William Schuler, Samir AbdelRahman, Tim Miller, La...
This paper proposes an approach using large-scale case structures, which are automatically constructed from both a small tagged corpus and a large raw corpus, to improve Chinese d...
This paper presents recent advances in an established treebank annotation framework comprising an abstract XML-based data format, a fully customizable editor of tree-based annotat...
We address corpus-building situations where complete annotation of the whole corpus is time-consuming and unrealistic. Thus, annotation is done only on crucial parts of sentences...
Yuta Tsuboi, Hisashi Kashima, Shinsuke Mori, Hirok...
This paper presents an approach for substantially reducing the time needed to calculate the shortest paths between all concepts in a wordnet. The algorithm exploits the unique "...
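The excerpt is cut off before it describes what the speed-up exploits, so the paper's algorithm cannot be reconstructed here. For orientation, the sketch below shows a plain baseline for the same task: one breadth-first search per concept over an unweighted, undirected concept graph, costing O(V(V+E)) overall. The adjacency-dict representation and the toy hypernym graph are assumptions made for illustration.

```python
from collections import deque


def bfs_distances(graph, source):
    """Hop distances from source to every reachable node in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def all_pairs_shortest_paths(graph):
    """Baseline (not the paper's method): run one BFS from every concept."""
    nodes = set(graph) | {n for nbrs in graph.values() for n in nbrs}
    return {node: bfs_distances(graph, node) for node in nodes}


# Hypothetical toy fragment of a hypernym hierarchy, with edges listed in both directions.
toy_wordnet = {
    "dog": ["canine"],
    "canine": ["dog", "carnivore"],
    "cat": ["feline"],
    "feline": ["cat", "carnivore"],
    "carnivore": ["canine", "feline"],
}
```

On a full wordnet with over 100,000 concepts this baseline becomes expensive, which is the cost the paper sets out to reduce.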
Previous methods usually perform keyphrase extraction on each document separately, without any interaction between documents, under the assumption that the documents are ...
We propose a new unsupervised method for topic detection that automatically identifies the different facets of an event. We use pointwise Kullback-Leibler divergence along with th...
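Pointwise Kullback-Leibler divergence scores a term $w$ by its contribution $p_{fg}(w)\log\frac{p_{fg}(w)}{p_{bg}(w)}$ to the divergence between a foreground distribution (for example, text about the event) and a background distribution. The minimal sketch below computes that score from raw term counts. The add-one smoothing of the background, the function name `pointwise_kl`, and the toy counts are assumptions. The excerpt is truncated before it says what the score is combined with.

```python
import math
from collections import Counter


def pointwise_kl(foreground, background):
    """Score each foreground term w by p_fg(w) * log(p_fg(w) / p_bg(w)).

    Background probabilities use add-one smoothing over the joint vocabulary
    (an assumption), so terms unseen in the background get finite scores.
    """
    vocab = set(foreground) | set(background)
    fg_total = sum(foreground.values())
    bg_total = sum(background.values()) + len(vocab)
    scores = {}
    for w, c in foreground.items():
        p_fg = c / fg_total
        p_bg = (background.get(w, 0) + 1) / bg_total
        scores[w] = p_fg * math.log(p_fg / p_bg)
    return scores


# Hypothetical toy counts: event-window text versus a background sample.
event_counts = Counter("quake quake quake rescue the the".split())
background_counts = Counter("the the the the rescue weather".split())
```

Terms that are much more frequent in the event window than in the background, such as "quake" here, receive the highest scores. Common function words receive negative scores.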