We have aligned Japanese and English news articles and sentences to make a large parallel corpus. We first used a method based on cross-language information retrieval (CLIR) to a...
This paper describes two methods for detecting word segments and their morphological information in a Japanese spontaneous speech corpus, and describes how to tag a large spontane...
This paper proposes the “Hierarchical Directed Acyclic Graph (HDAG) Kernel” for structured natural language data. The HDAG Kernel directly accepts several levels of both chunk...
Jun Suzuki, Tsutomu Hirao, Yutaka Sasaki, Eisaku M...
In this paper we present a novel, customizable IE paradigm that takes advantage of predicate-argument structures. We also introduce a new way of automatically identifying predicat...
Mihai Surdeanu, Sanda M. Harabagiu, John Williams,...
Several approaches have been described for the automatic unsupervised acquisition of patterns for information extraction. Each approach is based on a particular model for the patt...
We apply a decision tree based approach to pronoun resolution in spoken dialogue. Our system deals with pronouns with NP- and non-NP-antecedents. We present a set of features design...
Automatically acquiring synonymous collocation pairs such as <turn on, OBJ, light> and <switch on, OBJ, light> from corpora is a challenging task. For this task, we ca...
The paper describes two parsing schemes: a shallow approach based on machine learning and a cascaded finite-state parser with a hand-crafted grammar. It discusses several ways to...
An investment of effort over the last two years has begun to produce a wealth of data concerning computational psycholinguistic models of syntax acquisition. The data is generated...