While significant effort has been put into annotating linguistic resources for several languages, many languages still have only small amounts of such resources. This p...
One major bottleneck in conversational systems is their inability to interpret unexpected user language inputs such as out-of-vocabulary words. To overcome this problem, conv...
Compared to the telephone, email-based customer care is increasingly becoming the preferred channel of communication for corporations and customers. Most email-based customer care...
In this paper we address the issue of automatically assigning information status to discourse entities. Using an annotated corpus of conversational English and exploiting morpho-s...
We propose a general method for reranker construction that aims to choose the candidate with the least expected loss, rather than the most probable candidate. Different approac...
Integer Linear Programming has recently been used for decoding in a number of probabilistic models in order to enforce global constraints. However, in certain applications, such a...
This paper describes our attempt at NomBank-based automatic Semantic Role Labeling (SRL). NomBank is a project at New York University to annotate the argument structures for commo...
We propose a supervised, two-phase framework to address the problem of paraphrase recognition (PR). Unlike most PR systems that focus on sentence similarity, our framework detects...
For transliterating foreign words into Chinese, the pronunciation of a source word is spelled out with Kanji characters. Because Kanji comprises ideograms, an individual pronuncia...
Word clustering is important for automatic thesaurus construction, text classification, and word sense disambiguation. Recently, several studies have reported using the web as a c...
Yutaka Matsuo, Takeshi Sakaki, Koki Uchiyama, Mits...