This paper proposes a chunking strategy to detect unknown words in Chinese word segmentation. First, a raw sentence is pre-segmented into a sequence of word atoms 1 using a maximum...
Research on linear text segmentation has been an on-going focus in NLP for the last decade, and it has great potential for a wide range of applications such as document summarizati...
Jingbo Zhu, Na Ye, Xinzhi Chang, Wenliang Chen, Be...
This paper presents a definition question answering approach, which is capable of mining textual definitions from large collections of documents. In order to automatically identify...
This paper studies the problem of mining relational data hidden in natural language text. In particular, it approaches the relation classification problem with the strategy of tra...
Abstract. This paper presents an empirical study on four techniques of language model adaptation, including a maximum a posteriori (MAP) method and three discriminative training mo...
Many methods of term extraction have been discussed in terms of their accuracy on huge corpora. However, when we try to apply various methods that derive from frequency to a small ...
Abstract. We build a class-based selection preference sub-model to incorporate external semantic knowledge from two Chinese electronic semantic dictionaries. This sub-model is comb...
This paper proposes a variation of synchronous grammar based on the formalism of context-free grammar by generalizing the first component of productions that models the source text...
Fai Wong, Dong-Cheng Hu, Yu-Hang Mao, Ming-Chui Do...
This paper presents a study of Sinhala syllable structure and an algorithm for identifying syllables in Sinhala words. After a thorough study of the Syllable structure and linguis...
We present a PP-attachment disambiguation method based on a gigantic volume of unambiguous examples extracted from raw corpus. The unambiguous examples are utilized to acquire prec...