In this paper, a new language model, the Multi-Class Composite N-gram, is proposed to avoid a data sparseness problem for spoken language in that it is difficult to collect traini...
This paper presents an algorithm for unsupervised noun sense induction, based on clustering of Web search results. The algorithm does not utilize labeled training instances or any...
Goldee Udani, Shachi Dave, Anthony Davis, Tim Sibl...
Background: The ability to distinguish between genes and proteins is essential for understanding biological text. Support Vector Machines (SVMs) have been proven to be very effici...
Tapio Pahikkala, Filip Ginter, Jorma Boberg, Jouni...
We describe a model for the lexical analysis of Arabic text, using the lists of alternatives supplied by a broad-coverage morphological analyzer, SAMA, which include stable lemma ...
Rushin Shah, Paramveer S. Dhillon, Mark Liberman, ...
In recent years there is much interest in word cooccurrence relations, such as n-grams, verb-object combinations, or cooccurrence within a limited context. This paper discusses ho...