Background: The number of k-words shared between two sequences is a simple and effcient alignment-free sequence comparison method. This statistic, D2, has been used for the cluste...
This paper presents a morphological lexicon for English that handle more than 317000 inflected forms derived from over 90000 stems. The lexicon is available in two formats. The fi...
Daniel Karp, Yves Schabes, Martin Zaidel, Dania Eg...
Bootstrapping semantics from text is one of the greatest challenges in natural language learning. We first define a word similarity measure based on the distributional pattern of ...
In this paper we propose a new information-theoretic divisive algorithm for word clustering applied to text classification. In previous work, such "distributional clustering&...
Inderjit S. Dhillon, Subramanyam Mallela, Rahul Ku...
Given the recent trend to evaluate the performance of word sense disambiguation systems in a more application-oriented set-up, we report on the construction of a multilingual benc...